Playing around with Python (lots of calculations with mordred)

Posted at 2019-12-08

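What is this article?

A small piece on cheminformatics.
I have been using the descriptor calculation library Mordred (1.0.0), and this is the story of how I finally found an inf among the calculation results.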

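Background

Lately I have been working a little with descriptor calculation results.
Apparently a result of "inf" can occur, but I had never seen one.
I casually searched about 1,000 entries and could not find any.
So I decided to go all in, calculate tens of thousands of entries, and see whether one would turn up.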

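Approach

Run the Mordred calculation as usual.
Each time an error comes up, add another except clause.
The idea was simply that, sooner or later, it would keep running for a long time without stopping.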
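
Instead of adding one except clause per error type, a single catch-all that records the exception class name would probably do the same job. This is a minimal sketch using only the standard library; safe_call, divide, and first_item are hypothetical names for illustration, not Mordred functions.

```python
def safe_call(func, *args):
    """Call func(*args); on failure return a label built from the exception type."""
    try:
        return str(func(*args))
    except Exception as e:  # broad on purpose: we only want one label per failure
        return 'err' + type(e).__name__


# Toy functions that fail in different ways (stand-ins for a descriptor call)
def divide(a, b):
    return a / b


def first_item(seq):
    return seq[0]


print(safe_call(divide, 6, 3))    # 2.0
print(safe_call(divide, 1, 0))    # errZeroDivisionError
print(safe_call(first_item, []))  # errIndexError
```

The trade-off is that a broad except Exception can also hide real bugs in your own code, so the labels are worth skimming afterwards.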

Environment

The following environment, created with Anaconda on Windows 10 Pro (x64).

# Name                    Version                   Build  Channel
python                    3.6.8                h9f7ef89_7
rdkit                     2017.09.2.0      py36he334aed_1    rdkit
mordred                   1.0.0                    py36_0    mordred-descriptor

I have left out everything else, since I do not think it matters much.

Results

Here is the code.

from rdkit import Chem
from mordred import Calculator, descriptors
from mordred import error as err
from datetime import datetime

descs = Calculator(descriptors, ignore_3D=False).descriptors

# ------------------------------------------------------
#  functions
# ------------------------------------------------------


# get compounds
def get_mols(file):
    return Chem.SDMolSupplier(file)


# write text
def output_text(filename, mode, values):
    with open(filename + '.csv', mode) as f:
        f.write(','.join(values) + '\n')


# calculation
def calculate_desc(calc, mol):
    value = None
    try:
        value = calc(mol)
    except ZeroDivisionError as e:
        value = 'errZero'
    except IndexError as e:
        value = 'errIndex'
    except ValueError as e:
        value = 'errValue'
    except NameError as e:
        value = 'errNone'
    except err.Missing3DCoordinate as e:
        value = 'err3D'
    except err.MultipleFragments as e:
        value = 'errMulti'
    return str(value)


# print log
def printlog(value):
    # print a timestamp followed by the given value
    print(str(datetime.now()) + ',' + str(value))

# ------------------------------------------------------
#  main
# ------------------------------------------------------


# get compounds
filename = 'CHEMBL503873'
mols = get_mols(filename + '.sdf')

# get calculators
headers = list()
calcs = list()
headers.append('Name')
# take the first 1824 descriptors and use their names as CSV headers
for i in range(1824):
    calcs.append(descs[i])
    headers.append(str(calcs[i]))

# output
output_text(filename, 'w', headers)
printlog(0)
for i, mol in enumerate(mols):
    values = list()
    if mol is not None:
        values.append(mol.GetProp('_Name'))
        for calc in calcs:
            values.append(calculate_desc(calc, mol))
        output_text(filename, 'a', values)
        if i % 100 == 0:
            printlog(i)
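
To look for inf in the CSV that this script writes, something like the following could work. It is only a sketch: find_inf_cells is a hypothetical helper, not part of the original script, and it assumes the layout produced above (a Name column followed by one column per descriptor) and that no field contains a comma.

```python
import csv
import math


def find_inf_cells(csv_path):
    """Return (compound name, descriptor name) pairs whose value is +inf or -inf."""
    hits = []
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)           # 'Name' followed by descriptor names
        for row in reader:
            name = row[0]
            for header, cell in zip(headers[1:], row[1:]):
                try:
                    value = float(cell)  # labels such as 'err3D' are not numbers
                except ValueError:
                    continue
                if math.isinf(value):
                    hits.append((name, header))
    return hits
```

Calling find_inf_cells('CHEMBL503873.csv') would then list every compound and descriptor pair that came out as inf, so there is no need to eyeball the file.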

And, though it is already hard-coded into the script, here is the compound I found.

CHEMBL503873
C70H108O24

CO[C@H]1C[C@H](COC[C@@H]2[C@@H](C)O[C@H](C[C@@H]2OC)O[C@H]3CC[C@]4(C)[C@H]5C[C@@H](OC(=O)\C=C\c6ccccc6)[C@]7(C)[C@@](O)(CC[C@]7(O)[C@]5(O)CC=C4C3)C(=O)C)O[C@@H](C)[C@H]1COC[C@H]8C[C@H](OC)[C@H](COC[C@H]9C[C@@H](OC)[C@@H](O[C@H]%10O[C@@H](CO)[C@H](O)[C@@H](O)[C@@H]%10O)[C@H](C)O9)[C@@H](C)O8

And here is the check, just to be sure.

■ Playing around with Python (calculating descriptors one at a time with mordred)
https://qiita.com/siinai/items/026aad1f05c9f6d51199

(py36) D:\py>python 71-01.py
GRAVH
-------------------------------------

inf

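Yep. I definitely found an inf.

Thoughts

My first somewhat serious article in a while.
Getting hands-on really is fun, though.
But it takes forever. Even though I am using a CPU with 4 cores / 8 threads... wait, the CPU load is hovering around 30%.
Ah, well, of course.
I suppose I should properly set up multi-threading and split the work by calculator or by compound.
I would like to try that later.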
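
Splitting by compound across worker processes is one way this could look (in CPython, threads generally do not speed up pure-Python CPU-bound work because of the GIL, so processes are the usual choice). This is a rough sketch with concurrent.futures: process_chunk is a hypothetical stand-in for the per-compound descriptor calculation, and split_into_chunks and run_parallel are illustrative names, not part of the script above.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for 'compute every descriptor for these compounds'."""
    return [len(name) for name in chunk]


def run_parallel(items, n_workers, executor_cls=ProcessPoolExecutor):
    """Process each chunk in its own worker and return results in input order."""
    chunks = split_into_chunks(items, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        results = pool.map(process_chunk, chunks)  # map keeps the chunk order
        return [value for part in results for value in part]
```

On Windows, run_parallel should be called from under an if __name__ == '__main__': guard, because each worker process re-imports the script. Passing executor_cls=ThreadPoolExecutor makes it easy to try the same code with threads and compare.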

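Addendum

For now, I will paste the calculation results.

Well... this calculation alone kept the CPU working at 100% load for more than a day... I guess that is how it goes... tough work.
I want a GPU, but this makes me want more CPU too. Something like 16 cores.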

CHEMBL10786
CHEMBL263256
CHEMBL503873
CHEMBL501567
CHEMBL500702
CHEMBL501093
CHEMBL501094
CHEMBL505931
CHEMBL444732
CHEMBL444155
CHEMBL445174
CHEMBL445253
CHEMBL444510
CHEMBL501306
CHEMBL502034
CHEMBL499522
CHEMBL500203
CHEMBL498862
CHEMBL503717
CHEMBL503722
CHEMBL504025
CHEMBL504038
CHEMBL502642
CHEMBL500358
CHEMBL500619
CHEMBL500622
CHEMBL500058
CHEMBL500182
CHEMBL500184
CHEMBL504187
CHEMBL525749
CHEMBL525930
CHEMBL526006
CHEMBL526343
CHEMBL526355
CHEMBL526373
CHEMBL499978
CHEMBL499980
CHEMBL500099
CHEMBL500244
CHEMBL508221
CHEMBL500219
CHEMBL500223
CHEMBL506996
CHEMBL507128
CHEMBL525750
CHEMBL503778
CHEMBL503489
CHEMBL503495
CHEMBL507216
CHEMBL502664
CHEMBL502666
CHEMBL503666
CHEMBL503894
CHEMBL525940
CHEMBL525945
CHEMBL526501
CHEMBL500441
CHEMBL500451
CHEMBL502457
CHEMBL525219
CHEMBL525221
CHEMBL527042
CHEMBL525450
CHEMBL526129
CHEMBL526130
CHEMBL508387
CHEMBL508391
CHEMBL498956
CHEMBL503974
CHEMBL503979
CHEMBL507601
CHEMBL504097
CHEMBL524833
CHEMBL525962
CHEMBL525424
CHEMBL525951
CHEMBL526360
CHEMBL525216
CHEMBL525217
CHEMBL509192
CHEMBL501147
CHEMBL501266
CHEMBL503261
CHEMBL526689
CHEMBL526690
CHEMBL498967
CHEMBL501641
CHEMBL500002
CHEMBL500011
CHEMBL524521
CHEMBL506061
CHEMBL504078
CHEMBL508019
CHEMBL500187
CHEMBL500103
CHEMBL445002
CHEMBL525762
CHEMBL525763
CHEMBL525398
CHEMBL525399
CHEMBL526113
CHEMBL526115
CHEMBL526119
CHEMBL526121
CHEMBL526181
CHEMBL502415
CHEMBL502420
CHEMBL502978
CHEMBL505143
CHEMBL501291
CHEMBL502603
CHEMBL503695
CHEMBL504000
CHEMBL504159
CHEMBL526190
CHEMBL526301
CHEMBL501788
CHEMBL506306
CHEMBL500524
CHEMBL499537
CHEMBL501823
CHEMBL504080
CHEMBL504417
CHEMBL507534
CHEMBL502988
CHEMBL500373
CHEMBL500375
CHEMBL505276
CHEMBL500264
CHEMBL526336
CHEMBL525083
CHEMBL525086
CHEMBL525089
CHEMBL503245
CHEMBL503306
CHEMBL501970
CHEMBL503617
CHEMBL503852
CHEMBL503858
CHEMBL502077
CHEMBL501569
CHEMBL504902
CHEMBL526516
CHEMBL526681
CHEMBL526682
CHEMBL525441
CHEMBL501317
CHEMBL501323
CHEMBL502678
CHEMBL503342
CHEMBL507824
CHEMBL499931
CHEMBL499957
CHEMBL500483
CHEMBL500788
CHEMBL525771
CHEMBL503047
CHEMBL503286
CHEMBL504214
CHEMBL504401
CHEMBL525073
CHEMBL525624
CHEMBL526743
CHEMBL526874
CHEMBL526876
CHEMBL524358
CHEMBL524487
CHEMBL524488
CHEMBL527050
CHEMBL524494
CHEMBL524498
CHEMBL525068
CHEMBL525069
CHEMBL525407
CHEMBL525409
CHEMBL527084
CHEMBL591794
CHEMBL592148
CHEMBL592149
CHEMBL1208990
CHEMBL524531
CHEMBL524539
CHEMBL593680
CHEMBL589995
CHEMBL589997
CHEMBL525394
CHEMBL526678
CHEMBL526890
CHEMBL525224
CHEMBL525386
CHEMBL526131
CHEMBL596000
CHEMBL526544
CHEMBL526545
CHEMBL527072
CHEMBL527074
CHEMBL525419
CHEMBL525991
CHEMBL530121
CHEMBL526741
CHEMBL595999
CHEMBL526703
CHEMBL526853
CHEMBL526916
CHEMBL526922
CHEMBL525076
CHEMBL524356
CHEMBL524357
CHEMBL525237
CHEMBL525242
CHEMBL525402
CHEMBL530345
CHEMBL605624
CHEMBL608706
CHEMBL605628
CHEMBL595776
CHEMBL591446
CHEMBL607837
CHEMBL1097890
CHEMBL589278
CHEMBL589762
CHEMBL602303
CHEMBL605828
CHEMBL609471
CHEMBL604989
CHEMBL608415
CHEMBL1097888
CHEMBL1213233
CHEMBL611968
CHEMBL1099238
CHEMBL132931
CHEMBL135376
CHEMBL136703
CHEMBL194552
CHEMBL207341
CHEMBL214100
CHEMBL216830
