
RDKit now has 400 descriptors, so I compared them with the others.

Posted at 2021-12-01

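With RDKit version 2020.09.1, the number of descriptors it can compute went up, so I compared them with the others. What? It's already 2021? So it is. I only noticed after writing this article. This may be slightly old news, so bear with me.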

RDKit installer

RDKit installer is a package that uses miniconda to install RDKit on Google Colaboratory. While I was at it, I added various helper features too. Pretty neat, right? Praise is welcome. First, pip install it like this.

%%time 
!pip install git+https://github.com/maskot1977/rdkit_installer.git
Collecting git+https://github.com/maskot1977/rdkit_installer.git
  Cloning https://github.com/maskot1977/rdkit_installer.git to /tmp/pip-req-build-geoedzyq
  Running command git clone -q https://github.com/maskot1977/rdkit_installer.git /tmp/pip-req-build-geoedzyq
Building wheels for collected packages: rdkit-installer
  Building wheel for rdkit-installer (setup.py) ... done
  Created wheel for rdkit-installer: filename=rdkit_installer-0.2.0-py3-none-any.whl size=5686 sha256=671153f44f25467ad8d1d23028a8595af9e0ad744c806d548044d2b9573e276a
  Stored in directory: /tmp/pip-ephem-wheel-cache-jhs5sxr1/wheels/e6/72/a5/218f5f909a3a87c1ec1ccec03ac61298947fb5f1efa517eefa
Successfully built rdkit-installer
Installing collected packages: rdkit-installer
Successfully installed rdkit-installer-0.2.0
CPU times: user 41.7 ms, sys: 19.4 ms, total: 61.1 ms
Wall time: 5.46 s

Anaconda has become a paid product, but there was an article (see the references) saying that it's fine as long as you use miniconda. Don't quote me on that.

%%time 
from rdkit_installer import install
install.from_miniconda(rdkit_version="2020.09.1")
add /root/miniconda/lib/python3.7/site-packages to PYTHONPATH
python version: 3.7.12
fetching installer from https://repo.continuum.io/miniconda/Miniconda3-4.7.12-Linux-x86_64.sh
done
installing miniconda to /root/miniconda
done
installing rdkit
done
rdkit-2020.09.1 installation finished!


CPU times: user 728 ms, sys: 303 ms, total: 1.03 s
Wall time: 53.6 s
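
The descriptor count discussed in this article depends on the RDKit version, so it can be handy to compare version strings numerically rather than as text. The sketch below is only an illustration: `parse_rdkit_version` and `REQUIRED` are hypothetical names that are not part of rdkit_installer, and the import is guarded so the snippet also runs where RDKit is absent.

```python
import re

REQUIRED = (2020, 9, 1)  # the version installed above

def parse_rdkit_version(version):
    """Turn a string such as '2020.09.1' into a tuple of ints."""
    parts = re.findall(r"\d+", version)[:3]
    return tuple(int(p) for p in parts)

try:
    import rdkit  # only importable after the installation step has run
    installed = parse_rdkit_version(rdkit.__version__)
except ImportError:
    installed = None

if installed is None:
    print("RDKit is not importable in this environment")
else:
    print("RDKit", installed, "meets requirement:", installed >= REQUIRED)
```

Comparing tuples avoids the pitfalls of comparing strings such as "2020.09.1" and "2020.9.1" character by character.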

RDKit Descriptor

From RDKit 2020.09.1 onward you can compute 400 descriptors, but if you only want the 208 descriptors of older versions, here is how to do it.

I can't show the compound data, but assume that df is a pandas DataFrame whose 'Open Babel SMILES' column holds SMILES strings.

%%time 
from rdkit_installer.descriptors import calc_208descriptors

rdkit_208descriptors_df = calc_208descriptors(df['Open Babel SMILES'])
display(rdkit_208descriptors_df)
MaxEStateIndex MinEStateIndex MaxAbsEStateIndex MinAbsEStateIndex qed MolWt HeavyAtomMolWt ExactMolWt NumValenceElectrons NumRadicalElectrons MaxPartialCharge MinPartialCharge MaxAbsPartialCharge MinAbsPartialCharge FpDensityMorgan1 FpDensityMorgan2 FpDensityMorgan3 BCUT2D_MWHI BCUT2D_MWLOW BCUT2D_CHGHI BCUT2D_CHGLO BCUT2D_LOGPHI BCUT2D_LOGPLOW BCUT2D_MRHI BCUT2D_MRLOW BalabanJ BertzCT Chi0 Chi0n Chi0v Chi1 Chi1n Chi1v Chi2n Chi2v Chi3n Chi3v Chi4n Chi4v HallKierAlpha ... fr_hdrzine fr_hdrzone fr_imidazole fr_imide fr_isocyan fr_isothiocyan fr_ketone fr_ketone_Topliss fr_lactam fr_lactone fr_methoxy fr_morpholine fr_nitrile fr_nitro fr_nitro_arom fr_nitro_arom_nonortho fr_nitroso fr_oxazole fr_oxime fr_para_hydroxylation fr_phenol fr_phenol_noOrthoHbond fr_phos_acid fr_phos_ester fr_piperdine fr_piperzine fr_priamide fr_prisulfonamd fr_pyridine fr_quatN fr_sulfide fr_sulfonamd fr_sulfone fr_term_acetylene fr_tetrazole fr_thiazole fr_thiocyan fr_thiophene fr_unbrch_alkane fr_urea
0 12.203893 -0.011059 12.203893 0.011059 0.747310 236.359 212.167 236.188863 96 0 0.232872 -0.315939 0.315939 0.232872 1.352941 2.000000 2.588235 16.153539 10.090561 2.217843 -2.244821 2.189928 -2.340170 5.823874 -0.131085 2.692946 342.131373 12.999636 11.656047 11.656047 7.913591 6.243605 6.243605 5.131783 5.131783 3.017810 3.017810 2.172367 2.172367 -1.09 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 8.990151 0.150741 8.990151 0.150741 0.649670 150.221 136.109 150.104465 60 0 0.068649 -0.391660 0.391660 0.068649 1.090909 1.545455 1.909091 16.250348 10.006804 2.001032 -2.068853 2.150011 -1.932082 5.362305 0.279980 3.247713 240.570442 8.430721 7.309021 7.309021 5.147066 3.824482 3.824482 3.035385 3.035385 1.998483 1.998483 1.459315 1.459315 -0.82 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 9.393241 -0.207824 9.393241 0.008704 0.608365 168.192 156.096 168.078644 66 0 0.126112 -0.507200 0.507200 0.126112 1.083333 1.500000 1.833333 16.273518 10.058331 2.056393 -2.067148 2.237259 -1.920923 5.421688 0.263338 3.219755 256.097291 9.137828 6.910555 6.910555 5.685071 3.717870 3.717870 2.748003 2.748003 1.752819 1.752819 1.175315 1.175315 -1.06 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 9.119284 0.353611 9.119284 0.353611 0.652274 150.221 136.109 150.104465 60 0 0.115360 -0.507956 0.507956 0.115360 1.272727 1.909091 2.363636 16.254670 9.980552 2.018479 -2.082124 2.216557 -1.897398 5.349107 0.474188 3.092720 251.049732 8.430721 7.256615 7.256615 5.109061 3.905016 3.905016 3.195591 3.195591 1.875392 1.875392 1.307668 1.307668 -0.98 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 2.401620 0.311389 2.401620 0.311389 0.485103 204.357 180.165 204.187801 84 0 -0.014017 -0.084980 0.084980 0.014017 1.066667 1.800000 2.533333 14.164831 9.873635 2.112923 -2.177669 2.253025 -2.050045 5.115296 0.485569 2.790626 287.160171 11.311555 10.637828 10.637828 6.994800 5.984764 5.984764 5.438827 5.438827 3.016960 3.016960 1.800810 1.800810 -0.78 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 8.990463 -0.446481 8.990463 0.057870 0.330838 156.137 148.073 156.042259 60 0 0.163917 -0.504157 0.504157 0.163917 0.909091 1.272727 1.636364 16.319815 10.150606 2.112174 -2.046124 2.403314 -1.779929 5.573520 0.368156 3.455546 266.732315 8.593858 5.866205 5.866205 5.036581 2.971777 2.971777 2.237032 2.237032 1.469088 1.469088 0.762583 0.762583 -1.58 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 10.577593 -0.546019 10.577593 0.141481 0.571034 142.110 136.062 142.026609 54 0 0.226354 -0.501988 0.501988 0.226354 1.600000 2.300000 2.800000 16.364841 10.370775 1.977536 -1.919971 2.014312 -1.988382 5.143583 0.240965 3.172371 275.321460 7.560478 5.072731 5.072731 4.736382 2.653364 2.653364 1.753958 1.753958 1.022301 1.022301 0.526670 0.526670 -1.29 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 4.165093 1.049167 4.165093 1.049167 0.605928 146.193 136.113 146.084398 56 0 0.093137 -0.344588 0.344588 0.093137 1.181818 1.909091 2.727273 14.858487 10.106251 1.972654 -1.967400 2.143397 -1.794492 5.757300 1.343803 2.857747 351.362105 7.844935 6.679264 6.679264 5.270857 3.675181 3.675181 2.856153 2.856153 2.035718 2.035718 1.194726 1.194726 -1.31 ... 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 9.567963 0.398056 9.567963 0.398056 0.652274 150.221 136.109 150.104465 60 0 0.121428 -0.507385 0.507385 0.121428 1.272727 1.909091 2.363636 16.256039 9.998067 2.053190 -2.085892 2.271210 -1.884369 5.403987 0.460434 3.165083 251.049732 8.430721 7.256615 7.256615 5.125898 3.910999 3.910999 3.167445 3.167445 1.890976 1.890976 1.147545 1.147545 -0.98 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 10.518632 -0.384815 10.518632 0.156667 0.392541 166.180 156.100 166.074228 64 0 0.274814 -0.398706 0.398706 0.274814 1.333333 1.750000 2.083333 16.628418 10.090020 2.134719 -2.013243 2.243835 -1.965347 5.555281 -0.385558 3.379814 310.824900 9.300965 6.995761 6.995761 5.519745 3.532131 3.532131 2.760417 2.760417 1.695964 1.695964 1.277373 1.277373 -1.58 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 208 columns

CPU times: user 5.49 s, sys: 265 ms, total: 5.75 s
Wall time: 5.79 s

That's roughly how it goes. While I was at it, I made the cell show the computation time too.
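
`%%time` only works inside a notebook cell. Outside Jupyter, a similar CPU-versus-wall-clock report can be sketched with the standard library. `timed` is a hypothetical helper, not part of rdkit_installer. Time spent waiting (sleeping, or waiting on a child process) counts toward wall time but hardly toward CPU time, which is one likely reason the install cells above report far less CPU time than wall time.

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall

# Sleeping consumes wall-clock time but almost no CPU time.
_, cpu, wall = timed(time.sleep, 0.05)
print(f"CPU time: {cpu:.3f} s, Wall time: {wall:.3f} s")
```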

And here is the code that computes the 400 descriptors.

%%time 
from rdkit_installer.descriptors import calc_descriptors

rdkit_descriptors_df = calc_descriptors(df['Open Babel SMILES'])
display(rdkit_descriptors_df)
AUTOCORR2D_1 AUTOCORR2D_10 AUTOCORR2D_100 AUTOCORR2D_101 AUTOCORR2D_102 AUTOCORR2D_103 AUTOCORR2D_104 AUTOCORR2D_105 AUTOCORR2D_106 AUTOCORR2D_107 AUTOCORR2D_108 AUTOCORR2D_109 AUTOCORR2D_11 AUTOCORR2D_110 AUTOCORR2D_111 AUTOCORR2D_112 AUTOCORR2D_113 AUTOCORR2D_114 AUTOCORR2D_115 AUTOCORR2D_116 AUTOCORR2D_117 AUTOCORR2D_118 AUTOCORR2D_119 AUTOCORR2D_12 AUTOCORR2D_120 AUTOCORR2D_121 AUTOCORR2D_122 AUTOCORR2D_123 AUTOCORR2D_124 AUTOCORR2D_125 AUTOCORR2D_126 AUTOCORR2D_127 AUTOCORR2D_128 AUTOCORR2D_129 AUTOCORR2D_13 AUTOCORR2D_130 AUTOCORR2D_131 AUTOCORR2D_132 AUTOCORR2D_133 AUTOCORR2D_134 ... fr_hdrzone fr_imidazole fr_imide fr_isocyan fr_isothiocyan fr_ketone fr_ketone_Topliss fr_lactam fr_lactone fr_methoxy fr_morpholine fr_nitrile fr_nitro fr_nitro_arom fr_nitro_arom_nonortho fr_nitroso fr_oxazole fr_oxime fr_para_hydroxylation fr_phenol fr_phenol_noOrthoHbond fr_phos_acid fr_phos_ester fr_piperdine fr_piperzine fr_priamide fr_prisulfonamd fr_pyridine fr_quatN fr_sulfide fr_sulfonamd fr_sulfone fr_term_acetylene fr_tetrazole fr_thiazole fr_thiocyan fr_thiophene fr_unbrch_alkane fr_urea qed
0 2.962 3.093 -0.045 -0.266 0.351 -0.275 0.073 -0.268 0.111 -0.212 0.096 -0.304 3.205 0.319 -0.295 0.05 -0.182 0.134 -0.208 -0.051 -0.263 0.349 -0.273 3.100 0.074 -0.238 0.129 -0.215 0.034 -0.294 0.349 -0.293 0.061 -0.307 2.600 0.042 -0.187 0.230 -0.298 0.185 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.747310
1 2.512 2.755 -0.300 -0.450 -1.000 0.000 0.000 -0.000 0.027 -0.047 -0.300 -0.450 2.736 -1.000 0.000 0.00 0.000 0.027 -0.047 -0.300 -0.450 -1.000 0.000 2.385 0.000 -0.000 0.027 -0.047 -0.300 -0.450 -1.000 0.000 0.000 0.000 0.999 0.027 -0.047 -0.300 -0.450 -1.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.649670
2 2.639 2.764 0.231 -1.000 3.000 0.000 0.000 -0.000 -0.000 -0.294 0.231 -1.000 2.755 3.000 0.000 0.00 0.000 0.000 -0.294 0.231 -1.000 3.000 0.000 2.451 0.000 -0.000 -0.000 -0.294 0.231 -1.000 3.000 0.000 0.000 0.000 1.792 0.000 -0.294 0.231 -1.000 3.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.608365
3 2.512 2.736 -0.120 -0.267 -1.000 0.000 0.000 -0.000 -0.047 -0.057 -0.120 -0.267 2.669 -1.000 0.000 0.00 0.000 -0.047 -0.057 -0.120 -0.267 -1.000 0.000 2.345 0.000 -0.000 -0.047 -0.057 -0.120 -0.267 -1.000 0.000 0.000 0.000 1.312 -0.047 -0.057 -0.120 -0.267 -1.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.652274
4 2.773 3.045 1.000 1.000 1.000 0.000 0.000 1.000 1.000 1.000 1.000 1.000 2.996 1.000 0.000 0.00 1.000 1.000 1.000 1.000 1.000 1.000 0.000 2.996 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 0.000 1.000 3.091 1.000 1.000 1.000 1.000 1.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.485103
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 2.590 2.689 -0.214 1.750 0.000 0.000 0.000 -0.000 -0.214 -0.214 -0.214 1.750 2.651 0.000 0.000 0.00 0.000 -0.214 -0.214 -0.214 1.750 0.000 0.000 1.989 0.000 -0.000 -0.214 -0.214 -0.214 1.750 0.000 0.000 0.000 0.000 0.704 -0.214 -0.214 -0.214 1.750 0.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.330838
629 2.539 2.461 -0.306 -0.167 1.500 0.000 0.000 -0.167 -0.359 0.181 -0.306 -0.167 2.313 1.500 0.000 0.00 -0.167 -0.359 0.181 -0.306 -0.167 1.500 0.000 1.681 0.000 -0.167 -0.359 0.181 -0.306 -0.167 1.500 0.000 0.000 -0.167 1.079 -0.359 0.181 -0.306 -0.167 1.500 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.571034
630 2.615 2.810 -0.389 -0.389 0.000 0.000 0.000 -0.185 0.186 -0.127 -0.389 -0.389 2.641 0.000 0.000 0.00 -0.185 0.186 -0.127 -0.389 -0.389 0.000 0.000 2.083 0.000 -0.185 0.186 -0.127 -0.389 -0.389 0.000 0.000 0.000 -0.185 1.508 0.186 -0.127 -0.389 -0.389 0.000 ... 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.605928
631 2.512 2.736 -0.230 0.100 0.000 0.000 0.000 -0.000 -0.047 -0.193 -0.230 0.100 2.699 0.000 0.000 0.00 0.000 -0.047 -0.193 -0.230 0.100 0.000 0.000 2.317 0.000 -0.000 -0.047 -0.193 -0.230 0.100 0.000 0.000 0.000 0.000 1.609 -0.047 -0.193 -0.230 0.100 0.000 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.652274
632 2.670 2.771 -0.715 -0.713 1.290 0.000 0.000 0.510 0.103 -0.364 -0.725 -0.233 2.735 1.950 0.000 0.00 0.423 0.231 -0.273 -0.712 -0.725 1.249 0.000 2.546 0.000 0.495 0.158 -0.333 -0.736 -0.480 1.755 0.000 0.000 0.423 1.100 -0.019 -0.395 -0.611 0.507 1.809 ... 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.392541

633 rows × 400 columns

CPU times: user 6.03 s, sys: 332 ms, total: 6.37 s
Wall time: 6.36 s

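Huh, that takes less time than I expected. But what's with these column names? A lot of features whose meaning isn't obvious got added. Maybe they gave up on interpretability. Who knows.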
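
The column order in the output above (AUTOCORR2D_1, AUTOCORR2D_10, AUTOCORR2D_100, ...) is what plain string sorting produces. If numeric order is easier to read, a natural sort key is one option. This is a small sketch: `natural_key` is a hypothetical helper, and applying it to the DataFrame, for example as `rdkit_descriptors_df[sorted(rdkit_descriptors_df.columns, key=natural_key)]`, is an assumption about how you might use it.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so that 'x_2' sorts before 'x_10'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]

columns = ["AUTOCORR2D_1", "AUTOCORR2D_10", "AUTOCORR2D_100", "AUTOCORR2D_2", "qed"]

print(sorted(columns))                   # plain string order, as in the output above
print(sorted(columns, key=natural_key))  # numeric-aware order
```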

Various fingerprints

While we're at it, let's compute various fingerprints too.

%%time 
import pandas as pd
from rdkit_installer.fingerprints import Fingerprinter

fingerprinter = Fingerprinter()
fp_dfs = []
for name in fingerprinter.names:
    # One row per molecule, one column per fingerprint element
    fp_df = pd.DataFrame(list(fingerprinter.transform(df['Open Babel SMILES'], fp_type=name)))
    # Label the columns with the bit-info keys when the fingerprint provides them
    if name in fingerprinter.all_bit_info_keys:
        fp_df.columns = fingerprinter.all_bit_info_keys[name]

    print(name, fp_df.shape)
    fp_dfs.append(fp_df)
MACCSkeys (633, 167)
Avalon (633, 512)
Morgan2(1024bits) (633, 1024)
Morgan2F(1024bits) (633, 1024)
Morgan4(2048bits) (633, 2048)
Morgan4F(2048bits) (633, 2048)
ECFP2 (633, 693)
FCFP2 (633, 207)
ECFP4 (633, 3128)
FCFP4 (633, 1450)
ECFP6 (633, 6508)
FCFP6 (633, 4110)
CPU times: user 12.5 s, sys: 401 ms, total: 12.9 s
Wall time: 12.7 s
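
The shapes above suggest two kinds of columns: the Morgan variants have a fixed width (1024 or 2048), while the ECFP/FCFP variants have a data-dependent number of columns. A common way to get a fixed width is to fold large integer identifiers into a fixed number of positions with a modulo. The following is a conceptual sketch in plain Python, not the code used by RDKit or rdkit_installer; `fold_identifiers` is a hypothetical name and the identifiers are just sample integers.

```python
def fold_identifiers(identifiers, n_bits=1024):
    """Fold arbitrary integer identifiers into a fixed-length 0/1 list."""
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1  # different identifiers may collide on one position
    return bits

sample_ids = [2245273601, 2583443460, 3925172229]  # sample integers for illustration
folded = fold_identifiers(sample_ids, n_bits=1024)
print(len(folded), sum(folded))
```

The trade-off is that a folded vector always has the same length regardless of the dataset, but distinct identifiers can land on the same position.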

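Here is what the resulting fingerprints look like. There are a lot of them, so going through them all is a slog, but here we go. Some are "bit vectors" that contain only 1 or 0, and others are "count vectors" that can also hold other natural numbers, so pay close attention to the difference.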
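
To make the bit-versus-count distinction concrete, here is a small plain-Python sketch that builds both kinds of vector from the same list of substructure labels. The labels and the names `to_count_vector` and `to_bit_vector` are made up for illustration and do not come from RDKit or rdkit_installer.

```python
from collections import Counter

def to_count_vector(found, vocabulary):
    """Number of times each vocabulary item occurs in `found`."""
    counts = Counter(found)
    return [counts[item] for item in vocabulary]

def to_bit_vector(found, vocabulary):
    """1 if the vocabulary item occurs at least once, else 0."""
    present = set(found)
    return [1 if item in present else 0 for item in vocabulary]

vocabulary = ["aromatic_C", "hydroxyl", "methyl"]   # hypothetical substructure labels
found = ["aromatic_C", "aromatic_C", "hydroxyl"]    # substructures seen in one molecule

print(to_count_vector(found, vocabulary))  # counts can exceed 1
print(to_bit_vector(found, vocabulary))    # only 0 or 1
```

A quick way to tell the two apart in practice is that a bit vector never contains a value above 1.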

for fp_df, name in zip(fp_dfs, fingerprinter.names):
    print("#", name)
    display(fp_df)
    print()
# MACCSkeys
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 1 1 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 1 1 0 1 0
631 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 1 1 0

633 rows × 167 columns

# Avalon
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511
0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 ... 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
629 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 ... 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
630 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 ... 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 1 1
631 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0

633 rows × 512 columns

# Morgan2(1024bits)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
631 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

633 rows × 1024 columns

# Morgan2F(1024bits)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
3 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
630 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
631 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 1024 columns

# Morgan4(2048bits)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
631 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0

633 rows × 2048 columns

# Morgan4F(2048bits)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047
0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
3 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
630 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
631 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 2048 columns

# ECFP2
2245273601 2583443460 3925172229 3325788165 918112266 4008337421 2088929298 1150554131 2027655192 478283801 148058140 737505311 3327193120 2251845666 3820826669 3163127853 1533767736 1909452861 1446633534 1821515839 2681681991 694069322 74537039 2245384272 1512794193 1445828689 2082340947 994494548 469172305 3808467027 1369337945 2477914203 4033380444 3278587996 87849052 340695135 1026654305 1999904870 3581739114 616028266 ... 2667063169 932734850 1785683843 2258843522 3846829958 1739267982 3206633359 3628883864 2513428378 3034212252 603510687 3995047843 2435680164 4278515623 848127915 3044751281 1135286194 1842145205 3454347189 3368912827 2238101436 2850656190 3981762496 2555238338 2088896453 2309124039 3523495880 152948679 717723595 458993620 3026372575 1534595042 2455228386 4223817698 1632503783 2037096428 2308325357 3011598321 4086265842 612622329
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 693 columns

# FCFP2
0 1 2 3 4 5 6 728953866 613541904 16 18 19 17 3764335124 20 614174231 3205495832 729008665 32 728943649 606136866 614173218 729164322 613541923 614173220 729164320 3766531108 594405931 728943668 614173237 728943675 614173243 3205495869 614173247 3766522948 613542983 613542989 3205495887 3205495891 614174292 ... 613595558 614173606 728928169 594405804 728944045 594405807 3766523827 3766533048 594405818 614198715 594405820 613540798 613659071 3205496772 3764348868 3768571846 594910150 3766523334 729011661 594910160 728945618 1541843420 728497630 3208849886 3764344800 3764344801 728941536 613542373 613660134 3208849897 3208849903 614173680 3205493745 3764348914 3208849907 614173686 3764344823 3205496824 3205496825 614173690
0 15 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
1 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0
2 3 0 0 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 3 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0
3 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 ... 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 1 0 0 4 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 1 0 1 2 5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0
630 2 0 0 0 7 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0
631 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 ... 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 2 0 2 0 6 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0

633 rows × 207 columns

# ECFP4
3793461249 2105860100 3925172229 1036713990 3325788165 2805301255 4008337421 1040949262 3350945806 467730450 2458968089 452370461 670646308 3163127853 667918382 1470644272 1454104624 3599515696 4248715315 913334324 1787035702 1567088695 3602047030 3890528317 1446633534 1821515839 804536389 1191084102 912908359 1930379335 3415654472 557449285 3635585100 2386108492 3501482061 2909626449 1692958801 2082340947 2438226004 469172305 ... 3907878801 2271084433 1742798737 3628883864 3445374872 2513428378 2329247646 667123616 580206500 643719077 3332751270 3867221932 3343122350 2298806190 1918025650 3626221494 35790774 4253597625 838934460 3981762496 3230031810 2829279174 963149774 2810568659 1031544788 2299404245 2495102935 2296168412 1456226268 2850848738 1577254885 173416422 2037096428 2308325357 3813294065 2610397171 2476687347 379527158 3324518393 137543679
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 3128 columns

# FCFP4
0 1 2 3 4 5 6 2885967875 1907154955 1212801037 2374672399 16 17 18 19 20 3279839263 32 2640642081 3274137634 3366518817 2533883944 174874666 4023132220 2558574653 4222517311 1977434178 2123636802 3274137677 1976852558 1849409615 4197064793 4064444508 631562334 2605817951 530399336 3391340690 3764805781 1335812246 4209934490 ... 1465622370 2864267110 2317311856 3164340084 1924489080 1924489086 2592841601 3205496706 3205496708 3205496709 635608965 3205496710 3205496712 3205496711 3205496716 546602898 3538681752 672587674 2760531868 1512521629 3205496734 3937968028 3695083428 1320861611 3027918765 3923320750 3096362947 3205496772 84008901 3660226502 3021553606 3140321250 276463590 1263411178 4224516078 3020292087 3205496824 3205496825 3660226556 3660226557
0 15 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
1 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
2 3 0 0 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0
3 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 1 0 0 4 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 1 0 1 2 5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
630 2 0 0 0 7 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 2 0 2 0 6 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 1450 columns

# ECFP6
3325788165 1036713990 2805301255 2765094923 369950735 467730450 3197763603 2177204247 3471671321 3214213158 3163127853 2302443567 983433264 1787035702 2414182457 557449285 1191084102 1692958801 469172305 408551507 2082340947 1075183701 3840147540 4033380444 81461341 1026654305 2213904488 4286316648 3581739114 3048505454 327286897 3469148274 2827387003 2031583355 3794894977 3526197379 2691465350 392429703 1096220809 1360035982 ... 3009412898 3040411436 2119794482 2273017667 3909418820 2328067911 3776905034 3577675600 538902353 2389770068 585105244 953089885 4105666400 3819896681 972160873 1807941485 2209152882 3769728884 978321287 3103424394 916848527 3907878801 2271084433 1742798737 2329247646 667123616 2376695715 33685414 3343122350 2298806190 960430021 906887112 310116301 963149774 3067576278 1425670109 2850848738 1577254885 2610397171 1808531449
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
630 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 6508 columns

# FCFP6
0 1 2 3 4 5 6 3894042626 2290106376 1776730117 2081652746 1907154955 394584067 1212801037 4120633351 2374672399 16 17 18 19 20 1753890836 3575062556 1030635551 32 2640642081 3274137634 791404579 1212301348 2275049506 3366518817 3279839263 2533883944 174874666 396320811 2196316204 2669895726 1728970801 2069250099 3359506483 ... 4064985010 3426910138 84549562 1972658111 2018164673 270032835 3205496772 3096362947 3660226502 3021553606 582639560 3514515398 668172231 84008901 797736906 4262903757 1227366354 123813844 49889236 1909891030 2145148886 4237549528 2970853337 670048219 3641950172 3141042141 3522355165 3140321250 2316976101 276463590 1263411178 4224516078 1609252846 3020292087 3205496824 3205496825 168894458 3660226556 3660226557 3529482238
0 15 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
1 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
2 3 0 0 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0
3 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 1 0 0 4 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
629 1 0 1 2 5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
630 2 0 0 0 7 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
631 4 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
632 2 0 2 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

633 rows × 4110 columns

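Well, honestly, nobody can interpret fingerprint column names like these. Seriously. Still, I built this so that you can compute this many kinds of fingerprints with very little effort. Pretty great, right? Praise me, praise me.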
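As for why the ECFP/FCFP column names are huge integers and why the table widths differ so much (693, 3128, 6508 columns): one plausible reading is that each molecule yields a sparse mapping from a hashed substructure identifier to a count, and the table is the union of every identifier seen in the dataset. This is only an assumption, since the rdkit_installer internals are not shown here. The sketch below uses made-up counts, and `counts_to_table` is a hypothetical helper, not part of any library.

```python
def counts_to_table(per_molecule_counts):
    """Turn a list of {identifier: count} dicts into (columns, rows), filling gaps with 0."""
    # The column set is the union of every identifier seen in any molecule.
    columns = sorted({key for counts in per_molecule_counts for key in counts})
    rows = [[counts.get(col, 0) for col in columns] for counts in per_molecule_counts]
    return columns, rows


# Made-up counts for three molecules; the keys only mimic the hashed integers above.
toy = [
    {3793461249: 1, 2105860100: 2},
    {2105860100: 1},
    {1036713990: 3, 3793461249: 1},
]
columns, rows = counts_to_table(toy)
print(columns)  # [1036713990, 2105860100, 3793461249]
print(rows)     # [[0, 2, 1], [0, 1, 0], [3, 0, 1]]
```

Under this reading, a larger radius produces more distinct identifiers, which would explain why the tables get wider from ECFP2 to ECFP6.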

Mordred

When it comes to descriptors, Mordred is the famous one.

%%time 
# Install mordred
!pip install mordred
Collecting mordred
  Downloading mordred-1.2.0.tar.gz (128 kB)
     |████████████████████████████████| 128 kB 5.1 MB/s
Requirement already satisfied: six==1.* in /usr/local/lib/python3.7/dist-packages (from mordred) (1.15.0)
Requirement already satisfied: numpy==1.* in /usr/local/lib/python3.7/dist-packages (from mordred) (1.19.5)
Requirement already satisfied: networkx==2.* in /usr/local/lib/python3.7/dist-packages (from mordred) (2.6.3)
Building wheels for collected packages: mordred
  Building wheel for mordred (setup.py) ... done
  Created wheel for mordred: filename=mordred-1.2.0-py3-none-any.whl size=176723 sha256=5b605782d809434f86e7473beeccbed4824da51a8e5b450715f81bc511d0b24c
  Stored in directory: /root/.cache/pip/wheels/02/c0/2e/e7e3d63b431777712ebc128bc4deb9ac5cb19afc7c1ea341ec
Successfully built mordred
Installing collected packages: mordred
Successfully installed mordred-1.2.0
CPU times: user 49.9 ms, sys: 13.4 ms, total: 63.3 ms
Wall time: 4.64 s

You can compute the descriptors like this. It takes a little while, though.

%%time 
from rdkit import Chem
from mordred import Calculator, descriptors

calc = Calculator(descriptors)
mordred_descriptors_df = calc.pandas([Chem.MolFromSmiles(smile) for smile in df['Open Babel SMILES']])
display(mordred_descriptors_df)
100%|██████████| 633/633 [03:30<00:00,  3.01it/s]
ABC ABCGG nAcid nBase SpAbs_A SpMax_A SpDiam_A SpAD_A SpMAD_A LogEE_A VE1_A VE2_A VE3_A VR1_A VR2_A VR3_A nAromAtom nAromBond nAtom nHeavyAtom nSpiro nBridgehead nHetero nH nB nC nN nO nS nP nF nCl nBr nI nX ATS0dv ATS1dv ATS2dv ATS3dv ATS4dv ... JGI7 JGI8 JGI9 JGI10 JGT10 Diameter Radius TopoShapeIndex PetitjeanIndex Vabc VAdjMat MWC01 MWC02 MWC03 MWC04 MWC05 MWC06 MWC07 MWC08 MWC09 MWC10 TMWC10 SRW02 SRW03 SRW04 SRW05 SRW06 SRW07 SRW08 SRW09 SRW10 TSRW10 MW AMW WPath WPol Zagreb1 Zagreb2 mZagreb1 mZagreb2
0 12.555834 11.602561 0 1 19.854843 2.352277 4.611329 19.854843 1.167932 3.715164 3.514302 0.206724 1.787469 129.529290 7.619370 5.394535 0 0 41 17 0 0 3 24 0 14 2 1 0 0 0 0 0 0 0 178.0 170.0 209.0 234.0 159.0 ... 0.013021 0.015983 0.0 0.0 0.555706 9 5 0.800000 0.444444 261.218164 5.087463 17.0 4.394449 5.187386 6.037871 6.855409 7.711101 8.541495 9.398892 10.237278 11.095120 103.459001 3.555348 0.0 4.844187 0.000000 6.289716 2.70805 7.812378 5.379897 9.381348 56.970925 236.188863 5.760704 533 26 80.0 89.0 7.916667 3.833333
1 8.025464 7.826624 0 0 12.698196 2.326846 4.653693 12.698196 1.154381 3.295059 3.073114 0.279374 1.218002 37.827512 3.438865 3.728347 6 6 25 11 0 0 1 14 0 10 0 1 0 0 0 0 0 0 0 114.0 110.0 137.0 120.0 61.0 ... 0.000000 0.000000 0.0 0.0 0.570105 6 3 1.000000 0.500000 159.140702 4.459432 11.0 3.970292 4.779123 5.638355 6.459904 7.323171 8.147578 9.011279 9.836439 10.700071 87.866213 3.135494 0.0 4.418841 0.000000 5.916202 0.00000 7.519150 0.000000 9.169831 41.159518 150.104465 6.004179 146 15 52.0 59.0 5.194444 2.555556
2 8.623181 8.454283 0 0 14.828323 2.340374 4.680749 14.828323 1.235694 3.376509 3.161639 0.263470 1.333412 45.534706 3.794559 4.000796 6 6 24 12 0 0 3 12 0 9 0 3 0 0 0 0 0 0 0 166.0 140.0 195.0 200.0 174.0 ... 0.000000 0.000000 0.0 0.0 0.511503 6 4 0.500000 0.333333 159.425171 4.584963 12.0 4.043051 4.859812 5.713733 6.552508 7.407924 8.252446 9.106867 9.953277 10.806956 90.696575 3.218876 0.0 4.488636 0.000000 5.983936 0.00000 7.585281 0.000000 9.238442 42.515171 168.078644 7.003277 188 17 56.0 64.0 5.444444 2.888889
3 8.134854 7.770338 0 0 12.675204 2.302776 4.605551 12.675204 1.152291 3.294669 3.091013 0.281001 1.223809 37.212677 3.382971 3.711960 6 6 25 11 0 0 1 14 0 10 0 1 0 0 0 0 0 0 0 112.0 115.0 140.0 114.0 54.0 ... 0.000000 0.000000 0.0 0.0 0.580361 6 3 1.000000 0.500000 159.140702 4.459432 11.0 3.970292 4.762174 5.609472 6.421622 7.271704 8.087948 8.938663 9.755741 10.606610 87.424225 3.135494 0.0 4.418841 0.000000 5.899897 0.00000 7.474772 0.000000 9.094144 41.023148 150.104465 6.004179 150 14 52.0 58.0 5.194444 2.472222
4 11.143219 9.394819 0 0 17.535313 2.233148 4.444177 17.535313 1.169021 3.590952 3.540074 0.236005 1.669613 66.874116 4.458274 4.608277 0 0 39 15 0 0 0 24 0 15 0 0 0 0 0 0 0 0 0 104.0 107.0 111.0 115.0 121.0 ... 0.000000 0.000000 0.0 0.0 0.517928 6 5 0.200000 0.166667 247.730402 4.906891 15.0 4.262680 5.003946 5.831882 6.591674 7.418781 8.185629 9.011767 9.782844 10.608094 96.697298 3.433987 0.0 4.709530 0.000000 6.124683 0.00000 7.596392 0.000000 9.098403 45.962996 204.187801 5.235585 359 19 70.0 74.0 6.284722 3.333333
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
628 8.163363 8.029752 0 0 13.137460 2.364871 4.729742 13.137460 1.194315 3.303305 3.092493 0.281136 1.224288 35.727761 3.247978 3.671238 6 6 19 11 0 0 4 8 0 7 0 4 0 0 0 0 0 0 0 190.0 172.0 246.0 262.0 143.0 ... 0.000000 0.000000 0.0 0.0 0.560975 5 3 0.666667 0.400000 133.623428 4.459432 11.0 4.007333 4.844187 5.707110 6.561031 7.424762 8.281724 9.145375 10.003107 10.866605 88.841233 3.135494 0.0 4.465908 0.000000 6.008813 0.00000 7.647309 0.000000 9.331052 41.588577 156.042259 8.212750 140 17 54.0 63.0 5.805556 2.444444
629 7.249407 6.952976 0 0 12.392927 2.262725 4.525450 12.392927 1.239293 3.197657 2.966570 0.296657 1.087406 33.218265 3.321827 3.503100 6 6 16 10 0 0 4 6 0 6 0 4 0 0 0 0 0 0 0 192.0 156.0 209.0 225.0 111.0 ... 0.000000 0.000000 0.0 0.0 0.527680 6 3 1.000000 0.500000 116.327442 4.321928 10.0 3.850148 4.634729 5.451038 6.259581 7.077498 7.890957 8.708970 9.523617 10.341581 83.738119 3.044522 0.0 4.290459 0.000000 5.752573 0.00000 7.305860 0.000000 8.899867 39.293282 142.026609 8.876663 117 12 46.0 51.0 4.333333 2.361111
630 8.623181 8.048055 0 0 14.019917 2.385953 4.688728 14.019917 1.274538 3.352587 3.163201 0.287564 1.246895 40.693301 3.699391 3.801374 9 10 21 11 0 0 2 10 0 9 2 0 0 0 0 0 0 0 0 134.0 151.0 201.0 137.0 77.0 ... 0.000000 0.000000 0.0 0.0 0.469370 5 3 0.666667 0.400000 129.155093 4.584963 12.0 4.077537 4.919981 5.789960 6.652863 7.523481 8.390949 9.261129 10.129866 10.999747 90.745514 3.218876 0.0 4.532599 2.397895 6.042633 4.59512 7.646354 6.580639 9.298809 55.312925 146.084398 6.956400 140 14 58.0 68.0 3.694444 2.388889
631 8.094413 7.861189 0 0 13.168697 2.318335 4.636669 13.168697 1.197154 3.295042 3.080399 0.280036 1.220369 37.279058 3.389005 3.713742 6 6 25 11 0 0 1 14 0 10 0 1 0 0 0 0 0 0 0 112.0 116.0 150.0 128.0 54.0 ... 0.000000 0.000000 0.0 0.0 0.505208 5 3 0.666667 0.400000 159.140702 4.459432 11.0 3.970292 4.779123 5.624018 6.453625 7.299121 8.134761 8.978787 9.816785 10.659680 87.716192 3.135494 0.0 4.418841 0.000000 5.916202 0.00000 7.510431 0.000000 9.148784 41.129752 150.104465 6.004179 146 15 52.0 59.0 5.194444 2.500000
632 8.910910 8.632017 0 0 13.407823 2.358294 4.716589 13.407823 1.117319 3.383453 3.189168 0.265764 1.342082 42.532621 3.544385 3.932593 6 6 22 12 0 0 4 10 0 8 2 2 0 0 0 0 0 0 0 194.0 168.0 239.0 216.0 145.0 ... 0.000000 0.000000 0.0 0.0 0.605111 6 3 1.000000 0.500000 152.696021 4.584963 12.0 4.077537 4.890349 5.771441 6.597146 7.482682 8.311153 9.197559 10.026634 10.913214 91.267715 3.218876 0.0 4.532599 0.000000 6.042633 0.00000 7.657755 0.000000 9.324115 42.775979 166.074228 7.548829 184 17 58.0 66.0 6.055556 2.666667

633 rows × 1826 columns

CPU times: user 30.6 s, sys: 239 ms, total: 30.8 s
Wall time: 45.7 s

One of Mordred's selling points is the sheer number of features it offers. For what each of them means, please read the explanations on the official website.

Splitting into training and test data

Next, we split the data into training and test sets. We have computed all sorts of descriptors and fingerprints so far, and we want every one of them to use exactly the same split.

import random

test_size = 0.2
ids = list(range(len(df)))
random.shuffle(ids)
split_line = int(len(df) * (1 - test_size))
train_ids = ids[:split_line]
test_ids = ids[split_line:]

So train_ids holds the IDs of the training data, and test_ids holds the IDs of the test data.
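The split above is not seeded, so it changes on every run. If you want it to be repeatable, one option is to wrap the same logic in a function with its own seeded generator. This is a minimal sketch: `split_ids` and its `seed` parameter are hypothetical additions, not something the original notebook defines.

```python
import random


def split_ids(n_rows, test_size=0.2, seed=0):
    """Shuffle the row positions 0..n_rows-1 and cut them into train and test lists."""
    ids = list(range(n_rows))
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(ids)
    split_line = int(n_rows * (1 - test_size))
    return ids[:split_line], ids[split_line:]


train_ids, test_ids = split_ids(633, test_size=0.2, seed=0)
print(len(train_ids), len(test_ids))  # 506 127
```

With 633 rows and test_size = 0.2 this gives 506 training rows and 127 test rows, the same sizes that appear in the result tables.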

Training

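Now let's train everything in one go with Random Forest. Why Random Forest? Because hyperparameter tuning is a pain, right? All we want here is to compare the descriptor sets against each other. Random Forest is said to give decent performance even without tuning, and it can also compute the relative importance of each feature, so I figured it was the best choice. Not that I'd know for sure.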

import timeit
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

from rdkit_installer.preprocess import TableCleaner
from rdkit_installer.depict import rf_feature_importances

def make_comparison(X_names, Xs, Y):
    result = {"train score":[], "test score":[], "train shape":[], "test shape":[], "time":[]}

    for name, X in zip(X_names, Xs):

        # Drop the rows and columns that would cause errors
        cleaner = TableCleaner()
        selected_col = cleaner.clean_columns(X, Y)
        selected_row = cleaner.clean_rows(X)
        train_set = list(set(train_ids) & set(selected_row))
        test_set = list(set(test_ids) & set(selected_row))
        X_train, X_test = X.iloc[train_set, selected_col], X.iloc[test_set, selected_col]
        y_train, y_test = Y.iloc[train_set, :], Y.iloc[test_set, :]

        # Fit the model and measure the training time along the way
        regr = RandomForestRegressor()
        seconds = timeit.timeit(lambda: regr.fit(X_train, y_train.values.ravel()), number=1)

        # Record the results (performance)
        result["time"].append(seconds) # time spent on training (seconds)
        result["train shape"].append(X_train.shape)
        result["test shape"].append(X_test.shape)
        result["train score"].append(regr.score(X_train, y_train.values.ravel()))
        result["test score"].append(regr.score(X_test, y_test.values.ravel()))

        # Plot the top 10 most important features
        print(name, "Test score=", regr.score(X_test, y_test.values.ravel()))
        rf_feature_importances(regr, X)

    # Convert to a pandas DataFrame so it is easier to read later
    data = pd.DataFrame(result)
    data.index = X_names
    return data
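The set intersection inside make_comparison is what keeps the split consistent across tables, even when cleaning drops some rows from one of them. The sketch below isolates that step on toy data. `align_split` is a hypothetical helper, and it sorts the result because a bare set has no guaranteed order.

```python
def align_split(train_ids, test_ids, valid_rows):
    """Restrict a shared split to the rows that survived cleaning for one table."""
    valid = set(valid_rows)
    # Sorting makes the row order deterministic from run to run.
    train = sorted(i for i in train_ids if i in valid)
    test = sorted(i for i in test_ids if i in valid)
    return train, test


# Toy example: 10 rows, of which rows 2 and 7 were dropped for this table.
shared_train = [0, 1, 2, 3, 4, 5, 6, 8]
shared_test = [7, 9]
train, test = align_split(shared_train, shared_test, valid_rows=[0, 1, 3, 4, 5, 6, 8, 9])
print(train)  # [0, 1, 3, 4, 5, 6, 8]
print(test)   # [9]
```

A molecule never moves between the training and test side; it can only disappear from a table whose descriptors could not be computed for it.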

To build regression models with "HOMO-LUMO gap" as the target variable all in one go, it looks like this.

%%time 
result1 = make_comparison(
    ["RDKit 208 descriptors", "RDKit 400 descriptors", "Mordred 1826 descriptors"],
    [rdkit_208descriptors_df, rdkit_descriptors_df, mordred_descriptors_df], 
    df[["HOMO-LUMO gap"]]
    )
RDKit 208 descriptors Test score= 0.7393261212344209

RDKit400_25_1.png

RDKit 400 descriptors Test score= 0.7508808273760392

RDKit400_25_3.png

Mordred 1826 descriptors Test score= 0.7116458690779182

RDKit400_25_5.png

CPU times: user 47.3 s, sys: 2.2 s, total: 49.5 s
Wall time: 3min 37s

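The data is split at random, so the ranking can change from run to run. Still, going by the example above alone, the 400 RDKit descriptors seem to have performed best. That said, as I mentioned earlier, the descriptors newly added to RDKit are hard to work with if you want to interpret the features. Well, not that I'd know.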
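Since the ranking depends on the random split, one way to get a steadier comparison is to repeat the split with several seeds and look at the mean and spread of the test score. The sketch below assumes a hypothetical `evaluate(train_ids, test_ids)` callable that would fit the model and return a test score. `toy_evaluate` is only a stand-in so the example runs on its own; it does not train anything.

```python
import random
import statistics


def repeated_split_scores(n_rows, evaluate, seeds, test_size=0.2):
    """Call evaluate(train_ids, test_ids) once per seed and return (mean, stdev)."""
    scores = []
    for seed in seeds:
        ids = list(range(n_rows))
        random.Random(seed).shuffle(ids)
        split_line = int(n_rows * (1 - test_size))
        scores.append(evaluate(ids[:split_line], ids[split_line:]))
    # statistics.stdev needs at least two scores, so pass two or more seeds.
    return statistics.mean(scores), statistics.stdev(scores)


def toy_evaluate(train_ids, test_ids):
    """Stand-in scorer: a real one would fit RandomForestRegressor and return its test score."""
    return len(test_ids) / (len(train_ids) + len(test_ids))


mean_score, spread = repeated_split_scores(633, toy_evaluate, seeds=range(5))
print(mean_score, spread)
```

If two descriptor sets differ by less than the spread, the difference seen in a single run probably should not be read as a real ranking.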

Here is the overall performance summarized in a table (time is the training time in seconds).

result1
                          train score  test score  train shape   test shape    time
RDKit 208 descriptors     0.964067     0.739326    (506, 208)    (127, 208)    2.123547
RDKit 400 descriptors     0.963012     0.750881    (496, 320)    (123, 320)    3.914356
Mordred 1826 descriptors  0.957463     0.711646    (506, 1278)   (127, 1278)   19.107194
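The shape columns show fewer features than were computed (400 became 320, and 1826 became 1278), which is the effect of the column cleaning step. How TableCleaner decides is not shown here, so the sketch below is only one plausible rule and an assumption on my part: keep a column only if every cell is a finite number. `usable_columns` and the toy column names are hypothetical.

```python
import math


def is_finite_number(value):
    """True for ints and floats that are neither NaN nor infinite."""
    return isinstance(value, (int, float)) and math.isfinite(value)


def usable_columns(table):
    """Return the names of columns whose cells are all finite numbers.

    `table` maps a column name to its list of cell values (one per molecule).
    """
    return [
        name
        for name, values in table.items()
        if all(is_finite_number(v) for v in values)
    ]


# Made-up cells, purely for illustration.
toy = {
    "desc_ok": [1.5, 2.0, 3.25],
    "desc_nan": [0.1, float("nan"), 0.3],   # contains a missing value
    "desc_int": [0, 1, 0],
    "desc_err": [1.0, "error", 2.0],        # contains a non-numeric cell
}
print(usable_columns(toy))  # ['desc_ok', 'desc_int']
```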

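While I was at it, I ran the same comparison for the various fingerprints. The top 10 most important features get plotted here too, but being asked to interpret these is a tall order. Seriously.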

%%time
result2 = make_comparison(
    fingerprinter.names,
    fp_dfs, 
    df[["HOMO-LUMO gap"]]
    )
MACCSkeys Test score= 0.7428648213992282

RDKit400_29_1.png

Avalon Test score= 0.6600486101886427

RDKit400_29_3.png

Morgan2(1024bits) Test score= 0.564839806463383

RDKit400_29_5.png

Morgan2F(1024bits) Test score= 0.6145275706762224

RDKit400_29_7.png

Morgan4(2048bits) Test score= 0.4589796057137068

RDKit400_29_9.png

Morgan4F(2048bits) Test score= 0.5606490833767938

RDKit400_29_11.png

ECFP2 Test score= 0.6797667810884447

RDKit400_29_13.png

FCFP2 Test score= 0.6089529501171054

RDKit400_29_15.png

ECFP4 Test score= 0.6816544563865634

RDKit400_29_17.png

FCFP4 Test score= 0.6067634726762778

RDKit400_29_19.png

ECFP6 Test score= 0.6805538722932313

RDKit400_29_21.png

FCFP6 Test score= 0.6192763399924386

RDKit400_29_23.png

CPU times: user 41.4 s, sys: 205 ms, total: 41.6 s
Wall time: 42.8 s

Here is the performance obtained from each fingerprint, summarized in a table. Woohoo.

result2
                    train score  test score  train shape   test shape    time
MACCSkeys           0.955100     0.742865    (506, 167)    (127, 167)    0.569669
Avalon              0.954562     0.660049    (506, 512)    (127, 512)    1.899150
Morgan2(1024bits)   0.949939     0.564840    (506, 1024)   (127, 1024)   2.130418
Morgan2F(1024bits)  0.952999     0.614528    (506, 1024)   (127, 1024)   1.596574
Morgan4(2048bits)   0.944960     0.458980    (506, 2048)   (127, 2048)   4.163680
Morgan4F(2048bits)  0.947081     0.560649    (506, 2048)   (127, 2048)   3.517748
ECFP2               0.961705     0.679767    (506, 693)    (127, 693)    1.167866
FCFP2               0.948420     0.608953    (506, 207)    (127, 207)    0.576670
ECFP4               0.957817     0.681654    (506, 3128)   (127, 3128)   3.668387
FCFP4               0.956894     0.606763    (506, 1450)   (127, 1450)   1.988229
ECFP6               0.963806     0.680554    (506, 6508)   (127, 6508)   6.774919
FCFP6               0.958517     0.619276    (506, 4110)   (127, 4110)   4.656614
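Every fingerprint scores around 0.95 on the training data, so the interesting number is how far the test score falls below it. The sketch below computes that gap from the values in the result2 table above. `generalization_gaps` is a hypothetical helper.

```python
# (train score, test score) pairs copied from the result2 table above.
scores = {
    "MACCSkeys": (0.955100, 0.742865),
    "Avalon": (0.954562, 0.660049),
    "Morgan2(1024bits)": (0.949939, 0.564840),
    "Morgan2F(1024bits)": (0.952999, 0.614528),
    "Morgan4(2048bits)": (0.944960, 0.458980),
    "Morgan4F(2048bits)": (0.947081, 0.560649),
    "ECFP2": (0.961705, 0.679767),
    "FCFP2": (0.948420, 0.608953),
    "ECFP4": (0.957817, 0.681654),
    "FCFP4": (0.956894, 0.606763),
    "ECFP6": (0.963806, 0.680554),
    "FCFP6": (0.958517, 0.619276),
}


def generalization_gaps(scores):
    """Return (name, train score minus test score) pairs, largest gap first."""
    gaps = [(name, round(train - test, 6)) for name, (train, test) in scores.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


for name, gap in generalization_gaps(scores):
    print(f"{name:20s} {gap:.3f}")
```

On these numbers Morgan4(2048bits) shows the widest gap and MACCSkeys the narrowest, though a single random split is thin evidence either way.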

For reference, here are the descriptor results again.

result1
                          train score  test score  train shape   test shape    time
RDKit 208 descriptors     0.964067     0.739326    (506, 208)    (127, 208)    2.123547
RDKit 400 descriptors     0.963012     0.750881    (496, 320)    (123, 320)    3.914356
Mordred 1826 descriptors  0.957463     0.711646    (506, 1278)   (127, 1278)   19.107194

Summary of results

I could hear someone saying that bare numbers like these are hard to compare, so let's plot them.

import matplotlib.pyplot as plt

data = pd.concat([result1, result2])
for colname1 in ["time", "test score"]:
    for colname2 in ["test score", "train score"]:
        if colname1 != colname2:
            X = list(data[colname1])
            Y = list(data[colname2])
            plt.figure(figsize=(8,8))
            plt.scatter(X, Y)
            for x, y, name in zip(X, Y, data.index):
                plt.text(x, y, name, alpha=0.8, size=8)
            plt.xlabel(colname1)
            plt.ylabel(colname2)
            plt.grid()
            plt.show()

RDKit400_35_0.png

RDKit400_35_1.png

RDKit400_35_2.png

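To sum up, Mordred takes a long time to compute yet does not deliver the best performance, and the 400 RDKit descriptors look like the best choice when you weigh both computation time and performance. Beyond that, ECFP seems to have done better than both Morgan and FCFP. MACCS keys and Avalon put up a surprisingly good fight. If you are interested, feel free to dig into this in more detail.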
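One way to make the time versus score trade-off concrete is to list the feature sets that no other set beats on both axes (a Pareto front). The sketch below uses the time and test score values from the two result tables. Keep in mind that "time" there is only the Random Forest training time, not the time needed to compute the descriptors. `pareto_front` is a hypothetical helper.

```python
# (training time in seconds, test score) copied from the result1 and result2 tables.
results = {
    "RDKit 208 descriptors": (2.123547, 0.739326),
    "RDKit 400 descriptors": (3.914356, 0.750881),
    "Mordred 1826 descriptors": (19.107194, 0.711646),
    "MACCSkeys": (0.569669, 0.742865),
    "Avalon": (1.899150, 0.660049),
    "Morgan2(1024bits)": (2.130418, 0.564840),
    "Morgan2F(1024bits)": (1.596574, 0.614528),
    "Morgan4(2048bits)": (4.163680, 0.458980),
    "Morgan4F(2048bits)": (3.517748, 0.560649),
    "ECFP2": (1.167866, 0.679767),
    "FCFP2": (0.576670, 0.608953),
    "ECFP4": (3.668387, 0.681654),
    "FCFP4": (1.988229, 0.606763),
    "ECFP6": (6.774919, 0.680554),
    "FCFP6": (4.656614, 0.619276),
}


def pareto_front(results):
    """Keep the entries that no other entry beats on both time (lower) and score (higher)."""
    front = []
    for name, (time_s, score) in results.items():
        dominated = any(
            other_time <= time_s
            and other_score >= score
            and (other_time < time_s or other_score > score)
            for other, (other_time, other_score) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(results))  # ['RDKit 400 descriptors', 'MACCSkeys']
```

On this single split only the 400 RDKit descriptors (best test score) and MACCS keys (fastest to train, second-best score) survive, which matches the impression from the scatter plots.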

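This was only a single case study, and it is entirely possible that a different dataset split or a different target variable would give different results. Not that I'd know.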

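Also, embarrassing as it is to admit, there may be mistakes in the fingerprint definitions or in how they are computed. My sincere apologies. The core code lives at https://github.com/maskot1977/rdkit_installer, so if you notice anything, I would be really grateful if you pointed it out.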

Take Home Message

RDKit still has plenty more new features, kitto ("surely").

It is RDKit, after all.
