Automatically extracting crystallographic data from crystal structure CIF files
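In science, crystal structure data is stored in a standard file format called CIF (Crystallographic Information File). A CIF file contains all sorts of information, and I want to pull out only the key crystallographic data from it. This post is about automating that.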
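Python has several modules that can handle CIF data; the well-known ones are probably RDKit, pymatgen, and Open Babel. My impression, though, is that each has its quirks and none quite does what I want. Since all I need here is to retrieve data, I skip the modules and read the CIF text directly. CIF follows a fixed format, for example "the length of the $a$ axis is written after "_cell_length_a"", so the script only has to look for those tags and read the value next to them. Older CIF files pulled from databases sometimes deviate slightly from the format, so some values may be read incorrectly or come out missing, but the script should work in most cases. The output is a CSV file.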
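As a minimal sketch of that idea (not the script itself), the helper below returns whatever follows a given tag on a line. The function name value_after_tag is hypothetical, and the sample lines and their values are made up for illustration.

```python
def value_after_tag(line, tag):
    """Return the text that follows `tag` on a CIF line, or None if the line
    does not start with that tag. (Hypothetical helper for illustration.)"""
    if not line.startswith(tag):
        return None
    rest = line[len(tag):]
    # Reject longer tags that merely begin with `tag`
    if rest and not rest[0].isspace():
        return None
    # Drop surrounding whitespace and quotes
    return rest.strip().strip("'\"")


# Made-up sample lines in the usual "tag  value" layout
sample = [
    "_cell_length_a                    10.1234(5)\n",
    "_cell_length_b                    12.3456(6)\n",
    "_space_group_name_H-M_alt         'P 21/c'\n",
]

for line in sample:
    for tag in ("_cell_length_a", "_space_group_name_H-M_alt"):
        value = value_after_tag(line, tag)
        if value is not None:
            print(tag, "->", value)
```

Unlike the script below, this sketch keeps the space inside quoted values such as the space group symbol; which behaviour is preferable depends on how the CSV will be used.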
Script
To run it, replace "filename.cif" (assigned to the variable file near the top of the script) with the path to the CIF file you want to read.
import csv
import numpy as np
file = "filename.cif"
# One row per item: the label comes first, extracted values are appended later
crystaloglaph_deta = [["name"],
                      ["Chemical formula"],
                      ["Formula mass"],
                      ["Crystal system"],
                      ["a"],
                      ["b"],
                      ["c"],
                      ["alpha"],
                      ["beta"],
                      ["gamma"],
                      ["Unit cell volume"],
                      ["Temperature"],
                      ["Space group"],
                      ["Z"],
                      ["Rint"],
                      ["Final R1 values (I>2sigma(I))"],
                      ["Final wR(F2) values (I>2sigma(I))"],
                      ["Final R1 values (all data)"],
                      ["Final wR(F2) values (all data)"],
                      ["Goodness of fit on F2"],
                      ["No. of reflections measured"],
                      ["No. of independent reflections"],
                      ]
# Use the file name without the ".cif" extension as the sample name
path = file.split('/')
name_head = path[-1][:-4]

with open(file, "r") as f:
    lines = f.readlines()

# num[k] counts how many values were found for row k
num = np.zeros(len(crystaloglaph_deta))
crystaloglaph_deta[0].append(name_head)
num[0] = num[0] + 1
for j in range(len(lines)):
    # Chemical formula: the value may follow the tag on the same line
    # or be written on the next line
    if "_chemical_formula_sum" in lines[j]:
        line = lines[j].replace('_chemical_formula_sum', '')
        if line.strip() == '' and j + 1 < len(lines):
            line = lines[j+1]
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        line = line.replace("'", '')
        crystaloglaph_deta[1].append(line)
        num[1] = num[1] + 1
    if "_chemical_formula_weight" in lines[j]:
        line = lines[j].replace('_chemical_formula_weight', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[2].append(line)
        num[2] = num[2] + 1
    if "_space_group_crystal_system" in lines[j]:
        line = lines[j].replace('_space_group_crystal_system', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[3].append(line)
        num[3] = num[3] + 1
    if "_cell_length_a" in lines[j]:
        line = lines[j].replace('_cell_length_a', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[4].append(line)
        num[4] = num[4] + 1
    if "_cell_length_b" in lines[j]:
        line = lines[j].replace('_cell_length_b', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[5].append(line)
        num[5] = num[5] + 1
    if "_cell_length_c" in lines[j]:
        line = lines[j].replace('_cell_length_c', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[6].append(line)
        num[6] = num[6] + 1
    if "_cell_angle_alpha" in lines[j]:
        line = lines[j].replace('_cell_angle_alpha', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[7].append(line)
        num[7] = num[7] + 1
    if "_cell_angle_beta" in lines[j]:
        line = lines[j].replace('_cell_angle_beta', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[8].append(line)
        num[8] = num[8] + 1
    if "_cell_angle_gamma" in lines[j]:
        line = lines[j].replace('_cell_angle_gamma', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[9].append(line)
        num[9] = num[9] + 1
    if "_cell_volume" in lines[j]:
        line = lines[j].replace('_cell_volume', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[10].append(line)
        num[10] = num[10] + 1
    if "_diffrn_ambient_temperature" in lines[j]:
        line = lines[j].replace('_diffrn_ambient_temperature', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[11].append(line)
        num[11] = num[11] + 1
    if "_space_group_name_H-M_alt" in lines[j]:
        line = lines[j].replace('_space_group_name_H-M_alt', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        line = line.replace("'", '')
        crystaloglaph_deta[12].append(line)
        num[12] = num[12] + 1
    if "_cell_formula_units_Z " in lines[j]:
        line = lines[j].replace('_cell_formula_units_Z ', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[13].append(line)
        num[13] = num[13] + 1
    if "_diffrn_reflns_av_R_equivalents" in lines[j]:
        line = lines[j].replace('_diffrn_reflns_av_R_equivalents', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[14].append(line)
        num[14] = num[14] + 1
    if "_refine_ls_R_factor_gt" in lines[j]:
        line = lines[j].replace('_refine_ls_R_factor_gt', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[15].append(line)
        num[15] = num[15] + 1
    if "_refine_ls_wR_factor_gt" in lines[j]:
        line = lines[j].replace('_refine_ls_wR_factor_gt', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[16].append(line)
        num[16] = num[16] + 1
    if "_refine_ls_R_factor_all" in lines[j]:
        line = lines[j].replace('_refine_ls_R_factor_all', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[17].append(line)
        num[17] = num[17] + 1
    if "_refine_ls_wR_factor_ref" in lines[j]:
        line = lines[j].replace('_refine_ls_wR_factor_ref', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[18].append(line)
        num[18] = num[18] + 1
    if "_refine_ls_goodness_of_fit_ref" in lines[j]:
        line = lines[j].replace('_refine_ls_goodness_of_fit_ref', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[19].append(line)
        num[19] = num[19] + 1
    if "_diffrn_reflns_number" in lines[j]:
        line = lines[j].replace('_diffrn_reflns_number', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[20].append(line)
        num[20] = num[20] + 1
    if "_refine_ls_number_reflns" in lines[j]:
        line = lines[j].replace('_refine_ls_number_reflns', '')
        line = line.replace(' ', '')
        line = line.replace('\n', '')
        crystaloglaph_deta[21].append(line)
        num[21] = num[21] + 1
# Put "-" in every row for which no value was found
for k in range(len(crystaloglaph_deta)):
    if num[k] == 0:
        crystaloglaph_deta[k].append("-")
# Write one CSV row per item: the label followed by the value(s) found
with open(name_head + '.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(crystaloglaph_deta)
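The blocks in the script all follow the same pattern, so the same idea can also be sketched with a tag-to-label table. This is only an illustrative variant under stated assumptions, not a replacement for the script: the names FIELDS and extract_fields are hypothetical, only a subset of the items is listed, the sample lines are made up, tags are matched exactly instead of by substring, and only the first value found for each tag is kept.

```python
import csv
import io

# CIF tag -> output label (a subset of the items handled by the script)
FIELDS = {
    "_chemical_formula_weight": "Formula mass",
    "_cell_length_a": "a",
    "_cell_length_b": "b",
    "_cell_length_c": "c",
    "_cell_volume": "Unit cell volume",
    "_space_group_name_H-M_alt": "Space group",
}


def extract_fields(lines, fields=FIELDS):
    """Return {label: value}; items whose tag never appears keep "-"."""
    result = {label: "-" for label in fields.values()}
    for line in lines:
        parts = line.split(None, 1)  # tag, then the rest of the line
        if len(parts) < 2 or parts[0] not in fields:
            continue
        label = fields[parts[0]]
        if result[label] == "-":
            # Remove all whitespace and quotes, as the script does
            result[label] = "".join(parts[1].split()).strip("'")
    return result


# Made-up sample lines; "_cell_volume" is deliberately left out
sample_lines = [
    "data_example\n",
    "_chemical_formula_weight          180.16\n",
    "_cell_length_a                    10.1234(5)\n",
    "_cell_length_b                    12.3456(6)\n",
    "_cell_length_c                    8.7654(4)\n",
    "_space_group_name_H-M_alt         'P 21/c'\n",
]

rows = extract_fields(sample_lines)

# Write "label,value" rows to an in-memory buffer instead of a file
buffer = io.StringIO()
csv.writer(buffer).writerows([label, value] for label, value in rows.items())
print(buffer.getvalue())
```

Adding another item then only needs one more entry in the table, at the cost of not handling values written on the line after the tag.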
It is a short post, but that is all.