
Log: building a crystal structure in which each polyatomic ion is replaced by a single point

Posted at 2023-11-10

Goal

Starting from an existing crystal structure, build a new crystal structure in which each polyatomic ion is replaced by a single point.

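Preparation

Prepare the following two files.

POSCAR: a text file describing the crystal structure in the POSCAR file format
POSCAR.nnlist: a text file that lists, for each atom in the POSCAR file, its neighboring atoms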

Program

mk_clusterd_poscar.py
Input: POSCAR, POSCAR.nnlist
Output: POSCAR

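Workflow

Following the steps below, each group of closely spaced atoms was replaced by a single atom to build a new crystal structure.

0. Load POSCAR and POSCAR.nnlist as pandas DataFrames.
1. For each atom in the crystal structure, pick up the atoms within 2 Å.
   Note: this was already done in the preparation step, as POSCAR.nnlist.
2. List the atoms that have the largest number of picked-up atoms.
3. For each atom listed in step 2, take the arithmetic mean of the xyz coordinates of that atom and the atoms picked up with it.
4. Treat the element of the listed atom, placed at the coordinates from step 3, as one new point.
5. List the atoms that were not selected in steps 2 to 4.
6. Combine the results of steps 4 and 5 to build the new structure.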
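The steps above can be sketched on a toy neighbor list. This is only an illustration: the data and the variable names (toy_nnlist, centers, offsets, leftover) are made up for this example and are not part of mk_clusterd_poscar.py.

```python
import pandas as pd

# Toy neighbor list (hypothetical data): atom 1 has neighbors 2 and 3 within the
# cutoff, atom 4 has none. Every atom also lists itself with zero displacement,
# as in POSCAR.nnlist.
toy_nnlist = pd.DataFrame({
    'central atom':     [1, 1, 1, 2, 2, 3, 3, 4],
    'neighboring atom': [1, 2, 3, 1, 2, 1, 3, 4],
    'X':                [0.0, 1.2, -1.2, -1.2, 0.0, 1.2, 0.0, 0.0],
    'Y':                [0.0, 0.6, 0.6, -0.6, 0.0, -0.6, 0.0, 0.0],
    'Z':                [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

# Step 2: central atoms with the largest number of picked-up atoms
counts = toy_nnlist.groupby('central atom')['neighboring atom'].count()
centers = counts[counts == counts.max()].index.tolist()

# Step 3: mean displacement of each cluster, measured from its central atom
in_cluster = toy_nnlist['central atom'].isin(centers)
offsets = toy_nnlist[in_cluster].groupby('central atom')[['X', 'Y', 'Z']].mean()

# Step 5: atoms that belong to no cluster
members = set(toy_nnlist.loc[in_cluster, 'neighboring atom'])
leftover = sorted(set(toy_nnlist['central atom']) - members)

print(centers)   # [1]
print(leftover)  # [4]
print(offsets)
```

The new structure then consists of one point per entry in centers (shifted by its row in offsets) plus the atoms in leftover.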

Detailed walkthrough

As an example, BaCO3 (1000033.cif) obtained from the COD (Crystallography Open Database) is used.
Source: https://www.crystallography.net/cod/1000033.cif

0. Load POSCAR and POSCAR.nnlist as pandas DataFrames.

# Converting a POSCAR file to a DataFrame
from my_package.textfile2df import poscar2df_coords 

df_coords = poscar2df_coords(filename='./POSCAR')
df_coords

	central atom	x	y	z	Species
0	1	0.250000000000000	0.757000000000000	0.919000000000000	C
1	2	0.250000000000000	0.743000000000000	0.419000000000000	C
2	3	0.750000000000000	0.243000000000000	0.081000000000000	C
3	4	0.750000000000000	0.257000000000000	0.581000000000000	C
4	5	0.250000000000000	0.901100000000000	0.912200000000000	O
5	6	0.250000000000000	0.598900000000000	0.412200000000000	O
6	7	0.750000000000000	0.098900000000000	0.087800000000000	O
7	8	0.750000000000000	0.401100000000000	0.587800000000000	O
8	9	0.459500000000000	0.683900000000000	0.921000000000000	O
9	10	0.040500000000000	0.816100000000000	0.421000000000000	O
10	11	0.540500000000000	0.316100000000000	0.079000000000000	O
11	12	0.040500000000000	0.683900000000000	0.921000000000000	O
12	13	0.459500000000000	0.816100000000000	0.421000000000000	O
13	14	0.959500000000000	0.183900000000000	0.579000000000000	O
14	15	0.959500000000000	0.316100000000000	0.079000000000000	O
15	16	0.540500000000000	0.183900000000000	0.579000000000000	O
16	17	0.250000000000000	0.416310000000000	0.754900000000000	Ba
17	18	0.250000000000000	0.083690000000000	0.254900000000000	Ba
18	19	0.750000000000000	0.583690000000000	0.245100000000000	Ba
19	20	0.750000000000000	0.916310000000000	0.745100000000000	Ba
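my_package.textfile2df is not shown in this post. As a rough idea of what a loader like poscar2df_coords might do, here is a minimal sketch. It assumes a VASP5-style POSCAR (species line, count line, then a Direct/Cartesian line) without a "Selective dynamics" line; the function name poscar_text_to_df and the sample cell are hypothetical. Coordinates are kept as strings, since the DataFrame above is converted to numbers only later.

```python
import pandas as pd

def poscar_text_to_df(text):
    """Parse VASP5-style POSCAR text into the column layout shown above (hypothetical helper)."""
    lines = text.strip().splitlines()
    species = lines[5].split()                   # e.g. ['C', 'O']
    counts = [int(n) for n in lines[6].split()]  # e.g. [1, 2]
    n_atoms = sum(counts)
    coord_lines = lines[8:8 + n_atoms]           # lines[7] holds 'Direct' / 'Cartesian'
    # Repeat each species label as many times as its count
    labels = [sp for sp, n in zip(species, counts) for _ in range(n)]
    rows = []
    for i, (line, sp) in enumerate(zip(coord_lines, labels), start=1):
        x, y, z = line.split()[:3]
        rows.append({'central atom': i, 'x': x, 'y': y, 'z': z, 'Species': sp})
    return pd.DataFrame(rows)

# Hypothetical sample: the lattice vectors are placeholders, not the BaCO3 cell
sample = """toy cell
1.0
5.0 0.0 0.0
0.0 8.0 0.0
0.0 0.0 6.0
C O
1 2
Direct
0.25 0.757 0.919
0.25 0.9011 0.9122
0.4595 0.6839 0.921
"""
df_sample = poscar_text_to_df(sample)
print(df_sample)
```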

# converting POSCAR.nnlist to df_nnlist
from my_package.textfile2df import nnlist2df

df_nnlist = nnlist2df(POSCAR_nnlist='POSCAR.nnlist')
df_nnlist

	central atom	neighboring atom	distance	X	Y	Z	central species	neighboring species
0	1	1	0.000000	0.000000	0.000000	0.000000	C	C
1	1	5	1.282630	0.000000	1.281885	-0.043713	C	O
2	1	9	1.289100	1.112990	-0.650283	0.012857	C	O
3	1	12	1.289100	-1.112990	-0.650283	0.012857	C	O
4	2	2	0.000000	0.000000	0.000000	0.000000	C	C
5	2	6	1.282630	0.000000	-1.281885	-0.043713	C	O
6	2	10	1.289100	-1.112990	0.650283	0.012857	C	O
7	2	13	1.289100	1.112990	0.650283	0.012857	C	O
8	3	3	0.000000	0.000000	0.000000	0.000000	C	C
9	3	7	1.282630	0.000000	-1.281885	0.043713	C	O
10	3	11	1.289100	-1.112990	0.650283	-0.012857	C	O
11	3	15	1.289100	1.112990	0.650283	-0.012857	C	O
12	4	4	0.000000	0.000000	0.000000	0.000000	C	C
13	4	8	1.282630	0.000000	1.281885	0.043713	C	O
14	4	14	1.289100	1.112990	-0.650283	-0.012857	C	O
15	4	16	1.289100	-1.112990	-0.650283	-0.012857	C	O
16	5	1	1.282630	0.000000	-1.281885	0.043713	O	C
17	5	5	0.000000	0.000000	0.000000	0.000000	O	O
18	6	2	1.282630	0.000000	1.281885	0.043713	O	C
19	6	6	0.000000	0.000000	0.000000	0.000000	O	O
20	7	3	1.282630	0.000000	1.281885	-0.043713	O	C
21	7	7	0.000000	0.000000	0.000000	0.000000	O	O
22	8	4	1.282630	0.000000	-1.281885	-0.043713	O	C
23	8	8	0.000000	0.000000	0.000000	0.000000	O	O
24	9	1	1.289100	-1.112990	0.650283	-0.012857	O	C
25	9	9	0.000000	0.000000	0.000000	0.000000	O	O
26	10	2	1.289100	1.112990	-0.650283	-0.012857	O	C
27	10	10	0.000000	0.000000	0.000000	0.000000	O	O
28	11	3	1.289100	1.112990	-0.650283	0.012857	O	C
29	11	11	0.000000	0.000000	0.000000	0.000000	O	O
30	12	1	1.289100	1.112990	0.650283	-0.012857	O	C
31	12	12	0.000000	0.000000	0.000000	0.000000	O	O
32	13	2	1.289100	-1.112990	-0.650283	-0.012857	O	C
33	13	13	0.000000	0.000000	0.000000	0.000000	O	O
34	14	4	1.289100	-1.112990	0.650283	0.012857	O	C
35	14	14	0.000000	0.000000	0.000000	0.000000	O	O
36	15	3	1.289100	-1.112990	-0.650283	0.012857	O	C
37	15	15	0.000000	0.000000	0.000000	0.000000	O	O
38	16	4	1.289100	1.112990	0.650283	0.012857	O	C
39	16	16	0.000000	0.000000	0.000000	0.000000	O	O
40	17	17	0.000000	0.000000	0.000000	0.000000	Ba	Ba
41	18	18	0.000000	0.000000	0.000000	0.000000	Ba	Ba
42	19	19	0.000000	0.000000	0.000000	0.000000	Ba	Ba
43	20	20	0.000000	0.000000	0.000000	0.000000	Ba	Ba
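The tool that produced POSCAR.nnlist is not shown in this post. The sketch below is one way such a table could be rebuilt from fractional coordinates and lattice vectors, using the minimum-image convention. The function name build_nnlist and the toy cell are assumptions, and the approach is only valid when the cell is large compared with the cutoff (otherwise several periodic images of the same atom can fall within range).

```python
import numpy as np
import pandas as pd

def build_nnlist(frac, species, lattice, cutoff=2.0):
    """Rebuild an nnlist-like table (hypothetical helper, minimum-image convention)."""
    frac = np.asarray(frac, dtype=float)
    lattice = np.asarray(lattice, dtype=float)  # rows are the a, b, c vectors in Å
    rows = []
    for i in range(len(frac)):
        d = frac - frac[i]            # fractional displacement to every atom
        d -= np.round(d)              # wrap into [-0.5, 0.5] (nearest periodic image)
        cart = d @ lattice            # Cartesian displacement in Å
        dist = np.linalg.norm(cart, axis=1)
        for j in np.where(dist <= cutoff)[0]:
            rows.append({'central atom': i + 1, 'neighboring atom': int(j) + 1,
                         'distance': dist[j], 'X': cart[j, 0], 'Y': cart[j, 1],
                         'Z': cart[j, 2], 'central species': species[i],
                         'neighboring species': species[j]})
    return pd.DataFrame(rows)

# Hypothetical cubic cell (5 Å) with a C-O pair 1.25 Å apart and a distant Ba
toy_lattice = np.eye(3) * 5.0
toy_frac = [[0.5, 0.5, 0.5], [0.5, 0.75, 0.5], [0.0, 0.0, 0.0]]
toy_species = ['C', 'O', 'Ba']
nn = build_nnlist(toy_frac, toy_species, toy_lattice)
print(nn[['central atom', 'neighboring atom', 'distance']])
```

As in the table above, each atom appears as its own neighbor at distance 0, so an isolated atom still contributes one row.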

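Using df_nnlist.groupby('central atom').count()['neighboring atom'], take the central atoms with the largest number of entries as the cluster centers.

→ Remove duplicates among the clustered central atoms

→ Obtain the new list of central atoms

print(df_nnlist.groupby('central atom').count()['neighboring atom'])
# Turned into a boolean filter, this is elem_max_num_filter_list (see get_elem_max_filter)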

central atom
1     4
2     4
3     4
4     4
5     2
6     2
7     2
8     2
9     2
10    2
11    2
12    2
13    2
14    2
15    2
16    2
17    1
18    1
19    1
20    1
Name: neighboring atom, dtype: int64

The function below returns that selection as a boolean filter: True for every central atom whose number of entries equals the maximum.

import pandas as pd
def get_elem_max_filter(df_nnlist=df_nnlist):
    """
    Return a boolean filter that selects the cluster centers in df_coords.

    Input: df_nnlist
 -> Output: pd.Series of bool with a 0..N-1 index, True where the number of
            df_nnlist rows for that 'central atom' equals the maximum count
    """
    neighbor_counts = df_nnlist.groupby('central atom').count()['neighboring atom']
    elem_max_num = neighbor_counts.max()
    elem_max_num_filter = neighbor_counts == elem_max_num
    # Rebuild the Series with a default 0..N-1 index so that it lines up with df_coords
    elem_max_num_filter_list = pd.Series(elem_max_num_filter.to_list())
    return elem_max_num_filter_list

get_elem_max_filter()

0      True
1      True
2      True
3      True
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
dtype: bool

Note: df_coords[elem_max_num_filter_list] gives the absolute coordinates of the cluster centers.
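The filter above carries a fresh 0..N-1 index, so it selects the right rows only when df_coords is ordered by atom number, as it is here. A sketch of a variant that matches on the 'central atom' value instead is shown below; center_mask and the toy frames are hypothetical names and data used only for illustration.

```python
import pandas as pd

def center_mask(df_coords, df_nnlist):
    """Boolean mask over df_coords rows, matched by 'central atom' value (hypothetical helper)."""
    counts = df_nnlist.groupby('central atom')['neighboring atom'].count()
    centers = counts[counts == counts.max()].index
    return df_coords['central atom'].isin(centers)

# Toy data (hypothetical): df_coords is deliberately not sorted by atom number
coords = pd.DataFrame({'central atom': [4, 1, 2, 3],
                       'Species': ['Ba', 'C', 'O', 'O']})
nnlist = pd.DataFrame({'central atom':     [1, 1, 1, 2, 2, 3, 3, 4],
                       'neighboring atom': [1, 2, 3, 1, 2, 1, 3, 4]})

mask = center_mask(coords, nnlist)
print(coords[mask])
```

Because the mask shares the index of df_coords, it can be combined with other masks built the same way without relying on row order.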

# Return the 'neighboring atom' values of the rows whose 'central atom' equals the input
def get_neighboring_atoms_list(central_atom_id, df_nnlist=df_nnlist):
    """
    Return every atom in the cluster around one cluster center.

    Input: a value of the 'central atom' column of df_nnlist
 -> Output: list of all 'neighboring atom' values on the rows where
            'central atom' equals the input (the center itself is included)
    """
    # Select the rows for this central atom and collect their neighboring atoms
    neighboring_atoms_list = df_nnlist[df_nnlist['central atom'] == central_atom_id]['neighboring atom'].tolist()
    return neighboring_atoms_list

# Test the function
get_neighboring_atoms_list(3)

[3, 7, 11, 15]
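The lookup above handles one center at a time. As a sketch of an alternative, a single groupby can build the members of every central atom at once; the toy frame and the name members are hypothetical.

```python
import pandas as pd

# Toy excerpt (hypothetical) with the same column names as df_nnlist
toy = pd.DataFrame({'central atom':     [3, 3, 3, 3, 7, 7],
                    'neighboring atom': [3, 7, 11, 15, 3, 7]})

# Map each central atom to the list of atoms picked up around it
members = toy.groupby('central atom')['neighboring atom'].apply(list).to_dict()
print(members[3])  # [3, 7, 11, 15]
```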

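Get the list of central atom values (ID-like numbers) of the cluster centers

A function that returns the central atom values (≈ IDs) of the remaining atoms, i.e. those not assigned to any cluster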

def get_all_non_clusterd_atom(df_nnlist=df_nnlist, df_coords=df_coords):
    """
    dependency: get_elem_max_filter()

    Return the list of central atoms that do not belong to any cluster.

    Input: DataFrames
 -> Output: a list

    param1: df_nnlist=df_nnlist
    param2: df_coords=df_coords
    """

    # Return the 'neighboring atom' values of the rows whose 'central atom' equals the input
    def get_neighboring_atoms_list(central_atom_id, df_nnlist=df_nnlist):
        """
        Return every atom in the cluster around one cluster center.

        Input: a value of the 'central atom' column of df_nnlist
     -> Output: list of all 'neighboring atom' values on the matching rows
        """
        # Select the rows for this central atom and collect their neighboring atoms
        neighboring_atoms_list = df_nnlist[df_nnlist['central atom'] == central_atom_id]['neighboring atom'].tolist()
        return neighboring_atoms_list


    elem_max_num_filter = get_elem_max_filter(df_nnlist=df_nnlist)
    # IDs ('central atom' values) of the atoms at the center of each cluster (atomic group)
    cluster_central_atom_list = df_coords[elem_max_num_filter]['central atom'].tolist()
    # IDs of all atoms belonging to each cluster, one sub-list per cluster
    cluster_all_atom_list_duplicated = [get_neighboring_atoms_list(elem) for elem in cluster_central_atom_list]
    # Flatten the nested list into a flat list
    cluster_all_atom_list_duplicated_flatten = [item for sublist in cluster_all_atom_list_duplicated for item in sublist]
    # Remove duplicates from the flat list
    cluster_all_atom_set = set(cluster_all_atom_list_duplicated_flatten)

    # IDs of all atoms in the original POSCAR
    all_central_atom_set = set(df_coords['central atom'].tolist())

    # IDs of the atoms that belong to no cluster (sorted for a reproducible order)
    all_non_clusterd_atom_list = sorted(all_central_atom_set.difference(cluster_all_atom_set))

    return all_non_clusterd_atom_list

# all_non_clusterd_atom_list = get_all_non_clusterd_atom(df_nnlist=df_nnlist, df_coords=df_coords)
# print(all_non_clusterd_atom_list)

A function that converts all_non_clusterd_atom_list into a filter

# all_non_clusterd_atom_filter = df_coords['central atom'].apply(lambda row: row in all_non_clusterd_atom_list)
# This expression is wrapped into the function get_all_non_clusterd_atom_filter

def get_all_non_clusterd_atom_filter(df_nnlist=df_nnlist, df_coords=df_coords):
    """
    Convert the list of non-clustered atoms into a boolean filter over df_coords.
    """
    all_non_clusterd_atom_list = get_all_non_clusterd_atom(df_nnlist=df_nnlist, df_coords=df_coords)
    all_non_clusterd_atom_filter = df_coords['central atom'].apply(lambda row: row in all_non_clusterd_atom_list)
    return all_non_clusterd_atom_filter

get_all_non_clusterd_atom_filter()

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16     True
17     True
18     True
19     True
Name: central atom, dtype: bool

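Index df_coords[] with the union of two filters: the one for the cluster centers and the one for the remaining, non-clustered atoms.

This completes a central atom filter that covers every point of the new structure, with nothing missing and nothing counted twice.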

central_atom_filter_fix = get_elem_max_filter() | get_all_non_clusterd_atom_filter()
print(central_atom_filter_fix)

0      True
1      True
2      True
3      True
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16     True
17     True
18     True
19     True
dtype: bool
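The set() call in get_all_non_clusterd_atom silently merges atoms that were picked up by more than one center. For BaCO3 the four CO3 groups do not overlap, but for other structures it may be worth checking. A minimal sketch of such a check follows; the helper name shared_atoms and the toy data are hypothetical.

```python
import pandas as pd
from collections import Counter

def shared_atoms(df_nnlist, centers):
    """Return atoms claimed by more than one cluster center (hypothetical helper)."""
    rows = df_nnlist[df_nnlist['central atom'].isin(centers)]
    claimed = Counter(rows['neighboring atom'].tolist())
    return sorted(atom for atom, n in claimed.items() if n > 1)

# Toy data (hypothetical): atom 9 lies within the cutoff of both centers 1 and 2
toy_overlap = pd.DataFrame({'central atom':     [1, 1, 1, 2, 2, 2],
                            'neighboring atom': [1, 5, 9, 2, 6, 9]})
toy_disjoint = pd.DataFrame({'central atom':     [1, 1, 2, 2],
                             'neighboring atom': [1, 5, 2, 6]})

print(shared_atoms(toy_overlap, [1, 2]))   # [9]
print(shared_atoms(toy_disjoint, [1, 2]))  # []
```

An empty result means each clustered atom belongs to exactly one center, so replacing each cluster by one point loses no atom and duplicates none.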

Get the new absolute coordinates after clustering

df_coords_abs_center = df_coords[central_atom_filter_fix]
df_coords_abs_center

	central atom	x	y	z	Species
0	1	0.250000000000000	0.757000000000000	0.919000000000000	C
1	2	0.250000000000000	0.743000000000000	0.419000000000000	C
2	3	0.750000000000000	0.243000000000000	0.081000000000000	C
3	4	0.750000000000000	0.257000000000000	0.581000000000000	C
16	17	0.250000000000000	0.416310000000000	0.754900000000000	Ba
17	18	0.250000000000000	0.083690000000000	0.254900000000000	Ba
18	19	0.750000000000000	0.583690000000000	0.245100000000000	Ba
19	20	0.750000000000000	0.916310000000000	0.745100000000000	Ba

A function that converts the post-clustering absolute coordinates from strings to numbers

def df_elem_str2num(df_coords_abs_center=df_coords_abs_center):
    # Work on an explicit copy: the input is a filtered slice of df_coords, and
    # assigning columns to it in place triggers pandas' SettingWithCopyWarning
    df_coords_abs_center = df_coords_abs_center.copy()
    # Convert the coordinate strings to numbers (unparseable values become NaN)
    for col in ['x', 'y', 'z']:
        df_coords_abs_center[col] = pd.to_numeric(df_coords_abs_center[col], errors='coerce')
    return df_coords_abs_center

df_coords_abs_center = df_elem_str2num(df_coords_abs_center=df_coords_abs_center)

# Post-clustering absolute coordinates, now numeric
df_coords_abs_center

	central atom	x	y	z	Species
0	1	0.250000	0.757000	0.919000	C
1	2	0.250000	0.743000	0.419000	C
2	3	0.750000	0.243000	0.081000	C
3	4	0.750000	0.257000	0.581000	C
16	17	0.250000	0.416310	0.754900	Ba
17	18	0.250000	0.083690	0.254900	Ba
18	19	0.750000	0.583690	0.245100	Ba
19	20	0.750000	0.916310	0.745100	Ba

Compute the center of each cluster relative to its central atom

# Average the displacement vectors (X, Y, Z) of the atoms around each central atom.
# Selecting the numeric columns first avoids the numeric_only FutureWarning of mean().
df_nnlist_grouped = df_nnlist.groupby('central atom')[['X', 'Y', 'Z']].mean()
# Turn the index (central atom) back into a regular column
df_nnlist_grouped = df_nnlist_grouped.reset_index()

# Keep only the relative cluster centers selected by the filter
df_cluster_relative_center = df_nnlist_grouped[central_atom_filter_fix]
df_cluster_relative_center

	central atom	X	Y	Z
0	1	0.000000	-0.004670	-0.004500
1	2	0.000000	0.004670	-0.004500
2	3	0.000000	0.004670	0.004500
3	4	0.000000	-0.004670	0.004500
16	17	0.000000	0.000000	0.000000
17	18	0.000000	0.000000	0.000000
18	19	0.000000	0.000000	0.000000
19	20	0.000000	0.000000	0.000000

Add the relative coordinates to the absolute coordinates

def get_clusterd_coords(df_abs=df_coords_abs_center, df_relative=df_cluster_relative_center):
    df_coords_x = df_abs['x'] + df_relative['X']
    df_coords_y = df_abs['y'] + df_relative['Y']
    df_coords_z = df_abs['z'] + df_relative['Z']
    df_coords_species = df_abs['Species']

    # Build the DataFrame with explicit column names
    df_coords_fix = pd.DataFrame({
        'X': df_coords_x,
        'Y': df_coords_y,
        'Z': df_coords_z,
        'Species': df_coords_species,
    })

    return df_coords_fix

df_coords_fix = get_clusterd_coords(df_abs=df_coords_abs_center, df_relative=df_cluster_relative_center)
df_coords_fix

	X	Y	Z	Species
0	0.250000	0.752330	0.914500	C
1	0.250000	0.747670	0.414500	C
2	0.750000	0.247670	0.085500	C
3	0.750000	0.252330	0.585500	C
16	0.250000	0.416310	0.754900	Ba
17	0.250000	0.083690	0.254900	Ba
18	0.750000	0.583690	0.245100	Ba
19	0.750000	0.916310	0.745100	Ba
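One caveat about the sum above: the X, Y, Z columns of POSCAR.nnlist appear to be Cartesian displacements in Å (their norm reproduces the distance column), whereas x, y, z are written to the output under Direct, i.e. as fractional coordinates. If that reading is right, the offsets should be converted to fractional units before being added. The sketch below shows one way to do this. The helper cart_offset_to_frac and the lattice values are hypothetical; the real vectors are on lines 3 to 5 of POSCAR, and a scale factor of 1.0 on line 2 is assumed.

```python
import numpy as np

def cart_offset_to_frac(offset_cart, lattice):
    """Convert a Cartesian displacement (Å) to fractional units (hypothetical helper)."""
    lattice = np.asarray(lattice, dtype=float)   # rows are the a, b, c vectors in Å
    # cart = frac @ lattice  =>  frac = cart @ inv(lattice)
    return np.asarray(offset_cart, dtype=float) @ np.linalg.inv(lattice)

# Hypothetical orthorhombic cell, NOT the actual BaCO3 lattice
lattice = [[5.0, 0.0, 0.0],
           [0.0, 8.0, 0.0],
           [0.0, 0.0, 6.0]]

frac_center = np.array([0.25, 0.757, 0.919])          # atom 1 in df_coords
mean_offset = np.array([0.0, -0.004670, -0.004500])   # atom 1, mean displacement in Å

# Shift the center by the converted offset and wrap back into [0, 1)
new_frac = (frac_center + cart_offset_to_frac(mean_offset, lattice)) % 1.0
print(new_frac)
```

With cell edges of several Å, the converted shift is several times smaller than the raw offset, so the two approaches give slightly different centers.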

# Function that copies the first 5 lines of the original POSCAR and writes a new POSCAR file
import os
def df2poscar(df=df_coords_fix, original_file="./POSCAR", output_file="gen_data/POSCAR"):
    """
    Write the DataFrame (df_coords_fix) to a POSCAR file.
    param1: DataFrame with coordinate columns 'X', 'Y', 'Z' and a 'Species' column
    param2: original POSCAR file
    param3: generated POSCAR file
    """

    # Convert the coordinate columns of df_coords_fix to a string
    def df2str(df):
        df_coords_fix_str = df[['X', 'Y', 'Z']].to_string(header=False, index=False, index_names=False)
        return df_coords_fix_str


    # Build the species line and the atom-count line from df_coords_fix
    def return_species(df):
        species_line = ' '.join(df['Species'].unique())
        num_line = ' '.join([str(len(df[df['Species'] == specie])) for specie in df['Species'].unique()])
        return species_line + '\n' + num_line


    # Copy the first 5 lines of the original POSCAR (comment, scale, lattice vectors) to the new file
    def write_header2poscar():
        # Extract the first 5 lines
        with open(original_file, 'r') as infile:
            lines = infile.readlines()[:5]
        # Write them to the new POSCAR file
        with open(output_file, 'w') as outfile:
            outfile.writelines(lines)


    # Append the remaining sections to the new POSCAR file
    def write_species2poscar():
        with open(output_file, 'a') as file:
            # Append the species and count lines after the header
            file.write(return_species(df) + '\n')
            # Append the 'Direct' keyword
            file.write('Direct\n')
            # Append the coordinates (read as fractional, given the 'Direct' keyword)
            file.write(df_coords_fix_str + '\n')


    # Create the output directory if needed, then call the helpers
    os.makedirs(os.path.dirname(output_file) or '.', exist_ok=True)
    df_coords_fix_str = df2str(df)
    write_header2poscar()
    write_species2poscar()

    print(f"Wrote the clustered structure to {output_file}.")

df2poscar()

Wrote the clustered structure to gen_data/POSCAR.
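A quick way to check the generated file is to read it back and compare the atom counts with the number of coordinate lines. The sketch below works on POSCAR text rather than a file so that it is self-contained; summarize_poscar is a hypothetical helper, the layout is assumed to be VASP5-style, and the lattice in the sample is a placeholder.

```python
def summarize_poscar(text):
    """Return (species, counts, coords) from VASP5-style POSCAR text (hypothetical helper)."""
    lines = text.splitlines()
    species = lines[5].split()                    # species line
    counts = [int(n) for n in lines[6].split()]   # atom counts per species
    # lines[7] is the 'Direct' keyword; the coordinates follow it
    coords = [ln.split()[:3] for ln in lines[8:] if ln.strip()]
    return species, counts, coords

# Hypothetical sample in the same layout that df2poscar writes
poscar_text = """clustered toy cell
1.0
5.0 0.0 0.0
0.0 8.0 0.0
0.0 0.0 6.0
C Ba
1 1
Direct
0.250000 0.752330 0.914500
0.250000 0.416310 0.754900
"""

species, counts, coords = summarize_poscar(poscar_text)
print(species, counts, len(coords))
```

For the BaCO3 example, the expectation would be species C and Ba with four atoms each, matching the eight rows of df_coords_fix.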
