More than 3 years have passed since last update.

pythonでFortran出力を読む

Last updated at 2020-11-30Posted at 2020-01-30

Fortranで出力されたバイナリファイルをpythonで読む

数値計算はFortranで行い、図や解析はpythonという人向け。数値計算の出力はバイナリファイルであるとする。また、pythonではnumpyを用いて解析するものとする。
この文章ではまずFortranのバイナリ出力形式について説明した後、pythonでの読み方について述べる。

Fortran出力の形式について

Fortranのバイナリ出力形式はsequential（順番探索）、direct(直接探索)、streamの3種類である。それぞれについて見ていく。
なおここでのバイナリ出力とはform="unformatted"を指す。form="binary"ではない。(form="binary"についてはよく知らない）

sequential（順番探索）

ファイルの先頭から書き出していく形式。writeの度に出力の先頭と末尾に4バイト(古いものだと8バイトのこともある)のmarkerがつく。先頭には出力のバイト数が入っている。
読み込みも先頭から読む必要がある（バイト数さえ指定すればstreamで読むことも不可能ではないけど。。。）

real(4) :: a=1,b=2
open(10,file="test.seq", form="unformatted", action="write",access="sequential")
write(10) a
write(10) b

direct(直接探索)

レコード長(バイト数；recl)を指定して出力する。出力の際はrecを指定し、出力位置を指定する。なので、必ずしも先頭から書き込む必要は無い（普通は先頭から出力しますが）。
読み込むときも先頭から読む必要がないのは便利。
なお、インテルのコンパイラ(ifort)だと、reclのデフォルトは4バイト単位になっている（たとえばrecl=4だと16バイトを出力する設定）。-assume bytereclをオプションでバイト単位に直しておくのが無難。

real(4) :: a=1,b=2
open(10,file="test.dir", form="unformatted", action="write",access="direct",recl=4)
write(10,rec=1) a
write(10,rec=2) b

stream

Fortran 2003以降ではストリーム入出力が追加された。sequentialに似ているが違いはファイルの先頭と末尾にmarkerがつかないこと。

open(10,file="test.stm", form="unformatted", action="write",access="stream")
write(10) a
write(10) b ! 自動でposが指定される。

出力位置はposを使って指定（バイト数）することもできる。
入力の際もposを指定すれば必ずしも先頭から読み込む必要はなくなる。

endianについて

ビッグエンディアンとリトルエンディアンが存在。指定しない場合はマシンのデフォルトを使うことになる。どちらでも良いが自分でわかるように統一して使うのが無難かなあ。
コンパイル時に指定する方法は以下の通り

$ gfortran -fconvert=big-endian test.f90
$ ifort -convert=big_endian -assume byterecl test.f90

open文でも指定可能。convertで指定する(たぶん大抵のコンパイラで拡張機能として設定されている)。

open(10, file="test.dat", form="unformatted", action="write, access="stream" , &
& convert="big_endian" )

pythonで読む

pythonの標準ライブラリで読むことが可能。バイナリを読んで、np.frombufferで変換する。次のようなクラスを作っておくとFortranの用に扱える。出力は1次元配列なので、必要に応じてreshapeで変換する。
dtypeの説明は代表的なものだけ。
詳細は[https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#]
とか

記号	意味
>	ビッグエンディアン
<	リトルエンディアン
i4	4バイト整数
f4	4バイト浮動小数
f8	8バイト浮動小数

以下例など。200要素ある4バイト実数を読む例のつもり。

sequential（順番探索）

import numpy as np
import struct
class seq_read :
    def __init__(self,filename, endian=">") :
        self.f = open(filename, "rb")
        self.endian = endian
        
    def read(self, dtype) :
        num, = struct.unpack(self.endian+"i",self.f.read(4))    
        data = np.frombuffer(self.f.read(num), dtype )
        num, = struct.unpack(self.endian+"i",self.f.read(4))            
        return data
    
    def rewind(self) :
        self.f.seek(0) 
        
    def __del__(self) :
        self.f.close()
### example ### 
f = seq_read("test.seq", endian=">" ) 
data = f.read(">i") # big endianの4バイトinteger
f.rewind() # ファイル先頭へ

direct(直接探索)

directアクセスは型が変わらない、かつレコード長も一定なので、インスタンスを作成する際に設定する。

import numpy as np
import struct
class dir_read :
    def __init__(self, filename, recl, dtype) : 
        self.f = open(filename, "rb")
        self.recl = recl
        self.dtype = dtype        
        
    def read(self, rec) : # rec は1からスタート(Fortran的)
        self.f.seek((rec-1)*self.recl)
        data = np.frombuffer(self.f.read(self.recl), self.dtype)
        return data
    
    def __del__(self) :
        self.f.close()
### example ### 
f2 = dir_read("test.dir",4*200,">f")
print(f2.read(2))

他にもnumpy.fromfileを使っても読める。
offsetで読み始めるバイト数を指定し、dtypeで読む要素数を指定する。

numpy 1.7以降

import numpy as np
recl = 200
data = np.fromfile("test.dir",dtype=">"+str(recl)+"f4",count=1,offset=4*recl)[0]

stream

上述で使っているseekとnp.frombufferで読むことができる。seekはposと同じ感じなので、Fortranでstream出力を使える人ならすぐにできるはず。

Fortranで読めるバイナリの書き込み(2020/11/30)追記

stream 出力をベースに行う（同じ形状の配列を書き出せばdirectアクセスでも開ける）
下記はビッグエンディアン(>)で4バイトの実数の例。struct.packでバイトに変換しf.writeで出力する。

asytpeとtobytesを組み合わせた方が圧倒的に早いようだ。endianの変換はbyteswapで行う。

import struct
import array
import numpy as np
filename = "test.dat"
f = open(filename,'wb')
def write_f(f,var) :
    buffer = struct.pack('>'+str(len(var.ravel()))+'f', 
                         *var.ravel().astype("f4")[:] )
    f.write(buffer)
def write_f2(f,var) :
    f.write(var.astype("f4").byteswap().tobytes())
var = np.array([1.,2.])
write_f(f,var)
f.close()

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up