6
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 5 years have passed since last update.

姫野ベンチマーク(himenoBMT)をPythonで実行し、[numpy, numba]で高速化。

Last updated at Posted at 2018-06-18

姫野ベンチマークをPythonで実行し、[numpy, numba]で高速化。

 姫野ベンチは、3次元ポアソン方程式をヤコビ法で解く反復処理の速度を図るベンチマークです。
 本家のホームページは→こちら Fortran90とCの公式版があります。
 20数年前からある古いベンチマークですが、流体解析に必要なメモリ律速のベンチマークに向いています。
 Pythonを始めるにあたり、どの程度の速度がでるか興味があって始めました。

ベンチマーク環境

  • CPU: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz(Boost時3.8GHz) (4core)
  • RAM: 32GB
  • OS: CentOS Linux release 7.5.1804 (64bit)
  • Fortran compiler: gfortran 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)
  • C compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)
  • Python: Python 3.6.3 (default, May 13 2018, 22:01:23) (Anaconda 5.1.0)

かなり古いCPUです。1coreでの計測です。

オリジナル版の計測結果

使用したソースとcompile option

ベンチマーク結果

単精度 XS S M L XL
Fortran[Mflops] 4011 3845 2591 3040 2881
C [Mflops] 1808 1790 1250 791 407
配列サイズ(IxJxK) 65x33x33 129x65x65 257x129x129 513x257x257 1025x513x513
メモリ使用量 5MB 31MB 235MB 1,854MB 14,752MB
メモリ使用量は、Fortran版で/usr/timeの出力。C版は10%弱使用量が少ない。
Fortranは安定して速い。Cは配列サイズが大きいと速度低下が大きい。

Python版

単精度  XS S M L XL
pure_python 1.9 2.1 2.1 2.1 2.1
numba order='C' 1281 611 483 420 368
numba order='F' 2216 1955 1558 1161 870
numpy order='C' 1043 954 941 929 715
numpy order='F' 1186 1180 1099 1078 821

himenoBMT_single.PNG

ソースは、後で示します。試したのは、

  1. pure_python: Fortran版をなるべく変えないようPythonに移植したもの。配列はnumpy.ndarrayを使用
  2. numba : 上記のpure_python版に@numba.jit(nopython=True)を追加し、エラーが出たところを必要最小限で修正したもの。(nopython=True)は、numbaが失敗した際にpythonにデグレードせずエラーを出させるため。
  3. numpy : pure_pnuython版のループ部をnumpyの添え字をずらしたviewを与えることで、一行にまとめたもの
    ※:numba版とnumpy版はndarray作成時のorderを'C'と'F'の両方実施

実行方法

python himenoBMT_pure.py [XS|S|M|L|XL]サイズを指定してください
憶えたてのf-stringsを使っているので、 python3.6以上です。

結果(単精度)

  1. pure_pythonは、目も当てられない。
  2. numbaはCとほぼ互角の速度。ソースコードの変更も少なく、forループをそのまま書けば良いので、見通しが良く、コードのメンテナンスも容易。Fortranに対しても半分程度の速度が出るので、予想以上の結果。ifortと比較したいところ。
  3. numpyでも結構行ける。配列サイズが大きい場合には、numbaよりも速い場合がある。
  4. orderは'C'よりも'F'の方が速い。これは元のコードのループ順に依存すると思われる。

結論

PythonでもCに匹敵する速度で実行できるコードが書けそう
numbaはすごい、でも@numba.jit(nopython=True)で動かないと戸惑う

以下、ソースコード

まずは、pure_python版、なるべくオリジナルのFortran版を素直に移植したつもりだが、
Classを作る必要があったのか…。Python始めて2カ月の初心者ということで大目に見て下さい。

himenoBMT_pure.py
'''
Created on 2018/06/10

@author: akio
'''
import sys
import time
import numpy as np


class pres:
    pres = []
    
class mtrx:
    a = []
    b = []
    c = []
    
class bound:
    bnd = []
    
class work:
    wrk1 = []
    wrk2 = []

class others:
    imax = -1
    jmax = -1
    kmax = -1
    mimax = -1
    mjmax = -1
    mkmax = -1
    
    omega = 0.8
    
def readparam(size):
    grid_set(size)
    
    others.imax = others.mimax-1
    others.jmax = others.mjmax-1
    others.kmax = others.mkmax-1
    return 

def grid_set(size):
    if size == "xs":
        mimax = 65
        mjmax = 33
        mkmax = 33
    elif size == "s":
        mimax = 129
        mjmax = 65
        mkmax = 65
    elif size == "m":
        mimax=257
        mjmax=129
        mkmax=129
    elif size == "l":
        mimax=513
        mjmax=257
        mkmax=257
    elif size == "xl":
        mimax=1025
        mjmax=513
        mkmax=513
    else:
        print("Invalid input character !! {}".format(size))
        sys.exit("-1")
        
    others.mimax = mimax
    others.mjmax = mjmax
    others.mkmax = mkmax


def initmt():
    pres.p[:,:,:] = 0
    mtrx.a[:,:,:,:] = 0
    mtrx.b[:,:,:,:] = 0
    mtrx.c[:,:,:] = 0
    bound.bnd[:,:,:] = 0.0
    work.wrk1[:,:,:] = 0.0
    work.wrk2[:,:,:] = 0.0
    
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,0:3] = 1.0
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,3] = 1.0/6.0
    mtrx.c[0:others.imax,0:others.jmax,0:others.kmax,:] = 1.0
    bound.bnd[0:others.imax,0:others.jmax,0:others.kmax] = 1.0
    for k in range(others.kmax):
        pres.p[:,:,k] = ((k-1)*(k-1))/((others.kmax-1)*(others.kmax-1))
        
    
    
    
def initmem():
    dim = (others.mimax, others.mjmax, others.mkmax)
    dim4 = (others.mimax, others.mjmax, others.mkmax,4)
    dim3 = (others.mimax, others.mjmax, others.mkmax,3)
    pres.p = np.ndarray(dim, dtype=np.float32)
    mtrx.a = np.ndarray(dim4, dtype=np.float32)
    mtrx.b = np.ndarray(dim3, dtype=np.float32)
    mtrx.c = np.ndarray(dim3, dtype=np.float32)
    bound.bnd = np.ndarray(dim, dtype=np.float32)
    work.wrk1 = np.ndarray(dim, dtype=np.float32)
    work.wrk2 = np.ndarray(dim, dtype=np.float32)

def jacobi(nn):
    
    #gosa = np.array(0.0, dtype=np.float32)
    for loop in range(nn):
        gosa = np.zeros((1), dtype=np.float32)
        #ss = 0.0 #np.array(0.0, np.float32)
        for k in range(1,others.kmax-1):
            for j in range(1,others.jmax-1):
                for i in range(1,others.imax-1):
                    s0 = mtrx.a[i,j,k,0]*pres.p[i+1,j,k] \
                   +mtrx.a[i,j,k,1]*pres.p[i,j+1,k] \
                   +mtrx.a[i,j,k,2]*pres.p[i,j,k+1] \
                   +mtrx.b[i,j,k,0]*(pres.p[i+1,j+1,k]-pres.p[i+1,j-1,k] \
                               -pres.p[i-1,j+1,k]+pres.p[i-1,j-1,k]) \
                   +mtrx.b[i,j,k,1]*(pres.p[i,j+1,k+1]-pres.p[i,j-1,k+1] \
                               -pres.p[i,j+1,k-1]+pres.p[i,j-1,k-1]) \
                   +mtrx.b[i,j,k,2]*(pres.p[i+1,j,k+1]-pres.p[i-1,j,k+1] \
                               -pres.p[i+1,j,k-1]+pres.p[i-1,j,k-1]) \
                   +mtrx.c[i,j,k,0]*pres.p[i-1,j,k] \
                   +mtrx.c[i,j,k,1]*pres.p[i,j-1,k] \
                   +mtrx.c[i,j,k,2]*pres.p[i,j,k-1]+work.wrk1[i,j,k]
                    ss=(s0*mtrx.a[i,j,k,3]-pres.p[i,j,k])*bound.bnd[i,j,k]
                    gosa[0] = gosa[0] + ss*ss
                    work.wrk2[i,j,k]=pres.p[i,j,k]+ others.omega*ss
        

        if loop < 4: print('loop: {} gosa: {}'.format(loop,gosa[0]))
        pres.p[1:others.imax-1,1:others.jmax-1,1:others.kmax-1] = \
              work.wrk2[1:others.imax-1,1:others.jmax-1,1:others.kmax-1]
     
    return float(gosa[0])   

def second():
    return time.time()

if __name__ == '__main__':
    
    ttarget = 10.0 # 60
    xmflops2 = 0.0
    gosa = 0.0
    
    if len(sys.argv) > 1:
        size = sys.argv[1].lower()
    else:
        size = 's'
    readparam(size)
    initmem()
    initmt()

    print(' mimax={} mjmax={} mkmax={}'
          .format(others.mimax,others.mjmax,others.mkmax))
    print(' imax={} jmax={} kmax={}'
          .format(others.imax,others.jmax,others.kmax))

    dt= time.get_clock_info('time').resolution
    print('  {:s} , {:12.5e}'.format('Time measurement accuracy : ',dt))
## Start measuring

    nn=3
    print(' Start rehearsal measurement process.')
    print(' Measure the performance in 3 times.')

    cpu0=second()
## Jacobi iteration
    gosa = jacobi(nn)

    cpu1=second()
    cpu = (cpu1 - cpu0)
    flop=(others.kmax-2)*(others.jmax-2)*(others.imax-2)*34.0
    if (cpu != 0.0):  xmflops2 = flop/cpu*1.0e-6*nn
    print('  MFLOPS:{}  time(s):{}'
          .format(xmflops2,cpu))
#
## end the test loop
    nn=int(ttarget/(cpu/3.0))+1
    print(' Gosa : {}'.format(gosa))
    print('Now, start the actual measurement process.')
    print('The loop will be excuted in {} times.'.format(nn))
    print('This will take about one minute.')
    print('Wait for a while.')
#
## Jacobi iteration
    cpu0= second()
## Jacobi iteration
    gosa = jacobi(nn)
#
    cpu1= second()
    cpu = (cpu1 - cpu0)
    if (cpu != 0.0):  xmflops2 = flop*1.0e-6/cpu*nn
#
###      xmflops2=nflop/cpu*1.0e-6*float(nn)
#
    print('The loop will be excuted in {} times.'.format(nn))
    print(' Gosa : {}'.format(gosa))
    print('  MFLOPS:{}  time(s):{}'
          .format(xmflops2,cpu))
    score=xmflops2/82.84
    print(' Score based on Pentium III 600MHz : {}'.format(score))
#
#  pause
#  stop    

numbaバージョン

次は、numba版。pure_python版に@numba.jitを付けただけだと、実行時にjacobi関数の中で、pres.pやmtrx.aは知らない!と怒られたので、全て引数で渡すように変更した。結果では、order='C'よりもorder='F'の方が速かったが、ループ順をオリジナルのkji順からijk順に変更するとorder='C'の方が速い。実は、オリジナルのkjiでorder='F'よりも、ijk順でorder='C'の方が1割ほど速かった(複雑になるので、数字は載せていません)。

himenoBMT_numba.py
'''
Created on 2018/06/10

@author: akio
'''
import sys
import time
import numpy as np
import numba



class pres:
    pres = []
    
class mtrx:
    a = []
    b = []
    c = []
    
class bound:
    bnd = []
    
class work:
    wrk1 = []
    wrk2 = []

class others:
    imax = -1
    jmax = -1
    kmax = -1
    mimax = -1
    mjmax = -1
    mkmax = -1
    
    omega = 0.8
    
def readparam(size):
    grid_set(size)
    
    others.imax = others.mimax-1
    others.jmax = others.mjmax-1
    others.kmax = others.mkmax-1
    return 

def grid_set(size):
    if size == "xs":
        mimax = 65
        mjmax = 33
        mkmax = 33
    elif size == "s":
        mimax = 129
        mjmax = 65
        mkmax = 65
    elif size == "m":
        mimax=257
        mjmax=129
        mkmax=129
    elif size == "l":
        mimax=513
        mjmax=257
        mkmax=257
    elif size == "xl":
        mimax=1025
        mjmax=513
        mkmax=513
    else:
        print("Invalid input character !! {}".format(size))
        sys.exit("-1")
        
    others.mimax = mimax
    others.mjmax = mjmax
    others.mkmax = mkmax

       
def initmt():
    pres.p[:,:,:] = 0
    mtrx.a[:,:,:,:] = 0
    mtrx.b[:,:,:,:] = 0
    mtrx.c[:,:,:] = 0
    bound.bnd[:,:,:] = 0.0
    work.wrk1[:,:,:] = 0.0
    work.wrk2[:,:,:] = 0.0
    
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,0:3] = 1.0
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,3] = 1.0/6.0
    mtrx.c[0:others.imax,0:others.jmax,0:others.kmax,:] = 1.0
    bound.bnd[0:others.imax,0:others.jmax,0:others.kmax] = 1.0
    for k in range(others.kmax):
        pres.p[:,:,k] = ((k-1)*(k-1))/((others.kmax-1)*(others.kmax-1))
        
    
    

def initmem():
    dim = (others.mimax, others.mjmax, others.mkmax)
    dim4 = (others.mimax, others.mjmax, others.mkmax,4)
    dim3 = (others.mimax, others.mjmax, others.mkmax,3)
    atype = np.float32
    aorder = 'F'
    pres.p = np.ndarray(dim, dtype=atype, order=aorder)
    mtrx.a = np.ndarray(dim4, dtype=atype, order=aorder)
    mtrx.b = np.ndarray(dim3, dtype=atype, order=aorder)
    mtrx.c = np.ndarray(dim3, dtype=atype, order=aorder)
    bound.bnd = np.ndarray(dim, dtype=atype, order=aorder)
    work.wrk1 = np.ndarray(dim, dtype=atype, order=aorder)
    work.wrk2 = np.ndarray(dim, dtype=atype, order=aorder)

#@numba.jit('(i8,f8,i8,i8,i8,i8,i8,i8,f8,f4[:,:,:],f4[:,:,:,:],f4[:,:,:,:],f4[:,:,:,:],f4[:,:,:],f4[:,:,:],f4[:,:,:])',nopython=True)
@numba.jit(nopython=True)
def jacobi(nn,mimax,mjmax,mkmax,imax,jmax,kmax,omega \
           ,p,a,b,c,bnd,wrk1,wrk2):
    
    gosa = np.zeros((1), dtype=np.float32)
    
    for loop in range(nn):
        gosa[0] = 0.0
        #ss = np.array(0.0, np.float32)
        for k in range(1,kmax-1):
            for j in range(1,jmax-1):
                for i in range(1,imax-1):
                    s0 = a[i,j,k,0]*p[i+1,j,k] \
                   +a[i,j,k,1]*p[i,j+1,k] \
                   +a[i,j,k,2]*p[i,j,k+1] \
                   +b[i,j,k,0]*(p[i+1,j+1,k]-p[i+1,j-1,k] \
                               -p[i-1,j+1,k]+p[i-1,j-1,k]) \
                   +b[i,j,k,1]*(p[i,j+1,k+1]-p[i,j-1,k+1] \
                               -p[i,j+1,k-1]+p[i,j-1,k-1]) \
                   +b[i,j,k,2]*(p[i+1,j,k+1]-p[i-1,j,k+1] \
                               -p[i+1,j,k-1]+p[i-1,j,k-1]) \
                   +c[i,j,k,0]*p[i-1,j,k] \
                   +c[i,j,k,1]*p[i,j-1,k] \
                   +c[i,j,k,2]*p[i,j,k-1]+wrk1[i,j,k]
                    ss=(s0*a[i,j,k,3]-p[i,j,k])*bnd[i,j,k]
                    gosa[0] = gosa[0]+ss*ss
                    wrk2[i,j,k]=p[i,j,k]+ omega*ss
        
        #if loop < 4: print(f'loop: {loop} gosa: {gosa}')
        #print('loop: {} gosa: {}'.format(loop,gosa))
        #print(gosa)
        p[1:imax-1,1:jmax-1,1:kmax-1] = \
              wrk2[1:imax-1,1:jmax-1,1:kmax-1]
        
    return float(gosa[0])

def second():
    return time.time()

if __name__ == '__main__':
    
    ttarget = 60.0 # 60
    xmflops2 = 0.0
    gosa = 0.0
    
    
    if len(sys.argv) > 1:
        size = sys.argv[1].lower()
    else:
        size = 's'
    readparam(size)
    initmem()
    initmt()

    ''' numba jit'''
    nn=1
    gosa = jacobi(nn,others.mimax,others.mjmax,others.mkmax \
       ,others.imax,others.jmax,others.kmax,others.omega \
        ,pres.p,mtrx.a,mtrx.b,mtrx.c,bound.bnd,work.wrk1,work.wrk2)
    initmt()


    print(f' mimax={others.mimax} mjmax={others.mjmax} mkmax={others.mkmax}')
    print(f' imax={others.imax} jmax={others.jmax} kmax={others.kmax}')

    dt= time.get_clock_info('time').resolution
    print('  %a , %10.5e' % ('Time measurement accuracy : ',dt))
## Start measuring

    nn=3
    print(' Start rehearsal measurement process.')
    print(' Measure the performance in 3 times.')

    cpu0=second()
## Jacobi iteration
    gosa = jacobi(nn,others.mimax,others.mjmax,others.mkmax \
           ,others.imax,others.jmax,others.kmax,others.omega \
            ,pres.p,mtrx.a,mtrx.b,mtrx.c,bound.bnd,work.wrk1,work.wrk2)

    cpu1=second()
    cpu = (cpu1 - cpu0)
    flop=(others.kmax-2)*(others.jmax-2)*(others.imax-2)*34.0
    if (cpu != 0.0):  xmflops2 = flop/cpu*1.0e-6*nn
    print(f'  MFLOPS:{xmflops2}  time(s):{cpu} {gosa}')
#
## end the test loop
    nn=int(ttarget/(cpu/3.0))+1
    print('Now, start the actual measurement process.')
    print(f'The loop will be excuted in {nn} times.')
    print('This will take about one minute.')
    print('Wait for a while.')
#
## Jacobi iteration
    cpu0= second()
## Jacobi iteration
    gosa = jacobi(nn,others.mimax,others.mjmax,others.mkmax \
           ,others.imax,others.jmax,others.kmax,others.omega \
            ,pres.p,mtrx.a,mtrx.b,mtrx.c,bound.bnd,work.wrk1,work.wrk2)
#
    cpu1= second()
    cpu = (cpu1 - cpu0)
    if (cpu != 0.0):  xmflops2 = flop*1.0e-6/cpu*nn
#
###      xmflops2=nflop/cpu*1.0e-6*float(nn)
#
    print(f' Loop executed for {nn} times')
    print(f' Gosa : {gosa}')
    print(f' MFLOPS: {xmflops2}  time(s):{cpu}')
    score=xmflops2/82.84
    print(f' Score based on Pentium III 600MHz : {score}')
#
#  pause
#  stop    


numpyバージョン

下にソースのコア部分の抜粋を示します。コメントを参照してください。

def jacobi(nn,mimax,mjmax,mkmax,imax,jmax,kmax,omega \
           ,p,a,b,c,bnd,wrk1,wrk2,s0,ss):

    p111 = p[1:imax-1,1:jmax-1,1:kmax-1]
    p211 = p[2:imax  ,1:jmax-1,1:kmax-1]
    p121 = p[1:imax-1,2:jmax  ,1:kmax-1]
    p112 = p[1:imax-1,1:jmax-1,2:kmax  ]
    '''
    中略。
    上記のように、添字をずらしたndarrayのviewを作り、
    '''

    for loop in range(nn):
        ```
        k,j,iのループを無くして
        numpy配列の演算に置き換えた

        注意事項はまったところ
        s0[:,:,:]=... "[:,:,:]"を忘れてs0=とすると
        s0を計算結果の新しいndarrayで上書きしてしまうので注意
        ```
        s0111[:,:,:] = a0*p211 \
           +a1* p121 \
           +a2* p112 \
           +b0*( p221 - p201 \
                -p021 + p001) \
           +b1*( p122 - p102 \
                -p120 + p100) \
           +b2*( p212 - p012 \
                -p210 + p010) \
           +c0*p011 \
           +c1*p101 \
           +c2*p110 + w1
        ss111[:,:,:] = (s0111*a3-p111)*bnd111

以下、numpy版の全体です。

himenoBMT_numpy2.py
'''
Created on 2018/06/10

@author: akio
'''
import sys
import time
from ctypes import c_float
import numpy as np
import numba



class pres:
    pres = []
    
class mtrx:
    a = []
    b = []
    c = []
    
class bound:
    bnd = []
    
class work:
    wrk1 = []
    wrk2 = []

class others:
    imax = -1
    jmax = -1
    kmax = -1
    mimax = -1
    mjmax = -1
    mkmax = -1
    
    omega = 0.8
    
def readparam(size):
    grid_set(size)
    
    others.imax = others.mimax-1
    others.jmax = others.mjmax-1
    others.kmax = others.mkmax-1
    return 

def grid_set(size):
    if size == "xs":
        mimax = 65
        mjmax = 33
        mkmax = 33
    elif size == "s":
        mimax = 129
        mjmax = 65
        mkmax = 65
    elif size == "m":
        mimax=257
        mjmax=129
        mkmax=129
    elif size == "l":
        mimax=513
        mjmax=257
        mkmax=257
    elif size == "xl":
        mimax=1025
        mjmax=513
        mkmax=513
    else:
        print("Invalid input character !! {}".format(size))
        sys.exit("-1")
        
    others.mimax = mimax
    others.mjmax = mjmax
    others.mkmax = mkmax

       
def initmt():
    pres.p[:,:,:] = 0
    mtrx.a[:,:,:,:] = 0
    mtrx.b[:,:,:,:] = 0
    mtrx.c[:,:,:] = 0
    bound.bnd[:,:,:] = 0.0
    work.wrk1[:,:,:] = 0.0
    work.wrk2[:,:,:] = 0.0
    work.s0[:,:,:] = 0.0
    work.ss[:,:,:] = 0.0
    
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,0:3] = 1.0
    mtrx.a[0:others.imax,0:others.jmax,0:others.kmax,3] = 1.0/6.0
    mtrx.c[0:others.imax,0:others.jmax,0:others.kmax,:] = 1.0
    bound.bnd[0:others.imax,0:others.jmax,0:others.kmax] = 1.0
    for k in range(others.kmax):
        pres.p[:,:,k] = ((k-1)*(k-1))/((others.kmax-1)*(others.kmax-1))
        
    
    
    
def initmem():
    dim = (others.mimax, others.mjmax, others.mkmax)
    dim4 = (others.mimax, others.mjmax, others.mkmax,4)
    dim3 = (others.mimax, others.mjmax, others.mkmax,3)
    atype = np.float32
    aorder = 'F'
    pres.p = np.ndarray(dim, dtype=atype, order=aorder)
    mtrx.a = np.ndarray(dim4, dtype=atype, order=aorder)
    mtrx.b = np.ndarray(dim3, dtype=atype, order=aorder)
    mtrx.c = np.ndarray(dim3, dtype=atype, order=aorder)
    bound.bnd = np.ndarray(dim, dtype=atype, order=aorder)
    work.wrk1 = np.ndarray(dim, dtype=atype, order=aorder)
    work.wrk2 = np.ndarray(dim, dtype=atype, order=aorder)
    work.s0 = np.ndarray(dim, dtype=atype, order=aorder)
    work.ss = np.ndarray(dim, dtype=atype, order=aorder)

#@numba.jit('(i8,f8,i8,i8,i8,i8,i8,i8,f8,f4[:,:,:],f4[:,:,:,:],f4[:,:,:,:],f4[:,:,:,:],f4[:,:,:],f4[:,:,:],f4[:,:,:])',nopython=True)
#@numba.jit(nopython=True)
def jacobi(nn,mimax,mjmax,mkmax,imax,jmax,kmax,omega \
           ,p,a,b,c,bnd,wrk1,wrk2,s0,ss):
    
    #gosa = np.zeros((1), dtype=np.float32)
    
    a0 = a[1:imax-1,1:jmax-1,1:kmax-1,0]
    a1 = a[1:imax-1,1:jmax-1,1:kmax-1,1]
    a2 = a[1:imax-1,1:jmax-1,1:kmax-1,2]
    a3 = a[1:imax-1,1:jmax-1,1:kmax-1,3]
    b0 = b[1:imax-1,1:jmax-1,1:kmax-1,0]
    b1 = b[1:imax-1,1:jmax-1,1:kmax-1,1]
    b2 = b[1:imax-1,1:jmax-1,1:kmax-1,2]
    c0 = c[1:imax-1,1:jmax-1,1:kmax-1,0]
    c1 = c[1:imax-1,1:jmax-1,1:kmax-1,1]
    c2 = c[1:imax-1,1:jmax-1,1:kmax-1,2]
    
    p111 = p[1:imax-1,1:jmax-1,1:kmax-1]
    p211 = p[2:imax  ,1:jmax-1,1:kmax-1]
    p121 = p[1:imax-1,2:jmax  ,1:kmax-1]
    p112 = p[1:imax-1,1:jmax-1,2:kmax  ]
    p221 = p[2:imax  ,2:jmax  ,1:kmax-1]
    p201 = p[2:imax  ,0:jmax-2,1:kmax-1]
    p021 = p[0:imax-2,2:jmax  ,1:kmax-1]
    p001 = p[0:imax-2,0:jmax-2,1:kmax-1]
    p122 = p[1:imax-1,2:jmax  ,2:kmax  ]
    p102 = p[1:imax-1,0:jmax-2,2:kmax  ]
    p120 = p[1:imax-1,2:jmax  ,0:kmax-2]
    p100 = p[1:imax-1,0:jmax-2,0:kmax-2]
    p212 = p[2:imax  ,1:jmax-1,2:kmax  ]
    p012 = p[0:imax-2,1:jmax-1,2:kmax  ]
    p210 = p[2:imax  ,1:jmax-1,0:kmax-2]
    p010 = p[0:imax-2,1:jmax-1,0:kmax-2]
    
    p011 = p[0:imax-2,1:jmax-1,1:kmax-1]
    p101 = p[1:imax-1,0:jmax-2,1:kmax-1]
    p110 = p[1:imax-1,1:jmax-1,0:kmax-2]
    
    w1 = wrk1[1:imax-1,1:jmax-1,1:kmax-1]
    w2 = wrk2[1:imax-1,1:jmax-1,1:kmax-1]
    
    s0111 = s0[1:imax-1,1:jmax-1,1:kmax-1]
    ss111 = ss[1:imax-1,1:jmax-1,1:kmax-1]
    bnd111 = bnd[1:imax-1,1:jmax-1,1:kmax-1]
        
    for loop in range(nn):
        s0111[:,:,:] = a0*p211 \
           +a1* p121 \
           +a2* p112 \
           +b0*( p221 - p201 \
                -p021 + p001) \
           +b1*( p122 - p102 \
                -p120 + p100) \
           +b2*( p212 - p012 \
                -p210 + p010) \
           +c0*p011 \
           +c1*p101 \
           +c2*p110 + w1
        ss111[:,:,:] = (s0111*a3-p111)*bnd111
        w2[:,:,:] = p111+ omega*ss111
        ss111[:,:,:] = ss111*ss111
        gosa = float(np.sum(ss111, dtype=np.float32))
        
        #if loop < 4: print(f'loop: {loop} gosa: {gosa}')
        #print('loop: {} gosa: {}'.format(loop,gosa))
        if loop < 3: print(gosa)
        p111[:,:,:] = \
              w2
        
    return float(gosa)

def second():
    return time.time()

if __name__ == '__main__':
    
    ttarget = 60.0 # 60
    xmflops2 = 0.0
    gosa = 0.0
    
    if len(sys.argv) > 1:
        size = sys.argv[1].lower()
    else:
        size = 's'
    readparam(size)
    initmem()
    initmt()

    print(f' mimax={others.mimax} mjmax={others.mjmax} mkmax={others.mkmax}')
    print(f' imax={others.imax} jmax={others.jmax} kmax={others.kmax}')

    dt= time.get_clock_info('time').resolution
    print('  %a , %10.5e' % ('Time measurement accuracy : ',dt))
## Start measuring

    nn=3
    print(' Start rehearsal measurement process.')
    print(' Measure the performance in 3 times.')

    cpu0=second()
## Jacobi iteration
    gosa = jacobi(nn,others.mimax,others.mjmax,others.mkmax \
           ,others.imax,others.jmax,others.kmax,others.omega \
            ,pres.p,mtrx.a,mtrx.b,mtrx.c,bound.bnd \
            ,work.wrk1,work.wrk2,work.s0,work.ss)

    cpu1=second()
    cpu = (cpu1 - cpu0)
    flop=(others.kmax-2)*(others.jmax-2)*(others.imax-2)*34.0
    if (cpu != 0.0):  xmflops2 = flop/cpu*1.0e-6*nn
    print(f'  MFLOPS:{xmflops2}  time(s):{cpu} {gosa}')
#
## end the test loop
    nn=int(ttarget/(cpu/3.0))+1
    print('Now, start the actual measurement process.')
    print(f'The loop will be excuted in {nn} times.')
    print('This will take about one minute.')
    print('Wait for a while.')
#
## Jacobi iteration
    cpu0= second()
## Jacobi iteration
    gosa = jacobi(nn,others.mimax,others.mjmax,others.mkmax \
           ,others.imax,others.jmax,others.kmax,others.omega \
            ,pres.p,mtrx.a,mtrx.b,mtrx.c,bound.bnd \
            ,work.wrk1,work.wrk2,work.s0,work.ss)
#
    cpu1= second()
    cpu = (cpu1 - cpu0)
    if (cpu != 0.0):  xmflops2 = flop*1.0e-6/cpu*nn
#
###      xmflops2=nflop/cpu*1.0e-6*float(nn)
#
    print(f' Loop executed for {nn} times')
    print(f' Gosa : {gosa}')
    print(f' MFLOPS: {xmflops2}  time(s):{cpu}')
    score=xmflops2/82.84
    print(f' Score based on Pentium III 600MHz : {score}')
#
#  pause
#  stop    

後書き

ソースコードを添付すると無駄に長くなってしまい、読み難くすみません。

6
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
6
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?