More than 5 years have passed since last update.

Maximum Sum Initialize法巡回セールスマン問題(TSP)高速近似アルゴリズム (RaspberryPiのおまけ付き)

Last updated at 2020-05-04Posted at 2020-01-26

自己紹介

皆さんこんにちは。
山中に生息しているプログラマ＆SE、ほげふがーです。
計算量が普通なら爆発してしまうような数学の最適化問題を、量子コンピュータではなく、非力なPCやRaspbe〇〇piで、無理やり解かせるのが大好きです。

気になったら、何かコメントいただけると嬉しいです(^^

2020/05/04　msinit_shortest1.0.py…計算が実行されない問題を解決しました。
msinit_longest1.0.py …正しい最適解が計算されない問題を解決しました。

githubにアップしたもの

これです。
https://github.com/TiresiasGit/trunk/tree/master/mmiopt

git clone https://github.com/TiresiasGit/trunk.git

randarray4.0.py
　　ex: python randarray4.0.py 10000x10000.npz 10000 uint8
　　In:ファイル名 Out:npzファイル(1~100の自然数の一様分布乱数の正方行列)
csv_to_npz.py
　　ex: python csv_to_npz.py InFile.csv OutFile.npz
─行列の対角要素は、全て0にして、csvカンマ区切りの表を作成してください。(excelなどで。)
　　　一要素の入力可能な最大値は、9999999です。必要であれば、0~100の正規化をしてください。
　　　In:csvファイル Out:npzファイル

　　　下記のfilename.npzに、変換したOutFile.npzを入力してご使用ください。

msinit_shortest1.0.py
　　ex: python msinit_shortest1.0.py filename.npz txt 0.01 single no2-opt
msinit_longest1.0.py
　　ex: python msinit_longest1.0.py 1.0.py filename.npz txt 0.01 single no2-opt

　Input:filename　　　　　　　　　　int or float
　Output:ansedge.npz or txt　　　　　　int
　　　　　ansnode.npz or txt　　　　　int
　　　　　accumulation.npz or txt　　　float

概要

最初は必ずMaximum Sum Initialize法で評価値(dataeva)を算出し,評価値が最大のノードをstart地点として決定します。

Maximum Sum Initialize法とMaximum Mean Initialize法との違い

　Maximum Mean Initialize　　　
(https://qiita.com/TiresiasGit/items/c6f2de7059e25ebf342a)
　　　今までの自己最速記録に比べ、最大71倍高速化しました！
例：10000x10000のTSPにおいて
計算時間　50.9814秒→0.7162秒！
　総距離　　　10417→10449 　　　(わずかな悪化で済んでいる)

今回開発したver.(msinit)


python msinit_shortest1.0.py 10000x10000.npz  txt 0.01 single no2-opt
start load file!
[[ 0 68 82 ... 12 67 18]
 [70  0 98 ... 14 36 85]
 [53 99  0 ...  3 10 77]
 ...
 [ 4 15 64 ...  0 77 80]
 [80 33 64 ... 11  0 36]
 [92  5 16 ...  1 80  0]]
end load file!
flags = [0 0 0 ... 0 0 0]
start max mean initalize.
er=0.01
n=3137
end max mean initialize.
start search min edges.
end search min edges.
start make ansedge from ansnode.
end make ansedge from ansnode.
search end ansnode-> [7133   35   66 ... 9976 9839 9728]
Accumulation Distance =  10449.0
ansedge =  [ 1.  1.  1. ... 45. 79. 26.]
calculation elapsed time=0.7161514759063721
total elapsed time=2.8556344509124756

従来のver.(mmiopt)

>python mmiopt_shortest3.0.py 10000x10000.npz  txt
start load file!
end load file!
start max mean initalize.
end max mean initialize.
start 2-opt
end 2-opt
search end ansnode-> [8132 7417  211 ... 7404 2487 5113]
Accumulation Distance =  10417.0
ansedge =  [ 1.  1.  1. ... 11. 81. 29.]
calculation elapsed time=50.98143482208252
total elapsed time=53.1019127368927

　評価値の計算
　Maximum Mean Initialize法　　　　　Maximum Sum Initialize法
　・平均値　　　　　　　　　→　　合計値（開発のなりゆきでこうなりました...）
　・行の内の全要素がInput　　→　　統計学的なサンプルサイズのみInput
　　とよりヒューリスティックス（直感型）に仕様変更しました。

最小値の探索において探索済み除外の改善

　Maximum Sumに変更したのとは別で、探索済み除外の改善をしました。
　　従来　mmiopt_ver3.0
　　　　　→リスト内包表記とflagsの組み合わせ

リスト内包表記.


   while num > 0:
　　：
　　indexes = [i for i, x in enumerate(data[index]) if 0 == flags[i]]

　　最新　msinit_ver1.0
　　　　→flagsの探索済み(index)に100を代入、np.add()でdata[index]に行単位で加算する(各ノードの行に対して一度だけ適用されるため数値的発散の心配はない)

flagsに100を代入とnp.add()

   while num > 0:
　　：
     data[index]=np.add(data[index],flags)
    :
     flags[index] = 100 #訪問済みにする

10000x10000のTSPにおいて、
　　トータル計算時間50.7954秒→25.7188秒
　　　　となりました。TSPでは最小値のサーチ時間が計算時間の9割以上を占めるので,その中の探索済みにflagを立てる処理を、上記処理に変更したことにより、その部分が2倍速ぐらいとなり、全体としてほぼ2倍速となりました。

ハイパーパラメータの考察

今回私が作成したmmioptのver4.0の外部引数で入力できるパラメータにおいて、どの組み合わせを使うのが良さそうかを考察するため、私のPCで検証してみました。(詳細は実行環境の章を参照。)

思い切って、前回の記事の9倍の規模のTSPで検証してみましょう！

◆30000x30000.npz single no-2opt　固定（このパラメータはテキトーに決め打ちで実施。演算時間が早いので。）

er※①	n※②	距離	演算時間[s]
0.00	29999	30351	4.7850
0.01	3137	30407	4.1949
0.05	125	30453	4.1029
0.10	31	30460	4.0309
0.30	3	30488	4.0099
→やはり評価値の算出の許容誤差を増加させると、総距離が長いパスが出てきます。(^^;
実用的には0.00～0.01を使うのが良さそうです

※①er…許容誤差率です。
※②n…統計的なサンプルサイズnと同義。
nを求める計算式に、一様分布(1%～100%と決め打ち)の標準偏差を代入して作っています。

　　●一様分布の標準偏差　σ
　　　https://bellcurve.jp/statistics/course/8013.html
　　　　1％～100％の一様分布の標準偏差は

σ=\sqrt{(1.00-0.01)^2/12}=0.2857884

　
　　　(入力される行列一要素の値は相対値なので、
　　　　1%=1要素の持つ値域の幅/99　なので、値域により揺らぎますが、
　　　　ざっくりこう置けます。
　　　　1~100の自然数が入力されることを想定して作っています。)

　　●サンプルサイズ　n
　　　　https://toukeigaku-jouhou.info/2018/01/23/how-to-calculate-samplesize/

n=(1.96*σ/許容できる誤差er)^2

n=int((1.96*0.2857884/er)^2)

プログラム上は、高速化のため、左端の[0]から、n個(対角要素は対象外)採取します。
実質、乱択アルゴリズムです。
◆30000x30000.npz er=0.01　固定

判断論理※	2-optあり/なし	距離	演算時間[s]
single	no-2opt	30407	4.1949
single	2opt	30407	9.1305
multi	no-2opt	30367	221.0364
multi	2opt	30367	228.9505
※解探索中に、複数の最大値/最小値に出くわしたときの判断論理

single…　元ノードの持つエッジが、複数の最小のエッジを持つ場合、強制的に若番ノードをその元ノードの解(先ノード)とする
multi…元ノードの持つエッジが、複数の最小のエッジを持つ場合、その中から、最大の評価値をもつノードを元ノードの解(先ノード)とする

no-2opt…解の2opt法の適用なし
2opt　　…解の2opt法の適用あり

multiだと時間のわりに、劇的に改善しないですね...
single/multiにおいて2-optを使用しても、結果として、変化なしでした。（そういうこともあるのか...というより、解の良さが最初のinitalizeとsingle/multiに依存する？）

ハイパーパラメータ考察結論

　　計算資源に余裕がない→single,er=0.01
　　計算資源に余裕がある→multi,er=0.00
　　no-2optで十分。
よって、実行コマンドは
計算資源に余裕がない場合は、
　　　python msi_shortest4.0.py ファイル名.npz txt 0.01 single no2-opt

計算資源に余裕がある場合は、
　　　python msi_shortest4.0.py ファイル名.npz txt 0.01 multi 2-opt

をおすすめします。(｀・ω・´)

社会インフラ（交通渋滞緩和や、ロボットなど）に組み込まれたら、すごく嬉しい...

実行環境

SW:
　Python:Python 3.8.0
　コマンドプロンプトより実行
HW:
　OS:Windwos 10 Home
　CPU:AMD Ryzen 7 2700 Eight-Core Processor 3.20GHz(OCなし,16論理コア)
　DIMM:16.0GB 3200MHz
　HDD:1TB
　GPU:AMD Radeon R9 200

付録A (RaspberryPiに解かせてみた)

RaspberryPi.3B+


pi@raspberrypi:~/trunk/mmiopt $ python msinit_shortest1.0.py 10000x10000.npz txt 0.01 single no-2opt
start load file!
[[ 0 15 21 ...  7 14 75]
 [56  0 94 ... 56 47 89]
 [15 87  0 ... 68 88 56]
 ...
 [ 5 10 79 ...  0 96 68]
 [29 42 58 ... 20  0 56]
 [16 76 49 ... 15 95  0]]
end load file!
('flags =', array([0, 0, 0, ..., 0, 0, 0]))
start max mean initalize.
er=0.01
n=3137
end max mean initialize.
start search min edges.
end search min edges.
start make ansedge from ansnode.
end make ansedge from ansnode.
('search end ansnode->', array([3596,    8,  197, ..., 9889, 9975, 9952]))
('Accumulation Distance = ', 10408.0)
('ansedge = ', array([ 1.,  1.,  1., ..., 15., 46., 47.]))
calculation elapsed time=8.08861899376
total elapsed time=13.6909029484

付録B (ソースコード)

付録B　msinit_shortest1.0.pyのコード

# coding: UTF-8
# Maximum Mean Initialize and 2-opt.
# -> MMI-OPT
#   this version is approximataly solve "shortest path" from TSP Input.
#
#   This argolithm developed for solve TSP approximately fast.
#   (TSP=Traveling Salesman Problem. )
#
# Confirmed operating environment.
#   CentOS Linux release 7.2.1511 (Core)
#   Python 2.7.5
#   Python 3.8.0
#   numpy  1.16.5
#
# How to use this software from command line?
# > python mmiopt_shortest4.0.py filename.npz txt 0.01 single  no2-opt
#                                             ^^^A solution is output as .npz
#                                ^^^^^^^^^^^^^int or float valuses square matrix
#                                             of np.savez_compressed(filename.npz,array_=a)
#                                  type
#  Input:filename                   int or float 
#  Output:ansedge.npz or txt        int
#         ansnode.npz or txt        int
#         accumulation.npz or txt   float
#                                        developer "Ryota Tanaka"
import numpy as np
import csv
import sys
import time
import itertools as itr
def main():
   argv = sys.argv
   if len(argv) != 6:
     print("File name may not be specified,or may contain extra options.")
     exit()
   print("start load file!")
   data = np.load(argv[1])['array_']
   print(data)
   print("end load file!")
   start = time.time()
   searched = [data.shape[1]]
   if data.shape[0] != data.shape[1]:
     print("Please check if it is a square matrix.")
   #array init
   flags = np.zeros(data.shape[0],dtype='int32')
   print("flags =",flags)
   ansnode = np.array([],dtype='int32')
   ansedge = np.array([],dtype='int32')
   #ansedge = np.array([])
   accumulation = 0 #累積距離の初期化
   print("start max mean initalize.")
   er=float(argv[3])
   print("er="+str(er))
   if er < 0.0 or 0.99 < er:
     print("inputed error rate is out of range.")
     exit()
   elif er <=0.0:
     n=data.shape[0]-1
   else:
     n=int((1.96*0.2857884/er)**2)
   print("n="+str(n))
   if n > data.shape[0]-1:
      n=data.shape[0]-1
   index,dataeva = calc_maxevaindex(data,n)
   print("end max mean initialize.")
   flags[index] = 100 #訪問済みにする
   ansnode=np.append(ansnode,int(index)) #答えとなるノード番号をキューに入れる
   num = data.shape[0] - 1
   print("start search min edges.")
   while num > 0:
     beforeindex = index #ひとつ前のindexとして保持する
     data[index]=np.add(data[index],flags)
     if argv[4] == 'single':
       index = np.argmin(data[index])
     else:
       #multi
       xmin = np.min(data[index])
       minindexes = [i for i,x in enumerate(data[index]) if x == xmin]
       if len(minindexes) > 1:
         maxdataevai = np.argmax(dataeva[minindexes])
         index = minindexes[maxdataevai] #minindexesの内、評価値が最大のノード番号を取り出す
       else:
         index = minindexes[0]

     flags[index] = 100 #訪問済みにする
     ansnode=np.append(ansnode,int(index)) #答えとなるノード番号をキューに入れる
     accumulation += data[beforeindex,index] #累積距離に加算する
     num-=1
     #print(num)
   print("end search min edges.")
   print("start make ansedge from ansnode.")
   for i in range(len(ansnode)-1):
     ansedge=np.append(ansedge,data[int(ansnode[i]),int(ansnode[i + 1])])
   print("end make ansedge from ansnode.")
     #ansedge = np.append(ansedge,data[int(ansnode[i]),int(ansnode[i + 1])])
   #2-opt法による解のworst除去
   if argv[5] == '2-opt':
     print("start 2-opt")
     num = data.shape[0] - 1
     tempedge0=np.zeros(4)#従来のエッジ長集合
     tempedge1=np.zeros(4)#swap後エッジ長集合
     templist=np.zeros(6)  
     while num > 0:
       worst_node_i = np.argmax(ansedge)#解の経路から、最長のエッジを出す
       #次のループのために、現在見つかった最大値の要素を削除
       ansedge = np.delete(ansedge,worst_node_i,axis=0)
       worst_minnode=np.argmin(data[int(ansnode[worst_node_i])])
       #◆交換なし
       #交換元
       worst_minnode_i_list=np.where(ansnode-worst_minnode==0)#array[]->intに変換は[0]を参照すればよい
       worst_minnode_i=worst_minnode_i_list[0]
       #print("worst_minnode_i");#print(worst_minnode_i)

       worst_node_i_next = worst_node_i+1
       #以下のif文は、worst_maxnode_iが解の先頭か、最後である場合、前がなかったり後がなかったりするので、その境界の処理
       templist[1]=ansnode[worst_minnode_i]
       if worst_minnode_i-1 > 0 :
          templist[0]=ansnode[worst_minnode_i-1]
          tempedge0[0]=data[int(templist[0]),int(templist[1])]
       if worst_minnode_i+1 < len(ansnode)-1 :
          templist[2]=ansnode[worst_minnode_i+1]
          tempedge0[1]=data[int(templist[1]),int(templist[2])]
       #交換先
       templist[4]=ansnode[worst_node_i_next]
       if worst_node_i-1 > 0 :
          templist[3]=ansnode[worst_node_i_next-1]
          tempedge0[2]=data[int(templist[3]),int(templist[4])]
       if worst_node_i+1 < len(ansnode)-1 :
          templist[5]=ansnode[worst_node_i_next+1]
          tempedge0[3]=data[int(templist[4]),int(templist[5])]

       #◆交換した場合
       #交換元
       templist[1]=ansnode[worst_node_i_next]#swaped
       if worst_minnode_i-1 > 0 :
          templist[0]=ansnode[worst_minnode_i-1]
          tempedge1[0]=data[int(templist[0]),int(templist[1])]
       if worst_minnode_i+1 < len(ansnode)-1 :
          templist[2]=ansnode[worst_minnode_i+1]
          tempedge1[1]=data[int(templist[1]),int(templist[2])]
       #交換先
       templist[4]=ansnode[worst_minnode_i]#swaped
       if worst_node_i-1 > 0 :
          templist[3]=ansnode[worst_node_i_next-1]
          tempedge1[2]=data[int(templist[3]),int(templist[4])]
       if worst_node_i+1 < len(ansnode)-1 :
          templist[5]=ansnode[worst_node_i_next+1]
          tempedge1[3]=data[int(templist[4]),int(templist[5])]
       #print("sum(tempedge0)");#print(np.sum(tempedge0))
       #print("sum(tempedge1)");#print(np.sum(tempedge1))
       #距離判定
       if(np.sum(tempedge1)<np.sum(tempedge0)):
         #node swap
         tempnode=             ansnode[worst_node_i_next]
         ansnode[worst_node_i_next]=ansnode[worst_minnode_i]
         ansnode[worst_minnode_i]=tempnode
       #decrement
       num-=1
     #while num end
     print("end 2-opt")

   print("search end ansnode->",ansnode)
   ansedge =[]
   for i in range(len(ansnode)-1):
     ansedge = np.append(ansedge,data[int(ansnode[i]),int(ansnode[i + 1])])
   accumulation=np.sum(ansedge)
   print("Accumulation Distance = ",accumulation)
   print("ansedge = ",ansedge)
   if argv[2] == 'txt':
     np.savetxt("ansnode.txt", ansnode,fmt ='%d')
     np.savetxt("ansedge.txt", ansedge,fmt ='%d')
     np.savetxt("accumulation.txt", [accumulation],fmt ='%f')
   elif argv[2] == 'npz':
     np.savez_compressed("ansnode.npz", array_=ansnode)
     np.savez_compressed("ansedge.npz", array_=ansedge)
     np.savez_compressed("accumulation.npz", array_=accumulation)
   else:
     print("!none output files!")
   t = time.time() - start
   print("calculation elapsed time="+str(t))


def calc_maxevaindex(data,n):
   #dataeva = np.mean(data,axis=1)
   maxj=data.shape[0]-1
   #before half(row number 0 to n-1 -> sum(0 to n)  (n+1 elements,include 0))
   #print(data)
   dataeva=[np.sum(data[0:n,:n+1],axis=1)]
   #print("data[0:n,:n+1]="+str(data[0:n,:n+1]))
   #after half(row number n to maxj -> sum(0 to n-1 (n elements))
   #print("data[n:maxj+1,:n])"+str(data[n:maxj+1,:n]))
   dataeva=np.append(dataeva,np.sum(data[n:maxj+1,:n],axis=1))
   #print(dataeva)
   #print("data[n:maxj+1,:n]="+str(data[n:maxj+1,:n])

   #print("i=0 ="+str(dataeva))
   #print(dataeva)
   #i=1~maxj-1 (To get the line from which a diagonal element was removed.)
   #print("maxj = "+str(maxj))
   #print(str([np.mean(np.append(row[0:(i+1)],row[(i+1)+1:n+1],axis=0)) for i,row in enumerate(data[1:maxj])]))
   #print(str(np.append(dataeva,[np.mean(np.append(row[0:(i+1)],row[(i+1)+1:n+1],axis=0)) for i,row in enumerate(data[1:maxj])],axis=0)))
   #exit()
   evamax_i=np.argmax(dataeva)
   return evamax_i,dataeva

if __name__  == "__main__":
  try:
    start = time.time() 
    main() #main処理
    t = time.time() - start
    print("total elapsed time="+str(t))

  #errorハンドリング
  except IOError as error:
    print("Make sure the files are on the same level")

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

Maximum Sum Initialize法 巡回セールスマン問題(TSP)高速近似アルゴリズム (RaspberryPiのおまけ付き)