
Visualizing the "inter-dialect distance" of 1950s Ainu in 3D via Hamming distances between pseudo word vectors

Posted at 2024-05-10

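Sean Lee & Toshikazu Hasegawa, "Evolution of the Ainu Language in Space and Time" (PLoS One, Vol.8, No.4 (April 2013), e62243), comes with binary word vectors of Ainu for 19 dialects. They are 350-bit pseudo word vectors derived from the 200-item vocabulary in Table 1 of Shirō Hattori and Mashiho Chiri, 『アイヌ語諸方言の基礎語彙統計学的研究』 [A lexicostatistical study of the basic vocabulary of the Ainu dialects] (民族學硏究, Vol.24, No.4 (November 1960), pp.307-342). The first bit apparently corresponds to "k(u)ani", the second to "co(o)kay", and the third to "anoka(y)". As in yesterday's article, let's visualize the Hamming distances between these pseudo word vectors in 3D with networkx's spectral_layout. On Google Colaboratory, it goes like this: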
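A small sketch of the distance being visualized, assuming equal-length strings over "0", "1" and "?": the Hamming distance counts the positions where two vectors differ. `hamming` mirrors the script that follows, where `1.0 if c=="1" else 0.0` maps every "?" to 0, so an unknown bit facing a 1 counts as a difference. `hamming_known` is a hypothetical alternative, not what the article does, that compares only positions known in both vectors and also reports how many positions were compared.

```python
def hamming(u: str, v: str) -> int:
    """Differing positions between two bit strings; '?' is treated as '0',
    like the c=="1" test in the article's script."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a == "1") != (b == "1") for a, b in zip(u, v))


def hamming_known(u: str, v: str) -> tuple[int, int]:
    """Hypothetical alternative: skip positions where either vector has '?'.
    Returns (number of differences, number of compared positions)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    pairs = [(a, b) for a, b in zip(u, v) if a != "?" and b != "?"]
    return sum(a != b for a, b in pairs), len(pairs)


print(hamming("1001?1", "100111"))        # 1
print(hamming_known("1001?1", "100111"))  # (0, 5)
```

Dividing the difference count by the number of compared positions would be one way to put dialects with different amounts of missing data on a common scale.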

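# On Google Colaboratory "!" runs a shell command; elsewhere run "pip install networkx plotly" in a terminal
!pip install networkx plotly
import numpy,networkx
from plotly.express import scatter_3d
# d: (dialect, 350-bit pseudo word vector) pairs; "?" presumably marks a missing entry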
d=[("八雲",      "10011101001010100010011111011111111101110100101011111101111111110110010010101010110000111111101110100100001110110111010100110011110010101100111110011111011111011010010010010100101001011111101111110100110000000110111010100111110001000111110001000100001111011111111111111001111101100111111010010001001001010101011010000101001001101100101111010000101001"),
   ("長万部",    "100111010010??1000100111110101111011011101001010111111011111111101010100101010101010001111111011101001000011011101110101001100111100101011001111???111110111110101100100100101001010010111111011111101001100000001101010101001111010010001111100010001000011110111111111111110011111001001111110100100010010010101010???10000101001000101100101111010000101001"),
   ("幌別",      "10011101001110100010011111010111101101110100101011111101111111110101010010101010100100111111101110100100001110110111010100101011110010101100111101011111011111011010010010010100101001111111101111110100101000000110110101010111111001000111111001000010001111011111111111111001111100100111111010010001001001110111010010000100101001101100101111011000011001"),
   ("平取",      "10011100100110100010011111011111101101110100101011111101111111110110010010101010101000111111101110100100001110110111010100101011110010101100111110111111011111011010010001010010101001111111101111110100110000000110010101001111111001000111101001000011001111011111111111111001111100100111111110010001001001110011000110000101101001101100101111010000011001"),
   ("貫気別",    "10011100100110100010011111010111101101110100??1011111101111111110110010010101010101000111111101110100100001110110111010100101011110010101100111????111110111110101100100010100101010001111111011111101001001000001100101100011111010010001111010010000100011110111111111111110011111001001111110100????1001001010010100110000101001000101100011111010000011001"),
   ("新冠",      "10011100100110100010011111010111101101110100101011111101111111110110010010101010101000111111101110100100001101110111010100101011110010101100111110011111011111011010010001010010??100011111110111111010010001000011010010100111111000100011110100100001000111101111111111111100111110010011111101001000???100101001010011000010010100???1100101111010000011001"),
   ("様似",      "100111010010011000100111110101111011011101001010111110111111111101010100101010101100001111111011011000100011011101110101001001111010101011001111100111110111?10110100100100011000110010111111011111101001000010001101001100101111001010001111001010001000011110111111111111110011111001001111110100100010001010100110???01000101000101101100101111010000011001"),
   ("帯広",      "10011101001010100010011111010111101101110100101011111101111111110110010010100110110000111111101101100001001110110111010100100111101110101100111110111111011111011010010010001100011001111111011111110100100001000110101001010111101000100111100101000010001111011111111111111001111010100111111010010001000100110111010101100101001001101100101111010100011001"),
   ("釧路",      "10011101001010100010011111010111101101110100101011111101111111110101010010011010110000111111101101100001001110110111010100100111110110101100111????111110111110110100100100011000110010111110111111101001000010001101001010101111????00101111001010000100011110111111111111110011110101001111110100100010001001100110???00010??1001001101100101111010000011001"),
   ("美幌",      "10011101001010100010011111010111101101110100100111111101111111110111010010101001110000111111101110100001001110110111010010100111101010101010111110111111011111011010010010001100001001011111011111110100100001100110101001010111110000011111100101000010001111011111111111111001111010100111111010010001000100110011010111000101001001101100101111010000011001"),
   ("旭川",      "10011??10010101000100111110111111011011101001010111111011111111101110100101010101100001111111011101001010011101101110101001001111101101011001111100111110111110110??????1001010010???11111111011111101001????????110100101010111101000001111101100100000101111011111111111111001111101000111111010001001000101010011000100110101001001001100101110110000011001"),
   ("名寄",      "100111010010101000100111110111111011011101000110111111011111111111100100101010011100001111111011101001000011101101110101001001111100011011001111???11111011111010110010010010100111001111111111111110100101000000110010101011111110000001111100100100000101111111111111111111001111010100111111110001101000101110111000100100101001001001100101111010100011001"),
   ("宗谷",      "01011??10011101000100111110101111011011101101010111111011111111011100100101010011100001111111011100101000011101011110100011001111101111111001111100111110111110101??????1000110010100101111110111111010010000001011001011001011111000001011110010100010000111011111111111111100111110????111110111001100100100110011010100100101001001101010101111000010011001"),
   ("落帆",      "00111??0010101010010011110101111101111101010101011111101111111101110001010101010101000111111011110100000101101101111010001101011110001011001111?1001111101111101010100100011000110010101111110111110101010000000111001101001011110001000111110010?????????1110111111111111111001111100011111110110000010100010101011010000001010010010011010101111000001010101"),
   ("多蘭泊",    "00111010010110011011111110101111101101101110101111111101111111101110101010101011100010111111011110100000011101101110101001101011100101011001111110011111011110111001001000110001100101011111101111101001100000001110010101010111100010001111100100010000011110111111111111110101111100010111110101000010100010101011010000001010011001001011101111000010011001"),
   ("真岡",      "01111010010101010011011110101111101111101011101111111101111111101110001010101010100010111111011110001000101101101111010001101011100101011001111110011111011111111101001000110001100101011111101111101001100000001110010101010111100010001111100010010000011110111111111111110011111101010111110101000010100100101011010000001010011001101011101111000001011001"),
   ("白浦",      "11011010010101010010011110101111101011101010100111111101111111101100101001101010100010111111011110100000101101101111010001101011110001011001111110011111011111011001001000110001100101011111101111101001100000001110010110010111100010001111100010010000011110111111111111111001111100001111110110000100010100101011010000001010011001001010101111001000010101"),
   ("ライチシカ","10111010010101010111111110101111101011101110101011111101111111101110101001101001100010111111011110100000101101101111010001101011110001011001111110011111011111011001101100110001100111011111101111101001100000001101010101010111100010001111100010001100001110111111111111110011111100010111110100100010010010101011010000001010011001101010101111001000010101"),
   ("内路",      "11111010010101010010111110101111101011101010101111111101111111101110100110101010100001111111011110100000101101101111010100101011110001011001111110011110111111011001001000110001100101011111101111101001100000001110010101010111100010001111100010010000011110111111111111111001111101000111110110000010100010101011010000001010011001001010101111000001010011")]
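# m: one row per dialect, 1.0 for "1" and 0.0 otherwise (so "?" is counted as 0)
m=numpy.array([[1.0 if c=="1" else 0.0 for c in v] for h,v in d])
# e[i,j]: Hamming distance (number of differing bits) between dialects i and j
e=numpy.array([numpy.sum(numpy.abs(m-m[i]),axis=1) for i in range(len(d))])
# complete graph; edge weight = 1/distance, with the distance floored at 0.5 so identical vectors do not divide by zero
g=networkx.Graph()
for i in range(len(d)):
  g.add_node(i)
  for j in range(i):
    g.add_edge(i,j,weight=1/max(e[i,j],0.5))
# 3D coordinates from eigenvectors of the weighted graph Laplacian
c=networkx.spectral_layout(g,dim=3)
p=numpy.array([c[i] for i in range(len(d))])
# one point per dialect, labeled with its name; axis titles are left empty
f=scatter_3d(x=p[:,0],y=p[:,1],z=p[:,2],text=[h for h,_ in d],opacity=0.5,labels={"x":"","y":"","z":""},width=1600,height=1600)
# white tick labels effectively hide the axis values, which have no direct interpretation in a spectral layout
q={"tickfont":{"color":"white"}}
f.update_scenes(xaxis=q,yaxis=q,zaxis=q)
f.show()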
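For readers curious what `spectral_layout` computes, here is a rough NumPy sketch, assuming a dense, connected weighted graph: build the Laplacian L = D - W from the weight matrix, take its eigenvectors in ascending eigenvalue order, and use the ones after the trivial constant eigenvector as coordinates. networkx additionally rescales the result and eigenvector signs are arbitrary, so coordinates can differ from networkx's by sign flips per axis and an overall scale. `spectral_coords` and the 4x4 distance matrix are hypothetical.

```python
import numpy


def spectral_coords(w: numpy.ndarray, dim: int = 3) -> numpy.ndarray:
    """Rough Laplacian spectral embedding (hypothetical helper).
    w: symmetric nonnegative weight matrix with a zero diagonal."""
    lap = numpy.diag(w.sum(axis=1)) - w   # graph Laplacian L = D - W
    _, vecs = numpy.linalg.eigh(lap)      # eigenvalues come back in ascending order
    return vecs[:, 1:dim + 1]             # skip the constant eigenvector (eigenvalue 0)


# Hypothetical Hamming distances: dialects 0,1 are close, as are 2,3
e = numpy.array([[0, 2, 9, 10],
                 [2, 0, 8, 9],
                 [9, 8, 0, 1],
                 [10, 9, 1, 0]], dtype=float)
w = 1 / numpy.maximum(e, 0.5)  # same inverse-distance weighting as the script
numpy.fill_diagonal(w, 0.0)    # no self-loops
p = spectral_coords(w, dim=3)
print(p.shape)  # (4, 3)
```

On the first axis, dialects 0 and 1 land on one side and 2 and 3 on the other, which is the kind of grouping the 3D plot shows.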

On my (Koichi Yasuoka's) machine, the following 3D figure was produced.

ainu-dialect-lee-hasegawa.png

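According to this 3D figure, the six Sakhalin nodes (落帆, 多蘭泊, 真岡, 白浦, ライチシカ, 内路) and 宗谷 each stand apart, while the remaining 12 nodes appear to cluster together. The figure can be rotated and zoomed with the mouse, so let's zoom in on the area where the nodes are clustered.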

ainu-dialect-lee-hasegawa-zoom.png

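A rather interesting result. However, the first bit of this pseudo word vector stands for "k(u)ani", and whether "kani" and "kuani" may be regarded as the same word is open to debate. The second bit treats "cokay" and "cookay" as the same word, and the third bit treats "anoka" and "anokay" as the same word. The fourth bit is "eani", which is 1 in all 19 dialects. As far as I can see, the fifth bit treats "tanpe", "taanpe", "tapanpe", "taa", and "tah" as the same word, which really seems problematic. I would like to get hold of somewhat better-constructed word vectors, but how should I go about it?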
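The two concerns above can be made concrete with a toy sketch (hypothetical dialects "A" and "B" and a hypothetical 3x4 matrix, not data from the paper; all helper names are illustrative). Coding "kani" and "kuani" as one bit makes the two dialects agree on that item, while giving each form its own bit makes them differ by 2. A bit that is the same in every dialect, like the fourth bit "eani", adds nothing to any pairwise Hamming distance, so dropping it leaves the distance matrix unchanged.

```python
import numpy


def to_bits(forms: dict[str, str], classes: list[set[str]]) -> dict[str, list[int]]:
    """One bit per class of forms: 1 if the dialect's form is in that class."""
    return {dialect: [int(form in cls) for cls in classes] for dialect, form in forms.items()}


def hamming(u, v) -> int:
    """Number of positions where u and v differ."""
    return sum(int(a != b) for a, b in zip(u, v))


def distance_matrix(m: numpy.ndarray) -> numpy.ndarray:
    """Pairwise Hamming distances between the rows of a 0/1 matrix."""
    return numpy.abs(m[:, None, :] - m[None, :, :]).sum(axis=2)


def constant_columns(m: numpy.ndarray) -> list[int]:
    """Indices of bits that take the same value in every row (dialect)."""
    return [j for j in range(m.shape[1]) if numpy.all(m[:, j] == m[0, j])]


# Hypothetical dialects A and B for a single meaning
forms = {"A": "kani", "B": "kuani"}
merged = to_bits(forms, [{"kani", "kuani"}])   # k(u)ani as a single bit
split = to_bits(forms, [{"kani"}, {"kuani"}])  # one bit per form
print(hamming(merged["A"], merged["B"]))  # 0
print(hamming(split["A"], split["B"]))    # 2

# Hypothetical 3-dialect, 4-bit matrix; column 0 is 1 everywhere
m = numpy.array([[1, 0, 1, 1],
                 [1, 1, 0, 1],
                 [1, 0, 0, 0]])
const = constant_columns(m)
print(const)  # [0]
kept = numpy.delete(m, const, axis=1)
print(numpy.array_equal(distance_matrix(m), distance_matrix(kept)))  # True
```

Which forms count as "the same word" therefore directly shifts the distances, and hence the layout, whereas uniform bits only affect the vector length.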
