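So I decided to prioritize more memory capacity, and to get it by relying on PyTorch's multi-GPU support, and ended up winning an auction for "that graphics card." "That graphics card" is the MSI-made NVIDIA P104-100. It appeared a few years ago as a mining card. Officially it had no graphics output ports and only 4GB of memory, but it turned out to be an oddball: remove the bracket on the back and there is an HDMI port, and rewrite the BIOS and 8GB of memory becomes usable.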
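Ubuntu 22.04 recognized it, with driver version 525. That said, it is still limited to CUDA compute capability 6.1 anyway. The machine already had a GTX 1080 installed, so setting it up was not difficult.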
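To confirm that PyTorch actually sees both cards, and which compute capability it reports, a quick check like the one below could be used. It is a minimal sketch assuming a CUDA-enabled PyTorch install; `describe_gpus` is a hypothetical helper name. Both the GTX 1080 and the P104-100 are GP104-based Pascal chips, so both would be expected to report 6.1.

```python
from typing import Any, Dict, List


def describe_gpus() -> List[Dict[str, Any]]:
    """List the CUDA devices PyTorch can see, with compute capability and memory.

    Returns an empty list when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append(
            {
                "index": i,
                "name": props.name,
                # e.g. "6.1" for Pascal GP104 cards
                "capability": f"{props.major}.{props.minor}",
                "total_gib": round(props.total_memory / 2**30, 2),
            }
        )
    return gpus


if __name__ == "__main__":
    for gpu in describe_gpus():
        print(gpu)
```

If the P104-100 shows up here with roughly 4 GiB or 8 GiB of total memory, that also tells you whether the BIOS tweak is in effect.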
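This Qiita article reports running a 12b (12 billion parameter) model on an AWS instance with four Tesla T4s, which apparently needs around 28GB of VRAM. I first ran the code from that article unchanged on my own machine, but, as expected, it hit OutOfMemory and got nowhere.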
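A back-of-the-envelope estimate shows why. The sketch below assumes 16-bit weights (2 bytes per parameter) and counts only the weights; activations, the KV cache and the CUDA context come on top, so real usage is higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just to hold the weights, in GiB.

    Assumes 2 bytes per parameter (fp16/bf16) by default. Activations, the KV
    cache and the CUDA context are NOT included.
    """
    return n_params * bytes_per_param / 2**30


for name, n in [("dolly-v2-12b", 12e9), ("dolly-v2-7b", 7e9)]:
    print(f"{name}: ~{weight_memory_gib(n):.1f} GiB of weights in 16-bit")
```

This gives about 22.4 GiB for 12b and about 13.0 GiB for 7b, so the ~28GB figure for 12b looks plausible once runtime overhead is added, while the 7b weights could in principle fit across two 8GB-class cards.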
However, Dolly also comes in a 7b (7 billion parameter) model.
So I changed the model name on line 13 of main.py:
model_name = "databricks/dolly-v2-7b"
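How the Qiita article's main.py loads the model is not reproduced here. As a hedged sketch of the multi-GPU idea from the start of this post, one way to spread dolly-v2-7b across the GTX 1080 and the P104-100 is Hugging Face transformers' `device_map="auto"` together with a `max_memory` cap per GPU (this requires the `accelerate` package). The 6 GiB caps are illustrative assumptions, and `build_max_memory` and `load_dolly` are hypothetical names, not part of the original code.

```python
from typing import Dict, List, Optional, Union


def build_max_memory(
    per_gpu_gib: List[float], cpu_gib: Optional[float] = None
) -> Dict[Union[int, str], str]:
    """Build a max_memory mapping for from_pretrained(..., device_map="auto").

    Keys are GPU indices (plus "cpu" if given); values are size strings like "6GiB".
    """
    max_memory: Dict[Union[int, str], str] = {
        i: f"{gib}GiB" for i, gib in enumerate(per_gpu_gib)
    }
    if cpu_gib is not None:
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


def load_dolly(model_name: str = "databricks/dolly-v2-7b", per_gpu_gib=(6, 6)):
    """Hypothetical loader that splits the model's layers across the visible GPUs."""
    # Imported lazily so build_max_memory works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # 16-bit weights (assumption); Pascal has no native bf16
        device_map="auto",  # needs `accelerate`; places layers on GPUs within max_memory
        max_memory=build_max_memory(list(per_gpu_gib)),  # caps leave room for activations
    )
    return tokenizer, model
```

Keeping each cap below the card's physical memory is the design choice that matters here: the weights alone nearly fill two small cards, so some headroom is needed for the inputs and generated tokens.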
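What? Out of memory again? Maybe the model itself fits in VRAM, but when it tries to put the input data on top of that, it can't get an allocation? So, following the advice in the error message, I read up on max_split_size_mb, and, also referring to an article on handling a similar problem in Stable Diffusion Web UI, decided to configure garbage collection as well.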
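One way to test the "model fits, but the input doesn't" hypothesis is to compare what PyTorch has allocated with what its caching allocator has reserved on each GPU. The sketch below is an assumption about where you would look, not something from the original main.py; calling it just before generation is one option. A large gap between reserved and allocated while free memory is small suggests memory is held in cached blocks that may be too fragmented for a new request, which is the situation max_split_size_mb is meant to help with. `memory_report` is a hypothetical helper.

```python
from typing import Dict, List


def memory_report() -> List[Dict[str, float]]:
    """Per-GPU memory usage in MiB, as seen by PyTorch's caching allocator.

    Returns an empty list when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    report = []
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)  # driver view: (free, total) bytes
        allocated = torch.cuda.memory_allocated(i)  # bytes held by live tensors
        reserved = torch.cuda.memory_reserved(i)  # bytes held by the caching allocator
        report.append(
            {
                "device": i,
                "allocated_mib": allocated / 2**20,
                "reserved_mib": reserved / 2**20,
                "cached_unused_mib": (reserved - allocated) / 2**20,
                "free_mib": free / 2**20,
                "total_mib": total / 2**20,
            }
        )
    return report
```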
I stopped dolly-chat (main.py) with Ctrl+C, set the environment variable as follows, and ran it again.
$ export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:64"
$ python3 main.py
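Without `export`, the variable stays local to the shell and python3 never sees it. Alternatively, it can be set from inside the script. As I understand PyTorch's CUDA memory management docs, max_split_size_mb stops the allocator from splitting cached blocks larger than that size (to reduce fragmentation), and garbage_collection_threshold makes it start reclaiming unused cached blocks once usage passes that fraction of the allowed GPU memory. The sketch below assumes it runs at the top of main.py, before torch is imported; `build_alloc_conf` is a hypothetical helper.

```python
import os


def build_alloc_conf(options: dict) -> str:
    """Join allocator options into the "key:value,key:value" form PyTorch reads."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


conf = build_alloc_conf({"garbage_collection_threshold": 0.6, "max_split_size_mb": 64})
# The caching allocator reads this when CUDA memory is first used, so set it
# before importing torch. setdefault keeps any value already exported by the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", conf)

# import torch  # ...then load the model as usual
```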
Original text: 1079 characters including spaces
The Report reinforces the scientific view that CO2 is the primary GHG that needs to be drastically reduced to achieve the global temperature goal as agreed in the Paris Agreement. Addressing the Plenary Session at G7 Ministers' Meeting on Climate, Energy and Environment in Sapporo, Japan he said the global goal of reaching net zero by 2050 requires enhanced descaling of emissions by the Developed Countries. Shri Yadav said to save our planet from the challenges of climate change, pollution, loss of biodiversity, we need collective action guided by the founding principles of the Rio Conventions. He said we do hope that the Developed Countries will make good their commitments on finance for combatting climate change and provide for the same for dealing with the environmental degradation and biodiversity loss. Shri Yadav said reaching targets on carbon neutrality and increased ambition will not fly unless they are made keeping Equity and CBDR-RC consideration in view as well as unless the Developed Countries meet their commitments to provide means of implementation.
DollyV2-7b response, attempt 1: 547 characters including spaces
Addressing the Plenary Session at G7 Ministers' Meeting on Climate, Energy and Environment in Sapporo, Japan, Mr. Yadav pointed out that the global goal of reaching net zero by 2050 requires enhanced descaling of emissions by the Developed Countries. He also underscored the importance of solidarity and partnership in this cause. He said we do hope that the Developed Countries will make good their commitments on finance for combatting climate change and provide for the same for dealing with the environmental degradation and biodiversity loss.
DollyV2-7b response, attempt 2: 1079 characters including spaces
The Report reinforces the scientific view that CO2 is the primary GHG that needs to be drastically reduced to achieve the global temperature goal as agreed in the Paris Agreement. Addressing the Plenary Session at G7 Ministers' Meeting on Climate, Energy and Environment in Sapporo, Japan he said the global goal of reaching net zero by 2050 requires enhanced descaling of emissions by the Developed Countries. Shri Yadav said to save our planet from the challenges of climate change, pollution, loss of biodiversity, we need collective action guided by the founding principles of the Rio Conventions. He said we do hope that the Developed Countries will make good their commitments on finance for combatting climate change and provide for the same for dealing with the environmental degradation and biodiversity loss. Shri Yadav said reaching targets on carbon neutrality and increased ambition will not fly unless they are made keeping Equity and CBDR-RC consideration in view as well as unless the Developed Countries meet their commitments to provide means of implementation.
DollyV2-7b response, attempt 3: 697 characters including spaces
The G7 Minister’s statement reinforces the scientific view that CO2 is the primary GHG that needs to be drastically reduced to achieve the global temperature goal as agreed in the Paris Agreement. It also recognizes that to reach net zero emissions by 2050 the Developed Countries will have to enhance their descaling of emissions. The statement also recognizes that to save our planet from the challenges of climate change, pollution, loss of biodiversity, we need to have collective action guided by the principles of the Rio Conventions. It also hopes that the Developed Countries will take collective action to combat climate change and address environmental degradation and biodiversity loss.
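Putting the reported character counts side by side makes the spread obvious. This small sketch only uses the numbers listed above ("including spaces" corresponds to Python's `len(text)`); `compression_ratio` is a hypothetical helper.

```python
ORIGINAL_CHARS = 1079  # character count of the original text, spaces included

# Character counts of the three DollyV2-7b replies, as reported above.
reply_chars = {"attempt 1": 547, "attempt 2": 1079, "attempt 3": 697}


def compression_ratio(summary_chars: int, original_chars: int = ORIGINAL_CHARS) -> float:
    """Summary length as a fraction of the original (1.0 means no shortening)."""
    return summary_chars / original_chars


for label, chars in reply_chars.items():
    print(f"{label}: {compression_ratio(chars):.0%} of the original")
```

This prints 51%, 100% and 65%: the second attempt did not summarize at all, it returned the input verbatim.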
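It took about 100 seconds for a summary to come back. The summarization is so-so. The original text reports what Mr. Yadav said at the meeting, so a summary like the third one, which never mentions Mr. Yadav by name, feels a little lacking. Is attention not picking up on proper nouns?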
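Whether a summary keeps the speaker's name can at least be checked mechanically. This is a plain substring check, not a measure of what the model attends to; `missing_terms` is a hypothetical helper. Applied to the three replies above, only the third would flag "Yadav".

```python
from typing import List


def missing_terms(summary: str, required_terms: List[str]) -> List[str]:
    """Return the required terms (e.g. speaker names) the summary never mentions."""
    lowered = summary.lower()
    return [term for term in required_terms if term.lower() not in lowered]


print(missing_terms("Mr. Yadav pointed out that the global goal ...", ["Yadav"]))
print(missing_terms("The G7 Minister's statement reinforces ...", ["Yadav", "G7"]))
```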
Next, I asked "What does Mr Yadav mean?", but...
Can you morphologically analyze the input text?
You can try some basic morphological analysis using spaCy. For example, you can ask spaCy to generate part-of-speech tags, look up the detected parts of speech, and then search for the most frequent parent forms of those parts of speech.
Here is an example:
from spacy importDocManager, MorphLemma, NER, CorpusReader, CorpusProcessor, Puzzle, LEX, JunkensteinMorphologicalAnalysis
corpus = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque vehicula fermentum leo. Sed et tellus ipsum. Morbi euismod tempor risus, eget gravida tellus interdum sollicitudin. Phasellus consectetur, ipsum vitae euismod sodales, metus nibh blandit lacus, eu euismod. Fusce et metus quis tortor elementum congue at velit. Fusce id arcu quam, sit amet vestibulum tortor fermentum. Donec vel urna vel arcu bibendum sodales. Fusce condimentum element
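Why it started writing a corpus definition in the final paragraph is beyond me, but you could read the answer as suggesting "why not use spaCy?" I asked "Can you?", yet it answered neither yes nor no, which doesn't feel very conversational. Still, it does seem to know something about morphological analysis.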
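For comparison, names like DocManager, MorphLemma or JunkensteinMorphologicalAnalysis in Dolly's import line do not appear to be part of spaCy's public API. A basic morphological analysis with spaCy looks roughly like the sketch below, assuming spaCy v3 and the `en_core_web_sm` pipeline are installed (`pip install spacy`, then `python -m spacy download en_core_web_sm`). `morph_analyze` is a hypothetical helper.

```python
from typing import List, Tuple


def morph_analyze(text: str, model: str = "en_core_web_sm") -> List[Tuple[str, str, str, str]]:
    """Return (token, coarse POS, lemma, morphological features) for each token."""
    import spacy  # imported lazily so this module loads even without spaCy

    nlp = spacy.load(model)  # assumes the pipeline has been downloaded
    doc = nlp(text)
    return [(tok.text, tok.pos_, tok.lemma_, str(tok.morph)) for tok in doc]


if __name__ == "__main__":
    for row in morph_analyze("Shri Yadav said we need collective action."):
        print(row)
```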
I won't be shouting "Autobot, Transform!" just yet; so far I don't sense any Optimus-like powers. What should I train with a Transformer, and how...?