Integrating TinySwallow-1.5B into the M5Stack LLM Module

Last updated at 2025-10-19 / Posted at 2025-09-27

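I was not satisfied with the Japanese support of the language models provided by default for the M5Stack LLM Module (understandable, given that they have only 0.5 to 1.5 billion parameters). I then found a version of TinySwallow-1.5B on Hugging Face that had been converted for the LLM Module, so I tried integrating it.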

About TinySwallow-1.5B

TinySwallow-1.5B is a small Japanese language model developed by Sakana AI. See the Sakana AI blog for details.

Getting the TinySwallow-1.5B model

The following steps bring the TinySwallow-1.5B model onto the LLM Module.

Log in to the Module LLM
Install huggingface_hub

pip install -U huggingface_hub

Use huggingface-cli to download the TinySwallow-1.5B build converted for the AX630C

huggingface-cli download --resume-download taoki/TinySwallow-1.5B-Instruct-w8a16 --local-dir TinySwallow-1.5B-Instruct-w8a16

After the download, the files are laid out as follows.

TinySwallow-1.5B-Instruct-w8a16/
├── README.md
├── TinySwallow-1.5B-Instruct-ax630c
│   ├── added_tokens.json
│   ├── generation_config.json
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_p128_l0_together.axmodel
│   ├── qwen2_p128_l10_together.axmodel
│   ├── qwen2_p128_l11_together.axmodel
│   ├── qwen2_p128_l12_together.axmodel
│   ├── qwen2_p128_l13_together.axmodel
│   ├── qwen2_p128_l14_together.axmodel
│   ├── qwen2_p128_l15_together.axmodel
│   ├── qwen2_p128_l16_together.axmodel
│   ├── qwen2_p128_l17_together.axmodel
│   ├── qwen2_p128_l18_together.axmodel
│   ├── qwen2_p128_l19_together.axmodel
│   ├── qwen2_p128_l1_together.axmodel
│   ├── qwen2_p128_l20_together.axmodel
│   ├── qwen2_p128_l21_together.axmodel
│   ├── qwen2_p128_l22_together.axmodel
│   ├── qwen2_p128_l23_together.axmodel
│   ├── qwen2_p128_l24_together.axmodel
│   ├── qwen2_p128_l25_together.axmodel
│   ├── qwen2_p128_l26_together.axmodel
│   ├── qwen2_p128_l27_together.axmodel
│   ├── qwen2_p128_l2_together.axmodel
│   ├── qwen2_p128_l3_together.axmodel
│   ├── qwen2_p128_l4_together.axmodel
│   ├── qwen2_p128_l5_together.axmodel
│   ├── qwen2_p128_l6_together.axmodel
│   ├── qwen2_p128_l7_together.axmodel
│   ├── qwen2_p128_l8_together.axmodel
│   ├── qwen2_p128_l9_together.axmodel
│   ├── qwen2_post.axmodel
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── run_TinySwallow_1.5B_prefill_ax630c.sh
└── tinyswallow_tokenizer.py
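The tree above lists the layer files in lexicographic order (l0, l10, l11, ...), which makes a missing layer easy to overlook. The following is a minimal sketch, not part of the original procedure, that sorts the layer files numerically and reports gaps. The helper names `layer_indices` and `missing_layers` are made up for this illustration; the expected count of 28 is taken from the listing (l0 through l27).

```python
import re
from pathlib import Path

# File name pattern taken from the directory listing above.
LAYER_RE = re.compile(r"qwen2_p128_l(\d+)_together\.axmodel")


def layer_indices(model_dir):
    """Return the layer numbers found in model_dir, sorted numerically."""
    found = []
    for path in Path(model_dir).iterdir():
        match = LAYER_RE.fullmatch(path.name)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


def missing_layers(model_dir, expected=28):
    """Return the layer numbers in range(expected) that have no file."""
    present = set(layer_indices(model_dir))
    return [i for i in range(expected) if i not in present]
```

Calling `missing_layers("TinySwallow-1.5B-Instruct-w8a16/TinySwallow-1.5B-Instruct-ax630c")` should return an empty list when the download is complete.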

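Getting main_prefill

README.md says to obtain main_prefill from AXERA-TECH/DeepSeek-R1-Distill-Qwen-1.5B or a similar repository, but it is not there. Several main_* binaries are available, but none of them worked on the LLM Module.

Once the model is integrated into StackFlow the binary is probably unnecessary, but I wanted to try the provided launch script run_TinySwallow_1.5B_prefill_ax630c.sh first, so I went looking for it.

The repository's History shows that main_prefill was deleted on 2025/2/13, as below, so I was able to get main_prefill from the commit just before that.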

Delete main_prefill  cf085a7
 qqc1989 committed on Feb 13

Delete main_prefill_postprocess  291c9d3
 qqc1989 committed on Feb 13

Place the main_prefill you obtained in the same directory as run_TinySwallow_1.5B_prefill_ax630c.sh.

Trial run

Start the tokenizer as described in README.md.

python3 tinyswallow_tokenizer.py

Log in from another terminal and run the following.

chmod a+x run_TinySwallow_1.5B_prefill_ax630c.sh
chmod a+x main_prefill
./run_TinySwallow_1.5B_prefill_ax630c.sh

The model is loaded and the program waits for a prompt.

[I][                            Init][ 125]: LLM init start
bos_id: -1, eos_id: 151645
  3% | ██                                |   1 /  31 [0.01s<0.31s, 100.00 count/s] tokenizer init ok[I][                            Init][  26]: LLaMaEmbedSelector use mmap
  6% | ███                               |   2 /  31 [0.01s<0.15s, 200.00 count/s
  9% | ████                              |   3 /  31 [0.26s<2.64s, 11.76 count/s
 12% | █████                             |   4 /  31 [0.50s<3.84s, 8.06 count/s]
 16% | ██████                            |   5 /  31 [0.75s<4.63s, 6.70 count/s]
 19% | ███████                           |   6 /  31 [0.98s<5.09s, 6.09 count/s]
 22% | ████████                          |   7 /  31 [1.23s<5.45s, 5.69 count/s]
 25% | █████████                         |   8 /  31 [1.47s<5.71s, 5.43 count/s]
 29% | ██████████                        |   9 /  31 [1.73s<5.95s, 5.21 count/s]
 32% | ███████████                       |  10 /  31 [1.99s<6.18s, 5.02 count/s]
 35% | ████████████                      |  11 /  31 [2.23s<6.29s, 4.93 count/s]
 38% | █████████████                     |  12 /  31 [2.48s<6.40s, 4.84 count/s]
 41% | ██████████████                    |  13 /  31 [2.73s<6.51s, 4.76 count/s]
 45% | ███████████████                   |  14 /  31 [2.97s<6.59s, 4.71 count/s]
 48% | ████████████████                  |  15 /  31 [3.22s<6.65s, 4.66 count/s]
 51% | █████████████████                 |  16 /  31 [3.46s<6.71s, 4.62 count/s]
 54% | ██████████████████                |  17 /  31 [3.71s<6.77s, 4.58 count/s]
 58% | ███████████████████               |  18 /  31 [3.96s<6.82s, 4.55 count/s]
 61% | ████████████████████              |  19 /  31 [4.21s<6.87s, 4.52 count/s]
 64% | █████████████████████             |  20 /  31 [4.45s<6.90s, 4.49 count/s]
 67% | ██████████████████████            |  21 /  31 [4.71s<6.95s, 4.46 count/s]
 70% | ███████████████████████           |  22 /  31 [4.95s<6.97s, 4.44 count/s]
 74% | ████████████████████████          |  23 /  31 [5.20s<7.01s, 4.42 count/s]
 77% | █████████████████████████         |  24 /  31 [5.44s<7.03s, 4.41 count/s]
 80% | ██████████████████████████        |  25 /  31 [5.70s<7.07s, 4.39 count/s]
 83% | ███████████████████████████       |  26 /  31 [5.95s<7.09s, 4.37 count/s]
 87% | ████████████████████████████      |  27 /  31 [6.19s<7.11s, 4.36 count/s]
 90% | █████████████████████████████     |  28 /  31 [6.44s<7.13s, 4.35 count/s]
 93% | ██████████████████████████████    |  29 /  31 [6.69s<7.15s, 4.33 count/s]
 96% | ███████████████████████████████   |  30 /  31 [6.94s<7.17s, 4.32 count/s]
100% | ████████████████████████████████ |  31 /  31 [8.06s<8.06s, 3.85 count/s] init post axmodel ok,remain_cmm(1434 MB)[I][                            Init][ 241]: max_token_len : 1653
[I][                            Init][ 246]: kv_cache_size : 256, kv_cache_num: 1653
[I][                            Init][ 254]: prefill_token_num : 128
[E][                     load_config][ 277]: config file(post_config.json) open failed
[W][                            Init][ 264]: load postprocess config(post_config.json) failed
[I][                            Init][ 268]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>>

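The error saying that post_config.json could not be opened (the file really does not exist) is a concern, but startup appears to have succeeded. I entered the following prompt, which asks in Japanese "Please tell me about M5Stack."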

>> m5stack社について教えてください。
[I][                             Run][ 466]: ttft: 1070.39 ms
m5stack社は、**中国の深圳に��点を置く、IoTデバイスとソフトウェアの開��会社**です。 

**主な特��:**

* **小型で低消費電力なIoTデバイス:** m5stackは、その名の通り「5」のサイズの小さなIoTデバイスを提供しています。これらのデバイスは、低消費電力で、様々な用途に適しています。

* **オープンソース:** m5stackのデバイスは、オープンソースソフトウェアをサポートしています。これにより、ユーザーは自由にデバイスをカスタマイズし、独自のアプリケーションを開発することができます。

* **幅��い製品ラインナップ:** m5stackは、様々な用途に適した、様々な種類のIoTデバイスを提供しています。例えば、センサー、アクチュエータ、通信モジュール、そしてそれらを組み合わせたシステムなどがあります。

* **コミュニティ:** m5stackは、活発なコミュニティを持っています。ユーザーは、オンラインフォー��ムやソーシャルメディアを通じて、情報交換や技術的なサポートを受けることができます。

**m5stackの製品は、主に以下の用途に使用されています:**

* **スマートホーム:** スマートロック、スマート照明、温度・湿度センサーなど
* **産業用IoT:** 工場の監視、設備の制御、データ収集など
* **医療:** 生体情報モニタリング、遠隔医療など
* **教育:** ��ログラミング教育、IoTプロジェクトなど

**m5stackは、低コストで高性能なIoTデバイスを提供することで、様々な分野で注目を集めています。**





[N][                             Run][ 605]: hit eos,avg 2.85 token/s

>>

Some characters are garbled, probably because main_prefill does not handle multi-byte character output correctly. For now I considered the trial run a success.
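One plausible mechanism for this kind of garbling (an assumption; the main_prefill source was not inspected) is that the bytes of a single UTF-8 character are split across two output chunks and each chunk is decoded on its own. The sketch below reproduces that effect in Python and shows that an incremental decoder, which buffers incomplete sequences, avoids it. The two-character word and the split position are chosen purely for illustration.

```python
import codecs


def decode_naive(chunks):
    """Decode every chunk independently; split characters become U+FFFD."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_incremental(chunks):
    """Decode with a stateful decoder that keeps incomplete bytes for later."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(c) for c in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(parts)


raw = "拠点".encode("utf-8")      # two characters, three bytes each
chunks = [raw[:2], raw[2:]]       # cut in the middle of the first character

print(decode_naive(chunks))        # contains replacement characters
print(decode_incremental(chunks))  # the original text
```

If this is indeed the cause, the model output itself is intact and only the console display is affected.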

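Integrating into StackFlow

Everything so far only confirms what README.md describes. To make use of it as an LLM Module, I integrated the model into StackFlow.

File placement

This is not spelled out in the documentation, but from reading the StackFlow source code, there appear to be the following conventions for where LLM-related files go.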

Tokenizer script
/opt/m5stack/scripts/<model name>_tokenizer.py
or
/opt/m5stack/scripts/tokenizer_<model name>.py
Model files
/opt/m5stack/data/<model name>/*.axmodel
/opt/m5stack/data/<model name>/*.bin
Model configuration
/opt/m5stack/data/models/mode_<model name>.json
Tokenizer configuration
/opt/m5stack/data/<model name>/tokenizer/*.json

The files are placed according to these conventions.

First, make the tokenizer configuration files reachable under the <model name>/tokenizer/ directory.

cd /root/TinySwallow-1.5B-Instruct-w8a16/TinySwallow-1.5B-Instruct-ax630c
mkdir tokenizer
cd tokenizer
ln -s ../added_tokens.json .
ln -s ../generation_config.json .
ln -s ../special_tokens_map.json .
ln -s ../tokenizer.json .
ln -s ../tokenizer_config.json .
ln -s ../vocab.json .

Make the tokenizer script reachable from /opt/m5stack/scripts/.

cd /opt/m5stack/scripts
ln -s /root/TinySwallow-1.5B-Instruct-w8a16/tinyswallow_tokenizer.py tokenizer_TinySwallow-1.5B-Instruct-ax630c.py

Make the model files reachable from /opt/m5stack/data/<model name>/ (this also puts the tokenizer configuration in place).

cd /opt/m5stack/data
ln -s /root/TinySwallow-1.5B-Instruct-w8a16/TinySwallow-1.5B-Instruct-ax630c .
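The three symlink steps above can also be expressed as one repeatable script. This is a minimal sketch under the same path conventions, which were inferred from the StackFlow source rather than documented; the function name `link_into_stackflow` and its parameters are made up for this example. It skips links that already exist so it can be run more than once.

```python
from pathlib import Path

# Tokenizer-related files from the download listing.
TOKENIZER_FILES = [
    "added_tokens.json",
    "generation_config.json",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
]


def link_into_stackflow(download_dir, model_name, stackflow_root="/opt/m5stack"):
    """Create the tokenizer/, scripts/ and data/ symlinks for one model."""
    download_dir = Path(download_dir).resolve()
    model_dir = download_dir / model_name
    root = Path(stackflow_root)

    # 1. <model>/tokenizer/*.json -> ../*.json (relative links)
    tok_dir = model_dir / "tokenizer"
    tok_dir.mkdir(exist_ok=True)
    for name in TOKENIZER_FILES:
        link = tok_dir / name
        if not link.is_symlink():
            link.symlink_to(Path("..") / name)

    # 2. scripts/tokenizer_<model>.py -> downloaded tokenizer script
    script_link = root / "scripts" / f"tokenizer_{model_name}.py"
    if not script_link.is_symlink():
        script_link.symlink_to(download_dir / "tinyswallow_tokenizer.py")

    # 3. data/<model> -> downloaded model directory
    data_link = root / "data" / model_name
    if not data_link.is_symlink():
        data_link.symlink_to(model_dir, target_is_directory=True)

    return tok_dir, script_link, data_link
```

With the layout used in this article, the call would be `link_into_stackflow("/root/TinySwallow-1.5B-Instruct-w8a16", "TinySwallow-1.5B-Instruct-ax630c")`.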

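Creating the model configuration file

Create the model configuration file under /opt/m5stack/data/models/ with the file name mode_TinySwallow-1.5B-Instruct-ax630c.json.

I wrote its contents as follows, using run_TinySwallow_1.5B_prefill_ax630c.sh and the other model configuration files as references.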

{
    "mode":"TinySwallow-1.5B-Instruct-ax630c",
    "type":"llm",
    "capabilities":[
        "text_generation",
        "chat"
    ],
    "input_type":[
        "llm.utf-8",
        "llm.utf-8.stream",
        "llm.chat_completion",
        "llm.chat_completion.stream"
    ],
    "output_type":[
        "llm.utf-8",
        "llm.utf-8.stream"
    ],
    "mode_param":{
        "tokenizer_type":2,
        "filename_tokenizer_model":"http://localhost:8080",
        "filename_tokens_embed":"model.embed_tokens.weight.bfloat16.bin",
        "filename_post_axmodel":"qwen2_post.axmodel",
        "template_filename_axmodel":"qwen2_p128_l%d_together.axmodel",
        "b_use_topk":false,
        "b_bos":false,
        "b_eos":false,
        "axmodel_num":28,
        "tokens_embed_num":151936,
        "tokens_embed_size":1536,
        "b_use_mmap_load_embed":true,
        "b_dynamic_load_axmodel_layer":false,
        "ext_scripts":["tokenizer_TinySwallow-1.5B-Instruct-ax630c.py"]
    }
}
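Because the values in this file were copied by hand, a quick consistency check against the model directory can catch typos. The sketch below is only an illustration: `check_model_config` is a made-up helper, and the size check assumes that the embedding file is a headerless array of `tokens_embed_num` × `tokens_embed_size` bfloat16 values (2 bytes each), which is an inference from the file name and not something the article confirms. For the values above that would be 151936 × 1536 × 2 = 466,747,392 bytes.

```python
import json
from pathlib import Path

BYTES_PER_BFLOAT16 = 2  # assumption: raw bfloat16 values, no header


def check_model_config(config_path, model_dir):
    """Return a list of problems found; an empty list means no mismatch."""
    text = Path(config_path).read_text(encoding="utf-8")
    param = json.loads(text)["mode_param"]
    model_dir = Path(model_dir)
    problems = []

    # Every layer file named by the %d template must exist.
    for i in range(param["axmodel_num"]):
        name = param["template_filename_axmodel"] % i
        if not (model_dir / name).is_file():
            problems.append(f"missing layer file: {name}")

    # The post-processing model and the embedding file must exist.
    for key in ("filename_post_axmodel", "filename_tokens_embed"):
        if not (model_dir / param[key]).is_file():
            problems.append(f"missing {key}: {param[key]}")

    # Compare the embedding file size with the configured dimensions.
    embed = model_dir / param["filename_tokens_embed"]
    expected = (param["tokens_embed_num"] * param["tokens_embed_size"]
                * BYTES_PER_BFLOAT16)
    if embed.is_file() and embed.stat().st_size != expected:
        problems.append(
            f"embed size {embed.stat().st_size} != expected {expected}")

    return problems
```

A call such as `check_model_config("/opt/m5stack/data/models/mode_TinySwallow-1.5B-Instruct-ax630c.json", "/opt/m5stack/data/TinySwallow-1.5B-Instruct-ax630c")` prints nothing by itself; inspect the returned list.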

Modifying the tokenizer script

StackFlow launches the tokenizer script as follows.

python3 <tokenizer script filename> --host localhost --port <configured port> --model_id /opt/m5stack/data/<model name>/tokenizer --content <system prompt>

I modified tinyswallow_tokenizer.py as follows to match this invocation format.

--- tinyswallow_tokenizer_org.py	2025-09-27 12:18:26.190878983 +0900
+++ tinyswallow_tokenizer.py	2025-09-27 12:25:27.612003106 +0900
@@ -8,13 +8,12 @@
 
 
 class Tokenizer_Http:
-    def __init__(self):
-        model_id = "TinySwallow-1.5B-Instruct-ax630c"
+    def __init__(self, model_id):
         self.tokenizer = AutoTokenizer.from_pretrained(model_id)
 
-    def encode(self, prompt):
+    def encode(self, prompt, content):
         messages = [
-            {"role": "system", "content": "あなたは、Sakana AI株式会社が開発したTinySwallowです。小型ながら、誠実で優秀なアシスタントです。"},
+            {"role": "system", "content": content},
             {"role": "user", "content": prompt}
         ]
         text = self.tokenizer.apply_chat_template(
@@ -46,12 +45,6 @@
         return self.tokenizer.eos_token
 
 
-tokenizer = Tokenizer_Http()
-
-print(tokenizer.bos_id, tokenizer.bos_token, tokenizer.eos_id, tokenizer.eos_token)
-print(tokenizer.encode("hello world"))
-
-
 class Request(BaseHTTPRequestHandler):
     timeout = 5
     server_version = 'Apache'
@@ -92,7 +85,7 @@
 
         if self.path == '/encode':
             prompt = req.get('text', '')
-            token_ids = tokenizer.encode(prompt)
+            token_ids = tokenizer.encode(prompt, args.content)
             msg = json.dumps({'token_ids': token_ids if token_ids else -1}, ensure_ascii=False)
 
         elif self.path == '/decode':
@@ -111,8 +104,12 @@
     parser = argparse.ArgumentParser()
     parser.add_argument('--host', type=str, default='localhost')
     parser.add_argument('--port', type=int, default=8080)
+    parser.add_argument('--model_id', type=str, default='TinySwallow-1.5B-Instruct-ax630c')
+    parser.add_argument('--content', type=str, default='あなたは、Sakana AI株式会社が開発したTinySwallowです。小型ながら、誠実で優秀なアシスタントです。')
     args = parser.parse_args()
 
+    tokenizer = Tokenizer_Http(args.model_id)
+
     host = (args.host, args.port)
     print(f'http://{host[0]}:{host[1]}')
     server = HTTPServer(host, Request)
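To check the modified script by hand before handing it to StackFlow, a small client can send a prompt to the `/encode` path. This is a sketch under assumptions: the diff shows only that `/encode` reads a `text` field and answers with a `token_ids` field, so the use of POST with a JSON body is inferred, not confirmed. The function name `encode_prompt` and the default URL are illustrative.

```python
import json
import urllib.request


def encode_prompt(text, base_url="http://localhost:8080"):
    """Send text to the tokenizer server and return the token_ids field."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/encode",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: /encode is served by a POST handler
    )
    # The server is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["token_ids"]
```

With the tokenizer running, `encode_prompt("hello world")` should return a list of integers; according to the diff, the server sends -1 instead when encoding yields nothing.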

Conclusion

With the steps above, TinySwallow-1.5B-Instruct-ax630c can be used as a StackFlow model. Here is a photo of it in use from a CoreS3.
