Trying out OpenELM the easy way

Last updated at 2024-04-28, posted at 2024-04-27

Purpose

Apple has released a language model.
This post simply runs the sample script to get a feel for it.

machine spec

mac
- m1
- mem: 16GB
mac
- i9
- mem: 32GB
linux
- ryzen 1700
- gpu: gtx 1080, 8GB

Impressions

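Processing takes a fair amount of time on the m1 mac.
- OpenELM-270M-Instruct: about 10 seconds
- OpenELM-450M-Instruct: about 20 seconds
intel mac
- about twice the processing time above
linux with 1080
- about 4 to 7 seconds
- the 270M, 450M, and 1B models all respond within a few seconds
  - given the amount of text generated, this seems fast enough
- the 3B model does not run because it runs out of memory

With an m3, or an nvidia GPU with plenty of memory, it should run fast and handle the larger models comfortably.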
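
The out-of-memory result for the 3B model is consistent with a back-of-the-envelope estimate of weight size. The sketch below assumes the nominal parameter counts implied by the model names and 4 bytes per parameter (float32). Both are assumptions rather than measurements, and the real footprint is larger once activations and caches are included.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumptions (not from the article): parameter counts are the nominal
# values in the model names, and weights are stored as float32.

NOMINAL_PARAMS = {
    "OpenELM-270M": 270e6,
    "OpenELM-450M": 450e6,
    "OpenELM-1_1B": 1.1e9,
    "OpenELM-3B": 3e9,
}


def weight_memory_gib(num_params: float, bytes_per_param: int = 4) -> float:
    """Return the weight size in GiB for a given parameter count."""
    return num_params * bytes_per_param / 2**30


def fits(num_params: float, available_gib: float, bytes_per_param: int = 4) -> bool:
    """True if the weights alone fit in the given amount of memory."""
    return weight_memory_gib(num_params, bytes_per_param) <= available_gib


if __name__ == "__main__":
    gpu_gib = 8  # gtx 1080 from the machine spec section
    for name, n in NOMINAL_PARAMS.items():
        size = weight_memory_gib(n)
        verdict = "fits" if fits(n, gpu_gib) else "does not fit"
        print(f"{name}: {size:.1f} GiB of weights, {verdict} in {gpu_gib} GiB")
```

Under these assumptions the 3B weights alone come to roughly 11 GiB, more than the 8GB on the gtx 1080, while the 1.1B weights come to roughly 4 GiB.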

Sample output (screenshot)

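Use cases

At present, APIs such as OpenAI's come with usage fees and require careful vetting of the information sent to them.
If the model can run locally or in a closed environment, those constraints go away, which is welcome.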

setup

Create a Hugging Face account and an access token

The following article walks through account creation:
https://zenn.dev/protoout/articles/73-hugging-face-setup
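
Once the token exists, one way to avoid pasting it into every command is to keep it in an environment variable. The sketch below assumes a variable named HF_TOKEN and a helper called load_token. Both names are illustrative choices, not something this article prescribes.

```python
import os


def load_token(env=None, var_name="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; create a token and export it first")
    if not token.startswith("hf_"):
        # The placeholder tokens in this article all start with "hf_".
        raise ValueError(f"{var_name} does not look like a Hugging Face token")
    return token
```

The returned string can then be passed wherever the commands in this article use a literal hf_xxxxxx placeholder.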

Request access to Llama

Use the request form at the bottom of the page.
Access was granted in about 1 to 2 hours.

install git lfs

brew install git-lfs

clone repo

git clone https://huggingface.co/apple/OpenELM
cd OpenELM

setup

https://huggingface.co/apple/OpenELM#setup
Follow the steps at the link above to install the required packages.

Verifying that it works

prompt='Write a story about Richard Phillips Feynman'


python generate_openelm.py \
    --model apple/OpenELM-270M-Instruct \
    --hf_access_token hf_xxxxxx \
    --prompt "${prompt}" \
    --generate_kwargs repetition_penalty=3.2


python generate_openelm.py \
    --model apple/OpenELM-450M-Instruct \
    --hf_access_token hf_xxxxxxxxxxxx \
    --prompt "${prompt}" \
    --generate_kwargs repetition_penalty=3.2


python generate_openelm.py \
    --model apple/OpenELM-1_1B \
    --hf_access_token hf_xxxxxxxxxxxxx \
    --prompt "${prompt}" \
    --generate_kwargs repetition_penalty=3.2


python generate_openelm.py \
    --model apple/OpenELM-3B-Instruct \
    --hf_access_token hf_xxxxxxxxxxxxxxxxx \
    --prompt "${prompt}" \
    --generate_kwargs repetition_penalty=3.2

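Replace the access token with your own.
When the script runs, it prints some generated text.
Moving to a larger model improves output quality but needs considerably more processing power.
- a linear increase in run time alone would be acceptable, but larger models also need more physical memory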
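
The four commands above differ only in the model name, so they can be driven from a loop, which also makes it easy to record how long each run takes. This is a minimal sketch, assuming it is run from the cloned OpenELM directory. build_command and timed_run are illustrative helper names, and the measured wall-clock time includes model loading (and any first-time download), not just generation.

```python
# Run generate_openelm.py for several models and time each run.
# Assumptions: executed from the cloned OpenELM directory, and the
# token placeholder is replaced by the caller. Helper names are illustrative.
from __future__ import annotations

import subprocess
import sys
import time

MODELS = [
    "apple/OpenELM-270M-Instruct",
    "apple/OpenELM-450M-Instruct",
    "apple/OpenELM-1_1B",
    "apple/OpenELM-3B-Instruct",
]


def build_command(model: str, token: str, prompt: str) -> list[str]:
    """Build the same command line used in this article for one model."""
    return [
        sys.executable, "generate_openelm.py",
        "--model", model,
        "--hf_access_token", token,
        "--prompt", prompt,
        "--generate_kwargs", "repetition_penalty=3.2",
    ]


def timed_run(command: list[str]) -> tuple[int, float]:
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(command)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    token = "hf_xxxxxx"  # placeholder, replace with your own token
    prompt = "Write a story about Richard Phillips Feynman"
    for model in MODELS:
        code, seconds = timed_run(build_command(model, token, prompt))
        print(f"{model}: exit code {code}, {seconds:.1f} s")
```

A non-zero exit code for the largest model is one way to spot the out-of-memory case without reading the full traceback.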
