RubyでQwen2クイックスタート(pycall.rbを使用)

Posted at 2024-06-08

はじめに

LLMを使ったプログラムを書いてみようと思って、手始めにQwen2のクイックスタートをRubyでやってみました。pycall.rbを経由してHuggingface Transformersを使う手法なので他のモデルでも使用可能かと思います。

前提

RTX 3060
Ubuntu 20.04 LTS
Ruby 3.3.2
Python 3.8.10

実装

Quickstartのサンプルコードほぼそのままです。

quickstart.rb

require 'pycall'
device = 'cuda'
path = "/path/to/models/Qwen_Qwen2-7B-Instruct"

quantization_config =
  PyCall.import_module('transformers').BitsAndBytesConfig.new(
    load_in_4bit: true,
    bnb_4bit_use_double_quant: true,
    bnb_4bit_quant_type: 'nf4',
    bnb_4bit_compute_dtype: PyCall.import_module('torch').bfloat16
  )
model =
  PyCall.import_module('transformers').AutoModelForCausalLM.from_pretrained(
    path,
    device_map: 'auto',
    quantization_config:
  )
tokenizer = PyCall.import_module("transformers").AutoTokenizer.from_pretrained(path)

prompt = "Give me a short introduction to large language model. The answer must be in Japanese."

messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: prompt },
]

text = tokenizer.apply_chat_template(messages, tokenize: false, add_generation_prompt: true)
model_inputs = tokenizer.call([text], return_tensors: 'pt').to(device: )
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens: 4096)

input = tokenizer.batch_decode(model_inputs.input_ids, skip_special_tokens: true).first
response = tokenizer.batch_decode(generated_ids, skip_special_tokens: true).first

puts response.sub(input, '')

今回使用するgemはpycallのみ。

Gemfile

source "https://rubygems.org"

gem "pycall", "~> 1.5"

Python側の依存関係はこの4つ。

requirements.txt

transformers
bitsandbytes
torch
accelerate

実行

$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ bundle
(venv) $ pip install -r requirements.txt 
(venv) $ bundle exec ruby quickstart.rb 
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.21s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
大規模言語モデルとは、大量のデータを使用して学習したコンピューターシステムで、テキストを生成したり、質問に答えたりする能力があります。これらのモデルは、人工知能の一部として利用され、広範な分野で使用されています。例えば、翻訳、エイリアスの生成、テキストの完結、会話の理解などです。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up