DeepSeekに画像を入力する【Colab付き】

Posted at 2025-02-01

画像を入力してテキストを生成する例です。
速い、安い、かんたん。

1, vllmをインストール

pip install vllm

2, deepseekマルチモーダルモデルを初期化

from vllm import LLM
llm = LLM(model="deepseek-ai/deepseek-vl2-tiny", hf_overrides={"architectures": ["DeepseekVLV2ForCausalLM"]},dtype='float16')

3,実行(T4GPU 0.65秒)

image = PIL.Image.open("menu.jpg")
prompt = "USER: <image>\nHow much is an espresso?\nASSISTANT:"

outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image},
})

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

The price for an espresso is listed as 4.0.
エスプレッソの価格は 4.0 と記載されています。

マルチモーダルがこんなにかんたんに使えるとありがたいです。

🐣

フリーランスエンジニアです。
お仕事のご相談こちらまで
rockyshikoku@gmail.com

Core MLを使ったアプリを作っています。
機械学習関連の情報を発信しています。

Twitter
Medium
GitHub

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up