
Running text-generation-webui on a Google Colab GPU

Posted at 2023-11-12

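Installation (CPU processing only)

text-generation-webui provides a user interface for LLMs (large language models).
I have not yet gotten it working quite the way I want, but I tested it and am leaving these notes as a record.

Following the installation steps in the GitHub README, I run it on Google Colaboratory.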

# Install text-generation-webui
!git clone https://github.com/oobabooga/text-generation-webui
%cd /content/text-generation-webui
!pip install -r requirements.txt

# Run text-generation-webui
%cd /content/text-generation-webui
!python server.py --share

When a "llama.cpp" model is used, the command above runs everything on the CPU and does not use the GPU.

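Installation (with GPU processing)

The GitHub README links to a Colab notebook near the bottom of the page:

Google Colab notebook
https://colab.research.google.com/github/oobabooga/text-generation-webui/blob/main/Colab-TextGen-GPU.ipynb

With this notebook, the GPU can be used.
The code in "Colab-TextGen-GPU.ipynb" also includes a section that loads an LLM, but since models can be downloaded from within text-generation-webui after installation, I left that part out when running it.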

import torch
from pathlib import Path

# Some of the comments were written with the help of Google Bard

if Path.cwd().name != 'text-generation-webui':
  print("Installing the webui...")

  # Clone the text-generation-webui repository
  !git clone https://github.com/oobabooga/text-generation-webui

  # Change the current directory to the cloned repository
  %cd text-generation-webui

  # Get the installed PyTorch version
  torver = torch.__version__
  print(f"TORCH: {torver}")

  # Detect the CUDA build of PyTorch so the requirements can be adjusted to match
  is_cuda118 = '+cu118' in torver  # 2.1.0+cu118
  is_cuda117 = '+cu117' in torver  # 2.0.1+cu117

  # Read requirements.txt and split it into a list of lines
  textgen_requirements = open('requirements.txt').read().splitlines()
  # Apply string replacements to every line:
  # if is_cuda117 is True, replace '+cu121' and '+cu122' with '+cu117', and 'torch2.1' with 'torch2.0';
  # if is_cuda118 is True, replace '+cu121' and '+cu122' with '+cu118'
  if is_cuda117:
      textgen_requirements = [req.replace('+cu121', '+cu117').replace('+cu122', '+cu117').replace('torch2.1', 'torch2.0') for req in textgen_requirements]
  elif is_cuda118:
      textgen_requirements = [req.replace('+cu121', '+cu118').replace('+cu122', '+cu118') for req in textgen_requirements]
  # Open a temporary file named temp_requirements.txt for writing
  with open('temp_requirements.txt', 'w') as file:
      # Write the modified list to the temporary file, one requirement per line
      file.write('\n'.join(textgen_requirements))

  # Install the required packages
  !pip install -r extensions/api/requirements.txt --upgrade
  !pip install -r temp_requirements.txt --upgrade

  # If this warning appears, ignore it
  print("\033[1;32;1m\n --> If you see a warning about \"previously imported packages\", just ignore it.\033[0;37;0m")

  # No restart is required
  print("\033[1;32;1m\n --> There is no need to restart the runtime.\n\033[0;37;0m")

  # Uninstall flash_attn if it cannot be imported
  try:
    import flash_attn
  except Exception:
    !pip uninstall -y flash_attn

# If a "RESTART" button appears after running the code above, restart the runtime.
# Running the code above again afterwards updates the required libraries and the errors go away.
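
The version-dependent replacement in the cell above is easier to follow when pulled out into a plain function. The following is only a sketch of that same string-replacement logic: the function name `adapt_requirements` and the sample requirement line are hypothetical and invented for illustration; they do not appear in the notebook or in the real requirements.txt.

```python
def adapt_requirements(lines, torch_version):
    """Rewrite CUDA/torch tags in requirement lines to match the installed PyTorch build."""
    if "+cu117" in torch_version:
        return [
            line.replace("+cu121", "+cu117")
                .replace("+cu122", "+cu117")
                .replace("torch2.1", "torch2.0")
            for line in lines
        ]
    if "+cu118" in torch_version:
        return [line.replace("+cu121", "+cu118").replace("+cu122", "+cu118") for line in lines]
    # Any other build: leave the requirements untouched.
    return list(lines)


# Hypothetical requirement lines, made up for this example only.
sample = ["example_pkg-1.0+cu121torch2.1-cp310-linux_x86_64.whl", "numpy"]

for version in ("2.0.1+cu117", "2.1.0+cu118", "2.1.0+cu121"):
    print(version, "->", adapt_requirements(sample, version))
```

Lines without a CUDA tag, such as the plain `numpy` entry, pass through unchanged in every case.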

To use the GPU with "llama.cpp", start the server with the "--n-gpu-layers" option.

# Start text-generation-webui. The "--share" option generates a public URL
# "--n-gpu-layers" is the number of layers to offload to the GPU; it is what enables GPU use
!python server.py --share --n-gpu-layers 128
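
When experimenting with different layer counts, it can be convenient to assemble the launch command in Python rather than editing the cell by hand. This is a minimal sketch: the helper `build_server_command` is hypothetical, and it only knows the two options used in this article (`--share` and `--n-gpu-layers`).

```python
import shlex


def build_server_command(share=True, n_gpu_layers=None):
    """Assemble the server.py command line from the two options used in this article."""
    args = ["python", "server.py"]
    if share:
        args.append("--share")
    if n_gpu_layers is not None:
        # Omitting the option reproduces the CPU-only launch shown earlier.
        args += ["--n-gpu-layers", str(n_gpu_layers)]
    return shlex.join(args)


cmd = build_server_command(n_gpu_layers=128)
print(cmd)
```

In a Colab cell, the resulting string should be runnable with `!{cmd}`, assuming the current directory is the text-generation-webui repository.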

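Setting the GPU from the interface

The "--n-gpu-layers" parameter of the launch command above can also be set on the text-generation-webui settings screen.

・Set "--n-gpu-layers"

For the LLM, I downloaded and used "ELYZA-japanese-Llama-2-7b-fast-instruct", a model for which ELYZA performed additional pre-training on top of Meta's "Llama 2".

・Run inference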
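
One way to confirm that layers were actually offloaded is to compare GPU memory usage before and after loading the model. The sketch below is not from the original notebook: the helper names are hypothetical, it assumes `nvidia-smi` is available on the runtime, and the sample line passed to `parse_memory_line` is an invented value used only to show the expected format.

```python
import shutil
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into a tuple of ints."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def gpu_memory_mib():
    """Return (used, total) in MiB for the first GPU, or None if it cannot be read."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # no NVIDIA tools on PATH, e.g. a CPU-only runtime
    cmd = [exe, "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=True).stdout
        return parse_memory_line(out.strip().splitlines()[0])
    except (OSError, subprocess.SubprocessError, ValueError, IndexError):
        return None


print(parse_memory_line("1234, 15360"))  # invented sample line
print(gpu_memory_mib())
```

If the used value barely changes after the model is loaded, the model is probably still running on the CPU.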

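Changing the model loader

This is an example using a "Transformers" model.
This loader appears to use the GPU by default.
I used "CyberAgentLM2-7B-Chat", an in-house LLM that CyberAgent released on November 2, 2023.

・Load the model

・Run inference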

