[Oracle Database 23ai] Trying Sentiment Analysis Inside the Database (2025/04/26)

Posted at 2025-04-26, last updated at 2025-04-26

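Introduction

Oracle Database 23ai (23.7) and Oracle Machine Learning for Python (OML4Py) 2.1 add ONNX pipeline support for text classification.
An ONNX pipeline model of this kind takes a text string as input and computes the probability that the string belongs to a given label.
I used this feature to run sentiment analysis on customer feedback (free text) inside the database.

The Hugging Face repository also hosts Transformer models that can be used for text classification. In addition to the embedding layers, these models have a classification "head".
The head converts the model output from an embedding vector into a list of labels and probabilities. Generating a text classification pipeline with OML4Py 2.1 makes text-based classification tasks such as sentiment analysis possible.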
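To make the role of the classification head concrete, here is a minimal sketch of turning raw head scores (logits) into labels and probabilities. It assumes softmax normalization; the actual pipeline may normalize differently (a multi-label model may instead apply a sigmoid to each logit). The function names, the three toy labels, and the logit values are all hypothetical.

```python
import math


def head_to_probabilities(logits, labels):
    """Map raw head scores (logits) to {label: probability} with softmax.

    Assumption: softmax is used here for illustration only.
    """
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def top_prediction(logits, labels):
    """Return the most probable label and its probability."""
    probs = head_to_probabilities(logits, labels)
    best = max(probs, key=probs.get)
    return best, probs[best]


# Toy values, not real model output
toy_labels = ["joy", "sadness", "neutral"]
toy_logits = [2.0, -1.0, 0.5]
label, prob = top_prediction(toy_logits, toy_labels)
print(label, f"{prob * 100:.3f}")
```

The pair returned by `top_prediction` corresponds roughly to what the SQL functions PREDICTION and PREDICTION_PROBABILITY report for a single input.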

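Prerequisites

An Oracle Database 23ai (23.7 or later) or Autonomous Database (23ai) instance
- This article uses Base Database 23.7
An OML4Py 2.1 client environment
- Prepare the environment by following the articles below, and confirm that you can connect to the database
  - Trying the in-database multimodal embedding added in Oracle Database RU 23.7
  - Generating vectors with an in-database embedding model in Oracle Autonomous Database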

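Loading a text classification pipeline with label metadata into the database

Use SamLowe/roberta-base-go_emotions as the text classification model
Specify the default label set with labels
Pass the database connection details (user name, password, TNS service name) to oml.connect
Load the model into Oracle Database 23ai as "emotions"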

from oml.utils import ONNXPipeline, ONNXPipelineConfig, MiningFunction
import oml

# Text template; the label set is passed through the labels argument
config = ONNXPipelineConfig.from_template("text", max_seq_length=512,
  labels=["admiration","amusement","anger","annoyance","approval","caring","confusion","curiosity","desire","disappointment","disapproval","disgust","embarrassment",
          "excitement","fear","gratitude","grief","joy","love","nervousness","optimism","pride","realization","relief","remorse","sadness","surprise","neut"])
# Build a classification pipeline from the Hugging Face model
pipeline = ONNXPipeline("SamLowe/roberta-base-go_emotions", config=config, function=MiningFunction.CLASSIFICATION)
# Connect to the database, then load the pipeline as the model "emotions"
oml.connect("<DBUSER>", "<DBPassword>", dsn="<connection string>")
pipeline.export2db("emotions")

This article classifies text into the following 28 labels:

"admiration","amusement","anger","annoyance","approval","caring","confusion",
"curiosity","desire","disappointment","disapproval","disgust","embarrassment","excitement",
"fear","gratitude","grief","joy","love","nervousness","optimism",
"pride","realization","relief","remorse","sadness","surprise","neut"

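Because the label list is typed by hand, a small check before building the pipeline configuration can catch a missing or duplicated entry. The sketch below is one way to do that; `validate_labels` is a hypothetical helper, not part of OML4Py, and the list is copied from this article.

```python
def validate_labels(labels, expected_count):
    """Return a copy of labels if the list is well-formed, else raise ValueError."""
    if len(labels) != expected_count:
        raise ValueError(f"expected {expected_count} labels, got {len(labels)}")
    seen = set()
    for label in labels:
        if not isinstance(label, str) or not label.strip():
            raise ValueError(f"invalid label: {label!r}")
        if label in seen:
            raise ValueError(f"duplicate label: {label!r}")
        seen.add(label)
    return list(labels)


# The 28 labels used in this article
EMOTION_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion",
    "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment", "excitement",
    "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism",
    "pride", "realization", "relief", "remorse", "sadness", "surprise", "neut",
]

checked = validate_labels(EMOTION_LABELS, expected_count=28)
print(len(checked), "labels OK")
```

The checked list could then be passed as the `labels` argument when creating the configuration.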
Classifying (labeling) and scoring text strings with SQL

Classify and score the following strings:

'Today is a good day'
'I am in a bad mood today.'

SQL> col PROB format 999.999
SQL> select prediction(emotions using 'Today is a good day' as data) EMOTION, prediction_probability(emotions using 'Today is a good day' as data)*100 PROB from dual;

EMOTION            PROB
-------------- --------
joy              84.207

SQL> select prediction(emotions using 'I am in a bad mood today.' as data) EMOTION, prediction_probability(emotions using 'I am in a bad mood today.' as data)*100 PROB from dual;

EMOTION            PROB
-------------- --------
sadness          47.918
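The same query can also be issued from a Python client. The following is a minimal sketch using the python-oracledb driver; it assumes the text can be supplied as a bind variable (`:text`) in the USING clause, which I have not verified against this exact model. `build_prediction_sql` and `classify` are hypothetical helper names. Because a model name cannot be bound, it is validated as a plain identifier before being placed in the SQL text.

```python
import re

# Conservative pattern for an unquoted Oracle identifier
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def build_prediction_sql(model_name):
    """Build the scoring query for a model, with the input text as bind :text."""
    if not _IDENTIFIER.fullmatch(model_name):
        raise ValueError(f"not a valid model name: {model_name!r}")
    return (
        f"SELECT PREDICTION({model_name} USING :text AS data) AS emotion, "
        f"PREDICTION_PROBABILITY({model_name} USING :text AS data) * 100 AS prob "
        "FROM dual"
    )


def classify(connection, text, model_name="emotions"):
    """Run the query on an open python-oracledb connection; returns (emotion, prob)."""
    with connection.cursor() as cursor:
        cursor.execute(build_prediction_sql(model_name), text=text)
        return cursor.fetchone()


print(build_prediction_sql("emotions"))

# Usage (requires a reachable database; placeholders as in the article):
#   import oracledb
#   with oracledb.connect(user="<DBUSER>", password="<DBPassword>", dsn="<connection string>") as conn:
#       print(classify(conn, "Today is a good day"))
```

Binding the text rather than concatenating it into the statement avoids quoting problems when a review contains an apostrophe.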

Classifying and scoring table data

Store the comments in the Review column of the following table.

CREATE TABLE CUST_COMMENT
(
ID number,
CUST_ID number,
PROD_ID number,
Review Varchar2(1024),
Emotion Varchar2(50),
PROB number
);

Classify and score each comment

update cust_comment
set emotion = prediction(emotions using Review as data)
, PROB = prediction_probability(emotions using Review as data)*100;

The image at the top of this article shows the generated results.

Supplement (1): Loading the model with SQL

Export the model from the Hugging Face repository (Python)
Export the model as the file emotions.onnx into the testoutput directory

from oml.utils import ONNXPipeline, ONNXPipelineConfig, MiningFunction
config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
pipeline = ONNXPipeline("SamLowe/roberta-base-go_emotions", config=config, function=MiningFunction.CLASSIFICATION)
# Writes emotions.onnx into the testoutput directory
pipeline.export2file("emotions", "testoutput")

Place the file in a directory object on the database server and import the model with SQL
Place emotions.onnx in the "DATASET_DIR" directory and load it as the model emotions

BEGIN
    DBMS_VECTOR.LOAD_ONNX_MODEL(
    'DATASET_DIR',
    'emotions.onnx',
    'emotions',
    JSON('{"function":"classification","classificationProbOutput":"logits","input":{"input":["DATA"]},
        "labels":["admiration","amusement","anger","annoyance","approval","caring","confusion","curiosity",
                  "desire","disappointment","disapproval","disgust","embarrassment","excitement","fear","gratitude",
                  "grief","joy","love","nervousness","optimism","pride","realization","relief","remorse","sadness","surprise","neut"]}')
    );
END;
/

Supplement (2): Tokenizer classes available for text classification pipelines

Tokenizer classes available in OML4Py 2.1 (as of April 2025)

Tokenizer class	Tokenizer type	Japanese support
transformers.models.bert.BertTokenizer	BERT	Yes
transformers.models.clip.CLIPTokenizer	CLIP	No
transformers.models.distilbert.DistilBertTokenizer	BERT	Yes
transformers.models.gpt2.GPT2Tokenizer	GPT2	Yes
transformers.models.mpnet.MPNetTokenizer	BERT	No
transformers.models.roberta.tokenization_roberta.RobertaTokenizer	ROBERTA	No
transformers.models.xlm_roberta.XLMRobertaTokenizer	SENTENCEPIECE	Yes

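The tokenizer is configured automatically from the tokenizer class set on the Hugging Face model.
If the tokenizer class set on the model you load is not supported by OML4Py 2.1, an error is raised.
At present, the size limit for an importable ONNX file is 1 GB, which makes it hard to work with multilingual multimodal embedding models.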
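Before exporting a model, it can help to check its tokenizer class against the table above. The sketch below encodes that table as a plain dictionary keyed by class name. It is only a snapshot of the list in this article (April 2025), not OML4Py's own check, and `check_tokenizer` is a hypothetical helper.

```python
# Snapshot of the table in this article: class name -> (tokenizer type, Japanese support)
SUPPORTED_TOKENIZERS = {
    "BertTokenizer": ("BERT", True),
    "CLIPTokenizer": ("CLIP", False),
    "DistilBertTokenizer": ("BERT", True),
    "GPT2Tokenizer": ("GPT2", True),
    "MPNetTokenizer": ("BERT", False),
    "RobertaTokenizer": ("ROBERTA", False),
    "XLMRobertaTokenizer": ("SENTENCEPIECE", True),
}


def check_tokenizer(class_path):
    """Look up a tokenizer class given its short name or full dotted path.

    Returns (tokenizer_type, japanese_supported); raises ValueError if unlisted.
    """
    class_name = class_path.rsplit(".", 1)[-1]
    if class_name not in SUPPORTED_TOKENIZERS:
        raise ValueError(f"tokenizer class not in the supported list: {class_name}")
    return SUPPORTED_TOKENIZERS[class_name]


tok_type, japanese_ok = check_tokenizer(
    "transformers.models.roberta.tokenization_roberta.RobertaTokenizer"
)
print(tok_type, japanese_ok)
```

For the model used in this article, the lookup reports the ROBERTA type without Japanese support, which is consistent with the English-only test sentences used above.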

Conclusion

With a model loaded into the database, text classification (sentiment analysis) can now be performed inside the database.
