Databricks Japan K.K.

New AI Functions in Databricks SQL

Posted at 2024-03-02

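It has been a while since I wrote my earlier post on this topic, and in the meantime a number of functions have been added.

This post walks through the newly added AI Functions.

Note
To use these AI Functions, your Databricks workspace must be in one of the supported regions.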

Analyzing sentiment with `ai_analyze_sentiment`

ai_analyze_sentiment determines the sentiment of a text.

-- '嬉しいです' means "I am happy".
SELECT ai_analyze_sentiment('嬉しいです');

ai_analyze_sentiment(嬉しいです)
positive

-- '悲しいです' means "I am sad".
SELECT ai_analyze_sentiment('悲しいです');

ai_analyze_sentiment(悲しいです)
negative
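Because ai_analyze_sentiment is called like any other SQL function, it should also accept a column instead of a literal. The following is a minimal sketch under that assumption. The inline `reviews` data and its `review` column are hypothetical and are only there to keep the example self-contained.

```sql
-- Hypothetical inline data standing in for a real table of customer reviews.
WITH reviews AS (
  SELECT * FROM VALUES
    ('The delivery was fast and the staff were friendly.'),
    ('The product arrived broken.'),
    ('It works as described.')
  AS t(review)
)
-- Label each row with the sentiment returned by the function.
SELECT
  review,
  ai_analyze_sentiment(review) AS sentiment
FROM reviews;
```

Each row triggers a model call, so on a large table it is probably worth trying the query on a small sample first.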

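Classifying text with `ai_classify`

ai_classify assigns the given text to one of the labels you provide.

-- The text means "My password has been leaked."; the labels mean "urgent" and "not urgent".
SELECT ai_classify('私のパスワードが漏洩しました。', ARRAY('緊急', '緊急ではない'));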

ai_classify(私のパスワードが漏洩しました。, array(緊急, 緊急ではない))
緊急

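Extracting named entities with `ai_extract`

ai_extract extracts the entities matching the specified labels from the given text. In other words, it performs named entity recognition (NER).

-- The text means "Taro Yamada lives in Tokyo and works at Databricks";
-- the labels mean "person", "location", and "company".
SELECT ai_extract(
    '山田太郎さんは東京に住んでおり、Databricksで働いています',
    array('人物', '場所', '企業')
  );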

ai_extract(山田太郎さんは東京に住んでおり、Databricksで働いています, array(人物, 場所, 企業))
{"人物":"山田太郎さん","場所":"東京","企業":"Databricks"}

This works well.
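The result above is displayed as one key-value object with a field per label. Assuming the return value is a STRUCT whose field names are the labels (an assumption based on that output, not something verified here), individual entities could be read with dot notation. The sketch below uses English labels and a made-up sentence to avoid quoting non-ASCII field names.

```sql
-- Assumption: ai_extract returns a STRUCT with one field per label.
WITH extracted AS (
  SELECT ai_extract(
    'John Doe lives in New York and works for Acme Corp.',
    array('person', 'location', 'organization')
  ) AS entities
)
-- Read each extracted entity as its own column.
SELECT
  entities.person,
  entities.location,
  entities.organization
FROM extracted;
```

With Japanese labels such as 人物, the field name would presumably need to be wrapped in backticks.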

Fixing grammar mistakes with `ai_fix_grammar`

ai_fix_grammar corrects grammatical errors in the given text.

With English:

SELECT ai_fix_grammar('This sentence have some mistake');

ai_fix_grammar(This sentence have some mistake)
This sentence has some mistakes

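It works fairly well with Japanese too.

-- '今日を雨です' uses the wrong particle (を); the intended sentence is "It is rainy today".
SELECT ai_fix_grammar('今日を雨です');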

ai_fix_grammar(今日を雨です)
今日は雨です

Generating text with `ai_gen`

ai_gen generates text based on the given prompt.

Note
At the moment it cannot generate Japanese text, so the ai_generate_text function may be a better choice for that.

SELECT ai_gen('Generate a concise, cheerful email title for a summer bike sale with 20% discount');

ai_gen(Generate a concise, cheerful email title for a summer bike sale with 20% discount)
🎉 Summer Bike Sale: Grab Your Dream Bike at 20% Off! 🚲☀️

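Masking information with `ai_mask`

This one looks practical. ai_mask masks the specified entities in the given text.

-- The text means "Taro Yamada lives in Tokyo. His email address is taro.yamada@example.com";
-- the labels mean "person" and "email".
SELECT ai_mask(
    '山田太郎さんは東京に住んでいます。彼のメールアドレスは taro.yamada@example.com です',
    array('人物', 'メール')
  );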

ai_mask(山田太郎さんは東京に住んでいます。彼のメールアドレスは taro.yamada@example.com です, array(人物, メール))
[MASKED]は東京に住んでいます。彼のメールアドレスは [MASKED] です

It works perfectly with Japanese too.

Computing semantic similarity between strings with `ai_similarity`

ai_similarity computes the semantic similarity between two given strings.

SELECT ai_similarity('Apache Spark', 'Apache Spark');

ai_similarity(Apache Spark, Apache Spark)
1

SELECT ai_similarity('Databricks', 'Apache Spark');

ai_similarity(Databricks, Apache Spark)
0.6729488

-- 'ポパイ' is Popeye and 'ほうれん草' is spinach.
SELECT ai_similarity('ポパイ', 'ほうれん草');

ai_similarity(ポパイ, ほうれん草)
0.77520055
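One natural use of such a score is ranking candidates against a reference string. The following is a minimal sketch of that idea. The inline `candidates` data and its `name` column are hypothetical, and the resulting order depends on the underlying model.

```sql
-- Hypothetical candidate strings to compare against a reference string.
WITH candidates AS (
  SELECT * FROM VALUES
    ('Apache Spark'),
    ('Delta Lake'),
    ('Popeye')
  AS t(name)
)
-- Score each candidate against 'Databricks' and list the closest first.
SELECT
  name,
  ai_similarity('Databricks', name) AS score
FROM candidates
ORDER BY score DESC;
```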

Summarizing with `ai_summarize`

ai_summarize summarizes the given text to the specified number of words. However, it does not work with Japanese.

SELECT ai_summarize(
    'Apache Spark is a unified analytics engine for large-scale data processing. ' ||
    'It provides high-level APIs in Java, Scala, Python and R, and an optimized ' ||
    'engine that supports general execution graphs. It also supports a rich set ' ||
    'of higher-level tools including Spark SQL for SQL and structured data ' ||
    'processing, pandas API on Spark for pandas workloads, MLlib for machine ' ||
    'learning, GraphX for graph processing, and Structured Streaming for incremental ' ||
    'computation and stream processing.',
    20
  );

ai_summarize(concat(concat(concat(concat(concat(concat(Apache Spark is a unified analytics engine for large-scale data processing. , It provides high-level APIs in Java, Scala, Python and R, and an optimized ), engine that supports general execution graphs. It also supports a rich set ), of higher-level tools including Spark SQL for SQL and structured data ), processing, pandas API on Spark for pandas workloads, MLlib for machine ), learning, GraphX for graph processing, and Structured Streaming for incremental ), computation and stream processing.), 20)
Apache Spark is a unified, multi-language analytics engine for large-scale data processing with additional tools for SQL, machine learning, graph processing, and stream computing.

Translating with `ai_translate`

ai_translate translates the given text into the specified language.

SELECT ai_translate('Hello, how are you?', 'ja');

ai_translate(Hello, how are you?, ja)
こんにちは、元気ですか？

This one also looks useful.

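It looks like SQL and LLMs together can handle all sorts of processing on data stored in your database. Please give these functions a try! (I hope they come to the Japan region soon.)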
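As a rough illustration of applying these functions to stored data, the sketch below combines several of them in one query. The inline `tickets` data with its `id` and `body` columns is hypothetical and stands in for a real table. The labels are also just examples.

```sql
-- Hypothetical support tickets standing in for a real table.
WITH tickets AS (
  SELECT * FROM VALUES
    (1, 'My password was leaked. Please contact me at jane@example.com.'),
    (2, 'How do I change my profile picture?')
  AS t(id, body)
)
SELECT
  id,
  -- Triage each ticket into one of two example labels.
  ai_classify(body, array('urgent', 'not urgent')) AS priority,
  -- Record the overall tone of the message.
  ai_analyze_sentiment(body) AS sentiment,
  -- Hide personal details before sharing the text more widely.
  ai_mask(body, array('person', 'email')) AS masked_body
FROM tickets;
```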
