Trying the Watson Language-Translation v2 API with curl

Posted at 2015-08-07

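On July 6, 2015, IBM announced on the Bluemix developer site that Watson's translation service (language-translation) had reached General Availability (GA). Language-translation replaces the earlier language-identification and machine-translation services, and its API, now at v2, has changed substantially.

This article walks through basic usage of the Language-Translation v2 API using curl.

Preparation

It is assumed that you already have an IBM ID (Bluemix account).

We will not use the Bluemix dashboard; everything is done from the CLI. So the first step is to install the Cloud Foundry CLI (cf).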

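# Example: installing on 64-bit Linux
# (the URL is quoted so that the shell does not interpret "&" as a background operator)
$ wget -O cf-linux64-6.12.2.tgz "https://cli.run.pivotal.io/stable?release=linux64-binary&version=6.12.2&source=github-rel"
$ sudo tar zxvf cf-linux64-6.12.2.tgz -C /usr/local/bin/
$ cf -v # OK if version information is printed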

Logging in to Bluemix and creating the service

Use the Cloud Foundry CLI to log in to Bluemix and create the service. If you only want to try the API with curl, you do not need to create an application. Instead, you create credentials, which contain the username, password, and URL used to access Watson. In Cloud Foundry terminology, credentials correspond to a service key.

# Point cf at Bluemix
$ cf api https://api.ng.bluemix.net

# Log in with cf
$ cf login -u <username>

# Create the service (service name: tr)
$ cf create-service language_translation standard tr

# Create the credentials (service key) (credentials name: tr-credentials)
$ cf create-service-key tr tr-credentials

# Check the credentials (returns JSON containing username, password, and url)
$ cf service-key tr tr-credentials # confirm

That is all it takes to get ready to use Watson Language-Translation from curl. From here on, the values from the credentials appear in commands as <username>, <password>, and <url>; substitute your own values as appropriate.
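
If you want to pull the three credential fields out of the `cf service-key` output programmatically, something like the following Python sketch may help. It assumes the output is a JSON object with `username`, `password`, and `url` keys, possibly preceded by a status line; the sample text and the function name `parse_service_key` are hypothetical and the values are placeholders.

```python
import json

def parse_service_key(cf_output: str) -> dict:
    """Extract the credentials JSON object from `cf service-key` output."""
    # Assumption: cf may print a status line before the JSON,
    # so take the span from the first "{" to the last "}".
    start = cf_output.index("{")
    end = cf_output.rindex("}") + 1
    creds = json.loads(cf_output[start:end])
    missing = {"username", "password", "url"} - set(creds)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return creds

# Hypothetical sample output; the values are placeholders, not real credentials.
sample = """Getting key tr-credentials for service instance tr...

{
 "password": "<password>",
 "url": "<url>",
 "username": "<username>"
}
"""
creds = parse_service_key(sample)
print(creds["username"], creds["url"])
```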

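/v2/identifiable_languages: Listing the identifiable languages

First, retrieve the list of languages that can be identified.

  • Method: GET
  • URL: /v2/identifiable_languages
  • Parameters: none
  • Response format: application/json

Calling it with curl looks like this. The response lists 62 languages.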

$ curl -u "<username>:<password>" "<url>/v2/identifiable_languages"
{
  "languages":[
    {
      "language":"af",
      "name":"Afrikaans"
    },
    {
      "language":"ar",
      "name":"Arabic"
    },
    {
      "language":"az",
      "name":"Azerbaijani"
    },
    {
      "language":"ba",
      "name":"Bashkir"
    },
    {
      "language":"be",
      "name":"Belarusian"
    },
    {
      "language":"bg",
      "name":"Bulgarian"
    },
    {
      "language":"bn",
      "name":"Bengali"
    },
    {
      "language":"bs",
      "name":"Bosnian"
    },
    {
      "language":"cs",
      "name":"Czech"
    },
    {
      "language":"cv",
      "name":"Chuvash"
    },
    {
      "language":"da",
      "name":"Danish"
    },
    {
      "language":"de",
      "name":"German"
    },
    {
      "language":"el",
      "name":"Greek"
    },
    {
      "language":"en",
      "name":"English"
    },
    {
      "language":"eo",
      "name":"Esperanto"
    },
    {
      "language":"es",
      "name":"Spanish"
    },
    {
      "language":"et",
      "name":"Estonian"
    },
    {
      "language":"eu",
      "name":"Basque"
    },
    {
      "language":"fa",
      "name":"Persian"
    },
    {
      "language":"fi",
      "name":"Finnish"
    },
    {
      "language":"fr",
      "name":"French"
    },
    {
      "language":"gu",
      "name":"Gujarati"
    },
    {
      "language":"he",
      "name":"Hebrew"
    },
    {
      "language":"hi",
      "name":"Hindi"
    },
    {
      "language":"ht",
      "name":"Haitian"
    },
    {
      "language":"hu",
      "name":"Hungarian"
    },
    {
      "language":"hy",
      "name":"Armenian"
    },
    {
      "language":"id",
      "name":"Indonesian"
    },
    {
      "language":"is",
      "name":"Icelandic"
    },
    {
      "language":"it",
      "name":"Italian"
    },
    {
      "language":"ja",
      "name":"Japanese"
    },
    {
      "language":"ka",
      "name":"Georgian"
    },
    {
      "language":"kk",
      "name":"Kazakh"
    },
    {
      "language":"km",
      "name":"Central Khmer"
    },
    {
      "language":"ko",
      "name":"Korean"
    },
    {
      "language":"ku",
      "name":"Kurdish"
    },
    {
      "language":"ky",
      "name":"Kirghiz"
    },
    {
      "language":"lt",
      "name":"Lithuanian"
    },
    {
      "language":"lv",
      "name":"Latvian"
    },
    {
      "language":"ml",
      "name":"Malayalam"
    },
    {
      "language":"mn",
      "name":"Mongolian"
    },
    {
      "language":"nb",
      "name":"Norwegian Bokmal"
    },
    {
      "language":"nl",
      "name":"Dutch"
    },
    {
      "language":"nn",
      "name":"Norwegian Nynorsk"
    },
    {
      "language":"pa",
      "name":"Panjabi"
    },
    {
      "language":"pl",
      "name":"Polish"
    },
    {
      "language":"ps",
      "name":"Pushto"
    },
    {
      "language":"pt",
      "name":"Portuguese"
    },
    {
      "language":"ro",
      "name":"Romanian"
    },
    {
      "language":"ru",
      "name":"Russian"
    },
    {
      "language":"sk",
      "name":"Slovakian"
    },
    {
      "language":"so",
      "name":"Somali"
    },
    {
      "language":"sq",
      "name":"Albanian"
    },
    {
      "language":"sv",
      "name":"Swedish"
    },
    {
      "language":"ta",
      "name":"Tamil"
    },
    {
      "language":"te",
      "name":"Telugu"
    },
    {
      "language":"tr",
      "name":"Turkish"
    },
    {
      "language":"uk",
      "name":"Ukrainian"
    },
    {
      "language":"ur",
      "name":"Urdu"
    },
    {
      "language":"vi",
      "name":"Vietnamese"
    },
    {
      "language":"zh",
      "name":"Chinese"
    },
    {
      "language":"zh-TW",
      "name":"Traditional Chinese"
    }
  ]
}
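
To check the count or look up a language name by code, the response can be loaded into a dictionary. The following is a minimal Python sketch that assumes the response has the shape shown above (a `languages` array of `language`/`name` pairs); the function name `language_names` is hypothetical and the sample is a three-entry excerpt, not the full response.

```python
import json

def language_names(response_text: str) -> dict:
    """Map language code -> language name from an identifiable_languages response."""
    data = json.loads(response_text)
    return {item["language"]: item["name"] for item in data["languages"]}

# Excerpt of the response above, shortened for illustration.
sample = """
{"languages":[
  {"language":"af","name":"Afrikaans"},
  {"language":"ja","name":"Japanese"},
  {"language":"zh-TW","name":"Traditional Chinese"}
]}
"""
names = language_names(sample)
print(len(names), names["ja"])
```

Running `len(names)` against the full response is a quick way to confirm the number of languages.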

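/v2/identify: Identifying a language

Can it really identify 62 languages? Let's try identifying one.

  • Method: POST or GET
  • URL: /v2/identify
  • Parameters: the text to identify
  • Response format: application/json or text/plain (selected with the Accept header)

Calling it with curl looks like this. In this example, the text is identified as Japanese with high confidence (0.998804), and every other candidate scores below 0.0001.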

$ curl -u "<username>:<password>" -X POST -H "Accept: application/json" -H "Content-Type: text/plain; charset=utf-8" -d "吾輩は猫である。名前はまだ無い。" <url>/v2/identify
{
  "languages":[
    {
      "language":"ja",
      "confidence":0.998804
    },
    {
      "language":"zh-TW",
      "confidence":4.23903E-5
    },
    {
      "language":"zh",
      "confidence":3.85035E-5
    },
    {
      "language":"vi",
      "confidence":1.89393E-5
    },
    {
      "language":"ur",
      "confidence":1.89393E-5
    },
    {
      "language":"uk",
      "confidence":1.89393E-5
    },
    {
      "language":"tr",
      "confidence":1.89393E-5
    },
    {
      "language":"te",
      "confidence":1.89393E-5
    },
    {
      "language":"ta",
      "confidence":1.89393E-5
    },
    {
      "language":"sv",
      "confidence":1.89393E-5
    },
    {
      "language":"sq",
      "confidence":1.89393E-5
    },
    {
      "language":"so",
      "confidence":1.89393E-5
    },
    {
      "language":"sk",
      "confidence":1.89393E-5
    },
    {
      "language":"ru",
      "confidence":1.89393E-5
    },
    {
      "language":"ro",
      "confidence":1.89393E-5
    },
    {
      "language":"ps",
      "confidence":1.89393E-5
    },
    {
      "language":"pl",
      "confidence":1.89393E-5
    },
    {
      "language":"pa",
      "confidence":1.89393E-5
    },
    {
      "language":"nn",
      "confidence":1.89393E-5
    },
    {
      "language":"nl",
      "confidence":1.89393E-5
    },
    {
      "language":"nb",
      "confidence":1.89393E-5
    },
    {
      "language":"ml",
      "confidence":1.89393E-5
    },
    {
      "language":"lv",
      "confidence":1.89393E-5
    },
    {
      "language":"lt",
      "confidence":1.89393E-5
    },
    {
      "language":"ky",
      "confidence":1.89393E-5
    },
    {
      "language":"ku",
      "confidence":1.89393E-5
    },
    {
      "language":"km",
      "confidence":1.89393E-5
    },
    {
      "language":"kk",
      "confidence":1.89393E-5
    },
    {
      "language":"ka",
      "confidence":1.89393E-5
    },
    {
      "language":"is",
      "confidence":1.89393E-5
    },
    {
      "language":"hy",
      "confidence":1.89393E-5
    },
    {
      "language":"hu",
      "confidence":1.89393E-5
    },
    {
      "language":"ht",
      "confidence":1.89393E-5
    },
    {
      "language":"hi",
      "confidence":1.89393E-5
    },
    {
      "language":"he",
      "confidence":1.89393E-5
    },
    {
      "language":"gu",
      "confidence":1.89393E-5
    },
    {
      "language":"fr",
      "confidence":1.89393E-5
    },
    {
      "language":"fi",
      "confidence":1.89393E-5
    },
    {
      "language":"fa",
      "confidence":1.89393E-5
    },
    {
      "language":"eu",
      "confidence":1.89393E-5
    },
    {
      "language":"et",
      "confidence":1.89393E-5
    },
    {
      "language":"es",
      "confidence":1.89393E-5
    },
    {
      "language":"eo",
      "confidence":1.89393E-5
    },
    {
      "language":"en",
      "confidence":1.89393E-5
    },
    {
      "language":"el",
      "confidence":1.89393E-5
    },
    {
      "language":"de",
      "confidence":1.89393E-5
    },
    {
      "language":"da",
      "confidence":1.89393E-5
    },
    {
      "language":"cv",
      "confidence":1.89393E-5
    },
    {
      "language":"cs",
      "confidence":1.89393E-5
    },
    {
      "language":"bs",
      "confidence":1.89393E-5
    },
    {
      "language":"bn",
      "confidence":1.89393E-5
    },
    {
      "language":"bg",
      "confidence":1.89393E-5
    },
    {
      "language":"be",
      "confidence":1.89393E-5
    },
    {
      "language":"ba",
      "confidence":1.89393E-5
    },
    {
      "language":"az",
      "confidence":1.89393E-5
    },
    {
      "language":"af",
      "confidence":1.89393E-5
    },
    {
      "language":"id",
      "confidence":1.88452E-5
    },
    {
      "language":"ar",
      "confidence":1.88235E-5
    },
    {
      "language":"mn",
      "confidence":1.87012E-5
    },
    {
      "language":"pt",
      "confidence":1.86689E-5
    },
    {
      "language":"it",
      "confidence":1.84846E-5
    },
    {
      "language":"ko",
      "confidence":1.7914E-5
    }
  ]
}
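
Since the response lists every candidate language, you usually only want the entry with the highest confidence. Here is a minimal Python sketch under the assumption that the response has the shape shown above; the function name `best_language` and the `min_confidence` threshold of 0.5 are hypothetical choices for illustration, not part of the API.

```python
import json

def best_language(response_text: str, min_confidence: float = 0.5):
    """Return the code of the most confident language, or None if it is too uncertain."""
    languages = json.loads(response_text)["languages"]
    top = max(languages, key=lambda item: item["confidence"])
    # Hypothetical cutoff: treat low-confidence results as "unknown".
    return top["language"] if top["confidence"] >= min_confidence else None

# Excerpt of the response above; note that JSON numbers such as 4.23903E-5 parse as floats.
sample = """
{"languages":[
  {"language":"ja","confidence":0.998804},
  {"language":"zh-TW","confidence":4.23903E-5},
  {"language":"zh","confidence":3.85035E-5}
]}
"""
print(best_language(sample))
```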

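/v2/models: Listing the translation models

Next comes translation. First, retrieve the list of translation models.

  • Method: GET
  • URL: /v2/models
  • Parameters: none required (you can optionally filter by source language, target language, and whether the model is a default model)
  • Response format: application/json

Calling it with curl looks like this. A model whose model_id has the form "source-target" belongs to the news domain, "source-target-conversational" to the conversational domain, and "source-target-patent" to the patent domain.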

$ curl -u "<username>:<password>" "<url>/v2/models"
{
  "models":[
    {
      "model_id":"ar-en",
      "source":"ar",
      "target":"en",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"ar-en-conversational",
      "source":"ar",
      "target":"en",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-ar",
      "source":"en",
      "target":"ar",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-ar-conversational",
      "source":"en",
      "target":"ar",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-es",
      "source":"en",
      "target":"es",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-es-conversational",
      "source":"en",
      "target":"es",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-fr",
      "source":"en",
      "target":"fr",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-fr-conversational",
      "source":"en",
      "target":"fr",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-pt",
      "source":"en",
      "target":"pt",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"en-pt-conversational",
      "source":"en",
      "target":"pt",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"es-en",
      "source":"es",
      "target":"en",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"es-en-conversational",
      "source":"es",
      "target":"en",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"es-en-patent",
      "source":"es",
      "target":"en",
      "base_model_id":"",
      "domain":"patent",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"fr-en",
      "source":"fr",
      "target":"en",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"fr-en-conversational",
      "source":"fr",
      "target":"en",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"ko-en-patent",
      "source":"ko",
      "target":"en",
      "base_model_id":"",
      "domain":"patent",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"pt-en",
      "source":"pt",
      "target":"en",
      "base_model_id":"",
      "domain":"news",
      "customizable":true,
      "default_model":true,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"pt-en-conversational",
      "source":"pt",
      "target":"en",
      "base_model_id":"",
      "domain":"conversational",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"pt-en-patent",
      "source":"pt",
      "target":"en",
      "base_model_id":"",
      "domain":"patent",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    },
    {
      "model_id":"zh-en-patent",
      "source":"zh",
      "target":"en",
      "base_model_id":"",
      "domain":"patent",
      "customizable":false,
      "default_model":false,
      "owner":"",
      "status":"available",
      "name":""
    }
  ]
}
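
To choose a model for a given language pair, you can filter this list on the client side. The following is a minimal Python sketch that assumes each model entry carries the `source`, `target`, `domain`, `default_model`, and `model_id` fields shown above; the helper names `default_model_id` and `domains_for` are hypothetical, and the sample keeps only the fields the helpers use.

```python
import json

def default_model_id(models, source, target):
    """Return the model_id of the default model for a language pair, or None."""
    for model in models:
        if (model["source"], model["target"]) == (source, target) and model["default_model"]:
            return model["model_id"]
    return None

def domains_for(models, source, target):
    """Return the sorted list of domains available for a language pair."""
    return sorted(
        model["domain"]
        for model in models
        if (model["source"], model["target"]) == (source, target)
    )

# Excerpt of the response above, reduced to the fields used here.
sample = """
{"models":[
  {"model_id":"es-en","source":"es","target":"en","domain":"news","default_model":true},
  {"model_id":"es-en-conversational","source":"es","target":"en","domain":"conversational","default_model":false},
  {"model_id":"es-en-patent","source":"es","target":"en","domain":"patent","default_model":false},
  {"model_id":"ko-en-patent","source":"ko","target":"en","domain":"patent","default_model":false}
]}
"""
models = json.loads(sample)["models"]
print(default_model_id(models, "es", "en"), domains_for(models, "es", "en"))
```

In the list above, Korean to English has only a patent model and no default model, so `default_model_id(models, "ko", "en")` returns `None` in this sketch.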

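/v2/translate: Translating

Finally, translation itself.

  • Method: POST or GET
  • URL: /v2/translate
  • Parameters: translation model, text to translate
  • Response format: application/json or text/plain (selected with the Accept header)

Let's try translating a few texts. (I cannot vouch for the accuracy of the translations.)

# Translate a sentence copied from the CNN (Spanish) site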
$ curl -u "<username>:<password>" -X POST -H "Accept: application/json" -H "Content-Type: application/json; charset=utf-8" -d '{"model_id":"es-en","text":["La desaprobación de la presidenta de Chile, Michelle Bachelet, llegó a una cifra histórica del 70%, de acuerdo con la más reciente encuesta de Adimark."]}' <url>/v2/translate
{
  "translations":[
    {
      "translation":"The disapproval of the President of Chile, Michelle Bachelet, reached a historical figure of 70%, according to the latest survey of madmen."
    }
  ],
  "word_count":25,
  "character_count":151
}
# Translate a sentence copied from the Al Jazeera site
$ curl -u "<username>:<password>" -X POST -H "Accept: application/json" -H "Content-Type: application/json; charset=utf-8" -d '{"model_id":"ar-en","text":["وصف وزير خارجية دولة قطر خالد العطية الاتفاق النووي الإيراني بأنه الخيار الأنسب، بينما أكد نظيره الأميركي جون كيري أن بلاده ملتزمة بضمان أمن واستقرار المنطقة"]}' <url>/v2/translate
{
  "translations":[
    {
      "translation":"The State of Qatar Foreign Minister Khalid al-Attiyah described the Iranian nuclear deal as the best option, while his American counterpart John Kerry stressed that his country is committed to ensuring the security and stability of the region."
    }
  ],
  "word_count":26,
  "character_count":157
}
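
The same POST request can be assembled outside of curl. The following Python sketch builds (but does not send) a request equivalent to the curl commands above and extracts the translated text from a response. It assumes the request body and response shapes shown above; the function names, the `https://example.invalid/api` base URL, and the `user`/`secret` credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request

def build_translate_request(url, username, password, model_id, text):
    """Build a POST request for <url>/v2/translate with HTTP Basic authentication."""
    body = json.dumps({"model_id": model_id, "text": [text]}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url.rstrip("/") + "/v2/translate",
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Accept": "application/json",
            "Content-Type": "application/json; charset=utf-8",
        },
    )

def extract_translations(response_text):
    """Return the translated strings from a /v2/translate JSON response."""
    return [item["translation"] for item in json.loads(response_text)["translations"]]

# Placeholder URL and credentials; nothing is sent over the network here.
request = build_translate_request(
    "https://example.invalid/api", "user", "secret", "es-en", "Hola, mundo."
)
print(request.get_method(), request.full_url)

# Response shaped like the ones above, with hypothetical content.
response = '{"translations":[{"translation":"Hello, world."}],"word_count":2,"character_count":12}'
print(extract_translations(response))
```

To actually send the request you would pass it to `urllib.request.urlopen`, substituting the values from your own credentials.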

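Summary

That was a quick tour of the Watson Language-Translation v2 API. It is very welcome that translation across a variety of languages, with domains specialized for conversation, news, and so on, is now this easy to obtain. It will be interesting to see how precise translation involving Japanese turns out to be.

This article did not cover /v2/models (PUT) or /v2/models/{model_id} (GET, DELETE). These endpoints are for extending existing translation models. I was able to call them, but I could not confirm that they produced a useful result, so I left them out this time. I hope to write a follow-up article once I have results.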
