
[Quick tip] Running a similar-document search with Watson Discovery

Last updated at 2018-07-02, posted at 2018-06-29

Introduction

Earlier this year (2018), Watson Discovery gained support for similar-document search.
I tried it out and confirmed that it works, so here is a note on how to call it.

Prerequisites

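I assume that the Discovery instance and collection have already been created and that the documents have already been uploaded.
For those steps, see for example the article "Loading a large volume of Japanese documents into Watson Discovery (full-support version)".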

The tests were run in a Jupyter Notebook on Watson Studio. For that setup as well, please refer to the article mentioned above.

Preliminary steps

Create the Discovery client, then obtain environment_id and collection_id.

# Per-user settings
# Copy the credential information from the management console.
credentials = {
  "url": "https://gateway.watsonplatform.net/discovery/api",
  "username": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "password": "xxxxxxxxxxxx"
}

# Load the libraries
import json
from watson_developer_cloud import DiscoveryV1

# Create the client with username/password authentication
discovery = DiscoveryV1(version = '2018-03-05',
                        url = credentials['url'],
                        username = credentials['username'],
                        password = credentials['password'])

# Get environment_id
# (index 1 assumes that index 0 is the built-in "system" environment)
environments = discovery.list_environments()['environments']
environment_id = environments[1]['environment_id']

print('environment_id: ', environment_id)

# Get collection_id
# (the logic below assumes there is only one collection)
collections = discovery.list_collections(environment_id = environment_id)['collections']
collection_id = collections[0]['collection_id']

print('collection_id: ', collection_id)
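
If the environment holds more than one collection, taking index 0 may pick the wrong one. As a minimal sketch, the lookup could instead match on the collection name. The helper `find_collection_id` and the sample data are hypothetical; the only assumption about the service is that each entry in the `collections` list carries `name` and `collection_id` keys.

```python
def find_collection_id(collections, name):
    """Return the collection_id of the entry whose 'name' matches, or None."""
    for collection in collections:
        if collection.get('name') == name:
            return collection.get('collection_id')
    return None

# Hypothetical data shaped like the 'collections' list of list_collections()
sample_collections = [
    {'collection_id': 'aaaa-1111', 'name': 'news'},
    {'collection_id': 'bbbb-2222', 'name': 'patents'},
]

print('collection_id: ', find_collection_id(sample_collections, 'patents'))
```

Returning None for an unknown name lets the caller decide how to handle a missing collection instead of hitting an IndexError.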

Calling the similar-search API

First, obtain by some means the id of the document that will serve as the basis for the similarity search.
Once you have it, call the API as follows.
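
One possible way to obtain that id, sketched under assumptions, is to run an ordinary query first and take the id of the top hit. The helper `first_document_id` and the sample response are hypothetical; the commented-out call assumes the `discovery` client and ids from the preliminary steps, and the query text is a placeholder.

```python
def first_document_id(response):
    """Return the id of the top-ranked result in a query response, or None."""
    results = response.get('results', [])
    if not results:
        return None
    return results[0].get('id')

# With a live service, the response might come from a call such as:
# response = discovery.query(environment_id = environment_id,
#                            collection_id = collection_id,
#                            natural_language_query = '...',
#                            count = 1)

# Hypothetical response, for illustration only
sample_response = {
    'matching_results': 1,
    'results': [{'id': 'doc-0001', 'result_metadata': {'score': 1.0}}],
}

print(first_document_id(sample_response))
```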

# Similar search
# Search for documents similar to id = "b5eb9c23-824a-4dc9-873e-9d17eaeb5a5d"
# (the names in return_fields are fields of the collection used for this test)

my_query = discovery.query(
    environment_id = environment_id,
    collection_id = collection_id,
    similar = 'true',
    similar_document_ids = ['b5eb9c23-824a-4dc9-873e-9d17eaeb5a5d'],
    return_fields = ['id', 'pat_id', 'pat_name', 'summury'])
print(json.dumps(my_query, indent=2, ensure_ascii=False))
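
Dumping the whole response as JSON can be hard to scan. As a rough sketch, the results could be reduced to one small row per document. The helper `summarize_results` and the sample data are hypothetical, and the code assumes that each entry under `results` holds an `id` plus a `result_metadata` object with a `score`; fields that are absent simply come back as None.

```python
def summarize_results(response, fields):
    """Build one dict per result containing id, score, and the requested fields."""
    rows = []
    for result in response.get('results', []):
        row = {
            'id': result.get('id'),
            'score': result.get('result_metadata', {}).get('score'),
        }
        for field in fields:
            row[field] = result.get(field)
        rows.append(row)
    return rows

# Hypothetical response, for illustration only
sample_query = {
    'matching_results': 2,
    'results': [
        {'id': 'doc-0002', 'result_metadata': {'score': 0.9},
         'pat_id': 'P-2', 'pat_name': 'Sample B'},
        {'id': 'doc-0003', 'result_metadata': {'score': 0.5},
         'pat_id': 'P-3'},
    ],
}

for row in summarize_results(sample_query, ['pat_id', 'pat_name']):
    print(row)
```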
