
Trying out everything Cohere has to offer

Posted at 2023-08-22


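About two months ago, the following press release was published.

The announcement says that, in partnership with Cohere, which provides an AI platform for enterprises, a new Generative AI Service is planned for OCI.
That means I should at least know the basics of Cohere! So, partly as a learning exercise, this article tries out the features Cohere currently provides, one by one, using OCI Data Science [1].

Note that the Trial Key issued for free use has fairly strict rate limits and a cap on the total number of calls, so please be careful.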
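Because the Trial Key is rate limited, calls made in a loop can fail partway through. The following is a minimal sketch of retrying with exponential backoff. `call_with_backoff` and its parameters are hypothetical helpers written for this illustration, not part of the Cohere SDK, and catching `Exception` is a placeholder that you would narrow to whatever error the SDK raises when the limit is hit.

```python
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep,
                      retry_on=(Exception,)):
    """Call fn(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))


# Stand-in for an API call that fails twice before succeeding.
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


# Record the waits instead of actually sleeping, so the demo runs instantly.
waits = []
result = call_with_backoff(flaky_call, sleep=waits.append)
print(result, waits)
```

In a notebook, the real call would be wrapped in a lambda, for example `call_with_backoff(lambda: co.generate(prompt='Hello, how'))`.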

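Models and features provided by Cohere

Looking through https://cohere.com/, you can see that two models are offered.

  • Command
    • Provides text generation models
    • Trained to follow user commands
  • Embeddings
    • Provides Embedding [2] models
    • Multilingual support

As for the features, everything except those that require joining a waitlist is listed in the API Reference.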

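endpoint          overview
/generate         Generates realistic text conditioned on the given input and returns it
/embed            Embeds the text and returns a list of floating-point numbers
/classify         Determines which label best fits the input text and returns the result
/tokenize         Splits the input text into tokens using byte-pair encoding (BPE) and returns the result
/detokenize       Takes BPE (byte-pair encoding) tokens and returns their text representation
/detect-language  Identifies which language the input text is written in and returns the result
/summarize        Generates a summary of the input text in English and returns the result
/rerank           Takes a query and a list of texts, and returns an ordered array in which each text is assigned a relevance score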




import cohere
api_key = '<your-api-key>'

co = cohere.Client(api_key)



res_generate = co.generate(
    prompt='Hello, how'
)
print(res_generate)  # show the returned generations



[cohere.Generation {
  id: 905bc1fa-cb5a-4082-b1d8-2704d426956b
  prompt: Hello, how
  text:  How can I help you?

  likelihood: None
  finish_reason: None
  token_likelihoods: None
}]

The text "How can I help you?" was generated and returned.


Example: use the nightly version and generate three texts of up to 50 tokens each

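res_generate = co.generate(
    prompt='Hello, how',
    model='command-nightly',  # nightly version of Command
    max_tokens=50, num_generations=3
)
print(res_generate)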


[cohere.Generation {
  id: 8eca3fb0-9a9b-4df2-997d-e5fba22ebfbb
  prompt: Hello, how
  text:  can I help you?
  likelihood: None
  finish_reason: None
  token_likelihoods: None
}, cohere.Generation {
  id: 2b5d7382-64cb-48a8-8bb0-33c574565def
  prompt: Hello, how
  text:  Hello! I'm a chatbot trained to be helpful and harmless. How can I assist you today?
  likelihood: None
  finish_reason: None
  token_likelihoods: None
}, cohere.Generation {
  id: 060903ac-241f-4a9f-bd46-32de661a0799
  prompt: Hello, how
  text:  are you doing today?
  likelihood: None
  finish_reason: None
  token_likelihoods: None
}]

Example: consider only the 5 most likely tokens (k=5) as candidates at each step, and set the randomness of generation to its maximum (temperature=5.0)

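res_generate = co.generate(
    prompt='Hello, how',
    k=5, temperature=5.0,  # top-5 candidates only, maximum randomness
    num_generations=3      # the output below contains three texts
)
print(res_generate)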


[cohere.Generation {
  id: 55f7482c-70ef-4846-be70-34af13df268a
  prompt: Hello, how
  text:  are you ?
  likelihood: None
  finish_reason: None
  token_likelihoods: None
}, cohere.Generation {
  id: d9286b01-bd12-4563-839b-6cf867f69873
  prompt: Hello, how
  text:  Id like to thank the user.  Is this how they would prefer me to address them?  If so I can change that to their username.
How may I be of service?  If I can help with any tasks I
  likelihood: None
  finish_reason: None
  token_likelihoods: None
}, cohere.Generation {
  id: 6a2472a8-2663-4b83-a988-f164d1254707
  prompt: Hello, how
  text:  can I help you with something, or provide any assistance?

  likelihood: None
  finish_reason: None
  token_likelihoods: None
}]
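
To make the effect of k and temperature concrete, here is a toy sketch of top-k sampling with temperature over a hand-written score table. The vocabulary, the scores, and the function `sample_top_k` are made up for illustration. This shows the general technique only and is not Cohere's actual implementation.

```python
import math
import random


def sample_top_k(logits, k, temperature, rng):
    """Sample one token from the k highest-scoring candidates.

    logits: dict mapping token -> raw score (higher means more likely).
    temperature: values near 0 approach a greedy choice; larger values flatten
    the distribution so that lower-ranked candidates are picked more often.
    """
    # Keep only the k best candidates.
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    # Softmax with temperature over the survivors (shifted for stability).
    scaled = [score / temperature for _, score in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores after the prompt 'Hello, how'.
logits = {" are": 4.0, " can": 3.5, " do": 2.0, " is": 1.0, " about": 0.5, " zebra": -3.0}
rng = random.Random(0)

cold = {sample_top_k(logits, k=5, temperature=0.01, rng=rng) for _ in range(200)}
hot = {sample_top_k(logits, k=5, temperature=5.0, rng=rng) for _ in range(200)}
print(cold)  # very low temperature: the top candidate dominates
print(hot)   # high temperature: spread across the top 5, never ' zebra'
```

With k=5, the sixth-ranked token can never be chosen regardless of temperature, which is why the high-temperature outputs above are odd but still made of plausible words.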

Example: receive the result as a JSON stream (apparently useful for UIs that render the response piece by piece)

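res_generate_streaming = co.generate(
    prompt='Hello, how',
    stream=True  # return tokens as they are generated
)

for index, token in enumerate(res_generate_streaming):
    print(f"{index}: {token}")
print(res_generate_streaming.texts)  # full text collected during streaming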



0: StreamingText(index=0, text=' Hello', is_finished=False)
1: StreamingText(index=0, text='!', is_finished=False)
2: StreamingText(index=0, text=' How', is_finished=False)
3: StreamingText(index=0, text=' can', is_finished=False)
4: StreamingText(index=0, text=' I', is_finished=False)
5: StreamingText(index=0, text=' help', is_finished=False)
6: StreamingText(index=0, text=' you', is_finished=False)
7: StreamingText(index=0, text=' today', is_finished=False)
8: StreamingText(index=0, text='?', is_finished=False)
[' Hello! How can I help you today?']
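
The streamed chunks can be pieced back together as they arrive, which is what a UI that renders text incrementally would do. The sketch below uses a local stand-in named `Chunk` with the same fields as the StreamingText output above, and a hypothetical `fake_stream` generator in place of the real API response, so it runs without an API key.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    """Local stand-in mirroring the fields printed for StreamingText above."""
    index: int
    text: str
    is_finished: bool


def fake_stream() -> Iterator[Chunk]:
    """Yield the same pieces as the sample output, one at a time."""
    for piece in [" Hello", "!", " How", " can", " I", " help", " you", " today", "?"]:
        yield Chunk(index=0, text=piece, is_finished=False)


def render_incrementally(stream: Iterable[Chunk]) -> str:
    """Append each chunk to a buffer, printing the partial text as it grows."""
    buffer = ""
    for chunk in stream:
        buffer += chunk.text
        print(buffer)  # a UI would redraw here instead of printing
    return buffer


full_text = render_incrementally(fake_stream())
```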

For other details, see API Reference - Co.Generate.



res_embed = co.embed(
    texts=['hello', 'world']
)
print(res_embed)



cohere.Embeddings {
  embeddings: [[1.6142578, 0.24841309, 0.5385742, -1.6630859, -0.27783203, 0.35888672, 1.3378906, -1.8261719, 0.89404297, 1.0791016, 1.0566406, 1.0664062, 0.20983887, ... , -2.4160156, 0.22875977, -0.21594238]]
  compressed_embeddings: []
  meta: {'api_version': {'version': '1'}}
}

The embedding results were returned. The Embedding feature also appears to let you choose the model through a parameter.

Example: try the light model (model=embed-english-light-v2.0)

res_embed = co.embed(
    texts=['hello', 'world'],
    model='embed-english-light-v2.0'  # light English model
)
print(res_embed)


cohere.Embeddings {
  embeddings: [[-0.16577148, -1.2109375, 0.54003906, -1.7148438, -1.5869141, 0.60839844, 0.6328125, 1.3974609, -0.49658203, -0.73046875, -1.796875, 1.5410156, 0.66064453, 0.9448242, -0.53515625, 0.24914551, 0.53222656, 0.23425293, 0.52685547, -1.3935547, 0.04095459, 0.8569336, -0.5620117, -0.42211914, 0.55371094, 3.5820312, 7.2890625, -1.2539062, 1.3583984, 0.12988281, -1.1660156, 0.124816895, ... ,0.87060547, 1.0205078, 0.5854492, -2.734375, -0.066589355, 1.8349609, 0.16430664, -0.26220703, -1.0625]]
  compressed_embeddings: []
  meta: {'api_version': {'version': '1'}}
}

Example: try the multilingual model (model=embed-multilingual-v2.0)

res_embed = co.embed(
    texts=['こんにちは', '世界'],
    model='embed-multilingual-v2.0'  # multilingual model
)
print(res_embed)


cohere.Embeddings {
  embeddings: [[0.2590332, 0.41308594, 0.24279785, 0.30371094, 0.04647827, 0.1361084, 0.41357422, -0.40063477, 0.2553711, 0.17749023, -0.1899414, -0.041900635, 0.20141602, 0.43017578, -0.5878906, 0.18054199, 0.42333984, 0.010749817, -0.56640625, 0.1517334, 0.14282227, 0.36767578, 0.26953125, 0.1418457, 0.28051758, 0.1661377, -0.13293457, 0.23620605, 0.08703613, 0.36914062, 0.22180176, ... ,0.027786255, -0.18530273, -0.24414062, 0.123168945, 0.6425781, 0.08831787, -0.21862793, -0.18237305, -0.031341553]]
  compressed_embeddings: []
  meta: {'api_version': {'version': '1'}}
}

For other details, see API Reference - Co.Embed.
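
A common next step with the returned vectors is to compare them. The following is a minimal sketch of cosine similarity in plain Python. The short vectors are made-up placeholders rather than real Cohere output, and `cosine_similarity` is a helper defined here, not an SDK function.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder vectors; real embeddings have far more dimensions.
vec_hello = [1.0, 0.5, -0.5]
vec_hi = [0.9, 0.6, -0.4]
vec_other = [-1.0, 0.2, 0.8]

print(cosine_similarity(vec_hello, vec_hi))     # close to 1: similar direction
print(cosine_similarity(vec_hello, vec_other))  # negative: pointing away
```

With a real response, the individual lists inside the embeddings field shown above would be passed in instead of the placeholders.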



from cohere.responses.classify import Example

# Labeled examples the classifier learns from
examples = [
  Example("Dermatologists don't like her!", "Spam"),
  Example("'Hello, open to this?'", "Spam"),
  Example("I need help please wire me $1000 right now", "Spam"),
  Example("Nice to know you ;)", "Spam"),
  Example("Please help me?", "Spam"),
  Example("Your parcel will be delivered today", "Not spam"),
  Example("Review changes to our Terms and Conditions", "Not spam"),
  Example("Weekly sync notes", "Not spam"),
  Example("'Re: Follow up from today's meeting'", "Not spam"),
  Example("Pre-read for tomorrow", "Not spam"),
]

# Texts to classify
inputs = [
  "Confirm your email address",
  "hey i need u to send some $",
]

res_classify = co.classify(
    inputs=inputs,
    examples=examples
)
print(res_classify)



[Classification<prediction: "Not spam", confidence: 0.5661598, labels: {'Not spam': LabelPrediction(confidence=0.5661598), 'Spam': LabelPrediction(confidence=0.43384025)}>, Classification<prediction: "Spam", confidence: 0.9909811, labels: {'Not spam': LabelPrediction(confidence=0.009018883), 'Spam': LabelPrediction(confidence=0.9909811)}>]

"Confirm your email address" could be spam or not depending on the case, so Not spam: 0.56..., Spam: 0.43... seems reasonable. "hey i need u to send some $" looks very much like spam, and the result agrees (Spam: 0.99, Not spam: 0.009).
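
One way to act on these scores is to treat a prediction as uncertain when its confidence falls below a cutoff. The sketch below works on plain dictionaries copied from the output above. The 0.8 threshold and the `decide` helper are arbitrary choices for illustration.

```python
def decide(label_confidences, threshold=0.8):
    """Return (label, confidence, is_confident) for the highest-scoring label."""
    label = max(label_confidences, key=label_confidences.get)
    confidence = label_confidences[label]
    return label, confidence, confidence >= threshold


# Confidence values copied from the classification output above.
results = {
    "Confirm your email address": {"Not spam": 0.5661598, "Spam": 0.43384025},
    "hey i need u to send some $": {"Not spam": 0.009018883, "Spam": 0.9909811},
}

for text, scores in results.items():
    label, confidence, is_confident = decide(scores)
    note = "" if is_confident else " (needs review)"
    print(f"{text!r}: {label} {confidence:.2f}{note}")
```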

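The Classification feature also accepts a parameter for choosing the model and a truncate parameter (how to handle inputs longer than the maximum token length).

Example: with the light model (model=embed-english-light-v2.0), make the API return an error when an input longer than the maximum token length is passed (truncate=NONE)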

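res_classify = co.classify(
    model='embed-english-light-v2.0', inputs=inputs, examples=examples,
    truncate='NONE')  # `inputs` contained 136 entries in this run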


CohereAPIError                            Traceback (most recent call last)

# ... omit

CohereAPIError: invalid request: inputs cannot contain more than 96 elements, received 136

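inputs accepts at most 96 entries, and passing 136 duly returned an error. Note that this particular error comes from the limit on the number of inputs; truncate=NONE concerns individual inputs that exceed the maximum token length.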
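
To classify more than 96 texts, the inputs can be split into batches and sent one request at a time. A minimal sketch follows, where `chunked` is a helper defined here and the limit of 96 is taken from the error message above.

```python
def chunked(items, size=96):
    """Yield consecutive slices of items, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 136 dummy inputs, matching the count in the error message.
many_inputs = [f"message {i}" for i in range(136)]
batches = list(chunked(many_inputs))
print([len(batch) for batch in batches])  # [96, 40]

# Each batch could then be passed as `inputs` to a separate classify call.
```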

For other details, see API Reference - Co.Classify.



res_tokenize = co.tokenize(
    text='tokenize me !D',
)
print(res_tokenize)



cohere.Tokens {
  tokens: [10002, 2261, 2012, 3362, 43]
  token_strings: ['token', 'ize', ' me', ' !', 'D']
  meta: {'api_version': {'version': '1'}}
}

There is also a parameter for selecting the model, but even after reading API Reference - Co.Tokenize it was not clear to me which values other than command can be chosen.



tokens = [10002, 2261, 2012, 3362, 43] # tokenize me !D

res_detokenize = co.detokenize(
    tokens=tokens
)
print(res_detokenize.text)



tokenize me !D

There is also a parameter for selecting the model, but even after reading API Reference - Co.Detokenize it was not clear to me which values other than command can be chosen.
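
The relationship between the two endpoints can be seen from the sample output alone: concatenating the token strings in order gives back the original text. The sketch below hard-codes the five ID/string pairs shown in the tokenize output as a tiny lookup table. It is only an illustration, since the real vocabulary lives on the service side, and `toy_detokenize` is a name made up for this example.

```python
# ID -> string pairs copied from the tokenize output above.
vocab = {10002: "token", 2261: "ize", 2012: " me", 3362: " !", 43: "D"}


def toy_detokenize(token_ids, table):
    """Look up each ID and join the pieces without adding separators."""
    return "".join(table[token_id] for token_id in token_ids)


tokens = [10002, 2261, 2012, 3362, 43]
text = toy_detokenize(tokens, vocab)
print(text)  # tokenize me !D
```

Note that the leading spaces belong to the tokens themselves (' me', ' !'), which is why no separator is needed when joining.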



res_detect_language = co.detect_language(
    texts=['Hello world', "'Здравствуй, Мир'", 'こんにちは世界', '世界你好', '안녕하세요']
)
print(res_detect_language)



cohere.DetectLanguageResponse {
  results: [Language<language_code: "en", language_name: "English">, Language<language_code: "ru", language_name: "Russian">, Language<language_code: "ja", language_name: "Japanese">, Language<language_code: "zh", language_name: "Chinese">, Language<language_code: "ko", language_name: "Korean">]
  meta: {'api_version': {'version': '1'}}
}

The languages appear to be detected correctly. detect_language takes no parameters other than this one.



text = (
  "Ice cream is a sweetened frozen food typically eaten as a snack or dessert. "
  "It may be made from milk or cream and is flavoured with a sweetener, "
  "either sugar or an alternative, and a spice, such as cocoa or vanilla, "
  "or with fruit such as strawberries or peaches. "
  "It can also be made by whisking a flavored cream base and liquid nitrogen together. "
  "Food coloring is sometimes added, in addition to stabilizers. "
  "The mixture is cooled below the freezing point of water and stirred to incorporate air spaces "
  "and to prevent detectable ice crystals from forming. The result is a smooth, "
  "semi-solid foam that is solid at very low temperatures (below 2 °C or 35 °F). "
  "It becomes more malleable as its temperature increases.\n\n"
  "The meaning of the name \"ice cream\" varies from one country to another. "
  "In some countries, such as the United States, \"ice cream\" applies only to a specific variety, "
  "and most governments regulate the commercial use of the various terms according to the "
  "relative quantities of the main ingredients, notably the amount of cream. "
  "Products that do not meet the criteria to be called ice cream are sometimes labelled "
  "\"frozen dairy dessert\" instead. In other countries, such as Italy and Argentina, "
  "one word is used for all variants. Analogues made from dairy alternatives, "
  "such as goat's or sheep's milk, or milk substitutes "
  "(e.g., soy, cashew, coconut, almond milk or tofu), are available for those who are "
  "lactose intolerant, allergic to dairy protein or vegan."
)

co_summarize = co.summarize(
    text=text
)
print(co_summarize)          # full response object
print(co_summarize.summary)  # summary text only



SummarizeResponse(id='5d566208-06f3-4da0-ac54-fd590c22777f', summary='Ice cream is a frozen food, usually eaten as a snack or dessert. It is made from milk or cream, and is flavored with a sweetener and a spice or fruit. It can also be made by whisking a flavored cream base and liquid nitrogen together. Food coloring and stabilizers may be added. The mixture is cooled and stirred to incorporate air spaces and prevent ice crystals from forming. The result is a smooth, semi-solid foam that is solid at low temperatures. It becomes more malleable as its temperature increases. The meaning of the name "ice cream" varies from one country to another. Some countries, such as the US, have specific regulations for the commercial use of the term "ice cream". Products that do not meet the criteria to be called ice cream are sometimes labeled "frozen dairy dessert". Dairy alternatives, such as goat\'s or sheep\'s milk, or milk substitutes, are available for those who are lactose intolerant or allergic to dairy protein.', meta={'api_version': {'version': '1'}})

Ice cream is a frozen food, usually eaten as a snack or dessert. It is made from milk or cream, and is flavored with a sweetener and a spice or fruit. It can also be made by whisking a flavored cream base and liquid nitrogen together. Food coloring and stabilizers may be added. The mixture is cooled and stirred to incorporate air spaces and prevent ice crystals from forming. The result is a smooth, semi-solid foam that is solid at low temperatures. It becomes more malleable as its temperature increases. The meaning of the name "ice cream" varies from one country to another. Some countries, such as the US, have specific regulations for the commercial use of the term "ice cream". Products that do not meet the criteria to be called ice cream are sometimes labeled "frozen dairy dessert". Dairy alternatives, such as goat's or sheep's milk, or milk substitutes, are available for those who are lactose intolerant or allergic to dairy protein.


Example: make it shorter (length=short) and output it as bullet points (format=bullets)

co_summarize = co.summarize(
    text=text, length='short', format='bullets')
print(co_summarize)
print(co_summarize.summary)


SummarizeResponse(id='fefca5fe-1dd4-4fff-91b0-622fc9fc64ec', summary='- Ice cream is a frozen dessert made from milk or cream.\n- It is flavored with a sweetener and a spice or fruit.\n- It can also be made with dairy alternatives.', meta={'api_version': {'version': '1'}})


  • Ice cream is a frozen dessert made from milk or cream.
  • It is flavored with a sweetener and a spice or fruit.
  • It can also be made with dairy alternatives.
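
With format=bullets the summary is a single string in which each item starts with '- ' and items are separated by newlines, as the raw response above shows. Here is a small sketch of turning that string into a Python list. `split_bullets` is a helper written for this example, not part of the SDK.

```python
def split_bullets(summary):
    """Split a newline-separated '- item' string into a list of item texts."""
    items = []
    for line in summary.splitlines():
        line = line.strip()
        if line.startswith("- "):
            items.append(line[2:])
        elif line:
            items.append(line)  # keep any line that lacks the bullet marker
    return items


# Summary string copied from the response above.
summary = (
    "- Ice cream is a frozen dessert made from milk or cream.\n"
    "- It is flavored with a sweetener and a spice or fruit.\n"
    "- It can also be made with dairy alternatives."
)
bullets = split_bullets(summary)
print(len(bullets))  # 3
```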

Example: use the light model (model=command-light), raise the degree of extraction from the source text (extractiveness=high), and output bullet points

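co_summarize = co.summarize(
    text=text, model='command-light', extractiveness='high', format='bullets')
print(co_summarize)
print(co_summarize.summary)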


SummarizeResponse(id='df22373b-dd85-4b8f-a6c4-0aac6db32898', summary='- Ice cream is a sweetened frozen food typically eaten as a snack or dessert.\n- It may be made from milk or cream and is flavored with a sweetener, either sugar or an alternative.\n- It can also be made by whisking a flavored cream base and liquid nitrogen together.\n- The result is a smooth, semi-solid foam that is solid at very low temperatures.\n- It becomes more malleable as its temperature increases.\n- The meaning of the name varies from one country to another.', meta={'api_version': {'version': '1'}})


  • Ice cream is a sweetened frozen food typically eaten as a snack or dessert.
  • It may be made from milk or cream and is flavored with a sweetener, either sugar or an alternative.
  • It can also be made by whisking a flavored cream base and liquid nitrogen together.
  • The result is a smooth, semi-solid foam that is solid at very low temperatures.
  • It becomes more malleable as its temperature increases.
  • The meaning of the name varies from one country to another.

Indeed, the summary now stays closer to the source text (e.g., "Ice cream is a sweetened frozen food typically eaten as a snack or dessert." is taken verbatim from the original).

For other details, see API Reference - Co.Summarize.



docs = [
    'Carson City is the capital city of the American state of Nevada.',
    'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.',
    'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.',
    'Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.'
]

co_rerank = co.rerank(
    query='What is the capital of the United States?',
    documents=docs,
    model='rerank-english-v2.0'  # assumed English model; the value used in the original run is not shown
)
print(co_rerank)


According to API Reference - Co.Rerank, the model parameter is not required, but when I ran the call without it I got TypeError: rerank() missing 1 required positional argument: 'model', so it appears to be required in practice. Be careful.


[RerankResult<document['text']: Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district., index: 2, relevance_score: 0.98005307>, RerankResult<document['text']: Capital punishment (the death penalty) has existed in the United States since beforethe United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states., index: 3, relevance_score: 0.27904198>, RerankResult<document['text']: Carson City is the capital city of the American state of Nevada., index: 0, relevance_score: 0.10194652>, RerankResult<document['text']: The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan., index: 1, relevance_score: 0.0721122>]

Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.


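Besides this, you can apparently specify a multilingual model (model=rerank-multilingual-v2.0) or the number of entries from the ordered array to include in the response (top_n=3). A few examples I tried are shown below.

Example: try Japanese (model=rerank-multilingual-v2.0)

This uses the docs and query from before, translated with DeepL.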

docs_ja = [
    ' ワシントンD.C.(Washington, D.C.)は、アメリカ合衆国の首都である。連邦区である',

co_rerank_ja = co.rerank(



[RerankResult<document['text']:  ワシントンD.C.Washington, D.C.アメリカ合衆国の首都である連邦区である, index: 2, relevance_score: 0.9998813>, RerankResult<document['text']: カーソンシティはアメリカネバダ州の州都である, index: 0, relevance_score: 0.37903354>, RerankResult<document['text']: 北マリアナ諸島連邦は太平洋に浮かぶ島々である首都はサイパンである, index: 1, relevance_score: 0.23899394>, RerankResult<document['text']: 死刑制度はアメリカ合衆国が国である以前から存在する2017年現在死刑は50州のうち30州で合法である, index: 3, relevance_score: 0.0007014083>]

ワシントン D.C.(Washington, D.C.)は、アメリカ合衆国の首都である。連邦区である


Example: in Japanese (model=rerank-multilingual-v2.0), return only the most relevant entry (top_n=1).

docs_ja = [
    ' ワシントンD.C.(Washington, D.C.)は、アメリカ合衆国の首都である。連邦区である',

co_rerank_ja = co.rerank(



[RerankResult<document['text']:  ワシントンD.C.Washington, D.C.アメリカ合衆国の首都である連邦区である, index: 2, relevance_score: 0.9998813>]
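
Conceptually, top_n keeps only the first n entries of the list sorted by relevance score. The sketch below reproduces that on the (index, score) pairs from the Japanese example above, using plain tuples instead of SDK objects. `top_n_results` is a hypothetical helper, not a description of how the service computes its result.

```python
def top_n_results(scored, n=None):
    """Sort (index, relevance_score) pairs by score, highest first, and keep n."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked if n is None else ranked[:n]


# (document index, relevance_score) pairs copied from the output above.
scored = [
    (0, 0.37903354),
    (1, 0.23899394),
    (2, 0.9998813),
    (3, 0.0007014083),
]

print(top_n_results(scored))       # all four, best match first
print(top_n_results(scored, n=1))  # [(2, 0.9998813)]
```

The index in each pair refers back to the position in the original document list, so the matching text can be looked up with `docs_ja[index]`.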


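For other details, see API Reference - Co.Rerank.


The notebook with all of these experiments can be viewed here.

  1. A service that provides a managed platform for machine learning. This article uses its Oracle-managed Jupyter Notebook environment.

  2. Converting natural language into a computable form (in most cases, a vector space)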


