Generative AI Was Just Released on OCI, So I Tried RAG with LangChain Right Away

Posted at 2024-02-01

Introduction

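Oracle Cloud Infrastructure (OCI) has released its Generative AI service.

In this post I use OCI Cohere, released just recently on 2024/1/23, to try RAG with the Embedding and Command models!
LangChain support also appeared on the official page at the same time as the release, so that is what I use here!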

Trying It Out

First, set up a policy in OCI.

allow group <your-group-name> to manage generative-ai-family in tenancy

Next, install the required modules.

pip install oci langchain pypdf chromadb

The versions of the modules used in this post are as follows:

  • Python 3.11.5
  • oci 2.120.0
  • langchain 0.1.4
  • langchain-community 0.0.16
  • langchain-core 0.1.17
  • pypdf 4.0.1
  • chromadb 0.4.22
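
To compare a local environment against the versions listed above, a small check along the following lines may help. This is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for illustration, and the package names are simply those from the list.

```python
from importlib import metadata

# Package names taken from the version list above.
PACKAGES = [
    "oci",
    "langchain",
    "langchain-community",
    "langchain-core",
    "pypdf",
    "chromadb",
]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

If a version differs from the list, the code in this post may still work, but import paths in LangChain have moved between releases, so differences are worth noting.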

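Next, as a sample, I vectorize a PDF file by calling the Cohere Embedding model and save the result to a folder on a block volume.
The data to be vectorized is a sample FAQ collection like the one below.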

faq.pdf
Q: How do I contact customer support?
A: You can contact our customer support team via email at support@example.com or by
calling our toll-free number during business hours. Check the Contact Us page for more
options.
(other Q&A pairs follow...)
embed.py
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OCIGenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Load the sample FAQ PDF.
loader = PyPDFLoader("./faq.pdf")
docs = loader.load()

# Split the text on blank lines into chunks of roughly 1000 characters.
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=50,
    length_function=len,
)
documents = text_splitter.split_documents(docs)

# Cohere embedding model served by OCI Generative AI.
# Replace "my_compartment_id" with your own compartment OCID.
embeddings = OCIGenAIEmbeddings(
    model_id="cohere.embed-english-v3.0",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="my_compartment_id",
)

# Embed the chunks and save them to a local Chroma directory.
db_dir = 'faq'
db = Chroma.from_documents(
    documents=documents,
    embedding=embeddings,
    persist_directory=db_dir,
)
db.persist()
db = None
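
To get a feel for what the splitting step in embed.py does with `separator="\n\n"` and `chunk_size`, here is a simplified pure-Python sketch. It is not LangChain's implementation: `split_into_chunks` is a hypothetical helper, `chunk_overlap` is ignored, and the second Q&A pair in the sample text is invented for illustration.

```python
def split_into_chunks(text, separator="\n\n", chunk_size=1000):
    """Split on `separator`, then greedily pack pieces into chunks.

    Simplifications: chunk_overlap is ignored, and a single piece longer
    than chunk_size is kept whole instead of being cut.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would exceed the limit: close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Q: How do I contact customer support?\n"
    "A: You can contact our customer support team via email.\n\n"
    "Q: Hypothetical second question?\n"
    "A: Hypothetical second answer."
)

# A small chunk_size forces one chunk per Q&A pair.
for i, chunk in enumerate(split_into_chunks(sample, chunk_size=100)):
    print(i, repr(chunk))
```

With a short FAQ and `chunk_size=1000`, several Q&A pairs can end up in the same chunk, so a smaller chunk size may be worth trying if retrieval returns too much unrelated text.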

Once vectorization is complete, run a query with a prompt.

rag.py
from langchain_community.llms import OCIGenAI
from langchain_community.embeddings import OCIGenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma

# Replace "my_compartment_id" with your own compartment OCID.
cid = "my_compartment_id"
ep = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"

# Cohere Command model used to generate the answer.
llm = OCIGenAI(
    model_id="cohere.command",
    service_endpoint=ep,
    compartment_id=cid,
    model_kwargs={"temperature": 0.7, "max_tokens": 1000},
)

# Must be the same embedding model that embed.py used.
embeddings = OCIGenAIEmbeddings(
    model_id="cohere.embed-english-v3.0",
    service_endpoint=ep,
    compartment_id=cid,
)

# Reopen the Chroma directory written by embed.py.
db_dir = 'faq'
loaded_db = Chroma(
    persist_directory=db_dir,
    embedding_function=embeddings,
)
retriever = loaded_db.as_retriever()

# "stuff" puts all retrieved chunks into a single prompt.
QA = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
print(QA.invoke('How can I contact a customer service representative?'))

Running it shows that the content written in the PDF beforehand is picked up well by semantic search.
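
The semantic search step can be pictured as ranking stored chunks by how close their vectors are to the question's vector. The following is a toy sketch only: the 3-dimensional vectors are invented, real embeddings come from the embedding model and have far more dimensions, and cosine similarity is assumed here for illustration (the vector store may use a different distance measure).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical "embeddings" for three FAQ chunks.
doc_vecs = {
    "contact support": [0.9, 0.1, 0.0],
    "return policy": [0.1, 0.9, 0.1],
    "shipping time": [0.0, 0.2, 0.9],
}
# Stands in for the embedded question about reaching customer service.
query_vec = [0.8, 0.2, 0.1]

print(top_k(query_vec, doc_vecs, k=1))
```

The question does not share the exact wording of the FAQ entry, yet its vector lies closest to the "contact support" chunk, which is the behavior the retriever relies on.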

Output
{
	"query": "How can I contact a customer service representative?",
	"result": " You can contact a customer service representative by calling the toll-free number during business hours or by emailing support@example.com. Alternatively, you can check the Contact Us page for more options, such as live chat or online form, depending on the company. It's important to review the company's contact information or consult their official website for the most accurate and up-to-date methods to reach customer support. \n\nIs there anything else I can help you with regarding customer service or any other topic? "
}

Incidentally, if you run the rag.py above with the RAG vector data emptied,
Cohere answers with general advice about contacting customer support.

Output (no vector data)
{
	"query": "How can I contact a customer service representative?",
	"result": " To contact a customer service representative, you can take the following steps:\n\nReview the company's website: Most companies provide contact information on their website, including customer service options. Look for a \"Contact Us\" or \"Support\" section that may contain details on how to reach a representative.\n\nUtilize available communication channels: Companies may offer different ways to contact customer service, such as phone, email, live chat, social media, or even physical mail. Consider which method would be most convenient for you and select that option.\n\nCheck for self-service options: Before reaching out to a representative, it can be helpful to explore any available self-service options. Many companies provide resources, such as FAQs, user manuals, or knowledge bases, that may provide answers to your questions without needing to involve a representative.\n\nIf you still need to speak with a customer service representative, you can follow the provided steps to contact the company through your preferred communication channel. Remember to have relevant information ready, such as your account details or order number, to expedite the process and allow the representative to better assist you. \n\nIt's important to note that the specific steps may vary depending on the company you are contacting. Therefore, it's always best to start by reviewing the company's website or any communication materials they have provided. \n\nIf you need further assistance with finding the contact information for a specific company, please provide additional details, and I will do my best to assist you in finding the relevant information. \n\nWould you like me to help you find the contact information for a specific company? "
}
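
The difference between the two outputs comes down to what context reaches the model. With `chain_type="stuff"`, the retrieved chunks are placed together into one prompt ahead of the question. The sketch below illustrates that idea only: `build_stuff_prompt` and the template wording are hypothetical and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Join all retrieved chunks into one context block placed before the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


question = "How can I contact a customer service representative?"
faq_chunk = (
    "Q: How do I contact customer support?\n"
    "A: You can contact our customer support team via email at support@example.com."
)

# With vector data: the FAQ text is available to the model.
print(build_stuff_prompt(question, [faq_chunk]))

# Without vector data: the context is empty, so the model can only
# fall back on its general knowledge.
print(build_stuff_prompt(question, []))
```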

Conclusion

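I tried RAG with OCI Cohere using LangChain.
As for the LangChain integration, switching over was easy: I only had to replace the class I had previously written for OpenAI (ChatOpenAI) with OCIGenAI.
According to Cohere's page, Cohere Embed V3 shows higher performance than OpenAI's ada-002, so an improvement in RAG accuracy can also be expected. Please give it a try!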
