Introduction
LangChain v1.0 has been released (as an alpha at the time of writing), so I decided to give it a try.
https://docs.langchain.com/oss/python/releases-v1
I had previously built a RAG system with LangChain v0.3, so I first checked whether the same code would run without changes.
https://qiita.com/yum11/items/edf9d7338b8ef03c3f8f
Conclusion
I was able to run it without changing the code. However, installing the libraries took a bit of effort.
Implementation
I built a simple RAG using LangChain and the OpenAI API.
- Target documents: Local PDF files
- Embedding model: text-embedding-ada-002 (OpenAI)
- Vector DB: InMemoryVectorStore (LangChain)
- LLM: gpt-4o-mini (OpenAI)
Python and main library versions
- python==3.11.11
- langchain==1.0.0a4
- langchain-community==0.3.29
- langchain-core==1.0.0a2
- langchain-openai==1.0.0a2
- pypdf==6.0.0
I installed LangChain v1.0 as follows:
pip install --pre -U langchain
To run the code, both langchain-community and langchain-openai need to be installed.
langchain-community has no v1.0 release yet and is still at v0.3. That version appears to be compatible with langchain v1.0, and it worked without problems.
langchain-openai does have a v1.0 (pre-release) version, and the code only worked properly after I installed it:
pip install --pre langchain-openai==1.0.0a2
At some point (probably after installing pypdf), a DependencyError related to cryptography occurred, so I upgraded cryptography to the latest version:
pip install -U cryptography
Note: langchain-core v1.0 is automatically installed together with langchain.
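To confirm which versions pip actually resolved (pre-releases included), a small check with the standard library's importlib.metadata may help. This is a sketch of my own, not part of the original setup; the package names are the pip distribution names listed above, and the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names (not import names) of the libraries used in this article
PACKAGES = ["langchain", "langchain-core", "langchain-community", "langchain-openai", "pypdf"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

If langchain-openai still reports a 0.3.x version here, that would match the situation above where the code did not work until v1.0 was installed.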
Preparing the documents
- I used monthly economic report PDFs published by the Cabinet Office of Japan.
- For this test, I used three PDFs: September, October, and November 2024.
Setting the OpenAI API key
- Create a .env file in the same directory as the Python script.
- Write the OpenAI API key in the .env file:
OPENAI_API_KEY="your-api-key"
- Load the environment variables using python-dotenv:
from dotenv import load_dotenv
load_dotenv()
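A missing key otherwise only surfaces later as an authentication error from the OpenAI client, so it may be worth failing early. This is a minimal sketch, assuming it runs right after load_dotenv(); the helper name require_openai_key is hypothetical. Note that load_dotenv() does not override variables already set in the environment unless override=True is passed.

```python
import os


def require_openai_key():
    """Fail early with a clear message if OPENAI_API_KEY is missing or blank."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; check that .env exists next to the script "
            "and that load_dotenv() was called first."
        )
    return key
```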
Loading the documents
- Use LangChain’s PyPDFLoader.
- The PDF files are loaded page by page.
from langchain_community.document_loaders import PyPDFLoader
# list of pdf files
file_paths = ["./report_202409.pdf", "./report_202410.pdf", "./report_202411.pdf"]
# list storing pdf file pages
pages = []
for file_path in file_paths:
    loader = PyPDFLoader(file_path)
    for page in loader.load():
        pages.append(page)
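Since PyPDFLoader returns one Document per page and records the file path in metadata["source"], counting pages per file is a quick way to check that all three PDFs were loaded. The helper pages_per_file below is hypothetical, not part of LangChain; it only relies on each item having a metadata dict.

```python
from collections import Counter
from pathlib import Path


def pages_per_file(pages):
    """Count loaded pages per source PDF using each Document's metadata["source"]."""
    return Counter(Path(page.metadata["source"]).name for page in pages)
```

With the pages list from above, print(pages_per_file(pages)) should show one entry per report file. As a side note, the inner loop is equivalent to pages.extend(loader.load()).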
Vectorizing the documents and storing them in the vector DB
- Convert the documents into vectors using OpenAI’s text-embedding-ada-002.
- Use LangChain’s InMemoryVectorStore as the vector DB.
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
vector_store = InMemoryVectorStore.from_documents(pages, OpenAIEmbeddings(model="text-embedding-ada-002"))
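To make the search step concrete: as far as I can tell, InMemoryVectorStore compares the query embedding with every stored vector by cosine similarity (a brute-force search) and returns the best matches. The sketch below reimplements that idea in plain Python on toy 2-dimensional vectors; the function names are hypothetical, the real text-embedding-ada-002 vectors have 1,536 dimensions, and k plays the same role as search_kwargs={"k": 3} in the retriever.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Score every stored vector against the query and keep the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.1], docs, k=2))
```

Scanning every vector is fine for three monthly reports; a dedicated vector DB becomes worthwhile once the number of pages grows large.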
Building the RAG chain
- The RAG chain is written using LCEL (LangChain Expression Language).
- It connects a prompt template, LLM model, and output parser.
- Define a prompt template with ChatPromptTemplate.
- The user’s query is passed into the variable question, and the retriever results are stored in the variable context.
- Configure the LLM with ChatOpenAI.
- Use the retriever created with vector_store.as_retriever().
- By default, it performs a similarity search.
- The search_kwargs argument controls the search behavior. For example, search_kwargs={"k": 3} limits the results to the 3 most similar pages.
- The complete chain looks like this:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template('''
Answer the following question based only on the given context.
context: """
{context}
"""
question: {question}
''')
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
chain = {
"question": RunnablePassthrough(),
"context": retriever,
} | prompt | model | StrOutputParser()
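In the chain above, the retriever returns a list of Document objects, and as far as I can tell that list is inserted into {context} as its string representation, metadata included. A common alternative is to join only the page text first. The sketch below renders the same prompt with plain str.format (ChatPromptTemplate.from_template uses f-string-style placeholders by default); format_docs and render_prompt are hypothetical helper names, and SimpleNamespace objects stand in for real Documents.

```python
from types import SimpleNamespace

# Same wording as the article's prompt template
TEMPLATE = '''
Answer the following question based only on the given context.
context: """
{context}
"""
question: {question}
'''


def format_docs(docs):
    """Join retrieved pages into one plain-text block, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def render_prompt(question, docs):
    """Approximate the text the model receives for one query."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


if __name__ == "__main__":
    # Hypothetical stand-ins for the Documents returned by the retriever
    docs = [SimpleNamespace(page_content="page one text"),
            SimpleNamespace(page_content="page two text")]
    print(render_prompt("When was the November monthly economic report held?", docs))
```

In LCEL, such a helper could be wired in as "context": retriever | format_docs, since piping a plain function after a Runnable wraps it in a RunnableLambda. This is an option, not what the original code does.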
Responses from the RAG chain
- Pass a query to the chain's invoke method.
query = "When was the November monthly economic report held?"
chain.invoke(query)
- The LLM responded based on the documents:
'The November monthly economic report was held in November 2024, as indicated by the document dated "令和6年 11月" (November of the 6th year of Reiwa, which corresponds to 2024).'
- Another question: What was the Employment Judgment DI for manufacturing in September?
- Answer:
'The Employment Judgment DI for manufacturing in September was -22.'
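The era conversion in the first answer can be checked by hand: the Reiwa era began on 1 May 2019 (Reiwa 1), so Reiwa N corresponds to Gregorian year 2018 + N, and 令和6年 is 2024. A minimal helper for the same arithmetic (the name reiwa_to_gregorian is hypothetical):

```python
REIWA_START_YEAR = 2019  # Reiwa 1 began on 1 May 2019


def reiwa_to_gregorian(reiwa_year):
    """Convert a Reiwa era year (1-based) to its Gregorian calendar year."""
    if reiwa_year < 1:
        raise ValueError("Reiwa years start at 1")
    return REIWA_START_YEAR + reiwa_year - 1


if __name__ == "__main__":
    print(reiwa_to_gregorian(6))
```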