When working on NLP or RAG experiments in Python, you may want to quickly try search functionality such as:
- Japanese full-text keyword search
- Vector search
- Comparing keyword search and embedding search
- Building a small local search index in Jupyter Notebook or Google Colab
- Testing search behavior without installing Elasticsearch, OpenSearch, Solr, or Docker
For this purpose, I added vector search support to nlp4j-local-search.
GitHub:
https://github.com/oyahiroki/nlp4j-local-search
What is nlp4j-local-search?
nlp4j-local-search is a lightweight local search library that can be used from Python.
It allows you to create a local search index, add documents, and search them directly from Python code.
Main features:
- Usable from Python
- Japanese keyword search
- English keyword search
- Vector search
- JSON document support
- No Elasticsearch server required
- No OpenSearch server required
- No Solr server required
- No Docker required
- Easy to use in Jupyter Notebook and Google Colab
- Internally based on Apache Lucene search functionality
This library is not intended to replace a large-scale production search system.
Instead, it is designed for cases where you want to quickly try search functionality for NLP, RAG, embedding evaluation, or small experiments.
Installation
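You can install it directly from GitHub. In Jupyter Notebook or Google Colab, run the following (the leading ! runs a shell command from a notebook cell):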
!pip install git+https://github.com/oyahiroki/nlp4j-local-search.git
After installation, you can check the installed package.
!pip list | grep nlp4j
Example output:
nlp4j-local-search 0.2.0
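The grep check above assumes a Unix-like shell, which Colab provides but Windows does not. The same check can be done from Python with the standard library's importlib.metadata. This is a small sketch that assumes the distribution name is nlp4j-local-search, as shown in the pip list output above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "nlp4j-local-search") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "nlp4j-local-search is not installed")
```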
Japanese keyword search example
First, let’s try Japanese keyword search.
from nlp4j_local_search import SearchEngine
# Initialize the search engine in Japanese mode
engine = SearchEngine("ja")
Add documents.
engine.add("1", "東京都は日本の都道府県のひとつです")
engine.add("2", "京都は日本の都市です")
engine.add("3", "京都市には任天堂の本社があります")
engine.add_json({"id": "4", "body": "京都府は広いです"}) # JSON format is also supported
engine.add_json({"id": "5", "body": "大阪は関西の大都市です"})
Commit the documents to the index.
engine.commit()
print("Index commit completed")
Search with the keyword 京都.
results = engine.search("京都", limit=10)
for r in results:
    print(f"ID: {r.id}, Score: {r.score:.4f}")
    print(f"Body: {r.body}")
    print("-" * 50)
Example output:
ID: 2, Score: 0.2729
Body: 京都は日本の都市です
--------------------------------------------------
ID: 4, Score: 0.2729
Body: 京都府は広いです
--------------------------------------------------
ID: 3, Score: 0.2450
Body: 京都市には任天堂の本社があります
--------------------------------------------------
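In the output above, documents 2 and 4 tie and document 3 scores slightly lower. Apache Lucene ranks keyword matches with BM25 by default, and BM25 gives a higher score to a shorter document when the query term appears the same number of times. Assuming nlp4j-local-search keeps that default, this may explain the ordering. The sketch below computes classic BM25 by hand on a hand-written segmentation in which particles such as は and の have been dropped. Both the segmentation and the similarity setting are assumptions, and Lucene's BM25 differs in small details, so the numbers will not match the scores above; only the ordering pattern is the point.

```python
import math

# Hand-written segmentation (assumption): particles and auxiliaries removed.
DOCS = {
    "1": ["東京", "都", "日本", "都道府県", "ひとつ"],
    "2": ["京都", "日本", "都市"],
    "3": ["京都", "市", "任天堂", "本社", "ある"],
    "4": ["京都", "府", "広い"],
    "5": ["大阪", "関西", "大", "都市"],
}


def bm25(term, docs, k1=1.2, b=0.75):
    """Classic BM25 score of a single query term for every document containing it."""
    n = len(docs)
    avgdl = sum(len(terms) for terms in docs.values()) / n
    df = sum(1 for terms in docs.values() if term in terms)
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
    scores = {}
    for doc_id, terms in docs.items():
        tf = terms.count(term)
        if tf == 0:
            continue  # the document does not contain the term
        # Length normalization: documents longer than average are penalized
        norm = k1 * (1 - b + b * len(terms) / avgdl)
        scores[doc_id] = idf * tf * (k1 + 1) / (tf + norm)
    return scores


# Sort by descending score; ties keep insertion order
for doc_id, score in sorted(bm25("京都", DOCS).items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 4))
```

Documents 2 and 4 both have three terms, so they tie; document 3 has five terms, so the same single occurrence of 京都 counts for less.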
This is not simple substring matching.
With substring matching, document 1 (東京都は日本の都道府県のひとつです) would also match the query 京都, because the character sequence 京都 appears inside 東京都.
Japanese full-text search instead splits the text into words, so 東京都 is treated as terms such as 東京 and 都 rather than as a string that contains 京都. This is why document 1 does not appear in the results above.
This makes it useful for Japanese search experiments.
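The difference can be seen without the library at all. The sketch below runs the same query against the five example sentences in two ways: as a raw substring and as a whole word in a token list. The word segmentation is written by hand for illustration and is an assumption; a real Japanese analyzer may split some words differently.

```python
DOCS = {
    "1": "東京都は日本の都道府県のひとつです",
    "2": "京都は日本の都市です",
    "3": "京都市には任天堂の本社があります",
    "4": "京都府は広いです",
    "5": "大阪は関西の大都市です",
}

# Hand-written word segmentation (assumption, for illustration only)
TOKENS = {
    "1": ["東京", "都", "は", "日本", "の", "都道府県", "の", "ひとつ", "です"],
    "2": ["京都", "は", "日本", "の", "都市", "です"],
    "3": ["京都", "市", "に", "は", "任天堂", "の", "本社", "が", "あり", "ます"],
    "4": ["京都", "府", "は", "広い", "です"],
    "5": ["大阪", "は", "関西", "の", "大", "都市", "です"],
}

query = "京都"

# Substring matching: any document whose characters contain the query
substring_hits = [doc_id for doc_id, text in DOCS.items() if query in text]

# Word matching: only documents with a token equal to the query
token_hits = [doc_id for doc_id, terms in TOKENS.items() if query in terms]

print("substring:", substring_hits)  # includes "1", because 東京都 contains 京都
print("token:    ", token_hits)      # excludes "1", whose tokens are 東京 and 都
```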
Vector search example
Next, let’s try vector search.
For simplicity, this example uses 2-dimensional vectors.
from nlp4j_local_search import SearchEngine
# Japanese mode + 2-dimensional vector search
engine = SearchEngine(lang="ja", vector_dimension=2)
Add simple vector data.
engine.add("1 East", [1.0, 0.0]) # East
engine.add("2 North", [1.0, 1.0]) # North-East
engine.add("3 West", [-1.0, 0.0]) # West
engine.add("4 South", [-1.0, -1.0]) # South-West
engine.commit()
Search by query vector.
query_vector = [0.9, 0.1]
print("Query vector:", query_vector)
for r in engine.search(query_vector, limit=10):
    print(f"{r.id}: body={r.body} score={r.score:.4f}")
Example output:
Query vector: [0.9, 0.1]
1 East: body=None score=0.9969
2 North: body=None score=0.8904
4 South: body=None score=0.1095
3 West: body=None score=0.0036
The query vector [0.9, 0.1] points in almost the same direction as [1.0, 0.0], so East is returned with the highest score. The documents were added with vectors only, so body is None.
In this way, once vectors are added to the index, you can search for items that are close to a query vector.
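The ranking in the example output is consistent with comparing vector directions (cosine similarity) rather than straight-line distance: by Euclidean distance, West [-1.0, 0.0] is closer to the query than South-West [-1.0, -1.0], yet South ranks above West. The sketch below reproduces the ranking order with plain cosine similarity. That the library uses cosine similarity is an inference from the output, not something stated here. Raw cosine values range from -1 to 1, while the scores shown are all positive, so the library evidently reports a rescaled value.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


vectors = {
    "1 East": [1.0, 0.0],
    "2 North": [1.0, 1.0],
    "3 West": [-1.0, 0.0],
    "4 South": [-1.0, -1.0],
}
query_vector = [0.9, 0.1]

# Rank stored vectors by similarity to the query, most similar first
ranked = sorted(vectors, key=lambda k: cosine(query_vector, vectors[k]), reverse=True)
for doc_id in ranked:
    print(f"{doc_id}: cosine={cosine(query_vector, vectors[doc_id]):.4f}")
```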
Keyword search and vector search with a similar interface
With nlp4j-local-search, keyword search and vector search share the same search() method of SearchEngine: pass a string for keyword search, or a list of numbers for vector search.
Keyword search:
results = engine.search("京都", limit=10)
Vector search:
results = engine.search([0.9, 0.1], limit=10)
This is useful when you want to compare different search methods in NLP experiments.
For example:
- Compare keyword search results and embedding search results
- Prototype the retrieval part of a RAG application
- Validate search logic with a small dataset
- Create a temporary search index for unit tests
- Check search behavior in Google Colab or Jupyter Notebook
Before setting up Elasticsearch or OpenSearch, you can quickly test the search behavior locally.
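When comparing keyword search and vector search, a simple first check is how much their top-k result lists overlap and which documents only one method finds. A minimal sketch; the two ID lists are made up and stand in for the r.id values collected from two search() calls.

```python
def overlap_at_k(a, b, k):
    """Fraction of the top-k IDs of list a that also appear in the top-k of list b.

    Assumes both lists contain at least k IDs.
    """
    top_a, top_b = set(a[:k]), set(b[:k])
    return len(top_a & top_b) / k


# Hypothetical result IDs, e.g. [r.id for r in engine.search(...)]
keyword_ids = ["2", "4", "3"]
vector_ids = ["2", "5", "4"]

for k in (1, 2, 3):
    print(f"overlap@{k} = {overlap_at_k(keyword_ids, vector_ids, k):.2f}")

# Documents retrieved only by vector search, in vector-search rank order
only_vector = [doc_id for doc_id in vector_ids if doc_id not in keyword_ids]
print("only vector search:", only_vector)
```

Looking at the documents that only one method returns is often more informative than the overlap number itself.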
No Elasticsearch server required
Normally, if you want to use a search engine, you may need to prepare things such as:
- Installing Elasticsearch
- Installing OpenSearch
- Installing Solr
- Preparing Docker Compose
- Configuring ports and memory settings
- Defining index mappings
- Creating index settings
These are important for production systems.
However, for small NLP experiments or notebook-based validation, this setup can feel too heavy.
With nlp4j-local-search, you can initialize the search engine directly in Python, add documents, and search them immediately.
from nlp4j_local_search import SearchEngine
with SearchEngine("ja") as engine:
    engine.add("1", "京都は日本の都市です")
    engine.add("2", "大阪は関西の大都市です")
    engine.commit()
    for r in engine.search("京都", limit=10):
        print(r.id, r.body, r.score)
You do not need to start a search server.
You can begin search experiments with only Python code.
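The examples always follow the same order: add documents, commit(), then search, and the with form means SearchEngine implements Python's context-manager protocol (__enter__ and __exit__), which is typically used to release resources when the block ends. The class below is a hypothetical, pure-Python toy (ToyIndex is not part of nlp4j-local-search) that mimics this workflow. It models the assumption that added documents become searchable only after a commit, matches whole lowercase English words with no scoring, and says nothing about how the library works internally.

```python
class ToyIndex:
    """Hypothetical stand-in showing the add -> commit -> search pattern and `with` support."""

    def __init__(self):
        self._pending = {}    # added but not yet committed
        self._committed = {}  # visible to search
        self.closed = False

    def add(self, doc_id, body):
        self._pending[doc_id] = body

    def commit(self):
        # Make pending documents visible to search
        self._committed.update(self._pending)
        self._pending.clear()

    def search(self, word, limit=10):
        # Whole-word match on lowercased, whitespace-split text; no ranking
        hits = [doc_id for doc_id, body in self._committed.items()
                if word.lower() in body.lower().split()]
        return hits[:limit]

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised inside the block


with ToyIndex() as index:
    index.add("1", "Kyoto is a city in Japan")
    index.add("2", "Osaka is a large city in Kansai")
    print(index.search("kyoto"))  # nothing committed yet
    index.commit()
    print(index.search("kyoto"))

print(index.closed)  # closed automatically when the with block ended
```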
Use cases
nlp4j-local-search is especially useful for the following use cases.
1. NLP experiments
You can use it to search small text datasets, dictionary data, Wikipedia-derived text, JSONL data, and other local datasets.
2. RAG prototyping
Before preparing a full vector database or search server, you can test the retrieval part locally.
3. Comparing keyword search and vector search
When evaluating embedding models, it is often useful to compare keyword search results with vector search results.
For small evaluation datasets, this can be done locally and quickly.
4. Google Colab and Jupyter Notebook experiments
You can create a temporary search index inside a notebook, run searches, and discard the index after the experiment.
5. Unit testing
You can create a temporary local search index for tests that involve search behavior.
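For the NLP experiments in use case 1, local data often arrives as JSONL, one JSON object per line. A minimal sketch that turns JSONL lines into dicts with id and body keys, the shape passed to engine.add_json() in the keyword example. The source field names (title, text) and the fallback to the line number when an id is missing are assumptions about your data; adjust them for your own files.

```python
import io
import json


def load_jsonl(lines, id_field="id", body_fields=("title", "text")):
    """Yield {"id": ..., "body": ...} dicts from JSONL lines, skipping blank lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Join the chosen text fields into one searchable body
        body = " ".join(str(record[f]) for f in body_fields if f in record)
        # Fall back to the line number if the record has no id (assumption)
        yield {"id": str(record.get(id_field, line_no)), "body": body}


sample = io.StringIO(
    '{"id": 1, "title": "京都", "text": "京都は日本の都市です"}\n'
    "\n"
    '{"id": 2, "title": "大阪", "text": "大阪は関西の大都市です"}\n'
)

docs = list(load_jsonl(sample))
for doc in docs:
    print(doc)
    # engine.add_json(doc)  # then call engine.commit() once after the loop
```

For a real file, pass an open file object instead of the StringIO, for example open("data.jsonl", encoding="utf-8").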
Notes
At this stage, nlp4j-local-search is mainly intended for experimentation and validation.
For large-scale production search systems, Elasticsearch, OpenSearch, Solr, Milvus, or other dedicated systems may be more appropriate.
However, nlp4j-local-search is useful when you want to:
- Quickly check search behavior
- Build a small proof of concept
- Run search experiments in a notebook
- Compare keyword search and vector search
- Validate search logic before introducing a search server
Summary
Vector search support has been added to nlp4j-local-search.
Now you can easily use the following features from Python:
- Japanese keyword search
- English keyword search
- JSON document indexing
- Vector search
- Local in-memory search
You do not need to install Elasticsearch, OpenSearch, Solr, or Docker.
This makes nlp4j-local-search useful as a lightweight local search engine for NLP experiments, RAG prototyping, embedding evaluation, and search logic validation.