Local Japanese Keyword Search and Vector Search from Python without Elasticsearch: nlp4j-local-search

Posted at 2026-05-31

When working on NLP or RAG experiments in Python, you may want to quickly try search functionality such as:

Japanese full-text keyword search
Vector search
Comparing keyword search and embedding search
Building a small local search index in Jupyter Notebook or Google Colab
Testing search behavior without installing Elasticsearch, OpenSearch, Solr, or Docker

For this purpose, I added vector search support to nlp4j-local-search.

GitHub:

https://github.com/oyahiroki/nlp4j-local-search

What is nlp4j-local-search?

nlp4j-local-search is a lightweight local search library that can be used from Python.

It allows you to create a local search index, add documents, and search them directly from Python code.

Main features:

Usable from Python
Japanese keyword search
English keyword search
Vector search
JSON document support
No Elasticsearch server required
No OpenSearch server required
No Solr server required
No Docker required
Easy to use in Jupyter Notebook and Google Colab
Built on Apache Lucene internally

This library is not intended to replace a large-scale production search system.

Instead, it is designed for cases where you want to quickly try search functionality for NLP, RAG, embedding evaluation, or small experiments.

Installation

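You can install it directly from GitHub. The leading ! runs the command from a Jupyter or Google Colab notebook cell; omit it in a regular shell.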

!pip install git+https://github.com/oyahiroki/nlp4j-local-search.git

After installation, you can check the installed package.

!pip list | grep nlp4j

Example:

nlp4j-local-search    0.2.0
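If grep is not available (for example on Windows), the same check can be done from Python with the standard library. This is a minimal sketch; the helper name installed_version is made up for illustration, and the distribution name is taken from the pip list output above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name as shown by `pip list` above.
print(installed_version("nlp4j-local-search"))
```

The call prints None when the package is not installed, so it can also serve as a guard at the top of a notebook.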

Japanese keyword search example

First, let’s try Japanese keyword search.

from nlp4j_local_search import SearchEngine

# Initialize the search engine in Japanese mode
engine = SearchEngine("ja")

Add documents.

engine.add("1", "東京都は日本の都道府県のひとつです")
engine.add("2", "京都は日本の都市です")
engine.add("3", "京都市には任天堂の本社があります")
engine.add_json({"id": "4", "body": "京都府は広いです"})  # JSON format is also supported
engine.add_json({"id": "5", "body": "大阪は関西の大都市です"})

Commit the documents to the index.

engine.commit()
print("Index commit completed")

Search with the keyword 京都.

results = engine.search("京都", limit=10)

for r in results:
    print(f"ID: {r.id}, Score: {r.score:.4f}")
    print(f"Body: {r.body}")
    print("-" * 50)

Example output:

ID: 2, Score: 0.2729
Body: 京都は日本の都市です
--------------------------------------------------
ID: 4, Score: 0.2729
Body: 京都府は広いです
--------------------------------------------------
ID: 3, Score: 0.2450
Body: 京都市には任天堂の本社があります
--------------------------------------------------

This is not just a simple substring search.

With simple substring matching, document 1 (東京都は日本の都道府県のひとつです) would also match the query 京都, because the character sequence 京都 appears inside 東京都.

Japanese full-text search, however, analyzes the text into search terms instead of treating it as a raw character sequence. As the output above shows, document 1 is not returned for the query 京都.

This makes it useful for Japanese search experiments.
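For contrast, here is what naive substring matching does on the same five documents. This sketch uses only plain Python and does not call nlp4j-local-search.

```python
# Document texts copied from the example above.
docs = {
    "1": "東京都は日本の都道府県のひとつです",
    "2": "京都は日本の都市です",
    "3": "京都市には任天堂の本社があります",
    "4": "京都府は広いです",
    "5": "大阪は関西の大都市です",
}

query = "京都"

# Naive substring matching: each document is treated as a raw character sequence.
substring_hits = [doc_id for doc_id, body in docs.items() if query in body]
print(substring_hits)  # ['1', '2', '3', '4']
```

Substring matching returns document 1 in addition to documents 2, 3, and 4, while the search engine output above contains only 2, 4, and 3.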

Vector search example

Next, let’s try vector search.

For simplicity, this example uses 2-dimensional vectors.

from nlp4j_local_search import SearchEngine

# Japanese mode + 2-dimensional vector search
engine = SearchEngine(lang="ja", vector_dimension=2)

Add simple vector data.

engine.add("1 East",  [1.0,  0.0])   # East
engine.add("2 North", [1.0,  1.0])   # North-East
engine.add("3 West",  [-1.0, 0.0])   # West
engine.add("4 South", [-1.0, -1.0])  # South-West

engine.commit()

Search by query vector.

query_vector = [0.9, 0.1]

print("Query vector:", query_vector)

for r in engine.search(query_vector, limit=10):
    print(f"{r.id}: body={r.body} score={r.score:.4f}")

Example output:

Query vector: [0.9, 0.1]
1 East: body=None score=0.9969
2 North: body=None score=0.8904
4 South: body=None score=0.1095
3 West: body=None score=0.0036

The query vector [0.9, 0.1] is close to [1.0, 0.0], so East is returned with the highest score. The body is None because these entries were added with a vector only, without text.

In this way, once vectors are added to the index, you can search for items that are close to a query vector.
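This article does not say which similarity measure the library uses. As a sanity check, the plain-Python sketch below ranks the same four vectors by cosine similarity, which is an assumption on my part. It reproduces the order of the example output, whereas Euclidean distance would rank West ahead of South.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors copied from the example above.
vectors = {
    "1 East": [1.0, 0.0],
    "2 North": [1.0, 1.0],
    "3 West": [-1.0, 0.0],
    "4 South": [-1.0, -1.0],
}
query_vector = [0.9, 0.1]

# Sort IDs from most similar to least similar (assumed measure: cosine).
ranked = sorted(
    vectors,
    key=lambda doc_id: cosine_similarity(query_vector, vectors[doc_id]),
    reverse=True,
)
print(ranked)  # ['1 East', '2 North', '4 South', '3 West']
```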

Keyword search and vector search with a similar interface

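With nlp4j-local-search, keyword search and vector search go through the same SearchEngine.search() call: pass a string for keyword search, or a list of floats for vector search. In the vector example above, the engine was created with vector_dimension.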

Keyword search:

results = engine.search("京都", limit=10)

Vector search:

results = engine.search([0.9, 0.1], limit=10)

This is useful when you want to compare different search methods in NLP experiments.

For example:

Compare keyword search results and embedding search results
Prototype the retrieval part of a RAG application
Validate search logic with a small dataset
Create a temporary search index for unit tests
Check search behavior in Google Colab or Jupyter Notebook

Before setting up Elasticsearch or OpenSearch, you can quickly test the search behavior locally.
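One simple way to compare two search methods is to measure how much their top-k result IDs overlap. The sketch below is plain Python; overlap_at_k is a hypothetical helper, and the two ID lists are made-up placeholders, not results from the library.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Fraction of the top-k IDs in ids_a that also appear in the top-k of ids_b."""
    top_a = ids_a[:k]
    top_b = set(ids_b[:k])
    if not top_a:
        return 0.0
    return sum(1 for doc_id in top_a if doc_id in top_b) / len(top_a)


# Hypothetical ID lists; in practice each could be built as
# [r.id for r in engine.search(...)] for one search method.
keyword_ids = ["2", "4", "3"]
vector_ids = ["2", "5", "4"]

print(overlap_at_k(keyword_ids, vector_ids, k=3))
```

A low overlap is not necessarily bad: it often means the two methods retrieve complementary documents, which is worth inspecting by hand on a small dataset.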

No Elasticsearch server required

Normally, setting up a search engine involves preparation such as:

Installing Elasticsearch
Installing OpenSearch
Installing Solr
Preparing Docker Compose
Configuring ports and memory settings
Defining index mappings
Creating index settings

These are important for production systems.

However, for small NLP experiments or notebook-based validation, this setup can feel too heavy.

With nlp4j-local-search, you can initialize the search engine directly in Python, add documents, and search them immediately.

from nlp4j_local_search import SearchEngine

with SearchEngine("ja") as engine:
    engine.add("1", "京都は日本の都市です")
    engine.add("2", "大阪は関西の大都市です")
    engine.commit()

    for r in engine.search("京都", limit=10):
        print(r.id, r.body, r.score)

You do not need to start a search server.

You can begin search experiments with only Python code.

Use cases

nlp4j-local-search is especially useful for the following use cases.

1. NLP experiments

You can use it to search small text datasets, dictionary data, Wikipedia-derived text, JSONL data, and other local datasets.
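For JSONL data, a small reader can turn each line into a dict with id and body keys, the same shape used with add_json earlier. This is a sketch: iter_jsonl_docs and its parameters are hypothetical names, and the sample lines are illustrative.

```python
import json


def iter_jsonl_docs(lines, id_key="id", body_key="body"):
    """Yield {"id": ..., "body": ...} dicts from JSONL lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # IDs are converted to strings to match the string IDs used earlier.
        yield {"id": str(record[id_key]), "body": record[body_key]}


# Illustrative JSONL content; an open text file can be passed instead of a list.
sample = [
    '{"id": 4, "body": "京都府は広いです"}',
    "",
    '{"id": 5, "body": "大阪は関西の大都市です"}',
]

docs = list(iter_jsonl_docs(sample))
print(docs)
```

Each resulting dict could then be passed to engine.add_json(doc) in a loop, followed by engine.commit().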

2. RAG prototyping

Before preparing a full vector database or search server, you can test the retrieval part locally.
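In a RAG prototype, the retrieved documents are usually placed into a prompt as context. The sketch below shows only that assembly step; build_prompt and the prompt layout are assumptions for illustration, and the hits are hard-coded rather than retrieved.

```python
def build_prompt(question, hits, max_hits=3):
    """Assemble a prompt from a question and retrieved (doc_id, body) pairs."""
    context = "\n".join(f"[{doc_id}] {body}" for doc_id, body in hits[:max_hits])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hard-coded hits for illustration; with the library they could be built as
# [(r.id, r.body) for r in engine.search("京都", limit=3)].
hits = [("2", "京都は日本の都市です"), ("4", "京都府は広いです")]

prompt = build_prompt("京都はどこの国の都市ですか?", hits)
print(prompt)
```

Keeping the document IDs in the context makes it easy to check which retrieved document an answer relied on.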

3. Comparing keyword search and vector search

When evaluating embedding models, it is often useful to compare keyword search results with vector search results.

For small evaluation datasets, this can be done locally and quickly.

4. Google Colab and Jupyter Notebook experiments

You can create a temporary search index inside a notebook, run searches, and discard the index after the experiment.

5. Unit testing

You can create a temporary local search index for tests that involve search behavior.

Notes

At this stage, nlp4j-local-search is mainly intended for experimentation and validation.

For large-scale production search systems, Elasticsearch, OpenSearch, Solr, Milvus, or other dedicated systems may be more appropriate.

However, nlp4j-local-search is useful when you want to:

Quickly check search behavior
Build a small proof of concept
Run search experiments in a notebook
Compare keyword search and vector search
Validate search logic before introducing a search server

Summary

Vector search support has been added to nlp4j-local-search.

Now you can easily use the following features from Python:

Japanese keyword search
English keyword search
JSON document indexing
Vector search
Local in-memory search

You do not need to install Elasticsearch, OpenSearch, Solr, or Docker.

This makes nlp4j-local-search useful as a lightweight local search engine for NLP experiments, RAG prototyping, embedding evaluation, and search logic validation.
