This tool lets you generate multilingual E5 embeddings from JSONL files or use them via a lightweight HTTP API, without writing Python integration code every time.
Introduction
I released a Python package named nlp4j-llm-embedding-e5.
PyPI:
https://pypi.org/project/nlp4j-llm-embedding-e5/
This package provides simple command-line tools for using multilingual E5 embeddings locally.
It provides two commands, one for each main use case:
nlp4j-embedding-local-e5
nlp4j-embedding-server-e5
The first command processes JSONL files and adds embedding vectors.
The second command starts a lightweight HTTP server that exposes embedding, semantic search, and cosine similarity APIs.
Internally, this package uses sentence-transformers and the multilingual E5 model:
intfloat/multilingual-e5-large
Installation
The package is available on PyPI, so it can be installed with pip.
pip install nlp4j-llm-embedding-e5
After installation, the following commands are available:
nlp4j-embedding-local-e5
nlp4j-embedding-server-e5
You can check the command options with:
nlp4j-embedding-local-e5 --help
nlp4j-embedding-server-e5 --help
Why I Created This Tool
Embedding models are useful for many NLP and LLM-related tasks, such as:
- semantic search
- RAG preprocessing
- document retrieval
- similarity calculation
- clustering
- multilingual search
- local text analysis
However, using embedding models directly from Python code is not always convenient.
In many systems, the main application may be written in Java, Node.js, shell scripts, or another language. In such cases, a simple command-line tool or HTTP API can be more convenient than writing Python integration code every time.
This package provides both:
- a local JSONL embedding command
- a lightweight HTTP embedding server
This makes multilingual E5 embeddings easier to use from local scripts, batch jobs, and non-Python applications.
What Is E5?
E5 is a family of embedding models designed for text retrieval and semantic search.
This package uses:
intfloat/multilingual-e5-large
E5 models expect different prefixes depending on the role of the text.
For documents or passages:
passage: your document text
For search queries:
query: your search query
For example:
query: Japanese NLP
passage: GiNZA is a Japanese NLP library.
This package automatically adds these prefixes depending on the selected mode.
If the input text already starts with query: or passage:, the existing prefix is preserved.
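The prefix rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the package's actual code; the function name add_e5_prefix and its parameters are hypothetical.

```python
def add_e5_prefix(text: str, text_type: str = "passage") -> str:
    """Return text with an E5 role prefix, unless one is already present."""
    if text_type == "none":
        # Prefixing is disabled: pass the text through unchanged.
        return text
    if text_type not in ("query", "passage"):
        raise ValueError(f"unsupported text_type: {text_type!r}")
    # Preserve a prefix that the caller already supplied.
    if text.startswith(("query:", "passage:")):
        return text
    return f"{text_type}: {text}"


print(add_e5_prefix("Japanese NLP", "query"))
print(add_e5_prefix("passage: GiNZA is a Japanese NLP library."))
```

The first call adds the query: prefix, and the second returns its input unchanged because a prefix is already there.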
Command 1: Local JSONL Embedding
The command nlp4j-embedding-local-e5 reads a JSONL file, embeds text from a specified JSON attribute, and writes a new JSONL file with embedding vectors added.
Input Example
Create a JSONL file named input.jsonl.
{"id":"1","text":"Kyoto is a city in Japan."}
{"id":"2","text":"Tokyo is the capital of Japan."}
{"id":"3","text":"GiNZA is a Japanese NLP library."}
Run Embedding
nlp4j-embedding-local-e5 --input input.jsonl --output output.jsonl
Short options are also available:
nlp4j-embedding-local-e5 -i input.jsonl -o output.jsonl
By default:
- input text attribute: text
- output vector attribute: vector
- E5 text type: passage
- batch size: 32
- max length: 512
Output Example
The output JSONL will contain a new vector field.
{"id":"1","text":"Kyoto is a city in Japan.","vector":[0.0123,-0.0456,...]}
{"id":"2","text":"Tokyo is the capital of Japan.","vector":[0.0234,-0.0567,...]}
{"id":"3","text":"GiNZA is a Japanese NLP library.","vector":[0.0345,-0.0678,...]}
The actual vector is a high-dimensional floating-point array.
For intfloat/multilingual-e5-large, the embedding dimension is 1024.
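A quick way to sanity-check the output is to read it back and confirm that every record carries a vector of the same length. The sketch below assumes the default vector attribute name; the helper vector_dimensions is hypothetical, and the sample uses toy 3-dimensional vectors rather than real model output.

```python
import json
from typing import Iterable, Set


def vector_dimensions(lines: Iterable[str], vector_attr: str = "vector") -> Set[int]:
    """Collect the distinct vector lengths found in JSONL lines."""
    dims = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        dims.add(len(record[vector_attr]))
    return dims


# Toy records with 3-dimensional vectors, for illustration only.
sample = [
    '{"id":"1","text":"Kyoto is a city in Japan.","vector":[0.1,0.2,0.3]}',
    '{"id":"2","text":"Tokyo is the capital of Japan.","vector":[0.4,0.5,0.6]}',
]
print(vector_dimensions(sample))  # one value means all vectors agree

# With a real file:
# with open("output.jsonl", encoding="utf-8") as f:
#     print(vector_dimensions(f))
```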
Custom JSON Attributes
If your JSONL file uses a different attribute name, you can specify it.
For example:
{"doc_id":"a","body":"Natural language processing handles text data."}
{"doc_id":"b","body":"Vector search can retrieve semantically similar documents."}
Run:
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--text-attr body \
--vector-attr embedding
The output will contain the embedding field.
{"doc_id":"a","body":"Natural language processing handles text data.","embedding":[...]}
{"doc_id":"b","body":"Vector search can retrieve semantically similar documents.","embedding":[...]}
Specifying E5 Text Type
For document embeddings, use passage.
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--text-type passage
For query embeddings, use query.
nlp4j-embedding-local-e5 \
--input queries.jsonl \
--output query_vectors.jsonl \
--text-type query
To disable automatic E5 prefixing, use none.
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--text-type none
Useful Local Options
Batch Size
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--batch-size 64
A larger batch size may improve throughput if enough memory is available.
If memory is limited, reduce the batch size.
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--batch-size 8
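To see why batch size trades memory for throughput, it helps to picture the input being split into fixed-size chunks, each encoded in one model call. The sketch below only shows the chunking idea; it is an assumption about how batching generally works, not the package's actual implementation, and the helper batches is hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(10)]
sizes = [len(batch) for batch in batches(texts, 4)]
print(sizes)  # [4, 4, 2]
```

A larger batch means fewer model calls but more texts held in memory at once.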
Max Length
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--max-length 512
Token Count Check
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--check-token-count
If the token count exceeds --max-length, a warning is printed.
Verbose Mode
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--verbose
This prints batch-level processing time.
Command 2: HTTP Embedding Server
The command nlp4j-embedding-server-e5 starts a lightweight HTTP server.
nlp4j-embedding-server-e5
By default, the server listens on:
127.0.0.1:8888
You can specify the host and port.
nlp4j-embedding-server-e5 --host 127.0.0.1 --port 8888
The server provides the following endpoints:
/embeddings
/semantic_search
/cos_sim
Model Warmup
By default, the server loads and warms up the model at startup.
This may take some time, especially on the first run, because:
- the model may need to be downloaded
- sentence-transformers loads the model
- PyTorch initializes CPU/GPU resources
- a warmup embedding is executed
The server prints status messages during startup, such as model loading and warmup completion.
If you want to skip startup warmup, use:
nlp4j-embedding-server-e5 --no-warmup
In this mode, the server starts faster, but the first API request may take longer because the model is loaded on demand.
API: /embeddings
The /embeddings endpoint generates an embedding for a single text.
This endpoint is intended for document embeddings and uses the E5 passage: prefix internally.
GET
curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."
POST
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text":"This is a test."}' \
http://127.0.0.1:8888/embeddings
Response Example
{
"message": "ok",
"time": "2026-06-20T12:00:00",
"text": "This is a test.",
"embeddings": [0.0123, -0.0456, 0.0789]
}
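The same GET request can be made from Python with only the standard library. This is a minimal sketch that assumes the server is running on the default host and port and that the response has the shape shown above; the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def embeddings_url(text: str, host: str = "127.0.0.1", port: int = 8888) -> str:
    """Build the GET URL for the /embeddings endpoint."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"text": text}, quote_via=quote)
    return f"http://{host}:{port}/embeddings?{query}"


def fetch_embedding(text: str, timeout: float = 30.0) -> list:
    """Call a running server and return the "embeddings" array."""
    with urlopen(embeddings_url(text), timeout=timeout) as response:
        body = json.load(response)
    return body["embeddings"]


print(embeddings_url("This is a test."))

# With the server running:
# vector = fetch_embedding("This is a test.")
# print(len(vector))
```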
Token Count Check
curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test.&checktokencount=true"
The response includes token count information.
{
"message": "ok",
"text": "This is a test.",
"token_count": 10,
"max_tokens": 512,
"truncated": false,
"embeddings": [...]
}
API: /semantic_search
The /semantic_search endpoint compares a query with one or more candidate texts.
The query is encoded with:
query: ...
The candidate texts are encoded with:
passage: ...
This is the recommended endpoint for retrieval-style search.
GET
GET supports one query and one candidate text.
curl "http://127.0.0.1:8888/semantic_search?text1=Japanese%20NLP&text2=GiNZA%20is%20a%20Japanese%20NLP%20library."
POST
POST supports multiple candidate texts.
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text":"Japanese NLP","texts":["GiNZA is a Japanese NLP library.","This document is about image processing.","Tokyo is the capital of Japan."]}' \
http://127.0.0.1:8888/semantic_search
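For reference, here is roughly the same POST request built with Python's standard library. It is a sketch under the assumption that the server accepts the JSON body shown in the curl command; the helper names are hypothetical.

```python
import json
from typing import List
from urllib.request import Request, urlopen


def semantic_search_request(query: str, candidates: List[str],
                            base_url: str = "http://127.0.0.1:8888") -> Request:
    """Build a POST request with the same JSON body as the curl example."""
    payload = {"text": query, "texts": candidates}
    return Request(
        f"{base_url}/semantic_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def semantic_search(query: str, candidates: List[str], timeout: float = 30.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urlopen(semantic_search_request(query, candidates), timeout=timeout) as resp:
        return json.load(resp)


request = semantic_search_request(
    "Japanese NLP",
    ["GiNZA is a Japanese NLP library.", "Tokyo is the capital of Japan."],
)
print(request.get_method(), request.full_url)

# With the server running:
# print(semantic_search("Japanese NLP", ["GiNZA is a Japanese NLP library."]))
```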
Response Example
{
"message": "ok",
"time": "2026-06-20T12:00:00",
"text": "Japanese NLP",
"r": [
{
"corpus_id": 0,
"score": 0.8234
},
{
"corpus_id": 1,
"score": 0.3123
}
]
}
The corpus_id indicates the index of the candidate text in the input list.
The score is the similarity between the query and that candidate; a higher score means a closer match.
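Because the response only returns indexes, the caller has to map each corpus_id back to its candidate text. A small sketch, assuming the response shape shown above (the helper label_results is hypothetical):

```python
from typing import Dict, List, Tuple


def label_results(response: Dict, texts: List[str]) -> List[Tuple[float, str]]:
    """Pair each score with its candidate text, best match first."""
    pairs = [(hit["score"], texts[hit["corpus_id"]]) for hit in response["r"]]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


candidates = [
    "GiNZA is a Japanese NLP library.",
    "This document is about image processing.",
]
response = {
    "message": "ok",
    "text": "Japanese NLP",
    "r": [
        {"corpus_id": 0, "score": 0.8234},
        {"corpus_id": 1, "score": 0.3123},
    ],
}
for score, text in label_results(response, candidates):
    print(f"{score:.4f}  {text}")
```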
API: /cos_sim
The /cos_sim endpoint calculates cosine similarity between two texts.
By default, this endpoint does not add an E5 prefix to either text. It is useful as a simple compatibility endpoint for comparing two raw texts.
For retrieval-style search, /semantic_search is recommended because it applies query: and passage: prefixes correctly.
GET
curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."
POST
curl -X POST \
-H "Content-Type: application/json" \
-d '{"text1":"This is a test.","text2":"This is an exam.","checktokencount":true}' \
http://127.0.0.1:8888/cos_sim
Response Example
{
"text1": "This is a test.",
"text2": "This is an exam.",
"cosine_similarity": 0.8123
}
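For readers who want to reproduce the number themselves, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python version of that formula for two embedding vectors you already have; it is not the package's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Two toy vectors at a 45-degree angle: the result is about 0.707.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```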
Python API
The package can also be used directly from Python.
Single Text Embedding
from nlp4j_embedding import e5_model
vector, elapsed = e5_model.embed_text(
"Kyoto is a city in Japan.",
text_type="passage"
)
print(len(vector))
print(elapsed)
Batch Embedding
from nlp4j_embedding import e5_model
vectors, elapsed = e5_model.embed_texts(
[
"Kyoto is a city in Japan.",
"Tokyo is the capital of Japan."
],
text_type="passage"
)
print(len(vectors))
Semantic Search
from nlp4j_embedding import e5_model
results = e5_model.semantic_search(
"Japanese city",
[
"Kyoto is a city in Japan.",
"Python is a programming language."
]
)
print(results)
Cosine Similarity
from nlp4j_embedding import e5_model
score = e5_model.cos_sim(
"This is a test.",
"This is an exam."
)
print(score)
Testing the Local Command
Create a small JSONL file.
cat > input.jsonl << 'EOF'
{"id":"1","text":"Kyoto is a city in Japan."}
{"id":"2","text":"Tokyo is the capital of Japan."}
EOF
Run:
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--check-token-count \
--verbose
Check the result.
cat output.jsonl
You should see vectors added to each JSON object.
Testing the HTTP Server
Start the server.
nlp4j-embedding-server-e5 --port 8888
Then test the embedding endpoint.
curl "http://127.0.0.1:8888/embeddings?text=This%20is%20a%20test."
Test semantic search.
curl "http://127.0.0.1:8888/semantic_search?text1=Japanese%20NLP&text2=GiNZA%20is%20a%20Japanese%20NLP%20library."
Test cosine similarity.
curl "http://127.0.0.1:8888/cos_sim?text1=This%20is%20a%20test.&text2=This%20is%20an%20exam."
Why an HTTP API Is Useful
Although the embedding model runs in Python, many real-world applications are not written entirely in Python.
A lightweight HTTP server makes the model usable from:
- Java applications
- Node.js applications
- shell scripts
- web applications
- enterprise batch systems
- internal tools
For example, a Java or Node.js application can call:
http://127.0.0.1:8888/embeddings?text=...
without directly importing Python libraries.
This is useful when Python serves as the NLP model runtime while the application itself is developed in another language.
Comparison with Vector Databases and Search Engines
This package is not intended to replace Elasticsearch, OpenSearch, Lucene, PostgreSQL with pgvector, or dedicated vector databases.
Instead, it is useful when you want:
- a simple local embedding tool
- JSONL preprocessing
- a lightweight HTTP embedding server
- quick semantic search experiments
- integration with your own search engine
- embedding generation before indexing into another system
For example, you can use this package to generate vectors first, and then store those vectors in your own search index.
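Before loading vectors into a real index, a brute-force scan over the records is often enough to check that they behave sensibly. The sketch below assumes records shaped like the output JSONL (an id field plus the default vector field) and uses toy 2-dimensional vectors; the helper top_k is hypothetical and is not a substitute for a proper search index.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query_vector: Sequence[float], records: List[Dict], k: int = 2,
          vector_attr: str = "vector") -> List[Tuple[float, str]]:
    """Linear scan: score every record and keep the k best (score, id) pairs."""
    scored = ((_cosine(query_vector, r[vector_attr]), r["id"]) for r in records)
    return heapq.nlargest(k, scored)


# Toy records standing in for rows of output.jsonl.
records = [
    {"id": "1", "vector": [1.0, 0.0]},
    {"id": "2", "vector": [0.0, 1.0]},
    {"id": "3", "vector": [0.7, 0.7]},
]
for score, record_id in top_k([1.0, 0.1], records):
    print(record_id, round(score, 3))
```

A linear scan costs time proportional to the number of records, which is why larger collections are better served by one of the systems listed above.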
Performance Notes
The first execution may take longer because the model must be downloaded and loaded.
The HTTP server warms up the model by default so that the first request can be faster after startup.
nlp4j-embedding-server-e5
To skip warmup:
nlp4j-embedding-server-e5 --no-warmup
For large JSONL files, adjust the batch size depending on available memory and GPU capacity.
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--batch-size 64
If memory is limited, reduce the batch size.
nlp4j-embedding-local-e5 \
--input input.jsonl \
--output output.jsonl \
--batch-size 8
Conclusion
nlp4j-llm-embedding-e5 is a small Python package for using multilingual E5 embeddings locally.
It provides:
- pip installation from PyPI
- a local JSONL embedding command
- a lightweight HTTP embedding server
- semantic search API
- cosine similarity API
- automatic E5 prefix handling
- token count checking
- direct Python API access
Install it with:
pip install nlp4j-llm-embedding-e5
This package is useful for local NLP experiments, RAG preprocessing, semantic search workflows, and integration with applications written in languages other than Python.