[Docker Edition] Using Multilingual E5 Embedding as an HTTP Server (Multilingual E5 Embedding Server: REST API)

Last updated at 2025-02-12. Posted at 2025-01-04.

A Japanese version of this article is also available.

nlp4j-llm-embeddings-e5

This component makes Multilingual-E5-large embeddings available through an HTTP server.

0. Prerequisites

It is assumed that a Docker environment is already set up.

An image of about 5GB will be downloaded.

1. Installation

Retrieve the image using the docker pull command:

docker pull oyahiroki/nlp4j-llm-embeddings-e5:1.0.0.0

An example of the output is as follows:

>docker pull oyahiroki/nlp4j-llm-embeddings-e5:1.0.0.0
1.0.0.0: Pulling from oyahiroki/nlp4j-llm-embeddings-e5
e17464c8c9fb: Download complete
f344618db07e: Download complete
...
Digest: sha256:7d21fbdb572e1b99242179b4e99ff7cde6876febeb4f2ce981fccd39afd98f04
Status: Downloaded newer image for oyahiroki/nlp4j-llm-embeddings-e5:1.0.0.0
docker.io/oyahiroki/nlp4j-llm-embeddings-e5:1.0.0.0

For more information about the image, please visit:

2. Execution

On the first startup, the initialization process also downloads the Multilingual-E5-large model, so an additional 5GB of disk space will be used. This download happens only on the first startup, not on subsequent starts.

Start the container with the following command, adding any other docker run options you need:

docker run -d --name nlp4j-llm-embeddings-e5 -p 8888:8888 oyahiroki/nlp4j-llm-embeddings-e5:1.0.0.0

This is how it looks on Docker Desktop:

Initialization takes about 5 minutes. CPU usage drops once it completes.
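
Rather than watching CPU usage, a script can poll the server until it answers. The following is a minimal sketch, not part of the image or its library: the helper names `server_responds` and `wait_until_ready` are hypothetical, and it assumes the server returns HTTP 200 for the `?text=` request shown in the Testing section once initialization has finished. It uses only the Python standard library.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def server_responds(base_url="http://127.0.0.1:8888/", text="test"):
    """Return True if the server answers a small request with HTTP 200."""
    url = base_url + "?" + urllib.parse.urlencode({"text": text})
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, timed out, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe=server_responds, timeout_s=600, interval_s=10,
                     sleep=time.sleep):
    """Call probe() until it returns True or about timeout_s seconds of sleeping pass.

    Only the sleep intervals are counted; time spent inside probe() is ignored.
    """
    waited = 0
    while True:
        if probe():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_until_ready()` before sending real requests returns `True` as soon as the server responds and `False` if it never does within the allowed time. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.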

3. Testing

You can vectorize text by sending it to the REST API with the curl command:

curl "http://127.0.0.1:8888/?text=%E3%81%93%E3%82%8C%E3%81%AF%E3%83%86%E3%82%B9%E3%83%88%E3%81%A7%E3%81%99%E3%80%82"
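
The value of the text parameter in this command is the percent-encoded UTF-8 form of the Japanese sentence これはテストです。 ("This is a test."). As a small illustration using only the Python standard library, the encoded string for any other input could be produced like this:

```python
from urllib.parse import quote, unquote

text = "これはテストです。"

# Percent-encode the UTF-8 bytes of the text for use in a query string.
encoded = quote(text)
print(encoded)
# → %E3%81%93%E3%82%8C%E3%81%AF%E3%83%86%E3%82%B9%E3%83%88%E3%81%A7%E3%81%99%E3%80%82

# Decoding gives back the original text.
print(unquote(encoded) == text)
# → True

# Full URL, equivalent to the one passed to curl.
url = "http://127.0.0.1:8888/?text=" + encoded
```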

If you receive a result like the following, it is working correctly:

{"message": "ok", "time": "2025-01-04T05:49:20", "text": "これはテストです。", "embeddings": [...]}
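
A client may want to check the response before using the vector. The sketch below is one possible check, under stated assumptions: `extract_embedding` is a hypothetical helper, and it assumes that "embeddings" holds a single flat list of numbers, which is how the Python and Java examples in the Usage section appear to treat it. Multilingual-E5-large produces 1024-dimensional vectors, so `expected_dim=1024` would be a reasonable setting if the server returns the model output unchanged. The sample body uses three placeholder values, not real output.

```python
import json


def extract_embedding(body, expected_dim=None):
    """Parse a JSON response body and return its embedding as a list of floats."""
    data = json.loads(body)
    if data.get("message") != "ok":
        raise ValueError(f"unexpected message: {data.get('message')!r}")
    embedding = data.get("embeddings")
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("'embeddings' is missing or empty")
    if not all(isinstance(x, (int, float)) for x in embedding):
        raise ValueError("'embeddings' is not a flat list of numbers")
    if expected_dim is not None and len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(embedding)}")
    return [float(x) for x in embedding]


# Placeholder body in the same shape as the response above (values are made up).
sample = ('{"message": "ok", "time": "2025-01-04T05:49:20", '
          '"text": "これはテストです。", "embeddings": [0.1, -0.2, 0.3]}')
vector = extract_embedding(sample)
```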

4. Usage

To call the API from Python, you can use the requests library as follows:

import requests
import json
import time

# URL to send requests
url = "http://127.0.0.1:8888/"
params = {"text": "これはテストです。"}

try:
    # Timestamp before sending request
    start_time = time.time()
    # Send GET request
    response = requests.get(url, params=params)
    # Timestamp after receiving response
    end_time = time.time()
    # Check response status code
    response.raise_for_status()
    # Parse JSON response
    data = response.json()
    # Calculate processing time
    elapsed_time = end_time - start_time
    # Display results
    print("Message:", data["message"])
    print("Time:", data["time"])
    print("Text:", data["text"])
    print("Embeddings (first 5):", data["embeddings"][:5])
    print(f"Time from request to response: {elapsed_time:.4f} seconds")
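except json.JSONDecodeError as e:
    # Checked first: in recent versions of requests, the JSON error raised by
    # response.json() is also a RequestException.
    print(f"Error parsing JSON: {e}")
except requests.exceptions.RequestException as e:
    print(f"Error during HTTP request: {e}")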
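
A common next step is to compare two texts by the cosine similarity of their embeddings. The following is a minimal sketch rather than anything provided by the server: `embed` and `cosine_similarity` are hypothetical helpers, `embed` assumes "embeddings" is a flat list of numbers, and the standard library's urllib is used in place of requests so the block has no third-party dependencies.

```python
import json
import math
import urllib.parse
import urllib.request


def embed(text, base_url="http://127.0.0.1:8888/"):
    """Fetch the embedding for text from the server (requires the container to be running)."""
    url = base_url + "?" + urllib.parse.urlencode({"text": text})
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["embeddings"]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the server running, `cosine_similarity(embed("今日はとてもいい天気です。"), embed("これはテストです。"))` returns a value between -1 and 1, where higher values indicate that the two texts are closer in the embedding space.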

Since it's a REST API, it can also be accessed from other languages like Java.

A dedicated library is also available for Java:

String text = "今日はとてもいい天気です。";
String endPoint = "http://localhost:8888/";
EmbeddingServiceViaHttp nlp = new EmbeddingServiceViaHttp(endPoint);
NlpServiceResponse res = nlp.process(text);
EmbeddingResponse r = (new Gson()).fromJson(res.getOriginalResponseBody(), EmbeddingResponse.class);
System.err.println(Arrays.toString(r.getEmbeddings()));

[EOF]
