When building AI-powered applications, two things will eventually catch you off guard: unexpected API costs and context window errors. Both problems come down to one thing — not knowing how many tokens your text actually uses.
Token Counter API is a simple REST API that counts tokens accurately using OpenAI's official tiktoken library, supports 10+ models, and handles Japanese text correctly — which is trickier than it sounds.
Why Token Counting Matters
Every LLM API charges by token, not by character or word. Before you send a request, you need to know:
- Will this text exceed the model's context limit?
- How much will this request cost?
- If I need to trim the text, where should I cut?
These are questions that come up constantly in production apps — chatbots, document summarizers, RAG pipelines, and more. Guessing or estimating is fine during development, but it will bite you in production.
Japanese Text Is a Special Case
If your app handles Japanese (or other CJK languages), token counting gets more complex.
In English, tokens roughly correspond to words or word pieces; one word is usually 1–2 tokens. Japanese, by contrast, has no spaces between words and mixes several character sets (hiragana, katakana, kanji), so the tokenizer splits it into much smaller chunks, often approaching one token per character.
Here's a quick comparison:
| Text | Characters | Tokens (gpt-4o) |
|---|---|---|
| Hello world | 11 | 2 |
| こんにちは世界 | 7 | 7–9 |
A 500-character Japanese paragraph might use 2–3x more tokens than a 500-character English paragraph. If you're building a multilingual app and ignoring this, your cost estimates are probably off.
Token Counter API returns the actual token breakdown (tokens field), so you can see exactly how the text is being split.
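You can sanity-check these numbers locally with tiktoken itself, the same library the API uses under the hood. A minimal sketch:

import tiktoken

# gpt-4o maps to the o200k_base encoding in tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")

for text in ["Hello world", "こんにちは世界"]:
    n_tokens = len(enc.encode(text))
    print(f"{text!r}: {len(text)} chars -> {n_tokens} tokens")

The English string needs far fewer tokens than characters, while the Japanese string lands near one token per character.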
Quick Start
The API is available on RapidAPI:
https://rapidapi.com/yuyayokoyama0427/api/token-counter-api
Or you can call the base URL directly:
https://token-counter-api.onrender.com
No setup required. Send a POST request with your text and model name, get the token count back.
Endpoints
POST /count — Count tokens for a single text
The core endpoint. Returns token count, character count, encoding name, and the token list.
curl:
curl -X POST "https://token-counter-api.onrender.com/count" \
-H "Content-Type: application/json" \
-d '{"text": "Hello, 日本語テキスト", "model": "gpt-4o"}'
Python:
import requests

response = requests.post(
    "https://token-counter-api.onrender.com/count",
    json={"text": "Hello, 日本語テキスト", "model": "gpt-4o"}
)
print(response.json())
Response:
{
  "token_count": 7,
  "character_count": 14,
  "model": "gpt-4o",
  "encoding": "o200k_base",
  "tokens": ["Hello", ",", " ", "日本語", "テ", "キ", "スト"]
}
The tokens array shows exactly how tiktoken split your input — useful for debugging unexpected token counts.
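Since the response includes both token_count and character_count, you can also derive a tokens-per-character ratio, a quick heuristic for spotting token-dense inputs before they inflate your costs. A small sketch:

import requests

res = requests.post(
    "https://token-counter-api.onrender.com/count",
    json={"text": "こんにちは世界", "model": "gpt-4o"}
).json()

# Rough density check: per the comparison table above, English prose tends
# toward ~0.2-0.3 tokens/char, while Japanese can approach 1.0 or more
ratio = res["token_count"] / res["character_count"]
print(f"{ratio:.2f} tokens per character")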
POST /count/bulk — Count tokens for multiple texts at once
If you're processing a batch of documents, use this endpoint to count all of them in a single request.
import requests

response = requests.post(
    "https://token-counter-api.onrender.com/count/bulk",
    json={
        "texts": [
            "First document text here.",
            "Second document, longer and more detailed.",
            "短い日本語テキスト"
        ],
        "model": "gpt-4o"
    }
)
print(response.json())
This is much faster than looping over individual /count calls when you have many items.
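Consuming the bulk response might look like the sketch below; it assumes the endpoint returns a results list with one /count-style object per input text, which is an assumption to verify against the actual response:

import requests

texts = [
    "First document text here.",
    "Second document, longer and more detailed.",
    "短い日本語テキスト"
]
res = requests.post(
    "https://token-counter-api.onrender.com/count/bulk",
    json={"texts": texts, "model": "gpt-4o"}
).json()

# "results" is a hypothetical field name; confirm it against the real response
for text, result in zip(texts, res["results"]):
    print(f"{result['token_count']:>4} tokens | {text[:40]}")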
POST /truncate — Trim text to a token limit
Need to fit user input into a context window? This endpoint cuts the text at a specified token boundary without breaking in the middle of a token.
import requests

response = requests.post(
    "https://token-counter-api.onrender.com/truncate",
    json={
        "text": "This is a very long document that might exceed your context window...",
        "model": "gpt-4o",
        "max_tokens": 100
    }
)
data = response.json()
print(data["truncated_text"])
print(f"Tokens used: {data['token_count']}")
Useful for safely clipping user messages, document chunks, or retrieved context before passing them to an LLM.
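For example, in a RAG pipeline you might clip every retrieved chunk to a fixed budget before assembling the prompt. A minimal sketch, where clip_chunks is an illustrative helper and the 120-token budget is arbitrary:

import requests

BASE_URL = "https://token-counter-api.onrender.com"

def clip_chunks(chunks: list[str], budget: int = 120, model: str = "gpt-4o") -> list[str]:
    """Truncate each retrieved chunk to at most `budget` tokens."""
    clipped = []
    for chunk in chunks:
        res = requests.post(f"{BASE_URL}/truncate", json={
            "text": chunk,
            "model": model,
            "max_tokens": budget
        }).json()
        clipped.append(res["truncated_text"])
    return clipped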
POST /estimate/cost — Estimate API cost before sending
Before calling an expensive model, you can estimate the cost based on token count and model pricing.
import requests

response = requests.post(
    "https://token-counter-api.onrender.com/estimate/cost",
    json={
        "text": "Summarize the following quarterly report...",
        "model": "gpt-4o",
        "token_type": "input"
    }
)
print(response.json())
This helps you build cost guardrails into your app — for example, warning users before processing unusually large inputs.
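A simple guardrail built on this endpoint might look like the sketch below; estimated_cost_usd is an assumed response field name, so adapt it to whatever the endpoint actually returns:

import requests

MAX_COST_USD = 0.05  # arbitrary per-request ceiling for this example

def check_cost(text: str, model: str = "gpt-4o") -> None:
    res = requests.post(
        "https://token-counter-api.onrender.com/estimate/cost",
        json={"text": text, "model": model, "token_type": "input"}
    ).json()
    # "estimated_cost_usd" is an assumed field name; verify it against the real response
    if res.get("estimated_cost_usd", 0) > MAX_COST_USD:
        raise ValueError("Estimated cost exceeds the per-request ceiling")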
GET /models — List supported models
Check which models are currently supported:
curl "https://token-counter-api.onrender.com/models"
Currently supported models:
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- gpt-4
- gpt-3.5-turbo
- claude-3-5-sonnet
- claude-3-opus
- claude-3-haiku
- gemini-1.5-pro
- gemini-1.5-flash
Coverage across OpenAI, Anthropic, and Google Gemini models means you can use a single API regardless of which LLM you're targeting.
A Practical Example: Safe Message Handling in a Chatbot
Here's a pattern you might use in a real app — check the token count before sending, truncate if needed, then estimate cost:
import requests

BASE_URL = "https://token-counter-api.onrender.com"
MODEL = "gpt-4o"
MAX_INPUT_TOKENS = 4000

def prepare_message(user_input: str) -> dict:
    # Count tokens
    count_res = requests.post(f"{BASE_URL}/count", json={
        "text": user_input,
        "model": MODEL
    }).json()
    token_count = count_res["token_count"]

    # Truncate if over limit
    if token_count > MAX_INPUT_TOKENS:
        trunc_res = requests.post(f"{BASE_URL}/truncate", json={
            "text": user_input,
            "model": MODEL,
            "max_tokens": MAX_INPUT_TOKENS
        }).json()
        user_input = trunc_res["truncated_text"]
        token_count = trunc_res["token_count"]

    # Estimate cost
    cost_res = requests.post(f"{BASE_URL}/estimate/cost", json={
        "text": user_input,
        "model": MODEL,
        "token_type": "input"
    }).json()

    return {
        "text": user_input,
        "token_count": token_count,
        "estimated_cost": cost_res
    }
This kind of guard layer is easy to add and prevents a lot of runtime errors and cost surprises.
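One refinement worth making before production, as plain requests hygiene rather than anything specific to this API: add timeouts and HTTP error checks so a slow or failing call raises immediately instead of hanging:

import requests

def post_json(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST JSON with a timeout, raising on 4xx/5xx responses."""
    res = requests.post(url, json=payload, timeout=timeout)
    res.raise_for_status()
    return res.json()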
Summary
Token Counter API gives you accurate token counts powered by tiktoken — the same library OpenAI uses internally. It handles Japanese and other non-Latin text correctly, supports 10+ models across major providers, and covers the common use cases: counting, bulk counting, truncation, and cost estimation.
If you're building anything with LLMs, knowing your token counts before you send is a small habit that pays off quickly.
Try it on RapidAPI:
https://rapidapi.com/yuyayokoyama0427/api/token-counter-api