
Token Counter API: Count Tokens for GPT, Claude, and Gemini in One Request


When building AI-powered applications, two things will eventually catch you off guard: unexpected API costs and context window errors. Both problems come down to one thing — not knowing how many tokens your text actually uses.

Token Counter API is a simple REST API that counts tokens accurately using OpenAI's official tiktoken library, supports 10+ models, and handles Japanese text correctly — which is trickier than it sounds.


Why Token Counting Matters

Every LLM API charges by token, not by character or word. Before you send a request, you need to know:

  • Will this text exceed the model's context limit?
  • How much will this request cost?
  • If I need to trim the text, where should I cut?

These are questions that come up constantly in production apps — chatbots, document summarizers, RAG pipelines, and more. Guessing or estimating is fine during development, but it will bite you in production.


Japanese Text Is a Special Case

If your app handles Japanese (or other CJK languages), token counting gets more complex.

In English, tokens roughly correspond to words or word-pieces. One word is usually 1–2 tokens. But Japanese has no spaces between words, uses multiple character sets (hiragana, katakana, kanji), and gets split into smaller chunks by the tokenizer.

Here's a quick comparison:

Text              Characters   Tokens (gpt-4o)
Hello world       11           2
こんにちは世界    7            7–9

A 500-character Japanese paragraph might use 2–3x more tokens than a 500-character English paragraph. If you're building a multilingual app and ignoring this, your cost estimates are probably off.
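To see why character counts alone mislead, here is a rough local heuristic. The ratios (roughly 4 characters per token for Latin text, roughly 1 token per character for CJK) are ballpark assumptions for illustration, not guarantees — use the API when you need exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~1 token per CJK character,
    ~4 characters per token for everything else (assumed ratios)."""
    cjk = sum(1 for ch in text if "\u3000" <= ch <= "\u9fff")
    other = len(text) - cjk
    return cjk + (other + 3) // 4  # ceiling division for the non-CJK part

print(estimate_tokens("Hello world"))    # 3
print(estimate_tokens("こんにちは世界"))  # 7
```

Even this crude estimate makes the gap obvious: the 7-character Japanese string costs more tokens than the 11-character English one.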

Token Counter API returns the actual token breakdown (tokens field), so you can see exactly how the text is being split.


Quick Start

The API is available on RapidAPI:
https://rapidapi.com/yuyayokoyama0427/api/token-counter-api

Or you can call the base URL directly:
https://token-counter-api.onrender.com

No setup required. Send a POST request with your text and model name, get the token count back.


Endpoints

POST /count — Count tokens for a single text

The core endpoint. Returns token count, character count, encoding name, and the token list.

curl:

curl -X POST "https://token-counter-api.onrender.com/count" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, 日本語テキスト", "model": "gpt-4o"}'

Python:

import requests

response = requests.post(
    "https://token-counter-api.onrender.com/count",
    json={"text": "Hello, 日本語テキスト", "model": "gpt-4o"}
)
print(response.json())

Response:

{
  "token_count": 12,
  "character_count": 18,
  "model": "gpt-4o",
  "encoding": "cl100k_base",
  "tokens": ["Hello", ",", " ", "日本語", "テ", "キ", "スト"]
}

The tokens array shows exactly how tiktoken split your input — useful for debugging unexpected token counts.
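Because the tokens array holds the literal string pieces, concatenating it should reconstruct the original input (assuming the response shape shown above) — a quick sanity check when a count looks wrong:

```python
# Token pieces from the sample response above; joining them
# reproduces the exact input text.
tokens = ["Hello", ",", " ", "日本語", "テ", "キ", "スト"]
reconstructed = "".join(tokens)
print(reconstructed)  # Hello, 日本語テキスト
```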


POST /count/bulk — Count tokens for multiple texts at once

If you're processing a batch of documents, use this endpoint to count all of them in a single request.

import requests

response = requests.post(
    "https://token-counter-api.onrender.com/count/bulk",
    json={
        "texts": [
            "First document text here.",
            "Second document, longer and more detailed.",
            "短い日本語テキスト"
        ],
        "model": "gpt-4o"
    }
)
print(response.json())

This is much faster than looping over individual /count calls when you have many items.
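If your batch is very large, you may still want to split it into chunks per request. A minimal sketch — note that `MAX_BATCH` here is a hypothetical cap for illustration; check the API's documentation for any real per-request limit:

```python
def chunked(items: list, size: int):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

MAX_BATCH = 100  # hypothetical limit, not documented by the API
texts = [f"document {i}" for i in range(250)]
batches = list(chunked(texts, MAX_BATCH))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch then becomes one `/count/bulk` request instead of up to `MAX_BATCH` individual `/count` calls.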


POST /truncate — Trim text to a token limit

Need to fit user input into a context window? This endpoint cuts the text at a specified token boundary without breaking in the middle of a token.

import requests

response = requests.post(
    "https://token-counter-api.onrender.com/truncate",
    json={
        "text": "This is a very long document that might exceed your context window...",
        "model": "gpt-4o",
        "max_tokens": 100
    }
)
data = response.json()
print(data["truncated_text"])
print(f"Tokens used: {data['token_count']}")

Useful for safely clipping user messages, document chunks, or retrieved context before passing them to an LLM.


POST /estimate/cost — Estimate API cost before sending

Before calling an expensive model, you can estimate the cost based on token count and model pricing.

import requests

response = requests.post(
    "https://token-counter-api.onrender.com/estimate/cost",
    json={
        "text": "Summarize the following quarterly report...",
        "model": "gpt-4o",
        "token_type": "input"
    }
)
print(response.json())

This helps you build cost guardrails into your app — for example, warning users before processing unusually large inputs.
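A guardrail like that can be as simple as comparing an estimated cost against a per-request budget. The prices below are placeholders, not real rates — substitute your provider's current pricing:

```python
# Illustrative per-million-token input prices -- placeholders only.
PRICE_PER_M_INPUT = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}
BUDGET_USD = 0.01  # warn above this per-request threshold

def input_cost(token_count: int, model: str) -> float:
    """Compute input cost in USD from a token count and a price table."""
    return token_count / 1_000_000 * PRICE_PER_M_INPUT[model]

cost = input_cost(4000, "gpt-4o")
print(f"${cost:.4f}")  # $0.0100
if cost >= BUDGET_USD:
    print("warning: unusually large input")
```

The `/estimate/cost` endpoint saves you from maintaining the price table yourself, but the comparison logic on your side looks the same.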


GET /models — List supported models

Check which models are currently supported:

curl "https://token-counter-api.onrender.com/models"

Currently supported models:

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • gpt-4
  • gpt-3.5-turbo
  • claude-3-5-sonnet
  • claude-3-opus
  • claude-3-haiku
  • gemini-1.5-pro
  • gemini-1.5-flash

Coverage across OpenAI, Anthropic, and Google Gemini models means you can use a single API regardless of which LLM you're targeting. One caveat: tiktoken is OpenAI's tokenizer, so counts for Claude and Gemini models are approximations — Anthropic and Google use their own tokenizers.


A Practical Example: Safe Message Handling in a Chatbot

Here's a pattern you might use in a real app — check the token count before sending, truncate if needed, then estimate cost:

import requests

BASE_URL = "https://token-counter-api.onrender.com"
MODEL = "gpt-4o"
MAX_INPUT_TOKENS = 4000

def prepare_message(user_input: str) -> dict:
    # Count tokens
    count_res = requests.post(f"{BASE_URL}/count", json={
        "text": user_input,
        "model": MODEL
    }).json()

    token_count = count_res["token_count"]

    # Truncate if over limit
    if token_count > MAX_INPUT_TOKENS:
        trunc_res = requests.post(f"{BASE_URL}/truncate", json={
            "text": user_input,
            "model": MODEL,
            "max_tokens": MAX_INPUT_TOKENS
        }).json()
        user_input = trunc_res["truncated_text"]
        token_count = trunc_res["token_count"]

    # Estimate cost
    cost_res = requests.post(f"{BASE_URL}/estimate/cost", json={
        "text": user_input,
        "model": MODEL,
        "token_type": "input"
    }).json()

    return {
        "text": user_input,
        "token_count": token_count,
        "estimated_cost": cost_res
    }

This kind of guard layer is easy to add and prevents a lot of runtime errors and cost surprises.


Summary

Token Counter API gives you accurate token counts powered by tiktoken — the same library OpenAI uses internally. It handles Japanese and other non-Latin text correctly, supports 10+ models across major providers, and covers the common use cases: counting, bulk counting, truncation, and cost estimation.

If you're building anything with LLMs, knowing your token counts before you send is a small habit that pays off quickly.

Try it on RapidAPI:
https://rapidapi.com/yuyayokoyama0427/api/token-counter-api
