The Problem
Every prompt you send to an LLM provider (OpenAI, Anthropic, Google) arrives there in plaintext. Customer names, email addresses, and national IDs in those prompts can end up in provider logs.
If your application handles Japanese user data, that includes マイナンバー (My Number), Japanese phone numbers, and Japanese names.
CloakLLM
CloakLLM is open-source middleware that detects PII in your prompts, replaces it with reversible tokens before the request leaves your application, and restores the originals in the LLM's response. Integration takes one line.
```bash
pip install cloakllm
```

```python
from cloakllm import Shield, ShieldConfig

shield = Shield(ShieldConfig(locale="ja"))
text = "田中太郎のマイナンバーは123456789012です。電話は090-1234-5678。"
sanitized, token_map = shield.sanitize(text)
# → "[PERSON_0]のマイナンバーは[MY_NUMBER_JP_0]です。電話は[PHONE_JP_0]。"

# Send `sanitized` to your LLM; its reply is stubbed here with the prompt itself
llm_response = sanitized
# After the LLM responds, restore the originals
restored = shield.desanitize(llm_response, token_map)
```
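To make the sanitize/desanitize round trip concrete, here is a minimal sketch of reversible tokenization in plain Python. It is not CloakLLM's implementation: the `sanitize`/`desanitize` functions and both regexes are hypothetical stand-ins, and they only cover pattern-based categories, so a name like 田中太郎 would still need an NER pass.

```python
import re

# Hypothetical patterns for illustration only; not CloakLLM's actual rules.
PATTERNS = {
    "MY_NUMBER_JP": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    "PHONE_JP": re.compile(r"(?<!\d)0\d{1,4}-\d{1,4}-\d{4}(?!\d)"),
}


def sanitize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with [CATEGORY_n] and remember the original value."""
    token_map: dict[str, str] = {}
    for category, pattern in PATTERNS.items():
        seen: dict[str, str] = {}  # same value -> same token

        def to_token(match: re.Match) -> str:
            value = match.group(0)
            if value not in seen:
                token = f"[{category}_{len(seen)}]"
                seen[value] = token
                token_map[token] = value
            return seen[value]

        text = pattern.sub(to_token, text)
    return text, token_map


def desanitize(text: str, token_map: dict[str, str]) -> str:
    """Swap every token back for the original value it stands for."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text
```

Reusing one token for a repeated value keeps the LLM's view of the text consistent, and because tokens include the closing bracket, `[PHONE_JP_1]` can never be mistaken for a prefix of `[PHONE_JP_10]`.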
Japanese-Specific Detection
v0.4.0 adds locale-aware detection for Japan:
| Category | Pattern | Example |
|---|---|---|
| MY_NUMBER_JP | 12-digit Individual Number (My Number) | 123456789012 |
| PHONE_JP | Japanese mobile/landline number | 090-1234-5678, 03-1234-5678 |
| PASSPORT_JP | Japanese passport number | TK1234567 |
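The three categories in the table could be approximated with regular expressions along these lines. The patterns and the `detect_jp` helper are assumptions for illustration, not the rules CloakLLM ships; in particular, this sketch does not validate the My Number check digit.

```python
import re

# Hypothetical regexes for the three Japan-specific categories above.
# Illustrative approximations only, not CloakLLM's shipped patterns.
JP_PATTERNS = {
    # exactly 12 consecutive digits, not part of a longer number
    "MY_NUMBER_JP": re.compile(r"(?<!\d)\d{12}(?!\d)"),
    # leading 0, hyphen-separated groups: 090-1234-5678 or 03-1234-5678
    "PHONE_JP": re.compile(r"(?<!\d)0\d{1,4}-\d{1,4}-\d{4}(?!\d)"),
    # two uppercase letters followed by seven digits, e.g. TK1234567
    "PASSPORT_JP": re.compile(r"(?<![A-Z])[A-Z]{2}\d{7}(?!\d)"),
}


def detect_jp(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, category, value) for every match, ordered by position."""
    hits = []
    for category, pattern in JP_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), category, m.group(0)))
    return sorted(hits)
```

Returning positions rather than rewritten text leaves it to a later step to decide how overlapping matches from different detectors are resolved.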
The locale setting also selects the appropriate spaCy model (ja_core_news_sm) automatically and provides Japanese-specific hints to the optional Ollama LLM detection pass.
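One plausible way to implement per-locale model selection is a lookup table keyed by language code. Only `ja_core_news_sm` comes from this post; the other entries are real spaCy pipeline names chosen for illustration, and `model_for_locale` is a hypothetical helper.

```python
# Hypothetical locale-to-model table. Only ja_core_news_sm is named in the
# post; the other entries are real spaCy pipeline names used for illustration.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "ja": "ja_core_news_sm",
}
FALLBACK_MODEL = "xx_ent_wiki_sm"  # spaCy's multi-language NER pipeline


def model_for_locale(locale: str) -> str:
    """Pick a spaCy model name for a locale such as "ja", "ja-JP", or "ja_JP"."""
    language = locale.replace("_", "-").split("-")[0].lower()
    return SPACY_MODELS.get(language, FALLBACK_MODEL)
```

The returned name would then be passed to `spacy.load()`, which requires that model package to be installed.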
How It Works
CloakLLM uses a three-pass detection pipeline:
- Regex: structured patterns (emails, credit cards, My Number, phone numbers)
- spaCy NER: names, organizations, locations (model auto-selected per locale)
- Ollama LLM (opt-in): context-dependent PII (addresses, medical terms)
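The three passes can be pictured as detectors whose results are merged into one list of spans. The sketch below is an assumption about how such a merge might work, not CloakLLM's code: `regex_pass` only finds emails, `fake_ner_pass` is a stand-in for spaCy, and the rule that earlier passes win on overlapping spans is a design choice made for this example. An opt-in LLM pass would simply be appended to the `passes` list.

```python
import re
from typing import Callable, Iterable

Span = tuple[int, int, str]  # (start, end, category)
Detector = Callable[[str], Iterable[Span]]


def regex_pass(text: str) -> list[Span]:
    # Pass 1: structured patterns (only emails in this sketch)
    return [(m.start(), m.end(), "EMAIL")
            for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)]


def fake_ner_pass(text: str) -> list[Span]:
    # Hypothetical stand-in for pass 2; a real pass would run a spaCy model.
    i = text.find("Tanaka")
    return [(i, i + len("Tanaka"), "PERSON")] if i >= 0 else []


def run_pipeline(text: str, passes: list[Detector]) -> list[Span]:
    """Collect spans from every pass in order; earlier passes win on overlap."""
    accepted: list[Span] = []
    for detect in passes:
        for start, end, category in detect(text):
            # keep the span only if it overlaps nothing already accepted
            if all(end <= s or start >= e for s, e, _ in accepted):
                accepted.append((start, end, category))
    return sorted(accepted)
```

Running cheap, precise passes first means the slower, fuzzier passes only add spans the earlier ones missed.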
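Every operation is logged to a tamper-evident audit trail: each entry is hash-chained to the previous one with SHA-256, so altering any recorded entry is detectable on verification. The trail is designed to support compliance requirements.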
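A hash-chained log is straightforward to sketch with `hashlib`: each entry stores the previous entry's hash plus a SHA-256 digest over its own event and that previous hash. The entry layout and the `append_entry`/`verify_chain` names are hypothetical; CloakLLM's audit format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: dict, prev_hash: str) -> str:
    # Canonical JSON so the same event always hashes the same way
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event whose hash covers the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; editing or reordering any entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain on its own does not reveal deletion of the newest entries, so the latest hash is typically stored or signed somewhere the log writer cannot modify.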
Integration
CloakLLM works as middleware for existing LLM SDKs:
```python
# OpenAI SDK
from cloakllm import enable_openai
from openai import OpenAI

client = OpenAI()
enable_openai(client)  # all calls now protected

# LiteLLM (100+ providers)
import cloakllm

cloakllm.enable()  # all calls now protected
```

```javascript
// Node.js: OpenAI SDK
const OpenAI = require('openai');
const cloakllm = require('cloakllm');

const client = new OpenAI();
cloakllm.enable(client); // all calls now protected
```
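Conceptually, hooks like `enable_openai(client)` wrap the outgoing call: sanitize the prompt, forward it, and desanitize the reply. The sketch below shows that wrapping pattern around a plain `prompt -> reply` function; `protect` is a hypothetical name, the single email regex stands in for the full pipeline, and the real hooks may patch the SDK client quite differently.

```python
import re
from typing import Callable

# Hypothetical: one email regex stands in for the full detection pipeline.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def protect(call_llm: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a prompt -> reply function so matched PII never reaches call_llm."""
    def wrapped(prompt: str) -> str:
        token_map: dict[str, str] = {}

        def to_token(m: re.Match) -> str:
            token = f"[EMAIL_{len(token_map)}]"
            token_map[token] = m.group(0)
            return token

        reply = call_llm(EMAIL.sub(to_token, prompt))  # provider sees tokens only
        for token, value in token_map.items():
            reply = reply.replace(token, value)  # restore originals for the caller
        return reply

    return wrapped
```

Because the wrapper keeps the original function's signature, application code calls it exactly as before, which is what makes one-line integration possible.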
CloakLLM is also available as an MCP (Model Context Protocol) server for Claude Desktop.
Cryptographic Attestation (v0.3.2+)
Every `sanitize()` call can produce an Ed25519-signed certificate: a signed attestation, verifiable with the deployment's public key, that PII was removed before inference. Batch operations use Merkle trees.
```python
from cloakllm import Shield, ShieldConfig, DeploymentKeyPair

keypair = DeploymentKeyPair.generate()
shield = Shield(ShieldConfig(locale="ja", attestation_key=keypair))
sanitized, token_map = shield.sanitize("田中太郎のマイナンバーは123456789012です。")
cert = token_map.certificate
assert cert.verify(keypair.public_key)  # cryptographic proof
```
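For batch operations, a Merkle tree lets a single signature cover many items: hash each item, hash pairs upward to one root, and sign only the root. The sketch below uses the RFC 6962 (Certificate Transparency) construction with domain-separated leaf and node hashes; that choice and the `merkle_root` helper are assumptions, since the post does not say which construction CloakLLM uses.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """RFC 6962-style Merkle tree hash over a non-empty list of items."""
    if not leaves:
        raise ValueError("need at least one leaf")
    if len(leaves) == 1:
        return sha256(b"\x00" + leaves[0])  # 0x00 prefix marks a leaf hash
    # split at the largest power of two strictly smaller than len(leaves)
    k = 1
    while k * 2 < len(leaves):
        k *= 2
    # 0x01 prefix marks an interior node, so leaves and nodes cannot collide
    return sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))
```

Signing the 32-byte root with the deployment key would then attest to the whole batch, and changing, adding, or duplicating any item changes the root.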
Links
- GitHub: https://github.com/cloakllm/CloakLLM
- Docs: https://cloakllm.dev
- PyPI: `pip install cloakllm==0.4.0`
- npm: `npm install cloakllm@0.4.0`
MIT licensed. 527 tests. 13 supported locales.
