Building a Code Completion Editor with a Local LLM

Last updated at 2025-12-01, posted at 2025-11-30

The year is almost over, and I imagine everyone's days are getting hectic.
Attendance at our regular 千住Meetup! also seems to drop off as the year-end approaches.

My previous article didn't get many reads, so I decided to write one more before the month is out.

I wanted to write about Teams apps, but I'm not ready on several fronts, so this time I'll just tinker with a local LLM.

Background

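Ever since VSCode started auto-completing my code, I've gotten into the habit of writing code (that is, pressing Tab) without really engaging my brain.
I'm on the free tier, so it gets pretty painful once the free quota runs out.
A local LLM can be used as much as I like without counting requests, so this time I'll build a code completion editor to fall back on when that happens!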

1. Preparing the local LLM

This time I'll run it on a Windows machine using only the CPU.

First, install Ollama and download an LLM model.

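What is Ollama:

An open-source tool that makes it easy to run large language models in a local environment such as your own PC
Once a model is downloaded, it lets you use a variety of LLMs for free, without depending on the internet and while keeping your data private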

Check that it is registered in the PATH:

> ollama --version

ollama version is 0.13.0

If Ollama is not running, the output looks like this:

> ollama --version

Warning: could not connect to a running Ollama instance
Warning: client version is 0.13.0

2. Making the LLM usable locally

This time I'll use gpt-oss, the open-weight model released by OpenAI.
With Ollama running:

> ollama pull gpt-oss

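From the pull-down menu outlined in red, select a model included in the downloaded gpt-oss and, with the network disconnected, try asking it something (this time I quoted the lyrics of New Divide by Linkin Park).

The answer was completely off the mark and it ended up worrying about me, but well, thanks 💦

Let's try some coding...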
Prompt:

Please genereate a python code that can do a get request to localhost:3000 and print out the result

Thought for 6.8 seconds

Below is a very small, self‑contained Python script that performs an HTTP GET request to http://localhost:3000 and prints whatever the server replies.
It uses the popular third‑party library requests (recommended because of its ease of use), but I also included a short alternative that works with the Python standard library (urllib.request) in case you don’t want to install any external dependencies.

1️⃣ Using the requests library
#!/usr/bin/env python3
"""
Simple GET request to http://localhost:3000 using requests
"""

import requests

def main():
    url = "http://localhost:3000"

    try:
        # Send the request
        response = requests.get(url)

        # Raise an exception for HTTP error codes (4xx, 5xx)
        response.raise_for_status()

        # Print status code, headers, and body
        print(f"Status code: {response.status_code}")
        print("Headers:")
        for k, v in response.headers.items():
            print(f"  {k}: {v}")

        print("\nBody:")
        # Pretty‑print JSON responses when possible
        try:
            print(response.json())
        except ValueError:
            # Not JSON – just print raw text
            print(response.text)

    except requests.exceptions.ConnectionError:
        print(f"Could not connect to {url}. Is the server running?")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error occurred: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


if __name__ == "__main__":
    main()
How to run
# 1️⃣ Install requests (if you haven’t already)
pip install requests

# 2️⃣ Run the script
python get_localhost.py
2️⃣ Without any third‑party packages (standard library only)
#!/usr/bin/env python3
"""
GET request to http://localhost:3000 using urllib (no external deps)
"""

import urllib.request
import urllib.error
import json

def main():
    url = "http://localhost:3000"

    try:
        with urllib.request.urlopen(url) as response:
            status = response.status
            headers = response.getheaders()
            body = response.read()

            print(f"Status code: {status}")
            print("Headers:")
            for k, v in headers:
                print(f"  {k}: {v}")

            print("\nBody:")
            # Try to decode as UTF‑8, fall back to bytes if needed
            try:
                text = body.decode("utf-8")
            except UnicodeDecodeError:
                text = body

            # Pretty‑print JSON if possible
            try:
                json_obj = json.loads(text)
                print(json.dumps(json_obj, indent=2))
            except (ValueError, TypeError):
                print(text)

    except urllib.error.URLError as e:
        print(f"Could not connect to {url}: {e.reason}")

if __name__ == "__main__":
    main()
How to run
python get_localhost_stdlib.py
Quick sanity check
Make sure something is listening on port 3000 before running the script.
For example, you could start a simple Node.js/Express server:

# Example: node server.js (listening on 3000)
or a minimal Python HTTP server:

# In the directory you want to serve
python -m http.server 3000
Once the server is up, run the script and you’ll see the response printed to the console.

A typical ChatGPT-style answer.
Now let's call it through the API so that it is easier to use from code.

3. Calling it via the API

It turns out that once Ollama is started, it listens on localhost:11434.
You can also run just the server, with no GUI, using ollama serve.
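Before sending prompts, it can help to confirm that the server is reachable and that the model has been pulled. The following is a minimal sketch, assuming the default address above and Ollama's GET /api/tags endpoint for listing local models; the helper names `model_names` and `list_local_models` are hypothetical and not part of the original project.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_BASE_URL = "http://localhost:11434"


def model_names(tags: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags.get("models", [])]


def list_local_models(base_url: str = OLLAMA_BASE_URL, timeout: float = 3.0) -> list[str] | None:
    """Return the locally available model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as res:
            return model_names(json.load(res))
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    names = list_local_models()
    if names is None:
        print("Ollama does not seem to be running")
    else:
        print("Available models:", names)
```

Returning None for an unreachable server keeps "not running" distinct from "running, but no models pulled yet" (an empty list).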

Let's try calling localhost from Python.

import requests
import json

API_SERVER_URL = "http://localhost:11434/api/chat"

headers = {"Content-Type": "application/json"}
payload = {
    "model": "gpt-oss",
    "messages": [{
        "role": "user",
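        "content": """
        Request: Return Hello World
        Language: Python
        Constraints:
        - The code must be provided in a runnable form
        - The code must be simple and easy to understand
        Output format:
        - Output only the code. No explanations or comments are needed.
        """,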
    }]
}
response = requests.post(
    API_SERVER_URL, 
    headers=headers, 
    json=payload, 
    stream=True
)
response.raise_for_status()

final_text = ""

for line in response.iter_lines():
    if not line:
        continue
    data = json.loads(line.decode("utf-8"))
    if "message" in data and "content" in data["message"]:
        final_text += data["message"]["content"]
print(final_text)

Result:

print("Hello World")

By default, /api/chat returns a streaming response as newline-delimited JSON, one object per line (simplified here):

{"message": {"content": "Hello"}}
{"message": {"content": "World"}}
{"done": true}

So instead of calling response.json(), you need to parse the response line by line.
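If incremental output is not needed, the request can also ask for a single JSON object instead of a stream. The following is a minimal sketch, assuming Ollama's `stream` request field set to false; `build_chat_payload`, `extract_content`, and `ask` are hypothetical helper names, and it uses the standard library's urllib rather than requests so that it has no dependencies.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:11434/api/chat"  # assumed default Ollama address


def build_chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build a request body for POST /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def extract_content(body: dict) -> str:
    """Pull the assistant text out of a single (non-streamed) response object."""
    return body.get("message", {}).get("content", "")


def ask(prompt: str, model: str = "gpt-oss", url: str = CHAT_URL) -> str:
    """Send one prompt and return the whole reply at once (needs a running Ollama)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as res:
        return extract_content(json.load(res))
```

The trade-off is that nothing is shown until the model has finished, which on a CPU-only machine can take a while, so streaming remains the better fit for the editor itself.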

Correct receiving code (streaming-aware)

response = requests.post(
    API_SERVER_URL, 
    headers=headers, 
    json=payload, 
    stream=True
)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue
    data = json.loads(line.decode("utf-8"))
    print(data)
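The line-by-line parsing can be wrapped in a small function that joins the pieces and stops at the final object. This is a minimal sketch using the simplified chunk shape shown above; `collect_stream` is a hypothetical name, and the sample bytes stand in for what `response.iter_lines()` would yield.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the message.content pieces of a line-delimited JSON stream until "done" is true."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank lines
        data = json.loads(raw.decode("utf-8"))
        parts.append(data.get("message", {}).get("content", ""))
        if data.get("done"):
            break
    return "".join(parts)


# Stand-in for response.iter_lines(); the last entry shows that reading stops at "done"
sample = [
    b'{"message": {"content": "Hello"}}',
    b"",
    b'{"message": {"content": " World"}}',
    b'{"done": true}',
    b'{"message": {"content": "ignored"}}',
]
print(collect_stream(sample))  # Hello World
```

With the request from earlier, the call would be `collect_stream(response.iter_lines())`.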

4. Designing the code completion feature

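When you hear "code completion", you probably picture the predictive-text-style autocompletion found in VSCode and similar editors.

Honestly, I have no experience developing Windows applications, and I don't know how to build a feature that shows a faint prediction and applies it with the Tab key (I can vaguely imagine it involves mapping key input to actions, fiddling heavily with CSS, and other tedious work).

So this time I'll make it a local Next.js app with the screen split in two: the editor on the left and the predicted completion on the right.
Calling the LLM on every keystroke isn't realistic, so the request runs automatically once typing has paused.

(I say all that, but the truth is that my machine isn't powerful enough to run an LLM in real time, review the code, and show the prediction as ghost text. Sorry.)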
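Running the request only after typing has paused is usually called debouncing. The idea can be sketched in Python as follows, assuming a timer that is restarted on every change; the `Debouncer` class and the 0.1-second delay are illustrative only and not part of the app.

```python
import threading
import time
from typing import Callable, Optional


class Debouncer:
    """Call `func` only after `delay` seconds have passed without a new trigger."""

    def __init__(self, func: Callable[[str], None], delay: float) -> None:
        self._func = func
        self._delay = delay
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def trigger(self, text: str) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # input changed again: drop the pending call
            self._timer = threading.Timer(self._delay, self._func, args=(text,))
            self._timer.daemon = True
            self._timer.start()


calls = []
debouncer = Debouncer(calls.append, delay=0.1)
for snapshot in ["n", "nu", "num1 = 10"]:
    debouncer.trigger(snapshot)  # simulates fast typing
time.sleep(0.5)  # typing has stopped; the last pending call fires
print(calls)  # ['num1 = 10']
```

Only the final snapshot reaches the callback, so the LLM is asked once per pause rather than once per keystroke.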

Prompt:

Request: Read the current code, infer its intent, and predict and complete the finished version.
Language: Python
Current code:
    ```Python
    num1 = 10
    num2 = 20
    result = 
    ```
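Constraints:
- The code must be provided in a runnable form
- The code must be simple and easy to understand
Output format:
- Output only the code. No explanations or comments are needed.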

Result:

num1 = 10
num2 = 20
result = num1 + num2
print(result)

Oh, this is going pretty well. This might just work!

With variables set up like this, it should be flexible enough!

Language: {lang}
Current code:
    ```{lang}
    {cur_code}
    ```
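Filling those placeholders can be sketched in Python with `str.format`. This is an illustration only: `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the fence marker is built indirectly just so the snippet itself contains no literal code fence.

```python
FENCE = "`" * 3  # three backticks, built indirectly to keep this snippet fence-free

PROMPT_TEMPLATE = (
    "Request: Read the current code, infer its intent, "
    "and predict and complete the finished version.\n"
    "Language: {lang}\n"
    "Current code:\n"
    "{fence}{lang}\n"
    "{cur_code}\n"
    "{fence}\n"
    "Constraints:\n"
    "- The code must be provided in a runnable form\n"
    "- The code must be simple and easy to understand\n"
    "Output format:\n"
    "- Output only the code. No explanations or comments are needed.\n"
)


def build_prompt(lang: str, cur_code: str) -> str:
    """Fill the template with the selected language and the editor contents."""
    return PROMPT_TEMPLATE.format(lang=lang, cur_code=cur_code, fence=FENCE)


print(build_prompt("python", "num1 = 10\nnum2 = 20\nresult = "))
```

Because `format` does not re-parse substituted values, editor contents that contain braces (for example `d = {}`) pass through unchanged.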

5. Creating the Next.js app

Create the project the usual way:

> npx create-next-app llmeditor

For the editor, I'll use the Monaco editor.

> npm install @monaco-editor/react axios

TypeScript app/page.tsx

"use client";

import { useState } from "react";
import Editor from "@monaco-editor/react";
import CallComponent from "./components/Prediction";

const languages = [
  { label: "HTML", value: "html" },
  { label: "CSS", value: "css" },
  { label: "Java", value: "java" },
  { label: "Apex", value: "apex" },
  { label: "Python", value: "python" },
  { label: "Go", value: "go" },
  { label: "C", value: "c" },
  { label: "C++", value: "cpp" },
  { label: "C#", value: "csharp" },
  { label: "Perl", value: "perl" },
  { label: "PHP", value: "php" },
  { label: "Rust", value: "rust" },
  { label: "TypeScript", value: "typescript" },
  { label: "JavaScript", value: "javascript" },
  { label: "SQL", value: "sql" },
  { label: "Shell Script", value: "shell" },
  { label: "YAML", value: "yaml" },
  { label: "Dockerfile", value: "dockerfile" },
  { label: "PowerShell", value: "powershell" },
  { label: "JSON", value: "json" },
  { label: "Markdown", value: "markdown" },
  { label: "PlainText", value: "plaintext" }
];

export default function Page() {
  const [lang, setLang] = useState(languages[0].value);
  const [cur_code, setCurCode] = useState("");

  return (
    <div style={{ display: "flex", height: "100vh" }}>
      
      {/* Left: language selector and editor */}
      <div style={{ flex: 1, borderRight: "1px solid #555" }}>        
        <div style={{ padding: 8, background: "#222" }}>
          <select
            value={lang}
            onChange={(e) => setLang(e.target.value)}
            style={{ padding: 6, color: "#fff", background: "#333", border: "1px solid #555" }}
          >
            {languages.map((l) => (
              <option key={l.value} value={l.value}>{l.label}</option>
            ))}
          </select>
        </div>

        <Editor
          height="calc(100vh - 40px)"
          language={lang}
          value={cur_code}
          theme="vs-dark"
          onChange={(v) => setCurCode(v ?? "")}
        />
      </div>

      {/* Right: predicted completion */}
      <div style={{ flex: 1, borderRight: "1px solid #555"  }}>
        <div style={{ padding: 13, background: "#222", color: "#fff" }}>
          Predicted completion
        </div>
          <CallComponent lang={lang} cur_code={cur_code} />
      </div>
    </div>
  );
}

TypeScript app/components/Prediction.tsx

"use client";

import { useState, useEffect, useRef } from "react";

interface Props {
  lang: string;
  cur_code: string;
}

export default function CallComponent({ lang, cur_code }: Props) {
  const [response, setResponse] = useState("");
  const timeoutRef = useRef<NodeJS.Timeout | null>(null);

  const run = async () => {
    setResponse("");

    const payload = {
      model: "gpt-oss",
      messages: [
        {
          role: "user",
          content: `
            Request: Read the current code, infer its intent, and predict and complete the finished version.
            Language: ${lang}
            Current code:
            \`\`\`${lang}
            ${cur_code}
            \`\`\`
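            Constraints:
            - The code must be provided in a runnable form
            - The code must be simple and easy to understand
            Output format:
            - Output only the code. No explanations or comments are needed.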
          `,
        },
      ],
    };

    try {
      const res = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });

      if (!res.ok) throw new Error(`HTTP error: ${res.status}`);

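      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = ""; // holds a trailing partial line between chunks

      while (true) {
        const { value, done } = await reader.read();
        if (done) break;

        // A chunk can end in the middle of a JSON line, so parse only complete lines
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.trim()) continue;
          try {
            const data = JSON.parse(line);
            if (data?.message?.content) {
              setResponse((prev) => prev + data.message.content);
            }
          } catch {
            // Skip lines that are not valid JSON
          }
        }
      }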
    } catch (err: any) {
      setResponse("Error: " + err.message);
    }
  };

  // Debounce: call run after cur_code (or lang) changes and input has settled
  useEffect(() => {
    if (timeoutRef.current) clearTimeout(timeoutRef.current);
    timeoutRef.current = setTimeout(() => {
      // Wait 600 ms before calling
      if (cur_code.trim()) run();
    }, 600);
    return () => {
      if (timeoutRef.current) clearTimeout(timeoutRef.current);
    };
  }, [cur_code, lang]);

  return (
    <div style={{ width: "100%", background: "#111", color: "#eee", height: "100%", display: "flex", flexDirection: "column" }}>
      <pre style={{ whiteSpace: "pre-wrap", padding: 6 }}>{response}</pre>
    </div>
  );
}

And with that, it's done!
GitHub Repository

Afterword

In the end it turned out to be something of a reinvention of VSCode, but once you try it, local LLMs are pretty fun!

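Incidentally, gpt-oss turned out to be too heavy after all, so I ended up switching to gemma2:2b.
Frankly, accuracy seems to have dropped considerably (mainly in understanding the requirements), but responses became much faster right away, so it's fine for playing around.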

See you next time!
