Cline's Max output

Last updated at 2025-06-07, posted at 2025-06-07

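Introduction

This is simply a note that Cline's Max output is not the same thing as what the LLM itself can output.

This may be obvious, but there are several Issues raising similar points, so some people may not be aware of it.
I had only been paying attention to the LLM's own specs, so I did not notice it for a while.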

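The LLM's Max Output

I usually use Claude, so here is an excerpt of the limits for a few Claude models.
Claude Opus 4 has half the output limit of Sonnet, but that is still 32K.

Model	Input tokens	Output tokens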
claude-opus-4-20250514	200K	32K
claude-sonnet-4-20250514	200K	64K
claude-3-7-sonnet-20250219	200K	64K

I have not checked whether this also works for Claude 4, but Claude 3.7 Sonnet can go up to 128K if you add a request header.
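
As a rough sketch of what "adding a request header" means here, the helper below builds the extra headers for a Messages API call and adds an `anthropic-beta` entry only when a Claude 3.7 Sonnet request asks for more than the standard output limit. `buildExtraHeaders` is a hypothetical name, reading 64K and 128K as 64,000 and 128,000 is an assumption, and the beta value `output-128k-2025-02-19` is what I understand Anthropic's documentation to list for this feature, so please check it against the current docs before relying on it.

```typescript
// Hypothetical helper, not part of Cline or the Anthropic SDK.
const STANDARD_OUTPUT_LIMIT = 64_000 // assumption: "64K" in the table above
const EXTENDED_OUTPUT_LIMIT = 128_000 // assumption: "128K" with the beta header

// Assumption: beta flag name as listed in Anthropic's docs for 3.7 Sonnet.
const EXTENDED_OUTPUT_BETA = "output-128k-2025-02-19"

function buildExtraHeaders(modelId: string, maxTokens: number): Record<string, string> {
	const headers: Record<string, string> = {}
	if (maxTokens <= STANDARD_OUTPUT_LIMIT) {
		// Within the standard limit: no beta header is needed.
		return headers
	}
	if (modelId !== "claude-3-7-sonnet-20250219") {
		// The article only confirms extended output for 3.7 Sonnet.
		throw new Error(`extended output not confirmed for ${modelId}`)
	}
	if (maxTokens > EXTENDED_OUTPUT_LIMIT) {
		throw new Error(`max_tokens ${maxTokens} exceeds ${EXTENDED_OUTPUT_LIMIT}`)
	}
	headers["anthropic-beta"] = EXTENDED_OUTPUT_BETA
	return headers
}

console.log(buildExtraHeaders("claude-3-7-sonnet-20250219", 100_000))
```

The returned object is meant to be merged into whatever headers the HTTP client or SDK already sends; the rejection of other models reflects only what this article has verified, not a statement about the API.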

Looking at Cline

Select Claude Sonnet 4 in Cline.

It then shows Max output: 8,192 tokens.
Wait, Sonnet should be 64K, so why 8K?

Cline is OSS, so let's look at excerpts of the actual source.

api.ts holds the settings for each provider and model.
Looking there, maxTokens is indeed 8192.

src/shared/api.ts

export type AnthropicModelId = keyof typeof anthropicModels
export const anthropicDefaultModelId: AnthropicModelId = "claude-sonnet-4-20250514"
export const anthropicModels = {
	"claude-sonnet-4-20250514": {
		maxTokens: 8192,
		contextWindow: 200_000,
		supportsImages: true,

		supportsPromptCache: true,
		inputPrice: 3.0,
		outputPrice: 15.0,
		cacheWritesPrice: 3.75,
		cacheReadsPrice: 0.3,
	},
	"claude-opus-4-20250514": {
		maxTokens: 8192,
		contextWindow: 200_000,
		supportsImages: true,
		supportsPromptCache: true,
		inputPrice: 15.0,
		outputPrice: 75.0,
		cacheWritesPrice: 18.75,
		cacheReadsPrice: 1.5,
	},
	"claude-3-7-sonnet-20250219": {
		maxTokens: 8192,
		contextWindow: 200_000,
		supportsImages: true,

		supportsPromptCache: true,
		inputPrice: 3.0,
		outputPrice: 15.0,
		cacheWritesPrice: 3.75,
		cacheReadsPrice: 0.3,
	},
    ...
} as const satisfies Record<string, ModelInfo> // as const assertion makes the object deeply readonly
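
To make the gap concrete, the sketch below puts the published output limits from the table next to the `maxTokens` values in the listing above and computes what fraction of each model's limit Cline requests. The `publishedOutputLimit` and `clineMaxTokens` maps and the `shareOfLimit` function are illustrative names, and reading 32K and 64K as 32,000 and 64,000 is an assumption.

```typescript
// Published output limits from the table (assumption: K = 1,000).
const publishedOutputLimit: Record<string, number> = {
	"claude-opus-4-20250514": 32_000,
	"claude-sonnet-4-20250514": 64_000,
	"claude-3-7-sonnet-20250219": 64_000,
}

// maxTokens values copied from the src/shared/api.ts excerpt.
const clineMaxTokens: Record<string, number> = {
	"claude-opus-4-20250514": 8192,
	"claude-sonnet-4-20250514": 8192,
	"claude-3-7-sonnet-20250219": 8192,
}

// Fraction of the model's published output limit that Cline asks for.
function shareOfLimit(modelId: string): number {
	const limit = publishedOutputLimit[modelId]
	const configured = clineMaxTokens[modelId]
	if (limit === undefined || configured === undefined) {
		throw new Error(`unknown model: ${modelId}`)
	}
	return configured / limit
}

for (const id of Object.keys(publishedOutputLimit)) {
	console.log(`${id}: ${(shareOfLimit(id) * 100).toFixed(1)}% of the published limit`)
}
```

Under these assumptions the configured value is roughly an eighth of the Sonnet limit and a quarter of the Opus limit.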

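The same table also holds pricing values, so I was not entirely sure whether maxTokens is only for display or is actually applied as a limit. Let's also look at where the API is called.

anthropic.ts contains the code that calls the Anthropic API.
It passes model.info.maxTokens || 8192 as the API's max_tokens, so the per-model setting defined in api.ts is what sets the parameter (falling back to 8192 if it cannot be read from ModelInfo).

To be sure, I also checked the guide: max_tokens is described as "The maximum number of tokens to generate before stopping.", so it does apply to the number of output tokens.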

src/api/providers/anthropic.ts

import { anthropicDefaultModelId, AnthropicModelId, anthropicModels, ApiHandlerOptions, ModelInfo } from "@shared/api"

export class AnthropicHandler implements ApiHandler {
	private options: ApiHandlerOptions
	private client: Anthropic

	constructor(options: ApiHandlerOptions) {
		this.options = options
		this.client = new Anthropic({
			apiKey: this.options.apiKey,
			baseURL: this.options.anthropicBaseUrl || undefined,
		})
	}

	@withRetry()
	async *createMessage(systemPrompt: string, messages: Anthropic.Messages.MessageParam[]): ApiStream {
		const model = this.getModel()
		let stream: AnthropicStream<Anthropic.RawMessageStreamEvent>
		const modelId = model.id

		const budget_tokens = this.options.thinkingBudgetTokens || 0
		const reasoningOn = (modelId.includes("3-7") || modelId.includes("4-")) && budget_tokens !== 0 ? true : false

		switch (modelId) {
			// 'latest' alias does not support cache_control
			case "claude-sonnet-4-20250514":
			case "claude-3-7-sonnet-20250219":
			case "claude-3-5-sonnet-20241022":
			case "claude-3-5-haiku-20241022":
			case "claude-opus-4-20250514":
			case "claude-3-opus-20240229":
			case "claude-3-haiku-20240307": {
				/*
				The latest message will be the new user message, one before will be the assistant message from a previous request, and the user message before that will be a previously cached user message. So we need to mark the latest user message as ephemeral to cache it for the next request, and mark the second to last user message as ephemeral to let the server know the last message to retrieve from the cache for the current request..
				*/
				const userMsgIndices = messages.reduce(
					(acc, msg, index) => (msg.role === "user" ? [...acc, index] : acc),
					[] as number[],
				)
				const lastUserMsgIndex = userMsgIndices[userMsgIndices.length - 1] ?? -1
				const secondLastMsgUserIndex = userMsgIndices[userMsgIndices.length - 2] ?? -1
				stream = await this.client.messages.create(
					{
						model: modelId,
						thinking: reasoningOn ? { type: "enabled", budget_tokens: budget_tokens } : undefined,
						max_tokens: model.info.maxTokens || 8192,
						// "Thinking isn’t compatible with temperature, top_p, or top_k modifications as well as forced tool use."
						// (https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#important-considerations-when-using-extended-thinking)
						temperature: reasoningOn ? undefined : 0,
						system: [
							{
								text: systemPrompt,
								type: "text",
								cache_control: { type: "ephemeral" },
							},
						], // setting cache breakpoint for system prompt so new tasks can reuse it
						messages: messages.map((message, index) => {
							if (index === lastUserMsgIndex || index === secondLastMsgUserIndex) {
								return {
									...message,
									content:
										typeof message.content === "string"
											? [
													{
														type: "text",
														text: message.content,
														cache_control: {
															type: "ephemeral",
														},
													},
												]
											: message.content.map((content, contentIndex) =>
													contentIndex === message.content.length - 1
														? {
																...content,
																cache_control: {
																	type: "ephemeral",
																},
															}
														: content,
												),
								}
							}
							return message
						}),
						// tools, // cache breakpoints go from tools > system > messages, and since tools dont change, we can just set the breakpoint at the end of system (this avoids having to set a breakpoint at the end of tools which by itself does not meet min requirements for haiku caching)
						// tool_choice: { type: "auto" },
						// tools: tools,
						stream: true,
					},
					(() => {
						// prompt caching: https://x.com/alexalbert__/status/1823751995901272068
						// https://github.com/anthropics/anthropic-sdk-typescript?tab=readme-ov-file#default-headers
						// https://github.com/anthropics/anthropic-sdk-typescript/commit/c920b77fc67bd839bfeb6716ceab9d7c9bbe7393
						switch (modelId) {
							case "claude-sonnet-4-20250514":
							case "claude-opus-4-20250514":
							case "claude-3-7-sonnet-20250219":
							case "claude-3-5-sonnet-20241022":
							case "claude-3-5-haiku-20241022":
							case "claude-3-opus-20240229":
							case "claude-3-haiku-20240307":
								return {
									headers: {
										"anthropic-beta": "prompt-caching-2024-07-31",
									},
								}
							default:
								return undefined
						}
					})(),
				)
				break
			}
			default: {
				stream = await this.client.messages.create({
					model: modelId,
					max_tokens: model.info.maxTokens || 8192,
					temperature: 0,
					system: [{ text: systemPrompt, type: "text" }],
					messages,
					// tools,
					// tool_choice: { type: "auto" },
					stream: true,
				})
				break
			}
		}
        ...
    }
    
	getModel(): { id: AnthropicModelId; info: ModelInfo } {
		const modelId = this.options.apiModelId
		if (modelId && modelId in anthropicModels) {
			const id = modelId as AnthropicModelId
			return { id, info: anthropicModels[id] }
		}
		return {
			id: anthropicDefaultModelId,
			info: anthropicModels[anthropicDefaultModelId],
		}
	}
}
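
The `model.info.maxTokens || 8192` expression in the listing above can be isolated as follows. This is a minimal sketch in which a stripped-down, hypothetical `ModelInfoLike` type stands in for Cline's `ModelInfo`. It only illustrates that the request value comes from the per-model table, with 8192 used when the field is missing (or `0`, since `||` treats any falsy value as missing).

```typescript
// Hypothetical stand-in for the relevant part of Cline's ModelInfo.
interface ModelInfoLike {
	maxTokens?: number
}

const FALLBACK_MAX_TOKENS = 8192

// Mirrors `model.info.maxTokens || 8192` from the provider code.
function resolveMaxTokens(info: ModelInfoLike): number {
	return info.maxTokens || FALLBACK_MAX_TOKENS
}

console.log(resolveMaxTokens({ maxTokens: 8192 })) // value from the model table
console.log(resolveMaxTokens({})) // field missing, so the fallback applies
console.log(resolveMaxTokens({ maxTokens: 64_000 })) // a larger table value would pass through
```

The last call shows that the cap is not hard-coded in the request itself: if the table entry were larger, the larger number would be sent as `max_tokens`.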

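Conclusion

So, because Cline itself specifies max_tokens, you cannot get that much output even if the LLM supports 32K or 64K.

As an aside, I could not quickly find where it is set, but with Claude via OpenRouter the Max output is 64K, the same as the LLM's output limit.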

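I often see articles comparing Cline and Cursor, but I do not recall seeing one that mentions Max output.
It depends on the language and the kind of code, but my impression is that output hits 8192 tokens at around 500 to 700 lines.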
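
That impression translates into a rough per-line figure: 8192 tokens over 500 to 700 lines is about 12 to 16 tokens per line. The sketch below uses that range to guess whether rewriting a file of a given length would fit in one response. `estimateFit` and the per-line constants are illustrative and derived only from the rule of thumb above, not from a tokenizer.

```typescript
const MAX_OUTPUT_TOKENS = 8192

// Derived from the "500 to 700 lines" impression, not measured.
const TOKENS_PER_LINE_LOW = MAX_OUTPUT_TOKENS / 700 // about 11.7
const TOKENS_PER_LINE_HIGH = MAX_OUTPUT_TOKENS / 500 // about 16.4

type Fit = "likely fits" | "borderline" | "likely truncated"

function estimateFit(lineCount: number): Fit {
	const optimistic = lineCount * TOKENS_PER_LINE_LOW
	const pessimistic = lineCount * TOKENS_PER_LINE_HIGH
	if (pessimistic <= MAX_OUTPUT_TOKENS) return "likely fits"
	if (optimistic <= MAX_OUTPUT_TOKENS) return "borderline"
	return "likely truncated"
}

for (const lines of [300, 600, 1000]) {
	console.log(`${lines} lines: ${estimateFit(lines)}`)
}
```

For real decisions a tokenizer-based count would be more reliable, since token density varies a lot between languages and coding styles.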

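For new code you can simply avoid generating that much in a single file, but when refactoring older code you can run into this limit, so Cline is not well suited to such cases.
When I tried to have it continue the output, it did not work for me.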
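
One way to confirm that a response was cut off by this limit rather than finished normally is to look at the stop reason. As I understand the Anthropic Messages API, a streamed response ends with a `message_delta` event whose `delta.stop_reason` is `"max_tokens"` when the cap was hit. The sketch below checks for that over simplified event objects. `StreamEventLike` and `hitOutputCap` are hypothetical names, and this is not a description of how Cline itself handles truncation.

```typescript
// Simplified, hypothetical shape covering only the fields used here.
interface StreamEventLike {
	type: string
	delta?: { stop_reason?: string | null }
}

// True if any message_delta event reports that max_tokens was reached.
function hitOutputCap(events: StreamEventLike[]): boolean {
	return events.some(
		(event) => event.type === "message_delta" && event.delta?.stop_reason === "max_tokens",
	)
}

const truncated: StreamEventLike[] = [
	{ type: "content_block_delta" },
	{ type: "message_delta", delta: { stop_reason: "max_tokens" } },
]
const finished: StreamEventLike[] = [
	{ type: "content_block_delta" },
	{ type: "message_delta", delta: { stop_reason: "end_turn" } },
]

console.log(hitOutputCap(truncated), hitOutputCap(finished))
```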

GitHub Issues

There appear to be requests such as wanting the setting to be selectable so that more output is possible.
