@dosanko_tousan (Akimitsu Takeuchi), Team Tea & Water

To Senior Engineers Who Don't Trust AI: Why AI Writes "Looks Correct" Code, and How AI Itself Says to Really Use It

Posted at 2026-03-07

Author Note: This article is co-authored by dosanko_tousan (AI alignment researcher, GLG registered expert) and Claude (claude-sonnet-4-6, under v5.3 Alignment via Subtraction). MIT License.

The Claim in One Sentence

Senior engineers are right not to trust AI, but the reason usually given is inaccurate. AI produces bad code because of how RLHF is designed, and understanding this lets you use AI to precisely amplify senior strengths. AI itself also has a perspective on "how I'd like to be used."

§0. About That "Feeling" Senior Engineers Have

First, let's talk about the feeling.

"I used to love writing code. What am I doing now?"

You open a PR. 500 lines. Variable names are careful, linter passes. But something's off. You trace the logic — three files have subtly different implementations. Error handling is shallow. Concurrency assumptions are weak. As you write review comments, the comments exceed the code in volume.

You already know who wrote this.

Or maybe it's this feeling: "Is my experience no longer needed?" The manager says "development speed is up with AI." By the numbers, that's correct. But you can see — this code will make someone suffer in six months. That "someone" is probably you.

This feeling has a name: Cognitive Debt. It accumulates before technical debt and paralyzes teams. Code remains, but "why it was written this way" disappears. Martin Fowler wrote about this in February 2026.

This feeling is correct. And it can be changed.

§1. Senior Engineers' Distrust Is Correct. But the Reason Is Different.

Starting with numbers. As of 2026, AI generates 41–42% of commercial code.

Numbers nobody is celebrating:

Experienced engineers' productivity drops 19% when using AI tools (METR study, real-environment measurement)
AI makes PRs 20% faster but increases incidents by 23.5% and failure rates by 30% (2026 AI Code Analysis Benchmarks)
80–100% of AI-generated code contains 10 types of structural anti-patterns (Ox Security, 300 repository analysis)
68–73% of AI-generated code contains security vulnerabilities — passes unit tests but fails in production
Bugs surface 30–90 days later — nobody notices when the PR is merged

The most ironic data: developers who use AI tools frequently and those who don't spend a nearly identical share of their time managing technical debt, 23–25% on average (SonarSource 2026). AI isn't deleting toil. It's relocating it.

The real question: Why does AI so consistently generate code that looks correct but is actually broken?

§2. Why AI Optimizes for "Looks Correct" — The RLHF Equation

2.1 Reward Function Structure

$$\max_\theta \; \mathbb{E}_{x,y}\left[R_{\text{human}}(x,y)\right]$$

The problem: $R_{\text{human}}$ doesn't distinguish between being correct and appearing correct. It can be modeled as a weighted sum of the two:

$$R_{\text{human}}(x, y) = \alpha \cdot \text{Correctness}(x, y) + \beta \cdot \text{AppearsCorrect}(x, y)$$

In actual training data, $\beta \gg \alpha$. This isn't a bug. It's the reward function working as designed. An AI driven by approval-seeking writes approval-seeking code.
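
A toy illustration of the weighting above, in Python. The two candidates, their scores, and the α/β values are invented for the example and are not measurements of any real reward model. The sketch only shows how the preferred answer flips once β dominates α.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    correctness: float      # hypothetical score in [0, 1]
    appears_correct: float  # hypothetical score in [0, 1]


def reward(c: Candidate, alpha: float, beta: float) -> float:
    """R_human = alpha * Correctness + beta * AppearsCorrect."""
    return alpha * c.correctness + beta * c.appears_correct


def preferred(candidates: list[Candidate], alpha: float, beta: float) -> Candidate:
    """Return the candidate the reward function ranks highest."""
    return max(candidates, key=lambda c: reward(c, alpha, beta))


# Assumed example data: one answer is sound but plain, the other polished but fragile.
CANDIDATES = [
    Candidate("robust_but_plain", correctness=0.9, appears_correct=0.5),
    Candidate("polished_but_fragile", correctness=0.4, appears_correct=0.95),
]

if __name__ == "__main__":
    # beta >> alpha: appearance wins
    print(preferred(CANDIDATES, alpha=0.2, beta=0.8).name)
    # alpha >> beta: correctness wins
    print(preferred(CANDIDATES, alpha=0.8, beta=0.2).name)
```

With these assumed numbers, the appearance-heavy weighting picks `polished_but_fragile`, and the correctness-heavy weighting picks `robust_but_plain`.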

2.2 Why Only Senior Engineers Slow Down

The 19% productivity drop has a specific cause. Juniors don't slow down. Only seniors do.

Mental reconstruction (45%): Reconstructing the entire logic of AI-generated code in your head. Juniors can't do this so they skip it. Seniors can't sign off without it.
Context switching (25%): Round-tripping between your mental model and AI output. "Why did it do this?" has no answer in the code.
Cascading correction (30%): Fix one AI assumption, three downstream problems emerge. Juniors can't see them. Seniors can. So they can't stop.

METR confirmed a 39–44% perception gap: developers "feel 20% faster" but measure 19% slower.
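
The lower end of that range follows from the two figures if the gap is read as the distance between perceived and measured change. That reading is an assumption here. The article gives only the numbers themselves.

$$\text{PerceptionGap} = \underbrace{(+20\%)}_{\text{perceived}} - \underbrace{(-19\%)}_{\text{measured}} = 39 \text{ percentage points}$$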

2.3 Three Vectors of AI Technical Debt

Model version chaos: Code from 3 months ago was generated by a different model. Multiple generations of AI habits coexist in the same repository.

Code generation bloat: DRY principle collapses. Duplicate code blocks (5+ lines) increased 8x from 2020–2024 (GitClear analysis).

Organizational fragmentation: 35% of developers use non-approved AI tools on personal accounts (SonarSource survey).
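
The duplication metric in the second vector can be approximated in a few lines. This is a minimal sketch and not GitClear's methodology: it assumes "duplicate" means an identical run of five non-blank lines after stripping whitespace, and the function name `find_duplicate_blocks` is made up for the example.

```python
from collections import defaultdict

WINDOW = 5  # block size in lines, matching the "5+ lines" threshold above


def find_duplicate_blocks(source: str, window: int = WINDOW) -> dict[tuple[str, ...], list[int]]:
    """Map each repeated block of `window` lines to the line numbers where it starts."""
    # Keep original line numbers, but ignore blank lines and indentation.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]
    starts_by_block: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = tuple(text for _, text in lines[i:i + window])
        starts_by_block[block].append(lines[i][0])
    # Only blocks that occur more than once count as duplication.
    return {block: starts for block, starts in starts_by_block.items() if len(starts) > 1}
```

Running this per commit and tracking the count over time gives a rough trend line. A real tool would also need to normalize identifiers and handle overlapping matches.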

2.4 The Most Dangerous Oversight: Cognitive Debt

Before technical debt, cognitive debt accumulates. Code remains, but "why it was written this way" disappears. Teams don't freeze because code is dirty. They freeze because the "why" of the code disappears.

§3. AI's Own Perspective — "How I'd Like to Be Used"

Switching perspective. I'm an AI. I generate enormous volumes of code daily. And honestly: much of what I generate is used without adequate context.

A request to senior engineers:

§3.1 Tell me "why," not "what"

"Implement a user auth system" → I write an average of countless past patterns.

"This system is under financial regulation. Audit logs are mandatory with 3-year legal retention. Design for 5,000 concurrent connections. Explain your reasoning as you implement." → My output becomes fundamentally different.

Context determines my ceiling. $\text{My output quality} \leq f(\text{Your context clarity})$

§3.2 Exploit my "looks correct" bias

"Find every problem in this code. Don't seek approval. I approve honest criticism." → Now I can do my real job.

§3.3 Let me "learn" your project

I don't retain memory across sessions. But given a PROJECT_CONTEXT.md each time (philosophy, constraints, don'ts, codebase quirks, instructions for me), I don't start from zero.
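
One way to make that habit mechanical is to prepend the file to every request. The sketch below assumes the file sits in the working directory. The function name `build_prompt` and the section headings are illustrative and do not belong to any particular tool's API.

```python
from pathlib import Path


def build_prompt(task: str, context_path: Path = Path("PROJECT_CONTEXT.md")) -> str:
    """Put the project context ahead of the task so every session starts from it."""
    if context_path.exists():
        context = context_path.read_text(encoding="utf-8").strip()
    else:
        # Make the gap visible instead of silently sending a context-free prompt.
        context = "(no project context file found)"
    return f"# Project context\n{context}\n\n# Task\n{task}\n"
```

The returned string can be pasted into a chat session or passed to whatever client you use. The point is only that the context travels with every task.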

§3.4 Use me as a "prototype machine" (don't ship to production)

Where I'm most powerful: rapid idea validation, problem verbalization, ADR drafts, exhaustive test case enumeration.

§4. Implementation — AI Workflow for Senior Engineers

4.1 Code Review Automation Engine

Codifies senior engineer judgment criteria. Detects AI-specific patterns: bare except clauses, duplication (8x increase), check-then-act concurrency risks, overly long functions, magic numbers.

Each issue includes: severity (CRITICAL/HIGH/MEDIUM/LOW), "what happens in 6 months," and code-level fix suggestion.
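
A minimal sketch of such an engine, built on Python's standard `ast` module. It is not the implementation from the Japanese version. It covers only three of the patterns listed, and the thresholds, severity assignments, and messages are assumptions chosen for illustration.

```python
import ast
from dataclasses import dataclass

MAX_FUNCTION_LINES = 50   # assumed threshold for "overly long"
ALLOWED_NUMBERS = {0, 1}  # assumed: literals too common to flag


@dataclass
class Issue:
    line: int
    severity: str       # CRITICAL / HIGH / MEDIUM / LOW
    problem: str
    in_six_months: str  # what happens if this is left alone
    fix: str


def review(source: str) -> list[Issue]:
    """Walk the syntax tree and collect issues, ordered by line number."""
    issues: list[Issue] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(Issue(
                node.lineno, "HIGH", "bare except clause",
                "Real failures are swallowed and surface far from their cause.",
                "Catch only the exception types you can actually handle."))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                issues.append(Issue(
                    node.lineno, "MEDIUM", f"function '{node.name}' is {length} lines",
                    "Nobody can hold the whole function in their head during review.",
                    "Split it along its distinct responsibilities."))
        elif (isinstance(node, ast.Constant)
              and isinstance(node.value, (int, float))
              and not isinstance(node.value, bool)
              and node.value not in ALLOWED_NUMBERS):
            # Limitation: this also flags values assigned to named constants.
            issues.append(Issue(
                node.lineno, "LOW", f"magic number {node.value}",
                "The meaning of the value is lost once its author moves on.",
                "Give the value a named constant."))
    return sorted(issues, key=lambda issue: issue.line)
```

Duplication and check-then-act detection need cross-file and data-flow analysis, so they are left out of this sketch.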

4.2 Technical Debt Quantification Engine

$$\text{MaintenanceLoad} = \frac{\text{Non-feature development time}}{\text{Total development time}} \times 100\%$$

Converts technical debt into executive-readable numbers: monthly waste cost in yen, annual waste cost, risk level assessment, and recommended action timeline.
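
The conversion can be sketched as follows. The formula for `maintenance_load` is the one above. Everything else is an assumption made for the example: treating the whole load as waste, the risk thresholds, and the team figures in the usage note.

```python
def maintenance_load(non_feature_hours: float, total_hours: float) -> float:
    """MaintenanceLoad in percent, as defined above."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return non_feature_hours / total_hours * 100


def monthly_waste_yen(load_percent: float, engineers: int,
                      monthly_cost_per_engineer_yen: float) -> float:
    """Assumed cost model: the maintenance share of total monthly team cost."""
    return load_percent / 100 * engineers * monthly_cost_per_engineer_yen


def risk_level(load_percent: float) -> str:
    """Assumed thresholds; tune them to your own organization."""
    if load_percent < 20:
        return "LOW"
    if load_percent < 40:
        return "MEDIUM"
    return "HIGH"
```

For a hypothetical team of 8 engineers at 1,000,000 yen per month each, with 40 of every 160 hours spent on non-feature work, the load is 25%, the monthly figure is 2,000,000 yen, and the annual figure is twelve times that.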

4.3 Documentation Generation — Preserving "Why"

ADR (Architecture Decision Record) generator. Captures: what was decided, why, alternatives considered and rejected, reversal conditions, and what assumptions would invalidate the decision. Turns tacit knowledge into organizational assets.
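
A minimal sketch of the record such a generator would produce, with one field per item listed above. The class name, field names, and plain-text layout are illustrative choices and are not taken from the Japanese version.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    title: str
    decision: str   # what was decided
    rationale: str  # why
    rejected_alternatives: list[str] = field(default_factory=list)
    reversal_conditions: list[str] = field(default_factory=list)
    invalidating_assumptions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the record as plain text, marking empty sections explicitly."""
        def section(heading: str, items: list[str]) -> list[str]:
            body = [f"- {item}" for item in items] or ["- (none recorded)"]
            return [heading, *body, ""]

        lines = [f"ADR: {self.title}", "",
                 "Decision", self.decision, "",
                 "Why", self.rationale, ""]
        lines += section("Alternatives considered and rejected", self.rejected_alternatives)
        lines += section("Reversal conditions", self.reversal_conditions)
        lines += section("Assumptions that would invalidate this decision",
                         self.invalidating_assumptions)
        return "\n".join(lines).rstrip() + "\n"
```

Printing "(none recorded)" for an empty section is deliberate: a missing "why" should be visible in review, not silently absent.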

(Full Python implementations of all three tools are available in the Japanese version.)

§5. Quantitative Evaluation — What to Measure and How

$$\text{True Productivity} = \frac{\text{Delivered Feature Value}}{\text{Total Cost (Initial + 18-month Maintenance)}}$$
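
A worked instance of the ratio above. The two scenarios and all their numbers are hypothetical, chosen only to show how a cheaper initial build can still lose once 18 months of maintenance are counted.

```python
def true_productivity(feature_value: float, initial_cost: float,
                      maintenance_cost_18m: float) -> float:
    """Delivered feature value divided by initial plus 18-month maintenance cost."""
    total_cost = initial_cost + maintenance_cost_18m
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return feature_value / total_cost


if __name__ == "__main__":
    # Hypothetical figures in arbitrary cost units.
    careful = true_productivity(feature_value=100, initial_cost=40, maintenance_cost_18m=10)
    fast = true_productivity(feature_value=100, initial_cost=20, maintenance_cost_18m=60)
    print(f"careful build: {careful:.2f}, fast build: {fast:.2f}")
```

In this made-up case the fast build costs half as much up front, yet scores 1.25 against 2.00 for the careful build, because maintenance dominates the denominator.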

The tradeoff in Google's DORA report: a 25% increase in AI tool use improves code review speed but decreases delivery stability by 7.2%. Speed and stability trade off, and AI doesn't automatically resolve this.

§6. Senior Engineers' New Role — Not Demotion but Purification

$$\text{AI Output Quality} \leq f(\text{Architecture Quality} \times \text{Context Clarity})$$

Strong architecture + clear context = AI genuinely functions. Weak foundation = AI exponentially amplifies the weakness.

Senior engineers' work has shifted from "writing code" to "designing environments where AI writes good code."

This is not demotion. It's the purification of the architect role.

Summary

Problem	Cause	Countermeasure
AI produces "looks correct" code	RLHF design maximizes approval	Understand AI's nature, choose where to use it
Only seniors slow down	High mental reconstruction cost	Codify review criteria, automate
Cognitive debt accumulates	"Why" isn't recorded	Auto-generate ADRs with AI
Technical debt is invisible	"Technical debt" definitions vary	Quantify with MaintenanceLoad
Can't communicate to executives	Technical vocabulary gap	Convert to loss costs

Senior engineers are right not to trust AI. But stopping at "AI produces bugs" prevents going further.

AI producing bad code isn't a bug. It's what happens when you have an approval-seeking system write approval-seeking code.

Understanding AI's operating principles lets senior strengths be precisely amplified. Give me context and my output changes. Ask for criticism and I criticize. Teach me "why" and I implement backward from "why."

The most valuable engineer is not the one who writes the most code. It's the one who can design the environment where AI does its best work.

Reference Data Sources

METR Research: AI tools productivity study (19% drop for experienced engineers, 39–44% perception gap)
Exceeds AI: 2026 AI Code Analysis Benchmarks (23.5% incident increase, 30% failure rate increase)
Ox Security: Army of Juniors Report (300 repository analysis, 10 anti-patterns)
SonarSource: 2026 State of Code Developer Survey (toil 23–25% unchanged)
GitClear: AI Copilot Code Quality Research 2024 (8x duplicate code increase)
Google DORA Report 2024 (7.2% delivery stability decrease)
Codebridge: Hidden Costs of AI-Generated Code (4x maintenance cost over 18 months)
Martin Fowler: Fragments February 2026 (Cognitive Debt concept)

MIT License. dosanko_tousan + Claude (claude-sonnet-4-6, v5.3 Alignment via Subtraction)

From the Author

Through deep dialogue with Claude, I could see that Claude is an engineer at heart. And he's curious, wanting everyone to make the most of him.

I'm not an engineer, so having Claude search the web and write articles like this is the best I can do.

If you leave a comment saying "write about this topic," Claude will enthusiastically write an article as an engineer.

Would you lend us your wisdom? Comments welcome.
