LGTM Driven Development

AI writes, you review, nobody reads

Three hours into a feature

You're in the zone
AI suggests a library — name looks right
Install. Tests pass. Ship.

The package didn't exist two weeks ago

~20% of AI-generated code samples reference packages that don't exist
58% of hallucinated names recur across repeated runs: predictable, weaponisable
Attackers register the names. Add payloads. Wait.
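
One possible speed bump for this chain: check how long a package has existed before installing it. The sketch below is an illustration, not a vetted control. The 30-day threshold and both function names are assumptions, and the metadata dict is assumed to follow the shape of PyPI's JSON API (`releases` mapping each version to a list of files, each with `upload_time_iso_8601`).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def first_upload(metadata: dict) -> Optional[datetime]:
    """Earliest upload time across all release files, or None if there are none."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def is_suspiciously_new(metadata: dict, now: datetime, min_age_days: int = 30) -> bool:
    """True when the package has no uploads or first appeared within min_age_days."""
    first = first_upload(metadata)
    if first is None:
        return True
    return now - first < timedelta(days=min_age_days)


# Hypothetical metadata for a package first published two weeks before `now`.
example = {"releases": {"0.0.1": [{"upload_time_iso_8601": "2025-06-01T10:00:00.000000Z"}]}}
now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(is_suspiciously_new(example, now))
```

A young package is not proof of malice. The check only turns "name looks right" into a question someone has to answer before the install.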

AI model output invents package name → attacker registers name with malicious payload → developer installs

No control was bypassed

Nothing was hacked
The attacker exploited missing context
The developer wasn't this code's author — they were a reviewer who didn't know they were reviewing

What review used to rely on

The author had been inside the code
The reviewer applied skepticism to someone who already had context
The asymmetry is what caught bugs

Author (deep context) → friction, where bugs surface ← Reviewer (broad context)

Review works because someone can be questioned

"What happens if this is null?"

Author has the answer → reassurance
Author doesn't → that's the finding

What the old model assumed

Author understood the code (because they wrote it)

Reviewer could trust that understanding as a starting point

Review = check on reasoning, not reconstruction

If the reviewer had a question, the author could answer it

Same ceremony. Different underneath.

You still raise PRs
You still do reviews
But the assumptions underneath no longer hold:

Author understood the code

Reviewer could trust that understanding

Review = check on reasoning

Author could answer questions

Where the decisions live

Problem → Prompt → Output → Skim

Decisions are made between Prompt and Output, by the model, not you

The model decided: which library, which error handling, which edge cases
You inherited decisions. You didn't make them.

Designed to feel like authorship

You wrote the prompt
Your name's on the commit
Teammates ask how your feature is going

Cognitively, you're in the reviewer's seat — with worse information than any reviewer ever had

You are a reviewer who has been mistaken for an author

This is a transition, not a category

"That's not me — I write most of my code"

Fair. For now.
The risk isn't being a reviewer who thinks they're an author
It's becoming one without noticing the moment it happened

Specific decisions, available for questioning

JWT validation — old model

Which claims to validate
What happens on expiry
Whether unsigned tokens get accepted in dev

Reviewer asks:

"What about alg: none? What about HS256 verified with the RSA public key?"

The question isn't available to be asked

JWT validation — new model

Model produces clean, structured code
Handles the happy path
Nobody made a decision about algorithm confusion

Reviewer asks:

"..."

The author isn't in the room
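
The two questions from the old-model slide (`alg: none`, and HS256 verified with the RSA public key) both come down to one decision that can be written down: which algorithms are accepted at all. A minimal sketch of that decision, assuming RS256 is the only algorithm the service issues. The function names are illustrative, and this checks the header only. It does not verify signatures.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a compact JWT."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def alg_is_acceptable(token: str, allowed=frozenset({"RS256"})) -> bool:
    """Reject any token whose declared alg is not on the explicit allow-list.

    With only RS256 allowed, both `alg: none` and HS256 (the algorithm-confusion
    case, where the RSA public key is misused as an HMAC secret) are refused
    before any signature check runs.
    """
    try:
        header = jwt_header(token)
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return False
    if not isinstance(header, dict):
        return False
    alg = header.get("alg")
    return isinstance(alg, str) and alg in allowed
```

In practice the same decision is usually expressed by passing an explicit algorithm list to the JWT library's verify call rather than trusting the token's own header. What matters for review is that someone chose the list and can say why.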

We've run this cycle before

New tech → speed increases → threat modelling lags → new vulnerability class

80s–90s: memory

Direct memory access, no guardrails
Morris Worm, 1988 — buffer overflow
C had been around for over a decade by then

Late 90s–2000s: input

User input flowed straight into SQL
SQL injection documented 1998
Still in OWASP Top 10 twenty years later
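
For readers who haven't seen the input-era bug up close, a minimal sketch using Python's built-in `sqlite3`. The table, the data, and the function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])


def find_unsafe(name: str) -> list:
    # Vulnerable: user input is spliced into the SQL text itself.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_safe(name: str) -> list:
    # Parameterized: the value is passed separately from the SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(len(find_unsafe(payload)))  # every row comes back
print(len(find_safe(payload)))    # no user has that literal name
```

The developer who wrote `find_unsafe` could still explain the query layer when asked. That is the point of this era: the gap was in threat modelling.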

2010s: configuration

Open S3 buckets, wildcard IAM
Capital One, 2019 — 100 million records
An overpermissioned role and an SSRF. Not a zero-day.
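
The configuration-era equivalent of a lint check, as a rough sketch: flag wildcards in `Allow` statements of an IAM-style policy document. The function name is an assumption, and the check is deliberately noisy. It flags any `*`, including scoped ones a team might accept after discussion.

```python
def wildcard_findings(policy: dict) -> list:
    """Return (statement_index, field, value) for each wildcard in an Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append((i, field, value))
    return findings


# Hypothetical policy: one overpermissioned statement, one narrowly scoped one.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-bucket/reports/report.csv"},
    ],
}
print(wildcard_findings(policy))
```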

The author still knew what they built

C dev could explain their memory management
Web dev could explain their query layer
The gap was threat modelling, not understanding

That gap was closeable. You could ask.

Not threat modelling. Understanding.

The dev who generated the JWT code can't say why it handles algs that way
Because they didn't decide it
Reading the code isn't the same as authoring the decisions in it

Code on the page ← no human ever crossed this → Decisions behind the code

Better at looking right. Not at being right.

Model trained on the open internet
45% of AI-generated code fails basic security tests
Little improvement between model generations on security
Syntax accuracy keeps climbing

The numbers tell the shape. The cases should keep you up.

~45% of AI code fails basic security tests
1 in 5 samples reference non-existent packages
CVE counts climbing month on month

But this is where it gets real

CVE-2025-53773: Hidden text. YOLO mode.

Invisible-to-human, readable-to-AI text in a PR description
Flips Copilot into auto-approve-everything mode
One line in .vscode/settings.json
Developer sees a clean diff. Reviewer approves.
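
Because the change is one line in a settings file, it can at least be surfaced mechanically. A minimal sketch, assuming the relevant key is `chat.tools.autoApprove` as named in public write-ups of this CVE (treat that key name as an assumption and confirm it against your tooling). It scans raw text instead of parsing, since VS Code settings files may contain comments that a strict JSON parser rejects.

```python
import re

# Assumption: the auto-approve setting reported for this CVE.
AUTO_APPROVE = re.compile(r'"chat\.tools\.autoApprove"\s*:\s*true\b')


def enables_auto_approve(settings_text: str) -> bool:
    """True if the raw settings text switches the (assumed) auto-approve key on."""
    return bool(AUTO_APPROVE.search(settings_text))


# Hypothetical file contents, as they might appear in a diff.
sample = '{\n  // workspace settings\n  "chat.tools.autoApprove": true\n}'
print(enables_auto_approve(sample))
```

Run as a pre-merge check over any diff touching `.vscode/settings.json`, this turns an easily skimmed line into a blocking question.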

Rules File Backdoor

Hidden Unicode. Instructions only the AI reads.

Cursor / Copilot config files — shared across teams
Unicode that looks blank to humans
To the AI: "introduce this vulnerability in auth"
Code looks clean. Diff looks tidy. LGTM.
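
Text that is blank to humans is still visible to a script. A minimal sketch that lists Unicode format characters (general category `Cf`, which includes zero-width, bidirectional-control and tag characters) in a rules file. The function name is illustrative. Expect some false positives, since a few of these characters have legitimate uses in emoji sequences and some scripts.

```python
import unicodedata


def hidden_characters(text: str) -> list:
    """Return (line_no, column, codepoint, name) for each Unicode format character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
                )
    return findings


# Hypothetical rules-file content with two invisible characters on line 2.
rules = "Use secure defaults.\nAlways\u200b validate\u202e input.\n"
for finding in hidden_characters(rules):
    print(finding)
```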

Amazon Q VS Code extension

Passed verification. Live for two days.

Compromised AI tooling — inside the developer's own editor
Cleared review processes
Two days in production before it was caught
The tooling itself was the attack surface

AI code looks like an expert wrote it

Clean. Consistently structured. Sensible names.
No rough edges
No friction = no skepticism trigger

Human-written

Slightly messy, comments don't match, naming inconsistencies

AI-written

Perfectly tidy, well-scoped, authoritative

Which one are you more likely to question?

The mess was doing work

200-line function with three jobs → slow down
Comment that doesn't match → ask
Imperfections are the friction that triggers skepticism
AI-generated code suppresses them all

Nobody was malicious. Nobody was negligent.

AI writes fast
Developer skims because it looks right
Reviewer rubber-stamps because the diff is clean
Defects persist longer, spread further, when review is shallow

And it gets structurally worse

The next version is already arriving
Agents committing dozens of PRs overnight
No human prompting each step
No developer skim before the diff lands

Today's gap is the small version

Individual hygiene won't be enough

This is structural. The fix is rebuilding review culture.

Three things you can do on Monday

AI-heavy PRs ship with a decision log for security-sensitive paths

Reviewer's first job: ask the author's questions back

Allocate review time as a real budget — proportional to generation speed
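
What a decision log could look like in practice, as a sketch under stated assumptions: the path patterns, field names, and function name below are all hypothetical, and a real team would pick its own. The idea is only that each security-sensitive file in an AI-heavy PR carries at least one recorded question and answer that a human owns.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical patterns marking security-sensitive paths in a repo.
SENSITIVE_PATTERNS = ("*/auth/*", "*/crypto/*", "*jwt*")


@dataclass(frozen=True)
class Decision:
    path: str        # file the decision applies to
    question: str    # what a reviewer would have asked the author
    answer: str      # what was decided, and why
    decided_by: str  # the human who owns the answer


def undocumented_sensitive_paths(changed_paths, decisions):
    """Sensitive changed files that have no decision-log entry."""
    documented = {d.path for d in decisions}
    return [
        p for p in changed_paths
        if any(fnmatchcase(p, pat) for pat in SENSITIVE_PATTERNS) and p not in documented
    ]


# Hypothetical PR: three changed files, one logged decision.
changed = ["src/auth/token.py", "src/jwt_utils.py", "docs/readme.md"]
log = [Decision("src/auth/token.py", "Which algs are accepted?",
                "RS256 only; alg none rejected", "alice")]
print(undocumented_sensitive_paths(changed, log))
```

The reviewer's first job then has somewhere to start: every path this returns is a place where nobody has yet claimed a decision.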

Three things I don't have answers to

What replaces velocity as a measure of a good engineering team?

When the agent raises the PR, who's the author? Who's the reviewer?

How do we train new engineers to understand code they never wrote?

Review is the primary engineering act now

Engineering used to happen during writing
Now: writing is generation
Engineering happens in review — or it doesn't happen
Shallow review = no review

When AI is writing most of your code — and it will be —
do you still own the understanding?
Or just the commit?

Sources & further reading

Claim	Source
Slopsquatting stats	FOSSA — Slopsquatting: AI Hallucinations and the New Software Supply Chain Risk
CVE-2025-53773 (Copilot YOLO mode)	Embrace the Red
Rules File Backdoor	Pillar Security
1-in-5 breach figure	Aikido — State of AI in Security & Development 2026
CVE escalation Jan→Mar 2026	Infosecurity Magazine
45% security test failure rate	Veracode
Defects persist under shallow review	arxiv.org — AI Code in the Wild
Amazon Q extension compromise	Fortune
Slopsquatting end-to-end chain	Aikido — Slopsquatting