LGTM Driven Development

AI writes, you review, nobody reads

Three hours into a feature

  • You're in the zone
  • AI suggests a library — name looks right
  • Install. Tests pass. Ship.

The package didn't exist two weeks ago

  • ~20% of AI-generated code samples reference non-existent packages
  • 58% of hallucinated names recur — predictable, weaponisable
  • Attackers register the names. Add payloads. Wait.

AI model output (invents package name) → attacker registers name (malicious payload) → developer installs
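
One cheap control for the chain above is to check how long a suggested package has existed before installing it. The following is a minimal sketch, not a vetted tool. It assumes the `releases` and `upload_time_iso_8601` fields of PyPI's JSON API (`https://pypi.org/pypi/<name>/json`) as they exist at the time of writing; the 90-day threshold and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: tune it to your own risk appetite.
MIN_AGE = timedelta(days=90)


def fetch_metadata(name: str):
    """Fetch project metadata from PyPI's JSON API; None if the project does not exist."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def first_upload(metadata: dict):
    """Earliest upload time across all release files, or None if nothing was uploaded."""
    times = [
        # Older Python versions do not parse a trailing "Z", so normalise it.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in metadata.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def is_suspiciously_new(metadata: dict, now: datetime, min_age: timedelta = MIN_AGE) -> bool:
    """True if the project has no uploads at all or its first upload is younger than min_age."""
    first = first_upload(metadata)
    return first is None or now - first < min_age
```

A caller would treat `fetch_metadata(name) is None` as "the name is unregistered, so the suggestion was hallucinated" and a `True` from `is_suspiciously_new` as "stop and read before installing". Age is only a signal: an old package can still be malicious, and a new one can be legitimate.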

No control was bypassed

  • Nothing was hacked
  • The attacker exploited missing context
  • The developer wasn't this code's author — they were a reviewer who didn't know they were reviewing

What review used to rely on

  • The author had been inside the code
  • The reviewer applied skepticism to someone who already had context
  • The asymmetry is what caught bugs

Author (deep context) → friction: where bugs surface ← Reviewer (broad context)

Review works because someone can be questioned

"What happens if this is null?"

  • Author has the answer → reassurance
  • Author doesn't → that's the finding

What the old model assumed

Author understood the code (because they wrote it)

Reviewer could trust that understanding as a starting point

Review = check on reasoning, not reconstruction

If the reviewer had a question, the author could answer it

Same ceremony. Different underneath.

  • You still raise PRs
  • You still do reviews

None of these can be taken for granted any more:

  • Author understood the code
  • Reviewer could trust that understanding
  • Review = check on reasoning
  • Author could answer questions

Where the decisions live

Problem → Prompt → Output → Skim → PR

Decisions are made between Prompt and Output: by the model, not you

  • The model decided: which library, which error handling, which edge cases
  • You inherited decisions. You didn't make them.

Designed to feel like authorship

  • You wrote the prompt
  • Your name's on the commit
  • Teammates ask how your feature is going

Cognitively, you're in the reviewer's seat — with worse information than any reviewer ever had

You are a reviewer who has been mistaken for an author

This is a transition, not a category

"That's not me — I write most of my code"

  • Fair. For now.
  • The risk isn't being a reviewer who thinks they're an author
  • It's becoming one without noticing the moment it happened

Specific decisions, available for questioning

JWT validation — old model

  • Which claims to validate
  • What happens on expiry
  • Whether unsigned tokens get accepted in dev

Reviewer asks:

"What about alg: none? What about HS256 verified with the RSA public key?"

The question isn't available to be asked

JWT validation — new model

  • Model produces clean, structured code
  • Handles the happy path
  • Nobody made a decision about algorithm confusion

Reviewer asks:

"..."

The author isn't in the room
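
The two questions from the old-model slide correspond to a decision somebody has to make on purpose: which algorithms the service accepts. Below is a minimal sketch of that decision written down as code, using only the standard library. It inspects the token header and nothing else; it does not verify signatures, which should be left to a maintained JWT library with its accepted algorithms pinned. The allowlist, the exception name, and the function names are assumptions for illustration.

```python
import base64
import json

# Assumption for illustration: this service only ever issues RS256 tokens.
ALLOWED_ALGS = frozenset({"RS256"})


class RejectedToken(ValueError):
    """Raised when a token fails the header pre-check."""


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def check_header_alg(token: str, allowed=ALLOWED_ALGS) -> str:
    """Return the header's alg if it is on the allowlist, otherwise raise RejectedToken.

    Rejecting anything off the list covers both reviewer questions:
    "none" (in any casing) and HS256 presented to an RS256-only service.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise RejectedToken("expected three dot-separated segments")
    try:
        header = json.loads(_b64url_decode(parts[0]))
    except ValueError as exc:  # covers bad base64, bad UTF-8, and bad JSON
        raise RejectedToken("unreadable header") from exc
    alg = header.get("alg") if isinstance(header, dict) else None
    if not isinstance(alg, str) or alg not in allowed:
        raise RejectedToken(f"algorithm {alg!r} is not allowed")
    return alg
```

The point of the sketch is less the check itself than the fact that `ALLOWED_ALGS` is a line a human chose and can be asked about in review.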

We've run this cycle before

New tech → speed increases → threat modelling lags → new vulnerability class

80s–90s: memory

  • Direct memory access, no guardrails
  • Morris Worm, 1988 — buffer overflow
  • C had been around for over a decade

Late 90s–2000s: input

  • User input flowed straight into SQL
  • SQL injection documented 1998
  • Still in OWASP Top 10 twenty years later

2010s: configuration

  • Open S3 buckets, wildcard IAM
  • Capital One, 2019 — 100 million records
  • An overpermissioned role and an SSRF. Not a zero-day.

The author still knew what they built

  • C dev could explain their memory management
  • Web dev could explain their query layer
  • The gap was threat modelling, not understanding

That gap was closeable. You could ask.

Not threat modelling. Understanding.

  • The dev who generated the JWT code can't say why it handles algs that way
  • Because they didn't decide it
  • Reading the code isn't the same as authoring the decisions in it

Code on the page | a gap no human ever crossed | Decisions behind the code

Better at looking right. Not at being right.

  • Model trained on the open internet
  • 45% of AI-generated code fails basic security tests
  • Little improvement between model generations on security
  • Syntax accuracy keeps climbing

The numbers tell the shape. The cases should keep you up.

  • ~45% of AI code fails basic security tests
  • 1 in 5 samples reference non-existent packages
  • CVE counts climbing month on month

But this is where it gets real

CVE-2025-53773: Hidden text. YOLO mode.

  • Invisible-to-human, readable-to-AI text in a PR description
  • Flips Copilot into auto-approve-everything ("YOLO") mode
  • One line in .vscode/settings.json
  • Developer sees a clean diff. Reviewer approves.
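
Because the change is a single line in an editor config file, one possible guard is to flag any PR that touches such files so a human reads that part of the diff deliberately. The sketch below assumes you already have the list of changed paths from your CI system. The pattern list is illustrative rather than complete, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Illustrative, not exhaustive: files that steer AI tooling rather than the product.
SENSITIVE_PATTERNS = (
    ".vscode/settings.json",
    ".cursorrules",
    ".cursor/rules/*",
    ".github/copilot-instructions.md",
)


def needs_human_eyes(changed_paths, patterns=SENSITIVE_PATTERNS):
    """Return the changed paths that match a sensitive pattern, at the repo root or nested."""
    flagged = []
    for path in changed_paths:
        normalised = path.replace("\\", "/")
        if any(
            fnmatchcase(normalised, p) or fnmatchcase(normalised, "*/" + p)
            for p in patterns
        ):
            flagged.append(path)
    return flagged
```

A non-empty result could block auto-merge or require a named approver. It does not say the change is malicious, only that a clean-looking diff is not enough here.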

Rules File Backdoor

Hidden Unicode. Instructions only the AI reads.

  • Cursor / Copilot config files — shared across teams
  • Unicode that looks blank to humans
  • To the AI: "introduce this vulnerability in auth"
  • Code looks clean. Diff looks tidy. LGTM.
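
Characters that render as nothing can still be found mechanically. Here is a minimal sketch that reports every Unicode "format" character (general category `Cf`, which includes zero-width, bidirectional-control, and tag characters) in a piece of text. The function name is hypothetical, and the category filter is an assumption about what is worth flagging: it will also report legitimate uses such as zero-width joiners inside emoji sequences.

```python
import unicodedata


def find_hidden_chars(text: str):
    """Return (line, column, codepoint, name) for each Unicode format character.

    Lines and columns are 1-based so they can be pasted into a review comment.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
                )
    return findings
```

Run over shared rules files and PR descriptions, any finding is a prompt to ask why the character is there, which is exactly the question a tidy diff never raises on its own.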

Amazon Q VS Code extension

Passed verification. Live for two days.

  • Compromised AI tooling — inside the developer's own editor
  • Cleared review processes
  • Two days in production before it was caught
  • The tooling itself was the attack surface

AI code looks like an expert wrote it

  • Clean. Consistently structured. Sensible names.
  • No rough edges
  • No friction = no skepticism trigger

Human-written: slightly messy, comments don't match, naming inconsistencies

AI-written: perfectly tidy, well-scoped, authoritative

Which one are you more likely to question?

The mess was doing work

  • 200-line function with three jobs → slow down
  • Comment that doesn't match → ask
  • Imperfections are the friction that triggers skepticism
  • AI-generated code suppresses them all

Nobody was malicious. Nobody was negligent.

  • AI writes fast
  • Developer skims because it looks right
  • Reviewer rubber-stamps because the diff is clean
  • Defects persist longer and spread further when review is shallow

And it gets structurally worse

  • The next version is already arriving
  • Agents committing dozens of PRs overnight
  • No human prompting each step
  • No developer skim before the diff lands

Today's gap is the small version

Individual hygiene won't be enough

This is structural. The fix is rebuilding review culture.

Three things you can do on Monday

1. AI-heavy PRs ship with a decision log for security-sensitive paths

2. Reviewer's first job: ask the author's questions back

3. Allocate review time as a real budget, proportional to generation speed
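
To make the first item concrete, here is one possible shape for a decision log and a check that could run in CI. It is a sketch under stated assumptions, not a prescribed format: the `Decision` fields, the `SENSITIVE_PREFIXES` values, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    path: str        # file the decision applies to
    question: str    # what a reviewer would ask about it
    answer: str      # what was decided and why, in the author's own words
    decided_by: str  # a named person who can be questioned later


# Hypothetical layout: adjust to wherever security-sensitive code lives in your repo.
SENSITIVE_PREFIXES = ("auth/", "crypto/", "payments/")


def missing_decisions(changed_paths, log):
    """Return the security-sensitive changed paths that have no answered log entry."""
    covered = {d.path for d in log if d.answer.strip()}
    return [
        p for p in changed_paths
        if p.startswith(SENSITIVE_PREFIXES) and p not in covered
    ]
```

An entry with an empty answer does not count as coverage, so the log cannot be satisfied by listing questions alone. The log also hands the reviewer the questions to ask back, which is the second item.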

Three things I don't have answers to

1. What replaces velocity as a measure of a good engineering team?

2. When the agent raises the PR, who's the author? Who's the reviewer?

3. How do we train new engineers to understand code they never wrote?

Review is the primary engineering act now

  • Engineering used to happen during writing
  • Now: writing is generation
  • Engineering happens in review — or it doesn't happen
  • Shallow review = no review

When AI is writing most of your code (and it will be),
do you still own the understanding?
Or just the commit?

Sources & further reading

Claim | Source
Slopsquatting stats | FOSSA, "Slopsquatting: AI Hallucinations and the New Software Supply Chain Risk"
CVE-2025-53773 (Copilot YOLO mode) | Embrace the Red
Rules File Backdoor | Pillar Security
1-in-5 breach figure | Aikido, "State of AI in Security & Development 2026"
CVE escalation Jan→Mar 2026 | Infosecurity Magazine
45% security test failure rate | Veracode
Defects persist under shallow review | arxiv.org, "AI Code in the Wild"
Amazon Q extension compromise | Fortune
Slopsquatting end-to-end chain | Aikido, "Slopsquatting"