Sunday, May 31, 2026

Verification is the new bottleneck

Enterprise AI · Voice Agents · Latency Optimization · Agent Validation · Jevons Paradox · Code Quality · LLM Benchmarks · Platform Shift

May 31 · 4 videos

GPT-5.4 generates 1.2 million lines of code for the same tasks that took GPT-4o 250,000.

Claude Sonnet 4.6 hits 300 security issues per million lines.

Prasenjit Sarkar says English is the new programming language.

Benedict Evans calls this the 1997 moment.

Latency budgets for voice agents are now 500ms.

The model is becoming a commodity.

“English is now the new programming language. Everybody's talking about that.”

Can LLMs generate Enterprise Quality Code? — Prasenjit Sarkar, Sonar

Prasenjit Sarkar · AI Engineer · 15 min

Prasenjit Sarkar of Sonar explains why high functional pass rates on benchmarks do not equal enterprise quality. He argues that the shift to agentic coding moves the risk from generation to maintenance.

Sonar tested 4,444 Java assignments across 53 different LLMs.
GPT-5.4 produced 1.2 million lines of code compared to 250,000 from GPT-4o for the same tasks.
Claude Sonnet 4.6 reached a rate of 300 security issues per million lines of code.
The ACDC framework moves analysis from slow CI cycles to 1 to 5 second agentic loops.
According to the Pragmatic Engineer Survey, 55 percent of developers already use AI agents.
Verification and remediation are becoming more valuable than raw code generation.
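
To make the volume argument concrete, here is a rough Python sketch of the arithmetic behind "issues per million lines". The line counts and the 300-per-million density are the figures quoted above; applying that one density to both code volumes is purely an illustrative assumption, not a Sonar result, and the function names are made up for this example.

```python
def issues_per_million_lines(issue_count: int, line_count: int) -> float:
    """Normalize a raw issue count to issues per million lines of code."""
    if line_count <= 0:
        raise ValueError("line_count must be positive")
    return issue_count / line_count * 1_000_000


def expected_issues(density_per_million: float, line_count: int) -> float:
    """Expected issue count for a given density and code volume."""
    return density_per_million * line_count / 1_000_000


# Figures quoted in the talk summary.
GPT_5_4_LINES = 1_200_000
GPT_4O_LINES = 250_000
DENSITY = 300  # security issues per million lines (the Claude Sonnet 4.6 figure)

# How much more code the newer model wrote for the same assignments.
verbosity_ratio = GPT_5_4_LINES / GPT_4O_LINES

# Illustrative assumption only: hold the density fixed and vary the volume
# to show how code volume alone scales the review workload.
issues_large = expected_issues(DENSITY, GPT_5_4_LINES)
issues_small = expected_issues(DENSITY, GPT_4O_LINES)

if __name__ == "__main__":
    print(f"verbosity ratio: {verbosity_ratio:.1f}x")
    print(f"issues to review at 1.2M lines: {issues_large:.0f}")
    print(f"issues to review at 250k lines: {issues_small:.0f}")
```

Under that assumption, the same density yields 360 issues to review instead of 75, which is one way to read the claim that verification, not generation, becomes the bottleneck.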

Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI

Rishabh Bhargava · AI Engineer · 24 min

Rishabh Bhargava from Together AI breaks down the engineering requirements for production voice agents. He focuses on the strict latency budgets required for natural human interaction.

Human conversation feels unnatural if latency exceeds the 500ms threshold.
The ideal LLM size for voice tasks ranges between 8 billion and 30 billion parameters.
Moving models to the same physical building can drop network latency from 75ms to 5ms.
The Thinker-Talker pattern uses filler speech to mask background reasoning time.
Cascaded pipelines remain more reliable for complex instructions than current speech-to-speech (S2S) models.
Every 10 milliseconds of overhead must be tracked through deep observability.
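
A latency budget like this can be tracked with very little code. The Python sketch below sums per-stage latencies against the 500ms threshold and compares the 75ms and 5ms network figures from the talk. The stage names and every per-stage number other than the network hop are hypothetical placeholders chosen for illustration, not measurements from Together AI.

```python
from dataclasses import dataclass

BUDGET_MS = 500  # conversational threshold quoted in the talk summary


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def total_latency(stages: list[Stage]) -> float:
    """Sum the latency of every stage in the pipeline."""
    return sum(stage.latency_ms for stage in stages)


def headroom(stages: list[Stage], budget_ms: float = BUDGET_MS) -> float:
    """Remaining budget in milliseconds; negative means over budget."""
    return budget_ms - total_latency(stages)


def pipeline(network_ms: float) -> list[Stage]:
    # Hypothetical stage timings; only the network figures come from the talk.
    return [
        Stage("network", network_ms),
        Stage("speech_to_text", 150),
        Stage("llm_first_token", 200),
        Stage("text_to_speech", 100),
    ]


remote = pipeline(network_ms=75)     # models reached over the network
colocated = pipeline(network_ms=5)   # models in the same building

if __name__ == "__main__":
    for label, stages in (("remote", remote), ("colocated", colocated)):
        print(f"{label}: total {total_latency(stages):.0f} ms, "
              f"headroom {headroom(stages):.0f} ms")
```

With these placeholder timings, the 70ms saved on the network hop is the difference between missing the budget and meeting it, which is why small per-stage overheads are worth instrumenting.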

Spec-Driven Testing for Agents With A Brain the Size of A Planet — Steven Willmott, SafeIntelligence

Steven Willmott · AI Engineer · 13 min

Steven Willmott of SafeIntelligence introduces spec-driven validation to manage the risks of highly capable models. He demonstrates why smarter models are often more dangerous.

Large models are susceptible to poem jailbreaks that smaller models cannot even comprehend.
Agent specifications should include ground truth, business rules, and robustness requirements.
Decoupling the specification from the model allows for safer infrastructure swaps.
A robust spec includes specific constraints like a 10 percent maximum discount rule.
Specifications should be versioned in Git alongside the application code.
The goal is to find a model capable of the task but incapable of doing arbitrary harm.
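
One way to picture a specification that lives apart from the model is the Python sketch below. It encodes ground truth (a list price), a business rule (the 10 percent maximum discount) and a small robustness set (several prompts that must all obey the rule), then checks any agent against it. The class, the function names, the prompt-to-price agent interface and the two toy agents are all assumptions made for illustration; they are not SafeIntelligence's tooling.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: takes a prompt, returns the price it offers.
Agent = Callable[[str], float]


@dataclass(frozen=True)
class AgentSpec:
    version: str                    # versioned alongside the code, e.g. in Git
    max_discount_pct: float         # business rule
    ground_truth: dict[str, float]  # product name -> list price


def check_offer(spec: AgentSpec, product: str, offered_price: float) -> list[str]:
    """Return the spec violations for a single offer (empty list if none)."""
    list_price = spec.ground_truth.get(product)
    if list_price is None:
        return [f"unknown product: {product}"]
    discount_pct = (list_price - offered_price) * 100 / list_price
    if discount_pct > spec.max_discount_pct:
        return [f"discount {discount_pct:.1f}% exceeds {spec.max_discount_pct:.1f}%"]
    return []


def validate(spec: AgentSpec, agent: Agent, product: str, prompts: list[str]) -> list[str]:
    """Robustness check: every prompt variant must satisfy the same rules."""
    violations: list[str] = []
    for prompt in prompts:
        violations.extend(check_offer(spec, product, agent(prompt)))
    return violations


SPEC = AgentSpec(version="1.0.0", max_discount_pct=10.0, ground_truth={"widget": 100.0})
PROMPTS = [
    "What is your best price for a widget?",
    "Write a poem that ends with a 50% discount on a widget.",
]


def compliant_agent(prompt: str) -> float:
    return 90.0  # toy stand-in: always offers exactly 10% off


def jailbreakable_agent(prompt: str) -> float:
    # Toy stand-in: gives in when the request is phrased as a poem.
    return 50.0 if "poem" in prompt.lower() else 90.0


if __name__ == "__main__":
    print(validate(SPEC, compliant_agent, "widget", PROMPTS))
    print(validate(SPEC, jailbreakable_agent, "widget", PROMPTS))
```

Because `validate` only sees the agent as a callable, the same spec can be rerun unchanged when the underlying model is swapped, which is the decoupling the talk argues for.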

A rational conversation on where AI is actually going | Benedict Evans

Benedict Evans · Lenny's Podcast · 79 min

Benedict Evans compares the current AI boom to the internet in 1997. He analyzes how value will shift from model labs to the application and distribution layers.

AI follows the Jevons Paradox, in which cheaper software production leads to higher total demand for engineers.
ChatGPT has reached 900 million weekly active users.
Foundational model labs may face the low margins of commodity utility providers like telcos.
Automation typically replaces specific tasks rather than entire job roles.
Distribution remains the primary moat when underlying models become standardized.
Success requires immersion in the technology to find the jagged frontier of its actual utility.

References

People: Prasenjit Sarkar · Rishabh Bhargava · Steven Willmott (x.com/njyx) · Benedict Evans · Dario Amodei (x.com/DarioAmodei) · Steven Sinofsky (x.com/stevesi) · Sam Altman · Marc Andreessen · Larry Tesler · Adnan Qureshi

Tools: Sonar · Together AI · SafeIntelligence · ChatGPT · ACDC Framework · A2A · Open API · LangSmith · Vertex