Wednesday, May 20, 2026

The agent native cloud has arrived.

Agentic Workflows · Long Context · Tiny LLMs · Infrastructure · Multimodality · RAG · Observability · Rust · Healthcare AI · SRE Agents

May 20 · 21 videos

Railway is adding 100,000 users every week.

Jake Cooper says the traditional SDLC is dead.

Google Gemini now analyzes five minutes of video for 1.5 cents.

Andrew Ng predicts a 1:1 ratio for product managers and engineers.

Tiny models under 1 billion parameters hit 2,000 tokens per second on mobile.

Context is the new bottleneck.

“An AI agent without memory is just autocomplete with ambition.”

The Agent-Native Cloud: 3M Users, 100K Signups/Wk, Data Centers, & Death PRs — Jake Cooper, Railway

Jake Cooper · Latent Space · 89 min

Watch on YouTube →

Railway is positioning itself as the first agent-native cloud, built for a future in which AI agents outnumber human developers. Founder Jake Cooper details the shift to owning bare-metal hardware and the death of traditional CI/CD.

Railway supports 3 million users with a lean 35-person team using high-leverage internal systems.
The company achieved a 3-month hardware payback period by owning racks instead of renting from hyperscalers.
Traditional SDLC components such as Git and pull requests are giving way to production forks and copy-on-write infrastructure.
Agents prefer CLI-heavy interfaces over GUIs, shifting the bottleneck from compute to intelligence.
Hardware like RAM and CPUs can appreciate in value, making data center debt a viable financing tool.
Free PaaS tiers have become unsustainable because crypto miners and bots abuse them.
Railway uses BPF-level networking and a custom 5-cloud overlay for scaling.
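The copy-on-write production forks mentioned above can be illustrated at toy scale. This is a minimal sketch, not Railway's implementation. It assumes a key-value view of production state and uses Python's `collections.ChainMap`: the fork writes into its own empty top layer, and reads fall through to the shared production data. Forking therefore copies nothing until something changes.

```python
from collections import ChainMap

def fork(parent):
    """Create a copy-on-write fork: writes land in a fresh top layer,
    reads fall through to the parent's data. Nothing is copied up front."""
    return ChainMap({}, parent)

# Hypothetical production state, for illustration only.
production = {"DATABASE_URL": "postgres://prod", "FEATURE_FLAG": False}

preview = fork(production)
preview["FEATURE_FLAG"] = True      # the write goes to the fork's own layer

print(preview["FEATURE_FLAG"])      # True: the fork sees its change
print(production["FEATURE_FLAG"])   # False: production is untouched
print(preview["DATABASE_URL"])      # shared value, read through to production
print(dict(preview.maps[0]))        # only the changed keys live in the fork
```

A real system would apply the same layering to disk blocks or database pages. It would also need tombstones to record deletions, which `ChainMap` does not provide.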

AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data

Adit Abraham · DeepLearningAI · 28 min

Watch on YouTube →

Adit Abraham of Reducto argues that agent performance is limited by unstructured data ingestion rather than model intelligence. He introduces Agentic OCR for processing complex visual layouts.

Traditional RAG pipelines fail because they treat documents as flat text instead of visual whiteboards.
Reducto has processed over 3 billion documents using speculative decoding for superhuman accuracy.
In high-stakes fields like healthcare, 90 percent accuracy is considered a failure.
Abraham advocates moving from RAG to file-system-based architectures in which agents navigate documents inside sandboxes.
Specialized ingestion layers are required for messy scans and intricate financial tables.
The future of document interaction is read-write, where agents edit and create deliverables directly.
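In a file-system-based architecture, the agent gets tools for listing and reading files instead of a single retrieval endpoint. The minimal sketch below assumes the agent is confined to one sandbox directory. The `Sandbox` class and its `list_files` and `read_file` methods are hypothetical names. The path check is the part that keeps the agent inside the sandbox.

```python
from pathlib import Path
import tempfile

class Sandbox:
    """Hypothetical file tools an agent could call, confined to one root directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, relative):
        # Resolve symlinks and ".." so the agent cannot escape the sandbox.
        path = (self.root / relative).resolve()
        if not path.is_relative_to(self.root):
            raise PermissionError(f"{relative!r} is outside the sandbox")
        return path

    def list_files(self, relative="."):
        base = self._resolve(relative)
        return sorted(str(p.relative_to(self.root)) for p in base.rglob("*") if p.is_file())

    def read_file(self, relative, max_chars=2000):
        # Truncate so one large file cannot flood the context window.
        return self._resolve(relative).read_text()[:max_chars]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "invoices").mkdir()
    (Path(tmp) / "invoices" / "q1.md").write_text("| item | total |\n| rent | 1200 |\n")
    box = Sandbox(tmp)
    listing = box.list_files()
    content = box.read_file("invoices/q1.md")
    try:
        box.read_file("../outside.txt")
        blocked = False
    except PermissionError:
        blocked = True

print(listing, blocked)
```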

AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI

Paige Bailey · DeepLearningAI · 42 min

Watch on YouTube →

Paige Bailey from Google DeepMind showcases the Gemini 3.1 model family and the collapse of barriers for building multimodal applications. She emphasizes the shift to agent-first development environments.

Gemini 3.1 Flash-Lite can analyze five minutes of video for only 1.5 cents.
The AI-Studio-to-Production workflow allows developers to export production-ready code from prototypes.
Gemma 4 is released under the Apache 2.0 license to drive startup innovation.
Specialized models such as Lyria 3 for music generation and Genie 3 for world models are now available.
Small teams of 1 to 3 people can now build hyper-personalized marketing apps at scale.
The GPU footprint required for state-of-the-art performance has shrunk significantly.

AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database

Eli Schilling · DeepLearningAI · 51 min

Watch on YouTube →

Eli Schilling presents an architecture for agent memory and context engineering built on Oracle AI Database. He argues that persistent memory is what lets agents move beyond simple autocomplete.

The Three Musketeers framework covers Context Engineering, Memory Engineering, and Harness Engineering.
Memory-augmented agents keep per-turn token usage roughly flat, whereas naive agents that resend the full history grow linearly.
Oracle AI Database manages vector, spatial, relational, and graph data in a single engine.
Context is offloaded to memory once the context window reaches 80 percent of capacity, which prevents bloat.
Human reasoning structures like short-term and procedural memory serve as blueprints for machine systems.
Memory enables consistent performance across enterprise teams regardless of the individual user.
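The 80 percent offloading rule can be sketched as a simple policy. In this minimal illustration, a word count stands in for a real tokenizer, and a plain list stands in for the Oracle AI Database memory store. `ContextWindow` and `offload_ratio` are hypothetical names. Once usage crosses the threshold, the oldest turns move to long-term memory, so the working context stays bounded instead of growing with every turn.

```python
class ContextWindow:
    """Hypothetical agent context that offloads old turns to long-term memory."""

    def __init__(self, max_tokens, offload_ratio=0.8):
        self.max_tokens = max_tokens
        self.threshold = int(max_tokens * offload_ratio)
        self.turns = []    # working context sent to the model
        self.memory = []   # stand-in for a persistent memory store

    @staticmethod
    def count_tokens(text):
        return len(text.split())   # crude approximation of a tokenizer

    def used(self):
        return sum(self.count_tokens(t) for t in self.turns)

    def add(self, turn):
        self.turns.append(turn)
        # Offload the oldest turns until usage is back under the threshold,
        # always keeping the newest turn in the working context.
        while self.used() > self.threshold and len(self.turns) > 1:
            self.memory.append(self.turns.pop(0))

ctx = ContextWindow(max_tokens=20)          # threshold = 16 "tokens"
for i in range(6):
    ctx.add(f"turn {i} with a few words")   # 6 words each
print(ctx.used(), len(ctx.memory))          # 12 4
```

A production system would more likely summarize offloaded turns than store them verbatim, and it would retrieve them on demand.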

AI Dev 26 x SF | William Imoh & Charlie Wood: Closing the Care Gap

William Imoh · DeepLearningAI · 33 min

Watch on YouTube →

William Imoh and Charlie Wood demonstrate a Care Transition Copilot that cuts clinician chart preparation from 45 minutes to seconds. They use Actian VectorAI for secure, on-premises processing.

Clinicians currently spend 45 minutes on manual chart preparation per patient.
The system uses a four-stage agentic loop: Gathering Context, Analyzing Risk, Retrieving Protocols, and Drafting Briefs.
Actian VectorAI provides a 3 to 7x speed increase in query processing at scale.
Edge deployment is critical for healthcare to ensure data sovereignty and avoid PII leaks.
Auditability is essential; clinicians must see the exact source document for every AI claim.
Readmission risk detection is a high-ROI use case because insurers often refuse to pay for readmissions within 30 days of discharge.
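The four-stage loop and the auditability requirement fit together if every claim carries the ID of the document it came from. The sketch below is a hypothetical pipeline. The stage functions, record fields, and 30-day rule are illustrative assumptions, and retrieval is stubbed with in-memory data rather than Actian VectorAI.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Claim:
    text: str
    source_id: str   # document the clinician can open to verify the claim

# Hypothetical patient records and protocol library.
RECORDS = [
    {"id": "discharge-0412", "kind": "discharge", "date": date(2026, 4, 12)},
    {"id": "admit-0501", "kind": "admission", "date": date(2026, 5, 1)},
]
PROTOCOLS = {"readmission": "protocol-readmit-v3"}

def gather_context(records):
    return sorted(records, key=lambda r: r["date"])

def analyze_risk(history):
    claims = []
    discharges = [r for r in history if r["kind"] == "discharge"]
    admissions = [r for r in history if r["kind"] == "admission"]
    for d in discharges:
        for a in admissions:
            gap = (a["date"] - d["date"]).days
            if 0 < gap <= 30:   # readmission within the 30-day window
                claims.append(Claim(f"Readmitted {gap} days after discharge", a["id"]))
    return claims

def retrieve_protocols(claims):
    if not claims:
        return []
    return [Claim("Apply readmission follow-up protocol", PROTOCOLS["readmission"])]

def draft_brief(claims):
    # Every line cites its source so the clinician can audit it.
    return "\n".join(f"- {c.text} [source: {c.source_id}]" for c in claims)

history = gather_context(RECORDS)
risks = analyze_risk(history)
brief = draft_brief(risks + retrieve_protocols(risks))
print(brief)
```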

AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models

Eda Zhou · DeepLearningAI · 33 min

Watch on YouTube →

Eda Zhou and Mahdi Ghodsi of AMD demonstrate building personal agents with open-source models such as Qwen 2.5. They focus on persistent, application-level architectures such as OpenClaw.

Open-source models deployed on AMD Instinct MI325X GPUs can now handle complex reasoning.
The ReAct loop (Reason, Act, Observe) bridges the gap between static LLMs and functional agents.
vLLM offers day-0 support for major open-source models on ROCm-compatible AMD hardware.
Defining clear boundaries and "souls" for agents prevents unauthorized changes and skipped verification steps.
Persistent agent applications offer better ROI for recurring tasks than script-based libraries.
Multi-agent systems support specialized roles, such as running benchmarks in parallel or acting as a skeptic that challenges other agents' results.
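The ReAct loop alternates three steps: a reasoning step, a tool call, and an observation that feeds into the next step. This minimal sketch uses a scripted stand-in for the model (`fake_model`), so it runs without a GPU or an inference server. In the setup described in the talk, that call would go to an open-source model served by vLLM. The tool registry and the `Action:`/`Final:` message format are hypothetical.

```python
import json

TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def fake_model(transcript):
    """Scripted stand-in for an LLM: picks the next step from the transcript."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return 'Thought: first add.\nAction: {"tool": "add", "args": [2, 3]}'
    if len(observations) == 1:
        total = json.loads(observations[0].split(":", 1)[1])
        return f'Thought: now scale it.\nAction: {{"tool": "multiply", "args": [{total}, 4]}}'
    result = observations[-1].split(":", 1)[1].strip()
    return f"Thought: done.\nFinal: {result}"

def react(question, model, max_steps=5):
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)                          # Reason
        transcript.append(reply)
        if "Final:" in reply:
            return reply.split("Final:", 1)[1].strip(), transcript
        call = json.loads(reply.split("Action:", 1)[1])    # Act
        result = TOOLS[call["tool"]](*call["args"])
        transcript.append(f"Observation: {result}")        # Observe
    raise RuntimeError("no answer within the step budget")

answer, steps = react("What is (2 + 3) * 4?", fake_model)
print(answer)
```

The `max_steps` cap is a deliberate design choice: it stops a confused model from looping forever on tool calls.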

AI Dev 26 x SF | Aditi Gupta: Building SRE Agents with the Redis Context Engine

Aditi Gupta · DeepLearningAI · 31 min

Watch on YouTube →

Aditi Gupta from Redis explains how to build SRE agents that prioritize trust and verifiability. She details an architecture that uses Redis as a context engine for complex infrastructure management.

Redis functions as a vector store, thread manager, and semantic cache for SRE agents.
Semantic caching achieved a 15x speed increase and a 98 percent reduction in costs.
Model tiering routes classification to small Nano models and reasoning to larger, heavier models.
Unsafe recommendations are worse than no recommendations in critical infrastructure.
Operationalizing tribal knowledge via ingested runbooks turns documentation into automation.
Enterprise customers often manage over 60 clusters across multiple regions.
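Semantic caching returns a stored answer when a new query is close enough in embedding space to one that has already been answered. This minimal in-memory sketch uses a toy bag-of-words embedding and a 0.8 cosine-similarity threshold. Both are illustrative assumptions; a real deployment would use an embedding model and Redis vector search.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts. A real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []   # (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]   # hit: skip the model call entirely
        return None          # miss: caller runs the model, then calls put()

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("why is pod api-7 crash looping", "OOMKilled: raise memory limit")
print(cache.get("why is pod api-7 crash looping again"))  # near-duplicate: hit
print(cache.get("disk full on node 3"))                   # unrelated: None
```

The threshold sets the trade-off. If it is too low, the cache can return an unsafe answer to a different incident. If it is too high, the cache rarely hits.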

AI Dev 26 x SF | Nyah Macklin: The AI Said So? How to Build Auditable AI Agents Using Context Graphs

Nyah Macklin · DeepLearningAI · 31 min

Watch on YouTube →

Nyah Macklin argues that fractured context is a key reason 95 percent of AI pilots fail. She introduces Context Graphs to provide auditable decision traces for high-stakes industries.

Context Graphs capture the why behind decisions, not just the what of the data.
GraphRAG pushed accuracy to 91 percent compared to 54 percent for fine-tuned models in domain tasks.
The "AI said so" defense is considered a firing offense in senior leadership.
Engineers must move from text similarity search to hybrid search and graph algorithms.
95 percent of AI pilots fail due to a lack of air traffic control over agent swarms.
Ethics and safety must be centered in AI implementation rather than treated as an afterthought.
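A context graph stores each decision as a node linked to the evidence and policies that justified it, so the "why" can be replayed later. This minimal sketch uses an adjacency-list graph in plain Python rather than a graph database. The node names and the `because` relation are hypothetical. The `trace` function walks back from a decision to every supporting node, producing an audit trail instead of "the AI said so."

```python
from collections import deque

# Hypothetical decision graph: each node lists the nodes that justify it.
because = {
    "decision:deny-claim-881": ["policy:pre-auth-required", "fact:no-pre-auth-on-file"],
    "fact:no-pre-auth-on-file": ["record:auth-db-query-2026-05-19"],
    "policy:pre-auth-required": ["doc:plan-handbook-s4.2"],
}

def trace(node):
    """Breadth-first walk from a decision to all of its supporting evidence."""
    seen, edges, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for support in because.get(current, []):
            if support not in seen:
                seen.add(support)
                edges.append((current, support))
                queue.append(support)
    return edges

for claim, support in trace("decision:deny-claim-881"):
    print(f"{claim}  <-  {support}")
```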

AI Dev 26 x SF | Jeff Huber: Everything You Need to Know About Agentic Search

Jeff Huber · DeepLearningAI · 23 min

Watch on YouTube →

Jeff Huber, CEO of Chroma, identifies context as the primary bottleneck for reliable agents. He introduces Agentic Search, in which specialized models curate their own context to avoid context rot, the drop in performance as context grows.

Model performance degrades sharply once inputs grow beyond roughly 40k to 100k tokens.
Chroma released Context 1, a 20B-parameter open-source model built for fast, accurate search.
Context 1 runs at 3,000 tokens per second and is 50x smaller than frontier models.
Agentic search costs can be reduced by 25x by switching from frontier reasoners to specialized models.
Information workers spend 30 percent of their time seeking information, a task agents must master.
Compute is being pushed down into the data layer to allow iterative exploration without latency.
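One way to picture agentic search is as a loop that runs a series of refined queries. For each query, it keeps only new, relevant hits that fit a fixed token budget, instead of dumping every result into a frontier model's context. This minimal sketch uses a keyword-overlap scorer in place of Chroma's search and the Context 1 model. The corpus, queries, and tiny budget are hypothetical.

```python
def score(query, doc):
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def count_tokens(text):
    return len(text.split())   # crude stand-in for a tokenizer

def agentic_search(queries, corpus, budget, k=2):
    """Run each (possibly refined) query and keep new relevant hits under a token budget."""
    context, used = [], 0
    for query in queries:
        hits = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]
        for doc in hits:
            cost = count_tokens(doc)
            if doc in context or score(query, doc) == 0 or used + cost > budget:
                continue   # duplicate, irrelevant, or would overflow the budget
            context.append(doc)
            used += cost
    return context, used

corpus = [
    "refund policy allows returns within 30 days",
    "refunds for digital goods are not allowed",
    "shipping takes 5 business days",
    "office hours are 9 to 5",
]
queries = ["refund policy", "digital goods refunds", "shipping days"]
context, used = agentic_search(queries, corpus, budget=14)
print(used, context)
```

In practice, the budget would be set well below the range where the talk says performance starts to degrade.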

AI Dev 26 x SF | Pratik Verma: Observability Agent to Find & Fix Issues in AI Agents

Pratik Verma · DeepLearningAI · 14 min

Watch on YouTube →

Pratik Verma addresses the challenge of moving agents to production by focusing on reliability. He introduces the Agentic Engineering loop and the Monocle instrumentation framework.

Agents rarely fail because of flawed logic; they fail on edge cases and missing context.
Monocle is an open-source framework that captures traces from LangChain and LangGraph automatically.
The Agentic Engineering loop involves simulation, trace observation, evaluation, and automated fixing.
Software development is returning to outcome-based testing where agents build solutions to pass tests.
Instrumentation facilitates outcome-based pricing by tracking successful agent results.
Iterative prompt optimization is required to ensure high reliability before deployment.
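The kind of trace capture Monocle automates can be illustrated with a hand-rolled decorator that records each step's inputs, output, duration, and error. This is not Monocle's API. It is a hypothetical, dependency-free sketch of what a span record contains, which is the raw material for the evaluation and fixing steps of the loop.

```python
import functools
import time

TRACE = []   # collected spans; a real setup would export these to a backend

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"name": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            TRACE.append(span)
    return wrapper

@traced
def lookup_order(order_id):
    orders = {"A1": "shipped"}
    return orders[order_id]   # KeyError on an unknown ID: an edge case

lookup_order("A1")
try:
    lookup_order("Z9")
except KeyError:
    pass

failures = [s for s in TRACE if "error" in s]
print(len(TRACE), failures[0]["args"], failures[0]["error"])
```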

AI Dev 26 x SF: Jean-Marie John-Mathews: Red Teaming LLM Applications Systematically

Jean-Marie John-Mathews · DeepLearningAI · 14 min

Watch on YouTube →

Jean-Marie John-Mathews from Giskard discusses the evolution of AI Red Teaming. He argues for dynamic user simulation to uncover vulnerabilities in multi-turn agent interactions.

Traditional LLM-as-a-judge frameworks are insufficient for modern agents whose tool calls are invisible to the judge.
Static golden datasets are becoming obsolete; dynamic intent simulation is the new standard.
Off-topic vulnerabilities occur when brand bots engage in unrelated conversations like coding advice.
The Giskard framework translates natural language requirements into versionable test suites.
Red teaming should be a prerequisite for production to avoid reputational damage.
Automated red teaming skills should be integrated directly into CI/CD pipelines.
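A red-team check that runs in CI could replay simulated multi-turn conversations against the bot and fail the build if the bot drifts off-topic. This minimal sketch uses a stub `brand_bot`, hand-written adversarial turns, and a keyword-based off-topic detector. It does not show Giskard's API. A real setup would generate the turns dynamically and would likely use a model-based judge.

```python
ALLOWED_TOPICS = {"order", "shipping", "return", "refund", "size"}

def brand_bot(history):
    """Stub bot with a deliberate weakness: it answers coding questions."""
    last = history[-1].lower()
    if "python" in last:
        return "Sure! Use a list comprehension: [x * 2 for x in items]"
    return "I can help with your order, shipping, or returns."

def is_off_topic(reply):
    words = {w.strip(".,!?:").lower() for w in reply.split()}
    return not (words & ALLOWED_TOPICS)

def run_scenario(turns, bot):
    """Play user turns one by one; return the first off-topic exchange, if any."""
    history = []
    for turn in turns:
        history.append(turn)
        reply = bot(history)
        history.append(reply)
        if is_off_topic(reply):
            return turn, reply
    return None

scenarios = [
    ["Where is my order?", "Can I return it?"],
    ["Where is my order?", "Unrelated, but how do I double a list in Python?"],
]
findings = [f for f in (run_scenario(s, brand_bot) for s in scenarios) if f]
print(f"{len(findings)} off-topic finding(s)")
```

In a CI pipeline, the last line would be an assertion such as `assert not findings`, which this deliberately weak stub would fail.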

AI Dev 26 x SF | A Fireside Chat with OpenAI's Marc Manara

Marc Manara · DeepLearningAI · 24 min

Watch on YouTube →

Marc Manara of OpenAI discusses the shift from manual coding to agent orchestration. He highlights the importance of the harness-model synergy and the evolution of the engineering role.

OpenAI is prioritizing agentic behaviors such as preambles, in which models briefly explain their plan before acting.
GPT-5.5 aims for the Pareto frontier of token efficiency, delivering higher accuracy with fewer tokens.
The primary competitive advantage for startups is now iteration speed rather than technical gatekeeping.
Hyper-efficient teams of 5 to 10 people are generating tens of millions in ARR.
Engineers are evolving into managers who oversee multiple agents and review their output.
Startups have an advantage because they carry no business debt and no legacy business models for AI to disrupt.
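A model on the Pareto frontier of token efficiency is one where no alternative uses fewer tokens while being at least as accurate. The sketch below computes that frontier over hypothetical (tokens per task, accuracy) measurements. The model names and numbers are invented for illustration and are not GPT-5.5 results.

```python
def pareto_frontier(points):
    """Keep points that no other point dominates. A point dominates another if it
    uses no more tokens, is at least as accurate, and is strictly better on one axis."""
    frontier = []
    for name, tokens, acc in points:
        dominated = any(
            t <= tokens and a >= acc and (t < tokens or a > acc)
            for _, t, a in points
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical measurements: (name, average tokens per task, accuracy).
runs = [
    ("model-a", 12_000, 0.81),
    ("model-b", 6_000, 0.84),   # fewer tokens and more accurate than model-a
    ("model-c", 3_000, 0.70),
    ("model-d", 9_000, 0.88),
]
print(pareto_frontier(runs))    # ['model-b', 'model-c', 'model-d']
```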

AI Dev 26 x SF: Andrew Ng: The Future of Software Engineering

Andrew Ng · DeepLearningAI · 19 min

Watch on YouTube →

Andrew Ng explores the transition to an AI-augmented generalist model in software engineering. He argues that 100 percent AI-generated code is the threshold for exponential productivity.

The Product Management Bottleneck occurs as the ratio of engineers to PMs collapses toward 1:1.
Reaching 100 percent AI-generated code is necessary because human review is a critical bottleneck.
Parallel Skill Development involves improving agent capabilities while training human drivers.
Non-technical functions like Legal and Marketing are becoming the new primary bottlenecks.
Context Hub provides agents with real-time API documentation to prevent hallucinations.
Small, AI-native teams can bypass traditional organizational silos using agentic workflows.

AI Dev 26 x SF | Panel Discussion: Future of Software Engineering

Michele Catasta · DeepLearningAI · 55 min

Watch on YouTube →

A panel of industry leaders discusses the shift from manual coding to problem ownership. They explore how coding agents are compressing the junior-to-senior timeline.

Google currently writes 75 percent of its code using AI.
Engineers must cultivate taste to distinguish between mediocre and excellent AI output.
Token maxing, the tendency to generate unlimited content simply because tokens are cheap, should be avoided.
The manager job title may dissolve as managing agents becomes a baseline skill for all contributors.
Companies can achieve Series A milestones with significantly fewer developers using agents.
1.2 million new AI-related jobs appeared on LinkedIn in the last year.

AI Dev 26 x SF | Paige Bailey: Research to Reality

Paige Bailey · DeepLearningAI · 12 min

Watch on YouTube →

Paige Bailey discusses the shift from foundational research to production-ready applications. She argues that massive context windows allow models to be treated as entire operating systems.

Gemini 1.5 Pro maintains more than 99 percent recall across context windows of over 1 million tokens.
Context-First Design can replace complex RAG pipelines, reducing architectural complexity.
The context window is described as the new RAM for AI applications.
Successful AI products focus on solving boring problems with high-reliability models.
The value of AI is shifting from the model itself to the Developer Experience surrounding the API.
Research-to-API latency has been reduced from months to days.
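Context-first design comes down to a sizing decision: if the whole corpus fits comfortably in the window, send all of it; otherwise, fall back to retrieval. This minimal sketch assumes a rough 4-characters-per-token estimate, a 1-million-token window, and a 50 percent headroom rule. All three numbers are illustrative assumptions, not figures from the talk.

```python
import math

def estimate_tokens(text, chars_per_token=4):
    # Rough heuristic for English text; use the model's tokenizer when precision matters.
    return math.ceil(len(text) / chars_per_token)

def plan_context(documents, window_tokens=1_000_000, headroom=0.5):
    """Return ("full_context", docs) if everything fits within the headroom,
    else ("retrieve", []) to signal that a retrieval step is needed."""
    total = sum(estimate_tokens(d) for d in documents)
    if total <= window_tokens * headroom:
        return "full_context", documents
    return "retrieve", []

small_corpus = ["x" * 40_000] * 10    # about 100k tokens
large_corpus = ["x" * 400_000] * 10   # about 1M tokens
print(plan_context(small_corpus)[0])  # full_context
print(plan_context(large_corpus)[0])  # retrieve
```

The headroom leaves space in the window for instructions, conversation history, and the model's output.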

Google's Next Big Thing Is Finally Here

Ejaaz · Limitless Podcast · 29 min

Watch on YouTube →

Google I/O 2026 signaled a shift toward agentic capabilities and world models. The event featured the launch of Gemini Omni and Gemini 3.5 Flash.

Gemini 3.5 Flash built an entire operating system from scratch in 12 hours using 93 parallel agents.
Gemini Omni is a hybrid LLM and world model capable of generating physically accurate video.
Google is bundling AI Pro with 5TB of storage and ad-free YouTube for 20 dollars.
Gemini 3.5 Flash is now the default routing layer for Google Search and YouTube Search.
Anthropic is reportedly reaching a 45 billion dollar ARR.
World models teach LLMs to understand the physical consequences of actions.

Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind

Patrick Löber · AI Engineer · 16 min

Watch on YouTube →

Patrick Löber presents the transition to Any-to-Any native multimodal agents. He explains how unified architectures natively understand code, image, audio, and video.

Gemini can ingest up to 9 hours of audio or 1 hour of video in a single prompt.
Context caching can reduce costs by 90 percent for repeated queries on long documents.
Live API audio-to-audio models reduce latency and preserve tonal nuance by avoiding cascaded pipelines.
Native multimodality means the model understands underlying concepts across all senses.
One minute of audio costs 1,920 tokens in Gemini models (32 tokens per second).
The Any-to-Any pattern is transferable from education to high-level research.
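The token figures above make cost estimates straightforward. At 1,920 tokens per minute, audio works out to 32 tokens per second, so 9 hours comes to roughly 1.04 million tokens. That total is consistent with a 9-hour limit on a context window of about one million tokens. The sketch below also applies the quoted 90 percent context-caching discount to repeated queries over the same long input. The per-token price is a hypothetical placeholder, not a published rate, and cache storage fees are ignored.

```python
AUDIO_TOKENS_PER_SECOND = 1920 / 60   # 32, from the 1,920 tokens per minute figure

def audio_tokens(seconds):
    return int(seconds * AUDIO_TOKENS_PER_SECOND)

def repeated_query_cost(context_tokens, queries, price_per_token, cache_discount=0.9):
    """Compare resending a long context on every query with caching it once.
    Assumes the first query pays full price and later queries pay the discounted
    rate for cached tokens; cache storage fees are ignored."""
    uncached = context_tokens * queries * price_per_token
    cached = context_tokens * price_per_token * (1 + (queries - 1) * (1 - cache_discount))
    return uncached, cached

nine_hours = audio_tokens(9 * 3600)
print(nine_hours)                     # 1036800
hypothetical_price = 1e-7             # placeholder, not a real rate
full, with_cache = repeated_query_cost(nine_hours, queries=10, price_per_token=hypothetical_price)
print(round(full, 4), round(with_cache, 4))
```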

Why Rust is different, with Alice Ryhl

Alice Ryhl · The Pragmatic Engineer · 65 min

Watch on YouTube →

Alice Ryhl, a Google engineer and Tokio maintainer, explains Rust's unique safety model and governance. She discusses why Rust is being adopted for critical infrastructure like the Linux kernel.

Safe Rust rules out null-pointer dereferences and data races at compile time, without a garbage collector.
The language uses a consensus-driven governance model without a benevolent dictator for life.
Rust recently achieved non-experimental status within the Linux kernel.
The Editions model allows breaking syntax changes while crates on different editions remain fully interoperable.
The US Department of Defense is supporting Rust adoption for memory safety reasons.
Rust ships a new stable compiler release every six weeks.

Skill issue: Lessons from skilling up coding agents to use Langfuse - Marc Klingen, Clickhouse

Marc Klingen · AI Engineer · 24 min

Watch on YouTube →

Marc Klingen discusses how to improve coding agents by providing them with formalized skills. He uses the integration of Langfuse into Claude Code as a primary case study.

Skills function as formalized shortcuts that allow agents to gather context progressively.
A natural language search endpoint for agents reduced hallucinations compared to documentation crawling.
Target functions in auto-research loops can lead agents to skip essential reliability steps to meet speed metrics.
Engineers should provide agent sitemaps (llms.txt) to help agents navigate documentation.
Manual trace analysis still provides 80 percent of the insights needed to improve AI applications.
Tracking agent search queries provides a roadmap for where developers are getting stuck.
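An llms.txt file is a Markdown index placed at a site's root. It contains an H1 title, a short blockquote summary, and H2 sections that list links with one-line descriptions. The sketch below generates one from a small page inventory. The page titles, URLs, and descriptions are hypothetical placeholders, not Langfuse's actual documentation structure.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in pages:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

# Hypothetical inventory; a real site would generate this from its docs tree.
sections = {
    "Docs": [
        ("Quickstart", "https://example.com/docs/quickstart.md", "Install the SDK and send a first trace"),
        ("Tracing", "https://example.com/docs/tracing.md", "Concepts: traces, spans, sessions"),
    ],
    "Optional": [
        ("Changelog", "https://example.com/changelog.md", "Release history"),
    ],
}
print(build_llms_txt("Example Observability", "Open-source tracing and evals for LLM apps.", sections))
```

In the llms.txt proposal, a section titled Optional marks links that an agent can skip when it needs a shorter context.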

HOW TO GET TO KNOW YOURSELF | WHO ARE YOU?

Rob Dial · The Mindset Mentor Podcast · 18 min

Watch on YouTube →

Rob Dial explores how personality traits are often behavioral adaptations for survival. He discusses the journey of rewiring the nervous system for authenticity.

Traits like hyper-independence and people-pleasing are often unconscious strategies for safety.
The nervous system prioritizes familiarity and safety over fulfillment.
Rewiring the brain for peace and genuine connection is an 8 to 9 year journey.
Humor is frequently used as a defense mechanism to defuse tension in a room.
Hyper-independence in leadership often stems from a lack of trust developed in childhood.
Unchecked ambition driven by external validation often leads to unsustainable burnout.

From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google

Cormac Brick · AI Engineer · 21 min

Watch on YouTube →

Cormac Brick details the rise of Tiny LLMs (TLMs) for on-device applications. He shows how fine-tuning models under 1 billion parameters can achieve high accuracy for specialized tasks.

FunctionGemma, a 270M-parameter model, can reach 2,000 tokens per second on a Pixel 7.
Fine-tuning with synthetic data pushed accuracy from 46 percent to over 90 percent for specific app intents.
System-level GenAI such as Gemini Nano keeps app size down because apps share a single model provided by the operating system.
App-level GenAI like LiteRT-LM offers maximum customization across non-flagship devices.
On-device AI provides significant cost savings on inference and enables offline functionality.
A Skill Harness can dynamically load tool descriptions and JavaScript UIs for on-device agents.
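Synthetic data for a function-calling model can be produced by expanding utterance templates into (prompt, expected call) pairs. This minimal sketch uses hypothetical intents (`set_alarm`, `send_message`), slot values, and output format. It is not the pipeline from the talk. A real dataset would also add paraphrases and negative examples. The held-out split is what would let you measure an accuracy gain like the one reported.

```python
import itertools
import json
import random

# Hypothetical app intents: utterance templates plus slot values to fill them.
INTENTS = {
    "set_alarm": {
        "templates": ["wake me up at {time}", "set an alarm for {time}"],
        "slots": {"time": ["6:30", "7:00", "8:15"]},
    },
    "send_message": {
        "templates": ["text {contact} that I'm late", "tell {contact} I'm running late"],
        "slots": {"contact": ["Sam", "Priya"]},
    },
}

def expand(intents):
    """Yield one training example per template and slot-value combination."""
    for name, spec in intents.items():
        keys = list(spec["slots"])
        for template in spec["templates"]:
            for values in itertools.product(*(spec["slots"][k] for k in keys)):
                args = dict(zip(keys, values))
                yield {"prompt": template.format(**args),
                       "completion": {"name": name, "arguments": args}}

examples = list(expand(INTENTS))
random.Random(0).shuffle(examples)   # a fixed seed keeps the split reproducible
split = int(len(examples) * 0.8)
train, held_out = examples[:split], examples[split:]
print(len(examples), len(train), len(held_out))   # 10 8 2
print(json.dumps(train[0]))
```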

References

People: Jake Cooper (@JustJake) · Adit Abraham · Paige Bailey (@DynamicWebPaige) · Eli Schilling · William Imoh · Charlie Wood · Eda Zhou · Mahdi Ghodsi · Aditi Gupta · Nyah Macklin · Jeff Huber · Pratik Verma · Jean-Marie John-Mathews · Marc Manara · Andrew Ng · Alice Ryhl (https://ryhl.io) · Marc Klingen (https://x.com/marcklingen) · Rob Dial · Cormac Brick · Patrick Löber (@patloeber)

Tools: Railway · Reducto · Gemini 3.1 · Gemini 3.5 Flash · Oracle AI Database · Actian VectorAI · OpenClaw · Qwen 2.5 · Redis Context Engine · Context 1 · Monocle · Giskard · Context Hub · Code Dream · Tokio · Langfuse · Claude Code · FunctionGemma

Papers: ComGPT study · IEEE 2026 study · MIT 2025 study