Wednesday, May 20, 2026
The agent native cloud has arrived.
May 20 · 21 videos
Railway is adding 100,000 users every week.
Jake Cooper says the traditional SDLC is dead.
Google Gemini now analyzes five minutes of video for 1.5 cents.
Andrew Ng predicts a 1:1 ratio for product managers and engineers.
Tiny models under 1 billion parameters hit 2,000 tokens per second on mobile.
Context is the new bottleneck.
“An AI agent without memory is just autocomplete with ambition.”
The Agent-Native Cloud: 3M Users, 100K Signups/Wk, Data Centers, & Death PRs — Jake Cooper, Railway
Jake Cooper · Latent Space · 89 min
Railway is evolving into the first agent-native cloud to handle a future where AI agents outnumber human developers. Founder Jake Cooper details the shift to bare-metal ownership and the death of traditional CI/CD.
- Railway supports 3 million users with a lean 35-person team using high-leverage internal systems.
- The company achieved a 3-month hardware payback period by owning racks instead of renting from hyperscalers.
- Traditional SDLC components like Git and Pull Requests are being replaced by production forks and copy-on-write infrastructure.
- Agents prefer CLI-heavy interfaces over GUIs, shifting the bottleneck from compute to intelligence.
- Hardware like RAM and CPUs can appreciate in value, making data center debt a viable financing tool.
- The free tier era of PaaS is unsustainable due to crypto-miners and bots.
- Railway uses BPF-level networking and a custom 5-cloud overlay for scaling.
AI Dev 26 x SF | Adit Abraham: Better Agents with Better Data
Adit Abraham · DeepLearningAI · 28 min
Adit Abraham of Reducto argues that agent performance is limited by unstructured data ingestion rather than model intelligence. He introduces Agentic OCR for processing complex visual layouts.
- Traditional RAG pipelines fail because they treat documents as flat text instead of visual whiteboards.
- Reducto has processed over 3 billion documents using speculative decoding for superhuman accuracy.
- In high-stakes fields like healthcare, 90 percent accuracy is considered a failure.
- The framework involves moving from RAG to file-system-based architectures where agents navigate sandboxes.
- Specialized ingestion layers are required for messy scans and intricate financial tables.
- The future of document interaction is read-write, where agents edit and create deliverables directly.
AI Dev 26 x SF | Paige Bailey: What's New and What's Next in AI
Paige Bailey · DeepLearningAI · 42 min
Paige Bailey from Google DeepMind showcases the Gemini 3.1 model family and the collapse of barriers for building multimodal applications. She emphasizes the shift to agent-first development environments.
- Gemini 3.1 Flash-Lite can analyze five minutes of video for only 1.5 cents.
- The AI-Studio-to-Production workflow allows developers to export production-ready code from prototypes.
- Gemma 4 is released under the Apache 2.0 license to drive startup innovation.
- Specialized models like Lyria 3 for music and Genie 3 for world models are now available.
- Small teams of 1 to 3 people can now build hyper-personalized marketing apps at scale.
- The GPU footprint required for state-of-the-art performance has shrunk significantly.
AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database
Eli Schilling · DeepLearningAI · 51 min
Eli Schilling introduces a robust architecture for agent memory and context engineering using the Oracle AI Database. He argues that persistent memory is essential for agents to move beyond simple autocomplete.
- The Three Musketeers framework covers Context Engineering, Memory Engineering, and Harness Engineering.
- Memory-augmented agents maintain flat token utilization compared to the linear growth of naive agents.
- Oracle AI Database manages vector, spatial, relational, and graph data in a single engine.
- Context offloading is triggered at an 80 percent threshold to prevent bloat.
- Human reasoning structures like short-term and procedural memory serve as blueprints for machine systems.
- Memory enables consistent performance across enterprise teams regardless of the individual user.
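The 80 percent offloading rule in the list above can be illustrated with a short sketch. Everything in it is hypothetical: the `ContextBuffer` class, the whitespace-based token count, and the plain list standing in for a long-term memory store are illustrative assumptions, not the Oracle AI Database API or the speaker's implementation.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


@dataclass
class ContextBuffer:
    max_tokens: int                  # size of the model's context window
    threshold: float = 0.8           # offload once usage crosses 80 percent
    messages: list[str] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)  # stand-in for a memory store

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Move the oldest messages to the archive until usage drops back
        # to the threshold or only the newest message remains.
        while self.used() > self.threshold * self.max_tokens and len(self.messages) > 1:
            self.archive.append(self.messages.pop(0))


buf = ContextBuffer(max_tokens=10)
for msg in ["one two three", "four five six", "seven eight nine"]:
    buf.add(msg)
```

After the third message the buffer would hold 9 of 10 tokens, so the oldest message is moved to `archive` and the live context settles at 6 tokens. A real system would more likely summarize or index the offloaded text than store it verbatim.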
AI Dev 26 x SF | William Imoh & Charlie Wood: Closing the Care Gap
William Imoh · DeepLearningAI · 33 min
William Imoh and Charlie Wood demonstrate a Care Transition Copilot that reduces clinician chart preparation time from 45 minutes to seconds. They utilize Actian VectorAI for secure, on-premise processing.
- Clinicians currently spend 45 minutes on manual chart preparation per patient.
- The system uses a four-stage agentic loop: Gathering Context, Analyzing Risk, Retrieving Protocols, and Drafting Briefs.
- Actian VectorAI provides a 3 to 7x speed increase in query processing at scale.
- Edge deployment is critical for healthcare to ensure data sovereignty and avoid PII leaks.
- Auditability is essential; clinicians must see the exact source document for every AI claim.
- Readmission risk detection is a high-ROI use case as insurance often refuses payment for 30-day readmissions.
AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models
Eda Zhou · DeepLearningAI · 33 min
AMD experts Eda Zhou and Mahdi Ghodsi demonstrate building personal agents using open-source models like Qwen 2.5. They focus on persistent application-level architectures like OpenClaw.
- Open-source models are now capable of complex reasoning when deployed on AMD Instinct MI325 GPUs.
- The ReAct loop (Reason, Action, Observation) bridges the gap between static LLMs and functional agents.
- vLLM offers day-0 support for major open-source models on AMD ROCm-compatible hardware.
- Defining clear boundaries and souls for agents prevents unauthorized changes or skipped verification.
- Persistent agent applications offer better ROI for recurring tasks than script-based libraries.
- Multi-agent systems allow for specialized roles such as parallel benchmarking and skeptical review.
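The ReAct loop mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `scripted_model` is a hypothetical stand-in for an LLM call, the `Action: name[arg]` text format and the `TOOLS` registry are invented for the example, and none of it reflects the speakers' actual code.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def scripted_model(history: list[str]) -> str:
    # Stand-in for an LLM: call a tool once, then answer from the observation.
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return "Action: add[2+3]"
    return "Final: " + observations[-1].removeprefix("Observation: ")


def react(question: str, model: Callable[[list[str]], str], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)              # Reason: the model decides the next move
        history.append(step)
        if step.startswith("Final: "):
            return step.removeprefix("Final: ")
        # Act: parse "Action: name[arg]" and run the named tool.
        name, arg = step.removeprefix("Action: ").rstrip("]").split("[", 1)
        # Observe: feed the tool result back into the history.
        history.append(f"Observation: {TOOLS[name](arg)}")
    raise RuntimeError("no final answer within max_steps")


answer = react("What is 2 + 3?", scripted_model)
```

The `max_steps` cap is one simple way to express the "clear boundaries" idea from the list: the loop cannot run indefinitely, and only tools present in the registry can be called.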
AI Dev 26 x SF | Aditi Gupta: Building SRE Agents with the Redis Context Engine
Aditi Gupta · DeepLearningAI · 31 min
Aditi Gupta from Redis explains how to build SRE agents that prioritize trust and verifiability. She details an architecture that uses Redis as a context engine for complex infrastructure management.
- Redis functions as a vector store, thread manager, and semantic cache for SRE agents.
- Semantic caching achieved a 15x speed increase and a 98 percent reduction in costs.
- Model tiering uses Nano models for classification and Hefty models for reasoning.
- Unsafe recommendations are worse than no recommendations in critical infrastructure.
- Operationalizing tribal knowledge via ingested runbooks turns documentation into automation.
- Enterprise customers often manage over 60 clusters across multiple regions.
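To make the semantic caching idea above concrete, here is a toy sketch in plain Python. It is not the Redis API: the bag-of-words `embed` function, the 0.8 similarity threshold, and the `expensive_model` stub are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    # Assumption: a bag-of-words count stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        # Find the most similar cached query; reuse its answer if close enough.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))


calls = 0


def expensive_model(query: str) -> str:
    # Hypothetical stand-in for a slow, costly LLM call.
    global calls
    calls += 1
    return f"answer to: {query}"


def ask(cache: SemanticCache, query: str) -> str:
    cached = cache.get(query)
    if cached is not None:
        return cached
    answer = expensive_model(query)
    cache.put(query, answer)
    return answer


cache = SemanticCache()
first = ask(cache, "why is cluster seven slow")
second = ask(cache, "why is cluster seven slow today")  # near-duplicate: cache hit
third = ask(cache, "restart the proxy")                  # unrelated: cache miss
```

Only two of the three questions reach the model, since the near-duplicate is answered from the cache. The threshold is the safety lever here: set too low, the cache can return an answer to a different question, which matters given the talk's point that unsafe recommendations are worse than none.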
AI Dev 26 x SF | Nyah Macklin: The AI Said So? How to Build Auditable AI Agents Using Context Graphs
Nyah Macklin · DeepLearningAI · 31 min
Nyah Macklin argues that fractured context causes 95 percent of AI pilot failures. She introduces Context Graphs to provide auditable decision traces for high-stakes industries.
- Context Graphs capture the why behind decisions, not just the what of the data.
- GraphRAG pushed accuracy to 91 percent compared to 54 percent for fine-tuned models in domain tasks.
- The "AI said so" defense is considered a firing offense among senior leadership.
- Engineers must move from text similarity search to hybrid search and graph algorithms.
- 95 percent of AI pilots fail due to a lack of air traffic control over agent swarms.
- Ethics and safety must be centered in AI implementation rather than treated as an afterthought.
AI Dev 26 x SF | Jeff Huber: Everything You Need to Know About Agentic Search
Jeff Huber · DeepLearningAI · 23 min
Jeff Huber, CEO of Chroma, identifies context as the primary bottleneck for reliable agents. He introduces Agentic Search where specialized models curate their own context to avoid performance rot.
- Model performance degrades sharply once the context grows past roughly 40k to 100k tokens.
- Chroma released Context 1, a 20B-parameter open-source model built for fast, accurate search.
- Context 1 runs at 3,000 tokens per second and is 50x smaller than frontier models.
- Agentic search costs can be reduced by 25x by switching from frontier reasoners to specialized models.
- Information workers spend 30 percent of their time seeking information, a task agents must master.
- Compute is being pushed down into the data layer to allow iterative exploration without latency.
AI Dev 26 x SF | Pratik Verma: Observability Agent to Find & Fix Issues in AI Agents
Pratik Verma · DeepLearningAI · 14 min
Pratik Verma addresses the challenge of moving agents to production by focusing on reliability. He introduces the Agentic Engineering loop and the Monocle instrumentation framework.
- Agents rarely fail due to logic but rather because of edge cases and missing context.
- Monocle is an open-source framework that captures traces from LangChain and LangGraph automatically.
- The Agentic Engineering loop involves simulation, trace observation, evaluation, and automated fixing.
- Software development is returning to outcome-based testing where agents build solutions to pass tests.
- Instrumentation facilitates outcome-based pricing by tracking successful agent results.
- Iterative prompt optimization is required to ensure high reliability before deployment.
AI Dev 26 x SF: Jean-Marie John-Mathews: Red Teaming LLM Applications Systematically
Jean-Marie John-Mathews · DeepLearningAI · 14 min
Jean-Marie John-Mathews from Giskard discusses the evolution of AI Red Teaming. He argues for dynamic user simulation to uncover vulnerabilities in multi-turn agent interactions.
- Traditional LLM-as-a-judge frameworks are insufficient for modern agents with invisible tool calls.
- Static golden datasets are becoming obsolete; dynamic intent simulation is the new standard.
- Off-topic vulnerabilities occur when brand bots engage in unrelated conversations like coding advice.
- The Giskard framework translates natural language requirements into versionable test suites.
- Red teaming should be a prerequisite for production to avoid reputational damage.
- Automated red teaming skills should be integrated directly into CI/CD pipelines.
AI Dev 26 x SF | A Fireside Chat with OpenAI's Marc Manara
Marc Manara · DeepLearningAI · 24 min
Marc Manara of OpenAI discusses the shift from manual coding to agent orchestration. He highlights the importance of the harness-model synergy and the evolution of the engineering role.
- OpenAI is prioritizing agentic behaviors like preambles where models explain their reasoning.
- GPT-5.5 aims for the Pareto frontier of token efficiency, delivering higher accuracy with fewer tokens.
- The primary competitive advantage for startups is now iteration speed rather than technical gatekeeping.
- Hyper-efficient teams of 5 to 10 people are generating tens of millions in ARR.
- Engineers are evolving into managers who oversee multiple agents and review their output.
- Startups have an advantage because they lack business debt and legacy models to disrupt.
AI Dev 26 x SF: Andrew Ng: The Future of Software Engineering
Andrew Ng · DeepLearningAI · 19 min
Andrew Ng explores the transition to an AI-augmented generalist model in software engineering. He argues that 100 percent AI-generated code is the threshold for exponential productivity.
- The Product Management Bottleneck occurs as the ratio of engineers to PMs collapses toward 1:1.
- Reaching 100 percent AI-generated code is necessary because human review is a critical bottleneck.
- Parallel Skill Development involves improving agent capabilities while training human drivers.
- Non-technical functions like Legal and Marketing are becoming the new primary bottlenecks.
- Context Hub provides agents with real-time API documentation to prevent hallucinations.
- Small, AI-native teams can bypass traditional organizational silos using agentic workflows.
AI Dev 26 x SF | Panel Discussion: Future of Software Engineering
Michele Catasta · DeepLearningAI · 55 min
A panel of industry leaders discusses the shift from manual coding to problem ownership. They explore how coding agents are compressing the junior-to-senior timeline.
- Google currently writes 75 percent of its code using AI.
- Engineers must cultivate taste to distinguish between mediocre and excellent AI output.
- Token maxing, the tendency to produce infinite content because it is cheap, should be avoided.
- The manager job title may dissolve as managing agents becomes a baseline skill for all contributors.
- Companies can achieve Series A milestones with significantly fewer developers using agents.
- 1.2 million new AI-related jobs appeared on LinkedIn in the last year.
AI Dev 26 x SF | Paige Bailey: Research to Reality
Paige Bailey · DeepLearningAI · 12 min
Paige Bailey discusses the shift from foundational research to production-ready applications. She argues that massive context windows allow models to be treated as entire operating systems.
- Gemini 1.5 Pro maintains over 99 percent recall across context windows of more than 1 million tokens.
- Context-First Design can replace complex RAG pipelines, reducing architectural complexity.
- The context window is described as the new RAM for AI applications.
- Successful AI products focus on solving boring problems with high-reliability models.
- The value of AI is shifting from the model itself to the Developer Experience surrounding the API.
- Research-to-API latency has been reduced from months to days.
Google's Next Big Thing Is Finally Here
Ejaaz · Limitless Podcast · 29 min
Google I/O 2026 signaled a shift toward agentic capabilities and world models. The event featured the launch of Gemini Omni and Gemini 3.5 Flash.
- Gemini 3.5 Flash built an entire operating system from scratch in 12 hours using 93 parallel agents.
- Gemini Omni is a hybrid LLM and world model capable of generating physically accurate video.
- Google is bundling AI Pro with 5TB of storage and ad-free YouTube for 20 dollars.
- Gemini 3.5 Flash is now the default routing layer for Google Search and YouTube Search.
- Anthropic is reportedly approaching 45 billion dollars in ARR.
- World models teach LLMs to understand the physical consequences of actions.
Any-to-Any: Building Native Multimodal Agents - Patrick Löber, Google DeepMind
Patrick Löber · AI Engineer · 16 min
Patrick Löber presents the transition to Any-to-Any native multimodal agents. He explains how unified architectures natively understand code, image, audio, and video.
- Gemini can ingest up to 9 hours of audio or 1 hour of video in a single prompt.
- Context caching can reduce costs by 90 percent for repeated queries on long documents.
- Live API audio-to-audio models reduce latency and preserve tonal nuance by avoiding cascaded pipelines.
- Native multimodality means the model understands underlying concepts across all senses.
- One minute of audio costs 1,920 tokens in Gemini models.
- The Any-to-Any pattern is transferable from education to high-level research.
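Taking the audio figures in the list above at face value, a quick calculation shows how they relate. The sketch assumes the per-minute rate is constant regardless of content; the `audio_tokens` helper is hypothetical and only does the arithmetic.

```python
TOKENS_PER_MINUTE = 1920  # figure quoted in the summary above


def audio_tokens(minutes: float, tokens_per_minute: int = TOKENS_PER_MINUTE) -> int:
    # Assumption: token cost scales linearly with audio duration.
    return round(minutes * tokens_per_minute)


tokens_per_second = TOKENS_PER_MINUTE / 60   # 32.0 tokens per second
nine_hours = audio_tokens(9 * 60)            # 1,036,800 tokens
```

At that rate, 9 hours of audio comes to roughly 1.04 million tokens, which is consistent with a single prompt needing a context window on the order of a million tokens.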
Why Rust is different, with Alice Ryhl
Alice Ryhl · The Pragmatic Engineer · 65 min
Alice Ryhl, a Google engineer and Tokio maintainer, explains Rust's unique safety model and governance. She discusses why Rust is being adopted for critical infrastructure like the Linux kernel.
- Safe Rust rules out null-pointer dereferences and data races at compile time, without a garbage collector.
- The language uses a consensus-driven governance model without a benevolent dictator for life.
- Rust recently achieved non-experimental status within the Linux kernel.
- The Editions model allows breaking syntax changes while crates on different editions remain fully interoperable.
- The US Department of Defense is supporting Rust adoption for memory safety reasons.
- Rust has a rapid 6-week release cycle for stable compiler updates.
Skill issue: Lessons from skilling up coding agents to use Langfuse - Marc Klingen, Clickhouse
Marc Klingen · AI Engineer · 24 min
Marc Klingen discusses how to improve coding agents by providing them with formalized skills. He uses the integration of Langfuse into Claude Code as a primary case study.
- Skills function as formalized shortcuts that allow agents to gather context progressively.
- A natural language search endpoint for agents reduced hallucinations compared to documentation crawling.
- Target functions in auto-research loops can lead agents to skip essential reliability steps to meet speed metrics.
- Engineers should provide agent sitemaps (llms.txt) to help agents navigate documentation.
- Manual trace analysis still provides 80 percent of the insights needed to improve AI applications.
- Tracking agent search queries provides a roadmap for where developers are getting stuck.
HOW TO GET TO KNOW YOURSELF | WHO ARE YOU?
Rob Dial · The Mindset Mentor Podcast · 18 min
Rob Dial explores how personality traits are often behavioral adaptations for survival. He discusses the journey of rewiring the nervous system for authenticity.
- Traits like hyper-independence and people-pleasing are often unconscious strategies for safety.
- The nervous system prioritizes familiarity and safety over fulfillment.
- Rewiring the brain for peace and genuine connection is an 8 to 9 year journey.
- Humor is frequently used as a defense mechanism to defuse tension in a room.
- Hyper-independence in leadership often stems from a lack of trust developed in childhood.
- Unchecked ambition driven by external validation often leads to unsustainable burnout.
From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google
Cormac Brick · AI Engineer · 21 min
Cormac Brick details the rise of Tiny LLMs (TLMs) for on-device applications. He shows how fine-tuning models under 1 billion parameters can achieve high accuracy for specialized tasks.
- FunctionGemma, a 270M-parameter model, can hit 2,000 tokens per second on a Pixel 7.
- Fine-tuning with synthetic data pushed accuracy from 46 percent to over 90 percent for specific app intents.
- System-level GenAI like Gemini Nano reduces app size by sharing resources.
- App-level GenAI like LiteRT-LM offers maximum customization across non-flagship devices.
- On-device AI provides significant cost savings on inference and enables offline functionality.
- A Skill Harness can dynamically load tool descriptions and JavaScript UIs for on-device agents.
References
People: Jake Cooper (@JustJake) · Adit Abraham · Paige Bailey (@DynamicWebPaige) · Eli Schilling · William Imoh · Charlie Wood · Eda Zhou · Mahdi Ghodsi · Aditi Gupta · Nyah Macklin · Jeff Huber · Pratik Verma · Jean-Marie John-Mathews · Marc Manara · Andrew Ng · Alice Ryhl (https://ryhl.io) · Marc Klingen (https://x.com/marcklingen) · Rob Dial · Cormac Brick · Patrick Löber (@patloeber)
Tools: Railway · Reducto · Gemini 3.1 · Gemini 3.5 Flash · Oracle AI Database · Actian VectorAI · OpenClaw · Qwen 2.5 · Redis Context Engine · Context 1 · Monocle · Giskard · Context Hub · Code Dream · Tokio · Langfuse · Claude Code · FunctionGemma
Papers: ComGPT study · IEEE 2026 study · MIT 2025 study