Episodes

  • Is It Broken Everywhere or Just for Me with Omri Sass
    Jan 22 2026

    When your website stops working at 3 AM, you need to answer one question fast: Is it my code or is a big cloud provider having problems? Omri Sass from Datadog explains updog.ai, a tool that monitors whether major services like AWS, Cloudflare, and others are actually working. Instead of asking people to report problems like Downdetector does, updog uses real data from thousands of computers to detect when services go down. Omri shares why this took 6 years to build, how they process massive amounts of data with machine learning, and why cloud providers have been strangely upset about these tools existing.



    About Omri:

    Omri Sass is a Director of Product Management at Datadog, where he leads and supports a team of 25+ product managers driving initiatives across Bits AI SRE, Data Observability, Service Management, and most recently, the launch of updog.ai. Outside of work, Omri is an avid sci-fi reader, a dedicated yoga practitioner, and happily outmatched by his cat.


    Show Highlights:

    (02:12) What is Updog and How Does It Work

    (03:38) Why Knowing If It's a Global Problem Matters

    (04:01) The Problem With Testing Every Endpoint Yourself

    (05:52) How Datadog Discovered EC2 Outages From Their Own Systems

    (10:38) When AWS Regions Go Down and Cascade Failures

    (13:13) What Happens When Services Rebuild Completely
    (16:29) The Most Important Learning During a 3 AM Incident
    (20:11) Why This Took So Long to Build
    (23:40) When Datadog Going Down Isn't Critical Path
    (25:22) How They Picked Which AWS Services to Monitor
    (27:07) What Comes Next for Updog
    (30:11) Where to Find Omri and Updog


    Links:

    Datadog: datadoghq.com

    Omri’s LinkedIn: https://www.linkedin.com/in/omri-sass-65632a14/

    Sponsored by:
    duckbillhq.com

    31 minutes
  • Solving the 20-Year S3 File System Problem with Hunter Leath
    Jan 20 2026

    Hunter Leath, CEO of Archil, spent 8 years building Amazon's EFS file storage system, learning exactly why making cloud storage act like a hard drive always fails. Old programs need hard drives, but cloud storage doesn't work like hard drives—a problem that's existed for 20 years.

    Now Hunter's building Archil, which puts super-fast storage between programs and S3 so they can finally work together. Your programs think they're talking to a regular disk while your data lives safely in the cloud.

    Hunter explains how they're doing what others couldn't, why it costs less than Amazon's own solutions, and why file systems suddenly matter again in the AI era.

    Show Highlights:

    (01:37) What Archil Does and Why It Exists

    (02:26) Why Mounting S3 as a File System Has Always Failed

    (03:07) What Building EFS Taught Hunter

    (06:55) Using Fast SSDs as a Cache Layer for S3

    (09:45) Attaching Archil to Your Existing S3 Buckets

    (15:08) Why Archil Costs Less Than EBS When You Do the Math

    (17:56) What Happens If Amazon Builds This Feature

    (19:20) Competing With EBS Performance on GP3 Volumes

    (21:43) Raising $6.7 Million Without an AI Pitch

    (23:46) What Customers Get Wrong About Archil

    (28:07) Accessing Data Stored in Glacier Deep Archive

    (29:24) The Plan to Get Into the Linux Kernel

    (30:51) Where to Find Hunter



    About Hunter Leath:

    Hunter is the founder and CEO of Archil, which transforms S3 buckets into infinite, local file systems that provide instant access to massive data sets. Prior to Archil, Hunter spent the last ten years in the cloud storage industry, including 8 years building Amazon's Elastic File System product and one year on Netflix's core storage team.

    Links:
    Hunter Leath on LinkedIn: https://www.linkedin.com/in/hleath/

    Hunter Leath on X: https://x.com/jhleath/

    Archil’s Website: https://archil.com

    Sponsored by:
    duckbillhq.com

    32 minutes
  • Building Systems That Work Even When Everything Breaks with Ben Hartshorne
    Jan 15 2026

    When AWS has a major outage, what actually happens behind the scenes? Ben Hartshorne, a principal engineer at Honeycomb, joins Corey Quinn to discuss a recent AWS outage and how they kept customer data safe even when their systems couldn't fully work. Ben explains why building services that expect things to break is the only way to survive these outages. Ben also shares how Honeycomb used its own tools to cut their AWS Lambda costs in half by tracking five different things in a spreadsheet and making small changes to all of them.


    About Ben Hartshorne:

    Ben has spent much of his career setting up monitoring systems for startups and now is thrilled to help the industry see a better way. He is always eager to find the right graph to understand a service and will look for every excuse to include a whiteboard in the discussion.

    Show Highlights:

    (02:41) Two Stories About Cost Optimization

    (04:20) Cutting Lambda Costs by 50%

    (08:01) Surviving the AWS Outage

    (09:20) Preserving Customer Data During the Outage

    (13:08) Should You Leave AWS After an Outage?

    (15:09) Multi-Region Costs 10x More

    (18:10) Vendor Dependencies

    (22:06) How LaunchDarkly's SDK Handles Outages

    (24:40) Rate Limiting Yourself

    (29:00) How Much Instrumentation Is Too Much?

    (34:28) Where to Find Ben


    Links:

    LinkedIn: https://www.linkedin.com/in/benhartshorne/

    GitHub: https://github.com/maplebed


    Sponsored by:
    duckbillhq.com

    36 minutes
  • Engineering Around Extreme S3 Scale with R. Tyler Croy
    Jan 13 2026

    R. Tyler Croy, a principal engineer at Scribd, joins Corey Quinn to explain what happens when simple tasks cost $100,000. Checking if files are damaged? $100K. Using newer S3 tools? Way too expensive. Normal solutions don't work anymore. Tyler shares how, with this much data, you can't just throw money at the problem; you have to engineer your way out.

    About R. Tyler:

    R. Tyler Croy leads infrastructure architecture at Scribd and has been an open source developer for over 14 years. His work spans the FreeBSD, Python, Ruby, Puppet, Jenkins, and Delta Lake communities. Under his leadership, Scribd’s Infrastructure Engineering team built Delta Lake for Rust to support a wide variety of high performance data processing systems. That experience led to Tyler developing the next big iteration of storage architecture to power large-scale fulltext compute challenges facing the organization.

    Show Highlights:
    01:48 Scribd's 18-Year History

    04:00 One Document Becomes Billions of Files

    05:47 When Normal Physics Stop Working

    08:02 Why S3 Metadata Costs Too Much

    10:50 How AI Made Old Documents Valuable

    13:30 From 100 Billion to 100 Million Objects

    15:05 The Curse of Retail Pricing

    19:17 How Data Scientists Create Growth

    21:18 De-Normalizing Data Problems

    25:29 Evolving Old Systems

    27:45 Billions Added Since Summer

    29:29 Underused S3 Features

    31:48 Where to Find Tyler


    Links:

    Scribd: https://tech.scribd.com
    Mastodon: https://hacky.town/@rtyler
    GitHub: https://github.com/rtyler

    Sponsored by:
    duckbillhq.com

    34 minutes
  • Avery Pennarun on Tailscale's Evolution: From Mesh VPN to AI Security Gateway
    Jan 8 2026

    Corey Quinn sits down with Avery Pennarun, co-founder and CEO of Tailscale, for a deep dive into how the company is reinventing networking for the modern era. From finally making VPNs behave the way they should to tackling AI security with zero-click authentication, Avery shares candid insights on building infrastructure people actually love using, and love talking about.

    They get into everything: surviving 100% year-over-year growth, why running on two tailnets at once is pure chaos, and how Tailscale makes “secure by default” feel effortless. Plus, they dig into why FreeBSD firewalls needed some tough love, the uncomfortable truth behind POCs, and even the surprisingly useful trick of turning your Apple TV into an exit node.


    About Avery:

    Avery Pennarun is the co-founder and CEO of Tailscale, where he’s redefining secure networking with a simple, Zero Trust approach. A veteran software engineer with experience ranging from startups to Google, he’s known for turning complex systems into approachable, user-friendly tools. His contributions to projects like wvdial, bup, and sshuttle reflect his belief that great technology should be both powerful and easy to use. With a mix of technical depth and dry humor, Avery shares insights on modern networking, internet evolution, and the realities of scaling a startup.

    Highlights:
    (00:00) Introduction to Tailscale and Security

    (00:52) Sponsorship and Personal Experiences

    (02:07) Technical Deep Dive into Tailscale

    (06:10) Challenges and Future of Tailscale

    (22:45) Building the Tailnet's API

    (23:54) Connecting Cloud Providers with Tailscale

    (25:22) Tailscale as a Security Solution

    (26:44) Innovations and Future of Tailscale

    Sponsored by:
    duckbillhq.com

    44 minutes
  • How Grokability Built a Profitable Open Source Business with Jeremy Price
    Jan 6 2026

    Most open source companies do the same thing. They take investor money, lock their best features behind paywalls, sell the company, and disappoint everyone. Grokability did something different.

    Jeremy Price, VP of Technology at Grokability, talks with Corey Quinn about how they built a business that makes enough money without chasing endless growth. From why they use simple technology to how they run thousands of separate installations for customers, Jeremy explains what happens when you care more about making a good product than explosive growth.

    Show Highlights:

    (00:51) Welcoming Jeremy Price from Grokability

    (03:34) How Snipe-IT Started With a Bet

    (05:30) Paying for Software Can Change Everything

    (07:40) When AWS Competes With Open Source

    (10:10) Boring Businesses Make Money

    (15:30) Balancing Hosting Needs and Product Quality

    (18:00) Pricing That Avoids Big Customer Problems

    (21:06) Better Than a Google Sheet

    (27:02) The Psychology of Buying

    (29:33) Where to Find Jeremy and Grokability

    Links:

    https://jermops.com/about/

    https://www.linkedin.com/in/jeremygprice/

    https://snipeitapp.com/company

    Sponsored by:
    duckbillhq.com

    31 minutes
  • The AI Productivity Gap with Keith Townsend
    Dec 11 2025

    Corey Quinn reconnects with Keith Townsend, founder of The CTO Advisor, for a candid conversation about the massive gap between AI hype and enterprise reality. Keith shares why a biopharma company gave Microsoft Copilot a hard no, and why AI has genuinely 10x’d his personal productivity while Fortune 500 companies treat it like radioactive material. From building apps with Cursor to watching enterprises freeze in fear of being the next AI disaster in the news, Keith and Corey dig into why the tools transforming solo founders and small teams are dead on arrival in the enterprise, and what it'll actually take to bridge that gap.


    About Keith Townsend
    Keith Townsend is an enterprise technologist and founder of The Advisor Bench LLC, where he helps major IT vendors refine their go-to-market strategies through practitioner-driven insights from CIOs, CTOs, and enterprise architects. Known as “The CTO Advisor,” Keith blends deep expertise in IT infrastructure, AI, and cloud with a talent for translating complex technology into clear business strategy.
    With more than 20 years of experience, including roles as a systems engineer, enterprise architect, and PwC consultant, Keith has advised clients such as HPE, Google Cloud, Adobe, Intel, and AWS. His content series, 100 Days of AI and CloudEveryday.dev, provide practical, plainspoken guidance for IT leaders. A frequent speaker at VMware Explore, Interop, and Tech Field Day, Keith is a trusted voice on cloud and infrastructure transformation.


    Show Highlights
    (01:25) Life After the Futurum Group Acquisition

    (03:56) Building Apps You're Not Qualified to Build with Cursor

    (05:45) Creating an AI-Powered RSS Reader

    (09:01) Why AI is Great at Language But Not Intelligence

    (11:39) Are You Looking for Advice or Just Validation?

    (13:49) Why Startups Can Risk AI Disasters and AWS Can't

    (17:28) You Can't Outsource Responsibility

    (19:52) Business Users Are Scared of AI Too

    (23:00) LinkedIn's AI Writing Tool Misses the Point

    (26:42) Private AI is Starting to Look Appealing

    (29:00) Never Going Back to Pre-AI Development

    (34:27) AI for Jobs You'd Never Hire Someone to Do

    (39:09) Where to Find Keith and Closing Thoughts

    Links

    The CTO Advisor: https://thectoadvisor.com

    Sponsor:
    https://www.sumologic.com/solutions/dojo-ai
    https://wiz.io/crying-out-cloud

    41 minutes
  • AI Agents, Enterprise Risk, and the Future of Recovery: Rubrik’s Vision with Dev Rishi
    Dec 4 2025

    In this episode of Screaming in the Cloud, Corey Quinn sits down with Rubrik’s GM of AI, Dev Rishi, to unpack the real story behind enterprise AI adoption, the rise of agentic systems, and why most organizations are still stuck in read-only mode. Dev breaks down how Rubrik’s Agent Rewind brings safety, observability, and resilience to AI-driven actions, solving the “Oh no, the agent deleted production data” problem before it happens. From deep learning’s evolution to the massive gap between consumer AI enthusiasm and enterprise risk posture, this conversation is a candid, insightful look at the AI future Global 2000 companies are racing toward… or cautiously tiptoeing into.



    Show Highlights

    (00:25) Understanding Rubrik and Agent Rewind

    (00:50) Challenges in AI and Disaster Recovery

    (01:27) Guest Introduction: Dev Rishi from Rubrik

    (01:44) The Evolution of AI in Enterprises

    (02:33) Starting an AI Company: The Backstory

    (05:10) Generative AI and Its Impact

    (07:15) Enterprise AI Trends and Challenges

    (08:56) The Future of Agentic AI

    (18:03) AI in Customer Support

    (22:03) Rubrik's Acquisition and AI Strategy

    (29:30) Launching Rubrik Agent Cloud

    (31:26) Lessons from Starting a Machine Learning Company

    (35:25) Conclusion and Contact Information

    Sponsor:
    Rubrik: https://www.rubrik.com/sitc

    36 minutes