Future Forward Interview

I recently joined Nick and Matt on their podcast to talk about what we're building at Modular - and why the AI infrastructure layer matters more than most people think. We covered everything from why no engineer actually starts a project by picking their GPU, to the edge AI thesis I've been chasing since the TensorFlow Lite days at Google, to a fun side project I built called Compound Loop that orchestrates multiple frontier models against each other to produce better code than any single model can alone. If you're interested in where AI compute is heading - beyond the data center, beyond CUDA lock-in, and toward a world where intelligence runs everywhere - give it a listen or read through the highlights below.

Summary of the Interview

Modular's Core Pitch

  • "Hypervisor for compute" - abstracting away hardware so AI programs run seamlessly across any silicon

  • No one starts a project saying "I must use this hardware" - they start with accuracy, latency, cost, and throughput targets. Hardware shouldn't be front and center

  • Today, moving from Nvidia to AMD to TPUs requires enormous rewriting, and Modular eliminates that (a rough sketch of that divergence follows this list)
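
To make the rewriting point concrete, here's a rough sketch of what per-vendor divergence looks like even at the framework level. The branching below uses stock PyTorch and torch_xla idioms as stand-ins - it is not Modular's API, and the real porting pain (custom kernels, serving stacks) runs far deeper than device selection.

```python
# Per-vendor divergence today, illustrated with stock PyTorch idioms.
# This is NOT Modular's API; real porting pain goes much deeper
# (custom kernels, serving stacks, quantization paths).
import torch

def pick_device(target: str) -> torch.device:
    if target == "nvidia":
        return torch.device("cuda")  # standard CUDA build of PyTorch
    if target == "amd":
        # ROCm builds reuse the "cuda" device name, but need a different
        # PyTorch install and different kernels underneath.
        return torch.device("cuda")
    if target == "tpu":
        # TPUs need a separate library with its own device model.
        import torch_xla.core.xla_model as xm
        return xm.xla_device()
    raise ValueError(f"unsupported target: {target!r}")

device = pick_device("nvidia")  # requires a CUDA-enabled install to run
model = torch.nn.Linear(1024, 1024).to(device)
```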

Market & Customers

  • Inference-focused today, targeting sophisticated AI labs and Gen AI startups doing large-scale deployments

  • The multi-hardware future is already here - hyperscalers and device makers alike are shipping their own silicon (Google TPUs, AWS Inferentia, AMD GPUs, Apple Silicon)

  • Analogy to the multi-cloud movement - no one wants to be locked to a single provider

Edge AI Thesis

  • Rooted in my TensorFlow Lite experience at Google - low latency, privacy-sensitive AI running where you are

  • Local models will get there - "the models you're using today are the dumbest models you'll ever use"

  • Pointed to OpenClaw as validating the pattern of local agents with persistent memory and full system access, even though inference still goes to the cloud today

Big Tech vs. Open Community

  • OpenClaw did what Apple hasn't done in 15 years - a solo developer in Austria shipped a personal AI assistant running on local machines

  • Big tech faces organizational challenges, security concerns, and massive user bases that force caution

  • Microsoft got roasted for Copilot reading your screen, but OpenClaw launched and everyone loved it - the "HR meme" dynamic

Compound Loop (My Side Project)

  • Orchestration system that pits models against each other - Claude, Codex, Gemini

  • Workflow: plan → implement → review → merge, with models cross-reviewing each other's work (see the loop sketch after this list)

  • Key local component: embedding models run on your machine and build representations of your codebase, so only small, relevant context windows go to the cloud models - ~30x reduction in token usage (see the retrieval sketch after this list)

  • Runs autonomously - finishes a task, goes back to the plan, does the next thing
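
Roughly how that plan → implement → review → merge loop fits together. The model names, the call_model() wrapper, and the "LGTM" convention below are hypothetical stand-ins, stubbed so the control flow runs as a dry run - this is not Compound Loop's actual code:

```python
# Sketch of a plan -> implement -> review -> merge loop with cross-review.
# call_model() and merge() are stubbed stand-ins, not real API clients.

def call_model(model: str, prompt: str) -> str:
    print(f"[{model}] {prompt[:60]}")
    return "LGTM"  # stub; a real version would call the vendor's API

def merge(patch: str) -> None:
    print("merging patch")  # stub; a real version would apply and commit

def run_task(task: str) -> None:
    # One model drafts the plan, another implements each step, and a
    # third cross-reviews the patch before anything merges.
    plan = call_model("claude", f"Write a step-by-step plan for: {task}")
    for step in plan.splitlines():
        patch = call_model("codex", f"Implement this step as a diff: {step}")
        review = call_model("gemini", f"Review this patch for bugs: {patch}")
        if "LGTM" not in review:
            patch = call_model("codex", f"Revise per review: {review}")
        merge(patch)

run_task("add retry logic to the HTTP client")
```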
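
And the local-embedding piece, sketched under assumptions: the sentence-transformers library and the all-MiniLM-L6-v2 model are my choices for illustration, not necessarily what Compound Loop uses. The idea is to chunk the codebase, embed it locally, and send only the top-scoring chunks to the cloud model:

```python
# Local embedding retrieval: only the most relevant chunks of the codebase
# go into the cloud model's prompt, which is where the token savings come
# from. The model choice and chunking here are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # runs entirely locally

def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k codebase chunks most similar to the query."""
    chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec  # cosine similarity on unit vectors
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

code_chunks = [
    "def fetch(url): ...",
    "class RetryPolicy: ...",
    "def parse(html): ...",
]
context = top_chunks("where is retry logic implemented?", code_chunks, k=2)
```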

Fun Observation

  • Everyone defaults to the most powerful model even when they don't need it - the "the higher the number, the better" pattern. Model labs are now starting to abstract that away with routing based on query complexity (toy sketch below).
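
The labs' production routers are proprietary - typically a learned classifier rather than hand-written rules - but a toy version of the idea might look like this, with hypothetical model names:

```python
# Toy complexity-based router. The signals and model names are made up;
# real routers are learned, not rule-based.
def route(query: str) -> str:
    hard_signals = ("prove", "refactor", "multi-step", "debug")
    if len(query) > 500 or any(s in query.lower() for s in hard_signals):
        return "big-frontier-model"  # hypothetical name
    return "small-fast-model"        # hypothetical name

print(route("What's 2 + 2?"))                 # -> small-fast-model
print(route("Debug this race condition..."))  # -> big-frontier-model
```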

Next

The case for democratising compute in a multi-model world