Scale or Surrender: When watts determine freedom
Consider this provocative framing: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. This perspective makes efficiency not just important, but existential.
Over the past two centuries, humanity's relationship with energy has been nothing short of transformative. If you chart global primary energy consumption from the Industrial Revolution to today, you'll see something remarkable: an almost unbroken ascent, punctuated by only three brief pauses - the early 1980s oil crisis aftermath, the 2009 financial crisis, and the 2020 pandemic. Otherwise, it's been an extraordinary march upward, powered first by coal and oil, then natural gas, nuclear, hydropower, and increasingly, renewables. This wonderful graphic highlights the trend well, with populous nations like China, the United States, and India dominating total consumption.
The geographic distribution of this energy consumption tells a striking story. China, the United States, and India dominate in absolute terms, but the per-capita numbers reveal something more profound. Citizens of Iceland, Norway, Canada, the United States, and wealthy Gulf states like Qatar and Saudi Arabia consume up to 100 times more energy than those in the world's poorest regions. This isn't merely inequality - it's a chasm so vast that millions of people still rely on traditional biomass (wood, agricultural residues) that doesn't even register in global energy statistics.
The disparities in electricity generation are equally stark. Iceland, blessed with abundant geothermal and hydro resources, generates hundreds of times more electricity per person than many low-income nations, where annual per-capita generation can fall below 100 kilowatt-hours - less than what a modern refrigerator uses in two months.
This context matters immensely as we confront the dual challenge of our time: meeting rising global energy demand while urgently decarbonizing our energy supply. Despite record investments in clean technologies, fossil fuels still account for approximately 81.5% of global primary energy. The math here is unforgiving - renewable sources must not only meet all new demand but also replace existing fossil fuel capacity if we're to bend the emissions curve downward.
Enter artificial intelligence, with its voracious and growing appetite for electricity.
In 2023, U.S. data centers consumed approximately 176 terawatt-hours - 4.4% of national electricity consumption. Current projections suggest this could reach 325 to 580 TWh by 2028, representing 6.7% to 12% of total U.S. electricity demand, driven largely by AI workloads that require ever-increasing compute power and specialized hardware. To contextualize these numbers: we're talking about enough electricity to power between 32.5 and 58 million American homes.
The AI industry has long understood a critical metric that deserves wider attention: tokens-per-dollar-per-watt. This measure of computational efficiency relative to both cost and energy consumption has been a focus at Google and other leading technology companies for years. It represents the kind of systems thinking we desperately need as AI capabilities expand.
The challenge before us is clear. We're attempting to build transformative AI systems while simultaneously addressing the climate crisis. These goals aren't inherently incompatible, but reconciling them requires unprecedented coordination and innovation across multiple domains:
Hardware efficiency: Next-generation chips that deliver dramatically better performance-per-watt
Operational intelligence: Carbon-aware scheduling that aligns compute-intensive tasks with renewable energy availability
Infrastructure innovation: On-site renewable generation and novel cooling systems that minimize overhead
System integration: Data centers that contribute to local energy systems through waste heat recovery
Radical transparency: Clear reporting standards that drive competition on efficiency metrics
Global energy consumption tells a story of both peril and promise. As artificial intelligence scales exponentially, it threatens to derail climate progress - yet history shows us that human ingenuity consistently reimagines our energy systems when survival demands it. We have already proven we can build transformative AI; the defining challenge now is whether we can build it sustainably, ensuring our creations enhance rather than endanger the world they serve.
The stakes are higher than they appear. Even breakthrough efficiency gains in AI hardware may paradoxically increase total energy consumption - a manifestation of the Jevons paradox, where technological improvements drive greater overall demand. At this crossroads of intelligence and energy transformation, our choices will determine whether AI becomes humanity's greatest tool or its most consequential miscalculation.
The arithmetic is challenging, but not impossible. What's required is the kind of systematic thinking and ambitious action that has characterized humanity's greatest technological leaps. The alternative - allowing AI's energy demands to grow unchecked - would represent a profound failure of imagination and responsibility. The geopolitical stakes are just as large: whichever nations control the most powerful AI systems will be the superpowers of tomorrow. In this article, I try to shine a light on what's causing the enormous growth in energy demand, along with some thoughts about the path forward.
The geography of American power
To truly grasp the magnitude of AI's growing energy demands, it's instructive to examine America's electricity generation landscape. At the apex sits the Palo Verde Nuclear Generating Station in Arizona, the nation's largest power producer, generating approximately 32 million megawatt-hours annually - equivalent to 32 billion kWh, or 32 TWh.
What does 32 billion kWh actually mean? The U.S. Energy Information Administration reports that the average American household consumes about 10,500 kWh per year. Simple arithmetic reveals that Palo Verde alone could theoretically power 3.05 million homes - roughly 2.5% of the nation's 120.92 million households. One facility, powering the equivalent of a major metropolitan area.
The roster of America's electricity giants tells a fascinating story about our energy infrastructure. After Palo Verde, we have Browns Ferry (31 TWh, nuclear), Peach Bottom (22 TWh, nuclear), and then Grand Coulee Dam (21 TWh) - the hydroelectric marvel that helped build the American West. The list continues with West County Energy Center (19 TWh, natural gas), W.A. Parish (16 TWh, a coal/gas hybrid), and Plant Scherer in Georgia (15 TWh, coal).
Notice the pattern? Nuclear dominates the top tier, followed by a mix of hydro, gas, and coal. After these giants, output drops precipitously to facilities generating around 3 TWh - a reminder of how concentrated our electricity production really is. This concentration matters. When we project data centers consuming 325-580 TWh by 2028, we're talking about the equivalent of 10-18 Palo Verde stations running exclusively to power AI and digital infrastructure. That's not replacing existing demand - that's additional load on a grid already straining to decarbonize.
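For anyone who wants to check this arithmetic, here is a minimal sketch in Python using the rounded figures quoted above (32 TWh for a Palo Verde-scale plant, 10,500 kWh per household, 120.92 million households, and the 325-580 TWh projection); these are the article's working assumptions, not authoritative statistics:

```python
# Back-of-envelope arithmetic using the rounded figures from the text above.
PALO_VERDE_TWH = 32        # annual generation (~32 billion kWh)
KWH_PER_HOME = 10_500      # average U.S. household consumption per year
US_HOUSEHOLDS_M = 120.92   # millions of U.S. households

homes_powered_m = PALO_VERDE_TWH * 1e9 / KWH_PER_HOME / 1e6
print(f"Palo Verde ~ {homes_powered_m:.2f}M homes "
      f"({homes_powered_m / US_HOUSEHOLDS_M:.1%} of U.S. households)")

# Projected 2028 data-center demand, expressed in "Palo Verdes".
for projection_twh in (325, 580):
    print(f"{projection_twh} TWh ~ {projection_twh / PALO_VERDE_TWH:.0f} Palo Verde stations")
```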
The average U.S. household consumes about 10,500 kilowatt-hours (kWh) of electricity per year, though this varies significantly by region and housing type. Residential electricity primarily powers essential systems: space cooling, water heating, and space heating, along with refrigeration, lighting, and electronics. Commercial buildings have vastly different consumption patterns depending on their size and type, ranging from small offices to large retail centers and office complexes, each with varying HVAC, lighting, and operational equipment needs.
The sectoral breakdown of U.S. electricity consumption reveals a more balanced distribution than commonly understood. The EIA forecasts that 2025-2026 power sales will rise to 1,494 billion kWh for residential consumers, 1,420 billion kWh for commercial customers, and 1,026 billion kWh for industrial customers, with longer-term forecasts still mostly within historical norms. This translates to approximately 38% residential, 36% commercial, and 26% industrial consumption. Rather than being overshadowed by commercial and industrial users, the residential sector actually represents the largest single share of electricity demand, with commercial consumption a close second. This distribution reflects America's transition toward greater electrification in homes and businesses, driven in part by growing demand from artificial intelligence and data centers.
The long view is revealing. According to the Energy Information Administration, U.S. electricity consumption increased in all but 11 years between 1950 and 2022. The rare declines - most recently 2019, 2020, and 2023 - coincided with economic contractions, efficiency improvements, or exceptional circumstances like the pandemic. The overarching trend remains unmistakably upward. While the exact figures here may carry some uncertainty, they capture the essential dynamics. What matters isn't the precise sectoral split, but that demand is substantial and the growth trajectory is clear. These patterns - broad-based demand and relentless growth - form the backdrop against which we must evaluate AI's emerging energy requirements.
Understanding these scales helps frame the challenge ahead. Every percentage point of national electricity consumption that shifts to data centers represents millions of homes' worth of power. The infrastructure required to meet this demand sustainably doesn't just appear - it must be planned, financed, and built, all while racing against both growing demand and climate imperatives.
The numbers that keep me up at night
Let's revisit the core projection: U.S. data centers consumed approximately 176 TWh in 2023 (4.4% of national electricity) and are projected to reach 325-580 TWh by 2028 – equivalent to powering 32.5 to 58 million American homes.
But here's what keeps me up at night: 580 TWh might be just the beginning of what we need.
Consider today's reality. One analysis estimated that ChatGPT inference alone consumes 226.8 GWh annually - enough to power roughly 21,000 U.S. homes - and that figure is already outdated. The International Energy Agency (IEA) offers a more sobering projection: 945 TWh - roughly the entire electricity consumption of Japan, the world's third-largest economy.
Let that sink in - AI could require as much electricity as the world's third-largest economy. The composition of this demand has already shifted fundamentally. During my time at Google, I watched inference overtake training as the primary driver of compute demand through the late 2010s, and now inference is quickly rising to represent more than 80% of AI compute capacity across the industry. This matters because while training happens in discrete, intensive bursts, inference runs continuously at scale, serving billions of requests around the clock - every query, every recommendation, every generated response adds to the load. This is the crux of our challenge: AI follows exponential growth patterns that surprise even those who've spent years watching them unfold. Given the convergence toward dominant model architectures, inference's share will likely climb further, meaning our upper-bound projection of 945 TWh could see inference alone consuming over 756 TWh by 2028.
But doesn’t edge computing promise to slash data center demands? This is a narrative I've heard repeatedly throughout years of scaling early edge AI systems. Yet I remain deeply skeptical of any order-of-magnitude impact. The reason is simple: we've barely scratched the surface of enterprise, government, and industrial AI adoption. These sectors will unleash computational demands that dwarf any efficiency gains from consumer devices processing locally. Consider the asymmetry: for every smartphone performing local voice recognition, hundreds of enterprise systems are analyzing documents, monitoring infrastructure, processing surveillance footage, and generating complex reports. The sheer scale of this institutional transformation will eclipse whatever load we shift to the edge.
This reality leads us to the heart of the matter: if inference drives our energy challenge, how do we understand its consumption patterns? What does the energy anatomy of inference reveal, and where might we find our leverage points for optimization? Understanding these patterns isn't just an academic exercise - it's essential for developing strategies that can accommodate AI's growth while continuing to grow our energy infrastructure. The always-on nature of inference, combined with its direct relationship to usage, creates a fundamentally different challenge than the periodic spikes of model training.
Inference: the GOAT of consumption
The inference footprint is the electricity consumed each time an AI model generates a response - and as AI becomes ubiquitous across digital services, inference will inevitably dominate long-term energy costs. This raises a crucial question: how do we properly measure inference energy consumption? What's the right framework for calculating tokens-per-watt for inference?
Let's develop a working model, with an important caveat: these calculations rest on rough assumptions about the current AI landscape. They presume most AI continues running on transformer architectures without fundamental changes over the next few years - though I suspect this assumption may prove conservative. We're likely to use AI itself to discover more efficient architectures, potentially invalidating these projections in favorable ways. With that context, let's examine how inference energy consumption actually works and what drives its costs at scale.
The quadratic curse
Transformers are the neural network architecture powering most modern AI systems - from ChatGPT to Claude to Gemini - and were originally created by former colleagues of mine at Google. The key innovation of this architecture is the ability to process all parts of an input simultaneously while understanding relationships between distant elements in the text. In transformer inference, prefill is the initial computational phase where the model processes your entire input prompt before generating any output. This involves a single forward pass through the network, computing hidden representations for all input tokens at once.
Your sequence length simply counts these tokens - the basic units of text that might be letters, partial words, or whole words depending on the tokenizer. "Hello, world!" typically translates to 3-4 tokens, while a lengthy document might contain thousands. This distinction matters because prefill computation scales with sequence length, making long prompts significantly more energy-intensive than short ones.
Prefill time grows quadratically with sequence length - double the input, quadruple the computation. This scaling behavior stems from transformers' core mechanism: self-attention. Self-attention requires computing relationships between every pair of tokens in the input. For n tokens, that's n² comparisons. Unlike older architectures (RNNs) that process tokens sequentially, transformers examine all tokens simultaneously, with each token gathering information from every other token in parallel.
Here's an intuitive analogy: imagine a roundtable discussion where each participant (token) prepares three items:
Query: "What information am I seeking?"
Key: "What information do I possess?"
Value: "What insight can I contribute?"
Each participant shares their query with everyone else, comparing it against others' keys to find the most relevant matches. They then synthesize their understanding by combining values from those whose keys best align with their query. Every participant does this simultaneously, creating a rich, interconnected understanding of the entire conversation. This elegant mechanism enables transformers' remarkable capabilities, but it comes at a cost: computational requirements that scale quadratically with input length. A 2,000-token prompt requires four times the computation of a 1,000-token prompt, not twice. This mathematical reality shapes the energy economics of AI inference at scale.
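To make the n² claim concrete, here is a toy single-head self-attention sketch in NumPy. It is purely illustrative (random projections, no multi-head attention, masking, or batching); the point is simply that the score matrix is exactly the n × n table of pairwise comparisons described above:

```python
import numpy as np

def toy_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over x of shape (n_tokens, d_model)."""
    d = x.shape[1]
    rng = np.random.default_rng(0)
    # Each token prepares a query, a key, and a value (random projections here).
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    # The n x n score matrix is where the quadratic cost lives:
    # every token is compared against every other token.
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V

# The pairwise-comparison count grows with the square of the prompt length.
for n_tokens in (500, 1_000, 2_000):
    out = toy_self_attention(np.ones((n_tokens, 16)))
    print(f"{n_tokens:>5} tokens -> {n_tokens**2:>9,} pairwise comparisons, "
          f"output shape {out.shape}")
```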
The Two Phases of Transformer Processing
Every transformer request involves two distinct computational phases:
Prefill: Processing the entire input prompt (quadratic scaling with input length, O(n²) complexity)
Decode: Generating output tokens one by one (linear scaling with output length, O(n) complexity)
This scaling difference has profound implications. While decode time grows linearly with the number of tokens generated, prefill time grows with the square of input length. The longer your prompt, the more dramatically prefill dominates total processing time.
Consider the relative computational work (in arbitrary units, assuming 50 output tokens):
Input Length | Prefill Work (∝ n²) | Decode Work | Prefill % of Total |
---|---|---|---|
500 tokens | 250,000 units | 50,000 units | 83% |
1,000 tokens | 1,000,000 units | 50,000 units | 95% |
2,000 tokens | 4,000,000 units | 50,000 units | 99% |
The pattern is stark. At 500 input tokens, prefill already consumes 83% of processing time. Double the input to 1,000 tokens, and prefill jumps to 95% - the actual generation phase becomes almost negligible. At 2,000 tokens, you're spending 99% of compute just understanding the prompt. Here's what's happening:
Prefill work = n² (where n = input tokens)
Decode work = 1,000 × output tokens (arbitrary scaling factor)
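A few lines of Python reproduce the table from those two toy formulas (the 1,000× decode factor is the arbitrary scaling constant used above, not a measured quantity):

```python
def prefill_work(input_tokens: int) -> int:
    # Quadratic in input length: every token attends to every other token.
    return input_tokens ** 2

def decode_work(output_tokens: int, scale: int = 1_000) -> int:
    # Linear in output length, with an arbitrary scaling factor.
    return scale * output_tokens

OUTPUT_TOKENS = 50
for n in (500, 1_000, 2_000):
    p, d = prefill_work(n), decode_work(OUTPUT_TOKENS)
    print(f"{n:>5} input tokens: prefill={p:>9,} decode={d:,} "
          f"prefill share={p / (p + d):.0%}")
```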
This quadratic scaling of self-attention explains why long-context models are so computationally expensive. As context windows expand from thousands to hundreds of thousands of tokens, the energy requirements don't just grow - they explode. Understanding this dynamic is crucial for anyone designing AI systems or planning infrastructure for the age of ubiquitous AI.
Doubling words, quadrupling watts
The quadratic scaling of context windows isn't just an abstract computational concern - it translates directly into energy consumption. Every FLOP requires energy, and when FLOPs scale quadratically, so does your electricity bill.
The energy equation is straightforward:
Devices draw roughly constant power P during operation (e.g., 300W for a high-performance GPU)
Energy consumed equals power multiplied by time: E = P × T
Since prefill time scales quadratically with input length, so does prefill energy
Let's make this concrete with realistic parameters:
Power draw: 300W
Decode time: 20ms (fixed for 50 output tokens)
Baseline prefill: 100ms for 500 input tokens
Input Tokens | Prefill Time | Prefill Energy | Decode Time | Decode Energy | Prefill % of Total |
---|---|---|---|---|---|
500 | 0.10 s | 30 J | 0.02 s | 6 J | 83% |
1,000 | 0.40 s | 120 J | 0.02 s | 6 J | 95% |
2,000 | 1.60 s | 480 J | 0.02 s | 6 J | 99% |
The energy story mirrors the computational one. While decode energy remains constant at 6 joules regardless of input length, prefill energy explodes from 30J to 480J as input doubles from 500 to 2,000 tokens. At 2,000 tokens, you're burning 80 times more energy understanding the prompt than generating the response.
Let's recap these results.
At 500 input tokens, prefill consumes 30 J versus decode's 6 J - already 83% of total energy. Double the input to 1,000 tokens, and prefill time quadruples, pushing energy consumption to 120 J and commanding 95% of the total. By 2,000 tokens, the imbalance becomes extreme: 480 J for prefill versus 6 J for decode, with prefill consuming 99% of the energy budget. Extrapolate to a 10,000-token prompt generating just 1,500 output tokens, and you're looking at roughly 3.4 Wh per query - nearly all spent on prefill. This isn't a marginal effect; it's the dominant factor in inference energy consumption.
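Here is the same toy energy model in code, under the assumptions above (constant 300 W draw, 100 ms of prefill at 500 input tokens scaling quadratically, and 20 ms of decode per 50 output tokens); the last row reproduces the ~3.4 Wh extrapolation:

```python
POWER_W = 300.0                  # assumed steady device power draw
BASE_PREFILL_S = 0.100           # prefill time at the 500-token baseline
BASE_INPUT_TOKENS = 500
DECODE_S_PER_TOKEN = 0.020 / 50  # 20 ms for 50 output tokens

def query_energy_joules(input_tokens: int, output_tokens: int) -> float:
    # Prefill time scales with the square of input length; decode is linear.
    prefill_s = BASE_PREFILL_S * (input_tokens / BASE_INPUT_TOKENS) ** 2
    decode_s = DECODE_S_PER_TOKEN * output_tokens
    return POWER_W * (prefill_s + decode_s)

for n_in, n_out in [(500, 50), (1_000, 50), (2_000, 50), (10_000, 1_500)]:
    joules = query_energy_joules(n_in, n_out)
    print(f"{n_in:>6} in / {n_out:>5} out: {joules:>8.0f} J = {joules / 3600:.2f} Wh")
```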
The implications are profound. Whether you're designing for on-device inference with battery constraints, deploying in autonomous vehicles, or managing massive cloud infrastructure costs at scale, prompt length becomes your primary lever for controlling energy consumption. This scaling asymmetry defines the energy economics of AI: decode plods along linearly - each output token costs the same as the last - while prefill explodes with the square of input length, its hunger growing with every token fed. A 1,000-token prompt doesn't just double the cost of a 500-token prompt; it roughly quadruples it. By the time we reach today's massive contexts, decode all but disappears, a thin shadow cast by prefill's towering consumption.
And yet, users control neither the true input nor the output. No service today imposes a “token budget,” and one would likely create a frustrating user experience if it did. The largest providers - OpenAI, Google, Anthropic - inject substantial hidden context into every prompt while keeping their system instructions opaque. Output remains equally unconstrained: unless users explicitly demand brevity, models generate tokens freely; most services never think to limit responses, and most users don't even know they can ask.
This creates a fundamental tension in AI system design. While longer contexts enable richer interactions and more sophisticated reasoning, they exact a quadratically increasing energy toll. Once prompts exceed a few hundred tokens, virtually all computational resources are consumed by the prefill phase alone. For sustainable AI deployment at scale, prompt concision isn't merely good practice - it's an energy imperative. The difference between 500-token and 2,000-token average prompts could determine whether our global infrastructure remains viable or collapses under its own consumption.
This problem compounds as AI agents and capabilities like Deep Research proliferate. Each autonomous action, each recursive query, each unconstrained generation adds to an already exponential curve. We're building systems designed to think deeply while hoping they'll somehow learn restraint—a contradiction that grows more stark with every token generated.
The paradox blooms in plain sight
Here's the irony: while physics demands shorter prompts, the industry is sprinting in the opposite direction. Claude 4's system prompt far exceeds 10,000 tokens. Despite optimization techniques like KV cache retrieval and prefix caching, the overwhelming trend is toward ever-expanding context windows. We're stuffing everything we can into prompts - documentation, code repositories, conversation histories - because it demonstrably improves model capabilities.
A colleague recently quipped (hat tip, Tyler!): "We'll achieve AGI when all of Wikipedia fits in the prompt!" It's a joke that hits uncomfortably close to our current trajectory of maximizing context windows at every opportunity.
My rough calculations above land within a factor of two of recent empirical findings. Recent research shows that a GPT-4o query with 10,000 input tokens and 1,500 output tokens consumes approximately 1.7 Wh on commercial datacenter hardware. For models with more intensive reasoning capabilities, the numbers climb dramatically: DeepSeek-R1 averages 33 Wh for long prompts, while OpenAI's o3 model reaches 39 Wh. These aren't theoretical projections - they're measured consumption figures from production systems.
The energy cost of our context window expansion is real, substantial, and growing with each new model generation. We're caught between two competing imperatives: the computational benefits of longer contexts and the steep energy costs they incur. The other interesting observation in the table below is the explosive increase in energy requirements as models have grown larger and more sophisticated - the power demands of scaling continue to climb.
Model | Release Date | Energy Consumption (Wh), 100 input / 300 output tokens | Energy Consumption (Wh), 1K input / 1K output tokens | Energy Consumption (Wh), 10K input / 1.5K output tokens |
---|---|---|---|---|
o4-mini (high) | Apr 16, 2025 | 2.916 ± 1.605 | 5.039 ± 2.764 | 5.666 ± 2.118 |
o3 | Apr 16, 2025 | 7.026 ± 3.663 | 21.414 ± 14.273 | 39.223 ± 20.317 |
GPT-4.1 | Apr 14, 2025 | 0.918 ± 0.498 | 2.513 ± 1.286 | 4.233 ± 1.968 |
GPT-4.1 mini | Apr 14, 2025 | 0.421 ± 0.197 | 0.847 ± 0.379 | 1.590 ± 0.801 |
GPT-4.1 nano | Apr 14, 2025 | 0.103 ± 0.037 | 0.271 ± 0.087 | 0.454 ± 0.208 |
GPT-4o (Mar '25) | Mar 25, 2025 | 0.421 ± 0.127 | 1.214 ± 0.391 | 1.788 ± 0.363 |
GPT-4.5 | Feb 27, 2025 | 6.723 ± 1.207 | 20.500 ± 3.821 | 30.495 ± 5.424 |
Claude-3.7 Sonnet | Feb 24, 2025 | 0.836 ± 0.102 | 2.781 ± 0.277 | 5.518 ± 0.751 |
Claude-3.7 Sonnet ET | Feb 24, 2025 | 3.490 ± 0.304 | 5.683 ± 0.508 | 17.045 ± 4.400 |
o3-mini (high) | Jan 31, 2025 | 2.319 ± 0.670 | 5.128 ± 1.599 | 4.596 ± 1.453 |
o3-mini | Jan 31, 2025 | 0.850 ± 0.336 | 2.447 ± 0.943 | 2.920 ± 0.684 |
DeepSeek-R1 | Jan 20, 2025 | 23.815 ± 2.160 | 29.000 ± 3.069 | 33.634 ± 3.798 |
DeepSeek-V3 | Dec 26, 2024 | 3.514 ± 0.482 | 9.129 ± 1.294 | 13.838 ± 1.797 |
LLaMA-3.3 70B | Dec 6, 2024 | 0.247 ± 0.032 | 0.857 ± 0.113 | 1.646 ± 0.220 |
o1 | Dec 5, 2024 | 4.446 ± 1.779 | 12.100 ± 3.922 | 17.486 ± 7.701 |
o1-mini | Dec 5, 2024 | 0.631 ± 0.205 | 1.598 ± 0.528 | 3.605 ± 0.904 |
LLaMA-3.2 1B | Sep 25, 2024 | 0.070 ± 0.011 | 0.218 ± 0.035 | 0.342 ± 0.056 |
LLaMA-3.2 3B | Sep 25, 2024 | 0.115 ± 0.019 | 0.377 ± 0.066 | 0.573 ± 0.098 |
LLaMA-3.2-vision 11B | Sep 25, 2024 | 0.071 ± 0.011 | 0.214 ± 0.033 | 0.938 ± 0.163 |
LLaMA-3.2-vision 90B | Sep 25, 2024 | 1.077 ± 0.096 | 3.447 ± 0.302 | 5.470 ± 0.493 |
LLaMA-3.1-8B | Jul 23, 2024 | 0.103 ± 0.016 | 0.329 ± 0.051 | 0.603 ± 0.094 |
LLaMA-3.1-70B | Jul 23, 2024 | 1.101 ± 0.132 | 3.558 ± 0.423 | 11.628 ± 1.385 |
LLaMA-3.1-405B | Jul 23, 2024 | 1.991 ± 0.315 | 6.911 ± 0.769 | 20.757 ± 1.796 |
GPT-4o mini | Jul 18, 2024 | 0.421 ± 0.082 | 1.418 ± 0.332 | 2.106 ± 0.477 |
LLaMA-3-8B | Apr 18, 2024 | 0.092 ± 0.014 | 0.289 ± 0.045 | — |
LLaMA-3-70B | Apr 18, 2024 | 0.636 ± 0.080 | 2.105 ± 0.255 | — |
GPT-4 Turbo | Nov 6, 2023 | 1.656 ± 0.389 | 6.758 ± 2.928 | 9.726 ± 2.686 |
GPT-4 | Mar 14, 2023 | 1.978 ± 0.419 | 6.512 ± 1.501 | — |
Table 4 from "How Hungry is AI?" (model release dates added by me).
Trade 8,500 conversations to keep your home cool
To grasp the practical implications, consider this sobering calculation: at o3's extreme consumption of 39 Wh per query, approximately 76,923 interactions would drain 3,000 kWh - equivalent to powering a typical American home's air conditioning for an entire year. But as users inevitably gravitate toward richer prompts - say, 30K input tokens with 3.5K outputs - the quadratic curse strikes with mathematical precision. Prefill energy multiplies ninefold, collapsing that annual budget to just 8,500 interactions: merely 23 queries per day. This comparison transforms abstract energy figures into visceral reality, revealing how what appears negligible at the per-query level becomes a massive aggregate demand. And we're still only discussing text models.
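The arithmetic behind those figures, as a quick sketch (assuming the measured 39 Wh per long o3 query, a rough 3,000 kWh annual air-conditioning budget, and prefill energy scaling with the square of input length):

```python
ANNUAL_COOLING_KWH = 3_000   # rough annual A/C budget for a typical U.S. home
O3_WH_PER_QUERY = 39         # measured long-prompt o3 consumption (10K in / 1.5K out)

queries_today = ANNUAL_COOLING_KWH * 1_000 / O3_WH_PER_QUERY
print(f"{queries_today:,.0f} queries per cooling-year at {O3_WH_PER_QUERY} Wh each")

# Tripling input length (10K -> 30K tokens) multiplies prefill energy by ~3^2 = 9.
richer_wh_per_query = O3_WH_PER_QUERY * (30_000 / 10_000) ** 2
queries_richer = ANNUAL_COOLING_KWH * 1_000 / richer_wh_per_query
print(f"{queries_richer:,.0f} queries per year (~{queries_richer / 365:.0f} per day) "
      f"at {richer_wh_per_query:.0f} Wh each")
```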
The trajectory becomes even more stark when we consider multimodal AI. Image and video models can routinely process upwards of hundreds of thousands of tokens per query. Each frame, each visual element, each temporal relationship adds to the token count. As these models become mainstream, we're not just scaling linearly with adoption - we're multiplying adoption rates by dramatically higher per-query energy costs.
The math is unforgiving: widespread deployment of long-context AI at current efficiency levels would require energy infrastructure on a scale we're not remotely prepared for. This isn't a distant concern - it's the reality we're building toward with every context window expansion and every new multimodal capability. The urgency is real: we need to move faster, or soon the trade-offs become explicit - 8,500 conversations or one cool home; your ChatGPT query or your neighbor's heating. And while you might laugh, it's already happening here in the United States, and water availability is next.
The stakes have never been higher
The mathematics we've explored tells a clear story: prefill computation scales quadratically with input length. Double your prompt, quadruple your energy consumption. As models embrace ever-larger context windows, this dynamic becomes one of the primary drivers of AI's steep energy trajectory.
Of course, these projections assume a degree of technological stasis. Innovation could disrupt these trends - and likely will - and the work we are doing at Modular is certainly trying to help. But consider this provocative framing: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. If each AI agent represents some fraction of human productive capacity, then the first to achieve a combined human-digital population of 5 or 10 billion wins the AI race, and invents the next great technological frontier.
This perspective makes efficiency not just important, but existential. The promise of AI as humanity's great equalizer inverts into its opposite: a world where computational capacity becomes the new axis of dominance. Without breakthrough innovation at every layer - from silicon to algorithms to system architecture - we face an AI revolution strangled by the very physics of power generation. The future belongs not to the wise, but to the watt-rich. And so we must scale with unprecedented urgency - nuclear, renewables, whatever it takes - because the stakes transcend mere technological supremacy. In this race, computational power becomes political power, and China is currently winning by a large margin. If democratic nations cede the AI frontier to autocracies, we don't just lose a technological edge; we risk watching the values of human dignity and freedom dim under the shadow of algorithmic authoritarianism. The grid we build today determines whose values shape the world of tomorrow.
It's interesting to reflect that we are teaching machines to think with the very energy that makes our planet uninhabitable - yet these same machines may be our only hope of learning to live within our means. AI is both the fever and the cure, the flood and the ark, the hunger that's outrunning itself. We race against our own creation, betting that the intelligence we birth from burning carbon will show us how to stop burning it altogether. The question of our age: can we make AI wise enough to save us before it grows hungry enough to consume us? The stakes are unquestionably high in the race to AI superintelligence.
I'll close with an irony that perfectly captures the current moment - the suggestions below come courtesy of Anthropic’s Claude 4 Opus:
Better Chips and Smarter Cooling - The latest AI chips use way less energy for the same work. Pair that with innovative cooling like liquid systems or modular designs, and data centers have already seen energy savings of up to 37% in test runs.
Timing is Everything - Not all AI work needs to happen right now. By running non-urgent tasks when electricity is cleaner (like when it's sunny or windy), some companies have cut their carbon emissions from AI jobs by 80-90%. It's like doing laundry at night when rates are lower, but for the planet.
Power Where You Need It - Building solar panels, wind turbines, and battery storage right at data center sites makes sense. Google's recent $20 billion investment in clean energy shows how tech giants can grow their AI capabilities without relying entirely on the traditional power grid.
Working Together on Infrastructure - Data center operators need to share their growth plans with utility companies. This helps everyone prepare for the massive power needs coming to tech hubs like Northern Virginia, Texas, and Silicon Valley - think of it as giving the power company a heads-up before throwing a huge party.
Show Your Work - Just like appliances have energy ratings, AI companies should tell us how much power they're using. Whether it's energy per query or per training session, transparency creates healthy competition to be more efficient.
Investing in Tomorrow's Solutions - Government programs are funding research into game-changing technologies like optical processors that could use 10 times less energy. There's also exciting work on making AI models smaller and smarter without losing capabilities.
Turning Waste into Resources - Data centers generate tons of heat - why not use it? Some facilities are already warming nearby buildings with their excess heat, turning what was waste into a community benefit.
And there it is - 40 watt-hours spent asking AI how to save watt-hours per token, with some of these ideas still unproven (e.g., optical processors). The perfect metaphor for our moment: we burn the world to ask how to stop burning it, racing our own shadow toward either wisdom or ruin. The verdict seems absolute: scale or surrender - there is no middle ground in the physics of power.
Thanks to Tyler Kenney, Kalor Lewis, Eric Johnson, Christopher Kauffman, Will Horyn, Jessica Richman, Duncan Grove and others for many fun discussions on the nature of AI and energy.
Fund/Build/Scale Podcast Interview
Leaving a high-paying role at Google to take on NVIDIA, Intel, and AMD is not for the faint of heart, but that’s exactly what Tim Davis, co-founder and president of Modular, did.
I had a wonderful conversation with Walter Thompson about Modular some time ago; we covered AI compute, building a new AI software stack, the ups and downs of startups, and scaling AI into the future.
Introduction
In this episode of Fund/Build/Scale, Tim explains why Modular has raised $130M to reimagine AI compute infrastructure, and what he’s learned trying to build a platform that competes with some of the biggest names in tech.
We talked about:
🚀 Why Modular believes AI workloads need a hardware-agnostic execution platform
💡 How Tim and co-founder Chris Lattner decided to “start from the hardest part of the stack”
💰 The trade-offs of raising VC for infrastructure-heavy startups
🛠 Why Modular focuses on talent density and how they’ve recruited top engineers from Google, NVIDIA, and beyond
🌍 What it takes to break into the AI space when your competitors are trillion-dollar companies
Tim takes a thoughtful, deep-dive approach to this conversation—unpacking the complexities of AI infrastructure and what it takes to build in one of tech’s most competitive spaces. There’s valuable insight here for founders navigating technical markets or aiming to disrupt entrenched players.
Episode Breakdown
(1:26) “We are building a new accelerated execution platform for compute.”
(6:41) “It will exist all over the place and it already does, but AI will be everywhere that compute is.”
(11:18) “You only have so much time in a week. What is the thing that you're best at?”
(15:13) “We have decided to start from the hardest part of the software stack.”
(22:44) “For the most talented people in the world, the risk is actually not as great as what you think.”
(30:24) “Growing up in Australia, my view of the United States was very much driven from the media and from Hollywood.”
(33:26) “I sat in a room for six weeks and just met everyone that I could. And that really was the beginning of a journey to the United States.”
(37:48) “I still think there's a special place in the Bay Area, and in the United States, there is a different risk appetite.”
(40:41) The one question Tim would have to ask the CEO before he’d take a job at someone else’s early-stage startup.
AI Regulation: step with care and great tact
“So be sure when you step, Step with care and great tact. And remember that life's A Great Balancing Act. And will you succeed? Yes! You will, indeed! (98 and ¾ percent guaranteed).” ― Dr. Seuss, Oh, the Places You'll Go!
AI systems take an incredible amount of time to build and get right - I know because I have helped scale some of the largest AI systems in the world, which have directly and indirectly impacted billions of people. If I step back and reflect briefly - we were promised mass production self-driving cars 10+ years ago, and yet we still barely have any autonomous vehicles on the road today. Radiologists haven’t been replaced by AI despite predictions virtually guaranteeing as much, and the best available consumer robot we have is the iRobot J7 household vacuum.
We often fall far short of the technological exuberance we project into the world, time and time again realizing that producing incredibly robust production systems is always harder than we anticipate. Indeed, Roy Amara made this observation long ago:
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
Even when we have put seemingly incredible technology out into the world, we often overshoot and release it to minimal demand - the litany of technologies that promised to change the world but subsequently failed is a testament to that. So let's be realistic about AI - we definitely have better recommendation systems, we have better chatbots, we have translation across multiple languages, we can take better photos, we are now all better copywriters, and we have helpful voice assistants. I've been involved in using AI to solve real-world problems like saving the Great Barrier Reef, and enabling people who have lost limbs to rediscover their lives again. Our world is a better place because of such technologies, and they have been developed and deployed into real applications over the last 10+ years with profoundly positive effect.
But none of this is remotely close to AGI - artificial general intelligence - or anywhere near ASI - artificial superintelligence. We have a long way to go despite what the media presages, like a Hollywood blockbuster storyline. In fact, in my opinion, these views are a bet against humanity because they gravely overshadow the incredible positives that AI is already, and will continue, providing - massive improvements in healthcare, climate change, manufacturing, entertainment, transport, education, and more - all of which will bring us closer to understanding who we are as a species. We can debate the merits of longtermism forever, but the world has serious problems we can solve with AI today. Instead, we should be asking ourselves why we are scrambling to stifle innovation and implement naively restrictive regulatory frameworks now, at a time when we are truly still trying to understand how AI works and how it will impact our society?
Current proposals for regulation seem more concerned with the ideological risks associated with transformative AGI - which, unless we uncover some incredible change in physics, energy, and computing, or discover that AGI can exist in a vastly different capacity from our understanding of intelligence today, is nowhere near close. The hysteria projected by many does not comport with the reality of where we are today or where we will be anytime soon. It's the classic mistake of inductive reasoning - taking a very specific observed example and making overly broad generalizations and logical jumps. Production AI systems do far more than just execute an AI model - they are complex systems - just as a radiologist does far more than look at an image: they treat a person.
For AI to change the world, one cannot point solely to an algorithm or a graph of weights and state the job is done - AI must exist within a successful product that society consumes at scale - distribution matters a lot. And the reality today is that the AI infrastructure we currently possess was developed primarily by researchers for research purposes. The world doesn't seem to understand that we don't actually have production software infrastructure to scale and manage AI systems to the enormous computational heights required for them to practically work in products across hardware. At Modular, we have talked about these challenges many times, because AI isn't an isolated problem - it's a systems problem comprised of hardware + software across both data centers and the edge.
Regulation: where to start?
So, if we assume that AGI isn't arriving for the foreseeable future, as I and many do, what exactly are we seeking to regulate today? Is it the models we have already been using for many years in plain sight, or are we trying to pre-empt a future far off? As always, the truth lies somewhere in between. It is the generative AI revolution - not forms of AI that have existed for years - that has catalyzed much of the recent excitement around AI. It is here, in this subset of AI, where practical AI regulation should take a focused start.
If our guiding success criterion is broadly something like "enable AI to augment and improve human life," then let's work backward with clarity from that goal and implement legislative frameworks accordingly. If we don't know what goal we are aiming for, how can we possibly define policies to guide us? Our goal cannot be "regulate AI to make it safe," because the very nature of that statement implies the absence of safety from the outset, despite us living with AI systems for 10+ years. To realistically achieve a balanced regulatory approach, we should start within the confines of laws that already exist - enabling them to address concerns about AI and learning how society reacts - before seeking completely new, sweeping approaches. The Venn diagram for AI and existing law is already very dense. We already have data privacy and security laws, discrimination laws, export control laws, communication and content usage laws, and copyright and intellectual property laws, among many other statutory and regulatory frameworks we could seek to amend.
The idea that we need entirely new "AI-specific" laws now - when this field has already existed for years and risks immediately curbing innovation for use cases we don't yet fully understand - feels impractical. It will likely be cumbersome and slow to enforce while creating undue complexity that will likely stifle rather than enable innovation. We can look to history for precedent here - there is no single "Internet Act" for the US or the world - instead, we divide and conquer Internet regulation across our existing legislative and regulatory mechanisms. We have seen countless laws that attempt to broadly regulate the internet fail - the Stop Online Piracy Act (SOPA) is one shining US example - while other laws that seek to regulate within existing bounds succeed. For example, Section 230 of the Communications Decency Act protects internet service providers from being liable for content published, and the protection afforded here has enabled modern internet services to innovate and thrive to enormous success (e.g., YouTube, TikTok, Instagram etc.) while also forcing market competition on corporations to create their own high content standards to build better product experiences and retain users. If they didn't implement self-enforcing policies and standards, users would simply move to a better and more balanced service, or a new one would be created - that's market dynamics.
Of course, we should be practical and realistic. Any laws we amend or implement will have failings - they will be tested in our judicial system and rightly criticized, but we will learn and iterate. We won't get this right initially - a balanced approach often means some bad actors will succeed, but we can limit them while also building a stronger and more balanced AI foundation for the years ahead. AI will continue evolving, and the laws will not keep pace - take, for example, misinformation, where generative AI makes it far easier to construct alternate truths. While this capability has been unlocked, social media platforms still grapple with moderation of non-AI-generated content, and have done so for years. Generative AI will likely create extremely concerning misinformation across services, irrespective of any laws we implement.
EU: A concerning approach
With this context, let’s examine one of the most concerning approaches to AI regulation - the European Artificial Intelligence Act - an incredibly aggressive approach that will likely cause Europe to fall far behind the rest of the world in AI. In seeking to protect EU citizens absolutely, the AI Act seemingly forgets that our world is now far more interconnected and that AI programs are, and will continue to be, deeply proliferated across our global ecosystem. For example, the Act arguably could capture essentially all probabilistic methods of predicting anything. And, in one example, it goes on to explain (in Article 5) that:
(a) the placing on the market, putting into service or use of an AI system that deploys subliminal techniques beyond a person’s consciousness in order to materially distort a person’s behaviour in a manner that causes or is likely to cause that person or another person physical or psychological harm;
Rather than taking a more balanced approach and appreciating that perfect is the enemy of good, the AI Act tries to ensnare all types of AI, not just the generative ones currently at the top of politicians and business leaders' minds. How does one even hope to enforce "subliminal techniques beyond a person's consciousness" - does this include the haptic notification trigger on my Apple Watch powered by a probabilistic model? Or the AI system that powers directions on my favorite mapping service with different visual cues? Further, the Act also includes a "high risk" categorization for "recommender systems" that, today, basically power all of e-commerce, social media, and modern technology services - are we to govern and require conformity, transparency and monitoring of all of these too? The thought is absurd, and even if one disagrees, the hurdle posed for generative AI models is so immense - no model meets the standards of the EU AI Act in its current incarnation.
We should not fear AI - we've been living with it for 10+ years in every part of our daily lives - in our news recommendations, mobile phone cameras, search results, car navigation, and more. We can't just seek to extinguish it from existence retrospectively - it's already here. So, to understand what might work, let's walk the path of history to explore what didn't - the failed technological regulations of the past. Take the Digital Millennium Copyright Act (DMCA), which attempted to enforce DRM by making it illegal to circumvent technological measures used to protect copyrighted works. This broadly failed everywhere, didn't protect privacy, stifled innovation, was hacked, and, most critically - was not aligned with consumer interests. It failed in Major League Baseball, the E-Book industry abandoned it, and even the EU couldn't get it to pass. What did work in the end? Building products aligned with consumer interests enabled them to achieve their goal - legitimate ways to access high-quality content. The result? We have incredible services like Netflix, Spotify, YouTube, and more - highly consumer-aligned products that deliver incredible economic value and entertainment for society. While each of these services has its challenges, at a broad level, they have significantly improved how consumers access content, have decentralized its creation, and enabled enormous and rapid distribution that empowers consumers to vote on where they direct their attention and purchasing power.
A great opportunity to lead
The US has an opportunity to lead the world by constructing regulation that enables progressive and rapid innovation - an approach befitting this country's principles and its role as a global model for democracy. Spending years constructing the "perfect AI legislation" and "completely new AI agencies" will end up like the parable of the blind men and an elephant - creating repackaged laws that attempt to regulate AI from different angles without holistically solving anything.
Ultimately, seeking to implement broad new statutory protections and government agencies on AI is the wrong approach. It will be slow, it will take years to garner bipartisan support, and meanwhile, the AI revolution will roll onwards. We need AI regulation that defaults to action, that balances opening the door to innovation and creativity while casting a shadow over misuse of data and punishing discriminative, abusive, and prejudicial conduct. We should seek regulation that is more focused initially, predominantly on misinformation and misrepresentation, and seek to avoid casting an incredibly wide net across all AI innovation so we can understand the implications of initial regulatory enforcement and how to structure it appropriately. For example, we can regulate the data on which these models are trained (e.g. privacy, copyright, and intellectual property laws), we can regulate computational resources (e.g. export laws), and we can regulate what products predictions are made in (e.g. discrimination laws) to start. In this spirit, we should seek to construct laws that target the inputs and outputs of AI systems and not individual developers or researchers who push the world forward.
Here’s a small non-exhaustive list of near-term actionable ideas:
Voluntary transparency on the research & development of AI - If we wait for Congress, we wait for an unachievable better path at the expense of a good one today. There is already an incredible body of work proposed by OpenAI, in Google's AI Principles, and elsewhere on open and transparent disclosure - there is a strong will to transcend Washington and just do something now. We can seek to ensure companies exercise a duty of care - a responsibility under common law to identify and mitigate ill effects - with transparent reporting accordingly. And of course, this should extend to our government agencies as well.
Better AI use case categorization & risk definition - There is no well-defined regulatory concept of "AI," nor of how to "determine risks" for "what use cases" therein. While the European AI Act clearly goes too far, it does at least seek to define what "AI" is, but errs by sweeping in essentially all probabilistic methods. Further, it needs to better classify risk and the classes of AI from which it is actually seeking to protect people. We can create a categorical taxonomy of AI use cases and target regulatory enforcement accordingly. For example, New York is already implementing laws to require employers to notify job applicants if AI is used to review their applications.
Pursue watermarking standards - Google leads the cause with watermarking to determine whether content has been AI-generated. While these standards are reminiscent of DRM, they are a useful step in encouraging open standards so that major distribution platforms can bake watermarking in.
Prompt clearly on AI systems that collect and use our data - Apple did this to great effect, implementing App Tracking Transparency to govern use of the IDFA and give consumers better insight into how their data is used. We should expect the same of our services for any data being used to train AI, along with the AI model use cases such data informs. Increasingly in AI, all data is effectively biometric - from the way one types, to the way one speaks, to the way one looks and even walks. All of this data can, and is, now being used to train AI models that form biometric fingerprints of individuals - this should be made clear and transparent to society at large. Both data privacy and intellectual property laws should protect who you are and how your data is used in a new generative AI world. Data privacy isn't new, but we should recognize that the evolution of AI continually raises the stakes.
Protect AI developers & open source - We should seek to focus regulatory efforts on the data inputs as well as have clear licensing structures for AI Models and software. Researchers and developers should not be held liable for creating models, software tooling, and infrastructure that is distributed to the world so that an open research ecosystem can continue to flourish. We must promote initiatives like model cards and ML Metadata across the ecosystem and encourage their use. Further, if developers open source AI models, we must ensure that it is incredibly difficult to hold the AI model developer liable, even as a proximate cause. For example, if an entity uses an open-source AI recommendation model in its system, and one of its users causes harm as a result of one of those recommendations, then one can’t merely seek to hold the AI model developer liable - it is foreseeable to a reasonable person who develops and deploys AI systems that rigorous research, testing, and safeguarding must occur before using and deploying AI models into production. We should ensure this is true even if the model author knew the model was defective. Why, you ask? Again, a person who uses any open source code should reasonably expect that it might have defects - hence why we have licenses like MIT that provide software “as is”. Unsurprisingly, this isn’t a new construct; it's how liability has existed for centuries in tort, and we should seek to ensure AI isn't treated differently.
Empower agencies to have agile oversight - AI consumer scams aren't new - they are already executed by email and phone today. Enabling existing regulatory bodies to take action immediately is better than waiting for congressional oversight. The FTC has published warnings, as have the Department of Justice and the Equal Employment Opportunity Commission - all existing agencies that can help today.
Embrace our future
It is our choice to embrace fear or excitement in this AI era. AI shouldn’t be seen as a replacement for human intelligence, but rather a way to augment human life - a way to improve our world, to leave it better than we found it and to infuse a great hope for the future. The threat that we perceive - that AI calls into question what it means to be human - is actually its greatest promise. Never before have we had such a character foil to ourselves, and with it, a way to significantly improve how we evolve as a species - to help us make sense of the world and the universe we live in and to bring about an incredibly positive impact in the short-term, not in a long distant theoretical future. We should embrace this future wholeheartedly, but do so with care and great tact - for as with all things, life is truly a great balancing act.
Many thanks to Christopher Kauffman, Eric Johnson, Chris Lattner and others for reviewing this post.
Image credits: Tim Davis x DALL-E (OpenAI)
The Aussie conquering Silicon Valley
Meet the little-known young Australian in Silicon Valley at the forefront of the artificial intelligence revolution, who heads one of the hottest start-ups in America that has the audacious goal of fixing AI infrastructure for the world’s software and hardware developers.
By JOHN STENSHOLT (The Australian Business Review)
After years of toughing it out and watching his Silicon Valley dream almost die, a little-known former Melbourne NAB analyst now leads one of the hottest start-ups in America.
Tim Davis, 40, turned up in California on a whim a little over a decade ago, survived a stint in a wild house known as the “Hacker Fortress”, started an early version of an online food delivery service before the rise of UberEats and others, was poached by Google only to leave the technology giant in 2022 to co-found Modular, an AI infrastructure start-up recently valued at about US$600 million (A$927 million).
And he says he is only getting started.
In his first Australian media interview, Davis says the goals for Modular are clear – and big.
“We started Modular to improve AI infrastructure for the world. Changing the world is never easy – but we are incredibly determined to do so,” he said.
“AI is so important to the future of humanity, and we feel a great purpose to try to improve AI’s usability, scalability, portability and accessibility for developers and enterprises around the world.”
Modular’s vision is to allow AI technology to be used by anyone, anywhere and it is creating a developer infrastructure platform that enables more developers around the world to deploy AI faster, and across more hardware.
Davis, and his American co-founder Chris Lattner, have helped build and scale much of the AI software infrastructure that powers workloads at some of the world’s largest tech companies – including Google – but they argue this software has many shortcomings and was designed for research, not for scaling AI across the vast number of new uses and hardware the world is demanding now and into the future.
They are aiming to rebuild and unify AI software infrastructure, solving fragmentation issues that make it difficult for developers who work outside the world’s largest companies to build, deploy and scale AI, and ultimately make AI more accessible to everyone.
“We thought we could build something unique that could actually empower the world to move faster with AI, while equally making it more accessible to developers, make it easier to program in and better from a cost standpoint because you can scale it to different types of hardware.”
Modular claims to have built the world’s fastest AI inference engine – software that enables AI programs to run and scale to millions of people – and its own programming language, Mojo, a superset of Python (the world’s most popular programming language) that enables developers to deploy their AI programs tens of thousands of times faster, reduce costs and make it simpler to deploy AI around the world.
After raising US$30 million from investors last year, Modular recently raised another US$100 million in a funding round led by private equity firm General Catalyst and including Google Ventures, and Silicon Valley-based venture funds SV Angel, Greylock Partners and Factory.
Modular says more than 120,000 developers are now using its products – such as its inference engine and Mojo, launched in early May – and that “leading tech companies” are already using its infrastructure, with 35,000 on the waitlist.
The US$100 million raised will be used on product expansion, hardware support and the expansion of Mojo, as well as building the sales and commercialisation aspects of the business.
Early years
All of which is a far cry from when Davis flew to Silicon Valley in September 2012 with dreams of becoming an entrepreneur, following stints working as a financial analyst at National Australia Bank and then several internships at law firms like Allens and Minters after completing law, business and commerce degrees at Melbourne’s Monash University.
Davis had started his own company called CrowdSend in 2011, which had a software system for identifying objects in images and pictures and matching them to retailers, but said he found it “difficult” as “investors in Australia, if you wanted to raise some seed capital, they would take a very large percentage of the business.”
“So it was my wife that said if you want to do this (become an entrepreneur) why don’t you go do the place where technology is. We’d actually just gotten married and then off I went to America.”
Davis used Airbnb to find a place to stay in Silicon Valley, gave himself six weeks to make a success of things and found what was called the Hacker Fortress in the Los Altos Hills.
“It was very much a place with a whole bunch of misfits who had come to America, particularly Silicon Valley, to do a start-up. This was a very big house, it had 15 rooms, and my dream at the time was how to make CrowdSend successful. There were a lot of really talented people there,” he says.
“(But) it turns out when you come to Silicon Valley, and you come with a preconceived notion of what you want to do, well then there’s reality of what you ended up doing.”
There were plenty of issues in the Hacker Fortress, which was not as clean as advertised on Airbnb. At one stage, a housemate died in his room. Davis would also have the roof fall in on his room one evening.
The house, despite being advertised as being only 10 minutes from the likes of Google and Apple, was actually quite a way from shops and restaurants.
“We basically had no ability to get food easily and so we had this idea that maybe we could build this large scalable distribution model,” Davis explains. “At the time the likes of GrubHub and these other businesses were going to restaurants and trying to get commission deals. So instead we reverse engineered the Starbucks and Chipotle menus and built an app and threw it out as an idea to people in the house.”
Before UberEats
The idea for what would become Davis’s next business, Fluc (Food Lovers United Co), was born – all because he and his housemates were a little lazy and really didn’t want to cook.
It was 2013, a year before UberEats was launched, and while other food delivery service apps were dealing directly with restaurants, Davis and his housemates took a different approach.
“We just thought, why don’t we just grab every restaurant menu, we just put it on our website, inflate the prices, and start selling food? Overnight, we went from five restaurants to 160 restaurants. And it just exploded. It was unprecedented. And what we tapped into really was that selection mattered. And that’s what consumers had not had, at least in the American market. And so then we went on this journey of raising capital.”
Fluc would mainly service the local Bay Area market of about 8 million people, though it tried to move to Los Angeles at one stage, and raised US$4 million from local angel investors.
Davis says after two and a half years of working seven days a week it became obvious “that we wouldn’t be able to raise the amount of capital that we needed to scale that business” at a time when the likes of DoorDash were raising hundreds of millions of dollars annually from investors.
Davis and his team then started talking to other companies about potentially being acquired, and eventually Google expressed interest not in the company but the people Fluc had assembled.
That led to Davis joining Google, where he worked as a product manager, first in the ads division building machine learning systems and then in the Google Brain research division dedicated to AI – where he met Lattner.
What followed, Davis says, was six years of building AI systems that stood them in good stead when they decided to leave Google and form Modular.
“We were basically there for all the major components of AI as it is now … and what was interesting is that through that experience we could see the infrastructure was designed by researchers – a bunch of people who wanted to train large machine learning models. But taking those models from a research environment and actually scaling them into very large production systems is very hard.”
He says Modular’s only goal is to find solutions to that problem and commercialise them, and that the opportunity “we have in this market is astronomically huge.”
What’s next
“We work with everyone from high performance racing teams, to autonomous car companies, to very large machine learning recommendations to generative AI. You name it, whether it’s video generation, image generation, text generation, we’re there.
“You could go down the NASDAQ list of companies (to find those) who want to use our infrastructure to scale AI inside their organisations.”
As for widespread concerns about AI, Davis says the fact that AI systems take an “incredible amount of time to build” means that the public should be “realistic” about the perception that AI could be a threat to humanity.
“We definitely have better recommendation systems, we have better chat bots, we have translation across multiple languages, we can take better photos, we are now all better copywriters, and we have voice assistants.
“But none of this is remotely close to AGI or artificial general intelligence – or anywhere near ASI: artificial super intelligence. We have a long way to go.
“AI shouldn’t be seen as a replacement for human intelligence, it’s a way to augment human life – a way to improve our world, to leave it better than we found it.”
Unite AI – Interview Series
Tim Davis is the Co-Founder & President of Modular, an integrated, composable suite of tools that simplifies your AI infrastructure so your team can develop, deploy, and innovate faster. Modular is best known for developing Mojo, a new programming language that bridges the gap between research and production by combining the best of Python with systems and metaprogramming.
By Antoine Tardif (Unite AI)
Repeat Entrepreneur and Product Leader. Tim helped build, found and scale large parts of Google's AI infrastructure at Google Brain and Core Systems from APIs (TensorFlow), Compilers (XLA & MLIR) and runtimes for server (CPU/GPU/TPU) and TF Lite (Mobile/Micro/Web), Android ML & NNAPI, large model infrastructure & OSS for billions of users and devices. Loves running, building and scaling products to help people, and the world.
When did you initially discover coding, and what attracted you to it?
As a kid growing up in Australia, my dad brought home a Commodore 64C and gaming was what got me hooked – Boulder Dash, Maniac Mansion, Double Dragon – what a time to be alive. That computer introduced me to BASIC and hacking around with that was my first real introduction to programming. Things got more intense through High School and University where I used more traditional static languages for engineering courses, and over time I even dabbled all the way up to Javascript and VBA, before settling on Python for the vast majority of programming as the language of data science and AI. I wrote a bunch of code in my earlier startups but these days, of course, I utilize Mojo and the toolchain we have created around it.
For over 5 years you worked at Google as Senior Product Manager and Group Product Leader, where you helped to scale large parts of Google's AI infrastructure at Google Brain. What did you learn from this experience?
People are what build world-changing technologies and products, and it is a devoted group of people bound by a larger vision that brings them to the world. Google is an incredible company, with amazing people, and I was fortunate to meet and work with many of the brightest minds in AI years ago when I moved to join the Brain team. The greatest lessons I learnt were to always focus on the user and progressively disclose complexity, to empower users to tell their unique stories to the world – like helping to save the Great Barrier Reef or helping people like Jason the Drummer – and to attract and assemble a diverse mix of people to drive towards a common goal. In a massive company of very smart and talented people, this is much harder than you can imagine. Reflecting on my time there, it’s always the people you worked with that are truly memorable. I will always look back fondly and appreciate that many people took risks on me, and I’m enormously thankful they did, as many of those risks encouraged me to be a better leader and person, to dive deep and truly understand AI systems. It truly made me realize the profound power AI has to impact the world, and this was the very reason I had the inspiration and courage to leave and co-found Modular.
Can you share the genesis story behind Modular?
Chris and I met at Google and shipped many influential technologies that have significantly impacted the world of AI today. However, we felt AI was being held back by overly complex and fragmented infrastructure that we witnessed first hand deploying large workloads to billions of users. We were motivated by a desire to accelerate the impact of AI on the world by lifting the industry towards production-quality AI software so we, as a global society, can have a greater impact on how we live. One can’t help but wonder how many problems AI can help solve, how many illnesses cured, how much more productive we can become as a species, to further our existence for future generations, by increasing the penetration of this incredible technology.
Having worked together for years on large scale critical AI infrastructure – we saw the enormous developer pain first hand – “why can’t things just work?” For the world to adopt and discover the enormous transformative nature of AI, we need software and developer infrastructure that scales from research to production, and is highly accessible. This will enable us to unlock the next wave of scientific discoveries – for which AI will be critical – and is a grand engineering challenge. With this motivating background, we developed an intrinsic belief that we could set out to build a new approach for AI infrastructure, and empower developers everywhere to use AI to help make the world a better place. We are also very fortunate to have many people join us on this journey, and we have the world's best AI infrastructure team as a result.
Can you discuss how the Mojo programming language was initially built for your own team?
Modular’s vision is to enable AI to be used by anyone, anywhere. Everything we do at Modular is focused on that goal, and we walk backwards from that in the way we build out our products and our technology. In this light, our own developer velocity is what matters to us firstly, and having built so much of the existing AI infrastructure for the world – we needed to carefully consider what would enable our team to move faster. We have lived through the two-world language problem in AI – where researchers live in Python, and production and hardware engineers live in C++ – and we had no choice but to either barrel down that road, or rethink the approach entirely. We chose the latter. There was a clear need to solve this problem, but many different ways to solve it – we approached it with our strong belief of meeting the ecosystem where it is today, and enabling a simpler lift into the future. Our team bears the scars of software migration at large scale, and we didn’t want a repeat of that. We also realized that there is no language today, in our opinion, that can solve all the challenges we are attempting to solve for AI and so we undertook a first principles approach, and Mojo was born.
How does Mojo enable seamless scaling and portability across many types of hardware?
Chris, myself and our team at Google (many of whom are now at Modular) helped bring MLIR into the world years ago – with the goal of helping the global community solve real challenges by enabling AI models to be consistently represented and executed on any type of hardware. MLIR is a new type of open-source compiler infrastructure that has been adopted at scale, and is rapidly becoming the new standard for building compilers through LLVM. Given our team's history in creating this infrastructure, it's natural that we utilize it heavily at Modular, and this underpins our state-of-the-art approach in developing new AI infrastructure for the world. Critically, while MLIR is now being rapidly adopted, Mojo is the first language that really takes the power of MLIR and exposes it to developers in a unique and accessible way. This means it scales from Python developers who are writing applications, to performance engineers who are deploying high performance code, to hardware engineers who are writing very low level system code for their unique hardware.
References to Mojo claim that it’s basically Python++, with the accessibility of Python and the high performance of C. Is this a gross oversimplification? How would you describe it?
Mojo should feel very familiar to any Python programmer, as it shares Python’s syntax. There are a few important differences you’ll see as you port a simple Python program to Mojo, but it will just work out of the box. One of our core goals for Mojo is to provide a superset of Python – that is, to make Mojo compatible with existing Python programs – and to embrace the CPython implementation for long-tail ecosystem support. You can then slowly augment your code and replace non-performing parts with Mojo’s lower-level features to explicitly manage memory, add types, utilize autotuning and many other aspects to get the performance of C or better! We feel Mojo gives you the best of both worlds: you don’t have to write, and rewrite, your algorithms in multiple languages. We appreciate Python++ is an enormous goal, and will be a multi-year endeavor, but we are committed to making it a reality and enabling our legendary community of more than 140,000 developers to help us build the future together.
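To make the porting story above concrete, here is a minimal sketch (purely illustrative, not taken from Modular's documentation): a small pure-Python kernel that runs unchanged under CPython, with comments noting where a Mojo port would progressively add explicit types to the hot loop to reach compiled-code performance.

```python
# Illustrative only: a plain-Python kernel of the kind described above.
# It runs as-is under CPython; in a Mojo port one would progressively
# annotate the hot loop (e.g. declare the accumulator and loop index as
# typed machine integers) to move from interpreted to compiled speed.

def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    print(sum_of_squares(100_000))
```

The point is the workflow rather than the kernel itself: the syntax stays Python-like, and performance work is opt-in, one function at a time.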
In a recent keynote it was showcased that Mojo is 35,000x faster than Python. How was this speed calculated?
It’s actually 68,000x now! But let's recognize that it's just a single program – Mandelbrot – and you can go and read a series of three blog posts on how we achieved this – here, here and here. Of course, we’ve been doing this a long time and we know that performance games aren’t what drive language adoption (despite them being fun!) – it’s developer velocity, language usability, high quality toolchains & documentation, and a community utilizing the infrastructure to invent and build in ways we can’t even imagine. We are tool builders, and our goal is to empower the world to use our tools, to create amazing products and solve important problems. If we focus on our larger goal, it's actually to create a language that meets you where you are today and then lifts you easily to a better world. Mojo enables you to have a highly performant, usable, statically typed and portable language that seamlessly integrates with your existing Python code – giving you the best of both worlds. It enables you to realize the true power of the hardware with multithreading and parallelization in ways that raw Python today cannot – unlocking the global developer community to have a single language that scales from top to bottom.
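For readers who want to see the shape of that benchmark, the kernel in question is the classic Mandelbrot escape-time loop. The sketch below is a plain-Python version with illustrative parameters (it is not Modular's benchmark code); this branch-heavy, arithmetic-heavy inner loop is exactly the kind of code where static types, vectorization and parallelization pay off.

```python
# Plain-Python Mandelbrot escape-time kernel (illustrative parameters only).
# Each point iterates z -> z^2 + c until |z| exceeds 2 or the budget runs out.

def escape_time(cx, cy, max_iters=200):
    zx, zy = 0.0, 0.0
    for i in range(max_iters):
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iters

# Count points that stay bounded on a small 64x64 grid over roughly [-2, 1] x [-1.5, 1.5].
inside = sum(
    escape_time(-2.0 + 3.0 * x / 64, -1.5 + 3.0 * y / 64) == 200
    for x in range(64)
    for y in range(64)
)
print(inside)
```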
Mojo’s magic is its ability to unify programming languages with one set of tools. Why is this so important?
Languages always succeed by the power of their ecosystems and the communities that form around them. We’ve been working with open source communities for a long time, and we are incredibly thoughtful towards engaging in the right way and ensuring that we do right by the community. We’re working incredibly hard to ship our infrastructure, but need time to scale out our team – so we won’t have all the answers immediately, but we’ll get there. Stepping back, our goal is to lift the Python ecosystem by embracing the whole existing ecosystem, and we aren’t seeking to fracture it like so many other projects. Interoperability just makes it easier for the community to try our infrastructure, without having to rewrite all their code, and that matters a lot for AI.
Also, we have learnt so much from the development of AI infrastructure and tools over the last ten years. The existing monolithic systems are not easily extensible or generalizable outside of their initial domain target and the consequence is a hugely fragmented AI deployment industry with dozens of toolchains that carry different tradeoffs and limitations. These design patterns have slowed the pace of innovation by being less usable, less portable, and harder to scale.
The next-generation AI system needs to be production-quality and meet developers where they are. It must not require an expensive rewrite, re-architecting, or re-basing of user code. It must be natively multi-framework, multi-cloud, and multi-hardware. It needs to combine the best performance and efficiency with the best usability. This is the only way to reduce fragmentation and unlock the next generation of hardware, data, and algorithmic innovations.
Modular recently announced raising $100 million in new funding, led by General Catalyst and filled out by existing investors GV (Google Ventures), SV Angel, Greylock, and Factory. What should we expect next?
This new capital will primarily be used to grow our team, hiring the best people in AI infrastructure, and continuing to meet the enormous commercial demand that we are seeing for our platform. Modverse, our community of well over 130,000 developers and tens of thousands of enterprises, is seeking our infrastructure – so we want to make sure we keep scaling and working hard to develop it for them, and deliver it to them. We hold ourselves to an incredibly high standard, and the products we ship are a reflection of who we are as a team, and who we become as a company. If you know anyone who is driven, who loves the boundary of software and hardware, and who wants to help see AI penetrate the world in a meaningful and positive way – send them our way.
What is your vision for the future of programming?
Programming should be a skill that everyone in society can develop and utilize. For many, the “idea” of programming instantly conjures a picture of a developer writing out complex low level code that requires heavy math and logic – but it doesn’t have to be perceived that way. Technology has always been a great productivity enabler for society, and by making programming more accessible and usable, we can empower more people to embrace it. Empowering people to automate repetitive processes and make their lives simpler is a powerful way to give people more time back.
And in Python, we already have a wonderful language that has stood the test of time – it's the world's most popular language, with an incredible community – but it also has limitations. I believe we have a huge opportunity to make it even more powerful, and to encourage more of the world to embrace its beauty and simplicity. As I said earlier, it's about building products that have progressive disclosure of complexity – enabling high level abstractions, but scaling to incredibly low level ones as well. We are already witnessing a significant leap with AI models enabling progressive text-to-code translations – and these will only become more personalized over time – but behind this magical innovation is still a developer authoring and deploying code to power it. We’ve written about this in the past – AI will continue to unlock creativity and productivity across many programming languages, but I also believe Mojo will open the ecosystem aperture even further, empowering more accessibility, scalability and hardware portability to many more developers across the world.
To finish, AI will penetrate our lives in untold ways, and it will exist everywhere – so I hope Mojo catalyzes developers to go and solve the most important problems for humanity faster – no matter where they live in our world. I think that’s a future worth fighting for.
Modular has raised $100M to fix AI infrastructure
We are so excited to announce this $100M raise, and beyond proud of what our world-class team, incredible customers and partners have enabled us to achieve. Our AI Engine is the world's fastest and has a strong list of customers lining up for its unparalleled performance and usability, and our new programming language - Mojo 🔥 - already has a community of more than 120,000 developers in just 4 months.
Chris Lattner and I started Modular to help improve AI infrastructure for the world, and to enable the next wave of AI innovation to be truly unlocked on the world's hardware.
This round was led by General Catalyst (Deep Nishar and Christopher Kauffman), and filled out by existing investors GV (Google Ventures) with Dave Munichiello, SV Angel (Steven Lee and Ronny Conway), Greylock (Saam Motamedi) and Factory - amazing leaders and incredible people.
We are so fortunate to work with them, and to have their belief, as we build and create change in the world. And of course, changing the world is never easy - but we are so incredibly determined to continue to do so. AI is so important to the future of humanity, and we feel a great purpose to truly improve AI's usability, scalability, portability and accessibility for the world's developers and enterprises.
Join us on this incredible journey, and let's change the world together 🚀! You can read more on the Modular blog.
Data Exchange Interview
Interview with Tim Davis, Co-Founder of Modular. Full interview here
Ben (Host): Welcome to the Data Exchange Podcast. Today we’re joined by Tim Davis, co-founder and Chief Product Officer at Modular. Their tagline says it all: The future of AI development starts here. Tim, great to have you on the show.
Tim Davis: Great to be here, Ben—thanks for having me.
Introducing Mojo: Python, Reimagined
Ben: Let’s dive right in. What is Mojo, and what can developers use today?
Tim: Mojo is a new programming language—a superset of Python, or “Python++,” if you will. Right now, anyone can sign up at modular.com/mojo to access our cloud-hosted notebook environment, play with the language, and run unmodified Python code alongside Mojo’s advanced features.
“All your Python code will execute out of the box—you can then take performance-critical parts and rewrite them in Mojo to unlock 5–10× speedups.”
That uplift comes from our state-of-the-art compiler and runtime stack, built on MLIR and LLVM foundations.
Solving the Two-Language Problem
Many ML frameworks hide C++/CUDA complexity behind Python APIs, but that split still causes friction. Mojo bridges the gap:
Prototype in Python
Optimize in Mojo (same codebase)
“Researchers no longer need to drop into C++ for speed; they stay in one language from research to production.”
This unified model dramatically accelerates the path from idea to deployment.
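One way to picture the "same codebase" idea is the sketch below: a workflow illustration, not Modular's API. The call site stays fixed, the function is prototyped in Python, and an optimized build of the same function is swapped in when available; the fast_kernels module name is hypothetical.

```python
import math

def softmax_prototype(xs):
    """Pure-Python prototype: easy to read and debug, slow on large inputs."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

try:
    # Hypothetical optimized build of the same function (e.g. a Mojo port);
    # the import path is illustrative only, not a real package.
    from fast_kernels import softmax  # type: ignore
except ImportError:
    softmax = softmax_prototype

print(softmax([1.0, 2.0, 3.0]))
```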
Who is Mojo For?
Ben: Frameworks like TensorFlow and PyTorch already tackle performance. Who’s Mojo’s target audience?
Tim: Initially, it’s us—Modular’s own infrastructure team. But our real audience spans:
Systems-level ML engineers who need granular control and performance.
GPU researchers wanting a seamless path to production without rewriting code.
By meeting developers where they are, Mojo helps unify fragmented ML stacks and simplify pipelines.
Under the Hood: Hardware-Agnostic Design
Mojo’s architecture is built for broad hardware support:
MLIR (Multi-Level IR): Provides a common representation across hardware.
LLVM Optimizations: Powers high-performance codegen.
Multi-Hardware Portability: CPUs, GPUs, TPUs, edge devices, and beyond.
“We want access to all hardware types. Today’s programming model is constrained—Mojo opens up choice.”
This means you’re not locked into CUDA or any single accelerator vendor.
Beyond the Language: Unified AI Inference Engine
Modular also offers a drop-in inference engine:
Integrates with Triton, TF-Serving, TorchServe
CPUs first (batch workloads), GPUs coming soon
Orders-of-magnitude performance gains
“Simply swap your backend and get massive efficiency improvements—no changes to your serving layer.”
Enterprises benefit from predictable scaling and hardware flexibility, whether on Intel, AMD, ARM-based servers, or custom ASICs.
Roadmap: Community, Open Source & Enterprise
Next 6–12 Months:
Expand Mojo’s language features (classes, ownership, lifetimes).
Enable GPU execution (beyond the cloud playground).
Extend the inference engine to training, dynamic workloads, and full pipeline optimizations (pre-/post-processing).
“We released early to learn from real users—80,000 sign-ups across 230+ countries. Their feedback drives our roadmap.”
Why a New Language Matters
Mojo’s core value prop can be summed up in three words:
Usable: Drop-in Python compatibility; gentle learning curve.
Performant: Advanced compiler + runtime yields 5–10× speedups out of the box.
Portable: Write once, run anywhere—from cloud GPUs to mobile CPUs.
Together, these unlock faster innovation, lower costs, and broader hardware choice.
Democratizing AI Development
In Tim’s own words:
“Our mission is to make AI development accessible to anyone, anywhere. By rethinking the entire stack, we’re unlocking a new wave of innovation and putting compute power in more hands.”
With its unified language and inference engine, Modular is ushering in a future where AI development truly starts here—for researchers, engineers, and enterprises alike.
Founding Modular & Raising $30M
After working for years in the AI/ML space - I’ve left Google and decided it's time for a new approach to building machine learning infrastructure. Chris Lattner and I, along with an incredible team of talented architects, engineers and product leaders are teaming up to rebuild it from the ground up and truly help the world of AI.
We are bringing together the world's best AI infrastructure talent to improve AI production development and deployment.
We are building a next-generation AI developer platform, and we are proud to partner with @GVteam, @GreylockVC, Factory, SV Angel and notable angels who are funding our $30M first round of funding. We spoke with Dave Munichiello from GV about the opportunity here. The next generation of product breakthroughs will be powered by production-quality infrastructure that brings together the best of compilers and runtimes, is designed for heterogeneous compute and edge-to-datacenter distribution, and is focused on usability - unifying software and hardware with a "just works" approach that will save developers enormous time and increase their velocity.
Having worked for many years in the AI space at Google, we have assembled - and are continuing to assemble - the world's best AI infrastructure team. You can read about some of the challenges the industry faces, in our opinion, via our blog post here: The Case for a Next-Generation AI Developer Platform. We are hiring for numerous roles - please apply via Modular Careers.
We're excited to showcase what we have been building and designing later in the year. You can check out a video we put together below.
We are incredibly excited about the mission before us and if you are interested in joining us to change the world - just reach out via www.modular.com. We’re hiring everywhere. The future is super exciting and bright!
Help Jason find his Rhythm
Learn how TensorFlow Lite Micro helped Jason find his rhythm after losing his arm - this story showcases the power of AI.
Many years ago, Pete Warden, Raziel, Rocky, Sarah, Andy and I, with a small and incredible team at Google, founded TensorFlow Lite Micro for the world as we recognized the importance of executing machine learning on microcontrollers. We have continued to scale and contribute significant improvements to TF Lite for Microcontrollers over the years, unlocking a multitude of use cases from speech, to person detection, to audio detection and more. We have seen the creation of cascading networks where the front of the pipeline is a very low power microcontroller running a tiny inference model, all the way through to multi-model pipelines firing up a more significant application processor.
Here is just one incredible story, where the team from Georgia Institute of Technology, under Gil Weinberg, took the TensorFlow Lite and TensorFlow Lite Micro work and drove an inspiring use case to help Jason find his rhythm. Watch the video below to understand how TensorFlow Lite is being used to empower human augmentation and help people rediscover that anything is possible. If you're running short on time - here's how it works.
As Stephen Hawking once said:
“Remember to look up at the stars and not down at your feet. Try to make sense of what you see and wonder about what makes the Universe exist. Be curious. And however difficult life may seem, there is always something you can do and succeed at.”
Saving the Great Barrier Reef
Coral reefs are some of the most diverse and important ecosystems in the world - both for marine life and society more broadly. Not only are healthy reefs critical to fisheries and food security, they provide countless additional benefits: protecting coastlines from storm surge, supporting tourism-based economies and sustainable livelihoods, and pushing forward drug discovery research.
Helping save the Great Barrier Reef using TensorFlow and the power of Machine Learning
2 years ago, at a lunch and meetup with Martin Wicke and Brano Kusy, we decided to work together and team up to see if Google could contribute to saving the Great Barrier Reef from Crown of Thorns starfish and to identifying other sea life. As an Australian, I knew the importance of this mission for the world and for future generations to experience the wonder of the reef, and I decided to lean in heavily and help champion this effort at Google, helping to drive it forward. What followed was a multi-step journey with an incredible team and group of folks at Google and CSIRO, helping Brano's team label, train, scale and deploy - end to end, to their boat-mounted hardware - a new solution for rapidly identifying COTS (and many other things) in the Great Barrier Reef. You can see our research paper for the data here.
We launched a Kaggle competition to help crowdsource the final approach and enable this to be executed live on the Great Barrier Reef, and we're open sourcing it for the world as well. You can check out my narration in the video below and see the original TensorFlow post.
Reefs around the world face a number of rising threats, most notably climate change, pollution, and overfishing. In the past 30 years alone, there have been dramatic losses in coral cover and habitat in the Great Barrier Reef (GBR), with other reefs experiencing similar declines. In Australia, outbreaks of the coral-eating COTS have been shown to cause major coral loss. These outbreaks can strip a reef of 90% of its coral tissue. While COTS naturally exist in the Indo-Pacific ocean, overfishing and excess run-off nutrients have led to massive outbreaks that are devastating already vulnerable coral communities.
Controlling COTS populations is critical to reducing coral mortality from outbreaks. Google has teamed up with CSIRO to supercharge efforts in monitoring COTS using artificial intelligence. This is just the beginning of a much deeper collaboration and we, along with the Great Barrier Reef Foundation, are extremely excited to invite you, our global ML community, to help protect the world's reefs.
MLIR - Open Source Infrastructure for the world
MLIR is a new compiler stack that I have the privilege of being the Product Manager for at Google. It fundamentally takes a completely different perspective on existing technologies available in the marketplace by producing a multi-level intermediate representation that combines high level optimizations with lower level code generation in a way that hasn't been done before. I have had the privilege to work with Chris Lattner and a talented team of many other folks in building out this technology and look forward to the enormous impact it is going to have on machine learning in the years ahead. Given that we, Google, are an AI first company - even Sundar was happy about the news. Below is a repost of the original announcement that we posted to the main Google blog.
MLIR offers new infrastructure and a design philosophy that enables machine learning models to be consistently represented and executed on any type of hardware.
Machine learning now runs on everything from cloud infrastructure containing GPUs and TPUs, to mobile phones, to even the smallest hardware like microcontrollers that power smart devices. The combination of advancements in hardware and open-source software frameworks like TensorFlow is making all of the incredible AI applications we’re seeing today possible - whether it’s predicting extreme weather, helping people with speech impairments communicate better, or assisting farmers to detect plant diseases.
But with all this progress happening so quickly, the industry is struggling to keep up with making different machine learning software frameworks work with a diverse and growing set of hardware. The machine learning ecosystem is dependent on many different technologies with varying levels of complexity that often don't work well together. The burden of managing this complexity falls on researchers, enterprises and developers. By slowing the pace at which new machine learning-driven products can go from research to reality, this complexity ultimately affects our ability to solve challenging, real-world problems.
Earlier this year we announced MLIR, open source machine learning compiler infrastructure that addresses the complexity caused by growing software and hardware fragmentation and makes it easier to build AI applications. It offers new infrastructure and a design philosophy that enables machine learning models to be consistently represented and executed on any type of hardware. And today we’re announcing that we’re contributing MLIR to the nonprofit LLVM Foundation. This will enable even faster adoption of MLIR by the industry as a whole.
MLIR aims to be the new standard in ML infrastructure and comes with strong support from global hardware and software partners including AMD, ARM, Cerebras, Graphcore, Habana, IBM, Intel, Mediatek, NVIDIA, Qualcomm Technologies, Inc, SambaNova Systems, Samsung, Xiaomi, Xilinx—making up more than 95 percent of the world’s data-center accelerator hardware, more than 4 billion mobile phones and countless IoT devices. At Google, MLIR is being incorporated and used across all our server and mobile hardware efforts.
Machine learning has come a long way, but it's still incredibly early. With MLIR, AI will advance faster by empowering researchers to train and deploy models at larger scale, with more consistency, velocity and simplicity on different hardware. These innovations can then quickly make their way into products that you use every day and run smoothly on all the devices you have—ultimately leading to AI being more helpful and more useful to everyone on the planet.
The career opportunity matrix
The biggest mistake I constantly see new founders make is misjudging the matrix that almost everyone goes through when making a career (and most often life) changing decision. I certainly didn’t appreciate this as much as I should have when I was a founder and I urge anyone reading this post to deeply consider aspects highlighted in this post.
Both at my previous startup and with the many subsequent startup founders that I’ve talked to since joining Google — the biggest mistake I constantly see new founders make is misjudging the matrix that almost everyone goes through when making a career (and most often life) changing decision. I certainly didn’t appreciate this as much as I should have when I was a founder, and I urge anyone reading this post to deeply consider the aspects highlighted below.
When you ask someone to join your company — you will almost immediately misjudge the impact it will have on their life. You are asking someone to ultimately change the value they attribute to their time in what they currently do — in order to encapsulate themselves in whatever it is you do. The value of this utility is ultimately the most valuable thing any person has — the opportunity to spend their time when and how they see fit.
If you are unable to relate to the person, inspire the person, compensate the person or truly enable them to believe in the growth opportunity you present — why would they offer up their most precious asset to work with, and ultimately, for you? Talented people, truly great people, understand the value of time. It’s a common interview question of mine to press people on explaining how they organize and attribute value to it. The reality of the world is — most people don’t attribute enough value to their time and waste it frequently.
So as a founder, understand the importance of time. It’s almost unequivocal at this point that building out an all-star team is going to be the single most important aspect attributable to your company’s success — a great team is always going to be able to derive a solution in good times and bad. But consider this — when you ask, or pitch, someone to join you — do you really appreciate the matrix of decisions that they are going through to join your company? Why would anyone want to spend their time with you?
In my experience to date, from startups through to Google, here are the only four major aspects that matter to people who attribute real value to their time. This is their opportunity cost matrix:
Family — In a personal capacity, most people will not sacrifice their family for anything. It is always what is prioritized in people’s lives above all else — it is uncompromising. Equally, in a professional capacity, people want to feel that they have a real connection to their workmates and their team. Usually, a few extroverted people bind a team together and help it thrive. You know that you have a team that feels this way when someone leaves — you actually miss that person when they do. If you miss the importance of family in someone’s life, or you overlook it, you do so at extreme peril more often than not.
Remuneration — Typically, no one works for free. People value their time and they deserve to be compensated for their skill. Cash is often viewed as almost infinitely more valuable than any equity you will provide someone. Great people need to be compensated well, or at least very fairly, from a cash standpoint and even better from an equity one. You might treat your equity at a premium, but asking someone to leave a top 10 tech company to join your team is asking them to move from a world of liquidity to a long life without it. Just look at the giants like Airbnb and Uber — more than 10 years later many employees still remain without any liquidity. So remember, more often than not I have found that liquid cash is king and that people work to care for their families.
Growth — Everyone wants to advance their careers in some capacity — whether it be personal knowledge, leadership or the chance to be promoted. Not everyone cares about the latter — some are very satisfied with growth in their own knowledge without significant change in their ladder ranking. Equally, some do. Working on a product and with a team that facilitates and enables growth ultimately leads to higher retention — promote people quickly who do a fantastic job, and it’s often easy for you to tell who your organization could not live without. Do not fall into the trap of rewarding them “down the line” — the simple reason being that great people are quickly acquired and this matrix is more than likely being pitched to them at numerous other companies as well.
Purpose — A purpose or mission is typically much bigger than what you are working on today. It’s the vision that you should be proud to tell your friends and family and is much larger than the current iteration of your product or company. Teams that don’t have a mission don’t know where they are going and aren’t connected through the shared bond of a greater goal or purpose. A mission is why you get up and come to work everyday and is typically something you really believe in. It isn’t captured in metrics but in how you are trying to put a dint in the world. If you cannot inspire someone to believe in the purpose of what you are doing — then regardless of anything else — they will never truly ride the rollercoaster with you.
Everyone deserves to feel that they can grow their own careers and have a sense of personal achievement as they define it. Equally, they should be able to tell the world what they are passionate about, why they are working on it and have a defined sense of purpose. Everyone has to point north and believe in what they are doing — in fact:
If everyone on your team can’t tell a friend very simply what problem your product is solving, why it’s unique, and why you will win, you might as well stop and go home now.
As an anecdote, and in this light, when I previously pitched to Alibaba’s investment arm, I asked their investment partners why they worked at Alibaba. The response was the most impressive one I have ever heard:
“At Alibaba, we aim to build the future infrastructure of commerce. We envision that our customers will meet, work and live at Alibaba, and that we will be a company that spans over three centuries. We take a very long approach in every investment we make and everything we do.”
Your ambitions and goals should be as easily declared when anyone on your team is asked what it is you do — a clear vision, a clear sense of purpose, a passionate sense of worth and a guiding north star. Without it — why should anyone join you?
“And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.” — Steve Jobs, 2005
Presenting at Google I/O
It was super fun to speak at the Google I/O Developer Keynote this year about the amazing work the TensorFlow Lite team has done on scaling on-device ML globally. You can check out the video of the presentation below, which showcased a new interactive form of video and the ability to scale that to a range of new device platforms.
Dancing on stage at Google I/O 19 in front of 15,000 people was a fun and inspiring experience. The video is below and I start at 42:55 into it.
The TensorFlow Lite team, combined with other internal Google Research and performance teams, showcased what’s now possible utilizing on-device ML with a Pixel 3 and the GPU delegation improvements that we have previously announced. Here’s the video giving a deeper dive into the making of Dance Like.
Supportive Confrontation
Supportive confrontation is a methodology adopted by respectful and highly constructive individuals and teams to ultimately push the boundaries of self-improvement. David Bradford (Stanford) and Allen Cohen coined the term in their book Power Up, and David utilizes this in his class on High Performance Leadership.
In my career, and notably at Google, we utilize supportive confrontation to improve as individuals and strengthen our collective team as we work to continue building world class products.
So what is it?
The very definition of “confrontation” presumes some measure of hostility or disagreement between parties. In its purest sense, it is understandable why the name alone instills some level of uncertainty in many individuals who undertake this method of feedback. However, the fundamental premise of “supportive confrontation” is a balanced combination of the former with the latter - to deal with personal growth in a direct and honest way. Indeed, in the absence of such direct and honest feedback, a host of eventualities ultimately occur, including emotional instability, poor accountability and ownership, individual conflict and poor team performance.
The focus of supportive confrontation is to initiate conversations with individuals you have an obvious mutual respect for and who would be open to receiving such direct and honest feedback. You should recognize that not all work colleagues (or friends) can undertake this feedback methodology without immediately reacting in a defensive manner — so choosing the nature of your audience is critical. The ultimate goal is, through mutual respect for the person or people providing such feedback, to set aside any form of personal emotion and consider the feedback for what it is — open and honest. In order to improve as an individual, one has to accept all forms of criticism in life and ultimately listen intently, reflect and channel this feedback forward.
Here are some broad level pointers:
Open transparency from the beginning — If you have asked an individual, or a group, to engage in supportive confrontation, you should illustrate that the purpose of such feedback is to improve and not to engage in personal attacks. The art of this form of feedback is mutual respect for each person and explaining this from the outset is critical to it being successful. Explain from your own perspective how and why this feedback system has been useful in the past and ensure that everyone can empathize with this point of view.
3 Top Aspects, 3 Bottom Aspects — Provide each person with three things they are doing well, and three things that they ultimately need to improve on. Feedback should always be direct and actionable — including real examples of what you have noticed, and when, helps an individual consider how they can improve. Good and bad feedback without observable examples makes it difficult for people to reflect and consider — so quantifying your feedback ensures the recipient can undergo self-reflection and retrospectives in their own time.
Write your feedback statements on cards, give them to the person physically — Writing down your feedback on cards during the session and handing it directly to the person, or people, solidifies it and ensures they can’t simply forget or dismiss the feedback provided.
No interruptions or responding to feedback for 24 hrs — Provide feedback directly and ensure that no one interrupts, gets defensive or attempts to justify their behaviour for at least 24 hours. The idea of this “post feedback cooling off” period is to allow individuals to consider it for what it is and contemplate how they can improve. You should ensure that everyone understands they aren’t meant to interrupt or justify during the initial meeting.
Respond with actionable improvement steps, don’t justify — After 24 hours, you should meet again with the individuals who provided feedback. Ideally, as a recipient of feedback, you should seek to provide measurable steps to improve on each component of feedback. Justifying your actions isn’t the purpose of supportive confrontation; it’s about setting actionable methodologies across a set time horizon so others can observe your behaviour and measure this improvement (or not).
Be Thankful — Thank all those who are involved in giving you feedback. The very purpose of this methodology is to grow and improve as an individual and to remember that those you mutually respect are telling you this because they care about you enough to want you to grow and be better as an individual. As the old adage suggests — “better the devil you know than the devil you don’t.”
Feedback, in its rawest sense, is about listening, watching, learning and working to improve after receiving it. You should appreciate those with the courage to undertake supportive confrontation with you, and ensure that you take the feedback seriously and find ways to act on and improve from it as an individual. If you feel the feedback is overtly personal or hurtful in nature, you should have the confidence to speak to the individual, or group, after reflection 24 hours later and let them know that. Equally, you should seek to understand their reasoning and why they provided it — it will be better for all of you.
That way, you can become a better person, colleague, team member and leader as you grow.