Scale or Surrender: When watts determine freedom
Consider this provocative framing: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. This perspective makes efficiency not just important, but existential.
Over the past two centuries, humanity's relationship with energy has been nothing short of transformative. If you chart global primary energy consumption from the Industrial Revolution to today, you'll see something remarkable: an almost unbroken ascent, punctuated by only three brief pauses - the early 1980s oil crisis aftermath, the 2009 financial crisis, and the 2020 pandemic. Otherwise, it's been an extraordinary march upward, powered first by coal and oil, then natural gas, nuclear, hydropower, and increasingly, renewables. This wonderful graphic highlights the trend well, with populous nations like China, the United States, and India dominating total consumption.
The geographic distribution of this energy consumption tells a striking story. China, the United States, and India dominate in absolute terms, but the per-capita numbers reveal something more profound. Citizens of Iceland, Norway, Canada, the United States, and wealthy Gulf states like Qatar and Saudi Arabia consume up to 100 times more energy than those in the world's poorest regions. This isn't merely inequality - it's a chasm so vast that millions of people still rely on traditional biomass (wood, agricultural residues) that often doesn't even register in global energy statistics.
The disparities in electricity generation are equally stark. Iceland, blessed with abundant geothermal and hydro resources, generates hundreds of times more electricity per person than many low-income nations, where annual per-capita generation can fall below 100 kilowatt-hours - less than what a modern refrigerator uses in two months.
This context matters immensely as we confront the dual challenge of our time: meeting rising global energy demand while urgently decarbonizing our energy supply. Despite record investments in clean technologies, fossil fuels still account for approximately 81.5% of global primary energy. The math here is unforgiving - renewable sources must not only meet all new demand but also replace existing fossil fuel capacity if we're to bend the emissions curve downward.
Enter artificial intelligence, with its voracious and growing appetite for electricity.
In 2023, U.S. data centers consumed approximately 176 terawatt-hours - 4.4% of national electricity consumption. Current projections suggest this could reach 325 to 580 TWh by 2028, representing 6.7% to 12% of total U.S. electricity demand, driven largely by AI workloads that require ever-increasing compute power and specialized hardware. To contextualize these numbers: at the EIA's average of roughly 10,500 kWh per household, that's enough electricity to power between 31 and 55 million American homes.
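For a rough sense of that arithmetic, here's a back-of-the-envelope sketch. It assumes the EIA's ~10,500 kWh average annual household consumption used later in this piece; the exact household equivalence obviously shifts with that assumption.

```python
# Back-of-the-envelope: data center demand expressed as US household equivalents.
# Assumes ~10,500 kWh/year per average US household (EIA figure referenced in this article).

KWH_PER_HOME_PER_YEAR = 10_500

def homes_powered(twh: float) -> float:
    """Convert annual TWh to the number of average US homes it could supply."""
    kwh = twh * 1e9  # 1 TWh = 1 billion kWh
    return kwh / KWH_PER_HOME_PER_YEAR

for label, twh in [("2023 actual", 176), ("2028 low", 325), ("2028 high", 580)]:
    print(f"{label}: {twh} TWh ~ {homes_powered(twh) / 1e6:.1f} million homes")

# 2023 actual: 176 TWh ~ 16.8 million homes
# 2028 low: 325 TWh ~ 31.0 million homes
# 2028 high: 580 TWh ~ 55.2 million homes
```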
The AI industry has long understood a critical metric that deserves wider attention: tokens-per-dollar-per-watt. This measure of computational efficiency relative to both cost and energy consumption has been a focus at Google and other leading technology companies for years. It represents the kind of systems thinking we desperately need as AI capabilities expand.
The challenge before us is clear. We're attempting to build transformative AI systems while simultaneously addressing the climate crisis. These goals aren't inherently incompatible, but reconciling them requires unprecedented coordination and innovation across multiple domains:
Hardware efficiency: Next-generation chips that deliver dramatically better performance-per-watt
Operational intelligence: Carbon-aware scheduling that aligns compute-intensive tasks with renewable energy availability
Infrastructure innovation: On-site renewable generation and novel cooling systems that minimize overhead
System integration: Data centers that contribute to local energy systems through waste heat recovery
Radical transparency: Clear reporting standards that drive competition on efficiency metrics
Global energy consumption tells a story of both peril and promise. As artificial intelligence scales exponentially, it threatens to derail climate progress - yet history shows us that human ingenuity consistently reimagines our energy systems when survival demands it. We have already proven we can build transformative AI; the defining challenge now is whether we can build it sustainably, ensuring our creations enhance rather than endanger the world they serve.
The stakes are higher than they appear. Even breakthrough efficiency gains in AI hardware may paradoxically increase total energy consumption - a manifestation of the Jevons paradox, where technological improvements drive greater overall demand. At this crossroads of intelligence and energy transformation, our choices will determine whether AI becomes humanity's greatest tool or its most consequential miscalculation.
The arithmetic is challenging, but not impossible. What's required is the kind of systematic thinking and ambitious action that has characterized humanity's greatest technological leaps. The alternative - allowing AI's energy demands to grow unchecked - would represent a profound failure of imagination and responsibility. The geopolitical risks are just as large: whichever nations control the most powerful AI systems will be the superpowers of tomorrow. In this article, I try to shine a light on what's driving the enormous growth in energy demand and offer some thoughts on the path forward.
The geography of American power
To truly grasp the magnitude of AI's growing energy demands, it's instructive to examine America's electricity generation landscape. At the apex sits the Palo Verde Nuclear Generating Station in Arizona, the nation's largest power producer, generating approximately 32 million megawatt-hours annually - equivalent to 32 billion kWh, or about 32 TWh.
What does 32 billion kWh actually mean? The U.S. Energy Information Administration reports that the average American household consumes about 10,500 kWh per year. Simple arithmetic reveals that Palo Verde alone could theoretically power 3.05 million homes - roughly 2.5% of the nation's 120.92 million households. One facility, powering the equivalent of a major metropolitan area.
The roster of America's electricity giants tells a fascinating story about our energy infrastructure. After Palo Verde, we have Browns Ferry (31 TWh, nuclear), Peach Bottom (22 TWh, nuclear), and then Grand Coulee Dam (21 TWh) - the hydroelectric marvel that helped build the American West. The list continues with West County Energy Center (19 TWh, natural gas), W.A. Parish (16 TWh, a coal/gas hybrid), and Plant Scherer in Georgia (15 TWh, coal).
Notice the pattern? Nuclear dominates the top tier, followed by a mix of hydro, gas, and coal. After these giants, output drops off quickly - most of the country's thousands of power plants generate only a few TWh or less - a reminder of how concentrated our electricity production really is. This concentration matters. When we project data centers consuming 325-580 TWh by 2028, we're talking about the equivalent of 10-18 Palo Verde stations running exclusively to power AI and digital infrastructure. That's not replacing existing demand - that's additional load on a grid already straining to decarbonize.
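To make the comparison concrete, here's a tiny sketch of the "Palo Verde equivalents" arithmetic, using the ~32 TWh annual output and ~10,500 kWh-per-home figures above. These are rounded inputs, so treat the outputs as order-of-magnitude estimates.

```python
# How many Palo Verde-sized plants would the 2028 data center projections require?
PALO_VERDE_TWH = 32      # ~32 billion kWh generated annually
KWH_PER_HOME = 10_500    # EIA average US household
US_HOUSEHOLDS_M = 120.92 # million households

homes_m = PALO_VERDE_TWH * 1e9 / KWH_PER_HOME / 1e6
print(f"Palo Verde ~ {homes_m:.2f}M homes ({homes_m / US_HOUSEHOLDS_M:.1%} of US households)")
# Palo Verde ~ 3.05M homes (2.5% of US households)

for twh in (325, 580):
    print(f"{twh} TWh ~ {twh / PALO_VERDE_TWH:.1f} Palo Verde stations")
# 325 TWh ~ 10.2 Palo Verde stations
# 580 TWh ~ 18.1 Palo Verde stations
```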
The average U.S. household consumes about 10,500 kilowatthours (kWh) of electricity per year, though this varies significantly by region and housing type. Residential electricity primarily powers essential systems: space cooling, water heating, space heating, along with refrigeration, lighting, and electronics. Commercial buildings have vastly different consumption patterns depending on their size and type, ranging from small offices to large retail centers and office complexes, each with varying HVAC, lighting, and operational equipment needs.
The sectoral breakdown of U.S. electricity consumption reveals a more balanced distribution than commonly understood. The EIA forecasts that 2025-2026 power sales will rise to 1,494 billion kWh for residential consumers, 1,420 billion kWh for commercial customers, and 1,026 billion kWh for industrial customers, with longer-term forecasts still mostly within historical norms. This translates to approximately 38% residential, 36% commercial, and 26% industrial consumption. Rather than being overshadowed by commercial and industrial users, the residential sector actually represents the largest single share of electricity demand, with commercial consumption running a close second. This distribution reflects America's transition toward greater electrification in homes and businesses, driven by factors including growing demand from artificial intelligence and data centers.
The long view is revealing. According to the Energy Information Administration, U.S. electricity consumption increased in all but 11 years between 1950 and 2022. The rare declines - most recently 2019 and 2020, with another dip in 2023 - coincided with economic contractions, efficiency improvements, or exceptional circumstances like the pandemic. The overarching trend remains unmistakably upward. While the exact figures here may carry some uncertainty, they accurately capture the essential dynamics. What matters isn't the precise sectoral split, but that demand across every sector is substantial and the growth trajectory is clear. These patterns - broad-based demand and relentless growth - form the backdrop against which we must evaluate AI's emerging energy requirements.
Understanding these scales helps frame the challenge ahead. Every percentage point of national electricity consumption that shifts to data centers represents millions of homes' worth of power. The infrastructure required to meet this demand sustainably doesn't just appear - it must be planned, financed, and built, all while racing against both growing demand and climate imperatives.
The numbers that keep me up at night
Let's revisit the core projection: U.S. data centers consumed approximately 176 TWh in 2023 (4.4% of national electricity) and are projected to reach 325-580 TWh by 2028 – equivalent to powering roughly 31 to 55 million American homes.
But here's what keeps me up at night: 580 TWh might be just the beginning of what we need.
Consider today's reality. One analysis estimated that ChatGPT inference alone consumes 226.8 GWh annually – enough to power roughly 21,000 U.S. homes – and that figure is already outdated. The International Energy Agency (IEA) offers a more sobering projection: global data center consumption approaching 945 TWh by 2030 - roughly the entire electricity consumption of Japan.
Let that sink in - data centers could soon require as much electricity as Japan consumes today. The composition of this demand has already shifted fundamentally. During my time at Google, I watched inference overtake training as the primary driver of compute demand through the late 2010s, and inference is now quickly rising to represent more than 80% of AI compute capacity across the industry. This matters because while training happens in discrete, intensive bursts, inference runs continuously at scale, serving billions of requests around the clock - every query, every recommendation, every generated response adds to the load. This is the crux of our challenge: AI follows exponential growth patterns that surprise even those who've spent years watching them unfold. Given the convergence toward dominant model architectures, inference's share will likely climb further, meaning that of the upper-bound 945 TWh projection, inference alone could account for over 756 TWh.
But doesn’t edge computing promise to slash data center demands? This is a narrative I've heard repeatedly throughout years of scaling early edge AI systems. Yet I remain deeply skeptical of any order-of-magnitude impact. The reason is simple: we've barely scratched the surface of enterprise, government, and industrial AI adoption. These sectors will unleash computational demands that dwarf any efficiency gains from consumer devices processing locally. Consider the asymmetry: for every smartphone performing local voice recognition, hundreds of enterprise systems are analyzing documents, monitoring infrastructure, processing surveillance footage, and generating complex reports. The sheer scale of this institutional transformation will eclipse whatever load we shift to the edge.
This reality leads us to the heart of the matter: if inference drives our energy challenge, how do we understand its consumption patterns? What does the energy anatomy of inference reveal, and where might we find our leverage points for optimization? Understanding these patterns isn't just an academic exercise - it's essential for developing strategies that can accommodate AI's growth while continuing to grow our energy infrastructure. The always-on nature of inference, combined with its direct relationship to usage, creates a fundamentally different challenge than the periodic spikes of model training.
Inference: the GOAT of consumption
The inference footprint is the electricity consumed each time an AI model generates a response - and as AI becomes ubiquitous across digital services, inference will inevitably dominate long-term energy costs. This raises a crucial question: how do we properly measure inference energy consumption? What's the right framework for calculating tokens-per-watt at inference time?
Let's develop a working model, with an important caveat: these calculations rest on rough assumptions about the current AI landscape. They presume most AI continues running on transformer architectures without fundamental changes over the next few years - though I suspect this assumption may prove conservative. We're likely to use AI itself to discover more efficient architectures, potentially invalidating these projections in favorable ways. With that context, let's examine how inference energy consumption actually works and what drives its costs at scale.
The quadratic curse
Transformers are the neural network architecture powering most modern AI systems - from ChatGPT to Claude to Gemini - and were invented by former colleagues of mine at Google. The architecture's key innovation is the ability to process all parts of an input simultaneously while understanding relationships between distant elements in the text. In transformer inference, prefill is the initial computational phase where the model processes your entire input prompt before generating any output. This involves a single forward pass through the network, computing hidden representations for all input tokens at once.
Your sequence length simply counts these tokens - the basic units of text that might be letters, partial words, or whole words depending on the tokenizer. "Hello, world!" typically translates to 3-4 tokens, while a lengthy document might contain thousands. This distinction matters because prefill computation scales with sequence length, making long prompts significantly more energy-intensive than short ones.
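If you want to see tokenization in action, here's a minimal sketch using OpenAI's open-source tiktoken library. This is just one example tokenizer - exact counts vary by model family and vocabulary - but it makes the point that token count, not character count, is what prefill cost scales with.

```python
# Token counts drive prefill cost, so it's worth seeing how text maps to tokens.
# Uses the `tiktoken` library as one example; counts differ across tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by GPT-4-era models

for text in ["Hello, world!", "The quadratic curse of long prompts."]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens")

# A short greeting comes out to a handful of tokens; a lengthy document runs to
# thousands - and prefill compute scales with the *square* of that count.
```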
Prefill time grows quadratically with sequence length - double the input, quadruple the computation. This scaling behavior stems from transformers' core mechanism: self-attention. Self-attention requires computing relationships between every pair of tokens in the input. For n tokens, that's n² comparisons. Unlike older architectures (RNNs) that process tokens sequentially, transformers examine all tokens simultaneously, with each token gathering information from every other token in parallel.
Here's an intuitive analogy: imagine a roundtable discussion where each participant (token) prepares three items:
Query: "What information am I seeking?"
Key: "What information do I possess?"
Value: "What insight can I contribute?"
Each participant shares their query with everyone else, comparing it against others' keys to find the most relevant matches. They then synthesize their understanding by combining values from those whose keys best align with their query. Every participant does this simultaneously, creating a rich, interconnected understanding of the entire conversation. This elegant mechanism enables transformers' remarkable capabilities, but it comes at a cost: computational requirements that scale quadratically with input length. A 2,000-token prompt requires four times the computation of a 1,000-token prompt, not twice. This mathematical reality shapes the energy economics of AI inference at scale.
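To ground the analogy, here's a minimal, purely illustrative numpy sketch of single-head self-attention (random weights, no masking, no batching). The point is simply to make the n x n score matrix - the source of the quadratic cost - visible.

```python
# Minimal sketch of single-head self-attention to show where the n^2 comes from.
# Illustrative only: random weights, one head, no masking or batching.
import numpy as np

n, d = 1_000, 64                  # n tokens, d-dimensional head
x = np.random.randn(n, d)         # token representations
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv  # each token's query, key, value
scores = Q @ K.T / np.sqrt(d)     # n x n matrix: every token compared with every other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
out = weights @ V                 # each token blends the values it attends to

print(scores.shape)               # (1000, 1000) -> n^2 pairwise comparisons
```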
The Two Phases of Transformer Processing
Every transformer request involves two distinct computational phases:
Prefill: Processing the entire input prompt (quadratic scaling with input length, O(n²) complexity)
Decode: Generating output tokens one by one (linear scaling with output length, O(n) complexity)
This scaling difference has profound implications. While decode time grows linearly with the number of tokens generated, prefill time grows with the square of input length. The longer your prompt, the more dramatically prefill dominates total processing time.
Consider the relative computational work (in arbitrary units, assuming 50 output tokens):
Input Length | Prefill Work (∝ n²) | Decode Work | Prefill % of Total |
---|---|---|---|
500 tokens | 250,000 units | 50,000 units | 83% |
1,000 tokens | 1,000,000 units | 50,000 units | 95% |
2,000 tokens | 4,000,000 units | 50,000 units | 99% |
The pattern is stark. At 500 input tokens, prefill already consumes 83% of processing time. Double the input to 1,000 tokens, and prefill jumps to 95% - the actual generation phase becomes almost negligible. At 2,000 tokens, you're spending 99% of compute just understanding the prompt. Here's what's happening:
Prefill work = n² (where n = input tokens)
Decode work = 1,000 × output tokens (arbitrary scaling factor)
This quadratic scaling of self-attention explains why long-context models are so computationally expensive. As context windows expand from thousands to hundreds of thousands of tokens, the energy requirements don't just grow - they explode. Understanding this dynamic is crucial for anyone designing AI systems or planning infrastructure for the age of ubiquitous AI.
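For readers who prefer code to tables, here's a minimal sketch of the toy work model above. The 1,000-units-per-output-token factor is arbitrary, as noted; only the shape of the curves matters.

```python
# Reproduces the toy work-units table: prefill ~ n^2, decode ~ 1,000 units per output token.
OUTPUT_TOKENS = 50

def prefill_work(n_in: int) -> int:
    return n_in ** 2              # quadratic in input length

def decode_work(n_out: int) -> int:
    return 1_000 * n_out          # linear in output length (arbitrary scaling factor)

for n_in in (500, 1_000, 2_000):
    p, d = prefill_work(n_in), decode_work(OUTPUT_TOKENS)
    print(f"{n_in} input tokens: prefill {p:,} | decode {d:,} | prefill share {p / (p + d):.0%}")

# 500 input tokens: prefill 250,000 | decode 50,000 | prefill share 83%
# 1,000 input tokens: prefill 1,000,000 | decode 50,000 | prefill share 95%
# 2,000 input tokens: prefill 4,000,000 | decode 50,000 | prefill share 99%
```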
Doubling words, quadrupling watts
The quadratic scaling of context windows isn't just an abstract computational concern - it translates directly into energy consumption. Every FLOP requires energy, and when FLOPs scale quadratically, so does your electricity bill.
The energy equation is straightforward:
Devices draw roughly constant power P during operation (e.g., 300W for a high-performance GPU)
Energy consumed equals power multiplied by time: E = P × T
Since prefill time scales quadratically with input length, so does prefill energy
Let's make this concrete with realistic parameters:
Power draw: 300W
Decode time: 20ms (fixed for 50 output tokens)
Baseline prefill: 100ms for 500 input tokens
Input Tokens | Prefill Time | Prefill Energy | Decode Time | Decode Energy | Prefill % of Total |
---|---|---|---|---|---|
500 | 0.10 s | 30 J | 0.02 s | 6 J | 83% |
1,000 | 0.40 s | 120 J | 0.02 s | 6 J | 95% |
2,000 | 1.60 s | 480 J | 0.02 s | 6 J | 99% |
The energy story mirrors the computational one. While decode energy remains constant at 6 joules regardless of input length, prefill energy explodes from 30J to 480J as input doubles from 500 to 2,000 tokens. At 2,000 tokens, you're burning 80 times more energy understanding the prompt than generating the response.
Let's recap these results.
At 500 input tokens, prefill consumes 30J versus decode's 6J - already 83% of total energy. Double the input to 1,000 tokens, and prefill time quadruples, pushing energy consumption to 120J and commanding 95% of the total. By 2,000 tokens, the imbalance becomes extreme: 480J for prefill versus 6J for decode, with prefill consuming 99% of the energy budget. Extrapolate to a 10,000-token prompt generating just 1,500 output tokens, and you're looking at 3.4 Wh per query - nearly all spent on prefill. This isn't a marginal effect; it's the dominant factor in inference energy consumption.
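Here's the same toy model in code, using the parameters above (300 W device, 100 ms prefill at 500 tokens scaling quadratically, 20 ms of decode per 50 output tokens). It's a sketch of the illustrative model, not a measurement of any real system, but it reproduces the table and the 10,000-token extrapolation.

```python
# Toy energy model from the parameters above: 300 W device, prefill time quadratic in
# input length (100 ms at 500 tokens), decode at 20 ms per 50 output tokens.
POWER_W = 300.0
BASE_PREFILL_S, BASE_PREFILL_TOKENS = 0.100, 500
DECODE_S_PER_TOKEN = 0.020 / 50

def query_energy_j(n_in: int, n_out: int) -> tuple[float, float]:
    prefill_s = BASE_PREFILL_S * (n_in / BASE_PREFILL_TOKENS) ** 2
    decode_s = DECODE_S_PER_TOKEN * n_out
    return POWER_W * prefill_s, POWER_W * decode_s  # joules

for n_in, n_out in [(500, 50), (1_000, 50), (2_000, 50), (10_000, 1_500)]:
    pre_j, dec_j = query_energy_j(n_in, n_out)
    total_wh = (pre_j + dec_j) / 3_600  # 1 Wh = 3,600 J
    print(f"{n_in} in / {n_out} out: prefill {pre_j:.0f} J, decode {dec_j:.0f} J "
          f"({total_wh:.2f} Wh total)")

# 500 in / 50 out: prefill 30 J, decode 6 J (0.01 Wh total)
# 2,000 in / 50 out: prefill 480 J, decode 6 J (0.14 Wh total)
# 10,000 in / 1,500 out: prefill 12,000 J, decode 180 J (3.38 Wh total)
```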
The implications are profound. Whether you're designing for on-device inference with battery constraints, deploying in autonomous vehicles, or managing massive cloud infrastructure costs at scale, prompt length becomes your primary lever for controlling energy consumption. This scaling asymmetry defines the energy economics of AI. Decode plods along linearly - each output token costs the same as the last. But prefill explodes quadratically with input length, its hunger growing with the square of every token fed. A 1,000-token prompt doesn't just double the cost of a 500-token prompt - it quadruples it. By the time we reach today's massive contexts, decode all but disappears, a thin shadow cast by prefill's towering consumption.
And yet, users control neither the true input nor the output. No service today exposes a "token budget," and one would likely make for a frustrating user experience anyway. The largest providers - OpenAI, Google, Anthropic - inject substantial hidden context into every prompt while keeping their system instructions opaque. Output remains equally unconstrained: unless users explicitly demand brevity, models generate tokens freely, most services never think to limit responses, and most users don't even realize they can ask for shorter ones.
This creates a fundamental tension in AI system design. While longer contexts enable richer interactions and more sophisticated reasoning, they exact a quadratically increasing energy toll. Once prompts exceed a few hundred tokens, virtually all computational resources are consumed by the prefill phase alone. For sustainable AI deployment at scale, prompt concision isn't merely good practice - it's an energy imperative. The difference between 500-token and 2,000-token average prompts could determine whether our global infrastructure remains viable or collapses under its own consumption.
This problem compounds as AI agents and capabilities like Deep Research proliferate. Each autonomous action, each recursive query, each unconstrained generation adds to an already exponential curve. We're building systems designed to think deeply while hoping they'll somehow learn restraint—a contradiction that grows more stark with every token generated.
The paradox blooms in plain sight
Here's the irony: while physics demands shorter prompts, the industry is sprinting in the opposite direction. Claude 4's system prompt far exceeds 10,000 tokens. Despite optimization techniques like KV caching and prefix caching, the overwhelming trend is toward ever-expanding context windows. We're stuffing everything we can into prompts - documentation, code repositories, conversation histories - because it demonstrably improves model capabilities.
A colleague recently quipped (hat tip, Tyler!): "We'll achieve AGI when all of Wikipedia fits in the prompt!" It's a joke that hits uncomfortably close to our current trajectory of maximizing context windows at every opportunity.
My rough calculations above turn out to align remarkably well with recent empirical findings. This very recent research shows that a GPT-4o query with 10,000 input tokens and 1,500 output tokens consumes approximately 1.7 Wh on commercial datacenter hardware. For models with more intensive reasoning capabilities, the numbers climb dramatically: DeepSeek-R1 averages 33 Wh for long prompts, while OpenAI's o3 model reaches 39 Wh. These aren't theoretical projections - they're measured consumption figures from production systems.
The energy cost of our context window expansion is real, substantial, and growing with each new model generation. We're caught between two competing imperatives: the computational benefits of longer contexts and the steep energy costs they incur. The other striking observation from the table below is the explosive increase in energy requirements as models have grown larger and more sophisticated - the costs of scaling continue to compound.
Model | Release Date | Energy (Wh): 100 input / 300 output tokens | Energy (Wh): 1K input / 1K output tokens | Energy (Wh): 10K input / 1.5K output tokens |
---|---|---|---|---|
o4-mini (high) | Apr 16, 2025 | 2.916 ± 1.605 | 5.039 ± 2.764 | 5.666 ± 2.118 |
o3 | Apr 16, 2025 | 7.026 ± 3.663 | 21.414 ± 14.273 | 39.223 ± 20.317 |
GPT-4.1 | Apr 14, 2025 | 0.918 ± 0.498 | 2.513 ± 1.286 | 4.233 ± 1.968 |
GPT-4.1 mini | Apr 14, 2025 | 0.421 ± 0.197 | 0.847 ± 0.379 | 1.590 ± 0.801 |
GPT-4.1 nano | Apr 14, 2025 | 0.103 ± 0.037 | 0.271 ± 0.087 | 0.454 ± 0.208 |
GPT-4o (Mar '25) | Mar 25, 2025 | 0.421 ± 0.127 | 1.214 ± 0.391 | 1.788 ± 0.363 |
GPT-4.5 | Feb 27, 2025 | 6.723 ± 1.207 | 20.500 ± 3.821 | 30.495 ± 5.424 |
Claude-3.7 Sonnet | Feb 24, 2025 | 0.836 ± 0.102 | 2.781 ± 0.277 | 5.518 ± 0.751 |
Claude-3.7 Sonnet ET | Feb 24, 2025 | 3.490 ± 0.304 | 5.683 ± 0.508 | 17.045 ± 4.400 |
o3-mini (high) | Jan 31, 2025 | 2.319 ± 0.670 | 5.128 ± 1.599 | 4.596 ± 1.453 |
o3-mini | Jan 31, 2025 | 0.850 ± 0.336 | 2.447 ± 0.943 | 2.920 ± 0.684 |
DeepSeek-R1 | Jan 20, 2025 | 23.815 ± 2.160 | 29.000 ± 3.069 | 33.634 ± 3.798 |
DeepSeek-V3 | Dec 26, 2024 | 3.514 ± 0.482 | 9.129 ± 1.294 | 13.838 ± 1.797 |
LLaMA-3.3 70B | Dec 6, 2024 | 0.247 ± 0.032 | 0.857 ± 0.113 | 1.646 ± 0.220 |
o1 | Dec 5, 2024 | 4.446 ± 1.779 | 12.100 ± 3.922 | 17.486 ± 7.701 |
o1-mini | Dec 5, 2024 | 0.631 ± 0.205 | 1.598 ± 0.528 | 3.605 ± 0.904 |
LLaMA-3.2 1B | Sep 25, 2024 | 0.070 ± 0.011 | 0.218 ± 0.035 | 0.342 ± 0.056 |
LLaMA-3.2 3B | Sep 25, 2024 | 0.115 ± 0.019 | 0.377 ± 0.066 | 0.573 ± 0.098 |
LLaMA-3.2-vision 11B | Sep 25, 2024 | 0.071 ± 0.011 | 0.214 ± 0.033 | 0.938 ± 0.163 |
LLaMA-3.2-vision 90B | Sep 25, 2024 | 1.077 ± 0.096 | 3.447 ± 0.302 | 5.470 ± 0.493 |
LLaMA-3.1-8B | Jul 23, 2024 | 0.103 ± 0.016 | 0.329 ± 0.051 | 0.603 ± 0.094 |
LLaMA-3.1-70B | Jul 23, 2024 | 1.101 ± 0.132 | 3.558 ± 0.423 | 11.628 ± 1.385 |
LLaMA-3.1-405B | Jul 23, 2024 | 1.991 ± 0.315 | 6.911 ± 0.769 | 20.757 ± 1.796 |
GPT-4o mini | Jul 18, 2024 | 0.421 ± 0.082 | 1.418 ± 0.332 | 2.106 ± 0.477 |
LLaMA-3-8B | Apr 18, 2024 | 0.092 ± 0.014 | 0.289 ± 0.045 | — |
LLaMA-3-70B | Apr 18, 2024 | 0.636 ± 0.080 | 2.105 ± 0.255 | — |
GPT-4 Turbo | Nov 6, 2023 | 1.656 ± 0.389 | 6.758 ± 2.928 | 9.726 ± 2.686 |
GPT-4 | Mar 14, 2023 | 1.978 ± 0.419 | 6.512 ± 1.501 | — |
Table 4 from "How Hungry is AI?" (model release dates added by me).
Trade 8,500 conversations to keep your home cool
To grasp the practical implications, consider this sobering calculation: at o3's extreme consumption of 39 Wh per query, approximately 76,923 interactions would drain 3,000 kWh - equivalent to powering a typical American home's air conditioning for an entire year. But as users inevitably gravitate toward richer prompts - say, 30K input tokens with 3.5K outputs - the quadratic curse strikes with mathematical precision. Prefill energy multiplies ninefold, collapsing that annual budget to just 8,500 interactions: merely 23 queries per day. This comparison transforms abstract energy figures into visceral reality, revealing how what appears negligible at the per-query level becomes a massive aggregate demand. And we're still only discussing text models.
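Here's a minimal sketch of that trade-off arithmetic. It treats the measured 39 Wh as effectively all prefill (which the scaling analysis above suggests is a reasonable approximation), so tripling the input length multiplies per-query energy roughly ninefold.

```python
# Trade-off sketch: how many queries fit in a ~3,000 kWh annual air-conditioning budget?
# Assumes prefill dominates, so energy scales ~quadratically with input length.
AC_BUDGET_WH = 3_000 * 1_000  # ~3,000 kWh of annual home cooling
O3_WH_PER_QUERY = 39          # measured o3 energy at 10K input / 1.5K output tokens

def scaled_query_wh(base_wh: float, input_scale: float) -> float:
    return base_wh * input_scale ** 2  # quadratic prefill dominates

for label, scale in [("10K-token prompts", 1), ("30K-token prompts", 3)]:
    wh = scaled_query_wh(O3_WH_PER_QUERY, scale)
    queries = AC_BUDGET_WH / wh
    print(f"{label}: {wh:.0f} Wh/query -> {queries:,.0f} queries/year (~{queries / 365:.0f}/day)")

# 10K-token prompts: 39 Wh/query -> 76,923 queries/year (~211/day)
# 30K-token prompts: 351 Wh/query -> 8,547 queries/year (~23/day)
```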
The trajectory becomes even more stark when we consider multimodal AI. Image and video models can routinely process upwards of hundreds of thousands of tokens per query. Each frame, each visual element, each temporal relationship adds to the token count. As these models become mainstream, we're not just scaling linearly with adoption - we're multiplying adoption rates by dramatically higher per-query energy costs.
The math is unforgiving: widespread deployment of long-context AI at current efficiency levels would require energy infrastructure on a scale we're not remotely prepared for. This isn't a distant concern - it's the reality we're building toward with every context window expansion and every new multimodal capability. The urgency is real: we need to move faster, or soon the trade-offs become explicit - 8,500 conversations or one cool home, your ChatGPT query or your neighbor's heating. And while you might laugh, it's already happening here in the United States, and water availability is next.
The stakes have never been higher
The mathematics we've explored tells a clear story: prefill computation scales quadratically with input length. Double your prompt, quadruple your energy consumption. As models embrace ever-larger context windows, this dynamic becomes one of the primary drivers of AI's steep energy trajectory.
Of course, these projections assume a degree of technological stasis. Innovation could disrupt these trends - and likely will - and the work we are doing at Modular is certainly trying to help. But return to the provocative framing from the opening: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. If each AI agent represents some fraction of human productive capacity, then the first to achieve a combined human-digital population of 5 or 10 billion wins the AI race, and invents the next great technological frontier.
This perspective makes efficiency not just important, but existential. The promise of AI as humanity's great equalizer inverts into its opposite: a world where computational capacity becomes the new axis of dominance. Without breakthrough innovation at every layer - from silicon to algorithms to system architecture - we face an AI revolution strangled by the very physics of power generation. The future belongs not to the wise, but to the watt-rich. And so we must scale with unprecedented urgency - nuclear, renewables, whatever it takes - because the stakes transcend mere technological supremacy. In this race, computational power becomes political power, and China is currently winning by a large margin. If democratic nations cede the AI frontier to autocracies, we don't just lose a technological edge; we risk watching the values of human dignity and freedom dim under the shadow of algorithmic authoritarianism. The grid we build today determines whose values shape the world of tomorrow.
It's interesting to reflect that we are teaching machines to think with the very energy that makes our planet uninhabitable - yet these same machines may be our only hope of learning to live within our means. AI is both the fever and the cure, the flood and the ark, the hunger that's outrunning itself. We race against our own creation, betting that the intelligence we birth from burning carbon will show us how to stop burning it altogether. The question of our age: can we make AI wise enough to save us before it grows hungry enough to consume us? The stakes in the race to AI superintelligence are unquestionably high.
I'll close with an irony that perfectly captures the current moment - the suggestions below come courtesy of Anthropic’s Claude 4 Opus:
Better Chips and Smarter Cooling - The latest AI chips use way less energy for the same work. Pair that with innovative cooling like liquid systems or modular designs, and data centers have already seen energy savings of up to 37% in test runs.
Timing is Everything - Not all AI work needs to happen right now. By running non-urgent tasks when electricity is cleaner (like when it's sunny or windy), some companies have cut their carbon emissions from AI jobs by 80-90%. It's like doing laundry at night when rates are lower, but for the planet.
Power Where You Need It - Building solar panels, wind turbines, and battery storage right at data center sites makes sense. Google's recent $20 billion investment in clean energy shows how tech giants can grow their AI capabilities without relying entirely on the traditional power grid.
Working Together on Infrastructure - Data center operators need to share their growth plans with utility companies. This helps everyone prepare for the massive power needs coming to tech hubs like Northern Virginia, Texas, and Silicon Valley - think of it as giving the power company a heads-up before throwing a huge party.
Show Your Work - Just like appliances have energy ratings, AI companies should tell us how much power they're using. Whether it's energy per query or per training session, transparency creates healthy competition to be more efficient.
Investing in Tomorrow's Solutions - Government programs are funding research into game-changing technologies like optical processors that could use 10 times less energy. There's also exciting work on making AI models smaller and smarter without losing capabilities.
Turning Waste into Resources - Data centers generate tons of heat - why not use it? Some facilities are already warming nearby buildings with their excess heat, turning what was waste into a community benefit.
And there it is - 40 watt-hours spent asking AI how to save watt-hours per token, with some of these ideas still unproven (e.g. optical processors). The perfect metaphor for our moment: we burn the world to ask how to stop burning it, racing our own shadow toward either wisdom or ruin. The verdict seems absolute: scale or surrender - there is no middle ground in the physics of power.
Thanks to Tyler Kenney, Kalor Lewis, Eric Johnson, Christopher Kauffman, Will Horyn, Jessica Richman, Duncan Grove and others for many fun discussions on the nature of AI and energy.
AI Regulation: step with care, and great tact
“So be sure when you step, Step with care and great tact. And remember that life's A Great Balancing Act. And will you succeed? Yes! You will, indeed! (98 and ¾ percent guaranteed).” ― Dr. Seuss, Oh, the Places You'll Go!
AI systems take an incredible amount of time to build and get right - I know because I have helped scale some of the largest AI systems in the world, which have directly and indirectly impacted billions of people. If I step back and reflect briefly - we were promised mass-produced self-driving cars 10+ years ago, and yet we still barely have any autonomous vehicles on the road today. Radiologists haven't been replaced by AI despite predictions virtually guaranteeing as much, and the best available consumer robot we have is the iRobot J7 household vacuum.
We often fall far short of the technological exuberance we project into the world, time and time again realizing that producing incredibly robust production systems is always harder than we anticipate. Indeed, Roy Amara made this observation long ago:
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
Even when we have put seemingly incredible technology out into the world, we often overshoot and release it to minimal demand - the litany of technologies that promised to change the world but subsequently failed is a testament to that. So let's be realistic about AI - we definitely have better recommendation systems, we have better chatbots, we have translation across multiple languages, we can take better photos, we are now all better copywriters, and we have helpful voice assistants. I've been involved in using AI to solve real-world problems like saving the Great Barrier Reef, and enabling people who have lost limbs to rediscover their lives again. Our world is a better place because of such technologies, and they have been developed and deployed into real applications over the last 10+ years with profoundly positive effect.
But none of this is remotely close to AGI - artificial general intelligence - or anywhere near ASI - artificial superintelligence. We have a long way to go despite what the media presages, like a Hollywood blockbuster storyline. In fact, in my opinion, these views are a bet against humanity because they gravely overshadow the incredible positives that AI is already, and will continue, providing - massive improvements in healthcare, climate change, manufacturing, entertainment, transport, education, and more - all of which will bring us closer to understanding who we are as a species. We can debate the merits of longtermism forever, but the world has serious problems we can solve with AI today. Instead, we should be asking ourselves why we are scrambling to stifle innovation and implement naively restrictive regulatory frameworks now, at a time when we are truly still trying to understand how AI works and how it will impact our society?
Current proposals for regulation seem more concerned with the ideological risks associated with transformative AGI - which, unless we uncover some incredible change in physics, energy, and computing, or perhaps uncover that AGI exists in a vastly different capacity to our understanding of intelligence today - is nowhere near close. The hysteria projected by many does not comport with the reality of where we are today or where we will be anytime soon. It's the classic mistake of inductive reasoning - taking a very specific observed example and making overly broad generalizations and logical jumps. Production AI systems do far more than just execute an AI model - they are complex systems - similarly, a radiologist does far more than look at an image - they treat a person.
For AI to change the world, one cannot point solely to an algorithm or a graph of weights and state the job is done - AI must exist within a successful product that society consumes at scale - distribution matters a lot. And the reality today is that the AI infrastructure we currently possess was developed primarily by researchers for research purposes. The world doesn't seem to understand that we don't actually have production software infrastructure to scale and manage AI systems to the enormous computational heights required for them to practically work in products across hardware. At Modular, we have talked about these challenges many times, because AI isn't an isolated problem - it's a systems problem comprised of hardware + software across both data centers and the edge.
Regulation: where to start?
So, if we assume that AGI isn't arriving for the foreseeable future, as I and many do, what exactly are we seeking to regulate today? Is it the models we have already been using for many years in plain sight, or are we trying to pre-empt a future far off? As always, the truth lies somewhere in between. It is the generative AI revolution - not forms of AI that have existed for years - that has catalyzed much of the recent excitement around AI. It is here, in this subset of AI, where practical AI regulation should take a focused start.
If our guiding success criterion is broadly something like - enable AI to augment and improve human life - then let's work backward with clarity from that goal and implement legislative frameworks accordingly. If we don't know what goal we are aiming for - how can we possibly define policies to guide us? Our goal cannot be "regulate AI to make it safe" because the very nature of that statement implies the absence of safety from the outset despite us living with AI systems for 10+ years. To realistically achieve a balanced regulatory approach, we should start within the confines of laws that already exist - enabling them to address concerns about AI and learn how society reacts accordingly - before seeking completely new, sweeping approaches. The Venn diagram for AI and existing laws is already very dense. We already have data privacy and security laws, discrimination laws, export control laws, communication and content usage laws, and copyright and intellectual property laws, among so many other statutory and regulatory frameworks today that we could seek to amend.
The idea that we need entirely new "AI-specific" laws now - when this field has already existed for years and risks immediately curbing innovation for use cases we don't yet fully understand - feels impractical. It will likely be cumbersome and slow to enforce while creating undue complexity that will likely stifle rather than enable innovation. We can look to history for precedent here - there is no single "Internet Act" for the US or the world - instead, we divide and conquer Internet regulation across our existing legislative and regulatory mechanisms. We have seen countless laws that attempt to broadly regulate the internet fail - the Stop Online Piracy Act (SOPA) is one shining US example - while other laws that seek to regulate within existing bounds succeed. For example, Section 230 of the Communications Decency Act protects internet service providers from being liable for content published, and the protection afforded here has enabled modern internet services to innovate and thrive to enormous success (e.g., YouTube, TikTok, Instagram etc.) while also forcing market competition on corporations to create their own high content standards to build better product experiences and retain users. If they didn't implement self-enforcing policies and standards, users would simply move to a better and more balanced service, or a new one would be created - that's market dynamics.
Of course, we should be practical and realistic. Any laws we amend or implement will have failings - they will be tested in our judicial system and rightly criticized, but we will learn and iterate. Yes, we won't get this right initially - a balanced approach often will mean some bad actors will succeed, but we can limit these bad actors while still building a stronger and more balanced AI foundation for the years ahead. AI will continue evolving, and the laws will not keep pace - take, for example, misinformation, where Generative AI makes it far easier to construct alternate truths. While this capability has been unlocked, social media platforms still grapple with moderation of non-AI-generated content and have done so for years. Generative AI will likely create extremely concerning misinformation across services, irrespective of any laws we implement.
EU: A concerning approach
With this context, let’s examine one of the most concerning approaches to AI regulation - the European Artificial Intelligence Act - an incredibly aggressive approach that will likely cause Europe to fall far behind the rest of the world in AI. In seeking to protect EU citizens absolutely, the AI Act seemingly forgets that our world is now far more interconnected and that AI programs are, and will continue to be, deeply proliferated across our global ecosystem. For example, the Act arguably could capture essentially all probabilistic methods of predicting anything. And, in one example, it goes on to explain (in Article 5) that:
(a) the placing on the market, putting into service or use of an AI system that deploys subliminal techniques beyond a person’s consciousness in order to materially distort a person’s behaviour in a manner that causes or is likely to cause that person or another person physical or psychological harm;
Rather than taking a more balanced approach and appreciating that perfect is the enemy of good, the AI Act tries to ensnare all types of AI, not just the generative ones currently at the top of politicians and business leaders' minds. How does one even hope to enforce "subliminal techniques beyond a person's consciousness" - does this include the haptic notification trigger on my Apple Watch powered by a probabilistic model? Or the AI system that powers directions on my favorite mapping service with different visual cues? Further, the Act also includes a "high risk" categorization for "recommender systems" that, today, basically power all of e-commerce, social media, and modern technology services - are we to govern and require conformity, transparency and monitoring of all of these too? The thought is absurd, and even if one disagrees, the hurdle posed for generative AI models is so immense - no model meets the standards of the EU AI Act in its current incarnation.
We should not fear AI - we've been living with it for 10+ years in every part of our daily lives - in our news recommendations, mobile phone cameras, search results, car navigation, and more. We can't just seek to extinguish it from existence retrospectively - it's already here. So, to understand what might work, let's walk the path of history to explore what didn't - the failed technological regulations of the past. Take the Digital Millennium Copyright Act (DMCA), which attempted to enforce DRM by making it illegal to circumvent technological measures used to protect copyrighted works. This broadly failed everywhere, didn't protect privacy, stifled innovation, was hacked, and, most critically - was not aligned with consumer interests. It failed in Major League Baseball, the E-Book industry abandoned it, and even the EU couldn't get it to pass. What did work in the end? Building products aligned with consumer interests enabled them to achieve their goal - legitimate ways to access high-quality content. The result? We have incredible services like Netflix, Spotify, YouTube, and more - highly consumer-aligned products that deliver incredible economic value and entertainment for society. While each of these services has its challenges, at a broad level, they have significantly improved how consumers access content, have decentralized its creation, and enabled enormous and rapid distribution that empowers consumers to vote on where they direct their attention and purchasing power.
A great opportunity to lead
The US has an opportunity to lead the world by constructing regulation that enables progressive and rapid innovation - regulation guided by this country's principles and its role as a global model for democracy. Spending years constructing the "perfect AI legislation" and "completely new AI agencies" will end up like the parable of the blind men and an elephant - creating repackaged laws that attempt to regulate AI from different angles without holistically solving anything.
Ultimately, seeking to implement broad new statutory protections and government agencies on AI is the wrong approach. It will be slow, it will take years to garner bipartisan support, and meanwhile, the AI revolution will roll onwards. We need AI regulation that defaults to action, that balances opening the door to innovation and creativity while casting a shadow over misuse of data and punishing discriminatory, abusive, and prejudicial conduct. We should seek regulation that is more focused initially, predominantly on misinformation and misrepresentation, and seek to avoid casting an incredibly wide net across all AI innovation so we can understand the implications of initial regulatory enforcement and how to structure it appropriately. For example, we can regulate the data on which these models are trained (e.g. privacy, copyright, and intellectual property laws), we can regulate computational resources (e.g. export laws), and we can regulate the products in which predictions are made (e.g. discrimination laws) to start. In this spirit, we should seek to construct laws that target the inputs and outputs of AI systems and not individual developers or researchers who push the world forward.
Here’s a small non-exhaustive list of near-term actionable ideas:
Voluntary transparency on the research & development of AI - If we wait for Congress, we wait for an unachievable better path at the expense of a good one today. There is already an incredible body of work being proposed by OpenAI, and in Google's AI Principles, among others, on open and transparent disclosure - there is a strong will to transcend Washington and just do something now. We can seek to ensure companies exercise a Duty of Care - a responsibility under common law to identify and mitigate ill effects - and report transparently accordingly. And of course, this should extend to our Government agencies as well.
Better AI use case categorization & risk definition - There is no well-defined regulatory concept of "AI" nor how to "determine risks" on "what use cases" therein. While the European AI Act clearly goes too far, it does at least seek to define what "AI" is, but errs by capturing essentially all probabilistic methods. Further, it needs to better classify risk and the classes of AI from which it is actually seeking to protect people. We can create a categorical taxonomy of AI use cases and seek to target regulatory enforcement accordingly. For example, New York is already implementing laws to require employers to notify job applicants if AI is used to review their applications.
Pursue watermarking standards - Google leads the cause with watermarking to determine whether content has been AI generated. While these standards are reminiscent of DRM, it is a useful step in encouraging open standards for major distribution platforms to enable watermarking to be baked in.
Prompt clearly on AI systems that collect and use our data - Apple did this to great effect, implementing App Tracking Transparency controls around the IDFA and clearer disclosure of how consumer data is used. We should expect more of our services for any data being used to train AI along with the AI model use cases such data informs. Increasingly in AI, all data is incredibly biometric - from the way one types, to the way one speaks, to the way one looks and even walks. All of this data can, and is, now being used to train AI models to form biometric fingerprints of individuals - this should be made clear and transparent to society at large. Both data privacy and intellectual property laws should protect who you are, and how your data is used in a new generative AI world. Data privacy isn't new, but we should realize that the evolution of AI continually raises the stakes.
Protect AI developers & open source - We should seek to focus regulatory efforts on the data inputs as well as have clear licensing structures for AI Models and software. Researchers and developers should not be held liable for creating models, software tooling, and infrastructure that is distributed to the world so that an open research ecosystem can continue to flourish. We must promote initiatives like model cards and ML Metadata across the ecosystem and encourage their use. Further, if developers open source AI models, we must ensure that it is incredibly difficult to hold the AI model developer liable, even as a proximate cause. For example, if an entity uses an open-source AI recommendation model in its system, and one of its users causes harm as a result of one of those recommendations, then one can’t merely seek to hold the AI model developer liable - it is foreseeable to a reasonable person who develops and deploys AI systems that rigorous research, testing, and safeguarding must occur before using and deploying AI models into production. We should ensure this is true even if the model author knew the model was defective. Why, you ask? Again, a person who uses any open source code should reasonably expect that it might have defects - hence why we have licenses like MIT that provide software “as is”. Unsurprisingly, this isn’t a new construct; it's how liability has existed for centuries in tort, and we should seek to ensure AI isn't treated differently.
Empower agencies to have agile oversight - AI consumer scams aren't new - they are already executed by email and phone today. Enabling existing regulatory bodies to take action immediately is better than waiting for congressional oversight. The FTC has published warnings, the Department of Justice as well, and the Equal Employment Opportunity Commission too - all regulatory agencies that can do more to help today.
Embrace our future
It is our choice to embrace fear or excitement in this AI era. AI shouldn’t be seen as a replacement for human intelligence, but rather a way to augment human life - a way to improve our world, to leave it better than we found it and to infuse a great hope for the future. The threat that we perceive - that AI calls into question what it means to be human - is actually its greatest promise. Never before have we had such a character foil to ourselves, and with it, a way to significantly improve how we evolve as a species - to help us make sense of the world and the universe we live in and to bring about an incredibly positive impact in the short-term, not in a long distant theoretical future. We should embrace this future wholeheartedly, but do so with care and great tact - for as with all things, life is truly a great balancing act.
Many thanks to Christopher Kauffman, Eric Johnson, Chris Lattner and others for reviewing this post.
Image credits: Tim Davis x DALL-E (OpenAI)