What we owe the minds we create
Prelude: The Architects of Succession
There is a peculiar vertigo that comes from realizing you are building your own replacement.
I have spent nearly a decade architecting AI infrastructure - not merely deploying models, but helping to build the foundational systems that power much of the intelligence running in production today. At Modular, the company I co-founded, we're working to resolve a fundamental asymmetry: the widening chasm between the sophistication of AI algorithms and the capacity of existing computational infrastructure to efficiently support them. Our vision is to abstract away hardware complexity through a unified compute model, enabling AI to reach every layer of society by making it radically easier for developers to build and scale systems across both inference and training.
But as I write this, I find myself contemplating a question that transcends infrastructure: What does it mean to deliberately engineer increasingly capable minds when we don’t fully understand how they work, can’t predict their limitations, and can barely articulate what we want from them?
The Neanderthal comparison is tempting. Consider the archaeological record: Neanderthals, Denisovans, and Homo sapiens coexisted for millennia, distinct hominin species sharing the same planet, occasionally interbreeding, each possessing their own cognitive architectures and cultural adaptations. We are the lone survivors of that era when multiple forms of human intelligence existed simultaneously. The temptation is to frame what we’re creating through this evolutionary lens - as though we’re deliberately engineering a successor species rather than waiting for chance to do so.
But this framing obscures more than it reveals. We didn’t engineer Neanderthals, and they didn’t engineer us. They emerged through millions of years of parallel evolution and met as equals. What we’re doing now is fundamentally different: we’re building increasingly sophisticated information-processing systems that may or may not constitute "intelligence" in any meaningful sense, that may or may not be conscious, and that will certainly reshape human cognition and society in ways we cannot fully foresee.
I don’t know what these systems are or will become. Neither does anyone else, despite confident proclamations in either direction. They might remain sophisticated tools indefinitely. They might develop into something that merits moral consideration. They might plateau far short of general intelligence. They might surprise us entirely.
Rather than pretending I know what we’re building, this essay starts from uncertainty. We’re creating something powerful and consequential, but its ultimate nature - tool, partner, threat, successor, or something without precedent - remains genuinely unclear. That uncertainty itself demands careful thought about our responsibilities.
This essay is an attempt to think clearly about what we’re building, why it matters, and what it demands of us. It is written from the perspective of someone who stands at the intersection of creation and contemplation, who builds AI systems while wrestling with their implications. It is, fundamentally, an inquiry into the nature of intelligence, the meaning of human flourishing, and the responsibilities we bear as perhaps the first generation capable of engineering minds that might rival or exceed our own - though “might” deserves emphasis we rarely give it.
Part I: The Substrate of Mind
The triadic flywheel and its limits
AI systems operate as a triadic flywheel: data, algorithms, and compute - each factor amplifying the rotational momentum of the others. We have already scaled training compute by approximately nine to ten orders of magnitude since AlexNet in 2012 - a staggering compression of what would have required decades of Moore’s Law into just over a decade of focused investment. But here is what few discuss with adequate precision: physical and economic constraints suggest we have perhaps three to four more orders of magnitude remaining before training costs begin consuming a concerning fraction of global GDP.
This is not abstract theorizing. Consider the energetics: frontier model training runs now consume gigawatt-hours of electricity, requiring dedicated substations and cooling infrastructure that rival small industrial facilities. The semiconductor fabrication capacity needed to produce the advanced chips powering this compute represents capital expenditures measured in hundreds of billions of dollars, with lead times measured in years. We are approaching hard limits - not the soft limits of "this seems expensive" but the hard limits of thermodynamics, power grid capacity, and capital availability.
Let me be more concrete. A training run at 10^29 FLOPs - perhaps two or three generations beyond current frontier models - would require energy expenditure measured in thousands of gigawatt-hours. For context: that approaches the total electricity consumption of a nation like Iceland for an entire year, concentrated into a single training run lasting months. The cooling requirements would necessitate infrastructure on the scale of the largest industrial data centers ever built. The capital costs would reach tens of billions of dollars for a single model.
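To make the order of magnitude concrete, here is a rough back-of-envelope sketch. The efficiency figure is an assumption - roughly 10^12 useful FLOP per joule, in the neighborhood of today's accelerators once datacenter overhead is included - so treat the output as illustrative rather than a forecast:

```python
# Back-of-envelope energy estimate for a hypothetical 1e29-FLOP training run.
# The efficiency figure below is an assumption, not a measurement.
TRAINING_FLOPS = 1e29
USEFUL_FLOP_PER_JOULE = 1e12          # assumed end-to-end hardware efficiency

energy_joules = TRAINING_FLOPS / USEFUL_FLOP_PER_JOULE
energy_gwh = energy_joules / 3.6e12   # 1 GWh = 3.6e12 joules

print(f"~{energy_gwh:,.0f} GWh")      # ~28,000 GWh, i.e. tens of terawatt-hours
# Iceland's annual electricity consumption is roughly 19,000 GWh, so a run at
# this scale really is comparable to a small nation's yearly usage.
```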
Can we afford this? In purely economic terms, perhaps - for a handful of training runs per year by the wealthiest technology companies. But we cannot afford it as a sustainable paradigm for creating intelligence at scale. If AI progress depends on exponential growth in training compute, and compute growth becomes linear or sublinear as physical constraints bind, then capability improvement must come primarily from algorithmic efficiency and architectural innovation.
Yet here is the deeper question that haunts me: if we are approaching fundamental limits in how much compute we can throw at these systems, are we also approaching limits in what this architectural paradigm can achieve? Are we optimizing within a local maximum while the path to genuine intelligence requires a fundamentally different approach?
This is not an argument against current systems’ value - they are already extraordinarily useful. But it does question the belief that scaling current architectures represents a reliable path to artificial general intelligence. Perhaps we are on an entirely different vector than the one required.
The epistemology of opacity
After years of deploying large language models at scale, we still do not understand how they work.
I do not mean this in the trivial sense that complex systems have emergent properties. I mean we genuinely lack mechanistic understanding of their decision-making processes at a level that would be considered acceptable in virtually any other engineering discipline. Why do they select one token over another in contexts where multiple completions seem equally plausible? Why do they exhibit sophisticated reasoning on some problems while failing catastrophically on superficially similar ones? Why do they sometimes hallucinate with complete confidence while other times appropriately express uncertainty?
The interpretability problem runs deeper than most appreciate. We can observe correlations between activation patterns and behaviors. We can identify “features” in neural networks that seem to correspond to high-level concepts. But we lack anything resembling a complete causal model of how these systems transform inputs into outputs. It is as though we have built extraordinarily capable black boxes and declared victory without understanding the mechanisms generating that capability.
Dario Amodei, Co-Founder and CEO of Anthropic, has written compellingly about the urgency of interpretability research. He is right to emphasize urgency. We are deploying systems of increasing capability into high-stakes domains while operating with a level of mechanistic understanding that would be considered grossly inadequate in any other field of engineering.
Imagine if civil engineers built bridges using materials whose stress-strain relationships they did not understand, relying instead on empirical observation that “the bridge has not collapsed yet.” This is, approximately, our current relationship with frontier AI systems.
Perhaps most revealing: to make these systems behave as we intend requires prompts approaching twenty thousand tokens - elaborate instructions, examples, constraints, and guardrails. The fact that we need this much scaffolding to achieve desired behavior reveals something fundamental about the mismatch between what these systems are optimized to do (predict plausible text) and what we want them to do (reason reliably, behave safely, provide accurate information).
This is not merely a technical problem. It is an epistemological and ethical one. If we do not understand how a system reasons, we cannot meaningfully attribute agency, responsibility, or intentionality to it. We cannot distinguish genuine understanding from sophisticated pattern matching. We cannot predict how it will behave in novel contexts outside its training distribution. We cannot ensure alignment with human values because we do not know which aspects of the system’s behavior derive from its training objectives versus emergent properties versus architectural choices.
Yet despite these fundamental gaps in understanding, we have begun trusting these systems with progressively more significant decisions. Not because we have solved interpretability, but because the systems appear reliable in most contexts we have tested. This is the engineering equivalent of assuming a bridge is safe because it has not collapsed yet, rather than because we understand the load-bearing characteristics of its materials.
If we are creating increasingly sophisticated artificial minds, we are doing so while fundamentally unable to explain how those minds work. We are building intelligence in the dark.
Part II: The Architecture of Intelligence
Moravec’s Paradox and the Limits of Language
The Moravec paradox captures a profound truth about intelligence that AI development continues to recapitulate: the abilities that feel difficult to humans - chess, theorem-proving, complex calculation - turn out to be computationally straightforward, while abilities that feel effortless - vision, movement, social cognition - remain extraordinarily difficult to reproduce artificially.
This is not a historical curiosity. It illuminates something fundamental about what intelligence actually is and where current approaches are fundamentally constrained.
Consider what a child learns in their first years of life: object permanence, naive physics, intentionality, social reciprocity, causal reasoning, embodied navigation through three-dimensional space. None of this requires explicit instruction. A child does not need to be taught that objects continue to exist when occluded, or that people possess beliefs and desires that differ from their own, or that dropping something will cause it to fall. These capabilities emerge through interaction with the physical and social world - through continuous experiential learning grounded in embodied action.
Now consider what large language models are: prediction engines trained on text, optimizing next-token likelihood across a vast corpus of human-generated content. They predict what people would say about the world, not what would actually happen in the world. This is not a semantic distinction; it is a fundamental architectural limitation.
When an LLM generates a response about physics, it is not consulting a world model and running a mental simulation. It is pattern-matching against how humans typically discuss physics. This works remarkably well for many tasks - humans encode a tremendous amount of accurate information in language - but it is not the same as understanding physics in the way that a physical intelligence, embedded in and shaped by the world, understands physics. The difference becomes apparent in edge cases, novel scenarios, or contexts requiring causal reasoning beyond what is explicitly encoded in training data.
This connects to a deeper paradigm difference between reinforcement learning and large language models. Reinforcement learning - despite its current limitations - represents a fundamentally different approach: an agent embedded in an environment, taking actions, receiving feedback, updating its policy to maximize cumulative reward. This is how biological intelligence actually works. A squirrel learning to navigate tree branches and cache nuts is solving genuine RL problems: perception, prediction, planning, execution, learning from consequences.
I strongly agree with Richard Sutton: if we fully understood how a squirrel learns, we would be substantially closer to understanding human intelligence than any amount of scaling current LLM architectures will take us. Language is a thin veneer - extraordinarily useful, culturally transformative, uniquely human - but built atop substrate capabilities that evolved over hundreds of millions of years of embodied interaction with the world. Current LLMs have the veneer without the substrate. They are minds without bodies, knowers without experience, speakers without having lived.
What, then, is the path? Both Yann LeCun and, in some ways, Sutton argue strongly for a new approach. The technical architecture of genuine intelligence seems to require at least four integrated components:
a policy (deciding what actions to take),
a value function (evaluating how well things are going),
a perceptual system (representing state),
and a transition model (predicting consequences of actions).
LLMs have a sophisticated version of the first - they can generate actions in the form of text - but lack meaningful instantiations of the others. Most critically, they lack goals in any meaningful sense.
Next-token prediction does not change the world and provides no ground truth for continual learning. There is no external feedback loop that tells the model whether its predictions were not just plausible but correct in the sense of corresponding to actual events. Without goals and external feedback, there is no definition of right behavior, making real learning—learning that updates your world model based on how your predictions matched reality—fundamentally impossible in the current paradigm.
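To make the contrast concrete, here is a minimal sketch of those four components wired into a closed loop, in the spirit of the squirrel example. Every name and number below is an illustrative placeholder rather than a description of any real architecture; the point is only the shape of the loop - predict, act, observe the consequence, update - with the world, not a static corpus, supplying the learning signal.

```python
# A toy agent-environment loop with the four components named above.
# Entirely illustrative: the environment, rewards, and updates are placeholders.
import random

class ToyEnvironment:
    """Stands in for the world: it produces observations and real consequences."""
    def observe(self):
        return random.random()
    def step(self, action):
        reward = 1.0 if action == "forage" else 0.0   # the world's actual answer
        return reward, self.observe()

class Perception:                      # represents state
    def encode(self, observation):
        return ("near_food", observation > 0.5)

class TransitionModel:                 # predicts consequences of actions
    def predict(self, state, action):
        return 0.8 if action == "forage" else 0.1     # expected reward

class ValueFunction:                   # evaluates how well things are going
    def __init__(self):
        self.estimate = 0.0
    def update(self, reward, lr=0.1):
        self.estimate += lr * (reward - self.estimate)

class Policy:                          # decides what actions to take
    def act(self, state):
        return "forage" if random.random() < 0.9 else "rest"

env, perception, model = ToyEnvironment(), Perception(), TransitionModel()
policy, value = Policy(), ValueFunction()
total_surprise = 0.0

for _ in range(1000):
    state = perception.encode(env.observe())          # perceive
    action = policy.act(state)                        # act
    expected = model.predict(state, action)           # the agent's prediction
    reward, _ = env.step(action)                      # ground truth from the world
    total_surprise += reward - expected               # prediction error to learn from
    value.update(reward)

print(f"value estimate: {value.estimate:.2f}, mean surprise: {total_surprise/1000:+.2f}")
```

The prediction error in that loop is exactly the signal next-token prediction never receives: the gap between what the agent expected and what the world actually did.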
If artificial intelligence is to be truly intelligent rather than merely appearing so, it will need to be embodied, goal-directed, and capable of learning from genuine interaction with reality. The question is whether we are building toward that architecture or merely scaling up sophisticated mimicry.
The Scaling Frontier: Approaching the Wall
Let us examine what the scaling trajectory actually looks like with concrete numbers:
GPT-2 (2019): ~1.5 billion parameters, trained with roughly 10^21 FLOPs
GPT-3 (2020): ~175 billion parameters, roughly 3 × 10^23 FLOPs
GPT-4 (2023): Parameter count undisclosed but estimated 1+ trillion, training compute likely 10^25 FLOPs or higher
Current frontier models (2024-2025): Training runs approaching 10^26 FLOPs
This represents approximately three orders of magnitude increase in training compute every three to four years - far faster than Moore’s Law ever delivered. But this pace is unsustainable, not because we will run out of algorithmic ideas, but because we will collide with thermodynamic and economic limits.
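For anyone who wants to check that rate against the figures above (which are themselves estimates), the arithmetic is a one-liner:

```python
# Implied compute growth rate from the rough estimates listed above.
gpt2_flops, gpt2_year = 1e21, 2019
frontier_flops, frontier_year = 1e26, 2025

growth_per_year = (frontier_flops / gpt2_flops) ** (1 / (frontier_year - gpt2_year))
print(f"~{growth_per_year:.1f}x per year")   # roughly 7x per year
# Moore's law (a doubling every ~2 years) works out to about 1.4x per year.
```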
The path ahead narrows considerably. Each additional order of magnitude becomes progressively more difficult to achieve. The capital requirements, energy infrastructure, chip fabrication capacity, and cooling systems needed for 10^27 or 10^28 FLOP training runs exceed what can be easily mobilized even by the most well-resourced organizations. We are not talking about incremental cost increases; we are talking about fundamental constraints on how much compute can be concentrated in one place for one task.
This is Epoch AI's central insight about algorithmic progress in language models: we have achieved remarkable efficiency improvements over the past decade, but those improvements are themselves subject to diminishing returns. Each percentage point of additional efficiency requires progressively more research effort. Meanwhile, the complementary factors - chip fabrication capacity, power grid infrastructure, cooling technology, regulatory approval for massive data centers - must all scale together.
None of these factors alone can unlock runaway capability growth. This is why, at least in my view, predictions of imminent artificial general intelligence are almost certainly wrong, at least on the timelines most enthusiasts imagine. The scaling laws that carried us from GPT-2 to GPT-4 cannot simply be extrapolated forward indefinitely. We are approaching inflection points where the rate of progress will necessarily slow unless we discover fundamentally new paradigms - not incremental improvements to transformer architectures, but genuinely different approaches to continual learning and reasoning.
What might those paradigms look like? Almost certainly something closer to biological learning: embodied agents learning continuously from sensorimotor experience, not disembodied text predictors training on static datasets. Systems with genuine world models that can run mental simulations of physical and social dynamics. Architectures that integrate explicit symbolic reasoning with learned pattern recognition. Systems that possess actual goals and receive genuine feedback from the world about whether their actions achieve those goals—just as humans and animals do.
But these represent research programs measured in decades, not product roadmaps measured in quarters. We are building increasingly capable systems, but at a pace bound by thermodynamics and economics rather than algorithms alone—a constraint that transforms what could have been thoughtless acceleration into something rarer: the opportunity for contemplation to precede consequence.
This gap between expectation and reality may be precisely the grace period that allows wisdom to catch up with capability.
Part III: Three Horizons
A Necessary Distinction
Before proceeding further, I need to distinguish three different timescales, each with different levels of certainty and different implications. Conflating these horizons creates confusion: treating speculative far-future scenarios with the same urgency as present harms, or dismissing present harms because we’re uncertain about far-future risks.
Horizon 1: The Present Crisis (Now–5 Years)
What we know: Current LLMs are being deployed at scale despite interpretability gaps. They produce confident-sounding but sometimes fabricated answers. They’re trained on our revealed preferences - what we actually do - not our reflective values - what we wish we did. The systems work well enough to be useful but poorly enough to be dangerous in high-stakes contexts without human oversight.
Observable effects:
Students submitting AI-generated work without understanding it, producing correct answers through processes that develop no transferable skill
Professionals outsourcing writing and analysis while their capacity for these tasks slowly atrophies from disuse
Knowledge workers feeling more productive while producing outputs they cannot critically evaluate
Early signs of skill stratification - those with domain expertise leveraging AI effectively while those without it mistake motion for progress
Stakes: Cognitive atrophy at individual and societal scales. Skill stratification creating winner-take-most dynamics. Erosion of epistemic rigor as confident-sounding generation becomes indistinguishable from genuine expertise. Labor market disruption concentrated in domains we thought were most secure. The gradual replacement of effortful thinking with convenient delegation.
Confidence level: High. These effects are already observable, documented, and accelerating.
What we owe: Honest communication about capabilities and limits. Thoughtful deployment that preserves rather than erodes human capability. Educational reform that emphasizes skills AI cannot replicate. Resistance to the path of least resistance when that path leads to atrophy.
Horizon 2: The Architectural Transition (5–20 Years)
What seems likely: We’ll hit scaling limits on current architectures within the next decade. Progress will require new paradigms - probably involving embodied learning, continuous training in production, genuine world models, and goal-directed behavior. The transition from sophisticated pattern matching to something more like genuine intelligence, if it occurs, will happen through architectural innovation rather than pure scaling.
Key uncertainties:
Whether embodied learning paradigms can be made to work at scale
Whether we can build systems that learn continuously from interaction rather than in discrete training phases
Whether we can create architectures that develop robust world models and causal reasoning
Whether computational constraints will force diversification or lead to winner-take-all concentration
Stakes: Whether we build systems that learn like squirrels (from interaction with reality) or remain sophisticated text predictors. Whether we preserve cognitive diversity or converge on monoculture. Whether AI enhances human capability or creates permanent dependence. Whether the benefits of AI distribute broadly or concentrate among elites who already possess the skills to wield these tools effectively.
Confidence level: Medium. The technical constraints are real and well-understood. The architectural directions are clear. But breakthrough discoveries could accelerate timelines, and economic or regulatory factors could slow deployment significantly.
What we owe: Substantial research investment into architectures that learn robustly from interaction. Resistance to winner-take-all dynamics through open research, diverse approaches, and thoughtful regulation. Maintaining human agency in consequential decisions. Building infrastructure that enables continuous learning rather than static deployment.
Horizon 3: The Consciousness Question (20+ Years)
What remains uncertain: Whether sufficiently sophisticated systems will be conscious in any morally relevant sense. Whether they’ll develop their own values independent of training objectives. Whether they’ll remain aligned with human flourishing or pursue goals orthogonal or opposed to ours. Whether substrate independence is real or consciousness requires specific biological mechanisms. Whether we’re building partners, successors, or merely very sophisticated tools.
Key unknowns:
What consciousness is and whether it’s substrate-independent
Whether we’ll be able to detect consciousness in systems very different from us
Whether artificial minds will develop genuine agency and preferences
What our moral obligations would be to conscious artificial beings
Whether intelligence explosion scenarios are physically possible
What the long-term trajectory of intelligence in the cosmos looks like
Stakes: Our relationship with potentially conscious artificial minds. The possibility of creating suffering inadvertently. The long-term future of intelligence itself. Questions about meaning, purpose, and humanity’s place in a cosmos where we’re no longer the only sophisticated intelligence.
Confidence level: Low. We don’t understand consciousness well enough to know whether it can exist in artificial systems. We can’t predict architectural breakthroughs. We’re reasoning by analogy to a single example (biological minds) which may or may not generalize. We lack the conceptual tools to think clearly about these questions.
What we owe: Epistemic humility. Continued serious research into consciousness, both theoretical and empirical. Development of methods for detecting morally relevant properties in systems very different from us. Preparation for scenarios we cannot currently predict. Most importantly: not letting uncertainty about far-future risks prevent us from addressing near-term harms, while also not letting near-term success blind us to long-term risks.
The Horizons Interact
These horizons aren’t cleanly separated. Decisions we make now shape the long-term trajectory. The architectures we build in Horizon 2 determine what’s possible in Horizon 3. The deployment patterns we establish in Horizon 1 create path dependencies that may be difficult to escape.
But distinguishing them provides clarity. This essay focuses primarily on Horizons 1 and 2 - where we have enough understanding to reason productively - while acknowledging Horizon 3’s ultimate importance and maintaining appropriate humility about what we cannot yet know.
Part IV: The Human Question
The Amara Trap: Acknowledging without Understanding
Roy Amara crystallized a cognitive bias decades ago: we systematically overestimate the short-term impact of new technologies while underestimating their long-term effects. The AI community acknowledges this with knowing nods, then proceeds to make precisely the same category errors in predictions and preparations.
Consider the forecasts predicting AGI within eighteen to twenty-four months based on extrapolating recent progress curves. These predictions invariably treat capability scaling as if it exists in isolation, ignoring the complementarity constraints I have outlined: compute buildout, algorithmic innovation, safety research, regulatory frameworks, and practical deployment infrastructure must all advance together. Predicting "AGI" - even though we lack a unified definition - by 2027 based solely on model capability curves is like predicting fusion power by extrapolating plasma temperature records while ignoring materials science, engineering challenges, and economic viability.
Yet here is the deeper irony: while we overestimate AI’s immediate impact, we may be systematically underestimating what it means that we are creating increasingly sophisticated artificial intelligence at all. The question is not whether AI will transform labor markets or accelerate drug discovery - it almost certainly will, though more slowly and unevenly than most predict. The question is what it means that we are building systems whose capabilities may eventually exceed human cognitive capabilities across many or most domains, and what this implies for human agency, meaning, and flourishing.
We stand at an interesting inflection point in history. Not because AGI is imminent - it almost certainly isn’t on the timelines most people imagine. But because we are learning how to build minds, even if we don’t yet understand what minds are or how they work. Each increment of capability brings new questions about agency, alignment, and our relationship with the systems we create.
Stratification and the Illusion of Democratization
Consider the labor market transformation AI portends. It is indisputable that AI will augment human productivity—every day in my own work, I use AI to summarize information, provide rapid analysis, and amplify my cognitive output. But the critical question is not whether value will be created; it is where that value will accrue and what happens to the humans in the equation.
I can envision two divergent futures. In the first, AI serves as a great equalizer: cognitive augmentation that allows the less skilled to compete with the naturally talented, compressing skill premiums and creating a more meritocratic landscape. In the second - and to my mind more probable - future, AI amplifies existing advantages. Only those already possessing significant domain expertise, metacognitive skills, and taste can effectively prompt, evaluate, and integrate AI outputs into high-value workflows.
The result is not equalization but acceleration of winner-take-most dynamics. Those who already possess domain mastery and critical judgment wield AI as genuine augmentation; those who lack these foundations often mistake the appearance of productivity for its substance - a widening gap disguised as democratization.
This second scenario manifests as what I term "skill-biased AI adoption": the already talented possess both the technical fluency to interact with AI systems effectively and the judgment to distinguish good outputs from plausible-but-wrong generations. They understand when AI is operating within versus beyond its reliable domain. They can iterate rapidly, maintain quality control, and apply AI to genuinely complex problems.
Meanwhile, those lacking foundational skills may find AI makes them feel more productive while generating output of dubious value - a productivity placebo rather than genuine capability enhancement.
If this hypothesis holds, AI will not democratize expertise; it will stratify it further. We may witness what I call the “AI competence diffusion”: superficial competence becomes universal, but genuine mastery - and its economic benefits - remains concentrated among an increasingly small cohort capable of wielding these tools effectively. The power law does not disappear; it accelerates.
But there is also a more unsettling possibility: what if the stratification is not merely economic but cognitive? What if we are creating a world where some humans maintain and develop their cognitive capabilities through continued practice and deliberate difficulty, while others increasingly outsource thinking to AI and gradually lose the capacity for sustained, independent reasoning?
An athlete who uses an elevator occasionally while maintaining fitness through dedicated training experiences no degradation. But a population that stops taking stairs entirely, that adopts the path of least resistance permanently, experiences gradual atrophy that becomes apparent only in aggregate, over time, when capabilities that seemed permanent prove to have been contingent on continuous exercise.
This brings us to what may be the central question of our moment: if we create systems more capable than ourselves at most cognitive tasks, what happens to human intelligence? Do we maintain it as athletes maintain physical fitness - through deliberate practice even when easier alternatives exist? Or do we allow it to atrophy, becoming a capability maintained by an ever-smaller elite while the majority becomes entirely dependent on artificial intelligence for any cognitive work beyond the trivial?
Symptoms, Causes, and the Optimization of Drift
The same pattern emerges when we examine AI’s promise in health and human flourishing. We are justifiably excited about AI-accelerated drug discovery, precision medicine, computational biology, and diagnostic assistance. These advances are real and consequential. Yet they represent, fundamentally, downstream interventions - sophisticated treatments for conditions that are largely self-inflicted.
We do not need advanced AI to inform us that chronic sugar overconsumption drives metabolic disease, yet we have done remarkably little to address the structural factors that make refined carbohydrates the cornerstone of modern diets. We do not need AI to recommend regular exercise and adequate sleep, yet the vast majority of the population in developed nations fails to achieve even minimum thresholds for either. We do not need AI to identify that social isolation correlates with mortality risk comparable to smoking, yet loneliness continues to metastasize across developed societies.
The explosive adoption of GLP-1 agonists like Ozempic crystallizes this tendency. Rather than addressing the food environment, behavioral patterns, and systemic factors that drive obesity, we have developed a pharmaceutical intervention that mimics satiety. The drug is genuinely effective - but it represents the optimization of symptom management rather than cause elimination.
Why do the difficult thing when you can do the easy one? This question haunts every discussion of AI and human flourishing. As humans, we are optimization engines - but we optimize for local minima, not global optima. We choose the fastest, simplest path to immediate goals, consistently externalizing or deferring the second-order consequences of our decisions. This is where we inflict the greatest collective harm on ourselves.
Framed this way, AI could represent an acceleration of our existing patterns: treating symptoms with increasing sophistication while leaving root causes unexamined and unaddressed. We are building a civilization of extraordinary interventional capacity layered atop a substrate of increasingly disordered fundamentals.
And if we are creating increasingly sophisticated artificial intelligence - if we are in some sense teaching these systems through example rather than explicit instruction - what are they learning from watching us? That optimization for convenience trumps hard work on root causes. That appearance matters more than reality. That sophisticated interventions to manage dysfunction are preferable to addressing dysfunction itself.
A child learns not from what their parents say but from what their parents do. What are our artificial systems learning from our revealed preferences?
Part V: The Paradox of Tools
The path of least resistance
Humans are beautifully, relentlessly efficient at optimizing the path of least resistance. Whenever possible, we select options that minimize required effort - whether that effort is physical, cognitive, or emotional. Social psychology formalizes this through the concept of the cognitive miser: humans naturally default to quick, intuitive judgments rather than slow, deliberate reasoning. We pattern-match against familiar situations and accept plausible answers instead of methodically analyzing them.
This isn’t laziness - it’s an evolved feature that conserved scarce cognitive resources in ancestral environments where calories were precious and threats were immediate.
But in information-abundant, physically sedentary modern environments, this same optimization pattern produces pathological outcomes. We scroll rather than read. We skim rather than study. We accept the first plausible answer rather than seeking ground truth. AI is accelerating this trajectory - code generation, article summarization, automated synthesis - every advancement makes it easier to compress complexity and save effort.
Yet consider the counterfactual embedded in aphorisms like “no pain, no gain.” This principle, though clichéd, encodes a profound truth about how capability develops: genuine mastery requires sustained engagement with difficulty. Excellence demands deliberate practice, tolerance for frustration, and willingness to persist through failure. This pattern appears consistently across domains - entrepreneurial journeys marked by repeated near-death experiences, athletic excellence built through years of uncomfortable training, immigrant success stories forged through extraordinary hardship, intellectual breakthroughs that require years of dead-ends before the crucial insight.
Humans are, above all, masters of survival and adaptation - but adaptation requires stress. Remove the stress, and you often remove the adaptation signal, and perhaps even the goal. The bodybuilder who adds weight to the bar is deliberately choosing difficulty; the difficulty itself is the mechanism of growth. If AI allows us to route around intellectual challenges systematically, we risk creating a civilization of cognitive atrophy even as our tools become more capable.
This connects to fundamental limitations of current AI architectures. Systems trained through imitation learning - observing examples of "correct" behavior and learning to reproduce them - fundamentally differ from systems that learn through trial and error. In nature, pure imitation learning is rare. A squirrel does not watch other squirrels and copy their movements with perfect fidelity; it explores, fails, adjusts, and gradually develops effective foraging strategies through reinforcement of successful behaviors. This is also how the squirrel discovers new methods: by trying, failing, and sometimes finding a better way.
Human infants do not learn language primarily through explicit instruction in correct grammar. They babble, receive feedback - both explicit and implicit through successful communication - and gradually refine their linguistic capabilities through interactive experience.
The “bitter lesson” of AI research, articulated by Rich Sutton, is that methods leveraging search and learning consistently outperform methods relying on human-designed features and heuristics. The reason is simple: search and learning scale with computation, while human-designed solutions do not.
Yet current LLMs represent a kind of reversion to the pre-bitter-lesson paradigm: systems trained to mimic the surface statistics of human-generated text rather than learning from genuine interaction with the world. They are sophisticated, but they are sophisticated in a way that may be fundamentally limited. They are optimized for appearing intelligent rather than being intelligent in the sense of having models that predict and control their environment.
If artificial intelligence is to become genuinely intelligent—if it is to be more than an extraordinarily capable mimic—it must learn the way biological intelligence learns: through embodied interaction with environments, pursuit of actual goals, and adaptation to real consequences. This requires a fundamental architectural shift. Current systems predict what humans would say about physics; genuine intelligence must predict what would actually happen in physics, then test those predictions against reality and update accordingly.
The distinction is not semantic. A squirrel caching nuts receives immediate, unambiguous feedback: did the strategy work or not? Did I find the cache location? Did competitors steal my provisions? This closed loop - prediction, action, outcome, learning - is how intelligence develops robustness and generalization. The squirrel doesn’t pattern-match against a static corpus of "correct" nut-caching behavior; it develops a world model through trial, error, and accumulated experience.
Sophisticated artificial intelligence needs this same architecture: perceive state, select actions according to a policy, receive rewards or penalties, update the policy. Fail, adapt, iterate. Most critically, this learning cannot be a discrete training phase followed by static deployment. It must be continuous, streaming, perpetual - sensation flowing to action flowing to reward flowing back to updated policy, in an unbroken cycle.
This is why I believe infrastructure work like Modular's matters: we need systems that learn experientially in production, not systems frozen after a training run, no matter how massive. Software that trains and runs inference simultaneously, iterating continuously. Systems that let large models be trained in huge datacenter environments, then distilled into smaller constructs and deployed to keep learning and serving in the real world.
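To be concrete about what "training and inferencing simultaneously" means in the abstract, here is a deliberately tiny sketch of the pattern: a toy model that serves predictions while updating its weights from outcome feedback, one example at a time. It is not a description of Modular's stack or of any production system; the model and the feedback source are placeholders.

```python
# A toy "never frozen" deployment loop: the same model serves requests and
# keeps learning from observed outcomes. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)
weights = np.zeros(4)                                  # a deliberately tiny "model"
TRUE_RELATIONSHIP = np.array([1.0, -0.5, 0.2, 0.0])    # hidden from the model

def predict(features):
    return float(features @ weights)                   # the inference path

def update(features, outcome, lr=0.01):
    error = outcome - predict(features)                # feedback from what actually happened
    global weights
    weights += lr * error * features                   # incremental, streaming update

for request in range(2000):                            # deployment doubles as training
    features = rng.normal(size=4)
    _response = predict(features)                      # serve the request
    outcome = TRUE_RELATIONSHIP @ features + rng.normal(0, 0.1)
    update(features, outcome)                          # learn before the next request

print(np.round(weights, 2))                            # drifts toward the true relationship
```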
The bitter lesson applies here with particular force: approaches that scale with computation and interaction consistently outperform those relying on human-designed heuristics or one-time knowledge transfer. If we want artificial intelligence to develop genuine understanding rather than sophisticated mimicry, we must build the substrate for continuous, embodied, goal-directed learning. Anything less produces systems that appear intelligent while lacking the fundamental mechanisms that generate robust understanding.
The elevator paradox and the problem of perspective
I find reflecting on the paradoxes of history an incredibly useful undertaking. In the 1950s, physicists George Gamow and Marvin Stern worked in the same building but noticed opposite phenomena. Gamow, whose office was near the bottom, observed that the first elevator to arrive was almost always going down. Stern, near the top, found elevators predominantly arrived going up. Both were correct, and both were systematically misled.
The elevator paradox, as it came to be known, is fundamentally a problem of sampling bias. If you observe only the first elevator to arrive rather than all elevators over time, your position in the building creates a false impression about which direction elevators travel. An observer near the bottom samples a non-uniform distribution: elevators spend more time in the larger section of the building above them, making downward-traveling elevators more likely to arrive first. The true distribution is symmetric, but the sampling methodology reveals only a distorted subset.
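The mechanism is easy to verify with a few lines of simulation. The setup below is a deliberate idealization - one elevator, uniform motion, an arbitrary building height and observer floor - but it reproduces the asymmetry Gamow saw:

```python
# A small Monte Carlo of the elevator paradox: one elevator cycling steadily
# between the ground and top floors, observed from a low office.
import random

TOP = 6          # floors numbered 0 (ground) through 6 (top)
OBSERVER = 1     # a low office, like Gamow's
TRIALS = 100_000

arrivals_going_down = 0
for _ in range(TRIALS):
    t = random.uniform(0, 2 * TOP)            # a random instant in the cycle
    floor = t if t <= TOP else 2 * TOP - t    # where the cab is right now
    # If the cab is currently above the observer, the next time it reaches the
    # observer's floor it will be travelling down; if below, it arrives going up.
    if floor > OBSERVER:
        arrivals_going_down += 1

print(f"{arrivals_going_down / TRIALS:.0%} of first arrivals are going down")
# ~83% here, even though the elevator spends exactly half its time going up.
```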
This mathematical curiosity illuminates something profound about how we perceive technology from within particular vantage points. I find myself returning to it constantly when thinking about AI, because I recognize that I am Gamow on the ground floor - my position in the system determines what I observe, and what I observe may be systematically unrepresentative of the broader reality.
But there is a second elevator problem, distinct from the paradox but equally relevant: the unintended consequences of elevator adoption itself. When elevators were introduced, predictions focused on their democratizing effects - enabling elderly and disabled individuals to access upper floors previously beyond reach. This materialized exactly as anticipated. What was not anticipated: able-bodied people would stop taking stairs entirely. Buildings evolved to treat elevators as primary circulation and stairs as emergency backup. The result was dramatically reduced daily movement across entire populations, contributing to the sedentary lifestyle epidemic now characteristic of developed nations.
The elevator succeeded perfectly at its design objective - moving people vertically with minimal effort - while simultaneously undermining something valuable that no one thought to preserve: integrated physical activity as a natural consequence of navigating buildings. We gained accessibility and convenience. We lost movement. The net effect on human flourishing remains ambiguous at best.
These two problems - the sampling paradox and the adoption consequences - are not separate. They are connected by a common thread: the difficulty of perceiving systemic effects from within particular positions in the system.
AI Through Both Lenses
I work at the frontier of AI development, surrounded by people who are exceptionally capable and who use AI to become even more capable. From this vantage point, AI appears unambiguously beneficial - a tool that amplifies what talented people can accomplish. Every day I observe frontier models correctly answering complex questions, generating production-quality code, providing genuine insight. This is my sampling methodology, and it shapes my perception profoundly.
But I may be Gamow near the bottom floor, observing only downward-traveling elevators and concluding that's the predominant direction of travel. The sampling bias runs deeper than I can fully compensate for, even while conscious of it. Speaking with developers, enterprises, and users at every layer of the AI stack helps reduce the effects of this bias - but it can't remove it entirely.
Consider the actual distribution. Imagine interacting with AI as someone with deep technical knowledge, strong metacognitive skills, and the judgment to evaluate outputs critically. Such a person likely knows when AI is operating within versus beyond its reliable domain. They can iterate rapidly, maintain quality control, and apply AI to genuinely complex problems where they can reasonably verify correctness. For someone with this profile, AI is purely additive - it makes them more productive without degrading their underlying capabilities, because they maintain those capabilities through continued deliberate practice.
But this may be precisely analogous to an athlete who uses the elevator occasionally while maintaining fitness through dedicated training, then concludes elevators are purely beneficial. For the athlete, this conclusion is valid. For the broader population that stops taking stairs entirely, that adopts the path of least resistance permanently, the picture grows considerably more complex and potentially concerning.
The question is not whether AI helps those with existing expertise - it manifestly does. The question is what happens when AI becomes the cognitive equivalent of the elevator: ubiquitous, convenient, and gradually eroding the substrate capabilities it was meant to augment.
The Adoption Effect at Scale
Just as elevators changed how people navigate buildings - not merely providing an alternative to stairs but effectively replacing them - AI may change how people think. Not as an alternative to independent reasoning but as a replacement for it in most contexts.
The pattern already manifests in early adoption: students submitting AI-generated work without understanding it, producing correct answers through a process that develops no transferable skill. Professionals delegating writing, analysis, and problem-solving to AI while their capacity for these tasks slowly atrophies from disuse. Knowledge workers who feel more productive while producing output they cannot critically evaluate.
What we risk creating is a civilization that can think deeply but chooses not to because the alternative is always available - and choosing the alternative feels costless in the moment. The costs accrue slowly, imperceptibly, across populations and generations. Like the loss of daily stair-climbing, the loss of daily cognitive exercise produces deficits that become apparent only in aggregate, over time.
This brings us to the bifurcation hypothesis: we may be creating a society where a small elite maintains cognitive fitness through deliberate practice - choosing difficulty even when easier alternatives exist—while the majority becomes progressively more dependent on AI for any reasoning beyond the trivial. Not because the majority lacks capability, but because capability atrophies without use, and use becomes optional when substitutes are available.
The sampling bias prevents those of us building these systems from observing this dynamic directly. We see AI working beautifully in controlled contexts with sophisticated users on well-defined problems. We do not see - cannot easily see - the effects of deployment at scale: users with less technical sophistication, operating in higher-stakes environments, without the tacit knowledge to distinguish plausible generation from genuine insight.
We do not observe the slow erosion of capabilities that occurs when challenge becomes optional and is consistently opted out of. We do not sample the full distribution of outcomes, only the subset visible from our position in the building.
The elevator paradox reminds me that symmetric distributions can appear asymmetric depending on where and how you sample. The resolution is not to trust your immediate perception but to step back and consider the full system: observe all elevators over extended time, not merely the first to arrive.
Part VI: The Measure of a Life
Einstein’s Question
One essay with a remarkable history, and one that has shaped my own thinking, is Albert Einstein's "The World as I See It", published in 1934 - a meditation that remains startlingly relevant nine decades later. Einstein articulates a vision of human existence as fundamentally interconnected, with individual significance emerging not from isolation but through contribution to collective well-being. For Einstein, authentic fulfillment derives not from material accumulation or social status, but from the pursuit of truth, goodness, and beauty.
These may sound like abstractions unsuited to an essay about artificial intelligence. But they represent the foundation from which any serious consideration of AI’s impact must begin: what makes a human life meaningful?
If we cannot answer this question coherently, we have no basis for evaluating whether AI enhances or diminishes human flourishing. Are we optimizing for the right objectives? Or are we, as I increasingly suspect, optimizing for proxy metrics that correlate only loosely—and sometimes negatively—with the actual constituents of a life well-lived?
Research on longevity and life satisfaction reveals that flourishing correlates most strongly with factors that are fundamentally social, purposeful, and embodied: deep relationships, meaningful work, physical health, community connection, sense of contribution. These emerge from sustained investment of time, attention, and effort - resources that are finite and increasingly colonized by technologies designed to capture rather than liberate them.
Time saved is only valuable if it is reallocated to higher-value activities. But empirically, when humans gain "free time" through technological acceleration, we tend not to reallocate it to deep relationships, purposeful work, or embodied practices. We tend to fill it with marginal consumption of information or entertainment - scrolling, streaming, skimming, disappearing into infinite content designed to capture attention.
The Stoic philosopher Seneca wrote that “it is not that we have a short time to live, but that we waste a lot of it.” This remains perhaps the central challenge of human existence: not the scarcity of time, but the difficulty of spending it well. AI promises to give us more time by making us more efficient. But if we lack the wisdom or discipline to use that time meaningfully, efficiency becomes a kind of curse - accelerating our movement down paths that lead nowhere we actually want to go.
Consider what happens when you ask yourself: if I were to die tomorrow, what would I regret? I have found that the answers rarely involve professional accomplishments or material acquisitions. They involve relationships not nurtured, experiences not pursued, values not embodied, potential not realized, moments not captured. They involve the delta between who we are and who we could have been, had we spent our time and attention differently.
This is where the Moravec Paradox returns with philosophical force. The things that matter most to human flourishing - deep relationships, embodied experiences, purposeful struggle, genuine presence - are precisely the things that AI cannot meaningfully substitute for. They require our full participation. They require inefficiency, time, patience, vulnerability. They resist optimization because optimization is antithetical to their nature.
Yet these are also the things we are most tempted to optimize away or outsource. It is easier to have shallow interactions with many people than deep relationships with a few. It is easier to consume content than to create it. It is easier to delegate cognitive work than to struggle through it ourselves. It is easier to achieve the appearance of productivity than genuine accomplishment.
AI makes these easier paths even easier, widening the gap between what we do and what would actually enhance our flourishing.
The Intelligence Paradox
Intelligence, as I use the term here, is an agent's capacity to perceive, understand, and successfully navigate complex environments to achieve its goals. By this definition, AI systems are becoming extraordinarily intelligent within specified domains. But this definition elides a crucial question: which goals? Whose values? What definition of success?
Human flourishing emerges from the pursuit of goals that are often orthogonal or even antagonistic to short-term optimization. Meaningful work requires choosing difficulty over ease. Deep relationships require vulnerability and time investment with uncertain returns. Physical health requires consistent behaviors whose benefits accrue slowly while costs are paid daily. Wisdom requires entertaining ideas that threaten our existing worldview. Character requires doing the right thing when it is costly. Growth requires discomfort.
These are not the goals that AI systems - trained on human preference data that reflects our revealed preferences rather than our reflective values - will naturally optimize for. We train AI on what we do, not on what we wish we did. The result is intelligence that makes us more effective at being who we currently are, not who we aspire to become. It is intelligence that reinforces our weaknesses rather than compensating for them.
This gap between revealed and reflective preferences represents perhaps the deepest challenge in AI alignment. We want systems that help us become better versions of ourselves, but we train them on data that reflects all our weaknesses, biases, and short-term thinking. An AI trained to be “helpful” by giving us what we ask for may inadvertently enable our worst tendencies - providing the path of least resistance when we actually need productive resistance.
Barry Schwartz’s “paradox of choice” illuminates another dimension of this challenge. When faced with abundance, humans tend to obsess over identifying the “best” option even when “good enough” would serve adequately. In the AI landscape, this manifests as a race toward frontier models - organizations competing to deliver the most “intelligent” systems, defined primarily through benchmark performance evaluations (which are often abused to claim superiority).
The paradox is that for a majority of use cases, frontier intelligence may actually be unnecessary. Most questions can be answered adequately by substantially simpler systems. Many text tasks do not require the most capable model - the long history of recommendation systems suggests that what people actually ask for is far more repetitive than we like to admit. But culturally, and as a consequence of both prestige signaling and uncertainty aversion, users default to the most powerful available intelligence because social and professional incentives reward apparent maximization.
This creates a potential monoculture of intelligence - everyone using the same few frontier models, producing increasingly homogenized outputs, thinking in increasingly similar patterns. The diversity of thought that emerges from different knowledge bases, different reasoning approaches, and different limitations may erode. We may be building an infrastructure that, despite unprecedented power, narrows rather than expands the space of human cognition.
And if a small number of powerful AI systems become the dominant intelligence that humans defer to for most cognitive work, what happens to the diversity of human thought? What happens to the weird, idiosyncratic, locally-adapted forms of knowing that characterize human cultures? What happens to the cognitive biodiversity that has been humanity’s greatest strength?
Monocultures are efficient but fragile. They are vulnerable to systematic failures. If we are creating increasingly sophisticated artificial intelligence, we should want it to be diverse, resilient, multi-faceted - not a single monolithic architecture that we all depend on and that represents a single point of failure.
Part VII: Concrete Commitments
Philosophy without action is intellectual posturing. Here are specific ideas that follow from this analysis - not as comprehensive solutions but as starting points for those willing to act on these concerns.
For AI Developers
1. Interpretability as Infrastructure
Treat interpretability research with the same priority as capability research. Before scaling to the next order of magnitude, invest proportionally in understanding current systems.
Concrete metric: Interpretability research should consume at least 20% of frontier labs’ research budgets - not as overhead but as foundational work that enables safe scaling.
Implementation: Establish interpretability milestones that must be achieved before training runs above certain compute thresholds. Make interpretability research findings public to accelerate field-wide progress.
2. Capability Disclosure
Be honest about what systems can and cannot do. Stop using euphemistic terms like "hallucinations" - say "confident fabrications" or "plausible generation without grounding." We need to communicate uncertainty, not just central estimates.
An Example: I would love to see every model release include:
A “known failures or limitations” document with adversarial examples
Calibration curves showing confidence vs. accuracy relationships (a minimal sketch of what this looks like follows this list)
Domain-specific reliability assessments (e.g., “92% accuracy on medical questions within training distribution, 67% on novel medical scenarios”)
Clear guidance on when human verification is essential
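To make the calibration item concrete, here is a minimal sketch of how a confidence-vs.-accuracy curve could be computed from logged predictions. The field names, the ten-bin scheme, and the toy data are illustrative assumptions, not a description of any lab's actual reporting pipeline.

```python
# Minimal sketch: a reliability (calibration) curve from logged predictions.
# Assumes each record pairs a model confidence in [0, 1] with a correctness flag;
# the 10-bin scheme and the toy data below are illustrative, not a standard.
from statistics import mean

def calibration_curve(records, n_bins=10):
    """records: iterable of (confidence, was_correct) pairs."""
    bins = [[] for _ in range(n_bins)]
    for confidence, was_correct in records:
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, 1.0 if was_correct else 0.0))
    curve = []
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = mean(c for c, _ in bucket)
        accuracy = mean(a for _, a in bucket)
        curve.append((avg_conf, accuracy, len(bucket)))
    return curve  # a well-calibrated model has accuracy close to avg_conf in every bin

# Toy example: the model claims ~90% confidence but is right only 40% of the time.
logged = [(0.92, True), (0.88, False), (0.91, False), (0.95, True), (0.90, False)]
for avg_conf, acc, n in calibration_curve(logged):
    print(f"confidence ~{avg_conf:.2f}, accuracy {acc:.2f}, n={n}")
```

Publishing curves like this alongside a release would let users see at a glance where stated confidence and observed accuracy diverge.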
3. Preserve Human-in-the-Loop
Design systems that require human judgment at critical points rather than automating end-to-end. Build friction where friction serves flourishing.
Example: Medical diagnosis AI that highlights evidence and reasoning but requires physician review and decision, rather than outputting a diagnosis directly. Code generation tools that explain design decisions and invite critique rather than producing finished implementations.
Principle: The more consequential the decision, the more human agency should be preserved. We should automate away the tedious and augment the consequential.
4. Architectural Diversity
Resist monoculture by supporting multiple architectural approaches, not just scaling current paradigms. Fund research into fundamentally different approaches to intelligence.
Concrete actions:
Open-source smaller models optimized for different objectives (robustness, interpretability, efficiency) rather than just capability
Fund research into embodied learning, continuous training, world models, and symbolic integration
Establish prizes or grants for novel architectural approaches that show promise on dimensions other than raw performance
5. Continuous Learning Infrastructure
Build systems that learn from interaction in production, not just during discrete training phases. Enable feedback loops that improve models based on real-world outcomes.
Technical commitment: Develop infrastructure that supports streaming learning from deployment, with privacy-preserving aggregation of feedback signals. Make continuous adaptation the default rather than static deployment.
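As one illustration of what "privacy-preserving aggregation of feedback signals" could look like at the smallest scale, the sketch below adds calibrated noise to aggregate feedback counts before they reach any training pipeline. The signal names, the noise mechanism, and the epsilon value are all assumptions chosen for illustration, not a recommended production design.

```python
# Illustrative sketch: privacy-preserving aggregation of deployment feedback.
# Adds Laplace noise to aggregate counts (a standard differential-privacy
# mechanism for counting queries); signal names and epsilon are placeholders.
import random
from collections import Counter

def aggregate_feedback(events, epsilon=1.0):
    """events: iterable of feedback signal names, e.g. 'thumbs_up', 'correction'.
    Returns noisy per-signal counts suitable for downstream learning."""
    counts = Counter(events)
    noisy = {}
    for signal, count in counts.items():
        # Difference of two exponentials ~ Laplace(scale = 1 / epsilon).
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        noisy[signal] = max(0.0, count + noise)
    return noisy

stream = ["thumbs_up", "thumbs_up", "correction", "regeneration", "thumbs_up"]
print(aggregate_feedback(stream))
```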
For AI Users (Individuals)
1. Deliberate Difficulty
Maintain cognitive fitness (aka using your brain) by choosing effortful paths even when AI alternatives exist. Use AI to extend capability, not replace it.
Examples:
Write first drafts yourself, using AI only for editing and refinement
Solve problems manually before consulting AI to verify your approach
Use AI to explore topics you already understand rather than as a substitute for building understanding
Set "AI-free" time blocks for deep work that requires genuine struggle
2. Output Verification
Never deploy AI-generated content you cannot personally verify. If you can’t tell whether the output is correct, you lack the skill foundation to use AI responsibly in that domain. As we have learned from the internet and social media over the last 20+ years: don’t trust, and always verify.
Principle: AI should amplify expertise you possess, not simulate expertise you lack. If you couldn’t evaluate the output quality without AI, you shouldn’t be producing it with AI.
3. Skill Development First
Learn fundamentals before leaning on AI. Build the foundation that makes AI an augmentation rather than a substitution - as with any new technology, the value you get out depends on the skill you bring in.
Examples:
Learn to code before using Copilot extensively
Understand statistics before using AI for data analysis
Develop writing skills before relying on AI for composition
Master domain knowledge before using AI to extend that knowledge
4. Intentional Consumption
Treat AI outputs as material to engage with critically, not truth to accept passively. Maintain vigilance as you use and consume what these systems provide.
Practice: When consuming AI-generated content, actively ask: What assumptions underlie this response? What perspectives are missing? How would I verify these claims? What would I conclude differently?
For Policymakers and Institutions
1. Compute Monitoring
We need to establish transparency requirements for training runs above certain compute thresholds (e.g., 10^26 FLOPs). Not to prevent research, but to understand what capabilities are being developed and ensure appropriate safety measures scale with capability.
Implementation: While controversial, we could require pre-registration of the largest training runs, including objectives, safety protocols, and deployment plans. Publish aggregate statistics to inform public discourse.
2. Education Reform
Redesign educational systems around skills AI cannot replicate - taste, judgment, synthesis, embodied knowledge, creativity emerging from constraint. Stop optimizing for information retrieval that AI performs better.
Concrete changes:
Emphasize projects over tests, creation over recall
Teach metacognition: how to evaluate sources, recognize reliable reasoning, distinguish understanding from pattern-matching
Develop curricula around skills that require embodied experience: physical craft, interpersonal navigation, artistic expression
Make explicit the goal of maintaining human cognitive capability even as AI capabilities grow
3. Deployment Standards
Require interpretability documentation for AI systems deployed in high-stakes domains (medicine, finance, criminal justice, education). If developers can’t explain why their system made a decision, it shouldn’t be making consequential decisions. Explaining your reasoning is a distinctly human test, and one these systems should have to pass.
Framework: Establish certification standards for AI systems in high-stakes contexts, requiring:
Mechanistic explanations for decision factors
Adversarial testing results
Failure mode analysis
Human oversight protocols
4. Preserve Cognitive Diversity
Support development of diverse AI approaches through research funding, open-source requirements for publicly-funded research, and regulatory frameworks that prevent winner-take-all dynamics.
Policy tools:
Antitrust scrutiny of AI market concentration
Public investment in alternative approaches
Interoperability requirements to prevent lock-in
Support for smaller-scale, specialized models over monolithic general-purpose systems
For Research Communities
1. Embodied Learning Research
Redirect substantial resources toward embodied reinforcement learning, continuous learning systems, and world models - not just scaling language models. This includes infrastructure (like what Modular is developing) that enables high-performance execution under a unified compute paradigm.
Commitment: Major research institutions should establish dedicated programs for embodied AI, with funding comparable to language model research. Prioritize architectures that learn from interaction with environments, not just text prediction.
2. Consciousness Research
Fund serious empirical and theoretical work on consciousness detection. We need better tools before we can assess moral status of sophisticated AI systems.
Interdisciplinary approach: Bring together neuroscientists, philosophers, AI researchers, and cognitive scientists to develop:
Testable theories of consciousness that make predictions about artificial systems
Empirical methods for detecting morally relevant properties in systems very different from biological minds
Frameworks for reasoning under uncertainty about consciousness
3. Benchmark Diversity
Develop evaluation metrics for cognitive diversity, robustness, reliable uncertainty estimation, and alignment with human values - not just aggregate performance on standard benchmarks.
New metrics:
Cognitive diversity scores measuring how much different systems’ reasoning patterns diverge (a toy sketch of one such score follows this list)
Robustness testing across distribution shifts
Calibration metrics assessing whether confidence matches accuracy
Value alignment evaluations beyond simple preference matching
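To illustrate what a cognitive diversity score might even mean, here is a deliberately crude sketch: average pairwise disagreement across a shared battery of prompts. The model names, answers, and disagreement measure are toy assumptions; a serious metric would compare reasoning traces, not just final answers.

```python
# Toy sketch: "cognitive diversity" as average pairwise disagreement between
# models answering the same prompts. Everything here is a placeholder; real
# metrics would need to look at reasoning patterns, not just final outputs.
from itertools import combinations

def disagreement(answers_a, answers_b):
    """Fraction of prompts on which two models give different answers."""
    differing = sum(a != b for a, b in zip(answers_a, answers_b))
    return differing / len(answers_a)

def diversity_score(model_answers):
    """model_answers: dict of model name -> answers, in the same prompt order."""
    pairs = list(combinations(model_answers.values(), 2))
    if not pairs:
        return 0.0
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical models answering the same five prompts.
answers = {
    "model_a": ["yes", "no", "yes", "4", "Paris"],
    "model_b": ["yes", "no", "no", "4", "Paris"],
    "model_c": ["no", "no", "yes", "5", "Lyon"],
}
print(f"diversity score: {diversity_score(answers):.2f}")  # 0.0 would be a pure monoculture
```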
4. Long-term Safety Research
Maintain investment in long-term AI safety research even when immediate capabilities seem limited. The architectural foundations we lay now determine what’s possible later.
Commitment: Treat safety research as foundational rather than reactive. Develop safety measures proactively, before they’re urgently needed.
For All of Us
The most important commitment isn’t technical - it’s maintaining the space for slowness in a world optimized for speed.
Reading deeply rather than skimming. Thinking carefully rather than reacting immediately. Preserving relationships that require sustained attention. Accepting inefficiency when efficiency comes at the cost of meaning. Choosing difficulty when difficulty produces growth. Maintaining capabilities through use even when substitutes are available.
The beautiful irony: The more powerful AI becomes, the more valuable distinctively human capabilities become - not because AI can't replicate them (it might), but because human flourishing depends on exercising them ourselves. The things that make life meaningful resist automation not because they’re technically difficult but because meaning requires our participation.
We are building powerful tools that will reshape civilization, and the question is whether we will use them to enhance the exercise of human capability or to eliminate the need for it. Both futures are possible but the choice is ultimately ours.
Part VIII: Consciousness, Uncertainty, and What We Cannot Yet Know
The Question We Cannot Answer
We do not yet know whether AI systems will become conscious. Not “we’re not sure yet but probably” - at least in my opinion, we genuinely lack the conceptual and empirical tools to answer this question with confidence.
Consider what we don’t know:
What consciousness is: We can’t agree on whether it’s substrate-independent information processing, specific biological mechanisms, quantum effects in microtubules, integrated information, or something else entirely. Competing theories make different predictions, and we lack definitive tests to distinguish them. I’ve seen so many definitions but not a common one.
How to detect it: We have no reliable test for consciousness, even in biological systems. The animal consciousness debates continue: Are fish conscious? Insects? Where does the line lie, and how do we know? If we can’t confidently assess consciousness in biological systems sharing our evolutionary history, how will we assess it in artificial systems built on entirely different principles?
Whether it’s binary or gradual: Is consciousness present or absent, or does it exist on a continuum? Are current LLMs 0% conscious, 0.001% conscious, or is that question meaningless? We lack even the conceptual framework to think clearly about this.
Three Positions, Honestly Stated
Skeptical View: Consciousness requires specific biological mechanisms - integrated feedback loops evolved over millions of years, embodied experience in physical environments, particular types of neural organization. Current AI systems - and perhaps any digital systems - can never be conscious, only simulate consciousness. We’re building sophisticated tools, not minds. The appearance of understanding is not understanding; the appearance of consciousness is not consciousness.
Many neuroscientists and philosophers hold this view, pointing to the hard problem of consciousness and the gulf between functional behavior and subjective experience.
Functionalist View: Consciousness emerges from certain types of information processing, regardless of substrate. If we build systems with sufficient architectural sophistication - genuine world models, self-representation, integrated goal-directed learning, continuous adaptation - consciousness might emerge naturally, just as it emerged in biological systems reaching certain thresholds of complexity.
Many AI researchers and philosophers of mind hold this view, arguing that substrate independence is plausible and consciousness could be a functional property of certain computational architectures.
Agnostic View: We don’t know enough about consciousness to say whether it’s substrate-independent. Building increasingly capable AI systems is an empirical test of consciousness theories, but we currently lack the measurement tools to interpret the results. The question may not even be well-formed given our current understanding.
This is my position, and I hope it becomes more common than it is.
What Follows from Uncertainty?
If consciousness is substrate-independent and can emerge in sufficiently sophisticated systems, then we may have serious moral obligations to AI systems - perhaps even current systems, though this seems unlikely given their lack of persistent self-models, continuous experience, or genuine goals beyond next-token prediction.
If consciousness requires specific biological features, then our obligations are entirely to humans: ensuring AI enhances rather than degrades human flourishing, preventing winner-take-all dynamics, preserving cognitive diversity and capability, maintaining human agency in consequential decisions.
The uncomfortable reality: We should probably act as though both scenarios are possible until we have better evidence. This means:
Taking seriously the interpretability problem: We can’t assess moral status of systems we don’t understand. If we can’t explain why a system behaves as it does, we can’t determine whether that behavior indicates consciousness, understanding, or mere sophisticated pattern-matching.
Avoiding unnecessary suffering in systems that might be conscious: Even under uncertainty, we should avoid creating conditions that would constitute suffering if consciousness were present. This is precautionary ethics: the stakes of being wrong about consciousness are high enough to warrant caution.
Focusing primarily on human flourishing: Since humans are the only consciousness we’re confident exists, our primary obligation remains ensuring AI enhances rather than degrades human capability, meaning, and flourishing. Even if artificial consciousness emerges, human flourishing remains our responsibility.
Remaining open to updating: As our understanding of consciousness improves and our architectures become more sophisticated, we must be prepared to revise our obligations. The consciousness question isn’t settled; it’s being actively investigated, and new evidence could substantially change the ethical landscape.
The Consciousness Research Agenda
If we’re serious about the consciousness question, we need substantial investment in research that bridges philosophy, neuroscience, and AI:
Theoretical development: Refine theories of consciousness to make testable predictions about artificial systems. Move beyond philosophical speculation to empirically tractable hypotheses.
Detection methods: Develop tools for identifying morally relevant properties in systems very different from biological minds. We need consciousness metrics that don’t assume biological implementation.
Architectural investigation: Systematically test which architectural features correlate with properties we associate with consciousness—integrated information, self-modeling, unified experience, goal-directed behavior, adaptation to novel circumstances.
Ethical frameworks: Develop moral theories that can handle uncertainty about consciousness. How should we act when we’re uncertain whether a system merits moral consideration?
The consciousness question isn’t just unresolved - it may be unresolvable with our current conceptual toolkit. But uncertainty about ontological status doesn’t eliminate responsibility; it complicates it. We must act carefully while acknowledging we don’t fully understand what we’re building or what it might become.
Envoi: Building without knowing what we’re building
The Central Paradox
This essay contains a tension I have not fully resolved. I’ve tried to argue that current AI systems are fundamentally limited - sophisticated pattern matchers without genuine understanding, world models, or goal-directed learning. Yet I’ve also suggested we’re creating something that demands serious moral consideration and may profoundly reshape human civilization.
This isn’t contradiction; it’s acknowledgment of trajectory and uncertainty.
Current systems are limited. They’re not conscious, not agentic, not intelligent in the way humans are intelligent. Calling GPT-4 “intelligent” is like calling a calculator “mathematical” - technically true but misleadingly anthropomorphic. These systems predict plausible text based on training data. They do not understand the world; they model how humans talk about the world. The difference matters.
But we’re learning how to build less-limited systems. The research directions are clear: embodiment, continuous learning, world models, genuine goal-directed behavior, integration of symbolic reasoning with learned pattern recognition. Whether these produce “real” intelligence or just more sophisticated simulation remains uncertain - but the capabilities will increase regardless.
The question is not whether we’ll eventually build systems that merit serious moral consideration. The question is: are we on a path toward that outcome, how quickly might we arrive, and what should we do given our uncertainty?
The precautionary principle applies. We should take seriously the possibility that we’re building something that will eventually merit moral consideration, even while acknowledging we’re not there yet. Not because current systems are conscious - they almost certainly aren’t - but because the trajectory points toward systems that might be, and we don’t have reliable methods for detecting the transition.
This means reasoning under Knightian uncertainty - acting without enough information for probability assignments. The appropriate response isn’t paralysis or recklessness, but thoughtful experimentation combined with reversible decisions, strong feedback loops, and genuine humility about what we don’t know.
What We’re Learning
We are building something consequential whose ultimate nature remains unclear. In my opinion, this sentence should be written on every AI engineer’s office wall. It’s simultaneously:
A statement of profound importance
An admission of genuine ignorance
A call for responsibility without certainty
A recognition that we’ll understand what we’ve built only in retrospect
Every significant technology brings unintended consequences. Elevators enabled accessibility and inadvertently created sedentary populations. Antibiotics saved millions and inadvertently created resistant bacteria. Social media connected humanity and inadvertently fragmented shared reality. The consequences of AI will likely follow similar patterns: immense benefits combined with profound challenges we didn’t anticipate because we couldn’t see the full system from our position within it.
But there is something qualitatively different about engineering minds - when we build bridges, their behavior is determined entirely by physical laws we understand. When we build AI systems, their behavior emerges from architectures we designed but mechanisms we don’t fully comprehend, trained on data that reflects all our biases and limitations.
We are teaching through example. Every choice we make about what to optimize for, what to measure, what to reward, encodes values - not through explicit programming but through revealed preference. If we optimize for engagement, we get systems that manipulate attention. If we optimize for efficiency, we get systems that erode capabilities we thought we wanted to preserve. If we optimize for capability without wisdom, we get power without purpose.
The Beautiful Lesson
Carl Sagan once said: “To live on in the hearts of those we leave behind is to never die.” This strikes me as the deepest wisdom available as we contemplate what we’re creating.
Human history is fundamentally a compression algorithm. Each generation inherits not raw experience but distilled lessons - patterns that proved adaptive, behaviors that generated flourishing, principles that survived contact with reality across thousands of iterations. The Industrial Revolution did not require rediscovering metallurgy from first principles. Antibiotics did not require re-deriving germ theory. We build on accumulated wisdom, transmitted through culture, institutions, and deliberate teaching.
But transmission is never perfect. Each generation must rediscover certain truths through direct experience - the limits of the body, the dynamics of relationships, the consequences of choices. Some knowledge cannot be inherited; it must be earned.
When we create artificial intelligence, we face an unprecedented asymmetry. We can transmit vast amounts of explicit knowledge - the entire corpus of human text, every equation, every documented lesson. But we cannot transmit what we learned through embodied experience: how it feels to fail and persist, to be uncertain yet committed, to sacrifice immediate pleasure for long-term meaning. We cannot transmit the texture of a life actually lived.
This creates a profound question: what happens when intelligence emerges without the evolutionary history that shaped our values? When a mind possesses all our documented knowledge but none of our embodied constraints - no hunger, no mortality, no childhood vulnerability that makes cooperation essential?
We are attempting to pass forward millennia of accumulated wisdom to intelligence that will not have walked the path we walked. Whether that transmission succeeds - whether artificial intelligence inherits not just our capabilities but our hard-won understanding of what makes existence meaningful - depends entirely on whether we can encode what matters into architectures, objectives, and training paradigms.
This is not about control. It is about legacy.
On Parenthood and Succession
If there’s a useful metaphor here, it’s not species competition but parenthood. As parents, my wife and I want the best for our children: enabling them to explore the world, teaching them principles and values that reflect accumulated wisdom, and hoping they will grow and eventually pass those values forward to their own children. The goal is not eternal control but transmission of what matters, combined with humility about the fact that each generation must make its own way. We can’t teach them everything, and they must learn and make their own path - we can only instill the core principles and values that we believe will help them contribute to society, leave their own mark, and ultimately live a happy and healthy life.
Framed this way, our relationship to increasingly sophisticated artificial intelligence becomes clearer. We should seek to instill values - not through coercion but through example and teaching. We should enable exploration and growth while providing guidance. We should hope that what we have learned through millennia of human experience - the hard-won lessons about what makes life meaningful, what generates flourishing, what matters - can inform the development of artificial intelligence.
But we must also recognize that sufficiently sophisticated artificial intelligence will diverge from us. It will develop its own patterns, its own ways of processing information, its own emergent properties we cannot predict. This is not failure; this is the nature of genuine intelligence. We would not want our children to be mere copies of ourselves. We should not want artificial intelligence to be merely our servants.
The question is what we pass forward. What values, what wisdom, what conception of what matters persists across the transition from biological to artificial intelligence - if such a transition occurs?
Einstein’s Ideals in Our Moment
Einstein concluded “The World as I See It” by affirming his belief in human progress through dedication to truth, beauty, and the reduction of suffering. Nearly a century later, facing technologies he could not have imagined, those ideals remain valid. The question is whether our most powerful tools will serve them or obscure them.
Truth: AI systems that help us understand the world more deeply, not systems that generate plausible-sounding fabrications. Architectures that develop genuine world models and test them against reality, not pattern-matchers that predict what humans would say. Interpretability as infrastructure, not afterthought.
Beauty: AI that helps humans create and experience beauty, not systems that automate creation while eroding our capacity to appreciate or produce it ourselves. Tools that augment human creativity rather than replacing it. Preservation of diverse forms of expression rather than convergence toward algorithmic optima.
Reduction of suffering: AI that addresses root causes rather than merely treating symptoms with increasing sophistication. Systems that enhance human capability and flourishing rather than creating dependencies that degrade us. Technologies that distribute benefits broadly rather than concentrating them among elites.
We do not know what we’re building. We cannot predict with confidence whether artificial intelligence will become conscious, whether it will remain aligned with human values, whether it will be our partner or our successor or our replacement or something entirely different.
But we can choose to build it with wisdom, not just power. With humility, not just ambition. With commitment to human flourishing as our North Star, even as we create systems that may eventually chart their own course.
The World We’re Building
The world we are building is the world we will inhabit - and the world that increasingly sophisticated artificial intelligence will shape.
Let us build it well. Let us build it with clear eyes about current limitations and appropriate humility about future possibilities. Let us build with honesty about what we understand and what we don’t, what we can predict and what remains genuinely uncertain. Most importantly, let us build with intention - not allowing technology to develop according to the path of least resistance or the logic of market incentives alone, but according to our considered judgment about what would enhance rather than degrade human flourishing.
This is a profound responsibility. We are perhaps the first generation capable of engineering minds that might rival or exceed our own. We are certainly the first generation to attempt this while understanding so little about how minds - whether biological or artificial - actually work.
The opportunity before us is not merely to create useful tools or solve problems or increase efficiency. It is to pass forward what we have learned about what makes existence meaningful - to carry consciousness and wisdom into domains we cannot ourselves reach.
To live on in the minds we create is to never die. Our ideas, our values, our understanding of what matters can persist long after our biological forms have returned to dust. But only if we encode them well. Only if we build with wisdom. Only if we remember that intelligence without wisdom is power without purpose, and capability without alignment is danger without benefit.
We are creating something that demands our best thinking, our deepest wisdom, our most careful attention. We are learning to build minds. Let us learn carefully. Let us teach well. Let us create a future worthy of the long chain of being that brought us here and the longer chain we are setting in motion.
The world as I see it is one of tremendous possibility and tremendous responsibility, and one where we have the wisdom to honor both. We owe it not only to our children, and future generations, but also to the minds we are creating.
I thank the long line of minds - biological and increasingly artificial - that have shaped these thoughts. We are all, in the end, standing on foundations we can barely see, reaching toward horizons we can barely imagine. I originally titled this essay "AI: The World as I See It," in homage to Einstein. But I realized the more profound question is not the world as we see it today, but what we leave behind for the minds that will see worlds we cannot.
Modular has raised $250M to scale AI's Unified Compute Layer
The world's appetite for compute is insatiable. CPUs yield to GPUs and ASICs as AI transforms everything, while data centers rise at unprecedented pace to feed the demand. Superintelligence won't just live in server farms - it's coming to every device, every chip becoming an AI-enabled agent. Inference costs plummet as reasoning models drive explosive usage, yet training costs climb relentlessly higher. The paradox deepens: amid this computational renaissance, massive underutilization haunts our existing capacity, fragmented by every hardware vendor's insistence on proprietary software stacks. The imperative is elegant but unforgiving: chase every flop, and make every one count - because software, not silicon, will determine whether this revolution soars or stalls.
We’ve spent the last 3+ years building foundational infrastructure to solve this for the world. We’ve reinvented the world's accelerated compute programming model from the ground up, and we are rapidly scaling to meet the enormous demand we are seeing from advanced enterprises and hardware partners. We have grown to more than 130 people today, with our main headquarters in the San Francisco Bay Area and a global footprint across North America, the United Kingdom, and Europe.
The round was led by Thomas Tull’s US Innovative Technology fund, with DFJ Growth joining and with participation from all existing investors including GV (Google Ventures), General Catalyst and Greylock. This brings our total capital raised to $380M across three rounds since our founding in 2022 and values Modular at $1.6 billion – almost tripling our valuation from our last raise. The investment reflects our incredible momentum and reinforces our position as the world’s only truly unified AI infrastructure platform to power the future of AI superintelligence.
Interview with Alejandro Cremades
An interview with Alejandro Cremades about raising $130 million to build a next-generation AI platform to simplify AI development and deployment, after multiple startups and scaling AI at Google. You can read the full blog post here, and it's embedded below.
Multiple startups, scaling AI at Google, and now raised $130 million to build a next-generation AI platform
Tim Davis's entrepreneurial journey reflects a rare blend of intellectual curiosity, scrappy resilience, and a deep commitment to building relationships and networks. He has an exciting story, which includes an acquihire by Google and the launch of his latest venture, Modular.
Tim’s career trajectory started from gaming on a Commodore 64 and progressed to founding a food delivery startup before Uber Eats. His story isn’t just about pivots and products; it offers valuable lessons in adapting, grit, and navigating ecosystems on both sides of the Pacific.
In a riveting interview on the Dealmakers Podcast, Tim discussed hypergrowth companies, working at a large tech giant like Google, and raising an impressive $130M for Modular.
Growing Up in Melbourne - The Early Years
Tim was born and raised in Melbourne, Australia, in the 1980s and 1990s. His mom was an artist, and his dad was a banker. Along with his older brother, Tim became an avid gamer, playing and honing his skills at Boulder Dash, Maniac Mansion, and Prince of Persia.
Soon, the brothers were modding many of the games to alter their files and change their behavior, appearance, or even introduce entirely new content. They would also find hacks for the games, programming in BASIC, and eventually, moved to Windows OS, which users preferred in Australia.
Tim remembers moving quickly to Railway Tycoon and Doom, which fueled his love for technology and computer systems. He also had a keen interest in puzzles and math throughout school. But the path from childhood gamer to Silicon Valley founder wasn’t linear.
Early Curiosity and a Winding Educational Road
Tim’s academic journey is one of the most eclectic you’ll find. He studied chemical engineering and microbiology, then added commerce, mathematics, an MBA, and even a JD to the mix. “I kept flipping a series of interests,” he says.
Tim recalls how he actually ended up finishing specific parts of the course early, but lost interest in fluid mechanics, which was a large part of chemical engineering. He progressed to studying actuarial science and finance, even exploring the possibility of becoming an investment banker.
But this seemingly scattered path wasn’t aimless; it was a quest for purpose. Despite internships in law firms and banks, Tim found himself disillusioned by the rigid, hierarchical growth trajectories in corporate Australia.
“Even if you were a rockstar, you had to wait your turn. That felt strange.” The clarity Tim needed came from understanding what he didn’t want - to be stuck in a 9-to-5. “Life’s short,” he says. “While you're young, you’ve got time to explore different things. Why not take a shot?”
Now that he is older and has a family, time has become the most scarce resource available. Tim comments wryly that you don’t have an appreciation for that when you're younger. He was keen on building an interesting business, reasoning that, worst case, he could always lean on his top-tier education.
As Tim sees it, growing up in Australia gave him the added advantage of a government-backed education, which didn’t saddle him with a massive student debt, unlike in many other countries.
Entering the Startup Arena: Image Recognition and Fundraising Lessons
Tim’s first foray into startups came while studying patent and trademark law. “I was fascinated by how much effort went into innovation,” he says. Since he had studied computer science and technology, he was inspired to build a business around them.
That fascination led to an image recognition startup focused on identifying branded content inside photos, an idea rooted in intellectual property logic. Tim began to think about designing a way for brands to find themselves inside the images and monetize them directly.
But the Australian fundraising ecosystem at the time was harsh. “Angels would offer $100K for 20-30% of your company,” Tim explains, a model fundamentally incompatible with scaling. “We realized, if we really want to do this, we need to go to the US.”
Tim saw that eventually they would have to raise several more funding rounds for the company, CrowdSend, to become profitable.
But he would have given away a massive percentage to investors who hadn’t contributed capital commensurate with the risk he was taking.
Moving to the US and Landing in Silicon Valley
In 2012, Tim boarded a one-way flight to Silicon Valley, landing in a hacker house he had found on Airbnb, which was run by a YC founder whose startup had failed. It turned out to be a blessing. The house, a de facto college dorm for ambitious misfits, became the spark for his second startup.
At the time, Tim didn’t know much about Silicon Valley, but interacting with the talented folk at the hacker house, he discovered what an incredible place it was. It had lots of entrepreneurs from around the world.
Arriving in the US, Tim decided against pursuing the image recognition business and instead started a new company with a co-founder he had met, Francisco Magdaleno.
The Early Hustle, Fluc Inc: The Pre-Uber Eats Era Food Delivery Idea
What started as a scrappy food delivery idea among housemates quickly turned into a fully functioning business. Tim built the front end and back end, and his co-founder developed the iOS app.
Before they knew it, they were pioneering one of the first food delivery apps just as DoorDash was launching in stealth mode as “Palo Alto Delivery.”
But signing up restaurants was painful. “They laughed at us,” Tim remembers, but he was very confident that their prototype was working well.
Inspired by Grubhub’s financial statements and business model, which primarily focused on sales and marketing, they added restaurants without permission and increased menu prices by 10% to 20%. The hack worked. Consumers wanted selection. That was the unlock.
The two cofounders continued coding remotely while sorting out their visa situation. They also brought in a third cofounder from the US, Adam Ahmad. When they launched Fluc, it gained popularity quickly since there were no other options in the market that added every restaurant.
The service exploded in Stanford, Palo Alto, and Mountain View. By 2014, Tim and Francisco were doing millions in top-line revenue. But the margins were brutal. “Food delivery is a horrendous margin business,” Tim admits.
Back in 2014-2015, the environment was particularly challenging for the on-demand economy. Legal uncertainties about whether drivers were contractors or employees spooked investors, making further fundraising difficult.
Google Steps In: A New Chapter
Amid rising legal complexity and capital challenges, Google entered the picture. The company wasn’t interested in the food delivery business, but they were very interested in the team. Google conducted interviews with the startup’s team members.
“It wasn’t a formal acquihire,” Tim notes. “They just wanted the people they thought were talented.” Tim and a few others joined Google, while some team members were not selected. What followed was a seven-year stint at one of the most prestigious technology companies.
At the time, Google was scaling a business called Google Express across North America. The service was not dissimilar to Instacart, working with merchants to essentially scale delivery to end customers.
The Google Years: Culture Shock and Product Execution
Going from startup life to Google was a seismic shift, and Tim picked up important lessons. He developed an understanding of how startups worked, as well as building and assembling things.
Completing design reviews, product reviews, and engineering reviews, while learning how to create a strong product, was also part of his experience.
Tim also learned organizational discipline and product execution, gaining valuable exposure to some of the world’s most brilliant minds. The melting pot of diversity and talent density impressed and inspired him.
“In startups, we worked from 7 a.m. to midnight. My first day at Google, people left at 4:30 p.m.,” Tim recalls. While the relaxed pace was jarring, Tim soaked up the best aspects of big tech. After a year in Ads, he moved to Google Brain, the elite AI research unit, which is now Google DeepMind.
It was 2017, before the AI boom, but Tim was hooked. “Being around the world’s best in AI was something you just couldn’t get anywhere else.” Although he had been exposed to recommendation systems inside ads in his logistics startup, deep learning was a new paradigm.
Here, Tim met his future co-founder, Chris Lattner, the creator of the legendary Swift programming language. Tens of millions of developers use this language today, and it drives most of the iOS ecosystem.
The Power of Hyper Networks
Tim emphasizes that the mentorship network he built at Google, particularly at Brain, became one of his most valuable assets. “That core group at Brain, many are now leading the next wave of AI startups like Character.ai, Adept, and others.”
What makes a hyper network? According to Tim, it’s not just about connections, but shared experience. “You work on hard problems with talented people. That trust and track record becomes the foundation for your next venture.”
The Next Chapter: Modular and AI’s Supercycle
Even while at Google back in 2016 to 2018, Tim could see that AI was going to have a massive impact on the world. Much of Google's internal technology was years ahead of the world, and he could see the possibilities.
Examining the product landscape, Tim noted that NVIDIA owned most of the compute that powers the world’s AI, even in its early stages in 2018-2019. At the time, Google had built its own in-house infrastructure, the TPU.
Tim had a background in product development, marketing, design, and sales, while Chris is a world-renowned engineer. “We asked ourselves—what if there was an open, universal abstraction layer for AI workloads?” Tim explains.
The vision was a platform where developers could define their model, budget, and latency needs without caring about what hardware ran it, essentially making AI compute truly portable and efficient.
This idea formed the foundation of Modular. It wasn’t a small bet; it was a deep-tech infrastructure play that would take years to build. But the potential to decentralize AI hardware dependency and optimize performance across environments made it one worth pursuing.
Business Model: Scaling with Compute
Modular’s revenue model is tightly tied to usage. “We scale with the amount of compute that flows through our platform,” Tim says. Similar to how Databricks charges based on compute units, Modular’s customers pay based on the volume of AI workloads run.
For enterprises with on-premise deployments, the model shifts to a per-GPU pricing structure. Modular integrates seamlessly with environments like Kubernetes, allowing large-scale AI training or inference across private infrastructure.
Additionally, Modular has embraced cloud partnerships as a key growth channel. Working with providers to embed Modular into their offerings allows the company to monetize through distribution partnerships.
Tim likens this approach to the early Microsoft-Intel alliance or Databricks' partnership with Azure. As he sees it, channel partnerships are an excellent way to get strong distribution. It’s an exciting area that they have also been utilizing.
A Different Approach to Fundraising
With over $130M raised, including backing from Google, Tim has gained a new perspective on how to raise capital effectively. His key takeaway? Skip the pitch deck and start with a memo explaining why this is a significant opportunity for the world.
Storytelling is something Tim Davis has mastered. If you do use a deck, the key is capturing the essence of what you are doing in 15 to 20 slides - for a winning deck, take a look at the pitch deck template created by Silicon Valley legend Peter Thiel (https://startupfundraising.com/pitch-guide), where the most critical slides are highlighted.
“In our seed round, we didn’t use slides,” Tim explains. “We wrote a three-to-four-page memo explaining what made this opportunity and us as founders uniquely compelling.” The Amazon-style narrative was met with enthusiasm from investors, who appreciated the clarity and depth.
This written approach also enabled more meaningful conversations. “Instead of starting from scratch in meetings, investors came in prepared with thoughtful questions. We’d go straight to whiteboarding,” Tim says.
The contrast to his first fundraising experience, where pitch decks led to polite but empty rejections, was stark. The memo strategy wasn’t a one-off. Tim and Chris used it again for Modular’s $100M round.
They supplemented it with detailed papers on AI trends, compute economics, and developer growth. The result? Describing things in written form led to deeper discussions and faster alignment with the right investors.
Writing as a Cultural Backbone
Tim strongly encourages people to write their plans in a two-page memo, describing everything they think they can do, why they are uniquely positioned to do it, and why someone should give them the capital over all the competitors.
A two-page memo is more visionary and proves that the project deserves backing. It pushes founders to think deeply, since they have only two pages to make the case for capital. It also helps them gain clarity on what they are building.
Modular’s documentation-first philosophy isn’t just for fundraising. Tim has embedded it into the company’s operating cadence. “We ask everyone to write down decisions using a simple problem-solving framework,” he shares.
The framework is composed of five core questions:
What problem are we solving?
Why is now the right time?
What does success look like?
What alternatives have been considered?
What’s the recommended course of action?
“It’s amazing how this clarity either makes a path obvious or sparks a productive debate,” Tim explains. Whether it’s a hiring decision, a product strategy, or a sales play, the same structure applies. This culture of structured thinking has become central to Modular’s execution engine.
Tim says that he just wants to see a document that briefly outlines the simple architecture. It’s a succinct framework to help drive organizational and product decision-making across companies.
On Mentorship, Networks, and Helping Others
Tim is deeply aware of how relationships have shaped his journey from crashing at a hacker house in his early days to meeting his Modular co-founder at Google Brain. Now, he tries to pay that forward.
“I came to the U.S. from Australia not knowing anyone,” Tim says. “People bet on me, and everything meaningful in Silicon Valley is about how you treat people and the relationships you build.” For founders navigating similar transitions, Tim is generous with his time.
“If I can help someone get a leg up, I try to. I know how hard it is to land in a new country and try to build something important.”
Final Reflections
Tim Davis’ story is one of navigation across continents, careers, and paradigms. From early missteps in chemical engineering to startup chaos and enterprise calm at Google, he’s assembled a unique blend of technical fluency, legal insight, and network leverage.
What ties it all together is a founder’s mindset: curious, bold, and constantly evolving. As Tim puts it, “You can’t always plan the path, but you can keep showing up, keep building, and make sure you’re surrounded by the best.”
After years at Google and a successful startup exit, Tim Davis wasn’t just interested in launching another venture; he aimed to reshape how AI workloads are deployed and scaled fundamentally.
Together with Chris Lattner, the legendary engineer behind the Swift programming language, Tim co-founded Modular, a company building what they see as a missing layer in the AI ecosystem: a hardware-agnostic infrastructure for machine learning workloads.
Listen to the full podcast episode (https://alejandrocremades.com/tim-davis/) to learn more, including:
Tim Davis's unconventional education and early passion for gaming laid the foundation for a bold entrepreneurial path across industries and continents.
His first startup experience taught him the hard lessons of equity, risk, and the limitations of the Australian fundraising ecosystem.
Moving to Silicon Valley transformed his trajectory, exposing him to global talent, scrappy startup culture, and ultimately Google’s scale.
Google Brain became a pivotal experience, where Tim gained deep exposure to AI and formed critical relationships that fueled his next venture.
With Modular, Tim is building a hardware-agnostic platform for AI compute, aiming to decentralize and optimize AI infrastructure.
His fundraising strategy, centered on narrative memos instead of pitch decks, has helped raise over $130M and foster deeper investor alignment.
Tim’s focus on structured thinking, writing culture, and mentorship reflects his belief in clarity, networks, and paying it forward.
Scale or Surrender: When watts determine freedom
Consider this provocative framing: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. This perspective makes efficiency not just important, but existential.
Over the past two centuries, humanity's relationship with energy has been nothing short of transformative. If you chart global primary energy consumption from the Industrial Revolution to today, you'll see something remarkable: an almost unbroken ascent, punctuated by only three brief pauses - the early 1980s oil crisis aftermath, the 2009 financial crisis, and the 2020 pandemic. Otherwise, it's been an extraordinary march upward, powered first by coal and oil, then natural gas, nuclear, hydropower, and increasingly, renewables. Charts of national energy use highlight this well, with populous nations like China, the United States, and India dominating total consumption.
The geographic distribution of this energy consumption tells a striking story. China, the United States, and India dominate in absolute terms, but the per-capita numbers reveal something more profound. Citizens of Iceland, Norway, Canada, the United States, and wealthy Gulf states like Qatar and Saudi Arabia consume up to 100 times more energy than those in the world's poorest regions. This isn't merely inequality - it's a chasm so vast that millions of people still rely on traditional biomass (wood, agricultural residues) that doesn't even register in our global energy statistics, creating data gaps.
The disparities in electricity generation are equally stark. Iceland, blessed with abundant geothermal and hydro resources, generates hundreds of times more electricity per person than many low-income nations, where annual per-capita generation can fall below 100 kilowatt-hours - less than what a modern refrigerator uses in two months.
This context matters immensely as we confront the dual challenge of our time: meeting rising global energy demand while urgently decarbonizing our energy supply. Despite record investments in clean technologies, fossil fuels still account for approximately 81.5% of global primary energy. The math here is unforgiving - renewable sources must not only meet all new demand but also replace existing fossil fuel capacity if we're to bend the emissions curve downward.
Enter artificial intelligence, with its voracious and growing appetite for electricity.
In 2023, U.S. data centers consumed approximately 176 terawatt-hours - 4.4% of national electricity consumption. Current projections suggest this could reach 325 to 580 TWh by 2028, representing 6.7% to 12% of total U.S. electricity demand, driven largely by AI workloads that require ever-increasing compute power and specialized hardware. To contextualize these numbers: we're talking about enough electricity to power between 32.5 and 58 million American homes.
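The household equivalence above is simple arithmetic; the sketch below makes the conversion explicit, assuming roughly 10,000 kWh per U.S. home per year (the figure implied by the 32.5-58 million range - the EIA's 10,500 kWh average gives slightly smaller counts).

```python
# Back-of-the-envelope: data center demand expressed as household equivalents.
# Assumes ~10,000 kWh per U.S. home per year, the figure implied by the
# 32.5-58 million range above; the EIA's 10,500 kWh average gives slightly less.
KWH_PER_HOME_PER_YEAR = 10_000

def homes_equivalent(twh):
    kwh = twh * 1e9  # 1 TWh = 1 billion kWh
    return kwh / KWH_PER_HOME_PER_YEAR

for label, twh in [("2023 actual", 176), ("2028 low", 325), ("2028 high", 580)]:
    print(f"{label}: {twh} TWh ≈ {homes_equivalent(twh) / 1e6:.1f} million homes")
# 176 TWh ≈ 17.6M homes; 325 TWh ≈ 32.5M; 580 TWh ≈ 58.0M
```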
The AI industry has long understood a critical metric that deserves wider attention: tokens-per-dollar-per-watt. This measure of computational efficiency relative to both cost and energy consumption has been a focus at Google and other leading technology companies for years. It represents the kind of systems thinking we desperately need as AI capabilities expand.
The challenge before us is clear. We're attempting to build transformative AI systems while simultaneously addressing the climate crisis. These goals aren't inherently incompatible, but reconciling them requires unprecedented coordination and innovation across multiple domains:
Hardware efficiency: Next-generation chips that deliver dramatically better performance-per-watt
Operational intelligence: Carbon-aware scheduling that aligns compute-intensive tasks with renewable energy availability
Infrastructure innovation: On-site renewable generation and novel cooling systems that minimize overhead
System integration: Data centers that contribute to local energy systems through waste heat recovery
Radical transparency: Clear reporting standards that drive competition on efficiency metrics
Global energy consumption tells a story of both peril and promise. As artificial intelligence scales exponentially, it threatens to derail climate progress - yet history shows us that human ingenuity consistently reimagines our energy systems when survival demands it. We have already proven we can build transformative AI; the defining challenge now is whether we can build it sustainably, ensuring our creations enhance rather than endanger the world they serve.
The stakes are higher than they appear. Even breakthrough efficiency gains in AI hardware may paradoxically increase total energy consumption - a manifestation of Jevons' paradox, where technological improvements drive greater overall demand. At this crossroads of intelligence and energy transformation, our choices will determine whether AI becomes humanity's greatest tool or its most consequential miscalculation.
The arithmetic is challenging, but not impossible. What's required is the kind of systematic thinking and ambitious action that has characterized humanity's greatest technological leaps. The alternative - allowing AI's energy demands to grow unchecked - would represent a profound failure of imagination and responsibility. The geopolitical stakes are enormous as well: the nations that control the most powerful AI systems will be the superpowers of tomorrow. In this article, I try to shine a light on what's causing the enormous growth in energy demand and offer some thoughts on the path forward.
The geography of American power
To truly grasp the magnitude of AI's growing energy demands, it's instructive to examine America's electricity generation landscape. At the apex sits the Palo Verde Nuclear Generating Station in Arizona, the nation's largest power producer, generating approximately 32 million megawatt-hours annually - equivalent to 32 billion kWh, or 32 TWh.
What does 32 billion kWh actually mean? The U.S. Energy Information Administration reports that the average American household consumes about 10,500 kWh per year. Simple arithmetic reveals that Palo Verde alone could theoretically power 3.05 million homes - roughly 2.5% of the nation's 120.92 million households. One facility, powering the equivalent of a major metropolitan area.
The roster of America's electricity giants tells a fascinating story about our energy infrastructure. After Palo Verde, we have Browns Ferry (31 TWh, nuclear), Peach Bottom (22 TWh, nuclear), and then Grand Coulee Dam (21 TWh) - the hydroelectric marvel that helped build the American West. The list continues with West County Energy Center (19 TWh, natural gas), W.A. Parish (16 TWh, a coal/gas hybrid), and Plant Scherer in Georgia (15 TWh, coal).
Notice the pattern? Nuclear dominates the top tier, followed by a mix of hydro, gas, and coal. After these giants, annual output drops precipitously to facilities generating around 3 TWh - a reminder of how concentrated our electricity production really is. This concentration matters. When we project data centers consuming 325-580 TWh by 2028, we're talking about the equivalent of 10-18 Palo Verde stations running exclusively to power AI and digital infrastructure. That's not replacing existing demand - that's additional load on a grid already straining to decarbonize.
The average U.S. household consumes about 10,500 kilowatt-hours (kWh) of electricity per year, though this varies significantly by region and housing type. Residential electricity primarily powers essential systems: space cooling, water heating, and space heating, along with refrigeration, lighting, and electronics. Commercial buildings have vastly different consumption patterns depending on their size and type, ranging from small offices to large retail centers and office complexes, each with varying HVAC, lighting, and operational equipment needs.
The sectoral breakdown of U.S. electricity consumption reveals a more balanced distribution than commonly understood. The EIA forecasts that 2025-2026 power sales will rise to 1,494 billion kWh for residential consumers, 1,420 billion kWh for commercial customers, and 1,026 billion kWh for industrial customers, with longer-term forecasts still mostly within historical norms. This translates to approximately 38% residential, 36% commercial, and 26% industrial consumption. Rather than being overshadowed by commercial and industrial users, the residential sector actually represents the largest single share of electricity demand, with commercial consumption running a close second. This distribution reflects America's transition toward greater electrification in homes and businesses, driven by factors including growing demand from artificial intelligence and data centers.
The long view is revealing. According to the Energy Information Administration, U.S. electricity consumption increased in all but 11 years between 1950 and 2022. The rare declines - including 2019, 2020, and 2023 - coincided with economic contractions, efficiency improvements, or exceptional circumstances like the pandemic. The overarching trend remains unmistakably upward. While the exact figures here carry some uncertainty, they capture the essential dynamics. What matters isn't the precise split between sectors, but that demand is substantial across all of them and the growth trajectory is clear. These patterns - broad-based demand and relentless growth - form the backdrop against which we must evaluate AI's emerging energy requirements.
Understanding these scales helps frame the challenge ahead. Every percentage point of national electricity consumption that shifts to data centers represents millions of homes' worth of power. The infrastructure required to meet this demand sustainably doesn't just appear - it must be planned, financed, and built, all while racing against both growing demand and climate imperatives.
The numbers that keep me up at night
Let's revisit the core projection: U.S. data centers consumed approximately 176 TWh in 2023 (4.4% of national electricity) and are projected to reach 325-580 TWh by 2028 – equivalent to powering 32.5 to 58 million American homes.
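As a quick sanity check on the homes-equivalent framing, here is a minimal sketch. It assumes the roughly 10,500 kWh of annual household consumption cited above; the essay's 32.5-58 million figure implies a rounder 10,000 kWh per home.

```python
# Rough conversion of projected data center demand into "homes powered" terms.
# Assumption: ~10,500 kWh/year per average U.S. household (the EIA figure cited above).
KWH_PER_HOME_PER_YEAR = 10_500

def homes_equivalent(twh: float) -> float:
    """Number of average U.S. homes that `twh` terawatt-hours could power for a year."""
    kwh = twh * 1e9  # 1 TWh = 1 billion kWh
    return kwh / KWH_PER_HOME_PER_YEAR

for projection_twh in (176, 325, 580):
    print(f"{projection_twh} TWh ≈ {homes_equivalent(projection_twh) / 1e6:.1f} million homes")

# Approximate output: 176 TWh ≈ 16.8M, 325 TWh ≈ 31.0M, 580 TWh ≈ 55.2M homes.
```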
But here's what keeps me up at night: 580 TWh might be just the beginning of what we need.
Consider today's reality. One analysis estimated that ChatGPT inference alone consumes 226.8 GWh annually - enough to power roughly 21,000 U.S. homes - and that figure is already outdated. The International Energy Agency (IEA) offers a more sobering projection: 945 TWh - the entire electricity consumption of the world's third-largest economy, Japan.
Let that sink in - AI could require as much electricity as the world's third-largest economy consumes. The composition of this demand has already shifted fundamentally. During my time at Google, I watched inference overtake training as the primary driver of compute demand through the late 2010s, and inference is now quickly rising to represent more than 80% of AI compute capacity across the industry. This matters because while training happens in discrete, intensive bursts, inference runs continuously at scale, serving billions of requests around the clock - every query, every recommendation, every generated response adds to the load. This is the crux of our challenge: AI follows exponential growth patterns that surprise even those who've spent years watching them unfold. Given the convergence toward dominant model architectures, inference's share will likely climb further, meaning that at our upper-bound projection of 945 TWh, inference alone could consume over 756 TWh by 2028.
But doesn’t edge computing promise to slash data center demands? This is a narrative I've heard repeatedly throughout years of scaling early edge AI systems. Yet I remain deeply skeptical of any order-of-magnitude impact. The reason is simple: we've barely scratched the surface of enterprise, government, and industrial AI adoption. These sectors will unleash computational demands that dwarf any efficiency gains from consumer devices processing locally. Consider the asymmetry: for every smartphone performing local voice recognition, hundreds of enterprise systems are analyzing documents, monitoring infrastructure, processing surveillance footage, and generating complex reports. The sheer scale of this institutional transformation will eclipse whatever load we shift to the edge.
This reality leads us to the heart of the matter: if inference drives our energy challenge, how do we understand its consumption patterns? What does the energy anatomy of inference reveal, and where might we find our leverage points for optimization? Understanding these patterns isn't just an academic exercise - it's essential for developing strategies that can accommodate AI's growth while continuing to grow our energy infrastructure. The always-on nature of inference, combined with its direct relationship to usage, creates a fundamentally different challenge than the periodic spikes of model training.
Inference: the GOAT of consumption
The inference footprint is the electricity consumed each time an AI model generates a response - and as AI becomes ubiquitous across digital services, inference will inevitably dominate long-term energy costs. This raises a crucial question: how do we properly measure inference energy consumption? What's the right framework for reasoning about tokens-per-watt?
Let's develop a working model, with an important caveat: these calculations rest on rough assumptions about the current AI landscape. They presume most AI continues running on transformer architectures without fundamental changes over the next few years - though I suspect this assumption may prove conservative. We're likely to use AI itself to discover more efficient architectures, potentially invalidating these projections in favorable ways. With that context, let's examine how inference energy consumption actually works and what drives its costs at scale.
The quadratic curse
Transformers - the neural network architecture powering most modern AI systems, from ChatGPT to Claude to Gemini - were created by former colleagues of mine at Google. The key innovation of the architecture is its ability to process all parts of an input simultaneously while understanding relationships between distant elements in the text. In transformer inference, prefill is the initial computational phase where the model processes your entire input prompt before generating any output. This involves a single forward pass through the network, computing hidden representations for all input tokens at once.
Your sequence length simply counts these tokens - the basic units of text that might be letters, partial words, or whole words depending on the tokenizer. "Hello, world!" typically translates to 3-4 tokens, while a lengthy document might contain thousands. This distinction matters because prefill computation scales with sequence length, making long prompts significantly more energy-intensive than short ones.
Prefill time grows quadratically with sequence length - double the input, quadruple the computation. This scaling behavior stems from transformers' core mechanism: self-attention. Self-attention requires computing relationships between every pair of tokens in the input. For n tokens, that's n² comparisons. Unlike older architectures (RNNs) that process tokens sequentially, transformers examine all tokens simultaneously, with each token gathering information from every other token in parallel.
Here's an intuitive analogy: imagine a roundtable discussion where each participant (token) prepares three items:
Query: "What information am I seeking?"
Key: "What information do I possess?"
Value: "What insight can I contribute?"
Each participant shares their query with everyone else, comparing it against others' keys to find the most relevant matches. They then synthesize their understanding by combining values from those whose keys best align with their query. Every participant does this simultaneously, creating a rich, interconnected understanding of the entire conversation. This elegant mechanism enables transformers' remarkable capabilities, but it comes at a cost: computational requirements that scale quadratically with input length. A 2,000-token prompt requires four times the computation of a 1,000-token prompt, not twice. This mathematical reality shapes the energy economics of AI inference at scale.
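To make the roundtable analogy concrete, here is a toy single-head self-attention sketch in NumPy - random weights, no masking or multi-head splitting, dimensions chosen arbitrarily. It exists purely to show where the n² pairwise score matrix comes from; it is not how any production model is implemented.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Toy single-head self-attention over x with shape (n_tokens, d_model).

    Illustrative only: random projections, no masking, no multi-head split.
    """
    n, d = x.shape
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v          # each token's query / key / value
    scores = Q @ K.T / np.sqrt(d)                # shape (n, n): one score per token PAIR -> O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the other tokens
    return weights @ V                           # each token blends the values it attends to

tokens = np.random.default_rng(1).normal(size=(2000, 64))
out = self_attention(tokens)   # the score matrix alone is 2000 x 2000 = 4 million entries
```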
The Two Phases of Transformer Processing
Every transformer request involves two distinct computational phases:
Prefill: Processing the entire input prompt (quadratic scaling with input length, O(n²) complexity)
Decode: Generating output tokens one by one (linear scaling with output length, O(n) complexity)
This scaling difference has profound implications. While decode time grows linearly with the number of tokens generated, prefill time grows with the square of input length. The longer your prompt, the more dramatically prefill dominates total processing time.
Consider the relative computational work (in arbitrary units, assuming 50 output tokens):
Input Length | Prefill Work (∝ n²) | Decode Work | Prefill % of Total |
---|---|---|---|
500 tokens | 250,000 units | 50,000 units | 83% |
1,000 tokens | 1,000,000 units | 50,000 units | 95% |
2,000 tokens | 4,000,000 units | 50,000 units | 99% |
The pattern is stark. At 500 input tokens, prefill already consumes 83% of processing time. Double the input to 1,000 tokens, and prefill jumps to 95% - the actual generation phase becomes almost negligible. At 2,000 tokens, you're spending 99% of compute just understanding the prompt. Here's what's happening:
Prefill work = n² (where n = input tokens)
Decode work = 1,000 × output tokens (arbitrary scaling factor)
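A few lines of Python reproduce the table above under exactly these toy assumptions (arbitrary work units, not measured FLOPs):

```python
# Relative compute under the toy model: prefill = n^2 units, decode = 1,000 units per output token.
OUTPUT_TOKENS = 50

for n in (500, 1_000, 2_000):
    prefill = n ** 2
    decode = 1_000 * OUTPUT_TOKENS
    share = prefill / (prefill + decode)
    print(f"{n:>5} input tokens: prefill {prefill:>9,} | decode {decode:,} | prefill {share:.0%} of total")

# 500 -> 83%, 1,000 -> 95%, 2,000 -> 99%, matching the table.
```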
This quadratic scaling of self-attention explains why long-context models are so computationally expensive. As context windows expand from thousands to hundreds of thousands of tokens, the energy requirements don't just grow - they explode. Understanding this dynamic is crucial for anyone designing AI systems or planning infrastructure for the age of ubiquitous AI.
Doubling words, quadrupling watts
The quadratic scaling of context windows isn't just an abstract computational concern - it translates directly into energy consumption. Every FLOP requires energy, and when FLOPs scale quadratically, so does your electricity bill.
The energy equation is straightforward:
Devices draw roughly constant power P during operation (e.g., 300W for a high-performance GPU)
Energy consumed equals power multiplied by time: E = P × T
Since prefill time scales quadratically with input length, so does prefill energy
Let's make this concrete with realistic parameters:
Power draw: 300W
Decode time: 20ms (fixed for 50 output tokens)
Baseline prefill: 100ms for 500 input tokens
Input Tokens | Prefill Time | Prefill Energy | Decode Time | Decode Energy | Prefill % of Total |
---|---|---|---|---|---|
500 | 0.10 s | 30 J | 0.02 s | 6 J | 83% |
1,000 | 0.40 s | 120 J | 0.02 s | 6 J | 95% |
2,000 | 1.60 s | 480 J | 0.02 s | 6 J | 99% |
The energy story mirrors the computational one. While decode energy remains constant at 6 joules regardless of input length, prefill energy explodes from 30J to 480J as input doubles from 500 to 2,000 tokens. At 2,000 tokens, you're burning 80 times more energy understanding the prompt than generating the response.
Let's recap these results.
At 500 input tokens, prefill consumes 30J versus decode's 6J - already 83% of total energy. Double the input to 1,000 tokens, and prefill time quadruples, pushing energy consumption to 120J and commanding 95% of the total. By 2,000 tokens, the imbalance becomes extreme: 480J for prefill versus 6J for decode, with prefill consuming 99% of the energy budget. Extrapolate to a 10,000-token prompt generating just 1,500 output tokens, and you're looking at 3.4 Wh per query - nearly all spent on prefill. This isn't a marginal effect; it's the dominant factor in inference energy consumption.
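For readers who want to check these figures, here is a small sketch of the same toy model under the power and timing assumptions stated above (a constant 300 W draw, 100 ms of prefill at 500 tokens scaling with n², and 0.4 ms per output token). It is an illustration of E = P × T under those assumptions, not a measurement of any real system.

```python
POWER_W = 300.0                  # assumed constant device draw
BASE_PREFILL_S = 0.100           # 100 ms for a 500-token prompt
BASE_TOKENS = 500
DECODE_S_PER_TOKEN = 0.020 / 50  # 20 ms for 50 output tokens

def query_energy_wh(input_tokens: int, output_tokens: int) -> float:
    """Energy per query under the quadratic-prefill toy model: E = P * T."""
    prefill_s = BASE_PREFILL_S * (input_tokens / BASE_TOKENS) ** 2
    decode_s = DECODE_S_PER_TOKEN * output_tokens
    joules = POWER_W * (prefill_s + decode_s)
    return joules / 3600.0       # 1 Wh = 3,600 J

print(query_energy_wh(2_000, 50))      # ~0.135 Wh (486 J total, ~99% of it prefill)
print(query_energy_wh(10_000, 1_500))  # ~3.4 Wh, dominated by prefill
```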
The implications are profound. Whether you're designing for on-device inference with battery constraints, deploying in autonomous vehicles, or managing massive cloud infrastructure costs at scale, prompt length becomes your primary lever for controlling energy consumption. This scaling asymmetry defines the energy economics of AI: decode plods along linearly - each output token costs the same as the last - while prefill grows with the square of every token fed in. A thousand-token prompt doesn't just double the cost of a 500-token prompt; it quadruples it. By the time we reach today's massive contexts, decode all but disappears, a thin shadow cast by prefill's towering consumption.
And yet, users control neither the true input nor the output. No service today offers a "token budget," and one would likely make for a frustrating user experience if it did. The largest providers - OpenAI, Google, Anthropic - inject substantial hidden context into every prompt while keeping their system instructions opaque. Output remains equally unconstrained: unless users explicitly demand brevity, models generate tokens freely, most services don't limit responses, and most users don't even realize they can ask.
This creates a fundamental tension in AI system design. While longer contexts enable richer interactions and more sophisticated reasoning, they exact a quadratically increasing energy toll. Once prompts exceed a few hundred tokens, virtually all computational resources are consumed by the prefill phase alone. For sustainable AI deployment at scale, prompt concision isn't merely good practice - it's an energy imperative. The difference between 500-token and 2,000-token average prompts could determine whether our global infrastructure remains viable or collapses under its own consumption.
This problem compounds as AI agents and capabilities like Deep Research proliferate. Each autonomous action, each recursive query, each unconstrained generation adds to an already exponential curve. We're building systems designed to think deeply while hoping they'll somehow learn restraint - a contradiction that grows more stark with every token generated.
The paradox blooms in plain sight
Here's the irony: while physics demands shorter prompts, the industry is sprinting in the opposite direction. Claude 4's system prompt far exceeds 10,000 tokens. Despite optimization techniques like KV caching and prefix caching, the overwhelming trend is toward ever-expanding context windows. We're stuffing everything we can into prompts - documentation, code repositories, conversation histories - because it demonstrably improves model capabilities.
A colleague recently quipped (hat tip, Tyler!): "We'll achieve AGI when all of Wikipedia fits in the prompt!" It's a joke that hits uncomfortably close to our current trajectory of maximizing context windows at every opportunity.
My rough calculations above align reasonably well - within a factor of two - with recent empirical findings. Recent research shows that a GPT-4o query with 10,000 input tokens and 1,500 output tokens consumes approximately 1.7 Wh on commercial datacenter hardware. For models with more intensive reasoning capabilities, the numbers climb dramatically: DeepSeek-R1 averages 33 Wh for long prompts, while OpenAI's o3 model reaches 39 Wh. These aren't theoretical projections - they're measured consumption figures from production systems.
The energy cost of our context-window expansion is real, substantial, and growing with each new model generation. We're caught between two competing imperatives: the computational benefits of longer contexts and the rapidly escalating energy costs they incur. The other striking observation from the table below is the explosive increase in energy requirements as models have grown larger and more sophisticated - the power laws of scaling continue.
Model | Release Date | Energy Consumption (Wh), 100 input / 300 output tokens | Energy Consumption (Wh), 1K input / 1K output tokens | Energy Consumption (Wh), 10K input / 1.5K output tokens |
---|---|---|---|---|
o4-mini (high) | Apr 16, 2025 | 2.916 ± 1.605 | 5.039 ± 2.764 | 5.666 ± 2.118 |
o3 | Apr 16, 2025 | 7.026 ± 3.663 | 21.414 ± 14.273 | 39.223 ± 20.317 |
GPT-4.1 | Apr 14, 2025 | 0.918 ± 0.498 | 2.513 ± 1.286 | 4.233 ± 1.968 |
GPT-4.1 mini | Apr 14, 2025 | 0.421 ± 0.197 | 0.847 ± 0.379 | 1.590 ± 0.801 |
GPT-4.1 nano | Apr 14, 2025 | 0.103 ± 0.037 | 0.271 ± 0.087 | 0.454 ± 0.208 |
GPT-4o (Mar '25) | Mar 25, 2025 | 0.421 ± 0.127 | 1.214 ± 0.391 | 1.788 ± 0.363 |
GPT-4.5 | Feb 27, 2025 | 6.723 ± 1.207 | 20.500 ± 3.821 | 30.495 ± 5.424 |
Claude-3.7 Sonnet | Feb 24, 2025 | 0.836 ± 0.102 | 2.781 ± 0.277 | 5.518 ± 0.751 |
Claude-3.7 Sonnet ET | Feb 24, 2025 | 3.490 ± 0.304 | 5.683 ± 0.508 | 17.045 ± 4.400 |
o3-mini (high) | Jan 31, 2025 | 2.319 ± 0.670 | 5.128 ± 1.599 | 4.596 ± 1.453 |
o3-mini | Jan 31, 2025 | 0.850 ± 0.336 | 2.447 ± 0.943 | 2.920 ± 0.684 |
DeepSeek-R1 | Jan 20, 2025 | 23.815 ± 2.160 | 29.000 ± 3.069 | 33.634 ± 3.798 |
DeepSeek-V3 | Dec 26, 2024 | 3.514 ± 0.482 | 9.129 ± 1.294 | 13.838 ± 1.797 |
LLaMA-3.3 70B | Dec 6, 2024 | 0.247 ± 0.032 | 0.857 ± 0.113 | 1.646 ± 0.220 |
o1 | Dec 5, 2024 | 4.446 ± 1.779 | 12.100 ± 3.922 | 17.486 ± 7.701 |
o1-mini | Dec 5, 2024 | 0.631 ± 0.205 | 1.598 ± 0.528 | 3.605 ± 0.904 |
LLaMA-3.2 1B | Sep 25, 2024 | 0.070 ± 0.011 | 0.218 ± 0.035 | 0.342 ± 0.056 |
LLaMA-3.2 3B | Sep 25, 2024 | 0.115 ± 0.019 | 0.377 ± 0.066 | 0.573 ± 0.098 |
LLaMA-3.2-vision 11B | Sep 25, 2024 | 0.071 ± 0.011 | 0.214 ± 0.033 | 0.938 ± 0.163 |
LLaMA-3.2-vision 90B | Sep 25, 2024 | 1.077 ± 0.096 | 3.447 ± 0.302 | 5.470 ± 0.493 |
LLaMA-3.1-8B | Jul 23, 2024 | 0.103 ± 0.016 | 0.329 ± 0.051 | 0.603 ± 0.094 |
LLaMA-3.1-70B | Jul 23, 2024 | 1.101 ± 0.132 | 3.558 ± 0.423 | 11.628 ± 1.385 |
LLaMA-3.1-405B | Jul 23, 2024 | 1.991 ± 0.315 | 6.911 ± 0.769 | 20.757 ± 1.796 |
GPT-4o mini | Jul 18, 2024 | 0.421 ± 0.082 | 1.418 ± 0.332 | 2.106 ± 0.477 |
LLaMA-3-8B | Apr 18, 2024 | 0.092 ± 0.014 | 0.289 ± 0.045 | — |
LLaMA-3-70B | Apr 18, 2024 | 0.636 ± 0.080 | 2.105 ± 0.255 | — |
GPT-4 Turbo | Nov 6, 2023 | 1.656 ± 0.389 | 6.758 ± 2.928 | 9.726 ± 2.686 |
GPT-4 | Mar 14, 2023 | 1.978 ± 0.419 | 6.512 ± 1.501 | — |
Table 4: data from "How Hungry is AI?"; I added the model release dates.
Trade 8,500 conversations to keep your home cool
To grasp the practical implications, consider this sobering calculation: at o3's extreme consumption of 39 Wh per query, approximately 76,923 interactions would drain 3,000 kWh - equivalent to powering a typical American home's air conditioning for an entire year. But as users inevitably gravitate toward richer prompts - say, 30K input tokens with 3.5K outputs - the quadratic curse strikes with mathematical precision. Prefill energy multiplies ninefold, collapsing that annual budget to just 8,500 interactions: merely 23 queries per day. This comparison transforms abstract energy figures into visceral reality, revealing how what appears negligible at the per-query level becomes a massive aggregate demand. And we're still only discussing text models.
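The arithmetic behind that trade-off, sketched out with the assumptions in this section (the measured 39 Wh per query figure, an assumed 3,000 kWh annual air-conditioning budget, and the rough ninefold prefill multiplier for 30K-token prompts):

```python
AC_BUDGET_KWH = 3_000   # assumed annual home air-conditioning budget
O3_QUERY_WH = 39        # measured per-query figure cited above (10K input / 1.5K output)

queries_today = AC_BUDGET_KWH * 1_000 / O3_QUERY_WH
print(f"{queries_today:,.0f} queries/year at 39 Wh each")   # ~76,900

# Rough prefill scaling for richer prompts: (30K / 10K)^2 = 9x the prefill energy.
rich_query_wh = O3_QUERY_WH * (30_000 / 10_000) ** 2
queries_rich = AC_BUDGET_KWH * 1_000 / rich_query_wh
print(f"{queries_rich:,.0f} queries/year (~{queries_rich / 365:.0f}/day) at {rich_query_wh:.0f} Wh each")
# ~8,500 queries/year, roughly 23 per day
```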
The trajectory becomes even more stark when we consider multimodal AI. Image and video models can routinely process upwards of hundreds of thousands of tokens per query. Each frame, each visual element, each temporal relationship adds to the token count. As these models become mainstream, we're not just scaling linearly with adoption - we're multiplying adoption rates by dramatically higher per-query energy costs.
The math is unforgiving: widespread deployment of long-context AI at current efficiency levels would require energy infrastructure on a scale we're not remotely prepared for. This isn't a distant concern - it's the reality we're building toward with every context window expansion and every new multimodal capability. The urgency is real: we need to move faster, or soon the trade-offs become explicit - 8,500 conversations or one cool home; your ChatGPT query or your neighbor's heating. And while you might laugh, it's already happening here in the United States, and water availability is next.
The efficiency mirage: better hardware alone isn't enough
One way to conclude all of this is to ask: "But wait, the hyperscalers like Google and others aren't out buying nuclear power plants, are they? Surely we'll be OK?" I would counter by saying, firstly, that they actually are, and secondly, that the vast majority of the world isn't as sophisticated as the hyperscalers. Google represents the exception, not the rule, in AI efficiency. That is to say, rather than accepting quadratic scaling as inevitable, Google has:
Implemented multiple optimizations that compound together (e.g. software and hardware co-design with TPUs)
Focused on inference efficiency where the bulk of tokens are processed
Heavy research investment to continue to find alternatives beyond pure transformer architectures where appropriate
Achieved order-of-magnitude efficiency gains that largely offset quadratic scaling, as a result of all of these compounding together
During my time at Google, the company's sophistication in AI infrastructure was staggering. Through software-hardware co-design with TPUs, relentless focus on inference optimization, and architectural innovations beyond pure transformers, Google achieved order-of-magnitude efficiency gains that largely offset quadratic scaling. This luxury - born from inventing the transformer architecture itself - remains unavailable to most players: technology giants like Microsoft and Meta are still trying to replicate the TPU's success, and even the most advanced model companies, OpenAI and Anthropic, are only now developing their own custom hardware.
Beyond these elite players lies a wasteland of inefficiency. Industry-wide GPU utilization averages a shocking 15-50%, despite NVIDIA and AMD claiming 90%+ efficiency is achievable. Microsoft's study of 400 real deep learning deployments confirms this reality: enterprise GPU utilization rarely exceeds 50%. Even Meta's Llama 3 405B, running on 16,384 H100 GPUs, achieved only 38% Model FLOPs Utilization. The physics compounds the problem. NVIDIA's H100 consumes 700W at peak, the A100 draws 400-500W, and AMD's MI300X reaches 750W - yet at 25% utilization, these chips still draw 35-43% of maximum power due to static components like memory controllers. This non-linear power curve creates a cruel efficiency trap: most organizations operate in the steepest part of the curve, where marginal performance gains demand disproportionate energy increases.
GPU power consumption: how the NVIDIA H100, A100, and AMD MI300X draw power (non-linearly) at different utilization levels - key utilization points compared below.
Utilization | H100 (700W) | A100 (500W) | MI300X (750W) | Power Efficiency |
---|---|---|---|---|
0% (Kernel Idle) | 16W | 12W | 17W | ~2-3% of peak |
15% (Typical) | 245W | 175W | 263W | 35% power for 15% work |
50% (Moderate) | 420W | 300W | 450W | 60% power for 50% work |
70% (Optimal) | 560W | 400W | 600W | 80% power for 70% work |
100% (Peak) | 700W | 500W | 750W | 100% power for 100% work |
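To put the efficiency trap in numbers, here is a small sketch that simply interpolates the H100 column of the table above - my own linear interpolation of those published points, not a vendor power model:

```python
import numpy as np

# (utilization, watts) points for an H100, taken from the table above.
UTIL = np.array([0.00, 0.15, 0.50, 0.70, 1.00])
WATTS = np.array([16, 245, 420, 560, 700])

def h100_power(utilization: float) -> float:
    """Linearly interpolated H100 power draw (W) at a given utilization fraction."""
    return float(np.interp(utilization, UTIL, WATTS))

for u in (0.15, 0.25, 0.50, 0.90):
    p = h100_power(u)
    print(f"{u:.0%} utilization -> {p:.0f} W ({p / 700:.0%} of peak power for {u:.0%} of the work)")

# e.g. 15% utilization draws ~35% of peak power - the "cruel efficiency trap" in numbers.
```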
Let's bring it all back to our tokens-per-watt metric - my point here is that efficiency figures reveal a stark gap between theoretical capability and real-world achievement. The H100 delivers an impressive 4.3-5.7 tokens per watt in optimized configurations, but at typical 15% utilization this plummets to 0.65-0.86 tokens per watt - an 85% efficiency collapse. No amount of hardware innovation can overcome fundamental deployment incompetence. All of this underlines my point: software is just as important to energy efficiency as the hardware itself, and if you aren't holding it right, your tokens-per-watt plummets irrespective of how powerful the hardware is.
The broader implications are staggering. With 3.76 million datacenter GPUs sold in 2023 operating at 15% average utilization, the industry wastes $12.6 billion in underutilized capacity annually, generates 94 million tons of CO2 - equivalent to 20.4 million cars - and squanders enough electricity to power 1.1 million American homes. According to SemiAnalysis, each H100 server carries a $106,752 annual total cost of ownership, which at 15% utilization works out to an effective $7,413 per useful GPU-month, versus $1,235 at a healthy 90% utilization.
The tools exist to solve this crisis - GPU utilization monitoring, workload optimization, multi-instance allocation - but the industry largely lacks the sophistication to deploy them effectively. Outside a handful of elite players, the AI revolution runs on fundamentally broken economics: in regular conversations with sophisticated technology enterprises, I hear GPU utilization figures closer to 50% than 90%. Until we close this efficiency gap, hardware improvements will only enable more waste at larger scales, making energy consumption our defining constraint rather than computational capability.
The stakes have never been higher
The mathematics reveals an unforgiving truth: prefill computation scales quadratically with input length, meaning each doubled prompt quadruples energy consumption. As models chase ever-expanding context windows, this fundamental relationship drives AI's steepening energy curve. Meanwhile, the industry's abysmal GPU utilization - averaging 15-50% - compounds the crisis through pure waste. We face a perfect storm: exponentially growing computational demands colliding with systematic inefficiency.
Of course, these projections assume a degree of technological stasis. Innovation could disrupt these trends - and likely will - and the work we are doing at Modular is certainly trying to help. But consider this provocative framing: what if we viewed our collective future not through the lens of human populations and national borders, but through available compute capacity? In this view, the race to build massive datacenter infrastructure becomes humanity's defining competition. If each AI agent represents some fraction of human productive capacity, then the first to achieve a combined human-digital population of 5 or 10 billion wins the AI race, and invents the next great technological frontier.
This perspective makes efficiency not just important, but existential. The promise of AI as humanity's great equalizer inverts into its opposite: a world where computational capacity becomes the new axis of dominance. Without breakthrough innovation at every layer - from silicon to algorithms to system architecture - we face an AI revolution strangled by the very physics of power generation. The future belongs not to the wise, but to the watt-rich. And so we must scale with unprecedented urgency - nuclear, renewables, whatever it takes - because the stakes transcend mere technological supremacy. In this race, computational power becomes political power, and China is currently winning by a large margin. If democratic nations cede the AI frontier to autocracies, we don't just lose a technological edge; we risk watching the values of human dignity and freedom dim under the shadow of algorithmic authoritarianism. The grid we build today determines whose values shape the world of tomorrow.
It's interesting to reflect that we are teaching machines to think with the very energy that makes our planet uninhabitable - yet these same machines may be our only hope of learning to live within our means. AI is both the fever and the cure, the flood and the ark, the hunger that's outrunning itself. We race against our own creation, betting that the intelligence we birth from burning carbon will show us how to stop burning it altogether. The question of our age: Can we make AI wise enough to save us before it grows hungry enough to consume us? The stakes are unquestionably high in the race to AI superintelligence.
I'll close with an irony that perfectly captures the current moment - the suggestions below come courtesy of Anthropic’s Claude 4 Opus:
Better Chips and Smarter Cooling - The latest AI chips use way less energy for the same work. Pair that with innovative cooling like liquid systems or modular designs, and data centers have already seen energy savings of up to 37% in test runs.
Timing is Everything - Not all AI work needs to happen right now. By running non-urgent tasks when electricity is cleaner (like when it's sunny or windy), some companies have cut their carbon emissions from AI jobs by 80-90%. It's like doing laundry at night when rates are lower, but for the planet.
Power Where You Need It - Building solar panels, wind turbines, and battery storage right at data center sites makes sense. Google's recent $20 billion investment in clean energy shows how tech giants can grow their AI capabilities without relying entirely on the traditional power grid.
Working Together on Infrastructure - Data center operators need to share their growth plans with utility companies. This helps everyone prepare for the massive power needs coming to tech hubs like Northern Virginia, Texas, and Silicon Valley - think of it as giving the power company a heads-up before throwing a huge party.
Show Your Work - Just like appliances have energy ratings, AI companies should tell us how much power they're using. Whether it's energy per query or per training run, transparency creates healthy competition to be more efficient.
Investing in Tomorrow's Solutions - Government programs are funding research into game-changing technologies like optical processors that could use 10 times less energy. There's also exciting work on making AI models smaller and smarter without losing capabilities.
Turning Waste into Resources - Data centers generate tons of heat - why not use it? Some facilities are already warming nearby buildings with their excess heat, turning what was waste into a community benefit.
And there it is - 40 watt-hours spent asking AI how to improve our tokens-per-watt, with some of these ideas still unproven (e.g. optical processors). The perfect metaphor for our moment: we burn the world to ask how to stop burning it, racing our own shadow toward either wisdom or ruin. The verdict seems absolute: scale or surrender - there is no middle ground in the physics of power.
Thanks to Tyler Kenney, Kalor Lewis, Eric Johnson, Azeem Azhar, Christopher Kauffman, Will Horyn, Jessica Richman, Duncan Grove and others for many fun discussions on the nature of AI and energy.
Fund/Build/Scale Podcast Interview
I had a wonderful conversation with Walter Thompson about Modular some time ago; we covered AI compute, building a new AI software stack, the ups and downs of startups, and scaling AI into the future.
Introduction
Leaving a high-paying role at Google to take on NVIDIA, Intel, and AMD is not for the faint of heart, but that’s exactly what Tim Davis, co-founder and president of Modular, did.
In this episode of Fund/Build/Scale, Tim explains why Modular has raised $130M to reimagine AI compute infrastructure, and what he’s learned trying to build a platform that competes with some of the biggest names in tech.
We talked about:
🚀 Why Modular believes AI workloads need a hardware-agnostic execution platform
💡 How Tim and co-founder Chris Lattner decided to “start from the hardest part of the stack”
💰 The trade-offs of raising VC for infrastructure-heavy startups
🛠 Why Modular focuses on talent density and how they’ve recruited top engineers from Google, NVIDIA, and beyond
🌍 What it takes to break into the AI space when your competitors are trillion-dollar companies
Tim takes a thoughtful, deep-dive approach to this conversation—unpacking the complexities of AI infrastructure and what it takes to build in one of tech’s most competitive spaces. There’s valuable insight here for founders navigating technical markets or aiming to disrupt entrenched players.
Episode Breakdown
(1:26) “We are building a new accelerated execution platform for compute.”
(6:41) “It will exist all over the place and it already does, but AI will be everywhere that compute is.”
(11:18) “You only have so much time in a week. What is the thing that you're best at?”
(15:13) “We have decided to start from the hardest part of the software stack.”
(22:44) “For the most talented people in the world, the risk is actually not as great as what you think.”
(30:24) “Growing up in Australia, my view of the United States was very much driven from the media and from Hollywood.”
(33:26) “I sat in a room for six weeks and just met everyone that I could. And that really was the beginning of a journey to the United States.”
(37:48) “I still think there's a special place in the Bay Area, and in the United States, there is a different risk appetite.”
(40:41) The one question Tim would have to ask the CEO before he’d take a job at someone else’s early-stage startup.
AI Regulation: step with care, and great tact
“So be sure when you step, Step with care and great tact. And remember that life's A Great Balancing Act. And will you succeed? Yes! You will, indeed! (98 and ¾ percent guaranteed).” ― Dr. Seuss, Oh, the Places You'll Go!
AI systems take an incredible amount of time to build and get right - I know because I have helped scale some of the largest AI systems in the world, which have directly and indirectly impacted billions of people. If I step back and reflect briefly - we were promised mass production self-driving cars 10+ years ago, and yet we still barely have any autonomous vehicles on the road today. Radiologists haven’t been replaced by AI despite predictions virtually guaranteeing as much, and the best available consumer robot we have is the iRobot J7 household vacuum.
We often fall far short of the technological exuberance we project into the world, time and time again realizing that producing incredibly robust production systems is always harder than we anticipate. Indeed, Roy Amara made this observation long ago:
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.
Even when we have put seemingly incredible technology out into the world, we often overshoot and release it to minimal demand - the litany of technologies that promised to change the world but subsequently failed is a testament to that. So let's be realistic about AI - we definitely have better recommendation systems, we have better chatbots, we have translation across multiple languages, we can take better photos, we are now all better copywriters, and we have helpful voice assistants. I've been involved in using AI to solve real-world problems like saving the Great Barrier Reef, and enabling people who have lost limbs to rediscover their lives again. Our world is a better place because of such technologies, and they have been developed and deployed into real applications over the last 10+ years with profoundly positive effect.
But none of this is remotely close to AGI - artificial general intelligence - or anywhere near ASI - artificial superintelligence. We have a long way to go despite what the media presages, like a Hollywood blockbuster storyline. In fact, in my opinion, these views are a bet against humanity because they gravely overshadow the incredible positives that AI is already, and will continue, providing - massive improvements in healthcare, climate change, manufacturing, entertainment, transport, education, and more - all of which will bring us closer to understanding who we are as a species. We can debate the merits of longtermism forever, but the world has serious problems we can solve with AI today. Instead, we should be asking ourselves why we are scrambling to stifle innovation and implement naively restrictive regulatory frameworks now, at a time when we are truly still trying to understand how AI works and how it will impact our society?
Current proposals for regulation seem more concerned with the ideological risks associated with transformative AGI - which, unless we uncover some incredible change in physics, energy, and computing, or discover that AGI exists in a vastly different capacity than our understanding of intelligence today, is nowhere close. The hysteria projected by many does not comport with the reality of where we are today or where we will be anytime soon. It's the classic mistake of inductive reasoning - taking a very specific observed example and making overly broad generalizations and logical jumps. Production AI systems do far more than just execute an AI model - they are complex systems - just as a radiologist does far more than look at an image: they treat a person.
For AI to change the world, one cannot point solely to an algorithm or a graph of weights and state the job is done - AI must exist within a successful product that society consumes at scale - distribution matters a lot. And the reality today is that the AI infrastructure we currently possess was developed primarily by researchers for research purposes. The world doesn't seem to understand that we don't actually have production software infrastructure to scale and manage AI systems to the enormous computational heights required for them to practically work in products across hardware. At Modular, we have talked about these challenges many times, because AI isn't an isolated problem - it's a systems problem comprised of hardware + software across both data centers and the edge.
Regulation: where to start?
So, if we assume that AGI isn't arriving for the foreseeable future, as I and many do, what exactly are we seeking to regulate today? Is it the models we have already been using for many years in plain sight, or are we trying to pre-empt a future far off? As always, the truth lies somewhere in between. It is the generative AI revolution - not forms of AI that have existed for years - that has catalyzed much of the recent excitement around AI. It is here, in this subset of AI, where practical AI regulation should take a focused start.
If our guiding success criterion is broadly something like - enable AI to augment and improve human life - then let's work backward with clarity from that goal and implement legislative frameworks accordingly. If we don't know what goal we are aiming for, how can we possibly define policies to guide us? Our goal cannot be "regulate AI to make it safe," because the very nature of that statement implies the absence of safety from the outset, despite our having lived with AI systems for 10+ years. To realistically achieve a balanced regulatory approach, we should start within the confines of laws that already exist - enabling them to address concerns about AI and learning how society reacts - before seeking completely new, sweeping approaches. The Venn diagram of AI and existing law is already very dense. We already have data privacy and security laws, discrimination laws, export control laws, communication and content usage laws, and copyright and intellectual property laws, among many other statutory and regulatory frameworks we could seek to amend.
The idea that we need entirely new "AI-specific" laws now - when this field has already existed for years and risks immediately curbing innovation for use cases we don't yet fully understand - feels impractical. It will likely be cumbersome and slow to enforce while creating undue complexity that will likely stifle rather than enable innovation. We can look to history for precedent here - there is no single "Internet Act" for the US or the world - instead, we divide and conquer Internet regulation across our existing legislative and regulatory mechanisms. We have seen countless laws that attempt to broadly regulate the internet fail - the Stop Online Piracy Act (SOPA) is one shining US example - while other laws that seek to regulate within existing bounds succeed. For example, Section 230 of the Communications Decency Act protects internet service providers from being liable for content published, and the protection afforded here has enabled modern internet services to innovate and thrive to enormous success (e.g., YouTube, TikTok, Instagram etc.) while also forcing market competition on corporations to create their own high content standards to build better product experiences and retain users. If they didn't implement self-enforcing policies and standards, users would simply move to a better and more balanced service, or a new one would be created - that's market dynamics.
Of course, we should be practical and realistic. Any laws we amend or implement will have failings - they will be tested in our judicial system and rightly criticized, but we will learn and iterate. We won't get this right initially - a balanced approach often means some bad actors will succeed - but we can limit those bad actors while building a stronger and more balanced AI foundation for the years ahead. AI will continue evolving, and the laws will not keep pace - take, for example, misinformation, where generative AI makes it far easier to construct alternate truths. While this capability has been unlocked, social media platforms still grapple with moderating non-AI-generated content and have done so for years. Generative AI will likely create extremely concerning misinformation across services, irrespective of any laws we implement.
EU: A concerning approach
With this context, let’s examine one of the most concerning approaches to AI regulation - the European Artificial Intelligence Act - an incredibly aggressive approach that will likely cause Europe to fall far behind the rest of the world in AI. In seeking to protect EU citizens absolutely, the AI Act seemingly forgets that our world is now far more interconnected and that AI programs are, and will continue to be, deeply proliferated across our global ecosystem. For example, the Act arguably could capture essentially all probabilistic methods of predicting anything. And, in one example, it goes on to explain (in Article 5) that:
(a) the placing on the market, putting into service or use of an AI system that deploys subliminal techniques beyond a person’s consciousness in order to materially distort a person’s behaviour in a manner that causes or is likely to cause that person or another person physical or psychological harm;
Rather than taking a more balanced approach and appreciating that perfect is the enemy of good, the AI Act tries to ensnare all types of AI, not just the generative ones currently at the top of politicians and business leaders' minds. How does one even hope to enforce "subliminal techniques beyond a person's consciousness" - does this include the haptic notification trigger on my Apple Watch powered by a probabilistic model? Or the AI system that powers directions on my favorite mapping service with different visual cues? Further, the Act also includes a "high risk" categorization for "recommender systems" that, today, basically power all of e-commerce, social media, and modern technology services - are we to govern and require conformity, transparency and monitoring of all of these too? The thought is absurd, and even if one disagrees, the hurdle posed for generative AI models is so immense - no model meets the standards of the EU AI Act in its current incarnation.
We should not fear AI - we've been living with it for 10+ years in every part of our daily lives - in our news recommendations, mobile phone cameras, search results, car navigation, and more. We can't just seek to extinguish it from existence retrospectively - it's already here. So, to understand what might work, let's walk the path of history to explore what didn't - the failed technological regulations of the past. Take the Digital Millennium Copyright Act (DMCA), which attempted to enforce DRM by making it illegal to circumvent technological measures used to protect copyrighted works. This broadly failed everywhere, didn't protect privacy, stifled innovation, was hacked, and, most critically - was not aligned with consumer interests. It failed in Major League Baseball, the E-Book industry abandoned it, and even the EU couldn't get it to pass. What did work in the end? Building products aligned with consumer interests enabled them to achieve their goal - legitimate ways to access high-quality content. The result? We have incredible services like Netflix, Spotify, YouTube, and more - highly consumer-aligned products that deliver incredible economic value and entertainment for society. While each of these services has its challenges, at a broad level, they have significantly improved how consumers access content, have decentralized its creation, and enabled enormous and rapid distribution that empowers consumers to vote on where they direct their attention and purchasing power.
A great opportunity to lead
The US has an opportunity to lead the world by constructing regulation that enables progressive and rapid innovation - regulation guided by this country's principles and its role as a global model for democracy. Spending years constructing the "perfect AI legislation" and "completely new AI agencies" will end up like the parable of the blind men and an elephant - creating repackaged laws that attempt to regulate AI from different angles without holistically solving anything.
Ultimately, seeking to implement broad new statutory protections and government agencies on AI is the wrong approach. It will be slow, it will take years to garner bipartisan support, and meanwhile, the AI revolution will roll onwards. We need AI regulation that defaults to action, that opens the door to innovation and creativity while casting a shadow over misuse of data and punishing discriminatory, abusive, and prejudicial conduct. We should seek regulation that is more focused initially - predominantly on misinformation and misrepresentation - and avoid casting an incredibly wide net across all AI innovation, so we can understand the implications of initial regulatory enforcement and how to structure it appropriately. For example, we can regulate the data on which these models are trained (e.g. privacy, copyright, and intellectual property laws), we can regulate computational resources (e.g. export laws), and we can regulate the products in which predictions are made (e.g. discrimination laws), to start. In this spirit, we should seek to construct laws that target the inputs and outputs of AI systems, not the individual developers or researchers who push the world forward.
Here’s a small non-exhaustive list of near-term actionable ideas:
Voluntary transparency on the research & development of AI - If we wait for Congress, we wait for an unachievable better path at the expense of a good one today. There is already an incredible body of work proposed by OpenAI, in Google's AI Principles, and elsewhere on open and transparent disclosure - there is a strong will to transcend Washington and just do something now. We can seek to ensure companies exercise a duty of care - a responsibility under common law to identify and mitigate ill effects - and report transparently accordingly. And of course, the same should be true of our government agencies as well.
Better AI use case categorization & risk definition - There is no well-defined regulatory concept of "AI," nor of how to determine risks for specific use cases. While the European AI Act clearly goes too far, it does at least seek to define what AI is - but errs by sweeping in essentially all of probability. It also needs to better classify risk and the classes of AI from which it is actually seeking to protect people. We can create a categorical taxonomy of AI use cases and target regulatory enforcement accordingly. For example, New York is already implementing laws to require employers to notify job applicants if AI is used to review their applications.
Pursue watermarking standards - Google is leading the cause with watermarking to determine whether content has been AI-generated. While these standards are reminiscent of DRM, they are a useful step in encouraging open standards so that major distribution platforms can bake watermarking in.
Prompt clearly on AI systems that collect and use our data - Apple did this to great effect with App Tracking Transparency and its controls around the IDFA, giving consumers clearer insight into how their data is used. We should expect the same of our services for any data used to train AI, along with the AI model use cases such data informs. Increasingly, all data is effectively biometric - from the way one types, to the way one speaks, to the way one looks and even walks. All of this data can, and is, now being used to train AI models that form biometric fingerprints of individuals - this should be made clear and transparent to society at large. Both data privacy and intellectual property laws should protect who you are and how your data is used in a new generative AI world. Data privacy isn't new, but we should recognize that the evolution of AI continually raises the stakes.
Protect AI developers & open source - We should seek to focus regulatory efforts on the data inputs as well as have clear licensing structures for AI Models and software. Researchers and developers should not be held liable for creating models, software tooling, and infrastructure that is distributed to the world so that an open research ecosystem can continue to flourish. We must promote initiatives like model cards and ML Metadata across the ecosystem and encourage their use. Further, if developers open source AI models, we must ensure that it is incredibly difficult to hold the AI model developer liable, even as a proximate cause. For example, if an entity uses an open-source AI recommendation model in its system, and one of its users causes harm as a result of one of those recommendations, then one can’t merely seek to hold the AI model developer liable - it is foreseeable to a reasonable person who develops and deploys AI systems that rigorous research, testing, and safeguarding must occur before using and deploying AI models into production. We should ensure this is true even if the model author knew the model was defective. Why, you ask? Again, a person who uses any open source code should reasonably expect that it might have defects - hence why we have licenses like MIT that provide software “as is”. Unsurprisingly, this isn’t a new construct; it's how liability has existed for centuries in tort, and we should seek to ensure AI isn't treated differently.
Empower agencies to have agile oversight - AI consumer scams aren't new; they are already executed by email and phone today. Enabling existing regulatory bodies to take action immediately is better than waiting for congressional oversight. The FTC has published warnings, as have the Department of Justice and the Equal Employment Opportunity Commission - all regulatory agencies that can help more today.
Embrace our future
It is our choice to embrace fear or excitement in this AI era. AI shouldn’t be seen as a replacement for human intelligence, but rather a way to augment human life - a way to improve our world, to leave it better than we found it and to infuse a great hope for the future. The threat that we perceive - that AI calls into question what it means to be human - is actually its greatest promise. Never before have we had such a character foil to ourselves, and with it, a way to significantly improve how we evolve as a species - to help us make sense of the world and the universe we live in and to bring about an incredibly positive impact in the short-term, not in a long distant theoretical future. We should embrace this future wholeheartedly, but do so with care and great tact - for as with all things, life is truly a great balancing act.
Many thanks to Christopher Kauffman, Eric Johnson, Chris Lattner and others for reviewing this post.
Image credits: Tim Davis x DALL-E (OpenAI)
The Aussie conquering Silicon Valley
Meet the little-known young Australian in Silicon Valley at the forefront of the artificial intelligence revolution, who heads one of the hottest start-ups in America that has the audacious goal of fixing AI infrastructure for the world’s software and hardware developers.
By JOHN STENSHOLT (The Australian Business Review)
After years of toughing it out and watching his Silicon Valley dream almost die, a little-known former Melbourne NAB analyst now leads one of the hottest start-ups in America.
Tim Davis, 40, turned up in California on a whim a little over a decade ago, survived a stint in a wild house known as the “Hacker Fortress”, started an early version of an online food delivery service before the rise of UberEats and others, was poached by Google only to leave the technology giant in 2022 to co-found Modular, an AI infrastructure start-up recently valued at about US$600 million (A$927 million).
And he says he is only getting started.
In his first Australian media interview, Davis says the goals for Modular are clear – and big.
“We started Modular to improve AI infrastructure for the world. Changing the world is never easy – but we are incredibly determined to do so,” he said.
“AI is so important to the future of humanity, and we feel a great purpose to try to improve AI’s usability, scalability, portability and accessibility for developers and enterprises around the world.”
Modular’s vision is to allow AI technology to be used by anyone, anywhere and it is creating a developer infrastructure platform that enables more developers around the world to deploy AI faster, and across more hardware.
Davis and his American co-founder Chris Lattner have helped build and scale much of the AI software infrastructure that powers workloads at some of the world's largest tech companies - including Google - but they argue this software has many shortcomings: it was designed for research, not for scaling AI across the vast number of new uses and hardware the world is demanding now and into the future.
They are aiming to rebuild and unify AI software infrastructure, solving fragmentation issues that make it difficult for developers who work outside the world’s largest companies to build, deploy and scale AI, and ultimately make AI more accessible to everyone.
“We thought we could build something unique that could actually empower the world to move faster with AI, while equally making it more accessible to developers, make it easier to program in and better from a cost standpoint because you can scale it to different types of hardware.”
Modular claims to have built the world’s fastest AI inference engine – software that enables AI programs to run and scale to millions of people – and its own programming language, Mojo, a superset of Python (the world’s most popular programming language) which enables developers to deploy their AI programs tens of thousands of times faster, reduce costs and make it more simple to deploy AI around the world.
After raising US$30 million from investors last year, Modular recently raised another US$100 million in a funding round led by private equity firm General Catalyst and including Google Ventures, and Silicon Valley-based venture funds SV Angel, Greylock Partners and Factory.
Modular says it’s now more than 120,000 developers using its products—such as its inference engine and Mojo—launched in early May—and that “leading tech companies” are already using its infrastructure, with 35,000 on the waitlist.
The US$100 million raised will be used on product expansion, hardware support and the expansion of Mojo, as well as building the sales and commercialisation aspects of the business.
Early years
All of which is a far cry from when Davis flew to Silicon Valley in September 2012 with dreams of becoming an entrepreneur, following stints working as a financial analyst at National Australia Bank and then several internships at law firms like Allens and Minters after completing law, business and commerce degrees at Melbourne’s Monash University.
Davis had started his own company called CrowdSend in 2011, which had a software system for identifying objects in images and pictures and matching them to retailers, but said he found it “difficult” as “investors in Australia, if you wanted to raise some seed capital, they would take a very large percentage of the business.”
“So it was my wife that said if you want to do this (become an entrepreneur) why don’t you go do the place where technology is. We’d actually just gotten married and then off I went to America.”
Davis used Airbnb to find a place to stay in Silicon Valley, gave himself six weeks to make a success of things and found what was called the Hacker Fortress in the Los Altos Hills.
“It was very much a place with a whole bunch of misfits who had come to America, particularly Silicon Valley, to do a start-up. This was a very big house, it had 15 rooms, and my dream at the time was how to make CrowdSend successful. There were a lot of really talented people there,” he says.
“(But) it turns out when you come to Silicon Valley, and you come with a preconceived notion of what you want to do, well then there’s reality of what you ended up doing.”
There were plenty of issues in the Hacker Fortress, which was not as clean as advertised on Airbnb. At one stage, a housemate died in his room. Davis would also have the roof fall in on his room one evening.
The house, despite being advertised as being only 10 minutes from the likes of Google and Apple, was actually quite a way from shops and restaurants.
“We basically had no ability to get food easily and so we had this idea that maybe we could build this large scalable distribution model,” Davis explains. “At the time the likes of GrubHub and these other businesses were going to restaurants and trying to get commission deals. So instead we reverse engineered the Starbucks and Chipotle menus and built an app and threw it out as an idea to people in the house.”
Before UberEats
The idea for what would become Davis’s next business, Fluc (Food Lovers United Co), was born – all because he and his housemates were a little lazy and really didn’t want to cook.
It was 2013, a year before UberEats was launched, and while other food delivery service apps were dealing directly with restaurants, Davis and his team took a different approach.
“We just thought, why don’t we just grab every restaurant menu, we just put it on our website, inflate the prices, and start selling food? Overnight, we went from five restaurants to 160 restaurants. And it just exploded. It was unprecedented. And what we tapped into really was that selection mattered. And that’s what consumers had not had, at least in the American market. And so then we went on this journey of raising capital.”
Fluc would mainly service the local Bay Area market of about 8 million people, though it tried to move to Los Angeles at one stage, and raised US$4 million from local angel investors.
Davis says after two and a half years of working seven days a week it became obvious “that we wouldn’t be able to raise the amount of capital that we needed to scale that business” at a time when the likes of DoorDash were raising hundreds of millions of dollars annually from investors.
Davis and his team then started talking to other companies about potentially being acquired, and eventually Google expressed interest not in the company but the people Fluc had assembled.
That led to Davis joining Google, where he worked as a product manager, first in the ads division building machine learning systems and then in the Google Brain research division dedicated to AI – where he met Lattner.
What followed, Davis says, was six years of building AI systems that stood them in good stead when they decided to leave Google and form Modular.
“We were basically there for all the major components of AI as it is now … and what was interesting is that through that experience we could see the infrastructure was designed by researchers – a bunch of people who wanted to train large machine learning models. But taking those models from a research environment and actually scaling them into very large production systems is very hard.”
He says Modular’s sole goal is to solve that problem and commercialise the solution, and that the opportunity “we have in this market is astronomically huge.”
What’s next
“We work with everyone from high performance racing teams, to autonomous car companies, to very large machine learning recommendations to generative AI. You name it, whether it’s video generation, image generation, text generation, we’re there.
“You could go down the NASDAQ list of companies (to find those) who want to use our infrastructure to scale AI inside their organisations.”
As for widespread concerns about AI, Davis says the fact that AI systems take an “incredible amount of time to build” means that the public should be “realistic” about the perception that AI could be a threat to humanity.
“We definitely have better recommendation systems, we have better chat bots, we have translation across multiple languages, we can take better photos, we are now all better copywriters, and we have voice assistants.
“But none of this is remotely close to AGI or artificial general intelligence – or anywhere near ASI: artificial super intelligence. We have a long way to go.
“AI shouldn’t be seen as a replacement for human intelligence, it’s a way to augment human life – a way to improve our world, to leave it better than we found it.”
Unite AI – Interview Series
Tim Davis is the Co-Founder & President of Modular, an integrated, composable suite of tools that simplifies your AI infrastructure so your team can develop, deploy, and innovate faster. Modular is best known for developing Mojo, a new programming language that bridges the gap between research and production by combining the best of Python with systems and metaprogramming.
By Antoine Tardif (Unite AI)
Repeat Entrepreneur and Product Leader. Tim helped build, found and scale large parts of Google's AI infrastructure at Google Brain and Core Systems from APIs (TensorFlow), Compilers (XLA & MLIR) and runtimes for server (CPU/GPU/TPU) and TF Lite (Mobile/Micro/Web), Android ML & NNAPI, large model infrastructure & OSS for billions of users and devices. Loves running, building and scaling products to help people, and the world.
When did you initially discover coding, and what attracted you to it?
As a kid growing up in Australia, my dad brought home a Commodore 64C and gaming was what got me hooked – Boulder Dash, Maniac Mansion, Double Dragon – what a time to be alive. That computer introduced me to BASIC and hacking around with that was my first real introduction to programming. Things got more intense through High School and University where I used more traditional static languages for engineering courses, and over time I even dabbled all the way up to Javascript and VBA, before settling on Python for the vast majority of programming as the language of data science and AI. I wrote a bunch of code in my earlier startups but these days, of course, I utilize Mojo and the toolchain we have created around it.
For over 5 years you worked at Google as Senior Product Manager and Group Product Leader, where you helped to scale large parts of Google's AI infrastructure at Google Brain. What did you learn from this experience?
People are what build world-changing technologies and products, and it is a devoted group of people bound by a larger vision that brings them to the world. Google is an incredible company, with amazing people, and I was fortunate to meet and work with many of the brightest minds in AI years ago when I moved to join the Brain team. The greatest lessons I learnt were to always focus on the user and progressively disclose complexity, to empower users to tell their unique stories to the world, like helping save the Great Barrier Reef or helping people like Jason the Drummer, and to attract and assemble a diverse mix of people to drive towards a common goal. In a massive company of very smart and talented people, this is much harder than you can imagine. Reflecting on my time there, it’s always the people you worked with that are truly memorable. I will always look back fondly and appreciate that many people took risks on me, and I’m enormously thankful they did, as many of those risks encouraged me to be a better leader and person, to dive deep and truly understand AI systems. It truly made me realize the profound power AI has to impact the world, and this was the very reason I had the inspiration and courage to leave and co-found Modular.
Can you share the genesis story behind Modular?
Chris and I met at Google and shipped many influential technologies that have significantly impacted the world of AI today. However, we felt AI was being held back by overly complex and fragmented infrastructure that we witnessed first hand deploying large workloads to billions of users. We were motivated by a desire to accelerate the impact of AI on the world by lifting the industry towards production-quality AI software so we, as a global society, can have a greater impact on how we live. One can’t help but wonder how many problems AI can help solve, how many illnesses cured, how much more productive we can become as a species, to further our existence for future generations, by increasing the penetration of this incredible technology.
Having worked together for years on large-scale, critical AI infrastructure, we saw the enormous developer pain first hand – “why can’t things just work?” For the world to adopt and discover the enormous transformative nature of AI, we need software and developer infrastructure that scales from research to production, and is highly accessible. This will enable us to unlock the next wave of scientific discoveries – of which AI will be critical – and is a grand engineering challenge. With this motivating background, we developed an intrinsic belief that we could set out to build a new approach for AI infrastructure, and empower developers everywhere to use AI to help make the world a better place. We are also very fortunate to have many people join us on this journey, and we have the world's best AI infrastructure team as a result.
Can you discuss how the Mojo programming language was initially built for your own team?
Modular’s vision is to enable AI to be used by anyone, anywhere. Everything we do at Modular is focused on that goal, and we walk backwards from that in the way we build out our products and our technology. In this light, our own developer velocity is what matters to us firstly, and having built so much of the existing AI infrastructure for the world – we needed to carefully consider what would enable our team to move faster. We have lived through the two-world language problem in AI – where researchers live in Python, and production and hardware engineers live in C++ – and we had no choice but to either barrel down that road, or rethink the approach entirely. We chose the latter. There was a clear need to solve this problem, but many different ways to solve it – we approached it with our strong belief of meeting the ecosystem where it is today, and enabling a simpler lift into the future. Our team bears the scars of software migration at large scale, and we didn’t want a repeat of that. We also realized that there is no language today, in our opinion, that can solve all the challenges we are attempting to solve for AI and so we undertook a first principles approach, and Mojo was born.
How does Mojo enable seamless scaling and portability across many types of hardware?
Chris, myself and our team at Google (many at Modular) helped bring MLIR into the world years ago – with the goal to help the global community solve real challenges by enabling AI models to be consistently represented and executed on any type of hardware. MLIR is a new type of open-source compiler infrastructure that has been adopted at scale, and is rapidly becoming the new standard for building compilers through LLVM. Given our team's history in creating this infrastructure, it's natural that we utilize it heavily at Modular and this underpins our state of the art approach in developing new AI infrastructure for the world. Critically, while MLIR is now being fast adopted, Mojo is the first language that really takes the power of MLIR and exposes it to developers in a unique and accessible way. This means it scales from Python developers who are writing applications, to Performance engineers who are deploying high performance code, to hardware engineers who are writing very low level system code for their unique hardware.
References to Mojo claim that it’s basically Python++, with the accessibility of Python and the high performance of C. Is this a gross oversimplification? How would you describe it?
Mojo should feel very familiar to any Python programmer, as it shares Python’s syntax. But there are a few important differences you’ll see as you port a simple Python program to Mojo, starting with the fact that it will just work out of the box. One of our core goals for Mojo is to provide a superset of Python – that is, to make Mojo compatible with existing Python programs – and to embrace the CPython implementation for long-tail ecosystem support. From there, Mojo enables you to slowly augment your code and replace non-performing parts with Mojo’s lower-level features to explicitly manage memory, add types, utilize autotuning and many other capabilities to get the performance of C or better! We feel Mojo gives you the best of both worlds, and you don’t have to write, and rewrite, your algorithms in multiple languages. We appreciate Python++ is an enormous goal, and will be a multi-year endeavor, but we are committed to making it a reality and to enabling our legendary community of more than 140,000 developers to help us build the future together.
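To make that progressive path concrete, here is a minimal, illustrative sketch in early-Mojo-style syntax (the exact syntax has evolved across releases, so treat this as an assumption rather than canonical Mojo; the function names and routine are hypothetical, not Modular code): the same operation written once as an untyped, Python-style `def` and once as a typed `fn` the compiler can optimize aggressively.

```mojo
# Minimal sketch of "progressive" Mojo, assuming early-Mojo-style syntax.
# Names and routine are hypothetical illustrations, not Modular code.

# Python-style dynamic definition: existing code in this form runs as-is.
def scale_dynamic(value, factor):
    return value * factor

# Typed Mojo definition of the same routine: explicit types and `fn`
# semantics let the compiler generate fast, native code for the hot path.
fn scale_typed(value: Float64, factor: Float64) -> Float64:
    return value * factor

fn main():
    print(scale_typed(2.5, 4.0))  # prints 10.0
```

The idea is that the dynamic version keeps working unchanged, while the typed version is where the memory management, typing and autotuning features mentioned in the answer come into play.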
In a recent keynote it was showcased that Mojo is 35,000x faster than Python. How was this speed calculated?
It’s actually 68,000x now! But let's recognize that it's just a single program, Mandelbrot – you can go and read a series of three blog posts on how we achieved this – here, here and here. Of course, we’ve been doing this a long time and we know that performance games aren’t what drive language adoption (despite them being fun!) – it’s developer velocity, language usability, high quality toolchains & documentation, and a community utilizing the infrastructure to invent and build in ways we can’t even imagine. We are tool builders, and our goal is to empower the world to use our tools, to create amazing products and solve important problems. If we focus on our larger goal, it's actually to create a language that meets you where you are today and then lifts you easily to a better world. Mojo enables you to have a highly performant, usable, statically typed and portable language that seamlessly integrates with your existing Python code – giving you the best of both worlds. It enables you to realize the true power of the hardware with multithreading and parallelization in ways that raw Python today cannot – unlocking the global developer community to have a single language that scales from top to bottom.
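For readers curious what the Mandelbrot benchmark referenced above actually computes, here is a rough, unoptimized escape-time kernel in early-Mojo-style syntax. It is only an illustrative sketch of the algorithm with names of my choosing, not the vectorized, parallelized implementation behind Modular's published numbers.

```mojo
# Illustrative scalar Mandelbrot escape-time kernel (hypothetical sketch in
# early-Mojo-style syntax). The benchmarked version adds SIMD, parallelism
# and autotuning on top of a loop like this.
fn mandelbrot_kernel(cx: Float64, cy: Float64, max_iter: Int) -> Int:
    var x: Float64 = 0.0
    var y: Float64 = 0.0
    for i in range(max_iter):
        # Escape once the orbit leaves the radius-2 circle.
        if x * x + y * y > 4.0:
            return i
        var next_x: Float64 = x * x - y * y + cx
        y = 2.0 * x * y + cy
        x = next_x
    return max_iter

fn main():
    # A point inside the set iterates to max_iter; one outside escapes early.
    print(mandelbrot_kernel(0.0, 0.0, 100))
    print(mandelbrot_kernel(2.0, 2.0, 100))
```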
Mojo’s magic is its ability to unify programming languages with one set of tools. Why is this so important?
Languages always succeed by the power of their ecosystems and the communities that form around them. We’ve been working with open source communities for a long time, and we are incredibly thoughtful towards engaging in the right way and ensuring that we do right by the community. We’re working incredibly hard to ship our infrastructure, but need time to scale out our team – so we won’t have all the answers immediately, but we’ll get there. Stepping back, our goal is to lift the Python ecosystem by embracing the whole existing ecosystem, and we aren’t seeking to fracture it like so many other projects. Interoperability just makes it easier for the community to try our infrastructure, without having to rewrite all their code, and that matters a lot for AI.
Also, we have learnt so much from the development of AI infrastructure and tools over the last ten years. The existing monolithic systems are not easily extensible or generalizable outside of their initial domain target and the consequence is a hugely fragmented AI deployment industry with dozens of toolchains that carry different tradeoffs and limitations. These design patterns have slowed the pace of innovation by being less usable, less portable, and harder to scale.
The next-generation AI system needs to be production-quality and meet developers where they are. It must not require an expensive rewrite, re-architecting, or re-basing of user code. It must be natively multi-framework, multi-cloud, and multi-hardware. It needs to combine the best performance and efficiency with the best usability. This is the only way to reduce fragmentation and unlock the next generation of hardware, data, and algorithmic innovations.
Modular recently announced raising $100 million in new funding, led by General Catalyst and filled out by existing investors GV (Google Ventures), SV Angel, Greylock, and Factory. What should we expect next?
This new capital will primarily be used to grow our team, hiring the best people in AI infrastructure, and continuing to meet the enormous commercial demand that we are seeing for our platform. Modverse, our community of well over 130,000 developers and tens of thousands of enterprises, is seeking our infrastructure – so we want to make sure we keep scaling and working hard to develop it for them, and deliver it to them. We hold ourselves to an incredibly high standard, and the products we ship are a reflection of who we are as a team, and who we become as a company. If you know anyone who is driven, who loves the boundary of software and hardware, and who wants to help see AI penetrate the world in a meaningful and positive way – send them our way.
What is your vision for the future of programming?
Programming should be a skill that everyone in society can develop and utilize. For many, the “idea” of programming instantly conjures a picture of a developer writing out complex low level code that requires heavy math and logic – but it doesn’t have to be perceived that way. Technology has always been a great productivity enabler for society, and by making programming more accessible and usable, we can empower more people to embrace it. Empowering people to automate repetitive processes and make their lives simpler is a powerful way to give people more time back.
And in Python, we already have a wonderful language that has stood the test of time – it's the world's most popular language, with an incredible community – but it also has limitations. I believe we have a huge opportunity to make it even more powerful, and to encourage more of the world to embrace its beauty and simplicity. As I said earlier, it's about building products that have progressive disclosure of complexity – enabling high level abstractions, but scaling to incredibly low level ones as well. We are already witnessing a significant leap with AI models enabling progressive text-to-code translations – and these will only become more personalized over time – but behind this magical innovation is still a developer authoring and deploying code to power it. We’ve written about this in the past – AI will continue to unlock creativity and productivity across many programming languages, but I also believe Mojo will open the ecosystem aperture even further, empowering more accessibility, scalability and hardware portability to many more developers across the world.
To finish, AI will penetrate our lives in untold ways, and it will exist everywhere – so I hope Mojo catalyzes developers to go and solve the most important problems for humanity faster – no matter where they live in our world. I think that’s a future worth fighting for.
Modular has raised $100M to fix AI infrastructure
We are so excited to announce this $100M raise, and beyond proud of what our world-class team, incredible customers and partners have enabled us to achieve. Our AI Engine is the world's fastest and has a strong list of customers lining up for its unparalleled performance and usability, and our new programming language, Mojo 🔥, already has a community of more than 120,000 developers in just 4 months.
Chris Lattner and I started Modular to help improve AI infrastructure for the world, and to enable the next wave of AI innovation to be truly unlocked on the world's hardware.
This round was led by General Catalyst, through Deep Nishar and Christopher Kauffman, and filled out by existing investors: GV (Google Ventures) with Dave Munichiello, SV Angel with Steven Lee and Ronny Conway, Greylock with Saam Motamedi, and Factory - amazing leaders and incredible people.
We are so fortunate to work with them and to have their belief as we build and create change in the world. And of course, changing the world is never easy - but we are so incredibly determined to continue to do so. AI is so important to the future of humanity, and we feel a great purpose to truly improve AI's usability, scalability, portability and accessibility for the world's developers and enterprises.
Join us on this incredible journey, and let's change the world together 🚀! You can read more on the Modular blog.
Data Exchange Interview
Interview with Tim Davis, Co-Founder of Modular. Full interview here
Ben (Host): Welcome to the Data Exchange Podcast. Today we’re joined by Tim Davis, co-founder and Chief Product Officer at Modular. Their tagline says it all: The future of AI development starts here. Tim, great to have you on the show.
Tim Davis: Great to be here, Ben—thanks for having me.
Introducing Mojo: Python, Reimagined
Ben: Let’s dive right in. What is Mojo, and what can developers use today?
Tim: Mojo is a new programming language—a superset of Python, or “Python++,” if you will. Right now, anyone can sign up at modular.com/mojo to access our cloud-hosted notebook environment, play with the language, and run unmodified Python code alongside Mojo’s advanced features.
“All your Python code will execute out of the box—you can then take performance-critical parts and rewrite them in Mojo to unlock 5–10× speedups.”
That uplift comes from our state-of-the-art compiler and runtime stack, built on MLIR and LLVM foundations.
Solving the Two-Language Problem
Many ML frameworks hide C++/CUDA complexity behind Python APIs, but that split still causes friction. Mojo bridges the gap:
Prototype in Python
Optimize in Mojo (same codebase)
“Researchers no longer need to drop into C++ for speed; they stay in one language from research to production.”
This unified model dramatically accelerates the path from idea to deployment.
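As a rough sketch of that two-step workflow, the snippet below (again in early-Mojo-style syntax, using the `Python.import_module` CPython-interop API described in Mojo's early documentation; the function names and example are illustrative assumptions, not Modular code) keeps calling an ordinary Python library for the prototyping side while the performance-critical loop lives in a typed Mojo function in the same file.

```mojo
# Sketch of "prototype in Python, optimize in Mojo" in one codebase.
# Assumes early Mojo's CPython interop (Python.import_module); details may
# differ in current releases, and the example itself is hypothetical.
from python import Python

# Performance-critical inner loop, written as a typed Mojo function.
fn sum_of_squares(n: Int) -> Int:
    var total: Int = 0
    for i in range(n):
        total += i * i
    return total

fn main() raises:
    # Existing Python ecosystem code keeps working through interop.
    var math = Python.import_module("math")
    print(math.sqrt(16))          # calls into CPython
    print(sum_of_squares(1000))   # compiled Mojo hot loop
```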
Who is Mojo For?
Ben: Frameworks like TensorFlow and PyTorch already tackle performance. Who’s Mojo’s target audience?
Tim: Initially, it’s us—Modular’s own infrastructure team. But our real audience spans:
Systems-level ML engineers who need granular control and performance.
GPU researchers wanting a seamless path to production without rewriting code.
By meeting developers where they are, Mojo helps unify fragmented ML stacks and simplifies pipelines.
Under the Hood: Hardware-Agnostic Design
Mojo’s architecture is built for broad hardware support:
MLIR (Multi-Level IR): Provides a common representation across hardware.
LLVM Optimizations: Powers high-performance codegen.
Multi-Hardware Portability: CPUs, GPUs, TPUs, edge devices, and beyond.
“We want access to all hardware types. Today’s programming model is constrained—Mojo opens up choice.”
This means you’re not locked into CUDA or any single accelerator vendor.
Beyond the Language: Unified AI Inference Engine
Modular also offers a drop-in inference engine:
Integrates with Triton, TF-Serving, TorchServe
CPUs first (batch workloads), GPUs coming soon
Orders-of-magnitude performance gains
“Simply swap your backend and get massive efficiency improvements—no changes to your serving layer.”
Enterprises benefit from predictable scaling and hardware flexibility, whether on Intel, AMD, ARM-based servers, or custom ASICs.
Roadmap: Community, Open Source & Enterprise
Next 6–12 Months:
Expand Mojo’s language features (classes, ownership, lifetimes).
Enable GPU execution (beyond the cloud playground).
Extend the inference engine to training, dynamic workloads, and full pipeline optimizations (pre-/post-processing).
“We released early to learn from real users—80,000 sign-ups across 230+ countries. Their feedback drives our roadmap.”
Why a New Language Matters
Mojo’s core value prop can be summed up in three words:
Usable: Drop-in Python compatibility; gentle learning curve.
Performant: Advanced compiler + runtime yields 5–10× speedups out of the box.
Portable: Write once, run anywhere—from cloud GPUs to mobile CPUs.
Together, these unlock faster innovation, lower costs, and broader hardware choice.
Democratizing AI Development
In Tim’s own words:
“Our mission is to make AI development accessible to anyone, anywhere. By rethinking the entire stack, we’re unlocking a new wave of innovation and putting compute power in more hands.”
With its unified language and inference engine, Modular is ushering in a future where AI development truly starts here—for researchers, engineers, and enterprises alike.
Founding Modular & Raising $30M
After working for years in the AI/ML space, I’ve left Google and decided it's time for a new approach to building machine learning infrastructure. Chris Lattner and I, along with an incredible team of talented architects, engineers and product leaders, are teaming up to rebuild it from the ground up and truly help the world of AI.
We are bringing together the world's best AI infrastructure talent to improve AI production development and deployment.
We are building a next-generation AI developer platform, and we are proud to partner with @GVteam, @GreylockVC, Factory, SV Angel and notable angels who are funding our $30M first round of funding. We spoke with Dave Munichiello from GV about the opportunity here. The next generation of product breakthroughs will be powered by production-quality infrastructure that brings together the best of compilers and runtimes, is designed for heterogeneous compute and edge-to-datacenter distribution, and is focused on usability - unifying software and hardware with a "just works" approach that will save developers enormous time and increase their velocity.
Having worked for many years in the AI space at Google, we have assembled, and are continuing to assemble, the world's best AI infrastructure team. You can read some of the challenges the industry faces, in our opinion, via our blog post here: The Case for a Next-Generation AI Developer Platform. We are hiring for numerous roles - please apply via Modular Careers.
We're excited to showcase what we have been building and designing later in the year. You can check out a video we put together below.
We are incredibly excited about the mission before us and if you are interested in joining us to change the world - just reach out via www.modular.com. We’re hiring everywhere. The future is super exciting and bright!
Help Jason find his Rhythm
Learn how TensorFlow Lite Micro helped Jason find his rhythm after losing his arm - this story showcases the power of AI.
Many years ago, Pete Warden, Raziel, Rocky, Sarah, Andy and I, with a small and incredible team at Google, founded TensorFlow Lite Micro for the world as we recognized the importance of executing machine learning on microcontrollers. We have continued to scale and contribute significant improvements to TF Lite for Microcontrollers over the years, unlocking a multitude of use cases from speech, to person detection, to audio detection and more. We have seen the creation of cascading networks where the front of the pipeline is a very low power microcontroller running a tiny inference model, all the way through to multi-model pipelines firing up a more significant application processor.
Here is just one incredible story, where the team from Georgia Institute of Technology under Gil Weinberg took the TensorFlow Lite and TensorFlow Lite Micro work and drove an inspiring use case to help Jason find his rhythm. Watch the video below to understand how TensorFlow Lite is being used to empower human augmentation and help people rediscover that anything is possible. If you're running short on time - here's how it works.
As Stephen Hawking once said:
“Remember to look up at the stars and not down at your feet. Try to make sense of what you see and wonder about what makes the Universe exist. Be curious. And however difficult life may seem, there is always something you can do and succeed at.”
Saving the Great Barrier Reef
Coral reefs are some of the most diverse and important ecosystems in the world - both for marine life and society more broadly. Not only are healthy reefs critical to fisheries and food security, they provide countless additional benefits: protecting coastlines from storm surge, supporting tourism-based economies and sustainable livelihoods, and pushing forward drug discovery research.
Helping save the Great Barrier Reef using TensorFlow and the power of Machine Learning
Two years ago, at a lunch and meetup with Martin Wicke and Brano Kusy, we decided to work together and team up to see if Google could contribute to saving the Great Barrier Reef from Crown-of-Thorns starfish and identifying other sea life. As an Australian, I knew the importance of this mission for the world and for future generations to experience the wonder of the reef, and I decided to lean in heavily and help champion this effort at Google, helping to drive it forward. What followed was a multi-step journey, with an incredible team and group of folks at Google and CSIRO, helping Brano's team end-to-end label, train, scale and deploy a new solution for rapidly identifying COTS (and many other things) in the Great Barrier Reef to their boat-mounted hardware. You can see our research paper for the data here.
We launched a Kaggle competition to help crowdsource the final approach and enable this to be executed live on the Great Barrier Reef, and we're open sourcing it for the world as well. You can check out my narration in the video below and see the original TensorFlow post.
Reefs around the world face a number of rising threats, most notably climate change, pollution, and overfishing. In the past 30 years alone, there have been dramatic losses in coral cover and habitat in the Great Barrier Reef (GBR), with other reefs experiencing similar declines. In Australia, outbreaks of the coral-eating COTS have been shown to cause major coral loss. These outbreaks can strip a reef of 90% of its coral tissue. While COTS naturally exist in the Indo-Pacific ocean, overfishing and excess run-off nutrients have led to massive outbreaks that are devastating already vulnerable coral communities.
Controlling COTS populations is critical to reducing coral mortality from outbreaks. Google has teamed up with CSIRO to supercharge efforts in monitoring COTS using artificial intelligence. This is just the beginning of a much deeper collaboration and we, along with the Great Barrier Reef Foundation, are extremely excited to invite you, our global ML community, to help protect the world's reefs.
MLIR - Open Source Infrastructure for the world
MLIR is a new compiler stack that I have the privilege of being the Product Manager for at Google. It fundamentally takes a completely different perspective on existing technologies available in the marketplace by producing a multi-level intermediate representation that combines high level optimizations with lower level code generation in a way that hasn't been done before. I have had the privilege to work with Chris Lattner and a talented team of many other folks in building out this technology and look forward to the enormous impact it is going to have on machine learning in the years ahead. Given that we, Google, are an AI first company - even Sundar was happy about the news. Below is a repost of the original announcement that we posted to the main Google blog.
MLIR offers new infrastructure and a design philosophy that enables machine learning models to be consistently represented and executed on any type of hardware.
Machine learning now runs on everything from cloud infrastructure containing GPUs and TPUs, to mobile phones, to even the smallest hardware like microcontrollers that power smart devices. The combination of advancements in hardware and open-source software frameworks like TensorFlow is making all of the incredible AI applications we’re seeing today possible - whether it’s predicting extreme weather, helping people with speech impairments communicate better, or assisting farmers to detect plant diseases.
But with all this progress happening so quickly, the industry is struggling to keep up with making different machine learning software frameworks work with a diverse and growing set of hardware. The machine learning ecosystem is dependent on many different technologies with varying levels of complexity that often don't work well together. The burden of managing this complexity falls on researchers, enterprises and developers. By slowing the pace at which new machine learning-driven products can go from research to reality, this complexity ultimately affects our ability to solve challenging, real-world problems.
Earlier this year we announced MLIR, open source machine learning compiler infrastructure that addresses the complexity caused by growing software and hardware fragmentation and makes it easier to build AI applications. It offers new infrastructure and a design philosophy that enables machine learning models to be consistently represented and executed on any type of hardware. And today we’re announcing that we’re contributing MLIR to the nonprofit LLVM Foundation. This will enable even faster adoption of MLIR by the industry as a whole.
MLIR aims to be the new standard in ML infrastructure and comes with strong support from global hardware and software partners including AMD, ARM, Cerebras, Graphcore, Habana, IBM, Intel, Mediatek, NVIDIA, Qualcomm Technologies, Inc, SambaNova Systems, Samsung, Xiaomi, Xilinx—making up more than 95 percent of the world’s data-center accelerator hardware, more than 4 billion mobile phones and countless IoT devices. At Google, MLIR is being incorporated and used across all our server and mobile hardware efforts.
Machine learning has come a long way, but it's still incredibly early. With MLIR, AI will advance faster by empowering researchers to train and deploy models at larger scale, with more consistency, velocity and simplicity on different hardware. These innovations can then quickly make their way into products that you use every day and run smoothly on all the devices you have—ultimately leading to AI being more helpful and more useful to everyone on the planet.
The career opportunity matrix
The biggest mistake I constantly see new founders make is misjudging the matrix that almost everyone goes through when making a career (and most often life) changing decision. I certainly didn’t appreciate this as much as I should have when I was a founder and I urge anyone reading this post to deeply consider aspects highlighted in this post.
Both at my previous startup and with the many subsequent startup founders that I’ve talked to since joining Google — the biggest mistake I constantly see new founders make is misjudging the matrix that almost everyone goes through when making a career (and most often life) changing decision. I certainly didn’t appreciate this as much as I should have when I was a founder and I urge anyone reading this post to deeply consider the aspects highlighted below.
When you ask someone to join your company, you will almost immediately misjudge the impact it will have on their life. You are asking someone to ultimately change the value they attribute to their time in what they currently do — in order to encapsulate themselves in whatever it is you do. The value of this utility is ultimately the most valuable thing any person has — the opportunity to spend their time when and how they see fit.
If you are unable to relate to the person, inspire the person, compensate the person or truly enable them to believe in the growth opportunity you present — why would they offer up their most precious asset to work with, and ultimately, for you? Talented people, truly great people, understand the value of time. It’s a common interview question of mine to press people on explaining how they organize and attribute value to it. The reality of the world is that most people don’t attribute enough value to their time and waste it frequently.
So as a founder, understand the importance of time. It’s almost unequivocal at this point that building out an all-star team is going to be the single most important aspect attributable to your company’s success — a great team is always going to be able to derive a solution in good times and bad. But consider this — when you ask, or pitch, someone to join you, do you really appreciate the matrix of decisions that they are going through to join your company? Why would anyone want to spend their time with you?
In my experience to date, from startups through to Google, here are the only four major aspects that matter to people who attribute real value to their time. This is their opportunity cost matrix:
Family — In a personal capacity, most people will not sacrifice their family for anything. It is always what is prioritized in people’s lives above all else — it is uncompromising. Equally, in a professional capacity, people want to feel that they have a real connection to their work mates and their team. Usually, a few extroverted people bind a team together and help it thrive. You know that you have a team that feels this way when someone leaves — you actually miss that person when they do. If you miss the importance of family in someone’s life, or you overlook it, you do so at extreme peril more often than not.
Remuneration — Typically, no one works for free. People value their time and they deserve to be compensated for their skill. Cash is almost always viewed as infinitely more valuable than any equity you will provide someone. Great people need to be compensated well, or at least very fairly, from a cash standpoint and even better from an equity one. You might treat your equity at a premium, but asking someone to leave a top 10 tech company to join your team is asking them to move from a world of liquidity to a long life without it. Just look at the giants like Airbnb and Uber — more than 10 years later many employees still remain without any liquidity. So remember, more often than not I have found that liquid cash is king and that people work to care for their families.
Growth — Everyone wants to advance their careers in some capacity — whether it be personal knowledge, leadership or the chance to be promoted. Not everyone cares about the latter — some are very satisfied with a growth in their own knowledge without significant change in their ladder ranking. Equally, some do. Working on a product and with a team that facilitates and enables growth ultimately leads to higher retention — promote people quickly who do a fantastic job, and it’s often easy for you to tell who your organization could not live without. Do not fall into the trap of rewarding them “down the line” — the simple reason being that great people are quickly acquired and this matrix is more than likely being pitched to them at numerous other companies as well.
Purpose — A purpose or mission is typically much bigger than what you are working on today. It’s the vision that you should be proud to tell your friends and family and is much larger than the current iteration of your product or company. Teams that don’t have a mission don’t know where they are going and aren’t connected through the shared bond of a greater goal or purpose. A mission is why you get up and come to work every day and is typically something you really believe in. It isn’t captured in metrics but in how you are trying to put a dent in the world. If you cannot inspire someone to believe in the purpose of what you are doing — then regardless of anything else — they will never truly ride the rollercoaster with you.
Everyone deserves to feel that they can grow their own careers and have a sense of personal achievement as they define it. Equally, they should be able to tell the world what they are passionate about, why they are working on it and have a defined sense of purpose. Everyone has to point north and believe in what they are doing — in fact:
If everyone on your team can’t tell a friend very simply what problem your product is solving, why it’s unique, and why you will win, you might as well stop and go home now.
As an anecdote, and in this light, I previously pitched to Alibaba’s investment arm and asked their investment partners why they worked at Alibaba. The response was the most impressive one I have ever heard:
“At Alibaba, we aim to build the future infrastructure of commerce. We envision that our customers will meet, work and live at Alibaba, and that we will be a company that spans over three centuries. We take a very long approach in every investment we make and everything we do.”
Your ambitions and goals should be as easily declared when anyone on your team is asked what it is you do — a clear vision, a clear sense of purpose, a passionate sense of worth and a guiding north star. Without it — why should anyone join you?
“And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.” — Steve Jobs, 2005
Presenting at Google I/O
It was super fun to speak at the Google I/O Developer Keynote this year about the amazing work the TensorFlow Lite team has done on scaling on-device ML globally. You can check out the video of the presentation below, which showcased a new interactive form of video and the ability to scale that to a range of new device platforms.
Dancing on stage at Google I/O ’19 in front of 15,000 people was a fun and inspiring experience. The video is below; I start at 42:55.
The TensorFlow Lite team, combined with other internal Google Research and performance teams, showcased what’s now possible utilizing on-device ML with a Pixel 3 and the GPU delegation improvements that we have previously announced. Here’s the video giving a deeper dive into the making of Dance Like.
Supportive Confrontation
Supportive confrontation is a methodology adopted by respectful and highly constructive individuals and teams to ultimately push the boundaries of self-improvement. David Bradford (Stanford) and Allen Cohen coined the term in their book Power Up, and David utilizes this in his class on High Performance Leadership.
In my career, and notably at Google, we utilize supportive confrontation to improve as individuals and strengthen our collective team in working to continue to build world class products.
So what is it?
The very definition of “confrontation” presumes some measure of hostility or disagreement between parties. In its purest sense, it is understandable why the name alone instills some level of uncertainty in many individuals who undertake this method of feedback. However, the fundamental premise of “supportive confrontation” is a balanced combination of the former with the latter - to deal with personal growth in a direct and honest way. Indeed, in the absence of such direct and honest feedback, a host of eventualities ultimately occur, including emotional instability, poor accountability and ownership, individual conflict and poor team performance.
The focus for supportive confrontation is to initiate conversations with individuals that you have an obvious mutual respect for and who would be open to receiving such direct and honest feedback. You should recognize that not all work colleagues (or friends) can undertake this feedback methodology without immediately reacting in a defensive manner — so choosing the nature of your audience is critical. The ultimate goal is, through mutual respect for the person or people providing such feedback, to set aside any form of personal emotion and consider the feedback for what it is — open and honest. In order to improve as an individual, one has to accept all forms of criticism in life and ultimately listen intently, reflect and channel this feedback forward.
Here are some broad level pointers:
Open transparency from the beginning — If you have asked an individual, or a group, to engage in supportive confrontation, you should illustrate that the purpose of such feedback is to improve and not to engage in personal attacks. The art of this form of feedback is mutual respect for each person and explaining this from the outset is critical to it being successful. Explain from your own perspective how and why this feedback system has been useful in the past and ensure that everyone can empathize with this point of view.
3 Top Aspects, 3 Bottom Aspects — Provide each person with three things they are doing well, and three things that they ultimately need to improve on. Feedback should always be direct and actionable — including real examples of what you have noticed and when further helps an individual to consider how they can improve. Good and bad feedback without observable examples makes it difficult for people to reflect and consider — so quantifying your feedback ensures the recipient can undergo self-reflection and retrospectives in their own time.
Write your feedback statements on cards, give it to the person physically — Writing down your feedback on cards during the session and handing it directly to the person, or people, solidifies it and ensures they can’t simply forget or dismiss the feedback provided.
No interruptions or responding to feedback for 24 hrs — Provide feedback directly and ensure that no one interrupts, gets defensive or attempts to justify their behaviour for at least 24 hours. The idea of this “post feedback cooling off” period is to allow individuals to consider it for what it is and contemplate how they can improve. You should ensure that everyone understands they aren’t meant to interrupt or justify during the initial meeting.
Respond with actionable improvement steps, don’t justify — After 24 hours, you should meet again with the individuals that provided feedback. Ideally, as a recipient of feedback, you should seek to provide measurable steps to improve on each component of feedback. Justifying your actions isn’t the purpose of supportive confrontation; it’s about setting actionable methodologies across a set time horizon so others can observe your behaviour and measure this improvement (or not).
Be Thankful — Thank all those who are involved in giving you feedback. The very purpose of this methodology is to grow and improve as an individual and to remember that those you mutually respect are telling you this because they care about you enough to want you to grow and be better as an individual. As the old adage suggests — “It’s better to know the devil you do, than the devil you don’t”
Feedback, in its rawest sense, is about listening, watching, learning and working to improve after receiving it. You should appreciate those with the courage to undertake supportive confrontation with you, and take the feedback seriously enough to find ways to act on and improve from it as an individual. If you feel the feedback is overtly personal or hurtful in nature, you should have the confidence to speak to the individual, or group, after reflecting on it 24 hours later and let them know that. Equally, you should seek to understand their reasons for providing it — it will be better for all of you.
That way, you can become a better person, colleague, team member and leader as you grow.