Cerebras CEO: "Nobody Wants Slow AI" — A $20B OpenAI Deal, the AWS Architecture Play, and Why This Is Not a Bubble

Bloomberg Tech 2026, San Francisco — June 4, 2026

Two weeks after what Cerebras CEO Andrew Feldman called the largest semiconductor IPO in history, he sat down with Bloomberg's Tom Giles at Bloomberg Tech 2026 in San Francisco to lay out the company's commercial thesis, its partnership architecture with hyperscalers, and his unambiguous view on whether AI infrastructure spending has gotten ahead of itself. The answers were more illuminating than what most investor calls deliver in a full quarter.

The OpenAI and AWS Deals Are the Blueprint, Not the Exception

The most consequential new information in the conversation concerns the structure and scale of Cerebras' commercial momentum heading into the IPO. Feldman confirmed a committed take-or-pay deal with OpenAI "north of $20 billion," signed roughly 45 days before a separate agreement with AWS. Together, these two transactions establish a commercial model that Feldman described as designed to extend to other hyperscalers — with one pointed exception. "We are now engaged in that process of using other people's part for part of the problem and our part for another part of the problem, with all members of the community, other hyperscalers that aren't Nvidia," he said. When pressed, he confirmed: "So everybody but them."

The AWS deal is architecturally interesting and deserves close investor attention because it reveals how Cerebras intends to embed itself into existing cloud infrastructure rather than compete with it head-on. The core insight is a decomposition of the inference workload into two distinct compute problems. The first — called prefill — processes the incoming prompt and is highly parallelizable, meaning existing training-optimized silicon from hyperscalers handles it well. The second — called decode, the generation of the actual answer — is strictly sequential, and that is where Cerebras' chip delivers its performance advantage. "We can use your training in part to do the pre, and we'll use our big chip to do the decode," Feldman explained. "And what we will get is this extraordinary solution." The implication for investors is that Cerebras is not trying to displace hyperscaler infrastructure wholesale; it is inserting itself into the most latency-sensitive, highest-value step of the inference pipeline.
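The prefill/decode split can be illustrated with a toy sketch. Everything in it is hypothetical: the function names, the stand-in arithmetic that plays the role of a model, and the cache structure are assumptions for illustration, not Cerebras or AWS code. It shows only the shape of the workload: prefill handles each prompt token independently and can be fanned out across parallel hardware, while decode must produce tokens one at a time because each step depends on the one before it.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_token(token: int) -> int:
    """Stand-in for per-token prompt processing (hypothetical arithmetic)."""
    return (token * 31 + 7) % 1000


def prefill(prompt: list[int]) -> list[int]:
    """Process all prompt tokens at once; no token depends on another,
    so the work can be fanned out in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(encode_token, prompt))


def decode(cache: list[int], max_new_tokens: int) -> list[int]:
    """Generate tokens one at a time; each step reads the cache that the
    previous step extended, so the loop cannot be parallelized."""
    cache = list(cache)  # copy so the caller's prefill result is untouched
    output = []
    for _ in range(max_new_tokens):
        next_token = (sum(cache) + len(cache)) % 1000
        output.append(next_token)
        cache.append(encode_token(next_token))
    return output


def generate(prompt: list[int], max_new_tokens: int) -> list[int]:
    """Hand off from the prefill stage to the decode stage."""
    cache = prefill(prompt)
    return decode(cache, max_new_tokens)
```

In this sketch the handoff between stages is just a function call; in a real deployment the two stages would presumably run on different hardware, with the prefill result shipped across a network link.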

Speed Is the Product — and the Market Analogy Is Deliberately Blunt

Feldman grounded the speed argument in a Google paper from 2009 showing that even small increases in response latency meaningfully reduce user engagement, retention, and session length, even when users are not consciously aware of the delay. He translated this into a direct market sizing argument: "How big is the market for slow search? How big is the market for dial-up internet?" He framed speed not as a performance specification but as the defining characteristic of a product category. Cerebras claims its inference is more than 15 times faster than competitors'. Peter Steinberger, the designer of OpenClaw, was quoted as saying that using Cerebras "was like giving him Thor's hammer" for coding productivity.

That performance claim remains the central pillar of Cerebras' commercial proposition, and the OpenAI and AWS wins provide meaningful third-party validation. Whether the performance lead is durable as competing architectures evolve is a legitimate question the interview did not address directly.
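A back-of-the-envelope model shows how a decode speedup translates into end-to-end response time. The prefill time, token count, and baseline decode rate below are hypothetical placeholders, not Cerebras benchmarks; only the 15x ratio comes from the company's stated claim.

```python
def response_seconds(prefill_seconds: float, output_tokens: int,
                     decode_tokens_per_second: float) -> float:
    """Total time = one-off prefill time + sequential decode time."""
    return prefill_seconds + output_tokens / decode_tokens_per_second


# Hypothetical baseline: 0.5 s prefill, 100 tokens/s decode, 1,000-token answer.
baseline = response_seconds(0.5, 1000, 100.0)
# Same prefill, decode 15x faster (the ratio Cerebras claims).
accelerated = response_seconds(0.5, 1000, 1500.0)
# End-to-end ratio; smaller than 15 because prefill time is unchanged.
speedup = baseline / accelerated
```

Under these assumed inputs the user-visible gain is below the headline ratio, because the prefill step is not accelerated. The longer the generated answer, the closer the end-to-end figure gets to the decode-only figure.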

$25 Billion Backlog and the Anti-Bubble Case

On the question of whether AI infrastructure spending constitutes a bubble, Feldman made the most direct and empirically grounded argument available to him: Cerebras currently carries a backlog of more than $25 billion in demand that no supplier — including AMD and Nvidia — can fulfill. "The builders are so far behind the demand, it's absurd," he said. His framing of historical bubbles is worth taking seriously. "Historically, bubbles were characterized by a notion of if you build it, they will come," he noted, citing late-1990s fiber buildouts and 1870s railroad construction. "What is unusual about AI right now is the builders are so far behind the demand." He added: "Our customers and their customers are moving at the speed of software, and we're moving at the speed of real estate data centers."

The $25 billion backlog figure, if accurate, represents a significant data point for the sector. Investors should note that Feldman did not break down the composition of that backlog or the timeline over which it is expected to convert to revenue, which is material given the long-cycle nature of data center deployments.

Customer Concentration: One Big Customer, Then a Bigger One

Feldman addressed the customer concentration risk with characteristic directness. Before the OpenAI deal, Cerebras had a $1 billion committed agreement, signed in late 2023, with G42, the UAE-based AI champion. When the company tried to raise capital, investors flagged single-customer dependency. Cerebras then signed OpenAI for more than $20 billion, followed by AWS. "I used to have one and now I still have one. Only it's 20 times bigger," Feldman said. He contextualized this against Nvidia's own concentration profile: "Nvidia did roughly $68 billion last quarter and four customers accounted for half that. That's the world we play in." The point is well taken, though it does not eliminate the concentration risk; it normalizes it within the sector.

He also offered a useful reframe on what single large customers actually represent in practice. G42 is a cloud provider servicing universities, oil companies, and hundreds of other end users across the UAE ecosystem. OpenAI's compute demand ultimately reflects billions of individual end users. The headline customer count understates the actual breadth of end demand being served.

Token Economics Are Maturing Faster Than Expected

On the emerging question of token limits, pricing sensitivity, and enterprise allocation of AI compute, Feldman used a Costco analogy that cuts to the issue efficiently. Early enterprise AI adoption resembled walking every aisle of a warehouse store without a list — wasteful and poorly calibrated. "Microsoft woke up one day and said, tokens are expensive," he noted, describing the realization as obvious in retrospect. "Which other resource do we let everybody use as much as they want? It's just boneheaded from the get go." The market is now learning to differentiate: high-capability frontier models for tasks that justify the cost, open-source alternatives for everything else, with internal allocation reflecting individual productivity levels. Feldman sees this as a healthy and rapid normalization, not a demand-destruction signal.
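The allocation pattern Feldman describes, frontier models where the task justifies the cost and cheaper open-source models elsewhere, can be sketched as a simple routing rule. This is an illustrative assumption, not a description of any company's actual policy: the tier names, the per-token prices, the `needs_frontier` flag, and the per-user budget are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens; not real vendor pricing.
PRICE_PER_MILLION = {"frontier": 15.0, "open_source": 0.5}


@dataclass
class Task:
    name: str
    tokens: int
    needs_frontier: bool  # True when the task justifies the expensive model


def route(task: Task, remaining_budget: float) -> tuple[str, float]:
    """Pick a model tier for the task and return (tier, cost).

    Falls back to the cheaper tier when the frontier cost would exceed
    what is left of the user's budget.
    """
    tier = "frontier" if task.needs_frontier else "open_source"
    cost = task.tokens / 1_000_000 * PRICE_PER_MILLION[tier]
    if tier == "frontier" and cost > remaining_budget:
        tier = "open_source"
        cost = task.tokens / 1_000_000 * PRICE_PER_MILLION[tier]
    return tier, cost
```

A larger `remaining_budget` for more productive users would be one way to express the idea of internal allocation reflecting individual productivity.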

Data Center Bottlenecks and the Industry's Community Relations Failure

Feldman was unusually candid about the AI industry's failure to build community support for data center expansion. The constraint is real (Cerebras' cloud offering is capacity-constrained by data center availability, as are all hyperscalers), but he located a large part of the political resistance in an avoidable own-goal. "We could have been good neighbors. We could have stepped out into these communities and used their processes, their local governments, to gain approval and buy-in." He cited the industry's failure to communicate job creation figures, tax base contributions, and the counterintuitive fact that U.S. data centers consume roughly one-fifth to one-seventh as much water as California's almond growers. "We raced ahead and we didn't think about the communities into which we were putting these data centers," he said flatly. "We blew it."

Cerebras' own response has been to locate capacity in areas with abundant and inexpensive power: West Texas, rural Utah, parts of Louisiana, Niagara, and Canada more broadly. The logic is straightforward — chase power availability rather than proximity to population centers, then move tokens via fiber. It is a pragmatic workaround to a problem the industry created for itself and has not yet solved at scale.

The Specialist vs. Generalist Question Remains the Right One to Ask

On the inevitable question of whether integrated, general-purpose architectures eventually displace specialist silicon, Feldman offered an analytical framework rather than a promotional answer. The outcome, he argued, is determined entirely by the shape of the resource landscape. "If the vein of resources the specialist is targeted at is very large, the specialist crushes it and they win. If the resource landscape is made of lots of little different pockets of resources, the generalist wins." He cited the GPU's dominance in discrete graphics as a specialist win, ARM's defeat of x86 in mobile as another, and the x86 machine's eventual breadth as a generalist win in fragmented use cases. His view is that AI inference — specifically the decode problem — represents a large and structurally distinct workload that justifies a specialist architecture. Whether that vein of resources remains large enough as model efficiency improves and hardware competition intensifies is the central long-term risk to Cerebras' thesis, and Feldman did not directly engage with it.

DruckFin