Cerebras Systems Transcript: CEO Reveals $25 Billion AI Backlog and Explains Why the AI Bubble is a Myth
June 4, 2026 - Bloomberg Tech 2026, San Francisco
The Largest Semiconductor IPO and Solving a 75-Year Chip Problem
Tom Giles: Andrew, just two weeks ago, you had a big event.
Andrew Feldman: Yeah. The biggest IPO of the year so far.
Tom Giles: The largest-ever semiconductor IPO, if my math serves. You've been in the industry a long time. I've been watching it for a very long time. We don't see a lot of chip-related startups, much less chip startups that make it to IPO. How does AI change that?
Andrew Feldman: Well, first, chips are really hard. And so, most of us have either aged out or died. They're not cheap to make. This one here is what we built. [Andrew Feldman holds up the massive Cerebras Wafer-Scale Engine chip to the audience.] It's the largest chip ever made. Thank you. They're clapping for big chips. Yeah, that's a switch. It only cost half a billion dollars and ten years of my life to get it to work. And what we got for that was the fastest AI processor ever built. We solved a problem that had been open in the computer industry for 75 years, which is how to build a big chip. We solved it, we delivered a product, and we were so proud. We announced this in August of 2019, and absolutely nobody cared. Nobody cared. And it took the world a little while to catch up.
Andrew Feldman: Starting in 2025, the AI models got smart enough that people began to use AI. And once people began to use AI, speed mattered. And the way we use AI is through inference. What we do is the fastest inference in the world, not by a little bit, but by more than 15x. And so that's how we ended up in this extraordinary place, going public two weeks ago.
Disaggregating Inference and the AWS Partnership
Tom Giles: And in the run-up to that, you scored some pretty significant validating customer wins, including AWS. I find that relationship fascinating because it really exemplifies the way you handle inference: you disaggregate the process, right? AWS Trainium handles one part, and then you take the decode part of inference. So talk a little bit about that, and more to the point, talk about how that is, or can be, a blueprint for work with other hyperscalers.
Andrew Feldman: We had a pretty good 90-day run. We did a committed take-or-pay deal with OpenAI worth north of $20 billion. And then 45 days later, we signed a big deal with AWS. The idea in most cases, as a computer architect, is to look at a problem and think about what the right machine is for it. Should you design the machine? Can you use somebody else's machine? What we saw in 2015 and 2016 was the rise of a new workload. And we thought, well, this workload is going to eat a lot of compute. This new AI is going to eat an extraordinary amount of compute. So we made two contrarian bets at the time. The first was that we would build dedicated silicon for it. The second was that we would not build something that looks like a GPU. We would start with a clean sheet of paper and build something entirely different.
Andrew Feldman: At the time, people thought we were mad on both counts, and it turned out we were dead right. Now, fast forward ten years: there is so much demand for inference right now because AI, starting in about 2025, became smart enough to do important things, and we're using it more and more. So again, we looked at the work. What is the essence of the inference problem? It is made up of two parts. The first is processing the prompt. And don't be fooled for a minute; we invent complicated names for no reason, it's amazing. We call that pre-fill for no reason at all, and all it is is processing the prompt. The second part is generating the answer. So you process the prompt and you generate the answer. We call the first part pre-fill and the second part decode.
Andrew Feldman: The two parts turn out to have very different compute characteristics. So we thought to ourselves that there are machines better than ours at pre-fill. It is a parallelizable problem, fundamentally different from decode, which is a strictly sequential problem. We took this observation to AWS and said, we can use your Trainium part to do the pre-fill, and we'll use our big chip to do the decode, and what we will get is an extraordinary solution. It turned out to be really well received. We're now engaged in that same process, using other people's parts for one part of the problem and our part for another, with all members of the community, including other hyperscalers, except Nvidia. So everybody but them.
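The pre-fill/decode split can be illustrated with a toy sketch. This is not a transformer and not the Cerebras or AWS implementation; `token_state`, `prefill`, `decode`, and `VOCAB_SIZE` are hypothetical stand-ins. The point is structural: every pre-fill entry depends only on its own prompt token, so all of them can be computed at once, while each decode step needs the token produced by the previous step. In a disaggregated setup, the cache returned by `prefill` is what would be handed from the pre-fill machine to the decode machine.

```python
VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def token_state(token: int, position: int) -> int:
    """Toy stand-in for the per-token work (e.g., a key/value cache entry)."""
    return (token * 31 + position * 7) % 997


def prefill(prompt: list[int]) -> list[int]:
    """Process the whole prompt in one pass.

    Each entry depends only on its own token and position, so these calls are
    independent and could run in parallel on a throughput-oriented machine.
    """
    return [token_state(tok, pos) for pos, tok in enumerate(prompt)]


def decode(cache: list[int], max_new_tokens: int) -> list[int]:
    """Generate tokens one at a time from a pre-filled cache.

    Step k reads the cache as extended by step k-1's output, so the steps
    form a strict sequential chain.
    """
    cache = list(cache)  # copy so the caller's cache is not mutated
    generated: list[int] = []
    for _ in range(max_new_tokens):
        next_token = sum(cache) % VOCAB_SIZE  # toy "model" reads the whole cache
        generated.append(next_token)
        cache.append(token_state(next_token, len(cache)))
    return generated


if __name__ == "__main__":
    kv = prefill([3, 14, 15])             # could be computed on one machine...
    print(decode(kv, max_new_tokens=4))   # ...and handed to another for decode
```

Here `decode(prefill([3, 14, 15]), 4)` runs one batched pre-fill pass followed by four strictly sequential decode steps; adding hardware can speed up the first, but only faster per-step latency helps the second.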
The Battle Between Specialist and Generalist Chips
Tom Giles: I want to come back to that in a minute. We talked about disaggregation, but in chip-making, isn't it inevitable that generalization, getting away from a disaggregated approach, wins out? Isn't there an almost inevitability there, and what happens if that's the case?
Andrew Feldman: No. I think the battle between the specialist and the generalist is a really interesting sort of battle. And whether it's in the savanna in Africa, whether it's with small companies going up against big companies, the thing that determines whether the specialist beats the generalist or the generalist beats a specialist is the shape of the resource landscape. If the vein of resources that the specialist is targeted at is very large, the specialist crushes it and they win. If the resource landscape is made of lots of little different pockets of resources, the generalist wins.
Andrew Feldman: So where did x86 win? In a landscape filled with lots of different use cases. Where did the GPU win? In discrete graphics, one discrete workload. Where did the x86 machine win? Everywhere. Why didn't it also win in the cell phone? Because ARM built something 100% focused on running off a battery at very low power. So here are two examples where the specialist absolutely clocked the generalist. In other cases, the resource landscape wasn't big enough. Members of the industry, myself included, tried to build a specialist, and there wasn't enough for us to eat. We ate a little bit and we starved while the generalists gathered up all sorts of resources. What we saw in 2015 was that the rise of AI would create so much demand for compute that it was best served by a specialist. That was one of the winning observations.
Is AI a Bubble? The $25 Billion Backlog
Tom Giles: The other big customer win you mentioned, OpenAI, was uniquely structured. We're seeing that OpenAI and other LLM developers are having to become increasingly creative about how they finance and pay for compute agreements like yours, because of this great demand that you reference and that is all around us. Hundreds of billions of dollars are being spent. Do you have concerns about their ability to generate the revenue and raise the financing they need to meet their obligations? Maybe another way to ask that question is, do you think there's an AI bubble? Can the leaders keep going? Is the growth sustainable, and will that user demand materialize quickly enough?
Andrew Feldman: You know, we've both been around this. This is not our first rodeo, and one of the few advantages of not being young is that it's not your first rodeo. Here's what I think. Historically, bubbles were characterized by a notion of if you build it, they will come. I saw some people I recognized in the audience who were with me in the late 90s when we were building out data networking equipment, and people were putting huge amounts of fiber in the ground on the assumption that the demand would come. Economists like to go back, for reasons that are unclear to me, to the railroads and a lot of good analogies from the 1870s. There, too, it was if you build it, they will come.
Andrew Feldman: What is unusual about AI right now is that the builders are so far behind the demand, it's absurd. We have a backlog of more than $25 billion, and none of us, not us, not AMD, not Nvidia, can keep up with the demand that end users are driving. In a lot of ways, that's the opposite of a bubble. We are chasing our customers, and their customers are moving at the speed of software, while we're moving at the speed of real estate: data centers. And so we are behind.
Data Center Constraints and Community Relations
Tom Giles: Talk about that a little bit more. You did a podcast with one of my colleagues from Bloomberg Intelligence, and you said that if there's a constraint on you right now, it's access to data centers. We're seeing around the country, especially in an election year, a lot of pushback, a lot of I-don't-want-that-in-my-backyard objections. How are you coping with that?
Andrew Feldman: Two different things. First, all of us are constrained by data centers. We have a cloud, and right now we're constrained by data centers. AWS is constrained by its data center deployment pace. Everybody is constrained by their data center deployments. So that's number one. Number two is a separate issue: why is the world angry at us? They're angry at us because we were dopes. Not us in particular, but our industry. We could have stepped out into these communities and been good neighbors. We could have worked through their processes and their local governments to gain approval and buy-in. We could have been good neighbors. We could have paid our way, contributing enough toward the development of these data centers that the local community was never on the hook for a nickel.
Andrew Feldman: We could have shared how a data center of 150 to 200 megawatts, which isn't giant, will create thousands of jobs for several years in construction alone. We could have shared how, for example, a giant data center uses less water than a small restaurant. Did you know that, across the entire U.S., data centers use less water than the almond growers in California? Not by a factor of two or three or four: almonds consume between five and seven times more water than data centers. What we did instead is race ahead. It might be that we're sort of low IQ as an industry and better at talking to machines than people, but we raced ahead and we didn't think about the communities into which we were putting these data centers.
Andrew Feldman: Brad Smith at Microsoft came along and put out a call to action for everybody. It was just common sense. It had five thoughtful pillars, and in the end, it was like, treat them like your neighbors. And it is absolutely possible that you go into a community and you build a data center, and the community loves you. You create jobs. The tax base increases by a lot. We have heavy equipment on site, we can build a baseball field for the school. As a community, we could have done a better job and we blew it. And we didn't win over the community.
Tom Giles: And what will you do differently?
Andrew Feldman: I'm not a data center builder. I am a buyer. So we are engaged in the communities in which we have data centers. We are engaged with the local chamber of commerce. We're engaged with the community to the best of our ability. We have also chosen data centers in rural areas that are far away. You'll sometimes hear that we don't have enough power in the U.S. That's not true. We have plenty of power. It's just not near anything, so it's a little more expensive to get to. Our power is in West Texas, our power is in rural Utah, our power is in parts of Louisiana that nobody wants to live in. Our power is in Niagara. Canada has more power than it knows what to do with. Not only does it have falling water, it has trapped natural gas in places. So you have to go to where the power is, and you have to think about how to get the results, the tokens, out over fiber optic cables. So you've got to put those in. But I think it is not an either-or. We just, as an industry, did a poor job of stepping into communities and being good neighbors.
Customer Concentration and Scaling with G42 and OpenAI
Tom Giles: As you look at customer wins, your early, pivotal relationship with G42 showed that you needed to diversify your customer base. You've done that with Meta and with AWS. Where should we look for the next big wins, and how long might it be before we see them materialize?
Andrew Feldman: So it is very curious, and it never occurred to me. Certainly, in the private world, nobody says, you've got this huge customer, that's bad. They accounted for such a significant percentage of our revenue that they took all our manufacturing. So in late 2023, we did a $1 billion deal with the AI champion in the UAE, a company called G42. They were one of the first movers in the world. We went to market to raise money, and people said, you've only got one big customer. Then we won OpenAI, and they did a $20-plus billion deal. And people said, now you've only got one big customer. I used to have one, and I still have one, only it's 20 times bigger. It's one of the biggest deals in the history of Silicon Valley. And then we won AWS.
Andrew Feldman: I think the truth is several things. First, this industry is going to have very, very big customers. Nvidia did, what, $68 billion last quarter, and four customers accounted for half of that. That's the world we play in. And so there are going to be extraordinary customer concentrations. And some of those customers actually serve hundreds of other customers. G42 is a cloud for the UAE ecosystem. There are universities in Abu Dhabi. There are oil companies in Dubai. There are hundreds of different users, but they aggregate up to one spot, and they're one customer. In the same way, when we sell to OpenAI, who are we actually selling to? We're selling to billions of individual users who are using the compute.
Speed as a Moat and the Costco Era of Token Economics
Tom Giles: I'd love to get a sense: OpenAI just introduced a model running on Cerebras. What are some early learnings and takeaways, and what metrics can you share on performance, tokens per second or whatever the metric is?
Andrew Feldman: What we know, and Google showed this years ago in an interesting 2009 paper, is that even very small changes in the time it takes to get an answer back affect your enjoyment of the service. Being milliseconds slower has unbelievably significant effects on how long you stay and how frequently you use the service, even if you're not aware of it. We know this. And if you think about it, how big is the market for slow search? Right. How big is the market for dial-up internet? How much would I have to pay you to rip out broadband? A thousand a month? Would you want slow internet at home? No. AI is going to be the same way. Nobody wants slow AI.
Andrew Feldman: If I ask you to wait eight seconds for a website to load, you lose your mind. Once a technology becomes enmeshed in what we do every day, the speed with which you use it becomes fundamental. And when you are so much faster, you feel it in everything you do. The guy who designed Open Coder, Peter Steinberger, said that coding with our speed was like being given Thor's hammer. So your users will be more productive. They'll get more done in an hour. And that advantage compounds over time. That's what speed has always brought.
Tom Giles: There's a price sensitivity being built into the market right now. Earlier we heard about token maxing, and now we're hearing about meters and limits. Is that real? Is it widespread, and is it changing the pace of adoption?
Andrew Feldman: I'm going to date myself here. I remember when Costco, the first warehouse store, came to Palo Alto. It opened in Redwood City, and my mother would shop Costco the same way she would shop Safeway. She'd go down every aisle. And as you know, that's a horrible mistake at Costco, right? Because you make two mistakes, they're $19 each, and you end up with a giant tub of mayo that, for some reason, you thought was a good idea at the time. What happened two or three years later was that nobody shopped Costco like that. You went to the back and got the cheap chicken, right? You looked at your list, you went over there, you got the big box of cupcakes because your kid has a birthday, and you completely changed the way you shopped.
Andrew Feldman: That's what's happening right now with tokens. At first, it was like, hey, have at it, right? Then Microsoft woke up one day and said, tokens are expensive; wait, we can't let everybody use as much Anthropic as they want. What a strange observation. Which other resource do we let everybody use as much as they want? It was boneheaded from the get-go. Of course you have to allocate resources in your organization. There are some people you should just get out of the way of; they're unbelievably productive at everything. Other people, you've got to meter. That's the way the world works, right? Do you need Spark or GPT-4 or the highest-end model for every problem? You don't need a Ferrari to go to the grocery store. Use a lower-cost open-source model. So what we're learning is how to shop at Costco. We have this abundance now, and we're learning how not to buy that $18 tub of mayo. We just have to step back and say to ourselves, okay, we're going to use the expensive models here, and we're going to use open-source models here, and here are the people we're going to allocate to each of these buckets. That's how we're going to go. I think this is the learning you're seeing happen extremely quickly.
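The allocation policy Feldman describes (unmetered access for some people, meters for others, and cheaper open-source models for problems that do not need the top model) can be sketched as a tiny router. This is a minimal illustration under stated assumptions: the `User` class, the `route` function, the per-user premium token limit, and the two model labels are all hypothetical, not any vendor's API or any company's actual policy.

```python
from dataclasses import dataclass
from typing import Optional

PREMIUM_MODEL = "premium-model"     # hypothetical label for a high-end, expensive model
OPEN_MODEL = "open-source-model"    # hypothetical label for a lower-cost open model


@dataclass
class User:
    name: str
    premium_limit: Optional[int]  # premium tokens allowed; None means unmetered
    premium_used: int = 0


def route(user: User, needs_premium: bool, tokens: int) -> str:
    """Pick a model tier for one request and charge the user's premium meter."""
    if not needs_premium:
        # Don't take the Ferrari to the grocery store.
        return OPEN_MODEL
    within_limit = (
        user.premium_limit is None
        or user.premium_used + tokens <= user.premium_limit
    )
    if within_limit:
        user.premium_used += tokens
        return PREMIUM_MODEL
    # Metered user has exhausted the premium budget: fall back to the cheaper tier.
    return OPEN_MODEL
```

For example, a metered user with a 1,000-token premium limit gets the premium model for an 800-token request, then falls back to the open model for a following 300-token request, while an unmetered user is never cut off. Real deployments would add time windows and cost-based rather than token-based budgets; this sketch keeps only the core allocation idea.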
Cerebras Systems Deep Dive
The Wafer-Scale Architecture and Physical Moat
In the high-performance computing landscape, artificial intelligence inference and training are fundamentally constrained by the memory wall: the time and energy consumed moving data between memory banks and the compute processor. The dominant market architecture addresses this by pairing each discrete graphics processing unit with off-chip high-bandwidth memory and linking many such units over high-speed interconnects. Cerebras Systems sidesteps this physical bottleneck. By using an entire 46,225 square millimeter silicon wafer, the Wafer-Scale Engine acts as a single, contiguous processor. The current iteration, the WSE-3, features 4 trillion transistors and 900,000 artificial intelligence-optimized cores. The true architectural weapon, however, is its 44 gigabytes of on-chip static random-access memory. By storing model weights directly on the wafer, Cerebras delivers 21 petabytes per second of memory bandwidth. Compared to incumbent flagship processors, the WSE-3 offers vastly more compute cores and memory bandwidth orders of magnitude higher than off-chip high-bandwidth memory. This structural difference lets models whose weights fit in on-wafer memory run on a single system without the latency penalty of inter-chip communication, yielding a substantial tokens-per-second advantage for critical inference workloads.
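A back-of-the-envelope way to see why memory bandwidth dominates decode speed is a naive roofline: if generating each token requires reading every model weight once, single-stream throughput can be no higher than bandwidth divided by the size of the weights. The sketch below applies that bound using the 44 GB and 21 PB/s figures from the text. The HBM bandwidth (about 3.35 TB/s, roughly an H100 SXM-class figure) and the 8-billion-parameter, 16-bit model are illustrative assumptions. The bound ignores compute limits, KV-cache traffic, batching, and multi-system setups, so it is a ceiling, not a benchmark.

```python
"""Naive memory-bandwidth ceiling for autoregressive decode.

Assumption: every generated token requires reading all model weights once
from memory and nothing else (no compute limit, no KV-cache traffic, no
batching). Real throughput is lower; this is only an upper bound.
"""

WSE3_SRAM_BYTES = 44e9   # 44 GB of on-wafer SRAM (from the text)
WSE3_BANDWIDTH = 21e15   # 21 PB/s of memory bandwidth (from the text)
HBM_BANDWIDTH = 3.35e12  # illustrative assumption: ~3.35 TB/s HBM, H100 SXM-class


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Total bytes needed to hold the model weights."""
    return num_params * bytes_per_param


def decode_ceiling_tokens_per_s(bandwidth: float, model_bytes: float) -> float:
    """Upper bound on single-stream tokens/s if each token streams all weights."""
    return bandwidth / model_bytes


def fits_on_wafer(model_bytes: float, sram_bytes: float = WSE3_SRAM_BYTES) -> bool:
    """Whether the weights alone fit in on-wafer SRAM (ignores activations, KV cache)."""
    return model_bytes <= sram_bytes


if __name__ == "__main__":
    model = weight_bytes(8e9, 2)  # hypothetical 8B-parameter model at 16-bit precision
    print(f"fits in SRAM: {fits_on_wafer(model)}")
    print(f"wafer SRAM ceiling: {decode_ceiling_tokens_per_s(WSE3_BANDWIDTH, model):,.0f} tokens/s")
    print(f"HBM ceiling:        {decode_ceiling_tokens_per_s(HBM_BANDWIDTH, model):,.0f} tokens/s")
```

For the hypothetical 8-billion-parameter model (16 GB of weights), the ceilings work out to about 1.3 million tokens per second from on-wafer SRAM versus about 209 from HBM. A 70-billion-parameter model at 16-bit precision (140 GB) would not fit in 44 GB and would have to be split across systems, which is where the inter-chip penalty returns.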
Business Model and Revenue Monetization
Cerebras operates a hybrid monetization structure that is actively transitioning from capital-intensive hardware sales to a higher-margin utility model. Historically, revenue was driven almost exclusively by the sale of CS-series supercomputing systems to sovereign entities and national laboratories. Today, the commercial model is bifurcating. The company secures upfront capital via discrete hardware deployments but captures recurring economics through support and maintenance contracts, which typically command 15% to 20% of the initial hardware price annually. Furthermore, Cerebras is pivoting aggressively toward an artificial intelligence-as-a-service model through its AI Model Studio. This cloud-based inference and training application programming interface allows enterprises to access wafer-scale compute without large upfront capital expenditures. Concurrently, the firm is licensing its proprietary software stack as a standalone enterprise product. This strategic shift is designed to smooth the inherent cyclicality of semiconductor hardware sales and drive sustained gross margin expansion beyond the baseline of 40% to 45% achieved on direct hardware deployments.
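The recurring-revenue arithmetic in this paragraph can be made concrete. A minimal sketch, assuming a flat annual support fee equal to a fixed fraction of the original hardware price; the $10 million system price and five-year horizon are hypothetical, while the 15% to 20% support range and the 40% to 45% hardware gross margin come from the text.

```python
def contract_revenue(hardware_price: float, support_rate: float, years: int) -> dict[str, float]:
    """Split a hypothetical deal into upfront hardware and recurring support revenue.

    support_rate is the annual support fee as a fraction of the hardware
    price (the text cites 15% to 20%); the fee is assumed flat every year.
    """
    if not 0 <= support_rate <= 1:
        raise ValueError("support_rate must be a fraction between 0 and 1")
    support = hardware_price * support_rate * years
    total = hardware_price + support
    return {
        "hardware": hardware_price,
        "support": support,
        "total": total,
        "recurring_share": support / total,
    }


def hardware_gross_profit(hardware_price: float, gross_margin: float) -> float:
    """Gross profit on the hardware sale alone (the text cites 40% to 45%)."""
    return hardware_price * gross_margin


if __name__ == "__main__":
    price = 10e6  # hypothetical system price
    for rate in (0.15, 0.20):
        deal = contract_revenue(price, rate, years=5)
        print(f"rate {rate:.0%}: support ${deal['support']:,.0f}, "
              f"recurring share {deal['recurring_share']:.0%}")
    for margin in (0.40, 0.45):
        print(f"hardware gross profit at {margin:.0%}: ${hardware_gross_profit(price, margin):,.0f}")
```

With a hypothetical $10 million system kept under support for five years, a 15% rate adds $7.5 million of recurring revenue (about 43% of the $17.5 million total) and a 20% rate adds $10 million (half of a $20 million total), against $4.0 to $4.5 million of hardware gross profit at the stated margins. This is why the support stream matters to the margin story even before any cloud revenue.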
Customer Concentration and Demand Catalysts
The most critical vector of analysis for Cerebras is its extreme customer concentration. Throughout its pre-public history, the company operated virtually as a captive hardware supplier to the United Arab Emirates. Entities such as G42 and the Mohamed bin Zayed University of Artificial Intelligence historically comprised up to 86% of total revenue, a dependency that presented profound geopolitical and regulatory vulnerabilities. However, the commercial narrative shifted materially in late 2025 when OpenAI signed a multi-year compute agreement valued at over $20 billion, supplemented by a $1 billion working capital loan. This transaction fundamentally altered the trajectory of the firm, providing definitive technical validation from the world's most demanding foundation model builder. Additionally, Amazon Web Services committed to deploying Cerebras hardware within its data centers by the second half of 2026. While the contracted backlog provides unparalleled revenue visibility, it effectively swaps sovereign concentration for corporate concentration. If the anchor tenant alters its compute strategy, shifts inference workloads in-house, or pivots back to traditional graphics processing units, Cerebras faces significant revenue impairment.
Supply Chain Architecture and Foundry Dependency
Underneath the architectural differentiation lies a precarious supply chain dependency. Cerebras is a strictly fabless semiconductor designer that relies entirely on Taiwan Semiconductor Manufacturing Company for wafer fabrication. The WSE-3 is fabricated on the 5-nanometer process node, with the next-generation WSE-4 slated for the 3-nanometer node. Unlike incumbent technology conglomerates that command massive purchasing scale and priority allocation, Cerebras represents a fraction of the total foundry volume. The company possesses no formalized long-term supply or capacity allocation commitments from the foundry. Any disruption in wafer allocation, adverse pricing adjustments, or geopolitical friction in Taiwan would immediately impair the company's ability to fulfill its towering commercial backlog. Furthermore, the physics of wafer-scale manufacturing introduce highly specific yield challenges. Because no silicon wafer is perfectly free of defects, Cerebras engineers around this reality by fabricating redundant compute cores across the surface and using software routing to bypass physical imperfections. While this elegant solution resolves the yield problem, it requires hyper-specialized manufacturing and packaging techniques that severely limit alternative sourcing optionality.
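Why redundancy is unavoidable at wafer scale can be shown with the textbook Poisson yield approximation, in which the probability that a region of area A has zero defects is exp(-D * A) for defect density D. The sketch pairs that calculation with a simplified one-dimensional remapping of logical cores onto healthy physical cores. The defect density of 0.1 per cm², the 8 cm² comparison die, and the `remap_cores` scheme are hypothetical illustrations; Cerebras's actual routing operates on a two-dimensional fabric and is not described in the text.

```python
import math

WAFER_AREA_CM2 = 46_225 / 100  # 46,225 mm^2 from the text, in cm^2 (100 mm^2 per cm^2)


def zero_defect_probability(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability that a region has no defects at all."""
    return math.exp(-defects_per_cm2 * area_cm2)


def remap_cores(num_physical: int, defective: set[int], num_logical: int) -> list[int]:
    """Assign logical cores 0..num_logical-1 to healthy physical cores, in order.

    Hypothetical 1-D illustration of routing around defects using spare cores.
    """
    healthy = [core for core in range(num_physical) if core not in defective]
    if len(healthy) < num_logical:
        raise ValueError("not enough healthy cores for the logical array")
    return healthy[:num_logical]


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defect density, defects per cm^2
    print(f"expected defects on wafer: {d0 * WAFER_AREA_CM2:.1f}")
    print(f"P(zero defects, whole wafer): {zero_defect_probability(WAFER_AREA_CM2, d0):.1e}")
    print(f"P(zero defects, 8 cm^2 die):  {zero_defect_probability(8.0, d0):.2f}")
    print(remap_cores(10, defective={2, 7}, num_logical=8))
```

At the assumed density, a whole wafer expects about 46 defects and has essentially no chance (on the order of 1e-20) of being defect-free, while a conventional 8 cm² die comes out clean about 45% of the time. Provisioning spare cores and remapping around the bad ones, as `remap_cores(10, {2, 7}, 8)` does to yield `[0, 1, 3, 4, 5, 6, 8, 9]`, turns an impossible yield target into a manageable one.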
Competitive Landscape and Ecosystem Dynamics
The artificial intelligence accelerator market, estimated to exceed $200 billion in 2026, operates under the absolute hegemony of Nvidia. The incumbent commands roughly 80% of the data center accelerator market, an entrenched position fortified by over a decade of developer lock-in through its proprietary software platform. Advanced Micro Devices serves as the primary merchant alternative, capturing between 5% and 7% market share with its Instinct accelerator series. However, the true long-term threat to merchant silicon arises from the hyperscalers themselves. Internal custom silicon, such as Google's Tensor Processing Unit, Amazon's Trainium, and proprietary chips designed in partnership with Broadcom and Marvell, is absorbing massive internal workloads. Within the independent startup ecosystem, competitive dynamics experienced a structural realignment in December 2025 when Nvidia struck a roughly $20 billion deal to license Groq's technology and hire its leadership. Groq, which also relied heavily on static random-access memory to maximize inference speeds, was directly competing with Cerebras for latency-sensitive workloads. With Groq's technology and team absorbed into the dominant ecosystem, Cerebras stands as the most capitalized independent purveyor of radical high-bandwidth architectures at scale, though it faces sustained pressure from specialized hardware challengers like SambaNova and Tenstorrent.
New Product Drivers and Forward Trajectory
The future growth engine relies heavily on the successful deployment of the WSE-4 architecture. Transitioning to the 3-nanometer process node will allow Cerebras to pack substantially more transistors onto a single wafer, simultaneously driving down power consumption per token generated and expanding the raw compute envelope. Furthermore, the company is aggressively integrating direct-to-chip liquid cooling systems at the rack level, a mandatory physical evolution given the immense thermal density generated by operating an entire wafer at peak utilization. Beyond raw silicon execution, the primary catalyst for growth resides within the software layer. The company's compiler software must prove it can seamlessly ingest open-source models and widely used frameworks without requiring developers to heavily modify their codebases. The success of the upcoming Amazon Web Services deployment will serve as a definitive litmus test in this regard. If enterprise developers can deploy massive parameter models onto a hosted Cerebras instance as effortlessly as they would onto a conventional cluster, the total addressable market expands from elite research laboratories to mainstream enterprise deployments.
Management Track Record
The executive leadership, anchored by Chief Executive Officer Andrew Feldman and Chief Technology Officer Sean Lie, possesses a distinguished operational pedigree in semiconductor architecture. The team previously founded and sold server infrastructure company SeaMicro to Advanced Micro Devices, establishing deep credibility in high-performance computing design. Their tenure at Cerebras is defined by executing an engineering vision that much of the semiconductor industry dismissed as physically impossible. Successfully managing the thermal expansion, power delivery, and defect routing of a wafer-scale chip is an objectively monumental engineering achievement. Moreover, management demonstrated exceptional strategic agility in late 2025. Facing existential regulatory roadblocks regarding their Middle Eastern revenue exposure ahead of a planned public listing, leadership aggressively pivoted to secure the transformative OpenAI agreement, successfully derisking the 2026 initial public offering. However, operating as a publicly traded entity introduces an entirely new set of demands. The transition from pure research and development to scaled global deployment, complex supply chain management, and quarter-by-quarter financial execution will rigorously test the operational bandwidth of the executive team.
The Scorecard
Cerebras Systems represents the most audacious architectural deviation in the artificial intelligence silicon landscape. By confronting the memory wall head-on, the wafer-scale paradigm delivers demonstrable advantages in throughput and latency for the industry's most demanding inference workloads. The securing of a massive multi-year backlog with the leading foundation model builder provides unparalleled validation of the underlying technology and underwrites a clear path to extraordinary revenue acceleration over the coming cycles.
Conversely, the structural risks attached to this growth profile are severe. The commercial base merely swapped sovereign dependency for corporate concentration, leaving the firm highly exposed to the strategic whims of a single paramount customer. Coupled with unhedged reliance on external foundry allocation, a developing software ecosystem, and an incumbent competitor armed with effectively infinite capital and a recently acquired inference competitor, the margin for execution error is nonexistent. The company must flawlessly bridge the chasm from niche hardware provider to scaled enterprise utility to justify its current market positioning.