Nebius: The Neocloud Built for Inference, Agents, and Margin — Not Just Megawatts

Bank of America 2026 Global Technology Conference, June 3, 2026

Nebius Group's Chief Business Officer Roman Chernin sat down with Bank of America analyst Tal Liani at the firm's 2026 Global Technology Conference to lay out why Nebius believes it is structurally positioned to outcompete both hyperscalers and fellow neoclouds as the AI infrastructure market evolves from raw compute toward inference and agentic workloads. The conversation surfaced two genuinely new and investor-relevant insights: a clear articulation of how Nebius's software stack directly expands its addressable customer base and drives margin, and the strategic rationale behind its two recent acquisitions, Eigen and Clarifai, which together form what management believes is one of the strongest inference-at-scale engineering teams in the market.

Not Selling Water in the Desert — Selling the Plumbing System

Liani opened by framing the current AI infrastructure boom bluntly: "Selling data center capacity now is selling water on a hot day in the desert." Chernin's rebuttal was equally direct and strategically important. "We don't sell data center capacity. We sell product that's built on top." This distinction is not marketing language — it reflects a deliberate and layered go-to-market architecture that Nebius has constructed around distinct customer archetypes, each consuming AI infrastructure at a different level of abstraction.

At the base are the hyperscalers and large frontier labs that want bare-metal compute at scale with little else. Above them sit what Chernin calls "AI-Native Labs" or "Neolabs" — hundreds, potentially thousands of research-focused organizations that want managed infrastructure so they can concentrate on training tasks without running their own full software stack. Nebius serves this cohort through what it calls its multi-tenant cloud. The next layer is builders of vertical AI products — companies like Cursor in coding, Harvey or Legora in legal, Gamma in content, and Clay in CRM — who do not think in terms of GPU hours at all. They consume models as a service, and for them Nebius built a managed inference platform it calls Nebius Token Factory. Beyond that sits the emerging agentic layer, where developers will not choose models or compare token prices but will simply purchase outcomes from agent execution.

"Our product strategy is to meet them there," Chernin said, describing the philosophy of following each successive wave of AI consumption rather than anchoring to any single abstraction layer. The commercial logic is straightforward: each layer up the stack expands Nebius's serviceable universe from a handful of hyperscalers to thousands and eventually tens or hundreds of thousands of developers and builders.

Inference Is Already Moving the Needle — And Improving CapEx Economics

Chernin confirmed that inference is already the fastest-growing segment within Nebius's revenue mix and is having a "significant, positive impact on the business" today, not at some future date. This matters for investors modeling the company's near-term trajectory. Training contracts are largely one-off, infrastructure-driven sales where customers arrive knowing exactly what GPU cluster they want and for how long. Inference is structurally different: it is recurring, aligned with the customer's own growth, and allows Nebius to extract value through software optimization rather than just hardware delivery.

Chernin made a particularly useful point about CapEx lifecycle. When newer chips arrive and large customers migrate their frontier training workloads to the latest hardware, older clusters do not become stranded assets — they get redeployed for inference workloads. He cited the Anthropic-SpaceX arrangement as a public illustration of this dynamic, where SpaceX moved training to a newer cluster while the original hardware remained productive for inference. For a capital-intensive business like Nebius, extending the revenue-generating life of a GPU cluster is a direct improvement to return on invested capital.
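The redeployment argument can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name and every number in it are hypothetical and are not Nebius figures, and lifetime revenue per dollar of upfront CapEx is used as a simplified stand-in for return on invested capital.

```python
def lifetime_revenue_multiple(capex, training_years, training_rev_per_year,
                              inference_years=0, inference_rev_per_year=0.0):
    """Total revenue over a cluster's life divided by its upfront cost.

    A simplified proxy for return on invested capital: it ignores operating
    costs, financing and discounting.
    """
    total_revenue = (training_years * training_rev_per_year
                     + inference_years * inference_rev_per_year)
    return total_revenue / capex


# Hypothetical cluster: 100 units of CapEx, 3 years of training revenue at 50 per year.
capex = 100.0
training_only = lifetime_revenue_multiple(capex, 3, 50.0)

# Same cluster, redeployed for 2 further years of inference at a lower rate of 25 per year.
with_inference = lifetime_revenue_multiple(capex, 3, 50.0, 2, 25.0)

print(training_only, with_inference)  # 1.5 2.0
```

Under these assumed inputs, two extra years of inference revenue at half the training rate lift lifetime revenue from 1.5x to 2.0x of the original outlay, with no additional hardware spend.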

The Eigen and Clarifai Acquisitions: Building the Inference Engine

The two acquisitions Nebius has made recently — Eigen, based in San Francisco, and Clarifai, headquartered on the East Coast — are specifically targeted at strengthening the Token Factory inference platform, and the rationale is technically precise enough to be worth understanding in detail.

Eigen is a research-driven team founded by MIT PhDs focused on model-level inference optimization: extracting more token throughput from a single GPU. Clarifai's core competency is inference as a system — how to orchestrate thousands of GPUs serving millions of users efficiently, including caching strategies, node scaling during demand spikes, and rapid scale-down when traffic subsides. "Combining it together and actually combining with our in-house engineering capabilities, we believe that we now have pretty strong — maybe one of the best — teams to build inference as a big system," Chernin said. The economic translation is direct: better inference system performance means better token economics for the customer, better utilization for Nebius, and stronger competitive positioning on price-performance.
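To illustrate what scaling nodes up during demand spikes and down when traffic subsides means in practice, here is a minimal sketch of a capacity-based sizing rule. It is not Clarifai's or Nebius's actual system; the function name, parameters, and numbers are all hypothetical, and a production system would also account for caching, warm-up time, and cooldown periods.

```python
def target_nodes(requests_per_sec, node_capacity_rps,
                 headroom_pct=20, min_nodes=2, max_nodes=1000):
    """Return how many inference nodes to run for the current request rate.

    Sizes the fleet to the load plus a safety headroom, then clamps the
    result between a floor (to stay responsive) and a ceiling (available
    hardware). All inputs are integers so the ceiling division is exact.
    """
    load_with_headroom = requests_per_sec * (100 + headroom_pct)
    capacity = node_capacity_rps * 100
    needed = -(-load_with_headroom // capacity)  # integer ceiling division
    return max(min_nodes, min(max_nodes, needed))


# Hypothetical traffic pattern: steady state, a spike, then a quiet period.
steady = target_nodes(1_000, node_capacity_rps=100)
spike = target_nodes(5_000, node_capacity_rps=100)
quiet = target_nodes(50, node_capacity_rps=100)

print(steady, spike, quiet)  # 12 60 2
```

The quicker a fleet can follow this target in both directions, the fewer idle GPUs it carries between spikes. That is the utilization lever the paragraph above describes.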

Why Software Is the Margin Story, Even If It Is Not Directly Monetized

Liani pressed on whether the software and full-stack focus actually translates to higher margins in practice. Chernin's answer cut through the usual neocloud evasiveness on unit economics. The core argument is demand-side optionality: a platform that can serve 10,000 customers will always have better pricing power than one that can only serve 10. "The more hot options you have, the more prices you have," he said, drawing on his background in digital advertising. The ability to abstract hardware from the customer — meaning Nebius, not the buyer, decides which GPU cluster handles a given inference workload — creates optimization levers that translate directly into economics for both sides.

He was also explicit that software is not necessarily monetized as a standalone product. "You don't necessarily monetize software directly, but you build the software to unlock new use cases and give you more levers of optimization for the customer — and as a result for yourself." This is an important nuance for investors who might be looking for a discrete software revenue line that does not exist. The margin benefit is embedded in utilization rates, pricing power, and the ability to serve more diverse and higher-value workloads from the same infrastructure base.

Hyperscaler Contracts Fund the Real Business

On customer mix, Chernin was candid about the role large hyperscaler contracts play in financing Nebius's broader ambitions. Working with customers like Microsoft or Meta does not represent the long-term strategic destination — Nebius's stated goal is a diversified portfolio of AI-native enterprises, growth-stage startups, and established companies. But large wholesale contracts provide the capital to build out capacity faster and finance the rest of the business more aggressively. The company reportedly has three to four customers competing for each GPU, which Chernin described as a direct indicator of demand-side pricing leverage.

Diversification extends beyond customer archetype to contract structure: Nebius holds a mix of long-term deals, short-term agreements, and spot capacity that can command a premium for immediate availability. This portfolio approach gives the company commercial flexibility that pure bare-metal operators lack.

Supply Chain: Own-Build Data Centers Are the Key Unlock for 2H26 and Beyond

The binding constraint for Nebius — as for every neocloud — remains the ability to bring powered, connected data center space online quickly enough to match demand. Chernin noted that a very significant portion of new capacity coming online from late 2026 onward will be in data centers Nebius builds itself from greenfield, rather than leasing third-party facilities. Self-built data centers improve cost structure, provide greater control over timelines, and reduce dependence on external landlords. Running roughly a dozen data center projects in parallel across multiple regions also acts as a hedge — delays in any single project do not create a delivery crisis because the portfolio is deliberately over-subscribed. The distributed nature of inference workloads reinforces this: unlike large training clusters that demand concentrated compute in a single location, inference can be served from geographically dispersed facilities, adding further scheduling flexibility.

On commodity and chip pricing risk, Chernin was measured but confident: the largest contracts have locked supply, and in the current environment, demand pressure on pricing is a far stronger force than input cost inflation on components.

DruckFin