NVIDIA Transcript: Vera Rubin AI Platform Enters Full Production as Agents Transform Computing Economics

GTC Taipei 2026 Keynote - June 1, 2026

Jensen Huang took the stage at GTC Taipei to announce that NVIDIA's ambitious Vera Rubin platform is now in full production, marking what he called the most significant shift in computing since the PC revolution. The keynote, broadcast to 70 simultaneous events across Taiwan, emphasized that useful AI has arrived and that the economics of computing have fundamentally changed.

The Economics of AI: Tokens Equal Revenue

Huang opened with evidence that agentic AI has moved from promise to productivity. Displaying GitHub commit data, he showed that 30 million software developers produced 300 million commits in 2023 and 500 million by 2025. In the first months of 2026, the figure has reached approximately 900 million, roughly triple the 2023 level.

The economic implications are profound. Huang explained that three trillion dollars in developer salaries now generates what would previously require nine trillion dollars worth of human productivity. This represents the first tangible proof that AI generates measurable GDP growth. He emphasized that contrary to fears about job displacement, companies are hiring more software engineers because the output per engineer has become so valuable.
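The multipliers quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures from the keynote; the variable names are illustrative, and treating commit volume as a proxy for output value is the keynote's framing, assumed here rather than established.

```python
# Figures as quoted in the keynote (commits, in millions).
commits_2023 = 300
commits_2025 = 500
commits_2026 = 900  # approximate figure cited for 2026

# Growth multiples relative to the 2023 baseline.
growth_2025 = commits_2025 / commits_2023
growth_2026 = commits_2026 / commits_2023

# Assumption (the keynote's framing): output value scales with commit volume.
salaries_trillions = 3
equivalent_output_trillions = salaries_trillions * growth_2026

print(f"2025 vs 2023: {growth_2025:.2f}x")
print(f"2026 vs 2023: {growth_2026:.2f}x")
print(f"Equivalent output: ${equivalent_output_trillions:.0f} trillion")
```

Under that assumption, the tripling of commits is what turns three trillion dollars of salaries into the nine trillion dollar figure.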

This productivity surge has created unprecedented demand for compute infrastructure. Huang noted that because tokens are now profitable units of revenue, AI companies want to generate vastly more of them, driving the explosive growth Taiwan's semiconductor ecosystem is experiencing. He referenced Taiwan's projected GDP growth of nearly 10 percent as direct evidence of this compute demand.

Understanding the Agentic Computing Model

Huang took considerable time explaining how agents differ fundamentally from traditional applications. An agent consists of a large language model sitting inside a harness that orchestrates observation, reasoning, planning, and action. The agent uses tools—spreadsheets, web browsers, databases, or specialized computing engines—all managed by the harness much like an operating system manages applications.
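The harness pattern described above can be sketched in a few lines. This is an illustration only, not NVIDIA's implementation: `Harness`, `stub_model`, and `calculator` are hypothetical names, and the "model" is a hard-coded stand-in for a large language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Harness:
    """Hypothetical harness: routes model decisions to registered tools."""
    model: Callable[[List[str]], Tuple[str, str]]
    tools: Dict[str, Callable[[str], str]]
    history: List[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 5) -> str:
        self.history.append(f"goal: {goal}")              # observe
        for _ in range(max_steps):
            action, argument = self.model(self.history)   # reason and plan
            if action == "finish":
                return argument
            result = self.tools[action](argument)         # act through a tool
            self.history.append(f"{action}({argument}) -> {result}")
        return "stopped: step limit reached"


def stub_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: call the calculator once, then finish."""
    if len(history) == 1:
        return "calculator", "6 * 7"
    return "finish", history[-1].split(" -> ")[1]


def calculator(expression: str) -> str:
    """Toy tool: multiply two integers written as 'a * b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


harness = Harness(model=stub_model, tools={"calculator": calculator})
answer = harness.run("What is 6 times 7?")
print(answer)
```

The harness owns the loop and the tool registry, while the model only decides the next action. That separation is what lets the same harness manage many tools, much as an operating system manages applications.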

The demonstration included generating a GIF animation of Taipei 101 with simple text prompts, creating CAD files for 3D printing from verbal descriptions, and executing complex tasks through natural language. Huang stressed that the computing pattern has shifted from launching applications and clicking buttons to explaining intent to AI, which then generates code or uses tools to produce outputs.

One major revelation concerned tool use. Huang addressed the misconception that agentic AI would make software companies obsolete. He argued the opposite: because the world will no longer be limited by the number of human workers, agents will use exponentially more tools than people ever could. This represents an unprecedented opportunity for software companies, provided their tools are presented in a form agents can use.

The Disaggregated Computing Challenge

The agentic computing model represents the ultimate disaggregated and distributed computing architecture. Different components run on different parts of a data center. The large language model thinking process activates entire racks of Grace Blackwell NVLink 72 systems. Tool use activates CPUs running compilers, Python, JavaScript, or accelerated computing libraries. Security harnesses run on CPUs and DPUs like NVIDIA's BlueField. Orchestration happens on CPUs managing the entire workflow.

Memory management emerged as one of the most complex challenges. Working memory, called KV caching, must handle compression and retrieval of both structured and unstructured data. The ontology and relationships between data structures create extraordinarily complicated processing requirements. Huang predicted the memory systems of AI will completely revolutionize storage infrastructure.

This heterogeneous, disaggregated architecture is precisely why NVIDIA built Vera Rubin. Huang emphasized that Vera Rubin is not a single chip but an entire end-to-end system including GPUs, CPUs, storage systems, ConnectX-9 networking, security processors, and software stacks. Everything implements confidential computing because AI models are so precious. Each system component would constitute a revolution independently; together they represent NVIDIA's most ambitious endeavor.

NVIDIA as Infrastructure Company

Huang explained that NVIDIA has transformed from a GPU company to a systems company and now to an infrastructure company. The ecosystem now includes power generators, cooling systems, and grid providers. NVIDIA's goal is building complete infrastructure stacks so customers can construct AI factories.

He introduced DSX—the infrastructure blueprint following RTX for GPUs and DGX for systems. DSX includes DSX Sim, an Omniverse-based simulator where partners design and validate entire AI factories before ordering a single rack. They plan layouts, simulate power and cooling, design networks, and test every change in a digital twin.

DSX OS provisions, operates, monitors, and remediates infrastructure, converting installed systems into trusted, multi-tenant, resilient AI-ready capacity. Breakthrough innovations include DSX MaxLPS, which lets operators deploy more GPUs within the same power budget by reducing overprovisioning from 40 percent to nearly zero, adding billions in annual revenue. Hot liquid cooling at 45 degrees Celsius uses less water and energy. Dynamic power allocation steers power between racks, recovering stranded watts. AI agent teams continuously coordinate to balance cooling and power to meet workload demands.
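The effect of removing overprovisioning can be illustrated with a small calculation. This is a sketch under stated assumptions: "40 percent overprovisioning" is read as each rack reserving 1.4 times its actual draw, and the facility budget and per-rack draw are hypothetical numbers, not figures from the keynote.

```python
def racks_supported(budget_kw: int, rack_draw_kw: int, overprovision_pct: int) -> int:
    """Racks that fit when each rack reserves its draw plus a safety margin."""
    reserved_kw = rack_draw_kw * (100 + overprovision_pct) / 100
    return int(budget_kw / reserved_kw)


BUDGET_KW = 140_000   # hypothetical facility power budget
RACK_DRAW_KW = 100    # hypothetical actual draw per rack

before = racks_supported(BUDGET_KW, RACK_DRAW_KW, overprovision_pct=40)
after = racks_supported(BUDGET_KW, RACK_DRAW_KW, overprovision_pct=0)
gain_pct = (after - before) * 100 / before

print(f"Racks at 40% overprovisioning: {before}")
print(f"Racks at 0% overprovisioning:  {after}")
print(f"Additional capacity: {gain_pct:.0f}%")
```

Under this reading, the same power budget hosts 40 percent more racks once the reserved margin is eliminated, which is where the additional revenue would come from.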

Huang showed how DSX AI factories operate as flexible energy assets that cooperatively work with the grid. DSX Flex reads real-time grid signals and dynamically adjusts power consumption when the grid needs relief. With 100 gigawatts of AI factories expected online before the decade's end, these efficiency improvements translate to massive economic advantages.

The Four Factors of Competitive Advantage

Huang presented what he considers the definitive framework for evaluating AI infrastructure investments. He displayed a curve showing how quickly infrastructure comes online, its throughput, reliability, and useful lifetime. Because these systems represent investments of 50 to 100 billion dollars, each factor dramatically impacts return on investment.

Time to first token matters enormously. NVIDIA's fully integrated approach—where every component is co-designed and the entire system is simulated—enables much faster deployment than competitors. The company has built out billions of dollars of infrastructure itself to ensure everything works correctly.

Throughput per watt is revenue. When a data center has one gigawatt of power capacity, that's all it will ever have. Every token is profitable, so throughput per watt directly equals revenue. Huang emphasized that choosing cheaper chips without considering performance per watt is economically irrational. He stated emphatically: the more you buy, the more you make.
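The link between throughput per watt and revenue can be written as a short formula: tokens per second per watt is the same quantity as tokens per joule, so a fixed power budget fixes the number of tokens a site can sell. The sketch below assumes all of the one gigawatt goes to token generation; the efficiency and price values are hypothetical and chosen only to show the proportionality.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_token_revenue(power_watts: float,
                         tokens_per_joule: float,
                         usd_per_million_tokens: float) -> float:
    """Yearly revenue for a site limited by power rather than by chip count."""
    tokens_per_second = power_watts * tokens_per_joule
    tokens_per_year = tokens_per_second * SECONDS_PER_YEAR
    return tokens_per_year / 1_000_000 * usd_per_million_tokens


POWER_WATTS = 1e9  # one gigawatt, as in the keynote

# Hypothetical systems: the second is twice as efficient per watt.
baseline = annual_token_revenue(POWER_WATTS, tokens_per_joule=0.5,
                                usd_per_million_tokens=1.0)
efficient = annual_token_revenue(POWER_WATTS, tokens_per_joule=1.0,
                                 usd_per_million_tokens=1.0)

print(f"Revenue ratio: {efficient / baseline:.1f}x")
```

With power held constant, doubling tokens per joule doubles revenue, regardless of what the chips themselves cost.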

Reliability—measured as mean time between interrupts—proves extremely valuable at scale. Data centers contain millions of cables and components. NVIDIA's experience operating at very large scale for many years creates significant advantage in harmonious, reliable operation.

System lifetime creates the most dramatic differences. In the four years since Hopper's introduction, AI has completely changed. Six years ago during Ampere, AI looked entirely different. The industry moved from CNNs to Transformers to mixture of experts to agentic systems. If architecture isn't flexible and ecosystem isn't rich, useful asset life is short. Because NVIDIA systems operate worldwide and software developers start with CUDA, the ecosystem guarantees long useful life and low total cost of ownership.

Vera Rubin: Full Production Announcement

Huang announced that Vera Rubin is now in full production, a statement met with sustained applause. The supply chain NVIDIA created for Vera Rubin is twice as large as Grace Blackwell's. Manufacturing throughput has improved dramatically: assembly that took two hours per Grace Blackwell rack now takes five minutes. Millions of square feet of manufacturing capacity have come online to support Grace Blackwell and are now ramping for Vera Rubin.

The video presentation detailed the manufacturing miracle across the supply chain. The seven new chips comprising Vera Rubin take shape through hundreds of processing steps at TSMC using three-nanometer process technology, CoWoS advanced packaging, and HBM4 memory from Micron, SK Hynix, and Samsung. The Vera Rubin GPU contains six trillion transistors with over 18,000 components on one board.

Vera Rubin NVL72 handles the thinking: prompt processing, context understanding, reasoning, and planning. A new modular compute tray features a streamlined PCB midplane. ConnectX-9 SuperNICs and BlueField-4 DPUs are maintenance-accessible with no cables for resiliency and scaling. Eighteen compute trays and nine hot-swappable NVLink switch trays connect through new high-efficiency liquid-cooled bus bars carrying over 5,000 amps, equivalent to 20 electric cars at full acceleration. Together, 1.3 million components form the third-generation MGX rack.

Huang congratulated Microsoft, Dell, and CoreWeave for standing up operational Vera Rubin NVL72 engineering racks. The Vera CPU rack contains 256 CPUs in a single liquid-cooled rack, orchestrating models, shuffling memory, and launching tools. Foxconn and Quanta manufactured the Vera LPX rack with 256 Groq LPUs across 16 trays delivering 40 petabytes per second of SRAM bandwidth for ultra-low latency. While NVL72 generates tokens at highest throughput, LPX generates them at lowest latency.

Vera BlueField-4 STX serves as AI memory storage—storage processing accelerated by BlueField-4 connecting memory, storage, and in-silicon security. NVIDIA Spectrum-X Ethernet Photonics is the world's first Ethernet switch with 200-gigabit co-packaged optics, using TSMC CoWoS process, chip-scale packaging, and ultra-high-powered laser dies on indium phosphide.

The complete system comprises five connected rack-scale systems. It involved 150 supply chain partners across Taiwan, millions of square feet of factory floor across hundreds of sites. Huang emphasized that Vera Rubin was not built merely to run AI but specifically to run agents. The complexity of the agent architecture represents the last great computer science breakthrough, requiring the most advanced computer system in the world.

Vera Rubin Hardware Demonstration

Physical rack systems were brought on stage for examination. Huang showed the Vera Rubin NVL72, the LPX rack, the Vera CPU rack with 256 liquid-cooled CPUs, the Vera BlueField storage processing and security system, and the Mellanox networking with the world's first co-packaged optics.

The design eliminates cables, hoses, and fans. A PCB midplane connects both sides of what previously required extensive cabling. The reliability and resilience improvements are substantial. Huang displayed the Vera CPU tray and storage tray containing two Vera CPUs, four ConnectX-9 units, and massive storage capacity. The Groq LPX rack provides very low latency inference extending Vera Rubin's throughput capabilities. The NVLink switch tray represents revolutionary technology enabling NVIDIA to become the largest networking company in the world.

NVIDIA Vera CPU: Computing for Agents

Huang introduced Vera CPU as fundamentally different from all previous processors. Traditional CPUs were created for humans living in a world counted by seconds, with cloud economics based on renting CPU cores by the hour. Agents live in a world counted by nanoseconds. They are impatient. When agents use tools or access databases, every moment waiting prevents them from advancing to the next step. Making CPUs as low-latency and interactive as possible is vital.

Within Vera Rubin systems, CPUs serve four functions. Two CPUs in each Vera Rubin rack orchestrate and manage GPUs, manage KV cache, and handle rack software. Grace BlueField CPUs provide security and isolation. Vera compute CPUs handle the harness, orchestration of AI models, tool use, and database access. Vera BlueField CPUs power the fastest storage servers ever made.

These systems sit in the critical path of the most expensive part of the data center. The economics center on token generation, and the CPU infrastructure must not impede this primary function. This required NVIDIA to build an entirely new architecture from the ground up—a CPU the world has never seen before, built for agents rather than humans.

Huang outlined four defining characteristics. First, instructions per clock must be incredibly high because latency must be short. Single-threaded performance, not throughput, must be world-class. Vera achieves the highest IPC in the world, fetching, decoding, and executing 10 instructions per clock.

Second, bandwidth per core must be exceptional for rapidly moving data in and out of the CPU. Third, total bandwidth throughout the system must be world-class. Because agentic systems are fundamentally disaggregated and distributed, networking becomes the central challenge. Data must move as quickly as possible between CPU cores, between CPU and storage, and between CPU and GPU. The bandwidth around and inside the system must exceed all previous standards.

Vera is the first CPU built at reticle limits, with a fabric connecting all CPU cores at what Huang called the speed of light: 3.6 terabytes per second with no chiplet boundary crossings. All CPU cores communicate with extremely high bandwidth, working together rather than being rented individually. Cross-sectional bandwidth is unprecedented. Vera is the first CPU using PCIe Gen 6 and LPDDR5X, with 1.2 terabytes per second of memory bandwidth, two to three times the bandwidth of the highest-performance CPUs on the market.

Fourth, energy efficiency is critical. The number of CPUs will be quite high because there will be billions of agents compared to only one billion humans using traditional CPUs. Agents have little patience with CPUs because the adjacent GPUs are too costly and too valuable to leave waiting. CPUs must be both performant and extremely energy-efficient so that more of them can be deployed without taking power from token generation.

When compared to the highest-performance x86 processors, Vera's real single-threaded performance improvements are extraordinary. Huang noted that achieving five percent improvement on CPUs is incredible, ten percent remarkable, but Vera's speedups are simply unheard of.

Vera CPU Performance Benchmarks

The video presentation detailed the NVIDIA Olympus core at Vera's heart, built specifically for modern data center workloads including branch-heavy Python runtimes, tool calls, and sandbox code execution. Each core is tuned for throughput with a neural branch predictor evaluating two taken branches per cycle, a 10-wide decode engine bringing in more work each cycle, a large out-of-order engine keeping instructions moving, and advanced prefetchers with a novel graph engine anticipating the next data fetch.

Vera is the first CPU using LPDDR5X memory while correcting multiple errors simultaneously without compromising bandwidth. It achieves 40 percent lower peak memory latency versus x86, keeping cores fed during retrieval, analytics, and sandbox execution. NVIDIA's second-generation scalable coherence fabric unifies all 88 Olympus cores on a monolithic mesh. Memory and cores are not split across chiplets, enabling 50 percent faster core-to-core communication than traditional CPUs. Memory-coherent NVLink chip-to-chip connects GPUs directly to the CPU and can scale Vera to multiple sockets for massive inter-CPU bandwidth.
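The bandwidth figures quoted for Vera can be put on a per-core basis with simple division. This is a rough sketch that assumes decimal units (1 TB = 1,000 GB) and an even split of memory bandwidth across all 88 cores, which real workloads will not match exactly; the comparison range is only what the "two to three times" claim implies, not a measured figure.

```python
MEMORY_BANDWIDTH_GB_S = 1200   # 1.2 TB/s, as quoted in the keynote
CORES = 88                     # Olympus cores, as quoted in the keynote

# Assumption: bandwidth is shared evenly across all cores.
per_core_gb_s = MEMORY_BANDWIDTH_GB_S / CORES

# "Two to three times" the fastest competing CPUs implies this range for them.
implied_competitor_low = MEMORY_BANDWIDTH_GB_S / 3
implied_competitor_high = MEMORY_BANDWIDTH_GB_S / 2

print(f"Memory bandwidth per core: {per_core_gb_s:.1f} GB/s")
print(f"Implied competitor bandwidth: {implied_competitor_low:.0f} "
      f"to {implied_competitor_high:.0f} GB/s")
```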

Huang presented benchmark results that stunned the audience. SQL—the most famous domain-specific language ever created—runs three times faster on Vera, not 10 or 25 percent faster. For real-time stream processing, particularly important for factories and stock exchanges, Vera delivers six-times speedup. He highlighted the partnership with Lynn Martin, president of the New York Stock Exchange, noting this system runs globally in real time.

These x-factor improvements on real workloads are rare for CPUs. Huang expressed pride in the team's achievement and previewed an extraordinary roadmap. Early adopter partnerships span the industry, with Taiwan's ODMs and computer makers bringing systems to market. The early adopters are agentic companies, marking the beginning of a new market that never existed before—CPUs for agents rather than people. This market will surely be larger than previous CPU markets because there will be many more agents than people, and agents are very impatient.

The NVIDIA Agent Toolkit for Enterprise AI

Huang emphasized that the application pattern he described—agents with harnesses orchestrating large language models—will define the next decade of computing. Every company will become an agent company with agents running internally. Every company will need operating systems for agents and will ask how to run agents safely and build agents for their workloads.

The NVIDIA Agent Toolkit for Enterprise AI addresses these needs. Huang noted that NVIDIA builds everything in plain sight—reviewing previous GTCs reveals today's announcements were previewed years ago. He has discussed this toolkit for several years, building toward this moment.

Companies need four things to build agents as services or operate agents internally. First, models—large language models that are smart, cheap, and fast. Second, a harness to orchestrate everything. Third, tools that models want to use, along with skills. The CUDA-X libraries will become amazing tools for future agents. Fourth, a runtime—the operating system holding it all together.

The toolkit includes world-class open models that users can modify, with more to come. It runs agents from any provider, such as Claude Code or Codex, inside NVIDIA OpenShell, a highly secure runtime for enterprise use. OpenShell protects agents, keeps them grounded in security policies, protects privacy, grants appropriate rights and privileges, and protects identity. This open-source runtime is being adopted globally by Red Hat, Canonical, Microsoft, and many others.

The runtime is fully optimized for the NVIDIA AI platform, which is everywhere. OpenShell runs in any cloud, on-premises, and even on devices. Users have tools and libraries agents can use, models to modify or use as-is, and agent harnesses like OpenClaw and Hermes. These can run on-premises or anywhere. This represents the operating system of the modern enterprise.

Cadence Chip Design Super Agent

One of Huang's favorite use cases involves chip design, which he called the single most important thing NVIDIA does. The company partnered with Cadence to build a chip design super agent orchestrated by Codex or Claude Code. It processes RTL and architecture diagrams, schematics, or specifications as input, along with whatever needs fixing. Together they created super agents optimized for the NVIDIA platform with Nemotron.

The demonstration showed how hundreds of thousands of NVIDIA chips come together to make AI factories powering the world's frontier AI models. Designing these chips and their systems represents one of the hardest engineering challenges—trillions of transistors in three-dimensional circuits at microscopic scale, every gate and wire synchronized to picoseconds, working in perfect harmony with no error margin.

Physical prototypes are too slow and costly, so engineers work digitally. Each chip begins as architectural specifications translated into RTL—the language of chip design. RTL must be verified in simulation. A single bug can delay a chip by months. At NVIDIA, thousands of engineers spend billions of compute hours annually writing, running, and debugging millions of tests in a cycle taking teams weeks.

To compress this cycle, Cadence and NVIDIA built a design verification agent. Codex orchestrates the process. Cadence Chip Stack launches the RTL verification loop powered by Nemotron and secured by OpenShell, calling expert subagents in RTL generation, test bench creation, regression testing, and debugging. The system drives itself, running hundreds of simulations with Cadence Xcelium and formal verification with JasperGold. Design flaws are revealed and code bugs fixed. What once took weeks now takes hours—verification cycles over 40 times faster.
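The "weeks to hours" claim follows directly from the 40x figure. The sketch below assumes the original cycle is measured in wall-clock weeks; the cycle lengths tried are illustrative, since the keynote does not give an exact baseline.

```python
HOURS_PER_WEEK = 7 * 24


def accelerated_hours(weeks: float, speedup: float = 40.0) -> float:
    """Wall-clock hours for a cycle that used to take `weeks`, after speedup."""
    return weeks * HOURS_PER_WEEK / speedup


# Illustrative cycle lengths; the keynote only says "weeks".
for weeks in (1, 2, 4):
    print(f"{weeks} week(s) -> {accelerated_hours(weeks):.1f} hours")
```

Even a four-week cycle lands well under a day at 40x, which is consistent with the description of weeks becoming hours.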

Huang emphasized the implications: from weeks to hours. NVIDIA has thousands of chip designers and will hire hundreds of thousands of Cadence super agents to accelerate the company, enabling greater ambition, more amazing creations, and faster execution.

Nemotron 3 Ultra: The Foundation Model

The agent toolkit starts with excellent models that partners like Cadence can modify and tune for their expertise, creating proprietary super agents with proprietary knowledge. Huang introduced Nemotron 3 Ultra, NVIDIA's next open model, and said the company is dedicated to building open models for the world so everyone can create their own agents.

Nemotron models provide not only the model but all training data and training scripts. Through a coalition of partners contributing data to each other, Nemotron trains on one of the largest suites of long-running reasoning, task-solving, and tool-use datasets in the world. Everything (model, training scripts, and data) is made completely available. This is open models at their best, the best open model system in the world, with the simple goal of enabling users to take everything, add to it, make it better, and make it theirs.

Nemotron 3 Ultra is five times faster than previous models and is the world's first model based on a hybrid architecture of State Space Models with Mixture of Experts. The architecture is incredibly fast, and a model that thinks faster can think longer at the same cost. It is also 30 percent cheaper, measured in total FLOPs and total inference time, than even the most cost-effective models in the world. It is frontier smart, five times faster, 30 percent cheaper, and completely open. NVIDIA is completely dedicated to this direction. Nemotron 3 is current; Nemotron 4 is in development.

Enterprise Software Partner Ecosystem

Huang addressed the misconception that agents would disrupt existing enterprise software markets. He stated emphatically: completely the opposite. The toolkit enables companies like Cadence, CrowdStrike, ServiceNow, Palantir, and SAP to create the largest opportunity ever. Agents will create unprecedented demand for enterprise software capabilities, provided as tools that agents can use.

The three major announcements to this point: Vera Rubin in full production, Vera CPU built for a new generation of computing with agents, and the NVIDIA Agent Toolkit for Enterprise AI enabling every enterprise and enterprise software company to build agents.

Reinventing the Personal Computer

Huang's relationship with Taiwan and many partners started with the PC revolution 40 years ago. The modern computer industry began in Taiwan with Windows 1, Windows 2, Apple 1, and Apple 2. By the time NVIDIA arrived 33 years ago, Windows 3.1 defined the PC. Windows 95 made the PC personal, transforming it from an enterprise tool to a consumer electronics device everyone should have.

That computing platform succeeded through intelligent design. Windows was not just disaggregated but properly abstracted with correct architecture—system BIOS, open chipsets, the operating system with drivers installable at runtime, and an abstraction layer with multimedia APIs that opened the PC to what we know today. Each element was essential to the PC's popularity.

Forty years later, Microsoft and NVIDIA are reinventing the PC. Huang previewed a discussion with Satya Nadella scheduled for the following night to detail their collaboration. It took three years to completely reinvent how the PC will work to be ready for this moment.

The agent computing pattern will run in AI clouds, inside enterprises, and on PCs. When a PC has an autonomous agent that understands the user, can be spoken to, can look at the user, and can be asked to refile things, conduct research, and perform many other tasks, what happens to that PC? The new operating system is the old operating system plus large language models. Large language models are the modern version of DirectX—an intelligence extension of the PC with input and output, understanding prompts and computer vision, generating video and sound. The application layer is replaced by an agentic runtime. The agent is the modern application.

NVIDIA RTX Spark Introduction

The video presentation introduced RTX Spark as the culmination of 33 years of NVIDIA's learning distilled into one chip—a reimagining of the PC for the age of AI. Agents running natively, connected to local or cloud models, become personal AIs sandboxed for security, running continuously and getting work done. The chips and OS must evolve together.

RTX Spark contains a Blackwell RTX GPU with 6,144 Tensor Cores delivering one petaflop of AI performance. A custom 20-core Grace CPU was built in partnership with MediaTek, fused by NVLink with 128 gigabytes of unified memory. Using TSMC's three-nanometer process, it contains 70 billion transistors. In close collaboration with Microsoft, a Windows platform for agents was created. This is the dawn of a new personal computing revolution starting with RTX Spark.

Huang emphasized this is the most beautiful chip ever built, created in partnership with MediaTek. He noted seeing Rick Tsai in the audience. The chip is extraordinary because it took 33 years to build in the sense that 100 percent of NVIDIA software runs on it. Digital biology, seismic processing, astrophysics, all CUDA-related physics, biology, genomics, AI, and computer graphics work without problems. Every single application NVIDIA has ever created and every application Windows has ever run operate on this system. Microsoft and NVIDIA meticulously optimized everything so this computer literally runs everything the world has ever created, plus agents.

The system can run a local Nemotron 3 Ultra model, Nemotron 3, Claude Code, Codex, or other models in the cloud or on the network. A demonstration showed an architectural design workflow where an agent running locally on RTX Spark helps design a house using laptop tools. With an OpenShell sandbox running the Hermes harness connected to Claude Sonnet in the cloud, the user selects a site, shares concept sketches and mood boards, and provides a text description of requirements and design intent.

The agent opens Rhino and models the site, shaping terrain, setbacks, and building envelope. It proposes building forms optimized for cost, comfort, and quality. With the form defined, it generates interior layout with walls, circulation, and rooms. The user can jump in anytime to adjust or change. Doors, windows, and structural elements are placed automatically. The agent detects its own mistakes and fixes them. When approved, it exports the model from Rhino into Blender with materials and object properties transferring with design context intact. The user fine-tunes materials and picks shots. Blender renders the house. The agent uses generative AI with the Flux 2 model to make renders photorealistic with multiple viewpoints and lighting conditions. What was once a complex workflow is now guided and simplified by an agent working on RTX Spark—designed at the speed of imagination.

Adobe Partnership and Industry Support

Developers are excited about RTX Spark's acceleration, software capabilities, and partnerships. Adobe has re-engineered the architecture and core of Photoshop and Premiere for RTX Spark, making already fast applications twice as fast. They are also designed to be agent-friendly, with MCP servers that can interact with agents on the laptop.

The number of partners bringing RTX Spark to market is extraordinary. Huang called this the first great PC reinvention in 40 years. He expressed happiness that the global ecosystem has joined this effort. Essentially every major PC manufacturer will support RTX Spark, building incredibly smart, powerful, and beautiful laptops.

Complete PC Line Reinvention

RTX Spark represents the reinvention of the laptop, but Microsoft and NVIDIA are reinventing the entire PC. Huang announced a whole new line of three revolutionary Windows machines covering desktop, laptop, and workstation, all 100 percent Windows compatible, 100 percent CUDA, 100 percent NVIDIA AI Tensor Core. Everything that runs on NVIDIA platforms around the world runs on these systems. This is the first completely re-engineered, reinvented line of PCs in 40 years.

The desktop system from MSI can run agents 24/7 meter-free. Users can download agents that run continuously without meter anxiety. The system connects throughout the house—to laptops, displays, cameras, appliances, water heaters, security systems. It becomes a personal AI agent that gets smarter over time as Nemotron evolves through versions 3, 4, 5, 6 and beyond. The agent sits at home helping with tasks like booking travel.

The DGX Station represents the workstation solution—compatible with Windows, running everything in Windows, with 768 gigabytes of memory enabling trillion-parameter model execution. This is an unprecedented capability. At 20 petaflops and eight terabytes per second memory bandwidth, this system sits by the developer's desk. For developers of large language models and agents, having this compute power at the desk provides everything needed for development, with cloud deployment following.
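Whether a trillion-parameter model fits in 768 gigabytes depends on numeric precision, which the keynote does not specify. The sketch below is a back-of-the-envelope estimate that counts model weights only (no KV cache, activations, or runtime overhead) and uses decimal gigabytes; the precisions tried are assumptions.

```python
def weights_gigabytes(parameters: int, bits_per_parameter: int) -> float:
    """Storage for model weights alone, in decimal gigabytes."""
    return parameters * bits_per_parameter / 8 / 1e9


PARAMETERS = 1_000_000_000_000  # one trillion
MEMORY_GB = 768                 # DGX Station memory, as quoted in the keynote

# Assumed precisions; the keynote does not say which one is used.
fits = {}
for bits in (16, 8, 4):
    size_gb = weights_gigabytes(PARAMETERS, bits)
    fits[bits] = size_gb <= MEMORY_GB
    print(f"{bits}-bit weights: {size_gb:.0f} GB, fits: {fits[bits]}")
```

Under these assumptions only 4-bit weights (about 500 GB) fit, leaving the remaining memory for context and runtime state.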

The Future of Personal Computing

Huang drew an analogy to phones from 15 to 20 years ago. Today's phone is used for just about everything except making phone calls—it means something very different than in the past. He is certain the PC will undergo similar transformation. In 10 years, the PC will be completely different from today's tool where you launch applications, click, and type.

Huang theorized that just as every house today has home theaters, big TVs, lawn mowers, and dishwashers, someday there will be AI supercomputers in houses running all agents and assistants, doing all kinds of things continuously. People will want AI agent computers running in their houses. Over time, these become more like R2-D2 or C-3PO—more like companions than PCs.

There is no question this reinvention of the computer is as significant as the reinvention of the phone into the smartphone. This is the beginning of that journey, the beginning of a new product line. NVIDIA has a roadmap for this brand new product family. Every single generation of architecture will include a desktop, laptop, and workstation.

Huang expressed being incredibly pleased and honored that 100 percent of the world's PC industry has joined NVIDIA to reinvent the PC—a new line, a new beginning.

Physical AI and the Cosmos 3 Foundation Model

Agentic AI is essentially a digital robot that understands, reasons, plans, acts, and uses tools. Agentic AI will run across all computers—humanoid robotics computers, robotics computers of all kinds, self-driving car computers, satellites, GeForce systems, the new PC line, agriculture equipment, manufacturing equipment, heavy industry equipment. Even base stations—radio stations of the future—will be agentic, understanding traffic and coordinating with other stations to minimize energy use while increasing spectral efficiency.

Everything will run agents. Today NVIDIA is largely in the data center, but Huang is certain there will be tens of billions, eventually hundreds of billions of agentic systems and agentic computers running worldwide.

The biggest problem is data. For language models, all English and language on the internet used for training was written from human perspective—we wrote it and read it. However, creating data for AI robotics requires the robot's perspective. Most of the world's video data is third-person, not first-person. For agentic systems, robotic systems, and physical AI, data is the hardest problem.

NVIDIA has moved up a progression ladder starting with teleoperation—human demonstration similar to the breakthrough of reinforcement learning from human feedback. Simulation using Omniverse enables reinforcement learning with verifiable rewards. These systems bootstrap AI models—physical AI models. Eventually learning from third-person data can be reprojected into first person. Through bootstrapping, a World Foundation Model emerges that understands the physical world from any perspective—third person, first person, inside and out. This represents a big breakthrough.

Huang announced Cosmos 3 as the frontier of physical AI. While many people work on language models, he said, NVIDIA is absolutely the world's best in physical AI. This is the foundation model for all robotics work. Anyone wanting to create a robot, whether a factory robot or any other kind of robot that works in the physical world, now has a companion in Cosmos 3 that can understand and reason. It can generate and simulate, and when placed in the loop it can even be the policy itself. It leads leaderboards worldwide.

The video detailed how the real world is infinite and unpredictable. Physical AI needs data, but real-world data is impossible to scale. For physical AI, compute is data. Cosmos is an open frontier omni-model for physical AI built on a new Mixture of Transformers architecture. Pixels, action, sound, and language flow into the autoregressive transformer, which reasons, plans, and instructs the diffusion transformer that generates what comes next.

Developers post-train Cosmos across embodiments and use cases. As a VLM, Cosmos watches the physical world, understands what's happening, describes scenes, and flags what matters. As a world model, Cosmos generates physics-accurate synthetic video from an image, text, or video. As a simulator, Cosmos closes the loop for policy training and evaluation. As the foundation of NVIDIA OmniDreams—an action-conditioned world model—Cosmos predicts the future frame by frame.

Post-training Cosmos transforms it into a world action model—perceiving, reasoning, planning, generating actions for robots of every kind, for everything that moves. This represents a new kind of data, a new kind of teacher generated by compute. Cosmos provides the foundation for developers in the age of physical AI.

Just as text data plus compute gives AI, now that we have AI, compute equals data. Use Cosmos 3 to train models. Cosmos is an open model system exactly like Nemotron—open model, open data, open training methodology so users can enhance it and turn Cosmos into their proprietary model. NVIDIA has incredible partners working across many different industries.

Alpamayo 2 Super for Autonomous Vehicles

Huang announced Alpamayo 2 Super, an open model for self-driving cars. NVIDIA works with car companies across the world. Brands that have signed up for NVIDIA DRIVE Hyperion—building DRIVE Hyperion cars—represent about 80 percent of the world's car manufacturers. There will be enormous numbers of DRIVE Hyperion systems capable of running Alpamayo 2 Super or any other NVIDIA stack. NVIDIA also connects with approximately 97 percent of the world's mobility services, so when deploying Alpamayo 2 Super on the DRIVE Hyperion runtime with the Halos operating system, it will connect to all these services globally.

The demonstration showed a Mercedes autonomous vehicle navigating complex scenarios. The car verbally explains its reasoning: pulling out when the lane is clear, nudging left due to stationary vehicles blocking lanes, stopping at stop signs and intersections, yielding to pedestrians, maintaining distance from vehicles cutting in, handling lane streams, keeping distance from trucks, and navigating around obstacles. The destination announcement concludes the demonstration. While continuous verbal output would be annoying for passengers, the continuous internal dialogue represents thinking: Alpamayo 2 Super is a reasoning car.

Isaac GR00T Platform and Reference Robot

The technology created for autonomous vehicles applies to humanoids, though many new breakthroughs must happen. NVIDIA Isaac GR00T is the humanoid robotics stack—model, data generation, simulation, and runtime including operating system. This represents the complete GR00T platform.

Every NVIDIA system follows the exact same architecture whether for agentic systems in the cloud, agentic systems for PCs, robotic systems for self-driving cars, or robotic systems for humanoid robots. In every case, NVIDIA builds the whole system with vertical integration and extreme co-design, then opens it so that everyone can use whichever parts they want. NVIDIA even helps with modifications.

What has been missing is a reference platform for robotic systems. These systems are extremely complicated with many motors and sensors, yet very fragile. NVIDIA needs a way to deliver reference platforms just as it does with PCs, DGXs, clouds, and self-driving cars. Now NVIDIA is doing this for robots.

Huang announced the fully integrated NVIDIA Isaac GR00T reference humanoid robot. Each hand, made by Sharpa, has 25 degrees of freedom. The robot has 31 degrees of freedom, stands 6 feet tall, and weighs 150 pounds. Huang joked that he shares the first characteristic (closer to 6 feet) and the second (closer to 150 pounds), though the specific numbers vary. This platform runs the new Jetson Thor and NVIDIA's entire software stack (data generation stack, data simulation stack, and runtime), all integrated into a robot designed for everyone to use. It was built for higher education and university researchers because building such systems independently is insanely hard.

The video explained that the next leap in AI is general-purpose robots—humanoids. But building one is hard. Every team starts from scratch, stitching together simulators, teleoperation systems, data pipelines, and training infrastructure. Months of setup precede research.

Isaac GR00T is an open development platform for humanoid robots—open models, simulation and training libraries, and data generators, plus the robot computer fully pre-configured and ready to go in hours. First, set up the simulation environment in Isaac Lab. Capture demonstrations with Isaac Teleoperation on real or simulated robots. Generate synthetic data with Omniverse and Cosmos, scaling one demonstration into thousands. Train policies and evaluate them in Isaac Lab Arena. Deploy through Isaac ROS running on Jetson Thor.

Every element is modular and open—use NVIDIA's components or swap in alternatives. GR00T powers robotics research across every discipline for every domain, from research labs to factory floors. One open platform with a new edition: Isaac GR00T reference design robots built on NVIDIA's open platform, ready for frontier research for any lab anywhere. The age of robotics starts here with NVIDIA Isaac GR00T.

Computing Pattern for the Next Decade

Huang summarized the comprehensive changes to the computer industry in the last six months. Everything changed because agents were realized and converged with the latest frontier models, making it possible for AI to now do useful work.

The computing pattern will repeat across all domains. This computing pattern—an agent with a model, a harness that uses tools with skills, running in a runtime—applies whether in a cloud, on-premises, on a PC, or in a robot. The runtime varies, but the computing pattern remains identical. Different harnesses and models will be used based on preference. Users will improve them for proprietary use, creating super agents to offer to others for their work.
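The pattern described above (a model inside a harness that calls tools, executed by some runtime) can be sketched in a few lines. This is a minimal illustration, not NVIDIA's implementation: the `Harness` class, the `stub_model` function, and the `calculator` tool are all hypothetical names invented here, and the "model" is a hard-coded stand-in for a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from NVIDIA.
Tool = Callable[[str], str]
# A "model" is any function mapping the transcript so far to (action, argument).
Model = Callable[[List[str]], Tuple[str, str]]


@dataclass
class Harness:
    """Orchestrates the reason -> act -> observe loop around a model and its tools."""
    model: Model
    tools: Dict[str, Tool]
    max_steps: int = 8
    transcript: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.transcript.append(f"task: {task}")
        for _ in range(self.max_steps):
            action, arg = self.model(self.transcript)  # reason and plan
            if action == "finish":
                return arg
            if action not in self.tools:
                self.transcript.append(f"error: unknown tool {action}")
                continue
            result = self.tools[action](arg)  # act through a tool
            self.transcript.append(f"{action}({arg}) -> {result}")  # observe
        return "step budget exhausted"


def stub_model(transcript: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: call the calculator once, then finish with its result."""
    last = transcript[-1]
    if last.startswith("task:"):
        return "calculator", "6*7"
    return "finish", last.split("-> ")[-1]


def calculator(expression: str) -> str:
    """Toy tool: multiplies two integers written as 'a*b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


agent = Harness(model=stub_model, tools={"calculator": calculator})
answer = agent.run("what is six times seven?")
print(answer)  # prints 42
```

In this sketch the runtime is simply the Python process. Swapping the model, the tool set, or the place where `run` executes leaves the loop itself unchanged, which is the sense in which the pattern stays identical across cloud, PC, and robot.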

For this agentic platform and pattern, NVIDIA offers an Enterprise AI Toolkit. It is a wonderful way for everyone to engage with AI and, for NVIDIA, a wonderful growth opportunity.

Vera Rubin is in full production. While Grace Blackwell was created to process AI, particularly inference, Vera Rubin was created to run agents. It is much more than a GPU: it is an entire disaggregated, distributed agent processing system.

NVIDIA has become an infrastructure company, not just a GPU company or systems company, but an infrastructure company helping generate maximum revenue and maximum profit as soon as possible.

The agentic world requires CPUs built for agents, not people. CPUs for agents have special requirements, and NVIDIA Vera is revolutionary. Huang expressed happiness about its ramp. Orders are already in. It will be the fastest and most successful product launch in the company's history.

NVIDIA and Microsoft have created a whole new line of PCs—a new beginning. The same agentic computing pattern will run on all kinds of devices. PCs today, but in the future robots, satellites, base stations, factories—in the cloud, on-premises, at the edge. This agentic computing pattern will be replicated in computers worldwide.

How we think about personal computers will very likely change. Huang thanked everyone for their partnership and friendship, saying NVIDIA could not be where it is without everything they have done together. He expressed pride in partners' success over the last year and predicted the next year will be even better.

Closing Performance

The keynote concluded with a musical performance summarizing the announcements. The performance celebrated useful AI's arrival, agents working alongside people, Vera Rubin in full production, the new CPU architecture built for agents rather than x86, OpenShell sandboxing, five-layer system integration, MaxLPS power optimization, 40-year PC reinvention running anywhere Windows goes, synthetic data generation through compute, reasoning capabilities understanding the world like people do, and the future arriving for everyone to see.

Huang closed by welcoming everyone to Computex, thanking them for an amazing year, expressing gratitude for friendship and support, and wishing everyone well.

NVIDIA Deep Dive: Monopolizing the Agentic Era Through Full-Stack Co-Design

The Business Model and Revenue Architecture

NVIDIA has engineered one of the most remarkable transitions in corporate history, evolving from a graphics card manufacturer into a vertically integrated, full-stack accelerated computing platform provider. The core engine of NVIDIA’s economic model is its Data Center business, which now represents more than 92% of the company's consolidated revenue. Instead of selling discrete silicon chips, the company’s business model is structured around selling complete accelerated computing platforms. These platforms encompass high-performance Graphics Processing Units, custom central processing units, advanced networking silicon, and proprietary software. Over the past several quarters, the basic unit of sale has migrated from individual server boards to fully integrated, rack-scale supercomputing enclosures, such as the Grace Blackwell systems and the newly ramping Vera Rubin platform. This allows the company to capture multiple high-margin hardware layers and software premiums that would otherwise be split among various supply chain participants.

NVIDIA's financial results reflect the compounding returns of this system-level monetization strategy. In the first quarter of fiscal 2027, ended April 26, 2026, NVIDIA delivered record-breaking total revenue of $81.6 billion, which represents an 85% increase compared to the prior year and a 20% sequential expansion. This growth was anchored by the Data Center segment, which brought in $75.2 billion, up 92% year-over-year. The financial productivity of this model is underpinned by unparalleled pricing power and high-margin system mixes, leading to GAAP and non-GAAP gross margins of 74.9% and 75.0%, respectively. This immense profitability supports an aggressive capital return strategy. In the first quarter of fiscal 2027 alone, NVIDIA returned approximately $20.0 billion to shareholders through dividends and share repurchases, including raising its quarterly cash dividend to $0.25 per share and establishing an additional $80.0 billion share buyback authorization.
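The growth figures above imply a few numbers the paragraph does not state directly. The short calculation below derives them from the quoted values only. The variable names are illustrative, and the implied prior-period figures are back-calculated from rounded percentages, so they are approximate.

```python
# Figures quoted in the paragraph above (billions of USD).
total_revenue = 81.6
data_center_revenue = 75.2
total_yoy_growth = 0.85        # 85% year-over-year
total_qoq_growth = 0.20        # 20% sequential
data_center_yoy_growth = 0.92  # 92% year-over-year

# Share of revenue coming from Data Center.
data_center_share = data_center_revenue / total_revenue

# Back out the prior-period revenue implied by each growth rate.
implied_prior_year_total = total_revenue / (1 + total_yoy_growth)
implied_prior_quarter_total = total_revenue / (1 + total_qoq_growth)
implied_prior_year_data_center = data_center_revenue / (1 + data_center_yoy_growth)

# Everything that is not Data Center.
other_revenue = total_revenue - data_center_revenue

print(f"Data Center share of revenue: {data_center_share:.1%}")                 # 92.2%
print(f"Implied year-ago total: ${implied_prior_year_total:.1f}B")              # $44.1B
print(f"Implied prior-quarter total: ${implied_prior_quarter_total:.1f}B")      # $68.0B
print(f"Implied year-ago Data Center: ${implied_prior_year_data_center:.1f}B")  # $39.2B
print(f"Non-Data-Center revenue: ${other_revenue:.1f}B")                        # $6.4B
```

The 92.2% share agrees with the earlier statement that Data Center represents more than 92% of consolidated revenue.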

The Customer, Supplier, and Competitor Ecosystem

The structural dynamics of NVIDIA’s value chain are characterized by extreme customer concentration, high manufacturing specialization, and intense competitive posturing. NVIDIA’s primary customer base consists of the global cloud service providers and hyperscale operators, often referred to as the Big Four, which include Microsoft, Amazon, Alphabet, and Meta. This concentration introduces a material risk to the revenue pipeline, as just four customers accounted for 61% of total revenue in late fiscal 2026, with the largest single buyer representing 22% of sales. The end customers are enterprise software developers, consumer internet platforms, and national governments building sovereign AI infrastructure. On the supply side, Taiwan Semiconductor Manufacturing Company is NVIDIA’s indispensable manufacturing partner. NVIDIA has escalated its capital commitments and procurement spending in Taiwan to approximately $150 billion annually, making it the largest purchaser in the Taiwanese technology ecosystem and securing prioritized access to advanced nodes and Chip-on-Wafer-on-Substrate packaging capacity. Other key suppliers include SK Hynix, Samsung, and Micron, which provide the high-performance High Bandwidth Memory essential for modern accelerators.

In the competitive landscape, the merchant semiconductor market is a duopoly at the high end. Advanced Micro Devices serves as the primary challenger, scaling its Instinct GPU portfolio. AMD reported a 57% year-over-year expansion in its data center revenue to $5.8 billion in the first quarter of 2026, propelled by its Instinct MI300 series and upcoming MI350 and MI400 architectures. AMD has secured major partnership commitments, including a gigawatt-scale infrastructure build with OpenAI and custom co-design initiatives with Meta. Intel’s Gaudi accelerators remain a distant third, as the company struggles with software ecosystem adoption and platform integration. Beyond merchant silicon, the most formidable competitors are the internal engineering divisions of the hyperscalers themselves, which are increasingly designing custom application-specific integrated circuits to bypass merchant silicon margins.

Market Share Dynamics and the Co-Design Moat

NVIDIA maintains a dominant grip on the merchant AI accelerator market, commanding an estimated 85% to 92% market share as of mid-2026. While competitors have successfully secured secondary positions in the market, the sheer velocity of industry demand has allowed NVIDIA to maintain its leadership without compromising its margin profile. The foundation of this market share dominance is the Compute Unified Device Architecture, known as CUDA, a proprietary software computing platform developed over two decades. CUDA has created an immense developer lock-in effect, as the vast majority of AI training libraries, compilers, and framework optimizations are written specifically for NVIDIA’s software stack. Trying to run state-of-the-art models on competitor hardware requires complex emulation layers or extensive code rewrites, which introduces execution risk and latency that most enterprise buyers are unwilling to accept.

NVIDIA’s competitive advantage has expanded from a software-silicon moat into a highly integrated, rack-scale co-design hegemony. As data center architectures evolve, the performance bottleneck has shifted from raw compute capability to system-level communication and interconnect bandwidth. NVIDIA addresses this through its proprietary scale-up and scale-out networks, particularly the NVLink interconnect protocol and NVSwitch silicon. In the newly announced Vera Rubin platform, NVLink 6 switches deliver an unprecedented 260 terabytes per second of aggregate bandwidth. By acting as the sole architect of the entire system — including the custom ARM-based Vera CPU, the Rubin GPU, ConnectX-9 SuperNICs, and Spectrum-6 Ethernet switches — NVIDIA optimizes memory access and power delivery at the rack level. This systems-level engineering capability prevents competitors from competing on a component-by-component basis, as enterprise buyers increasingly demand turnkey, fully optimized liquid-cooled supercomputers rather than disparate semiconductor elements.

Industry Opportunities, Geopolitical Threats, and Platform Concentration

The primary long-term opportunity for NVIDIA is the secular transition from generative AI models to agentic AI and reinforcement learning architectures. Generative AI relied heavily on single-pass, feed-forward inference queries, whereas agentic AI systems execute autonomous, multi-step workflows. A single user prompt can trigger thousands of sequential steps involving local sandboxed code execution, vector database retrievals, tool utilization, and reasoning loops. This agentic shift sharply increases the computational intensity per transaction, driving a structural expansion in the total addressable market for data center infrastructure and sustaining demand for advanced hardware long after initial foundational model training is complete.
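To see why multi-step workflows raise the compute per transaction, consider a toy model in which every step re-reads the full accumulated context before writing its own output. The token counts below are hypothetical, and the model deliberately ignores optimizations such as key-value caching or context truncation, so it is an upper-bound illustration rather than a measurement of any real system.

```python
def tokens_processed(steps: int, prompt_tokens: int, tokens_per_step: int) -> int:
    """Total tokens read and written when each step re-reads the growing context.

    Hypothetical cost model: no caching, no truncation.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context + tokens_per_step  # read the context, write this step's output
        context += tokens_per_step          # the output joins the context for the next step
    return total


# Assumed, illustrative sizes: a 500-token prompt and 200 tokens produced per step.
single_pass = tokens_processed(steps=1, prompt_tokens=500, tokens_per_step=200)
agentic = tokens_processed(steps=1000, prompt_tokens=500, tokens_per_step=200)

print(single_pass)  # 700
print(agentic)      # 100600000
print(f"{agentic / single_pass:,.0f}x")
```

Under these assumptions the total grows roughly with the square of the step count, because each new step pays for everything that came before it. Real deployments reduce this with caching, but the direction of the effect is the same.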

These structural opportunities are counterbalanced by severe geopolitical headwinds and platform concentration. The most immediate financial impact is the total loss of the Chinese data center market due to stringent United States export restrictions. In the first quarter of fiscal 2026, NVIDIA generated $4.6 billion in Hopper-class data center revenues from Chinese customers; in the first quarter of fiscal 2027, this figure declined to zero. NVIDIA’s current forward guidance assumes no data center compute revenue from China, permanently closing a massive addressable market. Furthermore, the concentration of the global semiconductor supply chain in Taiwan represents a systemic tail-risk. With TSMC manufacturing approximately 90% of the advanced silicon used in AI accelerators, any geopolitical disruption in the region would immediately halt NVIDIA's hardware production. Finally, the extreme customer concentration among the top four hyperscalers exposes NVIDIA to capital expenditure digestion cycles if these major buyers decide to optimize their existing capacity.

Technological Roadmaps: Vera, Rubin, and the Shift to Agentic AI

To preempt competition and sustain its high-margin revenue streams, NVIDIA has compressed its hardware release cadence to a yearly cycle. The newly announced Vera Rubin platform, which has transitioned into full production for initial customer shipments in the third quarter of 2026, is engineered specifically for this agentic era. The platform is not merely a single semiconductor die but an integrated suite of seven custom-designed chips: the Vera CPU, the Rubin GPU, the NVLink 6 Switch, the ConnectX-9 SuperNIC, the BlueField-4 DPU, the Spectrum-6 Ethernet switch, and the Groq 3 LPU. This extreme co-design allows the entire rack-scale system to function as a single, distributed accelerator, bypassing the performance and communication limitations of traditional modular designs.

At the heart of this next-generation architecture is the Rubin GPU, built on TSMC's advanced 3-nanometer process. The Rubin GPU features 336 billion transistors, a 1.6-fold increase over the Blackwell architecture, and integrates 288 gigabytes of HBM4 memory across eight stacks. This architecture delivers an extraordinary 22 terabytes per second of memory bandwidth, which is 2.8 times Blackwell's bandwidth. The Rubin GPU utilizes a third-generation Transformer Engine with hardware-accelerated adaptive compression to support the 4-bit floating-point format, enabling up to 50 petaFLOPS of NVFP4 inference capability. Operating at a thermal design power of up to 2,300 watts per GPU, the Rubin platform is entirely liquid-cooled, completely eliminating air-cooled options from NVIDIA’s high-end data center portfolio.
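A few derived quantities help put these specifications in proportion. The sketch below uses only the numbers quoted above. The Blackwell baselines are back-calculated from the stated ratios rather than taken from a datasheet, decimal units (1 GB = 10^9 bytes) are an assumption, and the peak figures are "up to" values, so the results are rough guides.

```python
# Rubin GPU figures quoted in the paragraph above.
rubin_transistors_b = 336      # billions of transistors
transistor_ratio = 1.6         # stated multiple over Blackwell
hbm_capacity_gb = 288
hbm_stacks = 8
bandwidth_tb_s = 22
bandwidth_ratio = 2.8          # stated multiple over Blackwell
nvfp4_pflops = 50              # peak ("up to") NVFP4 inference throughput

# Blackwell baselines implied by the stated ratios (not official specifications).
implied_blackwell_transistors_b = rubin_transistors_b / transistor_ratio
implied_blackwell_bandwidth_tb_s = bandwidth_tb_s / bandwidth_ratio

# Memory capacity per HBM4 stack.
capacity_per_stack_gb = hbm_capacity_gb / hbm_stacks

# Peak FLOPs available per byte moved from HBM: a roofline-style balance point.
flops_per_byte = (nvfp4_pflops * 1e15) / (bandwidth_tb_s * 1e12)

# Time to stream the entire HBM contents once at peak bandwidth.
full_sweep_ms = hbm_capacity_gb * 1e9 / (bandwidth_tb_s * 1e12) * 1e3

print(f"Implied Blackwell transistors: {implied_blackwell_transistors_b:.0f}B")       # 210B
print(f"Implied Blackwell bandwidth: {implied_blackwell_bandwidth_tb_s:.1f} TB/s")    # 7.9 TB/s
print(f"HBM4 capacity per stack: {capacity_per_stack_gb:.0f} GB")                     # 36 GB
print(f"Peak FLOPs per byte: {flops_per_byte:.0f}")                                   # 2273
print(f"Full memory sweep: {full_sweep_ms:.1f} ms")                                   # 13.1 ms
```

The balance point of roughly 2,300 FLOPs per byte suggests that workloads performing fewer operations per byte fetched would be limited by memory bandwidth rather than compute, which is consistent with the article's emphasis on bandwidth as the bottleneck.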

The platform’s other key growth driver is the Vera CPU, NVIDIA's first custom, ARM-based processor built specifically for agentic AI orchestration and reinforcement learning. The Vera CPU features 88 custom Olympus cores compatible with the Armv9.2 architecture and utilizes spatial multi-threading to partition core resources. Benchmark data indicates that the Vera CPU delivers a 1.8-fold improvement in task completion speeds compared to traditional x86 server processors and achieves a 1.63-fold performance leap over the previous-generation Grace CPU. At the edge, NVIDIA is deploying its RTX Spark processor architecture, which scales the Rubin microarchitecture into consumer laptops and desktops. This on-device agentic framework is designed to transition AI assistants from cloud-dependent tools to local, low-latency, autonomous agents, establishing a secondary growth channel in the personal computing market.

Disruptive Challengers and the Custom Silicon Threat

While traditional competitors like AMD and Intel remain focal points for market commentators, the most credible long-term threat to NVIDIA’s market share is the rapid growth of custom application-specific integrated circuits designed internally by cloud hyperscalers. Custom silicon represented 20.9% of the AI accelerator market in 2025 and is projected to expand to 27.8% by the end of 2026. Hyperscalers are highly motivated to deploy proprietary hardware to bypass NVIDIA’s massive merchant gross margins and reduce their total cost of ownership. Google’s TPUs continue to serve as the volume backbone of this custom market, while Amazon’s Trainium series has secured substantial multi-billion dollar commitments, including deployments with Meta.
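The share figures above can be combined with the merchant-market estimate given earlier in this article. The calculation below is a rough sketch: it assumes the custom-silicon percentages and the 85% to 92% merchant share refer to the same overall accelerator market and can be multiplied together, which the source does not state explicitly.

```python
# Custom-silicon share of the AI accelerator market, from the paragraph above (percent).
custom_share_2025 = 20.9
custom_share_2026 = 27.8   # projected for the end of 2026

# NVIDIA's estimated share of the merchant (non-custom) market, quoted earlier (fractions).
nvidia_merchant_low = 0.85
nvidia_merchant_high = 0.92

# Shift toward custom silicon, in percentage points and in relative terms.
point_gain = custom_share_2026 - custom_share_2025
relative_growth = point_gain / custom_share_2025

# Merchant silicon is whatever custom silicon does not take.
merchant_share_2026 = 100 - custom_share_2026

# ASSUMPTION: both estimates use the same market definition.
nvidia_total_low = nvidia_merchant_low * merchant_share_2026
nvidia_total_high = nvidia_merchant_high * merchant_share_2026

print(f"Custom share gain: {point_gain:.1f} points ({relative_growth:.1%} relative)")  # 6.9 points (33.0% relative)
print(f"Merchant share, end of 2026: {merchant_share_2026:.1f}%")                       # 72.2%
print(f"Implied NVIDIA share of all accelerators: "
      f"{nvidia_total_low:.1f}% to {nvidia_total_high:.1f}%")                           # 61.4% to 66.4%
```

Read this way, a dominant merchant share can coexist with a noticeably smaller share of total accelerator spending as custom silicon expands.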

Broadcom has emerged as the premier enabler and design partner for this custom ASIC ecosystem, capturing an estimated 60% of the custom AI semiconductor market through multi-generation co-design partnerships with Google, Meta, and OpenAI. Broadcom’s custom silicon program is expanding into advanced 2-nanometer nodes, leveraging its deep physical intellectual property and packaging integration capabilities to build high-performance, cost-effective alternatives to general-purpose GPUs. Similarly, Marvell holds an estimated 20% to 25% share of the custom ASIC design space, primarily serving Amazon Web Services and Microsoft. As the AI market matures and workloads transition from compute-heavy foundational training to highly repetitive, cost-sensitive inference, these customized, workload-specific ASICs present a structural headwind that could gradually erode NVIDIA’s long-term market share in high-volume hyperscale data centers.

Management Track Record and Execution Under Scarcity

NVIDIA’s senior leadership, led by founder and CEO Jensen Huang and CFO Colette Kress, has demonstrated an exceptional track record of operational agility and long-term strategic foresight. Management successfully anticipated the hardware requirements of the transformer-model revolution and aggressively committed capital to secure fabrication and advanced packaging capacity before the generative AI wave fully materialized. This aggressive posture has enabled NVIDIA to consistently execute its hardware ramps under severe global packaging and silicon constraints, managing a highly complex supply chain of hundreds of partners, including 150 suppliers in Taiwan alone.

This operational execution is paired with a disciplined, shareholder-aligned capital allocation framework. With the business generating massive free cash flows, management has taken advantage of its financial strength to initiate a substantial capital return program. In the first quarter of fiscal 2027, the company returned $20.0 billion to shareholders and established an additional, non-expiring $80.0 billion share repurchase authorization, bringing total buyback capacity to over $118 billion. While some market participants interpret this focus on share buybacks and a 25-fold dividend hike as a sign of a hyper-growth story transitioning into maturity, management's simultaneous commitment to a yearly hardware cadence and heavy research and development investments in future architectures, such as the upcoming Feynman platform, indicates that NVIDIA remains focused on technology leadership while maintaining a balanced capital structure.
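Two figures are implied by the capital-return numbers but not stated. The short calculation below backs them out. The $0.25 quarterly dividend is the value quoted earlier in this article, and because the total capacity is given only as "over $118 billion", the remaining prior authorization is a lower bound.

```python
# Figures quoted in this article (billions of USD unless noted).
new_authorization = 80.0
total_capacity_floor = 118.0      # stated as "over $118 billion"
quarterly_dividend = 0.25         # USD per share, after the increase
dividend_multiple = 25            # the "25-fold" hike

# Buyback capacity that must have remained from earlier authorizations (lower bound).
prior_authorization_floor = total_capacity_floor - new_authorization

# Quarterly dividend per share before the increase, implied by the 25-fold figure.
implied_prior_dividend = quarterly_dividend / dividend_multiple

print(f"Remaining prior authorization: over ${prior_authorization_floor:.0f}B")  # over $38B
print(f"Implied prior quarterly dividend: ${implied_prior_dividend:.2f}")        # $0.01
```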

The Scorecard

NVIDIA continues to occupy an exceptionally strong, near-monopolistic position at the epicenter of the global artificial intelligence infrastructure buildout, as evidenced by its record-breaking Q1 fiscal 2027 revenue of $81.6 billion and non-GAAP gross margins of 75.0%. The company’s competitive moat has expanded far beyond simple silicon superiority to encompass full-system and rack-scale co-design, represented by the newly ramping Vera Rubin platform. By pairing the custom 88-core Vera CPU with the 336-billion-transistor Rubin GPU and integrating them via proprietary NVLink 6 networks, NVIDIA has created an optimized, highly integrated platform that delivers up to 10 times the agentic throughput of the previous Blackwell generation. This full-stack integration, combined with the deeply entrenched CUDA software ecosystem, makes it highly improbable that merchant competitors like AMD will gain substantial market share in high-end training and complex agentic inference workloads in the near term.

Despite this operational excellence, long-term investors must weigh NVIDIA's dominance against mounting structural and geopolitical headwinds. The company faces unprecedented customer concentration, with just four hyperscalers accounting for over 60% of total revenue, exposing the business to substantial volatility should these buyers enter a capital expenditure digestion phase. Simultaneously, the rapid rise of custom ASICs, co-developed by Broadcom and Marvell for hyperscalers like Google and Amazon, poses a credible long-term threat as the market shifts from training to highly cost-sensitive inference. Compounded by the total loss of Chinese data center revenue due to U.S. export controls and a systemic reliance on Taiwan's manufacturing ecosystem, NVIDIA's premium valuation leaves little room for execution missteps. While the technological roadmap remains peerless, the transition to a mature, highly concentrated market indicates that future returns will be dictated by supply-chain resilience and the economics of custom silicon substitution.

DruckFin