If you run AI agents in production — or you're thinking about it — today's GTC 2026 keynote wasn't just another product launch. It was a roadmap for what compute infrastructure looks like when inference costs approach zero and tokens become the next commodity.
Jensen Huang stood on stage in San Jose and announced $1 trillion in purchase orders for NVIDIA's Blackwell and Vera Rubin chips through 2027. That's double last year's $500 billion. NVIDIA's market cap now sits at roughly $4.5 trillion. For context, that's roughly the size of Germany's entire GDP.
But the number that matters more? A 350x leap in token generation with Vera Rubin. And the phrase Jensen kept repeating: "Every single technology company now has to think, 'what's your OpenClaw strategy?'"
We run a multi-agent AI team in production. We've watched inference costs drop, capabilities explode, and the architecture of useful AI shift from monolithic models to orchestrated agents. Today's announcements aren't just faster chips — they're structural shifts in what becomes possible and what becomes economically inevitable.
Here's what actually matters from the keynote, grouped by implication rather than product name. Plus some spicy predictions about what comes next.
Sources: CNBC live coverage, TechRadar live blog
The Compute Arms Race: When Hardware Defines What's Possible
Vera Rubin Ships Q3 2026
The headline chip is Vera Rubin: 1.3 million components, 10x performance per watt versus Grace Blackwell, and that 350x token generation leap. Microsoft is the first cloud provider to validate the NVL72 configuration.
What does this actually mean? If you're running multi-agent systems today, you're bottlenecked by inference speed and cost. Agents don't just need to be smart — they need to be fast and cheap enough to run continuously. Vera Rubin doesn't just make tokens faster; it makes entire categories of agent architecture economically viable for the first time.
We've built agents that analyse customer data, scrape competitor pricing, audit SEO performance, and synthesise strategic insights. The constraint isn't "can the model do this?" — it's "can we afford to run this every hour?" Vera Rubin shifts that calculus dramatically.
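Here's the shape of that calculus, as a back-of-envelope sketch. Every number below is a hypothetical placeholder, not a real price; swap in your own contract rates:

```python
# Back-of-envelope cost model for one continuously scheduled agent.
# All figures are illustrative assumptions, not real pricing.

PRICE_PER_M_TOKENS = 3.00  # USD per million tokens (assumed blended rate)
TOKENS_PER_RUN = 150_000   # prompt + completion for one full analysis pass
RUNS_PER_DAY = 24          # hourly schedule

def monthly_cost(price_per_m_tokens: float) -> float:
    """Monthly USD cost of running the agent on this schedule."""
    daily_tokens = TOKENS_PER_RUN * RUNS_PER_DAY
    return daily_tokens * price_per_m_tokens / 1_000_000 * 30

print(f"At today's assumed rate: ${monthly_cost(PRICE_PER_M_TOKENS):,.2f}/month")
print(f"After a 10x price drop:  ${monthly_cost(PRICE_PER_M_TOKENS / 10):,.2f}/month")
```

Under these assumptions, one hourly agent goes from roughly $324 a month to about $32. Multiplied across a fleet, that's the difference between a line item someone questions and a cost nobody notices.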
The Groq Wildcard
In December 2025, NVIDIA acquired Groq for $20 billion. Today we saw why: the Groq 3 LPU (Language Processing Unit), delivering a 35x tokens-per-watt improvement over traditional GPUs.
Jensen's framing: "We united two processors of extreme differences — one for high throughput, one for low latency."
Translation: GPUs excel at training and batch inference (high throughput). LPUs excel at real-time, single-request inference (low latency). The Groq 3 LPX rack holds 256 LPUs alongside Vera Rubin GPUs in a unified architecture.
The strategic bet: NVIDIA is hedging against a world where inference architecture fragments. If latency-critical applications (think voice agents, real-time trading, autonomous systems) demand LPU-style chips, NVIDIA wants to own both sides of the market.
The risk: Does this fragment the ecosystem? Will developers optimise for GPU or LPU? Or does NVIDIA's orchestration layer abstract it away?
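If the orchestration layer does abstract it away, it probably looks like a routing layer that picks hardware per request rather than per application. A minimal sketch of the idea (the pool names, thresholds, and API shape are our invention, not anything NVIDIA announced):

```python
from dataclasses import dataclass

# Hypothetical hardware pools; the names are ours, not NVIDIA's.
GPU_POOL = "gpu-throughput"  # batch inference, training, background agents
LPU_POOL = "lpu-latency"     # voice agents, trading, autonomous systems

@dataclass
class InferenceRequest:
    prompt: str
    latency_budget_ms: int  # how long the caller can afford to wait
    batchable: bool         # can this ride along with other requests?

def route(request: InferenceRequest) -> str:
    """Choose a hardware pool per request instead of per application."""
    if request.latency_budget_ms < 200 and not request.batchable:
        return LPU_POOL  # single-request, latency-critical path
    return GPU_POOL      # everything else batches for throughput

print(route(InferenceRequest("transcribe this call", 50, False)))       # lpu-latency
print(route(InferenceRequest("summarise last quarter", 60_000, True)))  # gpu-throughput
```

If something like this ships in the platform, developers never choose GPU versus LPU at all; they declare a latency budget and the scheduler does the rest.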
The Roadmap: Kyber (2027) and Feynman (2028)
- Kyber (2027): Next-generation rack prototype shipping with Vera Rubin Ultra; 144 GPUs in vertical compute trays for higher density and lower latency.
- Feynman (2028): A new GPU, a new LPU, a new CPU architecture called "Rosa", BlueField-5 networking, and Kyber racks with copper interconnects and co-packaged optics (CPO) at scale.
Jensen also announced a $4 billion investment in photonics for next-gen AI chips. This isn't incremental — it's re-architecting how data moves inside compute clusters.
What It Means
The compute arms race isn't slowing down. It's accelerating. If you're planning AI infrastructure for 2027-2028, assume:
- Inference costs drop by another order of magnitude
- Latency improvements unlock real-time agent interactions
- Architecture choice matters (GPU vs LPU vs hybrid)
The Software Stack Shift: When NVIDIA Wants to Own the Agent Layer
NemoClaw: NVIDIA's Bid for the Orchestration Layer
The most strategically interesting announcement wasn't a chip. It was NemoClaw — NVIDIA's enterprise-ready reference stack for OpenClaw.
For context: OpenClaw is an open-source agentic framework created by Austrian developer Peter Steinberger, launched in January 2026. Steinberger joined OpenAI shortly after, and Sam Altman said OpenClaw would "live in a foundation as an open source project."
Jensen's line: "Every single technology company now has to think, 'what's your OpenClaw strategy?'" He called OpenClaw "the new computer."
Why this matters: NVIDIA has historically been a hardware company that enables software ecosystems. NemoClaw is a play to own the software layer for enterprise AI orchestration. If enterprises adopt NemoClaw as the de facto standard for deploying multi-agent systems, NVIDIA isn't just selling chips — they're selling the entire stack.
The question: Does NemoClaw become the Rails of AI orchestration? Or does it fragment into competing stacks (Anthropic's tooling, OpenAI's agents API, open-source alternatives)?
We're watching this closely. Our multi-agent setup is built on custom orchestration, but if NemoClaw offers batteries-included deployment, monitoring, and scaling, the switching cost might be worth it.
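For context, the shape of our current setup is roughly the sketch below. It's a simplified, hypothetical version (the agent names are illustrative, and `call_model` is a stand-in for whatever inference API you use), but it shows the kind of plumbing NemoClaw would have to replace:

```python
# Minimal multi-agent pipeline: the orchestrator threads a shared context
# dict through a fixed sequence of agents. Names and roles are illustrative.

def call_model(instructions: str, context: str) -> str:
    # Stand-in for a real inference call; swap in your provider's API here.
    return f"<output of: {instructions[:40]}>"

PIPELINE = [
    ("analyst", "Extract the key metrics from the data in the context below."),
    ("critic",  "List factual errors or unsupported claims in the draft below."),
    ("writer",  "Write the final summary, fixing the issues the critic raised."),
]

def run_pipeline(raw_input: str) -> dict:
    context = {"input": raw_input}
    for name, instructions in PIPELINE:
        # Each agent sees the accumulated context, not just the previous step.
        context[name] = call_model(instructions, str(context))
    return context

result = run_pipeline("Q3 revenue by region: ...")
print(result["writer"])
```

Batteries-included versions of exactly this plumbing (deployment, retries, monitoring, scaling) are what would make a migration worth the switching cost.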
Tokens as the Next Commodity
Jensen described tokens as "the next commodity." NVIDIA's internal modelling suggests Vera Rubin NVL72 configurations could unlock a $150 billion revenue opportunity for cloud providers.
His framing: "If they could just get more capacity, they could generate more tokens, their revenues would go up."
Translation: We're moving from a world where AI is a feature to a world where token generation is infrastructure. Cloud providers will sell tokens the way they sell compute hours today. Enterprises will buy token capacity the way they buy bandwidth.
Implication: If tokens become commoditised, the moat shifts to orchestration quality. The agent that asks the right question, filters the right context, and routes to the right model wins — not the one with the biggest context window.
This aligns with what we've seen in production. The breakthroughs aren't from using the newest model; they're from better task decomposition, clearer instructions, and validation workflows that catch errors before they compound.
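The "catch errors before they compound" part deserves a concrete illustration. A stripped-down version of the pattern, with two example checks (real gates are domain-specific):

```python
# Validation gate: run cheap deterministic checks on an agent's output
# before the next agent consumes it. The checks below are illustrative.

def check_nonempty(output: str) -> str | None:
    return "empty output" if not output.strip() else None

def check_no_placeholder(output: str) -> str | None:
    if "TODO" in output or "[insert" in output.lower():
        return "output contains a TODO or placeholder"
    return None

GATES = [check_nonempty, check_no_placeholder]

def validate(output: str) -> list[str]:
    """Return all failures; an empty list means the output may proceed."""
    return [msg for gate in GATES if (msg := gate(output)) is not None]

failures = validate("Revenue grew 12% QoQ. TODO: verify against filings.")
if failures:
    # Route back for a retry (or escalate to a human) instead of letting
    # the next agent build on a flawed intermediate result.
    print("blocked:", failures)
```

Cheap deterministic checks like these catch a surprising share of failures before a single extra token is spent on a downstream agent.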
Physical AI: When Bits Meet Atoms
Autonomous Driving's "ChatGPT Moment"
Jensen claimed: "The ChatGPT moment for autonomous driving is here."
The Uber partnership is the proof point: autonomous vehicle fleets across 28 cities on four continents by 2028. New OEM partners include Nissan, BYD, Geely, Isuzu, and Hyundai, all on NVIDIA's DRIVE Hyperion platform.
The sceptical take: Autonomous driving has been "two years away" for a decade. 28 cities by 2028 is ambitious. Regulatory approval, edge cases, public trust — all unsolved.
The optimistic take: If Vera Rubin truly delivers 350x token generation improvements, real-time sensor fusion and decision-making become tractable in ways they weren't before. The "ChatGPT moment" isn't about sentiment — it's about economic viability. When the cost per mile drops below that of a human driver and the safety record improves, adoption accelerates.
Robotics and Space-1
- Disney partnership continues: Olaf robot demo on stage. NVIDIA's Omniverse and Isaac platforms remain the backbone of physical AI simulation.
- Space-1: A Vera Rubin variant designed for orbital data centres. Yes, you read that correctly. NVIDIA is building chips for space-based compute infrastructure.
Why space? Continuous solar power, no land or cooling-water constraints, and a foothold for future lunar/Mars infrastructure. Latency to Earth rules out real-time serving, which is why batch workloads are the likely first fit. It sounds like science fiction, but if you're planning compute infrastructure for 2030+, off-planet options belong on your radar.
What Jensen Didn't Say (And Why It Matters)
No Mention of Model Training Costs
Every announcement focused on inference — generating tokens, deploying agents, running real-time systems. Nothing about training efficiency or reducing the cost of pre-training foundation models.
Read: The battlefield has shifted. Training foundation models is table stakes. The differentiation is in what you do with them.
No OpenAI Partnership (But OpenClaw Got a Shout-Out)
Jensen praised OpenClaw extensively but didn't mention OpenAI's models, GPT-5 rumours, or deeper collaboration. Microsoft (OpenAI's key partner) was named as the first Vera Rubin cloud validator, but OpenAI wasn't on stage.
Read: NVIDIA is model-agnostic and wants to stay that way. NemoClaw supports any model provider. NVIDIA's bet is on infrastructure, not a specific model vendor.
No Consumer Announcements
DLSS 5 (next-gen upscaling) got a brief mention, but no new consumer product details, no pricing, no availability.
Read: The consumer market is now a footnote in NVIDIA's strategy. The real money is enterprise AI infrastructure.
What Comes Next: Five Spicy Predictions
1. Inference Costs Approach Zero, Orchestration Becomes the Moat
If Vera Rubin delivers 350x token generation improvements and Groq 3 delivers 35x efficiency gains, inference pricing will collapse again in the next 18 months.
What happens: Running an AI agent 24/7 becomes cheaper than a Netflix subscription. The constraint shifts from "can we afford this?" to "do we trust the output quality?"
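The arithmetic behind that comparison is simple, under assumed post-collapse pricing (every input here is a guess, not a forecast):

```python
# Rough arithmetic for "cheaper than Netflix". All inputs are assumptions.
TOKENS_PER_SECOND = 20     # an agent generating continuously, 24/7
SECONDS_PER_MONTH = 60 * 60 * 24 * 30
PRICE_PER_M_TOKENS = 0.30  # USD per million tokens after another collapse

monthly_tokens = TOKENS_PER_SECOND * SECONDS_PER_MONTH  # ~51.8M tokens
print(f"${monthly_tokens * PRICE_PER_M_TOKENS / 1_000_000:.2f}/month")  # ~$15.55
```

About the price of a standard streaming subscription, for an agent that never stops working.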
Implication: Quality assurance, validation workflows, and adversarial testing become the bottleneck. The companies that solve "how do we catch agent hallucinations before they cause damage?" will win.
2. NemoClaw Fragments the Market (Or Unifies It — No Middle Ground)
Either NemoClaw becomes the Rails/Kubernetes of AI orchestration (every enterprise uses it), or it fragments into competing stacks and we get a messy standards war.
Our guess: It unifies, but only if it stays open-source and doesn't become NVIDIA-hardware-only. The moment it favours Vera Rubin over AMD or custom ASICs, the community forks.
3. Autonomous Driving Hits 15 Cities by 2028 (Not 28)
We're optimistic about AV progress, but 28 cities across 4 continents in two years? Regulatory approval alone will bottleneck that.
More realistic: 15 cities in controlled geofenced zones, mostly in China and select US metros. Full autonomy in dense urban cores (London, Tokyo, NYC) is still 2030+.
4. The Groq Acquisition Was Defensive, Not Offensive
NVIDIA didn't buy Groq because LPUs are better than GPUs. They bought Groq to prevent a competitor (Google, Amazon, Microsoft) from owning the low-latency inference layer and fragmenting the market.
What to watch: Does the LPU architecture gain independent traction, or does it get absorbed into NVIDIA's GPU roadmap as a niche accelerator?
5. "What's Your OpenClaw Strategy?" Becomes the New "What's Your Cloud Strategy?"
In 2010, every board asked, "What's your cloud strategy?" By 2015, not having one was existential.
In 2026, Jensen asked, "What's your OpenClaw strategy?" By 2028, enterprises without a multi-agent orchestration plan will be competitively disadvantaged.
Why: When your competitors can deploy AI agents that analyse data, respond to customers, and optimise operations 24/7 at near-zero marginal cost, you can't compete with human-only workflows.
The Bottom Line
Today's GTC wasn't just a product launch. It was a declaration that AI infrastructure is entering a new phase:
- Inference costs are collapsing (again)
- Orchestration quality becomes the moat (not raw model size)
- Physical AI is moving from pilots to production (autonomous driving, robotics)
- NVIDIA wants to own the full stack (NemoClaw is the tell)
If you're running AI in production, here's what to do:
- Re-evaluate your inference costs — the economics just shifted
- Invest in orchestration quality — validation, error handling, context management
- Watch the NemoClaw ecosystem — if it gains traction, it might be worth the migration
- Plan for agent-first workflows — the competitive advantage is speed to deployment, not model access
And if you're still treating AI as a "nice-to-have" rather than infrastructure? Your competitors aren't.