NVIDIA and Microsoft Are Betting the Future of the PC Is Agents and Local Inference

Strip away the keynote theatrics and the RTX Spark launch at Computex comes down to a single wager: a meaningful share of future AI work belongs on the PC rather than in a data center. NVIDIA and Microsoft built a Windows platform around a strong GPU and a deep pool of unified memory because they want models and agents running where the user sits. For the next year or two, open-weight models are what make that wager credible. Over a longer horizon, the bet only pays off if the companies behind the best frontier models are willing to change how they do business.

To date, the Windows AI message has leaned on the neural processing unit (NPU) and the idea of small, persistent AI workloads carried out locally. But the early returns were thin. Copilot+ shipped with fewer headline features than promised, and developers never showed up to write for the NPU in any meaningful numbers. It says something, then, that this launch put a GPU and an agent runtime at the center, and went silent on the NPU and on Copilot+ branding. Both will presumably be present in these as-yet-unreleased machines. The choice to keep them offstage is the part worth noting.

Governance first, compute second

Most of the engineering that should interest corporate buyers has little to do with raw compute. NVIDIA and Microsoft added containment and identity controls to Windows and wrapped them in an agent layer called NVIDIA OpenShell. An administrator can spell out exactly what an agent may reach, push sensitive requests to a model running on the machine, and scrub identifying details before any prompt leaves for the cloud. IT teams have made clear they will not let autonomous software roam across managed devices without that kind of governance. Whether it works as advertised is unproven, but the design attempts to answer the legitimate concerns of IT decision-makers.
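To make the three controls concrete, here is a minimal sketch in Python of how such a policy might behave: an allowlist of resources, local routing for sensitive requests, and scrubbing before anything goes to the cloud. It is an illustration only and not the NVIDIA OpenShell API. The names `AgentPolicy`, `scrub`, and `route`, the keyword list, and the two regular expressions are all hypothetical, and real redaction would need far more than two patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy object for illustration; not the NVIDIA OpenShell API.
@dataclass
class AgentPolicy:
    # Resources the administrator allows the agent to reach.
    allowed_resources: set = field(default_factory=set)
    # Assumed markers of a "sensitive" request; a real policy would be richer.
    sensitive_keywords: tuple = ("salary", "patient", "contract")

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def scrub(prompt: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    return PHONE.sub("[PHONE]", prompt)

def route(policy: AgentPolicy, resource: str, prompt: str) -> tuple:
    """Return (destination, prompt_to_send) for one agent request."""
    # 1. Containment: refuse anything outside the allowlist.
    if resource not in policy.allowed_resources:
        return ("deny", "")
    # 2. Sensitive requests stay on the machine, unmodified.
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in policy.sensitive_keywords):
        return ("local", prompt)
    # 3. Everything else may go to the cloud, but only after scrubbing.
    return ("cloud", scrub(prompt))
```

With a policy that allows only `calendar` and `mail`, a request against `filesystem` is denied, a prompt mentioning a salary review stays local, and a prompt containing an e-mail address or phone number reaches the cloud with placeholders in their place.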

Local inference = lower cost

The case for keeping inference on the device is mostly about money. Privacy, security, and lower latency matter too — but the key is that a generous memory pool attached to a robust GPU lets a capable model sit resident and respond locally. The work does not run up charges in a cloud instance every time an agent acts. Once agents are working all day underneath ordinary tasks, paying cloud rates for every token they burn stops being sustainable well before the technology hits any wall. Shifting a portion of that load onto hardware the buyer already owns is one of the few ways the forecasted agentic AI ramp proceeds unimpeded.
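The cost argument reduces to simple arithmetic, sketched below in Python. Every input is a placeholder for the reader to supply: the token volume, the cloud price per million tokens, the share of work moved to the device, and the hardware premium are all hypothetical, and none comes from NVIDIA, Microsoft, or any cloud price list. The sketch also ignores electricity, depreciation, and any quality gap between local and cloud models.

```python
def daily_cloud_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Cloud spend per day if every token is billed at a flat rate."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens

def breakeven_days(hardware_premium_usd: float,
                   tokens_per_day: float,
                   usd_per_million_tokens: float,
                   local_share: float) -> float:
    """Days until avoided cloud charges equal the extra hardware cost.

    local_share is the fraction (0.0 to 1.0) of tokens served on the device.
    """
    saved_per_day = daily_cloud_cost(tokens_per_day, usd_per_million_tokens) * local_share
    if saved_per_day <= 0:
        return float("inf")  # nothing is saved, so the premium is never recovered
    return hardware_premium_usd / saved_per_day
```

As a purely illustrative case, an agent burning 2 million tokens a day at an assumed $5 per million costs $10 a day in the cloud. Moving half of that load onto the device saves $5 a day, so an assumed $1,000 hardware premium is recovered in 200 days. The heavier the agent's daily usage, the shorter that window becomes.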

Today, open-weight models may be capable enough to run useful agents on a machine like the RTX Spark. That’s the near-term proof point. The best models, though, still sit behind cloud subscriptions from Anthropic, Google, and OpenAI. Companies trust those models, and the AI firms behind them have built large businesses on running inference in their own data centers. Moving part of that work onto a customer’s GPU asks them to do more than ship a smaller model — it asks them to redraw the business: pricing and packaging a hybrid where some inference runs locally and some runs in the cloud. Local tokens for the simple jobs, cloud tokens for the heavy lifting. None of the foundation model companies have offered a subscription like that yet. Until one does, on-device intelligence will run on open models that will likely always trail the frontier.

Price, timeline, and the real test

The practical limits are easy to see. No RTX Spark notebook prices have been set, and launch configurations will almost certainly sit at the top of the premium market. Average selling prices for notebooks have drifted upward for years, and that was before memory costs exploded. At today’s memory pricing, a fully loaded 128GB RTX Spark system will be expensive. The buyers willing to pay for the maximum build will mostly be developers and creative professionals who already understand what running models locally gets them. Everyone else is further out.

None of this makes RTX Spark the mainstream arrival of the agentic PC. The silicon looks capable, and adding a fourth serious contender alongside Intel, AMD, and Qualcomm should sharpen all of them. The shift to more AI compute on device is the more straightforward part of this equation. The questions that determine whether this matters are softer and slower: whether Microsoft’s agent platform earns the trust of cautious IT departments, and whether the frontier labs conclude that some of their inference belongs on the customer’s desk. Settle those two and RTX Spark will look ahead of its time.

Tom Mainelli - Group Vice President, Device & Consumer Research

Tom Mainelli heads the Device & Consumer Research Group, overseeing a wide array of hardware and technology categories catering to both home and enterprise markets. His team's research spans PCs, tablets, smartphones, wearables, smart home devices, thin clients, displays, and…
