AMD’s Ryzen AI Max+ 395, codenamed Strix Halo, has been selling in laptops and mini-PCs since January 2025, and by the time Nvidia unveiled RTX Spark at Computex last week, the chip had already landed in 35 consumer products. Nvidia’s platform arrived without a price or a confirmed ship date, targeting fall 2026. At a press roundtable the same day, AMD VP Rahul Tikoo picked up an HP Strix Halo mini-PC and asked HP VP Jim Nottingham when HP launched it. “CES 2025,” Nottingham said. Tikoo grabbed a laptop. “Two months later.” Before sitting back down: “We have 35 products with Strix Halo in market. Welcome, Nvidia, to the modern compute journey.”
The software question is what has actually defined AMD’s disadvantage in local AI. CUDA (Compute Unified Device Architecture, Nvidia’s GPU programming platform) is why “just buy Nvidia” has been a genuinely successful answer for years: new models assumed it, inference frameworks targeted it first, and AMD’s alternative stack, ROCm (Radeon Open Compute), required enough patience to deter most developers. In mid-2026, ROCm supports most of what a local LLM user actually runs, the official compatibility list covers consumer cards that weren’t on it two years ago, and the remaining gaps are more specific than the old “AMD doesn’t work for AI” dismissal.
AMD’s Head Start in AI Mini-PCs
The hardware class Nvidia entered at Computex, a high-end chip pairing a powerful integrated GPU with a large unified memory pool in a compact form factor, is one AMD created. The Ryzen AI Max+ 395 packs 16 Zen 5 CPU cores and a 40 compute-unit RDNA 3.5 integrated GPU into a 55W default envelope and supports up to 128GB of unified memory. Its most relevant specification for local AI work is that up to 96GB of that pool is addressable as VRAM through AMD’s Variable Graphics Memory, meaning a 70-billion-parameter model at Q4 quantization fits entirely in GPU-addressable memory without discrete GPU hardware or CPU offloading. GMKtec’s EVO-X2, a third-party mini-PC built around the same chip and loaded with 128GB of memory, sells for around $3,300.
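The arithmetic behind that fit can be sketched in a few lines. This is a rough estimate rather than a measurement: the 4.5 bits-per-weight figure is an assumption standing in for typical 4-bit quantization formats (which carry some per-block overhead), and the calculation covers weights only, not the KV cache or runtime buffers.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


VGM_LIMIT_GB = 96      # VRAM-addressable ceiling cited in the article
BITS_Q4 = 4.5          # ASSUMPTION: effective bits per weight for a 4-bit format
BITS_FP16 = 16         # unquantized half precision, for comparison

q4_gb = weight_footprint_gb(70, BITS_Q4)
fp16_gb = weight_footprint_gb(70, BITS_FP16)

print(f"70B at ~{BITS_Q4} bits/weight: {q4_gb:.1f} GB "
      f"(headroom {VGM_LIMIT_GB - q4_gb:.1f} GB)")
print(f"70B at FP16: {fp16_gb:.1f} GB (fits: {fp16_gb <= VGM_LIMIT_GB})")
```

Under these assumptions the quantized weights take roughly 40GB, leaving room for context, while the same model at FP16 would exceed the 96GB ceiling.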
Tikoo’s count of 35 Strix Halo products covers the full range: thin-and-light laptops like the HP ZBook Ultra G1a, which launched in early 2025, through a wave of mini-PCs that followed across the rest of the year. Framework’s Desktop, also built around the platform, drew a developer audience that cares about repairability alongside raw memory capacity.
At Computex, AMD launched its own first-party developer box, the Ryzen AI Halo mini-PC, priced at $3,999 with a 2TB SSD. The box ships with ROCm, PyTorch, and AMD-validated model packages already installed, alongside what Tikoo called developer playbooks. AMD re-certifies the full software stack every month so inference tools keep working as upstream packages change. Setting up a local inference environment from scratch, Tikoo said at the roundtable, can eat a full weekend for someone who already knows what they’re doing. The $3,999 price buys back that weekend.
What RTX Spark Is Promising
The RTX Spark Superchip is Nvidia’s consumer version of the Grace Blackwell GB10 architecture, the same silicon Nvidia CEO Jensen Huang tied to the N1 and N1X chips inside the DGX Spark desktop. The consumer package pairs a 20-core Arm CPU from MediaTek with a Blackwell GPU carrying 6,144 CUDA cores, scales from 16GB up to 128GB of unified LPDDR5X memory, and claims 1 petaflop of FP4 compute. Nvidia quotes 300 GB/s of memory bandwidth.
That figure carries a caveat. Nvidia quoted 300 GB/s for GB10 at the Hot Chips conference last year, and the DGX Spark shipped at 273 GB/s instead. Whether the RTX Spark variant closes that gap won’t be known until consumer devices reach reviewers.
More than 30 OEM partners have signed on for fall, among them ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI, with Acer and Gigabyte following. No partner has disclosed a price. The platform runs Windows on Arm, carrying the same x86 application compatibility questions that have trailed Qualcomm’s Snapdragon X series, while AMD’s x86 architecture sidesteps those questions entirely. Adobe is rebuilding Photoshop and Premiere from the ground up for RTX Spark, and Nvidia and Microsoft are co-developing OpenShell, a framework that positions Windows as an agentic operating system where models run overnight tasks without user input. Both are software bets that can only be measured once hardware ships.
Specs Side by Side
The two platforms share the same memory ceiling and diverge on CPU architecture, GPU lineage, and compute format. AMD’s specification sheet for the chip lists a 55W default TDP with a configurable range up to 120W, giving OEMs wide latitude in thermal design. Nvidia’s entry-level RTX Spark configurations start at 16GB of memory; AMD’s existing Ryzen AI 300 and 400 series already cover that lower tier today.
| Feature | AMD Ryzen AI Max+ 395 | Nvidia RTX Spark |
|---|---|---|
| CPU | 16-core Zen 5 (x86), up to 5.1 GHz | 20-core Arm, up to 4.1 GHz |
| GPU | 40 RDNA 3.5 CUs (integrated) | 6,144 Blackwell CUDA cores |
| Max memory | 128GB LPDDR5X | 128GB LPDDR5X |
| Memory bandwidth | 256 GB/s (theoretical peak) | 300 GB/s claimed; 273 GB/s on DGX Spark |
| CPU architecture | x86, native Windows | Arm, Windows on Arm |
| AI compute | ~60 TFLOPS RDNA 3.5 | 1 PFLOP FP4 (Blackwell) |
| Availability | On sale since January 2025 | Fall 2026 |
| Entry price | ~$3,300 (third-party); $3,999 (AMD Halo) | Not announced |
| GPU software | ROCm, Vulkan | CUDA, TensorRT |
Nvidia’s Blackwell architecture carries a genuine edge on low-precision matrix math, particularly for prompt processing, the phase that determines time to first token. AMD’s answer is x86 compatibility and roughly 18 months of shipping products. The DGX Spark’s measured 273 GB/s also puts Nvidia’s bandwidth claim in perspective before RTX Spark hardware is available for independent testing.
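One way to see why the bandwidth row matters is a common back-of-envelope rule: for a dense model, generating each token requires reading roughly all the weights once, so bandwidth divided by model size gives a loose upper bound on tokens per second. The sketch below applies that rule to the table's bandwidth figures. The 40GB model size is an assumption (roughly a 70B model at 4-bit quantization), and real throughput will be lower once KV cache reads, compute limits, and software overhead are counted.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 40.0  # ASSUMPTION: ~70B dense model at 4-bit quantization

# Bandwidth figures taken from the comparison table above.
platforms = [
    ("Ryzen AI Max+ 395 (theoretical peak)", 256.0),
    ("DGX Spark (as shipped)", 273.0),
    ("RTX Spark (claimed)", 300.0),
]

for label, bandwidth in platforms:
    ceiling = decode_ceiling_tok_s(bandwidth, MODEL_GB)
    print(f"{label}: at most ~{ceiling:.1f} tok/s")
```

On this estimate the three figures land within about one token per second of each other for a model of that size, which suggests the bandwidth gap alone is unlikely to separate the platforms on token generation.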
ROCm’s Changed Baseline
The Tools That Now Run
ROCm’s reputation was earned honestly: it was difficult, it was incomplete, and a developer who hit an edge case had far less community knowledge to draw on than a CUDA user would. That description is aging fast. The inference stack most local AI users actually care about now supports AMD hardware in ways that hold up in practice.
Confirmed working on the gfx1151 integrated GPU in current Strix Halo machines:
- LM Studio (Vulkan backend on Windows; ROCm backend on Linux)
- Ollama (automatic AMD GPU detection on Linux with ROCm installed)
- llama.cpp (Vulkan and ROCm/HIP targets; AMD provides pre-built Windows binaries for the ROCm path)
- ComfyUI and Stable Diffusion (via PyTorch with ROCm wheels)
- vLLM (AMD ROCm listed as a first-class platform alongside CUDA since v0.16, with pre-built Docker images)
On current hardware, Vulkan tends to come out ahead for raw token generation throughput. ROCm matters more for prompt processing, Flash Attention, and long-context behavior where AMD’s HIP stack has architectural advantages. Which backend wins depends on model size, quantization level, and context length.
HIP’s Compatibility Bridge
HIP (Heterogeneous-compute Interface for Portability, AMD’s GPU programming interface) is what makes the PyTorch transition smoother than the name “alternative CUDA” implies. On a ROCm build of PyTorch, the familiar torch.cuda API still exists, but those calls route through HIP to AMD hardware instead of through CUDA on Nvidia silicon. Most PyTorch code written with CUDA assumptions runs on AMD without being rewritten around a separate API.
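A minimal sketch of what that looks like from the user's side follows. It relies on `torch.version.hip` and `torch.version.cuda`, which PyTorch sets depending on how the build was compiled; the helper name `describe_backend` is hypothetical and introduced here only for illustration. The device string stays `"cuda"` on a ROCm build.

```python
def describe_backend(torch_module) -> str:
    """Report which GPU stack a PyTorch build was compiled against."""
    hip = getattr(torch_module.version, "hip", None)    # set on ROCm builds
    cuda = getattr(torch_module.version, "cuda", None)  # set on CUDA builds
    if hip:
        return f"ROCm/HIP {hip}"
    if cuda:
        return f"CUDA {cuda}"
    return "CPU-only"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("Build:", describe_backend(torch))
        if torch.cuda.is_available():
            # Same call and same device string on AMD and Nvidia hardware.
            device = torch.device("cuda")
            x = torch.ones(2, 2, device=device)
            print(torch.cuda.get_device_name(0), x.sum().item())
        else:
            print("No GPU visible to this PyTorch build")
```

Code that only ever asks for `"cuda"` and never inspects the build string is the kind that tends to move between vendors unchanged.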
PyTorch’s AMD integration has moved quickly. Version 2.9 brought ROCm into the experimental wheel-variant system, making installation less awkward. Version 2.12 added expandable memory segments, rocSHMEM symmetric memory collectives, and FlexAttention pipelining. AMD’s developer documentation lists ROCm 7.2.1 support covering Radeon RX 9000, select RX 7000 cards, and Ryzen AI Max, AI 300, and select AI 400 series APUs. Windows is now part of AMD’s ROCm story for consumer hardware, particularly around PyTorch, even if Linux remains the broader and more stable target for serious inference work. The RX 7900 XTX, which once required manually building half the stack, sits on the official support list today.
Where CUDA Still Leads
AMD’s VP laid the gaps out at the roundtable without papering over them. On sandboxing for agentic use cases, which becomes a security requirement as AI agents handle more autonomous overnight tasks, he said:
“That’s one of the things we want to address quickly.”
He identified a second gap at the same session: making ROCm software built for Instinct data-center accelerators fully usable on endpoint iGPUs like the one in Strix Halo, which he called a big focus. AMD’s NPU (Neural Processing Unit, the dedicated AI accelerator block in Ryzen AI chips) has the same problem from a different angle. It handles efficiency workloads when power consumption matters more than peak speed, but an ISV (independent software vendor) library for NPU-specific inference is still being assembled.
The Flash Attention implementation on gfx1151 fails in some PyTorch builds, an open bug that users of the chip have reported in the field. Quantization libraries like bitsandbytes have AMD support now, but the CUDA path ships first and carries fewer rough edges. Ollama still intermittently times out on this hardware while it hunts for the GPU and then falls back silently to CPU inference. None of these issues is unfixable, but a CUDA user encounters none of them.
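Because the Ollama fallback is silent, it can be worth checking where a loaded model actually lives. The sketch below is one possible approach, not an official diagnostic: it assumes a local Ollama server on the default port 11434 and reads the `size` and `size_vram` fields reported by its `/api/ps` endpoint. The helper names and the 0.99 threshold are hypothetical choices made for this example.

```python
import json
import urllib.request


def gpu_fraction(model_entry: dict) -> float:
    """Share of a loaded model's bytes resident in GPU memory (0.0 to 1.0)."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def fetch_running_models(base_url: str = "http://localhost:11434",
                         timeout: float = 2.0) -> list:
    """Ask a local Ollama server which models are currently loaded."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


if __name__ == "__main__":
    try:
        models = fetch_running_models()
    except (OSError, ValueError) as exc:
        print(f"Could not query Ollama: {exc}")
    else:
        for entry in models:
            frac = gpu_fraction(entry)
            # ASSUMPTION: anything below ~99% in VRAM is treated as a fallback.
            status = "on GPU" if frac >= 0.99 else "partly or fully on CPU"
            print(f"{entry.get('name', '?')}: {frac:.0%} in VRAM ({status})")
```

Running it right after loading a model gives a quick signal of whether inference is actually using the integrated GPU.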
Training is the softest part of AMD’s story. Zyphra’s ZAYA1-8B, trained on AMD Instinct MI300X clusters and described by the company as the first large-scale mixture-of-experts foundation model trained entirely on AMD hardware, drew attention when it shipped. Several years into the LLM era, a claim like that being notable still illustrates how thoroughly CUDA-first the training world remains. New AI research papers assume CUDA, and model releases on new architectures target Nvidia hardware first. ROCm support follows, with a delay and more rough edges.
The Price Spread and the Wait
AMD’s Ryzen AI Halo opens for pre-order this month at $3,999. Nvidia’s DGX Spark, after launching at that same price, was raised to $4,699, putting AMD’s first-party box $700 below its direct Nvidia counterpart before RTX Spark pricing is even known.
Gorgon Halo, the Ryzen AI Max 400 series refresh due in Q3, pushes the memory ceiling to 192GB and raises the model parameter ceiling to around 300 billion. That arrives roughly when Nvidia’s first RTX Spark devices ship, meaning updated AMD hardware and Nvidia’s first consumer generation land in close proximity.
RTX Spark pricing remains undisclosed. Hands-on time has been limited to guided demos, and independent benchmarks on prompt processing and token throughput won’t exist until consumer devices ship. When they do, Nvidia’s FP4 compute lead will face its first independent test. AMD has 35 products on shelves today.





