A new benchmark designed specifically for agentic AI workloads has ranked NVIDIA's latest hardware as the top performer, and the test itself may matter as much as the result.

AgentPerf, developed by Artificial Analysis, is described by both the firm and NVIDIA as the industry's first benchmark built to evaluate AI infrastructure for agentic tasks. Unlike traditional AI benchmarks, which measure raw model accuracy or throughput on static tasks, AgentPerf is designed to measure how well systems handle the multi-step, autonomous workflows that define modern AI agents.

According to the NVIDIA Blog, the NVIDIA Blackwell Ultra NVL72 platform delivered leading performance across the first round of published AgentPerf results. The platform is built on NVIDIA's Blackwell architecture, its most recent generation of AI chips.

The benchmark is positioned as a practical tool for developers, enterprises, and infrastructure providers who need apples-to-apples comparisons when choosing hardware for agentic deployments. That decision has become increasingly consequential as companies move from experimenting with chatbots to deploying AI systems that can plan, execute, and iterate autonomously.

The timing is notable. The AI industry has spent years optimizing for training large models, but inference for agents, which may call tools, loop through reasoning steps, and run for extended periods, places different demands on hardware. A shared measurement standard could help cut through marketing claims and give buyers clearer guidance.

If AgentPerf gains broad adoption as a standard, it could reshape how the chip and cloud industries compete for the fast-growing agentic AI market.