VAST Data Unveils New Inference Architecture to Power NVIDIA’s Inference Context Memory Storage Platform
VAST Data, a company specializing in AI operating systems, has announced a new Inference Architecture designed to enable deployment of NVIDIA’s Inference Context Memory Storage Platform, marking a major step forward in support for Agentic AI applications and long-running inference workloads.
The platform introduces a new category of storage infrastructure purpose-built for gigascale inference, leveraging NVIDIA BlueField-4 DPUs and Spectrum-X Ethernet networking to dramatically accelerate access to Key-Value (KV) cache memory. The architecture enables high-speed sharing of inference context across distributed nodes while delivering significant gains in energy efficiency.
Redefining Inference for Agentic AI
As inference evolves from single-response outputs to continuous, multi-step reasoning across multiple AI agents, VAST Data noted that context can no longer be confined to a single node. Performance is now determined less by raw GPU compute power than by how efficiently inference history (the KV cache) can be stored, retrieved, reused, and scaled under constant operational pressure.
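To make that bottleneck concrete, here is a minimal, self-contained Python sketch of prefix-based KV-cache reuse. The names (KVCache, prefill, answer) and the toy cost model are illustrative assumptions, not VAST or NVIDIA interfaces: the first agent step pays the full prefill cost, while a follow-up step that shares the same context prefix skips straight to decoding, which is what lowers time-to-first-token.

```python
"""Toy sketch of prefix KV-cache reuse; all names are hypothetical,
not part of any VAST Data or NVIDIA API."""
import hashlib


class KVCache:
    """Maps a hash of a token prefix to the keys/values computed for it."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens):
        # Hash the token prefix so identical contexts share one entry.
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def get(self, tokens):
        return self._store.get(self._key(tokens))

    def put(self, tokens, kv):
        self._store[self._key(tokens)] = kv


def prefill(tokens):
    """Stand-in for the expensive prefill pass that builds keys/values."""
    # A real model would run attention here; we just tag each token.
    return [("k_" + t, "v_" + t) for t in tokens]


def answer(prompt_tokens, cache):
    """Serve one agent step, reusing cached context when available."""
    kv = cache.get(prompt_tokens)
    if kv is None:                        # cache miss: pay full prefill cost
        kv = prefill(prompt_tokens)
        cache.put(prompt_tokens, kv)
        print(f"prefill of {len(prompt_tokens)} tokens (cache miss)")
    else:                                 # cache hit: skip straight to decode
        print("reused cached context (cache hit)")
    return kv


cache = KVCache()
history = ["system:", "you", "are", "an", "agent"]
answer(history, cache)   # first step: full prefill
answer(history, cache)   # follow-up step: context reused, lower TTFT
```

In a multi-agent deployment, the interesting case is the one this toy version cannot show: the cache entry is produced on one node and consumed on another, which is exactly the cross-node sharing problem the announcement targets.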
To address this shift, VAST Data has re-architected the inference data path by running the VAST AI Operating System natively on NVIDIA BlueField-4 DPUs, embedding core data services directly inside the GPU servers that execute inference workloads, alongside dedicated data nodes. This approach eliminates traditional client-server bottlenecks, reduces unnecessary data copies, and lowers time-to-first-token (TTFT) while scaling to more concurrent sessions.
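As a rough sketch of what an embedded data path buys, the example below models a tiered context lookup: a node first checks its local (HBM-like) cache, then a shared tier visible to all nodes, and only recomputes on a full miss. ContextStore, InferenceNode, and fetch_context are hypothetical stand-ins for illustration, not the actual VAST AI Operating System interface; in the real architecture the shared tier would sit behind BlueField-4 DPUs and Spectrum-X networking rather than in-process.

```python
"""Hedged sketch of a tiered KV-context lookup across inference nodes;
all class and method names are illustrative assumptions."""


class ContextStore:
    """Shared tier visible to every inference node (simulated in-process)."""

    def __init__(self):
        self._blocks = {}

    def get(self, block_id):
        return self._blocks.get(block_id)

    def put(self, block_id, kv_block):
        self._blocks[block_id] = kv_block


class InferenceNode:
    """One GPU server; checks local memory before the shared tier."""

    def __init__(self, shared):
        self.local = {}          # stand-in for the on-GPU HBM cache
        self.shared = shared     # stand-in for the DPU-attached shared tier

    def fetch_context(self, block_id, recompute):
        if block_id in self.local:
            return self.local[block_id], "local hit"
        kv = self.shared.get(block_id)
        if kv is not None:
            self.local[block_id] = kv        # one copy in; no further round trips
            return kv, "shared-tier hit"
        kv = recompute()                     # worst case: rebuild from tokens
        self.local[block_id] = kv
        self.shared.put(block_id, kv)        # publish for every other node
        return kv, "recomputed"


shared = ContextStore()
a, b = InferenceNode(shared), InferenceNode(shared)
print(a.fetch_context("session-42", lambda: "kv-bytes")[1])  # recomputed
print(b.fetch_context("session-42", lambda: "kv-bytes")[1])  # shared-tier hit
print(b.fetch_context("session-42", lambda: "kv-bytes")[1])  # local hit
```

The design point the sketch illustrates is that only the first node ever pays the recompute cost for a given context; every subsequent session, on any node, starts from a hit in one of the two tiers, which is how TTFT stays low as concurrent sessions scale.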
