
“This greatly aids our customers in building an overall working solution, because the interactions between the network and the host can be complicated and difficult to debug when it’s different systems collecting them,” Duda said.
Analysts react to telemetry preview
Arista declined to share more details about its forthcoming AI telemetry extensions, but experts say additional control features would benefit high-end customers, such as hyperscalers, that operate AI networks.
“Modern switches already know detailed internal conditions (congestion, drops, buffers, RDMA counters, latency), but that information is invisible unless it’s exported. Streaming it to a central system makes the network observable in real time, not just via logs but via live operational state. This is especially critical for AI clusters, where tiny network issues can stall synchronized GPU jobs and waste massive compute resources,” said Sameh Boujelbene, vice president of Dell’Oro Group.
“Operators therefore need visibility across both the network and the hosts (congestion, NIC buffering, RDMA behavior, and collective performance), all at once. The key idea is to unify host and network telemetry into one correlated view. Many failures happen between layers, and siloed monitoring hides the root cause. A single timeline that combines both perspectives lets operators see the full pipeline and diagnose complex performance problems much faster,” Boujelbene said.
According to Alan Weckel, co-founder and analyst with the 650 Group, telemetry is key to understanding what is actually occurring in AI fabrics, and Arista already has many of these features on the switch side.
Arista bought Big Switch Networks and its Big Cloud Fabric in 2020. That technology lets customers manage physical switches as a single fabric, including security, automation, orchestration and analytics. Importantly, the software can run on a variety of certified switches from Dell EMC, HPE and others.
