Virtuals integrates Leyten's GPU engine to run GLM-5.2 across AI agent network

In brief

  • Virtuals Protocol integrated Leyten's shard engine to distribute GLM-5.2 inference across networked GPUs
  • GLM-5.2, released publicly by Z.ai on June 16, has about 744 billion parameters and a 1 million token context window
  • The integration reduces reliance on centralized cloud infrastructure for frontier-scale AI inference

How the integration works

Leyten's shard engine uses pipeline-parallel inference to slice large models into sequential stages and distribute them across separate GPUs. This approach lets Virtuals run frontier-scale AI without relying on a single massive GPU cluster or on centralized cloud infrastructure.
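
The idea behind pipeline-parallel inference can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not Leyten's implementation: the names `Stage`, `partition_layers`, and `run_pipeline` and the device labels are hypothetical, and each "layer" is a plain function standing in for a block of model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A "layer" here is any function from activation to activation (toy stand-in).
Layer = Callable[[float], float]


@dataclass
class Stage:
    device: str          # hypothetical device label, e.g. "gpu-0"
    layers: List[Layer]  # contiguous slice of the model held by this device

    def forward(self, x: float) -> float:
        # Run this stage's layers in order on the incoming activation.
        for layer in self.layers:
            x = layer(x)
        return x


def partition_layers(layers: Sequence[Layer], devices: Sequence[str]) -> List[Stage]:
    """Split layers into contiguous, near-equal slices, one per device."""
    n, k = len(layers), len(devices)
    if k == 0 or n < k:
        raise ValueError("need at least one device and one layer per device")
    base, extra = divmod(n, k)
    stages, start = [], 0
    for i, device in enumerate(devices):
        size = base + (1 if i < extra else 0)  # earlier stages absorb the remainder
        stages.append(Stage(device, list(layers[start:start + size])))
        start += size
    return stages


def run_pipeline(stages: Sequence[Stage], x: float) -> float:
    # In a real deployment each hop would send activations over the network
    # to the next GPU; here it is an ordinary function call.
    for stage in stages:
        x = stage.forward(x)
    return x


# Toy model: 10 layers that each add 1 to the activation.
layers = [lambda x: x + 1.0 for _ in range(10)]
stages = partition_layers(layers, ["gpu-0", "gpu-1", "gpu-2"])
sizes = [len(s.layers) for s in stages]
result = run_pipeline(stages, 0.0)
print(sizes, result)
```

Because every stage holds only its own slice of layers, no single device needs to store the whole model. The trade-off is that each request must pass through every stage in order.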

GLM-5.2 is a substantial model. The open-weight model uses a mixture-of-experts design: it contains approximately 744 billion total parameters, but only 39 to 40 billion are active for any given token. It also ships with a 1 million token context window, five times larger than that of its predecessor, GLM-5.1.
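
A quick back-of-the-envelope calculation puts those figures in perspective. The parameter counts come from the text above; the 2 bytes per parameter (16-bit weights) and the 80 GB of memory per GPU are illustrative assumptions, not published deployment details, and the estimate ignores the KV cache and activations.

```python
TOTAL_PARAMS = 744_000_000_000                   # total parameters, from the article
ACTIVE_PARAMS = (39_000_000_000, 40_000_000_000)  # active per token, from the article

BYTES_PER_PARAM = 2               # assumption: 16-bit weights
GPU_MEMORY_BYTES = 80_000_000_000  # assumption: hypothetical 80 GB GPU


def active_fraction(active: int, total: int) -> float:
    """Share of the parameters used for a single token."""
    return active / total


def min_gpus_for_weights(total_params: int, bytes_per_param: int, gpu_bytes: int) -> int:
    """Lower bound on GPUs needed just to hold the weights (ceiling division)."""
    weight_bytes = total_params * bytes_per_param
    return -(-weight_bytes // gpu_bytes)


low, high = (active_fraction(a, TOTAL_PARAMS) for a in ACTIVE_PARAMS)
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM // 1_000_000_000
gpus = min_gpus_for_weights(TOTAL_PARAMS, BYTES_PER_PARAM, GPU_MEMORY_BYTES)

print(f"active share per token: {low:.1%} to {high:.1%}")  # 5.2% to 5.4%
print(f"weights at 16-bit: {weights_gb} GB")               # 1488 GB
print(f"minimum GPUs for weights alone: {gpus}")           # 19
```

Under these assumptions only about one parameter in nineteen does work for a given token, yet all of the weights must still be resident somewhere, which is why splitting the model across many GPUs matters.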

The broader shift

Z.ai released GLM-5.2 to subscribers on June 13, 2026, before making it publicly available three days later. The timing reflects growing momentum around open-weight models as alternatives to proprietary frontier AI.

Virtuals' ecosystem runs on its native token, VIRTUAL. By distributing inference across a decentralized network of GPUs, the platform sidesteps the infrastructure bottlenecks that centralized AI providers face. This integration signals a shift toward decentralized AI infrastructure in the crypto space, where cost and access constraints have historically limited what's possible onchain.