Virtuals integrates Leyten's GPU engine to run GLM-5.2 across AI agent network
In brief
- Virtuals Protocol integrated Leyten's shard engine to distribute GLM-5.2 inference across networked GPUs
- GLM-5.2 has 744 billion parameters and a 1 million token context window; Z.ai released it publicly on June 16
- The integration reduces reliance on centralized cloud infrastructure for frontier-scale AI inference
How the integration works
Leyten's shard engine uses pipeline-parallel inference: it slices a large model into pieces, places each piece on a separate GPU, and passes intermediate results from one GPU to the next. This approach lets Virtuals run frontier-scale AI without relying on a single massive GPU cluster or centralized cloud infrastructure.
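To make the idea concrete, here is a minimal sketch of pipeline-parallel partitioning in Python. It is not Leyten's implementation, whose internals are not public in this article. The function names `partition_layers` and `run_pipeline` are hypothetical, the "layers" are toy functions standing in for transformer blocks, and the network hop between GPUs is only marked by a comment.

```python
from typing import Callable, List

# A toy "layer": takes an activation value and returns a new one.
Layer = Callable[[float], float]


def partition_layers(num_layers: int, num_stages: int) -> List[range]:
    """Split layer indices into contiguous, near-equal pipeline stages.

    Each stage is assumed to live on its own GPU (hypothetical setup).
    """
    if not 1 <= num_stages <= num_layers:
        raise ValueError("need between 1 and num_layers stages")
    base, extra = divmod(num_layers, num_stages)
    stages: List[range] = []
    start = 0
    for stage_index in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage_index < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def run_pipeline(layers: List[Layer], stages: List[range], x: float) -> float:
    """Run one input through every stage in order."""
    for stage in stages:
        for layer_index in stage:
            x = layers[layer_index](x)
        # In a real deployment, the activation `x` would be sent over
        # the network to the GPU that holds the next stage here.
    return x


# Ten toy layers; layer k simply adds k to its input.
toy_layers: List[Layer] = [lambda x, k=k: x + k for k in range(10)]
toy_stages = partition_layers(num_layers=10, num_stages=4)
result = run_pipeline(toy_layers, toy_stages, 0.0)
```

With 10 layers and 4 stages, the split comes out as 3, 3, 2 and 2 layers. The output matches what a single machine would produce by running all layers in order, which is the property that lets a model be spread across separate GPUs without changing its answers.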
GLM-5.2 is a substantial model. The open-weight model uses a mixture-of-experts design: of approximately 744 billion total parameters, only 39 to 40 billion are active for any given token. It also ships with a 1 million token context window, five times larger than that of its predecessor, GLM-5.1.
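The figures above imply how sparse the model is in practice. The short calculation below uses only the numbers reported in this article; the variable names are illustrative, and the context window is assumed to be exactly 1,000,000 tokens for the sake of the arithmetic.

```python
# Figures as reported in the article (approximate).
TOTAL_PARAMS = 744e9          # total parameters
ACTIVE_PARAMS_LOW = 39e9      # lower bound of active parameters per token
ACTIVE_PARAMS_HIGH = 40e9     # upper bound of active parameters per token
CONTEXT_TOKENS = 1_000_000    # assumption: "1 million" taken as exactly 1,000,000
CONTEXT_RATIO = 5             # reported ratio versus the predecessor


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used for a single token."""
    return active / total


low = active_fraction(ACTIVE_PARAMS_LOW, TOTAL_PARAMS)
high = active_fraction(ACTIVE_PARAMS_HIGH, TOTAL_PARAMS)

# Context window of the predecessor implied by the reported 5x ratio.
implied_previous_context = CONTEXT_TOKENS // CONTEXT_RATIO

print(f"Active share per token: {low:.1%} to {high:.1%}")
print(f"Implied predecessor context: {implied_previous_context:,} tokens")
```

On these figures, roughly 5.2% to 5.4% of the model's parameters do the work for any one token, and the reported ratio implies a predecessor context window of about 200,000 tokens.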
The broader shift
Z.ai released GLM-5.2 to subscribers on June 13, 2026, before making it publicly available three days later. The timing reflects growing momentum around open-weight models as alternatives to proprietary frontier AI.
Virtuals' ecosystem runs on its native token, VIRTUAL. By distributing inference across a decentralized network of GPUs, the platform sidesteps the infrastructure bottlenecks that centralized AI providers face. This integration signals a shift toward decentralized AI infrastructure in the crypto space, where cost and access constraints have historically limited what's possible onchain.


