Perplexity Hybrid Inference Splits AI Work Between Device and Cloud

By Khal · Jun 3, 2026 (1 month ago) · 1 min read

Published Jun 3, 2026, 7:51 p.m.

In brief

Perplexity CEO Aravind Srinivas announced a hybrid inference orchestrator at Computex 2026 on June 2
System routes simple tasks to the local device and complex reasoning to cloud servers
Local model keeps sensitive data such as financial records and health information on the device when possible
Design goal: maximum token value per watt for each user
Launches in Perplexity Computer in July

How It Works

A compact model runs locally on your device and acts as a traffic cop, deciding which information is sensitive and which tasks need cloud-based frontier models. Simple tasks such as summarizing documents, formatting text, and lightweight classification run locally. Complex reasoning is routed to the cloud automatically, without the user having to choose a model.

The system is designed for work that calls for powerful AI capabilities but involves sensitive data, such as financial records, health information, and personal files. Today, almost all AI inference happens on remote servers owned by AI companies, so user data travels to external computers before a response is generated. Perplexity's approach keeps that data on your machine when possible.
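Perplexity has not published how the orchestrator makes its decisions, so the following is only a rough sketch of the routing idea described in this section. Everything in it is an assumption made for illustration: the task labels, the keyword list standing in for a sensitivity model, and the names `Request`, `Decision`, and `decide` are hypothetical and are not Perplexity's API.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Task labels based on the article's examples of simple work (names are hypothetical).
SIMPLE_TASKS = {"summarize", "format", "classify"}

# Hypothetical keyword list standing in for a real on-device sensitivity model.
SENSITIVE_MARKERS = ("bank statement", "diagnosis", "tax return")


@dataclass
class Request:
    task: str  # what the user wants done, e.g. "summarize" or "reason"
    text: str  # the content the task operates on


@dataclass
class Decision:
    route: Route     # where the task would run
    sensitive: bool  # whether the content was flagged as sensitive


def is_sensitive(text: str) -> bool:
    """Flag text that contains any of the assumed sensitive markers."""
    lowered = text.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def decide(request: Request) -> Decision:
    """Send simple tasks to the device and everything else to the cloud."""
    sensitive = is_sensitive(request.text)
    route = Route.LOCAL if request.task in SIMPLE_TASKS else Route.CLOUD
    return Decision(route=route, sensitive=sensitive)


if __name__ == "__main__":
    print(decide(Request("summarize", "Summarize my bank statement")))
    print(decide(Request("reason", "Compare three go-to-market strategies")))
```

The sketch only flags sensitive content rather than acting on it, because the announcement does not say what happens when a complex task also involves sensitive data.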

The Economics and Philosophy

Perplexity's goal for the system is to deliver the most token value per watt for each user. The company isn't open-sourcing the local model; it is a compact model deployed as part of Perplexity's app, with cloud routing through Perplexity's servers.

The efficiency pitch matters. Some organizations spend half a billion dollars per month on compute. Offloading inference work to user hardware reduces server costs, a critical concern as AI companies scale. CEO Aravind Srinivas framed the problem as one of centralization: "You don't want all your compute centralized in servers and everything running through the largest models."

Perplexity's revenue grew from $100 million to $500 million, and hybrid inference could help the company sustain that growth trajectory without proportional increases in infrastructure spend. The July rollout will be the first real test of whether users embrace splitting their AI workload across local and cloud endpoints.
