Apple has used its WWDC 2026 keynote to announce what it calls the most significant upgrade to Siri since its launch — a 3-billion-parameter large language model that runs entirely on-device, processing natural-language requests without sending a single byte to the cloud.
On-Device First
The technical achievement is the core of the announcement. Apple has compressed a capable language model to run on its custom neural engine with what the company claims is negligible impact on battery life. The model powers Siri's ability to understand multi-turn conversations, summarize long documents, draft emails and messages, and control third-party apps through natural language — all without an internet connection.
"Every AI company asks you to send your data to their servers," said Tim Cook during the keynote. "We built one that never needs to."
The model requires an A19 Pro chip or later on iPhone (beginning with the iPhone 17 Pro) and an M4 or later on iPad and Mac. Older devices will continue using the existing Apple Intelligence features introduced in iOS 18.
What Siri Can Do Now
The upgraded Siri handles tasks that previously required cloud-based AI. In demonstrations, Apple showed Siri summarizing a 40-page PDF entirely on-device, drafting a reply to a complex email thread with awareness of previous messages, and controlling a sequence of app actions — "find the restaurant Sarah mentioned in our Messages conversation last week, check if they have availability Saturday, and make a reservation for four."
The system also supports what Apple calls "App Intents Intelligence," which lets Siri understand and operate within third-party apps that adopt the new API. Developers can expose app functionality to Siri without sharing user data with Apple.
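Apple has not published the new API's surface, but it is described as an extension of the existing App Intents framework. As a rough sketch of what exposing an app action to Siri looks like today, the intent below uses the real AppIntents types; the restaurant-booking scenario and the `ReservationService` helper are illustrative, not part of any Apple API.

```swift
import AppIntents

// Illustrative app-side service; not an Apple API.
struct ReservationService {
    static func book(partySize: Int, on date: Date) async throws -> String {
        // App-local logic; nothing here is shared with Apple or Siri's model.
        return "R-1234"
    }
}

// An intent the app exposes to Siri. Siri can discover and invoke it by
// natural language; the perform() body runs inside the app's own process.
struct BookTableIntent: AppIntent {
    static var title: LocalizedStringResource = "Book a Table"
    static var description = IntentDescription("Reserves a table at the restaurant.")

    @Parameter(title: "Party Size")
    var partySize: Int

    @Parameter(title: "Date")
    var date: Date

    func perform() async throws -> some IntentResult & ProvidesDialog {
        let confirmation = try await ReservationService.book(partySize: partySize, on: date)
        return .result(dialog: "Booked a table for \(partySize). Confirmation \(confirmation).")
    }
}
```

Because `perform()` executes locally, the app decides what data the action touches, which is consistent with Apple's claim that developers can expose functionality without sharing user data.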
Cloud Tier for Heavy Tasks
For tasks that exceed the on-device model's capabilities — complex code generation, advanced reasoning, large-scale data analysis — Apple is offering a cloud processing tier using its own server-side models running in what it calls Private Cloud Compute. These servers use Apple Silicon, and Apple claims that no user data is stored or logged.
The cloud tier is optional and requires explicit user consent for each request. It is included at no additional cost for iCloud+ subscribers.
Developer Tools
Apple released a suite of developer tools for building with the on-device model. A new framework called FoundationModels lets developers run inference against the on-device LLM directly from their apps, with results that never leave the device. Early partners including Notion, Slack, and Adobe demonstrated integrations at the keynote.
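Based on the FoundationModels API Apple has documented publicly, calling the on-device model from an app is a few lines of Swift. The snippet below is a sketch under that assumption; the `summarize` helper and its instructions string are illustrative, not from Apple's documentation.

```swift
import FoundationModels

enum ModelError: Error {
    case unavailable
}

// Illustrative helper: summarize a document using the on-device model.
// Both the prompt and the response stay on the device.
func summarize(_ document: String) async throws -> String {
    // The model is only present on supported hardware, so check first.
    guard case .available = SystemLanguageModel.default.availability else {
        throw ModelError.unavailable
    }

    // A session holds the conversation state for the local LLM.
    let session = LanguageModelSession(
        instructions: "Summarize the user's document in three sentences."
    )
    let response = try await session.respond(to: document)
    return response.content
}
```

The availability check matters in practice: on devices without the required chip, an app would fall back to its own behavior rather than assume the model exists.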
Industry Impact
Apple's approach is the polar opposite of the cloud-first strategy pursued by OpenAI, Google, and Anthropic. By running AI on-device, Apple avoids the massive infrastructure costs of cloud inference while reinforcing its privacy brand. The trade-off is capability — a 3-billion-parameter model cannot match the reasoning depth of a frontier cloud model. But for the tasks most people use AI for daily, it may be enough.



