Microsoft Copilot Cowork Is Now Available — Claude and GPT Work Together

Michael Ouroumis · 3 min read

Microsoft just made two of the AI industry's most significant competitors work together inside the same product. Copilot Cowork, the company's bet on long-running agentic work within Microsoft 365, is now available through the Microsoft Frontier early-access program, and it ships with a feature that would have been unthinkable eighteen months ago: Anthropic's Claude and OpenAI's GPT models operating in a coordinated pipeline.

This isn't Microsoft hedging its bets by offering model choice. It's a more deliberate design decision: use each model for what it does best, in sequence, and measure whether the combination outperforms either model working alone.

The Two-Model Pipeline Behind the Numbers

The headline feature is Researcher Critique. When a user asks for a research report, GPT drafts the initial document. Claude then reviews the draft for factual accuracy, logical consistency, and completeness — a second pair of eyes that the product can run automatically without user intervention.

The result, according to Microsoft: a 13.8% improvement on the DRACO benchmark — Deep Research Accuracy, Completeness, Objectivity. That's a specific, named benchmark, not a vague internal metric, and 13.8% is a meaningful gap at the frontier of research quality.

The architecture reflects a pattern that's emerging across the industry: different models have different failure modes. GPT-class models are generally strong at generation and synthesis; Claude has a reputation for being more conservative about hallucination and more willing to flag uncertainty. Running Claude as a critique pass over GPT's draft is meant to catch a class of errors that neither model reliably catches in isolation.
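To make the draft-then-critique pattern concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the names `draft_then_critique`, `fake_drafter`, and `fake_critic` are hypothetical, and each "model" is a plain function standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function from prompt text to output text.
Model = Callable[[str], str]


@dataclass
class ReviewedReport:
    draft: str
    critique: str


def draft_then_critique(question: str, drafter: Model, critic: Model) -> ReviewedReport:
    """Run the drafter first, then hand its output to the critic."""
    draft = drafter(f"Write a research report answering: {question}")
    critique = critic(
        "Review this draft for factual accuracy, logical consistency, "
        f"and completeness:\n{draft}"
    )
    return ReviewedReport(draft=draft, critique=critique)


# Toy stubs so the sketch runs without any network access.
def fake_drafter(prompt: str) -> str:
    return "DRAFT: " + prompt.split(": ", 1)[1]


def fake_critic(prompt: str) -> str:
    draft = prompt.split("\n", 1)[1]
    return f"CRITIQUE of [{draft}]: cite a source for each claim."


report = draft_then_critique("What is DRACO?", fake_drafter, fake_critic)
print(report.draft)
print(report.critique)
```

The point of the shape is that the critic only ever sees the drafter's output, so either stage can be swapped for a different model without touching the other.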

The second significant new feature is Model Council — a side-by-side comparison view that shows responses from multiple models to the same prompt. This isn't a novel idea in AI tooling, but it's the first time Microsoft has productized it inside the M365 surface, giving enterprise users a direct way to evaluate model differences on their actual work rather than synthetic benchmarks.
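A side-by-side comparison of this kind amounts to fanning one prompt out to several models and lining up the answers. The sketch below assumes nothing about how Model Council works internally; `model_council`, `render_side_by_side`, and the two lambda "models" are hypothetical placeholders.

```python
from typing import Callable, Dict

# Assumption: a "model" is any function from prompt text to output text.
Model = Callable[[str], str]


def model_council(prompt: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Send the same prompt to every model and collect responses by name."""
    return {name: model(prompt) for name, model in models.items()}


def render_side_by_side(responses: Dict[str, str]) -> str:
    """Format one row per model, with names padded to a common width."""
    width = max(len(name) for name in responses)
    return "\n".join(
        f"{name.ljust(width)} | {text}" for name, text in responses.items()
    )


# Toy stand-ins for real models.
council = {
    "model-a": lambda p: p.upper(),
    "model-b": lambda p: p[::-1],
}

responses = model_council("compare me", council)
print(render_side_by_side(responses))
```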

Agentic Work in Practice

Beyond the two-model architecture, Copilot Cowork is fundamentally positioned as a product for outcomes rather than interactions. The design philosophy, as Microsoft describes it: "Delegate the outcome you want, Copilot Cowork creates a plan, reasons across your tools and files, and carries work forward."

That's a different user model than a chatbot or even a copilot in the traditional sense. It implies state persistence, tool use, and the ability to execute over minutes or hours rather than seconds. The system is designed to connect steps — pull data from a SharePoint file, synthesize it with an email thread, draft a document, flag inconsistencies, and report back.
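One way to picture "connecting steps" is a plan whose steps share a persistent state, each step reading what earlier steps wrote. The following is an illustrative sketch only; the step functions, the `run_plan` helper, and the hard-coded totals are all made up for the example and do not reflect any Copilot Cowork API.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Step = Tuple[str, Callable[[State], State]]


def run_plan(steps: List[Step], state: State) -> Tuple[State, List[str]]:
    """Execute steps in order, threading shared state through each one."""
    log: List[str] = []
    for name, action in steps:
        state = action(state)
        log.append(name)
    return state, log


# Hypothetical steps mirroring the workflow described above.
def pull_file(s: State) -> State:
    return {**s, "file_total": "120"}  # stand-in for reading a SharePoint file


def read_thread(s: State) -> State:
    return {**s, "email_total": "125"}  # stand-in for reading an email thread


def draft_document(s: State) -> State:
    return {**s, "draft": f"Total per file: {s['file_total']}"}


def flag_inconsistencies(s: State) -> State:
    mismatch = s["file_total"] != s["email_total"]
    return {**s, "flag": "totals disagree" if mismatch else "none"}


plan: List[Step] = [
    ("pull_file", pull_file),
    ("read_thread", read_thread),
    ("draft_document", draft_document),
    ("flag_inconsistencies", flag_inconsistencies),
]

final_state, log = run_plan(plan, {})
print(log)
print(final_state["flag"])
```

Because the state outlives any single step, the final step can compare values that were gathered by two different earlier steps, which a one-shot chat turn cannot do.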

Capital Group, one of the early enterprise users, described it in terms that reflect this: "It's about taking real action — connecting steps, coordinating tasks, and following through." That's language from a firm that manages trillions in assets and presumably has specific, high-stakes workflow requirements. Its presence in early access suggests Copilot Cowork has cleared an enterprise reliability bar that earlier agentic products did not.

What Microsoft Is Actually Building

The strategic picture here is more interesting than any individual feature. Microsoft has exclusive partnership rights with OpenAI, but it's also licensing Claude from Anthropic and building products that use both. The exclusivity is about Azure compute and model deployment; the product layer is clearly being designed to be model-agnostic at the task level.

This creates a durable competitive advantage: Microsoft can upgrade the underlying models without changing the user interface, and it can route tasks to whichever model performs best for that specific task type. If GPT-5 is better at synthesis and Claude Opus is better at critique, the pipeline can encode that knowledge.
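Task-level routing can be sketched as a lookup table from task type to model, kept behind a stable interface. This is a hypothetical illustration, not a description of Microsoft's routing layer; `TaskRouter` and the stub models are invented for the example.

```python
from typing import Callable, Dict

# Assumption: a "model" is any function from prompt text to output text.
Model = Callable[[str], str]


class TaskRouter:
    """Route each task type to a registered model, with a default fallback."""

    def __init__(self, default: Model) -> None:
        self._default = default
        self._routes: Dict[str, Model] = {}

    def register(self, task_type: str, model: Model) -> None:
        # Re-registering a task type replaces the model behind it.
        self._routes[task_type] = model

    def run(self, task_type: str, prompt: str) -> str:
        model = self._routes.get(task_type, self._default)
        return model(prompt)


router = TaskRouter(default=lambda p: f"default:{p}")
router.register("synthesis", lambda p: f"synth-v1:{p}")
router.register("critique", lambda p: f"critic-v1:{p}")

before = router.run("synthesis", "summarize Q3")

# Upgrade the synthesis model; callers keep using the same run() call.
router.register("synthesis", lambda p: f"synth-v2:{p}")
after = router.run("synthesis", "summarize Q3")

print(before)
print(after)
print(router.run("translation", "hello"))
```

Callers only ever name the task type, so the model behind a route can change without any change on the calling side.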

The 13.8% DRACO benchmark improvement is worth taking seriously precisely because Microsoft named it. Companies don't surface specific benchmark numbers in product launches unless they expect scrutiny. DRACO — measuring accuracy, completeness, and objectivity in deep research tasks — is a reasonable proxy for the workflows that make Copilot Cowork worth using in enterprise settings.

Copilot Cowork is available now through the Microsoft Frontier program. Frontier is Microsoft's early-access tier for its most experimental 365 features, meaning broad general availability is likely months away. But the architecture is visible, and the direction is clear: the future of enterprise AI tooling is multi-model pipelines, not single-model chat.
