
Microsoft Copilot Cowork Is Now Available — Claude and GPT Work Together

Michael Ouroumis · 3 min read

Microsoft just made two of the AI industry's most significant competitors work together inside the same product. Copilot Cowork — the company's bet on long-running agentic work within Microsoft 365 — is now available through the Microsoft Frontier early-access program, and it ships with a feature that would have been unthinkable eighteen months ago: Anthropic's Claude and Microsoft's GPT-based models operating in a coordinated pipeline.

This isn't Microsoft hedging its bets by offering model choice. It's a more deliberate design decision: use each model for what it does best, in sequence, and measure whether the combination outperforms either model working alone.

The Two-Model Pipeline Behind the Numbers

The headline feature is Researcher Critique. When a user asks for a research report, GPT drafts the initial document. Claude then reviews the draft for factual accuracy, logical consistency, and completeness — a second pair of eyes that the product can run automatically without user intervention.

The result, according to Microsoft: a 13.8% improvement on the DRACO benchmark — Deep Research Accuracy, Completeness, Objectivity. That's a specific, named benchmark, not a vague internal metric, and 13.8% is a meaningful gap at the frontier of research quality.

The architecture reflects a pattern that's emerging across the industry: different models have different failure modes. GPT-class models are generally strong at generation and synthesis; Claude has a reputation for being more conservative about hallucination and more willing to flag uncertainty. Running Claude as a critique pass after GPT generates catches a class of errors that neither model reliably catches in isolation.
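In code, that generate-then-critique pattern is simple to express. The sketch below is purely illustrative — the `gpt_draft` and `claude_critique` functions are stand-ins, not Microsoft's or Anthropic's actual APIs — but it shows the shape of the pipeline: one model produces, a second model reviews, and the issues it flags travel with the output.

```python
# Hypothetical sketch of a generate-then-critique pipeline.
# Both model calls are stand-in functions, not real inference APIs.

def gpt_draft(prompt: str) -> str:
    """Stand-in for the generation model producing a first draft."""
    return f"DRAFT: {prompt}"

def claude_critique(draft: str) -> list[str]:
    """Stand-in for the critique model flagging issues in the draft."""
    issues = []
    if "uncertain" not in draft:
        issues.append("flag: confidence levels not stated")
    return issues

def researcher_pipeline(prompt: str) -> dict:
    draft = gpt_draft(prompt)
    issues = claude_critique(draft)
    # In the real product the critique presumably feeds a revision pass;
    # here we simply attach it to the result.
    return {"draft": draft, "issues": issues}

result = researcher_pipeline("market overview of EV batteries")
```

The key property is that the critique pass runs automatically, with no user in the loop between the two models.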

The second significant new feature is Model Council — a side-by-side comparison view that shows responses from multiple models to the same prompt. This isn't a novel idea in AI tooling, but it's the first time Microsoft has productized it inside the M365 surface, giving enterprise users a direct way to evaluate model differences on their actual work rather than synthetic benchmarks.

Agentic Work in Practice

Beyond the two-model architecture, Copilot Cowork is fundamentally positioned as a product for outcomes rather than interactions. The design philosophy, as Microsoft describes it: "Delegate the outcome you want, Copilot Cowork creates a plan, reasons across your tools and files, and carries work forward."

That's a different user model than a chatbot or even a copilot in the traditional sense. It implies state persistence, tool use, and the ability to execute over minutes or hours rather than seconds. The system is designed to connect steps — pull data from a SharePoint file, synthesize it with an email thread, draft a document, flag inconsistencies, and report back.

Capital Group, one of the early enterprise users, described it in terms that reflect this: "It's about taking real action — connecting steps, coordinating tasks, and following through." That's language from a firm managing trillions in assets that presumably has specific, high-stakes workflow requirements. The fact that they're in early access suggests Copilot Cowork has cleared some level of enterprise reliability bar that earlier agentic products haven't.

What Microsoft Is Actually Building

The strategic picture here is more interesting than any individual feature. Microsoft has exclusive partnership rights with OpenAI, but it's also licensing Claude from Anthropic and building products that use both. The exclusive is about Azure compute and model deployment; the product layer is clearly being designed to be model-agnostic at the task level.

This creates a durable competitive advantage: Microsoft can upgrade the underlying models without changing the user interface, and it can route tasks to whichever model performs best for that specific task type. If GPT-5 is better at synthesis and Claude Opus is better at critique, the pipeline can encode that knowledge.

The 13.8% DRACO benchmark improvement is worth taking seriously precisely because Microsoft named it. Companies don't surface specific benchmark numbers in product launches unless they expect scrutiny. DRACO — measuring accuracy, completeness, and objectivity in deep research tasks — is a reasonable proxy for the workflows that make Copilot Cowork worth using in enterprise settings.

Copilot Cowork is available now through the Microsoft Frontier program. Frontier is Microsoft's early-access tier for its most experimental 365 features, meaning broad general availability is likely months away. But the architecture is visible, and the direction is clear: the future of enterprise AI tooling is multi-model pipelines, not single-model chat.
