
Stanford Study: AI Tutoring Doubled Student Test Scores in Six Months

Michael Ouroumis | 3 min read

A large-scale randomized controlled trial led by Stanford University's Graduate School of Education has produced the strongest evidence yet that AI tutoring systems can dramatically improve student outcomes. Students who used an AI tutor for 30 minutes daily over six months scored 2.1 times higher on standardized math assessments than a control group receiving only traditional instruction.

Study Design

The trial enrolled 4,200 middle school students across 38 schools in California, Texas, and Georgia. Half were randomly assigned to use Khanmigo, Khan Academy's AI tutor powered by GPT-4, for 30 minutes daily during a dedicated study period. The other half spent the same time on traditional practice worksheets and textbook exercises. Both groups attended identical regular math classes.

The study ran during the 2025-2026 academic year, with assessments at baseline, three months, and six months. Researchers controlled for socioeconomic status, prior academic performance, school quality, and teacher experience.

Key Findings

The headline result is a 2.1x improvement in standardized math scores for the AI tutoring group compared to the control group after six months. But the details are more nuanced and arguably more interesting.

The effect was strongest among students who entered the study with below-average math scores. These students showed a 2.8x improvement, suggesting that AI tutoring is particularly effective for students who are behind. Students who started at or above grade level showed a more modest 1.4x improvement.

Engagement was another surprise. Students in the AI group completed 40% more practice problems than the control group, even though they were given the same amount of time. The researchers attribute this to the AI tutor's adaptive difficulty — it kept students in a productive challenge zone rather than giving them problems that were too easy or too hard.

How the AI Tutor Works

Khanmigo does not simply present problems and check answers. It uses Socratic questioning — asking students to explain their reasoning, identify errors, and work through misconceptions step by step. When a student is stuck, it provides hints rather than answers. When a student makes a systematic error, it identifies the underlying misconception and addresses it directly.
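The hint-first, misconception-aware behavior described above can be illustrated with a small sketch. This is a hypothetical model, not Khanmigo's actual implementation: the `Problem` and `TutorSession` names, the idea of storing known wrong answers alongside the misconception each one signals, and the fixed hint ladder are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """A practice problem with tiered hints and known wrong answers (hypothetical schema)."""
    prompt: str
    answer: str
    # Hints ordered from a general nudge to a specific one; none states the answer.
    hints: list[str]
    # Known wrong answers mapped to the misconception they usually signal.
    misconceptions: dict[str, str] = field(default_factory=dict)


@dataclass
class TutorSession:
    """Tracks one student's attempts on one problem."""
    problem: Problem
    hints_used: int = 0

    def respond(self, student_answer: str) -> str:
        """Return feedback for one attempt without ever revealing the answer."""
        answer = student_answer.strip()
        if answer == self.problem.answer:
            # Ask the student to explain their reasoning, even when correct.
            return "Correct. Can you explain why your method works?"
        if answer in self.problem.misconceptions:
            # A recognized systematic error: name the misconception and question it.
            reason = self.problem.misconceptions[answer]
            return f"It looks like you may be {reason}. Why might that not work here?"
        if self.hints_used < len(self.problem.hints):
            # Unrecognized wrong answer: give the next, slightly more specific hint.
            hint = self.problem.hints[self.hints_used]
            self.hints_used += 1
            return f"Hint: {hint}"
        # Hints exhausted: fall back to breaking the problem into smaller steps.
        return "Let's break this into smaller steps. What is the first thing you need to find?"
```

For example, on the problem "1/2 + 1/3 = ?", a session could map the answer "2/5" to the classic error of adding numerators and denominators separately, so that answer triggers a targeted question instead of a generic hint. A real tutor would have to recognize misconceptions from free-form reasoning rather than from a fixed lookup table.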

The system adapts in real time. If a student masters a concept quickly, it accelerates. If they struggle, it breaks the material into smaller steps and provides additional scaffolding.
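One simple way to get this kind of real-time adaptation is a staircase rule: step difficulty up after a run of correct answers, and step it down (and switch to smaller, scaffolded steps) after a run of misses. The sketch below is a hypothetical illustration, not the algorithm Khanmigo uses; the `DifficultyController` name, the 1 to 10 level scale, and the thresholds of three correct answers to advance and two misses to step back are all assumptions.

```python
class DifficultyController:
    """Hypothetical staircase controller; thresholds and level scale are assumptions."""

    def __init__(self, level: int = 3, min_level: int = 1, max_level: int = 10,
                 up_after: int = 3, down_after: int = 2):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.up_after = up_after      # consecutive correct answers needed to advance
        self.down_after = down_after  # consecutive misses before stepping back
        self.correct_streak = 0
        self.miss_streak = 0
        self.scaffolded = False       # True means: split problems into smaller steps

    def record(self, correct: bool) -> int:
        """Update streaks with one answer and return the (possibly new) level."""
        if correct:
            self.correct_streak += 1
            self.miss_streak = 0
        else:
            self.miss_streak += 1
            self.correct_streak = 0

        if self.correct_streak >= self.up_after:
            # Student is cruising: accelerate and drop the extra scaffolding.
            self.level = min(self.level + 1, self.max_level)
            self.scaffolded = False
            self.correct_streak = 0
        elif self.miss_streak >= self.down_after:
            # Student is struggling: ease off and break material into smaller steps.
            self.level = max(self.level - 1, self.min_level)
            self.scaffolded = True
            self.miss_streak = 0
        return self.level
```

Starting at level 3, three correct answers move the student to level 4; two misses after that bring them back to level 3 with scaffolding switched on. Asymmetric thresholds like these are one way to aim for the "productive challenge zone" the researchers describe, though the right values would have to be tuned empirically.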

Limitations

The researchers note several limitations. The study measured math performance only, so it says nothing about other subjects, where AI tutoring may be less effective. The 30-minute daily commitment required dedicated school time that many districts may struggle to provide. And a six-month study period cannot show whether the gains persist over the long term.

Implications

The findings arrive as school districts across the country debate AI adoption. Several states have banned AI tools in classrooms, while others are piloting them aggressively. This study provides the most rigorous evidence to date that AI tutoring, when implemented as a supplement to human teaching, can produce meaningful improvements in student outcomes.

Khan Academy announced that it will make the study's full dataset available for independent replication.
