Research

Stanford Study: AI Tutoring Doubled Student Test Scores in Six Months

Michael Ouroumis · 3 min read

A large-scale randomized controlled trial led by Stanford University's Graduate School of Education has produced the strongest evidence yet that AI tutoring systems can dramatically improve student outcomes. Students who used an AI tutor for 30 minutes daily over six months scored 2.1 times higher on standardized math assessments than a control group receiving only traditional instruction.

Study Design

The trial enrolled 4,200 middle school students across 38 schools in California, Texas, and Georgia. Half were randomly assigned to use Khanmigo, Khan Academy's AI tutor powered by GPT-4, for 30 minutes daily during a dedicated study period. The other half spent the same time on traditional practice worksheets and textbook exercises. Both groups attended identical regular math classes.

The study ran for the full 2025-2026 academic year, with assessments at baseline, three months, and six months. Researchers controlled for socioeconomic status, prior academic performance, school quality, and teacher experience.

Key Findings

The headline result is a 2.1x improvement in standardized math scores for the AI tutoring group compared to the control group after six months. But the details are more nuanced and arguably more interesting.

The effect was strongest among students who entered the study with below-average math scores. These students showed a 2.8x improvement, suggesting that AI tutoring is particularly effective for students who are behind. Students who started at or above grade level showed a more modest 1.4x improvement.

Engagement was another surprise. Students in the AI group completed 40% more practice problems than the control group, even though they were given the same amount of time. The researchers attribute this to the AI tutor's adaptive difficulty — it kept students in a productive challenge zone rather than giving them problems that were too easy or too hard.

How the AI Tutor Works

Khanmigo does not simply present problems and check answers. It uses Socratic questioning — asking students to explain their reasoning, identify errors, and work through misconceptions step by step. When a student is stuck, it provides hints rather than answers. When a student makes a systematic error, it identifies the underlying misconception and addresses it directly.

The system adapts in real time. If a student masters a concept quickly, it accelerates. If they struggle, it breaks the material into smaller steps and provides additional scaffolding.
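The study does not publish Khanmigo's internal logic, but the real-time adaptation described above can be illustrated with a minimal, hypothetical sketch: the tutor tracks a student's recent accuracy and moves them up or down a difficulty ladder to keep them in a productive challenge zone (the function name, level scale, and 60-80% target band are assumptions for illustration, not details from the study).

```python
def update_difficulty(level, recent_results, target_low=0.6, target_high=0.8):
    """Return the next difficulty level given recent answer correctness.

    recent_results is a list of 1s (correct) and 0s (incorrect) for the
    student's last few problems. The target band approximates a
    "productive challenge zone": hard enough to stretch, easy enough
    to sustain progress.
    """
    if not recent_results:
        return level
    accuracy = sum(recent_results) / len(recent_results)
    if accuracy > target_high:
        # Mastering the concept quickly: accelerate to harder material.
        return level + 1
    if accuracy < target_low:
        # Struggling: step back and break material into smaller pieces.
        return max(0, level - 1)
    # In the challenge zone: hold steady.
    return level
```

A real system would combine signals like hint usage, response time, and error patterns rather than raw accuracy alone, but the core loop (measure, compare to a target band, adjust) is the same idea.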

Limitations

The researchers are careful to note limitations. The study measured math performance only; whether AI tutoring works as well in other subjects remains untested. The 30-minute daily commitment required dedicated school time that many districts may struggle to provide. And the six-month study period does not show whether the gains persist long-term.

Implications

The findings arrive as school districts across the country debate AI adoption. Several states have banned AI tools in classrooms, while others are piloting them aggressively. This study provides the most rigorous evidence to date that AI tutoring, when implemented as a supplement to human teaching, can produce meaningful improvements in student outcomes.

Khan Academy announced that it will make the study's full dataset available for independent replication.

