When AI in Education Makes Students Worse at What Matters: What a Large RCT Means for Neurodivergent Learners

By Dr David Ruttenberg | April 2026 | ~1,500 words · approx. 4.5-minute read

[Image: A student sits at a desk in a dim classroom, laptop glowing, while a closed textbook and unused notebook sit to the side, illustrating over-reliance on AI tools instead of actual learning.]
AI can make students look stronger in the moment — and 17% weaker when the model disappears.

A large randomized controlled trial (RCT) in a real school just delivered a result that should make every educator pause.


When high school students were given unrestricted GPT‑4 access during math practice, their performance shot up. On average, practice scores improved by 48% with a basic GPT‑4 tutor (“GPT Base”) and by 127% with a more guided version (“GPT Tutor”) (Bastani et al., 2025).


On paper, that looks like AI fulfilling every promise in the marketing decks: more help, better performance, happier students.


But then researchers did the one thing too few AI evaluations bother to do.


They took the model away.


On a later exam where no AI tools were allowed, the students who had used GPT‑4 during practice didn’t just fall back to the level of the control group.


They did worse.


Students with access to the basic GPT‑4 tutor scored 17% lower on the final exam than students who had never used AI at all (Bastani et al., 2025).


The same tool that made them look more capable during practice quietly left them less capable when it actually mattered.


What the Bastani et al. Trial Really Shows About AI in Education

The study, led by Bastani and colleagues and later published in PNAS as “Generative AI Without Guardrails Can Harm Learning: Evidence from High School Mathematics”, took place in a high school math curriculum with nearly a thousand students (Bastani et al., 2025). It compared three groups:


  1. Control: No GPT‑4

  2. GPT Base: Standard chat interface, essentially a math‑help chatbot

  3. GPT Tutor: GPT‑4 constrained with “learning‑safe” prompts, meaning teacher‑designed hints, guarded answers, and nudges toward student reasoning rather than full solutions (Bastani et al., 2025); a sketch of what such a wrapper can look like follows this list
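
For a sense of where the “learning‑safe” constraint actually lives, here is a minimal sketch of a GPT Tutor‑style wrapper. It assumes the OpenAI Python SDK; the system prompt, function name, and hint policy are my illustrative assumptions, not the prompts the study itself used.

```python
# A minimal sketch of a "learning-safe" tutor wrapper in the spirit of
# the GPT Tutor condition. Assumes the OpenAI Python SDK; the system
# prompt below is illustrative, NOT the one used in Bastani et al. (2025).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The guardrails live in the system prompt: graduated hints, no full solutions.
LEARNING_SAFE_PROMPT = """You are a math tutor for high school students.
Never give the final answer or a complete worked solution.
Instead: (1) ask the student what they have tried so far,
(2) offer one small hint at a time, and
(3) ask the student to attempt the next step themselves."""

def tutor_reply(student_message: str) -> str:
    """Return a hint-level response that nudges reasoning, not answers."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": LEARNING_SAFE_PROMPT},
            {"role": "user", "content": student_message},
        ],
    )
    return response.choices[0].message.content

print(tutor_reply("Solve 3x + 7 = 22 for me."))
```

In a real deployment the constraint would need to be enforced more robustly than a single prompt, but the sketch shows the key design decision: the model is configured to withhold, not to deliver.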


During practice sessions, both AI conditions improved performance dramatically. But when the final exam came — with no AI access — a sharp divergence appeared:


  • The GPT Base group showed a 17% reduction in exam grades relative to the control group (Bastani et al., 2025).

  • The GPT Tutor group largely avoided this drop, suggesting that carefully designed guardrails can mitigate harm (Bastani et al., 2025).


The authors describe the mechanism plainly: students used GPT‑4 as a “crutch” (Bastani et al., 2025). When the model was present, they leaned on it instead of doing the cognitive work required to encode the skill. When the model disappeared, the illusion of competence disappeared with it.
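
To make the arithmetic of that illusion concrete, here is a toy calculation. The baseline score of 100 is a hypothetical normalization chosen purely for illustration; only the percentage effects come from the study.

```python
# Toy illustration of the practice/exam divergence reported in
# Bastani et al. (2025). The baseline of 100 is hypothetical; only the
# percentage effects (+48%, +127%, -17%) come from the paper.
control = 100.0

practice_gpt_base  = control * 1.48   # +48% during practice
practice_gpt_tutor = control * 2.27   # +127% during practice

exam_gpt_base  = control * 0.83       # -17% on the unassisted exam
exam_gpt_tutor = control * 1.00       # roughly back to control level

print(f"Practice: base={practice_gpt_base:.0f}, tutor={practice_gpt_tutor:.0f}")
print(f"Exam:     base={exam_gpt_base:.0f},  tutor~{exam_gpt_tutor:.0f}")
```

The same students who looked 48% stronger with the tool in hand finished 17% weaker without it.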


This aligns with existing learning science: we already know that “desirable difficulties” — effortful retrieval, grappling with problems, making and correcting mistakes — produce more durable learning than feeling smooth and easy in the moment (Bjork & Bjork, 2011; Deslauriers et al., 2019). Generative AI, used without a learning‑safe design, does exactly the opposite. It makes things feel easier and look better while quietly hollowing out the underlying skill (Bastani et al., 2025).


Why This Is a Neurodiversity Issue, Not Just an EdTech Issue

On its face, the Bastani trial is about high school math.


But if you’re a neurodivergent learner — or you care about one — this is a neurodiversity story (Crompton et al., 2023).


Students with autism, ADHD, dyslexia, and other neurodevelopmental differences are already navigating classrooms calibrated to neurotypical pacing, attention patterns, and output expectations (Ruttenberg, 2024). Many are told, implicitly or explicitly, that they are “behind,” “disorganized,” or “not working to potential.”


In that context, AI tools are marketed as accommodations:


  • “This will help you get your ideas out faster.”

  • “This will take care of the boring parts.”

  • “This will level the playing field.”


But when AI is dropped into that environment without guardrails, neurodivergent students become the ones most at risk of over‑reliance (Ruttenberg, 2024).


They are more likely to be steered toward AI as a default support — by well‑meaning teachers who want to see improved grades and calmer classrooms. They may already carry a history of being told their own strategies are “wrong,” making it easier to trust the model over their own thinking. They may experience more intense cognitive fatigue and sensory load, making the path of least resistance — “let GPT‑4 handle it” — feel not just tempting but necessary (Ruttenberg, 2024).


In my S²MHD work, I map a sensory‑to‑mental‑health pathway where overload leads to anxiety and fatigue, which then collapse attention (Ruttenberg, 2026). In a classroom, that collapse is exactly when an always‑available AI “helper” looks most attractive. If the tool is not carefully designed to protect learning, it can convert moments of overload into moments of skill loss.


For neurodivergent students, that doesn’t just show up as a test score. It shows up as yet another piece of “evidence” that they can’t perform without technological scaffolding — even when the scaffolding itself is what undermined their performance.


How Schools Should Be Thinking About AI Integration

The takeaway from this trial is not “ban GPT‑4 in schools.” The authors themselves show that a thoughtfully constrained tutor — GPT‑4 with teacher‑designed hints and guardrails — can avoid the worst harms (Bastani et al., 2025).


The real takeaway is more uncomfortable:


If you deploy AI into education without a learning‑safe design, you are running an uncontrolled experiment on students’ long‑term competence — and neurodivergent learners will bear the brunt of the negative effects.

For schools and districts, that means at least three concrete responsibilities:


1. Treat AI like a pharmacological intervention, not a new worksheet.

You wouldn’t give every student an unlabelled pill that makes homework feel easier but reduces exam performance by 17% later. AI should be evaluated with the same seriousness: randomized trials, disaggregated outcomes, and explicit attention to how different learner groups are affected (Bastani et al., 2025).
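
As a sketch of what “disaggregated outcomes” can mean in practice, the snippet below breaks exam scores out by study arm and by learner subgroup. The data frame, column names, subgroup flag, and scores are all invented for illustration; a real evaluation would use its own trial records.

```python
# Hypothetical sketch of a disaggregated outcome analysis. All data and
# column names here are invented for illustration; a real evaluation
# would use actual trial data.
import pandas as pd

df = pd.DataFrame({
    "arm":            ["control", "control", "gpt_base", "gpt_base",
                       "gpt_tutor", "gpt_tutor"],
    "neurodivergent": [False, True, False, True, False, True],
    "exam_score":     [72.0, 68.0, 61.0, 54.0, 71.0, 67.0],
})

# Mean exam score per arm, broken out by subgroup. The point is that an
# aggregate average can hide exactly who is bearing the cost.
summary = df.pivot_table(index="arm", columns="neurodivergent",
                         values="exam_score", aggfunc="mean")
print(summary)
```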


2. Design for “friction” on purpose.

If an AI tool gives full solutions instantly, it is almost guaranteed to harm learning. Tools should be configured to withhold answers, offer graduated hints, and prompt students to show their own reasoning — even if that feels slower in the moment (Bjork & Bjork, 2011; Deslauriers et al., 2019).
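
One way to make that friction concrete in software is to gate hints behind recorded student attempts. The class below is a hypothetical sketch of such a policy, not a feature of any existing tool: each hint costs a genuine attempt, and a full solution is never released.

```python
# Hypothetical sketch of "friction by design": hints are released one
# level at a time, each gated behind a recorded student attempt, and a
# full solution is never returned.
class GraduatedHintGate:
    def __init__(self, hints: list[str]):
        self.hints = hints        # ordered from vaguest to most specific
        self.attempts = 0         # student attempts logged so far
        self.hints_given = 0      # hints released so far

    def record_attempt(self, work_shown: str) -> None:
        """Log a genuine attempt; showing work is the price of a hint."""
        if work_shown.strip():
            self.attempts += 1

    def next_hint(self) -> str:
        if self.attempts <= self.hints_given:
            return "Show me what you've tried first -- then I'll nudge you."
        if self.hints_given >= len(self.hints):
            return "You have all the hints; the last step is yours."
        hint = self.hints[self.hints_given]
        self.hints_given += 1
        return hint

gate = GraduatedHintGate([
    "What operation undoes the +7 on the left-hand side?",
    "After subtracting 7 from both sides, what is 3x equal to?",
])
print(gate.next_hint())                      # asks for an attempt first
gate.record_attempt("I tried dividing by 3 first")
print(gate.next_hint())                      # releases hint level 1
```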


3. Include neurodivergent students and specialists in AI policy decisions.

If the people designing your AI policies have never had to navigate a classroom with sensory overload, ADHD time‑blindness, or autistic attention patterns, they will miss how quickly “help” becomes “dependency.” Neurodivergent students, parents, and clinicians need to be at the table — with real influence, not symbolic input (Crompton et al., 2023; Ruttenberg, 2024).


The Question I’d Ask Every School Right Now

After reading the Bastani trial, I keep coming back to one simple question:


Where in your AI strategy is it someone’s explicit job to protect long‑term learning, especially for neurodivergent students, even when short‑term grades go up?

If you can’t point to a person, a policy, and a set of guardrails that answer that question, then “AI in education” isn’t a modernization strategy.


It’s a learning‑risk strategy you haven’t named yet.


REFERENCES

Bastani, H., Bastani, O., Sinha, A., et al. (2024). Generative AI can harm learning (Working paper). Social Science Research Network. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486


Bastani, H., Bastani, O., Sinha, A., et al. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122(26), e2422633122. https://doi.org/10.1073/pnas.2422633122


Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher et al. (Eds.), Psychology and the real world (pp. 56–64). Worth Publishers.


Crompton, C. J., et al. (2023). Participatory methods to engage autistic people in the design of research. Autism, 27(4), 1030–1042.


Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257.



About the Author

Dr David Ruttenberg PhD, FRSA, FIoHE, AFHEA, HSRF is a neuroscientist, autism advocate, Fulbright Specialist Awardee, and Senior Research Fellow dedicated to advancing ethical artificial intelligence, neurodiversity accommodation, and transparent science communication. With a background spanning music production to cutting-edge wearable technology, Dr Ruttenberg combines science and compassion to empower individuals and communities to thrive. Inspired daily by their brilliant autistic daughter and family, Dr Ruttenberg strives to break barriers and foster a more inclusive, understanding world. #AIinEducation #AutismAISafety #GenerativeAI #Neurodiversity #NeurodivergentLearners #EthicalAI #GPT4 #LearningScience #DesirableDifficulties #EducationalTechnology #InclusiveEducation #S2MHD #CognitiveLiberty #HumanCenteredAI #NothingAboutUsWithoutUs

© 2018–2026 by Dr David Ruttenberg. All rights reserved.
