Listen on YouTube

Episode Summary

Is AI the future of learning — or the fastest way to stop learning altogether?

Across classrooms in Bangladesh and beyond, teachers are being told that generative AI will “transform education.” It promises to save time, personalise tutoring, and make every student more creative. But what if those promises come with a hidden cost?

In this episode of EBTD Research Bites, we take a hard look at the evidence behind the hype. Drawing on new studies from Harvard, Dinsmore & Frier, and Carl Hendrick, this deep dive separates science from speculation.

You’ll discover why some AI tutors can double learning gains, while others quietly undermine critical thinking. We explore the Model of Domain Learning — showing how expertise develops through struggle, feedback, and cognitive effort — and why removing that friction can destroy genuine understanding.

This is the real debate every teacher and policymaker should be having:
Is AI acting as a scaffold, supporting students as they build knowledge?
Or as a crutch, letting them skip the hard thinking that makes learning last?

From classrooms in Dhaka to rural schools across Bangladesh, this episode unpacks what “AI in education” really means for teachers, students, and the future of learning.

Because the question isn’t whether AI will change education — it already has.
The question is: Will it make us smarter, or just faster at forgetting?

Key Takeaways

1. AI is powerful — but not always pedagogical.
Generative AI tools are exceptional at producing fluent, convincing answers. But fluency isn’t the same as understanding. Without the right design, AI can make learning feel easier while quietly weakening the effortful thinking that builds long-term memory.

2. The “Scaffold vs Crutch” distinction matters.
AI can support learning when it acts as a scaffold — prompting recall, offering feedback, and encouraging deliberate practice. But when it becomes a crutch, giving instant solutions or polished explanations, it removes the very struggle that forms expertise.

3. Learning depends on cognitive effort, not convenience.
Research from Dinsmore & Frier and Carl Hendrick shows that genuine learning happens through retrieval, application, and feedback. Efficiency isn’t always a virtue — the mental friction that AI often removes is the same friction that builds understanding.

4. AI tutors can outperform humans — under strict conditions.
The Harvard study on GPT-4-based tutors found learning gains up to double those from expert human teachers, but only when the AI was tightly constrained, used for foundational practice, and designed around established learning science principles.

5. Poorly designed AI can harm learning.
When students rely on unrestricted tools like ChatGPT for homework or explanations, studies show lower performance and reduced critical-thinking ability — a classic case of cognitive offloading and illusion of understanding.

6. Design decides everything.
Helpful AI is deliberately “strategically unhelpful”: it resists giving easy answers and instead nudges learners to think. Harmful AI smooths out all the struggle, turning learning into task completion. The same technology can produce either outcome, depending entirely on its design.

7. Teachers still hold the human advantage.
AI can deliver information, feedback, and practice — but it can’t replicate relationships, motivation, or cultural context. The most effective future classrooms will combine AI precision with human empathy.

8. The question isn’t if AI will reshape education — but how.
Educators must decide whether to use AI as a tool for human flourishing or allow it to become a system of shortcuts that hollow out learning. The future of education in Bangladesh — and globally — depends on making that choice wisely.

Research Notes & Links

1. The Algorithmic Turn: The Emerging Role of AI in Education

Link: https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging
Description:
Education researcher Carl Hendrick explores how artificial intelligence is transforming not just how students learn, but what it means to learn at all. He argues that as algorithmic tools become embedded in classrooms, they risk shifting education from a process of cognitive effort to one of effortless completion. Hendrick’s essay raises critical questions about the future role of teachers, the illusion of understanding created by fluent AI responses, and how to safeguard genuine learning in an age of digital shortcuts.


2. “Generative Artificial Intelligence and Learning: Opportunities, Risks, and Evidence” – Educational Research Review (2025)

Link: https://www.sciencedirect.com/science/article/abs/pii/S1041608025002109?dgcid=rss_sd_all
Description:
This peer-reviewed article provides one of the most comprehensive analyses yet of how generative AI affects student learning outcomes. Drawing on experimental data, the authors distinguish between systems designed to enhance cognitive engagement and those that promote cognitive offloading. Their findings highlight a growing paradox: when AI tools are aligned with the science of learning, they can double student progress; when poorly designed, they may erode critical-thinking skills and long-term retention.

Transcript

The Cognitive Crutch vs The Scaffold

Transcript – EBTD Research Bites

Welcome back to the deep dive, and this one is a bit special. It’s tailored specifically for EBTD’s Research Bites series.

That’s right. So, a big hello to all the teachers and leaders listening in from schools across Bangladesh. We know you’re right in the thick of it, seeing generative AI everywhere, feeling the pressure, maybe some excitement, too.

Oh, absolutely. It’s moving incredibly fast, and you hear these big promises, right? People like Peter Deng and Sam Altman are saying AI will free us up, let us think deeper, be more creative, give us our time back.

Exactly. But our job today really is to cut through that hype. We need to dig into the actual evidence. What does the research say GenAI is really doing to student learning?

So, we’re pulling together insights from some key analyses Dinsmore and Frier on how expertise develops versus what GenAI does. And Carl Hendrick’s look at the evidence coming out on AI tutors.

And the big question, the one we really need to answer for you is, do these tech promises actually hold up when you look at how people—how students—actually learn things?

That really is the fundamental question, isn’t it? And the research points to a pretty significant gap right from the start.


Section One: The Foundational Challenge

Okay, let’s unpack that. What’s the basic misunderstanding here?

Well, Dinsmore and Frier point out that a lot of the talk around GenAI misses two key things. First, what GenAI’s knowledge even is, and second, how humans actually get better at something—how they develop expertise.

Right. Let’s start with the tool itself. GenAI, LLMs. Yeah. What are we actually talking about? Is it intelligent?

Well, intelligent is tricky. Blackman defines the current stuff pretty simply: software that learns by example. It’s essentially a probability machine—a really, really powerful prediction tool.

Prediction, like guessing, sort of.

Yeah. But based on analyzing just unimaginable amounts of text and data, it learns statistical patterns. So, it predicts the most likely next word or generates a really fluent sounding explanation because it’s seen similar patterns millions of times.

It doesn’t know physics like a professor knows physics. It’s more like an incredibly articulate parrot that’s read the entire internet. It uses statistics.

Statistical parrot. I like that.
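
To make that “probability machine” idea concrete, here is a deliberately toy sketch in Python: a bigram counter that “predicts” the next word purely from how often word pairs appear in a tiny corpus. It is an illustration of statistical next-word prediction only, not the architecture of ChatGPT or any other LLM discussed in this episode.

```python
from collections import Counter, defaultdict

# Toy corpus: the "model" only ever sees word pairs, never meaning.
corpus = "plants use sunlight to make food . plants use water and sunlight .".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1  # count how often nxt follows current

def predict_next(word: str) -> str:
    """Return the most frequently observed next word: pattern matching, not understanding."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("plants"))  # -> "use", simply because that pair occurred most often
```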

So if that’s the tool, how does that match up—or maybe not match up—with how a student learns?

Okay. So for that we need a framework. Alexander’s Model of Domain Learning, the MDL, is really helpful here. It basically shows learning isn’t like flipping a switch. It’s a journey.

Think of it like climbing a mountain. At the bottom, you’re in the acclimation phase—total novice, just figuring out the basics. Then you climb a bit, you reach competence, you can do things, but maybe you still need guidance, a bit of help, right? And then finally, you reach the summit or near it—that’s proficiency, real expertise, deep understanding.

You can work independently, apply knowledge flexibly. And connecting this back to AI, Dinsmore and Frier argue that where a student is on that mountain really matters for whether generic GenAI is useful.

Absolutely critical—because if a student is still down in that acclimation phase, just starting out, they lack that foundational knowledge, that structure in their head. So if they ask ChatGPT for a summary of, I don’t know, photosynthesis, the AI gives this beautifully written fluent summary, but the student—they don’t have the mental hooks to hang that information on. It doesn’t connect deeply.

It just floats there.

Pretty much, yeah. And if the AI’s main trick is just summarizing stuff it has access to, it lets the students skip the hard part—the struggle, the effort you need to actually build those connections to climb from competence up to proficiency.

That cognitive effort is the learning process.

Wow. So, wait, are you saying that for a student who’s genuinely trying to learn something new, these standard off-the-shelf tools like ChatGPT might actually be useless—or worse?

Based on this MDL framework, Dinsmore and Frier conclude there’s basically no compelling evidence right now that generic GenAI helps develop real learning or expertise.

That feels huge, given how they’re being pushed. It provides efficiency, sure—task completion—but efficiency can sometimes be the absolute enemy of genuine education.


Section Two: The Scaffold vs The Crutch

Okay, that’s a really sobering thought, but—and this is the paradox, right?—the same underlying tech, these LLMs, can apparently produce amazing results when they’re designed differently.

Exactly. And this is the crucial distinction we have to make. It’s the difference between AI acting as a scaffold—something that supports you while you build—versus AI acting as a crutch, something that does the work for you so you never build the strength.

A scaffold versus a crutch. Right.

Okay. So, let’s look at the positive side first: the scaffold. What’s the big optimistic vision here?

Oh, the vision is incredibly compelling. Imagine every single child getting expert tutoring, infinitely patient, tireless, perfectly adapted to their exact needs—one-on-one, like a personal tutor for everyone, right?

And that frees up the human teacher to focus on the things humans do best: motivation, building relationships, tailoring things to the local culture—the human side of learning, which, let’s be honest, previous waves of EdTech haven’t really delivered at that scale.

They really haven’t. But now, now there’s some evidence suggesting this might actually be changing.

Things are getting interesting. Okay, this is where we get into the really eye-opening stuff.

Yeah. Tell us about that Harvard study, the one with physics undergraduates.

Yeah, this was a rigorous randomized controlled trial. They had students using a tutor built on GPT-4.

And the key thing was the comparison group, right?

They weren’t comparing it to like bad lectures.

No, absolutely not. That’s what makes it so powerful. The comparison group was using well-implemented active learning delivered by highly rated, experienced human instructors. This was a high bar.

Okay, so good teaching versus the AI tutor—and the results?

Stunning. Really. The students using the AI tutor showed median learning gains more than double the group with the excellent human-led instruction.

Double?

Double. We’re talking effect sizes between 0.73 and 1.3 standard deviations.

Okay, translate that for us non-statisticians.

An effect size over 1.0—what does that mean in a classroom? It’s huge. It means you could potentially see a student move from, say, a C grade to an A. Or a student who is really struggling might suddenly grasp the concepts and pass comfortably. It’s a massive leap in learning effectiveness.
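
For listeners who want the arithmetic behind “standard deviations”, this is the standard effect-size formula being referenced (Cohen’s d); the percentile reading assumes roughly normally distributed test scores and is an illustration, not a figure from the study itself.

$$d = \frac{\bar{x}_{\text{AI tutor}} - \bar{x}_{\text{human-led}}}{s_{\text{pooled}}}$$

An effect of d = 1.0 means the average AI-tutored student scored about one standard deviation above the average student in the comparison group, which puts them at roughly the 84th percentile of that group’s distribution.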

And it was efficient, too, right? They got through the material quickly.

Yep. Seventy percent finished within the expected time—median around 49 minutes for a 60-minute session. Efficient and effective.

But—and this seems crucial—it wasn’t just generic ChatGPT let loose, was it?

Absolutely crucial point. This wasn’t the wild, unconstrained AI. It was carefully engineered for learning.

How so?

Well, it was used for foundational concepts, kind of like a flipped classroom model. And importantly, it had guardrails. The instructors had built in things like pre-written solutions to stop it from just making stuff up—hallucinating.

Ah, okay.

They basically designed it to force the students to engage, to retrieve information, to apply those known laws of learning—the things we know work.

So it’s applying learning science principles but using the AI engine.

Exactly. We see that elsewhere too, like with ASSISTments or Carnegie Learning’s MATHia. They show consistent positive results—maybe not 1.3 SD, but solid gains, often between 0.2 and 0.4 SD, especially for struggling students.

And they work because they relentlessly apply those core principles: clear instruction, quick feedback, adapting the difficulty, making students practice retrieving what they know.

It makes you think about Bloom’s old idea, the two-sigma problem—the challenge of scaling up that ideal one-on-one tutoring.

Maybe we’re finally getting closer.

Well, Bloom’s original numbers were probably too optimistic based on modern research. Real human tutoring is more like 0.3 to 0.4 SD. But these GPT-4 results are significantly higher.

It does suggest that for certain parts of teaching—the more algorithmic bits, the practice, the feedback loops—AI might be able to scale that expert effect in a way we haven’t seen before.


Section Three: The Danger Zone

Okay, that’s the powerful potential—the scaffold—but you mentioned the danger zone: the crutch, right?

The darker truth. Because that same powerful technology, if it’s not carefully designed—if it’s just used generically—it seems it can actively harm learning.

Why? What’s the mechanism there?

You mentioned generic LLMs are built for efficiency—for making things easy, frictionless. Why is that bad for learning?

Because, as Carl Hendrick puts it, learning requires effort. It requires that friction, that cognitive struggle.

When the AI just smooths everything over, gives you the perfect answer instantly—you don’t do the thinking yourself.

Exactly. You offload the cognition to the machine.

And we’re seeing research now trying to quantify this. Gerlich’s study found a strong negative correlation between frequent AI tool use and critical-thinking skills.

Wow. So using it a lot made students worse at thinking, it seems.

They relied on the AI for reasoning, for explaining, so their own ability to do that independently weakened. It’s like a muscle that doesn’t get used. Cognitive offloading.

Teachers listening have seen versions of this for years, right? Students just reaching for a calculator or Google without trying first.

Precisely.

And there’s another danger: the illusion of understanding. The AI produces such a fluent, confident-sounding answer that the student assumes they understand it just as deeply.

Yes, it’s a serious metacognitive error. They mistake the AI’s fluency for their own comprehension. They haven’t actually done the work of mastery.

And this isn’t just theoretical harm. There’s evidence of students doing worse.

There is that University of Pennsylvania study—quite concerning. High school math students who had unrestricted access to GenAI, with no guardrails, just plain ChatGPT basically—they actually performed worse on tests than students who just worked through the problems themselves unaided.

Worse. Not just the same, but worse.

Worse, because they likely used the AI to bypass the necessary steps—the cognitive processes, the actual learning needed to solve the problems later on their own. They used it as a crutch.


Section Four: Lessons and Future Implications

Okay. So, the lesson here seems incredibly clear. Helpful AI versus harmful AI—it’s all about the design. Entirely.

It comes down to how the tool is built and implemented.

That Harvard tutor worked because it was constrained.

Right. It was built to resist just giving the answer—to force the student to work.

Exactly. It had to be, you might say, strategically unhelpful.

Yeah. It needed to know when providing more information would actually short-circuit the learning process for that student based on where they were on that MDL continuum—and withhold the answer even if that makes the interaction feel less smooth or efficient for the user.

Yes. Because the goal isn’t smooth efficiency. The goal is learning. And sometimes learning is bumpy. Sometimes it requires hitting a wall and figuring out how to get over it—not having the AI instantly beam you to the other side.
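
To make “strategically unhelpful” concrete, here is a hypothetical sketch in Python of the kind of guardrail logic being described: instructor-authored solutions and hints, with the full answer withheld until the student has made an attempt. The names and rules are illustrative assumptions, not the design of the Harvard tutor discussed above.

```python
from dataclasses import dataclass

@dataclass
class StudentTurn:
    question_id: str
    attempt: str            # the student's own working so far ("" if none)
    asked_for_answer: bool  # did they just ask for the solution?

# Pre-written, instructor-authored content, so the bot never has to improvise an answer.
SOLUTIONS = {"q1": "F = ma, so a = F/m = 12 N / 3 kg = 4 m/s^2"}
HINTS = {"q1": ["Which law links force, mass and acceleration?",
                "Write that law symbolically, then isolate acceleration."]}

def respond(turn: StudentTurn, hints_given: int) -> str:
    """Withhold the worked solution until the student has genuinely attempted the problem."""
    if turn.asked_for_answer and not turn.attempt:
        return "Have a go first: write down what you already know about this problem."
    if hints_given < len(HINTS[turn.question_id]):
        return HINTS[turn.question_id][hints_given]   # scaffold: nudge retrieval, don't solve
    return SOLUTIONS[turn.question_id]                # only after real effort: the full answer

print(respond(StudentTurn("q1", attempt="", asked_for_answer=True), hints_given=0))
```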

Let’s think about the future then, and what this means for, say, a school leader listening in Bangladesh. Maybe resources are tight, teacher ratios are high. Is this AI tutoring a realistic hope or just fancy tech for rich universities?

I think it’s realistic, but it comes with uncomfortable truths.

The trajectory is important. Unlike human teaching expertise—which improves slowly, person by person—improvements in AI tutoring software can be rolled out instantly, globally, to millions of students.

There’s this potential for an exponential feedback loop. The AI gets better faster.

Meaning it’s almost inevitable that AI tutors will become more effective than the average human teacher for certain tasks—for the algorithmic parts of teaching, yeah, the drilling, the concept checks, the adaptive practice, the immediate feedback.

I think that’s highly likely.

Yes. Which leads to that really uncomfortable question for the profession: if AI can teach foundational stuff better, more efficiently, what’s left for the human teacher?

Augmentation or replacement?

History isn’t always kind on this. Efficiency gains often do tend to displace labour rather than just augment it.

Good teaching—the really deep, human, context-aware stuff that’s complex—it doesn’t scale easily. But AI scales incredibly well.

It does. But we have to hold on to what the machines genuinely can’t do yet: the motivation, building that relationship, understanding the specific cultural context—the nuances of your students in your school in Bangladesh. That’s where great teachers shine.

Okay, so summing this up, we’re at a real fork in the road—a critical inflection point. The evidence is showing AI can be incredibly powerful for learning, potentially giving us learning gains we haven’t seen before at scale—or it can actively undermine learning if we’re not careful.

Precisely. And the deciding factor every single time seems to be the design. Is the AI designed to align with how humans actually learn and develop expertise, like that Model of Domain Learning suggests—or is it just designed for frictionless task completion?

So the advice for teachers and leaders listening is: you can’t just adopt AI. You have to be incredibly discerning. You have to understand the difference.

Is this tool a cognitive prosthetic, like eyeglasses helping you see better but you’re still doing the seeing? That’s the scaffold.

Or is it a cognitive offload—something that does the seeing for you, letting your own vision weaken? That’s the crutch.

The choice isn’t whether AI will change education—it already is. The real choice is whether we guide that change wisely, based on evidence and focused on human flourishing—or just let the tech, driven by efficiency, dictate the terms.

And maybe that brings us back to that philosophical touchstone from the sources. Maybe instruction—the application of known learning principles, the feedback loops, the practice that builds schemas—maybe that is computable. Maybe it can be automated and scaled effectively by machines.

But teaching in that richer sense—the spark, the connection, what Keats called the “wreath’d trellis of a working brain” awakening another mind—maybe that’s something else entirely.

And ultimately, we have to decide what we value most. Is it fast learning focused on efficiency, or is it deep learning focused on genuine understanding?

That’s the question AI forces us to confront.

A profound question to end on. Thank you for joining us for this deep dive.
