Generative artificial intelligence (GenAI) has moved rapidly from novelty to reality in classrooms, raising urgent questions about how it should be used, studied, and governed—especially in K–12 education, where foundational learning takes place. While debates about ethics, authorship, and academic integrity often dominate public discourse, a growing body of research is beginning to explore how GenAI tools like ChatGPT are actually being used by teachers and students. A recent synthesis of empirical studies offers a revealing snapshot of current practices, research gaps, and future priorities for integrating GenAI meaningfully into school systems.
Professional Learning as the Cornerstone
Across the literature, one conclusion stands out: teachers are central to any successful use of GenAI in classrooms. Nearly all reviewed studies stress the importance of sustained professional development that goes beyond basic tool familiarity. Educators need opportunities to experiment with subject-specific applications of GenAI, engage in authentic instructional tasks, and develop AI literacy that enables informed decision-making.
Researchers also emphasize evaluating learning outcomes before and after GenAI use, rather than relying solely on perceptions of effectiveness. Understanding how teachers and students think and feel while interacting with large language models—including their cognitive load, confidence, and emotional responses—has emerged as an important area for future inquiry. Additionally, scholars call for deeper investigation into partnerships between students and GenAI systems, examining how these collaborations influence learning outcomes across diverse educational contexts.
Moving Beyond Short-Term Effects
Most existing studies focus on immediate or short-term impacts of GenAI use, such as improved engagement or task completion. However, researchers increasingly argue that this is not enough. Long-term and indirect effects—particularly on creativity, critical thinking, and problem-solving—remain largely unexplored. Longitudinal studies are needed to determine whether GenAI enhances or diminishes these essential skills over time.
There is also growing recognition that GenAI systems are imperfect and evolving. Inaccuracies in model outputs suggest the need for experimentation with multiple tools and configurations. Some scholars advocate for “teacher-in-the-loop” systems that allow educators to guide, refine, and contextualize AI-generated content, increasing both trustworthiness and pedagogical relevance.

Who Is—and Isn’t—Included in the Research
One striking finding from the literature is what it leaves out. None of the reviewed studies focused directly on early childhood education (ECE), despite the growing presence of digital tools in early learning environments. The limited work that does exist on GenAI in early learning has largely examined teachers' perceptions rather than student outcomes, leaving open questions about how GenAI might support—or hinder—learning among younger children.
Similarly, parental involvement remains an underexplored dimension. Only one study explicitly included parents as participants, even though decades of educational research highlight the role families play in motivation, access, and achievement. Given GenAI’s potential to extend learning beyond the classroom, future research would benefit from examining how parents and children engage with these tools together.
Teachers as Designers, Not Just Users
Despite frequent claims that GenAI can support teachers, relatively few studies have examined how educators actually use these tools in day-to-day practice. Some promising work demonstrates how GenAI can assist with tasks like generating discussion questions or supporting self-regulated learning. However, much of this research stops short of evaluating classroom impact.
A consistent theme is that GenAI cannot replace teacher expertise. AI systems lack knowledge of students’ cultural backgrounds, learning histories, and classroom dynamics unless explicitly guided. As a result, teachers remain essential as designers, mediators, and decision-makers. More classroom-based studies are needed to show how GenAI functions in real instructional settings, rather than in theoretical or tightly controlled environments.
Expanding Beyond STEM
Most documented uses of GenAI in K–12 education appear in STEM-related subjects, such as physics and chemistry. While this focus is understandable given AI's technical roots, researchers are increasingly exploring applications across disciplines, including geography, drama, and literacy. Conceptual frameworks such as SAMR (Substitution, Augmentation, Modification, Redefinition) and TPACK (Technological Pedagogical Content Knowledge) have been proposed to guide integration, but empirical evidence validating these approaches is still limited.
The consensus is clear: GenAI has implications for every subject area. What remains missing are detailed accounts of how these tools perform in practice, measured against clear learning objectives and discipline-specific criteria.
Research Design Matters
Methodologically, much of the existing research relies on quasi-experimental designs and self-reported data, such as surveys and interviews. While these approaches offer useful insights, they also introduce limitations. Self-reports are susceptible to bias, inconsistent interpretation, and cultural variation, which can weaken both validity and generalizability.
Researchers argue for greater use of objective measures—such as assessments and performance tasks—as well as more rigorous experimental designs that allow for causal claims. Scalability is another concern: interventions that work in small, controlled studies may not translate effectively to everyday classroom settings without clear guidelines and safeguards.
Tools, Contexts, and Global Gaps
Although many GenAI tools exist, research has largely centered on a small subset, particularly ChatGPT and Copilot. This narrow focus limits understanding of how different platforms might support distinct educational goals. Comparative studies across tools, settings, and learner populations are urgently needed.
The literature also reflects geographic and linguistic imbalances. Most studies are published in English and originate from a limited set of regions, leaving voices from Africa, the Middle East, South America, and non-English-speaking contexts underrepresented. These gaps raise important questions about equity, access, and cultural relevance.

Looking Ahead
GenAI is not a passing trend in education. As schools move toward more personalized, inclusive, and flexible learning models, AI tools are likely to play an increasingly visible role. Yet the current research base remains uneven, with far more attention paid to higher education than to K–12 settings.
Future research agendas must address neglected areas such as early childhood education, special needs and inclusive learning, rural and underserved contexts, and language learning. Policymaker perspectives, alongside those of teachers, students, and families, are also essential for building comprehensive frameworks for responsible GenAI use.
Ultimately, opinions and position papers are not enough. What K–12 education needs now is robust, classroom-based evidence showing how GenAI can genuinely enhance teaching and learning—while preserving human judgment, creativity, and care at the heart of education.
This article was based, in part, on the following research review article:
Alfarwan A (2025) Generative AI use in K-12 education: a systematic review. Front. Educ. 10:1647573. doi: 10.3389/feduc.2025.1647573
Leslie Stebbins is the Director of Research4Ed. She has more than twenty-five years of experience in higher education and K-12 learning. Her clients include Harvard University, the U.S. Department of Education, Tufts University, and the Gates Foundation. She has an M.Ed. from the Technology Innovation & Education Program at the Harvard Graduate School of Education and a Master’s in Library and Information Science from Simmons College. https://www.linkedin.com/in/lesliestebbins/