Cheat Sites & AI & Large Language Models, Oh My!

Title: Cheat sites and artificial intelligence usage in online introductory physics courses: What is the extent and what effect does it have on assessments?

Authors: Gerd Kortemeyer & Wolfgang Bauer

First Author’s Institution: Rectorate and AI Center, ETH Zurich, 8092 Zurich, Switzerland

Status: Published in Physical Review Physics Education Research, open access

It’s 2 AM, you’re 3 coffees deep, and you STILL haven’t figured out the answer to 4d on Problem Set 5… what do you do? While students may have previously turned to cheat sites such as Chegg in such moments of desperation, the rapid rise of large language models (LLMs) and artificial intelligence (AI) tools such as ChatGPT has given students a whole new avenue for homework help, letting them catch up on some much-needed sleep. But with the concurrent rise in popularity of online courses, due in large part to the COVID-19 pandemic, are students today employing these tools on their quizzes and exams too? The authors of today’s paper explore how students use AI in their introductory physics courses, how that usage affects their scores, and how these students feel about the current landscape of AI as it pertains to their education.

The Study

[Figure: bar chart of mean self-reported usage for each resource (OnlAI, OnlInt, HwkAI, HwkInt, HwkPeer, HwkFac); Internet usage on homework (HwkInt) is the most prevalent, followed by discussing homework with peers (HwkPeer), while AI usage on online exams (OnlAI) is the least prevalent.]
Figure 1: Self-reported usage of resources on homework and online exams (in percent of problems). (Figure 5 in the paper)

In order to study the effect of AI tools on academic performance in physics, the authors surveyed students at Michigan State University in calculus-based introductory physics courses taught via weekly asynchronous video lectures. The courses had weekly online homework assignments, 11 low-stakes weekly exams (9 conducted online and 2 taken on campus under supervision), and 1 high-stakes on-campus final exam (which included five questions that were randomized duplicates of problems from the homework). At the end of the course, the students were asked to estimate the percentage of homework problems on which they received help, whether through AI tools like ChatGPT (HwkAI), internet resources like help sites or forums (HwkInt), or discussions with their peers (HwkPeer) or TAs/professors (HwkFac). They were also asked to estimate the percentage of online exam problems on which they received help, either through AI tools (OnlAI) or help sites/forums (OnlInt). The authors compared these self-reported variables to the students’ grades to capture various aspects of student performance and behavior. Overall, the students reported less usage of resources during exams than during homework, though not by a significant margin (see Figure 1).

The Results: Does the Use of AI Significantly Alter a Student’s Performance?

Before looking at whether the helping hand of AI affects performance in introductory physics, the authors noted that none of the self-reported responses followed a normal distribution, indicating the presence of distinct subpopulations. They identified four clusters: cluster 1 consisted of students who received human help on their homework but no aid on their exams; cluster 2, students who did not make use of any external resources at any point; cluster 3, students who mostly relied on internet resources for both their homework and exams; and cluster 4, students who used both internet resources and AI for both their homework and exams. The authors compared the students’ self-reported usage to their grades on the homework (Hwk), the online exams (OnlExams), the on-campus exams (CamExams), and the final exam (Final). Additionally, they included the variables Sem5, the scores on the five exam problems when they first appeared in the homework; Final5, the scores on the same problems when they appeared on the final exam; and Diff5, the difference between the two, which serves as a proxy for retention between the semester and the final exam.
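To get a feel for how such subpopulations can be identified, here is a minimal sketch in Python. It assumes a simple k-means clustering of the six self-reported usage percentages into four groups; the placeholder data and the choice of k-means are illustrative assumptions, not necessarily the authors’ actual procedure.

```python
# Minimal sketch: cluster students by their self-reported resource usage.
# Assumes k-means with k=4; the paper's exact clustering method may differ.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one student; columns are the six self-reported usage
# percentages (0-100): OnlAI, OnlInt, HwkAI, HwkInt, HwkPeer, HwkFac.
rng = np.random.default_rng(0)
usage = rng.uniform(0, 100, size=(200, 6))  # placeholder survey data

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(usage)

# Inspect the mean usage profile of each cluster to interpret it
# (e.g., "mostly human help on homework, no aid on exams").
for k in range(4):
    profile = usage[labels == k].mean(axis=0)
    print(f"cluster {k}: " + ", ".join(f"{v:.0f}%" for v in profile))
```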

[Figure: Fruchterman-Reingold network diagrams, one for each of the four clusters.]
Figure 2: Fruchterman-Reingold representation of the statistically significant correlations (p < 0.05) between variables, shown for each of the four clusters. Online exams are represented in light bluish-gray, on-campus exams in green, and the differences in scores between selected subsets of exams in gray. Percentages of AI usage are indicated in beige, percentages of Internet resources in yellow, and discussions with humans in orange. Green lines denote positive correlations, red lines negative correlations, and line thickness shows the absolute strength of the correlation. (Figure 8 in the paper)

The authors then analyzed these variables using a correlation matrix and – to their surprise – found very few significant correlations between resource usage and assessment performance. The few correlations that did emerge can be visualized with a Fruchterman-Reingold diagram, which shows only the variables with statistically significant correlations. The results are shown in Figure 2. For students in clusters 1 and 2, who did not use any outside resources on the exams, all performance measures were significantly positively correlated. For the group of students who relied on the Internet for both homework and exams (cluster 3), Internet usage during online exams was significantly negatively correlated with online exam scores; the authors suggest this might be because the students were not able to find solutions on online help sites quickly enough. For the group that made use of both the internet and AI in all contexts (cluster 4), the correlations became more fragmented. Overall, when using Final as a proxy for learning success, the authors found no significant difference between any of the groups.
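For readers curious how such a diagram is built, here is a rough sketch: compute pairwise correlations, keep only those with p < 0.05, and hand the resulting graph to a Fruchterman-Reingold layout (networkx’s spring_layout implements this algorithm). The data and variable names below are placeholders, and this is an illustration rather than the authors’ actual analysis pipeline.

```python
# Sketch: keep only statistically significant pairwise correlations (p < 0.05)
# and lay out the resulting graph with the Fruchterman-Reingold algorithm.
import numpy as np
import networkx as nx
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
variables = ["Hwk", "OnlExams", "CamExams", "Final", "OnlAI", "OnlInt"]
base = rng.uniform(0, 100, size=200)  # shared component so placeholders correlate
data = {v: base + rng.normal(0, 30, size=200) for v in variables}  # placeholder data

G = nx.Graph()
G.add_nodes_from(variables)
for i, a in enumerate(variables):
    for b in variables[i + 1:]:
        r, p = pearsonr(data[a], data[b])
        if p < 0.05:  # keep only significant correlations
            G.add_edge(a, b, weight=abs(r), sign=np.sign(r))

# Fruchterman-Reingold layout: strongly correlated variables end up close together.
pos = nx.spring_layout(G, weight="weight", seed=0)
# nx.draw(G, pos) would render the diagram; here we just list the edges kept.
for a, b, attrs in G.edges(data=True):
    print(f"{a} -- {b}: sign={attrs['sign']:+.0f}, |r|={attrs['weight']:.2f}")
```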

The Kids Are Alright?

The lack of correlation between self-reported resource usage and exam scores prompts more questions than answers. The authors pose the question: “How can it be that undermining the formative assessment through cheating does not seem to have a significant impact?” The final part of this study was to ask the students for their thoughts. For the most part, students seemed to understand the possible dangers of relying heavily on AI, but have accepted it as a part of the educational landscape, for better or for worse. Many students believe that AI can be used as a helpful tool, serving as a personal on-call tutor rather than simply a means of avoiding their homework. We don’t yet know how AI will alter our education system, but we must be prepared to adapt in order to best serve students.

Astrobite edited by Archana Aravindan

Featured image credit: A screenshot of my conversation with ChatGPT 

Author

  • Tori Bonidie

    I am a 5th year PhD candidate studying exoplanet atmospheres at the University of Pittsburgh. Prior to this, I earned my BA in astrophysics at Franklin and Marshall College where I worked on pulsar detection as a member of NANOGrav. In my free time you can find me cooking, napping with my cat, or reading STEMinist romcoms!
