pasantiesteban/ICSE_SEET2026_Chats

An Empirical Study of Anonymous, Unmoderated, and Online Peer-to-Peer Programming Tutoring Conversations

Authors: Priscila Santiesteban, Emma Shedden, Madeline Endres, Westley Weimer. Published: ICSE-SEET 2026.

Abstract

Many people want to learn to program but lack access to traditional classroom instruction. Teaching these novices at scale is crucial for building a more diverse and capable software engineering workforce. While online tools like Stack Overflow and ChatGPT offer help, they can be impersonal or reinforce poor software development practices. Anonymous peer-to-peer (P2P) tutoring has the potential to be an additional place for scalable support, but we lack a firm understanding of how to best support it for CS pedagogy. We present a mixed-analysis study of n=108 anonymous, unmoderated P2P CS tutoring sessions. We analyze text-based conversations from Python Tutor, a widely-used learning platform. In this setting, novice programmers (Learners) request help from volunteer programmers (Helpers) in a shared coding environment. We present a qualitatively-backed model of user motivations, conversational dynamics, and Learner-reported satisfaction. Surprisingly, Learners often receive useful (59% of tutoring interactions), low-toxicity (78% of messages) help without moderation. P2P chats reflect key phases of the software development process (83% of chats) and occasionally foster personal connection (17% of chats). We identify behaviors linked to satisfaction and discuss implications for scalable peer tutoring system design for CS education.

Purpose

We have included a sample of the raw text chat data here. The full dataset will be available upon publication of the paper.

The sample of raw text data (8 of 108 chats) can be found in chat-samples/, including three examples of excluded text data for reviewers' discretion.

The processed data in annotations-data-analysis/derived-dataframes/ is valid input to the GLMER analysis script in annotations-data-analysis.

The long-form data in data-management/data/clean/v7/ is valid input to our counting and data-visualization scripts. Outputs are provided in annotations-data-analysis/output/.

Full annotation data (before cleaning) is in data-management/data/v*/.

Contents

  • data-management/scripts/data-management_qdpx-to-csv.ipynb --- script for step 1 of data management/processing
  • data-management/scripts/data-management_csv-cleaning.ipynb --- script for step 2 of data management/processing
  • data-management/scripts/balanced-f-measure.ipynb --- script for IRR evaluation
  • data-management/data/ --- data before and after filtering and processing, described above
  • annotations-data-analysis/regression-analysis-data-preprocessing.ipynb --- script for step 3 of data management/processing
  • annotations-data-analysis/derived-dataframes/regression-data-v5/ --- anonymized version of the dataframes output by data management/processing
  • annotations-data-analysis/glmer-regression-analysis-v3.Rmd --- script for primary quantitative analysis (GLMER models)
  • annotations-data-analysis/rscripts/ --- for ease of use, individual model fitting steps are copied into short R scripts in this directory
  • annotations-data-analysis/counting-things.ipynb --- script to generate counting results and data visualization presented in the paper
  • annotations-data-analysis/output/ --- data visualizations presented in the paper
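For readers unfamiliar with the IRR evaluation mentioned above, a balanced F-measure treats one annotator's labels as the reference and scores the other's against them; for binary codes the resulting F1 is symmetric between the two annotators. The sketch below is a minimal, self-contained illustration of this idea — the actual computation in balanced-f-measure.ipynb may differ (e.g., per-label averaging for multi-class annotation schemes).

```python
from typing import Sequence


def balanced_f_measure(a: Sequence[int], b: Sequence[int]) -> float:
    """F1-style agreement between two binary annotation sequences.

    Illustrative sketch only; not the repository's exact implementation.
    Counts positions where both annotators applied the code (tp), where
    only one did (fp/fn), and returns 2*tp / (2*tp + fp + fn).
    """
    tp = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    fp = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    fn = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    if 2 * tp + fp + fn == 0:
        # Neither annotator applied the code anywhere: full agreement.
        return 1.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical example: two annotators coding five chat messages
# for the presence of one binary label.
annotator_1 = [1, 1, 0, 1, 0]
annotator_2 = [1, 0, 0, 1, 1]
print(balanced_f_measure(annotator_1, annotator_2))  # → 0.666...
```

Because tp, fp, and fn simply swap roles when the annotators are exchanged, the score is the same regardless of which annotator is treated as the reference.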

Usage

We run the GLMER analysis script in RStudio Server. The individual R scripts can be run with plain R. We run the Python scripts in classic Jupyter Notebook.
