GitHub Performance Patch Study

Comparison of performance-focused pull requests opened by AI coding agents and human developers. This repo contains the curated datasets plus analysis pipelines for review/merge dynamics, optimization-pattern labeling, patch complexity deltas, and validation evidence.

Repository Layout

.
├── datasets/
│   ├── ai_pr/                        # AI PR parquet bundles + notebook
│   ├── human_pr/                     # Human PR parquet bundles + notebook
│   ├── pr_filtering/                 # Valid perf PR ID filtering notebook
│   └── performance_prs_*.csv         # Unified AI vs human datasets
├── Quantitative_analysis/
│   ├── review_time_and_merge_rate/   # Review/merge latency analysis
│   └── patch_size_and_complexity_analysis/ # Lizard-based deltas
├── RQ1_pattern_analysis/             # Optimization-pattern analysis
├── RQ2_test_and_validation/          # Testing/validation evidence
└── requirements.txt

Key folders:

  • datasets/ holds the raw parquet pulls and the curated joins (performance_prs_ai_vs_human.csv, performance_prs_ai_vs_human_raw.csv) generated by the notebooks in datasets/ai_pr/ and datasets/human_pr/.
  • Quantitative_analysis/review_time_and_merge_rate/ contains rq1_analysis.ipynb and figures like review_time_distribution.png.
  • Quantitative_analysis/patch_size_and_complexity_analysis/ includes CLI scripts (ai.py, human.py) that fetch patches and run lizard, plus analyze_result.ipynb for plotting.
  • RQ1_pattern_analysis/ stores the optimization catalog (catalog/), LLM labeling notebooks/scripts, mismatch comparisons (compare_pattern.py), and results/plots.
  • RQ2_test_and_validation/ hosts GPT/Gemini labeling notebooks, manual label merges, final parquet outputs, and validation plots.

Data Sources

  1. Hugging Face hao-li/AIDev dataset – The notebooks read the following parquet tables via the hf:// URI scheme:
    • pull_request.parquet, human_pull_request.parquet
    • pr_task_type.parquet, human_pr_task_type.parquet
    • pr_commit_details.parquet
    • all_repository.parquet
  2. Local caches – Frequently accessed aggregates are stored under datasets/ and reused by downstream notebooks.
  3. LLM outputs – Intermediate responses are saved under RQ1_pattern_analysis/llm_data/ and RQ2_test_and_validation/llm_data/.

Authenticate with Hugging Face (huggingface-cli login) before running notebooks so hf:// reads succeed. Large parquet files are intentionally kept out of Git history.
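
A minimal sketch of what an hf:// read looks like, assuming pandas, pyarrow, and huggingface_hub are installed and a token has been stored via huggingface-cli login; the exact file paths inside the dataset are assumptions, so check the notebooks for the real ones:

    import pandas as pd

    # huggingface_hub registers the hf:// filesystem with fsspec, so pandas can
    # read the parquet tables directly once you are authenticated.
    BASE = "hf://datasets/hao-li/AIDev"

    ai_prs = pd.read_parquet(f"{BASE}/pull_request.parquet")
    human_prs = pd.read_parquet(f"{BASE}/human_pull_request.parquet")

    print(len(ai_prs), len(human_prs))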

Environment & Credentials

  1. Use Python 3.10+ and install dependencies:
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
    pip install lizard PyGithub huggingface_hub  # used by patch-size scripts
  2. Create a .env in the repo root for GitHub API access (used by Quantitative_analysis/patch_size_and_complexity_analysis/*.py):
    GITHUB_TOKEN=ghp_your_personal_access_token
    
  3. LLM-specific tooling:
    • RQ1_pattern_analysis/optimization_pattern_detection_qwen.py calls a local Ollama endpoint; configure OLLAMA_HOST or update the client block if needed.
    • GPT/Gemini notebooks expect API keys via environment variables (e.g., OPENAI_API_KEY, GOOGLE_API_KEY). Store them in .env and load with python-dotenv inside the notebooks.
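
The notebooks and scripts expect these values as environment variables; a minimal sketch of the loading pattern, assuming python-dotenv and the variable names listed above (OLLAMA_HOST falls back to Ollama's default local endpoint):

    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the repo root

    github_token = os.environ["GITHUB_TOKEN"]                 # patch-size scripts (PyGithub)
    openai_key = os.getenv("OPENAI_API_KEY")                  # GPT labeling notebooks
    google_key = os.getenv("GOOGLE_API_KEY")                  # Gemini labeling notebooks
    ollama_host = os.getenv("OLLAMA_HOST", "http://localhost:11434")  # local Qwen runs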

Running the Analyses

  1. Prepare datasets (once per machine)
    • Run datasets/pr_filtering/get_valid_pr.ipynb to regenerate valid_perf_pr_ids.csv if you adjust filtering rules.
    • Execute datasets/ai_pr/ai_pr.ipynb and datasets/human_pr/human_pr.ipynb to download/cross-check PR metadata, commits, comments, reviews, and workflow runs.
    • The notebooks refresh datasets/performance_prs_ai_vs_human_raw.csv and datasets/performance_prs_ai_vs_human.csv.
  2. Quantitative – Review time & merge rate
    • Open Quantitative_analysis/review_time_and_merge_rate/rq1_analysis.ipynb.
    • Point the intake cell to datasets/performance_prs_ai_vs_human.csv.
    • Running the notebook recreates the summary tables and plots in the same folder; a sketch of the kind of summary it computes appears after this list.
  3. Quantitative – Patch size & complexity impact
    • Ensure GITHUB_TOKEN is set (a sketch of the patch-fetch and lizard flow appears after this list), then run:
      python Quantitative_analysis/patch_size_and_complexity_analysis/ai.py
      python Quantitative_analysis/patch_size_and_complexity_analysis/human.py
    • Use Quantitative_analysis/patch_size_and_complexity_analysis/analyze_result.ipynb to regenerate plots in results/.
  4. RQ1 – Optimization pattern analysis
    • Choose a labeling notebook (RQ1_pattern_analysis/optimization_pattern_detection_gpt.ipynb or RQ1_pattern_analysis/optimization_pattern_detection_gemini.ipynb) to classify each performance PR into the catalog in RQ1_pattern_analysis/catalog/.
    • For local models, run:
      python RQ1_pattern_analysis/optimization_pattern_detection_qwen.py
    • Compare agreement via:
      python RQ1_pattern_analysis/compare_pattern.py
    • Plots and summary tables land in RQ1_pattern_analysis/results/.
  5. RQ2 – Testing & validation evidence
    • Start with RQ2_test_and_validation/validation-gpt.ipynb or RQ2_test_and_validation/validation-gemini2.ipynb.
    • RQ2_test_and_validation/merge_validation_labels.ipynb consolidates manual labels under RQ2_test_and_validation/manual_label/.
    • RQ2_test_and_validation/analysis.ipynb and RQ2_test_and_validation/validation_plots.ipynb generate the final parquet and figures in results/.
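
For step 2, a hypothetical sketch of the kind of summary rq1_analysis.ipynb derives from the unified CSV; the column names (author_type, created_at, merged_at) are assumptions and should be adjusted to the actual schema:

    import pandas as pd

    prs = pd.read_csv(
        "datasets/performance_prs_ai_vs_human.csv",
        parse_dates=["created_at", "merged_at"],   # assumed column names
    )

    # Unmerged PRs have NaT in merged_at, so the subtraction yields NaN hours.
    prs["review_hours"] = (prs["merged_at"] - prs["created_at"]).dt.total_seconds() / 3600

    summary = prs.groupby("author_type").agg(
        merge_rate=("merged_at", lambda s: s.notna().mean()),
        median_review_hours=("review_hours", "median"),
        n_prs=("review_hours", "size"),
    )
    print(summary)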
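
For step 3, a minimal sketch (not the actual ai.py/human.py) of the patch-fetch and lizard flow: pull a PR's changed Python files at the base and head commits and compare their summed cyclomatic complexity. The repository and PR number below are placeholders:

    import os

    import lizard               # function-level complexity metrics
    from github import Github   # PyGithub REST client

    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo("octocat/Hello-World")   # placeholder repository
    pr = repo.get_pull(1)                       # placeholder PR number

    def total_ccn(ref: str, path: str) -> int:
        """Sum cyclomatic complexity of every function in one file at a given ref."""
        try:
            source = repo.get_contents(path, ref=ref).decoded_content.decode("utf-8")
        except Exception:   # file may not exist at this ref (added or deleted file)
            return 0
        analysis = lizard.analyze_file.analyze_source_code(path, source)
        return sum(fn.cyclomatic_complexity for fn in analysis.function_list)

    for changed in pr.get_files():
        if not changed.filename.endswith(".py"):
            continue
        delta = total_ccn(pr.head.sha, changed.filename) - total_ccn(pr.base.sha, changed.filename)
        print(changed.filename, changed.additions, changed.deletions, delta)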
