# Probing the Origins of Reasoning Performance: Representational Quality for Mathematical Problem-Solving in RL vs SFT Finetuned Models
Large reasoning models trained via reinforcement learning (RL) substantially outperform their supervised counterparts on tasks requiring logic and mathematical reasoning, yet the mechanistic basis for these improvements remains unclear. We investigate this phenomenon through an integrated behavioral-mechanistic analysis of mathematical reasoning, asking: what internal differences enable RL models' improved reasoning capabilities?
- Antyabha Rahman (University of New South Wales)
- Akshaj Gurugubelli (Algoverse AI Research)
- Omar Ankit (University of Waterloo)
- Kevin Zhu (Algoverse AI Research)
- Aishwarya Balwani (St. Jude Children's Research Hospital) - Corresponding Author
Visit our project website: https://oankit.github.io/-rl-sft-reasoning/
## Requirements

- Python 3.12+
- NVIDIA GPU (Recommended for model inference and activation extraction)
- uv package manager
## Setup

The experiment code is located in the `experiment_code` directory. We use `uv` for fast and reliable dependency management.
- Install `uv` (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Set up the environment:

  ```bash
  cd experiment_code
  uv sync
  ```
We primarily use `uv run` to execute Python scripts, since it runs them with the project's installed dependencies.
## Layerwise Linear Probe

This experiment investigates the linear separability of internal representations across model layers. All scripts are located in `experiment_code/layerwise_linear_probe`.
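Conceptually, the probing step (Step 4 below) reduces to fitting a linear classifier on frozen hidden states and measuring held-out accuracy per layer. A minimal NumPy sketch on synthetic stand-in activations (none of these arrays come from the actual pipeline; the real probes operate on extracted model hidden states):

```python
import numpy as np

# Synthetic stand-ins for one layer's hidden states and correctness labels.
rng = np.random.default_rng(0)
n, d = 400, 32                          # samples, hidden dimension
X = rng.normal(size=(n, d))             # "activations" from one layer
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)      # synthetic correct/incorrect labels

X_tr, y_tr = X[:300], y[:300]           # train split
X_te, y_te = X[300:], y[300:]           # held-out split

# A linear probe is just logistic regression on frozen activations;
# here trained with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w)))
    w -= 0.1 * X_tr.T @ (p - y_tr) / len(y_tr)

# Held-out accuracy; high linear separability at a layer suggests that
# layer's representation encodes answer correctness.
acc = float(((X_te @ w > 0) == (y_te > 0.5)).mean())
```

In the real experiment a probe is trained per layer and per model, and the resulting layerwise accuracy curves are what get compared across RL- and SFT-finetuned models.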
### Step 1: Data Generation

Generate synthetic math questions for the probing task.
```bash
uv run experiment_code/layerwise_linear_probe/question_generate.py
```

### Step 2: Generate Completions

Run models to generate answers for the synthetic questions.
```bash
bash experiment_code/layerwise_linear_probe/generate_completion_script.sh
```

### Step 3: Extract Activations

Extract and save the models' internal activations (hidden states) during inference.
```bash
bash experiment_code/layerwise_linear_probe/extract_activation_script.sh
```

### Step 4: Train Probes

Train linear probes on the extracted activations to predict correct versus incorrect reasoning steps.
```bash
# Balance the dataset first
uv run experiment_code/layerwise_linear_probe/balance_probe_data_flexible.py

# Train probes across model families
bash experiment_code/layerwise_linear_probe/train_families_per_question.sh
```

### Step 5: Visualize Results

Generate plots comparing probe performance across layers and models.
```bash
bash experiment_code/layerwise_linear_probe/visualize_families_per_question.sh
```

## Token Variability Experiment

This experiment analyzes the variability of output tokens to understand generation diversity. Code is in `experiment_code/token_variability_experiment`.
- Run experiments: use the automation script to run the variability analysis across all supported models.

  ```bash
  bash experiment_code/token_variability_experiment/token_var_script.sh
  ```

- Individual scripts: model-specific scripts (e.g., DeepSeek, Olmo) live in folders such as `deepseek_math_scripts/` and `olmo3_scripts/`.

- Visualizations: generate bar graphs of accuracy vs. token coefficient of variation.

  ```bash
  uv run experiment_code/token_variability_experiment/bargraph_accuracy_vs_token_cv.py
  ```
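For reference, the token coefficient of variation plotted above is simply the standard deviation of completion lengths divided by their mean. A small self-contained sketch of the metric (the example token counts are made up for illustration):

```python
import statistics

def token_cv(token_counts):
    """Coefficient of variation (population std / mean) of completion lengths."""
    mean = statistics.mean(token_counts)
    return statistics.pstdev(token_counts) / mean if mean else 0.0

# Hypothetical token counts for five sampled completions of the same prompt.
lengths = [120, 118, 122, 119, 121]
cv = token_cv(lengths)  # ≈ 0.0118: very consistent generation lengths
```

A low CV indicates the model generates completions of similar length across samples; a high CV indicates more variable (diverse) generations.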
## Layerwise Ablation

This experiment implements systematic activation patching to investigate how critical each layer of the DeepSeek-Math models is for mathematical reasoning.
- Run experiments:

  ```bash
  uv run experiment_code/layerwise_ablation/main.py
  ```

- Visualizations:

  ```bash
  uv run experiment_code/layerwise_ablation/visualizaiton.py
  ```
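For intuition, activation patching caches activations from a clean run, then swaps them into a corrupted run one layer at a time and measures how much of the clean output is restored. A toy residual-stack illustration (not the repo's implementation; the real experiment patches transformer hidden states):

```python
import numpy as np

rng = np.random.default_rng(1)
layers = [rng.normal(size=(8, 8)) / 8 for _ in range(3)]  # toy residual stack

def forward(x, patch_layer=None, patch_value=None):
    """Run the stack; optionally swap one layer's contribution for a cached one."""
    h, contribs = x.copy(), []
    for i, W in enumerate(layers):
        out = np.tanh(W @ h)              # this layer's contribution
        if i == patch_layer:
            out = patch_value             # patch in the cached clean contribution
        contribs.append(out)
        h = h + out                       # residual connection
    return h, contribs

clean_x = rng.normal(size=8)
corrupt_x = clean_x + rng.normal(scale=2.0, size=8)   # corrupted input

clean_out, clean_contribs = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)
baseline = float(np.linalg.norm(corrupt_out - clean_out))

# Distance to the clean output after patching each layer; layers whose patch
# moves the corrupted run closest to the clean output are the most critical.
distances = [
    float(np.linalg.norm(forward(corrupt_x, i, clean_contribs[i])[0] - clean_out))
    for i in range(len(layers))
]
```

Ranking layers by how much patching them reduces the output distance gives a per-layer criticality profile, which is the kind of signal the ablation plots summarize.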
## Citation

If you find this work useful, please cite our paper:
```bibtex
@article{rahman2025reasoning,
  title={Probing the Origins of Reasoning Performance: Representational Quality for Mathematical Problem-Solving in RL vs SFT Finetuned Models},
  author={Rahman, Antyabha and Gurugubelli, Akshaj and Ankit, Omar and Zhu, Kevin and Balwani, Aishwarya},
  journal={arXiv preprint},
  year={2025}
}
```

## Contact

For questions or correspondence, please contact:
- Aishwarya Balwani: aishwarya.balwani@stjude.org
- Antyabha Rahman: antyabha.rahman@student.unsw.edu.au
Work conducted with Algoverse AI Research.
© 2025 Algoverse AI Research. All rights reserved.