# Probing the Origins of Reasoning Performance: Representational Quality for Mathematical Problem-Solving in RL vs SFT Finetuned Models
Large reasoning models trained via reinforcement learning (RL) substantially outperform their supervised counterparts on tasks requiring logic and mathematical reasoning, yet the mechanistic basis for these improvements remains unclear. We investigate this phenomenon through an integrated behavioral-mechanistic analysis of mathematical reasoning, asking: what internal differences enable RL models' improved reasoning capabilities?
- Antyabha Rahman (University of New South Wales)
- Akshaj Gurugubelli (Algoverse AI Research)
- Omar Ankit (University of Waterloo)
- Kevin Zhu (Algoverse AI Research)
- Aishwarya Balwani (St. Jude Children's Research Hospital) - Corresponding Author
Visit our project website: https://oankit.github.io/-rl-sft-reasoning/
## Requirements

- Python 3.12+
- NVIDIA GPU (Recommended for model inference and activation extraction)
- uv package manager
## Setup

The experiment code is located in the `experiment_code` directory. We use `uv` for fast and reliable dependency management.
- Install `uv` (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Set up the environment:

  ```bash
  cd experiment_code
  uv sync
  ```
We primarily use `uv run` to execute Python scripts, since it runs them with the project's installed dependencies.
## Layerwise Linear Probe

This experiment investigates the linear separability of internal representations across model layers. All scripts are located in `experiment_code/layerwise_linear_probe`.
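Conceptually, the probing step (Step 4 below) reduces to fitting a linear classifier on frozen hidden states and measuring held-out accuracy per layer. A minimal NumPy sketch on synthetic stand-in activations (none of these arrays come from the actual pipeline; the real probes operate on extracted model hidden states):

```python
import numpy as np

# Synthetic stand-ins for one layer's hidden states and correctness labels.
rng = np.random.default_rng(0)
n, d = 400, 32                          # samples, hidden dimension
X = rng.normal(size=(n, d))             # "activations" from one layer
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)      # synthetic correct/incorrect labels

X_tr, y_tr = X[:300], y[:300]           # train split
X_te, y_te = X[300:], y[300:]           # held-out split

# A linear probe is just logistic regression on frozen activations;
# here trained with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w)))
    w -= 0.1 * X_tr.T @ (p - y_tr) / len(y_tr)

# Held-out accuracy; high linear separability at a layer suggests that
# layer's representation encodes answer correctness.
acc = float(((X_te @ w > 0) == (y_te > 0.5)).mean())
```

In the real experiment a probe is trained per layer and per model, and the resulting layerwise accuracy curves are what get compared across RL- and SFT-finetuned models.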
### Step 1: Data Generation

Generate synthetic math questions for the probing task.
```bash
uv run experiment_code/layerwise_linear_probe/question_generate.py
```

### Step 2: Generate Completions

Run models to generate answers for the synthetic questions.
```bash
bash experiment_code/layerwise_linear_probe/generate_completion_script.sh
```

### Step 3: Extract Activations

Extract and save the models' internal activations (hidden states) during inference.
```bash
bash experiment_code/layerwise_linear_probe/extract_activation_script.sh
```

### Step 4: Train Probes

Train linear probes on the extracted activations to predict correct versus incorrect reasoning steps.
```bash
# Balance the dataset first
uv run experiment_code/layerwise_linear_probe/balance_probe_data_flexible.py

# Train probes across model families
bash experiment_code/layerwise_linear_probe/train_families_per_question.sh
```

### Step 5: Visualize Results

Generate plots comparing probe performance across layers and models.
```bash
bash experiment_code/layerwise_linear_probe/visualize_families_per_question.sh
```

## Token Variability Experiment

This experiment analyzes the variability of output tokens to understand generation diversity. Code is in `experiment_code/token_variability_experiment`.
- Run experiments: use the automation script to run the variability analysis across all supported models.

  ```bash
  bash experiment_code/token_variability_experiment/token_var_script.sh
  ```

- Individual scripts: model-specific scripts (e.g., DeepSeek, Olmo) live in folders such as `deepseek_math_scripts/` and `olmo3_scripts/`.

- Visualizations: generate bar graphs of accuracy vs. token coefficient of variation.

  ```bash
  uv run experiment_code/token_variability_experiment/bargraph_accuracy_vs_token_cv.py
  ```
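For reference, the token coefficient of variation plotted above is simply the standard deviation of completion lengths divided by their mean. A small self-contained sketch of the metric (the example token counts are made up for illustration):

```python
import statistics

def token_cv(token_counts):
    """Coefficient of variation (population std / mean) of completion lengths."""
    mean = statistics.mean(token_counts)
    return statistics.pstdev(token_counts) / mean if mean else 0.0

# Hypothetical token counts for five sampled completions of the same prompt.
lengths = [120, 118, 122, 119, 121]
cv = token_cv(lengths)  # ≈ 0.0118: very consistent generation lengths
```

A low CV indicates the model generates completions of similar length across samples; a high CV indicates more variable (diverse) generations.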
## Layerwise Ablation

This experiment implements systematic activation patching to investigate how critical each layer of the DeepSeek-Math models is for mathematical reasoning.
- Run experiments:

  ```bash
  uv run experiment_code/layerwise_ablation/main.py
  ```

- Visualizations:

  ```bash
  uv run experiment_code/layerwise_ablation/visualizaiton.py
  ```
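For intuition, activation patching caches activations from a clean run, then swaps them into a corrupted run one layer at a time and measures how much of the clean output is restored. A toy residual-stack illustration (not the repo's implementation; the real experiment patches transformer hidden states):

```python
import numpy as np

rng = np.random.default_rng(1)
layers = [rng.normal(size=(8, 8)) / 8 for _ in range(3)]  # toy residual stack

def forward(x, patch_layer=None, patch_value=None):
    """Run the stack; optionally swap one layer's contribution for a cached one."""
    h, contribs = x.copy(), []
    for i, W in enumerate(layers):
        out = np.tanh(W @ h)              # this layer's contribution
        if i == patch_layer:
            out = patch_value             # patch in the cached clean contribution
        contribs.append(out)
        h = h + out                       # residual connection
    return h, contribs

clean_x = rng.normal(size=8)
corrupt_x = clean_x + rng.normal(scale=2.0, size=8)   # corrupted input

clean_out, clean_contribs = forward(clean_x)
corrupt_out, _ = forward(corrupt_x)
baseline = float(np.linalg.norm(corrupt_out - clean_out))

# Distance to the clean output after patching each layer; layers whose patch
# moves the corrupted run closest to the clean output are the most critical.
distances = [
    float(np.linalg.norm(forward(corrupt_x, i, clean_contribs[i])[0] - clean_out))
    for i in range(len(layers))
]
```

Ranking layers by how much patching them reduces the output distance gives a per-layer criticality profile, which is the kind of signal the ablation plots summarize.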
## Citation

If you find this work useful, please cite our paper:
```bibtex
@article{rahman2025reasoning,
  title={Probing the Origins of Reasoning Performance: Representational Quality for Mathematical Problem-Solving in RL vs SFT Finetuned Models},
  author={Rahman, Antyabha and Gurugubelli, Akshaj and Ankit, Omar and Zhu, Kevin and Balwani, Aishwarya},
  journal={arXiv preprint},
  year={2025}
}
```

## Contact

For questions or correspondence, please contact:
- Aishwarya Balwani: aishwarya.balwani@stjude.org
- Antyabha Rahman: antyabha.rahman@student.unsw.edu.au
Work conducted with Algoverse AI Research.
© 2025 Algoverse AI Research. All rights reserved.