[Submission] Assignment 2 GRPO Training for the Countdown Task - Giang Nguyen

### Student Name

Giang Nguyen

### Model Length

256

### Accuracy

55.76%

### Improvement Description

Multi-round training with checkpoint selection and optimizer resets, preceded by a Dr. GRPO warm-up.
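Dr. GRPO differs from standard GRPO by dropping two normalizations: the advantage is the group-mean-centered reward without division by the group standard deviation, and token losses are summed and divided by a fixed constant instead of being averaged per response length. The sketch below illustrates that difference. It is not the code used for this submission. The function names are hypothetical, and the fixed normalizer is assumed to be group size times the maximum generation length.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO: subtract the group mean, then divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO: subtract the group mean only (no std normalization)."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def grpo_loss(token_losses):
    """GRPO-style aggregation: average over each response's tokens, then over responses.

    Dividing by each response's own length is what gives GRPO its length bias.
    """
    return mean(mean(seq) for seq in token_losses)


def dr_grpo_loss(token_losses, max_len):
    """Dr. GRPO-style aggregation: sum every token loss and divide by a constant.

    Assumption: the constant is group size * max generation length.
    """
    total = sum(sum(seq) for seq in token_losses)
    return total / (len(token_losses) * max_len)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(grpo_advantages(rewards))     # values close to [1, -1, -1, 1]
    print(dr_grpo_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]

    # A short and a long response with different per-token losses.
    token_losses = [[1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
    print(grpo_loss(token_losses))                # 1.5
    print(dr_grpo_loss(token_losses, max_len=4))  # 1.25
```

With per-length averaging, the short and long responses count equally. With a constant normalizer, each token counts equally, so longer responses carry proportionally more weight in the loss.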

### Detailed Write-up

_No response_

### GPU Hours

_No response_

### Submission Agreement

- [x] I confirm that these results are from my own work