Hello! In your implementation in [https://github.com/thomfoster/minRLHF/blob/main/minRLHF/buffer.py#L129](https://github.com/thomfoster/minRLHF/blob/main/minRLHF/buffer.py#L129), you perform sample-level normalization. Why not batch-level normalization?
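To make the question concrete, here is a minimal sketch of the two options. It assumes the normalization at that line whitens per-token advantages (shift to zero mean, scale to unit std). The function names and the `EPS` constant are hypothetical and are not taken from minRLHF.

```python
import math
from typing import List

EPS = 1e-8  # hypothetical small constant to avoid division by zero


def _whiten(values: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) std."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / (std + EPS) for v in values]


def normalize_per_sample(advantages: List[List[float]]) -> List[List[float]]:
    """Sample-level: each sequence is whitened with its own mean and std."""
    return [_whiten(seq) for seq in advantages]


def normalize_per_batch(advantages: List[List[float]]) -> List[List[float]]:
    """Batch-level: one mean and std over every token of every sequence."""
    flat = _whiten([v for seq in advantages for v in seq])
    out, i = [], 0
    for seq in advantages:
        # Split the whitened flat list back into the original sequence lengths.
        out.append(flat[i : i + len(seq)])
        i += len(seq)
    return out


batch = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
print(normalize_per_sample(batch))
print(normalize_per_batch(batch))
```

With this toy batch, per-sample whitening maps the constant second sequence to all zeros, so the fact that it had uniformly higher advantages than the first sequence is lost. Per-batch whitening keeps that second sequence positive relative to the first. That difference in what information survives is what the question is about.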