About the reward computing logic

I noticed some specific design choices in `_compute_score_with_rules` in `envs/vision.py` or `envs/search.py`. For instance, when no answer is extracted, an additional `-format_score` penalty is imposed, even though `total_format_score` already includes such a penalty. Also, `total_format_score` is multiplied by 1/2.
Is the logic in `_compute_score_with_rules` based on a reference, or did you design it entirely yourselves? Thanks in advance!
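
To make the question concrete, here is a minimal sketch of how I read the behavior described above. It is not the repository's code: the function name `sketch_reward`, its parameters, the default values, and the way `total_format_score` is built are all my assumptions, used only to illustrate the double penalty and the 1/2 factor.

```python
def sketch_reward(
    answer_extracted: bool,
    answer_correct: bool,
    format_score: float = 0.5,   # hypothetical value, not taken from the repo
    answer_score: float = 1.0,   # hypothetical value, not taken from the repo
) -> float:
    """Hypothetical paraphrase of the reward logic as I understand it."""
    # Assumption: total_format_score already goes negative when no answer
    # can be extracted (the "punishment it already contains").
    total_format_score = format_score if answer_extracted else -format_score

    # The format term is scaled by 1/2 before it enters the final reward.
    reward = 0.5 * total_format_score

    if not answer_extracted:
        # The additional -format_score penalty this issue asks about:
        # the missing answer is effectively penalized a second time.
        return reward - format_score

    # Only an extracted answer can earn the answer score.
    if answer_correct:
        reward += answer_score
    return reward
```

Under these assumptions, a missing answer costs `1.5 * format_score` in total (half of it from the scaled format term, the rest from the extra penalty), which is the part I would like to understand the motivation for.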

About the reward computing logic #115
