The repo only includes code for generating answers. Which metric should we use to evaluate them: exact match or something else?