https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
https://huggingface.co/evaluate-metric
https://huggingface.co/docs/evaluate/index
https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1
https://github.com/openai/evals
https://arxiv.org/abs/2306.05685