Skip to content

Commit

Permalink
Merge pull request #1 from reuank/patch-1
Browse files Browse the repository at this point in the history
Fix incorrect paper links
  • Loading branch information
ggbetz authored Jun 19, 2024
2 parents 212cafb + b695b6a commit 7e6a02d
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -139,9 +139,9 @@ _Let LLMs self-correct their deliberation._

_Methods for analysing LLM deliberation and assessing reasoning quality._

- 🎓👩‍💻 Comprehensive LLM-based reasoning analytics that breaks texts down into individual reasons. "DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models." 2024-01-04. [[>paper](https://arxiv.org/abs/2304.10703)] [[>code](https://github.com/intuit-ai-research/DCR-consistency)]
- 🎓👩‍💻 Comprehensive LLM-based reasoning analytics that breaks texts down into individual reasons. "DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models." 2024-01-04. [[>paper](https://arxiv.org/abs/2401.02132)] [[>code](https://github.com/intuit-ai-research/DCR-consistency)]
- 🎓🤗 Highly performant, open LLM (T5-based) for inference verification. "Minds versus Machines: Rethinking Entailment Verification with Language Models." 2024-02-06. [[>paper](https://arxiv.org/abs/2402.03686)] [[>model](soumyasanyal/entailment-verifier-xxl)]
- 🎓👩‍💻 Test dataset for CoT evaluators. "A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains." 2023-11-23. [[>paper](https://arxiv.org/abs/2304.10703)] [[>dataset](https://huggingface.co/datasets/google/reveal)]
- 🎓👩‍💻 Test dataset for CoT evaluators. "A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains." 2023-11-23. [[>paper](https://arxiv.org/abs/2402.00559)] [[>dataset](https://huggingface.co/datasets/google/reveal)]
- 🎓👩‍💻 Framework for evaluating reasoning chains by viewing them as informal proofs that derive the final answer. "ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness." 2023-11-23. [[>paper](https://arxiv.org/abs/2304.10703)] [[>code](https://github.com/archiki/ReCEval)]
- 🎓 GPT-4 is 5x better at predicting whether math reasoning is correct than GPT-3.5. "Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs." 2023-12-28. [[>paper](https://arxiv.org/abs/2312.17080)]
- 🎓 Minimalistic GPT-4 prompts for assessing reasoning quality. "SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation." 2023-09-29. [[>paper](https://arxiv.org/abs/2310.00074)] [[>code](https://github.com/facebookresearch/ParlAI/tree/main/projects/roscoe#meta-evaluation)]
Expand Down

0 comments on commit 7e6a02d

Please sign in to comment.