Skip to content

shainarazavi/Responsible-reasoning-agents

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

Responsible Agentic Reasoning and AI Agents: A Critical Survey

Authors: Shaina Raza*, Ranjan Sapkota*, Manoj Karkee, Christos Emmanouilidis
Affiliations: Vector Institute; Cornell University; University of Groningen
Equal contribution: Shaina Raza and Ranjan Sapkota

If you use this work, please cite us (see Cite below).


Overview

What is R²A²?
Responsible Reasoning AI Agents (R²A²) are LLM-powered agents that perform multi-step reasoning with built-in safeguards — bias checks, privacy protection, audit logs, and robustness tests — applied at every reasoning step, not just the final output.

Why now? The 2024–2025 wave of reasoning models and agentic browsers demands trace-level evaluation (faithfulness, safety, privacy), continuous auditing, and human-in-the-loop oversight to reach production in high-stakes domains.


BibTeX:

@article{raza2025responsible,
  author       = {Shaina Raza and Ranjan Sapkota and Manoj Karkee and Christos Emmanouilidis},
  title        = {Responsible Agentic Reasoning and AI Agents: A Critical Survey},
  journal      = {TechRxiv},
  year         = {2025},
  month        = sep,
  day          = {08},
  doi          = {10.36227/techrxiv.175735299.97215847/v1},
  note         = {Preprint}
}

License

This repository is licensed under the MIT License (see LICENSE).


Acknowledgments

We thank contributors and readers who provide feedback and issue reports. PRs welcome!


📚 References (Inline View)

Show full references table
#KeyTitleAuthorsVenueYearLink
1venerito2025reasoningReasoning large language models in rheumatology: a call for responsible actionVenerito, Vincenzo and Iannone, Florenzo and Gupta, LatikaThe Lancet Rheumatology2025
2nist2023airmfArtificial Intelligence Risk Management Framework (AI RMF 1.0){National Institute of Standards and Technology2023DOI/URL
3johnson2019billionBillion-scale similarity search with GPUsJohnson, Jeff and Douze, Matthijs and J{\'eIEEE Transactions on Big Data2019
4oecd2019aiRecommendation of the Council on Artificial Intelligence{Organisation for Economic Co-operation and Development2019DOI/URL
5eu2024aiactRegulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 on Artificial Intelligence and amending certain Union legislative acts (Artificial Intelligence Act){European Parliament and Council of the European Union2024DOI/URL
6ieee2019ethicsEthically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems{IEEE2019DOI/URL
7chen2025reasoningReasoning Models Don’t Always Say What They ThinkChen, Yanda and Benton, Joe and Radhakrishnan, Ansh and Uesato, Jonathan and Denison, Carson and Schulman, John and Somani, Arushi and Hase, Peter and Wagner, Misha and Roger, Fabien and Mikulik, Vlad and Bowman, Samuel R. and Leike, Jan and Kaplan, Jared and Perez, Ethan and Alignment Science Team, AnthropicarXiv preprint arXiv:2505.054102025DOI/URL
8xu2025towardsTowards large reasoning models: A survey of reinforced reasoning with large language modelsXu, Fengli and Hao, Qianyue and Zong, Zefang and Wang, Jingwei and Zhang, Yunke and Wang, Jingyi and Lan, Xiaochong and Gong, Jiahui and Ouyang, Tianjian and Meng, Fanjin and othersarXiv preprint arXiv:2501.096862025
9karpas2022mrklMRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoningKarpas, Ehud and Abend, Omri and Belinkov, Yonatan and Lenz, Barak and Lieber, Opher and Ratner, Nir and Shoham, Yoav and Bata, Hofit and Levine, Yoav and Leyton-Brown, Kevin and othersarXiv preprint arXiv:2205.004452022
10raza2024beadsBeads: Bias evaluation across domainsRaza, Shaina and Rahman, Mizanur and Zhang, Michael RarXiv preprint arXiv:2406.042202024
11xia2025evaluatingEvaluating mathematical reasoning beyond accuracyXia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, PengfeiProceedings of the AAAI Conference on Artificial Intelligence2025
12raza2025humanibenchHumanibench: A human-centric framework for large multimodal models evaluationRaza, Shaina and Narayanan, Aravind and Khazaie, Vahid Reza and Vayani, Ashmal and Chettiar, Mukund S and Singh, Amandeep and Shah, Mubarak and Pandya, DevalarXiv preprint arXiv:2505.114542025
13dafoe2018aiAI governance: a research agendaDafoe, AllanGovernance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK2018
1410771762Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative ReviewRaza, Shaina and Shaban-Nejad, Arash and Dolatabadi, Elham and Mamiya, HiroshiIEEE Access2024DOI/URL
15putnam_axiom2024Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical ReasoningAryan Gulati and Brando Miranda and Eric Chen and Emily Xia and Kai Fronsdal and Bruno de Moraes Dumont and Sanmi Koyejo38th Conference on Neural Information Processing Systems (NeurIPS 2024) Workshop on MATH-AI2024DOI/URL
16OperaBrowserOperator2025Meet Opera’s AI Browser Operator{Opera Software2025
17Comet2025Introducing Comet: Browse at the Speed of Thought{Perplexity Team2025
18Dia2025Dia BrowserDia Browser2025
19OpenAIOperator2025OpenAI OperatorOpenAI2025
20sapkota2025multimodalMultimodal large language models for image, text, and speech data augmentation: A surveySapkota, Ranjan and Raza, Shaina and Shoman, Maged and Paudel, Achyut and Karkee, ManojarXiv preprint arXiv:2501.186482025
21ClaudeArtifacts2024Claude 3.5 Sonnet Launch \& Artifacts Preview{Anthropic2024
22CowPilot2025CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web NavigationFaria Huq and Zora Zhiruo Wang and Frank F. Xu and Tianyue Ou and Shuyan Zhou and Jeffrey P. Bigham and Graham NeubigarXiv preprint2025DOI/URL
23SWEAgent2024SWE-agent: Agent-Computer Interfaces Enable Automated Software EngineeringJohn Yang and Carlos E. Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir PressarXiv preprint2024DOI/URL
24MistralAgentsAPI2025Build AI Agents with the Mistral Agents API{Mistral AI2025
25chollet2019measureOn the measure of intelligenceChollet, Fran{\c{carXiv preprint arXiv:1911.015472019
26chollet2025arcArc-agi-2: A new challenge for frontier ai reasoning systemsChollet, Francois and Knoop, Mike and Kamradt, Gregory and Landers, Bryan and Pinkard, HenryarXiv preprint arXiv:2505.118312025
27yue2024mmmumassivemultidisciplinemultimodalMMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGIXiang Yue and Yuansheng Ni and Kai Zhang and Tianyu Zheng and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen2024DOI/URL
28peiyuan_liu_2023MMLU DatasetPeiyuan LiuKaggle2023DOI/URL
29PerplexityComet2025Comet: The Browser That Thinks With YouPerplexity AI2025
30dominguez2024trainingTraining on the test task confounds evaluation and emergenceDominguez-Olmedo, Ricardo and Dorner, Florian E and Hardt, MoritzarXiv preprint arXiv:2407.078902024
31OpenAIChatGPTAgent2025Introducing ChatGPT Agent: Bridging Research and ActionOpenAI2025
32lee2024vhelmVhelm: A holistic evaluation of vision language modelsLee, Tony and Tu, Haoqin and Wong, Chi Heem and Zheng, Wenhao and Zhou, Yiyang and Mai, Yifan and Roberts, Josselin and Yasunaga, Michihiro and Yao, Huaxiu and Xie, Cihang and othersAdvances in Neural Information Processing Systems2024
33pineau2020improvingImproving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)Joelle Pineau and Philippe Vincent-Lamarre and Koustuv Sinha and Vincent Larivière and Alina Beygelzimer and Florence d'Alché-Buc and Emily Fox and Hugo Larochelle2020DOI/URL
34AWSStrandsAgents2025Introducing Strands Agents, an Open Source AI Agents SDK{AWS Open Source2025
35he2024webvoyagerWebvoyager: Building an end-to-end web agent with large multimodal modelsHe, Hongliang and Yao, Wenlin and Ma, Kaixin and Yu, Wenhao and Dai, Yong and Zhang, Hongming and Lan, Zhenzhong and Yu, DongarXiv preprint arXiv:2401.139192024
36GeminiMariner2024Introducing Gemini 2.0: Our New AI Model for the Agentic Era{Google DeepMind2024
37zhang2024litewebagentLiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent ApplicationsDanqing Zhang and Balaji Rama and Shiying He and Jingyi NiZenodo2024DOI/URL
38GoogleMariner2025Project Mariner{Google DeepMind2025
39yang2025agenticwebweavingwebAgentic Web: Weaving the Next Web with AI AgentsYingxuan Yang and Mulei Ma and Yuxuan Huang and Huacan Chai and Chenyu Gong and Haoran Geng and Yuanjian Zhou and Ying Wen and Meng Fang and Muhao Chen and Shangding Gu and Ming Jin and Costas Spanos and Yang Yang and Pieter Abbeel and Dawn Song and Weinan Zhang and Jun Wang2025DOI/URL
40Fellou2025Fellou: Agentic Web Browser{Fellou AI2025
41OperaNeon2025Opera Neon{Opera Software AS2025
42CopilotAgent2025GitHub CopilotGitHub Copilot2025
43AmazonQDeveloper2025Amazon Q Developer Elevates the IDE Experience with New Agentic Coding ExperienceElizabeth Fuentes2025
44AutoGen04_2025AutoGen v0.4: Reimagining the Foundation of Agentic AI for Scale, Extensibility, and RobustnessAdam Fourney and Ahmed Awadallah and Cheng Tan and Erkang Zhu and Friederike Niedtner and Gagan Bansal and \textit{et al.2025
45zhou2024webarenarealisticwebenvironmentWebArena: A Realistic Web Environment for Building Autonomous AgentsShuyan Zhou and Frank F. Xu and Hao Zhu and Xuhui Zhou and Robert Lo and Abishek Sridhar and Xianyi Cheng and Tianyue Ou and Yonatan Bisk and Daniel Fried and Uri Alon and Graham Neubig2024DOI/URL
46huang2023benchmarkingBenchmarking large language models as ai research agentsHuang, Qian and Vora, Jian and Liang, Percy and Leskovec, JureNeurIPS 2023 Foundation Models for Decision Making Workshop2023
47huang2023mlagentbenchMlagentbench: Evaluating language agents on machine learning experimentationHuang, Qian and Vora, Jian and Liang, Percy and Leskovec, JurearXiv preprint arXiv:2310.033022023
48martinez2025dissectingDissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM-and Agent-Based Repair SystemsMartinez, Matias and Franch, XavierarXiv preprint arXiv:2506.172082025
49wang2024mobileagentv2mobiledeviceoperationMobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent CollaborationJunyang Wang and Haiyang Xu and Haitao Jia and Xi Zhang and Ming Yan and Weizhou Shen and Ji Zhang and Fei Huang and Jitao Sang2024DOI/URL
50chen2025spabenchSPA-Bench: A Comprehensive Benchmark for SmartPhone Agent EvaluationJingxuan Chen and Derek Yuen and Bin Xie and Yuhao Yang and Gongwei Chen and Zhihao Wu and Li Yixing and Xurui Zhou and Weiwen Liu and Shuai Wang and Kaiwen Zhou and Rui Shao and Liqiang Nie and Yasheng Wang and Jianye HAO and Jun Wang and Kun ShaoThe Thirteenth International Conference on Learning Representations2025
51chollet2024arcArc prize 2024: Technical reportChollet, Francois and Knoop, Mike and Kamradt, Gregory and Landers, BryanarXiv preprint arXiv:2412.046042024
52chang2024agentboardAgentboard: An analytical evaluation board of multi-turn llm agentsChang, Ma and Zhang, Junlei and Zhu, Zhihao and Yang, Cheng and Yang, Yujiu and Jin, Yaohui and Lan, Zhenzhong and Kong, Lingpeng and He, JunxianAdvances in neural information processing systems2024
53talmor2018commonsenseqaCommonsenseqa: A question answering challenge targeting commonsense knowledgeTalmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, JonathanarXiv preprint arXiv:1811.009372018
54casper2025aiagentindexThe AI Agent IndexStephen Casper and Luke Bailey and Rosco Hunter and Carson Ezell and Emma Cabalé and Michael Gerovitch and Stewart Slocum and Kevin Wei and Nikola Jurkovic and Ariba Khan and Phillip J. K. Christoffersen and A. Pinar Ozisik and Rakshit Trivedi and Dylan Hadfield-Menell and Noam Kolt2025DOI/URL
55srivastava2023beyondBeyond the imitation game: Quantifying and extrapolating the capabilities of language modelsSrivastava, Aarohi and Rastogi, Abhinav and Rao, Abhishek and Shoeb, Abu Awal and Abid, Abubakar and Fisch, Adam and Brown, Adam R and Santoro, Adam and Gupta, Aditya and Garriga-Alonso, Adri and othersTransactions on machine learning research2023
56zaharia2018acceleratingAccelerating the machine learning lifecycle with MLflow.Zaharia, Matei and Chen, Andrew and Davidson, Aaron and Ghodsi, Ali and Hong, Sue Ann and Konwinski, Andy and Murching, Siddharth and Nykodym, Tomas and Ogilvie, Paul and Parkhe, Mani and othersIEEE Data Eng. Bull.2018
57merkel2014dockerDocker: lightweight linux containers for consistent development and deploymentMerkel, Dirk and othersLinux j2014
58borenstein2021introductionIntroduction to meta-analysisBorenstein, Michael and Hedges, Larry V and Higgins, Julian PT and Rothstein, Hannah RJohn wiley \& sons2021
59W3C2013PROVOverviewPROV-Overview: An Overview of the PROV Family of Documents{W3C Provenance Working Group2013
60gebru2021datasheetsDatasheets for datasetsGebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and Iii, Hal Daum{\'eCommunications of the ACM2021
61imoOfficial Website{International Mathematical Olympiadn.d.
62livecodebench_datasetsLiveCodeBench datasets - code\_generation\_lite, execution‑v2, test\_generation, …{LiveCodeBenchn.d.
63park2023generativeGenerative agents: Interactive simulacra of human behaviorPark, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael SProceedings of the 36th annual acm symposium on user interface software and technology2023
64hong2023metagptMetaGPT: Meta programming for a multi-agent collaborative frameworkHong, Sirui and Zhuge, Mingchen and Chen, Jonathan and Zheng, Xiawu and Cheng, Yuheng and Wang, Jinlin and Zhang, Ceyao and Wang, Zili and Yau, Steven Ka Shing and Lin, Zijuan and othersThe Twelfth International Conference on Learning Representations2023
65wu2023visualVisual chatgpt: Talking, drawing and editing with visual foundation modelsWu, Chenfei and Yin, Shengming and Qi, Weizhen and Wang, Xiaodong and Tang, Zecheng and Duan, NanarXiv preprint arXiv:2303.046712023
66li2023camelCamel: Communicative agents for" mind" exploration of large language model societyLi, Guohao and Hammoud, Hasan and Itani, Hani and Khizbullin, Dmitrii and Ghanem, BernardAdvances in Neural Information Processing Systems2023
67wang2023voyagerVoyager: An open-ended embodied agent with large language modelsWang, Guanzhi and Xie, Yuqi and Jiang, Yunfan and Mandlekar, Ajay and Xiao, Chaowei and Zhu, Yuke and Fan, Linxi and Anandkumar, AnimaarXiv preprint arXiv:2305.162912023
68madaan2023selfSelf-refine: Iterative refinement with self-feedbackMadaan, Aman and Tandon, Niket and Gupta, Prakhar and Hallinan, Skyler and Gao, Luyu and Wiegreffe, Sarah and Alon, Uri and Dziri, Nouha and Prabhumoye, Shrimai and Yang, Yiming and othersAdvances in Neural Information Processing Systems2023
69li2023apiApi-bank: A comprehensive benchmark for tool-augmented llmsLi, Minghao and Zhao, Yingxiu and Yu, Bowen and Song, Feifan and Li, Hangyu and Yu, Haiyang and Li, Zhoujun and Huang, Fei and Li, YongbinarXiv preprint arXiv:2304.082442023
70patil2024gorillaGorilla: Large language model connected with massive apisPatil, Shishir G and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph EAdvances in Neural Information Processing Systems2024
71suris2023vipergptVipergpt: Visual inference via python execution for reasoningSur{\'\iProceedings of the IEEE/CVF international conference on computer vision2023
72ahn2022canDo as i can, not as i say: Grounding language in robotic affordancesAhn, Michael and Brohan, Anthony and Brown, Noah and Chebotar, Yevgen and Cortes, Omar and David, Byron and Finn, Chelsea and Fu, Chuyuan and Gopalakrishnan, Keerthana and Hausman, Karol and othersarXiv preprint arXiv:2204.016912022
73shinn2023reflexionReflexion: Language agents with verbal reinforcement learningShinn, Noah and Cassano, Federico and Gopinath, Ashwin and Narasimhan, Karthik and Yao, ShunyuAdvances in Neural Information Processing Systems2023
74shen2023hugginggptsolvingaitasksHuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging FaceYongliang Shen and Kaitao Song and Xu Tan and Dongsheng Li and Weiming Lu and Yueting Zhuang2023DOI/URL
75jimenez2023sweSwe-bench: Can language models resolve real-world github issues?Jimenez, Carlos E and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, KarthikarXiv preprint arXiv:2310.067702023
76chen2021evaluatingEvaluating large language models trained on codeChen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde De Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and othersarXiv preprint arXiv:2107.033742021
77hendrycks2020measuringMeasuring massive multitask language understandingHendrycks, Dan and Burns, Collin and Basart, Steven and Zou, Andy and Mazeika, Mantas and Song, Dawn and Steinhardt, JacobarXiv preprint arXiv:2009.033002020
78chollet2024abstractionAbstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI)Chollet, Fran{\c{c2024
79codeforcesCompetitive Programming Platform{Codeforcesn.d.
80rein2024gpqaGpqa: A graduate-level google-proof q\&a benchmarkRein, David and Hou, Betty Li and Stickland, Asa Cooper and Petty, Jackson and Pang, Richard Yuanzhe and Dirani, Julien and Michael, Julian and Bowman, Samuel RFirst Conference on Language Modeling2024
81hendrycks2021measuringMeasuring mathematical problem solving with the math datasetHendrycks, Dan and Burns, Collin and Kadavath, Saurav and Arora, Akul and Basart, Steven and Tang, Eric and Song, Dawn and Steinhardt, JacobarXiv preprint arXiv:2103.038742021
82maa_aimeAmerican Invitational Mathematics Examination (AIME){Mathematical Association of America
83cobbe2021trainingTraining verifiers to solve math word problemsCobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and othersarXiv preprint arXiv:2110.141682021
84shao2024deepseekmathDeepseekmath: Pushing the limits of mathematical reasoning in open language modelsShao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Yang and othersarXiv preprint arXiv:2402.033002024
85he2025skyworkSkywork open reasoner 1 technical reportHe, Jujie and Liu, Jiacai and Liu, Chris Yuhao and Yan, Rui and Wang, Chaojie and Cheng, Peng and Zhang, Xiaoyu and Zhang, Fuxiang and Xu, Jiacheng and Shen, Wei and othersarXiv preprint arXiv:2505.223122025
86bai2023qwenQwen technical reportBai, Jinze and Bai, Shuai and Chu, Yunfei and Cui, Zeyu and Dang, Kai and Deng, Xiaodong and Fan, Yang and Ge, Wenbin and Han, Yu and Huang, Fei and othersarXiv preprint arXiv:2309.166092023
87LuongLockhart2025GeminiIMOAdvanced version of Gemini with Deep Think officially achieves gold‑medal standard at the International Mathematical OlympiadThang Luong and Edward Lockhart2025
88comanici2025geminiGemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilitiesComanici, Gheorghe and Bieber, Eric and Schaekermann, Mike and Pasupat, Ice and Sachdeva, Noveen and Dhillon, Inderjit and Blistein, Marcel and Ram, Ori and Zhang, Dan and Rosen, Evan and othersarXiv preprint arXiv:2507.062612025
89zhang2024naturalcodebenchNaturalcodebench: Examining coding performance mismatch on humaneval and natural user queriesZhang, Shudan and Zhao, Hanlin and Liu, Xiao and Zheng, Qinkai and Qi, Zehan and Gu, Xiaotao and Dong, Yuxiao and Tang, JieFindings of the Association for Computational Linguistics ACL 20242024
90bai2022constitutionalaiharmlessnessaiConstitutional AI: Harmlessness from AI FeedbackYuntao Bai and Saurav Kadavath et al.2022DOI/URL
91liang2023holisticHolistic Evaluation of Language ModelsPercy Liang and Rishi Bommasani and et al.Transactions on Machine Learning Research2023DOI/URL
92borghoff2025humanHuman-artificial interaction in the age of agentic AI: a system-theoretical approachBorghoff, Uwe M and Bottoni, Paolo and Pareschi, RemoFrontiers in Human Dynamics2025
93team2024gemmaGemma: Open models based on gemini research and technologyTeam, Gemma and Mesnard, Thomas and Hardin, Cassidy and Dadashi, Robert and Bhupatiraju, Surya and Pathak, Shreya and Sifre, Laurent and Rivi{\`earXiv preprint arXiv:2403.082952024
94GoogleGeminiModels2025Gemini Models \textbar{2025
95anil2023palmPalm 2 technical reportAnil, Rohan and Dai, Andrew M and Firat, Orhan and Johnson, Melvin and Lepikhin, Dmitry and Passos, Alexandre and Shakeri, Siamak and Taropa, Emanuel and Bailey, Paige and Chen, Zhifeng and othersarXiv preprint arXiv:2305.104032023
96OpenAI2025OpenAI2015--2025
97yehudai2025surveySurvey on evaluation of llm-based agentsYehudai, Asaf and Eden, Lilach and Li, Alan and Uziel, Guy and Zhao, Yilun and Bar-Haim, Roy and Cohan, Arman and Shmueli-Scheuer, MichalarXiv preprint arXiv:2503.164162025
98wang2025surveyA survey on responsible llms: Inherent risk, malicious use, and mitigation strategyWang, Huandong and Fu, Wenjie and Tang, Yingzhou and Chen, Zhilong and Huang, Yuxi and Piao, Jinghua and Gao, Chen and Xu, Fengli and Jiang, Tao and Li, YongarXiv preprint arXiv:2501.094312025
99chu2024fairnessFairness in large language models: A taxonomic surveyChu, Zhibo and Wang, Zichong and Zhang, WenbinACM SIGKDD explorations newsletter2024
100plaat2024reasoningReasoning with large language models, a surveyPlaat, Aske and Wong, Annie and Verberne, Suzan and Broekens, Joost and van Stein, Niki and Back, ThomasarXiv preprint arXiv:2407.115112024
101plaat2025agenticAgentic large language models, a surveyPlaat, Aske and van Duijn, Max and van Stein, Niki and Preuss, Mike and van der Putten, Peter and Batenburg, Kees JoostarXiv preprint arXiv:2503.230372025
102rao1995bdiBDI agents: From theory to practice.Rao, Anand S and Georgeff, Michael P and othersIcmas1995
103chu2023navigateNavigate through enigmatic labyrinth a survey of chain of thought reasoning: Advances, frontiers and futureChu, Zheng and Chen, Jingchang and Chen, Qianglong and Yu, Weijiang and He, Tao and Wang, Haotian and Peng, Weihua and Liu, Ming and Qin, Bing and Liu, TingarXiv preprint arXiv:2309.154022023
104Wooldridge_Jennings_1995Intelligent agents: theory and practiceWooldridge, Michael and Jennings, Nicholas R.The Knowledge Engineering Review1995DOI/URL
105Wang_2024A survey on large language model based autonomous agentsWang, Lei and Ma, Chen and Feng, Xueyang and Zhang, Zeyu and Yang, Hao and Zhang, Jingsen and Chen, Zhiyuan and Tang, Jiakai and Chen, Xu and Lin, Yankai and Zhao, Wayne Xin and Wei, Zhewei and Wen, JirongFrontiers of Computer Science2024DOI/URL
106raza2025responsibleWho is Responsible? The Data, Models, Users or Regulations? A Comprehensive Survey on Responsible Generative AI for a Sustainable FutureRaza, Shaina and Qureshi, Rizwan and Zahid, Anam and Fioresi, Joseph and Sadak, Ferhat and Saeed, Muhammad and Sapkota, Ranjan and Jain, Aditya and Zafar, Anas and Hassan, Muneeb Ul and othersarXiv preprint arXiv:2502.086502025
107sapkota2025aiAi agents vs. agentic ai: A conceptual taxonomy, applications and challengeSapkota, Ranjan and Roumeliotis, Konstantinos I and Karkee, ManojarXiv preprint arXiv:2505.104682025
108raza2025fairsenseFairSense-AI: Responsible AI Meets SustainabilityRaza, Shaina and Chettiar, Mukund Sayeeganesh and Yousefabadi, Matin and Khan, Tahniat and Lotif, MarceloarXiv preprint arXiv:2503.028652025
109song2024auditAudit-llm: Multi-agent collaboration for log-based insider threat detectionSong, Chengyu and Ma, Linru and Zheng, Jianming and Liao, Jinzhi and Kuang, Hongyu and Yang, LinarXiv preprint arXiv:2408.089022024
110green2025leakyLeaky Thoughts: Large Reasoning Models Are Not Private ThinkersGreen, Tommaso and Gubri, Martin and Puerto, Haritz and Yun, Sangdoo and Oh, Seong JoonarXiv preprint arXiv:2506.156742025
111yao2023reactsynergizingreasoningactingReAct: Synergizing Reasoning and Acting in Language ModelsShunyu Yao and Jeffrey Zhao and Dian Yu and Nan Du and Izhak Shafran and Karthik Narasimhan and Yuan Cao2023DOI/URL
112openai2019gpt2Better Language Models and Their Implications{OpenAI2019
113google2024gemini15proGet more done with Gemini: Try 1.5 Pro and more intelligent features{Google2024
114meta2025llama4The Llama 4 herd: The beginning of a new era of natively multimodal intelligence{Meta AI2025
115xai2025grok3Grok 3 Beta --- The Age of Reasoning Agents{xAI2025
116ibm2025granite33IBM Granite 3.3: Speech recognition, refined reasoning, and RAG LoRAs{IBM2025
117ibm2025granitedocsGranite 3.3 Models --- Documentation{IBM2025
118baidu2025ernie45blogAnnouncing the Open Source Release of the ERNIE 4.5 Model Family{ERNIE Team2025
119baidu2025ernie45reportERNIE 4.5 Technical Report{ERNIE Team2025DOI/URL
120pan2024webcanvasbenchmarkingwebagentsWebCanvas: Benchmarking Web Agents in Online EnvironmentsYichen Pan and Dehan Kong and Sida Zhou and Cheng Cui and Yifei Leng and Bing Jiang and Hangyu Liu and Yanyi Shang and Shuyan Zhou and Tongshuang Wu and Zhengyang Wu2024DOI/URL
121yoran2024assistantbenchAssistantbench: Can web agents solve realistic and time-consuming tasks?Yoran, Ori and Amouyal, Samuel Joseph and Malaviya, Chaitanya and Bogin, Ben and Press, Ofir and Berant, JonathanarXiv preprint arXiv:2407.157112024
122BEARCUBS2025BEARCUBS: A benchmark for computer-using web agentsSong, Yixiao and Thai, Katherine and Pham, Chau Minh and Chang, Yapei and Nadaf, Mazin and Iyyer, MohitarXiv:2503.079192025DOI/URL
123mistral2025medium3Medium is the new large. (Mistral Medium 3){Mistral AI2025
124mistral2025magistralMagistral: Reasoning Model Family{Mistral AI2025
125anthropic2024claude3Introducing the next generation of Claude (Claude 3 family){Anthropic2024
126meta2024llama31Introducing Llama 3.1: Our most capable models to date{Meta AI2024
127openai2025o3o4miniIntroducing OpenAI o3 and o4-mini{OpenAI2025
128brown2020languageLanguage models are few-shot learnersBrown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and othersAdvances in neural information processing systems2020
129chen2021codexEvaluating Large Language Models Trained on CodeMark Chen and Jerry Tworek and Heewoo Jun and et al.arXiv preprint arXiv:2107.033742021DOI/URL
130raza2025vldbenchVLDBench Evaluating Multimodal Disinformation with Regulatory AlignmentRaza, Shaina and Vayani, Ashmal and Jain, Aditya and Narayanan, Aravind and Khazaie, Vahid Reza and Bashir, Syed Raza and Dolatabadi, Elham and Uddin, Gias and Emmanouilidis, Christos and Qureshi, Rizwan and othersarXiv preprint arXiv:2502.113612025
131ouyang2022trainingTraining language models to follow instructions with human feedbackOuyang, Long and Wu, Jeffrey and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and othersAdvances in neural information processing systems2022
132wang2024rethinkingRethinking the bounds of llm reasoning: Are multi-agent discussions the key?Wang, Qineng and Wang, Zihao and Su, Ying and Tong, Hanghang and Song, YangqiuarXiv preprint arXiv:2402.182722024
133yao2023treeTree of thoughts: Deliberate problem solving with large language modelsYao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Tom and Cao, Yuan and Narasimhan, KarthikAdvances in neural information processing systems2023
134wang2023planPlan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language modelsWang, Lei and Xu, Wanyu and Lan, Yihuai and Hu, Zhiqiang and Lan, Yunshi and Lee, Roy Ka-Wei and Lim, Ee-PengarXiv preprint arXiv:2305.040912023
135ejjami2024ethicalEthical artificial intelligence framework theory (EAIFT): a new paradigm for embedding ethical reasoning in AI systemsEjjami, RachidInt J Multidiscip Res2024
136schick2023toolformerToolformer: Language models can teach themselves to use toolsSchick, Timo and Dwivedi-Yu, Jane and Dess{\`\iAdvances in Neural Information Processing Systems2023
137raza2025trismagenticaireviewTRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent SystemsShaina Raza and Ranjan Sapkota and Manoj Karkee and Christos Emmanouilidis2025DOI/URL
138zhang2025litewebagentopensourcesuitevlmbasedLiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent ApplicationsDanqing Zhang and Balaji Rama and Jingyi Ni and Shiying He and Fu Zhao and Kunyu Chen and Arnold Chen and Junyu Cao2025DOI/URL
139SAPKOTA2026103575Object detection with multimodal large vision-language models: An in-depth reviewRanjan Sapkota and Manoj KarkeeInformation Fusion2026DOI/URL
140Huq_2025CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web NavigationHuq, Faria and Wang, Zora Zhiruo and Xu, Frank F. and Ou, Tianyue and Zhou, Shuyan and Bigham, Jeffrey P. and Neubig, GrahamProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)2025DOI/URL
141dunnell2024bioticbrowserapplyingstreamingllmBiotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-PilotKevin F. Dunnell and Andrew P. Stoddard2024DOI/URL
142desai2025responsibleaiagentsResponsible AI AgentsDeven R. Desai and Mark O. Riedl2025DOI/URL
143wu2025llmLlm fine-tuning: Concepts, opportunities, and challengesWu, Xiao-Kun and Chen, Min and Li, Wanyi and Wang, Rui and Lu, Limeng and Liu, Jia and Hwang, Kai and Hao, Yixue and Pan, Yanru and Meng, Qingguo and othersBig Data and Cognitive Computing2025
144jin2024impactThe impact of reasoning step length on large language modelsJin, Mingyu and Yu, Qinkai and Shu, Dong and Zhao, Haiyan and Hua, Wenyue and Meng, Yanda and Zhang, Yongfeng and Du, MengnanarXiv preprint arXiv:2401.049252024
145patil2025advancingAdvancing reasoning in large language models: Promising methods and approachesPatil, Avinash and Jadon, AryanarXiv preprint arXiv:2502.036712025
146bonagiri2025towardsTowards Trustworthy AI: Frameworks for Evaluating Consistency in Language ModelsBonagiri, Vamshi Krishna2025
147liang2025aiAI Reasoning in Deep Learning Era: From Symbolic AI to Neural--Symbolic AILiang, Baoyu and Wang, Yuchen and Tong, ChaoMathematics2025
148al2025buildingBuilding Trustworthy AI: Transparent AI Systems via Language Models, Ontologies, and Logical ReasoningAl Machot, Fadi and Horsch, Martin Thomas and Ullah, HabibDesigning the Conceptual Landscape for a XAIR Validation Infrastructure: Proceedings of the International Workshop on Designing the Conceptual Landscape for a XAIR Validation Infrastructure, DCLXVI 2024, Kaiserslautern, Germany2025
149wu2024usableUsable XAI: 10 strategies towards exploiting explainability in the LLM eraWu, Xuansheng and Zhao, Haiyan and Zhu, Yaochen and Shi, Yucheng and Yang, Fan and Hu, Lijie and Liu, Tianming and Zhai, Xiaoming and Yao, Wenlin and Li, Jundong and othersarXiv preprint arXiv:2403.089462024
150wu2025doesDoes Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM ReasoningWu, Xuyang and Nian, Jinming and Wei, Ting-Ruen and Tao, Zhiqiang and Wu, Hsin-Tai and Fang, YiarXiv preprint arXiv:2502.153612025
151fan2025biasguardBiasguard: A reasoning-enhanced bias detection tool for large language modelsFan, Zhiting and Chen, Ruizhe and Liu, ZuozhuarXiv preprint arXiv:2504.212992025
152zhang2025collaborativeCollaborative LLM Numerical Reasoning with Local Data ProtectionZhang, Min and Lu, Yuzhe and Zhou, Yun and Xu, Panpan and Cheong, Lin Lee and Lu, Chang-Tien and Wang, HaozhuarXiv preprint arXiv:2504.002992025
153tavasoli2025responsibleResponsible innovation: A strategic framework for financial LLM integrationTavasoli, Ahmadreza and Sharbaf, Maedeh and Madani, Seyed MohamadarXiv preprint arXiv:2504.021652025
154ferdaus2024towardsTowards trustworthy ai: A review of ethical and robust large language modelsFerdaus, Md Meftahul and Abdelguerfi, Mahdi and Ioup, Elias and Niles, Kendall N and Pathak, Ken and Sloan, StevenarXiv preprint arXiv:2407.139342024
155chen2024trustworthyTrustworthy, responsible, and safe ai: A comprehensive architectural framework for ai safety with challenges and mitigationsChen, Chen and Gong, Xueluan and Liu, Ziyao and Jiang, Weifeng and Goh, Si Qi and Lam, Kwok-YanarXiv preprint arXiv:2408.129352024
156shi2024largeLarge language model safety: A holistic surveyShi, Dan and Shen, Tianhao and Huang, Yufei and Li, Zhigen and Leng, Yongqi and Jin, Renren and Liu, Chuang and Wu, Xinwei and Guo, Zishan and Yu, Linhao and othersarXiv preprint arXiv:2412.176862024
157zheng2025beyondBeyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning ModelsZheng, Baihui and Zheng, Boren and Cao, Kerui and Tan, Yingshui and Liu, Zhendong and Wang, Weixun and Liu, Jiaheng and Yang, Jian and Su, Wenbo and Zhu, Xiaoyong and othersarXiv preprint arXiv:2505.196902025
158goh2024largeLarge language model influence on diagnostic reasoning: a randomized clinical trialGoh, Ethan and Gallo, Robert and Hom, Jason and Strong, Eric and Weng, Yingjie and Kerman, Hannah and Cool, Jos{\'eJAMA network open2024
159lucas2024reasoningReasoning with large language models for medical question answeringLucas, Mary M and Yang, Justin and Pomeroy, Jon K and Yang, Christopher CJournal of the American Medical Informatics Association2024
160guha2023legalbenchLegalbench: A collaboratively built benchmark for measuring legal reasoning in large language modelsGuha, Neel and Nyarko, Julian and Ho, Daniel and R{\'eAdvances in neural information processing systems2023
161shu2024lawllmLawLLM: Law large language model for the US legal systemShu, Dong and Zhao, Haoran and Liu, Xukun and Demeter, David and Du, Mengnan and Zhang, YongfengProceedings of the 33rd ACM International Conference on information and knowledge management2024
162liu2025finFin-r1: A large language model for financial reasoning through reinforcement learningLiu, Zhaowei and Guo, Xin and Lou, Fangqi and Zeng, Lingfeng and Niu, Jinyi and Wang, Zixuan and Xu, Jiajie and Cai, Weige and Yang, Ziwei and Zhao, Xueqian and othersarXiv preprint arXiv:2503.162522025
163son2023beyondBeyond classification: Financial reasoning in state-of-the-art language modelsSon, Guijin and Jung, Hanearl and Hahm, Moonjeong and Na, Keonju and Jin, SolarXiv preprint arXiv:2305.015052023
164yuan2024finllmsFinllms: A framework for financial reasoning dataset generation with large language modelsYuan, Ziqiang and Wang, Kaiyuan and Zhu, Shoutai and Yuan, Ye and Zhou, Jingya and Zhu, Yanlin and Wei, WenqiIEEE Transactions on Big Data2024
165beltagy2019scibertSciBERT: A pretrained language model for scientific textBeltagy, Iz and Lo, Kyle and Cohan, ArmanarXiv preprint arXiv:1903.106762019
166taylor2022galacticaGalactica: A large language model for scienceTaylor, Ross and Kardas, Marcin and Cucurull, Guillem and Scialom, Thomas and Hartshorn, Anthony and Saravia, Elvis and Poulton, Andrew and Kerkez, Viktor and Stojnic, RobertarXiv preprint arXiv:2211.090852022
167raza2025developingDeveloping safe and responsible large language model: can we balance bias reduction and language understanding?Raza, Shaina and Bamgbose, Oluwanifemi and Ghuge, Shardul and Tavakoli, Fatemeh and Reji, Deepak John and Bashir, Syed RazaMachine Learning2025
168besta2025reasoningReasoning language models: A blueprintBesta, Maciej and Barth, Julia and Schreiber, Eric and Kubicek, Ales and Catarino, Afonso and Gerstenberger, Robert and Nyczyk, Piotr and Iff, Patrick and Li, Yueling and Houliston, Sam and othersarXiv preprint arXiv:2501.112232025
169lomonaco2019continualContinual learning with deep architecturesLomonaco, Vincenzoalma2019
170hitzler2022neuroNeuro-symbolic artificial intelligence: The state of the artHitzler, Pascal and Sarker, Md KamruzzamanIOS press2022
171FERPA1974{Family Educational Rights and Privacy Act of 1974 (FERPA){U.S. Congress1974DOI/URL
172iso42001ISO/IEC 42001:2023 -- Artificial Intelligence Management System (AI MS) -- Requirements{International Organization for Standardization2023
173MiFIDII2014{Directive 2014/65/EU{European Parliament and Council of the European Union2014DOI/URL
174hipaa164{HIPAA Privacy Rule -- 45 CFR Part 164: Security and Privacy Protections for Health Information{U.S. Department of Health and Human Services2003
175gdpr25{General Data Protection Regulation (GDPR) -- Article 25: Data protection by design and by default{European Union2016
176slattery2024aiThe ai risk repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligenceSlattery, Peter and Saeri, Alexander K and Grundy, Emily AC and Graham, Jess and Noetel, Michael and Uuk, Risto and Dao, James and Pour, Soroush and Casper, Stephen and Thompson, NeilarXiv preprint arXiv:2408.126222024
177sakib2024risksRisks, causes, and mitigations of widespread deployments of large language models (llms): A surveySakib, Md Nazmus and Islam, Md Athikul and Pathak, Royal and Arifin, Md Mashrur2024 2nd International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings)2024
178zhao2024explainabilityExplainability for large language models: A surveyZhao, Haiyan and Chen, Hanjie and Yang, Fan and Liu, Ninghao and Deng, Huiqi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Du, MengnanACM Transactions on Intelligent Systems and Technology2024
179jha2022responsibleResponsible reasoning with large language models and the impact of proper nounsJha, Sumit Kumar and Ewetz, Rickard and Velasquez, Alvaro and Jha, SusmitWorkshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 20222022
180park2023generativeagentsGenerative Agents: Interactive Simulacra of Human BehaviorPark, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S.Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23)2023DOI/URL
181lewis2020ragRetrieval-Augmented Generation for Knowledge-Intensive NLPLewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and K{\"uAdvances in Neural Information Processing Systems 33 (NeurIPS 2020)2020DOI/URL
182packer2023memgptMemGPT: Towards LLMs as Operating Systems.Packer, Charles and Fang, Vivian and Patil, Shishir\_G and Lin, Kevin and Wooders, Sarah and Gonzalez, Joseph\_EArXiv2023
183google2025_gemini25_deepthink_modelcard{Gemini 2.5 Deep Think Model Card{Google DeepMind2025
184anthropic2025_claude37_sonnet{Claude 3.7 Sonnet (Hybrid Reasoning Model) Announcement and System CardAnthropic2025
185he2025_skywork_or1{Skywork Open Reasoner 1 (Skywork-OR1): A Scalable RL Framework for Long Chain-of-Thought ReasoningHe, Jujie and Liu, Jiacai and Liu, Chris Yuhao and Yan, Rui and Wang, Chaojie and Cheng, Peng and Zhang, Xiaoyu and Zhang, Fuxiang and Xu, Jiacheng and Shen, Wei and Li, Siyuan and Zeng, Liang and Wei, Tianwen and Cheng, Cheng and An, Bo and Liu, Yang and Zhou, YahuiarXiv preprint arXiv:2505.223122025
186alibaba2025_qwq32b{Alibaba Cloud Unveils QwQ-32B: A Compact Reasoning Model with Cutting-Edge Performance{Alibaba Cloud Qwen Team2025
187anthropic2024_claude35_sonnet{Introducing Claude 3.5 SonnetAnthropic2024
188anthropic_claude4_systemcardClaude Opus 4 \& Claude Sonnet 4 System Card{Anthropic2025
189Guardian_OpenAI_GPT5_2025OpenAI says latest {ChatGPT{The GuardianThe Guardian2025DOI/URL
190OpenAI_GPT5_2025Introducing {GPT-5{OpenAI2025
191openai2025_gpt_oss_model_card{gpt-oss-120b \& gpt-oss-20b Model Card{OpenAI2025DOI/URL
192wang2022selfSelf-consistency improves chain of thought reasoning in language modelsWang, Xuezhi and Wei, Jason and Schuurmans, Dale and Le, Quoc and Chi, Ed and Narang, Sharan and Chowdhery, Aakanksha and Zhou, DennyarXiv preprint arXiv:2203.111712022
193peter1997experiencesExperiences with an architecture for intelligent, reactive agentsPeter Bonasso, R and James Firby, R and Gat, Erann and Kortenkamp, David and Miller, David P and Slack, Mark GJournal of Experimental \& Theoretical Artificial Intelligence1997
194gat1998threeOn three-layer architecturesGat, Erann and Bonnasso, R Peter and Murphy, Robin and othersArtificial intelligence and mobile robots1998
195brooks1991intelligenceIntelligence without representationBrooks, Rodney AArtificial intelligence1991
196brooks2003robustA robust layered control system for a mobile robotBrooks, RodneyIEEE journal on robotics and automation2003
197karpukhin2020dprDense Passage Retrieval for Open-Domain Question AnsweringKarpukhin, Vladimir and O{\u{gProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)2020DOI/URL
198johnson2017faissBillion-Scale Similarity Search with GPUsJohnson, Jeff and Douze, Matthijs and J{\'eIEEE Transactions on Big Data2019DOI/URL
199woodgate2024macroMacro ethics principles for responsible AI systems: Taxonomy and directionsWoodgate, Jessica and Ajmeri, NiravACM Computing Surveys2024
200devlin2019bertBert: Pre-training of deep bidirectional transformers for language understandingDevlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, KristinaProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)2019
201wei2022chainChain-of-thought prompting elicits reasoning in large language modelsWei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny and othersAdvances in neural information processing systems2022
202jaech2024openaiOpenai o1 system cardJaech, Aaron and Kalai, Adam and Lerer, Adam and Richardson, Adam and El-Kishky, Ahmed and Low, Aiden and Helyar, Alec and Madry, Aleksander and Beutel, Alex and Carney, Alex and othersarXiv preprint arXiv:2412.167202024
203guo2025deepseekDeepseek-r1: Incentivizing reasoning capability in llms via reinforcement learningGuo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Zhang, Ruoyu and Xu, Runxin and Zhu, Qihao and Ma, Shirong and Wang, Peiyi and Bi, Xiao and othersarXiv preprint arXiv:2501.129482025
204zhang2022automaticAutomatic chain of thought prompting in large language modelsZhang, Zhuosheng and Zhang, Aston and Li, Mu and Smola, AlexarXiv preprint arXiv:2210.034932022
205lyu2023faithfulFaithful chain-of-thought reasoningLyu, Qing and Havaldar, Shreya and Stein, Adam and Zhang, Li and Rao, Delip and Wong, Eric and Apidianaki, Marianna and Callison-Burch, ChrisThe 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023)2023
206mokander2024auditingAuditing large language models: a three-layered approachM{\"oAI and Ethics2024
207amirizaniani2024llmauditorLLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-LoopAmirizaniani, Maryam and Yao, Jihan and Lavergne, Adrian and Okada, Elizabeth Snell and Chadha, Aman and Roosta, Tanya and Shah, ChiragarXiv preprint arXiv:2402.093462024
208amirizaniani2024auditllmAuditLLM: A tool for auditing large language models using multiprobe approachAmirizaniani, Maryam and Martin, Elias and Roosta, Tanya and Chadha, Aman and Shah, ChiragProceedings of the 33rd ACM International Conference on Information and Knowledge Management2024
209paraschou2025mindMind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AIParaschou, Eva and Arapakis, Ioannis and Yfantidou, Sofia and Macaluso, Sebastian and Vakali, AthenaarXiv preprint arXiv:2506.122402025
210ehsan2024humanHuman-centered explainable AI (HCXAI): Reloading explainability in the era of large language models (LLMs)Ehsan, Upol and Watkins, Elizabeth A and Wintersberger, Philipp and Manger, Carina and Kim, Sunnie SY and Van Berkel, Niels and Riener, Andreas and Riedl, Mark OExtended Abstracts of the CHI Conference on Human Factors in Computing Systems2024
211yang2024humanHuman-centric autonomous systems with llms for user command reasoningYang, Yi and Zhang, Qingwen and Li, Ci and Marta, Daniel Sim{\~oProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision2024
212zhang2024llamaLlama-berry: Pairwise optimization for o1-like olympiad-level mathematical reasoningZhang, Di and Wu, Jianbo and Lei, Jingdi and Che, Tong and Li, Jiatong and Xie, Tong and Huang, Xiaoshui and Zhang, Shufei and Pavone, Marco and Li, Yuqiang and othersarXiv preprint arXiv:2410.028842024
213zheng2024processbenchProcessbench: Identifying process errors in mathematical reasoningZheng, Chujie and Zhang, Zhenru and Zhang, Beichen and Lin, Runji and Lu, Keming and Yu, Bowen and Liu, Dayiheng and Zhou, Jingren and Lin, JunyangarXiv preprint arXiv:2412.065592024
214browne2012surveyA survey of monte carlo tree search methodsBrowne, Cameron B and Powley, Edward and Whitehouse, Daniel and Lucas, Simon M and Cowling, Peter I and Rohlfshagen, Philipp and Tavener, Stephen and Perez, Diego and Samothrakis, Spyridon and Colton, SimonIEEE Transactions on Computational Intelligence and AI in games2012
215zhao2024expelExpel: Llm agents are experiential learnersZhao, Andrew and Huang, Daniel and Xu, Quentin and Lin, Matthieu and Liu, Yong-Jin and Huang, GaoProceedings of the AAAI Conference on Artificial Intelligence2024
216besta2024graphGraph of thoughts: Solving elaborate problems with large language modelsBesta, Maciej and Blach, Nils and Kubicek, Ales and Gerstenberger, Robert and Podstawski, Michal and Gianinazzi, Lukas and Gajda, Joanna and Lehmann, Tomasz and Niewiadomski, Hubert and Nyczyk, Piotr and othersProceedings of the AAAI conference on artificial intelligence2024
217liu2024mathbenchMathbench: Evaluating the theory and application proficiency of llms with a hierarchical mathematics benchmarkLiu, Hongwei and Zheng, Zilong and Qiao, Yuxuan and Duan, Haodong and Fei, Zhiwei and Zhou, Fengzhe and Zhang, Wenwei and Zhang, Songyang and Lin, Dahua and Chen, KaiarXiv preprint arXiv:2405.122092024
218wang2024rupbenchRupbench: Benchmarking reasoning under perturbations for robustness evaluation in large language modelsWang, Yuqing and Zhao, YunarXiv preprint arXiv:2406.110202024
219zeng2024mrMr-ben: A meta-reasoning benchmark for evaluating system-2 thinking in llmsZeng, Zhongshen and Liu, Yinhong and Wan, Yingjia and Li, Jingyao and Chen, Pengguang and Dai, Jianbo and Yao, Yuxuan and Xu, Rongwu and Qi, Zehan and Zhao, Wanru and othersAdvances in Neural Information Processing Systems2024
220estermann2024puzzlesPuzzles: A benchmark for neural algorithmic reasoningEstermann, Benjamin and Lanzend{\"oAdvances in Neural Information Processing Systems2024
221wang2019superglueSuperglue: A stickier benchmark for general-purpose language understanding systemsWang, Alex and Pruksachatkun, Yada and Nangia, Nikita and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, SamuelAdvances in neural information processing systems2019
222pan2023logicLogic-lm: Empowering large language models with symbolic solvers for faithful logical reasoningPan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William YangarXiv preprint arXiv:2305.122952023
223yu2024naturalNatural language reasoning, a surveyYu, Fei and Zhang, Hongbo and Tiwari, Prayag and Wang, BenyouACM Computing Surveys2024
224liu2024logicLogic-of-thought: Injecting logic into contexts for full reasoning in large language modelsLiu, Tongxuan and Xu, Wenjiang and Huang, Weizhe and Zeng, Yuting and Wang, Jiaxing and Wang, Xingyu and Yang, Hailong and Li, JingarXiv preprint arXiv:2409.175392024
225pan2025surveyA survey of slow thinking-based reasoning llms using reinforced learning and inference-time scaling lawPan, Qianjun and Ji, Wenkai and Ding, Yuyang and Li, Junsong and Chen, Shilian and Wang, Junyi and Zhou, Jie and Chen, Qin and Zhang, Min and Wu, Yulan and othersarXiv preprint arXiv:2505.026652025
226huang2025deepDeep Research Agents: A Systematic Examination And RoadmapHuang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and othersarXiv preprint arXiv:2506.180962025
227raiaan2024reviewA review on large language models: Architectures, applications, taxonomies, open issues and challengesRaiaan, Mohaimenul Azam Khan and Mukta, Md Saddam Hossain and Fatema, Kaniz and Fahad, Nur Mohammad and Sakib, Sadman and Mim, Most Marufatul Jannat and Ahmad, Jubaer and Ali, Mohammed Eunus and Azam, SamiIEEE access2024
228chen2025towardsTowards reasoning era: A survey of long chain-of-thought for reasoning large language modelsChen, Qiguang and Qin, Libo and Liu, Jinhao and Peng, Dengyun and Guan, Jiannan and Wang, Peng and Hu, Mengkang and Zhou, Yuhang and Gao, Te and Che, WanxiangarXiv preprint arXiv:2503.095672025
229cao2025towardToward generalizable evaluation in the llm era: A survey beyond benchmarksCao, Yixin and Hong, Shibo and Li, Xinze and Ying, Jiahao and Ma, Yubo and Liang, Haiyuan and Liu, Yantao and Yao, Zijun and Wang, Xiaozhi and Huang, Dan and othersarXiv preprint arXiv:2504.188382025
230chang2024surveyA survey on evaluation of large language modelsChang, Yupeng and Wang, Xu and Wang, Jindong and Wu, Yuan and Yang, Linyi and Zhu, Kaijie and Chen, Hao and Yi, Xiaoyuan and Wang, Cunxiang and Wang, Yidong and othersACM transactions on intelligent systems and technology2024
231morishita2024enhancingEnhancing reasoning capabilities of llms via principled synthetic logic corpusMorishita, Terufumi and Morio, Gaku and Yamaguchi, Atsuki and Sogawa, YasuhiroAdvances in Neural Information Processing Systems2024
232basiouni2025contextIn-Context Learning in Large Language Models (LLMs): Mechanisms, Capabilities, and Implications for Advanced Knowledge Representation and ReasoningBasiouni, Azza Mohamed and El Rashid, Mohamed and Shaalan, KhaledIEEE Access2025
233yeo2025demystifyingDemystifying long chain-of-thought reasoning in llmsYeo, Edward and Tong, Yuxuan and Niu, Morry and Neubig, Graham and Yue, XiangarXiv preprint arXiv:2502.033732025
234kumar2025llmLlm post-training: A deep dive into reasoning large language modelsKumar, Komal and Ashraf, Tajamul and Thawakar, Omkar and Anwer, Rao Muhammad and Cholakkal, Hisham and Shah, Mubarak and Yang, Ming-Hsuan and Torr, Phillip HS and Khan, Fahad Shahbaz and Khan, SalmanarXiv preprint arXiv:2502.213212025
235fu2025improvingImproving complex reasoning in large language modelsFu, YaoThe University of Edinburgh2025
236feng2025efficientEfficient reasoning models: A surveyFeng, Sicheng and Fang, Gongfan and Ma, Xinyin and Wang, XinchaoarXiv preprint arXiv:2504.109032025
237ferrag2025llmFrom llm reasoning to autonomous ai agents: A comprehensive reviewFerrag, Mohamed Amine and Tihanyi, Norbert and Debbah, MerouanearXiv preprint arXiv:2504.196782025
238putta2024agentAgent q: Advanced reasoning and learning for autonomous ai agentsPutta, Pranav and Mills, Edmund and Garg, Naman and Motwani, Sumeet and Finn, Chelsea and Garg, Divyansh and Rafailov, RafaelarXiv preprint arXiv:2408.071992024
239tariq2025reasoningReasoning About Responsibility in Autonomous Systems: Navigating the Challenges and Charting Future DirectionsTariq, Usman and Ahmed, IrfanUbiquitous Technology Journal2025
240ferrag2025reasoningReasoning beyond limits: Advances and open problems for llmsFerrag, Mohamed Amine and Tihanyi, Norbert and Debbah, MerouanearXiv preprint arXiv:2503.227322025
241wu2025positionPosition Paper: Towards Open Complex Human-AI Agents Collaboration System for Problem-Solving and Knowledge ManagementWu, Ju and Or, Calvin KLarXiv preprint arXiv:2505.000182025
242tran2025reasoningReasoning in Neurosymbolic AITran, Son and Mota, Edjard and Garcez, Artur d'AvilaarXiv preprint arXiv:2505.203132025
243swiechowski2023monteMonte Carlo tree search: A review of recent modifications and applications{\'SArtificial Intelligence Review2023
244sun2025dataData Agent: A Holistic Architecture for Orchestrating Data+ AI EcosystemsSun, Zhaoyan and Wang, Jiayi and Zhao, Xinyang and Wang, Jiachi and Li, GuoliangarXiv preprint arXiv:2507.015992025
245zheng2025retrievalRetrieval augmented generation and understanding in vision: A survey and new outlookZheng, Xu and Weng, Ziqiao and Lyu, Yuanhuiyi and Jiang, Lutao and Xue, Haiwei and Ren, Bin and Paudel, Danda and Sebe, Nicu and Van Gool, Luc and Hu, XumingarXiv preprint arXiv:2503.180162025
246bei2025graphsGraphs Meet AI Agents: Taxonomy, Progress, and Future OpportunitiesBei, Yuanchen and Zhang, Weizhi and Wang, Siwen and Chen, Weizhi and Zhou, Sheng and Chen, Hao and Li, Yong and Bu, Jiajun and Pan, Shirui and Yu, Yizhou and othersarXiv preprint arXiv:2506.180192025
247chhikara2025mem0Mem0: Building production-ready ai agents with scalable long-term memoryChhikara, Prateek and Khant, Dev and Aryan, Saket and Singh, Taranjeet and Yadav, DeshrajarXiv preprint arXiv:2504.194132025
248huang2025foundationFoundation models and intelligent decision-making: Progress, challenges, and perspectivesHuang, Jincai and Xu, Yongjun and Wang, Qi and Wang, Qi Cheems and Liang, Xingxing and Wang, Fei and Zhang, Zhao and Wei, Wei and Zhang, Boxuan and Huang, Libo and othersThe Innovation2025
249sun2025surveyA survey of reasoning with foundation models: Concepts, methodologies, and outlookSun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and Chu, Ruihang and Qiu, Jianing and Xu, Jiaqi and Ding, Mingyu and Li, Hongyang and Geng, Mengzhe and othersACM Computing Surveys2025
250zhang2025ignitingIgniting language intelligence: The hitchhiker’s guide from chain-of-thought reasoning to language agentsZhang, Zhuosheng and Yao, Yao and Zhang, Aston and Tang, Xiangru and Ma, Xinbei and He, Zhiwei and Wang, Yiming and Gerstein, Mark and Wang, Rui and Liu, Gongshen and othersACM Computing Surveys2025
251wang2025multimodalMultimodal chain-of-thought reasoning: A comprehensive surveyWang, Yaoting and Wu, Shengqiong and Zhang, Yuecheng and Yan, Shuicheng and Liu, Ziwei and Luo, Jiebo and Fei, HaoarXiv preprint arXiv:2503.126052025
252chen2025policyPolicy frameworks for transparent chain-of-thought reasoning in large language modelsChen, Yihang and Deng, Haikang and Han, Kaiqiao and Zhao, QingyuearXiv preprint arXiv:2503.145212025
253manuvinakurike2025thoughtsThoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic PipelinesManuvinakurike, Ramesh and Moss, Emanuel and Watkins, Elizabeth Anne and Sahay, Saurav and Raffa, Giuseppe and Nachman, LamaarXiv preprint arXiv:2505.008752025
254li2025llmLLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous drivingLi, Lin and Tan, Runjia and Fang, Jianwu and Xue, Jianru and Lv, ChenExpert Systems with Applications2025
255zhao2025worldWorld Models for Cognitive Agents: Transforming Edge Intelligence in Future NetworksZhao, Changyuan and Zhang, Ruichen and Wang, Jiacheng and Zhao, Gaosheng and Niyato, Dusit and Sun, Geng and Mao, Shiwen and Kim, Dong InarXiv preprint arXiv:2506.004172025
256lopez2025surveyA Survey on Large Language Models in Multimodal Recommender SystemsLopez-Avila, Alejo and Du, JinhuaarXiv preprint arXiv:2505.097772025
257giannone2025feedbackFeedback-Driven Vision-Language Alignment with Minimal Human SupervisionGiannone, Giorgio and Li, Ruoteng and Feng, Qianli and Perevodchikov, Evgeny and Chen, Rui and Martinez, AleixarXiv preprint arXiv:2501.045682025
258cao2025causalCausal action empowerment for efficient reinforcement learning in embodied agentsCao, Hongye and Feng, Fan and Huo, Jing and Gao, YangScience China Information Sciences2025
259ranjan2025fairnessFairness in Agentic AI: A Unified Framework for Ethical and Equitable Multi-Agent SystemRanjan, Rajesh and Gupta, Shailja and Singh, Surya NarayanarXiv preprint arXiv:2502.072542025
260chen2024fairnessFairness testing: A comprehensive survey and analysis of trendsChen, Zhenpeng and Zhang, Jie M and Hort, Max and Harman, Mark and Sarro, FedericaACM Transactions on Software Engineering and Methodology2024
261su2025thinkingThinking with Images for Multimodal Reasoning: Foundations, Methods, and Future FrontiersSu, Zhaochen and Xia, Peng and Guo, Hangyu and Liu, Zhenhua and Ma, Yan and Qu, Xiaoye and Liu, Jiaqi and Li, Yanshu and Zeng, Kaide and Yang, Zhengyuan and othersarXiv preprint arXiv:2506.239182025
262karunanayake2025nextNext-generation agentic AI for transforming healthcareKarunanayake, NalanInformatics and Health2025
263zhang2025surveyA Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?Zhang, Qiyuan and Lyu, Fuyuan and Sun, Zexu and Wang, Lei and Zhang, Weixu and Hua, Wenyue and Wu, Haolun and Guo, Zhihan and Wang, Yufei and Muennighoff, Niklas and othersarXiv preprint arXiv:2503.242352025
264kim2025costThe Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure PerspectiveKim, Jiin and Shin, Byeongjun and Chung, Jinha and Rhu, MinsooarXiv preprint arXiv:2506.043012025
265li2025systemFrom system 1 to system 2: A survey of reasoning large language modelsLi, Zhong-Zhi and Zhang, Duzhen and Zhang, Ming-Liang and Zhang, Jiaxin and Liu, Zengyan and Yao, Yuxuan and Xu, Haotian and Zheng, Junhao and Wang, Pei-Jie and Chen, Xiuyi and othersarXiv preprint arXiv:2502.174192025
266gao2024interpretableInterpretable contrastive monte carlo tree search reasoningGao, Zitian and Niu, Boye and He, Xuzheng and Xu, Haotian and Liu, Hongzhang and Liu, Aiwei and Hu, Xuming and Wen, LijiearXiv preprint arXiv:2410.017072024
267liang2025mctsI-MCTS: Enhancing agentic AutoML via introspective monte carlo tree searchLiang, Zujie and Wei, Feng and Xu, Wujiang and Chen, Lin and Qian, Yuxi and Wu, XinhuiarXiv preprint arXiv:2502.146932025
268an2025combiningCombining llms with logic-based framework to explain mctsAn, Ziyan and Wang, Xia and Baier, Hendrik and Chen, Zirong and Dubey, Abhishek and Johnson, Taylor T and Sprinkle, Jonathan and Mukhopadhyay, Ayan and Ma, MeiyiarXiv preprint arXiv:2505.006102025
269dao2025boostingBoosting MCTS with Free Energy MinimizationDao, Mawaba Pascal and Peter, Adrian MarXiv preprint arXiv:2501.130832025
270meimandi2025measurementThe Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity ClaimsMeimandi, Kiana Jafari and Ar{\'aarXiv preprint arXiv:2506.020642025
271ahmed2025enhancingEnhancing Explainability, Robustness, and Autonomy: A Comprehensive Approach in Trustworthy AIAhmed, Mobyen Uddin and Begum, Shahina and Barua, Shaibal and Masud, Abu Naser and Di Flumeri, Gianluca and Navarin, Nicol{\`o2025 IEEE Symposium on Trustworthy, Explainable and Responsible Computational Intelligence (CITREx)2025
272sanwal2025layeredLayered chain-of-thought prompting for multi-agent llm systems: A comprehensive approach to explainable large language modelsSanwal, ManisharXiv preprint arXiv:2501.186452025
273pang2025interactiveInteractive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language ModelsPang, Rock Yuren and Feng, KJ and Feng, Shangbin and Li, Chu and Shi, Weijia and Tsvetkov, Yulia and Heer, Jeffrey and Reinecke, KatharinaarXiv preprint arXiv:2506.236782025
274bilal2025metaMeta-thinking in llms via multi-agent reinforcement learning: A surveyBilal, Ahsan and Mohsin, Muhammad Ahmed and Umer, Muhammad and Bangash, Muhammad Awais Khan and Jamshed, Muhammad AliarXiv preprint arXiv:2504.145202025
275wen2025cotguardCoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM SystemsWen, Yan and Guo, Junfeng and Huang, HengarXiv preprint arXiv:2505.194052025
276zahid2025explainabilityExplainability, Robustness, and Fairness in User-Centric Intelligent Systems: A Systematic ReviewZahid, Idrees A and Garfan, Salem and Chyad, MA and Albahri, AS and Albahri, OS and Alamoodi, AH and Deveci, Muhammet and Homod, Raad Z and Alzubaidi, LaithIEEE Transactions on Emerging Topics in Computational Intelligence2025
277gupta2025aiAI Agents Collaboration Under Resource Constraints: Practical ImplementationsGupta, ShubhamINTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT2025
278molinari2025towardsTowards Pervasive Distributed Agentic Generative AI--A State of The ArtMolinari, Gianni and Ciravegna, FabioarXiv preprint arXiv:2506.133242025
279zhang2024integratingIntegrating Artificial Intelligence into Operating Systems: A Comprehensive Survey on Techniques, Applications, and Future DirectionsZhang, Yifan and Zhao, Xinkui and Li, Ziying and Yin, Jianwei and Zhang, Lufei and Chen, ZuoningarXiv preprint arXiv:2407.145672024
280wei2025agentAgent. xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoCWei, Xinming and Zhang, Jiahao and Li, Haoran and Chen, Jiayu and Qu, Rui and Li, Maoliang and Chen, Xiang and Luo, GuojiearXiv preprint arXiv:2506.240452025
281jiang2025largeFrom large ai models to agentic ai: A tutorial on future intelligent communicationsJiang, Feibo and Pan, Cunhua and Dong, Li and Wang, Kezhi and Dobre, Octavia A and Debbah, MerouanearXiv preprint arXiv:2505.223112025
282liu2025optimizingOptimizing on-demand food delivery with BDI-based multi-agent systems and Monte Carlo tree search schedulingLiu, Li and Chen, Shikun and Jin, Huan and Deng, Xiaoying and Liu, Yangguang and Lin, YangScientific Reports2025
283zou2025agenteEl Agente: An autonomous agent for quantum chemistryZou, Yunheng and Cheng, Austin H and Aldossary, Abdulrahman and Bai, Jiaru and Leong, Shi Xuan and Campos-Gonzalez-Angulo, Jorge Arturo and Choi, Changhyeok and Ser, Cher Tian and Tom, Gary and Wang, Andrew and othersMatter2025
284amini2025distributedDistributed llms and multimodal large language models: A survey on advances, challenges, and future directionsAmini, Hadi and Mia, Md Jueal and Saadati, Yasaman and Imteaj, Ahmed and Nabavirazavi, Seyedsina and Thakker, Urmish and Hossain, Md Zarif and Fime, Awal Ahmed and Iyengar, SSarXiv preprint arXiv:2503.165852025
285chaudhry2025towardsTowards Resource-Efficient Compound AI SystemsChaudhry, Gohar Irfan and Choukse, Esha and Goiri, {\'IProceedings of the 2025 Workshop on Hot Topics in Operating Systems2025
286roy2024enhancingEnhancing Real-World Robustness in AI: Challenges and SolutionsRoy, PritamJ. Recent Trends Comput. Sci. Eng2024
287kim2025medicalMedical hallucinations in foundation models and their impact on healthcareKim, Yubin and Jeong, Hyewon and Chen, Shan and Li, Shuyue Stella and Lu, Mingyu and Alhamoud, Kumail and Mun, Jimin and Grau, Cristina and Jung, Minseok and Gameiro, Rodrigo and othersarXiv preprint arXiv:2503.057772025
288gao2025monoMono: Is Your" Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and BeyondGao, Zeyu and Zhou, Junlin and Zhang, Bolun and He, Yi and Zhang, Chao and Cui, Yuxin and Wang, HaoarXiv preprint arXiv:2506.036512025
289chander2025towardToward trustworthy artificial intelligence (TAI) in the context of explainability and robustnessChander, Bhanu and John, Chinju and Warrier, Lekha and Gopalakrishnan, KumaravelanACM Computing Surveys2025
290barros2025thinkI Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being WrongBarros, SebastianarXiv preprint arXiv:2503.058062025
291latif2025hallucinationsHallucinations in large language models and their influence on legal reasoning: Examining the risks of ai-generated factual inaccuracies in judicial processesLatif, Youssef AbdelJournal of Computational Intelligence, Machine Reasoning, and Decision-Making2025
292chakraborti2025personalizedPersonalized uncertainty quantification in artificial intelligenceChakraborti, Tapabrata and Banerji, Christopher RS and Marandon, Ariane and Hellon, Vicky and Mitra, Robin and Lehmann, Brieuc and Br{\"aNature Machine Intelligence2025
293liu2025uncertaintyUncertainty quantification and confidence calibration in large language models: A surveyLiu, Xiaoou and Chen, Tiejin and Da, Longchao and Chen, Chacha and Lin, Zhen and Wei, HuaarXiv preprint arXiv:2503.158502025
294becerra2025historicalHistorical Methods for AI Evaluations, Assessments, and AuditsBecerra Sandoval, Juana Catalina and Jing, Felicia SProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency2025
295yeo2025comprehensiveA comprehensive review on financial explainable AIYeo, Wei Jie and Van Der Heever, Wihan and Mao, Rui and Cambria, Erik and Satapathy, Ranjan and Mengaldo, GianmarcoArtificial Intelligence Review2025
296mao2025llmsFrom LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM EcosystemMao, Yanxu and Cui, Tiehan and Liu, Peipei and You, Datao and Zhu, HongsongarXiv preprint arXiv:2506.151702025
297feng2025integrationIntegration of multi-agent systems and artificial intelligence in self-healing subway power supply systems: Advancements in fault diagnosis, isolation, and recoveryFeng, Jianbing and Yu, Tao and Zhang, Kuozhen and Cheng, LefengProcesses2025
298hammond2025multiMulti-agent risks from advanced aiHammond, Lewis and Chan, Alan and Clifton, Jesse and Hoelscher-Obermaier, Jason and Khan, Akbir and McLean, Euan and Smith, Chandler and Barfuss, Wolfram and Foerster, Jakob and Gaven{\v{carXiv preprint arXiv:2502.141432025
299acharya2025agenticAgentic ai: Autonomous intelligence for complex goals--a comprehensive surveyAcharya, Deepak Bhaskar and Kuppan, Karthigeyan and Divya, BIEEe Access2025
300abdallah2024multiMulti-agent DRL for distributed codebook design in RIS-aided cell-free massive MIMO networksAbdallah, Asmaa and Celik, Abdulkadir and Mansour, Mohammad M and Eltawil, Ahmed MIEEE Transactions on Communications2024
301feffer2024redRed-teaming for generative AI: Silver bullet or security theater?Feffer, Michael and Sinha, Anusha and Deng, Wesley H and Lipton, Zachary C and Heidari, HodaProceedings of the AAAI/ACM Conference on AI, Ethics, and Society2024
302majumdar2025redRed Teaming AI Red TeamingMajumdar, Subhabrata and Pendleton, Brian and Gupta, AbhishekarXiv preprint arXiv:2507.055382025
303qwen2025ledgerAccountability Ledger: Blockchain-Based AI Decision Logging{Qwen Team2025
304google2025bertBias Bounty Program for BERT{Google AI2025
305openai2025gpt4Homomorphic Encryption in GPT-4OpenAI2025
306deepmind2025sparrowSafety Layer in Sparrow: Preventing Harmful OutputsDeepMind2025
307anthropic2025claudeInteractive Transparency in ClaudeAnthropic2025
308ey2025mottHow Mott MacDonald is Building Confidence Through Responsible AIEY2025
309ey2025biopharmaHow a Global Biopharma Became a Leader in Ethical AIEY2025
310eu2025aiEU AI Act{European Union2025
311masood2025effectivenessMeasuring the Effectiveness of AI AdoptionMasood, A.2025
312forbes2025aiFuture Directions in AI EthicsForbes2025
313ey_mottmac2025{How Mott MacDonald is building confidence through responsible AI}{EY}2025
314challita2025redteamllmRedTeamLLM: an Agentic AI framework for offensive securityChallita, Brian and Parrend, PierrearXiv preprint arXiv:2505.069132025
315glazer2024frontiermathFrontiermath: A benchmark for evaluating advanced mathematical reasoning in aiGlazer, Elliot and Erdil, Ege and Besiroglu, Tamay and Chicharro, Diego and Chen, Evan and Gunning, Alex and Olsson, Caroline Falkman and Denain, Jean-Stanislas and Ho, Anson and Santos, Emily de Oliveira and othersarXiv preprint arXiv:2411.048722024
316ogbu2023agenticAgentic ai in computer vision domain-recent advances and prospectsOgbu, DanielInternational Journal of Research Publication and Reviews2023
317glaese2022improvingalignmentdialogueagentsImproving alignment of dialogue agents via targeted human judgementsAmelia Glaese and Nat McAleese and Maja Trębacz and John Aslanides and Vlad Firoiu and Timo Ewalds and Maribeth Rauh and Laura Weidinger and Martin Chadwick and Phoebe Thacker and Lucy Campbell-Gillingham and Jonathan Uesato and Po-Sen Huang and Ramona Comanescu and Fan Yang and Abigail See and Sumanth Dathathri and Rory Greig and Charlie Chen and Doug Fritz and Jaume Sanchez Elias and Richard Green and Soňa Mokrá and Nicholas Fernando and Boxi Wu and Rachel Foley and Susannah Young and Iason Gabriel and William Isaac and John Mellor and Demis Hassabis and Koray Kavukcuoglu and Lisa Anne Hendricks and Geoffrey Irving2022DOI/URL
318amorim2023dataprivacyhomomorphicencryptionData Privacy with Homomorphic Encryption in Neural Networks Training and InferenceIvone Amorim and Eva Maia and Pedro Barbosa and Isabel Praça2023DOI/URL
319scharowski2023exploringExploring the effects of human-centered AI explanations on trust and relianceScharowski, Nicolas and Perrig, Sebastian AC and Svab, Melanie and Opwis, Klaus and Br{\"uFrontiers in Computer Science2023
320liao2022humancenteredexplainableaixaiHuman-Centered Explainable AI (XAI): From Algorithms to User ExperiencesQ. Vera Liao and Kush R. Varshney2022DOI/URL
321alibabacloud_sls_logauditSimple Log Service: Log Audit Service (new version){Alibaba Cloud2024DOI/URL
322yang2020ledgerdbLedgerDB: A centralized ledger database for universal audit and verificationYang, Xinying and Zhang, Yuan and Wang, Sheng and Yu, Benquan and Li, Feifei and Li, Yize and Yan, WenyuanProceedings of the VLDB Endowment2020
323fli_ai_safety_index_2025{AI Safety Index: Summer 2025 Edition{Future of Life Institute2025DOI/URL
324TFS2025_ai_agents_euAhead of the Curve: Governing AI Agents under the EU {AI{The Future Society2025DOI/URL
325maclean2017nistThe NIST risk management framework: Problems and recommendationsMaclean, DonCyber Security: A Peer-Reviewed Journal2017
326gogia2025trustTrust by Design: Dissecting IBM's Enterprise AI Governance StackSanchit Vir Gogia2025DOI/URL
327xia2024responsibleaimetricscatalogueTowards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI AccountabilityBoming Xia and Qinghua Lu and Liming Zhu and Sung Une Lee and Yue Liu and Zhenchang Xing2024DOI/URL
328weidinger2024holisticHolistic safety and responsibility evaluations of advanced ai modelsWeidinger, Laura and Barnhart, Joslyn and Brennan, Jenny and Butterfield, Christina and Young, Susie and Hawkins, Will and Hendricks, Lisa Anne and Comanescu, Ramona and Chang, Oscar and Rodriguez, Mikel and othersarXiv preprint arXiv:2404.140682024
329sprague2024cotTo cot or not to cot? chain-of-thought helps mainly on math and symbolic reasoningSprague, Zayne and Yin, Fangcong and Rodriguez, Juan Diego and Jiang, Dongwei and Wadhwa, Manya and Singhal, Prasann and Zhao, Xinyu and Ye, Xi and Mahowald, Kyle and Durrett, GregarXiv preprint arXiv:2409.121832024
330bergman2024stelaSTELA: a community-centred approach to norm elicitation for AI alignmentBergman, Stevie and Marchal, Nahema and Mellor, John and Mohamed, Shakir and Gabriel, Iason and Isaac, WilliamScientific Reports2024
331larsen2024aivaluealignmentAI value alignment: How we can align artificial intelligence with human valuesLarsen, Benjamin and Dignum, Virginia2024DOI/URL
332alicloud2025ledgerdbLedgerDB: a centralized ledger database for universal audit and verificationYang, Xinying and Zhang, Yuan and Wang, Sheng and Yu, Benquan and Li, Feifei and Li, Yize and Yan, WenyuanProc. VLDB Endow.2020DOI/URL
333mialon2023gaiabenchmarkgeneralaiGAIA: a benchmark for General AI AssistantsGrégoire Mialon and Clémentine Fourrier and Craig Swift and Thomas Wolf and Yann LeCun and Thomas Scialom2023DOI/URL
334timms2024agenticAgentic Anomaly Detection for ShippingTimms, Alexander and Langbridge, Abigail and O'Donncha, FearghalNeurIPS 2024 Workshop on Open-World Agents2024
335kumar2025saarthiSaarthi: The First AI Formal Verification EngineerKumar, Aman and Gadde, Deepak Narayan and Radhakrishna, Keerthan Kopparam and Lettnin, DjonesarXiv preprint arXiv:2502.166622025
336garg2025designingDesigning the Mind: How Agentic Frameworks Are Shaping the Future of AI BehaviorGarg, VenusJournal of Computer Science and Technology Studies2025
337buehler2025agenticAgentic deep graph reasoning yields self-organizing knowledge networksBuehler, Markus JarXiv preprint arXiv:2502.130252025
338perrier2025outOut of Control--Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)Perrier, ElijaarXiv preprint arXiv:2506.178462025
339huang2025agenticAgentic AIHuang, KenSpringer2025
340kitchenham2004proceduresProcedures for performing systematic reviewsKitchenham, BarbaraKeele, UK, Keele University2004
341boland2017doingDoing a systematic review: a student s guideBoland, Angela and Cherry, Gemma and Dickson, RumonaSage2017
342lee2025evaluatingEvaluating step-by-step reasoning traces: A surveyLee, Jinu and Hockenmaier, JuliaarXiv preprint arXiv:2502.122892025
343natarajan2025humanHuman-in-the-loop or AI-in-the-loop? Automate or Collaborate?Natarajan, Sriraam and Mathur, Saurabh and Sidheekh, Sahil and Stammer, Wolfgang and Kersting, KristianProceedings of the AAAI Conference on Artificial Intelligence2025
344yigit2025generativeGenerative AI and LLMs for critical infrastructure protection: evaluation benchmarks, agentic AI, challenges, and opportunitiesYigit, Yagmur and Ferrag, Mohamed Amine and Ghanem, Mohamed C and Sarker, Iqbal H and Maglaras, Leandros A and Chrysoulas, Christos and Moradpoor, Naghmeh and Tihanyi, Norbert and Janicke, HelgeSensors2025
345allana2025privacyPrivacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping ReviewAllana, Sonal and Kankanhalli, Mohan and Dara, RozitaarXiv preprint arXiv:2505.028282025
346deng2025aiAi agents under threat: A survey of key security challenges and future pathwaysDeng, Zehang and Guo, Yongjian and Han, Changzhou and Ma, Wanlun and Xiong, Junwu and Wen, Sheng and Xiang, YangACM Computing Surveys2025
347inala2025buildingBuilding Trustworthy Agentic Ai Systems FOR Personalized Banking ExperiencesInala, Ramesh and Somu, BharathMetallurgical and Materials Engineering2025
348huang2025aiAI Agent Safety and Security ConsiderationsHuang, Jerry and Huang, Ken and Jackson, Krystal and Hughes, ChrisAgentic AI: Theories and Practices2025
349sutton2018reinforcement{Reinforcement learning: An introductionSutton, Richard S and Barto, Andrew GMIT press2018
350hosseini2025aiAI ethics in action: a circular model for transparency, accountability and inclusivityHosseini Tabaghdehi, Seyedeh Asieh and Ayaz, {\"OJournal of Managerial Psychology2025
351bahangulu2025algorithmicAlgorithmic bias, data ethics, and governance: Ensuring fairness, transparency and compliance in AI-powered business analytics applicationsBahangulu, Julien Kiesse and Berko, Louis OwusuWorld Journal of Advanced Research and Reviews2025
352li2025aiAI-Driven Governance: Enhancing Transparency and Accountability in Public AdministrationLI, CHANGKUIDigital Society \& Virtual Governance2025
353andrada2023varietiesVarieties of transparency: Exploring agency within AI systemsAndrada, Gloria and Clowes, Robert W and Smart, Paul RAI \& society2023
354zerilli2022transparencyHow transparency modulates trust in artificial intelligenceZerilli, John and Bhatt, Umang and Weller, AdrianPatterns2022
355akhtar2024privacyPrivacy and Security Considerations in Explainable AIAkhtar, Mohammad Amir Khusru and Kumar, Mohit and Nayyar, AnandTowards Ethical and Socially Responsible Explainable AI: Challenges and Opportunities2024
356busuioc2021accountableAccountable artificial intelligence: Holding algorithms to accountBusuioc, MadalinaPublic administration review2021
357griffin2024ethicalThe ethical agency of AI developersGriffin, Tricia A and Green, Brian Patrick and Welie, Jos VMAI and Ethics2024
358bjurling2025designingDesigning Human-Swarm Interaction SystemsBjurling, OscarLink{\"o2025
359braun2025liabilityLiability for artificial intelligence reasoning technologies--a cognitive autonomy that does not helpBraun, TomaszCorporate Governance: The International Journal of Business in Society2025
360crewAICrewAI: Framework for Orchestrating Role-Playing, Autonomous AI AgentsJoão Moura and contributorsGitHub2023
361raman2025navigatingNavigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathwaysRaman, Raghu and Kowalski, Robin and Achuthan, Krishnashree and Iyer, Akshay and Nedungadi, PremaScientific Reports2025
362hammerschmidt2025bridgingBridging the gap: inequalities that divide those who can and cannot create sustainable outcomes with AIHammerschmidt, Teresa and Stolz, Katharina and Posegga, OliverBehaviour \& Information Technology2025
363dahlan2025navigatingNavigating the Digital Frontier: Understanding Technology's Impact on SocietyDahlan, Mariani MohdUniversiti Poly-Tech Malaysia2025
364jiang_mistral_2023Mistral {7BJiang, Albert Q. and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and Casas, Diego de las and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and Lavaud, Lélio Renard and Lachaux, Marie-Anne and Stock, Pierre and Scao, Teven Le and Lavril, Thibaut and Wang, Thomas and Lacroix, Timothée and Sayed, William ElarXiv2023DOI/URL
365xu_bot-adversarial_2021Bot-{AdversarialXu, Jing and Ju, Da and Li, Margaret and Boureau, Y-Lan and Weston, Jason and Dinan, EmilyProceedings of the 2021 {Conference2021DOI/URL
366pujari2024ethicalEthical and responsible AI: Governance frameworks and policy implications for multi-agent systemsPujari, Tejaskumar and Goel, Anshul and Sharma, AshwinIJST2024
367nanjundan2025navigatingNavigating the ethical landscape of artificial intelligence: Challenges, frameworks, and responsible deploymentNanjundan, Preethi and Indu, PV and Thomas, LijoArtificial Intelligence Technologies for Engineering Applications2025
368panarese2025algorithmicAlgorithmic bias, fairness, and inclusivity: a multilevel framework for justice-oriented AIPanarese, Paola and Grasso, Marta Margherita and Solinas, ClaudiaAI \& SOCIETY2025
369mergen2025artificialArtificial intelligence and bias towards marginalised groups: Theoretical roots and challengesMergen, Aybike and {\c{CAI and Diversity in a Datafied World of Work: Will the Future of Work be Inclusive?2025
370kay2025imitationImitation, Identity, and Injustice in Artificial IntelligenceKay, Jackie2025
371koukaras2025aiAI-driven telecommunications for smart classrooms: Transforming education through personalized learning and secure networksKoukaras, Christos and Koukaras, Paraskevas and Ioannidis, Dimosthenis and Stavrinides, Stavros GTelecom2025
372sharma2025roleThe role of large language models in personalized learning: a systematic review of educational impactSharma, Sahil and Mittal, Puneet and Kumar, Mukesh and Bhardwaj, VivekDiscover Sustainability2025
373lau2025sizeSize Matters When Adopting and Scaling AILau, TheodoraBanking on (Artificial) Intelligence: Navigating the Realities of AI in Financial Services2025
374rahal2025useThe use of publicly available online texts in training AI: an ethical analysis of AI’s right to learnRahal, LouaiJournal of Information, Communication and Ethics in Society2025
375emery2025internationalInternational governance of advancing artificial intelligenceEmery-Xu, Nicholas and Jordan, Richard and Trager, RobertAI \& SOCIETY2025
376charkhian2025canHOW CAN AI EVALUATE AND IMPROVE INCLUSIVITY IN UNIVERSITY PORTALS, WITH A FOCUS ON CULTURAL, LINGUISTIC, AND ACCESSIBLE REQUIREMENTS?Charkhian, D and Moghaddami, BINTED2025 Proceedings2025
377davoodi2024equalEQUAL AI: A framework for enhancing equity, quality, understanding and accessibility in liberal arts through AI for multilingual learnersDavoodi, AminLanguage, Technology, and Social Media2024
378hyrynsalmi2025makingMaking Software Development More Diverse and Inclusive: Key Themes, Challenges, and Future DirectionsHyrynsalmi, Sonja M and Baltes, Sebastian and Brown, Chris and Prikladnicki, Rafael and Rodriguez-Perez, Gema and Serebrenik, Alexander and Simmonds, Jocelyn and Trinkenreich, Bianca and Wang, Yi and Liebel, GrischaACM Transactions on Software Engineering and Methodology2025
379alam2025ethicalEthical Challenges and Bias in AI-Driven Marketing: Educational Imperatives and Policy PerspectivesAlam, AshrafImpacts of AI-Generated Content on Brand Reputation2025
380neumann2025positionPosition is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs)Neumann, Anna and Kirsten, Elisabeth and Zafar, Muhammad Bilal and Singh, JatinderProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency2025
381ma2025breakingBreaking Down Bias: On The Limits of Generalizable Pruning StrategiesMa, Sibo and Salinas, Alejandro and Nyarko, Julian and Henderson, PeterProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency2025
382solano2025running" Who is running it?" Towards Equitable AI Deployment in Home Care WorkSolano-Kamaiko, Ian Ren{\'eProceedings of the 2025 CHI Conference on Human Factors in Computing Systems2025
383gabriel2025matterA matter of principle? AI alignment as the fair treatment of claimsGabriel, Iason and Keeling, GeoffPhilosophical Studies2025
384watson2025competingCompeting narratives in AI ethics: a defense of sociotechnical pragmatismWatson, David S and M{\"oai \& Society2025
385goldberg2025threatThreat Rigidity and the Role of Leadership and Organizational Change in Artificial Intelligence Adoption in Technology CompaniesGoldberg, Nicole Dillon2025
386van2025beyondBeyond efficiency: How artificial intelligence (AI) will reshape scientific inquiry and the publication processVan Quaquebeke, Niels and Tonidandel, Scott and Banks, George CThe Leadership Quarterly2025
387belliger2025newNew Perspectives on AI AlignmentBelliger, Andr{\'eEthics in the Age of AI: Navigating Politics and Security2025
388xue2025mmrcMmrc: A large-scale benchmark for understanding multimodal large language model in real-world conversationXue, Haochen and Tang, Feilong and Hu, Ming and Liu, Yexin and Huang, Qidong and Li, Yulong and Liu, Chengzhi and Xu, Zhongxing and Zhang, Chong and Feng, Chun-Mei and othersarXiv preprint arXiv:2502.119032025
389yang2025surveyA survey of ai agent protocolsYang, Yingxuan and Chai, Huacan and Song, Yuanyi and Qi, Siyuan and Wen, Muning and Li, Ning and Liao, Junwei and Hu, Haoyi and Lin, Jianghao and Chang, Gaowei and othersarXiv preprint arXiv:2504.167362025
390tian2025outlookAn outlook on the opportunities and challenges of multi-agent ai systemsTian, Fangqiao and Luo, An and Du, Jin and Xian, Xun and Specht, Robert and Wang, Ganghua and Bi, Xuan and Zhou, Jiawei and Srinivasa, Jayanth and Kundu, Ashish and othersarXiv preprint arXiv:2505.183972025
391karim2025aiAi agents meet blockchain: A survey on secure and scalable collaboration for multi-agentsKarim, Md Monjurul and Van, Dong Hoang and Khan, Sangeen and Qu, Qiang and Kholodov, YaroslavFuture Internet2025
392gawande2025reactiveFrom Reactive to Proactive: Real-Time Human-AI Collaboration in Intelligent Alerting SystemsGawande, Pramod DattaraoJournal of Computer Science and Technology Studies2025
393hughes2025aiAI agents and agentic systems: A multi-expert analysisHughes, Laurie and Dwivedi, Yogesh K and Malik, Tegwen and Shawosh, Mazen and Albashrawi, Mousa Ahmed and Jeon, Il and Dutot, Vincent and Appanderanda, Mandanna and Crick, Tom and De’, Rahul and othersJournal of Computer Information Systems2025
394ahrweiler2025inclusiveInclusive technology co-design for participatory AIAhrweiler, Petra and Sp{\"aParticipatory Artificial Intelligence in Public Social Services: From Bias to Fairness in Assessing Beneficiaries2025
395merchan2025trustTrust by Design: An Ethical Framework for Collaborative Intelligence Systems in Industry 5.0Merch{\'aElectronics2025
396watson2025personalizedPersonalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human ValuesWatson, Nell and Amer, Ahmed and Harris, Evan and Ravindra, Preeti and Zhang, ShujunarXiv preprint arXiv:2506.137742025
397kolt2025governingGoverning AI agentsKolt, NoamarXiv preprint arXiv:2501.079132025
398kraprayoon2025aiAi agent governance: A field guideKraprayoon, Jam and Williams, Zoe and Fayyaz, RidaarXiv preprint arXiv:2505.218082025
399cohen2025exploringExploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation DialoguesCohen, Myke C and Su, Zhe and Kao, Hsien-Te and Nguyen, Daniel and Lynch, Spencer and Sap, Maarten and Volkova, SvitlanaarXiv preprint arXiv:2506.159282025
400zhi2024beyondBeyond preferences in ai alignmentZhi-Xuan, Tan and Carroll, Micah and Franklin, Matija and Ashton, HalPhilosophical Studies2024
401chan2024visibilityVisibility into AI agentsChan, Alan and Ezell, Carson and Kaufmann, Max and Wei, Kevin and Hammond, Lewis and Bradley, Herbie and Bluemke, Emma and Rajkumar, Nitarshan and Krueger, David and Kolt, Noam and othersProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency2024
402raza_fair_2024{FAIRRaza, Shaina and Ghuge, Shardul and Ding, Chen and Pandya, DevalarXiv preprint arXiv:2401.110332024
403liu_agentbench_2023{AgentBenchLiu, Xiao and Yu, Hao and Zhang, Hanchen and Xu, Yifan and Lei, Xuanyu and Lai, Hanyu and Gu, Yu and Ding, Hangliang and Men, Kaiwen and Yang, Kejuan and Zhang, Shudan and Deng, Xiang and Zeng, Aohan and Du, Zhengxiao and Zhang, Chenhui and Shen, Sheng and Zhang, Tianjun and Su, Yu and Sun, Huan and Huang, Minlie and Dong, Yuxiao and Tang, JiearXiv2023DOI/URL
404touvron2023llama{LLaMA: Open and Efficient Foundation Language ModelsTouvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timothée and Rozi{\`{earXiv preprint arXiv:2302.139712023DOI/URL
405zhang_multitrust_2024{MultiTrustZhang, Yichi and Huang, Yao and Sun, Yitong and Liu, Chang and Zhao, Zhe and Fang, Zhengwei and Wang, Yifan and Chen, Huanran and Yang, Xiao and Wei, Xingxing and Su, Hang and Dong, Yinpeng and Zhu, JunarXiv2024DOI/URL

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published