Authors: Shaina Raza*, Ranjan Sapkota*, Manoj Karkee, Christos Emmanouilidis
Affiliations: Vector Institute; Cornell University; University of Groningen
Equal contribution: Shaina Raza and Ranjan Sapkota
If you use this work, please cite us (see Cite below).
What is R²A²?
Responsible Reasoning AI Agents (R²A²) are LLM-powered agents that perform multi-step reasoning with built-in safeguards — bias checks, privacy protection, audit logs, and robustness tests — applied at every reasoning step, not just the final output.
Why now? Before the 2024–2025 wave of reasoning models and agentic browsers can reach production in high-stakes domains, it needs trace-level evaluation (faithfulness, safety, privacy), continuous auditing, and human-in-the-loop oversight.
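As a rough illustration of what "safeguards applied at every reasoning step" could look like in code, the sketch below runs a list of checks on each step of a trace and appends the outcome to an audit log. Everything here is hypothetical and for illustration only: `AuditedTrace`, `privacy_check`, and `bias_check` are invented names, and the regex and keyword screens are toy stand-ins for real privacy and bias detectors, not methods taken from the survey.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical check signature: takes one reasoning step and returns a
# violation message, or None when the step looks clean.
Check = Callable[[str], Optional[str]]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def privacy_check(step: str) -> Optional[str]:
    """Flag steps containing something shaped like an email address."""
    return "possible PII (email)" if EMAIL.search(step) else None


def bias_check(step: str) -> Optional[str]:
    """Toy keyword screen; a real system would use a trained classifier."""
    flagged = {"always", "never"}  # illustrative overgeneralisation markers
    words = set(re.findall(r"[a-z]+", step.lower()))
    return "possible overgeneralisation" if words & flagged else None


@dataclass
class AuditedTrace:
    """Runs every check on every step and keeps an append-only audit log."""
    checks: List[Check]
    log: List[dict] = field(default_factory=list)

    def add_step(self, step: str) -> bool:
        """Check one step, record the result, and return True if it is clean."""
        outcomes = (check(step) for check in self.checks)
        violations = [msg for msg in outcomes if msg is not None]
        self.log.append(
            {"index": len(self.log), "step": step, "violations": violations}
        )
        return not violations


trace = AuditedTrace(checks=[privacy_check, bias_check])
steps = [
    "The user asks for a refund policy summary.",
    "Contact jane.doe@example.com for the account record.",
    "Summarise the policy in two sentences.",
]
results = [trace.add_step(s) for s in steps]
```

Because the checks run per step rather than on the final answer, the second step is flagged even though the final summary would contain no email address; a caller could halt, redact, or escalate to a human reviewer at that point.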
BibTeX:
@article{raza2025responsible,
author = {Shaina Raza and Ranjan Sapkota and Manoj Karkee and Christos Emmanouilidis},
title = {Responsible Agentic Reasoning and AI Agents: A Critical Survey},
journal = {TechRxiv},
year = {2025},
month = sep,
day = {08},
doi = {10.36227/techrxiv.175735299.97215847/v1},
note = {Preprint}
}
This repository is licensed under the MIT License (see LICENSE).
We thank contributors and readers who provide feedback and issue reports. PRs welcome!
Show full references table
# | Key | Title | Authors | Venue | Year | Link |
---|---|---|---|---|---|---|
1 | venerito2025reasoning | Reasoning large language models in rheumatology: a call for responsible action | Venerito, Vincenzo and Iannone, Florenzo and Gupta, Latika | The Lancet Rheumatology | 2025 | |
2 | nist2023airmf | Artificial Intelligence Risk Management Framework (AI RMF 1.0) | National Institute of Standards and Technology | | 2023 | DOI/URL |
3 | johnson2019billion | Billion-scale similarity search with GPUs | Johnson, Jeff and Douze, Matthijs and Jégou, Hervé | IEEE Transactions on Big Data | 2019 | |
4 | oecd2019ai | Recommendation of the Council on Artificial Intelligence | Organisation for Economic Co-operation and Development | | 2019 | DOI/URL |
5 | eu2024aiact | Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 on Artificial Intelligence and amending certain Union legislative acts (Artificial Intelligence Act) | European Parliament and Council of the European Union | | 2024 | DOI/URL |
6 | ieee2019ethics | Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems | IEEE | | 2019 | DOI/URL |
7 | chen2025reasoning | Reasoning Models Don’t Always Say What They Think | Chen, Yanda and Benton, Joe and Radhakrishnan, Ansh and Uesato, Jonathan and Denison, Carson and Schulman, John and Somani, Arushi and Hase, Peter and Wagner, Misha and Roger, Fabien and Mikulik, Vlad and Bowman, Samuel R. and Leike, Jan and Kaplan, Jared and Perez, Ethan and Alignment Science Team, Anthropic | arXiv preprint arXiv:2505.05410 | 2025 | DOI/URL |
8 | xu2025towards | Towards large reasoning models: A survey of reinforced reasoning with large language models | Xu, Fengli and Hao, Qianyue and Zong, Zefang and Wang, Jingwei and Zhang, Yunke and Wang, Jingyi and Lan, Xiaochong and Gong, Jiahui and Ouyang, Tianjian and Meng, Fanjin and others | arXiv preprint arXiv:2501.09686 | 2025 | |
9 | karpas2022mrkl | MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning | Karpas, Ehud and Abend, Omri and Belinkov, Yonatan and Lenz, Barak and Lieber, Opher and Ratner, Nir and Shoham, Yoav and Bata, Hofit and Levine, Yoav and Leyton-Brown, Kevin and others | arXiv preprint arXiv:2205.00445 | 2022 | |
10 | raza2024beads | Beads: Bias evaluation across domains | Raza, Shaina and Rahman, Mizanur and Zhang, Michael R | arXiv preprint arXiv:2406.04220 | 2024 | |
11 | xia2025evaluating | Evaluating mathematical reasoning beyond accuracy | Xia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, Pengfei | Proceedings of the AAAI Conference on Artificial Intelligence | 2025 | |
12 | raza2025humanibench | Humanibench: A human-centric framework for large multimodal models evaluation | Raza, Shaina and Narayanan, Aravind and Khazaie, Vahid Reza and Vayani, Ashmal and Chettiar, Mukund S and Singh, Amandeep and Shah, Mubarak and Pandya, Deval | arXiv preprint arXiv:2505.11454 | 2025 | |
13 | dafoe2018ai | AI governance: a research agenda | Dafoe, Allan | Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK | 2018 | |
14 | 10771762 | Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative Review | Raza, Shaina and Shaban-Nejad, Arash and Dolatabadi, Elham and Mamiya, Hiroshi | IEEE Access | 2024 | DOI/URL |
15 | putnam_axiom2024 | Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning | Aryan Gulati and Brando Miranda and Eric Chen and Emily Xia and Kai Fronsdal and Bruno de Moraes Dumont and Sanmi Koyejo | 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Workshop on MATH-AI | 2024 | DOI/URL |
16 | OperaBrowserOperator2025 | Meet Opera’s AI Browser Operator | Opera Software | | 2025 | |
17 | Comet2025 | Introducing Comet: Browse at the Speed of Thought | Perplexity Team | | 2025 | |
18 | Dia2025 | Dia Browser | Dia Browser | | 2025 | |
19 | OpenAIOperator2025 | OpenAI Operator | OpenAI | | 2025 | |
20 | sapkota2025multimodal | Multimodal large language models for image, text, and speech data augmentation: A survey | Sapkota, Ranjan and Raza, Shaina and Shoman, Maged and Paudel, Achyut and Karkee, Manoj | arXiv preprint arXiv:2501.18648 | 2025 | |
21 | ClaudeArtifacts2024 | Claude 3.5 Sonnet Launch & Artifacts Preview | Anthropic | | 2024 | |
22 | CowPilot2025 | CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | Faria Huq and Zora Zhiruo Wang and Frank F. Xu and Tianyue Ou and Shuyan Zhou and Jeffrey P. Bigham and Graham Neubig | arXiv preprint | 2025 | DOI/URL |
23 | SWEAgent2024 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | John Yang and Carlos E. Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir Press | arXiv preprint | 2024 | DOI/URL |
24 | MistralAgentsAPI2025 | Build AI Agents with the Mistral Agents API | Mistral AI | | 2025 | |
25 | chollet2019measure | On the measure of intelligence | Chollet, François | arXiv preprint arXiv:1911.01547 | 2019 | |
26 | chollet2025arc | Arc-agi-2: A new challenge for frontier ai reasoning systems | Chollet, Francois and Knoop, Mike and Kamradt, Gregory and Landers, Bryan and Pinkard, Henry | arXiv preprint arXiv:2505.11831 | 2025 | |
27 | yue2024mmmumassivemultidisciplinemultimodal | MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI | Xiang Yue and Yuansheng Ni and Kai Zhang and Tianyu Zheng and Ruoqi Liu and Ge Zhang and Samuel Stevens and Dongfu Jiang and Weiming Ren and Yuxuan Sun and Cong Wei and Botao Yu and Ruibin Yuan and Renliang Sun and Ming Yin and Boyuan Zheng and Zhenzhu Yang and Yibo Liu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen | | 2024 | DOI/URL |
28 | peiyuan_liu_2023 | MMLU Dataset | Peiyuan Liu | Kaggle | 2023 | DOI/URL |
29 | PerplexityComet2025 | Comet: The Browser That Thinks With You | Perplexity AI | | 2025 | |
30 | dominguez2024training | Training on the test task confounds evaluation and emergence | Dominguez-Olmedo, Ricardo and Dorner, Florian E and Hardt, Moritz | arXiv preprint arXiv:2407.07890 | 2024 | |
31 | OpenAIChatGPTAgent2025 | Introducing ChatGPT Agent: Bridging Research and Action | OpenAI | | 2025 | |
32 | lee2024vhelm | Vhelm: A holistic evaluation of vision language models | Lee, Tony and Tu, Haoqin and Wong, Chi Heem and Zheng, Wenhao and Zhou, Yiyang and Mai, Yifan and Roberts, Josselin and Yasunaga, Michihiro and Yao, Huaxiu and Xie, Cihang and others | Advances in Neural Information Processing Systems | 2024 | |
33 | pineau2020improving | Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program) | Joelle Pineau and Philippe Vincent-Lamarre and Koustuv Sinha and Vincent Larivière and Alina Beygelzimer and Florence d'Alché-Buc and Emily Fox and Hugo Larochelle | | 2020 | DOI/URL |
34 | AWSStrandsAgents2025 | Introducing Strands Agents, an Open Source AI Agents SDK | AWS Open Source | | 2025 | |
35 | he2024webvoyager | Webvoyager: Building an end-to-end web agent with large multimodal models | He, Hongliang and Yao, Wenlin and Ma, Kaixin and Yu, Wenhao and Dai, Yong and Zhang, Hongming and Lan, Zhenzhong and Yu, Dong | arXiv preprint arXiv:2401.13919 | 2024 | |
36 | GeminiMariner2024 | Introducing Gemini 2.0: Our New AI Model for the Agentic Era | Google DeepMind | | 2024 | |
37 | zhang2024litewebagent | LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | Danqing Zhang and Balaji Rama and Shiying He and Jingyi Ni | Zenodo | 2024 | DOI/URL |
38 | GoogleMariner2025 | Project Mariner | Google DeepMind | | 2025 | |
39 | yang2025agenticwebweavingweb | Agentic Web: Weaving the Next Web with AI Agents | Yingxuan Yang and Mulei Ma and Yuxuan Huang and Huacan Chai and Chenyu Gong and Haoran Geng and Yuanjian Zhou and Ying Wen and Meng Fang and Muhao Chen and Shangding Gu and Ming Jin and Costas Spanos and Yang Yang and Pieter Abbeel and Dawn Song and Weinan Zhang and Jun Wang | | 2025 | DOI/URL |
40 | Fellou2025 | Fellou: Agentic Web Browser | Fellou AI | | 2025 | |
41 | OperaNeon2025 | Opera Neon | Opera Software AS | | 2025 | |
42 | CopilotAgent2025 | GitHub Copilot | GitHub Copilot | | 2025 | |
43 | AmazonQDeveloper2025 | Amazon Q Developer Elevates the IDE Experience with New Agentic Coding Experience | Elizabeth Fuentes | | 2025 | |
44 | AutoGen04_2025 | AutoGen v0.4: Reimagining the Foundation of Agentic AI for Scale, Extensibility, and Robustness | Adam Fourney and Ahmed Awadallah and Cheng Tan and Erkang Zhu and Friederike Niedtner and Gagan Bansal et al. | | 2025 | |
45 | zhou2024webarenarealisticwebenvironment | WebArena: A Realistic Web Environment for Building Autonomous Agents | Shuyan Zhou and Frank F. Xu and Hao Zhu and Xuhui Zhou and Robert Lo and Abishek Sridhar and Xianyi Cheng and Tianyue Ou and Yonatan Bisk and Daniel Fried and Uri Alon and Graham Neubig | | 2024 | DOI/URL |
46 | huang2023benchmarking | Benchmarking large language models as ai research agents | Huang, Qian and Vora, Jian and Liang, Percy and Leskovec, Jure | NeurIPS 2023 Foundation Models for Decision Making Workshop | 2023 | |
47 | huang2023mlagentbench | Mlagentbench: Evaluating language agents on machine learning experimentation | Huang, Qian and Vora, Jian and Liang, Percy and Leskovec, Jure | arXiv preprint arXiv:2310.03302 | 2023 | |
48 | martinez2025dissecting | Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM-and Agent-Based Repair Systems | Martinez, Matias and Franch, Xavier | arXiv preprint arXiv:2506.17208 | 2025 | |
49 | wang2024mobileagentv2mobiledeviceoperation | Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | Junyang Wang and Haiyang Xu and Haitao Jia and Xi Zhang and Ming Yan and Weizhou Shen and Ji Zhang and Fei Huang and Jitao Sang | | 2024 | DOI/URL |
50 | chen2025spabench | SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation | Jingxuan Chen and Derek Yuen and Bin Xie and Yuhao Yang and Gongwei Chen and Zhihao Wu and Li Yixing and Xurui Zhou and Weiwen Liu and Shuai Wang and Kaiwen Zhou and Rui Shao and Liqiang Nie and Yasheng Wang and Jianye HAO and Jun Wang and Kun Shao | The Thirteenth International Conference on Learning Representations | 2025 | |
51 | chollet2024arc | Arc prize 2024: Technical report | Chollet, Francois and Knoop, Mike and Kamradt, Gregory and Landers, Bryan | arXiv preprint arXiv:2412.04604 | 2024 | |
52 | chang2024agentboard | Agentboard: An analytical evaluation board of multi-turn llm agents | Chang, Ma and Zhang, Junlei and Zhu, Zhihao and Yang, Cheng and Yang, Yujiu and Jin, Yaohui and Lan, Zhenzhong and Kong, Lingpeng and He, Junxian | Advances in neural information processing systems | 2024 | |
53 | talmor2018commonsenseqa | Commonsenseqa: A question answering challenge targeting commonsense knowledge | Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan | arXiv preprint arXiv:1811.00937 | 2018 | |
54 | casper2025aiagentindex | The AI Agent Index | Stephen Casper and Luke Bailey and Rosco Hunter and Carson Ezell and Emma Cabalé and Michael Gerovitch and Stewart Slocum and Kevin Wei and Nikola Jurkovic and Ariba Khan and Phillip J. K. Christoffersen and A. Pinar Ozisik and Rakshit Trivedi and Dylan Hadfield-Menell and Noam Kolt | | 2025 | DOI/URL |
55 | srivastava2023beyond | Beyond the imitation game: Quantifying and extrapolating the capabilities of language models | Srivastava, Aarohi and Rastogi, Abhinav and Rao, Abhishek and Shoeb, Abu Awal and Abid, Abubakar and Fisch, Adam and Brown, Adam R and Santoro, Adam and Gupta, Aditya and Garriga-Alonso, Adri and others | Transactions on machine learning research | 2023 | |
56 | zaharia2018accelerating | Accelerating the machine learning lifecycle with MLflow. | Zaharia, Matei and Chen, Andrew and Davidson, Aaron and Ghodsi, Ali and Hong, Sue Ann and Konwinski, Andy and Murching, Siddharth and Nykodym, Tomas and Ogilvie, Paul and Parkhe, Mani and others | IEEE Data Eng. Bull. | 2018 | |
57 | merkel2014docker | Docker: lightweight linux containers for consistent development and deployment | Merkel, Dirk and others | Linux Journal | 2014 | |
58 | borenstein2021introduction | Introduction to meta-analysis | Borenstein, Michael and Hedges, Larry V and Higgins, Julian PT and Rothstein, Hannah R | John Wiley & Sons | 2021 | |
59 | W3C2013PROVOverview | PROV-Overview: An Overview of the PROV Family of Documents | W3C Provenance Working Group | | 2013 | |
60 | gebru2021datasheets | Datasheets for datasets | Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and Daumé III, Hal and Crawford, Kate | Communications of the ACM | 2021 | |
61 | imo | Official Website | International Mathematical Olympiad | | n.d. | |
62 | livecodebench_datasets | LiveCodeBench datasets - code_generation_lite, execution-v2, test_generation, … | LiveCodeBench | | n.d. | |
63 | park2023generative | Generative agents: Interactive simulacra of human behavior | Park, Joon Sung and O'Brien, Joseph and Cai, Carrie Jun and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S | Proceedings of the 36th annual acm symposium on user interface software and technology | 2023 | |
64 | hong2023metagpt | MetaGPT: Meta programming for a multi-agent collaborative framework | Hong, Sirui and Zhuge, Mingchen and Chen, Jonathan and Zheng, Xiawu and Cheng, Yuheng and Wang, Jinlin and Zhang, Ceyao and Wang, Zili and Yau, Steven Ka Shing and Lin, Zijuan and others | The Twelfth International Conference on Learning Representations | 2023 | |
65 | wu2023visual | Visual chatgpt: Talking, drawing and editing with visual foundation models | Wu, Chenfei and Yin, Shengming and Qi, Weizhen and Wang, Xiaodong and Tang, Zecheng and Duan, Nan | arXiv preprint arXiv:2303.04671 | 2023 | |
66 | li2023camel | Camel: Communicative agents for "mind" exploration of large language model society | Li, Guohao and Hammoud, Hasan and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard | Advances in Neural Information Processing Systems | 2023 | |
67 | wang2023voyager | Voyager: An open-ended embodied agent with large language models | Wang, Guanzhi and Xie, Yuqi and Jiang, Yunfan and Mandlekar, Ajay and Xiao, Chaowei and Zhu, Yuke and Fan, Linxi and Anandkumar, Anima | arXiv preprint arXiv:2305.16291 | 2023 | |
68 | madaan2023self | Self-refine: Iterative refinement with self-feedback | Madaan, Aman and Tandon, Niket and Gupta, Prakhar and Hallinan, Skyler and Gao, Luyu and Wiegreffe, Sarah and Alon, Uri and Dziri, Nouha and Prabhumoye, Shrimai and Yang, Yiming and others | Advances in Neural Information Processing Systems | 2023 | |
69 | li2023api | Api-bank: A comprehensive benchmark for tool-augmented llms | Li, Minghao and Zhao, Yingxiu and Yu, Bowen and Song, Feifan and Li, Hangyu and Yu, Haiyang and Li, Zhoujun and Huang, Fei and Li, Yongbin | arXiv preprint arXiv:2304.08244 | 2023 | |
70 | patil2024gorilla | Gorilla: Large language model connected with massive apis | Patil, Shishir G and Zhang, Tianjun and Wang, Xin and Gonzalez, Joseph E | Advances in Neural Information Processing Systems | 2024 | |
71 | suris2023vipergpt | Vipergpt: Visual inference via python execution for reasoning | Surís, Dídac and Menon, Sachit and Vondrick, Carl | Proceedings of the IEEE/CVF international conference on computer vision | 2023 | |
72 | ahn2022can | Do as i can, not as i say: Grounding language in robotic affordances | Ahn, Michael and Brohan, Anthony and Brown, Noah and Chebotar, Yevgen and Cortes, Omar and David, Byron and Finn, Chelsea and Fu, Chuyuan and Gopalakrishnan, Keerthana and Hausman, Karol and others | arXiv preprint arXiv:2204.01691 | 2022 | |
73 | shinn2023reflexion | Reflexion: Language agents with verbal reinforcement learning | Shinn, Noah and Cassano, Federico and Gopinath, Ashwin and Narasimhan, Karthik and Yao, Shunyu | Advances in Neural Information Processing Systems | 2023 | |
74 | shen2023hugginggptsolvingaitasks | HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | Yongliang Shen and Kaitao Song and Xu Tan and Dongsheng Li and Weiming Lu and Yueting Zhuang | | 2023 | DOI/URL |
75 | jimenez2023swe | Swe-bench: Can language models resolve real-world github issues? | Jimenez, Carlos E and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik | arXiv preprint arXiv:2310.06770 | 2023 | |
76 | chen2021evaluating | Evaluating large language models trained on code | Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde De Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and others | arXiv preprint arXiv:2107.03374 | 2021 | |
77 | hendrycks2020measuring | Measuring massive multitask language understanding | Hendrycks, Dan and Burns, Collin and Basart, Steven and Zou, Andy and Mazeika, Mantas and Song, Dawn and Steinhardt, Jacob | arXiv preprint arXiv:2009.03300 | 2020 | |
78 | chollet2024abstraction | Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) | Chollet, François | | 2024 | |
79 | codeforces | Competitive Programming Platform | Codeforces | | n.d. | |
80 | rein2024gpqa | GPQA: A graduate-level Google-proof Q&A benchmark | Rein, David and Hou, Betty Li and Stickland, Asa Cooper and Petty, Jackson and Pang, Richard Yuanzhe and Dirani, Julien and Michael, Julian and Bowman, Samuel R | First Conference on Language Modeling | 2024 | |
81 | hendrycks2021measuring | Measuring mathematical problem solving with the math dataset | Hendrycks, Dan and Burns, Collin and Kadavath, Saurav and Arora, Akul and Basart, Steven and Tang, Eric and Song, Dawn and Steinhardt, Jacob | arXiv preprint arXiv:2103.03874 | 2021 | |
82 | maa_aime | American Invitational Mathematics Examination (AIME) | Mathematical Association of America | | | |
83 | cobbe2021training | Training verifiers to solve math word problems | Cobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and others | arXiv preprint arXiv:2110.14168 | 2021 | |
84 | shao2024deepseekmath | Deepseekmath: Pushing the limits of mathematical reasoning in open language models | Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Yang and others | arXiv preprint arXiv:2402.03300 | 2024 | |
85 | he2025skywork | Skywork open reasoner 1 technical report | He, Jujie and Liu, Jiacai and Liu, Chris Yuhao and Yan, Rui and Wang, Chaojie and Cheng, Peng and Zhang, Xiaoyu and Zhang, Fuxiang and Xu, Jiacheng and Shen, Wei and others | arXiv preprint arXiv:2505.22312 | 2025 | |
86 | bai2023qwen | Qwen technical report | Bai, Jinze and Bai, Shuai and Chu, Yunfei and Cui, Zeyu and Dang, Kai and Deng, Xiaodong and Fan, Yang and Ge, Wenbin and Han, Yu and Huang, Fei and others | arXiv preprint arXiv:2309.16609 | 2023 | |
87 | LuongLockhart2025GeminiIMO | Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad | Thang Luong and Edward Lockhart | | 2025 | |
88 | comanici2025gemini | Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities | Comanici, Gheorghe and Bieber, Eric and Schaekermann, Mike and Pasupat, Ice and Sachdeva, Noveen and Dhillon, Inderjit and Blistein, Marcel and Ram, Ori and Zhang, Dan and Rosen, Evan and others | arXiv preprint arXiv:2507.06261 | 2025 | |
89 | zhang2024naturalcodebench | Naturalcodebench: Examining coding performance mismatch on humaneval and natural user queries | Zhang, Shudan and Zhao, Hanlin and Liu, Xiao and Zheng, Qinkai and Qi, Zehan and Gu, Xiaotao and Dong, Yuxiao and Tang, Jie | Findings of the Association for Computational Linguistics ACL 2024 | 2024 | |
90 | bai2022constitutionalaiharmlessnessai | Constitutional AI: Harmlessness from AI Feedback | Yuntao Bai and Saurav Kadavath et al. | | 2022 | DOI/URL |
91 | liang2023holistic | Holistic Evaluation of Language Models | Percy Liang and Rishi Bommasani et al. | Transactions on Machine Learning Research | 2023 | DOI/URL |
92 | borghoff2025human | Human-artificial interaction in the age of agentic AI: a system-theoretical approach | Borghoff, Uwe M and Bottoni, Paolo and Pareschi, Remo | Frontiers in Human Dynamics | 2025 | |
93 | team2024gemma | Gemma: Open models based on gemini research and technology | Team, Gemma and Mesnard, Thomas and Hardin, Cassidy and Dadashi, Robert and Bhupatiraju, Surya and Pathak, Shreya and Sifre, Laurent and Rivière, Morgane and others | arXiv preprint arXiv:2403.08295 | 2024 | |
94 | GoogleGeminiModels2025 | Gemini Models | | | 2025 | |
95 | anil2023palm | Palm 2 technical report | Anil, Rohan and Dai, Andrew M and Firat, Orhan and Johnson, Melvin and Lepikhin, Dmitry and Passos, Alexandre and Shakeri, Siamak and Taropa, Emanuel and Bailey, Paige and Chen, Zhifeng and others | arXiv preprint arXiv:2305.10403 | 2023 | |
96 | OpenAI2025 | OpenAI | | | 2015–2025 | |
97 | yehudai2025survey | Survey on evaluation of llm-based agents | Yehudai, Asaf and Eden, Lilach and Li, Alan and Uziel, Guy and Zhao, Yilun and Bar-Haim, Roy and Cohan, Arman and Shmueli-Scheuer, Michal | arXiv preprint arXiv:2503.16416 | 2025 | |
98 | wang2025survey | A survey on responsible llms: Inherent risk, malicious use, and mitigation strategy | Wang, Huandong and Fu, Wenjie and Tang, Yingzhou and Chen, Zhilong and Huang, Yuxi and Piao, Jinghua and Gao, Chen and Xu, Fengli and Jiang, Tao and Li, Yong | arXiv preprint arXiv:2501.09431 | 2025 | |
99 | chu2024fairness | Fairness in large language models: A taxonomic survey | Chu, Zhibo and Wang, Zichong and Zhang, Wenbin | ACM SIGKDD explorations newsletter | 2024 | |
100 | plaat2024reasoning | Reasoning with large language models, a survey | Plaat, Aske and Wong, Annie and Verberne, Suzan and Broekens, Joost and van Stein, Niki and Back, Thomas | arXiv preprint arXiv:2407.11511 | 2024 | |
101 | plaat2025agentic | Agentic large language models, a survey | Plaat, Aske and van Duijn, Max and van Stein, Niki and Preuss, Mike and van der Putten, Peter and Batenburg, Kees Joost | arXiv preprint arXiv:2503.23037 | 2025 | |
102 | rao1995bdi | BDI agents: From theory to practice. | Rao, Anand S and Georgeff, Michael P and others | ICMAS | 1995 | |
103 | chu2023navigate | Navigate through enigmatic labyrinth a survey of chain of thought reasoning: Advances, frontiers and future | Chu, Zheng and Chen, Jingchang and Chen, Qianglong and Yu, Weijiang and He, Tao and Wang, Haotian and Peng, Weihua and Liu, Ming and Qin, Bing and Liu, Ting | arXiv preprint arXiv:2309.15402 | 2023 | |
104 | Wooldridge_Jennings_1995 | Intelligent agents: theory and practice | Wooldridge, Michael and Jennings, Nicholas R. | The Knowledge Engineering Review | 1995 | DOI/URL |
105 | Wang_2024 | A survey on large language model based autonomous agents | Wang, Lei and Ma, Chen and Feng, Xueyang and Zhang, Zeyu and Yang, Hao and Zhang, Jingsen and Chen, Zhiyuan and Tang, Jiakai and Chen, Xu and Lin, Yankai and Zhao, Wayne Xin and Wei, Zhewei and Wen, Jirong | Frontiers of Computer Science | 2024 | DOI/URL |
106 | raza2025responsible | Who is Responsible? The Data, Models, Users or Regulations? A Comprehensive Survey on Responsible Generative AI for a Sustainable Future | Raza, Shaina and Qureshi, Rizwan and Zahid, Anam and Fioresi, Joseph and Sadak, Ferhat and Saeed, Muhammad and Sapkota, Ranjan and Jain, Aditya and Zafar, Anas and Hassan, Muneeb Ul and others | arXiv preprint arXiv:2502.08650 | 2025 | |
107 | sapkota2025ai | Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenge | Sapkota, Ranjan and Roumeliotis, Konstantinos I and Karkee, Manoj | arXiv preprint arXiv:2505.10468 | 2025 | |
108 | raza2025fairsense | FairSense-AI: Responsible AI Meets Sustainability | Raza, Shaina and Chettiar, Mukund Sayeeganesh and Yousefabadi, Matin and Khan, Tahniat and Lotif, Marcelo | arXiv preprint arXiv:2503.02865 | 2025 | |
109 | song2024audit | Audit-llm: Multi-agent collaboration for log-based insider threat detection | Song, Chengyu and Ma, Linru and Zheng, Jianming and Liao, Jinzhi and Kuang, Hongyu and Yang, Lin | arXiv preprint arXiv:2408.08902 | 2024 | |
110 | green2025leaky | Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers | Green, Tommaso and Gubri, Martin and Puerto, Haritz and Yun, Sangdoo and Oh, Seong Joon | arXiv preprint arXiv:2506.15674 | 2025 | |
111 | yao2023reactsynergizingreasoningacting | ReAct: Synergizing Reasoning and Acting in Language Models | Shunyu Yao and Jeffrey Zhao and Dian Yu and Nan Du and Izhak Shafran and Karthik Narasimhan and Yuan Cao | | 2023 | DOI/URL |
112 | openai2019gpt2 | Better Language Models and Their Implications | OpenAI | | 2019 | |
113 | google2024gemini15pro | Get more done with Gemini: Try 1.5 Pro and more intelligent features | | | 2024 | |
114 | meta2025llama4 | The Llama 4 herd: The beginning of a new era of natively multimodal intelligence | Meta AI | | 2025 | |
115 | xai2025grok3 | Grok 3 Beta - The Age of Reasoning Agents | xAI | | 2025 | |
116 | ibm2025granite33 | IBM Granite 3.3: Speech recognition, refined reasoning, and RAG LoRAs | IBM | | 2025 | |
117 | ibm2025granitedocs | Granite 3.3 Models - Documentation | IBM | | 2025 | |
118 | baidu2025ernie45blog | Announcing the Open Source Release of the ERNIE 4.5 Model Family | ERNIE Team | | 2025 | |
119 | baidu2025ernie45report | ERNIE 4.5 Technical Report | ERNIE Team | | 2025 | DOI/URL |
120 | pan2024webcanvasbenchmarkingwebagents | WebCanvas: Benchmarking Web Agents in Online Environments | Yichen Pan and Dehan Kong and Sida Zhou and Cheng Cui and Yifei Leng and Bing Jiang and Hangyu Liu and Yanyi Shang and Shuyan Zhou and Tongshuang Wu and Zhengyang Wu | | 2024 | DOI/URL |
121 | yoran2024assistantbench | Assistantbench: Can web agents solve realistic and time-consuming tasks? | Yoran, Ori and Amouyal, Samuel Joseph and Malaviya, Chaitanya and Bogin, Ben and Press, Ofir and Berant, Jonathan | arXiv preprint arXiv:2407.15711 | 2024 | |
122 | BEARCUBS2025 | BEARCUBS: A benchmark for computer-using web agents | Song, Yixiao and Thai, Katherine and Pham, Chau Minh and Chang, Yapei and Nadaf, Mazin and Iyyer, Mohit | arXiv:2503.07919 | 2025 | DOI/URL |
123 | mistral2025medium3 | Medium is the new large. (Mistral Medium 3) | Mistral AI | | 2025 | |
124 | mistral2025magistral | Magistral: Reasoning Model Family | Mistral AI | | 2025 | |
125 | anthropic2024claude3 | Introducing the next generation of Claude (Claude 3 family) | Anthropic | | 2024 | |
126 | meta2024llama31 | Introducing Llama 3.1: Our most capable models to date | Meta AI | | 2024 | |
127 | openai2025o3o4mini | Introducing OpenAI o3 and o4-mini | OpenAI | | 2025 | |
128 | brown2020language | Language models are few-shot learners | Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others | Advances in neural information processing systems | 2020 | |
129 | chen2021codex | Evaluating Large Language Models Trained on Code | Mark Chen and Jerry Tworek and Heewoo Jun et al. | arXiv preprint arXiv:2107.03374 | 2021 | DOI/URL |
130 | raza2025vldbench | VLDBench Evaluating Multimodal Disinformation with Regulatory Alignment | Raza, Shaina and Vayani, Ashmal and Jain, Aditya and Narayanan, Aravind and Khazaie, Vahid Reza and Bashir, Syed Raza and Dolatabadi, Elham and Uddin, Gias and Emmanouilidis, Christos and Qureshi, Rizwan and others | arXiv preprint arXiv:2502.11361 | 2025 | |
131 | ouyang2022training | Training language models to follow instructions with human feedback | Ouyang, Long and Wu, Jeffrey and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others | Advances in neural information processing systems | 2022 | |
132 | wang2024rethinking | Rethinking the bounds of llm reasoning: Are multi-agent discussions the key? | Wang, Qineng and Wang, Zihao and Su, Ying and Tong, Hanghang and Song, Yangqiu | arXiv preprint arXiv:2402.18272 | 2024 | |
133 | yao2023tree | Tree of thoughts: Deliberate problem solving with large language models | Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Tom and Cao, Yuan and Narasimhan, Karthik | Advances in neural information processing systems | 2023 | |
134 | wang2023plan | Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models | Wang, Lei and Xu, Wanyu and Lan, Yihuai and Hu, Zhiqiang and Lan, Yunshi and Lee, Roy Ka-Wei and Lim, Ee-Peng | arXiv preprint arXiv:2305.04091 | 2023 | |
135 | ejjami2024ethical | Ethical artificial intelligence framework theory (EAIFT): a new paradigm for embedding ethical reasoning in AI systems | Ejjami, Rachid | Int J Multidiscip Res | 2024 | |
136 | schick2023toolformer | Toolformer: Language models can teach themselves to use tools | Schick, Timo and Dwivedi-Yu, Jane and Dessì, Roberto and Raileanu, Roberta and Lomeli, Maria and Hambro, Eric and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas | Advances in Neural Information Processing Systems | 2023 | |
137 | raza2025trismagenticaireview | TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems | Shaina Raza and Ranjan Sapkota and Manoj Karkee and Christos Emmanouilidis | | 2025 | DOI/URL |
138 | zhang2025litewebagentopensourcesuitevlmbased | LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | Danqing Zhang and Balaji Rama and Jingyi Ni and Shiying He and Fu Zhao and Kunyu Chen and Arnold Chen and Junyu Cao | | 2025 | DOI/URL |
139 | SAPKOTA2026103575 | Object detection with multimodal large vision-language models: An in-depth review | Ranjan Sapkota and Manoj Karkee | Information Fusion | 2026 | DOI/URL |
140 | Huq_2025 | CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | Huq, Faria and Wang, Zora Zhiruo and Xu, Frank F. and Ou, Tianyue and Zhou, Shuyan and Bigham, Jeffrey P. and Neubig, Graham | Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations) | 2025 | DOI/URL |
141 | dunnell2024bioticbrowserapplyingstreamingllm | Biotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-Pilot | Kevin F. Dunnell and Andrew P. Stoddard | | 2024 | DOI/URL |
142 | desai2025responsibleaiagents | Responsible AI Agents | Deven R. Desai and Mark O. Riedl | | 2025 | DOI/URL |
143 | wu2025llm | LLM fine-tuning: Concepts, opportunities, and challenges | Wu, Xiao-Kun and Chen, Min and Li, Wanyi and Wang, Rui and Lu, Limeng and Liu, Jia and Hwang, Kai and Hao, Yixue and Pan, Yanru and Meng, Qingguo and others | Big Data and Cognitive Computing | 2025 | |
144 | jin2024impact | The impact of reasoning step length on large language models | Jin, Mingyu and Yu, Qinkai and Shu, Dong and Zhao, Haiyan and Hua, Wenyue and Meng, Yanda and Zhang, Yongfeng and Du, Mengnan | arXiv preprint arXiv:2401.04925 | 2024 | |
145 | patil2025advancing | Advancing reasoning in large language models: Promising methods and approaches | Patil, Avinash and Jadon, Aryan | arXiv preprint arXiv:2502.03671 | 2025 | |
146 | bonagiri2025towards | Towards Trustworthy AI: Frameworks for Evaluating Consistency in Language Models | Bonagiri, Vamshi Krishna | | 2025 | |
147 | liang2025ai | AI Reasoning in Deep Learning Era: From Symbolic AI to Neural-Symbolic AI | Liang, Baoyu and Wang, Yuchen and Tong, Chao | Mathematics | 2025 | |
148 | al2025building | Building Trustworthy AI: Transparent AI Systems via Language Models, Ontologies, and Logical Reasoning | Al Machot, Fadi and Horsch, Martin Thomas and Ullah, Habib | Designing the Conceptual Landscape for a XAIR Validation Infrastructure: Proceedings of the International Workshop on Designing the Conceptual Landscape for a XAIR Validation Infrastructure, DCLXVI 2024, Kaiserslautern, Germany | 2025 | |
149 | wu2024usable | Usable XAI: 10 strategies towards exploiting explainability in the LLM era | Wu, Xuansheng and Zhao, Haiyan and Zhu, Yaochen and Shi, Yucheng and Yang, Fan and Hu, Lijie and Liu, Tianming and Zhai, Xiaoming and Yao, Wenlin and Li, Jundong and others | arXiv preprint arXiv:2403.08946 | 2024 | |
150 | wu2025does | Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | Wu, Xuyang and Nian, Jinming and Wei, Ting-Ruen and Tao, Zhiqiang and Wu, Hsin-Tai and Fang, Yi | arXiv preprint arXiv:2502.15361 | 2025 | |
151 | fan2025biasguard | BiasGuard: A reasoning-enhanced bias detection tool for large language models | Fan, Zhiting and Chen, Ruizhe and Liu, Zuozhu | arXiv preprint arXiv:2504.21299 | 2025 | |
152 | zhang2025collaborative | Collaborative LLM Numerical Reasoning with Local Data Protection | Zhang, Min and Lu, Yuzhe and Zhou, Yun and Xu, Panpan and Cheong, Lin Lee and Lu, Chang-Tien and Wang, Haozhu | arXiv preprint arXiv:2504.00299 | 2025 | |
153 | tavasoli2025responsible | Responsible innovation: A strategic framework for financial LLM integration | Tavasoli, Ahmadreza and Sharbaf, Maedeh and Madani, Seyed Mohamad | arXiv preprint arXiv:2504.02165 | 2025 | |
154 | ferdaus2024towards | Towards trustworthy AI: A review of ethical and robust large language models | Ferdaus, Md Meftahul and Abdelguerfi, Mahdi and Ioup, Elias and Niles, Kendall N and Pathak, Ken and Sloan, Steven | arXiv preprint arXiv:2407.13934 | 2024 | |
155 | chen2024trustworthy | Trustworthy, responsible, and safe AI: A comprehensive architectural framework for AI safety with challenges and mitigations | Chen, Chen and Gong, Xueluan and Liu, Ziyao and Jiang, Weifeng and Goh, Si Qi and Lam, Kwok-Yan | arXiv preprint arXiv:2408.12935 | 2024 | |
156 | shi2024large | Large language model safety: A holistic survey | Shi, Dan and Shen, Tianhao and Huang, Yufei and Li, Zhigen and Leng, Yongqi and Jin, Renren and Liu, Chuang and Wu, Xinwei and Guo, Zishan and Yu, Linhao and others | arXiv preprint arXiv:2412.17686 | 2024 | |
157 | zheng2025beyond | Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models | Zheng, Baihui and Zheng, Boren and Cao, Kerui and Tan, Yingshui and Liu, Zhendong and Wang, Weixun and Liu, Jiaheng and Yang, Jian and Su, Wenbo and Zhu, Xiaoyong and others | arXiv preprint arXiv:2505.19690 | 2025 | |
158 | goh2024large | Large language model influence on diagnostic reasoning: a randomized clinical trial | Goh, Ethan and Gallo, Robert and Hom, Jason and Strong, Eric and Weng, Yingjie and Kerman, Hannah and Cool, Joséphine A and others | JAMA Network Open | 2024 | |
159 | lucas2024reasoning | Reasoning with large language models for medical question answering | Lucas, Mary M and Yang, Justin and Pomeroy, Jon K and Yang, Christopher C | Journal of the American Medical Informatics Association | 2024 | |
160 | guha2023legalbench | LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models | Guha, Neel and Nyarko, Julian and Ho, Daniel and Ré, Christopher and others | Advances in neural information processing systems | 2023 | |
161 | shu2024lawllm | LawLLM: Law large language model for the US legal system | Shu, Dong and Zhao, Haoran and Liu, Xukun and Demeter, David and Du, Mengnan and Zhang, Yongfeng | Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | 2024 | |
162 | liu2025fin | Fin-R1: A large language model for financial reasoning through reinforcement learning | Liu, Zhaowei and Guo, Xin and Lou, Fangqi and Zeng, Lingfeng and Niu, Jinyi and Wang, Zixuan and Xu, Jiajie and Cai, Weige and Yang, Ziwei and Zhao, Xueqian and others | arXiv preprint arXiv:2503.16252 | 2025 | |
163 | son2023beyond | Beyond classification: Financial reasoning in state-of-the-art language models | Son, Guijin and Jung, Hanearl and Hahm, Moonjeong and Na, Keonju and Jin, Sol | arXiv preprint arXiv:2305.01505 | 2023 | |
164 | yuan2024finllms | FinLLMs: A framework for financial reasoning dataset generation with large language models | Yuan, Ziqiang and Wang, Kaiyuan and Zhu, Shoutai and Yuan, Ye and Zhou, Jingya and Zhu, Yanlin and Wei, Wenqi | IEEE Transactions on Big Data | 2024 | |
165 | beltagy2019scibert | SciBERT: A pretrained language model for scientific text | Beltagy, Iz and Lo, Kyle and Cohan, Arman | arXiv preprint arXiv:1903.10676 | 2019 | |
166 | taylor2022galactica | Galactica: A large language model for science | Taylor, Ross and Kardas, Marcin and Cucurull, Guillem and Scialom, Thomas and Hartshorn, Anthony and Saravia, Elvis and Poulton, Andrew and Kerkez, Viktor and Stojnic, Robert | arXiv preprint arXiv:2211.09085 | 2022 | |
167 | raza2025developing | Developing safe and responsible large language model: can we balance bias reduction and language understanding? | Raza, Shaina and Bamgbose, Oluwanifemi and Ghuge, Shardul and Tavakoli, Fatemeh and Reji, Deepak John and Bashir, Syed Raza | Machine Learning | 2025 | |
168 | besta2025reasoning | Reasoning language models: A blueprint | Besta, Maciej and Barth, Julia and Schreiber, Eric and Kubicek, Ales and Catarino, Afonso and Gerstenberger, Robert and Nyczyk, Piotr and Iff, Patrick and Li, Yueling and Houliston, Sam and others | arXiv preprint arXiv:2501.11223 | 2025 | |
169 | lomonaco2019continual | Continual learning with deep architectures | Lomonaco, Vincenzo | alma | 2019 | |
170 | hitzler2022neuro | Neuro-symbolic artificial intelligence: The state of the art | Hitzler, Pascal and Sarker, Md Kamruzzaman | IOS Press | 2022 | |
171 | FERPA1974 | Family Educational Rights and Privacy Act of 1974 (FERPA) | U.S. Congress | | 1974 | DOI/URL |
172 | iso42001 | ISO/IEC 42001:2023 – Artificial Intelligence Management System (AI MS) – Requirements | International Organization for Standardization | | 2023 | |
173 | MiFIDII2014 | Directive 2014/65/EU | European Parliament and Council of the European Union | | 2014 | DOI/URL |
174 | hipaa164 | HIPAA Privacy Rule – 45 CFR Part 164: Security and Privacy Protections for Health Information | U.S. Department of Health and Human Services | | 2003 | |
175 | gdpr25 | General Data Protection Regulation (GDPR) – Article 25: Data protection by design and by default | European Union | | 2016 | |
176 | slattery2024ai | The AI risk repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligence | Slattery, Peter and Saeri, Alexander K and Grundy, Emily AC and Graham, Jess and Noetel, Michael and Uuk, Risto and Dao, James and Pour, Soroush and Casper, Stephen and Thompson, Neil | arXiv preprint arXiv:2408.12622 | 2024 | |
177 | sakib2024risks | Risks, causes, and mitigations of widespread deployments of large language models (LLMs): A survey | Sakib, Md Nazmus and Islam, Md Athikul and Pathak, Royal and Arifin, Md Mashrur | 2024 2nd International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings) | 2024 | |
178 | zhao2024explainability | Explainability for large language models: A survey | Zhao, Haiyan and Chen, Hanjie and Yang, Fan and Liu, Ninghao and Deng, Huiqi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Du, Mengnan | ACM Transactions on Intelligent Systems and Technology | 2024 | |
179 | jha2022responsible | Responsible reasoning with large language models and the impact of proper nouns | Jha, Sumit Kumar and Ewetz, Rickard and Velasquez, Alvaro and Jha, Susmit | Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022 | 2022 | |
180 | park2023generativeagents | Generative Agents: Interactive Simulacra of Human Behavior | Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. | Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23) | 2023 | DOI/URL |
181 | lewis2020rag | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and Küttler, Heinrich and others | Advances in Neural Information Processing Systems 33 (NeurIPS 2020) | 2020 | DOI/URL |
182 | packer2023memgpt | MemGPT: Towards LLMs as Operating Systems | Packer, Charles and Fang, Vivian and Patil, Shishir G and Lin, Kevin and Wooders, Sarah and Gonzalez, Joseph E | arXiv | 2023 | |
183 | google2025_gemini25_deepthink_modelcard | Gemini 2.5 Deep Think Model Card | Google DeepMind | | 2025 | |
184 | anthropic2025_claude37_sonnet | Claude 3.7 Sonnet (Hybrid Reasoning Model) Announcement and System Card | Anthropic | | 2025 | |
185 | he2025_skywork_or1 | Skywork Open Reasoner 1 (Skywork-OR1): A Scalable RL Framework for Long Chain-of-Thought Reasoning | He, Jujie and Liu, Jiacai and Liu, Chris Yuhao and Yan, Rui and Wang, Chaojie and Cheng, Peng and Zhang, Xiaoyu and Zhang, Fuxiang and Xu, Jiacheng and Shen, Wei and Li, Siyuan and Zeng, Liang and Wei, Tianwen and Cheng, Cheng and An, Bo and Liu, Yang and Zhou, Yahui | arXiv preprint arXiv:2505.22312 | 2025 | |
186 | alibaba2025_qwq32b | Alibaba Cloud Unveils QwQ-32B: A Compact Reasoning Model with Cutting-Edge Performance | Alibaba Cloud Qwen Team | | 2025 | |
187 | anthropic2024_claude35_sonnet | Introducing Claude 3.5 Sonnet | Anthropic | | 2024 | |
188 | anthropic_claude4_systemcard | Claude Opus 4 & Claude Sonnet 4 System Card | Anthropic | | 2025 | |
189 | Guardian_OpenAI_GPT5_2025 | OpenAI says latest ChatGPT | The Guardian | The Guardian | 2025 | DOI/URL |
190 | OpenAI_GPT5_2025 | Introducing GPT-5 | OpenAI | | 2025 | |
191 | openai2025_gpt_oss_model_card | gpt-oss-120b & gpt-oss-20b Model Card | OpenAI | | 2025 | DOI/URL |
192 | wang2022self | Self-consistency improves chain of thought reasoning in language models | Wang, Xuezhi and Wei, Jason and Schuurmans, Dale and Le, Quoc and Chi, Ed and Narang, Sharan and Chowdhery, Aakanksha and Zhou, Denny | arXiv preprint arXiv:2203.11171 | 2022 | |
193 | peter1997experiences | Experiences with an architecture for intelligent, reactive agents | Bonasso, R Peter and Firby, R James and Gat, Erann and Kortenkamp, David and Miller, David P and Slack, Mark G | Journal of Experimental & Theoretical Artificial Intelligence | 1997 | |
194 | gat1998three | On three-layer architectures | Gat, Erann and Bonasso, R Peter and Murphy, Robin and others | Artificial intelligence and mobile robots | 1998 | |
195 | brooks1991intelligence | Intelligence without representation | Brooks, Rodney A | Artificial intelligence | 1991 | |
196 | brooks2003robust | A robust layered control system for a mobile robot | Brooks, Rodney | IEEE journal on robotics and automation | 1986 | |
197 | karpukhin2020dpr | Dense Passage Retrieval for Open-Domain Question Answering | Karpukhin, Vladimir and Oğuz, Barlas and others | Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2020 | DOI/URL |
198 | johnson2017faiss | Billion-Scale Similarity Search with GPUs | Johnson, Jeff and Douze, Matthijs and Jégou, Hervé | IEEE Transactions on Big Data | 2019 | DOI/URL |
199 | woodgate2024macro | Macro ethics principles for responsible AI systems: Taxonomy and directions | Woodgate, Jessica and Ajmeri, Nirav | ACM Computing Surveys | 2024 | |
200 | devlin2019bert | BERT: Pre-training of deep bidirectional transformers for language understanding | Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina | Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) | 2019 | |
201 | wei2022chain | Chain-of-thought prompting elicits reasoning in large language models | Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny and others | Advances in neural information processing systems | 2022 | |
202 | jaech2024openai | OpenAI o1 system card | Jaech, Aaron and Kalai, Adam and Lerer, Adam and Richardson, Adam and El-Kishky, Ahmed and Low, Aiden and Helyar, Alec and Madry, Aleksander and Beutel, Alex and Carney, Alex and others | arXiv preprint arXiv:2412.16720 | 2024 | |
203 | guo2025deepseek | DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning | Guo, Daya and Yang, Dejian and Zhang, Haowei and Song, Junxiao and Zhang, Ruoyu and Xu, Runxin and Zhu, Qihao and Ma, Shirong and Wang, Peiyi and Bi, Xiao and others | arXiv preprint arXiv:2501.12948 | 2025 | |
204 | zhang2022automatic | Automatic chain of thought prompting in large language models | Zhang, Zhuosheng and Zhang, Aston and Li, Mu and Smola, Alex | arXiv preprint arXiv:2210.03493 | 2022 | |
205 | lyu2023faithful | Faithful chain-of-thought reasoning | Lyu, Qing and Havaldar, Shreya and Stein, Adam and Zhang, Li and Rao, Delip and Wong, Eric and Apidianaki, Marianna and Callison-Burch, Chris | The 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023) | 2023 | |
206 | mokander2024auditing | Auditing large language models: a three-layered approach | Mökander, Jakob and others | AI and Ethics | 2024 | |
207 | amirizaniani2024llmauditor | LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop | Amirizaniani, Maryam and Yao, Jihan and Lavergne, Adrian and Okada, Elizabeth Snell and Chadha, Aman and Roosta, Tanya and Shah, Chirag | arXiv preprint arXiv:2402.09346 | 2024 | |
208 | amirizaniani2024auditllm | AuditLLM: A tool for auditing large language models using multiprobe approach | Amirizaniani, Maryam and Martin, Elias and Roosta, Tanya and Chadha, Aman and Shah, Chirag | Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | 2024 | |
209 | paraschou2025mind | Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Paraschou, Eva and Arapakis, Ioannis and Yfantidou, Sofia and Macaluso, Sebastian and Vakali, Athena | arXiv preprint arXiv:2506.12240 | 2025 | |
210 | ehsan2024human | Human-centered explainable AI (HCXAI): Reloading explainability in the era of large language models (LLMs) | Ehsan, Upol and Watkins, Elizabeth A and Wintersberger, Philipp and Manger, Carina and Kim, Sunnie SY and Van Berkel, Niels and Riener, Andreas and Riedl, Mark O | Extended Abstracts of the CHI Conference on Human Factors in Computing Systems | 2024 | |
211 | yang2024human | Human-centric autonomous systems with LLMs for user command reasoning | Yang, Yi and Zhang, Qingwen and Li, Ci and Marta, Daniel Simões and others | Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision | 2024 | |
212 | zhang2024llama | LLaMA-Berry: Pairwise optimization for o1-like olympiad-level mathematical reasoning | Zhang, Di and Wu, Jianbo and Lei, Jingdi and Che, Tong and Li, Jiatong and Xie, Tong and Huang, Xiaoshui and Zhang, Shufei and Pavone, Marco and Li, Yuqiang and others | arXiv preprint arXiv:2410.02884 | 2024 | |
213 | zheng2024processbench | ProcessBench: Identifying process errors in mathematical reasoning | Zheng, Chujie and Zhang, Zhenru and Zhang, Beichen and Lin, Runji and Lu, Keming and Yu, Bowen and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang | arXiv preprint arXiv:2412.06559 | 2024 | |
214 | browne2012survey | A survey of Monte Carlo tree search methods | Browne, Cameron B and Powley, Edward and Whitehouse, Daniel and Lucas, Simon M and Cowling, Peter I and Rohlfshagen, Philipp and Tavener, Stephen and Perez, Diego and Samothrakis, Spyridon and Colton, Simon | IEEE Transactions on Computational Intelligence and AI in games | 2012 | |
215 | zhao2024expel | ExpeL: LLM agents are experiential learners | Zhao, Andrew and Huang, Daniel and Xu, Quentin and Lin, Matthieu and Liu, Yong-Jin and Huang, Gao | Proceedings of the AAAI Conference on Artificial Intelligence | 2024 | |
216 | besta2024graph | Graph of thoughts: Solving elaborate problems with large language models | Besta, Maciej and Blach, Nils and Kubicek, Ales and Gerstenberger, Robert and Podstawski, Michal and Gianinazzi, Lukas and Gajda, Joanna and Lehmann, Tomasz and Niewiadomski, Hubert and Nyczyk, Piotr and others | Proceedings of the AAAI conference on artificial intelligence | 2024 | |
217 | liu2024mathbench | MathBench: Evaluating the theory and application proficiency of LLMs with a hierarchical mathematics benchmark | Liu, Hongwei and Zheng, Zilong and Qiao, Yuxuan and Duan, Haodong and Fei, Zhiwei and Zhou, Fengzhe and Zhang, Wenwei and Zhang, Songyang and Lin, Dahua and Chen, Kai | arXiv preprint arXiv:2405.12209 | 2024 | |
218 | wang2024rupbench | RUPBench: Benchmarking reasoning under perturbations for robustness evaluation in large language models | Wang, Yuqing and Zhao, Yun | arXiv preprint arXiv:2406.11020 | 2024 | |
219 | zeng2024mr | MR-Ben: A meta-reasoning benchmark for evaluating system-2 thinking in LLMs | Zeng, Zhongshen and Liu, Yinhong and Wan, Yingjia and Li, Jingyao and Chen, Pengguang and Dai, Jianbo and Yao, Yuxuan and Xu, Rongwu and Qi, Zehan and Zhao, Wanru and others | Advances in Neural Information Processing Systems | 2024 | |
220 | estermann2024puzzles | Puzzles: A benchmark for neural algorithmic reasoning | Estermann, Benjamin and Lanzendörfer, Luca A and others | Advances in Neural Information Processing Systems | 2024 | |
221 | wang2019superglue | SuperGLUE: A stickier benchmark for general-purpose language understanding systems | Wang, Alex and Pruksachatkun, Yada and Nangia, Nikita and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel | Advances in neural information processing systems | 2019 | |
222 | pan2023logic | Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning | Pan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William Yang | arXiv preprint arXiv:2305.12295 | 2023 | |
223 | yu2024natural | Natural language reasoning, a survey | Yu, Fei and Zhang, Hongbo and Tiwari, Prayag and Wang, Benyou | ACM Computing Surveys | 2024 | |
224 | liu2024logic | Logic-of-thought: Injecting logic into contexts for full reasoning in large language models | Liu, Tongxuan and Xu, Wenjiang and Huang, Weizhe and Zeng, Yuting and Wang, Jiaxing and Wang, Xingyu and Yang, Hailong and Li, Jing | arXiv preprint arXiv:2409.17539 | 2024 | |
225 | pan2025survey | A survey of slow thinking-based reasoning LLMs using reinforced learning and inference-time scaling law | Pan, Qianjun and Ji, Wenkai and Ding, Yuyang and Li, Junsong and Chen, Shilian and Wang, Junyi and Zhou, Jie and Chen, Qin and Zhang, Min and Wu, Yulan and others | arXiv preprint arXiv:2505.02665 | 2025 | |
226 | huang2025deep | Deep Research Agents: A Systematic Examination And Roadmap | Huang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and others | arXiv preprint arXiv:2506.18096 | 2025 | |
227 | raiaan2024review | A review on large language models: Architectures, applications, taxonomies, open issues and challenges | Raiaan, Mohaimenul Azam Khan and Mukta, Md Saddam Hossain and Fatema, Kaniz and Fahad, Nur Mohammad and Sakib, Sadman and Mim, Most Marufatul Jannat and Ahmad, Jubaer and Ali, Mohammed Eunus and Azam, Sami | IEEE Access | 2024 | |
228 | chen2025towards | Towards reasoning era: A survey of long chain-of-thought for reasoning large language models | Chen, Qiguang and Qin, Libo and Liu, Jinhao and Peng, Dengyun and Guan, Jiannan and Wang, Peng and Hu, Mengkang and Zhou, Yuhang and Gao, Te and Che, Wanxiang | arXiv preprint arXiv:2503.09567 | 2025 | |
229 | cao2025toward | Toward generalizable evaluation in the LLM era: A survey beyond benchmarks | Cao, Yixin and Hong, Shibo and Li, Xinze and Ying, Jiahao and Ma, Yubo and Liang, Haiyuan and Liu, Yantao and Yao, Zijun and Wang, Xiaozhi and Huang, Dan and others | arXiv preprint arXiv:2504.18838 | 2025 | |
230 | chang2024survey | A survey on evaluation of large language models | Chang, Yupeng and Wang, Xu and Wang, Jindong and Wu, Yuan and Yang, Linyi and Zhu, Kaijie and Chen, Hao and Yi, Xiaoyuan and Wang, Cunxiang and Wang, Yidong and others | ACM transactions on intelligent systems and technology | 2024 | |
231 | morishita2024enhancing | Enhancing reasoning capabilities of llms via principled synthetic logic corpus | Morishita, Terufumi and Morio, Gaku and Yamaguchi, Atsuki and Sogawa, Yasuhiro | Advances in Neural Information Processing Systems | 2024 | |
232 | basiouni2025context | In-Context Learning in Large Language Models (LLMs): Mechanisms, Capabilities, and Implications for Advanced Knowledge Representation and Reasoning | Basiouni, Azza Mohamed and El Rashid, Mohamed and Shaalan, Khaled | IEEE Access | 2025 | |
233 | yeo2025demystifying | Demystifying long chain-of-thought reasoning in LLMs | Yeo, Edward and Tong, Yuxuan and Niu, Morry and Neubig, Graham and Yue, Xiang | arXiv preprint arXiv:2502.03373 | 2025 | |
234 | kumar2025llm | LLM post-training: A deep dive into reasoning large language models | Kumar, Komal and Ashraf, Tajamul and Thawakar, Omkar and Anwer, Rao Muhammad and Cholakkal, Hisham and Shah, Mubarak and Yang, Ming-Hsuan and Torr, Phillip HS and Khan, Fahad Shahbaz and Khan, Salman | arXiv preprint arXiv:2502.21321 | 2025 | |
235 | fu2025improving | Improving complex reasoning in large language models | Fu, Yao | The University of Edinburgh | 2025 | |
236 | feng2025efficient | Efficient reasoning models: A survey | Feng, Sicheng and Fang, Gongfan and Ma, Xinyin and Wang, Xinchao | arXiv preprint arXiv:2504.10903 | 2025 | |
237 | ferrag2025llm | From LLM reasoning to autonomous AI agents: A comprehensive review | Ferrag, Mohamed Amine and Tihanyi, Norbert and Debbah, Merouane | arXiv preprint arXiv:2504.19678 | 2025 | |
238 | putta2024agent | Agent Q: Advanced reasoning and learning for autonomous AI agents | Putta, Pranav and Mills, Edmund and Garg, Naman and Motwani, Sumeet and Finn, Chelsea and Garg, Divyansh and Rafailov, Rafael | arXiv preprint arXiv:2408.07199 | 2024 | |
239 | tariq2025reasoning | Reasoning About Responsibility in Autonomous Systems: Navigating the Challenges and Charting Future Directions | Tariq, Usman and Ahmed, Irfan | Ubiquitous Technology Journal | 2025 | |
240 | ferrag2025reasoning | Reasoning beyond limits: Advances and open problems for LLMs | Ferrag, Mohamed Amine and Tihanyi, Norbert and Debbah, Merouane | arXiv preprint arXiv:2503.22732 | 2025 | |
241 | wu2025position | Position Paper: Towards Open Complex Human-AI Agents Collaboration System for Problem-Solving and Knowledge Management | Wu, Ju and Or, Calvin KL | arXiv preprint arXiv:2505.00018 | 2025 | |
242 | tran2025reasoning | Reasoning in Neurosymbolic AI | Tran, Son and Mota, Edjard and Garcez, Artur d'Avila | arXiv preprint arXiv:2505.20313 | 2025 | |
243 | swiechowski2023monte | Monte Carlo tree search: A review of recent modifications and applications | Świechowski, Maciej and others | Artificial Intelligence Review | 2023 | |
244 | sun2025data | Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems | Sun, Zhaoyan and Wang, Jiayi and Zhao, Xinyang and Wang, Jiachi and Li, Guoliang | arXiv preprint arXiv:2507.01599 | 2025 | |
245 | zheng2025retrieval | Retrieval augmented generation and understanding in vision: A survey and new outlook | Zheng, Xu and Weng, Ziqiao and Lyu, Yuanhuiyi and Jiang, Lutao and Xue, Haiwei and Ren, Bin and Paudel, Danda and Sebe, Nicu and Van Gool, Luc and Hu, Xuming | arXiv preprint arXiv:2503.18016 | 2025 | |
246 | bei2025graphs | Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities | Bei, Yuanchen and Zhang, Weizhi and Wang, Siwen and Chen, Weizhi and Zhou, Sheng and Chen, Hao and Li, Yong and Bu, Jiajun and Pan, Shirui and Yu, Yizhou and others | arXiv preprint arXiv:2506.18019 | 2025 | |
247 | chhikara2025mem0 | Mem0: Building production-ready AI agents with scalable long-term memory | Chhikara, Prateek and Khant, Dev and Aryan, Saket and Singh, Taranjeet and Yadav, Deshraj | arXiv preprint arXiv:2504.19413 | 2025 | |
248 | huang2025foundation | Foundation models and intelligent decision-making: Progress, challenges, and perspectives | Huang, Jincai and Xu, Yongjun and Wang, Qi and Wang, Qi Cheems and Liang, Xingxing and Wang, Fei and Zhang, Zhao and Wei, Wei and Zhang, Boxuan and Huang, Libo and others | The Innovation | 2025 | |
249 | sun2025survey | A survey of reasoning with foundation models: Concepts, methodologies, and outlook | Sun, Jiankai and Zheng, Chuanyang and Xie, Enze and Liu, Zhengying and Chu, Ruihang and Qiu, Jianing and Xu, Jiaqi and Ding, Mingyu and Li, Hongyang and Geng, Mengzhe and others | ACM Computing Surveys | 2025 | |
250 | zhang2025igniting | Igniting language intelligence: The hitchhiker’s guide from chain-of-thought reasoning to language agents | Zhang, Zhuosheng and Yao, Yao and Zhang, Aston and Tang, Xiangru and Ma, Xinbei and He, Zhiwei and Wang, Yiming and Gerstein, Mark and Wang, Rui and Liu, Gongshen and others | ACM Computing Surveys | 2025 | |
251 | wang2025multimodal | Multimodal chain-of-thought reasoning: A comprehensive survey | Wang, Yaoting and Wu, Shengqiong and Zhang, Yuecheng and Yan, Shuicheng and Liu, Ziwei and Luo, Jiebo and Fei, Hao | arXiv preprint arXiv:2503.12605 | 2025 | |
252 | chen2025policy | Policy frameworks for transparent chain-of-thought reasoning in large language models | Chen, Yihang and Deng, Haikang and Han, Kaiqiao and Zhao, Qingyue | arXiv preprint arXiv:2503.14521 | 2025 | |
253 | manuvinakurike2025thoughts | Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines | Manuvinakurike, Ramesh and Moss, Emanuel and Watkins, Elizabeth Anne and Sahay, Saurav and Raffa, Giuseppe and Nachman, Lama | arXiv preprint arXiv:2505.00875 | 2025 | |
254 | li2025llm | LLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous driving | Li, Lin and Tan, Runjia and Fang, Jianwu and Xue, Jianru and Lv, Chen | Expert Systems with Applications | 2025 | |
255 | zhao2025world | World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks | Zhao, Changyuan and Zhang, Ruichen and Wang, Jiacheng and Zhao, Gaosheng and Niyato, Dusit and Sun, Geng and Mao, Shiwen and Kim, Dong In | arXiv preprint arXiv:2506.00417 | 2025 | |
256 | lopez2025survey | A Survey on Large Language Models in Multimodal Recommender Systems | Lopez-Avila, Alejo and Du, Jinhua | arXiv preprint arXiv:2505.09777 | 2025 | |
257 | giannone2025feedback | Feedback-Driven Vision-Language Alignment with Minimal Human Supervision | Giannone, Giorgio and Li, Ruoteng and Feng, Qianli and Perevodchikov, Evgeny and Chen, Rui and Martinez, Aleix | arXiv preprint arXiv:2501.04568 | 2025 | |
258 | cao2025causal | Causal action empowerment for efficient reinforcement learning in embodied agents | Cao, Hongye and Feng, Fan and Huo, Jing and Gao, Yang | Science China Information Sciences | 2025 | |
259 | ranjan2025fairness | Fairness in Agentic AI: A Unified Framework for Ethical and Equitable Multi-Agent System | Ranjan, Rajesh and Gupta, Shailja and Singh, Surya Narayan | arXiv preprint arXiv:2502.07254 | 2025 | |
260 | chen2024fairness | Fairness testing: A comprehensive survey and analysis of trends | Chen, Zhenpeng and Zhang, Jie M and Hort, Max and Harman, Mark and Sarro, Federica | ACM Transactions on Software Engineering and Methodology | 2024 | |
261 | su2025thinking | Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Su, Zhaochen and Xia, Peng and Guo, Hangyu and Liu, Zhenhua and Ma, Yan and Qu, Xiaoye and Liu, Jiaqi and Li, Yanshu and Zeng, Kaide and Yang, Zhengyuan and others | arXiv preprint arXiv:2506.23918 | 2025 | |
262 | karunanayake2025next | Next-generation agentic AI for transforming healthcare | Karunanayake, Nalan | Informatics and Health | 2025 | |
263 | zhang2025survey | A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well? | Zhang, Qiyuan and Lyu, Fuyuan and Sun, Zexu and Wang, Lei and Zhang, Weixu and Hua, Wenyue and Wu, Haolun and Guo, Zhihan and Wang, Yufei and Muennighoff, Niklas and others | arXiv preprint arXiv:2503.24235 | 2025 | |
264 | kim2025cost | The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective | Kim, Jiin and Shin, Byeongjun and Chung, Jinha and Rhu, Minsoo | arXiv preprint arXiv:2506.04301 | 2025 | |
265 | li2025system | From system 1 to system 2: A survey of reasoning large language models | Li, Zhong-Zhi and Zhang, Duzhen and Zhang, Ming-Liang and Zhang, Jiaxin and Liu, Zengyan and Yao, Yuxuan and Xu, Haotian and Zheng, Junhao and Wang, Pei-Jie and Chen, Xiuyi and others | arXiv preprint arXiv:2502.17419 | 2025 | |
266 | gao2024interpretable | Interpretable contrastive Monte Carlo tree search reasoning | Gao, Zitian and Niu, Boye and He, Xuzheng and Xu, Haotian and Liu, Hongzhang and Liu, Aiwei and Hu, Xuming and Wen, Lijie | arXiv preprint arXiv:2410.01707 | 2024 | |
267 | liang2025mcts | I-MCTS: Enhancing agentic AutoML via introspective Monte Carlo tree search | Liang, Zujie and Wei, Feng and Xu, Wujiang and Chen, Lin and Qian, Yuxi and Wu, Xinhui | arXiv preprint arXiv:2502.14693 | 2025 | |
268 | an2025combining | Combining LLMs with logic-based framework to explain MCTS | An, Ziyan and Wang, Xia and Baier, Hendrik and Chen, Zirong and Dubey, Abhishek and Johnson, Taylor T and Sprinkle, Jonathan and Mukhopadhyay, Ayan and Ma, Meiyi | arXiv preprint arXiv:2505.00610 | 2025 | |
269 | dao2025boosting | Boosting MCTS with Free Energy Minimization | Dao, Mawaba Pascal and Peter, Adrian M | arXiv preprint arXiv:2501.13083 | 2025 | |
270 | meimandi2025measurement | The Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity Claims | Meimandi, Kiana Jafari and Aránguiz-Dias, Gabriela and others | arXiv preprint arXiv:2506.02064 | 2025 | |
271 | ahmed2025enhancing | Enhancing Explainability, Robustness, and Autonomy: A Comprehensive Approach in Trustworthy AI | Ahmed, Mobyen Uddin and Begum, Shahina and Barua, Shaibal and Masud, Abu Naser and Di Flumeri, Gianluca and Navarin, Nicolò | 2025 IEEE Symposium on Trustworthy, Explainable and Responsible Computational Intelligence (CITREx) | 2025 | |
272 | sanwal2025layered | Layered chain-of-thought prompting for multi-agent LLM systems: A comprehensive approach to explainable large language models | Sanwal, Manish | arXiv preprint arXiv:2501.18645 | 2025 | |
273 | pang2025interactive | Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models | Pang, Rock Yuren and Feng, KJ and Feng, Shangbin and Li, Chu and Shi, Weijia and Tsvetkov, Yulia and Heer, Jeffrey and Reinecke, Katharina | arXiv preprint arXiv:2506.23678 | 2025 | |
274 | bilal2025meta | Meta-thinking in LLMs via multi-agent reinforcement learning: A survey | Bilal, Ahsan and Mohsin, Muhammad Ahmed and Umer, Muhammad and Bangash, Muhammad Awais Khan and Jamshed, Muhammad Ali | arXiv preprint arXiv:2504.14520 | 2025 | |
275 | wen2025cotguard | CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems | Wen, Yan and Guo, Junfeng and Huang, Heng | arXiv preprint arXiv:2505.19405 | 2025 | |
276 | zahid2025explainability | Explainability, Robustness, and Fairness in User-Centric Intelligent Systems: A Systematic Review | Zahid, Idrees A and Garfan, Salem and Chyad, MA and Albahri, AS and Albahri, OS and Alamoodi, AH and Deveci, Muhammet and Homod, Raad Z and Alzubaidi, Laith | IEEE Transactions on Emerging Topics in Computational Intelligence | 2025 | |
277 | gupta2025ai | AI Agents Collaboration Under Resource Constraints: Practical Implementations | Gupta, Shubham | International Journal of Artificial Intelligence Research and Development | 2025 | |
278 | molinari2025towards | Towards Pervasive Distributed Agentic Generative AI--A State of The Art | Molinari, Gianni and Ciravegna, Fabio | arXiv preprint arXiv:2506.13324 | 2025 | |
279 | zhang2024integrating | Integrating Artificial Intelligence into Operating Systems: A Comprehensive Survey on Techniques, Applications, and Future Directions | Zhang, Yifan and Zhao, Xinkui and Li, Ziying and Yin, Jianwei and Zhang, Lufei and Chen, Zuoning | arXiv preprint arXiv:2407.14567 | 2024 | |
280 | wei2025agent | Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC | Wei, Xinming and Zhang, Jiahao and Li, Haoran and Chen, Jiayu and Qu, Rui and Li, Maoliang and Chen, Xiang and Luo, Guojie | arXiv preprint arXiv:2506.24045 | 2025 | |
281 | jiang2025large | From large AI models to agentic AI: A tutorial on future intelligent communications | Jiang, Feibo and Pan, Cunhua and Dong, Li and Wang, Kezhi and Dobre, Octavia A and Debbah, Merouane | arXiv preprint arXiv:2505.22311 | 2025 | |
282 | liu2025optimizing | Optimizing on-demand food delivery with BDI-based multi-agent systems and Monte Carlo tree search scheduling | Liu, Li and Chen, Shikun and Jin, Huan and Deng, Xiaoying and Liu, Yangguang and Lin, Yang | Scientific Reports | 2025 | |
283 | zou2025agente | El Agente: An autonomous agent for quantum chemistry | Zou, Yunheng and Cheng, Austin H and Aldossary, Abdulrahman and Bai, Jiaru and Leong, Shi Xuan and Campos-Gonzalez-Angulo, Jorge Arturo and Choi, Changhyeok and Ser, Cher Tian and Tom, Gary and Wang, Andrew and others | Matter | 2025 | |
284 | amini2025distributed | Distributed LLMs and multimodal large language models: A survey on advances, challenges, and future directions | Amini, Hadi and Mia, Md Jueal and Saadati, Yasaman and Imteaj, Ahmed and Nabavirazavi, Seyedsina and Thakker, Urmish and Hossain, Md Zarif and Fime, Awal Ahmed and Iyengar, SS | arXiv preprint arXiv:2503.16585 | 2025 | |
285 | chaudhry2025towards | Towards Resource-Efficient Compound AI Systems | Chaudhry, Gohar Irfan and Choukse, Esha and Goiri, {\'I | Proceedings of the 2025 Workshop on Hot Topics in Operating Systems | 2025 | |
286 | roy2024enhancing | Enhancing Real-World Robustness in AI: Challenges and Solutions | Roy, Pritam | J. Recent Trends Comput. Sci. Eng | 2024 | |
287 | kim2025medical | Medical hallucinations in foundation models and their impact on healthcare | Kim, Yubin and Jeong, Hyewon and Chen, Shan and Li, Shuyue Stella and Lu, Mingyu and Alhamoud, Kumail and Mun, Jimin and Grau, Cristina and Jung, Minseok and Gameiro, Rodrigo and others | arXiv preprint arXiv:2503.05777 | 2025 | |
288 | gao2025mono | Mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond | Gao, Zeyu and Zhou, Junlin and Zhang, Bolun and He, Yi and Zhang, Chao and Cui, Yuxin and Wang, Hao | arXiv preprint arXiv:2506.03651 | 2025 | |
289 | chander2025toward | Toward trustworthy artificial intelligence (TAI) in the context of explainability and robustness | Chander, Bhanu and John, Chinju and Warrier, Lekha and Gopalakrishnan, Kumaravelan | ACM Computing Surveys | 2025 | |
290 | barros2025think | I Think, Therefore I Hallucinate: Minds, Machines, and the Art of Being Wrong | Barros, Sebastian | arXiv preprint arXiv:2503.05806 | 2025 | |
291 | latif2025hallucinations | Hallucinations in large language models and their influence on legal reasoning: Examining the risks of ai-generated factual inaccuracies in judicial processes | Latif, Youssef Abdel | Journal of Computational Intelligence, Machine Reasoning, and Decision-Making | 2025 | |
292 | chakraborti2025personalized | Personalized uncertainty quantification in artificial intelligence | Chakraborti, Tapabrata and Banerji, Christopher RS and Marandon, Ariane and Hellon, Vicky and Mitra, Robin and Lehmann, Brieuc and Br{\"a | Nature Machine Intelligence | 2025 | |
293 | liu2025uncertainty | Uncertainty quantification and confidence calibration in large language models: A survey | Liu, Xiaoou and Chen, Tiejin and Da, Longchao and Chen, Chacha and Lin, Zhen and Wei, Hua | arXiv preprint arXiv:2503.15850 | 2025 | |
294 | becerra2025historical | Historical Methods for AI Evaluations, Assessments, and Audits | Becerra Sandoval, Juana Catalina and Jing, Felicia S | Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency | 2025 | |
295 | yeo2025comprehensive | A comprehensive review on financial explainable AI | Yeo, Wei Jie and Van Der Heever, Wihan and Mao, Rui and Cambria, Erik and Satapathy, Ranjan and Mengaldo, Gianmarco | Artificial Intelligence Review | 2025 | |
296 | mao2025llms | From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem | Mao, Yanxu and Cui, Tiehan and Liu, Peipei and You, Datao and Zhu, Hongsong | arXiv preprint arXiv:2506.15170 | 2025 | |
297 | feng2025integration | Integration of multi-agent systems and artificial intelligence in self-healing subway power supply systems: Advancements in fault diagnosis, isolation, and recovery | Feng, Jianbing and Yu, Tao and Zhang, Kuozhen and Cheng, Lefeng | Processes | 2025 | |
298 | hammond2025multi | Multi-agent risks from advanced AI | Hammond, Lewis and Chan, Alan and Clifton, Jesse and Hoelscher-Obermaier, Jason and Khan, Akbir and McLean, Euan and Smith, Chandler and Barfuss, Wolfram and Foerster, Jakob and Gaven{\v{c | arXiv preprint arXiv:2502.14143 | 2025 | |
299 | acharya2025agentic | Agentic AI: Autonomous intelligence for complex goals--a comprehensive survey | Acharya, Deepak Bhaskar and Kuppan, Karthigeyan and Divya, B | IEEE Access | 2025 | |
300 | abdallah2024multi | Multi-agent DRL for distributed codebook design in RIS-aided cell-free massive MIMO networks | Abdallah, Asmaa and Celik, Abdulkadir and Mansour, Mohammad M and Eltawil, Ahmed M | IEEE Transactions on Communications | 2024 | |
301 | feffer2024red | Red-teaming for generative AI: Silver bullet or security theater? | Feffer, Michael and Sinha, Anusha and Deng, Wesley H and Lipton, Zachary C and Heidari, Hoda | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society | 2024 | |
302 | majumdar2025red | Red Teaming AI Red Teaming | Majumdar, Subhabrata and Pendleton, Brian and Gupta, Abhishek | arXiv preprint arXiv:2507.05538 | 2025 | |
303 | qwen2025ledger | Accountability Ledger: Blockchain-Based AI Decision Logging | Qwen Team | 2025 | ||
304 | google2025bert | Bias Bounty Program for BERT | Google AI | 2025 | ||
305 | openai2025gpt4 | Homomorphic Encryption in GPT-4 | OpenAI | 2025 | ||
306 | deepmind2025sparrow | Safety Layer in Sparrow: Preventing Harmful Outputs | DeepMind | 2025 | ||
307 | anthropic2025claude | Interactive Transparency in Claude | Anthropic | 2025 | ||
308 | ey2025mott | How Mott MacDonald is Building Confidence Through Responsible AI | EY | 2025 | ||
309 | ey2025biopharma | How a Global Biopharma Became a Leader in Ethical AI | EY | 2025 | ||
310 | eu2025ai | EU AI Act | European Union | 2025 | ||
311 | masood2025effectiveness | Measuring the Effectiveness of AI Adoption | Masood, A. | 2025 | ||
312 | forbes2025ai | Future Directions in AI Ethics | Forbes | 2025 | ||
313 | ey_mottmac2025 | How Mott MacDonald is building confidence through responsible AI | EY | 2025 | ||
314 | challita2025redteamllm | RedTeamLLM: an Agentic AI framework for offensive security | Challita, Brian and Parrend, Pierre | arXiv preprint arXiv:2505.06913 | 2025 | |
315 | glazer2024frontiermath | FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI | Glazer, Elliot and Erdil, Ege and Besiroglu, Tamay and Chicharro, Diego and Chen, Evan and Gunning, Alex and Olsson, Caroline Falkman and Denain, Jean-Stanislas and Ho, Anson and Santos, Emily de Oliveira and others | arXiv preprint arXiv:2411.04872 | 2024 | |
316 | ogbu2023agentic | Agentic AI in computer vision domain-recent advances and prospects | Ogbu, Daniel | International Journal of Research Publication and Reviews | 2023 | |
317 | glaese2022improvingalignmentdialogueagents | Improving alignment of dialogue agents via targeted human judgements | Amelia Glaese and Nat McAleese and Maja Trębacz and John Aslanides and Vlad Firoiu and Timo Ewalds and Maribeth Rauh and Laura Weidinger and Martin Chadwick and Phoebe Thacker and Lucy Campbell-Gillingham and Jonathan Uesato and Po-Sen Huang and Ramona Comanescu and Fan Yang and Abigail See and Sumanth Dathathri and Rory Greig and Charlie Chen and Doug Fritz and Jaume Sanchez Elias and Richard Green and Soňa Mokrá and Nicholas Fernando and Boxi Wu and Rachel Foley and Susannah Young and Iason Gabriel and William Isaac and John Mellor and Demis Hassabis and Koray Kavukcuoglu and Lisa Anne Hendricks and Geoffrey Irving | 2022 | DOI/URL | |
318 | amorim2023dataprivacyhomomorphicencryption | Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference | Ivone Amorim and Eva Maia and Pedro Barbosa and Isabel Praça | 2023 | DOI/URL | |
319 | scharowski2023exploring | Exploring the effects of human-centered AI explanations on trust and reliance | Scharowski, Nicolas and Perrig, Sebastian AC and Svab, Melanie and Opwis, Klaus and Br{\"u | Frontiers in Computer Science | 2023 | |
320 | liao2022humancenteredexplainableaixai | Human-Centered Explainable AI (XAI): From Algorithms to User Experiences | Q. Vera Liao and Kush R. Varshney | 2022 | DOI/URL | |
321 | alibabacloud_sls_logaudit | Simple Log Service: Log Audit Service (new version) | Alibaba Cloud | 2024 | DOI/URL | |
322 | yang2020ledgerdb | LedgerDB: A centralized ledger database for universal audit and verification | Yang, Xinying and Zhang, Yuan and Wang, Sheng and Yu, Benquan and Li, Feifei and Li, Yize and Yan, Wenyuan | Proceedings of the VLDB Endowment | 2020 | |
323 | fli_ai_safety_index_2025 | AI Safety Index: Summer 2025 Edition | Future of Life Institute | 2025 | DOI/URL | |
324 | TFS2025_ai_agents_eu | Ahead of the Curve: Governing AI Agents under the EU AI | The Future Society | 2025 | DOI/URL | |
325 | maclean2017nist | The NIST risk management framework: Problems and recommendations | Maclean, Don | Cyber Security: A Peer-Reviewed Journal | 2017 | |
326 | gogia2025trust | Trust by Design: Dissecting IBM's Enterprise AI Governance Stack | Sanchit Vir Gogia | 2025 | DOI/URL | |
327 | xia2024responsibleaimetricscatalogue | Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability | Boming Xia and Qinghua Lu and Liming Zhu and Sung Une Lee and Yue Liu and Zhenchang Xing | 2024 | DOI/URL | |
328 | weidinger2024holistic | Holistic safety and responsibility evaluations of advanced AI models | Weidinger, Laura and Barnhart, Joslyn and Brennan, Jenny and Butterfield, Christina and Young, Susie and Hawkins, Will and Hendricks, Lisa Anne and Comanescu, Ramona and Chang, Oscar and Rodriguez, Mikel and others | arXiv preprint arXiv:2404.14068 | 2024 | |
329 | sprague2024cot | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Sprague, Zayne and Yin, Fangcong and Rodriguez, Juan Diego and Jiang, Dongwei and Wadhwa, Manya and Singhal, Prasann and Zhao, Xinyu and Ye, Xi and Mahowald, Kyle and Durrett, Greg | arXiv preprint arXiv:2409.12183 | 2024 | |
330 | bergman2024stela | STELA: a community-centred approach to norm elicitation for AI alignment | Bergman, Stevie and Marchal, Nahema and Mellor, John and Mohamed, Shakir and Gabriel, Iason and Isaac, William | Scientific Reports | 2024 | |
331 | larsen2024aivaluealignment | AI value alignment: How we can align artificial intelligence with human values | Larsen, Benjamin and Dignum, Virginia | 2024 | DOI/URL | |
332 | alicloud2025ledgerdb | LedgerDB: a centralized ledger database for universal audit and verification | Yang, Xinying and Zhang, Yuan and Wang, Sheng and Yu, Benquan and Li, Feifei and Li, Yize and Yan, Wenyuan | Proc. VLDB Endow. | 2020 | DOI/URL |
333 | mialon2023gaiabenchmarkgeneralai | GAIA: a benchmark for General AI Assistants | Grégoire Mialon and Clémentine Fourrier and Craig Swift and Thomas Wolf and Yann LeCun and Thomas Scialom | 2023 | DOI/URL | |
334 | timms2024agentic | Agentic Anomaly Detection for Shipping | Timms, Alexander and Langbridge, Abigail and O'Donncha, Fearghal | NeurIPS 2024 Workshop on Open-World Agents | 2024 | |
335 | kumar2025saarthi | Saarthi: The First AI Formal Verification Engineer | Kumar, Aman and Gadde, Deepak Narayan and Radhakrishna, Keerthan Kopparam and Lettnin, Djones | arXiv preprint arXiv:2502.16662 | 2025 | |
336 | garg2025designing | Designing the Mind: How Agentic Frameworks Are Shaping the Future of AI Behavior | Garg, Venus | Journal of Computer Science and Technology Studies | 2025 | |
337 | buehler2025agentic | Agentic deep graph reasoning yields self-organizing knowledge networks | Buehler, Markus J | arXiv preprint arXiv:2502.13025 | 2025 | |
338 | perrier2025out | Out of Control--Why Alignment Needs Formal Control Theory (and an Alignment Control Stack) | Perrier, Elija | arXiv preprint arXiv:2506.17846 | 2025 | |
339 | huang2025agentic | Agentic AI | Huang, Ken | Springer | 2025 | |
340 | kitchenham2004procedures | Procedures for performing systematic reviews | Kitchenham, Barbara | Keele, UK, Keele University | 2004 | |
341 | boland2017doing | Doing a systematic review: a student's guide | Boland, Angela and Cherry, Gemma and Dickson, Rumona | Sage | 2017 | |
342 | lee2025evaluating | Evaluating step-by-step reasoning traces: A survey | Lee, Jinu and Hockenmaier, Julia | arXiv preprint arXiv:2502.12289 | 2025 | |
343 | natarajan2025human | Human-in-the-loop or AI-in-the-loop? Automate or Collaborate? | Natarajan, Sriraam and Mathur, Saurabh and Sidheekh, Sahil and Stammer, Wolfgang and Kersting, Kristian | Proceedings of the AAAI Conference on Artificial Intelligence | 2025 | |
344 | yigit2025generative | Generative AI and LLMs for critical infrastructure protection: evaluation benchmarks, agentic AI, challenges, and opportunities | Yigit, Yagmur and Ferrag, Mohamed Amine and Ghanem, Mohamed C and Sarker, Iqbal H and Maglaras, Leandros A and Chrysoulas, Christos and Moradpoor, Naghmeh and Tihanyi, Norbert and Janicke, Helge | Sensors | 2025 | |
345 | allana2025privacy | Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review | Allana, Sonal and Kankanhalli, Mohan and Dara, Rozita | arXiv preprint arXiv:2505.02828 | 2025 | |
346 | deng2025ai | AI agents under threat: A survey of key security challenges and future pathways | Deng, Zehang and Guo, Yongjian and Han, Changzhou and Ma, Wanlun and Xiong, Junwu and Wen, Sheng and Xiang, Yang | ACM Computing Surveys | 2025 | |
347 | inala2025building | Building Trustworthy Agentic AI Systems for Personalized Banking Experiences | Inala, Ramesh and Somu, Bharath | Metallurgical and Materials Engineering | 2025 | |
348 | huang2025ai | AI Agent Safety and Security Considerations | Huang, Jerry and Huang, Ken and Jackson, Krystal and Hughes, Chris | Agentic AI: Theories and Practices | 2025 | |
349 | sutton2018reinforcement | Reinforcement learning: An introduction | Sutton, Richard S and Barto, Andrew G | MIT press | 2018 | |
350 | hosseini2025ai | AI ethics in action: a circular model for transparency, accountability and inclusivity | Hosseini Tabaghdehi, Seyedeh Asieh and Ayaz, {\"O | Journal of Managerial Psychology | 2025 | |
351 | bahangulu2025algorithmic | Algorithmic bias, data ethics, and governance: Ensuring fairness, transparency and compliance in AI-powered business analytics applications | Bahangulu, Julien Kiesse and Berko, Louis Owusu | World Journal of Advanced Research and Reviews | 2025 | |
352 | li2025ai | AI-Driven Governance: Enhancing Transparency and Accountability in Public Administration | Li, Changkui | Digital Society & Virtual Governance | 2025 | |
353 | andrada2023varieties | Varieties of transparency: Exploring agency within AI systems | Andrada, Gloria and Clowes, Robert W and Smart, Paul R | AI & Society | 2023 | |
354 | zerilli2022transparency | How transparency modulates trust in artificial intelligence | Zerilli, John and Bhatt, Umang and Weller, Adrian | Patterns | 2022 | |
355 | akhtar2024privacy | Privacy and Security Considerations in Explainable AI | Akhtar, Mohammad Amir Khusru and Kumar, Mohit and Nayyar, Anand | Towards Ethical and Socially Responsible Explainable AI: Challenges and Opportunities | 2024 | |
356 | busuioc2021accountable | Accountable artificial intelligence: Holding algorithms to account | Busuioc, Madalina | Public administration review | 2021 | |
357 | griffin2024ethical | The ethical agency of AI developers | Griffin, Tricia A and Green, Brian Patrick and Welie, Jos VM | AI and Ethics | 2024 | |
358 | bjurling2025designing | Designing Human-Swarm Interaction Systems | Bjurling, Oscar | Link{\"o | 2025 | |
359 | braun2025liability | Liability for artificial intelligence reasoning technologies--a cognitive autonomy that does not help | Braun, Tomasz | Corporate Governance: The International Journal of Business in Society | 2025 | |
360 | crewAI | CrewAI: Framework for Orchestrating Role-Playing, Autonomous AI Agents | João Moura and contributors | GitHub | 2023 | |
361 | raman2025navigating | Navigating artificial general intelligence development: societal, technological, ethical, and brain-inspired pathways | Raman, Raghu and Kowalski, Robin and Achuthan, Krishnashree and Iyer, Akshay and Nedungadi, Prema | Scientific Reports | 2025 | |
362 | hammerschmidt2025bridging | Bridging the gap: inequalities that divide those who can and cannot create sustainable outcomes with AI | Hammerschmidt, Teresa and Stolz, Katharina and Posegga, Oliver | Behaviour & Information Technology | 2025 | |
363 | dahlan2025navigating | Navigating the Digital Frontier: Understanding Technology's Impact on Society | Dahlan, Mariani Mohd | Universiti Poly-Tech Malaysia | 2025 | |
364 | jiang_mistral_2023 | Mistral 7B | Jiang, Albert Q. and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and Casas, Diego de las and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and Lavaud, Lélio Renard and Lachaux, Marie-Anne and Stock, Pierre and Scao, Teven Le and Lavril, Thibaut and Wang, Thomas and Lacroix, Timothée and Sayed, William El | arXiv | 2023 | DOI/URL |
365 | xu_bot-adversarial_2021 | Bot-Adversarial | Xu, Jing and Ju, Da and Li, Margaret and Boureau, Y-Lan and Weston, Jason and Dinan, Emily | Proceedings of the 2021 Conference | 2021 | DOI/URL |
366 | pujari2024ethical | Ethical and responsible AI: Governance frameworks and policy implications for multi-agent systems | Pujari, Tejaskumar and Goel, Anshul and Sharma, Ashwin | IJST | 2024 | |
367 | nanjundan2025navigating | Navigating the ethical landscape of artificial intelligence: Challenges, frameworks, and responsible deployment | Nanjundan, Preethi and Indu, PV and Thomas, Lijo | Artificial Intelligence Technologies for Engineering Applications | 2025 | |
368 | panarese2025algorithmic | Algorithmic bias, fairness, and inclusivity: a multilevel framework for justice-oriented AI | Panarese, Paola and Grasso, Marta Margherita and Solinas, Claudia | AI & Society | 2025 | |
369 | mergen2025artificial | Artificial intelligence and bias towards marginalised groups: Theoretical roots and challenges | Mergen, Aybike and {\c{C | AI and Diversity in a Datafied World of Work: Will the Future of Work be Inclusive? | 2025 | |
370 | kay2025imitation | Imitation, Identity, and Injustice in Artificial Intelligence | Kay, Jackie | 2025 | ||
371 | koukaras2025ai | AI-driven telecommunications for smart classrooms: Transforming education through personalized learning and secure networks | Koukaras, Christos and Koukaras, Paraskevas and Ioannidis, Dimosthenis and Stavrinides, Stavros G | Telecom | 2025 | |
372 | sharma2025role | The role of large language models in personalized learning: a systematic review of educational impact | Sharma, Sahil and Mittal, Puneet and Kumar, Mukesh and Bhardwaj, Vivek | Discover Sustainability | 2025 | |
373 | lau2025size | Size Matters When Adopting and Scaling AI | Lau, Theodora | Banking on (Artificial) Intelligence: Navigating the Realities of AI in Financial Services | 2025 | |
374 | rahal2025use | The use of publicly available online texts in training AI: an ethical analysis of AI’s right to learn | Rahal, Louai | Journal of Information, Communication and Ethics in Society | 2025 | |
375 | emery2025international | International governance of advancing artificial intelligence | Emery-Xu, Nicholas and Jordan, Richard and Trager, Robert | AI & Society | 2025 | |
376 | charkhian2025can | How can AI evaluate and improve inclusivity in university portals, with a focus on cultural, linguistic, and accessible requirements? | Charkhian, D and Moghaddami, B | INTED2025 Proceedings | 2025 | |
377 | davoodi2024equal | EQUAL AI: A framework for enhancing equity, quality, understanding and accessibility in liberal arts through AI for multilingual learners | Davoodi, Amin | Language, Technology, and Social Media | 2024 | |
378 | hyrynsalmi2025making | Making Software Development More Diverse and Inclusive: Key Themes, Challenges, and Future Directions | Hyrynsalmi, Sonja M and Baltes, Sebastian and Brown, Chris and Prikladnicki, Rafael and Rodriguez-Perez, Gema and Serebrenik, Alexander and Simmonds, Jocelyn and Trinkenreich, Bianca and Wang, Yi and Liebel, Grischa | ACM Transactions on Software Engineering and Methodology | 2025 | |
379 | alam2025ethical | Ethical Challenges and Bias in AI-Driven Marketing: Educational Imperatives and Policy Perspectives | Alam, Ashraf | Impacts of AI-Generated Content on Brand Reputation | 2025 | |
380 | neumann2025position | Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs) | Neumann, Anna and Kirsten, Elisabeth and Zafar, Muhammad Bilal and Singh, Jatinder | Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency | 2025 | |
381 | ma2025breaking | Breaking Down Bias: On The Limits of Generalizable Pruning Strategies | Ma, Sibo and Salinas, Alejandro and Nyarko, Julian and Henderson, Peter | Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency | 2025 | |
382 | solano2025running | "Who is running it?" Towards Equitable AI Deployment in Home Care Work | Solano-Kamaiko, Ian Ren{\'e | Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems | 2025 | |
383 | gabriel2025matter | A matter of principle? AI alignment as the fair treatment of claims | Gabriel, Iason and Keeling, Geoff | Philosophical Studies | 2025 | |
384 | watson2025competing | Competing narratives in AI ethics: a defense of sociotechnical pragmatism | Watson, David S and M{\"o | AI & Society | 2025 | |
385 | goldberg2025threat | Threat Rigidity and the Role of Leadership and Organizational Change in Artificial Intelligence Adoption in Technology Companies | Goldberg, Nicole Dillon | 2025 | ||
386 | van2025beyond | Beyond efficiency: How artificial intelligence (AI) will reshape scientific inquiry and the publication process | Van Quaquebeke, Niels and Tonidandel, Scott and Banks, George C | The Leadership Quarterly | 2025 | |
387 | belliger2025new | New Perspectives on AI Alignment | Belliger, Andr{\'e | Ethics in the Age of AI: Navigating Politics and Security | 2025 | |
388 | xue2025mmrc | MMRC: A large-scale benchmark for understanding multimodal large language model in real-world conversation | Xue, Haochen and Tang, Feilong and Hu, Ming and Liu, Yexin and Huang, Qidong and Li, Yulong and Liu, Chengzhi and Xu, Zhongxing and Zhang, Chong and Feng, Chun-Mei and others | arXiv preprint arXiv:2502.11903 | 2025 | |
389 | yang2025survey | A survey of AI agent protocols | Yang, Yingxuan and Chai, Huacan and Song, Yuanyi and Qi, Siyuan and Wen, Muning and Li, Ning and Liao, Junwei and Hu, Haoyi and Lin, Jianghao and Chang, Gaowei and others | arXiv preprint arXiv:2504.16736 | 2025 | |
390 | tian2025outlook | An outlook on the opportunities and challenges of multi-agent AI systems | Tian, Fangqiao and Luo, An and Du, Jin and Xian, Xun and Specht, Robert and Wang, Ganghua and Bi, Xuan and Zhou, Jiawei and Srinivasa, Jayanth and Kundu, Ashish and others | arXiv preprint arXiv:2505.18397 | 2025 | |
391 | karim2025ai | AI agents meet blockchain: A survey on secure and scalable collaboration for multi-agents | Karim, Md Monjurul and Van, Dong Hoang and Khan, Sangeen and Qu, Qiang and Kholodov, Yaroslav | Future Internet | 2025 | |
392 | gawande2025reactive | From Reactive to Proactive: Real-Time Human-AI Collaboration in Intelligent Alerting Systems | Gawande, Pramod Dattarao | Journal of Computer Science and Technology Studies | 2025 | |
393 | hughes2025ai | AI agents and agentic systems: A multi-expert analysis | Hughes, Laurie and Dwivedi, Yogesh K and Malik, Tegwen and Shawosh, Mazen and Albashrawi, Mousa Ahmed and Jeon, Il and Dutot, Vincent and Appanderanda, Mandanna and Crick, Tom and De’, Rahul and others | Journal of Computer Information Systems | 2025 | |
394 | ahrweiler2025inclusive | Inclusive technology co-design for participatory AI | Ahrweiler, Petra and Sp{\"a | Participatory Artificial Intelligence in Public Social Services: From Bias to Fairness in Assessing Beneficiaries | 2025 | |
395 | merchan2025trust | Trust by Design: An Ethical Framework for Collaborative Intelligence Systems in Industry 5.0 | Merch{\'a | Electronics | 2025 | |
396 | watson2025personalized | Personalized Constitutionally-Aligned Agentic Superego: Secure AI Behavior Aligned to Diverse Human Values | Watson, Nell and Amer, Ahmed and Harris, Evan and Ravindra, Preeti and Zhang, Shujun | arXiv preprint arXiv:2506.13774 | 2025 | |
397 | kolt2025governing | Governing AI agents | Kolt, Noam | arXiv preprint arXiv:2501.07913 | 2025 | |
398 | kraprayoon2025ai | AI agent governance: A field guide | Kraprayoon, Jam and Williams, Zoe and Fayyaz, Rida | arXiv preprint arXiv:2505.21808 | 2025 | |
399 | cohen2025exploring | Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues | Cohen, Myke C and Su, Zhe and Kao, Hsien-Te and Nguyen, Daniel and Lynch, Spencer and Sap, Maarten and Volkova, Svitlana | arXiv preprint arXiv:2506.15928 | 2025 | |
400 | zhi2024beyond | Beyond preferences in AI alignment | Zhi-Xuan, Tan and Carroll, Micah and Franklin, Matija and Ashton, Hal | Philosophical Studies | 2024 | |
401 | chan2024visibility | Visibility into AI agents | Chan, Alan and Ezell, Carson and Kaufmann, Max and Wei, Kevin and Hammond, Lewis and Bradley, Herbie and Bluemke, Emma and Rajkumar, Nitarshan and Krueger, David and Kolt, Noam and others | Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency | 2024 | |
402 | raza_fair_2024 | FAIR | Raza, Shaina and Ghuge, Shardul and Ding, Chen and Pandya, Deval | arXiv preprint arXiv:2401.11033 | 2024 | |
403 | liu_agentbench_2023 | AgentBench | Liu, Xiao and Yu, Hao and Zhang, Hanchen and Xu, Yifan and Lei, Xuanyu and Lai, Hanyu and Gu, Yu and Ding, Hangliang and Men, Kaiwen and Yang, Kejuan and Zhang, Shudan and Deng, Xiang and Zeng, Aohan and Du, Zhengxiao and Zhang, Chenhui and Shen, Sheng and Zhang, Tianjun and Su, Yu and Sun, Huan and Huang, Minlie and Dong, Yuxiao and Tang, Jie | arXiv | 2023 | DOI/URL |
404 | touvron2023llama | LLaMA: Open and Efficient Foundation Language Models | Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timothée and Rozi{\`{e | arXiv preprint arXiv:2302.13971 | 2023 | DOI/URL |
405 | zhang_multitrust_2024 | MultiTrust | Zhang, Yichi and Huang, Yao and Sun, Yitong and Liu, Chang and Zhao, Zhe and Fang, Zhengwei and Wang, Yifan and Chen, Huanran and Yang, Xiao and Wei, Xingxing and Su, Hang and Dong, Yinpeng and Zhu, Jun | arXiv | 2024 | DOI/URL |