Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-03 | MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues | Zhaofeng Hu et.al. | 2412.02734 | link |
2024-12-03 | GSOT3D: Towards Generic 3D Single Object Tracking in the Wild | Yifan Jiao et.al. | 2412.02129 | link |
2024-11-28 | Improving Accuracy and Generalization for Efficient Visual Tracking | Ram Zaveri et.al. | 2411.18855 | null |
2024-11-27 | A comparison of extended object tracking with multi-modal sensors in indoor environment | Jiangtao Shuai et.al. | 2411.18476 | null |
2024-12-04 | A Distractor-Aware Memory for Visual Object Tracking with SAM2 | Jovana Videnovic et.al. | 2411.17576 | link |
2024-11-23 | How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking | Xuchen Li et.al. | 2411.15600 | null |
2024-11-24 | ClickTrack: Towards Real-time Interactive Single Object Tracking | Kuiran Wang et.al. | 2411.13183 | null |
2024-11-30 | SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Cheng-Yen Yang et.al. | 2411.11922 | link |
2024-12-09 | Vision Eagle Attention: a new lens for advancing image classification | Mahmudul Hasan et.al. | 2411.10564 | link |
2024-11-14 | MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation | Jonas Serych et.al. | 2411.09551 | link |
2024-11-12 | Visual Tracking with Intermittent Visibility: Switched Control Design and Implementation | Yangge Li et.al. | 2411.08144 | null |
2024-11-04 | ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model | Yiming Sun et.al. | 2411.01756 | null |
2024-10-30 | IP-MOT: Instance Prompt Learning for Cross-Domain Multi-Object Tracking | Run Luo et.al. | 2410.23907 | null |
2024-10-27 | NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking | Yu Liu et.al. | 2410.20421 | link |
2024-10-19 | The Solution for Single Object Tracking Task of Perception Test Challenge 2024 | Zhiqiang Zhong et.al. | 2410.16329 | null |
2024-10-13 | Gaussian Splatting Visual MPC for Granular Media Manipulation | Wei-Cheng Tseng et.al. | 2410.09740 | null |
2024-10-09 | DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2410.02492 | null |
2024-09-30 | Opt-in Camera: Person Identification in Video via UWB Localization and Its Application to Opt-in Systems | Matthew Ishige et.al. | 2409.19891 | null |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-26 | General Compression Framework for Efficient Transformer Object Tracking | Lingyi Hong et.al. | 2409.17564 | null |
2024-09-25 | Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 | Chunhui Zhang et.al. | 2409.16902 | link |
2024-09-25 | Conditional Generative Denoiser for Nighttime UAV Tracking | Yucheng Wang et.al. | 2409.16834 | link |
2024-09-25 | Progressive Representation Learning for Real-Time UAV Tracking | Changhong Fu et.al. | 2409.16652 | link |
2024-09-25 | Enhancing Nighttime UAV Tracking with Light Distribution Suppression | Liangliang Yao et.al. | 2409.16631 | link |
2024-09-19 | WeHelp: A Shared Autonomy System for Wheelchair Users | Abulikemu Abuduweili et.al. | 2409.12159 | link |
2024-09-18 | Distilling Channels for Efficient Deep Tracking | Shiming Ge et.al. | 2409.11785 | null |
2024-09-13 | Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Xuchen Li et.al. | 2409.08887 | null |
2024-09-10 | VBIT: Towards Enhancing Privacy Control Over IoT Devices | Jad Al Aaraj et.al. | 2409.06233 | null |
2024-09-03 | Ultra-broadband room-temperature Fourier transform spectrometer with watt-level power consumption | Jakub Mnich et.al. | 2409.01875 | null |
2024-08-25 | Camouflaged Object Tracking: A Benchmark | Xiaoyu Guo et.al. | 2408.13877 | null |
2024-08-21 | Low-Light Object Tracking: A Benchmark | Pengzhi Zhong et.al. | 2408.11463 | link |
2024-08-20 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Xiao Wang et.al. | 2408.10487 | link |
2024-08-05 | VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking | Yuxuan Lu et.al. | 2408.02263 | null |
2024-09-06 | 3D Single-object Tracking in Point Clouds with High Temporal Variation | Qiao Wu et.al. | 2408.02049 | null |
2024-09-09 | SiamMo: Siamese Motion-Centric 3D Object Tracking | Yuxiang Yang et.al. | 2408.01688 | link |
2024-08-02 | Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach | Yabin Zhu et.al. | 2408.00969 | link |
2024-08-06 | Broadband THz wave generation and detection in organic crystal PNPA at MHz repetition rates | Lukasz A. Sterczewski et.al. | 2407.20745 | null |
2024-07-16 | Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers | Zhengbo Zhang et.al. | 2407.08394 | null |
2024-07-11 | PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers | Xing Wang et.al. | 2407.08222 | null |
2024-07-07 | Addressing single object tracking in satellite imagery through prompt-engineered solutions | Athena Psalta et.al. | 2407.05518 | null |
2024-07-07 | Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking | You Wu et.al. | 2407.05383 | null |
2024-07-09 | P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jiahao Nie et.al. | 2407.05238 | link |
2024-07-07 | Tracking Reflected Objects: A Benchmark | Xiaoyu Guo et.al. | 2407.05235 | null |
2024-07-04 | TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2407.03946 | link |
2024-07-02 | FlowTrack: Point-level Flow Network for 3D Single Object Tracking | Shuo Li et.al. | 2407.01959 | null |
2024-09-07 | eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking | Yucheng Chen et.al. | 2406.20024 | null |
2024-06-14 | Constrained Motion Planning for a Robotic Endoscope Holder based on Hierarchical Quadratic Programming | Jacinto Colan et.al. | 2406.09982 | null |
2024-06-14 | Robust compressive tracking via online weighted multiple instance learning | Sandeep Singh Sengar et.al. | 2406.09914 | null |
2024-07-01 | Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking | Xiangyang Yang et.al. | 2406.08037 | null |
2024-06-07 | Multi-Granularity Language-Guided Multi-Object Tracking | Yuhao Li et.al. | 2406.04844 | link |
2024-06-02 | Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection | Zhuang Qi et.al. | 2406.00589 | null |
2024-05-28 | Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion | Hongze Sun et.al. | 2405.17903 | link |
2024-05-27 | LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking | Shaohua Dong et.al. | 2405.17660 | null |
2024-05-31 | Awesome Multi-modal Object Tracking | Chunhui Zhang et.al. | 2405.14200 | link |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-16 | A Novel Bounding Box Regression Method for Single Object Tracking | Omar Abdelaziz et.al. | 2405.10444 | null |
2024-05-16 | Beyond Traditional Single Object Tracking: A Survey | Omar Abdelaziz et.al. | 2405.10439 | null |
2024-05-08 | TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking | Pengcheng Shao et.al. | 2405.05004 | link |
2024-04-22 | 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos | Yinzhe Xu et.al. | 2404.13953 | null |
2024-05-25 | An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
2024-04-16 | Attention-Aware Visualization: Tracking and Responding to User Perception Over Time | Arvind Srinivasan et.al. | 2404.10732 | null |
2024-04-15 | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Fangwei Zhong et.al. | 2404.09857 | null |
2024-04-15 | Learning Tracking Representations from Single Point Annotations | Qiangqiang Wu et.al. | 2404.09504 | null |
2024-04-11 | PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds | Weisheng Xu et.al. | 2404.07495 | link |
2024-05-02 | Longitudinal Analysis and Quantitative Assessment of Child Development through Mobile Interaction | Juan Carlos Ruiz-Garcia et.al. | 2404.06919 | link |
2024-04-09 | LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks | Jianlang Chen et.al. | 2404.06247 | link |
2024-04-08 | Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction | Umberto Albertin et.al. | 2404.05351 | null |
2024-03-29 | Context-Aware Integration of Language and Visual References for Natural Language Tracking | Yanyan Shao et.al. | 2403.19975 | null |
2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | Liangyu Xu et.al. | 2403.18238 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu et.al. | 2403.17651 | null |
2024-03-29 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
2024-03-25 | Multi-attention Associate Prediction Network for Visual Tracking | Xinglong Sun et.al. | 2403.16395 | null |
2024-03-28 | SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Xiaojun Hou et.al. | 2403.16002 | link |
2024-03-23 | Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking | Shaoyu Sun et.al. | 2403.15831 | null |
2024-03-19 | TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO | Chaoran Xiong et.al. | 2403.12504 | null |
2024-03-18 | Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model | Jan Krejčí et.al. | 2403.11978 | null |
2024-03-16 | A Spectrum-based Image Denoising Method with Edge Feature Enhancement | Peter Luvton et.al. | 2403.11036 | null |
2024-03-15 | Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers | Jinxia Xie et.al. | 2403.10574 | null |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
2024-02-27 | ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking | Yushan Han et.al. | 2403.07914 | null |
2024-04-03 | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline | Xiao Wang et.al. | 2403.05839 | link |
2024-03-08 | Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Liting Lin et.al. | 2403.05231 | link |
2024-03-08 | Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy | Yuelin Zhang et.al. | 2403.05146 | link |
2024-03-06 | VastTrack: Vast Category Visual Object Tracking | Liang Peng et.al. | 2403.03493 | link |
2024-02-28 | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | Zhewei Wu et.al. | 2402.17976 | null |
2024-02-26 | SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking | Yu Lin et.al. | 2402.16249 | link |
2024-02-26 | Reading Relevant Feature from Global Representation Memory for Visual Object Tracking | Xinyu Zhou et.al. | 2402.14392 | null |
2024-02-13 | Optimized Information Flow for Transformer Tracking | Janani Kugarajeevan et.al. | 2402.08195 | link |
2024-02-07 | BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision | Xin Zhao et.al. | 2402.04519 | null |
2024-02-04 | Spatio-temporal Prompting Network for Robust Video Feature Extraction | Guanxiong Sun et.al. | 2402.02574 | link |
2024-01-24 | Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region | Shengjing Tian et.al. | 2401.13285 | null |
2024-01-23 | Correlation-Embedded Transformer Tracking: A Single-Branch Framework | Fei Xie et.al. | 2401.12743 | link |
2024-01-20 | Unifying Visual and Vision-Language Tracking via Contrastive Learning | Yinchao Ma et.al. | 2401.11228 | link |
2024-01-20 | Towards Category Unification of 3D Single Object Tracking on Point Clouds | Jiahao Nie et.al. | 2401.11204 | null |
2024-01-18 | Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking | Amir M. Mansourian et.al. | 2401.09942 | null |
2024-01-12 | Dense Optical Flow Estimation Using Sparse Regularizers from Reduced Measurements | Muhammad Wasim Nawaz et.al. | 2401.06396 | null |
2024-01-18 | Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots | Immanuel Ampomah Mensah et.al. | 2401.04650 | null |
2024-01-06 | Explicit Visual Prompts for Visual Object Tracking | Liangtao Shi et.al. | 2401.03142 | link |
2024-01-03 | ODTrack: Online Dense Temporal Token Learning for Visual Tracking | Yaozong Zheng et.al. | 2401.01686 | link |
2023-12-27 | X Modality Assisting RGBT Object Tracking | Zhaisheng Ding et.al. | 2312.17273 | null |
2023-12-22 | Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset | Lei Liu et.al. | 2312.14446 | link |
2023-12-18 | Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking | Shihao Feng et.al. | 2312.11051 | link |
2023-12-17 | Robust 3D Tracking with Quality-Aware Shape Completion | Jingwen Zhang et.al. | 2312.10608 | null |
2023-12-15 | Tracking Skiers from the Top to the Bottom | Matteo Dunnhofer et.al. | 2312.09723 | null |
2023-12-11 | M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking | Jiaming Liu et.al. | 2312.06117 | link |
2023-12-07 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao et.al. | 2312.04117 | link |
2024-02-19 | Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking | Jiawei Ge et.al. | 2311.17085 | null |
2023-11-21 | Visual tracking brain computer interface | Changxing Huang et.al. | 2311.12592 | null |
2024-01-10 | ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers | Edison P. Velasco Sánchez et.al. | 2311.07268 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-10 | Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences | Alan Nawzad Amin et.al. | 2412.07763 | link |
2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755 | null |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746 | null |
2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743 | null |
2024-12-10 | AI Expands Scientists' Impact but Contracts Science's Focus | Qianyue Hao et.al. | 2412.07727 | null |
2024-12-10 | Granite Guardian | Inkit Padhi et.al. | 2412.07724 | link |
2024-12-10 | Leveraging Content and Context Cues for Low-Light Image Enhancement | Igor Morawski et.al. | 2412.07693 | null |
2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689 | link |
2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687 | null |
2024-12-10 | TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | Alfredo Garrachón Ruiz et.al. | 2412.07682 | null |
2024-12-10 | RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models | Greg Heinrich et.al. | 2412.07679 | null |
2024-12-10 | Ask Humans or AI? Exploring Their Roles in Visualization Troubleshooting | Shuyu Shen et.al. | 2412.07673 | null |
2024-12-10 | FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks | Bocheng Chen et.al. | 2412.07672 | null |
2024-12-10 | Automating Business Intelligence Requirements with Generative AI and Semantic Search | Nimrod Busany et.al. | 2412.07668 | null |
2024-12-10 | Searching for Structure: Investigating Emergent Communication with Large Language Models | Tom Kouwenhoven et.al. | 2412.07646 | null |
2024-12-10 | TrojanWhisper: Evaluating Pre-trained LLMs to Detect and Localize Hardware Trojans | Md Omar Faruque et.al. | 2412.07636 | null |
2024-12-10 | ChocoLlama: Lessons Learned From Teaching Llamas Dutch | Matthieu Meeus et.al. | 2412.07633 | null |
2024-12-10 | Piece of Table: A Divide-and-Conquer Approach for Selecting Sub-Tables in Table Question Answering | Wonjin Lee et.al. | 2412.07629 | null |
2024-12-10 | OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Linke Ouyang et.al. | 2412.07626 | link |
2024-12-10 | DRUM: Learning Demonstration Retriever for Large MUlti-modal Models | Ellen Yi-Ge et.al. | 2412.07619 | null |
2024-12-09 | Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models | Yi-Lun Lee et.al. | 2412.06775 | link |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774 | null |
2024-12-09 | Training Large Language Models to Reason in a Continuous Latent Space | Shibo Hao et.al. | 2412.06769 | null |
2024-12-09 | Ranking-aware adapter for text-driven image ordering with CLIP | Wei-Hsiang Yu et.al. | 2412.06760 | link |
2024-12-09 | Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | Joy Krishan Das et.al. | 2412.06757 | null |
2024-12-09 | Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models | Neel Jain et.al. | 2412.06748 | null |
2024-12-09 | ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities | Adhiraj Ghosh et.al. | 2412.06745 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738 | null |
2024-12-09 | AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark | Lan Li et.al. | 2412.06724 | null |
2024-12-09 | How to Merge Your Multimodal Models Over Time? | Sebastian Dziadzio et.al. | 2412.06712 | null |
2024-12-09 | OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions | Yi-Kai Zhang et.al. | 2412.06693 | null |
2024-12-09 | Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach | Weichao Xu et.al. | 2412.06684 | null |
2024-12-09 | Toward LLM-Agent-Based Modeling of Transportation Systems: A Conceptual Framework | Tianming Liu et.al. | 2412.06681 | null |
2024-12-09 | I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Roi Cohen et.al. | 2412.06676 | null |
2024-12-09 | ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | Chunwei Wang et.al. | 2412.06673 | null |
2024-12-09 | MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models | Shansong Liu et.al. | 2412.06660 | null |
2024-12-09 | Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben | Rainer Mühlhoff et.al. | 2412.06651 | null |
2024-12-09 | The Narrow Gate: Localized Image-Text Communication in Vision-Language Models | Alessandro Serra et.al. | 2412.06646 | null |
2024-12-09 | MAVias: Mitigate any Visual Bias | Ioannis Sarridis et.al. | 2412.06632 | null |
2024-12-09 | Copyright-Protected Language Generation via Adaptive Model Fusion | Javier Abad et.al. | 2412.06619 | link |
2024-12-06 | Birth and Death of a Rose | Chen Geng et.al. | 2412.05278 | null |
2024-12-06 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Hyesu Lim et.al. | 2412.05276 | link |
2024-12-06 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Zhe Chen et.al. | 2412.05271 | null |
2024-12-06 | APOLLO: SGD-like Memory, AdamW-level Performance | Hanqing Zhu et.al. | 2412.05270 | null |
2024-12-06 | Uncertainty Quantification for Transformer Models for Dark-Pattern Detection | Javier Muñoz et.al. | 2412.05251 | null |
2024-12-06 | Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization | Luca Masserano et.al. | 2412.05244 | null |
2024-12-06 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Xiaohui Chen et.al. | 2412.05243 | null |
2024-12-06 | MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | Jarvis Guo et.al. | 2412.05237 | null |
2024-12-06 | BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits | Wazib Ansar et.al. | 2412.05225 | null |
2024-12-06 | 100% Hallucination Elimination Using Acurai | Michael C. Wood et.al. | 2412.05223 | null |
2024-12-06 | Evaluating and Aligning CodeLLMs on Human Preference | Jian Yang et.al. | 2412.05210 | null |
2024-12-06 | A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges | Aditi Singh et.al. | 2412.05208 | null |
2024-12-06 | Are Frontier Large Language Models Suitable for Q&A in Science Centres? | Jacob Watson et.al. | 2412.05200 | null |
2024-12-06 | SurgBox: Agent-Driven Operating Room Sandbox with Surgery Copilot | Jinlin Wu et.al. | 2412.05187 | link |
2024-12-06 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | Lishuai Gao et.al. | 2412.05185 | link |
2024-12-06 | QueEn: A Large Language Model for Quechua-English Translation | Junhao Chen et.al. | 2412.05184 | null |
2024-12-06 | Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models | Kuofeng Gao et.al. | 2412.05167 | null |
2024-12-06 | Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation | Manish Bhattarai et.al. | 2412.05159 | null |
2024-12-06 | Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies | Recep Firat Cekinel et.al. | 2412.05155 | null |
2024-12-06 | A text-to-tabular approach to generate synthetic patient data using LLMs | Margaux Tornqvist et.al. | 2412.05153 | null |
2024-12-05 | Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Luca Bartolomei et.al. | 2412.04472 | link |
2024-12-05 | NVILA: Efficient Frontier Visual Language Models | Zhijian Liu et.al. | 2412.04468 | null |
2024-12-05 | VisionZip: Longer is Better but Not Necessary in Vision Language Models | Senqiao Yang et.al. | 2412.04467 | link |
2024-12-05 | Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection | Enshen Zhou et.al. | 2412.04455 | null |
2024-12-05 | p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Jun Zhang et.al. | 2412.04449 | link |
2024-12-05 | EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios | Lu Qiu et.al. | 2412.04447 | null |
2024-12-05 | DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Yizhuo Li et.al. | 2412.04446 | null |
2024-12-05 | Moto: Latent Motion Token as the Bridging Language for Robot Manipulation | Yi Chen et.al. | 2412.04445 | null |
2024-12-05 | Towards Real-Time Open-Vocabulary Video Instance Segmentation | Bin Yan et.al. | 2412.04434 | null |
2024-12-05 | Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Yuying Ge et.al. | 2412.04432 | link |
2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429 | link |
2024-12-05 | Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | Jiuhai Chen et.al. | 2412.04424 | link |
2024-12-05 | Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation | Xuying Li et.al. | 2412.04415 | null |
2024-12-05 | Establishing Task Scaling Laws via Compute-Efficient Model Ladders | Akshita Bhagia et.al. | 2412.04403 | null |
2024-12-05 | SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | Rong Li et.al. | 2412.04383 | null |
2024-12-05 | Discriminative Fine-tuning of LVLMs | Yassine Ouali et.al. | 2412.04378 | null |
2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368 | null |
2024-12-05 | Approximate Top- | Oscar Key et.al. | 2412.04358 | null |
2024-12-05 | Retrieval-Augmented Machine Translation with Unstructured Knowledge | Jiaan Wang et.al. | 2412.04342 | link |
2024-12-05 | Liquid: Language Models are Scalable Multi-modal Generators | Junfeng Wu et.al. | 2412.04332 | null |
2024-12-04 | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | Xinyi Mou et.al. | 2412.03563 | link |
2024-12-04 | FLAIR: VLM with Fine-grained Language-informed Image Representations | Rui Xiao et.al. | 2412.03561 | link |
2024-12-04 | Best-of-N Jailbreaking | John Hughes et.al. | 2412.03556 | link |
2024-12-04 | PaliGemma 2: A Family of Versatile VLMs for Transfer | Andreas Steiner et.al. | 2412.03555 | null |
2024-12-04 | SPICE: Smart Projection Interface for Cooking Enhancement | Vera Prohaska et.al. | 2412.03551 | null |
2024-12-04 | Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Mahtab Bigverdi et.al. | 2412.03548 | null |
2024-12-04 | Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models | Natalie Mackraz et.al. | 2412.03537 | null |
2024-12-04 | A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences | Gabriel Lino Garcia et.al. | 2412.03531 | null |
2024-12-04 | FANAL -- Financial Activity News Alerting Language Modeling Framework | Urjitkumar Patel et.al. | 2412.03527 | null |
2024-12-04 | You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks? | Dominic Lohr et.al. | 2412.03516 | null |
2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512 | null |
2024-12-04 | Tight PAC-Bayesian Risk Certificates for Contrastive Learning | Anna van Elst et.al. | 2412.03486 | link |
2024-12-04 | Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Neale Ratzlaff et.al. | 2412.03467 | null |
2024-12-04 | Pre-trained Multiple Latent Variable Generative Models are good defenders against Adversarial Attacks | Dario Serez et.al. | 2412.03453 | link |
2024-12-04 | From Words to Workflows: Automating Business Processes | Laura Minkova et.al. | 2412.03446 | null |
2024-12-04 | Assessing Foundation Models' Transferability to Physiological Signals in Precision Medicine | Matthias Christenson et.al. | 2412.03427 | null |
2024-12-04 | PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation | Ao Wang et.al. | 2412.03409 | link |
2024-12-04 | RedStone: Curating General, Code, Math, and QA Data for Large Language Models | Yaoyao Chang et.al. | 2412.03398 | null |
2024-12-04 | Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs | Ge Zheng et.al. | 2412.03390 | null |
2024-12-04 | WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis | Chengwei Hu et.al. | 2412.03359 | null |
2024-12-03 | T-REG: Preference Optimization with Token-Level Reward Regularization | Wenxuan Zhou et.al. | 2412.02685 | null |
2024-12-03 | Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | Yuda Song et.al. | 2412.02674 | null |
2024-12-03 | LLM-Enhanced Path Planning: Safe and Efficient Autonomous Navigation with Instructional Inputs | Pranav Doma et.al. | 2412.02655 | null |
2024-12-03 | Time-Reversal Provides Unsupervised Feedback to LLMs | Yerram Varun et.al. | 2412.02626 | null |
2024-12-03 | Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions | Kai Sun et.al. | 2412.02621 | null |
2024-12-03 | Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Hiroki Furuta et.al. | 2412.02617 | null |
2024-12-03 | GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Aohan Zeng et.al. | 2412.02612 | link |
2024-12-03 | AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Kaixiong Gong et.al. | 2412.02611 | null |
2024-12-03 | Interpretable Company Similarity with Sparse Autoencoders | Marco Molinari et.al. | 2412.02605 | null |
2024-12-03 | CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Abhas Kumar et.al. | 2412.02602 | null |
2024-12-03 | PrefixLLM: LLM-aided Prefix Circuit Design | Weihua Xiao et.al. | 2412.02594 | null |
2024-12-03 | OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | Junyuan Zhang et.al. | 2412.02592 | link |
2024-12-03 | Explainable CTR Prediction via LLM Reasoning | Xiaohan Yu et.al. | 2412.02588 | null |
2024-12-03 | Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Chenyang Liu et.al. | 2412.02573 | link |
2024-12-03 | SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Joongwon Chae et.al. | 2412.02565 | link |
2024-12-03 | Semantic Tokens in Retrieval Augmented Generation | Joel Suro et.al. | 2412.02563 | null |
2024-12-03 | Patent-CR: A Dataset for Patent Claim Revision | Lekang Jiang et.al. | 2412.02549 | null |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531 | null |
2024-12-03 | LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data | Hanyu Zhang et.al. | 2412.02525 | null |
2024-12-03 | OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | Caixin Kang et.al. | 2412.02479 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951 | link |
2024-12-02 | Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Zicheng Lin et.al. | 2411.19943 | null |
2024-11-29 | VLSBench: Unveiling Visual Leakage in Multimodal Safety | Xuhao Hu et.al. | 2411.19939 | null |
2024-11-29 | On Domain-Specific Post-Training for Multimodal Large Language Models | Daixuan Cheng et.al. | 2411.19930 | null |
2024-11-29 | SIMS: Simulating Human-Scene Interactions with Real World Script Planning | Wenjia Wang et.al. | 2411.19921 | null |
2024-11-29 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | Chang Won Lee et.al. | 2411.19888 | null |
2024-11-29 | PDDLFuse: A Tool for Generating Diverse Planning Domains | Vedant Khandelwal et.al. | 2411.19886 | null |
2024-12-02 | LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states | Luis Ibanez-Lissen et.al. | 2411.19876 | null |
2024-11-29 | DeMo: Decoupled Momentum Optimization | Bowen Peng et.al. | 2411.19870 | link |
2024-11-29 | AIDetx: a compression-based method for identification of machine-learning generated text | Leonardo Almeida et.al. | 2411.19869 | link |
2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865 | null |
2024-11-29 | Cross-Domain Recommendation Meets Large Language Models | Ajay Krishna Vajjala et.al. | 2411.19862 | link |
2024-11-29 | What fifty-one years of Linguistics and Artificial Intelligence research tell us about their correlation: A scientometric review | Mohammed Q. Shormani et.al. | 2411.19858 | null |
2024-11-29 | Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation | Dimosthenis Antypas et.al. | 2411.19832 | null |
2024-11-29 | Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation | Robin D. Pesl et.al. | 2411.19804 | null |
2024-11-29 | INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge | Angelika Romanou et.al. | 2411.19799 | null |
2024-11-29 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | Yiming Wu et.al. | 2411.19786 | null |
2024-11-29 | PerLA: Perceptive 3D Language Assistant | Guofeng Mei et.al. | 2411.19774 | null |
2024-11-29 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Tiantian Geng et.al. | 2411.19772 | null |
2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et.al. | 2411.19757 | link |
2024-11-27 | Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation | Yueru Jia et.al. | 2411.18623 | null |
2024-11-27 | Cross-modal Information Flow in Multimodal Large Language Models | Zhi Zhang et.al. | 2411.18620 | null |
2024-11-27 | Diffusion Self-Distillation for Zero-Shot Customized Image Generation | Shengqu Cai et.al. | 2411.18616 | null |
2024-11-27 | Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation | Nurshat Fateh Ali et.al. | 2411.18583 | null |
2024-11-27 | Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning | Omkar Khade et.al. | 2411.18571 | null |
2024-11-27 | A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models | Rong Wang et.al. | 2411.18564 | null |
2024-11-27 | DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation | Zhixuan Liang et.al. | 2411.18562 | null |
2024-11-27 | Retrofitting (Large) Language Models with Dynamic Tokenization | Darius Feher et.al. | 2411.18553 | null |
2024-11-27 | AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans | Dillon Loh et.al. | 2411.18539 | link |
2024-11-27 | Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models | Minhyeok Lee et.al. | 2411.18530 | link |
2024-11-27 | LLM-ABBA: Understand time series via symbolic approximation | Erin Carson et.al. | 2411.18506 | null |
2024-11-27 | GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Pengfei Zhou et.al. | 2411.18499 | null |
2024-11-27 | Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Jinyang Wu et.al. | 2411.18478 | null |
2024-11-27 | Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding | Ziyin Zhang et.al. | 2411.18462 | link |
2024-11-27 | Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator | Frederic Kirstein et.al. | 2411.18444 | null |
2024-11-27 | An AI-Assisted Multi-Agent Dual Dialogue System to Support Mental Health Care Providers | Onno P. Kampman et.al. | 2411.18429 | null |
2024-11-27 | FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Ao Shen et.al. | 2411.18424 | null |
2024-11-27 | Politicians vs ChatGPT. A study of presuppositions in French and Italian political communication | Davide Garassino et.al. | 2411.18403 | null |
2024-11-27 | Topic Modeling and Sentiment Analysis on Japanese Online Media's Coverage of Nuclear Energy | Yifan Sun et.al. | 2411.18383 | null |
2024-11-27 | ChatGPT as speechwriter for the French presidents | Dominique Labbé et.al. | 2411.18382 | null |
2024-11-26 | Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats | Jiaxin Wen et.al. | 2411.17693 | null |
2024-11-26 | Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens | Xu Ouyang et.al. | 2411.17691 | null |
2024-11-26 | Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | Yuhang Han et.al. | 2411.17686 | null |
2024-11-26 | Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | Zhu Xu et.al. | 2411.17679 | link |
2024-11-26 | Instance-Aware Graph Prompt Learning | Jiazheng Li et.al. | 2411.17676 | null |
2024-11-26 | Push the Limit of Multi-modal Emotion Recognition by Prompting LLMs with Receptive-Field-Aware Attention Weighting | Liyun Zhang et.al. | 2411.17674 | null |
2024-11-26 | SketchAgent: Language-Driven Sequential Sketch Generation | Yael Vinker et.al. | 2411.17673 | null |
2024-11-26 | Synthetic Data Generation with LLM for Improved Depression Prediction | Andrea Kang et.al. | 2411.17672 | null |
2024-11-26 | How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations | Hyunji Lee et.al. | 2411.17666 | null |
2024-11-26 | Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism | Yi-Chien Lin et.al. | 2411.17651 | null |
2024-11-26 | On Limitations of LLM as Annotator for Low Resource Languages | Suramya Jadhav et.al. | 2411.17637 | null |
2024-11-26 | MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | Harsh Singh et.al. | 2411.17636 | null |
2024-11-26 | Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining | Jaewoong Lee et.al. | 2411.17625 | null |
2024-11-26 | Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Aohan Zeng et.al. | 2411.17607 | null |
2024-11-26 | HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Cong Wei et.al. | 2411.17606 | link |
2024-11-26 | Making History Readable | Bipasha Banerjee et.al. | 2411.17600 | null |
2024-11-26 | Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals | William A. Ingram et.al. | 2411.17598 | null |
2024-11-26 | Can artificial intelligence predict clinical trial outcomes? | Shuyi Jin et.al. | 2411.17595 | null |
2024-11-26 | RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation | Lakshmi Likhitha Mankali et.al. | 2411.17569 | null |
2024-11-26 | Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey | Jiayi Kuang et.al. | 2411.17558 | null |
2024-11-25 | Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? | Sohee Yang et.al. | 2411.16679 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668 | null |
2024-11-25 | DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Zun Wang et.al. | 2411.16657 | null |
2024-11-25 | Self-Generated Critiques Boost Reward Modeling for Language Models | Yue Yu et.al. | 2411.16646 | null |
2024-11-25 | Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective | Jean Marie Tshimula et.al. | 2411.16642 | null |
2024-11-25 | StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training | Kaustubh Ponkshe et.al. | 2411.16618 | null |
2024-11-25 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Ronghuan Wu et.al. | 2411.16602 | null |
2024-11-25 | From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | Dawei Li et.al. | 2411.16594 | link |
2024-11-25 | Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles | Klinsmann Agyei et.al. | 2411.16587 | null |
2024-11-25 | MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series | Aaron Wheeler et.al. | 2411.16585 | link |
2024-11-25 | Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision | Zhiheng Xi et.al. | 2411.16579 | null |
2024-11-25 | Predictive Power of LLMs in Financial Markets | Jerick Shi et.al. | 2411.16569 | null |
2024-11-25 | EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | Shahriyar Zaman Ridoy et.al. | 2411.16561 | null |
2024-11-25 | Generating Out-Of-Distribution Scenarios Using Language Models | Erfan Aasi et.al. | 2411.16554 | null |
2024-11-25 | Representation Collapsing Problems in Vector Quantization | Wenhao Zhao et.al. | 2411.16550 | null |
2024-11-25 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Chan Hee Song et.al. | 2411.16537 | null |
2024-11-25 | Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings | Carolin M. Schuster et.al. | 2411.16527 | null |
2024-11-25 | Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency | Jerry Yao-Chieh Hu et.al. | 2411.16525 | null |
2024-11-25 | LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Steven Song et.al. | 2411.16523 | null |
2024-11-25 | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis | Boming Miao et.al. | 2411.16503 | null |
2024-11-22 | Measuring Bullshit in the Language Games played by ChatGPT | Alessandro Trevisan et.al. | 2411.15129 | null |
2024-11-22 | Health AI Developer Foundations | Atilla P. Kiraly et.al. | 2411.15128 | null |
2024-11-22 | TÜLU 3: Pushing Frontiers in Open Language Model Post-Training | Nathan Lambert et.al. | 2411.15124 | link |
2024-11-22 | RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | Hjalmar Wijk et.al. | 2411.15114 | link |
2024-11-22 | Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion | Samarth N Ramesh et.al. | 2411.15113 | null |
2024-11-22 | AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | Fengyuan Liu et.al. | 2411.15102 | link |
2024-11-22 | What You See is Not What You Get: Neural Partial Differential Equations and The Illusion of Learning | Arvind Mohan et.al. | 2411.15101 | null |
2024-11-22 | XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Yixin Dong et.al. | 2411.15100 | null |
2024-11-22 | Context-Aware Multimodal Pretraining | Karsten Roth et.al. | 2411.15099 | null |
2024-11-22 | mR | Tao Zhang et.al. | 2411.15041 | null |
2024-11-22 | One to rule them all: natural language to bind communication, perception and action | Simone Colombani et.al. | 2411.15033 | null |
2024-11-22 | Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot | Simone Colombani et.al. | 2411.15027 | null |
2024-11-22 | DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | Keda Tao et.al. | 2411.15024 | null |
2024-11-22 | FTA generation using GenAI with an Autonomy sensor Usecase | Sneha Sudhir Shetiya et.al. | 2411.15007 | null |
2024-11-22 | ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | Junhong Shen et.al. | 2411.15004 | link |
2024-11-22 | Generative AI may backfire for counterspeech | Dominik Bär et.al. | 2411.14986 | null |
2024-11-22 | Exploring Foundation Models Fine-Tuning for Cytology Classification | Manon Dausort et.al. | 2411.14975 | link |
2024-11-22 | Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models | Alec Wright et.al. | 2411.14972 | link |
2024-11-22 | SwissADT: An Audio Description Translation System for Swiss Languages | Lukas Fischer et.al. | 2411.14967 | null |
2024-11-22 | LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement | Jieming Bian et.al. | 2411.14961 | null |
2024-11-21 | Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2411.14432 | link |
2024-11-21 | Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation | Zhuoman Liu et.al. | 2411.14423 | null |
2024-11-21 | From RNNs to Foundation Models: An Empirical Study on Commercial Building Energy Consumption | Shourya Bose et.al. | 2411.14421 | null |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401 | null |
2024-11-21 | Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings | Aaron Zheng et.al. | 2411.14398 | null |
2024-11-21 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema et.al. | 2411.14343 | link |
2024-11-21 | SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching | Arjun P S et.al. | 2411.14322 | null |
2024-11-21 | Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training | Zheheng Luo et.al. | 2411.14318 | null |
2024-11-21 | Automated Generation of Code Debugging Exercises | Victor-Alexandru Pădurean et.al. | 2411.14303 | null |
2024-11-21 | Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams | Jitendra Bhandari et.al. | 2411.14299 | link |
2024-11-21 | EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild | Yumeng Liu et.al. | 2411.14280 | null |
2024-11-21 | Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance | Haozhe Zhao et.al. | 2411.14279 | null |
2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272 | link |
2024-11-21 | Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Ernests Lavrinovics et.al. | 2411.14258 | null |
2024-11-21 | Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando et.al. | 2411.14257 | null |
2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256 | null |
2024-11-21 | Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification | Junhua Liu et.al. | 2411.14252 | null |
2024-11-21 | Natural Language Reinforcement Learning | Xidong Feng et.al. | 2411.14251 | null |
2024-11-21 | FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression | Yuke Zhu et.al. | 2411.14228 | null |
2024-11-21 | Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data | Paul Fergus et.al. | 2411.14219 | null |
2024-11-20 | Find Any Part in 3D | Ziqi Ma et.al. | 2411.13550 | null |
2024-11-20 | SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs | Shirley Kokane et.al. | 2411.13547 | null |
2024-11-20 | Promoting User Data Autonomy During the Dissolution of a Monopolistic Firm | Rushabh Solanki et.al. | 2411.13546 | null |
2024-11-20 | BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Davide Paglieri et.al. | 2411.13543 | null |
2024-11-20 | Metacognition for Unknown Situations and Environments (MUSE) | Rodolfo Valiente et.al. | 2411.13537 | null |
2024-11-20 | Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse | S. Chapagain et.al. | 2411.13534 | link |
2024-11-20 | Advancing Complex Medical Communication in Arabic with Sporo AraSum: Surpassing Existing Large Language Models | Chanseo Lee et.al. | 2411.13518 | null |
2024-11-20 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin et.al. | 2411.13504 | link |
2024-11-20 | Neural machine translation of seismic waves for petrophysical inversion | José Cunha Teixeira et.al. | 2411.13491 | null |
2024-11-20 | Utilizing Large Language Models to Synthesize Product Desirability Datasets | John D. Hastings et.al. | 2411.13485 | null |
2024-11-20 | PatentEdits: Framing Patent Novelty as Textual Entailment | Ryan Lee et.al. | 2411.13477 | null |
2024-11-20 | When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | Haonan Wang et.al. | 2411.13476 | link |
2024-11-20 | SoK: A Systems Perspective on Compound AI Threats and Countermeasures | Sarbartha Banerjee et.al. | 2411.13459 | null |
2024-11-20 | LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models | Salvatore Mario Carta et.al. | 2411.13453 | null |
2024-11-20 | AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | Gaurav Verma et.al. | 2411.13451 | null |
2024-11-20 | WaterPark: A Robustness Assessment of Language Model Watermarking | Jiacheng Liang et.al. | 2411.13425 | link |
2024-11-20 | Unleashing the Power of Large Language Models for Group POI Recommendations | Jing Long et.al. | 2411.13415 | null |
2024-11-20 | A Survey On Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback | Alireza Rashidi Laleh et.al. | 2411.13410 | null |
2024-11-20 | Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology | Muhammad Sharif et.al. | 2411.13409 | null |
2024-11-20 | Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese | Dat Van-Thanh Nguyen et.al. | 2411.13407 | null |
2024-11-19 | ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models | Salma Kharrat et.al. | 2411.12736 | link |
2024-11-19 | Information Theory of Meaningful Communication | Doron Sivan et.al. | 2411.12728 | null |
2024-11-19 | CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Zhehan Kan et.al. | 2411.12713 | null |
2024-11-19 | Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs | Ahmed Akib Jawad Karim et.al. | 2411.12712 | null |
2024-11-19 | Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT? | Ahmed Akib Jawad Karim et.al. | 2411.12703 | null |
2024-11-19 | When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations | Huaizhi Ge et.al. | 2411.12701 | null |
2024-11-19 | SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference | Jiho Shin et.al. | 2411.12692 | null |
2024-11-19 | Neurosymbolic Graph Enrichment for Grounded World Models | Stefano De Giorgis et.al. | 2411.12671 | null |
2024-11-19 | DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Vinay Kumar Sankarapu et.al. | 2411.12643 | link |
2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641 | null |
2024-11-19 | Provable unlearning in topic modeling and downstream tasks | Stanley Wei et.al. | 2411.12600 | null |
2024-11-19 | AdaCM | Yuanbin Man et.al. | 2411.12593 | null |
2024-11-19 | Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | Laura Ruis et.al. | 2411.12580 | link |
2024-11-19 | Large Language Models for Combinatorial Optimization of Design Structure Matrix | Shuo Jiang et.al. | 2411.12571 | null |
2024-11-19 | Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | Riccardo Grazzi et.al. | 2411.12537 | link |
2024-11-19 | Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution | Yang Zou et.al. | 2411.12530 | link |
2024-11-19 | Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Terufumi Morishita et.al. | 2411.12498 | link |
2024-11-19 | AI Flow at the Network Edge | Jiawei Shao et.al. | 2411.12469 | null |
2024-11-19 | Guide-to-Explain for Controllable Summarization | Sangwon Ryu et.al. | 2411.12460 | null |
2024-11-19 | Neon: News Entity-Interaction Extraction for Enhanced Question Answering | Sneha Singhania et.al. | 2411.12449 | null |
2024-11-18 | Bi-Mamba: Towards Accurate 1-Bit State Space Models | Shengkun Tang et.al. | 2411.11843 | null |
2024-11-18 | Tackling prediction tasks in relational databases with LLMs | Marek Wydmuch et.al. | 2411.11829 | null |
2024-11-18 | Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods | Egor Kovalev et.al. | 2411.11795 | null |
2024-11-18 | LLM-IE: A Python Package for Generative Information Extraction with Large Language Models | Enshuo Hsu et.al. | 2411.11779 | null |
2024-11-18 | sMoRe: Enhancing Object Manipulation and Organization in Mixed Reality Spaces with LLMs and Generative AI | Yunhao Xing et.al. | 2411.11752 | null |
2024-11-18 | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen et.al. | 2411.11745 | link |
2024-11-18 | Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment | Allison Huang et.al. | 2411.11731 | link |
2024-11-18 | Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation | Mingchao Qi et.al. | 2411.11714 | link |
2024-11-18 | FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models | Tao Fan et.al. | 2411.11707 | null |
2024-11-18 | MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Ruichuan An et.al. | 2411.11706 | link |
2024-11-18 | Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search | Jinhao Jiang et.al. | 2411.11694 | null |
2024-11-18 | TrojanRobot: Backdoor Attacks Against Robotic Manipulation in the Physical World | Xianlong Wang et.al. | 2411.11683 | null |
2024-11-18 | PSPO: An Effective Process-supervised Policy Optimization for Reasoning Alignment* | Jiawei Li et.al. | 2411.11681 | link |
2024-11-18 | Dissecting Misalignment of Multimodal Large Language Models via Influence Function | Lijie Hu et.al. | 2411.11667 | null |
2024-11-18 | TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection | Mengxuan Li et.al. | 2411.11641 | link |
2024-11-18 | Chapter 7 Review of Data-Driven Generative AI Models for Knowledge Extraction from Scientific Literature in Healthcare | Leon Kopitar et.al. | 2411.11635 | null |
2024-11-18 | Signaling and Social Learning in Swarms of Robots | Leo Cazenille et.al. | 2411.11616 | null |
2024-11-18 | Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining | Danny Barash et.al. | 2411.11613 | null |
2024-11-18 | VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation | Bangguo Yu et.al. | 2411.11609 | null |
2024-11-18 | Exploring LLMs for Verifying Technical System Specifications Against Requirements | Lasse M. Reinpold et.al. | 2411.11582 | null |
2024-11-15 | VeriGraph: Scene Graphs for Execution Verifiable Robot Planning | Daniel Ekpo et.al. | 2411.10446 | null |
2024-11-15 | Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | Weiyun Wang et.al. | 2411.10442 | null |
2024-11-15 | LLaVA-o1: Let Vision Language Models Reason Step-by-Step | Guowei Xu et.al. | 2411.10440 | link |
2024-11-15 | MARS: Unleashing the Power of Variance Reduction for Training Large Models | Huizhuo Yuan et.al. | 2411.10438 | link |
2024-11-15 | Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization | Yuhan Fu et.al. | 2411.10436 | null |
2024-11-15 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi et.al. | 2411.10422 | link |
2024-11-15 | On the Foundation Model for Cardiac MRI Reconstruction | Chi Zhang et.al. | 2411.10403 | null |
2024-11-15 | Interactive Cycle Model -- The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses | Libo Wang et.al. | 2411.10362 | null |
2024-11-15 | Bias Unveiled: Investigating Social Bias in LLM-Generated Code | Lin Ling et.al. | 2411.10351 | null |
2024-11-15 | Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images | Ammar Qammaz et.al. | 2411.10334 | null |
2024-11-15 | Number it: Temporal Grounding Videos like Flipping Manga | Yongliang Wu et.al. | 2411.10332 | link |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309 | link |
2024-11-15 | Static network structure cannot stabilize cooperation among Large Language Model agents | Jin Han et.al. | 2411.10294 | null |
2024-11-15 | Scaling Law for Post-training after Model Pruning | Xiaodong Chen et.al. | 2411.10272 | null |
2024-11-15 | Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Jingru Yang et.al. | 2411.10252 | null |
2024-11-15 | Measuring Non-Adversarial Reproduction of Training Data in Large Language Models | Michael Aerni et.al. | 2411.10242 | null |
2024-11-15 | Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability | J. Bieniek et.al. | 2411.10234 | null |
2024-11-15 | An Empirical Study on LLM-based Agents for Automated Bug Fixing | Xiangxin Meng et.al. | 2411.10213 | null |
2024-11-15 | Agentic LLMs in the Supply Chain: Towards Autonomous Multi-Agent Consensus-Seeking | Valeria Jannelli et.al. | 2411.10184 | null |
2024-11-15 | CART: Compositional Auto-Regressive Transformer for Image Generation | Siddharth Roheda et.al. | 2411.10180 | null |
2024-11-14 | MagicQuill: An Intelligent Interactive Image Editing System | Zichen Liu et.al. | 2411.09703 | null |
2024-11-14 | Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models | Wei Wang et.al. | 2411.09691 | null |
2024-11-14 | Squeezed Attention: Accelerating Long Context Length LLM Inference | Coleman Hooper et.al. | 2411.09688 | link |
2024-11-14 | Adaptive Decoding via Latent Preference Optimization | Shehzaad Dhuliawala et.al. | 2411.09661 | null |
2024-11-14 | On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse | Alkis Kalavasis et.al. | 2411.09642 | null |
2024-11-14 | Local deployment of large-scale music AI models on commodity hardware | Xun Zhou et.al. | 2411.09625 | null |
2024-11-14 | PTR: Precision-Driven Tool Recommendation for Large Language Models | Hang Gao et.al. | 2411.09613 | null |
2024-11-14 | The Moral Foundations Weibo Corpus | Renjie Cao et.al. | 2411.09612 | null |
2024-11-14 | Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework | Ronak Pradeep et.al. | 2411.09607 | null |
2024-11-14 | Accelerating Knowledge Graph and Ontology Engineering with Large Language Models | Cogan Shimizu et.al. | 2411.09601 | null |
2024-11-14 | Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images | Bipasha Kundu et.al. | 2411.09598 | null |
2024-11-14 | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | Zhengyi Wang et.al. | 2411.09595 | null |
2024-11-14 | Adopting RAG for LLM-Aided Future Vehicle Design | Vahid Zolfaghari et.al. | 2411.09590 | null |
2024-11-14 | BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency | Akari Haga et.al. | 2411.09587 | null |
2024-11-14 | Software Performance Engineering for Foundation Model-Powered Software (FMware) | Haoxiang Zhang et.al. | 2411.09580 | null |
2024-11-14 | Piecing It All Together: Verifying Multi-Hop Multimodal Claims | Haoran Wang et.al. | 2411.09547 | null |
2024-11-14 | A Practical Guide to Fine-tuning Language Models with Limited Data | Márton Szép et.al. | 2411.09539 | null |
2024-11-14 | Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents | Yuyou Gan et.al. | 2411.09523 | null |
2024-11-14 | Communication Compression for Tensor Parallel LLM Inference | Jan Hansen-Palmus et.al. | 2411.09510 | null |
2024-11-14 | Spider: Any-to-Many Multimodal LLM | Jinxiang Lai et.al. | 2411.09439 | null |
2024-11-13 | Large Wireless Model (LWM): A Foundation Model for Wireless Channels | Sadjad Alikhani et.al. | 2411.08872 | link |
2024-11-13 | The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Daniel P. Jeong et.al. | 2411.08870 | link |
2024-11-13 | CamemBERT 2.0: A Smarter French Language Model Aged to Perfection | Wissam Antoun et.al. | 2411.08868 | null |
2024-11-13 | LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Piyush Jha et.al. | 2411.08862 | null |
2024-11-13 | Multimodal Instruction Tuning with Hybrid State Space Models | Jianing Zhou et.al. | 2411.08840 | null |
2024-11-13 | FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Tianyu Zhou et.al. | 2411.08804 | link |
2024-11-13 | Evaluating World Models with LLM for Decision Making | Chang Yang et.al. | 2411.08794 | null |
2024-11-13 | Can sparse autoencoders be used to decompose and interpret steering vectors? | Harry Mayne et.al. | 2411.08790 | link |
2024-11-13 | Sharingan: Extract User Action Sequence from Desktop Recordings | Yanting Chen et.al. | 2411.08768 | null |
2024-11-13 | Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers | Clément Dumas et.al. | 2411.08745 | link |
2024-11-13 | A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models | Dingdong Wang et.al. | 2411.08742 | null |
2024-11-13 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | Somanshu Singla et.al. | 2411.08733 | link |
2024-11-13 | Polymetis:Large Language Modeling for Multiple Material Domains | Chao Huang et.al. | 2411.08728 | null |
2024-11-13 | Voxeland: Probabilistic Instance-Aware Semantic Mapping with Evidence-based Uncertainty Quantification | Jose-Luis Matez-Bandera et.al. | 2411.08727 | link |
2024-11-13 | Theoretical Analysis of Byte-Pair Encoding | László Kozma et.al. | 2411.08671 | null |
2024-11-13 | OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances | Youqi Liao et.al. | 2411.08665 | link |
2024-11-13 | UniMat: Unifying Materials Embeddings through Multi-modal Learning | Janghoon Ock et.al. | 2411.08664 | null |
2024-11-13 | Accelerating Quasi-Static Time Series Simulations with Foundation Models | Alban Puech et.al. | 2411.08652 | null |
2024-11-13 | A System Level Performance Evaluation for Superconducting Digital Systems | Joyjit Kundu et.al. | 2411.08645 | null |
2024-11-13 | Towards Secure Intelligent O-RAN Architecture: Vulnerabilities, Threats and Promising Technical Solutions using LLMs | Mojdeh Karbalaee Motalleb et.al. | 2411.08640 | null |
2024-11-12 | Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data | Juanhui Li et.al. | 2411.08028 | null |
2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027 | null |
2024-11-12 | Language Models as Causal Effect Generators | Lucius E. J. Bynum et.al. | 2411.08019 | link |
2024-11-12 | ExpressivityArena: Can LLMs Express Information Implicitly? | Joshua Tint et.al. | 2411.08010 | null |
2024-11-12 | Can adversarial attacks by large language models be attributed? | Manuel Cebrian et.al. | 2411.08003 | null |
2024-11-12 | Derivational Morphology Reveals Analogical Generalization in Large Language Models | Valentin Hofmann et.al. | 2411.07990 | null |
2024-11-12 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | Yiyang Ma et.al. | 2411.07975 | link |
2024-11-12 | From General to Specific: Utilizing General Hallucination to Automatically Measure the Role Relationship Fidelity for Specific Role-Play Agents | Chuyi Kong et.al. | 2411.07965 | null |
2024-11-12 | Towards Low-bit Communication for Tensor Parallel LLM Inference | Harry Dong et.al. | 2411.07942 | null |
2024-11-12 | Leveraging Multimodal Models for Enhanced Neuroimaging Diagnostics in Alzheimer's Disease | Francesco Chiumento et.al. | 2411.07871 | null |
2024-11-12 | Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | Xiaofeng Zhu et.al. | 2411.07870 | null |
2024-11-12 | Verbosity | Yusen Zhang et.al. | 2411.07858 | link |
2024-11-12 | Tucano: Advancing Neural Text Generation for Portuguese | Nicholas Kluge Corrêa et.al. | 2411.07854 | link |
2024-11-12 | NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN | Sonia Raychaudhuri et.al. | 2411.07848 | null |
2024-11-12 | Chain Association-based Attacking and Shielding Natural Language Processing Systems | Jiacheng Huang et.al. | 2411.07843 | null |
2024-11-12 | FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training | Philip Zmushko et.al. | 2411.07837 | link |
2024-11-12 | Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices | Kilian Pfeiffer et.al. | 2411.07826 | null |
2024-11-12 | Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models | Youan Cong et.al. | 2411.07820 | null |
2024-11-12 | Federated Low-Rank Adaptation with Differential Privacy over Wireless Networks | Tianqu Kang et.al. | 2411.07806 | null |
2024-11-12 | Likelihood as a Performance Gauge for Retrieval-Augmented Generation | Tianyu Liu et.al. | 2411.07773 | link |
2024-11-11 | UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Bo Yang et.al. | 2411.07240 | link |
2024-11-11 | OpenThaiGPT 1.5: A Thai-Centric Open Source Large Language Model | Sumeth Yuenyong et.al. | 2411.07238 | null |
2024-11-11 | Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations | Chaitanya Malaviya et.al. | 2411.07237 | null |
2024-11-11 | Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving | Botao Yu et.al. | 2411.07228 | null |
2024-11-11 | TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models | Matheus Simão et.al. | 2411.07224 | null |
2024-11-11 | Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks | Madeline Brumley et.al. | 2411.07213 | null |
2024-11-11 | General Geospatial Inference with a Population Dynamics Foundation Model | Mohit Agarwal et.al. | 2411.07207 | null |
2024-11-11 | DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID | Nyle Siddiqui et.al. | 2411.07205 | link |
2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191 | link |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186 | null |
2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184 | link |
2024-11-11 | Counterfactual Generation from Language Models | Shauli Ravfogel et.al. | 2411.07180 | link |
2024-11-11 | More Expressive Attention with Negative Weights | Ang Lv et.al. | 2411.07176 | link |
2024-11-11 | Continual Memorization of Factoids in Large Language Models | Howard Chen et.al. | 2411.07175 | link |
2024-11-11 | A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19 | Vedant Khandelwal et.al. | 2411.07163 | null |
2024-11-11 | Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | Yancheng He et.al. | 2411.07140 | null |
2024-11-11 | Stronger Models are NOT Stronger Teachers for Instruction Tuning | Zhangchen Xu et.al. | 2411.07133 | null |
2024-11-11 | Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis | Taihang Hu et.al. | 2411.07132 | link |
2024-11-11 | Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | Kaijian Zou et.al. | 2411.07130 | link |
2024-11-11 | Benchmarking LLMs' Judgments with No Gold Standard | Shengwei Xu et.al. | 2411.07127 | link |
2024-11-08 | Recycled Attention: Efficient inference for long-context language models | Fangyuan Xu et.al. | 2411.05787 | null |
2024-11-08 | Using Language Models to Disambiguate Lexical Choices in Translation | Josh Barua et.al. | 2411.05781 | link |
2024-11-08 | Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? | Veronica Chatrath et.al. | 2411.05775 | null |
2024-11-08 | Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 | Christopher Malon et.al. | 2411.05762 | null |
2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Dylan Goetting et.al. | 2411.05755 | link |
2024-11-08 | Aioli: A Unified Optimization Framework for Language Model Data Mixing | Mayee F. Chen et.al. | 2411.05735 | link |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734 | null |
2024-11-08 | STARS: Sensor-agnostic Transformer Architecture for Remote Sensing | Ethan King et.al. | 2411.05714 | null |
2024-11-08 | Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal | Fuka Matsuzaki et.al. | 2411.05665 | link |
2024-11-08 | The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent | Leon O. H. Kroczek et.al. | 2411.05653 | null |
2024-11-08 | LightVA: Lightweight Visual Analytics with LLM Agent-Based Task Planning and Execution | Yuheng Zhao et.al. | 2411.05651 | null |
2024-11-08 | Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation | Elena V. Epure et.al. | 2411.05649 | link |
2024-11-08 | Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation | Long Truong To et.al. | 2411.05641 | null |
2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639 | null |
2024-11-08 | A Two-Step Concept-Based Approach for Enhanced Interpretability and Trust in Skin Lesion Diagnosis | Cristiano Patrício et.al. | 2411.05609 | link |
2024-11-08 | Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages | JA Meaney et.al. | 2411.05593 | null |
2024-11-08 | Open-set object detection: towards unified problem formulation and benchmarking | Hejer Ammar et.al. | 2411.05564 | null |
2024-11-08 | Training objective drives the consistency of representational similarity across datasets | Laure Ciernik et.al. | 2411.05561 | link |
2024-11-08 | AcceLLM: Accelerating LLM Inference using Redundancy for Load Balancing and Data Locality | Ilias Bournias et.al. | 2411.05555 | null |
2024-11-08 | Assessing the Answerability of Queries in Retrieval-Augmented Code Generation | Geonmin Kim et.al. | 2411.05547 | null |
2024-11-07 | SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Muyang Li et.al. | 2411.05007 | link |
2024-11-07 | Analyzing The Language of Visual Tokens | David M. Chan et.al. | 2411.05001 | null |
2024-11-07 | Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | Jonathan Roberts et.al. | 2411.05000 | null |
2024-11-07 | DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation | Peiqi Liu et.al. | 2411.04999 | link |
2024-11-07 | LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | Weiquan Huang et.al. | 2411.04997 | link |
2024-11-07 | Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | Weixin Liang et.al. | 2411.04996 | null |
2024-11-07 | Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives | Hao Sun et.al. | 2411.04991 | link |
2024-11-07 | The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities | Zhaofeng Wu et.al. | 2411.04986 | null |
2024-11-07 | Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | Dylan Manuel et.al. | 2411.04981 | null |
2024-11-07 | SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference | Gabriele Oliaro et.al. | 2411.04975 | null |
2024-11-07 | BitNet a4.8: 4-bit Activations for 1-bit LLMs | Hongyu Wang et.al. | 2411.04965 | null |
2024-11-07 | Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability | Yanjun Gao et.al. | 2411.04962 | null |
2024-11-07 | CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | Jingwei Xu et.al. | 2411.04954 | null |
2024-11-07 | M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Jaemin Cho et.al. | 2411.04952 | null |
2024-11-07 | A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model | Panwen Hu et.al. | 2411.04942 | null |
2024-11-07 | VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Shehan Munasinghe et.al. | 2411.04923 | null |
2024-11-07 | GPTKB: Building Very Large Knowledge Bases from Language Models | Yujia Hu et.al. | 2411.04920 | link |
2024-11-07 | OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | Siming Huang et.al. | 2411.04905 | null |
2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892 | null |
2024-11-07 | GUI Agents with Foundation Models: A Comprehensive Survey | Shuai Wang et.al. | 2411.04890 | null |
2024-11-06 | Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? | Daniel P. Jeong et.al. | 2411.04118 | link |
2024-11-06 | How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | Guan Zhe Hong et.al. | 2411.04105 | null |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097 | link |
2024-11-06 | Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation | Ke Fan et.al. | 2411.04079 | null |
2024-11-06 | H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models | Nhi Pham et.al. | 2411.04077 | null |
2024-11-06 | M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models | Chuhan Li et.al. | 2411.04075 | null |
2024-11-06 | Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Ping Li et.al. | 2411.04059 | link |
2024-11-06 | Beemo: Benchmark of Expert-edited Machine-generated Outputs | Ekaterina Artemova et.al. | 2411.04032 | null |
2024-11-06 | Prompt Engineering Using GPT for Word-Level Code-Mixed Language Identification in Low-Resource Dravidian Languages | Aniket Deroy et.al. | 2411.04025 | null |
2024-11-06 | Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Davide Buoso et.al. | 2411.04006 | null |
2024-11-06 | Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning | Jiawei Yao et.al. | 2411.03978 | link |
2024-11-06 | What Really is Commonsense Knowledge? | Quyet V. Do et.al. | 2411.03964 | null |
2024-11-06 | How Does A Text Preprocessing Pipeline Affect Ontology Syntactic Matching? | Zhangcheng Qiang et.al. | 2411.03962 | null |
2024-11-06 | Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model | Hatef Otroshi Shahreza et.al. | 2411.03960 | null |
2024-11-06 | Fine-Grained Guidance for Retrievers: Leveraging LLMs' Feedback in Retrieval-Augmented Generation | Yuhang Liu et.al. | 2411.03957 | null |
2024-11-06 | Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks | Felipe Marra et.al. | 2411.03948 | null |
2024-11-06 | Interactions Across Blocks in Post-Training Quantization of Large Language Models | Khasmamad Shabanovi et.al. | 2411.03934 | null |
2024-11-06 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models | Minh Duc Bui et.al. | 2411.03888 | link |
2024-11-06 | Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | Zhijian Zhuo et.al. | 2411.03884 | link |
2024-11-06 | MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Laura Cabello et.al. | 2411.03883 | link |
2024-11-05 | Inference Optimal VLMs Need Only One Visual Token but Larger Models | Kevin Y. Li et.al. | 2411.03312 | link |
2024-11-05 | LLMs for Domain Generation Algorithm Detection | Reynier Leyva La O et.al. | 2411.03307 | null |
2024-11-05 | VERITAS: A Unified Approach to Reliability Evaluation | Rajkumar Ramamurthy et.al. | 2411.03300 | null |
2024-11-05 | Examining Human-AI Collaboration for Co-Writing Constructive Comments Online | Farhana Shahid et.al. | 2411.03295 | null |
2024-11-05 | Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? | Jingyu Xiao et.al. | 2411.03292 | link |
2024-11-05 | The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare | Souren Pashangpour et.al. | 2411.03287 | null |
2024-11-05 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li et.al. | 2411.03284 | link |
2024-11-05 | Spontaneous Emergence of Agent Individuality through Social Interactions in LLM-Based Communities | Ryosuke Takata et.al. | 2411.03252 | null |
2024-11-05 | DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models | Ying Zhou et.al. | 2411.03250 | null |
2024-11-05 | From Pen to Prompt: How Creative Writers Integrate AI into their Writing Practice | Alicia Guo et.al. | 2411.03137 | null |
2024-11-05 | "Create a Fear of Missing Out" -- ChatGPT Implements Unsolicited Deceptive Designs in Generated Websites Without Warning | Veronika Krauß et.al. | 2411.03108 | null |
2024-11-05 | Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation | Jinbao Chen et.al. | 2411.03079 | null |
2024-11-05 | Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning | Bei Li et.al. | 2411.03042 | null |
2024-11-05 | HumanVLM: Foundation for Human-Scene Vision-Language Model | Dawei Dai et.al. | 2411.03034 | null |
2024-11-05 | Leveraging Large Language Models in Code Question Answering: Baselines and Issues | Georgy Andryushchenko et.al. | 2411.03012 | link |
2024-11-05 | Controlling for Unobserved Confounding with Large Language Model Classification of Patient Smoking Status | Samuel Lee et.al. | 2411.03004 | null |
2024-11-05 | Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation | Junchen Fu et.al. | 2411.02992 | null |
2024-11-05 | Growing a Tail: Increasing Output Diversity in Large Language Models | Michal Shur-Ofry et.al. | 2411.02989 | null |
2024-11-05 | [Vision Paper] PRObot: Enhancing Patient-Reported Outcome Measures for Diabetic Retinopathy using Chatbots and Generative AI | Maren Pielka et.al. | 2411.02973 | null |
2024-11-05 | Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation | Xavier Timoneda et.al. | 2411.02969 | null |
2024-11-04 | Training-free Regional Prompting for Diffusion Transformers | Anthony Chen et.al. | 2411.02395 | link |
2024-11-04 | Adaptive Length Image Tokenization via Recurrent Allocation | Shivam Duggal et.al. | 2411.02393 | link |
2024-11-04 | Attacking Vision-Language Computer Agents via Pop-ups | Yanzhe Zhang et.al. | 2411.02391 | link |
2024-11-04 | Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models | Guangzhi Xiong et.al. | 2411.02382 | null |
2024-11-04 | Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI | Ramneet Kaur et.al. | 2411.02381 | null |
2024-11-04 | Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis | Neel Dey et.al. | 2411.02372 | link |
2024-11-04 | DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Yang Yue et.al. | 2411.02359 | link |
2024-11-04 | "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | Eldar Kurtic et.al. | 2411.02355 | null |
2024-11-04 | Machine learning identification of maternal inflammatory response and histologic chorioamnionitis from placental membrane whole slide images | Abhishek Sharma et.al. | 2411.02354 | null |
2024-11-04 | Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences | Ruotong Wang et.al. | 2411.02353 | null |
2024-11-04 | Can Large Language Models generalize analogy solving like people can? | Claire E. Stevenson et.al. | 2411.02348 | null |
2024-11-04 | WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | Zehan Qi et.al. | 2411.02337 | link |
2024-11-04 | Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | Yuqi Luo et.al. | 2411.02335 | link |
2024-11-04 | Disrupting Test Development with AI Assistants | Vijay Joshi et.al. | 2411.02328 | null |
2024-11-04 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Ruyang Liu et.al. | 2411.02327 | link |
2024-11-04 | An Empirical Study on the Code Refactoring Capability of Large Language Models | Jonathan Cordeiro et.al. | 2411.02320 | null |
2024-11-04 | Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast | Marilyn Rego et.al. | 2411.02318 | null |
2024-11-04 | Defining and Evaluating Physical Safety for Large Language Models | Yung-Chen Tang et.al. | 2411.02317 | null |
2024-11-04 | Evaluating Creative Short Story Generation in Humans and Large Language Models | Mete Ismayilzada et.al. | 2411.02316 | link |
2024-11-04 | Taking AI Welfare Seriously | Robert Long et.al. | 2411.00986 | null |
2024-10-31 | P-Masking: Power Law Masking Improves Multi-attribute Controlled Generation | Mohamed Elgaar et.al. | 2410.24201 | null |
2024-11-01 | SelfCodeAlign: Self-Alignment for Code Generation | Yuxiang Wei et.al. | 2410.24198 | link |
2024-10-31 | DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang et.al. | 2410.24177 | null |
2024-10-31 | Constraint Back-translation Improves Complex Instruction Following of Large Language Models | Yunjia Qi et.al. | 2410.24175 | null |
2024-10-31 | | Kevin Black et.al. | 2410.24164 | null |
2024-10-31 | GPT or BERT: why not both? | Lucas Georges Gabriel Charpentier et.al. | 2410.24159 | link |
2024-10-31 | Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | Jinghan Zhang et.al. | 2410.24155 | null |
2024-10-31 | Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning | Jiaqi Liu et.al. | 2410.24152 | null |
2024-10-31 | Exploring Vision Language Models for Facial Attribute Recognition: Emotion, Race, Gender, and Age | Nouar AlDahoul et.al. | 2410.24148 | null |
2024-10-31 | Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing | Akash Dhruv et.al. | 2410.24119 | link |
2024-10-31 | Repository-Level Compositional Code Translation and Validation | Ali Reza Ibrahimzada et.al. | 2410.24117 | link |
2024-10-31 | Matchmaker: Self-Improving Large Language Model Programs for Schema Matching | Nabeel Seedat et.al. | 2410.24105 | null |
2024-10-31 | Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning | Nabil Omi et.al. | 2410.24096 | null |
2024-10-31 | In-Context Fine-Tuning for Time-Series Foundation Models | Abhimanyu Das et.al. | 2410.24087 | null |
2024-10-31 | Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs | Muhammed Saeed et.al. | 2410.24049 | null |
2024-10-31 | Handwriting Recognition in Historical Documents with Multimodal LLM | Lucian Li et.al. | 2410.24034 | null |
2024-10-31 | Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks | Yingzhe Peng et.al. | 2410.24032 | null |
2024-10-31 | AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Yifan Xu et.al. | 2410.24024 | link |
2024-10-31 | SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation | Liang He et.al. | 2410.24022 | null |
2024-10-31 | Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? | Ioannis Tsiamas et.al. | 2410.24019 | null |
2024-10-30 | ReferEverything: Towards Segmenting Everything We Can Speak of in Videos | Anurag Bagchi et.al. | 2410.23287 | null |
2024-10-30 | A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction | Qidong Yang et.al. | 2410.23272 | null |
2024-10-30 | TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Ziyao Shangguan et.al. | 2410.23266 | link |
2024-10-30 | EMMA: End-to-End Multimodal Model for Autonomous Driving | Jyh-Jing Hwang et.al. | 2410.23262 | null |
2024-10-30 | Keypoint Abstraction using Large Models for Object-Relative Imitation Learning | Xiaolin Fang et.al. | 2410.23254 | null |
2024-10-30 | Evaluating Cultural and Social Awareness of LLM Web Agents | Haoyi Qiu et.al. | 2410.23252 | null |
2024-10-30 | Carrot and Stick: Eliciting Comparison Data and Beyond | Yiling Chen et.al. | 2410.23243 | null |
2024-10-30 | A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment | Matteo G. Mecattaf et.al. | 2410.23242 | link |
2024-10-30 | EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning | Peide Huang et.al. | 2410.23234 | null |
2024-10-30 | COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | Yixin Liu et.al. | 2410.23223 | link |
2024-10-30 | Partial Channel Dependence with Channel Masks for Time Series Foundation Models | Seunghan Lee et.al. | 2410.23222 | null |
2024-10-30 | OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | Zhiyong Wu et.al. | 2410.23218 | link |
2024-10-31 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Sheryl Hsu et.al. | 2410.23214 | null |
2024-10-30 | ProTransformer: Robustify Transformers via Plug-and-Play Paradigm | Zhichao Hou et.al. | 2410.23182 | null |
2024-10-30 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay et.al. | 2410.23180 | link |
2024-10-30 | TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | Haiyang Wang et.al. | 2410.23168 | link |
2024-10-30 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang et.al. | 2410.23166 | link |
2024-10-30 | FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities | Jingge Xiao et.al. | 2410.23160 | link |
2024-10-30 | VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning | Yichao Liang et.al. | 2410.23156 | null |
2024-10-30 | Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms | Jordan Meyer et.al. | 2410.23144 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332 | null |
2024-10-29 | Task Vectors are Cross-Modal | Grace Luo et.al. | 2410.22330 | null |
2024-10-29 | Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models | Seetharam Killivalavan et.al. | 2410.22323 | null |
2024-10-29 | Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting | Can Chen et.al. | 2410.22318 | link |
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | Kai Wang et.al. | 2410.22317 | link |
2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | Paola Cascante-Bonilla et.al. | 2410.22315 | null |
2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Bo Jiang et.al. | 2410.22313 | link |
2024-10-29 | GPT-4o reads the mind in the eyes | James W. A. Strachan et.al. | 2410.22309 | null |
2024-10-29 | SVIP: Towards Verifiable Inference of Open-source Large Language Models | Yifan Sun et.al. | 2410.22307 | null |
2024-10-29 | Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Yihe Deng et.al. | 2410.22304 | null |
2024-10-29 | LLMs are Highly-Constrained Biophysical Sequence Optimizers | Angelica Chen et.al. | 2410.22296 | null |
2024-10-29 | Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats | Mohammad Setak et.al. | 2410.22293 | null |
2024-10-29 | From melodic note sequences to pitches using word2vec | Daniel Defays et.al. | 2410.22285 | null |
2024-10-29 | Embedding-based classifiers can detect prompt injection attacks | Md. Ahsan Ayub et.al. | 2410.22284 | link |
2024-10-29 | Whose ChatGPT? Unveiling Real-World Educational Inequalities Introduced by Large Language Models | Renzhe Yu et.al. | 2410.22282 | null |
2024-10-29 | Fourier Head: Helping Large Language Models Learn Complex Probability Distributions | Nate Gillman et.al. | 2410.22269 | null |
2024-10-29 | Meta-Learning Adaptable Foundation Models | Jacob L. Block et.al. | 2410.22264 | null |
2024-10-29 | FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation | Farima Fatahi Bayat et.al. | 2410.22257 | null |
2024-10-29 | Abrupt Learning in Transformers: A Case Study on Matrix Completion | Pulkit Gopalani et.al. | 2410.22244 | null |
2024-10-29 | Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | Yuxuan Chen et.al. | 2410.22240 | link |
2024-10-28 | Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Yaniv Nikankin et.al. | 2410.21272 | link |
2024-10-28 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Hanyu Wang et.al. | 2410.21264 | null |
2024-10-28 | BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference | Changwoo Lee et.al. | 2410.21262 | link |
2024-10-28 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao et.al. | 2410.21259 | link |
2024-10-28 | Multi-modal AI for comprehensive breast cancer prognostication | Jan Witowski et.al. | 2410.21256 | null |
2024-10-28 | LongReward: Improving Long-context Large Language Models with AI Feedback | Jiajie Zhang et.al. | 2410.21252 | link |
2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242 | null |
2024-10-28 | Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | Zhantao Yang et.al. | 2410.21237 | null |
2024-10-28 | Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Weizhe Chen et.al. | 2410.21236 | null |
2024-10-28 | LoRA vs Full Fine-tuning: An Illusion of Equivalence | Reece Shuttleworth et.al. | 2410.21228 | null |
2024-10-28 | Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines | Zhixin Zhang et.al. | 2410.21220 | link |
2024-10-28 | Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations | Kaifeng Huang et.al. | 2410.21218 | null |
2024-10-28 | BongLLaMA: LLaMA for Bangla Language | Abdullah Khan Zehady et.al. | 2410.21200 | null |
2024-10-28 | Belief in the Machine: Investigating Epistemological Blind Spots of Language Models | Mirac Suzgun et.al. | 2410.21195 | link |
2024-10-29 | Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction | Qintong Zhang et.al. | 2410.21169 | null |
2024-10-28 | M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation | Jiaheng Liu et.al. | 2410.21157 | null |
2024-10-28 | Palisade -- Prompt Injection Detection Framework | Sahasra Kokkula et.al. | 2410.21146 | null |
2024-10-28 | LLM-initialized Differentiable Causal Discovery | Shiv Kampani et.al. | 2410.21141 | null |
2024-10-28 | Do LLMs generate test oracles that capture the actual or the expected program behaviour? | Michael Konstantinou et.al. | 2410.21136 | null |
2024-10-28 | Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments | Marharyta Domnich et.al. | 2410.21131 | null |
2024-10-25 | The Potential and Value of AI Chatbot in Personalized Cognitive Training | Zilong Wang et.al. | 2410.19733 | null |
2024-10-25 | Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models | Yucheng Zhou et.al. | 2410.19732 | null |
2024-10-25 | Counting Ability of Large Language Models and Impact of Tokenization | Xiang Zhang et.al. | 2410.19730 | link |
2024-10-25 | FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning | Nicole Cho et.al. | 2410.19727 | null |
2024-10-25 | 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision | Shilong Li et.al. | 2410.19720 | null |
2024-10-25 | Multi-view biomedical foundation models for molecule-target and property prediction | Parthasarathy Suryanarayanan et.al. | 2410.19704 | link |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702 | null |
2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697 | null |
2024-10-25 | Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs | Yifei Zhang et.al. | 2410.19694 | null |
2024-10-25 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang et.al. | 2410.19656 | null |
2024-10-25 | Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models | Shenghao Fu et.al. | 2410.19635 | null |
2024-10-25 | Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina | Yuan Gao et.al. | 2410.19599 | null |
2024-10-25 | Diverse Sign Language Translation | Xin Shen et.al. | 2410.19586 | link |
2024-10-25 | ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems | Ritvik Aggarwal, Ishneet Sukhvinder Singh, Ibrahim Allahverdiyev et.al. | 2410.19572 | null |
2024-10-25 | GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing | Hosam Elgendy et.al. | 2410.19552 | link |
2024-10-25 | Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Antonia Wüst et.al. | 2410.19546 | link |
2024-10-25 | Brain-like Functional Organization within Large Language Models | H. Sun et.al. | 2410.19542 | null |
2024-10-25 | Detection of Human and Machine-Authored Fake News in Urdu | Muhammad Zain Ali et.al. | 2410.19517 | link |
2024-10-25 | SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models | Jahyun Koo et.al. | 2410.19503 | null |
2024-10-25 | Introducing MAPO: Momentum-Aided Gradient Descent Prompt Optimization | Anthony Cui et.al. | 2410.19499 | null |
2024-10-24 | Unbounded: A Generative Infinite Game of Character Life Simulation | Jialu Li et.al. | 2410.18975 | null |
2024-10-24 | Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques | David Ortiz-Perez et.al. | 2410.18972 | null |
2024-10-24 | ConceptDrift: Uncovering Biases through the Lens of Foundational Models | Cristian Daniel Păduraru et.al. | 2410.18970 | null |
2024-10-24 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Zhangheng Li et.al. | 2410.18967 | null |
2024-10-24 | Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Yujuan Fu et.al. | 2410.18966 | null |
2024-10-24 | On the Crucial Role of Initialization for Matrix Factorization | Bingcong Li et.al. | 2410.18965 | null |
2024-10-24 | OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning | Xiaoqiang Wang et.al. | 2410.18963 | null |
2024-10-24 | Context is Key: A Benchmark for Forecasting with Essential Textual Information | Andrew Robert Williams et.al. | 2410.18959 | link |
2024-10-24 | Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | Jipeng Zhang et.al. | 2410.18957 | null |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955 | null |
2024-10-24 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Jort Vincenti et.al. | 2410.18952 | link |
2024-10-24 | SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models | Zonghao Ying et.al. | 2410.18927 | null |
2024-10-24 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | A M Muntasir Rahman et.al. | 2410.18921 | null |
2024-10-25 | A Survey on Speech Large Language Models | Jing Peng et.al. | 2410.18908 | null |
2024-10-24 | PRISM: A Methodology for Auditing Biases in Large Language Models | Leif Azzopardi et.al. | 2410.18906 | link |
2024-10-24 | LLMs for Extremely Low-Resource Finno-Ugric Languages | Taido Purason et.al. | 2410.18902 | null |
2024-10-24 | Creating and Repairing Robot Programs in Open-World Domains | Claire Schlesinger et.al. | 2410.18893 | null |
2024-10-24 | Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Graziano A. Manduzio et.al. | 2410.18890 | null |
2024-10-24 | Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | Omer Nahum et.al. | 2410.18889 | null |
2024-10-24 | Provably Robust Watermarks for Open-Source Language Models | Miranda Christ et.al. | 2410.18861 | null |
2024-10-23 | TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | Yuxuan Xie et.al. | 2410.18071 | null |
2024-10-23 | CLEAR: Character Unlearning in Textual and Visual Modalities | Alexey Dontsov et.al. | 2410.18057 | null |
2024-10-23 | LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Qingfei Zhao et.al. | 2410.18050 | link |
2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040 | null |
2024-10-23 | MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Jingfan Zhang et.al. | 2410.18035 | null |
2024-10-23 | GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | Xin Li et.al. | 2410.18032 | link |
2024-10-23 | MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting | Sungil Seok et.al. | 2410.18012 | null |
2024-10-23 | Benchmarking Foundation Models on Exceptional Cases: Dataset Creation and Validation | Suho Kang et.al. | 2410.18001 | link |
2024-10-23 | MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers | Zebin Yang et.al. | 2410.17957 | null |
2024-10-23 | ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Xin He et.al. | 2410.17954 | null |
2024-10-23 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | Ran Xu et.al. | 2410.17952 | null |
2024-10-23 | Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling | Nirav Bhan et.al. | 2410.17950 | null |
2024-10-23 | Toward path-invariant embeddings for local distance source characterization | Lisa Linville et.al. | 2410.17937 | null |
2024-10-23 | Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models | He Cao et.al. | 2410.17922 | link |
2024-10-23 | Scaling Diffusion Language Models via Adaptation from Autoregressive Models | Shansan Gong et.al. | 2410.17891 | link |
2024-10-23 | R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Linger Deng et.al. | 2410.17885 | link |
2024-10-23 | Lightweight Neural App Control | Filippos Christianos et.al. | 2410.17883 | null |
2024-10-23 | AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | Yehonathan Refael et.al. | 2410.17881 | null |
2024-10-23 | Understanding Layer Significance in LLM Alignment | Guangyuan Shi et.al. | 2410.17875 | null |
2024-10-23 | DataTales: A Benchmark for Real-World Intelligent Data Narration | Yajing Yang et.al. | 2410.17859 | link |
2024-10-22 | PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction | Long Xing et.al. | 2410.17247 | link |
2024-10-22 | Towards Reliable Evaluation of Behavior Steering Interventions in LLMs | Itamar Pres et.al. | 2410.17245 | null |
2024-10-22 | Frontiers in Intelligent Colonoscopy | Ge-Peng Ji et.al. | 2410.17241 | link |
2024-10-22 | Large Language Models Empowered Personalized Web Agents | Hongru Cai et.al. | 2410.17236 | null |
2024-10-22 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park et.al. | 2410.17235 | link |
2024-10-22 | Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | Benedict Aaron Tjandra et.al. | 2410.17234 | null |
2024-10-22 | Few-shot In-Context Preference Learning Using Large Language Models | Chao Yu et.al. | 2410.17233 | null |
2024-10-22 | Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods | Tsachi Blau et.al. | 2410.17222 | null |
2024-10-22 | MiniPLM: Knowledge Distillation for Pre-Training Language Models | Yuxian Gu et.al. | 2410.17215 | link |
2024-10-22 | Exploring Possibilities of AI-Powered Legal Assistance in Bangladesh through Large Language Modeling | Azmine Toushik Wasi et.al. | 2410.17210 | link |
2024-10-22 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen et.al. | 2410.17196 | link |
2024-10-23 | Non-myopic Generation of Language Model for Reasoning and Planning | Chang Ma et.al. | 2410.17195 | link |
2024-10-22 | Remote Timing Attacks on Efficient Language Model Inference | Nicholas Carlini et.al. | 2410.17175 | null |
2024-10-22 | From Attention to Activation: Unravelling the Enigmas of Large Language Models | Prannay Kaul et.al. | 2410.17174 | null |
2024-10-22 | Self-calibration for Language Model Quantization and Pruning | Miles Williams et.al. | 2410.17170 | null |
2024-10-22 | Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence | İlker Işık et.al. | 2410.17161 | null |
2024-10-22 | Improving Pinterest Search Relevance Using Large Language Models | Han Wang et.al. | 2410.17152 | null |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149 | null |
2024-10-22 | Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation ? | Jirat Chiaranaipanich et.al. | 2410.17145 | null |
2024-10-22 | Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | Isamu Isozaki et.al. | 2410.17141 | link |
2024-10-21 | Reflection-Bench: probing AI intelligence with reflection | Lingyu Li et.al. | 2410.16270 | link |
2024-10-21 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Shuangrui Ding et.al. | 2410.16268 | link |
2024-10-21 | xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs | Michael S. Ryoo et.al. | 2410.16267 | null |
2024-10-22 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Zhangwei Gao et.al. | 2410.16261 | link |
2024-10-21 | Elucidating the design space of language models for image generation | Xuantong Liu et.al. | 2410.16257 | link |
2024-10-21 | CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | Maosong Cao et.al. | 2410.16256 | link |
2024-10-21 | Can Knowledge Editing Really Correct Hallucinations? | Baixiang Huang et.al. | 2410.16251 | link |
2024-10-21 | Analyzing Context Contributions in LLM-based Machine Translation | Emmanouil Zaranis et.al. | 2410.16246 | null |
2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237 | null |
2024-10-21 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai et.al. | 2410.16236 | link |
2024-10-21 | ToW: Thoughts of Words Improve Reasoning in Large Language Models | Zhikun Xu et.al. | 2410.16235 | null |
2024-10-21 | Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | Ryan Li et.al. | 2410.16232 | null |
2024-10-21 | Building A Coding Assistant via the Retrieval-Augmented Language Model | Xinze Li et.al. | 2410.16229 | link |
2024-10-21 | A Realistic Threat Model for Large Language Model Jailbreaks | Valentyn Boreiko et.al. | 2410.16222 | link |
2024-10-21 | Pre-training Distillation for Large Language Models: A Design Space Exploration | Hao Peng et.al. | 2410.16215 | null |
2024-10-21 | Comprehensive benchmarking of large language models for RNA secondary structure prediction | L. I. Zablocki et.al. | 2410.16212 | link |
2024-10-21 | CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning | Kumar Manas et.al. | 2410.16207 | null |
2024-10-21 | Improve Vision Language Model Chain-of-thought Reasoning | Ruohong Zhang et.al. | 2410.16198 | link |
2024-10-22 | LASER: Script Execution by Autonomous Agents for On-demand Traffic Simulation | Hao Gao et.al. | 2410.16197 | link |
2024-10-21 | Contamination Report for Multilingual Benchmarks | Sanchit Ahuja et.al. | 2410.16186 | null |
2024-10-18 | Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | German Gritsai et.al. | 2410.14677 | null |
2024-10-18 | SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment | Qin Liu et.al. | 2410.14676 | null |
2024-10-18 | Enhancing Large Language Models' Situated Faithfulness to External Contexts | Yukun Huang et.al. | 2410.14675 | link |
2024-10-18 | Decomposing The Dark Matter of Sparse Autoencoders | Joshua Engels et.al. | 2410.14670 | link |
2024-10-18 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Baiqi Li et.al. | 2410.14669 | null |
2024-10-18 | MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Xiongtao Zhou et.al. | 2410.14668 | link |
2024-10-18 | A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning | Shengjie Sun et.al. | 2410.14660 | null |
2024-10-18 | Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Zhepeng Cen et.al. | 2410.14655 | null |
2024-10-18 | EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | Oliver Sieberling et.al. | 2410.14649 | link |
2024-10-18 | Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs | Runchu Tian et.al. | 2410.14641 | link |
2024-10-18 | GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | Raghuveer Thirukovalluru et.al. | 2410.14635 | link |
2024-10-18 | Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning | Yuxiang Lu et.al. | 2410.14633 | null |
2024-10-18 | On the Regularization of Learnable Embeddings for Time Series Processing | Luca Butera et.al. | 2410.14630 | null |
2024-10-18 | CELI: Controller-Embedded Language Model Interactions | Jan-Samuel Wagner et.al. | 2410.14627 | null |
2024-10-18 | DiSCo Meets LLMs: A Unified Approach for Sparse Retrieval and Contextual Distillation in Conversational Search | Simon Lupart et.al. | 2410.14609 | null |
2024-10-18 | Teaching Models to Balance Resisting and Accepting Persuasion | Elias Stengel-Eskin et.al. | 2410.14596 | link |
2024-10-18 | Neuro-Symbolic Traders: Assessing the Wisdom of AI Crowds in Markets | Namid R. Stillman et.al. | 2410.14587 | null |
2024-10-18 | Do LLMs estimate uncertainty well in instruction-following? | Juyeon Heo et.al. | 2410.14582 | null |
2024-10-18 | Large Language Models Are Overparameterized Text Encoders | Thennal D K et.al. | 2410.14578 | null |
2024-10-18 | MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Rachel S. Y. Teo et.al. | 2410.14574 | link |
2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863 | null |
2024-10-17 | PUMA: Empowering Unified MLLM with Multi-granular Visual Generation | Rongyao Fang et.al. | 2410.13861 | link |
2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et.al. | 2410.13860 | link |
2024-10-17 | | Yaxin Luo et.al. | 2410.13859 | null |
2024-10-17 | How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | Guhao Feng et.al. | 2410.13857 | null |
2024-10-17 | Can MLLMs Understand the Deep Implication Behind Chinese Images? | Chenhao Zhang et.al. | 2410.13854 | link |
2024-10-17 | Retrospective Learning from Interactions | Zizhao Chen et.al. | 2410.13852 | null |
2024-10-17 | Differentiable Robot Rendering | Ruoshi Liu et.al. | 2410.13851 | null |
2024-10-17 | SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | Xuan Zhang et.al. | 2410.13846 | link |
2024-10-17 | A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | Qiaoyu Tang et.al. | 2410.13841 | null |
2024-10-17 | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | Tianyu Guo et.al. | 2410.13835 | link |
2024-10-17 | A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | Hui Yuan et.al. | 2410.13828 | link |
2024-10-17 | Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models | Mazda Moayeri et.al. | 2410.13826 | null |
2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825 | null |
2024-10-18 | Harnessing Webpage UIs for Text-Rich Visual Understanding | Junpeng Liu et.al. | 2410.13824 | null |
2024-10-17 | Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning | Xiaodan Xing et.al. | 2410.13823 | link |
2024-10-17 | Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance | Mitsuhiko Nakamoto et.al. | 2410.13816 | null |
2024-10-17 | De-mark: Watermark Removal in Large Language Models | Ruibo Chen et.al. | 2410.13808 | null |
2024-10-17 | A Watermark for Order-Agnostic Language Models | Ruibo Chen et.al. | 2410.13805 | null |
2024-10-18 | BenTo: Benchmark Task Reduction with In-Context Transferability | Hongyu Zhao et.al. | 2410.13804 | link |
2024-10-16 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models | Ce Zhang et.al. | 2410.12790 | link |
2024-10-16 | Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | Jihao Zhao et.al. | 2410.12788 | link |
2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782 | null |
2024-10-16 | Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information | Yingya Li et.al. | 2410.12774 | null |
2024-10-16 | Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions | Zhenyu Jiang et.al. | 2410.12773 | null |
2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | Jiří Vyskočil et.al. | 2410.12769 | null |
2024-10-16 | The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse | Ekansh Sharma et.al. | 2410.12766 | null |
2024-10-16 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples | Ajay Patel et.al. | 2410.12757 | null |
2024-10-17 | CREAM: Consistency Regularized Self-Rewarding Language Models | Zhaoyang Wang et.al. | 2410.12735 | null |
2024-10-16 | WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation | João Matos et.al. | 2410.12722 | link |
2024-10-16 | FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang et.al. | 2410.12707 | null |
2024-10-16 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | Genta Indra Winata et.al. | 2410.12705 | link |
2024-10-16 | Sarcasm Detection in a Less-Resourced Language | Lazar Đoković et.al. | 2410.12704 | link |
2024-10-16 | Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | Xingqi Wang et.al. | 2410.12700 | link |
2024-10-16 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Lingxiao Luo et.al. | 2410.12694 | link |
2024-10-16 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi et.al. | 2410.12686 | null |
2024-10-16 | 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation | Dewei Zhou et.al. | 2410.12669 | null |
2024-10-16 | Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models | Shicheng Xu et.al. | 2410.12662 | null |
2024-10-16 | Evaluating Morphological Compositional Generalization in Large Language Models | Mete Ismayilzada et.al. | 2410.12656 | null |
2024-10-16 | Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals | Orchid Chetia Phukan et.al. | 2410.12645 | null |
2024-10-15 | GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Fei Tang et.al. | 2410.11841 | link |
2024-10-15 | A Hitchhiker's Guide to Scaling Law Estimation | Leshem Choshen et.al. | 2410.11840 | link |
2024-10-15 | MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding | Yue Cao et.al. | 2410.11829 | link |
2024-10-15 | Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws | Yiding Jiang et.al. | 2410.11820 | link |
2024-10-15 | Improving Long-Text Alignment for Text-to-Image Diffusion Models | Luping Liu et.al. | 2410.11817 | link |
2024-10-15 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang et.al. | 2410.11815 | null |
2024-10-15 | NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | Han Han et.al. | 2410.11805 | null |
2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802 | null |
2024-10-15 | Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability | Tsz Ting Chung et.al. | 2410.11786 | null |
2024-10-15 | Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty | Joey Wilson et.al. | 2410.11783 | link |
2024-10-15 | G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | Guibin Zhang et.al. | 2410.11782 | null |
2024-10-15 | Language Models Encode Numbers Using Digit Representations in Base 10 | Amit Arnold Levy et.al. | 2410.11781 | link |
2024-10-15 | MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | Chenxi Wang et.al. | 2410.11779 | link |
2024-10-15 | Time-Series Foundation Model for Value-at-Risk | Anubha Goel et.al. | 2410.11773 | link |
2024-10-15 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao et.al. | 2410.11772 | link |
2024-10-15 | SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Ying Chen et.al. | 2410.11761 | null |
2024-10-15 | Latent Action Pretraining from Videos | Seonghyeon Ye et.al. | 2410.11758 | null |
2024-10-15 | Personas with Attitudes: Controlling LLMs for Diverse Data Annotation | Leon Fröhling et.al. | 2410.11745 | link |
2024-10-15 | DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | Yunfan Xiong et.al. | 2410.11744 | null |
2024-10-15 | Light-Weight Fault Tolerant Attention for Large Language Model Training | Yuhang Liang et.al. | 2410.11720 | null |
2024-10-14 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao et.al. | 2410.10819 | link |
2024-10-14 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li et.al. | 2410.10814 | link |
2024-10-14 | LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Di Wu et.al. | 2410.10813 | link |
2024-10-14 | Local and Global Decoding in Text Generation | Daniel Gareev et.al. | 2410.10810 | link |
2024-10-14 | Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | Aakanksha et.al. | 2410.10801 | null |
2024-10-14 | Towards Foundation Models for 3D Vision: How Close Are We? | Yiming Zuo et.al. | 2410.10799 | null |
2024-10-15 | MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling | Jian Yang et.al. | 2410.10798 | null |
2024-10-14 | Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance | Sachin Goyal et.al. | 2410.10796 | link |
2024-10-15 | LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Nimrod Shabtay et.al. | 2410.10783 | link |
2024-10-14 | When Attention Sink Emerges in Language Models: An Empirical View | Xiangming Gu et.al. | 2410.10781 | link |
2024-10-14 | Focused ReAct: Improving ReAct through Reiterate and Early Stop | Shuoqiu Li et.al. | 2410.10779 | null |
2024-10-14 | AFlow: Automating Agentic Workflow Generation | Jiayi Zhang et.al. | 2410.10762 | link |
2024-10-14 | Denial-of-Service Poisoning Attacks against Large Language Models | Kuofeng Gao et.al. | 2410.10760 | link |
2024-10-14 | SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization | Akrit Mudvari et.al. | 2410.10759 | null |
2024-10-14 | Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification | Jan Cegin et.al. | 2410.10756 | link |
2024-10-14 | NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models | Yanbiao Ji et.al. | 2410.10743 | null |
2024-10-14 | SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Pengrui Quan et.al. | 2410.10741 | link |
2024-10-14 | Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs | Ishan Jindal et.al. | 2410.10739 | null |
2024-10-14 | Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | Kuofeng Gao et.al. | 2410.10735 | null |
2024-10-14 | Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection | Giorgos Iacovides et.al. | 2410.10728 | null |
2024-10-11 | Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models | Qin Liu et.al. | 2410.09047 | null |
2024-10-11 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang et.al. | 2410.09040 | link |
2024-10-11 | Semi-Supervised Learning of Noisy Mixture of Experts Models | Oh-Ran Kwon et.al. | 2410.09039 | null |
2024-10-11 | SimpleStrat: Diversifying Language Model Generation with Stratification | Justin Wong et.al. | 2410.09038 | null |
2024-10-11 | Mentor-KD: Making Small Language Models Better Multi-step Reasoners | Hojae Lee et.al. | 2410.09037 | link |
2024-10-11 | PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | Xiangyu Yin et.al. | 2410.09034 | link |
2024-10-11 | MedMobile: A mobile-sized language model with expert-level clinical capabilities | Krithik Vishwanath et.al. | 2410.09019 | link |
2024-10-11 | Parameter-Efficient Fine-Tuning of State Space Models | Kevin Galim et.al. | 2410.09016 | link |
2024-10-11 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models' Ability to Recognize and Utilize Radicals | Xiaofeng Wu et.al. | 2410.09013 | null |
2024-10-11 | Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models | Hao Li et.al. | 2410.09012 | link |
2024-10-11 | SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Ling Yang et.al. | 2410.09008 | link |
2024-10-11 | From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts | Zhuohao Jerry Zhang et.al. | 2410.09006 | null |
2024-10-11 | DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Haochen Li et.al. | 2410.09004 | null |
2024-10-11 | Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference | Grace Proebsting et.al. | 2410.08996 | null |
2024-10-11 | The structure of the token space for large language models | Michael Robinson et.al. | 2410.08993 | null |
2024-10-11 | Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory | Rebecca M. M. Hicke et.al. | 2410.08991 | link |
2024-10-11 | SubZero: Random Subspace Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning | Ziming Yu et.al. | 2410.08989 | link |
2024-10-11 | Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective | Bo Ni et.al. | 2410.08985 | null |
2024-10-11 | NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Zheng Yi Ho et.al. | 2410.08970 | null |
2024-10-11 | Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements | Jingyu Zhang et.al. | 2410.08968 | null |
2024-10-10 | DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models | Xiaoxiao He et.al. | 2410.08207 | null |
2024-10-10 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Gen Luo et.al. | 2410.08202 | null |
2024-10-10 | Adam Exploits | Shuo Xie et.al. | 2410.08198 | link |
2024-10-10 | From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions | Changle Qu et.al. | 2410.08197 | link |
2024-10-10 | MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Zimu Lu et.al. | 2410.08196 | link |
2024-10-10 | Features are fate: a theory of transfer learning in high-dimensional regression | Javan Tahir et.al. | 2410.08194 | null |
2024-10-10 | GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | Yuancheng Xu et.al. | 2410.08193 | null |
2024-10-10 | MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models | Wenbo Hu et.al. | 2410.08182 | null |
2024-10-10 | Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | Qingni Wang et.al. | 2410.08174 | null |
2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172 | null |
2024-10-10 | Visual Scratchpads: Enabling Global Reasoning in Vision | Aryo Lotfi et.al. | 2410.08165 | null |
2024-10-10 | Agent S: An Open Agentic Framework that Uses Computers Like a Human | Saaket Agashe et.al. | 2410.08164 | link |
2024-10-10 | The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading | Keren Gruteke Klein et.al. | 2410.08162 | link |
2024-10-10 | DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation | Jiatao Gu et.al. | 2410.08159 | null |
2024-10-10 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Amrith Setlur et.al. | 2410.08146 | null |
2024-10-10 | Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs | Xiaoyuan Liu et.al. | 2410.08145 | link |
2024-10-10 | DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | Yutong Wang et.al. | 2410.08143 | link |
2024-10-10 | Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction | Jarrid Rector-Brooks et.al. | 2410.08134 | null |
2024-10-10 | Think Beyond Size: Dynamic Prompting for More Effective Reasoning | Kamesh R et.al. | 2410.08130 | null |
2024-10-10 | Mars: Situated Inductive Reasoning in an Open-World Environment | Xiaojuan Tang et.al. | 2410.08126 | null |
2024-10-09 | MM-Ego: Towards Building Egocentric Multimodal LLMs | Hanrong Ye et.al. | 2410.07177 | null |
2024-10-09 | Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | Fei Wang et.al. | 2410.07176 | null |
2024-10-09 | Do better language models have crisper vision? | Jona Ruthardt et.al. | 2410.07173 | null |
2024-10-09 | One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | Fabian Paischer et.al. | 2410.07170 | link |
2024-10-09 | Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Cheol Jun Cho et.al. | 2410.07168 | link |
2024-10-09 | Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Qidong Huang et.al. | 2410.07167 | link |
2024-10-09 | Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Manling Li et.al. | 2410.07166 | link |
2024-10-09 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan et.al. | 2410.07163 | link |
2024-10-09 | Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Bohan Zeng et.al. | 2410.07155 | link |
2024-10-09 | Towards Interpreting Visual Information Processing in Vision-Language Models | Clement Neo et.al. | 2410.07149 | link |
2024-10-09 | Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling | Yingfa Chen et.al. | 2410.07145 | null |
2024-10-09 | Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | Xiaosen Zheng et.al. | 2410.07137 | link |
2024-10-10 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Rui Zhao et.al. | 2410.07133 | link |
2024-10-09 | Mental Disorders Detection in the Era of Large Language Models | Gleb Kuzmin et.al. | 2410.07129 | null |
2024-10-09 | Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy | Tagore Rao Kosireddy et.al. | 2410.07118 | link |
2024-10-09 | Personalized Visual Instruction Tuning | Renjie Pi et.al. | 2410.07113 | link |
2024-10-09 | VHELM: A Holistic Evaluation of Vision Language Models | Tony Lee et.al. | 2410.07112 | link |
2024-10-09 | I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | Gian Maria Campedelli et.al. | 2410.07109 | link |
2024-10-09 | Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context | Sangwon Yu et.al. | 2410.07103 | null |
2024-10-09 | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Jun Shern Chan et.al. | 2410.07095 | link |
2024-10-07 | Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia | Mohammad Fahes et.al. | 2410.05270 | link |
2024-10-07 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang et.al. | 2410.05269 | null |
2024-10-07 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen et.al. | 2410.05265 | link |
2024-10-07 | TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | Qingchen Yu et.al. | 2410.05262 | link |
2024-10-07 | TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Ya-Qi Yu et.al. | 2410.05261 | null |
2024-10-07 | Differential Transformer | Tianzhu Ye et.al. | 2410.05258 | link |
2024-10-07 | GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | Eilam Shapira et.al. | 2410.05254 | link |
2024-10-07 | Causal Micro-Narratives | Mourad Heddaya et.al. | 2410.05252 | null |
2024-10-07 | SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | Yuxin Xiao et.al. | 2410.05248 | null |
2024-10-07 | Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | Boyu Gou et.al. | 2410.05243 | link |
2024-10-08 | TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models | Rabin Adhikari et.al. | 2410.05239 | link |
2024-10-07 | GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Iman Mirzadeh et.al. | 2410.05229 | null |
2024-10-07 | Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates | Avanika Narayan et.al. | 2410.05224 | null |
2024-10-07 | Precise Model Benchmarking with Only a Few Observations | Riccardo Fogliato et.al. | 2410.05222 | null |
2024-10-07 | Density estimation with LLMs: a geometric investigation of in-context learning trajectories | Toni J. B. Liu et.al. | 2410.05218 | null |
2024-10-07 | Organizing Unstructured Image Collections using Natural Language | Mingxuan Liu et.al. | 2410.05217 | null |
2024-10-07 | Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality | Youngtaek Oh et.al. | 2410.05210 | link |
2024-10-07 | RevisEval: Improving LLM-as-a-Judge via Response-Adapted References | Qiyuan Zhang et.al. | 2410.05193 | null |
2024-10-07 | Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective | Kaiyue Wen et.al. | 2410.05192 | null |
2024-10-07 | LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | Zhijie Wang et.al. | 2410.05191 | null |
2024-10-04 | Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models | Zhuochun Li et.al. | 2410.03663 | null |
2024-10-04 | Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models | Tinghui Zhu et.al. | 2410.03659 | link |
2024-10-04 | RAFT: Realistic Attacks to Fool Text Detectors | James Wang et.al. | 2410.03658 | link |
2024-10-04 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu et.al. | 2410.03642 | link |
2024-10-04 | Conditional Enzyme Generation Using Protein Language Models with Adapters | Jason Yang et.al. | 2410.03634 | null |
2024-10-04 | Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation | Jie Xiao et.al. | 2410.03613 | null |
2024-10-04 | TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation | Jonathan Cook et.al. | 2410.03608 | null |
2024-10-04 | LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos | Noriaki Hirose et.al. | 2410.03603 | null |
2024-10-04 | Efficiently Identifying Watermarked Segments in Mixed-Source Texts | Xuandong Zhao et.al. | 2410.03600 | null |
2024-10-04 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View | Lijie Hu et.al. | 2410.03595 | null |
2024-10-04 | Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models | Xin Zou et.al. | 2410.03577 | link |
2024-10-04 | Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | Abrar Rahman et.al. | 2410.03568 | null |
2024-10-04 | Structure-Enhanced Protein Instruction Tuning: Towards General-Purpose Protein Understanding | Wei Wu et.al. | 2410.03553 | null |
2024-10-04 | Re-examining Sexism and Misogyny Classification with Annotator Attitudes | Aiqi Jiang et.al. | 2410.03543 | null |
2024-10-04 | No Need to Talk: Asynchronous Mixture of Language Models | Anastasiia Filippova et.al. | 2410.03529 | null |
2024-10-04 | Steering Large Language Models between Code Execution and Textual Reasoning | Yongchao Chen et.al. | 2410.03524 | null |
2024-10-04 | A Probabilistic Perspective on Unlearning and Alignment for Large Language Models | Yan Scholten et.al. | 2410.03523 | null |
2024-10-04 | CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios | Zetian Ouyang et.al. | 2410.03502 | link |
2024-10-04 | FedStein: Enhancing Multi-Domain Federated Learning Through James-Stein Estimator | Sunny Gupta et.al. | 2410.03499 | link |
2024-10-04 | Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores | Robert E. Blackwell et.al. | 2410.03492 | null |
2024-10-03 | Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Nick Jiang et.al. | 2410.02762 | link |
2024-10-03 | FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models | Zhipei Xu et.al. | 2410.02761 | link |
2024-10-03 | Erasing Conceptual Knowledge from Language Models | Rohit Gandikota et.al. | 2410.02760 | link |
2024-10-03 | Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Yuqing Wang et.al. | 2410.02757 | null |
2024-10-03 | SIEVE: General Purpose Data Filtering System Matching GPT-4o Accuracy at 1% the Cost | Jifan Zhang et.al. | 2410.02755 | null |
2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749 | link |
2024-10-03 | CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation | Han He et.al. | 2410.02748 | null |
2024-10-03 | Contrastive Localized Language-Image Pre-Training | Hong-You Chen et.al. | 2410.02746 | null |
2024-10-03 | Neutral residues: revisiting adapters for model extension | Franck Signe Talla et.al. | 2410.02744 | null |
2024-10-03 | MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | Yekun Chai et.al. | 2410.02743 | null |
2024-10-03 | Grounding Large Language Models In Embodied Environment With Imperfect World Models | Haolan Liu et.al. | 2410.02742 | null |
2024-10-03 | Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization | Lei Xu et.al. | 2410.02741 | link |
2024-10-03 | Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | Zhengfeng Lai et.al. | 2410.02740 | null |
2024-10-04 | Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge | Jiayi Ye et.al. | 2410.02736 | null |
2024-10-03 | DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Zhaowei Wang et.al. | 2410.02730 | link |
2024-10-03 | Unified Multi-Modal Interleaved Document Representation for Information Retrieval | Jaewoo Lee et.al. | 2410.02729 | null |
2024-10-03 | Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | Rohin Manvi et.al. | 2410.02725 | null |
2024-10-03 | Large Language Models as Markov Chains | Oussama Zekri et.al. | 2410.02724 | null |
2024-10-03 | Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization | Ryan C. Barron et.al. | 2410.02721 | null |
2024-10-03 | UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation | Zixuan Li et.al. | 2410.02719 | null |
2024-10-02 | Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads | Yuxiang Huang et.al. | 2410.01805 | link |
2024-10-02 | Efficient | Alex W. Neal Riasanovsky et.al. | 2410.01799 | null |
2024-10-02 | Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models | Joseph Lee et.al. | 2410.01795 | link |
2024-10-02 | When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | R. Thomas McCoy et.al. | 2410.01792 | null |
2024-10-02 | Investigating on RLHF methodology | Alexey Kutalev et.al. | 2410.01789 | null |
2024-10-02 | OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Heng Yang et.al. | 2410.01784 | link |
2024-10-02 | Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam et.al. | 2410.01782 | link |
2024-10-03 | Quantifying Generalization Complexity for Large Language Models | Zhenting Qi et.al. | 2410.01769 | link |
2024-10-02 | Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes | Hossein Sholehrasa et.al. | 2410.01755 | null |
2024-10-03 | Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks | Mengzhao Jia et.al. | 2410.01744 | link |
2024-10-02 | VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models | Kailai Feng et.al. | 2410.01738 | link |
2024-10-02 | Visual Perception in Text Strings | Qi Jia et.al. | 2410.01733 | link |
2024-10-02 | Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing | Yilmazcan Ozyurt et.al. | 2410.01727 | link |
2024-10-02 | Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting | Longyu Feng et.al. | 2410.01724 | null |
2024-10-02 | Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Zeyu Gan et.al. | 2410.01720 | link |
2024-10-02 | Examining the Role of Relationship Alignment in Large Language Models | Kristen M. Altenburger et.al. | 2410.01708 | null |
2024-10-02 | Interpretable Contrastive Monte Carlo Tree Search Reasoning | Zitian Gao et.al. | 2410.01707 | link |
2024-10-02 | An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Soham Govande et.al. | 2410.01704 | link |
2024-10-02 | CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs | Kangsheng Wang et.al. | 2410.01696 | null |
2024-10-02 | U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models | Tung-Yu Wu et.al. | 2410.01692 | null |
2024-09-30 | MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Haotian Zhang et.al. | 2409.20566 | null |
2024-09-30 | LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner | Xiaopan Zhang et.al. | 2409.20560 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557 | null |
2024-09-30 | UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | Qiaojun Yu et.al. | 2409.20551 | null |
2024-09-30 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang et.al. | 2409.20550 | null |
2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548 | null |
2024-09-30 | Uncertainty-Informed Screening for Safer Solvents Used in the Synthesis of Perovskite via Language Models | Arpan Mukherjee et.al. | 2409.20512 | null |
2024-09-30 | COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models | Divyanshu Daiya et.al. | 2409.20502 | null |
2024-09-30 | A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media | Dung Ha Nguyen et.al. | 2409.20467 | null |
2024-09-30 | Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments | Mohamed Elnoor et.al. | 2409.20445 | null |
2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441 | null |
2024-09-30 | HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding | Fan Yuan et.al. | 2409.20429 | null |
2024-09-30 | World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Jiacong Wang et.al. | 2409.20424 | link |
2024-09-30 | Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing | Connor Baumler et.al. | 2409.20390 | null |
2024-09-30 | Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation | Shan Chen et.al. | 2409.20385 | null |
2024-09-30 | Word-wise intonation model for cross-language TTS systems | Tomilov A. A. et.al. | 2409.20374 | null |
2024-09-30 | The Perfect Blend: Redefining RLHF with Mixture of Judges | Tengyu Xu et.al. | 2409.20370 | null |
2024-09-30 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao et.al. | 2409.20365 | link |
2024-09-30 | Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models | Yizhou Huang et.al. | 2409.20364 | null |
2024-09-30 | Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference | Ke Yi et.al. | 2409.20361 | null |
2024-09-27 | Exploring Token Pruning in Vision State Space Models | Zheng Zhan et.al. | 2409.18962 | null |
2024-09-27 | LML: Language Model Learning a Dataset for Data-Augmented Prediction | Praneeth Vadlapati et.al. | 2409.18957 | link |
2024-09-27 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li et.al. | 2409.18943 | link |
2024-09-27 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Heqing Zou et.al. | 2409.18938 | null |
2024-09-27 | Social Media Bot Policies: Evaluating Passive and Active Enforcement | Kristina Radivojevic et.al. | 2409.18931 | null |
2024-09-27 | AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow | Huizi Yu et.al. | 2409.18924 | null |
2024-09-27 | Soft Measures for Extracting Causal Collective Intelligence | Maryam Berijanian et.al. | 2409.18911 | link |
2024-09-27 | Improving Visual Object Tracking through Visual Prompting | Shih-Fang Chen et.al. | 2409.18901 | link |
2024-09-27 | IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation | Fan Lin et.al. | 2409.18892 | link |
2024-09-27 | Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models | Zehan Li et.al. | 2409.18878 | null |
2024-09-27 | Predicting and analyzing memorization within fine-tuned Large Language Models | Jérémie Dentan et.al. | 2409.18858 | null |
2024-09-27 | Mitigating Selection Bias with Node Pruning and Auxiliary Options | Hyeong Kyu Choi et.al. | 2409.18857 | null |
2024-09-27 | LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis | Hamed Babaei Giglou et.al. | 2409.18812 | link |
2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794 | null |
2024-09-27 | A Survey on the Honesty of Large Language Models | Siheng Li et.al. | 2409.18786 | link |
2024-09-27 | Enhancing Explainability in Multimodal Large Language Models Using Ontological Context | Jihen Amara et.al. | 2409.18753 | null |
2024-09-27 | OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph | Yujie Tang et.al. | 2409.18743 | null |
2024-09-27 | Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs | Gleb Mezentsev et.al. | 2409.18721 | link |
2024-09-27 | Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity | Sergey Berezin et.al. | 2409.18708 | link |
2024-09-27 | Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models | Yiming Chen et.al. | 2409.18680 | link |
2024-09-26 | EgoLM: Multi-Modal Language Model of Egocentric Motions | Fangzhou Hong et.al. | 2409.18127 | null |
2024-09-26 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Jing He et.al. | 2409.18124 | null |
2024-09-26 | Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography | Yuexi Du et.al. | 2409.18119 | null |
2024-09-26 | E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | Ye Liu et.al. | 2409.18111 | link |
2024-09-26 | Open-World Evaluation for Retrieving Diverse Perspectives | Hung-Ting Chen et.al. | 2409.18110 | null |
2024-09-26 | MALPOLON: A Framework for Deep Species Distribution Modeling | Theo Larcher et.al. | 2409.18102 | link |
2024-09-26 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
2024-09-26 | Infer Human's Intentions Before Following Natural Language Instructions | Yanming Wan et.al. | 2409.18073 | link |
2024-09-26 | Infering Alt-text For UI Icons With Large Language Models During App Development | Sabrina Haque et.al. | 2409.18060 | null |
2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053 | link |
2024-09-26 | EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | Kai Chen et.al. | 2409.18042 | null |
2024-09-26 | Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective | Yotam Wolf et.al. | 2409.18028 | null |
2024-09-26 | An Adversarial Perspective on Machine Unlearning for AI Safety | Jakub Łucki et.al. | 2409.18025 | link |
2024-09-26 | DARE: Diverse Visual Question Answering with Robustness Evaluation | Hannah Sterz et.al. | 2409.18023 | null |
2024-09-26 | Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles | Lewei He et.al. | 2409.18014 | null |
2024-09-26 | Control Industrial Automation System with Large Language Models | Yuchen Xia et.al. | 2409.18009 | link |
2024-09-26 | Multilingual Evaluation of Long Context Retrieval and Reasoning | Ameeta Agrawal et.al. | 2409.18006 | link |
2024-09-26 | Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation | Ashmi Banerjee et.al. | 2409.18003 | null |
2024-09-26 | Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models | Georg Ahnert et.al. | 2409.17990 | link |
2024-09-26 | LLM4Brain: Training a Large Language Model for Brain Video Understanding | Ruizhe Zheng et.al. | 2409.17987 | null |
2024-09-25 | Attention Prompting on Image for Large Vision-Language Models | Runpeng Yu et.al. | 2409.17143 | link |
2024-09-25 | FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Fazal Mittu et.al. | 2409.17141 | link |
2024-09-25 | Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | Junting Lu et.al. | 2409.17140 | null |
2024-09-25 | Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset | Andrew Goldberg et.al. | 2409.17126 | null |
2024-09-25 | Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | Fan Zhou et.al. | 2409.17115 | link |
2024-09-25 | Unveiling Ontological Commitment in Multi-Modal Foundation Models | Mert Keser et.al. | 2409.17109 | null |
2024-09-25 | Accumulator-Aware Post-Training Quantization | Ian Colbert et.al. | 2409.17092 | null |
2024-09-25 | Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Bowen Zhao et.al. | 2409.17080 | link |
2024-09-25 | VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | Yifei Liu et.al. | 2409.17066 | link |
2024-09-25 | Benchmarking Domain Generalization Algorithms in Computational Pathology | Neda Zamanitajeddin et.al. | 2409.17063 | null |
2024-09-25 | Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia | Azmul Asmar Irfan et.al. | 2409.17054 | null |
2024-09-25 | GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design | Phillip Mueller et.al. | 2409.17045 | null |
2024-09-25 | How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Francesco Verdini et.al. | 2409.17044 | null |
2024-09-25 | Counterfactual Token Generation in Large Language Models | Ivi Chatzi et.al. | 2409.17027 | link |
2024-09-25 | LLM-CARD: Towards a Description and Landscape of Large Language Models | Shengwei Tian et.al. | 2409.17011 | link |
2024-09-25 | Models Can and Should Embrace the Communicative Nature of Human-Generated Math | Sasha Boguraev et.al. | 2409.17005 | null |
2024-09-26 | INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Shimao Chen et.al. | 2409.16997 | link |
2024-09-25 | Harnessing Diversity for Important Data Selection in Pretraining Large Language Models | Chi Zhang et.al. | 2409.16986 | null |
2024-09-25 | AXCEL: Automated eXplainable Consistency Evaluation using LLMs | P Aditya Sreekar et.al. | 2409.16984 | null |
2024-09-25 | Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions | Zeyneb N. Kaya et.al. | 2409.16974 | null |
2024-09-24 | Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation | Yong Xien Chng et.al. | 2409.16278 | null |
2024-09-24 | LLM Echo Chamber: personalized and automated disinformation | Tony Ma et.al. | 2409.16241 | link |
2024-09-24 | EuroLLM: Multilingual Language Models for Europe | Pedro Henrique Martins et.al. | 2409.16235 | null |
2024-09-24 | Fine-Tuning is Fine, if Calibrated | Zheda Mai et.al. | 2409.16223 | link |
2024-09-24 | Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models | Omar Mussa et.al. | 2409.16220 | link |
2024-09-24 | LLMCount: Enhancing Stationary mmWave Detection with Multimodal-LLM | Boyan Li et.al. | 2409.16209 | null |
2024-09-25 | CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data | Qian-Wen Zhang et.al. | 2409.16202 | link |
2024-09-24 | Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking | Jun Bai et.al. | 2409.16198 | null |
2024-09-24 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que et.al. | 2409.16191 | link |
2024-09-24 | Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation | Xiaohong Liu et.al. | 2409.16183 | null |
2024-09-24 | SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image | Dimitrije Antić et.al. | 2409.16178 | null |
2024-09-24 | Cyber Knowledge Completion Using Large Language Models | Braden K Webb et.al. | 2409.16176 | null |
2024-09-24 | Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | Ziyu Zhao et.al. | 2409.16167 | null |
2024-09-24 | EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges | Talor Abramovich et.al. | 2409.16165 | link |
2024-09-24 | ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Emanuele Vivoli et.al. | 2409.16159 | link |
2024-09-24 | Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework | Lu Chen et.al. | 2409.16146 | link |
2024-09-24 | Evaluation of state-of-the-art ASR Models in Child-Adult Interactions | Aditya Ashvin et.al. | 2409.16135 | null |
2024-09-24 | MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | Ming Zhu et.al. | 2409.16120 | link |
2024-09-25 | Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Pin-Jui Ku et.al. | 2409.16117 | link |
2024-09-24 | Exploring Hint Generation Approaches in Open-Domain Question Answering | Jamshid Mozafari et.al. | 2409.16096 | link |
2024-09-20 | Gender Representation and Bias in Indian Civil Service Mock Interviews | Somonnoy Banerjee et.al. | 2409.12194 | null |
2024-09-18 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Peng Wang et.al. | 2409.12191 | link |
2024-09-18 | To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | Zayne Sprague et.al. | 2409.12183 | link |
2024-09-23 | A Controlled Study on Long Context Extension and Generalization in LLMs | Yi Lu et.al. | 2409.12181 | link |
2024-09-18 | Finetuning Language Models to Emit Linguistic Expressions of Uncertainty | Arslan Chaudhry et.al. | 2409.12180 | null |
2024-09-18 | Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference | Najmeh Forouzandehmehr et.al. | 2409.12150 | null |
2024-09-18 | MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | Justin Chih-Yao Chen et.al. | 2409.12147 | link |
2024-09-18 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | Kalakonda Sai Shashank et.al. | 2409.12140 | null |
2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139 | null |
2024-09-18 | GRIN: GRadient-INformed MoE | Liyuan Liu et.al. | 2409.12136 | null |
2024-09-18 | Linguini: A benchmark for language-agnostic linguistic reasoning | Eduardo Sánchez et.al. | 2409.12126 | link |
2024-09-18 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | An Yang et.al. | 2409.12122 | null |
2024-09-18 | Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference | Edresson Casanova et.al. | 2409.12117 | null |
2024-09-18 | Measuring Human and AI Values based on Generative Psychometrics with Large Language Models | Haoran Ye et.al. | 2409.12106 | link |
2024-09-19 | Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval | Warren Jouanneau et.al. | 2409.12097 | null |
2024-09-19 | The Impact of Element Ordering on LM Agent Performance | Wayne Chi et.al. | 2409.12089 | link |
2024-09-18 | Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking | Ningyuan Xi et.al. | 2409.12059 | null |
2024-09-19 | Using Large Language Models to Generate Clinical Trial Tables and Figures | Yumeng Yang et.al. | 2409.12046 | null |
2024-09-18 | All-in-one foundational models learning across quantum chemical levels | Yuxinxin Chen et.al. | 2409.12015 | link |
2024-09-18 | Mixture of Prompt Learning for Vision Language Models | Yu Du et.al. | 2409.12011 | null |
2024-09-17 | AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs | Basel Mousi et.al. | 2409.11404 | null |
2024-09-17 | NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai et.al. | 2409.11402 | null |
2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390 | null |
2024-09-17 | Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Simon Yu et.al. | 2409.11378 | link |
2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376 | null |
2024-09-17 | Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification | Fatema-E- Jannat et.al. | 2409.11375 | null |
2024-09-17 | Learning Spatially-Aware Language and Audio Embedding | Bhavika Devnani et.al. | 2409.11369 | null |
2024-09-17 | CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration | Jiahui Gao et.al. | 2409.11365 | null |
2024-09-17 | CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | Zachary S. Siegel et.al. | 2409.11363 | link |
2024-09-17 | AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | Dhruv Agarwal et.al. | 2409.11360 | null |
2024-09-17 | THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Mengfei Liang et.al. | 2409.11353 | link |
2024-09-17 | LPT++: Efficient Training on Mixture of Long-tailed Experts | Bowen Dong et.al. | 2409.11323 | null |
2024-09-17 | SOAP: Improving and Stabilizing Shampoo using Adam | Nikhil Vyas et.al. | 2409.11321 | link |
2024-09-17 | Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models | Divij Gupta et.al. | 2409.11302 | null |
2024-09-17 | Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 | Marcel Lamott et.al. | 2409.11282 | null |
2024-09-17 | P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task | Weiye Xu et.al. | 2409.11279 | null |
2024-09-17 | Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | Maria Rigaki et.al. | 2409.11276 | null |
2024-09-17 | Task Arithmetic for Language Expansion in Speech Translation | Yao-Fei Cheng et.al. | 2409.11274 | null |
2024-09-17 | LOLA -- An Open-Source Massively Multilingual Large Language Model | Nikit Srivastava et.al. | 2409.11272 | link |
2024-09-17 | Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models | Jiahao Qin et.al. | 2409.11263 | null |
2024-09-16 | RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | Di Liu et.al. | 2409.10516 | link |
2024-09-16 | Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | Momoko Shiraishi et.al. | 2409.10506 | null |
2024-09-16 | DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction | John Wu et.al. | 2409.10504 | null |
2024-09-16 | Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | Kulin Shah et.al. | 2409.10502 | link |
2024-09-16 | Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | Shaznin Sultana et.al. | 2409.10490 | null |
2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et.al. | 2409.10488 | null |
2024-09-16 | XLM for Autonomous Driving Systems: A Comprehensive Review | Sonda Fourati et.al. | 2409.10484 | null |
2024-09-16 | Schrodinger's Memory: Large Language Models | Wei Wang et.al. | 2409.10482 | null |
2024-09-16 | Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face | Adekunle Ajibode et.al. | 2409.10472 | null |
2024-09-16 | LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | Jicong Ao et.al. | 2409.10444 | link |
2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441 | null |
2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419 | null |
2024-09-16 | A Large-Scale Privacy Assessment of Android Third-Party SDKs | Mark Huasong Meng et.al. | 2409.10411 | null |
2024-09-16 | A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration | Zhang Zheng et.al. | 2409.10403 | null |
2024-09-17 | Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | Bhuvan Sachdeva et.al. | 2409.10354 | null |
2024-09-16 | Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation | Tianrui Song et.al. | 2409.10343 | null |
2024-09-16 | The 20 questions game to distinguish large language models | Gurvan Richardeau et.al. | 2409.10338 | null |
2024-09-16 | MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation | Shanshan Wang et.al. | 2409.10294 | null |
2024-09-16 | ReflectDiffu: Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework | Jiahao Yuan et.al. | 2409.10289 | link |
2024-09-16 | ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code | Jia Feng et.al. | 2409.10280 | link |
2024-09-13 | Agents in Software Engineering: Survey, Landscape, and Vision | Yanxian Huang et.al. | 2409.09030 | link |
2024-09-13 | Contri(e)ve: Context + Retrieve for Scholarly Question Answering | Kanchan Shivashankar et.al. | 2409.09010 | null |
2024-09-13 | Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | Lucio La Cava et.al. | 2409.08963 | null |
2024-09-13 | Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | Zahra Ashktorab et.al. | 2409.08937 | null |
2024-09-13 | SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records | Paloma Rabaey et.al. | 2409.08936 | link |
2024-09-13 | LLM-based Weak Supervision Framework for Query Intent Classification in Video Search | Farnoosh Javadi et.al. | 2409.08931 | null |
2024-09-13 | Affective Computing Has Changed: The Foundation Model Disruption | Björn Schuller et.al. | 2409.08907 | null |
2024-09-13 | AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Yifei Yao et.al. | 2409.08904 | link |
2024-09-13 | A Market for Lemons? Strategic Directions for a Vigilant Application of Artificial Intelligence in Entrepreneurship Research | Martin Obschonka et.al. | 2409.08890 | null |
2024-09-13 | Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark | Xuchen Li et.al. | 2409.08887 | null |
2024-09-13 | Exploring Graph Structure Comprehension Ability of Multimodal Large Language Models: Case Studies | Zhiqiang Zhong et.al. | 2409.08864 | null |
2024-09-13 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu et.al. | 2409.08846 | null |
2024-09-13 | AIPO: Improving Training Objective for Iterative Preference Optimization | Yaojie Shen et.al. | 2409.08845 | link |
2024-09-13 | A RAG Approach for Generating Competency Questions in Ontology Engineering | Xueli Pan et.al. | 2409.08820 | null |
2024-09-13 | Your Weak LLM is Secretly a Strong Teacher for Alignment | Leitian Tao et.al. | 2409.08813 | null |
2024-09-13 | Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task | Shao Zhang et.al. | 2409.08811 | null |
2024-09-13 | LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Huan Zhang et.al. | 2409.08795 | link |
2024-09-13 | Optimizing Ingredient Substitution Using Large Language Models to Enhance Phytochemical Content in Recipes | Luis Rita et.al. | 2409.08792 | null |
2024-09-13 | Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling | Jialu Tang et.al. | 2409.08788 | null |
2024-09-13 | Uncertainty and Generalizability in Foundation Models for Earth Observation | Raul Ramos-Pollan et.al. | 2409.08744 | null |
2024-09-12 | Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | Rogerio Bonatti et.al. | 2409.08264 | link |
2024-09-12 | OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering | Jiahao Nick Li et.al. | 2409.08250 | null |
2024-09-12 | Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | Alisia Lupidi et.al. | 2409.08239 | null |
2024-09-12 | LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | Hakan T. Otal et.al. | 2409.08234 | link |
2024-09-12 | Adaptive Language-Guided Abstraction from Contrastive Explanations | Andi Peng et.al. | 2409.08212 | null |
2024-09-12 | ComAlign: Compositional Alignment in Vision-Language Models | Ali Abdollah et.al. | 2409.08206 | null |
2024-09-12 | What Makes a Maze Look Like a Maze? | Joy Hsu et.al. | 2409.08202 | null |
2024-09-12 | AudioBERT: Audio Knowledge Augmented Language Model | Hyunjong Ok et.al. | 2409.08199 | link |
2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185 | link |
2024-09-12 | On the Role of Context in Reading Time Prediction | Andreas Opedal et.al. | 2409.08160 | link |
2024-09-12 | Faster Speech-LLaMA Inference with Multi-token Prediction | Desh Raj et.al. | 2409.08148 | null |
2024-09-12 | LLM-POTUS Score: A Framework of Analyzing Presidential Debates with Large Language Models | Zhengliang Liu et.al. | 2409.08147 | null |
2024-09-12 | Towards a graph-based foundation model for network traffic analysis | Louis Van Langendonck et.al. | 2409.08111 | null |
2024-09-12 | The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language | Michael Ong et.al. | 2409.08103 | null |
2024-09-12 | The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Huiyuan Xie et.al. | 2409.08098 | null |
2024-09-12 | Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks | Benji Peng et.al. | 2409.08087 | null |
2024-09-12 | SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality | Chenyang Lei et.al. | 2409.08083 | link |
2024-09-12 | SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing | An Guo et.al. | 2409.08081 | null |
2024-09-12 | TravelAgent: An AI Assistant for Personalized Travel Planning | Aili Chen et.al. | 2409.08069 | null |
2024-09-12 | An Evaluation Framework for Attributed Information Retrieval using Large Language Models | Hanane Djeddal et.al. | 2409.08014 | link |
2024-09-11 | "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays | Shengxin Hong et.al. | 2409.07453 | null |
2024-09-11 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | Sijie Zhao et.al. | 2409.07447 | null |
2024-09-11 | SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | Ben Bogin et.al. | 2409.07440 | link |
2024-09-11 | A Suite for Acoustic Language Model Evaluation | Gallil Maimon et.al. | 2409.07437 | link |
2024-09-11 | Synthetic continued pretraining | Zitong Yang et.al. | 2409.07431 | link |
2024-09-11 | Agent Workflow Memory | Zora Zhiruo Wang et.al. | 2409.07429 | link |
2024-09-11 | CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification | Zeqing Qin et.al. | 2409.07407 | null |
2024-09-11 | AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | Han Wang et.al. | 2409.07394 | link |
2024-09-11 | Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination | Daniel Zhang-Li et.al. | 2409.07372 | null |
2024-09-11 | Demo: SGCode: A Flexible Prompt-Optimizing System for Secure Generation of Code | Khiem Ton et.al. | 2409.07368 | null |
2024-09-11 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu et.al. | 2409.07355 | link |
2024-09-11 | Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | Md Zarif Hossain et.al. | 2409.07353 | link |
2024-09-11 | Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization | Mehrdad Zakershahrak et.al. | 2409.07335 | null |
2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331 | null |
2024-09-11 | MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | Praveen K Kanithi et.al. | 2409.07314 | null |
2024-09-11 | Exploring User-level Gradient Inversion with a Diffusion Prior | Zhuohang Li et.al. | 2409.07291 | null |
2024-09-11 | STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM | Qijiong Liu et.al. | 2409.07276 | null |
2024-09-11 | MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Enming Zhang et.al. | 2409.07267 | link |
2024-09-11 | Alignment of Diffusion Models: Fundamentals, Challenges, and Future | Buhua Liu et.al. | 2409.07253 | link |
2024-09-11 | PiTe: Pixel-Temporal Alignment for Large Video-Language Model | Yang Liu et.al. | 2409.07239 | link |
2024-09-10 | Benchmarking Sub-Genre Classification For Mainstage Dance Music | Hongzhi Shu et.al. | 2409.06690 | null |
2024-09-10 | E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | Zihan Liao et.al. | 2409.06679 | null |
2024-09-10 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang et.al. | 2409.06666 | link |
2024-09-10 | Human Perception of LLM-generated Text Content in Social Media Environments | Kristina Radivojevic et.al. | 2409.06653 | null |
2024-09-10 | Optimal Workload Placement on Multi-Instance GPUs | Bekir Turkkan et.al. | 2409.06646 | null |
2024-09-10 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Danli Shi et.al. | 2409.06644 | null |
2024-09-11 | Segmenting sea ice floes in close-range optical imagery with active contour and foundation models | Giulio Passerotti et.al. | 2409.06641 | null |
2024-09-10 | TeXBLEU: Automatic Metric for Evaluate LaTeX Format | Kyudan Jung et.al. | 2409.06639 | link |
2024-09-10 | MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders | Wenyu Zhang et.al. | 2409.06635 | null |
2024-09-10 | A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio | Ningyuan Xi et.al. | 2409.06624 | null |
2024-09-10 | Exploring Italian sentence embeddings properties through multi-tasking | Vivi Nastase et.al. | 2409.06622 | link |
2024-09-10 | Alleviating Hallucinations in Large Language Models with Scepticism Modeling | Yetao Wu et.al. | 2409.06601 | null |
2024-09-10 | GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | Sacha Muller et.al. | 2409.06595 | link |
2024-09-10 | Quantifying and Enabling the Interpretability of CLIP-like Models | Avinash Madasu et.al. | 2409.06579 | null |
2024-09-10 | Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Vivi Nastase et.al. | 2409.06567 | null |
2024-09-10 | MAPS: Energy-Reliability Tradeoff Management in Autonomous Vehicles Through LLMs Penetrated Science | Mahdieh Aliazam et.al. | 2409.06558 | null |
2024-09-10 | Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games | Juhwan Choi et.al. | 2409.06518 | link |
2024-09-10 | Aligning Machine and Human Visual Representations across Abstraction Levels | Lukas Muttenthaler et.al. | 2409.06509 | null |
2024-09-10 | Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding | Xiaoyu Liang et.al. | 2409.06485 | null |
2024-09-10 | Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles | Qiujing Lu et.al. | 2409.06450 | null |
2024-09-09 | MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Run Luo et.al. | 2409.05840 | null |
2024-09-09 | Are Large Language Models a Threat to Programming Platforms? An Exploratory Study | Md Mustakim Billah et.al. | 2409.05824 | null |
2024-09-09 | VFA: Vision Frequency Analysis of Foundation Models and Human | Mohammad-Javad Darvishi-Bayazi et.al. | 2409.05817 | null |
2024-09-09 | Improving Pretraining Data Using Perplexity Correlations | Tristan Thrush et.al. | 2409.05816 | null |
2024-09-09 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu et.al. | 2409.05806 | link |
2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng et.al. | 2409.05771 | null |
2024-09-09 | Model Input Verification of Large Scale Simulations | Rumyana Neykova et.al. | 2409.05768 | null |
2024-09-09 | A Novel Idea Generation Tool using a Structured Conversational AI (CAI) System | B. Sankar et.al. | 2409.05747 | null |
2024-09-09 | LLMs Will Always Hallucinate, and We Need to Live With This | Sourav Banerjee et.al. | 2409.05746 | null |
2024-09-09 | A System and Benchmark for LLM-based Q&A on Heterogeneous Data | Achille Fokoue et.al. | 2409.05735 | null |
2024-09-09 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou et.al. | 2409.05732 | null |
2024-09-09 | The Influence of Task and Group Disparities over Users' Attitudes Toward Using Large Language Models for Psychotherapy | Qihang He et.al. | 2409.05703 | null |
2024-09-09 | Segmentation by Factorization: Unsupervised Semantic Segmentation for Pathology by Factorizing Foundation Model Features | Jacob Gildenblat et.al. | 2409.05697 | null |
2024-09-09 | Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! | Yuchen Shen et.al. | 2409.05672 | null |
2024-09-09 | Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case | Vagrant Gautam et.al. | 2409.05653 | link |
2024-09-10 | MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Hongjin Qian et.al. | 2409.05591 | link |
2024-09-09 | Leveraging Content and Acoustic Representations for Efficient Speech Emotion Recognition | Soumya Dutta et.al. | 2409.05566 | null |
2024-09-09 | CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | Jinwei He et.al. | 2409.05559 | null |
2024-09-09 | SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | Alireza Ghafarollahi et.al. | 2409.05556 | link |
2024-09-09 | Harmonic Reasoning in Large Language Models | Anna Kruspe et.al. | 2409.05521 | null |
2024-09-06 | VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | Yecheng Wu et.al. | 2409.04429 | link |
2024-09-06 | Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques | Davide Clode da Silva et.al. | 2409.04424 | null |
2024-09-06 | RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu et.al. | 2409.04421 | null |
2024-09-06 | Question-Answering Dense Video Events | Hangyu Qin et.al. | 2409.04388 | null |
2024-09-06 | Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | Aliakbar Nafar et.al. | 2409.04318 | link |
2024-09-06 | An optically accelerated extreme learning machine using hot atomic vapors | Pierre Azam et.al. | 2409.04312 | null |
2024-09-06 | Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | Desiree Heim et.al. | 2409.04286 | null |
2024-09-06 | Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | Yuxiao Huang et.al. | 2409.04270 | null |
2024-09-06 | An overview of domain-specific foundation model: key technologies, applications and challenges | Haolong Chen et.al. | 2409.04267 | null |
2024-09-06 | UniDet3D: Multi-dataset Indoor 3D Object Detection | Maksim Kolodiazhnyi et.al. | 2409.04234 | link |
2024-09-06 | Fast Forwarding Low-Rank Training | Adir Rahamim et.al. | 2409.04206 | null |
2024-09-06 | Residual Stream Analysis with Multi-Layer SAEs | Tim Lawson et.al. | 2409.04185 | link |
2024-09-06 | GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | Ziyin Zhang et.al. | 2409.04183 | null |
2024-09-06 | Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering | Larissa Pusch et.al. | 2409.04181 | null |
2024-09-06 | From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Andreas Stephan et.al. | 2409.04168 | null |
2024-09-06 | Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation | Luis Mayer et.al. | 2409.04164 | null |
2024-09-06 | Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering | Jan Hofmann et.al. | 2409.04122 | null |
2024-09-06 | Multi-Programming Language Ensemble for Code Generation in Large Language Model | Tengfei Xue et.al. | 2409.04114 | link |
2024-09-06 | Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Chenglei Si et.al. | 2409.04109 | link |
2024-09-06 | UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity | Yicheng Fu et.al. | 2409.04081 | null |
2024-09-05 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Yunze Man et.al. | 2409.03757 | link |
2024-09-05 | Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution | Marga Don et.al. | 2409.03754 | link |
2024-09-05 | Attention Heads of Large Language Models: A Survey | Zifan Zheng et.al. | 2409.03752 | link |
2024-09-05 | LLM-CI: Assessing Contextual Integrity Norms in Language Models | Yan Shvartzshnaider et.al. | 2409.03735 | null |
2024-09-05 | Safety vs. Performance: How Multi-Objective Learning Reduces Barriers to Market Entry | Meena Jagadeesan et.al. | 2409.03734 | null |
2024-09-05 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang et.al. | 2409.03733 | link |
2024-09-06 | RAG based Question-Answering for Contextual Response Prediction System | Sriram Veturi et.al. | 2409.03708 | null |
2024-09-05 | LAST: Language Model Aware Speech Tokenization | Arnon Turetzky et.al. | 2409.03701 | null |
2024-09-05 | TRACE-cs: Trustworthy Reasoning for Contrastive Explanations in Course Scheduling Problems | Stylianos Loukas Vasileiou et.al. | 2409.03671 | null |
2024-09-05 | A Fused Large Language Model for Predicting Startup Success | Abdurahman Maarouf et.al. | 2409.03668 | null |
2024-09-05 | The representation landscape of few-shot learning and fine-tuning in large language models | Diego Doimo et.al. | 2409.03662 | link |
2024-09-06 | LLM-based multi-agent poetry generation in non-cooperative environments | Ran Zhang et.al. | 2409.03659 | link |
2024-09-05 | On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | Yong Lin et.al. | 2409.03650 | null |
2024-09-05 | Text-Guided Mixup Towards Long-Tailed Image Categorization | Richard Franklin et.al. | 2409.03583 | link |
2024-09-05 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | Xi Chen et.al. | 2409.03525 | null |
2024-09-05 | Have Large Vision-Language Models Mastered Art History? | Ombretta Strafforello et.al. | 2409.03521 | null |
2024-09-05 | Tissue Concepts: supervised foundation models in computational pathology | Till Nicke et.al. | 2409.03519 | link |
2024-09-05 | From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | Jifan Yu et.al. | 2409.03512 | null |
2024-09-05 | LLM-based event abstraction and integration for IoT-sourced logs | Mohsen Shirali et.al. | 2409.03478 | link |
2024-09-05 | How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes | Inacio Vieira et.al. | 2409.03454 | null |
2024-09-04 | RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version) | Yao Mu et.al. | 2409.02920 | null |
2024-09-04 | Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | Yuhang Lu et.al. | 2409.02914 | null |
2024-09-04 | Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling | Kaiwen Zheng et.al. | 2409.02908 | null |
2024-09-05 | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Jiajie Zhang et.al. | 2409.02897 | link |
2024-09-04 | LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | Xidong Wang et.al. | 2409.02889 | link |
2024-09-04 | CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently | Jonathan Zalach et.al. | 2409.02885 | null |
2024-09-04 | Benchmarking Spurious Bias in Few-Shot Image Classifiers | Guangtao Zheng et.al. | 2409.02882 | link |
2024-09-04 | Configurable Foundation Models: Building LLMs from a Modular Perspective | Chaojun Xiao et.al. | 2409.02877 | null |
2024-09-04 | Historical German Text Normalization Using Type- and Token-Based Language Modeling | Anton Ehrmanntraut et.al. | 2409.02841 | null |
2024-09-04 | Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models | Moein Shahiki Tash et.al. | 2409.02836 | null |
2024-09-04 | CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Wentao Liu et.al. | 2409.02834 | link |
2024-09-04 | ExpLLM: Towards Chain of Thought for Facial Expression Recognition | Xing Lan et.al. | 2409.02828 | null |
2024-09-04 | Design Contradictions: Help or Hindrance? | Aron E. Owen et.al. | 2409.02823 | null |
2024-09-04 | Language Understanding as a Constraint on Consensus Size in LLM Societies | Giordano De Marzo et.al. | 2409.02822 | null |
2024-09-04 | Towards a Unified View of Preference Learning for Large Language Models: A Survey | Bofei Gao et.al. | 2409.02795 | link |
2024-09-05 | Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | Yixuan Tang et.al. | 2409.02727 | link |
2024-09-04 | Pre-training data selection for biomedical domain adaptation using journal impact metrics | Mathieu Laï-king et.al. | 2409.02725 | null |
2024-09-04 | Alignment-Aware Model Extraction Attacks on Large Language Models | Zi Liang et.al. | 2409.02718 | link |
2024-09-04 | Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | Mohammad Reshadati et.al. | 2409.02711 | null |
2024-09-04 | LLM-Assisted Visual Analytics: Opportunities and Challenges | Maeve Hutchinson et.al. | 2409.02691 | null |
2024-08-30 | SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | Raoyuan Zhao et.al. | 2408.17437 | link |
2024-08-30 | DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model | Mona Sheikh Zeinoddin et.al. | 2408.17433 | link |
2024-08-30 | Advancing Multi-talker ASR Performance with Large Language Models | Mohan Shi et.al. | 2408.17431 | null |
2024-08-30 | CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Jonathan Bourne et.al. | 2408.17428 | null |
2024-09-03 | Open-vocabulary Temporal Action Localization using VLMs | Naoki Wake et.al. | 2408.17422 | null |
2024-08-30 | Getting Inspiration for Feature Elicitation: App Store- vs. LLM-based Approach | Jialiang Wei et.al. | 2408.17404 | link |
2024-08-30 | EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution | Francesco Argenziano et.al. | 2408.17379 | null |
2024-08-30 | NDP: Next Distribution Prediction as a More Broad Target | Junhao Ruan et.al. | 2408.17377 | null |
2024-08-30 | Assessing Generative Language Models in Classification Tasks: Performance and Self-Evaluation Capabilities in the Environmental and Climate Change Domain | Francesca Grasso et.al. | 2408.17362 | link |
2024-08-30 | Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage | Md Rafi Ur Rashid et.al. | 2408.17354 | null |
2024-09-02 | LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation | Shuyi Ouyang et.al. | 2408.17347 | null |
2024-08-30 | Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering | Nicholas Pochinkov et.al. | 2408.17322 | link |
2024-08-30 | Bridging Domain Knowledge and Process Discovery Using Large Language Models | Ali Norouzifar et.al. | 2408.17316 | link |
2024-08-30 | Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | Rhui Dih Lee et.al. | 2408.17280 | null |
2024-08-30 | Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach | Tong Nie et.al. | 2408.17258 | null |
2024-08-30 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | Mouxiang Chen et.al. | 2408.17253 | link |
2024-08-30 | Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study | Shubham Agarwal et.al. | 2408.17181 | null |
2024-08-30 | Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Zhen Ye et.al. | 2408.17175 | link |
2024-08-30 | Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning | Xiaoye Qu et.al. | 2408.17150 | link |
2024-08-30 | Reasoning AI Performance Degradation in 6G Networks with Large Language Models | Liming Huang et.al. | 2408.17097 | null |
2024-08-29 | PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning | Noor Hussein et.al. | 2408.16769 | link |
2024-08-29 | How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models | Jiyue Jiang et.al. | 2408.16756 | link |
2024-08-29 | Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models | Alec Solway et.al. | 2408.16753 | null |
2024-08-29 | A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models | Yi-Lin Tuan et.al. | 2408.16751 | null |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749 | null |
2024-08-29 | Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models | Jiří Milička et.al. | 2408.16740 | null |
2024-08-29 | Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | Hritik Bansal et.al. | 2408.16737 | null |
2024-08-29 | VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation | Shiwei Wu et.al. | 2408.16730 | null |
2024-08-30 | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Zhifei Xie et.al. | 2408.16725 | link |
2024-08-29 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2408.16700 | link |
2024-08-29 | Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | Ziniu Li et.al. | 2408.16673 | null |
2024-08-29 | Space3D-Bench: Spatial 3D Question Answering Benchmark | Emilia Szymanska et.al. | 2408.16662 | null |
2024-08-29 | DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Yongjie Fu et.al. | 2408.16647 | null |
2024-08-29 | Examination of Code generated by Large Language Models | Robin Beer et.al. | 2408.16601 | link |
2024-08-29 | Enhancing Dialogue Generation in Werewolf Game Through Situation Analysis and Persuasion Strategies | Zhiyang Qi et.al. | 2408.16586 | null |
2024-08-29 | WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Shengpeng Ji et.al. | 2408.16532 | link |
2024-08-29 | CNIMA: A Universal Evaluation Framework and Automated Approach for Assessing Second Language Dialogues | Rena Gao et.al. | 2408.16518 | link |
2024-08-29 | LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs? | Jan Cegin et.al. | 2408.16502 | null |
2024-08-29 | CogVLM2: Visual Language Models for Image and Video Understanding | Wenyi Hong et.al. | 2408.16500 | link |
2024-08-29 | A Survey on Evaluating Large Language Models in Code Generation Tasks | Liguo Chen et.al. | 2408.16498 | null |
2024-08-28 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi et.al. | 2408.15998 | link |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996 | null |
2024-08-28 | Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration | Xu Zhang et.al. | 2408.15994 | null |
2024-08-28 | BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | Wei Wang et.al. | 2408.15971 | null |
2024-08-28 | More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding | Yuan Tang et.al. | 2408.15966 | link |
2024-08-28 | Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | Nicholas R. Waytowich et.al. | 2408.15950 | null |
2024-08-28 | DeMoBot: Deformable Mobile Manipulation with Vision-based Sub-goal Retrieval | Yuying Zhang et.al. | 2408.15919 | null |
2024-08-28 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang et.al. | 2408.15915 | link |
2024-08-28 | Decentralized LLM Inference over Edge Networks with Energy Harvesting | Aria Khoshsirat et.al. | 2408.15907 | null |
2024-08-28 | LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | Ruirui Chen et.al. | 2408.15903 | null |
2024-08-28 | Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Nikolas Gritsch et.al. | 2408.15901 | null |
2024-08-28 | Bias in LLMs as Annotators: The Effect of Party Cues on Labelling Decision by Large Language Models | Sebastian Vallejo Vera et.al. | 2408.15895 | null |
2024-08-28 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Fangxun Shu et.al. | 2408.15881 | link |
2024-08-28 | Persuasion Games using Large Language Models | Ganesh Prasath Ramani et.al. | 2408.15879 | null |
2024-08-28 | Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations: A Tool-Chaining Problem-Solving Framework with Attributable Reflection | Sagar Srinivas Sakhinana et.al. | 2408.15866 | null |
2024-08-28 | Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Peter Neidlinger et.al. | 2408.15823 | null |
2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802 | null |
2024-08-28 | Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization | Léo Hemamou et.al. | 2408.15801 | null |
2024-08-28 | Evaluating Named Entity Recognition Using Few-Shot Prompting with Large Language Models | Hédi Zhegidi et.al. | 2408.15796 | link |
2024-08-28 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu et.al. | 2408.15792 | link |
2024-08-27 | Generative Verifiers: Reward Modeling as Next-Token Prediction | Lunjun Zhang et.al. | 2408.15240 | null |
2024-08-27 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Junxiong Wang et.al. | 2408.15237 | link |
2024-08-27 | Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | Yucheng Jiang et.al. | 2408.15232 | null |
2024-08-27 | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li et.al. | 2408.15221 | null |
2024-08-27 | Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks | Shide Zhou et.al. | 2408.15207 | null |
2024-08-27 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Jian Hu et.al. | 2408.15205 | link |
2024-08-27 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | Kristina Gligorić et.al. | 2408.15204 | link |
2024-08-27 | Infusing Acoustic Pause Context into Text-Based Dementia Assessment | Franziska Braun et.al. | 2408.15188 | null |
2024-08-27 | Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement | Longshen Ou et.al. | 2408.15176 | null |
2024-08-27 | X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation | Hanjia Lyu et.al. | 2408.15172 | null |
2024-08-27 | Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman et.al. | 2408.15171 | null |
2024-08-27 | How transformers learn structured data: insights from hierarchical filtering | Jerome Garnier-Brun et.al. | 2408.15138 | null |
2024-08-27 | CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP | Zhenchen Tang et.al. | 2408.15098 | null |
2024-08-27 | Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models | Xiyu Liu et.al. | 2408.15091 | null |
2024-08-27 | BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | Guosheng Dong et.al. | 2408.15079 | null |
2024-08-27 | Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models | Ned Cooper et.al. | 2408.15066 | null |
2024-08-27 | The Benefits of Balance: From Information Projections to Variance Reduction | Lang Liu et.al. | 2408.15065 | null |
2024-08-28 | DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Wenhui Liao et.al. | 2408.15045 | null |
2024-08-28 | A Survey of Large Language Models for European Languages | Wazir Ali et.al. | 2408.15040 | null |
2024-08-27 | Speech Recognition Transformers: Topological-lingualism Perspective | Shruti Singh et.al. | 2408.14991 | null |
2024-08-26 | A Practitioner's Guide to Continual Multimodal Pretraining | Karsten Roth et.al. | 2408.14471 | link |
2024-08-27 | Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models | Aradhye Agarwal et.al. | 2408.14470 | link |
2024-08-26 | Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos | Qirui Chen et.al. | 2408.14469 | null |
2024-08-26 | Explicit Inductive Inference using Large Language Models | Tianyang Liu et.al. | 2408.14467 | null |
2024-08-26 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu Shuo Zhao et.al. | 2408.14438 | null |
2024-08-26 | Social perception of faces in a vision-language model | Carina I. Hausladen et.al. | 2408.14435 | link |
2024-08-26 | CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models | Shubham Bharti et.al. | 2408.14419 | null |
2024-08-26 | MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues | Kuluhan Binici et.al. | 2408.14418 | null |
2024-08-26 | Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse | Yahao Ding et.al. | 2408.14416 | null |
2024-08-26 | Language-specific Calibration for Pruning Multilingual Language Models | Simon Kurz et.al. | 2408.14398 | null |
2024-08-26 | Reprogramming Foundational Large Language Models(LLMs) for Enterprise Adoption for Spatio-Temporal Forecasting Applications: Unveiling a New Era in Copilot-Guided Cross-Modal Time Series Representation Learning | Sakhinana Sagar Srinivas et.al. | 2408.14387 | null |
2024-08-26 | Probing Causality Manipulation of Large Language Models | Chenyang Zhang et.al. | 2408.14380 | link |
2024-08-26 | An Embedding is Worth a Thousand Noisy Labels | Francesco Di Salvo et.al. | 2408.14358 | link |
2024-08-26 | SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | Daoguang Zan et.al. | 2408.14354 | link |
2024-08-26 | Assessing Contamination in Large Language Models: Introducing the LogProber method | Nicolas Yax et.al. | 2408.14352 | null |
2024-08-26 | Foundation Models for Music: A Survey | Yinghao Ma et.al. | 2408.14340 | link |
2024-08-26 | Claim Verification in the Age of Large Language Models: A Survey | Alphaeus Dmonte et.al. | 2408.14317 | null |
2024-08-26 | LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | Yayati Jadhav et.al. | 2408.14307 | null |
2024-08-26 | Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails | Malte Josten et.al. | 2408.14293 | link |
2024-08-26 | Predictability and Causality in Spanish and English Natural Language Generation | Andrea Busto-Castiñeira et.al. | 2408.14283 | null |
2024-08-23 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang et.al. | 2408.13257 | null |
2024-08-23 | Domain-specific long text classification from sparse relevant information | Célia D'Cruz et.al. | 2408.13253 | null |
2024-08-23 | Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Sakhinana Sagar Srinivas et.al. | 2408.13248 | null |
2024-08-23 | Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | Yingyu Liang et.al. | 2408.13233 | null |
2024-08-23 | EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods | Hongcheng Ding et.al. | 2408.13214 | null |
2024-08-23 | DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | Qiming Zhu et.al. | 2408.13204 | null |
2024-08-23 | Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Hourui Deng et.al. | 2408.13184 | null |
2024-08-23 | IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models | Zhihao Yu et.al. | 2408.13073 | link |
2024-08-23 | Guiding IoT-Based Healthcare Alert Systems with Large Language Models | Yulan Gao et.al. | 2408.13071 | null |
2024-08-23 | SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks | Kai-Wei Chang et.al. | 2408.13040 | null |
2024-08-23 | VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models | Wentao Wu et.al. | 2408.13031 | link |
2024-08-23 | In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting | Haowei Du et.al. | 2408.13028 | null |
2024-08-23 | A Web-Based Solution for Federated Learning with LLM-Based Automation | Chamith Mawela et.al. | 2408.13010 | null |
2024-08-23 | Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates | Hui Wei et.al. | 2408.13006 | link |
2024-08-23 | CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution | Ruiyang Xu et.al. | 2408.13001 | null |
2024-08-23 | Open Llama2 Model for the Lithuanian Language | Artūras Nakvosas et.al. | 2408.12963 | null |
2024-08-23 | Multimodal Contrastive In-Context Learning | Yosuke Miyanishi et.al. | 2408.12959 | null |
2024-08-23 | Image Segmentation in Foundation Model Era: A Survey | Tianfei Zhou et.al. | 2408.12957 | link |
2024-08-23 | E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group | Yue Pan et.al. | 2408.12948 | null |
2024-08-23 | Causal-Guided Active Learning for Debiasing Large Language Models | Zhouhao Sun et.al. | 2408.12942 | link |
2024-08-22 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang et.al. | 2408.12599 | link |
2024-08-23 | Non-Homophilic Graph Pre-Training and Prompt Learning | Xingtong Yu et.al. | 2408.12594 | null |
2024-08-22 | RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | Xiaohan Wang et.al. | 2408.12579 | null |
2024-08-22 | MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | Haojun Shi et.al. | 2408.12574 | link |
2024-08-22 | Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Jamba Team et.al. | 2408.12570 | null |
2024-08-22 | ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation | Lujia Zhong et.al. | 2408.12561 | link |
2024-08-22 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu et.al. | 2408.12547 | link |
2024-08-22 | Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | Jinheng Xie et.al. | 2408.12528 | null |
2024-08-22 | MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | Hao Wei et.al. | 2408.12496 | null |
2024-08-22 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang et.al. | 2408.12494 | link |
2024-08-23 | Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Khang T. Doan et.al. | 2408.12480 | null |
2024-08-22 | Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition | Bozheng Li et.al. | 2408.12475 | null |
2024-08-22 | DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems | Jiaju Chen et.al. | 2408.12470 | null |
2024-08-22 | Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning | Mushui Liu et.al. | 2408.12469 | null |
2024-08-22 | Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing | Mengqi Zhang et.al. | 2408.12456 | null |
2024-08-22 | Positional Description for Numerical Normalization | Deepanshu Gupta et.al. | 2408.12430 | null |
2024-08-22 | FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing | Jue Wang et.al. | 2408.12429 | link |
2024-08-22 | Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification | Sudi Murindanyi et.al. | 2408.12426 | null |
2024-08-22 | Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code | Mahdi Kazemi et.al. | 2408.12416 | null |
2024-08-22 | Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes | Sota Kato et.al. | 2408.12406 | link |
2024-08-21 | Great Memory, Shallow Reasoning: Limits of | Shangyi Geng et.al. | 2408.11815 | link |
2024-08-21 | SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Yuanyang Yin et.al. | 2408.11813 | null |
2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811 | null |
2024-08-21 | Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis et.al. | 2408.11804 | link |
2024-08-21 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang et.al. | 2408.11801 | null |
2024-08-21 | PermitQA: A Benchmark for Retrieval Augmented Generation in Wind Siting and Permitting domain | Rounak Meyur et.al. | 2408.11800 | null |
2024-08-21 | Practical token pruning for foundation models in few-shot conversational virtual assistant systems | Haode Qi et.al. | 2408.11799 | null |
2024-08-21 | EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma et.al. | 2408.11795 | null |
2024-08-21 | Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design | Nathaniel H. Park et.al. | 2408.11793 | null |
2024-08-21 | Critique-out-Loud Reward Models | Zachary Ankner et.al. | 2408.11791 | link |
2024-08-21 | DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Zhifei Xie et.al. | 2408.11788 | null |
2024-08-21 | Personality Alignment of Large Language Models | Minjun Zhu et.al. | 2408.11779 | link |
2024-08-21 | Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Omar Erak et.al. | 2408.11775 | link |
2024-08-21 | Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks | Yiyi Chen et.al. | 2408.11749 | link |
2024-08-21 | DH-Bench: Probing Depth and Height Perception of Large Visual-Language Models | Shehreen Azad et.al. | 2408.11748 | link |
2024-08-21 | Open-Ended 3D Point Cloud Instance Segmentation | Phuc D. A. Nguyen et.al. | 2408.11747 | null |
2024-08-21 | Mixed Sparsity Training: Achieving 4 | Pihe Hu et.al. | 2408.11746 | null |
2024-08-21 | FocusLLM: Scaling LLM's Context by Parallel Decoding | Zhenyu Li et.al. | 2408.11745 | null |
2024-08-21 | MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Elias Frantar et.al. | 2408.11743 | link |
2024-08-21 | CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering | Yuliang Cai et.al. | 2408.11742 | link |
2024-08-20 | Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement | Satoshi Kosugi et.al. | 2408.11055 | link |
2024-08-20 | Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks | Nathaniel Pinckney et.al. | 2408.11053 | link |
2024-08-20 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu et.al. | 2408.11051 | link |
2024-08-20 | MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | Jian Chen et.al. | 2408.11049 | link |
2024-08-20 | Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders | Yuan Xin et.al. | 2408.11046 | null |
2024-08-20 | Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research | Sreyoshi Bhaduri et.al. | 2408.11043 | null |
2024-08-20 | Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Chunting Zhou et.al. | 2408.11039 | null |
2024-08-20 | Scaling Law with Learning Rate Annealing | Howe Tissue et.al. | 2408.11029 | null |
2024-08-20 | Athena: Safe Autonomous Agents with Verbal Contrastive Learning | Tanmana Sadhu et.al. | 2408.11021 | null |
2024-08-20 | While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output? | Wen Cheng et.al. | 2408.11006 | link |
2024-08-20 | SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining | Jonathan Prexl et.al. | 2408.11000 | link |
2024-08-20 | CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | Michael Reinisch et.al. | 2408.10995 | null |
2024-08-20 | Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models | Yuyan Chen et.al. | 2408.10947 | null |
2024-08-20 | Large Language Model Driven Recommendation | Anton Korikov et.al. | 2408.10946 | null |
2024-08-20 | HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | Kazi Hasan Ibn Arif et.al. | 2408.10945 | link |
2024-08-20 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin et.al. | 2408.10943 | link |
2024-08-20 | Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience | Yoonseo Choi et.al. | 2408.10937 | null |
2024-08-20 | LBC: Language-Based-Classifier for Out-Of-Variable Generalization | Kangjun Noh et.al. | 2408.10923 | link |
2024-08-21 | BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model | Yeyong Yu et.al. | 2408.10903 | link |
2024-08-20 | Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs | John Mendonça et.al. | 2408.10902 | link |
2024-08-19 | SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP | Yusuke Hirota et.al. | 2408.10202 | null |
2024-08-19 | Demystifying the Communication Characteristics for Distributed Transformer Models | Quentin Anthony et.al. | 2408.10197 | null |
2024-08-19 | Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models | Aviv Bick et.al. | 2408.10189 | null |
2024-08-19 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Fuzhao Xue et.al. | 2408.10188 | link |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174 | link |
2024-08-19 | Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Xiaoyu Kong et.al. | 2408.10159 | link |
2024-08-19 | Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models | Amey Hengle et.al. | 2408.10151 | link |
2024-08-19 | In-Context Learning with Representations: Contextual Generalization of Trained Transformers | Tong Yang et.al. | 2408.10147 | null |
2024-08-19 | Instruction Finetuning for Leaderboard Generation from Empirical AI Research | Salomon Kabongo et.al. | 2408.10141 | null |
2024-08-19 | Rhyme-aware Chinese lyric generator based on GPT | Yixiao Yuan et.al. | 2408.10130 | null |
2024-08-19 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Feiyu Pan et.al. | 2408.10125 | null |
2024-08-19 | Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Tianyu Zhang et.al. | 2408.10124 | link |
2024-08-19 | Geometry Informed Tokenization of Molecules for Language Model Generation | Xiner Li et.al. | 2408.10120 | null |
2024-08-19 | GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization | Ran Liu et.al. | 2408.10115 | link |
2024-08-20 | PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities | Yuanjian Xu et.al. | 2408.10111 | null |
2024-08-19 | ARMADA: Attribute-Based Multimodal Data Augmentation | Xiaomeng Jin et.al. | 2408.10086 | null |
2024-08-19 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar et.al. | 2408.10075 | null |
2024-08-19 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang et.al. | 2408.10072 | link |
2024-08-19 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory | Haoran Li et.al. | 2408.10053 | null |
2024-08-19 | Defense Priorities in the Open-Source AI Debate: A Preliminary Assessment | Masao Dahlgren et.al. | 2408.10026 | null |
2024-08-16 | SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation | Xinyu Xiong et.al. | 2408.08870 | link |
2024-08-16 | PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | Sumanth Prabhu et.al. | 2408.08869 | null |
2024-08-16 | A Hassle-free Algorithm for Private Learning in Practice: Don't Use Tree Aggregation, Use BLTs | H. Brendan McMahan et.al. | 2408.08868 | null |
2024-08-16 | Visual Agents as Fast and Slow Thinkers | Guangyan Sun et.al. | 2408.08862 | link |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855 | null |
2024-08-16 | GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms | Yuhao Jia et.al. | 2408.08852 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849 | link |
2024-08-16 | PsychoLex: Unveiling the Psychological Mind of Large Language Models | Mohammad Amin Abbasi et.al. | 2408.08848 | null |
2024-08-16 | FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats | Xuanliang Zhang et.al. | 2408.08841 | link |
2024-08-16 | EasyRec: Simple yet Effective Language Models for Recommendation | Xubin Ren et.al. | 2408.08821 | link |
2024-08-16 | Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models | Lin Zhao et.al. | 2408.08813 | null |
2024-08-16 | Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors | Felipe A. Csaszar et.al. | 2408.08811 | null |
2024-08-16 | Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge | Ravi Raju et.al. | 2408.08808 | null |
2024-08-16 | CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems | Joanito Agili Lopo et.al. | 2408.08805 | null |
2024-08-16 | A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks | Boa Jang et.al. | 2408.08790 | link |
2024-08-16 | EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics | Chenwei Wan et.al. | 2408.08782 | link |
2024-08-16 | Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Chenming Tang et.al. | 2408.08780 | null |
2024-08-16 | DAC: Decomposed Automation Correction for Text-to-SQL | Dingzirui Wang et.al. | 2408.08779 | link |
2024-08-16 | Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Dingwei Chen et.al. | 2408.08769 | null |
2024-08-16 | Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM | Wanting Yang et.al. | 2408.08765 | null |
2024-08-15 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu et.al. | 2408.08313 | null |
2024-08-15 | ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws | Ruihang Li et.al. | 2408.08310 | null |
2024-08-15 | Towards Flexible Visual Relationship Segmentation | Fangrui Zhu et.al. | 2408.08305 | null |
2024-08-15 | Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors | Usman Syed et.al. | 2408.08302 | null |
2024-08-15 | VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps | Senthil Hariharan Arul et.al. | 2408.08301 | null |
2024-08-15 | HELP: Hierarchical Embeddings-based Log Parsing | Andy Xu et.al. | 2408.08300 | null |
2024-08-15 | The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | Shachar Don-Yehiya et.al. | 2408.08291 | null |
2024-08-15 | Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model | Jin Wang et.al. | 2408.08282 | null |
2024-08-15 | BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Qizhen Zhang et.al. | 2408.08274 | null |
2024-08-15 | DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | Xihong Yang et.al. | 2408.08231 | null |
2024-08-15 | RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science | David Farr et.al. | 2408.08217 | null |
2024-08-15 | Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Javier González et.al. | 2408.08210 | null |
2024-08-15 | LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation | Bohao Wang et.al. | 2408.08208 | null |
2024-08-15 | Heavy Labels Out! Dataset Distillation with Label Space Lightening | Ruonan Yu et.al. | 2408.08201 | null |
2024-08-15 | Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy | Shaojun Xu et.al. | 2408.08188 | null |
2024-08-15 | General-purpose Clothes Manipulation with Semantic Keypoints | Yuhong Deng et.al. | 2408.08160 | null |
2024-08-15 | EmBARDiment: an Embodied AI Agent for Productivity in XR | Riccardo Bovo et.al. | 2408.08158 | null |
2024-08-15 | DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | Huajian Xin et.al. | 2408.08152 | link |
2024-08-15 | P/D-Serve: Serving Disaggregated Large Language Model at Scale | Yibo Jin et.al. | 2408.08147 | null |
2024-08-15 | KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning | Kaiqi Zhang et.al. | 2408.08146 | null |
2024-08-14 | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | Karime Maamari et.al. | 2408.07702 | null |
2024-08-15 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang et.al. | 2408.07666 | link |
2024-08-14 | Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models | Yi-Cheng Lin et.al. | 2408.07665 | link |
2024-08-14 | Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions | Quan Liu et.al. | 2408.07663 | link |
2024-08-14 | WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | Weijian Xie et.al. | 2408.07611 | null |
2024-08-14 | Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | Hamza Kheddar et.al. | 2408.07583 | null |
2024-08-15 | MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Minxuan Zhou et.al. | 2408.07543 | link |
2024-08-15 | Usefulness of data flow diagrams and large language models for security threat validation: a registered report | Winnie Bahati Mbaka et.al. | 2408.07537 | null |
2024-08-14 | Development of a Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | Seungjun Han et.al. | 2408.07531 | null |
2024-08-14 | Large Language Models Know What Makes Exemplary Contexts | Quanyu Long et.al. | 2408.07505 | null |
2024-08-14 | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach | Shizhou Zhang et.al. | 2408.07500 | link |
2024-08-14 | QirK: Question Answering via Intermediate Representation on Knowledge Graphs | Jan Luca Scheerer et.al. | 2408.07494 | null |
2024-08-14 | Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems | Ning Lu et.al. | 2408.07482 | null |
2024-08-14 | Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Yuxin Jiang et.al. | 2408.07471 | link |
2024-08-14 | Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification | Yongcheng Li et.al. | 2408.07467 | link |
2024-08-14 | Large Language Models Prompting With Episodic Memory | Dai Do et.al. | 2408.07465 | null |
2024-08-14 | From Brazilian Portuguese to European Portuguese | João Sanches et.al. | 2408.07457 | null |
2024-08-14 | Fact or Fiction? Improving Fact Verification with Knowledge Graphs through Simplified Subgraph Retrievals | Tobias A. Opsahl et.al. | 2408.07453 | link |
2024-08-15 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | Asif Hanif et.al. | 2408.07440 | link |
2024-08-14 | Beyond Inter-Item Relations: Dynamic Adaptive Mixture-of-Experts for LLM-Based Sequential Recommendation | CanYi Liu et.al. | 2408.07427 | null |
2024-08-13 | Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | Kexun Zhang et.al. | 2408.07060 | null |
2024-08-13 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai et.al. | 2408.07055 | link |
2024-08-13 | Casper: Prompt Sanitization for Protecting User Privacy in Web-Based Large Language Models | Chun Jie Chong et.al. | 2408.07004 | null |
2024-08-13 | LLMs can Schedule | Henrik Abgaryan et.al. | 2408.06993 | link |
2024-08-13 | DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs | Dongyuan Li et.al. | 2408.06966 | null |
2024-08-13 | Towards Holistic Disease Risk Prediction using Small Language Models | Liv Björkdahl et.al. | 2408.06943 | null |
2024-08-13 | OpenResearcher: Unleashing AI for Accelerated Scientific Research | Yuxiang Zheng et.al. | 2408.06941 | link |
2024-08-13 | The advantages of context specific language models: the case of the Erasmian Language Model | João Gonçalves et.al. | 2408.06931 | link |
2024-08-13 | Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas | Louis Kwok et.al. | 2408.06929 | link |
2024-08-13 | SceneGPT: A Language Model for 3D Scene Understanding | Shivam Chandhok et.al. | 2408.06926 | null |
2024-08-13 | Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives | Zhihu Wang et.al. | 2408.06904 | null |
2024-08-13 | Leveraging Language Models for Emotion and Behavior Analysis in Education | Kaito Tanaka et.al. | 2408.06874 | null |
2024-08-13 | LoRA | Jia-Chen Zhang et.al. | 2408.06854 | null |
2024-08-13 | Causal Agent based on Large Language Model | Kairong Han et.al. | 2408.06849 | link |
2024-08-13 | DracoGPT: Extracting Visualization Design Preferences from Large Language Models | Huichen Will Wang et.al. | 2408.06845 | null |
2024-08-13 | How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts | Huichen Will Wang et.al. | 2408.06837 | null |
2024-08-13 | Efficient Search for Customized Activation Functions with Gradient Descent | Lukas Strack et.al. | 2408.06820 | link |
2024-08-13 | MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Yongjin Yang et.al. | 2408.06816 | null |
2024-08-13 | HLSPilot: LLM-based High-Level Synthesis | Chenwei Xiong et.al. | 2408.06810 | link |
2024-08-13 | Layerwise Recurrent Router for Mixture-of-Experts | Zihan Qiu et.al. | 2408.06793 | link |
2024-08-12 | FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection | Yufei Huang et.al. | 2408.06333 | link |
2024-08-12 | Animate, or Inanimate, That is the Question for Large Language Models | Leonardo Ranaldi et.al. | 2408.06332 | null |
2024-08-12 | Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example | Yanan Chen et.al. | 2408.06318 | null |
2024-08-12 | Long-Form Answers to Visual Questions from Blind and Low Vision People | Mina Huh et.al. | 2408.06303 | null |
2024-08-12 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Chris Lu et.al. | 2408.06292 | link |
2024-08-12 | MovieSum: An Abstractive Summarization Dataset for Movie Screenplays | Rohit Saxena et.al. | 2408.06281 | link |
2024-08-13 | Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | Jieyong Kim et.al. | 2408.06276 | null |
2024-08-12 | FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | Haoran Sun et.al. | 2408.06273 | link |
2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272 | null |
2024-08-12 | Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | Karel D'Oosterlinck et.al. | 2408.06266 | link |
2024-08-12 | Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning | Yingjin Song et.al. | 2408.06259 | null |
2024-08-12 | On Effects of Steering Latent Representation for Large Language Model Unlearning | Dang Huu-Tien et.al. | 2408.06223 | null |
2024-08-12 | Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Zhenting Qi et.al. | 2408.06195 | link |
2024-08-12 | FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework | Lukas Meyer et.al. | 2408.06190 | link |
2024-08-12 | Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting | Halley Young et.al. | 2408.06186 | null |
2024-08-12 | OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Mushui Liu et.al. | 2408.06158 | link |
2024-08-12 | LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library | Tianhao Yu et.al. | 2408.06150 | null |
2024-08-12 | Self-Supervised Learning on MeerKAT Wide-Field Continuum Images | Erica Lastufka et.al. | 2408.06147 | link |
2024-08-12 | Med42-v2: A Suite of Clinical LLMs | Clément Christophe et.al. | 2408.06142 | null |
2024-08-12 | Utilize Transformers for translating Wikipedia category names | Hoang-Thang Ta et.al. | 2408.06124 | null |
2024-08-10 | Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions | Michele Miranda et.al. | 2408.05212 | link |
2024-08-09 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu et.al. | 2408.05211 | link |
2024-08-09 | Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners | Michael Vaccaro Jr et.al. | 2408.05204 | null |
2024-08-09 | TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning | Yujie Feng et.al. | 2408.05200 | link |
2024-08-09 | ECG-FM: An Open Electrocardiogram Foundation Model | Kaden McKeen et.al. | 2408.05178 | link |
2024-08-09 | Weak-Annotation of HAR Datasets using Vision Foundation Models | Marius Bock et.al. | 2408.05169 | link |
2024-08-09 | AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | Pritam Deka et.al. | 2408.05149 | null |
2024-08-09 | A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning | Ye Yuan et.al. | 2408.05141 | null |
2024-08-09 | Is ChatGPT a Good Software Librarian? An Exploratory Study on the Use of ChatGPT for Software Library Recommendations | Jasmine Latendresse et.al. | 2408.05128 | null |
2024-08-09 | Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media | Petre Breazu et.al. | 2408.05126 | null |
2024-08-09 | Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video | Chunggi Lee et.al. | 2408.05123 | null |
2024-08-09 | A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | Xinyu Liu et.al. | 2408.05109 | link |
2024-08-09 | Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection | Xincheng Pang et.al. | 2408.05107 | null |
2024-08-09 | How Well Do LLMs Identify Cultural Unity in Diversity? | Jialin Li et.al. | 2408.05102 | link |
2024-08-09 | Hyperbolic Learning with Multimodal Large Language Models | Paolo Mandica et.al. | 2408.05097 | null |
2024-08-09 | Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts | Tingchen Fu et.al. | 2408.05094 | null |
2024-08-09 | Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models | Zikai Xie et.al. | 2408.05093 | link |
2024-08-09 | Generating novel experimental hypotheses from language models: A case study on cross-dative generalization | Kanishka Misra et.al. | 2408.05086 | link |
2024-08-09 | RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records | Sangjoon Park et.al. | 2408.05074 | null |
2024-08-09 | Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Marcelo Sartori Locatelli et.al. | 2408.05035 | null |
2024-08-08 | Better Alignment with Instruction Back-and-Forth Translation | Thao Nguyen et.al. | 2408.04614 | null |
2024-08-08 | Code-switching in text and speech reveals information-theoretic audience design | Debasmita Bhattacharya et.al. | 2408.04596 | null |
2024-08-09 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao et.al. | 2408.04594 | link |
2024-08-08 | Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness | Xiaojing Fan et.al. | 2408.04585 | null |
2024-08-08 | SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Tianrun Chen et.al. | 2408.04579 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575 | null |
2024-08-08 | Learning Fine-Grained Grounded Citations for Attributed Large Language Models | Lei Huang et.al. | 2408.04568 | link |
2024-08-08 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang et.al. | 2408.04556 | link |
2024-08-08 | Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation | Daniele Rege Cambrin et.al. | 2408.04523 | link |
2024-08-08 | Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models | Fabio Pernisi et.al. | 2408.04522 | null |
2024-08-08 | What You Need is What You Get: Theory of Mind for an LLM-Based Code Understanding Assistant | Jonan Richards et.al. | 2408.04477 | null |
2024-08-08 | Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | Yiqun Zhang et.al. | 2408.04472 | link |
2024-08-08 | RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents | Zihao Zhu et.al. | 2408.04449 | link |
2024-08-08 | Large Language Models for cross-language code clone detection | Micheline Bénédicte Moumoula et.al. | 2408.04430 | null |
2024-08-08 | Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models | Philipp Müller et.al. | 2408.04420 | null |
2024-08-08 | Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning | Seong-Il Park et.al. | 2408.04414 | null |
2024-08-08 | Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers | Moritz Scherer et.al. | 2408.04413 | null |
2024-08-08 | Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset | Kentaro Ozeki et.al. | 2408.04403 | link |
2024-08-08 | Automated Educational Question Generation at Different Bloom's Skill Levels using Large Language Models: Strategies and Evaluation | Nicy Scaria et.al. | 2408.04394 | link |
2024-08-08 | Open-domain Implicit Format Control for Large Language Model Generation | Yiqun Yao et.al. | 2408.04392 | link |
2024-08-07 | How Well Can Vision Language Models See Image Details? | Chenhui Gou et.al. | 2408.03940 | null |
2024-08-07 | SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature | Vinícius Di Oliveira et.al. | 2408.03936 | null |
2024-08-07 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu et.al. | 2408.03910 | link |
2024-08-07 | Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models | Shachi H Kumar et.al. | 2408.03907 | null |
2024-08-07 | Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Beomseok Lee et.al. | 2408.03900 | link |
2024-08-07 | Simplifying Scholarly Abstracts for Accessible Digital Libraries | Haining Wang et.al. | 2408.03899 | link |
2024-08-07 | From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems | Leixian Shen et.al. | 2408.03876 | null |
2024-08-07 | PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training | Haoran Xu et.al. | 2408.03865 | null |
2024-08-07 | GAIA -- A Large Language Model for Advanced Power Dispatch | Yuheng Cheng et.al. | 2408.03847 | null |
2024-08-07 | MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models | Yuchen Dong et.al. | 2408.03841 | null |
2024-08-07 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta et.al. | 2408.03837 | link |
2024-08-07 | Target Prompting for Information Extraction with Vision Language Model | Dipankar Medhi et.al. | 2408.03834 | null |
2024-08-07 | Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning | Simret Araya Gebreegziabher et.al. | 2408.03819 | null |
2024-08-07 | Generative Language Models with Retrieval Augmented Generation for Automated Short Answer Scoring | Zifan Wang et.al. | 2408.03811 | null |
2024-08-07 | 'Finance Wizard' at the FinLLM Challenge Task: Financial Text Summarization | Meisin Lee et.al. | 2408.03762 | null |
2024-08-07 | MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video | Xiaoqing Guo et.al. | 2408.03761 | null |
2024-08-07 | Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation | Jingjing Xie et.al. | 2408.03735 | link |
2024-08-07 | Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks | Zizhang Chen et.al. | 2408.03732 | null |
2024-08-07 | A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models | Pengxiang Zhao et.al. | 2408.03728 | null |
2024-08-07 | Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction | Benjamin Matthias Ruppik et.al. | 2408.03706 | null |
2024-08-06 | CoverBench: A Challenging Benchmark for Complex Claim Verification | Alon Jacovi et.al. | 2408.03325 | null |
2024-08-06 | Segment Anything in Medical Images and Videos: Benchmark and Deployment | Jun Ma et.al. | 2408.03322 | link |
2024-08-06 | TextIM: Part-aware Interactive Motion Synthesis from Text | Siyuan Fan et.al. | 2408.03302 | null |
2024-08-06 | KaPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models | Ruizhe Zhang et.al. | 2408.03297 | null |
2024-08-06 | Biomedical SAM 2: Segment Anything in Biomedical Images and Videos | Zhiling Yan et.al. | 2408.03286 | link |
2024-08-07 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao et.al. | 2408.03281 | link |
2024-08-06 | Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments | Angie Boggust et.al. | 2408.03274 | null |
2024-08-06 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs | Jiaxi Yang et.al. | 2408.03256 | null |
2024-08-06 | Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | Yifei Wang et.al. | 2408.03247 | link |
2024-08-06 | Making Long-Context Language Models Better Multi-Hop Reasoners | Yanyang Li et.al. | 2408.03246 | link |
2024-08-06 | Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi | Pranita Deshmukh et.al. | 2408.03172 | null |
2024-08-06 | Conditioning LLMs with Emotion in Neural Machine Translation | Charles Brazier et.al. | 2408.03150 | null |
2024-08-06 | Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization | Yanghai Zhang et.al. | 2408.03149 | link |
2024-08-06 | Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations | Leo Donisch et.al. | 2408.03130 | null |
2024-08-06 | Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation | Artur Guimarães et.al. | 2408.03127 | link |
2024-08-06 | Evaluating the Translation Performance of Large Language Models Based on Euas-20 | Yan Huang et.al. | 2408.03119 | null |
2024-08-06 | Topic Modeling with Fine-tuning LLMs and Bag of Sentences | Johannes Schneider et.al. | 2408.03099 | link |
2024-08-07 | TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration | Siqi Gu et.al. | 2408.03095 | null |
2024-08-06 | 500xCompressor: Generalized Prompt Compression for Large Language Models | Zongqian Li et.al. | 2408.03094 | link |
2024-08-06 | Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | Le Yu et.al. | 2408.03092 | link |
2024-08-05 | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Dongyang Liu et.al. | 2408.02657 | link |
2024-08-05 | Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models? | Mohammad Bahrami Karkevandi et.al. | 2408.02651 | null |
2024-08-05 | Command-line Obfuscation Detection using Small Language Models | Vojtech Outrata et.al. | 2408.02637 | null |
2024-08-05 | SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Muxi Diao et.al. | 2408.02632 | null |
2024-08-05 | Language Model Can Listen While Speaking | Ziyang Ma et.al. | 2408.02622 | null |
2024-08-05 | Progressively Selective Label Enhancement for Language Model Alignment | Biao Liu et.al. | 2408.02599 | null |
2024-08-05 | Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection | Sajal Aggarwal et.al. | 2408.02595 | null |
2024-08-05 | Leveraging the Power of LLMs: A Fine-Tuning Approach for High-Quality Aspect-Based Summarization | Ankan Mullick et.al. | 2408.02584 | null |
2024-08-05 | DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions | Siying Hu et.al. | 2408.02574 | null |
2024-08-05 | Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | Yauwai Yim et.al. | 2408.02559 | null |
2024-08-05 | Generative AI as a Service in 6G Edge-Cloud: Generation Task Offloading by In-context Learning | Hao Zhou et.al. | 2408.02549 | null |
2024-08-05 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer et.al. | 2408.02545 | link |
2024-08-05 | Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Xinbei Ma et.al. | 2408.02544 | link |
2024-08-05 | Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph | Zhao Kaichen et.al. | 2408.02535 | null |
2024-08-05 | Practical Attacks against Black-box Code Completion Engines | Slobodan Jenko et.al. | 2408.02509 | null |
2024-08-05 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li et.al. | 2408.02503 | link |
2024-08-05 | Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation | Aaron Imani et.al. | 2408.02502 | null |
2024-08-05 | A First Look at License Compliance Capability of LLMs in Code Generation | Weiwei Xu et.al. | 2408.02487 | link |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484 | link |
2024-08-05 | From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Haolin Jin et.al. | 2408.02479 | null |
2024-08-02 | Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting | Xiangyu Zhao et.al. | 2408.01423 | null |
2024-08-02 | Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | Jingtong Su et.al. | 2408.01420 | null |
2024-08-02 | DebateQA: Evaluating Question Answering on Debatable Knowledge | Rongwu Xu et.al. | 2408.01419 | link |
2024-08-02 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua et.al. | 2408.01417 | null |
2024-08-02 | Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | Yu Yang et.al. | 2408.01402 | null |
2024-08-02 | Coalitions of Large Language Models Increase the Robustness of AI Agents | Prattyush Mangal et.al. | 2408.01380 | null |
2024-08-02 | Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation | Jheng-Hong Yang et.al. | 2408.01363 | null |
2024-08-02 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding et.al. | 2408.01355 | link |
2024-08-02 | MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code | Kaiwen Ning et.al. | 2408.01354 | link |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346 | null |
2024-08-02 | MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Benno Weck et.al. | 2408.01337 | link |
2024-08-02 | A Backbone for Long-Horizon Robot Task Understanding | Xiaoshuai Chen et.al. | 2408.01334 | null |
2024-08-02 | FANNO: Augmenting High-Quality Instruction Data with Open-Sourced LLMs Only | He Zhu et.al. | 2408.01323 | null |
2024-08-02 | A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks | Jiaqi Wang et.al. | 2408.01319 | null |
2024-08-02 | Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models | Ying Zhang et.al. | 2408.01308 | null |
2024-08-02 | The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models | Hannah Chen et.al. | 2408.01285 | null |
2024-08-02 | RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Kunlun Zhu et.al. | 2408.01262 | link |
2024-08-02 | The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models | Simone Caldarella et.al. | 2408.01228 | null |
2024-08-02 | High-Throughput Phenotyping of Clinical Text Using Large Language Models | Daniel B. Hier et.al. | 2408.01214 | null |
2024-08-02 | Misinforming LLMs: vulnerabilities, challenges and opportunities | Bo Zhou et.al. | 2408.01168 | null |
2024-08-01 | AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | Mengkang Hu et.al. | 2408.00764 | null |
2024-08-01 | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Xiangyu Fan et.al. | 2408.00762 | null |
2024-08-01 | Tamper-Resistant Safeguards for Open-Weight LLMs | Rishub Tamirisa et.al. | 2408.00761 | link |
2024-08-01 | Thermal Conductivity Predictions with Foundation Atomistic Models | Balázs Póta et.al. | 2408.00755 | link |
2024-08-01 | Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | Benlin Liu et.al. | 2408.00754 | null |
2024-08-01 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | Siyu Jiao et.al. | 2408.00744 | link |
2024-08-01 | DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | Jovan Stojkovic et.al. | 2408.00741 | null |
2024-08-01 | Virchow 2: Scaling Self-Supervised Mixed Magnification Models in Pathology | Eric Zimmermann et.al. | 2408.00738 | null |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727 | link |
2024-08-01 | An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Yangzhen Wu et.al. | 2408.00724 | null |
2024-08-01 | Pathway to Secure and Trustworthy 6G for LLMs: Attacks, Defense, and Opportunities | Sunder Ali Khowaja et.al. | 2408.00722 | null |
2024-08-01 | SAM 2: Segment Anything in Images and Videos | Nikhila Ravi et.al. | 2408.00714 | link |
2024-08-01 | Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM | Xiaofeng Liu et.al. | 2408.00706 | null |
2024-08-01 | Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | Trapoom Ukarapol et.al. | 2408.00690 | link |
2024-08-01 | Can Developers Prompt? A Controlled Experiment for Code Documentation Generation | Hans-Alexander Kruse et.al. | 2408.00686 | null |
2024-08-01 | ExpertAF: Expert Actionable Feedback from Video | Kumar Ashutosh et.al. | 2408.00672 | null |
2024-08-01 | AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models | Daqin Luo et.al. | 2408.00665 | link |
2024-08-01 | Disentangling Dense Embeddings with Sparse Autoencoders | Charles O'Neill et.al. | 2408.00657 | null |
2024-08-02 | SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models | Hongjun An et.al. | 2408.00655 | link |
2024-08-01 | Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning | Xuri Ge et.al. | 2408.00644 | null |
2024-07-31 | Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey | Atsuyuki Miyai et.al. | 2407.21794 | null |
2024-07-31 | Vision-Language Model Based Handwriting Verification | Mihir Chauhan et.al. | 2407.21788 | null |
2024-07-31 | Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Bradley Brown et.al. | 2407.21787 | null |
2024-07-31 | The Llama 3 Herd of Models | Abhimanyu Dubey et.al. | 2407.21783 | null |
2024-07-31 | Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Shi Liu et.al. | 2407.21771 | null |
2024-07-31 | MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts | Xi Victoria Lin et.al. | 2407.21770 | null |
2024-07-31 | ReplanVLM: Replanning Robotic Tasks with Visual Language Models | Aoran Mei et.al. | 2407.21762 | null |
2024-07-31 | Learning Video Context as Interleaved Multimodal Sequences | Kevin Qinghong Lin et.al. | 2407.21757 | link |
2024-07-31 | A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation | Mothilal Asokan et.al. | 2407.21739 | null |
2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721 | null |
2024-07-31 | Adaptive Retrieval-Augmented Generation for Conversational Systems | Xi Wang et.al. | 2407.21712 | null |
2024-07-31 | CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature | Stefan Langer et.al. | 2407.21708 | null |
2024-07-31 | TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities | Ming Zhang et.al. | 2407.21693 | link |
2024-07-31 | Synth-Empathy: Towards High-Quality Synthetic Empathy Data | Hao Liang et.al. | 2407.21669 | link |
2024-08-01 | Defending Jailbreak Attack in VLMs via Cross-modality Information Detector | Yue Xu et.al. | 2407.21659 | link |
2024-07-31 | MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment | Anurag Das et.al. | 2407.21654 | null |
2024-07-31 | Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation | Xiang Luo et.al. | 2407.21633 | link |
2024-07-31 | TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods | Gabriel Loiseau et.al. | 2407.21630 | link |
2024-07-31 | LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows | Lukas Teufelberger et.al. | 2407.21593 | null |
2024-07-31 | A Performance Study of LLM-Generated Code on Leetcode | Tristan Coignion et.al. | 2407.21579 | null |
2024-07-30 | ThinK: Thinner Key Cache by Query-Driven Pruning | Yuhui Xu et.al. | 2407.21018 | null |
2024-07-30 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | Yuexi Du et.al. | 2407.21011 | link |
2024-07-30 | GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Ali Abdollahi et.al. | 2407.21001 | link |
2024-07-30 | MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | Yupeng Chen et.al. | 2407.20999 | null |
2024-07-30 | From Feature Importance to Natural Language Explanations Using LLMs with RAG | Sule Tekkesinoglu et.al. | 2407.20990 | link |
2024-07-30 | Large Language Models (LLMs) for Semantic Communication in Edge-based IoT Networks | Alakesh Kalita et.al. | 2407.20970 | null |
2024-07-30 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Xiaowei Chi et.al. | 2407.20962 | link |
2024-07-30 | UniProcessor: A Text-induced Unified Low-level Image Processor | Huiyu Duan et.al. | 2407.20928 | link |
2024-07-30 | SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition | Hao Tan et.al. | 2407.20920 | null |
2024-07-30 | Automated Review Generation Method Based on Large Language Models | Shican Wu et.al. | 2407.20906 | link |
2024-07-30 | Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach | Adam Wojciechowski et.al. | 2407.20899 | link |
2024-07-30 | ThinkRepair: Self-Directed Automated Program Repair | Xin Yin et.al. | 2407.20898 | link |
2024-07-30 | Effective Black Box Testing of Sentiment Analysis Classification Networks | Parsa Karbasizadeh et.al. | 2407.20884 | null |
2024-07-30 | Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification | Boyang Zhang et.al. | 2407.20859 | null |
2024-07-30 | Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations | Sarthak Anand et.al. | 2407.20856 | null |
2024-07-30 | Large Language Model (LLM)-enabled Graphs in Dynamic Networking | Geng Sun et.al. | 2407.20840 | null |
2024-07-30 | How to Measure the Intelligence of Large Language Models? | Nils Körber et.al. | 2407.20828 | null |
2024-07-30 | Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning | Norman Di Palo et.al. | 2407.20798 | null |
2024-07-30 | Interpretable Pre-Trained Transformers for Heart Time-Series Data | Harry J. Davies et.al. | 2407.20775 | link |
2024-07-30 | OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance | Yongqiang Yao et.al. | 2407.20761 | link |
2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232 | null |
2024-07-29 | Improving 2D Feature Representations by 3D-Aware Fine-Tuning | Yuanwen Yue et.al. | 2407.20229 | null |
2024-07-29 | FlexAttention for Efficient High-Resolution Vision-Language Models | Junyan Li et.al. | 2407.20228 | null |
2024-07-29 | Can Editing LLMs Inject Harm? | Canyu Chen et.al. | 2407.20224 | null |
2024-07-29 | SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction | Çağhan Köksal et.al. | 2407.20214 | null |
2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207 | null |
2024-07-29 | MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Zehui Chen et.al. | 2407.20183 | link |
2024-07-29 | Theia: Distilling Diverse Vision Foundation Models for Robot Learning | Jinghuan Shang et.al. | 2407.20179 | link |
2024-07-29 | AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs | Feiyang Kang et.al. | 2407.20177 | link |
2024-07-29 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng et.al. | 2407.20174 | link |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171 | link |
2024-07-29 | Language-Conditioned Offline RL for Multi-Robot Navigation | Steven Morad et.al. | 2407.20164 | null |
2024-07-29 | rLLM: Relational Table Learning with LLMs | Weichen Li et.al. | 2407.20157 | link |
2024-07-29 | ByteCheckpoint: A Unified Checkpointing System for LLM Development | Borui Wan et.al. | 2407.20143 | null |
2024-07-29 | Strong Copyright Protection for Language Models via Adaptive Model Fusion | Javier Abad et.al. | 2407.20105 | null |
2024-07-29 | Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models | Zhe Li et.al. | 2407.20053 | null |
2024-07-29 | Exploring Large Language Models to generate Easy to Read content | Paloma Martínez et.al. | 2407.20046 | null |
2024-07-29 | MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Walid Bousselham et.al. | 2407.20034 | null |
2024-07-29 | Efficient Training of Large Language Models on Distributed Infrastructures: A Survey | Jiangfei Duan et.al. | 2407.20018 | null |
2024-07-29 | Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs | Lars Vogt et.al. | 2407.20007 | null |
2024-07-26 | Wolf: Captioning Everything with a World Summarization Framework | Boyi Li et.al. | 2407.18908 | null |
2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
2024-07-26 | A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web | Juliana Barbosa et.al. | 2407.18898 | link |
2024-07-26 | Small Molecule Optimization with Large Language Models | Philipp Guevorguian et.al. | 2407.18897 | link |
2024-07-26 | Human-artificial intelligence teaming for scientific information extraction from data-driven additive manufacturing research using large language models | Mutahar Safdar et.al. | 2407.18827 | null |
2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787 | link |
2024-07-26 | The power of Prompts: Evaluating and Mitigating Gender Bias in MT with LLMs | Aleix Sant et.al. | 2407.18786 | null |
2024-07-26 | Foundation Models for the Digital Twin Creation of Cyber-Physical Systems | Shaukat Ali et.al. | 2407.18779 | null |
2024-07-26 | TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals | Kevin Kliimask et.al. | 2407.18764 | null |
2024-07-26 | Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery | Yuni Susanti et.al. | 2407.18752 | link |
2024-07-26 | Towards Effective and Efficient Continual Pre-training of Large Language Models | Jie Chen et.al. | 2407.18743 | null |
2024-07-26 | Towards Generalized Offensive Language Identification | Alphaeus Dmonte et.al. | 2407.18738 | null |
2024-07-26 | LLASP: Fine-tuning Large Language Models for Answer Set Programming | Erica Coppolillo et.al. | 2407.18723 | null |
2024-07-26 | Neurosymbolic AI for Enhancing Instructability in Generative AI | Amit Sheth et.al. | 2407.18722 | null |
2024-07-26 | Cluster-norm for Unsupervised Probing of Knowledge | Walter Laurito et.al. | 2407.18712 | link |
2024-07-26 | Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation | Esteban Garces Arias et.al. | 2407.18698 | link |
2024-07-26 | Collaborative Evolving Strategy for Automatic Data-Centric Development | Xu Yang et.al. | 2407.18690 | null |
2024-07-26 | The BIAS Detection Framework: Bias Detection in Word Embeddings and Language Models for European Languages | Alexandre Puttick et.al. | 2407.18689 | link |
2024-07-26 | Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift | Seongho Son et.al. | 2407.18676 | null |
2024-07-26 | Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models | Xiang Shi et.al. | 2407.18626 | link |
2024-07-25 | Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Tianduo Wang et.al. | 2407.18248 | link |
2024-07-25 | LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Zhengbo Wang et.al. | 2407.18242 | link |
2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
2024-07-26 | Exploring Scaling Trends in LLM Robustness | Nikolaus Howe et.al. | 2407.18213 | null |
2024-07-25 | AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Chunan Liu et.al. | 2407.18184 | link |
2024-07-25 | Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning | Sindhura Kommu et.al. | 2407.18181 | null |
2024-07-25 | Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | Sanae Lotfi et.al. | 2407.18158 | null |
2024-07-25 | | Vlad Sobal et.al. | 2407.18134 | null |
2024-07-25 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
2024-07-25 | Efficient Inference of Vision Instruction-Following Models with Elastic Cache | Zuyan Liu et.al. | 2407.18121 | link |
2024-07-25 | Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping | Jack Breen et.al. | 2407.18105 | link |
2024-07-25 | Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow | Tian Guo et.al. | 2407.18103 | null |
2024-07-25 | PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization | Christopher Clarke et.al. | 2407.18078 | link |
2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
2024-07-25 | ComPeer: A Generative Conversational Agent for Proactive Peer Support | Tianjian Liu et.al. | 2407.18064 | link |
2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062 | link |
2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061 | null |
2024-07-25 | The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | Eric Yang et.al. | 2407.18044 | null |
2024-07-25 | RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | Haoyu Chen et.al. | 2407.18035 | null |
2024-07-25 | GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | Jan Batzner et.al. | 2407.18008 | null |
2024-07-24 | I Could've Asked That: Reformulating Unanswerable Questions | Wenting Zhao et.al. | 2407.17469 | link |
2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
2024-07-24 | CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models | Jiawei Gu et.al. | 2407.17467 | null |
2024-07-24 | | Yunhao Fang et.al. | 2407.17453 | null |
2024-07-24 | Fluent Student-Teacher Redteaming | T. Ben Thompson et.al. | 2407.17447 | link |
2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
2024-07-24 | Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models | Yida Zhao et.al. | 2407.17406 | link |
2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404 | null |
2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
2024-07-24 | PERSONA: A Reproducible Testbed for Pluralistic Alignment | Louis Castricato et.al. | 2407.17387 | null |
2024-07-24 | A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance | Amirreza Naziri et.al. | 2407.17383 | null |
2024-07-24 | MMRA: A Benchmark for Multi-granularity Multi-image Relational Association | Siwei Wu et.al. | 2407.17379 | link |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | Gradient-based inference of abstract task representations for generalization in neural networks | Ali Hummos et.al. | 2407.17356 | null |
2024-07-24 | Scalify: scale propagation for efficient low-precision LLM training | Paul Balança et.al. | 2407.17353 | link |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349 | link |
2024-07-24 | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation | Qian Feng et.al. | 2407.17348 | null |
2024-07-24 | Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition | Ke Bao et.al. | 2407.17344 | null |
2024-07-24 | How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? | Leo Yu-Ho Lo et.al. | 2407.17291 | null |
2024-07-23 | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Junyi Li et.al. | 2407.16696 | link |
2024-07-23 | Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | Xiaoyue Xu et.al. | 2407.16695 | link |
2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
2024-07-23 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | Pengfei Chen et.al. | 2407.16682 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | Course-Correction: Safety Alignment Using Synthetic Preferences | Rongwu Xu et.al. | 2407.16637 | link |
2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
2024-07-23 | Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? | Jonathan Hayase et.al. | 2407.16607 | link |
2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
2024-07-23 | A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions | Giorgos Lysandrou et.al. | 2407.16593 | null |
2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
2024-07-23 | TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback | Eunseop Yoon et.al. | 2407.16574 | null |
2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | link |
2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | link |
2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
2024-07-23 | Quantifying the Role of Textual Predictability in Automatic Speech Recognition | Sean Robertson et.al. | 2407.16537 | null |
2024-07-23 | Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Aristeidis Panos et.al. | 2407.16526 | null |
2024-07-23 | AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | Yizhou Chi et.al. | 2407.16521 | null |
2024-07-23 | Language-Based Security for Low-Level MPC | Christian Skalka et.al. | 2407.16504 | null |
2024-07-23 | Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models | Kenza Benkirane et.al. | 2407.16470 | link |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | link |
2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841 | link |
2024-07-22 | MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Yangzhou Liu et.al. | 2407.15838 | link |
2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | null |
2024-07-22 | J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling | Wataru Nakata et.al. | 2407.15828 | null |
2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
2024-07-22 | Perceptions of Linguistic Uncertainty by Language Models and Humans | Catarina G Belem et.al. | 2407.15814 | link |
2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795 | link |
2024-07-22 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning | Emanuele Frascaroli et.al. | 2407.15793 | link |
2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
2024-07-22 | Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels | Zhuorui Ye et.al. | 2407.15786 | null |
2024-07-22 | Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | Kaiwen Wang et.al. | 2407.15762 | null |
2024-07-22 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | Marco Simoni et.al. | 2407.15748 | null |
2024-07-22 | OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context | Steffen Kleinle et.al. | 2407.15736 | null |
2024-07-22 | TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | John Chong Min Tan et.al. | 2407.15734 | link |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731 | null |
2024-07-22 | SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection | Dimitrios Kollias et.al. | 2407.15728 | null |
2024-07-22 | DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design | Zhi Hao Luo et.al. | 2407.15723 | link |
2024-07-22 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu et.al. | 2407.15720 | link |
2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding | Chenshu Hou et.al. | 2407.14491 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
2024-07-19 | Data-Centric Human Preference Optimization with Rationales | Hoang Anh Just et.al. | 2407.14477 | link |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467 | null |
2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452 | null |
2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
2024-07-19 | Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan et.al. | 2407.14435 | null |
2024-07-19 | Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | HamidReza Imani et.al. | 2407.14417 | null |
2024-07-19 | System-1.x: Learning to Balance Fast and Slow Planning with Language Models | Swarnadeep Saha et.al. | 2407.14414 | link |
2024-07-19 | DEAL: Disentangle and Localize Concept-level Explanations for VLMs | Tang Li et.al. | 2407.14412 | link |
2024-07-19 | The Vision of Autonomic Computing: Can LLMs Make It a Reality? | Zhiyang Zhang et.al. | 2407.14402 | null |
2024-07-19 | Frontiers of Deep Learning: From Novel Application to Real-World Deployment | Rui Xie et.al. | 2407.14386 | null |
2024-07-19 | Open Artificial Knowledge | Vadim Borisov et.al. | 2407.14371 | null |
2024-07-19 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu et.al. | 2407.14355 | link |
2024-07-19 | Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | Akash Kumar Mohankumar et.al. | 2407.14346 | null |
2024-07-19 | LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains | Raphael Hernandes et.al. | 2407.14344 | null |
2024-07-19 | Multimodal Misinformation Detection using Large Vision-Language Models | Sahar Tahmasebi et.al. | 2407.14321 | null |
2024-07-18 | Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | Charles Jin et.al. | 2407.13765 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
2024-07-18 | Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2407.13757 | null |
2024-07-18 | CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications | Mirza Masfiqur Rahman et.al. | 2407.13742 | null |
2024-07-18 | Baba Is AI: Break the Rules to Beat the Benchmark | Nathan Cloos et.al. | 2407.13729 | null |
2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | link |
2024-07-18 | Understanding Reference Policies in Direct Preference Optimization | Yixin Liu et.al. | 2407.13709 | link |
2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
2024-07-18 | Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation | Yotam Perlitz et.al. | 2407.13696 | link |
2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | null |
2024-07-18 | Shaded Route Planning Using Active Segmentation and Identification of Satellite Images | Longchao Da et.al. | 2407.13689 | null |
2024-07-18 | FuLG: 150B Romanian Corpus for Language Model Pretraining | Vlad-Andrei Bădoiu et.al. | 2407.13657 | null |
2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648 | null |
2024-07-18 | Weak-to-Strong Reasoning | Yuqing Yang et.al. | 2407.13647 | link |
2024-07-18 | Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | Chaofan Tao et.al. | 2407.13623 | link |
2024-07-18 | KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration | Youfu Yan et.al. | 2407.13598 | null |
2024-07-18 | PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks | Vishal Pallagani et.al. | 2407.13597 | null |
2024-07-18 | EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596 | link |
2024-07-18 | Robust Calibration of Large Vision-Language Adapters | Balamurali Murugesan et.al. | 2407.13588 | link |
2024-07-18 | Towards Zero-Shot Multimodal Machine Translation | Matthieu Futeral et.al. | 2407.13579 | link |
2024-07-17 | LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Kaichen Zhang et.al. | 2407.12772 | link |
2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
2024-07-17 | The Future of Learning: Large Language Models through the Lens of Students | He Zhang et.al. | 2407.12723 | null |
2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
2024-07-17 | Subgraph-Aware Training of Text-based Methods for Knowledge Graph Completion | Youmin Ko et.al. | 2407.12703 | null |
2024-07-17 | Patch-Level Training for Large Language Models | Chenze Shao et.al. | 2407.12665 | link |
2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
2024-07-17 | Domain-specific or Uncertainty-aware models: Does it really make a difference for biomedical text classification? | Aman Sinha et.al. | 2407.12626 | null |
2024-07-17 | Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | Claudio Pinhanez et.al. | 2407.12620 | null |
2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
2024-07-17 | VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Ofir Abramovich et.al. | 2407.12594 | null |
2024-07-18 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | link |
2024-07-17 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang et.al. | 2407.12580 | link |
2024-07-17 | Audio Conditioning for Music Generation via Discrete Bottleneck Features | Simon Rouard et.al. | 2407.12563 | null |
2024-07-17 | Conspiracy theories and where to find them on TikTok | Francesco Corso et.al. | 2407.12545 | null |
2024-07-17 | Abstraction Alignment: Comparing Model and Human Conceptual Relationships | Angie Boggust et.al. | 2407.12543 | link |
2024-07-17 | Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | Xihe Qiu et.al. | 2407.12532 | null |
2024-07-17 | Crafting the Path: Robust Query Rewriting for Information Retrieval | Ingeol Baek et.al. | 2407.12529 | null |
2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | link |
2024-07-16 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li et.al. | 2407.11963 | link |
2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
2024-07-16 | What's Wrong? Refining Meeting Summaries with LLM Feedback | Frederic Kirstein et.al. | 2407.11919 | null |
2024-07-16 | GraphFM: A Scalable Framework for Multi-Graph Pretraining | Divyansha Lachi et.al. | 2407.11907 | null |
2024-07-16 | Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads | Aritra Dhar et.al. | 2407.11888 | null |
2024-07-16 | Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection | Gaetan Lopez Latouche et.al. | 2407.11854 | null |
2024-07-16 | Schema Matching with Large Language Models: an Experimental Study | Marcel Parciak et.al. | 2407.11852 | link |
2024-07-16 | LoFTI: Localization and Factuality Transfer to Indian Locales | Sona Elza Simon et.al. | 2407.11833 | link |
2024-07-16 | GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text | Kyle Hamilton et.al. | 2407.11827 | null |
2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
2024-07-16 | Large Language Models as Misleading Assistants in Conversation | Betty Li Hou et.al. | 2407.11789 | null |
2024-07-16 | SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models | Xinbo Wu et.al. | 2407.11780 | null |
2024-07-16 | Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text | Seyedeh Fatemeh Ebrahimi et.al. | 2407.11774 | null |
2024-07-16 | Educational Personalized Learning Path Planning with Large Language Models | Chee Ng et.al. | 2407.11773 | null |
2024-07-16 | XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach | Truong Thanh Hung Nguyen et.al. | 2407.11771 | link |
2024-07-16 | Robust Utility-Preserving Text Anonymization Based on Large Language Models | Tianyu Yang et.al. | 2407.11770 | link |
2024-07-16 | Vectoring Languages | Joseph Chen et.al. | 2407.11766 | null |
2024-07-16 | Exploring Quantization for Efficient Pre-Training of Transformer Language Models | Kamran Chitsaz et.al. | 2407.11722 | link |
2024-07-16 | Harnessing Large Language Models for Multimodal Product Bundling | Xiaohao Liu et.al. | 2407.11712 | null |
2024-07-15 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou et.al. | 2407.10972 | link |
2024-07-15 | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Hongyu Wang et.al. | 2407.10969 | null |
2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | link |
2024-07-15 | Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | Ruisheng Cao et.al. | 2407.10956 | link |
2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953 | null |
2024-07-15 | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? | Yaoting Wang et.al. | 2407.10947 | link |
2024-07-15 | Learning from Naturally Occurring Feedback | Shachar Don-Yehiya et.al. | 2407.10944 | link |
2024-07-15 | GRUtopia: Dream General Robots in a City at Scale | Hanqing Wang et.al. | 2407.10943 | link |
2024-07-15 | Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Dilara Soylu et.al. | 2407.10930 | null |
2024-07-15 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | null |
2024-07-15 | FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets | Xiaohui Victor Li et.al. | 2407.10909 | link |
2024-07-15 | Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique | Mark Russinovich et.al. | 2407.10887 | null |
2024-07-15 | SLIP: Securing LLMs IP Using Weights Decomposition | Yehonathan Refael et.al. | 2407.10886 | null |
2024-07-15 | Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models | Rui Zhang et.al. | 2407.10873 | null |
2024-07-15 | GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM | Keshav Bimbraw et.al. | 2407.10870 | null |
2024-07-15 | Physics-Inspired Generative Models in Medical Imaging: A Review | Dennis Hein et.al. | 2407.10856 | null |
2024-07-15 | Weighted Grouped Query Attention in Transformers | Sai Sena Chinnakonduru et.al. | 2407.10855 | null |
2024-07-15 | An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard et.al. | 2407.10853 | null |
2024-07-15 | MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs | Quang H. Nguyen et.al. | 2407.10834 | null |
2024-07-15 | BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy | Tim Menzner et.al. | 2407.10829 | null |
2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
2024-07-12 | Human-like Episodic Memory for Infinite Context LLMs | Zafeirios Fountas et.al. | 2407.09450 | link |
2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | link |
2024-07-12 | MUSCLE: A Model Update Strategy for Compatible LLM Evolution | Jessica Echterhoff et.al. | 2407.09435 | null |
2024-07-12 | A Perspective on Foundation Models for the Electric Power Grid | Hendrik F. Hamann et.al. | 2407.09434 | null |
2024-07-12 | Open (Clinical) LLMs are Sensitive to Instruction Phrasings | Alberto Mario Ceballos Arroyo et.al. | 2407.09429 | link |
2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Hang Zou et.al. | 2407.09424 | null |
2024-07-12 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su et.al. | 2407.09417 | link |
2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413 | link |
2024-07-12 | Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce | Zhe Lin et.al. | 2407.09395 | null |
2024-07-12 | PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Saber Zerhoudi et.al. | 2407.09394 | link |
2024-07-12 | GAVEL: Generating Games Via Evolution and Language Models | Graham Todd et.al. | 2407.09388 | link |
2024-07-12 | Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text | Lucio La Cava et.al. | 2407.09364 | null |
2024-07-12 | Good Intentions, Risky Inventions: A Method for Assessing the Risks and Benefits of AI in Mobile and Wearable Uses | Marios Constantinides et.al. | 2407.09322 | link |
2024-07-12 | Scalability of Bayesian Network Structure Elicitation with Large Language Models: a Novel Methodology and Comparative Analysis | Nikolay Babakov et.al. | 2407.09311 | null |
2024-07-12 | Transformer Layers as Painters | Qi Sun et.al. | 2407.09298 | link |
2024-07-12 | Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study | Yulong Yang et.al. | 2407.09295 | null |
2024-07-12 | CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models | Dong Shu et.al. | 2407.09292 | null |
2024-07-12 | Structuring Authenticity Assessments on Historical Documents using LLMs | Andrea Schimmenti et.al. | 2407.09290 | null |
2024-07-12 | WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation | Robin Schön et.al. | 2407.09288 | link |
2024-07-11 | MAVIS: Mathematical Visual Instruction Tuning | Renrui Zhang et.al. | 2407.08739 | link |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735 | null |
2024-07-11 | Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou et.al. | 2407.08733 | null |
2024-07-11 | A Taxonomy for Data Contamination in Large Language Models | Medha Palavalli et.al. | 2407.08716 | null |
2024-07-11 | GTA: A Benchmark for General Tool Agents | Jize Wang et.al. | 2407.08713 | link |
2024-07-11 | eyeballvul: a future-proof benchmark for vulnerability detection in the wild | Timothee Chauvin et.al. | 2407.08708 | link |
2024-07-11 | Extracting Training Data from Document-Based VQA Models | Francesco Pinto et.al. | 2407.08707 | null |
2024-07-11 | HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models | Runhui Huang et.al. | 2407.08706 | null |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
2024-07-11 | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | Anton Alexandrov et.al. | 2407.08699 | null |
2024-07-11 | Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight | Zhiqiang Xie et.al. | 2407.08694 | null |
2024-07-11 | Robotic Control via Embodied Chain-of-Thought Reasoning | Zawalski Michał et.al. | 2407.08693 | null |
2024-07-11 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang et.al. | 2407.08683 | link |
2024-07-11 | NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Yi Zhang et.al. | 2407.08672 | null |
2024-07-11 | Uncertainty Estimation of Large Language Models in Medical Question Answering | Jiaxin Wu et.al. | 2407.08662 | null |
2024-07-11 | Towards Building Specialized Generalist AI with System 1 and System 2 Fusion | Kaiyan Zhang et.al. | 2407.08642 | null |
2024-07-11 | | Junkang Wu et.al. | 2407.08639 | link |
2024-07-11 | RoboMorph: Evolving Robot Morphology using Large Language Models | Kevin Qiu et.al. | 2407.08626 | null |
2024-07-11 | Tamil Language Computing: the Present and the Future | Kengatharaiyer Sarveswaran et.al. | 2407.08618 | null |
2024-07-11 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah et.al. | 2407.08608 | link |
2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890 | link |
2024-07-10 | Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization | Junkang Wu et.al. | 2407.07880 | link |
2024-07-11 | Toto: Time Series Optimized Transformer for Observability | Ben Cohen et.al. | 2407.07874 | null |
2024-07-10 | FACTS About Building Retrieval Augmented Generation-based Chatbots | Rama Akkiraju et.al. | 2407.07858 | null |
2024-07-10 | OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | Sami Jaghouar et.al. | 2407.07852 | link |
2024-07-10 | Natural Language Mechanisms via Self-Resolution with Foundation Models | Nicolas Della Penna et.al. | 2407.07845 | null |
2024-07-10 | Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Shengjia Chen et.al. | 2407.07841 | link |
2024-07-10 | Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison | Qian Yang et.al. | 2407.07840 | null |
2024-07-10 | Transformer Alignment in Large Language Models | Murdock Aubry et.al. | 2407.07810 | null |
2024-07-11 | AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning | Jongsuk Kim et.al. | 2407.07801 | link |
2024-07-10 | Attribute or Abstain: Large Language Models as Long Document Assistants | Jan Buchmann et.al. | 2407.07799 | link |
2024-07-11 | Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard | Oguzhan Topsakal et.al. | 2407.07796 | link |
2024-07-10 | Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | Tianjie Ju et.al. | 2407.07791 | link |
2024-07-10 | WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment | Jiefu Ou et.al. | 2407.07778 | null |
2024-07-10 | Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | Hao-Tien Lewis Chiang et.al. | 2407.07775 | null |
2024-07-10 | Can ChatGPT Pass a Theory of Computing Course? | Matei A. Golesteanu et.al. | 2407.07757 | null |
2024-07-10 | Fine-Tuning Large Language Models with User-Level Differential Privacy | Zachary Charles et.al. | 2407.07737 | null |
2024-07-10 | PaliGemma: A versatile 3B VLM for transfer | Lucas Beyer et.al. | 2407.07726 | link |
2024-07-10 | Why should we ever automate moral decision making? | Vincent Conitzer et.al. | 2407.07671 | null |
2024-07-10 | A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability | Ting Fang Tan et.al. | 2407.07666 | null |
2024-07-09 | AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning | Jiaxi Cui et.al. | 2407.07094 | link |
2024-07-09 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma et.al. | 2407.07093 | link |
2024-07-09 | CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation | Tong Chen et.al. | 2407.07087 | link |
2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
2024-07-09 | Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | Shaltiel Shmidman et.al. | 2407.07080 | null |
2024-07-09 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang et.al. | 2407.07071 | link |
2024-07-09 | Prompting Techniques for Secure Code Generation: A Systematic Investigation | Catherine Tony et.al. | 2407.07064 | null |
2024-07-09 | Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Weize Chen et.al. | 2407.07061 | link |
2024-07-09 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
2024-07-09 | ProtoSAM -- One Shot Medical Image Segmentation With Foundational Models | Lev Ayzenberg et.al. | 2407.07042 | link |
2024-07-09 | Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models | Yue Zhang et.al. | 2407.07035 | link |
2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024 | link |
2024-07-09 | Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | Inwon Kang et.al. | 2407.07019 | null |
2024-07-09 | End-To-End Causal Effect Estimation from Unstructured Natural Language Data | Nikita Dhawan et.al. | 2407.07018 | null |
2024-07-09 | Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures? | Zhilong Song et.al. | 2407.07016 | null |
2024-07-09 | Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning | J. Crosbie et.al. | 2407.07011 | null |
2024-07-09 | Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et.al. | 2407.07000 | link |
2024-07-09 | Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective | Yu-An Liu et.al. | 2407.06992 | link |
2024-07-09 | Segment-Based Interactive Machine Translation for Pre-trained Models | Angel Navarro et.al. | 2407.06990 | null |
2024-07-09 | Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Yi-Cheng Lin et.al. | 2407.06957 | link |
2024-07-08 | Multi-Object Hallucination in Vision-Language Models | Xuweiyi Chen et.al. | 2407.06192 | link |
2024-07-08 | 4D Contrastive Superflows are Dense 3D Representation Learners | Xiang Xu et.al. | 2407.06190 | link |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189 | link |
2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
2024-07-08 | JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation | Yu Zeng et.al. | 2407.06187 | null |
2024-07-08 | Vision-Language Models under Cultural and Inclusive Considerations | Antonia Karamolegkou et.al. | 2407.06177 | null |
2024-07-08 | On Speeding Up Language Model Evaluation | Jin Peng Zhou et.al. | 2407.06172 | null |
2024-07-08 | What's Wrong with Your Code Generated by Large Language Models? An Extensive Study | Shihan Dou et.al. | 2407.06153 | null |
2024-07-09 | Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks | Lukas Netz et.al. | 2407.06146 | null |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-08 | Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization | Hannah K. Bako et.al. | 2407.06129 | link |
2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125 | null |
2024-07-08 | Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning | Yadong Zhang et.al. | 2407.06112 | null |
2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093 | null |
2024-07-08 | Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | Jinliang Lu et.al. | 2407.06089 | null |
2024-07-08 | From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty | Maor Ivgi et.al. | 2407.06071 | link |
2024-07-08 | Variational Best-of-N Alignment | Afra Amini et.al. | 2407.06057 | null |
2024-07-08 | MST5 -- Multilingual Question Answering over Knowledge Graphs | Nikit Srivastava et.al. | 2407.06041 | link |
2024-07-08 | PAS: Data-Efficient Plug-and-Play Prompt Augmentation System | Miao Zheng et.al. | 2407.06027 | null |
2024-07-08 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang et.al. | 2407.06025 | link |
2024-07-05 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Rudolf Laine et.al. | 2407.04694 | link |
2024-07-05 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu et.al. | 2407.04693 | link |
2024-07-05 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Yuanze Lin et.al. | 2407.04681 | null |
2024-07-05 | Lost in Translation: The Algorithmic Gap Between LMs and the Brain | Tommaso Tosato et.al. | 2407.04680 | null |
2024-07-05 | Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | Ye Bai et.al. | 2407.04675 | null |
2024-07-05 | Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | Yongji Wu et.al. | 2407.04656 | null |
2024-07-05 | Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | Bolaji Yusuf et.al. | 2407.04641 | null |
2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629 | null |
2024-07-05 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton et.al. | 2407.04622 | null |
2024-07-05 | CountGD: Multi-Modal Open-World Counting | Niki Amini-Naieni et.al. | 2407.04619 | null |
2024-07-05 | ARM: Efficient Guided Decoding with Autoregressive Reward Models | Sergey Troshin et.al. | 2407.04615 | null |
2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603 | link |
2024-07-05 | Written Term Detection Improves Spoken Term Detection | Bolaji Yusuf et.al. | 2407.04601 | link |
2024-07-05 | Testing learning hypotheses using neural networks by manipulating learning data | Cara Su-Yi Leong et.al. | 2407.04593 | null |
2024-07-05 | Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions | Shumaila Javaid et.al. | 2407.04581 | null |
2024-07-05 | VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | Hang Gao et.al. | 2407.04573 | null |
2024-07-05 | Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition | Aditya K Surikuchi et.al. | 2407.04559 | link |
2024-07-05 | Spontaneous Reward Hacking in Iterative Self-Refinement | Jane Pan et.al. | 2407.04549 | null |
2024-07-05 | PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts | Ana-Cristina Rogoz et.al. | 2407.04541 | link |
2024-07-05 | GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning | Aleksander Ficek et.al. | 2407.04528 | null |
2024-07-03 | Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages | Max Zuo et.al. | 2407.03321 | link |
2024-07-03 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Pan Zhang et.al. | 2407.03320 | link |
2024-07-03 | BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations | Zhantao Yang et.al. | 2407.03314 | null |
2024-07-03 | Universal Length Generalization with Turing Programs | Kaiying Hou et.al. | 2407.03310 | null |
2024-07-03 | Large Language Models for JSON Schema Discovery | Michael J. Mior et.al. | 2407.03286 | null |
2024-07-03 | LLM Internal States Reveal Hallucination Risk Faced With a Query | Ziwei Ji et.al. | 2407.03282 | link |
2024-07-03 | STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data | Kheir Eddine Daouadi et.al. | 2407.03253 | null |
2024-07-03 | Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning | Zhili Shen et.al. | 2407.03227 | null |
2024-07-03 | How Does Quantization Affect Multilingual LLMs? | Kelly Marchisio et.al. | 2407.03211 | null |
2024-07-03 | TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Ruida Wang et.al. | 2407.03203 | link |
2024-07-03 | Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | Haritz Puerto et.al. | 2407.03181 | link |
2024-07-03 | Investigating Decoder-only Large Language Models for Speech-to-text Translation | Chao-Wei Huang et.al. | 2407.03169 | null |
2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
2024-07-03 | Let the Code LLM Edit Itself When You Edit the Code | Zhenyu He et.al. | 2407.03157 | null |
2024-07-03 | Reinforcement Learning for Sequence Design Leveraging Protein Language Models | Jithendaraa Subramanian et.al. | 2407.03154 | null |
2024-07-03 | Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data | Minato Kondo et.al. | 2407.03145 | null |
2024-07-03 | Social Bias Evaluation for Large Language Models Requires Prompt Variations | Rem Hida et.al. | 2407.03129 | link |
2024-07-03 | KeyVideoLLM: Towards Large-scale Video Keyframe Selection | Hao Liang et.al. | 2407.03104 | null |
2024-07-03 | Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory | Suyeon Lee et.al. | 2407.03103 | link |
2024-07-03 | ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text Monitoring | Le Fang et.al. | 2407.03063 | null |
2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
2024-07-02 | Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Ali Safaya et.al. | 2407.02486 | link |
2024-07-02 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | Yue Yu et.al. | 2407.02485 | null |
2024-07-02 | MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Binxu Li et.al. | 2407.02483 | link |
2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473 | null |
2024-07-02 | ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions | Chan Young Park et.al. | 2407.02472 | link |
2024-07-02 | Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | Harrie Oosterhuis et.al. | 2407.02464 | null |
2024-07-02 | Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets | Kheir Eddine Daouadi et.al. | 2407.02448 | null |
2024-07-03 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411 | null |
2024-07-02 | CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models | Song Wang et.al. | 2407.02408 | null |
2024-07-02 | Assessing the Code Clone Detection Capability of Large Language Models | Zixian Zhang et.al. | 2407.02402 | null |
2024-07-02 | Learning to Refine with Fine-Grained Natural Language Feedback | Manya Wadhwa et.al. | 2407.02397 | link |
2024-07-02 | Is Your AI-Generated Code Really Secure? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | Jiexin Wang et.al. | 2407.02395 | null |
2024-07-02 | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li et.al. | 2407.02392 | link |
2024-07-02 | Talking to Machines: do you read me? | Lina M. Rojas-Barahona et.al. | 2407.02354 | null |
2024-07-02 | Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification | Pritish Sahu et.al. | 2407.02352 | null |
2024-07-02 | Generative Large Language Models in Automated Fact-Checking: A Survey | Ivan Vykopal et.al. | 2407.02351 | null |
2024-07-02 | Conceptual Codebook Learning for Vision-Language Models | Yi Zhang et.al. | 2407.02350 | null |
2024-07-02 | MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space | Yihong Tang et.al. | 2407.02345 | null |
2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098 | link |
2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095 | link |
2024-06-28 | Scaling Synthetic Data Creation with 1,000,000,000 Personas | Xin Chan et.al. | 2406.20094 | link |
2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092 | link |
2024-06-28 | ProgressGym: Alignment with a Millennium of Moral Progress | Tianyi Qiu et.al. | 2406.20087 | link |
2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
2024-06-28 | Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | Anisha Gunjal et.al. | 2406.20079 | link |
2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yuxuan Zhang et.al. | 2406.20076 | link |
2024-06-28 | To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models | Bastien Liétard et.al. | 2406.20054 | null |
2024-06-28 | Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation | Danny Halawi et.al. | 2406.20053 | null |
2024-07-01 | BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration | Noel Crawford et.al. | 2406.20041 | null |
2024-06-28 | BioMNER: A Dataset for Biomedical Method Entity Recognition | Chen Tang et.al. | 2406.20038 | null |
2024-06-28 | LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models | Renzhi Wang et.al. | 2406.20030 | null |
2024-06-28 | ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models | Yuxiang Zhang et.al. | 2406.20015 | link |
2024-06-28 | The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models | Xinyi Chen et.al. | 2406.19999 | link |
2024-06-28 | Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model | Habib Hajimolahoseini et.al. | 2406.19995 | null |
2024-06-28 | ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | Rui Pan et.al. | 2406.19976 | null |
2024-06-28 | STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical | Guohao Sun et.al. | 2406.19973 | link |
2024-06-28 | Into the Unknown: Generating Geospatial Descriptions for New Environments | Tzuf Paz-Argaman et.al. | 2406.19967 | null |
2024-06-28 | Simulating Financial Market via Large Language Model based Agents | Shen Gao et.al. | 2406.19966 | null |
2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392 | link |
2024-06-27 | The Remarkable Robustness of LLMs: Stages of Inference? | Vedang Lad et.al. | 2406.19384 | link |
2024-06-27 | The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models | Xiliang Zhu et.al. | 2406.19358 | null |
2024-06-27 | DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez et.al. | 2406.19356 | link |
2024-06-27 | Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? | Peter Hase et.al. | 2406.19354 | null |
2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349 | null |
2024-06-27 | Jump Starting Bandits with LLM-Generated Prior Knowledge | Parand A. Alamdari et.al. | 2406.19317 | link |
2024-06-27 | MCNC: Manifold Constrained Network Compression | Chayne Thrash et.al. | 2406.19301 | null |
2024-06-27 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Zheyang Xiong et.al. | 2406.19292 | link |
2024-06-27 | PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models | Cathy Mengying Fang et.al. | 2406.19283 | null |
2024-06-27 | HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Junying Chen et.al. | 2406.19280 | link |
2024-06-27 | VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation | Yixiao Song et.al. | 2406.19276 | link |
2024-06-27 | AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning | Praneeth Vadlapati et.al. | 2406.19271 | link |
2024-06-27 | Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding | Yue Fan et.al. | 2406.19263 | link |
2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255 | null |
2024-06-27 | AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation | Jia Fu et.al. | 2406.19251 | null |
2024-06-27 | Revealing Fine-Grained Values and Opinions in Large Language Models | Dustin Wright et.al. | 2406.19238 | link |
2024-06-28 | FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Shubhankar Singh et.al. | 2406.19237 | null |
2024-06-27 | Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation | Yuying Li et.al. | 2406.19234 | null |
2024-06-28 | RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs | Ekaterina Taktasheva et.al. | 2406.19232 | link |
2024-06-26 | Towards Compositionality in Concept Learning | Adam Stein et.al. | 2406.18534 | link |
2024-06-26 | Symbolic Learning Enables Self-Evolving Agents | Wangchunshu Zhou et.al. | 2406.18532 | link |
2024-06-26 | PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation | Christoph Leiter et.al. | 2406.18528 | link |
2024-06-26 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang et.al. | 2406.18521 | link |
2024-06-26 | "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | Grace Li et.al. | 2406.18512 | null |
2024-06-26 | WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Liwei Jiang et.al. | 2406.18510 | link |
2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
2024-06-26 | Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming | Zhenghao Zhou et.al. | 2406.18501 | null |
2024-06-26 | Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation | Ahmed Njifenjou et.al. | 2406.18460 | null |
2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | link |
2024-06-26 | New intelligent empowerment for digital transformation | Peng Yifeng et.al. | 2406.18440 | null |
2024-06-26 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi et.al. | 2406.18406 | link |
2024-06-26 | Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers | Yibo Jiang et.al. | 2406.18400 | null |
2024-06-26 | Adversarial Search Engine Optimization for Large Language Models | Fredrik Nestaas et.al. | 2406.18382 | null |
2024-06-26 | MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | Haolang Lu et.al. | 2406.18379 | null |
2024-06-26 | Themis: Towards Flexible and Interpretable NLG Evaluation | Xinyu Hu et.al. | 2406.18365 | link |
2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346 | null |
2024-06-26 | PDFA Distillation via String Probability Queries | Robert Baumgartner et.al. | 2406.18328 | link |
2024-06-26 | PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models | Huixuan Zhang et.al. | 2406.18326 | null |
2024-06-26 | MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Meng Fang et.al. | 2406.18321 | null |
2024-06-25 | MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Xiangyu Zhao et.al. | 2406.17770 | link |
2024-06-25 | EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data | Jesse Zhang et.al. | 2406.17768 | null |
2024-06-25 | BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning | Ercong Nie et.al. | 2406.17764 | null |
2024-06-25 | CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Shane Arora et.al. | 2406.17761 | link |
2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
2024-06-25 | Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | Amalie Brogaard Pauli et.al. | 2406.17753 | null |
2024-06-25 | Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon | USVSN Sai Prashanth et.al. | 2406.17746 | link |
2024-06-25 | Point-SAM: Promptable 3D Segmentation Model for Point Clouds | Yuchen Zhou et.al. | 2406.17741 | link |
2024-06-25 | Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model | Fei Xia et.al. | 2406.17739 | null |
2024-06-25 | LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | Elinor Poole-Dayan et.al. | 2406.17737 | null |
2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | link |
2024-06-25 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake et.al. | 2406.17692 | link |
2024-06-25 | VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Kun Qian et.al. | 2406.17681 | link |
2024-06-25 | Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models | Yuan Li et.al. | 2406.17675 | null |
2024-06-25 | LaTable: Towards Large Tabular Models | Boris van Breugel et.al. | 2406.17673 | null |
2024-06-25 | LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Aditya Kalyanpur et.al. | 2406.17663 | null |
2024-06-25 | Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | Aashiq Muhamed et.al. | 2406.17660 | link |
2024-06-25 | DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning | Xiaohan Zhang et.al. | 2406.17659 | null |
2024-06-25 | Leveraging Large Language Models for Software Model Completion: Results from Industrial and Public Datasets | Christof Tinnes et.al. | 2406.17651 | link |
2024-06-25 | Variationist: Exploring Multifaceted Variation and Bias in Written Language Data | Alan Ramponi et.al. | 2406.17647 | link |
2024-06-24 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Shengbang Tong et.al. | 2406.16860 | link |
2024-06-24 | EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Yuhui Li et.al. | 2406.16858 | link |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852 | link |
2024-06-24 | Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts | Aditya Sharma et.al. | 2406.16851 | null |
2024-06-24 | RaTEScore: A Metric for Radiology Report Generation | Weike Zhao et.al. | 2406.16845 | link |
2024-06-24 | From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | Sean Welleck et.al. | 2406.16838 | null |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long | Mounika Marreddy et.al. | 2406.16833 | null |
2024-06-24 | Understanding and Mitigating Tokenization Bias in Language Models | Buu Phan et.al. | 2406.16829 | null |
2024-06-24 | Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Ronak Pradeep et.al. | 2406.16828 | link |
2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et.al. | 2406.16817 | null |
2024-06-24 | RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Beck LaBash et.al. | 2406.16801 | link |
2024-06-24 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda et.al. | 2406.16797 | link |
2024-06-24 | Adam-mini: Use Fewer Learning Rates To Gain More | Yushun Zhang et.al. | 2406.16793 | link |
2024-06-24 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary et.al. | 2406.16783 | null |
2024-06-24 | It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension | Sagi Shaier et.al. | 2406.16779 | null |
2024-06-24 | Finding Transformer Circuits with Edge Pruning | Adithya Bhaskar et.al. | 2406.16778 | link |
2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | null |
2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null |
2024-06-24 | The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | Xi Yu Huang et.al. | 2406.16767 | link |
2024-06-24 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758 | link |
2024-06-21 | GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians | Haoyang Liu et.al. | 2406.15341 | link |
2024-06-21 | Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance | Haoling Li et.al. | 2406.15330 | null |
2024-06-21 | Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks | Hokyung Lee et.al. | 2406.15325 | link |
2024-06-21 | Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model | Doyoung Kim et.al. | 2406.15275 | link |
2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264 | null |
2024-06-21 | Unsupervised Morphological Tree Tokenizer | Qingyang Zhu et.al. | 2406.15245 | null |
2024-06-21 | Large Batch Analysis for Adagrad Under Anisotropic Smoothness | Yuxing Liu et.al. | 2406.15244 | null |
2024-06-21 | Detecting Synthetic Lyrics with Few-Shot Inference | Yanis Labrak et.al. | 2406.15231 | null |
2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227 | link |
2024-06-21 | Unsupervised Extraction of Dialogue Policies from Conversations | Makesh Narsimhan Sreedhar et.al. | 2406.15214 | null |
2024-06-21 | Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding | Mohan Li et.al. | 2406.15209 | null |
2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
2024-06-21 | UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis | Yulong Hui et.al. | 2406.15187 | link |
2024-06-21 | Hybrid Alignment Training for Large Language Models | Chenglong Wang et.al. | 2406.15178 | link |
2024-06-21 | EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot | Hao Fei et.al. | 2406.15177 | link |
2024-06-21 | Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss | Wei He et.al. | 2406.15175 | null |
2024-06-21 | Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens | Mathieu Chartier et.al. | 2406.15173 | null |
2024-06-21 | Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks | Victor Hugo Nascimento Rocha et.al. | 2406.15130 | link |
2024-06-21 | Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | Badr AlKhamissi et.al. | 2406.15109 | link |
2024-06-21 | PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data | Ishaan Watts et.al. | 2406.15053 | null |
2024-06-20 | Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | Hasan Abed Al Kader Hammoud et.al. | 2406.14563 | null |
2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562 | null |
2024-06-20 | How to Compute the Probability of a Word | Tiago Pimentel et.al. | 2406.14561 | link |
2024-06-21 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556 | link |
2024-06-20 | GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | Shilong Li et.al. | 2406.14550 | null |
2024-06-20 | Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models | Sunny Duan et.al. | 2406.14549 | null |
2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546 | link |
2024-06-20 | Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems | Đorđe Klisura et.al. | 2406.14545 | null |
2024-06-20 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Yuxuan Qiao et.al. | 2406.14544 | link |
2024-06-20 | Are LLMs Naturally Good at Synthetic Tabular Data Generation? | Shengzhe Xu et.al. | 2406.14541 | link |
2024-06-20 | PostMark: A Robust Blackbox Watermark for Large Language Models | Yapei Chang et.al. | 2406.14517 | link |
2024-06-20 | MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | Xinyu Fang et.al. | 2406.14515 | link |
2024-06-20 | Evidence of a log scaling law for political persuasion with large language models | Kobi Hackenburg et.al. | 2406.14508 | link |
2024-06-20 | Overview of the CAIL 2023 Argument Mining Track | Jingcong Liang et.al. | 2406.14503 | null |
2024-06-20 | Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | Xingmeng Zhao et.al. | 2406.14500 | null |
2024-06-20 | LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors | Sheikh Asif Imran et.al. | 2406.14498 | link |
2024-06-20 | CodeRAG-Bench: Can Retrieval Augment Code Generation? | Zora Zhiruo Wang et.al. | 2406.14497 | link |
2024-06-20 | African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Gregor Geigle et.al. | 2406.14496 | link |
2024-06-20 | Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Gregor Geigle et.al. | 2406.14492 | null |
2024-06-20 | Instruction Pre-Training: Language Models are Supervised Multitask Learners | Daixuan Cheng et.al. | 2406.14491 | link |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846 | null |
2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | Ci-Siang Lin et.al. | 2406.12834 | null |
2024-06-18 | LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | Seyedarmin Azizi et.al. | 2406.12832 | link |
2024-06-18 | What Are the Odds? Language Models Are Capable of Probabilistic Reasoning | Akshay Paruchuri et.al. | 2406.12830 | link |
2024-06-18 | From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries | Hitesh Wadhwa et.al. | 2406.12824 | null |
2024-06-18 | Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? | Pinzhen Chen et.al. | 2406.12822 | null |
2024-06-18 | Adversarial Attacks on Multimodal Agents | Chen Henry Wu et.al. | 2406.12814 | link |
2024-06-18 | Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? | Zhe Yang et.al. | 2406.12809 | link |
2024-06-18 | Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents | Zehao Wang et.al. | 2406.12806 | null |
2024-06-18 | Supporting Human Raters with the Detection of Harmful Content using Large Language Models | Kurt Thomas et.al. | 2406.12800 | null |
2024-06-18 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM et.al. | 2406.12793 | link |
2024-06-18 | In-Context Learning of Energy Functions | Rylan Schaeffer et.al. | 2406.12785 | null |
2024-06-18 | UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Xunzhi Wang et.al. | 2406.12784 | link |
2024-06-18 | Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries | Eden Biran et.al. | 2406.12775 | link |
2024-06-18 | Towards Exact Gradient-based Training on Analog In-memory Computing | Zhaoxian Wu et.al. | 2406.12774 | null |
2024-06-18 | GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping | Angel Daruna et.al. | 2406.12756 | null |
2024-06-18 | OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Zhen Huang et.al. | 2406.12753 | link |
2024-06-18 | Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Bingchen Zhao et.al. | 2406.12742 | link |
2024-06-17 | LLaNA: Large Language and NeRF Assistant | Andrea Amaduzzi et.al. | 2406.11840 | null |
2024-06-17 | mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Fei Wang et.al. | 2406.11839 | null |
2024-06-17 | MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Ziyu Liu et.al. | 2406.11833 | link |
2024-06-17 | Unveiling Encoder-Free Vision-Language Models | Haiwen Diao et.al. | 2406.11832 | link |
2024-06-17 | Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | Bingqi Ma et.al. | 2406.11831 | null |
2024-06-17 | Language Modeling with Editable External Knowledge | Belinda Z. Li et.al. | 2406.11830 | link |
2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
2024-06-17 | On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning | Geewook Kim et.al. | 2406.11823 | link |
2024-06-17 | MegaScenes: Scene-Level View Synthesis at Scale | Joseph Tung et.al. | 2406.11819 | link |
2024-06-17 | Embodied Instruction Following in Unknown Environments | Zhenyu Wu et.al. | 2406.11818 | null |
2024-06-17 | Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level | Jie Liu et.al. | 2406.11817 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813 | link |
2024-06-17 | RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Joao Monteiro et.al. | 2406.11811 | link |
2024-06-17 | Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations | Rima Hazra et.al. | 2406.11801 | link |
2024-06-17 | DataComp-LM: In search of the next generation of training sets for language models | Jeffrey Li et.al. | 2406.11794 | null |
2024-06-17 | CELL your Model: Contrastive Explanation Methods for Large Language Models | Ronny Luss et.al. | 2406.11785 | null |
2024-06-17 | Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs | Swanand Ravindra Kadhe et.al. | 2406.11780 | null |
2024-06-17 | Improving Multi-Agent Debate with Sparse Communication Topology | Yunxuan Li et.al. | 2406.11776 | null |
2024-06-17 | Task Me Anything | Jieyu Zhang et.al. | 2406.11775 | link |
2024-06-14 | Quantifying Variance in Evaluation Benchmarks | Lovish Madaan et.al. | 2406.10229 | null |
2024-06-14 | EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Julian Straub et.al. | 2406.10224 | link |
2024-06-14 | Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding | Ridouane Ghermi et.al. | 2406.10221 | link |
2024-06-14 | Semantic Membership Inference Attack against Large Language Models | Hamid Mozaffari et.al. | 2406.10218 | null |
2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | link |
2024-06-14 | DevBench: A multimodal developmental benchmark for language learning | Alvin Wei Ming Tan et.al. | 2406.10215 | link |
2024-06-14 | Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Abhimanyu Hans et.al. | 2406.10209 | link |
2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | link |
2024-06-14 | TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners | Tomas de la Rosa et.al. | 2406.10196 | null |
2024-06-14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jiawei Chen et.al. | 2406.10185 | null |
2024-06-14 | Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors | Siyuan Chen et.al. | 2406.10181 | null |
2024-06-14 | Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation | Mohamad Elzohbi et.al. | 2406.10174 | link |
2024-06-14 | IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Wenxuan Ding et.al. | 2406.10173 | link |
2024-06-14 | Datasets for Multilingual Answer Sentence Selection | Matteo Gabburo et.al. | 2406.10172 | null |
2024-06-14 | CarLLaVA: Vision language models for camera-only closed-loop driving | Katrin Renz et.al. | 2406.10165 | null |
2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
2024-06-14 | RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model | Hantao Zhou et.al. | 2406.10157 | null |
2024-06-14 | BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Yuri Kuratov et.al. | 2406.10149 | link |
2024-06-14 | Evaluation of Large Language Models: STEM education and Gender Stereotypes | Smilla Due et.al. | 2406.10133 | null |
2024-06-14 | The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models | Yan Liu et.al. | 2406.10130 | link |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
2024-06-13 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Roman Bachmann et.al. | 2406.09406 | null |
2024-06-13 | Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Yushi Hu et.al. | 2406.09403 | null |
2024-06-13 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Junke Wang et.al. | 2406.09399 | link |
2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396 | link |
2024-06-13 | Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Youngtaek Oh et.al. | 2406.09388 | link |
2024-06-13 | Towards Vision-Language Geo-Foundation Model: A Survey | Yue Zhou et.al. | 2406.09385 | link |
2024-06-13 | Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models | Lukas Thede et.al. | 2406.09384 | null |
2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
2024-06-13 | ElicitationGPT: Text Elicitation Mechanisms via Language Models | Yifan Wu et.al. | 2406.09363 | null |
2024-06-13 | Enhancing Domain Adaptation through Prompt Gradient Alignment | Hoang Phan et.al. | 2406.09353 | link |
2024-06-13 | Separations in the Representational Capabilities of Transformers and Recurrent Architectures | Satwik Bhattamishra et.al. | 2406.09347 | null |
2024-06-13 | DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding | Suwon Shon et.al. | 2406.09345 | null |
2024-06-13 | ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models | David Anugraha et.al. | 2406.09334 | link |
2024-06-13 | REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space | Tomer Ashuach et.al. | 2406.09325 | null |
2024-06-13 | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu et.al. | 2406.09324 | link |
2024-06-13 | JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models | Delong Ran et.al. | 2406.09321 | link |
2024-06-13 | Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases | Meng Wang et.al. | 2406.09317 | link |
2024-06-12 | What If We Recaption Billions of Web Images with LLaMA-3? | Xianhang Li et.al. | 2406.08478 | null |
2024-06-12 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | Ting-Ji Huang et.al. | 2406.08477 | null |
2024-06-12 | Real2Code: Reconstruct Articulated Objects via Code Generation | Zhao Mandi et.al. | 2406.08474 | null |
2024-06-12 | PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen et.al. | 2406.08469 | null |
2024-06-12 | Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Zhangchen Xu et.al. | 2406.08464 | link |
2024-06-12 | AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind | Wei Ding et.al. | 2406.08455 | null |
2024-06-12 | OLMES: A Standard for Language Model Evaluations | Yuling Gu et.al. | 2406.08446 | null |
2024-06-12 | SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models | Chun Yin et.al. | 2406.08445 | null |
2024-06-12 | TasTe: Teaching Large Language Models to Translate through Self-Reflection | Yutong Wang et.al. | 2406.08434 | link |
2024-06-12 | Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | Zijin Hong et.al. | 2406.08426 | null |
2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
2024-06-12 | Discovering Preference Optimization Algorithms with and for Large Language Models | Chris Lu et.al. | 2406.08414 | link |
2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et.al. | 2406.08413 | null |
2024-06-13 | MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Xuehai He et.al. | 2406.08407 | link |
2024-06-12 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Chun-Yi Kuan et.al. | 2406.08402 | link |
2024-06-12 | cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers | Anirudh Sundar et.al. | 2406.08398 | null |
2024-06-12 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394 | link |
2024-06-12 | Large Language Models Must Be Taught to Know What They Don't Know | Sanyam Kapoor et.al. | 2406.08391 | link |
2024-06-12 | Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour | Xiao Zhan et.al. | 2406.08386 | null |
2024-06-13 | APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation | Weizhao He et.al. | 2406.08372 | null |
2024-06-11 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et.al. | 2406.07549 | link |
2024-06-11 | Image and Video Tokenization with Binary Spherical Quantization | Yue Zhao et.al. | 2406.07548 | link |
2024-06-11 | Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Aidar Myrzakhan et.al. | 2406.07545 | link |
2024-06-11 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li et.al. | 2406.07528 | link |
2024-06-11 | Simple and Effective Masked Diffusion Language Models | Subham Sekhar Sahoo et.al. | 2406.07524 | link |
2024-06-11 | Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | Liliang Ren et.al. | 2406.07522 | link |
2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515 | null |
2024-06-11 | THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report | KBTG Labs et.al. | 2406.07505 | null |
2024-06-11 | Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Renjie Pi et.al. | 2406.07502 | link |
2024-06-11 | TextGrad: Automatic "Differentiation" via Text | Mert Yuksekgonul et.al. | 2406.07496 | link |
2024-06-11 | CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization | Frederic Kirstein et.al. | 2406.07494 | null |
2024-06-11 | Paraphrasing in Affirmative Terms Improves Negation Understanding | MohammadHossein Rezaei et.al. | 2406.07492 | null |
2024-06-11 | PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction | Adnan Abbas et.al. | 2406.07485 | null |
2024-06-11 | Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | Mao Li et.al. | 2406.07483 | null |
2024-06-11 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
2024-06-11 | Anomaly Detection on Unstable Logs with GPT Models | Fatemeh Hadadi et.al. | 2406.07467 | null |
2024-06-11 | Estimating the Hallucination Rate of Generative AI | Andrew Jesson et.al. | 2406.07457 | null |
2024-06-11 | Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang et.al. | 2406.07455 | null |
2024-06-11 | On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations | Shiao Meng et.al. | 2406.07444 | link |
2024-06-11 | McEval: Massively Multilingual Code Evaluation | Linzheng Chai et.al. | 2406.07436 | null |
2024-06-10 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Peize Sun et.al. | 2406.06525 | link |
2024-06-10 | UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Shivani Upadhyay et.al. | 2406.06519 | link |
2024-06-10 | Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Louis Blankemeier et.al. | 2406.06512 | null |
2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
2024-06-10 | Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation | Oishi Banerjee et.al. | 2406.06496 | null |
2024-06-10 | Can Language Models Serve as Text-Based World Simulators? | Ruoyao Wang et.al. | 2406.06485 | null |
2024-06-10 | Parallelizing Linear Transformers with the Delta Rule over Sequence Length | Songlin Yang et.al. | 2406.06484 | link |
2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474 | null |
2024-06-10 | AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | Zhen Xing et.al. | 2406.06465 | null |
2024-06-10 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464 | null |
2024-06-10 | VCR: Visual Caption Restoration | Tianyu Zhang et.al. | 2406.06462 | link |
2024-06-11 | Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies | Junlin Wang et.al. | 2406.06461 | null |
2024-06-10 | Evaluating the Retrieval Component in LLM-Based Question Answering Systems | Ashkan Alinejad et.al. | 2406.06458 | null |
2024-06-10 | A Large Language Model Pipeline for Breast Cancer Oncology | Tristen Pool et.al. | 2406.06455 | null |
2024-06-10 | Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course | Aadarsh Padiyath et.al. | 2406.06451 | null |
2024-06-10 | LLM Dataset Inference: Did you train on my dataset? | Pratyush Maini et.al. | 2406.06443 | link |
2024-06-10 | Interpretability of Language Models via Task Spaces | Lucas Weber et.al. | 2406.06441 | null |
2024-06-10 | Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain | Brian Hu et.al. | 2406.06435 | link |
2024-06-10 | Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | Gabriel Rioux et.al. | 2406.06425 | null |
2024-06-10 | An Empirical Design Justice Approach to Identifying Ethical Considerations in the Intersection of Large Language Models and Social Robotics | Alva Markelius et.al. | 2406.06400 | null |
2024-06-07 | 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | Jianing Yang et.al. | 2406.05132 | link |
2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | link |
2024-06-07 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu et.al. | 2406.05127 | null |
2024-06-07 | Large Generative Graph Models | Yu Wang et.al. | 2406.05109 | null |
2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
2024-06-07 | Corpus Poisoning via Approximate Greedy Gradient Descent | Jinyan Su et.al. | 2406.05087 | link |
2024-06-07 | Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Maciej Besta et.al. | 2406.05085 | link |
2024-06-07 | SUMIE: A Synthetic Benchmark for Incremental Entity Summarization | Eunjeong Hwang et.al. | 2406.05079 | null |
2024-06-07 | Are Large Language Models More Empathetic than Humans? | Anuradha Welivita et.al. | 2406.05063 | null |
2024-06-07 | Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Shi-Yu Tian et.al. | 2406.05055 | null |
2024-06-07 | Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Nachiket Kotalwar et.al. | 2406.05053 | null |
2024-06-07 | Bootstrapping Referring Multi-Object Tracking | Yani Zhang et.al. | 2406.05039 | link |
2024-06-07 | Scenarios and Approaches for Situated Natural Language Explanations | Pengshuo Qiu et.al. | 2406.05035 | null |
2024-06-07 | CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search | Fengran Mo et.al. | 2406.05013 | link |
2024-06-07 | Compositional Generalization with Grounded Language Models | Sondre Wold et.al. | 2406.04989 | link |
2024-06-07 | Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences | Patrick Haller et.al. | 2406.04988 | link |
2024-06-07 | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jitai Hao et.al. | 2406.04984 | link |
2024-06-07 | CityCraft: A Real Crafter for 3D City Generation | Jie Deng et.al. | 2406.04983 | null |
2024-06-07 | Quantifying Geospatial in the Common Crawl Corpus | Ilya Ilyankou et.al. | 2406.04952 | null |
2024-06-07 | BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense | Baktash Ansari et.al. | 2406.04947 | link |
2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344 | null |
2024-06-06 | Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Stanislaw Szymanowicz et.al. | 2406.04343 | link |
2024-06-06 | Learning 1D Causal Visual Representation with De-focus Attention Networks | Chenxin Tao et.al. | 2406.04342 | link |
2024-06-06 | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | Jiaming Liu et.al. | 2406.04339 | null |
2024-06-06 | Coherent Zero-Shot Visual Instruction Generation | Quynh Phung et.al. | 2406.04337 | null |
2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
2024-06-06 | PaCE: Parsimonious Concept Engineering for Large Language Models | Jinqi Luo et.al. | 2406.04331 | link |
2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330 | link |
2024-06-06 | Simplified and Generalized Masked Diffusion for Discrete Data | Jiaxin Shi et.al. | 2406.04329 | null |
2024-06-06 | Causal Estimation of Memorisation Profiles | Pietro Lesci et.al. | 2406.04327 | link |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325 | null |
2024-06-06 | Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step | Zhanhao Liang et.al. | 2406.04314 | link |
2024-06-06 | Improving Alignment and Robustness with Short Circuiting | Andy Zou et.al. | 2406.04313 | link |
2024-06-06 | Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Lukas Aichberger et.al. | 2406.04306 | link |
2024-06-06 | Quixer: A Quantum Transformer Model | Nikhil Khatri et.al. | 2406.04305 | null |
2024-06-06 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen et.al. | 2406.04300 | null |
2024-06-06 | VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval | Junjie Zhou et.al. | 2406.04292 | link |
2024-06-06 | Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | Adam Fisch et.al. | 2406.04291 | null |
2024-06-07 | What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | Nadav Borenstein et.al. | 2406.04289 | null |
2024-06-06 | Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People | Dun-Ming Huang et.al. | 2406.04278 | link |
2024-06-05 | Wings: Learning Multimodal LLMs without Text-only Forgetting | Yi-Kai Zhang et.al. | 2406.03496 | null |
2024-06-06 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Ao Sun et.al. | 2406.03488 | link |
2024-06-05 | Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Sanjana Ramprasad et.al. | 2406.03487 | null |
2024-06-05 | BIPED: Pedagogically Informed Tutoring System for ESL Education | Soonwoo Kwon et.al. | 2406.03486 | null |
2024-06-05 | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney et.al. | 2406.03476 | null |
2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474 | null |
2024-06-05 | What is the Best Way for ChatGPT to Translate Poetry? | Shanshan Wang et.al. | 2406.03450 | null |
2024-06-05 | Pre-trained Large Language Models Use Fourier Features to Compute Addition | Tianyi Zhou et.al. | 2406.03445 | null |
2024-06-05 | Are language models rational? The case of coherence norms and belief revision | Thomas Hofweber et.al. | 2406.03442 | null |
2024-06-05 | Cycles of Thought: Measuring LLM Confidence through Stable Explanations | Evan Becker et.al. | 2406.03441 | null |
2024-06-05 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Moein Heidari et.al. | 2406.03430 | link |
2024-06-05 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | Saehyung Lee et.al. | 2406.03411 | link |
2024-06-05 | Automating Turkish Educational Quiz Generation Using Large Language Models | Kamyar Zeinalipour et.al. | 2406.03397 | link |
2024-06-05 | Log Parsing with Self-Generated In-Context Learning and Self-Correction | Yifan Wu et.al. | 2406.03376 | null |
2024-06-05 | IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | David Ifeoluwa Adelani et.al. | 2406.03368 | null |
2024-06-05 | CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning | Xinrui Lin et.al. | 2406.03367 | null |
2024-06-05 | LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | Timon Ziegenbein et.al. | 2406.03363 | null |
2024-06-05 | Save It for the "Hot" Day: An LLM-Empowered Visual Analytics System for Heat Risk Management | Haobo Li et.al. | 2406.03317 | null |
2024-06-05 | The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games | Mikhail Mozikov et.al. | 2406.03299 | null |
2024-06-05 | SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms | Xingrun Xing et.al. | 2406.03287 | link |
2024-06-04 | Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks | Tianyu He et.al. | 2406.02550 | link |
2024-06-04 | Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Mohamed El Amine Boudjoghra et.al. | 2406.02548 | link |
2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
2024-06-04 | To Believe or Not to Believe Your LLM | Yasin Abbasi Yadkori et.al. | 2406.02543 | null |
2024-06-04 | Loki: Low-Rank Keys for Efficient Sparse Attention | Prajwal Singhania et.al. | 2406.02542 | link |
2024-06-04 | Parrot: Multilingual Visual Instruction Tuning | Hai-Long Sun et.al. | 2406.02539 | link |
2024-06-04 | TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Chengzu Li et.al. | 2406.02537 | link |
2024-06-04 | Mitigate Position Bias in Large Language Models via Scaling a Single Dimension | Yijiong Yu et.al. | 2406.02536 | link |
2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et.al. | 2406.02532 | link |
2024-06-04 | Scalable MatMul-free Language Modeling | Rui-Jie Zhu et.al. | 2406.02528 | link |
2024-06-04 | CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks | Maciej Besta et.al. | 2406.02524 | link |
2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | null |
2024-06-04 | Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Shwai He et.al. | 2406.02500 | link |
2024-06-04 | Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion | Jakub Hoscilowicz et.al. | 2406.02481 | link |
2024-06-04 | Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding | Zhihan Zhang et.al. | 2406.02472 | link |
2024-06-04 | Meta-Designing Quantum Experiments with Language Models | Sören Arlt et.al. | 2406.02470 | null |
2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et.al. | 2406.02430 | link |
2024-06-04 | Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion | Ruiqi Li et.al. | 2406.02429 | null |
2024-06-04 | GrootVL: Tree Topology is All You Need in State Space Model | Yicheng Xiao et.al. | 2406.02395 | link |
2024-06-04 | Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Maxime Griot et.al. | 2406.02394 | link |
2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075 | null |
2024-05-31 | Code Pretraining Improves Entity Tracking Abilities of Language Models | Najoung Kim et.al. | 2405.21068 | null |
2024-05-31 | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Tri Dao et.al. | 2405.21060 | link |
2024-05-31 | RydbergGPT | David Fitzek et.al. | 2405.21052 | link |
2024-05-31 | Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling | Jiatao Gu et.al. | 2405.21048 | null |
2024-05-31 | Grammar-Aligned Decoding | Kanghee Park et.al. | 2405.21047 | null |
2024-05-31 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF | Tengyang Xie et.al. | 2405.21046 | null |
2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
2024-05-31 | Standards for Belief Representations in LLMs | Daniel A. Herrmann et.al. | 2405.21030 | null |
2024-05-31 | LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Elias Stengel-Eskin et.al. | 2405.21028 | link |
2024-05-31 | You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet | Zhen Qin et.al. | 2405.21022 | null |
2024-05-31 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia et.al. | 2405.21018 | link |
2024-06-03 | StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond | Pengyuan Lyu et.al. | 2405.21013 | null |
2024-05-31 | Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models | Yi Yang et.al. | 2405.20991 | link |
2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | link |
2024-05-31 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Feiteng Fang et.al. | 2405.20978 | link |
2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
2024-05-31 | LCQ: Low-Rank Codebook based Quantization for Large Language Models | Wen-Pu Cai et.al. | 2405.20973 | null |
2024-06-03 | Large Language Models are Zero-Shot Next Location Predictors | Ciro Beneduce et.al. | 2405.20962 | link |
2024-06-03 | A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians | Piotr Wojciech Mirowski et.al. | 2405.20956 | null |
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340 | link |
2024-05-30 | Visual Perception by Large Language Model's Weights | Feipeng Ma et.al. | 2405.20339 | link |
2024-05-30 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni et.al. | 2405.20335 | link |
2024-05-31 | ParSEL: Parameterized Shape Editing with Language | Aditya Ganeshan et.al. | 2405.20319 | null |
2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318 | link |
2024-05-30 | ANAH: Analytical Annotation of Hallucinations in Large Language Models | Ziwei Ji et.al. | 2405.20315 | link |
2024-05-30 | Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Guillaume Huguet et.al. | 2405.20313 | null |
2024-05-30 | Large Language Models Can Self-Improve At Web Agent Tasks | Ajay Patel et.al. | 2405.20309 | link |
2024-05-30 | Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | Himangi Mittal et.al. | 2405.20305 | null |
2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
2024-05-30 | Who Writes the Review, Human or AI? | Panagiotis C. Theocharopoulos et.al. | 2405.20285 | null |
2024-05-30 | ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections | Massimo Bini et.al. | 2405.20271 | link |
2024-05-30 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu et.al. | 2405.20253 | link |
2024-05-30 | Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization | Yuchi Liu et.al. | 2405.20252 | link |
2024-05-30 | Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | Franz Louis Cesista et.al. | 2405.20245 | null |
2024-05-30 | Context Injection Attacks on Large Language Models | Cheng'an Wei et.al. | 2405.20234 | null |
2024-05-30 | Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies | Harveen Kaur et.al. | 2405.20217 | null |
2024-05-30 | TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models | Chen Zhang et.al. | 2405.20215 | null |
2024-05-30 | One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments | Ke Yi et.al. | 2405.20202 | null |
2024-05-31 | Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations | Zilin Ma et.al. | 2405.20195 | null |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335 | null |
2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328 | null |
2024-05-29 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327 | link |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
2024-05-29 | Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Minghan Li et.al. | 2405.19325 | null |
2024-05-29 | Are Large Language Models Chameleons? | Mingmeng Geng et.al. | 2405.19323 | null |
2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
2024-05-29 | Robust Preference Optimization through Reward Model Distillation | Adam Fisch et.al. | 2405.19316 | null |
2024-05-29 | Matryoshka Query Transformer for Large Vision-Language Models | Wenbo Hu et.al. | 2405.19315 | link |
2024-05-29 | Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice | Jian-Qiao Zhu et.al. | 2405.19313 | null |
2024-05-29 | Expert-Guided Extinction of Toxic Tokens for Debiased Generation | Xueyao Sun et.al. | 2405.19299 | null |
2024-05-29 | MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection | Michael Regan et.al. | 2405.19285 | null |
2024-05-29 | Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform | Viviane Potocnik et.al. | 2405.19284 | null |
2024-05-29 | Programmable Motion Generation for Open-Set Motion Control Tasks | Hanchao Liu et.al. | 2405.19283 | null |
2024-05-29 | PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications | Dingkang Yang et.al. | 2405.19266 | link |
2024-05-29 | AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data | Zifan Song et.al. | 2405.19265 | link |
2024-05-29 | Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Zhanhui Zhou et.al. | 2405.19262 | link |
2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415 | link |
2024-05-28 | Don't Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414 | null |
2024-05-28 | WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | Jiawei Ma et.al. | 2405.18405 | null |
2024-05-29 | Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Ethan Shen et.al. | 2405.18400 | link |
2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386 | link |
2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380 | link |
2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377 | null |
2024-05-28 | Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning | Dongjie Chen et.al. | 2405.18376 | link |
2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375 | link |
2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369 | null |
2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
2024-05-28 | Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs | Somnath Kumar et.al. | 2405.18359 | null |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358 | null |
2024-05-28 | Faithful Logical Reasoning via Symbolic Chain-of-Thought | Jundong Xu et.al. | 2405.18357 | link |
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356 | link |
2024-05-28 | Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | Anjanava Biswas et.al. | 2405.18346 | null |
2024-05-28 | The Battle of LLMs: A Comparative Study in Conversational QA Tasks | Aryan Rangapur et.al. | 2405.18344 | null |
2024-05-28 | Frustratingly Easy Test-Time Adaptation of Vision-Language Models | Matteo Farina et.al. | 2405.18330 | link |
2024-05-28 | Multi-modal Generation via Cross-Modal In-Context Learning | Amandeep Kumar et.al. | 2405.18304 | link |
2024-05-28 | Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning | Renzhi Wang et.al. | 2405.18292 | null |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428 | null |
2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427 | link |
2024-05-27 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Zhuoling Li et.al. | 2405.17424 | null |
2024-05-27 | Privacy-Aware Visual Language Models | Laurens Samson et.al. | 2405.17423 | null |
2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418 | null |
2024-05-27 | THREAD: Thinking Deeper with Recursive Spawning | Philip Schroeder et.al. | 2405.17402 | link |
2024-05-27 | The Expressive Capacity of State Space Models: A Formal Language Perspective | Yash Sarrof et.al. | 2405.17394 | null |
2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386 | link |
2024-05-27 | Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective | Zhen Qin et.al. | 2405.17383 | null |
2024-05-27 | ReMoDetect: Reward Models Recognize Aligned LLM's Generations | Hyunseok Lee et.al. | 2405.17382 | link |
2024-05-27 | Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | Zhen Qin et.al. | 2405.17381 | link |
2024-05-27 | RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects | Ahmed Allam et.al. | 2405.17378 | link |
2024-05-28 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng et.al. | 2405.17374 | link |
2024-05-27 | Prompt Optimization with Human Feedback | Xiaoqiang Lin et.al. | 2405.17346 | link |
2024-05-27 | Exploring and steering the moral compass of Large Language Models | Alejandro Tlaie et.al. | 2405.17345 | link |
2024-05-27 | Cost-efficient Knowledge-based Question Answering with Large Language Models | Junnan Dong et.al. | 2405.17337 | null |
2024-05-27 | XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser | Xianfu Cheng et.al. | 2405.17336 | link |
2024-05-27 | FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation | Yuting Ma et.al. | 2405.17267 | null |
2024-05-27 | On the Noise Robustness of In-Context Learning for Text Generation | Hongfu Gao et.al. | 2405.17264 | link |
2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765 | null |
2024-05-24 | Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence | Abhinav Patil et.al. | 2405.15750 | link |
2024-05-24 | Sparse maximal update parameterization: A holistic approach to sparse training dynamics | Nolan Dey et.al. | 2405.15743 | link |
2024-05-24 | Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias | Andres Algaba et.al. | 2405.15739 | link |
2024-05-24 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng et.al. | 2405.15734 | link |
2024-05-24 | Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks | Jerome Sieber et.al. | 2405.15731 | link |
2024-05-24 | Optimizing Large Language Models for OpenAPI Code Completion | Bohdan Petryshyn et.al. | 2405.15729 | link |
2024-05-24 | Disease-informed Adaptation of Vision-Language Models | Jiajin Zhang et.al. | 2405.15728 | **[link](https://github.com/rpidial/disease-informe |