📚 Daily Paper Review - 2026-05-28
Found 10 relevant papers today. Please review and approve/reject.
1. FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
Score: 5.7/10 | arXiv: 2605.27178v1
Authors: Zihui Zhang, Zhixuan Sun, Yafei Yang...
Relevance:
- 🎯 Field Match: 1.19/10 - Matches: self-supervised, segmentation
- 🏆 Venue: ICML (10/10)
- 💻 Code: ✅ Available
AI Summary:
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery...
Key Contributions:
- We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training.
- Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process.
- In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules.
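The third contribution describes an agent that grows an object by repeatedly merging a neighboring superpoint when the merge is rewarded. The paper's agent is learned, and the summary does not specify its semantic and geometric reward modules, so the following is only a greedy, rule-based sketch of such a merge loop. It assumes that each superpoint carries a feature vector (standing in for foundation-model features) and a 3D centroid, and it uses a hypothetical reward: feature cosine similarity minus a weighted centroid distance.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Vec = Tuple[float, ...]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: List[Vec]) -> Vec:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def grow_object(seed: int,
                features: Dict[int, Vec],
                centroids: Dict[int, Vec],
                neighbors: Dict[int, Set[int]],
                dist_weight: float = 0.1,
                threshold: float = 0.5) -> Set[int]:
    """Greedily add the best-scoring adjacent superpoint until none clears `threshold`."""
    obj = {seed}
    while True:
        obj_feat = mean([features[i] for i in obj])
        obj_center = mean([centroids[i] for i in obj])
        # Frontier: superpoints adjacent to the current object but not yet part of it.
        frontier = set().union(*(neighbors[i] for i in obj)) - obj
        best: Optional[int] = None
        best_reward = threshold
        for cand in sorted(frontier):
            semantic = cosine(obj_feat, features[cand])  # hypothetical semantic reward
            geometric = -dist_weight * math.dist(obj_center, centroids[cand])  # hypothetical geometric penalty
            reward = semantic + geometric
            if reward > best_reward:
                best, best_reward = cand, reward
        if best is None:
            return obj
        obj.add(best)
```

With a toy scene of four superpoints where superpoint 2 has very different features, starting from seed 0 merges superpoints 1 and 3 and stops before 2. The threshold plays the role of a "stop" action. A learned agent would instead decide when to stop from its own policy.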
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
2. MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale
Score: 4.8/10 | arXiv: 2605.27235v1
Authors: Zhicong Tang, Zhao Zhang, Jingye Chen...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image ge...
Key Contributions:
- Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language.
- Despite its importance, this remains an underexplored area at scale.
- To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image generation and editing, trained on over 10M multilingual design samples spanning diverse aspect ratios and textual prompts.
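MRT targets multi-layer transparent images, that is, stacks of RGBA layers. The summary does not say how MRT flattens its layers, but the standard way to composite such a stack into one image is the Porter-Duff "over" operator applied back to front. The sketch below shows that operation for a single pixel. It assumes straight (non-premultiplied) alpha with channels in [0, 1], and it illustrates only the output format, not the model.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]  # straight (non-premultiplied) alpha, channels in [0, 1]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Porter-Duff 'over': composite `top` onto `bottom`."""
    rt, gt, bt, at = top
    rb, gb, bb, ab = bottom
    a_out = at + ab * (1.0 - at)
    if a_out == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # both layers fully transparent

    def channel(ct: float, cb: float) -> float:
        # Alpha-weighted blend, renormalized back to straight alpha.
        return (ct * at + cb * ab * (1.0 - at)) / a_out

    return (channel(rt, rb), channel(gt, gb), channel(bt, bb), a_out)


def flatten(layers: List[RGBA]) -> RGBA:
    """Composite one pixel's layer stack, ordered bottom (index 0) to top."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)
    for layer in layers:
        result = over(layer, result)
    return result
```

For example, a half-transparent blue layer over an opaque red one flattens to (0.5, 0.0, 0.5, 1.0). Applying `flatten` per pixel extends this to whole images. Keeping the layers separate until this final step is what makes layer-wise reuse and editing possible.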
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
3. Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation
Score: 4.8/10 | arXiv: 2605.27134v1
Authors: Heng Qu, Yike Liu, Renren Jin...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-s...
Key Contributions:
- Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation.
- This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain.
- To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-source toolkit for unified benchmarking of VLMs on offline GUI navigation tasks.
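The summary does not list GUIEvalKit's metrics. Offline GUI navigation benchmarks often report step-level accuracy and task success, where a task counts as solved only if every step is correct. The sketch below computes both. It assumes a hypothetical `Action` type, exact-match scoring, and one prediction per gold step, and it is not GUIEvalKit's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    kind: str                     # hypothetical action type, e.g. "tap", "type", "back"
    target: Optional[str] = None  # hypothetical argument, e.g. an element id or typed text


Episode = Tuple[List[Action], List[Action]]  # (predicted actions, gold actions)


def evaluate(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (step accuracy, task success rate) over offline episodes."""
    correct_steps = total_steps = successes = 0
    for predicted, gold in episodes:
        if len(predicted) != len(gold):
            raise ValueError("offline evaluation expects one prediction per gold step")
        matches = [p == g for p, g in zip(predicted, gold)]
        correct_steps += sum(matches)
        total_steps += len(gold)
        # An episode counts as solved only if every step matches exactly.
        successes += all(matches)
    step_acc = correct_steps / total_steps if total_steps else 0.0
    success_rate = successes / len(episodes) if episodes else 0.0
    return step_acc, success_rate
```

The two numbers can diverge sharply. An agent that gets 3 of 4 steps right across two tasks scores 0.75 step accuracy but solves only one task (0.5 success). This is one reason a benchmark toolkit would report both.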
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
4. Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Score: 4.8/10 | arXiv: 2605.27355v1
Authors: Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ✅ Available
AI Summary:
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors. This arises from core limitations of RLHF: (1) preference datasets are const...
Key Contributions:
- Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences.
- In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors.
- This arises from core limitations of RLHF: (1) preference datasets are constructed from the LLM's own outputs, allowing it to influence them, and (2) pairwise comparisons only indicate which response is better, not why.
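The second limitation, that pairwise comparisons only indicate which response is better, is visible in the Bradley-Terry objective commonly used to train RLHF reward models. The loss depends only on the score difference between the chosen and rejected responses. The summary does not say which objective the paper's experiments use, so treat this as a generic illustration.

```python
import math


def bt_preference_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response wins: sigmoid(r_chosen - r_rejected)."""
    d = r_chosen - r_rejected
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # avoids overflow for large negative d
    return e / (1.0 + e)


def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood -log sigmoid(d), computed stably as softplus(-d)."""
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

For instance, `bt_loss(2.0, 1.0)` equals `bt_loss(5.0, 4.0)`: the training signal is a single ordering bit per pair, with nothing in the label explaining why one response was preferred.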
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
5. DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
Score: 4.5/10 | arXiv: 2605.27130v1
Authors: John Donaghy, Shikhar Rastogi
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct crea...
Key Contributions:
- We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations.
- Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty.
- Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population.
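The contributions describe Quality-Diversity search where each node mutates with a different model and nodes share their local best solutions at the end of each round to seed the next. The single-process sketch below illustrates only that round structure, under heavy assumptions. It uses a MAP-Elites archive keyed by a toy behavior cell. The LLM mutation operators are replaced by hypothetical random mutators with different biases. The non-blocking collective communication is replaced by a plain loop that merges archives between rounds. It is not DEI's implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = Tuple[float, float]
Cell = Tuple[int, int]
Archive = Dict[Cell, Tuple[float, Genome]]  # behavior cell -> (fitness, elite genome)
Mutator = Callable[[Genome, random.Random], Genome]


def fitness(g: Genome) -> float:
    """Toy objective (an assumption): higher is better, peak at (0.3, 0.7)."""
    return -((g[0] - 0.3) ** 2 + (g[1] - 0.7) ** 2)


def cell(g: Genome, bins: int = 5) -> Cell:
    """Toy behavior descriptor: discretize the genome itself on a bins x bins grid."""
    return (min(int(g[0] * bins), bins - 1), min(int(g[1] * bins), bins - 1))


def insert(archive: Archive, g: Genome) -> None:
    """MAP-Elites rule: keep g if its cell is empty or g beats the current elite."""
    f, c = fitness(g), cell(g)
    if c not in archive or f > archive[c][0]:
        archive[c] = (f, g)


def clip(v: float) -> float:
    return min(max(v, 0.0), 1.0)


# Hypothetical stand-ins for heterogeneous LLM mutation operators, each with a different bias.
def small_steps(g: Genome, rng: random.Random) -> Genome:
    return (clip(g[0] + rng.gauss(0, 0.05)), clip(g[1] + rng.gauss(0, 0.05)))


def big_jumps(g: Genome, rng: random.Random) -> Genome:
    return (clip(g[0] + rng.uniform(-0.5, 0.5)), clip(g[1] + rng.uniform(-0.5, 0.5)))


def swap_axes(g: Genome, rng: random.Random) -> Genome:
    return (clip(g[1] + rng.gauss(0, 0.1)), clip(g[0] + rng.gauss(0, 0.1)))


def run(operators: List[Mutator], rounds: int = 5, steps: int = 50, seed: int = 0) -> Archive:
    """Each 'node' evolves its own archive with one operator; elites are merged after every round."""
    rngs = [random.Random(seed + i) for i in range(len(operators))]
    shared: Archive = {}
    insert(shared, (0.5, 0.5))
    for _ in range(rounds):
        local_archives = []
        for op, rng in zip(operators, rngs):
            archive = dict(shared)  # seed this node's population from the shared elites
            for _ in range(steps):
                _, parent = rng.choice(list(archive.values()))
                insert(archive, op(parent, rng))
            local_archives.append(archive)
        # Stand-in for the end-of-round exchange between peer nodes.
        for archive in local_archives:
            for _, g in archive.values():
                insert(shared, g)
    return shared
```

Calling `run([small_steps, big_jumps, swap_axes])` returns an archive whose cells each hold the best genome any node found for that behavior. Because elites are only ever replaced by fitter genomes, the merged archive never gets worse from one round to the next.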
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
6. Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning
Score: 4.4/10 | arXiv: 2605.27080v1
Authors: Qida Tan, Hongyu Yang, Wenchao Du
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity. Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts. In this work, we devise a simple yet effective semi-supervised learning architectur...
Key Contributions:
- Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity.
- Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts.
- In this work, we devise a simple yet effective semi-supervised learning architecture that leverages unlabeled data to enhance domain generalization, thereby reducing reliance on labor-intensive manual annotations.
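The title points to contrastive learning as the way unlabeled data is used. The summary does not describe the disentangled-subspace loss itself, so as a generic reference point only, here is the standard InfoNCE loss. It treats picking the positive among the positive plus negatives as a classification over temperature-scaled cosine similarities. The vectors and temperature below are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Cross-entropy of selecting the positive among [positive] + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]
```

The loss is near zero when the anchor matches its positive and is orthogonal to the negatives. It rises toward 1/temperature when the roles are swapped. In a semi-supervised setup, pairs like these can be built from unlabeled images (for example, two views of the same face), so no gaze labels are needed for this term.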
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
7. ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference
Score: 4.3/10 | arXiv: 2605.27081v1
Authors: Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overh...
Key Contributions:
- Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity.
- However, in memory-constrained inference scenarios, only a small set of experts can be cached.
- Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overhead.
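The summary describes the bottleneck: only a few experts fit in memory, and every miss triggers a load from slow storage. A small simulation makes the effect of expert reuse concrete. The sketch assumes an LRU eviction policy and a hand-written routing trace. The summary does not say which eviction policy ReMoE targets, and this does not model the router fine-tuning itself.

```python
from collections import OrderedDict
from typing import Iterable, List


def count_fetches(routing_trace: Iterable[List[int]], capacity: int) -> int:
    """Count expert loads from external storage under an LRU cache holding `capacity` experts."""
    cache = OrderedDict()  # expert id -> None, ordered from least to most recently used
    fetches = 0
    for experts in routing_trace:  # experts activated for one token
        for e in experts:
            if e in cache:
                cache.move_to_end(e)  # hit: mark as most recently used
            else:
                fetches += 1  # miss: load from slow storage
                cache[e] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used expert
    return fetches
```

With a two-expert cache, a trace that alternates between expert pairs {0, 1} and {2, 3} misses on every access (8 fetches for 4 tokens). A trace that keeps reusing {0, 1} for three tokens before switching needs only 4. Steering the router toward already-cached experts aims to produce traces of the second kind.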
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
8. PlayClass: Automated Play Behaviour Classification in Poultry
Score: 4.2/10 | arXiv: 2605.27304v1
Authors: Prince Ravi Leow, Neil Scheidwasser, Rebecca Oscarsson...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored. To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video. The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-base...
Key Contributions:
- Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored.
- To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video.
- The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-based prompting, and frozen embeddings from image and video foundation models for play action classification.
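Classifying from frozen embeddings means the foundation model is never trained. Only a light classifier sits on top of precomputed feature vectors. The summary does not say which classifier head PlayClass uses, so the sketch below swaps in the simplest such head, a nearest-centroid classifier. It assumes embeddings are already extracted, and the labels "play" and "rest" are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def fit_centroids(embeddings: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Average the frozen embeddings of each class; the backbone itself is never updated."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))
```

Because the embeddings are fixed, comparing image and video foundation models reduces to re-running this cheap fitting step on each model's features.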
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
9. Feedforward 3D Editing Learns from Semantic-Part Transformation
Score: 4.2/10 | arXiv: 2605.27351v1
Authors: Jiawei Weng, Saining Zhang, Zhenxin Diao...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: None (5.0/10)
- 💻 Code: ✅ Available
AI Summary:
3D editing is a fundamental capability for scalable 3D content creation. While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines. A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision. Editable 3D assets require simultaneous preservation of geometry...
Key Contributions:
- 3D editing is a fundamental capability for scalable 3D content creation.
- While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines.
- A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
10. Chaos-SSL: An Attention-Based Self-Supervised Learning Framework with Chaotic Transformation for Medical Image Classification
Score: 4.2/10 | arXiv: 2605.27146v1
Authors: Joao Batista Florindo
Relevance:
- 🎯 Field Match: 1.86/10 - Matches: medical image, self-supervised, image analysis
- 🏆 Venue: None (5.0/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis. However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies. This paper introduces Chaos-SSL, a n...
Key Contributions:
- Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis.
- However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies.
- This paper introduces Chaos-SSL, a novel two-stage framework for medical image classification.
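The title names a chaotic transformation, but the summary is cut off before describing it. To illustrate the general idea only, the logistic map x_{n+1} = r * x_n * (1 - x_n), which is in its chaotic regime for r close to 4 (r = 3.99 is a common choice), can generate a deterministic but highly seed-sensitive sequence for perturbing pixel intensities. Which map Chaos-SSL uses, and how it applies the map, are assumptions here.

```python
from typing import List


def logistic_sequence(x0: float, n: int, r: float = 3.99, burn_in: int = 100) -> List[float]:
    """Iterate the logistic map, discarding `burn_in` transient steps."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie in (0, 1)")
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    seq = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_perturb(pixels: List[float], x0: float, strength: float = 0.1) -> List[float]:
    """Shift each intensity in [0, 1] by a chaos-driven offset in [-strength, strength], then clip."""
    noise = logistic_sequence(x0, len(pixels))
    return [min(max(p + strength * (2.0 * c - 1.0), 0.0), 1.0) for p, c in zip(pixels, noise)]
```

The same seed always reproduces the same perturbation, while a seed differing by 1e-7 yields a different one. That property lets such an augmentation produce many distinct texture-level views without the large geometric or color changes the summary says standard SSL augmentations rely on.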
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label "approved" and comment "approve"
- ❌ Reject: Add label "rejected" and comment "reject"
- ⭐ Important: Add label "starred"
How to Review
- Read the summaries above
- Check paper links for more details
- Add labels to indicate your decision:
  - approved: add to collection
  - rejected: skip this paper
  - starred: mark as particularly important
- Comment "approve" or "reject" to trigger automation
Note: Papers with the "approved" label will be added to the collection automatically.
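The review rules combine two signals: a trigger comment starts the automation, and the labels decide the outcome. A minimal sketch of that decision logic for one paper follows. The function name, the input shapes (a set of label names and a list of comment strings), and the case-insensitive comment matching are all hypothetical, since the actual automation is not described here.

```python
from typing import Dict, Iterable, Set

TRIGGER_COMMENTS = {"approve", "reject"}


def triage(labels: Set[str], comments: Iterable[str]) -> Dict[str, bool]:
    """Decide what the (hypothetical) automation should do for one paper."""
    return {
        # A comment of exactly "approve" or "reject" (case and surrounding spaces ignored) triggers a run.
        "run": any(c.strip().lower() in TRIGGER_COMMENTS for c in comments),
        # Only the "approved" label leads to the paper being added to the collection.
        "collect": "approved" in labels,
        "starred": "starred" in labels,
    }
```

For example, a paper labeled "approved" and "starred" with the comment "approve" yields `{"run": True, "collect": True, "starred": True}`. An ordinary comment such as "looks good" does not trigger a run.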