📚 Paper Review - 2026-05-28

# 📚 Daily Paper Review - 2026-05-28

Found **10** relevant papers today. Please review and approve/reject.

---

## 1. FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

**Score:** `5.7/10` | **arXiv:** [2605.27178v1](http://arxiv.org/abs/2605.27178v1)

**Authors:** Zihui Zhang, Zhixuan Sun, Yafei Yang...

**Relevance:**
- 🎯 Field Match: 1.19/10 - Matches: self-supervised, segmentation
- 🏆 Venue: ICML (10/10)
- 💻 Code: ✅ Available

**AI Summary:**
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery...

**Key Contributions:**
- We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training.
- Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process.
- In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27178v1) | [📥 PDF](https://arxiv.org/pdf/2605.27178v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 2. MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale

**Score:** `4.8/10` | **arXiv:** [2605.27235v1](http://arxiv.org/abs/2605.27235v1)

**Authors:** Zhicong Tang, Zhao Zhang, Jingye Chen...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image ge...

**Key Contributions:**
- Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language.
- Despite its importance, this remains an underexplored area at scale.
- To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image generation and editing, trained on over 10M multilingual design samples spanning diverse aspect ratios and textual prompts.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27235v1) | [📥 PDF](https://arxiv.org/pdf/2605.27235v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 3. Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

**Score:** `4.8/10` | **arXiv:** [2605.27134v1](http://arxiv.org/abs/2605.27134v1)

**Authors:** Heng Qu, Yike Liu, Renren Jin...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-s...

**Key Contributions:**
- Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation.
- This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain.
- To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-source toolkit for unified benchmarking of VLMs on offline GUI navigation tasks.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27134v1) | [📥 PDF](https://arxiv.org/pdf/2605.27134v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 4. Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

**Score:** `4.8/10` | **arXiv:** [2605.27355v1](http://arxiv.org/abs/2605.27355v1)

**Authors:** Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ✅ Available

**AI Summary:**
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors. This arises from core limitations of RLHF: (1) preference datasets are const...

**Key Contributions:**
- Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences.
- In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors.
- This arises from core limitations of RLHF: (1) preference datasets are constructed from the LLM's own outputs, allowing it to influence them, and (2) pairwise comparisons only indicate which response is better, not why.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27355v1) | [📥 PDF](https://arxiv.org/pdf/2605.27355v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 5. DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

**Score:** `4.5/10` | **arXiv:** [2605.27130v1](http://arxiv.org/abs/2605.27130v1)

**Authors:** John Donaghy, Shikhar Rastogi

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct crea...

**Key Contributions:**
- We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations.
- Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty.
- Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27130v1) | [📥 PDF](https://arxiv.org/pdf/2605.27130v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 6. Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning

**Score:** `4.4/10` | **arXiv:** [2605.27080v1](http://arxiv.org/abs/2605.27080v1)

**Authors:** Qida Tan, Hongyu Yang, Wenchao Du

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity. Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts. In this work, we devise a simple yet effective semi-supervised learning architectur...

**Key Contributions:**
- Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity.
- Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts.
- In this work, we devise a simple yet effective semi-supervised learning architecture that leverages unlabeled data to enhance domain generalization, thereby reducing reliance on labor-intensive manual annotations.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27080v1) | [📥 PDF](https://arxiv.org/pdf/2605.27080v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 7. ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

**Score:** `4.3/10` | **arXiv:** [2605.27081v1](http://arxiv.org/abs/2605.27081v1)

**Authors:** Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overh...

**Key Contributions:**
- Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity.
- However, in memory-constrained inference scenarios, only a small set of experts can be cached.
- Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overhead...

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27081v1) | [📥 PDF](https://arxiv.org/pdf/2605.27081v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 8. PlayClass: Automated Play Behaviour Classification in Poultry

**Score:** `4.2/10` | **arXiv:** [2605.27304v1](http://arxiv.org/abs/2605.27304v1)

**Authors:** Prince Ravi Leow, Neil Scheidwasser, Rebecca Oscarsson...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored. To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video. The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-base...

**Key Contributions:**
- Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored.
- To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video.
- The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-based prompting, and frozen embeddings from image and video foundation models for play action classification.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27304v1) | [📥 PDF](https://arxiv.org/pdf/2605.27304v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 9. Feedforward 3D Editing Learns from Semantic-Part Transformation

**Score:** `4.2/10` | **arXiv:** [2605.27351v1](http://arxiv.org/abs/2605.27351v1)

**Authors:** Jiawei Weng, Saining Zhang, Zhenxin Diao...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: None (5.0/10)
- 💻 Code: ✅ Available

**AI Summary:**
3D editing is a fundamental capability for scalable 3D content creation. While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines. A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision. Editable 3D assets require simultaneous preservation of geometry...

**Key Contributions:**
- 3D editing is a fundamental capability for scalable 3D content creation.
- While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines.
- A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27351v1) | [📥 PDF](https://arxiv.org/pdf/2605.27351v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 10. Chaos-SSL: An Attention-Based Self-Supervised Learning Framework with Chaotic Transformation for Medical Image Classification

**Score:** `4.2/10` | **arXiv:** [2605.27146v1](http://arxiv.org/abs/2605.27146v1)

**Authors:** Joao Batista Florindo

**Relevance:**
- 🎯 Field Match: 1.86/10 - Matches: medical image, self-supervised, image analysis
- 🏆 Venue: None (5.0/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis. However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies. This paper introduces Chaos-SSL, a n...

**Key Contributions:**
- Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis.
- However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies.
- This paper introduces Chaos-SSL, a novel two-stage framework for medical image classification.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.27146v1) | [📥 PDF](https://arxiv.org/pdf/2605.27146v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---


## How to Review

1. Read the summaries above
2. Check paper links for more details
3. Add labels to indicate your decision:
   - `approved` - Add to collection
   - `rejected` - Skip this paper
   - `starred` - Mark as particularly important
4. Comment "approve" or "reject" to trigger automation

**Note:** Papers with the `approved` label are automatically added to the collection.
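
The label-and-comment workflow above can be sketched as a small decision function. This is only an illustrative sketch, not the automation that actually runs on this repository: the function names `resolve_decision` and `is_starred` are hypothetical, and the rule that a label and a matching comment must both be present is an assumption drawn from the "Actions" wording, not confirmed behavior.

```python
from typing import Iterable, Optional

# Label and comment keywords are taken from the review instructions above.
# Everything else (function names, the agreement rule) is hypothetical.
DECISION_LABELS = {"approved": "approve", "rejected": "reject"}
DECISION_COMMENTS = {"approve", "reject"}


def resolve_decision(labels: Iterable[str], comments: Iterable[str]) -> Optional[str]:
    """Return "approve", "reject", or None if no consistent decision is present."""
    label_decisions = {DECISION_LABELS[name] for name in labels if name in DECISION_LABELS}
    comment_decisions = {text.strip().lower() for text in comments} & DECISION_COMMENTS

    # Assumption: conflicting or missing labels mean no decision.
    if len(label_decisions) != 1:
        return None

    decision = next(iter(label_decisions))
    # Assumption: the comment must agree with the label to trigger anything.
    return decision if decision in comment_decisions else None


def is_starred(labels: Iterable[str]) -> bool:
    """The `starred` label is independent of the approve/reject decision."""
    return "starred" in set(labels)
```

For example, `resolve_decision(["approved", "starred"], ["approve"])` yields `"approve"`, while an issue carrying both `approved` and `rejected` yields `None` under the assumed agreement rule.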


📚 Paper Review - 2026-05-28 #246
