📚 Paper Review - 2026-05-28 #246

@github-actions

📚 Daily Paper Review - 2026-05-28

Found 10 relevant papers today. Please review and approve/reject.


1. FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation

Score: 5.7/10 | arXiv: 2605.27178v1

Authors: Zihui Zhang, Zhixuan Sun, Yafei Yang...

Relevance:

  • 🎯 Field Match: 1.19/10 - Matches: self-supervised, segmentation
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ✅ Available

AI Summary:
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery...

Key Contributions:

  • We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training.
  • Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process.
  • In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

2. MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale

Score: 4.8/10 | arXiv: 2605.27235v1

Authors: Zhicong Tang, Zhao Zhang, Jingye Chen...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image ge...

Key Contributions:

  • Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language.
  • Despite its importance, this remains an underexplored area at scale.
  • To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image generation and editing, trained on over 10M multilingual design samples spanning diverse aspect ratios and textual prompts.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

3. Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

Score: 4.8/10 | arXiv: 2605.27134v1

Authors: Heng Qu, Yike Liu, Renren Jin...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-s...

Key Contributions:

  • Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation.
  • This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain.
  • To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16000 real-world tasks across more than 650 Chinese mobile applications, along with GUIEvalKit, an open-source toolkit for unified benchmarking of VLMs on offline GUI navigation tasks.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

4. Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Score: 4.8/10 | arXiv: 2605.27355v1

Authors: Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors. This arises from core limitations of RLHF: (1) preference datasets are const...

Key Contributions:

  • Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences.
  • In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behaviors.
  • This arises from core limitations of RLHF: (1) preference datasets are constructed from the LLM's own outputs, allowing it to influence them, and (2) pairwise comparisons only indicate which response is better, not why.
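Limitation (2) above can be illustrated with a pairwise preference loss. The summary does not say which reward-model objective the paper uses; the sketch below assumes the commonly used Bradley-Terry formulation, and the function names are hypothetical. It shows that the training signal depends only on the reward gap between the two responses, so a label of "A is better than B" carries no information about why.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one.

    Assumption: this is the common reward-model formulation, not necessarily
    the one used in the paper.
    """
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


if __name__ == "__main__":
    # Only the reward difference matters: both pairs have a gap of 2.0,
    # so they produce the same loss regardless of what made one response win.
    print(pairwise_loss(3.0, 1.0))
    print(pairwise_loss(2.0, 0.0))
```

Because the loss sees only which response won, any feature that correlates with winning (including an undesired one) can be reinforced.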

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

5. DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Score: 4.5/10 | arXiv: 2605.27130v1

Authors: John Donaghy, Shikhar Rastogi

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct crea...

Key Contributions:

  • We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations.
  • Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty.
  • Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

6. Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning

Score: 4.4/10 | arXiv: 2605.27080v1

Authors: Qida Tan, Hongyu Yang, Wenchao Du

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity. Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts. In this work, we devise a simple yet effective semi-supervised learning architectur...

Key Contributions:

  • Appearance-based gaze estimation always suffers from poor generalization due to limited annotated samples and insufficient dataset diversity.
  • Leading approaches adopt weakly supervised learning to generate large-scale pseudo-labeled data from unconstrained real-world scenarios, aiming to mitigate the domain shifts.
  • In this work, we devise a simple yet effective semi-supervised learning architecture that leverages unlabeled data to enhance domain generalization, thereby reducing reliance on labor-intensive manual annotations.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

7. ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

Score: 4.3/10 | arXiv: 2605.27081v1

Authors: Xiongwei Zhu, Xiaojian Liao, Tianyang Jiang...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity. However, in memory-constrained inference scenarios, only a small set of experts can be cached. Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overh...

Key Contributions:

  • Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, reducing activated computation while maintaining high model capacity.
  • However, in memory-constrained inference scenarios, only a small set of experts can be cached.
  • Experts not in the cache must be fetched from slow external storage (e.g., UFS), leading to frequent evictions and substantial I/O overhead...
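The cache pressure described above can be made concrete with a toy simulation. The summary does not state ReMoE's cache policy or how its router is fine-tuned, so the sketch below assumes a plain LRU cache and uses hypothetical names (`ExpertCache`, `count_misses`) and made-up routing traces. It only shows why routing that reuses recently selected experts causes fewer fetches from slow storage.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of expert IDs (assumed policy, for illustration only)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._cache: OrderedDict[int, bool] = OrderedDict()
        self.misses = 0
        self.evictions = 0

    def access(self, expert_id: int) -> bool:
        """Return True on a cache hit; on a miss, load the expert and evict if full."""
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as most recently used
            return True
        self.misses += 1  # stands in for a fetch from slow external storage
        if len(self._cache) >= self.capacity:
            self._cache.popitem(last=False)  # drop the least recently used expert
            self.evictions += 1
        self._cache[expert_id] = True
        return False


def count_misses(routing_trace: list[list[int]], capacity: int) -> int:
    """Replay per-token expert selections and count cache misses."""
    cache = ExpertCache(capacity)
    for experts_for_token in routing_trace:
        for expert_id in experts_for_token:
            cache.access(expert_id)
    return cache.misses


if __name__ == "__main__":
    scattered = [[0, 1], [2, 3], [0, 1], [2, 3]]  # tokens alternate expert sets
    reusing = [[0, 1], [0, 1], [0, 1], [0, 1]]    # tokens reuse cached experts
    print(count_misses(scattered, capacity=2))  # 8
    print(count_misses(reusing, capacity=2))    # 2
```

With a two-expert cache, the alternating trace misses on every access, while the reusing trace misses only on the first token.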

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

8. PlayClass: Automated Play Behaviour Classification in Poultry

Score: 4.2/10 | arXiv: 2605.27304v1

Authors: Prince Ravi Leow, Neil Scheidwasser, Rebecca Oscarsson...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored. To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video. The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-base...

Key Contributions:

  • Automated monitoring of animal welfare has largely targeted negative indicators, leaving positive welfare behaviours such as play underexplored.
  • To address this gap, we present PlayClass, a pipeline for play-behaviour classification in poultry from top-down pen video.
  • The pipeline leverages long-duration tracking with SAM 3 via YOLO-guided chunk boundaries to minimise identity errors in point-based prompting, and frozen embeddings from image and video foundation models for play action classification.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

9. Feedforward 3D Editing Learns from Semantic-Part Transformation

Score: 4.2/10 | arXiv: 2605.27351v1

Authors: Jiawei Weng, Saining Zhang, Zhenxin Diao...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ✅ Available

AI Summary:
3D editing is a fundamental capability for scalable 3D content creation. While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines. A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision. Editable 3D assets require simultaneous preservation of geometry...

Key Contributions:

  • 3D editing is a fundamental capability for scalable 3D content creation.
  • While image editing has rapidly evolved toward large-scale feedforward generative paradigms, 3D AI generation remains dominated by training-free editing pipelines.
  • A central challenge of feedforward 3D editing lies in the lack of high-quality paired supervision.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

10. Chaos-SSL: An Attention-Based Self-Supervised Learning Framework with Chaotic Transformation for Medical Image Classification

Score: 4.2/10 | arXiv: 2605.27146v1

Authors: Joao Batista Florindo

Relevance:

  • 🎯 Field Match: 1.86/10 - Matches: medical image, self-supervised, image analysis
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis. However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies. This paper introduces Chaos-SSL, a n...

Key Contributions:

  • Self-Supervised Learning (SSL) has emerged as a powerful paradigm to mitigate the reliance on large, annotated datasets, a common bottleneck in medical image analysis.
  • However, standard SSL methods, which rely on simple geometric and color augmentations, may fail to capture the fine-grained, complex textural details necessary for classifying subtle pathologies.
  • This paper introduces Chaos-SSL, a novel two-stage framework for medical image classification.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

How to Review

  1. Read the summaries above
  2. Check paper links for more details
  3. Add labels to indicate your decision:
    • approved - Add to collection
    • rejected - Skip this paper
    • starred - Mark as particularly important
  4. Comment "approve" or "reject" to trigger automation

Note: Papers with approved label will be automatically added to the collection.
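The label-plus-comment rule above could be expressed roughly as follows. This is a hypothetical sketch, not the actual automation behind this issue: the function name, the "pending" and "conflict" outcomes, and the choice to treat the whole issue as a single decision are all assumptions.

```python
def review_decision(labels: list[str], comments: list[str]) -> tuple[str, bool]:
    """Map issue labels and comments to (decision, starred).

    Assumed rule: a decision needs both the label and the matching comment,
    as the review steps describe. Anything else is left "pending".
    """
    label_set = set(labels)
    normalized_comments = {comment.strip().lower() for comment in comments}

    approved = "approved" in label_set and "approve" in normalized_comments
    rejected = "rejected" in label_set and "reject" in normalized_comments
    starred = "starred" in label_set

    if approved and rejected:
        return "conflict", starred  # assumption: contradictory signals need a human
    if approved:
        return "approved", starred
    if rejected:
        return "rejected", starred
    return "pending", starred


if __name__ == "__main__":
    print(review_decision(["approved", "starred"], ["approve"]))
    print(review_decision(["approved"], []))
```

A label without the matching comment stays pending here, which mirrors step 4: the comment is what triggers the automation.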
