📚 Paper Review - 2026-05-22 #240

@github-actions

📚 Daily Paper Review - 2026-05-22

Found 10 relevant papers today. Please review each one and approve or reject it.


1. iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

Score: 5.6/10 | arXiv: 2605.21431v1

Authors: Jun Zheng, Zhengze Xu, Mengting Chen...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interac...

Key Contributions:

  • Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one.
  • While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments.
  • This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

2. RoadTones: Tone Controllable Text Generation from Road Event Videos

Score: 5.1/10 | arXiv: 2605.21411v1

Authors: Chirag Parikh, Siddhi Pravin Lipare, Ravi Kiran Sarvadevabhatla

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Existing video-language models can generate factual descriptions of road events but lack control over how these events are expressed: their tone, urgency, or style. This limits deployment in communication-critical settings where the effectiveness of a message depends on both content and presentation, not just factual accuracy. To mitigate this, we introduce a comprehensive dataset-model-evaluation...

Key Contributions:

  • Existing video-language models can generate factual descriptions of road events but lack control over how these events are expressed: their tone, urgency, or style.
  • This limits deployment in communication-critical settings where the effectiveness of a message depends on both content and presentation, not just factual accuracy.
  • To mitigate this, we introduce a comprehensive dataset-model-evaluation suite for tone-controllable road video captioning.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

3. Deformba: Vision State Space Model with Adaptive State Fusion

Score: 4.8/10 | arXiv: 2605.21308v1

Authors: Hongyu Ke, Jack Morris, Yongkang Liu...

Relevance:

  • 🎯 Field Match: 0.51/10 - Matches: segmentation
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
State Space Models (SSMs) have emerged as a powerful and efficient alternative to Transformers, demonstrating linear-time complexity and exceptional sequence modeling capabilities. However, their application to vision tasks remains challenging. First, existing vision SSMs largely depend on manually designed fixed scanning methods to flatten image patches into sequences, which imposes predefined ge...

Key Contributions:

  • State Space Models (SSMs) have emerged as a powerful and efficient alternative to Transformers, demonstrating linear-time complexity and exceptional sequence modeling capabilities.
  • However, their application to vision tasks remains challenging.
  • First, existing vision SSMs largely depend on manually designed fixed scanning methods to flatten image patches into sequences, which imposes predefined geometric structures and increases the complexity.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

4. Divide and Contrast: Learning Robust Temporal Features without Augmentation

Score: 4.8/10 | arXiv: 2605.21241v1

Authors: Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor

Relevance:

  • 🎯 Field Match: 0.68/10 - Matches: self-supervised
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and mult...

Key Contributions:

  • Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics.
  • In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps.
  • Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

5. Is Fixing Schema Graphs Necessary? Full-Resolution Graph Structure Learning for Relational Deep Learning

Score: 4.6/10 | arXiv: 2605.21475v1

Authors: Yi Huang, Qingyun Sun, Jia Li...

Relevance:

  • 🎯 Field Match: 0.42/10 - Matches: deep learning
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs). Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning. However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs ...

Key Contributions:

  • Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs).
  • Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning.
  • However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs to preserve relational semantics, which leads most existing methods to rely on fixed graph structures.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

6. RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

Score: 4.6/10 | arXiv: 2605.21237v1

Authors: Xuan Yang, Xiaohan Yuan, Hao Li...

Relevance:

  • 🎯 Field Match: 0.76/10 - Matches: cardiac
  • 🏆 Venue: MICCAI (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases. Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence. Due to strong regional and disease-specific differences, traditional methods often oversmooth th...

Key Contributions:

  • Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases.
  • Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence.
  • Due to strong regional and disease-specific differences, traditional methods often oversmooth the data by relying on generative models that are optimized for global patterns.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

7. OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Score: 4.4/10 | arXiv: 2605.21343v1

Authors: Ziye Li, Henghui Ding

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often ...

Key Contributions:

  • Recent layout-to-image models have achieved remarkable progress in spatial controllability.
  • However, they still struggle with inter-object occlusion.
  • When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

8. Let EEG Models Learn EEG

Score: 4.4/10 | arXiv: 2605.21280v1

Authors: Yifan Wang, Yijia Ma, Wen Li...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
High-fidelity EEG generation is critical for alleviating data scarcity and addressing privacy constraints in large-scale neural modeling. Despite recent progress, most existing approaches formulate EEG generation via discrete denoising objectives, which inadequately reflect the inherently continuous temporal dynamics and spectral structure of neural activity. As a result, these methods often strug...

Key Contributions:

  • High-fidelity EEG generation is critical for alleviating data scarcity and addressing privacy constraints in large-scale neural modeling.
  • Despite recent progress, most existing approaches formulate EEG generation via discrete denoising objectives, which inadequately reflect the inherently continuous temporal dynamics and spectral structure of neural activity.
  • As a result, these methods often struggle to preserve long-range temporal dependencies and exhibit mismatches in the spectral and temporal structure of the generated signals.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

9. Data-Efficient Neural Operator Training via Physics-Based Active Learning

Score: 4.2/10 | arXiv: 2605.21348v1

Authors: Alicja Polanska, Lorenzo Zanisi, Vignesh Gopakumar...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICLR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements. Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner. We introduce physics-based acquisition - a novel physics-informed active learning algorithm that l...

Key Contributions:

  • Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements.
  • Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner.
  • We introduce physics-based acquisition - a novel physics-informed active learning algorithm that leverages the partial differential equation residual to guide data selection.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

10. Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Score: 4.2/10 | arXiv: 2605.21470v1

Authors: Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use. We...

Key Contributions:

  • Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser.
  • Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use.
  • We present agent just-in-time (JIT) compilation, an alternative that compiles task descriptions directly into executable code that is free to include LLM calls, tool calls, and parallelization.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

How to Review

  1. Read the summaries above
  2. Check paper links for more details
  3. Add labels to indicate your decision:
    • approved - Add to collection
    • rejected - Skip this paper
    • starred - Mark as particularly important
  4. Comment "approve" or "reject" to trigger automation

Note: Papers with the approved label are automatically added to the collection.
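
The review steps above can be sketched as a small decision function. This is only an illustration: the actual automation behind this issue is not shown here, so `decide_action`, `DECISION_LABELS`, and `TRIGGER_COMMENTS` are hypothetical names. The sketch also assumes that both the label and the matching comment are required, as the Actions lists suggest.

```python
from typing import Iterable, Optional

# Hypothetical mapping; the real workflow's logic is not shown in this issue.
DECISION_LABELS = {"approved": "add_to_collection", "rejected": "skip"}
TRIGGER_COMMENTS = {"approve": "approved", "reject": "rejected"}


def decide_action(labels: Iterable[str], comment: str) -> Optional[dict]:
    """Return the action implied by an issue's labels and a trigger comment.

    Returns None when the comment is not a trigger word or when the
    matching label is missing, so ambiguous input triggers nothing.
    """
    label_set = {label.strip().lower() for label in labels}
    expected_label = TRIGGER_COMMENTS.get(comment.strip().lower())
    if expected_label is None or expected_label not in label_set:
        return None
    return {
        "action": DECISION_LABELS[expected_label],
        # "starred" is treated as an independent flag, not a decision.
        "starred": "starred" in label_set,
    }
```

Under these assumptions, `decide_action(["approved", "starred"], "approve")` yields an add-to-collection action flagged as starred, while a comment that does not match the applied label yields `None`.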
