📚 Daily Paper Review - 2026-05-22
Found 10 relevant papers today. Please review and approve/reject.
1. iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance
Score: 5.6/10 | arXiv: 2605.21431v1
Authors: Jun Zheng, Zhengze Xu, Mengting Chen...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ✅ Available
AI Summary:
Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interac...
Key Contributions:
- Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one.
- While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments.
- This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
2. RoadTones: Tone Controllable Text Generation from Road Event Videos
Score: 5.1/10 | arXiv: 2605.21411v1
Authors: Chirag Parikh, Siddhi Pravin Lipare, Ravi Kiran Sarvadevabhatla
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ✅ Available
AI Summary:
Existing video-language models can generate factual descriptions of road events but lack control over how these events are expressed: their tone, urgency, or style. This limits deployment in communication-critical settings where the effectiveness of a message depends on both content and presentation, not just factual accuracy. To mitigate this, we introduce a comprehensive dataset-model-evaluation...
Key Contributions:
- Existing video-language models can generate factual descriptions of road events but lack control over how these events are expressed: their tone, urgency, or style.
- This limits deployment in communication-critical settings where the effectiveness of a message depends on both content and presentation, not just factual accuracy.
- To mitigate this, we introduce a comprehensive dataset-model-evaluation suite for tone-controllable road video captioning.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
3. Deformba: Vision State Space Model with Adaptive State Fusion
Score: 4.8/10 | arXiv: 2605.21308v1
Authors: Hongyu Ke, Jack Morris, Yongkang Liu...
Relevance:
- 🎯 Field Match: 0.51/10 - Matches: segmentation
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
State Space Models (SSMs) have emerged as a powerful and efficient alternative to Transformers, demonstrating linear-time complexity and exceptional sequence modeling capabilities. However, their application to vision tasks remains challenging. First, existing vision SSMs largely depend on manually designed fixed scanning methods to flatten image patches into sequences, which imposes predefined ge...
Key Contributions:
- State Space Models (SSMs) have emerged as a powerful and efficient alternative to Transformers, demonstrating linear-time complexity and exceptional sequence modeling capabilities.
- However, their application to vision tasks remains challenging.
- First, existing vision SSMs largely depend on manually designed fixed scanning methods to flatten image patches into sequences, which imposes predefined geometric structures and increases the complexity.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
4. Divide and Contrast: Learning Robust Temporal Features without Augmentation
Score: 4.8/10 | arXiv: 2605.21241v1
Authors: Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor
Relevance:
- 🎯 Field Match: 0.68/10 - Matches: self-supervised
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and mult...
Key Contributions:
- Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics.
- In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps.
- Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
5. Is Fixing Schema Graphs Necessary? Full-Resolution Graph Structure Learning for Relational Deep Learning
Score: 4.6/10 | arXiv: 2605.21475v1
Authors: Yi Huang, Qingyun Sun, Jia Li...
Relevance:
- 🎯 Field Match: 0.42/10 - Matches: deep learning
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs). Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning. However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs ...
Key Contributions:
- Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs).
- Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning.
- However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs to preserve relational semantics, which leads most existing methods to rely on fixed graph structures.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
6. RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis
Score: 4.6/10 | arXiv: 2605.21237v1
Authors: Xuan Yang, Xiaohan Yuan, Hao Li...
Relevance:
- 🎯 Field Match: 0.76/10 - Matches: cardiac
- 🏆 Venue: MICCAI (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases. Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence. Due to strong regional and disease-specific differences, traditional methods often oversmooth th...
Key Contributions:
- Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases.
- Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence.
- Due to strong regional and disease-specific differences, traditional methods often oversmooth the data by relying on generative models that are optimized for global patterns.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
7. OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation
Score: 4.4/10 | arXiv: 2605.21343v1
Authors: Ziye Li, Henghui Ding
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often ...
Key Contributions:
- Recent layout-to-image models have achieved remarkable progress in spatial controllability.
- However, they still struggle with inter-object occlusion.
- When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
8. Let EEG Models Learn EEG
Score: 4.4/10 | arXiv: 2605.21280v1
Authors: Yifan Wang, Yijia Ma, Wen Li...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
High-fidelity EEG generation is critical for alleviating data scarcity and addressing privacy constraints in large-scale neural modeling. Despite recent progress, most existing approaches formulate EEG generation via discrete denoising objectives, which inadequately reflect the inherently continuous temporal dynamics and spectral structure of neural activity. As a result, these methods often strug...
Key Contributions:
- High-fidelity EEG generation is critical for alleviating data scarcity and addressing privacy constraints in large-scale neural modeling.
- Despite recent progress, most existing approaches formulate EEG generation via discrete denoising objectives, which inadequately reflect the inherently continuous temporal dynamics and spectral structure of neural activity.
- As a result, these methods often struggle to preserve long-range temporal dependencies and exhibit mismatches in the spectral and temporal structure of the generated signals.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
9. Data-Efficient Neural Operator Training via Physics-Based Active Learning
Score: 4.2/10 | arXiv: 2605.21348v1
Authors: Alicja Polanska, Lorenzo Zanisi, Vignesh Gopakumar...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICLR (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements. Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner. We introduce physics-based acquisition - a novel physics-informed active learning algorithm that l...
Key Contributions:
- Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements.
- Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner.
- We introduce physics-based acquisition - a novel physics-informed active learning algorithm that leverages the partial differential equation residual to guide data selection.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
10. Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Score: 4.2/10 | arXiv: 2605.21470v1
Authors: Caleb Winston, Ron Yifeng Wang, Azalia Mirhoseini...
Relevance:
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned
AI Summary:
Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use. We...
Key Contributions:
- Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser.
- Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in high latency and frequent errors from incorrect tool use.
- We present agent just-in-time (JIT) compilation, an alternative that compiles task descriptions directly into executable code that is free to include LLM calls, tool calls, and parallelization.
Links: 📄 Paper | 📥 PDF
Actions:
- ✅ Approve: Add label approved and comment "approve"
- ❌ Reject: Add label rejected and comment "reject"
- ⭐ Important: Add label starred
How to Review
- Read the summaries above
- Check paper links for more details
- Add labels to indicate your decision:
  - approved - Add to collection
  - rejected - Skip this paper
  - starred - Mark as particularly important
- Comment "approve" or "reject" to trigger automation
Note: Papers with the approved label will be automatically added to the collection.
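The label and comment rules above can be sketched as a small decision function. This is only an illustration of how such automation might interpret a review, not the actual workflow behind this issue: the function name `resolve_decision`, the returned keys, and the "conflict" and "pending" states are assumptions made for the example.

```python
from typing import Iterable, Optional


def resolve_decision(labels: Iterable[str], comment: Optional[str]) -> dict:
    """Map a paper's labels and review comment to a decision.

    Hypothetical helper: the label names ("approved", "rejected", "starred")
    and trigger comments ("approve", "reject") come from the review
    instructions; everything else is an assumption for illustration.
    """
    normalized = {label.strip().lower() for label in labels}
    word = (comment or "").strip().lower()

    # Assumption: both labels present is treated as an unresolved conflict.
    if "approved" in normalized and "rejected" in normalized:
        status = "conflict"
    elif "approved" in normalized:
        status = "approved"   # paper is added to the collection
    elif "rejected" in normalized:
        status = "rejected"   # paper is skipped
    else:
        status = "pending"    # no decision label yet

    return {
        "status": status,
        "starred": "starred" in normalized,            # marked as important
        "triggered": word in ("approve", "reject"),    # comment starts automation
    }


if __name__ == "__main__":
    print(resolve_decision(["approved", "starred"], "approve"))
    print(resolve_decision([], None))
```

Keeping the label check separate from the comment check mirrors the instructions, where labels record the decision and the comment triggers the automation.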