📚 Paper Review - 2026-06-01 #250

@github-actions

📚 Daily Paper Review - 2026-06-01

Found 10 relevant papers today. Please review and approve/reject.


1. City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

Score: 5.8/10 | arXiv: 2605.30310v1

Authors: Sayan Paul, Sourav Ghosh, Siddharth Katageri...

Relevance:

  • 🎯 Field Match: 2.03/10 - Matches: gaussian splatting, 3d reconstruction, nerf
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ✅ Available

AI Summary:
City-scale 3D surface reconstruction from multiview images for downstream 3D simulation poses highly challenging problems due to the scale and complexity of urban scenes. Existing city-scale 3D reconstruction methods based on NeRF, Gaussian Splatting, etc. often fail to recover 3D meshes ready for simulation due to incomplete/missing geometry and irregular, noisy surfaces. Scaling existing small-s...

Key Contributions:

  • City-scale 3D surface reconstruction from multiview images for downstream 3D simulation poses highly challenging problems due to the scale and complexity of urban scenes.
  • Existing city-scale 3D reconstruction methods based on NeRF, Gaussian Splatting, etc. often fail to recover 3D meshes ready for simulation due to incomplete/missing geometry and irregular, noisy surfaces.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

2. Uncertainty-driven 3D Gaussian Splatting Active Mapping via Anisotropic Visibility Field

Score: 5.7/10 | arXiv: 2605.30342v1

Authors: Shangjie Xue, Jesse Dill, Dhruv Ahuja...

Relevance:

  • 🎯 Field Match: 1.69/10 - Matches: 3d gaussian, gaussian splatting
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ✅ Available

AI Summary:
We present Gaussian Splatting Anisotropic Visibility Field (GAVIS), a novel framework for uncertainty quantification and active mapping in 3DGS. Our key insight is that regions unseen from the training views yield unreliable predictions from the 3DGS. To address this, we introduce a principled and efficient method for quantifying the visibility field in 3DGS, defined as the anisotropic visibility ...

Key Contributions:

  • We present Gaussian Splatting Anisotropic Visibility Field (GAVIS), a novel framework for uncertainty quantification and active mapping in 3DGS.
  • Our key insight is that regions unseen from the training views yield unreliable predictions from the 3DGS.
  • To address this, we introduce a principled and efficient method for quantifying the visibility field in 3DGS, defined as the anisotropic visibility of each particle with respect to the training views, and represented using spherical harmonics.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

3. Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Score: 5.3/10 | arXiv: 2605.30231v1

Authors: Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Vision-Language Models (VLMs) often struggle with robust 3D spatial reasoning. Prevailing methods that rely on fine-tuning with 3D visual question-answering (VQA) datasets may overfit dataset-specific biases, while integrating specialized 3D visual encoders is often inflexible and cumbersome. In this paper, we argue that genuine spatial understanding should emerge from learning fundamental geometr...

Key Contributions:

  • Vision-Language Models (VLMs) often struggle with robust 3D spatial reasoning.
  • Prevailing methods that rely on fine-tuning with 3D visual question-answering (VQA) datasets may overfit dataset-specific biases, while integrating specialized 3D visual encoders is often inflexible and cumbersome.
  • In this paper, we argue that genuine spatial understanding should emerge from learning fundamental geometric priors, not only from high-level VQA supervision.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

4. Archon: A Unified Multimodal Model for Holistic Digital Human Generation

Score: 4.9/10 | arXiv: 2605.30311v1

Authors: Chong Bao, Shichen Liu, Lijun Yu...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motion, and visual content, remains an open challenge. In this paper, we present Archon, a fully pretrained, human-centric unified multimodal model for holistic avatar generation. Archon unifies seven modalities with modality-specific tokenizers, and a native autore...

Key Contributions:

  • Digital humans are fundamental to immersive interaction, yet creating a unified model for holistic modalities, including text, audio, motion, and visual content, remains an open challenge.
  • In this paper, we present Archon, a fully pretrained, human-centric unified multimodal model for holistic avatar generation.
  • Archon unifies seven modalities with modality-specific tokenizers, and a native autoregressive unified multimodal model pretrained on synchronized modalities and 72 diverse tasks to model holistic joint distributions.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

5. Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Score: 4.8/10 | arXiv: 2605.30353v1

Authors: Nhat-Minh Nguyen

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented and classified 15 supervision events by intervention level. The agent resolved ten autonomously b...

Key Contributions:

  • Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX.
  • We documented and classified 15 supervision events by intervention level.
  • The agent resolved ten autonomously by iterating against oracle tests.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

6. Benchmarking Single-Factor Physical Video-to-Audio Generation

Score: 4.5/10 | arXiv: 2605.30339v1

Authors: Tingle Li, Siddharth Gururani, Kevin J. Shih...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions. In this paper, we introduce FlatSounds, a benchmark that audits the physical reasoning of V2A models through: 1) controlled coun...

Key Contributions:

  • Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes.
  • Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions.
  • In this paper, we introduce FlatSounds, a benchmark that audits the physical reasoning of V2A models through: 1) controlled counterfactual pairs in which a single physical factor is varied, and 2) single-video pattern tests that probe internal consistency and directional trends.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

7. Grounded 3D-Aware Spatial Vision-Language Modeling

Score: 4.4/10 | arXiv: 2605.30307v1

Authors: An-Chieh Cheng, Yang Fu, Yatai Ji...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
We present GR3D, a spatial vision language model equipped with three complementary grounding capabilities--explicit 2D grounding, implicit 2D grounding, and monocular 3D grounding--within a single framework. GR3D introduces an implicit grounding mechanism that identifies entity mentions during generation and inserts the corresponding region tokens into the text stream, allowing the model to refere...

Key Contributions:

  • We present GR3D, a spatial vision language model equipped with three complementary grounding capabilities--explicit 2D grounding, implicit 2D grounding, and monocular 3D grounding--within a single framework.
  • GR3D introduces an implicit grounding mechanism that identifies entity mentions during generation and inserts the corresponding region tokens into the text stream, allowing the model to reference visual evidence on the fly when producing spatial chain-of-thought responses.
  • In parallel, a region-prompted monocular 3D grounding design predicts 3D bounding boxes in the camera view from grounded region queries, supported by intrinsic-aware normalization and dense geometric supervision.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

8. iLoRA: Bayesian Low-Rank Adaptation with Latent Interaction Graphs for Microbiome Diagnosis

Score: 4.4/10 | arXiv: 2605.30179v1

Authors: Yang Song, Yixuan Zhang, Lingfa Meng...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Parameter-efficient adaptation has made LLMs practical for domain prediction, but standard LoRA still relies on a static low-rank update and does not expose the latent interactions that often drive scientific labels. We introduce iLoRA. To our knowledge, it is the first Bayesian graph-conditioned LoRA framework. It infers a latent interaction graph from the input and uses it to generate input-cond...

Key Contributions:

  • Parameter-efficient adaptation has made LLMs practical for domain prediction, but standard LoRA still relies on a static low-rank update and does not expose the latent interactions that often drive scientific labels.
  • We introduce iLoRA.
  • To our knowledge, it is the first Bayesian graph-conditioned LoRA framework.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

9. MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Score: 4.4/10 | arXiv: 2605.30295v1

Authors: Valentina Bui Muti, Eugénie Dulout, Ziquan Fu

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited. Existing benchmarks often rely on static datasets or unstructured inputs that do not reflect the structured, interoperable data formats used in clinical systems. We introduce a pipeline for generating clinically realistic H...

Key Contributions:

  • Large language models (LLMs) show promise for clinical reasoning and decision support, but evaluation in realistic, electronic health record-congruent settings remains limited.
  • Existing benchmarks often rely on static datasets or unstructured inputs that do not reflect the structured, interoperable data formats used in clinical systems.
  • We introduce a pipeline for generating clinically realistic HL7 FHIR R4 bundles from unstructured text, enabling controllable evaluation of clinical decision support systems.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

10. Do Language Models Track Entities Across State Changes?

Score: 4.3/10 | arXiv: 2605.30233v1

Authors: Zilu Tang, Qiao Zhao, Gabriel Franco...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding without state changes. However, there is limited understanding of how non-toy LMs address ET problems of realistic difficulties expressed in natural language. To this end, ...

Key Contributions:

  • Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning.
  • An increasing amount of work investigates how transformer language models (LMs) solve entity binding without state changes.
  • However, there is limited understanding of how non-toy LMs address ET problems of realistic difficulties expressed in natural language.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred
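
Each entry above opens with a line of the form "Score: 5.8/10 | arXiv: 2605.30310v1". For reviewers who want to sort or filter the digest locally, a minimal Python sketch of parsing that line follows. The names EntryHeader and parse_header are hypothetical and are not part of the automation that produced this issue.

```python
import re
from typing import NamedTuple, Optional


class EntryHeader(NamedTuple):
    """Hypothetical container for the two fields on an entry's score line."""
    score: float
    arxiv_id: str


# Matches lines such as "Score: 5.8/10 | arXiv: 2605.30310v1".
_HEADER = re.compile(r"Score:\s*(\d+(?:\.\d+)?)/10\s*\|\s*arXiv:\s*(\S+)")


def parse_header(line: str) -> Optional[EntryHeader]:
    """Return the score and arXiv identifier, or None if the line does not match."""
    match = _HEADER.search(line)
    if match is None:
        return None
    return EntryHeader(float(match.group(1)), match.group(2))
```

Applied to every line of the digest, this yields one EntryHeader per paper, which can then be filtered by score before opening the paper links.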

How to Review

  1. Read the summaries above
  2. Check paper links for more details
  3. Add labels to indicate your decision:
    • approved - Add to collection
    • rejected - Skip this paper
    • starred - Mark as particularly important
  4. Comment "approve" or "reject" to trigger automation

Note: Papers with the approved label are automatically added to the collection.
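
The review steps above pair a label with a comment keyword. A minimal Python sketch of how such a decision might be resolved follows. It assumes the label and the comment must agree and that conflicting labels leave the paper undecided. Neither rule is confirmed by this issue, and the function names are hypothetical.

```python
from typing import Optional, Set

# Comment keyword -> label, as listed in the review instructions.
KEYWORD_TO_LABEL = {"approve": "approved", "reject": "rejected"}


def resolve_decision(labels: Set[str], comment: str) -> Optional[str]:
    """Return "approved" or "rejected" when label and comment agree, else None.

    Assumption: the automation requires both signals; the real workflow may differ.
    """
    keyword = comment.strip().strip('"').lower()
    label = KEYWORD_TO_LABEL.get(keyword)
    if label is None or label not in labels:
        return None
    # Assumption: a paper carrying both decision labels is treated as undecided.
    if {"approved", "rejected"} <= labels:
        return None
    return label


def is_starred(labels: Set[str]) -> bool:
    """The starred label is independent of the approve/reject decision."""
    return "starred" in labels
```

For example, resolve_decision({"approved", "starred"}, "approve") returns "approved", while a "reject" comment on the same labels returns None because the two signals disagree.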
