- ScanNet (170 frames): TAE<=2.2
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): OPW<=0.1
- ScanNet++ (98 video clips with 32 frames each): TAE
- NYU-Depth V2: OPW<=0.37
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.079
- NYU-Depth V2: AbsRel<=0.045 (relative depth)
- NYU-Depth V2: AbsRel<=0.051 (metric depth)
- Appendix 1: Rules for qualifying models for the rankings (to do)
- Appendix 2: Metrics selection for the rankings (to do)
- Appendix 3: List of all research papers from the above rankings
📝 Note: There are no quantitative comparison results for StereoCrafter yet, so this ranking is based on my own perceptual judgment of the qualitative comparison results shown in Figure 7. One output frame (right view) is compared with one input frame (left view) from the video clip 22_dogskateboarder, and one output frame (right view) is compared with one input frame (left view) from the video clip scooter-black.
RK | Model Links: Venue Repository | Rank ↓ (human perceptual judgment) |
---|---|---|
1 | StereoCrafter | 1 |
2-3 | Immersity AI | 2-3 |
2-3 | Owl3D | 2-3 |
4 | Deep3D | 4 |
RK | Model Links: Venue Repository | TAE ↓ {Input fr.} DAV |
---|---|---|
1 | Depth Any Video | 2.1 {MF} |
2 | DepthCrafter | 2.2 {MF} |
3 | ChronoDepth | 2.3 {MF} |
4 | NVDS | 3.7 {4} |
RK | Model Links: Venue Repository | OPW ↓ {Input fr.} FD | OPW ↓ {Input fr.} NVDS+ | OPW ↓ {Input fr.} NVDS |
---|---|---|---|---|
1 | FutureDepth | 0.303 {4} | - | - |
2 | NVDS+ | - | 0.339 {4} | - |
3 | NVDS | 0.364 {4} | - | 0.364 {4} |
📝 Note: This ranking is based on data from Table 4. The example score 3:0:2 (first on the left in the first row) means that Depth Pro has a better F-score than UniDepth-V on 3 datasets, the same F-score on no dataset, and a worse F-score than UniDepth-V on 2 datasets.
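The win:tie:loss tally described above can be sketched as follows. The function and the per-dataset F-scores below are illustrative assumptions, not numbers taken from Table 4:

```python
def pairwise_record(scores_a, scores_b):
    """Count datasets where model A has a better, equal, or worse
    F-score than model B (higher F-score is better)."""
    wins = ties = losses = 0
    for dataset, a in scores_a.items():
        b = scores_b[dataset]
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

# Hypothetical F-scores on five datasets (illustrative values only).
f_a = {"d1": 0.92, "d2": 0.88, "d3": 0.75, "d4": 0.70, "d5": 0.60}
f_b = {"d1": 0.90, "d2": 0.85, "d3": 0.80, "d4": 0.65, "d5": 0.61}
print(pairwise_record(f_a, f_b))  # → (3, 0, 2)
```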
📝 Note: 1) See Figure 4. 2) The ranking order is determined in the first instance by a direct comparison of the two models' scores in the same paper. If no paper provides such a direct comparison, or different papers disagree, the order is determined by the best score each of the two models achieves across all papers shown in the columns as data sources. The DepthCrafter rank is based on the latest version, 1.0.1.
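The two-stage tie-breaking rule above can be sketched in code. This is a minimal, assumed implementation (the function name and the score values are illustrative; lower scores are treated as better, as with OPW):

```python
def rank_pair(model_a, model_b, papers):
    """Return whichever of two models should rank higher.

    papers: list of dicts mapping model name -> score (lower is better).
    """
    # Stage 1: direct comparisons in papers that report both models.
    direct = [(p[model_a], p[model_b]) for p in papers
              if model_a in p and model_b in p]
    verdicts = {(a > b) - (a < b) for a, b in direct}
    if verdicts == {-1}:
        return model_a   # every direct comparison favours A
    if verdicts == {1}:
        return model_b   # every direct comparison favours B
    # Stage 2: no direct comparison, or papers disagree ->
    # compare each model's best (lowest) score across all papers.
    best_a = min(p[model_a] for p in papers if model_a in p)
    best_b = min(p[model_b] for p in papers if model_b in p)
    return model_a if best_a <= best_b else model_b

# Illustrative OPW-style scores from two hypothetical source papers.
papers = [{"FutureDepth": 0.303, "NVDS": 0.364},
          {"NVDS+": 0.339, "NVDS": 0.364}]
print(rank_pair("FutureDepth", "NVDS", papers))   # → FutureDepth (direct)
print(rank_pair("NVDS+", "FutureDepth", papers))  # → FutureDepth (best score)
```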
RK | Model Links: Venue Repository | AbsRel ↓ {Input fr.} M3D v2 | AbsRel ↓ {Input fr.} GRIN |
---|---|---|---|
1 | Metric3D v2 ViT-giant | 0.045 {1} | - |
2 | GRIN_FT_NI | - | 0.051 {1} |
Method | Abbr. | Paper | Venue (Alt link) | Official repository |
---|---|---|---|---|
Align3R | - | Align3R: Aligned Monocular Depth Estimation for Dynamic Videos | | |
BetterDepth | BD | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | | - |
Buffer Anytime | BA | Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors | | - |
ChronoDepth | - | Learning Temporally Consistent Video Depth from Video Diffusion Priors | | |
CUT3R | - | Continuous 3D Perception Model with Persistent State | | |
Deep3D | - | Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks | | |
Depth Any Video | DAV | Depth Any Video with Scalable Synthetic Data | | |
Depth Anything | DA | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | | |
Depth Anything V2 | DA V2 | Depth Anything V2 | | |
Depth Pro | DP | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | | |
DepthCrafter | DC | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | | |
Diffusion E2E FT | E2E FT | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | | |
FutureDepth | FD | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | | - |
GRIN | - | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | | - |
Metric3D | M3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | | |
Metric3D v2 | M3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | (Alt link) | |
MoGe | - | MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision | | |
MonST3R | - | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | | |
NVDS | - | Neural Video Depth Stabilizer | | |
NVDS+ | - | NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | (Alt link) | |
PatchFusion | PF | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | | |
RollingDepth | RD | Video Depth without Video Models | | |
StereoCrafter | - | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | | |
UniDepth | UD | UniDepth: Universal Monocular Metric Depth Estimation | | |
Video Depth Anything | VDA | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | | |
ZeroDepth | ZD | Towards Zero-Shot Scale-Aware Monocular Depth Estimation | | |
ZoeDepth | ZoeD | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | | |