- ScanNet (170 frames): TAE<=2.2
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): OPW<=0.1
- ScanNet++ (98 video clips with 32 frames each): TAE
- NYU-Depth V2: OPW<=0.37
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.079
- NYU-Depth V2: AbsRel<=0.045 (relative depth)
- NYU-Depth V2: AbsRel<=0.051 (metric depth)
- Appendix 1: Rules for qualifying models for the rankings (to do)
- Appendix 2: Metrics selection for the rankings (to do)
- Appendix 3: List of all research papers from the above rankings
📝 Note: There are no quantitative comparison results for StereoCrafter yet, so this ranking is based on my own perceptual judgment of the qualitative comparison results shown in Figure 7. One output frame (right view) is compared with one input frame (left view) from the video clip 22_dogskateboarder, and one output frame (right view) is compared with one input frame (left view) from the video clip scooter-black.
RK | Model Links: Venue Repository | Rank ↓ (human perceptual judgment) |
---|---|---|
1 | StereoCrafter | 1 |
2-3 | Immersity AI | 2-3 |
2-3 | Owl3D | 2-3 |
4 | Deep3D | 4 |
RK | Model Links: Venue Repository | TAE ↓ {Input fr.} DAV |
---|---|---|
1 | Depth Any Video | 2.1 {MF} |
2 | DepthCrafter | 2.2 {MF} |
3 | ChronoDepth | 2.3 {MF} |
4 | NVDS | 3.7 {4} |
RK | Model Links: Venue Repository | OPW ↓ {Input fr.} FD | OPW ↓ {Input fr.} NVDS+ | OPW ↓ {Input fr.} NVDS |
---|---|---|---|---|
1 | FutureDepth | 0.303 {4} | - | - |
2 | NVDS+ | - | 0.339 {4} | - |
3 | NVDS | 0.364 {4} | - | 0.364 {4} |
📝 Note: This ranking is based on data from Table 4. The example score 3:0:2 (first on the left in the first row) means that Depth Pro has a better F-score than UniDepth-V on 3 datasets, the same F-score on no dataset, and a worse F-score than UniDepth-V on 2 datasets.
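The win:tie:loss tally described above can be sketched as follows. The function and the per-dataset F-scores below are illustrative assumptions, not numbers taken from Table 4:

```python
def pairwise_record(scores_a, scores_b):
    """Count datasets where model A has a better, equal, or worse
    F-score than model B (higher F-score is better)."""
    wins = ties = losses = 0
    for dataset, a in scores_a.items():
        b = scores_b[dataset]
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses

# Hypothetical F-scores on five datasets (illustrative values only).
f_a = {"d1": 0.92, "d2": 0.88, "d3": 0.75, "d4": 0.70, "d5": 0.60}
f_b = {"d1": 0.90, "d2": 0.85, "d3": 0.80, "d4": 0.65, "d5": 0.61}
print(pairwise_record(f_a, f_b))  # → (3, 0, 2)
```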
📝 Note: 1) See Figure 4. 2) The ranking order is determined in the first instance by a direct comparison of the two models' scores in the same paper. If no paper provides such a direct comparison, or different papers disagree, the order is determined by the best score each of the two models achieves across all papers shown in the columns as data sources. The DepthCrafter rank is based on the latest version, 1.0.1.
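The two-stage tie-breaking rule above can be sketched in code. This is a minimal, assumed implementation (the function name and the score values are illustrative; lower scores are treated as better, as with OPW):

```python
def rank_pair(model_a, model_b, papers):
    """Return whichever of two models should rank higher.

    papers: list of dicts mapping model name -> score (lower is better).
    """
    # Stage 1: direct comparisons in papers that report both models.
    direct = [(p[model_a], p[model_b]) for p in papers
              if model_a in p and model_b in p]
    verdicts = {(a > b) - (a < b) for a, b in direct}
    if verdicts == {-1}:
        return model_a   # every direct comparison favours A
    if verdicts == {1}:
        return model_b   # every direct comparison favours B
    # Stage 2: no direct comparison, or papers disagree ->
    # compare each model's best (lowest) score across all papers.
    best_a = min(p[model_a] for p in papers if model_a in p)
    best_b = min(p[model_b] for p in papers if model_b in p)
    return model_a if best_a <= best_b else model_b

# Illustrative OPW-style scores from two hypothetical source papers.
papers = [{"FutureDepth": 0.303, "NVDS": 0.364},
          {"NVDS+": 0.339, "NVDS": 0.364}]
print(rank_pair("FutureDepth", "NVDS", papers))   # → FutureDepth (direct)
print(rank_pair("NVDS+", "FutureDepth", papers))  # → FutureDepth (best score)
```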
RK | Model Links: Venue Repository | AbsRel ↓ {Input fr.} M3D v2 | AbsRel ↓ {Input fr.} GRIN |
---|---|---|---|
1 | Metric3D v2 ViT-giant | 0.045 {1} | - |
2 | GRIN_FT_NI | - | 0.051 {1} |
Method | Abbr. | Paper | Venue (Alt link) | Official repository |
---|---|---|---|---|
Align3R | - | Align3R: Aligned Monocular Depth Estimation for Dynamic Videos | | |
BetterDepth | BD | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | | - |
Buffer Anytime | BA | Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors | | - |
ChronoDepth | - | Learning Temporally Consistent Video Depth from Video Diffusion Priors | | |
CUT3R | - | Continuous 3D Perception Model with Persistent State | | |
Deep3D | - | Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks | | |
Depth Any Video | DAV | Depth Any Video with Scalable Synthetic Data | | |
Depth Anything | DA | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | | |
Depth Anything V2 | DA V2 | Depth Anything V2 | | |
Depth Pro | DP | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | | |
DepthCrafter | DC | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | | |
Diffusion E2E FT | E2E FT | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | | |
FutureDepth | FD | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | | - |
GRIN | - | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | | - |
Metric3D | M3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | | |
Metric3D v2 | M3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | (Alt link) | |
MoGe | - | MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision | | |
MonST3R | - | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | | |
NVDS | - | Neural Video Depth Stabilizer | | |
NVDS+ | - | NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | (Alt link) | |
PatchFusion | PF | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | | |
RollingDepth | RD | Video Depth without Video Models | | |
StereoCrafter | - | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | | |
UniDepth | UD | UniDepth: Universal Monocular Metric Depth Estimation | | |
Video Depth Anything | VDA | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | | |
ZeroDepth | ZD | Towards Zero-Shot Scale-Aware Monocular Depth Estimation | | |
ZoeDepth | ZoeD | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | | |