
Commit e6f923c

release
1 parent f23bae2 commit e6f923c


56 files changed (+7338 additions, −3 deletions)

.gitignore

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
code/run_logs

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

exps/*
exps*
evals*
data/DTU
data/BlendedMVS
data/Replica
data/tnt_advanced
data/

code/tmp_build

code/.idea/
.DS_Store
._.DS_Store
.idea/

*.png
*.ply
*.txt
*.jpg
*.npy
*.npz
*.tar
uploadtnt_*/

*.json
*.csv
dtu_eval/Offical_DTU_Dataset/
media/

README.md

Lines changed: 120 additions & 3 deletions
@@ -28,16 +28,133 @@ We demonstrate that state-of-the-art depth and normal cues extracted from monocu
</p>
<br>

- ### Code coming soon
# Setup

## Installation
Clone the repository and create an anaconda environment called monosdf using:
```
git clone git@github.com:autonomousvision/monosdf.git
cd monosdf

conda create -y -n monosdf python=3.8
conda activate monosdf

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
conda install cudatoolkit-dev=11.3 -c conda-forge

pip install -r requirements.txt
```
The hash encoder will be compiled on the fly when you run the code.

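For reference, such on-the-fly builds typically go through PyTorch's JIT extension loader; below is a minimal sketch of that general mechanism (the source file names are hypothetical, not the repo's actual layout — see the repository for what actually gets compiled):
```python
# Generic on-the-fly CUDA extension build with PyTorch's JIT loader.
# The file names here are hypothetical placeholders.
from torch.utils.cpp_extension import load

hash_encoder = load(
    name="hash_encoder",
    sources=["hash_encoder.cpp", "hash_encoder_kernel.cu"],  # hypothetical
    extra_cuda_cflags=["-O3"],
    verbose=True,  # print the ninja build log on the first run
)
```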
## Dataset
To download the preprocessed data, run the following script. The data for DTU, Replica, and Tanks and Temples is adapted from [VolSDF](https://github.com/lioryariv/volsdf), [Nice-SLAM](https://github.com/cvg/nice-slam), and [Vis-MVSNet](https://github.com/jzhangbs/Vis-MVSNet), respectively.
```
bash scripts/download_dataset.sh
```

# Training

Run the following command to train monosdf:
```
cd ./code
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf CONFIG --scan_id SCAN_ID
```
where CONFIG is a config file in `code/confs` and SCAN_ID is the id of the scene to reconstruct.

We provide example commands for training on the DTU, ScanNet, and Replica datasets:
```
# DTU scan65
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/dtu_mlp_3views.conf --scan_id 65

# ScanNet scan 1 (scene_0050_00)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/scannet_mlp.conf --scan_id 1

# Replica scan 1 (room0)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/replica_mlp.conf --scan_id 1
```

We created individual config files for the Tanks and Temples dataset, so you don't need to set the scan_id. Run training on the courtroom scene with:
```
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_mlp_1.conf
```

We also generated high-resolution monocular cues for the courtroom scene; it's better to train on them with more GPUs. First, download the dataset:
```
bash scripts/download_highres_TNT.sh
```

Then run training with 8 GPUs:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_highres_grids_courtroom.conf
```
Of course, you can also train all other scenes with multiple GPUs.
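As background, a script started through `torch.distributed.launch` is spawned once per GPU; the sketch below shows the generic per-process setup such a launcher expects (illustrative only, not the contents of `training/exp_runner.py`):
```python
# Generic per-process setup for torch.distributed.launch (illustrative).
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)   # bind this process to one GPU
dist.init_process_group(backend="nccl")  # rank/world size come from env vars
print(f"rank {dist.get_rank()} of {dist.get_world_size()} ready")
```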

# Evaluations

## DTU
First, download the ground-truth DTU point clouds:
```
bash scripts/download_dtu_ground_truth.sh
```
Then you can evaluate the quality of the extracted meshes (take scan 65, for example):
```
python evaluate_single_scene.py --input_mesh scan65_mesh.ply --scan_id 65 --output_dir dtu_scan65
```

We also provide a script for evaluating all DTU scenes:
```
python evaluate.py
```
Evaluation results will be saved to `evaluation/DTU.csv` by default; please check the script for more details.
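For intuition, DTU-style mesh evaluation boils down to a Chamfer-type distance between points sampled from the predicted mesh and the ground-truth point cloud; here is a minimal sketch of that core metric (illustrative — the repo's evaluation script additionally handles details such as masks, culling, and downsampling):
```python
# Core of a Chamfer-style mesh-to-point-cloud metric (illustrative).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    acc, _ = cKDTree(gt_pts).query(pred_pts)    # accuracy: pred -> GT
    comp, _ = cKDTree(pred_pts).query(gt_pts)   # completeness: GT -> pred
    return 0.5 * (acc.mean() + comp.mean())
```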

## Replica
Evaluate one scene (take scan 1, room0, for example):
```
cd replica_eval
python evaluate_single_scene.py --input_mesh replica_scan1_mesh.ply --scan_id 1 --output_dir replica_scan1
```

We also provide a script for evaluating all Replica scenes:
```
cd replica_eval
python evaluate.py
```
Please check the script for more details.

## ScanNet
```
cd scannet_eval
python evaluate.py
```
Please check the script for more details.

## Tanks and Temples
You need to submit the reconstruction results to the [official evaluation server](https://www.tanksandtemples.org); please follow their guidance. We also provide an example of our submission [here](https://drive.google.com/file/d/1Cr-UVTaAgDk52qhVd880Dd8uF74CzpcB/view?usp=sharing) for reference.

# Custom dataset
We provide an example of how to preprocess ScanNet into the MonoSDF format. First, run the script to subsample training images, normalize camera poses, etc.:
```
cd preprocess
python scannet_to_monosdf.py
```
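For intuition, "normalize camera poses" here usually means recentering and rescaling the world so the scene fits the bounded region the implicit network models; a minimal sketch under that assumption (not the exact logic of `scannet_to_monosdf.py`):
```python
# Illustrative pose normalization: recenter camera centers and scale them
# to fit inside a sphere of the given radius (the repo's preprocessing
# script may differ in details, e.g. using scene bounds instead).
import numpy as np

def normalize_poses(c2w: np.ndarray, radius: float = 3.0) -> np.ndarray:
    centers = c2w[:, :3, 3]                 # (N, 4, 4) poses -> (N, 3) centers
    offset = centers.mean(axis=0)           # recenter at the origin
    scale = radius / np.linalg.norm(centers - offset, axis=1).max()
    out = c2w.copy()
    out[:, :3, 3] = (centers - offset) * scale
    return out
```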

Then we can extract monocular depths and normals (please install the [omnidata model](https://github.com/EPFL-VILAB/omnidata) before running the commands):
```
python extract_monocular_cues.py --task depth --img_path ../data/custom/scan1 --output_path ../data/custom/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS
python extract_monocular_cues.py --task normal --img_path ../data/custom/scan1 --output_path ../data/custom/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS
```
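Assuming the cues are written next to the images as per-frame NumPy arrays (the exact file names depend on the extraction script, so treat the paths below as hypothetical), a quick sanity check might look like:
```python
# Hypothetical sanity check of extracted monocular cues; the file names
# are assumptions, not guaranteed by extract_monocular_cues.py.
import numpy as np

depth = np.load("../data/custom/scan1/000000_depth.npy")
normal = np.load("../data/custom/scan1/000000_normal.npy")
print("depth:", depth.shape, depth.min(), depth.max())  # relative depth map
print("normal:", normal.shape)                          # per-pixel normals
```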


# Acknowledgements
This project is built upon [VolSDF](https://github.com/lioryariv/volsdf). We use pretrained [Omnidata](https://omnidata.vision) models for monocular depth and normal extraction. The CUDA implementation of multi-resolution hash encoding is based on [torch-ngp](https://github.com/ashawkey/torch-ngp). Evaluation scripts for DTU, Replica, and ScanNet are taken from [DTUeval-python](https://github.com/jzhangbs/DTUeval-python), [Nice-SLAM](https://github.com/cvg/nice-slam), and [manhattan-sdf](https://github.com/zju3dv/manhattan_sdf), respectively. We thank all the authors for their great work and repos.

- <br>

# Citation
If you find our code or paper useful, please cite
```bibtex
@article{Yu2022MonoSDF,
  author  = {Yu, Zehao and Peng, Songyou and Niemeyer, Michael and Sattler, Torsten and Geiger, Andreas},
  title   = {MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction},
- journal = {arXiv:2022.00665},
  journal = {Advances in Neural Information Processing Systems (NeurIPS)},
  year    = {2022},
}
```

code/confs/dtu_grids_3views.conf

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
train{
    expname = dtu_grids_3views
    dataset_class = datasets.scene_dataset.SceneDatasetDN
    model_class = model.network.MonoSDFNetwork
    loss_class = model.loss.MonoSDFLoss
    learning_rate = 5.0e-4
    lr_factor_for_grid = 1.0
    num_pixels = 1024
    checkpoint_freq = 100
    plot_freq = 10
    split_n_pixels = 1024
}
plot{
    plot_nimgs = 1
    resolution = 512
    grid_boundary = [-1.2, 1.2]
}
loss{
    rgb_loss = torch.nn.L1Loss
    eikonal_weight = 0.1
    smooth_weight = 0.005
    depth_weight = 0.1
    normal_l1_weight = 0.05
    normal_cos_weight = 0.05
    end_step = 12800
}
dataset{
    data_dir = DTU
    img_res = [384, 384]
    scan_id = 65
    center_crop_type = center_crop_for_dtu
    num_views = 3
}
model{
    feature_vector_size = 256
    scene_bounding_sphere = 5.0

    Grid_MLP = True

    implicit_network
    {
        d_in = 3
        d_out = 1
        dims = [ 256, 256]
        geometric_init = True
        bias = 0.6
        skip_in = [4]
        weight_norm = True
        multires = 6
        use_grid_feature = True
        divide_factor = 5.0 # 1.5 for replica, 6 for dtu, 3.5 for tnt, 1.5 for bmvs, we need it to normalize the points range for multi-res grid
    }
    rendering_network
    {
        mode = idr
        d_in = 9
        d_out = 3
        dims = [ 256, 256] #, 256, 256]
        weight_norm = True
        multires_view = 4
    }
    density
    {
        params_init{
            beta = 0.1
        }
        beta_min = 0.0001
    }
    ray_sampler
    {
        near = 2.0
        N_samples = 64
        N_samples_eval = 128
        N_samples_extra = 32
        eps = 0.1
        beta_iters = 10
        max_total_iters = 5
    }
}
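The `loss{}` block above corresponds to a weighted sum of the reconstruction terms; schematically (illustrative only — see `model/loss.py`, the configured `loss_class`, for the real implementation, which for example aligns scale and shift before comparing rendered depth to the monocular depth cue):
```python
# Schematic combination of the loss weights configured above (illustrative).
def total_loss(rgb_l1, eikonal, smooth, depth, normal_l1, normal_cos):
    return (rgb_l1
            + 0.1 * eikonal       # eikonal_weight
            + 0.005 * smooth      # smooth_weight
            + 0.1 * depth         # depth_weight
            + 0.05 * normal_l1    # normal_l1_weight
            + 0.05 * normal_cos)  # normal_cos_weight
```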

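These `.conf` files use HOCON syntax; assuming the `pyhocon` package (an assumption here — VolSDF-derived code commonly uses it, but check `requirements.txt`), one can be inspected like this:
```python
# Sketch: parse the HOCON config with pyhocon (assumed dependency).
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file("code/confs/dtu_grids_3views.conf")
print(conf.get_float("train.learning_rate"))  # 0.0005
print(conf.get_list("dataset.img_res"))       # [384, 384]
print(conf.get_string("loss.rgb_loss"))       # torch.nn.L1Loss
```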