The repository for my bachelor thesis written at TUM. It contains a Stable Diffusion pipeline based on ALDM to convert semantic segmentation maps into realistic images.
The pipeline revolves around a core class, Pipeline, which uses a builder pattern to construct and execute image processing tasks. It encapsulates data in a stream that contains the image, metadata, and additional information such as bounding boxes or depth data. The data stream is represented as a dictionary or tuple, ensuring an ordered and immutable data flow through the pipeline. Several methods of the Pipeline class provide further functionality, such as looping over sub-pipelines, running processes in parallel, and handling multiple image inputs.
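As a rough illustration of this builder-style design, the sketch below shows how such a pipeline could be chained. All class, method, and field names here are illustrative assumptions for explanation purposes and do not reflect the repository's actual API.

```python
# Illustrative sketch only: names are assumptions, not the repository's real API.
from typing import Any, Callable, Dict, List, Tuple

# The data stream: an ordered, immutable tuple of dictionaries holding the
# image, metadata, and extras such as bounding boxes or depth maps.
Stream = Tuple[Dict[str, Any], ...]


class Pipeline:
    """Builder-style pipeline: each call registers a step, run() executes them."""

    def __init__(self) -> None:
        self._steps: List[Callable[[Stream], Stream]] = []

    def add(self, step: Callable[[Stream], Stream]) -> "Pipeline":
        self._steps.append(step)
        return self  # returning self enables method chaining (builder pattern)

    def loop(self, sub: "Pipeline", times: int) -> "Pipeline":
        """Register a step that runs a sub-pipeline repeatedly on the stream."""
        def _loop(stream: Stream) -> Stream:
            for _ in range(times):
                stream = sub.run(stream)
            return stream
        return self.add(_loop)

    def run(self, stream: Stream) -> Stream:
        for step in self._steps:
            stream = step(stream)
        return stream


# Hypothetical usage: attach metadata to every item in the stream.
def add_metadata(stream: Stream) -> Stream:
    return tuple({**item, "meta": {"source": "segmentation"}} for item in stream)


result = Pipeline().add(add_metadata).run(({"image": None, "boxes": []},))
```

Because every step consumes and returns a new stream instead of mutating it, intermediate results stay reproducible and steps can be freely recombined.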
This project was tested on Ubuntu and heavily relies on CUDA. Please make sure to have CUDA installed before the next steps.
To install this project, first set up the Conda environment defined in environment.yml:
```bash
conda env create -y -f environment.yml
conda activate sdpipeline
```
For the next step, PDM has to be installed. Please refer to the official page for installation instructions or use:

```bash
conda install -c conda-forge pdm
```
After that, the project can be installed with the following command:

```bash
python -m pdm install
```
This should install all modules inside this repository in editable mode. If you want to install them in production mode, please use:

```bash
python -m pdm install --prod
```
To install only selected modules, please modify the pyproject.toml file accordingly.
To make the different large modules work, configuration files often have to be placed in the data folder. Please refer to the individual modules for detailed instructions.
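Since the pipeline heavily relies on CUDA (see above), a quick way to verify that the GPU is visible from inside the freshly created environment is the snippet below. It assumes that PyTorch is installed through environment.yml, which a Stable Diffusion pipeline requires; adapt it if your setup differs.

```python
# Minimal CUDA sanity check (assumes PyTorch was installed via environment.yml).
import torch

if torch.cuda.is_available():
    print(f"CUDA available, using: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available - check your driver and CUDA installation.")
```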
To reproduce the results from the bachelor thesis yourself, you need all modules installed. Use the scripts
- 9_aldm_large.py
- 10_aldm+I2I_seg_large.py
- 11_aldm+I2I_seg_dep_large.py
- 12_aldm+I2I_seg_dep_large_first_person.py
for generation,
for dataset generation and
for the object detection via YOLOX.
All resulting YOLOX checkpoints can also be found via this link.
for image generation and
for testing. The results can also be found in the scripts (for the RMSE, in the plot.py file).
A deployed version of this code can be found on the server of the Institute for Human-Machine Communication. There you can also find all scripts, datasets and other results like the YOLOX weights. The Conda environment has the name "application". For more information feel free to contact me :)
The script folder contains many different scripts used for converting datasets into different formats or for the video generation part. In the future, these could also be integrated into the pipeline as modules.
The Sync and Share link contains all generated datasets.
A huge thanks to the Institute for Human-Machine Communication at TUM, Univ.-Prof. Dr.-Ing. habil. G. Rigoll and my advisors Philipp Wolters M.Sc. and Fabian Herzog Ph.D. for giving me a place to write my thesis and for the great support.
Also a huge thanks to Sonja Nagy and Fabian Lehr for proofreading :)