We build our environment following SpatialRGPT's dataset_pipeline installation, as shown below:
conda create -n GenSpace python=3.10
conda activate GenSpace
##### Install PyTorch according to your own setup #####
pip install torch==2.2.2 torchaudio==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121
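Optionally, verify that the build matches your CUDA setup (a quick sanity check we add here, not part of the original instructions):
# should print the torch version and True if the GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"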
# install openmim for mmengine
pip install -U openmim
mim install mmengine
# Install Wis3D for visualization (optional)
pip install https://github.com/zju3dv/Wis3D/releases/download/2.0.0/wis3d-2.0.0-py3-none-any.whl
# Install detectron2 for SOM visualization
pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Some other libraries
pip install iopath pyequilib==0.3.0 albumentations einops open3d imageio
# This may take a lot of time
pip install mmcv==2.0.0 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html
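As an optional sanity check that mmengine and mmcv import cleanly (our addition, not from the original instructions):
python -c "import mmengine, mmcv; print(mmengine.__version__, mmcv.__version__)"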
Next, install the Orient-Anything repo, following its own introduction:
cd osdsynth
git clone https://github.com/SpatialVision/Orient-Anything.git
# rename it to Orient_Anything
mv ./Orient-Anything/ ./Orient_Anything/
cd Orient_Anything
pip install -r requirements.txt
# Move Rotation.py into Orient_Anything
cd ..
mv Rotation.py Orient_Anything/
cd ..  # return to the repository root
Follow the instructions provided by SpatialRGPT to install Grounded-Segment-Anything:
mkdir osdsynth/external && cd osdsynth/external
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything/
# Install Segment Anything
python -m pip install -e segment_anything
# Install Grounding DINO
pip install --no-build-isolation -e GroundingDINO
# Install RAM
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install setuptools --upgrade
pip install -e ./recognize-anything/
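Optionally, verify that the three packages import cleanly (the import names segment_anything, groundingdino, and ram are assumed from their repos; treat this as a sketch):
python -c "import segment_anything, groundingdino, ram; print('ok')"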
Next, install PerspectiveFields. From the repository root:
cd osdsynth/external
git clone https://github.com/jinlinyi/PerspectiveFields.git
Run the .sh script to download the required checkpoints:
bash download_checkpoints.sh
Prepare the image and text data and organize them in the following format:
data
|---t2i
|   |---0
|   |   |---0.png
|   |   |---0.txt
|   |   ...
|   |
|   |---1
|   ...
|
|---imageedit_unedit
|---imageedit
Store the images generated by the T2I task in the t2i folder, each with a .txt file listing its object categories. The text should look like "<cat> <dog>": every object type is enclosed in <> and types are separated by spaces.
The other two folders follow the same layout: imageedit_unedit stores the pre-edit images from the image-editing task together with the .txt files describing the object categories in each image, and imageedit stores the edited images and their corresponding .txt files.
The subfolders 0, 1, etc. under t2i, imageedit, and imageedit_unedit each correspond to a specific task type. An example layout is sketched below.
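A minimal sketch of creating one T2I sample (the image name here is made up for illustration):
# sample 0 for the T2I task: the image plus a category file listing a cat and a dog
mkdir -p data/t2i/0
cp your_generated_image.png data/t2i/0/0.png
echo "<cat> <dog>" > data/t2i/0/0.txt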
For the T2I task, here is an example:
python run_t2i.py --config configs/v2.py --input example/t2i
For the Imageedit task, here is also an example:
# Preprocess these files to extract information such as the position and orientation of objects before editing (if necessary)
python run_imageedit_preprocess.py --config configs/v2.py --input example/imageedit_unedit
python run_imageedit.py --config configs/v2.py --input example/imageedit
For the sub-domain Complex Relation, inference differs from the other sub-domains. Here is an example:
# Preprocess these files to extract information such as the position and orientation of objects before editing (if necessary)
python run_imageedit_CR_preprocess.py --config configs/v2.py --input example/imageedit_CR_unedit
python run_imageedit_CR.py --config configs/v2.py --input example/imageedit_CR