We now have sd-webui-GPT4V-Image-Captioner for SD WebUI
This is a multifunctional image processing toolbox built with Gradio, capable of tagging images using the GPT-4-vision or Claude 3 API, the cogVLM model, Qwen-VL(Alibaba Cloud), the Moondream model.
Key features include:
- One-click installation and use
- Single image and multi-image batch tagging
- Choice of online GPT4V or Claude 3 or Qwen-VL(Alibaba Cloud) & local CogVLM and Moondream models
- Visual tag analysis and processing
- Image pre-compression
- Keyword filtering and watermark image recognition
Developers: Jiaye, LEOSAM是只兔狲, SleeeepyZhou, Fok, GPT4. Welcome everyone to add more new features to this project.
To use Claude 3, simply replace the API key and URL with the Claude 3 API key and URL (/v1/messages), and changing the model name to "claude-3-opus" (or sonnet).
Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)
- Open Command Prompt as administrator and navigate to the directory where you want to clone the repository.
- Clone the repository using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
- Double-click
install_windows.bat
to run and install all necessary dependencies. - After the installation is complete, you can launch the GPT4V-Image-Captioner by double-clicking
start_windows.bat
. - Hold down Ctrl and click on the URL in the terminal (or copy the URL to your browser), which will open the Gradio app interface in your default browser.
- Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.
- Open a terminal and navigate to the directory where you want to clone the repository.
- Clone the repository using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
- Navigate to the cloned directory:
cd GPT4V-Image-Captioner
- Make the install and start scripts executable with the following command:
chmod +x install_linux_mac.sh; chmod +x start_linux_mac.sh
- Execute the install script:
./install_linux_mac.sh
- Launch the GPT4V-Image-Captioner in the terminal by executing the launch script:
./start_linux_mac.sh
- Copy the URL displayed in the terminal and open it in your browser to access the Gradio app interface.
- Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.
-
Open the Command Prompt by pressing
Win + R
, typingcmd
, and then pressingEnter
. -
Clone the repository to your local machine using the following command:
git clone https://github.com/jiayev/GPT4V-Image-Captioner
-
Once cloning is complete, navigate to the cloned directory:
cd GPT4V-Image-Captioner
-
Before installing any dependencies, make sure that Python is installed on your system. Check for Python's presence by typing the following command and pressing
Enter
in the Command Prompt:python --version
If Python is not installed, you will get an error message. In that case, please visit the Python official download page and follow the instructions to install it.
-
Create a virtual environment named
myenv
to avoid contaminating the global Python environment:python -m venv myenv
-
Activate the virtual environment you just created:
myenv\Scripts\activate
-
Update
pip
to date:python -m pip install --upgrade pip
-
Install libraries within the virtual environment:
pip install scipy networkx wordcloud matplotlib Pillow tqdm gradio requests
-
After completing the steps above, you can start GPT4V-Image-Captioner by double-clicking the
start_windows.bat
file.