GPT4V-Image-Captioner / GPT4V图像打标器

We now have sd-webui-GPT4V-Image-Captioner for SD WebUI

This is a multifunctional image processing toolbox built with Gradio, capable of tagging images using the GPT-4-vision or Claude 3 API, the cogVLM model, Qwen-VL(Alibaba Cloud), the Moondream model.

Key features include:

One-click installation and use
Single image and multi-image batch tagging
Choice of online GPT4V or Claude 3 or Qwen-VL(Alibaba Cloud) & local CogVLM and Moondream models
Visual tag analysis and processing
Image pre-compression
Keyword filtering and watermark image recognition

Developers: Jiaye, LEOSAM是只兔狲, SleeeepyZhou, Fok, GPT4. Welcome everyone to add more new features to this project.

Please note that the Claude 3 feature is not finished yet.

To use Claude 3, simply replace the API key and URL with the Claude 3 API key and URL (/v1/messages), and changing the model name to "claude-3-opus" (or sonnet).

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Open Command Prompt as administrator and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Double-click install_windows.bat to run and install all necessary dependencies.
After the installation is complete, you can launch the GPT4V-Image-Captioner by double-clicking start_windows.bat.
Hold down Ctrl and click on the URL in the terminal (or copy the URL to your browser), which will open the Gradio app interface in your default browser.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Linux / macOS

Open a terminal and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Make the install and start scripts executable with the following command:
```
chmod +x install_linux_mac.sh; chmod +x start_linux_mac.sh
```
Execute the install script:
```
./install_linux_mac.sh
```
Launch the GPT4V-Image-Captioner in the terminal by executing the launch script:
```
./start_linux_mac.sh
```
Copy the URL displayed in the terminal and open it in your browser to access the Gradio app interface.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Windows Manual Installation Instructions

Open the Command Prompt by pressing Win + R, typing cmd, and then pressing Enter.
Clone the repository to your local machine using the following command:
```
git clone https://github.com/jiayev/GPT4V-Image-Captioner
```
Once cloning is complete, navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Before installing any dependencies, make sure that Python is installed on your system. Check for Python's presence by typing the following command and pressing Enter in the Command Prompt:
```
python --version
```
If Python is not installed, you will get an error message. In that case, please visit the Python official download page and follow the instructions to install it.
Create a virtual environment named myenv to avoid contaminating the global Python environment:
```
python -m venv myenv
```
Activate the virtual environment you just created:
```
myenv\Scripts\activate
```
Update pip to date:
```
python -m pip install --upgrade pip
```

Install libraries within the virtual environment:

pip install scipy networkx wordcloud matplotlib Pillow tqdm gradio requests

After completing the steps above, you can start GPT4V-Image-Captioner by double-clicking the start_windows.bat file.

Name		Name	Last commit message	Last commit date
Latest commit History 278 Commits
install_script		install_script
lib		lib
moondream		moondream
omnilmm		omnilmm
promptenv/Lib/site-packages		promptenv/Lib/site-packages
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README-CN.md		README-CN.md
README.md		README.md
gpt-caption.py		gpt-caption.py
install_linux_mac.sh		install_linux_mac.sh
install_windows.bat		install_windows.bat
omnichat.py		omnichat.py
openai_api.py		openai_api.py
saved_prompts.csv		saved_prompts.csv
start_linux_mac.sh		start_linux_mac.sh
start_windows.bat		start_windows.bat
thread-caption.py		thread-caption.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPT4V-Image-Captioner / GPT4V图像打标器

Please note that the Claude 3 feature is not finished yet.

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Linux / macOS

Windows Manual Installation Instructions

About

Releases

Packages

Languages

License

TheMistoAI/GPT4V-Image-Captioner

Folders and files

Latest commit

History

Repository files navigation

GPT4V-Image-Captioner / GPT4V图像打标器

Please note that the Claude 3 feature is not finished yet.

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Linux / macOS

Windows Manual Installation Instructions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages