PDFChatAnnotator: A Human-LLM Collaborative Multi-Modal Data Annotation Tool for PDF-Format Catalogs.
PDFChatAnnotator is a collaborative annotation tool that leverages the strengths of both human experts and Large Language Models (LLMs) to annotate multi-modal data in PDF-format catalogs. It is designed to streamline and enhance the annotation process through interactive workflows and intelligent suggestions.
📄 Related Publication:
This project is based on our research paper published at ACM IUI 2024:
PDFChatAnnotator: A Human-LLM Collaborative Multi-Modal Data Annotation Tool for PDF-Format Catalogs
Current Version: 2.0
- In version 1.0, data was saved to a MySQL database, which required additional setup and configuration.
- In version 2.0, to simplify installation and use, especially for users without a computer science background, we have switched to saving annotation results directly into Excel files (`.xlsx` format).
This change makes the tool more accessible and easier to use out of the box.
The currently supported catalog types are:
- Each page's images are associated only with the text content on that page (a).
- All images that appear before the start of a new page's text are associated with the text content on the current page (b).
- On a page with multiple image-text matching pairs, each image is associated with the text content below it (c).
⚠️ Due to the high correlation with the inherent characteristics of the catalog type, it is currently not open source.
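The three association rules listed above can be illustrated with a minimal Python sketch. It is not the project's own code: the `Block` record, its fields, and the three function names are hypothetical, and the sketch assumes every text or image block already carries a page number and a vertical position.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text or image block extracted from a catalog page (hypothetical model)."""
    kind: str   # "text" or "image"
    ref: str    # text snippet or image file name
    page: int   # 1-based page number
    y: float    # distance from the top of the page; larger means lower


def match_per_page(blocks):
    """Type (a): images are grouped with the text found on the same page."""
    result = {}
    for b in blocks:
        entry = result.setdefault(b.page, {"texts": [], "images": []})
        entry["texts" if b.kind == "text" else "images"].append(b.ref)
    return result


def match_until_next_text(blocks):
    """Type (b): in reading order, each image joins the most recent text,
    until a new text block starts. Assumes text refs are unique."""
    result = {}
    current = None
    for b in sorted(blocks, key=lambda b: (b.page, b.y)):
        if b.kind == "text":
            current = b.ref
            result[current] = []
        elif current is not None:
            result[current].append(b.ref)
    return result


def match_text_below(blocks):
    """Type (c): each image is paired with the nearest text below it on its page."""
    pairs = {}
    for img in (b for b in blocks if b.kind == "image"):
        below = [t for t in blocks
                 if t.kind == "text" and t.page == img.page and t.y > img.y]
        if below:
            pairs[img.ref] = min(below, key=lambda t: t.y - img.y).ref
    return pairs
```

Real catalogs would need layout coordinates from a PDF parser; the sketch only shows how the three rules differ once those coordinates are available.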
- Python 3.9
- Anaconda (recommended for environment management)
- Visual Studio Code (recommended IDE)
- Download the project:
  - Visit: https://github.com/VanillaTY/PDFChatAnnotator
  - Click the green `Code` button and select `Download ZIP`
  - Extract the ZIP file to your preferred location (e.g., Desktop)
- Open the project in VS Code:
  - Drag the extracted folder into VS Code
  - If prompted with "Do you trust the authors?", select "Yes"
- Install Anaconda:
  - Download from: https://www.anaconda.com/download
  - Follow the installation wizard
  - For Windows users: Add Anaconda to the system PATH during installation
- Create and activate the environment:
    conda create -n pdfannotator python=3.9
    conda activate pdfannotator
- Install project dependencies:
    pip install -r requirements.txt
- Install OS-specific dependencies:
  - Windows: `pip install pyreadline3`
  - macOS: `pip install readline`
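As a quick sanity check on the installation steps above, the following sketch reports whether the running interpreter is Python 3.9 and which OS-specific package applies. The function names are hypothetical; the package names come from the list above, and `platform.system()` returns `"Windows"` on Windows and `"Darwin"` on macOS.

```python
import platform
import sys

# Python version named in the prerequisites.
REQUIRED_PYTHON = (3, 9)

# OS name as reported by platform.system() -> extra package from the steps above.
OS_PACKAGES = {"Windows": "pyreadline3", "Darwin": "readline"}


def python_version_ok(version_info=None):
    """Return True if the (major, minor) version matches REQUIRED_PYTHON."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) == REQUIRED_PYTHON


def os_specific_package(system_name=None):
    """Return the extra package for this OS, or None if none is listed."""
    return OS_PACKAGES.get(system_name or platform.system())


if __name__ == "__main__":
    print("Python 3.9 active:", python_version_ok())
    print("Extra package to install:", os_specific_package())
```

Running it inside the activated `pdfannotator` environment shows whether the environment was created with the expected Python version.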
- Obtain your API key:
  - Visit: https://api.chatanywhere.tech/#/
  - Purchase a plan and get your API key
- Configure the API key:
  - Open `utils/prompt.py`
  - Replace the placeholders with your API key and base URL:
      api_key = "your_api_key_here"
      base_url = "your_base_url_here"
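One optional way to keep the key out of the source file is to read it from environment variables and fail early if the placeholders are still in place. This is a sketch, not the project's code: the environment variable names `PDFCHAT_API_KEY` and `PDFCHAT_BASE_URL` and the `load_api_settings` helper are assumptions; only the `api_key` and `base_url` names and their placeholder strings come from the step above.

```python
import os

# Placeholder strings shipped in the configuration step above.
PLACEHOLDERS = {"your_api_key_here", "your_base_url_here"}


def load_api_settings(env=None):
    """Read the key and base URL from the environment (hypothetical variable names).

    Raises ValueError if either value is missing or still a placeholder.
    """
    env = os.environ if env is None else env
    api_key = env.get("PDFCHAT_API_KEY", "your_api_key_here")
    base_url = env.get("PDFCHAT_BASE_URL", "your_base_url_here")
    missing = [
        name
        for name, value in (("api_key", api_key), ("base_url", base_url))
        if not value or value in PLACEHOLDERS
    ]
    if missing:
        raise ValueError("Set real values for: " + ", ".join(missing))
    return api_key, base_url
```

Under these assumptions, `utils/prompt.py` could assign `api_key, base_url = load_api_settings()` instead of hard-coding the two strings.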
- Activate the environment:
    conda activate pdfannotator
- Start the development server:
    python manage.py runserver
- Access the application:
  - Open your browser
  - Navigate to: http://127.0.0.1:8000/
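To confirm that the development server is reachable before opening the browser, a small standard-library check can be used. This is a sketch: the `is_server_up` helper is hypothetical, and its default URL is the address given above.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://127.0.0.1:8000/", timeout=3.0):
    """Return True if an HTTP server answers at url, even with an error status."""
    # Bypass any configured proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 404 or 500), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or another network failure.
        return False
```

If the function returns `False`, the usual causes are that `python manage.py runserver` is not running or that it is listening on a different port.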
Before launching the system, you must preprocess your PDF file(s) to extract necessary text and image data.
Please follow the guide below before running the application:
Preprocessing requires a GPU-enabled environment and prepares the data needed for annotation.
For daily use:
- Open VS Code and load the project
- Open a terminal and run:
    conda activate pdfannotator
    python manage.py runserver
- Access http://127.0.0.1:8000/ in your browser
For more detailed installation instructions, please refer to the Installation Guide.


