This project extracts data from PDF files containing details about political bonds, stores this data in a SQLite database, and provides a Streamlit web interface to interact with the data. Users can query the database using natural language through the web interface, which leverages an AI model to convert these queries into SQL statements.
- PDF Data Extraction: Extracts tables from PDF documents and parses them into structured data.
- Database Storage: Loads extracted data into a SQLite database for easy querying.
- Natural Language Queries: Allows users to enter queries in natural language and converts these into SQL queries.
- Streamlit Web Interface: Provides a user-friendly interface to interact with the data.
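As a rough illustration of the storage step, the sketch below loads a few parsed rows into SQLite and runs an aggregate query using only the standard-library `sqlite3` module. The table name `bonds`, its columns, and the sample rows are hypothetical placeholders, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical rows standing in for what the PDF table parser might produce.
rows = [
    ("2023-01-10", "Party A", 1000000),
    ("2023-02-15", "Party B", 500000),
    ("2023-03-01", "Party A", 250000),
]

def load_rows(conn, rows):
    """Create the (assumed) bonds table and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bonds ("
        "purchase_date TEXT, party TEXT, amount INTEGER)"
    )
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO bonds VALUES (?, ?, ?)", rows)
    conn.commit()

def total_by_party(conn):
    """Return (party, total amount) pairs ordered by party name."""
    return conn.execute(
        "SELECT party, SUM(amount) FROM bonds GROUP BY party ORDER BY party"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # in-memory database for the example
load_rows(conn, rows)
totals = total_by_party(conn)
```

An in-memory database keeps the example self-contained; the application itself would connect to a database file instead.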
- Python 3.8+
- Streamlit
- SQLite3
- Additional Python libraries: pandas, sqlalchemy, tqdm, transformers[torch]
- Clone the repository:
  git clone https://your-repository-url
  cd political_bonds
- Install required Python packages:
  pip install -r requirements.txt
Ensure that the PDF files are placed in the static/pdf/ directory.
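One way to confirm the PDFs are in place before starting the app is a quick check like the sketch below. The helper name `find_pdfs` is hypothetical and not part of the project; only the `static/pdf` path comes from the instructions above.

```python
from pathlib import Path

def find_pdfs(pdf_dir="static/pdf"):
    """Return the PDF files directly inside pdf_dir, sorted by path.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(pdf_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

if __name__ == "__main__":
    pdfs = find_pdfs()
    if pdfs:
        print(f"Found {len(pdfs)} PDF file(s) in static/pdf/")
    else:
        print("No PDF files found in static/pdf/")
```

Run it from the repository root so that the relative path resolves correctly.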
To run the Streamlit application:
streamlit run main.py
This will start the server and open the web interface in your default web browser.
- Loading Data: Click the 'Load Data' button on the web interface to extract data from the PDFs and load it into the database.
- Querying Data: Enter your natural language query in the text input box and press 'Execute Query' to see the results.
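Because the SQL is generated by a model, it can be worth checking a statement before running it. The sketch below shows one conservative approach, assuming the app should only ever read from the database: it accepts a single `SELECT` statement and rejects everything else. The function name `run_readonly_query` and the `bonds` table are hypothetical, and this is not necessarily how the project executes queries.

```python
import sqlite3

def run_readonly_query(conn, sql):
    """Execute sql only if it is a single SELECT statement.

    Returns (column_names, rows). Raises ValueError for anything else.
    The check is deliberately strict: it also rejects queries containing
    a semicolon inside a string literal and queries that start with WITH.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(statement)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()

# Example with a throwaway in-memory table (placeholder data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonds (party TEXT, amount INTEGER)")
conn.execute("INSERT INTO bonds VALUES ('Party A', 100)")
columns, rows = run_readonly_query(conn, "SELECT party, amount FROM bonds;")
```

A prefix check like this is only a first line of defense; opening the SQLite file in read-only mode would be a stronger safeguard.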