This application processes .json files extracted from the CDLI (Cuneiform Digital Library Initiative) and transfers the data to a SQLite database. The main goal is to provide a user-friendly interface that automatically cleans up and processes the data from the CDLI.
It is intended to work in conjunction with the CDLI API Client application I've developed, which you can get over here. You can also use the official CDLI framework API client directly.
As with my CDLI API client application, I've relied quite a lot on LLMs to write the code for these Python files: first on ChatGPT-4 mini and now on Claude 3.5 Sonnet. The code is now organized into a more modular structure, but it still needs some cleaning up. In the meantime, feel free to submit improvements, fork the repository, etc.
Just install Python 3.8 or higher from the official website to run the code. For Linux users, if I'm not mistaken, Python is shipped with most distributions, so you won't have to install anything else!
- Clone the repository:
git clone https://github.com/ili-yahu/cdli-json-export-processor.git
cd cdli-json-export-processor
- Install the required Python packages:
pip install -r requirements.txt
The application is organized into several modules:
cdli-json-export-processor/
├── database/ # Database operations and models
│ ├── entity_config.py
│ ├── processor.py
│ └── tables_config.py
├── gui/ # User interface components
│ ├── credits_tab.py
│ ├── help_tab.py
│ ├── home_tab.py
│ ├── import_tab.py
│ ├── main_window.py
│ └── options_tab.py
├── ui/ # Additional UI components
│ └── progress_tracker.py
├── utils/ # Utility functions
│ ├── config_manager.py
│ ├── file_handler.py
│ ├── logger.py
│ └── text_cleaner.py
├── .gitignore # Git ignore file
├── config.json # Configuration file
├── info.py # Version info
├── main.py # Entry point
├── README.md # Documentation
└── requirements.txt # Dependencies
- Run main.py to start the application
- Use the GUI to:
- Create/select a SQLite database
- Select and clean JSON files
- Send the data to the database
The program will automatically clean up the JSON files and format them for proper database insertion.
- If you encounter any issues, please enable the logging options in the help tab and check the logs in the /logs directory.
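
To give a rough idea of what "clean up the JSON files and format them for database insertion" can look like, here is a minimal, self-contained sketch. It is not the code from this repository: the function names (`clean_text`, `clean_record`, `insert_artifacts`), the `artifacts` table, and the `id` / `designation` fields are all assumptions made for illustration, and the real modules in database/ and utils/ may work quite differently.

```python
import json
import re
import sqlite3


def clean_text(value):
    """Collapse runs of whitespace and strip the ends; pass non-strings through."""
    if not isinstance(value, str):
        return value
    return re.sub(r"\s+", " ", value).strip()


def clean_record(record):
    """Return a copy of a JSON object with every top-level string value cleaned."""
    return {key: clean_text(value) for key, value in record.items()}


def insert_artifacts(conn, records):
    """Insert cleaned records into a hypothetical `artifacts` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "id INTEGER PRIMARY KEY, designation TEXT)"
    )
    rows = [(r.get("id"), r.get("designation")) for r in map(clean_record, records)]
    # INSERT OR REPLACE keeps a re-import of the same file from failing on duplicate ids.
    conn.executemany(
        "INSERT OR REPLACE INTO artifacts (id, designation) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


# Made-up input shaped like a list of JSON objects (not real CDLI data).
raw = (
    '[{"id": 1, "designation": "  Sample   tablet\\n A "},'
    ' {"id": 2, "designation": "Sample tablet B"}]'
)
records = json.loads(raw)

conn = sqlite3.connect(":memory:")  # pass a file path instead for a persistent database
inserted = insert_artifacts(conn, records)
stored = conn.execute("SELECT id, designation FROM artifacts ORDER BY id").fetchall()
print(inserted, stored)
```

The sketch uses parameterized queries (`?` placeholders) so that text from the JSON files is never spliced into the SQL string, and an in-memory database so it can be run without touching any files.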