This repository contains the training scripts of our solution, which won 2nd place in Arabicthon 2022 in KSA.
Arabicthon is a deep learning competition organized by The King Salman Global Academy for the Arabic Language. It featured three tracks:
- The Arabic poetry challenge.
- The lexicon challenge.
- The Arabic language games for kids.
We chose to work on the Arabic poetry challenge. We built a web app containing multiple tools for processing Arabic poetry, such as:
- Poem generation based on the rhyme and prosody.
- Poem generation based on a picture.
- Verse completion given a rhyme.
- Meter classification without diacritics.
- Arabic poetry automatic diacritics.
- Aroud generation without the need for diacritics.
- And many other variants of these tools ...
A React app contains the frontend of the project. You can find it here: arabicthon_frontend
A Flask app contains the backend of the project. You can find it here: arabicthon_backend
The poem generator was trained by fine-tuning AraGPT2-medium on a dataset of ~1M Arabic poem verses. You can find the training file here: poem_generation_notebook.ipynb
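The fine-tuning loop can be sketched as follows. To keep the sketch self-contained, a tiny randomly initialized GPT-2 stands in for AraGPT2-medium (the real model would be loaded with `from_pretrained`), and the batch of token ids is a placeholder for the tokenized verse dataset; all sizes here are illustrative.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random-init GPT-2 stands in for AraGPT2-medium (hypothetical sizes);
# the real run would load the pretrained checkpoint instead.
config = GPT2Config(vocab_size=1000, n_layer=2, n_head=2, n_embd=64, n_positions=128)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder batch; in practice these are tokenized, padded verses
# from the ~1M-verse dataset.
input_ids = torch.randint(0, 1000, (4, 32))

model.train()
outputs = model(input_ids=input_ids, labels=input_ids)  # causal-LM cross-entropy loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

One such step per batch, repeated over the dataset, is the whole fine-tuning loop; generation then samples continuations from the fine-tuned model.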
This classifier is based on 3 layers of bi-directional LSTMs, trained on the same dataset to classify poem verses into 10 meters. You can find the training file here: meter-classification-lstm.ipynb
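A minimal sketch of such a classifier, assuming character-level input; the vocabulary, embedding, and hidden sizes are illustrative, not the repository's actual hyperparameters.

```python
import torch
import torch.nn as nn

class MeterClassifier(nn.Module):
    """3-layer bidirectional LSTM over character embeddings,
    mapping a verse to one of 10 meter classes (sizes are illustrative)."""
    def __init__(self, vocab_size=64, embed_dim=128, hidden_dim=256, n_meters=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden_dim, n_meters)

    def forward(self, x):                  # x: (batch, seq_len) character ids
        h, _ = self.lstm(self.embed(x))    # (batch, seq_len, 2*hidden_dim)
        return self.fc(h[:, -1, :])        # last timestep -> (batch, n_meters)

model = MeterClassifier()
logits = model(torch.randint(0, 64, (8, 50)))  # 8 verses, 50 characters each
print(logits.shape)  # torch.Size([8, 10])
```

Training is then standard cross-entropy over the 10 meter labels.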
This is a seq2seq LSTM-based model that takes a verse without diacritics and outputs its Aroud form. You can find the training file here: aroud-lstm.ipynb
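The seq2seq setup can be sketched as an LSTM encoder-decoder with teacher forcing; vocabulary and layer sizes below are placeholders, not the notebook's actual configuration.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder: encodes an undiacritized verse and
    decodes its Aroud form token by token (teacher forcing at train time)."""
    def __init__(self, src_vocab=64, tgt_vocab=64, embed_dim=128, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_embed(src))   # final (h, c) summarizes the verse
        dec, _ = self.decoder(self.tgt_embed(tgt), state)
        return self.out(dec)                           # (batch, tgt_len, tgt_vocab)

model = Seq2Seq()
src = torch.randint(0, 64, (4, 40))   # verse characters
tgt = torch.randint(0, 64, (4, 45))   # shifted Aroud-form targets
logits = model(src, tgt)
print(logits.shape)  # torch.Size([4, 45, 64])
```

At inference time the decoder would instead be unrolled greedily from a start token, feeding each predicted symbol back in.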
This model is used to generate Arabic poetry based on an input image. It takes an image as input and outputs its caption in English. You can find the training file here: image-captioning-with-attention.ipynb.
The caption is then translated into Arabic and fed to the GPT-based generator as a first verse. The generator then produces verses constrained by the rhyme and meter conditions while preserving as much context as possible from the image caption.
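The data flow of this pipeline can be sketched as one glue function. All component names and signatures below are hypothetical stand-ins for the actual captioning, translation, and generation models.

```python
def image_to_poem(image, caption_model, translate, generate_poem,
                  rhyme, meter, n_verses=4):
    """Hypothetical glue code: caption the image in English, translate the
    caption to Arabic, then let the constrained generator continue it."""
    english_caption = caption_model(image)
    first_verse = translate(english_caption)      # Arabic seed verse
    return generate_poem(seed=first_verse, rhyme=rhyme,
                         meter=meter, n_verses=n_verses)

# Stub components, just to show how the pieces connect:
poem = image_to_poem(
    image=None,
    caption_model=lambda img: "a horse running in the desert",
    translate=lambda text: "<arabic translation>",
    generate_poem=lambda seed, rhyme, meter, n_verses: [seed] * n_verses,
    rhyme="r", meter="tawil",
)
print(len(poem))  # 4
```

The real generator would enforce the rhyme and meter constraints during decoding rather than simply repeating the seed.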
This is a seq2seq LSTM-based model that takes a verse as input and predicts the most suitable diacritics. You can find the training file here: tachkil_notebook.ipynb.
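Since input and output characters are aligned one-to-one in diacritization, one common framing is sequence labeling: predict a diacritic class for every input character. The sketch below shows that variant with a single BiLSTM (the repository's model is a seq2seq LSTM instead, and all sizes here are illustrative).

```python
import torch
import torch.nn as nn

class Diacritizer(nn.Module):
    """Sequence-labeling view of diacritization: a BiLSTM tagger that
    emits one diacritic class per input character (illustrative sizes;
    the repository's actual model is seq2seq)."""
    def __init__(self, vocab_size=64, n_diacritics=15, embed_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_diacritics)

    def forward(self, chars):               # chars: (batch, seq_len)
        h, _ = self.lstm(self.embed(chars))
        return self.out(h)                  # one diacritic logit vector per character

model = Diacritizer()
tags = model(torch.randint(0, 64, (2, 30)))
print(tags.shape)  # torch.Size([2, 30, 15])
```

Either framing is trained with per-position cross-entropy against the gold diacritics.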