This repository contains the training scripts of our solution, which won 2nd place in Arabicthon 2022 in KSA.
Arabicthon is a deep learning competition organized by The King Salman Global Academy for the Arabic Language. It featured three tracks:
- The Arabic poetry challenge.
- The lexicon challenge.
- The Arabic language games for kids.
We chose to work on the Arabic poetry challenge. We built a web app containing multiple tools for processing Arabic poetry, such as:
- Poem generation based on the rhyme and prosody.
- Poem generation based on a picture.
- Verse completion given a rhyme.
- Meter classification without diacritics.
- Arabic poetry automatic diacritics.
- Aroud generation without the need for diacritics.
- And many other variants of these tools ...
A React app contains the frontend of the project. You can find it here: arabicthon_frontend
A Flask app contains the backend of the project. You can find it here: arabicthon_backend
The poem generator was trained by fine-tuning AraGPT2-medium on a dataset of ~1M Arabic poem verses. You can find the training file here: poem_generation_notebook.ipynb
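The fine-tuning loop can be sketched as follows. To keep the sketch self-contained, a tiny randomly initialized GPT-2 stands in for AraGPT2-medium (the real model would be loaded with `from_pretrained`), and the batch of token ids is a placeholder for the tokenized verse dataset; all sizes here are illustrative.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny random-init GPT-2 stands in for AraGPT2-medium (hypothetical sizes);
# the real run would load the pretrained checkpoint instead.
config = GPT2Config(vocab_size=1000, n_layer=2, n_head=2, n_embd=64, n_positions=128)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder batch; in practice these are tokenized, padded verses
# from the ~1M-verse dataset.
input_ids = torch.randint(0, 1000, (4, 32))

model.train()
outputs = model(input_ids=input_ids, labels=input_ids)  # causal-LM cross-entropy loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

One such step per batch, repeated over the dataset, is the whole fine-tuning loop; generation then samples continuations from the fine-tuned model.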
This classifier is based on 3 layers of bi-directional LSTMs, trained on the same dataset to classify poem verses into 10 meters. You can find the training file here: meter-classification-lstm.ipynb
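A minimal sketch of such a classifier, assuming character-level input; the vocabulary, embedding, and hidden sizes are illustrative, not the repository's actual hyperparameters.

```python
import torch
import torch.nn as nn

class MeterClassifier(nn.Module):
    """3-layer bidirectional LSTM over character embeddings,
    mapping a verse to one of 10 meter classes (sizes are illustrative)."""
    def __init__(self, vocab_size=64, embed_dim=128, hidden_dim=256, n_meters=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden_dim, n_meters)

    def forward(self, x):                  # x: (batch, seq_len) character ids
        h, _ = self.lstm(self.embed(x))    # (batch, seq_len, 2*hidden_dim)
        return self.fc(h[:, -1, :])        # last timestep -> (batch, n_meters)

model = MeterClassifier()
logits = model(torch.randint(0, 64, (8, 50)))  # 8 verses, 50 characters each
print(logits.shape)  # torch.Size([8, 10])
```

Training is then standard cross-entropy over the 10 meter labels.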
This is a seq2seq LSTM-based model that takes a verse without diacritics and outputs its Aroud form. You can find the training file here: aroud-lstm.ipynb
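The seq2seq setup can be sketched as an LSTM encoder-decoder with teacher forcing; vocabulary and layer sizes below are placeholders, not the notebook's actual configuration.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder: encodes an undiacritized verse and
    decodes its Aroud form token by token (teacher forcing at train time)."""
    def __init__(self, src_vocab=64, tgt_vocab=64, embed_dim=128, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_embed(src))   # final (h, c) summarizes the verse
        dec, _ = self.decoder(self.tgt_embed(tgt), state)
        return self.out(dec)                           # (batch, tgt_len, tgt_vocab)

model = Seq2Seq()
src = torch.randint(0, 64, (4, 40))   # verse characters
tgt = torch.randint(0, 64, (4, 45))   # shifted Aroud-form targets
logits = model(src, tgt)
print(logits.shape)  # torch.Size([4, 45, 64])
```

At inference time the decoder would instead be unrolled greedily from a start token, feeding each predicted symbol back in.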
This model is used to generate Arabic poetry based on an input image. It takes an image as input and outputs its caption in English. You can find the training file here: image-captioning-with-attention.ipynb.
The caption is then translated into Arabic and fed to the GPT-based generator as a first verse. The generator then produces verses constrained by the rhyme and meter conditions while preserving as much context as possible from the image caption.
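The data flow of this pipeline can be sketched as one glue function. All component names and signatures below are hypothetical stand-ins for the actual captioning, translation, and generation models.

```python
def image_to_poem(image, caption_model, translate, generate_poem,
                  rhyme, meter, n_verses=4):
    """Hypothetical glue code: caption the image in English, translate the
    caption to Arabic, then let the constrained generator continue it."""
    english_caption = caption_model(image)
    first_verse = translate(english_caption)      # Arabic seed verse
    return generate_poem(seed=first_verse, rhyme=rhyme,
                         meter=meter, n_verses=n_verses)

# Stub components, just to show how the pieces connect:
poem = image_to_poem(
    image=None,
    caption_model=lambda img: "a horse running in the desert",
    translate=lambda text: "<arabic translation>",
    generate_poem=lambda seed, rhyme, meter, n_verses: [seed] * n_verses,
    rhyme="r", meter="tawil",
)
print(len(poem))  # 4
```

The real generator would enforce the rhyme and meter constraints during decoding rather than simply repeating the seed.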
This is a seq2seq LSTM-based model that takes a verse as input and predicts the most suitable diacritics. You can find the training file here: tachkil_notebook.ipynb.
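Since input and output characters are aligned one-to-one in diacritization, one common framing is sequence labeling: predict a diacritic class for every input character. The sketch below shows that variant with a single BiLSTM (the repository's model is a seq2seq LSTM instead, and all sizes here are illustrative).

```python
import torch
import torch.nn as nn

class Diacritizer(nn.Module):
    """Sequence-labeling view of diacritization: a BiLSTM tagger that
    emits one diacritic class per input character (illustrative sizes;
    the repository's actual model is seq2seq)."""
    def __init__(self, vocab_size=64, n_diacritics=15, embed_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_diacritics)

    def forward(self, chars):               # chars: (batch, seq_len)
        h, _ = self.lstm(self.embed(chars))
        return self.out(h)                  # one diacritic logit vector per character

model = Diacritizer()
tags = model(torch.randint(0, 64, (2, 30)))
print(tags.shape)  # torch.Size([2, 30, 15])
```

Either framing is trained with per-position cross-entropy against the gold diacritics.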