# Deep Multiclass Audio Classification

## Project structure

```bash
├── Coursera/
│   ├── soham/
│   │   ├── Coursera Assignments/
│   │   └── Coursera Notes/
│   └── Aanchal/
│       ├── Course1/
│       ├── Course2/
│       └── Course4/
├── EDA/
│   ├── esc-50-explore.ipynb
│   └── esc-preprocess-and-eda.ipynb
├── UI/
│   ├── test/
│   ├── audio_ui.py
│   ├── audio_ui2.py
│   ├── labels.py
│   ├── model.py
│   ├── yamnet.onnx
│   └── yamnet_inference.py
├── mini-projects/
│   ├── Aanchal/
│   │   ├── Audio Classification UrbanSound8k.ipynb
│   │   ├── NN_from_scratch.ipynb
│   │   └── Transfer learning with ResNet-50 cifar10.ipynb
│   └── Soham/
│       ├── Audio Classification UrbanSound8k/
│       ├── Neural-Network-from-scratch/
│       └── Transfer-learning-cifar10/
├── resnets_and_efficientnets/
│   ├── esc-dataset.ipynb
│   ├── esc-model1_2024-08-20_18-11-09.pth
│   ├── esc-transfer-learn.ipynb
│   ├── esc-transfer-learning2.ipynb
│   └── esc-utils.ipynb
├── yamnet/
│   ├── esc-dataset.ipynb
│   ├── esc-dataset2.xpynb
│   ├── esc-model1_20/
│   ├── esc-utils.ipynb
│   ├── esc-utils3.xpynb
│   ├── esc-yamnet.ipynb
│   ├── escyamnetdataset.xpynb
│   ├── getyamnet.xpynb
│   ├── yamnet-load.xpynb
│   └── yamnet.ipynb
├── LICENSE
└── README.md
```

## Table of Contents
- [Introduction](#introduction)
- [Description](#description)
- [Tech Stack](#tech-stack)
- [Contributors](#contributors)
- [Future Prospects](#future-prospects)
- [Resources](#resources)
- [Acknowledgement](#acknowledgement)

## Introduction
This project focuses on developing a robust audio classifier that processes user-provided audio files and accurately identifies the category or class to which the audio belongs.

## Description
This project seeks to create a cutting-edge audio classification system capable of classifying diverse audio inputs, including speech, music, and environmental sounds.
We used two approaches for this project:

- Convolutional Neural Networks (CNNs)
- Transfer learning (YAMNet, ResNet-50, EfficientNet)

https://github.com/user-attachments/assets/c7d5853d-6642-4652-b233-214ce93727d9



## Tech Stack
- [Python](https://www.python.org/)
- [PyTorch](https://pytorch.org/)
- [Kaggle](https://www.kaggle.com/)



## Contributors
- [Aanchal Borse](https://github.com/Aanchallllll)
- [Soham Rane](https://github.com/soham30rane)



## Future Prospects
- Hate speech detection in low-resource languages
- Audio-based security systems
- Environmental monitoring


## Resources

[Audio processing](https://discord.com/channels/1262070461324333198/1262075598621245610/1264632565764067368) by Valerio Velardo

Coursera course on [Deep learning](https://discord.com/channels/1262070461324333198/1262075598621245610/1263464039816757341) by Andrew Ng and Younes Bensouda Mourri

[PyTorch playlist](https://discord.com/channels/1262070461324333198/1262075598621245610/1267162792994148393) by Patrick Loeber

The datasets used are as follows:
1. [ESC-50 dataset](https://www.kaggle.com/datasets/mmoreaux/environmental-sound-classification-50)
2. [CIFAR-10 dataset](https://www.kaggle.com/c/cifar-10/)
3. [UrbanSound8K dataset](https://www.kaggle.com/datasets/chrisfilo/urbansound8k)
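For context on how clips from datasets such as ESC-50 are typically turned into model inputs, here is a minimal log-spectrogram sketch in PyTorch. The FFT size, hop length, and the synthetic waveform are illustrative assumptions; a real pipeline would load the actual WAV file (e.g. with `torchaudio.load`).

```python
import torch

# ESC-50 clips are 5 s long; the original recordings are sampled at 44.1 kHz.
SAMPLE_RATE = 44100
# Synthetic stand-in for a loaded mono clip (a real pipeline would read the WAV).
waveform = torch.randn(1, SAMPLE_RATE * 5)

# STFT -> magnitude -> log scale: a common spectrogram input for audio CNNs.
n_fft, hop = 1024, 512
spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft), return_complex=True)
log_spec = torch.log1p(spec.abs())
print(log_spec.shape)  # torch.Size([1, 513, 431])
```

The resulting 2-D time-frequency array is what the CNN and transfer-learning models treat as an image.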


## Acknowledgement
Special thanks to [COC VJTI](https://github.com/CommunityOfCoders) for ProjectX 2024

Special thanks to our mentors [Kshitij Shah](https://github.com/kshitijdshah99) and [Param Thakkar](https://github.com/ParamThakkar123), who guided us throughout our project journey.