# Deep Multiclass Audio Classification

## Project structure

```bash
├── Coursera/
│   ├── soham/
│   │   ├── Coursera Assignments/
│   │   └── Coursera Notes/
│   └── Aanchal/
│       ├── Course1/
│       ├── Course2/
│       └── Course4/
├── EDA/
│   ├── esc-50-explore.ipynb
│   └── esc-preprocess-and-eda.ipynb
├── UI/
│   ├── test/
│   ├── audio_ui.py
│   ├── audio_ui2.py
│   ├── labels.py
│   ├── model.py
│   ├── yamnet.onnx
│   └── yamnet_inference.py
├── mini-projects/
│   ├── Aanchal/
│   │   ├── Audio Classification UrbanSound8k.ipynb
│   │   ├── NN_from_scratch.ipynb
│   │   └── Transfer learning with ResNet-50 cifar10.ipynb
│   └── Soham/
│       ├── Audio Classification UrbanSound8k/
│       ├── Neural-Network-from-scratch/
│       └── Transfer-learning-cifar10/
├── resnets_and_efficientnets/
│   ├── esc-dataset.ipynb
│   ├── esc-model1_2024-08-20_18-11-09.pth
│   ├── esc-transfer-learn.ipynb
│   ├── esc-transfer-learning2.ipynb
│   └── esc-utils.ipynb
├── yamnet/
│   ├── esc-dataset.ipynb
│   ├── esc-dataset2.xpynb
│   ├── esc-model1_20/
│   ├── esc-utils.ipynb
│   ├── esc-utils3.xpynb
│   ├── esc-yamnet.ipynb
│   ├── escyamnetdataset.xpynb
│   ├── getyamnet.xpynb
│   ├── yamnet-load.xpynb
│   └── yamnet.ipynb
├── LICENSE
└── README.md
```

## Table of Contents
- [Introduction](#introduction)
- [Description](#description)
- [Tech Stack](#tech-stack)
- [Contributors](#contributors)
- [Future Prospects](#future-prospects)
- [Resources](#resources)
- [Acknowledgement](#acknowledgement)

## Introduction
This project focuses on developing a robust audio classifier that processes user-provided audio files and accurately identifies the category or class to which the audio belongs.

## Description
This project seeks to create a cutting-edge audio classification system capable of classifying diverse audio inputs, including speech, music, and environmental sounds.
We used two approaches for this project:

- Convolutional Neural Networks (CNNs)
- Transfer learning (YAMNet, ResNet-50, EfficientNet)

https://github.com/user-attachments/assets/c7d5853d-6642-4652-b233-214ce93727d9



## Tech Stack
- [Python](https://www.python.org/)
- [PyTorch](https://pytorch.org/)
- [Kaggle](https://www.kaggle.com/)



## Contributors
- [Aanchal Borse](https://github.com/Aanchallllll)
- [Soham Rane](https://github.com/soham30rane)



## Future Prospects
- Hate speech detection in low-resource languages
- Audio-based security systems
- Environmental monitoring


## Resources

[Audio processing](https://discord.com/channels/1262070461324333198/1262075598621245610/1264632565764067368) by Valerio Velardo

Coursera course on [Deep learning](https://discord.com/channels/1262070461324333198/1262075598621245610/1263464039816757341) by Andrew Ng and Younes Bensouda Mourri

[PyTorch playlist](https://discord.com/channels/1262070461324333198/1262075598621245610/1267162792994148393) by Patrick Loeber

The datasets used are as follows:
1. [ESC-50 dataset](https://www.kaggle.com/datasets/mmoreaux/environmental-sound-classification-50)
2. [CIFAR-10 dataset](https://www.kaggle.com/c/cifar-10/)
3. [UrbanSound8K dataset](https://www.kaggle.com/datasets/chrisfilo/urbansound8k)
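For context on how clips from datasets such as ESC-50 are typically turned into model inputs, here is a minimal log-spectrogram sketch in PyTorch. The FFT size, hop length, and the synthetic waveform are illustrative assumptions; a real pipeline would load the actual WAV file (e.g. with `torchaudio.load`).

```python
import torch

# ESC-50 clips are 5 s long; the original recordings are sampled at 44.1 kHz.
SAMPLE_RATE = 44100
# Synthetic stand-in for a loaded mono clip (a real pipeline would read the WAV).
waveform = torch.randn(1, SAMPLE_RATE * 5)

# STFT -> magnitude -> log scale: a common spectrogram input for audio CNNs.
n_fft, hop = 1024, 512
spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft), return_complex=True)
log_spec = torch.log1p(spec.abs())
print(log_spec.shape)  # torch.Size([1, 513, 431])
```

The resulting 2-D time-frequency array is what the CNN and transfer-learning models treat as an image.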


## Acknowledgement
Special thanks to [COC VJTI](https://github.com/CommunityOfCoders) for ProjectX 2024

Special thanks to our mentors [Kshitij Shah](https://github.com/kshitijdshah99) and [Param Thakkar](https://github.com/ParamThakkar123), who guided us throughout our project journey.