Toolkit for Audiovisual Speaker Diarization in Noisy Environments, Speech Feature Extraction, and Well-Being Prediction
Repository for the master's thesis of Tobias Zeulner: Leveraging Speech Features for Automated Analysis of Well-Being in Teamwork Contexts
About this Project • Key Features • How To Use • License
Current methods for assessing employee well-being rely primarily on irregular and time-consuming surveys, which limits proactive measures and support. This thesis addresses the problem by developing predictive algorithms that automatically assess well-being from audio data collected in teamwork contexts. A dataset of 56 participants who worked in teams over a four-day period was curated. Well-being was measured using Seligman's PERMA framework, which consists of five pillars: positive emotion, engagement, relationships, meaning, and accomplishment. An audiovisual speaker diarization system was developed to enable the calculation of speech features at the individual level in a noisy environment. After extracting and selecting the most relevant features, regression and classification algorithms were trained to predict well-being.
The best-performing model for each PERMA pillar is a two-class classifier. It achieves the following balanced accuracies: P: 78%, E: 50%, R: 74%, M: 61%, and A: 50%.
The entire pipeline (see image below) and final models are provided in this GitHub repository.
The four main building blocks of this toolbox are shown in the figure below.
- Input Video:
  - mp4 or avi file
  - Stored in `src/audio/videos`
  - Filename provided in `configs/config.yaml`
  - Ideally 25 fps (otherwise processing takes longer)
- Output of Audiovisual Speaker Diarization:
  - 1 folder with the same name as the video (`src/audio/videos/VIDEONAME`), containing all current and future results
  - 3 important files in this folder:
    - RTTM file ("who spoke when"); see the example after this list
    - Log file (for troubleshooting)
    - "faces_id" folder, which contains all recognized speakers and their corresponding IDs from the RTTM file
- Output of Communication Pattern & Emotion Feature Calculation:
  - 1 csv file named "VIDEONAME_audio_analysis_results.csv", containing one row per speaker with the corresponding feature values over time as columns
- Output of Feature Visualization:
- 3 line charts for visualization of the feature values contained in the csv file
- 3 features are plotted per chart (i.e., 9 time series in total)
- Output of Well-Being Prediction:
- 1 csv file for the PERMA classification results (low/high well-being)
- 1 csv file for the PERMA regression results (continuous well-being scores, on either a 0-1 or a 1-7 scale)
- 1 plot to visualize the regression results (also saved as “perma_spider_charts.png”)
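Each line in the RTTM file describes one speech segment in the standard Rich Transcription Time Marked format. A sketch of what an entry looks like (the file name, timestamps, and speaker ID below are illustrative, not taken from an actual run):

```
SPEAKER VIDEONAME 1 12.35 4.20 <NA> <NA> 2 <NA> <NA>
```

Here the segment starts at 12.35 s, lasts 4.20 s, and is attributed to speaker ID 2, the same ID that appears in the "faces_id" folder.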
The parts can be run separately if, for example, the prediction of well-being is not required but other downstream tasks such as the prediction of team performance are.
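For such downstream analyses, the feature CSV from step 2 can be loaded directly. A minimal sketch, assuming the file ends up in the video's results folder (the exact column names depend on the extracted features and are not assumed here):

```python
import pandas as pd

video_name = "VIDEONAME"  # same base name as specified in configs/config.yaml
csv_path = f"src/audio/videos/{video_name}/{video_name}_audio_analysis_results.csv"

df = pd.read_csv(csv_path)
print(df.shape)          # one row per detected speaker
print(list(df.columns))  # feature-over-time columns
```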
If you wish to exclude an individual from the analysis (e.g., a random person in the background or someone who did not give informed consent), you can do so as follows:
- Perform only step 1 of the pipeline.
- Delete the person's image in the `src/audio/videos/VIDEONAME/faces_id` folder.
- Perform the remaining steps of the pipeline (2, 3, 4). From then on, the corresponding person will be excluded from the analysis.
If you want to replace a person's ID with their real name, proceed as follows:
- Perform only step 1 of the pipeline.
- Rename the corresponding file in the `src/audio/videos/VIDEONAME/faces_id` folder by adding two underscores after the ID, followed by the name (e.g., change "2.jpg" to "2__john.jpg"); see the example after this list.
- Execute the remaining steps of the pipeline (2, 3, 4). From then on, the analysis will use the real name instead of the ID.
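Both of the operations above are plain file-system operations. On macOS/Linux they might look like this (the speaker IDs and the name are illustrative):

```
# exclude the speaker with ID 3 from the analysis
rm src/audio/videos/VIDEONAME/faces_id/3.jpg

# keep speaker 2, but label them "john" in all downstream outputs
mv src/audio/videos/VIDEONAME/faces_id/2.jpg src/audio/videos/VIDEONAME/faces_id/2__john.jpg
```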
- I recommend using the same Python version as me (3.8.10) to avoid conflicts. After cloning this GitHub repo, I also recommend setting up a new virtual environment using the venv module (in the same directory as the `main.py` file).
How to set up a virtual environment in Python 3 using the venv module (Windows):

```
python -m venv venv
.\venv\Scripts\activate
```
How to set up a virtual environment in Python 3 using the venv module (macOS/Linux):

```
python3 -m venv venv
source venv/bin/activate
```
- Then, install ffmpeg (which is needed to process the video recordings). How to install ffmpeg on Windows/Linux/macOS:
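The usual package-manager routes work; a few common options (pick the one matching your system, or download a build from ffmpeg.org):

```
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg
```

If your recording is not at 25 fps, ffmpeg can also re-encode it beforehand, e.g. `ffmpeg -i input.mp4 -r 25 output.mp4`.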
- Install the required packages:

```
pip install -r requirements.txt
```

or

```
pip3 install -r requirements.txt
```

depending on your Python installation.
- To process a video using this tool, follow the steps below (if you are using it for the first time, you can leave the initial value in the configuration file (001) and go directly to the next step):
  - Video Placement: Place the video you wish to process in the `src/audio/videos` directory. Ensure that the video file is in a format compatible with the project (mp4 or avi).
  - Configuration File: Open the `configs/config.yaml` file. This file contains various parameters that control the processing of the video (you can inspect the available parameters with the snippet after this list).
  - Video Specification: In the configuration file, specify the filename of the video you placed in the `src/audio/videos` directory. Do not include the file extension. For instance, if your video file is called "my_video.mp4", enter "my_video".
  - Parameter Adjustment: Review the other parameters in the configuration file. These control various aspects of the video processing, and you may adjust them as necessary to suit your specific needs.
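A quick, generic way to see which parameters the configuration file exposes is to load it with PyYAML (this sketch does not assume any specific key names):

```python
# Print every parameter currently set in configs/config.yaml
import yaml

with open("configs/config.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config.items():
    print(f"{key}: {value}")
```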
- Run the main file:

```
python main.py
```

or

```
python3 main.py
```

depending on your Python installation.
Notes:
- Should you face errors indicating a problem with multiprocessing, set "MULTIPROCESSING" to "False" in the config file
- When running the script for the first time, all required machine learning models will be downloaded automatically
- Running the script on a GPU can accelerate the runtime by a factor of 4x-8x (the script automatically detects whether a CUDA device is available). Due to PyTorch constraints, only NVIDIA GPUs are supported (no M1/M2 GPUs); see the check below.
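If you want to verify up front whether PyTorch sees your GPU, this standard check works (no project-specific assumptions):

```python
import torch

# True only if an NVIDIA GPU with a working CUDA setup is visible to PyTorch
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```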
Have fun! 😎
If you encounter any issues, please reach out to me or open a new issue.
Distributed under the MIT License. See LICENSE for more information.
Email: [email protected] · LinkedIn: Tobias Zeulner