Bioacoustic F0 estimation

This repository provides a Python interface based on deep learning to analyse bioacoustic signals by detecting tonal sounds and estimating their fundamental frequency.

For a detailed description of the study behind this repository, see the journal publication: https://doi.org/10.1080/09524622.2025.2500380 (the full text of the author-accepted manuscript is also available).

The data used to train the F0 estimation models and to run the experiments are available in a Dryad repository: https://doi.org/10.5061/dryad.prr4xgxw8

If you use this repository, please cite the associated journal publication.

How to use a pretrained model to analyse your own recordings

Clone the repo locally

git clone https://github.com/lamipaul/bioacoustic_F0_estimation

Navigate into the local repository and install the dependencies:

cd bioacoustic_F0_estimation
pip install -r requirements.txt

Note that these package requirements assume CUDA is installed (for GPU computing and faster analysis). If CUDA is not available, you can either use the free GPUs offered by Google Colab, or run F0 estimation locally on CPU, which is slower. In the latter case, install the dependencies from cpu_requirements.txt instead (pip install -r cpu_requirements.txt).

Run the predict.py script to estimate F0 values for your own sounds with a pretrained CREPE model:

python predict.py my_sound_file.wav
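For each analysis frame, a CREPE model outputs one activation per pitch bin rather than a frequency directly. The sketch below illustrates how such a vector can be turned into an F0 value and a confidence, assuming the layout of the original CREPE implementation (360 bins spaced 20 cents apart, spanning roughly 32 Hz to 2 kHz, and a weighted average over the peak bin and its 4 neighbours on each side). It mirrors the `argmax` and `weighted_argmax` choices of the `--decoder` option, but all function names are hypothetical and this is not the code used by predict.py.

```python
# Assumption: a CREPE-style output of 360 pitch bins spaced 20 cents apart,
# the first bin lying ~1997.38 cents above a 10 Hz reference (as in the
# original CREPE code). Illustration only, not predict.py's own decoder.
N_BINS = 360
CENTS_OFFSET = 1997.3794084376191
CENTS_MAPPING = [CENTS_OFFSET + 20.0 * i for i in range(N_BINS)]


def cents_to_hz(cents):
    """Convert cents relative to 10 Hz into a frequency in Hz."""
    return 10.0 * 2.0 ** (cents / 1200.0)


def peak_bin(activation):
    """Index of the most activated bin (the first one on ties)."""
    return max(range(len(activation)), key=lambda i: activation[i])


def frame_confidence(activation):
    """CREPE-style confidence: the highest bin activation of the frame."""
    return max(activation)


def decode_argmax(activation):
    """F0 at the centre of the most activated bin."""
    return cents_to_hz(CENTS_MAPPING[peak_bin(activation)])


def decode_weighted_argmax(activation, half_width=4):
    """Activation-weighted mean of bin centres (in cents) around the peak."""
    centre = peak_bin(activation)
    start = max(0, centre - half_width)
    end = min(len(activation), centre + half_width + 1)
    weights = activation[start:end]
    cents = CENTS_MAPPING[start:end]
    total = sum(weights)
    if total == 0:
        return float("nan")  # silent frame: no pitch estimate
    mean_cents = sum(w * c for w, c in zip(weights, cents)) / total
    return cents_to_hz(mean_cents)


# A made-up frame whose activation is split between bins 100 and 101.
frame = [0.0] * N_BINS
frame[100] = frame[101] = 0.8
print(round(decode_argmax(frame), 1), round(decode_weighted_argmax(frame), 1))
```

Here `weighted_argmax` lands halfway between the two bins, a sub-bin resolution that plain `argmax` cannot reach. In CREPE, the `viterbi` option additionally smooths the pitch track across frames; it is not sketched here.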

A .csv file will be saved with timestamped F0 values, their associated model confidence, along with F0-based features (salience, harmonicity, and sub-harmonic ratio).
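As an example of post-processing this output, the sketch below keeps only the frames whose model confidence reaches a chosen threshold, using Python's standard library only. The column names `time`, `f0` and `confidence` are hypothetical placeholders: check the header of the .csv file actually written by predict.py and pass the real names.

```python
import csv
import io
import statistics


def confident_frames(csv_file, threshold=0.5,
                     time_col="time", f0_col="f0", conf_col="confidence"):
    """Return (time, F0) pairs whose model confidence is >= threshold.

    The default column names are hypothetical placeholders: pass the names
    found in the header of the .csv file written by predict.py.
    """
    frames = []
    for row in csv.DictReader(csv_file):
        if float(row[conf_col]) >= threshold:
            frames.append((float(row[time_col]), float(row[f0_col])))
    return frames


# Made-up table standing in for predict.py output (not real results).
example = io.StringIO(
    "time,f0,confidence\n"
    "0.00,440.0,0.91\n"
    "0.01,441.5,0.32\n"
    "0.02,443.0,0.77\n"
)
frames = confident_frames(example, threshold=0.5)
print(frames)                                     # [(0.0, 440.0), (0.02, 443.0)]
print(statistics.median(f0 for _, f0 in frames))  # 441.5
```

For a real file, open it with `open(path, newline="")` and pass the file object to `confident_frames`.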

Several options can also be specified when using this script:

usage: predict.py [-h] [--model_path MODEL_PATH] [--compress COMPRESS] [--step STEP] [--decoder {argmax,weighted_argmax,viterbi}] [--no_print] [--no_characterisation]
                  [--threshold THRESHOLD] [--NFFT NFFT]
                  input

positional arguments:
  input                 Directory with sound files to process, or a single file to process

options:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH
                        Path of model weights
  --compress COMPRESS   Compression factor used to shift frequencies into CREPE's range [32Hz; 2kHz]. Frequencies are divided by the given factor by artificially changing the
                        sampling rate (slowing down / speeding up the signal).
  --step STEP           Step used between each prediction (in seconds)
  --decoder {argmax,weighted_argmax,viterbi}
                        Decoder used to postprocess predictions
  --no_print            Skip printing spectrograms with overlaid F0 predictions to assess their quality
  --no_characterisation
                        Skip the computation of vocalisation characteristics (harmonicity, salience, and SHR)
  --threshold THRESHOLD
                        Confidence threshold used when printing F0 predictions on spectrograms
  --NFFT NFFT           Window size used for the spectrum computation (for printing F0 predictions and computing vocalisation characteristics)
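Because --compress divides every frequency by the same factor, a usable factor must bring the highest frequency of interest below about 2 kHz while keeping the lowest one above about 32 Hz, i.e. f_max / 2000 <= factor <= f_min / 32. The sketch below computes that admissible interval for a given frequency band; the function names and the 5 to 20 kHz example band are illustrative assumptions, not values taken from the paper.

```python
# CREPE's nominal pitch range, as stated in the --compress help text.
CREPE_MIN_HZ = 32.0
CREPE_MAX_HZ = 2000.0


def compress_bounds(f_min_hz, f_max_hz):
    """Return (smallest, largest) compression factors mapping the band
    [f_min_hz, f_max_hz] inside CREPE's range, or None if no factor fits.

    Dividing frequencies by c requires f_max / c <= 2000 and f_min / c >= 32,
    i.e. f_max / 2000 <= c <= f_min / 32.
    """
    if not 0 < f_min_hz <= f_max_hz:
        raise ValueError("expected 0 < f_min_hz <= f_max_hz")
    low = f_max_hz / CREPE_MAX_HZ
    high = f_min_hz / CREPE_MIN_HZ
    return (low, high) if low <= high else None


def effective_sample_rate(sample_rate_hz, compress):
    """Sampling rate the signal is treated as having after compression.

    Declaring a lower sampling rate (slowing the signal down) divides all
    frequencies by `compress`; a factor below 1 speeds the signal up.
    """
    return sample_rate_hz / compress


# Example: a hypothetical whistle band of 5 to 20 kHz recorded at 96 kHz.
print(compress_bounds(5000, 20000))      # (10.0, 156.25)
print(effective_sample_rate(96000, 10))  # 9600.0
```

A band that is too wide relative to CREPE's range (e.g. 100 Hz to 10 kHz) returns None: no single factor can fit it, so such recordings may need to be analysed in separate frequency bands.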

Reproducing paper experiments

Go to the paper_experiments folder
