We propose a BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short-duration speech segments. We extend the original BERT model by taking the phonetic posteriorgrams (PPGs) derived from a front-end phone recognizer as input, and deploy a deep classifier on top of it for language identification.
We use the OLR20, TIMIT&THCHS-30, and TAL_ASR datasets in our experiments. To evaluate on short segments, we cut the TIMIT&THCHS-30 audio into segments using a 1 s window length and a 1 s window shift (i.e., non-overlapping 1 s segments). The specific usage of the data is described in `data`. The TAL_ASR data is force-aligned.
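For reference, the following is a minimal sketch of the 1 s window / 1 s shift segmentation described above, using Python's standard `wave` module; the function name, file paths, and output naming scheme are illustrative assumptions, not the repository's actual preprocessing script.

```python
import wave

def segment_wav(in_path, out_prefix, win_sec=1.0, shift_sec=1.0):
    """Cut a wav file into fixed-length segments (1 s window, 1 s shift => non-overlapping).

    `in_path` and `out_prefix` are placeholders; adapt them to the actual corpus layout.
    """
    with wave.open(in_path, "rb") as wav_in:
        params = wav_in.getparams()
        frames_per_win = int(params.framerate * win_sec)
        frames_per_shift = int(params.framerate * shift_sec)
        start, idx = 0, 0
        while start + frames_per_win <= params.nframes:
            wav_in.setpos(start)
            chunk = wav_in.readframes(frames_per_win)
            with wave.open(f"{out_prefix}_{idx:04d}.wav", "wb") as wav_out:
                wav_out.setparams(params)      # header (nframes) is patched on close
                wav_out.writeframes(chunk)
            start += frames_per_shift
            idx += 1
```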
We obtain the phoneme features of the audio directly with a phoneme recognizer based on long temporal context [1][5], and then obtain tokens with the BertTokenizer provided by BERT. These features are taken as the model input. See `get_data.py` for the specific data processing.
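As an illustration of this step, here is a minimal sketch of turning a phone string from the recognizer into BERT input IDs with HuggingFace's `BertTokenizer`; the vocabulary file name `phone_vocab.txt`, the whitespace-separated phone format, and the maximum length of 128 are assumptions rather than the exact settings of `get_data.py`.

```python
from transformers import BertTokenizer

# Hypothetical phone-level vocabulary, corresponding to the bert_voacb_file parameter.
tokenizer = BertTokenizer(vocab_file="phone_vocab.txt", do_lower_case=False)

# Example phone recognizer output for one utterance (whitespace-separated phones).
phone_sequence = "sil b a o sil t i a n sil"

encoded = tokenizer(
    phone_sequence,
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)       # (1, 128)
print(encoded["attention_mask"].shape)  # (1, 128)
```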
Load the data by adjusting the path parameters in `load_data.py`. In our experiments, the dataset is divided into three subsets: train, dev, and test.
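The sketch below shows the kind of path configuration and train/dev/test split that `load_data.py` performs; the directory name, the 8:1:1 ratio, and the function name are illustrative assumptions only.

```python
import random

# Hypothetical path parameter; adjust to the actual location expected by load_data.py.
DATA_DIR = "data/ppg_features"
TRAIN_RATIO, DEV_RATIO = 0.8, 0.1   # the remainder goes to the test set

def split_dataset(samples, seed=42):
    """Shuffle (feature, label) pairs and split them into train/dev/test subsets."""
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * TRAIN_RATIO)
    n_dev = int(len(samples) * DEV_RATIO)
    train = samples[:n_train]
    dev = samples[n_train:n_train + n_dev]
    test = samples[n_train + n_dev:]
    return train, dev, test
```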
Our models are provided as BertCNN, BertRCNN, BertDPCNN, and BertLSTM; when using them, check whether the `bert` option in the corresponding model file is turned on.
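A minimal sketch of dispatching on a model name to one of the classifier back-ends listed above is shown here; the import paths and the `build_model` helper are hypothetical and may differ from how `run_me.py` actually wires the models together.

```python
# Hypothetical dispatch helper; the real run_me.py may construct models differently.
def build_model(model_name, config):
    if model_name == "BertCNN":
        from BertCNN.BertCNN import BertCNN as Model
    elif model_name == "BertRCNN":
        from BertRCNN.BertRCNN import BertRCNN as Model
    elif model_name == "BertDPCNN":
        from BertDPCNN.BertDPCNN import BertDPCNN as Model
    elif model_name == "BertLSTM":
        from BertLSTM.BertLSTM import BertLSTM as Model
    else:
        raise ValueError(f"Unknown model_name: {model_name}")
    return Model(config)
```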
Run the program using the following command:

```bash
python run_me.py
```
Important parameters are as follows (an illustrative sketch is given below the list):

- `model_name`: select the model to use
- `label_list`: the list of language labels
- `bert_voacb_file`: the vocabulary file of the BERT model to use
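For reference, this is a hedged argparse sketch of how these options could be defined; the flag spellings follow the list above, but the defaults and help texts are assumptions, so consult `run_me.py` for the authoritative definitions.

```python
import argparse

# Hypothetical argument definitions mirroring the options listed above;
# names, defaults, and help strings are illustrative only.
parser = argparse.ArgumentParser(description="BERT-LID training/evaluation")
parser.add_argument("--model_name", default="BertCNN",
                    help="classifier back-end: BertCNN, BertRCNN, BertDPCNN, or BertLSTM")
parser.add_argument("--label_list", nargs="+", default=["zh", "en"],
                    help="list of language labels (example values)")
parser.add_argument("--bert_voacb_file", default="phone_vocab.txt",
                    help="vocabulary file of the BERT model (spelling follows the README)")
args = parser.parse_args()
```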
Alternatively, you can use the following command to run multiple experiments and tune the hyperparameters at the same time:

```bash
bash run_me.sh
```
[1] P. Schwarz, "Phoneme Recognition Based on Long Temporal Context," PhD thesis, Brno University of Technology, 2009.
[2] TAL_ASR dataset: https://ai.100tal.com/dataset
[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
[4] BERT-pytorch: https://github.com/codertimo/BERT-pytorch
[5] P. Schwarz et al., "Phoneme Recognizer Based on Long Temporal Context," Speech Processing Group, Faculty of Information Technology, Brno University of Technology, 2006. [Online]. Available: http://speech.fit.vutbr.cz/en/software
Please cite our paper if you find this work useful:
@InProceedings{NieYT2022_ISCSLP,
author = {Yuting Nie and Junhong Zhao and Wei-Qiang Zhang and Jinfeng Bai},
booktitle = {International Symposium on Chinese Spoken Language Processing (ISCSLP)},
title = {BERT-LID: Leveraging BERT to improve spoken language identification},
address = {Singapore},
month = {12},
year = {2022},
url = {https://arxiv.org/abs/2203.00328},
}