VoiceCoach

This is the TED Talk dataset used in VoiceCoach (CHI 2020), https://arxiv.org/abs/2001.07876.

TED_dataset

The dataset contains meta information for 2623 TED Talks available on the official TED.com website up to June 7th, 2019.

The meta information includes the fields 'author', 'datefilmed', 'totalviews', 'comments', 'language', 'downloadlink', 'vidlen', 'aws-transcripts', 'datecrawled', 'datepublished', 'title', 'id', 'url', 'keywords', 'videoname', and 'ratings'. The complete information is stored in the field 'alldata_JSON'.

Fields

url: original video link
aws-transcripts: Each video in the dataset is transcribed by AWS. This field has two subfields:
- transcript: all words in the video
- words: an array containing detailed information about all words. e.g.,
  - "start_time": "12.94",
  - "end_time": "13.25",
  - "alternatives": [{"confidence": "0.9097", "content": "we"}],
  - "type": "pronunciation"
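
As a rough illustration of how the 'words' array might be consumed, the sketch below flattens entries into (text, start, end, confidence) tuples. It assumes each entry has the keys shown in the example above, that the first item in "alternatives" is the preferred one, and that entries without timestamps (for instance punctuation) can be skipped. The names `TimedWord` and `timed_words` are hypothetical helpers, not part of this repository.

```python
from typing import Dict, List, NamedTuple


class TimedWord(NamedTuple):
    text: str
    start: float       # seconds from the start of the video
    end: float         # seconds from the start of the video
    confidence: float  # transcription confidence reported for the word


def timed_words(words: List[Dict]) -> List[TimedWord]:
    """Flatten 'words' entries into TimedWord tuples."""
    result = []
    for entry in words:
        # Assumption: entries without timestamps (e.g. punctuation) are skipped.
        if "start_time" not in entry or "end_time" not in entry:
            continue
        # Assumption: the first alternative is the preferred transcription.
        best = entry["alternatives"][0]
        result.append(
            TimedWord(
                text=best["content"],
                start=float(entry["start_time"]),  # values are stored as strings
                end=float(entry["end_time"]),
                confidence=float(best["confidence"]),
            )
        )
    return result


# Sample entry copied from the example in this README.
sample = [
    {
        "start_time": "12.94",
        "end_time": "13.25",
        "alternatives": [{"confidence": "0.9097", "content": "we"}],
        "type": "pronunciation",
    }
]

for word in timed_words(sample):
    print(word.text, word.start, word.end, word.confidence)
```

Converting the string timestamps to floats up front makes it easier to compute durations or pauses between consecutive words later on.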

Video Downloading

tedvideo_download.py contains the code for downloading TED videos from TED.com.

Video2mp3/wav

You can use ffmpeg to convert .mp4 files to audio formats such as mp3 or wav.
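
One possible way to script the conversion is sketched below. It assumes ffmpeg is installed and on the PATH; the helper names `ffmpeg_audio_command` and `convert`, and the file name `talk.mp4`, are hypothetical and not part of this repository. The command uses the standard ffmpeg options `-i` (input file), `-vn` (drop the video stream), and `-y` (overwrite an existing output file).

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_audio_command(video: str, audio_format: str = "wav") -> List[str]:
    """Build the ffmpeg command that extracts the audio track of `video`."""
    # Output file sits next to the input, with the extension swapped.
    out = Path(video).with_suffix("." + audio_format)
    return ["ffmpeg", "-y", "-i", video, "-vn", str(out)]


def convert(video: str, audio_format: str = "wav") -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_audio_command(video, audio_format), check=True)


# Show the command without running it.
print(" ".join(ffmpeg_audio_command("talk.mp4", "mp3")))
```

Calling `convert("talk.mp4", "wav")` would then write `talk.wav` next to the input file. ffmpeg picks the audio codec from the output extension, so no explicit codec flag is given in this sketch.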

(updating)

Notice

If you use this dataset, please cite our paper:

VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking

Preprint: https://arxiv.org/abs/2001.07876

Authors: Xingbo Wang, Haipeng Zeng, Yong Wang, Aoyu Wu, Zhida Sun, Xiaojuan Ma, Huamin Qu

Acknowledgements

The dataset is shared under the Creative Commons license.
