Ted Talks Recommender and Chatbot Service

This Repo contains my work on scraping ted-talks from Tedtalks.org and building a NLP recommender system.

This project was a collaboration between myself, Uche, and Brittany. — Will add full names and github links.

Data Acquisition

The data was scraped from tedtalks.org using the BeautifulSoup library. The data was then cleaned and stored in a sql database. Features collected from the site are title, speaker, description, url, and topics and transcript for each talk. There are a total of 6000 talk events from XX authors across XX topics.

Exploratory Data Analysis

2.2. Transcripts and Topics UCHE

2.3 Statistics and patterns: PATRICK

Sentence structure analysis. The longest transcripts in the corpus is 510 sentences long, majority of the talks are between 25 and 250 sentences long. There were a total of 829 talk events with a transcript of 39 sentences.

Word count analysis. The longest transcript in the corpus is over 12K words. Majority of the transcripts are between 350 and 3,700 words. There was a total of 1,386 talks with a transcript of a little over 770 words.

![Alt text]()

The average word length for words in each talk is between 2 and 9 characters. Majority of the talks have an average word length of 4.5 characters.

Stop word analysis. The word 'the' was used over 390K times in the corpus. The top 10% stop words are 'the', 'to', 'of' ... 'have'.

Common words. The word 'I' was used over 129K times, 6 times the word 'we'. The most common words are 'I', 'And', ...'really'.

Bigram analysis. The most common bigrams are 'we re', and the top 3 most common bigrams include some form of the word 'are' or 're'. Top 10 bigrams mostly have 'we', 'you', 'they' preceeding and followed by either 're', 've', or 'll'.

Trigram analysis. The trigram 'day to day' was used over 100 times in the corpus, 60% of the top 10 tri-grams decribed age — 'six year old', '14 year old'... 'five years old'.

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
.idea		.idea
Data_output		Data_output
Images		Images
SRC		SRC
markdown_documents		markdown_documents
python_finished_code		python_finished_code
rough_work		rough_work
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ted Talks Recommender and Chatbot Service

Data Acquisition

Exploratory Data Analysis

About

Releases

Packages

Contributors 3

Languages

License

pokwir/Ted-Talks-Recommender-System

Folders and files

Latest commit

History

Repository files navigation

Ted Talks Recommender and Chatbot Service

Data Acquisition

Exploratory Data Analysis

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages