---
title: "Preprocessing"
author: "3I: Webscraping and Data Management in R"
date: "Aug 2020"
output: html_document
---
## Dictionary
#### Challenge 1. {-}
Using the code we wrote above, make a function that accepts 1) a vector of texts and 2) a sentiment dictionary (i.e., a dataframe with words and scores) and returns a vector of sentiment scores for each text.
```{r eval = F}
# Packages used below: tm (Corpus, DocumentTermMatrix), dplyr (left_join), tidyr (replace_na)
library(tm)
library(dplyr)
library(tidyr)

sentiment_score <- function(texts, sent_dict){
  # Preprocess and create DTM
  docs <- Corpus(VectorSource(texts))
  dtm <- DocumentTermMatrix(docs,
                            control = list(tolower = TRUE,
                                           removeNumbers = TRUE,
                                           removePunctuation = TRUE,
                                           stopwords = TRUE))
  # Convert to dataframe
  dtm <- as.data.frame(as.matrix(dtm))
  # Get all the words in our DTM and put them in a dataframe
  words <- data.frame(word = colnames(dtm), stringsAsFactors = FALSE)
  # Get their sentiment scores (the dictionary's word column must be named "word")
  words_sent <- words %>% left_join(sent_dict, by = "word")
  # Fix names
  names(words_sent) <- c("word", "score")
  # Replace NAs (words missing from the dictionary) with 0
  words_sent$score <- replace_na(words_sent$score, 0)
  # Calculate document scores with matrix algebra!
  # (word counts %*% word scores = one summed score per document)
  doc_scores <- as.matrix(dtm) %*% words_sent$score
  # Return a plain vector rather than a one-column matrix
  return(as.vector(doc_scores))
}
# Test it out! (assumes `ts` and `sent_dict` are already loaded)
sentiment_score(ts$lyrics, sent_dict)
```
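To see why the matrix multiplication in the function gives one score per document, here is a minimal base-R sketch. The two-document DTM and the word scores are made up for illustration, and `toy_dtm`, `toy_scores`, and `toy_doc_scores` are hypothetical names that do not appear in the lesson.

```{r eval = F}
# Made-up document-term matrix: 2 documents (rows) x 3 terms (columns)
toy_dtm <- matrix(c(2, 0, 1,
                    0, 3, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2"),
                                  c("good", "bad", "song")))
# Made-up scores in the same order as the DTM columns
# (0 stands for a word that is not in the dictionary)
toy_scores <- c(good = 3, bad = -3, song = 0)
# Each document score is the sum over terms of (count * score)
toy_doc_scores <- drop(toy_dtm %*% toy_scores)
toy_doc_scores
# doc1: 2*3 + 0*(-3) + 1*0 = 6
# doc2: 0*3 + 3*(-3) + 1*0 = -9
```

The scores must be in the same order as the DTM columns, which is why the function builds `words` from `colnames(dtm)` before joining the dictionary.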
#### Challenge 2. {-}
Using the function you wrote above, find out which Taylor Swift albums are the most and least positive.
```{r eval = F}
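# Packages used below: dplyr (group_by, summarise), stringr (str_c), tidytext (get_sentiments)
library(dplyr)
library(stringr)
library(tidytext)

# Concatenate songs to make albums
# (a space separator keeps words from adjacent songs apart once punctuation is removed)
albums <- ts %>%
  group_by(album) %>%
  summarise(lyrics = str_c(lyrics, collapse = " "))
# First load the dictionary
afinn <- get_sentiments("afinn")
# Then run the function
sentiment_score(albums$lyrics, afinn)
# Add the scores to the albums dataframe
albums$sent <- sentiment_score(albums$lyrics, afinn)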
```
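Once each album has a score, answering the challenge comes down to sorting. The sketch below shows one way to do that in base R on a hypothetical stand-in for `albums`. The album names and scores in `toy_albums` are invented and are not real results.

```{r eval = F}
# Hypothetical stand-in for the `albums` dataframe; names and scores are invented
toy_albums <- data.frame(album = c("Album A", "Album B", "Album C"),
                         sent = c(12, -4, 30),
                         stringsAsFactors = FALSE)
# Sort from most to least positive
toy_ranked <- toy_albums[order(toy_albums$sent, decreasing = TRUE), ]
# First row is the most positive album, last row the least positive
most_positive <- toy_ranked$album[1]
least_positive <- toy_ranked$album[nrow(toy_ranked)]
```

Assuming `albums$sent` has been filled in as in the solution, the same `order()` call should work on `albums` directly.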