johvanniperez/a3-experiment

 
 

Assignment 3 - Replicating a Classic Experiment


Data Visualization, Professor Harrison

A3 - 3/4/2022

Jules Cazaubiel, Elaine Chen, Johvanni Perez, & Nick Tourtillott

a3-JulesCazaubiel-ElaineChen-JohvanniPerez-NickTourtillott

Description of Experiment:

The goal of our experiment was to test how effectively word clouds convey the main points of a written article. We used five articles, each on a different topic, to see whether the topic affected how well participants could identify the corresponding word cloud. For each article, participants read an excerpt and were then shown three word clouds. They had to select the word cloud generated from the excerpt they had just read. The three choices covered similar topics but were generated from different articles. Participants were told only that they would complete a study about word clouds; they had no prior knowledge of what the articles would be about or what they would be expected to do.

To produce word clouds uncluttered by stopwords, that is, commonly used words such as “the” or “and” that are not relevant to any theme, we preprocessed the texts. We used NLTK (Natural Language Toolkit), a widely used Python package for natural language processing. We used the basic stopword list from NLTK to remove these words, along with other functions to remove all punctuation and numbers from the texts. This left us with a list of all relevant words for each text, which we wrote to a CSV/text file for ease of use. Using these CSV files, we created the word clouds for each article with the D3 library.
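The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names `preprocess` and `to_csv`, the `word,count` CSV layout, and the tiny `SAMPLE_STOPWORDS` set are all assumptions made so the example runs without downloading NLTK data. With NLTK installed, the standard English list could be passed in instead.

```python
import csv
import io
import string
from collections import Counter

# Tiny stand-in stopword list (an assumption for this sketch). With NLTK
# installed and its "stopwords" corpus downloaded, the basic English list is:
#   from nltk.corpus import stopwords
#   set(stopwords.words("english"))
SAMPLE_STOPWORDS = {"the", "and", "a", "of", "to", "in", "is", "are"}


def preprocess(text, stopwords=SAMPLE_STOPWORDS):
    """Lowercase the text, strip ASCII punctuation and digits, drop stopwords."""
    drop = str.maketrans("", "", string.punctuation + string.digits)
    tokens = text.lower().translate(drop).split()
    return [t for t in tokens if t not in stopwords]


def to_csv(words):
    """Return 'word,count' CSV text, most frequent words first (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "count"])
    writer.writerows(Counter(words).most_common())
    return buf.getvalue()
```

Note that `string.punctuation` only covers ASCII characters, so typographic quotes or dashes in an article would need to be added to the removal table separately.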

Screenshots of Experiment:

Visualization 1: COVID-19 Article

Correct Answer: covid-19 correct word cloud

Incorrect Answers: covid-19 incorrect word cloud

covid-19 incorrect word cloud

Visualization 2: Kardashian Drama Article

Correct Answer: kanye correct word cloud

Incorrect Answers: kanye incorrect word cloud

kanye incorrect word cloud

Visualization 3: Music Industry Trends Article

Correct Answer: music correct word cloud

Incorrect Answers: music incorrect word cloud

music incorrect word cloud

Visualization 4: Plant Based Drug Discovery Article

Correct Answer: plant correct word cloud

Incorrect Answers: plant incorrect word cloud

plant incorrect word cloud

Visualization 5: Travel Trends Article

Correct Answer: travel correct word cloud

Incorrect Answers: travel incorrect word cloud

travel incorrect word cloud

Results from Experiment:

Figures: accuracy_gender, accuracy_gender_distro, accuracy_major, accuracy_major_distro, accuracy_major_text, accuracy_participants_text, accuracy_word_count

Technical Achievements:

  • used D3 to create the word clouds presented in the Qualtrics survey
  • used text processing in Python to create the CSV files for the word clouds
  • used a Python script to create the results visualizations
  • used Qualtrics to create the survey and sent it out to students
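As a rough sketch of the kind of computation a results script needs before plotting accuracy against gender or major, the function below groups responses by one attribute and computes the fraction answered correctly. The row layout and the field names `chosen` and `answer` are hypothetical, not taken from the actual Qualtrics export.

```python
from collections import defaultdict


def accuracy_by_group(responses, group_key):
    """Return {group: fraction of correct answers} for the given field.

    Each response is a dict with hypothetical keys: 'chosen' (the word cloud
    the participant picked), 'answer' (the correct word cloud), and the
    grouping field named by group_key (e.g. 'gender' or 'major').
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in responses:
        group = row[group_key]
        totals[group] += 1
        if row["chosen"] == row["answer"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}
```

The resulting dictionary can be passed directly to a plotting library as bar labels and heights.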

Design Achievements:

  • used the D3 library to create word clouds in a single color and with similar shapes, so that no other visual factors would influence the effectiveness of the word cloud
  • the D3 API generated the word clouds with random word positioning
  • the D3 API styled all of the word clouds in the same manner
  • designed the experiment so that the user reads an article and identifies the associated word cloud right after, while their memory of the article is freshest
  • each results visualization uses color to differentiate between the variables, such as the relationship between accuracy and gender or between accuracy and major
