83 commits
b732dde
skeleton
screpeau Feb 24, 2022
d4e500e
Added Pie Chart
mnorales Feb 26, 2022
c8bf210
Added Pie Chart
mnorales Feb 26, 2022
8446cb6
Added Pie Chart
mnorales Feb 26, 2022
e4c2c49
bar chart
screpeau Feb 28, 2022
842e72d
Slice Labels
mnorales Feb 28, 2022
660a474
pie Slices
mnorales Feb 28, 2022
b381066
Update index.html
mnorales Feb 28, 2022
4994e05
php stuff
screpeau Mar 2, 2022
186b893
Add files via upload
mnorales Mar 2, 2022
392e62d
Delete Male_Female_Data.xlsx
mnorales Mar 2, 2022
d9d347e
Add files via upload
mnorales Mar 2, 2022
d20ea00
Update index.php
mnorales Mar 2, 2022
455980c
Update index.php
mnorales Mar 2, 2022
6c41bfd
deathbydrugs
sudheshnabodapati Mar 2, 2022
b2eb1ac
deathbydrugs
sudheshnabodapati Mar 2, 2022
1af8a3b
Merge branch 'main' into deathsbydrugs
mnorales Mar 3, 2022
c1a54e1
Merge pull request #1 from sudheshnabodapati/deathsbydrugs
mnorales Mar 3, 2022
a9a9236
Create original_ReadMe.md
mnorales Mar 3, 2022
7eb28e0
Update index.php
mnorales Mar 3, 2022
3652b26
Update index.php
mnorales Mar 3, 2022
6443bb0
Set theme jekyll-theme-slate
mnorales Mar 3, 2022
9a4d148
Add files via upload
mnorales Mar 3, 2022
242dbb8
Update index.html
mnorales Mar 3, 2022
a26d585
Update index.html
mnorales Mar 3, 2022
4d90138
Update index.html
mnorales Mar 3, 2022
7149b2b
added form endpoint etc
screpeau Mar 3, 2022
1c5968f
Merge branch 'main' of https://github.com/mnorales/a3-experiment into…
screpeau Mar 3, 2022
3b65bfa
Update index.html
mnorales Mar 3, 2022
32cc505
fix
screpeau Mar 3, 2022
c30c900
exp description
screpeau Mar 3, 2022
1f89b58
updated exp description
screpeau Mar 3, 2022
eff15d0
skeleton for writeup
screpeau Mar 3, 2022
19871dd
Add files via upload
mnorales Mar 3, 2022
b400353
Update README.md
mnorales Mar 3, 2022
2a955db
Update README.md
mnorales Mar 3, 2022
918c17a
clean
screpeau Mar 4, 2022
f3f0e43
Update index.html
mnorales Mar 4, 2022
5df16bb
Delete pie_chart.PNG
mnorales Mar 4, 2022
4f6847a
Add files via upload
mnorales Mar 4, 2022
c34d571
Delete pie_chart.PNG
mnorales Mar 4, 2022
4170069
Add files via upload
mnorales Mar 4, 2022
0590819
Update README.md
mnorales Mar 4, 2022
33458c8
Update README.md
mnorales Mar 4, 2022
a302adc
scatter
screpeau Mar 4, 2022
11c5c03
Merge branch 'main' of https://github.com/mnorales/a3-experiment into…
screpeau Mar 4, 2022
d8aef86
fixed tick format
screpeau Mar 4, 2022
e922ef9
html screenshot test
screpeau Mar 4, 2022
bce22a5
added exp desc etc
screpeau Mar 4, 2022
7ff6242
fix img
screpeau Mar 4, 2022
2e8cd9e
scatter desc
screpeau Mar 4, 2022
7857389
fancy button test
screpeau Mar 4, 2022
be88431
fix button
screpeau Mar 4, 2022
10ea3da
fix fancy button 2
screpeau Mar 4, 2022
d77a199
final fix for vaporwave button
screpeau Mar 4, 2022
325edc3
update readme
screpeau Mar 4, 2022
bae2df5
readme fix
screpeau Mar 4, 2022
722c3a5
readme fix again
screpeau Mar 4, 2022
5311b35
reported log2error
screpeau Mar 4, 2022
77fa752
added results csv
screpeau Mar 4, 2022
5a1191e
Add files via upload
mnorales Mar 4, 2022
5c71a58
Add files via upload
mnorales Mar 4, 2022
78cb22a
Update README.md
mnorales Mar 4, 2022
e5fb1b7
Update README.md
mnorales Mar 4, 2022
c9e4b4c
Update README.md
mnorales Mar 4, 2022
2610a0c
Delete original_ReadMe.md
mnorales Mar 5, 2022
c05824c
Update bootstrapCalc.R
mnorales Mar 5, 2022
7899629
Delete Intervals.PNG
mnorales Mar 5, 2022
74f53ea
Add files via upload
mnorales Mar 5, 2022
36b1d5d
Delete Intervals.PNG
mnorales Mar 5, 2022
eb2d977
Add files via upload
mnorales Mar 5, 2022
ef293ec
Delete Intervals.PNG
mnorales Mar 5, 2022
a68931f
Add files via upload
mnorales Mar 5, 2022
d73905e
Update README.md
mnorales Mar 5, 2022
7c0250e
Update README.md
mnorales Mar 5, 2022
9653f8b
Update README.md
mnorales Mar 5, 2022
25416fb
Update README.md
mnorales Mar 5, 2022
7217ae2
Update README.md
mnorales Mar 5, 2022
9282a1e
Update README.md
mnorales Mar 5, 2022
997146e
Delete Bootstrap.PNG
mnorales Mar 5, 2022
4180ecc
Add files via upload
mnorales Mar 5, 2022
4bc08e9
Update README.md
mnorales Mar 5, 2022
56cf1f2
Update bootstrapCalc.R
mnorales Mar 5, 2022
Binary file added Intervals.png
Binary file added Male_Female_Data.csv
Binary file not shown.
Binary file added PieChart.PNG
202 changes: 97 additions & 105 deletions README.md
@@ -1,106 +1,98 @@
Assignment 3 - Replicating a Classic Experiment
===
# Assignment 3

### Sudheshna Bodapati, Samantha Crepeau, Monet Norales

## Overview and Design

Here is the [link](https://mnorales.github.io/a3-experiment/) to our project site.

The purpose of this experiment is to determine which method of manipulating a graph to make it misleading is the most effective. We test three different kinds of visualizations: the pie chart, the histogram, and the scatter plot. Each visualization is manipulated in a different way. For the pie chart, the slices represent the data inaccurately. For the histogram, the x-axis makes the interpretation of the data ambiguous. For the scatter plot, the upward trend of deaths is flattened through the omission of the y-axis tick labels. We hypothesize that the scatter plot will be least likely to be identified as misleading, because participants have to mouse over the datapoints to interpret the data correctly.

Questions Asked:
- Do you think the graph is misleading? (yes/no)
- Why or why not? (short answer)

<br/>


### Graph 1: Pie Chart

<img src="PieChart.PNG" width="400" height="400">

**Description:**

This pie chart compares deaths per 100,000 for males and females and shows the total. The chart is misleading because the total is included as its own entity, which makes it a third slice. A more accurate chart would include only male and female as slices, with the total shown separately as a numerical value. In addition, the male and female values do not add up to the total, because male, female, and total are each averages of the raw data columns. However, because the direct values are not given, this remains unclear to participants viewing the chart.

**Hypothesis:**

Participants will have difficulty understanding that the pie chart is misleading and why.

<br/>


### Graph 2: Histogram

<img src="histogram.PNG" width="500" height="400">

**Description:**

This histogram shows the number of overdose deaths for each age group. The histogram is misleading because only the first age in each age group range is listed, which makes the x-axis unclear and leaves the data open to interpretation. A better histogram would have no spaces between the bars and would list the full range for each age group, e.g., 25-34.

**Hypothesis:**

Participants will have difficulty understanding that the histogram is misleading and why.

<br/>


### Graph 3: Scatterplot

<img src="scatter.PNG" width="550" height="400">

**Description:**

This scatter plot shows the number of overdose deaths from synthetic opioids over two decades. Participants can mouse over the datapoints to learn more about the graph. The scatter plot is misleading because we omit the y-axis labels and lengthen the axis to make the sharp upward trend in deaths look flatter. It is not obvious from looking at the graph that there were 730 deaths in 1999 and 56,516 deaths in 2020; this information can only be found by mousing over the datapoints. A scatter plot that was not intentionally misleading would have a labeled y-axis of the correct scale.

**Hypothesis:**

Participants will have difficulty understanding that the scatter plot is misleading and why.

<br/>


## Results

Using Cleveland and McGill's logarithmic error function, we computed the log2Error for each visualization. Because all of our graphs are misleading, the true percent was 100. The judged percent was determined by the number of "yes" answers for each visualization. The ranking of the visualizations is as follows:

1. Pie chart: 3.518849829
2. Histogram: 2.64385619
3. Scatterplot: 2.087462841

The true value was 3.64385619; therefore, the pie chart was the least misleading and the scatterplot was the most misleading. This confirms our hypothesis that participants would find it most difficult to correctly interpret the scatterplot.

<img src="errorBars.png" width="600" height="400">
<img src="Intervals.png" width="600" height="200">
In the picture above, [,1] refers to the mean for each graph type, [,2] refers to the lower bound, and [,3] refers to the upper bound of each confidence interval.
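
The mean, lower bound, and upper bound described above can be illustrated with a percentile bootstrap. The following is only a sketch in Python under stated assumptions, not the R script that produced the figure: the function name `bootstrap_ci`, the number of resamples, and the fixed seed are hypothetical choices, and the sample is the `Bar chart` column of `expResults.csv` (1 = yes, 0 = no).

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, conf=0.95, seed=0):
    """Percentile bootstrap: return (mean, lower, upper) for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Use the empirical quantiles of the resampled means as the interval bounds.
    alpha = (1.0 - conf) / 2.0
    lower = means[int(alpha * (n_resamples - 1))]
    upper = means[int((1.0 - alpha) * (n_resamples - 1))]
    return statistics.fmean(values), lower, upper


# "Bar chart" column of expResults.csv: 1 = judged misleading, 0 = not
bar_chart = [0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
mean, lower, upper = bootstrap_ci(bar_chart)
```

With only 12 binary responses per chart, the resampled means can take just a handful of distinct values, so intervals computed this way are wide and coarse.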


When participants identified a graph as misleading, they generally gave the correct explanation of why it was misleading. For example, one participant gave the following response for why the scatterplot was misleading:
> "This graph doesn't show the severity of the impact on drug overdose deaths and it does not have a y -axis and needs one for better representation of the graph to look at the number of deaths instead of hovering over on them"

<br/>


## Technical Achievements

<img src="tooltip.PNG" width="650" height="300">

Our technical achievement is the mouseover tooltip found in the scatterplot. The tooltip shows the year and number of deaths for a datapoint when moused over, and is not shown otherwise.

<br/>


## Visual Achievements

<img src="button.PNG" width="220" height="140">

For our visual achievement, we used CSS to make a nice button with a moving gradient on hover.

For the scope of this project, assume the role of a scientist who runs experiments for a living.

Q: How do we know that bar charts are "better" than pie charts?
A: Controlled experiments!

In this assignment you'll implement a simple controlled experiment using some of the visualizations you’ve been building in this class.
You'll need to develop support code for the experiment sequence, results file output, and other experiment components.
(These are all simple with Javascript buttons and forms.)
The main goals for you are to a) test three competing visualizations, b) implement data generation and error calculation functions inspired by Cleveland and McGill's 1984 paper, c) run the experiment with 10 participants (or a trial equivalent), and d) do some basic analysis and reporting of the results.

For this assignment you should aim to write everything from scratch. For experimentation it is often necessary to control all elements of the chart.
You should definitely *reference* demo programs from books or the web, and if you do please provide a References section with links at the end of your Readme.

Going Beyond Cleveland-McGill
---

Several have expressed interest in conducting surveys of various sorts. I encourage you to move beyond Cleveland and McGill if you can think of other interesting visualization experiment designs and corresponding analyses.

You might study how people interpret COVID visualizations, for example.
If you decide to go in a custom route, simply contact staff so we can help you set acceptable parameters.
Basically, we still want you to do a multi-trial study with each participant, to raise the chance that you get solid results.
How you measure "error" and similar facets also matters. But you can't go wrong with finding a visualization study online to start from :)

Requirements
---

- Look over Cleveland and McGill's original experiment (see the section below) and [watch this video](experiment-example.mp4) to get a sense of the experiment structure and where your visualizations will go.
- When viewing the example experiment video, note the following:
- Trials are in random order.
- Each trial has a randomly generated set of 5-10 data points.
- Two of these data points are marked.
- (Note: the experiment UI and User Experience could be better. Plenty of design achievements here).
- Implement the data generation code **as described in the Cleveland & McGill paper**.
- The goal is to generate a set of random datapoints (usually 5 or 10, with values between 0 and 100) and to mark two of them for comparison in the trial.
- Add 3 visualizations (i.e. conditions) to your experiment. When you are adding these visualizations, think about *why* these visualizations are interesting to test. In other words, keep in mind a *testable hypothesis* for each of the added visualization. Some good options include bar charts, pie charts, stacked-bar charts, and treemaps. You can also rotate your bar chart to be horizontal or upside-down as one of your conditions. You are encouraged to test unorthodox charts -- radar charts come to mind, but there are MANY possibilities here-- feel free to be creative!
- Follow the style from Cleveland and McGill closely (e.g. no color, simple lines) unless you are specifically testing a hypothesis (e.g. color versus no color). Pay attention to spacing between elements like bars. Do not mark bars for comparison using color-- this makes the perceptual task too easy.
- After each trial, implement code that grades and stores participant’s responses.
- At the end of the experiment, to get the data, one easy option is to use Javascript to show the data from the current experiment\* (i.e. a comma separated list in a text box) and copy it into your master datafile. See the Background section below for an example of what this file should look like. (\*Alternately implement a server, if you're experienced with that sort of thing.)

- Figure out how to calculate "Error", the difference between the true percentage and the reported percentage.
- Scale this error using Cleveland and McGill’s log-base-2 error equation. For details, see the background section (there’s a figure with the equation). This becomes your “Error” column in the output. Make sure you use whole percentages (not decimal) in the log-base-2 equation. Make sure you handle the case of when a person gets the exact percentage correct (log-base-2 of 1/8 is -3, it is better to set this to 0).
- Run your experiment with 10 or more participants, or-- make sure you get at least 200 trials **per visualization type** in total.
- Grab friends or people in the class.
- Run at least 20 trials per visualization type, per participant. This is to ensure that you cover the range of possible answers (e.g. 5%, 10%, ..., 95%)
- Make sure to save the resulting CSV after each participant. Compile the results into a master csv file (all participants, all trials).
- Produce a README with figures that shows the visualizations you tested and results, ordered by best performance to worst performance. Follow the modern Cleveland-McGill figure below -- though note that using names instead of icons is fine.
- To obtain the ranking, calculate and report the average log2Error for each visualization across all trials and participants. This should be straightforward to do in a spreadsheet.
- Use Bootstrapped 95\% confidence intervals for your error upper and lower bounds. Include these in your figures. Bootstrapped confidence intervals are easily implemented in R + ggplot2 using the `stat_summary` geom. You can also use Excel, Python, or many many other tools. Bootstrapped 95% CIs are **very** useful in modern experiment practice.
- Include example images of each visualization as they appeared in your experiment (i.e. if you used a pie chart show the actual pie chart you used in the experiment along with the markings, not an example from Google Images).
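
The data-generation requirement above could be sketched roughly as follows. This follows only the description in this README (5 or 10 random values between 0 and 100, with two of them marked), not the exact procedure in the Cleveland & McGill paper; `generate_trial`, `true_percent`, and their parameters are hypothetical names chosen for illustration, and Python is used here only for brevity.

```python
import random


def generate_trial(rng, sizes=(5, 10), low=0.0, high=100.0):
    """Return (values, marked): random data points and the two indices to compare."""
    n = rng.choice(sizes)  # 5 or 10 data points per trial
    values = [rng.uniform(low, high) for _ in range(n)]
    marked = tuple(sorted(rng.sample(range(n), 2)))  # two distinct positions
    return values, marked


def true_percent(values, marked):
    """Percentage that the smaller marked value is of the larger one."""
    a, b = values[marked[0]], values[marked[1]]
    larger = max(a, b)
    if larger == 0:
        return 100.0  # both values are zero; treat them as equal
    return 100.0 * min(a, b) / larger


rng = random.Random(42)  # seeded only so the example is repeatable
values, marked = generate_trial(rng)
answer = true_percent(values, marked)
```

The `answer` value is what a participant's reported percentage would be graded against in that trial.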

## General Requirements

0. Your code should be forked from the GitHub repo and linked using GitHub pages.
2. Your project should use d3 to build visualizations.
3. Your writeup (readme.md in the repo) should contain the following:

- Working link to the experiment hosted on gh-pages or some other site.
- Concise description and screenshot of your experiment.
- Description of the technical achievements you attempted with this project.
- Description of the design achievements you attempted with this project.

Background
---

In 1984, William Cleveland and Robert McGill published the results of several controlled experiments that pitted bar charts against pies and stacked-bar variants.
Their paper (http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf) (http://info.slis.indiana.edu/~katy/S637-S11/cleveland84.pdf) is considered a seminal paper in data visualization.
In particular, they ran a psychology-style experiment where users were shown a series of randomly-generated charts with two graphical elements marked like this:

![cleveland bar chart](img/cleveland-bar.png)

Participants were then asked, "What percentage is the smaller of the larger?".
This was repeated hundreds of times with varying data and charts.
By the end of the study, Cleveland and McGill had amassed a large dataset that looked like this:

![cleveland table](img/cleveland-table.png)

__Log-base-2 or "cm-error"__: The true percent is the actual percentage of the smaller to the larger, while the reported percent is what participants reported.
Cleveland and McGill recognized that their analyses would be biased if they took `abs(ReportedPercent – TruePercent)` as their score for error.
To compensate, they came up with a logarithmic scale for error with this equation:

![cleveland equation](img/cleveland-equation.png)

You’ll be implementing this error score as part of the lab.
(Hint: it’s not a trick question, this is just to familiarize you with the experiment protocol).
With this Cleveland-McGill error score you can better compare the performance of the charts you test to figure out which one performs the best.
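
As a rough sketch of the error score described above, the Cleveland-McGill measure is usually stated as log2(|reported percent - true percent| + 1/8). The function name `cm_error` is a hypothetical choice, and scoring an exact answer as 0 follows the note in the Requirements section rather than the original paper.

```python
import math


def cm_error(reported_percent, true_percent):
    """Cleveland-McGill log-base-2 error: log2(|reported - true| + 1/8).

    Both arguments are whole percentages (e.g. 50, not 0.5).
    """
    diff = abs(reported_percent - true_percent)
    if diff == 0:
        return 0.0  # exact answer: log2(1/8) would be -3, so score it as 0
    return math.log2(diff + 1 / 8)


# A report of 57.875% against a true value of 50% gives log2(7.875 + 0.125) = 3
example = cm_error(57.875, 50)
```

Averaging this score over all trials of one visualization type gives the per-chart value used for ranking.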

As a baseline, compare your average Error scores to the following chart, which includes both Cleveland and McGill’s results as well as more recent extensions of this experiment (lower error indicates better performance, and error bars are bootstrapped 95% confidence intervals (`http://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation`)):

![cleveland results](img/cleveland-results.png)

GitHub Details
---

- Fork the GitHub Repository. You now have a copy associated with your username.
- Make changes to index.html to fulfill the project requirements.
- Make sure your "master" branch matches your "gh-pages" branch. See the GitHub Guides referenced above if you need help.
- Edit this README.md with a link to your gh-pages site: e.g. http://YourUsernameGoesHere.github.io/Experiment/index.html
- Replace this file (README.md) with your writeup and Design/Technical achievements.
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.
- Name your submission using the following scheme:
```
a3-FirstLastnameMember1-FirstLastnameMember2-FirstLastnameMember3-...
```
1 change: 1 addition & 0 deletions _config.yml
@@ -0,0 +1 @@
theme: jekyll-theme-slate
7 changes: 7 additions & 0 deletions age-deaths.csv
@@ -0,0 +1,7 @@
age_group,number,"deaths_per"
15,4777,11.2
25,16375,35.6
35,16859,40.5
45,15083,36.9
55,12896,30.4
65+,4469,8.3
35 changes: 35 additions & 0 deletions bootstrapCalc.R
@@ -0,0 +1,35 @@
# https://www.geeksforgeeks.org/bootstrap-confidence-interval-with-r-programming/
# https://cran.r-project.org/web/packages/superb/superb.pdf

library(boot)
library(tidyverse)
library(superb)

# load result data
# 1 = Yes; 0 = No

data<-read.csv(file = 'expResults.csv', header = TRUE) #read data

newData<-data[,c("Pie.chart","Bar.chart","Scatter.plot")] #relevant columns

i<-nrow(newData) #count number of rows for re-sampling

# statistic passed to boot(): mean of the resampled values
# (boot() supplies i, the indices drawn for each resample)
boot.mean<-function(newData,i){
  mean(newData[i])
}

#bootstrap all columns
#gets bootstrap mean and confidence interval
apply(newData,2,function(y){
b<-boot(y,boot.mean,R=50000);
c(mean(b$t),boot.ci(b,type="perc", conf=0.95)$percent[4:5])
})

# plotting bootstrap
# center = bootstrap mean; lower/upper = distances from center to the CI bounds
dta <- data.frame(plot_type = c("Pie Chart","Bar Chart","Scatter Plot"),
                  center = c(0.9161,0.4996,0.3335),
                  lower = c(0.1661,0.2496,0.2502),
                  upper = c(0.0839,0.2504,0.2498))

# plot of error bars
ggplot(dta, aes_string(ymin="center-lower", ymax="center+upper", x = "plot_type" ) ) +
  geom_superberrorbar()
Binary file added button.PNG
Binary file added errorBars.png
13 changes: 13 additions & 0 deletions expResults.csv
@@ -0,0 +1,13 @@
Participant,Pie chart,Bar chart,Scatter plot,
1,1,0,0,
2,1,1,1,
3,1,0,0,
4,1,0,0,
5,1,0,0,
6,1,1,0,
7,1,1,1,
8,1,1,0,
9,1,1,0,
10,1,0,1,
11,0,0,0,
12,1,1,1,
Binary file removed experiment-example.mp4
Binary file not shown.
Binary file added histogram.PNG
Binary file removed img/cleveland-bar.png
Binary file not shown.
Binary file removed img/cleveland-equation.png
Binary file not shown.
Binary file removed img/cleveland-results.png
Binary file not shown.
Binary file removed img/cleveland-table.png
Binary file not shown.