Analysis on #34 #75
Hi, two big things here:
- Your notebook articulates what you've done, but it does not articulate why. What does this analysis help us understand? Why are you interested in this question? What do you want to know about fingerprinting, tracking, ...? How does this analysis further work on the core issue in "Can we build a heuristic for browser attribute fingerprinting?" (#34)?
- Your methodology does not support your stated aim. You say "It calculates the percentage of each of the three scripts with respect to the total number of scripts" but that's not what you've done. Please think about why this might be, and update your methodology. I'm deliberately not telling you, to give you the opportunity to think more about what the data is yourself.
Smaller things:
- You've used a 10% sample, is that appropriate for the analysis you're doing?
- You have done a lot of unnecessary computation, which I'm assuming took up RAM and time. I don't see any need for each of the `df.compute()` calls you've made.
- The result of the `df.compute()` calls is that you've output a lot of data that adds nothing to my reading of the analysis you've performed. Try to keep things clean so the knowledge you're generating is easy to read.
- You are manually transcribing counts into values. Use a variable.
- It's not clear to me why in cell 40 and cell 45 you have the same values.
Hi @birdsarah, I realized I wrongly assumed that each row holds a unique script and did not account for redundancy. I first need to find the total count of unique scripts, and the counts of unique fingerprintjs, hs-analytics, and akam scripts.
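
To make the difference between counting rows and counting unique scripts concrete, here is a minimal sketch on toy data. It uses plain pandas; the column name `script_url`, the example URLs, and the substring patterns for the three script families are all assumptions for illustration, not the notebook's actual values.

```python
import pandas as pd

# Toy stand-in for the crawl data: one row per JavaScript call, so the same
# script can appear many times. The column name `script_url` is an assumption.
calls = pd.DataFrame({
    "script_url": [
        "https://a.example/fingerprint2.min.js",
        "https://a.example/fingerprint2.min.js",
        "https://a.example/fingerprint2.min.js",
        "https://b.example/hs-analytics.js",
        "https://c.example/akam/10/abc",
        "https://c.example/akam/10/abc",
        "https://d.example/app.js",
        "https://e.example/app.js",
    ]
})

# Hypothetical substring patterns for the three script families.
patterns = {
    "fingerprintjs": "fingerprint2",
    "hs-analytics": "hs-analytics",
    "akam": "/akam/",
}

# Deduplicate first, so each script is counted once regardless of call volume.
unique_scripts = calls["script_url"].drop_duplicates()
total_unique = len(unique_scripts)

# Percentage of unique scripts per family, kept in variables rather than
# transcribed by hand.
percent_by_family = {
    name: 100 * unique_scripts.str.contains(pattern, regex=False).sum() / total_unique
    for name, pattern in patterns.items()
}

# For comparison: the share of rows (calls) that match fingerprintjs.
row_share = 100 * calls["script_url"].str.contains("fingerprint2", regex=False).mean()

print(total_unique, percent_by_family, row_share)
```

On this toy data the fingerprintjs pattern matches 3 of 8 rows but only 1 of 5 unique scripts, which is the gap the row-based percentage hides. With a Dask DataFrame the same steps should apply, ideally with a single `.compute()` on the final small result.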