From 98975c1451532032c778d1f1d9bf8e092b2e4897 Mon Sep 17 00:00:00 2001
From: Guillaume NICOLAS
Date: Sun, 15 Sep 2024 09:05:11 +0200
Subject: [PATCH] doc(readme): add section to explain 2024_07_01 algorithm change

---
 README.md                          | 6 ++++++
 lib/markdown/updates/2024-07_01.md | 3 +++
 2 files changed, 9 insertions(+)
 create mode 100644 lib/markdown/updates/2024-07_01.md

diff --git a/README.md b/README.md
index c509fca..c3e9e6f 100644
--- a/README.md
+++ b/README.md
@@ -64,6 +64,12 @@ console.log(entity)
 
 ## Updates
 
+## 2024-07-01 dataset
+
+Some third parties serve their main script from a dynamic subdomain (e.g. .domain.com). Some of these subdomain scripts are saved in the observed-domains JSON file as results of the `sql/all-observed-domains-query.sql` query, but when analyzing the HTTP Archive database we found that many are ignored because their number of occurrences is too low (less than 50).
+
+So, we've created a new query that keeps observed domains with fewer than 50 occurrences only if their mapped entity (based on entity.js) has a total occurrence count (across all its declared domains) greater than 50.
+
 ## 2021-01-01 dataset
 
 Due to a change in HTTPArchive measurement which temporarily disabled site-isolation (out-of-process iframes), all of the third-parties whose work previously took place off the main-thread are now counted _on_ the main thread (and thus appear in our stats). This is most evident in the change to Google-owned properties such as YouTube and Doubleclick whose _complete_ cost are now captured.
diff --git a/lib/markdown/updates/2024-07_01.md b/lib/markdown/updates/2024-07_01.md
new file mode 100644
index 0000000..828cc09
--- /dev/null
+++ b/lib/markdown/updates/2024-07_01.md
@@ -0,0 +1,3 @@
+Some third parties serve their main script from a dynamic subdomain (e.g. .domain.com). Some of these subdomain scripts are saved in the observed-domains JSON file as results of the `sql/all-observed-domains-query.sql` query, but when analyzing the HTTP Archive database we found that many are ignored because their number of occurrences is too low (less than 50).
+
+So, we've created a new query that keeps observed domains with fewer than 50 occurrences only if their mapped entity (based on entity.js) has a total occurrence count (across all its declared domains) greater than 50.
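
Note: the filtering rule described in the added paragraphs can be illustrated with a small JavaScript sketch. This is not the SQL query from the patch. The row shape (`domain`, `totalOccurrences`), the `entityNameFor` lookup, the `.test` domains, and the choice to keep domains with exactly 50 occurrences are all assumptions made for illustration. Only the threshold of 50 comes from the text.

```javascript
// Threshold quoted in the README text.
const MIN_OCCURRENCES = 50

// Hypothetical stand-in for the entity mapping: domain suffix -> entity name.
const entityBySuffix = new Map([
  ['example-cdn.test', 'Example CDN'],
  ['other.test', 'Other'],
])

// Returns the entity name for a domain, or undefined when nothing matches.
function entityNameFor(domain) {
  for (const [suffix, name] of entityBySuffix) {
    if (domain === suffix || domain.endsWith('.' + suffix)) return name
  }
  return undefined
}

// Keeps a row when it reaches the threshold on its own, or when the entity
// it maps to exceeds the threshold once all of its domains are summed.
function filterObservedDomains(rows, getEntityName) {
  const totalsByEntity = new Map()
  for (const {domain, totalOccurrences} of rows) {
    const name = getEntityName(domain)
    if (name === undefined) continue
    totalsByEntity.set(name, (totalsByEntity.get(name) ?? 0) + totalOccurrences)
  }

  return rows.filter(({domain, totalOccurrences}) => {
    if (totalOccurrences >= MIN_OCCURRENCES) return true
    const name = getEntityName(domain)
    return name !== undefined && totalsByEntity.get(name) > MIN_OCCURRENCES
  })
}

// Made-up sample rows: two low-count subdomains of the same entity,
// one low-count domain of a small entity, one unmapped domain, one popular domain.
const rows = [
  {domain: 'a1.example-cdn.test', totalOccurrences: 30},
  {domain: 'b2.example-cdn.test', totalOccurrences: 40},
  {domain: 'rare.other.test', totalOccurrences: 10},
  {domain: 'unknown.test', totalOccurrences: 5},
  {domain: 'big.test', totalOccurrences: 500},
]

const kept = filterObservedDomains(rows, entityNameFor)
console.log(kept.map((row) => row.domain))
// prints [ 'a1.example-cdn.test', 'b2.example-cdn.test', 'big.test' ]
```

In this sample the two `example-cdn.test` subdomains survive because their entity totals 70 occurrences, while `rare.other.test` and the unmapped `unknown.test` are still dropped.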