HTTP Archive BigQuery pipeline with Dataform

Tables

Crawl tables in the `all` dataset

Tag: crawl_results_all

  • httparchive.all.pages
  • httparchive.all.parsed_css
  • httparchive.all.requests

Core Web Vitals Technology Report

Tag: cwv_tech_report

  • httparchive.core_web_vitals.technologies

Legacy crawl tables (to be deprecated)

Tag: crawl_results_legacy

  • httparchive.lighthouse.YYYY_MM_DD_client
  • httparchive.pages.YYYY_MM_DD_client
  • httparchive.requests.YYYY_MM_DD_client
  • httparchive.response_bodies.YYYY_MM_DD_client
  • httparchive.summary_pages.YYYY_MM_DD_client
  • httparchive.summary_requests.YYYY_MM_DD_client
  • httparchive.technologies.YYYY_MM_DD_client
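
Each of these tables is produced by a Dataform action carrying the corresponding tag, so a whole group can be built or scheduled together. As a rough sketch, assuming hypothetical file paths and upstream table names rather than this repository's actual definitions, a tagged definition looks like:

```sqlx
-- Hypothetical definition; the upstream table name is illustrative only.
config {
  type: "table",
  schema: "all",
  tags: ["crawl_results_all"],
  bigquery: {
    partitionBy: "date"
  }
}

SELECT *
FROM ${ref("crawl_staging_pages")}  -- hypothetical upstream source
WHERE date = "${dataform.projectConfig.vars.current_month}"
```

Running a workflow filtered to the crawl_results_all tag then rebuilds every action in that group.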

Schedules

  1. crawl-complete PubSub subscription

    Tags:

    • crawl_results_all
    • crawl_results_legacy
  2. bq-poller-cwv-tech-report Cloud Scheduler job

    Tags:

    • cwv_tech_report

Triggering workflows

See here.

Contributing

Dataform development

  1. Create a new dev workspace in Dataform.
  2. Make adjustments to the Dataform configuration files and manually run a workflow to verify the changes.
  3. Push your changes to a dev branch and open a PR with a link to the BigQuery artifacts generated by the test workflow.

Dataform development workspace hints

  1. In the workflow settings vars, set dev_name: dev to process sampled data in the dev workspace (see the sketch after this list).
  2. Set the current_month variable to a month in the past; this can be helpful for testing pipelines based on chrome-ux-report data.
  3. The definitions/extra/test_env.sqlx script helps set up the tables required to run pipelines in a dev workspace. It is disabled by default.
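
As a sketch of how these variables can be read inside a definition, assuming hypothetical table names (sample_pages, pages) rather than the repository's actual sources:

```sqlx
config {
  type: "table",
  tags: ["cwv_tech_report"]
}

js {
  // Vars come from the workflow settings; dev_name: "dev" switches to sampled data.
  const isDev = dataform.projectConfig.vars.dev_name === "dev";
  const currentMonth = dataform.projectConfig.vars.current_month;
}

SELECT
  date,
  client,
  COUNT(DISTINCT page) AS pages
FROM ${isDev ? ref("sample_pages") : ref("pages")}  -- hypothetical table names
WHERE date = "${currentMonth}"
GROUP BY date, client
```

Because dev_name and current_month are read through dataform.projectConfig.vars, changing the workflow settings is enough to point the same definition at sampled data or an earlier month.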
