🎉 engineering: add prefect engine when running ETL #3029

Open
wants to merge 9 commits into
base: master

Marigold
Collaborator

@Marigold Marigold commented Jul 29, 2024

Adds an --engine option for choosing which scheduler to use (the default is --engine etl). With --engine prefect, the ETL is orchestrated by Prefect, which creates a SQLite file that can be inspected with the Prefect UI.

The Prefect UI runs on http://staging-site-prefect:4200/flow-runs (there's a new link from Wizard). Here's an example of a run after changing regions.

It runs steps concurrently with a single worker (default) and uses Dask with multiple workers (flag --workers).
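As a rough illustration of what running steps concurrently with a configurable number of workers can look like, here is a standard-library sketch. It is not the code in this PR (which uses Prefect, and Dask when several workers are requested): run_steps, the step names and the status strings are all made up for the example. The sketch also keeps going after a failure and only skips the dependents of the failed step, which is one possible reading of the continue-on-error behaviour mentioned in the comparison below.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_steps(dag, actions, workers=1):
    """Run every step of `dag` (step -> set of dependencies) in a thread pool.

    Hypothetical helper: returns a dict mapping each step to
    "ok", "failed" or "skipped".
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the DAG has a cycle
    status = {}
    running = {}  # future -> step name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            for step in sorter.get_ready():
                if any(status[dep] != "ok" for dep in dag.get(step, ())):
                    # An upstream step failed or was skipped: do not run this
                    # one, but let unrelated steps carry on.
                    status[step] = "skipped"
                    sorter.done(step)
                else:
                    running[pool.submit(actions[step])] = step
            if not running:
                continue
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                step = running.pop(future)
                status[step] = "failed" if future.exception() else "ok"
                sorter.done(step)
    return status


def ok():
    return None


def boom():
    raise ValueError("simulated step failure")


# Hypothetical step names, loosely modelled on ETL channels.
dag = {
    "garden/a": {"meadow/a"},
    "grapher/a": {"garden/a"},
    "garden/b": {"meadow/b"},
}
actions = {step: ok for step in ["garden/a", "grapher/a", "meadow/b", "garden/b"]}
actions["meadow/a"] = boom

status = run_steps(dag, actions, workers=2)
print(status)
```

Here the failure of meadow/a causes garden/a and grapher/a to be skipped, while meadow/b and garden/b still complete.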

Comparison to --engine etl

  • Prefect doesn't interrupt other tasks on error and tries to complete as many tasks as possible
  • When using multiple workers, it's easy to find the failing task and the exception (it's not as trivial from Buildkite logs)
  • structlog.info adds colour to output, but Prefect can't decode it and prints the raw ANSI escape codes instead, e.g. [0m [32m [1m around the log level. This should be fixed soon in Enhancement: ANSI color support in logs PrefectHQ/prefect-ui-library#2582
  • There's not much overhead from Prefect compared to multiprocessing
  • We could leverage Dask features (e.g. memory limiting), but it's also likely that naive multiprocessing is good enough for us and we shouldn't complicate our lives (like with DVC)
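Until the UI renders colours, one possible workaround for the escape-code bullet above is to strip ANSI colour sequences from messages before they reach the log viewer. This is only a sketch: strip_ansi is a hypothetical helper, the sample log line is invented, and nothing here is part of this PR or of Prefect.

```python
import re

# Matches CSI sequences such as ESC[0m, ESC[32m or ESC[1m (colour/style codes).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI colour/style escape sequences from a log message."""
    return ANSI_CSI.sub("", text)


# Invented sample in the style of a coloured console log line.
coloured = (
    "\x1b[2m2024-07-29 10:00:00\x1b[0m "
    "[\x1b[32m\x1b[1minfo     \x1b[0m] step finished"
)
plain = strip_ansi(coloured)
print(plain)
```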

My 2 cents

I'd find it very helpful for inspecting ETL runs on staging servers and in production (where I find searching through logs really annoying). We could give it a try and see if it was worth it in a few weeks. Prefect could also be useful for automatic dataset updates, which currently exist as bash scripts and are run by Buildkite. It works, but as @lucasrodes suggested, we might need more flexibility.

TODO before merging

  • Undo changes to regions.yml

@owidbot
Contributor

owidbot commented Jul 29, 2024

Quick links (staging server):

Site Admin Wizard Docs

Login: ssh owid@staging-site-prefect

chart-diff: ✅ No charts for review.
data-diff:
= Dataset garden/agriculture/2024-03-26/attainable_yields
  = Table attainable_yields
= Dataset garden/agriculture/2024-03-26/long_term_crop_yields
  = Table long_term_crop_yields
= Dataset garden/agriculture/2024-03-26/long_term_wheat_yields
  = Table long_term_wheat_yields
= Dataset garden/agriculture/2024-03-26/uk_long_term_yields
  = Table uk_long_term_yields
= Dataset garden/agriculture/2024-05-23/daily_calories_per_person
  = Table daily_calories_per_person
= Dataset garden/animal_welfare/2023-08-08/farmed_finfishes_used_for_food
  = Table farmed_finfishes_used_for_food
= Dataset garden/animal_welfare/2023-08-14/number_of_farmed_fish
  = Table number_of_farmed_fish
= Dataset garden/animal_welfare/2023-08-15/number_of_farmed_decapod_crustaceans
  = Table number_of_farmed_decapod_crustaceans
= Dataset garden/animal_welfare/2023-08-16/number_of_wild_fish_killed_for_food
  = Table number_of_wild_fish_killed_for_food
= Dataset garden/animal_welfare/2024-05-20/animals_used_for_food
  = Table animals_used_for_food
= Dataset garden/artificial_intelligence/2023-06-14/ai_national_strategy
  = Table ai_national_strategy
= Dataset garden/artificial_intelligence/2023-07-25/cset
  = Table cset
= Dataset garden/artificial_intelligence/2024-06-28/ai_strategies
  = Table ai_strategies
= Dataset garden/artificial_intelligence/2024-07-16/cset
  = Table cset
= Dataset garden/bgs/2024-07-09/world_mineral_statistics
  = Table world_mineral_statistics_flat
  = Table world_mineral_statistics
= Dataset garden/climate/2024-02-19/monthly_burned_area
  = Table monthly_burned_area
= Dataset garden/climate/2024-02-19/monthly_fire_emissions
  = Table monthly_fire_emissions
= Dataset garden/climate_watch/2023-10-31/emissions_by_sector
  = Table carbon_dioxide_emissions_by_sector
  = Table methane_emissions_by_sector
  = Table nitrous_oxide_emissions_by_sector
  = Table greenhouse_gas_emissions_by_sector
  = Table fluorinated_gas_emissions_by_sector
= Dataset garden/countries/2023-09-25/gleditsch
  = Table gleditsch_countries
  = Table gleditsch_regions
  = Table gleditsch
= Dataset garden/countries/2023-09-25/isd
  = Table isd
  = Table isd_regions
  = Table isd_countries
= Dataset garden/countries/2023-09-29/cow_ssm
  = Table cow_ssm_majors
  = Table cow_ssm_system
  = Table cow_ssm_countries
  = Table cow_ssm_regions
  = Table cow_ssm_states
= Dataset garden/cow/2024-07-26/national_material_capabilities
  = Table national_material_capabilities
= Dataset garden/democracy/2024-03-07/bmr
  = Table population_regime
  = Table bmr
  = Table num_countries_regime
  = Table population_regime_years
  = Table num_countries_regime_years
= Dataset garden/democracy/2024-03-07/eiu
  = Table avg_pop
  = Table num_countries
  = Table num_people
  = Table eiu
= Dataset garden/democracy/2024-03-07/ert
  = Table region_aggregates
  = Table ert
= Dataset garden/democracy/2024-03-07/fh
  = Table fh_regions
  = Table fh
= Dataset garden/democracy/2024-03-07/lexical_index
  = Table region_aggregates
  = Table lexical_index
= Dataset garden/democracy/2024-03-07/polity
  = Table avg_pop
  = Table num_countries
  = Table num_people
  = Table polity
= Dataset garden/democracy/2024-03-07/vdem
  = Table vdem_population
  = Table vdem_num_countries
  = Table vdem_multi_with_regions
  = Table vdem_multi_without_regions
  = Table vdem
= Dataset garden/demography/2023-03-31/population
  = Table population
  = Table population_original
= Dataset garden/demography/2023-06-27/world_population_comparison
  = Table world_population_comparison
= Dataset garden/demography/2024-07-15/population
  = Table population_density
  = Table population_original
  = Table historical
  = Table projections
  = Table population
  = Table population_growth_rate
= Dataset garden/demography/2024-07-18/population_doubling_times
  = Table population_doubling_times
= Dataset garden/education/2023-07-17/education_barro_lee_projections
  = Table education_barro_lee_projections
= Dataset garden/education/2023-07-17/education_lee_lee
  = Table education_lee_lee
= Dataset garden/eia/2023-12-12/energy_consumption
  = Table energy_consumption
= Dataset garden/ember/2024-05-08/yearly_electricity
  = Table yearly_electricity
= Dataset garden/emdat/2024-04-11/natural_disasters
  = Table natural_disasters_yearly_deaths
  = Table natural_disasters_yearly
  = Table natural_disasters_yearly_impact
  = Table natural_disasters_decadal_deaths
  = Table natural_disasters_decadal_impact
  = Table natural_disasters_decadal
= Dataset garden/emissions/2024-04-08/national_contributions
  = Table national_contributions
= Dataset garden/emissions/2024-06-20/gdp_and_co2_decoupling
  = Table gdp_and_co2_decoupling
= Dataset garden/energy/2024-05-08/photovoltaic_cost_and_capacity
  = Table photovoltaic_cost_and_capacity
= Dataset garden/energy/2024-06-20/electricity_mix
  = Table electricity_mix
= Dataset garden/energy/2024-06-20/energy_mix
  = Table energy_mix
= Dataset garden/energy/2024-06-20/fossil_fuel_production
  = Table fossil_fuel_production
= Dataset garden/energy/2024-06-20/fossil_fuel_reserves_production_ratio
  = Table fossil_fuel_reserves_production_ratio
= Dataset garden/energy/2024-06-20/global_primary_energy
  = Table global_primary_energy
= Dataset garden/energy/2024-06-20/primary_energy_consumption
  = Table primary_energy_consumption
= Dataset garden/energy/2024-06-20/uk_historical_electricity
  = Table uk_historical_electricity
= Dataset garden/energy_institute/2024-06-20/statistical_review_of_world_energy
  = Table statistical_review_of_world_energy_prices
  = Table statistical_review_of_world_energy
  = Table statistical_review_of_world_energy_price_index
= Dataset garden/ess/2023-08-02/ess_trust
  = Table ess_trust
= Dataset garden/eth/2023-03-15/ethnic_power_relations
  = Table ethnic_power_relations
= Dataset garden/faostat/2024-03-14/additional_variables
  = Table macronutrient_compositions
  = Table vegetable_oil_yields
  = Table fertilizer_exports
  = Table arable_land_per_crop_output
  = Table hypothetical_meat_consumption
  = Table cereal_allocation
  = Table maize_and_wheat
  = Table fertilizers
  = Table area_used_per_crop_type
  = Table food_available_for_consumption
  = Table land_spared_by_increased_crop_yields
  = Table share_of_sustainable_and_overexploited_fish
  = Table agriculture_land_use_evolution
= Dataset garden/faostat/2024-03-14/faostat_cahd
  = Table faostat_cahd
  = Table faostat_cahd_flat
= Dataset garden/faostat/2024-03-14/faostat_ei
  = Table faostat_ei_flat
  = Table faostat_ei
= Dataset garden/faostat/2024-03-14/faostat_ek
  = Table faostat_ek
  = Table faostat_ek_flat
= Dataset garden/faostat/2024-03-14/faostat_emn
  = Table faostat_emn_flat
  = Table faostat_emn
= Dataset garden/faostat/2024-03-14/faostat_esb
  = Table faostat_esb
  = Table faostat_esb_flat
= Dataset garden/faostat/2024-03-14/faostat_fa
  = Table faostat_fa
  = Table faostat_fa_flat
= Dataset garden/faostat/2024-03-14/faostat_fbsc
  = Table faostat_fbsc_flat
  = Table faostat_fbsc
= Dataset garden/faostat/2024-03-14/faostat_fo
  = Table faostat_fo
  = Table faostat_fo_flat
= Dataset garden/faostat/2024-03-14/faostat_food_explorer
  = Table faostat_food_explorer
= Dataset garden/faostat/2024-03-14/faostat_fs
  = Table faostat_fs_flat
  = Table faostat_fs
= Dataset garden/faostat/2024-03-14/faostat_ic
  = Table faostat_ic_flat
  = Table faostat_ic
= Dataset garden/faostat/2024-03-14/faostat_lc
  = Table faostat_lc_flat
  = Table faostat_lc
= Dataset garden/faostat/2024-03-14/faostat_qcl
  = Table faostat_qcl
  = Table faostat_qcl_flat
= Dataset garden/faostat/2024-03-14/faostat_qi
  = Table faostat_qi_flat
  = Table faostat_qi
= Dataset garden/faostat/2024-03-14/faostat_qv
  = Table faostat_qv_flat
  = Table faostat_qv
= Dataset garden/faostat/2024-03-14/faostat_rfb
  = Table faostat_rfb_flat
  = Table faostat_rfb
= Dataset garden/faostat/2024-03-14/faostat_rfn
  = Table faostat_rfn_flat
  = Table faostat_rfn
= Dataset garden/faostat/2024-03-14/faostat_rl
  = Table faostat_rl_flat
  = Table faostat_rl
= Dataset garden/faostat/2024-03-14/faostat_rp
  = Table faostat_rp
  = Table faostat_rp_flat
= Dataset garden/faostat/2024-03-14/faostat_rt
  = Table faostat_rt
  = Table faostat_rt_flat
= Dataset garden/faostat/2024-03-14/faostat_scl
  = Table faostat_scl
  = Table faostat_scl_flat
= Dataset garden/faostat/2024-03-14/faostat_sdgb
  = Table faostat_sdgb_flat
  = Table faostat_sdgb
= Dataset garden/faostat/2024-03-14/faostat_tcl
  = Table faostat_tcl
  = Table faostat_tcl_flat
= Dataset garden/faostat/2024-03-14/faostat_ti
  = Table faostat_ti_flat
  = Table faostat_ti
= Dataset garden/forests/2024-05-08/ifl
  = Table ifl
= Dataset garden/forests/2024-07-10/tree_cover_loss_by_driver
  = Table tree_cover_loss_by_driver
= Dataset garden/gcp/2024-06-20/global_carbon_budget
  = Table global_carbon_budget
= Dataset garden/ggdc/2022-11-28/penn_world_table
  = Table penn_world_table
= Dataset garden/happiness/2024-06-09/happiness
  = Table happiness
= Dataset garden/harvard/2023-09-18/colonial_dates_dataset
  = Table colonial_dates_dataset
= Dataset garden/harvard/2024-07-22/global_military_spending_dataset
  = Table global_military_spending_dataset
= Dataset garden/health/2023-04-18/wgm_mental_health
  = Table wgm_mental_health
= Dataset garden/health/2023-04-25/wgm_2018
  = Table wgm_2018
= Dataset garden/health/2023-08-09/unaids
  = Table unaids
= Dataset garden/health/2023-08-14/avian_influenza_h5n1_kucharski
  = Table avian_influenza_h5n1_kucharski
= Dataset garden/health/2023-08-16/deaths_karlinsky
  = Table deaths_karlinsky
= Dataset garden/health/2024-04-02/organ_donation_and_transplantation
  = Table organ_donation_and_transplantation
= Dataset garden/health/2024-04-12/polio_free_countries
  = Table polio_free_countries
= Dataset garden/hyde/2024-01-02/all_indicators
  = Table all_indicators
= Dataset garden/irena/2023-12-12/renewable_electricity_capacity
  = Table renewable_electricity_capacity
= Dataset garden/irena/2023-12-12/renewable_energy_patents
  = Table renewable_energy_patents
  = Table renewable_energy_patents_by_technology
= Dataset garden/lgbt_rights/2023-04-27/lgbti_policy_index
  = Table lgbti_policy_index
= Dataset garden/lgbt_rights/2024-06-03/equaldex
  = Table equaldex
= Dataset garden/lgbt_rights/2024-06-11/criminalization_mignot
  = Table criminalization_mignot
= Dataset garden/lis/2024-06-13/luxembourg_income_study
  = Table lis_percentiles
  = Table luxembourg_income_study
  = Table luxembourg_income_study_adults
  = Table lis_percentiles_adults
= Dataset garden/maternal_mortality/2024-07-08/maternal_mortality
  = Table maternal_mortality
= Dataset garden/minerals/2024-07-15/minerals
  = Table minerals
= Dataset garden/missing_data/2024-03-26/children_out_of_school
  = Table children_out_of_school
= Dataset garden/missing_data/2024-03-26/who_md_suicides
  = Table who_md_suicides
= Dataset garden/missing_data/2024-03-26/who_neuropsychiatric_conditions
  = Table neuropsychiatric_conditions
= Dataset garden/neglected_tropical_diseases/2024-05-02/lymphatic_filariasis
  = Table lymphatic_filariasis_national
  = Table lymphatic_filariasis
= Dataset garden/neglected_tropical_diseases/2024-05-02/schistosomiasis
  = Table schistosomiasis
= Dataset garden/news/2024-05-08/guardian_mentions
  = Table guardian_mentions
  = Table avg_10y
= Dataset garden/noaa_ncei/2024-05-09/natural_hazards
  = Table natural_hazards
= Dataset garden/oecd/2024-07-01/road_accidents
  = Table road_accidents
= Dataset garden/owid/latest/key_indicators
  = Table land_area
  = Table population_density
  = Table population
= Dataset garden/pew/2024-06-03/same_sex_marriage
  = Table same_sex_marriage
= Dataset garden/regions/2023-01-01/regions
  = Table regions
= Dataset garden/research_development/2024-05-20/patents_articles
  = Table patents_articles
= Dataset garden/shift/2023-12-12/energy_production_from_fossil_fuels
  = Table energy_production_from_fossil_fuels
= Dataset garden/smoking/2024-05-30/cigarette_sales
  = Table cigarette_sales
= Dataset garden/state_capacity/2023-10-19/state_capacity_dataset
  = Table state_capacity_dataset
= Dataset garden/state_capacity/2023-11-10/information_capacity_dataset
  = Table information_capacity_dataset
= Dataset garden/survey/2023-08-04/trust_surveys
  = Table trust_surveys
= Dataset garden/technology/2022/internet
  = Table users
= Dataset garden/terrorism/2023-07-20/global_terrorism_database
  = Table global_terrorism_database
= Dataset garden/tuberculosis/2023-11-27/budget
  = Table budget
= Dataset garden/tuberculosis/2023-11-27/burden_disaggregated
  = Table burden_disaggregated
  = Table burden_disaggregated_rate
= Dataset garden/tuberculosis/2023-11-27/burden_estimates
  = Table burden_estimates
= Dataset garden/tuberculosis/2023-11-27/drug_resistance_surveillance
  = Table drug_resistance_surveillance
= Dataset garden/tuberculosis/2023-11-27/laboratories
  = Table laboratories
= Dataset garden/tuberculosis/2023-11-27/notifications
  = Table notifications
= Dataset garden/tuberculosis/2023-11-27/outcomes_disagg
  = Table outcomes_disagg
= Dataset garden/un/2023-08-02/comtrade_pandemics
  = Table comtrade_pandemics
= Dataset garden/un/2023-10-09/plastic_waste
  = Table plastic_waste
= Dataset garden/un/2023-10-30/un_members
  = Table un_members
= Dataset garden/un/2024-01-17/urbanization_urban_rural
  = Table urbanization_urban_rural
= Dataset garden/un/2024-07-08/maternal_mortality
  = Table maternal_mortality
= Dataset garden/un/2024-07-25/refugee_data
  = Table refugee_data
= Dataset garden/un/2024-07-25/resettlement
  = Table resettlement
= Dataset garden/unep/2023-03-17/consumption_controlled_substances
  = Table consumption_controlled_substances
= Dataset garden/unicef/2024-07-30/child_migration
  = Table child_migration
= Dataset garden/urbanization/2024-01-26/ghsl_degree_of_urbanisation
  = Table ghsl_degree_of_urbanisation
= Dataset garden/war/2023-09-21/brecke
  = Table brecke
= Dataset garden/war/2023-09-21/cow
  = Table cow_country
  = Table cow_locations
  = Table cow
= Dataset garden/war/2023-09-21/cow_mid
  = Table cow_mid_country
  = Table cow_mid
= Dataset garden/war/2023-09-21/mars
  = Table mars
  = Table mars_country
= Dataset garden/war/2023-09-21/mie
  = Table mie
  = Table mie_country
= Dataset garden/war/2023-09-21/prio_v31
  = Table prio_v31
  = Table prio_v31_country
= Dataset garden/war/2023-09-21/ucdp
  = Table ucdp
  = Table ucdp_locations
  = Table ucdp_country
= Dataset garden/war/2023-09-21/ucdp_prio
  = Table ucdp_prio
= Dataset garden/war/2023-09-27/peace_diehl
  = Table peace_diehl
  = Table peace_diehl_agg
= Dataset garden/war/2024-01-11/nuclear_weapons_proliferation
  = Table nuclear_weapons_proliferation_counts
  = Table nuclear_weapons_proliferation
= Dataset garden/war/2024-01-23/nuclear_weapons_treaties
  = Table nuclear_weapons_treaties_country_counts
  = Table nuclear_weapons_treaties
= Dataset garden/wash/2024-01-06/who
  = Table who
= Dataset garden/wb/2024-07-29/income_groups
  = Table income_groups
  = Table income_groups_latest
= Dataset garden/who/2022-09-30/ghe
  = Table ghe_suicides_ratio
  = Table ghe
= Dataset garden/who/2023-06-01/cholera
  = Table cholera
2024-08-06 09:44:15 [error    ] Traceback (most recent call last):

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/simplejson/__init__.py", line 514, in loads
    return _default_decoder.decode(s)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/simplejson/decoder.py", line 386, in decode
    obj, end = self.raw_decode(s)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/simplejson/decoder.py", line 416, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())

simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/home/owid/etl/etl/datadiff.py", line 423, in cli
    lines = future.result()

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception

  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/owid/etl/etl/datadiff.py", line 416, in func
    differ.summary()

  File "/home/owid/etl/etl/datadiff.py", line 254, in summary
    self._diff_tables(self.ds_a, self.ds_b, table_name)

  File "/home/owid/etl/etl/datadiff.py", line 122, in _diff_tables
    table_a = future_a.result()

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception

  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 330, in wrapped_f
    return self(f, *args, **kw)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 467, in __call__
    do = self.iter(retry_state=retry_state)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 368, in iter
    result = action(retry_state)

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 390, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()

  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 470, in __call__
    result = fn(*args, **kwargs)

  File "/home/owid/etl/etl/datadiff.py", line 837, in get_table_with_retry
    return ds[table_name]

  File "/home/owid/etl/etl/datadiff.py", line 278, in __getitem__
    return tables.load()

  File "/home/owid/etl/lib/catalog/owid/catalog/catalogs.py", line 312, in load
    return self.iloc[0].load()  # type: ignore

  File "/home/owid/etl/lib/catalog/owid/catalog/catalogs.py", line 363, in load
    return Table.read(uri)

  File "/home/owid/etl/lib/catalog/owid/catalog/tables.py", line 177, in read
    table = cls.read_feather(path, **kwargs)

  File "/home/owid/etl/lib/catalog/owid/catalog/tables.py", line 349, in read_feather
    cls._add_metadata(df, path, **kwargs)

  File "/home/owid/etl/lib/catalog/owid/catalog/tables.py", line 321, in _add_metadata
    metadata = cls._read_metadata(path)

  File "/home/owid/etl/lib/catalog/owid/catalog/tables.py", line 383, in _read_metadata
    return cast(Dict[str, Any], requests.get(metadata_path).json())

  File "/home/owid/etl/.venv/lib/python3.10/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

= Dataset garden/who/2024-02-14/gho_suicides
  = Table gho_suicides
  = Table gho_suicides_ratio
= Dataset garden/who/2024-04-08/polio
  = Table polio
= Dataset garden/who/2024-05-20/vehicles
  = Table vehicles
= Dataset garden/who/latest/avian_influenza_ah5n1
  = Table avian_influenza_ah5n1_month
  = Table avian_influenza_ah5n1_year
= Dataset garden/wid/2024-05-24/world_inequality_database
  = Table world_inequality_database
  = Table world_inequality_database_distribution
  = Table world_inequality_database_fiscal
= Dataset garden/wvs/2023-06-25/longitudinal_wvs
  = Table longitudinal_wvs

⚠ Found errors, create an issue please

Legend: +New  ~Modified  -Removed  =Identical  Details
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2024-11-06 10:24:54 UTC
Execution time: 4.63 seconds

@lucasrodes
Member

lucasrodes commented Jul 31, 2024

This looks great, thanks for doing this, Mojmir.

This would be an alternative to Buildkite, where error logs are more legible?

I tried checking the Prefect UI on Wizard, and realised that one needs to first run make prefect-ui. This feels a bit confusing at first; maybe we should signal this somewhere. Maybe Wizard could have an option to trigger it, or show some message? Also wondering if the app should be in the 'Misc' category, or rather in 'Monitoring', for instance.

Also, I tried running make prefect-ui, but ran into the following error. Note that http://0.0.0.0:4200 is working though.

Some entry in the docs could be of great use if we want data managers / engineers to use this tool.

make prefect-ui
--- Starting Prefect UI at http://0.0.0.0:4200
poetry run prefect server start --host 0.0.0.0

 ___ ___ ___ ___ ___ ___ _____ 
| _ \ _ \ __| __| __/ __|_   _| 
|  _/   / _|| _|| _| (__  | |  
|_| |_|_\___|_| |___\___| |_|  

Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

View the API reference documentation at http://0.0.0.0:4200/docs

Check out the dashboard at http://0.0.0.0:4200



10:17:43.352 | ERROR   | prefect.server.services.flowrunnotifications - Unexpected error in: OperationalError('(sqlite3.OperationalError) database is locked')
Traceback (most recent call last):
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 131, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 132, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 115, in run
    result = function()
             ^^^^^^^^^^
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/services/loop_service.py", line 79, in start
    await self.run_once()
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/dependencies.py", line 125, in async_wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/configurations.py", line 453, in begin_transaction
    yield transaction
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/interface.py", line 119, in session_context
    yield session
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/services/flow_run_notifications.py", line 38, in run_once
    notifications = await db.get_flow_run_notifications_from_queue(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/interface.py", line 405, in get_flow_run_notifications_from_queue
    return await self.queries.get_flow_run_notifications_from_queue(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/query_components.py", line 1115, in get_flow_run_notifications_from_queue
    await session.execute(delete_stmt)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/session.py", line 461, in execute
    result = await greenlet_spawn(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
    result = context.throw(*sys.exc_info())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2236, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1953, in orm_execute_statement
    return super().orm_execute_statement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 131, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 132, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 115, in run
    result = function()
             ^^^^^^^^^^
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: DELETE FROM flow_run_notification_queue WHERE flow_run_notification_queue.id IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1) RETURNING id]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

@larsyencken
Collaborator

This would be an alternative to Buildkite, where error logs are more legible?

This is more of a complement. We would run the Prefect web UI on every staging server for you, and it would be an additional way of seeing individual ETL runs or changes made to your staging server and their logs.

@Marigold
Collaborator Author

Also, I tried running make prefect-ui, but run into the following error.

Hm, weird. Could you try running make prefect-reset and, if that doesn't work, removing the file ~/.prefect/prefect.db?
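
As a more cautious variant of removing ~/.prefect/prefect.db, the file can be renamed so it can be restored later. The sketch below is only an illustration: the function name `set_aside_prefect_db` and the `.bak` suffix are hypothetical, and the sidecar names (`-wal`, `-shm`, `-journal`) are the ones SQLite may create next to a database file, not something confirmed in this thread. It assumes the Prefect server is stopped first.

```python
from pathlib import Path
from typing import List


def set_aside_prefect_db(prefect_home: Path, suffix: str = ".bak") -> List[Path]:
    """Rename prefect.db and any SQLite sidecar files instead of deleting them.

    Returns the new paths of the files that were moved.
    """
    moved: List[Path] = []
    # SQLite may keep write-ahead-log or journal files next to the database.
    for name in ("prefect.db", "prefect.db-wal", "prefect.db-shm", "prefect.db-journal"):
        src = prefect_home / name
        if src.exists():
            dst = src.with_name(src.name + suffix)
            src.replace(dst)  # overwrites an older backup with the same name
            moved.append(dst)
    return moved
```

Calling `set_aside_prefect_db(Path.home() / ".prefect")` would leave no database behind; presumably the next server start then creates a fresh one, as happened after the manual removal.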

Also wondering if the app should be in the 'Misc' category, or rather in 'Monitoring', for instance.

Oh right 🤦 Moved it there.

This feels a bit confusing at first, maybe we should signal this somewhere. Maybe Wizard can have an option to trigger it? or some message

Good idea! Fixed that with a nicer error message.

Some entry in the docs could be of great use if we want data managers / engineers to use this tool.

The plan is to "dark launch" this first. Once we confirm that it has value, we should let data managers know and add it to docs.

@larsyencken
Collaborator

@Marigold Marcel and I were trying this now, and overall it's quite impressive, though there's a ton of errors of this form littered through the run, preventing it from executing cleanly:

Encountered exception during execution:
Traceback (most recent call last):
  File "/Users/lars/Documents/owid/etl/.venv/lib/python3.11/site-packages/prefect/engine.py", line 2145, in orchestrate_task_run
    result = await call.aresult()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/lars/Documents/owid/etl/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lars/Documents/owid/etl/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lars/Documents/owid/etl/etl/command.py", line 421, in run_step
    return step.run()
           ^^^^^^^^^^
  File "/Users/lars/Documents/owid/etl/etl/steps/__init__.py", line 446, in run
    dataset.save()
  File "/Users/lars/Documents/owid/etl/lib/catalog/owid/catalog/datasets.py", line 173, in save
    assert self.metadata.short_name, "Missing dataset short_name"
AssertionError: Missing dataset short_name

@larsyencken
Collaborator

💡 If we got this working and integrated this, one cool thing is that we could potentially provide step-level and run-level analytics on all our ETL steps and production runs, basically telling you how things are changing over time and what's most expensive.

Collaborator

@larsyencken larsyencken left a comment

Overall it looks super nice. The main thing that would need to be solved is the many task failures when executing it.

etl/command.py Outdated
task_futures: Dict[str, PrefectFuture] = {}

for step in steps:
    # task = prefect.task(name=str(step), on_failure=[on_failure_hook])
Collaborator

I am noticing that the flows hang around forever on failure. Is there a reason this bit is commented out?

Collaborator Author

Hmm, when does it happen? When I run it locally with an error, it doesn't hang forever, but it fails after all tasks are either completed or failed.
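
The behaviour described here (keep going after an error, then fail once every task has either completed or failed) can be sketched in plain Python. This is not Prefect's API and not the code in etl/command.py; it is a minimal analogue using `concurrent.futures`, and the names `run_all` and `StepsFailed` are hypothetical. It also ignores dependencies between steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class StepsFailed(RuntimeError):
    """Raised after all steps have finished if at least one of them failed."""

    def __init__(self, outcomes: Dict[str, str]) -> None:
        self.outcomes = outcomes
        failed = sorted(n for n, o in outcomes.items() if o != "completed")
        super().__init__(f"{len(failed)} step(s) failed: {', '.join(failed)}")


def run_all(steps: Dict[str, Callable[[], None]], workers: int = 2) -> Dict[str, str]:
    """Run every step, record each outcome, and raise only at the very end."""
    outcomes: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        for name, future in futures.items():
            # exception() waits for the step to finish and returns its error, if any,
            # so one failing step never cancels the others.
            exc = future.exception()
            outcomes[name] = "completed" if exc is None else f"failed: {exc!r}"
    if any(o != "completed" for o in outcomes.values()):
        raise StepsFailed(outcomes)
    return outcomes
```

A caller can catch `StepsFailed` and inspect `.outcomes` to see which steps failed and with what exception, which is roughly the information the Prefect UI surfaces per task.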

@Marigold
Collaborator Author

Marigold commented Jul 31, 2024

though there's a ton of errors of this form littered through the run, preventing it from executing cleanly:

@larsyencken how did you get the error? I can't replicate it...

EDIT: I fixed one serious bug which might have caused it.

@lucasrodes
Member

@Marigold Thanks for the suggestions.

Removing ~/.prefect/prefect.db made it work!

Tried make prefect-reset, but got a slightly different error now:

make prefect-reset
==> Installing packages
poetry install --no-ansi || poetry install --no-ansi
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: etl (0.1.0)
touch .venv
--- Resetting Prefect database
poetry run prefect server database reset -y
Downgrading database...
Upgrading database...
/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py:144: SAWarning: Skipped unsupported reflection of expression-based index ix_flow_run__coalesce_start_time_expected_start_time_desc
  next(self.gen)
/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py:144: SAWarning: Skipped unsupported reflection of expression-based index ix_flow_run__coalesce_start_time_expected_start_time_asc
  next(self.gen)
Prefect database "sqlite+aiosqlite:////home/lucas/.prefect/prefect.db" reset!
etl-py3.11(etl-py3.11) ➜  etl git:(prefect) ✗ make prefect-ui
--- Starting Prefect UI at http://0.0.0.0:4200
poetry run prefect server start --host 0.0.0.0

 ___ ___ ___ ___ ___ ___ _____ 
| _ \ _ \ __| __| __/ __|_   _| 
|  _/   / _|| _|| _| (__  | |  
|_| |_|_\___|_| |___\___| |_|  

Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

View the API reference documentation at http://0.0.0.0:4200/docs

Check out the dashboard at http://0.0.0.0:4200



13:48:05.689 | ERROR   | prefect.server.services.telemetry - Unexpected error in: OperationalError('(sqlite3.OperationalError) database is locked')
Traceback (most recent call last):
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 131, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 40, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 132, in _execute
    return await future
           ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/core.py", line 115, in run
    result = function()
             ^^^^^^^^^^
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/services/loop_service.py", line 79, in start
    await self.run_once()
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/services/telemetry.py", line 85, in run_once
    await self._fetch_or_set_telemetry_session()
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/dependencies.py", line 125, in async_wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/home/lucas/.pyenv/versions/3.11.1/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/configurations.py", line 453, in begin_transaction
    yield transaction
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/interface.py", line 119, in session_context
    yield session
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/services/telemetry.py", line 63, in _fetch_or_set_telemetry_session
    await configuration.write_configuration(session, telemetry_session)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/database/dependencies.py", line 125, in async_wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/prefect/server/models/configuration.py", line 29, in write_configuration
    await session.flush()
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/session.py", line 800, in flush
    await greenlet_spawn(self.sync_session.flush, objects=objects)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 203, in greenlet_spawn
    result = context.switch(value)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4341, in flush
    self._flush(objects)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4476, in _flush
    with util.safe_reraise():
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4437, in _flush
    flush_context.execute()
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1233, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2353, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 146, in execute
    self._adapt_connection._handle_exception(error)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 298, in _handle_exception
    raise error
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py", line 128, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 131, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site-packages/aiosqlite/cursor.py", line 48, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "/home/lucas/repos/etl/.venv/lib/python3.11/site

@Marigold
Collaborator Author

@lucasrodes is the dashboard running on http://0.0.0.0:4200? I read that the timeout thing is more of a warning than an actual error.

@lucasrodes
Member

lucasrodes commented Jul 31, 2024

@Marigold after removing ~/.prefect/prefect.db, it works perfectly. However, if I then run

make prefect-reset
make prefect-ui

I get the sqlalchemy.exc.OperationalError error I mentioned above. But the link localhost:4200 works.

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: DELETE FROM flow_run_notification_queue WHERE flow_run_notification_queue.id IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1) RETURNING id]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
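
For context, "database is locked" is SQLite's generic message when a connection cannot obtain a lock that another connection is holding. The sketch below reproduces that message with the standard `sqlite3` module. It is not a diagnosis of what Prefect is doing here. The function name `demo_locked` and the table `t` are made up for illustration.

```python
import os
import sqlite3
import tempfile


def demo_locked() -> str:
    """Return the error message a second writer gets while a write lock is held."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit mode, so transactions are explicit.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE t (id INTEGER)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        holder.execute("INSERT INTO t VALUES (1)")

        # timeout=0: fail immediately instead of waiting for the lock.
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            other.execute("INSERT INTO t VALUES (2)")
            message = "no error"
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            other.close()
            holder.execute("ROLLBACK")
            holder.close()
        return message


print(demo_locked())
```

With a non-zero `timeout`, the second connection would instead wait that long for the lock to be released before raising, which may be why such errors can show up as transient noise while the dashboard itself keeps working.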

@Marigold
Collaborator Author

Marigold commented Aug 6, 2024

though there's a ton of errors of this form littered through the run, preventing it from executing cleanly:

@larsyencken how did you get the error, please? I can't replicate it.

@lucasrodes lucasrodes changed the title 🎉 add prefect engine when running ETL 🎉 engineering: add prefect engine when running ETL Sep 10, 2024
@Marigold
Collaborator Author

This is ready to be merged, but I'm rethinking whether it's a good idea to introduce more complexity to ETL. Setting it as blocked until we find the right moment.

@lucasrodes lucasrodes removed their request for review September 12, 2024 15:27
@lucasrodes
Member

@Marigold that makes sense. I've unset myself from the reviewers list. Feel free to add me back once you want me to review it! Thanks <3


4 participants