Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Resources exceeded during query execution #8

Closed
1 task done
benigls opened this issue Nov 4, 2022 · 10 comments · Fixed by #9
Closed
1 task done

[Bug] Resources exceeded during query execution #8

benigls opened this issue Nov 4, 2022 · 10 comments · Fixed by #9
Assignees
Labels
bug Something isn't working priority:p2 Affects most users; fix needed status:in_review Currently in review type:bug Something is broken or incorrect update_type:models Primary focus requires model updates

Comments

@benigls
Copy link
Contributor

benigls commented Nov 4, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

After running all the recurly models, I got a memory exceeded error on the last model, recurly__account_daily_overview

Database Error in model recurly__account_daily_overview (models/recurly__account_daily_overview.sql)
07:55:06    Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 134% of limit.
07:55:06    Top memory consumer(s):
07:55:06      sort operations used for analytic OVER() clauses: 99%
07:55:06      other/unattributed: 1%

Relevant error log or model output

>>> dbt run --select +recurly                       

07:52:41  Running with dbt=1.0.4
07:52:41  Unable to do partial parsing because a project config has changed
07:52:46  Found 203 models, 182 tests, 2 snapshots, 0 analyses, 830 macros, 0 operations, 0 seed files, 185 sources, 0 exposures, 0 metrics
07:52:46  
07:52:48  Concurrency: 2 threads (target='dev')
07:52:48  
07:52:48  1 of 18 START view model dbt_ben.stg_recurly__account_history_tmp............... [RUN]
07:52:48  2 of 18 START view model dbt_ben.stg_recurly__invoice_history_tmp............... [RUN]
07:52:50  2 of 18 OK created view model dbt_ben.stg_recurly__invoice_history_tmp.......... [OK in 1.63s]
07:52:50  3 of 18 START view model dbt_ben.stg_recurly__line_item_history_tmp............. [RUN]
07:52:50  1 of 18 OK created view model dbt_ben.stg_recurly__account_history_tmp.......... [OK in 1.69s]
07:52:50  4 of 18 START view model dbt_ben.stg_recurly__plan_history_tmp.................. [RUN]
07:52:52  4 of 18 OK created view model dbt_ben.stg_recurly__plan_history_tmp............. [OK in 1.80s]
07:52:52  3 of 18 OK created view model dbt_ben.stg_recurly__line_item_history_tmp........ [OK in 1.86s]
07:52:52  5 of 18 START view model dbt_ben.stg_recurly__subscription_history_tmp.......... [RUN]
07:52:52  6 of 18 START view model dbt_ben.stg_recurly__transaction_tmp................... [RUN]
07:52:53  5 of 18 OK created view model dbt_ben.stg_recurly__subscription_history_tmp..... [OK in 1.80s]
07:52:53  7 of 18 START table model dbt_ben.stg_recurly__invoice_history.................. [RUN]
07:52:54  6 of 18 OK created view model dbt_ben.stg_recurly__transaction_tmp.............. [OK in 1.82s]
07:52:54  8 of 18 START table model dbt_ben.stg_recurly__account_history.................. [RUN]
07:52:59  7 of 18 OK created table model dbt_ben.stg_recurly__invoice_history............. [CREATE TABLE (15.0k rows, 2.4 MB processed) in 5.12s]
07:52:59  9 of 18 START table model dbt_ben.stg_recurly__plan_history..................... [RUN]
07:53:01  8 of 18 OK created table model dbt_ben.stg_recurly__account_history............. [CREATE TABLE (123.7k rows, 14.6 MB processed) in 7.00s]
07:53:01  10 of 18 START table model dbt_ben.stg_recurly__line_item_history............... [RUN]
07:53:04  9 of 18 OK created table model dbt_ben.stg_recurly__plan_history................ [CREATE TABLE (210.0 rows, 31.0 KB processed) in 4.91s]
07:53:04  11 of 18 START table model dbt_ben.stg_recurly__subscription_history............ [RUN]
07:53:06  10 of 18 OK created table model dbt_ben.stg_recurly__line_item_history.......... [CREATE TABLE (40.0k rows, 12.1 MB processed) in 5.77s]
07:53:06  12 of 18 START table model dbt_ben.stg_recurly__transaction..................... [RUN]
07:53:08  11 of 18 OK created table model dbt_ben.stg_recurly__subscription_history....... [CREATE TABLE (2.9k rows, 656.4 KB processed) in 4.79s]
07:53:12  12 of 18 OK created table model dbt_ben.stg_recurly__transaction................ [CREATE TABLE (13.4k rows, 6.6 MB processed) in 5.22s]
07:53:12  13 of 18 START table model dbt_ben.recurly__balance_transactions................ [RUN]
07:53:17  13 of 18 OK created table model dbt_ben.recurly__balance_transactions........... [CREATE TABLE (43.2k rows, 11.2 MB processed) in 5.63s]
07:53:17  14 of 18 START table model dbt_ben.recurly__monthly_recurring_revenue........... [RUN]
07:53:17  15 of 18 START table model dbt_ben.recurly__account_overview.................... [RUN]
07:53:23  14 of 18 OK created table model dbt_ben.recurly__monthly_recurring_revenue...... [CREATE TABLE (12.8k rows, 14.1 MB processed) in 5.62s]
07:53:25  15 of 18 OK created table model dbt_ben.recurly__account_overview............... [CREATE TABLE (112.1k rows, 17.2 MB processed) in 7.46s]
07:53:25  16 of 18 START table model dbt_ben.recurly__subscription_overview............... [RUN]
07:53:30  16 of 18 OK created table model dbt_ben.recurly__subscription_overview.......... [CREATE TABLE (1.1k rows, 7.8 MB processed) in 4.80s]
07:53:30  17 of 18 START table model dbt_ben.recurly__churn_analysis...................... [RUN]
07:53:33  17 of 18 OK created table model dbt_ben.recurly__churn_analysis................. [CREATE TABLE (1.1k rows, 166.9 KB processed) in 3.93s]
07:53:34  18 of 18 START table model dbt_ben.recurly__account_daily_overview.............. [RUN]
07:55:06  18 of 18 ERROR creating table model dbt_ben.recurly__account_daily_overview..... [ERROR in 91.67s]
07:55:06  
07:55:06  Finished running 6 view models, 12 table models in 140.07s.
07:55:06  
07:55:06  Completed with 1 error and 0 warnings:
07:55:06  
07:55:06  Database Error in model recurly__account_daily_overview (models/recurly__account_daily_overview.sql)
07:55:06    Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 134% of limit.
07:55:06    Top memory consumer(s):
07:55:06      sort operations used for analytic OVER() clauses: 99%
07:55:06      other/unattributed: 1%
07:55:06    
07:55:06    compiled SQL at target/run/recurly/models/recurly__account_daily_overview.sql
07:55:06  
07:55:06  Done. PASS=17 WARN=0 ERROR=1 SKIP=0 TOTAL=18


### Expected behavior

All models would run successfully

### dbt Project configurations

- package: fivetran/recurly
  version: 0.1.0

### Package versions

recurly_database: raw-data
recurly_schema: recurly

### What database are you using dbt with?

bigquery

### dbt Version

installed version: 1.0.4
   latest version: 1.3.0

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 1.0.0 - Update available!
  Your version of dbt-bigquery is out of date! You can find instructions for upgrading here:
  https://docs.getdbt.com/dbt-cli/install/overview

### Additional Context

_No response_

### Are you willing to open a PR to help address this issue?

- [ ] Yes.
- [ ] Yes, but I will need assistance and will schedule time during our [office hours](https://calendly.com/fivetran-solutions-team/fivetran-solutions-team-office-hours) for guidance
- [ ] No.
@benigls benigls added the bug Something isn't working label Nov 4, 2022
@dimoschi
Copy link
Collaborator

dimoschi commented Nov 8, 2022

This is indeed a resource intensive model as it is based on int_recurly__account_rolling_totals which is based on int_recurly__transactions_date_spine, where both tables have around 350 million records. In Snowflake the query took a long time to complete, 2820.44 seconds.

I'm now comparing it a similar model in Stripe stripe__daily_overview to try and find differences/similarities and maybe come up with a more efficient way.

@fivetran-joemarkiewicz
Copy link
Collaborator

Thanks for opening this issue @benigls, and thanks for taking a look to dig in @dimoschi!

This is great for us to know and I imagine there should be a way we can optimize this query to be more performant. I will be sure to dig into this as well and explore a possible solution to hopefully be rolled out in our upcoming sprint.

@fivetran-avinash
Copy link
Contributor

fivetran-avinash commented Nov 21, 2022

Thanks for bringing this issue to our attention @benigls and examining this problem more closely @dimoschi!

It looks like the main culprit might be the multiple OVER functions happening in the daily_overview.

I'm wondering if we take those resource-intensive functions out of the final model, separate them into additional intermediate model(s), then somehow separate them out before we get to the end model and join the two models together), we can reduce the load time and the query runs.

My original thought is we could bring both of the OVER CTEs into one intermediate model, see if it can run on its own. If that still has trouble running, we can split them up into multiple CTEs in two intermediate models.

Then the only operation we have to do is the join to account_overview in the final model and adding the passthrough columns.

Would either of you be able to take a look and try breaking that final model logic up? Otherwise we can attempt to tackle this in an upcoming sprint!

@dimoschi
Copy link
Collaborator

I'll do my best to look into it until the end of this week, but cannot promise anything.

@fivetran-avinash
Copy link
Contributor

Hello @benigls and @dimoschi!

I've attempted the above solution to try and resolve the query execution issue. Before I send over the proposed PR, I just want to double-check what your specific dbt environment setup is looking like. Some of our code we're deploying and testing is dependent on your current version, so we might need to make some modifications if you're using a specific version of dbt.

Could you provide the following details:

  • What version of dbt are you using?
  • What version of dbt-utils are you running?
  • Are you running dbt Core or Cloud?
  • Are you using Fivetran Transformations?

Thanks for all your help and patience in resolving this issue!

@benigls
Copy link
Contributor Author

benigls commented Dec 15, 2022

Thank you for taking care of this @fivetran-avinash!

  • What version of dbt are you using?
    1.0.4
  • What version of dbt-utils are you running?
    0.8.6
  • Are you running dbt Core or Cloud?
    Yes
  • Are you using Fivetran Transformations?
    No

Let me know if you need further info.

@fivetran-avinash fivetran-avinash added priority:p2 Affects most users; fix needed type:bug Something is broken or incorrect status:blocked Need additional information or requirements before proceeding update_type:models Primary focus requires model updates labels Dec 16, 2022
@fivetran-avinash fivetran-avinash self-assigned this Dec 16, 2022
@fivetran-avinash
Copy link
Contributor

fivetran-avinash commented Dec 16, 2022

Thanks @benigls!

I think your versions are up-to-date with what can run on our branch.

Here is the link to the branch with the updated changes: https://github.com/fivetran/dbt_recurly/tree/bug/account-daily-overview-query-resources-exceeded

You can clone the repo, fetch this specific branch above, and explore whether these code changes make a difference in resource execution on your data.

Let me know how it goes and if you have any questions!

@benigls
Copy link
Contributor Author

benigls commented Dec 22, 2022

Hi @fivetran-avinash. I just tested the branch and all the models are running in my environment and data. 🚀 Thank you!

Side note:
I noticed that it took 3.5 minutes to run all the models. I wonder, is there a way to add a condition in the base layer to only process accounts with an existing subscription? The reason for asking is that in our case, we only care about the accounts with a subscription, which is ~1/4 of the entire account population. I think this is a common pattern with companies with a free-tier plan.

@fivetran-joemarkiewicz
Copy link
Collaborator

Hi @benigls thanks for testing out the branch and that is great to hear that the models are running with your data now!!

The majority of our team is on PTO until the new year, but we will be sure to pick this up and plan to integrate into the next release once we come back in the new year.

In the meantime, I think your suggestion for filtering out accounts without active subscriptions is completely reasonable and would add value for all of the package users. Realistically, we would probably add a variable to allow customers to configure if they want the filter applied or not. In order for us to ensure we can get this fix out I am going to move the filter request into a new feature request (FR #11) where we will be able to scope it out and integrate into a future sprint. Be sure to follow the linked FR to see when we are able to integrate the filter and test out the branch once it is available.

Thanks again for all your help!

@fivetran-avinash fivetran-avinash removed the status:blocked Need additional information or requirements before proceeding label Jan 3, 2023
@fivetran-avinash fivetran-avinash added the status:in_review Currently in review label Jan 17, 2023
@fivetran-avinash fivetran-avinash linked a pull request Feb 16, 2023 that will close this issue
14 tasks
@fivetran-avinash
Copy link
Contributor

Closing this issue since the branch has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:p2 Affects most users; fix needed status:in_review Currently in review type:bug Something is broken or incorrect update_type:models Primary focus requires model updates
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants