Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: EXPLAIN ANALYZE taking excessive time #16972

Open
1 of 2 tasks
rad-pat opened this issue Nov 29, 2024 · 4 comments
Open
1 of 2 tasks

bug: EXPLAIN ANALYZE taking excessive time #16972

rad-pat opened this issue Nov 29, 2024 · 4 comments
Labels
C-bug Category: something isn't working

Comments

@rad-pat
Copy link

rad-pat commented Nov 29, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Version: v1.2.665-nightly-ab08029eb9(rust-1.81.0-nightly-2024-11-27T10:17:49.268106726Z)

What's Wrong?

Running EXPLAIN ANALYZE on a GROUP BY query that is not completing takes excessive time. Hopefully the SQL below will replicate sufficiently to see the problem for the EXPLAIN, and possibly also for why the GROUP BY takes such time. The SQL generates 60M rows, but we actually have 256M in the real table so the volume may need increasing.

We saw the error in the attached file in the query logs.
explain_analyze_group_by_error.txt

How to Reproduce?

CREATE TABLE rt (
  "YTD_MONTH" VARCHAR NULL,
  "YTD_MONTH_NUM" VARCHAR NULL,
  "VERSION" VARCHAR NULL,
  "SCENARIO" VARCHAR NULL,
  "VALUE__FC" DECIMAL(38, 10) NULL,
  "VALUE__LC" DECIMAL(38, 10) NULL,
  "VALUE__GC" DECIMAL(38, 10) NULL,
  "CURRENCY__FUNCTIONAL" VARCHAR NULL,
  "CURRENCY__LOCAL" VARCHAR NULL,
  "CURRENCY__GROUP" VARCHAR NULL,
  "YEAR" VARCHAR NULL,
  "MONTH" VARCHAR NULL,
  "PERIOD_TYPE" VARCHAR NULL,
  "PERIOD" VARCHAR NULL,
  "LE" VARCHAR NULL,
  "LE__COUNTRY" VARCHAR NULL,
  "LE__REGION" VARCHAR NULL,
  "LE__DISPLAY" VARCHAR NULL,
  "LE__NAME" VARCHAR NULL,
  "LE__CITY" VARCHAR NULL,
  "LE__TYPE" VARCHAR NULL,
  "LE__ACCOUNTING_METHOD" VARCHAR NULL,
  "LE__AITRA" VARCHAR NULL,
  "LE__ERP_SYSTEM" VARCHAR NULL,
  "LE__CURRENCY__FUNCTIONAL" VARCHAR NULL,
  "LE__CURRENCY__LOCAL" VARCHAR NULL,
  "LE__TRANSMITTING_SYSTEM" VARCHAR NULL,
  "LE__PLANT" VARCHAR NULL,
  "ENTITY_TYPE" VARCHAR NULL,
  "VT" VARCHAR NULL,
  "VT__REGION" VARCHAR NULL,
  "VT__COUNTRY" VARCHAR NULL,
  "VT__NAME" VARCHAR NULL,
  "VT__CITY" VARCHAR NULL,
  "TP" VARCHAR NULL,
  "TP__REGION" VARCHAR NULL,
  "TP__COUNTRY" VARCHAR NULL,
  "TP__NAME" VARCHAR NULL,
  "TP__CITY" VARCHAR NULL,
  "ACCOUNT__MGCOA" VARCHAR NULL,
  "ACCOUNT" VARCHAR NULL,
  "ACCOUNT__DISPLAY" VARCHAR NULL,
  "ACCOUNT__NAME" VARCHAR NULL,
  "TAX_SEGMENT" VARCHAR NULL,
  "SALE_TYPE" VARCHAR NULL,
  "MFG_TYPE" VARCHAR NULL,
  "FUNCTION" VARCHAR NULL,
  driver VARCHAR NULL,
  "PRETAX_GROUP" VARCHAR NULL,
  "PRETAX_GROUP_SORT" VARCHAR NULL,
  "PRETAX_SEQ" VARCHAR NULL,
  "PROFIT_CENTER__TYPE" VARCHAR NULL,
  "FILE_NAME" VARCHAR NULL,
  path_id SMALLINT NULL,
  path VARCHAR NULL,
  rule VARCHAR NULL
) Engine = Random;

create transient table t0 (
  "YTD_MONTH" VARCHAR NULL,
  "YTD_MONTH_NUM" VARCHAR NULL,
  "VERSION" VARCHAR NULL,
  "SCENARIO" VARCHAR NULL,
  "VALUE__FC" DECIMAL(38, 10) NULL,
  "VALUE__LC" DECIMAL(38, 10) NULL,
  "VALUE__GC" DECIMAL(38, 10) NULL,
  "CURRENCY__FUNCTIONAL" VARCHAR NULL,
  "CURRENCY__LOCAL" VARCHAR NULL,
  "CURRENCY__GROUP" VARCHAR NULL,
  "YEAR" VARCHAR NULL,
  "MONTH" VARCHAR NULL,
  "PERIOD_TYPE" VARCHAR NULL,
  "PERIOD" VARCHAR NULL,
  "LE" VARCHAR NULL,
  "LE__COUNTRY" VARCHAR NULL,
  "LE__REGION" VARCHAR NULL,
  "LE__DISPLAY" VARCHAR NULL,
  "LE__NAME" VARCHAR NULL,
  "LE__CITY" VARCHAR NULL,
  "LE__TYPE" VARCHAR NULL,
  "LE__ACCOUNTING_METHOD" VARCHAR NULL,
  "LE__AITRA" VARCHAR NULL,
  "LE__ERP_SYSTEM" VARCHAR NULL,
  "LE__CURRENCY__FUNCTIONAL" VARCHAR NULL,
  "LE__CURRENCY__LOCAL" VARCHAR NULL,
  "LE__TRANSMITTING_SYSTEM" VARCHAR NULL,
  "LE__PLANT" VARCHAR NULL,
  "ENTITY_TYPE" VARCHAR NULL,
  "VT" VARCHAR NULL,
  "VT__REGION" VARCHAR NULL,
  "VT__COUNTRY" VARCHAR NULL,
  "VT__NAME" VARCHAR NULL,
  "VT__CITY" VARCHAR NULL,
  "TP" VARCHAR NULL,
  "TP__REGION" VARCHAR NULL,
  "TP__COUNTRY" VARCHAR NULL,
  "TP__NAME" VARCHAR NULL,
  "TP__CITY" VARCHAR NULL,
  "ACCOUNT__MGCOA" VARCHAR NULL,
  "ACCOUNT" VARCHAR NULL,
  "ACCOUNT__DISPLAY" VARCHAR NULL,
  "ACCOUNT__NAME" VARCHAR NULL,
  "TAX_SEGMENT" VARCHAR NULL,
  "SALE_TYPE" VARCHAR NULL,
  "MFG_TYPE" VARCHAR NULL,
  "FUNCTION" VARCHAR NULL,
  driver VARCHAR NULL,
  "PRETAX_GROUP" VARCHAR NULL,
  "PRETAX_GROUP_SORT" VARCHAR NULL,
  "PRETAX_SEQ" VARCHAR NULL,
  "PROFIT_CENTER__TYPE" VARCHAR NULL,
  "FILE_NAME" VARCHAR NULL,
  path_id SMALLINT NULL,
  path VARCHAR NULL,
  rule VARCHAR NULL
) engine = fuse;


insert into t0
--select * from rt limit 256000000;  -- we have 256M rows in actual table, but 60M gives sufficient issue in my test
select * from rt limit 60000000;

select count(*) from t0


EXPLAIN ANALYZE 
SELECT CAST(t0."YTD_MONTH" AS VARCHAR) AS "YTD_MONTH"
	,CAST(t0."YTD_MONTH_NUM" AS VARCHAR) AS "YTD_MONTH_NUM"
	,CAST(t0."VERSION" AS VARCHAR) AS "VERSION"
	,CAST(t0."SCENARIO" AS VARCHAR) AS "SCENARIO"
	,CAST(sum(t0."VALUE__FC") AS DECIMAL(38, 10)) AS "VALUE__FC"
	,CAST(sum(t0."VALUE__LC") AS DECIMAL(38, 10)) AS "VALUE__LC"
	,CAST(sum(t0."VALUE__GC") AS DECIMAL(38, 10)) AS "VALUE__GC"
	,CAST(t0."CURRENCY__FUNCTIONAL" AS VARCHAR) AS "CURRENCY__FUNCTIONAL"
	,CAST(t0."CURRENCY__LOCAL" AS VARCHAR) AS "CURRENCY__LOCAL"
	,CAST(t0."CURRENCY__GROUP" AS VARCHAR) AS "CURRENCY__GROUP"
	,CAST(t0."YEAR" AS VARCHAR) AS "YEAR"
	,CAST(t0."YTD_MONTH_NUM" AS VARCHAR) AS "MONTH"
	,CAST(t0."PERIOD_TYPE" AS VARCHAR) AS "PERIOD_TYPE"
	,CAST(CONCAT (
			t0."YEAR"
			,t0."YTD_MONTH_NUM"
			) AS VARCHAR) AS "PERIOD"
	,CAST(t0."LE" AS VARCHAR) AS "LE"
	,CAST(t0."LE__COUNTRY" AS VARCHAR) AS "LE__COUNTRY"
	,CAST(t0."LE__REGION" AS VARCHAR) AS "LE__REGION"
	,CAST(t0."LE__DISPLAY" AS VARCHAR) AS "LE__DISPLAY"
	,CAST(t0."LE__NAME" AS VARCHAR) AS "LE__NAME"
	,CAST(t0."LE__CITY" AS VARCHAR) AS "LE__CITY"
	,CAST(t0."LE__TYPE" AS VARCHAR) AS "LE__TYPE"
	,CAST(t0."LE__ACCOUNTING_METHOD" AS VARCHAR) AS "LE__ACCOUNTING_METHOD"
	,CAST(t0."LE__AITRA" AS VARCHAR) AS "LE__AITRA"
	,CAST(t0."LE__ERP_SYSTEM" AS VARCHAR) AS "LE__ERP_SYSTEM"
	,CAST(t0."LE__CURRENCY__FUNCTIONAL" AS VARCHAR) AS "LE__CURRENCY__FUNCTIONAL"
	,CAST(t0."LE__CURRENCY__LOCAL" AS VARCHAR) AS "LE__CURRENCY__LOCAL"
	,CAST(t0."LE__TRANSMITTING_SYSTEM" AS VARCHAR) AS "LE__TRANSMITTING_SYSTEM"
	,CAST(t0."LE__PLANT" AS VARCHAR) AS "LE__PLANT"
	,CAST(t0."ENTITY_TYPE" AS VARCHAR) AS "ENTITY_TYPE"
	,CAST(t0."VT" AS VARCHAR) AS "VT"
	,CAST(t0."VT__REGION" AS VARCHAR) AS "VT__REGION"
	,CAST(t0."VT__COUNTRY" AS VARCHAR) AS "VT__COUNTRY"
	,CAST(t0."VT__NAME" AS VARCHAR) AS "VT__NAME"
	,CAST(t0."VT__CITY" AS VARCHAR) AS "VT__CITY"
	,CAST(t0."TP" AS VARCHAR) AS "TP"
	,CAST(t0."TP__REGION" AS VARCHAR) AS "TP__REGION"
	,CAST(t0."TP__COUNTRY" AS VARCHAR) AS "TP__COUNTRY"
	,CAST(t0."TP__NAME" AS VARCHAR) AS "TP__NAME"
	,CAST(t0."TP__CITY" AS VARCHAR) AS "TP__CITY"
	,CAST(t0."ACCOUNT__MGCOA" AS VARCHAR) AS "ACCOUNT__MGCOA"
	,CAST(t0."ACCOUNT" AS VARCHAR) AS "ACCOUNT"
	,CAST(t0."ACCOUNT__DISPLAY" AS VARCHAR) AS "ACCOUNT__DISPLAY"
	,CAST(t0."ACCOUNT__NAME" AS VARCHAR) AS "ACCOUNT__NAME"
	,CAST(t0."TAX_SEGMENT" AS VARCHAR) AS "TAX_SEGMENT"
	,CAST(t0."SALE_TYPE" AS VARCHAR) AS "SALE_TYPE"
	,CAST(t0."MFG_TYPE" AS VARCHAR) AS "MFG_TYPE"
	,CAST(t0."FUNCTION" AS VARCHAR) AS "FUNCTION"
	,CAST(t0.driver AS VARCHAR) AS driver
	,CAST(t0."PRETAX_GROUP" AS VARCHAR) AS "PRETAX_GROUP"
	,CAST(t0."PRETAX_GROUP_SORT" AS VARCHAR) AS "PRETAX_GROUP_SORT"
	,CAST(t0."PRETAX_SEQ" AS VARCHAR) AS "PRETAX_SEQ"
	,CAST(t0."PROFIT_CENTER__TYPE" AS VARCHAR) AS "PROFIT_CENTER__TYPE"
	,CAST(NULL AS VARCHAR) AS "FILE_NAME"
	,CAST(NULL AS SMALLINT) AS path_id
	,CAST(NULL AS VARCHAR) AS path
	,CAST(t0.RULE AS VARCHAR) AS RULE
FROM t0
GROUP BY t0."YTD_MONTH"
	,t0."YTD_MONTH_NUM"
	,t0."VERSION"
	,t0."SCENARIO"
	,t0."CURRENCY__FUNCTIONAL"
	,t0."CURRENCY__LOCAL"
	,t0."CURRENCY__GROUP"
	,t0."YEAR"
	,t0."YTD_MONTH_NUM"
	,t0."PERIOD_TYPE"
	,CAST(CONCAT (
			t0."YEAR"
			,t0."YTD_MONTH_NUM"
			) AS VARCHAR)
	,t0."LE"
	,t0."LE__COUNTRY"
	,t0."LE__REGION"
	,t0."LE__DISPLAY"
	,t0."LE__NAME"
	,t0."LE__CITY"
	,t0."LE__TYPE"
	,t0."LE__ACCOUNTING_METHOD"
	,t0."LE__AITRA"
	,t0."LE__ERP_SYSTEM"
	,t0."LE__CURRENCY__FUNCTIONAL"
	,t0."LE__CURRENCY__LOCAL"
	,t0."LE__TRANSMITTING_SYSTEM"
	,t0."LE__PLANT"
	,t0."ENTITY_TYPE"
	,t0."VT"
	,t0."VT__REGION"
	,t0."VT__COUNTRY"
	,t0."VT__NAME"
	,t0."VT__CITY"
	,t0."TP"
	,t0."TP__REGION"
	,t0."TP__COUNTRY"
	,t0."TP__NAME"
	,t0."TP__CITY"
	,t0."ACCOUNT__MGCOA"
	,t0."ACCOUNT"
	,t0."ACCOUNT__DISPLAY"
	,t0."ACCOUNT__NAME"
	,t0."TAX_SEGMENT"
	,t0."SALE_TYPE"
	,t0."MFG_TYPE"
	,t0."FUNCTION"
	,t0.driver
	,t0."PRETAX_GROUP"
	,t0."PRETAX_GROUP_SORT"
	,t0."PRETAX_SEQ"
	,t0."PROFIT_CENTER__TYPE"
	,CAST(NULL AS VARCHAR)
	,CAST(NULL AS SMALLINT)
	,CAST(NULL AS VARCHAR)
	,t0.RULE;

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@rad-pat rad-pat added the C-bug Category: something isn't working label Nov 29, 2024
@rad-pat
Copy link
Author

rad-pat commented Dec 2, 2024

Is it possible here that grouping by so many columns causes a problem? It seems like there may be an exponential issue with grouping

@sundy-li
Copy link
Member

sundy-li commented Dec 2, 2024

I can't reproduce it in my local machine, the data is so large I don't have enough disk to test it.

Can you post the elapsed time comparisons of query and explain analyze query ?

@rad-pat
Copy link
Author

rad-pat commented Dec 2, 2024

I just ran for 60M rows, it took 1m32s for select count(*) from (<query above>) and 5m29s for explain analyze of the same count.

@rad-pat
Copy link
Author

rad-pat commented Dec 2, 2024

explain                                                                                                                                                                                                                                                        |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
AggregateFinal                                                                                                                                                                                                                                                 |
├── output columns: [COUNT(*) (#118)]                                                                                                                                                                                                                          |
├── group by: []                                                                                                                                                                                                                                               |
├── aggregate functions: [count()]                                                                                                                                                                                                                             |
├── estimated rows: 1.00                                                                                                                                                                                                                                       |
├── cpu time: 107.981µs                                                                                                                                                                                                                                        |
├── output rows: 1                                                                                                                                                                                                                                             |
├── output bytes: 8.00 B                                                                                                                                                                                                                                       |
└── AggregatePartial                                                                                                                                                                                                                                           |
    ├── group by: []                                                                                                                                                                                                                                           |
    ├── aggregate functions: [count()]                                                                                                                                                                                                                         |
    ├── estimated rows: 1.00                                                                                                                                                                                                                                   |
    ├── cpu time: 908.526339ms                                                                                                                                                                                                                                 |
    ├── output rows: 8                                                                                                                                                                                                                                         |
    ├── output bytes: 64.00 B                                                                                                                                                                                                                                  |
    └── AggregateFinal                                                                                                                                                                                                                                         |
        ├── output columns: [t0.YTD_MONTH (#0), t0.YTD_MONTH_NUM (#1), t0.VERSION (#2), t0.SCENARIO (#3), t0.CURRENCY__FUNCTIONAL (#7), t0.CURRENCY__LOCAL (#8), t0.CURRENCY__GROUP (#9), t0.YEAR (#10), t0.PERIOD_TYPE (#12), t0.LE (#14), t0.LE__COUNTRY (#15|
        ├── group by: [YTD_MONTH, YTD_MONTH_NUM, VERSION, SCENARIO, CURRENCY__FUNCTIONAL, CURRENCY__LOCAL, CURRENCY__GROUP, YEAR, PERIOD_TYPE, LE, LE__COUNTRY, LE__REGION, LE__DISPLAY, LE__NAME, LE__CITY, LE__TYPE, LE__ACCOUNTING_METHOD, LE__AITRA, LE__ER|
        ├── aggregate functions: []                                                                                                                                                                                                                            |
        ├── estimated rows: 60000000.00                                                                                                                                                                                                                        |
        ├── cpu time: 506.517582617s                                                                                                                                                                                                                           |
        ├── wait time: 70.014329411s                                                                                                                                                                                                                           |
        ├── output rows: 60 million                                                                                                                                                                                                                            |
        ├── output bytes: 32.52 GiB                                                                                                                                                                                                                            |
        ├── numbers remote spilled by read: 1024                                                                                                                                                                                                               |
        ├── bytes remote spilled by read: 13.71 GiB                                                                                                                                                                                                            |
        ├── remote spilled time by read: 329.893s                                                                                                                                                                                                              |
        └── AggregatePartial                                                                                                                                                                                                                                   |
            ├── group by: [YTD_MONTH, YTD_MONTH_NUM, VERSION, SCENARIO, CURRENCY__FUNCTIONAL, CURRENCY__LOCAL, CURRENCY__GROUP, YEAR, PERIOD_TYPE, LE, LE__COUNTRY, LE__REGION, LE__DISPLAY, LE__NAME, LE__CITY, LE__TYPE, LE__ACCOUNTING_METHOD, LE__AITRA, LE|
            ├── aggregate functions: []                                                                                                                                                                                                                        |
            ├── estimated rows: 60000000.00                                                                                                                                                                                                                    |
            ├── cpu time: 488.237000195s                                                                                                                                                                                                                       |
            ├── wait time: 363.478044469s                                                                                                                                                                                                                      |
            ├── numbers remote spilled by write: 8                                                                                                                                                                                                             |
            ├── bytes remote spilled by write: 13.71 GiB                                                                                                                                                                                                       |
            ├── remote spilled time by write: 364.089s                                                                                                                                                                                                         |
            └── TableScan                                                                                                                                                                                                                                      |
                ├── table: default.default.t0                                                                                                                                                                                                                  |
                ├── output columns: [YTD_MONTH (#0), YTD_MONTH_NUM (#1), VERSION (#2), SCENARIO (#3), CURRENCY__FUNCTIONAL (#7), CURRENCY__LOCAL (#8), CURRENCY__GROUP (#9), YEAR (#10), PERIOD_TYPE (#12), LE (#14), LE__COUNTRY (#15), LE__REGION (#16), LE__|
                ├── read rows: 60000000                                                                                                                                                                                                                        |
                ├── read size: 7.68 GiB                                                                                                                                                                                                                        |
                ├── partitions total: 256                                                                                                                                                                                                                      |
                ├── partitions scanned: 256                                                                                                                                                                                                                    |
                ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 256 to 256>]                                                                                                                                                    |
                ├── push downs: [filters: [], limit: NONE]                                                                                                                                                                                                     |
                ├── estimated rows: 60000000.00                                                                                                                                                                                                                |
                ├── cpu time: 106.886693894s                                                                                                                                                                                                                   |
                ├── wait time: 4354.147343145s                                                                                                                                                                                                                 |
                ├── output rows: 60 million                                                                                                                                                                                                                    |
                ├── output bytes: 44.59 GiB                                                                                                                                                                                                                    |
                ├── bytes scanned: 44.59 GiB                                                                                                                                                                                                                   |
                └── bytes scanned from cache: 1.51 GiB                                                                                                                                                                                                         |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants