TPC-DS Query 88, 89, 93 Results mismatch between Java vs Native Runs on Hive Table #23661

Open
agrawalreetika opened this issue Sep 16, 2024 · 0 comments

Your Environment

  • Presto version used: 0.289
  • Storage (HDFS/S3/GCS..): s3
  • Data source and connector used: hive connector
  • Deployment (Cloud or On-prem): Cloud

Expected Behavior

A Presto native run should return the same results as a Presto Java run.

Current Behavior

Query 88 -
All of the selected result columns are counts, yet the native run returns every output column as a string (in double quotes).

Java sample output -
2001931,4024477,4020514,6041669,6041474,3515510,3518365,4023355
Native sample output -
"2001931","4024477","4020514","6041669","6041474","3515510","3518365","4023355"

Query 89 -
The 6th output column, d_moy, is an INT, but the native output returns it in double quotes.

Java sample output -
"Children","infants","importoexporti #2","able","Unknown",1,"3660722.34","5708486.93"
Native sample output -
"Children","infants","importoexporti #2","able","Unknown","1","3660722.34","5708486.93"

Query 93 -

--TPC-DS Q93
select ss_customer_sk
     , sum(act_sales) sumsales
from (select ss_item_sk
           , ss_ticket_number
           , ss_customer_sk
           , case
                 when sr_return_quantity is not null then (ss_quantity - sr_return_quantity) * ss_sales_price
                 else (ss_quantity * ss_sales_price) end act_sales
      from store_sales
               left outer join store_returns on (sr_item_sk = ss_item_sk
          and sr_ticket_number = ss_ticket_number)
         , reason
      where sr_reason_sk = r_reason_sk
        and r_reason_desc = 'Did not get it on time') t
group by ss_customer_sk
order by sumsales, ss_customer_sk
limit 100;

The 1st output column, ss_customer_sk, is an INT, but the native output returns it in double quotes.

Java sample output -
1145,"0.00"
Native sample output -
"1145","0.00"
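
To pinpoint which columns differ in quoting, a small script can compare one Java output line with the matching native line. This is a minimal sketch, assuming the results are saved as comma-separated text exactly as in the samples above; the function names `quoted_flags` and `quoting_diff` are hypothetical helpers, not part of Presto.

```python
def quoted_flags(line):
    """Return one bool per CSV field: True if the field starts with a double quote."""
    flags = []
    in_quotes = False    # currently inside a quoted field
    field_start = True   # no character of the current field seen yet
    quoted = False       # current field opened with a double quote
    for ch in line:
        if ch == '"':
            if field_start:
                quoted = True
            in_quotes = not in_quotes
            field_start = False
        elif ch == ',' and not in_quotes:
            # A comma outside quotes ends the current field.
            flags.append(quoted)
            quoted = False
            field_start = True
        else:
            field_start = False
    flags.append(quoted)  # last field has no trailing comma
    return flags


def quoting_diff(java_line, native_line):
    """Return 1-based column numbers whose quoting differs between the two lines."""
    java_flags = quoted_flags(java_line)
    native_flags = quoted_flags(native_line)
    return [i + 1
            for i, (j, n) in enumerate(zip(java_flags, native_flags))
            if j != n]


# Sample lines copied from the Query 89 output above.
java_q89 = '"Children","infants","importoexporti #2","able","Unknown",1,"3660722.34","5708486.93"'
native_q89 = '"Children","infants","importoexporti #2","able","Unknown","1","3660722.34","5708486.93"'
print(quoting_diff(java_q89, native_q89))  # [6]
```

For the Query 89 sample this reports column 6 (d_moy), and for the Query 93 sample it reports column 1 (ss_customer_sk), matching the observations above.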

Data Set : tpc-ds sf-1k
Tables : Un-partitioned

Possible Solution

Steps to Reproduce

  1. Run the TPC-DS queries on a Presto Java cluster.
  2. Run the same TPC-DS queries on a Presto native cluster.
  3. Compare the results from the two runs.
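
For the comparison in step 3, it can help to check whether the two runs agree on the values themselves once quoting is ignored. The following is a rough sketch, assuming both result sets are available as comma-separated text like the samples in this report; `parse_rows` and `values_match` are hypothetical names used only for illustration.

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text into rows of strings; csv.reader strips the surrounding quotes."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def values_match(java_text, native_text):
    """True if both outputs hold the same values in the same order, ignoring quoting."""
    return parse_rows(java_text) == parse_rows(native_text)


# Sample lines copied from the Query 93 output in this report.
java_q93 = '1145,"0.00"\n'
native_q93 = '"1145","0.00"\n'
print(values_match(java_q93, native_q93))  # True
```

If this check passes while a byte-for-byte comparison fails, the difference is limited to how the values are quoted rather than to the values themselves.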

Screenshots (if appropriate)

Context
