We want to generate cohort queries from the sogamo dataset for the cohortQueryProcessing unit test.
Through some simple data analysis, we found the following problems:
In the sogamo dataset, there are only 4 players in the entire dataset, which contains 10k items. Thus the cohort queries in the old version of the code are not representative and do not work well as a unit test. According to the CoHANA paper, the raw data is larger than the sample data we currently have, so I recommend using the raw data to generate the test cohort queries.
The tpch dataset has the same problem: there is only 1 user in the entire dataset, so all orders in it belong to that same user.
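The "simple data analysis" above comes down to counting distinct users and checking how skewed the records are toward one user. Below is a minimal sketch of such a check. It assumes the data is a CSV with a header row, and the column name `player`, the `summarize` helper, and the `min_users` threshold are all hypothetical, chosen for illustration rather than taken from the actual sogamo or tpch schema.

```python
import csv
import io
from collections import Counter


def user_activity_counts(csv_text: str, user_column: str) -> Counter:
    """Count how many records belong to each user.

    `user_column` is a hypothetical column name; adjust it to the real schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[user_column] for row in reader)


def summarize(csv_text: str, user_column: str, min_users: int = 100) -> dict:
    """Report record count, distinct users, and skew toward the top user."""
    counts = user_activity_counts(csv_text, user_column)
    total = sum(counts.values())
    top_share = counts.most_common(1)[0][1] / total if total else 0.0
    return {
        "records": total,
        "distinct_users": len(counts),
        # Fraction of all records held by the single most active user.
        "top_user_share": top_share,
        # min_users is an illustrative threshold, not a value from the issue.
        "representative": len(counts) >= min_users,
    }


if __name__ == "__main__":
    sample = "player,action,time\np1,launch,1\np1,shop,2\np2,launch,3\np1,fight,4\n"
    print(summarize(sample, "player"))
```

Run against a file like the sogamo sample, a result with only a handful of distinct users (and a large `top_user_share`) would flag the dataset as unsuitable for exercising cohort queries.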