An example of how to parse the dataset.
- process.ipynb: parses the dataset into a pandas DataFrame

The other notebooks are not required. They show how to work with the original, larger dataset:
- raw_to_parquet.ipynb: converts the dataset into a Dask DataFrame and writes it out in Parquet format
- sample_from_parquet.ipynb: samples engaging user IDs from the Parquet data