Error in slice_head when n exceeds the chunk size of any group. Behavior differs from dplyr. #50

Open
grahitr opened this issue Aug 25, 2023 · 2 comments

grahitr commented Aug 25, 2023

In [1]: import tidypandas.tidy_accessor as tp
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"a":[1,1,1,2], "b": [1,2,3,4]})
In [4]: df.tp.slice_head(n=2, by="a")
Minimum group size is  1
/Users/a0r0qfj/py_envs/python3.10.7/lib/python3.10/site-packages/astroid/node_classes.py:94: DeprecationWarning: The 'astroid.node_classes' module is deprecated and will be replaced by 'astroid.nodes' in astroid 3.0.0
  warnings.warn(
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [4], line 1
----> 1 df.tp.slice_head(n=2, by="a")

File ~/py_envs/python3.10.7/lib/python3.10/site-packages/tidypandas/tidy_accessor.py:382, in tp.slice_head(self, n, prop, rounding_type, by)
    375 def slice_head(self
    376                , n = None
    377                , prop = None
    378                , rounding_type = "round"
    379                , by = None
    380                ):
    381     tf = tidyframe(self._obj, copy = False, check = False)
--> 382     return tf.slice_head(n = n
    383                          , prop = prop
    384                          , rounding_type = rounding_type
    385                          , by = by
    386                          ).to_pandas(copy = False)

File ~/py_envs/python3.10.7/lib/python3.10/site-packages/tidypandas/tidyframe_class.py:4124, in tidyframe.slice_head(self, n, prop, rounding_type, by)
   4122 if n > min_group_size:
   4123     print("Minimum group size is ", min_group_size)
-> 4124 assert n <= min_group_size,\
   4125     "arg 'n' should not exceed the size of any chunk after grouping"
   4127 ro_name = _generate_new_string(cn) 
   4128 res = (self.group_modify(lambda x: x.slice(np.arange(n))
   4129                          , by = by
   4130                          , preserve_row_order = True
   4131                          , row_order_column_name = ro_name
   4132                          )
   4133            )

AssertionError: arg 'n' should not exceed the size of any chunk after grouping

The same operation in R (dplyr) does not throw an error. Instead, each group returns min(size of the chunk, n) rows:

> library(tidyverse)
> df = tibble(a=c(1,1,1,2), b=c(1,2,3,4))
> df %>% group_by(a) %>% slice_head(n=2) %>% ungroup()
# A tibble: 3 × 2
      a     b
  <dbl> <dbl>
1     1     1
2     1     2
3     2     4
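
For comparison, a minimal sketch of the same "give what you can" behavior in plain pandas, using `GroupBy.head` rather than tidypandas. This only illustrates the dplyr-like result; it is not how tidypandas implements `slice_head`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2], "b": [1, 2, 3, 4]})

# GroupBy.head keeps the first n rows of each group, in original row order,
# and simply returns fewer rows for groups smaller than n.
res = df.groupby("a").head(2).reset_index(drop=True)
print(res)
#    a  b
# 0  1  1
# 1  1  2
# 2  2  4
```
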
@talegari

This was done intentionally.

Design question: if a user asks for, say, 5 rows per group and we can't provide them, should we raise an error saying so, or silently return what we can?

talegari self-assigned this Nov 19, 2023
@fkgruber

I think head should give what it can. That makes it easier when you are doing data exploration. Perhaps you could add an argument to change the behavior, so that it raises an error if you ask for more rows than a group has. This would be useful inside functions, so you know what to expect.
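
A rough sketch of what such a switch could look like, written against plain pandas. The helper name `slice_head_by` and the `strict` argument are hypothetical and are not part of the tidypandas API; the lenient branch uses `GroupBy.head`, and the strict branch mirrors the current assertion.

```python
import pandas as pd

def slice_head_by(df, n, by, strict=False):
    """Hypothetical helper: first n rows per group, optionally strict."""
    if strict:
        # Mirror the current tidypandas check: fail if any group is too small.
        min_group_size = int(df.groupby(by).size().min())
        if n > min_group_size:
            raise ValueError(
                f"n={n} exceeds the minimum group size ({min_group_size})"
            )
    # Lenient path: each group contributes min(group size, n) rows.
    return df.groupby(by, sort=False).head(n).reset_index(drop=True)

df = pd.DataFrame({"a": [1, 1, 1, 2], "b": [1, 2, 3, 4]})

lenient = slice_head_by(df, n=2, by="a")  # 3 rows, as in dplyr

try:
    slice_head_by(df, n=2, by="a", strict=True)
except ValueError as err:
    print(err)
```

With `strict=False` as the default, exploratory use gets the dplyr-like result, while code inside functions can opt in to the error.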

3 participants