Explanation of data frame created by group by and summarize #527

Open
jessesadler opened this issue Sep 27, 2024 · 1 comment
Labels
good first issue (Good issue for first-time contributors), help wanted (Looking for Contributors)

Comments

jessesadler commented Sep 27, 2024

How could the content be improved?

Two improvements to be made:

  1. The explanation of the smaller data frame produced by summarize() incorrectly attributes it to tibble printing rather than to the change in the data itself.

Current language: You may also have noticed that the output from these calls doesn’t run off the screen anymore. It’s one of the advantages of tbl_df over dataframe.

Change this to an explanation that summarize() keeps only the grouping columns and the summary columns it creates.

  2. It would be good to explain why the message "summarise() has grouped output by 'village'. You can override using the .groups argument." is printed even when ungroup() is used. The lesson could also be more explicit about where the output indicates that it is a grouped tibble.
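
Both points can be illustrated with a minimal sketch. The `interviews_toy` tibble below is a hypothetical stand-in for the lesson's `interviews` data with invented values; only the column names `village`, `memb_assoc`, and `no_membrs` are meant to echo the lesson, and `rooms` is included just to show a column being dropped.

```r
library(dplyr)

# Hypothetical stand-in for the lesson's `interviews` data; values are invented.
interviews_toy <- tibble(
  village    = c("God", "God", "Chirodzo", "Chirodzo", "Ruaca", "Ruaca"),
  memb_assoc = c("yes", "no", "yes", "no", "yes", "yes"),
  no_membrs  = c(3, 7, 10, 7, 6, 8),
  rooms      = c(1, 1, 3, 2, 1, 2)
)

# One grouping variable: the result keeps only `village` and the new
# summary column. `memb_assoc`, `no_membrs`, and `rooms` are gone from the
# data, which is why the output is narrow (not because of tibble printing).
by_village <- interviews_toy %>%
  group_by(village) %>%
  summarize(mean_no_membrs = mean(no_membrs))

names(by_village)  # "village" "mean_no_membrs"

# Two grouping variables, no `.groups` argument: summarize() drops the last
# grouping level (`memb_assoc`) and leaves the result grouped by `village`.
# In an interactive session this is the step that typically emits the
# "has grouped output by 'village'" message.
by_village_assoc <- interviews_toy %>%
  group_by(village, memb_assoc) %>%
  summarize(mean_no_membrs = mean(no_membrs))

# Piping into ungroup() afterwards would remove the grouping, but the message
# has already been emitted by summarize() at that point. Setting `.groups`
# explicitly avoids the message and returns an ungrouped tibble directly.
dropped <- interviews_toy %>%
  group_by(village, memb_assoc) %>%
  summarize(mean_no_membrs = mean(no_membrs), .groups = "drop")
```

Whether the message appears can depend on the dplyr version and on the `dplyr.summarise.inform` option, so the lesson text may want to hedge on that.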

Which part of the content does your suggestion apply to?

https://datacarpentry.org/r-socialsci/instructor/03-dplyr.html#the-summarize-function

juanfung added the help wanted and good first issue labels on Sep 27, 2024
XAM12 added a commit to XAM12/r-socialsci that referenced this issue Nov 28, 2024
proposed solution to: datacarpentry#527

Moved the line about the advantage of tbl_df to the example where the first tibble-based output is created, as it is not related to the summarize function.
Added an explanation of how to identify grouped vs. ungrouped tibbles in the output.
XAM12 commented Nov 28, 2024

Tried to solve the two problems mentioned above by moving the statement about the advantages of tibble over data frame to the first example on the page, since it is unrelated to the summarize function. Also added an explanation of how to identify grouped vs. ungrouped tibbles in the output.
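
For reference, here is a small sketch of how grouping can be checked, assuming a hypothetical `toy` tibble with invented values. When printed, a grouped tibble shows a `# Groups:` line beneath the `# A tibble:` header; the same information is available programmatically through `is_grouped_df()` and `group_vars()`.

```r
library(dplyr)

# Hypothetical example data; not taken from the lesson.
toy <- tibble(
  village    = c("God", "God", "Ruaca"),
  memb_assoc = c("yes", "no", "yes"),
  no_membrs  = c(3, 7, 6)
)

# "drop_last" is stated explicitly here; the result stays grouped by `village`.
grouped <- toy %>%
  group_by(village, memb_assoc) %>%
  summarize(n = n(), .groups = "drop_last")

ungrouped <- ungroup(grouped)

is_grouped_df(grouped)    # TRUE
group_vars(grouped)       # "village"
is_grouped_df(ungrouped)  # FALSE
group_vars(ungrouped)     # character(0)
```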
