Add Palmer penguin dataset #45

Open
devmotion opened this issue Jun 29, 2020 · 3 comments

Comments

@devmotion

Would it make sense to add the Palmer penguins dataset? It was recently proposed as an alternative to the well-known Iris dataset because of growing concern about Ronald Fisher's eugenicist past. Since Iris is included in MLDatasets, I assumed it might fit in here quite well. Or should it rather be added to RDatasets (although the same argument would then apply to the Iris dataset, it seems)?

@devmotion
Author

I got the impression that it does not really fit into this package, even though it is intended as a replacement for the Iris dataset: it has multiple categorical and numerical features without any specific target. Moreover, some features are missing for some observations. Hence, in contrast to the Iris dataset, there is no straightforward split of the Palmer penguins dataset into categorical labels of type Vector{String} and features of type Matrix{Float64}.

So for now I have instead created a separate package, PalmerPenguins.jl, which loads the dataset as a CSV.File object with all columns and missing values.
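
To make the typing problem concrete, here is a minimal sketch in plain Julia (Base only). The rows are hypothetical toy values shaped like the penguin table, not real observations, and treating `species` as the label is only one possible choice, since the dataset does not designate a target. Dropping incomplete rows is shown as one workaround, not as what either package does.

```julia
# Hypothetical toy rows; the values are made up for illustration only.
species        = ["Adelie", "Gentoo", "Chinstrap"]
bill_length_mm = [39.1, missing, 46.5]
body_mass_g    = [3750.0, 5000.0, missing]

# Stacking columns that contain `missing` gives a matrix whose element
# type is Union{Missing, Float64}, so it is not a Matrix{Float64}.
features = hcat(bill_length_mm, body_mass_g)
@assert eltype(features) == Union{Missing, Float64}
@assert !(features isa Matrix{Float64})

# One possible workaround (an assumption, not the packages' behaviour):
# keep only the rows without missing values, then convert.
complete = [!any(ismissing, row) for row in eachrow(features)]
X = Matrix{Float64}(features[complete, :])

# Arbitrarily pick `species` as the label vector for the kept rows.
y = species[complete]
```

With real data the same filter discards every observation that has at least one missing measurement, which is part of why a fixed labels/features split is less natural here than for Iris.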

@CarloLucibello
Member

I think this dataset would be a nice addition to this repo. It could easily be ported here from PalmerPenguins.jl if anyone wants to do it and if @devmotion agrees.

@devmotion
Author

Since both PalmerPenguins and MLDatasets use DataDeps, I guess one could just load PalmerPenguins.jl instead of re-implementing everything from scratch? I don't plan to retire PalmerPenguins anyway, since it's much more lightweight.
