Generally speaking, I cringe whenever someone asks me "what tools do you use?" because I have found that tools tend to limit one's thinking on how to solve a problem. As the saying goes:
> I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
>
> -- Abraham Maslow, 1966
But the fact of the matter is that many people, myself included, are not aware of the vast array of open source tools at their disposal when facing data science problems. The goal of this project is to build awareness of the most important tools, packages, programs, etc. in a manner that is as accessible as possible to the data science community. I'm putting this on GitHub with the explicit intent of having the community propose other tools or other forms of organization that are useful.
The idea for this project originally came from a conversation I had with a coworker about how difficult it is for data science n00bs to understand how all these tools fit together. This project is an attempt to address that need.
This presentation is hosted via GitHub Pages and is intended to evolve over time as more tools become available and the landscape of open source tools changes. The goal here is not necessarily to include every available tool under the sun (there are far too many for that) but rather to identify a set of core tools that data scientists find most relevant and useful. Packages, frameworks, and tools are likely to be added and removed as their importance and relevance ebb and flow, but the intent is to keep this repository up to date.
There are many ways to contribute, all of which are greatly appreciated:
- adding new packages (see tools.json)
- adding or editing metadata for existing packages (see tools.json)
- improvements to the presentation (see index.html)
- other awesome and helpful ideas that I haven't contemplated