| 1 | +### The Open-Source Masters |
| 2 | + |
| 3 | +I couldn't wait to go back to grad school. Literally. So I designed my own grad school and spent 5 months learning & hacking in great delight! |
| 4 | + |
| 5 | +### My Background ([linkedin](http://bit.ly/clarecorthell)) |
| 6 | + |
| 7 | +I'm a Stanford-educated Engineer, previously a Front-End Developer and UX Designer on early-stage products. I'm always in hot pursuit of deeper insight into social questions! |
| 8 | + |
| 9 | +### Goals & Motivations of the Open Source M.S. |
| 10 | + |
| 11 | +Data Science is an ideal marriage of my technical capacities, social research inquiries, and my geekish-freakish love of statistics. |
| 12 | + |
| 13 | +### Next Steps? |
| 14 | + |
| 15 | +I'm now a Data Scientist with an incredible team at [Mattermark](http://www.mattermark.com)! |
| 16 | + |
| 17 | +*** |
| 18 | + |
| 19 | +## The Data Science Curriculum / April-August 2013 |
| 20 | + |
| 21 | +* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci) |
| 22 | + * *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization. |
| 23 | + |
| 24 | +### Math |
| 25 | +* Linear Algebra / Levandosky [Stanford / Book](http://www.amazon.com/Linear-Algebra-Steven-Levandosky/dp/0536667470/ref=sr_1_1?ie=UTF8&qid=1376546498&sr=8-1&keywords=linear+algebra+levandosky#) |
| 26 | +* Statistics [Stats in a Nutshell / Book](http://shop.oreilly.com/product/9780596510497.do) |
| 27 | +* Problem-Solving Heuristics "How To Solve It" [Polya / Book](http://en.wikipedia.org/wiki/How_to_Solve_It) |
| 28 | + |
| 29 | +### Computing |
| 30 | +* **Algorithms** |
| 31 | + * Algorithms: Design & Analysis I [Stanford / Coursera](https://www.coursera.org/course/algo) |
| 32 | + * Algorithm Design [Kleinberg & Tardos / Book](http://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358/ref=sr_1_1?ie=UTF8&qid=1376702127&sr=8-1&keywords=kleinberg+algorithms) |
| 33 | + |
| 34 | +* **Databases** |
| 35 | + * Introduction to Databases [Stanford / Coursera](https://www.coursera.org/course/db) |
| 36 | + |
| 37 | +* **Data Mining** |
| 38 | + * Mining of Massive Datasets [Stanford / Book](http://i.stanford.edu/~ullman/mmds.html) |
| 39 | + * Mining The Social Web [O'Reilly / Book](http://shop.oreilly.com/product/0636920010203.do) |
| 40 | + * Introduction to Information Retrieval [Stanford / Book](http://nlp.stanford.edu/IR-book/information-retrieval-book.html) |
| 41 | + |
| 42 | +* **Machine Learning** |
| 43 | + * Machine Learning / Ng [Stanford / Coursera](https://www.coursera.org/course/ml) |
| 44 | + * Programming Collective Intelligence [O'Reilly / Book](http://shop.oreilly.com/product/9780596529321.do) |
| 45 | + * Statistics [The Elements of Statistical Learning / Book](http://www-stat.stanford.edu/~tibs/ElemStatLearn/) (*in progress*) |
| 46 | + |
| 47 | +* **Probabilistic Graphical Models** |
| 48 | + * Probabilistic Programming and Bayesian Methods for Hackers [GitHub / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers) |
| 49 | + * PGMs / Koller [Stanford / Coursera](https://www.coursera.org/course/pgm) (*in progress*) |
| 50 | + |
| 51 | +* **Natural Language Processing** |
| 52 | + * NLP with Python [O'Reilly / Book](http://shop.oreilly.com/product/9780596516499.do) |
| 53 | + |
| 54 | +* **Analysis** |
| 55 | + * Python for Data Analysis [O'Reilly / Book](http://www.kqzyfj.com/click-7040302-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920023784.do&cjsku=0636920023784) |
| 56 | + * Big Data Analysis with Twitter [UC Berkeley / Lectures](http://blogs.ischool.berkeley.edu/i290-abdt-s12/) |
| 57 | + * Social and Economic Networks: Models and Analysis [Stanford / Coursera](https://www.coursera.org/course/networksonline) |
| 58 | + * Information Visualization ["Envisioning Information" Tufte / Book](http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/ref=sr_1_8?ie=UTF8&qid=1376709039&sr=8-8&keywords=information+design) |
| 59 | + |
| 60 | +* **Python** (Learning) |
| 61 | + * New to Python: [Learn Python the Hard Way](http://learnpythonthehardway.org/), [Google's Python Class](http://code.google.com/edu/languages/google-python-class/) |
| 62 | + |
| 63 | +* **Python** (Libraries) |
| 64 | + * Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPython](http://www.lowindata.com/2013/installing-scientific-python-on-mac-os-x/) |
| 65 | + * Bayesian Inference [pymc](https://github.com/pymc-devs/pymc) |
| 66 | + * Labeled data structures, statistical functions, etc. [pandas](https://github.com/pydata/pandas) (See: Python for Data Analysis) |
| 67 | + * Python wrapper for the Twitter API [twython](https://github.com/ryanmcgrath/twython) |
| 68 | + * Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/) |
| 69 | + * Network Modeling & Viz [networkx](http://networkx.github.io/) |
| 70 | + * Natural Language Toolkit [NLTK](http://nltk.org/) |
| 71 | + |
| 72 | +### Projects |
| 73 | +* Coursework |
| 74 | + * Sentiment analysis, trending topics, and friendship mapping with Twitter API |
| 75 | + * Joins and Matrix Manipulation in MapReduce (AWS EC2) |
| 76 | + * In-database Text analysis (SQL) |
| 77 | +* Sentiment analysis of movie tweets (Python) |
| 78 | + |
| 79 | + |
| 80 | +*** |
| 81 | +### A Note on Tools |
| 82 | + |
| 83 | +This degree is brought to you by: "THE INTERNET". |
| 84 | + |
| 85 | +Information is more democratized^ now than it was at any point in history. Given a little initiative and interest, you can tailor and excel in an education of your own design. The connective web made me what I am today, growing from the child obsessed with [Number Munchers](http://en.wikipedia.org/wiki/Munchers#Number_Munchers) to an adult jaw-dropping over [DBSCAN](http://en.wikipedia.org/wiki/DBSCAN). |
| 86 | + |
| 87 | +The most valuable resources I used were: |
| 88 | +* [Coursera](http://coursera.org) |
| 89 | +* [Khan Academy](https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/term-life-insurance-and-death-probability) |
| 90 | +* [Wolfram Alpha](http://www.wolframalpha.com/input/?i=torus) |
| 91 | +* [Wikipedia](http://en.wikipedia.org/wiki/List_of_cognitive_biases) |
| 92 | +* [Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science) |
| 93 | +* **Kindle .mobis** (carrying textbooks is so 90s.) |
| 94 | +* PopSci Read: [The Signal and The Noise](http://www.amazon.com/Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=8-1&qid=1376699450) by Nate Silver |
| 95 | +* **Friends & Family** (Impossible without their support! Special Thanks to N.S.) |
| 96 | + |
| 97 | +*^ given internet access - an issue near and dear to me.* |
| 98 | + |
| 99 | +*** |
| 100 | + |
| 101 | + |
| 102 | +### I "Forked" this into the [Open Source Data Science Masters](http://datasciencemasters.org) Curriculum. |
| 103 | + |
| 104 | +[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell) |