
Commit 2110920

moving postgres line into database-tech.md
2 parents 58d401f + 3c6f549 commit 2110920

10 files changed

+166
-7
lines changed

Diff for: README.md

+5-2
Original file line number | Diff line number | Diff line change
@@ -43,6 +43,8 @@ Classic academic conduits aren't providing Data Scientists -- this talent gap wi
4343
Start here.
4444
* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
4545
* *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
46+
* **Harvard CS 109 Data Science** [Video Archive](http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml) [Class Webpage](http://cs109.org)
47+
* *Topics:* Data wrangling, data management, exploratory data analysis to generate hypotheses and intuition, prediction based on statistical methods such as regression and classification, communication of results through visualization, stories, and summaries.
4648

4749
* Data Science with Open Source Tools [Book](http://it-ebooks.info/book/624/)
4850
* *Topics:* Visualizing Data, Estimation, Models from Scaling Arguments, Arguments from Probability Models, What you Really Need to Know about Classical Statistics, Data Mining, Clustering, PCA, Map/Reduce, Predictive Analytics
@@ -54,6 +56,7 @@ Start here.
5456
* Linear Programming (Math 407) [University of Washington / Course](http://www.math.washington.edu/~burke/crs/407/lectures/)
5557

5658
* **Statistics**
59+
* Statistics One [Princeton / Coursera](https://www.coursera.org/course/stats1)
5760
* Stats in a Nutshell [Book ```$29```](http://amzn.to/1iMnx2X)
5861
* Think Stats: Probability and Statistics for Programmers [Digital](http://greenteapress.com/thinkstats/) & [Book ```$25```](http://amzn.to/RcVnTf)
5962
* Think Bayes [Allen Downey / Book](http://www.greenteapress.com/thinkbayes/)
@@ -129,7 +132,7 @@ _OSDSM Specialization: [Data Journalism](https://github.com/datasciencemasters/g
129132
* Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
130133
* Network Modeling & Viz [networkx](http://networkx.github.io/)
131134
* Natural Language Toolkit [NLTK](http://nltk.org/)
132-
* Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [PostgreSQL](https://pypi.python.org/pypi/psycopg2) [AWS](https://boto.readthedocs.org/en/latest/)
135+
* Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [AWS](https://boto.readthedocs.org/en/latest/)
133136

134137
### R resources are now [here](https://github.com/datasciencemasters/go/blob/master/r-resources.md)
135138

@@ -179,6 +182,6 @@ Please Share and Contribute Your Ideas -- **it's Open Source!**
179182

180183
Here's [my transcript](https://github.com/datasciencemasters/go/wiki/%5BTranscript%5D-Clare-Corthell).
181184

182-
Please **showcase your own specialization & transcript** by submitting a markdown file pull request with your name! eg ```clare-corthell-transcript.md```
185+
Please **showcase your own specialization & transcript** by submitting a markdown file pull request in the ```/transcripts``` directory with your name! eg ```clare-corthell-2014.md```
183186

184187
[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)

Diff for: analysis-technologies.md

+10
@@ -0,0 +1,10 @@
1+
#### **Weka (Java Framework)**
2+
3+
* [Weka (MOOC)](http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/) for Data Mining
4+
5+
#### **Lua** (Libraries)
6+
* [Torch7](http://torch.ch/) scientific computing framework with wide support for machine learning algorithms
7+
8+
#### **R** [here](r-resources.md)
9+
10+
NB: The core curriculum centers on python-based techniques and technologies

Diff for: basic-programming.md

+3
@@ -6,6 +6,9 @@ _[I'm adding this section due to the great materials centering on applied method
66

77
* [Codecademy](http://www.codecademy.com/)
88

9+
#### **Startups and Programming**
10+
* Startup Engineering [Stanford / Coursera](https://class.coursera.org/startup-001) _NB: This is a full-stack class that explains development from conception to deployment. A great granular, stepwise course on how to build an application from scratch._
11+
912
#### **GIT** (Source control)
1013

1114
* Git tutorial [Tutorial](http://gitimmersion.com/lab_01.html)

Diff for: blogs-n-media.md

+10
@@ -0,0 +1,10 @@
1+
### Aggregate Sources
2+
3+
* [DataTau](http://www.datatau.com/) - [Hacker News](https://news.ycombinator.com/) for data scientists
4+
5+
### Blogs
6+
7+
* [Data Science Weekly](http://www.datascienceweekly.org/blog)
8+
* [FastML](http://fastml.com/)
9+
* [Shape of Data](http://shapeofdata.wordpress.com/)
10+
* [yhat](http://blog.yhathq.com/)

Diff for: database-tech.md

+8
@@ -0,0 +1,8 @@
1+
### Database Technologies & Management
2+
3+
#### MongoDB
4+
5+
* Data Wrangling with MongoDB [Udacity Course](https://www.udacity.com/course/ud032)
6+
7+
#### PostgreSQL
8+
* [PostgreSQL](https://pypi.python.org/pypi/psycopg2)

Diff for: datasets.md

+10
@@ -3,6 +3,16 @@
33
#### Machine Learning
44

55
* [UCI Machine Learning Dataset Repository](https://archive.ics.uci.edu/ml/datasets.html)
6+
* [Machine Learning Dataset Repository](http://mldata.org/)
7+
8+
#### Deep Learning
9+
10+
* [Deep Learning Datasets](http://deeplearning.net/datasets/) for benchmarking deep learning algorithms
11+
12+
#### Clean Sample Data (for Learning New Techniques)
13+
14+
* [Scikit-learn sample datasets](http://scikit-learn.org/stable/datasets/index.html)
15+
* [Statsmodels datasets](http://statsmodels.sourceforge.net/devel/datasets/index.html)
616

717
#### Raw Dataz
818

Diff for: nosql-tech.md

-5
This file was deleted.

Diff for: r-resources.md

+8
@@ -44,3 +44,11 @@ _[Note: The core of The Open Source Data Science Masters focuses on programmatic
4444
* Kernel Method [kernlab](http://cran.r-project.org/web/packages/kernlab/index.html)
4545
* Chinese Language Processing [Rwordseg](http://jliblog.com/app/rwordseg)
4646
* Chinese Weibo Analysis [Rweibo](http://jliblog.com/app/rweibo)
47+
48+
#### R Datasets
49+
50+
* [Rdatasets](http://vincentarelbundock.github.io/Rdatasets/)
51+
52+
#### R Blogs & Media
53+
54+
* [R-bloggers](http://www.r-bloggers.com/) R news and tutorials contributed by (452) R bloggers

Diff for: specializations.md

+8
@@ -5,11 +5,15 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
55
#### Machine Learning
66

77
* Neural Networks for Machine Learning [U Toronto / Coursera](https://www.coursera.org/course/neuralnets)
8+
* [Building Machine Learning Systems with Python](http://www.packtpub.com/building-machine-learning-systems-with-python/book) [source code](https://github.com/luispedro/BuildingMachineLearningSystemsWithPython)
89

910
#### Deep Learning
1011

1112
[Wikipedia Definition](http://en.wikipedia.org/wiki/Deep_learning)
1213

14+
* Deep Learning [Tutorials](http://deeplearning.net/tutorial/)
15+
* Deep Learning Course [Stanford / OpenClassroom](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=DeepLearning)
16+
1317
#### Web Scraping & Crawling
1418

1519
* Introduction to Web APIs, including Twitter, YouTube, Bitly, Sunlight Foundation [Codecademy](http://www.codecademy.com/tracks/apis)
@@ -18,6 +22,10 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
1822
* Web scraping [NewCoder / Tutorial](http://newcoder.io/scrape/)
1923
* Working with Web APIs [NewCoder / Tutorial](http://newcoder.io/api/)
2024

25+
#### Visualization
26+
27+
* [D3.js Tutorial](https://www.dashingd3js.com/table-of-contents)
28+
2129
#### Social Network Analysis
2230

2331
#### Data Journalism

Diff for: transcripts/clare-corthell-2013.md

+104
@@ -0,0 +1,104 @@
1+
### The Open-Source Masters
2+
3+
I couldn't wait to go back to grad school. Literally. So I designed my own grad school and spent 5 months learning & hacking in great delight!
4+
5+
### My Background ([linkedin](http://bit.ly/clarecorthell))
6+
7+
I'm a Stanford-educated Engineer, previously a Front-End Developer and UX Designer on early-stage products. I'm always in hot pursuit of deeper insight to social questions!
8+
9+
### Goals & Motivations of the Open Source M.S.
10+
11+
Data Science is an ideal marriage for my technical capacities, social research inquisitions, and my geekish-freakish love of statistics.
12+
13+
### Next Steps?
14+
15+
I'm now a Data Scientist with an incredible team at [Mattermark](http://www.mattermark.com)!
16+
17+
***
18+
19+
## The Data Science Curriculum / April-August 2013
20+
21+
* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
22+
* *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
23+
24+
### Math
25+
* Linear Algebra / Levandosky [Stanford / Book](http://www.amazon.com/Linear-Algebra-Steven-Levandosky/dp/0536667470/ref=sr_1_1?ie=UTF8&qid=1376546498&sr=8-1&keywords=linear+algebra+levandosky#)
26+
* Statistics [Stats in a Nutshell / Book](http://shop.oreilly.com/product/9780596510497.do)
27+
* Problem-Solving Heuristics "How To Solve It" [Polya / Book](http://en.wikipedia.org/wiki/How_to_Solve_It)
28+
29+
### Computing
30+
* **Algorithms**
31+
* Algorithms Design & Analysis I [Stanford / Coursera](https://www.coursera.org/course/algo)
32+
* Algorithm Design [Kleinberg & Tardos / Book](http://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358/ref=sr_1_1?ie=UTF8&qid=1376702127&sr=8-1&keywords=kleinberg+algorithms)
33+
34+
* **Databases**
35+
* Introduction to Databases [Stanford / Coursera](https://www.coursera.org/course/db)
36+
37+
* **Data Mining**
38+
* Mining Massive Data Sets [Stanford / Book](http://i.stanford.edu/~ullman/mmds.html)
39+
* Mining The Social Web [O'Reilly / Book](http://shop.oreilly.com/product/0636920010203.do)
40+
* Introduction to Information Retrieval [Stanford / Book](http://nlp.stanford.edu/IR-book/information-retrieval-book.html)
41+
42+
* **Machine Learning**
43+
* Machine Learning / Ng [Stanford / Coursera](https://www.coursera.org/course/ml)
44+
* Programming Collective Intelligence [O'Reilly / Book](http://shop.oreilly.com/product/9780596529321.do)
45+
* Statistics [The Elements of Statistical Learning / Book](http://www-stat.stanford.edu/~tibs/ElemStatLearn/) *(in progress)*
46+
47+
* **Probabilistic Graphical Models**
48+
* Probabilistic Programming and Bayesian Methods for Hackers [GitHub / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)
49+
* PGMs / Koller [Stanford / Coursera](https://www.coursera.org/course/pgm) *(in progress)*
50+
51+
* **Natural Language Processing**
52+
* NLP with Python [O'Reilly / Book](http://shop.oreilly.com/product/9780596516499.do)
53+
54+
* **Analysis**
55+
* Python for Data Analysis [O'Reilly / Book](http://www.kqzyfj.com/click-7040302-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920023784.do&cjsku=0636920023784)
56+
* Big Data Analysis with Twitter [UC Berkeley / Lectures](http://blogs.ischool.berkeley.edu/i290-abdt-s12/)
57+
* Social and Economic Networks: Models and Analysis [Stanford / Coursera](https://www.coursera.org/course/networksonline)
58+
* Information Visualization ["Envisioning Information" Tufte / Book](http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/ref=sr_1_8?ie=UTF8&qid=1376709039&sr=8-8&keywords=information+design)
59+
60+
* **Python** (Learning)
61+
* New to Python: [Learn Python the Hard Way](http://learnpythonthehardway.org/), [Google's Python Class](http://code.google.com/edu/languages/google-python-class/)
62+
63+
* **Python** (Libraries)
64+
* Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPython ](http://www.lowindata.com/2013/installing-scientific-python-on-mac-os-x/)
65+
* Bayesian Inference | [pymc](https://github.com/pymc-devs/pymc)
66+
* Labeled data structures objects, statistical functions, etc [pandas](https://github.com/pydata/pandas) (See: Python for Data Analysis)
67+
* Python wrapper for the Twitter API [twython](https://github.com/ryanmcgrath/twython)
68+
* Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
69+
* Network Modeling & Viz [networkx](http://networkx.github.io/)
70+
* Natural Language Toolkit [NLTK](http://nltk.org/)
71+
72+
### Projects
73+
* Coursework
74+
* Sentiment analysis, trending topics, and friendship mapping with Twitter API
75+
* Joins and Matrix Manipulation in MapReduce (AWS EC2)
76+
* In-database Text analysis (SQL)
77+
* Sentiment analysis of movie tweets (Python)
78+
79+
80+
***
81+
### A Note on Tools
82+
83+
This degree is brought to you by: "THE INTERNET".
84+
85+
Information is more democratized^ now than it was at any point in history. Given a little initiative and interest, you can tailor and excel in an education of your own design. The connective web made me what I am today, growing from the child obsessed with [Number Munchers](http://en.wikipedia.org/wiki/Munchers#Number_Munchers) to an adult jaw-dropping over [DBSCAN](http://en.wikipedia.org/wiki/DBSCAN).
86+
87+
The most valuable resources I used were:
88+
* [Coursera](http://coursera.org)
89+
* [Khan Academy](https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/term-life-insurance-and-death-probability)
90+
* [Wolfram Alpha](http://www.wolframalpha.com/input/?i=torus)
91+
* [Wikipedia](http://en.wikipedia.org/wiki/List_of_cognitive_biases)
92+
* [Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science)
93+
* **Kindle .mobis** (carrying textbooks is so 90s.)
94+
* PopSci Read: [The Signal and The Noise](http://www.amazon.com/Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=8-1&qid=1376699450) Nate Silver
95+
* **Friends & Family** (Impossible without their support! Special Thanks to N.S.)
96+
97+
*^ given internet access - an issue near and dear to me.*
98+
99+
***
100+
101+
102+
### I "Forked" this into the [Open Source Data Science Masters](http://datasciencemasters.org) Curriculum.
103+
104+
[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)

0 commit comments
