moving postgres line into database-tech.md

mminar · mminar · commit 21109204ae6c · 2014-05-22T01:07:16.000-07:00
diff --git a/README.md b/README.md
@@ -43,6 +43,8 @@ Classic academic conduits aren't providing Data Scientists -- this talent gap wi
 Start here.
 * **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
  * *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
+* **Harvard CS 109 Data Science** [Video Archive](http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml) [Class Webpage](http://cs109.org)
+ * *Topics:* Data wrangling, data management, exploratory data analysis to generate hypotheses and intuition, prediction based on statistical methods such as regression and classification, communication of results through visualization, stories, and summaries.
 
 * Data Science with Open Source Tools [Book](http://it-ebooks.info/book/624/)
   * *Topics:* Visualizing Data, Estimation, Models from Scaling Arguments, Arguments from Probability Models, What you Really Need to Know about Classical Statistics, Data Mining, Clustering, PCA, Map/Reduce, Predictive Analytics
@@ -54,6 +56,7 @@ Start here.
  * Linear Programming (Math 407) [University of Washington / Course](http://www.math.washington.edu/~burke/crs/407/lectures/)
 
 * **Statistics**
+ * Statistics One [Princeton / Coursera](https://www.coursera.org/course/stats1) 
  * Stats in a Nutshell [Book ```$29```](http://amzn.to/1iMnx2X)
  * Think Stats: Probability and Statistics for Programmers [Digital](http://greenteapress.com/thinkstats/) & [Book ```$25```](http://amzn.to/RcVnTf)
  * Think Bayes [Allen Downey / Book](http://www.greenteapress.com/thinkbayes/)
@@ -129,7 +132,7 @@ _OSDSM Specialization: [Data Journalism](https://github.com/datasciencemasters/g
  * Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
  * Network Modeling & Viz [networkx](http://networkx.github.io/)
  * Natural Language Toolkit [NLTK](http://nltk.org/)
- * Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [PostgreSQL](https://pypi.python.org/pypi/psycopg2) [AWS](https://boto.readthedocs.org/en/latest/)
+ * Database querying libraries [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html) [AWS](https://boto.readthedocs.org/en/latest/)
 
 ### R resources are now [here](https://github.com/datasciencemasters/go/blob/master/r-resources.md)
 
@@ -179,6 +182,6 @@ Please Share and Contribute Your Ideas -- **it's Open Source!**
 
 Here's [my transcript](https://github.com/datasciencemasters/go/wiki/%5BTranscript%5D-Clare-Corthell).
 
-Please **showcase your own specialization & transcript** by submitting a markdown file pull request with your name! eg ```clare-corthell-transcript.md```
+Please **showcase your own specialization & transcript** by submitting a pull request that adds a markdown file with your name to the ```/transcripts``` directory, e.g. ```clare-corthell-2014.md```
 
 [Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)
diff --git a/analysis-technologies.md b/analysis-technologies.md
@@ -0,0 +1,10 @@
+#### **Weka (Java Framework)**
+
+* [Weka (MOOC)](http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/) for Data Mining
+
+#### **Lua** (Libraries)
+ * [Torch7](http://torch.ch/) scientific computing framework with wide support for machine learning algorithms
+
+#### **R** [here](r-resources.md)
+
+NB: The core curriculum centers on python-based techniques and technologies
diff --git a/basic-programming.md b/basic-programming.md
@@ -6,6 +6,9 @@ _[I'm adding this section due to the great materials centering on applied method
 
  * [Codecademy](http://www.codecademy.com/)
 
+#### **Startups and Programming**
+ * Startup Engineering [Stanford / Coursera](https://class.coursera.org/startup-001) _NB: This is a full-stack class that explains development from conception to deployment. A great granular, stepwise course on how to build an application from scratch._
+
 #### **GIT** (Source control)
 
  * Git tutorial [Tutorial](http://gitimmersion.com/lab_01.html)
diff --git a/blogs-n-media.md b/blogs-n-media.md
@@ -0,0 +1,10 @@
+### Aggregate Sources
+
+* [DataTau](http://www.datatau.com/) - [Hacker News](https://news.ycombinator.com/) for data scientists
+
+### Blogs
+
+* [Data Science Weekly](http://www.datascienceweekly.org/blog)
+* [FastML](http://fastml.com/)
+* [Shape of Data](http://shapeofdata.wordpress.com/)
+* [yhat](http://blog.yhathq.com/)
diff --git a/database-tech.md b/database-tech.md
@@ -0,0 +1,8 @@
+### Database Technologies & Management
+
+#### MongoDB
+
+* Data Wrangling with MongoDB [Udacity Course](https://www.udacity.com/course/ud032)
+
+#### PostgreSQL
+* [PostgreSQL](https://pypi.python.org/pypi/psycopg2)
diff --git a/datasets.md b/datasets.md
@@ -3,6 +3,16 @@
 #### Machine Learning
 
 * [UCI Machine Learning Dataset Repository](https://archive.ics.uci.edu/ml/datasets.html)
+* [Machine Learning Dataset Repository](http://mldata.org/)
+
+#### Deep Learning
+
+* [Deep Learning Datasets](http://deeplearning.net/datasets/) for benchmarking deep learning algorithms
+
+#### Clean Sample Data (for Learning New Techniques)
+
+* [Scikit-learn sample datasets](http://scikit-learn.org/stable/datasets/index.html)
+* [Statsmodels datasets](http://statsmodels.sourceforge.net/devel/datasets/index.html)
 
 #### Raw Dataz
 
diff --git a/nosql-tech.md b/nosql-tech.md
diff --git a/r-resources.md b/r-resources.md
@@ -44,3 +44,11 @@ _[Note: The core of The Open Source Data Science Masters focuses on programmatic
  * Kernel Method [kernlab](http://cran.r-project.org/web/packages/kernlab/index.html)
  * Chinese Language Processing [Rwordseg](http://jliblog.com/app/rwordseg)
  * Chinese Weibo Analysis [Rweibo](http://jliblog.com/app/rweibo)
+
+#### R Datasets
+
+ * [Rdatasets](http://vincentarelbundock.github.io/Rdatasets/)
+
+#### R Blogs & Media
+
+ * [R-bloggers](http://www.r-bloggers.com/) R news and tutorials contributed by (452) R bloggers
diff --git a/specializations.md b/specializations.md
@@ -5,11 +5,15 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
 #### Machine Learning
 
 * Neural Networks for Machine Learning [U Toronto / Coursera](https://www.coursera.org/course/neuralnets)
+* [Building Machine Learning Systems with Python](http://www.packtpub.com/building-machine-learning-systems-with-python/book) [source code](https://github.com/luispedro/BuildingMachineLearningSystemsWithPython)
 
 #### Deep Learning 
 
 [Wikipedia Definition](http://en.wikipedia.org/wiki/Deep_learning)
 
+* Deep Learning [Tutorials](http://deeplearning.net/tutorial/)
+* Deep Learning Course [Stanford / OpenClassroom](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=DeepLearning)
+
 #### Web Scraping & Crawling
 
 * Introduction to WebAPIs including Twitter, Youtube, BitLy, Sunlight Foundation [CodeAcademy](http://www.codecademy.com/tracks/apis)
@@ -18,6 +22,10 @@ _[Note: I'm adding this section due to the overwhelming amount of input from new
 * Web scraping [NewCoder / Tutorial](http://newcoder.io/scrape/)
 * Working with Web APIs [NewCoder / Tutorial](http://newcoder.io/api/)
 
+#### Visualization
+
+* [D3.js Tutorial](https://www.dashingd3js.com/table-of-contents)
+
 #### Social Network Analysis
 
 #### Data Journalism
diff --git a/transcripts/clare-corthell-2013.md b/transcripts/clare-corthell-2013.md
@@ -0,0 +1,104 @@
+### The Open-Source Masters
+
+I couldn't wait to go back to grad school. Literally. So I designed my own grad school and spent 5 months learning & hacking in great delight!
+
+### My Background ([linkedin](http://bit.ly/clarecorthell))
+
+I'm a Stanford-educated Engineer, previously a Front-End Developer and UX Designer on early-stage products. I'm always in hot pursuit of deeper insight to social questions!
+
+### Goals & Motivations of the Open Source M.S.
+
+Data Science is an ideal marriage for my technical capacities, social research inquisitions, and my geekish-freakish love of statistics.
+
+### Next Steps?
+
+I'm now a Data Scientist with an incredible team at [Mattermark](http://www.mattermark.com)!
+
+***
+
+## The Data Science Curriculum / April-August 2013
+
+* **Intro to Data Science** [UW / Coursera](https://www.coursera.org/course/datasci)
+ * *Topics:* Python NLP on Twitter API, Distributed Computing Paradigm, MapReduce/Hadoop & Pig Script, SQL/NoSQL, Relational Algebra, Experiment design, Statistics, Graphs, Amazon EC2, Visualization.
+
+### Math
+* Linear Algebra / Levandosky [Stanford / Book](http://www.amazon.com/Linear-Algebra-Steven-Levandosky/dp/0536667470/ref=sr_1_1?ie=UTF8&qid=1376546498&sr=8-1&keywords=linear+algebra+levandosky#)
+* Statistics [Stats in a Nutshell / Book](http://shop.oreilly.com/product/9780596510497.do)
+* Problem-Solving Heuristics "How To Solve It" [Polya / Book](http://en.wikipedia.org/wiki/How_to_Solve_It)
+
+### Computing
+* **Algorithms**
+ * Algorithms Design & Analysis I [Stanford / Coursera](https://www.coursera.org/course/algo)
+ * Algorithm Design [Kleinberg & Tardos / Book](http://www.amazon.com/Algorithm-Design-Jon-Kleinberg/dp/0321295358/ref=sr_1_1?ie=UTF8&qid=1376702127&sr=8-1&keywords=kleinberg+algorithms)
+
+* **Databases**
+ * Introduction to Databases [Stanford / Coursera](https://www.coursera.org/course/db)
+
+* **Data Mining**
+ * Mining Massive Data Sets [Stanford / Book](http://i.stanford.edu/~ullman/mmds.html)
+ * Mining The Social Web [O'Reilly / Book](http://shop.oreilly.com/product/0636920010203.do)
+ * Introduction to Information Retrieval [Stanford / Book](http://nlp.stanford.edu/IR-book/information-retrieval-book.html)
+
+* **Machine Learning**
+ * Machine Learning / Ng [Stanford / Coursera](https://www.coursera.org/course/ml)
+ * Programming Collective Intelligence [O'Reilly / Book](http://shop.oreilly.com/product/9780596529321.do)
+ * Statistics [The Elements of Statistical Learning / Book](http://www-stat.stanford.edu/~tibs/ElemStatLearn/) *(in progress)*
+
+* **Probabilistic Graphical Models**
+ * Probabilistic Programming and Bayesian Methods for Hackers [GitHub / Tutorials](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)
+ * PGMs / Koller [Stanford / Coursera](https://www.coursera.org/course/pgm) *(in progress)*
+
+* **Natural Language Processing**
+ * NLP with Python [O'Reilly / Book](http://shop.oreilly.com/product/9780596516499.do)
+
+* **Analysis**
+ * Python for Data Analysis [O'Reilly / Book](http://www.kqzyfj.com/click-7040302-11260198?url=http%3A%2F%2Fshop.oreilly.com%2Fproduct%2F0636920023784.do&cjsku=0636920023784)
+ * Big Data Analysis with Twitter [UC Berkeley / Lectures](http://blogs.ischool.berkeley.edu/i290-abdt-s12/)
+ * Social and Economic Networks: Models and Analysis [Stanford / Coursera](https://www.coursera.org/course/networksonline)
+ * Information Visualization ["Envisioning Information" Tufte / Book](http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/ref=sr_1_8?ie=UTF8&qid=1376709039&sr=8-8&keywords=information+design)
+
+* **Python** (Learning)
+ * New To Python: [Learn Python the Hard Way](http://learnpythonthehardway.org/), [Google's Python Class](http://code.google.com/edu/languages/google-python-class/)
+
+* **Python** (Libraries)
+ * Basic Packages [Python, virtualenv, NumPy, SciPy, matplotlib and IPython](http://www.lowindata.com/2013/installing-scientific-python-on-mac-os-x/)
+ * Bayesian Inference | [pymc](https://github.com/pymc-devs/pymc)
+ * Labeled data structures, statistical functions, etc. [pandas](https://github.com/pydata/pandas) (See: Python for Data Analysis)
+ * Python wrapper for the Twitter API [twython](https://github.com/ryanmcgrath/twython)
+ * Tools for Data Mining & Analysis [scikit-learn](http://scikit-learn.org/stable/)
+ * Network Modeling & Viz [networkx](http://networkx.github.io/)
+ * Natural Language Toolkit [NLTK](http://nltk.org/)
+
+### Projects
+* Coursework
+ * Sentiment analysis, trending topics, and friendship mapping with Twitter API
+ * Joins and Matrix Manipulation in MapReduce (AWS EC2)
+ * In-database Text analysis (SQL)
+* Sentiment analysis of movie tweets (Python)
+
+
+***
+### A Note on Tools
+
+This degree is brought to you by: "THE INTERNET". 
+
+Information is more democratized^ now than it was at any point in history. Given a little initiative and interest, you can tailor and excel in an education of your own design. The connective web made me what I am today, growing from the child obsessed with [Number Munchers](http://en.wikipedia.org/wiki/Munchers#Number_Munchers) to an adult jaw-dropping over [DBSCAN](http://en.wikipedia.org/wiki/DBSCAN).
+
+The most valuable resources I used were:
+* [Coursera](http://coursera.org)
+* [Khan Academy](https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/term-life-insurance-and-death-probability)
+* [Wolfram Alpha](http://www.wolframalpha.com/input/?i=torus)
+* [Wikipedia](http://en.wikipedia.org/wiki/List_of_cognitive_biases)
+* [Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science)
+* **Kindle .mobis** (carrying textbooks is so 90s.)
+* PopSci Read: [The Signal and The Noise](http://www.amazon.com/Signal-Noise-Predictions-Fail-but-ebook/dp/B007V65R54/ref=tmm_kin_swatch_0?_encoding=UTF8&sr=8-1&qid=1376699450) Nate Silver
+* **Friends & Family** (Impossible without their support! Special Thanks to N.S.)
+
+*^ given internet access - an issue near and dear to me.*
+
+***
+
+
+### I "Forked" this into the [Open Source Data Science Masters](http://datasciencemasters.org) Curriculum.
+
+[Follow me on Twitter @clarecorthell](http://twitter.com/clarecorthell)