<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
    <meta name="description" content="Course homepage for CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo">
    <meta name="author" content="Jimmy Lin">
    <title>Big Data Infrastructure</title>

    <!-- Bootstrap -->
    <link href="css/bootstrap.min.css" rel="stylesheet">

    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
    <link href="css/ie10-viewport-bug-workaround.css" rel="stylesheet">

    <style>
      body {
        padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
      }
    </style>

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
  </head>


  <body>

    <nav class="navbar navbar-inverse navbar-fixed-top">
      <div class="container">
        <div class="navbar-header">
          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
            <span class="sr-only">Toggle navigation</span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
          </button>
        </div>
        <div id="navbar" class="collapse navbar-collapse">
          <ul class="nav navbar-nav">
            <li class="active"><a href="index.html">Overview</a></li>
            <li><a href="organization.html">Organization</a></li>
            <li><a href="syllabus.html">Syllabus</a></li>
            <li><a href="assignments.html">Assignments</a></li>
            <li><a href="software.html">Software</a></li>
          </ul>
        </div><!--/.nav-collapse -->
      </div>
    </nav>

    <div class="container">

  <div class="page-header">
    <div style="float: right"><img src="images/waterloo_logo.png" alt="University of Waterloo logo"/></div>
    <h1>Big Data Infrastructure<br/><small>CS 489/698 (Winter 2017)</small></h1>
  </div>

<p>
<b>Time:</b> Tuesdays and Thursdays, 1:00-2:20pm<br/>
<b>Location:</b> AL 124<br/>
<b>Instructor:</b> <a href="https://cs.uwaterloo.ca/~jimmylin/">Jimmy Lin</a><br/>
<b>TAs:</b> Libo Gao, Kareem El Gebaly, Ripul Jain<br/>
<b>Piazza:</b> <a href="http://piazza.com/uwaterloo.ca/winter2017/cs489698/home">course link</a> (use for general questions)<br/>
<b>Email:</b> uwaterloo-bigdata-2017w-staff@googlegroups.com (reaches the instructor and TAs; use <i>only</i> for personal concerns)<br/>
</p>

<div style="float: right; padding-left: 20px; padding-bottom: 20px"><img src="images/stack.png" alt="Diagram of the big data stack"/></div>

<p>Over the past few years, we have seen the emergence of "big data":
disruptive technologies that have transformed commerce, science, and
many aspects of society. These developments are enabled by
infrastructure that allows us to distribute computations across
hundreds or even thousands of commodity servers. One important advance
that has made all this possible is the development of abstractions for
data-intensive computing that allow programmers to reason about
computations at a massive scale, hiding low-level details such as
synchronization, data movement, and fault tolerance.</p>

<p><b>What is this course about?</b> This course provides an
introduction to big data infrastructure for analytics. The focus is
algorithm design and "thinking at scale": we will cover data mining
and machine learning techniques as applied to text, graphs, and
relational data. Most of the course will be taught in a combination of
MapReduce and Spark, two representative dataflow abstractions for
large-scale data analysis, although we will introduce alternative
abstractions such as bulk-synchronous parallel and streaming models
as well.</p>

<p>One might break down the "big data" stack in the manner shown on
the right. At the bottom resides the execution infrastructure, which
is responsible for coordinating computations across a cluster
(examples include MapReduce and Spark). In the middle resides
analytics infrastructure, which implements data mining and machine
learning algorithms on top of the execution infrastructure (an example
would be MLlib in Spark). At the top are the tools data scientists use
to generate insights, built on top of the analytics
infrastructure. This course focuses on the middle part &mdash; by the
end of the course, you will be able to implement basic data mining and
machine learning algorithms that can operate at scale. Of course,
effective algorithm design requires understanding the execution
infrastructure (below) and what the algorithms are used for (above),
so we will cover the broader context as well.</p>


<p style="padding-top: 20px">
<a href="https://github.com/lintool/bigdata-2017w/" class="btn btn-primary btn-large">Fork me on GitHub!</a>
</p>

<p style="padding-bottom: 100px"></p>

    </div><!-- /.container -->


    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
    <!-- Include all compiled plugins (below), or include individual files as needed -->
    <script src="js/bootstrap.min.js"></script>

    <!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
    <script src="js/ie10-viewport-bug-workaround.js"></script>
  </body>

</html>