CDK-29, CDK-466, CDK-827: Include use of CLI flume-config. #79
@@ -0,0 +1,108 @@
---
layout: page
title: Creating the Events Dataset
---

This lesson shows you how to create a dataset suitable for storing standard event records. You define a dataset schema, a partition strategy, and a URI that specifies the storage scheme.
> **Review comment:** This should describe what a standard event record is.
>
> **Reply:** The next section goes into detail about what a standard event is. If the consumer is momentarily intrigued and wonders what a standard event could be, their suspense will be short-lived. I ended up adding a link to the white paper in the context at the start of the tutorial, so this is moot.
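The dataset URI mentioned in the introduction is what selects the storage scheme. As a hedged illustration, here are two common Kite URI forms; the Hive form is the one this tutorial uses later, and the HDFS path shown is hypothetical:

```
# Store dataset metadata in the Hive metastore (used in this tutorial)
dataset:hive:events

# Store the same dataset directly in HDFS at a path of your choosing
dataset:hdfs:/tmp/data/default/events
```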
## Defining the Schema

The `standard_event.avsc` schema is self-describing, thanks to the _doc_ property for each of the fields. The fields store the `user_id` for the person who initiated the event, the user's IP address, and when the event occurred.

### standard_event.avsc

```JSON
{
  "name": "StandardEvent",
  "namespace": "org.kitesdk.data.event",
  "type": "record",
  "doc": "A standard event type for logging, based on the paper 'The Unified Logging Infrastructure for Data Analytics at Twitter' by Lee et al, http://vldb.org/pvldb/vol5/p1771_georgelee_vldb2012.pdf",
  "fields": [
    {
      "name": "event_initiator",
      "type": "string",
      "doc": "Source of the event in the format {client,server}_{user,app}; for example, 'client_user'. Required."
    },
    {
      "name": "event_name",
      "type": "string",
      "doc": "A hierarchical name for the event, with parts separated by ':'. Required."
    },
    {
      "name": "user_id",
      "type": "long",
      "doc": "A unique identifier for the user. Required."
    },
    {
      "name": "session_id",
      "type": "string",
      "doc": "A unique identifier for the session. Required."
    },
    {
      "name": "ip",
      "type": "string",
      "doc": "The IP address of the host where the event originated. Required."
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "The point in time when the event occurred, represented as the number of milliseconds since January 1, 1970, 00:00:00 GMT. Required."
    }
  ]
}
```
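To make the field definitions concrete, here is a hypothetical record that conforms to this schema; the values are invented for illustration only:

```JSON
{
  "event_initiator": "client_user",
  "event_name": "web:message",
  "user_id": 1,
  "session_id": "A1B2C3D4",
  "ip": "192.168.1.100",
  "timestamp": 1422000000000
}
```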

For convenience, save `standard_event.avsc` to the same directory where you installed the `kite-dataset` executable JAR.

> **Review comment:** It would be better not to specify where this should be saved, as long as it is accessible. The user might have installed kite-dataset in their parcel directory, so that location isn't always appropriate.
>
> **Reply:** With the updated install instructions, kite-dataset should run properly from any location. Changed this example to point to the home directory, with instructions to change the directory if needed.
## Defining the Partition Strategy

Analytics for the `events` dataset are time-based. Partitioning the dataset on the `timestamp` field allows Kite to go directly to the files for a particular day, ignoring files outside the chosen time period. Partition strategies are defined in JSON format. See [Partition Strategy JSON Format][partition-strategies].

The following code sample defines a strategy that partitions a dataset by _year_, _month_, and _day_, based on a _timestamp_ field.

> **Review comment:** I wouldn't call this a code sample; it is a configuration.
>
> **Reply:** Fine, but this is splitting hairs. JSON is a lightweight data interchange format based on a subset of JavaScript. Whatever we call it, it's a simple form of code. This is not confusing to the consumer.
### standard_event.json

```JSON
[ {
  "source" : "timestamp",
  "type" : "year",
  "name" : "year"
}, {
  "source" : "timestamp",
  "type" : "month",
  "name" : "month"
}, {
  "source" : "timestamp",
  "type" : "day",
  "name" : "day"
} ]
```
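You can write this file by hand, as shown above, or have the CLI generate it. A hedged sketch, assuming your Kite release includes the `partition-config` command (check `kite-dataset help` for the exact flags):

```
kite-dataset partition-config \
  timestamp:year timestamp:month timestamp:day \
  --schema standard_event.avsc \
  -o standard_event.json
```

Either way, the resulting strategy stores records under nested directories keyed by year, month, and day, which is what lets Kite prune files outside the requested time range.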

For convenience, save `standard_event.json` to the same directory where you installed the `kite-dataset` executable JAR.

> **Review comment:** This should be named according to what it does rather than after the schema it could be used in conjunction with.
>
> **Reply:** I see that you now use …
>
> **Review comment:** Agreed. Please file a JIRA issue and assign yourself.

[partition-strategies]:{{site.baseurl}}/Partition-Strategy-Format.html
## Creating the Events Dataset Using the Kite CLI

Create the _events_ dataset using the default Hive scheme.

To create the _events_ dataset:

1. Open a terminal window and navigate to the directory where you saved the schema file.

   > **Review comment:** It would be easier to tell the user to go to a specific directory at the start instead of pointing them to the right one each time.
   >
   > **Reply:** It doesn't hurt to be sure they're in the correct directory when the steps are given in separate task sets.
   >
   > **Review comment:** Actually, I think it does. This makes the user keep track of a decision he or she made earlier, rather than being clear about what they should do. It's fine to remind them of the exact location, but it should be an exact location at this point in the tutorial.
   >
   > **Reply:** Moot.

1. Use the `create` command to create the dataset.

   ```
   kite-dataset create events \
     --schema standard_event.avsc \
     --partition-by standard_event.json
   ```
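Before moving on, you can confirm the result from the same terminal. A hedged sketch, assuming the `info` and `schema` subcommands are present in your Kite release:

```
# Show the dataset's metadata, including its partition strategy
kite-dataset info events

# Print the schema Kite stored for the dataset
kite-dataset schema events
```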

Use Hue to look at the schema and confirm that the dataset is ready to use.

[http://quickstart.cloudera:8888/filebrowser/view//tmp/data/default/events/.metadata/schema.avsc](http://quickstart.cloudera:8888/filebrowser/view//tmp/data/default/events/.metadata/schema.avsc)

> **Review comment:** It isn't clear that this link is how the reader would use Hue to look at the schema. Plus, the dataset is in Hive, so the reader should use the metastore browser instead of the file browser.
>
> **Reply:** Changed to "Use [Hue][hue] to confirm that the dataset appears in your table list and is ready to use."
## Next Steps

You've created a dataset to store events captured as they happen. Now you can run a web application to create records in your new dataset. See [Capturing Events with Flume][capture-events].

> **Review comment:** I think the context given in the first sentence should be in the setup for this tutorial. The tutorial series creates a dataset to load with Flume as events come in, but we actually use it in a few ways: we bulk-load fake data, stream events in through Flume, and read events with Crunch. Saying that the dataset was created to do one of those is a little misleading and leaves me wondering which part makes this dataset specific to storing events as they happen.
>
> **Reply:** I've discussed with Cris Morris ways that we can add context, and will address this concern with a new section at the top of each page.

[capture-events]:{{site.baseurl}}/tutorials/flume-capture-events.html

@@ -0,0 +1,179 @@
---
layout: page
title: Capturing Events with Flume
---

Once you have an [Events dataset][events], you can create a web application that captures session events.

This example shows how you can send log information via Flume to your Hadoop database using a JSP and custom servlet running on Tomcat.
> **Review comment:** This is great context, but I'd like to have a bit more information. At least: …
>
> **Reply:** I'll add a context section to the top of each tutorial.

[events]:{{site.baseurl}}/tutorials/create-events-dataset.html
## Configuring Flume

These are the steps to configure Flume to channel log information directly to the `events` dataset. You first generate the configuration information using the Kite command-line interface, copy the results, paste them into the Flume configuration file, and then restart Flume.

1. In a terminal window, type `kite-dataset flume-config --channel-type memory events`.

   > **Review comment:** I think the command could use more explanation. What does it do, and why? What might I change for production instead of the demo? (The channel would be a file channel in production, by the way.)

1. Copy the output from the terminal window.
1. Open Cloudera Manager.

   > **Review comment:** The VM setup instructions don't assume you're using Cloudera Manager, so I don't think this should either. The alternative is to copy the flume config and restart: …
   >
   > **Reply:** I added a separate list of instructions for configuring Flume from the command line.

1. Under __Status__, click the link to __Flume__.
1. Choose the __Configuration__ tab.
1. Click __Agent Base Group__.
1. Right-click the Configuration File text area and choose __Select All__.

   > **Review comment:** Nit: "Configuration File" should be distinguished as something to look for with formatting.

1. Right-click the Configuration File text area and choose __Paste__.
1. Click __Save Changes__.
1. From the __Actions__ menu, choose __Restart__, and confirm the action.

Flume is now configured to receive logging events and record them in the `events` dataset.
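As the review comment above points out, the memory channel suits this demo but loses buffered events if the agent stops. For production, you would typically regenerate the configuration with a durable file channel. A hedged variant of the same command; confirm the accepted `--channel-type` values against your Kite release:

```
# Demo: in-memory channel (fast, but events are lost if the agent dies)
kite-dataset flume-config --channel-type memory events

# Production: durable file channel that survives agent restarts
kite-dataset flume-config --channel-type file events
```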

## Creating Web Application Pages

> **Review comment:** This has a lot of discussion on the servlet and JSP, but the focus should be on the architecture: the servlet (or any application) uses Log4j to "log" events, which are Avro records. Log4j is configured to send those events to Flume. Flume accumulates events and writes them to the dataset. Most people want to know why we are doing it this way: …
>
> **Reply:** Adding context will help, but the essential issue here is that all we're doing is configuring Flume to watch for events, and that's achieved using the CLI. There's very little Kite stuff involved in this example.
>
> **Review comment:** I agree, but that highlights the need to make what is happening very clear. Focusing on the web application doesn't help the user understand the message we want them to, but discussing why they care about Kite's support in Flume does.
>
> **Review comment:** I think this should be "Understanding" not "Creating" because the app is already written.
>
> **Review comment:** I think the logging configuration and StandardEvent code should be explained above this point, so the rest of the tutorial is optional for those readers that want to understand the rest of the web app in detail.

These JSP and servlet examples create message events that can be captured by Flume.
## index.jsp

The default landing page for the web application is `index.jsp`. It defines a form with fields for an arbitrary User ID and a message. The __Send__ button submits the input values to the Tomcat server.

```JSP
<html>
<head>
<title>Kite Example</title>
</head>
<body>
<h2>Kite Example</h2>
<form name="input" action="send" method="get">
User ID: <input type="text" name="user_id" value="1">
Message: <input type="text" name="message" value="Hello!">
<input type="submit" value="Send">
</form>
</body>
</html>
```

## LoggingServlet

When you submit a message from the JSP, the LoggingServlet receives and processes the request. The following is mostly standard servlet code, with some notes about application-specific snippets.

```Java
package org.kitesdk.examples.demo;
```

The servlet parses information from the request to create a StandardEvent object. However, you won't find any source code for `org.kitesdk.data.event.StandardEvent`. During the Maven build, the avro-maven-plugin runs before the compile phase and generates a Java class for each `.avsc` file in the `/main/avro` folder. The generated classes have the methods required to build corresponding Avro `SpecificRecord` objects of that type. `SpecificRecord` objects permit efficient access to object fields.
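For reference, the wiring for that code generation typically lives in `pom.xml`. A hedged sketch of an avro-maven-plugin configuration, not copied from the example project; the version and output directory shown are illustrative:

```XML
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- illustrative version; match the Avro version your project uses -->
  <version>1.7.7</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <!-- the schema goal turns .avsc files into Java classes -->
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${basedir}/src/main/avro</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```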

```Java
import org.kitesdk.data.event.StandardEvent;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
```
This example sends Log4j messages directly to the Hive data sink via Flume.

```Java
import org.apache.log4j.Logger;

public class LoggingServlet extends HttpServlet {

  private final Logger logger = Logger.getLogger(LoggingServlet.class);

  @Override
  protected void doGet(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {

    response.setContentType("text/html");
```
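Note that nothing in the servlet names Flume directly; `logger.info(event)` reaches the agent only because Log4j is configured with Flume's Log4j appender. A hedged sketch of such a `log4j.properties` entry follows; the hostname and port are assumptions and must match the Avro source in your generated Flume configuration:

```
# Hypothetical wiring: send this package's log events to the Flume agent
log4j.logger.org.kitesdk.examples.demo=INFO, flume

log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=quickstart.cloudera
log4j.appender.flume.Port=41415
```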

Create a PrintWriter instance to write the response page.

```Java
    PrintWriter pw = response.getWriter();

    pw.println("<html>");
    pw.println("<head><title>Kite Example</title></head>");
    pw.println("<body>");
```

Get the user ID and message values from the servlet request.

```Java
    String userId = request.getParameter("user_id");
    String message = request.getParameter("message");
```

If there's no message, don't create a log entry.

```Java
    if (message == null) {
      pw.println("<p>No message specified.</p>");
```

Otherwise, print the message at the top of the page body.

```Java
    } else {
      pw.println("<p>Message: " + message + "</p>");
```

Create a new StandardEvent builder.

```Java
      StandardEvent event = StandardEvent.newBuilder()
```

The event initiator is a user on the server. The event is a web message. These can be set as string literals, because the event initiator and event name are always the same.

> **Review comment:** I think this might be a bug. Isn't the user a client and not a server?
>
> **Reply:** I agree that the event generator is orthogonal to the discussion. It would be helpful if kite-sdk/kite-examples#23 were addressed, to ensure that the code is sound, or at least not embarrassing.
>
> **Review comment:** Is that ready for review? I thought you were having trouble with command-line options. I'm happy to review it when you think it's ready.
>
> **Reply:** It is ready for review. It runs, and does what I want it to do.
>
> **Reply:** Changed to "user on the client."

```Java
        .setEventInitiator("server_user")
        .setEventName("web:message")
```

Parse the arbitrary user ID, provided by the user, as a long integer.

```Java
        .setUserId(Long.parseLong(userId))
```

The application obtains the session ID and IP address from the request object, and creates a timestamp based on the local machine clock.

```Java
        .setSessionId(request.getSession(true).getId())
        .setIp(request.getRemoteAddr())
        .setTimestamp(System.currentTimeMillis())
```

Build the StandardEvent object, then send the object to the logger with the level _info_.

```Java
        .build();
      logger.info(event);
    }
    pw.println("<p><a href=\"/demo-logging-webapp\">Home</a></p>");
    pw.println("</body></html>");
  }
}
```

> **Review comment:** There should be some summary, next steps, etc. here.

## Running the Web Application

Follow these steps to build the web application, start the Tomcat server, and then use the web application to generate events that are sent to the Hadoop dataset.

1. In a terminal window, navigate to `/kite-examples/demo`.

   > **Review comment:** Paths shouldn't start with "/" unless they are actually root paths. I think it would be better to have the user go to a specific directory and use relative paths in the tutorial.
   >
   > **Reply:** Removed the /.

1. Type the command `mvn install`.
1. In the terminal window, enter `mvn tomcat7:run`.
1. In a web browser, enter the URL [`http://quickstart.cloudera:8034/demo-logging-webapp/`][logging-app].
1. On the web form, enter any user ID and a message, and then click **Send** to create a web event.

The Flume agent receives the events over inter-process communication (IPC), and the agent writes the events to the Hive file sink. Each time you send a message, Log4j writes a new `INFO` line in the terminal window.

> **Review comment:** I think this should have more discussion and appear at the top of this tutorial. Readers are going to have the most questions about this part and the underlying architecture.
>
> **Reply:** I'll add a context section to the top of each tutorial.

View the records in Hadoop using the Hue File Browser.

[http://quickstart.cloudera:8888/filebrowser/view/tmp/data/default/events](http://quickstart.cloudera:8888/filebrowser/view/tmp/data/default/events)

[logging-app]:http://quickstart.cloudera:8034/demo-logging-webapp/

@@ -0,0 +1,69 @@
---
layout: page
title: Generating Events
---

> **Review comment:** I think this tutorial needs more context: …
>
> **Reply:** I think you're right about this. I will add a section at the beginning of each tutorial concisely setting the context.
Kite applications work with Big Data. `GenerateEvents.java` generates 1-1.5 million random event records, a small amount of realistic Big Data you can use with Kite examples.

Much of the class is devoted to creating random values. The two methods of interest are `run` and `generateRandomEvent`.

The `run` method performs the following tasks:

* creates a view of the `hive:events` dataset
* creates a writer instance
* spends 36 seconds writing random events
* closes the writer, which stores the results in the `events` dataset

While the goal is to create random events, if they're _too_ random there won't be anything to aggregate. The `while` loop simulates a user session with random values for `sessionId`, `userId`, and `ip`. It then generates up to 25 random events for that session.

> **Review comment:** I don't think people need to understand the code for events generation unless they are interested. Could you move the relevant information about what is happening to the intro, then just cover how in this section?
>
> **Reply:** What would be nice is if we had something interesting that exercised Kite in a practical way. I will handle the reorganization. Whatever code we provide as a sample needs to be described, at least in broad strokes.
>
> **Review comment:** Agreed, but I want users to be able to understand what the tutorial accomplishes (add interesting data for later tutorials) and get it done before diving into it, because it is mostly an interesting tangent. We should also note that it is optional.
>
> **Review comment:** Could you explain what you mean here?

```Java
View<StandardEvent> events = Datasets.load(
    "dataset:hive:events", StandardEvent.class);
DatasetWriter<StandardEvent> writer = events.newWriter();
try {
  Utf8 sessionId = new Utf8("sessionId");
  long userId = 0;
  Utf8 ip = new Utf8("ip");
  int randomEventCount = 0;
  while (System.currentTimeMillis() - baseTimestamp < 36000) {
    sessionId = randomSessionId();
    userId = randomUserId();
    ip = randomIp();
    randomEventCount = random.nextInt(25);
    for (int i = 0; i < randomEventCount; i++) {
      writer.write(generateRandomEvent(sessionId, userId, ip));
    }
  }
} finally {
  writer.close();
}
```

The `generateRandomEvent` method produces `StandardEvent` objects using random values for the event and time details.

```Java
public StandardEvent generateRandomEvent(Utf8 sessionId, long userId, Utf8 ip) {
  return StandardEvent.newBuilder()
      .setEventInitiator(new Utf8("client_user"))
      .setEventName(randomEventName())
      .setUserId(userId)
      .setSessionId(sessionId)
      .setIp(ip)
      .setTimestamp(randomTimestamp())
      .setEventDetails(randomEventDetails())
      .build();
}
```

## Running GenerateEvents

> **Review comment:** This answers some of my questions above, so it should come first.

This example assumes that you've already created the [`hive:events` dataset][events].

These are the steps to run the GenerateEvents program to populate the `hive:events` dataset.

1. In a terminal window, navigate to `/kite-examples/dataset`.

   > **Review comment:** These instructions should note that a prerequisite is cloning the examples repository and link back to the instructions.
   >
   > **Reply:** I'll deal with this in the context section at the start of the page. I don't want to continually link off the page throughout. At some point, the consumer needs to understand that these examples are modular and build on one another.

1. Enter `mvn compile`.
1. Run the Java utility with `mvn exec:java -Dexec.mainClass="org.kitesdk.examples.data.GenerateEvents"`.
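After the run completes, you can sanity-check the output from the terminal as well. A hedged sketch, assuming the `show` subcommand is available in your Kite release:

```
# Print the first few records of the events dataset
kite-dataset show events
```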

Use Hue to view the records in Hive.

> **Review comment:** Could you explain this a little more?
>
> **Reply:** I want to give the consumer some credit. If they're at a level where they are manipulating data using the API, they should be able to figure out how to view the records in Hive without a hand-holding lesson.

[events]:{{site.baseurl}}/tutorials/create-events-dataset.html

> **Review comment:** I think this tutorial needs more context as well. Maybe there should be an overall tutorial page that outlines the example goal, or maybe it can be included at the start of the tutorials. Otherwise, it isn't clear why we're creating "the" events dataset or what this is used for.
>
> **Reply:** I'll add context at the start of each lesson.