CDK-29, CDK-466, CDK-827: Include use of CLI flume-config. #79
@@ -0,0 +1,108 @@
---
layout: page
title: Creating the Events Dataset
---

This lesson shows you how to create a dataset suitable for storing standard event records. You define a dataset schema, a partition strategy, and a URI that specifies the storage scheme.
> **Review comment:** This should describe what a standard event record is.
>
> **Reply:** The next section goes into detail about what a standard event is. If the consumer is momentarily intrigued and wonders what a standard event could be, their suspense will be short-lived. I ended up adding a link to the white paper in the context at the start of the tutorial, so this is moot.
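The dataset URI mentioned in the introduction is what selects the storage scheme. As a hedged illustration, here are two common Kite URI forms; the Hive form is the one this tutorial uses later, and the HDFS path shown is hypothetical:

```
# Store dataset metadata in the Hive metastore (used in this tutorial)
dataset:hive:events

# Store the same dataset directly in HDFS at a path of your choosing
dataset:hdfs:/tmp/data/default/events
```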
## Defining the Schema

The `standard_event.avsc` schema is self-describing, thanks to the _doc_ property for each of the fields. The fields store the `user_id` for the person who initiated the event, the user's IP address, and when the event occurred.

### standard_event.avsc

```JSON
{
  "name": "StandardEvent",
  "namespace": "org.kitesdk.data.event",
  "type": "record",
  "doc": "A standard event type for logging, based on the paper 'The Unified Logging Infrastructure for Data Analytics at Twitter' by Lee et al, http://vldb.org/pvldb/vol5/p1771_georgelee_vldb2012.pdf",
  "fields": [
    {
      "name": "event_initiator",
      "type": "string",
      "doc": "Source of the event in the format {client,server}_{user,app}; for example, 'client_user'. Required."
    },
    {
      "name": "event_name",
      "type": "string",
      "doc": "A hierarchical name for the event, with parts separated by ':'. Required."
    },
    {
      "name": "user_id",
      "type": "long",
      "doc": "A unique identifier for the user. Required."
    },
    {
      "name": "session_id",
      "type": "string",
      "doc": "A unique identifier for the session. Required."
    },
    {
      "name": "ip",
      "type": "string",
      "doc": "The IP address of the host where the event originated. Required."
    },
    {
      "name": "timestamp",
      "type": "long",
      "doc": "The point in time when the event occurred, represented as the number of milliseconds since January 1, 1970, 00:00:00 GMT. Required."
    }
  ]
}
```
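To make the field definitions concrete, here is a hypothetical record that conforms to this schema; the values are invented for illustration only:

```JSON
{
  "event_initiator": "client_user",
  "event_name": "web:message",
  "user_id": 1,
  "session_id": "A1B2C3D4",
  "ip": "192.168.1.100",
  "timestamp": 1422000000000
}
```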

For convenience, save `standard_event.avsc` to the same directory where you installed the `kite-dataset` executable JAR.

> **Review comment:** It would be better not to specify where this should be saved, as long as it is accessible. The user might have installed kite-dataset in their parcel directory, so that location isn't always appropriate.
>
> **Reply:** With the updated install instructions, kite-dataset should run properly from any location. Changed this example to point to the home directory, with instructions to change the directory if needed.
## Defining the Partition Strategy

Analytics for the `events` dataset are time-based. Partitioning the dataset on the `timestamp` field allows Kite to go directly to the files for a particular day, ignoring files outside the chosen time period. Partition strategies are defined in JSON format. See [Partition Strategy JSON Format][partition-strategies].

The following code sample defines a strategy that partitions a dataset by _year_, _month_, and _day_, based on a _timestamp_ field.

> **Review comment:** I wouldn't call this a code sample; it is a configuration.
>
> **Reply:** Fine, but this is splitting hairs. JSON is a lightweight data interchange format based on a subset of JavaScript. Whatever we call it, it's a simple form of code. This is not confusing to the consumer.
### standard_event.json

```JSON
[ {
  "source" : "timestamp",
  "type" : "year",
  "name" : "year"
}, {
  "source" : "timestamp",
  "type" : "month",
  "name" : "month"
}, {
  "source" : "timestamp",
  "type" : "day",
  "name" : "day"
} ]
```
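You can write this file by hand, as shown above, or have the CLI generate it. A hedged sketch, assuming your Kite release includes the `partition-config` command (check `kite-dataset help` for the exact flags):

```
kite-dataset partition-config \
  timestamp:year timestamp:month timestamp:day \
  --schema standard_event.avsc \
  -o standard_event.json
```

Either way, the resulting strategy stores records under nested directories keyed by year, month, and day, which is what lets Kite prune files outside the requested time range.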

For convenience, save `standard_event.json` to the same directory where you installed the `kite-dataset` executable JAR.

> **Review comment:** This should be named according to what it does rather than after the schema it could be used in conjunction with.
>
> **Reply:** I see that you now use …
>
> **Review comment:** Agreed. Please file a JIRA issue and assign yourself.

[partition-strategies]:{{site.baseurl}}/Partition-Strategy-Format.html
## Creating the Events Dataset Using the Kite CLI

Create the _events_ dataset using the default Hive scheme.

To create the _events_ dataset:

1. Open a terminal window and navigate to the directory where you saved the schema file.

   > **Review comment:** It would be easier to tell the user to go to a specific directory at the start instead of pointing them to the right one each time.
   >
   > **Reply:** It doesn't hurt to be sure they're in the correct directory when the steps are given in separate task sets.
   >
   > **Review comment:** Actually, I think it does. This makes the user keep track of a decision he or she made earlier, rather than being clear about what they should do. It's fine to remind them of the exact location, but it should be an exact location at this point in the tutorial.
   >
   > **Reply:** Moot.

1. Use the `create` command to create the dataset.

   ```
   kite-dataset create events \
     --schema standard_event.avsc \
     --partition-by standard_event.json
   ```
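Before moving on, you can confirm the result from the same terminal. A hedged sketch, assuming the `info` and `schema` subcommands are present in your Kite release:

```
# Show the dataset's metadata, including its partition strategy
kite-dataset info events

# Print the schema Kite stored for the dataset
kite-dataset schema events
```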

Use Hue to look at the schema and confirm that the dataset is ready to use.

[http://quickstart.cloudera:8888/filebrowser/view//tmp/data/default/events/.metadata/schema.avsc](http://quickstart.cloudera:8888/filebrowser/view//tmp/data/default/events/.metadata/schema.avsc)

> **Review comment:** It isn't clear that this link is how the reader would use Hue to look at the schema. Plus, the dataset is in Hive, so the reader should use the metastore browser instead of the file browser.
>
> **Reply:** Changed to "Use [Hue][hue] to confirm that the dataset appears in your table list and is ready to use."
## Next Steps

You've created a dataset to store events captured as they happen. Now you can run a web application to create records in your new dataset. See [Capturing Events with Flume][capture-events].

> **Review comment:** I think the context given in the first sentence should be in the setup for this tutorial. The tutorial series creates a dataset to load with Flume as events come in, but we actually use it in a few ways: we bulk-load fake data, stream events in through Flume, and read events with Crunch. Saying that the dataset was created to do one of those is a little misleading and leaves me wondering which part makes this dataset specific to storing events as they happen.
>
> **Reply:** I've discussed with Cris Morris ways that we can add context, and will address this concern with a new section at the top of each page.

[capture-events]:{{site.baseurl}}/tutorials/flume-capture-events.html

@@ -0,0 +1,179 @@
---
layout: page
title: Capturing Events with Flume
---

Once you have an [Events dataset][events], you can create a web application that captures session events.

This example shows how you can send log information via Flume to your Hadoop database using a JSP and custom servlet running on Tomcat.
> **Review comment:** This is great context, but I'd like to have a bit more information. At least: …
>
> **Reply:** I'll add a context section to the top of each tutorial.

[events]:{{site.baseurl}}/tutorials/create-events-dataset.html
## Configuring Flume

These are the steps to configure Flume to channel log information directly to the `events` dataset. You first generate the configuration information using the Kite command-line interface, copy the results, paste them into the Flume configuration file, and then restart Flume.

1. In a terminal window, type `kite-dataset flume-config --channel-type memory events`.

   > **Review comment:** I think the command could use more explanation. What does it do, and why? What might I change for production instead of the demo? (The channel would be a file channel in production, by the way.)

1. Copy the output from the terminal window.
1. Open Cloudera Manager.

   > **Review comment:** The VM setup instructions don't assume you're using Cloudera Manager, so I don't think this should either. The alternative is to copy the flume config and restart: …
   >
   > **Reply:** I added a separate list of instructions for configuring Flume from the command line.

1. Under __Status__, click the link to __Flume__.
1. Choose the __Configuration__ tab.
1. Click __Agent Base Group__.
1. Right-click the Configuration File text area and choose __Select All__.

   > **Review comment:** Nit: "Configuration File" should be distinguished as something to look for with formatting.

1. Right-click the Configuration File text area and choose __Paste__.
1. Click __Save Changes__.
1. From the __Actions__ menu, choose __Restart__, and confirm the action.

Flume is now configured to receive logging events and record them in the `events` dataset.
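As the review comment above points out, the memory channel suits this demo but loses buffered events if the agent stops. For production, you would typically regenerate the configuration with a durable file channel. A hedged variant of the same command; confirm the accepted `--channel-type` values against your Kite release:

```
# Demo: in-memory channel (fast, but events are lost if the agent dies)
kite-dataset flume-config --channel-type memory events

# Production: durable file channel that survives agent restarts
kite-dataset flume-config --channel-type file events
```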

## Creating Web Application Pages

> **Review comment:** This has a lot of discussion on the servlet and JSP, but the focus should be on the architecture: the servlet (or any application) uses Log4j to "log" events, which are Avro records. Log4j is configured to send those events to Flume. Flume accumulates events and writes them to the dataset. Most people want to know why we are doing it this way: …
>
> **Reply:** Adding context will help, but the essential issue here is that all we're doing is configuring Flume to watch for events, and that's achieved using the CLI. There's very little Kite stuff involved in this example.
>
> **Review comment:** I agree, but that highlights the need to make what is happening very clear. Focusing on the web application doesn't help the user understand the message we want them to, but discussing why they care about Kite's support in Flume does.
>
> **Review comment:** I think this should be "Understanding" not "Creating" because the app is already written.
>
> **Review comment:** I think the logging configuration and StandardEvent code should be explained above this point, so the rest of the tutorial is optional for those readers that want to understand the rest of the web app in detail.

These JSP and servlet examples create message events that can be captured by Flume.
## index.jsp

The default landing page for the web application is `index.jsp`. It defines a form with fields for an arbitrary User ID and a message. The __Send__ button submits the input values to the Tomcat server.

```JSP
<html>
<head>
<title>Kite Example</title>
</head>
<body>
<h2>Kite Example</h2>
<form name="input" action="send" method="get">
User ID: <input type="text" name="user_id" value="1">
Message: <input type="text" name="message" value="Hello!">
<input type="submit" value="Send">
</form>
</body>
</html>
```

## LoggingServlet

When you submit a message from the JSP, the LoggingServlet receives and processes the request. The following is mostly standard servlet code, with some notes about application-specific snippets.

```Java
package org.kitesdk.examples.demo;
```

The servlet parses information from the request to create a StandardEvent object. However, you won't find any source code for `org.kitesdk.data.event.StandardEvent`. During the Maven build, the avro-maven-plugin runs before the compile phase and generates a Java class for each `.avsc` file in the `/main/avro` folder. The generated classes have the methods required to build corresponding Avro `SpecificRecord` objects of that type. `SpecificRecord` objects permit efficient access to object fields.
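For reference, the wiring for that code generation typically lives in `pom.xml`. A hedged sketch of an avro-maven-plugin configuration, not copied from the example project; the version and output directory shown are illustrative:

```XML
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- illustrative version; match the Avro version your project uses -->
  <version>1.7.7</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <!-- the schema goal turns .avsc files into Java classes -->
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${basedir}/src/main/avro</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```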

```Java
import org.kitesdk.data.event.StandardEvent;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
```
This example sends Log4j messages directly to the Hive data sink via Flume.

```Java
import org.apache.log4j.Logger;

public class LoggingServlet extends HttpServlet {

  private final Logger logger = Logger.getLogger(LoggingServlet.class);

  @Override
  protected void doGet(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {

    response.setContentType("text/html");
```
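Note that nothing in the servlet names Flume directly; `logger.info(event)` reaches the agent only because Log4j is configured with Flume's Log4j appender. A hedged sketch of such a `log4j.properties` entry follows; the hostname and port are assumptions and must match the Avro source in your generated Flume configuration:

```
# Hypothetical wiring: send this package's log events to the Flume agent
log4j.logger.org.kitesdk.examples.demo=INFO, flume

log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=quickstart.cloudera
log4j.appender.flume.Port=41415
```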

Create a PrintWriter instance to write the response page.

```Java
    PrintWriter pw = response.getWriter();

    pw.println("<html>");
    pw.println("<head><title>Kite Example</title></head>");
    pw.println("<body>");
```

Get the user ID and message values from the servlet request.

```Java
    String userId = request.getParameter("user_id");
    String message = request.getParameter("message");
```

If there's no message, don't create a log entry.

```Java
    if (message == null) {
      pw.println("<p>No message specified.</p>");
```

Otherwise, print the message at the top of the page body.

```Java
    } else {
      pw.println("<p>Message: " + message + "</p>");
```

Create a new StandardEvent builder.

```Java
      StandardEvent event = StandardEvent.newBuilder()
```

The event initiator is a user on the server. The event is a web message. These can be set as string literals, because the event initiator and event name are always the same.

> **Review comment:** I think this might be a bug. Isn't the user a client and not a server?
>
> **Reply:** I agree that the event generator is orthogonal to the discussion. It would be helpful if kite-sdk/kite-examples#23 were addressed, to ensure that the code is sound, or at least not embarrassing.
>
> **Review comment:** Is that ready for review? I thought you were having trouble with command-line options. I'm happy to review it when you think it's ready.
>
> **Reply:** It is ready for review. It runs, and does what I want it to do.
>
> **Reply:** Changed to "user on the client."

```Java
        .setEventInitiator("server_user")
        .setEventName("web:message")
```

Parse the arbitrary user ID, provided by the user, as a long integer.

```Java
        .setUserId(Long.parseLong(userId))
```

The application obtains the session ID and IP address from the request object, and creates a timestamp based on the local machine clock.

```Java
        .setSessionId(request.getSession(true).getId())
        .setIp(request.getRemoteAddr())
        .setTimestamp(System.currentTimeMillis())
```

Build the StandardEvent object, then send the object to the logger with the level _info_.

```Java
        .build();
      logger.info(event);
    }
    pw.println("<p><a href=\"/demo-logging-webapp\">Home</a></p>");
    pw.println("</body></html>");
  }
}
```

> **Review comment:** There should be some summary, next steps, etc. here.

## Running the Web Application

Follow these steps to build the web application, start the Tomcat server, and then use the web application to generate events that are sent to the Hadoop dataset.

1. In a terminal window, navigate to `/kite-examples/demo`.

   > **Review comment:** Paths shouldn't start with "/" unless they are actually root paths. I think it would be better to have the user go to a specific directory and use relative paths in the tutorial.
   >
   > **Reply:** Removed the /.

1. Type the command `mvn install`.
1. In the terminal window, enter `mvn tomcat7:run`.
1. In a web browser, enter the URL [`http://quickstart.cloudera:8034/demo-logging-webapp/`][logging-app].
1. On the web form, enter any user ID and a message, and then click **Send** to create a web event.

The Flume agent receives the events over inter-process communication (IPC), and the agent writes the events to the Hive file sink. Each time you send a message, Log4j writes a new `INFO` line in the terminal window.

> **Review comment:** I think this should have more discussion and appear at the top of this tutorial. Readers are going to have the most questions about this part and the underlying architecture.
>
> **Reply:** I'll add a context section to the top of each tutorial.

View the records in Hadoop using the Hue File Browser.

[http://quickstart.cloudera:8888/filebrowser/view/tmp/data/default/events](http://quickstart.cloudera:8888/filebrowser/view/tmp/data/default/events)

[logging-app]:http://quickstart.cloudera:8034/demo-logging-webapp/

@@ -0,0 +1,69 @@
---
layout: page
title: Generating Events
---

> **Review comment:** I think this tutorial needs more context: …
>
> **Reply:** I think you're right about this. I will add a section at the beginning of each tutorial concisely setting the context.
Kite applications work with Big Data. `GenerateEvents.java` generates 1-1.5 million random event records, a small amount of realistic Big Data you can use with Kite examples.

Much of the class is devoted to creating random values. The two methods of interest are `run` and `generateRandomEvent`.

The `run` method performs the following tasks:

* creates a view of the `hive:events` dataset
* creates a writer instance
* spends 36 seconds writing random events
* closes the writer, which stores the results in the `events` dataset

While the goal is to create random events, if they're _too_ random there won't be anything to aggregate. The `while` loop simulates a user session with random values for `sessionId`, `userId`, and `ip`. It then generates up to 25 random events for that session.

> **Review comment:** I don't think people need to understand the code for events generation unless they are interested. Could you move the relevant information about what is happening to the intro, then just cover how in this section?
>
> **Reply:** What would be nice is if we had something interesting that exercised Kite in a practical way. I will handle the reorganization. Whatever code we provide as a sample needs to be described, at least in broad strokes.
>
> **Review comment:** Agreed, but I want users to be able to understand what the tutorial accomplishes (add interesting data for later tutorials) and get it done before diving into it, because it is mostly an interesting tangent. We should also note that it is optional.
>
> **Review comment:** Could you explain what you mean here?

```Java
View<StandardEvent> events = Datasets.load(
    "dataset:hive:events", StandardEvent.class);
DatasetWriter<StandardEvent> writer = events.newWriter();
try {
  Utf8 sessionId = new Utf8("sessionId");
  long userId = 0;
  Utf8 ip = new Utf8("ip");
  int randomEventCount = 0;
  while (System.currentTimeMillis() - baseTimestamp < 36000) {
    sessionId = randomSessionId();
    userId = randomUserId();
    ip = randomIp();
    randomEventCount = random.nextInt(25);
    for (int i = 0; i < randomEventCount; i++) {
      writer.write(generateRandomEvent(sessionId, userId, ip));
    }
  }
} finally {
  writer.close();
}
```

The `generateRandomEvent` method produces `StandardEvent` objects using random values for the event and time details.

```Java
public StandardEvent generateRandomEvent(Utf8 sessionId, long userId, Utf8 ip) {
  return StandardEvent.newBuilder()
      .setEventInitiator(new Utf8("client_user"))
      .setEventName(randomEventName())
      .setUserId(userId)
      .setSessionId(sessionId)
      .setIp(ip)
      .setTimestamp(randomTimestamp())
      .setEventDetails(randomEventDetails())
      .build();
}
```

## Running GenerateEvents

> **Review comment:** This answers some of my questions above, so it should come first.

This example assumes that you've already created the [`hive:events` dataset][events].

These are the steps to run the GenerateEvents program to populate the `hive:events` dataset.

1. In a terminal window, navigate to `/kite-examples/dataset`.

   > **Review comment:** These instructions should note that a prerequisite is cloning the examples repository and link back to the instructions.
   >
   > **Reply:** I'll deal with this in the context section at the start of the page. I don't want to continually link off the page throughout. At some point, the consumer needs to understand that these examples are modular and build on one another.

1. Enter `mvn compile`.
1. Run the Java utility with `mvn exec:java -Dexec.mainClass="org.kitesdk.examples.data.GenerateEvents"`.
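After the run completes, you can sanity-check the output from the terminal as well. A hedged sketch, assuming the `show` subcommand is available in your Kite release:

```
# Print the first few records of the events dataset
kite-dataset show events
```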

Use Hue to view the records in Hive.

> **Review comment:** Could you explain this a little more?
>
> **Reply:** I want to give the consumer some credit. If they're at a level where they are manipulating data using the API, they should be able to figure out how to view the records in Hive without a hand-holding lesson.

[events]:{{site.baseurl}}/tutorials/create-events-dataset.html

> **Review comment:** I think this tutorial needs more context as well. Maybe there should be an overall tutorial page that outlines the example goal, or maybe it can be included at the start of the tutorials. Otherwise, it isn't clear why we're creating "the" events dataset or what this is used for.
>
> **Reply:** I'll add context at the start of each lesson.