CDK-928: Utility to generate events to existing table. #23
base: master
Conversation
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;

public abstract class BaseEventsTool extends Configured implements Tool {
It doesn't look like any of the code in this class is used, so it would be better to remove it and make GenerateEvents implement Tool directly.
Excellent. Done.
baseTimestamp = System.currentTimeMillis();

View<StandardEvent> events = Datasets.load(
    (args.length == 1 ? args[0] : "dataset:hive:events"), StandardEvent.class);
I noted this elsewhere, but I think it would be better to use a variable rather than the inline test here.
Is this wrong, or just different? Are you suggesting that the test should set the variable before the load method? If the argument is invalid, does setting it outside the load method change the result? If the code must change before publication, please provide the acceptable alternate code rather than having me guess at what I should do.
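If I'm reading the suggestion correctly, the inline test would move into a named variable before the `load` call. A minimal, self-contained sketch of that pattern (the class and method names here are illustrative, and the Kite `Datasets.load` call is shown only in a comment since it needs the Kite SDK on the classpath):

```java
public class DatasetUriExample {

    // Default dataset URI, taken from the diff above.
    static final String DEFAULT_URI = "dataset:hive:events";

    // The reviewer's suggestion, as I read it: resolve the URI into a
    // named variable before loading, instead of inlining the ternary test.
    static String resolveUri(String[] args) {
        return (args.length == 1) ? args[0] : DEFAULT_URI;
    }

    public static void main(String[] args) {
        String datasetUri = resolveUri(args);
        // In the real tool this would then be passed to Kite:
        // View<StandardEvent> events = Datasets.load(datasetUri, StandardEvent.class);
        System.out.println(datasetUri);
    }
}
```

Functionally the two forms are equivalent for valid and invalid arguments alike; the variable just gives the fallback a name and keeps the `load` call on one readable line.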
With an eye toward modularization, I've repurposed CreateEvents.java from the Spark example and placed it in `org/kitesdk/examples/data`. This lets the customer create the events dataset using the CLI, then populate it with a substantial number of records using the Java utility. The same dataset can be used for the Flume and Spark examples, without having to delete it after running their respective jobs.

In GenerateEvents, I essentially swapped the CreateEvents `create()` method with `load()`. I added the Avro plug-in to `pom.xml`, copied the `avro` folder with `standard_event.avsc` into the `main` directory, and copied `BaseEventsTool.java` to `org/kitesdk/examples/data`.

In my environment, it compiles, runs, and populates the events table as expected.
**Update:** The random records were a little too random: if `user_id`, `session_id`, and `ip` differ on every record, the Crunch utility finds no sessions to aggregate when it runs. I revised the `run` method to generate `user_id`, `session_id`, and `ip` first, then used a for loop to generate 1-25 random events that share them. I also modified the `randomTimestamp` method to increase the base interval and add random padding, creating more realistic session durations.
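A rough, self-contained sketch of that revised structure. The `Event` class here is a stand-in for `StandardEvent`, and the value ranges and padding constants are illustrative assumptions, not the actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

public class SessionSketch {

    static final Random RAND = new Random();

    // Minimal stand-in for StandardEvent: only the fields discussed above.
    static class Event {
        final long userId;
        final String sessionId;
        final String ip;
        final long timestamp;

        Event(long userId, String sessionId, String ip, long timestamp) {
            this.userId = userId;
            this.sessionId = sessionId;
            this.ip = ip;
            this.timestamp = timestamp;
        }
    }

    // Fix user_id, session_id, and ip once per session, then emit 1-25
    // events that share them, so the Crunch job has sessions to aggregate.
    static List<Event> generateSession(long baseTimestamp) {
        long userId = RAND.nextInt(100);
        String sessionId = UUID.randomUUID().toString();
        String ip = "192.168." + RAND.nextInt(256) + "." + RAND.nextInt(256);

        int numEvents = 1 + RAND.nextInt(25); // 1-25 events per session
        List<Event> events = new ArrayList<>();
        long ts = baseTimestamp;
        for (int i = 0; i < numEvents; i++) {
            ts = randomTimestamp(ts);
            events.add(new Event(userId, sessionId, ip, ts));
        }
        return events;
    }

    // Advance the clock by a base interval plus random padding so the
    // resulting session durations look realistic (constants are guesses).
    static long randomTimestamp(long previous) {
        return previous + 60_000L + RAND.nextInt(30_000);
    }
}
```

The key difference from the earlier version is that the identifying fields are hoisted out of the loop, so every event in a session groups together downstream.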
I'm happy to incorporate any changes that make the code more elegant; my changes just make it work.