Spark Streaming
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Kafka, Flume, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.
Spark Streaming receives live input data streams and divides the data into micro batches, which are then processed by the Spark engine to generate the final stream of results in batches.
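As a minimal sketch of this micro-batch model, the word count below applies the same map and reduce operations used in batch Spark jobs to each batch of a stream; the host localhost, port 9999 and the 10-second batch interval are placeholder assumptions for illustration:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Two local threads: one for the receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
    // Every 10 seconds the collected input becomes one micro batch (an RDD)
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder host and port
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.print()  // print the first results of each batch

    ssc.start()             // start receiving and processing data
    ssc.awaitTermination()  // run until the computation is stopped
  }
}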
Spark Streaming is available through the Maven Central repository. For Maven projects, add the following dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

For SBT projects, add the equivalent line to build.sbt:

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.0"
For ingesting data from sources such as Kafka or Flume that are not part of the Spark Streaming core API, you need to add the corresponding artifact spark-streaming-xyz_2.10, which includes all the classes required to integrate Spark Streaming with the selected source. For Kafka:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

libraryDependencies += "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.3.0"
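With that artifact on the classpath, a minimal sketch of consuming a Kafka topic with the receiver-based KafkaUtils.createStream API looks like this; the ZooKeeper quorum zk-host:2181, the consumer group and the topic name are assumed values, not part of the original text:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaStreamExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // (key, message) pairs from the "events" topic, read by one receiver thread;
    // the ZooKeeper address, group id and topic name below are placeholders
    val kafkaStream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "spark-consumer-group", Map("events" -> 1))
    kafkaStream.map(_._2).print()  // print the message payloads of each batch

    ssc.start()
    ssc.awaitTermination()
  }
}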
For Flume:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

libraryDependencies += "org.apache.spark" % "spark-streaming-flume_2.10" % "1.3.0"
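Similarly, a minimal sketch of the push-based Flume receiver created with FlumeUtils.createStream; the hostname and port (localhost:41414 here) are placeholders and must match an Avro sink configured in the Flume agent:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeStreamExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FlumeStreamExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Spark acts as an Avro server here; the Flume agent needs an Avro sink
    // pointing at this host and port (both placeholder values)
    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 41414)
    flumeStream.count().map(cnt => s"Received $cnt Flume events").print()

    ssc.start()
    ssc.awaitTermination()
  }
}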