
Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Kafka, Flume, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions such as map, reduce, join, and window.

Spark Streaming receives live input data streams and divides the data into micro-batches, which are then processed by the Spark engine to generate the final stream of results in batches.
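A minimal sketch of this model in Scala, counting words read from a TCP socket in 5-second micro-batches (the host, port, and batch interval are illustrative choices, not part of the API):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Each batch interval (here 5 seconds) becomes one micro-batch
    // processed by the Spark engine. local[2] reserves one core for
    // the receiver and one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Ingest a live text stream from a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Express the computation with high-level functions: flatMap, map, reduceByKey
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()            // start the computation
    ssc.awaitTermination() // wait for it to terminate
  }
}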

Spark Streaming is available through the Maven Central Repository.

Maven

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

SBT

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.0"

For ingesting data from sources such as Kafka or Flume, which are not part of the Spark Streaming core API, you need to add the corresponding artifact spark-streaming-xyz_2.10, which includes all the classes required to integrate Spark Streaming with the selected source.

Kafka

Maven

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

SBT

libraryDependencies += "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.3.0"
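With the artifact on the classpath, a receiver-based Kafka stream can be created with KafkaUtils. The sketch below assumes an existing StreamingContext ssc as in the earlier example; the ZooKeeper quorum, consumer group, and topic name are placeholder values:

import org.apache.spark.streaming.kafka.KafkaUtils

// Connect to Kafka through ZooKeeper; the map pairs each topic with
// the number of receiver threads to use for it
val kafkaStream = KafkaUtils.createStream(
  ssc,                    // an existing StreamingContext
  "zk-host:2181",         // ZooKeeper quorum (placeholder)
  "my-consumer-group",    // Kafka consumer group id (placeholder)
  Map("my-topic" -> 1))   // topic -> number of receiver threads

// Each record is a (key, message) pair; keep only the message text
val messages = kafkaStream.map(_._2)
messages.print()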

Flume

Maven

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.10</artifactId>
    <version>1.3.0</version>
</dependency>

SBT

libraryDependencies += "org.apache.spark" % "spark-streaming-flume_2.10" % "1.3.0"
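Similarly, a sketch of a Flume-based stream with FlumeUtils, where the receiver listens for Avro events pushed by a Flume sink (hostname and port are placeholders; ssc is an existing StreamingContext as in the first example):

import org.apache.spark.streaming.flume.FlumeUtils

// Start a receiver listening on localhost:4141 for events
// pushed by a Flume Avro sink
val flumeStream = FlumeUtils.createStream(ssc, "localhost", 4141)

// Each element is a SparkFlumeEvent; extract the event body as text
flumeStream.map(event => new String(event.event.getBody.array())).print()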
