Commit 4965db9

Merge pull request #166 from fe2s/more-docs-and-delete-sbt
More docs and delete sbt
2 parents 9969b23 + 25e1031 commit 4965db9

File tree

4 files changed: +75 −37 lines


README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -44,6 +44,7 @@ This library is a work in progress so the API may change before the official rel
 - [Java](doc/java.md)
 - [Python](doc/python.md)
 - [Configuration](doc/configuration.md)
+- [Dev environment](doc/dev.md)

 ## Contributing
```

build.sbt

Lines changed: 0 additions & 37 deletions
This file was deleted.

doc/dev.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

### Development Environment

Spark-Redis is built using [Apache Maven](https://maven.apache.org/) and a helper [GNU Make](https://www.gnu.org/software/make/) file.
Maven is used to build the jar and run tests. The Makefile is used to start and stop the Redis instances required for integration tests.

The `Makefile` expects the Redis binaries (`redis-server` and `redis-cli`) to be on your `PATH`.

To build Spark-Redis and run tests, run:

```
make package
```

To run tests only:

```
make test
```

If you would like to run tests from your IDE, start the Redis test instances with `make start` first. To stop the test instances, run `make stop`.

To build Spark-Redis while skipping tests, run:

```
mvn clean package -DskipTests
```

doc/structured-streaming.md

Lines changed: 47 additions & 0 deletions
@@ -39,6 +39,53 @@ xadd sensors * sensor-id 2 temperature 30.5
### Output to Redis

There is no Redis Sink available, but you can leverage [`foreachBatch`](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#foreachbatch) and the [DataFrame](dataframe.md) write command to output a stream to Redis. Please note that `foreachBatch` is only available starting from Spark 2.4.0.
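For context, the `sensors` value used in the example below is the streaming DataFrame read from the Redis stream earlier in this guide. A minimal sketch of how it might be defined, assuming the stream key `sensors` and the two fields used in the `xadd` examples (the app name and schema field types here are illustrative, not prescribed by the library):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("redis-stream-example").getOrCreate()

// Read the "sensors" Redis stream as a streaming DataFrame.
// Each stream entry's fields are mapped onto the declared schema.
val sensors = spark
  .readStream
  .format("redis")
  .option("stream.keys", "sensors")
  .schema(StructType(Array(
    StructField("sensor-id", StringType),
    StructField("temperature", FloatType)
  )))
  .load()
```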
```scala
val query = sensors
  .writeStream
  .outputMode("update")
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    batchDF
      .write
      .format("org.apache.spark.sql.redis")
      .option("table", "output")
      .mode(SaveMode.Append)
      .save()
  }
  .start()

query.awaitTermination()
```
63+
64+
After writing the following to the Redis Stream:
65+
```
66+
xadd sensors * sensor-id 1 temperature 28.1
67+
xadd sensors * sensor-id 2 temperature 30.5
68+
xadd sensors * sensor-id 1 temperature 28.3
69+
```
the output of `keys output:*` will look like:

```
1) "output:b1682af092b9467cb13cfdcf7fcc9835"
2) "output:04c80769320f4edeadcce8381a6f834d"
3) "output:4f04070a2fd548fdbea441b694c8673b"
```
`hgetall output:b1682af092b9467cb13cfdcf7fcc9835`:

```
1) "sensor-id"
2) "2"
3) "temperature"
4) "30.5"
```
Please refer to the [DataFrame docs](dataframe.md) for other options (such as specifying the key name) available for writing.

### Stream Offset
By default, it pulls messages starting from the latest message in the stream. If you need to start from a specific position in the stream, specify the `stream.offsets` parameter as a JSON string.
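As an illustration, a hedged sketch of what such an offsets option might look like; the JSON field names (`offsets`, `groupName`, `offset`), the consumer group name, the entry ID, and the `sensorSchema` helper are assumptions for this example, not values taken from this commit:

```scala
// Hypothetical example: start reading the "sensors" stream from a specific
// entry ID rather than from the latest message. The entry ID is a placeholder.
val offsetsJson =
  """{"offsets":{"sensors":{"groupName":"redis-source","offset":"1548083485360-0"}}}"""

val sensors = spark
  .readStream
  .format("redis")
  .option("stream.keys", "sensors")
  .option("stream.offsets", offsetsJson)
  .schema(sensorSchema) // the schema declared for the stream fields
  .load()
```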
