spark-util

A simple helper library that lets your Spark job handle exceptions caused by malformed input files more gracefully.

Spark can be unforgiving when consuming files that contain malformed records. This library provides classes that extend the Hadoop Avro and text input formats commonly used when reading files from HDFS with Spark.
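
The general idea of tolerating malformed records can be sketched in plain Scala, without Spark or Hadoop. This is only an illustration of the pattern, not this library's implementation: `TolerantParsing`, `parseRecord`, and the `id,value` line layout are hypothetical, and skipping bad records (rather than, say, stopping at the first one) is an assumption.

```scala
import scala.util.Try

object TolerantParsing {
  // Hypothetical strict parser: expects "id,value" with numeric fields.
  // Throws NumberFormatException or ArrayIndexOutOfBoundsException on bad input.
  def parseRecord(line: String): (Int, Double) = {
    val parts = line.split(",")
    (parts(0).trim.toInt, parts(1).trim.toDouble)
  }

  // Tolerant wrapper: a record that fails to parse is dropped
  // instead of letting the exception propagate to the caller.
  def parseAll(lines: Iterator[String]): Iterator[(Int, Double)] =
    lines.flatMap(line => Try(parseRecord(line)).toOption)
}
```

For example, `TolerantParsing.parseAll(Iterator("1,2.5", "garbage", "3,4.0")).toList` keeps the two well-formed records and drops the middle line.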

Examples

An example of consuming Avro files with Spark:

 sparkContext.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AtlasAvroKeyInputFormat[GenericRecord]]("hdfs://nameservice1/user/data/")

An example of consuming files that contain text records with Spark:

 sparkContext.newAPIHadoopFile[LongWritable, Text, AtlasTextInputFileFormat]("hdfs://nameservice1/user/data/")
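
For context, the text call above might sit inside a small helper like the following sketch. It assumes Spark and the Hadoop client libraries are on the classpath. `InputFormatExample` and `readLines` are hypothetical names, and the input format is left as a type parameter because the package that contains `AtlasTextInputFileFormat` is not stated in this README.

```scala
import scala.reflect.ClassTag

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.InputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object InputFormatExample {
  // Hypothetical helper: reads text records through any new-API
  // InputFormat keyed by byte offset (LongWritable) with Text values.
  def readLines[F <: InputFormat[LongWritable, Text]: ClassTag](
      sc: SparkContext,
      path: String): RDD[String] =
    sc.newAPIHadoopFile[LongWritable, Text, F](path)
      // Hadoop record readers reuse the same Text instance, so copy
      // each value into an immutable String before caching or collecting.
      .map { case (_, line) => line.toString }
}
```

With the library's class imported, this would be called as `InputFormatExample.readLines[AtlasTextInputFileFormat](sparkContext, "hdfs://nameservice1/user/data/")`. Hadoop's stock `TextInputFormat` satisfies the same type bound and can stand in when trying the helper locally.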