
Apache Spark Issue #2

Open · tjtate opened this issue Sep 2, 2019 · 6 comments

tjtate commented Sep 2, 2019

In this section of the Zeppelin notebook:

Create a Structured Streaming Dataset that consumes the data from a "neo4j" topic (the default topic of the CDC)

when I run the command:

val kafkaStreamingDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9093")
    .option("startingoffsets", "earliest")
    .option("subscribe", "neo4j")
    .load())

I get the following error:

java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:546)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:196)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
  ... 47 elided
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22$$anonfun$apply$14.apply(DataSource.scala:530)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22$$anonfun$apply$14.apply(DataSource.scala:530)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22.apply(DataSource.scala:530)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$22.apply(DataSource.scala:530)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:530)
  ... 54 more

It's as if Apache Spark isn't installed at all. I dug into the Dockerfile and didn't see any reference to it. I'm a complete noob to this and just wanted to understand it conceptually; I don't have much exposure to proper configuration. Any help is greatly appreciated.

cissecedric commented Sep 2, 2019

Hi,
you may have forgotten to run the lines below, which can be found at the beginning of the notebook:

%spark.dep
z.reset()
z.load("org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1")

In my case that solved the error.

But after that I got another error when I re-ran the command:

val kafkaStreamingDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9093")
    .option("startingoffsets", "earliest")
    .option("subscribe", "neo4j")
    .load())

I got:

java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
  at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
  at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
  at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:526)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:196)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
  ... 47 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamWriteSupport
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 80 more

I also need help!
Thanks

tjtate (Author) commented Sep 3, 2019

Thanks! I ran the %spark.dep paragraph first and then re-ran:


val kafkaStreamingDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9093")
    .option("startingoffsets", "earliest")
    .option("subscribe", "neo4j")
    .load())

and got the same error you received above. Let me know if you figure out how to fix it.

tjtate (Author) commented Sep 3, 2019

Looking at https://stackoverflow.com/questions/52115055/spark-error-java-lang-noclassdeffounderror-org-apache-spark-sql-sources-v2-st, it says to add a Spark Kafka jar that matches the installed Spark version (StreamWriteSupport only exists from Spark 2.3 on, so a 2.3.x connector won't load on an older Spark). I'm not sure how to do that either, though.

cissecedric commented Sep 3, 2019

It worked for me with this jar version:

%spark.dep
z.reset()
z.load("org.apache.spark:spark-sql-kafka-0-10_2.10:2.2.3")
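If you're unsure which coordinates to use, the Scala and Spark versions the interpreter is actually running can be printed from a notebook paragraph. A minimal sketch (the printed values in the comments are illustrative):

%spark
// Match the artifact to these two values:
// spark-sql-kafka-0-10_<scala major.minor>:<spark version>
println(spark.version)                        // e.g. 2.2.3 -> artifact version 2.2.3
println(scala.util.Properties.versionString)  // e.g. version 2.10.7 -> suffix _2.10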

mauliktrapas commented

Did anyone get a resolution?

vikramrajsahu commented Jun 30, 2021

I was also facing the same issue; it was resolved by using the version of spark-sql-kafka-0-10 that matches my Spark installation.

Just download the jar from the Maven repository and pass it to spark-submit or spark-shell, for example:

spark-shell --jars ../Jars/spark-sql-kafka-0-10_2.11-2.1.0.jar
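Alternatively, assuming the machine can reach a Maven repository, spark-shell can resolve the connector by its coordinates instead of a local jar path:

spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0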

You can also declare the same dependency in your pom.xml or build.sbt, whichever build tool you use.
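For example, a minimal build.sbt line (the version here is illustrative; use the one that matches your Spark installation):

// build.sbt -- %% appends the Scala version suffix (_2.11, _2.10, ...) automatically
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0"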
