Description
I am trying to read two tables from Kudu and join them in a single query.
I followed the example steps for reading a table into a DataFrame and registering it as a temp table. I repeated the same steps for a second table so that I could query both.
I then use the dbGetQuery() method to pass a query that joins the two tables and returns the result as a data frame.
I get the following error:
Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 8.0 failed 1 times, most recent failure: Lost task 19.0 in stage 8.0 (TID 163, localhost, executor driver): org.apache.kudu.client.NonRecoverableException: Scanner not found
at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
at org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:352)
at org.apache.kudu.client.KuduScanner.nextRows(KuduScanner.java:58)
at org.apache.kudu.spark.kudu.RowIterator.hasNext(KuduRDD.scala:120)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148)
at org.apache.spark.schedule
The sample query is:
`test_query <- paste("SELECT * FROM tbl1 n0 FULL OUTER JOIN tbl2 n1 on n0.id = n1.id WHERE n0.id LIKE CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '", Sys.getenv("START"), "') AS STRING),'%') AND n1.id LIKE CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '", Sys.getenv("START"), "') AS STRING),'%') LIMIT 100",sep="")
table_df <- dbGetQuery(sc, test_query)`
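
To rule out a malformed query string, the SQL text can be assembled and printed on its own before it is passed to dbGetQuery(). The following is only a sketch in base R with no Spark or Kudu connection involved. The `julian_prefix` helper variable and the fallback value for `START` are assumptions added for illustration and are not part of the original setup.

```r
# Hypothetical fallback so the sketch runs even when START is not set.
start <- Sys.getenv("START", unset = "2018-01-01 00:00:00")

# Build the LIKE pattern expression once; it is used for both table aliases.
julian_prefix <- paste0(
  "CONCAT(cast(default.getJulianFromDate('yyyy-MM-dd hh:mm:ss', '",
  start,
  "') AS STRING),'%')"
)

# Assemble the same join query as above, piece by piece.
test_query <- paste0(
  "SELECT * FROM tbl1 n0 FULL OUTER JOIN tbl2 n1 on n0.id = n1.id ",
  "WHERE n0.id LIKE ", julian_prefix,
  " AND n1.id LIKE ", julian_prefix,
  " LIMIT 100"
)

# Print the exact SQL text that would be sent to Spark.
cat(test_query, "\n")
```

If the printed SQL looks as expected, the string construction is unlikely to be the source of the error.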