SPARKC-706: Add basic support for Cassandra vectors #1366
Conversation
Great work! :)
Please have a look at the minor comments and add a note with a large font in the main README stating that the vector type is now supported. Additionally, please add a document showcasing how it can be used (examples taken from tests should be okay here).
  }
}

/** Skips the given test if the Cluster Version is lower or equal to the given version */
-  def from(version: Version)(f: => Unit): Unit = {
+  private def from(version: Version)(f: => Unit): Unit = {
This doc comment is not correct, right? It skips only when the version is lower, not lower or equal.
@@ -147,16 +147,24 @@ trait SparkCassandraITSpecBase

/** Skips the given test if the Cluster Version is lower or equal to the given `cassandra` Version or `dse` Version
  * (if this is a DSE cluster) */
-  def from(cassandra: Version, dse: Version)(f: => Unit): Unit = {
+  def from(cassandra: Version, dse: Version)(f: => Unit): Unit = from(Some(cassandra), Some(dse))(f)
use Option instead of Some
CaseClassType <: Product : ClassTag : TypeTag : ColumnMapper : RowReaderFactory : ValidRDDType](typeName: String) extends SparkCassandraITFlatSpecBase with DefaultCluster
{
  /** Skips the given test if the cluster is not Cassandra */
  override def cassandraOnly(f: => Unit): Unit = super.cassandraOnly(f)
please remove
def createVectorTable(session: CqlSession, table: String): Unit = {
  session.execute(
    s"""CREATE TABLE IF NOT EXISTS $ks.$table (
       | id INT PRIMARY KEY,
Can a vector be a primary key?
Perhaps, why do you ask?
Wondering if that's supported.
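For context, a hypothetical variant of the fixture sketching how a vector column could sit alongside the scalar key. The names (createVectorTable, ks, the column v, dimension 3) are taken from or modeled on the PR's test code, not a definitive implementation; VECTOR<FLOAT, n> is the fixed-dimension vector type in recent Cassandra versions:

```scala
import com.datastax.oss.driver.api.core.CqlSession

// Sketch only: ks is assumed to hold the keyspace name, as in the PR's fixture.
// The primary key stays a scalar INT; the vector lives in a regular column.
def createVectorTable(session: CqlSession, ks: String, table: String): Unit = {
  session.execute(
    s"""CREATE TABLE IF NOT EXISTS $ks.$table (
       |  id INT PRIMARY KEY,
       |  v VECTOR<FLOAT, 3>
       |)""".stripMargin)
}
```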
  }
}

private def hasVectors(rows: List[Row], expectedVectors: Seq[Seq[ScalaType]]): Unit = {
Maybe assertVectors would be clearer, as the method doesn't return a Boolean; it asserts on the content instead.
  implicitly[TypeTag[CqlVector[T]]]
}

def newCollection(items: Iterable[Any]): java.util.List[T] = {
Could be private
}

def newCollection(items: Iterable[Any]): java.util.List[T] = {
  val buf = new java.util.ArrayList[T](dimension)
Why java collection here?
This collection is passed to CqlVector.getInstance, which expects either a Java List or an array. If we used a Scala collection here and then let JavaConverters adjust it, we would have to create one extra wrapper object, which is a waste in this case.
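A minimal standalone sketch of that trade-off. The signature is simplified here (generic T and an explicit dimension parameter) relative to the connector's real method, which relies on its own class-level type parameters:

```scala
// Building the Java list directly, pre-sized to the vector dimension,
// avoids the extra wrapper object a JavaConverters round-trip would allocate.
def newCollection[T](items: Iterable[T], dimension: Int): java.util.List[T] = {
  val buf = new java.util.ArrayList[T](dimension)
  items.foreach(item => buf.add(item))
  buf
}
```

Pre-sizing the ArrayList to the dimension also means the backing array is allocated once, since a vector's length is fixed.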
@@ -184,7 +184,49 @@ val street = address.getString("street")
val number = address.getInt("number")
```

[//]: # (TODO loading vectors)
### Reading vectors
Should we add similar docs to https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md ?
LGTM!
Description
How did the Spark Cassandra Connector Work or Not Work Before this Patch
Describe the problem, or state of the project that this patch fixes. Explain
why this is a problem if this isn't obvious.
Example:
"When I read from tables with 3 INTS I get a ThreeIntException(). This is a problem because I often want to read from a table with three integers."
General Design of the patch
How the fix is accomplished, were new parameters or classes added? Why did you
pursue this particular fix?
Example: "I removed the incorrect assertion which would throw the ThreeIntException. This exception was incorrectly added and the assertion is not actually needed."
Fixes: Put JIRA Reference HERE
How Has This Been Tested?
Almost all changes, and especially bug fixes, require a test added to either the integration or unit test suites. Any added tests run automatically on Travis when the pull request is pushed to GitHub. Be sure to run the suites locally as well.
Checklist: