Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARKC-706: Add basic support for Cassandra vectors #1366

Merged
merged 6 commits into from
Jun 21, 2024
Merged

Conversation

jacek-lewandowski
Copy link
Contributor

Description

How did the Spark Cassandra Connector Work or Not Work Before this Patch

Describe the problem, or state of the project that this patch fixes. Explain
why this is a problem if this isn't obvious.

Example:
"When I read from tables with 3 INTS I get a ThreeIntException(). This is a problem because I often want to read from a table with three integers."

General Design of the patch

How the fix is accomplished, were new parameters or classes added? Why did you
pursue this particular fix?

Example: "I removed the incorrect assertion which would throw the ThreeIntException. This exception was incorrectly added and the assertion is not actually needed."

Fixes: Put JIRA Reference HERE

How Has This Been Tested?

Almost all changes and especially bug fixes will require a test to be added to either the integration or Unit Tests. Any tests added will be automatically run on travis when the pull request is pushed to github. Be sure to run suites locally as well.

Checklist:

  • I have a ticket in the OSS JIRA
  • I have performed a self-review of my own code
  • Locally all tests pass (make sure tests fail without your patch)

@jacek-lewandowski jacek-lewandowski force-pushed the SPARKC-706 branch 3 times, most recently from 43073c7 to 972458c Compare May 13, 2024 08:40
@jacek-lewandowski jacek-lewandowski changed the title Java driver version bump SPARKC-706: Add basic support for Cassandra vectors May 13, 2024
@jacek-lewandowski jacek-lewandowski force-pushed the SPARKC-706 branch 2 times, most recently from 8e55ff0 to 631db5d Compare May 13, 2024 10:51
Copy link
Collaborator

@jtgrabowski jtgrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! :)

Please have a look at the minor comments and add a note with a large font in the main README stating that the vector type is now supported. Additionally, please add a document showcasing how it can be used (examples taken from tests should be okay here).

}
}

/** Skips the given test if the Cluster Version is lower or equal to the given version */
def from(version: Version)(f: => Unit): Unit = {
private def from(version: Version)(f: => Unit): Unit = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc is not correct, right? It skips only when the version is lower.

@@ -147,16 +147,24 @@ trait SparkCassandraITSpecBase

/** Skips the given test if the Cluster Version is lower or equal to the given `cassandra` Version or `dse` Version
* (if this is a DSE cluster) */
def from(cassandra: Version, dse: Version)(f: => Unit): Unit = {
def from(cassandra: Version, dse: Version)(f: => Unit): Unit = from(Some(cassandra), Some(dse))(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Option instead of Some

CaseClassType <: Product : ClassTag : TypeTag : ColumnMapper: RowReaderFactory : ValidRDDType](typeName: String) extends SparkCassandraITFlatSpecBase with DefaultCluster
{
/** Skips the given test if the cluster is not Cassandra */
override def cassandraOnly(f: => Unit): Unit = super.cassandraOnly(f)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

def createVectorTable(session: CqlSession, table: String): Unit = {
session.execute(
s"""CREATE TABLE IF NOT EXISTS $ks.$table (
| id INT PRIMARY KEY,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a vector be a primary key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, why do you ask?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if that's supported.

}
}

private def hasVectors(rows: List[Row], expectedVectors: Seq[Seq[ScalaType]]): Unit = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe assertVectors would be more clear as the method doesn't return boolean, instead it asserts on the content.

implicitly[TypeTag[CqlVector[T]]]
}

def newCollection(items: Iterable[Any]): java.util.List[T] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be private

}

def newCollection(items: Iterable[Any]): java.util.List[T] = {
val buf = new java.util.ArrayList[T](dimension)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why java collection here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This collection is passed to CqlVector.getInstance which expects either Java List or array. If we use here a Scala collection and then let JavaConverters adjust that, we will have to create one extra wrapper which is a waste in this case.

@@ -184,7 +184,49 @@ val street = address.getString("street")
val number = address.getInt("number")
```

[//]: # (TODO loading vectors)
### Reading vectors
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@jtgrabowski jtgrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jtgrabowski jtgrabowski merged commit 965b2dc into master Jun 21, 2024
8 of 10 checks passed
@jtgrabowski jtgrabowski deleted the SPARKC-706 branch June 21, 2024 10:08
tarzanek pushed a commit to tarzanek/spark-scylladb-connector that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants