CASSGO-22 CASSGO-73 Changes to Query and Batch to make them safely reusable #1868

Merged on Jun 10, 2025 (1 commit)

@joao-r-reis joao-r-reis commented Mar 13, 2025

API Changes

ExecutableQuery

ExecutableQuery is currently an interface that Query and Batch implement (and that HostSelectionPolicy references). However, it is also used in driver internals, so the interface contains private methods, which makes it impossible for users to "mock" the interface for testing purposes.

In this PR, ExecutableQuery is changed so that it contains only public methods and is no longer implemented by Query and Batch. ExecutableQuery is now used exclusively as a "hook" in HostSelectionPolicy. This does mean that users can no longer cast an ExecutableQuery directly to Query or Batch, but I've added a new method (Statement) that provides this functionality.

type ExecutableQuery interface {
	GetRoutingKey() ([]byte, error)
	Keyspace() string
	Table() string
	IsIdempotent() bool
	Statement() Statement
}
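
Since the interface now has only public methods, a test double becomes possible. A minimal sketch, assuming local stand-in copies of the interfaces above rather than the real gocql types; mockQuery and pickReplica are hypothetical names used only for illustration, and pickReplica is not how any real HostSelectionPolicy chooses hosts:

```go
package main

import (
	"errors"
	"fmt"
)

// Iter, Statement and ExecutableQuery are local stand-ins that mirror the
// interfaces described in this PR; real code would use the gocql types.
type Iter struct{}

type Statement interface {
	Iter() *Iter
	Exec() error
}

type ExecutableQuery interface {
	GetRoutingKey() ([]byte, error)
	Keyspace() string
	Table() string
	IsIdempotent() bool
	Statement() Statement
}

// mockQuery is a hypothetical test double; every method is public,
// so user code outside the driver can implement the interface.
type mockQuery struct {
	routingKey      []byte
	keyspace, table string
	idempotent      bool
}

func (m mockQuery) GetRoutingKey() ([]byte, error) { return m.routingKey, nil }
func (m mockQuery) Keyspace() string               { return m.keyspace }
func (m mockQuery) Table() string                  { return m.table }
func (m mockQuery) IsIdempotent() bool             { return m.idempotent }
func (m mockQuery) Statement() Statement           { return nil }

// pickReplica is a hypothetical, heavily simplified piece of policy logic:
// it indexes into hosts using the first byte of the routing key.
func pickReplica(q ExecutableQuery, hosts []string) (string, error) {
	if len(hosts) == 0 {
		return "", errors.New("no hosts available")
	}
	key, err := q.GetRoutingKey()
	if err != nil {
		return "", err
	}
	if len(key) == 0 {
		return hosts[0], nil
	}
	return hosts[int(key[0])%len(hosts)], nil
}

func main() {
	q := mockQuery{routingKey: []byte{5}, keyspace: "ks", table: "users"}
	host, err := pickReplica(q, []string{"a", "b", "c"})
	fmt.Println(host, err)
}
```

The point of the sketch is only that policy logic can be exercised with a hand-written ExecutableQuery, without a session or a cluster.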

Statement

This is the new interface that represents (and is implemented by) Query and Batch. In this PR it's only referenced by ExecutableQuery.

type Statement interface {
	Iter() *Iter
	Exec() error
}

internalRequest

New interface that is used by driver internals to decouple the public API of Query/Batch from the internal API.
When creating an internal request object, the driver now copies most, if not all, of the query/batch properties. Users can therefore submit a Query/Batch for execution and then reuse it immediately without worrying about the driver modifying the object during execution. It is also less error prone, because the driver is free to modify these properties (e.g. page state, consistency, query metrics) without changing the objects that users hold.

type internalRequest interface {
	execute(ctx context.Context, conn *Conn) *Iter
	attempt(keyspace string, end, start time.Time, iter *Iter, host *HostInfo)
	retryPolicy() RetryPolicy
	speculativeExecutionPolicy() SpeculativeExecutionPolicy
	getQueryMetrics() *queryMetrics
	RetryableQuery
	ExecutableQuery
}
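
The copy-on-submit idea can be sketched as follows. This is a simplified illustration, not the driver's code: Query, internalQuery and newInternalQuery are hypothetical stand-ins with only a handful of fields, and the consistency level is a plain string to keep the example self-contained.

```go
package main

import "fmt"

// Query is a simplified, hypothetical stand-in for the user-facing type.
type Query struct {
	stmt        string
	values      []interface{}
	consistency string
	pageState   []byte
}

// internalQuery is a hypothetical private snapshot that the driver owns
// for the duration of one execution.
type internalQuery struct {
	stmt        string
	values      []interface{}
	consistency string
	pageState   []byte
	attempts    int
}

// newInternalQuery copies the user-visible properties, including the
// backing arrays of slices, so neither side can observe the other's writes.
func newInternalQuery(q *Query) *internalQuery {
	return &internalQuery{
		stmt:        q.stmt,
		values:      append([]interface{}(nil), q.values...),
		consistency: q.consistency,
		pageState:   append([]byte(nil), q.pageState...),
	}
}

func main() {
	q := &Query{stmt: "SELECT id FROM users WHERE id = ?", values: []interface{}{1}, consistency: "QUORUM"}
	req := newInternalQuery(q)

	// Driver-side changes during execution touch only the snapshot.
	req.consistency = "ONE"
	req.pageState = []byte{0x01}
	req.attempts++

	// The user can rebind and resubmit the original object right away.
	q.values[0] = 2

	fmt.Println(q.consistency, len(q.pageState), req.values[0])
}
```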

Query Metrics (i.e. Query/Batch.Attempts(), Query/Batch.Latency())

This functionality makes the API a bit awkward because the user submits a query for execution and then inspects the query object to retrieve the side effects of that execution after it's done. It's a better design (imo) to have this kind of data available in the Iter object since this is the type that represents the return value of a query/batch.

Due to this, AddAttempts(), Attempts(), Latency() and AddLatency() have been removed from Query and Batch. Latency() and Attempts() have been added to Iter and a new Batch.Iter() method has been added so that users can obtain the Iter object when executing a batch (even though it doesn't make a lot of sense conceptually due to the name but Iter is what the driver uses to return data about a request so it's not just about "iterating").
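
A rough sketch of why per-execution metrics on Iter compose better with reusable queries. Query and Iter here are hypothetical stand-ins, the failuresBeforeSuccess field only simulates retries, and the return types of Attempts() and Latency() are assumptions that may differ from the driver's:

```go
package main

import (
	"fmt"
	"time"
)

// Iter is a stand-in for the result type; it carries the metrics of the
// single execution that produced it.
type Iter struct {
	attempts int
	latency  time.Duration
}

func (it *Iter) Attempts() int          { return it.attempts }
func (it *Iter) Latency() time.Duration { return it.latency }

// Query is a stand-in whose only field simulates how many attempts fail
// before one succeeds.
type Query struct {
	failuresBeforeSuccess int
}

// Iter simulates one execution with retries. Metrics are recorded on the
// returned Iter, never on the Query, so the Query stays unchanged.
func (q *Query) Iter() *Iter {
	it := &Iter{}
	for {
		it.attempts++
		it.latency += time.Millisecond // pretend each attempt took 1ms
		if it.attempts > q.failuresBeforeSuccess {
			return it
		}
	}
}

func main() {
	q := &Query{failuresBeforeSuccess: 2}
	first := q.Iter()
	second := q.Iter()
	// Each execution reports its own numbers; nothing accumulates on q.
	fmt.Println(first.Attempts(), first.Latency(), second.Attempts(), second.Latency())
}
```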

queryPool

Query objects were allocated from a sync.Pool, but I don't see the gain in keeping it, especially now that queryMetrics have been moved to Iter. Also, users can create these query objects once and store them if they are worried about memory/GC.
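
For applications that do want pooling, it can live on the application side. A minimal sketch using sync.Pool from the standard library; reusableQuery, acquire and release are hypothetical names, and the struct is a stand-in rather than the driver's Query type:

```go
package main

import (
	"fmt"
	"sync"
)

// reusableQuery is a hypothetical application-level object holding the
// statement text and bound values.
type reusableQuery struct {
	stmt string
	args []interface{}
}

var queryPool = sync.Pool{
	New: func() interface{} { return &reusableQuery{} },
}

// acquire takes an object from the pool and fills it in, reusing the
// backing array of args when one is available.
func acquire(stmt string, args ...interface{}) *reusableQuery {
	q := queryPool.Get().(*reusableQuery)
	q.stmt = stmt
	q.args = append(q.args[:0], args...)
	return q
}

// release clears the object so no stale values are retained, then returns
// it to the pool. The caller must not use q afterwards.
func release(q *reusableQuery) {
	q.stmt = ""
	for i := range q.args {
		q.args[i] = nil
	}
	q.args = q.args[:0]
	queryPool.Put(q)
}

func main() {
	q := acquire("SELECT id FROM users WHERE id = ?", 42)
	fmt.Println(q.stmt, len(q.args))
	release(q)
}
```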

Query/Batch.GetRoutingKey()

Having this method as part of the public API of Query and Batch makes it difficult to manage from the driver maintainer's POV. It's supposed to be an internal method that can be called by implementations of HostSelectionPolicy, but since ExecutableQuery was implemented by Query and Batch, these two types had to have this method on their public API as well.

With this PR, this method is deprecated on Query and Batch since these two types no longer implement ExecutableQuery so we can keep the Query/Batch API much simpler.

@joao-r-reis joao-r-reis marked this pull request as ready for review March 17, 2025 13:58
Contributor

@worryg0d worryg0d left a comment

Hello @joao-r-reis,

I spent some time reviewing your PR. Great work! It simplifies the public Query and Batch API, which is a clear improvement.

Removing queryPool is a good idea. If anyone really needs this, it can be handled on their application side.

I left some minor comments, but overall the implementation looks good.

I am only a bit concerned about the changes this PR provides to the driver internals, especially the conn part which overlaps with #1822.

@joao-r-reis
Contributor Author

I am only a bit concerned about the changes this PR provides to the driver internals, especially the conn part which overlaps with #1822.

Yeah, I'll have to spend some time rebasing this branch and writing some tests. For now, it's good enough if reviewers can provide feedback on the current state of the PR, especially the public API changes.

"time"
)

// ExecutableQuery is an interface that represents a query or batch statement that
// exposes the correct functions for the HostSelectionPolicy to operate correctly.
type ExecutableQuery interface {
Member

What do you think about renaming the type to HostPolicyQuery?

Contributor Author

I posted a comment about the renaming of this interface below in a response to James

qryOpts *queryOptions
pageState []byte
metrics *queryMetrics
refCount uint32
Member

Unused?

Contributor Author

Good catch, leftover from a copy paste

Contributor

It still persists, but it changed position 😅

Contributor Author

oops, maybe I added it back during rebase, fixed

session.go Outdated
// Iter executes a batch operation and returns an Iter object
// that can be used to access properties related to the execution like Iter.Attempts and Iter.Latency
func (b *Batch) Iter() *Iter {
return b.session.executeBatch(b)
Member

I feel that people may execute a sequence of Exec() and Iter(). Shall we add an assertion that only one of them is invoked?

Contributor

Hmm, I see what you mean but I don't think we can lock them into one or another if we want it to be reusable.

Contributor Author

I feel that people may execute a sequence of Exec() and Iter(). Shall we add an assertion that only one of them is invoked?

They can't chain the calls (Exec().Iter()); they would have to write it like this:

err := b.Exec()
iter := b.Iter()

Which I believe is already something that can be done with the current API and it will trigger two requests. We can try to improve documentation if this is a concern.
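
To make the "two requests" point concrete, here is a small sketch with stand-in types. Session, Batch and Iter are hypothetical simplifications, the requests counter exists only for the demonstration, and Exec being implemented on top of the same execution path as Iter is an assumption:

```go
package main

import "fmt"

// Iter is a stand-in for the result of one execution.
type Iter struct{ err error }

func (it *Iter) Close() error { return it.err }

// Session is a stand-in that only counts how many requests were sent.
type Session struct{ requests int }

func (s *Session) executeBatch(b *Batch) *Iter {
	s.requests++
	return &Iter{}
}

// Batch is a stand-in for the user-facing batch type.
type Batch struct{ session *Session }

// Exec and Iter each start their own execution of the batch.
func (b *Batch) Exec() error { return b.session.executeBatch(b).Close() }
func (b *Batch) Iter() *Iter { return b.session.executeBatch(b) }

func main() {
	s := &Session{}
	b := &Batch{session: s}

	err := b.Exec()
	iter := b.Iter()

	// Two calls, two requests: Iter does not return the result of Exec.
	fmt.Println(err, iter.Close(), s.requests)
}
```

To get both the error and the metrics from a single request, a caller would call Iter() once and read everything from the returned object.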

+1 to what James said

@jameshartig
Contributor

executing a batch (even though it doesn't make a lot of sense conceptually due to the name but Iter is what the driver uses to return data about a request so it's not just about "iterating").

Note that Yugabyte lets you iterate over the row status from a batch operation.

@jameshartig
Contributor

With this PR, this method is deprecated on Query and Batch since these two types no longer implement ExecutableQuery so we can keep the Query/Batch API much simpler.

Why not just remove it? Seems like any HostSelectionPolicy implementation would need to make major changes already.

@jameshartig
Contributor

I don't want to bikeshed about names but ExecutableQuery is a bit confusing since it contains Query and Batch. Did that name come from somewhere? Would ExecutableStatement make more sense since it has a statement?

}

type batchOptions struct {
Type BatchType
Contributor

Any reason why some of these are public?

Contributor Author

The type itself is not public and is never exposed, so it wasn't a deliberate choice, just the result of a copy-paste. I can make them all private for consistency.

@jameshartig
Contributor

What about making the methods Exec(context.Context) and Iter(context.Context) error so there's no need to shallow copy query in WithContext? This is a bigger change though that we might not want to make. That said, I think Query would be a bit easier to re-use then. I know this differs from http.Request but in that case it might be a symptom of not wanting to break the existing API when contexts were introduced.

@joao-r-reis
Contributor Author

With this PR, this method is deprecated on Query and Batch since these two types no longer implement ExecutableQuery so we can keep the Query/Batch API much simpler.

Why not just remove it? Seems like any HostSelectionPolicy implementation would need to make major changes already.

Hmm, I don't think HostSelectionPolicy implementations have to make major changes after this PR unless they rely on casting the ExecutableQuery object to Query or Batch (and even that is a small change: they just need to call .Statement() before doing so).
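
A sketch of that migration, using local stand-ins for the types involved. ExecutableQuery is reduced to the one method that matters here, and request and describe are hypothetical names; a real policy would work with the gocql types:

```go
package main

import "fmt"

// Stand-ins for the driver types discussed in this PR.
type Iter struct{}

type Statement interface {
	Iter() *Iter
	Exec() error
}

type Query struct{ stmt string }

func (q *Query) Iter() *Iter { return &Iter{} }
func (q *Query) Exec() error { return nil }

type Batch struct{ entries int }

func (b *Batch) Iter() *Iter { return &Iter{} }
func (b *Batch) Exec() error { return nil }

// ExecutableQuery is reduced to the single method relevant to the cast.
type ExecutableQuery interface {
	Statement() Statement
}

// request is a hypothetical internal object handed to the policy.
type request struct{ stmt Statement }

func (r request) Statement() Statement { return r.stmt }

// describe shows the pattern: code that used to type-assert the
// ExecutableQuery itself now asserts on the result of Statement().
func describe(eq ExecutableQuery) string {
	switch s := eq.Statement().(type) {
	case *Query:
		return "query: " + s.stmt
	case *Batch:
		return fmt.Sprintf("batch with %d entries", s.entries)
	default:
		return "unknown statement"
	}
}

func main() {
	fmt.Println(describe(request{stmt: &Query{stmt: "SELECT 1"}}))
	fmt.Println(describe(request{stmt: &Batch{entries: 3}}))
}
```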

I don't want to bikeshed about names but ExecutableQuery is a bit confusing since it contains Query and Batch. Did that name come from somewhere? Would ExecutableStatement make more sense since it has a statement?

I kept the ExecutableQuery name unchanged so that HostSelectionPolicy implementations can continue to work with few changes (or even none at all). cc @lukasz-antoniak because you also brought this up in one of your comments.
I'm concerned that renaming this interface will increase the chance that users will have to change their code when upgrading the driver, but I do agree that the current name isn't good.

@joao-r-reis
Contributor Author

What about making the methods Exec(context.Context) and Iter(context.Context) error so there's no need to shallow copy query in WithContext? This is a bigger change though that we might not want to make. That said, I think Query would be a bit easier to re-use then. I know this differs from http.Request but in that case it might be a symptom of not wanting to break the existing API when contexts were introduced.

I like the idea but I think it would be too much to ask of users when upgrading since it would affect every single statement on their app... I'd be down to adding an overload and deprecating WithContext though... I just don't know what names would fit for the new Iter/Exec overloads...

@joao-r-reis joao-r-reis changed the title CASSGO-22 Changes to Query and Batch to make them safely reusable and "threadsafe" CASSGO-22 Changes to Query and Batch to make them safely reusable Mar 25, 2025
@worryg0d
Contributor

I'm concerned that renaming this interface will increase the chance users will have to change their code when upgrading the driver but I do agree that the current name isn't good.

We can add a type alias to leave the ExecutableQuery name for the migration period:

// Deprecated: Will be removed in the future major release.
// Please use Statement instead.
type ExecutableQuery = Statement

type Statement interface {
...
}

I tested it on go 1.19.13 and it works fine
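
A self-contained sketch of the alias mechanism only. It assumes, as the snippet above does, that the old and new names would refer to the very same interface; Query and legacyHook are hypothetical stand-ins:

```go
package main

import "fmt"

type Iter struct{}

// Statement is the hypothetical new name for the interface.
type Statement interface {
	Iter() *Iter
	Exec() error
}

// Deprecated: kept as an alias so existing code keeps compiling.
// An alias declares a second name for the same type, not a new type.
type ExecutableQuery = Statement

// Query is a stand-in that implements Statement.
type Query struct{}

func (q *Query) Iter() *Iter { return &Iter{} }
func (q *Query) Exec() error { return nil }

// legacyHook represents user code written against the old name.
func legacyHook(q ExecutableQuery) error { return q.Exec() }

func main() {
	var s Statement = &Query{}
	// No conversion is needed because both names denote one type.
	fmt.Println(legacyHook(s))
}
```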

@worryg0d
Copy link
Contributor

I just don't know what names would fit for the new Iter/Exec overloads...

Well, we can use the common Context suffix for those overloads, as the database/sql package in the standard library does: https://pkg.go.dev/database/sql#Conn.QueryContext
https://pkg.go.dev/database/sql#DB.ExecContext
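
A sketch of what such overloads could look like, following the database/sql naming. Everything here is hypothetical: ExecContext and IterContext are the names being proposed in this thread rather than a confirmed API, Query and Iter are stand-ins, and runCancelled is a helper that exists only for the demonstration:

```go
package main

import (
	"context"
	"fmt"
)

// Iter is a stand-in for the result of one execution.
type Iter struct{ err error }

func (it *Iter) Close() error { return it.err }

// Query is a stand-in; note that it stores no context.
type Query struct{ stmt string }

// IterContext takes the context per call, so the Query itself never needs
// to be shallow-copied to attach one.
func (q *Query) IterContext(ctx context.Context) *Iter {
	if err := ctx.Err(); err != nil {
		return &Iter{err: err}
	}
	return &Iter{}
}

func (q *Query) ExecContext(ctx context.Context) error {
	return q.IterContext(ctx).Close()
}

// The existing methods stay and delegate with a background context.
func (q *Query) Iter() *Iter { return q.IterContext(context.Background()) }
func (q *Query) Exec() error { return q.ExecContext(context.Background()) }

// runCancelled executes q with an already cancelled context.
func runCancelled(q *Query) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return q.ExecContext(ctx)
}

func main() {
	q := &Query{stmt: "SELECT 1"}
	fmt.Println(q.Exec(), runCancelled(q))
}
```

With this shape, one Query value can be shared across calls that each carry their own deadline or cancellation.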

@joao-r-reis
Contributor Author

I've addressed all PR comments, I'll work on rebasing the branch now and then I'll work on the change mentioned here.

@joao-r-reis joao-r-reis force-pushed the cassgo-22-prototype branch from 26570d6 to 4ad60b6 Compare April 10, 2025 12:16
@joao-r-reis
Contributor Author

Rebase is done. It was a bit more complex than I thought it would be, so I'd appreciate it if you could take another look @jameshartig @lukasz-antoniak @worryg0d

@joao-r-reis joao-r-reis requested a review from ribaraka April 15, 2025 14:44
@joao-r-reis
Contributor Author

@jameshartig @worryg0d do you guys have some free time to take a look at this again during the next few days?

@joao-r-reis joao-r-reis changed the title CASSGO-22 Changes to Query and Batch to make them safely reusable CASSGO-22 CASSGO-73 Changes to Query and Batch to make them safely reusable Jun 5, 2025
@joao-r-reis joao-r-reis force-pushed the cassgo-22-prototype branch from 4691ddd to 0f09a6d Compare June 9, 2025 18:24
Before this change queries were mutated while being executed (the query metrics and the consistency for example).
Instead copy query properties to an internal query object and move query metrics to Iter. This allows users
to reuse Query and Batch objects. Query object pooling was also removed.

Some query and batch properties were not accessible via ObservedBatch and ObservedQuery. Added the original Batch and Query
objects to ObservedBatch and ObservedQuery to fix this.

Patch by João Reis; reviewed by James Hartig and Stanislav Bychkov for CASSGO-22 and CASSGO-73
@joao-r-reis joao-r-reis force-pushed the cassgo-22-prototype branch from 828d72f to 68e805a Compare June 10, 2025 18:37
@joao-r-reis joao-r-reis merged commit f9a0f63 into apache:trunk Jun 10, 2025
72 checks passed
@joao-r-reis joao-r-reis deleted the cassgo-22-prototype branch June 10, 2025 18:46