-
Notifications
You must be signed in to change notification settings - Fork 634
CASSGO-22 Changes to Query and Batch to make them safely reusable #1868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @joao-r-reis,
I spent some time reviewing your PR. Great work! It simplifies public query and batch API which is pretty good.
Removing queryPool
is a good idea. If anyone really needs this, it can be handled on their application side.
I left some minor comments but overall implementation is well.
I am only a bit concerned about the changes this PR provides to the driver internals, especially the conn part which overlaps with #1822.
Yeah I'll have to spend some time rebasing this branch and even writing some tests, for now it's good enough if reviewers can provide feedback on the current state of the PR especially the public API changes |
query_executor.go
Outdated
"time" | ||
) | ||
|
||
// ExecutableQuery is an interface that represents a query or batch statement that | ||
// exposes the correct functions for the HostSelectionPolicy to operate correctly. | ||
type ExecutableQuery interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about renaming type to HostPolicyQuery
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted a comment about the renaming of this interface below in a response to James
qry.releaseAfterExecution() | ||
} | ||
|
||
type queryOptions struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we group here all relevant options into sub-types, for example:
Paging
pageSize
initialPageState
disableAutoPage
Monitoring
observer
trace
Parameters
values
binding
WriteTimestamp
defaultTimestamp
defaultTimestampValue
Consistency
initialConsistency
serialConsistency
FrameOptions
customPayload
disableSkipMetadata
stmt
prefetch
rt
spec
context
idempotent
keyspace
skipPrepare
routingKey
Grouping is not definitive, it is just an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I'll do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought I'm not sure if I like this idea that much, I'd have to create a few new types when maybe I can just group them a bit more by arranging the way they are declared... Wdyt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this:
type queryOptions struct {
stmt string
// Paging
pageSize int
initialPageState []byte
disableAutoPage bool
// Monitoring
trace Tracer
observer QueryObserver
// Parameters
values []interface{}
binding func(q *QueryInfo) ([]interface{}, error)
// Timestamp
defaultTimestamp bool
defaultTimestampValue int64
// Consistency
initialConsistency Consistency
serialCons SerialConsistency
// Protocol flag
disableSkipMetadata bool
customPayload map[string][]byte
prefetch float64
rt RetryPolicy
spec SpeculativeExecutionPolicy
context context.Context
idempotent bool
keyspace string
skipPrepare bool
routingKey []byte
// getKeyspace is field so that it can be overriden in tests
getKeyspace func() string
}
query_executor.go
Outdated
qryOpts *queryOptions | ||
pageState []byte | ||
metrics *queryMetrics | ||
refCount uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, leftover from a copy paste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still persist. but changed its position 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, maybe I added it back during rebase, fixed
session.go
Outdated
// Iter executes a batch operation and returns an Iter object | ||
// that can be used to access properties related to the execution like Iter.Attempts and Iter.Latency | ||
func (b *Batch) Iter() *Iter { | ||
return b.session.executeBatch(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that people may execute sequence of Exec()
and Iter()
. Shall we make an assertion that only one of the is invoked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see what you mean but I don't think we can lock them into one or another if we want it to be reusable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that people may execute sequence of
Exec()
andIter()
. Shall we make an assertion that only one of the is invoked?
They can't execute sequence (Exec().Iter()
) unless they do it like:
err := b.Exec()
iter := b.Iter()
Which I believe is already something that can be done with the current API and it will trigger two requests. We can try to improve documentation if this is a concern.
+1 to what James said
Note that Yugabyte lets you iterate over the row status from a batch operation. |
Why not just remove it? Seems like any |
I don't want to bikeshed about names but |
qry.routingInfo.keyspace = info.request.keyspace | ||
qry.routingInfo.table = info.request.table | ||
qry.routingInfo.mu.Unlock() | ||
q.routingInfo.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need to lock? I don't think it does unless we expect GetRoutingKey
to be called from separate goroutines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can we just change routingInfo
to not be a struct anymore? Seems like it only was because...
// routingInfo is a pointer because Query can be copied and copyable struct can't hold a mutex.
Do we still allow copies of internalQuery
? If so, maybe we should make a Clone method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need to lock? I don't think it does unless we expect GetRoutingKey to be called from separate goroutines?
I think this is still the case due to speculative executions (and possibly retries I'm not sure).
Also can we just change routingInfo to not be a struct anymore? Seems like it only was because...
// routingInfo is a pointer because Query can be copied and copyable struct can't hold a mutex.
Do we still allow copies of internalQuery? If so, maybe we should make a Clone method?
I don't think it needs to be a struct anymore, I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I made some changes to the routingInfo
part but I didn't remove the struct... I think it still serves a purpose by making it clear that the mutex is used to protect access against the two fields within the struct. Do you still prefer to remove the struct and just have the mutex and two fields in the request type itself?
} | ||
|
||
func newQueryOptions(q *Query) *queryOptions { | ||
return &queryOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be copying values
, customPayload
, initialPageState
, routingKey
? it might not be obvious that those are not copied and can't be mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think copying values
and customPayload
might be a bit too much due to performance reasons. initialPageState
should probably be copied for safety and since its length is usually pretty small. routingKey
I'm not so sure, I think it will be nil
the vast majority of cases anyway but I can make sure it gets copied as well
query_executor.go
Outdated
} | ||
|
||
type batchOptions struct { | ||
Type BatchType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why some of these are public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type itself is not public and it is never exposed so it wasn't a deliberate choice, just a result of copy paste. I can make them all private for consistency
What about making the methods |
Hmm I don't think
I kept |
I like the idea but I think it would be too much to ask of users when upgrading since it would affect every single statement on their app... I'd be down to adding an overload and deprecating |
We can add a type alias to leave the // Deprecated: Will be removed in the future major release.
// Please use Statement instead.
type ExecutableQuery = Statement
type Statement interface {
...
} I tested it on go 1.19.13 and it works fine |
Well, we can use the common Context suffix for those overloads, like it is done in sql package from standard lib - https://pkg.go.dev/database/sql#Conn.QueryContext |
I've addressed all PR comments, I'll work on rebasing the branch now and then I'll work on the change mentioned here. |
# Conflicts: # conn.go # session.go
26570d6
to
4ad60b6
Compare
Rebase is done, it was a bit more complex than I thought it would be so I'd appreciate if you guys could take a look again @jameshartig @lukasz-antoniak @worryg0d |
query_executor.go
Outdated
func newQueryOptions(q *Query, ctx context.Context) *queryOptions { | ||
var newPageState, newRoutingKey []byte | ||
if q.initialPageState != nil { | ||
pageState := q.initialPageState | ||
newPageState = make([]byte, len(pageState)) | ||
copy(newPageState, pageState) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newPageState
seems to be redundant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the reason for this is to ensure that a query execution isn't affected if the user tries to modify the page byte slice after query has been submitted for execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if we're just adding a performance penalty to safeguard against something that users won't attempt to do 99% of the times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that newPageState
is copied but never used. And it doesn't seem the queryOptions
struct has such a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh you're right, I removed this field from this type and moved it up to internalQuery
but I forgot to remove this part, nice catch. Done
query_executor.go
Outdated
qryOpts *queryOptions | ||
pageState []byte | ||
metrics *queryMetrics | ||
refCount uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still persist. but changed its position 😅
query_executor.go
Outdated
func newInternalBatch(batch *Batch, ctx context.Context) *internalBatch { | ||
return &internalBatch{ | ||
originalBatch: batch, | ||
batchOpts: newBatchOptions(batch, ctx), | ||
metrics: &queryMetrics{m: make(map[string]*hostMetrics)}, | ||
routingInfo: &queryRoutingInfo{}, | ||
session: batch.session, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency
is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, done
session.go
Outdated
// Iter executes a batch operation and returns an Iter object | ||
// that can be used to access properties related to the execution like Iter.Attempts and Iter.Latency | ||
func (b *Batch) Iter() *Iter { return b.session.executeBatch(b, nil) } | ||
|
||
// Iter executes a batch operation with the provided context and returns an Iter object | ||
// that can be used to access properties related to the execution like Iter.Attempts and Iter.Latency | ||
func (b *Batch) IterContext(ctx context.Context) *Iter { | ||
return b.session.executeBatch(b, ctx) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicks:
- The
Iter
method should call theIterContext
- The comment for
IterContext
should start with the method name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
query_executor.go
Outdated
// Statement is an interface that represents a CQL statement that the driver can execute | ||
// (currently Query and Batch via Session.Query and Session.Batch) | ||
type Statement interface { | ||
Iter() *Iter | ||
Exec() error | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding here IterContext
and ExecContetx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion 👍 done
@jameshartig @worryg0d do you guys have some free time to take a look at this again during the next few days? |
# Conflicts: # cassandra_test.go # session.go
API Changes
ExecutableQuery
ExecutableQuery
is currently an interface thatQuery
andBatch
implements (and is referenced byHostSelectionPolicy
). However, it is also used in driver internals so the interface contains private methods which makes it impossible for users to "mock" the interface for testing purposes.In this PR,
ExecutableQuery
is changed so it contains only public methods and is no longer implemented byQuery
andBatch
. Now,ExecutableQuery
is used exclusively as a "hook" inHostSelectionPolicy
. This does mean that users can no longer attempt to cast anExecutableQuery
toQuery
orBatch
but I've added a new method that provides this functionality (Statement
).Statement
This is the new interface that represents (and is implemented by)
Query
andBatch
. In this PR it's only referenced byExecutableQuery
.internalRequest
New interface that is used by driver internals to decouple the public API of Query/Batch from the internal API.
When creating an internal request object the driver now copies most if not all of the query/batch properties so that users can submit a Query/Batch for execution and then re-use immediately without having to be concerned about the object being modified by the driver execution. It also makes it less error prone because the driver is free to modify these properties (e.g. page state, consistency, query metrics) without causing a change on the objects that the users are using.
Query Metrics (i.e.
Query/Batch.Attempts()
,Query/Batch.Latency()
)This functionality makes the API a bit awkward because the user submits a query for execution and then inspects the query object to retrieve the side effects of that execution after it's done. It's a better design (imo) to have this kind of data available in the
Iter
object since this is the type that represents the return value of a query/batch.Due to this,
AddAttempts()
,Attempts()
,Latency()
andAddLatency()
have been removed fromQuery
andBatch
.Latency()
andAttempts()
have been added toIter
and a newBatch.Iter()
method has been added so that users can obtain theIter
object when executing a batch (even though it doesn't make a lot of sense conceptually due to the name butIter
is what the driver uses to return data about a request so it's not just about "iterating").queryPool
Query
objects were allocated from async.Pool
but I don't see the gain of keeping it especially now thatqueryMetrics
have been moved toIter
. Also users can create these query objects once and store them if they are worried about memory/GC.Query/Batch.GetRoutingKey()
Having this method as part of the public API of
Query
andBatch
makes it difficult to manage from the driver maintainer POV, it's supposed to be an internal method that can be called by implementations ofHostSelectionPolicy
but sinceExecutableQuery
was implemented byQuery
andBatch
this meant that these two types had to have this method on their public API as well.With this PR, this method is deprecated on
Query
andBatch
since these two types no longer implementExecutableQuery
so we can keep theQuery
/Batch
API much simpler.