Skip to content

CASSGO-43: externally-defined type registration #1855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

jameshartig
Copy link
Contributor

@jameshartig jameshartig commented Jan 2, 2025

CASSGO-43: externally-defined type registration
The new RegisteredTypes struct can be used to register externally-defined
types. You'll need to define your own marshalling and unmarshalling code
as well as a TypeInfo implementation. The name and id MUST not collide
with existing and future native CQL types.

A lot of the type handling was refactored to use the new format for
types.

Pointers to empty interfaces are accepted by Scan and are used to build
the maps in MapScan and SliceMap.

inet columns are now unmarshaled as net.IP which is a breaking change.

goos: linux
goarch: amd64
pkg: github.com/gocql/gocql
cpu: AMD EPYC 7B13
                             │   old.txt    │                new2.txt                │
                             │    sec/op    │    sec/op     vs base                  │
SingleConn-16                  25.87µ ±  2%   25.65µ ±  1%        ~ (p=0.280 n=10)
ParseRowsFrame-16              870.5n ±  3%   634.6n ±  4%  -27.10% (p=0.000 n=10)
UnmarshalVarchar-16            47.29n ±  2%
UnmarshalUUID-16               3.161n ±  2%
Unmarshal_BigInt-16            17.61n ±  9%   17.88n ±  0%        ~ (p=0.138 n=10)
Unmarshal_Blob-16              19.19n ±  0%   18.69n ±  0%   -2.61% (p=0.000 n=10)
Unmarshal_Boolean-16           15.71n ±  0%   16.16n ±  1%   +2.90% (p=0.000 n=10)
Unmarshal_Date-16              18.53n ±  1%   18.64n ±  0%   +0.59% (p=0.017 n=10)
Unmarshal_Decimal-16           172.2n ±  2%   181.3n ±  1%   +5.32% (p=0.000 n=10)
Unmarshal_Double-16            16.33n ±  1%   16.20n ±  1%   -0.77% (p=0.003 n=10)
Unmarshal_Duration-16          26.03n ±  1%   25.05n ±  0%   -3.77% (p=0.000 n=10)
Unmarshal_Float-16             15.70n ±  0%   15.90n ±  1%   +1.31% (p=0.001 n=10)
Unmarshal_Int-16               17.80n ±  1%   18.78n ±  2%   +5.48% (p=0.000 n=10)
Unmarshal_Inet-16              28.68n ±  1%   29.20n ±  1%   +1.78% (p=0.001 n=10)
Unmarshal_SmallInt-16          17.92n ±  1%   19.37n ±  2%   +8.09% (p=0.000 n=10)
Unmarshal_Time-16              16.00n ±  1%   16.22n ±  1%   +1.37% (p=0.001 n=10)
Unmarshal_Timestamp-16         16.00n ±  2%   16.57n ±  2%   +3.50% (p=0.000 n=10)
Unmarshal_TinyInt-16           16.38n ±  3%   18.41n ±  1%  +12.39% (p=0.000 n=10)
Unmarshal_UUID-16              16.05n ±  1%   16.34n ±  3%   +1.74% (p=0.000 n=10)
Unmarshal_Varchar-16           18.86n ±  1%   18.71n ±  1%   -0.77% (p=0.003 n=10)
Unmarshal_List-16              176.2n ±  4%   183.2n ±  4%   +3.97% (p=0.003 n=10)
Unmarshal_Set-16               177.7n ±  1%   181.5n ±  0%   +2.17% (p=0.000 n=10)
Unmarshal_Map-16               374.0n ±  5%   375.2n ±  4%        ~ (p=0.218 n=10)
FramerReadTypeInfo-16          207.8n ±  3%   214.9n ±  2%   +3.44% (p=0.002 n=10)
ConnStress-16                  14.10µ ± 16%   15.20µ ± 14%        ~ (p=0.631 n=10)
ConnRoutingKey-16              275.3n ±  4%   197.8n ±  2%  -28.17% (p=0.000 n=10)
WikiCreateSchema-16            515.9m ±  3%   501.4m ±  3%   -2.81% (p=0.011 n=10)
WikiCreatePages-16             1.534m ±  1%   1.474m ±  1%   -3.90% (p=0.000 n=10)
WikiSelectAllPages-16          1.982m ±  3%   1.906m ±  3%   -3.84% (p=0.000 n=10)
WikiSelectSinglePage-16        1.529m ±  2%   1.479m ±  1%   -3.27% (p=0.000 n=10)
WikiSelectPageCount-16         1.691m ±  2%   1.646m ±  1%   -2.63% (p=0.001 n=10)
Unmarshal_TupleStrings-16                     329.6n ±  1%
Unmarshal_TupleInterfaces-16                  357.9n ±  3%
geomean                        387.3n         475.8n         -0.98%                ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                             │    old.txt     │                 new2.txt                 │
                             │      B/op      │     B/op      vs base                    │
SingleConn-16                  3.155Ki ± 0%     3.109Ki ± 0%   -1.45% (p=0.000 n=10)
ParseRowsFrame-16               1128.0 ± 0%       856.0 ± 0%  -24.11% (p=0.000 n=10)
UnmarshalVarchar-16              32.00 ± 0%
UnmarshalUUID-16                 0.000 ± 0%
Unmarshal_BigInt-16              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Blob-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Boolean-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Date-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Decimal-16             96.00 ± 0%       96.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Double-16              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Duration-16            0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Float-16               0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Int-16                 0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Inet-16                4.000 ± 0%       4.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_SmallInt-16            0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Time-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Timestamp-16           0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_TinyInt-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_UUID-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Varchar-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_List-16                32.00 ± 0%       32.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Set-16                 32.00 ± 0%       32.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Map-16                 280.0 ± 0%       280.0 ± 0%        ~ (p=1.000 n=10) ¹
FramerReadTypeInfo-16           160.00 ± 0%       96.00 ± 0%  -40.00% (p=0.000 n=10)
ConnStress-16                  2.956Ki ± 0%     2.669Ki ± 0%   -9.70% (p=0.001 n=10)
ConnRoutingKey-16                96.00 ± 0%       32.00 ± 0%  -66.67% (p=0.000 n=10)
WikiCreateSchema-16            59.90Ki ± 3%     51.87Ki ± 4%  -13.41% (p=0.000 n=10)
WikiCreatePages-16             4.357Ki ± 0%     3.864Ki ± 0%  -11.33% (p=0.000 n=10)
WikiSelectAllPages-16          38.32Ki ± 0%     28.74Ki ± 0%  -25.01% (p=0.000 n=10)
WikiSelectSinglePage-16        3.783Ki ± 0%     3.303Ki ± 0%  -12.69% (p=0.000 n=10)
WikiSelectPageCount-16         3.215Ki ± 0%     2.816Ki ± 0%  -12.42% (p=0.000 n=10)
Unmarshal_TupleStrings-16                         126.0 ± 0%
Unmarshal_TupleInterfaces-16                      126.0 ± 0%
geomean                                     ²                  -9.27%                ³ ²
¹ all samples are equal
² summaries must be >0 to compute geomean
³ benchmark set differs from baseline; geomeans may not be comparable

                             │   old.txt    │                new2.txt                │
                             │  allocs/op   │ allocs/op   vs base                    │
SingleConn-16                  37.00 ± 0%     37.00 ± 0%        ~ (p=1.000 n=10) ¹
ParseRowsFrame-16              21.00 ± 0%     13.00 ± 0%  -38.10% (p=0.000 n=10)
UnmarshalVarchar-16            1.000 ± 0%
UnmarshalUUID-16               0.000 ± 0%
Unmarshal_BigInt-16            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Blob-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Boolean-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Date-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Decimal-16           4.000 ± 0%     4.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Double-16            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Duration-16          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Float-16             0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Int-16               0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Inet-16              1.000 ± 0%     1.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_SmallInt-16          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Time-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Timestamp-16         0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_TinyInt-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_UUID-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Varchar-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_List-16              2.000 ± 0%     2.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Set-16               2.000 ± 0%     2.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Map-16               5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹
FramerReadTypeInfo-16          4.000 ± 0%     4.000 ± 0%        ~ (p=1.000 n=10) ¹
ConnStress-16                  35.00 ± 0%     33.00 ± 0%   -5.71% (p=0.000 n=10)
ConnRoutingKey-16              4.000 ± 0%     3.000 ± 0%  -25.00% (p=0.000 n=10)
WikiCreateSchema-16            858.0 ± 3%     654.5 ± 4%  -23.72% (p=0.000 n=10)
WikiCreatePages-16             56.00 ± 2%     51.00 ± 0%   -8.93% (p=0.000 n=10)
WikiSelectAllPages-16          343.0 ± 0%     338.0 ± 0%   -1.46% (p=0.000 n=10)
WikiSelectSinglePage-16        50.00 ± 0%     45.00 ± 0%  -10.00% (p=0.000 n=10)
WikiSelectPageCount-16         45.00 ± 0%     40.00 ± 0%  -11.11% (p=0.000 n=10)
Unmarshal_TupleStrings-16                     8.000 ± 0%
Unmarshal_TupleInterfaces-16                  8.000 ± 0%
geomean                                   ²                -4.81%                ³ ²
¹ all samples are equal
² summaries must be >0 to compute geomean
³ benchmark set differs from baseline; geomeans may not be comparable

@jameshartig
Copy link
Contributor Author

jameshartig commented Jan 3, 2025

Here's an example of a JSONB type:

package mypackage

import "github.com/gocql/gocql"

var typeJSONB gocql.Type = 0x0080

func init() {
	gocql.RegisterType(typeJSONB, "jsonb", SimpleCQLType{jsonbCQLType{}})
}

// jsonbCQLType implements the gocql.CQLType interface
type jsonbCQLType struct {}

// Marshal implements the gocql.CQLType interface
func (j jsonbCQLType) Marshal(value interface{}) ([]byte, error) {
	// TODO: find a better way to marshal as this type
	return gocql.Marshal(gocql.NewNativeType(4, gocql.TypeBlob, ""), value)
}

// Unmarshal implements the gocql.CQLType interface
func (j jsonbCQLType) Unmarshal(value []byte, dest interface{}) error {
	// TODO: find a better way to unmarshal as this type
	return gocql.Unmarshal(gocql.NewNativeType(4, gocql.TypeBlob, ""), value, dest)
}

Or they can write their own marshal/unmarshal methods:

func marshalJSONB(info gocql.TypeInfo, value interface{}) ([]byte, error) {
	switch v := value.(type) {
	case json.RawMessage:
		return v, nil
	case string:
		return []byte(v), nil
	case []byte:
		return v, nil
	}
	if value == nil {
		return nil, nil
	}

	rv := reflect.ValueOf(value)
	t := rv.Type()
	k := t.Kind()
	switch {
	case k == reflect.String:
		return []byte(rv.String()), nil
	case k == reflect.Slice && t.Elem().Kind() == reflect.Uint8:
		return rv.Bytes(), nil
	}
	return nil, gocql.MarshalError(fmt.Sprintf("can not marshal %T into jsonb", value))
}

var typeSliceOfBytes = reflect.TypeOf([]byte(nil))

func unmarshalJSONB(info gocql.TypeInfo, data []byte, value interface{}) error {
	switch v := value.(type) {
	case *json.RawMessage:
		if data != nil {
			*v = append((*v)[:0], data...)
		} else {
			*v = nil
		}
		return nil
	case *string:
		*v = string(data)
		return nil
	case *[]byte:
		if data != nil {
			*v = append((*v)[:0], data...)
		} else {
			*v = nil
		}
		return nil
	case *interface{}:
		if data != nil {
			*v = append(json.RawMessage{}, data...)
		} else {
			*v = nil
		}
		return nil

	}

	rv := reflect.ValueOf(value)
	if rv.Kind() != reflect.Ptr {
		return gocql.UnmarshalError(fmt.Sprintf("can not unmarshal jsonb into non-pointer %T", value))
	}
	rv = rv.Elem()
	t := rv.Type()
	k := t.Kind()
	switch {
	case k == reflect.String:
		rv.SetString(string(data))
		return nil
	case k == reflect.Slice && t.Elem().Kind() == reflect.Uint8:
		if data != nil {
			if rv.Type() != typeSliceOfBytes {
				rv.Set(reflect.AppendSlice(rv.Slice(0, 0), reflect.ValueOf(data).Convert(t)))
			} else {
				rv.Set(reflect.AppendSlice(rv.Slice(0, 0), reflect.ValueOf(data)))
			}
		} else {
			rv.Set(reflect.Zero(t))
		}
		return nil
	}
	return gocql.UnmarshalError(fmt.Sprintf("can not unmarshal jsonb into %T", value))
}

@jameshartig jameshartig marked this pull request as ready for review January 31, 2025 03:24
@jameshartig jameshartig force-pushed the external-types branch 3 times, most recently from 61387c3 to 0abeec5 Compare January 31, 2025 07:52
@jameshartig
Copy link
Contributor Author

I could use some tests of custom types but given that all of the native types are using most of the same code, I feel confident in the guts of the code.

@jameshartig jameshartig force-pushed the external-types branch 2 times, most recently from 1c53a5b to 0203176 Compare January 31, 2025 08:05
@joao-r-reis
Copy link
Contributor

This looks great! #1828 has been waiting to be merged for a while and it looks like the code changes might conflict with some of the changes here on this PR so I'd like to get #1828 merged before this one unless there's benefits to mergind this one first

@jameshartig
Copy link
Contributor Author

This looks great! #1828 has been waiting to be merged for a while and it looks like the code changes might conflict with some of the changes here on this PR so I'd like to get #1828 merged before this one unless there's benefits to mergind this one first

That makes sense. Going to switch this back to draft so I can resolve those conflicts and take a stab at cleaning up the UDT case we talked about in Slack.

@jameshartig jameshartig marked this pull request as draft January 31, 2025 15:17
@jameshartig
Copy link
Contributor Author

I simplified the TypeInfo struct but had to keep it around so that composite types could store their children type info somewhere. Simple types can just use their Type as a TypeInfo though. This reduced the boilerplate necessary to add a new type (I updated the above jsonb example) and allowed me to remove the separate composite interface and only have a single interface that's necessary to implement.

@jameshartig
Copy link
Contributor Author

Initially, some of the unmarshal benchmarks were worse so I inlined some of the type switching which is a bit gross but massively improved the benchmarks.

@jameshartig jameshartig force-pushed the external-types branch 4 times, most recently from 88ed1af to 33fab21 Compare February 5, 2025 07:15
@jameshartig
Copy link
Contributor Author

Initially, some of the unmarshal benchmarks were worse so I inlined some of the type switching which is a bit gross but massively improved the benchmarks.

I do think we can roll some of those back when Go 1.24 comes out and there's improved Swiss table maps. I'll need to rerun the benchmarks but it would be nice to get rid of all of that boilerplate.

@joao-r-reis
Copy link
Contributor

I do think we can roll some of those back when Go 1.24 comes out and there's improved Swiss table maps. I'll need to rerun the benchmarks but it would be nice to get rid of all of that boilerplate.

We probably want to be careful with requiring such a recent go version, if it's not crucial then we should delay it as much as possible

@joao-r-reis
Copy link
Contributor

Is this ready for review btw?

@jameshartig
Copy link
Contributor Author

Is this ready for review btw?

Yes, but I would like a chance to write some more tests. I think it's in a good place though for review besides the tests.

We probably want to be careful with requiring such a recent go version, if it's not crucial then we should delay it as much as possible

Yeah, my comment was meant for the distant future.

@jameshartig jameshartig marked this pull request as ready for review February 22, 2025 04:30
Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this rework of the type code but I think there's a couple of changes that should be made:

  • Adding protocol version to the Marshal() and Unmarshal functions so the marshal/unmarshal code can provide the implementation for multiple protocol versions (e.g. list implementation changes according to the protocol version)
  • Custom types and UDTs - for these two types we probably need a new Register function because the cql type doesn't identify the actual type (there's multiple UDTs and multiple custom types). This can probably be out of scope of this PR since I believe the driver doesn't handle this well anyway but maybe we should just panic when a user tries to call RegisterType with UDT or custom type atm.

return TypeTuple
default:
return TypeCustom
// TODO: move to types.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still want to keep these TODOs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these can be handled in a follow-up PR. I didn't want to do it in this one because it would make the diff even worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did end up moving some of these over if I significantly rewrote them

for i := range c.Elems {
columnName := TupleColumnName(column.Name, i)
if dest, ok := m[columnName]; ok {
// TODO: check if dest is a pointer and if not, error here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some TODOs here too

types.go Outdated
// column. CQLType is the implementation of the type.
// This function is not goroutine-safe and must be called before any other usage
// of the gocql package. Typically this should be called in an init function.
func RegisterType(typ Type, name string, t CQLType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make this a part of clusterconfig so that users can have different sessions with different type registrations within the same app

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that but then it would require locking, and it couldn't be as easy as importing a package that has a side-effect of registering it's type (which is what my employer wanted to do). I don't think it would be very common for people to have different handling for different types in different clusters. Do you think so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be very common for people to have different handling for different types in different clusters.

For basic native types maybe not but eventually this API will evolve to support UDTs and custom types and with these types I can definitely see a use case for having separate type registrations on the same app.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that but then it would require locking

Why? It could be a matter of the global API returning an object that is then provided to clusterconfig or the API could be tied to the clusterconfig object, I don't see how this requires locking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to work on supporting both but I'm still not clear how exactly you'd use this for a UDT since UDT's are already supported. Unfortunately anything outside of the package won't have access to readBytes and they can already make a struct implement UnmarshalCQL which would essentially do the same thing as registering their own type but with a lot less work.

marshal.go Outdated
}

// NativeType is a simple type that is defined by the Cassandra protocol.
// Deprecated. Use Type instead.
// TODO: move to types.go
type NativeType struct {
proto byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're using this field to basically tell the marshaller/unmarshaller what the protocol version is. Maybe we should add protocol version to the marshal interface instead? And then deprecate the current one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really only needed for the protocol 2 UDT error. NativeType and NewNativeType are also a publicly exposed and there might be usages that expect there to be a protocol option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also used for collections if I'm not mistaken

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collections use their own structs and not this one.

types.go Outdated

// Marshal marshals the value for the given TypeInfo into a byte slice.
func (t customCQLType) Marshal(info TypeInfo, value interface{}) ([]byte, error) {
panic("not implemented")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this new API works for custom types, typically you can have multiple custom types but all of them are of type "custom" and then the name (string) is what identifies the type (this is how vectors are implemented for example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we not risk the driver panicing here with this implementation? Or will the driver never call this if it encounters a custom type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it was already the case where it would panic. I'm still a bit unclear on exactly how to handle custom types. I considered removing this completely, but we'd need to have a bit of custom code to check for TypeCustom and then get the name of it and look that up instead. I think that could be handled in a follow-up PR but I can try to handle it here if you want?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leave that for a follow up PR as long as things don't get broken by merging this PR, did this panic already exist before these changes?

types.go Outdated
// TypeInfo. It must support data being nil for nullable columns and it must
// support pointers to empty interfaces to get the default go type for the
// column.
Unmarshal(info TypeInfo, data []byte, value interface{}) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add protocol version to this function parameters (and marshal). I've mentioned this in a previous comment above

marshal.go Outdated
@@ -1579,7 +2113,8 @@ func writeCollectionSize(info CollectionType, n int, buf *bytes.Buffer) error {
return nil
}

func marshalList(info TypeInfo, value interface{}) ([]byte, error) {
// Marshal marshals the value for the given TypeInfo into a byte slice.
func (listSetCQLType) Marshal(info TypeInfo, value interface{}) ([]byte, error) {
listInfo, ok := info.(CollectionType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this implementation relies on accessing listInfo.proto which isn't exported shows that there needs to be a better (and accessible) way of knowing what protocol version is being used otherwise user defined registrations won't be able to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this a lot. I'm hesitant to make the Marshal/Unmarshal interface methods have another parameter when just a few types need it, and potentially no external types need it.

I see the TypeInfo as a way for the types to store the information they need to marshal and unmarshal, including the list subtype, map subtypes, and also the protocol, if it's needed.

An alternative is that we could have listSetCQLVersion2Type and listSetCQLType and use a different struct for version 2 (and lower) specifically but that felt like a lot of code for one tiny difference.

// TypeInfoFromString should return a TypeInfo implementation for the type with
// the given names/classes. Only the portion within the parantheses or arrows
// are passed to this function. For simple types, the name passed might be empty.
TypeInfoFromString(proto int, name string) TypeInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should enable the possibility of returning an error on these methods? Otherwise the only thing that the implementation can do is panic which is far from ideal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should do that, and that's next on my todo list.

@joao-r-reis
Copy link
Contributor

So at this time I think the main open discussion point is whether to have these type registration API as a global API that has sideeffects on import or whether we want this to be an API tied to a cluster/session object. I have a pretty strong conviction that a global API is the wrong path but I will obviously defer to what the community agrees on so I think we need a discussion thread on the C* dev mailing list to move forward

@jameshartig
Copy link
Contributor Author

jameshartig commented Apr 3, 2025

@joao-r-reis I refactored a lot of the code and even though I'm not done, I'm interested in your thoughts. I think the code is close to where something like Vector could be completely defined via the public API.

  • A new public RegisteredTypes struct which holds types and could be passed to a config. I don't like that the global one is overwritable so I might change it to GlobalTypes() *RegisteredTypes. I think I also need to expose methods for getting a type to make it easier for custom types to reuse some existing types for marshalling/unmarshalling.
  • SimpleCQLType can wrap a TypeInfo if you don't need custom parsing.
  • Marshal/Unmarshal were moved off of CQLType and instead put on TypeInfo. This simplifies the protocol version handling and I added an explicit unsupportedUDTTypeInfo.
  • I added explicit handling of Custom and removed the Marshal/Unmarshal panic, that said, I'm still trying to figure out how to handle unknown custom types. When we parse metadata we need to be able to make TypeInfos for custom types that we don't necessarily know how to marshal/unmarshal so I might need to add some fallback unknownCustomTypeInfo type that still panics/errors in Marshal/Unmarshal.
  • Session now has a types field on it. Not pulling it in yet from config. Still need to figure out the best way to "merge" the global one with the session ones. Now passing session along throughout the framer and metadata parsing.
  • Removed the NativeType type but left NewNativeType which does something similar to before since there was a decent amount of usage in other repos.

Overall these changes also significantly improved the benchmarks compared to the last patch and most unmarshal benchmarks are within single-digit percentages now to trunk.

The new RegisteredTypes struct can be used to register externally-defined
types. You'll need to define your own marshalling and unmarshalling code
as well as a TypeInfo implementation. The name and id MUST not collide
with existing and future native CQL types.

A lot of the type handling was refactored to use the new format for
types.

Pointers to empty interfaces are accepted by Scan and are used to build
the maps in MapScan and SliceMap.

inet columns are now unmarshaled as net.IP which is a breaking change.

goos: linux
goarch: amd64
pkg: github.com/gocql/gocql
cpu: AMD EPYC 7B13
                             │   old.txt    │                new2.txt                │
                             │    sec/op    │    sec/op     vs base                  │
SingleConn-16                  25.87µ ±  2%   25.65µ ±  1%        ~ (p=0.280 n=10)
ParseRowsFrame-16              870.5n ±  3%   634.6n ±  4%  -27.10% (p=0.000 n=10)
UnmarshalVarchar-16            47.29n ±  2%
UnmarshalUUID-16               3.161n ±  2%
Unmarshal_BigInt-16            17.61n ±  9%   17.88n ±  0%        ~ (p=0.138 n=10)
Unmarshal_Blob-16              19.19n ±  0%   18.69n ±  0%   -2.61% (p=0.000 n=10)
Unmarshal_Boolean-16           15.71n ±  0%   16.16n ±  1%   +2.90% (p=0.000 n=10)
Unmarshal_Date-16              18.53n ±  1%   18.64n ±  0%   +0.59% (p=0.017 n=10)
Unmarshal_Decimal-16           172.2n ±  2%   181.3n ±  1%   +5.32% (p=0.000 n=10)
Unmarshal_Double-16            16.33n ±  1%   16.20n ±  1%   -0.77% (p=0.003 n=10)
Unmarshal_Duration-16          26.03n ±  1%   25.05n ±  0%   -3.77% (p=0.000 n=10)
Unmarshal_Float-16             15.70n ±  0%   15.90n ±  1%   +1.31% (p=0.001 n=10)
Unmarshal_Int-16               17.80n ±  1%   18.78n ±  2%   +5.48% (p=0.000 n=10)
Unmarshal_Inet-16              28.68n ±  1%   29.20n ±  1%   +1.78% (p=0.001 n=10)
Unmarshal_SmallInt-16          17.92n ±  1%   19.37n ±  2%   +8.09% (p=0.000 n=10)
Unmarshal_Time-16              16.00n ±  1%   16.22n ±  1%   +1.37% (p=0.001 n=10)
Unmarshal_Timestamp-16         16.00n ±  2%   16.57n ±  2%   +3.50% (p=0.000 n=10)
Unmarshal_TinyInt-16           16.38n ±  3%   18.41n ±  1%  +12.39% (p=0.000 n=10)
Unmarshal_UUID-16              16.05n ±  1%   16.34n ±  3%   +1.74% (p=0.000 n=10)
Unmarshal_Varchar-16           18.86n ±  1%   18.71n ±  1%   -0.77% (p=0.003 n=10)
Unmarshal_List-16              176.2n ±  4%   183.2n ±  4%   +3.97% (p=0.003 n=10)
Unmarshal_Set-16               177.7n ±  1%   181.5n ±  0%   +2.17% (p=0.000 n=10)
Unmarshal_Map-16               374.0n ±  5%   375.2n ±  4%        ~ (p=0.218 n=10)
FramerReadTypeInfo-16          207.8n ±  3%   214.9n ±  2%   +3.44% (p=0.002 n=10)
ConnStress-16                  14.10µ ± 16%   15.20µ ± 14%        ~ (p=0.631 n=10)
ConnRoutingKey-16              275.3n ±  4%   197.8n ±  2%  -28.17% (p=0.000 n=10)
WikiCreateSchema-16            515.9m ±  3%   501.4m ±  3%   -2.81% (p=0.011 n=10)
WikiCreatePages-16             1.534m ±  1%   1.474m ±  1%   -3.90% (p=0.000 n=10)
WikiSelectAllPages-16          1.982m ±  3%   1.906m ±  3%   -3.84% (p=0.000 n=10)
WikiSelectSinglePage-16        1.529m ±  2%   1.479m ±  1%   -3.27% (p=0.000 n=10)
WikiSelectPageCount-16         1.691m ±  2%   1.646m ±  1%   -2.63% (p=0.001 n=10)
Unmarshal_TupleStrings-16                     329.6n ±  1%
Unmarshal_TupleInterfaces-16                  357.9n ±  3%
geomean                        387.3n         475.8n         -0.98%                ¹
¹ benchmark set differs from baseline; geomeans may not be comparable

                             │    old.txt     │                 new2.txt                 │
                             │      B/op      │     B/op      vs base                    │
SingleConn-16                  3.155Ki ± 0%     3.109Ki ± 0%   -1.45% (p=0.000 n=10)
ParseRowsFrame-16               1128.0 ± 0%       856.0 ± 0%  -24.11% (p=0.000 n=10)
UnmarshalVarchar-16              32.00 ± 0%
UnmarshalUUID-16                 0.000 ± 0%
Unmarshal_BigInt-16              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Blob-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Boolean-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Date-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Decimal-16             96.00 ± 0%       96.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Double-16              0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Duration-16            0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Float-16               0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Int-16                 0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Inet-16                4.000 ± 0%       4.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_SmallInt-16            0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Time-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Timestamp-16           0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_TinyInt-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_UUID-16                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Varchar-16             0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_List-16                32.00 ± 0%       32.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Set-16                 32.00 ± 0%       32.00 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Map-16                 280.0 ± 0%       280.0 ± 0%        ~ (p=1.000 n=10) ¹
FramerReadTypeInfo-16           160.00 ± 0%       96.00 ± 0%  -40.00% (p=0.000 n=10)
ConnStress-16                  2.956Ki ± 0%     2.669Ki ± 0%   -9.70% (p=0.001 n=10)
ConnRoutingKey-16                96.00 ± 0%       32.00 ± 0%  -66.67% (p=0.000 n=10)
WikiCreateSchema-16            59.90Ki ± 3%     51.87Ki ± 4%  -13.41% (p=0.000 n=10)
WikiCreatePages-16             4.357Ki ± 0%     3.864Ki ± 0%  -11.33% (p=0.000 n=10)
WikiSelectAllPages-16          38.32Ki ± 0%     28.74Ki ± 0%  -25.01% (p=0.000 n=10)
WikiSelectSinglePage-16        3.783Ki ± 0%     3.303Ki ± 0%  -12.69% (p=0.000 n=10)
WikiSelectPageCount-16         3.215Ki ± 0%     2.816Ki ± 0%  -12.42% (p=0.000 n=10)
Unmarshal_TupleStrings-16                         126.0 ± 0%
Unmarshal_TupleInterfaces-16                      126.0 ± 0%
geomean                                     ²                  -9.27%                ³ ²
¹ all samples are equal
² summaries must be >0 to compute geomean
³ benchmark set differs from baseline; geomeans may not be comparable

                             │   old.txt    │                new2.txt                │
                             │  allocs/op   │ allocs/op   vs base                    │
SingleConn-16                  37.00 ± 0%     37.00 ± 0%        ~ (p=1.000 n=10) ¹
ParseRowsFrame-16              21.00 ± 0%     13.00 ± 0%  -38.10% (p=0.000 n=10)
UnmarshalVarchar-16            1.000 ± 0%
UnmarshalUUID-16               0.000 ± 0%
Unmarshal_BigInt-16            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Blob-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Boolean-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Date-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Decimal-16           4.000 ± 0%     4.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Double-16            0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Duration-16          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Float-16             0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Int-16               0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Inet-16              1.000 ± 0%     1.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_SmallInt-16          0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Time-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Timestamp-16         0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_TinyInt-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_UUID-16              0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Varchar-16           0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_List-16              2.000 ± 0%     2.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Set-16               2.000 ± 0%     2.000 ± 0%        ~ (p=1.000 n=10) ¹
Unmarshal_Map-16               5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹
FramerReadTypeInfo-16          4.000 ± 0%     4.000 ± 0%        ~ (p=1.000 n=10) ¹
ConnStress-16                  35.00 ± 0%     33.00 ± 0%   -5.71% (p=0.000 n=10)
ConnRoutingKey-16              4.000 ± 0%     3.000 ± 0%  -25.00% (p=0.000 n=10)
WikiCreateSchema-16            858.0 ± 3%     654.5 ± 4%  -23.72% (p=0.000 n=10)
WikiCreatePages-16             56.00 ± 2%     51.00 ± 0%   -8.93% (p=0.000 n=10)
WikiSelectAllPages-16          343.0 ± 0%     338.0 ± 0%   -1.46% (p=0.000 n=10)
WikiSelectSinglePage-16        50.00 ± 0%     45.00 ± 0%  -10.00% (p=0.000 n=10)
WikiSelectPageCount-16         45.00 ± 0%     40.00 ± 0%  -11.11% (p=0.000 n=10)
Unmarshal_TupleStrings-16                     8.000 ± 0%
Unmarshal_TupleInterfaces-16                  8.000 ± 0%
geomean                                   ²                -4.81%                ³ ²
¹ all samples are equal
² summaries must be >0 to compute geomean
³ benchmark set differs from baseline; geomeans may not be comparable

Patch by James Hartig for CASSGO-43
panic(fmt.Errorf("type %d already registered", typ))
}
if _, ok := r.byString[name]; ok {
panic(fmt.Errorf("type name %s already registered", name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably return errors instead of panic

cancel: cancel,
logger: cfg.logger(),
// TODO: switch to from the config
types: GlobalTypes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we could add some logic here so that the user can't modify the registrations after a session has been created and therefore not requiring locking. Maybe copying the registrations to a new internal struct that only supports reading them

@joao-r-reis
Copy link
Contributor

I'm much more in favor of this new path. I think for the "merging" we should probably just go for a simple approach where the global registrations just aren't used at all if the user provides a struct in the cluster config.

Or you overwrite the global registrations by checking them one by one and you log a warning when an overwrite occurs.

Are you completely against adding the protocol version as a parameter to the Marshal/Unmarshal functions of TypeInfo? If the concern is how intrusive the change might be for users then we could try and explore ways to make it less intrusive but the core principal of having access to the protocol version being used while marshalling and unmarshalling is quite important in my view

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants