Improve TADriver interface and implementations #1032
base: main
Codecov Report

✅ All tests successful. No failed tests found.

```diff
@@            Coverage Diff             @@
##             main    #1032      +/-   ##
==========================================
+ Coverage   97.48%   97.52%   +0.03%
==========================================
  Files         459      462       +3
  Lines       37279    37956     +677
==========================================
+ Hits        36341    37015     +674
- Misses        938      941       +3
```
Improve TADriver interface

- add some more methods to the TADriver
- implement a base constructor
- modify the write_testruns interface
- implement all methods in BQ and PG
- improve BQ and PG tests
- modify use of TADriver interface in processor and finishers
- update django settings to include new settings
- TODO: modify requirements to suitable shared version
- create ta_utils to replace test_results in the future
  - the reason for this is that we want a slightly different implementation of the test results notifier for the new TA pipeline
Force-pushed from 6c6004b to 4df5375.
There are a few simplifications possible here, otherwise this looks good.

I am quite unhappy with the tests though. We are once again relying on overly mocked "whitebox" tests. The only thing they are asserting is that you call the mocks with certain parameters, and that the function returns the mocked data unmodified. I don't think these tests provide any kind of value :-(
```diff
@@ -12,11 +12,23 @@
 if "timeseries" in DATABASES:
     DATABASES["timeseries"]["AUTOCOMMIT"] = False

+if "test_analytics" in DATABASES:
+    DATABASES["test_analytics"]["AUTOCOMMIT"] = False
```
we should really consider changing this in general, but that is a completely different discussion.
```python
        max_backtick_count = curr_backtick_count

    backticks = "`" * (max_backtick_count + 1)
    return f"{backticks}python\n{content}\n{backticks}"
```
Setting the codeblock language to `python` might be the wrong thing to do in general. Not sure if we have any way to detect the language? Maybe depending on the testsuite (which should really be named the "test runner")?
This was a product / design decision: we just wanted to have some highlighting in the failure message displayed in the comment. I can't remember if this decision was made before or after we had framework detection, but either way I can bring this up to them.
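For reference, a self-contained sketch of the escaping logic under discussion; the function name and the surrounding loop are assumptions, since only the last three lines appear in the diff above:

```python
def wrap_in_code_block(content: str, language: str = "python") -> str:
    # Count the longest run of consecutive backticks inside the content,
    # so the surrounding fence can always be one backtick longer.
    max_backtick_count = 0
    curr_backtick_count = 0
    for char in content:
        curr_backtick_count = curr_backtick_count + 1 if char == "`" else 0
        if curr_backtick_count > max_backtick_count:
            max_backtick_count = curr_backtick_count

    # A markdown fence needs at least three backticks (an assumption on top
    # of the diff, which only shows `max_backtick_count + 1`).
    backticks = "`" * max(max_backtick_count + 1, 3)
    return f"{backticks}{language}\n{content}\n{backticks}"
```

The `language` parameter is where framework-based detection could eventually plug in, instead of hardcoding `python`.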
```python
testruns_written = [
    MessageToDict(
        ta_testrun_pb2.TestRun.FromString(testrun_bytes),
        preserving_proto_field_name=True,
    )
    for testrun_bytes in mock_bigquery_service.mock_calls[0][1][3]
]
assert snapshot("json") == sorted(testruns_written, key=lambda x: x["name"])
```
you are asserting that the mocked function is being called with the arguments you put in above.
IMO the value of such tests is very low. All you are asserting here is that the serialization to protobuf is working correctly?
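For contrast, a mock-free round-trip test would exercise the serialization itself; a minimal sketch, assuming `TestRun` has the `name` field implied by the snippet above (the import path is illustrative):

```python
from google.protobuf.json_format import MessageToDict

import ta_testrun_pb2  # actual import path depends on the repo layout


def test_testrun_proto_roundtrip():
    # Serialize a message and parse it back without any mocks involved.
    testrun = ta_testrun_pb2.TestRun(name="test_addition")
    parsed = ta_testrun_pb2.TestRun.FromString(testrun.SerializeToString())
    assert MessageToDict(parsed, preserving_proto_field_name=True) == {
        "name": "test_addition"
    }
```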
```python
    ScalarQueryParameter("repoid", "INT64", 1),
    ScalarQueryParameter("commit_sha", "STRING", "abc123"),
]
assert snapshot("json") == result
```
Here also, the snapshot just contains the `mock_bigquery_service.query.return_value`.
```python
return [
    flake
    for flake in Flake.objects.filter(
```
Suggested change:

```diff
-return [
-    flake
-    for flake in Flake.objects.filter(
+return list(Flake.objects.filter(
```
```python
query, params = mock_bigquery_service.query.call_args[0]
assert snapshot("txt") == query
assert params == [
    ScalarQueryParameter("repoid", "INT64", 1),
    ScalarQueryParameter("commit_sha", "STRING", "abc123"),
]
```
In general, I doubt asserting the generated SQL here is providing any value.
You are building these queries using trivial string concatenation.
This is very different from the way I assert the generated queries in the deletion code, which are fully dynamically created by the ORM, where I have very little insight into how it works under the hood.
```python
pg = PGDriver(repoid, db_session, flaky_test_set)
if settings.BIGQUERY_WRITE_ENABLED:
    bq = BQDriver(repoid)
```
Instead of duplicating this code, how about taking advantage of the new driver interface you have introduced:

```python
# or any kind of combination based on feature flags:
drivers = [pg, bq] if BQ_ENABLED else [pg]
for driver in drivers:
    driver.bulk_write_testruns(parsing_info)
```
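For context, a rough sketch of the interface shape that would support this; the method set is an assumption, since only `bulk_write_testruns` appears in this thread:

```python
from abc import ABC, abstractmethod


class TADriver(ABC):
    # Common interface implemented by both PGDriver and BQDriver, so that
    # callers can simply iterate over whichever backends are enabled.
    @abstractmethod
    def bulk_write_testruns(self, parsing_info) -> None: ...
```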
```python
return [
    {
        "branch_name": result["branch_name"],
        "timestamp": result["timestamp"],
        "outcome": result["outcome"],
        "test_id": result["test_id"],
        "flags_hash": result["flags_hash"],
    }
    for result in query_result
]
```
Are you picking only specific fields from the result? How is this different from just returning `query_result`, or `list(query_result)` in case it's an iterator?
The reason for this was typing: I wanted the return type to be a TypedDict, but there was no guarantee that the result of the query has those keys, so I ended up doing this, which is no better than doing `cast(list[TypedDict], query_result)` tbh, but I do want it to be a TypedDict.
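As a sketch of the trade-off (the `TestrunRow` name and its exact field types are assumptions based on the snippet above): the comprehension rebuilds each row so the type checker sees the TypedDict keys, while `cast` asserts the shape with no runtime check at all:

```python
from typing import TypedDict, cast


class TestrunRow(TypedDict):
    branch_name: str
    timestamp: str
    outcome: str
    test_id: str
    flags_hash: str | None


def rows_via_rebuild(query_result) -> list[TestrunRow]:
    # Rebuilding each dict makes the keys explicit to the type checker
    # (and drops any extra columns), at the cost of some boilerplate.
    return [
        TestrunRow(
            branch_name=result["branch_name"],
            timestamp=result["timestamp"],
            outcome=result["outcome"],
            test_id=result["test_id"],
            flags_hash=result["flags_hash"],
        )
        for result in query_result
    ]


def rows_via_cast(query_result) -> list[TestrunRow]:
    # cast() is purely a type-level assertion: nothing is validated at runtime.
    return cast(list[TestrunRow], list(query_result))
```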
```python
flakes = list(self.flake_dict.values())

flake_dict = {
    (
        bytes(flake.test_id),
        bytes(flake.flags_id) if flake.flags_id else None,
    ): flake
    for flake in flakes
}
```
This is just `flake_dict = self.flake_dict`, isn't it?
```python
for flake in Flake.objects.raw(
    "SELECT * FROM flake WHERE repoid = %s AND (test_id, flags_id) IN %s AND end_date IS NULL AND count != recent_passes_count + fail_count",
    [self.repo_id, test_ids],
```
Is there a particular reason to use a raw query here? IMO, using the query builder would be simpler, as the `if test_ids` would just be adding another `filter` to the base query set, which seems to be the same in both branches?
If we use the query builder here:

```python
from django.db.models import Q

query = Q()
for id, author in filter_data:
    query |= Q(id=id, author=author)

books = Book.objects.filter(query)
```

would be the way to do it, which I wasn't a fan of, because the SQL would be a bunch of `(id, author) = (value, value) OR` concatenated together, so I just went with the raw SQL to express it exactly.
Ohhh, so the problem here is that an `IN` query with a tuple is not expressible with the django query builder?
> Ohhh, so the problem here is that an `IN` query with a tuple is not expressible with the django query builder?

Yes, the only way to get the behaviour we want through the query builder is by having a WHERE expression with a bunch of equality checks joined by `OR`.
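For illustration, a sketch of the two approaches side by side; model and field names follow the snippets above, and it assumes `test_ids` is a non-empty tuple of `(test_id, flags_id)` pairs (the caller branches on `if test_ids`):

```python
from django.db.models import F, Q


def flakes_raw(self, test_ids):
    # Raw SQL expresses the tuple IN directly; the database driver adapts
    # the tuple of pairs into the second %s placeholder.
    return list(
        Flake.objects.raw(
            "SELECT * FROM flake"
            " WHERE repoid = %s AND (test_id, flags_id) IN %s"
            " AND end_date IS NULL"
            " AND count != recent_passes_count + fail_count",
            [self.repo_id, test_ids],
        )
    )


def flakes_query_builder(self, test_ids):
    # The ORM equivalent ORs one equality pair per tuple, producing
    # much noisier SQL for the same result.
    pairs = Q()
    for test_id, flags_id in test_ids:
        pairs |= Q(test_id=test_id, flags_id=flags_id)
    return list(
        Flake.objects.filter(pairs, repoid=self.repo_id, end_date__isnull=True)
        .exclude(count=F("recent_passes_count") + F("fail_count"))
    )
```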
@Swatinem I think you're right. My idea was to ship this quickly and start validating in prod on our own repos, with minimal automated testing, because the option of "find a way to connect to BQ in CI" sounded like it would take too long, and the option of "use a third party emulator" sounded like a bad idea.

The goal of the tests right now is not to validate that anything works, but that changes devs make in the future don't break the existing behaviour. I thought this might have some value, but I think you're right that we want tests that validate that things work, and that these tests are basically just here to make sure that devs review their changes to this code through the snapshots.

What I can do right now is use https://github.com/goccy/bigquery-emulator, but I still don't think a third party emulator is a good idea in the long term; we would be best off finding a way to connect to some dev deployment of BQ in CI at some point in the future.
Similar to what I proposed a while back to test real GCS access, we can use secrets in CI to run this with a proper test project in GCP, and just skip these tests when the credentials are not available, either locally or for forks in CI (though maybe still fail in CI on missing credentials anyway). I think that would be a very reasonable thing to do.
depends on: codecov/shared#484