Improve TADriver interface and implementations #1032

Open

joseph-sentry wants to merge 3 commits into main
Conversation

@joseph-sentry (Contributor) commented Jan 24, 2025

depends on: codecov/shared#484

  • improve the TADriver interface (a rough sketch of the resulting shape follows below)
  • complete the BQDriver and PGDriver implementations
    • the PGDriver changes are mostly moving around existing code
  • introduce ta_utils, which is going to replace services.test_results
    • it's being introduced because I had to change the interface of TestResultsNotifier, but we still want the old one around for the old test results pipeline
  • add and use the test_analytics app from shared
  • consume the config options in django_scaffold.settings instead of making various get_config calls
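
For orientation, here is a minimal sketch of what a driver interface along these lines could look like. The method names echo ones that appear later in this discussion (write_testruns, write_flakes); the exact signatures are assumptions for illustration, not the actual interface from this PR.

# Hypothetical sketch only; signatures are assumed, not taken from the PR.
from abc import ABC, abstractmethod


class TADriver(ABC):
    def __init__(self, repo_id: int):
        # Base constructor shared by all driver implementations.
        self.repo_id = repo_id

    @abstractmethod
    def write_testruns(self, upload_id: int, testruns: list[dict]) -> None:
        """Persist a batch of test runs to the backing store (PG or BQ)."""

    @abstractmethod
    def write_flakes(self, uploads: list) -> None:
        """Detect and persist flaky tests for the given uploads."""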

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 89.32515% with 87 lines in your changes missing coverage. Please review.

Project coverage is 97.52%. Comparing base (c6453bf) to head (2ad04b0).

✅ All tests successful. No failed tests found.

Files with missing lines   Patch %   Lines
services/ta_utils.py       76.47%    40 Missing ⚠️
ta_storage/pg.py           83.58%    22 Missing ⚠️
ta_storage/bq.py           77.77%    20 Missing ⚠️
ta_storage/base.py         85.29%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1032      +/-   ##
==========================================
+ Coverage   97.48%   97.52%   +0.03%     
==========================================
  Files         459      462       +3     
  Lines       37279    37956     +677     
==========================================
+ Hits        36341    37015     +674     
- Misses        938      941       +3     
Flag          Coverage Δ
integration   42.31% <31.77%> (-0.22%) ⬇️
unit          90.26% <88.46%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.



codecov-qa bot commented Jan 24, 2025

❌ 2 Tests Failed:

Tests completed   Failed   Passed   Skipped
1814              2        1812     6
View the top 2 failed tests by shortest run time
ta_storage/tests/test_bq.py::test_write_testruns_with_flake
Stack Traces | 0.051s run time
mock_bigquery_service = <MagicMock name='get_bigquery_service()' id='139958125457632'>
mock_config = None, snapshot = snapshot

    @pytest.mark.django_db(transaction=True, databases=["test_analytics"])
    def test_write_testruns_with_flake(mock_bigquery_service, mock_config, snapshot):
        driver = BQDriver(repo_id=1)
        timestamp = int(
            datetime.fromisoformat("2025-01-01T00:00:00Z").timestamp() * 1000000
        )
    
>       flake = Flake.objects.create(
            repoid=1,
            test_id=calc_test_id("test_suite", "TestClass", "test_something"),
            flags_id=calc_flags_hash(["unit"]),
            fail_count=1,
            count=1,
            recent_passes_count=1,
            start_date=datetime.now(timezone.utc),
        )

ta_storage/tests/test_bq.py:100: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../local/lib/python3.13.../db/models/manager.py:87: in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
.../local/lib/python3.13.../db/models/query.py:656: in create
    obj = self.model(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Flake: Flake object (None)>, args = ()
kwargs = {'flags_id': b'\\\xb9\\\xc4\x06\x0fx\x12'}
cls = <class 'shared.django_apps.test_analytics.models.Flake'>
opts = <Options for Flake>, _setattr = <built-in function setattr>
_DEFERRED = <Deferred field>
fields_iter = <tuple_iterator object at 0x7f4a88037850>, val = None
field = <django.db.models.fields.DateTimeField: end_date>
is_related_object = False, property_names = frozenset({'pk'})

    def __init__(self, *args, **kwargs):
        # Alias some things as locals to avoid repeat global lookups
        cls = self.__class__
        opts = self._meta
        _setattr = setattr
        _DEFERRED = DEFERRED
        if opts.abstract:
            raise TypeError("Abstract models cannot be instantiated.")
    
        pre_init.send(sender=cls, args=args, kwargs=kwargs)
    
        # Set up the storage for instance state
        self._state = ModelState()
    
        # There is a rather weird disparity here; if kwargs, it's set, then args
        # overrides it. It should be one or the other; don't duplicate the work
        # The reason for the kwargs check is that standard iterator passes in by
        # args, and instantiation for iteration is 33% faster.
        if len(args) > len(opts.concrete_fields):
            # Daft, but matches old exception sans the err msg.
            raise IndexError("Number of args exceeds number of fields")
    
        if not kwargs:
            fields_iter = iter(opts.concrete_fields)
            # The ordering of the zip calls matter - zip throws StopIteration
            # when an iter throws it. So if the first iter throws it, the second
            # is *not* consumed. We rely on this, so don't change the order
            # without changing the logic.
            for val, field in zip(args, fields_iter):
                if val is _DEFERRED:
                    continue
                _setattr(self, field.attname, val)
        else:
            # Slower, kwargs-ready version.
            fields_iter = iter(opts.fields)
            for val, field in zip(args, fields_iter):
                if val is _DEFERRED:
                    continue
                _setattr(self, field.attname, val)
                if kwargs.pop(field.name, NOT_PROVIDED) is not NOT_PROVIDED:
                    raise TypeError(
                        f"{cls.__qualname__}() got both positional and "
                        f"keyword arguments for field '{field.name}'."
                    )
    
        # Now we're left with the unprocessed fields that *must* come from
        # keywords, or default.
    
        for field in fields_iter:
            is_related_object = False
            # Virtual field
            if field.attname not in kwargs and field.column is None:
                continue
            if kwargs:
                if isinstance(field.remote_field, ForeignObjectRel):
                    try:
                        # Assume object instance was passed in.
                        rel_obj = kwargs.pop(field.name)
                        is_related_object = True
                    except KeyError:
                        try:
                            # Object instance wasn't passed in -- must be an ID.
                            val = kwargs.pop(field.attname)
                        except KeyError:
                            val = field.get_default()
                else:
                    try:
                        val = kwargs.pop(field.attname)
                    except KeyError:
                        # This is done with an exception rather than the
                        # default argument on pop because we don't want
                        # get_default() to be evaluated, and then not used.
                        # Refs #12057.
                        val = field.get_default()
            else:
                val = field.get_default()
    
            if is_related_object:
                # If we are passed a related instance, set it using the
                # field.name instead of field.attname (e.g. "user" instead of
                # "user_id") so that the object gets properly cached (and type
                # checked) by the RelatedObjectDescriptor.
                if rel_obj is not _DEFERRED:
                    _setattr(self, field.name, rel_obj)
            else:
                if val is not _DEFERRED:
                    _setattr(self, field.attname, val)
    
        if kwargs:
            property_names = opts._property_names
            unexpected = ()
            for prop, value in kwargs.items():
                # Any remaining kwargs must correspond to properties or virtual
                # fields.
                if prop in property_names:
                    if value is not _DEFERRED:
                        _setattr(self, prop, value)
                else:
                    try:
                        opts.get_field(prop)
                    except FieldDoesNotExist:
                        unexpected += (prop,)
                    else:
                        if value is not _DEFERRED:
                            _setattr(self, prop, value)
            if unexpected:
                unexpected_names = ", ".join(repr(n) for n in unexpected)
>               raise TypeError(
                    f"{cls.__name__}() got unexpected keyword arguments: "
                    f"{unexpected_names}"
                )
E               TypeError: Flake() got unexpected keyword arguments: 'flags_id'

.../local/lib/python3.13.../db/models/base.py:567: TypeError
ta_storage/tests/test_bq.py::test_write_flakes
Stack Traces | 0.344s run time
mock_bigquery_service = <MagicMock name='get_bigquery_service()' id='139958125456960'>
mock_config = None, snapshot = snapshot

    @travel("2025-01-01T00:00:00Z", tick=False)
    @pytest.mark.django_db(transaction=True, databases=["default", "test_analytics"])
    def test_write_flakes(mock_bigquery_service, mock_config, snapshot):
        driver = BQDriver(repo_id=1)
    
        upload = UploadFactory.create()
        upload.save()
    
        mock_bigquery_service.query.return_value = [
            {
                "branch_name": "main",
                "timestamp": int(datetime.now().timestamp() * 1000000),
                "outcome": ta_testrun_pb2.TestRun.Outcome.FAILED,
                "test_id": b"test_id",
                "flags_hash": b"flags_hash",
            }
        ]
    
        driver.write_flakes([upload])
    
        mock_bigquery_service.query.assert_called_once()
        query, params = mock_bigquery_service.query.call_args[0]
        assert snapshot("txt") == query
        assert params == [
            ScalarQueryParameter("upload_id", "INT64", upload.id),
            ArrayQueryParameter(
                "flake_ids",
                StructQueryParameterType(
                    ScalarQueryParameter("test_id", "STRING", "test_id"),
                    ScalarQueryParameter("flags_id", "STRING", "flags_id"),
                ),
                [],
            ),
        ]
    
        flakes = Flake.objects.all()
        flake_data = [
            {
                "repoid": flake.repoid,
                "test_id": flake.test_id.hex(),
                "fail_count": flake.fail_count,
                "count": flake.count,
                "recent_passes_count": flake.recent_passes_count,
                "start_date": flake.start_date.isoformat() if flake.start_date else None,
                "end_date": flake.end_date.isoformat() if flake.end_date else None,
>               "flags_id": flake.flags_id.hex() if flake.flags_id else None,
            }
            for flake in flakes
        ]
E       AttributeError: 'Flake' object has no attribute 'flags_id'

ta_storage/tests/test_bq.py:245: AttributeError



github-actions bot commented Jan 24, 2025

✅ All tests successful. No failed tests were found.

Improve TADriver interface

- add some more methods to the TADriver
- implement a base constructor
- modify the write_testruns interface
- implement all methods in BQ and PG
- improve BQ and PG tests
- modify use of TADriver interface in processor and finishers
- update Django settings to include the new config options
- TODO: update requirements to a suitable shared version
- create ta_utils to replace test_results in the future
  - the reason for this is that we want a slightly different implementation
    of the test results notifier for the new TA pipeline

This PR includes changes to shared. Please review them here: codecov/shared@e854f50...c16e8bc

@joseph-sentry joseph-sentry marked this pull request as ready for review January 27, 2025 14:05
@joseph-sentry joseph-sentry requested a review from a team January 27, 2025 14:06
@Swatinem (Contributor) left a comment

There are a few simplifications possible here; otherwise this looks good.

I am quite unhappy with the tests, though. We are once again relying on overly mocked "whitebox" tests. The only thing they are asserting is that you call the mocks with certain parameters, and that the function returns the mocked data unmodified. I don't think these tests provide any kind of value :-(

@@ -12,11 +12,23 @@
if "timeseries" in DATABASES:
    DATABASES["timeseries"]["AUTOCOMMIT"] = False

if "test_analytics" in DATABASES:
    DATABASES["test_analytics"]["AUTOCOMMIT"] = False
Contributor

We should really consider changing this in general, but that is a completely different discussion.

max_backtick_count = curr_backtick_count

backticks = "`" * (max_backtick_count + 1)
return f"{backticks}python\n{content}\n{backticks}"
Contributor

Setting the code block language to python might be the wrong thing to do in general.
Not sure if we have any way to detect the language? Maybe depending on the testsuite (which should really be named the "test runner")?

Contributor Author

This was a product / design decision: we just wanted to have some highlighting in the failure message displayed in the comment. I can't remember whether this decision was made before or after we had framework detection, but either way I can bring this up with them.
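
For context, the excerpt above wraps content in a fence that is one backtick longer than the longest backtick run inside the content, so an embedded fence cannot break out of the block. A self-contained sketch of that technique (the function name and the language parameter are assumptions; the excerpt hardcodes python):

def wrap_in_code_block(content: str, language: str = "python") -> str:
    # Find the longest run of backticks inside the content...
    max_backtick_count = 0
    curr_backtick_count = 0
    for char in content:
        if char == "`":
            curr_backtick_count += 1
        else:
            curr_backtick_count = 0
        if curr_backtick_count > max_backtick_count:
            max_backtick_count = curr_backtick_count

    # ...and use a fence one backtick longer, so the content can't close it.
    backticks = "`" * (max_backtick_count + 1)
    return f"{backticks}{language}\n{content}\n{backticks}"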

Comment on lines +139 to +146
testruns_written = [
    MessageToDict(
        ta_testrun_pb2.TestRun.FromString(testrun_bytes),
        preserving_proto_field_name=True,
    )
    for testrun_bytes in mock_bigquery_service.mock_calls[0][1][3]
]
assert snapshot("json") == sorted(testruns_written, key=lambda x: x["name"])
Contributor

You are asserting that the mocked function is being called with the arguments you put in above.
IMO the value of such tests is very low. All you are asserting here is that the serialization to protobuf is working correctly?

ScalarQueryParameter("repoid", "INT64", 1),
ScalarQueryParameter("commit_sha", "STRING", "abc123"),
]
assert snapshot("json") == result
Contributor

Here also, the snapshot just contains the mock_bigquery_service.query.return_value.

Comment on lines +230 to +232
return [
    flake
    for flake in Flake.objects.filter(
Contributor

Suggested change:
-return [
-    flake
-    for flake in Flake.objects.filter(
+return list(Flake.objects.filter(

Comment on lines +164 to +169
query, params = mock_bigquery_service.query.call_args[0]
assert snapshot("txt") == query
assert params == [
    ScalarQueryParameter("repoid", "INT64", 1),
    ScalarQueryParameter("commit_sha", "STRING", "abc123"),
]
Contributor

In general, I doubt asserting the generated SQL here is providing any value.

You are building these queries using trivial string concatenation.

This is very different from the way I assert the generated queries in the deletion code, which are fully dynamically created by the ORM, where I have very little insight into how it works under the hood.

Comment on lines +133 to +135
pg = PGDriver(repoid, db_session, flaky_test_set)
if settings.BIGQUERY_WRITE_ENABLED:
    bq = BQDriver(repoid)
Contributor

Instead of duplicating this code, how about taking advantage of the new driver interface you have introduced:

drivers = [pg, bq] if BQ_ENABLED else [pg]  # or any kind of combination based on feature flags
for driver in drivers:
    driver.bulk_write_testruns(parsing_info)

Comment on lines +396 to +405
return [
    {
        "branch_name": result["branch_name"],
        "timestamp": result["timestamp"],
        "outcome": result["outcome"],
        "test_id": result["test_id"],
        "flags_hash": result["flags_hash"],
    }
    for result in query_result
]
Contributor

Are you picking only specific fields from the result? How is this different from just returning query_result, or list(query_result) in case it's an iterator?

Contributor Author

The reason for this was typing: I wanted the return type to be a TypedDict, but there was no guarantee that the result of query would have those keys, so I ended up doing this. It's no better than doing cast(list[TypedDict], query_result), tbh, but I do want it to be a TypedDict.
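
For illustration, the approach being described might look like the sketch below. The TypedDict name and the helper function are hypothetical; the field names come from the excerpt above.

from typing import TypedDict


class FlakyTestRow(TypedDict):
    # Hypothetical TypedDict; fields mirror the excerpt above.
    branch_name: str
    timestamp: int
    outcome: int
    test_id: bytes
    flags_hash: bytes


def typed_rows(query_result) -> list[FlakyTestRow]:
    # Rebuilding each dict guarantees the declared keys exist at runtime;
    # cast(list[FlakyTestRow], list(query_result)) would satisfy the type
    # checker without that runtime guarantee.
    return [
        FlakyTestRow(
            branch_name=row["branch_name"],
            timestamp=row["timestamp"],
            outcome=row["outcome"],
            test_id=row["test_id"],
            flags_hash=row["flags_hash"],
        )
        for row in query_result
    ]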

Comment on lines +419 to +427
flakes = list(self.flake_dict.values())

flake_dict = {
    (
        bytes(flake.test_id),
        bytes(flake.flags_id) if flake.flags_id else None,
    ): flake
    for flake in flakes
}
Contributor

This is just flake_dict = self.flake_dict, isn't it?

Comment on lines +590 to +592
for flake in Flake.objects.raw(
    "SELECT * FROM flake WHERE repoid = %s AND (test_id, flags_id) IN %s AND end_date IS NULL AND count != recent_passes_count + fail_count",
    [self.repo_id, test_ids],
Contributor

Is there a particular reason to use a raw query here?
IMO, using the query builder would be simpler, as the if test_ids would just add another filter to the base query set, which seems to be the same in both branches?

Contributor Author

If we use the query builder here:

from django.db.models import Q

query = Q()
for id, author in filter_data:
    query |= Q(id=id, author=author)

books = Book.objects.filter(query)

would be the way to do it, which I wasn't a fan of, because the SQL would be a bunch of (id, author) = (value, value) clauses concatenated with OR, so I just went with the raw SQL to express it exactly.

Contributor

Ohhh, so the problem here is that an IN query with a tuple is not expressible with the Django query builder?

@joseph-sentry (Contributor Author) Jan 28, 2025

Ohhh, so the problem here is that an IN query with a tuple is not expressible with the Django query builder?

Yes, the only way to get the behaviour we want through the query builder is by having a WHERE expression with a bunch of equality checks joined by OR.
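
To make the trade-off concrete, here is a hypothetical sketch of the query-builder alternative being discussed, mirroring the raw SQL above (the function name and the reduce-based construction are assumptions):

import operator
from functools import reduce

from django.db.models import F, Q

from shared.django_apps.test_analytics.models import Flake


def active_flakes(repo_id, test_ids):
    # test_ids is a list of (test_id, flags_id) pairs. Each pair becomes one
    # equality clause, and the clauses are OR-ed together -- exactly the
    # verbose SQL shape being objected to above. Assumes test_ids is non-empty.
    pairs = reduce(
        operator.or_,
        (Q(test_id=t, flags_id=f) for t, f in test_ids),
    )
    return (
        Flake.objects.filter(pairs, repoid=repo_id, end_date__isnull=True)
        # count != recent_passes_count + fail_count from the raw query:
        .exclude(count=F("recent_passes_count") + F("fail_count"))
    )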

@joseph-sentry (Contributor Author) commented Jan 28, 2025

I am quite unhappy with the tests, though. We are once again relying on overly mocked "whitebox" tests. The only thing they are asserting is that you call the mocks with certain parameters, and that the function returns the mocked data unmodified. I don't think these tests provide any kind of value :-(

@Swatinem I think you're right. My idea was to ship this quickly and start validating in prod on our own repos, with minimal automated testing, because the option of "find a way to connect to BQ in CI" sounded like it would take too long and the option of "use a third-party emulator" sounded like a bad idea. The goal of the tests right now is not to validate that anything works, but that changes devs make in the future don't break the existing behaviour. I thought this might have some value, but I think you're right: we want tests that validate that things work, and these tests are basically just here to make sure that devs review their changes to this code through the snapshots.

What I can do right now is use https://github.com/goccy/bigquery-emulator, but I still don't think it's a good idea to use a third-party emulator in the long term; it would be best if we find a way to connect to some dev deployment of BQ in CI at some point in the future.

@Swatinem (Contributor)
Similar to what I proposed a while back to test real GCS access, we can use secrets in CI to run this against a proper test project in GCP, and just skip these tests when the credentials are not available, either locally or for forks in CI (though maybe still fail in CI on missing credentials anyway).

I think that would be a very reasonable thing to do.
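
As a rough illustration of that approach (the environment variable name and test name are assumptions, not an agreed convention):

import os

import pytest

# Hypothetical convention: CI injects real GCP credentials as a secret;
# local runs and forks without the secret skip the real-BigQuery tests.
HAS_GCP_CREDENTIALS = bool(os.environ.get("GCP_TEST_CREDENTIALS"))

requires_bigquery = pytest.mark.skipif(
    not HAS_GCP_CREDENTIALS,
    reason="GCP_TEST_CREDENTIALS not set; skipping real BigQuery tests",
)


@requires_bigquery
def test_write_testruns_against_real_bigquery():
    # Would construct a BQDriver against the test project and assert on
    # actually written rows instead of on mock call arguments.
    ...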
