Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat anyOf: [] as matching no routing shards #39

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

acerbusace
Copy link
Collaborator

@acerbusace acerbusace commented Nov 20, 2024

Context

As part of #6 we have changed the behaviour of anyOf, such that:

  • anyOf: [] is treated as false
  • anyOf: [{anyOf: []}] is treated as false

Change

This PR looks to adopt that behaviour for routing as well, such that:

  • anyOf: [] will route to nothing
  • anyOf: [{anyOf: []}] will route to nothing

Also, updated shard_routing_spec to use language different from ignore where it makes sense (e.g. ignore -> treat ... as matching everything).

@acerbusace acerbusace force-pushed the block/alexp/routingAnyOfChange branch 4 times, most recently from 045b82f to 961e54d Compare November 21, 2024 17:13
Copy link
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Great work with the fix here @acerbusace! Left a few suggestions. Also, since your PR only includes unit test changes, before merging, I'd like to get confirmation on how this impacts a real query running end-to-end. Here's my suggestion for how to do that.

  • Run DEBUG_QUERY=1 be rake boot_locally. Run a query that is impacted by this change. Capture the output.
  • Checkout main and do the same thing.
  • Leave a comment on this PR showing the behavior before and after this change so we can confirm it's having the impact we want.
  • Perform a before/after comparison of this change.

@@ -16,8 +16,9 @@ class RoutingPicker
def initialize(schema_names:)
# @type var all_values_set: _RoutingValueSet
all_values_set = RoutingValueSet::ALL
none_values_set = RoutingValueSet::NONE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none_values is grammatically odd. Can we call this empty_set (and name the constant RoutingValueSet::EMPTY)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't like the name of none_values as well, so changed to empty_set instead.

@schema_names = schema_names
@all_values_set = all_values_set
@none_values_set = none_values_set
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here--can we use the name empty_set instead of non_values_set for the argument and instance variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -105,6 +106,11 @@ def filter_value_set_for_filter_hash_entry(field_or_op, filter_value, target_fie

# Determines the set of filter values for an `any_of` clause, which is used for ORing multiple filters together.
def filter_value_set_for_any_of(filter_hashes, target_field_path_parts, traversed_field_path_parts, negate:)
# Here we treat `any_of: []` as matching no routes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Here we treat `any_of: []` as matching no routes.
# Here we treat `any_of: []` as matching no values.

This file isn't just for routing values; it's for extracting a set of matching values from filters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

Comment on lines 268 to 285
it "searches no shards when we have an `any_of: []` AND `equal_to_any_of: ['abc']` filter because that will match no results" do
expect(shard_routing_for(["name"], {
"any_of" => [],
"equal_to_any_of" => ["abc"]
})).to search_no_shards
end

it "searches all shards when we have an `any_of: [{equal_to_any_of: ['abc']}]` filter because that will match some results" do
expect(shard_routing_for(["name"], {
"any_of" => [{"equal_to_any_of" => ["abc"]}]
})).to search_all_shards
end

it "searches all shards when we have an `any_of: [{anyof: []}, {equal_to_any_of: ['abc']}]` filter because that will match some results" do
expect(shard_routing_for(["name"], {
"any_of" => [{"any_of" => []}, {"equal_to_any_of" => ["abc"]}]
})).to search_all_shards
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it "searches no shards when we have an `any_of: []` AND `equal_to_any_of: ['abc']` filter because that will match no results" do
expect(shard_routing_for(["name"], {
"any_of" => [],
"equal_to_any_of" => ["abc"]
})).to search_no_shards
end
it "searches all shards when we have an `any_of: [{equal_to_any_of: ['abc']}]` filter because that will match some results" do
expect(shard_routing_for(["name"], {
"any_of" => [{"equal_to_any_of" => ["abc"]}]
})).to search_all_shards
end
it "searches all shards when we have an `any_of: [{anyof: []}, {equal_to_any_of: ['abc']}]` filter because that will match some results" do
expect(shard_routing_for(["name"], {
"any_of" => [{"any_of" => []}, {"equal_to_any_of" => ["abc"]}]
})).to search_all_shards
end
it "searches no shards when we have an `any_of: []` AND `equal_to_any_of: ['abc']` filter because that will match no results" do
expect(shard_routing_for(["name"], {
"any_of" => [],
"name" => {"equal_to_any_of" => ["abc"]}
})).to search_no_shards
end
it "searches the same shards for `any_of: [predicate]` and `predicate` since they are equivalent" do
predicate = {"name" => {"equal_to_any_of" => ["abc"]}}
expect(shard_routing_for(["name"], predicate)).to search_shards_identified_by "abc"
expect(shard_routing_for(["name"], {"any_of" => [predicate]})).to search_shards_identified_by "abc"
end
it "searches the same shards for `any_of: [{any_of: []}, predicate]` and `predicate` since they are equivalent" do
predicate = {"name" => {"equal_to_any_of" => ["abc"]}}
expect(shard_routing_for(["name"], predicate)).to search_shards_identified_by "abc"
expect(shard_routing_for(["name"], {"any_of" => [{"any_of" => []}, predicate]})).to search_shards_identified_by "abc"
end

These tests were all using equal_to_any_of without first specifying the field they are filtering. The behavior in that case is essentially undefined because it's not valid query syntax. (It would be like a SQL query that was SELECT * WHERE in("abc")). The GraphQL schema prevents such a query from coming into this layer.

Once you add the field ("name") it turns out that the behavior is different from what you asserting...but more correct then what you were asserting.

Let me know if that all makes sense to you or if you disagree!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. Makes sense to me and hence changed to your recommendation.

@@ -456,6 +454,10 @@ def target_all_widget_indices
contain_exactly("widgets_rollover__*")
end

def target_no_widget_indices
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target_no_widget_indices says nothing about non-widget indices, and technically it's accurate to say that a query that targets only non-widget targets no widget indices.

Can this be called target_no_indices to indicate that it actually doesn't target any indices?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to target_no_indices.

@acerbusace acerbusace force-pushed the block/alexp/routingAnyOfChange branch from 961e54d to d4ba8a6 Compare November 21, 2024 20:48
@acerbusace
Copy link
Collaborator Author

acerbusace commented Nov 21, 2024

🚀 Great work with the fix here @acerbusace! Left a few suggestions. Also, since your PR only includes unit test changes, before merging, I'd like to get confirmation on how this impacts a real query running end-to-end. Here's my suggestion for how to do that.

  • Run DEBUG_QUERY=1 be rake boot_locally. Run a query that is impacted by this change. Capture the output.

  • Checkout main and do the same thing.

  • Leave a comment on this PR showing the behavior before and after this change so we can confirm it's having the impact we want.

  • Perform a before/after comparison of this change.

Ran DEBUG_QUERY=1 be rake boot_locally on this branch

Input

EG Query

query anyOfChanges {
  addresses(filter: {
    any_of: []
  }) {
    nodes {
      full_address
      geo_location {
        latitude
        longitude
      }
    }
  }
}

ES Query

<No query output, instead got "::1 - - [21/Nov/2024:16:01:33 -0500] "POST /graphql HTTP/1.1" 200 - 0.0287">

Output

EG Response

{
  "data": {
    "addresses": {
      "nodes": []
    }
  }
}

ES RESPONSE

<No response output, instead got "::1 - - [21/Nov/2024:16:01:33 -0500] "POST /graphql HTTP/1.1" 200 - 0.0287">

Ran DEBUG_QUERY=1 be rake boot_locally on main branch

Input

EG Query

query anyOfChanges {
  addresses(filter: {
    any_of: []
  }) {
    nodes {
      full_address
      geo_location {
        latitude
        longitude
      }
    }
  }
}

ES Query

[
  {
    "index": "addresses"
  },
  {
    "size": 51,
    "sort": [
      {
        "id": {
          "order": "desc",
          "missing": "_last"
        }
      }
    ],
    "track_total_hits": false,
    "query": {
      "bool": {
        "filter": [
          {
            "match_none": {
            }
          }
        ]
      }
    },
    "_source": {
      "includes": [
        "full_address",
        "geo_location.lat",
        "geo_location.lon"
      ]
    }
  }
]

Output

EG Response

{
  "data": {
    "addresses": {
      "nodes": []
    }
  }
}

ES Response

{
  "took": 4,
  "responses": [
    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "max_score": null,
        "hits": [

        ]
      },
      "status": 200
    }
  ]
}

Copy link
Collaborator

@myronmarston myronmarston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@acerbusace acerbusace merged commit 7ec7e83 into main Nov 21, 2024
10 checks passed
@acerbusace acerbusace deleted the block/alexp/routingAnyOfChange branch November 21, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants