Explanation of SKG traversal implementation (Chapter 05) #208

Open
labdmitriy opened this issue Jan 13, 2025 · 1 comment

Comments

@labdmitriy

Hi @treygrainger ,

Could you please explain why the SKG implementation in Chapter 05 uses the following Solr query, based on the example from Listing 5.4:

{
  "limit": 0,
  "params": {
    "q": "*:*",
    "fore": "{!${defType} v=$q}",
    "back": "*:*",
    "defType": "edismax",
    "f0_0_query": "advil"
  },
  "facet": {
    "f0_0": {
      "type": "query",
      "sort": {
        "relatedness": "desc"
      },
      "facet": {
        "relatedness": {
          "type": "func",
          "func": "relatedness($fore,$back)"
        },
        "f1_0": {
          "type": "terms",
          "limit": 8,
          "sort": {
            "relatedness": "desc"
          },
          "facet": {
            "relatedness": {
              "type": "func",
              "func": "relatedness($fore,$back)"
            }
          },
          "mincount": 2,
          "field": "body"
        }
      },
      "field": "body",
      "query": "{!edismax q.op=AND qf=body v=$f0_0_query}"
    }
  }
}

I've only just started exploring Solr's features, so I have the following questions about it:

  • Why is the foreground set initially defined as "{!${defType} v=$q}"? Since "q": "*:*", I understand that the foreground set starts out as the entire collection, but does the top-level facet of type query then somehow filter the foreground set to only the documents that match our query ("query": "{!edismax q.op=AND qf=body v=$f0_0_query}")? Could you please explain the mechanism?
    I couldn't find enough information about it in the documentation or in external resources.
  • Why do we need to calculate the relatedness function, and sort by it, at more than one level? As I understand it, calculating and sorting at the level of the f1_0 facet should be enough.

I tried simplifying this request by hand, tested it in the Solr Admin UI, and got the same results with the following JSON query:

{
  "limit": 0,
  "params": {
    "fore": "{!edismax q.op=AND qf=body v=$f0_0_query}",
    "back": "*:*",
    "defType": "edismax",
    "f0_0_query": "advil"
  },
  "facet": {
    "f1_0": {
      "type": "terms",
      "limit": 8,
      "sort": {
        "relatedness": "desc"
      },
      "facet": {
        "relatedness": {
          "type": "func",
          "func": "relatedness($fore,$back)"
        }
      },
      "mincount": 2,
      "field": "body"
    }
  }
}

Compared with the original request:

  • The q parameter is removed
  • The fore parameter already contains the target query
  • The top-level facet f0_0 is removed
  • The relatedness score is calculated, and sorted on, only once
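
In case it helps anyone reproduce the comparison, here is a minimal Python sketch that builds the simplified request body above for an arbitrary term. The function name `build_simplified_skg_request` and its keyword defaults are my own and are not taken from the book; the sketch only constructs the JSON and does not send it to Solr.

```python
import json


def build_simplified_skg_request(term, field="body", limit=8, mincount=2):
    """Build the simplified request body shown above for a given term.

    The function name and keyword defaults are illustrative assumptions,
    not part of the book's code.
    """
    return {
        "limit": 0,
        "params": {
            # Foreground: documents matching the term in `field`.
            "fore": f"{{!edismax q.op=AND qf={field} v=$f0_0_query}}",
            # Background: the entire collection.
            "back": "*:*",
            "defType": "edismax",
            "f0_0_query": term,
        },
        "facet": {
            # Single terms facet; the top-level f0_0 query facet is gone.
            "f1_0": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "mincount": mincount,
                "sort": {"relatedness": "desc"},
                "facet": {
                    "relatedness": {
                        "type": "func",
                        "func": "relatedness($fore,$back)",
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into the Solr Admin UI.
    print(json.dumps(build_simplified_skg_request("advil"), indent=2))
```

Key order differs from the JSON above, which should not matter for a JSON object.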

Could you please share your thoughts about it?

Thank you.

@labdmitriy
Author

I'm also really confused about the arguments to relatedness. The formula in the book takes three arguments (x, fg, and bg), but the Solr function takes only two (fore and back). And as the simplified JSON shows, we don't even need the q parameter; defining the foreground and background sets is enough.
Could you please explain what x in the book's formula actually means and how it is defined in the Solr implementation?

Thank you.
