Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17043: Remove SolrClient path pattern matching #3238

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

jkmuriithi
Copy link

https://issues.apache.org/jira/browse/SOLR-17043

Description

Currently, some SolrClient implementations (especially our "load-balancing" and "cloud" clients) do pattern-matching on the path to guess the "type" (admin, update, etc. ) of each request. This seems unnecessary though, as SolrRequest already has a "getRequestType" method exposing this. We should use this method where possible instead of the ad-hoc pattern matching we currently do, which is brittle and doesn't map well to the varied paths used by our v2 APIs.

Solution

  • Use SolrRequest.getRequestType to identify request types in SolrJ, and remove other forms of request type identification from the SolrJ codebase. This includes deprecating the IsUpdateRequest interface.

  • Change getRequestType to return the actual SolrRequestType enum instead of a String.

  • Some unit tests use SolrRequest.setPath() to send a QueryRequest to a /admin path. This PR adds basic pattern matching in getRequestType and adds method SolrRequest.getBaseRequestType to promote an inheritance pattern which works with mutable request paths.

  • CloudSolrClient relies heavily on request type to route requests. After modifying the routing in CloudSolrClient to use getRequestType, several requests started failing because they returned request type ADMIN when their handler path was not in CommonParams.ADMIN_PATHS. This PR changes the type of those misclassified requests to UNSPECIFIED. Some of these changes may seem counterintuitive (i.e. /admin/ping and /admin/luke are not ADMIN requests), but they preserve the existing data flow.

Tests

Solr unit tests pass with ./gradlew test.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

* SolrRequest#getRequestType}. Note that changing request type can break/impact request routing
* within various clients (i.e. {@link CloudSolrClient}).
*/
protected SolrRequestType getBaseRequestType() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud: I wonder if it is possible to make this a member of SolrRequest itself, i.e. private SolrRequestType type

The advantage of this is you don't have to expand the interface with this arguably low-level detail. You could imagine a set of overloaded constructors of SolrRequest, i.e.:

  private SolrRequestType type;
  
  public SolrRequest(METHOD m, String path) {
    this(m, path, UNSPECIFIED);
  }
  
  public SolrRequest(METHOD m, String path, SolrRequestType defaultType) {
    this.method = m;
    this.path = path;
    this.type = resolveType(path, defaultType);
  }

  public void setPath(String path) {
    this.path = path;
    this.type = resolveType(path, type);
  }
  
  public SolrRequestType getRequestType() {
    return type;
  }

The catch is that calling setPath has to update this additional property as well. But because getRequestType already depends on path, both solutions are equally mutable. It would be nice if SolrRequest had a builder which could unburden SolrRequest from this mutability but that would be a much bigger change I suppose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getBaseRequestType seems sad to me; and I sympathize with Luke's musings. I'd like to see this method go away. I like's Luke's suggestion on a field. But going towards immutable is a bit much IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does seem to complicate the interface a bit.

I'd be fine with the "field" approach that Luke suggested. Or alternately, could this be solved without even a "field" if getRequestType below was non-final and subclasses could implement it as needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving the type in the constructor breaks QueryRequest, since it can only handle calls to SolrRequest.getPath() after construction. I removed getBaseRequestType, added a field defaultType, and put the request type pattern matching logic in getRequestType. Does this compromise work?

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff here!

* SolrRequest#getRequestType}. Note that changing request type can break/impact request routing
* within various clients (i.e. {@link CloudSolrClient}).
*/
protected SolrRequestType getBaseRequestType() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getBaseRequestType seems sad to me; and I sympathize with Luke's musings. I'd like to see this method go away. I like's Luke's suggestion on a field. But going towards immutable is a bit much IMO.

*
* @see CloudSolrClient#isUpdatesToLeaders
*/
public boolean shouldSendToLeaders() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I question that we want this. Maybe this pre-dated org.apache.solr.common.params.ShardParams#SHARDS_PREFERENCE where you can accomplish the same thing. Ideally this method goes away (and goes away from AbstractUpdateRequest).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this PR, as I understand it, is to clean up a lot of the String pattern matching, casting and instanceof checks from our client implementations. And the whole "is this an update request and should I route it to leaders" logic is a big source of those issues.

If you don't like this approach @dsmiley , can you be more explicit about what alternative path you'd like Jude to take? Are you suggesting this PR not touch that logic at all? Are you suggesting we remove it altogether in favor of UpdateRequest setting a shards.preference value, or something similar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have clarified that my comment wasn't a critique of this PR. The presence of this boolean seemed to somewhat get in the way of this PR (?) so I chose this moment to share my thoughts. Feel free to ignore!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the author has made this method more prominent by elevating to SolrRequest, so my feedback is relevant & timely. I don't like this method here at all (or anywhere).

Are you suggesting we remove it altogether in favor of UpdateRequest setting a shards.preference value, or something similar?

Exactly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this closer just now. shards.preference routes the entire request whereas IsUpdateRequest.isSendToLeaders is about each document in the request, individually routing it. So not the same. Any way, I don't think it makes sense for this to be on SolrRequest generally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way, I don't think it makes sense for this to be on SolrRequest generally.

@dsmiley is this more "sharing your thoughts", or do you want Jude to address this in some way? Just want to double-check so there's no ambiguity.

I agree with you that UpdateRequest should probably re-use shards.preference instead of offering additional methods to impact routing. (Or perhaps - maybe UpdateRequest offers a few syntax-sugar methods to set common preferences, but those operate by modifying shards.preference rather than maintaining a separate boolean?).......but all of that feels out of scope in a PR that was proposed initially as a refactor?

If you feel strongly about the method being out of place on SolrRequest, maybe the best approach for now is to file a follow-up ticket, and to hold off touching the UpdateRequest stuff at all in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel strongly about the method being out of place on SolrRequest

Very much so.

[-1] to ShardRequest.shouldSendToLeaders since it's usage is actually only for updates; isn't generally controlling all requests (nor should it). We could just leave Solr as it is with regards to this boolean in UpdateRequest & IsUpdateRequest. Maybe there are other possible solutions. Maybe it doesn't need to exist if users will configure CloudSolrClient with the setting and not change it; not sure.

@dsmiley dsmiley requested a review from gerlowskija March 7, 2025 20:29
Copy link
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking great overall, though I agree with some of the other commenters that we should come up with an alternative to the SolrRequest.getBaseRequestType method.

Other than that, only a few minor comments inline.

* SolrRequest#getRequestType}. Note that changing request type can break/impact request routing
* within various clients (i.e. {@link CloudSolrClient}).
*/
protected SolrRequestType getBaseRequestType() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does seem to complicate the interface a bit.

I'd be fine with the "field" approach that Luke suggested. Or alternately, could this be solved without even a "field" if getRequestType below was non-final and subclasses could implement it as needed?

*
* @see CloudSolrClient#isUpdatesToLeaders
*/
public boolean shouldSendToLeaders() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this PR, as I understand it, is to clean up a lot of the String pattern matching, casting and instanceof checks from our client implementations. And the whole "is this an update request and should I route it to leaders" logic is a big source of those issues.

If you don't like this approach @dsmiley , can you be more explicit about what alternative path you'd like Jude to take? Are you suggesting this PR not touch that logic at all? Are you suggesting we remove it altogether in favor of UpdateRequest setting a shards.preference value, or something similar?

@@ -84,8 +84,8 @@ public DelegationTokenResponse.Get createResponse(SolrClient client) {
}

@Override
public String getRequestType() {
return SolrRequestType.ADMIN.toString();
protected SolrRequestType getBaseRequestType() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Unrelated to this PR, but does DelegationTokenRequest serve a purpose anymore? SOLR-9200 added it originally in order to support 'hadoop-auth', and now that the "hadoop-auth" module is gone this is only referenced in tests afaict.

Perhaps it could've been removed when @epugh removed hadoop-auth earlier this year?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@risdenk I also at-mentioned you on that point on the related Hadoop removal PR since you are the SME on this (not Pugh).

@@ -55,8 +55,13 @@ public SolrParams getParams() {
}

@Override
public String getRequestType() {
return SolrRequestType.UPDATE.toString();
protected SolrRequestType getBaseRequestType() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] Unrelated to this PR, but it's a shame that DirectXmlRequest isn't a child of either AbstractUpdateRequest or UpdateRequest. Anyone remember why this is by chance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should deprecate DirectXmlRequest. It's unused, untested, and not needed. I'll do so in my deprecation branch/PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already did that:

public class DirectXmlRequest extends CollectionRequiringSolrRequest<UpdateResponse>

I should merge that PR so we can more easily communicate to even each other what things are going away so we shouldn't spend attention on

*/
public SolrRequestType getRequestType() {
String path = getPath();
if (path != null && CommonParams.ADMIN_PATHS.contains(path)) {
Copy link
Contributor

@dsmiley dsmiley Mar 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part is troubling IMO; I think we should just return the field of the requestType. Therefore each subclass needs to provide it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this is necessary at the moment is because of this pattern, which is present in several unit tests across the codebase (example from TestReplicationHandler):

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("action", "reload");
params.set("core", core);
params.set("qt", "/admin/cores");
QueryRequest req = new QueryRequest(params);

Doing pattern matching on getPath allows this type of path manipulation to "just work" with CloudSolrClient, at the cost of introducing more complexity into SolrRequest. I considered going through and changing each instance of this pattern across all unit tests, but I felt like it might expand the scope of the PR a bit too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those unit tests are then "wrong". QueryRequest is only for a query, not requests generally. It's parameterized response type is QueryResponse holding all the interesting information from /select. Note that, unfortunately, some classes in Solr include the word "Query" when it shouldn't but QueryRequest isn't one of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see "qt" makes this harder. I hate it; I almost removed it completely as one of my first big actions as a committer, forever ago, but I went half-way by removing its processing server-side. It's still present in SolrJ as you see here; sigh. I'd love to see this gone. I know it's a convenience for tests that want a SolrParams to be sort of all-encompassing about the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any "real" risks of mis-classification of the request type in SolrJ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some pattern matching is necessary here to make the APIs work as intended. Moving that logic to SolrRequest seemed like a good idea when I first made this change, but I am kind of second-guessing that now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could sympathize if my suggestion increases the scope of the PR but I still think we shouldn't play games in this method.
What could help is having request constructors that don't take the request type but insist the path is obviously admin, so we default to admin or otherwise throw IllegalArgumentException. That would cut down on many changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[re: 'qt' param] I almost removed it completely as one of my first big actions as a committer

It's been on my list for awhile, as it's caused problems with the v2 API as well. We really should nuke it across the board...

The reason this is necessary at the moment is because of this pattern, which is present in several unit tests

I think this pattern is mostly a result of gaps in SolrJ. We didn't offer a "reload core" SolrRequest implementation at some point, so a test-author hacked around it by forcing the request into a QueryRequest. But I wonder how many of those gaps have closed up over time.

We have a reload-core SolrRequest now, on the v2 side at least: org.apache.solr.client.solrj.request.CoresApi.ReloadCore. That should be safe to use in tests. And even where there's still a SolrJ gap we should still be able to replace could these QueryRequest mis-uses with GenericSolrRequest I think?

Is there's not that many instances of the pattern, maybe that's a path forward as well? (Though if it's too much scope creep, I'd also understand that...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go ahead and make the changes to the tests, I agree that it seems like the best path forwards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of "qt": I filed SOLR-17715 to remove it for SolrJ 10.

@@ -38,7 +38,8 @@ public class SolrPing extends CollectionRequiringSolrRequest<SolrPingResponse> {

/** Create a new SolrPing object. */
public SolrPing() {
super(METHOD.GET, CommonParams.PING_HANDLER);
// this request is not processed as an admin request
super(METHOD.GET, CommonParams.PING_HANDLER, SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADMIN

*
* @see CloudSolrClient#isUpdatesToLeaders
*/
public boolean shouldSendToLeaders() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the author has made this method more prominent by elevating to SolrRequest, so my feedback is relevant & timely. I don't like this method here at all (or anywhere).

Are you suggesting we remove it altogether in favor of UpdateRequest setting a shards.preference value, or something similar?

Exactly.

@@ -35,11 +35,12 @@ public class LukeRequest extends CollectionRequiringSolrRequest<LukeResponse> {
private Boolean includeIndexFieldFlags = null;

public LukeRequest() {
super(METHOD.GET, "/admin/luke");
// this request is not processed as an ADMIN request
super(METHOD.GET, "/admin/luke", SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agreed to change the classification separately (another PR)

if (request instanceof IsUpdateRequest) {
sendToLeaders = ((IsUpdateRequest) request).isSendToLeaders() && this.isUpdatesToLeaders();
if (request.getRequestType() == SolrRequestType.UPDATE) {
sendToLeaders = request.shouldSendToLeaders() && this.isUpdatesToLeaders();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know casting is never elegant but shouldSendToLeaders() doesn't belong on SolrRequest base class

@@ -30,17 +30,12 @@ public AbstractSchemaRequest(METHOD m, String path) {
}

public AbstractSchemaRequest(METHOD m, String path, SolrParams params) {
super(m, path);
super(m, path, SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again; lets not change that in this PR

@@ -39,7 +39,7 @@ public class FieldAnalysisRequest extends CollectionRequiringSolrRequest<FieldAn

/** Constructs a new FieldAnalysisRequest with a default uri of "/fieldanalysis". */
public FieldAnalysisRequest() {
super(METHOD.GET, "/analysis/field");
super(METHOD.GET, "/analysis/field", SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep as QUERY

@@ -50,7 +50,7 @@ public GenericSolrRequest(METHOD m, String path) {
* @param params query parameter names and values for making this request.
*/
public GenericSolrRequest(METHOD m, String path, SolrParams params) {
super(m, path);
super(m, path, SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should overload the constructor and encourage specifying this param. II could see defaulting to ADMIN only if the path starts with "/admin" and otherwise throwing an IllegalArgumentException, thus compelling users to be explicit while also benefiting from the implied ADMIN from the path when that's commonly the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see defaulting to ADMIN only if the path starts with "/admin" and otherwise throwing an IllegalArgumentException

+1 to the additional ctor so that callers can specify their own request-type as desired, but I dislike the suggestion around conditionally defaulting to "ADMIN", for two reasons.

  1. Some built-in paths containing "admin" are emphatically NOT "ADMIN" APIs; /admin/security being a particularly thorny example. And the assumption gets even rockier when you start considering users who might want GenericSolrRequest to hit their custom endpoints. The whole purpose of the class is to be generic - making assumptions about the type of request (beyond "UNSPECIFIED") seems shaky IMO.
  2. One of the main purposes of this JIRA/PR is to get rid of path-inspection and string comparison, which tends to be brittle. Let's not just shift it to a different place!

@@ -50,7 +50,7 @@ public class V2Request extends SolrRequest<V2Response> implements MapWriter {
private ResponseParser parser;

private V2Request(METHOD m, String resource, boolean useBinary) {
super(m, resource);
super(m, resource, SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep ADMIN

*/
public SolrRequestType getRequestType() {
String path = getPath();
if (path != null && CommonParams.ADMIN_PATHS.contains(path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could sympathize if my suggestion increases the scope of the PR but I still think we shouldn't play games in this method.
What could help is having request constructors that don't take the request type but insist the path is obviously admin, so we default to admin or otherwise throw IllegalArgumentException. That would cut down on many changes.

@@ -36,7 +36,7 @@ public class ConfigRequest extends CollectionRequiringSolrRequest {
protected final String message;

public ConfigRequest(String message) {
super(SolrRequest.METHOD.POST, "/config");
super(SolrRequest.METHOD.POST, "/config", SolrRequestType.UNSPECIFIED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep ADMIN

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants