-
Notifications
You must be signed in to change notification settings - Fork 692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17635: unmarshalling Map to SimpleOrderedMap if key i of type St… #3163
base: main
Are you sure you want to change the base?
SOLR-17635: unmarshalling Map to SimpleOrderedMap if key i of type St… #3163
Conversation
} | ||
return m; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley FYI, this is marked as drafted.
We cannot simply create a SimpleOrderedMap since the key of the Map can also me something other than a String e.g. an Integer. The only way I can think of to overcome that is to read the first key and then do an instanceof check on it. Pretty ugly considering I have to use a raw Map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man I didn't think of that! Let's say we treat keys as a String no matter what. I'm suspicious that our response formats would ever have a map whose key isn't a string. Maybe add an assert if the key isn't already a String so that we can more readily identify problems. (not sure if may read CharSequence but we can consider that as a String).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just casting the key to a String, or having an assert on it, will fail three unit test of TestJavaBinCodec. E.g. the .bin used in the tests - org.apache.solr.common.util.TestJavaBinCodec#SOLRJ_JAVABIN_BACKCOMPAT_BIN - holds a Map with Integer as key. So you are saying that this is not representing a realistic case and had to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; consider this a hypothesis to be tested (by the results of the totality of Solr's test suite).
int sz = readVInt(dis); | ||
return readMap(dis, sz); | ||
|
||
if (EnvUtils.getPropertyAsBool("solr.solrj.javabin.mapAsNamedList", true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this setting on every single map decode is quite bad for performance. This setting should be a field on the codec, and that which could be toggled with a setter (if useful in a test).
} | ||
return m; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man I didn't think of that! Let's say we treat keys as a String no matter what. I'm suspicious that our response formats would ever have a map whose key isn't a string. Maybe add an assert if the key isn't already a String so that we can more readily identify problems. (not sure if may read CharSequence but we can consider that as a String).
@@ -293,7 +306,7 @@ public void testForwardCompat() throws IOException { | |||
for (int i = 1; | |||
i < currentFormatBytes.length; | |||
i++) { // ignore the first byte. It is version information | |||
assertEquals(newFormatBytes[i], currentFormatBytes[i]); | |||
assertEquals("for i:" + i, newFormatBytes[i], currentFormatBytes[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rewriting the .bin file with the changed output from org.apache.solr.common.util.TestJavaBinCodec#generateAllDataTypes tests like org.apache.solr.common.util.TestJavaBinCodec#testBackCompat are passing now. But this one testForwardCompat which compares all the bytes fails since the bytes are not the same anymore.
Is it possible I did something wrong when recreated the bin-file?
So the object in the unmarshalle version are the same, but the bytes are for some reason not.
Is just took the output from generateAllDataTypes(), called javabin.marshall(matchObj, output) with it and wrote the result via a FileOutputStream to a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestJavaBinCodec should test without this setting. This is another reason to make this a setter or something, which is clearer than a system property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the tests in TestJavaBinCodec do not need to pass with javabin.setMapAsNamedList=true even though it will be true by default? I would have thought all these test need to pass in order to ensure we do not break anything by changing to SOM.
In this case, there is also no need to update /solrj/javabin_backcompat.bin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Of course there needs to be some test of this specific change being added. Just one unit test is fine. If that seems lacking, remember that the totality of Solr tests will also be running with this by default, and thus if there's an incompatibility we'll probably see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley
We have now 9 failing tests, here my findings:
TestFastJavabinDecoder
testSimple-> the assert on line 136 fails due to the JSON with a SOM in it has some extra line breaks
TestUpdateRequestCodec
testBackCompat4_5() -> failing because /solrj/updateReq_4_5.bin" has a Map with SolrInputDocument as Key
testStreamableInputDocFormat -> because in JavaBinUpdateRequestCodec.StreamingCodec#readOuterMostDocIterator line 315 instanceOf NamedList is now true, and it is checked before instanceof Map
ClusterStateProviderTest
6 tests are failing e.g. testClusterStateProvider -> assertThat on line 262 fails. ClusterState is using org.apache.solr.common.util.Utils to deserialize the Json, which creates a LinkedHashMap and clusterStateHttp has now a SOM in it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestUpdateRequestCodec: dunno what to make of that without debugging; hopefully you can make recommendations. Not surprised to see this kind of issue where some code is doing instanceof checks in a certain order, written at a time when no NamedList was a Map but now a prominent (and even most popular, I'd say) is actually one. You might want to search the codebase for instanceof checks against NamedList to see if there are other areas to improve. I have some WIP for Utils.java
ClusterStateProviderTest: looks like just needs to convert both maps to the same type (e.g. wrap in HashMap) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClusterStateProviderTest: looks like just needs to convert both maps to the same type (e.g. wrap in HashMap
Where would I do that? There is really a lot happening in those tests and I still have not managed to fully wrap my head around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks basically ready!
Suggested 9.9 CHANGES.txt in "Other"
SOLR-17635: SolrJ "javabin" format now reads all maps as SimpleOrderedMap (a NamedList, implements Map) instead of LinkedHashMap for better compatibility as Solr changes some NamedLists to Map/SimpleOrderedMap.
This is also worthy of upgrade notes, which would go in a 9.9 section on major_changes_in_solr_9.adoc
(maybe named differently) where you can point out the system property that will toggle this behavior.
Object key = readVal(dis); | ||
assert key instanceof String; | ||
Object val = readVal(dis); | ||
entries.put((String) key, val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entries.put((String) key, val); | |
entries.add((String) key, val); // using NL.add() since key won't repeat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an O(N^2)
performance bug, by the way 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I am glad at least one of us knows what we are doing here;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It underscores my concern about making SimpleOrderedMap mutable. I still question it.
@@ -1436,4 +1459,8 @@ public void close() throws IOException { | |||
daos.flushBuffer(); | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO javadocs.
@@ -409,12 +419,17 @@ private static byte[] getBytes(Object o, boolean readAsCharSeq) throws IOExcepti | |||
} | |||
} | |||
|
|||
private static Object getObject(byte[] bytes) throws IOException { | |||
private static Object getObject(byte[] bytes, boolean mapAsNamedList) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need these getObject anymore, I think
@@ -290,7 +290,7 @@ public static Object fromJSON(byte[] utf8) { | |||
} | |||
|
|||
public static Object fromJSON(byte[] utf8, int offset, int length) { | |||
return fromJSON(utf8, offset, length, STANDARDOBJBUILDER); | |||
return fromJSON(utf8, offset, length, SIMPLEORDEREDMAPOBJBUILDER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this breaks another 17 tests.
@dsmiley Do you think it is OK to run the test ClusterStateProviderTest with solr.solrj.javabin.mapAsNamedList=false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; we should understand why the test or underlying functionality are breaking and then make decisions based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at a test case ClusterStateProviderTest.testClusterStateProviderEmptySolrVersion, Http2ClusterStateProvider is using JavaBinCode to deserialize, hence we get now a SOM, ZkClientClusterStateProvider is using JSON and usees the Utils.fromJson to deserialize, and it gets a LinkedHashMap. In this test it expects the same data structure from both Http2ClusterStateProvider and ZkClientClusterStateProvider, hence it's failing now.
If I change the Utils to return SMO instead of a LinkedHashMap, as I did in this change, all the tests in ClusterStateProviderTest pass, but it breaks at least 17 other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsmiley How do we continue on this issue? Based on what I've described above, it seems fine to me to run these tests with solr.solrj.javabin.mapAsNamedList=false. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LMK if you need my assistance to look at some key failures and make heads or tails out of what to do.
writeMap(name, (MapWriter) val); | ||
} else if (val instanceof ReflectWritable) { | ||
writeVal(name, Utils.getReflectWriter(val)); | ||
} else if (val instanceof MapSerializable) { | ||
} else if (val instanceof MapSerializable && !(val instanceof SimpleOrderedMap)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this line? SOM isn't MapSerializable
writeMap(name, (MapWriter) val); | ||
} else if (val instanceof ReflectWritable) { | ||
writeVal(name, Utils.getReflectWriter(val)); | ||
} else if (val instanceof MapSerializable) { | ||
} else if (val instanceof MapSerializable && !(val instanceof SimpleOrderedMap)) { | ||
// todo find a better way to reuse the map more efficiently | ||
writeMap(name, ((MapSerializable) val).toMap(new LinkedHashMap<>()), false, true); | ||
} else if (val instanceof Map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should simply be ranked higher than MapWriter, where it will then apply to SimpleOrderedMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried rearranging the order before adding the instanceof-checks, but much code, or at least many tests seem to depend on this order.
If I just move up the 'instanceof Map' check as you suggest, I have 42 new tests failing.
@@ -290,7 +290,7 @@ public static Object fromJSON(byte[] utf8) { | |||
} | |||
|
|||
public static Object fromJSON(byte[] utf8, int offset, int length) { | |||
return fromJSON(utf8, offset, length, STANDARDOBJBUILDER); | |||
return fromJSON(utf8, offset, length, SIMPLEORDEREDMAPOBJBUILDER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; we should understand why the test or underlying functionality are breaking and then make decisions based on that.
FYI I'm very eager to examine this in detail but am on vacation this week. |
|
https://issues.apache.org/jira/browse/SOLR-17635
Description
Please provide a short description of the changes you're making with this pull request.
Solution
Please provide a short description of the approach taken to implement your solution.
Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.