-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow sorting keys on to_json and to_python by passing in sort_keys #1637
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
CodSpeed Performance ReportMerging #1637 will not alter performanceComparing Summary
|
07a31f5
to
7222c8d
Compare
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main issue is perf regression
false, | ||
false, | ||
false, | ||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhewitt this is getting gross 😢
src/serializers/fields.rs
Outdated
// NOTE! we maintain the order of the input dict assuming that's right | ||
for result in main_iter { | ||
let (key, value) = result?; | ||
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition is that this extra Vec being created is the cause for the slowdown in benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the function out, so we can re-use this. and avoid creating new variable when sort keys is false
tests/serializers/test_model.py
Outdated
'field_a': core_schema.model_field(core_schema.bytes_schema()), | ||
'field_b': core_schema.model_field(core_schema.int_schema()), | ||
'field_a': core_schema.model_field(core_schema.bytes_schema()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these have to change? Is this to make the tests change? I would prefer a new self-contained test just for this bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seperated this.
I separated the test out, I also refactor the functions so we can reuse when Let me know what else I need to improve. Thanks |
please review, not sure how I can take it from here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted']
or something like that.
src/serializers/fields.rs
Outdated
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; | ||
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're already collecting into a Vec here I think you should do something like this:
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; | |
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string()); | |
let mut items = main_iter.map(|k,v| (key_str(key), v)).collect::<PyResult<Vec<_>>>()?; |
Might need to collapse two levels of Result or something like that.
Then you'll have to convert back to the original type. I think this will be the fastest approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I kept the original implementation. Didn't quite get your advice, still quite new to Rust 😓
But added recursive mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first thing process_field_entry_python
does is call key_str
:
pydantic-core/src/serializers/fields.rs
Line 234 in 95f9329
let key_str = key_str(key)?; |
So I think you can just convert to it from the get go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated to convert it once and pass to the function.
0cf4b6d
to
95f9329
Compare
Added different sort mode as above, updated the PR description. |
please review 👍 |
95f9329
to
c83a212
Compare
please review 👍 |
@@ -305,6 +305,7 @@ class SchemaSerializer: | |||
exclude_unset: bool = False, | |||
exclude_defaults: bool = False, | |||
exclude_none: bool = False, | |||
sort_keys: Literal['recursive', 'top-level', 'unsorted'] = 'unsorted', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need these three options? surely if we add this feature it should match python's json.dumps
behaviour and be sort_keys: bool = False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are scenarios where you'd want to only sort the top level, e.g. the trick we were doing with our JSON data in FusionFire.
But I also agree that if no one is asking for it we shouldn't implement extra stuff.
How does it sound if for now we do sort_keys: bool = False
and then if needed in the future we can do sort_keys: bool | Literal['top-level'] = False
.
We reduce the complexity and make a nicer signature for now but can still add the top-level option in the future, even if it makes the signature a bit weird.
@samuelcolvin @aezomz wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, that's what we've done for previous flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me check the feature for sort_keys on json.dumps.
I only want to do sort_keys
on model_dump_json
thats why am here 😆
Let me try to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @aezomz, you are being incredibly patient and helpful 🙏🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Adrian, please review when ur free. I think David fixed the wasm build test already.
c83a212
to
0075a4b
Compare
please review 👍 There is this test that is failing, unclear if its related to my change. Let me know any other places i can optimize. Thanks! |
dce2689
to
5ce696c
Compare
Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense?
Since I have been trying to do model_dump_json with sort keys too.
This feature should simulate the same as how we use
json.dumps(data, sort_keys=True)
Take note that
field_d
is extra and still manage to sortPlease let me know if I miss out any other features that
sort_keys=True
is suppose to do!Thanks!
Change Summary
allow sorting keys on to_json and to_python by passing in sort_keys
Related issue number
should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.
Checklist
pydantic-core
(except for expected changes)Selected Reviewer: @davidhewitt