Implement partial_sort_unstable for slice #149318
Branch updated from 7af04ad to 115ac5c.
Branch updated from 115ac5c to 0e87d5d.
cc @orlp
Left some remarks, some on style, but also some with substance.
Besides comments on the code that's written, I do note a lack of tests?
Doc tests cover most branches. I couldn't find a dedicated file covering its cousin
The examples can change at any time. And you didn't test, for example, the post-condition that all elements
Thanks, and yes. Do you know where the unit tests of
I believe the bulk is found in https://github.com/rust-lang/rust/blob/main/library/alloctests/tests/sort/tests.rs.
What I suggested in the ACP was a sketch implementation. I did some more thinking, and I think the following handles all corner cases nicely:

```rust
pub fn partial_sort<T, F, R>(mut v: &mut [T], range: R, is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
    R: RangeBounds<usize>,
{
    let len = v.len();
    let Range { start, end } = slice::range(range, ..len);

    if end - start <= 1 {
        // Can be resolved in at most a single partition_at_index call, without
        // further sorting. Do nothing if it is an empty range at start or end.
        if start != len && end != 0 {
            sort::select::partition_at_index(v, start, is_less);
        }
        return;
    }

    // Don't bother reducing the slice to sort if it eliminates fewer than 8 elements.
    if end + 8 <= len {
        v = sort::select::partition_at_index(v, end - 1, is_less).0;
    }
    if start >= 8 {
        v = sort::select::partition_at_index(v, start, is_less).2;
    }
    sort::unstable::sort(v, is_less);
}
```

And to formalize the post-conditions, I think the following should hold after a call to

```rust
for i in 0..b {
    for j in b..n {
        assert!(v[i] <= v[j]);
    }
}
for i in 0..e {
    for j in e..n {
        assert!(v[i] <= v[j]);
    }
}
for i in b..e {
    for j in i..e {
        assert!(v[i] <= v[j]);
    }
}
```
A lot of those individual comparisons are implied by transitivity of the ordering, so the check can be reduced to taking the maximum of the prefix (if any) and the minimum of the suffix (if any), and then asserting that the concatenation is sorted. Informally:

```rust
let max_before = v[..b].iter().max().into_iter();
let sorted_range = v[b..e].iter();
let min_after = v[e..].iter().min().into_iter();
let seq = max_before.chain(sorted_range).chain(min_after);
assert!(seq.is_sorted());
```

That's pretty much what you said in rust-lang/libs-team#685 (comment), just using transitivity of the comparison. Without assuming transitivity, the implementation couldn't guarantee the universally quantified property anyway.
Branch updated from f9a09e0 to 372589e.
Pushed a new implementation. I'm writing tests, but perhaps we'd have a new mod under
cc @Amanieu for an early review of the direction, and for advice on where to organize the tests.
Branch updated from 6ef6ab4 to 10d053f.
Branch updated from 10d053f to 43fc006.
Added some pattern tests; they can be extended later.
cc @tgross35, can you take a look here?
I have a few requests, mostly stylistic. I think the tests could use some improvements but I'm not sure what would be reasonable - @orlp would you mind providing some suggestions here? (In general, I'm happy to defer this review to you)
Reminder: once the PR becomes ready for a review, use
I think the overall structure of the tests looks fine; some things are just missing.
Thanks for your comments, @orlp and @tgross35! One comment inline: #149318 (comment). The rest SGTM. I'll integrate them in a few days and re-request a review :D
All comments except the tests are resolved. I'll handle the test additions and reorganization later today.
Signed-off-by: tison <[email protected]>
Co-Authored-By: Orson Peters <[email protected]>
Branch updated from cbc2c7b to e2bf02f.
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Branch updated from e2bf02f to 524fa92.
@rustbot ready |
One small request, then LGTM, unless @orlp has anything else. Please squash the commits (unfortunately, we don't have a way to do it automatically).
```rust
// A heuristic factor to decide whether to partition the slice or not.
// If the range to sort is almost the whole slice, it's not worth
// partitioning the slice first.
const MAX_LEN_ALWAYS_INSERTION_SORT: usize = 8;
// Avoid partitioning the slice when it eliminates only a few elements to sort.
// The threshold of 8 elements was determined empirically.
let mut v = v;
```
These comments can be merged. "Empirically" is still pretty vague; could you add a bit more detail here? For example, "The threshold of 8 showed the best performance when benchmarking partial sorts of random slices at random ranges on an x86-64 machine" would be helpful.
Perhaps @orlp knows more about where 8 comes from. I treat it as an arbitrary but reasonably small number, on the reasoning that:
If the range to sort is almost the whole slice, it's not worth partitioning the slice first.
Agree that the second comment can be deleted now.
Suggested change:

```diff
 // A heuristic factor to decide whether to partition the slice or not.
 // If the range to sort is almost the whole slice, it's not worth
 // partitioning the slice first.
 const MAX_LEN_ALWAYS_INSERTION_SORT: usize = 8;
-// Avoid partitioning the slice when it eliminates only a few elements to sort.
-// The threshold of 8 elements was determined empirically.
 let mut v = v;
```
I just estimated the 8 from experience, there isn't anything concrete to back it up. A different number might very well be better but it definitely is better than not making an exception.
By the way, this shouldn't be called MAX_LEN_ALWAYS_INSERTION_SORT; I'm not sure why that name was chosen. If you do a partial sort of 0..1_000_000 - 2 on a one-million-element array, this exception also triggers and we just sort the whole array, but definitely not with insertion sort.
I'd suggest PARTITION_THRESHOLD.
This refers to #149046.
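To make the threshold discussion concrete, here is a hedged sketch of just the two trimming decisions from the proposed algorithm, using the name `PARTITION_THRESHOLD` suggested above. It assumes `end - start >= 2` (empty and single-element ranges return earlier), and `reduction_plan` is a hypothetical helper that only reports which partitioning steps would run; it does not sort anything.

```rust
/// Name suggested in the review; 8 is the estimate from the thread,
/// not a benchmarked value.
const PARTITION_THRESHOLD: usize = 8;

/// Hypothetical helper: which pre-partitioning steps run for `start..end`
/// on a slice of length `len`, assuming `end - start >= 2`.
/// Returns `(partition_at_end, partition_at_start)`.
fn reduction_plan(len: usize, start: usize, end: usize) -> (bool, bool) {
    // Partition at `end - 1` only if the suffix `v[end..]` has at least 8 elements.
    let partition_at_end = end + PARTITION_THRESHOLD <= len;
    // Partition at `start` only if the prefix `v[..start]` has at least 8 elements.
    let partition_at_start = start >= PARTITION_THRESHOLD;
    (partition_at_end, partition_at_start)
}
```

For the case above, `reduction_plan(1_000_000, 0, 1_000_000 - 2)` returns `(false, false)`: neither step fires, so the whole slice goes to the full unstable sort, which is why a name mentioning insertion sort is misleading.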