Skip to content

Conversation

@HarrisonMc555
Copy link
Contributor

The iter method returns an iterator that returns each key the appropriate number of times. The new iter_counts method returns an iterator that returns tuples that contain the key and the number of times it appears.


This implements another feature referenced in #17.

Let me know what you think of the name/whatever.

The `iter` method returns an iterator that returns each key the appropriate
number of times. The new `iter_counts` method returns an iterator that returns
tuples that contain the key and the number of times it appears.
@mashedcode
Copy link
Collaborator

If one wants iter_counts why would they not just use HashMap? I guess I'm just interested in your use-case.

@HarrisonMc555
Copy link
Contributor Author

I ran into this crate when I was trying to keep track of the factors for a number. Conceptually, the order didn't matter and I wanted constant-time lookup for the factors. However, there also needed to be duplicates.

I could have constructed the HashMap manually, but it felt more natural to have a data structure that implemented that relationship in a natural way.

However, I then needed to loop through the number of times each factor appeared, and so I needed an iter_counts method.

I'll admit that it may not be the most common occurrence, but I think that it would be nice to have. I'm not in love with the naming scheme, but it was the best I could come up with at the time and seemed to make sense to me...

@mashedcode
Copy link
Collaborator

The python multiset also also implements an equivalent to iter_counts which they call items. Calling it something with iter_ is probably a good idea you had since we're in rust. On the other hand in the C++ std::multiset there's no such thing other than a counts method that counts the number of occurrences of a certain element.

/// keys_with_counts.sort(); // Order is arbitrary
/// assert_eq!(keys_with_counts, vec![(&0, 2), (&1, 1)]);
/// ```
pub fn iter_counts(&self) -> IterCounts<K> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be done with less code. Something like this should work:

pub fn iter_counts(&self) -> impl Iterator<Item = (&K, usize)> {
    self.elem_counts.iter().map(|(key, &counts)| (key, counts))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true. The general practice for the standard library has been to provide a concrete struct to return. However, returning an impl trait is a relatively recent change, so that may be considered a new best practice. I'd be willing to go that way. It may be worth considering for the sake of consistency, since we do need custom structs for several of the other methods (i.e. UnionCounts, etc.).

@HarrisonMc555
Copy link
Contributor Author

I think there should be some way of iterating through the multiset that is O(keys) instead of O(keys×counts). That could be an iter_counts, like proposed here, or an iter_values that only yields the values.

I forgot, we already have a distinct_elements that returns just the values/elements. I guess that gets us most of what we need.

The benefit to the iter_counts method over the distinct_elements method is that if you only had distinct_elements, you would have to do something like this multiset.distinct_elements().map(|elem| (elem, multiset.get(elem)).unwrap()) to recreate like iter_counts.

Never mind. With the HashMultiset, it would just be multiset.distinct_elements().map(|elem| (elem, multiset.count_of(elem))). I was going to say that you could avoid calls to unwrap, but since we're returning a usize (not an Option<T>) this works a lot better.

Well, considering the solution is fairly simple, I would be ok with not having this. It would be consistent with the union_counts and intersection_counts implementations if those get accepted from #22, but either way it's fine if this doesn't make it in.

The equivalent approach for anyone who Googles this in the future:

multiset.distinct_elements().map(|elem| (elem, multiset.count_of(elem)))

@ma2bd
Copy link

ma2bd commented May 3, 2021

Hi. count_of certainly works but I can't help noticing that it costs an additional hashmap access per element. I would personally argue that an HashMultiSet<K> is fundamentally also an HashMap<K, usize> : many operations on multisets (add, sub, union, intersection etc) can be seen as a pointwise (possibly lossy) operation on the map values (+, checked_sub, max, min).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants