Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: boiler plate code for PyPickle to support Enums and multithreaded pools #4465

Open
attack68 opened this issue Aug 21, 2024 · 3 comments

Comments

@attack68
Copy link
Contributor

attack68 commented Aug 21, 2024

As context, I rarely use pickle directly in Python, but apparently I have been using it indirectly in the below code:

from multiprocessing import Pool
func = partial(other_func, **kwargs)
p = Pool(NUM_CPUS)
results = p.map(func, some_list)

When my project was pure Python this went undetected. When I started porting some of my classes to Rust with PyO3 all of my tests passed except ones involving the above lines. Why? Becuase my [pyclass] objects were not picklable, apparently.

Thus, for someone not at all au fait with pickling, this lead me on a rather blind hunt. However, I came across a very useful issue on this board, and implemented that code.

Later, my [pyclass] become more complex and fields used to construct them also require enum also labelled as [pyclass]. A simple example which works directly in Rust as MyEnum::Variant1 and Python as MyEnum.Variant1 is:

#[pyclass(module = "mymodule.rs")]
#[derive(Copy, Clone, Debug, Eq, PartialEq, Serialize, Deserialize)]
pub enum MyEnum {
    Variant1,
    Variant2,
    Variant3,
}

My method for pickling broke becuase simple enum do not have, or naturally need, a constructor method. To solve this I have used the following structure:

#[pymethods]
impl MyEnum {
    // Pickling
    #[new]
    fn new_py(variant: u8) -> PyResult<MyEnum> {
        match ad {
            0_u8 => Ok(MyEnum::Variant1),
            1_u8 => Ok(MyEnum::Variant2),
            2_u8 => Ok(MyEnum::Variant3),
            _ => Err(PyValueError::new_err("unreachable code on MyEnum pickle."))
        }
    }
    fn __setstate__(&mut self, state: Bound<'_, PyBytes>) -> PyResult<()> {
        *self = deserialize(state.as_bytes()).unwrap();
        Ok(())
    }
    fn __getstate__<'py>(&self, py: Python<'py>) -> PyResult<Bound<'py, PyBytes>> {
        Ok(PyBytes::new_bound(py, &serialize(&self).unwrap()))
    }
    fn __getnewargs__<'py>(&self) -> PyResult<(u8,)> {
        match self {
            MyEnum::Variant1 => Ok((0_u8,)),
            MyEnum::Variant2 => Ok((1_u8,)),
            MyEnum::Variant3 => Ok((2_u8,)),
        }
    }
}

I don't really want this code in my project. Its complicated (for me) and excessive when applied across all my [pyclass], and it is only needed for that pool multithreading. Issue raised to provoke any discussion if any of this can be abstracted away...

@ngoldbaum
Copy link
Contributor

I think having some kind of support for generating pickle protocol boilerplate makes sense (via a macro?). There are obviously types where that's not possible so pyclass can't always do it, but for most plain old data types I think it's reasonable for users to opt into to PyO3 to doing it automatically.

@attack68
Copy link
Contributor Author

I think even the smallest amount of auto-implements will be helpful if there is a doc section on what it does and how one can modify/alter it to make it work for their specific case. Currently searching "pickle" in Pyo3 docs gets no results. 'Tis possible I might be able to help out with the docs if there is progress made on the auto-implement.

@davidhewitt
Copy link
Member

I would definitely be positive on making progress on supporting pickle; it is one of our oldest open issues (#100). As I see it, there are two main challenges associated with it:

  • The pickle protocol requires the __module__ of the #[pyclass] to be set correctly so that it can be imported upon unpickling. With the declarative #[pymodule] we are a lot better at this (still not perfect), so this might now be good enough. This was a major blocker to automatic support in the past, in my opinion.
  • We need to decide how to support the pickle protocol. It needs either:
    • __getstate__ and __setstate__
      • We have to decide how to serialize the #[pyclass] "state". I think there is no general solution here to automatically decide on serialization without user input, but we might be able to do it automatically in the special case where all fields of the struct can be converted to Python objects.
      • There is a question around the soundness of the mutation in __setstate__, e.g. for frozen clasess.
    • getnewargsandnew(aka#[new]` in PyO3)

In general, I'm more optimistic for __getnewargs__ and __new__. I'm very open to both solutions; mostly this just needs research and hasn't been highest priority item for me yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants