-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems with numpy.load and allow_pickle #85
Comments
To avoid this type of bug we should probably avoid using the |
I find it very peculiar that they changed the default for |
Indeed. And it is very problematic for backwards compatibility... (It does sound like it's a good idea not to allow_pickle, if it would be able to run arbitrary code!) |
I think the advantage of saving the files with pickle enabled is it compresses them, which is still desirable. |
So I'm not sure what we should do long term. I think it would be good to add an option so the user can pick, and for the moment the default behaviour should be backwards compatible. Maybe we should change the default behaviour in the future though. |
Long term we should use HDF5 by default, with the option for numpy to preserve backward compatibility (this depends on finishing #82). |
The bug was fixed by #111, restoring intended behaviour, but we may still need to implement a long term solution. |
@scottclowe , we can close this now with your changes to the caching backend right? |
I have not worked on anything which fixes the potential pickle vulnerability. If we are happy to allow this vulnerability to persist on the basis that users are only unpickling their own cache so they are not exposing themselves to any external threats, you can close this on the basis that it was resolved by #111. |
Expected behavior
Whenever an experiment is defined, it should check if a previous analysis was run and if there is data to be loaded. It should then show the message 'Reloading previously prepared data...'.
Actual behaviour
Even if previous analysis data exists, defining a new experiment now always redoes all of the analysis.
Reason
Currently if numpy.load is called, if allow_pickle is not set to True it will throw an error. In our current code this means that whenever an experiment is defined, all of the analysis will be re-done instead of loaded from the previously generated files. The easy fix is to just add 'allow_pickle=True' as an option.
This seems to be a new feature, and it's there for security reasons:
So we perhaps should think about fixing it in another way, without somehow breaking backwards compatibility with previously stored files!
The text was updated successfully, but these errors were encountered: