Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added progress callback when save_all is used #7435

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

radarhere
Copy link
Member

@radarhere radarhere commented Oct 2, 2023

Resolves #7433

The issue requests the ability to use a callback function to monitor progress when calling save_all. This function is triggered after processing each frame, with the filename of the current image (or None if the filename is not available), the number of frames processed so far, and the total number of frames.

from PIL import Image

def callback(filename, frame_count, n_frames):
    print((filename, frame_count, n_frames))

with Image.open("Tests/images/hopper.gif") as im:
    with Image.open("Tests/images/dispose_bgnd.gif") as im2:
        im.save("out.gif", save_all=True, append_images=[im2], progress=callback)

gives

('Tests/images/hopper.gif', 1, 6)
('Tests/images/dispose_bgnd.gif', 2, 6)
('Tests/images/dispose_bgnd.gif', 3, 6)
('Tests/images/dispose_bgnd.gif', 4, 6)
('Tests/images/dispose_bgnd.gif', 5, 6)
('Tests/images/dispose_bgnd.gif', 6, 6)


assert progress == [
("Tests/images/apng/single_frame.png", 1, 7),
("Tests/images/apng/single_frame.png", 2, 7),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line could use a comment that single_frame.png gets appended to itself in the save() call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

come to think of it, having explicit index there would make adding a comment somewhat redundant

You no longer think this is necessary?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You no longer think this is necessary?

The comment was meant to clarify that the image is repeated because it occurs twice in the input; providing explicit indices makes pointing that out unnecessary.

progress.append((filename, frame_number, n_frames))

Image.new("RGB", (1, 1)).save(out, "PNG", save_all=True, progress=callback)
assert progress == [(None, 1, 1)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe indicate index here (instead of None)? 1 for the 1st input image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be open to the idea of having an index all of the time? Or, we could pass back the image instance, and let the user figure out the index or filename as they see fit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having index always is probably simpler. As for passing the image, it wouldn't be much help with identifying the index if an image is reused in the input (like in that one test… come to think of it, having explicit index there would make adding a comment somewhat redundant).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to switch to an index.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Why did you remove the filename though? I thought you were talking about adding an index, not removing the filename (as my own suggestion was to provide index when the filename isn't available)

Copy link

@LeXofLeviafan LeXofLeviafan Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that (index, filename, frame_number, frame_total) crosses the line from 'neat and tidy' to 'passing back everything to the user just in case something is useful'.

You can use a namedtuple instead (i.e. define Process = namedtuple('Process', "index filename frame n_frames") somewhere global, and pass its value to the progress callback).

And I'd say that the name of the input file being currently processed is a quite appropriate piece of information to be included, as it's the most user-friendly way of showing, well, which file is being processed (whether you're logging your progress to a terminal or displaying it in GUI on top of the progressbar).

Why make the most likely usecase inconvenient? The name of the game shouldn't be "provide as little information as possible and force the calling code to figure out the rest of it every single time", it should aim at the most likely general purpose case (which I imagine more often than not would be logging/displaying something in the lines of Converting: input/image2.png [#2] -> output.png (frame 12 of 84); and the only thing here that would be instantly available for the calling code without complication in any situation would be the output file name, as it's passed in as a raw string and never changes throughout the entire process).

I would have thought it was simple, given that the user just provided these images as the base image and append_images?

Suppose the code that wants to log progress isn't concerned with input at all – it only has a generator that supplies these images; why would it need to implement workarounds like instantiating all values from that generator at once, or wrapping it into a generator object that mucks around with the data being yielded, when all it wants to do is display progress on screen? And like I said, filename is the clearest way possible of identifying an image that was loaded from a file.

append_images are arbitrary PIL.Image.Image

I'm pretty sure these can be obtained with Image.open(), just like the first image.

why not pass just saved im object itself?

I suggested that, but it was pointed out that the image might be provided as an input more than once.

…which is not the case with filenames, I suppose :-)

The index can be necessary in case of duplicate files in the input, but the filename is likely to be necessary for the purpose of displaying the progress in user-friendly manner. (Now passing the entire image object, that would actually be "passing back everything to the user just in case something is useful" 😆)

I don't like passing indices since it's hard to figure out the difference between index and frame_number and it's very simple to write the code which uses index incorrectly.

Incidentally, passing progress status as a namedtuple (or the like) would help with that.

I'm not opposed to dropping index altogether - I think the key benefit of this feature is being able to track how many frames have been completed out of the total number.

Filename and/or index are necessary to show which image this current frame comes from, in case when more than one is on the input – it shouldn't be a stretch to expect it to be necessary information for the vast majority of usecases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced that a filename or image index is necessary, but I've pushed a commit to restore the filename.

>>> from collections import namedtuple
>>> State = namedtuple("State", "a b")
>>> State(2, 4)[0]
2

Because namedtuples still have order, and I'm concerned that there could be different views about what the best order is, or that we might want to add more items in the future, I've used a dictionary instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keys are a bit overly verbose, but that should do the job at least.

Still, I believe having a single fixed class with predefined attributes would make for better API than using free-form dicts in every implementation (not to mention avoiding potential issues if there's a need to make changes at some point in the future). If you're concerned with possibility of using indices to access a namedtuple field (which I honestly find rather unlikely as attribute access is much clearer), you can use a @dataclass instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it would be a good idea to have this getattr(im, "filename", None) thing in one place in the codebase (and if you keep the dict, it would still be a good idea to have one piece of code defining its creation, at least for sake of ensuring there won't be any discrepancies in key names).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to move the creation of the dictionary into a common method.

expected = []
for i in range(42):
expected.append(("Tests/images/iss634.webp", i + 1, 43))
expected.append((None, 43, 43))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, saying that the last frame comes from image #​2 would be more informative than "don't know" – especially when there's multiple images involved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to switch to an index.

@homm homm closed this Oct 2, 2023
@homm homm reopened this Oct 2, 2023
@homm
Copy link
Member

homm commented Oct 2, 2023

An idea: use the same callback for generating frames: pass current im object (which could be an image without bitmap in memory) and check if complete image is returned.

@radarhere
Copy link
Member Author

Sorry, I'm not sure I understand. Are you suggesting this

from PIL import Image

colors_left = ["#f00", "#0f0"]
def callback():
    if colors_left:
        return Image.new("RGB", (1, 1), frames_left.pop(0))

im = Image.new("RGB", (1, 1))
im.save("out.png", save_all=True, progress=callback)

with the intention that it allow multiframe images to be saved without needing all images to be created beforehand and so reduce memory impact?

@homm
Copy link
Member

homm commented Oct 2, 2023

In general, yes. Depending on current code flow, it could be like this, or require some informations about the frames (like size and mode).

@radarhere
Copy link
Member Author

APNG considers the modes of all frames when saving - #6610. I don't know what an elegant way of getting that particular information from the user at the beginning would look like, and I find it hard to imagine that users would expect that they have to know that at the beginning.

assert progress == [
(0, 1, 7),
(1, 2, 7),
(2, 3, 7),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say... if the frames are counted from 1, why count the images from 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm triggering these callbacks when frames are completed, not when they are started.

My intention is for (0, 1, 7) to communicate "We're in the first image. We've completed the first frame, there are seven frames in total".
Then (2, 7, 7) communicates "We're in the third image. We've completed seven frames, there are seven frames in total, so we're done"

If I started counting the frames from 0, the last value would be (2, 6, 7). That looks less to me like save_all is finished.

I recognise this could be a subject of debate.

Copy link

@LeXofLeviafan LeXofLeviafan Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…The question was “Why count the images from 0?”

I.e. why are the images numbered starting from 0? You're numbering frames starting from 1 – it would be consistent to use the same numbering scheme for images.

(And yes, I know that due to how pointer arithmetic works, arrays in C are indexed starting from 0, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages just because; counting things is meant to be started from 1 – that's what the number means in the first place.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to explain that my goal is not to number the frames from 1, but rather to clearly communicate when progress is completed, because the frame count will be compared with the total number of frames. The image count will not.

arrays in C are indexed starting from 0, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages

Python arrays are also indexed from 0.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image count will not.

And yet counting from 0 would be inconsistent with the other counting used within same data set.

Python arrays are also indexed from 0.

Which makes no sense since Python is a high-level language (and there's some consistency issues caused by this arbitrary choice to imitate C array indexing, but that's neither here nor there).

Besides, the images aren't coming from a list as far as the user is concerned (well, there's append_images, but going by that logic you'd have to start from -1 since the first input image is not in this list in the first place… except that -1 is a valid index in Python, so it probably would be None instead).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit naming each piece of information within the returned data. It is now "image_index" and "completed_frames". Hopefully that clarifies that one is the Pythonic zero-based index of the image in progress, and the other is the number of frames that have been finished.

@LeXofLeviafan
Copy link

allow multiframe images to be saved without needing all images to be created beforehand and so reduce memory impact

I imagine if this is ever implemented, it should not be exposed in a workaround-ish way, and it probably should be implemented as a format-specific method if it can only be supported for specific formats (e.g. GifImagePlugin.fromFrames(generator)).

I'm also pretty sure such a feature is out of scope for this particular pull-request.

@homm
Copy link
Member

homm commented Oct 2, 2023

I'm also pretty sure such a feature is out of scope for this particular pull-request.

It out of scope of your needs, but is closely related to current PR since it may or may not share the same API and it is better to discuss it here.

@homm
Copy link
Member

homm commented Oct 2, 2023

APNG considers the modes of all frames when saving

We can restrict modes of generated frames only to the same as the initial image.

@LeXofLeviafan
Copy link

It out of scope of your needs, but is closely related to current PR since it may or may not share the same API and it is better to discuss it here.

You're conflating the concerns of "keeping track of progress" and "generating input data for the operation", based solely on the fact that both are related to "iteration over frames". I don't believe such features should even intersect in their implementation, let alone complicating each other.

In fact, I'm pretty sure that the correct way of implementing the feature you're requesting would be by supporting a generator as a value passed in append_images without immediately converting it to a list (it would be requested to yield a new image after the previous one got processed) – doing it this way would be idiomatic for Python; and the only way it would in any way affect the functionality of this pull-request, is that it would no longer be possible to pre-calculate the number of frames if append_images is a generator.

def to_gif_memheavy (frames, filename):
    first, _frames = Image.open(frames[0]), frames[1:]
    # append_images is a list with pre-loaded data
    first.save(filename, 'GIF', save_all=True, append_images=[Image.open(s) for s in _frames])

def to_gif_memlow (frames, filename):
    first, _frames = Image.open(frames[0]), frames[1:]
    # append_images is a generator and loads images one-at-a-time
    first.save(filename, 'GIF', save_all=True, append_images=(Image.open(s) for s in _frames))

@homm
Copy link
Member

homm commented Oct 3, 2023

Having iterators support you can track progress without any callbacks.

@LeXofLeviafan
Copy link

LeXofLeviafan commented Oct 3, 2023

Having iterators support you can track progress without any callbacks.

Only in the sole specific case when composing an animated image from separate single-frame images that you load one-by-one. Which means, when combining animated images, you'd only get updates per image, not per frame (and the simple usecase of "saving/converting a single animated image" would get zero progress tracking support).

Not to mention you're conflating concerns of "providing input" and "progress tracking" again. Why would the user need to be forced to deal with both when he only needs one of these things? And why would the user code dealing with one of these need to also deal with the other? If the user wants to track frame progress, he passes a callback that gets invoked on every frame (e.g. print). If the user wants to load input images one-at-a-time, he passes a generator instead of a list. He can easily do one of these things, both, or neither. Simple, intuitive, unentangled.

@@ -2446,6 +2446,23 @@ def save(self, fp, format=None, **params):
if open_fp:
fp.close()

def _save_all_progress(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with "save all"? This method just passes data into another function (supplied externally).

If anything, _report_progress would make much more sense here. (Or even just _progress)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It indicates that the method is used in relation to im.save(out, save_all=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…Ah. Alright, it does make sense in that context.

Though it would still make more sense to name it something like _frame_progress, I'd say; it's more generic and there's not really anything in the method itself that makes it necessarily about saving only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Progress indication when writing multiple frames
3 participants