Added progress callback when save_all is used #7435

radarhere · 2023-10-02T03:25:55Z

Resolves #7433

The issue requests the ability to use a callback function to monitor progress when calling save_all. This function is triggered after processing each frame, with the filename of the current image (or None if the filename is not available), the number of frames processed so far, and the total number of frames.

from PIL import Image

def callback(filename, frame_count, n_frames):
    print((filename, frame_count, n_frames))

with Image.open("Tests/images/hopper.gif") as im:
    with Image.open("Tests/images/dispose_bgnd.gif") as im2:
        im.save("out.gif", save_all=True, append_images=[im2], progress=callback)

gives

('Tests/images/hopper.gif', 1, 6)
('Tests/images/dispose_bgnd.gif', 2, 6)
('Tests/images/dispose_bgnd.gif', 3, 6)
('Tests/images/dispose_bgnd.gif', 4, 6)
('Tests/images/dispose_bgnd.gif', 5, 6)
('Tests/images/dispose_bgnd.gif', 6, 6)

LeXofLeviafan · 2023-10-02T06:43:51Z

Tests/test_file_apng.py

+
+    assert progress == [
+        ("Tests/images/apng/single_frame.png", 1, 7),
+        ("Tests/images/apng/single_frame.png", 2, 7),


This line could use a comment that single_frame.png gets appended to itself in the save() call

come to think of it, having explicit index there would make adding a comment somewhat redundant

You no longer think this is necessary?

You no longer think this is necessary?

The comment was meant to clarify that the image is repeated because it occurs twice in the input; providing explicit indices makes pointing that out unnecessary.

LeXofLeviafan · 2023-10-02T06:46:41Z

Tests/test_file_apng.py

+        progress.append((filename, frame_number, n_frames))
+
+    Image.new("RGB", (1, 1)).save(out, "PNG", save_all=True, progress=callback)
+    assert progress == [(None, 1, 1)]


Maybe indicate index here (instead of None)? 1 for the 1st input image?

Would you be open to the idea of having an index all of the time? Or, we could pass back the image instance, and let the user figure out the index or filename as they see fit.

Having index always is probably simpler. As for passing the image, it wouldn't be much help with identifying the index if an image is reused in the input (like in that one test… come to think of it, having explicit index there would make adding a comment somewhat redundant).

I've pushed a commit to switch to an index.

...Why did you remove the filename though? I thought you were talking about adding an index, not removing the filename (as my own suggestion was to provide index when the filename isn't available)

I'm concerned that (index, filename, frame_number, frame_total) crosses the line from 'neat and tidy' to 'passing back everything to the user just in case something is useful'.

You can use a namedtuple instead (i.e. define Process = namedtuple('Process', "index filename frame n_frames") somewhere global, and pass its value to the progress callback).

And I'd say that the name of the input file being currently processed is a quite appropriate piece of information to be included, as it's the most user-friendly way of showing, well, which file is being processed (whether you're logging your progress to a terminal or displaying it in GUI on top of the progressbar).

Why make the most likely usecase inconvenient? The name of the game shouldn't be "provide as little information as possible and force the calling code to figure out the rest of it every single time", it should aim at the most likely general purpose case (which I imagine more often than not would be logging/displaying something in the lines of Converting: input/image2.png [#2] -> output.png (frame 12 of 84); and the only thing here that would be instantly available for the calling code without complication in any situation would be the output file name, as it's passed in as a raw string and never changes throughout the entire process).

I would have thought it was simple, given that the user just provided these images as the base image and append_images?

Suppose the code that wants to log progress isn't concerned with input at all – it only has a generator that supplies these images; why would it need to implement workarounds like instantiating all values from that generator at once, or wrapping it into a generator object that mucks around with the data being yielded, when all it wants to do is display progress on screen? And like I said, filename is the clearest way possible of identifying an image that was loaded from a file.

append_images are arbitrary PIL.Image.Image

I'm pretty sure these can be obtained with Image.open(), just like the first image.

why not pass just saved im object itself?

I suggested that, but it was pointed out that the image might be provided as an input more than once.

…which is not the case with filenames, I suppose :-)

The index can be necessary in case of duplicate files in the input, but the filename is likely to be necessary for the purpose of displaying the progress in user-friendly manner. (Now passing the entire image object, that would actually be "passing back everything to the user just in case something is useful" 😆)

I don't like passing indices since it's hard to figure out the difference between index and frame_number and it's very simple to write the code which uses index incorrectly.

Incidentally, passing progress status as a namedtuple (or the like) would help with that.

I'm not opposed to dropping index altogether - I think the key benefit of this feature is being able to track how many frames have been completed out of the total number.

Filename and/or index are necessary to show which image this current frame comes from, in case when more than one is on the input – it shouldn't be a stretch to expect it to be necessary information for the vast majority of usecases.

I'm still not convinced that a filename or image index is necessary, but I've pushed a commit to restore the filename.

>>> from collections import namedtuple >>> State = namedtuple("State", "a b") >>> State(2, 4)[0] 2

Because namedtuples still have order, and I'm concerned that there could be different views about what the best order is, or that we might want to add more items in the future, I've used a dictionary instead.

The keys are a bit overly verbose, but that should do the job at least.

Still, I believe having a single fixed class with predefined attributes would make for better API than using free-form dicts in every implementation (not to mention avoiding potential issues if there's a need to make changes at some point in the future). If you're concerned with possibility of using indices to access a namedtuple field (which I honestly find rather unlikely as attribute access is much clearer), you can use a @dataclass instead.

Either way, it would be a good idea to have this getattr(im, "filename", None) thing in one place in the codebase (and if you keep the dict, it would still be a good idea to have one piece of code defining its creation, at least for sake of ensuring there won't be any discrepancies in key names).

I've pushed a commit to move the creation of the dictionary into a common method.

LeXofLeviafan · 2023-10-02T06:51:56Z

Tests/test_file_webp.py

+        expected = []
+        for i in range(42):
+            expected.append(("Tests/images/iss634.webp", i + 1, 43))
+        expected.append((None, 43, 43))


Likewise, saying that the last frame comes from image #2 would be more informative than "don't know" – especially when there's multiple images involved

I've pushed a commit to switch to an index.

homm · 2023-10-02T10:12:40Z

An idea: use the same callback for generating frames: pass current im object (which could be an image without bitmap in memory) and check if complete image is returned.

radarhere · 2023-10-02T10:25:16Z

Sorry, I'm not sure I understand. Are you suggesting this

from PIL import Image

colors_left = ["#f00", "#0f0"]
def callback():
    if colors_left:
        return Image.new("RGB", (1, 1), frames_left.pop(0))

im = Image.new("RGB", (1, 1))
im.save("out.png", save_all=True, progress=callback)

with the intention that it allow multiframe images to be saved without needing all images to be created beforehand and so reduce memory impact?

homm · 2023-10-02T10:27:53Z

In general, yes. Depending on current code flow, it could be like this, or require some informations about the frames (like size and mode).

radarhere · 2023-10-02T11:26:11Z

APNG considers the modes of all frames when saving - #6610. I don't know what an elegant way of getting that particular information from the user at the beginning would look like, and I find it hard to imagine that users would expect that they have to know that at the beginning.

LeXofLeviafan · 2023-10-02T15:53:05Z

Tests/test_file_apng.py

+    assert progress == [
+        (0, 1, 7),
+        (1, 2, 7),
+        (2, 3, 7),


Say... if the frames are counted from 1, why count the images from 0?

I'm triggering these callbacks when frames are completed, not when they are started.

My intention is for (0, 1, 7) to communicate "We're in the first image. We've completed the first frame, there are seven frames in total".
Then (2, 7, 7) communicates "We're in the third image. We've completed seven frames, there are seven frames in total, so we're done"

If I started counting the frames from 0, the last value would be (2, 6, 7). That looks less to me like save_all is finished.

I recognise this could be a subject of debate.

…The question was “Why count the images from 0?”

I.e. why are the images numbered starting from 0? You're numbering frames starting from 1 – it would be consistent to use the same numbering scheme for images.

(And yes, I know that due to how pointer arithmetic works, arrays in C are indexed starting from 0, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages just because; counting things is meant to be started from 1 – that's what the number means in the first place.)

I was trying to explain that my goal is not to number the frames from 1, but rather to clearly communicate when progress is completed, because the frame count will be compared with the total number of frames. The image count will not.

arrays in C are indexed starting from 0, but that's an implementation detail and it's kind of ridiculous to drag it into higher-level languages

Python arrays are also indexed from 0.

The image count will not.

And yet counting from 0 would be inconsistent with the other counting used within same data set.

Python arrays are also indexed from 0.

Which makes no sense since Python is a high-level language (and there's some consistency issues caused by this arbitrary choice to imitate C array indexing, but that's neither here nor there).

Besides, the images aren't coming from a list as far as the user is concerned (well, there's append_images, but going by that logic you'd have to start from -1 since the first input image is not in this list in the first place… except that -1 is a valid index in Python, so it probably would be None instead).

I've pushed a commit naming each piece of information within the returned data. It is now "image_index" and "completed_frames". Hopefully that clarifies that one is the Pythonic zero-based index of the image in progress, and the other is the number of frames that have been finished.

LeXofLeviafan · 2023-10-02T15:59:26Z

allow multiframe images to be saved without needing all images to be created beforehand and so reduce memory impact

I imagine if this is ever implemented, it should not be exposed in a workaround-ish way, and it probably should be implemented as a format-specific method if it can only be supported for specific formats (e.g. GifImagePlugin.fromFrames(generator)).

I'm also pretty sure such a feature is out of scope for this particular pull-request.

homm · 2023-10-02T18:17:49Z

I'm also pretty sure such a feature is out of scope for this particular pull-request.

It out of scope of your needs, but is closely related to current PR since it may or may not share the same API and it is better to discuss it here.

homm · 2023-10-02T18:19:28Z

APNG considers the modes of all frames when saving

We can restrict modes of generated frames only to the same as the initial image.

LeXofLeviafan · 2023-10-02T20:30:32Z

It out of scope of your needs, but is closely related to current PR since it may or may not share the same API and it is better to discuss it here.

You're conflating the concerns of "keeping track of progress" and "generating input data for the operation", based solely on the fact that both are related to "iteration over frames". I don't believe such features should even intersect in their implementation, let alone complicating each other.

In fact, I'm pretty sure that the correct way of implementing the feature you're requesting would be by supporting a generator as a value passed in append_images without immediately converting it to a list (it would be requested to yield a new image after the previous one got processed) – doing it this way would be idiomatic for Python; and the only way it would in any way affect the functionality of this pull-request, is that it would no longer be possible to pre-calculate the number of frames if append_images is a generator.

def to_gif_memheavy (frames, filename):
    first, _frames = Image.open(frames[0]), frames[1:]
    # append_images is a list with pre-loaded data
    first.save(filename, 'GIF', save_all=True, append_images=[Image.open(s) for s in _frames])

def to_gif_memlow (frames, filename):
    first, _frames = Image.open(frames[0]), frames[1:]
    # append_images is a generator and loads images one-at-a-time
    first.save(filename, 'GIF', save_all=True, append_images=(Image.open(s) for s in _frames))

homm · 2023-10-03T09:53:19Z

Having iterators support you can track progress without any callbacks.

LeXofLeviafan · 2023-10-03T11:31:27Z

Having iterators support you can track progress without any callbacks.

Only in the sole specific case when composing an animated image from separate single-frame images that you load one-by-one. Which means, when combining animated images, you'd only get updates per image, not per frame (and the simple usecase of "saving/converting a single animated image" would get zero progress tracking support).

Not to mention you're conflating concerns of "providing input" and "progress tracking" again. Why would the user need to be forced to deal with both when he only needs one of these things? And why would the user code dealing with one of these need to also deal with the other? If the user wants to track frame progress, he passes a callback that gets invoked on every frame (e.g. print). If the user wants to load input images one-at-a-time, he passes a generator instead of a list. He can easily do one of these things, both, or neither. Simple, intuitive, unentangled.

LeXofLeviafan · 2023-10-06T18:18:30Z

src/PIL/Image.py

@@ -2446,6 +2446,23 @@ def save(self, fp, format=None, **params):
        if open_fp:
            fp.close()

+    def _save_all_progress(


What's with "save all"? This method just passes data into another function (supplied externally).

If anything, _report_progress would make much more sense here. (Or even just _progress)

It indicates that the method is used in relation to im.save(out, save_all=True)

…Ah. Alright, it does make sense in that context.

Though it would still make more sense to name it something like _frame_progress, I'd say; it's more generic and there's not really anything in the method itself that makes it necessarily about saving only.

radarhere mentioned this pull request Oct 2, 2023

Progress indication when writing multiple frames #7433

Open

radarhere added the Enhancement label Oct 2, 2023

Added progress callback when save_all is used

6850465

radarhere force-pushed the progress branch from ddcf065 to 6850465 Compare October 2, 2023 04:45

LeXofLeviafan reviewed Oct 2, 2023

View reviewed changes

Pass back index of image in sequence, instead of filename

3465a59

homm closed this Oct 2, 2023

homm reopened this Oct 2, 2023

LeXofLeviafan reviewed Oct 2, 2023

View reviewed changes

radarhere added 2 commits October 5, 2023 18:40

Restored filename in progress callback

b5f3228

Moved progress code into common private method

fe7c9d3

LeXofLeviafan reviewed Oct 6, 2023

View reviewed changes

radarhere and others added 6 commits December 21, 2023 22:36

Merge branch 'main' into progress

fae0653

Merge branch 'main' into progress

19c2721

Merge branch 'main' into progress

4a0a011

Merge branch 'main' into progress

69680db

Merge branch 'main' into progress

344c300

Merge branch 'main' into progress

3c80ec0

Updated tests for os.path.realpath

b7a5b59

radarhere force-pushed the progress branch from 0c9abdd to b7a5b59 Compare February 12, 2024 08:37

radarhere and others added 8 commits March 12, 2024 08:24

Merge branch 'main' into progress

14560e1

Merge branch 'main' into progress

d52b3a2

Merge branch 'main' into progress

682a9ae

Merge branch 'main' into progress

2f27173

Merge branch 'main' into progress

26e2a9f

Merge branch 'main' into progress

8b4b7ce

Merge branch 'main' into progress

ebbdc6e

Merge branch 'main' into progress

fc5c096

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added progress callback when save_all is used #7435

Added progress callback when save_all is used #7435

radarhere commented Oct 2, 2023 •

edited

Loading

LeXofLeviafan Oct 2, 2023

radarhere Oct 2, 2023

LeXofLeviafan Oct 2, 2023

LeXofLeviafan Oct 2, 2023

radarhere Oct 2, 2023

LeXofLeviafan Oct 2, 2023

radarhere Oct 2, 2023

LeXofLeviafan Oct 2, 2023

LeXofLeviafan Oct 4, 2023 •

edited

Loading

radarhere Oct 5, 2023

LeXofLeviafan Oct 5, 2023

LeXofLeviafan Oct 5, 2023

radarhere Oct 6, 2023

LeXofLeviafan Oct 2, 2023

radarhere Oct 2, 2023

homm commented Oct 2, 2023

radarhere commented Oct 2, 2023

homm commented Oct 2, 2023

radarhere commented Oct 2, 2023

LeXofLeviafan Oct 2, 2023

radarhere Oct 4, 2023

LeXofLeviafan Oct 4, 2023 •

edited

Loading

radarhere Oct 4, 2023

LeXofLeviafan Oct 4, 2023

radarhere Oct 5, 2023

LeXofLeviafan commented Oct 2, 2023

homm commented Oct 2, 2023

homm commented Oct 2, 2023

LeXofLeviafan commented Oct 2, 2023

homm commented Oct 3, 2023

LeXofLeviafan commented Oct 3, 2023 •

edited

Loading

LeXofLeviafan Oct 6, 2023

radarhere Oct 6, 2023

LeXofLeviafan Oct 9, 2023

Added progress callback when save_all is used #7435

Are you sure you want to change the base?

Added progress callback when save_all is used #7435

Conversation

radarhere commented Oct 2, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LeXofLeviafan Oct 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

homm commented Oct 2, 2023

radarhere commented Oct 2, 2023

homm commented Oct 2, 2023

radarhere commented Oct 2, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LeXofLeviafan Oct 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LeXofLeviafan commented Oct 2, 2023

homm commented Oct 2, 2023

homm commented Oct 2, 2023

LeXofLeviafan commented Oct 2, 2023

homm commented Oct 3, 2023

LeXofLeviafan commented Oct 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

radarhere commented Oct 2, 2023 •

edited

Loading

LeXofLeviafan Oct 4, 2023 •

edited

Loading

LeXofLeviafan Oct 4, 2023 •

edited

Loading

LeXofLeviafan commented Oct 3, 2023 •

edited

Loading