api: Add global attribute `imageinput:strict` #4560

lgritz · 2024-12-06T00:24:25Z

We don't do anything with this at the present time, but this PR reserves and documents this attribute for future use.

The intent is to be able to set whether we want image readers to try being as tolerant as possible when reading a file with flaws (press on and see if the rest of the file is ok?), or be more conservative and abandon reading any file as soon as a corruption or invalid data is encountered (because that might be a clue that the file is arbitrarily corrupted or even maliciously constructed).

I documented it as defaulting to 0 (err on the side of being permissive of bad input), with high-security applications being responsible for setting it to 1. But it's open for debate if people think that a better default is to be strict and let applications who want to be more tolerant be responsible for accepting the risk and switching the mode.

We don't do anything with this at the present time, but this PR reserves and documents this attribute for future use. The intent is to be able to set whether we want image readers to try being as tolerant as possible when reading a file with flaws (press on and see if the rest of the file is ok?), or be more conservative and abandon reading any file as soon as a corruption or invalid data is encountered (because that might be a clue that the file is arbitrarily corrupted or even maliciously constructed). I documented it as defaulting to 0 (err on the side of being permissive of bad input), with high-security applications being responsible for setting it to 1. But it's open for debate of people think that a better default is to be strict and let applications who want to be more tolerant be responsible for accepting the risk and switching the mode. Signed-off-by: Larry Gritz <[email protected]>

lgritz · 2024-12-16T20:48:34Z

Any opinions on this?

EmilDohne · 2024-12-18T20:48:50Z

I appreciate this introduction and I think a 0 as default is perfectly valid as most of the case in non-critical applications I think people would prefer to get some usable data rather than some obscure error.

I have some overall concerns (although not in the scope of this MR) with the attributes being passed as const void* and reinterpreted based on the passed TypeDesc. I'm guessing around the time that this was written std::variant was not available but seeing as C++17 is the standard we are on now I think it would be valid to start deprecating it in favour of the more type safe std::variant.
I'd love to hear your opinion on this and would be happy to provide a std::variant based version and the other methods can then be slowly deprecated.

lgritz · 2024-12-18T21:15:10Z

Yes, we've already been in the process of trying to get out of the practice of passing around "unlengthed" raw pointers wherever possible. So I'm definitely interested in continuing that work, including here.

I'm not sure one way or the other whether std::variant is the right approach in this particular case, but it's definitely a contender, and if you want to prototype a variant-based way to set attributes, I'd love to see that and we can all collectively evaluate it. The three possible issues that come immediately to mind are: (1) how to handle arrays, (2) whether use of variant will incur making extra copies or allocations that we wish to avoid merely to pass the parameters, and (3) whether there could possibly be ABI incompatibilities (e.g., are we 100% sure that a library compiled with gcc and a client app compiled with clang, for example, can pass std::variant through public interfaces with guaranteed compatibility?).

One reason why we hadn't really used variant anywhere before is simply because the 3.0 release (just a month ago) is the first time that C++17 had been the minimum for public interfaces, so it just wasn't a possibility. So that also means we're a little inexperienced with its pros and cons.

EmilDohne · 2024-12-18T21:55:43Z

I'm happy to have a look into this and provide a tentative MR to have everyone take a look at it. As for the allocation and copying part it's more than possible to construct a variant without copying (i.e. move, emplace etc.). It does have the limitations that it will use up the memory of the widest type but I figure for attributes specifically this is likely a non-issue as these look to be fairly small and not on a hot path.

As for the guarantee of public interfaces I would assume so as they are all based on the same standard but I have to be honest not yet tried it.

EDIT:

To add to this, I definitely don't think variant is the right type for everything. Where e.g. an array is passed a template would be much more fitting than a generic variant. In fact for attributes that may also hold true but I will play around with this a bit

lgritz · 2024-12-18T22:31:37Z

Great, I'd love to see a prototype of a std::variant based attribute() call. (As long as you don't mind that it may or may not turn out to fit the bill in the end. But I think there's no way to know without doing it.)

As an alternate design to consider, though, I wonder if it might avoid some pitfalls to use a templated attribute() call that takes a span<T>? That would definitely not have any extra allocations or copies or other unobvious overhead, has a future-proof ABI, and it handles arrays trivially.

EmilDohne · 2024-12-19T09:42:47Z

After doing some initial testing I don't know if a templated approach is the right one as, unless you explicitly instantiate the template for every variation of input arguments there might be, the function can't really be implemented in the imageio.h header as it relies on the pvt:: externed variables. Perhaps a variant and/or specializations is more appropriate in this case.

I'll come back to this at a later point when I get some more time but it's definitely something to consider

lgritz · 2024-12-19T20:49:08Z

Well, what I was thinking about is something like:

// Concrete base case, implemented in a .cpp file and largely identical to the
// raw-pointer-based one we have now, except that its access to the caller's data
// is properly bounds checked:
bool attribute(string_view name, TypeDesc type, cspan<std::byte> data);

// Templated, inlined, type safe
template<typename T>
bool attribute(string_view name, span<T> data) {
    return attribute(name, TypeDesc(BaseTypeFromC<T>::value, data.size() > 1 ? data.size : 0), as_bytes(data));
}

lgritz · 2024-12-19T20:53:11Z

I'm going to merge this PR, but we should open a new issue or discussion for any further hashing out of a type- and bounds-safe attribute API.

EmilDohne · 2024-12-19T22:29:16Z

Well, what I was thinking about is something like:

// Concrete base case, implemented in a .cpp file and largely identical to the
// raw-pointer-based one we have now, except that its access to the caller's data
// is properly bounds checked:
bool attribute(string_view name, TypeDesc type, cspan<std::byte> data);

// Templated, inlined, type safe
template<typename T>
bool attribute(string_view name, span<T> data) {
    return attribute(name, TypeDesc(BaseTypeFromC<T>::value, data.size() > 1 ? data.size : 0), as_bytes(data));
}

That looks like a neat solution given the constraints of the current API! I'll open up a new ticket to track this and any progress I make.

…ion#4560) We don't do anything with this at the present time, but this PR reserves and documents this attribute for future use. The intent is to be able to set whether we want image readers to try being as tolerant as possible when reading a file with flaws (press on and see if the rest of the file is ok?), or be more conservative and abandon reading any file as soon as a corruption or invalid data is encountered (because that might be a clue that the file is arbitrarily corrupted or even maliciously constructed). I documented it as defaulting to 0 (err on the side of being permissive of bad input), with high-security applications being responsible for setting it to 1. But it's open for debate if people think that a better default is to be strict and let applications who want to be more tolerant be responsible for accepting the risk and switching the mode. Signed-off-by: Larry Gritz <[email protected]>

lgritz mentioned this pull request Dec 19, 2024

fix: more ICC protections against invalid tag sizes #4565

Merged

lgritz merged commit dcb38bd into AcademySoftwareFoundation:main Dec 19, 2024
29 checks passed

EmilDohne mentioned this pull request Dec 19, 2024

Provide a type and memory safe interface for attribute #4566

Open

lgritz deleted the lg-strict branch December 23, 2024 06:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

api: Add global attribute `imageinput:strict` #4560

api: Add global attribute `imageinput:strict` #4560

lgritz commented Dec 6, 2024

lgritz commented Dec 16, 2024

EmilDohne commented Dec 18, 2024

lgritz commented Dec 18, 2024

EmilDohne commented Dec 18, 2024 •

edited

Loading

lgritz commented Dec 18, 2024

EmilDohne commented Dec 19, 2024

lgritz commented Dec 19, 2024 •

edited

Loading

lgritz commented Dec 19, 2024

EmilDohne commented Dec 19, 2024

api: Add global attribute imageinput:strict #4560

api: Add global attribute imageinput:strict #4560

Conversation

lgritz commented Dec 6, 2024

lgritz commented Dec 16, 2024

EmilDohne commented Dec 18, 2024

lgritz commented Dec 18, 2024

EmilDohne commented Dec 18, 2024 • edited Loading

lgritz commented Dec 18, 2024

EmilDohne commented Dec 19, 2024

lgritz commented Dec 19, 2024 • edited Loading

lgritz commented Dec 19, 2024

EmilDohne commented Dec 19, 2024

api: Add global attribute `imageinput:strict` #4560

api: Add global attribute `imageinput:strict` #4560

EmilDohne commented Dec 18, 2024 •

edited

Loading

lgritz commented Dec 19, 2024 •

edited

Loading