Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jpeg2000: valid_file implementation, much faster than trying to open #4548

Merged
merged 1 commit into from
Dec 1, 2024

Conversation

aras-p
Copy link
Contributor

@aras-p aras-p commented Dec 1, 2024

Description

Primary cost of detecting whether a given file is valid jpeg2000 file is the thread pool initialization and shutdown that Jpeg2000Input::open does (the actual thread pool part is inside OpenJpeg code).

So add a dedicated valid_file() implementation that only needs to check 12 bytes of the header. While at it, I changed already existing isJp2File() to is_jp2_header() to better match naming conventions used elsewhere, and instead of trying to handle both little and big endian cases by manual repetition of two sets of magic integers, let's do just byte comparisons with memcmp instead.

On my PC (Ryzen 5950X, SSD, Windows VS2022), doing ImageInput::create() on 1138 files where they are not images at all (so OIIO in turns tries all the input plugins on them):

  • Before: 3.4 seconds spent in Jpeg2000Input::open (1.9s opj_thread_pool_create, 1.3s opj_thread_pool_destroy)
  • After: 33 milliseconds spent in Jpeg2000Input::valid_file

Tests

No new tests. I checked behavior on the official Jpeg2000 conformance data set (https://github.com/uclouvain/openjpeg-data/tree/master/input/conformance), seems to work.

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable. (Check if there is no
    need to update the documentation, for example if this is a bug fix that
    doesn't change the API.)
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

Primary cost of detecting whether a given file is valid jpeg2000 file
is the thread pool initialization and shutdown that
Jpeg2000Input::open does (the actual thread pool part is inside
OpenJpeg code).

So add a dedicated valid_file() implementation that only needs to
check 12 bytes of the header. While at it, I changed already existing
isJp2File() to is_jp2_header() to better match naming conventions
used elsewhere, and instead of trying to handle both little and big
endian cases by manual repetition of two sets of magic integers,
let's do just byte comparisons with memcmp instead.

On my PC (Ryzen 5950X, SSD, Windows VS2022), doing ImageInput::create()
on 1138 files where they are not images at all (so OIIO in turns tries
all the input plugins on them):

- Before: 3.4 seconds spent in Jpeg2000Input::open (1.9s
  opj_thread_pool_create, 1.3s opj_thread_pool_destroy)
- After: 33 milliseconds spent in Jpeg2000Input::valid_file

Signed-off-by: Aras Pranckevicius <[email protected]>
Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great find!

(Note to self, or anybody else reading: I wonder if the openjpeg library has a facility to the caller to provide its own thread pool instead of having a new one created and potentially over-threading.)

@lgritz lgritz merged commit da475b0 into AcademySoftwareFoundation:main Dec 1, 2024
28 checks passed
@lgritz
Copy link
Collaborator

lgritz commented Dec 1, 2024

Oof, the OpenJPEG library creates and destroys a threadpool for every file it reads (or tries to read)? It really doesn't just let the thread pool persist so that if multiple files are read, it doesn't do a full build/teardown each time?

@lgritz
Copy link
Collaborator

lgritz commented Dec 1, 2024

Always good to see you helping out here, @aras-p. You never fail to turn up some really interesting thing that we had previously overlooked.

@aras-p
Copy link
Contributor Author

aras-p commented Dec 1, 2024

Oof, the OpenJPEG library creates and destroys a threadpool for every file it reads (or tries to read)?

Looks like it. It's... not a design I would recommend, let's put it that way :)

lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Dec 1, 2024
…open (AcademySoftwareFoundation#4548)

Primary cost of detecting whether a given file is valid jpeg2000 file is
the thread pool initialization and shutdown that `Jpeg2000Input::open`
does (the actual thread pool part is inside OpenJpeg code).

So add a dedicated `valid_file()` implementation that only needs to
check 12 bytes of the header. While at it, I changed already existing
`isJp2File()` to `is_jp2_header()` to better match naming conventions
used elsewhere, and instead of trying to handle both little and big
endian cases by manual repetition of two sets of magic integers, let's
do just byte comparisons with `memcmp` instead.

On my PC (Ryzen 5950X, SSD, Windows VS2022), doing
`ImageInput::create()` on 1138 files where they are not images at all
(so OIIO in turns tries all the input plugins on them):

- Before: **3.4 seconds** spent in `Jpeg2000Input::open` (1.9s
`opj_thread_pool_create`, 1.3s `opj_thread_pool_destroy`)
- After: **33 milliseconds** spent in `Jpeg2000Input::valid_file`

No new tests. I checked behavior on the official Jpeg2000 conformance
data set
(https://github.com/uclouvain/openjpeg-data/tree/master/input/conformance),
seems to work.


Signed-off-by: Aras Pranckevicius <[email protected]>
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Dec 9, 2024
…open (AcademySoftwareFoundation#4548)

Primary cost of detecting whether a given file is valid jpeg2000 file is
the thread pool initialization and shutdown that `Jpeg2000Input::open`
does (the actual thread pool part is inside OpenJpeg code).

So add a dedicated `valid_file()` implementation that only needs to
check 12 bytes of the header. While at it, I changed already existing
`isJp2File()` to `is_jp2_header()` to better match naming conventions
used elsewhere, and instead of trying to handle both little and big
endian cases by manual repetition of two sets of magic integers, let's
do just byte comparisons with `memcmp` instead.

On my PC (Ryzen 5950X, SSD, Windows VS2022), doing
`ImageInput::create()` on 1138 files where they are not images at all
(so OIIO in turns tries all the input plugins on them):

- Before: **3.4 seconds** spent in `Jpeg2000Input::open` (1.9s
`opj_thread_pool_create`, 1.3s `opj_thread_pool_destroy`)
- After: **33 milliseconds** spent in `Jpeg2000Input::valid_file`

No new tests. I checked behavior on the official Jpeg2000 conformance
data set
(https://github.com/uclouvain/openjpeg-data/tree/master/input/conformance),
seems to work.


Signed-off-by: Aras Pranckevicius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants