feat(server): library refresh go brrr #14456

etnoy · 2024-12-02T21:37:45Z

This PR significantly improves library scanning performance. Wherever suitable, we are doing jobs in batches, and many looped database interactions are replaced with SQL queries.

User testimonials
"@etnoy what on earth have you done. I tried your PR and it finished the scan for 1M assets in 37 seconds down from 728s on main. It takes 188s just to finish queuing on main" -- @mertalev

Changes made

When crawling a library: Don't call stat() on the files when importing, instead put a placeholder date (9999-12-31) that will be overwritten by metadata extraction. All dates after year 9000 are ignored in the timeline
When importing new library files: Crawl 10k files per batch and then insert each batch in a single SQL query instead adding them individually
When checking asset filenames against import paths and exclusion patterns: Do this directly in SQL which is much faster
When a library is deleted, hide all its assets first. Otherwise, they'll linger until the asset deletion job catches up which can take days for >1M libraries

Plus several minor cleanups and performance enhancements.

The performance improvements are at least an order of magnitude in library scanning.

Benchmark 1
A library scan with 22k items where nothing has changed since the last scan used to take 1m 22s, now it's below 10 seconds, an improvement of 87 percent!

Benchmark 2
A clean library import with 19k items takes 1m40s in main and 7 seconds in this PR.
NOTE: this benchmark is only the library service scan and does not include the metadata extraction. Also, some fs calls have been migrated from the library service to the metadata service, although this should only have a minor impact on overall scan performance

Benchmark 3
Importing a library with >5M assets.

Time to 1M imported (without metadata extraction): 6m50s.

No need to compare to main, you know it's fast!

Benchmark 4
Importing a library of 527041 files took 1m58s (without metadata extraction) in this PR.
No need to compare to main, you know it's fast!

Bonus:

Greatly improved log messages related to library scans.
This scan imports all new files:

This is an "idle scan", where a refresh finds no changes:

More e2e tests for handling when offline files go back online, leading to one major bug fixed

Future work:

Crawl for XMP sidecars instead of queuing a sidecar discovery for each asset

Final note:
This PR allowed me to hit a milestone of 10M assets in a single Immich instance, likely a world-first. This does require max-old-space-size=8096, but that's to be expected anyway

mertalev

Nice start! I think there are still a lot of untapped potential improvements here.

server/src/services/library.service.ts

mertalev · 2024-12-04T18:08:32Z

The update to fileCreatedAt, fileModifiedAt and originalFileName is unnecessary and can be handled in metadata extraction since this will be queued anyway. This makes the batched update for isOffline and deletedAt simpler since there'll be no values that are unique to each asset.

etnoy · 2024-12-08T22:20:59Z

Thanks for your comments @mertalev ! I'll first attempt to do the import path and exclusion pattern checks in SQL and then move to your suggestions

server/src/repositories/asset.repository.ts

server/src/services/library.service.ts

etnoy · 2024-12-12T21:10:28Z

The update to fileCreatedAt, fileModifiedAt and originalFileName is unnecessary and can be handled in metadata extraction since this will be queued anyway. This makes the batched update for isOffline and deletedAt simpler since there'll be no values that are unique to each asset.

Never thought of that, I've implemented your suggestion. I'm also considering changing the initial import code to ignore file mtime, this allows us to not do any file system calls except for the crawl. Metadata extraction will have to do the heavy lifting instead

mertalev · 2024-12-12T21:26:49Z

The update to fileCreatedAt, fileModifiedAt and originalFileName is unnecessary and can be handled in metadata extraction since this will be queued anyway. This makes the batched update for isOffline and deletedAt simpler since there'll be no values that are unique to each asset.

Never thought of that, I've implemented your suggestion. I'm also considering changing the initial import code to ignore file mtime, this allows us to not do any file system calls except for the crawl. Metadata extraction will have to do the heavy lifting instead

Would that mean you queue them for metadata extraction even if they're unchanged? You can test it but I think it'd be more overhead than the stat calls.

Edit: also if you do this with the source set to upload, it would definitely be worse because it would queue a bunch of other things after metadata extraction.

etnoy · 2024-12-12T21:42:08Z

The update to fileCreatedAt, fileModifiedAt and originalFileName is unnecessary and can be handled in metadata extraction since this will be queued anyway. This makes the batched update for isOffline and deletedAt simpler since there'll be no values that are unique to each asset.

Never thought of that, I've implemented your suggestion. I'm also considering changing the initial import code to ignore file mtime, this allows us to not do any file system calls except for the crawl. Metadata extraction will have to do the heavy lifting instead

Would that mean you queue them for metadata extraction even if they're unchanged? You can test it but I think it'd be more overhead than the stat calls.

Edit: also if you do this with the source set to upload, it would definitely be worse because it would queue a bunch of other things after metadata extraction.

I was referring to new imports, files that are new to immich. I hoped to improve the ingest performance by removing the stat call. After testing, there are two issues:

assetRepository.create requires mtime, which we can only get from stat. We could work around that by setting it to new Date(), but ideally it should be undefined
We still check for the existence of a sidecar, and this complicates things

If we can mitigate the two issues above, I can rewrite the library import feature and do that in batches as well!

mertalev · 2024-12-12T21:55:06Z

I don't see why fileModifiedAt needs a non-null constraint in the DB. Might just be an oversight that didn't matter because it didn't affect our usage. I think you can change the asset entity and generate a migration to remove that constraint.

For sidecar files, maybe you could add.xmp to the glob filter and enable the option to make the files come in sorted order? That way you could make sure they're in the same batch.

etnoy · 2024-12-12T22:20:59Z

I don't see why fileModifiedAt needs a non-null constraint in the DB. Might just be an oversight that didn't matter because it didn't affect our usage. I think you can change the asset entity and generate a migration to remove that constraint.

For sidecar files, maybe you could add.xmp to the glob filter and enable the option to make the files come in sorted order? That way you could make sure they're in the same batch.

I might just put new Date() in at the moment to keep the PR somewhat constrained.

Regarding sidecars, I have thought about that, problem right now is that we're batching the crawled files in batches of 10k. It might be hard to do get that working alright. Maybe I'll just queue a sidecar discovery for every imported asset for now

…/inline-offline-check

alextran1502 · 2025-03-03T03:31:10Z

You call on merging this one @mertalev

zackpollard · 2025-03-03T19:49:31Z

You call on merging this one @mertalev

If it's ready for final review I would like to take a look at it too before it is merged, I can take a look tomorrow morning.

etnoy · 2025-03-03T22:14:37Z

You call on merging this one @mertalev

If it's ready for final review I would like to take a look at it too before it is merged, I can take a look tomorrow morning.

Thanks, I'd apprecate it!

…/inline-offline-check

mertalev · 2025-03-04T00:08:07Z

@etnoy I made a few changes, let me know if there's anything that seems off.

etnoy · 2025-03-04T07:38:22Z

@etnoy I made a few changes, let me know if there's anything that seems off.

I had a look and it's a big improvement, thanks for helping me clean this up

zackpollard · 2025-03-04T18:33:19Z

Didn't get to this today sorry, on my list for tomorrow 😅

…/inline-offline-check

zackpollard

LGTM, nice work. You have some failing unit tests after the latest changes, but it's good to go after those are fixed I think

etnoy · 2025-03-06T13:33:35Z

@zackpollard good to go now that the release is done?

* feat: brr --------- Co-authored-by: mertalev <[email protected]>

etnoy added the changelog:enhancement label Dec 2, 2024

github-actions bot added the 🗄️server label Dec 2, 2024

etnoy force-pushed the feat/inline-offline-check branch from 0eb1440 to 80aa615 Compare December 2, 2024 21:45

feat: run all offline checks in a single job

8ecde3b

etnoy force-pushed the feat/inline-offline-check branch from 80aa615 to 8ecde3b Compare December 2, 2024 21:46

mertalev reviewed Dec 4, 2024

View reviewed changes

server/src/services/library.service.ts Outdated Show resolved Hide resolved

server/src/services/library.service.ts Outdated Show resolved Hide resolved

etnoy force-pushed the feat/inline-offline-check branch 2 times, most recently from d394654 to 8b2a48c Compare December 9, 2024 21:34

Merge remote-tracking branch 'origin' into feat/inline-offline-check

02c5765

etnoy force-pushed the feat/inline-offline-check branch 3 times, most recently from 6d69307 to c26f6aa Compare December 10, 2024 16:41

etnoy added the changelog:bugfix label Dec 10, 2024

etnoy force-pushed the feat/inline-offline-check branch from c26f6aa to a3be620 Compare December 10, 2024 20:39

etnoy removed the changelog:bugfix label Dec 10, 2024

etnoy changed the title ~~feat(server): run all offline checks in a single job~~ feat(server): library refresh go brrr Dec 10, 2024

mertalev reviewed Dec 10, 2024

View reviewed changes

server/src/repositories/asset.repository.ts Outdated Show resolved Hide resolved

server/src/repositories/asset.repository.ts Show resolved Hide resolved

mertalev reviewed Dec 10, 2024

View reviewed changes

server/src/services/library.service.ts Outdated Show resolved Hide resolved

etnoy force-pushed the feat/inline-offline-check branch 5 times, most recently from 775b817 to 69b273d Compare December 12, 2024 20:59

etnoy force-pushed the feat/inline-offline-check branch from cb772ad to aa689ef Compare February 27, 2025 10:49

Merge branch 'main' of https://github.com/immich-app/immich into feat…

d8d61a0

…/inline-offline-check

etnoy force-pushed the feat/inline-offline-check branch from 9313217 to d8d61a0 Compare February 28, 2025 12:53

Merge remote-tracking branch 'origin' into feat/inline-offline-check

5968d63

Merge branch 'main' of https://github.com/immich-app/immich into feat…

954200e

…/inline-offline-check

etnoy force-pushed the feat/inline-offline-check branch from 822ea5f to 954200e Compare March 3, 2025 22:22

mertalev added 3 commits March 3, 2025 18:22

query cleanup

53cb333

batch stats, simplify logic

3b84769

conditional promises

59f37b6

mertalev added 3 commits March 3, 2025 19:10

linting

0eb65a0

Update test

1e13760

fix mtime comparison

32bea50

Log message for earliest date

a39e08d

simplify

eef51dd

mertalev approved these changes Mar 5, 2025

View reviewed changes

Merge branch 'main' of https://github.com/immich-app/immich into feat…

6ccfd70

…/inline-offline-check

zackpollard approved these changes Mar 5, 2025

View reviewed changes

Simplify by renaming assetpath

a67d97e

etnoy force-pushed the feat/inline-offline-check branch from dbaa302 to a67d97e Compare March 5, 2025 14:49

etnoy merged commit 3af26ee into main Mar 6, 2025
42 checks passed

etnoy deleted the feat/inline-offline-check branch March 6, 2025 15:00

mertalev mentioned this pull request Apr 14, 2025

refactor(server): non-nullable file metadata #17598

Merged

savely-krasovsky pushed a commit to savely-krasovsky/immich that referenced this pull request Jun 8, 2025

feat(server): library refresh go brrr (immich-app#14456)

c7b8535

* feat: brr --------- Co-authored-by: mertalev <[email protected]>

Uh oh!

feat(server): library refresh go brrr #14456

feat(server): library refresh go brrr #14456

Uh oh!

Conversation

etnoy commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mertalev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mertalev commented Dec 4, 2024

Uh oh!

etnoy commented Dec 8, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

etnoy commented Dec 12, 2024

Uh oh!

mertalev commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

etnoy commented Dec 12, 2024

Uh oh!

mertalev commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

etnoy commented Dec 12, 2024

Uh oh!

alextran1502 commented Mar 3, 2025

Uh oh!

zackpollard commented Mar 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

etnoy commented Mar 3, 2025

Uh oh!

mertalev commented Mar 4, 2025

Uh oh!

etnoy commented Mar 4, 2025

Uh oh!

zackpollard commented Mar 4, 2025

Uh oh!

zackpollard left a comment

Choose a reason for hiding this comment

Uh oh!

etnoy commented Mar 6, 2025

Uh oh!

Uh oh!

Uh oh!

etnoy commented Dec 2, 2024 •

edited

Loading

mertalev commented Dec 12, 2024 •

edited

Loading

mertalev commented Dec 12, 2024 •

edited

Loading

zackpollard commented Mar 3, 2025 •

edited

Loading