Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#209 - Add Support for Sliding Invisibility Timeouts #353

Conversation

timpikelmg
Copy link
Contributor

This adds the sliding invisibility timeouts functionality (from #209) which is ported from SQL Server storage.

(This also ports a change where when removing from queue or requeuing, it will include the fetchedat in the where clause. See this commit (Check FetchedAt has expected value to prevent prolonging other's work).)

As noted in the original ticket, sliding invisibility timeouts is better than invisibility timeouts since as long as the worker instance is running, it will never reassign the job to another worker. This is much better than setting the invisibility timeout to long timeouts for long running jobs.

The SQL Server storage also implements transaction dequeuing (different to what is implemented here now with the native transaction setting) where it holds an open transaction while the job is running. This has advantages over using the sliding invisibility timeout (which is why it is the default in SQL Server storage) but it has limitations for very long running jobs since it can hold a transaction for a long period of time which has caused issues with backups on SQL Server (see this ticket #864).

Like SQL Server storage, it makes sense to allow both options (and longer term remove the older non-sliding invisibility timeout) rather than implementing just the newer open transaction dequeuing so users can choose the implementation that suits their use case (ie. short running jobs or long running jobs).

This PR implements sliding invisibility timeouts since this easily extends the current dequeuing functionality and is easily ported across from SQL Server storage.

How this works is it utilizes the existing dequeue functionality but adds a second configurable timeout setting SlidingInvisibilityTimeout which when set to a value enables using sliding invisibility timeout. When a job is fetched, it adds the job to the background storage heartbeat process, called from the background server which updates the fetchedat for each job.

This works with both native transactions enabled and disabled.

In detail, given this is basically ported from SQL Server storage it works in basically the same fashion:

  1. Dequeue a job and set the fetchedat to now (same query, except using SlidingInvisibilityTimeout value)
  2. Return a fetched job which when created adds the job to the PostgreSqlHeartbeatProcess to update
  3. The PostgreSqlHeartbeatProcess (an IServerComponent) is called regularly from the background server
  4. PostgreSqlHeartbeatProcess calls ExecuteKeepAliveQueryIfRequired method for each running job which updates the fetchedat to now for that job.

ExecuteKeepAliveQueryIfRequired is run 5 times within the sliding invisibility timeout range.

It should be noted that setting the SlidingInvisibilityTimeout to lower than 10 seconds gets very unstable since sometimes it cannot update the fetchedat quick enough before the job is fetched and so duplicate jobs start running. Ideally this should never be set lower than 1 minute (in my opinion).

The other caveat is that if the worker process has IsLightweightServer set to true, then sliding invisibility timeouts will not work at all because it relies on the background storage processes running. There doesn't seem to be any easy way around this without a change to Hangfire.Core. Hopefully longer term, it will allow storage libraries to either add background processes directly or provide more information to allow them to decide which processes should run when IsLightweightServer is set.

This uses the same fetching as normal invisibility timeout except with an added background update to update the FetchedAt time regularly
This is disabled by default and must be set in PostgreSqlStorageOptions SlidingInvisibilityTimeout to enable it's usage.
Fixed two issues with tests where tests were asserting against record.FetchedAt but the record returned is the database column names so should be record.fetchedat
… timeouts are enabled to reduce server load if not enabled

Added additional documentation to SlidingInvisibilityTimeout to indicate that this and IsLightweightServer are not compatible
src/Hangfire.PostgreSql/PostgreSqlFetchedJob.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlJobQueue.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlJobQueue.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlJobQueue.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlJobQueue.cs Outdated Show resolved Hide resolved
src/Hangfire.PostgreSql/PostgreSqlStorage.cs Show resolved Hide resolved
@azygis
Copy link
Collaborator

azygis commented Apr 15, 2024

One thing I don't really like right now overall is that it's a separate property in storage options. With current API, it's not possible to "unset" the "regular" invisibility timeout without introducing possibly breaking changes (making the existing property nullable).

Can we have a boolean instead UseSlidingInvisibilityTimeout? That way the invisibility timeout property will only be one. Options can be logged separately, the process can be yielded based on such boolean and so on. It's a bit confusing to have multiple timespans, especially since one literally overrides the other.

@timpikelmg
Copy link
Contributor Author

One thing I don't really like right now overall is that it's a separate property in storage options. With current API, it's not possible to "unset" the "regular" invisibility timeout without introducing possibly breaking changes (making the existing property nullable).

Can we have a boolean instead UseSlidingInvisibilityTimeout? That way the invisibility timeout property will only be one. Options can be logged separately, the process can be yielded based on such boolean and so on. It's a bit confusing to have multiple timespans, especially since one literally overrides the other.

That makes alot of sense. I was trying to keep it as separate as possible from the existing settings and also align the naming with the SQL Server Storage options but perhaps better to keep it simple. That makes the changes more straight forward as well since same timeouts settings.

I'll add the UseSlidingInvisibilityTimeout to the options.

…ityTimeout to enable/disable using sliding timeout and maintain existing InvisibilityTimeout setting for both sliding and existing invisibility timeout options.
@azygis azygis merged commit 942fccb into hangfire-postgres:master Jun 27, 2024
2 checks passed
@azygis
Copy link
Collaborator

azygis commented Jun 27, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants