1.7.3
Recommend Upgrade to 1.7.6
SDK 1.7.3 Advisory: Known Issues with Long-Running Jobs
1.7.3: Long-running jobs (>60 seconds) can cause the system to stop the worker, triggering retries and failures. Additionally, a long idle timeout (20+ seconds) may result in similar behavior, especially for the second request.
What's Changed
- Refactored rp_job.get_job to work well under pause and unpause conditions. Added more debug lines.
- Refactored rp_scale.JobScaler to handle shutdowns gracefully by cleaning up hanging tasks and connections. Improved debug lines.
- Fixed unnecessarily long asyncio.sleep calls in rp_scale.JobScaler, which were added before the blocking get_job calls were taken into account.
- Improved worker_state's JobProgress and JobsQueue to timestamp when jobs are added or removed.
- Moved the code in worker.run_worker into rp_scale.JobScaler, where it belongs, and simplified the call to job_scaler.start()
- Fixed non-errors being logged as errors in the tracer
- Updated unit tests mandating these changes
- Blocking job take call means 5-sec debounce no longer needed by @deanq in #366
- Debounce at HTTP 429 response by @deanq in #367
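
For readers unfamiliar with the pattern, the HTTP 429 debounce mentioned in the last item can be sketched roughly as follows. This is a minimal illustration and not the SDK's actual code: take_job, make_fake_fetch, pause_seconds, and max_attempts are hypothetical names, the pause length is an assumed value, and the fake fetch stands in for a real job-take request.

```python
import asyncio

TOO_MANY_REQUESTS = 429  # standard HTTP status code for rate limiting


async def take_job(fetch, pause_seconds=1.0, max_attempts=3):
    """Call `fetch` until it stops answering 429 or attempts run out.

    `fetch` is a hypothetical coroutine function returning (status, payload).
    Returns (payload, attempts_used); payload is None if every attempt
    was rate limited.
    """
    for attempt in range(1, max_attempts + 1):
        status, payload = await fetch()
        if status != TOO_MANY_REQUESTS:
            return payload, attempt
        # Rate limited: pause before asking again instead of retrying at once.
        await asyncio.sleep(pause_seconds)
    return None, max_attempts


def make_fake_fetch(statuses):
    """Build a stand-in for a job-take request that replays canned statuses."""
    remaining = list(statuses)

    async def fetch():
        status = remaining.pop(0)
        payload = {"id": "job-1"} if status == 200 else None
        return status, payload

    return fetch


if __name__ == "__main__":
    # Two rate-limited responses, then a job is handed out.
    fetch = make_fake_fetch([429, 429, 200])
    print(asyncio.run(take_job(fetch, pause_seconds=0.01)))
```

The pause applies only after a 429 response, so a normal blocking job-take call returns without any added delay.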
Full Changelog: 1.7.2...1.7.3