Release 1.7.3 · runpod/runpod-python

Recommend Upgrade to 1.7.6

SDK 1.7.3 Advisory: Known Issues with Long-Running Jobs

1.7.3: Long-running jobs (>60 seconds) can cause the system to stop the worker, triggering retries and failures. Additionally, a long idle timeout (20+ seconds) may result in similar behavior, especially for the second request.
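To act on the advisory, a deployment can check which SDK version is installed before starting a worker. The snippet below is a minimal sketch, not part of the SDK: `parse_version`, `needs_upgrade`, and `installed_runpod_version` are hypothetical helper names. It assumes plain dotted version strings, and it treats every release below 1.7.6 as needing an upgrade. The advisory names only 1.7.3, so that threshold is an assumption.

```python
from importlib import metadata
from typing import Optional, Tuple

# Taken from the advisory heading: "Recommend Upgrade to 1.7.6".
RECOMMENDED = (1, 7, 6)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '1.7.3' (no pre-release suffixes)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(version: str) -> bool:
    """Return True when the version is older than the recommended release.

    Assumption: everything below 1.7.6 is treated as affected, although the
    advisory itself only names 1.7.3.
    """
    return parse_version(version) < RECOMMENDED


def installed_runpod_version() -> Optional[str]:
    """Return the installed runpod version, or None if it is not installed."""
    try:
        return metadata.version("runpod")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_runpod_version()
    if current is None:
        print("runpod is not installed")
    elif needs_upgrade(current):
        print(f"runpod {current} is older than 1.7.6; consider upgrading")
    else:
        print(f"runpod {current} is at or above the recommended version")
```

Comparing integer tuples rather than raw strings keeps a release such as 1.10.0 from sorting below 1.7.6.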

What's Changed

Refactored rp_job.get_job to work well under pause and unpause conditions. More debug lines too.
Refactored rp_scale.JobScaler to handle shutdowns where it cleans up hanging tasks and connections gracefully. Better debug lines.
Fixed rp_scale.JobScaler's unnecessarily long asyncio.sleep calls, which did not account for the blocking get_job calls.
Improved worker_state's JobProgress and JobsQueue to timestamp when jobs are added or removed.
Moved the code from worker.run_worker into rp_scale.JobScaler, where it belongs, and simplified the call to job_scaler.start().
Fixed non-errors being logged as errors in the tracer.
Updated unit tests mandating these changes.
Blocking job take call means 5-sec debounce no longer needed by @deanq in #366
Debounce at HTTP 429 response by @deanq in #367
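The last two entries describe dropping the fixed debounce for ordinary job-take calls while still pausing when the server answers with HTTP 429. The sketch below illustrates that general pattern only and is not the SDK's implementation. `take_job`, the `fetch` callable and its `(status, job)` return shape, `max_attempts`, and the injectable `sleep` are all hypothetical, and the 5-second default is assumed from the "5-sec debounce" wording.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Tuple

# Hypothetical shape of a job-take response: (HTTP status code, job payload or None).
FetchResult = Tuple[int, Optional[Any]]


async def take_job(
    fetch: Callable[[], Awaitable[FetchResult]],
    debounce_seconds: float = 5.0,  # assumed value, see lead-in
    max_attempts: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[Any]:
    """Call fetch(), pausing only when the server responds with HTTP 429.

    Any other status returns the payload immediately, so a blocking fetch
    adds no extra delay between calls.
    """
    for _ in range(max_attempts):
        status, job = await fetch()
        if status == 429:
            # Rate limited: back off before asking again.
            await sleep(debounce_seconds)
            continue
        return job
    # Every attempt was rate limited.
    return None
```

Passing `sleep` as a parameter is a testing convenience: a test can record the requested delays instead of actually waiting.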

Full Changelog: 1.7.2...1.7.3
