Commit a81699b

Add retry_standard_errors config for SQS ActiveJob (#115)
1 parent 5c19133 commit a81699b

10 files changed, +102 -13 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 Unreleased Changes
 ------------------

+* Feature - Add `retry_standard_errors` (default `true`) in SQS ActiveJob and improve retry logic (#114).
+
 3.10.0 (2024-01-19)
 ------------------

README.md

Lines changed: 31 additions & 5 deletions
@@ -298,12 +298,31 @@ YourJob.set(wait: 1.minute).perform_later(args)
 Note: Due to limitations in SQS, you cannot schedule jobs for
 later than 15 minutes in the future.

-### Performance
-AWS SQS ActiveJob is a lightweight and performant queueing backend. Benchmark performed using: Ruby MRI 2.6.5,
-shoryuken 5.0.5, aws-sdk-rails 3.3.1 and aws-sdk-sqs 1.34.0 on a 2015 Macbook Pro dual-core i7 with 16GB ram.
+### Retry Behavior and Handling Errors
+See the Rails ActiveJob Guide on
+[Exceptions](https://guides.rubyonrails.org/active_job_basics.html#exceptions)
+for background on how ActiveJob handles exceptions and retries.
+
+In general, you should configure retries for your jobs using
+[retry_on](https://edgeapi.rubyonrails.org/classes/ActiveJob/Exceptions/ClassMethods.html#method-i-retry_on).
+When configured, ActiveJob will catch the exception and reschedule the job for
+re-execution after the configured delay. This will delete the original
+message from the SQS queue and requeue a new message.
+
+By default SQS ActiveJob is configured with `retry_standard_errors` set to `true`
+and will not delete messages for jobs that raise a `StandardError` and that do
+not handle that error via `retry_on` or `discard_on`. These job messages
+will remain on the queue and will be re-read and retried following the
+SQS Queue's configured
+[retry and DLQ settings](https://docs.aws.amazon.com/lambda/latest/operatorguide/sqs-retries.html).
+If you do not have a DLQ configured, the message will continue to be attempted
+until it reaches the queue's retention period. In general, it is a best practice
+to configure a DLQ to store unprocessable jobs for troubleshooting and redrive.
+
+If you want failed jobs that do not have `retry_on` or `discard_on` configured
+to be immediately discarded and not left on the queue, set `retry_standard_errors`
+to `false`. See the configuration section below for details.

-*AWS SQS ActiveJob* (default settings): Throughput 119.1 jobs/sec
-*Shoryuken* (default settings): Throughput 76.8 jobs/sec

 ### Running workers - polling for jobs
 To start processing jobs, you need to start a separate process
@@ -325,6 +344,13 @@ Note: When running in production, its recommended that use a process
 supervisor such as [foreman](https://github.com/ddollar/foreman), systemd,
 upstart, daemontools, launchd, runit, etc.

+### Performance
+AWS SQS ActiveJob is a lightweight and performant queueing backend. Benchmark performed using: Ruby MRI 2.6.5,
+shoryuken 5.0.5, aws-sdk-rails 3.3.1 and aws-sdk-sqs 1.34.0 on a 2015 Macbook Pro dual-core i7 with 16GB ram.
+
+*AWS SQS ActiveJob* (default settings): Throughput 119.1 jobs/sec
+*Shoryuken* (default settings): Throughput 76.8 jobs/sec
+
 ### Serverless workers: processing activejobs using AWS Lambda
 Rather than managing the worker processes yourself, you can use Lambda with an SQS Trigger.
 With [Lambda Container Image Support](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/)
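
The README's advice to configure a DLQ can be made concrete with a rough back-of-envelope estimate. The sketch below is not part of the commit; the function name `estimated_attempts` and its parameters are hypothetical. It assumes a job that fails on every attempt, a worker that polls continuously, and failures that take less time than the visibility timeout. The example values (30 seconds, 345,600 seconds) are the SQS defaults for visibility timeout and retention period.

```ruby
# Back-of-envelope estimate only; this is not library code.
# Assumes every attempt raises an unhandled StandardError and that
# retry_standard_errors is true, so the message is never deleted by the worker.
def estimated_attempts(visibility_timeout:, retention_period:, max_receive_count: nil)
  # Without a DLQ, the message becomes visible again after each visibility
  # timeout until the queue's retention period expires.
  by_retention = (retention_period.to_f / visibility_timeout).ceil

  # With a redrive policy, SQS moves the message to the DLQ once its receive
  # count exceeds maxReceiveCount, so workers see it at most that many times.
  return by_retention unless max_receive_count

  [max_receive_count, by_retention].min
end

# With a DLQ and maxReceiveCount of 5.
puts estimated_attempts(visibility_timeout: 30, retention_period: 345_600, max_receive_count: 5)

# Without a DLQ, using the SQS default timeouts.
puts estimated_attempts(visibility_timeout: 30, retention_period: 345_600)
```

Under these assumptions, a queue without a DLQ can keep retrying a permanently failing job thousands of times, which is why the README recommends configuring one.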

lib/aws/rails/sqs_active_job/configuration.rb

Lines changed: 11 additions & 0 deletions
@@ -25,6 +25,7 @@ class Configuration
         DEFAULTS = {
           max_messages: 10,
           shutdown_timeout: 15,
+          retry_standard_errors: true, # TODO: Remove in next MV
           queues: {},
           logger: ::Rails.logger,
           message_group_id: 'SqsActiveJobGroup',
@@ -64,6 +65,16 @@ class Configuration
         #   will not be deleted from the SQS queue and will be retryable after
         #   the visibility timeout.
         #
+        # @option options [Boolean] :retry_standard_errors
+        #   If `true`, StandardErrors raised by ActiveJobs are left on the queue
+        #   and will be retried (pending the SQS Queue's redrive/DLQ/maximum receive settings).
+        #   This behavior overrides the standard Rails ActiveJob
+        #   [Retry/Discard for failed jobs](https://guides.rubyonrails.org/active_job_basics.html#retrying-or-discarding-failed-jobs)
+        #   behavior. When set to `true` the retries provided by this will be
+        #   on top of any retries configured on the job with `retry_on`.
+        #   When `false`, retry behavior is fully configured
+        #   through `retry_on`/`discard_on` on the ActiveJobs.
+        #
         # @option options [ActiveSupport::Logger] :logger Logger to use
         #   for the poller.
         #

lib/aws/rails/sqs_active_job/executor.rb

Lines changed: 11 additions & 2 deletions
@@ -13,14 +13,15 @@ class Executor
           auto_terminate: true,
           idletime: 60, # 1 minute
           fallback_policy: :caller_runs # slow down the producer thread
+          # TODO: Consider catching the exception and sleeping instead of using :caller_runs
         }.freeze

         def initialize(options = {})
           @executor = Concurrent::ThreadPoolExecutor.new(DEFAULTS.merge(options))
+          @retry_standard_errors = options[:retry_standard_errors]
           @logger = options[:logger] || ActiveSupport::Logger.new($stdout)
         end

-        # TODO: Consider catching the exception and sleeping instead of using :caller_runs
         def execute(message)
           @executor.post(message) do |message|
             begin
@@ -31,10 +32,18 @@ def execute(message)
             rescue Aws::Json::ParseError => e
               @logger.error "Unable to parse message body: #{message.data.body}. Error: #{e}."
             rescue StandardError => e
-              # message will not be deleted and will be retried
               job_msg = job ? "#{job.id}[#{job.class_name}]" : 'unknown job'
               @logger.info "Error processing job #{job_msg}: #{e}"
               @logger.debug e.backtrace.join("\n")
+
+              if @retry_standard_errors && !job.exception_executions?
+                @logger.info(
+                  'retry_standard_errors is enabled and job has not ' \
+                  "been retried by Rails. Leaving #{job_msg} in the queue."
+                )
+              else
+                message.delete
+              end
             end
           end
         end
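
The new branch in the `rescue StandardError` block reduces to a two-input decision. The sketch below is a simplified model for reading the diff, not the library's code. The names `action_after_standard_error` and `FakeJob` are hypothetical, and treating a `nil` job as "not retried by Rails" is an assumption of this sketch, since the diff calls `job.exception_executions?` without a nil check.

```ruby
# Simplified model of the rescue branch above; not the library's code.
# `job` is any object that responds to #exception_executions?, or nil when
# no job object was built from the message.
def action_after_standard_error(retry_standard_errors:, job:)
  # Assumption of this sketch: a missing job counts as "not retried by Rails".
  retried_by_rails = job ? job.exception_executions? : false

  if retry_standard_errors && !retried_by_rails
    :leave_on_queue # SQS redelivers the message after the visibility timeout
  else
    :delete_message # the error is logged and the message is removed
  end
end

# Hypothetical stand-in for JobRunner that exposes only the predicate.
FakeJob = Struct.new(:retried) do
  def exception_executions?
    retried
  end
end

[true, false].product([true, false]).each do |flag, retried|
  result = action_after_standard_error(retry_standard_errors: flag, job: FakeJob.new(retried))
  puts "retry_standard_errors=#{flag} retried_by_rails=#{retried} => #{result}"
end
```

Read this way, a message stays on the queue only when the flag is on and Rails has not already retried the job. A job whose `retry_on` attempts were used up and which then re-raises would, under this model, have its message deleted even with the flag enabled.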

lib/aws/rails/sqs_active_job/job_runner.rb

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@ def initialize(message)
         def run
           ActiveJob::Base.execute @job_data
         end
+
+        def exception_executions?
+          @job_data['exception_executions'] &&
+            !@job_data['exception_executions'].empty?
+        end
       end
     end
   end
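
To see what the new predicate returns for different payloads, here is a small standalone sketch. `JobDataProbe` is a hypothetical stand-in that copies only the predicate from the diff. The sample hashes are illustrative, and the key shape inside `exception_executions` is an assumption about how ActiveJob serializes its retry counters.

```ruby
# Hypothetical stand-in for JobRunner that carries only the new predicate.
class JobDataProbe
  def initialize(job_data)
    @job_data = job_data
  end

  # Same expression as in the diff: truthy only when the serialized job
  # records at least one exception-driven retry.
  def exception_executions?
    @job_data['exception_executions'] &&
      !@job_data['exception_executions'].empty?
  end
end

# Illustrative payloads; real messages carry many more keys.
fresh   = { 'job_class' => 'HelloJob', 'exception_executions' => {} }
retried = { 'job_class' => 'HelloJob',
            'exception_executions' => { '[HelloJob::NameException]' => 1 } }
missing = { 'job_class' => 'HelloJob' }

p JobDataProbe.new(fresh).exception_executions?   # false
p JobDataProbe.new(retried).exception_executions? # true
p JobDataProbe.new(missing).exception_executions? # nil
```

Note that the predicate returns `nil` rather than `false` when the key is absent. That makes no difference to the executor, which only negates the result.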

lib/aws/rails/sqs_active_job/poller.rb

Lines changed: 14 additions & 2 deletions
@@ -16,7 +16,8 @@ class Poller
           threads: 2 * Concurrent.processor_count,
           max_messages: 10,
           shutdown_timeout: 15,
-          backpressure: 10
+          backpressure: 10,
+          retry_standard_errors: true
         }.freeze

         def initialize(args = ARGV)
@@ -45,7 +46,12 @@ def run

           Signal.trap('INT') { raise Interrupt }
           Signal.trap('TERM') { raise Interrupt }
-          @executor = Executor.new(max_threads: @options[:threads], logger: @logger, max_queue: @options[:backpressure])
+          @executor = Executor.new(
+            max_threads: @options[:threads],
+            logger: @logger,
+            max_queue: @options[:backpressure],
+            retry_standard_errors: @options[:retry_standard_errors]
+          )

           poll
         rescue Interrupt
@@ -99,6 +105,7 @@ def boot_rails
           require File.expand_path('config/environment.rb')
         end

+        # rubocop:disable Metrics
         def parse_args(argv)
           out = {}
           parser = ::OptionParser.new do |opts|
@@ -127,6 +134,10 @@ def parse_args(argv)
                     'The amount of time to wait for a clean shutdown. Jobs that are unable to complete in this time will not be deleted from the SQS queue and will be retryable after the visibility timeout.') do |a|
               out[:shutdown_timeout] = a
             end
+            opts.on('--[no-]retry_standard_errors [FLAG]', TrueClass,
+                    'When set, retry all StandardErrors (leaving failed messages on the SQS Queue). These retries are ON TOP of standard Rails ActiveJob retries set by retry_on in the ActiveJob.') do |a|
+              out[:retry_standard_errors] = a.nil? ? true : a
+            end
           end

           parser.banner = 'aws_sqs_active_job [options]'
@@ -138,6 +149,7 @@ def parse_args(argv)
           parser.parse(argv)
           out
         end
+        # rubocop:enable Metrics

         def validate_config
           raise ArgumentError, 'You must specify the name of the queue to process jobs from' unless @options[:queue]
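
The new command-line flag can be tried in isolation with Ruby's standard `optparse`. The sketch below reuses the `opts.on` call from the diff with a shortened description. The wrapper `parse_retry_flag` is hypothetical, and the real `parse_args` defines several other options.

```ruby
require 'optparse'

# Standalone sketch of the new flag only; not the poller's full parse_args.
def parse_retry_flag(argv)
  out = {}
  parser = OptionParser.new do |opts|
    opts.on('--[no-]retry_standard_errors [FLAG]', TrueClass,
            'Leave failed messages on the SQS queue') do |a|
      # A bare flag arrives as nil, so it is treated as "enabled".
      out[:retry_standard_errors] = a.nil? ? true : a
    end
  end
  parser.parse(argv)
  out
end

p parse_retry_flag([])                             # flag absent: key not set
p parse_retry_flag(['--retry_standard_errors'])    # bare flag
p parse_retry_flag(['--no-retry_standard_errors']) # negated flag
```

When the flag is absent the key is simply missing from the parsed options. In that case the value would have to come from elsewhere, such as the poller's `DEFAULTS` shown in the diff or a configuration file.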

sample_app/Gemfile

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ gem "sprockets-rails"
 # Use sqlite3 as the database for Active Record
 gem 'sqlite3', '~> 1.4'
 # Use Puma as the app server
-gem 'puma', '~> 5.0'
+gem 'puma', '~> 6.0'
 # Use SCSS for stylesheets
 gem 'sass-rails', '>= 6'
 # Transpile app-like JavaScript. Read more: https://github.com/rails/webpacker

sample_app/app/jobs/hello_job.rb

Lines changed: 11 additions & 0 deletions
@@ -1,7 +1,18 @@
 class HelloJob < ApplicationJob
   queue_as :default

+  class NameException < StandardError; end
+
+  class SkipException < StandardError; end
+
+  retry_on NameException
+  discard_on SkipException
+
   def perform(name)
+    raise NameException if name == "error"
+    raise SkipException if name == "skip"
+    raise StandardError if name == "StandardError"
+
     puts "Hello from our job: #{name}"
   end
 end
Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 queues:
-  default: 'https://sqs.us-east-1.amazonaws.com/655347895545/ActiveJobDefault'
+  default: <%= ENV['AWS_ACTIVE_JOB_QUEUE_URL'] %>
 shutdown_timeout: 10
+retry_standard_errors: true

test/aws/rails/sqs_active_job/executor_test.rb

Lines changed: 14 additions & 2 deletions
@@ -34,13 +34,25 @@ module SqsActiveJob
             executor.shutdown # give the job a chance to run
           end

-          it 'does not delete the message on exception' do
+          it 'deletes the message on exception' do
             expect(JobRunner).to receive(:new).and_return(runner)
             expect(runner).to receive(:run).and_raise StandardError
-            expect(msg).not_to receive(:delete)
+            expect(msg).to receive(:delete)
             executor.execute(msg)
             executor.shutdown # give the job a chance to run
           end
+
+          describe 'retry_standard_errors' do
+            let(:executor) { Executor.new(retry_standard_errors: true) }
+
+            it 'does not delete the message on exception' do
+              expect(JobRunner).to receive(:new).and_return(runner)
+              expect(runner).to receive(:run).and_raise StandardError
+              expect(msg).not_to receive(:delete)
+              executor.execute(msg)
+              executor.shutdown # give the job a chance to run
+            end
+          end
         end

         describe '#shutdown' do
