TechDocs Job Processing
The V2 API was not built with offloading of resource-intensive or slow operations in mind. As a result, many operations that could be asynchronous and failure-tolerant (e.g. Stripe, geocoding, SOAP services) currently are not.
We currently use Rufus Scheduler (github.com/jmettraux/rufus-scheduler) as both a task scheduler and a rather primitive job processor, in blocking mode, which is problematic for both failure handling and scalability.
To schedule tasks we have been using PostgreSQL tables as a queue of sorts. This is not the best approach either: there is no pub/sub, so the job handler has to keep polling the database for new jobs. It’s just not the best tool for the job (no pun intended).
To make matters worse, task_scheduler.rb is run as an initializer, so it runs alongside the Rails stack. If we were to run production in HA (www.linux-ha.org/wiki/Main_Page), we’d have at least two instances of the task scheduler running at the same time, scheduling and running the same jobs with no proper job locking, which could cause all sorts of havoc.
The proven, “Rails” way of handling background jobs involves:
-
A library that creates jobs, places them on a queue, and processes that queue later, e.g. Sidekiq or Resque
-
A library that schedules these jobs at arbitrary times, e.g. rufus or clockwork
-
A fast, in-memory key-value store as the backend, e.g. Redis or Memcached
Mike Perham (creator of Sidekiq) has a great talk about this whole subject: confreaks.com/videos/1290-rubyconf2012-asynchronous-processing-for-fun-and-profit
Based on our previous experience at Staunchrobots, we recommend running with:
-
Sidekiq for job processing
-
Clockwork for scheduling recurring jobs
-
Redis as the in-memory store
Sidekiq: simple, efficient message processing for Ruby.
Sidekiq uses threads to handle many messages at the same time in the same process. It does not require Rails but will integrate tightly with Rails 3 to make background message processing dead simple.
Sidekiq is compatible with Resque. It uses the exact same message format as Resque so it can integrate into an existing Resque processing farm. You can have Sidekiq and Resque run side-by-side at the same time and use the Resque client to enqueue messages in Redis to be processed by Sidekiq.
At the same time, Sidekiq uses multithreading, so it is much more memory efficient than Resque (which forks a new process for every job). You might need 50 Resque processes at 200MB each to peg your CPU, whereas one 300MB Sidekiq process will peg the same CPU and perform the same amount of work.
It’s the modern Ruby approach. Fast, reliable, scalable.
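To make the threading model concrete, here is a minimal sketch that uses only Ruby's standard library. It is not how Sidekiq is implemented (Sidekiq fetches its jobs from Redis). The `process_jobs` helper, the queue, and the thread count are hypothetical and only illustrate several threads draining one queue inside a single process.

```ruby
require 'thread' # Queue; a no-op on modern Rubies where it is built in

# Illustration only: a tiny in-process thread pool draining a shared queue.
# Runs every job in `jobs` (an array of callables) on `thread_count` threads
# and returns the results in completion order.
def process_jobs(jobs, thread_count)
  queue = Queue.new
  jobs.each { |job| queue << job }
  thread_count.times { queue << :stop } # one stop marker per thread

  results = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      # pop blocks until an item is available, so there is no polling loop
      while (job = queue.pop) != :stop
        results << job.call
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

# Ten slow-ish jobs shared by four threads in one process.
jobs = (1..10).map { |n| -> { sleep 0.01; n * 2 } }
doubled = process_jobs(jobs, 4)
puts doubled.sort.inspect # => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

Because the threads share one process, they also share its loaded code and memory, which is where the memory savings over a fork-per-job model come from.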
github.com/tomykaira/clockwork
Cron is non-ideal for running scheduled application tasks, especially in an app deployed to multiple machines. More details: adam.heroku.com/past/2010/4/13/rethinking_cron/
Clockwork is a cron replacement. It runs as a lightweight, long-running Ruby process which sits alongside your web processes (Mongrel/Thin) and your worker processes (DJ/Resque/Minion/Stalker) to schedule recurring work at particular times or dates. For example, refreshing feeds on an hourly basis, sending reminder emails on a nightly basis, or generating invoices once a month on the 1st.
It’s the lightest option and integrates tightly with Sidekiq. Note that Sidekiq does support programmatically delaying execution with delay_for(interval). Clockwork is just for the recurring stuff.
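As a rough idea of what a Clockwork file for recurring jobs could look like, here is a hedged sketch. The worker class names, job labels, and times are hypothetical, and the exact DSL may differ slightly between Clockwork versions. Jobs are pushed by class name with `Sidekiq::Client.push`, so the clock process does not have to load the worker classes themselves.

```ruby
# clock.rb: a sketch of a Clockwork schedule that only enqueues Sidekiq jobs.
# Worker names and times below are assumptions, not the app's real ones.
require 'clockwork'
require 'sidekiq'

# Push a job onto Sidekiq's Redis queue by class name. The Sidekiq process
# resolves the class when it picks the job up, so this file stays lightweight.
def enqueue(worker_class_name, *args)
  Sidekiq::Client.push('class' => worker_class_name, 'args' => args)
end

module Clockwork
  # Every hour (periods are given in seconds).
  every(3600, 'feeds.refresh') { enqueue('FeedRefreshWorker') }

  # Once a day at 01:30.
  every(86_400, 'reminders.nightly', :at => '01:30') { enqueue('ReminderEmailWorker') }
end
```

The clock process does no real work itself; it only drops messages on the queue, so a slow or failing job cannot block the schedule.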
Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
It’s required by Sidekiq and we’re planning on using it already, for caching.
All this is fine and dandy, but what exactly do we need to do to have all these great features?
-
Install Redis on our application server stack or on a separate server.
-
Use monit/upstart to manage sidekiq. Example upstart: github.com/mperham/sidekiq/tree/master/examples/upstart
-
Prune Gemfile of other unused job processing gems
-
Refactor all workers to include Sidekiq::Worker and implement the perform instance method (jobs are then enqueued with the perform_async class method)
-
Make sure all workers are idempotent and transactional. This may or may not mean designing new worker state machines.
-
Move rufus schedules to a separate clockwork ruby file that doesn’t necessarily need to load the entire Rails stack. This should be run by clockworkd and managed by monit / upstart
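The worker-related steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual payment code: `Payment` and `FakeGateway` are hypothetical in-memory stand-ins, and the `require 'sidekiq'` is optional so the sketch also runs without the gem. The point is the shape of the worker: `perform` is an instance method that takes a simple id, and a state check makes a retry a no-op.

```ruby
begin
  require 'sidekiq' # optional here; the sketch still runs without the gem
rescue LoadError
end

# Hypothetical stand-in for an ActiveRecord model, kept in memory.
class Payment
  attr_reader :id, :amount_cents
  attr_accessor :state

  @records = {}
  class << self
    def create(id, amount_cents)
      @records[id] = new(id, amount_cents)
    end

    def find(id)
      @records.fetch(id)
    end
  end

  def initialize(id, amount_cents)
    @id = id
    @amount_cents = amount_cents
    @state = :pending
  end
end

# Hypothetical stand-in for a third-party client; records each charge request.
class FakeGateway
  def self.charges
    @charges ||= []
  end

  def self.charge(payment)
    charges << payment.id
  end
end

class PaymentWorker
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  # Sidekiq calls #perform on a new instance. Pass ids rather than objects,
  # since job arguments are serialized to JSON.
  def perform(payment_id)
    payment = Payment.find(payment_id)
    return if payment.state == :charged # idempotency guard: a retry does nothing

    FakeGateway.charge(payment)
    payment.state = :charged
  end
end

Payment.create(1, 2_500)
worker = PaymentWorker.new
worker.perform(1)
worker.perform(1) # simulated retry of the same job
puts FakeGateway.charges.length # => 1
```

With Sidekiq running, the job would be enqueued with `PaymentWorker.perform_async(1)`, or delayed with `PaymentWorker.perform_in(300, 1)`. A real worker would also wrap the state check and update in a database transaction so that two concurrent runs cannot both pass the guard.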
These are found in app/workers:
-
backoffice_worker.rb
-
email_worker.rb
-
fax_worker.rb
-
payment_worker.rb
-
ride_reservation_worker.rb
-
routing_worker.rb
-
search_stats_worker.rb
-
voice_worker.rb
-
worker.rb
I’m not exactly sure how these are used as I’ve never had to mess with them, but I’m assuming they are either workers that talk to 3rd party services or services used by our own workers.
These are found in app/services:
-
service.rb
-
fax_service/phaxio.rb
-
ground_widgets/booking_service.rb
-
ground_widgets/query_service.rb
-
ground_widgets/soap_service.rb
-
voice_service/twilio.rb
These are found in app/models (see a pattern already?):
We have some hooks in models that would benefit greatly from async processing. Namely:
-
concerns/stripe_customer.rb - all the Stripe stuff could be moved to a worker, as a credit card is not required on user creation. We could have a credit card state machine that moves to valid once Stripe confirms the card, etc.
-
address.rb - could move all the geocoding stuff to a worker, with a simple geocoded flag that moves to true once the address has been processed and back to false when the address changes.
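The geocoded flag idea could look roughly like this. It is a plain-Ruby sketch under assumptions: the `Address` class here is a hypothetical stand-in for the ActiveRecord model, `GeocodingWorker` and `fake_lookup` are invented for illustration, and a real Sidekiq worker would receive an address id and look the record up rather than take the object directly.

```ruby
# Hypothetical stand-in for the Address model; in the app this would be an
# ActiveRecord model with a boolean `geocoded` column.
class Address
  attr_reader :street, :latitude, :longitude

  def initialize(street)
    @street = street
    @geocoded = false
  end

  def geocoded?
    @geocoded
  end

  # Changing the address invalidates any earlier geocoding result.
  def street=(new_street)
    return if new_street == @street

    @street = new_street
    @latitude = @longitude = nil
    @geocoded = false
  end

  def store_coordinates(latitude, longitude)
    @latitude = latitude
    @longitude = longitude
    @geocoded = true
  end
end

# Hypothetical worker: `lookup` is any callable returning [lat, lng] for a street.
class GeocodingWorker
  def initialize(lookup)
    @lookup = lookup
  end

  def perform(address)
    return if address.geocoded? # already processed, nothing to do

    address.store_coordinates(*@lookup.call(address.street))
  end
end

fake_lookup = ->(_street) { [40.0, -75.0] } # stands in for the real geocoding service
address = Address.new('1 Main St')

GeocodingWorker.new(fake_lookup).perform(address)
puts address.geocoded? # => true

address.street = '2 Main St' # the address changed, so it needs geocoding again
puts address.geocoded? # => false
```

Saving the address never waits on the geocoding service; the flag simply tells the rest of the app whether coordinates are available yet.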
These are found in config/initializers:
-
task_scheduler.rb - our current dread file. We should move all of this to a clockwork file and review what we’re scheduling as recurring and why. Since jobs can now be enqueued directly, in the end the only things left here would be aggregators / digest mails / etc.
We don’t necessarily need to do all of this at once. We could delay deprecating the rufus task_scheduler.rb initializer, and slowly move workers out of it and into the Sidekiq architecture.
This would allow us to review / rewrite these workers and their specs one at a time. The only thing that we really need to do to make this possible is review the Gemfile and have sidekiq properly running, with its dashboard linked to our routes. This was done in the feature/sidekiq spike so theoretically we have a foundation to start working already.