Running single-node Factotum in a semi-distributed fashion

Currently, Factotum is a single-node jobflow runner: there is no built-in support for running multiple Factotum workers in a distributed fashion (unlike Chronos and similar schedulers).

However, there is a strategy you can use, based on an idea in Kyle Kingsbury’s Jepsen analysis of Chronos:

you might consider shipping cronfiles directly to redundant nodes and
having tasks coordinate through a consensus system–it could, depending
on your infrastructure reliability and need for load-balancing, be
simpler and more reliable

Provided that your jobs:

  1. Can detect if another instance of the same job has started
  2. Will exit gracefully (returning a distinct no-op exit code) if condition 1 holds

then you can potentially push the same cron file containing Factotum commands to multiple servers for execution.
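As a rough illustration of the two conditions above, here is a minimal Python sketch of a job wrapper. It is not part of Factotum. It assumes every server can see a shared directory that stands in for the consensus system, and that each scheduled run has a unique job id (for example the job name plus the scheduled time). The names `try_claim`, `run_once` and `NOOP_EXIT_CODE`, and the value 3, are hypothetical choices made for this example.

```python
import os
import socket

# Hypothetical exit code meaning "another instance already claimed this run".
# The value 3 is an arbitrary choice for this sketch.
NOOP_EXIT_CODE = 3


def try_claim(lock_dir, job_id):
    """Try to claim a run by atomically creating a marker file.

    Returns True if this call created the marker, False if it already existed.
    """
    path = os.path.join(lock_dir, job_id + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so at most one
        # caller can succeed for a given path.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as marker:
        # Record who claimed the run, to help with debugging.
        marker.write("{}:{}\n".format(socket.gethostname(), os.getpid()))
    return True


def run_once(lock_dir, job_id, work):
    """Run work() only if this instance claims job_id; return an exit code."""
    if not try_claim(lock_dir, job_id):
        return NOOP_EXIT_CODE  # condition 2: exit gracefully with a no-op code
    work()
    return 0
```

A job script would end with something like `sys.exit(run_once("/shared/locks", job_id, do_work))`, where the path and `do_work` are placeholders. The marker is never removed, so each scheduled run needs its own job id. Whether exclusive file creation is reliably atomic depends on the shared filesystem; a real consensus system would be a sturdier choice.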

Some provisos:

  1. Race conditions are still possible if two instances of a job start at exactly the same time
  2. If one of your servers dies during a run, the jobs running on it at the time will not complete and will not be rescheduled, so this isn’t a true high-availability solution

We’ll update this article as and when Factotum gains built-in distribution.