Drift occurs when the configuration or code in source control diverges from what is actually installed and running in production. This is a common problem for software organizations. In my first month at RtF, I've seen Drift show up as problems building and running the Docker container responsible for the First Discovery Server, as unapplied iptables rules, and as incorrect host information in the Ansible inventory.
The best and most direct way to fight Drift is with Continuous Delivery. If the configuration and code in source control are constantly being applied and deployed to the production environment, Drift becomes impossible(1). So the goal of this ticket is to make Ansible run automatically and more often, ideally running all the time!
For systems like this I generally prefer pull over push. Ansible can operate this way (via ansible-pull). In the not-too-distant future I see Terraform spinning up VMs and handing them off to Ansible, which sets up a cron job to pull periodically, and then we have a fleet of machines that (more or less) keep themselves updated all the time.
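As a rough sketch of what that cron job could look like, the Python helper below builds a crontab line that runs ansible-pull on a schedule. The repository URL, branch, working directory, and 15-minute interval are placeholders invented for illustration, not our real values; `-U`, `-C`, and `-d` are the ansible-pull options for the repository URL, the ref to check out, and the checkout directory.

```python
import shlex


def ansible_pull_cron_line(
    repo_url: str,
    branch: str = "main",                        # placeholder branch name
    playbook: str = "local.yml",                 # ansible-pull's default playbook name
    every_minutes: int = 15,                     # placeholder interval
    workdir: str = "/var/lib/ansible/local",     # placeholder checkout directory
) -> str:
    """Return a user-crontab line that runs ansible-pull periodically."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    command = [
        "ansible-pull",
        "-U", repo_url,   # repository to pull playbooks from
        "-C", branch,     # branch, tag, or commit to check out
        "-d", workdir,    # where the repository is checked out on the host
        playbook,
    ]
    # shlex.join quotes arguments so the line is safe to paste into a crontab.
    return f"*/{every_minutes} * * * * {shlex.join(command)}"


if __name__ == "__main__":
    # Hypothetical repository URL, for illustration only.
    print(ansible_pull_cron_line("https://git.example.com/ops/ansible.git"))
```

The sketch deliberately leaves out `--only-if-changed`: to fight Drift, the playbook should be re-applied even when the repository has not changed, so that local changes on the host get corrected too.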
Another approach is to use Jenkins (or maybe GitLab) to run Ansible for us using a push model. This may be simpler and/or faster than switching to a pull-based model.
Jenkins (or GitLab) will likely have a role in the system anyway, running tests on each commit (using something like Molecule) and perhaps being responsible for blessing commits that passed the test suite so that they can be deployed to production.
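To make the "blessing" idea a bit more concrete, here is a minimal sketch of the selection logic, assuming the CI system can give us the commit history (newest first) and the set of commits whose test suite passed. The function name and the way the data arrives are hypothetical; how Jenkins would actually record a blessing (a tag, a branch, a status) is still an open design choice.

```python
from typing import Iterable, Optional, Set


def latest_blessed_commit(
    commits_newest_first: Iterable[str],
    passed: Set[str],
) -> Optional[str]:
    """Return the newest commit that passed the test suite, or None."""
    for sha in commits_newest_first:
        if sha in passed:
            return sha
    # Nothing has passed yet, so there is nothing safe to deploy.
    return None


if __name__ == "__main__":
    # Hypothetical history: the newest commit "c3" has not passed its tests.
    history = ["c3", "c2", "c1"]
    print(latest_blessed_commit(history, passed={"c1", "c2"}))
```

The point of the gate is that production only ever moves to a commit the test suite has approved, even if newer, untested commits already exist in source control.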
What is my rosy picture of the future overlooking? Is there a simpler or better way to solve this problem? Are there any details that might make this harder than it sounds? (In today's ops weekly, Alfredo mentioned some coordination issues between load balancers and other hosts when updating SSL certs.)
(1) For small to medium values of "impossible".