How To Avoid IaC Drift

Blocking infrastructure drift with RBAC alone will just frustrate your engineers; here's how to make IaC changes so easy that they'll actually want to make them.

Got problems with infrastructure as code (IaC) drift? Let's talk about what you can do about it.

Avoiding drift comes down to two things:

  • Making the right way of changing infrastructure (through IaC) easy, approachable, and supported

  • Using RBAC and the principle of least privilege to block the folks who don't care or think they don't have time to do things the "right" way

You can implement #2 pretty easily, but things will grind to a halt if you don't combine it with #1. In this situation, you won't have drift, but you'll create frustration across your organization. The result is usually that engineers introduce crap architecture and code to "hack" around the fact that they couldn't get the infrastructure they needed.
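To show how small the RBAC half of this can be, here's a minimal sketch of a least-privilege check. The role names (`engineer`, `iac_pipeline`) and the two actions are hypothetical, invented for illustration; a real setup would express this in your cloud provider's IAM rather than in application code.

```python
# Hypothetical actions and roles, invented for illustration only.
READ = "read"
WRITE = "write"

# Least privilege: humans can look, only the IaC pipeline can change things.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "engineer": frozenset({READ}),
    "iac_pipeline": frozenset({READ, WRITE}),
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action.

    Unknown roles get no permissions at all (deny by default).
    """
    return action in ROLE_PERMISSIONS.get(role, frozenset())
```

Notice that this blocks console changes but does nothing to help an engineer get the infrastructure they need, which is exactly why #2 on its own isn't enough.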

As my friends at Spacelift would say, you need to "deliver the speed your developers demand with the control your platform team requires".

So let's talk about how to implement #1: making infra changes through IaC easy, approachable, and supported. Here are some reasonable ways to create the right environment for IaC efficiency and ease of use within your org:

  • Automate: This should go without saying, but if you're asking engineers in your org (platform or otherwise) to drive all infrastructure changes through manual, local applies, then you're going to struggle to make it easy to introduce changes. Automation makes it so an engineer who doesn't know much about your tooling can make a simple config change without needing to know how the underlying tooling works. Do they need a minor version upgrade to their database? That should be a one-line change and a PR review away. For proper self-service that application engineers actually use, you need an abstraction layer that simplifies the system down to "provide these variable inputs, put up a PR, and when it merges you get your infrastructure".

  • Document: Providing documentation for your systems, how your IaC is organized, how to introduce changes through a PR, and where to ask for help is absolutely critical. Documentation enables self-service. Treat this as a critical component of your platform and make sure to ask your teams "what documentation are we missing?" every once in a while. (Don’t forget to write documentation after you find gaps.)

  • Sandboxes are for ClickOps: Making changes directly in the console can be valuable and has a place: it's called a "Sandbox" environment. Think of this as an environment that you wipe regularly, where you can give everyone admin permissions and they can try out changes and configurations BEFORE they move those changes to code. This gives you a workflow for developing complicated or sensitive infrastructure changes without impacting an environment that everyone uses. Engineers test ClickOps or IaC changes in the Sandbox environment, port the tested changes to IaC in git, and then promote those changes up through the correct environments. It's actually very difficult to find the right balance between "Let me POC this thing real quick and check it actually fits the use-case" and "We need to be secure, have an inventory of our infrastructure, and ensure we're following all of our best practices and guardrails -- otherwise we end up with a dumpster fire in production and we'll fail our next audit". It's the classic dilemma. The sandbox is the best compromise that we've found to make it all come together well for all parties involved. But there probably isn't a perfect answer.
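To make the "provide these variable inputs" idea from the Automate point a bit more concrete, here's a minimal sketch of what a thin abstraction layer might do: check an engineer's inputs against a small allowlist and render them as JSON that could be committed as a `.tfvars.json` file in a PR. The input names (`engine_version`, `instance_class`, `allocated_storage_gb`) and the `render_tfvars` helper are assumptions made up for this example, not part of any real module.

```python
import json

# Hypothetical inputs for a database module; names and types are illustrative.
ALLOWED_INPUTS: dict[str, type] = {
    "engine_version": str,
    "instance_class": str,
    "allocated_storage_gb": int,
}


def render_tfvars(inputs: dict) -> str:
    """Validate inputs against the allowlist and return them as JSON text."""
    unknown = set(inputs) - set(ALLOWED_INPUTS)
    if unknown:
        raise ValueError(f"unsupported inputs: {sorted(unknown)}")

    for name, value in inputs.items():
        expected = ALLOWED_INPUTS[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{name} must be of type {expected.__name__}")

    # Sorted keys keep the file stable, so PR diffs stay to the changed line.
    return json.dumps(inputs, indent=2, sort_keys=True)
```

With something like this in place, the minor database upgrade really is a one-line diff: the engineer changes `engine_version` in their inputs, puts up the PR, and the pipeline does the rest after merge.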

There are other things you can do in this realm, but these three are absolutely critical. Engineers won't move away from ClickOps, ShadowOps, and constant drift unless you nail these.

May your IaC deployment always be automated,

Matt @ Masterpoint

PS Check out this comprehensive guide to the various files that make up a Terraform and OpenTofu project that we put together.