Real-life Ansible – chapter one – how to run playbooks on production (more) safely

This is the first blog post in a series presenting some various real-life examples of Ansible in our projects. If you are interested in Ansible going production-ready – stay tuned!

Problem definition

We have a few inventories, like dev, stage and prod. For some of them, stability is not as important as for the others. On the other hand, our playbooks may contain several parameters, and we also want to pass some ansible-playbook parameters (e.g. tags), so a command to perform some action could be pretty long. It’s easy to overlook the crucial ones, especially for the production environment.

ansible-playbook -i inventories/dev app1.yml -e app_version=snapshot -t app --skip-tags maintenance -e serial=2

In such a situation there is a very high risk that we find a command in our shell history and run a playbook forgetting to change the inventory. While running the wrong command on dev is not so painful, we want to prevent accidental playbook running on production.

Confirmation on production

In the next few paragraphs I will show you our solution of making our playbooks production-aware.

Dedicated role

We decided to create a dedicated role named confirm, which is responsible for confirming our production actions. Why a dedicated role? Because we can include it in all playbooks that require confirmation. So to ensure confirmation is prompted, it’s enough to add a role to your playbook.


- hosts: app1
  gather_facts: no
  roles:
    - confirm
  tags:
    - always

We can skip the fact-gathering stage, as we do not depend on any system parameter. Most importantly, by using the always tag, we run this role always, regardless of the tags given in the command line.

Role tasks

I will show you the tasks’ evolution leading to the final version. Firstly, there might be just two simple tasks:

- name: wait for confirmation if required
  pause:
    prompt: "You are running playbook for {{ env }}, type environment name to confirm"
  register: confirm_response

- name: fail if wrong confirmation
  fail:
    msg: "Aborting due to incorrect input, given <<{{ confirm_response.user_input }}>>, expected <<{{ env }}>>"
  when: env != confirm_response.user_input

The snippet above shows a simple prompt asking a person to type the environment name to confirm their actions. This gives us the solution to our main goal – getting rid of playbooks that are running mindlessly. After prompt, you must type again the environment name (e.g. prod), so you must be aware that the playbook is run on production. If the confirmation response is wrong, the playbook is aborted.

Running multiple playbooks at once

Sometimes we need to run multiple playbooks at once, e.g.:

ansible-playbook -i inventories/dev app1.yml app2.yml app3.yml

In the solution above, we should confirm our actions in each playbook. To avoid this drawback, let’s introduce a slight modification:

- name: wait for confirmation if required
  pause:
    prompt: "You are running playbook for {{ env }}, type environment name to confirm"
  register: confirm_response
  when: confirm_response_user_input is not defined

- set_fact:
    confirm_response_user_input: "{{ confirm_response.user_input }}"
  when: confirm_response_user_input is not defined

- name: fail if wrong confirmation
  fail:
    msg: "Aborting due to incorrect input, given <<{{ confirm_response_user_input }}>>, expected <<{{ env }}>>"
  when: env != confirm_response_user_input

Now, we save the confirmation to the variable, so we can confirm only once. Why a dedicated variable? Why can’t we use confirm_response? Because if we skip the first task, confirm_response would contain info about the skipped first task instead of previous user input.

Securing only prod

Securing all playbooks would give us some inconvenience – it’s only important in a prod environment. To run security only on prod, we can wrap all tasks in a block, where we would test if this environment needs securing.

- block:
    - name: wait for confirmation if required
      pause:
        prompt: "You are running playbook for {{ env }}, type environment name to confirm"
      register: confirm_response
      when: confirm_response_user_input is not defined

    - set_fact:
        confirm_response_user_input: "{{ confirm_response.user_input }}"
      when: confirm_response_user_input is not defined

    - name: fail if wrong confirmation
      fail:
        msg: "Aborting due to incorrect input, given <<{{ confirm_response_user_input }}>>, expected <<{{ env }}>>"
      when: env != confirm_response_user_input
  when: confirm_actions is defined and confirm_actions
  delegate_to: localhost
  run_once: true

Now, only production has the variable defined as confirm_actions: true and only production needs manual confirmation before running. We must delegate the whole block to localhost, because we need to set fact – and we want to do it once and for all hosts.

Summary

Using the solution given above, we achieved the following goals:

  • Confirmation is required before running a playbook on production.
  • Even if we run a few playbooks at once, only single confirmation is required.
  • It’s easy to secure additional roles.