Handling Failures In Ansible Playbooks With Examples

In this article, you will learn different ways of handling failures in Ansible using built-in directives.

Failures are common on any platform you use. They come in different forms, such as syntax errors, undefined variables, unreachable hosts, and failed commands. You can also define your own failures based on certain conditions. To deal with them, you can rely on Ansible's built-in tools or use third-party tools such as yamllint or the Ansible extension for VS Code, which catch some problems, like syntax errors and invalid references, before a playbook runs.

In this article, we will focus only on the built-in directives for handling failures.

Check For Syntax Errors In Ansible Playbook

To validate YAML syntax, you can use a YAML linter such as yamllint. Ansible also has its own linter, ansible-lint, which reports syntax errors as well as other issues in your playbooks.

The ansible-playbook command also provides the “--syntax-check” flag, which checks the playbook for syntax errors without running it.

For example, Ansible reports an error during the syntax check when the playbook uses a module that does not exist.

$ ansible-playbook --syntax-check playbook.yml

One limitation of this check is that it reports only the first error it finds, not every error in the playbook.
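To make this concrete, here is a small hypothetical playbook that contains a deliberately misspelled module name, ansible.builtin.copyy (a typo invented for this illustration, not a real module). Running “ansible-playbook --syntax-check” against a file like this should fail with an error saying the module or action cannot be resolved, before any task is executed. The file paths are placeholders.

```yaml
---
# Hypothetical playbook used only to demonstrate --syntax-check.
- name: Syntax check demo
  hosts: all
  gather_facts: false

  tasks:
    # "copyy" is a deliberate typo: no such module exists, so the
    # syntax check fails while loading this task.
    - name: Copy a file with a misspelled module name
      ansible.builtin.copyy:
        src: files/app.conf
        dest: /tmp/app.conf
```

Fixing the module name to ansible.builtin.copy and re-running the check is the usual way to confirm the playbook parses cleanly.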

Ignoring Unreachable Hosts In Ansible

Ansible fails a task when it cannot resolve or connect to a target machine. In this case, it marks the host as “UNREACHABLE” and removes it from the list of active hosts, so no further tasks run on that host.

The following playbook runs against a host that is currently not up and running.

---
- name: Ignoring unreachable hosts
  hosts: ansalpine
  gather_facts: False

  tasks:

    - name: Check if a file is present
      ansible.builtin.stat:
        path: /home/ansuser/mainfile.txt
      register: fileop
      
    - name: Print the output of previous task
      ansible.builtin.debug:
        var: fileop

Since my host is not running, Ansible first tries to connect to it through SSH and fails with the following error message.

fatal: [ansalpine]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname ansalpine: Temporary failure in name resolution", "unreachable": true}

You can add the “ignore_unreachable: yes” directive at the play or task level. Ansible then ignores the unreachable error and continues running the next tasks against that host.
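Applied at the play level, the earlier playbook would look roughly like this. It is a sketch that reuses the same host alias (ansalpine) and file path as above; the directive could equally be placed on an individual task instead of the play.

```yaml
---
- name: Ignoring unreachable hosts
  hosts: ansalpine
  gather_facts: false
  # Keep running tasks on this host even if it cannot be reached.
  # The same directive can be set on a single task instead of the play.
  ignore_unreachable: true

  tasks:
    - name: Check if a file is present
      ansible.builtin.stat:
        path: /home/ansuser/mainfile.txt
      register: fileop

    - name: Print the output of previous task
      ansible.builtin.debug:
        var: fileop
```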

You might wonder how the second task ran fine while the host was unreachable. That is because the debug module runs on the control node, not on the target node.

Ignoring Failed Commands In Ansible Playbook

When a task returns “failed” as true, Ansible treats the task as failed and stops running any subsequent tasks on that host.

You can ignore a task failure and continue with the subsequent tasks by adding the “ignore_errors: yes” directive at the play or task level.

- name: Fail the task purposely
  ansible.builtin.shell: /bin/false
  register: status
  ignore_errors: yes
      
- name: Print the output of previous task
  ansible.builtin.debug:
    var: status
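Ignoring an error does not erase it: the registered result still records the failure. The sketch below wraps the two tasks above in a complete play (hosts: all is an assumption) and adds a hypothetical follow-up task that runs only when the first command failed, using the built-in “failed” test.

```yaml
---
- name: Ignore a failed command and react to it later
  hosts: all
  gather_facts: false

  tasks:
    - name: Fail the task purposely
      ansible.builtin.shell: /bin/false
      register: status
      ignore_errors: true

    # The registered result still has failed=true, so later tasks
    # can branch on it with the "failed" test.
    - name: Report that the previous task failed
      ansible.builtin.debug:
        msg: "The command failed with return code {{ status.rc }}"
      when: status is failed
```

Keep in mind that ignore_errors only applies when a task actually runs and returns a failed result. It does not cover unreachable hosts (use ignore_unreachable for those) or errors such as undefined variables and syntax errors.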

Define Condition Based Failures In Ansible

Sometimes a task completes but does not produce the result we expect. In such cases, we can make the task fail by defining our own conditions with the “failed_when” directive.

I am running the same playbook from the “Ignoring Unreachable Hosts” section, this time against a reachable host. It uses the stat module to check whether a file is present. If the file is not available, the task does not fail; it completes and simply reports that the file does not exist.

TASK [Check if a file is present] ****************************************************************************************
ok: [ansubuntu]

TASK [Print the output of previous task] *********************************************************************************
ok: [ansubuntu] => {
    "status": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python3"
        },
        "changed": false,
        "failed": false,
        "stat": {
            "exists": false
        }
    }
}

Here I want the task to fail when the file is not found. In the previous output, the value of “exists” is false, so I can use it to fail the task.

- name: Check if a file is present
  ansible.builtin.stat:
    path: /home/ansuser/mainfile.txt
  register: fileop
  failed_when: not fileop.stat.exists

You can also check multiple conditions using the “and” and “or” operators.

The following task will fail if the file is not present or if it is an empty file. I am using the “OR” operator to evaluate the condition.

- name: Check if a file is present
  ansible.builtin.stat:
    path: /home/ansuser/mainfile.txt
  register: fileop
  failed_when: not fileop.stat.exists or fileop.stat.size == 0

The following task will fail only if the file is present and empty. I am using the “AND” operator to evaluate the condition.

- name: Check if a file is present
  ansible.builtin.stat:
    path: /home/ansuser/mainfile.txt
  register: fileop
  failed_when: fileop.stat.exists and fileop.stat.size == 0
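When several conditions must all be true, failed_when also accepts a YAML list, and Ansible combines the list items with a logical AND. The sketch below fails only when the file exists and is empty, written in list form; the path is the same one used throughout this article.

```yaml
- name: Check if a file is present
  ansible.builtin.stat:
    path: /home/ansuser/mainfile.txt
  register: fileop
  # Every item must evaluate to true for the task to fail (logical AND).
  # The items are checked in order and evaluation stops at the first false
  # one, so "size" is only examined when the file exists.
  failed_when:
    - fileop.stat.exists
    - fileop.stat.size == 0
```

The list form is mostly a readability choice; it helps once a condition grows too long to read comfortably on one line.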

Task Failures & Handlers In Ansible Playbook

Handlers are an Ansible built-in feature for tasks that run only when another task notifies them, which happens when the notifying task reports a change. There is a detailed article about handlers, including how to run handler tasks when other tasks fail. Refer to the following article.

RELATED ARTICLE - How To Work With Handlers In Ansible
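As a quick illustration of the failure case, the sketch below uses the built-in force_handlers play keyword (also available as the --force-handlers flag of ansible-playbook), which makes Ansible run handlers that were already notified even if a later task fails on that host. The marker file path /tmp/marker.txt and the task names are hypothetical placeholders.

```yaml
---
- name: Run notified handlers even when a later task fails
  hosts: all
  gather_facts: false
  # Run handlers that were already notified, even if a task fails afterwards.
  force_handlers: true

  tasks:
    # state: touch reports "changed" on every run, so the handler is notified.
    - name: Touch a marker file
      ansible.builtin.file:
        path: /tmp/marker.txt
        state: touch
      notify: Print handler message

    - name: Fail the task purposely
      ansible.builtin.command: /bin/false

  handlers:
    - name: Print handler message
      ansible.builtin.debug:
        msg: "Handler ran even though a later task failed"
```

Without force_handlers, the failure in the second task would stop the play on that host before the notified handler gets a chance to run.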

Recovering From Block Level Failures In Ansible

Ansible offers three directives to group multiple tasks and to recover from failures.

Block Directive – Groups multiple tasks under a single block.
Rescue Directive – Runs tasks that recover from any task failure in the block.
Always Directive – Runs tasks every time, regardless of whether the block and rescue tasks succeed or fail.

There is a detailed article about using the block, rescue, and always directives to recover from failures. Refer to the following article.

RELATED ARTICLE - Error Handling With Block And Rescue In Ansible
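For a quick picture of how the three directives fit together, here is a minimal sketch. The task names and messages are placeholders; ansible_failed_task is a variable Ansible makes available inside rescue to describe the task that failed.

```yaml
---
- name: Recover from a failed block
  hosts: all
  gather_facts: false

  tasks:
    - name: Group related tasks
      block:
        - name: Fail the task purposely
          ansible.builtin.command: /bin/false

        # Not executed: the block stops at the first failed task.
        - name: Never reached
          ansible.builtin.debug:
            msg: "Not reached"

      rescue:
        # Runs only if a task in the block fails.
        - name: Recover from the failure
          ansible.builtin.debug:
            msg: "Recovered from: {{ ansible_failed_task.name }}"

      always:
        # Runs whether the block and rescue sections succeed or fail.
        - name: Clean up
          ansible.builtin.debug:
            msg: "This always runs"
```

Because the rescue section handles the error, the host is not marked as failed for the rest of the play.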

How To Abort All Plays In Ansible

Sometimes you want to stop all pending tasks if a task fails on any host. You can set the “any_errors_fatal: yes” directive at the play or task level. When a task fails, Ansible finishes that task on all hosts in the current batch and then stops the play on all hosts.

I am running the file-check playbook from the failed_when section again, but with “any_errors_fatal: yes”. I have also set “serial: 1”, which runs all the tasks on one host before moving on to the next host.

---
- hosts: '*'
  gather_facts: False
  any_errors_fatal: True
  serial: 1
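The snippet above shows only the play header. Assuming the tasks are the file-check task from the failed_when section, a complete version of the play might look like this sketch:

```yaml
---
- hosts: '*'
  gather_facts: false
  # If a task fails on any host, finish it on the current batch
  # and then stop the play on all hosts.
  any_errors_fatal: true
  # Run the whole play on one host at a time.
  serial: 1

  tasks:
    - name: Check if a file is present
      ansible.builtin.stat:
        path: /home/ansuser/mainfile.txt
      register: fileop
      failed_when: not fileop.stat.exists
```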

In the output of this run, you can see that the task is submitted only to the first host, since serial is set to 1. The file check fails, and because “any_errors_fatal” is set to True, the playbook execution stops.

I am submitting the same playbook again, but this time the serial keyword is removed and forks is set to 2, so the tasks are submitted to two hosts at a time.

$ ansible-playbook playbook.yml -f 2

In this run, Ansible completes the tasks already submitted in the current batch before stopping the play. This is an important behavior of any_errors_fatal to be aware of.

Wrap-Up

In this article, we have discussed different types of failures in Ansible and the ways to handle them. Once you start writing playbooks for production use cases, you will run into many kinds of failures and issues, but what we have covered in this article will serve as your base knowledge.
