How To Set Up a Production Elasticsearch Cluster with Ansible

Updated on April 10, 2026
By Anish Singh Walia

Sr Technical Content Strategist and Team Lead


Introduction

Ansible is a configuration management tool that lets you automate server provisioning and application deployment from a single control node. Elasticsearch is a distributed search and analytics engine used for log analysis, full-text search, and real-time data pipelines. When you combine Ansible with Elasticsearch, you can deploy a production-ready cluster across multiple servers in a repeatable, version-controlled way.

This tutorial walks you through using Ansible to set up a three-node Elasticsearch cluster on Ubuntu 22.04 LTS DigitalOcean Droplets. You will configure dedicated master and data node roles, enable TLS encryption between nodes, validate cluster health, and set up Index Lifecycle Management (ILM) policies. By the end, you will have a fully functional Elasticsearch cluster secured with transport and HTTP-layer TLS, deployed entirely through Ansible playbooks.

Key Takeaways

  • A production Elasticsearch cluster requires a minimum of three nodes to maintain quorum and prevent split-brain scenarios.
  • Elasticsearch 8.x enables TLS and authentication by default. You must generate and distribute certificates to all nodes before they can form a cluster.
  • Ansible automates the entire cluster deployment, from package installation and system tuning to certificate distribution and service management, making the process repeatable and version-controlled.
  • Node roles in Elasticsearch 8.x are configured using the node.roles parameter in elasticsearch.yml, replacing the older node.master and node.data booleans.
  • The JVM heap size should be set to no more than half of the available system memory, and should not exceed 32 GB to maintain compressed object pointers.

Prerequisites

Before you begin this tutorial, you will need:

Hardware and DigitalOcean Droplet Requirements

  • Three Ubuntu Droplets with at least 4 GB of RAM and 2 vCPUs each. Elasticsearch recommends at least 4 GB of memory per node so you can allocate 2 GB to the JVM heap and leave the rest for the operating system and file system cache. You can create Droplets from the DigitalOcean Control Panel or using the DigitalOcean API.
  • Private networking enabled on all three Droplets. DigitalOcean VPC (Virtual Private Cloud) networking allows your Droplets to communicate securely on a private network within the same datacenter region. This is important because Elasticsearch does not encrypt its transport layer by default in versions prior to 8.x.
  • A non-root user with sudo privileges configured on each server. Follow the Initial Server Setup with Ubuntu tutorial to set this up.

The following table lists the recommended minimum resources per node role:

| Node Role | RAM | vCPUs | Disk | Heap Size |
| --- | --- | --- | --- | --- |
| Master-eligible | 4 GB | 2 | 50 GB SSD | 2 GB |
| Data (hot) | 8 GB | 4 | 100 GB+ SSD | 4 GB |
| Data (warm) | 8 GB | 2 | 500 GB+ HDD | 4 GB |
| Coordinating | 4 GB | 2 | 50 GB SSD | 2 GB |

Software Requirements

Elasticsearch Licensing and Distribution Notes

Elasticsearch changed its licensing in 2021, moving from Apache 2.0 to a dual license under the Server Side Public License (SSPL) and the Elastic License. The Elastic License allows free use of the default distribution, including security features like TLS and role-based access control, without a paid subscription. The open-source fork, OpenSearch, is maintained by Amazon and follows a different release cycle. This tutorial uses the official Elastic distribution from artifacts.elastic.co.

Understanding the Cluster Topology

Before writing any Ansible code, you should understand how Elasticsearch distributes responsibilities across nodes.

Node Roles in Elasticsearch 8.x

Elasticsearch uses the node.roles parameter in elasticsearch.yml to assign roles to each node. This replaced the older node.master: true and node.data: true booleans used in Elasticsearch 7.x and earlier. The primary roles are:

| Role | Value for node.roles | Function |
| --- | --- | --- |
| Master-eligible | master | Manages cluster state, index creation, shard allocation |
| Data (generic) | data | Stores data and executes search and aggregation queries |
| Data hot | data_hot | Stores frequently accessed, recently indexed data |
| Data warm | data_warm | Stores less frequently accessed data at lower cost |
| Data content | data_content | Stores data that is not part of a time-series data stream |
| Ingest | ingest | Runs ingest pipelines to transform documents before indexing |
| Coordinating-only | [] (empty array) | Routes requests to appropriate nodes, aggregates results |

For a complete list of roles, refer to the Elasticsearch node roles documentation.

For this tutorial, you will set up a three-node cluster where each node is both master-eligible and a data node. This layout is the most common starting point for small-to-medium production clusters:

| Hostname | Private IP | Node Roles |
| --- | --- | --- |
| es-node-1 | 10.132.0.2 | master, data_hot, data_content, ingest |
| es-node-2 | 10.132.0.3 | master, data_hot, data_content, ingest |
| es-node-3 | 10.132.0.4 | master, data_hot, data_content, ingest |

Three master-eligible nodes provide a quorum of two, which prevents split-brain scenarios where two separate groups of nodes each believe they are the active cluster. As your cluster grows, you can separate master-eligible nodes from data nodes and add dedicated warm or cold tier nodes for Index Lifecycle Management.

Network and Firewall Requirements Between Nodes

Elasticsearch uses two network ports:

  • Port 9200: HTTP REST API (client-to-cluster communication)
  • Port 9300: Transport protocol (node-to-node communication)

All three nodes must be able to reach each other on both ports over the private network. If you are using DigitalOcean’s VPC, traffic between Droplets in the same VPC is already permitted. If you have ufw enabled, you will need to allow traffic from each node:

sudo ufw allow from 10.132.0.0/16 to any port 9200
sudo ufw allow from 10.132.0.0/16 to any port 9300

Replace 10.132.0.0/16 with the subnet of your DigitalOcean VPC.
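Later in the tutorial you will manage these servers with Ansible, so you may prefer to declare the firewall rules there as well. A minimal sketch using the community.general.ufw module (this assumes the community.general collection is installed, and the subnet value is illustrative):

```yaml
# Allow Elasticsearch HTTP (9200) and transport (9300) traffic
# from the VPC subnet. Replace the src value with your VPC's CIDR block.
- name: Allow Elasticsearch ports from the private network
  community.general.ufw:
    rule: allow
    src: "10.132.0.0/16"
    port: "{{ item }}"
    proto: tcp
  loop:
    - "9200"
    - "9300"
```

Adding these tasks to your role keeps the firewall configuration version-controlled alongside the rest of the deployment.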

Step 1: Setting Up Your Ansible Control Node

In this step, you will set up the directory structure for your Ansible project on your local machine.

Installing Ansible on Ubuntu

If Ansible is not already installed on your control node, install it from the official PPA:

sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt install -y ansible

Verify the installation:

ansible --version

You should see output similar to:

Output
ansible [core 2.17.x]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules']
  python version = 3.10.12

For a more detailed walkthrough, see How To Install and Configure Ansible on Ubuntu.

Configuring the Project Directory

Create a project directory with subdirectories for roles, templates, and variable files:

mkdir -p ~/elasticsearch-ansible/{inventory,roles/elasticsearch/{tasks,templates,handlers,files,defaults},group_vars}
cd ~/elasticsearch-ansible

This structure follows Ansible role best practices. Each directory serves a specific purpose: tasks/ holds the automation steps, templates/ stores Jinja2 configuration templates, handlers/ defines service restart triggers, and defaults/ contains default variable values.
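Optionally, you can add a project-level ansible.cfg in the project root so Ansible finds your inventory and roles without extra flags. A minimal sketch (these settings are common conveniences, not requirements; disabling host key checking is only appropriate for fresh test Droplets):

```ini
[defaults]
inventory = inventory/hosts.ini
roles_path = roles
host_key_checking = False
```

With this file in place, you can omit the -i inventory/hosts.ini argument from the ansible and ansible-playbook commands that follow.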

Verifying Connectivity with ansible -m ping

Before writing any playbooks, confirm that your control node can reach all three target servers. This command uses the inventory file you will create in Step 2, so if you have not defined inventory/hosts.ini yet, return to this check after completing that step:

ansible -i inventory/hosts.ini all -m ping -u root

You should see a pong response from each host:

Output
es-node-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
es-node-2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
es-node-3 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

If any node fails, verify that your SSH keys are correctly configured and that the nodes are reachable on the private network.

Step 2: Defining the Ansible Inventory

The Ansible inventory file tells Ansible which servers to manage and how they are grouped. In this step, you will create an inventory that assigns each node to the elasticsearch group and sets host-specific variables for node roles.

Creating the Inventory File

Create the inventory file at inventory/hosts.ini:

[elasticsearch]
es-node-1 ansible_host=10.132.0.2 node_name=es-node-1
es-node-2 ansible_host=10.132.0.3 node_name=es-node-2
es-node-3 ansible_host=10.132.0.4 node_name=es-node-3

[elasticsearch:vars]
ansible_user=root
ansible_python_interpreter=/usr/bin/python3

Replace the ansible_host IP addresses with the private IP addresses of your DigitalOcean Droplets. You can find these in the DigitalOcean Control Panel under each Droplet’s Networking tab.

Grouping Master-Eligible and Data Nodes

For this three-node setup, every node acts as both a master-eligible and data node. Define the roles in a group variables file at group_vars/elasticsearch.yml:

---
es_version: "8.17.0"
es_major_version: "8.x"
es_cluster_name: "production"
es_heap_size: "2g"

es_node_roles:
  - master
  - data_hot
  - data_content
  - ingest

es_seed_hosts:
  - "10.132.0.2"
  - "10.132.0.3"
  - "10.132.0.4"

es_initial_master_nodes:
  - "es-node-1"
  - "es-node-2"
  - "es-node-3"

Update es_version to the specific Elasticsearch 8.x release you want to install. At the time of writing, 8.17.0 is a stable release in the 8.x branch. The es_seed_hosts list must contain the private IP addresses of all three nodes so they can discover each other during cluster formation.

Update es_heap_size based on the RAM available on your Droplets. A good rule is to set the heap to half of the system memory, up to a maximum of 32 GB. For a 4 GB Droplet, 2g is appropriate.
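The half-of-RAM rule can be sketched as a quick shell calculation. The total_mb value below is hard-coded for a 4 GB Droplet so the example is self-contained; on a live host you could read the real value from /proc/meminfo as shown in the comment:

```shell
# Recommended heap: half of total RAM, capped below 32 GB (31744 MB here)
# to preserve compressed object pointers.
total_mb=4096   # on a live host: total_mb=$(awk '/MemTotal/{print int($2/1024)}' /proc/meminfo)
heap_mb=$(( total_mb / 2 ))
if [ "$heap_mb" -gt 31744 ]; then heap_mb=31744; fi
echo "${heap_mb}m"
```

For the 4 GB example above, this prints 2048m, matching the 2g value used in group_vars/elasticsearch.yml.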

Note: The es_initial_master_nodes setting is only used during the initial cluster bootstrap. After the cluster has formed for the first time, you should remove this setting to prevent accidental re-bootstrapping. You can handle this in Ansible with a conditional task that checks whether the cluster has already been initialized.

Step 3: Writing the Elasticsearch Ansible Role

This step covers building the Ansible role that installs Elasticsearch, tunes the operating system, configures the JVM, and deploys the elasticsearch.yml configuration file.

Role Directory Structure

Your role directory should look like this:

roles/elasticsearch/
├── defaults/
│   └── main.yml
├── files/
├── handlers/
│   └── main.yml
├── tasks/
│   └── main.yml
└── templates/
    ├── elasticsearch.yml.j2
    └── jvm.options.j2

Installing Java and the Elasticsearch Package

Elasticsearch 8.x bundles its own JDK, so you do not need to install Java separately. Create the main task file at roles/elasticsearch/tasks/main.yml:

---
- name: Install required packages
  ansible.builtin.apt:
    name:
      - apt-transport-https
      - gnupg2
      - curl
    state: present
    update_cache: true

- name: Add Elasticsearch GPG key
  ansible.builtin.get_url:
    url: https://artifacts.elastic.co/GPG-KEY-elasticsearch
    dest: /usr/share/keyrings/elasticsearch-keyring.asc
    mode: "0644"

- name: Add Elasticsearch APT repository
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.asc] https://artifacts.elastic.co/packages/{{ es_major_version }}/apt stable main"
    state: present
    filename: elasticsearch

- name: Install Elasticsearch
  ansible.builtin.apt:
    name: "elasticsearch={{ es_version }}"
    state: present
    update_cache: true

- name: Set vm.max_map_count for Elasticsearch
  ansible.posix.sysctl:
    name: vm.max_map_count
    value: "262144"
    sysctl_set: true
    state: present
    reload: true

- name: Configure system file descriptor limits
  ansible.builtin.lineinfile:
    path: /etc/security/limits.conf
    line: "elasticsearch  -  nofile  65535"
    create: true
    mode: "0644"

- name: Deploy Elasticsearch configuration
  ansible.builtin.template:
    src: elasticsearch.yml.j2
    dest: /etc/elasticsearch/elasticsearch.yml
    owner: root
    group: elasticsearch
    mode: "0660"
  notify: Restart Elasticsearch

- name: Deploy JVM options
  ansible.builtin.template:
    src: jvm.options.j2
    dest: /etc/elasticsearch/jvm.options.d/heap.options
    owner: root
    group: elasticsearch
    mode: "0660"
  notify: Restart Elasticsearch

- name: Enable and start Elasticsearch
  ansible.builtin.systemd:
    name: elasticsearch
    enabled: true
    state: started
    daemon_reload: true

The task Set vm.max_map_count is required because Elasticsearch uses memory-mapped files for its Lucene indexes. Without this setting, Elasticsearch will refuse to start and log an error about insufficient map count. The default kernel value of 65530 is too low; Elasticsearch requires at least 262144.

Templating elasticsearch.yml with Node-Specific Variables

Create the Jinja2 template at roles/elasticsearch/templates/elasticsearch.yml.j2:

# Elasticsearch configuration - managed by Ansible
cluster.name: {{ es_cluster_name }}
node.name: {{ node_name }}

node.roles: {{ es_node_roles | to_json }}

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: {{ ansible_host }}
http.port: 9200
transport.port: 9300

discovery.seed_hosts: {{ es_seed_hosts | to_json }}
cluster.initial_master_nodes: {{ es_initial_master_nodes | to_json }}

xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12

Each parameter in this template maps to an Elasticsearch configuration option:

  • cluster.name ties all nodes into the same cluster.
  • node.name gives each node a human-readable identifier.
  • node.roles sets the roles using the Elasticsearch 8.x format.
  • network.host binds the node to its private IP address.
  • discovery.seed_hosts lists the IP addresses of other nodes for initial discovery.
  • cluster.initial_master_nodes is used only during the first cluster bootstrap.
  • The xpack.security.* settings enable TLS on both the transport and HTTP layers.

Configuring jvm.options for Heap Sizing

Create the JVM options template at roles/elasticsearch/templates/jvm.options.j2:

-Xms{{ es_heap_size }}
-Xmx{{ es_heap_size }}

The -Xms and -Xmx values should always be equal. Setting them to the same value prevents the JVM from resizing the heap at runtime, which can cause pauses. On a 4 GB Droplet, set this to 2g. On an 8 GB Droplet, set it to 4g. Do not exceed 32g because the JVM loses the ability to use compressed ordinary object pointers (compressed oops) beyond that threshold, which reduces memory efficiency.

Setting System-Level Kernel Parameters

The task file already includes vm.max_map_count and file descriptor limits. These are the two system-level settings most commonly required for Elasticsearch. The handler file at roles/elasticsearch/handlers/main.yml should contain:

---
- name: Restart Elasticsearch
  ansible.builtin.systemd:
    name: elasticsearch
    state: restarted

Step 4: Configuring TLS and xpack Security

Elasticsearch enables security by default. Nodes cannot form a cluster without TLS configured on the transport layer. In this step, you will generate certificates and distribute them to all nodes using Ansible.

Generating a Certificate Authority with elasticsearch-certutil

On one of your Elasticsearch nodes (or on a machine where Elasticsearch is installed), generate a certificate authority (CA) and node certificates. Run these commands on es-node-1:

sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca \
  --out /etc/elasticsearch/certs/elastic-stack-ca.p12 \
  --pass ""

This creates a PKCS#12 keystore containing the CA certificate and private key. The --pass "" flag sets an empty password for non-interactive use. In a production environment with stricter security requirements, use a strong password and store it in Ansible Vault.

Next, generate node certificates signed by this CA:

sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert \
  --ca /etc/elasticsearch/certs/elastic-stack-ca.p12 \
  --ca-pass "" \
  --out /etc/elasticsearch/certs/elastic-certificates.p12 \
  --pass ""

Then generate HTTP-layer certificates:

sudo /usr/share/elasticsearch/bin/elasticsearch-certutil http

This command starts an interactive wizard. Answer the prompts as follows:

  • Generate a CSR? n
  • Use an existing CA? y
  • CA path: /etc/elasticsearch/certs/elastic-stack-ca.p12
  • CA password: (press Enter for empty)
  • Certificate validity (days): 365
  • Generate per node? n (for this tutorial, a single certificate works for all nodes)
  • Hostnames: enter each node’s hostname and private IP
  • Output: /etc/elasticsearch/certs/http.p12

Distributing Node Certificates with Ansible

Copy the generated .p12 files to your Ansible control node’s roles/elasticsearch/files/ directory:

scp root@10.132.0.2:/etc/elasticsearch/certs/elastic-certificates.p12 ~/elasticsearch-ansible/roles/elasticsearch/files/
scp root@10.132.0.2:/etc/elasticsearch/certs/http.p12 ~/elasticsearch-ansible/roles/elasticsearch/files/

Then add certificate distribution tasks to your role. Add these tasks to roles/elasticsearch/tasks/main.yml before the Deploy Elasticsearch configuration task:

- name: Create certs directory
  ansible.builtin.file:
    path: /etc/elasticsearch/certs
    state: directory
    owner: root
    group: elasticsearch
    mode: "0750"

- name: Copy transport certificates
  ansible.builtin.copy:
    src: elastic-certificates.p12
    dest: /etc/elasticsearch/certs/elastic-certificates.p12
    owner: root
    group: elasticsearch
    mode: "0640"
  notify: Restart Elasticsearch

- name: Copy HTTP certificates
  ansible.builtin.copy:
    src: http.p12
    dest: /etc/elasticsearch/certs/http.p12
    owner: root
    group: elasticsearch
    mode: "0640"
  notify: Restart Elasticsearch

Setting the Elasticsearch Keystore Password via Ansible

If you used passwords for your certificates, add them to the Elasticsearch keystore:

- name: Set transport keystore password
  ansible.builtin.command:
    cmd: /usr/share/elasticsearch/bin/elasticsearch-keystore add --force --stdin xpack.security.transport.ssl.keystore.secure_password
    stdin: ""
  changed_when: false

- name: Set transport truststore password
  ansible.builtin.command:
    cmd: /usr/share/elasticsearch/bin/elasticsearch-keystore add --force --stdin xpack.security.transport.ssl.truststore.secure_password
    stdin: ""
  changed_when: false

- name: Set HTTP keystore password
  ansible.builtin.command:
    cmd: /usr/share/elasticsearch/bin/elasticsearch-keystore add --force --stdin xpack.security.http.ssl.keystore.secure_password
    stdin: ""
  changed_when: false

The --force flag overwrites an existing keystore entry instead of prompting, which keeps these tasks non-interactive and safe to re-run.

Since this tutorial uses empty passwords (--pass ""), the stdin values are empty strings. For production, store the passwords in Ansible Vault and reference them as variables.

Step 5: Writing and Running the Master Playbook

Now you will create the top-level playbook that ties everything together and run it against your cluster.

Playbook Structure and Variable Precedence

Create the master playbook at site.yml in the project root:

---
- name: Deploy Elasticsearch cluster
  hosts: elasticsearch
  become: true
  roles:
    - elasticsearch

Ansible resolves variables in the following order (lowest to highest precedence):

  1. Role defaults (roles/elasticsearch/defaults/main.yml)
  2. Group variables (group_vars/elasticsearch.yml)
  3. Host variables (inventory host_vars/ or inline in hosts.ini)
  4. Playbook variables
  5. Extra variables passed via -e on the command line

Your group variables in group_vars/elasticsearch.yml will override any defaults set in the role.
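The roles/elasticsearch/defaults/main.yml file you created in Step 1 can hold fallback values at the lowest precedence level, so the role still works if a group variable is missing. A minimal sketch mirroring the variables defined in Step 2 (the fallback values shown are illustrative):

```yaml
---
# roles/elasticsearch/defaults/main.yml
# Fallback values; group_vars/elasticsearch.yml overrides these.
es_version: "8.17.0"
es_major_version: "8.x"
es_cluster_name: "elasticsearch"
es_heap_size: "2g"
es_node_roles:
  - master
  - data_content
```

Keeping sane defaults in the role makes it reusable across projects, with each project supplying its own group variables.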

Running the Playbook and Interpreting Output

Run the playbook from your Ansible control node:

cd ~/elasticsearch-ansible
ansible-playbook -i inventory/hosts.ini site.yml

Ansible will process each task sequentially across all nodes. The output shows the status of each task:

Output
PLAY [Deploy Elasticsearch cluster] ******************************************

TASK [Gathering Facts] *******************************************************
ok: [es-node-1]
ok: [es-node-2]
ok: [es-node-3]

TASK [elasticsearch : Install required packages] *****************************
changed: [es-node-1]
changed: [es-node-2]
changed: [es-node-3]

...

TASK [elasticsearch : Enable and start Elasticsearch] ************************
changed: [es-node-1]
changed: [es-node-2]
changed: [es-node-3]

PLAY RECAP *******************************************************************
es-node-1 : ok=12 changed=10 unreachable=0 failed=0
es-node-2 : ok=12 changed=10 unreachable=0 failed=0
es-node-3 : ok=12 changed=10 unreachable=0 failed=0

If any task shows failed=1, check the error message. Common issues include incorrect IP addresses in the inventory, missing SSH keys, or network connectivity problems between nodes.

Handling Idempotency and Re-runs

The playbook is designed to be idempotent, meaning you can run it multiple times without causing unintended changes. The apt module checks whether packages are already installed, the template module compares file checksums before writing, and the Restart Elasticsearch handler only fires when a configuration file actually changes.
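Because the playbook is idempotent, you can preview what a re-run would change before applying anything, using Ansible's check mode together with diff output:

```shell
ansible-playbook -i inventory/hosts.ini site.yml --check --diff
```

Tasks report what they would change without modifying the servers, and --diff shows line-level differences for templated files such as elasticsearch.yml.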

One exception is the cluster.initial_master_nodes setting. After the cluster has bootstrapped, this setting is no longer needed and should ideally be removed. You can add a conditional task that checks for an existing cluster:

- name: Check if cluster is already formed
  ansible.builtin.uri:
    url: "https://{{ ansible_host }}:9200/_cluster/health"
    user: elastic
    password: "{{ es_elastic_password }}"
    validate_certs: false
    status_code: 200
  register: cluster_health
  ignore_errors: true

- name: Remove initial_master_nodes after bootstrap
  ansible.builtin.lineinfile:
    path: /etc/elasticsearch/elasticsearch.yml
    regexp: "^cluster.initial_master_nodes"
    state: absent
  when: cluster_health is succeeded
  notify: Restart Elasticsearch

Step 6: Validating the Cluster

After the playbook completes, you need to verify that all nodes have joined the cluster and that the cluster is healthy.

Checking Cluster Health with the REST API

SSH into any one of your nodes and run:

curl -s -k -u elastic:your_password "https://localhost:9200/_cluster/health?pretty"

The -k flag skips certificate verification for this test. Replace your_password with the password for the elastic user. If you have not set a password yet, reset it with:

sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

A healthy cluster returns:

{
  "cluster_name" : "production",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The "status": "green" field means all primary and replica shards are assigned. A "yellow" status indicates that all primary shards are assigned but some replicas are not, which can happen if you have fewer data nodes than the number of replicas configured for an index. A "red" status means some primary shards are unassigned, which requires immediate investigation.

Verifying Node Membership and Role Assignment

Check that all three nodes appear in the cluster with the correct roles:

curl -s -k -u elastic:your_password "https://localhost:9200/_cat/nodes?v"

Expected output:

Output
ip         heap.percent ram.percent cpu load_1m node.role master name
10.132.0.2           25          78   3    0.12 dhims     *      es-node-1
10.132.0.3           22          75   2    0.08 dhims     -      es-node-2
10.132.0.4           21          76   1    0.05 dhims     -      es-node-3

The node.role column shows the roles as single-character codes: d = data, h = data_hot, i = ingest, m = master-eligible, s = data_content. The * in the master column indicates which node is the currently elected master.

Running a Basic Index Write and Read Test

Verify that the cluster can accept and return data by creating a test index and document:

curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/test-index/_doc/1" \
  -H "Content-Type: application/json" \
  -d '{"message": "Elasticsearch cluster is working", "timestamp": "2026-04-08T12:00:00Z"}'

Expected response:

{
  "_index" : "test-index",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Retrieve the document:

curl -s -k -u elastic:your_password "https://localhost:9200/test-index/_doc/1?pretty"

Clean up the test index when you are done:

curl -s -k -u elastic:your_password -X DELETE "https://localhost:9200/test-index"

Step 7: Configuring Index Lifecycle Management (ILM)

Index Lifecycle Management automates how Elasticsearch manages indexes over time. For log data and time-series workloads, ILM can automatically roll over indexes based on size or age, move older indexes to cheaper storage tiers, and delete expired data.

Defining an ILM Policy via the Elasticsearch API

Create an ILM policy that rolls over indexes at 50 GB or 30 days, moves data to a warm tier after 7 days, and deletes data after 90 days:

curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_ilm/policy/logs-policy" \
  -H "Content-Type: application/json" \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'

After running this command, verify the policy was created:

curl -s -k -u elastic:your_password "https://localhost:9200/_ilm/policy/logs-policy?pretty"
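An ILM policy has no effect on its own; an index or index template must reference it. One way to attach logs-policy is through a composable index template backed by a data stream, sketched below (the template name logs-template and the logs-* pattern are illustrative choices, not required names):

```shell
curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_index_template/logs-template" \
  -H "Content-Type: application/json" \
  -d '{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy"
    }
  }
}'
```

Documents written to any data stream matching logs-* will then be stored in backing indexes that move through the hot, warm, and delete phases of the policy automatically.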

Automating ILM Policy Application with the Ansible uri Module

You can apply this ILM policy through Ansible so it is part of your infrastructure code. Create a task file at roles/elasticsearch/tasks/ilm.yml:

---
- name: Create ILM policy for logs
  ansible.builtin.uri:
    url: "https://{{ ansible_host }}:9200/_ilm/policy/logs-policy"
    method: PUT
    user: elastic
    password: "{{ es_elastic_password }}"
    validate_certs: false
    body_format: json
    body:
      policy:
        phases:
          hot:
            min_age: "0ms"
            actions:
              rollover:
                max_primary_shard_size: "50gb"
                max_age: "30d"
          warm:
            min_age: "7d"
            actions:
              shrink:
                number_of_shards: 1
              forcemerge:
                max_num_segments: 1
          delete:
            min_age: "90d"
            actions:
              delete: {}
    status_code:
      - 200
  run_once: true

The run_once: true directive ensures this task only executes on a single node, since ILM policies are cluster-wide settings.

Monitoring Policy Execution

Check the status of indexes managed by ILM:

curl -s -k -u elastic:your_password "https://localhost:9200/*/_ilm/explain?pretty" | head -30

This shows which lifecycle phase and step each index is in, helping you verify that the policy is progressing as expected.

Step 8: Hardening the Cluster for Production

Once the cluster is running, apply these additional configurations to prepare it for production workloads.

Configuring Shard Allocation Awareness

If your Droplets are in different availability zones or you want to distribute replicas across distinct racks, configure allocation awareness:

curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}'

Then set the node.attr.zone attribute in each node’s elasticsearch.yml:

node.attr.zone: zone-1

Elasticsearch will distribute primary and replica shards across different zones, so a single zone failure does not cause data loss.

Setting Up Snapshot Repositories for Backup

Snapshots are the recommended method for backing up Elasticsearch clusters. Configure a shared filesystem or S3-compatible repository. For DigitalOcean Spaces (S3-compatible), no plugin installation is required: Elasticsearch 8.x ships with the S3 repository type built in. Add your Spaces access keys to the Elasticsearch keystore on each node:

sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

Then register the repository:

curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_snapshot/my-backup" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "s3",
  "settings": {
    "bucket": "your-spaces-bucket",
    "endpoint": "nyc3.digitaloceanspaces.com",
    "protocol": "https"
  }
}'

Create a snapshot:

curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_snapshot/my-backup/snapshot-1?wait_for_completion=true"
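Rather than creating snapshots by hand, you can schedule them with snapshot lifecycle management (SLM). A sketch that takes a nightly snapshot into the my-backup repository registered above (the policy name, schedule, and retention values are illustrative):

```shell
curl -s -k -u elastic:your_password -X PUT "https://localhost:9200/_slm/policy/nightly-snapshots" \
  -H "Content-Type: application/json" \
  -d '{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my-backup",
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
```

The schedule uses cron syntax (here, 01:30 every day), and the retention block automatically deletes snapshots older than 30 days while always keeping at least five.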

Comparing Self-Managed Elasticsearch vs. Managed Services

Self-managed Elasticsearch clusters via Ansible give you full control over node configuration, hardware selection, and cost optimization. The tradeoff is operational responsibility for upgrades, certificate rotation, backup scheduling, and scaling. Here is a comparison:

| Factor | Self-Managed (Ansible) | Elastic Cloud | OpenSearch Service |
| --- | --- | --- | --- |
| Control over configuration | Full | Limited | Moderate |
| Operational overhead | High | Low | Low |
| Cost at scale | Lower | Higher | Moderate |
| Upgrade process | Manual (rolling via Ansible) | Automated | Automated |
| TLS certificate management | Manual | Automatic | Automatic |
| Custom plugins | Supported | Limited | Limited |

For teams that need specific plugin versions, custom JVM tuning, or compliance requirements that mandate running on specific infrastructure, self-managed clusters are the better choice. For teams prioritizing reduced operational overhead, managed services eliminate the need for certificate management, patching, and capacity planning.

Troubleshooting

Elasticsearch Fails to Start

If Elasticsearch does not start after running the playbook, check the logs:

sudo journalctl -u elasticsearch --no-pager -n 50

Common issues include:

  • vm.max_map_count too low: The playbook sets this, but verify with sysctl vm.max_map_count. It must be at least 262144.
  • Certificate errors: Ensure .p12 files are in /etc/elasticsearch/certs/ and owned by root:elasticsearch with 640 permissions.
  • Heap size errors: Verify that -Xms and -Xmx are equal and do not exceed half of available RAM.

Nodes Cannot Discover Each Other

If nodes start but do not form a cluster:

  • Verify that discovery.seed_hosts contains the correct private IP addresses.
  • Confirm firewall rules allow traffic on ports 9200 and 9300 between all nodes.
  • Check that cluster.name is identical on all nodes.

Connection Refused on Port 9200

If curl returns Connection refused:

  • Confirm Elasticsearch is running: sudo systemctl status elasticsearch
  • Check that network.host is set to the node’s private IP, not localhost.
  • Review /var/log/elasticsearch/production.log for startup errors.

FAQs

1. What version of Elasticsearch does this tutorial support?

This tutorial covers Elasticsearch 8.x on Ubuntu 22.04 LTS, using Ansible 2.14 or later. The configuration patterns, particularly around security and node roles, differ significantly from tutorials written for Elasticsearch 7.x and earlier. If you are running Elasticsearch 9.x, most of the configuration in this tutorial still applies, though you should check the Elasticsearch release notes for any breaking changes.

2. What is the minimum number of nodes for a production Elasticsearch cluster?

Three nodes is the recommended minimum for production. This allows a quorum of two master-eligible nodes to elect a master, which prevents split-brain scenarios. A single-node cluster can be used for development and testing, but it is not suitable for production workloads because it has no redundancy.
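The quorum arithmetic behind this recommendation can be sketched in a few lines. This is a minimal illustration of majority voting, not code from the tutorial's playbooks:

```python
def quorum(master_eligible: int) -> int:
    """Votes required for a majority of master-eligible nodes."""
    return master_eligible // 2 + 1

# With 3 master-eligible nodes, quorum is 2, so the cluster survives
# the loss of one node. With 2 nodes, quorum is also 2, so losing
# either node leaves the survivor unable to elect a master -- which is
# why a two-node cluster is no more resilient than a single node.
for n in (1, 2, 3, 5):
    print(f"{n} master-eligible nodes -> quorum of {quorum(n)}")
```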

3. Do I need to configure TLS manually when using Ansible to set up Elasticsearch 8.x?

Yes. Elasticsearch enables TLS between nodes by default. If you do not provide certificates, nodes will not be able to communicate and the cluster will not form. This tutorial covers generating certificates with elasticsearch-certutil and distributing them to all nodes via Ansible.

4. How do I control which node acts as the master using Ansible?

Node roles are defined in elasticsearch.yml using the node.roles parameter. In this tutorial, the Ansible inventory and group variables set the roles per node, and a Jinja2 template renders the correct configuration for each host. To create a dedicated master node, set node.roles: [master]. To create a data-only node, use node.roles: [data_hot, data_content].
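The role variants described above look like this in `elasticsearch.yml`. These lines are illustrative fragments, one per node type, rather than a single file:

```yaml
# Dedicated master node
node.roles: [ master ]

# Data-only node (hot tier plus regular content indices)
node.roles: [ data_hot, data_content ]

# Omitting node.roles entirely gives the node every role, which is the
# behavior used by combined master/data nodes.
```

In the Ansible setup, a Jinja2 template typically renders the `node.roles` line from a per-host or per-group variable, so each host in the inventory receives the correct role list.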

5. How many shards should I have per node?

Elastic recommends keeping the number of shards per node below 20 per GB of JVM heap. For a node with a 2 GB heap, aim for no more than 40 shards. Each shard consumes memory and CPU resources, so over-sharding degrades performance. A good starting point is one primary shard per index for small datasets, with the number of replicas set to 1 for redundancy. See the Elastic shard sizing guidance for detailed recommendations.
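The shard-budget arithmetic from the paragraph above can be expressed as a small helper. The function name is illustrative; the 20-shards-per-GB figure is Elastic's published rule of thumb:

```python
def shard_budget(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards a node should host, per Elastic's guidance
    of at most 20 shards per GB of JVM heap."""
    return int(heap_gb * shards_per_gb)

print(shard_budget(2))   # 2 GB heap -> at most 40 shards
```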

Conclusion

You have now deployed a production Elasticsearch cluster across three Ubuntu Droplets using Ansible. The cluster is configured with:

  • Three nodes, each master-eligible and holding data, providing high availability.
  • TLS encryption on both the transport and HTTP layers.
  • JVM heap sizing and system kernel tuning for stable operation.
  • An ILM policy for automated index lifecycle management.
  • Snapshot configuration for cluster backups.

The Ansible playbooks you created are version-controlled and repeatable, so you can add nodes, update configurations, or rebuild the cluster from scratch by re-running the playbook.

For more information on operating and scaling Elasticsearch clusters, explore the related Elasticsearch and Ansible tutorials on the DigitalOcean Community site.




This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.