ansible-role-k3s/documentation/quickstart-ha-cluster.md

# Quickstart: K3s cluster with a HA control plane using embedded etcd

This is the quickstart guide to creating your own 3 node k3s cluster with a
highly available control plane using the embedded etcd datastore.
The control plane will all be workers as well.

:hand: This example requires your Ansible user to be able to connect to the
servers over SSH using key-based authentication. The user is also has an entry
in a sudoers file that allows privilege escalation without requiring a
password.

To test this is the case, run the following check replacing `<ansible_user>`
and `<server_name>`. The expected output is `Works`

`ssh <ansible_user>@<server_name> 'sudo cat /etc/shadow >/dev/null && echo "Works"'`

For example:

```text
[ xmanning@dreadfort:~/git/kubernetes-playground ] (master) $ ssh ansible@kube-0 'sudo cat /etc/shadow >/dev/null && echo "Works"'
Works
[ xmanning@dreadfort:~/git/kubernetes-playground ] (master) $
```

## Directory structure

Our working directory will have the following files:

```text
kubernetes-playground/
  |_ inventory.yml
  |_ ha_cluster.yml
```

## Inventory

Here's a YAML based example inventory for our servers called `inventory.yml`:

```yaml
---

# We're adding k3s_control_node to each host, this can be done in host_vars/
# or group_vars/ as well - but for simplicity we are setting it here.
k3s_cluster:
  hosts:
    kube-0:
      ansible_user: ansible
      ansible_host: 10.10.9.2
      ansible_python_interpreter: /usr/bin/python3
      k3s_control_node: true
    kube-1:
      ansible_user: ansible
      ansible_host: 10.10.9.3
      ansible_python_interpreter: /usr/bin/python3
      k3s_control_node: true
    kube-2:
      ansible_user: ansible
      ansible_host: 10.10.9.4
      ansible_python_interpreter: /usr/bin/python3
      k3s_control_node: true

```

We can test this works with `ansible -i inventory.yml -m ping all`, expected
result:

```text
kube-0 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
kube-1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
kube-2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

```

## Playbook

Here is our playbook for the k3s cluster (`ha_cluster.yml`):

```yaml
---

- name: Build a cluster with HA control plane
  hosts: k3s_cluster
  vars:
    k3s_become_for_all: true
    k3s_etcd_datastore: true
    k3s_use_experimental: true  # Note this is required for k3s < v1.19.5+k3s1
  roles:
    - role: xanmanning.k3s
```

## Execution

To execute the playbook against our inventory file, we will run the following
command:

`ansible-playbook -i inventory.yml ha_cluster.yml`

The output we can expect is similar to the below, with no failed or unreachable
nodes. The default behavior of this role is to delegate the first play host as
the primary control node, so kube-0 will have more changed tasks than others:

```text
PLAY RECAP *******************************************************************************************************
kube-0                     : ok=53   changed=8    unreachable=0    failed=0    skipped=30   rescued=0    ignored=0
kube-1                     : ok=47   changed=10   unreachable=0    failed=0    skipped=28   rescued=0    ignored=0
kube-2                     : ok=47   changed=9    unreachable=0    failed=0    skipped=28   rescued=0    ignored=0
```

## Testing

After logging into any of the servers (it doesn't matter), we can test that k3s
is running across the cluster, that all nodes are ready and that everything is
ready to execute our Kubernetes workloads by running the following:

  - `sudo kubectl get nodes -o wide`
  - `sudo kubectl get pods -o wide --all-namespaces`

:hand: Note we are using `sudo` because we need to be root to access the
kube config for this node. This behavior can be changed with specifying
`write-kubeconfig-mode: 0644` in `k3s_server`.

**Get Nodes**:

```text
ansible@kube-0:~$ sudo kubectl get nodes -o wide
NAME     STATUS   ROLES         AGE     VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
kube-0   Ready    etcd,master   2m58s   v1.19.4+k3s1   10.10.9.2     10.10.9.2     Ubuntu 20.04.1 LTS   5.4.0-56-generic   containerd://1.4.1-k3s1
kube-1   Ready    etcd,master   2m22s   v1.19.4+k3s1   10.10.9.3     10.10.9.3     Ubuntu 20.04.1 LTS   5.4.0-56-generic   containerd://1.4.1-k3s1
kube-2   Ready    etcd,master   2m10s   v1.19.4+k3s1   10.10.9.4     10.10.9.4     Ubuntu 20.04.1 LTS   5.4.0-56-generic   containerd://1.4.1-k3s1
```

**Get Pods**:

```text
ansible@kube-0:~$ sudo kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE     IP          NODE     NOMINATED NODE   READINESS GATES
kube-system   coredns-66c464876b-rhgn6                 1/1     Running     0          3m38s   10.42.0.2   kube-0   <none>           <none>
kube-system   helm-install-traefik-vwglv               0/1     Completed   0          3m39s   10.42.0.3   kube-0   <none>           <none>
kube-system   local-path-provisioner-7ff9579c6-d5xpb   1/1     Running     0          3m38s   10.42.0.5   kube-0   <none>           <none>
kube-system   metrics-server-7b4f8b595-nhbt8           1/1     Running     0          3m38s   10.42.0.4   kube-0   <none>           <none>
kube-system   svclb-traefik-9lzcq                      2/2     Running     0          2m56s   10.42.1.2   kube-1   <none>           <none>
kube-system   svclb-traefik-vq487                      2/2     Running     0          2m45s   10.42.2.2   kube-2   <none>           <none>
kube-system   svclb-traefik-wkwkk                      2/2     Running     0          3m1s    10.42.0.7   kube-0   <none>           <none>
kube-system   traefik-5dd496474-lw6x8                  1/1     Running     0          3m1s    10.42.0.6   kube-0   <none>           <none>
```