6.1. High-Availability Boro Cluster

This chapter describes how to create a high-availability (HA) cluster of two Boro Solution servers. An HA cluster is a group of servers that provides continuous service and minimal downtime during failures. A cluster consists of at least two nodes: an active (primary) server and a standby server. If the primary node fails, application operations are moved to the standby server.

Cluster Capabilities:

The maximum number of servers in a cluster is 2 (primary/standby).
Hot spare: the switchover time is less than 5 seconds.
Statistics may be lost for up to 5 minutes during switchover.
The functionality of both servers is identical, so Boro Solution remains fully operational without any limitations after the switchover.
Full data replication, including server settings, metrics, journals, thumbnails, etc.
Manual switchover (from the command line) or switchover via the API. An external system for server status monitoring is required.
A redundant node can be added to a previously configured and currently active Boro Solution server (update required).

To create a cluster, you need to do the following:

Submit a request to the Technical Support team.
Prepare two servers with similar performance characteristics and identical disk space (allocated for Boro Solution operation). The server performance requirements are the same as those for servers without High Availability. You can find recommendations on server selection and OS installation in the Preparing the Server section.
Establish direct network connectivity between the nodes. The minimum data transmission rate between the two nodes should be an order of magnitude greater than the total input bitrate of all probes. Note, however, that connecting a standby server to the primary one for the first time, or restoring communication between cluster nodes, triggers database replication, and the replication speed depends primarily on the network throughput.
The use of NTP is required for all nodes in the cluster.
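
As a rough illustration of the bandwidth requirement above, the following sketch sums the input bitrates of the probes and multiplies the total by ten. The function name and the use of integer Mbit/s values are assumptions made for this example; it is not a tool shipped with Boro Solution.

```shell
# Hypothetical helper: estimate the minimum inter-node link rate as
# ten times the summed input bitrate of all probes.
# Arguments: per-probe input bitrates in Mbit/s (integers).
min_link_mbps() {
    total=0
    for rate in "$@"; do
        total=$((total + rate))
    done
    echo $((total * 10))
}

# Example: three probes receiving 40, 25 and 35 Mbit/s.
# min_link_mbps 40 25 35   # the sum is 100 Mbit/s, so this prints 1000
```

The result is only a lower bound for steady-state operation. Initial replication, or replication after a communication outage, may still take a long time on such a link.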

6.1.1. Installing and Configuring a Cluster

Installing Boro Solution

To create a high-availability cluster, you will need two Boro Solution servers.

Install Boro Solution software on the servers. See the detailed guide in Installing the Boro Solution Server.
Note
1. Boro Solution should be installed with the same settings on both future cluster nodes (the variables set in boro_install_variables.sh). The SERVER_PUBLIC_NAME variable should have the same value on all nodes. It defines a host name that points to the primary server.
2. The primary and standby servers should have a direct network connection. NAT between the nodes will interfere with their connection.
3. Time synchronization via NTP should work correctly on both nodes.
Upload certificates to the servers.

Decide which server will be the primary one and create a certificate signing request (CSR) on it. Then go to the standby node and create a CSR for that server as well. Request certificate issuance from technical support and install the certificates on both servers. The process of creating and uploading certificates is described in detail in Installing Certificates.

Note

It is recommended to upload certificates right after installing Boro Solution. This is because after you register all cluster nodes, requests from the standby node will be redirected to the primary node, and uploading certificates will become challenging.
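
The installation note above requires SERVER_PUBLIC_NAME to have the same value on both nodes. A minimal sketch of such a check follows. It assumes that boro_install_variables.sh contains plain NAME=value lines (optionally quoted), which may not match the real file layout, and both function names are hypothetical.

```shell
# Print the value assigned to a variable in a shell-style variables file.
# Assumes plain NAME=value lines; if the variable is assigned more than
# once, the last assignment wins.
get_install_var() {
    # $1: variable name, $2: path to the variables file
    sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -n 1 | tr -d "\"'"
}

# Succeed only if both files assign the same non-empty value to the variable.
same_install_var() {
    # $1: variable name, $2 and $3: variables files taken from each node
    value_a=$(get_install_var "$1" "$2")
    value_b=$(get_install_var "$1" "$3")
    [ -n "$value_a" ] && [ "$value_a" = "$value_b" ]
}
```

To use it, copy the variables file from one node to the other and run, for example, `same_install_var SERVER_PUBLIC_NAME primary_vars.sh standby_vars.sh`, where both file names are placeholders for the two copies.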

Uploading the Update to Create the Cluster

The update for HA cluster creation is provided by an Elecard engineer in the form of an archive. The archive should be uploaded to both cluster nodes. After each upload, run the following commands as root on that node:

TMP_DIR=$(mktemp -d)
tar -C "$TMP_DIR" -xf /PATH/TO/install_HA.2024-xx-xx.01.tgz
"$TMP_DIR"/install_HA.sh
[ "${TMP_DIR#/}" -a -d "$TMP_DIR" ] && rm -rf "$TMP_DIR"

Replace /PATH/TO with the path to the archive, and replace install_HA.2024-xx-xx.01.tgz with the actual archive name.

Initializing HA Cluster Nodes

On the server that you choose to be primary, run the following command as root:

/opt/elecard/boro/bin/HA_ctrl register

Note

Run this command only once. It initializes a cluster that so far includes only the primary server.

Register the second Boro server as a standby node by doing the following:

Copy the SSH key:
- Run the following command on the standby server:
```
cat ~root/.ssh/id_ed25519-HA.pub
```
Copy and save the string returned in response.
- On the primary server, execute the following command, adding the copied string:
```
echo "ssh-ed25519 ...." >>~root/.ssh/authorized_keys
```
  If you don’t add the string, the server registration will stop early and print a short instruction.
Run the command as root to register the standby server:
```
/opt/elecard/boro/bin/HA_ctrl register <PRIMARY_IP_OR_HOSTNAME>
```
Replace <PRIMARY_IP_OR_HOSTNAME> with the IP address or the host name of the primary node.

The process of database replication between cluster nodes will start.

Note

Since the database is copied from the primary node, the execution of the command may take significant time, depending on the database size and connection speed. Wait for the program to finish.
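
If the key-copy step above is repeated, for example after a failed registration attempt, the same key can end up in authorized_keys more than once. The following sketch appends the key only when that exact line is missing. The function name is hypothetical, and the key string is the one returned by the standby server.

```shell
# Append a public key line to an authorized_keys file only if that exact
# line is not already present.
add_authorized_key() {
    # $1: the full public key line, $2: path to the authorized_keys file
    touch "$2"
    if ! grep -qxF -- "$1" "$2"; then
        printf '%s\n' "$1" >> "$2"
    fi
}

# Example, run as root on the primary server:
# add_authorized_key "ssh-ed25519 ...." ~root/.ssh/authorized_keys
```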

6.1.2. Working With the Cluster

General Information

All probe requests are redirected from the standby node to the primary server. HTTP requests from browsers, except those whose paths start with /HA/, are also redirected from the standby to the primary node. The responses to such requests contain the X-HA-Proxy: <NODE_NAME> header. HTTP requests whose paths start with /HA/ are not redirected; they are handled by the node that receives them. The following paths are available:

/HA/health_check — returns a JSON object with node status. The object contains the following parameters:
- node — node name;
- state — node role in cluster, e.g. primary, standby or unknown;
- health — a parameter that shows if the node is healthy; can be true or false;
- last_promotion — the last time the node was promoted to the primary role (in Unix time). If the node has never been promoted to primary, the value is 0.
The HTTP status code is 200 even if the node is unhealthy, so check the health parameter rather than the status code.
/HA/metrics — metrics for Prometheus, a software for tracking events and sending notifications;

There is no authentication for the paths listed above, but you can allow or restrict their usage by editing the settings in the /etc/nginx/sites-include/boro_HA.conf file.
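
Because /HA/health_check answers with code 200 even for an unhealthy node, a monitoring script has to inspect the JSON body. The sketch below extracts fields from the flat object described above using sed. Both function names are hypothetical, and a real JSON parser would be more robust than this pattern-based approach.

```shell
# Extract a top-level field from the flat JSON object read on stdin.
# Handles both quoted values ("primary") and bare values (true, 0).
ha_field() {
    # $1: field name (node, state, health or last_promotion)
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",}[:space:]]*\).*/\1/p" |
        head -n 1
}

# Succeed only if the JSON on stdin describes a healthy primary node.
is_healthy_primary() {
    json=$(cat)
    [ "$(printf '%s\n' "$json" | ha_field state)" = "primary" ] &&
        [ "$(printf '%s\n' "$json" | ha_field health)" = "true" ]
}

# Example (NODE_IP_OR_HOSTNAME must be set to a cluster node):
# curl -s "http://$NODE_IP_OR_HOSTNAME/HA/health_check" | is_healthy_primary
```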

Besides HTTP requests, you can interact with the cluster using the /opt/elecard/boro/bin/HA_ctrl command. If you run the command without arguments or parameters, you will see the current node status. To output the help message, pass the -h argument: /opt/elecard/boro/bin/HA_ctrl -h.

Viewing the Node Status

You can check the node status in one of the following ways:

Run the command as root:

/opt/elecard/boro/bin/HA_ctrl status

Response example:

Boro HA state, node2:
      node state:                        primary
    last change:      2024-01-23T20:53:07+07:00
  last promotion:      2024-01-23T20:53:07+07:00
    remote node:                        visible

  PostgreSQL cluster info:
  ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
  1  | node1 | standby |   running | node2    | default  | 100      | 48       | host=node1 user=repmgr dbname=repmgr connect_timeout=2
  2  | node2 | primary | * running |          | default  | 100      | 48       | host=node2 user=repmgr dbname=repmgr connect_timeout=2

  nginx usual mode

Send an HTTP request:

NODE_IP_OR_HOSTNAME=<NODE_IP_OR_HOSTNAME>

curl http://$NODE_IP_OR_HOSTNAME/HA/health_check

Replace <NODE_IP_OR_HOSTNAME> with the actual IP address or the name of the host to which the request is sent.

Response examples (from the standby node, then from the primary node):

{
  "node": "node1",
  "state": "standby",
  "health": true,
  "last_promotion": 1706017165
}
{
  "node": "node2",
  "state": "primary",
  "health": true,
  "last_promotion": 1706017987
}
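
The last_promotion values in these responses are Unix timestamps. The sketch below converts one to a readable UTC time. It relies on the `-d @SECONDS` form of GNU date, and the function name is hypothetical.

```shell
# Convert a last_promotion value (Unix time) to an ISO 8601 UTC timestamp.
# A value of 0 means the node has never been promoted to primary.
promotion_time() {
    if [ "$1" -eq 0 ]; then
        echo "never"
    else
        date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
    fi
}

# promotion_time 1706017987   # prints 2024-01-23T13:53:07Z
```

The printed time corresponds to the 2024-01-23T20:53:07+07:00 value shown in the HA_ctrl status example, expressed in UTC.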

Switching to the Standby Server

Attention

When switching between servers, statistics may be lost for up to 5 minutes.

Note

Switchover is performed only manually. Logic for automatic switchover should be implemented outside the cluster using a witness node.
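
The decision logic on a witness node is left to the operator. One possible policy, shown here only as a sketch, is to request a switchover when the primary has failed several consecutive checks while the standby still reports itself as a healthy standby. The function names, the threshold, and the policy itself are assumptions and not part of Boro Solution.

```shell
# Decide whether a witness should request a switchover.
should_switch_over() {
    # $1: number of consecutive failed checks of the primary
    # $2: failure threshold
    # $3: "state" field reported by the standby
    # $4: "health" field reported by the standby
    [ "$1" -ge "$2" ] && [ "$3" = "standby" ] && [ "$4" = "true" ]
}

# Update the consecutive-failure counter after one check of the primary.
next_failure_count() {
    # $1: current counter, $2: "ok" if the last check succeeded
    if [ "$2" = "ok" ]; then
        echo 0
    else
        echo $(($1 + 1))
    fi
}
```

Requiring several failures in a row avoids switching over on a single lost request. Keep in mind that each switchover may cost up to 5 minutes of statistics.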

You can switch to the standby server in one of the following ways:

Run the command as root on the standby node:
```
/opt/elecard/boro/bin/HA_ctrl switchover
```

Send the Control API request to the standby node:

NODE_IP_OR_HOSTNAME=<NODE_IP_OR_HOSTNAME>
USER_ID=<USER_ID>

curl \
  -H "Content-Type: application/json" \
  --data "{\"user_id\":$USER_ID,\"methods\":[{\"method\":\"HASwitchOver\"}]}" \
  http://$NODE_IP_OR_HOSTNAME/HA/ctrl_api/v1/json

Replace <USER_ID> and <NODE_IP_OR_HOSTNAME> with actual values. The HASwitchOver method should be used without parameters.

Response examples:

{"reply":[{"method":"HASwitchOver","result":"ok"}]} # the switchover is done correctly

{"reply":[{"method":"HASwitchOver","error":"is primary already!"}]} # the node is already a primary node
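
A script that sends the HASwitchOver request can check the reply before proceeding. The sketch below assumes the reply is formatted exactly as in the examples above, without spaces around the colons. The function name is hypothetical.

```shell
# Interpret a HASwitchOver reply read on stdin.
# Succeeds when the reply contains "result":"ok"; otherwise prints the
# error text, if any, to stderr and fails.
switchover_succeeded() {
    reply=$(cat)
    case $reply in
        *'"result":"ok"'*) return 0 ;;
    esac
    printf '%s\n' "$reply" |
        sed -n 's/.*"error":"\([^"]*\)".*/switchover failed: \1/p' >&2
    return 1
}

# Example: pipe the output of the curl command shown above into the function.
# curl -s ... | switchover_succeeded && echo "standby promoted"
```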