High Availability
P4 Code Review offers an active-passive form of High Availability, where the active node is a P4 Code Review instance and the passive node is a duplicate of this P4 Code Review instance that is left in standby.
Both of these P4 Code Review instances must be connected to the same P4 Server to ensure a seamless failover transition if the active node fails.
Most of your P4 Code Review data is backed up on the P4 Server. However, certain files must be backed up locally, including modifications made to config, module, template, and CSS files. If your active P4 Code Review instance fails, most of your data is retrieved from the P4 Server during the failover and used to get the passive P4 Code Review instance up and running. You then need to restore the modified files from your local backup.
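As an illustration only, the local backup described above could be scripted along the following lines. This is a minimal Python sketch, not part of P4 Code Review: the function name `backup_local_files`, the timestamp format, and the backup location are assumptions chosen for the example. Each run writes to a new timestamped folder so that earlier copies stay available.

```python
import shutil
import time
from pathlib import Path


def backup_local_files(sources, backup_root, stamp=None):
    """Copy each existing source file or directory into a new timestamped folder.

    `sources` and `backup_root` are hypothetical paths supplied by the caller.
    Returns the list of paths created inside the backup folder.
    """
    stamp = stamp or time.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / stamp
    # Fail rather than overwrite if a backup with this timestamp already exists.
    target.mkdir(parents=True, exist_ok=False)
    copied = []
    for src in map(Path, sources):
        if not src.exists():
            continue  # skip paths that this installation does not have
        dest = target / src.name
        if src.is_dir():
            shutil.copytree(src, dest)
        else:
            shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

For example, a scheduled job might call `backup_local_files(["/<SWARM>/data/config.php"], "/backup/p4-code-review")`, where `/backup/p4-code-review` is a hypothetical backup location. Sources that share the same file name would collide in this sketch, so extend the naming if you need that.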
Set up active-passive in P4 Code Review
-
Set up a Redis cluster (or an equivalent standalone system). P4 Code Review connects to this system to store its cache.
-
Set up a P4 Code Review instance to be the active node (referred to as node A) and complete the following steps:
-
Connect this P4 Code Review instance to the Redis cluster (or standalone system) that stores the cache.
-
Configure the config.php file with any custom modifications.
-
If you have any Extensions installed on the P4 Server, point them to node A (the current active node).
-
Set up a P4 Code Review instance to be the passive node (referred to as node B) and complete the following steps:
-
For the passive node, you do not need to run the configure-swarm.sh script. For more information, see Why the passive node does not require the setup script.
-
To prevent jobs from being duplicated or missed, turn off workers on node B.
-
Ensure the P4 Code Review data directory is shared via NFS or another shared filesystem. The data directory contains files such as config.php, tokens, logs, and other attachments.
-
Copy the queue token from the active node to the passive node. You can find the queue token in the following location:
/<SWARM>/data/queue/tokens/<TOKENID>
-
(Optional) If you have custom modules in your active node (node A), copy them over to the passive node (node B).
-
(Optional) If you have public files in your active node, copy these over to the passive node.
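The queue token copy in the steps above can be sketched in Python as follows. This is a hedged illustration, not a P4 Code Review tool: the function name `copy_queue_tokens` is hypothetical, and the sketch assumes that the two nodes keep separate data directories and that both are reachable as local paths from the machine running it (for example, through a mount).

```python
import shutil
from pathlib import Path


def copy_queue_tokens(active_data_dir, passive_data_dir):
    """Copy every queue token file from the active node's data directory
    to the passive node's data directory.

    Both arguments are the nodes' data directories (assumed to be
    reachable as local paths). Returns the copied token file names.
    """
    src = Path(active_data_dir) / "queue" / "tokens"
    dst = Path(passive_data_dir) / "queue" / "tokens"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for token in sorted(src.iterdir()):
        if token.is_file():
            # copy2 keeps the file's timestamps and permission bits
            shutil.copy2(token, dst / token.name)
            copied.append(token.name)
    return copied
```

The same pattern could be reused for custom modules or public files by pointing it at the relevant directories.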
Pre-disaster preparations
Complete the following tasks to ensure a successful failover if a P4 Code Review instance fails.
-
Local backups:
-
Regularly back up any local modifications, such as configuration, module, template, and CSS files, to a backup location of your choice.
-
Ensure that your backup process includes checks for both the current version and historical changes of these modified files.
-
Training and documentation:
-
Keep detailed documentation of where your local files are stored and the steps required to retrieve them.
-
Ensure your operations team is well-trained on the manual failover process and is aware of how to retrieve the local backups and implement the files on the passive node when it becomes active.
-
Health checks and monitoring:
-
Implement automated monitoring for both the active and passive nodes.
-
Create alerts that notify the operations team of potential issues so that they can prepare for a manual failover.
-
If you are using HTTPS servers, make sure all certificates are trusted by the P4 Servers.
-
P4 Server High Availability setup:
-
Confirm the P4 Server that contains your review data is configured for High Availability.
-
Regularly test the High Availability failover procedures according to the P4 Server High Availability documentation.
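As a rough illustration of the monitoring item in the list above, the following Python sketch probes each node over HTTP and reports the ones that do not respond. It is an assumption-laden example rather than a recommended monitoring setup: the function names, the node names, and the URLs are hypothetical, and it checks only that a URL answers, not that P4 Code Review is fully healthy.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and HTTP error statuses.
        return False


def check_nodes(nodes, probe=http_probe):
    """Probe each node and return the names of those that did not respond.

    `nodes` maps a node name to the URL to probe (hypothetical values).
    """
    return [name for name, url in nodes.items() if not probe(url)]
```

A monitoring job could call `check_nodes({"node-a": "https://node-a.example.com/", "node-b": "https://node-b.example.com/"})` on a schedule and raise an alert whenever the returned list is not empty.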
Volume sharing between P4 Code Review instances
As an optional step, you can configure both the active and passive P4 Code Review instances to share the same data volume found in /<SWARM>/data/. This means node B can take over easily if node A fails.
However, there is an edge case where some data may be lost during the failover period. For example, if node A fails at 14:00 and node B becomes active at 14:10, any new activity on the server during that 10-minute period may be missed.
Failover from active node to passive node
For more in-depth steps on how to move from one P4 Code Review instance to another, see Moving your P4 Code Review instance.
As a brief overview, to manually move activity from the active node to the passive node, follow these steps:
-
Turn on workers on node B.
-
Change the DNS record so it points to node B instead of node A. This assumes that the correct extensions, triggers, and tokens have been set up on node B.
-
Create a new P4 Code Review instance to be the passive node (referred to as node C).
The new active node is node B and requires a new passive node, which will be node C. Follow Step 3 in the Set up active-passive in P4 Code Review section above to create the passive node C.
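After the DNS change in the steps above, it can help to confirm that the hostname now resolves to node B. The following Python sketch shows one way to check this. The function name `resolves_to`, the hostname, and the IP address are hypothetical, and the check covers IPv4 resolution only.

```python
import socket


def resolves_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip (IPv4).

    `resolve` can be replaced with another lookup function for testing.
    """
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return False
```

For example, `resolves_to("codereview.example.com", "10.0.0.2")` returns True once the record points to node B's address, assuming 10.0.0.2 is node B. Cached records may keep pointing to node A until their TTL expires, so the result can differ between clients for a while.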
Why the passive node does not require the setup script
The configure-swarm.sh script is run when creating a P4 Code Review environment from scratch. It performs many setup tasks such as creating P4 Code Review users in P4, installing server-side extensions and modifying the config.php file.
By the time you have set up the active node, the configure-swarm.sh script has already configured the P4 Code Review environment that the passive node uses. The script therefore does not need to be run again. Running it again would recreate users and re-edit the config.php file.