Experimental Redis Sentinel support was introduced in GitSwarm 2017.1. Starting with 8.14, Redis Sentinel is no longer experimental. If you've used it with versions < 8.14 before, please check the updated documentation here.
High Availability with Redis is possible using a Master x Slave topology with a Redis Sentinel service to watch for failures and automatically start the failover procedure.
You can choose to install and manage Redis and Sentinel yourself, use a hosted cloud solution, or use the one that comes bundled with GitSwarm EE packages.
**Notes:**
- Before diving into the details of setting up Redis and Redis Sentinel for HA, make sure you read this Overview section to better understand how the components are tied together.
- You need at least 3 independent machines: physical, or VMs running on distinct physical machines. It is essential that all master and slave Redis instances run on different machines. If you fail to provision the machines in that specific way, any issue with the shared environment can bring your entire setup down.
- It is OK to run a Sentinel alongside a master or slave Redis instance, but run no more than one Sentinel on the same machine.
- You also need to take into consideration the underlying network topology, making sure you have redundant connectivity between the Redis / Sentinel and GitSwarm instances; otherwise, the network will become a single point of failure.
- Make sure that you read this document once as a whole before configuring the components below.
**Notes:**
- Starting with 8.11, you can configure a list of Redis Sentinel servers that will monitor a group of Redis servers to provide failover support.
- Starting with 8.14, the GitSwarm EE Enterprise Edition package comes with the Redis Sentinel daemon built-in.

High Availability with Redis requires a few components, described in the sections below.
Redis Sentinel handles the most important tasks in an HA environment: it helps keep servers online with minimal to no downtime by watching the Redis instances and starting a failover when the Master becomes unavailable.
When a Master fails to respond, it's the application's responsibility (in our case GitSwarm) to handle the timeout and reconnect (querying a Sentinel for the new Master).
To get a better understanding of how to correctly set up Sentinel, please read the Redis Sentinel documentation first, as failing to configure it correctly can lead to data loss, or can bring your whole cluster down and invalidate the failover effort.
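Once Sentinel is running, you can interactively ask any Sentinel node which server is the current Master. A quick sketch using `redis-cli`, assuming the `gitlab-redis` master name, the example IPs, and the default Sentinel port used later in this guide:
# Ask a Sentinel for the address of the current master
redis-cli -h 10.0.0.1 -p 26379 SENTINEL get-master-addr-by-name gitlab-redis
# Example reply:
# 1) "10.0.0.1"
# 2) "6379"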
For a minimal setup, you will install the GitSwarm EE package on 3 independent machines, each running both Redis and Sentinel.
If you are not sure where this number of nodes comes from, read the Redis setup overview and the Sentinel setup overview.
For a recommended setup that can resist more failures, you will install the GitSwarm EE package on 5 independent machines, each running both Redis and Sentinel.
You must have at least 3 Redis servers: 1 Master and 2 Slaves, each on an independent machine (see the explanation above).
You can have additional Redis nodes, which will help you survive a situation where more nodes go down. Whenever there are only 2 nodes online, a failover will not be initiated.
As an example, if you have 6 Redis nodes, a maximum of 3 can be simultaneously down (leaving 3 online, which is above the 2-node threshold).
Please note that there are different requirements for Sentinel nodes. If you host them on the same machines as Redis, you may need to take those restrictions into consideration when calculating the number of nodes to be provisioned. See the Sentinel setup overview documentation for more information.
All Redis nodes should be configured the same way and with similar server specs, because in a failover situation any Slave can be promoted to the new Master by the Sentinel servers.
Replication requires authentication, so you need to define a password to protect all Redis nodes and the Sentinels. They will all share the same password, and all instances must be able to talk to each other over the network.
Sentinels watch both the other Sentinels and the Redis nodes. Whenever a Sentinel detects that a Redis node is not responding, it announces that to the other Sentinels. They have to reach the quorum (the minimum number of Sentinels that agree a node is down) to be able to start a failover.
Whenever the quorum is met, the majority of all known Sentinel nodes need to be available and reachable, so that they can elect the Sentinel leader, who takes all the decisions needed to restore service availability, such as promoting a new Master and reconfiguring the remaining Slaves to follow it.
You must have at least 3 Redis Sentinel servers, each on an independent machine (machines that are believed to fail independently), ideally in different geographical areas.
You can configure them on the same machines where you've configured the other Redis servers, but understand that if a whole node goes down, you lose both a Sentinel and a Redis instance.
The number of Sentinels should ideally always be an odd number, so that the consensus algorithm can be effective in the case of a failure.
In a 3-node topology, you can only afford 1 Sentinel node going down. Whenever the majority of the Sentinels goes down, the network partition protection prevents destructive actions, and a failover will not be started.
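Once your Sentinels are up, you can ask any of them whether the current deployment can reach both quorum and majority. A quick check, assuming the `gitlab-redis` master name and the example IPs used later in this guide:
# CKQUORUM reports whether this Sentinel sees enough usable peers to
# authorize a failover for the given master
redis-cli -h 10.0.0.1 -p 26379 SENTINEL ckquorum gitlab-redis
# Example reply:
# OK 3 usable Sentinels. Quorum and failover authorization can be reached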
Here are some examples:
- With 5 or 6 Sentinels, a maximum of 2 can go down for a failover to begin.
- With 7 Sentinels, a maximum of 3 nodes can go down.

The Leader election can sometimes fail the voting round when consensus is not achieved (see the odd number of nodes requirement above). In that case, a new attempt will be made after the amount of time defined in `sentinel['failover_timeout']` (in milliseconds).
Note: We will see where `sentinel['failover_timeout']` is defined later.
The `failover_timeout` variable has a lot of different use cases. According to the official documentation:
- The time needed to re-start a failover after a previous failover was already tried against the same master by a given Sentinel is two times the failover timeout.
- The time needed for a slave replicating to a wrong master according to a Sentinel's current configuration to be forced to replicate with the right master is exactly the failover timeout (counting since the moment a Sentinel detected the misconfiguration).
- The time needed to cancel a failover that is already in progress but did not produce any configuration change (`SLAVEOF NO ONE` not yet acknowledged by the promoted slave).
- The maximum time a failover in progress waits for all the slaves to be reconfigured as slaves of the new master. However, even after this time the slaves will be reconfigured by the Sentinels anyway, just not with the exact parallel-syncs progression as specified.
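As a worked example of the first rule, using the value configured later in this guide:
sentinel['failover_timeout'] = 60000 # 60 seconds
# A Sentinel that already tried a failover against the same master will
# wait 2 * 60000 ms = 120 seconds before re-starting it.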
Based on your infrastructure setup and how you have installed GitSwarm, there are multiple ways to configure Redis HA. GitSwarm EE packages have Redis and/or Redis Sentinel bundled with them, so you only need to focus on configuration. Pick the way that suits your needs.
This is the section where we install and set up the new Redis instances.
**Notes:**
- Redis nodes (both master and slaves) will need the same password defined in `redis['password']`. At any time during a failover, the Sentinels can reconfigure a node and change its status from master to slave and vice versa.

The prerequisites for an HA Redis setup are the following:
- All Redis nodes must be able to talk to each other and accept incoming connections over the Redis (`6379`) and Sentinel (`26379`) ports (unless you change the default ones).

Edit `/etc/gitswarm/gitswarm.rb` and add the contents:
# Enable the master role and disable all other services in the machine
# (you can still enable Sentinel).
redis_master_role['enable'] = true
# IP address pointing to a local IP that the other machines can reach.
# You can also set bind to '0.0.0.0', which listens on all interfaces.
# If you really need to bind to an externally accessible IP, make
# sure you add extra firewall rules to prevent unauthorized access.
redis['bind'] = '10.0.0.1'
# Define a port so Redis can listen for TCP requests which will allow other
# machines to connect to it.
redis['port'] = 6379
# Set up password authentication for Redis (use the same password in all nodes).
redis['password'] = 'redis-password-goes-here'
To prevent database migrations from running on upgrade, run:
sudo touch /etc/gitswarm/skip-auto-migrations
Only the primary GitSwarm application server should handle migrations.
Reconfigure GitSwarm EE for the changes to take effect.
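With the GitSwarm EE package, reconfiguring is done with the following command (the same `gitswarm-ctl` tool referenced later in this guide):
sudo gitswarm-ctl reconfigure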
Edit `/etc/gitswarm/gitswarm.rb` and add the contents:
# Enable the slave role and disable all other services in the machine
# (you can still enable Sentinel). This will also set automatically
# `redis['master'] = false`.
redis_slave_role['enable'] = true
# IP address pointing to a local IP that the other machines can reach.
# You can also set bind to '0.0.0.0', which listens on all interfaces.
# If you really need to bind to an externally accessible IP, make
# sure you add extra firewall rules to prevent unauthorized access.
redis['bind'] = '10.0.0.2'
# Define a port so Redis can listen for TCP requests which will allow other
# machines to connect to it.
redis['port'] = 6379
# The same password for Redis authentication you set up for the master node.
redis['password'] = 'redis-password-goes-here'
# The IP of the master Redis node.
redis['master_ip'] = '10.0.0.1'
# Port of master Redis server, uncomment to change to non default. Defaults
# to `6379`.
#redis['master_port'] = 6379
To prevent database migrations from running on upgrade, run:
sudo touch /etc/gitswarm/skip-auto-migrations
Only the primary GitSwarm application server should handle migrations.
Go through the steps again for all the other slave nodes.
These values don't have to be changed again in `/etc/gitswarm/gitswarm.rb` after a failover, as the nodes will be managed by the Sentinels, and even after a `gitswarm-ctl reconfigure`, they will get their configuration restored by the same Sentinels.
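At this point, you can verify that replication is working. A quick check run on the master, assuming the example IP and password used above:
# You should see role:master, and your slaves listed under connected_slaves
redis-cli -h 10.0.0.1 -p 6379 -a redis-password-goes-here INFO replication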
Note: Redis Sentinel is bundled with GitSwarm EE Enterprise Edition only. The following section assumes you are using GitSwarm EE Enterprise Edition. For the Omnibus Community Edition and source installations, follow the Redis HA source install guide.
Now that the Redis servers are all set up, let's configure the Sentinel servers.
If you are not sure whether your Redis servers are working and replicating correctly, please read the Troubleshooting Replication section and fix any issues before proceeding with the Sentinel setup.
You must have at least 3 Redis Sentinel servers, each on an independent machine. You can configure them on the same machines where you've configured the other Redis servers.
With GitSwarm Enterprise Edition, you can use the package installation to set up multiple machines with the Sentinel daemon.
You can omit this step if the Sentinels will be hosted on the same node as the other Redis instances.
Download/install the GitSwarm EE Enterprise Edition package using steps 1 and 2 from the GitSwarm downloads page.
Edit `/etc/gitswarm/gitswarm.rb` and add the contents (if you are installing the Sentinels on the same node as the other Redis instances, some of the values below might be duplicates):
redis_sentinel_role['enable'] = true
# Must be the same in every sentinel node
redis['master_name'] = 'gitlab-redis'
# The same password for Redis authentication you set up for the master node.
redis['password'] = 'redis-password-goes-here'
# The IP of the master Redis node.
redis['master_ip'] = '10.0.0.1'
# Define a port so Redis can listen for TCP requests which will allow other
# machines to connect to it.
redis['port'] = 6379
# Port of master Redis server, uncomment to change to non default. Defaults
# to `6379`.
#redis['master_port'] = 6379
## Configure Sentinel
sentinel['bind'] = '10.0.0.1'
# Port that Sentinel listens on, uncomment to change to non default. Defaults
# to `26379`.
# sentinel['port'] = 26379
## Quorum must reflect the amount of voting sentinels it takes to start a failover.
## Value must NOT be greater than the amount of sentinels.
##
## The quorum can be used to tune Sentinel in two ways:
## 1. If the quorum is set to a value smaller than the majority of Sentinels
## we deploy, we are basically making Sentinel more sensitive to master failures,
## triggering a failover as soon as even just a minority of Sentinels is no longer
## able to talk with the master.
## 2. If the quorum is set to a value greater than the majority of Sentinels, we are
## making Sentinel able to failover only when there are a very large number (larger
## than majority) of well connected Sentinels which agree about the master being down.
sentinel['quorum'] = 2
## Consider unresponsive server down after x amount of ms.
# sentinel['down_after_milliseconds'] = 10000
## Specifies the failover timeout in milliseconds. It is used in many ways:
##
## - The time needed to re-start a failover after a previous failover was
## already tried against the same master by a given Sentinel, is two
## times the failover timeout.
##
## - The time needed for a slave replicating to a wrong master according
## to a Sentinel current configuration, to be forced to replicate
## with the right master, is exactly the failover timeout (counting since
## the moment a Sentinel detected the misconfiguration).
##
## - The time needed to cancel a failover that is already in progress but
## did not produce any configuration change (SLAVEOF NO ONE yet not
## acknowledged by the promoted slave).
##
## - The maximum time a failover in progress waits for all the slaves to be
## reconfigured as slaves of the new master. However even after this time
## the slaves will be reconfigured by the Sentinels anyway, but not with
## the exact parallel-syncs progression as specified.
# sentinel['failover_timeout'] = 60000
To prevent database migrations from running on upgrade, run:
sudo touch /etc/gitswarm/skip-auto-migrations
Only the primary GitSwarm application server should handle migrations.
Go through the steps again for all the other Sentinel nodes.
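You can verify that each Sentinel is watching the Master and sees its peers. A quick check against one of them, assuming the example IPs and the `gitlab-redis` master name:
# Lists the other Sentinels this node knows about for the given master
redis-cli -h 10.0.0.1 -p 26379 SENTINEL sentinels gitlab-redis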
The final part is to inform the main GitSwarm application server of the Redis Sentinel servers and the authentication credentials.
You can enable or disable Sentinel support at any time in new or existing installations. From the GitSwarm application perspective, all it requires is the correct credentials for the Sentinel nodes.
While it doesn't require a list of all Sentinel nodes, in case of a failure it needs to access at least one of the listed ones.
Note: The following steps should be performed on the GitSwarm application server, which ideally should not have Redis or Sentinels on it in an HA setup.
Edit `/etc/gitswarm/gitswarm.rb` and add/change the following lines:
## Must be the same in every sentinel node
redis['master_name'] = 'gitlab-redis'
## The same password for Redis authentication you set up for the master node.
redis['password'] = 'redis-password-goes-here'
## A list of sentinels with `host` and `port`
gitlab_rails['redis_sentinels'] = [
{'host' => '10.0.0.1', 'port' => 26379},
{'host' => '10.0.0.2', 'port' => 26379},
{'host' => '10.0.0.3', 'port' => 26379}
]
Reconfigure GitSwarm EE for the changes to take effect.
If you already have a single-machine GitSwarm install running, you will need to replicate from this machine first, before deactivating the Redis instance inside it.
Your single-machine install will be the initial Master, and the 3 others should be configured as Slaves pointing to this machine.
After replication catches up, you will need to stop services in the single-machine install, to rotate the Master to one of the new nodes.
Make the required changes in configuration and restart the new nodes again.
To disable Redis in the single install, edit `/etc/gitswarm/gitswarm.rb`:
redis['enable'] = false
If you fail to replicate first, you may lose data (unprocessed background jobs).
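Before stopping the old instance, you can confirm that a slave has caught up. A quick check against one of the new slaves, assuming the example IPs and password:
# master_link_status should be 'up' and the slave's replication offset
# should be (nearly) identical to the master's before you switch over
redis-cli -h 10.0.0.2 -p 6379 -a redis-password-goes-here INFO replication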
Note: Redis Sentinel is bundled with GitSwarm EE Enterprise Edition only. For different setups, read the available configuration setups section.
In this example, we consider that all servers have an internal network interface with IPs in the `10.0.0.x` range, and that they can connect to each other using these IPs.
In real-world usage, you would also set up firewall rules to prevent unauthorized access from other machines, and block traffic from the outside (Internet).
We will use the same 3-node Redis + Sentinel topology discussed in the Redis setup overview and Sentinel setup overview documentation.
Here is a list and description of each machine and the assigned IP:

- `10.0.0.1`: Redis Master + Sentinel 1
- `10.0.0.2`: Redis Slave 1 + Sentinel 2
- `10.0.0.3`: Redis Slave 2 + Sentinel 3
- `10.0.0.4`: GitSwarm application

Please note that after the initial configuration, if a failover is initiated by the Sentinel nodes, the Redis nodes will be reconfigured and the Master will change permanently (including in `redis.conf`) from one node to the other, until a new failover is initiated again.
The same thing will happen with `sentinel.conf`, which is overridden after the initial execution, after any new Sentinel node starts watching the Master, or when a failover promotes a different Master node.
In `/etc/gitswarm/gitswarm.rb`:
redis_master_role['enable'] = true
redis_sentinel_role['enable'] = true
redis['bind'] = '10.0.0.1'
redis['port'] = 6379
redis['password'] = 'redis-password-goes-here'
redis['master_name'] = 'gitlab-redis' # must be the same in every sentinel node
redis['master_password'] = 'redis-password-goes-here' # the same value defined in redis['password'] in the master instance
redis['master_ip'] = '10.0.0.1' # ip of the initial master redis instance
#redis['master_port'] = 6379 # port of the initial master redis instance, uncomment to change to non default
sentinel['bind'] = '10.0.0.1'
# sentinel['port'] = 26379 # uncomment to change default port
sentinel['quorum'] = 2
# sentinel['down_after_milliseconds'] = 10000
# sentinel['failover_timeout'] = 60000
Reconfigure GitSwarm EE for the changes to take effect.
In `/etc/gitswarm/gitswarm.rb`:
redis_slave_role['enable'] = true
redis_sentinel_role['enable'] = true
redis['bind'] = '10.0.0.2'
redis['port'] = 6379
redis['password'] = 'redis-password-goes-here'
redis['master_password'] = 'redis-password-goes-here'
redis['master_ip'] = '10.0.0.1' # IP of master Redis server
#redis['master_port'] = 6379 # Port of master Redis server, uncomment to change to non default
redis['master_name'] = 'gitlab-redis' # must be the same in every sentinel node
sentinel['bind'] = '10.0.0.2'
# sentinel['port'] = 26379 # uncomment to change default port
sentinel['quorum'] = 2
# sentinel['down_after_milliseconds'] = 10000
# sentinel['failover_timeout'] = 60000
Reconfigure GitSwarm EE for the changes to take effect.
In `/etc/gitswarm/gitswarm.rb`:
redis_slave_role['enable'] = true
redis_sentinel_role['enable'] = true
redis['bind'] = '10.0.0.3'
redis['port'] = 6379
redis['password'] = 'redis-password-goes-here'
redis['master_password'] = 'redis-password-goes-here'
redis['master_ip'] = '10.0.0.1' # IP of master Redis server
#redis['master_port'] = 6379 # Port of master Redis server, uncomment to change to non default
redis['master_name'] = 'gitlab-redis' # must be the same in every sentinel node
sentinel['bind'] = '10.0.0.3'
# sentinel['port'] = 26379 # uncomment to change default port
sentinel['quorum'] = 2
# sentinel['down_after_milliseconds'] = 10000
# sentinel['failover_timeout'] = 60000
Reconfigure GitSwarm EE for the changes to take effect.
In `/etc/gitswarm/gitswarm.rb`:
redis['master_name'] = 'gitlab-redis'
redis['password'] = 'redis-password-goes-here'
gitlab_rails['redis_sentinels'] = [
{'host' => '10.0.0.1', 'port' => 26379},
{'host' => '10.0.0.2', 'port' => 26379},
{'host' => '10.0.0.3', 'port' => 26379}
]
Reconfigure GitSwarm EE for the changes to take effect.
GitSwarm EE configures some things behind the scenes to make sysadmins' lives easier. If you want to know what happens underneath, keep reading.
In the previous example, we've used `redis_sentinel_role` and `redis_master_role`, which simplifies the amount of configuration changes.
If you want more control, here is what each one sets for you automatically when enabled:
## Redis Sentinel Role
redis_sentinel_role['enable'] = true
# When Sentinel Role is enabled, the following services are also enabled
sentinel['enable'] = true
# The following services are disabled
redis['enable'] = false
bootstrap['enable'] = false
nginx['enable'] = false
postgresql['enable'] = false
gitlab_rails['enable'] = false
mailroom['enable'] = false
-------
## Redis master/slave Role
redis_master_role['enable'] = true # enable only one of them
redis_slave_role['enable'] = true # enable only one of them
# When Redis Master or Slave role are enabled, the following services are
# enabled/disabled. Note that if Redis and Sentinel roles are combined, both
# services will be enabled.
# The following services are disabled
sentinel['enable'] = false
bootstrap['enable'] = false
nginx['enable'] = false
postgresql['enable'] = false
gitlab_rails['enable'] = false
mailroom['enable'] = false
# For Redis Slave role, also change this setting from default 'true' to 'false':
redis['master'] = false
You can find the relevant attributes defined in gitlab_rails.rb.
There are a lot of moving parts that need to be handled carefully in order for the HA setup to work as expected.
Before proceeding with the troubleshooting below, check your firewall rules:
- Redis machines:
  - Accept TCP connections on `6379`
  - Connect to the other Redis machines via TCP on `6379`
- Sentinel machines:
  - Accept TCP connections on `26379`
  - Connect to the other Sentinel machines via TCP on `26379`
  - Connect to the Redis machines via TCP on `6379`
You can check whether everything is correct by connecting to each server using the `redis-cli` application and sending the `INFO` command.
If authentication was correctly defined, it should fail with a `NOAUTH Authentication required` error. Try to authenticate with the previously defined password using `AUTH redis-password-goes-here`, and try the `INFO` command again.
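A hypothetical session against the master from the examples above, showing the expected behavior:
$ redis-cli -h 10.0.0.1 -p 6379
10.0.0.1:6379> INFO
(error) NOAUTH Authentication required.
10.0.0.1:6379> AUTH redis-password-goes-here
OK
10.0.0.1:6379> INFO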
Look for the `# Replication` section, where you should see some important information, like the `role` of the server.
When connected to a `master` Redis, you will see the number of connected slaves, and a list of each with connection details:
# Replication
role:master
connected_slaves:1
slave0:ip=10.133.5.21,port=6379,state=online,offset=208037514,lag=1
master_repl_offset:208037658
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:206989083
repl_backlog_histlen:1048576
When it's a `slave`, you will see details of the master connection, and whether its link is `up` or `down`:
# Replication
role:slave
master_host:10.133.1.58
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:208096498
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
If you get an error like `Redis::CannotConnectError: No sentinels available.`, there may be something wrong with your configuration files, or it may be related to this issue.
You must make sure you are defining the same value in `redis['master_name']` and `redis['master_password']` as you defined for your Sentinel node.
The way the Redis connector `redis-rb` works with Sentinel is a bit non-intuitive. We try to hide the complexity in omnibus, but it still requires a few extra configs.
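For illustration only, this is roughly the shape of the Sentinel-aware client that `redis-rb` expects; a sketch with the example values from this guide, not the exact parameters that `Gitlab::Redis.params` produces:
require 'redis'
# With redis-rb, the host part of the URL is the Sentinel master name,
# not a real hostname; the client queries the Sentinels for the address.
redis = Redis.new(
  url: 'redis://gitlab-redis',
  password: 'redis-password-goes-here',
  sentinels: [
    { host: '10.0.0.1', port: 26379 },
    { host: '10.0.0.2', port: 26379 },
    { host: '10.0.0.3', port: 26379 }
  ],
  role: :master
)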
To make sure your configuration is correct:
Enter the Rails console:
# For package installations
sudo gitswarm-rails console
# For source installations
sudo -u git rails console production
Run in the console:
redis = Redis.new(Gitlab::Redis.params)
redis.info
Keep this screen open and try to simulate a failover below.
To simulate a failover on master Redis, SSH into the Redis server and run:
# Port must match your master Redis port, and the sleep time must be a few
# seconds bigger than the defined one
redis-cli -h localhost -p 6379 DEBUG sleep 20
Then back in the Rails console from the first step, run:
redis.info
You should see a different port after a few seconds of delay (the failover/reconnect time).
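A hypothetical sequence, assuming the example topology above:
redis.connection[:host] # => "10.0.0.1" (before the failover)
# ... the master sleeps, the Sentinels elect a new Master ...
redis.connection[:host] # => "10.0.0.2" (a promoted slave)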
Changes to Redis HA over time:

- 8.14: Redis Sentinel is no longer experimental, and the GitSwarm EE Enterprise Edition package ships with the Redis Sentinel daemon built-in.
- 8.11: Experimental support for configuring a list of Redis Sentinel servers to monitor a group of Redis servers was introduced.
Read more on High Availability: