This release provides a preview of a new module for High Availability (HA). The new module is not yet complete or ready for production use, it being made available so that users can experiment with the new approach and provide feedback early in the development process. Feedback should go to user@qpid.apache.org.
The old cluster module takes an active-active approach, i.e. all the brokers in a cluster are able to handle client requests simultaneously. The new HA module takes an active-passive, hot-standby approach.
In an active-passive cluster, only one broker, known as the primary, is active and serving clients at a time. The other brokers are standing by as backups. Changes on the primary are immediately replicated to all the backups so they are always up-to-date or "hot". If the primary fails, one of the backups is promoted to be the new primary. Clients fail-over to the new primary automatically. If there are multiple backups, the backups also fail-over to become backups of the new primary.
The new approach depends on an external cluster resource manager to detect failure of the primary and choose the new primary. The first supported resource manager will be rgmanager, but it will be possible to add integration with other resource managers in the future. The preview version is not integrated with any resource manager, you can use the qpid-ha tool to simulate the actions of a resource manager or do your own integration.
There are a number of known limitations in the current preview implementation. These will be fixed in the production versions.
The broker must load the ha module, it is loaded by default
when you start a broker. The following broker options are available for the HA module.
Table 1.18. Options for High Availability Messaging Cluster
| Options for High Availability Messaging Cluster | |
|---|---|
--ha-cluster yes|no
| Set to "yes" to have the broker join a cluster. |
--ha-brokers URL
|
URL use by brokers to connect to each other. The URL lists the addresses of
all the brokers in the cluster
[a]
in the following form:
url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
addr = tcp_addr / rmda_addr / ssl_addr / ...
tcp_addr = ["tcp:"] host [":" port]
rdma_addr = "rdma:" host [":" port]
ssl_addr = "ssl:" host [":" port]'
|
--ha-public-brokers URL | URL used by clients to connect to the brokers in the same format as --ha-brokers above. Use this option if you want client traffic on a different network from broker replication traffic. If this option is not set, clients will use the same URL as brokers. |
|
--ha-username --ha-password --ha-mechanism |
Brokers use USER,
PASS, MECH to
authenticate when connecting to each other.
|
[a] If the resource manager supports virtual IP addresses then the URL can contain just the single virtual IP. | |
To configure a cluster you must set at least ha-cluster and ha-brokers
To create a replicated queue or exchange, pass the argument qpid.replicate when creating the queue or exchange. It should have one of the following three values:
Bindings are automatically replicated if the queue and exchange being bound both have replication argument of all or confguration, they are not replicated otherwise. You can create replicated queues and exchanges with the qpid-config management tool like this:
qpid-config add queue myqueue --replicate all
To create replicated queues and exchangs via the client API, add a node entry to the address like this:
"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
Clients can only connect to the single primary broker. All other brokers in the cluster are backups, and they automatically reject any attempt by a client to connect.
Clients are configured with the addreses of all of the brokers in the cluster. [1] When the client tries to connect initially, it will try all of its addresses until it successfully connects to the primary. If the primary fails, clients will try to try to re-connect to all the known brokers until they find the new primary.
Suppose your cluster has 3 nodes: node1, node2 and node3 all using the default AMQP port.
With the C++ client, you specify all the cluster addresses in a single URL, for example:
qpid::messaging::Connection c("node1:node2:node3");
With the python client, you specify reconnect=True and a list of host:port addresses as reconnect_urls when calling establish or open
connection = qpid.messaging.Connection.establish("node1", reconnect=True, "reconnect_urls=["node1", "node2", "node3"])
Broker fail-over is managed by a cluster resource manager. The initial preview version of HA is not integrated with a resource manager, the production version will be integrated with rgmanager and it may be integrated with other resource managers in the future.
The resource manager is responsible for ensuring that there is exactly one broker is acting as primary at all times. It selects the initial primary broker when the cluster is started, detects failure of the primary, and chooses the backup to promote as the new primary.
You can simulate the actions of a resource manager, or indeed do your own integration with a resource manager using the qpid-ha tool. The command
qpid-ha promote -bhost:port
will promote the broker listening on
host:port to be the primary.
You should only promote a broker to primary when there is no other primary in the
cluster. The brokers will not detect multiple primaries, they rely on the resource
manager to do that.
A clustered broker always starts initially in discovery mode. It uses the addresses configured in the ha-brokers configuration option and tries to connect to each in turn until it finds to the primary. The resource manager is responsible for choosing on of the backups to promote as the initial primary.
If the primary fails, all the backups are disconnected and return to discovery mode. The resource manager chooses one to promote as the new primary. The other backups will eventually discover the new primary and reconnect.
You can connect to a backup broker with the administrative tool qpid-ha. You can also connect with the tools qpid-config, qpid-route and qpid-stat if you pass the flag --ha-admin on the command line. If you do connect to a backup you should not modify any of the replicated queues, as this will disrupt the replication and may result in message loss.
[1] If the resource manager supports virtual IP addresses then the clients can be configured with a single virtual IP address.