1.13. Active-passive Messaging Clusters

1.13.1. Overview

The High Availability (HA) module provides active-passive, hot-standby messaging clusters for fault-tolerant message delivery.

In an active-passive cluster only one broker, known as the primary, is active and serving clients at a time. The other brokers are standing by as backups. Changes on the primary are replicated to all the backups so they are always up-to-date or "hot". Backup brokers reject client connection attempts, to enforce the requirement that clients only connect to the primary.

If the primary fails, one of the backups is promoted to take over as the new primary. Clients fail over to the new primary automatically. If there are multiple backups, the other backups also fail over to become backups of the new primary.

This approach relies on an external cluster resource manager to detect failures, choose the new primary and handle network partitions. Rgmanager is supported initially, but others may be supported in the future.

1.13.1.1. Avoiding message loss

In order to avoid message loss, the primary broker delays acknowledgment of messages received from clients until the message has been replicated to and acknowledged by all of the backup brokers.

Clients buffer unacknowledged messages and re-send them in the event of a fail-over. [1] If the primary crashes before a message is replicated to all the backups, the client will re-send the message when it fails over to the new primary.

Note that this means it is possible for messages to be duplicated. In the event of a failure it is possible for a message to be both received by the backup that becomes the new primary and re-sent by the client.

When a new primary is promoted after a fail-over it is initially in "recovering" mode. In this mode, it delays acknowledgment of messages on behalf of all the backups that were connected to the previous primary. This protects those messages against a failure of the new primary until the backups have a chance to connect and catch up.
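The duplicate-delivery scenario above can be sketched in a few lines of Python. This is an illustrative simulation, not Qpid client code (`Client` and `Broker` are hypothetical stand-ins): the client buffers unacknowledged messages and re-sends them after fail-over, so a message that was replicated to the backup just before the crash arrives twice.

```python
# Illustrative simulation of at-least-once delivery across a fail-over.
# Not Qpid client code: Client and Broker are hypothetical stand-ins.

class Broker:
    def __init__(self):
        self.queue = []

    def receive(self, msg):
        self.queue.append(msg)

class Client:
    def __init__(self):
        self.unacked = []          # messages sent but not yet acknowledged

    def send(self, broker, msg):
        self.unacked.append(msg)   # buffer until the broker acknowledges
        broker.receive(msg)

    def failover(self, new_primary):
        # Re-send everything that was never acknowledged.
        for msg in self.unacked:
            new_primary.receive(msg)

primary, backup = Broker(), Broker()
client = Client()

client.send(primary, "m1")
backup.receive("m1")       # replicated to the backup ...
# ... but the primary crashes before acknowledging "m1" to the client.

client.failover(backup)    # backup is promoted; the client re-sends
print(backup.queue)        # ['m1', 'm1'] -- the message is duplicated
```

This is why applications using the HA module should be prepared to tolerate, or detect and discard, duplicate messages after a fail-over.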

Status of a HA broker

Joining

Initial status of a new broker that has not yet connected to the primary.

Catch-up

A backup broker that is connected to the primary and catching up on queues and messages.

Ready

A backup broker that is fully caught-up and ready to take over as primary.

Recovering

The newly-promoted primary, waiting for backups to connect and catch up.

Active

The active primary broker with all backups connected and caught-up.
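The status lifecycle above can be summarized as a small state machine. The sketch below is purely illustrative: the state names follow the list above, but the transition table is an assumption drawn from the prose, not broker code.

```python
# Illustrative sketch of the HA broker status lifecycle described above.
# The transition table is inferred from the prose, not taken from broker code.

TRANSITIONS = {
    "joining":    {"catch-up"},               # new broker connects to the primary
    "catch-up":   {"ready"},                  # finished downloading queues/messages
    "ready":      {"recovering", "catch-up"}, # promoted, or follows a new primary
    "recovering": {"active"},                 # expected backups have caught up
    "active":     set(),                      # stays active until failure/restart
}

def can_transition(current, target):
    """Return True if a broker may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())

print(can_transition("joining", "catch-up"))   # True
print(can_transition("joining", "active"))     # False: must catch up first
```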

1.13.1.2. Replacing the old cluster module

The High Availability (HA) module replaces the previous active-active cluster module. The new active-passive approach has several advantages compared to the existing active-active cluster module.

  • It does not depend directly on openais or corosync. It does not use multicast, which simplifies deployment.
  • It is more portable: in environments that don't support corosync, it can be integrated with a resource manager available in that environment.
  • Replication to a disaster recovery site can be handled as simply another node in the cluster; it does not require a separate replication mechanism.
  • It can take advantage of features provided by the resource manager, for example virtual IP addresses.
  • Improved performance and scalability due to better use of multiple CPUs.

1.13.1.3. Limitations

There are a number of known limitations in the current preview implementation. These will be fixed in the production versions.

  • Transactional changes to queue state are not replicated atomically. If the primary crashes during a transaction, it is possible that the backup could contain only part of the changes introduced by a transaction.
  • Not yet integrated with the persistent store. A persistent broker must have its store erased before joining an existing cluster. If the entire cluster fails, there are no tools to help identify the most recent store. In the future a persistent broker will be able to use its stored messages to avoid downloading messages from the primary when joining a cluster.
  • Configuration changes (creating or deleting queues, exchanges and bindings) are replicated asynchronously. Management tools used to make changes will consider the change complete when it is complete on the primary; it may not yet be replicated to all the backups.
  • Deletions made immediately after a failure (before all the backups are ready) may be lost on a backup. Queues, exchanges or bindings that were deleted on the primary could re-appear if that backup is promoted to primary on a subsequent failure.
  • Federation links from the primary will be lost in a fail-over; they will not be re-connected to the new primary. Federation links to the primary can fail over.

1.13.2. Virtual IP Addresses

Some resource managers (including rgmanager) support virtual IP addresses. A virtual IP address is an IP address that can be relocated to any of the nodes in a cluster. The resource manager associates this address with the primary node in the cluster, and relocates it to the new primary when there is a failure. This simplifies configuration as you can publish a single IP address rather than a list.

A virtual IP address can be used by clients and backup brokers to connect to the primary. The following sections will explain how to configure virtual IP addresses for clients or brokers.

1.13.3. Configuring the Brokers

The broker must load the ha module; it is loaded by default. The following broker options are available for the HA module.

Table 1.18. Broker Options for High Availability Messaging Cluster

ha-cluster yes|no

Set to "yes" to have the broker join a cluster.
ha-brokers-url URL

The URL [a] used by cluster brokers to connect to each other. The URL can contain a list of all the broker addresses or it can contain a single virtual IP address. If a list is used it is comma-separated, for example amqp:node1.example.com,node2.example.com,node3.example.com

ha-public-url URL

The URL that is advertised to clients. This defaults to the ha-brokers-url URL above, and has the same format. A virtual IP address is recommended for the public URL as it simplifies deployment and hides changes to the cluster membership from clients.

This option allows you to put client traffic on a different network from broker traffic, which is recommended.

ha-replicate VALUE

Specifies whether queues and exchanges are replicated by default. VALUE is one of: none, configuration, all. For details see Section 1.13.7, “Creating replicated queues and exchanges”.

ha-username USER

ha-password PASS

ha-mechanism MECH

Authentication settings used by HA brokers to connect to each other. If you are using authorization (Section 1.5.2, “Authorization”) then this user must have all permissions.
ha-backup-timeout SECONDS

Maximum time that a recovering primary will wait for an expected backup to connect and become ready.

link-maintenance-interval SECONDS

Interval for the broker to check link health and re-connect links if need be. If you want brokers to fail over quickly you can set this to a fraction of a second, for example: 0.1.

[a] The full format of the URL is given by this grammar:

url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
addr = tcp_addr / rdma_addr / ssl_addr / ...
tcp_addr = ["tcp:"] host [":" port]
rdma_addr = "rdma:" host [":" port]
ssl_addr = "ssl:" host [":" port]
		  


To configure a HA cluster you must set at least ha-cluster and ha-brokers-url.
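As a sanity check, the URL grammar in footnote [a] can be turned into a small parser. The sketch below is illustrative only, not the parser qpidd uses; the default port of 5672 is an assumption based on the standard AMQP port mentioned later in this chapter.

```python
# Minimal parser for the broker URL grammar in footnote [a]:
#   url  = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
#   addr = ( ["tcp:"] | "rdma:" | "ssl:" ) host [":" port]
# Illustrative sketch only; not the parser used by qpidd.

def parse_broker_url(url):
    if url.startswith("amqp:"):
        url = url[len("amqp:"):]
    user = password = None
    if "@" in url:
        credentials, url = url.split("@", 1)
        user, _, password = credentials.partition("/")
        password = password or None
    addrs = []
    for addr in url.split(","):
        protocol = "tcp"                       # tcp is the default transport
        for prefix in ("tcp:", "rdma:", "ssl:"):
            if addr.startswith(prefix):
                protocol, addr = prefix[:-1], addr[len(prefix):]
                break
        host, _, port = addr.partition(":")
        addrs.append((protocol, host, int(port) if port else 5672))
    return user, password, addrs

print(parse_broker_url("amqp:node1.example.com,node2.example.com:5673"))
# (None, None, [('tcp', 'node1.example.com', 5672),
#               ('tcp', 'node2.example.com', 5673)])
```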

1.13.4. The Cluster Resource Manager

Broker fail-over is managed by a cluster resource manager. An integration with rgmanager is provided, but it is possible to integrate with other resource managers.

The resource manager is responsible for starting the qpidd broker on each node in the cluster. The resource manager then promotes one of the brokers to be the primary. The other brokers connect to the primary as backups, using the URL provided in the ha-brokers-url configuration option.

Once connected, the backup brokers synchronize their state with the primary. When a backup is synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers continually receive updates from the primary in order to stay synchronized.

If the primary fails, backup brokers go into fail-over mode. The resource manager must detect the failure and promote one of the backups to be the new primary. The other backups connect to the new primary and synchronize their state with it.

The resource manager is also responsible for protecting the cluster from split-brain conditions resulting from a network partition. A network partition divides a cluster into two sub-groups which cannot see each other. Usually a quorum voting algorithm is used that disables nodes in the inquorate sub-group.
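With the one-vote-per-node configuration used in the cluster.conf example below, quorum reduces to a simple majority rule. The following sketch is illustrative, not rgmanager code:

```python
# Majority quorum check: a sub-group may keep running only if it holds
# more than half of the cluster's total votes. Illustrative sketch only.

def is_quorate(subgroup_votes, total_votes):
    return subgroup_votes > total_votes // 2

# A 3-node cluster with one vote per node partitions into groups of 2 and 1:
print(is_quorate(2, 3))  # True: the 2-node side keeps serving
print(is_quorate(1, 3))  # False: the 1-node side is disabled, avoiding split-brain
```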

1.13.5. Configuring rgmanager as resource manager

This section assumes that you are already familiar with setting up and configuring clustered services using cman and rgmanager. It will show you how to configure an active-passive, hot-standby qpidd HA cluster with rgmanager.

You must provide a cluster.conf file to configure cman and rgmanager. Here is an example cluster.conf file for a cluster of 3 nodes named node1, node2 and node3. We will go through the configuration step-by-step.

      
<?xml version="1.0"?>
<!--
This is an example of a cluster.conf file to run qpidd HA under rgmanager.
This example assumes a 3 node cluster, with nodes named node1, node2 and node3.

NOTE: fencing is not shown, you must configure fencing appropriately for your cluster.
-->

<cluster name="qpid-test" config_version="18">
  <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote
       for quorum. -->
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
    <clusternode name="node3.example.com" nodeid="3"/>
  </clusternodes>
  <!-- Resource Manager configuration. -->
  <rm>
    <!--
	There is a failoverdomain for each node containing just that node.
	This lets us stipulate that the qpidd service should always run on each node.
    -->
    <failoverdomains>
      <failoverdomain name="node1-domain" restricted="1">
	<failoverdomainnode name="node1.example.com"/>
      </failoverdomain>
      <failoverdomain name="node2-domain" restricted="1">
	<failoverdomainnode name="node2.example.com"/>
      </failoverdomain>
      <failoverdomain name="node3-domain" restricted="1">
	<failoverdomainnode name="node3.example.com"/>
      </failoverdomain>
    </failoverdomains>

    <resources>
      <!-- This script starts a qpidd broker acting as a backup. -->
      <script file="/etc/init.d/qpidd" name="qpidd"/>

      <!-- This script promotes the qpidd broker on this node to primary. -->
      <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/>

      <!-- This is a virtual IP address for broker replication traffic. -->
      <ip address="20.0.10.200" monitor_link="1"/>

      <!-- This is a virtual IP address on a separate network for client traffic. -->
      <ip address="20.0.20.200" monitor_link="1"/>
    </resources>

    <!-- There is a qpidd service on each node, it should be restarted if it fails. -->
    <service name="node1-qpidd-service" domain="node1-domain" recovery="restart">
      <script ref="qpidd"/>
    </service>
    <service name="node2-qpidd-service" domain="node2-domain" recovery="restart">
      <script ref="qpidd"/>
    </service>
    <service name="node3-qpidd-service" domain="node3-domain"  recovery="restart">
      <script ref="qpidd"/>
    </service>

    <!-- There should always be a single qpidd-primary service, it can run on any node. -->
    <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate">
      <script ref="qpidd-primary"/>
      <!-- The primary has the IP addresses for brokers and clients to connect. -->
      <ip ref="20.0.10.200"/>
      <ip ref="20.0.20.200"/>
    </service>
  </rm>
</cluster>
      
    

There is a failoverdomain for each node containing just that one node. This lets us stipulate that the qpidd service should always run on all nodes.

The resources section defines the qpidd script used to start the qpidd service. It also defines the qpidd-primary script, which does not actually start a new service; rather it promotes the existing qpidd broker to primary status.

The resources section also defines a pair of virtual IP addresses on different sub-nets. One will be used for broker-to-broker communication, the other for client-to-broker.

To take advantage of the virtual IP addresses, qpidd.conf should contain these lines:

      ha-cluster=yes
      ha-brokers-url=20.0.10.200
      ha-public-url=20.0.20.200
    

This configuration specifies that backup brokers will use 20.0.10.200 to connect to the primary and will advertise 20.0.20.200 to clients. Clients should connect to 20.0.20.200.

The service section defines 3 qpidd services, one for each node. Each service is in a restricted fail-over domain containing just that node, and has the restart recovery policy. The effect of this is that rgmanager will run qpidd on each node, restarting if it fails.

There is a single qpidd-primary-service using the qpidd-primary script which is not restricted to a domain and has the relocate recovery policy. This means rgmanager will start qpidd-primary on one of the nodes when the cluster starts and will relocate it to another node if the original node fails. Running the qpidd-primary script does not start a new broker process, it promotes the existing broker to become the primary.

1.13.6. Broker Administration Tools

Normally, clients are not allowed to connect to a backup broker. However, management tools are allowed to connect to backup brokers. If you use these tools you must not add or remove messages from replicated queues, nor create or delete replicated queues or exchanges, as this will disrupt the replication process and may cause message loss.

qpid-ha allows you to view and change HA configuration settings.

The tools qpid-config, qpid-route and qpid-stat will connect to a backup if you pass the flag ha-admin on the command line.

1.13.7. Creating replicated queues and exchanges

By default, queues and exchanges are not replicated automatically. You can change the default behavior by setting the ha-replicate configuration option. It has one of the following values:

  • all: Replicate everything automatically: queues, exchanges, bindings and messages.
  • configuration: Replicate the existence of queues, exchange and bindings but don't replicate messages.
  • none: Don't replicate anything, this is the default.

You can override the default for a particular queue or exchange by passing the argument qpid.replicate when creating the queue or exchange. It takes the same values as ha-replicate.

Bindings are automatically replicated if the queue and exchange being bound both have replication all or configuration; they are not replicated otherwise.
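The binding rule above can be stated as a small predicate. The sketch is illustrative: the level names match the ha-replicate values, but the function itself is an assumption drawn from the prose, not broker code.

```python
# A binding is replicated only if BOTH its queue and its exchange are
# replicated at level "all" or "configuration". Illustrative sketch only.

REPLICATED_LEVELS = {"all", "configuration"}

def binding_is_replicated(queue_level, exchange_level):
    return queue_level in REPLICATED_LEVELS and exchange_level in REPLICATED_LEVELS

print(binding_is_replicated("all", "configuration"))  # True
print(binding_is_replicated("all", "none"))           # False
```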

You can create replicated queues and exchanges with the qpid-config management tool like this:

      qpid-config add queue myqueue --replicate all
    

To create replicated queues and exchanges via the client API, add a node entry to the address like this:

      "myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
    

1.13.8. Client Connection and Fail-over

Clients can only connect to the primary broker. Backup brokers automatically reject any connection attempt by a client.

Clients are configured with the URL for the cluster (details below for each type of client). There are two possibilities:

  • The URL contains multiple addresses, one for each broker in the cluster.
  • The URL contains a single virtual IP address that is assigned to the primary broker by the resource manager. [2]

In the first case, clients will repeatedly re-try each address in the URL until they successfully connect to the primary. In the second case the resource manager will assign the virtual IP address to the primary broker, so clients only need to re-try on a single address.
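The first case amounts to a retry loop over the address list. The sketch below is illustrative: `try_connect` is a hypothetical stand-in, and real Qpid clients simply set the reconnect option, letting the messaging library perform this loop internally.

```python
import itertools
import time

# Illustrative sketch of client-side fail-over over a list of addresses.
# `try_connect` is a hypothetical stand-in; real Qpid clients set the
# reconnect option and the messaging library performs this loop itself.

def connect_to_primary(addresses, try_connect, delay=0.0, max_attempts=10):
    """Cycle through the cluster addresses until one accepts the connection.

    Backup brokers reject client connections, so the first address that
    accepts must be the primary."""
    for attempt, addr in enumerate(itertools.cycle(addresses)):
        if attempt >= max_attempts:
            raise ConnectionError("no primary found")
        conn = try_connect(addr)
        if conn is not None:      # backups reject; only the primary accepts
            return conn
        time.sleep(delay)

# Simulated cluster: node2 is the primary, the others reject connections.
fake = lambda addr: f"connection-to-{addr}" if addr == "node2" else None
print(connect_to_primary(["node1", "node2", "node3"], fake))
# connection-to-node2
```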

When the primary broker fails, clients re-try all known cluster addresses until they connect to the new primary. The client re-sends any messages that were previously sent but not acknowledged by the broker at the time of the failure. Similarly messages that have been sent by the broker, but not acknowledged by the client, are re-queued.

TCP can be slow to detect connection failures. A client can configure a connection to use a heartbeat to detect connection failure, and can specify a time interval for the heartbeat. If heartbeats are in use, failures will be detected no later than twice the heartbeat interval. The following sections explain how to enable heartbeat in each client.
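The two-interval bound can be made concrete with a small monitor that declares the connection dead after two heartbeat intervals of silence. This is an illustrative sketch with an explicit clock parameter, not Qpid client library code:

```python
# Illustrative heartbeat monitor: the connection is declared failed when no
# heartbeat has arrived for two intervals. Not Qpid client code.

class HeartbeatMonitor:
    def __init__(self, interval, now=0.0):
        self.interval = interval
        self.last_seen = now

    def on_heartbeat(self, now):
        self.last_seen = now

    def is_alive(self, now):
        # Allow up to two intervals of silence before declaring failure.
        return (now - self.last_seen) <= 2 * self.interval

monitor = HeartbeatMonitor(interval=10)  # heartbeat every 10 seconds
monitor.on_heartbeat(now=0)
print(monitor.is_alive(now=15))  # True: still within the 2 * 10s window
print(monitor.is_alive(now=25))  # False: failure detected within 2 * interval
```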

See "Cluster Failover" in Programming in Apache Qpid for details on how to keep the client aware of cluster membership.

Suppose your cluster has 3 nodes: node1, node2 and node3 all using the default AMQP port, and you are not using a virtual IP address. To connect a client you need to specify the address(es) and set the reconnect property to true. The following sub-sections show how to connect each type of client.

1.13.8.1. C++ clients

With the C++ client, you specify multiple cluster addresses in a single URL. [3] You also need to set the connection option reconnect to true. For example:

	qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
      

Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the heartbeat option. For example:

	  qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
	

1.13.8.2. Python clients

With the python client, you specify reconnect=True and a list of host:port addresses as reconnect_urls when calling Connection.establish or Connection.open. For example:

	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
      

Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the 'heartbeat' option. For example:

	connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
      

1.13.8.3. Java JMS Clients

In Java JMS clients, client fail-over is handled automatically if it is enabled in the connection. You can configure a connection to use fail-over using the failover property:

	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange'
      

This property can take three values:

Fail-over Modes

failover_exchange

If the connection fails, fail over to any other broker in the cluster.

roundrobin

If the connection fails, fail over to one of the brokers specified in the brokerlist.

singlebroker

Fail-over is not supported; the connection is to a single broker only.

In a Connection URL, heartbeat is set using the idle_timeout property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:

	connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
      

1.13.9. Security

You can secure your cluster using the authentication and authorization features described in Section 1.5, “Security”.

Backup brokers connect to the primary broker and subscribe for management events and queue contents. You can specify the identity used to connect to the primary with the following options:

Table 1.19. Security options for High Availability Messaging Cluster


ha-username USER

ha-password PASS

ha-mechanism MECH

Authentication settings used by HA brokers to connect to each other. If you are using authorization (Section 1.5.2, “Authorization”) then this user must have all permissions.

This identity is also used to authorize actions taken on the backup broker to replicate from the primary, for example to create queues or exchanges.

1.13.10. Integrating with other Cluster Resource Managers

To integrate with a different resource manager you must configure it to:

  • Start a qpidd process on each node of the cluster.
  • Restart qpidd if it crashes.
  • Promote exactly one of the brokers to primary.
  • Detect a failure and promote a new primary.

The qpid-ha command allows you to check if a broker is primary, and to promote a backup to primary.

To test if a broker is the primary:

	qpid-ha -b broker-address status --expect=primary
      

This command will return 0 if the broker at broker-address is the primary, non-0 otherwise.

To promote a broker to primary:

	qpid-ha -b broker-address promote
      

qpid-ha --help gives information on other commands and options available. You can also use qpid-ha to manually examine and promote brokers. This can be useful for testing failover scenarios without having to set up a full resource manager, or to simulate a cluster on a single node. For deployment, a resource manager is required.



[1] Clients must use "at-least-once" reliability to enable re-send of unacknowledged messages. This is the default behavior, no options need be set to enable it. For details of client addressing options see "Using the Qpid Messaging API" in Programming in Apache Qpid

[2] Only if the resource manager supports virtual IP addresses

[3] The full grammar for the URL is:

	    url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
	    addr = tcp_addr / rdma_addr / ssl_addr / ...
	    tcp_addr = ["tcp:"] host [":" port]
	    rdma_addr = "rdma:" host [":" port]
	    ssl_addr = "ssl:" host [":" port]