Failure to Present Available Cluster Resources when Using Multiple Network Cards

Summary

Shared resources may not appear in the DPX management console as expected. These resources are hosted by the node with an active DPX communication agent (cmagent). To recognize resources hosted by another cluster member, the DPX cmagent must fail over to that node. Shared cluster resources may fail to display when the cluster nodes use multiple network interface cards (NICs) and DPX is restricted to a specific NIC.

Symptoms

When each node has only a single NIC, the IP addresses returned by the Cluster Administrator to DPX appear similar to the following:

Server1 192.168.x.x
Server2 192.168.x.x
Server3 192.168.x.x

When a dedicated backup network is implemented, and each node therefore has more than one NIC, the IP addresses returned by the Cluster Administrator appear similar to the following:

Server1 192.168.x.x 172.27.x.x
Server2 192.168.x.x 172.27.x.x
Server3 192.168.x.x 172.27.x.x

Even though DPX is configured to use a specific backup network, the Cluster Administrator returns the main network IP addresses (192.168.x.x) for the physical servers, because this is how the names are resolved in most configurations.

After the DPX virtual cluster object is expanded, not all shared resources are detected. Because the cluster nodes have multiple network interfaces installed, -hn is configured for all cluster nodes. However, the apph logs still contain the public interface IP address and not the backup interface IP address.

In the example below, for a backup LAN on 172.27.x.x, the apph logs show that the hostnames are bound to hostaddr 192.168.70.30 (MSCSNODE1) and 192.168.70.31 (MSCSNODE2):

Tue Jun 28 11:01:32 2011; TGID(main), TID(main); SNBAPH_417I clust node, hostname (MSCSNODE1), hostaddr (192.168.70.30)
Tue Jun 28 11:01:32 2011; TGID(main), TID(main); SNBAPH_417I clust node, hostname (MSCSNODE2), hostaddr (192.168.70.31)

PX(4064-T4738) CM_SOCK(0) OP[connect(000000000000029C,000000000012D7E0,16)] ret(0) cm_err(10051) errno(10051) "A socket operation was attempted to an unreachable network. " "SADDR(192.168.70.30)"
PX(4064-T4738) reconnect_direct: cm_err(10051/ENETUNREACH)
PX(4064-T4738) CM_SOCK(0) OP[connect(000000000000029C,000000000012D7E0,16)] ret(0) cm_err(10051) errno(10051) "A socket operation was attempted to an unreachable network. " "SADDR(192.168.70.31)"
PX(4064-T4738) reconnect_direct: cm_err(10051/ENETUNREACH)

However, these IP addresses belong to the public LAN (192.168.x.x) and not to the backup LAN (172.27.x.x).
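
The check described above can be sketched in a few lines of Python. This is only an illustrative aid, not a DPX tool: the function name `nodes_off_backup_lan` is hypothetical, the regular expression is modeled on the two SNBAPH_417I lines quoted above, and the backup LAN is assumed to be 172.27.0.0/16 (the actual prefix length depends on the site).

```python
import ipaddress
import re

# Pattern modeled on the SNBAPH_417I lines quoted in this article.
NODE_LINE = re.compile(
    r"SNBAPH_417I clust node, hostname \((?P<host>[^)]+)\), "
    r"hostaddr \((?P<addr>[^)]+)\)"
)


def nodes_off_backup_lan(log_text, backup_cidr):
    """Return (hostname, address) pairs whose address is outside the backup LAN."""
    network = ipaddress.ip_network(backup_cidr)
    flagged = []
    for match in NODE_LINE.finditer(log_text):
        addr = ipaddress.ip_address(match.group("addr"))
        if addr not in network:
            flagged.append((match.group("host"), str(addr)))
    return flagged


sample = """\
Tue Jun 28 11:01:32 2011; TGID(main), TID(main); SNBAPH_417I clust node, hostname (MSCSNODE1), hostaddr (192.168.70.30)
Tue Jun 28 11:01:32 2011; TGID(main), TID(main); SNBAPH_417I clust node, hostname (MSCSNODE2), hostaddr (192.168.70.31)
"""

# Assumed backup LAN; adjust the CIDR to match the real backup network.
for host, addr in nodes_off_backup_lan(sample, "172.27.0.0/16"):
    print(f"{host} is bound to {addr}, which is not on the backup LAN")
```

Any node reported by this sketch is one for which apph resolved the public address, and is therefore a candidate for the registry entry described under Resolution.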

Resolution

On each node, add the following information to the Windows registry as a string value pair, with one entry for every cluster node:

Computer\HKEY_LOCAL_MACHINE\Software\Syncsort\BackupExpress\<DPX-CLUSTER-NAME>\0\CLUSNIP$<HOSTNAME> - <IPADDRESS>

For example, in a two-node cluster configuration, the registry keys should appear as below:
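
As a minimal sketch of the naming pattern only, the following Python snippet lists the entries for the two nodes seen in the log excerpt (MSCSNODE1 and MSCSNODE2). Several things here are assumptions rather than product facts: the helper `clusnip_entries` is hypothetical, the `-` in the pattern above is read as separating the value name from its data, and the 172.27.x.x strings are placeholders that must be replaced with each node's real backup LAN address. The snippet does not write to the registry.

```python
# Key path taken from the pattern above; {cluster} stands for <DPX-CLUSTER-NAME>.
KEY_TEMPLATE = r"HKEY_LOCAL_MACHINE\Software\Syncsort\BackupExpress\{cluster}\0"


def clusnip_entries(cluster_name, backup_ips):
    """Return (key_path, [(value_name, value_data), ...]), one pair per node.

    Assumes the value name is CLUSNIP$<HOSTNAME> and the data is the
    node's backup LAN IP address, stored as a string.
    """
    key_path = KEY_TEMPLATE.format(cluster=cluster_name)
    values = [(f"CLUSNIP${host}", ip) for host, ip in backup_ips.items()]
    return key_path, values


# Placeholder addresses: substitute the real backup LAN IP of each node.
key_path, values = clusnip_entries(
    "<DPX-CLUSTER-NAME>",
    {"MSCSNODE1": "172.27.x.x", "MSCSNODE2": "172.27.x.x"},
)

print(key_path)
for name, data in values:
    print(f"  {name} = {data}")
```

The same list of entries is added on every node, so each node's registry contains one CLUSNIP$ value per cluster member.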

Keywords: SNBAPH_417I
