
Summary



When attempting an NDMP backup from a NetApp clustered SVM, error 23 (NDMP_CONNECT_ERR) is received from the NetApp filer.

Symptoms



An NDMP session consists of control paths and a data path; with this error, the control paths can be established, but the data path cannot. The control paths are established between all NDMP components and use port 10000 for communication, while the data path uses a dynamically allocated port and is established as a connection from the primary storage to the secondary storage.

In the scenario described below, a backup runs from a clustered NetApp SVM as primary storage (with the NDMP proxy client configured on the DPX master server) to secondary storage, a Linux host with the DPX client installed on which a device server is configured with a DiskDirectory. In this case the control paths run from the NDMP proxy node to the primary storage, from the NDMP proxy node to the device server, and between the primary storage node and the device server, all using port 10000. The NDMP data path should be established from the primary storage (NetApp) to the secondary storage listener IP and port, which is dynamically allocated per connection by the DPX nibbler.
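A quick way to confirm that the control paths can come up is to test TCP port 10000 from the NDMP proxy node toward both storage systems. This is only a sketch; it assumes the nc (netcat) utility is available and the placeholder addresses are replaced with your own:

# run on the DPX master / NDMP proxy node
nc -vz <primary_storage_ndmp_lif> 10000      # NDMP control port on the NetApp SVM
nc -vz <secondary_storage_address> 10000     # NDMP control port on the device server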

Errors in Joblog

192.168.23.39 ssjobhnd Thu Dec 13 14:23:51 2018 SNBJH_3036E *** Drive fail reported: drive Data1 on node lnx_svr1 in job 1544707422 ***
192.168.23.39 ssevthnd Thu Dec 13 14:23:51 2018 SNBEHB3021E *** Fail searching socket 5 of module ssdmbs at node  [dev ] ***
192.168.23.39 ssjobhnd Thu Dec 13 14:23:51 2018 SNBJHB3460J --- Putting definition on retry queue: node netApp1 disk /Backup_svm/vol_data of job 1544707422. ---
192.168.23.39 ssjobhnd Thu Dec 13 14:23:51 2018 SNBJH_3045E *** Task 1 of job 1544707422 for node netApp1 disk /Backup_svm/vol_data failed ***
192.168.23.39 ssjobhnd Thu Dec 13 14:23:51 2018 SNBJH_3012E *** final_stat: invalid rqstlink (reqid=4, class=NONE) function ID 896 ***


Error in Data Mover client log xxxxxxx.ncl

Thu Dec 13 14:23:51 2018 SNBNCX1505D Message() sent to module(ndmpd) at host(192.168.23.139) on connection 20
Thu Dec 13 14:23:51 2018 SNBNCX1506D Message(NDMP_DATA_CONNECT) received from module(ndmpd) at host(x.x.x.x) on connection 20
Thu Dec 13 14:23:51 2018 SNBNCX6014E ERROR: Task 11, type(BACKUP), state(DATA_1); Error(501, NC_TASK_ERROR_SERVER_RESPONSE), NDMP error(23, NDMP_CONNECT_ERR); Connection 0, Message token 0

 

Resolution



Possible Causes:

1) The secondary storage server has more than one NIC, and one of the interface addresses is not reachable from the NDMP LIF of the SVM on the NetApp filer (primary storage). As a result, no data connection can be established to the device server.
Addresses on the secondary storage:

SLES11-05:~ # ip -f inet address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 127.0.0.2/8 brd 127.255.255.255 scope host secondary lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
    inet 172.20.74.5/16 brd 172.20.255.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000
    inet 10.0.0.2/24 brd 10.0.0.255 scope global eth1
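A route lookup on the secondary storage shows which local address would be used to reach the SVM's NDMP LIF. This is a sketch using the addresses of this example (172.20.74.14 is the SVM LIF shown further below); the output line is illustrative:

SLES11-05:~ # ip route get 172.20.74.14
172.20.74.14 dev eth0  src 172.20.74.5

Only an address on this path (here 172.20.74.5 on eth0) is usable for the data connection; 10.0.0.2 on eth1 has no route to or from the SVM LIF.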
 
Set the debug level for ndmpc to 3 with the -d 3 parameter: under Configure Enterprise, select the master node, right-click and choose Manage Configuration, then select module ndmpc and set Parameter Visibility to Advanced.
xxxxxx.ncl then shows the following entries for the listener address passed to the primary storage before NDMP_DATA_CONNECT is called.

Thu Dec 13 14:23:51 2018 SNBNCX3067I Address type(1, NDMP_ADDR_TCP)
Thu Dec 13 14:23:51 2018 SNBNCX1100D Function: nc_ndmp_format_pvals called
Thu Dec 13 14:23:51 2018 SNBNCX3067I Address data: IP@[0]=a0000002; Port(38540); Addr Env()
Thu Dec 13 14:23:51 2018 SNBNCX1100D Function: ms_send_msg called
Thu Dec 13 14:23:51 2018 SNBNCX1100D Function: ms_find_connection called


IP@[0]=a0000002 is the hexadecimal form of the IP address being listened on for the data connection; it translates to 10.0.0.2.
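To compare the value from the log with a local address, the hexadecimal form can be decoded on the command line. A minimal sketch, assuming python3 is available on the device server; <hex_value_from_log> is a placeholder for the value printed after IP@[0]= in your log:

SLES11-05:~ # python3 -c 'import sys, ipaddress; print(ipaddress.ip_address(int(sys.argv[1], 16)))' <hex_value_from_log>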

NetApp SVM LIF Configuration

Node0::> network interface show -vserver BackupSVM
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
BackupSVM
            back_lif1   up/up      172.20.74.14/16     Node0         e0a     true
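From the device server you can verify which of its addresses can actually exchange traffic with the SVM LIF by forcing a ping out of each candidate source address. A sketch using the addresses of this example:

SLES11-05:~ # ping -c 2 -I 172.20.74.5 172.20.74.14     # same 172.20.0.0/16 network, expected to succeed
SLES11-05:~ # ping -c 2 -I 10.0.0.2 172.20.74.14        # no route between the networks, expected to fail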


Solution:
Add the IP address(es) that the nibbler should listen on for data connections to the NIB_LISTEN_IP_LIST parameter. These addresses must be reachable from the SVM LIF.
a) In the DPX Admin Console, select Configure Enterprise.
b) Select the secondary storage (device server) node on which the nibbler runs.
c) Right-click the node and select Manage Configuration.
d) Select nibbler from the module drop-down and set Parameter Visibility to Advanced.
e) Select NIB_LISTEN_IP_LIST and add the reachable IP addresses as a comma-separated list (see the example below).
f) Restart the Catalogic DPX Advanced Protection Manager (nibbler) service on Windows. On a Linux client, execute /opt/DPX/misc/bexads stop to stop the agent and /opt/DPX/misc/bexads start to start it again.
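For the topology in this example, only the eth0 address is reachable from the SVM LIF, so the list would contain just that address (illustrative value; list every reachable address, comma separated):

NIB_LISTEN_IP_LIST = 172.20.74.5

On the Linux device server the agent is then restarted with:

SLES11-05:~ # /opt/DPX/misc/bexads stop
SLES11-05:~ # /opt/DPX/misc/bexads start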

2) The NetApp cluster does not have intercluster LIFs configured, while the data LIF and the volume on the primary storage are hosted on different nodes.
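Whether this condition applies can be checked by comparing the node that hosts the volume with the node that hosts the SVM's data LIF. A sketch using the names from this article (substitute your own SVM and volume names):

Node0::> volume show -vserver BackupSVM -volume vol_data -fields node
Node0::> network interface show -vserver BackupSVM -fields curr-node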

To see the error for this condition, enable "NDMP debug logging" on the filer by following the instructions provided by NetApp here:
How to enable NDMP debug logging on Vserver-scoped Vservers in clustered Data ONTAP 8.2 and later
In the generated debug log, the error indicating that there is no route within the cluster appears as the following line just before the NDMP4_DATA_CONNECT error (23):
.......
LOG sev (3) msg (NDMP Vserver Connect: couldn't find any appropriate preferred local src IP address)
....
....
cb_ndmp4_data_connect: here
 DEBUG: DMA<<S V4 sequence=10 (0xa)
Time_stamp=0x5c07b197 (Dec  5 12:08:07 2018)
message type=1 (NDMP4_MESSAGE_REPLY)
message_code=0x40a (NDMP4_DATA_CONNECT)
reply_sequence=8 (0x8)
error_code=0 (NDMP4_NO_ERR)
error=23 (NDMP4_CONNECT_ERR)


To check whether any LIFs with the intercluster role exist, execute the following on the filer (the example shows that no intercluster LIF exists):

Node0::*> net interface show -role intercluster
  (network interface show)
There are no entries matching your query.


Solution

Ensure that an intercluster LIF exists on every node of the cluster.

Sample commands to create an intercluster LIF and view the result on the filer:
cluster1::> network interface create -vserver cluster1 -lif IC_N0 -role intercluster 
-home-node cluster1-2 -home-port e0b -address 192.0.2.68 -netmask 255.255.255.0 
-status-admin up -failover-policy local-only -firewall-policy intercluster
cluster1::> network interface show -role intercluster
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
cluster1    IC1        up/up      192.0.2.65/24      cluster1-1    e0a     true
cluster1    IC2        up/up      192.0.2.68/24      cluster1-2    e0b     true
 