

Symptoms



A NetApp storage system has a snapshot that cannot be deleted. I'm trying to delete this snapshot either to free space on the storage system or because my DPX condense job is failing with RC=16 and an 'unable to delete snapshot' message. The error message says the snapshot is LUN busy, or busy with a LUN clone operation. I have already destroyed all LUNs on the storage system, but the issue persists.



Resolution



The LUN-Busy condition is not a DPX bug. LUN-Busy is the expected behavior of a NetApp storage system with regard to iSCSI LUNs and preserving point-in-time snapshot state. iSCSI mapping to snapshot data is a feature provided as a data recovery resource and is not intended to support long-lasting enterprise iSCSI SAN needs.

Short answer:

A snapshot enters this LUN-Busy condition when a logical unit (LUN) is created on the storage system so that Instant Availability (IA) can map a drive over iSCSI. The LUN-Busy snapshot becomes undeletable if the iSCSI drive is mapped and in use and another snapshot of the volume is then taken.

All snapshots involved in keeping the LUN-Busy snapshot busy must be deleted before the busy snapshot can be removed.


Explanation:

The IA feature of DPX is supported through an iSCSI option that is a licensed component of your NetApp storage system.

A snapshot on a storage system is an immutable, read-only, point-in-time view of the file system. When an IA map is initiated and backed by data contained in a snapshot, the storage system creates a separate set of data structures that track changes (writes). These data structures live on the same volume as the read-only snapshot data, and together they present the iSCSI initiator host with a virtual read/write block device (the IA-mapped drive containing the requested snapshot/backup data). Normally, when the IA map is disconnected and the LUN is destroyed on the storage system, these writable data structures are removed and the read-only data in the snapshot remains unchanged.

An undeletable snapshot stuck in the LUN-Busy state results when an IA map (iSCSI) is created on a storage system, some block-level change activity takes place on the iSCSI device, and then another snapshot of that volume is taken while the IA map is still connected. The new snapshot can be a manual snapshot taken from the ONTAP web GUI or the ONTAP command line interface, an ONTAP scheduled snapshot, or the snapshot triggered on the storage system volume when DPX completes a new Snapvault backup. This new snapshot preserves all known file system state at that point in time, including the state of snapshots made busy by iSCSI transactions. As a side effect, each snapshot taken while the iSCSI device is mounted holds a reference to these LUN data structures. This condition persists even after all LUNs have been destroyed and removed from the storage system.
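The mechanism above can be illustrated with a deliberately simplified Python model. The VolumeModel class and its method names are hypothetical and do not reflect how ONTAP actually stores LUN metadata; the model only encodes the rule described here: a snapshot taken while a written-to IA map is connected keeps a reference, and destroying the LUN afterwards does not release it.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeModel:
    """Toy model of one volume. NOT how ONTAP stores LUN metadata internally."""
    mapped: bool = False   # is an IA (iSCSI) map currently connected?
    dirty: bool = False    # has the initiator written to the LUN since it was mapped?
    pinned: list[str] = field(default_factory=list)  # snapshots holding LUN references

    def map_lun(self) -> None:
        self.mapped = True
        self.dirty = False

    def write(self) -> None:
        # Block-level change activity only matters while the map is connected.
        if self.mapped:
            self.dirty = True

    def take_snapshot(self, name: str) -> None:
        # A snapshot taken while a written-to IA map is connected captures
        # the LUN change-tracking structures along with everything else.
        if self.mapped and self.dirty:
            self.pinned.append(name)

    def unmap_and_destroy(self) -> None:
        # Destroying the LUN removes the live structures, but snapshots
        # already recorded in self.pinned keep their references.
        self.mapped = False
        self.dirty = False


# Hypothetical sequence: only the snapshot taken during the live, written-to map is pinned.
vol = VolumeModel()
vol.take_snapshot("snap_a")
vol.map_lun()
vol.write()
vol.take_snapshot("snap_b")
vol.unmap_and_destroy()
vol.take_snapshot("snap_c")
print(vol.pinned)  # ['snap_b']
```

This is why destroying every LUN does not, by itself, free the busy snapshot: the reference lives inside snap_b, not in the LUN.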

If you see that a snapshot is in the LUN-Busy state, you can find out which snapshots are keeping it busy by running the following command from the ONTAP command line interface:

lun snap usage <volume> <snapshot>

Where <volume> is the name of the volume your backups are going to, and <snapshot> is the name of the snapshot in LUN-Busy state.

The lun snap usage command usually lists snapshots in the order in which they must be deleted before the snapshot named in the <snapshot> parameter can be removed. Start at the top of the list and work your way down.

If a LUN that currently exists is keeping your snapshot busy, the lun snap usage command also displays it under the heading Active. However, if more than one active LUN exists, it does not necessarily display all the LUNs in all of the involved snapshots. To see the LUNs that are active on your storage system, use the following command:

lun show

This lists all of the LUNs active on your filer. Find all of the LUNs that are defined and active on the affected volume, and destroy them using the BEX GUI or the 'lun destroy' function in ONTAP FilerView or the ONTAP command line interface.
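As a sketch of this clean-up step, the following Python helper picks out the LUN paths that belong to one volume from captured lun show output and prints matching lun destroy command lines for you to review and run by hand. It assumes that each LUN line begins with a 7-Mode style path such as /vol/<volume>/<lun>; the sample output, the LUN names, and the volume name dpxvol are hypothetical. It deliberately runs nothing on the storage system.

```python
def luns_on_volume(lun_show_output: str, volume: str) -> list[str]:
    """Return LUN paths under /vol/<volume>/ from captured 'lun show' text.

    Assumes each LUN line starts (after optional whitespace) with its path.
    """
    prefix = f"/vol/{volume}/"
    paths = []
    for line in lun_show_output.splitlines():
        fields = line.split()
        # The trailing slash in prefix keeps 'dpxvol' from matching 'dpxvol2'.
        if fields and fields[0].startswith(prefix):
            paths.append(fields[0])
    return paths


def destroy_commands(paths: list[str]) -> list[str]:
    """Build 'lun destroy' command lines for manual review; nothing is executed."""
    return [f"lun destroy {path}" for path in paths]


# Hypothetical captured output; real formatting may differ.
sample = """\
        /vol/dpxvol/ia_lun_1      20g (21474836480)  (r/w, online, mapped)
        /vol/dpxvol2/lun0         10g (10737418240)  (r/w, online)
"""

for command in destroy_commands(luns_on_volume(sample, "dpxvol")):
    print(command)  # lun destroy /vol/dpxvol/ia_lun_1
```

Keeping the output as text rather than executing it leaves the administrator in control of exactly which LUNs are destroyed.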

The lun snap usage command does not necessarily display all of the snapshot LUN dependencies when multiple overlapping iSCSI LUNs have been created between Snapvault backups. When multiple overlapping LUN dependencies exist, start with the oldest snapshot and resolve its dependencies by deleting snapshots in the order shown by the lun snap usage command. If you encounter another snapshot that cannot be deleted because of 'LUN clone' or 'LUN-Busy', run the lun snap usage command on that snapshot and resolve its dependencies before moving on.
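The resolution procedure in the previous paragraph amounts to a depth-first, post-order walk of a dependency graph: before deleting a snapshot, first clear every snapshot that lun snap usage reports for it, recursing whenever one of those is itself blocked. The Python sketch below computes such a deletion order from a hand-built dictionary. The usage dictionary and the snapshot names are hypothetical stand-ins for what you would record by running lun snap usage on each blocked snapshot; the code does not talk to the storage system.

```python
def deletion_order(target: str, usage: dict[str, list[str]]) -> list[str]:
    """Return snapshots in the order they could be deleted, ending with target.

    usage maps a blocked snapshot to the snapshots lun snap usage listed for it,
    top of the list first.
    """
    order: list[str] = []
    done: set[str] = set()
    in_progress: set[str] = set()

    def resolve(snap: str) -> None:
        if snap in done:
            return  # already scheduled via another blocked snapshot
        if snap in in_progress:
            raise ValueError(f"circular dependency involving {snap}")
        in_progress.add(snap)
        for dep in usage.get(snap, []):  # work from the top of the list down
            resolve(dep)
        in_progress.discard(snap)
        done.add(snap)
        order.append(snap)  # every dependency of snap is already in order

    resolve(target)
    return order


# Hypothetical usage data recorded by hand.
usage = {
    "SSSV_Nightly.5": ["SSSV_Nightly.3", "SSSV_Nightly.4"],
    "SSSV_Nightly.3": ["SSSV_Nightly.1"],
}
print(deletion_order("SSSV_Nightly.5", usage))
# ['SSSV_Nightly.1', 'SSSV_Nightly.3', 'SSSV_Nightly.4', 'SSSV_Nightly.5']
```

The done set avoids scheduling a snapshot twice when two blocked snapshots share a dependency, and the in_progress set reports a cycle instead of recursing forever.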

In extreme situations, a large number of snapshot/LUN dependencies may prevent you from easily cleaning up your storage. In these cases, it may be easier for the administrator to move the backup to a new job name and volume. Please call Catalogic Software Technical Support if you need further assistance.

The standard snapshot naming convention for Snapvault backups is:

SSSV_JobName.n

Where JobName is the name of the backup job as saved in DPX, and n is an integer that starts at 0 and increases each time you take a snapshot backup. Integer 0 is always the most recent backup, with larger numbers representing progressively older backup images. When you use the lun snap usage command, you will see that snapshot dependencies start with smaller numbers (more recent backups) and increase to larger numbers (older backups).
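Because the index at the end of a name must be compared as an integer (SSSV_Nightly.10 is older than SSSV_Nightly.9, although it sorts earlier as text), a small parser helps when picking the oldest snapshot to start from. The Python sketch below assumes names follow the SSSV_JobName.n convention exactly; the job name Nightly and the sample names are hypothetical.

```python
import re
from typing import Optional

# SSSV_<JobName>.<n>; the greedy job group still allows dots inside the job name.
SSSV_PATTERN = re.compile(r"^SSSV_(?P<job>.+)\.(?P<n>\d+)$")


def parse_sssv(name: str) -> Optional[tuple[str, int]]:
    """Split 'SSSV_JobName.n' into ('JobName', n), or return None."""
    match = SSSV_PATTERN.match(name)
    if match is None:
        return None
    return match.group("job"), int(match.group("n"))


def oldest_first(names: list[str], job: str) -> list[str]:
    """Return this job's SSSV snapshots, largest n (oldest backup) first."""
    indexed = []
    for name in names:
        parsed = parse_sssv(name)
        if parsed is not None and parsed[0] == job:
            indexed.append((parsed[1], name))
    indexed.sort(reverse=True)  # compare n numerically, not as text
    return [name for _, name in indexed]


names = ["SSSV_Nightly.0", "SSSV_Nightly.10", "SSSV_Nightly.9", "hourly.0"]
print(oldest_first(names, "Nightly"))
# ['SSSV_Nightly.10', 'SSSV_Nightly.9', 'SSSV_Nightly.0']
```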

You might also see a dependency on a snapshot with a name similar to:

r200(0050405016)_VolName-base.22

If you look at your snapshots through OnCommand, FilerView, or the ONTAP command line, you will see that this snapshot is "busy" in a state called snapvault, and that you cannot delete it. This snapshot is part of the Snapvault backup relationship, and it is normal for it to be "busy" in the snapvault state. If you try to delete this snapshot, you get the error: <snapshot> is busy because of snapmirror. To remove this special snapshot dependency, do the following:

  1. Unmap or destroy all LUNs on the affected volume.
  2. Run another snapvault backup.

With all the LUNs unmapped, the backup creates a new snapvault image, and the LUN dependency is removed from the chain of dependent busy snapshots. At this point, the integer at the end of your snapshot names has probably incremented again, so re-verify which snapshots need to be corrected and continue removing dependent snapshots along the chain until the original stuck snapshot can be removed.
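When reviewing a snapshot list, it helps to set the Snapvault base snapshot aside, since it cannot be deleted directly and only clears after the LUNs are unmapped and a new backup runs. The Python sketch below classifies snapshot names. The base-snapshot pattern is inferred solely from the single example given earlier, r200(0050405016)_VolName-base.22, and is an assumption rather than a documented NetApp naming rule.

```python
import re

# Assumed, inferred from one example: <system>(<digits>)_<volume>-base.<n>
BASE_PATTERN = re.compile(r"^[^()\s]+\(\d+\)_.+-base\.\d+$")
# Snapvault backup snapshots created by DPX: SSSV_<JobName>.<n>
SSSV_PATTERN = re.compile(r"^SSSV_.+\.\d+$")


def classify(name: str) -> str:
    """Label a snapshot name as 'snapvault-base', 'sssv', or 'other'."""
    if BASE_PATTERN.match(name):
        # Busy by design; clear it by unmapping LUNs and re-running the backup.
        return "snapvault-base"
    if SSSV_PATTERN.match(name):
        return "sssv"
    return "other"


for snap in ["r200(0050405016)_VolName-base.22", "SSSV_Nightly.4", "hourly.0"]:
    print(f"{snap}: {classify(snap)}")
```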