Cannot Delete Storage System Snapshot: Understanding the LUN-Busy Snapshot Condition

Information

 
Summary
Symptoms

A NetApp storage system has a snapshot that cannot be deleted. I'm trying to delete this snapshot either because I need to free space on my storage system, or because my DPX condense is failing with RC=16 and an 'unable to delete snapshot' message. The error message I get refers to the snapshot being LUN busy, or busy with a LUN clone operation. I have already destroyed all system LUNs but the issue persists.

Resolution

The LUN-Busy condition is not a bug in DPX. LUN-Busy is the expected behavior of a NetApp storage system with regard to iSCSI LUNs and preserving point-in-time snapshot state. iSCSI mapping to snapshot data is a feature provided as a data recovery resource, and is not intended to support long-lasting enterprise iSCSI SAN needs.

Short answer:

This LUN-Busy condition with a snapshot exists when a logical unit (LUN) is created on the storage system to use Instant Availability (IA) to iSCSI map a drive. The LUN-Busy snapshot becomes undeletable if the iSCSI drive is mapped and busy and then another snapshot is taken of the volume.

All snapshots that are involved with the undeletable LUN-Busy snapshot must be deleted before the busy snapshot can be removed.

 

Explanation:

The IA feature of DPX is supported through an iSCSI option that is a licensed component of your NetApp storage system.

A snapshot on a storage system is an immutable read-only resource representing a point-in-time file system view. When an IA map is initiated and backed by data contained in a snapshot, the storage system creates a separate set of data structures that keep track of change (write) activities. This data structure lives on the volume with the read-only snapshot data, and together these are used to present to the iSCSI initiator host a virtual read/write block device (the IA map drive data from the snapshot/backup requested). Usually when the IA map is disconnected and the LUN destroyed on the storage system, these writable data structures are removed and the read-only data in the snapshot remains unchanged.

The issue of an undeletable snapshot stuck in the LUN-Busy state comes about when an IA map (iSCSI) is created on a storage system, some block-level change activity takes place on the iSCSI device, and then another snapshot on that volume is taken while the IA map is still connected. The new snapshot can be a manual snapshot taken through the ONTAP web GUI, a snapshot taken from the ONTAP command line interface, an ONTAP scheduled snapshot, or a snapshot that DPX triggers on the storage system volume when it completes a new Snapvault backup. This new snapshot preserves all known file system state at that point in time, including the status of snapshots made busy due to iSCSI transactions. The side effect of this is that each snapshot taken while the iSCSI device is mounted holds a reference to these LUN data structures. This condition persists even when all LUNs have been destroyed and removed from the storage system.

If you see that a snapshot is in the LUN-Busy state, you can find out what snapshots are involved with keeping it busy by using the following command from the ONTAP command line interface:

lun snap usage <volume> <snapshot>

Where <volume> is the name of the volume your backups are going to, and <snapshot> is the name of the snapshot in LUN-Busy state.

The lun snap usage command usually lists snapshots in the order in which they need to be deleted so that the snapshot defined in the <snapshot> parameter above can be removed. Start from the top and work your way down.
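
As a minimal sketch of working through that list, the Python snippet below turns a hand-copied, top-to-bottom list of snapshot names into ONTAP snap delete command lines that can be reviewed before anything is run. The volume name, the snapshot names, and the helper build_delete_commands are hypothetical and only illustrate the ordering. The snippet does not contact the storage system or parse real command output.

```python
def build_delete_commands(volume, snapshots_in_listed_order, busy_snapshot):
    """Return 'snap delete' command lines: dependents first, busy snapshot last."""
    commands = []
    for name in snapshots_in_listed_order:
        # Skip the busy snapshot if it appears in the list; it must go last.
        if name != busy_snapshot:
            commands.append(f"snap delete {volume} {name}")
    commands.append(f"snap delete {volume} {busy_snapshot}")
    return commands


# Hypothetical example values, copied by hand from 'lun snap usage' output.
example_commands = build_delete_commands(
    "backupvol",
    ["SSSV_JobA.0", "SSSV_JobA.1"],
    "SSSV_JobA.2",
)
for line in example_commands:
    print(line)
```

Printing the commands instead of executing them keeps the administrator in control, which matters because each deleted snapshot is a backup image.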

If a LUN currently exists which is causing your snapshot to be busy, the lun snap usage command will also display this for you under the title of Active. However, it will not automatically display all the LUNs in all of the involved snapshots if more than one active LUN exists. To see the LUNs that are active on your storage system, use the following command:

lun show

This lists all of the LUNs active on your filer. Look for all of the LUNs that are defined and active for your affected volume, and destroy these LUNs using the BEX GUI, or the 'lun destroy' function from the ONTAP FilerView application or the ONTAP command line interface.

The lun snap usage command will not necessarily display all of the snapshot LUN dependencies when multiple overlapping iSCSI LUNs have been created between Snapvault backups. When multiple overlapping LUN dependencies exist, start by looking at the oldest snapshot first and resolve its dependencies by deleting snapshots in the order displayed by the lun snap usage command. If you encounter another snapshot that cannot be deleted due to 'LUN clone' or 'LUN-Busy', use the lun snap usage command on this snapshot and resolve its dependencies before moving forward.
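
The nested procedure above amounts to a depth-first walk: before a snapshot can be deleted, everything that blocks it must be deleted first. The following Python sketch illustrates that ordering under stated assumptions. The blockers dictionary is a hypothetical, hand-built record of what lun snap usage listed for each busy snapshot, and the function deletion_order is illustrative only. It is not part of DPX or ONTAP.

```python
def deletion_order(target, blockers):
    """Return snapshots in a safe deletion order, ending with target.

    blockers maps a snapshot name to the snapshots that 'lun snap usage'
    listed for it, in the order they were listed.
    """
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled for deletion
        if name in visiting:
            raise ValueError(f"circular dependency at {name}")
        visiting.add(name)
        # Resolve every blocker of this snapshot before the snapshot itself.
        for dep in blockers.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(target)
    return order


# Hypothetical dependency data for a volume with overlapping IA maps.
example_blockers = {
    "SSSV_JobA.3": ["SSSV_JobA.1", "SSSV_JobA.2"],
    "SSSV_JobA.2": ["SSSV_JobA.0"],
}
example_order = deletion_order("SSSV_JobA.3", example_blockers)
print(example_order)
# ['SSSV_JobA.1', 'SSSV_JobA.0', 'SSSV_JobA.2', 'SSSV_JobA.3']
```

In this made-up example, SSSV_JobA.2 cannot be deleted until SSSV_JobA.0 is gone, so the walk schedules SSSV_JobA.0 ahead of it even though it was not in the first listing.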

In extreme situations, you may have a large number of snapshots/LUN dependencies that prevent you from easily cleaning up your storage. In these extreme cases it might be easier on the administrator to consider moving the backup to a new job name and volume. Please call Catalogic Software Technical support if you need further assistance.

The standard snapshot naming convention for Snapvault backups is:

SSSV_JobName.n

Where JobName is the name of the backup job as saved in DPX, and n is an integer that starts from 0 and increases each time you take a snapshot backup. Integer 0 is always the most recent backup with increasing numbers representing progressively older backup images. When you use the lun snap usage command, you will see that snapshot dependencies start with smaller numbers (more recent backups) and increase to larger numbers (older backups).
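
To make the convention concrete, here is a small Python sketch that splits a name of the form SSSV_JobName.n into its job name and integer, and sorts names from most recent (n = 0) to oldest. The regular expression and the helper names are assumptions made for illustration, and the sample names are invented.

```python
import re

# Pattern for the SSSV_JobName.n convention described above.
SNAPVAULT_NAME = re.compile(r"^SSSV_(?P<job>.+)\.(?P<n>\d+)$")


def parse_snapvault_name(name):
    """Return (job_name, n) for a matching name, or None otherwise."""
    match = SNAPVAULT_NAME.match(name)
    if match is None:
        return None
    return match.group("job"), int(match.group("n"))


def newest_first(names):
    """Keep only matching names and sort them so n=0 (most recent) is first."""
    matching = [n for n in names if parse_snapvault_name(n) is not None]
    return sorted(matching, key=lambda n: parse_snapvault_name(n)[1])


# Invented sample names; the last one does not follow the convention.
sample = ["SSSV_Nightly.10", "SSSV_Nightly.2", "SSSV_Nightly.0", "other-base.22"]
print(newest_first(sample))
# ['SSSV_Nightly.0', 'SSSV_Nightly.2', 'SSSV_Nightly.10']
```

Sorting on the parsed integer rather than on the raw string avoids placing .10 ahead of .2.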

You might also see a dependency on a snapshot that has a name similar to:

r200(0050405016)_VolName-base.22

If you look at your snapshots through OnCommand, FilerView, or the ONTAP command line, you will see that this snapshot is "busy" in a state called snapvault, and you will find that you cannot delete this snapshot. This snapshot is part of the Snapvault backup relationship and it is normal for it to be "busy" in the snapvault state. If you try to delete this snapshot, you will get an error: <snapshot> is busy because of snapmirror. In order to remove this special snapshot dependency, do the following:

  1. Unmap or destroy all LUNs on the affected volume.
  2. Run another snapvault backup.

With all the LUNs unmapped, the backup will create a new snapvault image, and the LUN dependency will be removed from the chain of dependent busy snapshots. At this point, the integer at the end of your snapshot name has probably incremented again, so you need to re-verify which snapshots need to be corrected and continue removing dependent snapshots along the chain until the original stuck snapshot can be removed.

Additional Information

If you are snapvaulting multiple hosts to a single volume using different job names for each host or group of hosts, then each backup to that volume will trigger a snapshot. Each volume snapshot taken while an iSCSI drive is mapped will be involved in saving LUN data structures, and will get involved with keeping the original snapshot in the undeletable state.

Once the iSCSI drive is unmounted, subsequent volume snapshots will not be involved with the iSCSI resource that ties up the undeletable snapshot(s).

Snapshots and LUNs are specific to volumes. Thus if you have a stuck snapshot and LUN-Busy condition in one volume, it will not affect other volumes. Even if you have a single large aggregate with several flex-volumes configured, a snapshot in one flex-volume will not affect other flex-volumes.

DPX attempts to automatically remove snapshots from your filer as backup jobs expire and condense out of the catalog. When the condense finds a snapshot that it cannot delete, it will continue processing as far as it can, and then fail the condense with error code RC=16. This specific RC=16 condition with condense is not serious, but you should review your condense job log to determine whether other errors have also occurred.

To avoid the condition of snapshots becoming un-removable, Catalogic Software recommends the following:

  1. Avoid taking snapshots and/or performing OSSV snapvault backups to a volume if IA maps within the volume are active or the volume has snapshots in LUN-Busy state that need correction. Use the lun snap usage command to investigate the state of individual snapshots.
  2. Review your NetApp storage snapshots frequently to identify any volumes that may have LUN-Busy issues.
  3. Correct all LUN-Busy conditions as soon as possible in order to avoid extra maintenance and non-reclaimable space in the future.

The only way to correct this condition is to remove all snapshots that depend on the LUN-Busy snapshot, and then remove the busy snapshot. Since snapshots represent point-in-time backup data, deleting your backups may not be a viable option.

Overall, the issue can be considered benign but will require a little effort to clean up over time. If you can afford to delete all the snapshots involved in LUN-Busy, then the easiest thing to do is to delete these snapshots manually, expire the related jobs from the catalog, run a Condense, and the RC=16 condition will go away.

In the event that you cannot afford to delete backups, please call Catalogic Software Technical support for help. All the involved snapshots will have to remain on the volume until it is appropriate to remove them. This could affect disk usage and require close monitoring for volume space issues. A Catalogic Software support engineer can help step you through the process of manually cleaning up your Master server so that Condense errors and warnings are skipped. After the time that all affected snapshots would have normally been removed through the Condense, you can remove them manually to reclaim the space.

The article "How to Free Filer Space after LUN-Busy Condition" provides additional information on how to free filer space after a LUN-Busy condition.

 
Article Number: 000003774
Article Type: TSN
Created Date: 11/16/2016 8:58 PM
