Summary



This article describes various methods and considerations for NDMP backup of Network Appliance Filers.

Resolution



Prerequisites

This article summarizes, and adds detail to, the topic of DPX NDMP backup using the 'smtape' and 'dump' protocols. For reference, please familiarize yourself with the following KB articles, which are cited throughout this document:

  • How to Protect SnapVault Data on Tertiary Storage and Recover the Data
  • Restoring from an NDMP Tape Dump for ExpressDR Use
  • Restore Attempts From SnapMirror Secondary Fail
  • Cannot Delete Filer Snapshot: Understanding the LUN-Busy Snapshot Condition
  • Recommendations for Performing New BEX Advanced Recovery Base Backup Jobs

Introduction

Backup of a Network Appliance filer can take one of two forms: 'dump' and 'smtape'. Which one you choose depends on your archiving and recovery objectives and may be influenced by available tape and on-line disk storage resources.

This article highlights points made in the prerequisite articles and adds details that may be important in deciding which backup method ('dump' or 'smtape') is more appropriate for your site.

 

Network Appliance Filer Backup Objectives

The primary difference between 'dump' and 'smtape' type backups is that 'dump' is essentially a file system backup of the volume, and 'smtape' is a block-level image of the volume.

The 'dump' protocol is generally used when backup of specific (but not all) files, folders, or QTrees are desired and selective file restore is a priority.

The 'smtape' protocol is generally used when a full volume restore is necessary for disaster recovery or for seeding a secondary remote filer for subsequent SnapMirror synchronization.

Within the DPX Backup->NDMP GUI screen, the choice of 'smtape' or 'dump' is significant. With 'dump', you may drill down and back up individual folders within a volume, or the entire volume (all files and folders). With 'smtape', it is only valid to back up the whole volume.

By default, file history processing is enabled for NDMP 'dump' backups. If you turn file history processing off, the DPX Restore->NDMP screen will not allow you to drill down and restore individual files; the entire backup will need to be restored.

Since 'smtape' is a full binary block-level backup of a volume, file history processing does not apply. Because no file history is saved and individual file restore is not supported, you will not be able to bring up the Restore->NDMP screen and drill down into the backup; you will need to restore the entire volume.

 

Backing Up a Network Appliance Filer

The use of 'dump' or 'smtape' for backup of volumes will depend on your overall backup and recovery needs and may be influenced by how the filer has been configured (individual or shared volumes).

In older versions of ONTAP (v6 and older), configuration of aggregates and volumes was more rigid, which led many customers to create one volume per aggregate and send all NAS operations and OSSV backups to large "shared use" volumes.

As of this writing, ONTAP v7 is much more flexible. Catalogic Software recommends configuring your filer with flex-volumes utilizing "no space reservation" in order to promote better organization and easier space management. The general recommendation is to configure individual flex-volumes for specific purposes: create separate volumes for NAS operations and individual volumes per OSSV backup job.
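For illustration, a dedicated flex-volume with no space reservation can be created from the ONTAP 7-mode console; the volume name 'ossv_vol1', aggregate 'aggr0', and size below are hypothetical placeholders:

    vol create ossv_vol1 aggr0 500g
    vol options ossv_vol1 guarantee none
    vol status -v ossv_vol1

The 'guarantee none' option disables space reservation, so the volume consumes aggregate space only as data is written to it.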

If you have single large shared-use volumes, you may want to consider using NDMP 'dump' style backups, and breaking up your backup needs into individual jobs. These jobs can be organized by function (OSSV & NAS), department, grouped by size, etc. Breaking up a large shared-use volume into multiple jobs may ease restoration needs; OSSV server backup and NAS folders can be restored individually on an as-needed basis. Allowing for smaller ad-hoc restore operations may also help the site administrator manage disk resources when space is an issue.

If your organization is under any sort of long-term auditing, compliance/governance, or document retention rules (such as regulated industries or legal discovery), then you may want to consider the differences between 'dump' and 'smtape' and choose the style and backup frequency that meets your organization's needs. Since 'dump' cannot take incremental backups of OSSV QTrees, it is not tape space efficient to archive multiple snapshots in a job; you may find the 'smtape' style more tape efficient, since all existing snapshots are retained in each 'smtape' backup.

When a backup of a filer volume is attempted with either method, a snapshot is used as a read-only source for performing the backup. For volume-level backups, a special snapshot is created for the backup. The snapshot will be named something similar to:

snapshot_for_backup.2 (for dump backup of volume or subdirectory within root of the volume)

snapmirror_tape_12_28_06_16:15:56 (for smtape backup of volume)

 

For 'dump' backups of specific snapshots, the requested snapshot will be read directly and no temporary snapshot is created.

In DPX, the 'smtape' temporary snapshots are not automatically deleted. If you want them to be automatically cleaned up, put the following option string into the "Additional NDMP Environment" Source Options field:

SMTAPE_DELETE_SNAPSHOT=Y

And then save the job. See below for additional details.

In either case (dump or smtape), if you see that these temporary snapshots are left behind and no backups of the volume are currently active, you may safely delete them.
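Leftover temporary snapshots can be listed and removed from the ONTAP console. A minimal sketch, assuming a volume named 'vol1' and the example snapshot names shown above:

    snap list vol1
    snap delete vol1 snapshot_for_backup.2
    snap delete vol1 snapmirror_tape_12_28_06_16:15:56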

A backup job containing two or more filers of mixed type will run as expected; 'smtape' and 'dump' entities will run their respective jobs/tasks as defined in the enterprise.

'Dump' backups do support Base, Differential, and Incremental backup options; files that have changed are backed up in their entirety. 'Smtape' backups ignore these options and will always take a full volume backup.

 

Considering the Difference Between smtape and dump

The primary difference between these two backup methods is that 'dump' is a file-system-level backup and 'smtape' is a full-volume block-level backup. Depending on your backup and restore objectives, the choice of 'dump' or 'smtape' may significantly affect needed resources and the ease and availability of restores.

 

Tape Resources per Job

'Dump' and 'smtape' backup schemes can use significantly different amounts of tape resources, depending on whether you are attempting a full or partial volume backup, the size of your volume, the rate of change, and the number of retained snapshots.

A 'dump' of a folder or a specific QTree will be smaller than the total data contained on the entire volume. This is especially true where volume use is shared between NAS usage and one or more OSSV entities and only specific resources need to be sent to tape.

A 'dump' of a volume will only include those files that are accessible from the top-level folder; only the current state of existing folders and QTrees will be retained. Anything under the ".snapshot" directory is skipped, so specific point-in-time restores prior to the backup date will not be possible. The resulting backup can be significantly smaller than all the data retained in the entire volume (depending on the rate of change and the number of snapshots retained).

Skipping the ".snapshot" directory makes sense because a snapshot is a reference to all the data that existed on your filer at a specific date and time, and multiple snapshots represent an enormous amount of data from a file-system perspective. The Network Appliance filer has drivers that avoid duplicating data blocks, so at the hardware level even enormous data stores grow only by the number of modified blocks added to the system. From a file-system perspective, however, the same file in different snapshots is a different file. Attempting to back up all snapshot data in a 'dump' style job could therefore consume an enormous amount of tape (roughly the size of the current data multiplied by the number of available snapshots) due to the backup of redundant data blocks.

An 'smtape' backup of a volume will contain all of the blocks within that volume. This block-level backup will include all files, folders, QTrees, and their associated snapshots. The number of tapes that will be consumed can be estimated from the "Used Capacity" metric provided by the ONTAP FilerView web application or from volume metrics pulled from the command line.
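For example, the used capacity of a volume can be pulled from the ONTAP console with 'df'; the volume name 'vol1' is a placeholder:

    df -h /vol/vol1

The combined 'used' figures reported for the volume and its '.snapshot' line give a rough estimate of how much data an 'smtape' backup will write to tape.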

 

Tape Resources Overall

The number of overall tapes consumed by a job over time will depend on the frequency of your backups and your restore accessibility needs. Accessibility needs will depend on your requirements for point-in-time data access and will be affected by your DPX job retention period and/or automated snapshot creation and deletion procedures.

Since 'dump' backups represent only the current point in time, you will need to run a 'dump' backup as frequently as your local policy requires you to provide restore data. For example, if your requirement is to provide daily point-in-time restores, a daily 'dump' will be required. The more time allowed between recovery points, the less frequent your 'dump' backups need to be (e.g., weekly, monthly, quarterly).

'Smtape' backups represent all available data (including snapshots) within the volume at a given time. For long-term archiving, this style of backup only requires you to run the job as often as you can expect old data to be removed from the Filer (either automatically or via DPX condense). So for example, if your requirement is to archive daily point-in-time system images and you condense your OSSV jobs after 30 days, you will only need to 'smtape' backup your volume once per month.

In some circumstances it might be desirable to mix these styles of backup to achieve both disaster recovery and long-term archiving objectives. This will depend entirely on your Recovery Time Objective (which governs how fast the data must be recovered), your Recovery Point Objective (how much data the company can afford to lose), and your long-term document/data retention policies. See the section below on how to enable both types of backup operations.

 

Accessibility of Restore Data

The data you back up from your filer will be accessible as long as the backup job has not been condensed out of your DPX catalog; 'dump' and 'smtape' are equivalent in this respect.

The ease of access to OSSV snapvault data will depend on the type of backup that was performed and on whether the OSSV job has condensed out of the catalog.

Since a 'dump' of OSSV data does not retain the snapvault relationship data, you will need to restore the QTree data, perform a manual snapshot, and then iSCSI map the snapshot to an iSCSI target. This iSCSI map procedure is detailed in KB How to Protect SnapVault Data on Tertiary Storage and Recover the Data; a summary appears below.

Restoring a whole system, or restoring whole drives (via either OSSV or ExpressDR), may introduce OSSV Change Journal inconsistencies that require running a new base backup of the newly restored resources. Please see the section below, "OSSV and Change Journal Considerations".

If an 'smtape' backup of a volume was taken and the OSSV job has not expired from the catalog, then it is possible to restore the volume to the same or a different filer and utilize the 'alternate secondary' feature in Snapvault restore to either restore specific files or IA map the whole drive to an iSCSI target machine.

It is somewhat simpler to coordinate ExpressDR system restores from 'smtape' data. Since 'smtape' is a full volume restore, ExpressDR will see all the available snapshots from that volume as soon as the NDMP restore operation is completed. 'Dump' data can also be used to ExpressDR restore clients, but a few extra steps are needed, as documented in KB Restoring from an NDMP Tape Dump for ExpressDR Use.

If an 'smtape' backup of a volume was taken and the OSSV job has expired from the catalog, then the operator will need to iSCSI map the resource manually, similar to the details in KB How to Protect SnapVault Data on Tertiary Storage and Recover the Data.

Please note the following about restoring 'smtape' volumes:

1) If the filer used for the restore is the original filer, the volume you restore to should have a different name than the original volume; otherwise the 'alternate secondary' feature will not recognize it as a secondary resource. If it is necessary to restore to the original volume name on the original filer, the following may help:

a. Temporarily rename the volume so that 'alternate secondary' will work (see the sketch after this list).

b. Create a new server name for an alternate IP on your filer.

2) An 'smtape' restore will require enough free space to recreate the entire volume.

3) Before attempting any OSSV restore operations (OSSV, IA map, or ExpressDR) from 'smtape' data, you must break the SnapMirror relationship via the 'snapmirror break <vol>' command, as documented in KB Restore Attempts From SnapMirror Secondary Fail.

4) Restore of 'smtape' data to original filer and original volume name may introduce OSSV Change Journal inconsistencies requiring the run of a new base backup. Please see notes below regarding 'smtape' restore and OSSV Change Journal inconsistencies.
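As a sketch of points 1a and 3 above, both the temporary rename and the SnapMirror break are single commands on the ONTAP console (substitute your own volume names for the placeholders):

    vol rename <original_volume> <temporary_name>
    snapmirror break <restored_volume>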

 

Restore Considerations

Free space requirements within a Filer volume or aggregate will depend on the type and amount of backup data.

An 'smtape' restore will require enough free space in an aggregate to recreate the entire volume.

A 'dump' restore will require enough free space within a volume to restore the desired folders or QTrees. You may either create a new folder within an existing volume that has enough space to hold the restore data, or create a new (temporary) flex-volume on an aggregate with available space.
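For example, free aggregate space can be checked, and a temporary flex-volume created, from the ONTAP console; the volume name 'tmp_restore', aggregate 'aggr0', and size below are hypothetical:

    df -A
    vol create tmp_restore aggr0 200g
    vol options tmp_restore guarantee none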

The restore destination filer does not need to be in the enterprise at the time of your backup; when a restore is required, simply add it into your enterprise. If you do not have licensing for an additional NDMP resource, you will need to request a temporary or emergency key to accomplish this task.

Restore is not dependent on the enterprise NDMP type, so you can restore an 'smtape' backup when the enterprise is set to 'dump', and vice versa.

 

Mixing smtape and dump Backups

A site may find it useful to consider a backup strategy that includes both 'dump' and 'smtape' NDMP backup methods.

Currently, the DPX Configure->Enterprise utility only allows you to define one default backup type, at the bottom of the configuration screen. Thus, out of the box, you cannot run mixed 'dump' and 'smtape' jobs without some kind of intervention.

There are three possible ways to support environments requiring mixed 'dump' and 'smtape' backups:

1) If you have a spare NDMP node license and an extra IP address on your NetApp Filer that is not already defined in your DPX enterprise, then you can create NDMP entities specific to 'dump' and 'smtape' jobs. The IP address can either be a physical controller address, or a virtual IP address.

2) If an alternate backup method is rarely needed, you can manually switch your existing NDMP (Filer) node from 'smtape' to 'dump' and back again as needed. It is better to use an alternate job name for this infrequent backup so that the restore type is easier to identify. However, if your job is defined to back up everything at the volume level, you can run the job as either 'dump' or 'smtape' and DPX will keep track of the differences. When you need to restore, 'dump' style backups will allow you to drill down and choose specific files/folders, whereas 'smtape' backups will not allow drill-down.

3) It is possible to execute 'syncui' scripts in the pre-script and post-script areas of the job definition to alter your NDMP entity to the desired backup type. Note, however, that it is not possible to run 'dump' and 'smtape' backups in parallel; they must be run sequentially. If you need help with this, please contact Catalogic technical support.

 

Restoring 'smtape' Volumes

There are a few things to prepare when restoring an 'smtape' backup of a volume to a Filer.

In order to restore a 'smtape' backup of a volume, you must first create a volume on an available Filer that has enough free space to hold the entire restored volume. If you create a volume that is too small, your restore job will fail with a message similar to:

SMTAPE: Aborting: Destination volume, TestVol, is too small

If you cannot remember exactly how large the volume was, you may create a flex-volume larger than needed and configure it with "no space reservation". After a successful restore, the volume will only use the amount of space that was actually backed up. You may not be able to reduce the overall volume size later, but the volume will not take up additional space unless you use it to store additional data.

DPX recommends naming the restore volume something different from the original volume name. It is possible to restore to a volume with the same name, but this can lead to confusion and operator errors. Using the same volume name on the original Filer will not permit you to use the 'alternate secondary' option for OSSV restore or IA map from the DPX GUI; in that case, you may need to rename the volume to something different. Please see the notes below regarding OSSV and Change Journal considerations after an 'smtape' restore.

The volume must be placed in 'restricted' or 'offline' status prior to the 'smtape' restore, otherwise the restore job will fail. This is intended behavior, meant to help prevent accidental destruction of production on-line volumes. When the restore is done, the volume will be brought back on-line automatically. It is recommended that you use the 'restricted' volume mode, which allows browsing and choosing the volume directly from the BEX GUI. The BEX GUI cannot interrogate volumes in 'offline' mode, so if you must use 'offline' mode, set up your restore job before taking the volume offline, and then take the volume offline just prior to actually running the restore job.

Once the volume is created and placed into 'restricted' or 'offline' status, you may proceed with the 'smtape' restore operation.
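A minimal preparation sketch from the ONTAP console, assuming a hypothetical restore volume 'smt_restore' on aggregate 'aggr1':

    vol create smt_restore aggr1 800g
    vol options smt_restore guarantee none
    vol restrict smt_restore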

Once your restore operation is completed, the restored volume will be in the "Snapmirrored" state. If you intend to use OSSV restore, IA map, or ExpressDR to access data on this volume, or if you intend to write new data to this volume, you will need to break the SnapMirror status. To do this, access your Filer's command console and use the "snapmirror break <volume>" command. For additional information, please refer to KB Restore Attempts From SnapMirror Secondary Fail.
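For example, assuming the hypothetical restore volume 'smt_restore' from the sketch above:

    vol status smt_restore
    snapmirror break smt_restore

The 'vol status' output should show the volume in the snapmirrored state before the break, and writable afterwards.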

In the event of a serious Filer outage such as the loss of multiple drives, accidental volume or aggregate destruction, shelf corruption, or catastrophic hardware loss/failure, the operator may want to take the opportunity to create new OSSV snapvault volumes and opt not to restore these 'smtape' volumes unless a Primary snapvault host restoration is required. Creating the volumes anew and allowing for a fresh base backup is recommended to:

1) Reduce operator time in restoring data that may expire from the DPX catalog.

2) Avoid possible inconsistencies between the primary's change journal and the secondary's data store.

Once the new volumes are created and OSSV backup job functions restarted, the operator can use the 'smtape' backups to restore data on an as-needed basis.

Please see notes below on OSSV and Change Journal considerations.

 

ExpressDR Restoration

At this time, ExpressDR Bare Metal Recovery is supported from the following resources:

  • The original secondary snapvault data source.
  • A SnapMirror of the original secondary data source (alternate secondary).
  • An 'smtape' restored volume from the original or SnapMirrored secondary snapvault data source.
  • A 'dump' restored dataset, restored following recommendations stated in KB Restoring from an NDMP Tape Dump for ExpressDR Use.

In order for ExpressDR to accomplish a Bare Metal Restore, you must also have the following:

  • Licensing for ExpressDR option within DPX.
  • Valid ExpressDR block backups of your servers that exist on the Secondary volume, a SnapMirror alternate secondary, or an NDMP backup that can be restored to an available Filer.
  • Latest ExpressDR boot CD.
  • An available machine to restore your server image to that conforms to the guidelines and recommendations outlined in the ExpressDR documentation.

Please refer to the ExpressDR documentation if you are restoring images from the original, unaltered location where your block backups were saved.

If you are attempting to restore an image from a SnapMirror alternate secondary location, you will need to break the SnapMirror relationship first as explained above and in KB Restore Attempts From SnapMirror Secondary Fail .

If you need to restore a server from a volume backed up via 'smtape', you will need to do the following:

  • Restore your 'smtape' volume as described above.
  • Make sure you break the SnapMirror relationship prior to attempting the ExpressDR restore.
  • Boot ExpressDR. When prompted, enter the IP address of the Filer that contains the restored 'smtape' volume you need to access, and the name of the volume it was restored to. From this point, the ExpressDR restore proceeds as described in the ExpressDR documentation.

To avoid confusion and save space, the restored volume should be removed when it is no longer needed.

Please see the notes below regarding OSSV and Change Journal considerations after 'smtape' restore.

If you experience a disaster where licensing, job definition mistakes, or a lack of available prerequisite resources prevents a critical restore operation, please contact Catalogic Technical Support or Catalogic Professional Services for assistance.

 

Manual IA Map Procedures

If the restored volume was from an 'smtape' backup and the OSSV backup job you need has not expired and condensed from the catalog, then you can simply use the 'alternate secondary' feature from the GUI to IA map the resource to an available iSCSI target machine.

A manual IA map of block data is required if your restored volume was backed up via 'dump', or if your volume was restored via 'smtape' and the original OSSV backup job has expired and condensed out of the DPX catalog.

Please refer to KB How to Protect SnapVault Data on Tertiary Storage and Recover the Data for step-by-step screen shots on how data recovered from a 'dump' style restore can be manually iSCSI mapped to any available iSCSI initiator.

The important things to remember for any manual IA map operation are the following:

  • You need to know exactly which BEXIMAGE.RAW file on your Filer you want to act as the source for the iSCSI target.
  • BEXIMAGE.RAW must be contained within a snapshot; this can either be a manual snapshot on a 'dump' restored volume (see the sketch after this list), or an existing snapshot from the .snapshot directory within your 'smtape' restored volume.
  • If your volume was restored via 'smtape' you must make sure that the SnapMirror relationship for the volume is broken.
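For a 'dump' restored volume, the manual snapshot is a single command on the ONTAP console; the volume and snapshot names below are hypothetical placeholders:

    snap create dump_restore_vol manual_map_snap

The BEXIMAGE.RAW file is then reachable under /vol/dump_restore_vol/.snapshot/manual_map_snap/ for use as the backing file in the 'lun create' command shown below.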

Once you have restored your data and know exactly which BEXIMAGE.RAW file you need, proceed as follows (per KB How to Protect SnapVault Data on Tertiary Storage and Recover the Data):

  • Log into the ONTAP command line console as root.
  • The following sets up the LUN resource:
    lun create -b <path_to_snapshot_beximage.raw> -o noreserve <filename_for_lun_data>
  • The following associates the target machine iSCSI initiator with a logical group name:
    igroup create -i -t windows <initiator_group_name> <initiator_node_name>
  • The following command maps the LUN resource to all members of a group:
    lun map <filename_for_lun_data> <initiator_group_name>
  • From the initiator host, simply log onto the Filer that holds your image, and Windows will attach the drive. You may need to go into the Windows disk administrator and assign the new drive a Windows drive letter.
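Putting the three commands together, a hedged end-to-end example; the snapshot path, LUN path, group name, and initiator node name below are all hypothetical placeholders:

    lun create -b /vol/smt_restore/.snapshot/nightly.0/serverA_qtree/BEXIMAGE.RAW -o noreserve /vol/smt_restore/serverA_lun
    igroup create -i -t windows restore_grp iqn.1991-05.com.microsoft:targethost
    lun map /vol/smt_restore/serverA_lun restore_grp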

When you are finished using the IA map:

  • Log into the ONTAP command line console as root.
  • The following will get rid of the LUN resource and associated data files:
    lun offline <filename_for_lun_data>
    lun destroy <filename_for_lun_data>
  • On your initiator host, log out of the filer so that Windows is forced to refresh its drive mappings and properly disassociate from the iSCSI resource.

As a final note, please read KB Cannot Delete Filer Snapshot: Understanding the LUN-Busy Snapshot Condition for important information regarding the use of iSCSI map, OSSV backups, and how subsequent snapshots can get stuck in a "LUN-Busy" state that prevents snapshot deletion and may cause DPX to fail on Snapvault condense.

 

OSSV and Change Journal Considerations

There is a very rigid relationship between the Snapvault Primary (your OSSV server machine) and the Snapvault Secondary (the NetApp Filer). The Filer houses a QTree containing the BEXIMAGE.RAW file, which represents a point-in-time backup of a hard drive on the Snapvault Primary server. In DPX, changes to the file system are tracked in a "Change Journal"; if you look at each drive on a Windows Snapvault host, you will see a hidden system directory named "Backup Express Change Journal".

Under normal circumstances, the BEXIMAGE.RAW QTree file represents the last good backup made from your server. The Primary's local Change Journal manages bitmaps that represent all the data that has changed since that last backup. Through this mechanism, DPX can take faster incremental backups. When a Snapvault backup is requested, the Change Journal is consulted to find out what data needs to be transferred, and then only those blocks of data are actually sent. Through the QTree and snapshot facilities on your Filer, these changed blocks are merged into the existing data to "recreate" the host's hard drive. The Filer manages efficient space organization to present complete point-in-time drive representations without duplicating redundant data blocks.

If the QTree data on the Filer and the Change Journal data on the Primary fall out of sync, a new OSSV "base" backup must be run. If a base is not run, subsequent incremental backups will result in corrupt QTree data, and each Filer snapshot taken after the corruption occurs will only compound it. The easiest way to correct this is to force a new OSSV base backup; it is recommended that you use the existing job name when running your base backup.

Therefore, it is generally recommended that you perform a new "base" OSSV backup after a complete system or drive restore. If you are absolutely sure that the data you restored was from the most recent backup of this system, then a change journal conflict will not exist.

This QTree and Change Journal inconsistency can occur after the following:

  • If you ExpressDR restore a server from a snapshot that is not from the last good backup. The QTree data will be newer than the Change Journal.
  • An 'smtape' restore of a volume that was backed up before newer OSSV backups ran. In this case, the restored QTrees will be older than the Change Journals that refer to them.
  • If you SnapMirror your volume to an alternate secondary, break the SnapMirror relationship, run new backups to the original Filer, and then later attempt to divert backups to the alternate secondary when it is out of sync with the original source.

If you believe that you need to force a new Base OSSV backup of a Primary server, Catalogic Software makes the following recommendations:

  • You can simply create a new job name and delete the old job name. The old job name should be discarded and never used again. Creating a new job name will create a new set of QTrees, and the QTrees from the old job will eventually have to be deleted manually.
  • You can manually clear out the Change Journals from the affected drives. Please see KB Recommendations for Performing New BEX Advanced Recovery Base Backup Jobs for how to accomplish this.