
Summary

In many environments, data is distributed between the network and the SAN. This article explains how to configure DPX for optimal performance in these heterogeneous Network/SAN environments.

 

Resolution

Background

The following figure outlines a simple environment of six network nodes, two device server nodes (D1 and D2), and two tape drives (T1 and T2); it is used to illustrate the concepts throughout this article. The nodes here are platform independent.

 

In most cases, it is optimal to distribute the load between these two device servers. The following figure shows an optimal versus an inefficient method for transferring data from network nodes to the SAN device servers. Which method is best depends on how well each of these servers can handle data transfer rates. The assumption here is that the servers are equally capable of handling network data transfer rates, but you can customize DPX to adjust for differences.

 

 

The difference between the optimal and inefficient configurations is clear: we would like to utilize both device servers equally when transferring data from the network to the SAN tape drives.

The primary bottleneck in many instances is network speed. With a single Gigabit network interface card (approximately 50 MB/s of usable throughput), a device server cannot keep more than 3 high-end tape drives (LTO, SuperDLT, AIT-3) busy, since these drives write at speeds of 15+ MB/s. This is a server-side network limitation. There are also common limitations on how fast client machines can send data to the device server. It is often necessary to back up multiple streams of data concurrently to each tape drive, because the hard disk drives on network nodes may only be sending ~5-10 MB/s due to disk fragmentation and/or slower network interface cards (such as 10BT or 100BT NICs). Backing up these nodes concurrently results in interleaving of data streams on the tape itself. Interleaving the data is managed by the device server and is somewhat CPU intensive, so distributing this CPU load across all of the device servers on the SAN is desirable; one node may not be able to handle all of the streams.

It is important to have a good feel for the factors involved here, since they will dictate how successful you are at optimally configuring DPX.
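The arithmetic above can be sanity-checked with a short back-of-the-envelope calculation. The sketch below is purely illustrative; the throughput figures are the approximate numbers quoted in this article, not measured values.

# Back-of-the-envelope check of the throughput figures quoted above.
# All numbers are approximations from this article, not measured values.

GIGABIT_NIC_MBS = 50       # usable throughput of a single Gigabit NIC, MB/s
TAPE_DRIVE_MBS = 15        # streaming rate of a high-end drive (LTO, SuperDLT, AIT-3)
CLIENT_STREAM_MBS = 7      # a typical network client sending ~5-10 MB/s

# How many high-end drives can one device server keep busy over the network?
drives_per_server = GIGABIT_NIC_MBS // TAPE_DRIVE_MBS
print(f"Drives one GbE device server can keep streaming: {drives_per_server}")  # 3

# How many concurrent client streams must be interleaved to keep one drive streaming?
streams_per_drive = -(-TAPE_DRIVE_MBS // CLIENT_STREAM_MBS)   # ceiling division
print(f"Client streams needed per drive: {streams_per_drive}")                  # 3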

 

Solution

When you add SAN paths to the DPX database, it is important to realize that the first device path you add takes preference over the second path you add, and so on. The following screenshots demonstrate how to configure both the optimal and the inefficient case above.

 

Initial condition :

Go through the following procedure to create two tape drives, T1 and T2, associated with the tape library.

 

Add tape drives to the jukebox (a multiple-step process), e.g.

 

You have created two tape drives, but you have not added any device paths. Configuration differences between the optimal and inefficient methods begin after this point. As you add device paths, remember that the first path you add to the database determines the preferred device server through which that tape drive will mount. In the optimal case, the first SAN device path that you add is from a different server for each tape drive.
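To make the ordering rule concrete, the following sketch models each tape drive's device paths as an ordered list, where index 0 is the path added first and therefore the preferred mount server. This is only an illustration of the concept, not DPX code; the node names simply follow the figure above.

# Illustration of the rule: for each drive, index 0 is the device path added
# first, and therefore the device server it will prefer to mount through.
# Hypothetical data; node names follow the figure above.

optimal = {
    "T1": ["D1", "D2"],   # D1 added first -> T1 prefers D1
    "T2": ["D2", "D1"],   # D2 added first -> T2 prefers D2
}

inefficient = {
    "T1": ["D1", "D2"],
    "T2": ["D1", "D2"],   # D1 added first again -> both drives prefer D1
}

def preferred_server(paths, drive):
    """Return the device server whose path was added first for this drive."""
    return paths[drive][0]

for label, paths in (("optimal", optimal), ("inefficient", inefficient)):
    print(label, {drive: preferred_server(paths, drive) for drive in paths})
# optimal {'T1': 'D1', 'T2': 'D2'}
# inefficient {'T1': 'D1', 'T2': 'D1'}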

 

First Tape Drive :

There is nothing unique about adding pathways to the first tape drive, but how you add pathways to the subsequent drives determines whether the solution is optimal or inefficient.

 

Optimal :

Notice here that D2 is the first device server pathway added to the database for the tape drive T2. This means that data will be routed through D2 preferentially for the tape drive T2. The same concept extends to more than two tape drives: simply add the preferred server as the first entry for each drive.

 

Inefficient :

Note the difference here in the node field: for both T1 (shown in the previous screenshot) and T2, data is routed preferentially through the node D1. The effect on device server load is sketched below.
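Continuing the hypothetical example above, the short sketch below counts how much network traffic each device server would carry under the two configurations, assuming every drive is being fed by network clients at its full streaming rate.

# Rough per-server load estimate, continuing the hypothetical example above.
# Assumes every drive is fed by network clients at its full streaming rate.

TAPE_DRIVE_MBS = 15

def network_load(paths):
    """MB/s of network traffic each device server absorbs for non-local backups."""
    load = {}
    for drive, servers in paths.items():
        preferred = servers[0]                       # first path added wins
        load[preferred] = load.get(preferred, 0) + TAPE_DRIVE_MBS
    return load

print(network_load({"T1": ["D1", "D2"], "T2": ["D2", "D1"]}))   # {'D1': 15, 'D2': 15}
print(network_load({"T1": ["D1", "D2"], "T2": ["D1", "D2"]}))   # {'D1': 30}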

 

Therefore, the first SAN device path that you add in the GUI for each tape drive is important, because that controller node will be used preferentially to transfer data coming from network nodes (non-local backups). The optimal solution for your environment may differ from what we have outlined. By defining the preferential path on your SAN device servers, you can customize your network backups to meet your environment's needs, e.g.

 

  • Compensate for network throughput differences on your SAN servers (see the sketch after this list)
  • Mitigate activity of high-usage servers on the SAN
  • Load-balance backup activity (optimal solution in this case)
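
If your device servers are not equally capable, the same ordering mechanism lets you bias traffic toward the faster or less busy server. The sketch below is one hypothetical way to derive a path order for each drive from an estimate of per-server network headroom; the MB/s figures are placeholders, not measurements.

# Hypothetical planning helper: derive the order in which to add device paths
# for each drive from an estimate of each device server's network headroom.
# The MB/s figures are placeholders; substitute your own observations.

server_headroom = {"D1": 50, "D2": 60}    # e.g. D2 has a little more spare bandwidth
drives = ["T1", "T2", "T3"]

def plan_path_order(drives, headroom, drive_rate=15):
    """Greedy plan: point each drive at the server with the most remaining headroom."""
    remaining = dict(headroom)
    plan = {}
    for drive in drives:
        order = sorted(remaining, key=remaining.get, reverse=True)
        plan[drive] = order                 # order[0] is the path to add first
        remaining[order[0]] -= drive_rate   # that server now carries this drive
    return plan

print(plan_path_order(drives, server_headroom))
# {'T1': ['D2', 'D1'], 'T2': ['D1', 'D2'], 'T3': ['D2', 'D1']}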

 

Additional Considerations :

First, do not change the default naming convention of these SAN paths. Doing so may be misleading, because the paths are sorted alphabetically in the GUI, and that sort order may not match the order in which you physically added them to the database.

How you define your backup jobs is also important. If DPX determines that a task can be accomplished locally, it will override any preference for SAN paths that you may have in the database and mount the drive local to the requesting device server. This determination is made for the first task that begins to write to the tape drive, and tasks are generated on a first-come, first-served basis after the list of files to be backed up has been built. For backup jobs with a single task, this is the desired result. However, the number of nodes in a backup job and the performance options that you choose (e.g. split by partition) can significantly increase the number of tasks. These additional tasks in the backup job will all be forced to use the same SAN device path as the first task that wrote to the drive, which may therefore not be optimal.
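The interaction between the local-mount override and task ordering can be illustrated with a small simulation. This is a simplified model of the behavior described above, not DPX logic: if the first task to write to a drive is local to a device server, the drive mounts there, and every later task in the job follows that same path.

# Simplified model of the behavior described above (not actual DPX logic):
# the first task to write to a drive fixes the device path for the whole job.

DEVICE_SERVERS = {"D1", "D2"}    # nodes with their own local SAN path to the drive
PREFERRED_PATH = "D2"            # first SAN device path added for this drive

def path_for_job(task_nodes):
    """Return the device path that every task in this job ends up using."""
    first = task_nodes[0]                  # tasks start first-come, first-served
    if first in DEVICE_SERVERS:
        return first                       # local mount overrides the SAN preference
    return PREFERRED_PATH                  # otherwise the preferred SAN path is used

# A network node starts first, so the whole job runs through the preferred path D2.
print(path_for_job(["N3", "D1", "N4"]))    # -> D2
# D1's task starts first, so even the network nodes in this job are pinned to D1.
print(path_for_job(["D1", "N3", "N4"]))    # -> D1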

Therefore, one good strategy is to define a backup job for each machine on the SAN and a single job for all of the machines on the network (3 jobs total in this case: one for D1, one for D2, and one for all of the nodes on the network). Regardless of the performance options you choose, this ensures that data transfer is local for your device servers on the SAN and well distributed across the device servers for data coming over the network.