
Summary

Linux Advanced Backup fails with the following error in the job log:

X.X.X.X 6/14/2011 7:03:04 am SNBAPH_371W Func(discover_app::run): Exception (There was a problem in getting fs info from nibbler )

 

 

Symptoms

 

 

Running the bexps command shows that many snapshots are still open, with lvdisplay processes left running from earlier backup attempts:

 

 

root 7010 1 0 04:02 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o6958_0
root 8650 1 0 Jun13 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o8645_0
root 13723 1 0 Jun10 ? 00:00:00 sh -c /usr/sbin/lvdisplay > /var/BackEx/tmp/snap.lcCeOB 2>&1
root 26239 1 0 Jun12 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o26222_0
root 26395 1 0 Jun10 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o26394_0
root 29843 1 0 Jun11 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o29842_0
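
A listing like the one above can also be filtered with standard tools. The following is a minimal sketch, assuming input in the column layout of `ps -ef` (UID, PID, PPID, ...). The function name find_stuck_lvdisplay is hypothetical and is not part of the backup product.

```shell
# Print the PIDs of lvdisplay processes whose parent is init (PPID 1),
# which is the pattern seen in the listing above.
# Expects input in the column layout of: ps -ef
find_stuck_lvdisplay() {
    awk '$3 == 1 && /lvdisplay/ { print $2 }'
}

# Example invocation (left as a comment so the file is safe to source):
#   ps -ef | find_stuck_lvdisplay
```

A process reparented to init is not proof of a hang on its own. Compare the start time column against the backup schedule before drawing conclusions.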

When lvscan is run to get more information on these snapshots, the program stops responding and the following errors appear:

[root@ilsia834 ~]# lvscan
/dev/local/bex_snapshot.441864643: read failed after 0 of 4096 at 5368643584: Input/output error
/dev/local/bex_snapshot.68951976: read failed after 0 of 4096 at 4096: Input/output error

Running the bexps command shows clvmd consuming 100% of the CPU.
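
The same check can be approximated with the system `ps` command. This is a sketch only, assuming input in the format printed by `ps -eo pid,pcpu,comm`. The helper name busy_clvmd and the default threshold of 90 are illustrative assumptions.

```shell
# Print "PID %CPU" for any clvmd process at or above a CPU threshold.
# The threshold is the first argument and defaults to 90.
# Expects input in the format of: ps -eo pid,pcpu,comm
busy_clvmd() {
    awk -v limit="${1:-90}" '$3 == "clvmd" && $2 + 0 >= limit + 0 { print $1, $2 }'
}

# Example invocation (left as a comment so the file is safe to source):
#   ps -eo pid,pcpu,comm | busy_clvmd 90
```

Note that `ps` reports CPU time divided by the lifetime of the process, so a daemon that only recently started spinning can show a lower figure. `top` gives a better view of current usage.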

 

Resolution

 

 

In this scenario, backups fail after clvmd stops responding for the node in question.

 

 

After stopping and restarting the cluster LVM daemon (clvmd), all LVM commands and backups run successfully.
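
The restart can be sketched as follows. This assumes a Red Hat style system where clvmd is managed by an init script through the `service` command. The article does not state the exact commands used, so adapt them to the local cluster setup. The function name restart_clvmd and the RUN variable are hypothetical.

```shell
# Stop and start the cluster LVM daemon, then run lvscan to confirm
# that LVM commands respond again.
# Assumption: clvmd is controlled by an init script via "service".
# Set RUN=echo to print the commands instead of executing them.
restart_clvmd() {
    ${RUN:-} service clvmd stop &&
    ${RUN:-} service clvmd start &&
    ${RUN:-} lvscan
}

# Example invocations (left as comments so the file is safe to source):
#   RUN=echo restart_clvmd   # dry run, prints the three commands
#   restart_clvmd            # run as root on the affected node
```

If lvscan returns promptly without the Input/output errors shown earlier, the backup job can be retried.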