Summary
Linux Advanced Backup fails with the following error in the job log: X.X.X.X 6/14/2011 7:03:04 am SNBAPH_371W Func(discover_app::run): Exception (There was a problem in getting fs info from nibbler )
Symptoms
Running the bexps command shows that many snapshots are currently open:
root 7010 1 0 04:02 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o6958_0
root 8650 1 0 Jun13 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o8645_0
root 13723 1 0 Jun10 ? 00:00:00 sh -c /usr/sbin/lvdisplay > /var/BackEx/tmp/snap.lcCeOB 2>&1
root 26239 1 0 Jun12 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o26222_0
root 26395 1 0 Jun10 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o26394_0
root 29843 1 0 Jun11 ? 00:00:00 sh -c lvdisplay -c 1>/var/BackEx/logs/nib_o29842_0
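The listing above shows lvdisplay shells that have been re-parented to PID 1 and have been running for hours or days. As a rough way to pick such entries out of a long process list, the following sketch filters ps -ef style output. The function name list_orphaned_lvdisplay is hypothetical, and the column layout (UID PID PPID C STIME TTY TIME CMD) is assumed to match the output shown above.

```shell
# Hypothetical helper: read `ps -ef`-style lines on stdin and print the PID
# and start time of every process whose parent is PID 1 and whose command
# line mentions lvdisplay.
# Assumed columns: UID PID PPID C STIME TTY TIME CMD
list_orphaned_lvdisplay() {
    awk '$3 == 1 && /lvdisplay/ { print $2, $5 }'
}
```

On a live system it could be fed with `ps -ef | list_orphaned_lvdisplay`. Entries whose start time is a date rather than a time of day have been running since a previous day.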
When running lvscan to get more information on these snapshots, the program stops responding, and the following errors appear:
[root@ilsia834 ~]# lvscan
/dev/local/bex_snapshot.441864643: read failed after 0 of 4096 at 5368643584: Input/output error
/dev/local/bex_snapshot.68951976: read failed after 0 of 4096 at 4096: Input/output error
Running the bexps command shows clvmd consuming 100% of the CPU.
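A similar check can be made with standard tools. The sketch below is an assumption, not part of the product: it reads "PID %CPU COMMAND" lines, such as those produced by `ps -eo pid,pcpu,comm`, and prints any clvmd entry at or above a threshold. The function name busy_clvmd and the default threshold of 90 are hypothetical. Note that the %CPU figure reported by ps is averaged over the lifetime of the process, so it may differ from an instantaneous reading.

```shell
# Hypothetical helper: read "PID %CPU COMMAND" lines on stdin and print the
# PID and %CPU of each clvmd process at or above the given threshold.
# $1 is the threshold in percent (assumed default: 90).
busy_clvmd() {
    awk -v limit="${1:-90}" '$3 == "clvmd" && $2 + 0 >= limit + 0 { print $1, $2 }'
}
```

For example: `ps -eo pid,pcpu,comm | busy_clvmd 95`.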
Resolution
In this scenario, backups fail after clvmd stops responding on the node in question.
After stopping and restarting the cluster LVM daemon (clvmd), all LVM commands and backups run successfully.
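The stop and restart described above might look like the following sketch. It assumes the node manages clvmd through a SysV-style init script invoked with the service command, which this article does not confirm. The function name restart_clvmd and the DRY_RUN variable are hypothetical and exist only to let the commands be previewed before they are run.

```shell
# Sketch only: assumes an init script named "clvmd" driven by `service`.
# Set DRY_RUN=1 to print the commands instead of executing them.
restart_clvmd() {
    for action in stop start; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service clvmd $action"
        else
            # Abort if either step fails so the failure is not hidden.
            service clvmd "$action" || return 1
        fi
    done
}
```

Run it as root on the affected node, then confirm that lvscan returns promptly before starting the next backup.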