[PVE-User] Host Shutdown and ZFS Unmounting Race Condition

Chidester, Bryce Bryce.Chidester at calyptix.com
Wed Aug 26 00:53:41 CEST 2015


Hi All,
Just wanted to mention a funny little bug that tripped me up today. I
don't [think I] have enough information to open a formal bug report,
but I'm hoping the devs and other admins take notice and keep an eye
out for this.

I rebooted a Proxmox box today, everything seemed to come back up
normally, and I went on my merry way. A little later, I received an
alert from my monitoring that / was 100% full (and had become so in a
matter of minutes). Upon investigation (ncdu -x is wonderfully helpful)
I found that the system had started writing backups to /rpool/backups,
but my backup dataset wasn't mounted, so the backup was being written
to my / filesystem. It seems that something had prevented the backup
volume from mounting on boot. I deleted the partial backup (it had
failed, / was full after all), removed the directories and mounted the
backup volume. (And restarted the backup.)
I also found that another dataset, rpool/vztemplates, had failed to
mount on boot for the same reason. (boot.log below)
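
For anyone who wants to guard against this, here is a minimal Python
sketch of a pre-backup check. It is not part of the original setup: the
function names are hypothetical, and the paths in the usage note are
simply the mount locations from this report. It relies only on
os.path.ismount from the standard library.

```python
import os


def unmounted_paths(paths, ismount=os.path.ismount):
    """Return the paths that are NOT currently mount points."""
    return [p for p in paths if not ismount(p)]


def require_mounted(paths, ismount=os.path.ismount):
    """Raise RuntimeError if any expected mount location is a plain directory."""
    missing = unmounted_paths(paths, ismount=ismount)
    if missing:
        raise RuntimeError("not mounted: " + ", ".join(missing))
```

Calling require_mounted(["/rpool/backups", "/rpool/vztemplates"]) before
a backup job would make the job fail early instead of filling /.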

After thinking about this, I'm guessing that this started with the
shutdown. The ZFS filesystems were unmounted while the PVE processes
were still running, resulting in the PVE process (I don't know which
one is responsible) recreating the various directories (cache,
templates, dump, etc) which would later prevent ZFS from mounting its
datasets on boot. Ideally /var/log/syslog would corroborate this
hypothesis, but unfortunately it makes no mention of when filesystems
are unmounted.
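
One way to look for evidence of this after the fact is to list what is
sitting in a mount location while the dataset is not mounted. The sketch
below is an illustration only: leftover_entries is a hypothetical
helper, and it assumes that any entries found under an unmounted mount
location were created on the underlying filesystem.

```python
import os


def leftover_entries(mountpoint, ismount=os.path.ismount):
    """List entries in a mount location that is currently NOT mounted.

    Returns an empty list when the location is a real mount point,
    because its contents then belong to the mounted dataset.
    """
    if ismount(mountpoint):
        return []
    return sorted(os.listdir(mountpoint))
```

Any directories reported here (for example dump or template
directories) are what makes ZFS refuse to mount with "directory is not
empty".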

I've shut down or rebooted this box a handful of times without this
issue coming up before, so it does seem to be rare and probably not
serious so long as one is monitoring basic system health.

Nitty gritty:
Standalone Proxmox box, pve-manager/3.4-9/4b51d87a (running kernel:
2.6.32-40-pve)
ZFS root, with separate ZFS datasets for /, backups, templates, ISOs,
and swap.
Backup and template storage each are defined as type=Directory pointing
at ZFS mount locations.
And here's a graph of the sudden spike in usage when backups kicked off:
http://i.imgur.com/arpAylk.png

From /var/log/boot.log
Tue Aug 25 11:18:04 2015: Mounting ZFS filesystem(s) cannot mount '/rpool': directory is not empty
Tue Aug 25 11:18:04 2015: cannot mount '/rpool/backups': directory is not empty
Tue Aug 25 11:18:04 2015: cannot mount '/rpool/vztemplates': directory is not empty
Tue Aug 25 11:18:04 2015: 1 ... failed!
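
Since the failure only shows up in boot.log, a monitoring check could
scan that file for these messages. The following is a rough sketch, not
an existing tool: failed_mounts is a hypothetical name, and the pattern
assumes the exact "cannot mount '...': directory is not empty" wording
shown above.

```python
import re

# Matches the ZFS message quoted in the boot.log excerpt above.
CANNOT_MOUNT = re.compile(r"cannot mount '([^']+)': directory is not empty")


def failed_mounts(log_text):
    """Return the mount locations that ZFS reported as non-empty."""
    return CANNOT_MOUNT.findall(log_text)
```

Feeding it the contents of /var/log/boot.log and alerting on a non-empty
result would flag the problem at boot rather than when / fills up.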

Please let me know if there's any additional information I can provide to help debug this.

-- 

Bryce Chidester
Director of Systems Engineering
Calyptix Security | Simply Powerful Network Security.

www.calyptix.com



