[PVE-User] Buffer I/O error on device dm-3 / nfs lost / Cluster crashed
Martin Schuchmann
ms at city-pc.de
Wed Aug 14 11:55:00 CEST 2013
Hi there,
After upgrading three hosts to the current stable 3.0-23 with kernel
2.6.32-22, one host shows a recurring error in kern.log:
Aug 14 07:00:05 promo2 kernel: EXT3-fs: barriers disabled
Aug 14 07:00:05 promo2 kernel: kjournald starting. Commit interval 5
seconds
Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): using internal journal
Aug 14 07:00:05 promo2 kernel: ext3_orphan_cleanup: deleting
unreferenced inode 92192952
Aug 14 07:00:05 promo2 kernel: ext3_orphan_cleanup: deleting
unreferenced inode 92037342
Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): 2 orphan inodes deleted
Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): recovery complete
Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): mounted filesystem with
ordered data mode
Aug 14 07:03:46 promo2 kernel: device-mapper: snapshots: Invalidating
snapshot: Unable to allocate exception.
Aug 14 07:03:47 promo2 kernel: Aborting journal on device dm-3.
Aug 14 07:03:47 promo2 kernel: Buffer I/O error on device dm-3, logical
block 346882562
Aug 14 07:03:47 promo2 kernel: EXT3-fs (dm-3): error:
ext3_journal_start_sb: Detected aborted journal
Aug 14 07:03:47 promo2 kernel: EXT3-fs (dm-3): error: remounting
filesystem read-only
Aug 14 07:03:47 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:47 promo2 kernel: JBD: I/O error detected when updating
journal superblock for dm-3.
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 535724034
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 535724035
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 536215554
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 556138498
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 564822018
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 564822019
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 571408386
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3, logical
block 573800450
Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on dm-3
Aug 14 07:04:36 promo2 kernel: EXT3-fs (dm-3): error: ext3_put_super:
Couldn't clean up the journal
Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not responding,
still trying
Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not responding,
still trying
Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not responding,
still trying
... (continues until the hard reboot)
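Because the mail client wrapped the log lines above, the excerpt is awkward to grep. A minimal Python sketch like the following could re-join the wrapped lines and count the "Buffer I/O error" entries per device. The function names `unwrap` and `buffer_errors_per_device` are made up for illustration, and the timestamp pattern assumes the classic syslog format shown in the excerpt.

```python
import re
from collections import Counter

# Classic syslog entries start with a timestamp such as "Aug 14 07:03:47 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Matches e.g. "Buffer I/O error on device dm-3, logical block 346882562".
BUFFER_ERROR = re.compile(r"Buffer I/O error on device ([\w-]+), logical block (\d+)")

def unwrap(lines):
    """Join mail-wrapped continuation lines back onto their log entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            # A new entry begins (or this is the very first line).
            entries.append(line)
        else:
            # No timestamp: treat it as the wrapped tail of the previous entry.
            entries[-1] += " " + line.strip()
    return entries

def buffer_errors_per_device(lines):
    """Count 'Buffer I/O error' entries per device name."""
    counts = Counter()
    for entry in unwrap(lines):
        match = BUFFER_ERROR.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run against the unwrapped /var/log/kern.log on the host, the same counter works without the `unwrap` step doing anything, since the original file has one entry per line.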
The hardware (HP DL360 G7 with P410 controller) does not show any errors
in its own log (iLO interface).
The problem recurs every 12 hours, at 07:00 and 19:00.
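To confirm the 12-hour rhythm from the log itself rather than by eye, the timestamps of the first error in each episode (the device-mapper "Invalidating snapshot" message) could be extracted and compared. The following Python sketch assumes it is fed the original, unwrapped lines of kern.log, that the year is supplied by hand (syslog timestamps omit it), and that month names are parsed under an English locale. The names `event_times` and `gaps_in_hours` are hypothetical.

```python
from datetime import datetime

# Text of the first error in each episode, as it appears in the excerpt above.
MARKER = "Invalidating snapshot"

def event_times(lines, year=2013, marker=MARKER):
    """Return the timestamps of all log lines containing the marker."""
    times = []
    for line in lines:
        if marker in line:
            stamp = line[:15]  # e.g. "Aug 14 07:03:46"
            # Syslog omits the year, so it is prepended before parsing.
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times

def gaps_in_hours(times):
    """Return the gap in hours between each pair of consecutive events."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

Gaps close to 12.0 would support the observation; if so, it may be worth checking what is scheduled on the host at those times.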
Today at 07:00, after the error in kern.log, the NFS daemon also stopped
working.
From then on, all machines on the first and second nodes were unresponsive
and no longer reachable from outside, although they were still running.
The local sshd on the Proxmox nodes still worked, but we were not able
to reboot the nodes because we could not stop the VMs. Only
"echo b > /proc/sysrq-trigger" helped.
The third node was not affected, even though it is also connected to the
missing shared storage of node 2.
We share each local storage over the cluster via nfs to all nodes, but
machines are only running on local storage.
Are there any hints on what to do?
Should we go back to the older kernel?
Are there any known driver problems with the P410i controller and the new kernel?
Thank you!
Martin.