<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-15">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi there,<br>

    <br>

    After upgrading three hosts to the current stable 3.0-23 with kernel

    2.6.32-22, one host keeps showing the following error in kern.log:<br>

    <br>

    Aug 14 07:00:05 promo2 kernel: EXT3-fs: barriers disabled<br>

    Aug 14 07:00:05 promo2 kernel: kjournald starting.  Commit interval

    5 seconds<br>

    Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): using internal

    journal<br>

    Aug 14 07:00:05 promo2 kernel: ext3_orphan_cleanup: deleting

    unreferenced inode 92192952<br>

    Aug 14 07:00:05 promo2 kernel: ext3_orphan_cleanup: deleting

    unreferenced inode 92037342<br>

    Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): 2 orphan inodes

    deleted<br>

    Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): recovery complete<br>

    Aug 14 07:00:05 promo2 kernel: EXT3-fs (dm-3): mounted filesystem

    with ordered data mode<br>

    Aug 14 07:03:46 promo2 kernel: device-mapper: snapshots:

    Invalidating snapshot: Unable to allocate exception.<br>

    Aug 14 07:03:47 promo2 kernel: Aborting journal on device dm-3.<br>

    Aug 14 07:03:47 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 346882562<br>

    Aug 14 07:03:47 promo2 kernel: EXT3-fs (dm-3): error:

    ext3_journal_start_sb: Detected aborted journal<br>

    Aug 14 07:03:47 promo2 kernel: EXT3-fs (dm-3): error: remounting

    filesystem read-only<br>

    Aug 14 07:03:47 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:47 promo2 kernel: JBD: I/O error detected when updating

    journal superblock for dm-3.<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 535724034<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 535724035<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 536215554<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 556138498<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 564822018<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 564822019<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 571408386<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:03:49 promo2 kernel: Buffer I/O error on device dm-3,

    logical block 573800450<br>

    Aug 14 07:03:49 promo2 kernel: lost page write due to I/O error on

    dm-3<br>

    Aug 14 07:04:36 promo2 kernel: EXT3-fs (dm-3): error:

    ext3_put_super: Couldn't clean up the journal<br>

    Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not

    responding, still trying<br>

    Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not

    responding, still trying<br>

    Aug 14 07:07:37 promo2 kernel: ct0 nfs: server 10.1.0.2 not

    responding, still trying<br>

    ... (continues until the hard reboot)<br>

    <br>

    <br>

    The hardware (HP DL360 G7 with P410 controller) doesn't show any

    errors in its own log (iLO interface).<br>

    The problem recurs every 12 hours, at 07:00 and 19:00.<br>

    <br>
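    To verify the 07:00/19:00 pattern over several days, the relevant kernel messages could be counted per hour with a small script. This is only a sketch under assumptions: it expects the syslog line format shown in the excerpt above, matches only the two messages that open each incident (the device-mapper snapshot invalidation and the journal abort), and the function name failure_hours is hypothetical.<br>

```python
from collections import Counter
import re

# Assumed syslog layout, as in the kern.log excerpt above:
# "Aug 14 07:03:47 promo2 kernel: Aborting journal on device dm-3."
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: (?P<msg>.*)$"
)

# The two messages that open each incident in the excerpt.
PATTERNS = (
    "Invalidating snapshot: Unable to allocate exception",
    "Aborting journal on device",
)


def failure_hours(lines):
    """Count matching kernel messages per hour of day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        # Skip lines that are not kernel syslog lines or do not match a pattern.
        if m and any(p in m.group("msg") for p in PATTERNS):
            counts[int(m.group("time")[:2])] += 1
    return counts


# Lines copied from the excerpt above; the third one is ignored.
sample = [
    "Aug 14 07:03:46 promo2 kernel: device-mapper: snapshots: "
    "Invalidating snapshot: Unable to allocate exception.",
    "Aug 14 07:03:47 promo2 kernel: Aborting journal on device dm-3.",
    "Aug 14 07:03:47 promo2 kernel: Buffer I/O error on device dm-3, "
    "logical block 346882562",
]
print(failure_hours(sample))  # Counter({7: 2})
```

    On the affected host the function could be fed the lines of /var/log/kern.log (assumed path, the usual Debian location); hits clustered at hours 7 and 19 would confirm the schedule.<br>

    <br>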

    Today at 07:00, after the error in kern.log, the NFS daemon also

    stopped working.<br>

    From then on, all machines on the first and second nodes were

    unresponsive and no longer reachable from outside, although they were

    still running. The local sshd on the Proxmox nodes still worked, but we

    could not reboot the nodes because we could not stop the VMs.

    Only an immediate reboot with <code>echo b &gt; /proc/sysrq-trigger</code>

    helped.<br>

    <br>

    The third node was not affected, even though it is also connected to

    the shared storage of node 2 that went missing.<br>

    <br>

    We export each node's local storage via NFS to all nodes in the cluster,

    but the VMs run only on local storage.<br>

    <br>

    Any hints on what to do?<br>

    Should we go back to the older kernel?<br>

    Are there any known driver problems with the P410i controller and the new

    kernel?<br>

    <br>

    <br>

    Thank you!<br>

    <br>

    Martin.<br>

    <br>

    <br>


  </body>

</html>