[PVE-User] System hangs / CPU 100% Windows 2008 Server
Martin Schuchmann
ms at city-pc.de
Wed Sep 5 13:52:07 CEST 2012
>> We have a cluster of 3 Proxmox servers and one serious problem on a
>> Win 2008 Std (not R2) guest: approximately every 5-15 days, at
>> varying times, the CPU goes up to 100% and the system hangs. The most
>> recent failure occurred today at 11:59:57 am. We have also had the
>> failure on a Sunday, when no one was working on the
>> machine. So we do not think that any software installed on the
>> Windows server itself causes the problem. The Windows event logs
>> do not show anything either.
>>
>> The Proxmox syslog says (nodes 301 and 501 are located on Server
>> 1 (local storage); the hanging Win2008 machine runs as node 402 on
>> Server 2, also on local storage):
>>
>> Sep 5 11:42:12 promo2 rrdcached[1847]: removing old journal
>> /var/lib/rrdcached/journal//rrd.journal.1346830932.227122
>> Sep 5 11:59:24 promo2 pmxcfs[1869]: [dcdb] notice: data verification
>> successful
>> Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348613]: (root) CMD (vzdump 301
>> --quiet 1 --mode snapshot --compress lzo --maxfiles 18 --dumpdir
>> /backup_sftp/vz/host1/hourly/)
>> Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348614]: (root) CMD (vzdump 501
>> --quiet 1 --mode snapshot --compress lzo --maxfiles 12 --dumpdir
>> /backup_sftp/vz/elvis/hourly/)
>> Sep 5 12:00:02 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:00:02 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:00:38 promo2 pmxcfs[1869]: [status] notice: received log
>> Sep 5 12:05:01 promo2 pmxcfs[1869]: [status] notice: received log
>>
>>
>> In the past, too, there seemed to be a possible connection between
>> snapshots starting and node 402 hanging.
>> The destination for the backups is an SFTP server in another datacenter.
>>
>> Has anyone experienced this behaviour?
>
>
> Yes, we did, many, many times. Everything was solved (really!) after a BIOS
> update (we have many HP and Dell servers with Xeon 3xxx and 5xxx
> series CPUs, and all suffered from a CPU microcode problem that was solved
> at the end of 2010 / beginning of 2011). Look for a BIOS update.
>
> Massimo Santoro
Hi Massimo,
Thanks for that advice!
I have checked the BIOS, and according to HP Support it is already a
corrected version from May 2011.
I think that if this were caused by a hardware error, the problem would
also occur on other guests on this host, or the complete host would
freeze up?
A Win 2008 SBS guest has been running on the same machine for 6 months
without any error.
The node that freezes is used as a terminal server with about 5-10
active users.
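
To check the suspected connection between the snapshot jobs and the hangs more systematically, something like the following rough Python sketch could list the vzdump cron starts found in syslog that fall close to a known hang time. The function names, the 5-minute window and the assumption that each syslog entry sits on a single unwrapped line are illustrative choices, not anything Proxmox provides. Syslog lines carry no year, so it has to be passed in.

```python
import re
from datetime import datetime, timedelta

# Matches syslog cron entries such as:
#   Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348613]: (root) CMD (vzdump 301 ...
VZDUMP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S*CRON\[\d+\]:"
    r".*CMD \(vzdump (?P<vmid>\d+)"
)


def vzdump_starts(lines, year):
    """Return (start_time, vmid) for every vzdump cron start in the log lines."""
    starts = []
    for line in lines:
        m = VZDUMP_RE.match(line)
        if m:
            # Collapse padded whitespace ("Sep  5") before parsing.
            stamp = " ".join(m.group("stamp").split())
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            starts.append((when, int(m.group("vmid"))))
    return starts


def starts_near(hang_time, starts, window_seconds=300):
    """Return the vzdump starts within window_seconds before or after hang_time."""
    window = timedelta(seconds=window_seconds)
    return [(t, vmid) for t, vmid in starts if abs(t - hang_time) <= window]


if __name__ == "__main__":
    # Two entries from the syslog excerpt above, joined onto single lines.
    log = [
        "Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348613]: (root) CMD (vzdump 301 "
        "--quiet 1 --mode snapshot --compress lzo --maxfiles 18 --dumpdir "
        "/backup_sftp/vz/host1/hourly/)",
        "Sep 5 12:00:01 promo2 /USR/SBIN/CRON[348614]: (root) CMD (vzdump 501 "
        "--quiet 1 --mode snapshot --compress lzo --maxfiles 12 --dumpdir "
        "/backup_sftp/vz/elvis/hourly/)",
    ]
    hang = datetime(2012, 9, 5, 11, 59, 57)  # hang time reported for today
    for when, vmid in starts_near(hang, vzdump_starts(log, 2012)):
        print(vmid, when, abs(when - hang))
```

Running this against the syslog for each past hang would show whether a backup start always falls close to the freeze or whether today's timing was a coincidence.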
Regards, Martin