[PVE-User] periodic Node Crash/freeze

Woods, Ken A (DNR) ken.woods at alaska.gov
Thu Aug 23 12:19:32 CEST 2018


Why did you decide to not use multicast?


> On Aug 22, 2018, at 22:58, Ml Ml <mliebherr99 at googlemail.com> wrote:
> 
> Hello,
> 
> I could use some hints or help, since one cluster has been letting me
> down since 29.07.2018.
> That's when one of my three nodes started to freeze and stop.
> 
> In syslog the last entries are:
> 
> Aug 21 02:33:00 node10 systemd[1]: Starting Proxmox VE replication runner...
> Aug 21 02:33:01 node10 systemd[1]: Started Proxmox VE replication runner.
> Aug 21 02:33:01 node10 CRON[1870491]: (root) CMD (/usr/bin/puppet
> agent -vt --color false --logdest /var/log/puppet/agent.log
> 1>/dev/null)
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
> 
> 
>  or:
> 
> Aug 22 16:11:12 node08 pmxcfs[5227]: [dcdb] notice: cpg_send_message
> retried 1 times
> Aug 22 16:11:12 node08 pmxcfs[5227]: [status] notice: members: 1/5227, 2/5058
> Aug 22 16:11:12 node08 pmxcfs[5227]: [status] notice: starting data
> syncronisation
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> 
> I already posted it here:
>  https://urldefense.proofpoint.com/v2/url?u=https-3A__forum.proxmox.com_threads_periodic-2Dnode-2Dcrash-2Dfreeze.46407_&d=DwIGaQ&c=teXCf5DW4bHgLDM-H5_GmQ&r=THf3d3FQjCY5FQHo3goSprNAh9vsOWPUM7J0jwvvVwM&m=zpOdKmRPAro1hJw-CO0lkGqmzXn8fQ4Ye5aJvsC8lbk&s=fRGRq_-sMJvikzFr6peWj3oZxkZ5eHY434Re48Mv9mI&e=
> 
> It happened at:
> 29.07.2018 node09 / pve 4.4
> 07.08.2018 node08 / pve 4.4 (then I decided to upgrade)
> 21.08.2018 node10 / pve 5.2
> 22.08.2018 node08 / pve 5.2
> 
> ...and I am getting nervous now, since there are 60 important VMs on it.
> As you can see, it happened across multiple nodes with different PVE versions.
> 
> Memtest is okay.
> 
> From what I found by googling, the "^@^@^@^@^@^" characters appear in
> syslog because the file could not be fully written to disk?
> 
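If the NUL runs do mark the point where a node went down before the log was flushed (an assumption, in line with the question above), the last line written before each run gives an approximate time for each freeze. A minimal sketch in Python that scans a log file for such runs; the function name `find_nul_runs` and the threshold of 8 consecutive NUL bytes are illustrative choices, not anything Proxmox or rsyslog defines:

```python
import re
from pathlib import Path

# Hypothetical threshold: treat 8 or more consecutive NUL bytes as one run.
NUL_RUN = re.compile(rb"\x00{8,}")


def find_nul_runs(path):
    """Return a list of (last_text_before_run, run_length) tuples.

    last_text_before_run is the final line (or partial line) of log text
    that precedes each run of NUL bytes in the file.
    """
    data = Path(path).read_bytes()
    results = []
    for match in NUL_RUN.finditer(data):
        line_end = match.start()
        # Skip newlines that sit directly in front of the NUL run.
        while line_end > 0 and data[line_end - 1:line_end] == b"\n":
            line_end -= 1
        # Start of the last line before the run (0 if there is no newline).
        line_start = data.rfind(b"\n", 0, line_end) + 1
        text = data[line_start:line_end].decode("utf-8", errors="replace")
        results.append((text, match.end() - match.start()))
    return results
```

Calling `find_nul_runs("/var/log/syslog")` on each node (the path is an assumption, adjust it to wherever the log lives) and comparing the reported timestamps across nodes may show whether the freezes line up with a recurring job or a cluster event.
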
> Maybe something triggers some totem/watchdog stuff which then ends in
> a disaster?
> 
> My ideas from here:
> - disable corosync/totem and see if the problems stop
> 
> Do you have any ideas that could help narrow the problem down?
> 
> 
> My setup is a 3-node cluster (node08, node09, node10) with Ceph.
> 
> I have 4 other 3-node clusters running just fine.
> 
> Thanks a lot.
> 
> Mario

