[PVE-User] lots of 'heartbeat_check: no reply from ...' in the logs
lists at merit.unu.edu
Tue Feb 12 10:47:46 CET 2019
On 8-2-2019 9:41, Alwin Antreich wrote:
> I presume that you run hyper-converged (CT/VM) on the machine. As you
> mentioned there were issues with slow requests due to some "sudden" high
> resource demand, it could be that most of the memory contents of that
> OSD were swapped out.
Yes, VMs running on the server.
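
To check whether an OSD's memory really has been swapped out, the kernel's
per-process VmSwap counter in /proc/<pid>/status can be read. The following
is only a minimal sketch, not something that was run in this thread: the
function names are hypothetical, and it assumes the OSD daemons show up with
the command name "ceph-osd".

```python
import re
from pathlib import Path


def parse_vmswap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def swap_by_process(name="ceph-osd", proc_root="/proc"):
    """Map pid -> swapped-out kB for processes whose command name equals `name`.

    The default name "ceph-osd" is an assumption about how the daemon appears.
    """
    result = {}
    for status_file in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status_file.read_text()
        except OSError:
            continue  # the process exited while we were scanning
        name_match = re.search(r"^Name:\s+(.+)$", text, re.MULTILINE)
        if name_match and name_match.group(1).strip() == name:
            swap = parse_vmswap_kb(text)
            if swap is not None:  # kernel threads have no VmSwap line
                result[int(status_file.parent.name)] = swap
    return result


if __name__ == "__main__":
    for pid, kb in sorted(swap_by_process().items()):
        print(f"pid {pid}: {kb} kB swapped out")
```

A large VmSwap value for one OSD pid, compared with the others, would point
in the direction suggested above.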
> Is KSM running on that node? If so, it could also be a reason for slow
> requests or high CPU usage of that OSD; merge/unmerge operations could
> interfere too.
> If you have enough free memory available then try to stop and start the
> OSD. This way you can check if the 100% swap + kworker reside with the
> OSD or are caused by something else.
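
For the KSM question above, the kernel exposes its state under
/sys/kernel/mm/ksm. As a rough sketch (the helper names are hypothetical, and
this was not part of the original exchange), the counters can be read like
this:

```python
from pathlib import Path


def read_ksm_status(ksm_dir="/sys/kernel/mm/ksm"):
    """Return selected KSM counters as ints, for the files that can be read."""
    status = {}
    for name in ("run", "pages_shared", "pages_sharing", "full_scans"):
        path = Path(ksm_dir) / name
        try:
            status[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # file missing or unreadable: KSM may not be available
    return status


def ksm_is_active(status):
    """True when 'run' is 1 (0 = stopped, 2 = stopped and unmerging pages)."""
    return status.get("run") == 1


if __name__ == "__main__":
    status = read_ksm_status()
    print("KSM active:", ksm_is_active(status))
    for key, value in status.items():
        print(f"{key} = {value}")
```

A "run" value of 1 together with a non-zero pages_sharing means KSM is
currently merging pages on that node.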
I restarted both involved OSDs 18 and 19, and that made the
"heartbeat_check: no reply from 10.10.89.2:6807 osd.18" lines disappear
from the logs.
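
To confirm that the messages really stopped after a restart, they can be
counted per OSD. This is a small sketch under assumptions: the pattern only
matches the fragment quoted above ("heartbeat_check: no reply from <addr>
osd.<id>"), nothing is assumed about the rest of the log line, and the
function name is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches only the fragment quoted in this thread; the rest of the line is ignored.
HEARTBEAT_RE = re.compile(r"heartbeat_check: no reply from (\S+) (osd\.\d+)")


def count_missed_heartbeats(lines):
    """Count 'no reply' heartbeat messages per OSD name (e.g. 'osd.18')."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input.
    for osd, count in count_missed_heartbeats(sys.stdin).most_common():
        print(f"{osd}: {count}")
```

Running it over the log from before and after the restart should show the
counts for osd.18 and osd.19 dropping to zero.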
However the kworker pids running with 100% CPU utilisation remained.
Then I decided to reboot that node, and that made everything well again.
Thanks for the help!