[PVE-User] lots of 'heartbeat_check: no reply from ...' in the logs

lists lists at merit.unu.edu
Tue Feb 12 10:47:46 CET 2019

Hi Alwin,

On 8-2-2019 9:41, Alwin Antreich wrote:
> I presume that you run hyper-converged (CT/VM) on the machine. As you
> mentioned there were issues with slow requests due to some "sudden" high
> resource demand, it could be that most of the memory contents of that
> OSD were swapped out.
Yes, VMs running on the server.
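
For anyone who wants to verify the swap theory before restarting anything,
here is a minimal sketch that reads how much of a given process is swapped
out. It assumes a Linux node where /proc/<pid>/status carries a VmSwap line;
the function names are hypothetical, and the PIDs (for example those of the
ceph-osd processes) have to be looked up separately.

```python
import re
import sys
from pathlib import Path


def parse_vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value in KiB from the text of /proc/<pid>/status.

    Returns 0 when the line is absent (kernel threads such as kworker
    do not report Vm* fields).
    """
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def swapped_kib(pid: int) -> int:
    """Read /proc/<pid>/status and return the swapped-out size in KiB."""
    return parse_vmswap_kib(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 vmswap.py <pid> [<pid> ...]
    for pid in map(int, sys.argv[1:]):
        print(f"pid {pid}: {swapped_kib(pid)} KiB swapped out")
```

A large VmSwap value for an OSD process would be consistent with the
explanation quoted above; a value near zero would point elsewhere.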

> Is KSM running on that node? If so, it could also be a reason for slow
> requests or high CPU usage of that OSD; merge/unmerge operations could
> interfere too.

> If you have enough free memory available then try to stop and start the
> OSD. This way you can check if the 100% swap + kworker reside with the
> OSD or are caused by something else.
I restarted both OSDs involved, 18 and 19, and that made the
"heartbeat_check: no reply from osd.18" lines disappear from the logs.

However, the kworker processes running at 100% CPU utilisation remained.

I then decided to reboot that node, and that made everything work well again.

Thanks for the help!


