[PVE-User] lots of 'heartbeat_check: no reply from ...' in the logs

Alwin Antreich a.antreich at proxmox.com
Fri Feb 8 09:41:28 CET 2019

On Fri, Feb 08, 2019 at 09:07:09AM +0100, mj wrote:
> Hi Alwin,
> Thanks for your reply! Appreciated.
> > These messages are not necessarily caused by a network issue. It might
> > well be that the daemon osd.18 can not react to heartbeat messages.
> The thing is: the two OSDs are on the same host. I checked ceph-osd.18.log,
> and it contains just regular ceph stuff, nothing special, like this:
> I noticed on host pm2 there are multiple kworker pids running with 100% CPU
> utilisation. Also swap usage is 100%, while regular RAM usage (from proxmox
> gui) is only 54%.
> No idea what to make of that...
I presume that you run hyper-converged (CT/VM) on that machine. As you
mentioned, there were issues with slow requests due to some "sudden" high
resource demand, so it could be that most of the memory contents of that
OSD were swapped out.
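
To see whether the OSD really is what sits in swap, one option is to read
the VmSwap field from /proc/<pid>/status for every process. The following
is only a rough sketch, assuming a Linux host and Python 3; the function
names are made up for illustration. Kernel threads (such as kworker) have
no VmSwap line and are counted as 0.

```python
import os
import re


def parse_vmswap_kb(status_text):
    """Return VmSwap in kB from /proc/<pid>/status text, or 0 if absent."""
    m = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text."""
    m = re.search(r"^Name:\s+(.*)$", status_text, re.MULTILINE)
    return m.group(1).strip() if m else "?"


def swap_by_process(proc_root="/proc"):
    """Return (swap_kb, pid, name) tuples, largest swap user first."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs available (not a Linux host)
    results = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # process exited in the meantime or is not readable
        kb = parse_vmswap_kb(text)
        if kb > 0:
            results.append((kb, int(entry), parse_name(text)))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Print the ten processes with the most swapped-out memory.
    for kb, pid, name in swap_by_process()[:10]:
        print(f"{kb:>10} kB  pid {pid:<7} {name}")
```

If a ceph-osd process shows up at the top of that list, it supports the
theory that the OSD's memory was pushed out to swap.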

Is KSM running on that node? If so, it could also be a reason for slow
requests or high CPU usage of that OSD, as merge/unmerge operations could
interfere too.
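
The kernel exposes the KSM state under /sys/kernel/mm/ksm, so whether it
is active can be checked there. A small sketch, assuming Python 3 on the
node; the helper names are hypothetical. The "run" file holds 0 (stopped),
1 (running) or 2 (stopped, with all merged pages being unmerged).

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"


def read_ksm(ksm_dir=KSM_DIR):
    """Read a few KSM counters from sysfs; a missing file yields None."""
    values = {}
    for name in ("run", "pages_shared", "pages_sharing", "full_scans"):
        try:
            with open(os.path.join(ksm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def describe(values):
    """Turn the value of the 'run' file into a short status message."""
    run = values.get("run")
    if run is None:
        return "KSM state not available"
    if run == 1:
        return "KSM is running (pages are being scanned and merged)"
    if run == 2:
        return "KSM is stopped and unmerging all merged pages"
    return "KSM is stopped"


if __name__ == "__main__":
    ksm = read_ksm()
    print(describe(ksm))
    for key, value in ksm.items():
        print(f"{key}: {value}")
```

A non-zero pages_sharing value means pages are currently deduplicated. On
Proxmox VE, KSM is usually driven by the ksmtuned service, so its status
is worth a look as well.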

If you have enough free memory available, then try to stop and start the
OSD. This way you can check whether the 100% swap and the busy kworkers
are tied to the OSD or are caused by something else.
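
The stop/start test could be scripted roughly like this. It is a sketch
only, assuming the OSD runs as the systemd unit ceph-osd@<id> (the usual
naming) and that the script is run as root on the node; the function names
are made up. By default it only prints the commands (dry run).

```python
import subprocess


def osd_unit(osd_id):
    """Return the systemd unit name assumed for a given OSD id."""
    return f"ceph-osd@{osd_id}.service"


def swap_used_kb(meminfo_text):
    """Compute used swap in kB (SwapTotal - SwapFree) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def restart_osd(osd_id, dry_run=True):
    """Stop, then start the OSD unit; with dry_run only return the commands."""
    unit = osd_unit(osd_id)
    commands = [["systemctl", "stop", unit], ["systemctl", "start", unit]]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run for osd.18: show what would be executed.
    for cmd in restart_osd(18):
        print(" ".join(cmd))
    try:
        with open("/proc/meminfo") as f:
            print("swap used:", swap_used_kb(f.read()), "kB")
    except OSError:
        print("/proc/meminfo not readable")
```

Comparing the swap usage before and after the restart shows how much of it
belonged to the OSD.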


