[PVE-User] Stability issue Proxmox4/DRBD9
Jean-Laurent Ivars
jl.ivars at ipgenius.fr
Sat Apr 9 17:54:24 CEST 2016
Dear list,
Sorry, it’s a long message and my English may not be the best :(
I have a cluster with two hosts and DRBD replication. I don’t want to use HA; I only want more resiliency if a host fails, and to avoid downtime for the VMs when there is a kernel update and I need to reboot one of the PVE hosts, so 2 hosts should not be a problem.
I don’t have powerful machines or hardware RAID: I have 2x2TB disks, with a software RAID 1 partition for the system, and the second partition is dedicated to the DRBD resources.
I’m on the very latest version of Proxmox 4 with a community subscription and my servers are up to date. The link between the servers really provides 1Gb/s of bandwidth (tested with iperf) and a latency of less than 2ms.
I use DRBD version 9 since it’s the one installed with PVE4 by default, but I’m a little bit worried because the Proxmox DRBD9 documentation says this is a technology preview and even LINBIT says DRBD 9.0.x is not production ready… So I configure it the old-fashioned way, as the following link advises, with 2 resources to avoid complicated split-brain recovery: https://pve.proxmox.com/wiki/DRBD
I hope that by doing it this way I avoid the « not production ready » issue, since I believe it is the drbdmanage tool that is not production ready, not DRBD itself, but maybe I’m wrong…?
Are there people using DRBD9 in production? Successfully? Please share your experience.
My problem is reproducible almost every time (9 times out of 10): when I try to clone a VM from one resource to the other, or when I try to move a VM’s disk from one resource to the other, it ends in a server crash, even without a single VM running on the cluster!
I understand it’s a resource-intensive operation, but at worst it should be slow; my servers should not crash...
What happens: I launch the operation (it usually happens with « big » disks, 100GB for example; with 5GB it usually completes). At the beginning it seems to work, and I monitor the operation with several open SSH consoles running iotop, iftop, iostat, top… on both servers.
After a while, sometimes at 10%, sometimes at 20% of the operation, the progress window stops moving and the target server turns red in the web interface. I can see that the server where I launched the operation (the source server) stops doing read I/O, there is almost no more traffic on the DRBD network link, and the server that was « receiving » the data (the target server) starts to become unresponsive. There is no more I/O on either server, and one of the servers, usually the target, starts to build up a huge load: at first it’s only 3, 4, 5… but the longer I wait, the higher the load grows (at the end it can be up to 30 or 35!!!)
There is a process eating CPU, and it is drbd, even though it’s no longer doing anything (and I don’t know how to stop it: kill and kill -9 don’t work, and "service drbd stop" doesn’t either)…
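If I understand correctly, a process that ignores even kill -9 like this is usually stuck in uninterruptible sleep (state D) inside the kernel, which would explain why it cannot be stopped. Something like this should confirm it the next time it happens (just a sketch, I have not captured this output yet):

    # list tasks stuck in uninterruptible sleep (state D)
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
    # dump the kernel stacks of blocked tasks to dmesg/syslog (needs sysrq enabled)
    echo w > /proc/sysrq-trigger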
There is plenty of free memory, there is almost no more I/O, and the CPU is used but not that much, as you can see: https://cloud.ipgenius.fr/index.php/s/UWGbnePdxkiee5A
In the end, I can see these errors in my SSH consoles:
Message from syslogd@virt1 at Apr 1 11:24:09 ...
kernel:[79286.931625] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kvm:9177]
Message from syslogd@virt1 at Apr 1 11:24:37 ...
kernel:[79314.904263] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kvm:9177]
And I can’t do anything more; I can’t even connect via SSH or launch any command in the consoles that are already open. The only way to get my server back is a hard reboot :(
After that, the resources quickly resync, and I end up with an LVM disk that was created but is of course unusable… so I delete it...
Here are my DRBD config files, which are as basic as possible:
root@virt1 ~ # cat /etc/drbd.d/global_common.conf
global {
    usage-count no;
}
common {
    startup {
        wfc-timeout 0;
    }
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    options {
        cpu-mask 0;
    }
    net {
        protocol C;
        allow-two-primaries;
        cram-hmac-alg sha1;
        sndbuf-size 0;
        max-buffers 8000;
        max-epoch-size 8000;
        verify-alg sha1;
    }
    disk {
        resync-rate 40M;
        on-io-error detach;
    }
}
root@virt1 ~ # cat /etc/drbd.d/r1.res
resource r1 {
    device /dev/drbd1;
    disk /dev/sda3;
    meta-disk internal;
    net {
        shared-secret "xxxxxxxxxx";
    }
    on virt1 {
        address 10.200.10.1:7788;
    }
    on virt2 {
        address 10.200.10.2:7788;
    }
}
root@virt1 ~ # cat /etc/drbd.d/r2.res
resource r2 {
    device /dev/drbd2;
    disk /dev/sdb3;
    meta-disk internal;
    net {
        shared-secret "xxxxxxxxxx";
    }
    on virt1 {
        address 10.200.10.1:7789;
    }
    on virt2 {
        address 10.200.10.2:7789;
    }
}
root@virt1 ~ #
I’m not sure about the values on the line with the buffer directives, but all the rest is very standard.
For information, apart from this specific operation (copying from one resource to the other), the cluster works correctly: I created 3 or 4 Linux and 3 Windows VMs on each host and ran very stressful tests (on all resources: CPU, RAM and disk read/write) in every VM for more than 24 hours. Of course the cluster was a bit loaded, the network links sometimes showed as congested and the VMs were not very fast, but it worked very well and it stayed stable!
What I need above everything is stability; performance is far down my priority list. I would like to go to production with this cluster, so I really need it to be stable...
Can someone help me? If you need more config files or any other kind of information, I can provide anything you ask. Is there a way to limit the resources allocated to DRBD?
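For example, I wonder whether the dynamic resync controller would be better than my fixed resync-rate for keeping DRBD from starving everything else. Something like this in the disk section is what I have in mind; the values are only illustrative and I am not sure these options behave exactly the same on DRBD 9:

    disk {
        on-io-error detach;
        # dynamic resync controller instead of a fixed resync-rate
        c-plan-ahead 20;    # > 0 enables the controller (unit: 0.1s)
        c-fill-target 50k;  # amount of in-flight resync data to aim for
        c-min-rate 10M;     # throttle resync down to this when application I/O is active
        c-max-rate 40M;     # never resync faster than this
    }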
Moreover, if there is a real specialist in DRBD technology out there: my company is not rich and can’t afford to pay a lot, but it is something that can be discussed.
Anyway, thank you in advance if you have taken the time to read my message, and thank you even more if you can provide an answer!
One last thing, not directly related to the subject:
There is no /proc/drbd anymore, and I’m a little bit lost without it because I was used to it.
Of course, I know there are other commands (drbdadm status, drbd-overview, drbdsetup events2…), but the Nagios plugin I was using to monitor DRBD will not work since it parses the /proc/drbd output, and I haven’t found a newer Nagios plugin adapted to this new version. If someone has one, could you please share it?
If nobody answers this, I will write one myself, so if anyone is interested in it, let me know.
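Something like this is what I have in mind, only a rough sketch built on the drbdadm status output instead of /proc/drbd (the matching will surely need adapting to each setup):

    #!/bin/sh
    # Nagios-style check for DRBD 9: every local disk and peer-disk should be UpToDate.
    OUT=$(drbdadm status 2>&1)
    if [ $? -ne 0 ]; then
        echo "CRITICAL: drbdadm status failed: $OUT"
        exit 2
    fi
    # any disk:/peer-disk: line that is not UpToDate is a problem
    BAD=$(echo "$OUT" | grep 'disk:' | grep -v 'UpToDate')
    if [ -n "$BAD" ]; then
        echo "CRITICAL: DRBD not UpToDate: $BAD"
        exit 2
    fi
    echo "OK: all DRBD resources UpToDate"
    exit 0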
Best regards,
Jean-Laurent Ivars
Responsable Technique | Technical Manager
22, rue Robert - 13007 Marseille
Tel: 09 84 56 64 30 - Mobile: 06.52.60.86.47
Linkedin <http://fr.linkedin.com/in/jlivars/> | Viadeo <http://www.viadeo.com/fr/profile/jean-laurent.ivars> | www.ipgenius.fr <https://www.ipgenius.fr/>