[pve-devel] [PATCH] Fix error handling if ram hot-plug fail.

Mon Apr 11 05:08:59 CEST 2016

Hi,
sorry for the late reply, I was on holiday for the last 2 weeks.

>>Does memory hot-*un*-plugging work for you? 

Yes, memory hot-unplugging is working. 

But the main problem is the Linux kernel :p

I need to retrieve the documentation, but the Linux kernel currently places some unmovable memory allocations.
If there are such allocations on a specific DIMM, the kernel will refuse to offline it.

Here is a good slide deck on this:
https://events.linuxfoundation.org/sites/events/files/lcjp13_ishimatsu.pdf

(Note that I haven't checked with the latest 4.x kernels.)

>> Does it need any 
>>special guest OS? Because the `device_del` qmp command doesn't seem to 
>>have any effect regardless of the `removable` or `online` states in the 
>>guest's /sys/devices/system/memory/memory* files.

A udev rule should help to set the offline state (like for hotplug).
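For reference, a udev rule often used in guests for the hotplug direction onlines each memory block as soon as the kernel adds it. The file name (e.g. /lib/udev/rules.d/80-hotplug-cpu-mem.rules) is an assumption, and an equivalent rule for the unplug direction is not shown here.

```
# Online newly added memory blocks automatically (hotplug direction only).
SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", ATTR{state}="online"
```

Whether the kernel can later offline those blocks again still depends on whether unmovable allocations ended up on them, as described above.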

>>As bug #931 reports that it takes a huge amount of time for memory 
>>unplugging to give up I also wonder why we retry 5 times with a timeout 
>>of 3 seconds per dimm. Can't we just send the device_del commands for 
>>all dimms at once, then wait 3 seconds _once_, then check? Why bother 
>>with so many retries? 
>>Of course the foreach_dimm*() would have to use qemu_dimm_list() instead 
>>of assuming a default layout if e.g. a remove command ended up removing 
>>some dimms in between but failing on the last ones, otherwise further 
>>changes will be problematic. 

I don't think you can send device_del for all DIMMs at the same time (but I haven't tested it).
As we don't manage a DIMM list in the config (we only use the memory size to know the memory mapping),
we need to unplug them in reverse order.
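As a rough illustration of why the order matters when the DIMM list is only implied by the memory size, here is a Perl sketch. The layout constants (1024 MiB static memory, 512 MiB first DIMM, size doubling every 32 DIMMs) and the sub names `dimm_layout`/`dimms_to_unplug` are assumptions for illustration, not taken from qemu-server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout, for illustration only: hot-pluggable memory starts
# after a fixed static amount and is split into DIMMs whose size starts at
# 512 MiB and doubles after every 32 DIMMs.
my $STATIC_MB      = 1024;
my $FIRST_DIMM_MB  = 512;
my $DIMMS_PER_TIER = 32;

# Return the ordered list of DIMMs needed to reach $memory_mb in total.
# The list depends only on the memory size, so the same size always
# yields the same names in the same order.
sub dimm_layout {
    my ($memory_mb) = @_;
    my @dimms;
    my ($size, $total, $id) = ($FIRST_DIMM_MB, $STATIC_MB, 0);
    while ($total < $memory_mb) {
        for (1 .. $DIMMS_PER_TIER) {
            last if $total >= $memory_mb;
            push @dimms, { name => "dimm$id", size => $size };
            $id++;
            $total += $size;
        }
        $size *= 2;    # next tier uses DIMMs twice as large
    }
    return @dimms;
}

# DIMMs to remove when shrinking from $old_mb to $new_mb, highest first,
# so that what remains plugged is always a prefix of the smaller layout.
sub dimms_to_unplug {
    my ($old_mb, $new_mb) = @_;
    my @old  = dimm_layout($old_mb);
    my @new  = dimm_layout($new_mb);
    my $keep = scalar @new;
    return map { $_->{name} } reverse @old[$keep .. $#old];
}

print join(',', dimms_to_unplug(3072, 1536)), "\n";
```

With these assumptions, shrinking from 3072 MiB to 1536 MiB yields dimm3, dimm2, dimm1. Removing from the top keeps the remaining DIMMs consistent with the layout the smaller size implies.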

As for the retries and timeout, I added them because they sometimes helped. Feel free to remove them.
Until Linux has proper fixes for unplug, I don't think we can make unplug work any better.
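The batched alternative from the question could be sketched as follows: send every `device_del` first, wait once, then check what is left. It is untested against real QEMU, and, as said above, concurrent removals may not work. The `$qmp` object, its `device_del`/`dimm_list` methods and the `FakeQmp` stand-in are hypothetical, not the qemu-server API.

```perl
use strict;
use warnings;

# Sketch of "send all, wait once, check".
# $qmp is a hypothetical object with two methods:
#   device_del($name) - ask QEMU to remove a DIMM (does not block)
#   dimm_list()       - return the names of DIMMs QEMU still reports
# $sleep is injected so the wait can be stubbed out.
sub unplug_batch {
    my ($qmp, $dimms, $wait_s, $sleep) = @_;
    $qmp->device_del($_) for @$dimms;    # issue every request up front
    $sleep->($wait_s);                   # wait a single time, not per DIMM
    my %present = map { $_ => 1 } $qmp->dimm_list();
    # Anything still present failed to unplug; the caller should rebuild its
    # view from the real DIMM list rather than assume the default layout.
    return grep { $present{$_} } @$dimms;
}

# Hypothetical in-memory stand-in for QEMU: DIMMs marked 'stuck' refuse to
# go away (e.g. because the guest could not offline them).
package FakeQmp;
sub new { my ($class, %args) = @_; return bless +{%args}, $class }
sub device_del {
    my ($self, $name) = @_;
    $self->{present} =
      [ grep { $_ ne $name || $self->{stuck}{$name} } @{ $self->{present} } ];
}
sub dimm_list { return @{ $_[0]{present} } }

package main;

my $qmp = FakeQmp->new(
    present => [qw(dimm0 dimm1 dimm2 dimm3)],
    stuck   => { dimm2 => 1 },
);
my @failed = unplug_batch($qmp, [qw(dimm3 dimm2 dimm1)], 3, sub { });
print "still plugged: @failed\n";
```

For a real wait, pass `sub { sleep $_[0] }` as the last argument; the empty stub just keeps the example instant. The returned list is what would still have to be reported as pending.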

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "Wolfgang Link" <w.link at proxmox.com>, "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, April 7, 2016 12:52:58
Subject: Re: [pve-devel] [PATCH] Fix error handling if ram hot-plug fail.

@Alexandre: pinging you about this, since you added the memory 
hotplug/unplug code: 

Does memory hot-*un*-plugging work for you? Does it need any 
special guest OS? Because the `device_del` qmp command doesn't seem to 
have any effect regardless of the `removable` or `online` states in the 
guest's /sys/devices/system/memory/memory* files. 

As bug #931 reports that it takes a huge amount of time for memory 
unplugging to give up I also wonder why we retry 5 times with a timeout 
of 3 seconds per dimm. Can't we just send the device_del commands for 
all dimms at once, then wait 3 seconds _once_, then check? Why bother 
with so many retries? 
Of course the foreach_dimm*() would have to use qemu_dimm_list() instead 
of assuming a default layout if e.g. a remove command ended up removing 
some dimms in between but failing on the last ones, otherwise further 
changes will be problematic. 

On Wed, Apr 06, 2016 at 10:24:35AM +0200, Wolfgang Link wrote: 
> There is no need to cancel the program if the RAM can't be removed. 
> The user will see that it is pending. 
> --- 
> PVE/API2/Qemu.pm | 9 ++++++++- 
> 1 file changed, 8 insertions(+), 1 deletion(-) 
> 
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm 
> index 0d33f6c..96829c8 100644 
> --- a/PVE/API2/Qemu.pm 
> +++ b/PVE/API2/Qemu.pm 
> @@ -960,7 +960,14 @@ my $update_vm_api = sub { 
> if ($running) { 
> my $errors = {}; 
> PVE::QemuServer::vmconfig_hotplug_pending($vmid, $conf, $storecfg, $modified, $errors); 
> - raise_param_exc($errors) if scalar(keys %$errors); 
> + if (scalar(keys %$errors)) { 
> + foreach my $k (keys %$errors) { 
> + my $msg = $errors->{$k}; 
> + $msg =~ s/\n/ /; 
> + print $msg; 
> + syslog('warning', "$k: $msg"); 
> + } 
> + } 
> } else { 
> PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $running); 
> } 
> -- 
> 2.1.4
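One detail in the patch: `$msg =~ s/\n/ /;` only replaces the first newline, so a multi-line error still spans several lines. Here is a standalone sketch of the same loop with `/g`. The `$log` callback stands in for the `syslog('warning', ...)` call, and the error text is made up.

```perl
use strict;
use warnings;

# Report each per-option error as a single warning line instead of aborting.
# $log stands in for the syslog('warning', ...) call in the patch.
sub warn_pending_errors {
    my ($errors, $log) = @_;
    my @lines;
    foreach my $k (sort keys %$errors) {    # sorted only for stable output
        my $msg = $errors->{$k};
        $msg =~ s/\n/ /g;    # /g: replace every newline, not just the first
        $msg =~ s/\s+$//;    # drop the trailing space left by a final "\n"
        push @lines, "$k: $msg";
        $log->('warning', "$k: $msg");
    }
    return @lines;
}

my @lines = warn_pending_errors(
    { memory => "example error\nsecond line\n" },
    sub { my ($level, $text) = @_; print STDERR "[$level] $text\n" },
);
print "$_\n" for @lines;
```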