[pve-devel] [PATCH] Fix error handling if RAM hot-plug fails.
Wolfgang Bumiller
w.bumiller at proxmox.com
Mon Apr 11 09:04:07 CEST 2016
> On April 11, 2016 at 5:26 AM Alexandre DERUMIER <aderumier at odiso.com> wrote:
>
>
> Here's an interesting Red Hat bugzilla about udev rules:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1320534
This does not apply to the VM I tested it with. There, hotplugged memory
is not automatically activated and the state is 'offline' on all memory dimms
I hot-added, and yet I still cannot unplug them. (Arch, kernel 4.4.5)
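For reference, what I'm looking at in the guest is simply:

    # per-block hotplug state as seen by the guest kernel
    grep -H . /sys/devices/system/memory/memory*/state

and the `device_del` qmp command has no visible effect no matter what
those say.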
> Also, according to the hotplug documentation:
> https://www.kernel.org/doc/Documentation/memory-hotplug.txt
>
> It's possible to use "online_movable" instead of "online" to set the dimm state.
> 
> And if they are onlined in order, like we do, it should be possible to unplug them in reverse order too.
> 
> (The movable zone needs to be contiguous, so you can't unplug them in random order.)
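For reference, onlining a block as movable from inside the guest would look
roughly like this (the block number is only an example, it varies with the
layout):

    echo online_movable > /sys/devices/system/memory/memory40/state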
>
>
>
>
>
> ----- Original Message -----
> From: "aderumier" <aderumier at odiso.com>
> To: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Monday, April 11, 2016 05:08:59
> Subject: Re: [pve-devel] [PATCH] Fix error handling if RAM hot-plug fails.
>
> Hi,
> Sorry for the late reply, I was on holiday for the last two weeks.
>
> >> Does memory hot-*un*-plugging work for you?
> 
> Yes, memory hot-unplugging is working.
>
> But the main problem is the Linux kernel :p
>
> I need to dig up the documentation again, but the Linux kernel currently places some unmovable memory allocations.
> If there are such allocations on a specific dimm, the Linux kernel will refuse to offline it.
>
> Here is a good slide deck on this:
> https://events.linuxfoundation.org/sites/events/files/lcjp13_ishimatsu.pdf
>
> (Note that I haven't checked this with the latest 4.x kernels.)
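For what it's worth, the guest also exposes whether it currently considers
a block's pages movable at all; a quick check looks like this (the block
number is just an example):

    # 1: pages can be migrated away, offlining should succeed
    # 0: unmovable allocations are present, offlining will fail
    cat /sys/devices/system/memory/memory40/removable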
>
>
> >> Does it need any
> >>special guest OS? Because the `device_del` qmp command doesn't seem to
> >>have any effect regardless of the `removable` or `online` states in the
> >>guest's /sys/devices/system/memory/memory* files.
>
> a udev rule should help to put the dimms into the offline state (like the rule that puts them online for hotplug)
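For reference, the hotplug/online side from the bugzilla above is a udev
rule roughly along these lines; treat the exact match keys as an
assumption, they differ between distributions:

    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"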
>
>
>
>
> >>As bug #931 reports that it takes a huge amount of time for memory
> >>unplugging to give up I also wonder why we retry 5 times with a timeout
> >>of 3 seconds per dimm. Can't we just send the device_del commands for
> >>all dimms at once, then wait 3 seconds _once_, then check? Why bother
> >>with so many retries?
> >>Of course the foreach_dimm*() functions would have to use qemu_dimm_list() instead
> >>of assuming a default layout if e.g. a remove command ended up removing
> >>some dimms in between but failing on the last ones; otherwise further
> >>changes will be problematic.
>
> I don't think you can send device_del on all dimms at the same time. (But I haven't tested it.)
> As we don't manage a dimm list in the config (we only use the memory size to work out the memory mapping),
> we need to unplug them in reverse order.
> 
> For the retry and timeout, I added them because they were helping sometimes. Feel free to remove them.
> Until Linux has proper fixes for unplug, I don't think we can do better.
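That still seems compatible with sending them all before waiting; a rough
sketch of what I meant (qemu_dimm_list() is the existing helper mentioned
above, vm_mon_cmd and the dimmN id pattern are assumptions here):

    # send device_del for every dimm qemu reports, highest first,
    # then wait a single time and see what is still present
    my $dimms = qemu_dimm_list($vmid);
    my @names = sort { ($b =~ /(\d+)/)[0] <=> ($a =~ /(\d+)/)[0] } keys %$dimms;
    foreach my $name (@names) {
        eval { vm_mon_cmd($vmid, "device_del", id => $name) };
        warn "device_del $name failed: $@" if $@;
    }
    sleep 3;                                # wait once instead of retrying per dimm
    my $remaining = qemu_dimm_list($vmid);  # anything still listed failed to unplug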
>
>
>
>
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> To: "Wolfgang Link" <w.link at proxmox.com>, "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, April 7, 2016 12:52:58
> Subject: Re: [pve-devel] [PATCH] Fix error handling if RAM hot-plug fails.
>
> @Alexandre: pinging you about this, since you added the memory
> hotplug/unplug code:
>
> Does memory hot-*un*-plugging work for you? Does it need any
> special guest OS? Because the `device_del` qmp command doesn't seem to
> have any effect regardless of the `removable` or `online` states in the
> guest's /sys/devices/system/memory/memory* files.
>
> As bug #931 reports that it takes a huge amount of time for memory
> unplugging to give up I also wonder why we retry 5 times with a timeout
> of 3 seconds per dimm. Can't we just send the device_del commands for
> all dimms at once, then wait 3 seconds _once_, then check? Why bother
> with so many retries?
> Of course the foreach_dimm*() functions would have to use qemu_dimm_list() instead
> of assuming a default layout if e.g. a remove command ended up removing
> some dimms in between but failing on the last ones; otherwise further
> changes will be problematic.
>
> On Wed, Apr 06, 2016 at 10:24:35AM +0200, Wolfgang Link wrote:
> > There is no need to abort the call if the RAM can't be removed.
> > The user will see that it is pending.
> > ---
> > PVE/API2/Qemu.pm | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> > index 0d33f6c..96829c8 100644
> > --- a/PVE/API2/Qemu.pm
> > +++ b/PVE/API2/Qemu.pm
> > @@ -960,7 +960,14 @@ my $update_vm_api = sub {
> >      if ($running) {
> >          my $errors = {};
> >          PVE::QemuServer::vmconfig_hotplug_pending($vmid, $conf, $storecfg, $modified, $errors);
> > -        raise_param_exc($errors) if scalar(keys %$errors);
> > +        if (scalar(keys %$errors)) {
> > +            foreach my $k (keys %$errors) {
> > +                my $msg = $errors->{$k};
> > +                $msg =~ s/\n/ /;
> > +                print $msg;
> > +                syslog('warning', "$k: $msg");
> > +            }
> > +        }
> >      } else {
> >          PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $running);
> >      }
> > --
> > 2.1.4
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>