[pve-devel] PVE child process behavior question

Denis Kanchev denis.kanchev at storpool.com
Mon Jun 2 15:23:27 CEST 2025


We try to prevent having a volume active on two nodes, as that may lead
to data corruption, so we detach the volume from all nodes (except the
target one) via our shared storage system.
In the sub activate_volume() our logic is not to detach the volume from
other hosts in the case of a migration, because activate_volume() can
also be called in other cases where detaching is necessary.
But in this case, where the `qm start` process is killed, the migration is
marked as failed and activate_volume() is still called on the destination
host after migration_cancel (we check whether the "lock" flag is set to
migrate).
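
A rough sketch of that lock check (simplified; the VMID extraction and the
sp_attach_here() / sp_detach_everywhere_except() helpers are placeholders
for our StorPool-specific code, not the actual plugin):

    use PVE::QemuConfig;
    use PVE::INotify;

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # placeholder: derive the owning VMID, e.g. 101 from 'vm-101-disk-0-...'
        my ($vmid) = $volname =~ /^vm-(\d+)-/;
        my $conf = $vmid ? eval { PVE::QemuConfig->load_config($vmid) } : undef;

        if ($conf && ($conf->{lock} // '') eq 'migrate') {
            # live migration in progress: keep the volume attached on the
            # source node and only attach it here as well
            sp_attach_here($scfg, $volname);
        } else {
            # any other activation: make sure no other node still holds the volume
            sp_detach_everywhere_except($scfg, $volname, PVE::INotify::nodename());
            sp_attach_here($scfg, $volname);
        }
        return 1;
    }
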
That's why I proposed that the child processes be killed when the parent
one dies; that would prevent such cases (see the sketch below).
I am not sure whether passing an extra argument to activate_volume()
(marking the call as part of a migration) would also solve this issue.
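
For reference, the "children die with the parent" behaviour can be requested
from the kernel per forked worker; a minimal sketch, assuming the
Linux::Prctl CPAN module is available (this is not something PVE does today):

    use POSIX ();
    use Linux::Prctl ();    # assumption: CPAN module, not shipped with PVE

    sub fork_child_bound_to_parent {
        my ($code) = @_;
        my $pid = fork() // die "fork failed: $!";
        if ($pid == 0) {
            # ask the kernel to deliver SIGTERM to us when the parent exits
            Linux::Prctl::set_pdeathsig(POSIX::SIGTERM());
            # the parent may already be gone before prctl() took effect
            POSIX::_exit(0) if getppid() == 1;
            $code->();
            POSIX::_exit(0);
        }
        return $pid;
    }
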
Here is a trace log of activate_volume() in the case of a migration.
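
For context, a stack dump in this format can be obtained by calling Carp
from inside activate_volume(); a minimal example (the exact logging we use
may differ):

    use Carp ();

    # inside activate_volume(): warn with the full call stack; the output
    # ends up in the log of the calling qm / task worker
    Carp::cluck("activate_volume: storeid '$storeid', volname '$volname'");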

2025-05-02 13:03:28.2222 [2712103] took 0.0006: activate_volume: storeid
'autotest__ec2_1', scfg {'type' => 'storpool','shared' => 1,'template' =>
'autotest__ec2_1','extra-tags' => 'tier=high','content' => {'iso' =>
1,'images' => 1}}, volname 'vm-101-disk-0-sp-z.b.df.raw', exclusive undef
at /usr/share/perl5/PVE/Storage/Custom/StorPoolPlugin.pm line 1551.
       PVE::Storage::Custom::StorPoolPlugin::activate_volume("PVE::Storage::Custom::StorPoolPlugin",
"autotest__ec2_1", HASH(0x559cd06d88a0), "vm-101-disk-0-sp-z.b.df.raw",
undef, HASH(0x559cd076b9a8)) called at /usr/share/perl5/PVE/Storage.pm line
1309
       PVE::Storage::activate_volumes(HASH(0x559cc99d04e0),
ARRAY(0x559cd0754558)) called at /usr/share/perl5/PVE/QemuServer.pm line
5823
       PVE::QemuServer::vm_start_nolock(HASH(0x559cc99d04e0), 101,
HASH(0x559cd0730ca0), HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at
/usr/share/perl5/PVE/QemuServer.pm line 5592
       PVE::QemuServer::__ANON__() called at
/usr/share/perl5/PVE/AbstractConfig.pm line 299
       PVE::AbstractConfig::__ANON__() called at
/usr/share/perl5/PVE/Tools.pm line 259
       eval {...} called at /usr/share/perl5/PVE/Tools.pm line 259

       PVE::Tools::lock_file_full("/var/lock/qemu-server/lock-101.conf",
10, 0, CODE(0x559ccf14b968)) called at
/usr/share/perl5/PVE/AbstractConfig.pm line 302
       PVE::AbstractConfig::__ANON__("PVE::QemuConfig", 101, 10, 0,
CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line
322
       PVE::AbstractConfig::lock_config_full("PVE::QemuConfig", 101, 10,
CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/AbstractConfig.pm line
330
       PVE::AbstractConfig::lock_config("PVE::QemuConfig", 101,
CODE(0x559ccf59f740)) called at /usr/share/perl5/PVE/QemuServer.pm line
5593
       PVE::QemuServer::vm_start(HASH(0x559cc99d04e0), 101,
HASH(0x559ccfd6d680), HASH(0x559cc99cfe38)) called at
/usr/share/perl5/PVE/API2/Qemu.pm line 3259
       PVE::API2::Qemu::__ANON__("UPID:lab-dk-2:00296227:0ADF72E0:683DA11F:qmstart:101:root\@pam:")
called at /usr/share/perl5/PVE/RESTEnvironment.pm line 620
       eval {...} called at /usr/share/perl5/PVE/RESTEnvironment.pm line
611
       PVE::RESTEnvironment::fork_worker(PVE::RPCEnvironment=HASH(0x559cc99d0558),
"qmstart", 101, "root\@pam", CODE(0x559cd06cc160)) called at
/usr/share/perl5/PVE/API2/Qemu.pm line 3263
       PVE::API2::Qemu::__ANON__(HASH(0x559cd0700df8)) called at
/usr/share/perl5/PVE/RESTHandler.pm line 499
       PVE::RESTHandler::handle("PVE::API2::Qemu", HASH(0x559cd05deb98),
HASH(0x559cd0700df8), 1) called at /usr/share/perl5/PVE/RESTHandler.pm line
985
       eval {...} called at /usr/share/perl5/PVE/RESTHandler.pm line 968

       PVE::RESTHandler::cli_handler("PVE::API2::Qemu", "qm start",
"vm_start", ARRAY(0x559cc99cfee0), ARRAY(0x559cd0745e98),
HASH(0x559cd0745ef8), CODE(0x559cd07091f8), undef) called at
/usr/share/perl5/PVE/CLIHandler.pm line 594
       PVE::CLIHandler::__ANON__(ARRAY(0x559cc99d00c0), undef,
CODE(0x559cd07091f8)) called at /usr/share/perl5/PVE/CLIHandler.pm line 673
       PVE::CLIHandler::run_cli_handler("PVE::CLI::qm") called at
/usr/sbin/qm line 8


On Mon, Jun 2, 2025 at 2:42 PM Fabian Grünbichler <
f.gruenbichler at proxmox.com> wrote:

>
> > Denis Kanchev <denis.kanchev at storpool.com> wrote on 02.06.2025 11:18
> > CEST:
> >
> >
> > My bad :) in terms of Proxmox it must be handing over the storage
> > control - the storage plugin function activate_volume() is called in
> > our case, which moves the storage to the new VM.
> > So no data is moved across the nodes and only the volumes get
> > re-attached.
> > Thanks for the plentiful information
>
> okay!
>
> so you basically special case this "volume is active on two nodes" case
> which should only happen during a live migration, and that somehow runs
> into an issue if the migration is aborted because there is some suspected
> race somewhere?
>
> as part of a live migration, the sequence should be:
>
> node A: migration starts
> node A: start request for target VM on node B (over SSH)
> node B: `qm start ..` is called
> node B: qm start will activate volumes
> node B: qm start returns
> node A: migration starts
> node A/B: some fatal error
> node A: cancel migration (via QMP/the source VM running on node A)
> node A: request to stop target VM on node B (over SSH)
> node B: `qm stop ..` called
> node B: qm stop will deactivate volumes
>
> I am not sure where another activate_volume call after node A has started
> the migration could happen? at that point, node A still has control over
> the VM (ID), so nothing in PVE should operate on it other than the
> selective calls made as part of the migration, which are basically only
> querying migration status and error handling at that point..
>
> it would still be good to know what actually got OOM-killed in your case..
> was it the `qm start`? was it the `kvm` process itself? something entirely
> else?
>
> if you can reproduce the issue, you could also add logging in
> activate_volume to find out the exact call path (e.g., log the call stack
> somewhere), maybe that helps find the exact scenario that you are seeing..
>
>

