[pve-devel] pve api offline during log rotation

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Nov 9 11:03:00 CET 2017


Hi,

On 09/27/2017 08:21 AM, Stefan Priebe - Profihost AG wrote:
> Am 21.09.2017 um 15:30 schrieb Thomas Lamprecht:
>> On 09/20/2017 01:26 PM, Stefan Priebe - Profihost AG wrote:
>>>> [snip]
>>> thanks for the reply. Does this also apply to PVE 4? Sorry i missed that
>>> info.
>>>
>>
>> Yes. But it should happen more often with PVE 5.
>> Affecting packages had the fix already backported and applied in git.
>> It may need a little time until all get released to all debian repos,
>> though.
> 
> sorry for the late reply. I updated all pkgs to latest git stable-4 branch.
> 
> But it still happens. On 06:25 i get a lot of:
> 595 Connection refused
> or
> 500 Can't connect to XXXXXX.de:8006
> 
> messages.
> 
> The journal looks like this:
> Sep 27 06:25:03 systemd[1]: Stopping PVE API Proxy Server...
> Sep 27 06:25:04 pveproxy[47485]: received signal TERM

This is a `systemctl restart pveproxy` not a `systemctl reload pveproxy`
The first does a full shutdown and then full startup cycle (with new PID)
the latter does not a complete full shutdown but a rexec on itself.

On an update of pve-manager reload is used, so I get:

Nov 09 10:08:48 dev5 systemd[1]: Reloading PVE API Proxy Server.
Nov 09 10:08:49 dev5 pvedaemon[1761]: restarting server
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11849 finished
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11848 finished
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11847 finished
Nov 09 10:08:49 dev5 pvedaemon[1761]: starting 3 worker(s)
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19464 started
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19465 started
Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19466 started
Nov 09 10:08:49 dev5 pveproxy[19458]: send HUP to 14616
Nov 09 10:08:49 dev5 pveproxy[14616]: received signal HUP
Nov 09 10:08:49 dev5 pveproxy[14616]: server closing
Nov 09 10:08:49 dev5 pveproxy[14616]: server shutdown (restart)
Nov 09 10:08:49 dev5 pveproxy[15962]: worker exit
Nov 09 10:08:49 dev5 pveproxy[15963]: worker exit
Nov 09 10:08:49 dev5 pveproxy[15961]: worker exit
Nov 09 10:08:49 dev5 systemd[1]: Reloaded PVE API Proxy Server.

> Sep 27 06:25:04 pveproxy[47485]: server closing
> Sep 27 06:25:04 pveproxy[25445]: worker exit
> Sep 27 06:25:04 pveproxy[24339]: worker exit
> Sep 27 06:25:04 pveproxy[16705]: worker exit
> Sep 27 06:25:04 pveproxy[52966]: worker exit
> Sep 27 06:25:04 pveproxy[16550]: worker exit
> Sep 27 06:25:04 pveproxy[18493]: worker exit
> Sep 27 06:25:04 pveproxy[3140]: worker exit
> Sep 27 06:25:04 pveproxy[12704]: worker exit
> Sep 27 06:25:04 pveproxy[13839]: worker exit
> Sep 27 06:25:04 pveproxy[47485]: worker 16705 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 13839 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 12704 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 17652 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 3140 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 52966 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 24339 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 25445 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 18493 finished
> Sep 27 06:25:04 pveproxy[47485]: worker 16550 finished
> Sep 27 06:25:04 pveproxy[47485]: server stopped
> Sep 27 06:25:05 systemd[1]: Starting PVE API Proxy Server...
> Sep 27 06:25:06 pveproxy[28612]: Using '/etc/pve/local/pveproxy-ssl.pem'
> as certificate for the web interface.
> Sep 27 06:25:06 pveproxy[28617]: starting server
> Sep 27 06:25:06 pveproxy[28617]: starting 10 worker(s)
> Sep 27 06:25:06 pveproxy[28617]: worker 28618 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28619 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28620 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28621 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28622 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28623 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28624 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28625 started
> Sep 27 06:25:06 pveproxy[28617]: worker 28626 started
> Sep 27 06:25:06 systemd[1]: Started PVE API Proxy Server.
> Sep 27 06:25:06 pveproxy[28617]: worker 28628 started
> 
> The problem still is that pveproxy is stopped and started again. So
> there is a gap where no new connections get accepted.
> 
> Normal behaviour for other deamons i know is to use a special SIGNAL to
> just reopen the logs and do not stop and start the daemon.
> 

We do a re-exec on "ourself" (from the daemons POV), and the intend is to
leave non-idle child workers untouched, but the logic doing this is a bit
flawed as all current worker child always receive a TERM signal.
Here the HTTP server worker wait at least for active connection to end,
but new ones do not get accepted. We directly restart after that, but yes,
depending on load there can be a time window where no one is there to
accept connections.
I'd rather not send the TERM signal in the case where the
"leave_children_open_on_reload" option is set and we're restarting but
just restart, passing the current worker PIDs over to our new self
(this gets already done). There on startup then start new workers and
message the old ones to not  accept new connections and terminate
gracefully as soon as possible. Now there is never a time where no active
listening worker would there. I try to give it a look.

cheers,
Thomas




More information about the pve-devel mailing list