[pve-devel] pve api offline during log rotation

Stefan Priebe - Profihost AG s.priebe at profihost.ag
Thu Nov 9 13:08:14 CET 2017


Hello,

yes that's what i'm talking about. The logfile rotation script DOES a
restart not a reload.

See here:
https://git.proxmox.com/?p=pve-manager.git;a=blob_plain;f=debian/pve.logrotate;hb=HEAD

Greets,
Stefan
Am 09.11.2017 um 11:03 schrieb Thomas Lamprecht:
> Hi,
> 
> On 09/27/2017 08:21 AM, Stefan Priebe - Profihost AG wrote:
>> Am 21.09.2017 um 15:30 schrieb Thomas Lamprecht:
>>> On 09/20/2017 01:26 PM, Stefan Priebe - Profihost AG wrote:
>>>>> [snip]
>>>> thanks for the reply. Does this also apply to PVE 4? Sorry i missed that
>>>> info.
>>>>
>>>
>>> Yes. But it should happen more often with PVE 5.
>>> Affecting packages had the fix already backported and applied in git.
>>> It may need a little time until all get released to all debian repos,
>>> though.
>>
>> sorry for the late reply. I updated all pkgs to latest git stable-4 branch.
>>
>> But it still happens. On 06:25 i get a lot of:
>> 595 Connection refused
>> or
>> 500 Can't connect to XXXXXX.de:8006
>>
>> messages.
>>
>> The journal looks like this:
>> Sep 27 06:25:03 systemd[1]: Stopping PVE API Proxy Server...
>> Sep 27 06:25:04 pveproxy[47485]: received signal TERM
> 
> This is a `systemctl restart pveproxy` not a `systemctl reload pveproxy`
> The first does a full shutdown and then full startup cycle (with new PID)
> the latter does not a complete full shutdown but a rexec on itself.
> 
> On an update of pve-manager reload is used, so I get:
> 
> Nov 09 10:08:48 dev5 systemd[1]: Reloading PVE API Proxy Server.
> Nov 09 10:08:49 dev5 pvedaemon[1761]: restarting server
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11849 finished
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11848 finished
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 11847 finished
> Nov 09 10:08:49 dev5 pvedaemon[1761]: starting 3 worker(s)
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19464 started
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19465 started
> Nov 09 10:08:49 dev5 pvedaemon[1761]: worker 19466 started
> Nov 09 10:08:49 dev5 pveproxy[19458]: send HUP to 14616
> Nov 09 10:08:49 dev5 pveproxy[14616]: received signal HUP
> Nov 09 10:08:49 dev5 pveproxy[14616]: server closing
> Nov 09 10:08:49 dev5 pveproxy[14616]: server shutdown (restart)
> Nov 09 10:08:49 dev5 pveproxy[15962]: worker exit
> Nov 09 10:08:49 dev5 pveproxy[15963]: worker exit
> Nov 09 10:08:49 dev5 pveproxy[15961]: worker exit
> Nov 09 10:08:49 dev5 systemd[1]: Reloaded PVE API Proxy Server.
> 
>> Sep 27 06:25:04 pveproxy[47485]: server closing
>> Sep 27 06:25:04 pveproxy[25445]: worker exit
>> Sep 27 06:25:04 pveproxy[24339]: worker exit
>> Sep 27 06:25:04 pveproxy[16705]: worker exit
>> Sep 27 06:25:04 pveproxy[52966]: worker exit
>> Sep 27 06:25:04 pveproxy[16550]: worker exit
>> Sep 27 06:25:04 pveproxy[18493]: worker exit
>> Sep 27 06:25:04 pveproxy[3140]: worker exit
>> Sep 27 06:25:04 pveproxy[12704]: worker exit
>> Sep 27 06:25:04 pveproxy[13839]: worker exit
>> Sep 27 06:25:04 pveproxy[47485]: worker 16705 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 13839 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 12704 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 17652 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 3140 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 52966 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 24339 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 25445 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 18493 finished
>> Sep 27 06:25:04 pveproxy[47485]: worker 16550 finished
>> Sep 27 06:25:04 pveproxy[47485]: server stopped
>> Sep 27 06:25:05 systemd[1]: Starting PVE API Proxy Server...
>> Sep 27 06:25:06 pveproxy[28612]: Using '/etc/pve/local/pveproxy-ssl.pem'
>> as certificate for the web interface.
>> Sep 27 06:25:06 pveproxy[28617]: starting server
>> Sep 27 06:25:06 pveproxy[28617]: starting 10 worker(s)
>> Sep 27 06:25:06 pveproxy[28617]: worker 28618 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28619 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28620 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28621 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28622 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28623 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28624 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28625 started
>> Sep 27 06:25:06 pveproxy[28617]: worker 28626 started
>> Sep 27 06:25:06 systemd[1]: Started PVE API Proxy Server.
>> Sep 27 06:25:06 pveproxy[28617]: worker 28628 started
>>
>> The problem still is that pveproxy is stopped and started again. So
>> there is a gap where no new connections get accepted.
>>
>> Normal behaviour for other deamons i know is to use a special SIGNAL to
>> just reopen the logs and do not stop and start the daemon.
>>
> 
> We do a re-exec on "ourself" (from the daemons POV), and the intend is to
> leave non-idle child workers untouched, but the logic doing this is a bit
> flawed as all current worker child always receive a TERM signal.
> Here the HTTP server worker wait at least for active connection to end,
> but new ones do not get accepted. We directly restart after that, but yes,
> depending on load there can be a time window where no one is there to
> accept connections.
> I'd rather not send the TERM signal in the case where the
> "leave_children_open_on_reload" option is set and we're restarting but
> just restart, passing the current worker PIDs over to our new self
> (this gets already done). There on startup then start new workers and
> message the old ones to not  accept new connections and terminate
> gracefully as soon as possible. Now there is never a time where no active
> listening worker would there. I try to give it a look.
> 
> cheers,
> Thomas
> 



More information about the pve-devel mailing list