From mark at openvs.co.uk Mon Dec 4 19:51:04 2017
From: mark at openvs.co.uk (Mark Adams)
Date: Mon, 4 Dec 2017 18:51:04 +0000
Subject: [PVE-User] HA Fencing
In-Reply-To: <0bc124d5-a080-2e38-af47-bdcf36507bc3@proxmox.com>
References: <0bc124d5-a080-2e38-af47-bdcf36507bc3@proxmox.com>
Message-ID: 

Hi,

On 17 November 2017 at 10:55, Thomas Lamprecht wrote:
> Hi,
>
> On 11/16/2017 07:20 PM, Mark Adams wrote:
> > Hi all,
> >
> > It looks like in newer versions of proxmox, the only fencing type advised
> > is watchdog. Is that the case?
> >
>
> Yes, since PVE 4.0 watchdog fencing is the norm.
> There is a patch set of mine which implements the use of external fence
> device, but it has seen no review. I should probably dust it up, look over
> it and re send it again, it's about time we finally get this feature.
>

I think you should definitely get this feature in - I would even say it is
necessary for an enterprise HA setup?

> > Is it still possible to do PDU fencing as well? This should enable us to
> > be able to fail over faster as the fence will not fail if the machine has
> > no power right?
> >
>
> No, at the moment external fence devices are not integrated.
> You can expect a faster recovery with external fence devices, at least in
> simple setups (i.e., not multiple fence device hierarchy)
>
> cheers,
> Thomas
>

From wolfgang.bucher at netland-mn.de Mon Dec 4 19:52:38 2017
From: wolfgang.bucher at netland-mn.de (Wolfgang Bucher)
Date: Mon, 4 Dec 2017 19:52:38 +0100
Subject: [PVE-User] HA Fencing
Message-ID: 

Thank you very much!

Sent via BlackBerry Hub for Android

From: mark at openvs.co.uk
Sent: 4 December 2017 19:52
To: t.lamprecht at proxmox.com
Cc: pve-user at pve.proxmox.com
Subject: Re: [PVE-User] HA Fencing

Hi,

On 17 November 2017 at 10:55, Thomas Lamprecht wrote:
> Hi,
>
> On 11/16/2017 07:20 PM, Mark Adams wrote:
> > Hi all,
> >
> > It looks like in newer versions of proxmox, the only fencing type advised
> > is watchdog. Is that the case?
> >
>
> Yes, since PVE 4.0 watchdog fencing is the norm.
> There is a patch set of mine which implements the use of external fence
> device, but it has seen no review. I should probably dust it up, look over
> it and re send it again, it's about time we finally get this feature.
>

I think you should definitely get this feature in - I would even say it is
necessary for an enterprise HA setup?

> > Is it still possible to do PDU fencing as well? This should enable us to
> > be able to fail over faster as the fence will not fail if the machine has
> > no power right?
> >
>
> No, at the moment external fence devices are not integrated.
> You can expect a faster recovery with external fence devices, at least in
> simple setups (i.e., not multiple fence device hierarchy)
>
> cheers,
> Thomas
>
_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From t.lamprecht at proxmox.com Tue Dec 5 09:52:41 2017
From: t.lamprecht at proxmox.com (Thomas Lamprecht)
Date: Tue, 5 Dec 2017 09:52:41 +0100
Subject: [PVE-User] HA Fencing
In-Reply-To: 
References: <0bc124d5-a080-2e38-af47-bdcf36507bc3@proxmox.com>
Message-ID: <4fac619c-73e9-f95a-f00c-e5817c932e4b@proxmox.com>

Hi,

On 12/04/2017 07:51 PM, Mark Adams wrote:
> On 17 November 2017 at 10:55, Thomas Lamprecht wrote:
>> On 11/16/2017 07:20 PM, Mark Adams wrote:
>>> Hi all,
>>>
>>> It looks like in newer versions of proxmox, the only fencing type advised
>>> is watchdog. Is that the case?
>>>
>>
>> Yes, since PVE 4.0 watchdog fencing is the norm.
>> There is a patch set of mine which implements the use of external fence
>> device, but it has seen no review. I should probably dust it up, look over
>> it and re send it again, it's about time we finally get this feature.
>>
>
> I think you should definitely get this feature in - I would even say it is
> necessary for an enterprise HA setup?
>

Not really necessary. Watchdog-based fencing is no less secure than
traditional fence devices. In fact, as there's much less to configure and
far fewer protocols involved, I'd say it's the opposite. I.e., you do not
have to fire a command over TCP/IP at a device to fence a node. There are
multiple potential problem points: link problems, high load delaying
fencing, fence devices with a setup that is not well tested, at least not
under failure conditions, ...
A watchdog, which triggers as soon as the node does not pull it up,
independent of link failures or cluster load, is the safer bet here. They
are often the norm in highly secure, critical embedded systems too, not
without reason.
It's the difference between an emergency shutdown button and a dead man's
switch.

Maybe you didn't even mean the reliability standpoint, but that a better
best-case SLA could be possible with fence devices?

But nonetheless, agreeing that we should really get it in. I'll try to pick
up the series before this month ends, after the Cluster over API stuff got
in.

cheers,
Thomas

From mark at openvs.co.uk Tue Dec 5 10:25:50 2017
From: mark at openvs.co.uk (Mark Adams)
Date: Tue, 5 Dec 2017 09:25:50 +0000
Subject: [PVE-User] HA Fencing
In-Reply-To: <4fac619c-73e9-f95a-f00c-e5817c932e4b@proxmox.com>
References: <0bc124d5-a080-2e38-af47-bdcf36507bc3@proxmox.com> <4fac619c-73e9-f95a-f00c-e5817c932e4b@proxmox.com>
Message-ID: 

On 5 December 2017 at 08:52, Thomas Lamprecht wrote:
> Hi,
>
> On 12/04/2017 07:51 PM, Mark Adams wrote:
> > On 17 November 2017 at 10:55, Thomas Lamprecht wrote:
> >> On 11/16/2017 07:20 PM, Mark Adams wrote:
> >>> Hi all,
> >>>
> >>> It looks like in newer versions of proxmox, the only fencing type
> >>> advised is watchdog. Is that the case?
> >>>
> >>
> >> Yes, since PVE 4.0 watchdog fencing is the norm.
> >> There is a patch set of mine which implements the use of external fence
> >> device, but it has seen no review. I should probably dust it up, look
> >> over it and re send it again, it's about time we finally get this
> >> feature.
> >>
> >
> > I think you should definitely get this feature in - I would even say it
> > is necessary for an enterprise HA setup?
> >
>
> Not really necessary. Watchdog-based fencing is no less secure than
> traditional fence devices. In fact, as there's much less to configure and
> far fewer protocols involved, I'd say it's the opposite. I.e., you do not
> have to fire a command over TCP/IP at a device to fence a node. There are
> multiple potential problem points: link problems, high load delaying
> fencing, fence devices with a setup that is not well tested, at least not
> under failure conditions, ...
> A watchdog, which triggers as soon as the node does not pull it up,
> independent of link failures or cluster load, is the safer bet here. They
> are often the norm in highly secure, critical embedded systems too, not
> without reason.
> It's the difference between an emergency shutdown button and a dead man's
> switch.
>
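(The dead-man-switch idea quoted above is easy to picture with the standard
Linux /dev/watchdog interface. The following is a conceptual sketch only,
assuming a free softdog or hardware watchdog device - it is not how PVE
implements this; on a PVE node the watchdog-mux daemon normally holds
/dev/watchdog on behalf of the HA services:

    # conceptual sketch: keep "petting" the watchdog
    while true; do
        echo 1 > /dev/watchdog   # each write resets the watchdog countdown
        sleep 10                 # must stay well below the watchdog timeout
    done

As soon as the writes stop - node hang, kernel panic, the petting process
dying - the timer expires and the machine is reset on its own, with no
network link or fence command involved.)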
AFAIK it's the only way to know for sure that your server has actually been
fenced when it is not contactable by other means, for instance some network
issue on the host.

Yes, the watchdog on the machine that goes offline should fence itself, but
still the only way to know for sure that the machine is dead is to power it
off, right?

> Maybe you didn't even mean the reliability standpoint, but that a better
> best-case SLA could be possible with fence devices?
>

This does make a difference too, it could fail over in seconds with faster
fencing.

> But nonetheless, agreeing that we should really get it in. I'll try to pick
> up the series before this month ends, after the Cluster over API stuff got
> in.
>

Thanks, it would be great to see it in.

> cheers,
> Thomas
>

From t.lamprecht at proxmox.com Tue Dec 5 11:05:11 2017
From: t.lamprecht at proxmox.com (Thomas Lamprecht)
Date: Tue, 5 Dec 2017 11:05:11 +0100
Subject: [PVE-User] HA Fencing
In-Reply-To: 
References: <0bc124d5-a080-2e38-af47-bdcf36507bc3@proxmox.com> <4fac619c-73e9-f95a-f00c-e5817c932e4b@proxmox.com>
Message-ID: 

On 12/05/2017 10:25 AM, Mark Adams wrote:
> On 5 December 2017 at 08:52, Thomas Lamprecht wrote:
>> On 12/04/2017 07:51 PM, Mark Adams wrote:
>>> On 17 November 2017 at 10:55, Thomas Lamprecht wrote:
>>>> On 11/16/2017 07:20 PM, Mark Adams wrote:
>>>>> Hi all,
>>>>>
>>>>> It looks like in newer versions of proxmox, the only fencing type
>>>>> advised is watchdog. Is that the case?
>>>>>
>>>>
>>>> Yes, since PVE 4.0 watchdog fencing is the norm.
>>>> There is a patch set of mine which implements the use of external fence
>>>> device, but it has seen no review. I should probably dust it up, look
>>>> over it and re send it again, it's about time we finally get this feature.
>>>>
>>>
>>> I think you should definitely get this feature in - I would even say it
>>> is necessary for an enterprise HA setup?
>>>
>>
>> Not really necessary. Watchdog-based fencing is no less secure than
>> traditional fence devices. In fact, as there's much less to configure and
>> far fewer protocols involved, I'd say it's the opposite. I.e., you do not
>> have to fire a command over TCP/IP at a device to fence a node. There are
>> multiple potential problem points: link problems, high load delaying
>> fencing, fence devices with a setup that is not well tested, at least not
>> under failure conditions, ...
>> A watchdog, which triggers as soon as the node does not pull it up,
>> independent of link failures or cluster load, is the safer bet here. They
>> are often the norm in highly secure, critical embedded systems too, not
>> without reason.
>> It's the difference between an emergency shutdown button and a dead man's
>> switch.
>>
>
> AFAIK it's the only way to know for sure that your server has actually
> been fenced when it is not contactable by other means, for instance some
> network issue on the host.
>

Both the fence devices and a watchdog can possibly be "wrong", thus we
*always* acquire a cluster-wide lock to ensure that we only do anything
HA-related if we're in the quorate partition and in an OK state.
With the watchdog you know for sure that it released all resources if the
node was out of the quorate partition for a certain time. We then try to
acquire the node's local resource manager lock; only then do we start
recovery of the fenced services. This lock, together with the watchdog,
guarantees us that we do not access the same resource twice.
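(The pieces described above - quorum, the CRM/LRM locks and the watchdog
feeder - can be observed with a few read-only commands on a PVE 5.x node;
this is only a quick reference and the exact output varies by version:

    ha-manager status                       # CRM master, per-node LRM state, HA service states
    pvecm status                            # corosync / quorum view of the cluster
    systemctl status watchdog-mux.service   # the daemon feeding the hardware or softdog watchdog
    journalctl -u pve-ha-crm -u pve-ha-lrm --since "1 hour ago"   # recent fencing/recovery activity

None of these commands change any state.)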
Even if the node now starts up OK again, it won't get its lock immediately
and thus won't start any HA service. Only once the recovery has taken place
and completed can it reintegrate into the cluster and do work again.
If you just power it down with an external fence device it always needs
manual intervention; with the watchdog mechanism you won't need that if the
source of the quorum loss was a temporary switch hiccup or similar - a bit
rare, but not unheard of.

> Yes, the watchdog on the machine that goes offline should fence itself, but
> still the only way to know for sure that the machine is dead is to power it
> off, right?
>

Not necessarily (see above). Also, network fencing is a thing, i.e. cut all
network links related to shared resources (storage, public network, ...).
This allows investigating the still running, but fenced off, node for the
failure reason - if desired.

>
>> Maybe you didn't even mean the reliability standpoint, but that a better
>> best-case SLA could be possible with fence devices?
>>
>
> This does make a difference too, it could fail over in seconds with faster
> fencing.
>

Depends a bit on the fencing devices used; I had some cases where it was
slower than I expected when testing, but yes, still a tad faster than the
"wait for the watchdog+lock" approach.

cheers,
Thomas

Maybe you can find some more information here, if not read already:
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html

From mark at openvs.co.uk Tue Dec 5 16:43:44 2017
From: mark at openvs.co.uk (Mark Adams)
Date: Tue, 5 Dec 2017 15:43:44 +0000
Subject: [PVE-User] ZFS Replication
Message-ID: 

I'm just trying out the zfs replication in proxmox, nice work! Just a few
questions..

- Is it possible to change the network that does the replication? (i.e. it
  would be good to use a direct connection with balance-rr for throughput)
- Is it possible to replicate between machines that are not in the same
  cluster?

Both can be easily done via zfs send/recv in the CLI of course, but I wonder
if this is possible through the web interface?

And lastly, what is the correct procedure for using a replicated VM, should
it be needed?

Thanks, Mark

From gilberto.nunes32 at gmail.com Tue Dec 5 17:07:06 2017
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Tue, 5 Dec 2017 14:07:06 -0200
Subject: [PVE-User] ZFS Replication
In-Reply-To: 
References: 
Message-ID: 

In my experience, if you set the hosts in /etc/hosts with different IP
addresses, you can use a different network for cluster traffic and
replication.

On 5 Dec 2017 13:44, "Mark Adams" wrote:

I'm just trying out the zfs replication in proxmox, nice work! Just a few
questions..

- Is it possible to change the network that does the replication? (i.e. it
  would be good to use a direct connection with balance-rr for throughput)
- Is it possible to replicate between machines that are not in the same
  cluster?

Both can be easily done via zfs send/recv in the CLI of course, but I wonder
if this is possible through the web interface?

And lastly, what is the correct procedure for using a replicated VM, should
it be needed?
Thanks, Mark

_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From w.link at proxmox.com Wed Dec 6 08:05:05 2017
From: w.link at proxmox.com (Wolfgang Link)
Date: Wed, 6 Dec 2017 08:05:05 +0100 (CET)
Subject: [PVE-User] ZFS Replication
In-Reply-To: 
References: 
Message-ID: <1935809464.5.1512543905883@webmail.proxmox.com>

Hi Mark,

> - Is it possible to change the network that does the replication? (i.e. it
>   would be good to use a direct connection with balance-rr for throughput)

You can change the replication network with the 'migration' option in
datacenter.conf.

> - Is it possible to replicate between machines that are not in the same
>   cluster?

For this task you have to use pve-zsync.

> Both can be easily done via zfs send/recv in the CLI of course, but I wonder
> if this is possible through the web interface?

No, it is not.

From davel at upilab.com Wed Dec 6 17:56:11 2017
From: davel at upilab.com (David Lawley)
Date: Wed, 6 Dec 2017 11:56:11 -0500
Subject: [PVE-User] bridge issue after last update
Message-ID: <55d665ad-2fdc-83fe-36e1-31c8442558c6@upilab.com>

Have a single node server for a test bed sort of..

Applied updates this morning.

Afterward I lost connectivity between the network and bridged VMs

This was good practice as there was no pressure ;)

Anyway, I found that these items had been changed from 0 to 1

bridge-nf-call-arptables
bridge-nf-call-iptables
bridge-nf-call-ip6tables

Not sure how it got changed; I checked it against my production servers

Did something else happen that I missed that might have been part of Prox?
Just not yet clear what conditions would have done it. Or just a one-off
crap shoot that will never happen again?

proxmox-ve: 5.1-30 (running kernel: 4.13.8-3-pve)
pve-manager: 5.1-38 (running version: 5.1-38/1e9bc777)
pve-kernel-4.13.3-1-pve: 4.13.3-2
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-7
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-17
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-22
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-3
pve-container: 2.0-17
pve-firewall: 3.0-4
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9

From andreas at mx20.org Wed Dec 6 18:32:00 2017
From: andreas at mx20.org (Andreas Herrmann)
Date: Wed, 6 Dec 2017 18:32:00 +0100
Subject: [PVE-User] bridge issue after last update
In-Reply-To: <55d665ad-2fdc-83fe-36e1-31c8442558c6@upilab.com>
References: <55d665ad-2fdc-83fe-36e1-31c8442558c6@upilab.com>
Message-ID: 

Hi there,

On 06.12.2017 17:56, David Lawley wrote:
> Have a single node server for a test bed sort of..
>
> Applied updates this morning.
>
> Afterward I lost connectivity between the network and bridged VMs
>
> This was good practice as there was no pressure ;)
>
> Anyway, I found that these items had been changed from 0 to 1
>
> bridge-nf-call-arptables
> bridge-nf-call-iptables
> bridge-nf-call-ip6tables
>
> Not sure how it got changed; I checked it against my production servers

ACK, but the problem is tricky:

/etc/sysctl.d/pve.conf was changed to /etc/sysctl.d/pve.conf/sysctl.conf
and is ignored.

Have a look at Manual page sysctl.conf(5): /etc/sysctl.d/*.conf

Andreas

From andreas at mx20.org Wed Dec 6 18:43:46 2017
From: andreas at mx20.org (Andreas Herrmann)
Date: Wed, 6 Dec 2017 18:43:46 +0100
Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode
Message-ID: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org>

Hi there,

be warned: the current update may reboot your server if in HA-Mode. It
happened on 2 of 5 servers!

The following packages will be upgraded:
libpve-common-perl (5.0-20 => 5.0-22)
libpve-http-server-perl (2.0-6 => 2.0-7)
libpve-storage-perl (5.0-16 => 5.0-17)
lxc-pve (2.1.0-2 => 2.1.1-2)
lxcfs (2.0.7-pve4 => 2.0.8-1)
pve-cluster (5.0-15 => 5.0-17)
pve-firewall (3.0-3 => 3.0-4)
pve-ha-manager (2.0-3 => 2.0-4)
pve-manager (5.1-36 => 5.1-38)
pve-qemu-kvm (2.9.1-2 => 2.9.1-3)
spiceterm (3.0-4 => 3.0-5)
vncterm (1.5-2 => 1.5-3)

Installing new version of config file /etc/rc.d/init.d/lxcfs ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up pve-cluster (5.0-17) ...
Removing obsolete conffile /etc/default/pve-cluster ...
Setting up pve-firewall (3.0-4) ...
Setting up lxc-pve (2.1.1-2) ...
Installing new version of config file
/etc/apparmor.d/abstractions/lxc/container-base ...
Setting up libpve-http-server-perl (2.0-7) ...
Setting up libpve-storage-perl (5.0-17) ...
Setting up pve-ha-manager (2.0-4) ...
watchdog-mux.service is a disabled or a static unit, not starting it.
Setting up pve-manager (5.1-38) ...
Installing new version of config file /etc/logrotate.d/pve ...
....
REBOOT

root at nethcn-b1:~# apt-get upgrade
E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to
correct the problem.
root at nethcn-b1:~# dpkg --configure -a
Setting up pve-manager (5.1-38) ...
Processing triggers for libc-bin (2.24-11+deb9u1) ...

Andreas

From davel at upilab.com Wed Dec 6 19:25:35 2017
From: davel at upilab.com (David Lawley)
Date: Wed, 6 Dec 2017 13:25:35 -0500
Subject: [PVE-User] bridge issue after last update
In-Reply-To: 
References: <55d665ad-2fdc-83fe-36e1-31c8442558c6@upilab.com>
Message-ID: <91bed31a-9083-8772-cef6-45808cbf89f5@upilab.com>

On 12/6/2017 12:32 PM, Andreas Herrmann wrote:

OK, got it. I see the area you are talking about

Guess it must be missing it, as fs.aio-max-nr is incorrect too.

sysctl -a is showing fs.aio-max-nr = 65536

pve.conf is supposed to set it to fs.aio-max-nr = 1048576

My install may be botched, it's been inop a few times since it's an older
server that I have had to fall back kernel versions once or twice, since 5.1
has been hit/miss on some older hardware...

> ACK, but the problem is tricky:
>
> /etc/sysctl.d/pve.conf was changed to /etc/sysctl.d/pve.conf/sysctl.conf
> and is ignored.
>
> Have a look at Manual page sysctl.conf(5): /etc/sysctl.d/*.conf
>
> Andreas
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

From f.gruenbichler at proxmox.com Wed Dec 6 20:38:43 2017
From: f.gruenbichler at proxmox.com (Fabian Grünbichler)
Date: Wed, 6 Dec 2017 20:38:43 +0100
Subject: [PVE-User] bridge issue after last update
In-Reply-To: <91bed31a-9083-8772-cef6-45808cbf89f5@upilab.com>
References: <55d665ad-2fdc-83fe-36e1-31c8442558c6@upilab.com> <91bed31a-9083-8772-cef6-45808cbf89f5@upilab.com>
Message-ID: <20171206193843.e5mfuygx26lybepr@nora.maurer-it.com>

On Wed, Dec 06, 2017 at 01:25:35PM -0500, David Lawley wrote:
>
> On 12/6/2017 12:32 PM, Andreas Herrmann wrote:
>
> OK, got it. I see the area you are talking about
>
> Guess it must be missing it, as fs.aio-max-nr is incorrect too.
>
> sysctl -a is showing fs.aio-max-nr = 65536
>
> pve.conf is supposed to set it to fs.aio-max-nr = 1048576
>
> My install may be botched, it's been inop a few times since it's an older
> server that I have had to fall back kernel versions once or twice, since 5.1
> has been hit/miss on some older hardware...
>
> > ACK, but the problem is tricky:
> >
> > /etc/sysctl.d/pve.conf was changed to /etc/sysctl.d/pve.conf/sysctl.conf
> > and is ignored.
> >
> > Have a look at Manual page sysctl.conf(5): /etc/sysctl.d/*.conf
> >
> > Andreas

That is a bug that slipped through while refactoring the packaging of
pve-cluster; I'll send a patch to pve-devel and updated packages will be
available tomorrow!

From t.lamprecht at proxmox.com Thu Dec 7 08:57:56 2017
From: t.lamprecht at proxmox.com (Thomas Lamprecht)
Date: Thu, 7 Dec 2017 08:57:56 +0100
Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode
In-Reply-To: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org>
References: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org>
Message-ID: <6e4940d4-6c10-f253-7dad-f93959c111fc@proxmox.com>

Hi,

some more information would be great to check this.
First, do you have a daemon(like) service loading sysctl configs on the fly?
If not, we may rule out the sysctl config problem as a trigger for this.

On 12/06/2017 06:43 PM, Andreas Herrmann wrote:
> Hi there,
>
> be warned: the current update may reboot your server if in HA-Mode. It
> happened on 2 of 5 servers!
>
> The following packages will be upgraded:
> libpve-common-perl (5.0-20 => 5.0-22)
> libpve-http-server-perl (2.0-6 => 2.0-7)
> libpve-storage-perl (5.0-16 => 5.0-17)
> lxc-pve (2.1.0-2 => 2.1.1-2)
> lxcfs (2.0.7-pve4 => 2.0.8-1)
> pve-cluster (5.0-15 => 5.0-17)
> pve-firewall (3.0-3 => 3.0-4)

Can you describe your firewall setup a bit?
Do you use Firewall groups?

> pve-ha-manager (2.0-3 => 2.0-4)
> pve-manager (5.1-36 => 5.1-38)
> pve-qemu-kvm (2.9.1-2 => 2.9.1-3)
> spiceterm (3.0-4 => 3.0-5)
> vncterm (1.5-2 => 1.5-3)
>
>
> Installing new version of config file /etc/rc.d/init.d/lxcfs ...
> Processing triggers for man-db (2.7.6.1-2) ...
> Setting up pve-cluster (5.0-17) ...
> Removing obsolete conffile /etc/default/pve-cluster ...
> Setting up pve-firewall (3.0-4) ...
> Setting up lxc-pve (2.1.1-2) ...
> Installing new version of config file
> /etc/apparmor.d/abstractions/lxc/container-base ...
> Setting up libpve-http-server-perl (2.0-7) ...
> Setting up libpve-storage-perl (5.0-17) ...
> Setting up pve-ha-manager (2.0-4) ...
> watchdog-mux.service is a disabled or a static unit, not starting it.
> Setting up pve-manager (5.1-38) ...
> Installing new version of config file /etc/logrotate.d/pve ...
> ....
> REBOOT
>

Did you get some log entries around that time? Or a persistent journal?

thanks,
Thomas

>
> root at nethcn-b1:~# apt-get upgrade
> E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to
> correct the problem.
> root at nethcn-b1:~# dpkg --configure -a
> Setting up pve-manager (5.1-38) ...
> Processing triggers for libc-bin (2.24-11+deb9u1) ...
>
>
> Andreas
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

From andreas at mx20.org Thu Dec 7 13:08:38 2017
From: andreas at mx20.org (Andreas Herrmann)
Date: Thu, 7 Dec 2017 13:08:38 +0100
Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode
In-Reply-To: <6e4940d4-6c10-f253-7dad-f93959c111fc@proxmox.com>
References: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org> <6e4940d4-6c10-f253-7dad-f93959c111fc@proxmox.com>
Message-ID: 

Hi,

On 07.12.2017 08:57, Thomas Lamprecht wrote:
> some more information would be great to check this.
> First, do you have a daemon(like) service loading sysctl configs on the
> fly? If not, we may rule out the sysctl config problem as a trigger for
> this.

No. It's a quite new installation from the ISO, not an upgrade from
Proxmox 4, and with very few modifications.

> Can you describe your firewall setup a bit?
> Do you use Firewall groups?

We don't use the Proxmox firewall at all. We have uif-based rules and no
restrictions between the Proxmox hosts:

# Access between the nodes
in+ s=nethcn-b-vl58(4),nethcn-b-vl802(4)

# The two Corosync HA rings
in+ i=coro1 s=nethcn-b-ha1(4)
in+ i=coro2 s=nethcn-b-ha2(4)

# Ceph traffic
in+ i=ceph s=nethcn-b-store(4)

> Did you get some log entries around that time?
> Or a persistent journal?

Some logs are attached. nethcn-b5 rebooted after I restarted services with
needrestart. nethcn-b4 rebooted in the middle of the update. Maybe it is a
problem with communication between watchdog-mux.service and Proxmox. Maybe I
should change to the hardware watchdog provided by the Supermicro X10SRW-F
mainboard.

Andreas
-------------- next part --------------
Dec 6 17:51:08 nethcn-b2 systemd[1]: Created slice User Slice of root.
Dec 6 17:51:08 nethcn-b2 systemd[1]: Starting User Manager for UID 0...
Dec 6 17:51:08 nethcn-b2 systemd[1]: Started Session 2294 of user root.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Listening on GnuPG cryptographic agent and passphrase cache.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Reached target Paths.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Reached target Timers.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Listening on GnuPG network certificate management daemon.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Listening on GnuPG cryptographic agent (access for web browsers).
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Reached target Sockets.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Reached target Basic System.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Reached target Default.
Dec 6 17:51:08 nethcn-b2 systemd[25841]: Startup finished in 21ms.
Dec 6 17:51:08 nethcn-b2 systemd[1]: Started User Manager for UID 0.
Dec 6 17:51:14 nethcn-b2 systemd[1]: Reloading.
Dec 6 17:51:14 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 15min 25.622828s random time. Dec 6 17:51:14 nethcn-b2 systemd[1]: apt-daily.timer: Adding 6h 6min 27.629758s random time. Dec 6 17:51:14 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 24min 42.371776s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 10h 23min 49.731837s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 25min 49.899301s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 1h 1min 44.339369s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 53min 41.700970s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 4h 19min 32.155871s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 33min 33.939842s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 10h 3min 29.743451s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 26min 34.968617s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 10h 29min 18.753427s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 28min 47.463310s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 1h 32min 44.821502s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 15min 11.470765s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 43min 12.485912s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 11min 29.546795s random time. Dec 6 17:51:15 nethcn-b2 systemd[1]: apt-daily.timer: Adding 2h 35min 42.196692s random time. Dec 6 17:51:16 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 49min 31.062780s random time. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily.timer: Adding 4h 32min 30.982647s random time. Dec 6 17:51:17 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 45min 39.993857s random time. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily.timer: Adding 3h 8min 26.608575s random time. Dec 6 17:51:17 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 37min 6.641514s random time. Dec 6 17:51:17 nethcn-b2 systemd[1]: apt-daily.timer: Adding 11h 34min 54.498924s random time. Dec 6 17:51:18 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:18 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 8min 17.506967s random time. Dec 6 17:51:18 nethcn-b2 systemd[1]: apt-daily.timer: Adding 3h 55min 54.889100s random time. Dec 6 17:51:18 nethcn-b2 systemd[1]: Stopping The Proxmox VE cluster filesystem... Dec 6 17:51:18 nethcn-b2 pmxcfs[9987]: [main] notice: teardown filesystem Dec 6 17:51:20 nethcn-b2 pmxcfs[9987]: [main] notice: exit proxmox configuration filesystem (0) Dec 6 17:51:20 nethcn-b2 systemd[1]: Stopped The Proxmox VE cluster filesystem. Dec 6 17:51:20 nethcn-b2 systemd[1]: Starting The Proxmox VE cluster filesystem... 
Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: update cluster info (cluster name NETHCN-B, version = 7) Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: node has quorum Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: members: 1/10104, 2/28566, 3/29106, 4/30188, 5/10652 Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: starting data syncronisation Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: received sync request (epoch 1/10104/00000011) Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: members: 1/10104, 2/28566, 3/29106, 4/30188, 5/10652 Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: starting data syncronisation Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: received sync request (epoch 1/10104/00000011) Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: received all states Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: leader is 1/10104 Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: synced members: 1/10104, 2/28566, 3/29106, 4/30188, 5/10652 Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [dcdb] notice: all data is up to date Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: received all states Dec 6 17:51:20 nethcn-b2 pmxcfs[28566]: [status] notice: all data is up to date Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ERROR: Connection refused Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: server received shutdown request Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: server stopped Dec 6 17:51:20 nethcn-b2 watchdog-mux[3397]: client did not stop watchdog - disable watchdog updates Dec 6 17:51:20 nethcn-b2 systemd[1]: pve-ha-crm.service: Main process exited, code=exited, status=255/n/a Dec 6 17:51:21 nethcn-b2 systemd[1]: Started The Proxmox VE cluster filesystem. Dec 6 17:51:21 nethcn-b2 systemd[1]: Reloading Proxmox VE firewall. Dec 6 17:51:21 nethcn-b2 systemd[1]: pve-ha-crm.service: Unit entered failed state. Dec 6 17:51:21 nethcn-b2 systemd[1]: pve-ha-crm.service: Failed with result 'exit-code'. Dec 6 17:51:21 nethcn-b2 pve-ha-lrm[13145]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:51:21 nethcn-b2 watchdog-mux[3397]: exit watchdog-mux with active connections Dec 6 17:51:21 nethcn-b2 kernel: [88876.361477] watchdog: watchdog0: watchdog did not stop! Dec 6 17:51:21 nethcn-b2 pve-firewall[28714]: send HUP to 10566 Dec 6 17:51:21 nethcn-b2 pve-firewall[10566]: received signal HUP Dec 6 17:51:21 nethcn-b2 pve-firewall[10566]: server shutdown (restart) Dec 6 17:51:21 nethcn-b2 systemd[1]: Reloaded Proxmox VE firewall. Dec 6 17:51:22 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 43min 57.561346s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily.timer: Adding 1h 53min 46.711159s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 23min 666.457ms random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily.timer: Adding 5h 40min 48.607339s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: Stopping Proxmox VE firewall logger... 
Dec 6 17:51:22 nethcn-b2 pvepw-logger[22772]: received terminate request (signal) Dec 6 17:51:22 nethcn-b2 pvepw-logger[22772]: stopping pvefw logger Dec 6 17:51:22 nethcn-b2 pve-firewall[10566]: restarting server Dec 6 17:51:22 nethcn-b2 systemd[1]: Stopped Proxmox VE firewall logger. Dec 6 17:51:22 nethcn-b2 systemd[1]: Starting Proxmox VE firewall logger... Dec 6 17:51:22 nethcn-b2 pvefw-logger[28896]: starting pvefw logger Dec 6 17:51:22 nethcn-b2 systemd[1]: Started Proxmox VE firewall logger. Dec 6 17:51:22 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 16min 39.381721s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily.timer: Adding 1h 23min 56.458060s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 53min 22.748230s random time. Dec 6 17:51:22 nethcn-b2 systemd[1]: apt-daily.timer: Adding 4h 43min 27.611334s random time. Dec 6 17:51:22 nethcn-b2 kernel: [88877.490542] audit: type=1400 audit(1512579082.769:14): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=28947 comm="apparmor_parser" Dec 6 17:51:22 nethcn-b2 kernel: [88877.677013] audit: type=1400 audit(1512579082.955:15): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default" pid=28951 comm="apparmor_parser" Dec 6 17:51:22 nethcn-b2 kernel: [88877.693940] audit: type=1400 audit(1512579082.955:16): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-cgns" pid=28951 comm="apparmor_parser" Dec 6 17:51:22 nethcn-b2 kernel: [88877.711368] audit: type=1400 audit(1512579082.956:17): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-mounting" pid=28951 comm="apparmor_parser" Dec 6 17:51:23 nethcn-b2 kernel: [88877.729675] audit: type=1400 audit(1512579082.956:18): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-nesting" pid=28951 comm="apparmor_parser" Dec 6 17:51:23 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:23 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 8min 32.222778s random time. Dec 6 17:51:23 nethcn-b2 systemd[1]: apt-daily.timer: Adding 9h 48min 39.647146s random time. Dec 6 17:51:23 nethcn-b2 pvestatd[10618]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:51:23 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:24 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 36min 23.950978s random time. Dec 6 17:51:24 nethcn-b2 systemd[1]: apt-daily.timer: Adding 7h 27min 37.724293s random time. Dec 6 17:51:24 nethcn-b2 systemd[1]: Reloading. Dec 6 17:51:24 nethcn-b2 systemd[1]: apt-daily-upgrade.timer: Adding 28min 57.947572s random time. Dec 6 17:51:24 nethcn-b2 systemd[1]: apt-daily.timer: Adding 3h 8min 51.079398s random time. Dec 6 17:51:24 nethcn-b2 systemd[1]: Started Session 2296 of user root. Dec 6 17:51:24 nethcn-b2 systemd[1]: Stopping PVE Local HA Ressource Manager Daemon... 
Dec 6 17:51:24 nethcn-b2 pve-ha-lrm[13145]: received signal TERM Dec 6 17:51:24 nethcn-b2 pve-ha-lrm[13145]: restart LRM, freeze all services -------------- next part -------------- Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: starting data syncronisation Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [status] notice: members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [status] notice: starting data syncronisation Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received sync request (epoch 1/10104/00000008) Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [status] notice: received sync request (epoch 1/10104/00000008) Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received all states Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: leader is 1/10104 Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: synced members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [dcdb] notice: all data is up to date Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [status] notice: received all states Dec 6 17:27:33 nethcn-b2 pmxcfs[9987]: [status] notice: all data is up to date Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: starting data syncronisation Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [status] notice: members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [status] notice: starting data syncronisation Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received sync request (epoch 1/10104/00000009) Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [status] notice: received sync request (epoch 1/10104/00000009) Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received all states Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: leader is 1/10104 Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: synced members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [dcdb] notice: all data is up to date Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [status] notice: received all states Dec 6 17:27:34 nethcn-b2 pmxcfs[9987]: [status] notice: all data is up to date Dec 6 17:28:00 nethcn-b2 systemd[1]: Starting Proxmox VE replication runner... Dec 6 17:28:01 nethcn-b2 systemd[1]: Started Proxmox VE replication runner. Dec 6 17:28:01 nethcn-b2 CRON[21670]: (root) CMD ( sleep $((RANDOM % 20)); /usr/local/sbin/check_ipmi.sh) Dec 6 17:28:20 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:20Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:23 nethcn-b2 corosync[10210]: notice [TOTEM ] A new membership (192.168.112.1:2524) was formed. Members left: 5 Dec 6 17:28:23 nethcn-b2 corosync[10210]: notice [TOTEM ] Failed to receive the leave message. failed: 5 Dec 6 17:28:23 nethcn-b2 corosync[10210]: [TOTEM ] A new membership (192.168.112.1:2524) was formed. Members left: 5 Dec 6 17:28:23 nethcn-b2 corosync[10210]: [TOTEM ] Failed to receive the leave message. 
failed: 5 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: starting data syncronisation Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: starting data syncronisation Dec 6 17:28:23 nethcn-b2 corosync[10210]: notice [QUORUM] Members[4]: 1 2 3 4 Dec 6 17:28:23 nethcn-b2 corosync[10210]: notice [MAIN ] Completed service synchronization, ready to provide service. Dec 6 17:28:23 nethcn-b2 corosync[10210]: [QUORUM] Members[4]: 1 2 3 4 Dec 6 17:28:23 nethcn-b2 corosync[10210]: [MAIN ] Completed service synchronization, ready to provide service. Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received sync request (epoch 1/10104/0000000A) Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: received sync request (epoch 1/10104/0000000A) Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: received all states Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: leader is 1/10104 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: synced members: 1/10104, 2/9987, 3/9969, 4/9839 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: all data is up to date Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [dcdb] notice: dfsm_deliver_queue: queue length 11 Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: received all states Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: all data is up to date Dec 6 17:28:23 nethcn-b2 pmxcfs[9987]: [status] notice: dfsm_deliver_queue: queue length 26 Dec 6 17:28:24 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:24Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:28 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:28Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:32 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:32Z E! 
Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057623 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057651 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057658 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057665 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057672 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057681 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057688 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:32 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:32.057694 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:12.057620) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058175 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058198 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058212 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058224 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058238 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 
ceph-osd[10845]: 2017-12-06 17:28:33.058250 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058263 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:33.058274 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:13.058171) Dec 6 17:28:33 nethcn-b2 pvestatd[10618]: status update time (9.911 seconds) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058434 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058444 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058447 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058449 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058454 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058456 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058458 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:34.058460 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:14.058430) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382846 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382872 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382880 
7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382890 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382899 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382906 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382912 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:34 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:34.382918 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:14.382840) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058560 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058575 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058578 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058599 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058602 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058604 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058606 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[10845]: 2017-12-06 17:28:35.058609 7fbf4f84c700 -1 osd.11 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 
17:28:11.439837 front 2017-12-06 17:28:11.439837 (cutoff 2017-12-06 17:28:15.058555) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202181 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202194 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202199 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202202 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202205 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202207 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202210 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 ceph-osd[11486]: 2017-12-06 17:28:35.202212 7f38ff3a1700 -1 osd.10 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:14.631386 front 2017-12-06 17:28:14.631386 (cutoff 2017-12-06 17:28:15.202177) Dec 6 17:28:35 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:35Z E! InfluxDB Output Error: Post http://influxdb-b1.as6724.net:8086/write?consistency=any&db=noc_nethcn_telegraf: net/http: request canceled (Client.Timeout exceeded while awaiting headers) Dec 6 17:28:35 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:35Z E! 
Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383153 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383170 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383173 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383178 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383180 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383183 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383185 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11856]: 2017-12-06 17:28:35.383187 7f308a81d700 -1 osd.14 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:13.944993 front 2017-12-06 17:28:13.944993 (cutoff 2017-12-06 17:28:15.383148) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471821 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6815 osd.32 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471853 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6803 osd.33 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471861 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6807 osd.34 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471871 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6827 osd.35 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471877 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6811 osd.36 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 
ceph-osd[11590]: 2017-12-06 17:28:35.471888 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6819 osd.37 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471909 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6823 osd.38 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 ceph-osd[11590]: 2017-12-06 17:28:35.471916 7f6a57e74700 -1 osd.12 37761 heartbeat_check: no reply from 192.168.112.135:6831 osd.39 since back 2017-12-06 17:28:14.871165 front 2017-12-06 17:28:14.871165 (cutoff 2017-12-06 17:28:15.471814) Dec 6 17:28:35 nethcn-b2 kernel: [87510.180709] libceph: osd32 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.184066] libceph: osd33 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.187436] libceph: osd34 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.190854] libceph: osd35 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.194260] libceph: osd36 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.197709] libceph: osd37 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.201060] libceph: osd38 down Dec 6 17:28:35 nethcn-b2 kernel: [87510.204407] libceph: osd39 down Dec 6 17:28:36 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:36Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:40 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:40Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:44 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:44Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:48 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:48Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:52 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:52Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:28:56 nethcn-b2 telegraf[29224]: 2017-12-06T16:28:56Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:29:00 nethcn-b2 telegraf[29224]: 2017-12-06T16:29:00Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:29:00 nethcn-b2 systemd[1]: Starting Proxmox VE replication runner... Dec 6 17:29:01 nethcn-b2 systemd[1]: Started Proxmox VE replication runner. Dec 6 17:29:04 nethcn-b2 telegraf[29224]: 2017-12-06T16:29:04Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:29:08 nethcn-b2 telegraf[29224]: 2017-12-06T16:29:08Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:29:12 nethcn-b2 telegraf[29224]: 2017-12-06T16:29:12Z E! Error in plugin [inputs.ceph]: took longer to collect than collection interval (4s) Dec 6 17:29:29 nethcn-b2 nullmailer[914]: Rescanning queue. Dec 6 17:29:55 nethcn-b2 corosync[10210]: notice [TOTEM ] A new membership (192.168.112.1:2528) was formed. Members joined: 5 Dec 6 17:29:55 nethcn-b2 corosync[10210]: [TOTEM ] A new membership (192.168.112.1:2528) was formed. Members joined: 5 Dec 6 17:29:59 nethcn-b2 corosync[10210]: notice [TOTEM ] Retransmit List: 4 Dec 6 17:29:59 nethcn-b2 corosync[10210]: [TOTEM ] Retransmit List: 4 -------------- next part -------------- Dec 6 17:27:20 nethcn-b5 systemd[1]: Created slice User Slice of root. 
Dec 6 17:27:20 nethcn-b5 systemd[1]: Starting User Manager for UID 0... Dec 6 17:27:20 nethcn-b5 systemd[1]: Started Session 2272 of user root. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Listening on GnuPG cryptographic agent (access for web browsers). Dec 6 17:27:20 nethcn-b5 systemd[9782]: Listening on GnuPG network certificate management daemon. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Listening on GnuPG cryptographic agent (ssh-agent emulation). Dec 6 17:27:20 nethcn-b5 systemd[9782]: Reached target Paths. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Reached target Timers. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Listening on GnuPG cryptographic agent and passphrase cache. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Listening on GnuPG cryptographic agent and passphrase cache (restricted). Dec 6 17:27:20 nethcn-b5 systemd[9782]: Reached target Sockets. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Reached target Basic System. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Reached target Default. Dec 6 17:27:20 nethcn-b5 systemd[9782]: Startup finished in 19ms. Dec 6 17:27:20 nethcn-b5 systemd[1]: Started User Manager for UID 0. Dec 6 17:27:24 nethcn-b5 kernel: [88684.296467] FW INVALID STATE: IN=vlan31 OUT= MAC=24:8a:07:20:c5:56:24:8a:07:20:c5:5e:08:00 SRC=192.168.112.131 DST=192.168.112.135 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=27581 DF PROTO=TCP SPT=34568 DPT=6789 WINDOW=0 RES=0x00 RST URGP=0 Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 54min 17.746014s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 7min 30.184150s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 9min 42.427373s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 35min 49.985856s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 57min 39.588322s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 41min 14.870258s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 14min 20.468467s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 9min 11.475661s random time. Dec 6 17:27:30 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:30 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 38min 19.555617s random time. Dec 6 17:27:31 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:31 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 30min 37.001210s random time. Dec 6 17:27:31 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:32 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 2min 37.078602s random time. Dec 6 17:27:32 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:32 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 40min 55.466580s random time. Dec 6 17:27:32 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:32 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 1min 34.251377s random time. Dec 6 17:27:32 nethcn-b5 systemd[1]: Stopping The Proxmox VE cluster filesystem... 
Dec 6 17:27:32 nethcn-b5 pmxcfs[10077]: [main] notice: teardown filesystem Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ERROR: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: server received shutdown request Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: server stopped Dec 6 17:27:33 nethcn-b5 systemd[1]: pve-ha-crm.service: Main process exited, code=exited, status=255/n/a Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[1] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[1] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: Unable to load access control list: Connection refused Dec 6 17:27:33 nethcn-b5 systemd[1]: pve-ha-crm.service: Control process exited, code=exited status=111 Dec 6 17:27:33 nethcn-b5 systemd[1]: pve-ha-crm.service: Unit entered failed state. Dec 6 17:27:33 nethcn-b5 systemd[1]: pve-ha-crm.service: Failed with result 'exit-code'. Dec 6 17:27:34 nethcn-b5 pmxcfs[10077]: [main] notice: exit proxmox configuration filesystem (0) Dec 6 17:27:34 nethcn-b5 systemd[1]: Stopped The Proxmox VE cluster filesystem. Dec 6 17:27:34 nethcn-b5 systemd[1]: Starting The Proxmox VE cluster filesystem... Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: update cluster info (cluster name NETHCN-B, version = 7) Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: node has quorum Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: starting data syncronisation Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: starting data syncronisation Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: received sync request (epoch 1/10104/00000009) Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: received sync request (epoch 1/10104/00000009) Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: received all states Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: leader is 1/10104 Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: synced members: 1/10104, 2/9987, 3/9969, 4/9839, 5/14789 Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [dcdb] notice: all data is up to date Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: received all states Dec 6 17:27:34 nethcn-b5 pmxcfs[14789]: [status] notice: all data is up to date Dec 6 17:27:35 nethcn-b5 systemd[1]: Started The Proxmox VE cluster filesystem. Dec 6 17:27:35 nethcn-b5 systemd[1]: Reloading Proxmox VE firewall. Dec 6 17:27:35 nethcn-b5 pve-firewall[15692]: send HUP to 10709 Dec 6 17:27:35 nethcn-b5 pve-firewall[10709]: received signal HUP Dec 6 17:27:35 nethcn-b5 pve-firewall[10709]: server shutdown (restart) Dec 6 17:27:35 nethcn-b5 systemd[1]: Reloaded Proxmox VE firewall. 
Dec 6 17:27:35 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:36 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 17min 21.376308s random time. Dec 6 17:27:36 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:36 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 17min 58.242872s random time. Dec 6 17:27:36 nethcn-b5 systemd[1]: Stopping Proxmox VE firewall logger... Dec 6 17:27:36 nethcn-b5 pvepw-logger[24509]: received terminate request (signal) Dec 6 17:27:36 nethcn-b5 pvepw-logger[24509]: stopping pvefw logger Dec 6 17:27:36 nethcn-b5 pve-firewall[10709]: restarting server Dec 6 17:27:36 nethcn-b5 systemd[1]: Stopped Proxmox VE firewall logger. Dec 6 17:27:36 nethcn-b5 systemd[1]: Starting Proxmox VE firewall logger... Dec 6 17:27:36 nethcn-b5 pvefw-logger[15777]: starting pvefw logger Dec 6 17:27:36 nethcn-b5 systemd[1]: Started Proxmox VE firewall logger. Dec 6 17:27:36 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:36 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 26min 56.751953s random time. Dec 6 17:27:36 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:36 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 10min 35.995195s random time. Dec 6 17:27:36 nethcn-b5 kernel: [88695.943725] audit: type=1400 audit(1512577656.632:14): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=15841 comm="apparmor_parser" Dec 6 17:27:36 nethcn-b5 kernel: [88696.136335] audit: type=1400 audit(1512577656.825:15): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default" pid=15845 comm="apparmor_parser" Dec 6 17:27:36 nethcn-b5 kernel: [88696.153706] audit: type=1400 audit(1512577656.825:16): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-cgns" pid=15845 comm="apparmor_parser" Dec 6 17:27:36 nethcn-b5 kernel: [88696.172585] audit: type=1400 audit(1512577656.825:17): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-mounting" pid=15845 comm="apparmor_parser" Dec 6 17:27:36 nethcn-b5 kernel: [88696.191137] audit: type=1400 audit(1512577656.825:18): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-with-nesting" pid=15845 comm="apparmor_parser" Dec 6 17:27:37 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:37 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 25min 26.022837s random time. Dec 6 17:27:37 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:37 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 43min 7.212239s random time. Dec 6 17:27:37 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:37 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 45min 48.451806s random time. Dec 6 17:27:37 nethcn-b5 systemd[1]: Stopping PVE Local HA Ressource Manager Daemon... Dec 6 17:27:38 nethcn-b5 pve-ha-lrm[14351]: received signal TERM Dec 6 17:27:38 nethcn-b5 pve-ha-lrm[14351]: restart LRM, freeze all services Dec 6 17:27:38 nethcn-b5 pve-ha-lrm[14351]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:39 nethcn-b5 pvestatd[10636]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:48 nethcn-b5 pve-ha-lrm[14351]: watchdog closed (disabled) Dec 6 17:27:48 nethcn-b5 pve-ha-lrm[14351]: server stopped Dec 6 17:27:49 nethcn-b5 systemd[1]: Stopped PVE Local HA Ressource Manager Daemon. Dec 6 17:27:49 nethcn-b5 systemd[1]: Starting PVE Cluster Ressource Manager Daemon... 
Dec 6 17:27:49 nethcn-b5 pve-ha-crm[16740]: starting server Dec 6 17:27:49 nethcn-b5 pve-ha-crm[16740]: status change startup => wait_for_quorum Dec 6 17:27:49 nethcn-b5 systemd[1]: Started PVE Cluster Ressource Manager Daemon. Dec 6 17:27:49 nethcn-b5 systemd[1]: Starting PVE Local HA Ressource Manager Daemon... Dec 6 17:27:50 nethcn-b5 pve-ha-lrm[16775]: starting server Dec 6 17:27:50 nethcn-b5 pve-ha-lrm[16775]: status change startup => wait_for_agent_lock Dec 6 17:27:50 nethcn-b5 systemd[1]: Started PVE Local HA Ressource Manager Daemon. Dec 6 17:27:50 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:50 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 42min 4.585055s random time. Dec 6 17:27:50 nethcn-b5 systemd[1]: Stopping PVE Cluster Ressource Manager Daemon... Dec 6 17:27:50 nethcn-b5 pve-ha-crm[16740]: received signal TERM Dec 6 17:27:50 nethcn-b5 pve-ha-crm[16740]: server received shutdown request Dec 6 17:27:54 nethcn-b5 pve-ha-crm[16740]: status change wait_for_quorum => slave Dec 6 17:27:54 nethcn-b5 pve-ha-crm[16740]: server stopped Dec 6 17:27:55 nethcn-b5 systemd[1]: Stopped PVE Cluster Ressource Manager Daemon. Dec 6 17:27:55 nethcn-b5 systemd[1]: Starting PVE Cluster Ressource Manager Daemon... Dec 6 17:27:55 nethcn-b5 pve-ha-crm[17429]: starting server Dec 6 17:27:55 nethcn-b5 pve-ha-crm[17429]: status change startup => wait_for_quorum Dec 6 17:27:55 nethcn-b5 systemd[1]: Started PVE Cluster Ressource Manager Daemon. Dec 6 17:27:56 nethcn-b5 systemd[1]: Reloading. Dec 6 17:27:56 nethcn-b5 systemd[1]: apt-daily-upgrade.timer: Adding 32min 16.418414s random time. Dec 6 17:27:56 nethcn-b5 systemd[1]: Reloading PVE API Daemon. Dec 6 17:27:57 nethcn-b5 pvedaemon[17563]: send HUP to 10872 Dec 6 17:27:57 nethcn-b5 pvedaemon[10872]: received signal HUP Dec 6 17:27:57 nethcn-b5 pvedaemon[10872]: server closing Dec 6 17:27:57 nethcn-b5 pvedaemon[10872]: server shutdown (restart) Dec 6 17:27:57 nethcn-b5 pvedaemon[10874]: worker exit Dec 6 17:27:57 nethcn-b5 pvedaemon[10873]: worker exit Dec 6 17:27:57 nethcn-b5 pvedaemon[10875]: worker exit Dec 6 17:27:57 nethcn-b5 systemd[1]: Reloaded PVE API Daemon. Dec 6 17:27:57 nethcn-b5 systemd[1]: Reloading PVE API Proxy Server. Dec 6 17:27:58 nethcn-b5 pvedaemon[10872]: restarting server Dec 6 17:27:58 nethcn-b5 pvedaemon[10872]: starting 3 worker(s) Dec 6 17:27:58 nethcn-b5 pvedaemon[10872]: worker 17616 started Dec 6 17:27:58 nethcn-b5 pvedaemon[10872]: worker 17617 started Dec 6 17:27:58 nethcn-b5 pvedaemon[10872]: worker 17618 started Dec 6 17:27:58 nethcn-b5 pveproxy[17600]: send HUP to 13530 Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: received signal HUP Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: server closing Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: server shutdown (restart) Dec 6 17:27:58 nethcn-b5 pveproxy[13533]: worker exit Dec 6 17:27:58 nethcn-b5 pveproxy[13532]: worker exit Dec 6 17:27:58 nethcn-b5 pveproxy[13531]: worker exit Dec 6 17:27:58 nethcn-b5 systemd[1]: Reloaded PVE API Proxy Server. Dec 6 17:27:58 nethcn-b5 systemd[1]: Reloading PVE SPICE Proxy Server. Dec 6 17:27:58 nethcn-b5 spiceproxy[17623]: send HUP to 13563 Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: received signal HUP Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: server closing Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: server shutdown (restart) Dec 6 17:27:58 nethcn-b5 spiceproxy[13564]: worker exit Dec 6 17:27:58 nethcn-b5 systemd[1]: Reloaded PVE SPICE Proxy Server. Dec 6 17:27:58 nethcn-b5 systemd[1]: Reloading PVE Status Daemon. 
Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: Using '/etc/pve/local/pveproxy-ssl.pem' as certificate for the web interface. Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: restarting server Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: starting 3 worker(s) Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: worker 17637 started Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: worker 17639 started Dec 6 17:27:58 nethcn-b5 pveproxy[13530]: worker 17640 started Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: restarting server Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: starting 1 worker(s) Dec 6 17:27:58 nethcn-b5 spiceproxy[13563]: worker 17644 started Dec 6 17:27:58 nethcn-b5 pvestatd[17634]: send HUP to 10636 Dec 6 17:27:58 nethcn-b5 pvestatd[10636]: received signal HUP Dec 6 17:27:58 nethcn-b5 pvestatd[10636]: server shutdown (restart) Dec 6 17:27:58 nethcn-b5 systemd[1]: Reloaded PVE Status Daemon. Dec 6 17:27:59 nethcn-b5 pvestatd[10636]: restarting server Dec 6 17:28:00 nethcn-b5 systemd[1]: Starting Proxmox VE replication runner... Dec 6 17:28:00 nethcn-b5 pve-ha-lrm[16775]: successfully acquired lock 'ha_agent_nethcn-b5_lock' Dec 6 17:28:00 nethcn-b5 pve-ha-lrm[16775]: watchdog active Dec 6 17:28:00 nethcn-b5 pve-ha-lrm[16775]: status change wait_for_agent_lock => active Dec 6 17:28:00 nethcn-b5 systemd[1]: Started Proxmox VE replication runner. Dec 6 17:28:00 nethcn-b5 pve-ha-crm[17429]: status change wait_for_quorum => slave Dec 6 17:28:01 nethcn-b5 cron[10376]: (*system*pveupdate) RELOAD (/etc/cron.d/pveupdate) Dec 6 17:28:01 nethcn-b5 CRON[18584]: (root) CMD ( sleep $((RANDOM % 20)); /usr/local/sbin/check_ipmi.sh) Dec 6 17:28:03 nethcn-b5 pvedaemon[10872]: worker 10873 finished Dec 6 17:28:03 nethcn-b5 pvedaemon[10872]: worker 10874 finished Dec 6 17:28:03 nethcn-b5 pvedaemon[10872]: worker 10875 finished Dec 6 17:28:03 nethcn-b5 pveproxy[13530]: worker 13531 finished Dec 6 17:28:03 nethcn-b5 pveproxy[13530]: worker 13532 finished Dec 6 17:28:03 nethcn-b5 pveproxy[13530]: worker 13533 finished Dec 6 17:28:03 nethcn-b5 spiceproxy[13563]: worker 13564 finished Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopping LXC Container Monitoring Daemon... Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopped LXC Container Monitoring Daemon. Dec 6 17:28:06 nethcn-b5 systemd[1]: Started LXC Container Monitoring Daemon. Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopping Proxmox VE watchdog multiplexer... Dec 6 17:28:06 nethcn-b5 watchdog-mux[3747]: got terminate request Dec 6 17:28:06 nethcn-b5 watchdog-mux[3747]: exit watchdog-mux with active connections Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopped Proxmox VE watchdog multiplexer. Dec 6 17:28:06 nethcn-b5 systemd[1]: Started Proxmox VE watchdog multiplexer. Dec 6 17:28:06 nethcn-b5 kernel: [88725.955509] watchdog: watchdog0: watchdog did not stop! Dec 6 17:28:06 nethcn-b5 watchdog-mux[18946]: watchdog active - unable to restart watchdog-mux Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Main process exited, code=exited, status=1/FAILURE Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Unit entered failed state. Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Failed with result 'exit-code'. 
Dec 6 17:28:10 nethcn-b5 pve-ha-lrm[16775]: watchdog update failed - Broken pipe Dec 6 17:29:41 nethcn-b5 systemd-modules-load[1761]: Inserted module 'iscsi_tcp' Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x42/0x4f3 with crng_init=0 Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] Linux version 4.13.8-3-pve (root at nora) (gcc version 6.3.0 20170516 (Debian 6.3.0-18)) #1 SMP PVE 4.13.8-30 (Tue, 5 Dec 2017 13:06:48 +0100) () Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.13.8-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs elevator=noop console=tty0 console=ttyS1,115200n8 Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] KERNEL supported cpus: Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] Intel GenuineIntel Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] AMD AuthenticAMD Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] Centaur CentaurHauls Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] e820: BIOS-provided physical RAM map: Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000099bff] usable Dec 6 17:29:41 nethcn-b5 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000099c00-0x000000000009ffff] reserved From andreas at mx20.org Thu Dec 7 14:07:41 2017 From: andreas at mx20.org (Andreas Herrmann) Date: Thu, 7 Dec 2017 14:07:41 +0100 Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode In-Reply-To: <6e4940d4-6c10-f253-7dad-f93959c111fc@proxmox.com> References: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org> <6e4940d4-6c10-f253-7dad-f93959c111fc@proxmox.com> Message-ID: <7d48bde3-5220-d75b-d835-86dd4e4e1bdd@mx20.org> Hi again, On 07.12.2017 08:57, Thomas Lamprecht wrote: > Do you got some log entries around that time? > Or a persistent journal? some more filtered logs about the watchdog are attached. nethcn-b(1|2|5) "crashed" and nethcn-b(3|4) kept online. Ceph monitors are running on nethcn-b(1|3|5). 
Andreas -------------- next part -------------- root at nethcn-b1:~# cat /var/log/syslog.1|egrep watchdog\|ipcc Dec 6 18:33:53 nethcn-b1 pvestatd[10770]: ipcc_send_rec[4] failed: Transport endpoint is not connected Dec 6 18:33:53 nethcn-b1 pvestatd[10770]: ipcc_send_rec[4] failed: Connection refused Dec 6 18:33:53 nethcn-b1 pvestatd[10770]: ipcc_send_rec[4] failed: Connection refused Dec 6 18:33:53 nethcn-b1 pvestatd[10770]: ipcc_send_rec[4] failed: Connection refused Dec 6 18:33:53 nethcn-b1 pvestatd[10770]: ipcc_send_rec[4] failed: Connection refused Dec 6 18:33:56 nethcn-b1 pve-ha-lrm[13875]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 18:33:56 nethcn-b1 pve-ha-lrm[13875]: ipcc_send_rec[2] failed: Connection refused Dec 6 18:33:56 nethcn-b1 pve-ha-lrm[13875]: ipcc_send_rec[3] failed: Connection refused Dec 6 18:33:56 nethcn-b1 watchdog-mux[3565]: client did not stop watchdog - disable watchdog updates Dec 6 18:33:58 nethcn-b1 pve-ha-crm[10964]: ipcc_send_rec[1] failed: Transport endpoint is not connected root at nethcn-b2:~# cat /var/log/syslog.1|egrep watchdog\|ipcc Dec 6 17:46:40 nethcn-b2 pve-ha-crm[10842]: watchdog active Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:51:20 nethcn-b2 pve-ha-crm[10842]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:51:20 nethcn-b2 watchdog-mux[3397]: client did not stop watchdog - disable watchdog updates Dec 6 17:51:21 nethcn-b2 pve-ha-lrm[13145]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:51:21 nethcn-b2 watchdog-mux[3397]: exit watchdog-mux with active connections Dec 6 17:51:21 nethcn-b2 kernel: [88876.361477] watchdog: watchdog0: watchdog did not stop! Dec 6 17:51:23 nethcn-b2 pvestatd[10618]: ipcc_send_rec[1] failed: Transport endpoint is not connected root at nethcn-b3:~# cat /var/log/syslog.1|egrep watchdog\|ipcc Dec 6 17:46:15 nethcn-b3 pveproxy[15923]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:46:15 nethcn-b3 pveproxy[15923]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:46:15 nethcn-b3 pveproxy[15923]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:46:19 nethcn-b3 pvestatd[10805]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:46:20 nethcn-b3 pve-ha-crm[10996]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:46:20 nethcn-b3 pve-ha-lrm[13497]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:46:30 nethcn-b3 pve-ha-lrm[13497]: watchdog closed (disabled) Dec 6 17:46:40 nethcn-b3 pve-ha-crm[10996]: watchdog closed (disabled) Dec 6 17:47:03 nethcn-b3 systemd[1]: Stopping Proxmox VE watchdog multiplexer... Dec 6 17:47:03 nethcn-b3 watchdog-mux[3580]: got terminate request Dec 6 17:47:03 nethcn-b3 watchdog-mux[3580]: clean exit Dec 6 17:47:03 nethcn-b3 systemd[1]: Stopped Proxmox VE watchdog multiplexer. Dec 6 17:47:03 nethcn-b3 systemd[1]: Started Proxmox VE watchdog multiplexer. 
Dec 6 17:47:03 nethcn-b3 watchdog-mux[834]: Watchdog driver 'Software Watchdog', version 0 Dec 6 17:49:21 nethcn-b3 pve-ha-lrm[30589]: watchdog active root at nethcn-b4:~# cat /var/log/syslog.1|egrep watchdog\|ipcc Dec 6 17:37:08 nethcn-b4 pveproxy[12998]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:37:08 nethcn-b4 pveproxy[12998]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:37:08 nethcn-b4 pveproxy[12998]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:37:10 nethcn-b4 pve-ha-lrm[12950]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:37:11 nethcn-b4 pve-ha-crm[10654]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:37:13 nethcn-b4 pvestatd[10424]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:37:20 nethcn-b4 pve-ha-lrm[12950]: watchdog closed (disabled) Dec 6 17:39:01 nethcn-b4 systemd[1]: Stopping Proxmox VE watchdog multiplexer... Dec 6 17:39:01 nethcn-b4 watchdog-mux[3564]: got terminate request Dec 6 17:39:01 nethcn-b4 watchdog-mux[3564]: clean exit Dec 6 17:39:01 nethcn-b4 systemd[1]: Stopped Proxmox VE watchdog multiplexer. Dec 6 17:39:01 nethcn-b4 systemd[1]: Started Proxmox VE watchdog multiplexer. Dec 6 17:39:01 nethcn-b4 watchdog-mux[5395]: Watchdog driver 'Software Watchdog', version 0 Dec 6 17:44:51 nethcn-b4 pve-ha-lrm[31595]: watchdog active Dec 6 17:53:26 nethcn-b4 pve-ha-crm[31896]: watchdog active root at nethcn-b5:/var/log# cat /var/log/syslog.1|egrep watchdog\|ipcc Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[11175]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[1] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[1] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[2] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:33 nethcn-b5 pve-ha-crm[14737]: ipcc_send_rec[3] failed: Connection refused Dec 6 17:27:38 nethcn-b5 pve-ha-lrm[14351]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:39 nethcn-b5 pvestatd[10636]: ipcc_send_rec[1] failed: Transport endpoint is not connected Dec 6 17:27:48 nethcn-b5 pve-ha-lrm[14351]: watchdog closed (disabled) Dec 6 17:28:00 nethcn-b5 pve-ha-lrm[16775]: watchdog active Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopping Proxmox VE watchdog multiplexer... Dec 6 17:28:06 nethcn-b5 watchdog-mux[3747]: got terminate request Dec 6 17:28:06 nethcn-b5 watchdog-mux[3747]: exit watchdog-mux with active connections Dec 6 17:28:06 nethcn-b5 systemd[1]: Stopped Proxmox VE watchdog multiplexer. Dec 6 17:28:06 nethcn-b5 systemd[1]: Started Proxmox VE watchdog multiplexer. Dec 6 17:28:06 nethcn-b5 kernel: [88725.955509] watchdog: watchdog0: watchdog did not stop! Dec 6 17:28:06 nethcn-b5 watchdog-mux[18946]: watchdog active - unable to restart watchdog-mux Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Main process exited, code=exited, status=1/FAILURE Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Unit entered failed state. Dec 6 17:28:06 nethcn-b5 systemd[1]: watchdog-mux.service: Failed with result 'exit-code'. 
Dec 6 17:28:10 nethcn-b5 pve-ha-lrm[16775]: watchdog update failed - Broken pipe From mark at tuxis.nl Sun Dec 10 05:54:19 2017 From: mark at tuxis.nl (Mark Schouten) Date: Sun, 10 Dec 2017 05:54:19 +0100 Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode In-Reply-To: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org> References: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org> Message-ID: <1340E011-9802-4D06-85BC-23242C791025@tuxis.nl> Isn?t this the issue: Setting up pve-ha-manager (2.0-4) ... watchdog-mux.service is a disabled or a static unit, not starting it. Where possibly the service is stopped, but not started again? > On 6 Dec 2017, at 18:43, Andreas Herrmann wrote: > > Setting up pve-ha-manager (2.0-4) ... > watchdog-mux.service is a disabled or a static unit, not starting it. From andreas at mx20.org Sun Dec 10 08:04:18 2017 From: andreas at mx20.org (Andreas Herrmann) Date: Sun, 10 Dec 2017 08:04:18 +0100 Subject: [PVE-User] WARNING: Upgrade and Watchdog kills Server in HA-Mode In-Reply-To: <1340E011-9802-4D06-85BC-23242C791025@tuxis.nl> References: <5c946c6e-bfa9-7bf5-aa3f-59be6279fdb3@mx20.org> <1340E011-9802-4D06-85BC-23242C791025@tuxis.nl> Message-ID: <83098aba-c96d-668c-8a0a-477b327ab594@mx20.org> Hi, the error seems to be fixed with pve-cluster version 5.0-19 https://git.proxmox.com/?p=pve-cluster.git;a=commitdiff;h=02b93019317d2b598fbae808301aeccc6088e9c5 https://git.proxmox.com/?p=pve-cluster.git;a=commitdiff;h=ec826d72c06e6f649b2b19c3341c39abb29b19f9 Andreas On 10.12.2017 05:54, Mark Schouten wrote: > Isn?t this the issue: > > Setting up pve-ha-manager (2.0-4) ... > watchdog-mux.service is a disabled or a static unit, not starting it. > > > Where possibly the service is stopped, but not started again? > > >> On 6 Dec 2017, at 18:43, Andreas Herrmann wrote: >> >> Setting up pve-ha-manager (2.0-4) ... >> watchdog-mux.service is a disabled or a static unit, not starting it. From elacunza at binovo.es Mon Dec 11 09:55:46 2017 From: elacunza at binovo.es (Eneko Lacunza) Date: Mon, 11 Dec 2017 09:55:46 +0100 Subject: [PVE-User] PVE4->PVE5 Live Migration issues In-Reply-To: References: <1cab5238-f913-fed2-1e37-3a5eb657c1d1@coppint.com> Message-ID: <86089592-f9c7-42b9-239a-daa7a22e182e@binovo.es> What we found was that some of our VMs were running already with a non-cirrus VGA. So we had to check each VM's running kvm process, to know wether we had to add vga:cirrus or not. We didn't see this 100%CPU issue though. El 07/12/17 a las 15:56, Florent B escribi?: > Even if migration succeeded with "vga: cirrus", some VM are frozen with > 100%CPU, no console... > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Zuzendari Teknikoa / Director T?cnico Binovo IT Human Project, S.L. Telf. 943569206 Astigarraga bidea 2, 2? izq. oficina 11; 20180 Oiartzun (Gipuzkoa) www.binovo.es From f.rust at sec.tu-bs.de Mon Dec 11 10:31:55 2017 From: f.rust at sec.tu-bs.de (F.Rust) Date: Mon, 11 Dec 2017 10:31:55 +0100 Subject: [PVE-User] Setup default VM ID starting number Message-ID: <3B6B854E-AF79-4A55-B6BB-3CA643891413@sec.tu-bs.de> Hi all, is it possible to set a different starting number for VM ids? We have different clusters and don?t want to have overlapping vm ids. So it would be great to simply say Cluster 1 start VM-ids at 100 Cluster 2 start VM ids at 1000 ? Admin of Cluster 2 can not see which machine ids on Cluster 1 exist an vice versa. 
But machine images or backups might get mixed up in the SAN.

Best regards,
Frank

From t.lamprecht at proxmox.com Mon Dec 11 10:58:55 2017
From: t.lamprecht at proxmox.com (Thomas Lamprecht)
Date: Mon, 11 Dec 2017 10:58:55 +0100
Subject: [PVE-User] Setup default VM ID starting number
In-Reply-To: <3B6B854E-AF79-4A55-B6BB-3CA643891413@sec.tu-bs.de>
References: <3B6B854E-AF79-4A55-B6BB-3CA643891413@sec.tu-bs.de>
Message-ID: 

Hi,

On 12/11/2017 10:31 AM, F.Rust wrote:
> Hi all,
>
> is it possible to set a different starting number for VM ids?

No, currently not, I'm afraid.

> We have different clusters and don't want to have overlapping VM ids.
> So it would be great to simply say
> Cluster 1 start VM ids at 100
> Cluster 2 start VM ids at 1000
> ...
> Admin of Cluster 2 cannot see which machine ids exist on Cluster 1 and vice versa.
> But machine images or backups might get mixed up in the SAN.

Is there a possibility to declare two different backup endpoints in your setup?
We normally expect that two clusters do not access the same writable storage
at the same path, exactly because of the backup clash possibility and other
shared resource access problems.

cheers,
Thomas

From miguel_3_gonzalez at yahoo.es Mon Dec 11 13:40:28 2017
From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=)
Date: Mon, 11 Dec 2017 13:40:28 +0100
Subject: [PVE-User] sparse and compression
Message-ID: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es>

Dear all,

Is it advisable to use sparse on ZFS pools performance-wise? And compression?
Which kind of compression?

Can I change a zpool to sparse on the fly, or do I need to turn off all VMs
before doing so?

Why does a virtual disk show as 60G when originally it was 36 GB in raw format?

NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-102-disk-1  60.0G  51.3G  20.9G  -

Thanks!

Miguel

From andreas at mx20.org Mon Dec 11 14:16:38 2017
From: andreas at mx20.org (Andreas Herrmann)
Date: Mon, 11 Dec 2017 14:16:38 +0100
Subject: [PVE-User] sparse and compression
In-Reply-To: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es>
References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es>
Message-ID: 

Hi Miguel,

first of all: man zfs!

On 11.12.2017 13:40, Miguel González wrote:
> Is it advisable to use sparse on ZFS pools performance-wise? And
> compression? Which kind of compression?

Sparse or not doesn't matter on SSDs. I would use compression, because it means
fewer reads/writes to the disk and modern CPUs can handle lz4 quite well.

Also keep in mind: a sparse volume only stays sparse if trim/discard is used!

volblocksize is important: ZFS uses 8k as the default for zvols. For ZFS
filesystems, a recordsize of 128K is used.
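As a quick sketch of how to check this (the dataset name is the one from this
thread; the "local-zfs" storage name and the pvesm option are assumptions based
on a default PVE ZFS install):

  # volblocksize is fixed at creation time, so it can only be checked on existing zvols
  zfs get volblocksize,compression,refreservation rpool/data/vm-102-disk-1

  # newly created disks pick up whatever block size is configured on the zfspool storage
  pvesm set local-zfs --blocksize 16k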
Some older test: zpool/vm-zvols/bsize_1k written 10.3G zpool/vm-zvols/bsize_1k logicalused 1.82G zpool/vm-zvols/bsize_4k written 2.60G zpool/vm-zvols/bsize_4k logicalused 1.76G zpool/vm-zvols/bsize_8k written 2.60G zpool/vm-zvols/bsize_8k logicalused 1.78G zpool/vm-zvols/bsize_16k written 1.87G zpool/vm-zvols/bsize_16k logicalused 1.70G zpool/vm-zvols/bsize_32k written 1.87G zpool/vm-zvols/bsize_32k logicalused 1.71G zpool/vm-zvols/bsize_64k written 1.72G zpool/vm-zvols/bsize_64k logicalused 1.72G zpool/vm-zvols/bsize_128k written 1.75G zpool/vm-zvols/bsize_128k logicalused 1.75G > Can I change a zpool to sparse on the fly or do I need to turn off all > VMs before doing so? No, sparse or not is set at creation. > Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? > > NAME USED AVAIL REFER MOUNTPOINT > rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - Because of blocksizes. Check zfs get all and read theory about ZFS. Here's an example for a non-sparse 50GB Volume for a VM: zpool/vm-zvols/foobar 51.6G 2.19T 34.6G - zpool/vm-zvols/foobar used 51.6G zpool/vm-zvols/foobar referenced 34.6G zpool/vm-zvols/foobar compressratio 1.02x zpool/vm-zvols/foobar volsize 50G zpool/vm-zvols/foobar volblocksize 8K zpool/vm-zvols/foobar compression lz4 zpool/vm-zvols/foobar refreservation 51.6G zpool/vm-zvols/foobar usedbydataset 34.6G zpool/vm-zvols/foobar usedbyrefreservation 17.0G zpool/vm-zvols/foobar refcompressratio 1.02x zpool/vm-zvols/foobar written 34.6G zpool/vm-zvols/foobar logicalused 24.2G zpool/vm-zvols/foobar logicalreferenced 24.2G Andreas From f.gruenbichler at proxmox.com Mon Dec 11 14:17:31 2017 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 11 Dec 2017 14:17:31 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> Message-ID: <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> On Mon, Dec 11, 2017 at 01:40:28PM +0100, Miguel Gonz?lez wrote: > Dear all, > > Is it advisable to use sparse on ZFS pools performance wise? And > compression? Which kind of compression? sparse just tells ZFS to not reserve space, it does not make a difference performance wise. if you do over provision and attempt to use more space than you actuall have, you can corrupt volumes / run into I/O errors though, like with most storages. compression is advisable, it costs (almost) nothing and usually increases performance and saves space. the default (on which is lz4) is fine. > > Can I change a zpool to sparse on the fly or do I need to turn off all > VMs before doing so? sparse will only affect newly created volumes. you can "convert" sparse volumes to fully reserved ones and vice versa manually though. compression only affects data written after it has been enabled, and already written data stays compressed if you turn it off again. if you want to fully switch from compressed to uncompressed or vice versa, you need to re-write all the data. > > Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? > > NAME USED AVAIL REFER MOUNTPOINT > rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - wild guess - you are using raidz of some kind? ashift is set to 12 / auto-detected? 
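For reference, a minimal sketch of checking and enabling this on the pool named
in this thread (the "local-zfs" storage name and the pvesm option are
assumptions based on a default PVE 5 ZFS setup; the zfs dataset names are the
ones posted above):

  # see what is currently set and how well existing data compresses
  zfs get compression,compressratio rpool/data

  # enable lz4 for everything below rpool/data; only blocks written afterwards are affected
  zfs set compression=lz4 rpool/data

  # have PVE create future disks as sparse (thin) zvols on this storage
  pvesm set local-zfs --sparse 1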
From miguel_3_gonzalez at yahoo.es Mon Dec 11 15:23:34 2017 From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=) Date: Mon, 11 Dec 2017 15:23:34 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: >> Can I change a zpool to sparse on the fly or do I need to turn off all >> VMs before doing so? > > sparse will only affect newly created volumes. you can "convert" sparse > volumes to fully reserved ones and vice versa manually though. > how can I convert manually from non-sparse to sparse? Creating a new zpool and copy disk with dd? Or any other easier way? > >> >> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? >> >> NAME USED AVAIL REFER MOUNTPOINT >> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - > > wild guess - you are using raidz of some kind? ashift is set to 12 / > auto-detected? Yes, raid1 Thanks for your promptly reply! Miguel --- This email has been checked for viruses by AVG. http://www.avg.com From miguel_3_gonzalez at yahoo.es Mon Dec 11 15:27:47 2017 From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=) Date: Mon, 11 Dec 2017 15:27:47 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> Message-ID: > > Also keep in mind: A sparse volume only stays sparse if trim/discard is used! How do I use trim/discard? Has this to be set in guest level, right? > > volblocksize is important: ZFS is using 8k as default. For ZFS filesystem a > recordsize of 128K is used. Can I change recordsize to 128K after creation or do I need to create a new zpool for that? Thanks for your promptly answer! Miguel --- This email has been checked for viruses by AVG. http://www.avg.com From f.gruenbichler at proxmox.com Mon Dec 11 15:29:04 2017 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 11 Dec 2017 15:29:04 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: <20171211142904.biawkzgnntym2vmn@nora.maurer-it.com> On Mon, Dec 11, 2017 at 03:23:34PM +0100, Miguel Gonz?lez wrote: > > >> Can I change a zpool to sparse on the fly or do I need to turn off all > >> VMs before doing so? > > > > sparse will only affect newly created volumes. you can "convert" sparse > > volumes to fully reserved ones and vice versa manually though. > > > > how can I convert manually from non-sparse to sparse? Creating a new > zpool and copy disk with dd? Or any other easier way? (un)set the reservations appropriately. like I said, "sparse" is entirely virtual for ZFS, the only difference is whether the full size is reserved upon creation or not. > >> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? > >> > >> NAME USED AVAIL REFER MOUNTPOINT > >> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - > > > > wild guess - you are using raidz of some kind? ashift is set to 12 / > > auto-detected? > > Yes, raid1 > > Thanks for your promptly reply! raid1 (aka mirror)? or raidZ-1 ? 
those are two very different things ;) From miguel_3_gonzalez at yahoo.es Mon Dec 11 15:34:37 2017 From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=) Date: Mon, 11 Dec 2017 15:34:37 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: <20171211142904.biawkzgnntym2vmn@nora.maurer-it.com> References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> <20171211142904.biawkzgnntym2vmn@nora.maurer-it.com> Message-ID: <79d00b4d-9211-4b1a-d136-7a78454badda@yahoo.es> On 12/11/17 3:29 PM, Fabian Gr?nbichler wrote: > On Mon, Dec 11, 2017 at 03:23:34PM +0100, Miguel Gonz?lez wrote: >> >>>> Can I change a zpool to sparse on the fly or do I need to turn off all >>>> VMs before doing so? >>> >>> sparse will only affect newly created volumes. you can "convert" sparse >>> volumes to fully reserved ones and vice versa manually though. >>> >> >> how can I convert manually from non-sparse to sparse? Creating a new >> zpool and copy disk with dd? Or any other easier way? > > (un)set the reservations appropriately. like I said, "sparse" is > entirely virtual for ZFS, the only difference is whether the full size > is reserved upon creation or not. from Andreas comment maybe i should look more into change blocksize. > >>>> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? >>>> >>>> NAME USED AVAIL REFER MOUNTPOINT >>>> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - >>> >>> wild guess - you are using raidz of some kind? ashift is set to 12 / >>> auto-detected? >> >> Yes, raid1 >> >> Thanks for your promptly reply! > > raid1 (aka mirror)? or raidZ-1 ? those are two very different things ;) from zfs perspective is called mirror-0 (not softraid underneath): zpool status pool: rpool state: ONLINE status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details. scan: scrub repaired 0B in 0h39m with 0 errors on Sun Dec 10 02:03:43 2017 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 sda2 ONLINE 0 0 0 sdb2 ONLINE 0 0 0 --- This email has been checked for viruses by AVG. http://www.avg.com From andreas at mx20.org Mon Dec 11 15:35:35 2017 From: andreas at mx20.org (Andreas Herrmann) Date: Mon, 11 Dec 2017 15:35:35 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> Message-ID: Hi, On 11.12.2017 15:27, Miguel Gonz?lez wrote: >> Also keep in mind: A sparse volume only stays sparse if trim/discard is used! > > How do I use trim/discard? Has this to be set in guest level, right? https://pve.proxmox.com/wiki/Qemu_trim/discard_and_virtio_scsi >> volblocksize is important: ZFS is using 8k as default. For ZFS filesystem a >> recordsize of 128K is used. > > Can I change recordsize to 128K after creation or do I need to create a > new zpool for that? 
Play and learn: zfs set volblocksize=16K zpool/vm-zvols/test cannot set property for 'zpool/vm-zvols/test': 'volblocksize' is readonly You really should read 'man zfs' Andreas From andreas at mx20.org Mon Dec 11 15:47:56 2017 From: andreas at mx20.org (Andreas Herrmann) Date: Mon, 11 Dec 2017 15:47:56 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: Hi On 11.12.2017 14:17, Fabian Gr?nbichler wrote: > On Mon, Dec 11, 2017 at 01:40:28PM +0100, Miguel Gonz?lez wrote: >> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? >> >> NAME USED AVAIL REFER MOUNTPOINT >> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - > > wild guess - you are using raidz of some kind? ashift is set to 12 / > auto-detected? No! 'zpool list' will show what is used on disk. zfs list is totally transparent to zpool layout. Have a look at 'zpool get all' for the ashift setting. Example for raidz1 (4x 960GB SSDs): root at foobar:~# zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT zpool 3.41T 102G 3.31T - 8% 2% 1.00x ONLINE - root at foobar:~# zfs list NAME USED AVAIL REFER MOUNTPOINT zpool 237G 2.17T 140K /zpool zpool ALLOC is smaller than zfs USED in this example. Why? Try to unserstand the difference between 'referenced' and 'used'. My volumes aren't sparse but discard is used. Andreas From miguel_3_gonzalez at yahoo.es Mon Dec 11 16:10:29 2017 From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=) Date: Mon, 11 Dec 2017 16:10:29 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: On 12/11/17 3:47 PM, Andreas Herrmann wrote: > Hi > > On 11.12.2017 14:17, Fabian Gr?nbichler wrote: >> On Mon, Dec 11, 2017 at 01:40:28PM +0100, Miguel Gonz?lez wrote: >>> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? >>> >>> NAME USED AVAIL REFER MOUNTPOINT >>> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - >> >> wild guess - you are using raidz of some kind? ashift is set to 12 / >> auto-detected? > > No! 'zpool list' will show what is used on disk. zfs list is totally > transparent to zpool layout. Have a look at 'zpool get all' for the ashift > setting. > > Example for raidz1 (4x 960GB SSDs): > root at foobar:~# zpool list > NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT > zpool 3.41T 102G 3.31T - 8% 2% 1.00x ONLINE - > > root at foobar:~# zfs list > NAME USED AVAIL REFER MOUNTPOINT > zpool 237G 2.17T 140K /zpool > > zpool ALLOC is smaller than zfs USED in this example. Why? Try to unserstand > the difference between 'referenced' and 'used'. My volumes aren't sparse but > discard is used. I have search around about how to understand those columns. I didn?t find anything on the wiki that explains this. 
This is my zfs list: NAME USED AVAIL REFER MOUNTPOINT rpool 207G 61.9G 104K /rpool rpool/ROOT 6.10G 61.9G 96K /rpool/ROOT rpool/ROOT/pve-1 6.10G 61.9G 6.10G / rpool/data 197G 61.9G 96K /rpool/data rpool/data/vm-100-disk-1 108G 61.9G 108G - rpool/data/vm-102-disk-1 37.1G 77.9G 21.1G - rpool/data/vm-102-disk-2 51.6G 81.8G 31.7G - rpool/swap 4.25G 64.9G 1.25G - If I run zfs get all I get: rpool/data/vm-100-disk-1 written 108G rpool/data/vm-100-disk-1 logicalused 129G rpool/data/vm-100-disk-1 logicalreferenced 129G rpool/data/vm-102-disk-1 written 21.1G rpool/data/vm-102-disk-1 logicalused 27.1G rpool/data/vm-102-disk-1 logicalreferenced 27.1G rpool/data/vm-102-disk-2 written 31.7G rpool/data/vm-102-disk-2 logicalused 36.2G rpool/data/vm-102-disk-2 logicalreferenced 36.2G So even If I?m having 8k blocksize and non-sparse the written data is quite close to the real usage in the guests VMs. All this comes from that I was running out of space when running pve-zsync to perform a copy of the VM in other node. I have found out that snapshots were taken some part of the data (30 Gb). Any way to run a pve-zsync only a day that doesn?t consume snapshots on this machine (Maybe running from the target machine?) Thanks Miguel --- This email has been checked for viruses by AVG. http://www.avg.com From andreas at mx20.org Mon Dec 11 16:22:59 2017 From: andreas at mx20.org (Andreas Herrmann) Date: Mon, 11 Dec 2017 16:22:59 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: <9d630687-41ae-4849-e90f-18a0f25c5a3a@mx20.org> Hi Miguel, On 11.12.2017 16:10, Miguel Gonz?lez wrote: >> Example for raidz1 (4x 960GB SSDs): >> root at foobar:~# zpool list >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT >> zpool 3.41T 102G 3.31T - 8% 2% 1.00x ONLINE - >> >> root at foobar:~# zfs list >> NAME USED AVAIL REFER MOUNTPOINT >> zpool 237G 2.17T 140K /zpool >> >> zpool ALLOC is smaller than zfs USED in this example. Why? Try to unserstand >> the difference between 'referenced' and 'used'. My volumes aren't sparse but >> discard is used. > > > I have search around about how to understand those columns. I didn?t > find anything on the wiki that explains this. Why should Proxmox explain the theory of ZFS? Please have a look at 'man zfs'. There you'll find all you need. Andreas From f.gruenbichler at proxmox.com Mon Dec 11 16:37:57 2017 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 11 Dec 2017 16:37:57 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> Message-ID: <20171211153757.fe7pgkh3yndob2wz@nora.maurer-it.com> On Mon, Dec 11, 2017 at 03:47:56PM +0100, Andreas Herrmann wrote: > Hi > > On 11.12.2017 14:17, Fabian Gr?nbichler wrote: > > On Mon, Dec 11, 2017 at 01:40:28PM +0100, Miguel Gonz?lez wrote: > >> Why a virtual disk shows as 60G when originally It was 36 Gb in raw format? > >> > >> NAME USED AVAIL REFER MOUNTPOINT > >> rpool/data/vm-102-disk-1 60.0G 51.3G 20.9G - > > > > wild guess - you are using raidz of some kind? ashift is set to 12 / > > auto-detected? > > No! 'zpool list' will show what is used on disk. zfs list is totally > transparent to zpool layout. Have a look at 'zpool get all' for the ashift > setting. I know. 
in most cases when people are surprised by their zvols taking up more space than expected, it is because they are using raidz and don't know about the interaction between ashift=12, raidz and small volblocksize. > > Example for raidz1 (4x 960GB SSDs): > root at foobar:~# zpool list > NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT > zpool 3.41T 102G 3.31T - 8% 2% 1.00x ONLINE - > > root at foobar:~# zfs list > NAME USED AVAIL REFER MOUNTPOINT > zpool 237G 2.17T 140K /zpool > > zpool ALLOC is smaller than zfs USED in this example. Why? Try to unserstand > the difference between 'referenced' and 'used'. My volumes aren't sparse but > discard is used. your output is pretty worthless, as "REFER" only refers to the pool dataset, and not its children. I do know the difference between used and referenced, which is not (directly) related to discard at all. discard can obviously get your referenced value down ;) see the following for an example where a 10G volume takes more than 10G of space in 'zfs list' output: $ zfs list testpool -r -o name,used,referenced,volsize NAME USED REFER VOLSIZE testpool 14.3G 140K - testpool/test 14.3G 14.3G 10G $ zpool list testpool NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT testpool 39.8G 19.7G 20.1G - 0% 49% 1.00x ONLINE - the only difference between a sparse and non-sparse zvol is whether refreservation is set, which affects the usedbyrefreservation value which in turn (might / probably will) affect the used value. no relation to discard at all. From lindsay.mathieson at gmail.com Mon Dec 11 16:46:36 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Tue, 12 Dec 2017 01:46:36 +1000 Subject: [PVE-User] pveproxy dying, node unusable Message-ID: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> I dist-upraded two nodes yesterday. Now both those nodes have multiple unkilliable pveproxy processes. dmesg has many entries of: [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds. [50996.416914]?????? Tainted: P?????????? O 4.4.95-1-pve #1 [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [50996.416922] pveproxy??????? D ffff8809194e3df8 0? 6798????? 1 0x00000004 [50996.416925]? ffff8809194e3df8 ffff880ff6f5ed80 ffff880ff84fe200 ffff880fded5e200 [50996.416927]? ffff8809194e4000 ffff880fc7fb43ac ffff880fded5e200 00000000ffffffff [50996.416929]? ffff880fc7fb43b0 ffff8809194e3e10 ffffffff818643b5 ffff880fc7fb43a8 qm list hangs Node vms do not respond in web gui The node I did not upgrade is fine. -- Lindsay Mathieson From lindsay.mathieson at gmail.com Mon Dec 11 16:50:30 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Tue, 12 Dec 2017 01:50:30 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> Message-ID: <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> Also I was unable to connect to the VM's on those nodes, not even via RDP On 12/12/2017 1:46 AM, Lindsay Mathieson wrote: > > I dist-upraded two nodes yesterday. Now both those nodes have multiple > unkilliable pveproxy processes. dmesg has many entries of: > > [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 > seconds. > [50996.416914]?????? Tainted: P?????????? O 4.4.95-1-pve #1 > [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [50996.416922] pveproxy??????? D ffff8809194e3df8 0? 6798????? 
1 > 0x00000004 > [50996.416925]? ffff8809194e3df8 ffff880ff6f5ed80 ffff880ff84fe200 > ffff880fded5e200 > [50996.416927]? ffff8809194e4000 ffff880fc7fb43ac ffff880fded5e200 > 00000000ffffffff > [50996.416929]? ffff880fc7fb43b0 ffff8809194e3e10 ffffffff818643b5 > ffff880fc7fb43a8 > > > qm list hangs > > Node vms do not respond in web gui > > The node I did not upgrade is fine. > > > -- > Lindsay Mathieson -- Lindsay Mathieson From miguel_3_gonzalez at yahoo.es Mon Dec 11 17:05:17 2017 From: miguel_3_gonzalez at yahoo.es (=?UTF-8?Q?Miguel_Gonz=c3=a1lez?=) Date: Mon, 11 Dec 2017 17:05:17 +0100 Subject: [PVE-User] sparse and compression In-Reply-To: <20171211153757.fe7pgkh3yndob2wz@nora.maurer-it.com> References: <26aaa72a-458d-8c63-c3c9-16830a96e0c3@yahoo.es> <20171211131731.7ixb7wpq6tly3khp@nora.maurer-it.com> <20171211153757.fe7pgkh3yndob2wz@nora.maurer-it.com> Message-ID: > $ zfs list testpool -r -o name,used,referenced,volsize > NAME USED REFER VOLSIZE > testpool 14.3G 140K - > testpool/test 14.3G 14.3G 10G Mine is: zfs list rpool -r -o name,used,referenced,volsize NAME USED REFER VOLSIZE rpool 207G 104K - rpool/ROOT 6.10G 96K - rpool/ROOT/pve-1 6.10G 6.10G - rpool/data 197G 96K - rpool/data/vm-100-disk-1 108G 108G 138G rpool/data/vm-102-disk-1 37.1G 21.3G 36G rpool/data/vm-102-disk-2 51.6G 31.7G 50G rpool/swap 4.25G 1.25G 4G How can I fix this with the minimum downtime? In the three disks I have more than 15 Gb free. Thanks, Miguel --- This email has been checked for viruses by AVG. http://www.avg.com From e.kasper at proxmox.com Mon Dec 11 17:14:37 2017 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Mon, 11 Dec 2017 17:14:37 +0100 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> Message-ID: <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> On 12/11/2017 04:50 PM, Lindsay Mathieson wrote: > Also I was unable to connect to the VM's on those nodes, not even via RDP > > On 12/12/2017 1:46 AM, Lindsay Mathieson wrote: >> >> I dist-upraded two nodes yesterday. Now both those nodes have multiple >> unkilliable pveproxy processes. dmesg has many entries of: >> >> ??? [50996.416909] INFO: task pveproxy:6798 blocked for more than 120 >> ??? seconds. >> ??? [50996.416914]?????? Tainted: P?????????? O 4.4.95-1-pve #1 >> ??? [50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> ??? disables this message. >> ??? [50996.416922] pveproxy??????? D ffff8809194e3df8 0? 6798????? 1 >> ??? 0x00000004 >> ??? [50996.416925]? ffff8809194e3df8 ffff880ff6f5ed80 ffff880ff84fe200 >> ??? ffff880fded5e200 >> ??? [50996.416927]? ffff8809194e4000 ffff880fc7fb43ac ffff880fded5e200 >> ??? 00000000ffffffff >> ??? [50996.416929]? ffff880fc7fb43b0 ffff8809194e3e10 ffffffff818643b5 >> ??? ffff880fc7fb43a8 >> >> >> qm list hangs >> >> Node vms do not respond in web gui >> >> The node I did not upgrade is fine. Hi Lindsay As a quick check, is the cluster file system mounted on /etc/pve and can you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ? Are the node storages returning their status properly ? 
(ie pvesm status does not hang) From lindsay.mathieson at gmail.com Mon Dec 11 17:18:42 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Tue, 12 Dec 2017 02:18:42 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> Message-ID: <9e4acabd-ede0-2bf7-ce78-0eb2980ad92d@gmail.com> On 12/12/2017 2:14 AM, Emmanuel Kasper wrote: > Hi Lindsay > As a quick check, is the cluster file system mounted on /etc/pve and can > you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ? Unfortunately I hard reset both nodes as I needed them up. But a pvecm status showed that quorum was ok and the nodes were marked green in the web gui. /etc/pve was mounted and accessible on the unaffected node. > > Are the node storages returning their status properly ? > (ie pvesm status does not hang) Yes they were (pvesm status). nb. Both nodes are running ok after a reset now. thanks. -- Lindsay Mathieson From davel at upilab.com Wed Dec 13 14:34:20 2017 From: davel at upilab.com (David Lawley) Date: Wed, 13 Dec 2017 08:34:20 -0500 Subject: [PVE-User] netdata anyone? Message-ID: <61f8bd85-2659-795f-3aed-9dd791b9bfb0@upilab.com> Anyone use netdata? https://github.com/firehol/netdata pro/cons, impact on Promox if any. It help me find one issue I was having but was unsure of long term impact. Help me identify "Squeezed" packets on a nic. From daniel at linux-nerd.de Wed Dec 13 22:03:47 2017 From: daniel at linux-nerd.de (Daniel) Date: Wed, 13 Dec 2017 22:03:47 +0100 Subject: [PVE-User] netdata anyone? In-Reply-To: <61f8bd85-2659-795f-3aed-9dd791b9bfb0@upilab.com> References: <61f8bd85-2659-795f-3aed-9dd791b9bfb0@upilab.com> Message-ID: <7C10FA35-95E9-48E9-86FB-4C395381532C@linux-nerd.de> There is now problem. Proxmox is more or less a normal Debian. You can install it as in the docu described. Cheers Daniel Am 13.12.17, 14:35 schrieb "pve-user im Auftrag von David Lawley" : Anyone use netdata? https://github.com/firehol/netdata pro/cons, impact on Promox if any. It help me find one issue I was having but was unsure of long term impact. Help me identify "Squeezed" packets on a nic. _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Mon Dec 18 18:57:23 2017 From: m at plus-plus.su (Mikhail) Date: Mon, 18 Dec 2017 20:57:23 +0300 Subject: [PVE-User] Failure to install latest PVE on Debian Stretch Message-ID: Hello, Following this official wiki instruction: https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Stretch (done that procedure several times already) I'm having problem with new server. It is running clean install of Debian Stretch and I'm trying to put Proxmox using PVE packages install as described in wiki. I'm not sure whether the problem is related to something misconfigured on Stretch itself, or the problem is somewhere with PVE packages, but my installation fails on the following: Setting up pve-firewall (3.0-5) ... Created symlink /etc/systemd/system/multi-user.target.wants/pve-firewall.service ? /lib/systemd/system/pve-firewall.service. insserv: Service pve-cluster has to be enabled to start service pvefw-logger insserv: exiting now! 
update-rc.d: error: insserv rejected the script header dpkg: error processing package pve-firewall (--configure): subprocess installed post-installation script returned error exit status 1 dpkg: dependency problems prevent configuration of qemu-server: qemu-server depends on pve-firewall; however: Package pve-firewall is not configured yet. dpkg: error processing package qemu-server (--configure): dependency problems - leaving unconfigured dpkg: dependency problems prevent configuration of proxmox-ve: proxmox-ve depends on qemu-server; however: Package qemu-server is not configured yet. dpkg: error processing package proxmox-ve (--configure): dependency problems - leaving unconfigured dpkg: dependency problems prevent configuration of pve-manager: pve-manager depends on pve-firewall; however: Package pve-firewall is not configured yet. pve-manager depends on qemu-server (>= 1.1-1); however: Package qemu-server is not configured yet. dpkg: error processing package pve-manager (--configure): dependency problems - leaving unconfigured dpkg: dependency problems prevent configuration of pve-ha-manager: pve-ha-manager depends on qemu-server; however: Package qemu-server is not configured yet. dpkg: error processing package pve-ha-manager (--configure): dependency problems - leaving unconfigured dpkg: dependency problems prevent configuration of pve-container: pve-container depends on pve-ha-manager; however: Package pve-ha-manager is not configured yet. dpkg: error processing package pve-container (--configure): dependency problems - leaving unconfigured Processing triggers for initramfs-tools (0.130) ... update-initramfs: Generating /boot/initrd.img-4.13.13-1-pve I: The initramfs will attempt to resume from /dev/md0 I: (UUID=25b05adb-f12d-40d0-8c68-1bf28e25e9ba) I: Set the RESUME variable to override this. Processing triggers for libc-bin (2.24-11+deb9u1) ... Processing triggers for systemd (232-25+deb9u1) ... Errors were encountered while processing: pve-firewall qemu-server proxmox-ve pve-manager pve-ha-manager pve-container E: Sub-process /usr/bin/dpkg returned an error code (1) root at pve /etc/apt/sources.list.d # I have tried everything, "apt-get -f install", dpkg-reconfigure all PVE packages, remove proxmox-ve postfix open-iscsi packages and doing install again - always failing with the same error. Has anyone else experienced something similar? This has never failed on my history before. Thanks. From dietmar at proxmox.com Mon Dec 18 21:06:51 2017 From: dietmar at proxmox.com (Dietmar Maurer) Date: Mon, 18 Dec 2017 21:06:51 +0100 (CET) Subject: [PVE-User] Failure to install latest PVE on Debian Stretch In-Reply-To: References: Message-ID: <1370055586.64.1513627612111@webmail.proxmox.com> > I'm having problem with new server. It is running clean install of > Debian Stretch and I'm trying to put Proxmox using PVE packages install > as described in wiki. I'm not sure whether the problem is related to > something misconfigured on Stretch itself, or the problem is somewhere > with PVE packages, but my installation fails on the following: > > Setting up pve-firewall (3.0-5) ... > Created symlink > /etc/systemd/system/multi-user.target.wants/pve-firewall.service ? > /lib/systemd/system/pve-firewall.service. > insserv: Service pve-cluster has to be enabled to start service pvefw-logger > insserv: exiting now! We do not support insserv based systems anymore - please use systemd instead. 
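A quick way to check Dietmar's point on an affected host is to confirm that systemd really is PID 1 and that the PVE units are managed through systemctl rather than update-rc.d/insserv. A minimal sketch (standard Debian Stretch tooling assumed; unit names as shipped by the pve-cluster and pve-firewall packages):

    # should print "systemd"; "init" means the host is still booting sysvinit
    ps -p 1 -o comm=

    # the PVE units should show up and be enabled here
    systemctl list-unit-files | grep -E 'pve-(cluster|firewall)'
    systemctl enable pve-cluster
    systemctl start pve-cluster

If PID 1 is not systemd, the package postinst scripts will keep falling back to insserv and fail the same way, so the init system has to be switched first.
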
From davel at upilab.com Mon Dec 18 21:19:38 2017 From: davel at upilab.com (David Lawley) Date: Mon, 18 Dec 2017 15:19:38 -0500 Subject: [PVE-User] sysctl tuning 5.1 Message-ID: Been working on tuning for a 10g network on PV 5.1 These examples for my sysctl.conf file give me errors and I never can really seem to get any to stick net.core.rmem_max=8388608 net.core.wmem_max=8388608 net.core.rmem_default=65536 net.core.wmem_default=65536 net.ipv4.tcp_rmem="4096 87380 8388608" net.ipv4.tcp_wmem="4096 65536 8388608" net.ipv4.tcp_mem="8388608 8388608 8388608" When sysctl -p is ran I get sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument net.ipv4.tcp_rmem = "4096 87380 8388608" sysctl: setting key "net.ipv4.tcp_wmem": Invalid argument net.ipv4.tcp_wmem = "4096 65536 8388608" sysctl: setting key "net.ipv4.tcp_mem": Invalid argument net.ipv4.tcp_mem = "8388608 8388608 8388608" Guess I'm trying to understand what I am missing doing it manually via cli, seems to work. root at pve:/etc# sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608" net.ipv4.tcp_rmem = 4096 87380 8388608 root at pve:/etc# From olivier.benghozi at wifirst.fr Mon Dec 18 22:49:17 2017 From: olivier.benghozi at wifirst.fr (Olivier Benghozi) Date: Mon, 18 Dec 2017 22:49:17 +0100 Subject: [PVE-User] sysctl tuning 5.1 In-Reply-To: References: Message-ID: <677240E5-98F9-44D9-826E-B30C0AB1EA3F@wifirst.fr> Remove the double quotes. > On 18 dec. 2017 at 21:19, David Lawley wrote : > > sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument > net.ipv4.tcp_rmem = "4096 87380 8388608" From IMMO.WETZEL at adtran.com Tue Dec 19 11:29:33 2017 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Tue, 19 Dec 2017 10:29:33 +0000 Subject: [PVE-User] network restart Message-ID: Hi, PVE 4.4 we observed a few times network card outages. The only way was a network card driver reload. But this leads to a destroyed network setup. A service restart networking.service doenst solved this. It looks like the full network restart isn't done with that. Once I tried to recover the network manually with all the steps which should be done automatically via ip ... It's a hard way if there are 40VMs and two or three different network connections per vm existing. Is there a common tool supported way to bring back all network connections ? Immo Wetzel ADTRAN GmbH Siemensallee 1 17489 Greifswald Germany Phone: +49 3834 5352 823 Mobile: +49 151 147 29 225 Immo.Wetzel at Adtran.com PGP-Fingerprint: 7313 7E88 4E19 AACF 45E9 E74D EFF7 0480 F4CF 6426 http://www.adtran.com Sitz der Gesellschaft: Berlin / Registered office: Berlin Registergericht: Berlin / Commercial registry: Amtsgericht Charlottenburg, HRB 135656 B Gesch?ftsf?hrung / Managing Directors: Roger Shannon, James D. Wilson, Jr., Dr. Eduard Scheiterer From mark at tuxis.nl Tue Dec 19 11:34:54 2017 From: mark at tuxis.nl (Mark Schouten) Date: Tue, 19 Dec 2017 11:34:54 +0100 Subject: [PVE-User] network restart In-Reply-To: References: Message-ID: <1780533.Pdf45Cfqpt@tuxis> Hi, On dinsdag 19 december 2017 10:29:33 CET IMMO WETZEL wrote: > PVE 4.4 > we observed a few times network card outages. The only way was a network > card driver reload. But this leads to a destroyed network setup. A service Sounds like spanning tree, or something like that. What can you do with ip to fix it? -- Kerio Operator in de Cloud? 
https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK: 61527076 | http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part. URL: From IMMO.WETZEL at adtran.com Tue Dec 19 12:13:30 2017 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Tue, 19 Dec 2017 11:13:30 +0000 Subject: [PVE-User] network restart In-Reply-To: <1780533.Pdf45Cfqpt@tuxis> References: , <1780533.Pdf45Cfqpt@tuxis> Message-ID: No spanning tree configured in the whole network Sent from Mobile -------- Original message -------- From: Mark Schouten Date:19/12/2017 12:09 (GMT+01:00) To: PVE User List Subject: Re: [PVE-User] network restart Hi, On dinsdag 19 december 2017 10:29:33 CET IMMO WETZEL wrote: > PVE 4.4 > we observed a few times network card outages. The only way was a network > card driver reload. But this leads to a destroyed network setup. A service Sounds like spanning tree, or something like that. What can you do with ip to fix it? -- Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK: 61527076 | http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl From davel at upilab.com Tue Dec 19 12:17:43 2017 From: davel at upilab.com (David Lawley) Date: Tue, 19 Dec 2017 06:17:43 -0500 Subject: [PVE-User] sysctl tuning 5.1 In-Reply-To: <677240E5-98F9-44D9-826E-B30C0AB1EA3F@wifirst.fr> References: <677240E5-98F9-44D9-826E-B30C0AB1EA3F@wifirst.fr> Message-ID: <9d1e334c-ea2a-851e-a720-207c44fc1ade@upilab.com> Gives a different error, but I did try it too. Guessing these are not turntable yet in the PVE kernel yet? root at pve:/etc# sysctl -w net.ipv4.tcp_rmem=4096 87380 8388608 net.ipv4.tcp_rmem = 4096 sysctl: "87380" must be of the form name=value sysctl: "8388608" must be of the form name=value On 12/18/2017 4:49 PM, Olivier Benghozi wrote: > Remove the double quotes. > >> On 18 dec. 2017 at 21:19, David Lawley wrote : >> >> sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument >> net.ipv4.tcp_rmem = "4096 87380 8388608" > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From olivier.benghozi at wifirst.fr Tue Dec 19 12:24:13 2017 From: olivier.benghozi at wifirst.fr (Olivier Benghozi) Date: Tue, 19 Dec 2017 12:24:13 +0100 Subject: [PVE-User] sysctl tuning 5.1 In-Reply-To: <9d1e334c-ea2a-851e-a720-207c44fc1ade@upilab.com> References: <677240E5-98F9-44D9-826E-B30C0AB1EA3F@wifirst.fr> <9d1e334c-ea2a-851e-a720-207c44fc1ade@upilab.com> Message-ID: In your interactive shell you need double quotes. In the .conf file you need to remove the double quotes and leave a space behind and after the equal sign. > Le 19 d?c. 2017 ? 12:17, David Lawley a ?crit : > > Gives a different error, but I did try it too. Guessing these are not turntable yet in the PVE kernel yet? > > root at pve:/etc# sysctl -w net.ipv4.tcp_rmem=4096 87380 8388608 > net.ipv4.tcp_rmem = 4096 > sysctl: "87380" must be of the form name=value > sysctl: "8388608" must be of the form name=value > > > On 12/18/2017 4:49 PM, Olivier Benghozi wrote: >> Remove the double quotes. >>> On 18 dec. 
2017 at 21:19, David Lawley wrote : >>> >>> sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument >>> net.ipv4.tcp_rmem = "4096 87380 8388608" >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From infolist at schwarz-fr.net Tue Dec 19 12:28:58 2017 From: infolist at schwarz-fr.net (Phil Schwarz) Date: Tue, 19 Dec 2017 12:28:58 +0100 Subject: [PVE-User] Ceph over IP over Infiniband Message-ID: <8647ac76-d5bb-1068-8c6c-a97b29de00c2@schwarz-fr.net> Hi, I'm currently trying to set up a brand new home cluster : - 5 nodes, with each : - 1 HCA Mellanox ConnectX-2 - 1 GB Ethernet (Proxmox 5.1 Network Admin) - 1 CX4 to CX4 cable All together connected to a SDR Flextronics IB Switch. This setup should back a Ceph Luminous (V12.2.2 included in proxmox V5.1) On all nodes, I did: - apt-get infiniband-diags - modprobe mlx4_ib - modprobe ib_ipoib - modprobe ib_umad - ifconfig ib0 IP/MASK On two nodes (tried previously on a single on, same issue), i installed opensm ( The switch doesn't have SM included) : apt-get install opensm /etc/init.d/opensm stop /etc/init.d/opensm start (Necessary to let the daemon create the logfiles) I tailed the logfile and got a "Active&Running" Setup, with "SUBNET UP" Every node is OK regardless to IB Setup : - All ib0 are UP, using ibstat - ibhosts and ibswitches seem to be OK On a node : ibping -S On every other node : ibping -G GID_Of_Previous_Server_Port I got a nice pong reply on every node. Should be happy, but... But i never went further.. Tried to ping each other. No way to get into this (mostly probably) simple issue... Any hint to achieve this task ?? Thanks for all Best regards From davel at upilab.com Tue Dec 19 12:31:45 2017 From: davel at upilab.com (David Lawley) Date: Tue, 19 Dec 2017 06:31:45 -0500 Subject: [PVE-User] sysctl tuning 5.1 In-Reply-To: References: <677240E5-98F9-44D9-826E-B30C0AB1EA3F@wifirst.fr> <9d1e334c-ea2a-851e-a720-207c44fc1ade@upilab.com> Message-ID: Bingo, thanks!! Sometimes you can read too much! On 12/19/2017 6:24 AM, Olivier Benghozi wrote: > In your interactive shell you need double quotes. > In the .conf file you need to remove the double quotes and leave a space behind and after the equal sign. > >> Le 19 d?c. 2017 ? 12:17, David Lawley a ?crit : >> >> Gives a different error, but I did try it too. Guessing these are not turntable yet in the PVE kernel yet? >> >> root at pve:/etc# sysctl -w net.ipv4.tcp_rmem=4096 87380 8388608 >> net.ipv4.tcp_rmem = 4096 >> sysctl: "87380" must be of the form name=value >> sysctl: "8388608" must be of the form name=value >> >> >> On 12/18/2017 4:49 PM, Olivier Benghozi wrote: >>> Remove the double quotes. >>>> On 18 dec. 
2017 at 21:19, David Lawley wrote : >>>> >>>> sysctl: setting key "net.ipv4.tcp_rmem": Invalid argument >>>> net.ipv4.tcp_rmem = "4096 87380 8388608" >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From gilberto.nunes32 at gmail.com Tue Dec 19 12:45:03 2017 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 19 Dec 2017 09:45:03 -0200 Subject: [PVE-User] network restart In-Reply-To: References: Message-ID: Hi Can you tell us about your hardware? Mainly the network card and switches..... --- Gilberto Ferreira (47) 3025-5907 (47) 99676-7530 Skype: gilberto.nunes36 2017-12-19 8:29 GMT-02:00 IMMO WETZEL : > Hi, > > PVE 4.4 > we observed a few times network card outages. The only way was a network > card driver reload. But this leads to a destroyed network setup. > A service restart networking.service doenst solved this. It looks like the > full network restart isn't done with that. > Once I tried to recover the network manually with all the steps which > should be done automatically via ip ... > It's a hard way if there are 40VMs and two or three different network > connections per vm existing. > Is there a common tool supported way to bring back all network connections > ? > > Immo Wetzel > > ADTRAN GmbH > Siemensallee 1 > 17489 Greifswald > Germany > > Phone: +49 3834 5352 823 > Mobile: +49 151 147 29 225 > Immo.Wetzel at Adtran.com PGP-Fingerprint: > 7313 7E88 4E19 AACF 45E9 E74D EFF7 0480 F4CF 6426 > http://www.adtran.com > > Sitz der Gesellschaft: Berlin / Registered office: Berlin > Registergericht: Berlin / Commercial registry: Amtsgericht Charlottenburg, > HRB 135656 B > Gesch?ftsf?hrung / Managing Directors: Roger Shannon, James D. Wilson, > Jr., Dr. Eduard Scheiterer > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From gilberto.nunes32 at gmail.com Tue Dec 19 13:16:30 2017 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 19 Dec 2017 10:16:30 -0200 Subject: [PVE-User] network restart In-Reply-To: References: Message-ID: Oh And if you can send the output of dmesg, just when the problem occur! Something like dmesg | tail, to see the lastest message from the kernel log --- Gilberto Ferreira (47) 3025-5907 (47) 99676-7530 Skype: gilberto.nunes36 2017-12-19 9:45 GMT-02:00 Gilberto Nunes : > Hi > > Can you tell us about your hardware? Mainly the network card and > switches..... > > > > --- > Gilberto Ferreira > > (47) 3025-5907 > (47) 99676-7530 > > Skype: gilberto.nunes36 > > > > > 2017-12-19 8:29 GMT-02:00 IMMO WETZEL : > >> Hi, >> >> PVE 4.4 >> we observed a few times network card outages. The only way was a network >> card driver reload. But this leads to a destroyed network setup. >> A service restart networking.service doenst solved this. It looks like >> the full network restart isn't done with that. >> Once I tried to recover the network manually with all the steps which >> should be done automatically via ip ... 
>> It's a hard way if there are 40VMs and two or three different network >> connections per vm existing. >> Is there a common tool supported way to bring back all network >> connections ? >> >> Immo Wetzel >> >> ADTRAN GmbH >> Siemensallee 1 >> 17489 Greifswald >> Germany >> >> Phone: +49 3834 5352 823 >> Mobile: +49 151 147 29 225 >> Immo.Wetzel at Adtran.com PGP-Fingerprint: >> 7313 7E88 4E19 AACF 45E9 E74D EFF7 0480 F4CF 6426 >> http://www.adtran.com >> >> Sitz der Gesellschaft: Berlin / Registered office: Berlin >> Registergericht: Berlin / Commercial registry: Amtsgericht >> Charlottenburg, HRB 135656 B >> Gesch?ftsf?hrung / Managing Directors: Roger Shannon, James D. Wilson, >> Jr., Dr. Eduard Scheiterer >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > From lindsay.mathieson at gmail.com Tue Dec 19 15:41:12 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 20 Dec 2017 00:41:12 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> Message-ID: On 12/12/2017 2:14 AM, Emmanuel Kasper wrote: > Hi Lindsay > As a quick check, is the cluster file system mounted on /etc/pve and can > you read files there normally ( ie cat /etc/pve/datacenter.cfg working ) ? > > Are the node storages returning their status properly ? > (ie pvesm status does not hang) Just had this exact same behaviour. multiple unkillable pveproxy processes with the timeout errors in dmesg. Only for the two nodes I upgraded. - cluster file system is fine - pvesm returns all storage ok. - pvecm status is normal - qm list and qm migrate just hang. - can't connect to the webgui on the two ndoes in question. Having to hard reset them as I need them usable again before work starts. -- Lindsay Mathieson From tobias.guth at ecos.de Tue Dec 19 16:24:57 2017 From: tobias.guth at ecos.de (Tobias Guth - ECOS Technology) Date: Tue, 19 Dec 2017 16:24:57 +0100 (CET) Subject: [PVE-User] pveceph dmcrypt Support Message-ID: <006601d378dd$7beb2570$73c17050$@ecos.de> Hello, I was wondering if pveceph supports creation of encrypted osds ? There is nothing in the official documentation mentioning anything about it ? Besides I did not find any information for future releases. It would be nice to have an ceph cluster setup by proxmox, but for production use my requirement is encryption of the osd devices ! Regards Tobi From m at plus-plus.su Tue Dec 19 16:59:57 2017 From: m at plus-plus.su (Mikhail) Date: Tue, 19 Dec 2017 18:59:57 +0300 Subject: [PVE-User] Failure to install latest PVE on Debian Stretch In-Reply-To: <1370055586.64.1513627612111@webmail.proxmox.com> References: <1370055586.64.1513627612111@webmail.proxmox.com> Message-ID: <5b619800-5841-4aaa-a94d-8b4885740c16@plus-plus.su> >> insserv: Service pve-cluster has to be enabled to start service pvefw-logger >> insserv: exiting now! > > We do not support insserv based systems anymore - please use systemd instead. > Thanks for pointing! I removed insserv and reinstalled systemd on a running system and then was able to fix PVE packages issue - it now runs as usual. Cheers. 
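The fix described above presumably boils down to something like the following - the exact package set is an assumption, so review what apt proposes to remove before confirming:

    # drop the sysvinit/insserv tooling and make sure the systemd init is in place
    apt-get purge insserv
    apt-get install --reinstall systemd systemd-sysv

    # finish configuring the PVE packages that were left half-installed
    dpkg --configure -a
    apt-get -f install

A reboot afterwards is a good idea so the node comes up cleanly under systemd before joining or serving anything.
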
From lindsay.mathieson at gmail.com Wed Dec 20 01:13:59 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 20 Dec 2017 10:13:59 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> Message-ID: On 20/12/2017 12:41 AM, Lindsay Mathieson wrote: > Having to hard reset them as I need them usable again before work starts. And pveproxy hung on both nodes again this morning, this is becoming quite a problem for us. [21360.917460] INFO: task pveproxy:18122 blocked for more than 120 seconds. [21360.917465]?????? Tainted: P?????????? O??? 4.4.95-1-pve #1 [21360.917469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21360.917473] pveproxy??????? D ffff8807799cbdf8???? 0 18122????? 1 0x00000004 [21360.917476]? ffff8807799cbdf8 ffff880ff114a840 ffff880ff84fc600 ffff880fd9979c00 [21360.917478]? ffff8807799cc000 ffff880fc30143ac ffff880fd9979c00 00000000ffffffff [21360.917480]? ffff880fc30143b0 ffff8807799cbe10 ffffffff818643b5 ffff880fc30143a8 [21360.917482] Call Trace: [21360.917485]? [] schedule+0x35/0x80 [21360.917487]? [] schedule_preempt_disabled+0xe/0x10 [21360.917489]? [] __mutex_lock_slowpath+0xb9/0x130 [21360.917491]? [] mutex_lock+0x1f/0x30 [21360.917493]? [] filename_create+0x7a/0x160 [21360.917495]? [] SyS_mkdir+0x53/0x100 [21360.917497]? [] entry_SYSCALL_64_fastpath+0x16/0x75 Is it possible to rollback the last update? -- Lindsay Mathieson From lindsay.mathieson at gmail.com Wed Dec 20 01:19:01 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 20 Dec 2017 10:19:01 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> Message-ID: nb. This is with Proxmox 4 On 20/12/2017 10:13 AM, Lindsay Mathieson wrote: > On 20/12/2017 12:41 AM, Lindsay Mathieson wrote: >> Having to hard reset them as I need them usable again before work >> starts. > > And pveproxy hung on both nodes again this morning, this is becoming > quite a problem for us. > > > [21360.917460] INFO: task pveproxy:18122 blocked for more than 120 > seconds. > [21360.917465]?????? Tainted: P?????????? O??? 4.4.95-1-pve #1 > [21360.917469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [21360.917473] pveproxy??????? D ffff8807799cbdf8???? 0 18122 1 > 0x00000004 > [21360.917476]? ffff8807799cbdf8 ffff880ff114a840 ffff880ff84fc600 > ffff880fd9979c00 > [21360.917478]? ffff8807799cc000 ffff880fc30143ac ffff880fd9979c00 > 00000000ffffffff > [21360.917480]? ffff880fc30143b0 ffff8807799cbe10 ffffffff818643b5 > ffff880fc30143a8 > [21360.917482] Call Trace: > [21360.917485]? [] schedule+0x35/0x80 > [21360.917487]? [] schedule_preempt_disabled+0xe/0x10 > [21360.917489]? [] __mutex_lock_slowpath+0xb9/0x130 > [21360.917491]? [] mutex_lock+0x1f/0x30 > [21360.917493]? [] filename_create+0x7a/0x160 > [21360.917495]? [] SyS_mkdir+0x53/0x100 > [21360.917497]? [] entry_SYSCALL_64_fastpath+0x16/0x75 > > > Is it possible to rollback the last update? 
> -- Lindsay Mathieson From lindsay.mathieson at gmail.com Wed Dec 20 01:33:48 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 20 Dec 2017 10:33:48 +1000 Subject: [PVE-User] pveproxy dying, node unusable In-Reply-To: References: <9a8556df-7f6c-255d-1d9e-0ad4619f5f11@gmail.com> <4e32b3c0-ddd5-6579-f521-775c22015e05@gmail.com> <5e86f27f-4d69-fc6f-8b5c-ec80f94e74ac@proxmox.com> Message-ID: <6e69b36a-9cfb-62f4-7501-6026b415d796@gmail.com> On 20/12/2017 10:13 AM, Lindsay Mathieson wrote: > On 20/12/2017 12:41 AM, Lindsay Mathieson wrote: >> Having to hard reset them as I need them usable again before work >> starts. > > And pveproxy hung on both nodes again this morning, this is becoming > quite a problem for us. > ?systemctl status pveproxy ? pveproxy.service - PVE API Proxy Server ?? Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled) ?? Active: failed (Result: timeout) since Wed 2017-12-20 06:49:06 AEST; 3h 44min ago ?Main PID: 4325 (code=exited, status=0/SUCCESS) Dec 20 06:46:06 vng systemd[1]: pveproxy.service start operation timed out. Terminating. Dec 20 06:47:36 vng systemd[1]: pveproxy.service stop-final-sigterm timed out. Killing. Dec 20 06:49:06 vng systemd[1]: pveproxy.service still around after final SIGKILL. Entering failed mode. Dec 20 06:49:06 vng systemd[1]: Failed to start PVE API Proxy Server. Dec 20 06:49:06 vng systemd[1]: Unit pveproxy.service entered failed state. -- Lindsay Mathieson From f.gruenbichler at proxmox.com Wed Dec 20 08:17:15 2017 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Wed, 20 Dec 2017 08:17:15 +0100 Subject: [PVE-User] pveceph dmcrypt Support In-Reply-To: <006601d378dd$7beb2570$73c17050$@ecos.de> References: <006601d378dd$7beb2570$73c17050$@ecos.de> Message-ID: <20171220071715.gcliw2yfzmkr2uzu@nora.maurer-it.com> On Tue, Dec 19, 2017 at 04:24:57PM +0100, Tobias Guth - ECOS Technology wrote: > Hello, > > I was wondering if pveceph supports creation of encrypted osds ? > > There is nothing in the official documentation mentioning anything about > it ? Besides I did not find any information for future releases. > > It would be nice to have an ceph cluster setup by proxmox, but for > production use my requirement is encryption of the osd devices ! > > > > Regards > > Tobi no, it does not (currently / yet). but you should be able to set them up manually, and all of the other pveceph integration stuff should still work (except for destroying the OSDs, which assumes a regular unencrypted GPT / ceph-disk setup). we might re-visit this when looking at ceph-volume integration for the upcoming Mimic release. From t.lamprecht at proxmox.com Wed Dec 20 10:22:52 2017 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 20 Dec 2017 10:22:52 +0100 Subject: [PVE-User] Proxmox provided container system appliances updated Message-ID: <6f0752ac-646a-b47d-0033-f9f1dcd682b0@proxmox.com> Hi, At the end of last week we updated the container system appliances, hosted on http://download.proxmox.com/images/ As previously, they are available to download through the Proxmox VE webUI storage content panel. Here a quick overview of what changed: New: * Ubuntu Artful (17.10) * Alpine Linux 3.6 * Alpine Linux 3.7 * Fedora 26 * Fedora 27 * openSUSE 42.3 Updated (point release or rolling release): * Debian Stretch (9.0 -> 9.3) * Centos 7 (04 May 2017 (7.3) -> 12 Dec. 2017 (7.4)) * Arch Linux (04 July 2017 -> 12 Dec. 2017) * gentoo (03 May 2017 -> 11 Dec. 
2017) Removed (EOL): * Fedora 24 * Alpine Linux 3.3 Note: Removals are done from the appliances index, they may be still downloaded manually, if needed. cheers, Thomas From gilberto.nunes32 at gmail.com Thu Dec 21 14:25:33 2017 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Thu, 21 Dec 2017 11:25:33 -0200 Subject: [PVE-User] Proxmox 5.1-40 - LXC Templates just gone!?! Message-ID: Hi guys Where's TurnKey repos??? Cheers --- Gilberto Ferreira (47) 3025-5907 (47) 99676-7530 Skype: gilberto.nunes36 From f.gruenbichler at proxmox.com Thu Dec 21 14:34:45 2017 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Thu, 21 Dec 2017 14:34:45 +0100 Subject: [PVE-User] Proxmox 5.1-40 - LXC Templates just gone!?! In-Reply-To: References: Message-ID: <20171221133445.y33dslkp56h665rw@nora.maurer-it.com> On Thu, Dec 21, 2017 at 11:25:33AM -0200, Gilberto Nunes wrote: > Hi guys > > Where's TurnKey repos??? > > Cheers > maybe you need to run "pveam update" ? everything looks OK.. From gilberto.nunes32 at gmail.com Thu Dec 21 14:35:56 2017 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Thu, 21 Dec 2017 11:35:56 -0200 Subject: [PVE-User] Proxmox 5.1-40 - LXC Templates just gone!?! In-Reply-To: <20171221133445.y33dslkp56h665rw@nora.maurer-it.com> References: <20171221133445.y33dslkp56h665rw@nora.maurer-it.com> Message-ID: Yes... I realize that just a second aftet send the e-mail. Sorry for that! --- Gilberto Ferreira (47) 3025-5907 (47) 99676-7530 Skype: gilberto.nunes36 2017-12-21 11:34 GMT-02:00 Fabian Gr?nbichler : > On Thu, Dec 21, 2017 at 11:25:33AM -0200, Gilberto Nunes wrote: > > Hi guys > > > > Where's TurnKey repos??? > > > > Cheers > > > > maybe you need to run "pveam update" ? everything looks OK.. > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From tobias.guth at ecos.de Fri Dec 29 08:51:58 2017 From: tobias.guth at ecos.de (Tobias Guth - ECOS Technology) Date: Fri, 29 Dec 2017 08:51:58 +0100 (CET) Subject: [PVE-User] pveceph dmcrypt Support Message-ID: <01db01d38079$db3de0a0$91b9a1e0$@ecos.de> > no, it does not (currently / yet). ... > we might re-visit this when looking at ceph-volume integration for the > upcoming Mimic release. Thanks for your hint. I have setup Ceph within Proxmox (pveceph) and did setup encrypted OSDs with ceph-deploy on my storageboxes. Worked like charm ! Regards Tobi From gbr at majentis.com Fri Dec 29 20:48:55 2017 From: gbr at majentis.com (Gerald Brandt) Date: Fri, 29 Dec 2017 13:48:55 -0600 Subject: [PVE-User] Snapshots not showing in interface Message-ID: Hi, I have a VM with 2 snapshots. The display of snapsots for the VM is blank, so I can't delete the snapshot from there. 
This is a conf file: #Univention Corprorate Server 4.2-3 #Active Directory Domain Server #email server # #UPS Monitoring bootdisk: virtio0 cores: 4 ide2: NAS:iso/systemrescuecd-x86-4.6.0.iso,media=cdrom,size=456342K memory: 4096 name: AD-Mail net0: virtio=36:61:63:36:37:38,bridge=vmbr0 numa: 0 onboot: 1 ostype: l26 parent: update2 smbios1: uuid=2c9872f5-3e0d-4b8b-a080-9abc234d0517 sockets: 2 startup: order=1,up=60,down=30 unused0: NAS:131/vm-131-disk-2.qcow2 virtio0: NAS:131/vm-131-disk-1.qcow2,size=150G [update] bootdisk: virtio0 cores: 4 ide2: NAS:iso/systemrescuecd-x86-4.6.0.iso,media=cdrom,size=456342K memory: 4096 name: AD-Mail net0: virtio=36:61:63:36:37:38,bridge=vmbr0 numa: 0 onboot: 1 ostype: l26 parent: update2 smbios1: uuid=2c9872f5-3e0d-4b8b-a080-9abc234d0517 snaptime: 1514400926 sockets: 2 startup: order=1,up=60,down=30 virtio0: NAS:131/vm-131-disk-1.qcow2,size=150G [update2] #before 4.1.4 to 4.2.3 bootdisk: virtio0 cores: 4 ide2: NAS:iso/systemrescuecd-x86-4.6.0.iso,media=cdrom,size=456342K memory: 4096 name: AD-Mail net0: virtio=36:61:63:36:37:38,bridge=vmbr0 numa: 0 onboot: 1 ostype: l26 parent: update smbios1: uuid=2c9872f5-3e0d-4b8b-a080-9abc234d0517 snaptime: 1514404291 sockets: 2 startup: order=1,up=60,down=30 virtio0: NAS:131/vm-131-disk-1.qcow2,size=150G Any idea why the GUI is blank? Gerald From lindsay.mathieson at gmail.com Sat Dec 30 02:27:12 2017 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 30 Dec 2017 11:27:12 +1000 Subject: [PVE-User] Snapshots not showing in interface In-Reply-To: References: Message-ID: <3410dd42-6718-493b-18af-7809cc5d6a10@gmail.com> On 30/12/2017 5:48 AM, Gerald Brandt wrote: > I have a VM with 2 snapshots. The display of snapsots for the VM is > blank, so I can't delete the snapshot from there. > > This is a conf file: update and update2 both have each other as a parent - circular reference. If you don't want to save the snapshots I'd delete them from the conf file and use qemu-img to delete them from the qcow2 image. Once that is done, delete the parent entry from the main part of the conf file. -- Lindsay Mathieson From jagan.p at stackuptech.com Sat Dec 30 09:07:46 2017 From: jagan.p at stackuptech.com (jagan) Date: Sat, 30 Dec 2017 13:37:46 +0530 Subject: [PVE-User] Corosync Totem Re transmit logs - Node not responding Message-ID: <9c7acf0b-544d-4663-ad72-e043a10dbcd5@stackuptech.com> Hi, I am using 2 node cluster with DRBD on PVE 3.4, i have seen huge log entries " corosync[2539]:? [TOTEM ] Retransmit List: 4b493 4b494 4b495 4b496 4b497 4b498 4b499 4b49a" in syslog & corosync log. one cluster node? is freezing frequently not responding (Monitor & keyboard not responding). 2 Nodes are running in production, need your support to resolve the issue. Thanks in advance. From dietmar at proxmox.com Sat Dec 30 09:33:11 2017 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 30 Dec 2017 09:33:11 +0100 (CET) Subject: [PVE-User] Snapshots not showing in interface In-Reply-To: <3410dd42-6718-493b-18af-7809cc5d6a10@gmail.com> References: <3410dd42-6718-493b-18af-7809cc5d6a10@gmail.com> Message-ID: <99258182.3.1514622792402@webmail.proxmox.com> > On December 30, 2017 at 2:27 AM Lindsay Mathieson > wrote: > > > On 30/12/2017 5:48 AM, Gerald Brandt wrote: > > I have a VM with 2 snapshots. The display of snapsots for the VM is > > blank, so I can't delete the snapshot from there. > > > > This is a conf file: > > update and update2 both have each other as a parent - circular reference. 
I wonder how that can happen - did you manually edit the config file?
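Whatever the cause, Lindsay's cleanup suggestion translates roughly into the following, assuming the VM is shut down and that "NAS" is a directory/NFS storage mounted under /mnt/pve/NAS (adjust the path to the actual storage mount, and keep a backup copy of /etc/pve/qemu-server/131.conf before editing):

    # list the internal snapshots actually present in the images
    qemu-img snapshot -l /mnt/pve/NAS/images/131/vm-131-disk-1.qcow2
    qemu-img snapshot -l /mnt/pve/NAS/images/131/vm-131-disk-2.qcow2

    # delete the stale snapshots from the disk that carries them
    qemu-img snapshot -d update  /mnt/pve/NAS/images/131/vm-131-disk-1.qcow2
    qemu-img snapshot -d update2 /mnt/pve/NAS/images/131/vm-131-disk-1.qcow2

Once qemu-img no longer lists them, removing the [update] and [update2] sections together with the "parent: update2" line from /etc/pve/qemu-server/131.conf should leave a clean config, and the snapshot panel in the GUI should match the on-disk state again.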