From lindsay.mathieson at gmail.com Thu Apr 1 03:29:33 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 1 Apr 2021 11:29:33 +1000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> Message-ID: <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> On 31/03/2021 3:20 pm, Arjen via pve-user wrote: > If you run PBS somewhere, you can use a folder on the NAS as a Datastore. You could duplicate that folder to the external drive, as a backup copy of that Datastore and keep it off-site. I'll look into that, though I'd be concerned regards file locking and sync to disk issues. Worth testing out though. > You would need a PBS to read from the external drive in case of a on-site disaster. Something like that might be simlar to what you do now, except that you need to run a PBS somewhere on-site (possibly in a VM or CT). Actually running it in a VM now for testing! -- Lindsay From leesteken at protonmail.ch Thu Apr 1 08:24:45 2021 From: leesteken at protonmail.ch (Arjen) Date: Thu, 01 Apr 2021 06:24:45 +0000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> Message-ID: On Thursday, April 1st, 2021 at 03:29, Lindsay Mathieson wrote: > On 31/03/2021 3:20 pm, Arjen via pve-user wrote: > > > If you run PBS somewhere, you can use a folder on the NAS as a Datastore. You could duplicate that folder to the external drive, as a backup copy of that Datastore and keep it off-site. > > I'll look into that, though I'd be concerned regards file locking and > > sync to disk issues. Worth testing out though. Probably best to gracefully shutdown PBS before copying the disk. Or do a copy (best effort, ignoring failures) first, then shutdown PBS and rsync all remaining differences to reduce the down time on the PBS. Maybe someone more knowledgeable of PBS can tell us how to copy a Datastore to another disk? > > You would need a PBS to read from the external drive in case of a on-site disaster. Something like that might be simlar to what you do now, except that you need to run a PBS somewhere on-site (possibly in a VM or CT). > > Actually running it in a VM now for testing! If the Datastore is on a virtual disk, you could maybe use vzdump to backup that disk or even the whole VM to an external drive? Or first backup the whole PBS VM to the NAS and then sync to external disk, much like you do now? kind regards, Arjen From lindsay.mathieson at gmail.com Thu Apr 1 11:32:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 1 Apr 2021 19:32:50 +1000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> Message-ID: On 1/04/2021 4:24 pm, Arjen via pve-user wrote: > Or first backup the whole PBS VM to the NAS and then sync to external disk, much like you do now? That would certainly work Cheers, -- Lindsay From leandro at tecnetmza.com.ar Thu Apr 1 18:58:54 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Thu, 1 Apr 2021 13:58:54 -0300 Subject: [PVE-User] mi first cluster Message-ID: Hi guys : Im preparing my second proxmox box to create my first cluster. IT does not contain any usefull data yet , so I can play aorund a little bit. 
Have some questions. Is it possible to create a cluster with only two boxes ? I understood it is possible , but can not move VMs without service disruption. And if I want to move VMs without services disruption I need a third box. What is ceph storage for ? what is ZFS pool for ? In case I need I can add it on any of my available bays o any box ? Do pve boxes need to be on same network subnet ? I have two datacenters , I was thinking in the idea to install them on each location just for redundancy. Lets supose network connection performance is optimal ... Is there any problem with that ? Any other clustering good practice advice would be wellcome. Thanks. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From d.csapak at proxmox.com Fri Apr 2 11:28:29 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Fri, 2 Apr 2021 11:28:29 +0200 Subject: [PVE-User] mi first cluster In-Reply-To: References: Message-ID: On 4/1/21 18:58, Leandro Roggerone wrote: > Hi guys : Hi, > Im preparing my second proxmox box to create my first cluster. > IT does not contain any usefull data yet , so I can play aorund a little > bit. > Have some questions. > Is it possible to create a cluster with only two boxes ? yes > I understood it is possible , but can not move VMs without service > disruption. if you mean HA, yes you need at least 3 nodes or 2 nodes + a quorum device, also the storage needs to be avaiable on both nodes live migration still works even with 2 nodes, and even with local storage > And if I want to move VMs without services disruption I need a third box. > > What is ceph storage for ? > what is ZFS pool for ? what do you mean? ceph is distributed, redundant, software defined storage zfs is a type of local filesystem and disk management > In case I need I can add it on any of my available bays o any box ? > > Do pve boxes need to be on same network subnet ? generally no, i think, but it makes it easier > I have two datacenters , I was thinking in the idea to install them on each > location just for redundancy. > Lets supose network connection performance is optimal ... Is there any > problem with that ? i'd advise against that. our clustering stack uses corosync, which needs low latency links to work properly > Any other clustering good practice advice would be wellcome. > Thanks. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From d.csapak at proxmox.com Fri Apr 2 11:28:29 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Fri, 2 Apr 2021 11:28:29 +0200 Subject: [PVE-User] mi first cluster In-Reply-To: References: Message-ID: On 4/1/21 18:58, Leandro Roggerone wrote: > Hi guys : Hi, > Im preparing my second proxmox box to create my first cluster. > IT does not contain any usefull data yet , so I can play aorund a little > bit. > Have some questions. > Is it possible to create a cluster with only two boxes ? yes > I understood it is possible , but can not move VMs without service > disruption. if you mean HA, yes you need at least 3 nodes or 2 nodes + a quorum device, also the storage needs to be avaiable on both nodes live migration still works even with 2 nodes, and even with local storage > And if I want to move VMs without services disruption I need a third box. > > What is ceph storage for ? > what is ZFS pool for ? what do you mean? 
ceph is distributed, redundant, software defined storage zfs is a type of local filesystem and disk management > In case I need I can add it on any of my available bays o any box ? > > Do pve boxes need to be on same network subnet ? generally no, i think, but it makes it easier > I have two datacenters , I was thinking in the idea to install them on each > location just for redundancy. > Lets supose network connection performance is optimal ... Is there any > problem with that ? i'd advise against that. our clustering stack uses corosync, which needs low latency links to work properly > Any other clustering good practice advice would be wellcome. > Thanks. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From ouafnico at shivaserv.fr Fri Apr 2 14:43:14 2021 From: ouafnico at shivaserv.fr (Ouafnico) Date: Fri, 02 Apr 2021 14:43:14 +0200 Subject: vgs/vgscan sleep disks Message-ID: <2OSXQQ.7DLD2CWRGIPN3@shivaserv.fr> Hi, I'm using proxmox for a while on a personal home server. I'm using multiples disk devices with LVM. Actually, I saw pve is waking up all my disks every x seconds for the LVM vgs command with pvestatd. I'm not using all my lvm volumes groups for proxmox, some are for others needs, so I can't hide them on global_filters in LVM configuration. I'm still searching how to tell proxmox to do not vgscan all disks, but only on VG declared on pve. Is there any way to do so, or disable this check? Thanks From ouafnico at shivaserv.fr Fri Apr 2 16:18:31 2021 From: ouafnico at shivaserv.fr (Ouafnico) Date: Fri, 02 Apr 2021 16:18:31 +0200 Subject: [PVE-User] vgs/vgscan sleep disks In-Reply-To: References: Message-ID: I might have found something, if it can help anyone. vgs is scanning every /dev/sd* devices, and /dev/md* devices. My VG are on mdadm devices. I have added in lvm filter, /dev/sd* in reject, but accept for /dev/md* only. Maybe the mdadm cache is responding to vgs commands, but now I see only vgs seeks on /dev/md*, and it's not waking up all devices. Le ven. 2 avril 2021 ? 14:43, Ouafnico via pve-user a ?crit : > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > From stefan at fuhrmann.homedns.org Sat Apr 3 10:51:26 2021 From: stefan at fuhrmann.homedns.org (Stefan Fuhrmann) Date: Sat, 3 Apr 2021 10:51:26 +0200 Subject: [PVE-User] Windows sever 2016 vm with very high ram usage In-Reply-To: References: Message-ID: Ahoi, ?you have to installthe drivers: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers Stefan Am 29.03.21 um 15:37 schrieb Leandro Roggerone: > Hi guys , thanks for your words , have some feedback: > > Maybe Proxmox cannot look inside the VM for the actual memory usage because > the VirtIO balloon driver is not installed or active? Or maybe the other > 90% is in use as Windows file cache? > I think so ... > Dont know about windows file cache ... > > Have you installed the VirtIO drivers for Windows? Are you assigning too > many vCPUs or memory? Can you share the VM configuration file? Can you tell > us something about your Proxmox hardware and version? > No , I have not ... now you mentioned im reading about VirtIO , I will try > to install and let you know how it goes > Should install it no my pve box or directly inside windows vm ? 
> I'm assigning max vCPUs abailables (24 , 4 sockets 6 cores) and 32gb for > memory. > (I really don't know about any criteria to assign vcpus) > > This is VM config file: > > root at pve:~# cat /etc/pve/nodes/pve/qemu-server/107.conf > > bootdisk: ide0 > > cores: 6 > > ide0: local-lvm:vm-107-disk-0,size=150G > > ide1: local-lvm:vm-107-disk-1,size=350G > > ide2: > local:iso/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO,media=cdrom,size=6808810K > > memory: 32768 > > name: KAIKENII > > net0: e1000=1A:F1:10:BF:92:0A,bridge=vmbr3,firewall=1 > > numa: 0 > > ostype: win10 > > scsihw: virtio-scsi-pci > > smbios1: uuid=daf8f767-59c7-4e87-b3be-75d4a8020c38 > > sockets: 4 > > vmgenid: a7634624-1230-4a3e-9e7c-255d32ad2030 > > My PVE is: > CPU(s) 24 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz (2 Sockets) > Kernel Version Linux 5.0.15-1-pve #1 SMP PVE 5.0.15-1 (Wed, 03 Jul 2019 > 10:51:57 +0200) > PVE Manager Version pve-manager/6.0-4/2a719255 > Total Mem = 64GB. > > That's all. > Thanks > > > > > El vie, 26 mar 2021 a las 11:40, Arjen via pve-user (< > pve-user at lists.proxmox.com>) escribi?: > >> >> >> ---------- Forwarded message ---------- >> From: Arjen >> To: Proxmox VE user list >> Cc: >> Bcc: >> Date: Fri, 26 Mar 2021 14:39:13 +0000 >> Subject: Re: [PVE-User] Windows sever 2016 vm with very high ram usage >> On Friday, March 26th, 2021 at 15:28, Leandro Roggerone < >> leandro at tecnetmza.com.ar> wrote: >> >>> Hi guys , Just wanted to share this with you. >>> >>> After creating a VM for a windows sever 2016 with 32 GB ram I can >>> >>> see continuos high memory usage (about 99-100%). >>> >>> I have no running task , since it is a fresh server and from task manager >>> >>> can see a 10% of memory usage. >>> >>> Regarding those confusing differences, server performance is very bad. >> Maybe Proxmox cannot look inside the VM for the actual memory usage >> because the VirtIO balloon driver is not installed or active? Or maybe the >> other 90% is in use as Windows file cache? >> >>> User experience is very poor with a non fluent user interface. >>> >>> Is there something to do / check to improve this ? >> Have you installed the VirtIO drivers for Windows? Are you assigning too >> many vCPUs or memory? Can you share the VM configuration file? Can you tell >> us something about your Proxmox hardware and version? >> >>> Any advice would be welcome. >> >> Maybe search the forum for similar Windows performance questions? >> >> >> https://forum.proxmox.com/forums/proxmox-ve-installation-and-configuration.16/ >> >>> Thanks. >> best of luck, Arjen >> >> >> ---------- Forwarded message ---------- >> From: Arjen via pve-user >> To: Proxmox VE user list >> Cc: Arjen >> Bcc: >> Date: Fri, 26 Mar 2021 14:39:13 +0000 >> Subject: Re: [PVE-User] Windows sever 2016 vm with very high ram usage >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From hongyi.zhao at gmail.com Sun Apr 4 10:25:51 2021 From: hongyi.zhao at gmail.com (Hongyi Zhao) Date: Sun, 4 Apr 2021 16:25:51 +0800 Subject: [PVE-User] Change the private FQDN of pve node. 
Message-ID: Currently, I'm running only one pve node on one of my intranet machine which using the following hosts file configuration: --------- # cat /etc/hosts 127.0.0.1 localhost.localdomain localhost #https://pve.proxmox.com/wiki/Renaming_a_PVE_node 192.168.10.254 pve.lan pve # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts --------- For compatibility and scalability, I intend to switch to the following domain name: pve1.pve.lan For this purpose, I changed the following two files as below: /etc/hosts: 192.168.10.254 pve1.pve.lan pve1 /etc/hostname: pve1 I'm not sure if the above is enough. Any hints will be highly appreciated. Regards -- Assoc. Prof. Hongyi Zhao Theory and Simulation of Materials Hebei Polytechnic University of Science and Technology engineering NO. 552 North Gangtie Road, Xingtai, China From martin.konold at konsec.com Mon Apr 5 10:22:56 2021 From: martin.konold at konsec.com (Konold, Martin) Date: Mon, 05 Apr 2021 10:22:56 +0200 Subject: [PVE-User] ZFS Disk Usage unexpected high Message-ID: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Hi, I set up a single VM which currently used 15TB of data. /dev/sdb is technically a ZFS volume on the Proxmox Host. [root at vm ~]# df -h /data Filesystem Size Used Avail Use% Mounted on /dev/sdb 40T 15T 25T 37% /data [root at vm ~]# du -s /data/ 14874345100 /data/ [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 NAME USED AVAIL REFER MOUNTPOINT zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - Why is the usage on the host about twice as large as within the vm? (Yes, I have given fstrim and discard a try). -- Regards ppa. Martin Konold -- Martin Konold - Prokurist, CTO KONSEC GmbH -? make things real Amtsgericht Stuttgart, HRB 23690 Gesch?ftsf?hrer: Andreas Mack Im K?ller 3, 70794 Filderstadt, Germany From gianni.milo22 at gmail.com Mon Apr 5 11:54:40 2021 From: gianni.milo22 at gmail.com (Yanni M.) Date: Mon, 5 Apr 2021 10:54:40 +0100 Subject: [PVE-User] ZFS Disk Usage unexpected high In-Reply-To: <7e9d1358dcb12b288e884547553e39b4@konsec.com> References: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Message-ID: This is a common issue on raidz based pools. Assuming 4k sectors (ashift=12) and zvol with 8K volblocksize, each 8K (2-sector) block uses a single sector (4k) of parity. So 15TB of 8KB blocks (default volblocksize=8k) takes up at least 22.5TB space on disk (including parity). You will use less parity by increasing the volblock size (e.g. volblocksize=32k, or the default recordsize=128k for filesystems) in exchange of a possible lower performance. Another solution would be using a pool of striped mirrors (RAID10). This problem does not exist in such pools (as there are no parity blocks used). On Mon, 5 Apr 2021 at 09:23, Konold, Martin wrote: > > Hi, > > I set up a single VM which currently used 15TB of data. > /dev/sdb is technically a ZFS volume on the Proxmox Host. > > [root at vm ~]# df -h /data > Filesystem Size Used Avail Use% Mounted on > /dev/sdb 40T 15T 25T 37% /data > [root at vm ~]# du -s /data/ > 14874345100 /data/ > > [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 > NAME USED AVAIL REFER MOUNTPOINT > zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - > > Why is the usage on the host about twice as large as within the vm? > (Yes, I have given fstrim and discard a try). > > -- > Regards > ppa. Martin Konold > > -- > Martin Konold - Prokurist, CTO > KONSEC GmbH -? 
make things real > Amtsgericht Stuttgart, HRB 23690 > Gesch?ftsf?hrer: Andreas Mack > Im K?ller 3, 70794 Filderstadt, Germany > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From leandro at tecnetmza.com.ar Mon Apr 5 16:25:55 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 5 Apr 2021 11:25:55 -0300 Subject: [PVE-User] * this host already contains virtual guests on fresh box Message-ID: Hi guys, I was trying to create a cluster and add a node to it I have my main box (172.30.6.254) with a lot of vms and containers running there. I created a cluster in my main box. Then on my new fresh box (172.30.6.253) , tried to join to created cluster but got the message: * this host already contains virtual guests and can not continue. Why is that happening? my server is new and empty. Have been looking but can not fix it. Any suggestions? Thanks, Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From leandro at tecnetmza.com.ar Mon Apr 5 16:48:14 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 5 Apr 2021 11:48:14 -0300 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) Message-ID: Please dismiss my previous email , already solved After searching on the new server directory , it has some VMs info from main server created on a first join attempt. I removed those directories at /etc/pve/nodes/main_node and could succesfully add my new node to cluster. Thanks. From laurentfdumont at gmail.com Tue Apr 6 01:04:42 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:04:42 -0400 Subject: [PVE-User] * this host already contains virtual guests on fresh box In-Reply-To: References: Message-ID: Is the new box really fresh? No LXD container/VM? You should be able to join a fresh box to an existing cluster with VMs already running. On Mon, Apr 5, 2021 at 10:26 AM Leandro Roggerone wrote: > Hi guys, I was trying to create a cluster and add a node to it > I have my main box (172.30.6.254) with a lot of vms and containers running > there. > I created a cluster in my main box. > Then on my new fresh box (172.30.6.253) , tried to join to created cluster > but got the message: > * this host already contains virtual guests > and can not continue. > Why is that happening? my server is new and empty. > Have been looking but can not fix it. > Any suggestions? > Thanks, Leandro. > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > Libre > de virus. 
www.avast.com > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:05:11 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:05:11 -0400 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) In-Reply-To: References: Message-ID: Oups, replied to your other email but glad you found a fix :) On Mon, Apr 5, 2021 at 10:48 AM Leandro Roggerone wrote: > Please dismiss my previous email , already solved > After searching on the new server directory , it has some VMs info from > main server created on a first join attempt. > I removed those directories at /etc/pve/nodes/main_node > and could succesfully add my new node to cluster. > Thanks. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:04:42 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:04:42 -0400 Subject: [PVE-User] * this host already contains virtual guests on fresh box In-Reply-To: References: Message-ID: Is the new box really fresh? No LXD container/VM? You should be able to join a fresh box to an existing cluster with VMs already running. On Mon, Apr 5, 2021 at 10:26 AM Leandro Roggerone wrote: > Hi guys, I was trying to create a cluster and add a node to it > I have my main box (172.30.6.254) with a lot of vms and containers running > there. > I created a cluster in my main box. > Then on my new fresh box (172.30.6.253) , tried to join to created cluster > but got the message: > * this host already contains virtual guests > and can not continue. > Why is that happening? my server is new and empty. > Have been looking but can not fix it. > Any suggestions? > Thanks, Leandro. > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > Libre > de virus. www.avast.com > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:05:11 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:05:11 -0400 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) In-Reply-To: References: Message-ID: Oups, replied to your other email but glad you found a fix :) On Mon, Apr 5, 2021 at 10:48 AM Leandro Roggerone wrote: > Please dismiss my previous email , already solved > After searching on the new server directory , it has some VMs info from > main server created on a first join attempt. > I removed those directories at /etc/pve/nodes/main_node > and could succesfully add my new node to cluster. > Thanks. 
> _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From mhill at inett.de Tue Apr 6 09:41:26 2021 From: mhill at inett.de (Maximilian Hill) Date: Tue, 6 Apr 2021 09:41:26 +0200 Subject: [PVE-User] ZFS Disk Usage unexpected high In-Reply-To: <7e9d1358dcb12b288e884547553e39b4@konsec.com> References: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Message-ID: Hi, I got the same issue with different RAID-Z setups lately. We worked around it, but I don't want to go into detail abaout that before I know. why that happened. Regards Maximilian Hill On Mon, Apr 05, 2021 at 10:22:56AM +0200, Konold, Martin wrote: > > Hi, > > I set up a single VM which currently used 15TB of data. > /dev/sdb is technically a ZFS volume on the Proxmox Host. > > [root at vm ~]# df -h /data > Filesystem Size Used Avail Use% Mounted on > /dev/sdb 40T 15T 25T 37% /data > [root at vm ~]# du -s /data/ > 14874345100 /data/ > > [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 > NAME USED AVAIL REFER MOUNTPOINT > zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - > > Why is the usage on the host about twice as large as within the vm? > (Yes, I have given fstrim and discard a try). > > -- > Regards > ppa. Martin Konold -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From leandro at tecnetmza.com.ar Wed Apr 7 17:25:16 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Wed, 7 Apr 2021 12:25:16 -0300 Subject: [PVE-User] link down for network interface Message-ID: Hi guys ... Yesterday was working on my datacenter cabling my new proxmox network interfaces. I connected a network port to my mikrotik router. Before coming home I checked network linked condition and physically was ok , both leds blinking on both side , router and server nic. (eno1) Now , working remotely can see that server interface has no link. Very strange since interface at router is up. Any idea ? I never seen something similar (one side link ok and the other is down). Here can I share some outputs: root at pve2:~# ip link 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: ens2f0: mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000 link/ether a0:d3:c1:f5:37:08 brd ff:ff:ff:ff:ff:ff 3: ens2f1: mtu 1500 qdisc mq master vmbr1 state UP mode DEFAULT group default qlen 1000 link/ether a0:d3:c1:f5:37:09 brd ff:ff:ff:ff:ff:ff 4: eno1: mtu 1500 qdisc mq master vmbr2 state DOWN mode DEFAULT group default qlen 1000 link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff 5: eno2: mtu 1500 qdisc mq master vmbr3 state DOWN mode DEFAULT group default qlen 1000 as you can see eno1 is down ... while can see link up on the other connected side. I will change cable tomorrow, but wanted to share this experience with you. BTW , already tryed autonegociation , and fixed 100, 1000Mb. Regards. Leandro. Libre de virus. 
www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From chris.hofstaedtler at deduktiva.com Thu Apr 8 00:03:19 2021 From: chris.hofstaedtler at deduktiva.com (Chris Hofstaedtler | Deduktiva) Date: Thu, 8 Apr 2021 00:03:19 +0200 Subject: [PVE-User] link down for network interface In-Reply-To: References: Message-ID: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> * Leandro Roggerone [210407 17:25]: > Hi guys ... Please consider that more than one gender exists. > Yesterday was working on my datacenter cabling my new proxmox network > interfaces. > I connected a network port to my mikrotik router. > Before coming home I checked network linked condition and physically was ok > , both leds blinking on both side , router and server nic. (eno1) > Now , working remotely can see that server interface has no link. > Very strange since interface at router is up. > Any idea ? I never seen something similar (one side link ok and the other > is down). > root at pve2:~# ip link [..] > 4: eno1: mtu 1500 qdisc mq master vmbr2 > state DOWN mode DEFAULT group default qlen 1000 > link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff > > as you can see eno1 is down ... while can see link up on the other > connected side. You did not tell us the brand and make or driver of that network card. Some Intel cards are known to be extremely picky about autoneg and power save settings, (short) cables, etc. Best luck, Chris -- Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien) www.deduktiva.com / +43 1 353 1707 From lindsay.mathieson at gmail.com Thu Apr 8 09:44:10 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 17:44:10 +1000 Subject: [PVE-User] PBS and Open VSwicth Message-ID: Does PBS support OpenVSwicth? not seeing anything in the webgui for it -- Lindsay From mityapetuhov at gmail.com Thu Apr 8 09:58:57 2021 From: mityapetuhov at gmail.com (Dmitry Petuhov) Date: Thu, 8 Apr 2021 10:58:57 +0300 Subject: [PVE-User] PBS and Open VSwicth In-Reply-To: References: Message-ID: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> Hello What for? I don't see any application of OVS in PBS. Maybe for interface bonding, but it can be done easier. Regardless of support in GUI, you always can configure it by hand like in standard Debian installation. 08.04.2021 10:44, Lindsay Mathieson ?????: > Does PBS support OpenVSwicth? not seeing anything in the webgui for it > From lindsay.mathieson at gmail.com Thu Apr 8 15:04:23 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:04:23 +1000 Subject: [PVE-User] PBS and Open VSwicth In-Reply-To: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> References: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> Message-ID: On 8/04/2021 5:58 pm, Dmitry Petuhov wrote: > What for? I don't see any application of OVS in PBS. Maybe for > interface bonding, but it can be done easier. Meh - I find OVS more flexible > > Regardless of support in GUI, you always can configure it by hand like > in standard Debian installation. I prefer to stick with "The Proxmox Way", it avoids ocmplications. Regards, I'm not hung up on it, just checking. Have a Linux LACP bond setup currently. -- Lindsay From lindsay.mathieson at gmail.com Thu Apr 8 15:06:27 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:06:27 +1000 Subject: [PVE-User] PBS and ZFS Pools Compression? 
Message-ID: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> Setup a ZFS RAID1 pool on my PBS server and enabled lz4 compression on it (habit). But given the backups are already compressed, its not really going to gain anything is it? possibly even counter productive? Thanks. -- Lindsay From devzero at web.de Thu Apr 8 15:12:57 2021 From: devzero at web.de (Roland) Date: Thu, 8 Apr 2021 15:12:57 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> Message-ID: <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> hi, i think it's counter productive, as you are wasting cpu for compress/uncompress data, which should not be further compressible (except some smaller files like img.fidx etc) regards roland Am 08.04.21 um 15:06 schrieb Lindsay Mathieson: > Setup a ZFS RAID1 pool on my PBS server and enabled lz4 compression on > it (habit). But given the backups are already compressed, its not > really going to gain anything is it? possibly even counter productive? > > > Thanks. > From lindsay.mathieson at gmail.com Thu Apr 8 15:13:46 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:13:46 +1000 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> Message-ID: <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> On 8/04/2021 11:12 pm, Roland wrote: > i think it's counter productive, as you are wasting cpu for > compress/uncompress data, Yah, I thought so, thanks. -- Lindsay From m at plus-plus.su Thu Apr 8 17:19:27 2021 From: m at plus-plus.su (Mikhail) Date: Thu, 8 Apr 2021 18:19:27 +0300 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> Message-ID: <478f113a-b899-8db7-ae2c-8de0084c7920@plus-plus.su> On 4/8/21 4:13 PM, Lindsay Mathieson wrote: > On 8/04/2021 11:12 pm, Roland wrote: >> i think it's counter productive, as you are wasting cpu for >> compress/uncompress data, > > Yah, I thought so, thanks. > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not doing double-compression in such cases, so I guess there's no harm here (we also have lz4 enabled on datastore where Proxmox sends backups). Also lz4 is cheap, so I doubt it has any significant impact on modern CPUs. regards, From dietmar at proxmox.com Thu Apr 8 17:47:23 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 8 Apr 2021 17:47:23 +0200 (CEST) Subject: [PVE-User] PBS and ZFS Pools Compression? Message-ID: <156659471.1579.1617896843476@webmail.proxmox.com> > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not > doing double-compression in such cases, AFAIK the only way to detect compressed data is to actually compress it, then test the size. So this is double-compression ... From gseeley at gmail.com Thu Apr 8 18:22:44 2021 From: gseeley at gmail.com (Geoff Seeley) Date: Thu, 8 Apr 2021 09:22:44 -0700 Subject: [PVE-User] PBS and ZFS Pools Compression? 
In-Reply-To: References: Message-ID: > > > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not > > doing double-compression in such cases, > > AFAIK the only way to detect compressed data is to actually compress it, > then > test the size. So this is double-compression ... > ZFS compression is a little more complex than this, but the good news is that ZFS is also smart enough not to do this! This is a good article on the subject: https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ TL;DR: Enable compression at the pool level and forget about it. -Geoff From dietmar at proxmox.com Thu Apr 8 20:06:59 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 8 Apr 2021 20:06:59 +0200 (CEST) Subject: [PVE-User] PBS and ZFS Pools Compression? Message-ID: <1052148244.1615.1617905219488@webmail.proxmox.com> > This is a good article on the subject: > https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ Can't find where the explain it. ZFS magically detects if data is compressible? Please can someone give me a hint how they do that? From devzero at web.de Fri Apr 9 01:51:09 2021 From: devzero at web.de (Roland) Date: Fri, 9 Apr 2021 01:51:09 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1052148244.1615.1617905219488@webmail.proxmox.com> References: <1052148244.1615.1617905219488@webmail.proxmox.com> Message-ID: <9cd163da-e7b6-f3cf-5537-f601babcaba0@web.de> i know that there was smartcompression feature in nexenta: https://openzfs.org/w/images/4/4d/Compression-Saso_Kiselkov.pdf afaik, it does not exist in zfsonlinux/openzfs. on my system, i'm getting - <250MB/s when writing uncompressible data to zfs pool with lz4 enabled and - >450MB/s when writing uncompressible data to zfs pool without compression regards roland Am 08.04.21 um 20:06 schrieb Dietmar Maurer: >> This is a good article on the subject: >> https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ > Can't find where the explain it. ZFS magically detects if data is compressible? > Please can someone give me a hint how they do that? > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From f.gruenbichler at proxmox.com Fri Apr 9 08:54:01 2021 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?q?Gr=FCnbichler?=) Date: Fri, 09 Apr 2021 08:54:01 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1052148244.1615.1617905219488@webmail.proxmox.com> References: <1052148244.1615.1617905219488@webmail.proxmox.com> Message-ID: <1617951199.q66a123ls2.astroid@nora.none> On April 8, 2021 8:06 pm, Dietmar Maurer wrote: >> This is a good article on the subject: >> https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ > > Can't find where the explain it. ZFS magically detects if data is compressible? > Please can someone give me a hint how they do that? no. they compress, and if the result is over a certain threshold, they save the uncompressed data to avoid the decompress overhead for barely any/no gain. 
From leandro at tecnetmza.com.ar Fri Apr 9 13:20:59 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Fri, 9 Apr 2021 08:20:59 -0300 Subject: [PVE-User] link down for network interface In-Reply-To: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> References: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> Message-ID: Veeeery strange. After removing the vlan aware flag on the interface configuration could get the link up. Fortunately I have a second nic on this server so I will use it instead. Have not much time to debug this. Thanks. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El mi?, 7 abr 2021 a las 19:04, Chris Hofstaedtler | Deduktiva (< chris.hofstaedtler at deduktiva.com>) escribi?: > * Leandro Roggerone [210407 17:25]: > > Hi guys ... > Please consider that more than one gender exists. > > > Yesterday was working on my datacenter cabling my new proxmox network > > interfaces. > > I connected a network port to my mikrotik router. > > Before coming home I checked network linked condition and physically was > ok > > , both leds blinking on both side , router and server nic. (eno1) > > Now , working remotely can see that server interface has no link. > > Very strange since interface at router is up. > > Any idea ? I never seen something similar (one side link ok and the > other > > is down). > > > root at pve2:~# ip link > [..] > > 4: eno1: mtu 1500 qdisc mq master > vmbr2 > > state DOWN mode DEFAULT group default qlen 1000 > > link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff > > > > as you can see eno1 is down ... while can see link up on the other > > connected side. > > You did not tell us the brand and make or driver of that network > card. Some Intel cards are known to be extremely picky about autoneg > and power save settings, (short) cables, etc. > > Best luck, > Chris > > -- > Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien) > www.deduktiva.com / +43 1 353 1707 > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Fri Apr 9 19:19:06 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Fri, 9 Apr 2021 14:19:06 -0300 Subject: [PVE-User] LVM question Message-ID: Hi guys , after install a new storage to my box , had to create lvm-thin. Im not very good with lvm , after reading some docs , and links like: https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ I got a working solution but have also some questions about it. 
This is what I did: wipefs -a /dev/sdb sgdisk -N 1 /dev/sdb pvcreate --metadatasize 1024M -y -ff /dev/sdb1 vgcreate --metadatasize 1024M proxvg /dev/sdb1 lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n proxthin proxvg lvcreate -n proxvz -V 1.1T proxvg/proxthin mkfs.ext4 /dev/proxvg/proxvz mkdir /media/vz echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' >> /etc/fstab mount -a And have following result: root at pve2:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 root at pve2:~# lvdisplay --- Logical volume --- LV Name proxthin VG Name proxvg LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX LV Write Access read/write LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 LV Pool metadata proxthin_tmeta LV Pool data proxthin_tdata LV Status available # open 3 LV Size 1.63 TiB Allocated pool data 22.03% Allocated metadata 6.34% Current LE 428451 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:2 --- Logical volume --- LV Path /dev/proxvg/proxvz LV Name proxvz VG Name proxvg LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR LV Write Access read/write LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 LV Pool name proxthin LV Status available # open 1 LV Size 1.10 TiB Mapped size 1.67% Current LE 288359 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:4 Have following comments: I can create VMs on proxthin partition so it is ok. I can create backup on proxvz partition so it is ok. What im concerned about is: Physic storage space is about 1.8TG , how is it possible to create a 1.6 and 1.1T volumnes inside ? It can be a problem in the future ? I was thinking about reduce proxthin partition to 600Gb aprox , so it make same sense 1.1T + 600G aprox 1.8 T But there is no LV Path on proxthin partition so I can unmount and the reduce. So .. What im missing here ? do I need to reduce proxthin partition ( I do need the 1.1T partition to backup). Hope to be clear about this guys. Any comment would be wellcome. Leandro. From lindsay.mathieson at gmail.com Sat Apr 10 05:22:57 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 13:22:57 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? Message-ID: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> TTL;DR - Backup PC is a standalone Proxmox Server, running a PBS lxc container using its host filesystem for the Backup store. The PBS container is backed up using vzdump to an attached external hard drive which is rotated offsite. Source Data * 5 Node Proxmox Cluster * Ceph storage (size = 3) * 30 VM's Backup Destination Standalone Proxmox Server in same server room as main cluster * 16 GM Ram * CPU: Intel i5 * Boot - 2 SSD's in ZFS RAID1 * Data - 2 4TB WD NAS Drives in ZFS RAID1 * Bonded 1G * 3 * PBS Container o Data Store - 4TB, passed through from Host Schedule * Proxmox backups up all VM's to PBS Container Weekly o Will revisit the schedule * Host proxmox server backs up PBS and its data to external hard drive o Approx 2 TB Data (we are just an SMB) o Drive is rotated offsite Recovery * VM's can easily be restored from the PBS server as needed (very rare occurrence - usually a user messed up their VM) Disaster Recovery This is the real concern - Fire, Theft etc. All servers and data including the Standalone Host and PBS server are gone. 
* Recreate Proxmox Cluster * Recreate Proxmox PBS Host o Restore the PBS Container and data from an offsite backup disk * Restore the VM's to the Cluster from the PBS Container Does all this seem practical and safe? Thanks - Lindsay -- Lindsay From leesteken at protonmail.ch Sat Apr 10 09:10:10 2021 From: leesteken at protonmail.ch (Arjen) Date: Sat, 10 Apr 2021 07:10:10 +0000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? In-Reply-To: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: <2R8D4utXiNGRYsPE8Hi1xF74VKA4tNAgEXchp_H_Zs84aZoA1i8iPgh6oqyoPGBG4-W9_Df1IZZxwEmN-d3QGw18OKKfVvmuLxElBVqjkiI=@protonmail.ch> On Saturday, April 10th, 2021 at 05:22, Lindsay Mathieson wrote: > TTL;DR - Backup PC is a standalone Proxmox Server, running a PBS lxc > container using its host filesystem for the Backup store. The PBS > container is backed up using vzdump to an attached external hard drive > which is rotated offsite. > > Source Data > > - 5 Node Proxmox Cluster > - Ceph storage (size = 3) > - 30 VM's > > Backup Destination > > Standalone Proxmox Server in same server room as main cluster > - 16 GM Ram > - CPU: Intel i5 > - Boot - 2 SSD's in ZFS RAID1 > - Data - 2 4TB WD NAS Drives in ZFS RAID1 > - Bonded 1G * 3 > - PBS Container > > o Data Store - 4TB, passed through from Host > > Schedule > - Proxmox backups up all VM's to PBS Container Weekly > > o Will revisit the schedule > - Host proxmox server backs up PBS and its data to external hard drive > > o Approx 2 TB Data (we are just an SMB) > > o Drive is rotated offsite > > Recovery > - VM's can easily be restored from the PBS server as needed (very rare > > occurrence - usually a user messed up their VM) > > Disaster Recovery > > This is the real concern - Fire, Theft etc. All servers and data > including the Standalone Host and PBS server are gone. > - Recreate Proxmox Cluster > - Recreate Proxmox PBS Host > > o Restore the PBS Container and data from an offsite backup disk > - Restore the VM's to the Cluster from the PBS Container > > Does all this seem practical and safe? Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). best regards, Arjen From lindsay.mathieson at gmail.com Sat Apr 10 15:28:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 23:28:50 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? 
In-Reply-To: References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: On 10/04/2021 5:10 pm, Arjen via pve-user wrote: > Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. I only passed 2TB through and the actual backup data comes to 1.3TB > The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. I wondered that. Will be testing. > I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. Certainly a possibility. I also wondered if it was practical to attach an external disk to PBS as a Datastore, then detach it. A bit more manual, but doable. > I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. I want to keep the storage separate from the cluster, in that regard the local storage is a single point of failure, hence the need for offsite storage as well :) > > Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. I'll definitely be testing restore options to check that it works. > Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). Alas :( Perhaps I could do a backup on site, then physically move it offsite and attach it to a offsite PBS server and then sync it remotely - incremental backups over the net would be doable. nb. Our NAS died, hence my increased investigation of this :) Definitely want to go with a more open and targeted solution this time, the NAS was a good appliance, but inflexible. Thanks! -- Lindsay From lindsay.mathieson at gmail.com Sat Apr 10 15:36:23 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 23:36:23 +1000 Subject: [PVE-User] PBS Incremental and stopped VM's Message-ID: I'm guessing only running VM's (with dirty bitmap support) can be incrementally backed up? Might be nice if we could schedule backups for only running VM's -- Lindsay From leesteken at protonmail.ch Sat Apr 10 15:43:43 2021 From: leesteken at protonmail.ch (Arjen) Date: Sat, 10 Apr 2021 13:43:43 +0000 Subject: [PVE-User] PBS Incremental and stopped VM's In-Reply-To: References: Message-ID: On Saturday, April 10th, 2021 at 15:36, Lindsay Mathieson wrote: > I'm guessing only running VM's (with dirty bitmap support) can be > > incrementally backed up? > > Might be nice if we could schedule backups for only running VM's Just to be clear: PBS always makes a full backup. The resulting data is deduplicated (before sending it to the server), which almost always reduces the writes to the server. An administration of changed virtual disk blocks is kept for running VMs, which only reduces the reads from VMs that have not been restarted between backups. It data transfer over the network is the bottleneck, you will have most benefit from the former (less changes, less transfers). The latter only speeds up the backup due to less reads (of unchanged data) from disk. 
best regards, Arjen

From lindsay.mathieson at gmail.com Sat Apr 10 16:06:19 2021
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Sun, 11 Apr 2021 00:06:19 +1000
Subject: Re: [PVE-User] PBS Incremental and stopped VM's
In-Reply-To:
References:
Message-ID: <90b0f7df-ca15-f9cd-b76a-0f8f26e24917@gmail.com>

On 10/04/2021 11:43 pm, Arjen via pve-user wrote:
> Just to be clear: PBS always makes a full backup. The resulting data is deduplicated (before sending it to the server), which almost always reduces the writes to the server.

Ah, I see now, thanks, I didn't understand that part of things. Looking at the logs of my 2nd backup, I see that stopped VMs had zero bytes written to the backup server.

--
Lindsay

From atokovenko at gmail.com Mon Apr 12 00:43:38 2021
From: atokovenko at gmail.com (Oleksii Tokovenko)
Date: Mon, 12 Apr 2021 01:43:38 +0300
Subject: [PVE-User] pve-user Digest, Vol 157, Issue 12
In-Reply-To:
References:
Message-ID:

unsubscribe
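For the external-drive rotation being discussed in this thread, the container-level round trip on the PVE side looks roughly like the sketch below; the CT ID 200, the storage name usb-backup and the archive path are only placeholders for whatever your setup uses:

    # dump the PBS container to the external drive; stop mode keeps the
    # datastore quiescent during the copy
    vzdump 200 --storage usb-backup --mode stop --compress zstd

    # after a disaster, restore it on a rebuilt host from the rotated drive
    pct restore 200 /mnt/usb/dump/vzdump-lxc-200-2021_04_10-00_00_00.tar.zst --storage local-zfs

Stopping the container for the dump avoids the file-locking and sync-to-disk worries raised earlier, at the cost of the PBS being unavailable while the dump runs.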
From lindsay.mathieson at gmail.com Mon Apr 12 03:57:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 12 Apr 2021 11:57:50 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? In-Reply-To: References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: <33f7cf57-9ca2-1c05-89db-9fceaf5e4cc5@gmail.com> On 10/04/2021 5:10 pm, Arjen via pve-user wrote: > Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. > I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. > I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. > > Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. > Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). Did some testing over the weekend. * Setup a PBS Container with a 2TB root disk * Backed up 6 VM's to it. * Backed up the PBS Container to a external disk using vzdump (using the std proxmox gui) o Slooow process at 30MB/s :) * Deleted the PBS Container * Deleted the backup VM's * Restored the PBS container from the external hard disk o Much faster - averaged around 100MB/s * Restored PBS Container started fine and verified. * Restored the backed up VM's from the PBS Container o Worked as expected o All backup and running. All in all, worked as I wanted and seems a viable option for full image offsite backups via external hard disks. The process of backing up the PBS container to the external drive is *very* slow :( I estimate 11 hours for a full cluster backup copy. But since its on a independent node, it doesn't load the main cluster and can just happen over the weekend. -- Lindsay From d.csapak at proxmox.com Mon Apr 12 08:48:48 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Mon, 12 Apr 2021 08:48:48 +0200 Subject: [PVE-User] LVM question In-Reply-To: References: Message-ID: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Hi, On 4/9/21 19:19, Leandro Roggerone wrote: > Hi guys , after install a new storage to my box , had to create lvm-thin. > Im not very good with lvm , after reading some docs , and links like: > https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ > > I got a working solution but have also some questions about it. 
> This is what I did: > > wipefs -a /dev/sdb > > sgdisk -N 1 /dev/sdb > > pvcreate --metadatasize 1024M -y -ff /dev/sdb1 > > vgcreate --metadatasize 1024M proxvg /dev/sdb1 > > lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n > proxthin proxvg > > lvcreate -n proxvz -V 1.1T proxvg/proxthin > > mkfs.ext4 /dev/proxvg/proxvz > > mkdir /media/vz > > echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' >> > /etc/fstab > > mount -a > > > And have following result: > > > root at pve2:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% > Move Log Cpy%Sync Convert > proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 > > proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 > root at pve2:~# lvdisplay > --- Logical volume --- > LV Name proxthin > VG Name proxvg > LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 > LV Pool metadata proxthin_tmeta > LV Pool data proxthin_tdata > LV Status available > # open 3 > LV Size 1.63 TiB > Allocated pool data 22.03% > Allocated metadata 6.34% > Current LE 428451 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:2 > > --- Logical volume --- > LV Path /dev/proxvg/proxvz > LV Name proxvz > VG Name proxvg > LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 > LV Pool name proxthin > LV Status available > # open 1 > LV Size 1.10 TiB > Mapped size 1.67% > Current LE 288359 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:4 > > > Have following comments: > > I can create VMs on proxthin partition so it is ok. > > I can create backup on proxvz partition so it is ok. Looks OK imho. > > What im concerned about is: > > Physic storage space is about 1.8TG , how is it possible to create a 1.6 > and 1.1T volumnes inside ? LVM Thin is 'thin-provisioned' it only uses space when it is really written. > It can be a problem in the future ? yes, if you do not monitor your real usage, if the thinpool runs full, you can lose data. > I was thinking about reduce proxthin partition to 600Gb aprox , so it make > same sense 1.1T + 600G aprox 1.8 T > But there is no LV Path on proxthin partition so I can unmount and the > reduce. > So .. > What im missing here ? do I need to reduce proxthin partition ( I do need > the 1.1T partition to backup). the LV 'proxvz' is inside the thinpool 'proxthin' so as long as you never allocate more that ~500GiB of vm/ct volumes, it should be fine. alos, on allocation, the thinpool will print warnings if the allocated lvs are bigger than the space available hope this helps > Hope to be clear about this guys. > Any comment would be wellcome. > Leandro. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From d.csapak at proxmox.com Mon Apr 12 08:48:48 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Mon, 12 Apr 2021 08:48:48 +0200 Subject: [PVE-User] LVM question In-Reply-To: References: Message-ID: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Hi, On 4/9/21 19:19, Leandro Roggerone wrote: > Hi guys , after install a new storage to my box , had to create lvm-thin. 
> _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Mon Apr 12 13:12:48 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 12 Apr 2021 08:12:48 -0300 Subject: [PVE-User] LVM question In-Reply-To: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> References: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Message-ID: Thanks !! very helpful. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El lun, 12 abr 2021 a las 3:48, Dominik Csapak () escribi?: > Hi, > > On 4/9/21 19:19, Leandro Roggerone wrote: > > Hi guys , after install a new storage to my box , had to create lvm-thin. > > Im not very good with lvm , after reading some docs , and links like: > > > https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ > > > > I got a working solution but have also some questions about it. > > This is what I did: > > > > wipefs -a /dev/sdb > > > > sgdisk -N 1 /dev/sdb > > > > pvcreate --metadatasize 1024M -y -ff /dev/sdb1 > > > > vgcreate --metadatasize 1024M proxvg /dev/sdb1 > > > > lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n > > proxthin proxvg > > > > lvcreate -n proxvz -V 1.1T proxvg/proxthin > > > > mkfs.ext4 /dev/proxvg/proxvz > > > > mkdir /media/vz > > > > echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' > >> > > /etc/fstab > > > > mount -a > > > > > > And have following result: > > > > > > root at pve2:~# lvs > > LV VG Attr LSize Pool Origin Data% Meta% > > Move Log Cpy%Sync Convert > > proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 > > > > proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 > > root at pve2:~# lvdisplay > > --- Logical volume --- > > LV Name proxthin > > VG Name proxvg > > LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX > > LV Write Access read/write > > LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 > > LV Pool metadata proxthin_tmeta > > LV Pool data proxthin_tdata > > LV Status available > > # open 3 > > LV Size 1.63 TiB > > Allocated pool data 22.03% > > Allocated metadata 6.34% > > Current LE 428451 > > Segments 1 > > Allocation inherit > > Read ahead sectors auto > > - currently set to 256 > > Block device 253:2 > > > > --- Logical volume --- > > LV Path /dev/proxvg/proxvz > > LV Name proxvz > > VG Name proxvg > > LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR > > LV Write Access read/write > > LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 > > LV Pool name proxthin > > LV Status available > > # open 1 > > LV Size 1.10 TiB > > Mapped size 1.67% > > Current LE 288359 > > Segments 1 > > Allocation inherit > > Read ahead sectors auto > > - currently set to 256 > > Block device 253:4 > > > > > > Have following comments: > > > > I can create VMs on proxthin partition so it is ok. > > > > I can create backup on proxvz partition so it is ok. > > Looks OK imho. > > > > > What im concerned about is: > > > > Physic storage space is about 1.8TG , how is it possible to create a 1.6 > > and 1.1T volumnes inside ? > > LVM Thin is 'thin-provisioned' it only uses space when it is really > written. > > > It can be a problem in the future ? > > yes, if you do not monitor your real usage, if the thinpool runs full, > you can lose data. 
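The monitoring Dominik recommends in the reply quoted above can be as simple as watching the Data% and Meta% columns of the thin pool. A rough sketch, reusing the proxvg/proxthin names from this thread; the 80% threshold is arbitrary and the mail call is an assumption (use whatever alerting is already in place):

# real usage of the pool and of every thin volume inside it
lvs -o lv_name,lv_size,data_percent,metadata_percent proxvg

# crude cron-able check: warn root once the pool passes 80% data usage
USED=$(lvs --noheadings -o data_percent proxvg/proxthin | tr -d ' ' | cut -d. -f1)
if [ "$USED" -ge 80 ]; then
    echo "thin pool proxvg/proxthin is at ${USED}%" | mail -s "thinpool warning on $(hostname)" root
fi

LVM can also auto-extend a thin pool (thin_pool_autoextend_threshold in lvm.conf), but since the pool here was created with 100%FREE there is no spare room left in the VG to grow into.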
> > > I was thinking about reduce proxthin partition to 600Gb aprox , so it > make > > same sense 1.1T + 600G aprox 1.8 T > > But there is no LV Path on proxthin partition so I can unmount and the > > reduce. > > So .. > > What im missing here ? do I need to reduce proxthin partition ( I do need > > the 1.1T partition to backup). > > the LV 'proxvz' is inside the thinpool 'proxthin' so as long as you > never allocate more that ~500GiB of vm/ct volumes, it should be fine. > > alos, on allocation, the thinpool will print warnings if the allocated > lvs are bigger than the space available > > hope this helps > > > Hope to be clear about this guys. > > Any comment would be wellcome. > > Leandro. > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > From piviul at riminilug.it Tue Apr 13 10:05:05 2021 From: piviul at riminilug.it (Piviul) Date: Tue, 13 Apr 2021 10:05:05 +0200 Subject: [PVE-User] Edit: Boot Order mask Message-ID: I ask[?] about this little problem on the forum but nobody found a solution, so I try here... In my PVE the mask where I can change the Boot Order options of a VM is not ever the same. If I access to the mask from 2 nodes (say node1 and node2) the mask is a simple html form with only combo boxes. On the third node (say node3) the mask is more sophisticated, can support the drag and drop, has checkbox... in other word it's different. So I would like to know why my three nodes doesn't have the same mask even if they are at the same proxmox version and if there is a way that all nodes shows the same mask. I ask you because this is not only a layout problem; if I modify the boot order options from the node3, I can see strange chars in the PVE gui of the other two nodes but if I configure the boot order options from node1 or node2 all seems works flawless. Best regards Piviul [?] https://forum.proxmox.com/threads/strange-chars-in-boot-order-options.87169/ From alwin at antreich.com Tue Apr 13 11:03:39 2021 From: alwin at antreich.com (Alwin Antreich) Date: Tue, 13 Apr 2021 09:03:39 +0000 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: Hello Piviul, April 13, 2021 10:05 AM, "Piviul" wrote: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM is > not ever the same. If I access to the mask from 2 nodes (say node1 and > node2) the mask is a simple html form with only combo boxes. On the > third node (say node3) the mask is more sophisticated, can support the > drag and drop, has checkbox... in other word it's different. So I would > like to know why my three nodes doesn't have the same mask even if they > are at the same proxmox version and if there is a way that all nodes > shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. Are you're nodes all on the same update level? If not update all of them. If yes, then try to clear the browser cache. 
-- Cheers, Alwin From piviul at riminilug.it Tue Apr 13 10:44:36 2021 From: piviul at riminilug.it (Piviul) Date: Tue, 13 Apr 2021 10:44:36 +0200 Subject: [PVE-User] pve-user Digest, Vol 157, Issue 12 In-Reply-To: References: Message-ID: <432743f5-b0ab-414f-297b-e55d8163166f@riminilug.it> Hi Oleksii, if you want to unsuscribe this list please go to https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user; you can found the instructions at the end of the page. Best regards Piviul Il 12/04/21 00:43, Oleksii Tokovenko ha scritto: > unsibscribe > > ??, 11 ????. 2021 ? 13:00 ????: > >> Send pve-user mailing list submissions to >> pve-user at lists.proxmox.com >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> or, via email, send a message with subject or body 'help' to >> pve-user-request at lists.proxmox.com >> >> You can reach the person managing the list at >> pve-user-owner at lists.proxmox.com >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of pve-user digest..." >> >> >> Today's Topics: >> >> 1. Re: Revisited: External disk backup using PBS - Requesting >> Criticism/Advice? (Lindsay Mathieson) >> 2. PBS Incremental and stopped VM's (Lindsay Mathieson) >> 3. Re: PBS Incremental and stopped VM's (Arjen) >> 4. Re: PBS Incremental and stopped VM's (Lindsay Mathieson) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sat, 10 Apr 2021 23:28:50 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: Re: [PVE-User] Revisited: External disk backup using PBS - >> Requesting Criticism/Advice? >> Message-ID: >> Content-Type: text/plain; charset=windows-1252; format=flowed >> >> On 10/04/2021 5:10 pm, Arjen via pve-user wrote: >>> Don't expect to be able to backup the PBS container with 4TB to a 2TB >> external drive. >> >> I only passed 2TB through and the actual backup data comes to 1.3TB >> >>> The Datastore of a PSB does not compress much further and Proxmox VE >> Backup will only backup virtual disks and not mountpoints or storages >> passed from host, if I understand correctly. >> >> >> I wondered that. Will be testing. >> >>> I suggest adding a virtual disk of 2TB to the PBS container (and format >> it with ext4) which can be backed up by the Proxmox VE Backup. >> >> >> Certainly a possibility. >> >> >> I also wondered if it was practical to attach an external disk to PBS as >> a Datastore, then detach it. A bit more manual, but doable. >> >> >>> I would also run the PBS container (with virtual disk) on the cluster >> instead on separate hardware which is a single point of failure. The local >> PBS would be then just as reliable as your cluster. >> >> >> I want to keep the storage separate from the cluster, in that regard the >> local storage is a single point of failure, hence the need for offsite >> storage as well :) >> >> >>> Regarding safeness: I suggest doing a automated disaster recovery every >> week to make sure it works as expected. Or at least partially, like >> restoring the PBS from an external drive. >> >> >> I'll definitely be testing restore options to check that it works. >> >>> Regarding practicality: I would have a remote PBS sync with your local >> PBS instead of moving physical disks (but you mentioned before that that >> was not really possible). 
>> >> >> Alas :( >> >> >> Perhaps I could do a backup on site, then physically move it offsite and >> attach it to a offsite PBS server and then sync it remotely - >> incremental backups over the net would be doable. >> >> >> nb. Our NAS died, hence my increased investigation of this :) Definitely >> want to go with a more open and targeted solution this time, the NAS was >> a good appliance, but inflexible. >> >> Thanks! >> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Message: 2 >> Date: Sat, 10 Apr 2021 23:36:23 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: >> Content-Type: text/plain; charset=utf-8; format=flowed >> >> I'm guessing only running VM's (with dirty bitmap support) can be >> incrementally backed up? >> >> >> Might be nice if we could schedule backups for only running VM's >> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Message: 3 >> Date: Sat, 10 Apr 2021 13:43:43 +0000 >> From: Arjen >> To: Proxmox VE user list >> Subject: Re: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: >> >> > protonmail.ch> >> >> Content-Type: text/plain; charset=utf-8 >> >> On Saturday, April 10th, 2021 at 15:36, Lindsay Mathieson < >> lindsay.mathieson at gmail.com> wrote: >> >>> I'm guessing only running VM's (with dirty bitmap support) can be >>> >>> incrementally backed up? >>> >>> Might be nice if we could schedule backups for only running VM's >> Just to be clear: PBS always makes a full backup. The resulting data is >> deduplicated (before sending it to the server), which almost always reduces >> the writes to the server. An administration of changed virtual disk blocks >> is kept for running VMs, which only reduces the reads from VMs that have >> not been restarted between backups. It data transfer over the network is >> the bottleneck, you will have most benefit from the former (less changes, >> less transfers). The latter only speeds up the backup due to less reads (of >> unchanged data) from disk. >> >> best regards, Arjen >> >> >> >> ------------------------------ >> >> Message: 4 >> Date: Sun, 11 Apr 2021 00:06:19 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: Re: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: <90b0f7df-ca15-f9cd-b76a-0f8f26e24917 at gmail.com> >> Content-Type: text/plain; charset=windows-1252; format=flowed >> >> On 10/04/2021 11:43 pm, Arjen via pve-user wrote: >>> Just to be clear: PBS always makes a full backup. The resulting data is >> deduplicated (before sending it to the server), which almost always reduces >> the writes to the server. >> >> Ah, I see now, thanks, I didn't understand that part of things. Looking >> at the logs of my 2nd backup, I see that stopped VM's had zero bytes >> written to the backup server. 
>> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> >> ------------------------------ >> >> End of pve-user Digest, Vol 157, Issue 12 >> ***************************************** >> >> From alain.pean at c2n.upsaclay.fr Tue Apr 13 10:57:09 2021 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Tue, 13 Apr 2021 10:57:09 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: Le 13/04/2021 ? 10:05, Piviul a ?crit?: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM > is not ever the same. If I access to the mask from 2 nodes (say node1 > and node2) the mask is a simple html form with only combo boxes. On > the third node (say node3) the mask is more sophisticated, can support > the drag and drop, has checkbox... in other word it's different. So I > would like to know why my three nodes doesn't have the same mask even > if they are at the same proxmox version and if there is a way that all > nodes shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. Hi Piviul, My guess would be that your nodes would have different versions of Proxmox packages. And not the same proxmox interface on each... The best thing would be to have the complete version of each package wich 'pveversion -v', but a shorter first information is to display, and copy paste just version here ? # pveversion Thanks Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From piviul at riminilug.it Wed Apr 14 09:37:45 2021 From: piviul at riminilug.it (Piviul) Date: Wed, 14 Apr 2021 09:37:45 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Il 13/04/21 10:57, Alain P?an ha scritto: > Hi Piviul, > > My guess would be that your nodes would have different versions of > Proxmox packages. And not the same proxmox interface on each... > > The best thing would be to have the complete version of each package > wich 'pveversion -v', but a shorter first information is to display, > and copy paste just version here ? > # pveversion > > Thanks I Alain, first of all thank you very much indeed to you and to all people answered this thread. I reply your message but the infos here should answer even the infos asked from Alwin... I send directly the output differences from the command pveversion with -v flag because all three nodes show the same "pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)" version. 
So I have launched the following command in all three nodes: # pveversion -v > pveversion.$(hostname) obtaining 3 differents files and I've done the diff between the first two files (referring to pve01 and pve02) and as expected there is no difference: $ diff pveversion.pve0{1,2} Then I have done the diff between the first and the third node and this is the result: $ diff pveversion.pve0{1,3} 5d4 < pve-kernel-5.3: 6.1-6 8,9c7 < pve-kernel-5.3.18-3-pve: 5.3.18-3 < pve-kernel-5.3.10-1-pve: 5.3.10-1 --- > pve-kernel-5.4.34-1-pve: 5.4.34-2 there are some little differences yes but in kernel that are not in use any more (in all 3 nodes uname -r shows 5.4.106-1-pve)... Attached you can find all three files hoping the system doesn't cut them. Please can I ask you if you have a 6.3 node in your installations that was previously in 6.2 version (i.e. not installed directly in 6.3 version)? Can you tell me if the "Boot order" musk is the one with only combo boxes or the more evoluted drag and drop musk? Thank you very much Piviul -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.10-1-pve: 5.3.10-1 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.10-1-pve: 5.3.10-1 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 
pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.4.34-1-pve: 5.4.34-2 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 From elacunza at binovo.es Wed Apr 14 11:04:10 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 11:04:10 +0200 Subject: PVE 6.2 Strange cluster node fence Message-ID: <4cd1b5fc-4c66-77d9-6af8-82831ca37f76@binovo.es> Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) --- Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] link: host: 3 link: 0 is down Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 has no active links Apr 13 11:35:15 proxmox1 corosync[1410]:?? [TOTEM ] Token has not been received in 61 ms Apr 13 11:35:15 proxmox1 corosync[1410]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:35:18 proxmox1 corosync[1410]:?? [KNET? ] rx: host: 3 link: 0 is up Apr 13 11:35:18 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:18 proxmox1 corosync[1410]:?? [TOTEM ] Token has not been received in 3069 ms Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] A new membership (2.5477) was formed. Members left: 1 3 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] Failed to receive the leave message. failed: 1 3 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] notice: members: 2/1398 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] This node is within the non-primary component and will NOT provide any services. Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] Members[1]: 2 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [MAIN? ] Completed service synchronization, ready to provide service. 
Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: members: 2/1398 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: node lost quorum Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: received write while not quorate - trigger resync Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: leaving CPG group Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] A new membership (1.547b) was formed. Members joined: 1 3 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: starting data syncronisation Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] This node is within the primary component and will provide service. Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] Members[3]: 1 2 3 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [MAIN? ] Completed service synchronization, ready to provide service. Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: node has quorum Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received all states Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: all data is up to date Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] notice: start cluster connection Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_join failed: 14 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: can't initialize service Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pve-ha-lrm[1770]: lost lock 'ha_agent_proxmox1_lock - cfs lock update failed - Device or resource busy Apr 13 11:35:20 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:21 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:22 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:23 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:23 proxmox1 pve-ha-lrm[1770]: status change active => lost_agent_lock Apr 13 11:35:24 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:25 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-nnAQGx/qb): Bad message (74) Apr 13 11:35:27 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:28 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:29 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:30 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:31 proxmox1 corosync[1410]:?? [QB??? 
] request returned error (/dev/shm/qb-1410-1398-33-JDQj3Z/qb): Bad message (74) Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:34 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:36 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:37 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-jgBffR/qb): Bad message (74) Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:40 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:41 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:43 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-dWqAg7/qb): Bad message (74) Apr 13 11:35:45 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:48 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:49 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-LnKe7L/qb): Bad message (74) Apr 13 11:35:51 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:52 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:53 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:54 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 9 more ...] Apr 13 11:35:55 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 0 Apr 13 11:35:55 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 0 Apr 13 11:35:55 proxmox1 corosync[1410]:?? [QB??? 
] request returned error (/dev/shm/qb-1410-1398-33-dXTlNP/qb): Bad message (74) Apr 13 11:35:57 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:00 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 9 more ...] Apr 13 11:36:01 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:01 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:01 proxmox1 pvesr[938407]: trying to acquire cfs lock 'file-replication_cfg' ... Apr 13 11:36:01 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-17q0ii/qb): Bad message (74) Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:36:03 proxmox1 pvesr[938407]: trying to acquire cfs lock 'file-replication_cfg' ... Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [reset garbage) --- proxmox2 log: --- Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] link: host: 3 link: 0 is down Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 has no active links Apr 13 11:35:15 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 1237 ms Apr 13 11:35:15 proxmox2 corosync[1402]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:35:17 proxmox2 corosync[1402]:?? [KNET? ] rx: host: 3 link: 0 is up Apr 13 11:35:17 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:18 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 4637 ms Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.5477) was formed. Members left: 2 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] Failed to receive the leave message. failed: 2 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.547b) was formed. Members joined: 2 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:35:19 proxmox2 corosync[1402]:?? [QUORUM] Members[3]: 1 2 3 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [MAIN? ] Completed service synchronization, ready to provide service. 
Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: cpg_send_message retried 1 times Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: starting data syncronisation Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (0) updates Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: dfsm_deliver_queue: queue length 4 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received all states Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: all data is up to date Apr 13 11:36:00 proxmox2 systemd[1]: Starting Proxmox VE replication runner... Apr 13 11:36:00 proxmox2 systemd[1]: pvesr.service: Succeeded. Apr 13 11:36:00 proxmox2 systemd[1]: Started Proxmox VE replication runner. Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (8) updates Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] link: host: 2 link: 0 is down Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] host: host: 2 (passive) best link: 0 (pri: 1) Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] host: host: 2 has no active links Apr 13 11:36:26 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 61 ms Apr 13 11:36:26 proxmox2 corosync[1402]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:36:28 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.547f) was formed. Members left: 2 Apr 13 11:36:28 proxmox2 corosync[1402]:?? [TOTEM ] Failed to receive the leave message. failed: 2 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:36:28 proxmox2 corosync[1402]:?? [QUORUM] Members[2]: 1 3 Apr 13 11:36:28 proxmox2 corosync[1402]:?? [MAIN? ] Completed service synchronization, ready to provide service. 
Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: cpg_send_message retried 1 times Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: starting data syncronisation Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000008) Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000008) Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (0) updates Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: dfsm_deliver_queue: queue length 2 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: received all states Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: all data is up to date Apr 13 11:36:29 proxmox2 pve-ha-crm[1801]: node 'proxmox1': state changed from 'online' => 'unknown' Apr 13 11:36:38 proxmox2 pvestatd[1553]: got timeout Apr 13 11:36:38 proxmox2 pvestatd[1553]: status update time (5.090 seconds) Apr 13 11:36:45 proxmox2 ceph-osd[1424]: 2021-04-13 11:36:45.407 7f94513df700 -1 osd.2 1166 heartbeat_check: no reply from 192.168.91.11:6820 osd.0 since back 2021-04-13 11:36:23.684429 front 2021-04-13 11:36:23.684422 (oldest deadline 2021-04-13 11:36:44.784447) --- all 3 nodes have the same running Proxmox versions: root at proxmox1:~# pveversion -v proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve) pve-manager: 6.3-3 (running version: 6.3-3/eee5f901) pve-kernel-5.4: 6.3-3 pve-kernel-helper: 6.3-3 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.78-2-pve: 5.4.78-2 pve-kernel-5.4.65-1-pve: 5.4.65-1 pve-kernel-5.4.44-2-pve: 5.4.44-2 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.18-2-pve: 5.3.18-2 ceph: 14.2.16-pve1 ceph-fuse: 14.2.16-pve1 corosync: 3.0.4-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: 0.8.35+pve1 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.16-pve1 libproxmox-acme-perl: 1.0.7 libproxmox-backup-qemu0: 1.0.2-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-2 libpve-guest-common-perl: 3.1-3 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-3 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.3-1 lxcfs: 4.0.3-pve3 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.6-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-3 pve-cluster: 6.2-1 pve-container: 3.3-2 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.1-3 pve-ha-manager: 3.1-1 pve-i18n: 2.2-2 pve-qemu-kvm: 5.1.0-7 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-2 smartmontools: 7.1-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 0.8.5-pve1 We are upgrading the cluster in the next days as part of our 3-month upgrade cycle, but can wait. Any ideas? Could this be a bug? Thanks a lot Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. 
Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From mir at miras.org Wed Apr 14 11:21:09 2021 From: mir at miras.org (Michael Rasmussen) Date: Wed, 14 Apr 2021 11:21:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <20210414112109.6f57f496@sleipner.datanom.net> On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user wrote: > Hi all, > > Yesterday we had a strange fence happen in a PVE 6.2 cluster. > > Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been > operating normally for a year. Last update was on January 21st 2021. > Storage is Ceph and nodes are connected to the same network switch > with active-pasive bonds. > > proxmox1 was fenced and automatically rebooted, then everything > recovered. HA restarted VMs in other nodes too. > > proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc https://pgp.key-server.io/pks/lookup?search=0xD3C9A00E mir datanom net https://pgp.key-server.io/pks/lookup?search=0xE501F51C mir miras org https://pgp.key-server.io/pks/lookup?search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: When I woke up this morning, my girlfriend asked if I had slept well. I said, "No, I made a few mistakes." -- Steven Wright -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From elacunza at binovo.es Wed Apr 14 12:12:09 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 12:12:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <2d33e64d-ee43-0a3b-0a24-538df9ef837c@binovo.es> Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: > On Wed, 14 Apr 2021 11:04:10 +0200 > Eneko Lacunza via pve-user wrote: > >> Hi all, >> >> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >> >> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >> operating normally for a year. Last update was on January 21st 2021. >> Storage is Ceph and nodes are connected to the same network switch >> with active-pasive bonds. >> >> proxmox1 was fenced and automatically rebooted, then everything >> recovered. HA restarted VMs in other nodes too. >> >> proxmox1 syslog: (no network link issues reported at device level) > I have seen this occasionally and every time the cause was high network > load/network congestion which caused token timeout. The default token > timeout in corosync IMHO is very optimistically configured to 1000 ms > so I have changed this setting to 5000 ms and after I have done this I > have never seen fencing happening caused by network load/network > congestion again. You could try this and see if that helps you. > > PS. my cluster communication is on a dedicated gb bonded vlan. 
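For reference, the change Michael suggests above is a single option in the totem section of /etc/pve/corosync.conf; that copy is distributed to all nodes by pmxcfs, and config_version must be bumped on every change. A hedged sketch only, with the surrounding values following the config Eneko posts further down in this thread:

totem {
  cluster_name: CLUSTERNAME
  config_version: 4        # one higher than the current value
  token: 5000              # token timeout in ms; Michael quotes 1000 ms as the default
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}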
Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From smr at kmi.com Wed Apr 14 13:22:33 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 11:22:33 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? Do you have them on separate VLANs on your switches? Are you running 1 or 2 corosync rings? Please post your /etc/network/interfaces and explain which interface connects where. Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. 
Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From elacunza at binovo.es Wed Apr 14 15:18:13 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 15:18:13 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> Message-ID: <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: > Hi Eneko > > Do you have separate physical interfaces for the cluster (corosync) > traffic? No. > Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. > Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { ? node { ??? name: proxmox1 ??? nodeid: 2 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.11 ? } ? node { ??? name: proxmox2 ??? nodeid: 1 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.12 ? } ? node { ??? name: proxmox3 ??? nodeid: 3 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.13 ? } } quorum { ? provider: corosync_votequorum } totem { ? cluster_name: CLUSTERNAME ? config_version: 3 ? interface { ??? linknumber: 0 ? } ? ip_version: ipv4-6 ? secauth: on ? version: 2 } > > Please post your /etc/network/interfaces and explain which interface > connects where. auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual ??? bond-slaves ens2f0np0 ens2f1np1 ??? bond-miimon 100 ??? bond-mode active-backup ??? bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static ??? address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static ??? address 192.168.90.11 ??? gateway 192.168.90.1 ??? bridge-ports bond0 ??? bridge-stp off ??? bridge-fd 0 Thanks > > Thanks > > Stefan > > >> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >> > wrote: >> >> >> *From: *Eneko Lacunza > >> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >> *Date: *April 14, 2021 at 12:12:09 GMT+2 >> *To: *pve-user at lists.proxmox.com >> >> >> Hi Michael, >> >> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>> On Wed, 14 Apr 2021 11:04:10 +0200 >>> Eneko Lacunza via pve-user>> > ?wrote: >>> >>>> Hi all, >>>> >>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. 
>>>> >>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>> operating normally for a year. Last update was on January 21st 2021. >>>> Storage is Ceph and nodes are connected to the same network switch >>>> with active-pasive bonds. >>>> >>>> proxmox1 was fenced and automatically rebooted, then everything >>>> recovered. HA restarted VMs in other nodes too. >>>> >>>> proxmox1 syslog: (no network link issues reported at device level) >>> I have seen this occasionally and every time the cause was high network >>> load/network congestion which caused token timeout. The default token >>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>> so I have changed this setting to 5000 ms and after I have done this I >>> have never seen fencing happening caused by network load/network >>> congestion again. You could try this and see if that helps you. >>> >>> PS. my cluster communication is on a dedicated gb bonded vlan. >> Thanks for the info. In this case network is 10Gbit (I see I didn't >> include this info) but only for proxmox nodes: >> >> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >> - Both switches are interconnected with a SFP+ DAC >> - Active-passive Bonds in each proxmox node go one SFP+ interface on >> each switch. Primary interfaces are configured to be on the same switch. >> - Connectivity to the LAN is done with 1 Gbit link >> - Proxmox 2x10G Bond is used for VM networking and Ceph >> public/private networks. >> >> I wouldn't expect high network load/congestion because it's on an >> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >> ocurring during the fence. >> >> Network cards are Broadcom. >> >> Thanks >> >> Eneko Lacunza >> Zuzendari teknikoa | Director t?cnico >> Binovo IT Human Project >> >> Tel. +34 943 569 206 | https://www.binovo.es >> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >> >> https://www.youtube.com/user/CANALBINOVO >> >> https://www.linkedin.com/company/37269706/ >> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From smr at kmi.com Wed Apr 14 15:57:09 2021 From: smr at kmi.com (Stefan M. 
Radman) Date: Wed, 14 Apr 2021 13:57:09 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> Message-ID: <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Hi Eneko That?s a nice setup and I bet it works well but you should do some hand-tuning to increase resilience. Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? If that?s the case I?d strongly recommend to turn them into dedicated untagged interfaces for the cluster traffic, running on two separate ?rings". https://pve.proxmox.com/wiki/Separate_Cluster_Network https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol Create two corosync rings, using isolated VLANs on your two switches e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. eno1 => Switch1 => VLAN4001 eno2 => Switch2 => VLAN4002 Restrict VLAN4001 to the access ports where the eno1 interfaces are connected. Prune VLAN4001 from ALL trunks. Restrict VLAN4001 to the access ports where the eno2 interfaces are connected. Prune VLAN4002 from ALL trunks. Assign the eno1 and eno2 interfaces to two separate subnets and you are done. With separate rings you don?t even have to stop your cluster while migrating corosync to the new subnets. Just do them one-by-one. With corosync running on two separate rings isolated from the rest of your network you should not see any further node fencing. Stefan On Apr 14, 2021, at 15:18, Eneko Lacunza > wrote: Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? No. Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { node { name: proxmox1 nodeid: 2 quorum_votes: 1 ring0_addr: 192.168.90.11 } node { name: proxmox2 nodeid: 1 quorum_votes: 1 ring0_addr: 192.168.90.12 } node { name: proxmox3 nodeid: 3 quorum_votes: 1 ring0_addr: 192.168.90.13 } } quorum { provider: corosync_votequorum } totem { cluster_name: CLUSTERNAME config_version: 3 interface { linknumber: 0 } ip_version: ipv4-6 secauth: on version: 2 } Please post your /etc/network/interfaces and explain which interface connects where. auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Thanks Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. 
Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. 
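For reference, a second ring of the kind suggested above would be declared in corosync.conf roughly as sketched below. This is only a minimal illustration for a PVE 6.x / corosync 3 cluster, not the actual configuration of either poster: the 192.168.84.x and 192.168.85.x addresses are assumed example subnets for the two isolated VLANs, and on Proxmox VE the edit is normally made in /etc/pve/corosync.conf with config_version incremented so it propagates to all nodes.

nodelist {
  node {
    name: proxmox1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.84.11
    ring1_addr: 192.168.85.11
  }
  node {
    name: proxmox2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.84.12
    ring1_addr: 192.168.85.12
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.84.13
    ring1_addr: 192.168.85.13
  }
}

totem {
  cluster_name: CLUSTERNAME
  config_version: 4
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

With two ring addresses per node, corosync 3 (kronosnet) carries the cluster traffic over link 0 and fails over to link 1 on its own if the first link goes down.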
From elacunza at binovo.es Wed Apr 14 16:07:09 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 16:07:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Message-ID: Hi Stefan, Thanks for your advice. Seems a really good use for otherwise unused 1G ports so I'll look into configuring that. If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) Thanks El 14/4/21 a las 15:57, Stefan M. Radman escribi?: > Hi Eneko > > That?s a nice setup and I bet it works well but you should do some > hand-tuning to increase resilience. > > Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? > > If that?s the case I?d strongly recommend to turn them into dedicated > untagged interfaces for the cluster traffic, running on two separate > ?rings". > > https://pve.proxmox.com/wiki/Separate_Cluster_Network > > https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol > > > Create two corosync rings, using isolated VLANs on your two switches > e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. > > eno1 => Switch1 => VLAN4001 > eno2 => Switch2 => VLAN4002 > > Restrict VLAN4001 to the access ports where the eno1 interfaces are > connected. Prune VLAN4001 from ALL trunks. > Restrict VLAN4001 to the access ports where the eno2 interfaces are > connected. Prune VLAN4002 from ALL trunks. > Assign the eno1 and eno2 interfaces to two separate subnets and you > are done. > > With separate rings you don?t even have to stop your cluster while > migrating corosync to the new subnets. > Just do them one-by-one. > > With corosync running on two separate rings isolated from the rest of > your network you should not see any further node fencing. > > Stefan > >> On Apr 14, 2021, at 15:18, Eneko Lacunza > > wrote: >> >> Hi Stefan, >> >> El 14/4/21 a las 13:22, Stefan M. Radman escribi?: >>> Hi Eneko >>> >>> Do you have separate physical interfaces for the cluster (corosync) >>> traffic? >> No. >>> Do you have them on separate VLANs on your switches? >> Onyl Ceph traffic is on VLAN91, the rest is untagged. >> >>> Are you running 1 or 2 corosync rings? >> This is standard... no hand tuning: >> >> nodelist { >> ? node { >> ??? name: proxmox1 >> ??? nodeid: 2 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.11 >> ? } >> ? node { >> ??? name: proxmox2 >> ??? nodeid: 1 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.12 >> ? } >> ? node { >> ??? name: proxmox3 >> ??? nodeid: 3 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.13 >> ? } >> } >> >> quorum { >> ? provider: corosync_votequorum >> } >> >> totem { >> ? cluster_name: CLUSTERNAME >> ? config_version: 3 >> ? interface { >> ??? linknumber: 0 >> ? } >> ? ip_version: ipv4-6 >> ? secauth: on >> ? version: 2 >> } >> >>> >>> Please post your /etc/network/interfaces and explain which interface >>> connects where. >> auto lo >> iface lo inet loopback >> >> iface ens2f0np0 inet manual >> # Switch2 >> >> iface ens2f1np1 inet manual >> # Switch1 >> >> iface eno1 inet manual >> >> iface eno2 inet manual >> >> auto bond0 >> iface bond0 inet manual >> ??? bond-slaves ens2f0np0 ens2f1np1 >> ??? bond-miimon 100 >> ??? bond-mode active-backup >> ??? bond-primary ens2f0np1 >> >> auto bond0.91 >> iface bond0.91 inet static >> ??? 
address 192.168.91.11 >> #Ceph >> >> auto vmbr0 >> iface vmbr0 inet static >> ??? address 192.168.90.11 >> ??? gateway 192.168.90.1 >> ??? bridge-ports bond0 >> ??? bridge-stp off >> ??? bridge-fd 0 >> >> Thanks >>> >>> Thanks >>> >>> Stefan >>> >>> >>>> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >>>> > wrote: >>>> >>>> >>>> *From: *Eneko Lacunza > >>>> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >>>> *Date: *April 14, 2021 at 12:12:09 GMT+2 >>>> *To: *pve-user at lists.proxmox.com >>>> >>>> >>>> Hi Michael, >>>> >>>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>>> Eneko Lacunza via pve-user>>>> > ?wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>>> >>>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>>> operating normally for a year. Last update was on January 21st 2021. >>>>>> Storage is Ceph and nodes are connected to the same network switch >>>>>> with active-pasive bonds. >>>>>> >>>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>>> recovered. HA restarted VMs in other nodes too. >>>>>> >>>>>> proxmox1 syslog: (no network link issues reported at device level) >>>>> I have seen this occasionally and every time the cause was high >>>>> network >>>>> load/network congestion which caused token timeout. The default token >>>>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>>>> so I have changed this setting to 5000 ms and after I have done this I >>>>> have never seen fencing happening caused by network load/network >>>>> congestion again. You could try this and see if that helps you. >>>>> >>>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>>> Thanks for the info. In this case network is 10Gbit (I see I didn't >>>> include this info) but only for proxmox nodes: >>>> >>>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>>> - Both switches are interconnected with a SFP+ DAC >>>> - Active-passive Bonds in each proxmox node go one SFP+ interface >>>> on each switch. Primary interfaces are configured to be on the same >>>> switch. >>>> - Connectivity to the LAN is done with 1 Gbit link >>>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>>> public/private networks. >>>> >>>> I wouldn't expect high network load/congestion because it's on an >>>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>>> ocurring during the fence. >>>> >>>> Network cards are Broadcom. >>>> >>>> Thanks >>>> >>>> Eneko Lacunza >>>> Zuzendari teknikoa | Director t?cnico >>>> Binovo IT Human Project >>>> >>>> Tel. +34 943 569 206 | https://www.binovo.es >>>> >>>> Astigarragako Bidea, 2 - 2? izda. 
Oficina 10-11, 20180 Oiartzun >>>> >>>> https://www.youtube.com/user/CANALBINOVO >>>> >>>> https://www.linkedin.com/company/37269706/ >>>> >>>> >>>> >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at lists.proxmox.com >>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 >>> >>> >>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>> and confidential information, or may otherwise be protected from >>> disclosure, and is intended solely for use of the intended >>> recipient(s). If you are not the intended recipient of this >>> communication, please notify the sender that you have received this >>> communication in error and delete and destroy all copies in your >>> possession. / >>> >> > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > EnekoLacunza Director T?cnico | Zuzendari teknikoa Binovo IT Human Project 943 569 206 elacunza at binovo.es binovo.es Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun youtube linkedin From smr at kmi.com Wed Apr 14 16:49:59 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 14:49:59 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Message-ID: <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> Hi Eneko If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) That?s pretty unlikely. Usually they come in pairs ;) But yes, in that hypothetical case I?d use the available physical interface for ring1 and build ring2 from a tagged interface. For corosync interfaces I prefer two separate physical interfaces (simple, resilient). Bonding and tagging adds a layer of complexity you don?t want on a cluster heartbeat. Find below an actual configuration of a cluster with one node having just 2 interfaces while the other nodes all have 4. The 2 interfaces are configured in an HA bond like yours and the corosync rings are stacked on it as tagged interfaces in their specific VLANs. VLAN684 exists on switch1 only and VLAN685 exists on switch2 only. The most resilient solution under the circumstances given and has been working like a charm for several years now. 
Regards Stefan NODE1 - 4 interfaces ==================== iface eno1 inet manual #Gb1 - Trunk iface eno2 inet manual #Gb2 - Trunk auto eno3 iface eno3 inet static address 192.168.84.1 netmask 255.255.255.0 #Gb3 - COROSYNC1 - VLAN684 auto eno4 iface eno4 inet static address 192.168.85.1 netmask 255.255.255.0 #Gb4 - COROSYNC2 - VLAN685 auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode active-backup #HA Bundle Gb1/Gb2 - Trunk NODE3 - 2 interfaces ==================== iface eno1 inet manual #Gb1 - Trunk iface eno2 inet manual #Gb2 - Trunk auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode active-backup #HA Bundle Gb1/Gb2 - Trunk auto bond0.684 iface bond0.684 inet static address 192.168.84.3 netmask 255.255.255.0 #COROSYNC1 - VLAN684 auto bond0.685 iface bond0.685 inet static address 192.168.85.3 netmask 255.255.255.0 #COROSYNC2 - VLAN685 On Apr 14, 2021, at 16:07, Eneko Lacunza > wrote: Hi Stefan, Thanks for your advice. Seems a really good use for otherwise unused 1G ports so I'll look into configuring that. If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) Thanks El 14/4/21 a las 15:57, Stefan M. Radman escribi?: Hi Eneko That?s a nice setup and I bet it works well but you should do some hand-tuning to increase resilience. Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? If that?s the case I?d strongly recommend to turn them into dedicated untagged interfaces for the cluster traffic, running on two separate ?rings". https://pve.proxmox.com/wiki/Separate_Cluster_Network https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol Create two corosync rings, using isolated VLANs on your two switches e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. eno1 => Switch1 => VLAN4001 eno2 => Switch2 => VLAN4002 Restrict VLAN4001 to the access ports where the eno1 interfaces are connected. Prune VLAN4001 from ALL trunks. Restrict VLAN4001 to the access ports where the eno2 interfaces are connected. Prune VLAN4002 from ALL trunks. Assign the eno1 and eno2 interfaces to two separate subnets and you are done. With separate rings you don?t even have to stop your cluster while migrating corosync to the new subnets. Just do them one-by-one. With corosync running on two separate rings isolated from the rest of your network you should not see any further node fencing. Stefan On Apr 14, 2021, at 15:18, Eneko Lacunza > wrote: Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? No. Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { node { name: proxmox1 nodeid: 2 quorum_votes: 1 ring0_addr: 192.168.90.11 } node { name: proxmox2 nodeid: 1 quorum_votes: 1 ring0_addr: 192.168.90.12 } node { name: proxmox3 nodeid: 3 quorum_votes: 1 ring0_addr: 192.168.90.13 } } quorum { provider: corosync_votequorum } totem { cluster_name: CLUSTERNAME config_version: 3 interface { linknumber: 0 } ip_version: ipv4-6 secauth: on version: 2 } Please post your /etc/network/interfaces and explain which interface connects where. 
auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Thanks Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). 
If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession.
Eneko Lacunza Director Técnico | Zuzendari teknikoa Binovo IT Human Project 943 569 206 elacunza at binovo.es binovo.es Astigarragako Bidea, 2 - 2ª izda. Oficina 10-11, 20180 Oiartzun
From elacunza at binovo.es Wed Apr 14 17:15:08 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 17:15:08 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> Message-ID: <647b9ba1-e3dd-e144-ec2a-0de375fb6072@binovo.es> Hi Stefan, El 14/4/21 a las 16:49, Stefan M. Radman escribió: >> If nodes had only one 1G interface, would you also use RRP? (one ring >> on 1G and the other on 10G bond) > > That's pretty unlikely. Usually they come in pairs ;) Right, unless you use "entry" level servers or DIY builds ;) > > But yes, in that hypothetical case I'd use the available physical > interface for ring1 and build ring2 from a tagged interface. > > For corosync interfaces I prefer two separate physical interfaces > (simple, resilient). > Bonding and tagging adds a layer of complexity you don't want on a > cluster heartbeat. Sure. > > Find below an actual configuration of a cluster with one node having > just 2 interfaces while the other nodes all have 4. > The 2 interfaces are configured in an HA bond like yours and the > corosync rings are stacked on it as tagged interfaces in their > specific VLANs. > VLAN684 exists on switch1 only and VLAN685 exists on switch2 only. > The most resilient solution under the circumstances given and has been > working like a charm for several years now. Thanks for the examples!
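The other remedy mentioned earlier in the thread, raising the corosync token timeout from its 1000 ms default, is just a one-line addition to the totem section. This is a sketch only, using the 5000 ms value quoted above; as with any corosync.conf change on PVE, the edit would go into /etc/pve/corosync.conf with config_version bumped so it propagates:

totem {
  cluster_name: CLUSTERNAME
  config_version: 4
  token: 5000
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

Whether the new value has actually been picked up can be checked at runtime with something like corosync-cmapctl | grep totem.token (assuming corosync-cmapctl is available), which lists the totem values corosync is currently running with.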
Cheers Eneko > > Regards > > Stefan > > NODE1 - 4 interfaces > ==================== > > iface eno1 inet manual > #Gb1 - Trunk > > iface eno2 inet manual > #Gb2 - Trunk > > auto eno3 > iface eno3 inet static > address192.168.84.1 > netmask255.255.255.0 > #Gb3 - COROSYNC1 - VLAN684 > > auto eno4 > iface eno4 inet static > address192.168.85.1 > netmask255.255.255.0 > #Gb4 - COROSYNC2 - VLAN685 > > auto bond0 > iface bond0 inet manual > slaves eno1 eno2 > bond_miimon 100 > bond_mode active-backup > #HA Bundle Gb1/Gb2 - Trunk > > > NODE3 - 2 interfaces > ==================== > > iface eno1 inet manual > #Gb1 - Trunk > > iface eno2 inet manual > #Gb2 - Trunk > > auto bond0 > iface bond0 inet manual > slaves eno1 eno2 > bond_miimon 100 > bond_mode active-backup > #HA Bundle Gb1/Gb2 - Trunk > > auto bond0.684 > iface bond0.684 inet static > address192.168.84.3 > netmask 255.255.255.0 > #COROSYNC1 - VLAN684 > > auto bond0.685 > iface bond0.685 inet static > address 192.168.85.3 > netmask 255.255.255.0 > #COROSYNC2 - VLAN685 > >> On Apr 14, 2021, at 16:07, Eneko Lacunza > > wrote: >> >> Hi Stefan, >> >> Thanks for your advice. Seems a really good use for otherwise unused >> 1G ports so I'll look into configuring that. >> >> If nodes had only one 1G interface, would you also une RRP? (one ring >> on 1G and the other on 10G bond) >> >> Thanks >> >> El 14/4/21 a las 15:57, Stefan M. Radman escribi?: >>> Hi Eneko >>> >>> That?s a nice setup and I bet it works well but you should do some >>> hand-tuning to increase resilience. >>> >>> Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? >>> >>> If that?s the case I?d strongly recommend to turn them into >>> dedicated untagged interfaces for the cluster traffic, running on >>> two separate ?rings". >>> >>> https://pve.proxmox.com/wiki/Separate_Cluster_Network >>> >>> https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol >>> >>> >>> Create two corosync rings, using isolated VLANs on your two switches >>> e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. >>> >>> eno1 => Switch1 => VLAN4001 >>> eno2 => Switch2 => VLAN4002 >>> >>> Restrict VLAN4001 to the access ports where the eno1 interfaces are >>> connected. Prune VLAN4001 from ALL trunks. >>> Restrict VLAN4001 to the access ports where the eno2 interfaces are >>> connected. Prune VLAN4002 from ALL trunks. >>> Assign the eno1 and eno2 interfaces to two separate subnets and you >>> are done. >>> >>> With separate rings you don?t even have to stop your cluster while >>> migrating corosync to the new subnets. >>> Just do them one-by-one. >>> >>> With corosync running on two separate rings isolated from the rest >>> of your network you should not see any further node fencing. >>> >>> Stefan >>> >>>> On Apr 14, 2021, at 15:18, Eneko Lacunza >>> > wrote: >>>> >>>> Hi Stefan, >>>> >>>> El 14/4/21 a las 13:22, Stefan M. Radman escribi?: >>>>> Hi Eneko >>>>> >>>>> Do you have separate physical interfaces for the cluster >>>>> (corosync) traffic? >>>> No. >>>>> Do you have them on separate VLANs on your switches? >>>> Onyl Ceph traffic is on VLAN91, the rest is untagged. >>>> >>>>> Are you running 1 or 2 corosync rings? >>>> This is standard... no hand tuning: >>>> >>>> nodelist { >>>> ? node { >>>> ??? name: proxmox1 >>>> ??? nodeid: 2 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.11 >>>> ? } >>>> ? node { >>>> ??? name: proxmox2 >>>> ??? nodeid: 1 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.12 >>>> ? } >>>> ? node { >>>> ??? 
name: proxmox3 >>>> ??? nodeid: 3 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.13 >>>> ? } >>>> } >>>> >>>> quorum { >>>> ? provider: corosync_votequorum >>>> } >>>> >>>> totem { >>>> ? cluster_name: CLUSTERNAME >>>> ? config_version: 3 >>>> ? interface { >>>> ??? linknumber: 0 >>>> ? } >>>> ? ip_version: ipv4-6 >>>> ? secauth: on >>>> ? version: 2 >>>> } >>>> >>>>> >>>>> Please post your /etc/network/interfaces and explain which >>>>> interface connects where. >>>> auto lo >>>> iface lo inet loopback >>>> >>>> iface ens2f0np0 inet manual >>>> # Switch2 >>>> >>>> iface ens2f1np1 inet manual >>>> # Switch1 >>>> >>>> iface eno1 inet manual >>>> >>>> iface eno2 inet manual >>>> >>>> auto bond0 >>>> iface bond0 inet manual >>>> ??? bond-slaves ens2f0np0 ens2f1np1 >>>> ??? bond-miimon 100 >>>> ??? bond-mode active-backup >>>> ??? bond-primary ens2f0np1 >>>> >>>> auto bond0.91 >>>> iface bond0.91 inet static >>>> ??? address 192.168.91.11 >>>> #Ceph >>>> >>>> auto vmbr0 >>>> iface vmbr0 inet static >>>> ??? address 192.168.90.11 >>>> ??? gateway 192.168.90.1 >>>> ??? bridge-ports bond0 >>>> ??? bridge-stp off >>>> ??? bridge-fd 0 >>>> >>>> Thanks >>>>> >>>>> Thanks >>>>> >>>>> Stefan >>>>> >>>>> >>>>>> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >>>>>> > >>>>>> wrote: >>>>>> >>>>>> >>>>>> *From: *Eneko Lacunza >>>>> > >>>>>> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >>>>>> *Date: *April 14, 2021 at 12:12:09 GMT+2 >>>>>> *To: *pve-user at lists.proxmox.com >>>>>> >>>>>> >>>>>> Hi Michael, >>>>>> >>>>>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>>>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>>>>> Eneko Lacunza via pve-user>>>>>> > ?wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>>>>> >>>>>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>>>>> operating normally for a year. Last update was on January 21st >>>>>>>> 2021. >>>>>>>> Storage is Ceph and nodes are connected to the same network switch >>>>>>>> with active-pasive bonds. >>>>>>>> >>>>>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>>>>> recovered. HA restarted VMs in other nodes too. >>>>>>>> >>>>>>>> proxmox1 syslog: (no network link issues reported at device level) >>>>>>> I have seen this occasionally and every time the cause was high >>>>>>> network >>>>>>> load/network congestion which caused token timeout. The default >>>>>>> token >>>>>>> timeout in corosync IMHO is very optimistically configured to >>>>>>> 1000 ms >>>>>>> so I have changed this setting to 5000 ms and after I have done >>>>>>> this I >>>>>>> have never seen fencing happening caused by network load/network >>>>>>> congestion again. You could try this and see if that helps you. >>>>>>> >>>>>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>>>>> Thanks for the info. In this case network is 10Gbit (I see I >>>>>> didn't include this info) but only for proxmox nodes: >>>>>> >>>>>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>>>>> - Both switches are interconnected with a SFP+ DAC >>>>>> - Active-passive Bonds in each proxmox node go one SFP+ interface >>>>>> on each switch. Primary interfaces are configured to be on the >>>>>> same switch. >>>>>> - Connectivity to the LAN is done with 1 Gbit link >>>>>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>>>>> public/private networks. 
>>>>>> >>>>>> I wouldn't expect high network load/congestion because it's on an >>>>>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>>>>> ocurring during the fence. >>>>>> >>>>>> Network cards are Broadcom. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Eneko Lacunza >>>>>> Zuzendari teknikoa | Director t?cnico >>>>>> Binovo IT Human Project >>>>>> >>>>>> Tel. +34 943 569 206 | https://www.binovo.es >>>>>> >>>>>> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >>>>>> >>>>>> https://www.youtube.com/user/CANALBINOVO >>>>>> >>>>>> https://www.linkedin.com/company/37269706/ >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> pve-user mailing list >>>>>> pve-user at lists.proxmox.com >>>>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 >>>>> >>>>> >>>>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>>>> and confidential information, or may otherwise be protected from >>>>> disclosure, and is intended solely for use of the intended >>>>> recipient(s). If you are not the intended recipient of this >>>>> communication, please notify the sender that you have received >>>>> this communication in error and delete and destroy all copies in >>>>> your possession. / >>>>> >>>> >>> >>> >>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>> and confidential information, or may otherwise be protected from >>> disclosure, and is intended solely for use of the intended >>> recipient(s). If you are not the intended recipient of this >>> communication, please notify the sender that you have received this >>> communication in error and delete and destroy all copies in your >>> possession. / >>> >> > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > EnekoLacunza Director T?cnico | Zuzendari teknikoa Binovo IT Human Project 943 569 206 elacunza at binovo.es binovo.es Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun youtube linkedin From alain.pean at c2n.upsaclay.fr Wed Apr 14 18:15:29 2021 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 14 Apr 2021 18:15:29 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> References: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Message-ID: Le 14/04/2021 ? 09:37, Piviul a ?crit?: > I Alain, first of all thank you very much indeed to you and to all > people answered this thread. I reply your message but the infos here > should answer even the infos asked from Alwin... > > I send directly the output differences from the command pveversion > with -v flag because all three nodes show the same > "pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)" version. 
> > So I have launched the following command in all three nodes: > > # pveversion -v > pveversion.$(hostname) > > obtaining 3 differents files and I've done the diff between the first > two files (referring to pve01 and pve02) and as expected there is no > difference: > > $ diff pveversion.pve0{1,2} > > Then I have done the diff between the first and the third node and > this is the result: > > $ diff pveversion.pve0{1,3} > 5d4 > < pve-kernel-5.3: 6.1-6 > 8,9c7 > < pve-kernel-5.3.18-3-pve: 5.3.18-3 > < pve-kernel-5.3.10-1-pve: 5.3.10-1 > --- > > pve-kernel-5.4.34-1-pve: 5.4.34-2 > > there are some little differences yes but in kernel that are not in > use any more (in all 3 nodes uname -r shows 5.4.106-1-pve)... > > Attached you can find all three files hoping the system doesn't cut them. > > Please can I ask you if you have a 6.3 node in your installations that > was previously in 6.2 version (i.e. not installed directly in 6.3 > version)? Can you tell me if the "Boot order" musk is the one with > only combo boxes or the more evoluted drag and drop musk? Hi Piviul, I don't think only a difference in kernel could explain this difference in the web interface, if the other packages are the same. Did you try to clear the cache in your web browsers ? The attached files are indeed there. I looked at the versions, and all three appears up to date, so for me, the only origin that I can suppose could be the browser cache. Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From elacunza at binovo.es Wed Apr 14 18:26:08 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 18:26:08 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <255c1bb1-a8f2-bec1-3ad9-6785e63d6dae@binovo.es> Hi, So I have figured out what likely happened. Indeed it was very likely a network congestion because proxmox1 and proxmox2 where using a switch and proxmox3 the other, due to proxmox1 and proxmox2 not having properly loaded the bond-primary directive (primary slave not shown on /proc/net/bonding/bond0 although it was present in /etc/network/interfaces). Additionally, just checked out that both switches are linked by a 1G port due to the 4th SFP+ port being used for the backup server... (against my recommendation during the cluster setup I must add...) So very likely it was network congestion that kicked proxmox1 out of the cluster. If seems that bond directives should be present in slaves too, like: auto lo iface lo inet loopback iface ens2f0np0 inet manual ??? bond-master bond0 ??? bond-primary ens2f0np1 # Switch2 iface ens2f1np1 inet manual ??? bond-master bond0 ??? bond-primary ens2f0np1 # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual ??? bond-slaves ens2f0np0 ens2f1np1 ??? bond-miimon 100 ??? bond-mode active-backup ??? bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static ??? address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static ??? address 192.168.90.11 ??? gateway 192.168.90.1 ??? bridge-ports bond0 ??? bridge-stp off ??? bridge-fd 0 Otherwise, it seems sometimes primary doesn't get configured properly... Thanks again Michael and Stefan! 
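One thing worth double-checking in the stanzas above: bond-primary is set to ens2f0np1, while the enslaved ports are named ens2f0np0 and ens2f1np1, so the configured primary may simply not match either slave name. Either way, the bonding proc file shows what the kernel actually applied. The commands below are only a sketch (port names are the ones from this thread; ifreload -a assumes ifupdown2 is installed, otherwise an ifdown/ifup of bond0 or a networking restart is needed):

# show the configured primary and the slave that is active right now
grep -E 'Primary Slave|Currently Active Slave|MII Status' /proc/net/bonding/bond0

# set the primary at runtime via sysfs, without touching the config files
echo ens2f1np1 > /sys/class/net/bond0/bonding/primary

# re-apply /etc/network/interfaces after editing it (ifupdown2 only)
ifreload -a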
Eneko El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: > Hi Michael, > > El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >> On Wed, 14 Apr 2021 11:04:10 +0200 >> Eneko Lacunza via pve-user wrote: >> >>> Hi all, >>> >>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>> >>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>> operating normally for a year. Last update was on January 21st 2021. >>> Storage is Ceph and nodes are connected to the same network switch >>> with active-pasive bonds. >>> >>> proxmox1 was fenced and automatically rebooted, then everything >>> recovered. HA restarted VMs in other nodes too. >>> >>> proxmox1 syslog: (no network link issues reported at device level) >> I have seen this occasionally and every time the cause was high network >> load/network congestion which caused token timeout. The default token >> timeout in corosync IMHO is very optimistically configured to 1000 ms >> so I have changed this setting to 5000 ms and after I have done this I >> have never seen fencing happening caused by network load/network >> congestion again. You could try this and see if that helps you. >> >> PS. my cluster communication is on a dedicated gb bonded vlan. > Thanks for the info. In this case network is 10Gbit (I see I didn't > include this info) but only for proxmox nodes: > > - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches > - Both switches are interconnected with a SFP+ DAC > - Active-passive Bonds in each proxmox node go one SFP+ interface on > each switch. Primary interfaces are configured to be on the same switch. > - Connectivity to the LAN is done with 1 Gbit link > - Proxmox 2x10G Bond is used for VM networking and Ceph public/private > networks. > > I wouldn't expect high network load/congestion because it's on an > internal LAN, with 1Gbit clients. No Ceph issues/backfilling were > ocurring during the fence. > > Network cards are Broadcom. > > Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From humberto.freitas310 at gmail.com Wed Apr 14 18:57:01 2021 From: humberto.freitas310 at gmail.com (Humberto Freitas) Date: Wed, 14 Apr 2021 17:57:01 +0100 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 Message-ID: Hey guys, I hope everybody is all right ? I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? 
Appreciate your wisdom lol Thanks for your great work Sincerely, Humberto Freitas Phone: +244 944 775 334 Email: humberto.freitas310 at gmail.com Angola From ralf.storm at konzept-is.de Wed Apr 14 19:27:15 2021 From: ralf.storm at konzept-is.de (Ralf Storm) Date: Wed, 14 Apr 2021 19:27:15 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> Hello Humberto, I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. Never had a problem with it. I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. have fun with it best regards Ralf Am 14/04/2021 um 18:57 schrieb Humberto Freitas: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From smr at kmi.com Wed Apr 14 19:28:44 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 17:28:44 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> Hi Eneko The redundant corosync rings would definitely have prevented the fencing even in your scenario. As a final note you should also consider replacing that 1GbE link between the switches by an Nx1GbE bundle (LACP) for redundancy and bandwidth reasons or at least by 2 x 1GbE secured by spanning tree (RSTP). Stefan On Apr 14, 2021, at 18:26, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 18:26:08 GMT+2 To: pve-user at lists.proxmox.com Hi, So I have figured out what likely happened. Indeed it was very likely a network congestion because proxmox1 and proxmox2 where using a switch and proxmox3 the other, due to proxmox1 and proxmox2 not having properly loaded the bond-primary directive (primary slave not shown on /proc/net/bonding/bond0 although it was present in /etc/network/interfaces). Additionally, just checked out that both switches are linked by a 1G port due to the 4th SFP+ port being used for the backup server... (against my recommendation during the cluster setup I must add...) So very likely it was network congestion that kicked proxmox1 out of the cluster. 
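Purely for reference, an LACP (802.3ad) bundle on the Linux side is declared as sketched below; the Nx1GbE inter-switch bundle suggested above is configured on the two switches themselves, and a single LACP bundle only spans both switches if they are stacked or support MLAG, which is an assumption to verify for the switch model in use. Interface names here are illustrative:

auto bond1
iface bond1 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3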
If seems that bond directives should be present in slaves too, like: auto lo iface lo inet loopback iface ens2f0np0 inet manual bond-master bond0 bond-primary ens2f0np1 # Switch2 iface ens2f1np1 inet manual bond-master bond0 bond-primary ens2f0np1 # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Otherwise, it seems sometimes primary doesn't get configured properly... Thanks again Michael and Stefan! Eneko El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. 
Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C6173285a195944ab306e08d8ff620c61%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637540143873213806%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=k%2FL7WhTr4ybZ%2FsKsx%2F49L3k7sjc2VA71xKwI8iH8buw%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From emmungil at yahoo.com Wed Apr 14 19:53:54 2021 From: emmungil at yahoo.com (LEVENT EMMUNGIL) Date: Wed, 14 Apr 2021 17:53:54 +0000 (UTC) Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 References: <211108960.1987020.1618422834880.ref@mail.yahoo.com> Message-ID: <211108960.1987020.1618422834880@mail.yahoo.com> Hi,I have been using proxmox on various hardware (Different vendors rangin from pc to enterprise).HP Proliant DL380 Gen7, Gen8, Gen9 and Gen10They are all worked well, and had no problem. Best wishes. From humberto.freitas310 at gmail.com Wed Apr 14 20:19:53 2021 From: humberto.freitas310 at gmail.com (Humberto Freitas) Date: Wed, 14 Apr 2021 19:19:53 +0100 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> References: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> Message-ID: <6FCF9326-8545-4073-B119-A245D4E3EE69@gmail.com> Hello Ralf, thank you so much for you fast response. > I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. Nice to know that. It looks promising ? > I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. > > Never had a problem with it. Yeah, Proxmox is such great software. Excellent work guys > I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. Thanks for the advice. It looks like this server has some issues with RAID drivers. I?ll keep this in mind ? > have fun with it Hell yeah... I?m just waiting for the final decision to get it and start working with it. I can?t wait. Perhaps I?ll say something about the installation so people can see that Proxmox is tested on this kind of server Once again thanks Ralf and all the community Sincerely, Humberto Freitas Phone: +244 944 775 334 Email: humberto.freitas310 at gmail.com Angola > On 14/04/2021, at 6:27 PM, Ralf Storm wrote: > > ?Hello Humberto, > > > I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. > > I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. > > Never had a problem with it. > > I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. 
> > > have fun with it > > > best regards > > > Ralf > >> Am 14/04/2021 um 18:57 schrieb Humberto Freitas: >> Hey guys, I hope everybody is all right ? >> >> I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... >> >> I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? >> >> Appreciate your wisdom lol >> >> Thanks for your great work >> >> Sincerely, >> >> Humberto Freitas >> >> Phone: +244 944 775 334 >> Email: humberto.freitas310 at gmail.com >> Angola >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gaio at sv.lnf.it Thu Apr 15 09:38:48 2021 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Apr 2021 09:38:48 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <20210415073848.GB3322@sv.lnf.it> Mandi! Humberto Freitas In chel di` si favelave... > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... I'm currently running PVE 5 on a ProLiant ML350p Gen8, that AFAIK is the 'tower' version of ProLiant DL380p Gen8. No trouble at all. If you have enough RAM, consider putting the controller in JBOD mode and install directly with ZFS software RAID. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From a.lauterer at proxmox.com Thu Apr 15 09:47:10 2021 From: a.lauterer at proxmox.com (Aaron Lauterer) Date: Thu, 15 Apr 2021 09:47:10 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: I personally still have one of those old boxes around. Works fine but regarding the disks and RAID controllers you should be aware that in my experience, booting from any of the disks did not work when put into JBOD mode. I ended up putting in an HBA controller which also needed new SAS cables as the ones it shipped with have plugs that are angled at 90? making it impossible to plug 2 of them into the HBA. On 4/14/21 6:57 PM, Humberto Freitas wrote: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. 
I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From elacunza at binovo.es Thu Apr 15 09:55:42 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 15 Apr 2021 09:55:42 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> References: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> Message-ID: Hi Stefan, El 14/4/21 a las 19:28, Stefan M. Radman escribi?: > The redundant corosync rings would definitely have prevented the > fencing even in your scenario. Yes that's for sure ;) > > As a final note you should also consider replacing that 1GbE link > between the switches by an Nx1GbE bundle (LACP) for redundancy and > bandwidth reasons or at least by 2 x 1GbE secured by spanning tree (RSTP). I think we should interlink the switches with SFP+. Backups don't need that bandwith but the final say is not mine :( Thanks a lot Eneko > > Stefan > >> On Apr 14, 2021, at 18:26, Eneko Lacunza via pve-user >> > wrote: >> >> >> *From: *Eneko Lacunza > >> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >> *Date: *April 14, 2021 at 18:26:08 GMT+2 >> *To: *pve-user at lists.proxmox.com >> >> >> Hi, >> >> So I have figured out what likely happened. >> >> Indeed it was very likely a network congestion because proxmox1 and >> proxmox2 where using a switch and proxmox3 the other, due to proxmox1 >> and proxmox2 not having properly loaded the bond-primary directive >> (primary slave not shown on /proc/net/bonding/bond0 although it was >> present in /etc/network/interfaces). >> >> Additionally, just checked out that both switches are linked by a 1G >> port due to the 4th SFP+ port being used for the backup server... >> (against my recommendation during the cluster setup I must add...) >> >> So very likely it was network congestion that kicked proxmox1 out of >> the cluster. >> >> If seems that bond directives should be present in slaves too, like: >> >> auto lo >> iface lo inet loopback >> >> iface ens2f0np0 inet manual >> ??? bond-master bond0 >> ??? bond-primary ens2f0np1 >> # Switch2 >> >> iface ens2f1np1 inet manual >> ??? bond-master bond0 >> ??? bond-primary ens2f0np1 >> # Switch1 >> >> iface eno1 inet manual >> >> iface eno2 inet manual >> >> auto bond0 >> iface bond0 inet manual >> ??? bond-slaves ens2f0np0 ens2f1np1 >> ??? bond-miimon 100 >> ??? bond-mode active-backup >> ??? bond-primary ens2f0np1 >> >> auto bond0.91 >> iface bond0.91 inet static >> ??? address 192.168.91.11 >> #Ceph >> >> auto vmbr0 >> iface vmbr0 inet static >> ??? address 192.168.90.11 >> ??? gateway 192.168.90.1 >> ??? bridge-ports bond0 >> ??? bridge-stp off >> ??? bridge-fd 0 >> >> Otherwise, it seems sometimes primary doesn't get configured properly... >> >> Thanks again Michael and Stefan! 
>> Eneko >> >> >> El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: >>> Hi Michael, >>> >>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>> Eneko Lacunza via pve-user>>> > wrote: >>>> >>>>> Hi all, >>>>> >>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>> >>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>> operating normally for a year. Last update was on January 21st 2021. >>>>> Storage is Ceph and nodes are connected to the same network switch >>>>> with active-pasive bonds. >>>>> >>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>> recovered. HA restarted VMs in other nodes too. >>>>> >>>>> proxmox1 syslog: (no network link issues reported at device level) >>>> I have seen this occasionally and every time the cause was high network >>>> load/network congestion which caused token timeout. The default token >>>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>>> so I have changed this setting to 5000 ms and after I have done this I >>>> have never seen fencing happening caused by network load/network >>>> congestion again. You could try this and see if that helps you. >>>> >>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>> Thanks for the info. In this case network is 10Gbit (I see I didn't >>> include this info) but only for proxmox nodes: >>> >>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>> - Both switches are interconnected with a SFP+ DAC >>> - Active-passive Bonds in each proxmox node go one SFP+ interface on >>> each switch. Primary interfaces are configured to be on the same switch. >>> - Connectivity to the LAN is done with 1 Gbit link >>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>> public/private networks. >>> >>> I wouldn't expect high network load/congestion because it's on an >>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>> ocurring during the fence. >>> >>> Network cards are Broadcom. >>> >>> Thanks >> >> Eneko Lacunza >> Zuzendari teknikoa | Director t?cnico >> Binovo IT Human Project >> >> Tel. +34 943 569 206 | https://www.binovo.es >> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >> >> https://www.youtube.com/user/CANALBINOVO >> >> https://www.linkedin.com/company/37269706/ >> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C6173285a195944ab306e08d8ff620c61%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637540143873213806%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=k%2FL7WhTr4ybZ%2FsKsx%2F49L3k7sjc2VA71xKwI8iH8buw%3D&reserved=0 > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. 
Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From ralf.storm at konzept-is.de Thu Apr 15 09:59:17 2021 From: ralf.storm at konzept-is.de (Ralf Storm) Date: Thu, 15 Apr 2021 09:59:17 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: Hey Aron, why didn?t you get this booting? What was the errror? Never had any booting problems with proxmox, despite the ZFS issues, which are described in the documentation, with quick and easy solutions Am 15/04/2021 um 09:47 schrieb Aaron Lauterer: > I personally still have one of those old boxes around. > > Works fine but regarding the disks and RAID controllers you should be > aware that in my experience, booting from any of the disks did not > work when put into JBOD mode. I ended up putting in an HBA controller > which also needed new SAS cables as the ones it shipped with have > plugs that are angled at 90? making it impossible to plug 2 of them > into the HBA. > > > On 4/14/21 6:57 PM, Humberto Freitas wrote: >> Hey guys, I hope everybody is all right ? >> >> I?m seeking for an advice from the community on buying this server, >> HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m >> planning to install, in the beginning, an instance of Debian 10 and >> in it I?m planning to have the usual setting for an enterprise like a >> filesharing software, an ERP, web server, etc... >> >> I?ve looked up something on DuckDuckGo, and find few things except >> this: >> https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. >> Does the issues described in the page still exist? >> >> Appreciate your wisdom lol >> >> Thanks for your great work >> >> Sincerely, >> >> Humberto Freitas >> >> Phone: +244 944 775 334 >> Email: humberto.freitas310 at gmail.com >> Angola >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From a.lauterer at proxmox.com Thu Apr 15 11:32:22 2021 From: a.lauterer at proxmox.com (Aaron Lauterer) Date: Thu, 15 Apr 2021 11:32:22 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <50abd43b-ad14-a8a7-f69b-db75764f28dd@proxmox.com> IIRC, it has been a while: once I switched the P420i to HBA/JBOD mode, I think it only supports HBA? It just did not present any option to boot from it either in the controller or BIOS settings. I think that has gotten better with the RAID controllers present in the G9 and later, but the P420i in the G8 (and G7?) are a bit horrible in that regard. On 4/15/21 9:59 AM, Ralf Storm wrote: > Hey Aron, > > why didn?t you get this booting? What was the errror? Never had any booting problems with proxmox, despite the ZFS issues, which are described in the documentation, with quick and easy solutions > > Am 15/04/2021 um 09:47 schrieb Aaron Lauterer: >> I personally still have one of those old boxes around. >> >> Works fine but regarding the disks and RAID controllers you should be aware that in my experience, booting from any of the disks did not work when put into JBOD mode. I ended up putting in an HBA controller which also needed new SAS cables as the ones it shipped with have plugs that are angled at 90? 
making it impossible to plug 2 of them into the HBA. >> >> >> On 4/14/21 6:57 PM, Humberto Freitas wrote: >>> Hey guys, I hope everybody is all right ? >>> >>> I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... >>> >>> I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? >>> >>> Appreciate your wisdom lol >>> >>> Thanks for your great work >>> >>> Sincerely, >>> >>> Humberto Freitas >>> >>> Phone: +244 944 775 334 >>> Email: humberto.freitas310 at gmail.com >>> Angola >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at lists.proxmox.com >>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From piviul at riminilug.it Thu Apr 15 16:03:08 2021 From: piviul at riminilug.it (Piviul) Date: Thu, 15 Apr 2021 16:03:08 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Message-ID: Il 14/04/21 18:15, Alain P?an ha scritto: > Hi Piviul, > > I don't think only a difference in kernel could explain this > difference in the web interface, if the other packages are the same. > Did you try to clear the cache in your web browsers ? > > The attached files are indeed there. I looked at the versions, and all > three appears up to date, so for me, the only origin that I can > suppose could be the browser cache. But I'm sure it's not a cache browser because I have clear the cache and I have tested this problem in different browsers in different PCs... in my opinion there is a bug: during the node upgrade to 6.3, proxmox VE doesn't update the code that generate the Boot order option mask. Please can you verify if in your 6.3 proxomox nodes that are updates from previously 6.2 you can see the new drag and drop boot order mask? Thank you very much Piviul From lindsay.mathieson at gmail.com Thu Apr 15 17:43:28 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 16 Apr 2021 01:43:28 +1000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? Message-ID: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Setting up a home server (NUC i5) and it can only take a 2.5" SATA drive and a NVMe PCIe SSD - but I would really like a mirrored ZFS boot. Is it possible (and safe?) to use a 512GB SATA SSD and a 512GB NVMe PCi SSD in a zfs boot mirror? -- Lindsay From leesteken at protonmail.ch Thu Apr 15 17:51:57 2021 From: leesteken at protonmail.ch (Arjen) Date: Thu, 15 Apr 2021 15:51:57 +0000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? 
In-Reply-To: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> References: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Message-ID: On Thursday, April 15th, 2021 at 17:43, Lindsay Mathieson wrote: > Setting up a home server (NUC i5) and it can only take a 2.5" SATA drive > > and a NVMe PCIe SSD - but I would really like a mirrored ZFS boot. > > Is it possible (and safe?) to use a 512GB SATA SSD and a 512GB NVMe PCi > > SSD in a zfs boot mirror? It should work fine but the write speeds (and fsync/sec) will typically be the slowest of the two. I use a NVME M.2 mirrored by two SATA drives for my VMs, which I think is quite similar. Older systems sometimes cannot boot from NVME (PCIe). You might want to make sure, otherwise the redundancy won't help if the SATA one fails. From lindsay.mathieson at gmail.com Thu Apr 15 18:33:35 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 16 Apr 2021 02:33:35 +1000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? In-Reply-To: References: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Message-ID: On 16/04/2021 1:51 am, Arjen via pve-user wrote: > It should work fine but the write speeds (and fsync/sec) will typically be the slowest of the two. Thanks, thought that would be the case. > I use a NVME M.2 mirrored by two SATA drives for my VMs, which I think is quite similar. > Older systems sometimes cannot boot from NVME (PCIe). You might want to make sure, otherwise the redundancy won't help if the SATA one fails. Didn't know that, will check. -- Lindsay From piviul at riminilug.it Fri Apr 16 16:16:26 2021 From: piviul at riminilug.it (Piviul) Date: Fri, 16 Apr 2021 16:16:26 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: <3e550f7c-4f26-3573-63e8-d1e544096b82@riminilug.it> Il 13/04/21 10:05, Piviul ha scritto: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM > is not ever the same. If I access to the mask from 2 nodes (say node1 > and node2) the mask is a simple html form with only combo boxes. On > the third node (say node3) the mask is more sophisticated, can support > the drag and drop, has checkbox... in other word it's different. So I > would like to know why my three nodes doesn't have the same mask even > if they are at the same proxmox version and if there is a way that all > nodes shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. The problem has been solved reinstalling pve-manager with the command # apt install --reinstall pve-manager |Thank you very much to all list members Have a great day! Piviul | || || From lindsay.mathieson at gmail.com Mon Apr 19 02:52:09 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 19 Apr 2021 10:52:09 +1000 Subject: [PVE-User] unpriviliged lxc uid/gid mappings Message-ID: I must say, I find the subject very confusing and difficult to parse. It seems very difficult to setup with multiple user and container mappings to maintain - I just setup 4 containers with 4 bind mounts each and after a lot of fiddling, got them working, but I'm not confident on maintenance for the future. 
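For the archive, the pattern that ended up working for me looks roughly like this - VMID, host path and the uid/gid 1000 are only examples rather than my real values, and the default root:100000:65536 entries in /etc/subuid and /etc/subgid are assumed to be in place already:

    # /etc/pve/lxc/101.conf
    mp0: /tank/media,mp=/mnt/media
    lxc.idmap: u 0 100000 1000
    lxc.idmap: g 0 100000 1000
    lxc.idmap: u 1000 1000 1
    lxc.idmap: g 1000 1000 1
    lxc.idmap: u 1001 101001 64535
    lxc.idmap: g 1001 101001 64535

    # /etc/subuid and /etc/subgid each also need:
    root:1000:1

    # and the bind-mounted path on the host must be owned by the mapped ids:
    chown -R 1000:1000 /tank/media

That passes container uid/gid 1000 straight through to host uid/gid 1000 and keeps everything else in the usual 100000+ range, but it has to be repeated and kept in sync for every container and every extra id, which is the maintenance part that worries me.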
I had to give up on the container that needed access to 2 USB tuners and a Intel QuickSync GPU (vaapi), ended up running that container privileged. Is there any plans to simplify it for the future? I found the LXD (4.0?) system of raw.idmap settings much easier to setup, I was able to generically script that for containers. -- Lindsay From lindsay.mathieson at gmail.com Mon Apr 19 02:53:30 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 19 Apr 2021 10:53:30 +1000 Subject: [PVE-User] unpriviliged lxc uid/gid mappings Message-ID: <190926b5-0c91-b8d3-e653-5425103c0c0d@gmail.com> I must say, I find the subject very confusing and difficult to parse. It seems very difficult to setup with multiple user and container mappings to maintain - I just setup 4 containers with 4 bind mounts each and after a lot of fiddling, got them working, but I'm not confident on maintenance for the future. I had to give up on the container that needed access to 2 USB tuners and a Intel QuickSync GPU (vaapi), ended up running that container privileged. Is there any plans to simplify it for the future? I found the LXD (4.0?) system of raw.idmap settings much easier to setup, I was able to generically script that for containers. Not complaining, I'm very happy with the overall setup I have at home - PX Media Server and a PBS Server, much easier to maintain than my old setup, and disaster recovery exists now :) -- Lindsay From leandro at tecnetmza.com.ar Mon Apr 19 14:29:19 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 19 Apr 2021 09:29:19 -0300 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: Humberto , WE bought an used hpe proliant dl380 gen8. It is working very nice so far. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El mi?, 14 abr 2021 a las 13:57, Humberto Freitas (< humberto.freitas310 at gmail.com>) escribi?: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP > ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning > to install, in the beginning, an instance of Debian 10 and in it I?m > planning to have the usual setting for an enterprise like a filesharing > software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: > https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. > Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From leandro at tecnetmza.com.ar Mon Apr 19 14:59:15 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 19 Apr 2021 09:59:15 -0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. Message-ID: Hi guys , I received a very old del pe 2950 box. Fortunately it has 64 GB and a double power supply, so i'm thinking about using it with pve. After confirm with dell support about storage capacity: Max physical storage support is 2TB. Max virtual storage support is also 2TB. 
## I was reading on previous emails at this mail list , about storage: "putting the controller in JBOD mode and install directly with ZFS software RAID." I always thought that raid hardware controller was the best option but , perhaps I can give it a try to ZFS software RAID with this old server .... what do you think ? I have a bunch of 3.5" with odd capacities unused drives. I readed that zfs can merge them to get a more efficient use. What about hot replace / remove or insert a new drive ? Will it work without service disruption in production environments ? Regards, Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From gaio at sv.lnf.it Mon Apr 19 15:38:49 2021 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 19 Apr 2021 15:38:49 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: <20210419133849.GN3268@sv.lnf.it> Mandi! Leandro Roggerone In chel di` si favelave... > What about hot replace / remove or insert a new drive ? > Will it work without service disruption in production environments ? AFAIk no. If i remember well, Linux SCSI/SATA subsystem have support for the hot-swap, but need also the support for the controller/cage/backpane/... So, basically, switching to JBOD/Passthroug, you lost host-swap. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From lists at benappy.com Mon Apr 19 15:57:04 2021 From: lists at benappy.com (Michel 'ic' Luczak) Date: Mon, 19 Apr 2021 15:57:04 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: Hi, > Max physical storage support is 2TB. > Max virtual storage support is also 2TB. I?m using a PE2950 with the stock SAS RAID card and 6 x 4TB SATA using uSATA adapters (small boards that clip in the back of SATA drives to ?make them SAS? (over simplifying things but you will need those)). There is no real limitation, you may only run into issues with 512e/512n/etc? but up to 4 or 6 TB you should be fine. More recently I used very modern 8TB SAS drives in an R410 without backplane and I had to cut the 3.3V power line to the drives because on modern drives it?s turning them off. Dell support knows only what was written in the manual at the date of release of the server. Regards, Michel From devzero at web.de Mon Apr 19 16:05:14 2021 From: devzero at web.de (Roland) Date: Mon, 19 Apr 2021 16:05:14 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: <20210419133849.GN3268@sv.lnf.it> References: <20210419133849.GN3268@sv.lnf.it> Message-ID: <4d87a555-3c3e-0819-404d-cbf65da0b9f1@web.de> shouldn't it be possible to replace existing controller with crossflashed perc h310 + sff-8484/sff-8087 cable? and you're done with the disk / hotswap limitation ? roland Am 19.04.21 um 15:38 schrieb Marco Gaiarin: > Mandi! Leandro Roggerone > In chel di` si favelave... > >> What about hot replace / remove or insert a new drive ? >> Will it work without service disruption in production environments ? > AFAIk no. 
> > If i remember well, Linux SCSI/SATA subsystem have support for the hot-swap, > but need also the support for the controller/cage/backpane/... > > So, basically, switching to JBOD/Passthroug, you lost host-swap. > From kyleaschmitt at gmail.com Mon Apr 19 17:08:04 2021 From: kyleaschmitt at gmail.com (Kyle Schmitt) Date: Mon, 19 Apr 2021 10:08:04 -0500 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: I run on R610s, so about the same generation I think. I use the hardware raid for mirrored boot drive only, and NFS over 10G for VM storage, which is on a seperate system running FreeBSD + ZFS. But for your case: the general rule for any storage system is don't mix. You pick one and only one: hardware raid, software raid, zfs, ceph, etc. The exceptions are for systems you almost definitely won't be using like luster and gluster. You can still do hotplug with JBOD mode, at least on the dell hardware I've used. I have no idea if it's officially supported or not. I do know that I sometimes have to use the raid tools to bring in a new drive when it's NOT in JBOD mode. In the 6ish years I've run ZFS I've only had one drive fail (pure luck, not skill), and it was trivial to swap out and replace. --Kyle On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone wrote: > > Hi guys , I received a very old del pe 2950 box. > Fortunately it has 64 GB and a double power supply, so i'm thinking about > using it with pve. > After confirm with dell support about storage capacity: > Max physical storage support is 2TB. > Max virtual storage support is also 2TB. > ## > I was reading on previous emails at this mail list , about storage: > "putting the controller in JBOD mode > and install directly with ZFS software RAID." > > I always thought that raid hardware controller was the best option but , > perhaps I can give it a try to ZFS software RAID with this old server .... > what do you think ? > I have a bunch of 3.5" with odd capacities unused drives. > I readed that zfs can merge them to get a more efficient use. > What about hot replace / remove or insert a new drive ? > Will it work without service disruption in production environments ? > > Regards, > Leandro. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From d.alexandris at gmail.com Mon Apr 19 22:20:18 2021 From: d.alexandris at gmail.com (Dimitri Alexandris) Date: Mon, 19 Apr 2021 23:20:18 +0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: I have an old Dell 1950 with one cpu (2 disks). Was 1+1G ram, added 16+16 and now have 34G. Two power supplies. 1- Never bothered to change SAS controller mode. 2- SATA disks (up to 2T of course) work perfectly, never used transposers. 3- First worked with an internal SSD (at internal SATA port, plus an addon power cable) for OS (always ZFS) + 2 SATA disks (ZFS raid) for data. 3 years, no problems. 4- Now i bought 2 SAS 1T, with OS on them, and also works fine. 5- Hot plugging disks is working fine. I actually increased capacity with bigger disks without stopping anything this way. Had 500G SATA before 1T SAS diks. I configure the 2 eths in OVS (openvswitch) bond mode, with several VLANS and internal networks (OVS IntPorts). 
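Roughly, the network part of my /etc/network/interfaces for that looks like the sketch below - interface names, VLAN tag and address are placeholders rather than my exact values:

    auto vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 mgmt

    allow-vmbr0 bond0
    iface bond0 inet manual
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_bonds eth0 eth1
        ovs_options bond_mode=balance-slb

    allow-vmbr0 mgmt
    iface mgmt inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=10
        address 192.168.1.10
        netmask 255.255.255.0

Guests then just attach to vmbr0 with whatever VLAN tag they need.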
In your case, with many SAS disks you will be fine with any configuration, e.g.: 3+3 x 2T = 6T ZFS raid, OS + data, the fastest combination, or 5+1 ZFS raidz, 10T space, with best capacity, or 4+2 ZFS raidz2, 8T space, also big, and safer. Do yourself a favour, and buy some decent SAS disks, at 7200rpm are very cheap. On Mon, Apr 19, 2021 at 6:08 PM Kyle Schmitt wrote: > I run on R610s, so about the same generation I think. I use the > hardware raid for mirrored boot drive only, and NFS over 10G for VM > storage, which is on a seperate system running FreeBSD + ZFS. > > But for your case: the general rule for any storage system is don't > mix. You pick one and only one: hardware raid, software raid, zfs, > ceph, etc. The exceptions are for systems you almost definitely won't > be using like luster and gluster. > > You can still do hotplug with JBOD mode, at least on the dell hardware > I've used. I have no idea if it's officially supported or not. I do > know that I sometimes have to use the raid tools to bring in a new > drive when it's NOT in JBOD mode. > > In the 6ish years I've run ZFS I've only had one drive fail (pure > luck, not skill), and it was trivial to swap out and replace. > > --Kyle > > On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone > wrote: > > > > Hi guys , I received a very old del pe 2950 box. > > Fortunately it has 64 GB and a double power supply, so i'm thinking about > > using it with pve. > > After confirm with dell support about storage capacity: > > Max physical storage support is 2TB. > > Max virtual storage support is also 2TB. > > ## > > I was reading on previous emails at this mail list , about storage: > > "putting the controller in JBOD mode > > and install directly with ZFS software RAID." > > > > I always thought that raid hardware controller was the best option but , > > perhaps I can give it a try to ZFS software RAID with this old server > .... > > what do you think ? > > I have a bunch of 3.5" with odd capacities unused drives. > > I readed that zfs can merge them to get a more efficient use. > > What about hot replace / remove or insert a new drive ? > > Will it work without service disruption in production environments ? > > > > Regards, > > Leandro. > > > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > Libre > > de virus. www.avast.com > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Tue Apr 20 14:06:21 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Tue, 20 Apr 2021 09:06:21 -0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: Thank you guys. I will try to buy 6 new drives. Then I will let you know how it goes. Regards. Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El lun, 19 abr 2021 a las 17:21, Dimitri Alexandris () escribi?: > I have an old Dell 1950 with one cpu (2 disks). Was 1+1G ram, added 16+16 > and now have 34G. Two power supplies. 
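For the archive: with 6 new drives, the three layouts suggested above translate into zpool create commands along these lines (pool name and the sdX device names are placeholders; /dev/disk/by-id/... paths are the better choice in practice):

    # 3 striped mirrors, fastest:
    zpool create tank mirror sda sdb mirror sdc sdd mirror sde sdf
    # 5+1 raidz, most usable space:
    zpool create tank raidz sda sdb sdc sdd sde sdf
    # 4+2 raidz2, most redundancy:
    zpool create tank raidz2 sda sdb sdc sdd sde sdf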
> > 1- Never bothered to change SAS controller mode. > 2- SATA disks (up to 2T of course) work perfectly, never used transposers. > 3- First worked with an internal SSD (at internal SATA port, plus an addon > power cable) for OS (always ZFS) + 2 SATA disks (ZFS raid) for data. 3 > years, no problems. > 4- Now i bought 2 SAS 1T, with OS on them, and also works fine. > 5- Hot plugging disks is working fine. I actually increased capacity with > bigger disks without stopping anything this way. Had 500G SATA before 1T > SAS diks. > > I configure the 2 eths in OVS (openvswitch) bond mode, with several VLANS > and internal networks (OVS IntPorts). > > In your case, with many SAS disks you will be fine with any configuration, > e.g.: > > 3+3 x 2T = 6T ZFS raid, OS + data, the fastest combination, or > 5+1 ZFS raidz, 10T space, with best capacity, or > 4+2 ZFS raidz2, 8T space, also big, and safer. > > Do yourself a favour, and buy some decent SAS disks, at 7200rpm are very > cheap. > > > On Mon, Apr 19, 2021 at 6:08 PM Kyle Schmitt > wrote: > > > I run on R610s, so about the same generation I think. I use the > > hardware raid for mirrored boot drive only, and NFS over 10G for VM > > storage, which is on a seperate system running FreeBSD + ZFS. > > > > But for your case: the general rule for any storage system is don't > > mix. You pick one and only one: hardware raid, software raid, zfs, > > ceph, etc. The exceptions are for systems you almost definitely won't > > be using like luster and gluster. > > > > You can still do hotplug with JBOD mode, at least on the dell hardware > > I've used. I have no idea if it's officially supported or not. I do > > know that I sometimes have to use the raid tools to bring in a new > > drive when it's NOT in JBOD mode. > > > > In the 6ish years I've run ZFS I've only had one drive fail (pure > > luck, not skill), and it was trivial to swap out and replace. > > > > --Kyle > > > > On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone > > wrote: > > > > > > Hi guys , I received a very old del pe 2950 box. > > > Fortunately it has 64 GB and a double power supply, so i'm thinking > about > > > using it with pve. > > > After confirm with dell support about storage capacity: > > > Max physical storage support is 2TB. > > > Max virtual storage support is also 2TB. > > > ## > > > I was reading on previous emails at this mail list , about storage: > > > "putting the controller in JBOD mode > > > and install directly with ZFS software RAID." > > > > > > I always thought that raid hardware controller was the best option but > , > > > perhaps I can give it a try to ZFS software RAID with this old server > > .... > > > what do you think ? > > > I have a bunch of 3.5" with odd capacities unused drives. > > > I readed that zfs can merge them to get a more efficient use. > > > What about hot replace / remove or insert a new drive ? > > > Will it work without service disruption in production environments ? > > > > > > Regards, > > > Leandro. > > > > > > < > > > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > > > Libre > > > de virus. 
www.avast.com > > > < > > > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > > _______________________________________________ > > > pve-user mailing list > > > pve-user at lists.proxmox.com > > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From piccardi at truelite.it Tue Apr 20 18:04:06 2021 From: piccardi at truelite.it (Simone Piccardi) Date: Tue, 20 Apr 2021 18:04:06 +0200 Subject: PBS backups on files Message-ID: <6f8ac760-2e6b-7050-1e47-0fed8d5d0b65@truelite.it> Hi, I just installed lasta version of Proxmox Backup System, where there is a new Tape Backup section. That's a good news, but it seems that is necessary to have tape driver, and that's not possible to use a simple file as destination for the backup. Just having only a removable cassette disk and not a tape the feature (which seemd very well made) is unfortunately useless to me. Since a removable disk is generally a simple and cheap solution for off-site backups, is there any possibility to extend this feature to save data on an ordinary file? Greetings Simone -- Simone Piccardi Truelite Srl piccardi at truelite.it (email/jabber) Via Monferrato, 6 Tel. +39-347-1032433 50142 Firenze http://www.truelite.it Tel. +39-055-7879597 From dietmar at proxmox.com Tue Apr 20 21:07:26 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 20 Apr 2021 21:07:26 +0200 (CEST) Subject: [PVE-User] PBS backups on files Message-ID: <1276563634.4483.1618945646240@webmail.proxmox.com> > Since a removable disk is generally a simple and cheap solution for > off-site backups, is there any possibility to extend this feature to > save data on an ordinary file? Sync to a removable disk is unrelated to tape backup. But we have plans to support that also in the future... From piccardi at truelite.it Thu Apr 22 11:22:07 2021 From: piccardi at truelite.it (Simone Piccardi) Date: Thu, 22 Apr 2021 11:22:07 +0200 Subject: [PVE-User] PBS backups on files In-Reply-To: <1276563634.4483.1618945646240@webmail.proxmox.com> References: <1276563634.4483.1618945646240@webmail.proxmox.com> Message-ID: <390357c0-4249-bafb-8234-08cb9fe3792a@truelite.it> Il 20/04/21 21:07, Dietmar Maurer ha scritto: >> Since a removable disk is generally a simple and cheap solution for >> off-site backups, is there any possibility to extend this feature to >> save data on an ordinary file? > > Sync to a removable disk is unrelated to tape backup. > They seemed similar to me, because a tape is still a device file, so I thinked that just writing the same content into a standard file will do the job. > But we have plans to support that also in the future... > That's a good news. Simone -- Simone Piccardi Truelite Srl piccardi at truelite.it (email/jabber) Via Monferrato, 6 Tel. +39-347-1032433 50142 Firenze http://www.truelite.it Tel. 
+39-055-7879597 From jmr.richardson at gmail.com Tue Apr 27 20:38:04 2021 From: jmr.richardson at gmail.com (JR Richardson) Date: Tue, 27 Apr 2021 13:38:04 -0500 Subject: [PVE-User] Multi Data Center Cluster or Not Message-ID: Hi All, I'm looking for suggestions for geo-diversity using PROXMOX Clustering. I understand running hypervisors in the same cluster in multiple data centers is possible with high capacity/low latency inter-site links. What I'm learning is there could be better ways, like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS is interesting but would require manually restoring nodes should a failure occur. I'm looking for best practice or suggestions in topology that folks are using successfully or even tales of failure for what to avoid. Thanks. JR -- JR Richardson Engineering for the Masses Chasing the Azeotrope From aderumier at odiso.com Wed Apr 28 04:03:19 2021 From: aderumier at odiso.com (alexandre derumier) Date: Wed, 28 Apr 2021 04:03:19 +0200 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: References: Message-ID: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> Hi, If you want same cluster on multiple datacenter, you really need low latency (for proxmox && storage), and at least 3 datacenters to keep quorum. if you need a 2dc datacenter, with 1 primary && 1 backup as disaster recovery you could manually replicate a zfs or ceph storage to the backup dc (with snapshot export/import), or other storage replication feature if you have a san like netapp for example and do an rsync of /etc/pve. On 27/04/2021 20:38, JR Richardson wrote: > Hi All, > > I'm looking for suggestions for geo-diversity using PROXMOX > Clustering. I understand running hypervisors in the same cluster in > multiple data centers is possible with high capacity/low latency > inter-site links. What I'm learning is there could be better ways, > like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS > is interesting but would require manually restoring nodes should a > failure occur. > > I'm looking for best practice or suggestions in topology that folks > are using successfully or even tales of failure for what to avoid. > > Thanks. > > JR From t.lamprecht at proxmox.com Wed Apr 28 08:40:51 2021 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 28 Apr 2021 08:40:51 +0200 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> References: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> Message-ID: <5f38f579-fe59-763e-6919-be691912aa87@proxmox.com> On 28.04.21 04:03, alexandre derumier wrote: > On 27/04/2021 20:38, JR Richardson wrote: >> I'm looking for suggestions for geo-diversity using PROXMOX >> Clustering. I understand running hypervisors in the same cluster in >> multiple data centers is possible with high capacity/low latency >> inter-site links. What I'm learning is there could be better ways, >> like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS >> is interesting but would require manually restoring nodes should a >> failure occur. >> >> I'm looking for best practice or suggestions in topology that folks >> are using successfully or even tales of failure for what to avoid. > > If you want same cluster on multiple datacenter, you really need low latency (for proxmox && storage), and at least 3 datacenters to keep quorum. 
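To make the manual option quoted just below a bit more concrete, here is a minimal sketch of such a primary-to-backup replication, assuming an initial full send has already been done - dataset, snapshot names and the backup host are made up:

    # on the primary site, on a schedule:
    zfs snapshot rpool/data/vm-100-disk-0@repl-2021-04-28
    zfs send -i @repl-2021-04-27 rpool/data/vm-100-disk-0@repl-2021-04-28 \
      | ssh backup-dc zfs recv -F rpool/data/vm-100-disk-0
    # plus a copy of the guest/cluster configuration:
    rsync -a /etc/pve/ backup-dc:/root/pve-config-copy/

On the backup side the VM configs still have to be recreated or adjusted by hand before anything is started.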
> > if you need a 2dc datacenter, with 1 primary && 1 backup as disaster recovery > > you could manually replicate a zfs or ceph storage to the backup dc (with snapshot export/import), or other storage replication feature if you have a san like netapp for example and do an rsync of /etc/pve. > We know of setups which use rbd-mirror to mirror their production Ceph pool to a second DC for recovery on failure. It's still needs a bit of hands-on approach on setup and actual recovery can be prepared too (pre-create matching VMs, maybe lock them by default so no start is done by accident). We also know some city-gov IT people which run their cluster over multiple DCs, but they have the luck to be able to run redundant fiber with LAN-like latency between those DCs, which may not be an option for everyone. A multi-datacenter management is planned, but we currently are still fleshing out the basis, albeit some features required for that to happen are in-work. Nothing ready to soon, though, just mentioning as FYI. cheers, Thomas From martin at proxmox.com Wed Apr 28 11:56:33 2021 From: martin at proxmox.com (Martin Maurer) Date: Wed, 28 Apr 2021 11:56:33 +0200 Subject: [PVE-User] Proxmox VE 6.4 released Message-ID: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Hi all, We are proud to announce the general availability of Proxmox Virtual Environment 6.4, our open-source virtualization platform. This version brings unified single-file restore for virtual machine (VM) and container (CT) backup archives stored on a Proxmox Backup Server as well as live restore of VM backup archives located on a Proxmox Backup Server. Version 6.4 also comes with Ceph Octopus 15.2.11 and Ceph Nautilus 14.2.20, many enhancements to KVM/QEMU, and notable bug fixes. Many new Ceph-specific management features have been added to the GUI. We have improved the integration of the placement group (PG) auto-scaler, and you can configure Target Size or Target Ratio settings in the GUI. The new version is based on Debian Buster 10.9, but using a newer, long-term supported Linux kernel 5.4. Optionally, the 5.11 kernel can be installed, providing support for the latest hardware. The latest versions of QEMU 5.2, LXC 4.0, and OpenZFS 2.0.4 have been included. There are some notable bug fixes and smaller improvements, see the full release notes. Release notes https://pve.proxmox.com/wiki/Roadmap Press release https://www.proxmox.com/en/news/press-releases/proxmox-virtual-environment-6-4-available Video tutorial https://www.proxmox.com/en/training/video-tutorials/item/what-s-new-in-proxmox-ve-6-4 Download https://www.proxmox.com/en/downloads Alternate ISO download: http://download.proxmox.com/iso Documentation https://pve.proxmox.com/pve-docs Community Forum https://forum.proxmox.com Source Code https://git.proxmox.com Bugtracker https://bugzilla.proxmox.com FAQ Q: Can I dist-upgrade Proxmox VE 6.x to 6.4 with apt? A: Yes, just via GUI or via CLI with apt update && apt dist-upgrade Q: Can I install Proxmox VE 6.4 on top of Debian Buster? A: Yes, see https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster Q: Can I upgrade my Proxmox VE 5.4 cluster with Ceph Luminous to 6.x and higher with Ceph Nautilus and even Ceph Octopus? A: This is a three step process. First, you have to upgrade Proxmox VE from 5.4 to 6.4, and afterwards upgrade Ceph from Luminous to Nautilus. There are a lot of improvements and changes, please follow exactly the upgrade documentation. 
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0 https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Finally, do the upgrade to Ceph Octopus - https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus Q: Where can I get more information about feature updates? A: Check our roadmap, forum, mailing lists, and subscribe to our newsletter. A big THANK YOU to our active community for all your feedback, testing, bug reporting and patch submitting! -- Best Regards, Martin Maurer Proxmox VE project leader From daniel at firewall-services.com Wed Apr 28 14:14:10 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:14:10 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Message-ID: <371322183.4890.1619612050577.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud a ?crit : > Hi. > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > VLAN tag, and not VLAN-aware themself). > I see several issues regarding SDN since the upgrade : > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > the status "available"). The other two do not appear anymore. Without paying > attention, I clicked on the "Apply" button. This wiped the > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > and reloaded their network stack. Needless to say it was a complete failure as > all the VM attached to one of those vnets lost network connectivity. I've > manually copied this /etc/network/interfaces.d/sdn file from the only working > node to the other two for now, but I can't make any change from the GUI now or > it'll do the same again > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > were displayed at all. But the Vnets correctly showed they were attached to my > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > it again from the GUI, which seemed to work. The only change it made to > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone Also, I have a lot of errors like this now : Apr 28 13:14:16 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:25 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:35 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:46 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:55 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. On all the 3 nodes (even the one which still appears in the SDN Status page on the GUI) -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? 
de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From daniel at firewall-services.com Wed Apr 28 14:10:19 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:10:19 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 Message-ID: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Hi. Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN feature with a single VLAN zone, and a few vnets (each one of the vnets using a VLAN tag, and not VLAN-aware themself). I see several issues regarding SDN since the upgrade : * The biggest issue is that in Datacenter -> SDN I only see a single node (with the status "available"). The other two do not appear anymore. Without paying attention, I clicked on the "Apply" button. This wiped the /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, and reloaded their network stack. Needless to say it was a complete failure as all the VM attached to one of those vnets lost network connectivity. I've manually copied this /etc/network/interfaces.d/sdn file from the only working node to the other two for now, but I can't make any change from the GUI now or it'll do the same again * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone were displayed at all. But the Vnets correctly showed they were attached to my zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding it again from the GUI, which seemed to work. The only change it made to /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone Anyone know what could be wrong ? Why would 2 (out of 3) nodes not showing up in the SDN status anymore ? The 3 nodes are fully up to date using the no-subscription repo, here's the complete pveversion : pve-manager/6.4-4/337d6701 (running kernel: 5.4.106-1-pve) root at pvo5:~# pveversion -v proxmox-ve: 6.4-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.4-4 (running version: 6.4-4/337d6701) pve-kernel-5.4: 6.4-1 pve-kernel-helper: 6.4-1 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.4.73-1-pve: 5.4.73-1 ceph-fuse: 12.2.11+dfsg1-2.1+b1 corosync: 3.1.2-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.4-1 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.4-2 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.2-1 libpve-network-perl: 0.5-1 libpve-storage-perl: 6.4-1 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 openvswitch-switch: 2.12.3-1 proxmox-backup-client: 1.1.5-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.5-3 pve-cluster: 6.4-1 pve-container: 3.3-5 pve-docs: 6.4-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-6 pve-xtermjs: 4.7.0-3 qemu-server: 6.4-1 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 root at pvo5:~# -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? 
de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From daniel at firewall-services.com Wed Apr 28 14:39:38 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:39:38 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Message-ID: <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud daniel at firewall-services.com a ?crit : > Hi. > > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > VLAN tag, and not VLAN-aware themself). > I see several issues regarding SDN since the upgrade : > > > > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > the status "available"). The other two do not appear anymore. Without paying > attention, I clicked on the "Apply" button. This wiped the > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > and reloaded their network stack. Needless to say it was a complete failure as > all the VM attached to one of those vnets lost network connectivity. I've > manually copied this /etc/network/interfaces.d/sdn file from the only working > node to the other two for now, but I can't make any change from the GUI now or > it'll do the same again > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > were displayed at all. But the Vnets correctly showed they were attached to my > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > it again from the GUI, which seemed to work. The only change it made to > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone > > I just opened https://bugzilla.proxmox.com/show_bug.cgi?id=3403 I checked another single node (no cluster) PVE install, which have the exact same issue, so it's not something specific on my setup, but a more general (and critical) bug Cheers, Daniel -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From gilberto.nunes32 at gmail.com Wed Apr 28 14:50:53 2021 From: gilberto.nunes32 at gmail.com (Gilberto Ferreira) Date: Wed, 28 Apr 2021 09:50:53 -0300 Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> Message-ID: Just curious: did you restart the upgraded nodes? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em qua., 28 de abr. de 2021 ?s 09:39, Daniel Berteaud escreveu: > > ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud daniel at firewall-services.com a ?crit : > > > Hi. > > > > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > > VLAN tag, and not VLAN-aware themself). > > I see several issues regarding SDN since the upgrade : > > > > > > > > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > > the status "available"). 
The other two do not appear anymore. Without paying > > attention, I clicked on the "Apply" button. This wiped the > > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > > and reloaded their network stack. Needless to say it was a complete failure as > > all the VM attached to one of those vnets lost network connectivity. I've > > manually copied this /etc/network/interfaces.d/sdn file from the only working > > node to the other two for now, but I can't make any change from the GUI now or > > it'll do the same again > > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > > were displayed at all. But the Vnets correctly showed they were attached to my > > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > > it again from the GUI, which seemed to work. The only change it made to > > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone > > > > > > I just opened https://bugzilla.proxmox.com/show_bug.cgi?id=3403 > I checked another single node (no cluster) PVE install, which have the exact same issue, so it's not something specific on my setup, but a more general (and critical) bug > > Cheers, > Daniel > > > -- > [ https://www.firewall-services.com/ ] > Daniel Berteaud > FIREWALL-SERVICES SAS, La s?curit? des r?seaux > Soci?t? de Services en Logiciels Libres > T?l : +33.5 56 64 15 32 > Matrix: @dani:fws.fr > [ https://www.firewall-services.com/ | https://www.firewall-services.com ] > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Wed Apr 28 14:52:52 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 28 Apr 2021 22:52:52 +1000 Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Message-ID: <08d5428a-b4dc-eec8-cd1a-8e0e66acfba8@gmail.com> Upgraded from 6.3 with no problems on my single node, non-ceph home server (CT only). Kernel update, so needed a reboot. On 28/04/2021 7:56 pm, Martin Maurer wrote: > This version brings unified single-file restore for virtual machine > (VM) and container (CT) backup archives stored on a Proxmox Backup Server Thats amazing, tested and works as advertised. Could see this being very useful. > as well as live restore of VM backup archives located on a Proxmox > Backup Server. How on earth do you do that? are you retrieving disk sectors on the fly as needed from the backup server? Great release! Will probably upgrade our PX/Ceph cluster at work over the weekend. -- Lindsay From daniel at firewall-services.com Wed Apr 28 14:57:27 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:57:27 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> Message-ID: <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:50, Gilberto Ferreira gilberto.nunes32 at gmail.com a ?crit : > Just curious: did you restart the upgraded nodes? Yes, of course ;-) -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? 
de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From gilberto.nunes32 at gmail.com Wed Apr 28 15:04:47 2021 From: gilberto.nunes32 at gmail.com (Gilberto Ferreira) Date: Wed, 28 Apr 2021 10:04:47 -0300 Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> Message-ID: Ok! Just checking. ? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em qua., 28 de abr. de 2021 ?s 09:58, Daniel Berteaud < daniel at firewall-services.com> escreveu: > ----- Le 28 Avr 21, ? 14:50, Gilberto Ferreira gilberto.nunes32 at gmail.com > a ?crit : > > > Just curious: did you restart the upgraded nodes? > > Yes, of course ;-) > > -- > [ https://www.firewall-services.com/ ] > Daniel Berteaud > FIREWALL-SERVICES SAS, La s?curit? des r?seaux > Soci?t? de Services en Logiciels Libres > T?l : +33.5 56 64 15 32 > Matrix: @dani:fws.fr > [ https://www.firewall-services.com/ | https://www.firewall-services.com ] > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From f.cuseo at panservice.it Wed Apr 28 16:49:12 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 16:49:12 +0200 (CEST) Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Message-ID: <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> Great ! Regarding file restore, are you planning to support restore from LVM filesystem ? Regards, Fabrizio ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: > Hi all, > > We are proud to announce the general availability of Proxmox Virtual Environment > 6.4, our open-source virtualization platform. This version brings unified > single-file restore for virtual machine (VM) and container (CT) backup archives > stored on a Proxmox Backup Server as well as live restore of VM backup archives > -- --- Fabrizio Cuseo - mailto:f.cuseo at panservice.it Direzione Generale - Panservice InterNetWorking Servizi Professionali per Internet ed il Networking Panservice e' associata AIIP - RIPE Local Registry Phone: +39 0773 410020 - Fax: +39 0773 470219 http://www.panservice.it mailto:info at panservice.it Numero verde nazionale: 800 901492 From f.cuseo at panservice.it Wed Apr 28 17:08:28 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 17:08:28 +0200 (CEST) Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> Message-ID: <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> Wonderful ! And what about NTFS filesystems ? ----- Il 28-apr-21, alle 17:02, Stefan Reiter s.reiter at proxmox.com ha scritto: > On 28/04/2021 16:49, Fabrizio Cuseo wrote: >> Great ! >> Regarding file restore, are you planning to support restore from LVM filesystem >> ? 
>>
>
> Yes, we plan on adding support for LVM, ZFS and mdraid in the future.
> For now only filesystems directly on partitions are supported.
>
> ~ Stefan
>
>> Regards, Fabrizio
>>
>> ----- On 28 Apr 21, at 11:56, Martin Maurer martin at proxmox.com wrote:
>>
>>> Hi all,
>>>
>>> We are proud to announce the general availability of Proxmox Virtual Environment
>>> 6.4, our open-source virtualization platform. This version brings unified
>>> single-file restore for virtual machine (VM) and container (CT) backup archives
>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives

--
---
Fabrizio Cuseo - mailto:f.cuseo at panservice.it
Direzione Generale - Panservice InterNetWorking
Servizi Professionali per Internet ed il Networking
Panservice e' associata AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it mailto:info at panservice.it
Numero verde nazionale: 800 901492

From s.reiter at proxmox.com Wed Apr 28 17:02:04 2021
From: s.reiter at proxmox.com (Stefan Reiter)
Date: Wed, 28 Apr 2021 17:02:04 +0200
Subject: [PVE-User] Proxmox VE 6.4 released
In-Reply-To: <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it>
References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com>
 <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it>
Message-ID: <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com>

On 28/04/2021 16:49, Fabrizio Cuseo wrote:
> Great!
> Regarding file restore, are you planning to support restore from LVM filesystems?
>

Yes, we plan on adding support for LVM, ZFS and mdraid in the future.
For now only filesystems directly on partitions are supported.

~ Stefan

> Regards, Fabrizio
>
> ----- On 28 Apr 21, at 11:56, Martin Maurer martin at proxmox.com wrote:
>
>> Hi all,
>>
>> We are proud to announce the general availability of Proxmox Virtual Environment
>> 6.4, our open-source virtualization platform. This version brings unified
>> single-file restore for virtual machine (VM) and container (CT) backup archives
>> stored on a Proxmox Backup Server as well as live restore of VM backup archives
>>

From s.reiter at proxmox.com Wed Apr 28 17:11:03 2021
From: s.reiter at proxmox.com (Stefan Reiter)
Date: Wed, 28 Apr 2021 17:11:03 +0200
Subject: [PVE-User] Proxmox VE 6.4 released
In-Reply-To: <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it>
References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com>
 <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it>
 <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com>
Message-ID: 

On 28/04/2021 17:08, Fabrizio Cuseo wrote:
> Wonderful!
> And what about NTFS filesystems?
>

NTFS is supported already, file restore from Windows guests should be
possible.

> ----- On 28 Apr 21, at 17:02, Stefan Reiter s.reiter at proxmox.com wrote:
>
>> On 28/04/2021 16:49, Fabrizio Cuseo wrote:
>>> Great!
>>> Regarding file restore, are you planning to support restore from LVM
>>> filesystems?
>>>
>>
>> Yes, we plan on adding support for LVM, ZFS and mdraid in the future.
>> For now only filesystems directly on partitions are supported.
>>
>> ~ Stefan
>>
>>> Regards, Fabrizio
>>>
>>> ----- On 28 Apr 21, at 11:56, Martin Maurer martin at proxmox.com wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are proud to announce the general availability of Proxmox Virtual Environment
>>>> 6.4, our open-source virtualization platform.
>>>> This version brings unified single-file restore for virtual machine (VM)
>>>> and container (CT) backup archives stored on a Proxmox Backup Server as
>>>> well as live restore of VM backup archives
>

From f.cuseo at panservice.it Wed Apr 28 17:15:56 2021
From: f.cuseo at panservice.it (Fabrizio Cuseo)
Date: Wed, 28 Apr 2021 17:15:56 +0200 (CEST)
Subject: [PVE-User] Proxmox VE 6.4 released
In-Reply-To: 
References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com>
 <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it>
 <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com>
 <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it>
Message-ID: <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it>

I am trying with a Windows 7 guest, but I have this error:
"proxmox-file-restore failed: Error: given image 'drive-virtio0.img.fidx' not found (500)"

----- On 28 Apr 21, at 17:11, Stefan Reiter s.reiter at proxmox.com wrote:

> On 28/04/2021 17:08, Fabrizio Cuseo wrote:
>> Wonderful!
>> And what about NTFS filesystems?
>>
>
> NTFS is supported already, file restore from Windows guests should be
> possible.
>
>> ----- On 28 Apr 21, at 17:02, Stefan Reiter s.reiter at proxmox.com wrote:
>>
>>> On 28/04/2021 16:49, Fabrizio Cuseo wrote:
>>>> Great!
>>>> Regarding file restore, are you planning to support restore from LVM
>>>> filesystems?
>>>>
>>>
>>> Yes, we plan on adding support for LVM, ZFS and mdraid in the future.
>>> For now only filesystems directly on partitions are supported.
>>>
>>> ~ Stefan
>>>
>>>> Regards, Fabrizio
>>>>
>>>> ----- On 28 Apr 21, at 11:56, Martin Maurer martin at proxmox.com wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are proud to announce the general availability of Proxmox Virtual Environment
>>>>> 6.4, our open-source virtualization platform. This version brings unified
>>>>> single-file restore for virtual machine (VM) and container (CT) backup archives
>>>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives

--
---
Fabrizio Cuseo - mailto:f.cuseo at panservice.it
Direzione Generale - Panservice InterNetWorking
Servizi Professionali per Internet ed il Networking
Panservice e' associata AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it mailto:info at panservice.it
Numero verde nazionale: 800 901492

From s.reiter at proxmox.com Wed Apr 28 17:24:05 2021
From: s.reiter at proxmox.com (Stefan Reiter)
Date: Wed, 28 Apr 2021 17:24:05 +0200
Subject: [PVE-User] Proxmox VE 6.4 released
In-Reply-To: <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it>
References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com>
 <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it>
 <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com>
 <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it>
 <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it>
Message-ID: <877fc689-fe7b-72df-fce0-9849b5d4870e@proxmox.com>

On 28/04/2021 17:15, Fabrizio Cuseo wrote:
>
> I am trying with a Windows 7 guest, but I have this error:
> "proxmox-file-restore failed: Error: given image 'drive-virtio0.img.fidx' not found (500)"
>

Ah, that's an unrelated issue with virtio drives; it should be fixed by
'proxmox-backup-file-restore 1.1.5-2', currently in pvetest. We found that
issue a bit too late for the release, unfortunately.

See: https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=606828cc65feb380c0f9536fe7ca277ea1dc20c1
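For anyone who would rather script this than click through the GUI, the same
restore path can also be driven with the proxmox-file-restore tool from that
package. The lines below are only a rough sketch: the repository, snapshot and
archive/path names are made up, and the exact argument layout should be
double-checked against the tool's built-in help before relying on it:

  # Example repository and credentials only - adjust to your own PBS
  export PBS_REPOSITORY='restore@pbs@192.0.2.10:store1'
  export PBS_PASSWORD='secret'

  # List the top level of a VM backup snapshot; the entries are the archives
  # inside it (e.g. drive-scsi0.img.fidx), which can then be descended into
  proxmox-file-restore list "vm/105/2021-04-28T01:00:00Z" /

  # Extract a single path from inside a disk archive to a local directory
  # (the part/1/... path is illustrative - use whatever "list" actually shows)
  proxmox-file-restore extract "vm/105/2021-04-28T01:00:00Z" \
      "drive-scsi0.img.fidx/part/1/etc/hosts" /tmp/restored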
> ----- On 28 Apr 21, at 17:11, Stefan Reiter s.reiter at proxmox.com wrote:
>
>> On 28/04/2021 17:08, Fabrizio Cuseo wrote:
>>> Wonderful!
>>> And what about NTFS filesystems?
>>>
>>
>> NTFS is supported already, file restore from Windows guests should be
>> possible.
>>
>>> ----- On 28 Apr 21, at 17:02, Stefan Reiter s.reiter at proxmox.com wrote:
>>>
>>>> On 28/04/2021 16:49, Fabrizio Cuseo wrote:
>>>>> Great!
>>>>> Regarding file restore, are you planning to support restore from LVM
>>>>> filesystems?
>>>>>
>>>>
>>>> Yes, we plan on adding support for LVM, ZFS and mdraid in the future.
>>>> For now only filesystems directly on partitions are supported.
>>>>
>>>> ~ Stefan
>>>>
>>>>> Regards, Fabrizio
>>>>>
>>>>> ----- On 28 Apr 21, at 11:56, Martin Maurer martin at proxmox.com wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We are proud to announce the general availability of Proxmox Virtual Environment
>>>>>> 6.4, our open-source virtualization platform. This version brings unified
>>>>>> single-file restore for virtual machine (VM) and container (CT) backup archives
>>>>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives

From me at marcobertorello.it Wed Apr 28 17:34:14 2021
From: me at marcobertorello.it (Bertorello, Marco)
Date: Wed, 28 Apr 2021 17:34:14 +0200
Subject: [PVE-User] Replication blocked issue
Message-ID: <11f33c2d-472d-d2c8-d3e4-c5e4a99900e4@marcobertorello.it>

Dear PVE users,

I have a 3-node cluster with ZFS storage. Every node uses its own storage and
the VMs/LXCs are replicated across the other nodes every 10 minutes.

Sometimes a replication job keeps running without ever finishing. For example,
at the moment I have a replication that started yesterday:

2021-04-27 07:20:01 101-1: start replication job
2021-04-27 07:20:01 101-1: guest => CT 101, running => 1
2021-04-27 07:20:01 101-1: volumes => DS1:subvol-101-disk-1
2021-04-27 07:20:02 101-1: freeze guest filesystem
2021-04-27 07:20:05 101-1: create snapshot '__replicate_101-1_1619500801__' on DS1:subvol-101-disk-1
2021-04-27 07:20:06 101-1: thaw guest filesystem
2021-04-27 07:20:06 101-1: using secure transmission, rate limit: none
2021-04-27 07:20:06 101-1: incremental sync 'DS1:subvol-101-disk-1' (__replicate_101-1_1619500201__ => __replicate_101-1_1619500801__)
2021-04-27 07:20:08 101-1: send from @__replicate_101-1_1619500201__ to zp1/subvol-101-disk-1@__replicate_101-0_1619500211__ estimated size is 213K
2021-04-27 07:20:08 101-1: send from @__replicate_101-0_1619500211__ to zp1/subvol-101-disk-1@__replicate_101-1_1619500801__ estimated size is 26.1M
2021-04-27 07:20:08 101-1: total estimated size is 26.4M
2021-04-27 07:20:09 101-1: TIME        SENT   SNAPSHOT zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-27 07:20:09 101-1: 07:20:09   3.18M   zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
[...]
2021-04-28 17:27:25 101-1: 17:27:25   3.18M   zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-28 17:27:26 101-1: 17:27:26   3.18M   zp1/subvol-101-disk-1@__replicate_101-1_1619500801__
2021-04-28 17:27:27 101-1: 17:27:27   3.18M   zp1/subvol-101-disk-1@__replicate_101-1_1619500801__

As you can see, there is no progress in this time window: still only 3.18M
transferred.
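For reference, this is roughly what can be checked on both nodes while such a
job hangs, before resorting to a reboot (the job ID and dataset names are the
ones from the log above; adjust them to the actual setup):

  # On the source node: state and last sync time of all replication jobs
  pvesr status

  # On both nodes: find the hung zfs send/receive processes and their state
  ps axo pid,stat,wchan:30,args | grep -E 'zfs (send|recv|receive)' | grep -v grep

  # On the destination node: see which replication snapshots have arrived
  zfs list -t snapshot -r zp1/subvol-101-disk-1

A process in D state is stuck in uninterruptible I/O inside the kernel, so the
interesting question is usually what the destination pool is doing at that
moment (zpool status and zpool iostat can help there).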
There are two big problems with this:

1) the blocked replica prevents the other replications scheduled on the source
node from running until this replication ends or fails;
2) I have no other solution but to reboot the destination node to get out of
this situation. I tried to kill the process on the destination node, but the
process is in D state and cannot be killed.

Is there a way to get out of this scenario without rebooting nodes?

Thanks a lot and best regards,

--
Marco Bertorello
https://www.marcobertorello.it

From leesteken at protonmail.ch Wed Apr 28 20:48:24 2021
From: leesteken at protonmail.ch (Arjen)
Date: Wed, 28 Apr 2021 18:48:24 +0000
Subject: [PVE-User] Multi Data Center Cluster or Not
In-Reply-To: 
References: 
Message-ID: 

On Tuesday, April 27th, 2021 at 20:38, JR Richardson wrote:

> Hi All,
>
> I'm looking for suggestions for geo-diversity using PROXMOX
> Clustering. I understand running hypervisors in the same cluster in
> multiple data centers is possible with high capacity/low latency
> inter-site links. What I'm learning is there could be better ways,
> like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS
> is interesting but would require manually restoring nodes should a
> failure occur.
>
> I'm looking for best practice or suggestions in topology that folks
> are using successfully or even tales of failure for what to avoid.

I haven't actually done this, so feel free to ignore this or inform me of
problems with this approach:

Set up multiple Proxmox systems/clusters, each in a separate data center, but
don't cluster them over the data centers.
Set up a VPN that allows Proxmox and VMs in each data center to connect to the
others. It does not need low latency.
Have a PBS VM on each of them, back up your VMs (many times a day, if you
want) to the local PBS, and sync all the PBSs.
Distribute the VMs manually over the different systems, so that the users have
the lowest latency.
Leave room for more VMs; this makes them operate more smoothly and would allow
taking over load from other systems.
If a data center becomes unusable, restore the VMs that were running there on
the other systems manually.

In case of problems, nothing will be automated and you'll lose work since the
most recent available backup, but at least you know that you have several
other working Proxmox systems/clusters up and running and capable of restoring
and running the affected VMs.
The syncing of backups only depends on changes in the set of deduplicated
chunks and does not need low latency or high speed.
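The PBS-to-PBS syncing is configured on the pulling side as a remote plus a
sync job. Roughly like this - the host names, IDs and option spellings here
are only illustrative, so check proxmox-backup-manager's built-in help for the
exact parameters:

  # On the PBS that should pull the backups: define the other PBS as a remote
  proxmox-backup-manager remote create site-b \
      --host pbs-b.example.com \
      --auth-id sync@pbs \
      --fingerprint '<certificate fingerprint of site-b>' \
      --password 'secret'

  # Then a scheduled sync job that pulls a remote datastore into a local one
  proxmox-backup-manager sync-job create sync-from-b \
      --store local-store \
      --remote site-b \
      --remote-store store-b \
      --schedule hourly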
kind regards, Arjen

From f.cuseo at panservice.it Wed Apr 28 21:06:23 2021
From: f.cuseo at panservice.it (Fabrizio Cuseo)
Date: Wed, 28 Apr 2021 21:06:23 +0200 (CEST)
Subject: [PVE-User] Multi Data Center Cluster or Not
In-Reply-To: 
References: 
Message-ID: <814551923.94902.1619636783033.JavaMail.zimbra@zimbra.panservice.it>

I have not read it all carefully, but have a look at this article:
https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring

----- On 28 Apr 21, at 20:48, pve-user pve-user at lists.proxmox.com wrote:

> _______________________________________________
> pve-user mailing list
> pve-user at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user

--
---
Fabrizio Cuseo - mailto:f.cuseo at panservice.it
Direzione Generale - Panservice InterNetWorking
Servizi Professionali per Internet ed il Networking
Panservice e' associata AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it mailto:info at panservice.it
Numero verde nazionale: 800 901492

From alex at calicolabs.com Thu Apr 29 20:04:59 2021
From: alex at calicolabs.com (Alex Chekholko)
Date: Thu, 29 Apr 2021 11:04:59 -0700
Subject: [PVE-User] Multi Data Center Cluster or Not
In-Reply-To: 
References: 
Message-ID: 

Yes, this is the way I do it: each Proxmox cluster is independent in its own
location, but they all have access to NFS mounts or a PBS server where I can
dump the vzdump images. It is not too "HA", but you can back up/restore VMs
from one place to another or spin up yesterday's version. It is sufficient
for our use cases and could be good enough for your DR.

On Wed, Apr 28, 2021 at 11:49 AM Arjen via pve-user <pve-user at lists.proxmox.com> wrote:

>
>
> ---------- Forwarded message ----------
> From: Arjen
> To: Proxmox VE user list
> Cc:
> Bcc:
> Date: Wed, 28 Apr 2021 18:48:24 +0000
> Subject: Re: [PVE-User] Multi Data Center Cluster or Not
> On Tuesday, April 27th, 2021 at 20:38, JR Richardson
> <jmr.richardson at gmail.com> wrote:
>
> > Hi All,
> >
> > I'm looking for suggestions for geo-diversity using PROXMOX
> > Clustering. I understand running hypervisors in the same cluster in
> > multiple data centers is possible with high capacity/low latency
> > inter-site links. What I'm learning is there could be better ways,
> > like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS
> > is interesting but would require manually restoring nodes should a
> > failure occur.
> >
> > I'm looking for best practice or suggestions in topology that folks
> > are using successfully or even tales of failure for what to avoid.
>
> I haven't actually done this, so feel free to ignore this or inform me of
> problems with this approach:
>
> Set up multiple Proxmox systems/clusters, each in a separate data center,
> but don't cluster them over the data centers.
> Set up a VPN that allows Proxmox and VMs in each data center to connect to
> the others. It does not need low latency.
> Have a PBS VM on each of them, back up your VMs (many times a day, if
> you want) to the local PBS, and sync all the PBSs.
> Distribute the VMs manually over the different systems, so that the users
> have the lowest latency.
> Leave room for more VMs; this makes them operate more smoothly and would
> allow taking over load from other systems.
> If a data center becomes unusable, restore the VMs that were running there
> on the other systems manually.
>
> In case of problems, nothing will be automated and you'll lose work since
> the most recent available backup, but at least you know that you have
> several other working Proxmox systems/clusters up and running and capable
> of restoring and running the affected VMs.
> The syncing of backups only depends on changes in the set of deduplicated
> chunks and does not need low latency or high speed.
>
> kind regards, Arjen
>
>
> ---------- Forwarded message ----------
> From: Arjen via pve-user
> To: Proxmox VE user list
> Cc: Arjen
> Bcc:
> Date: Wed, 28 Apr 2021 18:48:24 +0000
> Subject: Re: [PVE-User] Multi Data Center Cluster or Not
> _______________________________________________
> pve-user mailing list
> pve-user at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

From devzero at web.de Fri Apr 30 12:42:25 2021
From: devzero at web.de (Roland)
Date: Fri, 30 Apr 2021 12:42:25 +0200
Subject: [PVE-User] pbs prune from commandline ?
Message-ID: <3538b9a7-3b60-f603-6b66-c694fb5b225c@web.de>

hello,

isn't there a commandline equivalent of the Proxmox Backup Server side prune?
(i.e. PBS -> Datastore -> Prune & GC -> Prune Schedule)

how can I trigger a prune from the commandline on the PBS side, like I can do
GC and verify with proxmox-backup-manager? I only find a prune option with
proxmox-backup-client.

that one should be equivalent to: PVE -> Storage -> pbs-ds -> Backup Retention
tab, i.e. it's the prune definition on the client side.

shouldn't there be a prune option in proxmox-backup-manager, too!?

regards
roland
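For comparison, the client-side prune referred to above looks roughly like
this (the repository, backup group and keep counts are placeholders only; the
built-in help of proxmox-backup-client lists the real options):

  # Dry run first: show which snapshots of a group would be kept or removed
  proxmox-backup-client prune ct/101 \
      --repository 'backup@pbs@192.0.2.10:store1' \
      --keep-last 3 --keep-daily 7 --keep-weekly 4 \
      --dry-run

  # Drop --dry-run to actually remove the snapshots; the space itself is only
  # reclaimed later by garbage collection on the server side, e.g.:
  proxmox-backup-manager garbage-collection start store1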