From lindsay.mathieson at gmail.com Thu Apr 1 03:29:33 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 1 Apr 2021 11:29:33 +1000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> Message-ID: <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> On 31/03/2021 3:20 pm, Arjen via pve-user wrote: > If you run PBS somewhere, you can use a folder on the NAS as a Datastore. You could duplicate that folder to the external drive, as a backup copy of that Datastore and keep it off-site. I'll look into that, though I'd be concerned regards file locking and sync to disk issues. Worth testing out though. > You would need a PBS to read from the external drive in case of a on-site disaster. Something like that might be simlar to what you do now, except that you need to run a PBS somewhere on-site (possibly in a VM or CT). Actually running it in a VM now for testing! -- Lindsay From leesteken at protonmail.ch Thu Apr 1 08:24:45 2021 From: leesteken at protonmail.ch (Arjen) Date: Thu, 01 Apr 2021 06:24:45 +0000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> Message-ID: On Thursday, April 1st, 2021 at 03:29, Lindsay Mathieson wrote: > On 31/03/2021 3:20 pm, Arjen via pve-user wrote: > > > If you run PBS somewhere, you can use a folder on the NAS as a Datastore. You could duplicate that folder to the external drive, as a backup copy of that Datastore and keep it off-site. > > I'll look into that, though I'd be concerned regards file locking and > > sync to disk issues. Worth testing out though. Probably best to gracefully shutdown PBS before copying the disk. Or do a copy (best effort, ignoring failures) first, then shutdown PBS and rsync all remaining differences to reduce the down time on the PBS. Maybe someone more knowledgeable of PBS can tell us how to copy a Datastore to another disk? > > You would need a PBS to read from the external drive in case of a on-site disaster. Something like that might be simlar to what you do now, except that you need to run a PBS somewhere on-site (possibly in a VM or CT). > > Actually running it in a VM now for testing! If the Datastore is on a virtual disk, you could maybe use vzdump to backup that disk or even the whole VM to an external drive? Or first backup the whole PBS VM to the NAS and then sync to external disk, much like you do now? kind regards, Arjen From lindsay.mathieson at gmail.com Thu Apr 1 11:32:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 1 Apr 2021 19:32:50 +1000 Subject: [PVE-User] Offsite Backups with PBS using External swapable Drives? In-Reply-To: References: <658103a5-e20c-a26a-de58-95c75e3e745d@gmail.com> <764e7ad0-ff41-50a5-1ccf-7e626de45d4b@gmail.com> Message-ID: On 1/04/2021 4:24 pm, Arjen via pve-user wrote: > Or first backup the whole PBS VM to the NAS and then sync to external disk, much like you do now? That would certainly work Cheers, -- Lindsay From leandro at tecnetmza.com.ar Thu Apr 1 18:58:54 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Thu, 1 Apr 2021 13:58:54 -0300 Subject: [PVE-User] mi first cluster Message-ID: Hi guys : Im preparing my second proxmox box to create my first cluster. IT does not contain any usefull data yet , so I can play aorund a little bit. Have some questions. Is it possible to create a cluster with only two boxes ? I understood it is possible , but can not move VMs without service disruption. And if I want to move VMs without services disruption I need a third box. What is ceph storage for ? what is ZFS pool for ? In case I need I can add it on any of my available bays o any box ? Do pve boxes need to be on same network subnet ? I have two datacenters , I was thinking in the idea to install them on each location just for redundancy. Lets supose network connection performance is optimal ... Is there any problem with that ? Any other clustering good practice advice would be wellcome. Thanks. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From d.csapak at proxmox.com Fri Apr 2 11:28:29 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Fri, 2 Apr 2021 11:28:29 +0200 Subject: [PVE-User] mi first cluster In-Reply-To: References: Message-ID: On 4/1/21 18:58, Leandro Roggerone wrote: > Hi guys : Hi, > Im preparing my second proxmox box to create my first cluster. > IT does not contain any usefull data yet , so I can play aorund a little > bit. > Have some questions. > Is it possible to create a cluster with only two boxes ? yes > I understood it is possible , but can not move VMs without service > disruption. if you mean HA, yes you need at least 3 nodes or 2 nodes + a quorum device, also the storage needs to be avaiable on both nodes live migration still works even with 2 nodes, and even with local storage > And if I want to move VMs without services disruption I need a third box. > > What is ceph storage for ? > what is ZFS pool for ? what do you mean? ceph is distributed, redundant, software defined storage zfs is a type of local filesystem and disk management > In case I need I can add it on any of my available bays o any box ? > > Do pve boxes need to be on same network subnet ? generally no, i think, but it makes it easier > I have two datacenters , I was thinking in the idea to install them on each > location just for redundancy. > Lets supose network connection performance is optimal ... Is there any > problem with that ? i'd advise against that. our clustering stack uses corosync, which needs low latency links to work properly > Any other clustering good practice advice would be wellcome. > Thanks. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From d.csapak at proxmox.com Fri Apr 2 11:28:29 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Fri, 2 Apr 2021 11:28:29 +0200 Subject: [PVE-User] mi first cluster In-Reply-To: References: Message-ID: On 4/1/21 18:58, Leandro Roggerone wrote: > Hi guys : Hi, > Im preparing my second proxmox box to create my first cluster. > IT does not contain any usefull data yet , so I can play aorund a little > bit. > Have some questions. > Is it possible to create a cluster with only two boxes ? yes > I understood it is possible , but can not move VMs without service > disruption. if you mean HA, yes you need at least 3 nodes or 2 nodes + a quorum device, also the storage needs to be avaiable on both nodes live migration still works even with 2 nodes, and even with local storage > And if I want to move VMs without services disruption I need a third box. > > What is ceph storage for ? > what is ZFS pool for ? what do you mean? ceph is distributed, redundant, software defined storage zfs is a type of local filesystem and disk management > In case I need I can add it on any of my available bays o any box ? > > Do pve boxes need to be on same network subnet ? generally no, i think, but it makes it easier > I have two datacenters , I was thinking in the idea to install them on each > location just for redundancy. > Lets supose network connection performance is optimal ... Is there any > problem with that ? i'd advise against that. our clustering stack uses corosync, which needs low latency links to work properly > Any other clustering good practice advice would be wellcome. > Thanks. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From ouafnico at shivaserv.fr Fri Apr 2 14:43:14 2021 From: ouafnico at shivaserv.fr (Ouafnico) Date: Fri, 02 Apr 2021 14:43:14 +0200 Subject: vgs/vgscan sleep disks Message-ID: <2OSXQQ.7DLD2CWRGIPN3@shivaserv.fr> Hi, I'm using proxmox for a while on a personal home server. I'm using multiples disk devices with LVM. Actually, I saw pve is waking up all my disks every x seconds for the LVM vgs command with pvestatd. I'm not using all my lvm volumes groups for proxmox, some are for others needs, so I can't hide them on global_filters in LVM configuration. I'm still searching how to tell proxmox to do not vgscan all disks, but only on VG declared on pve. Is there any way to do so, or disable this check? Thanks From ouafnico at shivaserv.fr Fri Apr 2 16:18:31 2021 From: ouafnico at shivaserv.fr (Ouafnico) Date: Fri, 02 Apr 2021 16:18:31 +0200 Subject: [PVE-User] vgs/vgscan sleep disks In-Reply-To: References: Message-ID: I might have found something, if it can help anyone. vgs is scanning every /dev/sd* devices, and /dev/md* devices. My VG are on mdadm devices. I have added in lvm filter, /dev/sd* in reject, but accept for /dev/md* only. Maybe the mdadm cache is responding to vgs commands, but now I see only vgs seeks on /dev/md*, and it's not waking up all devices. Le ven. 2 avril 2021 ? 14:43, Ouafnico via pve-user a ?crit : > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > From stefan at fuhrmann.homedns.org Sat Apr 3 10:51:26 2021 From: stefan at fuhrmann.homedns.org (Stefan Fuhrmann) Date: Sat, 3 Apr 2021 10:51:26 +0200 Subject: [PVE-User] Windows sever 2016 vm with very high ram usage In-Reply-To: References: Message-ID: Ahoi, ?you have to installthe drivers: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers Stefan Am 29.03.21 um 15:37 schrieb Leandro Roggerone: > Hi guys , thanks for your words , have some feedback: > > Maybe Proxmox cannot look inside the VM for the actual memory usage because > the VirtIO balloon driver is not installed or active? Or maybe the other > 90% is in use as Windows file cache? > I think so ... > Dont know about windows file cache ... > > Have you installed the VirtIO drivers for Windows? Are you assigning too > many vCPUs or memory? Can you share the VM configuration file? Can you tell > us something about your Proxmox hardware and version? > No , I have not ... now you mentioned im reading about VirtIO , I will try > to install and let you know how it goes > Should install it no my pve box or directly inside windows vm ? > I'm assigning max vCPUs abailables (24 , 4 sockets 6 cores) and 32gb for > memory. > (I really don't know about any criteria to assign vcpus) > > This is VM config file: > > root at pve:~# cat /etc/pve/nodes/pve/qemu-server/107.conf > > bootdisk: ide0 > > cores: 6 > > ide0: local-lvm:vm-107-disk-0,size=150G > > ide1: local-lvm:vm-107-disk-1,size=350G > > ide2: > local:iso/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO,media=cdrom,size=6808810K > > memory: 32768 > > name: KAIKENII > > net0: e1000=1A:F1:10:BF:92:0A,bridge=vmbr3,firewall=1 > > numa: 0 > > ostype: win10 > > scsihw: virtio-scsi-pci > > smbios1: uuid=daf8f767-59c7-4e87-b3be-75d4a8020c38 > > sockets: 4 > > vmgenid: a7634624-1230-4a3e-9e7c-255d32ad2030 > > My PVE is: > CPU(s) 24 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz (2 Sockets) > Kernel Version Linux 5.0.15-1-pve #1 SMP PVE 5.0.15-1 (Wed, 03 Jul 2019 > 10:51:57 +0200) > PVE Manager Version pve-manager/6.0-4/2a719255 > Total Mem = 64GB. > > That's all. > Thanks > > > > > El vie, 26 mar 2021 a las 11:40, Arjen via pve-user (< > pve-user at lists.proxmox.com>) escribi?: > >> >> >> ---------- Forwarded message ---------- >> From: Arjen >> To: Proxmox VE user list >> Cc: >> Bcc: >> Date: Fri, 26 Mar 2021 14:39:13 +0000 >> Subject: Re: [PVE-User] Windows sever 2016 vm with very high ram usage >> On Friday, March 26th, 2021 at 15:28, Leandro Roggerone < >> leandro at tecnetmza.com.ar> wrote: >> >>> Hi guys , Just wanted to share this with you. >>> >>> After creating a VM for a windows sever 2016 with 32 GB ram I can >>> >>> see continuos high memory usage (about 99-100%). >>> >>> I have no running task , since it is a fresh server and from task manager >>> >>> can see a 10% of memory usage. >>> >>> Regarding those confusing differences, server performance is very bad. >> Maybe Proxmox cannot look inside the VM for the actual memory usage >> because the VirtIO balloon driver is not installed or active? Or maybe the >> other 90% is in use as Windows file cache? >> >>> User experience is very poor with a non fluent user interface. >>> >>> Is there something to do / check to improve this ? >> Have you installed the VirtIO drivers for Windows? Are you assigning too >> many vCPUs or memory? Can you share the VM configuration file? Can you tell >> us something about your Proxmox hardware and version? >> >>> Any advice would be welcome. >> >> Maybe search the forum for similar Windows performance questions? >> >> >> https://forum.proxmox.com/forums/proxmox-ve-installation-and-configuration.16/ >> >>> Thanks. >> best of luck, Arjen >> >> >> ---------- Forwarded message ---------- >> From: Arjen via pve-user >> To: Proxmox VE user list >> Cc: Arjen >> Bcc: >> Date: Fri, 26 Mar 2021 14:39:13 +0000 >> Subject: Re: [PVE-User] Windows sever 2016 vm with very high ram usage >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From hongyi.zhao at gmail.com Sun Apr 4 10:25:51 2021 From: hongyi.zhao at gmail.com (Hongyi Zhao) Date: Sun, 4 Apr 2021 16:25:51 +0800 Subject: [PVE-User] Change the private FQDN of pve node. Message-ID: Currently, I'm running only one pve node on one of my intranet machine which using the following hosts file configuration: --------- # cat /etc/hosts 127.0.0.1 localhost.localdomain localhost #https://pve.proxmox.com/wiki/Renaming_a_PVE_node 192.168.10.254 pve.lan pve # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts --------- For compatibility and scalability, I intend to switch to the following domain name: pve1.pve.lan For this purpose, I changed the following two files as below: /etc/hosts: 192.168.10.254 pve1.pve.lan pve1 /etc/hostname: pve1 I'm not sure if the above is enough. Any hints will be highly appreciated. Regards -- Assoc. Prof. Hongyi Zhao Theory and Simulation of Materials Hebei Polytechnic University of Science and Technology engineering NO. 552 North Gangtie Road, Xingtai, China From martin.konold at konsec.com Mon Apr 5 10:22:56 2021 From: martin.konold at konsec.com (Konold, Martin) Date: Mon, 05 Apr 2021 10:22:56 +0200 Subject: [PVE-User] ZFS Disk Usage unexpected high Message-ID: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Hi, I set up a single VM which currently used 15TB of data. /dev/sdb is technically a ZFS volume on the Proxmox Host. [root at vm ~]# df -h /data Filesystem Size Used Avail Use% Mounted on /dev/sdb 40T 15T 25T 37% /data [root at vm ~]# du -s /data/ 14874345100 /data/ [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 NAME USED AVAIL REFER MOUNTPOINT zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - Why is the usage on the host about twice as large as within the vm? (Yes, I have given fstrim and discard a try). -- Regards ppa. Martin Konold -- Martin Konold - Prokurist, CTO KONSEC GmbH -? make things real Amtsgericht Stuttgart, HRB 23690 Gesch?ftsf?hrer: Andreas Mack Im K?ller 3, 70794 Filderstadt, Germany From gianni.milo22 at gmail.com Mon Apr 5 11:54:40 2021 From: gianni.milo22 at gmail.com (Yanni M.) Date: Mon, 5 Apr 2021 10:54:40 +0100 Subject: [PVE-User] ZFS Disk Usage unexpected high In-Reply-To: <7e9d1358dcb12b288e884547553e39b4@konsec.com> References: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Message-ID: This is a common issue on raidz based pools. Assuming 4k sectors (ashift=12) and zvol with 8K volblocksize, each 8K (2-sector) block uses a single sector (4k) of parity. So 15TB of 8KB blocks (default volblocksize=8k) takes up at least 22.5TB space on disk (including parity). You will use less parity by increasing the volblock size (e.g. volblocksize=32k, or the default recordsize=128k for filesystems) in exchange of a possible lower performance. Another solution would be using a pool of striped mirrors (RAID10). This problem does not exist in such pools (as there are no parity blocks used). On Mon, 5 Apr 2021 at 09:23, Konold, Martin wrote: > > Hi, > > I set up a single VM which currently used 15TB of data. > /dev/sdb is technically a ZFS volume on the Proxmox Host. > > [root at vm ~]# df -h /data > Filesystem Size Used Avail Use% Mounted on > /dev/sdb 40T 15T 25T 37% /data > [root at vm ~]# du -s /data/ > 14874345100 /data/ > > [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 > NAME USED AVAIL REFER MOUNTPOINT > zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - > > Why is the usage on the host about twice as large as within the vm? > (Yes, I have given fstrim and discard a try). > > -- > Regards > ppa. Martin Konold > > -- > Martin Konold - Prokurist, CTO > KONSEC GmbH -? make things real > Amtsgericht Stuttgart, HRB 23690 > Gesch?ftsf?hrer: Andreas Mack > Im K?ller 3, 70794 Filderstadt, Germany > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From leandro at tecnetmza.com.ar Mon Apr 5 16:25:55 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 5 Apr 2021 11:25:55 -0300 Subject: [PVE-User] * this host already contains virtual guests on fresh box Message-ID: Hi guys, I was trying to create a cluster and add a node to it I have my main box (172.30.6.254) with a lot of vms and containers running there. I created a cluster in my main box. Then on my new fresh box (172.30.6.253) , tried to join to created cluster but got the message: * this host already contains virtual guests and can not continue. Why is that happening? my server is new and empty. Have been looking but can not fix it. Any suggestions? Thanks, Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From leandro at tecnetmza.com.ar Mon Apr 5 16:48:14 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 5 Apr 2021 11:48:14 -0300 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) Message-ID: Please dismiss my previous email , already solved After searching on the new server directory , it has some VMs info from main server created on a first join attempt. I removed those directories at /etc/pve/nodes/main_node and could succesfully add my new node to cluster. Thanks. From laurentfdumont at gmail.com Tue Apr 6 01:04:42 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:04:42 -0400 Subject: [PVE-User] * this host already contains virtual guests on fresh box In-Reply-To: References: Message-ID: Is the new box really fresh? No LXD container/VM? You should be able to join a fresh box to an existing cluster with VMs already running. On Mon, Apr 5, 2021 at 10:26 AM Leandro Roggerone wrote: > Hi guys, I was trying to create a cluster and add a node to it > I have my main box (172.30.6.254) with a lot of vms and containers running > there. > I created a cluster in my main box. > Then on my new fresh box (172.30.6.253) , tried to join to created cluster > but got the message: > * this host already contains virtual guests > and can not continue. > Why is that happening? my server is new and empty. > Have been looking but can not fix it. > Any suggestions? > Thanks, Leandro. > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > Libre > de virus. www.avast.com > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:05:11 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:05:11 -0400 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) In-Reply-To: References: Message-ID: Oups, replied to your other email but glad you found a fix :) On Mon, Apr 5, 2021 at 10:48 AM Leandro Roggerone wrote: > Please dismiss my previous email , already solved > After searching on the new server directory , it has some VMs info from > main server created on a first join attempt. > I removed those directories at /etc/pve/nodes/main_node > and could succesfully add my new node to cluster. > Thanks. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:04:42 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:04:42 -0400 Subject: [PVE-User] * this host already contains virtual guests on fresh box In-Reply-To: References: Message-ID: Is the new box really fresh? No LXD container/VM? You should be able to join a fresh box to an existing cluster with VMs already running. On Mon, Apr 5, 2021 at 10:26 AM Leandro Roggerone wrote: > Hi guys, I was trying to create a cluster and add a node to it > I have my main box (172.30.6.254) with a lot of vms and containers running > there. > I created a cluster in my main box. > Then on my new fresh box (172.30.6.253) , tried to join to created cluster > but got the message: > * this host already contains virtual guests > and can not continue. > Why is that happening? my server is new and empty. > Have been looking but can not fix it. > Any suggestions? > Thanks, Leandro. > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > Libre > de virus. www.avast.com > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From laurentfdumont at gmail.com Tue Apr 6 01:05:11 2021 From: laurentfdumont at gmail.com (Laurent Dumont) Date: Mon, 5 Apr 2021 19:05:11 -0400 Subject: [PVE-User] * this host already contains virtual guests (SOLVED) In-Reply-To: References: Message-ID: Oups, replied to your other email but glad you found a fix :) On Mon, Apr 5, 2021 at 10:48 AM Leandro Roggerone wrote: > Please dismiss my previous email , already solved > After searching on the new server directory , it has some VMs info from > main server created on a first join attempt. > I removed those directories at /etc/pve/nodes/main_node > and could succesfully add my new node to cluster. > Thanks. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From mhill at inett.de Tue Apr 6 09:41:26 2021 From: mhill at inett.de (Maximilian Hill) Date: Tue, 6 Apr 2021 09:41:26 +0200 Subject: [PVE-User] ZFS Disk Usage unexpected high In-Reply-To: <7e9d1358dcb12b288e884547553e39b4@konsec.com> References: <7e9d1358dcb12b288e884547553e39b4@konsec.com> Message-ID: Hi, I got the same issue with different RAID-Z setups lately. We worked around it, but I don't want to go into detail abaout that before I know. why that happened. Regards Maximilian Hill On Mon, Apr 05, 2021 at 10:22:56AM +0200, Konold, Martin wrote: > > Hi, > > I set up a single VM which currently used 15TB of data. > /dev/sdb is technically a ZFS volume on the Proxmox Host. > > [root at vm ~]# df -h /data > Filesystem Size Used Avail Use% Mounted on > /dev/sdb 40T 15T 25T 37% /data > [root at vm ~]# du -s /data/ > 14874345100 /data/ > > [root at host /]# zfs list zfs01/PVE-BE/vm-1-disk-1 > NAME USED AVAIL REFER MOUNTPOINT > zfs01/PVE-BE/vm-1-disk-1 31.5T 5.82T 31.5T - > > Why is the usage on the host about twice as large as within the vm? > (Yes, I have given fstrim and discard a try). > > -- > Regards > ppa. Martin Konold -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From leandro at tecnetmza.com.ar Wed Apr 7 17:25:16 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Wed, 7 Apr 2021 12:25:16 -0300 Subject: [PVE-User] link down for network interface Message-ID: Hi guys ... Yesterday was working on my datacenter cabling my new proxmox network interfaces. I connected a network port to my mikrotik router. Before coming home I checked network linked condition and physically was ok , both leds blinking on both side , router and server nic. (eno1) Now , working remotely can see that server interface has no link. Very strange since interface at router is up. Any idea ? I never seen something similar (one side link ok and the other is down). Here can I share some outputs: root at pve2:~# ip link 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: ens2f0: mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000 link/ether a0:d3:c1:f5:37:08 brd ff:ff:ff:ff:ff:ff 3: ens2f1: mtu 1500 qdisc mq master vmbr1 state UP mode DEFAULT group default qlen 1000 link/ether a0:d3:c1:f5:37:09 brd ff:ff:ff:ff:ff:ff 4: eno1: mtu 1500 qdisc mq master vmbr2 state DOWN mode DEFAULT group default qlen 1000 link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff 5: eno2: mtu 1500 qdisc mq master vmbr3 state DOWN mode DEFAULT group default qlen 1000 as you can see eno1 is down ... while can see link up on the other connected side. I will change cable tomorrow, but wanted to share this experience with you. BTW , already tryed autonegociation , and fixed 100, 1000Mb. Regards. Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From chris.hofstaedtler at deduktiva.com Thu Apr 8 00:03:19 2021 From: chris.hofstaedtler at deduktiva.com (Chris Hofstaedtler | Deduktiva) Date: Thu, 8 Apr 2021 00:03:19 +0200 Subject: [PVE-User] link down for network interface In-Reply-To: References: Message-ID: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> * Leandro Roggerone [210407 17:25]: > Hi guys ... Please consider that more than one gender exists. > Yesterday was working on my datacenter cabling my new proxmox network > interfaces. > I connected a network port to my mikrotik router. > Before coming home I checked network linked condition and physically was ok > , both leds blinking on both side , router and server nic. (eno1) > Now , working remotely can see that server interface has no link. > Very strange since interface at router is up. > Any idea ? I never seen something similar (one side link ok and the other > is down). > root at pve2:~# ip link [..] > 4: eno1: mtu 1500 qdisc mq master vmbr2 > state DOWN mode DEFAULT group default qlen 1000 > link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff > > as you can see eno1 is down ... while can see link up on the other > connected side. You did not tell us the brand and make or driver of that network card. Some Intel cards are known to be extremely picky about autoneg and power save settings, (short) cables, etc. Best luck, Chris -- Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien) www.deduktiva.com / +43 1 353 1707 From lindsay.mathieson at gmail.com Thu Apr 8 09:44:10 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 17:44:10 +1000 Subject: [PVE-User] PBS and Open VSwicth Message-ID: Does PBS support OpenVSwicth? not seeing anything in the webgui for it -- Lindsay From mityapetuhov at gmail.com Thu Apr 8 09:58:57 2021 From: mityapetuhov at gmail.com (Dmitry Petuhov) Date: Thu, 8 Apr 2021 10:58:57 +0300 Subject: [PVE-User] PBS and Open VSwicth In-Reply-To: References: Message-ID: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> Hello What for? I don't see any application of OVS in PBS. Maybe for interface bonding, but it can be done easier. Regardless of support in GUI, you always can configure it by hand like in standard Debian installation. 08.04.2021 10:44, Lindsay Mathieson ?????: > Does PBS support OpenVSwicth? not seeing anything in the webgui for it > From lindsay.mathieson at gmail.com Thu Apr 8 15:04:23 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:04:23 +1000 Subject: [PVE-User] PBS and Open VSwicth In-Reply-To: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> References: <2ff3699c-0511-5b7b-0a5f-a8be6fe667c3@gmail.com> Message-ID: On 8/04/2021 5:58 pm, Dmitry Petuhov wrote: > What for? I don't see any application of OVS in PBS. Maybe for > interface bonding, but it can be done easier. Meh - I find OVS more flexible > > Regardless of support in GUI, you always can configure it by hand like > in standard Debian installation. I prefer to stick with "The Proxmox Way", it avoids ocmplications. Regards, I'm not hung up on it, just checking. Have a Linux LACP bond setup currently. -- Lindsay From lindsay.mathieson at gmail.com Thu Apr 8 15:06:27 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:06:27 +1000 Subject: [PVE-User] PBS and ZFS Pools Compression? Message-ID: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> Setup a ZFS RAID1 pool on my PBS server and enabled lz4 compression on it (habit). But given the backups are already compressed, its not really going to gain anything is it? possibly even counter productive? Thanks. -- Lindsay From devzero at web.de Thu Apr 8 15:12:57 2021 From: devzero at web.de (Roland) Date: Thu, 8 Apr 2021 15:12:57 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> Message-ID: <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> hi, i think it's counter productive, as you are wasting cpu for compress/uncompress data, which should not be further compressible (except some smaller files like img.fidx etc) regards roland Am 08.04.21 um 15:06 schrieb Lindsay Mathieson: > Setup a ZFS RAID1 pool on my PBS server and enabled lz4 compression on > it (habit). But given the backups are already compressed, its not > really going to gain anything is it? possibly even counter productive? > > > Thanks. > From lindsay.mathieson at gmail.com Thu Apr 8 15:13:46 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Thu, 8 Apr 2021 23:13:46 +1000 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> Message-ID: <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> On 8/04/2021 11:12 pm, Roland wrote: > i think it's counter productive, as you are wasting cpu for > compress/uncompress data, Yah, I thought so, thanks. -- Lindsay From m at plus-plus.su Thu Apr 8 17:19:27 2021 From: m at plus-plus.su (Mikhail) Date: Thu, 8 Apr 2021 18:19:27 +0300 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> References: <1e49af73-aaaa-f7e7-00e4-c940ce733a6f@gmail.com> <479b543a-cf3b-adda-29a3-522b9158e6e9@web.de> <1b240ba7-d285-9968-85fe-a99265b16c9c@gmail.com> Message-ID: <478f113a-b899-8db7-ae2c-8de0084c7920@plus-plus.su> On 4/8/21 4:13 PM, Lindsay Mathieson wrote: > On 8/04/2021 11:12 pm, Roland wrote: >> i think it's counter productive, as you are wasting cpu for >> compress/uncompress data, > > Yah, I thought so, thanks. > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not doing double-compression in such cases, so I guess there's no harm here (we also have lz4 enabled on datastore where Proxmox sends backups). Also lz4 is cheap, so I doubt it has any significant impact on modern CPUs. regards, From dietmar at proxmox.com Thu Apr 8 17:47:23 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 8 Apr 2021 17:47:23 +0200 (CEST) Subject: [PVE-User] PBS and ZFS Pools Compression? Message-ID: <156659471.1579.1617896843476@webmail.proxmox.com> > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not > doing double-compression in such cases, AFAIK the only way to detect compressed data is to actually compress it, then test the size. So this is double-compression ... From gseeley at gmail.com Thu Apr 8 18:22:44 2021 From: gseeley at gmail.com (Geoff Seeley) Date: Thu, 8 Apr 2021 09:22:44 -0700 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: References: Message-ID: > > > I may be wrong, but AFAIK ZFS detects compressed data and thus it is not > > doing double-compression in such cases, > > AFAIK the only way to detect compressed data is to actually compress it, > then > test the size. So this is double-compression ... > ZFS compression is a little more complex than this, but the good news is that ZFS is also smart enough not to do this! This is a good article on the subject: https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ TL;DR: Enable compression at the pool level and forget about it. -Geoff From dietmar at proxmox.com Thu Apr 8 20:06:59 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 8 Apr 2021 20:06:59 +0200 (CEST) Subject: [PVE-User] PBS and ZFS Pools Compression? Message-ID: <1052148244.1615.1617905219488@webmail.proxmox.com> > This is a good article on the subject: > https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ Can't find where the explain it. ZFS magically detects if data is compressible? Please can someone give me a hint how they do that? From devzero at web.de Fri Apr 9 01:51:09 2021 From: devzero at web.de (Roland) Date: Fri, 9 Apr 2021 01:51:09 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1052148244.1615.1617905219488@webmail.proxmox.com> References: <1052148244.1615.1617905219488@webmail.proxmox.com> Message-ID: <9cd163da-e7b6-f3cf-5537-f601babcaba0@web.de> i know that there was smartcompression feature in nexenta: https://openzfs.org/w/images/4/4d/Compression-Saso_Kiselkov.pdf afaik, it does not exist in zfsonlinux/openzfs. on my system, i'm getting - <250MB/s when writing uncompressible data to zfs pool with lz4 enabled and - >450MB/s when writing uncompressible data to zfs pool without compression regards roland Am 08.04.21 um 20:06 schrieb Dietmar Maurer: >> This is a good article on the subject: >> https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ > Can't find where the explain it. ZFS magically detects if data is compressible? > Please can someone give me a hint how they do that? > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From f.gruenbichler at proxmox.com Fri Apr 9 08:54:01 2021 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?q?Gr=FCnbichler?=) Date: Fri, 09 Apr 2021 08:54:01 +0200 Subject: [PVE-User] PBS and ZFS Pools Compression? In-Reply-To: <1052148244.1615.1617905219488@webmail.proxmox.com> References: <1052148244.1615.1617905219488@webmail.proxmox.com> Message-ID: <1617951199.q66a123ls2.astroid@nora.none> On April 8, 2021 8:06 pm, Dietmar Maurer wrote: >> This is a good article on the subject: >> https://klarasystems.com/articles/openzfs1-understanding-transparent-compression/ > > Can't find where the explain it. ZFS magically detects if data is compressible? > Please can someone give me a hint how they do that? no. they compress, and if the result is over a certain threshold, they save the uncompressed data to avoid the decompress overhead for barely any/no gain. From leandro at tecnetmza.com.ar Fri Apr 9 13:20:59 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Fri, 9 Apr 2021 08:20:59 -0300 Subject: [PVE-User] link down for network interface In-Reply-To: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> References: <20210407220319.snhtabanjxcjfl6w@percival.namespace.at> Message-ID: Veeeery strange. After removing the vlan aware flag on the interface configuration could get the link up. Fortunately I have a second nic on this server so I will use it instead. Have not much time to debug this. Thanks. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El mi?, 7 abr 2021 a las 19:04, Chris Hofstaedtler | Deduktiva (< chris.hofstaedtler at deduktiva.com>) escribi?: > * Leandro Roggerone [210407 17:25]: > > Hi guys ... > Please consider that more than one gender exists. > > > Yesterday was working on my datacenter cabling my new proxmox network > > interfaces. > > I connected a network port to my mikrotik router. > > Before coming home I checked network linked condition and physically was > ok > > , both leds blinking on both side , router and server nic. (eno1) > > Now , working remotely can see that server interface has no link. > > Very strange since interface at router is up. > > Any idea ? I never seen something similar (one side link ok and the > other > > is down). > > > root at pve2:~# ip link > [..] > > 4: eno1: mtu 1500 qdisc mq master > vmbr2 > > state DOWN mode DEFAULT group default qlen 1000 > > link/ether 40:a8:f0:2a:18:80 brd ff:ff:ff:ff:ff:ff > > > > as you can see eno1 is down ... while can see link up on the other > > connected side. > > You did not tell us the brand and make or driver of that network > card. Some Intel cards are known to be extremely picky about autoneg > and power save settings, (short) cables, etc. > > Best luck, > Chris > > -- > Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien) > www.deduktiva.com / +43 1 353 1707 > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Fri Apr 9 19:19:06 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Fri, 9 Apr 2021 14:19:06 -0300 Subject: [PVE-User] LVM question Message-ID: Hi guys , after install a new storage to my box , had to create lvm-thin. Im not very good with lvm , after reading some docs , and links like: https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ I got a working solution but have also some questions about it. This is what I did: wipefs -a /dev/sdb sgdisk -N 1 /dev/sdb pvcreate --metadatasize 1024M -y -ff /dev/sdb1 vgcreate --metadatasize 1024M proxvg /dev/sdb1 lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n proxthin proxvg lvcreate -n proxvz -V 1.1T proxvg/proxthin mkfs.ext4 /dev/proxvg/proxvz mkdir /media/vz echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' >> /etc/fstab mount -a And have following result: root at pve2:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 root at pve2:~# lvdisplay --- Logical volume --- LV Name proxthin VG Name proxvg LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX LV Write Access read/write LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 LV Pool metadata proxthin_tmeta LV Pool data proxthin_tdata LV Status available # open 3 LV Size 1.63 TiB Allocated pool data 22.03% Allocated metadata 6.34% Current LE 428451 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:2 --- Logical volume --- LV Path /dev/proxvg/proxvz LV Name proxvz VG Name proxvg LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR LV Write Access read/write LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 LV Pool name proxthin LV Status available # open 1 LV Size 1.10 TiB Mapped size 1.67% Current LE 288359 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:4 Have following comments: I can create VMs on proxthin partition so it is ok. I can create backup on proxvz partition so it is ok. What im concerned about is: Physic storage space is about 1.8TG , how is it possible to create a 1.6 and 1.1T volumnes inside ? It can be a problem in the future ? I was thinking about reduce proxthin partition to 600Gb aprox , so it make same sense 1.1T + 600G aprox 1.8 T But there is no LV Path on proxthin partition so I can unmount and the reduce. So .. What im missing here ? do I need to reduce proxthin partition ( I do need the 1.1T partition to backup). Hope to be clear about this guys. Any comment would be wellcome. Leandro. From lindsay.mathieson at gmail.com Sat Apr 10 05:22:57 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 13:22:57 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? Message-ID: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> TTL;DR - Backup PC is a standalone Proxmox Server, running a PBS lxc container using its host filesystem for the Backup store. The PBS container is backed up using vzdump to an attached external hard drive which is rotated offsite. Source Data * 5 Node Proxmox Cluster * Ceph storage (size = 3) * 30 VM's Backup Destination Standalone Proxmox Server in same server room as main cluster * 16 GM Ram * CPU: Intel i5 * Boot - 2 SSD's in ZFS RAID1 * Data - 2 4TB WD NAS Drives in ZFS RAID1 * Bonded 1G * 3 * PBS Container o Data Store - 4TB, passed through from Host Schedule * Proxmox backups up all VM's to PBS Container Weekly o Will revisit the schedule * Host proxmox server backs up PBS and its data to external hard drive o Approx 2 TB Data (we are just an SMB) o Drive is rotated offsite Recovery * VM's can easily be restored from the PBS server as needed (very rare occurrence - usually a user messed up their VM) Disaster Recovery This is the real concern - Fire, Theft etc. All servers and data including the Standalone Host and PBS server are gone. * Recreate Proxmox Cluster * Recreate Proxmox PBS Host o Restore the PBS Container and data from an offsite backup disk * Restore the VM's to the Cluster from the PBS Container Does all this seem practical and safe? Thanks - Lindsay -- Lindsay From leesteken at protonmail.ch Sat Apr 10 09:10:10 2021 From: leesteken at protonmail.ch (Arjen) Date: Sat, 10 Apr 2021 07:10:10 +0000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? In-Reply-To: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: <2R8D4utXiNGRYsPE8Hi1xF74VKA4tNAgEXchp_H_Zs84aZoA1i8iPgh6oqyoPGBG4-W9_Df1IZZxwEmN-d3QGw18OKKfVvmuLxElBVqjkiI=@protonmail.ch> On Saturday, April 10th, 2021 at 05:22, Lindsay Mathieson wrote: > TTL;DR - Backup PC is a standalone Proxmox Server, running a PBS lxc > container using its host filesystem for the Backup store. The PBS > container is backed up using vzdump to an attached external hard drive > which is rotated offsite. > > Source Data > > - 5 Node Proxmox Cluster > - Ceph storage (size = 3) > - 30 VM's > > Backup Destination > > Standalone Proxmox Server in same server room as main cluster > - 16 GM Ram > - CPU: Intel i5 > - Boot - 2 SSD's in ZFS RAID1 > - Data - 2 4TB WD NAS Drives in ZFS RAID1 > - Bonded 1G * 3 > - PBS Container > > o Data Store - 4TB, passed through from Host > > Schedule > - Proxmox backups up all VM's to PBS Container Weekly > > o Will revisit the schedule > - Host proxmox server backs up PBS and its data to external hard drive > > o Approx 2 TB Data (we are just an SMB) > > o Drive is rotated offsite > > Recovery > - VM's can easily be restored from the PBS server as needed (very rare > > occurrence - usually a user messed up their VM) > > Disaster Recovery > > This is the real concern - Fire, Theft etc. All servers and data > including the Standalone Host and PBS server are gone. > - Recreate Proxmox Cluster > - Recreate Proxmox PBS Host > > o Restore the PBS Container and data from an offsite backup disk > - Restore the VM's to the Cluster from the PBS Container > > Does all this seem practical and safe? Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). best regards, Arjen From lindsay.mathieson at gmail.com Sat Apr 10 15:28:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 23:28:50 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? In-Reply-To: References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: On 10/04/2021 5:10 pm, Arjen via pve-user wrote: > Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. I only passed 2TB through and the actual backup data comes to 1.3TB > The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. I wondered that. Will be testing. > I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. Certainly a possibility. I also wondered if it was practical to attach an external disk to PBS as a Datastore, then detach it. A bit more manual, but doable. > I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. I want to keep the storage separate from the cluster, in that regard the local storage is a single point of failure, hence the need for offsite storage as well :) > > Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. I'll definitely be testing restore options to check that it works. > Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). Alas :( Perhaps I could do a backup on site, then physically move it offsite and attach it to a offsite PBS server and then sync it remotely - incremental backups over the net would be doable. nb. Our NAS died, hence my increased investigation of this :) Definitely want to go with a more open and targeted solution this time, the NAS was a good appliance, but inflexible. Thanks! -- Lindsay From lindsay.mathieson at gmail.com Sat Apr 10 15:36:23 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sat, 10 Apr 2021 23:36:23 +1000 Subject: [PVE-User] PBS Incremental and stopped VM's Message-ID: I'm guessing only running VM's (with dirty bitmap support) can be incrementally backed up? Might be nice if we could schedule backups for only running VM's -- Lindsay From leesteken at protonmail.ch Sat Apr 10 15:43:43 2021 From: leesteken at protonmail.ch (Arjen) Date: Sat, 10 Apr 2021 13:43:43 +0000 Subject: [PVE-User] PBS Incremental and stopped VM's In-Reply-To: References: Message-ID: On Saturday, April 10th, 2021 at 15:36, Lindsay Mathieson wrote: > I'm guessing only running VM's (with dirty bitmap support) can be > > incrementally backed up? > > Might be nice if we could schedule backups for only running VM's Just to be clear: PBS always makes a full backup. The resulting data is deduplicated (before sending it to the server), which almost always reduces the writes to the server. An administration of changed virtual disk blocks is kept for running VMs, which only reduces the reads from VMs that have not been restarted between backups. It data transfer over the network is the bottleneck, you will have most benefit from the former (less changes, less transfers). The latter only speeds up the backup due to less reads (of unchanged data) from disk. best regards, Arjen From lindsay.mathieson at gmail.com Sat Apr 10 16:06:19 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sun, 11 Apr 2021 00:06:19 +1000 Subject: [PVE-User] PBS Incremental and stopped VM's In-Reply-To: References: Message-ID: <90b0f7df-ca15-f9cd-b76a-0f8f26e24917@gmail.com> On 10/04/2021 11:43 pm, Arjen via pve-user wrote: > Just to be clear: PBS always makes a full backup. The resulting data is deduplicated (before sending it to the server), which almost always reduces the writes to the server. Ah, I see now, thanks, I didn't understand that part of things. Looking at the logs of my 2nd backup, I see that stopped VM's had zero bytes written to the backup server. -- Lindsay From atokovenko at gmail.com Mon Apr 12 00:43:38 2021 From: atokovenko at gmail.com (Oleksii Tokovenko) Date: Mon, 12 Apr 2021 01:43:38 +0300 Subject: [PVE-User] pve-user Digest, Vol 157, Issue 12 In-Reply-To: References: Message-ID: unsibscribe ??, 11 ????. 2021 ? 13:00 ????: > Send pve-user mailing list submissions to > pve-user at lists.proxmox.com > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > or, via email, send a message with subject or body 'help' to > pve-user-request at lists.proxmox.com > > You can reach the person managing the list at > pve-user-owner at lists.proxmox.com > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of pve-user digest..." > > > Today's Topics: > > 1. Re: Revisited: External disk backup using PBS - Requesting > Criticism/Advice? (Lindsay Mathieson) > 2. PBS Incremental and stopped VM's (Lindsay Mathieson) > 3. Re: PBS Incremental and stopped VM's (Arjen) > 4. Re: PBS Incremental and stopped VM's (Lindsay Mathieson) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 10 Apr 2021 23:28:50 +1000 > From: Lindsay Mathieson > To: pve-user at lists.proxmox.com > Subject: Re: [PVE-User] Revisited: External disk backup using PBS - > Requesting Criticism/Advice? > Message-ID: > Content-Type: text/plain; charset=windows-1252; format=flowed > > On 10/04/2021 5:10 pm, Arjen via pve-user wrote: > > Don't expect to be able to backup the PBS container with 4TB to a 2TB > external drive. > > I only passed 2TB through and the actual backup data comes to 1.3TB > > > The Datastore of a PSB does not compress much further and Proxmox VE > Backup will only backup virtual disks and not mountpoints or storages > passed from host, if I understand correctly. > > > I wondered that. Will be testing. > > > I suggest adding a virtual disk of 2TB to the PBS container (and format > it with ext4) which can be backed up by the Proxmox VE Backup. > > > Certainly a possibility. > > > I also wondered if it was practical to attach an external disk to PBS as > a Datastore, then detach it. A bit more manual, but doable. > > > > I would also run the PBS container (with virtual disk) on the cluster > instead on separate hardware which is a single point of failure. The local > PBS would be then just as reliable as your cluster. > > > I want to keep the storage separate from the cluster, in that regard the > local storage is a single point of failure, hence the need for offsite > storage as well :) > > > > > > Regarding safeness: I suggest doing a automated disaster recovery every > week to make sure it works as expected. Or at least partially, like > restoring the PBS from an external drive. > > > I'll definitely be testing restore options to check that it works. > > > Regarding practicality: I would have a remote PBS sync with your local > PBS instead of moving physical disks (but you mentioned before that that > was not really possible). > > > Alas :( > > > Perhaps I could do a backup on site, then physically move it offsite and > attach it to a offsite PBS server and then sync it remotely - > incremental backups over the net would be doable. > > > nb. Our NAS died, hence my increased investigation of this :) Definitely > want to go with a more open and targeted solution this time, the NAS was > a good appliance, but inflexible. > > Thanks! > > -- > Lindsay > > > > > ------------------------------ > > Message: 2 > Date: Sat, 10 Apr 2021 23:36:23 +1000 > From: Lindsay Mathieson > To: pve-user at lists.proxmox.com > Subject: [PVE-User] PBS Incremental and stopped VM's > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > I'm guessing only running VM's (with dirty bitmap support) can be > incrementally backed up? > > > Might be nice if we could schedule backups for only running VM's > > -- > Lindsay > > > > > ------------------------------ > > Message: 3 > Date: Sat, 10 Apr 2021 13:43:43 +0000 > From: Arjen > To: Proxmox VE user list > Subject: Re: [PVE-User] PBS Incremental and stopped VM's > Message-ID: > > protonmail.ch> > > Content-Type: text/plain; charset=utf-8 > > On Saturday, April 10th, 2021 at 15:36, Lindsay Mathieson < > lindsay.mathieson at gmail.com> wrote: > > > I'm guessing only running VM's (with dirty bitmap support) can be > > > > incrementally backed up? > > > > Might be nice if we could schedule backups for only running VM's > > Just to be clear: PBS always makes a full backup. The resulting data is > deduplicated (before sending it to the server), which almost always reduces > the writes to the server. An administration of changed virtual disk blocks > is kept for running VMs, which only reduces the reads from VMs that have > not been restarted between backups. It data transfer over the network is > the bottleneck, you will have most benefit from the former (less changes, > less transfers). The latter only speeds up the backup due to less reads (of > unchanged data) from disk. > > best regards, Arjen > > > > ------------------------------ > > Message: 4 > Date: Sun, 11 Apr 2021 00:06:19 +1000 > From: Lindsay Mathieson > To: pve-user at lists.proxmox.com > Subject: Re: [PVE-User] PBS Incremental and stopped VM's > Message-ID: <90b0f7df-ca15-f9cd-b76a-0f8f26e24917 at gmail.com> > Content-Type: text/plain; charset=windows-1252; format=flowed > > On 10/04/2021 11:43 pm, Arjen via pve-user wrote: > > Just to be clear: PBS always makes a full backup. The resulting data is > deduplicated (before sending it to the server), which almost always reduces > the writes to the server. > > Ah, I see now, thanks, I didn't understand that part of things. Looking > at the logs of my 2nd backup, I see that stopped VM's had zero bytes > written to the backup server. > > -- > Lindsay > > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > ------------------------------ > > End of pve-user Digest, Vol 157, Issue 12 > ***************************************** > > -- ? ?????????, ????????? ??????? ?????????? From lindsay.mathieson at gmail.com Mon Apr 12 03:57:50 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 12 Apr 2021 11:57:50 +1000 Subject: [PVE-User] Revisited: External disk backup using PBS - Requesting Criticism/Advice? In-Reply-To: References: <15e5bfd5-7458-fdbe-5079-ce78a19e0099@gmail.com> Message-ID: <33f7cf57-9ca2-1c05-89db-9fceaf5e4cc5@gmail.com> On 10/04/2021 5:10 pm, Arjen via pve-user wrote: > Don't expect to be able to backup the PBS container with 4TB to a 2TB external drive. The Datastore of a PSB does not compress much further and Proxmox VE Backup will only backup virtual disks and not mountpoints or storages passed from host, if I understand correctly. > I suggest adding a virtual disk of 2TB to the PBS container (and format it with ext4) which can be backed up by the Proxmox VE Backup. > I would also run the PBS container (with virtual disk) on the cluster instead on separate hardware which is a single point of failure. The local PBS would be then just as reliable as your cluster. > > Regarding safeness: I suggest doing a automated disaster recovery every week to make sure it works as expected. Or at least partially, like restoring the PBS from an external drive. > Regarding practicality: I would have a remote PBS sync with your local PBS instead of moving physical disks (but you mentioned before that that was not really possible). Did some testing over the weekend. * Setup a PBS Container with a 2TB root disk * Backed up 6 VM's to it. * Backed up the PBS Container to a external disk using vzdump (using the std proxmox gui) o Slooow process at 30MB/s :) * Deleted the PBS Container * Deleted the backup VM's * Restored the PBS container from the external hard disk o Much faster - averaged around 100MB/s * Restored PBS Container started fine and verified. * Restored the backed up VM's from the PBS Container o Worked as expected o All backup and running. All in all, worked as I wanted and seems a viable option for full image offsite backups via external hard disks. The process of backing up the PBS container to the external drive is *very* slow :( I estimate 11 hours for a full cluster backup copy. But since its on a independent node, it doesn't load the main cluster and can just happen over the weekend. -- Lindsay From d.csapak at proxmox.com Mon Apr 12 08:48:48 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Mon, 12 Apr 2021 08:48:48 +0200 Subject: [PVE-User] LVM question In-Reply-To: References: Message-ID: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Hi, On 4/9/21 19:19, Leandro Roggerone wrote: > Hi guys , after install a new storage to my box , had to create lvm-thin. > Im not very good with lvm , after reading some docs , and links like: > https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ > > I got a working solution but have also some questions about it. > This is what I did: > > wipefs -a /dev/sdb > > sgdisk -N 1 /dev/sdb > > pvcreate --metadatasize 1024M -y -ff /dev/sdb1 > > vgcreate --metadatasize 1024M proxvg /dev/sdb1 > > lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n > proxthin proxvg > > lvcreate -n proxvz -V 1.1T proxvg/proxthin > > mkfs.ext4 /dev/proxvg/proxvz > > mkdir /media/vz > > echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' >> > /etc/fstab > > mount -a > > > And have following result: > > > root at pve2:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% > Move Log Cpy%Sync Convert > proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 > > proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 > root at pve2:~# lvdisplay > --- Logical volume --- > LV Name proxthin > VG Name proxvg > LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 > LV Pool metadata proxthin_tmeta > LV Pool data proxthin_tdata > LV Status available > # open 3 > LV Size 1.63 TiB > Allocated pool data 22.03% > Allocated metadata 6.34% > Current LE 428451 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:2 > > --- Logical volume --- > LV Path /dev/proxvg/proxvz > LV Name proxvz > VG Name proxvg > LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 > LV Pool name proxthin > LV Status available > # open 1 > LV Size 1.10 TiB > Mapped size 1.67% > Current LE 288359 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:4 > > > Have following comments: > > I can create VMs on proxthin partition so it is ok. > > I can create backup on proxvz partition so it is ok. Looks OK imho. > > What im concerned about is: > > Physic storage space is about 1.8TG , how is it possible to create a 1.6 > and 1.1T volumnes inside ? LVM Thin is 'thin-provisioned' it only uses space when it is really written. > It can be a problem in the future ? yes, if you do not monitor your real usage, if the thinpool runs full, you can lose data. > I was thinking about reduce proxthin partition to 600Gb aprox , so it make > same sense 1.1T + 600G aprox 1.8 T > But there is no LV Path on proxthin partition so I can unmount and the > reduce. > So .. > What im missing here ? do I need to reduce proxthin partition ( I do need > the 1.1T partition to backup). the LV 'proxvz' is inside the thinpool 'proxthin' so as long as you never allocate more that ~500GiB of vm/ct volumes, it should be fine. alos, on allocation, the thinpool will print warnings if the allocated lvs are bigger than the space available hope this helps > Hope to be clear about this guys. > Any comment would be wellcome. > Leandro. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From d.csapak at proxmox.com Mon Apr 12 08:48:48 2021 From: d.csapak at proxmox.com (Dominik Csapak) Date: Mon, 12 Apr 2021 08:48:48 +0200 Subject: [PVE-User] LVM question In-Reply-To: References: Message-ID: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Hi, On 4/9/21 19:19, Leandro Roggerone wrote: > Hi guys , after install a new storage to my box , had to create lvm-thin. > Im not very good with lvm , after reading some docs , and links like: > https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ > > I got a working solution but have also some questions about it. > This is what I did: > > wipefs -a /dev/sdb > > sgdisk -N 1 /dev/sdb > > pvcreate --metadatasize 1024M -y -ff /dev/sdb1 > > vgcreate --metadatasize 1024M proxvg /dev/sdb1 > > lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n > proxthin proxvg > > lvcreate -n proxvz -V 1.1T proxvg/proxthin > > mkfs.ext4 /dev/proxvg/proxvz > > mkdir /media/vz > > echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' >> > /etc/fstab > > mount -a > > > And have following result: > > > root at pve2:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% > Move Log Cpy%Sync Convert > proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 > > proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 > root at pve2:~# lvdisplay > --- Logical volume --- > LV Name proxthin > VG Name proxvg > LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 > LV Pool metadata proxthin_tmeta > LV Pool data proxthin_tdata > LV Status available > # open 3 > LV Size 1.63 TiB > Allocated pool data 22.03% > Allocated metadata 6.34% > Current LE 428451 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:2 > > --- Logical volume --- > LV Path /dev/proxvg/proxvz > LV Name proxvz > VG Name proxvg > LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR > LV Write Access read/write > LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 > LV Pool name proxthin > LV Status available > # open 1 > LV Size 1.10 TiB > Mapped size 1.67% > Current LE 288359 > Segments 1 > Allocation inherit > Read ahead sectors auto > - currently set to 256 > Block device 253:4 > > > Have following comments: > > I can create VMs on proxthin partition so it is ok. > > I can create backup on proxvz partition so it is ok. Looks OK imho. > > What im concerned about is: > > Physic storage space is about 1.8TG , how is it possible to create a 1.6 > and 1.1T volumnes inside ? LVM Thin is 'thin-provisioned' it only uses space when it is really written. > It can be a problem in the future ? yes, if you do not monitor your real usage, if the thinpool runs full, you can lose data. > I was thinking about reduce proxthin partition to 600Gb aprox , so it make > same sense 1.1T + 600G aprox 1.8 T > But there is no LV Path on proxthin partition so I can unmount and the > reduce. > So .. > What im missing here ? do I need to reduce proxthin partition ( I do need > the 1.1T partition to backup). the LV 'proxvz' is inside the thinpool 'proxthin' so as long as you never allocate more that ~500GiB of vm/ct volumes, it should be fine. alos, on allocation, the thinpool will print warnings if the allocated lvs are bigger than the space available hope this helps > Hope to be clear about this guys. > Any comment would be wellcome. > Leandro. > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Mon Apr 12 13:12:48 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 12 Apr 2021 08:12:48 -0300 Subject: [PVE-User] LVM question In-Reply-To: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> References: <2509fa63-a5b2-658a-610f-0497202d2f6f@proxmox.com> Message-ID: Thanks !! very helpful. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El lun, 12 abr 2021 a las 3:48, Dominik Csapak () escribi?: > Hi, > > On 4/9/21 19:19, Leandro Roggerone wrote: > > Hi guys , after install a new storage to my box , had to create lvm-thin. > > Im not very good with lvm , after reading some docs , and links like: > > > https://forum.proxmox.com/threads/how-to-create-an-lvm-thinpool-and-vz-directory-on-the-same-disk.62901/ > > > > I got a working solution but have also some questions about it. > > This is what I did: > > > > wipefs -a /dev/sdb > > > > sgdisk -N 1 /dev/sdb > > > > pvcreate --metadatasize 1024M -y -ff /dev/sdb1 > > > > vgcreate --metadatasize 1024M proxvg /dev/sdb1 > > > > lvcreate -l 100%FREE --poolmetadatasize 1024M --chunksize 256 -T -n > > proxthin proxvg > > > > lvcreate -n proxvz -V 1.1T proxvg/proxthin > > > > mkfs.ext4 /dev/proxvg/proxvz > > > > mkdir /media/vz > > > > echo '/dev/proxvg/proxvz /media/vz ext4 defaults,errors=remount-ro 0 2' > >> > > /etc/fstab > > > > mount -a > > > > > > And have following result: > > > > > > root at pve2:~# lvs > > LV VG Attr LSize Pool Origin Data% Meta% > > Move Log Cpy%Sync Convert > > proxthin proxvg twi-aotz-- 1.63t 22.03 6.34 > > > > proxvz proxvg Vwi-aotz-- 1.10t proxthin 1.67 > > root at pve2:~# lvdisplay > > --- Logical volume --- > > LV Name proxthin > > VG Name proxvg > > LV UUID 4cEIr9-3ZVQ-vsy1-q9ZX-GsaD-7oq0-pZixsX > > LV Write Access read/write > > LV Creation host, time pve2, 2021-04-01 13:09:41 -0300 > > LV Pool metadata proxthin_tmeta > > LV Pool data proxthin_tdata > > LV Status available > > # open 3 > > LV Size 1.63 TiB > > Allocated pool data 22.03% > > Allocated metadata 6.34% > > Current LE 428451 > > Segments 1 > > Allocation inherit > > Read ahead sectors auto > > - currently set to 256 > > Block device 253:2 > > > > --- Logical volume --- > > LV Path /dev/proxvg/proxvz > > LV Name proxvz > > VG Name proxvg > > LV UUID huzpPT-g0Gd-3Jwb-2ydz-InHh-73vN-Jnc5TR > > LV Write Access read/write > > LV Creation host, time pve2, 2021-04-01 13:10:12 -0300 > > LV Pool name proxthin > > LV Status available > > # open 1 > > LV Size 1.10 TiB > > Mapped size 1.67% > > Current LE 288359 > > Segments 1 > > Allocation inherit > > Read ahead sectors auto > > - currently set to 256 > > Block device 253:4 > > > > > > Have following comments: > > > > I can create VMs on proxthin partition so it is ok. > > > > I can create backup on proxvz partition so it is ok. > > Looks OK imho. > > > > > What im concerned about is: > > > > Physic storage space is about 1.8TG , how is it possible to create a 1.6 > > and 1.1T volumnes inside ? > > LVM Thin is 'thin-provisioned' it only uses space when it is really > written. > > > It can be a problem in the future ? > > yes, if you do not monitor your real usage, if the thinpool runs full, > you can lose data. > > > I was thinking about reduce proxthin partition to 600Gb aprox , so it > make > > same sense 1.1T + 600G aprox 1.8 T > > But there is no LV Path on proxthin partition so I can unmount and the > > reduce. > > So .. > > What im missing here ? do I need to reduce proxthin partition ( I do need > > the 1.1T partition to backup). > > the LV 'proxvz' is inside the thinpool 'proxthin' so as long as you > never allocate more that ~500GiB of vm/ct volumes, it should be fine. > > alos, on allocation, the thinpool will print warnings if the allocated > lvs are bigger than the space available > > hope this helps > > > Hope to be clear about this guys. > > Any comment would be wellcome. > > Leandro. > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > From piviul at riminilug.it Tue Apr 13 10:05:05 2021 From: piviul at riminilug.it (Piviul) Date: Tue, 13 Apr 2021 10:05:05 +0200 Subject: [PVE-User] Edit: Boot Order mask Message-ID: I ask[?] about this little problem on the forum but nobody found a solution, so I try here... In my PVE the mask where I can change the Boot Order options of a VM is not ever the same. If I access to the mask from 2 nodes (say node1 and node2) the mask is a simple html form with only combo boxes. On the third node (say node3) the mask is more sophisticated, can support the drag and drop, has checkbox... in other word it's different. So I would like to know why my three nodes doesn't have the same mask even if they are at the same proxmox version and if there is a way that all nodes shows the same mask. I ask you because this is not only a layout problem; if I modify the boot order options from the node3, I can see strange chars in the PVE gui of the other two nodes but if I configure the boot order options from node1 or node2 all seems works flawless. Best regards Piviul [?] https://forum.proxmox.com/threads/strange-chars-in-boot-order-options.87169/ From alwin at antreich.com Tue Apr 13 11:03:39 2021 From: alwin at antreich.com (Alwin Antreich) Date: Tue, 13 Apr 2021 09:03:39 +0000 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: Hello Piviul, April 13, 2021 10:05 AM, "Piviul" wrote: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM is > not ever the same. If I access to the mask from 2 nodes (say node1 and > node2) the mask is a simple html form with only combo boxes. On the > third node (say node3) the mask is more sophisticated, can support the > drag and drop, has checkbox... in other word it's different. So I would > like to know why my three nodes doesn't have the same mask even if they > are at the same proxmox version and if there is a way that all nodes > shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. Are you're nodes all on the same update level? If not update all of them. If yes, then try to clear the browser cache. -- Cheers, Alwin From piviul at riminilug.it Tue Apr 13 10:44:36 2021 From: piviul at riminilug.it (Piviul) Date: Tue, 13 Apr 2021 10:44:36 +0200 Subject: [PVE-User] pve-user Digest, Vol 157, Issue 12 In-Reply-To: References: Message-ID: <432743f5-b0ab-414f-297b-e55d8163166f@riminilug.it> Hi Oleksii, if you want to unsuscribe this list please go to https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user; you can found the instructions at the end of the page. Best regards Piviul Il 12/04/21 00:43, Oleksii Tokovenko ha scritto: > unsibscribe > > ??, 11 ????. 2021 ? 13:00 ????: > >> Send pve-user mailing list submissions to >> pve-user at lists.proxmox.com >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> or, via email, send a message with subject or body 'help' to >> pve-user-request at lists.proxmox.com >> >> You can reach the person managing the list at >> pve-user-owner at lists.proxmox.com >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of pve-user digest..." >> >> >> Today's Topics: >> >> 1. Re: Revisited: External disk backup using PBS - Requesting >> Criticism/Advice? (Lindsay Mathieson) >> 2. PBS Incremental and stopped VM's (Lindsay Mathieson) >> 3. Re: PBS Incremental and stopped VM's (Arjen) >> 4. Re: PBS Incremental and stopped VM's (Lindsay Mathieson) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Sat, 10 Apr 2021 23:28:50 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: Re: [PVE-User] Revisited: External disk backup using PBS - >> Requesting Criticism/Advice? >> Message-ID: >> Content-Type: text/plain; charset=windows-1252; format=flowed >> >> On 10/04/2021 5:10 pm, Arjen via pve-user wrote: >>> Don't expect to be able to backup the PBS container with 4TB to a 2TB >> external drive. >> >> I only passed 2TB through and the actual backup data comes to 1.3TB >> >>> The Datastore of a PSB does not compress much further and Proxmox VE >> Backup will only backup virtual disks and not mountpoints or storages >> passed from host, if I understand correctly. >> >> >> I wondered that. Will be testing. >> >>> I suggest adding a virtual disk of 2TB to the PBS container (and format >> it with ext4) which can be backed up by the Proxmox VE Backup. >> >> >> Certainly a possibility. >> >> >> I also wondered if it was practical to attach an external disk to PBS as >> a Datastore, then detach it. A bit more manual, but doable. >> >> >>> I would also run the PBS container (with virtual disk) on the cluster >> instead on separate hardware which is a single point of failure. The local >> PBS would be then just as reliable as your cluster. >> >> >> I want to keep the storage separate from the cluster, in that regard the >> local storage is a single point of failure, hence the need for offsite >> storage as well :) >> >> >>> Regarding safeness: I suggest doing a automated disaster recovery every >> week to make sure it works as expected. Or at least partially, like >> restoring the PBS from an external drive. >> >> >> I'll definitely be testing restore options to check that it works. >> >>> Regarding practicality: I would have a remote PBS sync with your local >> PBS instead of moving physical disks (but you mentioned before that that >> was not really possible). >> >> >> Alas :( >> >> >> Perhaps I could do a backup on site, then physically move it offsite and >> attach it to a offsite PBS server and then sync it remotely - >> incremental backups over the net would be doable. >> >> >> nb. Our NAS died, hence my increased investigation of this :) Definitely >> want to go with a more open and targeted solution this time, the NAS was >> a good appliance, but inflexible. >> >> Thanks! >> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Message: 2 >> Date: Sat, 10 Apr 2021 23:36:23 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: >> Content-Type: text/plain; charset=utf-8; format=flowed >> >> I'm guessing only running VM's (with dirty bitmap support) can be >> incrementally backed up? >> >> >> Might be nice if we could schedule backups for only running VM's >> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Message: 3 >> Date: Sat, 10 Apr 2021 13:43:43 +0000 >> From: Arjen >> To: Proxmox VE user list >> Subject: Re: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: >> >> > protonmail.ch> >> >> Content-Type: text/plain; charset=utf-8 >> >> On Saturday, April 10th, 2021 at 15:36, Lindsay Mathieson < >> lindsay.mathieson at gmail.com> wrote: >> >>> I'm guessing only running VM's (with dirty bitmap support) can be >>> >>> incrementally backed up? >>> >>> Might be nice if we could schedule backups for only running VM's >> Just to be clear: PBS always makes a full backup. The resulting data is >> deduplicated (before sending it to the server), which almost always reduces >> the writes to the server. An administration of changed virtual disk blocks >> is kept for running VMs, which only reduces the reads from VMs that have >> not been restarted between backups. It data transfer over the network is >> the bottleneck, you will have most benefit from the former (less changes, >> less transfers). The latter only speeds up the backup due to less reads (of >> unchanged data) from disk. >> >> best regards, Arjen >> >> >> >> ------------------------------ >> >> Message: 4 >> Date: Sun, 11 Apr 2021 00:06:19 +1000 >> From: Lindsay Mathieson >> To: pve-user at lists.proxmox.com >> Subject: Re: [PVE-User] PBS Incremental and stopped VM's >> Message-ID: <90b0f7df-ca15-f9cd-b76a-0f8f26e24917 at gmail.com> >> Content-Type: text/plain; charset=windows-1252; format=flowed >> >> On 10/04/2021 11:43 pm, Arjen via pve-user wrote: >>> Just to be clear: PBS always makes a full backup. The resulting data is >> deduplicated (before sending it to the server), which almost always reduces >> the writes to the server. >> >> Ah, I see now, thanks, I didn't understand that part of things. Looking >> at the logs of my 2nd backup, I see that stopped VM's had zero bytes >> written to the backup server. >> >> -- >> Lindsay >> >> >> >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> >> ------------------------------ >> >> End of pve-user Digest, Vol 157, Issue 12 >> ***************************************** >> >> From alain.pean at c2n.upsaclay.fr Tue Apr 13 10:57:09 2021 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Tue, 13 Apr 2021 10:57:09 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: Le 13/04/2021 ? 10:05, Piviul a ?crit?: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM > is not ever the same. If I access to the mask from 2 nodes (say node1 > and node2) the mask is a simple html form with only combo boxes. On > the third node (say node3) the mask is more sophisticated, can support > the drag and drop, has checkbox... in other word it's different. So I > would like to know why my three nodes doesn't have the same mask even > if they are at the same proxmox version and if there is a way that all > nodes shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. Hi Piviul, My guess would be that your nodes would have different versions of Proxmox packages. And not the same proxmox interface on each... The best thing would be to have the complete version of each package wich 'pveversion -v', but a shorter first information is to display, and copy paste just version here ? # pveversion Thanks Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From piviul at riminilug.it Wed Apr 14 09:37:45 2021 From: piviul at riminilug.it (Piviul) Date: Wed, 14 Apr 2021 09:37:45 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Il 13/04/21 10:57, Alain P?an ha scritto: > Hi Piviul, > > My guess would be that your nodes would have different versions of > Proxmox packages. And not the same proxmox interface on each... > > The best thing would be to have the complete version of each package > wich 'pveversion -v', but a shorter first information is to display, > and copy paste just version here ? > # pveversion > > Thanks I Alain, first of all thank you very much indeed to you and to all people answered this thread. I reply your message but the infos here should answer even the infos asked from Alwin... I send directly the output differences from the command pveversion with -v flag because all three nodes show the same "pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)" version. So I have launched the following command in all three nodes: # pveversion -v > pveversion.$(hostname) obtaining 3 differents files and I've done the diff between the first two files (referring to pve01 and pve02) and as expected there is no difference: $ diff pveversion.pve0{1,2} Then I have done the diff between the first and the third node and this is the result: $ diff pveversion.pve0{1,3} 5d4 < pve-kernel-5.3: 6.1-6 8,9c7 < pve-kernel-5.3.18-3-pve: 5.3.18-3 < pve-kernel-5.3.10-1-pve: 5.3.10-1 --- > pve-kernel-5.4.34-1-pve: 5.4.34-2 there are some little differences yes but in kernel that are not in use any more (in all 3 nodes uname -r shows 5.4.106-1-pve)... Attached you can find all three files hoping the system doesn't cut them. Please can I ask you if you have a 6.3 node in your installations that was previously in 6.2 version (i.e. not installed directly in 6.3 version)? Can you tell me if the "Boot order" musk is the one with only combo boxes or the more evoluted drag and drop musk? Thank you very much Piviul -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.10-1-pve: 5.3.10-1 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.10-1-pve: 5.3.10-1 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 -------------- next part -------------- proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.3-6 (running version: 6.3-6/2184247e) pve-kernel-5.4: 6.3-8 pve-kernel-helper: 6.3-8 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.4.34-1-pve: 5.4.34-2 ceph: 14.2.19-pve1 ceph-fuse: 14.2.19-pve1 corosync: 3.1.0-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-5 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-8 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.13-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-9 pve-cluster: 6.2-1 pve-container: 3.3-4 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-5 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-10 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 From elacunza at binovo.es Wed Apr 14 11:04:10 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 11:04:10 +0200 Subject: PVE 6.2 Strange cluster node fence Message-ID: <4cd1b5fc-4c66-77d9-6af8-82831ca37f76@binovo.es> Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) --- Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] link: host: 3 link: 0 is down Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:14 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 has no active links Apr 13 11:35:15 proxmox1 corosync[1410]:?? [TOTEM ] Token has not been received in 61 ms Apr 13 11:35:15 proxmox1 corosync[1410]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:35:18 proxmox1 corosync[1410]:?? [KNET? ] rx: host: 3 link: 0 is up Apr 13 11:35:18 proxmox1 corosync[1410]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:18 proxmox1 corosync[1410]:?? [TOTEM ] Token has not been received in 3069 ms Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] A new membership (2.5477) was formed. Members left: 1 3 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] Failed to receive the leave message. failed: 1 3 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] notice: members: 2/1398 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] This node is within the non-primary component and will NOT provide any services. Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] Members[1]: 2 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [MAIN? ] Completed service synchronization, ready to provide service. Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: members: 2/1398 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: node lost quorum Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: received write while not quorate - trigger resync Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: leaving CPG group Apr 13 11:35:19 proxmox1 corosync[1410]:?? [TOTEM ] A new membership (1.547b) was formed. Members joined: 1 3 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: starting data syncronisation Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] This node is within the primary component and will provide service. Apr 13 11:35:19 proxmox1 corosync[1410]:?? [QUORUM] Members[3]: 1 2 3 Apr 13 11:35:19 proxmox1 corosync[1410]:?? [MAIN? ] Completed service synchronization, ready to provide service. Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: node has quorum Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: received all states Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [status] notice: all data is up to date Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] notice: start cluster connection Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_join failed: 14 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: can't initialize service Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:19 proxmox1 pve-ha-lrm[1770]: lost lock 'ha_agent_proxmox1_lock - cfs lock update failed - Device or resource busy Apr 13 11:35:20 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:21 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:22 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:23 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:23 proxmox1 pve-ha-lrm[1770]: status change active => lost_agent_lock Apr 13 11:35:24 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:25 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-nnAQGx/qb): Bad message (74) Apr 13 11:35:27 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:28 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:29 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:30 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:31 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-JDQj3Z/qb): Bad message (74) Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:33 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:35:34 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:35 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:36 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:37 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-jgBffR/qb): Bad message (74) Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:39 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:40 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:41 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:42 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:43 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-dWqAg7/qb): Bad message (74) Apr 13 11:35:45 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:46 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:47 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:48 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 7 more ...] Apr 13 11:35:49 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-LnKe7L/qb): Bad message (74) Apr 13 11:35:51 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:52 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:53 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 5 more ...] Apr 13 11:35:54 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 9 more ...] Apr 13 11:35:55 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 0 Apr 13 11:35:55 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 0 Apr 13 11:35:55 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-dXTlNP/qb): Bad message (74) Apr 13 11:35:57 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_local_get failed: 2 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:58 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:35:59 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:00 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [... 9 more ...] Apr 13 11:36:01 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:01 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:01 proxmox1 pvesr[938407]: trying to acquire cfs lock 'file-replication_cfg' ... Apr 13 11:36:01 proxmox1 corosync[1410]:?? [QB??? ] request returned error (/dev/shm/qb-1410-1398-33-17q0ii/qb): Bad message (74) Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 2 Apr 13 11:36:03 proxmox1 pvesr[938407]: trying to acquire cfs lock 'file-replication_cfg' ... Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 Apr 13 11:36:03 proxmox1 pmxcfs[1398]: [dcdb] crit: cpg_send_message failed: 9 [reset garbage) --- proxmox2 log: --- Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] link: host: 3 link: 0 is down Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:15 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 has no active links Apr 13 11:35:15 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 1237 ms Apr 13 11:35:15 proxmox2 corosync[1402]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:35:17 proxmox2 corosync[1402]:?? [KNET? ] rx: host: 3 link: 0 is up Apr 13 11:35:17 proxmox2 corosync[1402]:?? [KNET? ] host: host: 3 (passive) best link: 0 (pri: 1) Apr 13 11:35:18 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 4637 ms Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.5477) was formed. Members left: 2 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] Failed to receive the leave message. failed: 2 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.547b) was formed. Members joined: 2 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:35:19 proxmox2 corosync[1402]:?? [QUORUM] Members[3]: 1 2 3 Apr 13 11:35:19 proxmox2 corosync[1402]:?? [MAIN? ] Completed service synchronization, ready to provide service. Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: cpg_send_message retried 1 times Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: starting data syncronisation Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000006) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (0) updates Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [dcdb] notice: dfsm_deliver_queue: queue length 4 Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: received all states Apr 13 11:35:19 proxmox2 pmxcfs[1396]: [status] notice: all data is up to date Apr 13 11:36:00 proxmox2 systemd[1]: Starting Proxmox VE replication runner... Apr 13 11:36:00 proxmox2 systemd[1]: pvesr.service: Succeeded. Apr 13 11:36:00 proxmox2 systemd[1]: Started Proxmox VE replication runner. Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 2/1398, 3/1457 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000007) Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (8) updates Apr 13 11:36:13 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] link: host: 2 link: 0 is down Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] host: host: 2 (passive) best link: 0 (pri: 1) Apr 13 11:36:25 proxmox2 corosync[1402]:?? [KNET? ] host: host: 2 has no active links Apr 13 11:36:26 proxmox2 corosync[1402]:?? [TOTEM ] Token has not been received in 61 ms Apr 13 11:36:26 proxmox2 corosync[1402]:?? [TOTEM ] A processor failed, forming new configuration. Apr 13 11:36:28 proxmox2 corosync[1402]:?? [TOTEM ] A new membership (1.547f) was formed. Members left: 2 Apr 13 11:36:28 proxmox2 corosync[1402]:?? [TOTEM ] Failed to receive the leave message. failed: 2 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: starting data syncronisation Apr 13 11:36:28 proxmox2 corosync[1402]:?? [QUORUM] Members[2]: 1 3 Apr 13 11:36:28 proxmox2 corosync[1402]:?? [MAIN? ] Completed service synchronization, ready to provide service. Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: cpg_send_message retried 1 times Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: starting data syncronisation Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: received sync request (epoch 1/1396/00000008) Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: received sync request (epoch 1/1396/00000008) Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: received all states Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: leader is 1/1396 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: synced members: 1/1396, 3/1457 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: start sending inode updates Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: sent all (0) updates Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: all data is up to date Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [dcdb] notice: dfsm_deliver_queue: queue length 2 Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: received all states Apr 13 11:36:28 proxmox2 pmxcfs[1396]: [status] notice: all data is up to date Apr 13 11:36:29 proxmox2 pve-ha-crm[1801]: node 'proxmox1': state changed from 'online' => 'unknown' Apr 13 11:36:38 proxmox2 pvestatd[1553]: got timeout Apr 13 11:36:38 proxmox2 pvestatd[1553]: status update time (5.090 seconds) Apr 13 11:36:45 proxmox2 ceph-osd[1424]: 2021-04-13 11:36:45.407 7f94513df700 -1 osd.2 1166 heartbeat_check: no reply from 192.168.91.11:6820 osd.0 since back 2021-04-13 11:36:23.684429 front 2021-04-13 11:36:23.684422 (oldest deadline 2021-04-13 11:36:44.784447) --- all 3 nodes have the same running Proxmox versions: root at proxmox1:~# pveversion -v proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve) pve-manager: 6.3-3 (running version: 6.3-3/eee5f901) pve-kernel-5.4: 6.3-3 pve-kernel-helper: 6.3-3 pve-kernel-5.3: 6.1-6 pve-kernel-5.4.78-2-pve: 5.4.78-2 pve-kernel-5.4.65-1-pve: 5.4.65-1 pve-kernel-5.4.44-2-pve: 5.4.44-2 pve-kernel-5.3.18-3-pve: 5.3.18-3 pve-kernel-5.3.18-2-pve: 5.3.18-2 ceph: 14.2.16-pve1 ceph-fuse: 14.2.16-pve1 corosync: 3.0.4-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: 0.8.35+pve1 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.16-pve1 libproxmox-acme-perl: 1.0.7 libproxmox-backup-qemu0: 1.0.2-1 libpve-access-control: 6.1-3 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.3-2 libpve-guest-common-perl: 3.1-3 libpve-http-server-perl: 3.1-1 libpve-storage-perl: 6.3-3 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.3-1 lxcfs: 4.0.3-pve3 novnc-pve: 1.1.0-1 proxmox-backup-client: 1.0.6-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.4-3 pve-cluster: 6.2-1 pve-container: 3.3-2 pve-docs: 6.3-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.1-3 pve-ha-manager: 3.1-1 pve-i18n: 2.2-2 pve-qemu-kvm: 5.1.0-7 pve-xtermjs: 4.7.0-3 qemu-server: 6.3-2 smartmontools: 7.1-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 0.8.5-pve1 We are upgrading the cluster in the next days as part of our 3-month upgrade cycle, but can wait. Any ideas? Could this be a bug? Thanks a lot Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From mir at miras.org Wed Apr 14 11:21:09 2021 From: mir at miras.org (Michael Rasmussen) Date: Wed, 14 Apr 2021 11:21:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <20210414112109.6f57f496@sleipner.datanom.net> On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user wrote: > Hi all, > > Yesterday we had a strange fence happen in a PVE 6.2 cluster. > > Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been > operating normally for a year. Last update was on January 21st 2021. > Storage is Ceph and nodes are connected to the same network switch > with active-pasive bonds. > > proxmox1 was fenced and automatically rebooted, then everything > recovered. HA restarted VMs in other nodes too. > > proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc https://pgp.key-server.io/pks/lookup?search=0xD3C9A00E mir datanom net https://pgp.key-server.io/pks/lookup?search=0xE501F51C mir miras org https://pgp.key-server.io/pks/lookup?search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: When I woke up this morning, my girlfriend asked if I had slept well. I said, "No, I made a few mistakes." -- Steven Wright -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 488 bytes Desc: OpenPGP digital signature URL: From elacunza at binovo.es Wed Apr 14 12:12:09 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 12:12:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <2d33e64d-ee43-0a3b-0a24-538df9ef837c@binovo.es> Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: > On Wed, 14 Apr 2021 11:04:10 +0200 > Eneko Lacunza via pve-user wrote: > >> Hi all, >> >> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >> >> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >> operating normally for a year. Last update was on January 21st 2021. >> Storage is Ceph and nodes are connected to the same network switch >> with active-pasive bonds. >> >> proxmox1 was fenced and automatically rebooted, then everything >> recovered. HA restarted VMs in other nodes too. >> >> proxmox1 syslog: (no network link issues reported at device level) > I have seen this occasionally and every time the cause was high network > load/network congestion which caused token timeout. The default token > timeout in corosync IMHO is very optimistically configured to 1000 ms > so I have changed this setting to 5000 ms and after I have done this I > have never seen fencing happening caused by network load/network > congestion again. You could try this and see if that helps you. > > PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From smr at kmi.com Wed Apr 14 13:22:33 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 11:22:33 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? Do you have them on separate VLANs on your switches? Are you running 1 or 2 corosync rings? Please post your /etc/network/interfaces and explain which interface connects where. Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From elacunza at binovo.es Wed Apr 14 15:18:13 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 15:18:13 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> Message-ID: <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: > Hi Eneko > > Do you have separate physical interfaces for the cluster (corosync) > traffic? No. > Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. > Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { ? node { ??? name: proxmox1 ??? nodeid: 2 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.11 ? } ? node { ??? name: proxmox2 ??? nodeid: 1 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.12 ? } ? node { ??? name: proxmox3 ??? nodeid: 3 ??? quorum_votes: 1 ??? ring0_addr: 192.168.90.13 ? } } quorum { ? provider: corosync_votequorum } totem { ? cluster_name: CLUSTERNAME ? config_version: 3 ? interface { ??? linknumber: 0 ? } ? ip_version: ipv4-6 ? secauth: on ? version: 2 } > > Please post your /etc/network/interfaces and explain which interface > connects where. auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual ??? bond-slaves ens2f0np0 ens2f1np1 ??? bond-miimon 100 ??? bond-mode active-backup ??? bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static ??? address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static ??? address 192.168.90.11 ??? gateway 192.168.90.1 ??? bridge-ports bond0 ??? bridge-stp off ??? bridge-fd 0 Thanks > > Thanks > > Stefan > > >> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >> > wrote: >> >> >> *From: *Eneko Lacunza > >> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >> *Date: *April 14, 2021 at 12:12:09 GMT+2 >> *To: *pve-user at lists.proxmox.com >> >> >> Hi Michael, >> >> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>> On Wed, 14 Apr 2021 11:04:10 +0200 >>> Eneko Lacunza via pve-user>> > ?wrote: >>> >>>> Hi all, >>>> >>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>> >>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>> operating normally for a year. Last update was on January 21st 2021. >>>> Storage is Ceph and nodes are connected to the same network switch >>>> with active-pasive bonds. >>>> >>>> proxmox1 was fenced and automatically rebooted, then everything >>>> recovered. HA restarted VMs in other nodes too. >>>> >>>> proxmox1 syslog: (no network link issues reported at device level) >>> I have seen this occasionally and every time the cause was high network >>> load/network congestion which caused token timeout. The default token >>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>> so I have changed this setting to 5000 ms and after I have done this I >>> have never seen fencing happening caused by network load/network >>> congestion again. You could try this and see if that helps you. >>> >>> PS. my cluster communication is on a dedicated gb bonded vlan. >> Thanks for the info. In this case network is 10Gbit (I see I didn't >> include this info) but only for proxmox nodes: >> >> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >> - Both switches are interconnected with a SFP+ DAC >> - Active-passive Bonds in each proxmox node go one SFP+ interface on >> each switch. Primary interfaces are configured to be on the same switch. >> - Connectivity to the LAN is done with 1 Gbit link >> - Proxmox 2x10G Bond is used for VM networking and Ceph >> public/private networks. >> >> I wouldn't expect high network load/congestion because it's on an >> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >> ocurring during the fence. >> >> Network cards are Broadcom. >> >> Thanks >> >> Eneko Lacunza >> Zuzendari teknikoa | Director t?cnico >> Binovo IT Human Project >> >> Tel. +34 943 569 206 | https://www.binovo.es >> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >> >> https://www.youtube.com/user/CANALBINOVO >> >> https://www.linkedin.com/company/37269706/ >> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From smr at kmi.com Wed Apr 14 15:57:09 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 13:57:09 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> Message-ID: <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Hi Eneko That?s a nice setup and I bet it works well but you should do some hand-tuning to increase resilience. Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? If that?s the case I?d strongly recommend to turn them into dedicated untagged interfaces for the cluster traffic, running on two separate ?rings". https://pve.proxmox.com/wiki/Separate_Cluster_Network https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol Create two corosync rings, using isolated VLANs on your two switches e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. eno1 => Switch1 => VLAN4001 eno2 => Switch2 => VLAN4002 Restrict VLAN4001 to the access ports where the eno1 interfaces are connected. Prune VLAN4001 from ALL trunks. Restrict VLAN4001 to the access ports where the eno2 interfaces are connected. Prune VLAN4002 from ALL trunks. Assign the eno1 and eno2 interfaces to two separate subnets and you are done. With separate rings you don?t even have to stop your cluster while migrating corosync to the new subnets. Just do them one-by-one. With corosync running on two separate rings isolated from the rest of your network you should not see any further node fencing. Stefan On Apr 14, 2021, at 15:18, Eneko Lacunza > wrote: Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? No. Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { node { name: proxmox1 nodeid: 2 quorum_votes: 1 ring0_addr: 192.168.90.11 } node { name: proxmox2 nodeid: 1 quorum_votes: 1 ring0_addr: 192.168.90.12 } node { name: proxmox3 nodeid: 3 quorum_votes: 1 ring0_addr: 192.168.90.13 } } quorum { provider: corosync_votequorum } totem { cluster_name: CLUSTERNAME config_version: 3 interface { linknumber: 0 } ip_version: ipv4-6 secauth: on version: 2 } Please post your /etc/network/interfaces and explain which interface connects where. auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Thanks Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From elacunza at binovo.es Wed Apr 14 16:07:09 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 16:07:09 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Message-ID: Hi Stefan, Thanks for your advice. Seems a really good use for otherwise unused 1G ports so I'll look into configuring that. If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) Thanks El 14/4/21 a las 15:57, Stefan M. Radman escribi?: > Hi Eneko > > That?s a nice setup and I bet it works well but you should do some > hand-tuning to increase resilience. > > Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? > > If that?s the case I?d strongly recommend to turn them into dedicated > untagged interfaces for the cluster traffic, running on two separate > ?rings". > > https://pve.proxmox.com/wiki/Separate_Cluster_Network > > https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol > > > Create two corosync rings, using isolated VLANs on your two switches > e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. > > eno1 => Switch1 => VLAN4001 > eno2 => Switch2 => VLAN4002 > > Restrict VLAN4001 to the access ports where the eno1 interfaces are > connected. Prune VLAN4001 from ALL trunks. > Restrict VLAN4001 to the access ports where the eno2 interfaces are > connected. Prune VLAN4002 from ALL trunks. > Assign the eno1 and eno2 interfaces to two separate subnets and you > are done. > > With separate rings you don?t even have to stop your cluster while > migrating corosync to the new subnets. > Just do them one-by-one. > > With corosync running on two separate rings isolated from the rest of > your network you should not see any further node fencing. > > Stefan > >> On Apr 14, 2021, at 15:18, Eneko Lacunza > > wrote: >> >> Hi Stefan, >> >> El 14/4/21 a las 13:22, Stefan M. Radman escribi?: >>> Hi Eneko >>> >>> Do you have separate physical interfaces for the cluster (corosync) >>> traffic? >> No. >>> Do you have them on separate VLANs on your switches? >> Onyl Ceph traffic is on VLAN91, the rest is untagged. >> >>> Are you running 1 or 2 corosync rings? >> This is standard... no hand tuning: >> >> nodelist { >> ? node { >> ??? name: proxmox1 >> ??? nodeid: 2 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.11 >> ? } >> ? node { >> ??? name: proxmox2 >> ??? nodeid: 1 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.12 >> ? } >> ? node { >> ??? name: proxmox3 >> ??? nodeid: 3 >> ??? quorum_votes: 1 >> ??? ring0_addr: 192.168.90.13 >> ? } >> } >> >> quorum { >> ? provider: corosync_votequorum >> } >> >> totem { >> ? cluster_name: CLUSTERNAME >> ? config_version: 3 >> ? interface { >> ??? linknumber: 0 >> ? } >> ? ip_version: ipv4-6 >> ? secauth: on >> ? version: 2 >> } >> >>> >>> Please post your /etc/network/interfaces and explain which interface >>> connects where. >> auto lo >> iface lo inet loopback >> >> iface ens2f0np0 inet manual >> # Switch2 >> >> iface ens2f1np1 inet manual >> # Switch1 >> >> iface eno1 inet manual >> >> iface eno2 inet manual >> >> auto bond0 >> iface bond0 inet manual >> ??? bond-slaves ens2f0np0 ens2f1np1 >> ??? bond-miimon 100 >> ??? bond-mode active-backup >> ??? bond-primary ens2f0np1 >> >> auto bond0.91 >> iface bond0.91 inet static >> ??? address 192.168.91.11 >> #Ceph >> >> auto vmbr0 >> iface vmbr0 inet static >> ??? address 192.168.90.11 >> ??? gateway 192.168.90.1 >> ??? bridge-ports bond0 >> ??? bridge-stp off >> ??? bridge-fd 0 >> >> Thanks >>> >>> Thanks >>> >>> Stefan >>> >>> >>>> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >>>> > wrote: >>>> >>>> >>>> *From: *Eneko Lacunza > >>>> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >>>> *Date: *April 14, 2021 at 12:12:09 GMT+2 >>>> *To: *pve-user at lists.proxmox.com >>>> >>>> >>>> Hi Michael, >>>> >>>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>>> Eneko Lacunza via pve-user>>>> > ?wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>>> >>>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>>> operating normally for a year. Last update was on January 21st 2021. >>>>>> Storage is Ceph and nodes are connected to the same network switch >>>>>> with active-pasive bonds. >>>>>> >>>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>>> recovered. HA restarted VMs in other nodes too. >>>>>> >>>>>> proxmox1 syslog: (no network link issues reported at device level) >>>>> I have seen this occasionally and every time the cause was high >>>>> network >>>>> load/network congestion which caused token timeout. The default token >>>>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>>>> so I have changed this setting to 5000 ms and after I have done this I >>>>> have never seen fencing happening caused by network load/network >>>>> congestion again. You could try this and see if that helps you. >>>>> >>>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>>> Thanks for the info. In this case network is 10Gbit (I see I didn't >>>> include this info) but only for proxmox nodes: >>>> >>>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>>> - Both switches are interconnected with a SFP+ DAC >>>> - Active-passive Bonds in each proxmox node go one SFP+ interface >>>> on each switch. Primary interfaces are configured to be on the same >>>> switch. >>>> - Connectivity to the LAN is done with 1 Gbit link >>>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>>> public/private networks. >>>> >>>> I wouldn't expect high network load/congestion because it's on an >>>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>>> ocurring during the fence. >>>> >>>> Network cards are Broadcom. >>>> >>>> Thanks >>>> >>>> Eneko Lacunza >>>> Zuzendari teknikoa | Director t?cnico >>>> Binovo IT Human Project >>>> >>>> Tel. +34 943 569 206 | https://www.binovo.es >>>> >>>> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >>>> >>>> https://www.youtube.com/user/CANALBINOVO >>>> >>>> https://www.linkedin.com/company/37269706/ >>>> >>>> >>>> >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at lists.proxmox.com >>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 >>> >>> >>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>> and confidential information, or may otherwise be protected from >>> disclosure, and is intended solely for use of the intended >>> recipient(s). If you are not the intended recipient of this >>> communication, please notify the sender that you have received this >>> communication in error and delete and destroy all copies in your >>> possession. / >>> >> > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > EnekoLacunza Director T?cnico | Zuzendari teknikoa Binovo IT Human Project 943 569 206 elacunza at binovo.es binovo.es Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun youtube linkedin From smr at kmi.com Wed Apr 14 16:49:59 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 14:49:59 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> Message-ID: <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> Hi Eneko If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) That?s pretty unlikely. Usually they come in pairs ;) But yes, in that hypothetical case I?d use the available physical interface for ring1 and build ring2 from a tagged interface. For corosync interfaces I prefer two separate physical interfaces (simple, resilient). Bonding and tagging adds a layer of complexity you don?t want on a cluster heartbeat. Find below an actual configuration of a cluster with one node having just 2 interfaces while the other nodes all have 4. The 2 interfaces are configured in an HA bond like yours and the corosync rings are stacked on it as tagged interfaces in their specific VLANs. VLAN684 exists on switch1 only and VLAN685 exists on switch2 only. The most resilient solution under the circumstances given and has been working like a charm for several years now. Regards Stefan NODE1 - 4 interfaces ==================== iface eno1 inet manual #Gb1 - Trunk iface eno2 inet manual #Gb2 - Trunk auto eno3 iface eno3 inet static address 192.168.84.1 netmask 255.255.255.0 #Gb3 - COROSYNC1 - VLAN684 auto eno4 iface eno4 inet static address 192.168.85.1 netmask 255.255.255.0 #Gb4 - COROSYNC2 - VLAN685 auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode active-backup #HA Bundle Gb1/Gb2 - Trunk NODE3 - 2 interfaces ==================== iface eno1 inet manual #Gb1 - Trunk iface eno2 inet manual #Gb2 - Trunk auto bond0 iface bond0 inet manual slaves eno1 eno2 bond_miimon 100 bond_mode active-backup #HA Bundle Gb1/Gb2 - Trunk auto bond0.684 iface bond0.684 inet static address 192.168.84.3 netmask 255.255.255.0 #COROSYNC1 - VLAN684 auto bond0.685 iface bond0.685 inet static address 192.168.85.3 netmask 255.255.255.0 #COROSYNC2 - VLAN685 On Apr 14, 2021, at 16:07, Eneko Lacunza > wrote: Hi Stefan, Thanks for your advice. Seems a really good use for otherwise unused 1G ports so I'll look into configuring that. If nodes had only one 1G interface, would you also une RRP? (one ring on 1G and the other on 10G bond) Thanks El 14/4/21 a las 15:57, Stefan M. Radman escribi?: Hi Eneko That?s a nice setup and I bet it works well but you should do some hand-tuning to increase resilience. Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? If that?s the case I?d strongly recommend to turn them into dedicated untagged interfaces for the cluster traffic, running on two separate ?rings". https://pve.proxmox.com/wiki/Separate_Cluster_Network https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol Create two corosync rings, using isolated VLANs on your two switches e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. eno1 => Switch1 => VLAN4001 eno2 => Switch2 => VLAN4002 Restrict VLAN4001 to the access ports where the eno1 interfaces are connected. Prune VLAN4001 from ALL trunks. Restrict VLAN4001 to the access ports where the eno2 interfaces are connected. Prune VLAN4002 from ALL trunks. Assign the eno1 and eno2 interfaces to two separate subnets and you are done. With separate rings you don?t even have to stop your cluster while migrating corosync to the new subnets. Just do them one-by-one. With corosync running on two separate rings isolated from the rest of your network you should not see any further node fencing. Stefan On Apr 14, 2021, at 15:18, Eneko Lacunza > wrote: Hi Stefan, El 14/4/21 a las 13:22, Stefan M. Radman escribi?: Hi Eneko Do you have separate physical interfaces for the cluster (corosync) traffic? No. Do you have them on separate VLANs on your switches? Onyl Ceph traffic is on VLAN91, the rest is untagged. Are you running 1 or 2 corosync rings? This is standard... no hand tuning: nodelist { node { name: proxmox1 nodeid: 2 quorum_votes: 1 ring0_addr: 192.168.90.11 } node { name: proxmox2 nodeid: 1 quorum_votes: 1 ring0_addr: 192.168.90.12 } node { name: proxmox3 nodeid: 3 quorum_votes: 1 ring0_addr: 192.168.90.13 } } quorum { provider: corosync_votequorum } totem { cluster_name: CLUSTERNAME config_version: 3 interface { linknumber: 0 } ip_version: ipv4-6 secauth: on version: 2 } Please post your /etc/network/interfaces and explain which interface connects where. auto lo iface lo inet loopback iface ens2f0np0 inet manual # Switch2 iface ens2f1np1 inet manual # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Thanks Thanks Stefan On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 12:12:09 GMT+2 To: pve-user at lists.proxmox.com Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. Eneko Lacunza Director T?cnico | Zuzendari teknikoa Binovo IT Human Project [https://cdn2.hubspot.net/hubfs/53/tools/email-signature-generator/icons/phone-icon-2x.png] 943 569 206 [https://cdn2.hubspot.net/hubfs/53/tools/email-signature-generator/icons/email-icon-2x.png] elacunza at binovo.es [https://cdn2.hubspot.net/hubfs/53/tools/email-signature-generator/icons/link-icon-2x.png] binovo.es [https://cdn2.hubspot.net/hubfs/53/tools/email-signature-generator/icons/address-icon-2x.png] Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun [https://odooticketbai.com/wp-content/uploads/2020/10/Logo-Binovo-firmas-de-correo.png] [youtube] [linkedin] CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From elacunza at binovo.es Wed Apr 14 17:15:08 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 17:15:08 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> References: <450D534B-0CDE-48D2-AC6E-31C8EF9D72EE@kmi.com> <8f4316da-2676-3297-657d-ba3ff572614e@binovo.es> <276D0DE7-FA1E-470F-9933-890C6C9D4E5B@kmi.com> <3CC99343-0FD3-4F2F-8F05-119CBB61BC37@kmi.com> Message-ID: <647b9ba1-e3dd-e144-ec2a-0de375fb6072@binovo.es> Hi Stefan, El 14/4/21 a las 16:49, Stefan M. Radman escribi?: >> If nodes had only one 1G interface, would you also une RRP? (one ring >> on 1G and the other on 10G bond) > > That?s pretty unlikely. Usually they come in pairs ;) Right, unless you use "entry" level servers or DIY builds ;) > > But yes, in that hypothetical case I?d use the available physical > interface for ring1 and build ring2 from a tagged interface. > > For corosync interfaces I prefer two separate physical interfaces > (simple, resilient). > Bonding and tagging adds a layer of complexity you don?t want on a > cluster heartbeat. Sure. > > Find below an actual configuration of a cluster with one node having > just 2 interfaces while the other nodes all have 4. > The 2 interfaces are configured in an HA bond like yours and the > corosync rings are stacked on it as tagged interfaces in their > specific VLANs. > VLAN684 exists on switch1 only and VLAN685 exists on switch2 only. > The most resilient solution under the circumstances given and has been > working like a charm for several years now. Thanks for the examples! Cheers Eneko > > Regards > > Stefan > > NODE1 - 4 interfaces > ==================== > > iface eno1 inet manual > #Gb1 - Trunk > > iface eno2 inet manual > #Gb2 - Trunk > > auto eno3 > iface eno3 inet static > address192.168.84.1 > netmask255.255.255.0 > #Gb3 - COROSYNC1 - VLAN684 > > auto eno4 > iface eno4 inet static > address192.168.85.1 > netmask255.255.255.0 > #Gb4 - COROSYNC2 - VLAN685 > > auto bond0 > iface bond0 inet manual > slaves eno1 eno2 > bond_miimon 100 > bond_mode active-backup > #HA Bundle Gb1/Gb2 - Trunk > > > NODE3 - 2 interfaces > ==================== > > iface eno1 inet manual > #Gb1 - Trunk > > iface eno2 inet manual > #Gb2 - Trunk > > auto bond0 > iface bond0 inet manual > slaves eno1 eno2 > bond_miimon 100 > bond_mode active-backup > #HA Bundle Gb1/Gb2 - Trunk > > auto bond0.684 > iface bond0.684 inet static > address192.168.84.3 > netmask 255.255.255.0 > #COROSYNC1 - VLAN684 > > auto bond0.685 > iface bond0.685 inet static > address 192.168.85.3 > netmask 255.255.255.0 > #COROSYNC2 - VLAN685 > >> On Apr 14, 2021, at 16:07, Eneko Lacunza > > wrote: >> >> Hi Stefan, >> >> Thanks for your advice. Seems a really good use for otherwise unused >> 1G ports so I'll look into configuring that. >> >> If nodes had only one 1G interface, would you also une RRP? (one ring >> on 1G and the other on 10G bond) >> >> Thanks >> >> El 14/4/21 a las 15:57, Stefan M. Radman escribi?: >>> Hi Eneko >>> >>> That?s a nice setup and I bet it works well but you should do some >>> hand-tuning to increase resilience. >>> >>> Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces? >>> >>> If that?s the case I?d strongly recommend to turn them into >>> dedicated untagged interfaces for the cluster traffic, running on >>> two separate ?rings". >>> >>> https://pve.proxmox.com/wiki/Separate_Cluster_Network >>> >>> https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol >>> >>> >>> Create two corosync rings, using isolated VLANs on your two switches >>> e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2. >>> >>> eno1 => Switch1 => VLAN4001 >>> eno2 => Switch2 => VLAN4002 >>> >>> Restrict VLAN4001 to the access ports where the eno1 interfaces are >>> connected. Prune VLAN4001 from ALL trunks. >>> Restrict VLAN4001 to the access ports where the eno2 interfaces are >>> connected. Prune VLAN4002 from ALL trunks. >>> Assign the eno1 and eno2 interfaces to two separate subnets and you >>> are done. >>> >>> With separate rings you don?t even have to stop your cluster while >>> migrating corosync to the new subnets. >>> Just do them one-by-one. >>> >>> With corosync running on two separate rings isolated from the rest >>> of your network you should not see any further node fencing. >>> >>> Stefan >>> >>>> On Apr 14, 2021, at 15:18, Eneko Lacunza >>> > wrote: >>>> >>>> Hi Stefan, >>>> >>>> El 14/4/21 a las 13:22, Stefan M. Radman escribi?: >>>>> Hi Eneko >>>>> >>>>> Do you have separate physical interfaces for the cluster >>>>> (corosync) traffic? >>>> No. >>>>> Do you have them on separate VLANs on your switches? >>>> Onyl Ceph traffic is on VLAN91, the rest is untagged. >>>> >>>>> Are you running 1 or 2 corosync rings? >>>> This is standard... no hand tuning: >>>> >>>> nodelist { >>>> ? node { >>>> ??? name: proxmox1 >>>> ??? nodeid: 2 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.11 >>>> ? } >>>> ? node { >>>> ??? name: proxmox2 >>>> ??? nodeid: 1 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.12 >>>> ? } >>>> ? node { >>>> ??? name: proxmox3 >>>> ??? nodeid: 3 >>>> ??? quorum_votes: 1 >>>> ??? ring0_addr: 192.168.90.13 >>>> ? } >>>> } >>>> >>>> quorum { >>>> ? provider: corosync_votequorum >>>> } >>>> >>>> totem { >>>> ? cluster_name: CLUSTERNAME >>>> ? config_version: 3 >>>> ? interface { >>>> ??? linknumber: 0 >>>> ? } >>>> ? ip_version: ipv4-6 >>>> ? secauth: on >>>> ? version: 2 >>>> } >>>> >>>>> >>>>> Please post your /etc/network/interfaces and explain which >>>>> interface connects where. >>>> auto lo >>>> iface lo inet loopback >>>> >>>> iface ens2f0np0 inet manual >>>> # Switch2 >>>> >>>> iface ens2f1np1 inet manual >>>> # Switch1 >>>> >>>> iface eno1 inet manual >>>> >>>> iface eno2 inet manual >>>> >>>> auto bond0 >>>> iface bond0 inet manual >>>> ??? bond-slaves ens2f0np0 ens2f1np1 >>>> ??? bond-miimon 100 >>>> ??? bond-mode active-backup >>>> ??? bond-primary ens2f0np1 >>>> >>>> auto bond0.91 >>>> iface bond0.91 inet static >>>> ??? address 192.168.91.11 >>>> #Ceph >>>> >>>> auto vmbr0 >>>> iface vmbr0 inet static >>>> ??? address 192.168.90.11 >>>> ??? gateway 192.168.90.1 >>>> ??? bridge-ports bond0 >>>> ??? bridge-stp off >>>> ??? bridge-fd 0 >>>> >>>> Thanks >>>>> >>>>> Thanks >>>>> >>>>> Stefan >>>>> >>>>> >>>>>> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user >>>>>> > >>>>>> wrote: >>>>>> >>>>>> >>>>>> *From: *Eneko Lacunza >>>>> > >>>>>> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >>>>>> *Date: *April 14, 2021 at 12:12:09 GMT+2 >>>>>> *To: *pve-user at lists.proxmox.com >>>>>> >>>>>> >>>>>> Hi Michael, >>>>>> >>>>>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>>>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>>>>> Eneko Lacunza via pve-user>>>>>> > ?wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>>>>> >>>>>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>>>>> operating normally for a year. Last update was on January 21st >>>>>>>> 2021. >>>>>>>> Storage is Ceph and nodes are connected to the same network switch >>>>>>>> with active-pasive bonds. >>>>>>>> >>>>>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>>>>> recovered. HA restarted VMs in other nodes too. >>>>>>>> >>>>>>>> proxmox1 syslog: (no network link issues reported at device level) >>>>>>> I have seen this occasionally and every time the cause was high >>>>>>> network >>>>>>> load/network congestion which caused token timeout. The default >>>>>>> token >>>>>>> timeout in corosync IMHO is very optimistically configured to >>>>>>> 1000 ms >>>>>>> so I have changed this setting to 5000 ms and after I have done >>>>>>> this I >>>>>>> have never seen fencing happening caused by network load/network >>>>>>> congestion again. You could try this and see if that helps you. >>>>>>> >>>>>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>>>>> Thanks for the info. In this case network is 10Gbit (I see I >>>>>> didn't include this info) but only for proxmox nodes: >>>>>> >>>>>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>>>>> - Both switches are interconnected with a SFP+ DAC >>>>>> - Active-passive Bonds in each proxmox node go one SFP+ interface >>>>>> on each switch. Primary interfaces are configured to be on the >>>>>> same switch. >>>>>> - Connectivity to the LAN is done with 1 Gbit link >>>>>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>>>>> public/private networks. >>>>>> >>>>>> I wouldn't expect high network load/congestion because it's on an >>>>>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>>>>> ocurring during the fence. >>>>>> >>>>>> Network cards are Broadcom. >>>>>> >>>>>> Thanks >>>>>> >>>>>> Eneko Lacunza >>>>>> Zuzendari teknikoa | Director t?cnico >>>>>> Binovo IT Human Project >>>>>> >>>>>> Tel. +34 943 569 206 | https://www.binovo.es >>>>>> >>>>>> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >>>>>> >>>>>> https://www.youtube.com/user/CANALBINOVO >>>>>> >>>>>> https://www.linkedin.com/company/37269706/ >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> pve-user mailing list >>>>>> pve-user at lists.proxmox.com >>>>>> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C94935b3774c84a829c8008d8ff2dcd78%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637539919485970079%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=0Lc31YKv%2Fm4RQEsAZlcdsuA1XidEZEgfmAwRgGT4Dlg%3D&reserved=0 >>>>> >>>>> >>>>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>>>> and confidential information, or may otherwise be protected from >>>>> disclosure, and is intended solely for use of the intended >>>>> recipient(s). If you are not the intended recipient of this >>>>> communication, please notify the sender that you have received >>>>> this communication in error and delete and destroy all copies in >>>>> your possession. / >>>>> >>>> >>> >>> >>> CONFIDENTIALITY NOTICE: /This communication may contain privileged >>> and confidential information, or may otherwise be protected from >>> disclosure, and is intended solely for use of the intended >>> recipient(s). If you are not the intended recipient of this >>> communication, please notify the sender that you have received this >>> communication in error and delete and destroy all copies in your >>> possession. / >>> >> > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > EnekoLacunza Director T?cnico | Zuzendari teknikoa Binovo IT Human Project 943 569 206 elacunza at binovo.es binovo.es Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun youtube linkedin From alain.pean at c2n.upsaclay.fr Wed Apr 14 18:15:29 2021 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 14 Apr 2021 18:15:29 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> References: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Message-ID: Le 14/04/2021 ? 09:37, Piviul a ?crit?: > I Alain, first of all thank you very much indeed to you and to all > people answered this thread. I reply your message but the infos here > should answer even the infos asked from Alwin... > > I send directly the output differences from the command pveversion > with -v flag because all three nodes show the same > "pve-manager/6.3-6/2184247e (running kernel: 5.4.106-1-pve)" version. > > So I have launched the following command in all three nodes: > > # pveversion -v > pveversion.$(hostname) > > obtaining 3 differents files and I've done the diff between the first > two files (referring to pve01 and pve02) and as expected there is no > difference: > > $ diff pveversion.pve0{1,2} > > Then I have done the diff between the first and the third node and > this is the result: > > $ diff pveversion.pve0{1,3} > 5d4 > < pve-kernel-5.3: 6.1-6 > 8,9c7 > < pve-kernel-5.3.18-3-pve: 5.3.18-3 > < pve-kernel-5.3.10-1-pve: 5.3.10-1 > --- > > pve-kernel-5.4.34-1-pve: 5.4.34-2 > > there are some little differences yes but in kernel that are not in > use any more (in all 3 nodes uname -r shows 5.4.106-1-pve)... > > Attached you can find all three files hoping the system doesn't cut them. > > Please can I ask you if you have a 6.3 node in your installations that > was previously in 6.2 version (i.e. not installed directly in 6.3 > version)? Can you tell me if the "Boot order" musk is the one with > only combo boxes or the more evoluted drag and drop musk? Hi Piviul, I don't think only a difference in kernel could explain this difference in the web interface, if the other packages are the same. Did you try to clear the cache in your web browsers ? The attached files are indeed there. I looked at the versions, and all three appears up to date, so for me, the only origin that I can suppose could be the browser cache. Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From elacunza at binovo.es Wed Apr 14 18:26:08 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Wed, 14 Apr 2021 18:26:08 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <255c1bb1-a8f2-bec1-3ad9-6785e63d6dae@binovo.es> Hi, So I have figured out what likely happened. Indeed it was very likely a network congestion because proxmox1 and proxmox2 where using a switch and proxmox3 the other, due to proxmox1 and proxmox2 not having properly loaded the bond-primary directive (primary slave not shown on /proc/net/bonding/bond0 although it was present in /etc/network/interfaces). Additionally, just checked out that both switches are linked by a 1G port due to the 4th SFP+ port being used for the backup server... (against my recommendation during the cluster setup I must add...) So very likely it was network congestion that kicked proxmox1 out of the cluster. If seems that bond directives should be present in slaves too, like: auto lo iface lo inet loopback iface ens2f0np0 inet manual ??? bond-master bond0 ??? bond-primary ens2f0np1 # Switch2 iface ens2f1np1 inet manual ??? bond-master bond0 ??? bond-primary ens2f0np1 # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual ??? bond-slaves ens2f0np0 ens2f1np1 ??? bond-miimon 100 ??? bond-mode active-backup ??? bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static ??? address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static ??? address 192.168.90.11 ??? gateway 192.168.90.1 ??? bridge-ports bond0 ??? bridge-stp off ??? bridge-fd 0 Otherwise, it seems sometimes primary doesn't get configured properly... Thanks again Michael and Stefan! Eneko El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: > Hi Michael, > > El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >> On Wed, 14 Apr 2021 11:04:10 +0200 >> Eneko Lacunza via pve-user wrote: >> >>> Hi all, >>> >>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>> >>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>> operating normally for a year. Last update was on January 21st 2021. >>> Storage is Ceph and nodes are connected to the same network switch >>> with active-pasive bonds. >>> >>> proxmox1 was fenced and automatically rebooted, then everything >>> recovered. HA restarted VMs in other nodes too. >>> >>> proxmox1 syslog: (no network link issues reported at device level) >> I have seen this occasionally and every time the cause was high network >> load/network congestion which caused token timeout. The default token >> timeout in corosync IMHO is very optimistically configured to 1000 ms >> so I have changed this setting to 5000 ms and after I have done this I >> have never seen fencing happening caused by network load/network >> congestion again. You could try this and see if that helps you. >> >> PS. my cluster communication is on a dedicated gb bonded vlan. > Thanks for the info. In this case network is 10Gbit (I see I didn't > include this info) but only for proxmox nodes: > > - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches > - Both switches are interconnected with a SFP+ DAC > - Active-passive Bonds in each proxmox node go one SFP+ interface on > each switch. Primary interfaces are configured to be on the same switch. > - Connectivity to the LAN is done with 1 Gbit link > - Proxmox 2x10G Bond is used for VM networking and Ceph public/private > networks. > > I wouldn't expect high network load/congestion because it's on an > internal LAN, with 1Gbit clients. No Ceph issues/backfilling were > ocurring during the fence. > > Network cards are Broadcom. > > Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From humberto.freitas310 at gmail.com Wed Apr 14 18:57:01 2021 From: humberto.freitas310 at gmail.com (Humberto Freitas) Date: Wed, 14 Apr 2021 17:57:01 +0100 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 Message-ID: Hey guys, I hope everybody is all right ? I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? Appreciate your wisdom lol Thanks for your great work Sincerely, Humberto Freitas Phone: +244 944 775 334 Email: humberto.freitas310 at gmail.com Angola From ralf.storm at konzept-is.de Wed Apr 14 19:27:15 2021 From: ralf.storm at konzept-is.de (Ralf Storm) Date: Wed, 14 Apr 2021 19:27:15 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> Hello Humberto, I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. Never had a problem with it. I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. have fun with it best regards Ralf Am 14/04/2021 um 18:57 schrieb Humberto Freitas: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From smr at kmi.com Wed Apr 14 19:28:44 2021 From: smr at kmi.com (Stefan M. Radman) Date: Wed, 14 Apr 2021 17:28:44 +0000 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: References: Message-ID: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> Hi Eneko The redundant corosync rings would definitely have prevented the fencing even in your scenario. As a final note you should also consider replacing that 1GbE link between the switches by an Nx1GbE bundle (LACP) for redundancy and bandwidth reasons or at least by 2 x 1GbE secured by spanning tree (RSTP). Stefan On Apr 14, 2021, at 18:26, Eneko Lacunza via pve-user > wrote: From: Eneko Lacunza > Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence Date: April 14, 2021 at 18:26:08 GMT+2 To: pve-user at lists.proxmox.com Hi, So I have figured out what likely happened. Indeed it was very likely a network congestion because proxmox1 and proxmox2 where using a switch and proxmox3 the other, due to proxmox1 and proxmox2 not having properly loaded the bond-primary directive (primary slave not shown on /proc/net/bonding/bond0 although it was present in /etc/network/interfaces). Additionally, just checked out that both switches are linked by a 1G port due to the 4th SFP+ port being used for the backup server... (against my recommendation during the cluster setup I must add...) So very likely it was network congestion that kicked proxmox1 out of the cluster. If seems that bond directives should be present in slaves too, like: auto lo iface lo inet loopback iface ens2f0np0 inet manual bond-master bond0 bond-primary ens2f0np1 # Switch2 iface ens2f1np1 inet manual bond-master bond0 bond-primary ens2f0np1 # Switch1 iface eno1 inet manual iface eno2 inet manual auto bond0 iface bond0 inet manual bond-slaves ens2f0np0 ens2f1np1 bond-miimon 100 bond-mode active-backup bond-primary ens2f0np1 auto bond0.91 iface bond0.91 inet static address 192.168.91.11 #Ceph auto vmbr0 iface vmbr0 inet static address 192.168.90.11 gateway 192.168.90.1 bridge-ports bond0 bridge-stp off bridge-fd 0 Otherwise, it seems sometimes primary doesn't get configured properly... Thanks again Michael and Stefan! Eneko El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: Hi Michael, El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: On Wed, 14 Apr 2021 11:04:10 +0200 Eneko Lacunza via pve-user> wrote: Hi all, Yesterday we had a strange fence happen in a PVE 6.2 cluster. Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been operating normally for a year. Last update was on January 21st 2021. Storage is Ceph and nodes are connected to the same network switch with active-pasive bonds. proxmox1 was fenced and automatically rebooted, then everything recovered. HA restarted VMs in other nodes too. proxmox1 syslog: (no network link issues reported at device level) I have seen this occasionally and every time the cause was high network load/network congestion which caused token timeout. The default token timeout in corosync IMHO is very optimistically configured to 1000 ms so I have changed this setting to 5000 ms and after I have done this I have never seen fencing happening caused by network load/network congestion again. You could try this and see if that helps you. PS. my cluster communication is on a dedicated gb bonded vlan. Thanks for the info. In this case network is 10Gbit (I see I didn't include this info) but only for proxmox nodes: - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches - Both switches are interconnected with a SFP+ DAC - Active-passive Bonds in each proxmox node go one SFP+ interface on each switch. Primary interfaces are configured to be on the same switch. - Connectivity to the LAN is done with 1 Gbit link - Proxmox 2x10G Bond is used for VM networking and Ceph public/private networks. I wouldn't expect high network load/congestion because it's on an internal LAN, with 1Gbit clients. No Ceph issues/backfilling were ocurring during the fence. Network cards are Broadcom. Thanks Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ _______________________________________________ pve-user mailing list pve-user at lists.proxmox.com https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C6173285a195944ab306e08d8ff620c61%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637540143873213806%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=k%2FL7WhTr4ybZ%2FsKsx%2F49L3k7sjc2VA71xKwI8iH8buw%3D&reserved=0 CONFIDENTIALITY NOTICE: This communication may contain privileged and confidential information, or may otherwise be protected from disclosure, and is intended solely for use of the intended recipient(s). If you are not the intended recipient of this communication, please notify the sender that you have received this communication in error and delete and destroy all copies in your possession. From emmungil at yahoo.com Wed Apr 14 19:53:54 2021 From: emmungil at yahoo.com (LEVENT EMMUNGIL) Date: Wed, 14 Apr 2021 17:53:54 +0000 (UTC) Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 References: <211108960.1987020.1618422834880.ref@mail.yahoo.com> Message-ID: <211108960.1987020.1618422834880@mail.yahoo.com> Hi,I have been using proxmox on various hardware (Different vendors rangin from pc to enterprise).HP Proliant DL380 Gen7, Gen8, Gen9 and Gen10They are all worked well, and had no problem. Best wishes. From humberto.freitas310 at gmail.com Wed Apr 14 20:19:53 2021 From: humberto.freitas310 at gmail.com (Humberto Freitas) Date: Wed, 14 Apr 2021 19:19:53 +0100 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> References: <20068e92-58d5-1ce7-67d6-181bf97cc948@konzept-is.de> Message-ID: <6FCF9326-8545-4073-B119-A245D4E3EE69@gmail.com> Hello Ralf, thank you so much for you fast response. > I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. Nice to know that. It looks promising ? > I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. > > Never had a problem with it. Yeah, Proxmox is such great software. Excellent work guys > I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. Thanks for the advice. It looks like this server has some issues with RAID drivers. I?ll keep this in mind ? > have fun with it Hell yeah... I?m just waiting for the final decision to get it and start working with it. I can?t wait. Perhaps I?ll say something about the installation so people can see that Proxmox is tested on this kind of server Once again thanks Ralf and all the community Sincerely, Humberto Freitas Phone: +244 944 775 334 Email: humberto.freitas310 at gmail.com Angola > On 14/04/2021, at 6:27 PM, Ralf Storm wrote: > > ?Hello Humberto, > > > I have it running for a customer on a DL380 Gen9 and on an even older one, works togeher like charm. > > I have installed Proxmox on so many different hardware in the past few years, from a small atom NUC, on "usual" pcs and up to "real" servers. > > Never had a problem with it. > > I always use the ZFS-options during install - better than using the raidcontrollers in my opinion. > > > have fun with it > > > best regards > > > Ralf > >> Am 14/04/2021 um 18:57 schrieb Humberto Freitas: >> Hey guys, I hope everybody is all right ? >> >> I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... >> >> I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? >> >> Appreciate your wisdom lol >> >> Thanks for your great work >> >> Sincerely, >> >> Humberto Freitas >> >> Phone: +244 944 775 334 >> Email: humberto.freitas310 at gmail.com >> Angola >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gaio at sv.lnf.it Thu Apr 15 09:38:48 2021 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 15 Apr 2021 09:38:48 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <20210415073848.GB3322@sv.lnf.it> Mandi! Humberto Freitas In chel di` si favelave... > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... I'm currently running PVE 5 on a ProLiant ML350p Gen8, that AFAIK is the 'tower' version of ProLiant DL380p Gen8. No trouble at all. If you have enough RAM, consider putting the controller in JBOD mode and install directly with ZFS software RAID. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From a.lauterer at proxmox.com Thu Apr 15 09:47:10 2021 From: a.lauterer at proxmox.com (Aaron Lauterer) Date: Thu, 15 Apr 2021 09:47:10 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: I personally still have one of those old boxes around. Works fine but regarding the disks and RAID controllers you should be aware that in my experience, booting from any of the disks did not work when put into JBOD mode. I ended up putting in an HBA controller which also needed new SAS cables as the ones it shipped with have plugs that are angled at 90? making it impossible to plug 2 of them into the HBA. On 4/14/21 6:57 PM, Humberto Freitas wrote: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From elacunza at binovo.es Thu Apr 15 09:55:42 2021 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 15 Apr 2021 09:55:42 +0200 Subject: [PVE-User] PVE 6.2 Strange cluster node fence In-Reply-To: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> References: <35DFA143-EE6D-41AD-A795-C58BE7253441@kmi.com> Message-ID: Hi Stefan, El 14/4/21 a las 19:28, Stefan M. Radman escribi?: > The redundant corosync rings would definitely have prevented the > fencing even in your scenario. Yes that's for sure ;) > > As a final note you should also consider replacing that 1GbE link > between the switches by an Nx1GbE bundle (LACP) for redundancy and > bandwidth reasons or at least by 2 x 1GbE secured by spanning tree (RSTP). I think we should interlink the switches with SFP+. Backups don't need that bandwith but the final say is not mine :( Thanks a lot Eneko > > Stefan > >> On Apr 14, 2021, at 18:26, Eneko Lacunza via pve-user >> > wrote: >> >> >> *From: *Eneko Lacunza > >> *Subject: **Re: [PVE-User] PVE 6.2 Strange cluster node fence* >> *Date: *April 14, 2021 at 18:26:08 GMT+2 >> *To: *pve-user at lists.proxmox.com >> >> >> Hi, >> >> So I have figured out what likely happened. >> >> Indeed it was very likely a network congestion because proxmox1 and >> proxmox2 where using a switch and proxmox3 the other, due to proxmox1 >> and proxmox2 not having properly loaded the bond-primary directive >> (primary slave not shown on /proc/net/bonding/bond0 although it was >> present in /etc/network/interfaces). >> >> Additionally, just checked out that both switches are linked by a 1G >> port due to the 4th SFP+ port being used for the backup server... >> (against my recommendation during the cluster setup I must add...) >> >> So very likely it was network congestion that kicked proxmox1 out of >> the cluster. >> >> If seems that bond directives should be present in slaves too, like: >> >> auto lo >> iface lo inet loopback >> >> iface ens2f0np0 inet manual >> ??? bond-master bond0 >> ??? bond-primary ens2f0np1 >> # Switch2 >> >> iface ens2f1np1 inet manual >> ??? bond-master bond0 >> ??? bond-primary ens2f0np1 >> # Switch1 >> >> iface eno1 inet manual >> >> iface eno2 inet manual >> >> auto bond0 >> iface bond0 inet manual >> ??? bond-slaves ens2f0np0 ens2f1np1 >> ??? bond-miimon 100 >> ??? bond-mode active-backup >> ??? bond-primary ens2f0np1 >> >> auto bond0.91 >> iface bond0.91 inet static >> ??? address 192.168.91.11 >> #Ceph >> >> auto vmbr0 >> iface vmbr0 inet static >> ??? address 192.168.90.11 >> ??? gateway 192.168.90.1 >> ??? bridge-ports bond0 >> ??? bridge-stp off >> ??? bridge-fd 0 >> >> Otherwise, it seems sometimes primary doesn't get configured properly... >> >> Thanks again Michael and Stefan! >> Eneko >> >> >> El 14/4/21 a las 12:12, Eneko Lacunza via pve-user escribi?: >>> Hi Michael, >>> >>> El 14/4/21 a las 11:21, Michael Rasmussen via pve-user escribi?: >>>> On Wed, 14 Apr 2021 11:04:10 +0200 >>>> Eneko Lacunza via pve-user>>> > wrote: >>>> >>>>> Hi all, >>>>> >>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster. >>>>> >>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been >>>>> operating normally for a year. Last update was on January 21st 2021. >>>>> Storage is Ceph and nodes are connected to the same network switch >>>>> with active-pasive bonds. >>>>> >>>>> proxmox1 was fenced and automatically rebooted, then everything >>>>> recovered. HA restarted VMs in other nodes too. >>>>> >>>>> proxmox1 syslog: (no network link issues reported at device level) >>>> I have seen this occasionally and every time the cause was high network >>>> load/network congestion which caused token timeout. The default token >>>> timeout in corosync IMHO is very optimistically configured to 1000 ms >>>> so I have changed this setting to 5000 ms and after I have done this I >>>> have never seen fencing happening caused by network load/network >>>> congestion again. You could try this and see if that helps you. >>>> >>>> PS. my cluster communication is on a dedicated gb bonded vlan. >>> Thanks for the info. In this case network is 10Gbit (I see I didn't >>> include this info) but only for proxmox nodes: >>> >>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches >>> - Both switches are interconnected with a SFP+ DAC >>> - Active-passive Bonds in each proxmox node go one SFP+ interface on >>> each switch. Primary interfaces are configured to be on the same switch. >>> - Connectivity to the LAN is done with 1 Gbit link >>> - Proxmox 2x10G Bond is used for VM networking and Ceph >>> public/private networks. >>> >>> I wouldn't expect high network load/congestion because it's on an >>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were >>> ocurring during the fence. >>> >>> Network cards are Broadcom. >>> >>> Thanks >> >> Eneko Lacunza >> Zuzendari teknikoa | Director t?cnico >> Binovo IT Human Project >> >> Tel. +34 943 569 206 | https://www.binovo.es >> Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun >> >> https://www.youtube.com/user/CANALBINOVO >> >> https://www.linkedin.com/company/37269706/ >> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.proxmox.com%2Fcgi-bin%2Fmailman%2Flistinfo%2Fpve-user&data=04%7C01%7Csmr%40kmi.com%7C6173285a195944ab306e08d8ff620c61%7Cc2283768b8d34e008f3d85b1b4f03b33%7C0%7C0%7C637540143873213806%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=k%2FL7WhTr4ybZ%2FsKsx%2F49L3k7sjc2VA71xKwI8iH8buw%3D&reserved=0 > > > CONFIDENTIALITY NOTICE: /This communication may contain privileged and > confidential information, or may otherwise be protected from > disclosure, and is intended solely for use of the intended > recipient(s). If you are not the intended recipient of this > communication, please notify the sender that you have received this > communication in error and delete and destroy all copies in your > possession. / > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 | https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From ralf.storm at konzept-is.de Thu Apr 15 09:59:17 2021 From: ralf.storm at konzept-is.de (Ralf Storm) Date: Thu, 15 Apr 2021 09:59:17 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: Hey Aron, why didn?t you get this booting? What was the errror? Never had any booting problems with proxmox, despite the ZFS issues, which are described in the documentation, with quick and easy solutions Am 15/04/2021 um 09:47 schrieb Aaron Lauterer: > I personally still have one of those old boxes around. > > Works fine but regarding the disks and RAID controllers you should be > aware that in my experience, booting from any of the disks did not > work when put into JBOD mode. I ended up putting in an HBA controller > which also needed new SAS cables as the ones it shipped with have > plugs that are angled at 90? making it impossible to plug 2 of them > into the HBA. > > > On 4/14/21 6:57 PM, Humberto Freitas wrote: >> Hey guys, I hope everybody is all right ? >> >> I?m seeking for an advice from the community on buying this server, >> HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m >> planning to install, in the beginning, an instance of Debian 10 and >> in it I?m planning to have the usual setting for an enterprise like a >> filesharing software, an ERP, web server, etc... >> >> I?ve looked up something on DuckDuckGo, and find few things except >> this: >> https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. >> Does the issues described in the page still exist? >> >> Appreciate your wisdom lol >> >> Thanks for your great work >> >> Sincerely, >> >> Humberto Freitas >> >> Phone: +244 944 775 334 >> Email: humberto.freitas310 at gmail.com >> Angola >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From a.lauterer at proxmox.com Thu Apr 15 11:32:22 2021 From: a.lauterer at proxmox.com (Aaron Lauterer) Date: Thu, 15 Apr 2021 11:32:22 +0200 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: <50abd43b-ad14-a8a7-f69b-db75764f28dd@proxmox.com> IIRC, it has been a while: once I switched the P420i to HBA/JBOD mode, I think it only supports HBA? It just did not present any option to boot from it either in the controller or BIOS settings. I think that has gotten better with the RAID controllers present in the G9 and later, but the P420i in the G8 (and G7?) are a bit horrible in that regard. On 4/15/21 9:59 AM, Ralf Storm wrote: > Hey Aron, > > why didn?t you get this booting? What was the errror? Never had any booting problems with proxmox, despite the ZFS issues, which are described in the documentation, with quick and easy solutions > > Am 15/04/2021 um 09:47 schrieb Aaron Lauterer: >> I personally still have one of those old boxes around. >> >> Works fine but regarding the disks and RAID controllers you should be aware that in my experience, booting from any of the disks did not work when put into JBOD mode. I ended up putting in an HBA controller which also needed new SAS cables as the ones it shipped with have plugs that are angled at 90? making it impossible to plug 2 of them into the HBA. >> >> >> On 4/14/21 6:57 PM, Humberto Freitas wrote: >>> Hey guys, I hope everybody is all right ? >>> >>> I?m seeking for an advice from the community on buying this server, HP ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning to install, in the beginning, an instance of Debian 10 and in it I?m planning to have the usual setting for an enterprise like a filesharing software, an ERP, web server, etc... >>> >>> I?ve looked up something on DuckDuckGo, and find few things except this: https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. Does the issues described in the page still exist? >>> >>> Appreciate your wisdom lol >>> >>> Thanks for your great work >>> >>> Sincerely, >>> >>> Humberto Freitas >>> >>> Phone: +244 944 775 334 >>> Email: humberto.freitas310 at gmail.com >>> Angola >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at lists.proxmox.com >>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From piviul at riminilug.it Thu Apr 15 16:03:08 2021 From: piviul at riminilug.it (Piviul) Date: Thu, 15 Apr 2021 16:03:08 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: <5c3d06aa-1bf8-ca1b-e826-3d2615685b9d@riminilug.it> Message-ID: Il 14/04/21 18:15, Alain P?an ha scritto: > Hi Piviul, > > I don't think only a difference in kernel could explain this > difference in the web interface, if the other packages are the same. > Did you try to clear the cache in your web browsers ? > > The attached files are indeed there. I looked at the versions, and all > three appears up to date, so for me, the only origin that I can > suppose could be the browser cache. But I'm sure it's not a cache browser because I have clear the cache and I have tested this problem in different browsers in different PCs... in my opinion there is a bug: during the node upgrade to 6.3, proxmox VE doesn't update the code that generate the Boot order option mask. Please can you verify if in your 6.3 proxomox nodes that are updates from previously 6.2 you can see the new drag and drop boot order mask? Thank you very much Piviul From lindsay.mathieson at gmail.com Thu Apr 15 17:43:28 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 16 Apr 2021 01:43:28 +1000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? Message-ID: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Setting up a home server (NUC i5) and it can only take a 2.5" SATA drive and a NVMe PCIe SSD - but I would really like a mirrored ZFS boot. Is it possible (and safe?) to use a 512GB SATA SSD and a 512GB NVMe PCi SSD in a zfs boot mirror? -- Lindsay From leesteken at protonmail.ch Thu Apr 15 17:51:57 2021 From: leesteken at protonmail.ch (Arjen) Date: Thu, 15 Apr 2021 15:51:57 +0000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? In-Reply-To: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> References: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Message-ID: On Thursday, April 15th, 2021 at 17:43, Lindsay Mathieson wrote: > Setting up a home server (NUC i5) and it can only take a 2.5" SATA drive > > and a NVMe PCIe SSD - but I would really like a mirrored ZFS boot. > > Is it possible (and safe?) to use a 512GB SATA SSD and a 512GB NVMe PCi > > SSD in a zfs boot mirror? It should work fine but the write speeds (and fsync/sec) will typically be the slowest of the two. I use a NVME M.2 mirrored by two SATA drives for my VMs, which I think is quite similar. Older systems sometimes cannot boot from NVME (PCIe). You might want to make sure, otherwise the redundancy won't help if the SATA one fails. From lindsay.mathieson at gmail.com Thu Apr 15 18:33:35 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 16 Apr 2021 02:33:35 +1000 Subject: [PVE-User] mirrored ZFS boot with SATA SSD & NVMe PCIe SSD? In-Reply-To: References: <7be3f7ca-405e-0b8a-6bd1-4d3a3e1bccc6@gmail.com> Message-ID: On 16/04/2021 1:51 am, Arjen via pve-user wrote: > It should work fine but the write speeds (and fsync/sec) will typically be the slowest of the two. Thanks, thought that would be the case. > I use a NVME M.2 mirrored by two SATA drives for my VMs, which I think is quite similar. > Older systems sometimes cannot boot from NVME (PCIe). You might want to make sure, otherwise the redundancy won't help if the SATA one fails. Didn't know that, will check. -- Lindsay From piviul at riminilug.it Fri Apr 16 16:16:26 2021 From: piviul at riminilug.it (Piviul) Date: Fri, 16 Apr 2021 16:16:26 +0200 Subject: [PVE-User] Edit: Boot Order mask In-Reply-To: References: Message-ID: <3e550f7c-4f26-3573-63e8-d1e544096b82@riminilug.it> Il 13/04/21 10:05, Piviul ha scritto: > I ask[?] about this little problem on the forum but nobody found a > solution, so I try here... > > In my PVE the mask where I can change the Boot Order options of a VM > is not ever the same. If I access to the mask from 2 nodes (say node1 > and node2) the mask is a simple html form with only combo boxes. On > the third node (say node3) the mask is more sophisticated, can support > the drag and drop, has checkbox... in other word it's different. So I > would like to know why my three nodes doesn't have the same mask even > if they are at the same proxmox version and if there is a way that all > nodes shows the same mask. > > I ask you because this is not only a layout problem; if I modify the > boot order options from the node3, I can see strange chars in the PVE > gui of the other two nodes but if I configure the boot order options > from node1 or node2 all seems works flawless. The problem has been solved reinstalling pve-manager with the command # apt install --reinstall pve-manager |Thank you very much to all list members Have a great day! Piviul | || || From lindsay.mathieson at gmail.com Mon Apr 19 02:52:09 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 19 Apr 2021 10:52:09 +1000 Subject: [PVE-User] unpriviliged lxc uid/gid mappings Message-ID: I must say, I find the subject very confusing and difficult to parse. It seems very difficult to setup with multiple user and container mappings to maintain - I just setup 4 containers with 4 bind mounts each and after a lot of fiddling, got them working, but I'm not confident on maintenance for the future. I had to give up on the container that needed access to 2 USB tuners and a Intel QuickSync GPU (vaapi), ended up running that container privileged. Is there any plans to simplify it for the future? I found the LXD (4.0?) system of raw.idmap settings much easier to setup, I was able to generically script that for containers. -- Lindsay From lindsay.mathieson at gmail.com Mon Apr 19 02:53:30 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 19 Apr 2021 10:53:30 +1000 Subject: [PVE-User] unpriviliged lxc uid/gid mappings Message-ID: <190926b5-0c91-b8d3-e653-5425103c0c0d@gmail.com> I must say, I find the subject very confusing and difficult to parse. It seems very difficult to setup with multiple user and container mappings to maintain - I just setup 4 containers with 4 bind mounts each and after a lot of fiddling, got them working, but I'm not confident on maintenance for the future. I had to give up on the container that needed access to 2 USB tuners and a Intel QuickSync GPU (vaapi), ended up running that container privileged. Is there any plans to simplify it for the future? I found the LXD (4.0?) system of raw.idmap settings much easier to setup, I was able to generically script that for containers. Not complaining, I'm very happy with the overall setup I have at home - PX Media Server and a PBS Server, much easier to maintain than my old setup, and disaster recovery exists now :) -- Lindsay From leandro at tecnetmza.com.ar Mon Apr 19 14:29:19 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 19 Apr 2021 09:29:19 -0300 Subject: [PVE-User] Proxmox on a HP ProLiant DL380p Gen8 In-Reply-To: References: Message-ID: Humberto , WE bought an used hpe proliant dl380 gen8. It is working very nice so far. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El mi?, 14 abr 2021 a las 13:57, Humberto Freitas (< humberto.freitas310 at gmail.com>) escribi?: > Hey guys, I hope everybody is all right ? > > I?m seeking for an advice from the community on buying this server, HP > ProLiant DL380p Gen8. Has someone installed Proxmox VE on it. I?m planning > to install, in the beginning, an instance of Debian 10 and in it I?m > planning to have the usual setting for an enterprise like a filesharing > software, an ERP, web server, etc... > > I?ve looked up something on DuckDuckGo, and find few things except this: > https://forum.proxmox.com/threads/help-installing-proxmox-on-hp-proliant-server-dl380e-g8.18522/. > Does the issues described in the page still exist? > > Appreciate your wisdom lol > > Thanks for your great work > > Sincerely, > > Humberto Freitas > > Phone: +244 944 775 334 > Email: humberto.freitas310 at gmail.com > Angola > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From leandro at tecnetmza.com.ar Mon Apr 19 14:59:15 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Mon, 19 Apr 2021 09:59:15 -0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. Message-ID: Hi guys , I received a very old del pe 2950 box. Fortunately it has 64 GB and a double power supply, so i'm thinking about using it with pve. After confirm with dell support about storage capacity: Max physical storage support is 2TB. Max virtual storage support is also 2TB. ## I was reading on previous emails at this mail list , about storage: "putting the controller in JBOD mode and install directly with ZFS software RAID." I always thought that raid hardware controller was the best option but , perhaps I can give it a try to ZFS software RAID with this old server .... what do you think ? I have a bunch of 3.5" with odd capacities unused drives. I readed that zfs can merge them to get a more efficient use. What about hot replace / remove or insert a new drive ? Will it work without service disruption in production environments ? Regards, Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> From gaio at sv.lnf.it Mon Apr 19 15:38:49 2021 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 19 Apr 2021 15:38:49 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: <20210419133849.GN3268@sv.lnf.it> Mandi! Leandro Roggerone In chel di` si favelave... > What about hot replace / remove or insert a new drive ? > Will it work without service disruption in production environments ? AFAIk no. If i remember well, Linux SCSI/SATA subsystem have support for the hot-swap, but need also the support for the controller/cage/backpane/... So, basically, switching to JBOD/Passthroug, you lost host-swap. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From lists at benappy.com Mon Apr 19 15:57:04 2021 From: lists at benappy.com (Michel 'ic' Luczak) Date: Mon, 19 Apr 2021 15:57:04 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: Hi, > Max physical storage support is 2TB. > Max virtual storage support is also 2TB. I?m using a PE2950 with the stock SAS RAID card and 6 x 4TB SATA using uSATA adapters (small boards that clip in the back of SATA drives to ?make them SAS? (over simplifying things but you will need those)). There is no real limitation, you may only run into issues with 512e/512n/etc? but up to 4 or 6 TB you should be fine. More recently I used very modern 8TB SAS drives in an R410 without backplane and I had to cut the 3.3V power line to the drives because on modern drives it?s turning them off. Dell support knows only what was written in the manual at the date of release of the server. Regards, Michel From devzero at web.de Mon Apr 19 16:05:14 2021 From: devzero at web.de (Roland) Date: Mon, 19 Apr 2021 16:05:14 +0200 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: <20210419133849.GN3268@sv.lnf.it> References: <20210419133849.GN3268@sv.lnf.it> Message-ID: <4d87a555-3c3e-0819-404d-cbf65da0b9f1@web.de> shouldn't it be possible to replace existing controller with crossflashed perc h310 + sff-8484/sff-8087 cable? and you're done with the disk / hotswap limitation ? roland Am 19.04.21 um 15:38 schrieb Marco Gaiarin: > Mandi! Leandro Roggerone > In chel di` si favelave... > >> What about hot replace / remove or insert a new drive ? >> Will it work without service disruption in production environments ? > AFAIk no. > > If i remember well, Linux SCSI/SATA subsystem have support for the hot-swap, > but need also the support for the controller/cage/backpane/... > > So, basically, switching to JBOD/Passthroug, you lost host-swap. > From kyleaschmitt at gmail.com Mon Apr 19 17:08:04 2021 From: kyleaschmitt at gmail.com (Kyle Schmitt) Date: Mon, 19 Apr 2021 10:08:04 -0500 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: I run on R610s, so about the same generation I think. I use the hardware raid for mirrored boot drive only, and NFS over 10G for VM storage, which is on a seperate system running FreeBSD + ZFS. But for your case: the general rule for any storage system is don't mix. You pick one and only one: hardware raid, software raid, zfs, ceph, etc. The exceptions are for systems you almost definitely won't be using like luster and gluster. You can still do hotplug with JBOD mode, at least on the dell hardware I've used. I have no idea if it's officially supported or not. I do know that I sometimes have to use the raid tools to bring in a new drive when it's NOT in JBOD mode. In the 6ish years I've run ZFS I've only had one drive fail (pure luck, not skill), and it was trivial to swap out and replace. --Kyle On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone wrote: > > Hi guys , I received a very old del pe 2950 box. > Fortunately it has 64 GB and a double power supply, so i'm thinking about > using it with pve. > After confirm with dell support about storage capacity: > Max physical storage support is 2TB. > Max virtual storage support is also 2TB. > ## > I was reading on previous emails at this mail list , about storage: > "putting the controller in JBOD mode > and install directly with ZFS software RAID." > > I always thought that raid hardware controller was the best option but , > perhaps I can give it a try to ZFS software RAID with this old server .... > what do you think ? > I have a bunch of 3.5" with odd capacities unused drives. > I readed that zfs can merge them to get a more efficient use. > What about hot replace / remove or insert a new drive ? > Will it work without service disruption in production environments ? > > Regards, > Leandro. > > > Libre > de virus. www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From d.alexandris at gmail.com Mon Apr 19 22:20:18 2021 From: d.alexandris at gmail.com (Dimitri Alexandris) Date: Mon, 19 Apr 2021 23:20:18 +0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: I have an old Dell 1950 with one cpu (2 disks). Was 1+1G ram, added 16+16 and now have 34G. Two power supplies. 1- Never bothered to change SAS controller mode. 2- SATA disks (up to 2T of course) work perfectly, never used transposers. 3- First worked with an internal SSD (at internal SATA port, plus an addon power cable) for OS (always ZFS) + 2 SATA disks (ZFS raid) for data. 3 years, no problems. 4- Now i bought 2 SAS 1T, with OS on them, and also works fine. 5- Hot plugging disks is working fine. I actually increased capacity with bigger disks without stopping anything this way. Had 500G SATA before 1T SAS diks. I configure the 2 eths in OVS (openvswitch) bond mode, with several VLANS and internal networks (OVS IntPorts). In your case, with many SAS disks you will be fine with any configuration, e.g.: 3+3 x 2T = 6T ZFS raid, OS + data, the fastest combination, or 5+1 ZFS raidz, 10T space, with best capacity, or 4+2 ZFS raidz2, 8T space, also big, and safer. Do yourself a favour, and buy some decent SAS disks, at 7200rpm are very cheap. On Mon, Apr 19, 2021 at 6:08 PM Kyle Schmitt wrote: > I run on R610s, so about the same generation I think. I use the > hardware raid for mirrored boot drive only, and NFS over 10G for VM > storage, which is on a seperate system running FreeBSD + ZFS. > > But for your case: the general rule for any storage system is don't > mix. You pick one and only one: hardware raid, software raid, zfs, > ceph, etc. The exceptions are for systems you almost definitely won't > be using like luster and gluster. > > You can still do hotplug with JBOD mode, at least on the dell hardware > I've used. I have no idea if it's officially supported or not. I do > know that I sometimes have to use the raid tools to bring in a new > drive when it's NOT in JBOD mode. > > In the 6ish years I've run ZFS I've only had one drive fail (pure > luck, not skill), and it was trivial to swap out and replace. > > --Kyle > > On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone > wrote: > > > > Hi guys , I received a very old del pe 2950 box. > > Fortunately it has 64 GB and a double power supply, so i'm thinking about > > using it with pve. > > After confirm with dell support about storage capacity: > > Max physical storage support is 2TB. > > Max virtual storage support is also 2TB. > > ## > > I was reading on previous emails at this mail list , about storage: > > "putting the controller in JBOD mode > > and install directly with ZFS software RAID." > > > > I always thought that raid hardware controller was the best option but , > > perhaps I can give it a try to ZFS software RAID with this old server > .... > > what do you think ? > > I have a bunch of 3.5" with odd capacities unused drives. > > I readed that zfs can merge them to get a more efficient use. > > What about hot replace / remove or insert a new drive ? > > Will it work without service disruption in production environments ? > > > > Regards, > > Leandro. > > > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > Libre > > de virus. www.avast.com > > < > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From leandro at tecnetmza.com.ar Tue Apr 20 14:06:21 2021 From: leandro at tecnetmza.com.ar (Leandro Roggerone) Date: Tue, 20 Apr 2021 09:06:21 -0300 Subject: [PVE-User] get the most of storage for a very old dell pe 2950. In-Reply-To: References: Message-ID: Thank you guys. I will try to buy 6 new drives. Then I will let you know how it goes. Regards. Leandro. Libre de virus. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> El lun, 19 abr 2021 a las 17:21, Dimitri Alexandris () escribi?: > I have an old Dell 1950 with one cpu (2 disks). Was 1+1G ram, added 16+16 > and now have 34G. Two power supplies. > > 1- Never bothered to change SAS controller mode. > 2- SATA disks (up to 2T of course) work perfectly, never used transposers. > 3- First worked with an internal SSD (at internal SATA port, plus an addon > power cable) for OS (always ZFS) + 2 SATA disks (ZFS raid) for data. 3 > years, no problems. > 4- Now i bought 2 SAS 1T, with OS on them, and also works fine. > 5- Hot plugging disks is working fine. I actually increased capacity with > bigger disks without stopping anything this way. Had 500G SATA before 1T > SAS diks. > > I configure the 2 eths in OVS (openvswitch) bond mode, with several VLANS > and internal networks (OVS IntPorts). > > In your case, with many SAS disks you will be fine with any configuration, > e.g.: > > 3+3 x 2T = 6T ZFS raid, OS + data, the fastest combination, or > 5+1 ZFS raidz, 10T space, with best capacity, or > 4+2 ZFS raidz2, 8T space, also big, and safer. > > Do yourself a favour, and buy some decent SAS disks, at 7200rpm are very > cheap. > > > On Mon, Apr 19, 2021 at 6:08 PM Kyle Schmitt > wrote: > > > I run on R610s, so about the same generation I think. I use the > > hardware raid for mirrored boot drive only, and NFS over 10G for VM > > storage, which is on a seperate system running FreeBSD + ZFS. > > > > But for your case: the general rule for any storage system is don't > > mix. You pick one and only one: hardware raid, software raid, zfs, > > ceph, etc. The exceptions are for systems you almost definitely won't > > be using like luster and gluster. > > > > You can still do hotplug with JBOD mode, at least on the dell hardware > > I've used. I have no idea if it's officially supported or not. I do > > know that I sometimes have to use the raid tools to bring in a new > > drive when it's NOT in JBOD mode. > > > > In the 6ish years I've run ZFS I've only had one drive fail (pure > > luck, not skill), and it was trivial to swap out and replace. > > > > --Kyle > > > > On Mon, Apr 19, 2021 at 7:59 AM Leandro Roggerone > > wrote: > > > > > > Hi guys , I received a very old del pe 2950 box. > > > Fortunately it has 64 GB and a double power supply, so i'm thinking > about > > > using it with pve. > > > After confirm with dell support about storage capacity: > > > Max physical storage support is 2TB. > > > Max virtual storage support is also 2TB. > > > ## > > > I was reading on previous emails at this mail list , about storage: > > > "putting the controller in JBOD mode > > > and install directly with ZFS software RAID." > > > > > > I always thought that raid hardware controller was the best option but > , > > > perhaps I can give it a try to ZFS software RAID with this old server > > .... > > > what do you think ? > > > I have a bunch of 3.5" with odd capacities unused drives. > > > I readed that zfs can merge them to get a more efficient use. > > > What about hot replace / remove or insert a new drive ? > > > Will it work without service disruption in production environments ? > > > > > > Regards, > > > Leandro. > > > > > > < > > > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > > > Libre > > > de virus. www.avast.com > > > < > > > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail > > > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > > _______________________________________________ > > > pve-user mailing list > > > pve-user at lists.proxmox.com > > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at lists.proxmox.com > > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From piccardi at truelite.it Tue Apr 20 18:04:06 2021 From: piccardi at truelite.it (Simone Piccardi) Date: Tue, 20 Apr 2021 18:04:06 +0200 Subject: PBS backups on files Message-ID: <6f8ac760-2e6b-7050-1e47-0fed8d5d0b65@truelite.it> Hi, I just installed lasta version of Proxmox Backup System, where there is a new Tape Backup section. That's a good news, but it seems that is necessary to have tape driver, and that's not possible to use a simple file as destination for the backup. Just having only a removable cassette disk and not a tape the feature (which seemd very well made) is unfortunately useless to me. Since a removable disk is generally a simple and cheap solution for off-site backups, is there any possibility to extend this feature to save data on an ordinary file? Greetings Simone -- Simone Piccardi Truelite Srl piccardi at truelite.it (email/jabber) Via Monferrato, 6 Tel. +39-347-1032433 50142 Firenze http://www.truelite.it Tel. +39-055-7879597 From dietmar at proxmox.com Tue Apr 20 21:07:26 2021 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 20 Apr 2021 21:07:26 +0200 (CEST) Subject: [PVE-User] PBS backups on files Message-ID: <1276563634.4483.1618945646240@webmail.proxmox.com> > Since a removable disk is generally a simple and cheap solution for > off-site backups, is there any possibility to extend this feature to > save data on an ordinary file? Sync to a removable disk is unrelated to tape backup. But we have plans to support that also in the future... From piccardi at truelite.it Thu Apr 22 11:22:07 2021 From: piccardi at truelite.it (Simone Piccardi) Date: Thu, 22 Apr 2021 11:22:07 +0200 Subject: [PVE-User] PBS backups on files In-Reply-To: <1276563634.4483.1618945646240@webmail.proxmox.com> References: <1276563634.4483.1618945646240@webmail.proxmox.com> Message-ID: <390357c0-4249-bafb-8234-08cb9fe3792a@truelite.it> Il 20/04/21 21:07, Dietmar Maurer ha scritto: >> Since a removable disk is generally a simple and cheap solution for >> off-site backups, is there any possibility to extend this feature to >> save data on an ordinary file? > > Sync to a removable disk is unrelated to tape backup. > They seemed similar to me, because a tape is still a device file, so I thinked that just writing the same content into a standard file will do the job. > But we have plans to support that also in the future... > That's a good news. Simone -- Simone Piccardi Truelite Srl piccardi at truelite.it (email/jabber) Via Monferrato, 6 Tel. +39-347-1032433 50142 Firenze http://www.truelite.it Tel. +39-055-7879597 From jmr.richardson at gmail.com Tue Apr 27 20:38:04 2021 From: jmr.richardson at gmail.com (JR Richardson) Date: Tue, 27 Apr 2021 13:38:04 -0500 Subject: [PVE-User] Multi Data Center Cluster or Not Message-ID: Hi All, I'm looking for suggestions for geo-diversity using PROXMOX Clustering. I understand running hypervisors in the same cluster in multiple data centers is possible with high capacity/low latency inter-site links. What I'm learning is there could be better ways, like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS is interesting but would require manually restoring nodes should a failure occur. I'm looking for best practice or suggestions in topology that folks are using successfully or even tales of failure for what to avoid. Thanks. JR -- JR Richardson Engineering for the Masses Chasing the Azeotrope From aderumier at odiso.com Wed Apr 28 04:03:19 2021 From: aderumier at odiso.com (alexandre derumier) Date: Wed, 28 Apr 2021 04:03:19 +0200 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: References: Message-ID: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> Hi, If you want same cluster on multiple datacenter, you really need low latency (for proxmox && storage), and at least 3 datacenters to keep quorum. if you need a 2dc datacenter, with 1 primary && 1 backup as disaster recovery you could manually replicate a zfs or ceph storage to the backup dc (with snapshot export/import), or other storage replication feature if you have a san like netapp for example and do an rsync of /etc/pve. On 27/04/2021 20:38, JR Richardson wrote: > Hi All, > > I'm looking for suggestions for geo-diversity using PROXMOX > Clustering. I understand running hypervisors in the same cluster in > multiple data centers is possible with high capacity/low latency > inter-site links. What I'm learning is there could be better ways, > like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS > is interesting but would require manually restoring nodes should a > failure occur. > > I'm looking for best practice or suggestions in topology that folks > are using successfully or even tales of failure for what to avoid. > > Thanks. > > JR From t.lamprecht at proxmox.com Wed Apr 28 08:40:51 2021 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 28 Apr 2021 08:40:51 +0200 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> References: <807b442a-7b57-2918-986d-fda0db321b45@odiso.com> Message-ID: <5f38f579-fe59-763e-6919-be691912aa87@proxmox.com> On 28.04.21 04:03, alexandre derumier wrote: > On 27/04/2021 20:38, JR Richardson wrote: >> I'm looking for suggestions for geo-diversity using PROXMOX >> Clustering. I understand running hypervisors in the same cluster in >> multiple data centers is possible with high capacity/low latency >> inter-site links. What I'm learning is there could be better ways, >> like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS >> is interesting but would require manually restoring nodes should a >> failure occur. >> >> I'm looking for best practice or suggestions in topology that folks >> are using successfully or even tales of failure for what to avoid. > > If you want same cluster on multiple datacenter, you really need low latency (for proxmox && storage), and at least 3 datacenters to keep quorum. > > if you need a 2dc datacenter, with 1 primary && 1 backup as disaster recovery > > you could manually replicate a zfs or ceph storage to the backup dc (with snapshot export/import), or other storage replication feature if you have a san like netapp for example and do an rsync of /etc/pve. > We know of setups which use rbd-mirror to mirror their production Ceph pool to a second DC for recovery on failure. It's still needs a bit of hands-on approach on setup and actual recovery can be prepared too (pre-create matching VMs, maybe lock them by default so no start is done by accident). We also know some city-gov IT people which run their cluster over multiple DCs, but they have the luck to be able to run redundant fiber with LAN-like latency between those DCs, which may not be an option for everyone. A multi-datacenter management is planned, but we currently are still fleshing out the basis, albeit some features required for that to happen are in-work. Nothing ready to soon, though, just mentioning as FYI. cheers, Thomas From martin at proxmox.com Wed Apr 28 11:56:33 2021 From: martin at proxmox.com (Martin Maurer) Date: Wed, 28 Apr 2021 11:56:33 +0200 Subject: [PVE-User] Proxmox VE 6.4 released Message-ID: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Hi all, We are proud to announce the general availability of Proxmox Virtual Environment 6.4, our open-source virtualization platform. This version brings unified single-file restore for virtual machine (VM) and container (CT) backup archives stored on a Proxmox Backup Server as well as live restore of VM backup archives located on a Proxmox Backup Server. Version 6.4 also comes with Ceph Octopus 15.2.11 and Ceph Nautilus 14.2.20, many enhancements to KVM/QEMU, and notable bug fixes. Many new Ceph-specific management features have been added to the GUI. We have improved the integration of the placement group (PG) auto-scaler, and you can configure Target Size or Target Ratio settings in the GUI. The new version is based on Debian Buster 10.9, but using a newer, long-term supported Linux kernel 5.4. Optionally, the 5.11 kernel can be installed, providing support for the latest hardware. The latest versions of QEMU 5.2, LXC 4.0, and OpenZFS 2.0.4 have been included. There are some notable bug fixes and smaller improvements, see the full release notes. Release notes https://pve.proxmox.com/wiki/Roadmap Press release https://www.proxmox.com/en/news/press-releases/proxmox-virtual-environment-6-4-available Video tutorial https://www.proxmox.com/en/training/video-tutorials/item/what-s-new-in-proxmox-ve-6-4 Download https://www.proxmox.com/en/downloads Alternate ISO download: http://download.proxmox.com/iso Documentation https://pve.proxmox.com/pve-docs Community Forum https://forum.proxmox.com Source Code https://git.proxmox.com Bugtracker https://bugzilla.proxmox.com FAQ Q: Can I dist-upgrade Proxmox VE 6.x to 6.4 with apt? A: Yes, just via GUI or via CLI with apt update && apt dist-upgrade Q: Can I install Proxmox VE 6.4 on top of Debian Buster? A: Yes, see https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster Q: Can I upgrade my Proxmox VE 5.4 cluster with Ceph Luminous to 6.x and higher with Ceph Nautilus and even Ceph Octopus? A: This is a three step process. First, you have to upgrade Proxmox VE from 5.4 to 6.4, and afterwards upgrade Ceph from Luminous to Nautilus. There are a lot of improvements and changes, please follow exactly the upgrade documentation. https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0 https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus Finally, do the upgrade to Ceph Octopus - https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus Q: Where can I get more information about feature updates? A: Check our roadmap, forum, mailing lists, and subscribe to our newsletter. A big THANK YOU to our active community for all your feedback, testing, bug reporting and patch submitting! -- Best Regards, Martin Maurer Proxmox VE project leader From daniel at firewall-services.com Wed Apr 28 14:14:10 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:14:10 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Message-ID: <371322183.4890.1619612050577.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud a ?crit : > Hi. > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > VLAN tag, and not VLAN-aware themself). > I see several issues regarding SDN since the upgrade : > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > the status "available"). The other two do not appear anymore. Without paying > attention, I clicked on the "Apply" button. This wiped the > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > and reloaded their network stack. Needless to say it was a complete failure as > all the VM attached to one of those vnets lost network connectivity. I've > manually copied this /etc/network/interfaces.d/sdn file from the only working > node to the other two for now, but I can't make any change from the GUI now or > it'll do the same again > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > were displayed at all. But the Vnets correctly showed they were attached to my > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > it again from the GUI, which seemed to work. The only change it made to > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone Also, I have a lot of errors like this now : Apr 28 13:14:16 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:25 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:35 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:46 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. Apr 28 13:14:55 pvo6 pvestatd[2624]: sdn status update error: cannot lookup undefined type! at /usr/share/perl5/PVE/Network/SDN/Zones.pm line 260. On all the 3 nodes (even the one which still appears in the SDN Status page on the GUI) -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From daniel at firewall-services.com Wed Apr 28 14:10:19 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:10:19 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 Message-ID: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Hi. Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN feature with a single VLAN zone, and a few vnets (each one of the vnets using a VLAN tag, and not VLAN-aware themself). I see several issues regarding SDN since the upgrade : * The biggest issue is that in Datacenter -> SDN I only see a single node (with the status "available"). The other two do not appear anymore. Without paying attention, I clicked on the "Apply" button. This wiped the /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, and reloaded their network stack. Needless to say it was a complete failure as all the VM attached to one of those vnets lost network connectivity. I've manually copied this /etc/network/interfaces.d/sdn file from the only working node to the other two for now, but I can't make any change from the GUI now or it'll do the same again * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone were displayed at all. But the Vnets correctly showed they were attached to my zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding it again from the GUI, which seemed to work. The only change it made to /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone Anyone know what could be wrong ? Why would 2 (out of 3) nodes not showing up in the SDN status anymore ? The 3 nodes are fully up to date using the no-subscription repo, here's the complete pveversion : pve-manager/6.4-4/337d6701 (running kernel: 5.4.106-1-pve) root at pvo5:~# pveversion -v proxmox-ve: 6.4-1 (running kernel: 5.4.106-1-pve) pve-manager: 6.4-4 (running version: 6.4-4/337d6701) pve-kernel-5.4: 6.4-1 pve-kernel-helper: 6.4-1 pve-kernel-5.4.106-1-pve: 5.4.106-1 pve-kernel-5.4.103-1-pve: 5.4.103-1 pve-kernel-5.4.73-1-pve: 5.4.73-1 ceph-fuse: 12.2.11+dfsg1-2.1+b1 corosync: 3.1.2-pve1 criu: 3.11-3 glusterfs-client: 5.5-3 ifupdown: residual config ifupdown2: 3.0.0-1+pve3 ksm-control-daemon: 1.3-1 libjs-extjs: 6.0.1-10 libknet1: 1.20-pve1 libproxmox-acme-perl: 1.0.8 libproxmox-backup-qemu0: 1.0.3-1 libpve-access-control: 6.4-1 libpve-apiclient-perl: 3.1-3 libpve-common-perl: 6.4-2 libpve-guest-common-perl: 3.1-5 libpve-http-server-perl: 3.2-1 libpve-network-perl: 0.5-1 libpve-storage-perl: 6.4-1 libqb0: 1.0.5-1 libspice-server1: 0.14.2-4~pve6+1 lvm2: 2.03.02-pve4 lxc-pve: 4.0.6-2 lxcfs: 4.0.6-pve1 novnc-pve: 1.1.0-1 openvswitch-switch: 2.12.3-1 proxmox-backup-client: 1.1.5-1 proxmox-mini-journalreader: 1.1-1 proxmox-widget-toolkit: 2.5-3 pve-cluster: 6.4-1 pve-container: 3.3-5 pve-docs: 6.4-1 pve-edk2-firmware: 2.20200531-1 pve-firewall: 4.1-3 pve-firmware: 3.2-2 pve-ha-manager: 3.1-1 pve-i18n: 2.3-1 pve-qemu-kvm: 5.2.0-6 pve-xtermjs: 4.7.0-3 qemu-server: 6.4-1 smartmontools: 7.2-pve2 spiceterm: 3.1-1 vncterm: 1.6-2 zfsutils-linux: 2.0.4-pve1 root at pvo5:~# -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From daniel at firewall-services.com Wed Apr 28 14:39:38 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:39:38 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> Message-ID: <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud daniel at firewall-services.com a ?crit : > Hi. > > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > VLAN tag, and not VLAN-aware themself). > I see several issues regarding SDN since the upgrade : > > > > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > the status "available"). The other two do not appear anymore. Without paying > attention, I clicked on the "Apply" button. This wiped the > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > and reloaded their network stack. Needless to say it was a complete failure as > all the VM attached to one of those vnets lost network connectivity. I've > manually copied this /etc/network/interfaces.d/sdn file from the only working > node to the other two for now, but I can't make any change from the GUI now or > it'll do the same again > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > were displayed at all. But the Vnets correctly showed they were attached to my > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > it again from the GUI, which seemed to work. The only change it made to > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone > > I just opened https://bugzilla.proxmox.com/show_bug.cgi?id=3403 I checked another single node (no cluster) PVE install, which have the exact same issue, so it's not something specific on my setup, but a more general (and critical) bug Cheers, Daniel -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From gilberto.nunes32 at gmail.com Wed Apr 28 14:50:53 2021 From: gilberto.nunes32 at gmail.com (Gilberto Ferreira) Date: Wed, 28 Apr 2021 09:50:53 -0300 Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> Message-ID: Just curious: did you restart the upgraded nodes? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em qua., 28 de abr. de 2021 ?s 09:39, Daniel Berteaud escreveu: > > ----- Le 28 Avr 21, ? 14:10, Daniel Berteaud daniel at firewall-services.com a ?crit : > > > Hi. > > > > Just upgraded a small 3 nodes cluster to 6.4 today. This cluster used the SDN > > feature with a single VLAN zone, and a few vnets (each one of the vnets using a > > VLAN tag, and not VLAN-aware themself). > > I see several issues regarding SDN since the upgrade : > > > > > > > > * The biggest issue is that in Datacenter -> SDN I only see a single node (with > > the status "available"). The other two do not appear anymore. Without paying > > attention, I clicked on the "Apply" button. This wiped the > > /etc/network/interfaces.d/sdn file on the 2 nodes which do not appear anymore, > > and reloaded their network stack. Needless to say it was a complete failure as > > all the VM attached to one of those vnets lost network connectivity. I've > > manually copied this /etc/network/interfaces.d/sdn file from the only working > > node to the other two for now, but I can't make any change from the GUI now or > > it'll do the same again > > * In Datacenter -> SDN -> Zones, my single zone didn't appear anymore. No Zone > > were displayed at all. But the Vnets correctly showed they were attached to my > > zone. /etc/pve/sdn/zones.cfg correctly had my zone defined here. I tried adding > > it again from the GUI, which seemed to work. The only change it made to > > /etc/pve/sdn/zones.cfg is the new "ipam: pve" option added to the existing zone > > > > > > I just opened https://bugzilla.proxmox.com/show_bug.cgi?id=3403 > I checked another single node (no cluster) PVE install, which have the exact same issue, so it's not something specific on my setup, but a more general (and critical) bug > > Cheers, > Daniel > > > -- > [ https://www.firewall-services.com/ ] > Daniel Berteaud > FIREWALL-SERVICES SAS, La s?curit? des r?seaux > Soci?t? de Services en Logiciels Libres > T?l : +33.5 56 64 15 32 > Matrix: @dani:fws.fr > [ https://www.firewall-services.com/ | https://www.firewall-services.com ] > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Wed Apr 28 14:52:52 2021 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Wed, 28 Apr 2021 22:52:52 +1000 Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Message-ID: <08d5428a-b4dc-eec8-cd1a-8e0e66acfba8@gmail.com> Upgraded from 6.3 with no problems on my single node, non-ceph home server (CT only). Kernel update, so needed a reboot. On 28/04/2021 7:56 pm, Martin Maurer wrote: > This version brings unified single-file restore for virtual machine > (VM) and container (CT) backup archives stored on a Proxmox Backup Server Thats amazing, tested and works as advertised. Could see this being very useful. > as well as live restore of VM backup archives located on a Proxmox > Backup Server. How on earth do you do that? are you retrieving disk sectors on the fly as needed from the backup server? Great release! Will probably upgrade our PX/Ceph cluster at work over the weekend. -- Lindsay From daniel at firewall-services.com Wed Apr 28 14:57:27 2021 From: daniel at firewall-services.com (Daniel Berteaud) Date: Wed, 28 Apr 2021 14:57:27 +0200 (CEST) Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> Message-ID: <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> ----- Le 28 Avr 21, ? 14:50, Gilberto Ferreira gilberto.nunes32 at gmail.com a ?crit : > Just curious: did you restart the upgraded nodes? Yes, of course ;-) -- [ https://www.firewall-services.com/ ] Daniel Berteaud FIREWALL-SERVICES SAS, La s?curit? des r?seaux Soci?t? de Services en Logiciels Libres T?l : +33.5 56 64 15 32 Matrix: @dani:fws.fr [ https://www.firewall-services.com/ | https://www.firewall-services.com ] From gilberto.nunes32 at gmail.com Wed Apr 28 15:04:47 2021 From: gilberto.nunes32 at gmail.com (Gilberto Ferreira) Date: Wed, 28 Apr 2021 10:04:47 -0300 Subject: [PVE-User] SDN issues in 6.4 In-Reply-To: <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> References: <299675699.4847.1619611819855.JavaMail.zimbra@fws.fr> <47334094.5067.1619613578451.JavaMail.zimbra@fws.fr> <1733391914.5223.1619614647243.JavaMail.zimbra@fws.fr> Message-ID: Ok! Just checking. ? --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram --- Gilberto Nunes Ferreira (47) 99676-7530 - Whatsapp / Telegram Em qua., 28 de abr. de 2021 ?s 09:58, Daniel Berteaud < daniel at firewall-services.com> escreveu: > ----- Le 28 Avr 21, ? 14:50, Gilberto Ferreira gilberto.nunes32 at gmail.com > a ?crit : > > > Just curious: did you restart the upgraded nodes? > > Yes, of course ;-) > > -- > [ https://www.firewall-services.com/ ] > Daniel Berteaud > FIREWALL-SERVICES SAS, La s?curit? des r?seaux > Soci?t? de Services en Logiciels Libres > T?l : +33.5 56 64 15 32 > Matrix: @dani:fws.fr > [ https://www.firewall-services.com/ | https://www.firewall-services.com ] > > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From f.cuseo at panservice.it Wed Apr 28 16:49:12 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 16:49:12 +0200 (CEST) Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> Message-ID: <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> Great ! Regarding file restore, are you planning to support restore from LVM filesystem ? Regards, Fabrizio ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: > Hi all, > > We are proud to announce the general availability of Proxmox Virtual Environment > 6.4, our open-source virtualization platform. This version brings unified > single-file restore for virtual machine (VM) and container (CT) backup archives > stored on a Proxmox Backup Server as well as live restore of VM backup archives > -- --- Fabrizio Cuseo - mailto:f.cuseo at panservice.it Direzione Generale - Panservice InterNetWorking Servizi Professionali per Internet ed il Networking Panservice e' associata AIIP - RIPE Local Registry Phone: +39 0773 410020 - Fax: +39 0773 470219 http://www.panservice.it mailto:info at panservice.it Numero verde nazionale: 800 901492 From f.cuseo at panservice.it Wed Apr 28 17:08:28 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 17:08:28 +0200 (CEST) Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> Message-ID: <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> Wonderful ! And what about NTFS filesystems ? ----- Il 28-apr-21, alle 17:02, Stefan Reiter s.reiter at proxmox.com ha scritto: > On 28/04/2021 16:49, Fabrizio Cuseo wrote: >> Great ! >> Regarding file restore, are you planning to support restore from LVM filesystem >> ? >> > > Yes, we plan on adding support for LVM, ZFS and mdraid in the future. > For now only filesystems directly on partitions are supported. > > ~ Stefan > >> Regards, Fabrizio >> >> >> >> ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: >> >>> Hi all, >>> >>> We are proud to announce the general availability of Proxmox Virtual Environment >>> 6.4, our open-source virtualization platform. This version brings unified >>> single-file restore for virtual machine (VM) and container (CT) backup archives >>> stored on a Proxmox Backup Server as well as live restore of VM backup archives -- --- Fabrizio Cuseo - mailto:f.cuseo at panservice.it Direzione Generale - Panservice InterNetWorking Servizi Professionali per Internet ed il Networking Panservice e' associata AIIP - RIPE Local Registry Phone: +39 0773 410020 - Fax: +39 0773 470219 http://www.panservice.it mailto:info at panservice.it Numero verde nazionale: 800 901492 From s.reiter at proxmox.com Wed Apr 28 17:02:04 2021 From: s.reiter at proxmox.com (Stefan Reiter) Date: Wed, 28 Apr 2021 17:02:04 +0200 Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> Message-ID: <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> On 28/04/2021 16:49, Fabrizio Cuseo wrote: > Great ! > Regarding file restore, are you planning to support restore from LVM filesystem ? > Yes, we plan on adding support for LVM, ZFS and mdraid in the future. For now only filesystems directly on partitions are supported. ~ Stefan > Regards, Fabrizio > > > > ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: > >> Hi all, >> >> We are proud to announce the general availability of Proxmox Virtual Environment >> 6.4, our open-source virtualization platform. This version brings unified >> single-file restore for virtual machine (VM) and container (CT) backup archives >> stored on a Proxmox Backup Server as well as live restore of VM backup archives >> From s.reiter at proxmox.com Wed Apr 28 17:11:03 2021 From: s.reiter at proxmox.com (Stefan Reiter) Date: Wed, 28 Apr 2021 17:11:03 +0200 Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> Message-ID: On 28/04/2021 17:08, Fabrizio Cuseo wrote: > Wonderful ! > And what about NTFS filesystems ? > NTFS is supported already, file restore from Windows guests should be possible. > > ----- Il 28-apr-21, alle 17:02, Stefan Reiter s.reiter at proxmox.com ha scritto: > >> On 28/04/2021 16:49, Fabrizio Cuseo wrote: >>> Great ! >>> Regarding file restore, are you planning to support restore from LVM filesystem >>> ? >>> >> >> Yes, we plan on adding support for LVM, ZFS and mdraid in the future. >> For now only filesystems directly on partitions are supported. >> >> ~ Stefan >> >>> Regards, Fabrizio >>> >>> >>> >>> ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: >>> >>>> Hi all, >>>> >>>> We are proud to announce the general availability of Proxmox Virtual Environment >>>> 6.4, our open-source virtualization platform. This version brings unified >>>> single-file restore for virtual machine (VM) and container (CT) backup archives >>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives > From f.cuseo at panservice.it Wed Apr 28 17:15:56 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 17:15:56 +0200 (CEST) Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> Message-ID: <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it> I am trying with a Windows7 guest but I have this error: "proxmox-file-restore failed: Error: given image drive-virtio0.img.fidx' not found (500) ----- Il 28-apr-21, alle 17:11, Stefan Reiter s.reiter at proxmox.com ha scritto: > On 28/04/2021 17:08, Fabrizio Cuseo wrote: >> Wonderful ! >> And what about NTFS filesystems ? >> > > NTFS is supported already, file restore from Windows guests should be > possible. > >> >> ----- Il 28-apr-21, alle 17:02, Stefan Reiter s.reiter at proxmox.com ha scritto: >> >>> On 28/04/2021 16:49, Fabrizio Cuseo wrote: >>>> Great ! >>>> Regarding file restore, are you planning to support restore from LVM filesystem >>>> ? >>>> >>> >>> Yes, we plan on adding support for LVM, ZFS and mdraid in the future. >>> For now only filesystems directly on partitions are supported. >>> >>> ~ Stefan >>> >>>> Regards, Fabrizio >>>> >>>> >>>> >>>> ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: >>>> >>>>> Hi all, >>>>> >>>>> We are proud to announce the general availability of Proxmox Virtual Environment >>>>> 6.4, our open-source virtualization platform. This version brings unified >>>>> single-file restore for virtual machine (VM) and container (CT) backup archives >>>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives -- --- Fabrizio Cuseo - mailto:f.cuseo at panservice.it Direzione Generale - Panservice InterNetWorking Servizi Professionali per Internet ed il Networking Panservice e' associata AIIP - RIPE Local Registry Phone: +39 0773 410020 - Fax: +39 0773 470219 http://www.panservice.it mailto:info at panservice.it Numero verde nazionale: 800 901492 From s.reiter at proxmox.com Wed Apr 28 17:24:05 2021 From: s.reiter at proxmox.com (Stefan Reiter) Date: Wed, 28 Apr 2021 17:24:05 +0200 Subject: [PVE-User] Proxmox VE 6.4 released In-Reply-To: <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it> References: <4817f902-0e8b-5ecc-fc0d-8bfccc255bd8@proxmox.com> <1053194568.87396.1619621352934.JavaMail.zimbra@zimbra.panservice.it> <4aff8786-d753-a7da-5369-b00272b078c7@proxmox.com> <1965759498.87885.1619622508217.JavaMail.zimbra@zimbra.panservice.it> <1871322782.88053.1619622956498.JavaMail.zimbra@zimbra.panservice.it> Message-ID: <877fc689-fe7b-72df-fce0-9849b5d4870e@proxmox.com> On 28/04/2021 17:15, Fabrizio Cuseo wrote: > > I am trying with a Windows7 guest but I have this error: "proxmox-file-restore failed: Error: given image drive-virtio0.img.fidx' not found (500) > Ah that's an unrelated issue with virtio drives, should be fixed by 'proxmox-backup-file-restore 1.1.5-2' currently in pvetest. We found that issue a bit too late for the release, unfortunately. See: https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=606828cc65feb380c0f9536fe7ca277ea1dc20c1 > > > ----- Il 28-apr-21, alle 17:11, Stefan Reiter s.reiter at proxmox.com ha scritto: > >> On 28/04/2021 17:08, Fabrizio Cuseo wrote: >>> Wonderful ! >>> And what about NTFS filesystems ? >>> >> >> NTFS is supported already, file restore from Windows guests should be >> possible. >> >>> >>> ----- Il 28-apr-21, alle 17:02, Stefan Reiter s.reiter at proxmox.com ha scritto: >>> >>>> On 28/04/2021 16:49, Fabrizio Cuseo wrote: >>>>> Great ! >>>>> Regarding file restore, are you planning to support restore from LVM filesystem >>>>> ? >>>>> >>>> >>>> Yes, we plan on adding support for LVM, ZFS and mdraid in the future. >>>> For now only filesystems directly on partitions are supported. >>>> >>>> ~ Stefan >>>> >>>>> Regards, Fabrizio >>>>> >>>>> >>>>> >>>>> ----- Il 28-apr-21, alle 11:56, Martin Maurer martin at proxmox.com ha scritto: >>>>> >>>>>> Hi all, >>>>>> >>>>>> We are proud to announce the general availability of Proxmox Virtual Environment >>>>>> 6.4, our open-source virtualization platform. This version brings unified >>>>>> single-file restore for virtual machine (VM) and container (CT) backup archives >>>>>> stored on a Proxmox Backup Server as well as live restore of VM backup archives > From me at marcobertorello.it Wed Apr 28 17:34:14 2021 From: me at marcobertorello.it (Bertorello, Marco) Date: Wed, 28 Apr 2021 17:34:14 +0200 Subject: [PVE-User] Replication blocked issue Message-ID: <11f33c2d-472d-d2c8-d3e4-c5e4a99900e4@marcobertorello.it> Dear PVE users, I've a 3-nodes clusters, with ZFS storage. Every node use it's own storage and the VMs/LXCs are replicated across other nodes every 10 minutes. Some times happens that a replica job is running without an end. For example at the moment I have a replication started yesterday: 2021-04-27 07:20:01 101-1: start replication job 2021-04-27 07:20:01 101-1: guest => CT 101, running => 1 2021-04-27 07:20:01 101-1: volumes => DS1:subvol-101-disk-1 2021-04-27 07:20:02 101-1: freeze guest filesystem 2021-04-27 07:20:05 101-1: create snapshot '__replicate_101-1_1619500801__' on DS1:subvol-101-disk-1 2021-04-27 07:20:06 101-1: thaw guest filesystem 2021-04-27 07:20:06 101-1: using secure transmission, rate limit: none 2021-04-27 07:20:06 101-1: incremental sync 'DS1:subvol-101-disk-1' (__replicate_101-1_1619500201__ => __replicate_101-1_1619500801__) 2021-04-27 07:20:08 101-1: send from @__replicate_101-1_1619500201__ to zp1/subvol-101-disk-1 at __replicate_101-0_1619500211__ estimated size is 213K 2021-04-27 07:20:08 101-1: send from @__replicate_101-0_1619500211__ to zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ estimated size is 26.1M 2021-04-27 07:20:08 101-1: total estimated size is 26.4M 2021-04-27 07:20:09 101-1: TIME??????? SENT?? SNAPSHOT zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ 2021-04-27 07:20:09 101-1: 07:20:09?? 3.18M zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ [...] 2021-04-28 17:27:25 101-1: 17:27:25?? 3.18M zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ 2021-04-28 17:27:26 101-1: 17:27:26?? 3.18M zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ 2021-04-28 17:27:27 101-1: 17:27:27?? 3.18M zp1/subvol-101-disk-1 at __replicate_101-1_1619500801__ as you can see, no progress in this time slot, still 3.18M transferred. There are 2 big problems with this: 1) the blocked replica prevents the other replication scheduled on the source node to run until this replication ends or fail 2) I've no other solution but reboot the destination node to exit this situation. I tried to kill the process on the destination node, but the process is in D state and cannot be killed. There is a way to get out this scenario without reboot nodes? Thanks a lot and best regards, -- Marco Bertorello https://www.marcobertorello.it From leesteken at protonmail.ch Wed Apr 28 20:48:24 2021 From: leesteken at protonmail.ch (Arjen) Date: Wed, 28 Apr 2021 18:48:24 +0000 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: References: Message-ID: On Tuesday, April 27th, 2021 at 20:38, JR Richardson wrote: > Hi All, > > I'm looking for suggestions for geo-diversity using PROXMOX > Clustering. I understand running hypervisors in the same cluster in > multiple data centers is possible with high capacity/low latency > inter-site links. What I'm learning is there could be better ways, > like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS > is interesting but would require manually restoring nodes should a > failure occur. > > I'm looking for best practice or suggestions in topology that folks > are using successfully or even tales of failure for what to avoid. I haven't actually done this, so feel free to ignore this or inform me of problems with this approach: Set-up multiple Proxmox systems/clusters, each in a separate data center but don't cluster them over the data centers. Set-up a VPN that allows Proxmox and VMs in each data center to connect to the others. It does not need low latency. Have a PBS VM on each of them and backup your VMs (many times a day, if you want) to the local PBS and sync all the PBSs. Distribute the VMs manually over the different systems, so that the users have the lowest latency. Leave room for more VMs, this makes then operate more smooth and would allow taking over load from other systems. If a data center becomes unusable, restore the VMs that were running there on the other systems manually. In case of problems, nothing will be automated and you'll lose work since the most recent available backup, but at least you know that you have several other working Proxmox systems/clusters up and running and capable of restoring and running the affected VMs. The syncing of backups only depends on changes in the set of deduplicated chunks and does not need low latency or high speed. kind regards, Arjen From f.cuseo at panservice.it Wed Apr 28 21:06:23 2021 From: f.cuseo at panservice.it (Fabrizio Cuseo) Date: Wed, 28 Apr 2021 21:06:23 +0200 (CEST) Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: References: Message-ID: <814551923.94902.1619636783033.JavaMail.zimbra@zimbra.panservice.it> I have not read with attention, but try to read this article: https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring ----- Il 28-apr-21, alle 20:48, pve-user pve-user at lists.proxmox.com ha scritto: > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- --- Fabrizio Cuseo - mailto:f.cuseo at panservice.it Direzione Generale - Panservice InterNetWorking Servizi Professionali per Internet ed il Networking Panservice e' associata AIIP - RIPE Local Registry Phone: +39 0773 410020 - Fax: +39 0773 470219 http://www.panservice.it mailto:info at panservice.it Numero verde nazionale: 800 901492 From alex at calicolabs.com Thu Apr 29 20:04:59 2021 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 29 Apr 2021 11:04:59 -0700 Subject: [PVE-User] Multi Data Center Cluster or Not In-Reply-To: References: Message-ID: Yes, this is the way I do it, each proxmox cluster is independent in its own location but they all have access to NFS mounts or PBS server where I can dump the vzdump images. It is not too "HA" but you can backup/restore VMs from one place to another or spin up yesterday's version. It is sufficient for our use cases and could be good enough for your DR. On Wed, Apr 28, 2021 at 11:49 AM Arjen via pve-user < pve-user at lists.proxmox.com> wrote: > > > > ---------- Forwarded message ---------- > From: Arjen > To: Proxmox VE user list > Cc: > Bcc: > Date: Wed, 28 Apr 2021 18:48:24 +0000 > Subject: Re: [PVE-User] Multi Data Center Cluster or Not > On Tuesday, April 27th, 2021 at 20:38, JR Richardson < > jmr.richardson at gmail.com> wrote: > > > Hi All, > > > > I'm looking for suggestions for geo-diversity using PROXMOX > > Clustering. I understand running hypervisors in the same cluster in > > multiple data centers is possible with high capacity/low latency > > inter-site links. What I'm learning is there could be better ways, > > like running PROXMOX backup servers (PBS) with Remote Sync. Using PBS > > is interesting but would require manually restoring nodes should a > > failure occur. > > > > I'm looking for best practice or suggestions in topology that folks > > are using successfully or even tales of failure for what to avoid. > > I haven't actually done this, so feel free to ignore this or inform me of > problems with this approach: > > Set-up multiple Proxmox systems/clusters, each in a separate data center > but don't cluster them over the data centers. > Set-up a VPN that allows Proxmox and VMs in each data center to connect to > the others. It does not need low latency. > Have a PBS VM on each of them and backup your VMs (many times a day, if > you want) to the local PBS and sync all the PBSs. > Distribute the VMs manually over the different systems, so that the users > have the lowest latency. > Leave room for more VMs, this makes then operate more smooth and would > allow taking over load from other systems. > If a data center becomes unusable, restore the VMs that were running there > on the other systems manually. > > In case of problems, nothing will be automated and you'll lose work since > the most recent available backup, but at least you know that you have > several other working Proxmox systems/clusters up and running and capable > of restoring and running the affected VMs. > The syncing of backups only depends on changes in the set of deduplicated > chunks and does not need low latency or high speed. > > kind regards, Arjen > > > > > > ---------- Forwarded message ---------- > From: Arjen via pve-user > To: Proxmox VE user list > Cc: Arjen > Bcc: > Date: Wed, 28 Apr 2021 18:48:24 +0000 > Subject: Re: [PVE-User] Multi Data Center Cluster or Not > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From devzero at web.de Fri Apr 30 12:42:25 2021 From: devzero at web.de (Roland) Date: Fri, 30 Apr 2021 12:42:25 +0200 Subject: [PVE-User] pbs prune from commandline ? Message-ID: <3538b9a7-3b60-f603-6b66-c694fb5b225c@web.de> hello, isn't there a commandline equivalent of? proxmox backup server side prune ? (i.e. pbs -> Datastore -> Prune & GC -> Prune Schedule ) how can i trigger a prune from commandline on the pbs side , like i can do gc and verify? with proxmox-backup-manager? ? i only find prune option with proxmox-backup-client. this should be be equivalent to:? pve -> storage -> pbs-ds -> Backup Retention? tab, i.e. it's the prune definition on the client side. shouldn't there exist prune option in proxmox-backup-manager, too !? regards roland