From hermann at qwer.tk Sat Apr 2 14:49:16 2022
From: hermann at qwer.tk (Hermann Himmelbauer)
Date: Sat, 2 Apr 2022 14:49:16 +0200
Subject: [PVE-User] PVE 7.1 - Firewall recommendations / best practice?
Message-ID: <9ab0dd94-e598-ef4e-ec61-7ab816367cf6@qwer.tk>

Dear Proxmox users,

I set up a 3-node PVE cluster (PVE 7.1) and now wonder if and how to configure a firewall. I would therefore like your opinion on best practice:

a) Don't use the PVE firewall and set up firewalling on each guest machine
b) Use the PVE firewall instead of firewalling on the guest machines

Basically, I have the impression that (b) is the better option for me, as it is easier to configure the firewall for all guests in a central location.

First of all, I'd like to know whether the implementation of the PVE firewall is reliable, or whether it is buggy to some degree and thus leads to problems. What is your experience?

Moreover, I wonder if the firewall is compatible with OVS. I have the following interfaces set up with OVS:

enp3s0 (10GBit storage network)
enp1s0
enp2s0
bond0 (LACP, consisting of enp1s0 and enp2s0)
vmbr0 (bridge on top of bond0)
vlan1 (on top of vmbr0, PVE management network)
vlan200 (on top of vmbr0, alternative PVE management network)
tapxxxx (several guest network devices)

In some way the PVE firewall has to know that it must apply its host-level rules on vlan1 / vlan200 - how does it know that?

What exactly would happen if I enabled the firewall at the datacenter level? Would it block any network interfaces, even the storage network? I happened to try it out - I basically expected to be locked out of the management interface, however, it did nothing. Any best practices?

Best Regards,
Hermann

--
Hermann Himmelbauer
Martinstraße 18/2
3400 Klosterneuburg
Mobile: +43-699-11492144
E-Mail: hermann at qwer.tk
GPG/PGP: 299893C7 (on keyservers)
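The datacenter-level switch Hermann asks about lives in /etc/pve/firewall/cluster.fw (per-node and per-VM rules have their own files). A minimal sketch of enabling it there follows; the 192.168.1.0/24 management network and the two rules are invented examples, not taken from his setup:

  # /etc/pve/firewall/cluster.fw
  [OPTIONS]
  # master switch for the whole cluster; with 0, no rules are generated at all
  enable: 1

  [RULES]
  # example: let a management network reach the web GUI (8006) and SSH (22)
  IN ACCEPT -source 192.168.1.0/24 -p tcp -dport 8006
  IN ACCEPT -source 192.168.1.0/24 -p tcp -dport 22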
From gaio at lilliput.linux.it Mon Apr 4 10:17:32 2022
From: gaio at lilliput.linux.it (Marco Gaiarin)
Date: Mon, 4 Apr 2022 10:17:32 +0200
Subject: [PVE-User] Replication failed, got timeout?
In-Reply-To: <461hhi-f5n2.ln1@hermione.lilliput.linux.it>; from SmartGate on Mon, Apr 04, 2022 at 10:36:01AM +0200
References: <461hhi-f5n2.ln1@hermione.lilliput.linux.it>
Message-ID:

> Newly installed PVE 6 2-node cluster, totally unloaded; only some test VMs
> that are replicated between the two nodes, connected via a 10G direct cable.
> Sometimes we get:
>   command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
> What can it be?! Thanks.

We catch a log in /var/log/pve/replicate/, but at least to me it does not seem to provide much more of a clue:

2022-04-02 14:00:14 103-0: start replication job
2022-04-02 14:00:14 103-0: guest => VM 103, running => 5167
2022-04-02 14:00:14 103-0: volumes => local-zfs:vm-103-disk-0
2022-04-02 14:00:16 103-0: create snapshot '__replicate_103-0_1648900814__' on local-zfs:vm-103-disk-0
2022-04-02 14:00:21 103-0: end replication job with error: command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648900814__' failed: got timeout

I'm seeking info. Thanks.

--
As long as the colour of the skin is more important than the colour of the eyes, there will always be war. (Bob Marley)

From a.lauterer at proxmox.com Tue Apr 5 09:26:33 2022
From: a.lauterer at proxmox.com (Aaron Lauterer)
Date: Tue, 5 Apr 2022 09:26:33 +0200
Subject: [PVE-User] Replication failed, got timeout?
In-Reply-To:
References: <461hhi-f5n2.ln1@hermione.lilliput.linux.it>
Message-ID: <3c6d1ee7-991c-50af-7427-f152ccdbe869@proxmox.com>

Is the pool using HDDs? It could be that other things are happening at that moment, and HDDs are really not great for random IO. I had that as well sometimes; it went away when I changed to SSDs.

A dedicated special device vdev on (mirrored) SSDs should also improve the situation while not needing as many SSDs. Snapshots are a metadata operation. See [0] or `man zpoolconcepts` and look for "special device".

Cheers
Aaron

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_special_device

On 4/4/22 10:17, Marco Gaiarin wrote:
>> Newly installed PVE 6 2-node cluster, totally unloaded; only some test VMs
>> that are replicated between the two nodes, connected via a 10G direct cable.
>> Sometimes we get:
>>   command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
>> What can it be?! Thanks.
> [...]

From gaio at lilliput.linux.it Tue Apr 5 17:55:16 2022
From: gaio at lilliput.linux.it (Marco Gaiarin)
Date: Tue, 5 Apr 2022 17:55:16 +0200
Subject: [PVE-User] Replication failed, got timeout?
In-Reply-To: <3c6d1ee7-991c-50af-7427-f152ccdbe869@proxmox.com>
References: <461hhi-f5n2.ln1@hermione.lilliput.linux.it> <3c6d1ee7-991c-50af-7427-f152ccdbe869@proxmox.com>
Message-ID: <20220405155516.GB13740@lilliput.linux.it>

Hi! Aaron Lauterer wrote:

> Is the pool using HDDs?

Yes. After fiddling a bit, we are suspecting an IO peak problem, and this confirms it.

Because we normally don't need strict replication timing, for now we have limited the bandwidth, and that seems to work.

Thanks for all the info.
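To make Aaron's special-device suggestion and Marco's bandwidth cap concrete, a rough sketch follows. The disk IDs are placeholders, 103-0 is the job ID from the log above, and the 50 MB/s figure is arbitrary; also note that a special vdev added to a raidz pool cannot be removed again, so plan the devices carefully.

  # add a mirrored SSD special vdev (metadata) to the pool used for replication
  zpool add rpool special mirror /dev/disk/by-id/ata-SSD_A /dev/disk/by-id/ata-SSD_B

  # cap the replication job's transfer rate, in MB/s
  pvesr update 103-0 --rate 50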
From mattp at pobox.com Sat Apr 9 21:06:21 2022
From: mattp at pobox.com (Matt Perry)
Date: Sat, 9 Apr 2022 14:06:21 -0500
Subject: [PVE-User] Migrating VMs Not Working
Message-ID: <60c29c3f-b289-393c-0715-e4f5630e95e4@pobox.com>

I have a 2-node cluster using Proxmox 7.1-12 on each node. The machine hardware and disk sizes are identical to each other.

I am trying to migrate a Windows 11 Pro VM. When I do the migration I take the VM offline.

I have tried the migration a few times. Each time it gets this far and stops:

2022-04-09 12:36:37 starting migration of VM 113 to node 'pve1' (192.168.201.15)
2022-04-09 12:36:37 found local disk 'local-lvm:vm-113-disk-0' (in current VM config)
2022-04-09 12:36:37 found local disk 'local-lvm:vm-113-disk-1' (in current VM config)
2022-04-09 12:36:37 found generated disk 'local-lvm:vm-113-disk-2' (in current VM config)
2022-04-09 12:36:37 copying local disk images
2022-04-09 12:36:38 volume pve/vm-113-disk-0 already exists - importing with a different name
2022-04-09 12:36:38 Logical volume "vm-113-disk-4" created.

I have let it sit at this point for half an hour or more, but there is no further output until I cancel the migration.

What might be going on? What debug steps can I take to figure out how to migrate this VM?

From tsabolov at t8.ru Mon Apr 11 14:12:10 2022
From: tsabolov at t8.ru (Sergey Tsabolov)
Date: Mon, 11 Apr 2022 15:12:10 +0300
Subject: [PVE-User] Migrating VMs Not Working
In-Reply-To: <60c29c3f-b289-393c-0715-e4f5630e95e4@pobox.com>
References: <60c29c3f-b289-393c-0715-e4f5630e95e4@pobox.com>
Message-ID: <6b129a0f-8c51-1905-f00a-fe9cbd383fea@t8.ru>

Hi Matt,

As a test, can you move the VM disks to shared storage (NFS, Ceph or other) that is shared between the 2 nodes?

I think that because the disk is moved from node 1, Proxmox powers off the VM before moving it to node 2.

See here https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_storage for the available storage types.

09.04.2022 22:06, Matt Perry wrote:
> I have a 2-node cluster using Proxmox 7.1-12 on each node. The machine
> hardware and disk sizes are identical to each other.
> [...]
> I have let it sit at this point for half an hour or more, but there
> is no further output until I cancel the migration.
> What might be going on? What debug steps can I take to figure out how
> to migrate this VM?

Best regards,
Sergey Tsabolov, system administrator, T8 LLC (107076 Moscow)
Tel.: +74992716161, Mob.: +79850334875, tsabolov at t8.ru, www.t8.ru

From lindsay.mathieson at gmail.com Thu Apr 14 07:16:43 2022
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 14 Apr 2022 15:16:43 +1000
Subject: [PVE-User] PBS Backup VM with fuse mounts
Message-ID: <352b4058-1d8e-4726-18ee-580c11ca0f4c@gmail.com>

I have a KVM Debian VM that has a fuse mount with 20 terabytes of data on it. If I use PBS to back up the VM, will that include the data on the fuse mount? (I don't want it to.)

--
Lindsay Mathieson

From mhill at inett.de Thu Apr 14 11:36:02 2022
From: mhill at inett.de (Maximilian Hill)
Date: Thu, 14 Apr 2022 11:36:02 +0200
Subject: [PVE-User] Migration target node in case of shutdown
Message-ID:

Hello,

how exactly does Proxmox VE determine the target node of a migration in case of a shutdown?

Thanks,
Max
From s.sterz at proxmox.com Thu Apr 14 12:40:35 2022
From: s.sterz at proxmox.com (Stefan Sterz)
Date: Thu, 14 Apr 2022 12:40:35 +0200
Subject: [PVE-User] Migration target node in case of shutdown
In-Reply-To:
References:
Message-ID: <342bf09c-3113-eddf-1950-6dba41e4b771@proxmox.com>

Hello,

the target node is chosen by the node priority of the HA group that a VM is in. You can read more about that in the manual [1]. Here is an example: assuming you have a three-node cluster (node0, node1, node2) and the HA group has the following priorities: node0:2, node1:1, node2:0. A VM running on node0 will be moved to node1 and then node2 if each node fails one after another. If several nodes have the same priority, VMs will be distributed evenly.

Best regards,
Stefan

[1]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#ha_manager_groups

On 14.04.22 11:36, Maximilian Hill wrote:
> Hello,
>
> how exactly does Proxmox VE determine the target node of a migration in
> case of a shutdown?
>
> Thanks,
> Max

From mhill at inett.de Thu Apr 14 12:50:09 2022
From: mhill at inett.de (Maximilian Hill)
Date: Thu, 14 Apr 2022 12:50:09 +0200
Subject: [PVE-User] Migration target node in case of shutdown
In-Reply-To: <342bf09c-3113-eddf-1950-6dba41e4b771@proxmox.com>
References: <342bf09c-3113-eddf-1950-6dba41e4b771@proxmox.com>
Message-ID:

Hello Stefan,

the reason I'm asking is that I ran into a case in which the priorities were all equal (not set), but all VMs from two nodes were migrated to the same node of an 8-node cluster.

This by itself wouldn't really be a problem, but QEMU managed to allocate all the memory for the VMs, so no migration failed to start. Then the VMs used the memory and the OOM killer started to kill VMs and other processes.

In this special case, we have a temporary workaround in place:

- No swap
- vm.overcommit_memory = 2
- vm.overcommit_ratio = 99

This of course is not a nice solution at all.

Best regards,
Max

On Thu, Apr 14, 2022 at 12:40:35PM +0200, Stefan Sterz wrote:
> Hello,
>
> the target node is chosen by the node priority of the HA group that a
> VM is in. You can read more about that in the manual [1].
> [...]
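Stefan's example, expressed as an actual HA group; the group name prefer-node0 and VM 100 are placeholders:

  # group with node0 preferred, then node1, then node2
  ha-manager groupadd prefer-node0 --nodes "node0:2,node1:1,node2:0"

  # put a guest under HA management and pin it to that group
  ha-manager add vm:100 --group prefer-node0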
From mark at tuxis.nl Fri Apr 15 12:05:58 2022
From: mark at tuxis.nl (Mark Schouten)
Date: Fri, 15 Apr 2022 12:05:58 +0200
Subject: [PVE-User] PBS Backup VM with fuse mounts
In-Reply-To: <352b4058-1d8e-4726-18ee-580c11ca0f4c@gmail.com>
References: <352b4058-1d8e-4726-18ee-580c11ca0f4c@gmail.com>
Message-ID: <93D503A8-8C86-4122-A476-1367030AAEDA@tuxis.nl>

Hi,

If you're using PVE to do the backup, no. It will only back up attached devices in the VM config.

If you're running proxmox-backup-client inside the VM, you might, depending on your settings.

--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl

> On 14 Apr 2022, at 07:16, Lindsay Mathieson wrote:
>
> I have a KVM Debian VM that has a fuse mount with 20 terabytes of data on it. If I use PBS to back up the VM, will that include the data on the fuse mount? (I don't want it to.)

From tsabolov at t8.ru Fri Apr 15 16:02:13 2022
From: tsabolov at t8.ru (Sergey Tsabolov)
Date: Fri, 15 Apr 2022 17:02:13 +0300
Subject: [PVE-User] PowerEdge R440 & watchdog timer
Message-ID:

Hi to all,

I have a 6-node Proxmox VE cluster + Ceph installed (every node is a PowerEdge R440).

Do I need to configure a hardware watchdog / IPMI fencing?

I'm a little confused whether I need to do something in this direction or not.

Sergey TS
The best Regard

From mir at miras.org Fri Apr 15 16:15:57 2022
From: mir at miras.org (Michael Rasmussen)
Date: Fri, 15 Apr 2022 16:15:57 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References:
Message-ID: <20220415161557.677231fd@sleipner.datanom.net>

On Fri, 15 Apr 2022 17:02:13 +0300 Sergey Tsabolov wrote:
>
> Do I need to configure a hardware watchdog / IPMI fencing?
>
> I'm a little confused whether I need to do something in this direction
> or not.
>
The software watchdog comes configured out of the box, with fencing etc., and works extremely well. I would bother configuring the hardware watchdog.

--
Hilsen/Regards
Michael Rasmussen

From mir at miras.org Fri Apr 15 16:23:02 2022
From: mir at miras.org (Michael Rasmussen)
Date: Fri, 15 Apr 2022 16:23:02 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References:
Message-ID: <20220415162302.42ff30ca@sleipner.datanom.net>

On Fri, 15 Apr 2022 16:15:57 +0200 Michael Rasmussen via pve-user wrote:
> etc. and works extremely well. I would bother configuring the hardware
> watchdog.
>
It should have said: I would not bother configuring the hardware watchdog :-)

--
Hilsen/Regards
Michael Rasmussen
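For completeness, if someone did want the IPMI hardware watchdog instead of the default softdog, the usual mechanism (see the HA chapter of the admin guide) is to select the module in /etc/default/pve-ha-manager and reboot the node. Treat this as a sketch and test it on a non-critical node first; as Michael says, the softdog default already works well.

  # /etc/default/pve-ha-manager
  # select watchdog module (default is softdog)
  WATCHDOG_MODULE=ipmi_watchdog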
From tsabolov at t8.ru Fri Apr 15 16:48:12 2022
From: tsabolov at t8.ru (Sergey Tsabolov)
Date: Fri, 15 Apr 2022 17:48:12 +0300
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References:
Message-ID: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru>

Thanks for the answers, I understand that configuring the hardware watchdog is not the best way.

How can I change the timer? When one node of the cluster loses connectivity (the switch is shut down), does it go to reboot?

Is it possible to change the timer?

15.04.2022 17:23, Michael Rasmussen via pve-user wrote:
> [...]

Sergey TS
The best Regard

From mir at miras.org Fri Apr 15 18:00:27 2022
From: mir at miras.org (Michael Rasmussen)
Date: Fri, 15 Apr 2022 18:00:27 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru> <20220415165642.0e5b20cb@sleipner.datanom.net>
Message-ID: <20220415180027.7d25dc74@sleipner.datanom.net>

On Fri, 15 Apr 2022 18:17:26 +0300 Sergey Tsabolov wrote:
> Thank you for the answer.
>
> Maybe I'm wrong, but it seems to me that the timeout value is quite
> small.
>
> If a node that lost its connection goes to reboot after 2 minutes, and that
> is good practice, I won't argue with you, maybe you're right.
>
In the last 10 years I have been using Proxmox I have not had a lost connection to a server for over 1 second without it being intentional, but if your circumstances are a different use case I would go for stackable switches; I have a port on either switch connected to my servers and UPS control for all my servers.

Losing the connection to a server for more than 1 second can only mean hardware failure or loss of power.

> 15.04.2022 17:56, Michael Rasmussen wrote:
> > On Fri, 15 Apr 2022 17:48:12 +0300 Sergey Tsabolov wrote:
> >> Is it possible to change the timer?
> >>
> > AFAIK this requires changes to the watchdog code. To the best of my
> > knowledge the timeout is configured according to best practices.
> >
> > Why do you want to change the timeout value?

--
Hilsen/Regards
Michael Rasmussen
From mir at miras.org Fri Apr 15 18:04:48 2022
From: mir at miras.org (Michael Rasmussen)
Date: Fri, 15 Apr 2022 18:04:48 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru> <20220415165642.0e5b20cb@sleipner.datanom.net>
Message-ID: <20220415180448.7b94cd08@sleipner.datanom.net>

On Fri, 15 Apr 2022 18:00:27 +0200 Michael Rasmussen via pve-user wrote:
>
> In the last 10 years I have been using Proxmox I have not had a lost
> connection to a server for over 1 second without it being intentional
> [...]
>
Forgot to mention that all my infrastructure and hardware is UPS controlled, so the only planned downtime has been when replacing a UPS/battery in a UPS (3 times), plus one longer period without power from the power grid (1 time, and not planned ;-).

--
Hilsen/Regards
Michael Rasmussen

From wolf at wolfspyre.com Fri Apr 15 21:09:31 2022
From: wolf at wolfspyre.com (Wolf Noble)
Date: Fri, 15 Apr 2022 14:09:31 -0500
Subject: [PVE-User] rebuilding a cluster node (new root disks (ext4 -> zfs))
Message-ID: <22DAD9B6-4EFE-48EA-A636-F84CE858B5D0@wolfspyre.com>

Howdy all!

1) I hope you are doing well, and feel appreciated and respected.

2) I'm rebuilding my cluster nodes one at a time. I was hoping there is a guide someplace that enumerates the files on the original root disks that should be copied over the "new" files, to allow the newly rebuilt node to rejoin the cluster with ease.

I have a hacky, ugly way, but I'm hoping there's something a bit more canonical than just my trial and error.

I'll happily contribute my "moving root to zfs" notes to that doc... if someone could be so kind as to point me in the direction of a clue. :)

TIA!
-W

[= The contents of this message have been written, read, processed, erased, sorted, sniffed, compressed, rewritten, misspelled, overcompensated, lost, found, and most importantly delivered entirely with recycled electrons =]

From elacunza at binovo.es Tue Apr 19 16:34:11 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 16:34:11 +0200
Subject: Backup/timeout issues PVE 6.4
Message-ID:

Hi all,

We're having backup/timeout issues with traditional non-PBS backups on 6.4.

We have 3 nodes backing up to an NFS server with HDDs. For the same backup task (with multiple VMs spread across those 3 nodes), one node may finish all backups, while another may not be able to perform all VM backups, or may not even start them because the storage is "not online".
Version (the same for all 3 nodes):

proxmox-ve: 6.4-1 (running kernel: 5.4.162-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-12
pve-kernel-helper: 6.4-12
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1

Is this a known bug? We had some issues in another cluster that we fixed using this patch on v6 (applied manually):
https://bugzilla.proxmox.com/show_bug.cgi?id=3693

Has that bug been backported to v6?

Thanks

Eneko Lacunza
Director Técnico | Zuzendari teknikoa
Binovo IT Human Project
943 569 206
elacunza at binovo.es
binovo.es
Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun

From mark at tuxis.nl Tue Apr 19 16:40:32 2022
From: mark at tuxis.nl (Mark Schouten)
Date: Tue, 19 Apr 2022 16:40:32 +0200
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To:
References:
Message-ID:

Hi,

Do you have enough server threads on the NFS server? I've seen issues with NFS because all server threads (default 8 on Debian, IIRC) are busy, which causes new clients to not be able to connect.

--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl

> On 19 Apr 2022, at 16:34, Eneko Lacunza via pve-user wrote:
>
> We're having backup/timeout issues with traditional non-PBS backups on 6.4.
>
> We have 3 nodes backing up to an NFS server with HDDs. For the same backup task
> (with multiple VMs spread across those 3 nodes), one node may finish all backups,
> while another may not be able to perform all VM backups, or may not even start
> them because the storage is "not online".
> [...]
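To follow up on Mark's thread-count hint: on a plain Debian NFS server the relevant knobs look roughly like this (a Synology hides this behind its own tooling, so it may not apply there):

  # on the NFS server: how many nfsd threads are running right now
  cat /proc/fs/nfsd/threads

  # raise the count at runtime ...
  rpc.nfsd 16

  # ... and persistently on Debian, then restart nfs-kernel-server
  # /etc/default/nfs-kernel-server
  RPCNFSDCOUNT=16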
From elacunza at binovo.es Tue Apr 19 16:46:39 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 16:46:39 +0200
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To:
References:
Message-ID:

Hi Mark,

This is a Synology server, I don't think I can control that from the WUI... But I'll take a look.

Thanks for the suggestion!

El 19/4/22 a las 16:40, Mark Schouten escribió:
> Hi,
>
> Do you have enough server threads on the NFS server? I've seen issues
> with NFS because all server threads (default 8 on Debian, IIRC) are
> busy, which causes new clients to not be able to connect.
> [...]

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2ª izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
From tsabolov at t8.ru Tue Apr 19 16:52:12 2022
From: tsabolov at t8.ru (Sergey Tsabolov)
Date: Tue, 19 Apr 2022 17:52:12 +0300
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To:
References:
Message-ID: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru>

Is the Synology path mounted over NFS shared on the 3 nodes?

19.04.2022 17:46, Eneko Lacunza via pve-user wrote:
> [...]

Best regards,
Sergey Tsabolov, system administrator, T8 LLC (107076 Moscow)
Tel.: +74992716161, Mob.: +79850334875, tsabolov at t8.ru, www.t8.ru
From elacunza at binovo.es Tue Apr 19 16:54:38 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 16:54:38 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru> <20220415165642.0e5b20cb@sleipner.datanom.net>
Message-ID: <0fb91be8-9308-f50e-896a-d1ee950d684d@binovo.es>

Hi,

El 15/4/22 a las 18:04, Michael Rasmussen via pve-user escribió:
>> In the last 10 years I have been using Proxmox I have not had a lost
>> connection to a server for over 1 second without it being intentional [...]
>>
>> Losing the connection to a server for more than 1 second can only mean
>> hardware failure or loss of power.
>>
> Forgot to mention that all my infrastructure and hardware is UPS
> controlled [...]

Unfortunately, starting with PVE 7.x we're seeing cluster issues (nodes going out of quorum only to rejoin instantly) "too often". This is why we create multiple links for corosync after upgrading clusters to v7, so that one of these point-in-time network issues doesn't reboot a node.

So far it has worked well. Unfortunately, we haven't been able to find a common pattern/cause in the several clusters where we see the issue.

Cheers

Eneko Lacunza
Binovo IT Human Project | https://www.binovo.es

From elacunza at binovo.es Tue Apr 19 16:55:20 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 16:55:20 +0200
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru>
References: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru>
Message-ID: <3c544791-d558-5520-5b4b-0a058ee0d7dd@binovo.es>

El 19/4/22 a las 16:52, Sergey Tsabolov escribió:
>
> Is the Synology path mounted over NFS shared on the 3 nodes?
> [...]

Eneko Lacunza
Binovo IT Human Project | https://www.binovo.es

From elacunza at binovo.es Tue Apr 19 17:07:08 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 17:07:08 +0200
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru>
References: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru>
Message-ID: <3a1dd4d3-bdf3-6e65-164d-629af65cb159@binovo.es>

Yes, all nodes mount the same NFS export.

El 19/4/22 a las 16:52, Sergey Tsabolov escribió:
>
> Is the Synology path mounted over NFS shared on the 3 nodes?
> [...]

Eneko Lacunza
Binovo IT Human Project | https://www.binovo.es
From mir at miras.org Tue Apr 19 17:23:25 2022
From: mir at miras.org (Michael Rasmussen)
Date: Tue, 19 Apr 2022 17:23:25 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru> <20220415165642.0e5b20cb@sleipner.datanom.net>
Message-ID: <20220419172325.00793904@sleipner.datanom.net>

On Tue, 19 Apr 2022 16:54:38 +0200 Eneko Lacunza via pve-user wrote:
>
> So far it has worked well. Unfortunately, we haven't been able to
> find a common pattern/cause in the several clusters where we see the issue.
>
If your corosync network is very busy and/or you have not configured several connections for corosync between your nodes, you can try to increase the token value. Find the section called totem and change the value for token - the default is 1000 (ms). I have changed the value to 5000 and I have not experienced issues since.

--
Hilsen/Regards
Michael Rasmussen
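Michael's token change goes into the totem section of corosync.conf. On PVE the cluster-wide copy under /etc/pve is the one to edit, and config_version has to be bumped so the change propagates to all nodes; a careless edit here can take the whole cluster offline, so treat this as a sketch and check `pvecm status` afterwards:

  # edit the cluster-wide copy (pmxcfs syncs it to every node)
  #   /etc/pve/corosync.conf
  totem {
    ...
    config_version: <old value + 1>
    token: 5000        # Michael reports the default as 1000 ms
  }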
From tsabolov at t8.ru Tue Apr 19 17:30:17 2022
From: tsabolov at t8.ru (Sergey Tsabolov)
Date: Tue, 19 Apr 2022 18:30:17 +0300
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To: <3a1dd4d3-bdf3-6e65-164d-629af65cb159@binovo.es>
References: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru> <3a1dd4d3-bdf3-6e65-164d-629af65cb159@binovo.es>
Message-ID:

I have the same case with backups.

7 nodes mount the same NFS path (folder) on the Synology.

But in the backup settings I make groups (every group is one node 1, 2, 3 etc. and its VMs) and set the backup times per group, not all at the same time.

And the backup mode is ZSTD compression, snapshot. See screenshot.

19.04.2022 18:07, Eneko Lacunza wrote:
>
> Yes, all nodes mount the same NFS export.
> [...]

Sergey TS
The best Regard

From elacunza at binovo.es Tue Apr 19 18:07:55 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 18:07:55 +0200
Subject: [PVE-User] PowerEdge R440 & watchdog timer
In-Reply-To:
References: <746cb313-1ee4-8481-6904-a261634202e9@t8.ru> <20220415165642.0e5b20cb@sleipner.datanom.net>
Message-ID: <415781fa-8e6a-b809-cf49-5a585ffef87a@binovo.es>

Hi Michael,

El 19/4/22 a las 17:23, Michael Rasmussen via pve-user escribió:
>> So far it has worked well. Unfortunately, we haven't been able to
>> find a common pattern/cause in the several clusters where we see the issue.
>>
> If your corosync network is very busy and/or you have not configured
> several connections for corosync between your nodes, you can try to
> increase the token value. Find the section called totem and change the
> value for token - the default is 1000 (ms). I have changed the value to
> 5000 and I have not experienced issues since.

Thanks, didn't know this, will try it!

Regards

Eneko Lacunza
Binovo IT Human Project | https://www.binovo.es

From elacunza at binovo.es Tue Apr 19 18:10:49 2022
From: elacunza at binovo.es (Eneko Lacunza)
Date: Tue, 19 Apr 2022 18:10:49 +0200
Subject: [PVE-User] Backup/timeout issues PVE 6.4
In-Reply-To:
References: <8ca4f90c-4b01-2d16-2249-0bb3f9b9e15e@t8.ru> <3a1dd4d3-bdf3-6e65-164d-629af65cb159@binovo.es>
Message-ID: <3b33db97-ae0c-3db5-5663-a54088086ddd@binovo.es>

Hi,

Thanks for the hint, but this isn't workable in our case, we don't keep VMs always on the same nodes.

I think this has only started to happen recently, so maybe there was some change in PVE's timeout handling (or the Linux NFS client...)

Cheers

El 19/4/22 a las 17:30, Sergey Tsabolov escribió:
>
> I have the same case with backups.
>
> 7 nodes mount the same NFS path (folder) on the Synology.
> [...]

Eneko Lacunza
Binovo IT Human Project | https://www.binovo.es
From lindsay.mathieson at gmail.com Wed Apr 20 04:54:23 2022
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 20 Apr 2022 12:54:23 +1000
Subject: [PVE-User] Data limits not being respected on zfs vm volume
Message-ID:

This is really odd - I was downloading a large amount of data in a Debian VM last night, something went wrong (my problem), it didn't stop and filled up the volume.

That shouldn't be a problem, as the virtual disk only exists to store temporary data:

* vm-100-disk-1
* 256GB
* 1 partition, formatted and mounted as EXT4
* Located under rpool/data

Trouble is, it kept expanding past 256GB, using up all the free space on the host boot drive. This morning everything was down and I had to delete the volume to get a functioning system.

zfs list of volumes and snapshots:

NAME                            USED  AVAIL  REFER  MOUNTPOINT
rpool                           450G     0B   104K  /rpool
rpool/ROOT                     17.3G     0B    96K  /rpool/ROOT
rpool/ROOT/pve-1               17.3G     0B  17.3G  /
rpool/data                      432G     0B   128K  /rpool/data
rpool/data/basevol-101-disk-0   563M     0B   563M  /rpool/data/basevol-101-disk-0
rpool/data/basevol-102-disk-0   562M     0B   562M  /rpool/data/basevol-102-disk-0
rpool/data/subvol-151-disk-0    911M     0B   911M  /rpool/data/subvol-151-disk-0
rpool/data/subvol-152-disk-0    712M     0B   712M  /rpool/data/subvol-152-disk-0
rpool/data/subvol-153-disk-0    712M     0B   712M  /rpool/data/subvol-153-disk-0
rpool/data/subvol-154-disk-0    710M     0B   710M  /rpool/data/subvol-154-disk-0
rpool/data/subvol-155-disk-0    838M     0B   838M  /rpool/data/subvol-155-disk-0
rpool/data/vm-100-disk-0       47.3G     0B  45.0G  -
rpool/data/vm-100-disk-1        338G     0B   235G  -   <== the volume in question
rpool/data/vm-100-state-fsck   2.05G     0B  2.05G  -
rpool/data/vm-201-disk-0       40.1G     0B  38.0G  -
rpool/data/vm-201-disk-1        176K     0B   104K  -
root at px-server:~#

NAME                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/data/basevol-101-disk-0@__base__      8K      -   563M  -
rpool/data/basevol-102-disk-0@__base__      8K      -   562M  -
rpool/data/vm-100-disk-0@fsck            2.32G      -  42.7G  -
rpool/data/vm-100-disk-1@fsck             103G      -   164G  -
rpool/data/vm-201-disk-0@BIOSChange      2.12G      -  37.7G  -
rpool/data/vm-201-disk-1@BIOSChange        72K      -    96K  -

VM fstab:

UUID=7928b71b-a00e-4614-b239-d5cc9bf311d6      /              ext4         errors=remount-ro  0  1
# swap was on /dev/sda5 during installation
# UUID=26f4eae9-7855-4561-b75b-1405cc5eec3e    none           swap         sw                 0  0
/dev/sr0                                       /media/cdrom0  udf,iso9660  user,noauto        0  0
PARTUUID=bd9ca0da-fbde-4bfc-852b-f6b7db86292a  /mnt/temp      ext4         errors=remount-ro  0  1   <== the volume in question
# moosefs
mfsmount                                       /mnt/plex      fuse         defaults,_netdev,mfsdelayedinit,mfssubfolder=plex  0  0

How was this even possible?

nb. The process downloading the data was running in Docker, hosted on the Debian VM.

--
Lindsay Mathieson
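A quick way to see where the space on such a zvol went (dataset name taken from the listing above) is to look at the space-accounting properties and the snapshot list:

  zfs get volsize,used,referenced,usedbysnapshots,refreservation rpool/data/vm-100-disk-1
  zfs list -r -t snapshot -o name,used,referenced rpool/data/vm-100-disk-1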
From mark at tuxis.nl Wed Apr 20 10:08:25 2022
From: mark at tuxis.nl (Mark Schouten)
Date: Wed, 20 Apr 2022 10:08:25 +0200
Subject: [PVE-User] rebuilding a cluster node (new root disks (ext4 -> zfs))
In-Reply-To: <22DAD9B6-4EFE-48EA-A636-F84CE858B5D0@wolfspyre.com>
References: <22DAD9B6-4EFE-48EA-A636-F84CE858B5D0@wolfspyre.com>
Message-ID: <8E4CD97B-3B16-4C2F-8920-E269CC3C7864@tuxis.nl>

Hi,

Something like this? :)
https://pve-user.pve.proxmox.narkive.com/Pqhqih3s/migrating-from-lvm-to-zfs

--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl

> On 15 Apr 2022, at 21:09, Wolf Noble wrote:
>
> 2) I'm rebuilding my cluster nodes one at a time. I was hoping there is a guide
> someplace that enumerates the files on the original root disks that should be
> copied over the "new" files, to allow the newly rebuilt node to rejoin the
> cluster with ease.
> [...]

From f.gruenbichler at proxmox.com Wed Apr 20 10:25:20 2022
From: f.gruenbichler at proxmox.com (Fabian Grünbichler)
Date: Wed, 20 Apr 2022 10:25:20 +0200
Subject: [PVE-User] Data limits not being respected on zfs vm volume
In-Reply-To:
References:
Message-ID: <1650442842.sqdn5m4pt2.astroid@nora.none>

On April 20, 2022 4:54 am, Lindsay Mathieson wrote:
> This is really odd - I was downloading a large amount of data in a Debian
> VM last night, something went wrong (my problem), it didn't stop and
> filled up the volume.
> [...]
> Trouble is, it kept expanding past 256GB, using up all the free space on
> the host boot drive. This morning everything was down and I had to
> delete the volume to get a functioning system.
>
> zfs list of volumes and snapshots:
> [...]
> rpool/data/vm-100-disk-0       47.3G     0B  45.0G  -
> rpool/data/vm-100-disk-1        338G     0B   235G  -

used 338G, referred 235G - so you either have snapshots, or raidz overhead taking up the extra space.

> [...]
> rpool/data/vm-100-disk-1@fsck             103G      -   164G  -

snapshots taking up 103G at least, which lines up nicely with 338 - 235 = 103G (it doesn't have to, snapshot space accounting is a bit complicated).

> [...]
> How was this even possible?

see above. is the zvol thin-provisioned? if yes, then likely the snapshots are at fault. for regular zvols, creating a snapshot would already take care of having enough space at snapshot creation time, and such a situation cannot arise. with thin-provisioned storage it's always possible to overcommit and run out of space.

From gianni.milo22 at gmail.com Wed Apr 20 18:25:55 2022
From: gianni.milo22 at gmail.com (GM)
Date: Wed, 20 Apr 2022 17:25:55 +0100
Subject: [PVE-User] Data limits not being respected on zfs vm volume
In-Reply-To:
References:
Message-ID:

>> Trouble is, it kept expanding past 256GB, using up all the free space on
>> the host boot drive. This morning everything was down and I had to
>> delete the volume to get a functioning system.

Just to add onto this: you could prevent this from happening in the future by setting a "reservation" or "refreservation" on the "rpool/ROOT/pve-1" dataset (see man zfsprops for more details).
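A sketch of GM's suggestion; the 8G figure is arbitrary, the idea is simply to guarantee the host's root dataset some space that thin-provisioned zvols can never consume:

  # reserve space for the root dataset so a runaway guest volume cannot fill the pool completely
  zfs set reservation=8G rpool/ROOT/pve-1

  # verify
  zfs get reservation,refreservation rpool/ROOT/pve-1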
From gaio at lilliput.linux.it Sun Apr 24 19:56:35 2022
From: gaio at lilliput.linux.it (Marco Gaiarin)
Date: Sun, 24 Apr 2022 19:56:35 +0200
Subject: [PVE-User] Disk performance test guidance...
Message-ID:

In a set of servers (some brand-new Dell T440s with 64GB of RAM, 2 500M SSDs in ZFS Raid 1, three 4TB HDDs in ZFS raid3) we are seeing some performance issues, e.g. in some operations we get 10-20 MB/s maximum throughput, with the load going sky high (load 40).

An example: a couple of identical servers, connected via a single gigabit link. I've set up an NFS server on the HDD ZFS pool of server B, then an NFS-type storage (on both servers, they are clustered).

I've copied a PVE VM backup, roughly 500GB of data, from my PC via NFS to server B: 80-90 MB/s, load 2-4.

I've restored the VM on server A, thus *reading* the data from NFS and restoring it again onto the HDD ZFS pool of server A: 10-20 MB/s transfer rate, load at 40.

This makes no sense to me. As the subject says, is there some 'disk performance test guidance' document, so I can test and find the bottleneck?

Thanks.

--
Console yourselves! At http://www.sorryeverybody.com thousands of Americans apologise to the world for Bush's re-election. (from Cacao Elefante)

From naz9ul at gmail.com Mon Apr 25 16:07:53 2022
From: naz9ul at gmail.com (Sylvain Le Blanc)
Date: Mon, 25 Apr 2022 10:07:53 -0400
Subject: [PVE-User] Disk performance test guidance...
In-Reply-To:
References:
Message-ID:

ZFS RAIDZ drive requirements:

- RAID Z requires 3 drives or more
- RAID Z2 requires 5 drives or more
- RAID Z3 requires 8 drives or more

https://raidcalculators.com/zfs-raidz-capacity.php

Hope this helps!

Le dim. 24 avr. 2022 14 h 40, Marco Gaiarin a écrit :
> In a set of servers (some brand-new Dell T440s with 64GB of RAM, 2 500M SSDs
> in ZFS Raid 1, three 4TB HDDs in ZFS raid3) we are seeing some performance
> issues [...]

From nada at verdnatura.es Tue Apr 26 08:36:37 2022
From: nada at verdnatura.es (nada)
Date: Tue, 26 Apr 2022 08:36:37 +0200
Subject: [PVE-User] Disk performance test guidance...
In-Reply-To:
References:
Message-ID: <7c154f7ca6db8be70a34468150cf7875@verdnatura.es>

hi Marco

you have NFS at server B, so the speed must be better @B.
But before a disk performance test you may also test:
- traffic, e.g. by iperf or iperf3
- memory, e.g. by memtest

all the best
Nada

On 2022-04-24 19:56, Marco Gaiarin wrote:
> In a set of servers (some brand-new Dell T440s with 64GB of RAM, 2 500M SSDs
> in ZFS Raid 1, three 4TB HDDs in ZFS raid3) we are seeing some performance
> issues, e.g. in some operations we get 10-20 MB/s maximum throughput, with
> the load going sky high (load 40).
> [...]
> This makes no sense to me. As the subject says, is there some 'disk
> performance test guidance' document, so I can test and find the bottleneck?
>
> Thanks.

From nyangwesob at aua.ac.ke Tue Apr 26 08:50:02 2022
From: nyangwesob at aua.ac.ke (Brenda Nyangweso)
Date: Tue, 26 Apr 2022 09:50:02 +0300
Subject: [PVE-User] Virtual Hosts Cannot Resolve Domain Names
Message-ID:

Hi,

For some reason, some of my virtual machines seem to have developed a DNS issue. The host can resolve domain names, but some of the VMs cannot. The network settings of those that can resolve are similar to those that can't. Any clues as to what might be happening?

Kind Regards,
Brenda
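A few checks inside one of the affected VMs usually narrow a problem like Brenda's down before anything else is suspected; example.com and 1.1.1.1 are just placeholders for a test name and a known-good resolver:

  # which resolver is the guest actually using?
  cat /etc/resolv.conf

  # does resolution work against that resolver, and against a known-good one?
  dig example.com
  dig @1.1.1.1 example.com

  # is the configured resolver reachable at all?
  ping -c3 <configured-dns-ip>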
From nada at verdnatura.es Tue Apr 26 09:28:54 2022
From: nada at verdnatura.es (nada)
Date: Tue, 26 Apr 2022 09:28:54 +0200
Subject: [PVE-User] Virtual Hosts Cannot Resolve Domain Names
In-Reply-To:
References:
Message-ID:

hi Brenda

a month ago we had a similar problem at one Proxmox node. Virtual CT/QM with static IPs were connected, but CT/QM with DHCP IPs were not connected because of a DHCP offer timeout. The problem was with Mellanox NICs which were configured as a bond via LACP (802.3ad). There was an "illegal loopback" error in the syslog.

The situation here was solved by:
* temporarily reconfiguring the bond from LACP to active-backup, after which the CT/QM were OK
* checking the firmware for the Mellanox NIC related to the HPE ProLiant server
* https://support.hpe.com/hpesc/public/docDisplay?docId=a00100988en_us&docLocale=en_US
* installing the new firmware
* reconfiguring the network back to LACP
* testing reboot, QM/CT, connectivity OK

You may check your syslog at the Proxmox node, at the DNS/DHCP server and at the relevant switch to find out where your problem is.

all the best
Nada

On 2022-04-26 08:50, Brenda Nyangweso wrote:
> Hi,
> For some reason, some of my virtual machines seem to have developed a DNS
> issue. The host can resolve domain names, but some of the VMs cannot. The
> network settings of those that can resolve are similar to those that can't.
> Any clues as to what might be happening?
>
> Kind Regards,
> Brenda

From colonellor at gmail.com Tue Apr 26 11:25:33 2022
From: colonellor at gmail.com (colonellor at gmail.com)
Date: Tue, 26 Apr 2022 11:25:33 +0200
Subject: [PVE-User] Virtual Hosts Cannot Resolve Domain Names
Message-ID: <6267babd.1c69fb81.37d2b.d598@mx.google.com>

On Apr 26, 2022 8:50 AM, Brenda Nyangweso wrote:
>
> Hi,
> For some reason, some of my virtual machines seem to have developed a DNS
> issue. The host can resolve domain names, but some of the VMs cannot.
> [...]

Some suggestions:
Local DNS cache?
Hosts file?

From gaio at lilliput.linux.it Wed Apr 27 22:35:15 2022
From: gaio at lilliput.linux.it (Marco Gaiarin)
Date: Wed, 27 Apr 2022 22:35:15 +0200
Subject: [PVE-User] Disk performance test guidance...
In-Reply-To: <7c154f7ca6db8be70a34468150cf7875@verdnatura.es>
References: <7c154f7ca6db8be70a34468150cf7875@verdnatura.es>
Message-ID: <20220427203515.GB72008@lilliput.linux.it>

Hi! nada wrote:

> you have NFS at server B, so the speed must be better @B.
> But before a disk performance test you may also test:
> - traffic, e.g. by iperf or iperf3
> [...]

Sure, I also need to test the network, but fortunately I know iperf decently well, while I know little about ZFS...

Thanks.

From gaio at lilliput.linux.it Wed Apr 27 22:33:08 2022
From: gaio at lilliput.linux.it (Marco Gaiarin)
Date: Wed, 27 Apr 2022 22:33:08 +0200
Subject: [PVE-User] Disk performance test guidance...
In-Reply-To: ; from SmartGate on Wed, Apr 27, 2022 at 23:36:01PM +0200
References:
Message-ID: <2tloji-06b2.ln1@hermione.lilliput.linux.it>

Hi! Sylvain Le Blanc wrote:
> ZFS RAIDZ drive requirements

Sorry for not using the correct nomenclature, but I was not speaking about 'size', rather about 'speed'.

I meant 'RAID Z' by 'ZFS raid3', and clearly 'ZFS mirror' by 'ZFS Raid 1'. Again, sorry.

The question in the subject remains. Thanks.

--
These are the children of this Italy, this anti-fascist Italy: if you are looking for a house there's no problem, you just need to know a socialist. (L. Barbarossa)
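On the 'disk performance test guidance' question itself, a common starting point is iperf3 for the link and fio for the pool. A sketch follows; /hddpool/fio.test, the sizes and runtimes are arbitrary, fio should only ever be pointed at a scratch file, and without O_DIRECT the ZFS ARC will inflate read numbers unless the test file is much larger than RAM:

  # network between the two nodes
  iperf3 -s                      # on server B
  iperf3 -c <server-B-ip> -t 30  # on server A

  # random writes against the HDD pool (roughly what a restore generates)
  fio --name=randwrite --filename=/hddpool/fio.test --size=10G \
      --rw=randwrite --bs=64k --ioengine=libaio --iodepth=16 \
      --runtime=60 --time_based

  # sequential read, for comparison
  fio --name=seqread --filename=/hddpool/fio.test --size=10G \
      --rw=read --bs=1M --ioengine=libaio --iodepth=8 \
      --runtime=60 --time_based

  # watch the disks while the tests run
  zpool iostat -v 5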