From krienke at uni-koblenz.de Thu Dec 1 08:37:36 2022 From: krienke at uni-koblenz.de (Rainer Krienke) Date: Thu, 1 Dec 2022 08:37:36 +0100 Subject: [PVE-User] proxmox hyperconverged pg calculations in pve 7.2, ceph pacific Message-ID: Hello, I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node has 8 4TB disks. pve and ceph are installed an running. Next I wanted to create some ceph-pools with each 512 pgs. Since I want to use erasure coding (5+3) when creating a pool one rbd pool for metadata and the data pool are created. I used pveceph pool this command: pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode off --pg_num 512 --pg_num_min 128 I was able to create two pools in this way but the third pveceph call threw this error: "got unexpected control message: TASK ERROR: error with 'osd pool create': mon_command failed - pg_num 512 size 8 would mean 22148 total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * num_in_osds 88)" What I do not understand now are the calculations behind the scenes for the calculated total pg number of 22148. But how is this total number "22148" calculated? I already reduced the number of pgs for the metadata pool of each ec-pool and so I was able to create 4 pools in this way. But just for fun I now tried to create ec-pool number 5 and I see the message from above. Here are the pools created by now (scraped from ceph osd pool autoscale-status): Pool: Size: Bias: PG_NUM: rbd 4599 1.0 32 px-a-data 528.2G 1.0 512 px-a-metadata 838.1k 1.0 128 px-b-data 0 1.0 512 px-b-metadata 19 1.0 128 px-c-data 0 1.0 512 px-c-metadata 19 1.0 128 px-d-data 0 1.0 512 px-d-metadata 0 1.0 128 So the total number of pgs for all pools is currently 2592 which is far from 22148 pgs? Any ideas? Thanks Rainer -- Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1 56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312 PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312 From elacunza at binovo.es Thu Dec 1 09:10:44 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 1 Dec 2022 09:10:44 +0100 Subject: [PVE-User] proxmox hyperconverged pg calculations in pve 7.2, ceph pacific In-Reply-To: References: Message-ID: Hi Rainer, I haven't used erasure coded pools so I can't comment, but you may have better luck asking in ceph-user mailing list, as the question is quite generic and not Proxmox related: https://lists.ceph.io/postorius/lists/ceph-users.ceph.io/ Cheers El 1/12/22 a las 8:37, Rainer Krienke escribi?: > Hello, > > I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node > has 8 4TB disks. pve and ceph are installed an running. > > Next I wanted to create some ceph-pools with each 512 pgs. Since I > want to use erasure coding (5+3) when creating a pool one rbd pool for > metadata and the data pool are created. I used pveceph pool this command: > > pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode > off --pg_num 512 --pg_num_min 128 > > I was able to create two pools in this way but the third pveceph call > threw this error: > > "got unexpected control message: TASK ERROR: error with 'osd pool > create': mon_command failed -? pg_num 512 size 8 would mean 22148 > total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * > num_in_osds 88)" > > What I do not understand now are the calculations behind the scenes > for the calculated total pg number of 22148. But how is this total > number "22148"? calculated? 
> > I already reduced the number of pgs for the metadata pool of each > ec-pool and so I was able to create 4 pools in this way. But just for > fun I now tried to create ec-pool number 5 and I see the message from > above. > > Here are the pools created by now (scraped from ceph osd pool > autoscale-status): > Pool:??????????????? Size:?? Bias:? PG_NUM: > rbd????????????????? 4599??? 1.0????? 32 > px-a-data????????? 528.2G??? 1.0???? 512 > px-a-metadata????? 838.1k??? 1.0???? 128 > px-b-data????????????? 0???? 1.0???? 512 > px-b-metadata???????? 19???? 1.0???? 128 > px-c-data????????????? 0???? 1.0???? 512 > px-c-metadata???????? 19???? 1.0???? 128 > px-d-data????????????? 0???? 1.0???? 512 > px-d-metadata????????? 0???? 1.0???? 128 > > So the total number of pgs for all pools is currently 2592 which is > far from 22148 pgs? > > Any ideas? > Thanks Rainer Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From alwin at antreich.com Thu Dec 1 21:29:36 2022 From: alwin at antreich.com (Alwin Antreich) Date: Thu, 01 Dec 2022 21:29:36 +0100 Subject: =?US-ASCII?Q?Re=3A_=5BPVE-User=5D_proxmox_hyperconverged_p?= =?US-ASCII?Q?g_calculations_in_pve_7=2E2=2C_ceph_pacific?= In-Reply-To: References: Message-ID: On December 1, 2022 8:37:36 AM GMT+01:00, Rainer Krienke wrote: >Hello, > >I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node has 8 4TB disks. pve and ceph are installed an running. What's the intended use for these? And what disks are they! > >Next I wanted to create some ceph-pools with each 512 pgs. Since I want to use erasure coding (5+3) when creating a pool one rbd pool for metadata and the data pool are created. I used pveceph pool this command: > >pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode off --pg_num 512 --pg_num_min 128 > It's __512 * 8 / num_osds__ to get the rough amount of PGs a OSD will be associated with. And use 4+3, as erasure profiles with a power of two perform better. Also the m is the amount off independent OSDs you can loose before loosing data. Is 3 your intent? And last but not least 5+3 will involve always 8 OSDs for a read/write. Plus objects are split, the size of the actual chunk matters much when HDDs are used. >I was able to create two pools in this way but the third pveceph call threw this error: > >"got unexpected control message: TASK ERROR: error with 'osd pool create': mon_command failed - pg_num 512 size 8 would mean 22148 total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * num_in_osds 88)" > >What I do not understand now are the calculations behind the scenes for the calculated total pg number of 22148. But how is this total number "22148" calculated? > >I already reduced the number of pgs for the metadata pool of each ec-pool and so I was able to create 4 pools in this way. But just for fun I now tried to create ec-pool number 5 and I see the message from above. > >Here are the pools created by now (scraped from ceph osd pool autoscale-status): >Pool: Size: Bias: PG_NUM: >rbd 4599 1.0 32 >px-a-data 528.2G 1.0 512 >px-a-metadata 838.1k 1.0 128 >px-b-data 0 1.0 512 >px-b-metadata 19 1.0 128 >px-c-data 0 1.0 512 >px-c-metadata 19 1.0 128 >px-d-data 0 1.0 512 >px-d-metadata 0 1.0 128 > >So the total number of pgs for all pools is currently 2592 which is far from 22148 pgs? > >Any ideas? 
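In case it helps to see where the monitor gets its number: the limit is not checked against the raw PG count (2592 here) but against PG replicas/shards, i.e. every pool contributes pg_num multiplied by its size, and the projected new pool is added on top. A rough sketch of that arithmetic, assuming size 3 for the replicated rbd and metadata pools and size 8 for the k=5,m=3 data pools (internal pools such as device_health_metrics and pending pg_num targets also count, so this will not land exactly on 22148):

    # existing pools: rbd + 4x (EC data pool + replicated metadata pool)
    echo $(( 32*3 + 4*(512*8) + 4*(128*3) ))   # 18016
    # plus one more EC data pool with pg_num 512 and size 8
    echo $(( 18016 + 512*8 ))                  # 22112, in the same ballpark as 22148
    # allowed budget: mon_max_pg_per_osd * num_in_osds
    echo $(( 250 * 88 ))                       # 22000
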
>Thanks Rainer Cheers, Alwin Hi Rainer, From gaio at lilliput.linux.it Mon Dec 12 17:07:01 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Mon, 12 Dec 2022 17:07:01 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: <20221128174156.gal7tmxds22tjreq@cloud0>; from SmartGate on Mon, Dec 12, 2022 at 19:06:01PM +0100 References: <20221128174156.gal7tmxds22tjreq@cloud0> Message-ID: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Mandi! Yannick Palanque In chel di` si favelave... > I see that it is a Dell SSD. Do you have any contract support? You > could ask to their support what they think of it. DELL support say that disk is good. Thre's some way to disable daily SMART email, eg defining that '8 bad sector is good'? Clearly without disabling SMART at all... If i've understood well, 'smartd' send notification using scripts in '/etc/smartmontools/run.d/', and particulary: /etc/smartmontools/run.d/10mail I've to code a custom script? I've used to have smartd signal disk trouble once, and reading manpages seems that this is still the default behaviour... -- La CIA ha scoperto chi porta il carbonchio... la befanchia!!! From elacunza at binovo.es Mon Dec 12 19:14:28 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Mon, 12 Dec 2022 19:14:28 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: <060c50f4-1024-6e38-0c2f-d8eaaefee46b@binovo.es> Hi Marco, I only get SMART emails when those values change. So if it stays with value 8, there should be a way to not receive an email (if you're gettting it now, that is)... I don't think anything was touched for this in our environment... Cheers El 12/12/22 a las 17:07, Marco Gaiarin escribi?: > Mandi! Yannick Palanque > In chel di` si favelave... > >> I see that it is a Dell SSD. Do you have any contract support? You >> could ask to their support what they think of it. > DELL support say that disk is good. > > > Thre's some way to disable daily SMART email, eg defining that '8 bad sector > is good'? Clearly without disabling SMART at all... > > > If i've understood well, 'smartd' send notification using scripts in > '/etc/smartmontools/run.d/', and particulary: > > /etc/smartmontools/run.d/10mail > > I've to code a custom script? > > > I've used to have smartd signal disk trouble once, and reading manpages > seems that this is still the default behaviour... > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From s.hanreich at proxmox.com Tue Dec 13 11:51:44 2022 From: s.hanreich at proxmox.com (Stefan Hanreich) Date: Tue, 13 Dec 2022 11:51:44 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! 
In-Reply-To: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: These warnings get governed by the configuration in /etc/smartd.conf The only line in the default configuration line looks like this: DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner You can change this to the following line to only get email notifications when the value of SMART attribute 198 increases: |DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner | You can find the documentation for this file in the respective man page [1]. Kind Regards Stefan || [1] https://linux.die.net/man/5/smartd.conf || On 12/12/22 17:07, Marco Gaiarin wrote: > Mandi! Yannick Palanque > In chel di` si favelave... > >> I see that it is a Dell SSD. Do you have any contract support? You >> could ask to their support what they think of it. > DELL support say that disk is good. > > > Thre's some way to disable daily SMART email, eg defining that '8 bad sector > is good'? Clearly without disabling SMART at all... > > > If i've understood well, 'smartd' send notification using scripts in > '/etc/smartmontools/run.d/', and particulary: > > /etc/smartmontools/run.d/10mail > > I've to code a custom script? > > > I've used to have smartd signal disk trouble once, and reading manpages > seems that this is still the default behaviour... > From s.hanreich at proxmox.com Tue Dec 13 11:56:24 2022 From: s.hanreich at proxmox.com (Stefan Hanreich) Date: Tue, 13 Dec 2022 11:56:24 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: Seems like there were some issues with the formatting of my last mail, so I am writing again: The default config looks like this: DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner This would need to be adapted like this: DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner Kind Regards On 12/13/22 11:51, Stefan Hanreich wrote: > These warnings get governed by the configuration in /etc/smartd.conf > > The only line in the default configuration line looks like this: > > DEVICESCAN -d removable -n standby -m root -M exec > /usr/share/smartmontools/smartd-runner > > You can change this to the following line to only get email > notifications when the value of SMART attribute 198 increases: > > |DEVICESCAN -U 198+ -d removable -n standby -m root -M exec > /usr/share/smartmontools/smartd-runner | > > You can find the documentation for this file in the respective man page > [1]. > > Kind Regards > Stefan > > > || > > [1] https://linux.die.net/man/5/smartd.conf > > || > > On 12/12/22 17:07, Marco Gaiarin wrote: >> Mandi! Yannick Palanque >> ?? In chel di` si favelave... >> >>> I see that it is a Dell SSD. Do you have any contract support? You >>> could ask to their support what they think of it. >> DELL support say that disk is good. >> >> >> Thre's some way to disable daily SMART email, eg defining that '8 bad >> sector >> is good'? Clearly without disabling SMART at all... >> >> >> If i've understood well, 'smartd' send notification using scripts in >> '/etc/smartmontools/run.d/', and particulary: >> >> ????/etc/smartmontools/run.d/10mail >> >> I've to code a custom script? 
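After changing /etc/smartd.conf along those lines, smartd has to re-read the file before the new -U directive takes effect. A minimal sketch of testing and applying it (the service name below assumes Debian's smartmontools package as shipped with PVE):

    smartd -q onecheck                  # parse smartd.conf, register devices, run one check, then exit
    systemctl restart smartmontools     # or send SIGHUP to the running smartd to reload the config
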
>> >> >> I've used to have smartd signal disk trouble once, and reading manpages >> seems that this is still the default behaviour... >> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From uwe.sauter.de at gmail.com Thu Dec 15 08:23:41 2022 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 15 Dec 2022 08:23:41 +0100 Subject: [PVE-User] How to configure which network is used for migration Message-ID: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Good morning, I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. Is there a way to configure the interface/network that is used for migration or does this depend on the combination of hostname resolution and which hostname was used to create the cluster? (I have various hostnames configured per host, for each configured network one, so that I can explicitly choose which interface I use to connect to a host.) Regards, Uwe Interface configuration: eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) eno2np1 N/C enp3s0 --+ +-- bond0 --+-- bond0.100 -- vmbr100 \ enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic +-- bond0.102 -- vmbr102 / enp5s0 --+ +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) enp6s0 --+ From mark at tuxis.nl Thu Dec 15 09:04:48 2022 From: mark at tuxis.nl (Mark Schouten) Date: Thu, 15 Dec 2022 08:04:48 +0000 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: Hi, Some unsolicited advice, switch to IPv6 ;) As for your question, you can set that up in the datacenter -> options tab. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_guest_migration Regards, ? Mark Schouten, CTO Tuxis B.V. mark at tuxis.nl / +31 318 200208 ------ Original Message ------ >From "Uwe Sauter" To "Proxmox VE user list" Date 12/15/2022 8:23:41 AM Subject [PVE-User] How to configure which network is used for migration >Good morning, > >I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different >network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. > >Is there a way to configure the interface/network that is used for migration or does this depend on >the combination of hostname resolution and which hostname was used to create the cluster? >(I have various hostnames configured per host, for each configured network one, so that I can >explicitly choose which interface I use to connect to a host.) 
> >Regards, > > Uwe > > >Interface configuration: > >eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) > >eno2np1 N/C > >enp3s0 --+ > +-- bond0 --+-- bond0.100 -- vmbr100 \ >enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic > +-- bond0.102 -- vmbr102 / > >enp5s0 --+ > +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) >enp6s0 --+ > >_______________________________________________ >pve-user mailing list >pve-user at lists.proxmox.com >https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From elacunza at binovo.es Thu Dec 15 09:06:25 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 15 Dec 2022 09:06:25 +0100 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: <3f621e67-63e1-e426-4bfb-a585669b19e9@binovo.es> Hi, You have in datacenter options "Migrations Settings", where you can set the migration network. Cheers El 15/12/22 a las 8:23, Uwe Sauter escribi?: > Good morning, > > I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different > network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. > > Is there a way to configure the interface/network that is used for migration or does this depend on > the combination of hostname resolution and which hostname was used to create the cluster? > (I have various hostnames configured per host, for each configured network one, so that I can > explicitly choose which interface I use to connect to a host.) > > Regards, > > Uwe > > > Interface configuration: > > eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) > > eno2np1 N/C > > enp3s0 --+ > +-- bond0 --+-- bond0.100 -- vmbr100 \ > enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic > +-- bond0.102 -- vmbr102 / > > enp5s0 --+ > +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) > enp6s0 --+ > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From Alexandre.DERUMIER at groupe-cyllene.com Thu Dec 15 09:01:16 2022 From: Alexandre.DERUMIER at groupe-cyllene.com (DERUMIER, Alexandre) Date: Thu, 15 Dec 2022 08:01:16 +0000 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: <448a49514cec8fa89ea87821a41aa95248216b00.camel@groupe-cyllene.com> Hi, datacenter->options->migration settings->network Le jeudi 15 d?cembre 2022 ? 08:23 +0100, Uwe Sauter a ?crit?: > Good morning, > > I'm currently replacing one PVE cluster with another. The new > hardware has a bunch of different > network interfaces that I want to use to separate VM traffic from > Corosync/Ceph/migration traffic. > > Is there a way to configure the interface/network that is used for > migration or does this depend on > the combination of hostname resolution and which hostname was used to > create the cluster? 
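For reference, the GUI setting mentioned above is stored in /etc/pve/datacenter.cfg and can also be overridden per migration on the command line. A minimal sketch, assuming the dedicated bond1 network 172.16.1.0/24 from the interface diagram in the original mail is the one meant to carry migration traffic:

    # /etc/pve/datacenter.cfg -- cluster-wide default
    migration: secure,network=172.16.1.0/24

    # or per migration from the CLI
    qm migrate <vmid> <targetnode> --online --migration_network 172.16.1.0/24
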
> (I have various hostnames configured per host, for each configured > network one, so that I can > explicitly choose which interface I use to connect to a host.) > > Regards, > > ????????Uwe > > > Interface configuration: > > eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network > (192.168.1.0/24) > > eno2np1 N/C > > enp3s0? --+ > ????????? +-- bond0 --+-- bond0.100 -- vmbr100 \ > enp4s0? --+?????????? +-- bond0.101 -- vmbr101 +-- VM traffic > ????????????????????? +-- bond0.102 -- vmbr102 / > > enp5s0? --+ > ????????? +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph > (172.16.1.0/24) > enp6s0? --+ > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From uwe.sauter.de at gmail.com Thu Dec 15 09:11:45 2022 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 15 Dec 2022 09:11:45 +0100 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: Mark, Alexandre, thanks for pointing that out. @Mark: as these are private, isolated networks I'm unsure how IPv6 would help with the issue. I could use link-local addresses (fe80::) but would need to append to every request the "%interface" postfix in order to tell the system which interface to use? Using unique local unicast addresses is no different in using private IPv4 networks? So please enlighten me how IPv6 would help in that situation. Besides that I agree ? we should switch to IPv6 where it is sensible and possible (from an organization's point of view). Regards, Uwe Am 15.12.22 um 09:04 schrieb Mark Schouten: > Hi, > > Some unsolicited advice, switch to IPv6 ;) > > As for your question, you can set that up in the datacenter -> options tab. See > https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_guest_migration > > Regards, > > ? > Mark Schouten, CTO > Tuxis B.V. > mark at tuxis.nl / +31 318 200208 > > > ------ Original Message ------ > From "Uwe Sauter" > To "Proxmox VE user list" > Date 12/15/2022 8:23:41 AM > Subject [PVE-User] How to configure which network is used for migration > >> Good morning, >> >> I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different >> network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. >> >> Is there a way to configure the interface/network that is used for migration or does this depend on >> the combination of hostname resolution and which hostname was used to create the cluster? >> (I have various hostnames configured per host, for each configured network one, so that I can >> explicitly choose which interface I use to connect to a host.) >> >> Regards, >> >> ????Uwe >> >> >> Interface configuration: >> >> eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) >> >> eno2np1 N/C >> >> enp3s0? --+ >> ????????? +-- bond0 --+-- bond0.100 -- vmbr100 \ >> enp4s0? --+?????????? +-- bond0.101 -- vmbr101 +-- VM traffic >> ????????????????????? +-- bond0.102 -- vmbr102 / >> >> enp5s0? --+ >> ????????? +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) >> enp6s0? 
--+ >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > From gaio at lilliput.linux.it Sat Dec 17 13:09:38 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Sat, 17 Dec 2022 13:09:38 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: ; from SmartGate on Sat, Dec 17, 2022 at 13:36:01PM +0100 References: Message-ID: <05o07j-6ep1.ln1@hermione.lilliput.linux.it> Mandi! Stefan Hanreich In chel di` si favelave... > DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner It works! Thanks!!! -- Worrying about case in a Windows (AD) context is one of the quickest paths to insanity. (Patrick Goetz) From Alexandre.DERUMIER at groupe-cyllene.com Tue Dec 20 13:16:22 2022 From: Alexandre.DERUMIER at groupe-cyllene.com (DERUMIER, Alexandre) Date: Tue, 20 Dec 2022 12:16:22 +0000 Subject: [PVE-User] [pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support In-Reply-To: <15f5554d-3708-ac78-d2e2-1d797e55e211@proxmox.com> References: <20221209192726.1499142-1-aderumier@odiso.com> <20221209192726.1499142-9-aderumier@odiso.com> <87423a6b-a17e-5ea4-9176-cd81e96c5693@proxmox.com> <7b9306c429440304fb37601ece5ffdbad0b90e5f.camel@groupe-cyllene.com> <15f5554d-3708-ac78-d2e2-1d797e55e211@proxmox.com> Message-ID: Le mardi 20 d?cembre 2022 ? 11:26 +0100, Fiona Ebner a ?crit?: > Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater > than > 1? And if it could get smaller than 1, we also might have issues with > the int()+1 approach, because the result of the first log() will > become > negative. > > To be on the safe side we could just move the minimum check up: > > my $blocksize = ($MAX_MEM - $static_memory) / 32000; > $blocksize = 2 if $blocksize < 2; > $blocksize = 2**(ceil(log($blocksize)/log(2))); I think your are right. I totally forget than mem was in bytes, so the minimum blocksize is 2048 with a MAX_MEM of 64gb, the minimum blocksize is 2048. (I remember now that I wanted 64GB minimum to have transparent huge working out of the box). if MAX_MEM was allowed 32gb ,the minimum blocksize with ceil is 1024. so we need to force it to 2048 I'll rework the patch, thanks ! From gaio at lilliput.linux.it Thu Dec 22 14:43:56 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Thu, 22 Dec 2022 14:43:56 +0100 Subject: [PVE-User] Strange SMART data behaviour... smartd and PVE have different serial... Message-ID: Look at the photo attached. Srver just installed, four HDD (that SMART and PVE identify correctly) and two SSD disk that PVE put in state 'UNKNOWN'. SSD disks are behind an HP controller, put in HBA mode. 
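For drives sitting behind an HP controller like this, the serial the SCSI/udev layer exposes (which is typically what a management GUI or lsblk displays) can differ from the one smartctl reads through the cciss pass-through, so it is worth comparing the two views. A sketch of read-only commands, using this host's device names as assumptions; the exact udev property names vary by drive and driver:

    smartctl --scan-open                                      # shows the -d cciss,N addressing smartctl wants
    lsblk -o NAME,MODEL,SERIAL /dev/sde                       # serial as the block layer sees it
    udevadm info --query=property --name=/dev/sde | grep -iE 'serial|wwn'
    smartctl -i -d cciss,0 /dev/sde | grep -iE 'serial|wwn'   # serial as reported via the controller
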
But if i try to check the disks using 'smartctl': root at svpve3:~# smartctl -d cciss,0 -a /dev/sde smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.203-1-pve] (local build) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Phison Driven SSDs Device Model: KINGSTON SEDC500M480G Serial Number: 50026B7282DBBD5D LU WWN Device Id: 5 0026b7 282dbbd5d Firmware Version: SCEKJ2.8 User Capacity: 480,103,981,056 bytes [480 GB] Sector Size: 512 bytes logical/physical Rotation Rate: Solid State Device Form Factor: 2.5 inches TRIM Command: Available, deterministic, zeroed Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-3 (minor revision not indicated) SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Thu Dec 22 14:37:16 2022 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled note that the serial are different. If i add to 'smartd.conf: /dev/sde -d cciss,0 -a -m root -M exec /usr/share/smartmontools/smartd-runner /dev/sdf -d cciss,1 -a -m root -M exec /usr/share/smartmontools/smartd-runner disks get monitored, but state file have the 'smartd' serial, not the 'PVE' serial. root at svpve3:~# ls -la /var/lib/smartmontools/ total 106 drwxr-xr-x 3 root root 19 Dec 22 14:40 . drwxr-xr-x 40 root root 40 Dec 21 14:08 .. -rw-r--r-- 1 root root 347 Dec 22 14:40 attrlog.KINGSTON_SEDC500M480G-50026B7282DBB86A.ata.csv -rw-r--r-- 1 root root 351 Dec 22 14:40 attrlog.KINGSTON_SEDC500M480G-50026B7282DBBD5D.ata.csv -rw-r--r-- 1 root root 16638 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WRQ0WQ44.ata.csv -rw-r--r-- 1 root root 16665 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1MBA8.ata.csv -rw-r--r-- 1 root root 16768 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1Q7F1.ata.csv -rw-r--r-- 1 root root 16626 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1RFL5.ata.csv drwxr-xr-x 2 root root 3 Dec 19 16:33 drivedb -rw-r--r-- 1 root root 3206 Dec 22 14:40 smartd.KINGSTON_SEDC500M480G-50026B7282DBB86A.ata.state -rw-r--r-- 1 root root 3209 Dec 22 14:40 smartd.KINGSTON_SEDC500M480G-50026B7282DBBD5D.ata.state -rw-r--r-- 1 root root 2529 Dec 22 14:40 smartd.ST8000VN004_3CP101-WRQ0WQ44.ata.state -rw-r--r-- 1 root root 2530 Dec 22 14:40 smartd.ST8000VN004_3CP101-WRQ0WQ44.ata.state~ -rw-r--r-- 1 root root 2530 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1MBA8.ata.state -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1MBA8.ata.state~ -rw-r--r-- 1 root root 2533 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1Q7F1.ata.state -rw-r--r-- 1 root root 2533 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1Q7F1.ata.state~ -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1RFL5.ata.state -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1RFL5.ata.state~ What i'm missing here? Where PVE get the disk serial? Thanks. From gaio at lilliput.linux.it Fri Dec 23 09:19:10 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Fri, 23 Dec 2022 09:19:10 +0100 Subject: [PVE-User] Unprivileged container dbus warning... Message-ID: On some unprivileged container (debian stratch) at every cron.daily run i got: Dec 23 06:43:37 vwp dbus[9223]: [system] Failed to reset fd limit before activating service: org.freedesktop.DBus.Error.AccessDenied: Failed to restore old fd limit: Operation not permitted seems a warning (the container works as expected), but how can i remove it?! Thanks. -- Dicono che la mafia ricicla i soldi sporchi in titoli di Stato. Ma ? 
naturale: volete che la mafia affidi i suoi soldi a gente sconosciuta? (Beppe Grillo) From oscar at dearriba.es Tue Dec 27 18:54:16 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Tue, 27 Dec 2022 18:54:16 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected Message-ID: Hello all, >From ~1 week ago, one of my Proxmox nodes' data LVM is doing strange things. For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. I have set up proxmox inside a cluster with LVM and making backups to a NFS external location. Last week I tried to migrate an stopped VM of ~64 GiB from one server to another, and found out *the SSD started to underperform (~5 MB/s) after roughly 55 GiB copied *(this pattern was repeated several times). It was so bad that *even cancelling the migration, the SSD continued busy writting at that speeed and I need to reboot the instance, as it was completely unusable* (it is in my homelab, not running mission critical workloads, so it was okay to do that). After the reboot, I could remove the half-copied VM disk. After that, (and several retries, even making a backup to an external storage and trying to restore the backup, just in case the bottleneck was on the migration process) I ended up creating the instance from scratch and migrating data from one VM to another - so the VM was crearted brand new and no bottleneck was hit. The problem is that *now the pve/data logical volume is showing 377 GiB used, but the total size of stored VM disks (even if they are 100% approvisioned) is 168 GiB*. I checked and both VMs have no snapshots. I don't know if the reboot while writting to the disk (always having cancelled the migration first) damaged the LV in some way, but after thinking about it it does not even make sense that an SSD of this type ends up writting at 5 MB/s, even with the writting cache full. It should be writting far faster than that even without cache. Some information about the storage: `root at venom:~# lvs -a LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 96.13 1.54 [data_tdata] pve Twi-ao---- 377.55g [data_tmeta] pve ewi-ao---- <3.86g [lvol0_pmspare] pve ewi------- <3.86g root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 71.51` and can be also seen on this post on the forum I did a couple of days ago: https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/ Any ideas aside from doing a backup and reinstall from scratch? Thanks in advance! From martin at holub.co.at Tue Dec 27 20:39:23 2022 From: martin at holub.co.at (Martin Holub) Date: Tue, 27 Dec 2022 20:39:23 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Am 27.12.2022 um 18:54 schrieb ?scar de Arriba: > Hello all, > > From ~1 week ago, one of my Proxmox nodes' data LVM is doing strange things. > > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. 
I have set up proxmox inside a cluster with LVM and making backups to a NFS external location. > > Last week I tried to migrate an stopped VM of ~64 GiB from one server to another, and found out *the SSD started to underperform (~5 MB/s) after roughly 55 GiB copied *(this pattern was repeated several times). > It was so bad that *even cancelling the migration, the SSD continued busy writting at that speeed and I need to reboot the instance, as it was completely unusable* (it is in my homelab, not running mission critical workloads, so it was okay to do that). After the reboot, I could remove the half-copied VM disk. > > After that, (and several retries, even making a backup to an external storage and trying to restore the backup, just in case the bottleneck was on the migration process) I ended up creating the instance from scratch and migrating data from one VM to another - so the VM was crearted brand new and no bottleneck was hit. > > The problem is that *now the pve/data logical volume is showing 377 GiB used, but the total size of stored VM disks (even if they are 100% approvisioned) is 168 GiB*. I checked and both VMs have no snapshots. > > I don't know if the reboot while writting to the disk (always having cancelled the migration first) damaged the LV in some way, but after thinking about it it does not even make sense that an SSD of this type ends up writting at 5 MB/s, even with the writting cache full. It should be writting far faster than that even without cache. > > Some information about the storage: > > `root at venom:~# lvs -a > LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > data pve twi-aotz-- 377.55g 96.13 1.54 > [data_tdata] pve Twi-ao---- 377.55g > [data_tmeta] pve ewi-ao---- <3.86g > [lvol0_pmspare] pve ewi------- <3.86g > root pve -wi-ao---- 60.00g > swap pve -wi-ao---- 4.00g > vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 > vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 > vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 > vm-201-disk-1 pve Vwi-aotz-- 40.00g data 71.51` > > and can be also seen on this post on the forum I did a couple of days ago: https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/ > > Any ideas aside from doing a backup and reinstall from scratch? > > Thanks in advance! > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user Hi, Never used lvm-thin, so beware, this is just guessing, but to me this looks like, for some reason, something filled up your pool once (probably the migration?). Consumer SSDs don't perform well when allocation all space (at least to my knowledge) and, even there is still space in the pool, there are no free blocks (as for the SSDs controller). Therefore the low speed may come from this situation, as the controller needs to erase blocks, before writing them again, due to the lack of (known) free space. Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. Also the "discard" option needs to be enabled for all volumes you want to trim, so check the VM config first. 
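A minimal sketch of the discard + trim combination on the PVE side, using VMID 201 and a scsi1 disk as an example (the storage name local-lvm and the bus/slot are assumptions -- check them with qm config first, and note that qm set replaces the whole drive string, so copy the current value and only append discard=on):

    qm config 201 | grep scsi1
    qm set 201 --scsi1 local-lvm:vm-201-disk-1,discard=on

    # then inside the (Linux) guest:
    fstrim -av
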
hth Martin From alain.pean at c2n.upsaclay.fr Wed Dec 28 11:52:23 2022 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 28 Dec 2022 11:52:23 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> Le 27/12/2022 ? 18:54, ?scar de Arriba a ?crit?: > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. Hi Oscar, Just to be sure, because normally wearout is 100% when the SSD is new, You are just soustracting, and it is in fact 100-4 = 96% ? My SSDs (Dell mixed use) after some years are still at 99%, so I am wondering about 4%... Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From oscar at dearriba.es Wed Dec 28 12:19:31 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 12:19:31 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> Message-ID: <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> Hi Alain, Thanks for taking time to answer my message. I think Proxmox UI is showing the % of wearout consumed. I just checked SMART using smartctl and it is showing 2.86 TB witten of a maximum of 180 TBW of this model (6%). I think those numbers are too much for the usage of this drive, but the number of power on hours match (52 days). I think the TBW are elevated because we had an instance with swap actived and that could generate s lot of IO (that's no longer the case from a couple of weeks ago). However, the strange behaviour of showing much more space used than the sum of VM disks + snapshots continue, and I'm really worried that the performance issue after copying some data can come from that situation. Also, the unit is showing now a 96% of space used, which worries me about decreased performance because of fragmentation issues. Oscar On Wed, Dec 28, 2022, at 11:52, Alain P?an wrote: > Le 27/12/2022 ? 18:54, ?scar de Arriba a ?crit : > > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. > > Hi Oscar, > > Just to be sure, because normally wearout is 100% when the SSD is new, > You are just soustracting, and it is in fact 100-4 = 96% ? > My SSDs (Dell mixed use) after some years are still at 99%, so I am > wondering about 4%... > > Alain > > -- > Administrateur Syst?me/R?seau > C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) > Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau > Tel : 01-70-27-06-88 Bureau A255 > > From oscar at dearriba.es Wed Dec 28 12:44:54 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 12:44:54 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Hi Martin, > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. 
I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. Should I also enable trim/execute trim on Proxmox itself? Oscar From alain.pean at c2n.upsaclay.fr Wed Dec 28 13:17:02 2022 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 28 Dec 2022 13:17:02 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> Message-ID: <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> Le 28/12/2022 ? 12:19, ?scar de Arriba a ?crit?: > I think Proxmox UI is showing the % of wearout consumed. In my case, with Dell servers, the UI in fact is not showing anything (N/A), when the Raid storage volume is managed by the raid controller. In this case, I use Dell OMSA (Open Manage Server Administration), to display the values. But I have another cluster with Ceph, and indeed, it displays 0% as wearout. So I think you are right. I saw that they are Crucial SATA SSD directly attached on the motherboard. What kind of filesystem do you have on these SSDs ? Can you run pveperf on /dev/mapper/pveroot to see what are the performances ? Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From oscar at dearriba.es Wed Dec 28 19:22:52 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 19:22:52 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> Message-ID: <4720409e-c9f6-442d-b6c6-0e2f006c4b17@app.fastmail.com> > I saw that they are Crucial SATA SSD directly attached on the motherboard. What kind of filesystem do you have on these SSDs ? Can you run pveperf on /dev/mapper/pveroot to see what are the performances ? It is using LVM with ext4 for the root filesystem and the data storage is using LVM-Thin. 
root at venom:~# blkid /dev/sdj2: UUID="7B86-9E58" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="f4324ec9-c95e-4963-9ea9-5026f8f3fcae" /dev/sdj3: UUID="16ioTj-mei2-pqZI-bJWU-myRs-AF5a-Pfw03x" TYPE="LVM2_member" PARTUUID="d2528e8f-7958-4dc1-9960-f67999b75058" /dev/mapper/pve-swap: UUID="0fbe15d8-7823-42bc-891c-c131407921c7" TYPE="swap" /dev/mapper/pve-root: UUID="6bef8c06-b480-409c-8fa0-076344c9108d" BLOCK_SIZE="4096" TYPE="ext4" /dev/sdj1: PARTUUID="70bb576f-ab3a-4867-ab2e-e9a7c3fb5a15" /dev/mapper/pve-vm--150--disk--1: PTUUID="90e3bde4-d85c-46cb-a4b9-799c99e340c6" PTTYPE="gpt" /dev/mapper/pve-vm--201--disk--1: PTUUID="cb44eeb1-db0d-4d42-8a14-05077231b097" PTTYPE="gpt" root at venom:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 96.60 1.55 root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 75.89 Regarding pveperf: root at venom:~# pveperf /dev/mapper/pve-root CPU BOGOMIPS: 211008.96 REGEX/SECOND: 1983864 HD SIZE: 58.76 GB (/dev/mapper/pve-root) BUFFERED READS: 338.10 MB/sec AVERAGE SEEK TIME: 0.09 ms open failed: Not a directory root at venom:~# pveperf ~/ CPU BOGOMIPS: 211008.96 REGEX/SECOND: 2067874 HD SIZE: 58.76 GB (/dev/mapper/pve-root) BUFFERED READS: 337.51 MB/sec AVERAGE SEEK TIME: 0.09 ms FSYNCS/SECOND: 679.87 DNS EXT: 128.22 ms DNS INT: 127.51 ms Thanks, Oscar From martin at holub.co.at Thu Dec 29 11:01:14 2022 From: martin at holub.co.at (Martin Holub) Date: Thu, 29 Dec 2022 11:01:14 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: > Hi Martin, > > > Did you try to run a fstrim on the VMs to regain the allocated > space? At least on linux something like "fstrim -av" should do the trick. > > I did it now and it freed ~55GiB of a running isntance (the one with > 128 GiB allocated). However that should only free blocks of the LV > used to store that VM disk, right? And the issue itself is that the > sum of maximum allocations of those disks is much lower than the space > occupied. > > I also have the feeling that those blocks remain used by a no longer > existant LVs, but I don't know how to fix it. > > Should I also enable trim/execute trim on Proxmox itself? > > Oscar > Hi, TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. hth Martin From oscar at dearriba.es Thu Dec 29 11:48:16 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Thu, 29 Dec 2022 11:48:16 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Any idea why it still has 96.23% of space used but the VMs are using way less? I'm starting to worry a lot about it (I don't kant tobe really full) and my current only hope is backup + reinstall PVE. Thanks, Oscar On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote: > > > Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: >> Hi Martin, >> >> > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. >> >> I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). 
However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. >> >> I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. >> >> Should I also enable trim/execute trim on Proxmox itself? >> >> Oscar >> > > > > Hi, > > TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. > > hth > Martin > From oscar at dearriba.es Thu Dec 29 17:58:51 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Thu, 29 Dec 2022 17:58:51 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> References: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Message-ID: Update: I have enabled `Discard` option in all disks of the VMs on that server and then `fstim` did the work and freed some space. However, even removing all VMs except one (which is hard to remove without disruption) I can see that: root at venom:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 60.65 0.67 root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 56.58 Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full. On Thu, Dec 29, 2022, at 11:48, ?scar de Arriba wrote: > Any idea why it still has 96.23% of space used but the VMs are using way less? I'm starting to worry a lot about it (I don't kant tobe really full) and my current only hope is backup + reinstall PVE. > > Thanks, > Oscar > > On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote: >> >> >> Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: >>> Hi Martin, >>> >>> > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. >>> >>> I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. >>> >>> I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. >>> >>> Should I also enable trim/execute trim on Proxmox itself? >>> >>> Oscar >>> >> >> >> >> >> Hi, >> >> TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. >> >> hth >> Martin >> >> > From pve at junkyard.4t2.com Fri Dec 30 10:52:21 2022 From: pve at junkyard.4t2.com (Tom Weber) Date: Fri, 30 Dec 2022 10:52:21 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Message-ID: <213da705-7804-c562-f21c-d6122bf88b85@junkyard.4t2.com> Am 29.12.22 um 17:58 schrieb ?scar de Arriba: > Update: I have enabled `Discard` option in all disks of the VMs on that server and then `fstim` did the work and freed some space. 
> > However, even removing all VMs except one (which is hard to remove without disruption) I can see that: > > root at venom:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > data pve twi-aotz-- 377.55g 60.65 0.67 > root pve -wi-ao---- 60.00g > swap pve -wi-ao---- 4.00g > vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 > vm-201-disk-1 pve Vwi-aotz-- 40.00g data 56.58 > > Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full. > you might want to try lvs -a Tom
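For completeness, a few read-only commands that help when a thin pool's Data% stays higher than the visible thin volumes explain (pool and VG names taken from the listings above; deleting a thin LV returns its blocks to the pool by itself, while blocks freed inside a guest only come back once discards actually reach the pool):

    lvs -a -o lv_name,lv_size,data_percent,metadata_percent,discards,thin_count pve
    dmsetup status pve-data-tpool   # used/total data and metadata blocks of the thin-pool target
                                    # (the exact dm name can be listed with: dmsetup ls)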