From krienke at uni-koblenz.de Thu Dec 1 08:37:36 2022 From: krienke at uni-koblenz.de (Rainer Krienke) Date: Thu, 1 Dec 2022 08:37:36 +0100 Subject: [PVE-User] proxmox hyperconverged pg calculations in pve 7.2, ceph pacific Message-ID: Hello, I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node has 8 4TB disks. pve and ceph are installed an running. Next I wanted to create some ceph-pools with each 512 pgs. Since I want to use erasure coding (5+3) when creating a pool one rbd pool for metadata and the data pool are created. I used pveceph pool this command: pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode off --pg_num 512 --pg_num_min 128 I was able to create two pools in this way but the third pveceph call threw this error: "got unexpected control message: TASK ERROR: error with 'osd pool create': mon_command failed - pg_num 512 size 8 would mean 22148 total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * num_in_osds 88)" What I do not understand now are the calculations behind the scenes for the calculated total pg number of 22148. But how is this total number "22148" calculated? I already reduced the number of pgs for the metadata pool of each ec-pool and so I was able to create 4 pools in this way. But just for fun I now tried to create ec-pool number 5 and I see the message from above. Here are the pools created by now (scraped from ceph osd pool autoscale-status): Pool: Size: Bias: PG_NUM: rbd 4599 1.0 32 px-a-data 528.2G 1.0 512 px-a-metadata 838.1k 1.0 128 px-b-data 0 1.0 512 px-b-metadata 19 1.0 128 px-c-data 0 1.0 512 px-c-metadata 19 1.0 128 px-d-data 0 1.0 512 px-d-metadata 0 1.0 128 So the total number of pgs for all pools is currently 2592 which is far from 22148 pgs? Any ideas? Thanks Rainer -- Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1 56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312 PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312 From elacunza at binovo.es Thu Dec 1 09:10:44 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 1 Dec 2022 09:10:44 +0100 Subject: [PVE-User] proxmox hyperconverged pg calculations in pve 7.2, ceph pacific In-Reply-To: References: Message-ID: Hi Rainer, I haven't used erasure coded pools so I can't comment, but you may have better luck asking in ceph-user mailing list, as the question is quite generic and not Proxmox related: https://lists.ceph.io/postorius/lists/ceph-users.ceph.io/ Cheers El 1/12/22 a las 8:37, Rainer Krienke escribi?: > Hello, > > I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node > has 8 4TB disks. pve and ceph are installed an running. > > Next I wanted to create some ceph-pools with each 512 pgs. Since I > want to use erasure coding (5+3) when creating a pool one rbd pool for > metadata and the data pool are created. I used pveceph pool this command: > > pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode > off --pg_num 512 --pg_num_min 128 > > I was able to create two pools in this way but the third pveceph call > threw this error: > > "got unexpected control message: TASK ERROR: error with 'osd pool > create': mon_command failed -? pg_num 512 size 8 would mean 22148 > total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * > num_in_osds 88)" > > What I do not understand now are the calculations behind the scenes > for the calculated total pg number of 22148. But how is this total > number "22148"? calculated? 
> > I already reduced the number of pgs for the metadata pool of each > ec-pool and so I was able to create 4 pools in this way. But just for > fun I now tried to create ec-pool number 5 and I see the message from > above. > > Here are the pools created by now (scraped from ceph osd pool > autoscale-status): > Pool:??????????????? Size:?? Bias:? PG_NUM: > rbd????????????????? 4599??? 1.0????? 32 > px-a-data????????? 528.2G??? 1.0???? 512 > px-a-metadata????? 838.1k??? 1.0???? 128 > px-b-data????????????? 0???? 1.0???? 512 > px-b-metadata???????? 19???? 1.0???? 128 > px-c-data????????????? 0???? 1.0???? 512 > px-c-metadata???????? 19???? 1.0???? 128 > px-d-data????????????? 0???? 1.0???? 512 > px-d-metadata????????? 0???? 1.0???? 128 > > So the total number of pgs for all pools is currently 2592 which is > far from 22148 pgs? > > Any ideas? > Thanks Rainer Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From alwin at antreich.com Thu Dec 1 21:29:36 2022 From: alwin at antreich.com (Alwin Antreich) Date: Thu, 01 Dec 2022 21:29:36 +0100 Subject: =?US-ASCII?Q?Re=3A_=5BPVE-User=5D_proxmox_hyperconverged_p?= =?US-ASCII?Q?g_calculations_in_pve_7=2E2=2C_ceph_pacific?= In-Reply-To: References: Message-ID: On December 1, 2022 8:37:36 AM GMT+01:00, Rainer Krienke wrote: >Hello, > >I run a a hyperconverged pve cluster (V7.2) with 11 nodes. Each node has 8 4TB disks. pve and ceph are installed an running. What's the intended use for these? And what disks are they! > >Next I wanted to create some ceph-pools with each 512 pgs. Since I want to use erasure coding (5+3) when creating a pool one rbd pool for metadata and the data pool are created. I used pveceph pool this command: > >pveceph pool create px-e --erasure-coding k=5,m=3 --pg_autoscale_mode off --pg_num 512 --pg_num_min 128 > It's __512 * 8 / num_osds__ to get the rough amount of PGs a OSD will be associated with. And use 4+3, as erasure profiles with a power of two perform better. Also the m is the amount off independent OSDs you can loose before loosing data. Is 3 your intent? And last but not least 5+3 will involve always 8 OSDs for a read/write. Plus objects are split, the size of the actual chunk matters much when HDDs are used. >I was able to create two pools in this way but the third pveceph call threw this error: > >"got unexpected control message: TASK ERROR: error with 'osd pool create': mon_command failed - pg_num 512 size 8 would mean 22148 total pgs, which exceeds max 22000 (mon_max_pg_per_osd 250 * num_in_osds 88)" > >What I do not understand now are the calculations behind the scenes for the calculated total pg number of 22148. But how is this total number "22148" calculated? > >I already reduced the number of pgs for the metadata pool of each ec-pool and so I was able to create 4 pools in this way. But just for fun I now tried to create ec-pool number 5 and I see the message from above. > >Here are the pools created by now (scraped from ceph osd pool autoscale-status): >Pool: Size: Bias: PG_NUM: >rbd 4599 1.0 32 >px-a-data 528.2G 1.0 512 >px-a-metadata 838.1k 1.0 128 >px-b-data 0 1.0 512 >px-b-metadata 19 1.0 128 >px-c-data 0 1.0 512 >px-c-metadata 19 1.0 128 >px-d-data 0 1.0 512 >px-d-metadata 0 1.0 128 > >So the total number of pgs for all pools is currently 2592 which is far from 22148 pgs? > >Any ideas? 
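In case it helps to see where the monitor gets its number: the limit is not checked against the raw PG count (2592 here) but against PG replicas/shards, i.e. every pool contributes pg_num multiplied by its size, and the projected new pool is added on top. A rough sketch of that arithmetic, assuming size 3 for the replicated rbd and metadata pools and size 8 for the k=5,m=3 data pools (internal pools such as device_health_metrics and pending pg_num targets also count, so this will not land exactly on 22148):

    # existing pools: rbd + 4x (EC data pool + replicated metadata pool)
    echo $(( 32*3 + 4*(512*8) + 4*(128*3) ))   # 18016
    # plus one more EC data pool with pg_num 512 and size 8
    echo $(( 18016 + 512*8 ))                  # 22112, in the same ballpark as 22148
    # allowed budget: mon_max_pg_per_osd * num_in_osds
    echo $(( 250 * 88 ))                       # 22000
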
>Thanks Rainer Cheers, Alwin Hi Rainer, From gaio at lilliput.linux.it Mon Dec 12 17:07:01 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Mon, 12 Dec 2022 17:07:01 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: <20221128174156.gal7tmxds22tjreq@cloud0>; from SmartGate on Mon, Dec 12, 2022 at 19:06:01PM +0100 References: <20221128174156.gal7tmxds22tjreq@cloud0> Message-ID: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Mandi! Yannick Palanque In chel di` si favelave... > I see that it is a Dell SSD. Do you have any contract support? You > could ask to their support what they think of it. DELL support say that disk is good. Thre's some way to disable daily SMART email, eg defining that '8 bad sector is good'? Clearly without disabling SMART at all... If i've understood well, 'smartd' send notification using scripts in '/etc/smartmontools/run.d/', and particulary: /etc/smartmontools/run.d/10mail I've to code a custom script? I've used to have smartd signal disk trouble once, and reading manpages seems that this is still the default behaviour... -- La CIA ha scoperto chi porta il carbonchio... la befanchia!!! From elacunza at binovo.es Mon Dec 12 19:14:28 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Mon, 12 Dec 2022 19:14:28 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: <060c50f4-1024-6e38-0c2f-d8eaaefee46b@binovo.es> Hi Marco, I only get SMART emails when those values change. So if it stays with value 8, there should be a way to not receive an email (if you're gettting it now, that is)... I don't think anything was touched for this in our environment... Cheers El 12/12/22 a las 17:07, Marco Gaiarin escribi?: > Mandi! Yannick Palanque > In chel di` si favelave... > >> I see that it is a Dell SSD. Do you have any contract support? You >> could ask to their support what they think of it. > DELL support say that disk is good. > > > Thre's some way to disable daily SMART email, eg defining that '8 bad sector > is good'? Clearly without disabling SMART at all... > > > If i've understood well, 'smartd' send notification using scripts in > '/etc/smartmontools/run.d/', and particulary: > > /etc/smartmontools/run.d/10mail > > I've to code a custom script? > > > I've used to have smartd signal disk trouble once, and reading manpages > seems that this is still the default behaviour... > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From s.hanreich at proxmox.com Tue Dec 13 11:51:44 2022 From: s.hanreich at proxmox.com (Stefan Hanreich) Date: Tue, 13 Dec 2022 11:51:44 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! 
In-Reply-To: <360k6j-u5c1.ln1@hermione.lilliput.linux.it> References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: These warnings get governed by the configuration in /etc/smartd.conf The only line in the default configuration line looks like this: DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner You can change this to the following line to only get email notifications when the value of SMART attribute 198 increases: |DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner | You can find the documentation for this file in the respective man page [1]. Kind Regards Stefan || [1] https://linux.die.net/man/5/smartd.conf || On 12/12/22 17:07, Marco Gaiarin wrote: > Mandi! Yannick Palanque > In chel di` si favelave... > >> I see that it is a Dell SSD. Do you have any contract support? You >> could ask to their support what they think of it. > DELL support say that disk is good. > > > Thre's some way to disable daily SMART email, eg defining that '8 bad sector > is good'? Clearly without disabling SMART at all... > > > If i've understood well, 'smartd' send notification using scripts in > '/etc/smartmontools/run.d/', and particulary: > > /etc/smartmontools/run.d/10mail > > I've to code a custom script? > > > I've used to have smartd signal disk trouble once, and reading manpages > seems that this is still the default behaviour... > From s.hanreich at proxmox.com Tue Dec 13 11:56:24 2022 From: s.hanreich at proxmox.com (Stefan Hanreich) Date: Tue, 13 Dec 2022 11:56:24 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: References: <20221128174156.gal7tmxds22tjreq@cloud0> <360k6j-u5c1.ln1@hermione.lilliput.linux.it> Message-ID: Seems like there were some issues with the formatting of my last mail, so I am writing again: The default config looks like this: DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner This would need to be adapted like this: DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner Kind Regards On 12/13/22 11:51, Stefan Hanreich wrote: > These warnings get governed by the configuration in /etc/smartd.conf > > The only line in the default configuration line looks like this: > > DEVICESCAN -d removable -n standby -m root -M exec > /usr/share/smartmontools/smartd-runner > > You can change this to the following line to only get email > notifications when the value of SMART attribute 198 increases: > > |DEVICESCAN -U 198+ -d removable -n standby -m root -M exec > /usr/share/smartmontools/smartd-runner | > > You can find the documentation for this file in the respective man page > [1]. > > Kind Regards > Stefan > > > || > > [1] https://linux.die.net/man/5/smartd.conf > > || > > On 12/12/22 17:07, Marco Gaiarin wrote: >> Mandi! Yannick Palanque >> ?? In chel di` si favelave... >> >>> I see that it is a Dell SSD. Do you have any contract support? You >>> could ask to their support what they think of it. >> DELL support say that disk is good. >> >> >> Thre's some way to disable daily SMART email, eg defining that '8 bad >> sector >> is good'? Clearly without disabling SMART at all... >> >> >> If i've understood well, 'smartd' send notification using scripts in >> '/etc/smartmontools/run.d/', and particulary: >> >> ????/etc/smartmontools/run.d/10mail >> >> I've to code a custom script? 
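After changing /etc/smartd.conf along those lines, smartd has to re-read the file before the new -U directive takes effect. A minimal sketch of testing and applying it (the service name below assumes Debian's smartmontools package as shipped with PVE):

    smartd -q onecheck                  # parse smartd.conf, register devices, run one check, then exit
    systemctl restart smartmontools     # or send SIGHUP to the running smartd to reload the config
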
>> >> >> I've used to have smartd signal disk trouble once, and reading manpages >> seems that this is still the default behaviour... >> > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From uwe.sauter.de at gmail.com Thu Dec 15 08:23:41 2022 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 15 Dec 2022 08:23:41 +0100 Subject: [PVE-User] How to configure which network is used for migration Message-ID: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Good morning, I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. Is there a way to configure the interface/network that is used for migration or does this depend on the combination of hostname resolution and which hostname was used to create the cluster? (I have various hostnames configured per host, for each configured network one, so that I can explicitly choose which interface I use to connect to a host.) Regards, Uwe Interface configuration: eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) eno2np1 N/C enp3s0 --+ +-- bond0 --+-- bond0.100 -- vmbr100 \ enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic +-- bond0.102 -- vmbr102 / enp5s0 --+ +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) enp6s0 --+ From mark at tuxis.nl Thu Dec 15 09:04:48 2022 From: mark at tuxis.nl (Mark Schouten) Date: Thu, 15 Dec 2022 08:04:48 +0000 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: Hi, Some unsolicited advice, switch to IPv6 ;) As for your question, you can set that up in the datacenter -> options tab. See https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_guest_migration Regards, ? Mark Schouten, CTO Tuxis B.V. mark at tuxis.nl / +31 318 200208 ------ Original Message ------ >From "Uwe Sauter" To "Proxmox VE user list" Date 12/15/2022 8:23:41 AM Subject [PVE-User] How to configure which network is used for migration >Good morning, > >I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different >network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. > >Is there a way to configure the interface/network that is used for migration or does this depend on >the combination of hostname resolution and which hostname was used to create the cluster? >(I have various hostnames configured per host, for each configured network one, so that I can >explicitly choose which interface I use to connect to a host.) 
> >Regards, > > Uwe > > >Interface configuration: > >eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) > >eno2np1 N/C > >enp3s0 --+ > +-- bond0 --+-- bond0.100 -- vmbr100 \ >enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic > +-- bond0.102 -- vmbr102 / > >enp5s0 --+ > +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) >enp6s0 --+ > >_______________________________________________ >pve-user mailing list >pve-user at lists.proxmox.com >https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From elacunza at binovo.es Thu Dec 15 09:06:25 2022 From: elacunza at binovo.es (Eneko Lacunza) Date: Thu, 15 Dec 2022 09:06:25 +0100 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: <3f621e67-63e1-e426-4bfb-a585669b19e9@binovo.es> Hi, You have in datacenter options "Migrations Settings", where you can set the migration network. Cheers El 15/12/22 a las 8:23, Uwe Sauter escribi?: > Good morning, > > I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different > network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. > > Is there a way to configure the interface/network that is used for migration or does this depend on > the combination of hostname resolution and which hostname was used to create the cluster? > (I have various hostnames configured per host, for each configured network one, so that I can > explicitly choose which interface I use to connect to a host.) > > Regards, > > Uwe > > > Interface configuration: > > eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) > > eno2np1 N/C > > enp3s0 --+ > +-- bond0 --+-- bond0.100 -- vmbr100 \ > enp4s0 --+ +-- bond0.101 -- vmbr101 +-- VM traffic > +-- bond0.102 -- vmbr102 / > > enp5s0 --+ > +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) > enp6s0 --+ > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > Eneko Lacunza Zuzendari teknikoa | Director t?cnico Binovo IT Human Project Tel. +34 943 569 206 |https://www.binovo.es Astigarragako Bidea, 2 - 2? izda. Oficina 10-11, 20180 Oiartzun https://www.youtube.com/user/CANALBINOVO https://www.linkedin.com/company/37269706/ From Alexandre.DERUMIER at groupe-cyllene.com Thu Dec 15 09:01:16 2022 From: Alexandre.DERUMIER at groupe-cyllene.com (DERUMIER, Alexandre) Date: Thu, 15 Dec 2022 08:01:16 +0000 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: <448a49514cec8fa89ea87821a41aa95248216b00.camel@groupe-cyllene.com> Hi, datacenter->options->migration settings->network Le jeudi 15 d?cembre 2022 ? 08:23 +0100, Uwe Sauter a ?crit?: > Good morning, > > I'm currently replacing one PVE cluster with another. The new > hardware has a bunch of different > network interfaces that I want to use to separate VM traffic from > Corosync/Ceph/migration traffic. > > Is there a way to configure the interface/network that is used for > migration or does this depend on > the combination of hostname resolution and which hostname was used to > create the cluster? 
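For reference, the GUI setting mentioned above is stored in /etc/pve/datacenter.cfg and can also be overridden per migration on the command line. A minimal sketch, assuming the dedicated bond1 network 172.16.1.0/24 from the interface diagram in the original mail is the one meant to carry migration traffic:

    # /etc/pve/datacenter.cfg -- cluster-wide default
    migration: secure,network=172.16.1.0/24

    # or per migration from the CLI
    qm migrate <vmid> <targetnode> --online --migration_network 172.16.1.0/24
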
> (I have various hostnames configured per host, for each configured > network one, so that I can > explicitly choose which interface I use to connect to a host.) > > Regards, > > ????????Uwe > > > Interface configuration: > > eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network > (192.168.1.0/24) > > eno2np1 N/C > > enp3s0? --+ > ????????? +-- bond0 --+-- bond0.100 -- vmbr100 \ > enp4s0? --+?????????? +-- bond0.101 -- vmbr101 +-- VM traffic > ????????????????????? +-- bond0.102 -- vmbr102 / > > enp5s0? --+ > ????????? +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph > (172.16.1.0/24) > enp6s0? --+ > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From uwe.sauter.de at gmail.com Thu Dec 15 09:11:45 2022 From: uwe.sauter.de at gmail.com (Uwe Sauter) Date: Thu, 15 Dec 2022 09:11:45 +0100 Subject: [PVE-User] How to configure which network is used for migration In-Reply-To: References: <0792849c-8d3f-e0de-460c-e89245c8d12b@gmail.com> Message-ID: Mark, Alexandre, thanks for pointing that out. @Mark: as these are private, isolated networks I'm unsure how IPv6 would help with the issue. I could use link-local addresses (fe80::) but would need to append to every request the "%interface" postfix in order to tell the system which interface to use? Using unique local unicast addresses is no different in using private IPv4 networks? So please enlighten me how IPv6 would help in that situation. Besides that I agree ? we should switch to IPv6 where it is sensible and possible (from an organization's point of view). Regards, Uwe Am 15.12.22 um 09:04 schrieb Mark Schouten: > Hi, > > Some unsolicited advice, switch to IPv6 ;) > > As for your question, you can set that up in the datacenter -> options tab. See > https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_guest_migration > > Regards, > > ? > Mark Schouten, CTO > Tuxis B.V. > mark at tuxis.nl / +31 318 200208 > > > ------ Original Message ------ > From "Uwe Sauter" > To "Proxmox VE user list" > Date 12/15/2022 8:23:41 AM > Subject [PVE-User] How to configure which network is used for migration > >> Good morning, >> >> I'm currently replacing one PVE cluster with another. The new hardware has a bunch of different >> network interfaces that I want to use to separate VM traffic from Corosync/Ceph/migration traffic. >> >> Is there a way to configure the interface/network that is used for migration or does this depend on >> the combination of hostname resolution and which hostname was used to create the cluster? >> (I have various hostnames configured per host, for each configured network one, so that I can >> explicitly choose which interface I use to connect to a host.) >> >> Regards, >> >> ????Uwe >> >> >> Interface configuration: >> >> eno1np0 --+-- untagged VLAN X -- Corosync ring 1/management network (192.168.1.0/24) >> >> eno2np1 N/C >> >> enp3s0? --+ >> ????????? +-- bond0 --+-- bond0.100 -- vmbr100 \ >> enp4s0? --+?????????? +-- bond0.101 -- vmbr101 +-- VM traffic >> ????????????????????? +-- bond0.102 -- vmbr102 / >> >> enp5s0? --+ >> ????????? +-- bond1 --+-- untagged VLAN Y -- Corosync ring 0/Ceph (172.16.1.0/24) >> enp6s0? 
--+ >> >> _______________________________________________ >> pve-user mailing list >> pve-user at lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > From gaio at lilliput.linux.it Sat Dec 17 13:09:38 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Sat, 17 Dec 2022 13:09:38 +0100 Subject: [PVE-User] OfflineUncorrectableSector, and now?! In-Reply-To: ; from SmartGate on Sat, Dec 17, 2022 at 13:36:01PM +0100 References: Message-ID: <05o07j-6ep1.ln1@hermione.lilliput.linux.it> Mandi! Stefan Hanreich In chel di` si favelave... > DEVICESCAN -U 198+ -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner It works! Thanks!!! -- Worrying about case in a Windows (AD) context is one of the quickest paths to insanity. (Patrick Goetz) From Alexandre.DERUMIER at groupe-cyllene.com Tue Dec 20 13:16:22 2022 From: Alexandre.DERUMIER at groupe-cyllene.com (DERUMIER, Alexandre) Date: Tue, 20 Dec 2022 12:16:22 +0000 Subject: [PVE-User] [pve-devel] [PATCH qemu-server 08/10] memory: add virtio-mem support In-Reply-To: <15f5554d-3708-ac78-d2e2-1d797e55e211@proxmox.com> References: <20221209192726.1499142-1-aderumier@odiso.com> <20221209192726.1499142-9-aderumier@odiso.com> <87423a6b-a17e-5ea4-9176-cd81e96c5693@proxmox.com> <7b9306c429440304fb37601ece5ffdbad0b90e5f.camel@groupe-cyllene.com> <15f5554d-3708-ac78-d2e2-1d797e55e211@proxmox.com> Message-ID: Le mardi 20 d?cembre 2022 ? 11:26 +0100, Fiona Ebner a ?crit?: > Isn't ($MAX_MEM - $static_memory) / 32000 always strictly greater > than > 1? And if it could get smaller than 1, we also might have issues with > the int()+1 approach, because the result of the first log() will > become > negative. > > To be on the safe side we could just move the minimum check up: > > my $blocksize = ($MAX_MEM - $static_memory) / 32000; > $blocksize = 2 if $blocksize < 2; > $blocksize = 2**(ceil(log($blocksize)/log(2))); I think your are right. I totally forget than mem was in bytes, so the minimum blocksize is 2048 with a MAX_MEM of 64gb, the minimum blocksize is 2048. (I remember now that I wanted 64GB minimum to have transparent huge working out of the box). if MAX_MEM was allowed 32gb ,the minimum blocksize with ceil is 1024. so we need to force it to 2048 I'll rework the patch, thanks ! From gaio at lilliput.linux.it Thu Dec 22 14:43:56 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Thu, 22 Dec 2022 14:43:56 +0100 Subject: [PVE-User] Strange SMART data behaviour... smartd and PVE have different serial... Message-ID: Look at the photo attached. Srver just installed, four HDD (that SMART and PVE identify correctly) and two SSD disk that PVE put in state 'UNKNOWN'. SSD disks are behind an HP controller, put in HBA mode. 
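For drives sitting behind an HP controller like this, the serial the SCSI/udev layer exposes (which is typically what a management GUI or lsblk displays) can differ from the one smartctl reads through the cciss pass-through, so it is worth comparing the two views. A sketch of read-only commands, using this host's device names as assumptions; the exact udev property names vary by drive and driver:

    smartctl --scan-open                                      # shows the -d cciss,N addressing smartctl wants
    lsblk -o NAME,MODEL,SERIAL /dev/sde                       # serial as the block layer sees it
    udevadm info --query=property --name=/dev/sde | grep -iE 'serial|wwn'
    smartctl -i -d cciss,0 /dev/sde | grep -iE 'serial|wwn'   # serial as reported via the controller
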
But if i try to check the disks using 'smartctl': root at svpve3:~# smartctl -d cciss,0 -a /dev/sde smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.4.203-1-pve] (local build) Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Phison Driven SSDs Device Model: KINGSTON SEDC500M480G Serial Number: 50026B7282DBBD5D LU WWN Device Id: 5 0026b7 282dbbd5d Firmware Version: SCEKJ2.8 User Capacity: 480,103,981,056 bytes [480 GB] Sector Size: 512 bytes logical/physical Rotation Rate: Solid State Device Form Factor: 2.5 inches TRIM Command: Available, deterministic, zeroed Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-3 (minor revision not indicated) SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Thu Dec 22 14:37:16 2022 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled note that the serial are different. If i add to 'smartd.conf: /dev/sde -d cciss,0 -a -m root -M exec /usr/share/smartmontools/smartd-runner /dev/sdf -d cciss,1 -a -m root -M exec /usr/share/smartmontools/smartd-runner disks get monitored, but state file have the 'smartd' serial, not the 'PVE' serial. root at svpve3:~# ls -la /var/lib/smartmontools/ total 106 drwxr-xr-x 3 root root 19 Dec 22 14:40 . drwxr-xr-x 40 root root 40 Dec 21 14:08 .. -rw-r--r-- 1 root root 347 Dec 22 14:40 attrlog.KINGSTON_SEDC500M480G-50026B7282DBB86A.ata.csv -rw-r--r-- 1 root root 351 Dec 22 14:40 attrlog.KINGSTON_SEDC500M480G-50026B7282DBBD5D.ata.csv -rw-r--r-- 1 root root 16638 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WRQ0WQ44.ata.csv -rw-r--r-- 1 root root 16665 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1MBA8.ata.csv -rw-r--r-- 1 root root 16768 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1Q7F1.ata.csv -rw-r--r-- 1 root root 16626 Dec 22 14:40 attrlog.ST8000VN004_3CP101-WWZ1RFL5.ata.csv drwxr-xr-x 2 root root 3 Dec 19 16:33 drivedb -rw-r--r-- 1 root root 3206 Dec 22 14:40 smartd.KINGSTON_SEDC500M480G-50026B7282DBB86A.ata.state -rw-r--r-- 1 root root 3209 Dec 22 14:40 smartd.KINGSTON_SEDC500M480G-50026B7282DBBD5D.ata.state -rw-r--r-- 1 root root 2529 Dec 22 14:40 smartd.ST8000VN004_3CP101-WRQ0WQ44.ata.state -rw-r--r-- 1 root root 2530 Dec 22 14:40 smartd.ST8000VN004_3CP101-WRQ0WQ44.ata.state~ -rw-r--r-- 1 root root 2530 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1MBA8.ata.state -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1MBA8.ata.state~ -rw-r--r-- 1 root root 2533 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1Q7F1.ata.state -rw-r--r-- 1 root root 2533 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1Q7F1.ata.state~ -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1RFL5.ata.state -rw-r--r-- 1 root root 2531 Dec 22 14:40 smartd.ST8000VN004_3CP101-WWZ1RFL5.ata.state~ What i'm missing here? Where PVE get the disk serial? Thanks. From gaio at lilliput.linux.it Fri Dec 23 09:19:10 2022 From: gaio at lilliput.linux.it (Marco Gaiarin) Date: Fri, 23 Dec 2022 09:19:10 +0100 Subject: [PVE-User] Unprivileged container dbus warning... Message-ID: On some unprivileged container (debian stratch) at every cron.daily run i got: Dec 23 06:43:37 vwp dbus[9223]: [system] Failed to reset fd limit before activating service: org.freedesktop.DBus.Error.AccessDenied: Failed to restore old fd limit: Operation not permitted seems a warning (the container works as expected), but how can i remove it?! Thanks. -- Dicono che la mafia ricicla i soldi sporchi in titoli di Stato. Ma ? 
naturale: volete che la mafia affidi i suoi soldi a gente sconosciuta? (Beppe Grillo) From oscar at dearriba.es Tue Dec 27 18:54:16 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Tue, 27 Dec 2022 18:54:16 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected Message-ID: Hello all, >From ~1 week ago, one of my Proxmox nodes' data LVM is doing strange things. For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. I have set up proxmox inside a cluster with LVM and making backups to a NFS external location. Last week I tried to migrate an stopped VM of ~64 GiB from one server to another, and found out *the SSD started to underperform (~5 MB/s) after roughly 55 GiB copied *(this pattern was repeated several times). It was so bad that *even cancelling the migration, the SSD continued busy writting at that speeed and I need to reboot the instance, as it was completely unusable* (it is in my homelab, not running mission critical workloads, so it was okay to do that). After the reboot, I could remove the half-copied VM disk. After that, (and several retries, even making a backup to an external storage and trying to restore the backup, just in case the bottleneck was on the migration process) I ended up creating the instance from scratch and migrating data from one VM to another - so the VM was crearted brand new and no bottleneck was hit. The problem is that *now the pve/data logical volume is showing 377 GiB used, but the total size of stored VM disks (even if they are 100% approvisioned) is 168 GiB*. I checked and both VMs have no snapshots. I don't know if the reboot while writting to the disk (always having cancelled the migration first) damaged the LV in some way, but after thinking about it it does not even make sense that an SSD of this type ends up writting at 5 MB/s, even with the writting cache full. It should be writting far faster than that even without cache. Some information about the storage: `root at venom:~# lvs -a LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 96.13 1.54 [data_tdata] pve Twi-ao---- 377.55g [data_tmeta] pve ewi-ao---- <3.86g [lvol0_pmspare] pve ewi------- <3.86g root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 71.51` and can be also seen on this post on the forum I did a couple of days ago: https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/ Any ideas aside from doing a backup and reinstall from scratch? Thanks in advance! From martin at holub.co.at Tue Dec 27 20:39:23 2022 From: martin at holub.co.at (Martin Holub) Date: Tue, 27 Dec 2022 20:39:23 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Am 27.12.2022 um 18:54 schrieb ?scar de Arriba: > Hello all, > > From ~1 week ago, one of my Proxmox nodes' data LVM is doing strange things. > > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. 
I have set up proxmox inside a cluster with LVM and making backups to a NFS external location. > > Last week I tried to migrate an stopped VM of ~64 GiB from one server to another, and found out *the SSD started to underperform (~5 MB/s) after roughly 55 GiB copied *(this pattern was repeated several times). > It was so bad that *even cancelling the migration, the SSD continued busy writting at that speeed and I need to reboot the instance, as it was completely unusable* (it is in my homelab, not running mission critical workloads, so it was okay to do that). After the reboot, I could remove the half-copied VM disk. > > After that, (and several retries, even making a backup to an external storage and trying to restore the backup, just in case the bottleneck was on the migration process) I ended up creating the instance from scratch and migrating data from one VM to another - so the VM was crearted brand new and no bottleneck was hit. > > The problem is that *now the pve/data logical volume is showing 377 GiB used, but the total size of stored VM disks (even if they are 100% approvisioned) is 168 GiB*. I checked and both VMs have no snapshots. > > I don't know if the reboot while writting to the disk (always having cancelled the migration first) damaged the LV in some way, but after thinking about it it does not even make sense that an SSD of this type ends up writting at 5 MB/s, even with the writting cache full. It should be writting far faster than that even without cache. > > Some information about the storage: > > `root at venom:~# lvs -a > LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > data pve twi-aotz-- 377.55g 96.13 1.54 > [data_tdata] pve Twi-ao---- 377.55g > [data_tmeta] pve ewi-ao---- <3.86g > [lvol0_pmspare] pve ewi------- <3.86g > root pve -wi-ao---- 60.00g > swap pve -wi-ao---- 4.00g > vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 > vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 > vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 > vm-201-disk-1 pve Vwi-aotz-- 40.00g data 71.51` > > and can be also seen on this post on the forum I did a couple of days ago: https://forum.proxmox.com/threads/thin-lvm-showing-more-used-space-than-expected.120051/ > > Any ideas aside from doing a backup and reinstall from scratch? > > Thanks in advance! > > _______________________________________________ > pve-user mailing list > pve-user at lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user Hi, Never used lvm-thin, so beware, this is just guessing, but to me this looks like, for some reason, something filled up your pool once (probably the migration?). Consumer SSDs don't perform well when allocation all space (at least to my knowledge) and, even there is still space in the pool, there are no free blocks (as for the SSDs controller). Therefore the low speed may come from this situation, as the controller needs to erase blocks, before writing them again, due to the lack of (known) free space. Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. Also the "discard" option needs to be enabled for all volumes you want to trim, so check the VM config first. 
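A minimal sketch of the discard + trim combination on the PVE side, using VMID 201 and a scsi1 disk as an example (the storage name local-lvm and the bus/slot are assumptions -- check them with qm config first, and note that qm set replaces the whole drive string, so copy the current value and only append discard=on):

    qm config 201 | grep scsi1
    qm set 201 --scsi1 local-lvm:vm-201-disk-1,discard=on

    # then inside the (Linux) guest:
    fstrim -av
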
hth Martin From alain.pean at c2n.upsaclay.fr Wed Dec 28 11:52:23 2022 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 28 Dec 2022 11:52:23 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> Le 27/12/2022 ? 18:54, ?scar de Arriba a ?crit?: > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. Hi Oscar, Just to be sure, because normally wearout is 100% when the SSD is new, You are just soustracting, and it is in fact 100-4 = 96% ? My SSDs (Dell mixed use) after some years are still at 99%, so I am wondering about 4%... Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From oscar at dearriba.es Wed Dec 28 12:19:31 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 12:19:31 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> Message-ID: <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> Hi Alain, Thanks for taking time to answer my message. I think Proxmox UI is showing the % of wearout consumed. I just checked SMART using smartctl and it is showing 2.86 TB witten of a maximum of 180 TBW of this model (6%). I think those numbers are too much for the usage of this drive, but the number of power on hours match (52 days). I think the TBW are elevated because we had an instance with swap actived and that could generate s lot of IO (that's no longer the case from a couple of weeks ago). However, the strange behaviour of showing much more space used than the sum of VM disks + snapshots continue, and I'm really worried that the performance issue after copying some data can come from that situation. Also, the unit is showing now a 96% of space used, which worries me about decreased performance because of fragmentation issues. Oscar On Wed, Dec 28, 2022, at 11:52, Alain P?an wrote: > Le 27/12/2022 ? 18:54, ?scar de Arriba a ?crit : > > For storage, I'm using a commercial Crucial MX500 SATA SSD connected directly to the motherboard controller (no PCIe HBA for the system+data disk) and it is brand new - and S.M.A.R.T. checks are passing, only 4% of wearout. > > Hi Oscar, > > Just to be sure, because normally wearout is 100% when the SSD is new, > You are just soustracting, and it is in fact 100-4 = 96% ? > My SSDs (Dell mixed use) after some years are still at 99%, so I am > wondering about 4%... > > Alain > > -- > Administrateur Syst?me/R?seau > C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) > Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau > Tel : 01-70-27-06-88 Bureau A255 > > From oscar at dearriba.es Wed Dec 28 12:44:54 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 12:44:54 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Hi Martin, > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. 
I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. Should I also enable trim/execute trim on Proxmox itself? Oscar From alain.pean at c2n.upsaclay.fr Wed Dec 28 13:17:02 2022 From: alain.pean at c2n.upsaclay.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 28 Dec 2022 13:17:02 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> Message-ID: <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> Le 28/12/2022 ? 12:19, ?scar de Arriba a ?crit?: > I think Proxmox UI is showing the % of wearout consumed. In my case, with Dell servers, the UI in fact is not showing anything (N/A), when the Raid storage volume is managed by the raid controller. In this case, I use Dell OMSA (Open Manage Server Administration), to display the values. But I have another cluster with Ceph, and indeed, it displays 0% as wearout. So I think you are right. I saw that they are Crucial SATA SSD directly attached on the motherboard. What kind of filesystem do you have on these SSDs ? Can you run pveperf on /dev/mapper/pveroot to see what are the performances ? Alain -- Administrateur Syst?me/R?seau C2N Centre de Nanosciences et Nanotechnologies (UMR 9001) Boulevard Thomas Gobert (ex Avenue de La Vauve), 91120 Palaiseau Tel : 01-70-27-06-88 Bureau A255 From oscar at dearriba.es Wed Dec 28 19:22:52 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Wed, 28 Dec 2022 19:22:52 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> References: <5b795a94-ebd3-7793-a7c3-4567cd1b04f0@c2n.upsaclay.fr> <41f85334-834c-4534-916d-39bf5e382c75@app.fastmail.com> <8f8156db-ec51-a426-3146-3fcf843b5699@c2n.upsaclay.fr> Message-ID: <4720409e-c9f6-442d-b6c6-0e2f006c4b17@app.fastmail.com> > I saw that they are Crucial SATA SSD directly attached on the motherboard. What kind of filesystem do you have on these SSDs ? Can you run pveperf on /dev/mapper/pveroot to see what are the performances ? It is using LVM with ext4 for the root filesystem and the data storage is using LVM-Thin. 
root at venom:~# blkid /dev/sdj2: UUID="7B86-9E58" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="f4324ec9-c95e-4963-9ea9-5026f8f3fcae" /dev/sdj3: UUID="16ioTj-mei2-pqZI-bJWU-myRs-AF5a-Pfw03x" TYPE="LVM2_member" PARTUUID="d2528e8f-7958-4dc1-9960-f67999b75058" /dev/mapper/pve-swap: UUID="0fbe15d8-7823-42bc-891c-c131407921c7" TYPE="swap" /dev/mapper/pve-root: UUID="6bef8c06-b480-409c-8fa0-076344c9108d" BLOCK_SIZE="4096" TYPE="ext4" /dev/sdj1: PARTUUID="70bb576f-ab3a-4867-ab2e-e9a7c3fb5a15" /dev/mapper/pve-vm--150--disk--1: PTUUID="90e3bde4-d85c-46cb-a4b9-799c99e340c6" PTTYPE="gpt" /dev/mapper/pve-vm--201--disk--1: PTUUID="cb44eeb1-db0d-4d42-8a14-05077231b097" PTTYPE="gpt" root at venom:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 96.60 1.55 root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-150-disk-0 pve Vwi-a-tz-- 4.00m data 14.06 vm-150-disk-1 pve Vwi-a-tz-- 128.00g data 100.00 vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 75.89 Regarding pveperf: root at venom:~# pveperf /dev/mapper/pve-root CPU BOGOMIPS: 211008.96 REGEX/SECOND: 1983864 HD SIZE: 58.76 GB (/dev/mapper/pve-root) BUFFERED READS: 338.10 MB/sec AVERAGE SEEK TIME: 0.09 ms open failed: Not a directory root at venom:~# pveperf ~/ CPU BOGOMIPS: 211008.96 REGEX/SECOND: 2067874 HD SIZE: 58.76 GB (/dev/mapper/pve-root) BUFFERED READS: 337.51 MB/sec AVERAGE SEEK TIME: 0.09 ms FSYNCS/SECOND: 679.87 DNS EXT: 128.22 ms DNS INT: 127.51 ms Thanks, Oscar From martin at holub.co.at Thu Dec 29 11:01:14 2022 From: martin at holub.co.at (Martin Holub) Date: Thu, 29 Dec 2022 11:01:14 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: > Hi Martin, > > > Did you try to run a fstrim on the VMs to regain the allocated > space? At least on linux something like "fstrim -av" should do the trick. > > I did it now and it freed ~55GiB of a running isntance (the one with > 128 GiB allocated). However that should only free blocks of the LV > used to store that VM disk, right? And the issue itself is that the > sum of maximum allocations of those disks is much lower than the space > occupied. > > I also have the feeling that those blocks remain used by a no longer > existant LVs, but I don't know how to fix it. > > Should I also enable trim/execute trim on Proxmox itself? > > Oscar > Hi, TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. hth Martin From oscar at dearriba.es Thu Dec 29 11:48:16 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Thu, 29 Dec 2022 11:48:16 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: Message-ID: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Any idea why it still has 96.23% of space used but the VMs are using way less? I'm starting to worry a lot about it (I don't kant tobe really full) and my current only hope is backup + reinstall PVE. Thanks, Oscar On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote: > > > Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: >> Hi Martin, >> >> > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. >> >> I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). 
However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. >> >> I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. >> >> Should I also enable trim/execute trim on Proxmox itself? >> >> Oscar >> > > > > Hi, > > TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. > > hth > Martin > From oscar at dearriba.es Thu Dec 29 17:58:51 2022 From: oscar at dearriba.es (=?UTF-8?Q?=C3=93scar_de_Arriba?=) Date: Thu, 29 Dec 2022 17:58:51 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> References: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Message-ID: Update: I have enabled `Discard` option in all disks of the VMs on that server and then `fstim` did the work and freed some space. However, even removing all VMs except one (which is hard to remove without disruption) I can see that: root at venom:~# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert data pve twi-aotz-- 377.55g 60.65 0.67 root pve -wi-ao---- 60.00g swap pve -wi-ao---- 4.00g vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 vm-201-disk-1 pve Vwi-aotz-- 40.00g data 56.58 Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full. On Thu, Dec 29, 2022, at 11:48, ?scar de Arriba wrote: > Any idea why it still has 96.23% of space used but the VMs are using way less? I'm starting to worry a lot about it (I don't kant tobe really full) and my current only hope is backup + reinstall PVE. > > Thanks, > Oscar > > On Thu, Dec 29, 2022, at 11:01, Martin Holub wrote: >> >> >> Am 28.12.2022 um 12:44 schrieb ?scar de Arriba: >>> Hi Martin, >>> >>> > Did you try to run a fstrim on the VMs to regain the allocated space? At least on linux something like "fstrim -av" should do the trick. >>> >>> I did it now and it freed ~55GiB of a running isntance (the one with 128 GiB allocated). However that should only free blocks of the LV used to store that VM disk, right? And the issue itself is that the sum of maximum allocations of those disks is much lower than the space occupied. >>> >>> I also have the feeling that those blocks remain used by a no longer existant LVs, but I don't know how to fix it. >>> >>> Should I also enable trim/execute trim on Proxmox itself? >>> >>> Oscar >>> >> >> >> >> >> Hi, >> >> TRIM only works on a filesystem level, so you can't trim a VG or similar. On the pve host i doubt it will help, but it wouldn't harm either. >> >> hth >> Martin >> >> > From pve at junkyard.4t2.com Fri Dec 30 10:52:21 2022 From: pve at junkyard.4t2.com (Tom Weber) Date: Fri, 30 Dec 2022 10:52:21 +0100 Subject: [PVE-User] Thin LVM showing more used space than expected In-Reply-To: References: <64119642-e2a3-428a-a053-371f9fc6bda0@app.fastmail.com> Message-ID: <213da705-7804-c562-f21c-d6122bf88b85@junkyard.4t2.com> Am 29.12.22 um 17:58 schrieb ?scar de Arriba: > Update: I have enabled `Discard` option in all disks of the VMs on that server and then `fstim` did the work and freed some space. 
> > However, even removing all VMs except one (which is hard to remove without disruption) I can see that: > > root at venom:~# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > data pve twi-aotz-- 377.55g 60.65 0.67 > root pve -wi-ao---- 60.00g > swap pve -wi-ao---- 4.00g > vm-201-disk-0 pve Vwi-aotz-- 4.00m data 14.06 > vm-201-disk-1 pve Vwi-aotz-- 40.00g data 56.58 > > Which means that I have about 200 GB used out of nowhere :( At least it is no longer under pressure of being almost 100% full. > you might want to try lvs -a Tom
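For completeness, a few read-only commands that help when a thin pool's Data% stays higher than the visible thin volumes explain (pool and VG names taken from the listings above; deleting a thin LV returns its blocks to the pool by itself, while blocks freed inside a guest only come back once discards actually reach the pool):

    lvs -a -o lv_name,lv_size,data_percent,metadata_percent,discards,thin_count pve
    dmsetup status pve-data-tpool   # used/total data and metadata blocks of the thin-pool target
                                    # (the exact dm name can be listed with: dmsetup ls)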