[PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent

Frank Thommen f.thommen at dkfz-heidelberg.de
Wed Sep 11 16:22:46 CEST 2024


I'm not sure if this is related, but looking at the activity LEDs of
the DB/WAL SSD devices (two SSDs in RAID 1 in each host), the device
in one host is basically permanently active (LEDs flickering without
pause), while in the other host the device seems almost completely
inactive (one blink every few seconds). The third host has no SSD DB
device yet. To me it looks as if one Ceph node is extremely active
while the other isn't, which also suggests an imbalance.
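
To put numbers behind the LED impression, I could compare something
like this on each node (the device name is only an example and would
need to be adjusted to the actual DB/WAL SSD, see lsblk):

   # per-OSD fill level, weight and PG count, grouped by host:
   # shows whether data (and therefore I/O) is spread evenly
   ceph osd df tree

   # live I/O statistics of the DB/WAL SSD
   iostat -x 5 /dev/sdX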

Frank


On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different sizes because we have 4 TB and 2 TB disks
> in the systems.
> 
> We might give the reweight a try.
> 
> 
> 
> On 10.09.24 20:31, David der Nederlanden | ITTY via pve-user wrote:
>> Hi Frank,
>>
>> The images didn't work 🙂
>>
>> Pool and OSD nearfull states are closely related: when OSDs fill up,
>> the pool also becomes nearfull, because Ceph has to be able to follow
>> the CRUSH rules, which it can't once one of the OSDs is full. Hence
>> the warning as soon as an OSD gets nearfull.
>>
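For reference, the per-OSD fill levels and the thresholds behind those
warnings can be checked with standard Ceph commands, e.g.:

   # usage per OSD and per pool
   ceph osd df
   ceph df detail

   # ratios that trigger the warnings
   # (defaults: nearfull 0.85, backfillfull 0.90, full 0.95)
   ceph osd dump | grep ratio
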
>> I see that you're mixing OSD sizes. Deleting and recreating the OSDs
>> one by one caused this: the OSDs got new weights, so you should be OK
>> once you reweight them. You can do this by hand or with
>> reweight-by-utilization, whichever you prefer.
>>
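A minimal sketch of both approaches (the OSD id and the values are only
examples; the test variant shows the planned changes without applying
anything):

   # dry run: show what reweight-by-utilization would change
   ceph osd test-reweight-by-utilization

   # apply it (optional threshold in percent, default 120)
   ceph osd reweight-by-utilization 115

   # or adjust a single over-full OSD by hand (override weight 0..1)
   ceph osd reweight 3 0.9

On newer Ceph releases the balancer module (ceph balancer mode upmap;
ceph balancer on) can also keep the distribution even continuously.
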
>> Not quite sure about the pool sizes, but an RBD pool with a 2/3
>> (min_size/size) replication rule should never be above 80% full: when
>> you lose one node and backfilling starts, that gives you a nearfull
>> pool, or in the worst case a full pool, rendering the pool read-only.
>>
>> Kind regards,
>> David
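
As a rough illustration of that 80% figure (assuming, just for the
example, a failure domain with four equally sized OSDs at 80% average
fill): if one of those OSDs dies, its data is backfilled onto the
remaining three, which would then have to hold roughly 80% * 4/3 ≈ 107%
of their capacity, so they would hit the nearfull (85%) and full (95%)
ratios long before backfill completes.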
> 


