[PVE-User] [Extern] - Re: "nearfull" status in PVE Dashboard not consistent
Frank Thommen
f.thommen at dkfz-heidelberg.de
Wed Sep 11 16:22:46 CEST 2024
I'm not sure if this is related, but looking at the activity LEDs of
the DB/WAL SSD devices (two SSDs in RAID 1 per host), the device in one
host is basically permanently active (LEDs flickering without pause),
while on the other host the device seems almost completely idle (one
blink every few seconds). The third host has no SSD DB device yet. To
me it looks as if one Ceph node is extremely active while the other
isn't, which also suggests an imbalance.
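
To rule out that this is just recovery or backfill traffic, something
like the following should show where the activity and the data actually
sit:

  ceph -s              # any recovery, backfill or rebalancing going on?
  ceph osd df tree     # per-OSD utilization, weights and PG counts per host
  ceph osd pool stats  # client vs. recovery I/O per pool
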
Frank
On 11.09.24 13:00, Frank Thommen wrote:
> The OSDs are of different size, because we have 4 TB and 2 TB disks in
> the systems.
>
> We might give the reweight a try.
>
>
>
> On 10.09.24 20:31, David der Nederlanden | ITTY via pve-user wrote:
>> Hi Frank,
>>
>> The images didn't work 🙂
>>
>> Pool and OSD nearfull are closely related: when OSDs get full, your pool
>> also gets nearfull, because Ceph needs to be able to follow the CRUSH
>> rules, which it can't if one of the OSDs gets full; hence the warning
>> when the pool gets nearfull.
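
For reference, the thresholds and which OSDs are flagged can be checked
roughly like this; 0.85 nearfull / 0.95 full are only the usual defaults
and may be set differently on this cluster:

  ceph health detail           # which OSDs/pools are flagged nearfull
  ceph osd dump | grep ratio   # nearfull_ratio / backfillfull_ratio / full_ratio
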
>>
>> I see that you're mixing OSD sizes. Deleting and recreating the OSDs
>> one by one caused this; as the OSDs got new weights, you should be OK
>> once you reweight them. You can do this by hand or with
>> reweight-by-utilization, whichever you prefer.
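
A rough sketch of both options (untested here; the OSD id and weight are
only example values, check 'ceph osd df tree' first):

  # manual: lower the reweight of a single overly full OSD
  ceph osd reweight 3 0.95

  # or let Ceph pick the OSDs that are above average utilization
  ceph osd test-reweight-by-utilization   # dry run, shows what would change
  ceph osd reweight-by-utilization
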
>>
>> Not quite sure about the pool sizes, but an RBD pool with a 2/3
>> (min_size/size) replication rule should never be above 80%: when you
>> lose one node and backfilling starts, that leaves you with a nearfull
>> pool, or in the worst case a full pool, rendering the pool read-only.
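
To double-check which rule and how much headroom a pool actually has:

  ceph osd pool ls detail   # size/min_size and crush rule per pool
  ceph df                   # STORED vs. MAX AVAIL per pool
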
>>
>> Kind regards,
>> David
>