[PVE-User] [ceph-users] Re: Ceph Usage web and terminal.

Сергей Цаболов tsabolov at t8.ru
Wed Dec 29 13:51:03 CET 2021


Hi, Uwe

On 29.12.2021 14:16, Uwe Sauter wrote:
> Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
> rest) is your problem.

Yes, the last node in the cluster has more disks than the rest, but one of
its disks is 12TB and the other 9 HDDs are 1TB each.
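
For reference, the per-OSD weights and utilization can be listed with a standard command (output omitted here):

  ceph osd df tree    # size, raw use, %use and weight per OSD, grouped by host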

>
> Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
> that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
> copies on that host become unresponsive, too.

In the Proxmox web UI I set the Ceph pool to Size: 2, Min. Size: 2.
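
The same values can also be checked from the CLI:

  ceph osd pool get vm.pool size
  ceph osd pool get vm.pool min_size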

With ceph osd map vm.pool <object-name> (where the object name is the VM ID) I can see that
for some VM objects one copy is on osd.12, for example:

osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> 
up ([12,8], p12) acting ([12,8], p12)

But in this example:

osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) 
-> up ([10,7], p10) acting ([10,7], p10)

the copies are on osd.10 and osd.7.
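
To check which host a given OSD lives on (taking osd.12 as an example), something like:

  ceph osd find 12    # prints the OSD's crush location, including the host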

>
> Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
> one host holds multiple copies of a VM image.

I didn't quite understand what I should check.

Can you explain it to me with an example?
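
My guess is this is about the pool's CRUSH rule. A rough, untested sketch of the commands
that might be involved (assuming a new rule named replicated_host), in case that is what you mean:

  ceph osd crush rule dump                  # check the "type" in the chooseleaf step: osd or host
  ceph osd pool get vm.pool crush_rule      # which rule the pool currently uses
  ceph osd crush rule create-replicated replicated_host default host
  ceph osd pool set vm.pool crush_rule replicated_host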


>
>
> Regards,
>
> 	Uwe
>
> On 29.12.21 at 09:36, Сергей Цаболов wrote:
>> Hello to all.
>>
>> In my case I have a 7-node Proxmox cluster with Ceph running (ceph version 15.2.15 octopus
>> (stable) on all 7 nodes)
>>
>> Ceph HEALTH_OK
>>
>> ceph -s
>>    cluster:
>>      id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
>>      health: HEALTH_OK
>>
>>    services:
>>      mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
>>      mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111,
>> pve-3108
>>      mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
>>      osd: 22 osds: 22 up (since 17h), 22 in (since 17h)
>>
>>    task status:
>>
>>    data:
>>      pools:   4 pools, 1089 pgs
>>      objects: 1.09M objects, 4.1 TiB
>>      usage:   7.7 TiB used, 99 TiB / 106 TiB avail
>>      pgs:     1089 active+clean
>>
>> ---------------------------------------------------------------------------------------------------------------------
>>
>>
>> ceph osd tree
>>
>> ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT PRI-AFF
>>   -1         106.43005  root default
>> -13          14.55478      host pve-3101
>>   10    hdd    7.27739          osd.10           up   1.00000 1.00000
>>   11    hdd    7.27739          osd.11           up   1.00000 1.00000
>> -11          14.55478      host pve-3103
>>    8    hdd    7.27739          osd.8            up   1.00000 1.00000
>>    9    hdd    7.27739          osd.9            up   1.00000 1.00000
>>   -3          14.55478      host pve-3105
>>    0    hdd    7.27739          osd.0            up   1.00000 1.00000
>>    1    hdd    7.27739          osd.1            up   1.00000 1.00000
>>   -5          14.55478      host pve-3107
>>    2    hdd    7.27739          osd.2            up   1.00000 1.00000
>>    3    hdd    7.27739          osd.3            up   1.00000 1.00000
>>   -9          14.55478      host pve-3108
>>    6    hdd    7.27739          osd.6            up   1.00000 1.00000
>>    7    hdd    7.27739          osd.7            up   1.00000 1.00000
>>   -7          14.55478      host pve-3109
>>    4    hdd    7.27739          osd.4            up   1.00000 1.00000
>>    5    hdd    7.27739          osd.5            up   1.00000 1.00000
>> -15          19.10138      host pve-3111
>>   12    hdd   10.91409          osd.12           up   1.00000 1.00000
>>   13    hdd    0.90970          osd.13           up   1.00000 1.00000
>>   14    hdd    0.90970          osd.14           up   1.00000 1.00000
>>   15    hdd    0.90970          osd.15           up   1.00000 1.00000
>>   16    hdd    0.90970          osd.16           up   1.00000 1.00000
>>   17    hdd    0.90970          osd.17           up   1.00000 1.00000
>>   18    hdd    0.90970          osd.18           up   1.00000 1.00000
>>   19    hdd    0.90970          osd.19           up   1.00000 1.00000
>>   20    hdd    0.90970          osd.20           up   1.00000 1.00000
>>   21    hdd    0.90970          osd.21           up   1.00000 1.00000
>>
>> ---------------------------------------------------------------------------------------------------------------
>>
>>
>> POOL       ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
>> vm.pool     2  1024  3.0 TiB  863.31k  6.0 TiB   6.38     44 TiB
>> (this pool holds all of the VM disks)
>>
>> ---------------------------------------------------------------------------------------------------------------
>>
>>
>> ceph osd map vm.pool vm.pool.object
>> osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2)
>> acting ([2,4], p2)
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> pveversion -v
>> proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
>> pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
>> pve-kernel-helper: 6.4-8
>> pve-kernel-5.4: 6.4-7
>> pve-kernel-5.4.143-1-pve: 5.4.143-1
>> pve-kernel-5.4.106-1-pve: 5.4.106-1
>> ceph: 15.2.15-pve1~bpo10
>> ceph-fuse: 15.2.15-pve1~bpo10
>> corosync: 3.1.2-pve1
>> criu: 3.11-3
>> glusterfs-client: 5.5-3
>> ifupdown: residual config
>> ifupdown2: 3.0.0-1+pve4~bpo10
>> ksm-control-daemon: 1.3-1
>> libjs-extjs: 6.0.1-10
>> libknet1: 1.22-pve1~bpo10+1
>> libproxmox-acme-perl: 1.1.0
>> libproxmox-backup-qemu0: 1.1.0-1
>> libpve-access-control: 6.4-3
>> libpve-apiclient-perl: 3.1-3
>> libpve-common-perl: 6.4-4
>> libpve-guest-common-perl: 3.1-5
>> libpve-http-server-perl: 3.2-3
>> libpve-storage-perl: 6.4-1
>> libqb0: 1.0.5-1
>> libspice-server1: 0.14.2-4~pve6+1
>> lvm2: 2.03.02-pve4
>> lxc-pve: 4.0.6-2
>> lxcfs: 4.0.6-pve1
>> novnc-pve: 1.1.0-1
>> proxmox-backup-client: 1.1.13-2
>> proxmox-mini-journalreader: 1.1-1
>> proxmox-widget-toolkit: 2.6-1
>> pve-cluster: 6.4-1
>> pve-container: 3.3-6
>> pve-docs: 6.4-2
>> pve-edk2-firmware: 2.20200531-1
>> pve-firewall: 4.1-4
>> pve-firmware: 3.3-2
>> pve-ha-manager: 3.1-1
>> pve-i18n: 2.3-1
>> pve-qemu-kvm: 5.2.0-6
>> pve-xtermjs: 4.7.0-3
>> qemu-server: 6.4-2
>> smartmontools: 7.2-pve2
>> spiceterm: 3.1-1
>> vncterm: 1.6-2
>> zfsutils-linux: 2.0.6-pve1~bpo10+1
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> And now my problem:
>>
>> For all VMs I have one pool for the VM disks.
>>
>> When node/host pve-3111 is shut down, on many of the other nodes/hosts (pve-3107, pve-3105) the
>> VMs do not shut down but become unreachable over the network.
>>
>> After the node/host comes back up, Ceph returns to HEALTH_OK and all VMs become reachable over
>> the network again (without a reboot).
>>
>> Can someone suggest what I should check in Ceph?
>>
>> Thanks.
>>
>
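
Next time the node is down, would it also help to look at which placement groups become
inactive, e.g. with:

  ceph health detail
  ceph pg dump_stuck inactive
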
-- 
-------------------------
Best regards,
Сергей Цаболов,
System Administrator
T8 LLC
Tel.: +74992716161,
Mob.: +79850334875
tsabolov at t8.ru
T8 LLC, 107076, Moscow, Krasnobogatyrskaya St. 44, bld. 1
www.t8.ru




