[PVE-User] Ceph jewel to luminous upgrade problem

Eneko Lacunza elacunza at binovo.es
Mon Nov 13 16:26:31 CET 2017

Hi all,

We're in the process of upgrading our office Proxmox v4.4 cluster to v5.1.

For that, we first followed the instructions in

to upgrade Ceph from Jewel to Luminous.

Upgrade was apparently a success:
# ceph -s
   cluster:
     id:     8ee074d4-005c-4bd6-a077-85eddde543b5
     health: HEALTH_OK

   services:
     mon: 3 daemons, quorum 0,2,3
     mgr: butroe(active), standbys: guadalupe, sanmarko
     osd: 12 osds: 12 up, 12 in

   data:
     pools:   2 pools, 640 pgs
     objects: 518k objects, 1966 GB
     usage:   4120 GB used, 7052 GB / 11172 GB avail
     pgs:     640 active+clean

   io:
     client:   644 kB/s rd, 3299 kB/s wr, 61 op/s rd, 166 op/s wr

And the versions look good too:
# ceph mon versions
{
     "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
}
# ceph osd versions
{
     "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 12
}

But this weekend there were problems backing up some VMs, all failing 
with the same error:
no such volume 'ceph-proxmox:vm-120-disk-1'

The "missing" volumes don't show up in the storage content view, but 
they DO appear if we run "rbd -p proxmox ls".
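A quick sketch for listing exactly which images fall into this gap, assuming the storage ID "ceph-proxmox" from the error message and the pool "proxmox" (file paths are illustrative):

```shell
# Volumes Proxmox sees on the storage, stripped to bare image names:
pvesm list ceph-proxmox | awk '{print $1}' | sed 's/^.*://' | sort > /tmp/pve-vols.txt

# Images Ceph itself has in the pool:
rbd -p proxmox ls | sort > /tmp/rbd-images.txt

# Images Ceph knows about that Proxmox does not list:
comm -13 /tmp/pve-vols.txt /tmp/rbd-images.txt
```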

However, when we try an info command on one of them, we get an error:
# rbd -p proxmox info vm-120-disk-1
2017-11-13 16:04:02.979006 7f99d8ff9700 -1 librbd::image::OpenRequest: 
failed to retreive immutable metadata: (2) No such file or directory
rbd: error opening image vm-120-disk-1: (2) No such file or directory

Other VM disk images behave normally:
# rbd -p proxmox info vm-119-disk-1
rbd image 'vm-119-disk-1':
     size 3072 MB in 768 objects
     order 22 (4096 kB objects)
     block_name_prefix: rbd_data.575762ae8944a
     format: 2
     features: layering

I don't really know where to look to diagnose this further. I recall 
that there was a version 1 format for rbd images, but I doubt the 
"missing" disk images are in that old format (and I really don't know 
how to check, given that "info" doesn't work...)
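One way to check the image format without "rbd info" is to look at the raw header objects: a format 1 image stores its header in an object named "<image>.rbd", while a format 2 image registers an "rbd_id.<image>" object instead. A hedged sketch (the helper function name is my own, and "rados stat" exits 0 only if the object exists):

```shell
# Classify an rbd image's format from the header objects in the pool.
rbd_image_format() {
    pool=$1 img=$2
    if rados -p "$pool" stat "$img.rbd" >/dev/null 2>&1; then
        echo 1          # old format: header object "<image>.rbd"
    elif rados -p "$pool" stat "rbd_id.$img" >/dev/null 2>&1; then
        echo 2          # new format: id object "rbd_id.<image>"
    else
        echo unknown    # no header object found at all
    fi
}

rbd_image_format proxmox vm-120-disk-1
```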

Some of the "missing" disk images are still in use by "old" running 
qemu processes and work correctly; but once we stop such a VM, it won't 
start again, failing with the error reported above. VMs with 
non-"missing" disk images start and stop normally.
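Since those old qemu processes still hold the images open, "rbd status" should list them as watchers; a quick sketch to confirm that before stopping anything:

```shell
# Count the clients that currently have the image open (watchers).
watchers=$(rbd -p proxmox status vm-120-disk-1 2>/dev/null | grep -c 'watcher=')
echo "vm-120-disk-1 has $watchers watcher(s)"
```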

Any hints about what to try next?

OSDs are filestore with XFS (created from the GUI).

# pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
ceph: 12.2.1-1~bpo80+1

Thanks a lot

Technical Director (Zuzendari Teknikoa / Director Técnico)
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
