[PVE-User] PVE 5.1 - Intel <-> AMD migration crash with Debian 9
Eneko Lacunza
elacunza at binovo.es
Fri Feb 2 10:14:33 CET 2018
Hi all,
We have replaced an old node in our office Proxmox 5.1 cluster with a
Ryzen 7 1700 machine with 64GB of non-ECC RAM, simply moving the disks
from the old Intel server to the new AMD machine. So far so good:
everything booted OK, the Ceph OSD started OK after adjusting the
network, and the replacement went really smoothly.
But we have found _one_ Debian 9 VM that kernel panics shortly after
being migrated between the Intel nodes and the AMD node (in either
direction). Sometimes it is a matter of seconds, sometimes it takes some
minutes, or even, rarely, one or two hours.
The strange thing is that we have done that kind of migration with other
VMs (several Windows VMs of different versions, another CentOS VM, a
Debian 8 VM) and they work perfectly.
If we restart this problematic VM after the migration+crash, it works
flawlessly (no more crashes until it is migrated to the other CPU vendor
again). Migration between Intel CPUs (with ECC memory) works OK too. We
don't have a second AMD machine to test migration between AMD nodes.
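Since the VM uses the kvm64 CPU type, most features should be masked to a common baseline, but it may still be worth comparing the flags each host actually exposes. A quick sketch (file names are just examples) that dumps this host's CPU flags one per line, to be repeated on each node and then diffed:

```shell
# Dump this host's CPU flags, one per line, sorted (run on each node).
grep -m1 '^flags' /proc/cpuinfo | tr ' \t' '\n\n' | grep -v '^flags\|^:\|^$' \
    | sort -u > /tmp/cpu-flags.txt

# Sanity check: a modern x86 CPU reports dozens of flags.
wc -l < /tmp/cpu-flags.txt
```

After copying both files to one machine, something like `comm -3 intel-flags.txt amd-flags.txt` would show the flags present on one host but not the other.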
The VM has 1 socket / 2 cores of type kvm64, 3GB of RAM, Standard VGA, a
cdrom at IDE2, SCSI controller virtio-scsi, scsi0 8GB on ceph-rbd, scsi1
50GB on ceph-rbd, virtio network, OS type Linux 4.x, hotplug enabled for
Disk/Network/USB, ACPI support yes, BIOS SeaBIOS, KVM hardware
virtualization yes, qemu agent no. We have tried with virtio-block too.
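For reference, an approximate reconstruction of what that configuration would look like in the VM's config file; the vmid, MAC address, bridge, and storage/volume names are placeholders, not taken from the actual setup:

```
# /etc/pve/qemu-server/<vmid>.conf -- sketch only, placeholders throughout
agent: 0
bios: seabios
bootdisk: scsi0
cores: 2
cpu: kvm64
ide2: none,media=cdrom
memory: 3072
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
ostype: l26
scsi0: <ceph-storage>:vm-<vmid>-disk-1,size=8G
scsi1: <ceph-storage>:vm-<vmid>-disk-2,size=50G
scsihw: virtio-scsi-pci
sockets: 1
vga: std
```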
# pveversion -v
proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.4.67-1-pve: 4.4.67-92
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-19
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-1~bpo90+1
Any ideas? This is a production VM, but it isn't critical, so we can
experiment with it. We can also live with the problem, but I think it
could be of interest to try to debug it.
Thanks a lot
Eneko
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es