[PVE-User] HDD errors in VMs

Thu Dec 31 17:03:07 CET 2015

Hi mailing list members.

We are getting strange disk errors in our Debian 8 VMs, and only inside the VMs.

The setup is qcow2 images with LVM inside the VMs (more details at the bottom).

They crash randomly: sometimes multiple times a day, sometimes only after
several days.

Does anyone have the same problem, or has anyone already solved it?

BTW: the RAID controller reports no errors, and the HDDs all seem to be fine.

kernel.log

[So Dez 27 05:17:44 2015] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[So Dez 27 05:17:44 2015] ata1.00: failed command: WRITE DMA
[So Dez 27 05:17:44 2015] ata1.00: cmd ca/00:80:b8:4e:ce/00:00:00:00:00/eb tag 0 dma 65536 out res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[So Dez 27 05:17:44 2015] ata1.00: status: { DRDY }
[So Dez 27 05:17:44 2015] ata1: soft resetting link
[So Dez 27 05:17:45 2015] ata1.01: NODEV after polling detection
[So Dez 27 05:17:45 2015] ata1.00: configured for MWDMA2
[So Dez 27 05:17:45 2015] ata1.00: device reported invalid CHS sector 0
[So Dez 27 05:17:45 2015] ata1: EH complete
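In case it helps anyone reading the log: the `cmd` field is the ATA taskfile that timed out. Below is a minimal Python sketch to decode it, assuming the field order libata's error handler uses when printing it (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). The opcode table covers only the two commands that appear in these logs, and the 512-byte sector size is an assumption.

```python
# Sketch: decode the "cmd" taskfile printed by libata's error handler.
# Assumed field order: command/feature:nsect:lbal:lbam:lbah/
#                      hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

# Only the two opcodes that appear in these logs are listed.
ATA_OPCODES = {0xCA: "WRITE DMA", 0xE7: "FLUSH CACHE"}
SECTOR_SIZE = 512  # assumption: 512-byte sectors on the emulated IDE disk


def decode_libata_cmd(field):
    command, low, _hob, device = field.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    opcode = int(command, 16)
    dev = int(device, 16)
    info = {"command": ATA_OPCODES.get(opcode, f"unknown 0x{opcode:02x}")}
    if opcode == 0xCA:
        # WRITE DMA uses 28-bit LBA: bits 24-27 sit in the low nibble of the device register
        info["lba"] = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
        # for 28-bit commands a sector count of 0 means 256 sectors
        info["bytes"] = (nsect or 256) * SECTOR_SIZE
    return info


print(decode_libata_cmd("ca/00:80:b8:4e:ce/00:00:00:00:00/eb"))
print(decode_libata_cmd("e7/00:00:00:00:00/00:00:00:00:00/a0"))
```

The first call decodes to a WRITE DMA of 65536 bytes at LBA 198069944, which matches the `dma 65536 out` in the same log line; the FLUSH CACHE commands in the later logs carry no data, so only the command name is reported for them.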

After these errors we have seen different outcomes:

- 9 times the VM stopped working and we had to press reset or reboot
several times before it came back up

- once we got a kernel panic afterwards

It does not seem to be a hardware defect, because the problem also occurs
after migrating the VM to another node.

What is strange is that all Debian 7 VMs are running fine; only the latest
Debian 8 VMs get this error.

pveversion

------------------------

proxmox-ve-2.6.32: 3.4-166 (running kernel: 2.6.32-43-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-43-pve: 2.6.32-166
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-14
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

pveperf

------------------------

root@server14:/var/lib/vz# pveperf /var/lib/vz
CPU BOGOMIPS:      55203.24
REGEX/SECOND:      946969
HD SIZE:           2605.33 GB (/dev/mapper/pve-data)
BUFFERED READS:    130.43 MB/sec
AVERAGE SEEK TIME: 17.61 ms
FSYNCS/SECOND:     811.25
DNS EXT:           53.06 ms
DNS INT:           50.92 ms (xxx)

vmXXX.conf

------------------------

#[hostname]
#[IP]
#
boot: cdn
bootdisk: ide0
cores: 2
ide0: local:110/vm-110-disk-1.qcow2,format=qcow2,cache=writethrough,size=201G
ide2: server16:iso/systemrescuecd-x86-4.6.1.iso,media=cdrom,size=459502K
memory: 4096
name: [FQDN]
net0: e1000=CE:C8:FE:B3:56:F8,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=d5bc6275-b25a-4523-b927-0d0098a7cb74
sockets: 1

Hardware info

------------------------

AMD Opteron(tm) Processor 6176 12 cores
Supermicro H8SGL
Adaptec 5405Z with ZMCP
2 x HGST HDN724030AL in RAID 1

Everything is updated to the latest versions.

Here are the next crashes of the VMs, all with the same error messages:

------------------------------------------- 

kernel: [242495.848207] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [242495.849075] ata1.00: failed command: FLUSH CACHE
kernel: [242495.849772] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [242495.849772]          res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [242495.851831] ata1.00: status: { DRDY }
kernel: [242500.892182] ata1: link is slow to respond, please be patient (ready=0)
kernel: [242505.876134] ata1: device not ready (errno=-16), forcing hardreset
kernel: [242505.876246] ata1: soft resetting link
kernel: [242506.033244] ata1.00: configured for MWDMA2
kernel: [242506.033252] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [242506.033620] ata1.00: device reported invalid CHS sector 0
kernel: [242506.033632] ata1: EH complete

kernel: [255097.832155] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [255097.833034] ata1.00: failed command: FLUSH CACHE
kernel: [255097.833744] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [255097.833744]          res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [255097.835810] ata1.00: status: { DRDY }
kernel: [255102.876126] ata1: link is slow to respond, please be patient (ready=0)
kernel: [255107.860130] ata1: device not ready (errno=-16), forcing hardreset
kernel: [255107.860153] ata1: soft resetting link
kernel: [255108.017093] ata1.00: configured for MWDMA2
kernel: [255108.017113] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [255108.017537] ata1.00: device reported invalid CHS sector 0
kernel: [255108.017550] ata1: EH complete

kernel: [309438.824333] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [309438.825198] ata1.00: failed command: FLUSH CACHE
kernel: [309438.825921] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [309438.825921]          res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [309438.827996] ata1.00: status: { DRDY }
kernel: [309443.868140] ata1: link is slow to respond, please be patient (ready=0)
kernel: [309448.852147] ata1: device not ready (errno=-16), forcing hardreset
kernel: [309448.852175] ata1: soft resetting link
kernel: [309449.009123] ata1.00: configured for MWDMA2
kernel: [309449.009129] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [309449.009532] ata1.00: device reported invalid CHS sector 0
kernel: [309449.009545] ata1: EH complete
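The bracketed numbers are the kernel's default timestamps (seconds since the VM booted), so the spacing between the freezes can be read straight from them. A small Python sketch that pulls out the `exception` lines and prints the gaps; the regex is tailored to the line format shown above and is an assumption, not a general dmesg parser.

```python
import re

# The three "exception" lines copied from the log above.
LOG = """\
kernel: [242495.848207] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [255097.832155] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [309438.824333] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
"""

# Assumed format: "[<seconds since boot>] ataN.NN: exception ..."
EXCEPTION_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+ata\d+\.\d+: exception")


def exception_times(text):
    """Return the boot-relative timestamps (seconds) of every ATA exception."""
    return [float(m.group(1)) for m in EXCEPTION_RE.finditer(text)]


def gaps_hours(times):
    """Hours between consecutive exceptions."""
    return [(b - a) / 3600 for a, b in zip(times, times[1:])]


times = exception_times(LOG)
for gap in gaps_hours(times):
    print(f"{gap:.1f} h")
```

For these three events it prints gaps of 3.5 h and 15.1 h, which fits the irregular crash pattern described at the top.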

Kind regards, and thanks for any hint

Michael
