[PVE-User] HDD errors in VMs
Michael Pöllinger
m.poellinger at wds-tech.de
Thu Dec 31 17:03:07 CET 2015
Hi mailing list members.
We are getting strange errors in our Debian 8 VMs, and only inside the VMs.
The setup is qcow2 with LVM inside the VMs (more details at the bottom).
They crash randomly, sometimes multiple times a day, sometimes only after
a few days.
Has anyone seen the same problem, or already solved it?
BTW: the RAID system reports no errors. The HDDs all seem to be fine.
kernel.log
[So Dez 27 05:17:44 2015] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[So Dez 27 05:17:44 2015] ata1.00: failed command: WRITE DMA
[So Dez 27 05:17:44 2015] ata1.00: cmd ca/00:80:b8:4e:ce/00:00:00:00:00/eb tag 0 dma 65536 out res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[So Dez 27 05:17:44 2015] ata1.00: status: { DRDY }
[So Dez 27 05:17:44 2015] ata1: soft resetting link
[So Dez 27 05:17:45 2015] ata1.01: NODEV after polling detection
[So Dez 27 05:17:45 2015] ata1.00: configured for MWDMA2
[So Dez 27 05:17:45 2015] ata1.00: device reported invalid CHS sector 0
[So Dez 27 05:17:45 2015] ata1: EH complete
After that we saw several different behaviors:
- 9 times the VM stopped working and we had to reset or reboot it
multiple times until it worked again
- 1 time we got a kernel panic afterwards
It does not seem to be a hardware defect, because the problem also occurs after
migration to another node.
What is strange is that all Debian 7 VMs run fine. Only the latest Debian 8
VMs get this error.
pveversion
------------------------
proxmox-ve-2.6.32: 3.4-166 (running kernel: 2.6.32-43-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-43-pve: 2.6.32-166
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-5
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-14
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
pveperf
------------------------
root@server14:/var/lib/vz# pveperf /var/lib/vz
CPU BOGOMIPS: 55203.24
REGEX/SECOND: 946969
HD SIZE: 2605.33 GB (/dev/mapper/pve-data)
BUFFERED READS: 130.43 MB/sec
AVERAGE SEEK TIME: 17.61 ms
FSYNCS/SECOND: 811.25
DNS EXT: 53.06 ms
DNS INT: 50.92 ms (xxx)
vmXXX.conf
------------------------
#[hostname]
#[IP]
#
boot: cdn
bootdisk: ide0
cores: 2
ide0: local:110/vm-110-disk-1.qcow2,format=qcow2,cache=writethrough,size=201G
ide2: server16:iso/systemrescuecd-x86-4.6.1.iso,media=cdrom,size=459502K
memory: 4096
name: [FQDN]
net0: e1000=CE:C8:FE:B3:56:F8,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=d5bc6275-b25a-4523-b927-0d0098a7cb74
sockets: 1
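
To compare the affected VMs with the unaffected ones, it may help to list which bus and cache mode each virtual disk uses. The following is only a minimal sketch, assuming the config file has one "key: value" option per line (the ide0 entry above is a single line in the file). The names parse_disks and DISK_KEY are made up for this illustration and are not part of any Proxmox tool.

```python
import re

# Hypothetical helper: disk-like keys in a VM config (ide0, sata1, scsi0, virtio0, ...).
DISK_KEY = re.compile(r"^(ide|sata|scsi|virtio)(\d+)$")


def parse_disks(conf_text):
    """Return one dict per hard disk entry, skipping CD-ROM drives."""
    disks = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines and anything that is not "key: value".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        match = DISK_KEY.match(key.strip())
        if not match:
            continue
        # First comma-separated field is the volume, the rest are key=value options.
        volume, *opts = value.strip().split(",")
        options = dict(o.split("=", 1) for o in opts if "=" in o)
        if options.get("media") == "cdrom":
            continue
        disks.append({
            "bus": match.group(1),
            "index": int(match.group(2)),
            "volume": volume,
            "options": options,
        })
    return disks


if __name__ == "__main__":
    sample = """\
bootdisk: ide0
ide0: local:110/vm-110-disk-1.qcow2,format=qcow2,cache=writethrough,size=201G
ide2: server16:iso/systemrescuecd-x86-4.6.1.iso,media=cdrom,size=459502K
net0: e1000=CE:C8:FE:B3:56:F8,bridge=vmbr0,firewall=1
"""
    for disk in parse_disks(sample):
        print(disk["bus"], disk["index"], disk["volume"], disk["options"].get("cache"))
```

Running this over the configs of a Debian 7 and a Debian 8 VM would show whether they differ in anything besides the guest OS.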
Hardware info
------------------------
AMD Opteron(tm) Processor 6176 12 cores
Supermicro H8SGL
Adaptec 5405Z with ZMCP
2 x HGST HDN724030AL as RAID 1
All updated to the latest versions.
Here are the next crashes of the VMs, all with the same error messages:
-------------------------------------------
kernel: [242495.848207] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [242495.849075] ata1.00: failed command: FLUSH CACHE
kernel: [242495.849772] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [242495.849772] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [242495.851831] ata1.00: status: { DRDY }
kernel: [242500.892182] ata1: link is slow to respond, please be patient (ready=0)
kernel: [242505.876134] ata1: device not ready (errno=-16), forcing hardreset
kernel: [242505.876246] ata1: soft resetting link
kernel: [242506.033244] ata1.00: configured for MWDMA2
kernel: [242506.033252] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [242506.033620] ata1.00: device reported invalid CHS sector 0
kernel: [242506.033632] ata1: EH complete
kernel: [255097.832155] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [255097.833034] ata1.00: failed command: FLUSH CACHE
kernel: [255097.833744] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [255097.833744] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [255097.835810] ata1.00: status: { DRDY }
kernel: [255102.876126] ata1: link is slow to respond, please be patient (ready=0)
kernel: [255107.860130] ata1: device not ready (errno=-16), forcing hardreset
kernel: [255107.860153] ata1: soft resetting link
kernel: [255108.017093] ata1.00: configured for MWDMA2
kernel: [255108.017113] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [255108.017537] ata1.00: device reported invalid CHS sector 0
kernel: [255108.017550] ata1: EH complete
kernel: [309438.824333] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [309438.825198] ata1.00: failed command: FLUSH CACHE
kernel: [309438.825921] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
kernel: [309438.825921] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [309438.827996] ata1.00: status: { DRDY }
kernel: [309443.868140] ata1: link is slow to respond, please be patient (ready=0)
kernel: [309448.852147] ata1: device not ready (errno=-16), forcing hardreset
kernel: [309448.852175] ata1: soft resetting link
kernel: [309449.009123] ata1.00: configured for MWDMA2
kernel: [309449.009129] ata1.00: retrying FLUSH 0xe7 Emask 0x4
kernel: [309449.009532] ata1.00: device reported invalid CHS sector 0
kernel: [309449.009545] ata1: EH complete
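
To see how often these timeouts occur and how far apart they are, the "failed command" lines can be tallied from the guest's kernel log. This is just a rough sketch, assuming uptime-style timestamps like "[242495.849075]" as in the excerpt above (it does not handle the date-style timestamps of the first excerpt). The function names failed_commands and summarize are made up for this illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: [242495.849075] ata1.00: failed command: FLUSH CACHE
FAILED_CMD = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<dev>ata\d+\.\d+): failed command: (?P<cmd>.+?)\s*$"
)


def failed_commands(lines):
    """Return (uptime_seconds, device, command) for every failed ATA command."""
    events = []
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            events.append((float(match.group("ts")), match.group("dev"), match.group("cmd")))
    return events


def summarize(events):
    """Count failures per (device, command) and compute gaps between them in seconds."""
    counts = Counter((dev, cmd) for _, dev, cmd in events)
    stamps = [ts for ts, _, _ in events]
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return counts, gaps


if __name__ == "__main__":
    sample = [
        "kernel: [242495.849075] ata1.00: failed command: FLUSH CACHE",
        "kernel: [255097.833034] ata1.00: failed command: FLUSH CACHE",
        "kernel: [309438.825198] ata1.00: failed command: FLUSH CACHE",
    ]
    counts, gaps = summarize(failed_commands(sample))
    for (dev, cmd), n in counts.items():
        print(f"{dev}: {cmd} failed {n} time(s)")
    print("hours between failures:", [round(g / 3600, 1) for g in gaps])
```

Fed with the whole log instead of the three sample lines, this would show whether the failures cluster around particular times, for example during backups or other heavy I/O on the host.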
Kind regards, and I would be grateful for any hint.
Michael