[PVE-User] Moving disk with ZFS over iSCSI = IO error

Daniel Berteaud daniel at firewall-services.com
Thu Sep 19 07:57:20 CEST 2019


----- On 17 Sep 19, at 18:27, Daniel Berteaud <daniel at firewall-services.com> wrote: 

> Hi there.

> I'm working on moving my NFS setup to ZFS over iSCSI. I'm using a CentOS 7.6 box
> with ZoL 0.8.1 and the LIO backend (though this shouldn't be relevant, see
> below). On the PVE side, I'm running PVE 6 with all updates applied.

> Apart from a few minor issues I found in the LIO backend (for which I sent a
> patch series earlier today), most things work nicely, except one that is
> important to me: I can't move a disk from ZFS over iSCSI to any other storage.
> The destination storage type doesn't matter, but the problem is 100%
> reproducible whenever the source storage is ZFS over iSCSI.

> A few seconds after I start the disk move, the guest FS "panics". For example,
> with an el7 guest using XFS, I get:

> kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
> kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
> kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00
> kernel: blk_update_request: I/O error, dev sda, sector 7962536
> kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
> kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
> kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00
> kernel: blk_update_request: I/O error, dev sda, sector 7962536
> kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
> kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
> kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 bc 0e 28 00 00 08 00
> kernel: blk_update_request: I/O error, dev sda, sector 12324392
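The Read(10) CDBs in these messages can be decoded by hand to confirm they target exactly the sectors that blk_update_request reports as failing; a minimal sketch (the helper name is made up for illustration):

```python
# Decode a SCSI Read(10) CDB: byte 0 is the opcode (0x28),
# bytes 2-5 are the big-endian LBA, bytes 7-8 the transfer length in blocks.
def decode_read10(cdb_hex):
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert cdb[0] == 0x28, "not a Read(10) CDB"
    lba = int.from_bytes(cdb[2:6], "big")
    length = int.from_bytes(cdb[7:9], "big")
    return lba, length

# CDBs copied verbatim from the kernel log above:
print(decode_read10("28 00 00 79 7f a8 00 00 08 00"))  # (7962536, 8)
print(decode_read10("28 00 00 bc 0e 28 00 00 08 00"))  # (12324392, 8)
```

So each aborted command is an ordinary 8-block (4 KiB) read at the very sector the block layer then reports as an I/O error.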

> And the system crashes completely. The data itself is not impacted: I can
> restart the guest and everything appears OK. It doesn't matter whether I let
> the disk move operation finish or cancel it.
> Moving the disk offline works as expected.

> Whether the zvol backend is sparse or non-sparse doesn't matter either.

> I searched a lot for this issue and found at least two other people hitting
> the same, or a very similar, problem:

>    * One using ZoL but with SCST, see
>      https://sourceforge.net/p/scst/mailman/message/35241011/
>    * Another using OmniOS, so with Comstar, see
>      https://forum.proxmox.com/threads/storage-iscsi-move-results-to-io-error.38848/

> Both are likely running PVE5, so it looks like it's not a recently introduced
> regression.

> I was also able to reproduce the issue with a FreeNAS storage, so using ctld.
> As the issue is present with so many different stacks, I think we can rule out
> a problem on the storage side. The problem is most likely in QEMU, in its
> iSCSI block implementation.
> The SCST-Devel thread is interesting, but unfortunately it's beyond my skills.

> Any advice on how to debug this further? I can reproduce it whenever I want on
> a test setup, and I'm happy to provide any useful information.

> Regards, Daniel

Forgot to mention: when moving a disk offline, from ZFS over iSCSI to something else (in my case an NFS storage), I do get warnings like this: 

create full clone of drive scsi0 (zfs-test:vm-132-disk-0) 
Formatting '/mnt/pve/nfs-dumps/images/132/vm-132-disk-0.qcow2', fmt=qcow2 size=53687091200 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16 
transferred: 0 bytes remaining: 53687091200 bytes total: 53687091200 bytes progression: 0.00 % 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 0: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 4194303: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 8388606: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 12582909: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 16777212: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 20971515: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 25165818: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 29360121: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 33554424: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 37748727: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 41943030: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 46137333: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 50331636: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 54525939: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 58720242: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 62914545: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 67108848: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 71303151: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 75497454: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 79691757: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 83886060: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 88080363: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 92274666: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 96468969: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 100663272: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 104857575: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 0: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
transferred: 536870912 bytes remaining: 53150220288 bytes total: 53687091200 bytes progression: 1.00 % 
transferred: 1079110533 bytes remaining: 52607980667 bytes total: 53687091200 bytes progression: 2.01 % 
transferred: 1615981445 bytes remaining: 52071109755 bytes total: 53687091200 bytes progression: 3.01 % 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 4194303: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
transferred: 2158221066 bytes remaining: 51528870134 bytes total: 53687091200 bytes progression: 4.02 % 
transferred: 2695091978 bytes remaining: 50991999222 bytes total: 53687091200 bytes progression: 5.02 % 
transferred: 3231962890 bytes remaining: 50455128310 bytes total: 53687091200 bytes progression: 6.02 % 
transferred: 3774202511 bytes remaining: 49912888689 bytes total: 53687091200 bytes progression: 7.03 % 
qemu-img: iSCSI GET_LBA_STATUS failed at lba 8388606: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_FIELD_IN_CDB(0x2400) 
transferred: 4311073423 bytes remaining: 49376017777 bytes total: 53687091200 bytes progression: 8.03 % 
transferred: 4853313044 bytes remaining: 48833778156 bytes total: 53687091200 bytes progression: 9.04 % 
transferred: 5390183956 bytes remaining: 48296907244 bytes total: 53687091200 bytes progression: 10.04 % 
transferred: 5927054868 bytes remaining: 47760036332 bytes total: 53687091200 bytes progression: 11.04 % 

Which might well be related to the problem: when the VM is running, are the same errors reported back up the stack to the guest FS, which then panics? 
When run offline, even with these error messages, the transfer completes OK. 
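Incidentally, the LBAs at which GET_LBA_STATUS fails in the log above advance by a fixed stride; a quick sketch of the arithmetic (nothing here beyond the logged numbers themselves, and the closing remark is an unconfirmed guess):

```python
# LBAs of the first failing GET LBA STATUS probes, copied from the log above
lbas = [0, 4194303, 8388606, 12582909, 16777212, 20971515]
strides = {b - a for a, b in zip(lbas, lbas[1:])}
print(strides)  # {4194303}: every probe advances by the same amount
# 4194304 sectors of 512 bytes is exactly 2 GiB; the observed stride is one
# sector short of that boundary, which could hint at an off-by-one in how the
# probe window is computed (speculation, not established here)
print(4194304 * 512 == 2 * 1024**3)  # True
```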

Cheers, 
Daniel 

-- 

Daniel Berteaud 
FIREWALL-SERVICES SAS, network security 
A free-software services company 
Tel: +33 5 56 64 15 32 
Matrix: @dani:fws.fr 
https://www.firewall-services.com 
