[PVE-User] Moving disk with ZFS over iSCSI = IO error

Daniel Berteaud daniel at firewall-services.com
Tue Sep 17 18:27:33 CEST 2019


Hi there. 

I'm working on moving my NFS setup to ZFS over iSCSI. I'm using a CentOS 7.6 box with ZoL 0.8.1 and the LIO backend (but this shouldn't be relevant, see below). On the PVE side, I'm running PVE 6 with all updates applied. 
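
For reference, the storage definition on the PVE side looks roughly like this (storage name, portal, target and pool are placeholders):

zfs: zfs-iscsi
        iscsiprovider LIO
        portal 10.0.0.10
        target iqn.2003-01.org.linux-iscsi.storage.x8664:sn.example
        pool tank
        lio_tpg tpg1
        content images
        blocksize 8k
        sparse 1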

Apart from a few minor issues I found in the LIO backend (for which I sent a patch series earlier today), most things work nicely. One issue, however, is important to me: I can't move a disk from ZFS over iSCSI to any other storage. The destination storage type doesn't matter, but the problem is 100% reproducible whenever the source storage is ZFS over iSCSI (the exact command I use is shown below). 
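
I trigger the move either from the GUI (Hardware -> Move disk) or on the CLI with something like this (VM ID, disk and target storage are just examples):

qm move_disk 100 scsi0 local-lvm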

A few seconds after I start the disk move, the guest FS "panics". For example, with an EL7 guest using XFS, I get: 

kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current] 
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated 
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00 
kernel: blk_update_request: I/O error, dev sda, sector 7962536 
kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current] 
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated 
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 79 7f a8 00 00 08 00 
kernel: blk_update_request: I/O error, dev sda, sector 7962536 
kernel: sd 2:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
kernel: sd 2:0:0:0: [sda] Sense Key : Aborted Command [current] 
kernel: sd 2:0:0:0: [sda] Add. Sense: I/O process terminated 
kernel: sd 2:0:0:0: [sda] CDB: Read(10) 28 00 00 bc 0e 28 00 00 08 00 
kernel: blk_update_request: I/O error, dev sda, sector 12324392 


And the system crashes completely. The data itself is not impacted: I can restart the guest and everything appears OK. It doesn't matter whether I let the disk move operation finish or cancel it. 
Moving the disk offline works as expected. 

Whether the zvol backend is sparse or non-sparse doesn't matter either. 
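
(For what it's worth, sparse vs. non-sparse can be verified on the storage box with something like the following, the dataset name being an example:

zfs get volsize,refreservation,used tank/vm-100-disk-0
)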

I searched a lot about this issue and found at least two other people hitting the same, or a very similar, problem: 



    * One using ZoL but with SCST, see https://sourceforge.net/p/scst/mailman/message/35241011/ 
    * Another using OmniOS, so with COMSTAR, see https://forum.proxmox.com/threads/storage-iscsi-move-results-to-io-error.38848/ 

Both are likely running PVE 5, so it doesn't look like a recently introduced regression. 

I was also able to reproduce the issue with a FreeNAS storage, so using ctld. As the issue is present with so many different stacks, I think we can rule out a problem on the storage side. The problem is most likely in QEMU, in its iSCSI block implementation. 
The SCST-Devel thread is interesting, but unfortunately it's beyond my skills. 
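
If it can help, I can also capture the iSCSI traffic on the PVE node during a disk move and share the dump, e.g. with something along these lines (interface name is a placeholder):

tcpdump -i eth0 -s 0 -w move-disk.pcap port 3260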

Any advice on how to debug this further? I can reproduce it whenever I want on a test setup, and I'm happy to provide any useful information. 

Regards, Daniel 


-- 


Daniel Berteaud 
FIREWALL-SERVICES SAS, network security 
Free software services company 
Tel: +33 5 56 64 15 32 
Matrix: @dani:fws.fr 
https://www.firewall-services.com 


