[PVE-User] Replication failed, got timeout?

Aaron Lauterer a.lauterer at proxmox.com
Tue Apr 5 09:26:33 CEST 2022


Is the pool using HDDs? Creating a snapshot is a metadata operation, and HDDs are really not great at random IO, so if other things are happening on the pool at that moment, the snapshot can take long enough to hit the timeout. I had that as well sometimes; it went away when I switched to SSDs. A dedicated special device vdev on (mirrored) SSDs should also improve the situation without needing as many SSDs. See [0] or `man zpoolconcepts` and look for "special device".
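
A rough sketch of how one might check this, not a tested recipe: the helper name list_rotational and its optional sysfs-root argument are made up for illustration, and the pool name rpool is only taken from the error message quoted below. The <device1>/<device2> placeholders must be replaced with your own SSDs, which is why that command is left commented out.

```shell
# Report, for every block device, whether the kernel flags it as rotational.
# The optional first argument (hypothetical, for testing) overrides /sys.
list_rotational() {
    root="${1:-/sys}"
    for f in "$root"/block/*/queue/rotational; do
        [ -e "$f" ] || continue          # glob matched nothing
        dev=${f#"$root"/block/}          # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        if [ "$(cat "$f")" = "1" ]; then
            echo "$dev rotational (HDD)"
        else
            echo "$dev non-rotational"
        fi
    done
}

# If the pool sits on HDDs, a mirrored special vdev could be added like this
# (placeholders are assumptions, fill in your own devices):
#   zpool add rpool special mirror <device1> <device2>
```

Call `list_rotational` without arguments to inspect the running system. Keep in mind that the special vdev becomes a point of failure for the whole pool, so its redundancy should match that of the pool; see [0] before adding one.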


Cheers Aaron


[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_special_device


On 4/4/22 10:17, Marco Gaiarin wrote:
>> Newly installed PVE6 2-node cluster, totally unloaded; only some test VMs that
>> are replicated between the two nodes, connected via a 10G direct cable.
>> Sometimes we get:
>>    command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648656014__' failed: got timeout
>> What could it be?! Thanks.
> 
> We caught a log in /var/log/pve/replicate/, but it does not seem, at least
> to me, to provide any more clues:
> 
>   2022-04-02 14:00:14 103-0: start replication job
>   2022-04-02 14:00:14 103-0: guest => VM 103, running => 5167
>   2022-04-02 14:00:14 103-0: volumes => local-zfs:vm-103-disk-0
>   2022-04-02 14:00:16 103-0: create snapshot '__replicate_103-0_1648900814__' on local-zfs:vm-103-disk-0
>   2022-04-02 14:00:21 103-0: end replication job with error: command 'zfs snapshot rpool/data/vm-103-disk-0@__replicate_103-0_1648900814__' failed: got timeout
> 
> I'm seeking info. Thanks.
> 




