[PVE-User] proxmox 5 - replication fails
dorsy
dorsyka at yahoo.com
Wed Jul 12 13:50:44 CEST 2017
# cat /etc/pve/replication.cfg
local: 105-0
	target ns302695
	rate 10
	schedule */2:00

local: 103-0
	target ns3511723
	rate 11
	schedule */20

local: 109-0
	target ns3511723
	rate 10

local: 102-0
	target ns302695
	rate 10
	schedule 22:30

local: 107-0
	target ns302695
	rate 10

local: 100-0
	target ns302695
	rate 10
	schedule */2:00
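To check which jobs set their own schedule and which fall back to the default, the file can be read mechanically. This is a minimal sketch under my own assumption about the layout: a `type: job-id` line opens a job, and every following `key value` line belongs to it, indented or not. The function name and the SAMPLE text (three of the jobs above) are hypothetical and only for illustration.

```python
import re
from pathlib import Path

# Assumed layout: a "type: job-id" line opens a job; "key value" lines follow it.
SECTION_RE = re.compile(r"^(\w+):\s*(\S+)\s*$")

# Three of the jobs from the replication.cfg above, used if the real file is absent.
SAMPLE = """\
local: 105-0
\ttarget ns302695
\trate 10
\tschedule */2:00

local: 109-0
\ttarget ns3511723
\trate 10

local: 100-0
\ttarget ns302695
\trate 10
\tschedule */2:00
"""


def parse_replication_cfg(text):
    """Return {job_id: {key: value}}; indentation is optional, so flattened copies parse too."""
    jobs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            current = {"type": header.group(1)}
            jobs[header.group(2)] = current
        elif current is not None:
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return jobs


if __name__ == "__main__":
    cfg = Path("/etc/pve/replication.cfg")
    text = cfg.read_text() if cfg.exists() else SAMPLE
    for job_id, job in parse_replication_cfg(text).items():
        # Jobs without a "schedule" line run on the PVE default schedule.
        print(job_id, job.get("target"), job.get("rate"), job.get("schedule", "(default)"))
```

Run against the full file, 109-0 and 107-0 are the two jobs without an explicit schedule.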
# cat /var/lib/pve-manager/pve-replication-state.json
{"103":{"local/ns3511723":{"storeid_list":["local-zfs"],"fail_count":0,"last_try":1499859600,"last_sync":1499859600,"last_iteration":1499859600,"last_node":"ns302695","duration":4.482678}},"109":{"local/ns3511723":{"fail_count":0,"storeid_list":["local-zfs"],"last_sync":1499859000,"last_try":1499859000,"last_iteration":1499859000,"last_node":"ns302695","duration":7.828846}}}
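The timestamps in the state file are Unix epoch seconds. A minimal sketch that turns them into wall-clock times, assuming CEST (UTC+2, the zone of this mail) as a fixed offset rather than a time-zone database; the helper names are hypothetical:

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset labelled CEST, matching the mail's timestamps.
CEST = timezone(timedelta(hours=2), "CEST")

# The state file shown above, verbatim (split only for line length).
STATE_JSON = (
    '{"103":{"local/ns3511723":{"storeid_list":["local-zfs"],"fail_count":0,'
    '"last_try":1499859600,"last_sync":1499859600,"last_iteration":1499859600,'
    '"last_node":"ns302695","duration":4.482678}},'
    '"109":{"local/ns3511723":{"fail_count":0,"storeid_list":["local-zfs"],'
    '"last_sync":1499859000,"last_try":1499859000,"last_iteration":1499859000,'
    '"last_node":"ns302695","duration":7.828846}}}'
)


def to_cest(epoch):
    """Render a Unix timestamp from the state file as local mail time."""
    return datetime.fromtimestamp(epoch, tz=CEST).strftime("%Y-%m-%d %H:%M:%S %Z")


def summarize(state):
    """One row per (guest, target): fail count and time of the last successful sync."""
    rows = []
    for vmid, targets in sorted(state.items()):
        for target, job in sorted(targets.items()):
            rows.append((vmid, target, job["fail_count"], to_cest(job["last_sync"])))
    return rows


for row in summarize(json.loads(STATE_JSON)):
    print(*row)
```

This prints 13:40:00 CEST for job 103 (schedule `*/20`) and 13:30:00 CEST for job 109.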
On the failed node (this is the current failure; I have had failures from both sides):
# cat /var/lib/pve-manager/pve-replication-state.json
{"105":{"local/ns302695":{"last_iteration":1499853601,"fail_count":0,"duration":32.107092,"last_node":"ns3511723","storeid_list":["local-zfs"],"last_try":1499853633,"last_sync":1499853633}},"102":{"local/ns302695":{"last_try":1499805001,"last_sync":1499805001,"last_node":"ns3511723","duration":126.81862,"storeid_list":["local-zfs"],"last_iteration":1499805001,"fail_count":0}},"107":{"local/ns302695":{"fail_count":0,"last_iteration":1499859000,"duration":3.511844,"last_node":"ns3511723","storeid_list":["local-zfs"],"last_try":1499859000,"last_sync":1499859000}},"100":{"local/ns302695":{"error":"command 'set -o pipefail && pvesm export local-zfs:vm-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1499858220__ | /usr/bin/cstream -t 10000000 | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=ns302695' root@IP.OF.TAR.GET -- pvesm import local-zfs:vm-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255","fail_count":5,"last_iteration":1499858220,"duration":2.493542,"last_node":"ns3511723","storeid_list":["local-zfs"],"last_try":1499858220,"last_sync":1499846406}}}
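With several guests per node, the failing ones can be picked out by `fail_count`. A minimal sketch over a trimmed copy of the state above (guests 107 and 100 only, with the long error command shortened by me). One caution when reading the result: ssh exits with the remote command's status, or with 255 for its own errors, so exit code 255 alone does not tell which side of the pipeline failed.

```python
import re

# Trimmed copy of the failed node's state; the "error" command text is shortened.
STATE = {
    "107": {"local/ns302695": {"fail_count": 0, "last_try": 1499859000,
                               "last_sync": 1499859000}},
    "100": {"local/ns302695": {"fail_count": 5, "last_try": 1499858220,
                               "last_sync": 1499846406,
                               "error": "command '... pvesm import ...' failed: exit code 255"}},
}

EXIT_CODE_RE = re.compile(r"failed: exit code (\d+)$")


def failing_jobs(state):
    """Return (vmid, target, fail_count, exit_code) for every job with fail_count > 0."""
    found = []
    for vmid, targets in sorted(state.items()):
        for target, job in sorted(targets.items()):
            if job.get("fail_count", 0) > 0:
                match = EXIT_CODE_RE.search(job.get("error", ""))
                exit_code = int(match.group(1)) if match else None
                found.append((vmid, target, job["fail_count"], exit_code))
    return found


print(failing_jobs(STATE))
```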
But I already knew all of this from the API :)
pve:/> get nodes/ns3511723/replication/100-0/log
200 OK
[
   {
      "n" : 1,
      "t" : "2017-07-12 13:17:00 100-0: start replication job"
   },
   {
      "n" : 2,
      "t" : "2017-07-12 13:17:00 100-0: guest => VM 100, running => 12279"
   },
   {
      "n" : 3,
      "t" : "2017-07-12 13:17:00 100-0: volumes => local-zfs:vm-100-disk-1"
   },
   {
      "n" : 4,
      "t" : "2017-07-12 13:17:01 100-0: create snapshot '__replicate_100-0_1499858220__' on local-zfs:vm-100-disk-1"
   },
   {
      "n" : 5,
      "t" : "2017-07-12 13:17:01 100-0: full sync 'local-zfs:vm-100-disk-1' (__replicate_100-0_1499858220__)"
   },
   {
      "n" : 6,
      "t" : "2017-07-12 13:17:03 100-0: delete previous replication snapshot '__replicate_100-0_1499858220__' on local-zfs:vm-100-disk-1"
   },
   {
      "n" : 7,
      "t" : "2017-07-12 13:17:03 100-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1499858220__ | /usr/bin/cstream -t 10000000 | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=ns302695' root@IP.OF.TAR.GET -- pvesm import local-zfs:vm-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255"
   }
]
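Each log line seems to follow a `YYYY-MM-DD HH:MM:SS job-id: message` layout; that layout is inferred from this output only, not from documentation. A minimal sketch that splits entries 1, 5 and 7 (error text shortened) and measures how long the job ran:

```python
from datetime import datetime

# Entries 1, 5 and 7 of the log above; the error text in entry 7 is shortened.
LOG = [
    {"n": 1, "t": "2017-07-12 13:17:00 100-0: start replication job"},
    {"n": 5, "t": "2017-07-12 13:17:01 100-0: full sync "
                  "'local-zfs:vm-100-disk-1' (__replicate_100-0_1499858220__)"},
    {"n": 7, "t": "2017-07-12 13:17:03 100-0: end replication job with error: "
                  "command '...' failed: exit code 255"},
]


def parse_log(entries):
    """Split each "t" line into (timestamp, job id, message), ordered by "n"."""
    parsed = []
    for entry in sorted(entries, key=lambda e: e["n"]):
        text = entry["t"]
        stamp = datetime.strptime(text[:19], "%Y-%m-%d %H:%M:%S")
        job_id, _, message = text[20:].partition(": ")
        parsed.append((stamp, job_id, message))
    return parsed


entries = parse_log(LOG)
elapsed = (entries[-1][0] - entries[0][0]).total_seconds()
print(f"{entries[0][1]} ran for about {elapsed:.0f} s")
for stamp, _, message in entries:
    print(stamp.time(), message)
```

With one-second resolution the log spans 3 s, which fits the reported duration of 2.49 s.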
pve:/> get nodes/ns3511723/replication/100-0/status
200 OK
{
   "duration" : 2.493542,
   "error" : "command 'set -o pipefail && pvesm export local-zfs:vm-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1499858220__ | /usr/bin/cstream -t 10000000 | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=ns302695' root@IP.OF.TAR.GET -- pvesm import local-zfs:vm-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255",
   "fail_count" : 5,
   "guest" : "100",
   "id" : "100-0",
   "jobnum" : "0",
   "last_sync" : 1499846406,
   "last_try" : 1499858220,
   "next_sync" : 1499860020,
   "rate" : 10,
   "schedule" : "*/2:00",
   "target" : "ns302695",
   "type" : "local",
   "vmtype" : "qemu"
}
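The epoch fields in the status make it easy to see how far behind the replica is and when the next attempt comes. A minimal sketch with fields copied from the output above. `next_sync` falls 30 minutes after `last_try` instead of on the next `*/2:00` slot (14:00 CEST). My reading is that retries are postponed after failures, but that is an assumption the output does not confirm.

```python
from datetime import timedelta

# Fields copied from the status output above.
STATUS = {"last_sync": 1499846406, "last_try": 1499858220, "next_sync": 1499860020,
          "fail_count": 5, "schedule": "*/2:00"}


def status_gaps(status):
    """Return (age of the last good sync at the last try, delay until the next try)."""
    behind = timedelta(seconds=status["last_try"] - status["last_sync"])
    retry_in = timedelta(seconds=status["next_sync"] - status["last_try"])
    return behind, retry_in


behind, retry_in = status_gaps(STATUS)
print(f"replica was {behind} behind at the last try; next attempt {retry_in} later")
```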
Also, I have set a throttle of 10 MB/s for the replication jobs. That is only a portion of the available bandwidth between the nodes, so it should not be an issue.
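The throttle appears in the failing command as `cstream -t 10000000`, and `cstream -t` limits throughput in bytes per second. So `rate 10` looks like decimal megabytes per second; that mapping is my inference from this one log line. A rough sketch of the arithmetic, with a hypothetical 32 GiB volume standing in for a real disk:

```python
# Assumption inferred from the log: "rate 10" in replication.cfg shows up as
# "cstream -t 10000000", i.e. decimal megabytes/s converted to bytes/s.
def cstream_limit(rate_mb_per_s):
    return int(rate_mb_per_s * 1_000_000)


def transfer_seconds(size_bytes, rate_mb_per_s):
    """Rough time to push size_bytes through the throttle, assuming the stream is that large."""
    return size_bytes / cstream_limit(rate_mb_per_s)


# Hypothetical 32 GiB volume, for illustration only.
size = 32 * 2**30
print(cstream_limit(10), f"{transfer_seconds(size, 10) / 60:.0f} min")
```

At `rate 10` that is about 57 minutes for a full sync of such a volume; incremental syncs only send the changed blocks.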
On 2017-07-12 13:39, Dominik Csapak wrote:
> Hi,
>
> I am replying here to avoid confusion in the other thread.
>
> Can you post the contents of these two files:
>
> /etc/pve/replication.cfg
> /var/lib/pve-manager/pve-replication-state.json (of the source node)
>
> ?
>
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user