[PVE-User] HA Failover if shared storage fails on one Node

Wed Oct 17 13:33:16 CEST 2018

Hi,

We have dedicated links for the storage and for the cluster communication,
so if only the storage links fail, Corosync keeps working. Maybe I
need to write a watchdog myself for that specific case, but let's
wait and see whether there is really nothing in Proxmox to handle that
scenario.
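
A minimal sketch of what such a home-grown watchdog could look like, in Python. Everything here is an assumption for illustration: the names `StorageWatchdog`, `watch`, `probe` and `on_storage_lost` are hypothetical, nothing in it is a Proxmox API, and the actual storage check and the fencing action (for example resetting the node so the rest of the cluster sees it as down) are left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class StorageWatchdog:
    """Counts consecutive failed storage probes (hypothetical helper)."""
    threshold: int = 3   # failed probes in a row before we react
    failures: int = 0    # current run of consecutive failures

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True once the threshold is hit."""
        if healthy:
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.threshold


def watch(probe: Callable[[], bool],
          on_storage_lost: Callable[[], None],
          wd: StorageWatchdog,
          interval_s: float = 10.0,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `probe` until storage is considered lost, then call the handler.

    `probe` must return True while the storage is reachable. How it checks
    that (and how it avoids hanging on blocked I/O) is up to the caller.
    """
    while True:
        if wd.record(probe()):
            on_storage_lost()  # assumption: caller decides how to fence
            return
        sleep(interval_s)
```

Requiring several consecutive failures avoids reacting to a single short path flap. Since Corosync stays up in this scenario, the handler would have to make the node look failed to the cluster (or stop the affected guests itself) for HA to recover the VMs elsewhere.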

Best,
Martin

On 10/17/18 1:29 PM, Mark Adams wrote:
> What interface is your cluster communication (corosync) running over? As
> this is the link that needs to be unavailable to initiate a VM start on
> another node AFAIK.
>
> Basically, the other nodes in the cluster need to be seeing a problem with
> the node. If it's still communicating over whichever interface you have
> the cluster communication on then as far as it is concerned the node is
> still up. If you just lose access to your storage, then your VM will still
> be running in memory.
>
> I don't believe there is any separate storage-specific monitoring in
> Proxmox that could trigger a move to another node. If there is, I'm sure
> someone else on the list will advise.
>
> Regards,
> Mark
>
> On Wed, 17 Oct 2018 at 12:19, Martin Holub <martin at holub.co.at> wrote:
>
>> On 10/17/18 1:11 PM, Gilberto Nunes wrote:
>>> Hi
>>>
>>> How about Node priority?
>>> See section 14.5.2 in this doc:
>>>
>>> https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_configuration_10
>>> ---
>>> Gilberto Nunes Ferreira
>>>
>>> On Wed, 17 Oct 2018 at 08:05, Martin Holub <martin at holub.co.at>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am currently testing the HA features on a 6-node cluster with NetApp
>>>> storage, with iSCSI and multipath configured on all nodes. I tested
>>>> what happens if, for any reason, both links fail (by shutting down the
>>>> interfaces on one blade). Unfortunately, although I had configured HA for
>>>> my test VM, Proxmox does not seem to recognize the storage outage and
>>>> therefore did not migrate the VM to a different blade or remove that
>>>> node from the cluster (either by resetting it or by fencing it some
>>>> other way). Any hints on how to get that solved?
>>>>
>>>> Thanks,
>>>> Martin
>>>>
>> Not sure if I understood what you mean by that reference, but since
>> Proxmox does not detect that the storage is unreachable on that specific
>> cluster node, how are HA groups supposed to work around this?
>>
>> Best,
>> Martin
>> _______________________________________________
>> pve-user mailing list
>> pve-user at pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user