[PVE-User] VM crash down after fence...

Gilberto Nunes gilberto.nunes32 at gmail.com
Sun Jul 29 18:28:58 CEST 2012


Here is what happened:

http://troll.ws/A2nH35

http://troll.ws/A6etIQ

On the first screenshot, you can see that I no longer have /dev/VM...
On the second, the VM is in the stopped state on the surviving node... :(

And this happened just a few seconds after the other node died...

This is the syslog from the surviving node:

/var/log/syslog
Jul 29 13:23:44 servidor-b task
UPID:servidor-b:00000C3E:0000EC40:50156390:qmstart:100:root at pam:: start VM
100: UPID:servidor-b:00000C3E:0000EC40:50156390:qmstart:100:root at pam:
Jul 29 13:23:44 servidor-b pvevm: <root at pam> starting task
UPID:servidor-b:00000C3E:0000EC40:50156390:qmstart:100:root at pam:
Jul 29 13:23:45 servidor-b kernel: device tap100i0 entered promiscuous mode
Jul 29 13:23:45 servidor-b rgmanager[3150]: [pvevm] Task still active,
waiting
Jul 29 13:23:45 servidor-b kernel: vmbr0: port 2(tap100i0) entering
forwarding state
Jul 29 13:23:45 servidor-b kernel: New device tap100i0 does not support
netpoll
Jul 29 13:23:45 servidor-b kernel: Disabling netpoll for vmbr0
Jul 29 13:23:45 servidor-b pvevm: <root at pam> end task
UPID:servidor-b:00000C3E:0000EC40:50156390:qmstart:100:root at pam: OK
Jul 29 13:23:45 servidor-b rgmanager[1787]: Service pvevm:100 started
Jul 29 13:23:55 servidor-b kernel: tap100i0: no IPv6 routers present
Jul 29 13:24:00 servidor-b ntpd[1252]: Listen normally on 15 tap100i0
fe80::e097:e8ff:feca:a9a6 UDP 123
Jul 29 13:24:12 servidor-b pmxcfs[1350]: [dcdb] notice: members: 2/1350
Jul 29 13:24:12 servidor-b pmxcfs[1350]: [dcdb] notice: members: 2/1350
Jul 29 13:24:22 servidor-b rgmanager[3238]: [pvevm] VM 100 is running
Jul 29 13:24:24 servidor-b kernel: bnx2 0000:07:00.0: eth1: NIC Copper Link
is Down
Jul 29 13:24:24 servidor-b kernel: vmbr1: port 1(eth1) entering disabled
state
Jul 29 13:24:25 servidor-b kernel: bnx2 0000:07:00.0: eth1: NIC Copper Link
is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jul 29 13:24:25 servidor-b kernel: vmbr1: port 1(eth1) entering forwarding
state
Jul 29 13:24:32 servidor-b kernel: block drbd0: PingAck did not arrive in
time.
Jul 29 13:24:32 servidor-b kernel: block drbd0: peer( Primary -> Unknown )
conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul 29 13:24:32 servidor-b kernel: block drbd0: new current UUID
9DF628A28D5AF19F:693299C3FBC55B03:F7D75A97A0DFB737:F7D65A97A0DFB737
Jul 29 13:24:32 servidor-b kernel: block drbd0: asender terminated
Jul 29 13:24:32 servidor-b kernel: block drbd0: Terminating asender thread
Jul 29 13:24:32 servidor-b kernel: block drbd0: Connection closed
Jul 29 13:24:32 servidor-b kernel: block drbd0: conn( NetworkFailure ->
Unconnected )
Jul 29 13:24:32 servidor-b kernel: block drbd0: receiver terminated
Jul 29 13:24:32 servidor-b kernel: block drbd0: Restarting receiver thread
Jul 29 13:24:32 servidor-b kernel: block drbd0: receiver (re)started
Jul 29 13:24:32 servidor-b kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jul 29 13:24:32 servidor-b corosync[1571]:   [TOTEM ] A processor failed,
forming new configuration.
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] CLM CONFIGURATION
CHANGE
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] New Configuration:
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] #011r(0)
ip(10.0.0.20)
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] Members Left:
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] #011r(0)
ip(10.0.0.10)
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] Members Joined:
Jul 29 13:24:34 servidor-b corosync[1571]:   [CMAN  ] quorum lost, blocking
activity
Jul 29 13:24:34 servidor-b corosync[1571]:   [QUORUM] This node is within
the non-primary component and will NOT provide any services.
Jul 29 13:24:34 servidor-b corosync[1571]:   [QUORUM] Members[1]: 2
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] CLM CONFIGURATION
CHANGE
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] New Configuration:
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] #011r(0)
ip(10.0.0.20)
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] Members Left:
Jul 29 13:24:34 servidor-b corosync[1571]:   [CLM   ] Members Joined:
Jul 29 13:24:34 servidor-b corosync[1571]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jul 29 13:24:34 servidor-b pmxcfs[1350]: [status] notice: node lost quorum
Jul 29 13:24:34 servidor-b dlm_controld[1653]: node_history_cluster_remove
no nodeid 1
Jul 29 13:24:34 servidor-b corosync[1571]:   [CPG   ] chosen downlist:
sender r(0) ip(10.0.0.20) ; members(old:2 left:1)
Jul 29 13:24:34 servidor-b corosync[1571]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Jul 29 13:24:34 servidor-b rgmanager[1787]: #1: Quorum Dissolved

Message from syslogd at servidor-b at Jul 29 13:24:34 ...
 rgmanager[1787]: #1: Quorum Dissolved
Jul 29 13:24:34 servidor-b kernel: dlm: closing connection to node 1
Jul 29 13:24:34 servidor-b task
UPID:servidor-b:00000CC8:00010015:501563C2:qmshutdown:100:root at pam::
shutdown VM 100:
UPID:servidor-b:00000CC8:00010015:501563C2:qmshutdown:100:root at pam:
Jul 29 13:24:34 servidor-b pvevm: <root at pam> starting task
UPID:servidor-b:00000CC8:00010015:501563C2:qmshutdown:100:root at pam:
Jul 29 13:24:35 servidor-b rgmanager[3274]: [pvevm] Task still active,
waiting
Jul 29 13:24:37 servidor-b rgmanager[3294]: [pvevm] Task still active,
waiting
Jul 29 13:24:38 servidor-b rgmanager[3314]: [pvevm] Task still active,
waiting
Jul 29 13:24:39 servidor-b rgmanager[3334]: [pvevm] Task still active,
waiting
Jul 29 13:24:40 servidor-b rgmanager[3354]: [pvevm] Task still active,
waiting
Jul 29 13:24:41 servidor-b rgmanager[3374]: [pvevm] Task still active,
waiting
Jul 29 13:24:42 servidor-b rgmanager[3394]: [pvevm] Task still active,
waiting
Jul 29 13:24:43 servidor-b rgmanager[3419]: [pvevm] Task still active,
waiting
Jul 29 13:24:44 servidor-b kernel: vmbr0: port 2(tap100i0) entering
disabled state
Jul 29 13:24:44 servidor-b kernel: vmbr0: port 2(tap100i0) entering
disabled state
Jul 29 13:24:44 servidor-b rgmanager[3446]: [pvevm] Task still active,
waiting
Jul 29 13:24:45 servidor-b pvevm: <root at pam> end task
UPID:servidor-b:00000CC8:00010015:501563C2:qmshutdown:100:root at pam: OK
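
Reading the log, the sequence seems clear: eth1 loses link, drbd0 falls
back to WFConnection, corosync drops the other node, quorum dissolves, and
rgmanager then shuts VM 100 down. From the two-node HA docs, the persistent
way to survive this (instead of running "pvecm expected 1" by hand after
every failure) seems to be the cman two_node flag in /etc/pve/cluster.conf.
A minimal sketch of what I understand the relevant part should look like
(untested here; the rest of the file stays as generated, and config_version
must be bumped, e.g. from 12 to 13):

  <cluster name="SELB" config_version="13">
    <!-- two_node keeps a single surviving node quorate (per the docs; untested) -->
    <cman two_node="1" expected_votes="1"/>
    ...
  </cluster>

If I understand correctly, this is also why fencing must be reliable in
such a setup: with two_node="1" each node can stay quorate on its own.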


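The "new current UUID" line from drbd0 also worries me about a possible
split brain once servidor-a comes back. To verify the connection state
after reconnection I plan to check something like this (assuming the DRBD
resource is named r0; the real name is in /etc/drbd.d/):

  # cat /proc/drbd
  # drbdadm cstate r0
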
2012/7/29 Gilberto Nunes <gilberto.nunes32 at gmail.com>

> Other question:
>
> Do I need to run the command "pvecm expected 1" on both nodes?
>
>
> 2012/7/29 Gilberto Nunes <gilberto.nunes32 at gmail.com>
>
>> It seems to me that quorum is already set:
>>
>>
>> server-a:~# pvecm status
>> Version: 6.2.0
>> Config Version: 12
>> Cluster Name: SELB
>> Cluster Id: 1158
>> Cluster Member: Yes
>> Cluster Generation: 204
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 1
>> Total votes: 2
>> Node votes: 1
>> Quorum: 2
>> Active subsystems: 6
>> Flags:
>> Ports Bound: 0 177
>> Node name: servidor-a
>> Node ID: 1
>> Multicast addresses: 239.192.4.138
>> Node addresses: 10.0.0.10
>>
>> Do I need another disk (a quorum disk) to get quorum, as is mentioned
>> in the docs I sent in my previous mail?
>>
>>
>> 2012/7/29 THe_ZiPMaN <flavio-pve at zipman.it>
>>
>>> On 07/29/2012 05:55 PM, Gilberto Nunes wrote:
>>> > And to complete my hell, I'm unable to migrate the VM from server-b
>>> > to server-a
>>> >
>>> > Executing HA migrate for VM 100 to node servidor-a
>>> > Trying to migrate pvevm:100 to servidor-a...Failure
>>> > TASK ERROR: command 'clusvcadm -M pvevm:100 -m servidor-a' failed: exit
>>> > code 255
>>>
>>> If you have a 2-node cluster you must set the expected votes to 1 or
>>> the cluster won't run any services.
>>>
>>> You can set it by hand:
>>> # pvecm expected 1
>>>
>>> What's your cluster status on the 2 nodes?
>>> # pvecm status
>>>
>>> --
>>> Flavio Visentin
>>>
>>> A computer is like an air conditioner,
>>> it stops working when you open Windows
>>> _______________________________________________
>>> pve-user mailing list
>>> pve-user at pve.proxmox.com
>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>
>>
>>
>>
>
>


-- 
Gilberto Nunes


(47) 9676-7530

msn: gilbertonunesferreira at hotmail.com

msn: konnectati at konnectati.com.br

Skype: gilberto.nunes36

Skype: konnectati


www.konnectati.com.br