[PVE-User] Unreliable

Fábio Rabelo fabio at fabiorabelo.wiki.br
Tue Mar 12 18:21:03 CET 2013


My Linksys switches are made by Cisco!

Is there a way to confirm whether that multicast bug is what I am hitting?


Fábio Rabelo



2013/3/12 Alexandre DERUMIER <aderumier at odiso.com>

> Hi Steffen,
>
> It seems that you have multicast errors or hangs, which cause the corosync
> errors. Which physical switches do you use? (I ask because we have found a
> multicast bug involving a feature of the current kernel and Cisco switches.)
>
>
>
>
> 2013/3/12 Steffen Wagner < mail at steffenwagner.com >
>
>
> Hi,
>
> I had a similar problem with 2.2.
> I had rgmanager running for the HA features on high-end hardware (Dell, QNAP
> and Cisco). After about three days one of the nodes (it wasn't always the
> same one!) left quorum (the log said something like 'node 2 left, x nodes
> remaining in cluster, fencing node 2.'). After that the node was always
> fenced successfully... so I disabled fencing and changed it to 'hand'. Then
> the node didn't shut down anymore. It remained online with all VMs, but the
> cluster said the node was offline (on reboot the node got stuck at the pve
> rgmanager service; only a hard reset was possible).
>
> In the end I disabled HA and now run the nodes in cluster mode only,
> without fencing... working until now (3 months) without any problems... a
> pity, because I want to use the HA features, but I don't know what's wrong.
>
> My network setup is similar to Fabio's. I'm using VLANs, one for the
> storage interface and one for the other.....
>
> For now I think I will stay on 2.2 and not upgrade to 2.3 until everyone
> on the mailing list is happy :-)
>
>
> Kind regards,
> Steffen Wagner
> --
>
> Im Obersteig 31
> 76879 Hochstadt/Pfalz
>
> E mail at steffenwagner.com
> M 01523/3544688
> F 06347/918474
>
> Fábio Rabelo < fabio at fabiorabelo.wiki.br > wrote:
>
> >2013/3/12 Andreu Sànchez i Costa < andreu.sanchez at iws.es >
> >
> >> Hello Fábio,
> >>
> >> On 12/03/13 01:00, Fábio Rabelo wrote:
> >>
> >>
> >> 2.3 does not have the reliability 1.9 has!!!!
> >>
> >> I have been struggling with it for 3 months, my deadlines are gone, and I
> >> cannot make it work for more than 3 days without an issue...
> >>
> >>
> >> I cannot give an opinion about 2.3, but 2.2.x works perfectly for us. I
> >> only had to change the elevator to deadline because CFQ had performance
> >> problems with our P2000 iSCSI disk array.
> >>
> >> As other list members asked, what are your main problems?
> >>
> >>
> >I have already described the problems here several times.
> >
> >This is a five-node cluster; the motherboards are dual-Opteron boards from
> >Supermicro.
> >
> >The storage server uses the same motherboard as the five nodes, but with
> >16 3.5-inch HD slots, 12 of them occupied by WD enterprise disks.
> >
> >The storage runs Nas4Free (I already tried FreeNAS, same result).
> >
> >As I said, since I installed PVE 1.9 everything has been working fine, for
> >9 days now and counting.
> >
> >Each of the five nodes has 2 embedded network ports, connected to a
> >Linksys switch; I use them to serve the VMs.
> >
> >In one PCIe slot there is an Intel 10 Gb card, which talks to a Supermicro
> >10 Gb switch used exclusively for communication between the five nodes and
> >the storage.
> >
> >This switch has no link to anything else.
> >
> >On the storage server, I use one of the embedded ports for management, and
> >all images are served through the 10 Gb card.
> >
> >After some time, between 1 and 3 days of the system working, the nodes
> >stop talking to the storage.
> >
> >When it happens, the log shows lots of messages like this:
> >
> >Mar 6 17:15:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:15:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:15:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:15:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:09 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:19 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
> >Mar 6 17:16:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
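To see exactly when each node lost the storage, and whether all nodes lost it at the same moment, the warnings quoted above could be collapsed into outage windows. What follows is only a rough sketch, not anything shipped with PVE: the function name `outage_windows`, the `year` argument (syslog timestamps carry no year) and the 30-second `max_gap` are my own assumptions, and the regular expression is built solely from the message format shown in this log excerpt.

```python
import re
from datetime import datetime, timedelta

# Pattern derived only from the pvestatd lines quoted in this thread.
WARNING = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<node>\S+) pvestatd\[\d+\]: "
    r"WARNING: storage '(?P<storage>[^']+)' is not online"
)


def outage_windows(lines, year=2013, max_gap=timedelta(seconds=30)):
    """Group warnings into (node, storage, first_seen, last_seen) tuples.

    Warnings for the same node and storage that are at most `max_gap`
    apart are treated as one outage (hypothetical threshold).
    """
    windows = []
    open_windows = {}  # (node, storage) -> index of its latest window
    for line in lines:
        m = WARNING.match(line)
        if not m:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["node"], m["storage"])
        idx = open_windows.get(key)
        if idx is not None and ts - windows[idx][3] <= max_gap:
            # Same outage: extend last_seen.
            windows[idx] = (*windows[idx][:3], ts)
        else:
            # New outage for this node/storage pair.
            open_windows[key] = len(windows)
            windows.append((*key, ts, ts))
    return windows
```

Feeding it the syslog of every node (for example `outage_windows(open("/var/log/syslog"))`, the path being an assumption about where pvestatd logs end up) and comparing the `first_seen` values would show whether the five nodes drop the storage together, which points at the switch or the storage, or one at a time.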
> >
> >
> >
> >After that, if I try to restart the PVE daemon, it refuses.
> >
> >If I try to reboot the server, it stops at the point where the PVE daemon
> >should stop, and stays there forever.
> >
> >The only way to reboot any of the nodes is a hard reset!
> >
> >At first I suspected the storage, so I changed from FreeNAS to Nas4Free:
> >same thing. Desperation!
> >
> >Then, as a test, I installed PVE 1.9 on all five nodes (I have 2 systems
> >that have been running it for 3 years with no issues; this new system is
> >meant to replace both).
> >
> >As I said, 9 days and counting!!!
> >
> >So, there is no problem in the hardware, and there is no problem with
> >Nas4Free!
> >
> >What is left?!?
> >
> >
> >Fábio Rabelo
> >
> >_______________________________________________
> >pve-user mailing list
> > pve-user at pve.proxmox.com
> > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
>
>
>
>

