[PVE-User] Unreliable

Fábio Rabelo fabio at fabiorabelo.wiki.br
Tue Mar 12 12:32:21 CET 2013


2013/3/12 Andreu Sànchez i Costa <andreu.sanchez at iws.es>

>  Hello Fábio,
>
> On 12/03/13 01:00, Fábio Rabelo wrote:
>
>
> 2.3 does not have the reliability that 1.9 has!
>
> I have been struggling with it for 3 months, my deadlines have passed, and
> I cannot make it work for more than 3 days without an issue ...
>
>
> I cannot give an opinion about 2.3, but 2.2.x works perfectly for me. I
> only had to change the elevator to deadline because CFQ had performance
> problems with our P2000 iSCSI disk array.
>
> As other list members asked, what are your main problems?
>
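
For context on the elevator change mentioned in the quoted message: Linux exposes the I/O scheduler of each block device in sysfs at /sys/block/<device>/queue/scheduler, with the active one shown in square brackets (for example "noop deadline [cfq]"). The following is a minimal sketch for reading it; the function names and the device name passed in are hypothetical, not something taken from this thread.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name shown in brackets, or None if none is marked.

    Example input: 'noop deadline [cfq]'
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device):
    """Read the active scheduler for a block device.

    'device' is a hypothetical device name such as 'sda'.
    """
    with open(f"/sys/block/{device}/queue/scheduler") as fh:
        return active_scheduler(fh.read())
```

Calling read_scheduler("sda") on a node would show whether a given disk is still on CFQ or already on deadline.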
>
I have already described the problems here several times.

This is a five-node cluster; the motherboards are dual-Opteron boards from
Supermicro.

The storage server uses the same motherboard as the five nodes, but with 16
3.5-inch hard-disk slots, 12 of which are occupied by WD enterprise disks.

The storage runs Nas4Free (I already tried FreeNAS, with the same result).

As I said, since I installed PVE 1.9 everything has been working fine, for
9 days now and counting.

Each of the five nodes has 2 onboard network ports connected to a Linksys
switch; I use them to serve the VMs.

In one PCIe slot there is an Intel 10 GbE card connected to a Supermicro
10 GbE switch, used exclusively for communication between the five nodes
and the storage.

This switch is not linked to anything else.

On the storage server, I use one of the onboard ports for management, and
all images are served through the 10 GbE card.

After the system has been running for some time, between 1 and 3 days, the
nodes stop talking to the storage.

When this happens, the log shows many messages like these:

Mar  6 17:15:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:15:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:15:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:15:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:09 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:19 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar  6 17:16:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
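
To pin down exactly when the storage dropped out on each node, warnings like the ones above can be summarized per host and storage. This is only a rough sketch: it assumes each log entry sits on a single line as in a normal syslog file, the function name is made up for illustration, and the year has to be supplied because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches the pvestatd warning format shown in the log excerpt.
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"pvestatd\[\d+\]: WARNING: storage '(?P<storage>[^']+)' is not online"
)


def summarize_offline(lines, year=2013):
    """Return {(host, storage): (first_seen, last_seen, count)}."""
    summary = {}
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue  # skip anything that is not a "not online" warning
        # The year is an assumption passed in by the caller.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["host"], m["storage"])
        first, _, count = summary.get(key, (ts, ts, 0))
        summary[key] = (first, ts, count + 1)
    return summary
```

Feeding it the lines of each node's syslog and comparing the first_seen values across the five nodes would show whether all nodes lose the storage at the same moment or one after another.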



After that, if I try to restart the PVE daemon, it refuses to restart.

If I try to reboot the server, the shutdown hangs at the point where the PVE
daemon should stop, and stays there forever.

The only way to reboot any of the nodes is a hard reset!

At first, my suspicion fell on the storage, so I changed from FreeNAS to
Nas4Free: same thing. Desperation!

Then, as a test, I installed PVE 1.9 on all five nodes (I have 2 systems
that have been running it for 3 years with no issues; this new system is
meant to replace both).

As I said, 9 days and counting!

So there is no problem with the hardware, and there is no problem with
Nas4Free!

What is left?!


Fábio Rabelo