[PVE-User] Unreliable
Alexandre DERUMIER
aderumier at odiso.com
Tue Mar 12 14:18:46 CET 2013
>> In one PCIe slot there is an Intel 10 GbE card, connected to a Supermicro 10 GbE switch used exclusively for communication between the five nodes and the storage.
Which Intel card model is it? Do you use MTU 9000?
>>pvestatd[2804]: WARNING: storage 'iudice01' is not online
Which storage protocol do you use? NFS, iSCSI, or LVM?
If NFS, what are your mount options?
>>After that, if I try to restart the PVE daemon, it refuses to.
>>If I try to reboot the server, it hangs at the point where the PVE daemon should stop, and stays there forever.
>>
>>The only way to reboot any of the nodes is a hard reset!
It's possible that an access to the storage is hanging (stats, VM volume info, ...).
Normally a check is done to avoid that (this is the "not online" message you see).
The checks are:
for NFS:
/usr/bin/rpcinfo -p nfsipserver with a timeout of 2 seconds
for iSCSI:
ping iscsiserverip on TCP port 3260 with a timeout of 2 seconds.
So maybe the timeout in the Proxmox code is too low when your SAN is under load.
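
To see whether a 2-second limit is too tight for a loaded SAN, the two checks described above can be reproduced by hand with an adjustable timeout. The following is only a minimal sketch and not the Proxmox code itself: the function names `tcp_port_open` and `nfs_rpc_responds` are hypothetical, and it assumes `rpcinfo` is on the PATH of the node.

```python
import socket
import subprocess


def tcp_port_open(host, port=3260, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Port 3260 is the iSCSI port mentioned above; the function name is hypothetical.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def nfs_rpc_responds(host, timeout=2.0):
    """Return True if `rpcinfo -p host` exits successfully within `timeout` seconds.

    Assumes the rpcinfo binary is installed; a missing binary counts as a failure.
    """
    try:
        result = subprocess.run(
            ["rpcinfo", "-p", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

Running either function in a loop with timeouts of, say, 2 and 10 seconds while the storage is busy would show whether the server still answers but simply needs longer than 2 seconds.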
Also, do your VMs hang too, or is it only pvedaemon/manager?
----- Original message -----
From: "Fábio Rabelo" <fabio at fabiorabelo.wiki.br>
To: "Andreu Sànchez i Costa" <andreu.sanchez at iws.es>
Cc: pve-user at pve.proxmox.com
Sent: Tuesday, 12 March 2013 12:32:21
Subject: Re: [PVE-User] Unreliable
2013/3/12 Andreu Sànchez i Costa < andreu.sanchez at iws.es >
Hello Fábio,
On 12/03/13 01:00, Fábio Rabelo wrote:
<blockquote>
2.3 does not have the reliability 1.9 has!
I have been struggling with it for 3 months, my deadlines are gone, and I cannot make it work for more than 3 days without an issue...
I cannot give an opinion about 2.3, but 2.2.x works perfectly here. I only had to change the I/O elevator to deadline because CFQ had performance problems with our P2000 iSCSI disk array.
As other list members asked, what are your main problems?
</blockquote>
I have already described the problems several times here.
This is a five-node cluster with dual-Opteron motherboards from Supermicro.
The storage uses the same motherboard as the five nodes, but with 16 3.5-inch HD slots, 12 of them occupied by WD enterprise disks.
The storage runs Nas4Free (I already tried FreeNAS, same result).
As I said, since I installed PVE 1.9 everything has worked fine, for 9 days now and counting.
Each of the five nodes has 2 embedded network ports, connected to a Linksys switch; I use them to serve the VMs.
In one PCIe slot there is an Intel 10 GbE card, connected to a Supermicro 10 GbE switch used exclusively for communication between the five nodes and the storage.
This switch has no link to anything else.
On the storage, I use one of the embedded ports for management, and all images are served through the 10 GbE card.
After the system has been working for some time, between 1 and 3 days, the nodes stop talking to the storage.
When that happens, the log shows lots of messages like this:
Mar 6 17:15:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:15:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:09 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:19 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:29 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:39 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:49 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
Mar 6 17:16:59 nodo-01 pvestatd[2804]: WARNING: storage 'iudice01' is not online
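
An excerpt like the one above can be summarized with a short script to see when each storage was first and last reported offline on each node. This is only a sketch: the function name `summarize_offline` is hypothetical, the year is an assumption because syslog timestamps do not carry one, and the month abbreviation is parsed according to the current locale (English is assumed).

```python
import re
from datetime import datetime

# Matches the pvestatd warning exactly as it appears in the excerpt above.
WARNING_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"pvestatd\[\d+\]:\s+WARNING: storage '(?P<storage>[^']+)' is not online\s*$"
)


def summarize_offline(lines, year=2013):
    """Return {(host, storage): (first_seen, last_seen, count)} for matching lines.

    `year` is an assumption supplied by the caller; syslog lines omit it.
    """
    summary = {}
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match is None:
            continue  # Ignore unrelated log lines.
        # Collapse repeated spaces such as the padding before single-digit days.
        stamp = " ".join(match.group("ts").split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (match.group("host"), match.group("storage"))
        first, last, count = summary.get(key, (when, when, 0))
        summary[key] = (min(first, when), max(last, when), count + 1)
    return summary
```

Feeding it the syslog of every node makes it easy to check whether all five nodes lose the storage at the same moment, which would point at the storage or the switch, or at different times, which would point at the individual node.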
After that, if I try to restart the PVE daemon, it refuses to.
If I try to reboot the server, it hangs at the point where the PVE daemon should stop, and stays there forever.
The only way to reboot any of the nodes is a hard reset!
At first I suspected the storage and changed from FreeNAS to Nas4Free: same thing. Desperation!
Then, as a test, I installed PVE 1.9 on all five nodes (I have 2 systems that have been running it for 3 years with no issues; this new system is meant to replace both).
As I said, 9 days and counting!
So there is no problem with the hardware, and there is no problem with Nas4Free!
What is left?!
Fábio Rabelo
_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user