[PVE-User] Unreliable

Alexandre DERUMIER aderumier at odiso.com
Wed Mar 13 04:18:18 CET 2013


Hi Fabio,


>What is the Intel card model? Do you use MTU 9000?
>>
>>Not yet, doing now.
>>I did not see that in any documentation, why?

Because it's more or less a best practice when you use a SAN.
Be careful: you need to set up MTU 9000 everywhere (Proxmox SAN NIC, switch ports, FreeNAS NIC).
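
For example, a minimal sketch of a dedicated storage NIC stanza in /etc/network/interfaces (the interface name eth1 and the address are assumptions, adjust them to your setup):

# dedicated storage NIC, jumbo frames enabled
auto eth1
iface eth1 inet static
        address 192.168.100.11
        netmask 255.255.255.0
        mtu 9000

Once every hop is configured, you can verify jumbo frames end to end with a non-fragmenting ping (8972 = 9000 minus 28 bytes of IP/ICMP headers; 192.168.100.20 is your storage IP from the storage.cfg below):

ping -M do -s 8972 192.168.100.20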


also can you post your /etc/network/interfaces ?



I think you also need to tune your TCP window sizes.

edit /etc/sysctl.conf


"# maximum receive socket buffer size, default 131071
net.core.rmem_max = 16777216
# maximum send socket buffer size, default 131071
net.core.wmem_max = 16777216

# default receive socket buffer size, default 65535
net.core.rmem_default = 524287
# default send socket buffer size, default 65535
net.core.wmem_default = 524287

# maximum amount of option memory buffers, default 10240
net.core.optmem_max = 524287
# number of unprocessed input packets before kernel starts dropping them, default 300
net.core.netdev_max_backlog = 300000

net.ipv4.tcp_rmem = 4096 524287 16777216
net.ipv4.tcp_wmem = 4096 524287 16777216


net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
"

then reboot your node
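
Alternatively, the settings should also take effect without a reboot by reloading the file:

# re-apply /etc/sysctl.conf on a running node
sysctl -p /etc/sysctl.conf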



for NFS:

try NFSv4 if you can; performance is better for me.

options rw,noatime,nodiratime,noacl,vers=4,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600
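
To test those options by hand before touching the Proxmox config (server IP and export taken from your storage.cfg below; the mount point /mnt/test is an assumption, and depending on your nfs-common version you may need -t nfs4 instead of -o vers=4):

mkdir -p /mnt/test
mount -t nfs -o rw,noatime,nodiratime,noacl,vers=4,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600 192.168.100.20:/iudice01/images /mnt/test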





>>This time all VMs crashed, and the pve daemon CAN be restarted ...
>>
>>If I ping to Storage from any of 5 nodes, it responds .
>>
>>On another ordinary Linux box that I connected to the same 10 Gb switch for testing, both NFS volumes can be mounted and I can copy anything to and from them.
>>
>>So the Storage did not stop working !!!!
>>
>>I will try the Debian kernel from Backports, it is 3.2.x ...
>>

If all your 5 nodes have the same problem at the same moment,
clearly something is wrong on the storage side (also, it's not a multicast problem).

If VMs crash, it's because they can't read/write to the storage.
If pvedaemon hangs, it's because it is trying to read stats from your storage.

Running VMs and pvedaemon are independent of each other.


How do you mount it (which options?) on your ordinary Linux box?
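
You can check which options the kernel actually negotiated with:

# show all NFS mounts and their effective options
grep nfs /proc/mounts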



>>My original question: IS IT POSSIBLE to install Proxmox 1.9 under Squeeze ?!?
I have never tried it. Maybe; try to install the .deb packages from the repository, but some dependencies may not work.
----- Original Message -----

From: "Fábio Rabelo" <fabio at fabiorabelo.wiki.br>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: pve-user at pve.proxmox.com
Sent: Tuesday, March 12, 2013 19:48:01
Subject: Re: [PVE-User] Unreliable

2013/3/12 Alexandre DERUMIER <aderumier at odiso.com>

>> In one PCIe slot there is an Intel 10 Gb card, to talk with a Supermicro 10 Gb switch, used exclusively for communication between the five nodes and the Storage.

What is the Intel card model? Do you use MTU 9000?

Not yet, doing now.
I did not see that in any documentation, why?
This is the output of lspci -v 

03:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01) 
Subsystem: Intel Corporation 10-Gigabit XF SR Dual Port Server Adapter 
Flags: bus master, fast devsel, latency 0, IRQ 19 
Memory at dff20000 (32-bit, non-prefetchable) [size=128K] 
Memory at dff40000 (32-bit, non-prefetchable) [size=256K] 
I/O ports at e400 [size=32] 
Memory at dff1c000 (32-bit, non-prefetchable) [size=16K] 
Capabilities: [40] Power Management version 3 
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ 
Capabilities: [60] MSI-X: Enable+ Count=18 Masked- 
Capabilities: [a0] Express Endpoint, MSI 00 
Capabilities: [100] Advanced Error Reporting 
Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-d9-39-5e 
Kernel driver in use: ixgbe 





>>pvestatd[2804]: WARNING: storage 'iudice01' is not online 

What storage protocol do you use? NFS/iSCSI/LVM?
If NFS, what are your mount options?



Sorry, I forgot to say it: it is NFS. The config was made within the web interface in the very first attempt.
This is the content of /etc/pve/storage.cfg:

dir: local
        path /var/lib/vz
        content images,iso,vztmpl,rootdir
        maxfiles 0

nfs: Backups
        path /mnt/pve/Backups
        server 192.168.100.20
        export /iudice01/backup
        options vers=3
        content images,backup
        maxfiles 1

nfs: Imagens
        path /mnt/pve/Imagens
        server 192.168.100.20
        export /iudice01/images
        options vers=3
        content images,iso
        maxfiles 1
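
Following the NFSv4 suggestion above, a sketch of what the Imagens entry could look like with the tuned mount options (whether PVE passes this full options string through, and whether your FreeNAS exports NFSv4, are assumptions to verify first):

nfs: Imagens
        path /mnt/pve/Imagens
        server 192.168.100.20
        export /iudice01/images
        options vers=4,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600
        content images,iso
        maxfiles 1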






>>After that, if I try to restart the pve daemon, it refuses to.
>>If I try to reboot the server, it stops when the PVE daemon should stop, and stays there forever.
>>
>>The only way to reboot any of the nodes is a hard reset!

It's possible that an access to the storage is hanging (stats, VM volume info, ...).
Normally a check is done to avoid that (this is the "not online" message you see).
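
When that happens, a generic way to spot tasks stuck in uninterruptible I/O on a node (a sketch, not PVE-specific):

# list processes in D state, typically blocked on dead NFS mounts
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'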



The Storage is OK: I can access its web interface, view logs, etc. ... nothing wrong there.


The checks are:

for NFS:
/usr/bin/rpcinfo -p <nfs server ip> with a timeout of 2 sec

for iSCSI:
a TCP connect to <iscsi server ip> port 3260 with a timeout of 2 sec


So maybe the timeout in the Proxmox code is too low when your SAN is under load; both probes can be reproduced by hand, as sketched below.
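
A minimal sketch of reproducing both probes against your storage (the IP comes from your storage.cfg; having nc installed is an assumption):

# NFS liveness probe, with the same 2 second limit
timeout 2 /usr/bin/rpcinfo -p 192.168.100.20 || echo "storage would be flagged offline"

# iSCSI port probe, only relevant if you use iSCSI
nc -z -w 2 192.168.100.20 3260 && echo "iscsi port reachable"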



Also, do your VMs hang? Or is it only pvedaemon/manager?




No, VMs do not hang, unless I try to restart pvedaemon; then all VMs on that node hang ...
But I cannot migrate any VM; the nodes do not talk to each other ...


Fábio Rabelo 


