aderumier at odiso.com
Wed Mar 13 04:18:18 CET 2013
>What is the Intel card model? Do you use MTU 9000?
>>Not yet, doing it now.
>>I did not see that in any documentation, why?
Because it's more or less a best practice when you use a SAN.
Be careful: you need to set up MTU 9000 everywhere: the Proxmox SAN NIC, the switch ports, and the FreeNAS NIC.
Also, can you post your /etc/network/interfaces?
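For example, a jumbo-frame stanza in /etc/network/interfaces could look like this (the interface name eth1 and the 10.0.0.0/24 addressing are only assumptions for illustration):

```
# storage NIC with jumbo frames; eth1 and the addresses are placeholders
auto eth1
iface eth1 inet static
        address 10.0.0.2
        netmask 255.255.255.0
        mtu 9000
```

After bringing the interface up, `ip link show eth1` should report mtu 9000; the switch ports and the FreeNAS side must be raised as well, or large packets will be silently dropped.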
I think you also need to tune your TCP window sizes:
# maximum receive socket buffer size, default 131071
net.core.rmem_max = 16777216
# maximum send socket buffer size, default 131071
net.core.wmem_max = 16777216
# default receive socket buffer size, default 65535
net.core.rmem_default = 524287
# default send socket buffer size, default 65535
net.core.wmem_default = 524287
# maximum amount of option memory buffers, default 10240
net.core.optmem_max = 524287
# number of unprocessed input packets before kernel starts dropping them, default 300
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_rmem = 4096 524287 16777216
net.ipv4.tcp_wmem = 4096 524287 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
Then reboot your node.
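If you prefer not to wait for a reboot, the same values can be loaded immediately with sysctl (standard Debian tooling; the two example keys are taken from the list above):

```shell
# append the tuning values to /etc/sysctl.conf so they persist across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF
sysctl -p                    # re-read /etc/sysctl.conf and apply it now
sysctl net.core.rmem_max     # verify the new value is active
```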
Try NFSv4 if you can; performance is better for me.
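A minimal sketch of an NFSv4 mount, assuming the server is 10.0.0.10 and exports /mnt/tank (both are placeholders, not values from this thread):

```shell
# force NFS version 4 explicitly; "hard" retries forever instead of erroring out
mount -t nfs4 -o rw,hard 10.0.0.10:/mnt/tank /mnt/storage
# equivalent /etc/fstab line:
# 10.0.0.10:/mnt/tank  /mnt/storage  nfs4  rw,hard  0  0
```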
>>This time all VMs crashed, and the pve daemon CAN be restarted ...
>>If I ping the Storage from any of the 5 nodes, it responds.
>>On another ordinary Linux box connected to the same 10 GB switch for testing, both NFS volumes can be mounted and I can copy anything to and from them.
>>So the Storage did not stop working!!!!
>>I will try the Debian kernel from Backports, it is 3.2.x ...
If all 5 of your nodes have the same problem at the same moment, clearly something is wrong on your storage side (also, it's not a multicast problem).
If the VMs crashed, it's because they can't read/write to storage.
If pvedaemon hangs, it's because it is trying to read stats from your storage.
Running VMs and pvedaemon are independent of each other.
How do you mount (which options?) on your ordinary Linux box?
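On that Linux box you can read back the options the kernel actually negotiated, which is more reliable than looking at the fstab line:

```shell
# /proc/mounts shows the effective options (rsize, wsize, vers, proto, ...)
grep nfs /proc/mounts
# nfsstat -m (from the nfs-common package) prints the same per-mount details
nfsstat -m
```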
>>My original question: IS IT POSSIBLE to install Proxmox 1.9 under Squeeze?!?
I never tried it; maybe. Try to install the .deb packages from the repository, but some dependencies may not work.
----- Original Message -----
From: "Fábio Rabelo" <fabio at fabiorabelo.wiki.br>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: pve-user at pve.proxmox.com
Sent: Tuesday, March 12, 2013 19:48:01
Subject: Re: [PVE-User] Unreliable
2013/3/12 Alexandre DERUMIER < aderumier at odiso.com >
>> In one PCIe slot there is an Intel 10 GB card, talking to a Supermicro 10 GB switch used exclusively for communication between the five nodes and the Storage.
What is the Intel card model? Do you use MTU 9000?
Not yet, doing it now.
I did not see that in any documentation, why?
This is the output of lspci -v
03:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
Subsystem: Intel Corporation 10-Gigabit XF SR Dual Port Server Adapter
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at dff20000 (32-bit, non-prefetchable) [size=128K]
Memory at dff40000 (32-bit, non-prefetchable) [size=256K]
I/O ports at e400 [size=32]
Memory at dff1c000 (32-bit, non-prefetchable) [size=16K]
Capabilities:  Power Management version 3
Capabilities:  MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities:  MSI-X: Enable+ Count=18 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities:  Advanced Error Reporting
Capabilities:  Device Serial Number 00-1b-21-ff-ff-d9-39-5e
Kernel driver in use: ixgbe
>>pvestatd: WARNING: storage 'iudice01' is not online
What storage protocol do you use? NFS/iSCSI/LVM?
If NFS, what are your mount options?
Sorry, I forgot to say it: it is NFS. The config was made within the web interface on the very first attempt.
This is the content of storage.conf:
>>After that, if I try to restart the pve daemon, it refuses to.
>>If I try to reboot the server, it stops when the PVE daemon should stop, and stays there forever.
>>The only way to reboot any of the nodes is a hard reset!
It's possible that an access to the storage is hanging (stats, VM volume info, ...).
Normally a check is done to avoid that (this is the "not online" message you see).
The Storage is OK: I can access its web interface, view logs, etc. Nothing is wrong there.
The checks are:
/usr/bin/rpcinfo -p nfsipserver, with a timeout of 2 sec
a TCP probe of iscsiserverip port 3260, with a timeout of 2 sec
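You can reproduce both probes by hand to see whether they complete within the budget while the SAN is under load (nfsipserver and iscsiserverip are the same placeholders used above):

```shell
# same 2-second limit pvestatd uses; a non-zero exit means timeout or failure
timeout 2 /usr/bin/rpcinfo -p nfsipserver; echo "rpcinfo exit: $?"
# simple TCP probe of the iSCSI portal using bash's /dev/tcp pseudo-device
timeout 2 bash -c 'exec 3<>/dev/tcp/iscsiserverip/3260'; echo "iscsi exit: $?"
```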
So maybe the timeout in the Proxmox code is too low when your SAN is under load.
Also, do you have VMs hanging? Or is it only pvedaemon/manager?
No, the VMs do not hang, unless I try to restart pvedaemon; then all VMs on that node hang ...
But I cannot migrate any VM; the nodes do not talk to each other ...