[PVE-User] Nodes in "partial" offline condition
alk at ondore.com
Thu Sep 27 23:19:48 CEST 2012
I have 4 nodes in a Proxmox 2.1 cluster.
After I changed the network configuration on the node I'm using as web
panel (hostname proxmox42) and rebooted it (as the Web GUI requested), I
see the rest of the nodes offline (hostnames proxmox43-proxmox45). Well,
they have a little red dot instead of a green one in their icon in the
web GUI.
The fallen nodes respond via Web and SSH, with some errors on the Web
GUI. The network configuration change I have done was to add a bridge on
a previously unused NIC.
What can I do (places to look, tests to run) to see what is going on? My
cluster has to go to production next week, so I'm almost glad this happened
now and not then.
Random details, don't know what may be relevant:
The "Datacenter" (root of the GUI hierarchy) section of the Web GUI
shows this status:
"Search" tab lists all the resources but shows status details only for
proxmox42's resources.
"Summary" tab shows all the nodes as "online".
I have reloaded the page, logged out and logged in (using root PAM
account), same status.
Curiously, the "Summary" tabs of the fallen nodes are showing a valid
status. I can see the CPU details, uptime, etc. The only thing out of
order is the Load Average. They are running nothing, but have a
Load Average above 1.
Some parts of the GUI do not show details and display a floating
"communication failure" message.
I can SSH to all the nodes and see that "pvecm status" and "pvecm nodes"
show all 4 nodes online and running.
SSH to each node works, "top" confirms a high Load Average but shows
less than 1% CPU usage.
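One way to look into a high Load Average with an almost idle CPU is sketched below. It relies on the fact that Linux counts tasks in uninterruptible sleep (state "D" in /proc/<pid>/stat) toward the load average, in addition to runnable tasks. Whether that is what is happening on these nodes is only an assumption; the function names are made up for illustration, and the scan covers main threads only, not the entries under /proc/<pid>/task.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def parse_stat(text):
    """Return (pid, command, state) from the content of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, command) for processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    with open("/proc/loadavg") as handle:
        print("load average:", parse_loadavg(handle.read()))
    for pid, comm in blocked_tasks():
        print("blocked:", pid, comm)
```

Running this a few times on one of the fallen nodes would show whether the same process stays blocked, which could hint at what it is waiting on.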
Apache access log shows successful connections to the API from proxmox42
to the fallen nodes.
I have rebooted one of the nodes and it appears to be online now and seems
normal (Load Average, response to GUI). I have not rebooted any other
node yet. I'm more interested in finding out what the condition is and
making sure I eliminate the cause than in getting my nodes back online ASAP.