<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 02/14/2013 10:55 AM, Fábio Rabelo wrote:
<blockquote
cite="mid:CAEekY67xzUz_Hzq=gX9+nGxcL8CvTu8T1AYjX-K2LfgfkwW3Wg@mail.gmail.com"
type="cite">Today my system started to become unresponsive ... and then
all VMs went to "unknown state".<br>
<br>
After some digging, I found that none of the NFS mounts showed any content, but
there were no errors in the log ?!?<br>
<br>
Then I rebooted one node to find out if things would come back to life
...<br>
<br>
Second problem: the system does not reboot !!<br>
<br>
After I sent the command via the web interface, the system returned a message
about stopping all VMs and containers, and stayed there seemingly forever
!!!<br>
<br>
After waiting 20 minutes, I decided to try a reboot via ssh. Again, I
received a message that the system was shutting down, and it stayed like that
forever again!<br>
<br>
After another 20 minutes, I tried to connect via ssh; the connection
worked and the "uptime" command returned 14 days of uptime !!!<br>
<br>
Then I pressed the reset button. After the system came back online, the
storage did not connect, with this message in the log:<br>
<br>
<pre>WARNING: mount error: mount.nfs: Unknown error 32768</pre>
<br>
Google returns nothing referring to this error ...<br>
<br>
I am lost here ... don't know where to go ...<br>
<br>
Any Ideas ?!?!?<br>
<br>
Fábio Rabelo<br>
<br>
<pre wrap="">_______________________________________________
pve-user mailing list
<a class="moz-txt-link-abbreviated" href="mailto:pve-user@pve.proxmox.com">pve-user@pve.proxmox.com</a>
<a class="moz-txt-link-freetext" href="http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user">http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user</a>
</pre>
</blockquote>
<br>
Excuse the late reply. Maybe you or someone else can try this next
time:<br>
<br>
If you can't restart the node, check for a stuck rgmanager
process. It can get into an unkillable state.<br>
<br>
First list the process tree:<br>
<tt>ps afx</tt><br>
<br>
<br>
<tt>222025 ? Ss 0:00 /bin/sh /etc/init.d/rc 6<br>
222028 ? S 0:00 \_ startpar -p 4 -t 20 -T 3 -M stop
-P 2 -R 6<br>
222044 ? S 0:00 \_ /bin/bash
/etc/init.d/rgmanager stop<br>
225986 ? S 0:00 \_ sleep 1</tt><br>
<br>
Here rgmanager is in an unkillable state: <tt>kill -9</tt> on the running
rgmanager process did not work.<br>
<br>
So kill the <tt>/etc/init.d/rgmanager stop</tt> process instead. In the
listing above, that would be:<br>
<tt>kill -9 222044</tt><br>
<br>
<br>
Here, that allowed the machine to proceed with the reboot.<br>
<br>
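If you want to script the lookup, something like the following might work.
It is only a rough sketch: the function name
<tt>find_rgmanager_stop_pid</tt> is made up for illustration (it is not part
of Proxmox or rgmanager), and it assumes the input is one process per line in
"PID command line" form, e.g. from <tt>ps -e -o pid= -o args=</tt>.<br>
<br>
<pre>
```shell
# Hypothetical helper (an assumption, not shipped with any package):
# reads "PID COMMAND ARGS..." lines on stdin and prints the PID of every
# process whose command line ends in "/etc/init.d/rgmanager stop".
find_rgmanager_stop_pid() {
    awk '$NF == "stop" { if ($(NF-1) == "/etc/init.d/rgmanager") print $1 }'
}

# Possible usage on a node that hangs during shutdown.
# Check the printed PID by hand before killing anything:
#   pid=$(ps -e -o pid= -o args= | find_rgmanager_stop_pid)
#   if [ -n "$pid" ]; then kill -9 $pid; fi
```
</pre>
The function only prints PIDs; the <tt>kill -9</tt> stays a separate, manual
step so the PID can be checked first.<br>
<br>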
</body>
</html>