[pve-devel] Proxmox 4 feedback

Sun Oct 4 19:24:31 CEST 2015

On 04.10.2015 at 16:46, Gilou wrote:
> On 04/10/2015 10:50, Thomas Lamprecht wrote:
>>
>> On 03.10.2015 at 22:40, Gilou wrote:
>>> On 02/10/2015 15:18, Thomas Lamprecht wrote:
>>>> On 10/02/2015 11:59 AM, Gilou wrote:
>>>>> Hi,
>>>>>
>>>>> I just installed PVE 4 Beta2 (43 ?), and played a bit with it.
>>>>>
>>>>> I do not notice the same bug I had on 3.4 with Windows 2012
>>>>> rollbacks of
>>>>> snapshots: it just works, that is great.
>>>>>
>>>>> However, I keep on getting an error on different pages: "Too many
>>>>> redirections (599)". Any clue what could cause that? It happens even
>>>>> more often on the storage contents...
>>>> How do you connect to the web interface (network and browser)? AFAIK,
>>>> our proxy does not redirect.
>>>>> I have an issue installing it over an old debian (that was running PVE
>>>>> 3.4), it seems it has a hard time properly partitioning the local
>>>>> disk,
>>>>> I'll investigate a bit further on that.
>>>> How did you install it over the old Debian? PVE4 is based on Debian
>>>> Jessie, whereas PVE3.4 is based on Wheezy, so an upgrade is needed,
>>>> and that's not trivial.
>>>> You can, however, install PVE4 on a freshly installed plain Debian Jessie.
>>> I did it that way (I didn't want to try the upgrade from Debian,
>>> assuming it would end badly ;)). But it seems that it barfed on the
>>> partition layout. I deleted the partition table, and it installed just
>>> fine. I haven't reproduced or troubleshot it any further.
>>>
>>> Also, I managed to kill HA, it was working fine on PVE3, but in PVE4 it
>>> just didn't work... I have to read more about the new resource manager,
>>> but I wasn't impressed by the reliability of the beta overall. I'll get
>>> back to it next week and investigate further.
>> How did you kill HA? And what didn't seem reliable? We welcome bug
>> reports!
>> I only did some small things with the 3.4 HA stack and a lot with the
>> 4.0 beta stack, and for me the new HA manager was easier to configure
>> and also reliable in my test cases.
>> Although I have to say that, as someone who develops on and with the HA
>> manager, I'm probably a bit biased. :-)
>> Look at:
>> http://pve.proxmox.com/wiki/High_Availability_Cluster_4.x
>> and
>>> man ha-manager
>> Those should give some information on how it works.
> That's what I intend to read more thoroughly, though I already skimmed
> through it while I set it up. HA simply didn't work... Softdog behaving
> erratically, corosync reporting a quorate cluster (and multicast behaving
> properly) while the resource manager was unable to start, let alone protect
> or migrate VMs...
What's your setup? If the cluster is quorate and you have at least one
resource configured, it starts up automatically; there is no need to start
it manually.
Look into the syslog, as the CRM and LRM normally log quite extensively,
especially on errors.

A somewhat normal startup looks like this for the LRM:
>  pve-ha-lrm[1695]: starting server
>  pve-ha-lrm[1695]: status change startup => wait_for_agent_lock
>  [...]
>  pve-ha-lrm[1695]: successfully acquired lock 'ha_agent_one_lock'
>  pve-ha-lrm[1695]: watchdog active
>  pve-ha-lrm[1695]: status change wait_for_agent_lock => active

If you see those lines, the watchdog is active and will also trigger.
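If you want to check this from a saved log rather than by eye, here is a
minimal Python sketch. It assumes the message text looks exactly like the
excerpt above (real syslog lines also carry a timestamp and hostname prefix,
which the regex search simply skips over); the function names are made up
for illustration and are not part of any Proxmox tool.

```python
import re

# Assumed message format, taken from the log excerpt above, e.g.
# "pve-ha-lrm[1695]: status change startup => wait_for_agent_lock"
STATUS_RE = re.compile(r"(pve-ha-(?:lrm|crm))\[\d+\]: status change (\w+) => (\w+)")


def ha_transitions(lines, daemon):
    """Return the (old, new) state pairs logged by `daemon`, in log order."""
    transitions = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m and m.group(1) == daemon:
            transitions.append((m.group(2), m.group(3)))
    return transitions


def lrm_is_active(lines):
    """Hypothetical check: the LRM's last transition ends in 'active'
    and pve-ha-lrm logged a 'watchdog active' message."""
    transitions = ha_transitions(lines, "pve-ha-lrm")
    watchdog = any("pve-ha-lrm[" in line and "watchdog active" in line
                   for line in lines)
    return bool(transitions) and transitions[-1][1] == "active" and watchdog


# Lines copied from the LRM excerpt above.
SAMPLE_LRM_LOG = [
    "pve-ha-lrm[1695]: starting server",
    "pve-ha-lrm[1695]: status change startup => wait_for_agent_lock",
    "pve-ha-lrm[1695]: successfully acquired lock 'ha_agent_one_lock'",
    "pve-ha-lrm[1695]: watchdog active",
    "pve-ha-lrm[1695]: status change wait_for_agent_lock => active",
]

if __name__ == "__main__":
    print(ha_transitions(SAMPLE_LRM_LOG, "pve-ha-lrm"))
    print(lrm_is_active(SAMPLE_LRM_LOG))  # True
```

On a real node you would feed it the syslog lines instead. A full syslog may
also contain earlier runs, which is why only the last logged transition is
taken into account.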

For the CRM (only one node per cluster is master at any time):
>  pve-ha-crm[1689]: starting server
>  pve-ha-crm[1689]: status change startup => wait_for_quorum
>  pve-ha-crm[1689]: status change wait_for_quorum => slave
>  [...]
>  pve-ha-crm[1689]: successfully acquired lock 'ha_manager_lock'
>  pve-ha-crm[1689]: watchdog active
>  pve-ha-crm[1689]: status change slave => master
If those are present, everything should be working. To get an overview, run
> ha-manager status
or look at the HA tab under Datacenter in the web interface.
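Since only one node per cluster should be CRM master at any time, a similar
hypothetical Python sketch can compare the CRM logs collected from several
nodes. It again assumes the message format shown in the CRM excerpt above; the
node names, the second PID and the helper names are made up for illustration.

```python
import re

# Assumed message format, taken from the CRM excerpt above, e.g.
# "pve-ha-crm[1689]: status change slave => master"
CRM_STATUS_RE = re.compile(r"pve-ha-crm\[\d+\]: status change (\w+) => (\w+)")


def crm_state(lines):
    """Return the last CRM state logged in `lines`, or None if none was logged."""
    state = None
    for line in lines:
        m = CRM_STATUS_RE.search(line)
        if m:
            state = m.group(2)
    return state


def crm_masters(logs_by_node):
    """Return the sorted names of nodes whose CRM last logged 'master'."""
    return sorted(node for node, lines in logs_by_node.items()
                  if crm_state(lines) == "master")


# Hypothetical node names; the log lines follow the excerpt above.
LOGS = {
    "node1": [
        "pve-ha-crm[1689]: status change startup => wait_for_quorum",
        "pve-ha-crm[1689]: status change wait_for_quorum => slave",
        "pve-ha-crm[1689]: successfully acquired lock 'ha_manager_lock'",
        "pve-ha-crm[1689]: status change slave => master",
    ],
    "node2": [
        "pve-ha-crm[1702]: status change startup => wait_for_quorum",
        "pve-ha-crm[1702]: status change wait_for_quorum => slave",
    ],
}

if __name__ == "__main__":
    masters = crm_masters(LOGS)
    print(masters)  # ['node1']
    if len(masters) != 1:
        print("unexpected: expected exactly one CRM master")
```

This only reads what was logged; `ha-manager status` remains the
authoritative view of the current state.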
>
> I will investigate, and report (unless I find I did something wrong). I
> also tried to explicitly separate the NIC for the filer from the
> management/cluster one, but it didn't work (and I can't say I believe
> it's actually being checked, but it's mentioned in the docs ;))
You mean separating the storage and cluster networks? That works.
Separating corosync from the management network also gave someone trouble,
but it should work. I'll try that on Monday.
> Also, I won't insist much, but LXC live migration is a requirement for a
> proper upgrade plan, as it's working for OpenVZ, and it's not possible
> for us to lose that feature.
Another post on the mailing list stated that this is planned for 4.1,
but, as always, no promises :D

Regards,
Thomas