From t.lamprecht at proxmox.com Thu Nov 10 11:39:53 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 10 Nov 2016 11:39:53 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> Message-ID: On 11/09/2016 11:46 PM, Dhaussy Alexandre wrote: > I had yet another outage... > BUT now everything is back online! yay! > > So I think I had (at least) two problems: > > 1 - When installing/upgrading a node. > > If the node sees all SAN storage LUNs before the install, the Debian > partitioner tries to scan all LUNs.. > This causes almost all nodes to reboot (not sure why; maybe it causes > latency in the LVM cluster, or a problem with a lock somewhere..) > > The same thing happens when f*$king os_prober kicks in on a kernel upgrade. > It scans all LVs and causes node reboots. So now I make sure of this in > /etc/default/grub => GRUB_DISABLE_OS_PROBER=true Yes, os-prober is _bad_ and may even corrupt some filesystems under some conditions, AFAIK. The Proxmox VE ISO does not ship it for this reason. > > 2 - There seems to be a bug in the LRM. > > Tonight I saw timeouts of qmstart tasks in /var/log/pve/tasks/active. > Just after the timeouts, the LRM was kind of stuck, doing nothing. If it's doing nothing it would be interesting to see which state it is in. Because if it's already online and active, the watchdog must trigger if it is stuck for ~60 seconds or more. > Services began to start again after I restarted the service; anyway, a > few seconds later the nodes got fenced. Hmm, this means the watchdog had already run out. > I think the timeouts are due to a bottleneck in our storage switches; I > have a few messages like this: > > Nov 9 22:34:40 proxmoxt25 kernel: [ 5389.318716] qla2xxx > [0000:08:00.1]-801c:2: Abort command issued nexus=2:2:28 -- 1 2002. > Nov 9 22:34:41 proxmoxt25 kernel: [ 5390.482259] qla2xxx > [0000:08:00.1]-801c:2: Abort command issued nexus=2:1:28 -- 1 2002. > > So when all nodes rebooted, I may have hit the bottleneck, then the LRM > bug, and all HA services were frozen... (happened several times.) Yeah, I looked a bit through the logs of two of your nodes, and it looks like the system hit quite a few bottlenecks. The CRM/LRM often run into 'loop took too long' errors, and the filesystem is sometimes not writable. Some of your logs also show huge corosync retransmit lists. Where does your cluster communication happen - not on the storage network? A few general hints: The HA stack does not like it when somebody moves the config of a VM that is in the started/migrate state. If it's in the stopped state that's OK, as there it can fix up the VM location. Otherwise it cannot simply fix up the location, as it does not know whether the resource still runs on the (old) node. Modifying the manager status does not work if a manager is currently elected. The manager reads it only on its transition from slave to master, to get the last state into memory. After that it only writes it out, so that on a master re-election the new master has the most current state.
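For reference, the state described above can be inspected directly - a minimal sketch using stock PVE 4.x commands (adjust to your own setup):

  ha-manager status                         # CRM/LRM view of all HA services and the current manager
  cat /etc/pve/ha/manager_status            # raw state file the elected master writes out
  systemctl status pve-ha-crm pve-ha-lrm    # check that both HA daemons are actually running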
So if something bad like this happens again I'd do the following: If no master election happens, but there is a quorate partition of nodes and you are sure that their pve-ha-crm service is up and running (else restart it first), you can try to trigger an instant master re-election by deleting the old master's lock (which may not yet have become invalid through timeout): rmdir /etc/pve/priv/lock/ha_manager_lock/ If a master election then happens you should be fine and the HA stack will do its work and recover. If you have to move the VMs you should disable them first; 'ha-manager disable SID' also works quite well in a lot of problematic situations, as it just edits the resources.cfg. If this does not work you have no quorum or pve-cluster has a problem, and both mean HA recovery cannot take place on this node one way or the other. > > Thanks again for the help. > Alexandre. > > On 09/11/2016 at 20:54, Thomas Lamprecht wrote: >> >> On 09.11.2016 18:05, Dhaussy Alexandre wrote: >>> I have done a cleanup of resources with echo "" > >>> /etc/pve/ha/resources.cfg >>> >>> It seems to have resolved all problems with the inconsistent status of >>> lrm/crm in the GUI. >>> >> Good. Logs would be interesting to see what went wrong, but I do not >> know if I can skim through them, as your setup is not too small and there >> may be much noise from the outage in there. >> >> If you have time you may send me the log file(s) generated by: >> >> journalctl --since "-2 days" -u corosync -u pve-ha-lrm -u pve-ha-crm >> -u pve-cluster > pve-log-$(hostname).log >> >> (adapt the "-2 days" accordingly, it also understands something like >> "-1 day 3 hours") >> >> Send them directly to my address (the list does not accept bigger >> attachments, >> the limit is something like 20-20 kb AFAIK). >> I cannot promise any deep examination, but I can skim through them and >> look at what happened in the HA stack; maybe I see something obvious. >> >>> A new master has been elected. The manager_status file has been >>> cleaned up. >>> All nodes are idle or active. >>> >>> I am re-starting all VMs in HA with "ha-manager add". >>> Seems to work now... :-/ >>> >>> On 09/11/2016 at 17:40, Dhaussy Alexandre wrote: >>>> Sorry, my old message was too big... >>>> >>>> Thanks for the input !... >>>> >>>> I have attached the manager_status files. >>>> .old is the original file, and .new is the file I have modified and put >>>> in /etc/pve/ha. >>>> >>>> I know this is bad, but here's what I've done : >>>> >>>> - delnode on known NON-working nodes. >>>> - rm -Rf /etc/pve/nodes/x for all NON-working nodes. >>>> - replace all NON-working nodes with working nodes in >>>> /etc/pve/ha/manager_status >>>> - mv VM.conf files into the proper node directory >>>> (/etc/pve/nodes/x/qemu-server/) in reference to >>>> /etc/pve/ha/manager_status >>>> - restart pve-ha-crm and pve-ha-lrm on all nodes >>>> >>>> Now on several nodes I have those messages : >>>> >>>> nov. 09 17:08:19 proxmoxt34 pve-ha-crm[26200]: status change startup => >>>> wait_for_quorum >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Noeud final de transport n'est pas connecté >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Connexion refusée >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Connexion refusée >>>> >> >> This means that something with the cluster filesystem (pve-cluster) >> was not OK. >> Those messages weren't there previously? >> >> >>>> nov.
09 17:08:22 proxmoxt34 pve-ha-lrm[26282]: status change startup => >>>> wait_for_agent_lock >>>> nov. 09 17:12:07 proxmoxt34 pve-ha-lrm[26282]: ipcc_send_rec failed: >>>> Noeud final de transport n'est pas connecté >>>> >>>> We are also investigating a possible network problem.. >>>> >> Is multicast working properly? >> >> >>>> On 09/11/2016 at 17:00, Thomas Lamprecht wrote: >>>>> Hi, >>>>> >>>>> On 09.11.2016 16:29, Dhaussy Alexandre wrote: >>>>>> I try to remove them from HA in the GUI, but nothing happens. >>>>>> There are some services in "error" or "fence" state. >>>>>> >>>>>> Now I tried to remove the non-working nodes from the cluster... but I >>>>>> still see those nodes in /etc/pve/ha/manager_status. >>>>> Can you post the manager status please? >>>>> >>>>> Also, are pve-ha-lrm and pve-ha-crm up and running without any error >>>>> on all nodes, at least on those in the quorate partition? >>>>> >>>>> check with: >>>>> systemctl status pve-ha-lrm >>>>> systemctl status pve-ha-crm >>>>> >>>>> If not, restart them, and if it's still problematic then please post the >>>>> output >>>>> of the systemctl status call (if it's the same on all nodes, one output >>>>> should be enough). >>>>> >>>>> >>>>>> On 09/11/2016 at 16:13, Dietmar Maurer wrote: >>>>>>>> I wanted to remove the VMs from HA and start them locally, but I >>>>>>>> can't even do >>>>>>>> that (nothing happens.) >>>>> You can remove them from HA by emptying the HA resource file (this >>>>> also deletes >>>>> comments and group settings, but if you need to start them _now_ that >>>>> shouldn't be a problem): >>>>> >>>>> echo "" > /etc/pve/ha/resources.cfg >>>>> >>>>> Afterwards you should be able to start them manually. >>>>> >>>>> >>>>>>> How do you do that exactly (in the GUI)? You should be able to start >>>>>>> them >>>>>>> manually afterwards. >>>>>>> >>>>>> _______________________________________________ >>>>>> pve-user mailing list >>>>>> pve-user at pve.proxmox.com >>>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>>>> >>>>> _______________________________________________ >>>>> pve-user mailing list >>>>> pve-user at pve.proxmox.com >>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at pve.proxmox.com >>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Thu Nov 10 21:34:56 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 11 Nov 2016 06:34:56 +1000 Subject: [PVE-User] online migration broken in latest updates - "unknown command 'mtunnel'" Message-ID: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> qm migrate 506 vnb --online 400 Parameter verification failed. target: target is local node.
qm migrate [OPTIONS] root at vnb:/etc/pve/softlog# qm migrate 506 vng --online ERROR: unknown command 'mtunnel' -- Lindsay Mathieson From t.lamprecht at proxmox.com Thu Nov 10 22:11:46 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 10 Nov 2016 22:11:46 +0100 Subject: [PVE-User] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> Message-ID: <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> On 10.11.2016 21:34, Lindsay Mathieson wrote: > qm migrate 506 vnb --online > 400 Parameter verification failed. > target: target is local node. > qm migrate [OPTIONS] > root at vnb:/etc/pve/softlog# qm migrate 506 vng --online > ERROR: unknown command 'mtunnel' > > Are you sure you upgraded all, i.e. used: apt update apt full-upgrade or apt-get update apt-get dist-upgrade Can you post: pveversion -v From lindsay.mathieson at gmail.com Thu Nov 10 22:35:37 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 11 Nov 2016 07:35:37 +1000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > Are you sure you upgraded all, i.e. used: > apt update > apt full-upgrade Resolved it thanks Thomas - I hadn't updated the *destination* server. Thanks, -- Lindsay Mathieson From lists at hexis.consulting Thu Nov 10 22:53:55 2016 From: lists at hexis.consulting (Hexis) Date: Thu, 10 Nov 2016 15:53:55 -0600 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi Message-ID: I am trying to run Proxmox PVE 4.3 inside of VMware ESXi, which I was advised would work (obviously issues would occur with KVM). All has gone well so far and containers run fine; however, for some reason, the containers cannot reach their gateway when routing through the Linux bridge, which corresponds to an interface on the VM. The management interface of Proxmox, which works the same way, is fine. Any ideas? Thanks, -Hexis From t.lamprecht at proxmox.com Fri Nov 11 08:05:42 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 11 Nov 2016 08:05:42 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> Are you sure you upgraded all, i.e. used: >> apt update >> apt full-upgrade > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > Makes sense - this should have been mentioned a few days ago, it would not have been too hard to catch :/ anyway, for anyone reading this: When upgrading qemu-server to version 4.0.93 or newer you should upgrade all other nodes' pve-cluster package to version 4.0-47 or newer, else migrations to those nodes will not work - as we use a new command to detect if we should send the traffic over a separate migration network.
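A quick way to verify this across a cluster before migrating - a rough sketch only, assuming root SSH between the nodes (node names below are placeholders):

  for n in pve1 pve2 pve3; do
      echo "== $n =="
      ssh root@$n "pveversion -v | grep -E 'qemu-server|pve-cluster'"
  done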
cheers, Thomas From colonellor at gmail.com Fri Nov 11 08:48:00 2016 From: colonellor at gmail.com (Roberto Colonello) Date: Fri, 11 Nov 2016 08:48:00 +0100 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: References: Message-ID: On Thu, Nov 10, 2016 at 10:53 PM, Hexis wrote: > > Any ideas? Ciao, have you tried to set "Promiscuos mode: Accept" into vSwitch's Security tab ? -- /roby.deb -- "There are only 10 types of people in the world:Those who understand binary, and those who don't" SOFTWARE is like SEX IT's better when it's FREE https://linuxcounter.net/ Counter Number: 552671 Favorite Distro : Debian From yannis.milios at gmail.com Fri Nov 11 13:11:27 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Fri, 11 Nov 2016 12:11:27 +0000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: Not sure if it's related, but after upgrading yesterday to the latest updates, Ceph snapshots take a very long time to complete and finally they fail. This happens only if the VM is running and if I check the 'include RAM' box in snapshot window. All 3 pve/ceph nodes are upgraded to the latest updates. I have 3 pve nodes with ceph storage role on them. Below follows some more info: proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) pve-kernel-4.4.21-1-pve: 4.4.21-71 pve-kernel-4.4.19-1-pve: 4.4.19-66 lvm2: 2.02.116-pve3 corosync-pve: 2.4.0-1 libqb0: 1.0-1 pve-cluster: 4.0-47 qemu-server: 4.0-94 pve-firmware: 1.1-10 libpve-common-perl: 4.0-80 libpve-access-control: 4.0-19 libpve-storage-perl: 4.0-68 pve-libspice-server1: 0.12.8-1 vncterm: 1.2-1 pve-docs: 4.3-14 pve-qemu-kvm: 2.7.0-6 pve-container: 1.0-81 pve-firewall: 2.0-31 pve-ha-manager: 1.0-35 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 2.0.5-1 lxcfs: 2.0.4-pve2 criu: 1.6.0-1 novnc-pve: 0.5-8 smartmontools: 6.5+svn4324-1~pve80 zfsutils: 0.6.5.8-pve13~bpo80 openvswitch-switch: 2.5.0-1 ceph: 0.94.9-1~bpo80+1 ceph status cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 health HEALTH_OK monmap e3: 3 mons at {0= 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} election epoch 260, quorum 0,1,2 0,1,2 osdmap e740: 6 osds: 6 up, 6 in pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects 393 GB used, 2183 GB / 2576 GB avail 120 active+clean client io 4973 B/s rd, 115 kB/s wr, 35 op/s On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht wrote: > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> >>> Are you sure you upgraded all, i.e. used: >>> apt update >>> apt full-upgrade >>> >> >> Resolved it thanks Thomas - I hadn't updated the *destination* server. >> >> > > makes sense, should have been made sense a few days ago this, would not be > too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to detect > if we should send the traffic over a separate migration network. 
> > cheers, > Thomas > > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From w.bumiller at proxmox.com Fri Nov 11 13:28:06 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 11 Nov 2016 13:28:06 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: <20161111122806.GA13820@olga.wb> Any chance you could compare pve-qemu-kvm 2.7.0-5 and this test build: ? On Fri, Nov 11, 2016 at 12:11:27PM +0000, Yannis Milios wrote: > Not sure if it's related, but after upgrading yesterday to the latest > updates, Ceph snapshots take a very long time to complete and finally they > fail. > This happens only if the VM is running and if I check the 'include RAM' box > in snapshot window. All 3 pve/ceph nodes are upgraded to the latest updates. > > I have 3 pve nodes with ceph storage role on them. Below follows some more > info: > > proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) > pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) > pve-kernel-4.4.21-1-pve: 4.4.21-71 > pve-kernel-4.4.19-1-pve: 4.4.19-66 > lvm2: 2.02.116-pve3 > corosync-pve: 2.4.0-1 > libqb0: 1.0-1 > pve-cluster: 4.0-47 > qemu-server: 4.0-94 > pve-firmware: 1.1-10 > libpve-common-perl: 4.0-80 > libpve-access-control: 4.0-19 > libpve-storage-perl: 4.0-68 > pve-libspice-server1: 0.12.8-1 > vncterm: 1.2-1 > pve-docs: 4.3-14 > pve-qemu-kvm: 2.7.0-6 > pve-container: 1.0-81 > pve-firewall: 2.0-31 > pve-ha-manager: 1.0-35 > ksm-control-daemon: 1.2-1 > glusterfs-client: 3.5.2-2+deb8u2 > lxc-pve: 2.0.5-1 > lxcfs: 2.0.4-pve2 > criu: 1.6.0-1 > novnc-pve: 0.5-8 > smartmontools: 6.5+svn4324-1~pve80 > zfsutils: 0.6.5.8-pve13~bpo80 > openvswitch-switch: 2.5.0-1 > ceph: 0.94.9-1~bpo80+1 > > ceph status > cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 > health HEALTH_OK > monmap e3: 3 mons at {0= > 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} > election epoch 260, quorum 0,1,2 0,1,2 > osdmap e740: 6 osds: 6 up, 6 in > pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects > 393 GB used, 2183 GB / 2576 GB avail > 120 active+clean > client io 4973 B/s rd, 115 kB/s wr, 35 op/s > > > > On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht > wrote: > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> > >>> Are you sure you upgraded all, i.e. used: > >>> apt update > >>> apt full-upgrade > >>> > >> > >> Resolved it thanks Thomas - I hadn't updated the *destination* server. > >> > >> > > > > makes sense, should have been made sense a few days ago this, would not be > > too hard to catch :/ > > > > anyway, for anyone reading this: > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > migrations to those nodes will not work - as we use a new command to detect > > if we should send the traffic over a separate migration network. 
> > > > cheers, > > Thomas > > > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-devel mailing list > pve-devel at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel From yannis.milios at gmail.com Fri Nov 11 13:45:16 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Fri, 11 Nov 2016 12:45:16 +0000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <20161111122806.GA13820@olga.wb> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <20161111122806.GA13820@olga.wb> Message-ID: Just tested it with pve-qemu-kvm 2.7.0-6 and it works fine, thanks! On Fri, Nov 11, 2016 at 12:28 PM, Wolfgang Bumiller wrote: > Any chance you could compare pve-qemu-kvm 2.7.0-5 and this test build: > ? > > On Fri, Nov 11, 2016 at 12:11:27PM +0000, Yannis Milios wrote: > > Not sure if it's related, but after upgrading yesterday to the latest > > updates, Ceph snapshots take a very long time to complete and finally > they > > fail. > > This happens only if the VM is running and if I check the 'include RAM' > box > > in snapshot window. All 3 pve/ceph nodes are upgraded to the latest > updates. > > > > I have 3 pve nodes with ceph storage role on them. Below follows some > more > > info: > > > > proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) > > pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) > > pve-kernel-4.4.21-1-pve: 4.4.21-71 > > pve-kernel-4.4.19-1-pve: 4.4.19-66 > > lvm2: 2.02.116-pve3 > > corosync-pve: 2.4.0-1 > > libqb0: 1.0-1 > > pve-cluster: 4.0-47 > > qemu-server: 4.0-94 > > pve-firmware: 1.1-10 > > libpve-common-perl: 4.0-80 > > libpve-access-control: 4.0-19 > > libpve-storage-perl: 4.0-68 > > pve-libspice-server1: 0.12.8-1 > > vncterm: 1.2-1 > > pve-docs: 4.3-14 > > pve-qemu-kvm: 2.7.0-6 > > pve-container: 1.0-81 > > pve-firewall: 2.0-31 > > pve-ha-manager: 1.0-35 > > ksm-control-daemon: 1.2-1 > > glusterfs-client: 3.5.2-2+deb8u2 > > lxc-pve: 2.0.5-1 > > lxcfs: 2.0.4-pve2 > > criu: 1.6.0-1 > > novnc-pve: 0.5-8 > > smartmontools: 6.5+svn4324-1~pve80 > > zfsutils: 0.6.5.8-pve13~bpo80 > > openvswitch-switch: 2.5.0-1 > > ceph: 0.94.9-1~bpo80+1 > > > > ceph status > > cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 > > health HEALTH_OK > > monmap e3: 3 mons at {0= > > 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} > > election epoch 260, quorum 0,1,2 0,1,2 > > osdmap e740: 6 osds: 6 up, 6 in > > pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects > > 393 GB used, 2183 GB / 2576 GB avail > > 120 active+clean > > client io 4973 B/s rd, 115 kB/s wr, 35 op/s > > > > > > > > On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht < > t.lamprecht at proxmox.com> > > wrote: > > > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > > > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > > >> > > >>> Are you sure you upgraded all, i.e. used: > > >>> apt update > > >>> apt full-upgrade > > >>> > > >> > > >> Resolved it thanks Thomas - I hadn't updated the *destination* server. 
> > >> > > >> > > > > > > makes sense, should have been made sense a few days ago this, would not be > > > too hard to catch :/ > > > > > > anyway, for anyone reading this: > > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > > migrations to those nodes will not work - as we use a new command to detect > > > if we should send the traffic over a separate migration network. > > > > > > cheers, > > > Thomas > > > > > > > > > > > > > > > _______________________________________________ > > > pve-user mailing list > > > pve-user at pve.proxmox.com > > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > _______________________________________________ > > pve-devel mailing list > > pve-devel at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > From ADhaussy at voyages-sncf.com Fri Nov 11 15:56:32 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 14:56:32 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> Message-ID: <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> I really hope to find an explanation for all this mess, because I'm not very confident right now.. So far, if I understand all this correctly, I'm not very fond of how the watchdog behaves with the CRM/LRM. To make a comparison with PVE 3 (Red Hat cluster), fencing happened on the corosync/cluster communication stack, but not on the resource manager stack. On PVE 3, several times I found rgmanager was stuck. I just had to find the culprit process (usually pve status), kill it, et voila. But it never caused an outage. > > 2 - There seems to be a bug in lrm. > > > > Tonight i have seen timeouts in qmstarts in /var/log/pve/tasks/active. > > Just after the timeouts, lrm was kind of stuck doing nothing. > > If it's doing nothing it would be interesting to see in which state it is. > Because if it's already online and active the watchdog must trigger if > it is stuck for ~60 seconds or more. I'll try to grab some info if it happens again. > Hmm, this means the watchdog was already running out. Do you have a hint why there are no messages in the logs when the watchdog actually seems to trigger fencing? Because when a node suddenly reboots, I can't be sure whether it's the watchdog, a hardware bug, a kernel bug or whatever.. > Yeah I looked a bit through logs of two of your nodes, it looks like the > system hit quite some bottle necks.. > CRM/LRM run often in 'loop took to long' errors the filesystem also is > sometimes not writable. > You have in some logs some huge retransmit list from corosync. Yes, there were many retransmits at "9 Nov 14:56". This matches when we tried to switch network paths, because at that time the nodes did not seem to talk to each other correctly (LRM waiting for quorum.) Anyway, I need to triple-check (again) IGMP snooping on all network switches, plus check the HP blade Virtual Connect modules and firmware.. > Where does your cluster communication happens, not on the storage > network? Storage is on Fibre Channel. Cluster communication happens on a dedicated network VLAN (shared with VMware.) I also use another VLAN for live migrations.
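For anyone double-checking the same multicast suspicion: the cluster network can be tested with omping, roughly as described in the PVE multicast notes - a sketch only, assuming omping is installed on every node and that the hostnames below (taken from this thread) resolve to the cluster VLAN addresses:

  apt-get install omping
  # run the same command on all nodes at roughly the same time
  omping -c 10000 -i 0.001 -F -q proxmoxt20 proxmoxt21 proxmoxt30
  # longer run (~10 minutes) to catch IGMP snooping/querier timeouts
  omping -c 600 -i 1 -q proxmoxt20 proxmoxt21 proxmoxt30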
From ADhaussy at voyages-sncf.com Fri Nov 11 16:28:09 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 15:28:09 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> Message-ID: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> > Do you have a hint why there is no messages in the logs when watchdog > actually seems to trigger fencing ? > Because when a node suddently reboots, i can't be sure if it's the watchdog, > a hardware bug, kernel bug or whatever.. Responding to myself, i find this interesting : Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt31 corosync[23483]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:40:01 proxmoxt31 watchdog-mux[22395]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt30 corosync[24634]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:40:00 proxmoxt30 watchdog-mux[23492]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt20 corosync[42543]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt20 corosync[42543]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt20 watchdog-mux[41401]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt21 corosync[16184]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt21 corosync[16184]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt21 watchdog-mux[42853]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt30 corosync[16159]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt30 corosync[16159]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt30 watchdog-mux[43148]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt31 corosync[16297]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt31 corosync[16297]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt31 watchdog-mux[42761]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt34 corosync[41330]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt34 corosync[41330]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt34 watchdog-mux[40262]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt35 corosync[16158]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt35 corosync[16158]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. 
Members joined: 7 Nov 9 10:06:42 proxmoxt35 watchdog-mux[42684]: client watchdog expired - disable watchdog updates From mir at miras.org Fri Nov 11 16:31:54 2016 From: mir at miras.org (Michael Rasmussen) Date: Fri, 11 Nov 2016 16:31:54 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: A long shot. Do you have a hardware watchdog enabled in bios? On November 11, 2016 4:28:09 PM GMT+01:00, Dhaussy Alexandre wrote: >> Do you have a hint why there is no messages in the logs when watchdog >> actually seems to trigger fencing ? >> Because when a node suddently reboots, i can't be sure if it's the >watchdog, >> a hardware bug, kernel bug or whatever.. > >Responding to myself, i find this interesting : > >Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired >- disable watchdog updates > >Nov 8 10:39:01 proxmoxt31 corosync[23483]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:40:01 proxmoxt31 watchdog-mux[22395]: client watchdog expired >- disable watchdog updates > >Nov 8 10:39:01 proxmoxt30 corosync[24634]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:40:00 proxmoxt30 watchdog-mux[23492]: client watchdog expired >- disable watchdog updates > > >Nov 9 10:05:41 proxmoxt20 corosync[42543]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt20 corosync[42543]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt20 watchdog-mux[41401]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt21 corosync[16184]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt21 corosync[16184]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt21 watchdog-mux[42853]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt30 corosync[16159]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt30 corosync[16159]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt30 watchdog-mux[43148]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt31 corosync[16297]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt31 corosync[16297]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt31 watchdog-mux[42761]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt34 corosync[41330]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt34 corosync[41330]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. 
Members joined: 7 >Nov 9 10:06:42 proxmoxt34 watchdog-mux[40262]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt35 corosync[16158]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt35 corosync[16158]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt35 watchdog-mux[42684]: client watchdog expired >- disable watchdog updates >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ---- This mail was virus scanned and spam checked before delivery. This mail is also DKIM signed. See header dkim-signature. From dietmar at proxmox.com Fri Nov 11 17:43:23 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 11 Nov 2016 17:43:23 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: <1880338119.114.1478882604308@webmail.proxmox.com> > Responding to myself, i find this interesting : > > Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership > (10.xx.xx.11:684) was formed. Members joined: 13 > Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - > disable watchdog updates you lost quorum, and the watchdog expired - that is how the watchdog based fencing works. From ADhaussy at voyages-sncf.com Fri Nov 11 17:44:08 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 16:44:08 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: <8af65c49f3f544518bb13459c089e277@ECLIPSE.groupevsc.com> > A long shot. Do you have a hardware watchdog enabled in bios? I didn't modify any BIOS parameters, except power management. So I believe it's enabled. hpwdt module (hp ilo watchdog) is not loaded. HP ASR is enabled (10 min timeout.) Ipmi_watchdog is blacklisted. nmi_watchdog is enabled => I have seen "please disable this" in proxmox wiki, but there is no explaination why you should do it. :) From lists at hexis.consulting Fri Nov 11 17:48:24 2016 From: lists at hexis.consulting (Hexis) Date: Fri, 11 Nov 2016 10:48:24 -0600 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: References: Message-ID: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> You sir are a saint! That makes total sense and was definitely the problem. Everything is up and working. On 11/11/2016 1:48 AM, Roberto Colonello wrote: > On Thu, Nov 10, 2016 at 10:53 PM, Hexis wrote: > >> Any ideas? > > Ciao, > have you tried to set "Promiscuos mode: Accept" into vSwitch's Security > tab ? 
> > From colonellor at gmail.com Fri Nov 11 18:33:00 2016 From: colonellor at gmail.com (Roberto Colonello) Date: Fri, 11 Nov 2016 18:33:00 +0100 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> References: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> Message-ID: On Fri, Nov 11, 2016 at 5:48 PM, Hexis wrote: > You sir are a saint! Please, do not disturb the saints :-) You are lucky, I just finish a VMware training course :-D -- /roby.deb -- "There are only 10 types of people in the world:Those who understand binary, and those who don't" SOFTWARE is like SEX IT's better when it's FREE https://linuxcounter.net/ Counter Number: 552671 Favorite Distro : Debian From ADhaussy at voyages-sncf.com Fri Nov 11 18:41:20 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 17:41:20 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1880338119.114.1478882604308@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> Message-ID: > you lost quorum, and the watchdog expired - that is how the watchdog > based fencing works. I don't expect to loose quorum when _one_ node joins or leave the cluster. Nov 8 10:38:58 proxmoxt20 pmxcfs[22537]: [status] notice: update cluster info (cluster name pxmcluster, version = 14) Nov 8 10:39:01 proxmoxt20 corosync[22577]: [TOTEM ] A new membership (10.98.187.11:684) was formed. Members joined: 13 Nov 8 10:39:01 proxmoxt20 corosync[22577]: [QUORUM] Members[13]: 9 10 11 13 4 12 3 1 2 5 6 7 8 Nov 8 10:39:59 proxmoxt20 watchdog-mux[23964]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership (10.98.187.11:684) was formed. Members joined: 13 Nov 8 10:39:01 proxmoxt35 corosync[35250]: [QUORUM] Members[13]: 9 10 11 13 4 12 3 1 2 5 6 7 8 Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - disable watchdog updates From dietmar at proxmox.com Fri Nov 11 19:43:39 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 11 Nov 2016 19:43:39 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> Message-ID: <1860956507.131.1478889820301@webmail.proxmox.com> > On November 11, 2016 at 6:41 PM Dhaussy Alexandre > wrote: > > > > you lost quorum, and the watchdog expired - that is how the watchdog > > based fencing works. > > I don't expect to loose quorum when _one_ node joins or leave the cluster. This was probably a long time before - but I have not read through the whole logs ... 
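One way to retrace this kind of event afterwards - a short sketch using only standard PVE/corosync tools; adjust the time window to the incident:

  pvecm status                      # current quorum and membership view
  corosync-quorumtool -s            # the same information straight from corosync
  journalctl -u corosync -u watchdog-mux -u pve-ha-lrm \
      --since "2016-11-09 10:00" --until "2016-11-09 10:15"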
From daniel at linux-nerd.de Sat Nov 12 17:15:11 2016 From: daniel at linux-nerd.de (Daniel) Date: Sat, 12 Nov 2016 17:15:11 +0100 Subject: [PVE-User] Container didnt start or stuck Message-ID: Hi There, after reboot the Host-System i get a problem with some VMs. The VM is booting and haning at such kind of Process: find . -depth -xdev ! -name . ! ( -path ./lost+found -uid 0 ) ! ( -path ./quota.user -uid 0 ) ! ( -path ./aquota.user -uid 0 ) ! ( -path ./quota.group -uid 0 ) ! ( -path ./aquota. after killing that task by hand it begins to boot as expected. Anyone know if this is normal and tooks some time to be finished? Cheers Daniel From daniel at linux-nerd.de Sat Nov 12 20:46:53 2016 From: daniel at linux-nerd.de (Daniel) Date: Sat, 12 Nov 2016 20:46:53 +0100 Subject: [PVE-User] Backup Message-ID: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> Hi there, before we used LVM-THIN we were able to Backup all Contains directly from the Host-System. Now, everythink is LVM. Is there any known and easy way to backup all Hosts including all VMs? For example with rsync or backuppc or how ever? Cheers Daniel From gbr at majentis.com Sun Nov 13 15:15:36 2016 From: gbr at majentis.com (Gerald Brandt) Date: Sun, 13 Nov 2016 08:15:36 -0600 Subject: [PVE-User] Kernel oops Message-ID: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> Hi, I'm getting a lot of crashes on my Proxmox box. I am runing Proxmox on a Debian base install, but I have anther boxes that does the same, and it is fine. Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442402] ------------[ cut here ]------------ Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442408] WARNING: CPU: 2 PID: 0 at kernel/rcu/tree.c:2733 rcu_process_callbacks+0x5bb/0x5e0() Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442409] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442454] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.21-1-pve #1 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442455] Hardware name: To be filled by O.E.M. 
To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 11/24/2011 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442457] 0000000000000086 63ad933f85fa0f2b ffff88083fc83e70 ffffffff813f3f83 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442459] 0000000000000000 ffffffff81ccfadb ffff88083fc83ea8 ffffffff81081806 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442460] ffffffff81e576c0 ffff88083fc97f38 0000000000000246 0000000000000000 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442462] Call Trace: Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442463] [] dump_stack+0x63/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442469] [] warn_slowpath_common+0x86/0xc0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442471] [] warn_slowpath_null+0x1a/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442473] [] rcu_process_callbacks+0x5bb/0x5e0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442475] [] __do_softirq+0x10e/0x2a0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442476] [] irq_exit+0x8e/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442480] [] smp_apic_timer_interrupt+0x42/0x50 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442481] [] apic_timer_interrupt+0x82/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442482] [] ? cpuidle_enter_state+0x10a/0x260 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442487] [] ? cpuidle_enter_state+0xe6/0x260 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442488] [] cpuidle_enter+0x17/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442491] [] call_cpuidle+0x3b/0x70 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442492] [] ? cpuidle_select+0x13/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442494] [] cpu_startup_entry+0x2bf/0x380 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442496] [] start_secondary+0x154/0x190 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442497] ---[ end trace 8a742910926b0ed4 ]--- Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.617812] BUG: unable to handle kernel paging request at 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618057] IP: [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618662] PGD 5cb1c5067 PUD 5cb0f2067 PMD 0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.619431] Oops: 0000 [#1] SMP Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.620253] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.624994] CPU: 5 PID: 23044 Comm: ps Tainted: G W 4.4.21-1-pve #1 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.626005] Hardware name: To be filled by 
O.E.M. To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 11/24/2011 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.627039] task: ffff880818ed3700 ti: ffff8805cb27c000 task.ti: ffff8805cb27c000 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.628071] RIP: 0010:[] [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.629113] RSP: 0018:ffff8805cb27fc98 EFLAGS: 00010282 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.630145] RAX: 0000000000000000 RBX: 00000000024080c0 RCX: 00000000000c428b Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.631198] RDX: 00000000000c428a RSI: 00000000024080c0 RDI: ffff88081f003700 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.632239] RBP: ffff8805cb27fcc8 R08: 000000000001a480 R09: 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.633275] R10: 0000000000000006 R11: 0000000000000000 R12: 00000000024080c0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.634310] R13: ffffffff8120f26c R14: ffff88081f003700 R15: ffff88081f003700 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.635346] FS: 00007f54269ce700(0000) GS:ffff88083fd40000(0000) knlGS:0000000000000000 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.636350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.637388] CR2: 000000000000bb00 CR3: 000000052f4f5000 CR4: 00000000000406e0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.638425] Stack: Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.639455] ffff8805cb27fcd0 0000000000000000 ffff880819ad3cc0 ffff8805cb27fef4 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.640500] 0000000000000000 ffff8805cb27fdd0 ffff8805cb27fcf0 ffffffff8120f26c Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.641545] ffffffff81217f1d 0000000000008000 ffff8805cb27fef4 ffff8805cb27fdc0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.642587] Call Trace: Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.643623] [] get_empty_filp+0x5c/0x1c0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.644660] [] ? terminate_walk+0xbd/0xd0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.645699] [] path_openat+0x43/0x1530 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.646731] [] ? putname+0x54/0x60 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.647758] [] ? filename_lookup+0xf5/0x180 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.648781] [] do_filp_open+0x91/0x100 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.649802] [] ? common_perm_cond+0x3a/0x50 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.650814] [] ? from_kgid_munged+0x12/0x20 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.651825] [] ? cp_new_stat+0x157/0x190 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.652786] [] ? __alloc_fd+0x46/0x180 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.653804] [] do_sys_open+0x139/0x2a0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.654795] [] SyS_open+0x1e/0x20 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.655780] [] entry_SYSCALL_64_fastpath+0x16/0x75 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.656766] Code: 08 65 4c 03 05 53 e3 e1 7e 4d 8b 08 4d 85 c9 0f 84 42 01 00 00 49 83 78 10 00 0f 84 37 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 01 4c 89 c8 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.657834] RIP [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.658878] RSP Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.659907] CR2: 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.667666] ---[ end trace 8a742910926b0ed5 ]--- I am non-subscriptions, and I just did an update yesterday to see if it would fix the error. 
I'll be running a memtest today to see if I can find anything. I hadn't done an update in awhile before that, so I'm leaning towards a hardware issue. What do you think? Gerald From gbr at majentis.com Sun Nov 13 15:42:43 2016 From: gbr at majentis.com (Gerald Brandt) Date: Sun, 13 Nov 2016 08:42:43 -0600 Subject: [PVE-User] Kernel oops In-Reply-To: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> Message-ID: <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> On 2016-11-13 08:15 AM, Gerald Brandt wrote: > Hi, > > I'm getting a lot of crashes on my Proxmox box. I am runing Proxmox on > a Debian base install, but I have anther boxes that does the same, and > it is fine. > > > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442402] ------------[ cut > here ]------------ > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442408] WARNING: CPU: 2 > PID: 0 at kernel/rcu/tree.c:2733 rcu_process_callbacks+0x5bb/0x5e0() > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442409] Modules linked > in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables > iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs > lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad > ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi > nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi > asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul > snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm > snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 > lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep > i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd > sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp > fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi > vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci > r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442454] CPU: 2 PID: 0 > Comm: swapper/2 Not tainted 4.4.21-1-pve #1 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442455] Hardware name: To > be filled by O.E.M. 
To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 > 11/24/2011 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442457] 0000000000000086 > 63ad933f85fa0f2b ffff88083fc83e70 ffffffff813f3f83 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442459] 0000000000000000 > ffffffff81ccfadb ffff88083fc83ea8 ffffffff81081806 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442460] ffffffff81e576c0 > ffff88083fc97f38 0000000000000246 0000000000000000 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442462] Call Trace: > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442463] > [] dump_stack+0x63/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442469] > [] warn_slowpath_common+0x86/0xc0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442471] > [] warn_slowpath_null+0x1a/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442473] > [] rcu_process_callbacks+0x5bb/0x5e0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442475] > [] __do_softirq+0x10e/0x2a0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442476] > [] irq_exit+0x8e/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442480] > [] smp_apic_timer_interrupt+0x42/0x50 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442481] > [] apic_timer_interrupt+0x82/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442482] > [] ? cpuidle_enter_state+0x10a/0x260 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442487] > [] ? cpuidle_enter_state+0xe6/0x260 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442488] > [] cpuidle_enter+0x17/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442491] > [] call_cpuidle+0x3b/0x70 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442492] > [] ? cpuidle_select+0x13/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442494] > [] cpu_startup_entry+0x2bf/0x380 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442496] > [] start_secondary+0x154/0x190 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442497] ---[ end trace > 8a742910926b0ed4 ]--- > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.617812] BUG: unable to > handle kernel paging request at 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618057] IP: > [] kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618662] PGD 5cb1c5067 PUD > 5cb0f2067 PMD 0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.619431] Oops: 0000 [#1] SMP > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.620253] Modules linked > in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables > iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs > lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad > ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi > nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi > asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul > snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm > snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 > lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep > i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd > sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp > fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi > vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci > r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.624994] CPU: 5 PID: 
23044 > Comm: ps Tainted: G W 4.4.21-1-pve #1 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.626005] Hardware name: To > be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 > 11/24/2011 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.627039] task: > ffff880818ed3700 ti: ffff8805cb27c000 task.ti: ffff8805cb27c000 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.628071] RIP: > 0010:[] [] > kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.629113] RSP: > 0018:ffff8805cb27fc98 EFLAGS: 00010282 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.630145] RAX: > 0000000000000000 RBX: 00000000024080c0 RCX: 00000000000c428b > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.631198] RDX: > 00000000000c428a RSI: 00000000024080c0 RDI: ffff88081f003700 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.632239] RBP: > ffff8805cb27fcc8 R08: 000000000001a480 R09: 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.633275] R10: > 0000000000000006 R11: 0000000000000000 R12: 00000000024080c0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.634310] R13: > ffffffff8120f26c R14: ffff88081f003700 R15: ffff88081f003700 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.635346] FS: > 00007f54269ce700(0000) GS:ffff88083fd40000(0000) knlGS:0000000000000000 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.636350] CS: 0010 DS: 0000 > ES: 0000 CR0: 0000000080050033 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.637388] CR2: > 000000000000bb00 CR3: 000000052f4f5000 CR4: 00000000000406e0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.638425] Stack: > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.639455] ffff8805cb27fcd0 > 0000000000000000 ffff880819ad3cc0 ffff8805cb27fef4 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.640500] 0000000000000000 > ffff8805cb27fdd0 ffff8805cb27fcf0 ffffffff8120f26c > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.641545] ffffffff81217f1d > 0000000000008000 ffff8805cb27fef4 ffff8805cb27fdc0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.642587] Call Trace: > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.643623] > [] get_empty_filp+0x5c/0x1c0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.644660] > [] ? terminate_walk+0xbd/0xd0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.645699] > [] path_openat+0x43/0x1530 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.646731] > [] ? putname+0x54/0x60 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.647758] > [] ? filename_lookup+0xf5/0x180 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.648781] > [] do_filp_open+0x91/0x100 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.649802] > [] ? common_perm_cond+0x3a/0x50 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.650814] > [] ? from_kgid_munged+0x12/0x20 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.651825] > [] ? cp_new_stat+0x157/0x190 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.652786] > [] ? 
__alloc_fd+0x46/0x180 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.653804] > [] do_sys_open+0x139/0x2a0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.654795] > [] SyS_open+0x1e/0x20 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.655780] > [] entry_SYSCALL_64_fastpath+0x16/0x75 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.656766] Code: 08 65 4c 03 > 05 53 e3 e1 7e 4d 8b 08 4d 85 c9 0f 84 42 01 00 00 49 83 78 10 00 0f > 84 37 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 01 4c 89 c8 > 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.657834] RIP > [] kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.658878] RSP > > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.659907] CR2: > 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.667666] ---[ end trace > 8a742910926b0ed5 ]--- > > I am non-subscriptions, and I just did an update yesterday to see if > it would fix the error. I'll be running a memtest today to see if I > can find anything. > > I hadn't done an update in awhile before that, so I'm leaning towards > a hardware issue. What do you think? > > Gerald > root at gbr-proxmox-1:~# pveversion -verbose proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) pve-kernel-4.4.6-1-pve: 4.4.6-48 pve-kernel-4.4.13-1-pve: 4.4.13-56 pve-kernel-4.2.6-1-pve: 4.2.6-36 pve-kernel-4.4.8-1-pve: 4.4.8-52 pve-kernel-4.4.21-1-pve: 4.4.21-71 pve-kernel-4.4.19-1-pve: 4.4.19-66 pve-kernel-4.4.10-1-pve: 4.4.10-54 lvm2: 2.02.116-pve3 corosync-pve: 2.4.0-1 libqb0: 1.0-1 pve-cluster: 4.0-47 qemu-server: 4.0-94 pve-firmware: 1.1-10 libpve-common-perl: 4.0-80 libpve-access-control: 4.0-19 libpve-storage-perl: 4.0-68 pve-libspice-server1: 0.12.8-1 vncterm: 1.2-1 pve-docs: 4.3-14 pve-qemu-kvm: 2.7.0-6 pve-container: 1.0-81 pve-firewall: 2.0-31 pve-ha-manager: 1.0-35 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 2.0.5-1 lxcfs: 2.0.4-pve2 criu: 1.6.0-1 novnc-pve: 0.5-8 smartmontools: 6.5+svn4324-1~pve80 From f.gruenbichler at proxmox.com Mon Nov 14 07:40:21 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 14 Nov 2016 07:40:21 +0100 Subject: [PVE-User] Backup In-Reply-To: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> Message-ID: <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> On Sat, Nov 12, 2016 at 08:46:53PM +0100, Daniel wrote: > Hi there, > > before we used LVM-THIN we were able to Backup all Contains directly from the Host-System. > Now, everythink is LVM. Is there any known and easy way to backup all Hosts including all VMs? > For example with rsync or backuppc or how ever? you can mount a container's volumes to be accessible on the host by calling "pct mount ID". please be aware that this sets a lock on the container and needs to be reversed by "pct unmount ID" afterwards. but I would advise you to use vzdump to backup containers - you get a (compressed) tar archive, the config is backed up as well and you get consistency "for free" (or almost free ;)). normally, you want to restore individual containers anyway. 
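A minimal sketch of both approaches, assuming container ID 101, a storage called "local" and a backup target under /backup (all placeholders); on PVE 4.x the mounted rootfs usually appears under /var/lib/lxc/101/rootfs, but check the path printed by pct mount:

# file-level copy via the host (remember the lock pct mount sets)
pct mount 101
rsync -a /var/lib/lxc/101/rootfs/ /backup/ct-101/
pct unmount 101

# or the recommended vzdump archive (compressed tar plus the container config)
vzdump 101 --mode snapshot --compress lzo --storage local

Individual files can afterwards be pulled out of the resulting archive with an ordinary tar extract, since it is just a compressed tar of the container root.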
From daniel at linux-nerd.de Mon Nov 14 09:43:40 2016 From: daniel at linux-nerd.de (Daniel) Date: Mon, 14 Nov 2016 09:43:40 +0100 Subject: [PVE-User] Backup In-Reply-To: <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> Message-ID: > > but I would advise you to use vzdump to backup containers - you get a > (compressed) tar archive, the config is backed up as well and you get > consistency "for free" (or almost free ;)). normally, you want to > restore individual containers anyway. The problem is that there is no way to restore just simple files and its not incremental. So vzdump make no sense for me :( Cheers From f.gruenbichler at proxmox.com Mon Nov 14 09:53:59 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 14 Nov 2016 09:53:59 +0100 Subject: [PVE-User] Backup In-Reply-To: References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> Message-ID: <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: > > > > but I would advise you to use vzdump to backup containers - you get a > > (compressed) tar archive, the config is backed up as well and you get > > consistency "for free" (or almost free ;)). normally, you want to > > restore individual containers anyway. > > The problem is that there is no way to restore just simple files and its not incremental. > So vzdump make no sense for me :( extracting individual files is not a problem for container backups - they're just compressed tar archives after all. incremental backups are not supported though, that is correct. From daniel at linux-nerd.de Mon Nov 14 10:26:44 2016 From: daniel at linux-nerd.de (Daniel) Date: Mon, 14 Nov 2016 10:26:44 +0100 Subject: [PVE-User] Backup In-Reply-To: <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> Message-ID: <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> > Am 14.11.2016 um 09:53 schrieb Fabian Gr?nbichler : > > On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: >>> >>> but I would advise you to use vzdump to backup containers - you get a >>> (compressed) tar archive, the config is backed up as well and you get >>> consistency "for free" (or almost free ;)). normally, you want to >>> restore individual containers anyway. >> >> The problem is that there is no way to restore just simple files and its not incremental. >> So vzdump make no sense for me :( > > extracting individual files is not a problem for container backups - > they're just compressed tar archives after all. incremental backups are > not supported though, that is correct. Its not a big deal to use backuppc for example on each Container but it was easier before we used LVM-Thin ;) So it will blow up our network. 
Our Mail-Server for example is backuped up hours which is not so easy handled bei vzdump ;) From e.kasper at proxmox.com Mon Nov 14 10:39:22 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Mon, 14 Nov 2016 10:39:22 +0100 Subject: [PVE-User] Kernel oops In-Reply-To: <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> Message-ID: <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> >> I am non-subscriptions, and I just did an update yesterday to see if >> it would fix the error. I'll be running a memtest today to see if I >> can find anything. >> >> I hadn't done an update in awhile before that, so I'm leaning towards >> a hardware issue. What do you think? Yes, most probably the ram is the culprit. You might also check that the RAM modules are properly seated on the motherboard. From ADhaussy at voyages-sncf.com Mon Nov 14 11:50:57 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 10:50:57 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1860956507.131.1478889820301@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> Message-ID: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : > On November 11, 2016 at 6:41 PM Dhaussy Alexandre > wrote: >>> you lost quorum, and the watchdog expired - that is how the watchdog >>> based fencing works. >> I don't expect to loose quorum when _one_ node joins or leave the cluster. > This was probably a long time before - but I have not read through the whole > logs ... That makes no sense to me.. The fact is : everything have been working fine for weeks. What i can see in the logs is : several reboots of cluster nodes suddently, and exactly one minute after one node joining and/or leaving the cluster. I see no problems with corosync/lrm/crm before that. This leads me to a probable network (multicast) malfunction. I did a bit of homeworks reading the wiki about ha manager.. What i understand so far, is that every state/service change from LRM must be acknowledged (cluster-wise) by CRM master. So if a multicast disruption occurs, and i assume LRM wouldn't be able talk to the CRM MASTER, then it also couldn't reset the watchdog, am i right ? Another thing ; i have checked my network configuration, the cluster ip is set on a linux bridge... By default multicast_snooping is set to 1 on linux bridge, so i think it there's a good chance this is the source of my problems... Note that we don't use IGMP snooping, it is disabled on almost all network switchs. Plus i found a post by A.Derumier (yes, 3 years old..) He did have similar issues with bridge and multicast. 
http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html From t.lamprecht at proxmox.com Mon Nov 14 12:33:27 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Mon, 14 Nov 2016 12:33:27 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Message-ID: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> On 14.11.2016 11:50, Dhaussy Alexandre wrote: > > Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : >> On November 11, 2016 at 6:41 PM Dhaussy Alexandre >> wrote: >>>> you lost quorum, and the watchdog expired - that is how the watchdog >>>> based fencing works. >>> I don't expect to loose quorum when _one_ node joins or leave the cluster. >> This was probably a long time before - but I have not read through the whole >> logs ... > That makes no sense to me.. > The fact is : everything have been working fine for weeks. > > > What i can see in the logs is : several reboots of cluster nodes > suddently, and exactly one minute after one node joining and/or leaving > the cluster. The watchdog is set to an 60 second timeout, meaning that cluster leave caused quorum loss, or other problems (you said you had multicast problems around that time) thus the LRM stopped updating the watchdog, so one minute later it resetted all nodes, which left the quorate partition. > I see no problems with corosync/lrm/crm before that. > This leads me to a probable network (multicast) malfunction. > > I did a bit of homeworks reading the wiki about ha manager.. > > What i understand so far, is that every state/service change from LRM > must be acknowledged (cluster-wise) by CRM master. Yes and no, LRM and CRM are two state machines with synced inputs, but that holds mainly for human triggered commands and the resulting communication. Meaning that commands like start, stop, migrate may not go through from the CRM to the LRM. Fencing and such stuff works none the less, else it would be a major design flaw :) > So if a multicast disruption occurs, and i assume LRM wouldn't be able > talk to the CRM MASTER, then it also couldn't reset the watchdog, am i > right ? > No, the watchdog runs on each node and is CRM independent. As watchdogs are normally not able to server more clients we wrote the watchdog-mux (multiplexer). This is a very simple C program which opens the watchdog with a 60 second timeout and allows multiple clients (at the moment CRM and LRM) to connect to it. If a client does not resets the dog for about 10 seconds, IIRC, the watchdox-mux disables watchdogs updates on the real watchdog. After that a node reset will happen *when* the dog runs out of time, not instantly. So if the LRM cannot communicate (i.e. has no quorum) he will stop updating the dog, thus trigger independent what the CRM says or does. > Another thing ; i have checked my network configuration, the cluster ip > is set on a linux bridge... > By default multicast_snooping is set to 1 on linux bridge, so i think it > there's a good chance this is the source of my problems... 
> Note that we don't use IGMP snooping, it is disabled on almost all > network switchs. > Yes, multicast snooping has to be configured (recommended) or else turned off on the switch. That's stated in some wiki articles, various forum posts and our docs, here: http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements Hope that helps a bit understanding. :) cheers, Thomas > Plus i found a post by A.Derumier (yes, 3 years old..) He did have > similar issues with bridge and multicast. > http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From dietmar at proxmox.com Mon Nov 14 12:34:02 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Mon, 14 Nov 2016 12:34:02 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Message-ID: <462735272.57.1479123243305@webmail.proxmox.com> > What i understand so far, is that every state/service change from LRM > must be acknowledged (cluster-wise) by CRM master. > So if a multicast disruption occurs, and i assume LRM wouldn't be able > talk to the CRM MASTER, then it also couldn't reset the watchdog, am i > right ? Nothing happens as long as you have quorum. And if I understand you correctly, you never lost quorum on those nodes? From ADhaussy at voyages-sncf.com Mon Nov 14 14:25:18 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 13:25:18 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <462735272.57.1479123243305@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <462735272.57.1479123243305@webmail.proxmox.com> Message-ID: <812f2710-9973-267f-abca-53789a28162b@voyages-sncf.com> Le 14/11/2016 ? 12:34, Dietmar Maurer a ?crit : >> What i understand so far, is that every state/service change from LRM >> must be acknowledged (cluster-wise) by CRM master. >> So if a multicast disruption occurs, and i assume LRM wouldn't be able >> talk to the CRM MASTER, then it also couldn't reset the watchdog, am i >> right ? > Nothing happens as long as you have quorum. And if I understand you > correctly, you never lost quorum on those nodes? As far as can be told from the log files, yes. 
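A rough sketch of checking and disabling multicast snooping on the cluster bridge, along the lines Thomas recommends above; vmbr0 and the node names are placeholders for the actual cluster bridge and cluster members:

# 1 means snooping is active; without an IGMP querier on the segment this can
# silently drop corosync multicast traffic
cat /sys/class/net/vmbr0/bridge/multicast_snooping

# disable it at runtime; persist it with a post-up line in
# /etc/network/interfaces under the vmbr0 stanza
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

# verify multicast connectivity between all nodes (run on every node in parallel)
omping -c 600 -i 1 -q node1 node2 node3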
From ADhaussy at voyages-sncf.com Mon Nov 14 14:46:40 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 13:46:40 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: <81da8dca-65d8-95d9-b804-e73ff60a4650@voyages-sncf.com> Le 14/11/2016 ? 12:33, Thomas Lamprecht a ?crit : > Hope that helps a bit understanding. :) Sure, thank you for clearing things up. :) I wish i had done this before, but i learned a lot in the last few days... From gbr at majentis.com Tue Nov 15 14:39:21 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 15 Nov 2016 07:39:21 -0600 Subject: [PVE-User] Kernel oops In-Reply-To: <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> Message-ID: <29cb9d5d-56b7-6ab2-2d5d-d3e6b19a7fa2@majentis.com> On 2016-11-14 03:39 AM, Emmanuel Kasper wrote: >>> I am non-subscriptions, and I just did an update yesterday to see if >>> it would fix the error. I'll be running a memtest today to see if I >>> can find anything. >>> >>> I hadn't done an update in awhile before that, so I'm leaning towards >>> a hardware issue. What do you think? > Yes, most probably the ram is the culprit. You might also check that the > RAM modules are properly seated on the motherboard. > > > _______________________________________________ > Bad RAM is exactly what it was. 2 of the 4 DIMMs went bad after 4 years. Gerald From m at plus-plus.su Tue Nov 15 15:48:57 2016 From: m at plus-plus.su (Mikhail) Date: Tue, 15 Nov 2016 17:48:57 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS Message-ID: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Hello, Please help me to find why I'm seeing slow speeds when KVM guest is on NFS storage. I have pretty standard setup, running Proxmox 4.1-1. The storage server is on NFS connected directly (no switches/hubs, direct NIC-to-NIC connection) via Gigabit ethernet. I just launched Debian-8.3 stock ISO installation on the KVM guest that's disk resides on NFS and I'm seeing some terribly slow file copy operation speeds on debian install procedure - about 200-600 kilobyte/second according to "bwm-ng" output on storage server. I also tried direct write from my Proxmox host via NFS using "dd" and results are showing near 1gbit speeds: # dd if=/dev/zero of=10G bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 115.951 s, 90.4 MB/s What could be an issue? 
On Proxmox host: # cat /proc/mounts |grep vmnf 192.168.4.1:/mnt/vmnfs /mnt/pve/vmnfs nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.1,mountvers=3,mountport=47825,mountproto=udp,local_lock=none,addr=192.168.4.1 0 0 storage.cfg: nfs: vmnfs export /mnt/vmnfs server 192.168.4.1 path /mnt/pve/vmnfs content images options vers=3 maxfiles 1 KVM guest config: # cat /etc/pve/qemu-server/85103.conf bootdisk: virtio0 cores: 1 ide2: ISOimages:iso/debian-8.3.0-amd64-CD-1.iso,media=cdrom memory: 2048 name: WEB net0: virtio=3A:39:66:30:63:32,bridge=vmbr0,tag=85 numa: 0 onboot: 1 ostype: l26 smbios1: uuid=97ea543f-ca64-43ab-9d66-9d1c9cd179b0 sockets: 1 virtio0: vmnfs:85103/vm-85103-disk-1.qcow2,size=50G Any suggestions where to start looking is greatly appreciated. Thanks. From gbr at majentis.com Tue Nov 15 16:09:46 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 15 Nov 2016 09:09:46 -0600 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Message-ID: I don't know if it helps, but I always switch to NFSv4. nfs: storage export /proxmox server 172.23.4.16 path /mnt/pve/storage options vers=4 maxfiles 1 content iso,backup,images Gerald On 2016-11-15 08:48 AM, Mikhail wrote: > Hello, > > Please help me to find why I'm seeing slow speeds when KVM guest is on > NFS storage. I have pretty standard setup, running Proxmox 4.1-1. The > storage server is on NFS connected directly (no switches/hubs, direct > NIC-to-NIC connection) via Gigabit ethernet. > > I just launched Debian-8.3 stock ISO installation on the KVM guest > that's disk resides on NFS and I'm seeing some terribly slow file copy > operation speeds on debian install procedure - about 200-600 > kilobyte/second according to "bwm-ng" output on storage server. I also > tried direct write from my Proxmox host via NFS using "dd" and results > are showing near 1gbit speeds: > > # dd if=/dev/zero of=10G bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB) copied, 115.951 s, 90.4 MB/s > > What could be an issue? > > On Proxmox host: > > # cat /proc/mounts |grep vmnf > 192.168.4.1:/mnt/vmnfs /mnt/pve/vmnfs nfs > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.1,mountvers=3,mountport=47825,mountproto=udp,local_lock=none,addr=192.168.4.1 > 0 0 > > storage.cfg: > nfs: vmnfs > export /mnt/vmnfs > server 192.168.4.1 > path /mnt/pve/vmnfs > content images > options vers=3 > maxfiles 1 > > KVM guest config: > # cat /etc/pve/qemu-server/85103.conf > bootdisk: virtio0 > cores: 1 > ide2: ISOimages:iso/debian-8.3.0-amd64-CD-1.iso,media=cdrom > memory: 2048 > name: WEB > net0: virtio=3A:39:66:30:63:32,bridge=vmbr0,tag=85 > numa: 0 > onboot: 1 > ostype: l26 > smbios1: uuid=97ea543f-ca64-43ab-9d66-9d1c9cd179b0 > sockets: 1 > virtio0: vmnfs:85103/vm-85103-disk-1.qcow2,size=50G > > Any suggestions where to start looking is greatly appreciated. > > Thanks. 
> _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From m at plus-plus.su Tue Nov 15 18:25:10 2016 From: m at plus-plus.su (Mikhail) Date: Tue, 15 Nov 2016 20:25:10 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Message-ID: <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> On 11/15/2016 06:09 PM, Gerald Brandt wrote: > I don't know if it helps, but I always switch to NFSv4. Thanks for the tip. This did not help. I also tried with various caching options (writeback, writethrough, etc) and RAW disk format instead of qcow2 - nothing changed. I also have LVM over iSCSI export to that Proxmox host, and using LVM over network (to the same storage server) I'm seeing expected speeds close to 1gbit. So this means something is either wrong with NFS export options, or something related to that part. From ADhaussy at voyages-sncf.com Tue Nov 15 19:04:10 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 15 Nov 2016 18:04:10 +0000 Subject: [PVE-User] weird memory stats in GUI graphs Message-ID: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> Hello, I just noticed two different values on the node Summary tab : Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) And graphs : Total RAM : 540.94GB and Usage : 504.53GB The server has 512G + proxmox-ve: 4.3-66. From weik at bbs-haarentor.de Tue Nov 15 19:16:24 2016 From: weik at bbs-haarentor.de (Ulf Weikert) Date: Tue, 15 Nov 2016 19:16:24 +0100 Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM Message-ID: Hey there, I'm running Proxmox VE 4.2-2 on a HP DL380 G8. Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. Which lead me to this bug [0], [1]. But since the card is displayed in my Proxmox Webinterface, I think it should work at least. So I configured my KVM Container and installed Windows Server 2012 R2 VM according to the best practices wiki [2]. After Installation however the NIC in the VM does not receive an IP. The DL 380 host is directly attached to our coreswitch via fibre channel. To make sure DHCP is working on the coreswitch port I plugged in some old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. And plugged my notebook in the RJ45 port. DHCP works fine there. So in theory it should work in my VM as well. But for some reason it doesn't. See screenshot [3] See my Proxmox Host and VM Setup. [4] & [5]. I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, but that didn't work either. I'm thankful for any tip or advice you can give me. [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices [3] https://postimg.org/image/bmh7iooxp/ [4] https://postimg.org/image/l8aryzg3h/ [5] https://postimg.org/image/z392hgail/ -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. 
+49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. From dietmar at proxmox.com Tue Nov 15 19:48:07 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:48:07 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> Message-ID: <580588460.34.1479235687618@webmail.proxmox.com> > I just noticed two different values on the node Summary tab : > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB Indeed, that looks strange. Please note that the units are different (GiB vs. GB), but values are still wrong. From dietmar at proxmox.com Tue Nov 15 19:49:43 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:49:43 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <580588460.34.1479235687618@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> Message-ID: <762264953.36.1479235783912@webmail.proxmox.com> > On November 15, 2016 at 7:48 PM Dietmar Maurer wrote: > > > > I just noticed two different values on the node Summary tab : > > > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB > > Indeed, that looks strange. Please note that the units are > different (GiB vs. GB), but values are still wrong. No, values are correct - it is just the different unit. From dietmar at proxmox.com Tue Nov 15 19:52:09 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:52:09 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <762264953.36.1479235783912@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> <762264953.36.1479235783912@webmail.proxmox.com> Message-ID: <1387700866.38.1479235929867@webmail.proxmox.com> > On November 15, 2016 at 7:49 PM Dietmar Maurer wrote: > > > > > > On November 15, 2016 at 7:48 PM Dietmar Maurer wrote: > > > > > > > I just noticed two different values on the node Summary tab : > > > > > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > > > > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB > > > > Indeed, that looks strange. Please note that the units are > > different (GiB vs. GB), but values are still wrong. > > No, values are correct - it is just the different unit. Also see: https://en.wikipedia.org/wiki/Gibibyte And yes, I know it is not ideal to display values with different base unit, but this has technical reasons... 
From ADhaussy at voyages-sncf.com Tue Nov 15 21:12:02 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 15 Nov 2016 20:12:02 +0000 Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <1387700866.38.1479235929867@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> <762264953.36.1479235783912@webmail.proxmox.com>, <1387700866.38.1479235929867@webmail.proxmox.com> Message-ID: <2EDB1A96-F0FE-4391-B951-4BDF5111602A@voyages-sncf.com> > Le 15 nov. 2016 ? 19:52, Dietmar Maurer a ?crit : >> No, values are correct - it is just the different unit. > > Also see: https://en.wikipedia.org/wiki/Gibibyte > > And yes, I know it is not ideal to display values with different base unit, > but this has technical reasons... > Indeed, i did not pay attention to units. You almost lost me. :-) I guess blame the bad habits of using GB for GiB.. From bc at iptel.co Tue Nov 15 22:33:13 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:33:13 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: 90.4 MB/s isn't that far off. On Tue, Nov 15, 2016 at 5:25 PM, Mikhail wrote: > On 11/15/2016 06:09 PM, Gerald Brandt wrote: >> I don't know if it helps, but I always switch to NFSv4. > > Thanks for the tip. This did not help. I also tried with various caching > options (writeback, writethrough, etc) and RAW disk format instead of > qcow2 - nothing changed. > > I also have LVM over iSCSI export to that Proxmox host, and using LVM > over network (to the same storage server) I'm seeing expected speeds > close to 1gbit. > > So this means something is either wrong with NFS export options, or > something related to that part. > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Tue Nov 15 22:35:55 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:35:55 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: Ignore my reply - just reread the thread fully :) NFS should work just fine.. no idea why you are seeing those lousy speeds. On Tue, Nov 15, 2016 at 9:33 PM, Brian :: wrote: > 90.4 MB/s isn't that far off. > > > On Tue, Nov 15, 2016 at 5:25 PM, Mikhail wrote: >> On 11/15/2016 06:09 PM, Gerald Brandt wrote: >>> I don't know if it helps, but I always switch to NFSv4. >> >> Thanks for the tip. This did not help. I also tried with various caching >> options (writeback, writethrough, etc) and RAW disk format instead of >> qcow2 - nothing changed. >> >> I also have LVM over iSCSI export to that Proxmox host, and using LVM >> over network (to the same storage server) I'm seeing expected speeds >> close to 1gbit. >> >> So this means something is either wrong with NFS export options, or >> something related to that part. 
>> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 22:36:40 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 00:36:40 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> On 11/16/2016 12:33 AM, Brian :: wrote: > 90.4 MB/s isn't that far off. Hello, Yes, but I'm only able to get these results when doing simple "dd" test directly on Proxmox host machine inside NFS-mounted directory. KVM guest's filesystem is not getting even 1/4 of that speed when it's disk resides on the very same NFS (Debian installation from stock ISO takes ~hour to copy first halt of it's files..) From bc at iptel.co Tue Nov 15 22:43:42 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:43:42 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> Message-ID: What type of disk controller and what caching mode are you using? On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: > On 11/16/2016 12:33 AM, Brian :: wrote: >> 90.4 MB/s isn't that far off. > > Hello, > > Yes, but I'm only able to get these results when doing simple "dd" test > directly on Proxmox host machine inside NFS-mounted directory. KVM > guest's filesystem is not getting even 1/4 of that speed when it's disk > resides on the very same NFS (Debian installation from stock ISO takes > ~hour to copy first halt of it's files..) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 23:05:00 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:05:00 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> Message-ID: <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> On 11/16/2016 12:43 AM, Brian :: wrote: > What type of disk controller and what caching mode are you using? The storage server is built with 4 x 4TB ST4000NM0034 Seagate disks, attached to LSI Logic SAS3008 controller. Then there's Debian Jessie with software RAID10 using MDADM. This space is given to Proxmox host via iSCSI + LVM via 10 gbit ethernet. There's 32GB of RAM in this storage server, so almost all this RAM can be used for cache (nothing else runs there). I ran various tests on the storage server locally (created local LV, formatted it to EXT4 and ran there various disk-intensive tasks such as copying big files, etc). My average write speed to this MDADM raid10 / LVM / Ext4 filesystem is about 70-80mb/s. I guess it should be much faster then that, but I can't find out where's the bottleneck in this setup.. 
# cat /proc/mdstat Personalities : [raid10] md0 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1] 7811819520 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU] bitmap: 11/59 pages [44KB], 65536KB chunk unused devices: # pvs PV VG Fmt Attr PSize PFree /dev/md0 vg0 lvm2 a-- 7.28t 1.28t Thanks. > > > > On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: >> On 11/16/2016 12:33 AM, Brian :: wrote: >>> 90.4 MB/s isn't that far off. >> >> Hello, >> >> Yes, but I'm only able to get these results when doing simple "dd" test >> directly on Proxmox host machine inside NFS-mounted directory. KVM >> guest's filesystem is not getting even 1/4 of that speed when it's disk >> resides on the very same NFS (Debian installation from stock ISO takes >> ~hour to copy first halt of it's files..) >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From proxmox-user at mattern.org Tue Nov 15 23:12:34 2016 From: proxmox-user at mattern.org (Marcus) Date: Tue, 15 Nov 2016 23:12:34 +0100 Subject: [PVE-User] Backup In-Reply-To: <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> Message-ID: <47d2d15f-b3e2-ac2e-4b64-bcec5a0d02f3@mattern.org> Hi, it is also possible to take LVM Snapshots on the host. Than mount the snapshot and take backups with rsync (rsnapshot e. g.) or whatever you prefer. You don't need pct mount or any software inside the container. Am 14.11.2016 um 10:26 schrieb Daniel: >> Am 14.11.2016 um 09:53 schrieb Fabian Gr?nbichler : >> >> On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: >>>> but I would advise you to use vzdump to backup containers - you get a >>>> (compressed) tar archive, the config is backed up as well and you get >>>> consistency "for free" (or almost free ;)). normally, you want to >>>> restore individual containers anyway. >>> The problem is that there is no way to restore just simple files and its not incremental. >>> So vzdump make no sense for me :( >> extracting individual files is not a problem for container backups - >> they're just compressed tar archives after all. incremental backups are >> not supported though, that is correct. > Its not a big deal to use backuppc for example on each Container but it was easier before we used LVM-Thin ;) > So it will blow up our network. Our Mail-Server for example is backuped up hours which is not so easy handled bei vzdump ;) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Tue Nov 15 23:22:29 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 22:22:29 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: Hi Mikhail The guest that is running - what type of controller / cache? 
Thanks On Tue, Nov 15, 2016 at 10:05 PM, Mikhail wrote: > On 11/16/2016 12:43 AM, Brian :: wrote: >> What type of disk controller and what caching mode are you using? > > The storage server is built with 4 x 4TB ST4000NM0034 Seagate disks, > attached to LSI Logic SAS3008 controller. Then there's Debian Jessie > with software RAID10 using MDADM. This space is given to Proxmox host > via iSCSI + LVM via 10 gbit ethernet. There's 32GB of RAM in this > storage server, so almost all this RAM can be used for cache (nothing > else runs there). > > I ran various tests on the storage server locally (created local LV, > formatted it to EXT4 and ran there various disk-intensive tasks such as > copying big files, etc). My average write speed to this MDADM raid10 / > LVM / Ext4 filesystem is about 70-80mb/s. I guess it should be much > faster then that, but I can't find out where's the bottleneck in this > setup.. > > # cat /proc/mdstat > Personalities : [raid10] > md0 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1] > 7811819520 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU] > bitmap: 11/59 pages [44KB], 65536KB chunk > > unused devices: > > # pvs > PV VG Fmt Attr PSize PFree > /dev/md0 vg0 lvm2 a-- 7.28t 1.28t > > Thanks. > >> >> >> >> On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: >>> On 11/16/2016 12:33 AM, Brian :: wrote: >>>> 90.4 MB/s isn't that far off. >>> >>> Hello, >>> >>> Yes, but I'm only able to get these results when doing simple "dd" test >>> directly on Proxmox host machine inside NFS-mounted directory. KVM >>> guest's filesystem is not getting even 1/4 of that speed when it's disk >>> resides on the very same NFS (Debian installation from stock ISO takes >>> ~hour to copy first halt of it's files..) >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 23:33:30 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:33:30 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: On 11/16/2016 01:22 AM, Brian :: wrote: > Hi Mikhail > > The guest that is running - what type of controller / cache? > > Thanks Brian, The guest is Debian Jessie, running VirtIO as controller and "Default (No cache)" cache setting. I tried both writeback / writethrough settings as well, but it did not change things to better.. Btw, just did another "dd" test on the storage server itself (ext4 mounted from LVM lv that resides on top of MDADM RAID10): # dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync 563200+0 records in 563200+0 records out 36909875200 bytes (37 GB) copied, 176.696 s, 209 MB/s I guess the storage server itself is fine. 
Btw, similar results when I run this test from Proxmox host that is attached via 10gbit ethernet to storage server using NFS mount: # dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync 563200+0 records in 563200+0 records out 36909875200 bytes (37 GB) copied, 165.531 s, 223 MB/s At this point, I don't know what else I can check on my systems to find what's the problem with KVM images being put on NFS storage. From m at plus-plus.su Tue Nov 15 23:59:15 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:59:15 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> Here's a clean "dd" test of two identical KVM guests that shows how results differ (NFS vs LVM): 1) First guest inside qcow2 image, located on NFS share (via 10gbit ethernet), cache settings "Default (No cache)": $ dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 196.993 s, 51.1 MB/s 2) Second guest runs inside LVM-over-iSCSI logical volume, from the same storage server, via same as first guest 10gbit ethernet, cache settings "Default (No cache)": $ dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 58.474 s, 172 MB/s Mikhail. From dietmar at proxmox.com Wed Nov 16 06:52:58 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Wed, 16 Nov 2016 06:52:58 +0100 (CET) Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> Message-ID: <1057547647.2.1479275578414@webmail.proxmox.com> > 1) First guest inside qcow2 image, located on NFS share (via 10gbit What values do you get with raw images? From m at plus-plus.su Wed Nov 16 11:02:26 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 13:02:26 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <1057547647.2.1479275578414@webmail.proxmox.com> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> <1057547647.2.1479275578414@webmail.proxmox.com> Message-ID: <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> On 11/16/2016 08:52 AM, Dietmar Maurer wrote: > >> 1) First guest inside qcow2 image, located on NFS share (via 10gbit > > What values do you get with raw images? > Just now converted guest's disk image to RAW, using default cache settings. 
Seeing much better results - same "dd" test now shows 145 MB/s write speeds: 541590+0 records out 35493642240 bytes (35 GB) copied, 245.511 s, 145 MB/s (dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync) And I also tried same test over 1 gbit network, it shows acceptable results there as well: dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 94.1721 s, 107 MB/s So something is not good with QCOW2 disk format. Mikhail. From dietmar at proxmox.com Wed Nov 16 12:06:42 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Wed, 16 Nov 2016 12:06:42 +0100 (CET) Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> <1057547647.2.1479275578414@webmail.proxmox.com> <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> Message-ID: <827147981.173.1479294402439@webmail.proxmox.com> > So something is not good with QCOW2 disk format. I guess this is just because it changes a sequential write order to something more random. You will get different results if you use other benchmark tools ... From nick-liste at posteo.eu Wed Nov 16 12:40:06 2016 From: nick-liste at posteo.eu (Nicola Ferrari (#554252)) Date: Wed, 16 Nov 2016 12:40:06 +0100 Subject: [PVE-User] Android app for pve 4.2 management Message-ID: Hi everybody. I'm running a 3-nodes pve 4.2 cluster. In the past I used to overview the cluster at home and while travelling using OpenVPN on my phone, in conjunction with QuadProx Mobile: https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it I recently (sept 2016) upgraded cluster to pve4, and only yesterday I realized that Quadrata App is no more functional on pve4: I can see only a few data (nodes name, free memory/cpu) but I can't see vm list and management options (start, stop, console and so on) . Do you experience the same issue too? Any advice about alternative Android apps to achieve this? Thanks! Nick PS: I tried to use simply a web browser on the mobile (firefox mobile) but it requests too much resources... that's not usable.. -- +---------------------+ | Linux User #554252 | +---------------------+ From daniel at linux-nerd.de Wed Nov 16 13:02:07 2016 From: daniel at linux-nerd.de (Daniel) Date: Wed, 16 Nov 2016 13:02:07 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: References: Message-ID: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> Actually i have the same problem. Thats the reason why i started to develop my own App. But these App will take serval month till i will be able to show something. > Am 16.11.2016 um 12:40 schrieb Nicola Ferrari (#554252) : > > Hi everybody. > > I'm running a 3-nodes pve 4.2 cluster. 
> In the past I used to overview the cluster at home and while travelling > using OpenVPN on my phone, in conjunction with QuadProx Mobile: > https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it > > I recently (sept 2016) upgraded cluster to pve4, and only yesterday I > realized that Quadrata App is no more functional on pve4: > I can see only a few data (nodes name, free memory/cpu) but I can't see > vm list and management options (start, stop, console and so on) . > > Do you experience the same issue too? > Any advice about alternative Android apps to achieve this? > > Thanks! > Nick > > PS: I tried to use simply a web browser on the mobile (firefox mobile) > but it requests too much resources... that's not usable.. > > > > -- > +---------------------+ > | Linux User #554252 | > +---------------------+ > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From aderumier at odiso.com Wed Nov 16 13:47:15 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Wed, 16 Nov 2016 13:47:15 +0100 (CET) Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM In-Reply-To: References: Message-ID: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> maybe you can use tcpdump on vmbrX, and see if you see dhcp queries/responses ? does it work with static ip ? ----- Mail original ----- De: "Ulf Weikert" ?: "proxmoxve" Envoy?: Mardi 15 Novembre 2016 19:16:24 Objet: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM Hey there, I'm running Proxmox VE 4.2-2 on a HP DL380 G8. Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. Which lead me to this bug [0], [1]. But since the card is displayed in my Proxmox Webinterface, I think it should work at least. So I configured my KVM Container and installed Windows Server 2012 R2 VM according to the best practices wiki [2]. After Installation however the NIC in the VM does not receive an IP. The DL 380 host is directly attached to our coreswitch via fibre channel. To make sure DHCP is working on the coreswitch port I plugged in some old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. And plugged my notebook in the RJ45 port. DHCP works fine there. So in theory it should work in my VM as well. But for some reason it doesn't. See screenshot [3] See my Proxmox Host and VM Setup. [4] & [5]. I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, but that didn't work either. I'm thankful for any tip or advice you can give me. [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices [3] https://postimg.org/image/bmh7iooxp/ [4] https://postimg.org/image/l8aryzg3h/ [5] https://postimg.org/image/z392hgail/ -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. +49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From f.gruenbichler at proxmox.com Wed Nov 16 15:02:12 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Wed, 16 Nov 2016 15:02:12 +0100 Subject: [PVE-User] call for testing: updated grub2 packages on pvetest Message-ID: <20161116140212.7i7ply3bzklmlw24@nora.maurer-it.com> Hello, I'd like people that have non-productive test setups to participate in testing the updated grub2 packages that are available in the pvetest repository. We already tested them on all of our available hardware and setups, but since issues with grub tend to be rather ugly to fix once something fails, some exposure to (potentially exotic) configurations cannot hurt. The packages update to the newer release (called "beta3" in the upstream grub project) and drop the patches from the ZoL variant of grub2 that we previously used, since grub now supports ZFS upstream. Thanks in advance for any feedback! http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-common_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-amd64-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-amd64_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-ia32-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-ia32_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc-dbg_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-rescue-pc_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-theme-starfield_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub2-common_2.02-pve5_amd64.deb From weik at bbs-haarentor.de Wed Nov 16 15:44:00 2016 From: weik at bbs-haarentor.de (Ulf Weikert) Date: Wed, 16 Nov 2016 15:44:00 +0100 Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM In-Reply-To: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> References: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> Message-ID: <1280bb60-495e-eb03-acf6-84363515252f@bbs-haarentor.de> On 16.11.2016 13:47, Alexandre DERUMIER wrote: > maybe you can use tcpdump on vmbrX, and see if you see dhcp queries/responses ? While running the command, I unplugged the sfp module and put it back in. This is the outcome. 
root at VMC-01-SN:~# tcpdump -i eth4 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes ^C 0 packets captured 2 packets received by filter 0 packets dropped by kernel root at VMC-01-SN:~# tcpdump -i vmbr4 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on vmbr4, link-type EN10MB (Ethernet), capture size 262144 bytes ^C 0 packets captured 2 packets received by filter 0 packets dropped by kernel tcpdump -i eth1 (Interface for the productive system) shows expected behavior. All kinds of traffic which matches the services that are running on the VM. > > does it work with static ip ? In the VM, no. On the host due to the kernel bug I linked to in my first mail, I'm not able to restart the interface seperately. Afaik for know the only way to set an IP on eth4 is to "reboot -f" the host. Which is something I can not due right now because there are productive systems on it. Maybe weekend. > > > ----- Mail original ----- > De: "Ulf Weikert" > ?: "proxmoxve" > Envoy?: Mardi 15 Novembre 2016 19:16:24 > Objet: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM > > Hey there, > > I'm running Proxmox VE 4.2-2 on a HP DL380 G8. > Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. > Which lead me to this bug [0], [1]. > > But since the card is displayed in my Proxmox Webinterface, I think it > should work at least. > > > So I configured my KVM Container and installed Windows Server 2012 R2 VM > according to the best practices wiki [2]. > > After Installation however the NIC in the VM does not receive an IP. > The DL 380 host is directly attached to our coreswitch via fibre channel. > > To make sure DHCP is working on the coreswitch port I plugged in some > old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. > And plugged my notebook in the RJ45 port. DHCP works fine there. > > So in theory it should work in my VM as well. But for some reason it > doesn't. See screenshot [3] > See my Proxmox Host and VM Setup. [4] & [5]. > > I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, > but that didn't work either. > > I'm thankful for any tip or advice you can give me. > > > [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 > [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices > [3] https://postimg.org/image/bmh7iooxp/ > [4] https://postimg.org/image/l8aryzg3h/ > [5] https://postimg.org/image/z392hgail/ > -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. +49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. From gaio at sv.lnf.it Wed Nov 16 16:47:18 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 16 Nov 2016 16:47:18 +0100 Subject: [PVE-User] P2V and UEFI... 
Message-ID: <20161116154718.GJ3673@sv.lnf.it> I need to P2V a debian 8 server, installed on UEFI/GPT. A little complication born by the fact that i need to P2V in the same server (eg, image the server, reinstall it with proxmox, then create the VM), but i can move data elsewhere (to keep OS image minimal) and test the image with other PVE installation. Normally, i use 'mondobackup' for that, but mondo does not support UEFI (at least in debian). Also, i prefere to keep data in a second (virtual) disk, and backup that by other mean (bacula) so i need to ''repartition'' (better: reorganize data) in disks. So, summarizing: what tool it is better to use to do a (preferibly offline) image of some partition of a phisical server, respecting UEFI partitioning schema? I hope i was clear. Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From nick-liste at posteo.eu Wed Nov 16 18:43:52 2016 From: nick-liste at posteo.eu (Nicola Ferrari (#554252)) Date: Wed, 16 Nov 2016 18:43:52 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> References: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> Message-ID: On 16/11/2016 13:02, Daniel wrote: > Actually i have the same problem. > > Thats the reason why i started to develop my own App. > But these App will take serval month till i will be able to show something. > > - Thanks Daniel for your response. I'm glad to know about it. - In the meantime, I've just written to the "Quadrata" developers (they are also italian as me), to know something more about the QuadProx future roadmap. - Surfing in the play store, I got into this one: https://play.google.com/store/apps/details?id=com.undatech.opaque Have anybody tried this? Since that's not free, and no trial version is available, I would know your personal opinion about that. ( Please be patient for my poor english... ) Thanks! Nick -- +---------------------+ | Linux User #554252 | +---------------------+ From martin at proxmox.com Wed Nov 16 18:45:50 2016 From: martin at proxmox.com (Martin Maurer) Date: Wed, 16 Nov 2016 18:45:50 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: References: Message-ID: See http://pve.proxmox.com/wiki/Proxmox_VE_Mobile On 16.11.2016 12:40, Nicola Ferrari (#554252) wrote: > Hi everybody. > > I'm running a 3-nodes pve 4.2 cluster. > In the past I used to overview the cluster at home and while travelling > using OpenVPN on my phone, in conjunction with QuadProx Mobile: > https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it > > I recently (sept 2016) upgraded cluster to pve4, and only yesterday I > realized that Quadrata App is no more functional on pve4: > I can see only a few data (nodes name, free memory/cpu) but I can't see > vm list and management options (start, stop, console and so on) . > > Do you experience the same issue too? > Any advice about alternative Android apps to achieve this? > > Thanks! > Nick > > PS: I tried to use simply a web browser on the mobile (firefox mobile) > but it requests too much resources... that's not usable.. 
> > > -- Best Regards, Martin Maurer martin at proxmox.com http://www.proxmox.com ____________________________________________________________________ Proxmox Server Solutions GmbH Br?uhausgasse 37, 1050 Vienna, Austria Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien From yannis.milios at gmail.com Wed Nov 16 19:54:07 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Wed, 16 Nov 2016 18:54:07 +0000 Subject: [PVE-User] P2V and UEFI... In-Reply-To: <20161116154718.GJ3673@sv.lnf.it> References: <20161116154718.GJ3673@sv.lnf.it> Message-ID: I would use plain dd or clonezilla to backup. Then restore to vm and adjust partitions/vdisks as needed by using gparted. On Wednesday, 16 November 2016, Marco Gaiarin wrote: > > I need to P2V a debian 8 server, installed on UEFI/GPT. > > A little complication born by the fact that i need to P2V in the same > server (eg, image the server, reinstall it with proxmox, then create > the VM), but i can move data elsewhere (to keep OS image minimal) and > test the image with other PVE installation. > > Normally, i use 'mondobackup' for that, but mondo does not support UEFI > (at least in debian). > > > Also, i prefere to keep data in a second (virtual) disk, and backup > that by other mean (bacula) so i need to ''repartition'' (better: > reorganize data) in disks. > > > So, summarizing: what tool it is better to use to do a (preferibly > offline) image of some partition of a phisical server, respecting UEFI > partitioning schema? > > > I hope i was clear. Thanks. > > -- > dott. Marco Gaiarin GNUPG Key ID: > 240A3D66 > Associazione ``La Nostra Famiglia'' > http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento > (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f > +39-0434-842797 > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! > http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Sent from Gmail Mobile From mityapetuhov at gmail.com Thu Nov 17 06:05:32 2016 From: mityapetuhov at gmail.com (Dmitry Petuhov) Date: Thu, 17 Nov 2016 08:05:32 +0300 Subject: [PVE-User] P2V and UEFI... In-Reply-To: <20161116154718.GJ3673@sv.lnf.it> References: <20161116154718.GJ3673@sv.lnf.it> Message-ID: I've used something like ssh root@ 'tar --one-file-system -C -cf - .' | tar -C -xf - Source and target partition layouts may differ, just don't forget to update fstab accordingly and about boot partition for UEFI and re-run grub-install (maybe not needed for UEFI?). 16.11.2016 18:47, Marco Gaiarin wrote: > I need to P2V a debian 8 server, installed on UEFI/GPT. > > A little complication born by the fact that i need to P2V in the same > server (eg, image the server, reinstall it with proxmox, then create > the VM), but i can move data elsewhere (to keep OS image minimal) and > test the image with other PVE installation. > > Normally, i use 'mondobackup' for that, but mondo does not support UEFI > (at least in debian). > > > Also, i prefere to keep data in a second (virtual) disk, and backup > that by other mean (bacula) so i need to ''repartition'' (better: > reorganize data) in disks. > > > So, summarizing: what tool it is better to use to do a (preferibly > offline) image of some partition of a phisical server, respecting UEFI > partitioning schema? 
> > > I hope i was clear. Thanks. > From dietmar at proxmox.com Thu Nov 17 10:07:06 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 17 Nov 2016 10:07:06 +0100 (CET) Subject: [PVE-User] drbdmanage License change Message-ID: <419261783.47.1479373626167@webmail.proxmox.com> Hi all, We just want to inform you that Linbit changed the License for their 'drbdmanage' toolkit. The commit messages says ("Philipp Reisner"): ------------------ basically we do not want that others (who have not contributed to the development) act as parasites in our support business ------------------ The commit is here: http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 The new License contains the following clause (3.4b): ------------------ 3.4) Without prior written consent of LICENSOR or an authorized partner, LICENSEE is not allowed to: b) provide commercial turn-key solutions based on the LICENSED SOFTWARE or commercial services for the LICENSED SOFTWARE or its modifications to any third party (e.g. software support or trainings). ------------------ So we are basically forced to remove the package from our repository. We will also remove the included storage driver to make sure that we and our customers do not violate that license. Please contact Linbit if you want to use drbdmanage in future. They may provide all necessary packages. From gaio at sv.lnf.it Thu Nov 17 14:07:38 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 17 Nov 2016 14:07:38 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... Message-ID: <20161117130738.GG3402@sv.lnf.it> I'm still building my ceph cluster, and i've found that i put it under heavy stress migrating data. So i've setup, on a node (so, not replicated) a thin lvm storage and tried to move the disk. My LVM setup: root at thor:~# pvdisplay --- Physical volume --- PV Name /dev/sda5 VG Name pve PV Size 1.37 TiB / not usable 2.00 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 358668 Free PE 0 Allocated PE 358668 PV UUID yxx5qG-NAJQ-IqpV-HdJW-7YJS-M2c5-HeQItn root at thor:~# vgdisplay --- Volume group --- VG Name pve System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 10 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 1.37 TiB PE Size 4.00 MiB Total PE 358668 Alloc PE / Size 358668 / 1.37 TiB Free PE / Size 0 / 0 VG UUID VBaahR-ikYG-H2jK-TCdq-SPvE-VbLA-X4fpPd root at thor:~# lvdisplay --- Logical volume --- LV Path /dev/pve/lvol0 LV Name lvol0 VG Name pve LV UUID LR4G8Z-zHoB-t12p-B127-dK8z-GZw1-tZmQHP LV Write Access read/write LV Creation host, time thor, 2016-11-11 12:23:36 +0100 LV Status available # open 0 LV Size 88.00 MiB Current LE 22 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 251:0 --- Logical volume --- LV Name scratch VG Name pve LV UUID fFVtrc-B9lJ-h3gj-ksU6-WICb-w0A6-BVlqlq LV Write Access read/write LV Creation host, time thor, 2016-11-11 12:24:33 +0100 LV Pool metadata scratch_tmeta LV Pool data scratch_tdata LV Status available # open 1 LV Size 1.37 TiB Allocated pool data 48.36% Allocated metadata 99.95% Current LE 358602 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 251:3 (note the 'Allocated pool data 48.36%'). Source disk is 1TB on ceph, rather empty. Target space is 1.37 TB. I've followed the proxmox wiki creating the thin lvm storage (https://pve.proxmox.com/wiki/Storage:_LVM_Thin). 
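In short, that setup boils down to something like this (the size is only an example and should leave some room in the VG for pool metadata; the VG/pool/storage names are the ones from my setup above):

# create an LV and turn it into a thin pool
lvcreate -L 1.3T -n scratch pve
lvconvert --type thin-pool pve/scratch

# then register it as a storage, e.g. in /etc/pve/storage.cfg (or via the GUI):
lvmthin: Scratch
        vgname pve
        thinpool scratch
        content images,rootdir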
I've first tried to move the disk 'online', and log say: create full clone of drive virtio1 (DATA:vm-107-disk-1) Logical volume "vm-107-disk-1" created. drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed transferred: 0 bytes remaining: 1099511627776 bytes total: 1099511627776 bytes progression: 0.00 % busy: true ready: false transferred: 146800640 bytes remaining: 1099364827136 bytes total: 1099511627776 bytes progression: 0.01 % busy: true ready: false transferred: 557842432 bytes remaining: 1098953785344 bytes total: 1099511627776 bytes progression: 0.05 % busy: true ready: false [...] transferred: 727548166144 bytes remaining: 371963461632 bytes total: 1099511627776 bytes progression: 66.17 % busy: true ready: false device-mapper: message ioctl on failed: Operation not supported Failed to resume scratch. lvremove 'pve/vm-107-disk-1' error: Failed to update pool pve/scratch. TASK ERROR: storage migration failed: mirroring error: mirroring job seem to have die. Maybe do you have bad sectors? at /usr/share/perl5/PVE/QemuServer.pm line 5890. In syslog i've catched also: Nov 17 12:59:45 thor lvm[598]: Thin metadata pve-scratch-tpool is now 80% full. Nov 17 13:03:35 thor lvm[598]: Thin metadata pve-scratch-tpool is now 85% full. Nov 17 13:07:25 thor lvm[598]: Thin metadata pve-scratch-tpool is now 90% full. Nov 17 13:11:25 thor lvm[598]: Thin metadata pve-scratch-tpool is now 95% full. Now, if i try again, i simply get (offline or online, make no difference): create full clone of drive virtio1 (DATA:vm-107-disk-1) device-mapper: message ioctl on failed: Operation not supported TASK ERROR: storage migration failed: lvcreate 'pve/vm-107-disk-1' error: Failed to resume scratch. Also, if i go to proxmox web interfce, storage 'Scratch' (the name of the thin lvm storage) is: Usage 48.36% (677.42 GiB of 1.37 TiB but 'content' is empty. And i'm sure i've not 677.42 GiB of data in the source disk... What i'm missing?! Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From IMMO.WETZEL at adtran.com Thu Nov 17 14:49:58 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 13:49:58 +0000 Subject: [PVE-User] api call to get the right node name Message-ID: HI, is there any direct api call to get the node name where the vm is currently running on ? Mit freundlichen Gr??en / With kind regards Immo Wetzel ADTRAN GmbH Siemensallee 1 17489 Greifswald Germany Phone: +49 3834 5352 823 Mobile: +49 151 147 29 225 Skype: immo_wetzel_adtran Immo.Wetzel at Adtran.com PGP-Fingerprint: 7313 7E88 4E19 AACF 45E9 E74D EFF7 0480 F4CF 6426 http://www.adtran.com Sitz der Gesellschaft: Berlin / Registered office: Berlin Registergericht: Berlin / Commercial registry: Amtsgericht Charlottenburg, HRB 135656 B Gesch?ftsf?hrung / Managing Directors: Roger Shannon, James D. Wilson, Jr., Dr. 
Eduard Scheiterer From d.csapak at proxmox.com Thu Nov 17 15:12:48 2016 From: d.csapak at proxmox.com (Dominik Csapak) Date: Thu, 17 Nov 2016 15:12:48 +0100 Subject: [PVE-User] api call to get the right node name In-Reply-To: References: Message-ID: On 11/17/2016 02:49 PM, IMMO WETZEL wrote: > HI, > > is there any direct api call to get the node name where the vm is currently running on ? > not directly no, but you can call /cluster/resources and parse the output for your vm From IMMO.WETZEL at adtran.com Thu Nov 17 17:36:31 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 16:36:31 +0000 Subject: [PVE-User] how to create a snapshot from vm via api2 ? Message-ID: Is that function may be not described at the current api doc ? I would expect at least three parameter node,vmid,snapshotname,description,savevmstate{Boolean} cos qm snapshot allow such parameter root at prox01:~# qm help snapshot USAGE: qm snapshot [OPTIONS] Snapshot a VM. integer (1 - N) The (unique) ID of the VM. string The name of the snapshot. -description string A textual description or comment. -vmstate boolean Save the vmstate From IMMO.WETZEL at adtran.com Thu Nov 17 18:49:11 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 17:49:11 +0000 Subject: [PVE-User] how to get the processstate via API Message-ID: Hi, Every task started by API gets a unique task id. How can I check the state of this task via API? Immo From dietmar at proxmox.com Thu Nov 17 19:54:33 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 17 Nov 2016 19:54:33 +0100 (CET) Subject: [PVE-User] how to get the processstate via API In-Reply-To: References: Message-ID: <1071762906.117.1479408873531@webmail.proxmox.com> HTTP: GET /api2/json/nodes/{node}/tasks/{upid} CLI: pvesh get /nodes/{node}/tasks/{upid} > On November 17, 2016 at 6:49 PM IMMO WETZEL wrote: > > > Hi, > Every task started by API gets a unique task id. > How can I check the state of this task via API? > > Immo > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From f.gruenbichler at proxmox.com Fri Nov 18 08:38:51 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Fri, 18 Nov 2016 08:38:51 +0100 Subject: [PVE-User] how to create a snapshot from vm via api2 ? In-Reply-To: References: Message-ID: <20161118073851.72gc2qqye6ibgz5z@nora.maurer-it.com> On Thu, Nov 17, 2016 at 04:36:31PM +0000, IMMO WETZEL wrote: > Is that function may be not described at the current api doc ? it is ;) are you using the online version[1]? > I would expect at least three parameter > node,vmid,snapshotname,description,savevmstate{Boolean} > > cos qm snapshot allow such parameter > root at prox01:~# qm help snapshot > USAGE: qm snapshot [OPTIONS] > > Snapshot a VM. > > integer (1 - N) > The (unique) ID of the VM. > string > The name of the snapshot. > -description string > A textual description or comment. > -vmstate boolean > Save the vmstate HTTP: POST /api2/json/nodes/{node}/qemu/{vmid}/snapshot CLI: pvesh create /nodes/{node}/qemu/{vmid}/snapshot node, snapname and vmid are required, and you have the two optional parameters like with "qm snapshot" 1: http://pve.proxmox.com/pve-docs/api-viewer/index.html From gaio at sv.lnf.it Fri Nov 18 14:04:48 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Fri, 18 Nov 2016 14:04:48 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... 
In-Reply-To: <20161117130738.GG3402@sv.lnf.it> References: <20161117130738.GG3402@sv.lnf.it> Message-ID: <20161118130448.GF3291@sv.lnf.it> > What i'm missing?! Thanks. Sorry, probably i'm missing some background info on LVM. Trying to reset and restart from the ground. With LVM, you define a storage with a VG, and proxmox itself create a LV for every disk. Simple, clear. With Thin-LVM, insted, i've to creare a LV, define it as 'thin' with (taken from the wiki): lvcreate -L 100G -n data pve lvconvert --type thin-pool pve/data and in definition of the storage, in proxmox interface, i've to specify the VG (clear) but also the LV. OK. But done that, where the disk image get created? Proxmox take care of formatting and mounting the LV, and create the disk image inside? Sorry, but i've not clear how works... Also, seems that the trouble with Thin LVM came from the fact that i've exausted the 'metadata' space, and the default LVM configuration does not extend automatically the metadata space (thin_pool_autoextend_threshold = 100). Right? Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From f.gruenbichler at proxmox.com Fri Nov 18 14:28:47 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Fri, 18 Nov 2016 14:28:47 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... In-Reply-To: <20161118130448.GF3291@sv.lnf.it> References: <20161117130738.GG3402@sv.lnf.it> <20161118130448.GF3291@sv.lnf.it> Message-ID: <20161118132847.sltboxrjlet4neel@nora.maurer-it.com> On Fri, Nov 18, 2016 at 02:04:48PM +0100, Marco Gaiarin wrote: > > > What i'm missing?! Thanks. > > Sorry, probably i'm missing some background info on LVM. Trying to > reset and restart from the ground. > > > With LVM, you define a storage with a VG, and proxmox itself create a > LV for every disk. Simple, clear. > > > With Thin-LVM, insted, i've to creare a LV, define it as 'thin' with > (taken from the wiki): > > lvcreate -L 100G -n data pve > lvconvert --type thin-pool pve/data you can simply create the thin pool LV in one go, e.g.: lvcreate -L 100G -n mythinpoolname -T myvgname will create a 100G thin pool (volume) called "mythinpoolname" on the volume group "myvgname". optionally you can specify the pool metadata size (with "--poolmetadatasize SIZE"), the default is 64b per chunk of the pool. > > and in definition of the storage, in proxmox interface, i've to specify > the VG (clear) but also the LV. > OK. But done that, where the disk image get created? Proxmox take care > of formatting and mounting the LV, and create the disk image inside? PVE will automatically create thinly provisioned LVs for the disks, and instead of on the VG, they are created on the thin pool. A thin pool cannot be mounted, only the thinly provisioned volumes on it can. If you want to simplify it, a thin pool acts as both an LV (in relation to the VG) and as VG (in relation to the thin volumes stored on it). > > Sorry, but i've not clear how works... hope the explanation helped a bit? 
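To make it a bit more concrete, a rough example (all names and sizes below are placeholders):

# create a 100G thin pool in one step, with an explicit 1G metadata LV
lvcreate -L 100G --poolmetadatasize 1G -T myvgname/mythinpoolname

# roughly what PVE does for each guest disk: allocate a thin volume from the pool
lvcreate -V 32G -T myvgname/mythinpoolname -n vm-100-disk-1

# watch data and metadata usage of the pool and its thin volumes
lvs -o lv_name,lv_size,data_percent,metadata_percent myvgname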
> > Also, seems that the trouble with Thin LVM came from the fact that i've > exausted the 'metadata' space, and the default LVM configuration does > not extend automatically the metadata space > (thin_pool_autoextend_threshold = 100). you can specify the metadata size on creation (see above) - maybe the convert to thin operation does not allocate enough space for the metadata? in our default setup, pool autoextension is not possible (there are no free blocks in the VG to autoextend with).. you can check "man lvmthin" for more examples and explanations for how LVM thin provisioning works. From chance_ellis at yahoo.com Fri Nov 18 17:02:32 2016 From: chance_ellis at yahoo.com (Chance Ellis) Date: Fri, 18 Nov 2016 11:02:32 -0500 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> Hello, I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? Thanks! On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> Are you sure you upgraded all, i.e. used: >> apt update >> apt full-upgrade > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > makes sense, should have been made sense a few days ago this, would not be too hard to catch :/ anyway, for anyone reading this: When upgrading qemu-server to version 4.0.93 or newer you should upgrade all other nodes pve-cluster package to version 4.0-47 or newer, else migrations to those nodes will not work - as we use a new command to detect if we should send the traffic over a separate migration network. cheers, Thomas _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From t.lamprecht at proxmox.com Fri Nov 18 17:44:41 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 18 Nov 2016 17:44:41 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> Message-ID: <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> Hi, On 11/18/2016 05:02 PM, Chance Ellis wrote: > Hello, > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. 
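For reference, that offline fallback amounts to something like this (VMID and node names are placeholders, and it assumes the disks sit on shared storage so only the config has to move):

qm shutdown 100
mv /etc/pve/nodes/node-2/qemu-server/100.conf /etc/pve/nodes/node-1/qemu-server/
# then, on node-1:
qm start 100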
The problem I will run into is a requirement for no down time. > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. If I understand you correctly you have new node1 old node2 And migration from node2 -> node1 does not work? That should not be, if you run into this can you post the error from the migrate command? We normally try to guarantee that old -> new works, the other way around cannot be always guaranteed. I tested this also now and it worked. I down graded a test node of mine, started a VM there and live migrated it successfully to a upgraded VM. > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? If you have not configured it it will not be used. Migrate a unimportant test VM first to see if it works. cheers, Thomas > > Thanks! > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> Are you sure you upgraded all, i.e. used: > >> apt update > >> apt full-upgrade > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > makes sense, should have been made sense a few days ago this, would not > be too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to > detect if we should send the traffic over a separate migration network. > > cheers, > Thomas > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From chance_ellis at yahoo.com Fri Nov 18 18:01:44 2016 From: chance_ellis at yahoo.com (Chance Ellis) Date: Fri, 18 Nov 2016 12:01:44 -0500 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> Message-ID: <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> Hi Tom, You have the essential issue. I have my original cluster. All nodes are running these versions: http://pastebin.com/vquKJaKJ As a test, I added a new node to the cluster, running these versions: http://pastebin.com/Jg5LH0RD When I try to migrate from old->new, I get the following error: http://pastebin.com/YazWBtn2 When I try to migrate from new-> old, I get the following error: http://pastebin.com/hBfBnsYP Thanks! 
On 11/18/16, 11:44 AM, "pve-user on behalf of Thomas Lamprecht" wrote: Hi, On 11/18/2016 05:02 PM, Chance Ellis wrote: > Hello, > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. If I understand you correctly you have new node1 old node2 And migration from node2 -> node1 does not work? That should not be, if you run into this can you post the error from the migrate command? We normally try to guarantee that old -> new works, the other way around cannot be always guaranteed. I tested this also now and it worked. I down graded a test node of mine, started a VM there and live migrated it successfully to a upgraded VM. > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? If you have not configured it it will not be used. Migrate a unimportant test VM first to see if it works. cheers, Thomas > > Thanks! > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> Are you sure you upgraded all, i.e. used: > >> apt update > >> apt full-upgrade > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > makes sense, should have been made sense a few days ago this, would not > be too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to > detect if we should send the traffic over a separate migration network. 
> > cheers, > Thomas > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From t.lamprecht at proxmox.com Fri Nov 18 18:21:00 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 18 Nov 2016 18:21:00 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> Message-ID: <40cba59f-ef02-e80a-fdb2-836098544ec1@proxmox.com> Hi, I'm removing the pve-devel list from the reply-to as one is enough. On 11/18/2016 06:01 PM, Chance Ellis wrote: > Hi Tom, > > You have the essential issue. > > I have my original cluster. All nodes are running these versions: http://pastebin.com/vquKJaKJ > > As a test, I added a new node to the cluster, running these versions: http://pastebin.com/Jg5LH0RD > > When I try to migrate from old->new, I get the following error: http://pastebin.com/YazWBtn2 Here the migration network patch has now fault, it comes to this line: > ... > Nov 18 11:59:27 starting online/live migration on tcp:localhost:60000 Here anything related to a dedicated migration network happened already, the rest of the code is independent from it. But I find it interesting that you have "tcp:localhost:60000" in the log. This means that your node still uses the old TCP forward ssh tunnel for the migration. Those did not open reliable so we switched to unix sockets, so the line should be something like: > Nov 18 17:42:38 starting online/live migration on unix:/run/qemu-server/167.migrate As a workaround disable migration_unsecure or delete it from /etc/pve/datacenter.cfg then it should work. I have to look into that, there may be a bug when migrating from old -> new and migration_unsecure on. > > When I try to migrate from new-> old, I get the following error: http://pastebin.com/hBfBnsYP This is expected. But you can solve it by updating at least the pve-cluster pcakage on the old node, then it should work also. cheers, Thomas > Thanks! > > > > > > > On 11/18/16, 11:44 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > Hi, > > > On 11/18/2016 05:02 PM, Chance Ellis wrote: > > Hello, > > > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. > > > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. 
Back to the no down time requirement, this is less than ideal. > > If I understand you correctly you have > new node1 > old node2 > > And migration from node2 -> node1 does not work? > That should not be, if you run into this can you post the error from the > migrate command? > > We normally try to guarantee that old -> new works, the other way around > cannot be always guaranteed. > > I tested this also now and it worked. I down graded a test node of mine, > started a VM there and live migrated it successfully to a upgraded VM. > > > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? > > If you have not configured it it will not be used. > Migrate a unimportant test VM first to see if it works. > > cheers, > Thomas > > > > > Thanks! > > > > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > > >> Are you sure you upgraded all, i.e. used: > > >> apt update > > >> apt full-upgrade > > > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > > > > > makes sense, should have been made sense a few days ago this, would not > > be too hard to catch :/ > > > > anyway, for anyone reading this: > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > migrations to those nodes will not work - as we use a new command to > > detect if we should send the traffic over a separate migration network. > > > > cheers, > > Thomas > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 10:19:11 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 10:19:11 +0100 Subject: [PVE-User] License issue Message-ID: Hi, I?m new here and I never used mailing lists.., so I apologise if I do something stupid. I?m Marcel van Leeuwen and living in the Netherlands. IT stuff is just my hobby but I must admit its a bit out of hand. I'm testing ProxmoxVE at the moment and I really like it. I also considered ESXi but I like the opensource character of ProxmoxVE. I subscribed for a license to support the project and of course to get updates. Now I?m in a testing phase so I installed my license a couple of times. I think I hit a maximum cause I can reactivate my license at the moment. I raised a ticket over at Maurer IT. I was not aware of this limitation. How do I prevent this from happening again? Just not install the license or not re-install ProxmoxVE? 
Regards, Marcel van Leeuwen From dietmar at proxmox.com Sat Nov 19 11:06:38 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 11:06:38 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: References: Message-ID: <1749975450.39.1479549998560@webmail.proxmox.com> > I subscribed for a license to support the project and of course to get > updates. Now I?m in a testing phase so I installed my license a couple of > times. I think I hit a maximum cause I can reactivate my license at the > moment. I raised a ticket over at Maurer IT. I was not aware of this > limitation. How do I prevent this from happening again? Just not install the > license or not re-install ProxmoxVE? It is usually not required to do re-installs (what for?). And I guess it is not necessary to activate the subscription for a test system when you know you will reinstall soon (use pve-no-subscription for updates). From mavleeuwen at icloud.com Sat Nov 19 12:14:06 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 12:14:06 +0100 Subject: [PVE-User] License issue In-Reply-To: <1749975450.39.1479549998560@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. For now I?ve add the pve-no-subscripition repository. What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? > On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: > > >> I subscribed for a license to support the project and of course to get >> updates. Now I?m in a testing phase so I installed my license a couple of >> times. I think I hit a maximum cause I can reactivate my license at the >> moment. I raised a ticket over at Maurer IT. I was not aware of this >> limitation. How do I prevent this from happening again? Just not install the >> license or not re-install ProxmoxVE? > > It is usually not required to do re-installs (what for?). And I guess > it is not necessary to activate the subscription for a test system > when you know you will reinstall soon (use pve-no-subscription for updates). > From bc at iptel.co Sat Nov 19 12:25:16 2016 From: bc at iptel.co (Brian ::) Date: Sat, 19 Nov 2016 11:25:16 +0000 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: Hi Marcel, Its all explained here https://pve.proxmox.com/wiki/Package_Repositories Cheers On Sat, Nov 19, 2016 at 11:14 AM, Marcel van Leeuwen wrote: > Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. > > For now I?ve add the pve-no-subscripition repository. > > What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? > >> On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: >> >> >>> I subscribed for a license to support the project and of course to get >>> updates. Now I?m in a testing phase so I installed my license a couple of >>> times. 
I think I hit a maximum cause I can reactivate my license at the >>> moment. I raised a ticket over at Maurer IT. I was not aware of this >>> limitation. How do I prevent this from happening again? Just not install the >>> license or not re-install ProxmoxVE? >> >> It is usually not required to do re-installs (what for?). And I guess >> it is not necessary to activate the subscription for a test system >> when you know you will reinstall soon (use pve-no-subscription for updates). >> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 12:33:56 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 12:33:56 +0100 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <22442392-FBAC-4FFD-BC32-4EDD6A7CCB79@icloud.com> Hi Brian, Thanks for that link! Checking? Cheers, Marcel > On 19 Nov 2016, at 12:25, Brian :: wrote: > > Hi Marcel, > > Its all explained here https://pve.proxmox.com/wiki/Package_Repositories > > Cheers > > > > On Sat, Nov 19, 2016 at 11:14 AM, Marcel van Leeuwen > wrote: >> Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. >> >> For now I?ve add the pve-no-subscripition repository. >> >> What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? >> >>> On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: >>> >>> >>>> I subscribed for a license to support the project and of course to get >>>> updates. Now I?m in a testing phase so I installed my license a couple of >>>> times. I think I hit a maximum cause I can reactivate my license at the >>>> moment. I raised a ticket over at Maurer IT. I was not aware of this >>>> limitation. How do I prevent this from happening again? Just not install the >>>> license or not re-install ProxmoxVE? >>> >>> It is usually not required to do re-installs (what for?). And I guess >>> it is not necessary to activate the subscription for a test system >>> when you know you will reinstall soon (use pve-no-subscription for updates). >>> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 14:35:52 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 14:35:52 +0100 Subject: [PVE-User] NFS, LXC Message-ID: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> Hi, I?m trying to mount a remote NFS share (NAS) from a LXC container. I found this on the Proxmox forums and tried it. /etc/apparmor.d/lxc-default-with-nfs # Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which # will source all profiles under /etc/apparmor.d/lxc profile lxc-container-default-with-nfs flags=(attach_disconnected,mediate_deleted) { #include # allow NFS (nfs/nfs4) mounts. 
mount fstype=nfs*, } reload apparmor_parser -r /etc/apparmor.d/lxc-containers add to container config lxc.aa_profile: lxc-container-default-with-nfs I add the above settings to my Proxmox host but when I restart the LXC container with the new settings I can?t access the web app in this container anymore. It looks like all network connectivity is gone. Also tried to ping Goolge.com within the LXC container but no go. When I remove lxc.aa_profile: lxc-container-default-with-nfs everything is okay. Any idea? Cheers, Marcel From lemonnierk at ulrar.net Sat Nov 19 14:50:42 2016 From: lemonnierk at ulrar.net (Kevin Lemonnier) Date: Sat, 19 Nov 2016 14:50:42 +0100 Subject: [PVE-User] License issue In-Reply-To: <1749975450.39.1479549998560@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <20161119135042.GJ24918@luwin.ulrar.net> > > It is usually not required to do re-installs (what for?). [...] > It's so so so so easy to mess up in a cluster and be locked out. Unfortunatly the only way is to re-install, and that's basicaly the only answer you get from both IRC and the forum to those problems. So yes, re-install is unfortunatly necessary. -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Digital signature URL: From mavleeuwen at icloud.com Sat Nov 19 15:33:03 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 15:33:03 +0100 Subject: [PVE-User] NFS, LXC In-Reply-To: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> References: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> Message-ID: To reply to my own question you have to mount a NFS share on the host via the webui and use bind mount points. Cheers, Marcel > On 19 Nov 2016, at 14:35, Marcel van Leeuwen wrote: > > Hi, > > I?m trying to mount a remote NFS share (NAS) from a LXC container. I found this on the Proxmox forums and tried it. > > /etc/apparmor.d/lxc-default-with-nfs > > # Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which > # will source all profiles under /etc/apparmor.d/lxc > > profile lxc-container-default-with-nfs flags=(attach_disconnected,mediate_deleted) { > #include > > # allow NFS (nfs/nfs4) mounts. > mount fstype=nfs*, > } > > reload > > apparmor_parser -r /etc/apparmor.d/lxc-containers > > add to container config > > lxc.aa_profile: lxc-container-default-with-nfs > > I add the above settings to my Proxmox host but when I restart the LXC container with the new settings I can?t access the web app in this container anymore. It looks like all network connectivity is gone. Also tried to ping Goolge.com within the LXC container but no go. When I remove > > lxc.aa_profile: lxc-container-default-with-nfs > > everything is okay. Any idea? > > Cheers, > > Marcel > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 15:38:23 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 15:38:23 +0100 Subject: [PVE-User] License issue In-Reply-To: <20161119135042.GJ24918@luwin.ulrar.net> References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> Message-ID: <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Hmmm, also true. 
I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? Cheers, Marcel > On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: > >> >> It is usually not required to do re-installs (what for?). [...] >> > > It's so so so so easy to mess up in a cluster and be locked out. > Unfortunatly the only way is to re-install, and that's basicaly the > only answer you get from both IRC and the forum to those problems. > > So yes, re-install is unfortunatly necessary. > > -- > Kevin Lemonnier > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Sat Nov 19 16:15:24 2016 From: bc at iptel.co (Brian ::) Date: Sat, 19 Nov 2016 15:15:24 +0000 Subject: [PVE-User] License issue In-Reply-To: <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Message-ID: Don't install the licence until you're fully comfortable that you have everything working the way you want it and you won't have any issue! You can use the non sub repo for as long as you need. On Sat, Nov 19, 2016 at 2:38 PM, Marcel van Leeuwen wrote: > Hmmm, also true. I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? > > Cheers, > > Marcel >> On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: >> >>> >>> It is usually not required to do re-installs (what for?). [...] >>> >> >> It's so so so so easy to mess up in a cluster and be locked out. >> Unfortunatly the only way is to re-install, and that's basicaly the >> only answer you get from both IRC and the forum to those problems. >> >> So yes, re-install is unfortunatly necessary. >> >> -- >> Kevin Lemonnier >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Sat Nov 19 16:39:14 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 16:39:14 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <547945705.173.1479569954476@webmail.proxmox.com> > For now I?ve add the pve-no-subscripition repository. > > What?s the difference between the pve-enterprise and the pve-no-subscription > repository? Are update just beter tested in the pve-enterprise repo? Basically yes. From mavleeuwen at icloud.com Sat Nov 19 19:15:01 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 19:15:01 +0100 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Message-ID: I'm certainly going to do this. Thanks! Marcel > Op 19 nov. 2016 om 16:15 heeft Brian :: het volgende geschreven: > > Don't install the licence until you're fully comfortable that you have > everything working the way you want it and you won't have any issue! 
> > You can use the non sub repo for as long as you need. > > On Sat, Nov 19, 2016 at 2:38 PM, Marcel van Leeuwen > wrote: >> Hmmm, also true. I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? >> >> Cheers, >> >> Marcel >>>> On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: >>>> >>>> >>>> It is usually not required to do re-installs (what for?). [...] >>>> >>> >>> It's so so so so easy to mess up in a cluster and be locked out. >>> Unfortunatly the only way is to re-install, and that's basicaly the >>> only answer you get from both IRC and the forum to those problems. >>> >>> So yes, re-install is unfortunatly necessary. >>> >>> -- >>> Kevin Lemonnier >>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 19:20:54 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 19:20:54 +0100 Subject: [PVE-User] License issue In-Reply-To: <547945705.173.1479569954476@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> Message-ID: <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> What if the license is renewed after a year? Then you have 3 installs again? Op 19 nov. 2016 om 16:39 heeft Dietmar Maurer het volgende geschreven: >> For now I?ve add the pve-no-subscripition repository. >> >> What?s the difference between the pve-enterprise and the pve-no-subscription >> repository? Are update just beter tested in the pve-enterprise repo? > > Basically yes. > From dietmar at proxmox.com Sat Nov 19 20:22:20 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 20:22:20 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> Message-ID: <94138841.252.1479583340873@webmail.proxmox.com> Please note that our software license is AGPL. You talk about subscriptions here - and this is something very different. > What if the license is renewed after a year? Then you have 3 installs again? Sure. Also, you can simply contact our support if you need more than 3 installs. We usually find a solution ... From lindsay.mathieson at gmail.com Sun Nov 20 00:13:19 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sun, 20 Nov 2016 09:13:19 +1000 Subject: [PVE-User] pve-qemu-kvm 2.7.0-8 Message-ID: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> Does 2.7.0-8 resolve the snapshot problems? 
-- Lindsay Mathieson From lindsay.mathieson at gmail.com Sun Nov 20 01:04:13 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sun, 20 Nov 2016 10:04:13 +1000 Subject: [PVE-User] pve-qemu-kvm 2.7.0-8 In-Reply-To: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> References: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> Message-ID: <9673b774-9c47-b3a9-8657-03e4513042ec@gmail.com> On 20/11/2016 9:13 AM, Lindsay Mathieson wrote: > Does 2.7.0-8 resolve the snapshot problems? To answer my own question, it appears that it does. I have successfully snapshotted and restored running VM's. Online migration is ok to. Thanks Devs. -- Lindsay Mathieson From daniel at linux-nerd.de Sun Nov 20 11:08:50 2016 From: daniel at linux-nerd.de (Daniel) Date: Sun, 20 Nov 2016 11:08:50 +0100 Subject: [PVE-User] LXC Live Migration Message-ID: Hi, i didnt test it yet but is LXC Live Migration implemented now? If not, someone knows if there are plans for implementation? Cheers Daniel From marcomgabriel at gmail.com Sun Nov 20 15:27:11 2016 From: marcomgabriel at gmail.com (Marco M. Gabriel) Date: Sun, 20 Nov 2016 14:27:11 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <419261783.47.1479373626167@webmail.proxmox.com> References: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: How does this affect existing Proxmox VE 4.x / DRBD9 setups? Does "removing the storage driver" mean, that there is no DRBD kernel module available from next release oder is it just the manageability due to removal of drbdmanage? thanks for clarification, Marco Dietmar Maurer schrieb am Do., 17. Nov. 2016 um 10:07 Uhr: > Hi all, > > We just want to inform you that Linbit changed the License > for their 'drbdmanage' toolkit. > > The commit messages says ("Philipp Reisner"): > ------------------ > basically we do not want that others (who have not contributed to the > development) act as parasites in our support business > ------------------ > > The commit is here: > > > http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > > The new License contains the following clause (3.4b): > > ------------------ > 3.4) Without prior written consent of LICENSOR or an authorized partner, > LICENSEE is not allowed to: > > b) provide commercial turn-key solutions based on the LICENSED SOFTWARE or > commercial services for the LICENSED SOFTWARE or its modifications to any > third party (e.g. software support or trainings). > ------------------ > > So we are basically forced to remove the package from our repository. We > will > also remove the included storage driver to make sure that we and our > customers do not violate that license. > > Please contact Linbit if you want to use drbdmanage in future. They may > provide all necessary packages. > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From mail at valentinvoigt.info Sun Nov 20 16:00:40 2016 From: mail at valentinvoigt.info (Valentin Voigt) Date: Sun, 20 Nov 2016 15:00:40 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: Hi, I just set up a two-node cluster with Proxmox using DRBD9 for live migration. Does that mean that I should switch technology as long as we're not using that cluster for production? It would be pretty sad when DRBD gets removed from Proxmox once I get to use it. 
I think it's already hard to find solutions for high-availability(ish) clusters for those poor souls with with only two physical machines. Any hint in a better direction would of course be appreciated! Thanks! Valentin ------ Originalnachricht ------ Von: "Dietmar Maurer" An: "PVE Development List" ; "PVE User List" Gesendet: 17.11.2016 10:07:06 Betreff: [PVE-User] drbdmanage License change >Hi all, > >We just want to inform you that Linbit changed the License >for their 'drbdmanage' toolkit. > >The commit messages says ("Philipp Reisner"): >------------------ >basically we do not want that others (who have not contributed to the >development) act as parasites in our support business >------------------ > >The commit is here: > >http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > >The new License contains the following clause (3.4b): > >------------------ >3.4) Without prior written consent of LICENSOR or an authorized >partner, > LICENSEE is not allowed to: > >b) provide commercial turn-key solutions based on the LICENSED SOFTWARE >or > commercial services for the LICENSED SOFTWARE or its modifications to >any > third party (e.g. software support or trainings). >------------------ > >So we are basically forced to remove the package from our repository. >We will >also remove the included storage driver to make sure that we and our >customers do not violate that license. > >Please contact Linbit if you want to use drbdmanage in future. They may >provide all necessary packages. > >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Sun Nov 20 16:25:19 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 20 Nov 2016 16:25:19 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: References: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: <225246742.23.1479655521325@webmail.proxmox.com> > How does this affect existing Proxmox VE 4.x / DRBD9 setups? > > Does "removing the storage driver" mean, that there is no DRBD kernel > module available from next release oder is it just the manageability due to > removal of drbdmanage? We will keep the kernel module for now, unless Linbit wants that we remove it. From aderumier at odiso.com Sun Nov 20 17:54:47 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Sun, 20 Nov 2016 17:54:47 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: References: Message-ID: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> >>I think it's already hard to find solutions for high-availability(ish) >>clusters for those poor souls with with only two physical machines. Any >>hint in a better direction would of course be appreciated! I think we could manage this with qemu block replication. Live local storage migration is coming, and it's including the mirror through remote node. So I think it should be too difficult to handle continuous replication of qemu block driver. Maybe the only problem is that is not possible to run other qemu jobs (backups for example), at the same time. ----- Mail original ----- De: "Valentin Voigt" ?: "proxmoxve" Envoy?: Dimanche 20 Novembre 2016 16:00:40 Objet: Re: [PVE-User] drbdmanage License change Hi, I just set up a two-node cluster with Proxmox using DRBD9 for live migration. Does that mean that I should switch technology as long as we're not using that cluster for production? 
It would be pretty sad when DRBD gets removed from Proxmox once I get to use it. I think it's already hard to find solutions for high-availability(ish) clusters for those poor souls with with only two physical machines. Any hint in a better direction would of course be appreciated! Thanks! Valentin ------ Originalnachricht ------ Von: "Dietmar Maurer" An: "PVE Development List" ; "PVE User List" Gesendet: 17.11.2016 10:07:06 Betreff: [PVE-User] drbdmanage License change >Hi all, > >We just want to inform you that Linbit changed the License >for their 'drbdmanage' toolkit. > >The commit messages says ("Philipp Reisner"): >------------------ >basically we do not want that others (who have not contributed to the >development) act as parasites in our support business >------------------ > >The commit is here: > >http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > >The new License contains the following clause (3.4b): > >------------------ >3.4) Without prior written consent of LICENSOR or an authorized >partner, > LICENSEE is not allowed to: > >b) provide commercial turn-key solutions based on the LICENSED SOFTWARE >or > commercial services for the LICENSED SOFTWARE or its modifications to >any > third party (e.g. software support or trainings). >------------------ > >So we are basically forced to remove the package from our repository. >We will >also remove the included storage driver to make sure that we and our >customers do not violate that license. > >Please contact Linbit if you want to use drbdmanage in future. They may >provide all necessary packages. > >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Sun Nov 20 22:22:37 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 21 Nov 2016 07:22:37 +1000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> Message-ID: <899652ea-c266-d799-2032-f3e17d156387@gmail.com> On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > I think we could manage this with qemu block replication. Very nice. Is this an existing feature in qemu or still under development? (or planning) -- Lindsay Mathieson From aderumier at odiso.com Mon Nov 21 07:19:41 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Mon, 21 Nov 2016 07:19:41 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: <899652ea-c266-d799-2032-f3e17d156387@gmail.com> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> Message-ID: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> >>Is this an existing feature in qemu or still under development? (or >>planning) qemu already support block migration to remote nbd (network block device) server. qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). I'll would like to implemented this, but first, we need to finish to implement live migration + live local storage migration. 
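For readers who want to try the underlying mechanism by hand, here is a minimal offline sketch of the NBD transport this builds on. The host name, port and volume paths below are placeholders, not taken from this thread; the live path drives the same kind of NBD export through QEMU's drive-mirror block job instead of a one-shot copy.

  # on the target node: export the destination volume over NBD
  qemu-nbd -f raw -x vm-100-disk-1 -p 10809 -t /dev/vg0/vm-100-disk-1

  # on the source node: copy the image into the existing export (-n skips target creation)
  qemu-img convert -n -f qcow2 -O raw /var/lib/vz/images/100/vm-100-disk-1.qcow2 \
      nbd://target-node:10809/vm-100-disk-1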
----- Mail original ----- De: "Lindsay Mathieson" ?: "proxmoxve" Envoy?: Dimanche 20 Novembre 2016 22:22:37 Objet: Re: [PVE-User] drbdmanage License change On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > I think we could manage this with qemu block replication. Very nice. Is this an existing feature in qemu or still under development? (or planning) -- Lindsay Mathieson _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From yannis.milios at gmail.com Mon Nov 21 08:53:01 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Mon, 21 Nov 2016 07:53:01 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: Regarding drbd, is it possible to include drbd8 kernel module + userland utilities instead which are not affected by the license change? On Mon, 21 Nov 2016 at 06:20, Alexandre DERUMIER wrote: > >>Is this an existing feature in qemu or still under development? (or > >>planning) > > qemu already support block migration to remote nbd (network block device) > server. > > qemu 2.8 have a new feature, COLO, which will allow HA without vm > interruption. (continuous memory + block replication on remote node). > I'll would like to implemented this, but first, we need to finish to > implement live migration + live local storage migration. > > > ----- Mail original ----- > De: "Lindsay Mathieson" > ?: "proxmoxve" > Envoy?: Dimanche 20 Novembre 2016 22:22:37 > Objet: Re: [PVE-User] drbdmanage License change > > On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > > I think we could manage this with qemu block replication. > > Very nice. > > > Is this an existing feature in qemu or still under development? (or > planning) > > -- > Lindsay Mathieson > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Sent from Gmail Mobile From lindsay.mathieson at gmail.com Mon Nov 21 09:56:21 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 21 Nov 2016 18:56:21 +1000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > qemu already support block migration to remote nbd (network block device) server. Thanks, I'll have a look into that. > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). Wow, very cool. 
-- Lindsay From f.gruenbichler at proxmox.com Mon Nov 21 10:17:29 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 21 Nov 2016 10:17:29 +0100 Subject: [PVE-User] drbdmanage License change In-Reply-To: References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> On Mon, Nov 21, 2016 at 06:56:21PM +1000, Lindsay Mathieson wrote: > On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > > qemu already support block migration to remote nbd (network block device) server. > > Thanks, I'll have a look into that. > > > > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). > > Wow, very cool. > Yes - but also will take some time to test and integrate, so don't expect this to hit the PVE repos right after the 2.8 release ;). Also keep in mind the hardware and network requirements for anything approaching a busy workload running like this - you need to constantly sync the memory and I/O! From aderumier at odiso.com Mon Nov 21 13:52:13 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Mon, 21 Nov 2016 13:52:13 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> Message-ID: <1478687900.3513790.1479732733900.JavaMail.zimbra@oxygem.tv> >>Yes - but also will take some time to test and integrate, so don't >>expect this to hit the PVE repos right after the 2.8 release ;) Yes, I think this need a lot of work :) ----- Mail original ----- De: "Fabian Gr?nbichler" ?: "proxmoxve" Envoy?: Lundi 21 Novembre 2016 10:17:29 Objet: Re: [PVE-User] drbdmanage License change On Mon, Nov 21, 2016 at 06:56:21PM +1000, Lindsay Mathieson wrote: > On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > > qemu already support block migration to remote nbd (network block device) server. > > Thanks, I'll have a look into that. > > > > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). > > Wow, very cool. > Yes - but also will take some time to test and integrate, so don't expect this to hit the PVE repos right after the 2.8 release ;). Also keep in mind the hardware and network requirements for anything approaching a busy workload running like this - you need to constantly sync the memory and I/O! _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From IMMO.WETZEL at adtran.com Mon Nov 21 22:44:43 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Mon, 21 Nov 2016 21:44:43 +0000 Subject: [PVE-User] Set new vm description via a piece call doesn't work. Any example available? Message-ID: Hi, I try to change the description of a vm via a picture call. But I got always back an empty array. I used the same json call for creating snapshot successfully. The Web interface seem not to use json but adding the optional digest. 
So is there anybody in this round who can tell me if I have to add the digest also for description and how to calculate this one? I tried already to use the digest I got with the get config. My json body just contains the description string. Nothing else and the call path is .../nodes/[node]/qemu/[vmid]/config Immo Sent from Samsung Mobile From t.lamprecht at proxmox.com Tue Nov 22 08:33:57 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Tue, 22 Nov 2016 08:33:57 +0100 Subject: [PVE-User] Set new vm description via a piece call doesn't work. Any example available? In-Reply-To: References: Message-ID: Hi, On 11/21/2016 10:44 PM, IMMO WETZEL wrote: > Hi, > > I try to change the description of a vm via a picture call. pvesh set /nodes/localhost/qemu/100/config --description 'test 12' works for me here. So with HTTP you would use a PUT request (instead of set with pvesh) to /nodes/localhost/qemu/100/config with the description property. > But I got always back an empty array. > I used the same json call for creating snapshot successfully. > > The Web interface seem not to use json but adding the optional digest. > So is there anybody in this round who can tell me if I have to add the digest also for description and how to calculate this one? > I tried already to use the digest I got with the get config. That's the correct value for digest. It is simply a SHA1 hash of the config file, you can then pass the digest to your set command so that this command aborts if someone else changed the config in the mean time. > > My json body just contains the description string. Nothing else and the call path is .../nodes/[node]/qemu/[vmid]/config > > Immo > > > > Sent from Samsung Mobile > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From ADhaussy at voyages-sncf.com Tue Nov 22 17:35:08 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 16:35:08 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: ...sequel to those thrilling adventures... I _still_ have problems with nodes not joining the cluster properly after rebooting... Here's what we have done last night : - Stopped ALL VMs (just to ensure no corruption happen in case of unexpected reboots...) - Patched qemu from 2.6.1 to 2.6.2 to fix live migration issues. - Removed bridge (cluster network) on all nodes to fix multicast issues (11 nodes total.) - Patched all (HP blade/HP ILO/Ethernet/Fiber Channel card) bios and firmwares (13 nodes total.) - Rebooted all nodes, one, two, or three server simultaneously. So far we had absolutly no problems, corosync was still quorate and all nodes leaved and joined the cluster successfully. - Added 2 nodes to the cluster, no problem at all... - Started two VMs on two nodes, and to cut the network on those nodes. 
- As expected, watchdog did its job killing the two nodes, VMs were relocated.... so far so good ! _Except_, the two nodes were never able to join the cluster again after reboot... LVM takes so long to scan all PVs/LVs....somehow, i believe, it ends in an inconsistency when systemd starts cluster services. On the other nodes, i can actually see that corosync does a quick join/leave (and fails) right after booting... Nov 22 02:07:52 proxmoxt21 corosync[22342]: [TOTEM ] A new membership (10.98.x.x:1492) was formed. Members joined: 10 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [TOTEM ] A new membership (10.98.x.x:1496) was formed. Members left: 10 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [QUORUM] Members[10]: 9 11 5 4 12 3 1 2 6 8 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [MAIN ] Completed service synchronization, ready to provide service. I tried several reboots...same problem. :( I ended up removing the two freshly added nodes from the cluster, and restarted all VMs. I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... Recall that i have about 1500Vms, 1600LVs, 70PVs on external SAN storage... _Now_ i have a serious lead that this issue could be related to a known racing condition between udev and multipath. I have had this issue previously, but i didnt think i would interact and cause issues with cluster services...what do you think ? See the https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799781 I quickly tried the workaround suggested here : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799781#32 (remove this rule from udev : ACTION=="add|change", SUBSYSTEM=="block", RUN+="/sbin/multipath -v0 /dev/$name") I can tell it boots _much_ faster, but i will need to give another try and proper testing to see if it fix my issue... Anyhow, i'm open to suggestions or thoughts that could enlighten me... (And sorry for the long story) Le 14/11/2016 ? 12:33, Thomas Lamprecht a ?crit : On 14.11.2016 11:50, Dhaussy Alexandre wrote: Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : On November 11, 2016 at 6:41 PM Dhaussy Alexandre wrote: you lost quorum, and the watchdog expired - that is how the watchdog based fencing works. I don't expect to loose quorum when _one_ node joins or leave the cluster. This was probably a long time before - but I have not read through the whole logs ... That makes no sense to me.. The fact is : everything have been working fine for weeks. What i can see in the logs is : several reboots of cluster nodes suddently, and exactly one minute after one node joining and/or leaving the cluster. 
The watchdog is set to an 60 second timeout, meaning that cluster leave caused quorum loss, or other problems (you said you had multicast problems around that time) thus the LRM stopped updating the watchdog, so one minute later it resetted all nodes, which left the quorate partition. I see no problems with corosync/lrm/crm before that. This leads me to a probable network (multicast) malfunction. I did a bit of homeworks reading the wiki about ha manager.. What i understand so far, is that every state/service change from LRM must be acknowledged (cluster-wise) by CRM master. Yes and no, LRM and CRM are two state machines with synced inputs, but that holds mainly for human triggered commands and the resulting communication. Meaning that commands like start, stop, migrate may not go through from the CRM to the LRM. Fencing and such stuff works none the less, else it would be a major design flaw :) So if a multicast disruption occurs, and i assume LRM wouldn't be able talk to the CRM MASTER, then it also couldn't reset the watchdog, am i right ? No, the watchdog runs on each node and is CRM independent. As watchdogs are normally not able to server more clients we wrote the watchdog-mux (multiplexer). This is a very simple C program which opens the watchdog with a 60 second timeout and allows multiple clients (at the moment CRM and LRM) to connect to it. If a client does not resets the dog for about 10 seconds, IIRC, the watchdox-mux disables watchdogs updates on the real watchdog. After that a node reset will happen *when* the dog runs out of time, not instantly. So if the LRM cannot communicate (i.e. has no quorum) he will stop updating the dog, thus trigger independent what the CRM says or does. Another thing ; i have checked my network configuration, the cluster ip is set on a linux bridge... By default multicast_snooping is set to 1 on linux bridge, so i think it there's a good chance this is the source of my problems... Note that we don't use IGMP snooping, it is disabled on almost all network switchs. Yes, multicast snooping has to be configured (recommended) or else turned off on the switch. That's stated in some wiki articles, various forum posts and our docs, here: http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements Hope that helps a bit understanding. :) cheers, Thomas Plus i found a post by A.Derumier (yes, 3 years old..) He did have similar issues with bridge and multicast. 
http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mir at miras.org Tue Nov 22 17:56:08 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 17:56:08 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: <20161122175608.50607b2d@sleipner.datanom.net> On Tue, 22 Nov 2016 16:35:08 +0000 Dhaussy Alexandre wrote: > > I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. My own rules as an inspiration: # Do not scan ZFS zvols (to avoid problems on ZFS zvols snapshots) global_filter = [ "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|" ] # Only scan for volumes on local disk and on iSCSI target from Qnap NAS. Block scanning from all # other block devices. filter = [ "a|ata-OCZ-AGILITY3_OCZ-QMZN8K4967DA9NGO.*|", "a|scsi-36001405e38e9f02ddef9d4573db7a0d0|", "r|.*|" ] -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: The trouble with being punctual is that people think you have nothing more important to do. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From ADhaussy at voyages-sncf.com Tue Nov 22 18:12:27 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 17:12:27 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <20161122175608.50607b2d@sleipner.datanom.net> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : > On Tue, 22 Nov 2016 16:35:08 +0000 > Dhaussy Alexandre wrote: > >> I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... 
> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. Yep, i already tuned filters in lvm config, before that i had "duplicate PVs' messages because of multipath devices. Anyway if i'm not wrong, LVM still has a lot of LVs to activate at boot. nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume group "T_proxmox_1" now active nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume group "proxmoxt34-vg" now active From mir at miras.org Tue Nov 22 18:48:54 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 18:48:54 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Have you tested your filter rules? On November 22, 2016 6:12:27 PM GMT+01:00, Dhaussy Alexandre wrote: > >Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : >> On Tue, 22 Nov 2016 16:35:08 +0000 >> Dhaussy Alexandre wrote: >> >>> I don't know how, but i feel that every node i add to the cluster >currently slows down LVM scan a little more...until it ends up >interfering with cluster services at boot... >> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. > >Yep, i already tuned filters in lvm config, before that i had >"duplicate >PVs' messages because of multipath devices. >Anyway if i'm not wrong, LVM still has a lot of LVs to activate at >boot. > >nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume > >group "T_proxmox_1" now active >nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume >group "proxmoxt34-vg" now active >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ---- This mail was virus scanned and spam checked before delivery. This mail is also DKIM signed. See header dkim-signature. From gbr at majentis.com Tue Nov 22 19:00:10 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 22 Nov 2016 12:00:10 -0600 Subject: [PVE-User] A stop job is running... (xxx/no limit) Message-ID: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Hi, I'm trying to shut down a server, and it waits on 'A stop job is running... (xx/ no limit). Why is there no time limit, and how can I set one? 
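In case it helps while the root cause is tracked down: the "no limit" part usually means the unit being stopped has its stop timeout disabled (TimeoutStopSec=0), so systemd waits forever. A cap can be set globally or per unit; a rough sketch, assuming stock systemd paths and an example unit name:

  # global cap: in /etc/systemd/system.conf set
  #   DefaultTimeoutStopSec=90s
  # then re-execute the manager so the change is picked up
  systemctl daemon-reexec

  # or cap a single unit with a drop-in (example.service is a placeholder)
  mkdir -p /etc/systemd/system/example.service.d
  printf '[Service]\nTimeoutStopSec=90s\n' > /etc/systemd/system/example.service.d/timeout.conf
  systemctl daemon-reload

As the replies further down note, the real cause here was a hung NFS mount / kernel bug, so a timeout only shortens the wait rather than fixing the hang.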
Gerald From ADhaussy at voyages-sncf.com Tue Nov 22 19:04:39 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 18:04:39 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : > Have you tested your filter rules? Yes, i set this filter at install : global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] > > On November 22, 2016 6:12:27 PM GMT+01:00, Dhaussy Alexandre wrote: >> Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : >>> On Tue, 22 Nov 2016 16:35:08 +0000 >>> Dhaussy Alexandre wrote: >>> >>>> I don't know how, but i feel that every node i add to the cluster >> currently slows down LVM scan a little more...until it ends up >> interfering with cluster services at boot... >>> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. >> Yep, i already tuned filters in lvm config, before that i had >> "duplicate >> PVs' messages because of multipath devices. >> Anyway if i'm not wrong, LVM still has a lot of LVs to activate at >> boot. >> >> nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume >> >> group "T_proxmox_1" now active >> nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume >> group "proxmoxt34-vg" now active >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mir at miras.org Tue Nov 22 19:18:44 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 19:18:44 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: <20161122191844.6e627895@sleipner.datanom.net> On Tue, 22 Nov 2016 18:04:39 +0000 Dhaussy Alexandre wrote: > Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : > > Have you tested your filter rules? > Yes, i set this filter at install : > > global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", > "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] > Does vgscan and lvscan list the expected? 
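A quick way to sanity-check that with the lvm2 tools shipped in PVE 4.x (nothing below is specific to this setup):

  # show the filter the running LVM actually uses
  lvm dumpconfig devices/global_filter

  # list the PVs/VGs that survive the filter
  pvs -o pv_name,vg_name,dev_size
  vgs

  # count the LVs that would be activated at boot
  lvs --noheadings | wc -l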
-- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: We come to bury DOS, not to praise it. -- Paul Vojta, vojta at math.berkeley.edu -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From ADhaussy at voyages-sncf.com Tue Nov 22 19:47:51 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 18:47:51 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <20161122191844.6e627895@sleipner.datanom.net> References: <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> <20161122191844.6e627895@sleipner.datanom.net> Message-ID: <4a736c76-b470-ea14-eee8-267855fe87cc@voyages-sncf.com> Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : >>> Have you tested your filter rules? >> Yes, i set this filter at install : >> >> global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", >> "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] >> > Does vgscan and lvscan list the expected? > Seems to. root at proxmoxt20:~# vgscan Reading all physical volumes. This may take a while... Found volume group "T_proxmox_1" using metadata type lvm2 Found volume group "pve" using metadata type lvm2 root at proxmoxt20:~# lvscan ACTIVE '/dev/T_proxmox_1/vm-106-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-108-disk-1' [106,00 GiB] inherit inactive '/dev/T_proxmox_1/vm-109-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-110-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-111-disk-1' [116,00 GiB] inherit ................ ....cut..... ................ ACTIVE '/dev/T_proxmox_1/vm-451-disk-2' [90,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-451-disk-3' [90,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-1195-disk-2' [128,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-138-disk-1' [106,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-517-disk-1' [101,00 GiB] inherit ACTIVE '/dev/pve/swap' [7,63 GiB] inherit ACTIVE '/dev/pve/root' [95,37 GiB] inherit ACTIVE '/dev/pve/data' [174,46 GiB] inherit From mark at openvs.co.uk Wed Nov 23 10:40:55 2016 From: mark at openvs.co.uk (Mark Adams) Date: Wed, 23 Nov 2016 09:40:55 +0000 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD Message-ID: Hi All, I'm testing out proxmox and trying to get a working ZFS on iSCSI HA setup going. Because ZFS on iSCSI logs on to the iscsi server via ssh and creates a zfs dataset then adds iscsi config to /etc/ietd.conf it works fine when you've got a single iscsi host, but I haven't figured out a way to use it with pacemaker/corosync resources. 
I believe the correct configuration would be for the ZFS on iSCSI script to create the pacemaker iSCSILogicalUnit resource using pcs, after creating the zfs dataset, but this musn't be something that is supported as yet. Has anyone else tried to get this or a similar setup working? Any views greatly received. Thanks, Mark From gaio at sv.lnf.it Wed Nov 23 13:30:12 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 23 Nov 2016 13:30:12 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Message-ID: <20161123123012.GF3383@sv.lnf.it> Mandi! Gerald Brandt In chel di` si favelave... > I'm trying to shut down a server, and it waits on 'A stop job is > running... (xx/ no limit). Why is there no time limit, and how can I > set one? NFS storage? -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From gbr at majentis.com Wed Nov 23 13:50:42 2016 From: gbr at majentis.com (Gerald Brandt) Date: Wed, 23 Nov 2016 06:50:42 -0600 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <20161123123012.GF3383@sv.lnf.it> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> <20161123123012.GF3383@sv.lnf.it> Message-ID: <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> On 2016-11-23 06:30 AM, Marco Gaiarin wrote: > Mandi! Gerald Brandt > In chel di` si favelave... > >> I'm trying to shut down a server, and it waits on 'A stop job is >> running... (xx/ no limit). Why is there no time limit, and how can I >> set one? > NFS storage? > Yup. Why, does that make a difference? Gerald From gaio at sv.lnf.it Wed Nov 23 14:01:03 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 23 Nov 2016 14:01:03 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> <20161123123012.GF3383@sv.lnf.it> <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> Message-ID: <20161123130103.GL3383@sv.lnf.it> Mandi! Gerald Brandt In chel di` si favelave... > >NFS storage? > Yup. Why, does that make a difference? Look at list archive, some weeks ago: seems that systemd behave not so correctly and tear down the NFS server before proxmox, that stalls. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From alain.pean at lpn.cnrs.fr Wed Nov 23 14:22:28 2016 From: alain.pean at lpn.cnrs.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 23 Nov 2016 14:22:28 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Message-ID: Le 22/11/2016 ? 
19:00, Gerald Brandt a ?crit : > I'm trying to shut down a server, and it waits on 'A stop job is > running... (xx/ no limit). Why is there no time limit, and how can I > set one? I see also this problem with a Dell R630 server with Broadcom 10g interfaces. It's a known bug. It is resolved in 4.5 kernels. Perhaps we have to wait for a proxmox upgrade to this kernel : https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ It seems to be a bug in 4.4 Ubuntu kernels. Alain -- Administrateur Syst?me/R?seau C2N (ex LPN) Centre de Nanosciences et Nanotechnologies (UMR 9001) Site de Marcoussis, Data IV, route de Nozay - 91460 Marcoussis Tel : 01-69-63-61-34 From marcomgabriel at gmail.com Wed Nov 23 16:16:27 2016 From: marcomgabriel at gmail.com (Marco M. Gabriel) Date: Wed, 23 Nov 2016 15:16:27 +0000 Subject: [PVE-User] MTU size changed on a running cluster Message-ID: Hi there, on a productive 5 node Proxmox VE Ceph cluster, we experienced some strange behaviour: Based on http://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports we have an internal network for cluster/corosync communication and another internal network for Ceph Storage traffic. The Ceph OVS bridge was set to MTU 9000 in /etc/network/interfaces and ran without a problem since a week. Today we've seen Ceph errors like "x requests are blocked > 32 sec". After a troubleshooting, we's seen that packets got dropped because they were > 1500 bytes on the Ceph interface. That was strange as we had set them to MTU 9000 and it was running since a week. We checked the Interfaces and on two nodes, we saw a MTU of 1500 while the other three nodes still had MTU 9000. Has anybody experiences something like that? I read that an OVS bridge automatically sets it's own MTU according to the lowest MTU of the member interfaces, but I am not sure if this could be a problem here. Any hints appreciated, Marco From w.link at proxmox.com Wed Nov 23 16:29:01 2016 From: w.link at proxmox.com (Wolfgang Link) Date: Wed, 23 Nov 2016 16:29:01 +0100 Subject: [PVE-User] MTU size changed on a running cluster In-Reply-To: References: Message-ID: <5835B5BD.2070701@proxmox.com> This is a openvswitch bug. The workaround is to use openvswitch 2.6, it is on testing repo and set mtu_reqest on the interface. https://github.com/openvswitch/ovs/blob/master/FAQ.rst Q: How can I configure the bridge internal interface MTU? Why does Open vSwitch keep changing internal ports MTU? A: By default Open vSwitch overrides the internal interfaces (e.g. br0) MTU. If you have just an internal interface (e.g. br0) and a physical interface (e.g. eth0), then every change in MTU to eth0 will be reflected to br0. Any manual MTU configuration using ip or ifconfig on internal interfaces is going to be overridden by Open vSwitch to match the current bridge minimum. Sometimes this behavior is not desirable, for example with tunnels. The MTU of an internal interface can be explicitly set using the following command: $ ovs-vsctl set int br0 mtu_request=1450 After this, Open vSwitch will configure br0 MTU to 1450. Since this setting is in the database it will be persistent (compared to what happens with ip or ifconfig). The MTU configuration can be removed to restore the default behavior with: $ ovs-vsctl set int br0 mtu_request=[] The mtu_request column can be used to configure MTU even for physical interfaces (e.g. eth0). On 11/23/2016 04:16 PM, Marco M. 
Gabriel wrote: > Hi there, > > on a productive 5 node Proxmox VE Ceph cluster, we experienced some strange > behaviour: > > Based on > http://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports > we > have an internal network for cluster/corosync communication and another > internal network for Ceph Storage traffic. The Ceph OVS bridge was set to > MTU 9000 in /etc/network/interfaces and ran without a problem since a week. > > Today we've seen Ceph errors like "x requests are blocked > 32 sec". > > After a troubleshooting, we's seen that packets got dropped because they > were > 1500 bytes on the Ceph interface. That was strange as we had set > them to MTU 9000 and it was running since a week. > > We checked the Interfaces and on two nodes, we saw a MTU of 1500 while the > other three nodes still had MTU 9000. > > Has anybody experiences something like that? I read that an OVS bridge > automatically sets it's own MTU according to the lowest MTU of the member > interfaces, but I am not sure if this could be a problem here. > > Any hints appreciated, > Marco > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From mir at miras.org Wed Nov 23 21:40:06 2016 From: mir at miras.org (Michael Rasmussen) Date: Wed, 23 Nov 2016 21:40:06 +0100 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD In-Reply-To: References: Message-ID: <20161123214006.26e4e9a9@sleipner.datanom.net> On Wed, 23 Nov 2016 09:40:55 +0000 Mark Adams wrote: > > Has anyone else tried to get this or a similar setup working? Any views > greatly received. > What you are trying to achieve is not a good idea with corosync/pacemaker since iSCSI is a block device. To create a cluster over a LUN will require a cluster aware filesystem like NFS, CIFS etc. The proper way of doing this with iSCSI would be using multipath to a SAN since iSCSI LUNs cannot be shared. Unfortunately the current implementation of ZFS over iSCSI does not support multipath (a limitation in libiscsi). Also may I remind you that Iet development has stopped in favor of LIO targets (http://linux-iscsi.org/wiki/LIO). I am currently working on making an implementation of LIO for proxmox which will use a different architecture than the current ZFS over iSCSI implementation. The new implementation will support multipath. As this is developed in my spare time progress is not a high as it could be. Alternatively you could look at this: http://www.napp-it.org/doc/downloads/z-raid.pdf -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: The computer should be doing the hard work. That's what it's paid to do, after all. -- Larry Wall in <199709012312.QAA08121 at wall.org> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From t.lamprecht at proxmox.com Thu Nov 24 10:05:20 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 24 Nov 2016 10:05:20 +0100 Subject: [PVE-User] HA Changes and Cleanups Message-ID: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> Hi all, regarding the discussion about our HA stack on the pve-user list in October we made some changes, which - hopefully - should address some problems and reduce some common pitfalls. * What has changed or is new: pct shutdown / qm shutdown and the Shutdown button in the web interface work now as expected, if triggered the HA service will be shut down and not automatically started again. If that is needed there is still the 'reset' functionality. We provide now better feedback about the actual state of a HA service. E.g. 'started' will be only shown if the local resource manager confirmed that the service really started, else we show 'starting' so that it's clearer whats currently happening. We merged the GUI's 'Resource' tab into the 'HA' tab, related information is now placed together. This should give a better overview of the current situation. Note, there are some fields in the resource grid which are hidden by default, to show them click on one of the tiny triangles in the column headers: https://i.imgsafe.org/6a271a3cc4.png Improved the built in documentation. We also reworked the request states for services, there is now: * started (replaces 'enabled') The CRM tries to start the resource. Service state is set to started after successful start. On node failures, or when start fails, it tries to recover the resource. If everything fails, service state it set to error. * stopped (new) The CRM tries to keep the resource in stopped state, but it still tries to relocate the resources on node failures. * disabled The CRM tries to put the resource in stopped state, but does not try to relocate the resources on node failures. The main purpose of this state is error recovery, because it is the only way to move a resource out of the error state. So the general used ones should be now 'started' and 'stopped', here its clear what the HA stack will do. 'disabled' should be mainly used to recover a service which is in the error state. ha-manager enabled/disabled was removed, this was not in the API so it should only affect user which called it directly. You can use `ha-manager set SID --state REQUEST_STATE` instead. * What has still to come: A 'ignore' request state in which the service will not be touched by HA but is still in the resource configuration - this was wished a few times. I have WIP patches ready but nothing merged yet. A bit less confusion on task execution logs. Allowing hard stopping of a VM/CT under HA. I hope this addresses some part of the feedback we got. Many thanks to the community for the feedback and to Dietmar who did a lot of the above mentioned work and also Dominik for his help with the UI. User which want to test this changes can use the new packages we pushed to pvetest yesterday evening CET. The changes are include in the packages: pve-ha-manager >= 1.0-38 pve-manager >= 4.3-11 Happy testing and feel free to provide feedback. 
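As a quick reference, the new request states map to the CLI like this (vm:100 is just an example service ID):

  # request a service to be started / kept stopped
  ha-manager set vm:100 --state started
  ha-manager set vm:100 --state stopped

  # error recovery: 'disabled' is the only way out of the error state
  ha-manager set vm:100 --state disabled

  # verify what the CRM/LRM actually did
  ha-manager status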
cheers, Thomas From lists at merit.unu.edu Thu Nov 24 10:22:11 2016 From: lists at merit.unu.edu (mj) Date: Thu, 24 Nov 2016 10:22:11 +0100 Subject: [PVE-User] HA Changes and Cleanups In-Reply-To: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> References: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> Message-ID: <97d4d05d-44b7-be24-4040-d973696ede7e@merit.unu.edu> Hi Thomas, Thank you for these improvements. (I did not participate much in the following discussion, but I was the one who started the thread "[PVE-User] HA question") MJ On 11/24/2016 10:05 AM, Thomas Lamprecht wrote: > Hi all, > > regarding the discussion about our HA stack on the pve-user list in October > we made some changes, which - hopefully - should address some problems and > reduce some common pitfalls. > > * What has changed or is new: > > pct shutdown / qm shutdown and the Shutdown button in the web interface > work > now as expected, if triggered the HA service will be shut down and not > automatically started again. If that is needed there is still the 'reset' > functionality. > > We provide now better feedback about the actual state of a HA service. > E.g. 'started' will be only shown if the local resource manager confirmed > that the service really started, else we show 'starting' so that it's > clearer whats currently happening. > > We merged the GUI's 'Resource' tab into the 'HA' tab, related > information is > now placed together. This should give a better overview of the current > situation. > Note, there are some fields in the resource grid which are hidden by > default, to show them click on one of the tiny triangles in the column > headers: https://i.imgsafe.org/6a271a3cc4.png > > Improved the built in documentation. > > We also reworked the request states for services, there is now: > > * started (replaces 'enabled') > The CRM tries to start the resource. Service state is set to started > after successful start. On node failures, or when start fails, it tries > to recover the resource. If everything fails, service state it set to > error. > > * stopped (new) > The CRM tries to keep the resource in stopped state, but it still > tries to relocate the resources on node failures. > > * disabled > The CRM tries to put the resource in stopped state, but does not > try to relocate the resources on node failures. The main purpose > of this state is error recovery, because it is the only way to > move a resource out of the error state. > > > So the general used ones should be now 'started' and 'stopped', here its > clear what the HA stack will do. > 'disabled' should be mainly used to recover a service which is in the error > state. > > ha-manager enabled/disabled was removed, this was not in the API so it > should only affect user which called it directly. > You can use `ha-manager set SID --state REQUEST_STATE` instead. > > * What has still to come: > > A 'ignore' request state in which the service will not be touched by HA but > is still in the resource configuration - this was wished a few times. > I have WIP patches ready but nothing merged yet. > > A bit less confusion on task execution logs. > > Allowing hard stopping of a VM/CT under HA. > > I hope this addresses some part of the feedback we got. > Many thanks to the community for the feedback and to Dietmar who did a lot > of the above mentioned work and also Dominik for his help with the UI. > > User which want to test this changes can use the new packages we pushed to > pvetest yesterday evening CET. 
> The changes are include in the packages: > pve-ha-manager >= 1.0-38 > pve-manager >= 4.3-11 > > Happy testing and feel free to provide feedback. > > cheers, > Thomas > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From IMMO.WETZEL at adtran.com Thu Nov 24 13:41:16 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 24 Nov 2016 12:41:16 +0000 Subject: [PVE-User] pvesh delete /nodes/{node}/qemu/{vmid}/snapshot/{snapname} -force doesnt work as expected ? Message-ID: Hi, help command shows delete snapshot with option force but this isnt executed correctly. See output below. Or do I something wrong ? Background: The snapshot doesnt exists in qcow2 disk file but in config. Therefore force should help as expected and remove the snapshot entry from the config file. root at prox01:/root# pvesh help /nodes/prox05/qemu/161/snapshot/initialSnapShot --verbose help [path] [--verbose] cd [path] ls [path] USAGE: delete /nodes/{node}/qemu/{vmid}/snapshot/{snapname} [OPTIONS] Delete a VM snapshot. -force boolean For removal from config file, even if removing disk snapshots fails. ... So -force should be the one I need. But see here: root at prox01:/root# pvesh delete /nodes/prox05/qemu/161/snapshot/initialSnapShot -force usage: delete [path] So why I see the usage string here ? Following is correct cos snapshot can't be found root at prox01:/root# pvesh delete /nodes/prox05/qemu/161/snapshot/initialSnapShot command '/usr/bin/qemu-img snapshot -d initialSnapShot /mnt/pve/Storage/images/161/vm-161-disk-1.qcow2' failed: exit code 1 qemu-img: Could not delete snapshot 'initialSnapShot': (Can't find the snapshot) UPID:prox05:00007C50:48ACCCD1:5836D31B:qmdelsnapshot:161:root at pam:200 OK From mavleeuwen at icloud.com Thu Nov 24 21:06:46 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Thu, 24 Nov 2016 21:06:46 +0100 Subject: [PVE-User] License issue In-Reply-To: <94138841.252.1479583340873@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> <94138841.252.1479583340873@webmail.proxmox.com> Message-ID: <8FAC9A84-3E0B-49A7-86DD-8E1000A4B8F0@icloud.com> Yeah, true. Thanks for clarifying! Cheers, Marcel van Leeuwen > On 19 Nov 2016, at 20:22, Dietmar Maurer wrote: > > Please note that our software license is AGPL. > > You talk about subscriptions here - and this is something very different. > >> What if the license is renewed after a year? Then you have 3 installs again? > > Sure. Also, you can simply contact our support if you need more than 3 > installs. We usually find a solution ... > From proxmox-user at mattern.org Fri Nov 25 00:48:59 2016 From: proxmox-user at mattern.org (Marcus) Date: Fri, 25 Nov 2016 00:48:59 +0100 Subject: [PVE-User] Debian initramfs bug #775583 Message-ID: <61939744-e1fa-2b83-3ef9-0794fdd070cc@mattern.org> Hi, it seems that I hit this bug on a Proxmox test installation with Proxmox 4.3. Symptoms are the same as described in the bug report (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=775583) - LVM Volume for /usr is not activated at boot. Manually doing a vgchange -a y in the initramfs shell activates them and the OS boots. There is a "local-block" initramfs script in the Debian latest version of the lvm2 package that fixes this issue. 
If I copy it to my Proxmox test installation and rebuild the initramfs it boots normally. I found that the bug is fixed in Debian lvm2_2.02.111-2.1. Regards. From mark at tuxis.nl Fri Nov 25 11:18:46 2016 From: mark at tuxis.nl (Mark Schouten) Date: Fri, 25 Nov 2016 11:18:46 +0100 Subject: [PVE-User] HA Cluster migration issues Message-ID: <2600018850-5504@kerio.tuxis.nl> Hi, I have a HA cluster running, with Ceph and all, and I have rebooted one of the nodes this week. We now want te migrate the HA-VM's back to the original server, but that fails without a clear error. I can say: root at proxmox01:~# qm migrate 600 proxmox03 -online Executing HA migrate for VM 600 to node proxmox03 I then see kvm starting on node proxmox03, but then something goes wrong after that and migration fails: task started by HA resource agent Nov 25 10:58:05 starting migration of VM 600 to node 'proxmox03' (10.1.1.3) Nov 25 10:58:05 copying disk images Nov 25 10:58:05 starting VM 600 on remote node 'proxmox03' Nov 25 10:58:06 starting ssh migration tunnel Nov 25 10:58:07 starting online/live migration on localhost:60000 Nov 25 10:58:07 migrate_set_speed: 8589934592 Nov 25 10:58:07 migrate_set_downtime: 0.1 Nov 25 10:58:09 ERROR: online migrate failure - aborting Nov 25 10:58:09 aborting phase 2 - cleanup resources Nov 25 10:58:09 migrate_cancel Nov 25 10:58:10 ERROR: migration finished with problems (duration 00:00:05) TASK ERROR: migration problems I can't see any errormessage that is more useful. Can anybody tell me how I can further debug this or maybe somebody knows what's going on? pveversion -v (This is identical on the two machines) proxmox-ve: 4.2-48 (running kernel: 4.4.6-1-pve) pve-manager: 4.2-2 (running version: 4.2-2/725d76f0) pve-kernel-4.4.6-1-pve: 4.4.6-48 lvm2: 2.02.116-pve2 corosync-pve: 2.3.5-2 libqb0: 1.0-1 pve-cluster: 4.0-39 qemu-server: 4.0-72 pve-firmware: 1.1-8 libpve-common-perl: 4.0-59 libpve-access-control: 4.0-16 libpve-storage-perl: 4.0-50 pve-libspice-server1: 0.12.5-2 vncterm: 1.2-1 pve-qemu-kvm: 2.5-14 pve-container: 1.0-62 pve-firewall: 2.0-25 pve-ha-manager: 1.0-28 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 1.1.5-7 lxcfs: 2.0.0-pve2 cgmanager: 0.39-pve1 criu: 1.6.0-1 Met vriendelijke groeten, --? Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl From gwenn+proxmox at beurre.demisel.net Fri Nov 25 11:27:35 2016 From: gwenn+proxmox at beurre.demisel.net (Gwenn Gueguen) Date: Fri, 25 Nov 2016 11:27:35 +0100 Subject: [PVE-User] VMA endianness bug? Message-ID: <20161125112735.623254f1@port-42.amossys.fr> Hi, I had an issue while reading a VMA file written by a proxmox backup on an up-to-date Proxmox 4.3 node. According to vma_spec.txt[1], "All numbers in VMA archive are stored in Big Endian byte order." but it looks like the 2 byte size field at the beginning of each blob are stored in little endian byte order. Here is an extract of the blob buffer: 030000 00 11 00 71 65 6D 75 2D 73 65 72 76 65 72 2E 63 ...qemu-server.c 030020 6F 6E 66 00 24 02 62 61 6C 6C 6F 6F 6E 3A 20 31 onf.$.balloon: 1 Config name length is 17 (0x0011) but is written in file as 4352 (0x1100). Config data length is 548 (0x0224) but is written in file as 9218 (0x2402). Others numbers in the header (version, timestamp, etc.) are written in big endian byte order (0X00000001 for the version). 
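For anyone who wants to verify this on their own archives, the byte order can be eyeballed with plain od; the file name below is a placeholder and OFFSET is the absolute file offset of a blob's 2-byte size field, taken from the blob buffer offset and the config name offsets in the header:

  # dump the two size bytes of a blob entry
  od -An -tx1 -j OFFSET -N 2 backup.vma
  # for the qemu-server.conf name in the dump above this prints "11 00",
  # i.e. length 17 (0x0011) stored little endian, contrary to the spec text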
Cheers, [1] https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=vma_spec.txt -- Gwenn Gueguen From w.bumiller at proxmox.com Fri Nov 25 12:03:14 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 25 Nov 2016 12:03:14 +0100 (CET) Subject: [PVE-User] VMA endianness bug? In-Reply-To: <20161125112735.623254f1@port-42.amossys.fr> References: <20161125112735.623254f1@port-42.amossys.fr> Message-ID: <847334054.38.1480071794867@webmail.proxmox.com> > On November 25, 2016 at 11:27 AM Gwenn Gueguen wrote: > > > > Hi, > > I had an issue while reading a VMA file written by a proxmox backup > on an up-to-date Proxmox 4.3 node. > > According to vma_spec.txt[1], "All numbers in VMA archive are stored in > Big Endian byte order." but it looks like the 2 byte size field at the > beginning of each blob are stored in little endian byte order. Unfortunately this is correct. Also note that the first byte of the blob buffer is unused (iow. the first length starts at an offset of 1). (If you just access it via the offset pointers from the device/config fields you won't run into this, but if you try to just index the size+data pairs you will ;-) ). May I ask what you're working on? From gwenn+proxmox at beurre.demisel.net Fri Nov 25 12:15:11 2016 From: gwenn+proxmox at beurre.demisel.net (Gwenn Gueguen) Date: Fri, 25 Nov 2016 12:15:11 +0100 Subject: [PVE-User] VMA endianness bug? In-Reply-To: <847334054.38.1480071794867@webmail.proxmox.com> References: <20161125112735.623254f1@port-42.amossys.fr> <847334054.38.1480071794867@webmail.proxmox.com> Message-ID: <20161125121511.6dc74781@port-42.amossys.fr> On Fri, 25 Nov 2016 12:03:14 +0100 (CET) Wolfgang Bumiller wrote: > Unfortunately this is correct. Also note that the first byte of the > blob buffer is unused (iow. the first length starts at an offset of > 1). (If you just access it via the offset pointers from the > device/config fields you won't run into this, but if you try to just > index the size+data pairs you will ;-) ). Fortunately, I'm using offsets stored in the header so this is not a problem. > May I ask what you're working on? We will use Proxmox to host an experimentation platform and we'll have to import/export VMs from/to other virtualization platforms so I'm trying to develop a small tool to convert VMA backups to OVA and vice versa. An import/export feature would be great. -- Gwenn Gueguen From hermann at qwer.tk Fri Nov 25 13:43:07 2016 From: hermann at qwer.tk (Hermann Himmelbauer) Date: Fri, 25 Nov 2016 13:43:07 +0100 Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? Message-ID: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> Hi, I recently upgraded the Proxomox community version to the latest version and wonder if a ceph upgrade is recommended, too? Currently my ceph version is 0.94.3 - and I see that there are upgrades to 0.94.9 on the ceph site, does anyone know how to do such an upgrade on proxmox? Is it risky? Best Regards, Hermann -- hermann at qwer.tk PGP/GPG: 299893C7 (on keyservers) From w.bumiller at proxmox.com Fri Nov 25 14:23:13 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 25 Nov 2016 14:23:13 +0100 (CET) Subject: [PVE-User] VMA endianness bug? 
In-Reply-To: <20161125121511.6dc74781@port-42.amossys.fr> References: <20161125112735.623254f1@port-42.amossys.fr> <847334054.38.1480071794867@webmail.proxmox.com> <20161125121511.6dc74781@port-42.amossys.fr> Message-ID: <1740456382.149.1480080193359@webmail.proxmox.com> > On November 25, 2016 at 12:15 PM Gwenn Gueguen wrote: > > May I ask what you're working on? > > We will use Proxmox to host an experimentation platform and we'll have > to import/export VMs from/to other virtualization platforms so I'm > trying to develop a small tool to convert VMA backups to OVA and vice > versa. > > An import/export feature would be great. There are the `vma create/vma extract` cli tools, alternatively I'll probably be moving the vma handling code into a separate library for easier maintenance (which would also allow it to be reused more easily for such tools). From aderumier at odiso.com Sat Nov 26 09:04:49 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Sat, 26 Nov 2016 09:04:49 +0100 (CET) Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? In-Reply-To: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> References: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> Message-ID: <549136967.3727214.1480147489744.JavaMail.zimbra@oxygem.tv> Sure, you can always upgrade to last minor version. (0.94.X) Only jewel is not yet compatible because of a bug, but it'll be fixed in next jewel release (10.2.4) ----- Mail original ----- De: "Hermann Himmelbauer" ?: "proxmoxve" Envoy?: Vendredi 25 Novembre 2016 13:43:07 Objet: [PVE-User] Ceph upgrade from 94.3 - recommendations? Hi, I recently upgraded the Proxomox community version to the latest version and wonder if a ceph upgrade is recommended, too? Currently my ceph version is 0.94.3 - and I see that there are upgrades to 0.94.9 on the ceph site, does anyone know how to do such an upgrade on proxmox? Is it risky? Best Regards, Hermann -- hermann at qwer.tk PGP/GPG: 299893C7 (on keyservers) _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gaio at sv.lnf.it Mon Nov 28 13:05:11 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 13:05:11 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) Message-ID: <20161128120511.GJ3348@sv.lnf.it> A very strange saturday evening. Hardware tooling, hacking, caffeine, ... I'm still completing my CEPH storage cluster (now 2 node storage, waiting to add the third), but is it mostly ''on production''. So, after playing with server for some month, saturday i've shut down all the cluster, setup all the cables, switches, UPS, ... in a more decent and stable way. To simulate a hard power outgage, i've not set the noout and nodown flags. After that, i've powered up all the cluster (first the 2 ceph storage node, after the 2 pve host nodes) and i've hit the first trouble: 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected The trouble came from the fact that... my NTP server was on a VM, and despite the fact that the status was only 'HEALTH_WARN', i cannot access anymore the storage. I've solved adding more NTP server from other sites, and after some time the cluster go OK: 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK and here the panic start. 
PVE interface report the Ceph cluster OK, report correctly all the stuffs (mon, osd, pools, pool usage, ...) but data cluster was not accessible: a) if i try to move a disk, reply with something like 'no available'. b) if i try to start VMs, they stalls... The only strange things on log was that there's NO pgmap update, like before: 2016-11-26 16:59:31.588695 mon.0 10.27.251.7:6789/0 2317560 : cluster [INF] pgmap v2410540: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13569 kB/s rd, 2731 kB/s wr, 565 op/s but really, on panic, i've not noted that. After some tests, i've finally do the right thing. 1) i've set the noout and nodown flags. 2) i've rebooted the ceph nodes, one by one. After that, all the cluster start. VMs that was on stalls, immediately start. After that, i've understood that NTP is a crucial service for ceph, so it is needed to have a pool of servers. Still, i'm not sure this was the culprit. The second thing i've understood is that Ceph react badly to a total shutdown. In a datacenter this is probably acceptable. I don't know if it is my fault, or at least there's THE RIGTH WAY to start a Ceph cluster from cold metal... Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Mon Nov 28 13:45:24 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Mon, 28 Nov 2016 13:45:24 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161128120511.GJ3348@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> Message-ID: <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> Hi Marco, On 11/28/2016 01:05 PM, Marco Gaiarin wrote: > > A very strange saturday evening. Hardware tooling, hacking, caffeine, > ... > > I'm still completing my CEPH storage cluster (now 2 node storage, > waiting to add the third), but is it mostly ''on production''. > So, after playing with server for some month, saturday i've shut down > all the cluster, setup all the cables, switches, UPS, ... in a more > decent and stable way. > > To simulate a hard power outgage, i've not set the noout and nodown > flags. > > > After that, i've powered up all the cluster (first the 2 ceph storage > node, after the 2 pve host nodes) and i've hit the first trouble: > > 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected > > The trouble came from the fact that... my NTP server was on a VM, and > despite the fact that the status was only 'HEALTH_WARN', i cannot > access anymore the storage. What did the full ceph status show? Did you add all the monitors to your storage config in proxmox? A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. 1 mons down). Did you configure timesyncd properly? On reboot the time has to be synced by the host, so all ceph hosts share the same time. 
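(A quick way to check that on every node is for example:

  timedatectl status   # look for "NTP synchronized: yes"
  ntpq -p              # only if classic ntpd is installed, shows the peers in use

so right after a reboot you can see whether the clocks already agree.)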
The ceph map updates require the proper time, so every host knows which map is the current one. > > I've solved adding more NTP server from other sites, and after some > time the cluster go OK: > > 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK > > and here the panic start. > > > PVE interface report the Ceph cluster OK, report correctly all the stuffs > (mon, osd, pools, pool usage, ...) but data cluster was not accessible: > > a) if i try to move a disk, reply with something like 'no available'. > > b) if i try to start VMs, they stalls... > > The only strange things on log was that there's NO pgmap update, like > before: > > 2016-11-26 16:59:31.588695 mon.0 10.27.251.7:6789/0 2317560 : cluster [INF] pgmap v2410540: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13569 kB/s rd, 2731 kB/s wr, 565 op/s > > but really, on panic, i've not noted that. > > > After some tests, i've finally do the right thing. > > 1) i've set the noout and nodown flags. > > 2) i've rebooted the ceph nodes, one by one. > > After that, all the cluster start. VMs that was on stalls, immediately > start. > > > After that, i've understood that NTP is a crucial service for ceph, so > it is needed to have a pool of servers. Still, i'm not sure this was > the culprit. > > > The second thing i've understood is that Ceph react badly to a total > shutdown. In a datacenter this is probably acceptable. > > I don't know if it is my fault, or at least there's THE RIGTH WAY to > start a Ceph cluster from cold metal... > > > Thanks. > -- Cheers, Alwin From gaio at sv.lnf.it Mon Nov 28 15:31:41 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 15:31:41 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> Message-ID: <20161128143141.GQ3348@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > What did the full ceph status show? Do you mean 'ceph status'? I've not saved it, but was OK, as now: root at thor:~# ceph status cluster 8794c124-c2ec-4e81-8631-742992159bd6 health HEALTH_OK monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} election epoch 94, quorum 0,1,2,3 0,1,2,3 osdmap e114: 6 osds: 6 up, 6 in pgmap v2524432: 768 pgs, 3 pools, 944 GB data, 237 kobjects 1874 GB used, 7435 GB / 9310 GB avail 768 active+clean client io 7693 B/s rd, 302 kB/s wr, 65 op/s > Did you add all the monitors to your storage config in proxmox? > A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be > available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. > 1 mons down). I've currently 4 nodes in my cluster: all node are pve clusterized, 2 are cpu only (ceph mon), 2 (and one more to come) storage node (mon+osd(s)). Yes, i've not changed the storage configuration, and when the CPU nodes started at least the two storage nodes where online. > Did you configure timesyncd properly? > On reboot the time has to be synced by the host, so all ceph hosts share the same time. The ceph map updates require the > proper time, so every host knows which map is the current one. Now, yes. As stated, i've had configured with only a NTP server that was a VM in the same cluster; now, they use two NTP server, one remote. 
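(For the record, on nodes running systemd-timesyncd the relevant bits look
roughly like this -- the server names below are just placeholders for ours:

  # /etc/systemd/timesyncd.conf
  [Time]
  NTP=ntp.internal.example 0.debian.pool.ntp.org
  FallbackNTP=1.debian.pool.ntp.org

  systemctl restart systemd-timesyncd

With classic ntpd it is the equivalent extra "server" lines in /etc/ntp.conf.)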
Fixed the ntp server, servers get in sync, ceph status got OK but mons does not start to peers themself ('pgmap' logs). Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Mon Nov 28 15:50:21 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Mon, 28 Nov 2016 15:50:21 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161128143141.GQ3348@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> Message-ID: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Hi Marco, On 11/28/2016 03:31 PM, Marco Gaiarin wrote: > Mandi! Alwin Antreich > In chel di` si favelave... > >> What did the full ceph status show? > > Do you mean 'ceph status'? I've not saved it, but was OK, as now: > > root at thor:~# ceph status > cluster 8794c124-c2ec-4e81-8631-742992159bd6 > health HEALTH_OK > monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > election epoch 94, quorum 0,1,2,3 0,1,2,3 > osdmap e114: 6 osds: 6 up, 6 in > pgmap v2524432: 768 pgs, 3 pools, 944 GB data, 237 kobjects > 1874 GB used, 7435 GB / 9310 GB avail > 768 active+clean > client io 7693 B/s rd, 302 kB/s wr, 65 op/s > Would have been interesting if all OSDs were up & in. As depending on the pool config, the min size for serving data out of that pool might have prevented the storage to serve data. > >> Did you add all the monitors to your storage config in proxmox? >> A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be >> available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. >> 1 mons down). > > I've currently 4 nodes in my cluster: all node are pve clusterized, 2 > are cpu only (ceph mon), 2 (and one more to come) storage node > (mon+osd(s)). > > Yes, i've not changed the storage configuration, and when the CPU nodes > started at least the two storage nodes where online. I see from your ceph status that you have 4 mons, are they all in your storage conf? And are your storage nodes also mons? It is important to have the monitors online, as these are accessed first and if those aren't then no storage is available. With only one OSD node running the storage could be still available, besides a HEALTH_WARN. > > >> Did you configure timesyncd properly? >> On reboot the time has to be synced by the host, so all ceph hosts share the same time. The ceph map updates require the >> proper time, so every host knows which map is the current one. > > Now, yes. As stated, i've had configured with only a NTP server that was > a VM in the same cluster; now, they use two NTP server, one remote. Then a reboot should not do any harm. > > Fixed the ntp server, servers get in sync, ceph status got OK but mons > does not start to peers themself ('pgmap' logs). If your mons aren't peering, then the status wouldn't be OK, so they must have done it after a while. May you please show us the logs? > > > Thanks. 
> -- Cheers, Alwin From gaio at sv.lnf.it Mon Nov 28 16:04:25 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 16:04:25 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Message-ID: <20161128150425.GT3348@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > Would have been interesting if all OSDs were up & in. As depending on the pool config, the min size for serving data out > of that pool might have prevented the storage to serve data. Ouch! I've forgot to specify... not only the status was OK, but effectively all OSDs was up & in, in 'ceph status' and also in PVE interface. Also, for now i've 2 node storage and my pools size is 2. > I see from your ceph status that you have 4 mons, are they all in your storage conf? And are your storage nodes also mons? Yes. > If your mons aren't peering, then the status wouldn't be OK, so they must have done it after a while. May you please > show us the logs? Tomorrow. ;-) -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From rui.godinho.lopes at gmail.com Tue Nov 29 00:20:42 2016 From: rui.godinho.lopes at gmail.com (Rui Lopes) Date: Mon, 28 Nov 2016 23:20:42 +0000 Subject: [PVE-User] lvmconfig binary on debian jessie? Message-ID: Hello, Is there a way to have the lvmconfig binary on debian jessie (the dist that the proxmox iso uses)? Known is there are alternatives? Thanks! From gaio at sv.lnf.it Tue Nov 29 12:17:44 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Tue, 29 Nov 2016 12:17:44 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Message-ID: <20161129111744.GL3355@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > May you please show us the logs? Ok, i'm here. With the log. A bit of legenda: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes (mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (only mon). In order, mon.0, mon.1, mon.2 and mon.3. These are the logs of 10.27.251.7 (mon.0); Seems to me that ceph logs are all similar, so i hope these suffices. I've started my activity at 15.00, but before take down all the stuff i've P2V my last server, my Asterisk PBX box. Clearly, cluster worked: [...] 
2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts 2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok 2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s 2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s [...] 2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s 2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s Finished the P2V, i've started to power off the cluster, starting from the cpu nodes. After powering down a node, i've realized that i need it to do another thing, so i've re-powered on. ;-) 2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election 2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election 2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election 2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s 2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s 2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s [...] 
2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s 2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s 2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election 2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election 2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts 2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok 2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts 2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok 2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s 2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s 2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized 2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s [...] 2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s 2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts 2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok 2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s [...] 
2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s 2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s 2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s [...] 2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s 2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts 2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok 2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts 2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok 2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s [...] 2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s 2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s [...] 2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s 2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s 2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s [...] 2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s 2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized 2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s [...] 
2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s 2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s [...] 2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s 2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s 2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s [...] 2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s OK, here server was shut down and so logs stop. At power up, i got as sayed clock skew troubles, so i got status HEALTH_WARN: 2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized 2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized 2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s 2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s 2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized 2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized 2016-11-26 18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 
10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s 2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s 2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized 2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected 2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s 2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s 2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized 2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK 2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s 2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s 2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s 2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized 2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s 2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s 2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s 2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized 2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] message from mon.0 was stamped 0.349322s in the future, clocks not synchronized 2016-11-26 18:19:09.736830 mon.1 
10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized 2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s 2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s 2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s 2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized 2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s 2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s 2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s 2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s 2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s 2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s 2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s 2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s 2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s 2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s 2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s 2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s 2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized 2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized 2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized 2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s 2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s 2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s 2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:33:20.635816 mon.1 10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling 
new monitor election 2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s 2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s 2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s 2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s 2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s 2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s 2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s 2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s 2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s 2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s 2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s 2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s 2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized 2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK 2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 
10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s 2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s 2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s 2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized 2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s 2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s 2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s 2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized 2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s 2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s 2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s 2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized 2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s 2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s 2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s 2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s 2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s 2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s 2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s 2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.394627s > max 0.05s 2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s 2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock 
skew 0.298756s > max 0.05s 2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s 2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s 2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized 2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s 2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s 2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s 2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s 2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s 2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s 2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s 2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s 2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s But after adding the new NTP sever and waiting some time, finally clock get in sync and status go to OK. But (this is the PANIC time) despite of the fact that 'ceph status' and pve interface say 'all OK', cluster does not work. 
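(In hindsight, a quick test from one of the PVE nodes, something like

  rbd -p <poolname> ls
  rados -p <poolname> ls

would have shown that client I/O was really hanging even though the mons
reported HEALTH_OK -- both commands talk to the OSDs and simply block when
the requests are stuck.)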
So i've started to reboot the CPU nodes (mon.2 and .3): 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK 2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK 2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : cluster [INF] mon.2 calling new monitor election 2016-11-26 
19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK 2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized 2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK 2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s 2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:47:27.462603 
mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized 2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s 2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized 2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected 2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK With no luck. So finally i've set 'nodown' and 'noout' flags and rebooted the storage nodes (mon.0 ad .1). And suddenly all get back as normal: 2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in 2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3 2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set 2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s 2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s 2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s 2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized 2016-11-26 20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election 2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new 
monitor election 2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized 2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s 2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s 2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s 2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in 2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot 2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot 2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in 2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized 2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s 2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s 2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected 2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs 2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs 2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 
secs 2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs 2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set 2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs 2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs 2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set 2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3 2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3 2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap 
v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs 2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs 2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election 2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election 2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set 2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s 2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized 2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in 2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot 2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in 2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in 2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 
stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot 2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in 2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in 2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s 2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s 2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in 2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s 2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts 2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok 2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot 2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in 2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s 2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot 2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in 2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts 2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok 2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts 2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok 2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts 2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok 2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts 2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s 2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in 2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 
active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s 2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s 2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s 2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s 2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s 2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts 2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok 2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s [...] 2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s 2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok 2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s [...] 2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s 2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected 2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s [...] 2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s 2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set 2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s [...] 
2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s
2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in
2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s
[...]
2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s
2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in
2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s
[...]
2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s
2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK

Other things to note: in the syslog (not the ceph log) of mon.0, I found this for the first (failed) boot:

Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 ===
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica...
Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service.
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 ===
Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700 0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault
Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault
Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault
Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault
Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default'
Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1
Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 ===
Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault
Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault
Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault
Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault
Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault
Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default'
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate

And for the second (working):

Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 ===
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica...
Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service.
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica...
Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 ===
Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica...
Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service.
Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 ===
Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica...
Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service.
Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal

So it seems to me that at the first start (some) OSDs fail to start. But, again, PVE and 'ceph status' report all OSDs as up&in.

Thanks.

--
dott. Marco Gaiarin     GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia''     http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

From sysadmin-pve at cognitec.com Tue Nov 29 14:40:44 2016
From: sysadmin-pve at cognitec.com (Alwin Antreich)
Date: Tue, 29 Nov 2016 14:40:44 +0100
Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-)
In-Reply-To: <20161129111744.GL3355@sv.lnf.it>
References: <20161128120511.GJ3348@sv.lnf.it>
 <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com>
 <20161128143141.GQ3348@sv.lnf.it>
 <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com>
 <20161129111744.GL3355@sv.lnf.it>
Message-ID: <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com>

Hi Marco,

On 11/29/2016 12:17 PM, Marco Gaiarin wrote:
> Hi, Alwin Antreich!
> On that day you wrote...
>
>> May you please show us the logs?
>
> Ok, I'm here. With the log.
>
> A bit of legend: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes
> (mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (only
> mon). In order, mon.0, mon.1, mon.2 and mon.3.
>
> These are the logs of 10.27.251.7 (mon.0); it seems to me that the ceph logs
> are all similar, so I hope these suffice.
>
>
> I started my activity at 15.00, but before taking down all the stuff
> I P2V'd my last server, my Asterisk PBX box. Clearly, the cluster worked:
>
> [...]
> 2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts
> 2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok
> 2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s
> 2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s
> [...]
> 2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s
> 2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
>
>
> Having finished the P2V, I started to power off the cluster, starting from
> the cpu nodes. After powering down a node, I realized that I needed it
> to do another thing, so I powered it back on. ;-)
>
> 2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
> 2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
> 2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
> 2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s
> 2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s
> [...]
> 2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s > 2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s > 2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts > 2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok > 2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts > 2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok > 2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s > 2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s > 2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized > 2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s > [...] > 2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s > 2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts > 2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok > 2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s > [...] 
> 2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s > 2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s > 2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s > [...] > 2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s > 2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts > 2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok > 2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts > 2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok > 2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s > [...] > 2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s > 2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s > [...] > 2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s > 2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s > 2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s > [...] > 2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s > 2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized > 2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s > [...] 
> 2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s > 2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s > [...] > 2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s > 2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s > 2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s > [...] > 2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s > > OK, here server was shut down and so logs stop. > > > At power up, i got as sayed clock skew troubles, so i got status > HEALTH_WARN: > > 2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized > 2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized > 2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 > 2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 > 2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s > 2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s > 2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized > 2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized > 2016-11-26 
18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s > 2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s > 2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized > 2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized > 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected > 2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s > 2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s > 2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized > 2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK > 2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s > 2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s > 2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s > 2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized > 2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s > 2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s > 2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s > 2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized > 2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] 
message from mon.0 was stamped 0.349322s in the future, clocks not synchronized > 2016-11-26 18:19:09.736830 mon.1 10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized > 2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected > 2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s > 2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s > 2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s > 2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized > 2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s > 2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s > 2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s > 2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s > 2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s > 2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s > 2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s > 2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s > 2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s > 2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s > 2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s > 2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s > 2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized > 2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized > 2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized > 2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s > 2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s > 2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s > 2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:33:20.635816 mon.1 
10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s > 2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s > 2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s > 2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s > 2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s > 2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s > 2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s > 2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s > 2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s > 2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s > 2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s > 2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s > 2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized > 2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 
won leader election with quorum 0,1,2,3 > 2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK > 2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s > 2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s > 2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s > 2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized > 2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s > 2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s > 2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s > 2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized > 2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected > 2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s > 2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s > 2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s > 2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized > 2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s > 2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s > 2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s > 2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s > 2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s > 2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s > 2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s > 2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 
0.394627s > max 0.05s
> 2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s
> 2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.298756s > max 0.05s
> 2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s
> 2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s
> 2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized
> 2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s
> 2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s
> 2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s
> 2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected
> 2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s
> 2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s
> 2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s
> 2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s
> 2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s
> 2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s
>
> But after adding the new NTP server and waiting some time, the clocks finally
> got in sync and the status went back to OK.
> But (this is the PANIC time) despite the fact that 'ceph status' and the
> PVE interface said 'all OK', the cluster did not work.
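
For reference, a minimal sketch of the two steps touched on here and in the quoted text further below, assuming the standard ntp and Ceph command-line tools are available on every node (commands only, to be adapted to the actual setup):

  # check time sync and any remaining monitor clock skew before trusting HEALTH_OK
  ntpq -p               # peer offsets should be well below the 0.05s mon threshold
  ceph health detail    # lists any "mon.X ... clock skew" warnings still present

  # before rebooting mon/OSD nodes, keep the cluster from reacting to them going away
  ceph osd set noout
  ceph osd set nodown
  # ... reboot the node(s), wait until 'ceph -s' shows all OSDs up and in again ...
  ceph osd unset nodown
  ceph osd unset noout
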
> > So i've started to reboot the CPU nodes (mon.2 and .3): > > 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK > 2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK > 2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 > 2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 > 2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : 
cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK > 2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized > 2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK > 2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s > 2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 
19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:47:27.462603 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized > 2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s > 2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized > 2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected > 2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK > > > With no luck. So finally i've set 'nodown' and 'noout' flags and > rebooted the storage nodes (mon.0 ad .1). And suddenly all get back as > normal: > > 2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in > 2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set > 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3 > 2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set > 2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s > 2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s > 2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s > 2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized > 2016-11-26 
20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized > 2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s > 2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s > 2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s > 2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in > 2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot > 2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot > 2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in > 2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in > 2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized > 2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s > 2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s > 2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected > 2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs > 2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs > 2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd 
[stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 secs > 2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs > 2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set > 2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs > 2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs > 2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set > 2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3 > 2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3 > 2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 
up, 6 in > 2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs > 2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs > 2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set > 2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in > 2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s > 2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized > 2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in > 2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot > 2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in > 2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 
stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in > 2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot > 2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in > 2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in > 2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s > 2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s > 2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in > 2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s > 2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts > 2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok > 2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot > 2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in > 2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s > 2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot > 2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in > 2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts > 2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok > 2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts > 2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok > 2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts > 2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok > 2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts > 2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 
active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s > 2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in > 2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s > 2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s > 2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s > 2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s > 2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s > 2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts > 2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok > 2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s > [...] > 2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s > 2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok > 2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s > [...] > 2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s > 2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected > 2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s > [...] 
> 2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s > 2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set > 2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s > [...] > 2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s > 2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in > 2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s > [...] > 2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s > 2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in > 2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s > [...] > 2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s > 2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK > > > Other things to note. In syslog (not ceph log) of mon.0 I've found for > the first (failed) boot: > > Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 === > Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica... > Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service. > Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
> Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 === > Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700 0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault > Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6 > Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault > Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault > Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault > Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default' > Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1 > Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 === > Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault > Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault > Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault > Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault > Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault > Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default' > Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1 > Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate > > And for the second (working): > > Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 === > Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica... > Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... 
> Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service. > Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica... > Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6 > Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 === > Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map > Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica... > Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service. > Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal > Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 === > Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map > Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica... > Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service. > Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal > > > So it seems to me that at the first start (some) OSDs fail to start. But, > again, PVE and 'ceph status' report all OSDs as up&in. What does the following command give you? ceph osd pool get min_size > > > Thanks. > As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to happen. And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again and ceph needs its time, till all services are running. -- Cheers, Alwin From gaio at sv.lnf.it Tue Nov 29 15:05:14 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Tue, 29 Nov 2016 15:05:14 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> Message-ID: <20161129140514.GR3355@sv.lnf.it> Hello! Alwin Antreich, in that message you wrote... > What does the following command give you?
> ceph osd pool get min_size root at capitanamerica:~# ceph osd pool get DATA min_size min_size: 1 root at capitanamerica:~# ceph osd pool get VM min_size min_size: 1 root at capitanamerica:~# ceph osd pool get LXC min_size min_size: 1 > As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to > happen. Ahem, not so unlikely... we have UPSes but not diesel generators... ;-( > And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again > and ceph needs its time, till all services are running. This is not the case. I've started the nodes one by one... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Tue Nov 29 17:26:25 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Tue, 29 Nov 2016 17:26:25 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161129140514.GR3355@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> <20161129140514.GR3355@sv.lnf.it> Message-ID: Hi Marco, On 11/29/2016 03:05 PM, Marco Gaiarin wrote: > Hello! Alwin Antreich, > in that message you wrote... > >> What does the following command give you? >> ceph osd pool get min_size > > root at capitanamerica:~# ceph osd pool get DATA min_size > min_size: 1 > root at capitanamerica:~# ceph osd pool get VM min_size > min_size: 1 > root at capitanamerica:~# ceph osd pool get LXC min_size > min_size: 1 The min_size 1 means in a degraded state, ceph serves the data as long as one copy is available. > > >> As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to >> happen. > > Ahem, not so unlikely... we have UPSes but not diesel generators... ;-( If they shut down cleanly, then it shouldn't be a problem, as far as I have tested it myself. > > >> And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again > and ceph needs its time, till all services are running. > > This is not the case. I've started the nodes one by one... > I don't see this behavior on our test cluster, when we shut down all hosts and start them up at a later time. -- Cheers, Alwin From f.rust at sec.tu-bs.de Wed Nov 30 09:06:30 2016 From: f.rust at sec.tu-bs.de (F.Rust) Date: Wed, 30 Nov 2016 09:06:30 +0100 Subject: [PVE-User] Webfrontent View Message-ID: Hi all, I've been using Proxmox 4.2 for a while now and am quite satisfied. But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future? Thanks for any help, Frank From gaio at sv.lnf.it Wed Nov 30 09:36:47 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 30 Nov 2016 09:36:47 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC?
;-) In-Reply-To: References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> <20161129140514.GR3355@sv.lnf.it> Message-ID: <20161130083647.GC3213@sv.lnf.it> Hello! Alwin Antreich, in that message you wrote... > The min_size 1 means in a degraded state, ceph serves the data as long as one copy is available. Yes, I know. > If they shut down cleanly, then it shouldn't be a problem, as far as I have tested it myself. [...] > I don't see this behavior on our test cluster, when we shut down all hosts and start them up at a later time. Boh. It's strange... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From t.lamprecht at proxmox.com Wed Nov 30 10:22:01 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 30 Nov 2016 10:22:01 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: References: Message-ID: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> Hi, On 11/30/2016 09:06 AM, F.Rust wrote: > Hi all, > > I've been using Proxmox 4.2 for a while now and am quite satisfied. > But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future?
> > You can resize the tree pane again and the bottom log window can be toggled, > with the small bar with the triangle on it at the bottom, see: > https://www.pictshare.net/316ece0142.png > > If that does not help can you please upload a screenshot and send > the link as a reply so we can see what's going on :) > > cheers, > Thomas > >> >> Thanks for any help, >> Frank >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > Frank Rust ------------------------------------------------------------------------ Frank Rust Technische Universität Braunschweig Fon: 0531 39155122 Institut für Systemsicherheit Fax: 0531 39155130 Rebenring 56 Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig From t.lamprecht at proxmox.com Wed Nov 30 10:40:46 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 30 Nov 2016 10:40:46 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: <5E754049-E685-4E4E-8960-4A0A9F2AAC01@sec.tu-bs.de> References: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> <5E754049-E685-4E4E-8960-4A0A9F2AAC01@sec.tu-bs.de> Message-ID: On 11/30/2016 10:34 AM, F.Rust wrote: > Here the requested screenshot. > > https://www.pictshare.net/2477973227.png > > You can see there are no resize handles... The one for the Log Panel below is there but yes, the left tree seems to be missing... Did you try a force reload, which should empty the cache: CTRL + SHIFT + R Else I could imagine that an add-on is interfering here, maybe you accidentally did a "right click + Block Element" on the tree panel so that your ad blocker blocks it, just shooting in the dark here :) cheers, Thomas > >> On 30.11.2016 at 10:22, Thomas Lamprecht wrote: >> >> Hi, >> >> On 11/30/2016 09:06 AM, F.Rust wrote: >>> Hi all, >>> >>> I've been using Proxmox 4.2 for a while now and am quite satisfied. >>> But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future?
>> You can resize the tree pane again and the bottom log window can be toggled, >> with the small bar with the triangle on it at the bottom, see: >> https://www.pictshare.net/316ece0142.png >> >> If that does not help can you please upload a screenshot and send >> the link as a reply so we can see what's going on :) >> >> cheers, >> Thomas >> >>> Thanks for any help, >>> Frank >>> >>> >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> Frank Rust ------------------------------------------------------------------------ Frank Rust Technische Universität Braunschweig Fon: 0531 39155122 Institut für Systemsicherheit Fax: 0531 39155130 Rebenring 56 Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig From regis.houssin at inodbox.com Wed Nov 30 10:42:45 2016 From: regis.houssin at inodbox.com (=?UTF-8?Q?R=c3=a9gis_Houssin?=) Date: Wed, 30 Nov 2016 10:42:45 +0100 Subject: [PVE-User] New VM created after 4.3 upgrade not start ! Message-ID: Hi, after upgrade proxmox with the latest 4.3, I have an error message when starting a new VM : (the VMs created before the update work fine)
> kvm: -drive file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-502-disk-1/0': No such file or directory
> TASK ERROR: start failed: command '/usr/bin/kvm -id 502 -chardev 'socket,id=qmp,path=/var/run/qemu-server/502.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/502.pid -daemonize -smbios 'type=1,uuid=7eab7942-fcaf-48b6-94ac-bad24087e609' -name srv1.happylibre.fr -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/502.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k fr -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:3681fcbb6821' -drive 'file=/var/lib/vz/template/iso/debian-8.4.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap502i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5A:73:0D:7E:E9:C5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1
the volume and resource "vm-502-disk-1" is ok, but it does not appear with "drbdsetup show", and it does not appear with "drbd-overview" !! what is the problem please ?
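A few checks that may help narrow down why the by-res device is missing even though the volume was created. This is only a sketch and assumes the storage is managed by drbdmanage, as the stock PVE 4.x DRBD9 plugin does; the subcommand names are from drbdmanage 0.9x and may differ on other versions:

    drbdmanage list-nodes           # are all nodes online, with no pending actions?
    drbdmanage list-resources       # does vm-502-disk-1 exist as a resource at all?
    drbdmanage list-assignments     # is the resource assigned/deployed to this node?
    drbdadm status vm-502-disk-1    # runtime state of the resource on this node
    drbdadm adjust vm-502-disk-1    # re-apply the config if the device was never brought up

If the resource is known to drbdmanage but not assigned to the node the VM starts on, a missing /dev/drbd/by-res path would be the expected symptom.
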
thanks Best regards, -- Régis Houssin --------------------------------------------------------- iNodbox (Cap-Networks) 5, rue Corneille 01000 BOURG EN BRESSE FRANCE VoIP: +33 1 83 62 40 03 GSM: +33 6 33 02 07 97 Email: regis.houssin at inodbox.com Web: https://www.inodbox.com/ Development: https://git.framasoft.org/u/inodbox/ Translation: https://www.transifex.com/inodbox/ --------------------------------------------------------- From f.rust at sec.tu-bs.de Wed Nov 30 10:58:26 2016 From: f.rust at sec.tu-bs.de (F.Rust) Date: Wed, 30 Nov 2016 10:58:26 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: <30893B7D-8A98-48B5-AE61-B7131D5C52D6@tu-bs.de> References: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> <30893B7D-8A98-48B5-AE61-B7131D5C52D6@tu-bs.de> Message-ID: You are right! Thanks a lot. BOTH were there, but invisible (at least for me). Under regular circumstances it is impossible to move these drawers to those extreme positions. I have no idea how it could happen (and stay persistent during different user sessions). Best regards, Frank > On 30.11.2016 at 10:33, F. Rust wrote: > > Here the requested screenshot. > > https://www.pictshare.net/2477973227.png > > You can see there are no resize handles... > > >> On 30.11.2016 at 10:22, Thomas Lamprecht wrote: >> >> Hi, >> >> On 11/30/2016 09:06 AM, F.Rust wrote: >>> Hi all, >>> >>> I've been using Proxmox 4.2 for a while now and am quite satisfied. >>> But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future? >> >> You can resize the tree pane again and the bottom log window can be toggled, >> with the small bar with the triangle on it at the bottom, see: >> https://www.pictshare.net/316ece0142.png >> >> If that does not help can you please upload a screenshot and send >> the link as a reply so we can see what's going on :) >> >> cheers, >> Thomas >> >>> >>> Thanks for any help, >>> Frank >>> >>> >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > Frank Rust > > ------------------------------------------------------------------------ > Frank Rust Technische Universität Braunschweig > > Fon: 0531 39155122 Institut für Systemsicherheit > Fax: 0531 39155130 Rebenring 56 > Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mark at openvs.co.uk Wed Nov 30 19:10:55 2016 From: mark at openvs.co.uk (Mark Adams) Date: Wed, 30 Nov 2016 18:10:55 +0000 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD In-Reply-To: <20161123214006.26e4e9a9@sleipner.datanom.net> References: <20161123214006.26e4e9a9@sleipner.datanom.net> Message-ID: Hi, Thanks for the response. I was planning on using active/backup bonding on 10GbE for my network fault tolerance, so no multipath support shouldn't be an issue.
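For the active/backup bond mentioned above, a minimal ifupdown sketch for a dedicated storage link (interface names and the address are placeholders, not taken from this thread):

    auto bond0
    iface bond0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode active-backup
        bond-miimon 100
        bond-primary eth2
        # storage/iSCSI traffic only; keep corosync on its own links
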
I've come across some strange behaviour with the IET provider though, in that after 9 LUNs it starts changing the existing LUNs rather than adding additional ones. Hard disk config in proxmox for the VM:
Hard Disk (virtio0) ZFSOVERISCSI:vm-112-disk-1,size=10G
Hard Disk (virtio1) ZFSOVERISCSI:vm-112-disk-2,size=10G
Hard Disk (virtio2) ZFSOVERISCSI:vm-112-disk-3,size=10G
Hard Disk (virtio3) ZFSOVERISCSI:vm-112-disk-4,size=10G
Hard Disk (virtio4) ZFSOVERISCSI:vm-112-disk-5,size=10G
Hard Disk (virtio5) ZFSOVERISCSI:vm-112-disk-6,size=10G
Hard Disk (virtio6) ZFSOVERISCSI:vm-112-disk-7,size=10G
Hard Disk (virtio7) ZFSOVERISCSI:vm-112-disk-8,size=10G
Hard Disk (virtio8) ZFSOVERISCSI:vm-112-disk-9,size=10G
Hard Disk (virtio9) ZFSOVERISCSI:vm-112-disk-10,size=10G
ietd.conf file on the zfs/iscsi storage host:
Lun 0 Path=/dev/VMSTORE/vm-112-disk-1,Type=blockio
Lun 1 Path=/dev/VMSTORE/vm-112-disk-2,Type=blockio
Lun 2 Path=/dev/VMSTORE/vm-112-disk-3,Type=blockio
Lun 3 Path=/dev/VMSTORE/vm-112-disk-4,Type=blockio
Lun 4 Path=/dev/VMSTORE/vm-112-disk-6,Type=blockio
Lun 5 Path=/dev/VMSTORE/vm-112-disk-7,Type=blockio
Lun 6 Path=/dev/VMSTORE/vm-112-disk-8,Type=blockio
Lun 7 Path=/dev/VMSTORE/vm-112-disk-9,Type=blockio
Lun 8 Path=/dev/VMSTORE/vm-112-disk-10,Type=blockio
As you can see, "disk-5" is missing since I added "disk-10". Is anyone using ZFS over iSCSI with IET? Have you seen this behaviour? Thanks, Mark On 23 November 2016 at 20:40, Michael Rasmussen wrote: > On Wed, 23 Nov 2016 09:40:55 +0000 > Mark Adams wrote: > > > > > Has anyone else tried to get this or a similar setup working? Any views > greatly received. > > What you are trying to achieve is not a good idea with > corosync/pacemaker since iSCSI is a block device. To create a cluster > over a LUN will require a cluster-aware filesystem like NFS, CIFS etc. > The proper way of doing this with iSCSI would be using multipath to a > SAN since iSCSI LUNs cannot be shared. Unfortunately the current > implementation of ZFS over iSCSI does not support multipath (a > limitation in libiscsi). Also may I remind you that IET development has > stopped in favor of LIO targets (http://linux-iscsi.org/wiki/LIO). I am > currently working on making an implementation of LIO for proxmox which > will use a different architecture than the current ZFS over iSCSI > implementation. The new implementation will support multipath. As this > is developed in my spare time, progress is not as high as it could be. > > Alternatively you could look at this: > http://www.napp-it.org/doc/downloads/z-raid.pdf > > -- > Hilsen/Regards > Michael Rasmussen > > Get my public GnuPG keys: > michael rasmussen cc > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E > mir datanom net > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C > mir miras org > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 > -------------------------------------------------------------- > /usr/games/fortune -es says: > The computer should be doing the hard work. That's what it's paid to > do, after all. > -- Larry Wall in <199709012312.QAA08121 at wall.org> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >
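A rough way to see what the running target actually exports, as opposed to what the plugin wrote into the config, is to compare the kernel-side view with ietd.conf and the zvols. This is only a sketch, assuming the stock iscsitarget (IET) proc interface and the pool/volume names from the listing above; the config path may differ between distributions:

    # LUNs the kernel target currently exports
    cat /proc/net/iet/volume

    # LUN -> backing device mapping as written to the config
    grep -E '^[[:space:]]*Lun ' /etc/iet/ietd.conf

    # zvols that actually exist for the VM
    ls -l /dev/VMSTORE/ | grep vm-112

    # flag any vm-112 zvol that no Lun line references any more
    for d in /dev/VMSTORE/vm-112-disk-*; do
        grep -q "Path=$d," /etc/iet/ietd.conf || echo "not exported: $d"
    done
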