From t.lamprecht at proxmox.com Thu Nov 10 11:39:53 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 10 Nov 2016 11:39:53 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> Message-ID: On 11/09/2016 11:46 PM, Dhaussy Alexandre wrote: > I had yet another outage... > BUT now everything is back online! yay! > > So I think I had (at least) two problems: > > 1 - When installing/upgrading a node. > > If the node sees all SAN storage LUNs before the install, the Debian > partitioner tries to scan all LUNs.. > This causes almost all nodes to reboot (not sure why; maybe it causes > latency in the LVM cluster, or a problem with a lock somewhere..) > > The same thing happens when f*$king os_prober kicks in on a kernel upgrade. > It scans all LVs and causes node reboots. So now I make sure of this in > /etc/default/grub => GRUB_DISABLE_OS_PROBER=true Yes, os-prober is _bad_ and may even corrupt some filesystems under some conditions, AFAIK. The Proxmox VE ISO does not ship it for this reason. > > 2 - There seems to be a bug in the LRM. > > Tonight I saw timeouts of qmstart tasks in /var/log/pve/tasks/active. > Just after the timeouts, the LRM was kind of stuck, doing nothing. If it's doing nothing it would be interesting to see which state it is in. Because if it's already online and active, the watchdog must trigger if it is stuck for ~60 seconds or more. > Services began to start again after I restarted the service; anyway, a > few seconds later the nodes got fenced. Hmm, this means the watchdog had already run out. > I think the timeouts are due to a bottleneck in our storage switches; I > have a few messages like this: > > Nov 9 22:34:40 proxmoxt25 kernel: [ 5389.318716] qla2xxx > [0000:08:00.1]-801c:2: Abort command issued nexus=2:2:28 -- 1 2002. > Nov 9 22:34:41 proxmoxt25 kernel: [ 5390.482259] qla2xxx > [0000:08:00.1]-801c:2: Abort command issued nexus=2:1:28 -- 1 2002. > > So when all nodes rebooted, I may have hit the bottleneck, then the LRM > bug, and all HA services were frozen... (happened several times.) Yeah, I looked a bit through the logs of two of your nodes, and it looks like the system hit quite a few bottlenecks. The CRM/LRM often run into 'loop took too long' errors, and the filesystem is sometimes not writable. Some of your logs also show huge corosync retransmit lists. Where does your cluster communication happen - not on the storage network? A few general hints: The HA stack does not like it when somebody moves the config of a VM that is in the started/migrate state. If it's in the stopped state that's OK, as there it can fix up the VM location. Otherwise it cannot simply fix up the location, as it does not know whether the resource still runs on the (old) node. Modifying the manager status does not work if a manager is currently elected. The manager reads it only on its transition from slave to master, to get the last state into memory. After that it only writes it out, so that on a master re-election the new master has the most current state.
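For reference, the state described above can be inspected directly - a minimal sketch using stock PVE 4.x commands (adjust to your own setup):

  ha-manager status                         # CRM/LRM view of all HA services and the current manager
  cat /etc/pve/ha/manager_status            # raw state file the elected master writes out
  systemctl status pve-ha-crm pve-ha-lrm    # check that both HA daemons are actually running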
So if something bad like this happens again I'd do the following: If no master election happens, but there is a quorate partition of nodes and you are sure that their pve-ha-crm service is up and running (else restart it first), you can try to trigger an instant master re-election by deleting the old master's lock (which may not yet have become invalid through timeout): rmdir /etc/pve/priv/lock/ha_manager_lock/ If a master election then happens you should be fine and the HA stack will do its work and recover. If you have to move the VMs you should disable them first; 'ha-manager disable SID' also works quite well in a lot of problematic situations, as it just edits the resources.cfg. If this does not work you have no quorum or pve-cluster has a problem, and both mean HA recovery cannot take place on this node one way or the other. > > Thanks again for the help. > Alexandre. > > On 09/11/2016 at 20:54, Thomas Lamprecht wrote: >> >> On 09.11.2016 18:05, Dhaussy Alexandre wrote: >>> I have done a cleanup of resources with echo "" > >>> /etc/pve/ha/resources.cfg >>> >>> It seems to have resolved all problems with the inconsistent status of >>> lrm/crm in the GUI. >>> >> Good. Logs would be interesting to see what went wrong, but I do not >> know if I can skim through them, as your setup is not too small and there >> may be much noise from the outage in there. >> >> If you have time you may send me the log file(s) generated by: >> >> journalctl --since "-2 days" -u corosync -u pve-ha-lrm -u pve-ha-crm >> -u pve-cluster > pve-log-$(hostname).log >> >> (adapt the "-2 days" accordingly, it also understands something like >> "-1 day 3 hours") >> >> Send them directly to my address (the list does not accept bigger >> attachments, >> the limit is something like 20-20 kb AFAIK). >> I cannot promise any deep examination, but I can skim through them and >> look at what happened in the HA stack; maybe I see something obvious. >> >>> A new master has been elected. The manager_status file has been >>> cleaned up. >>> All nodes are idle or active. >>> >>> I am re-starting all VMs in HA with "ha-manager add". >>> Seems to work now... :-/ >>> >>> On 09/11/2016 at 17:40, Dhaussy Alexandre wrote: >>>> Sorry, my old message was too big... >>>> >>>> Thanks for the input !... >>>> >>>> I have attached the manager_status files. >>>> .old is the original file, and .new is the file I have modified and put >>>> in /etc/pve/ha. >>>> >>>> I know this is bad, but here's what I've done : >>>> >>>> - delnode on known NON-working nodes. >>>> - rm -Rf /etc/pve/nodes/x for all NON-working nodes. >>>> - replace all NON-working nodes with working nodes in >>>> /etc/pve/ha/manager_status >>>> - mv VM.conf files into the proper node directory >>>> (/etc/pve/nodes/x/qemu-server/) in reference to >>>> /etc/pve/ha/manager_status >>>> - restart pve-ha-crm and pve-ha-lrm on all nodes >>>> >>>> Now on several nodes I have those messages : >>>> >>>> nov. 09 17:08:19 proxmoxt34 pve-ha-crm[26200]: status change startup => >>>> wait_for_quorum >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Noeud final de transport n'est pas connecté >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Connexion refusée >>>> nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: >>>> Connexion refusée >>>> >> >> This means that something with the cluster filesystem (pve-cluster) >> was not OK. >> Those messages weren't there previously? >> >> >>>> nov.
09 17:08:22 proxmoxt34 pve-ha-lrm[26282]: status change startup => >>>> wait_for_agent_lock >>>> nov. 09 17:12:07 proxmoxt34 pve-ha-lrm[26282]: ipcc_send_rec failed: >>>> Noeud final de transport n'est pas connecté >>>> >>>> We are also investigating a possible network problem.. >>>> >> Is multicast working properly? >> >> >>>> On 09/11/2016 at 17:00, Thomas Lamprecht wrote: >>>>> Hi, >>>>> >>>>> On 09.11.2016 16:29, Dhaussy Alexandre wrote: >>>>>> I try to remove them from HA in the GUI, but nothing happens. >>>>>> There are some services in "error" or "fence" state. >>>>>> >>>>>> Now I tried to remove the non-working nodes from the cluster... but I >>>>>> still see those nodes in /etc/pve/ha/manager_status. >>>>> Can you post the manager status please? >>>>> >>>>> Also, are pve-ha-lrm and pve-ha-crm up and running without any error >>>>> on all nodes, at least on those in the quorate partition? >>>>> >>>>> check with: >>>>> systemctl status pve-ha-lrm >>>>> systemctl status pve-ha-crm >>>>> >>>>> If not, restart them, and if it's still problematic then please post the >>>>> output >>>>> of the systemctl status call (if it's the same on all nodes, one output >>>>> should be enough). >>>>> >>>>> >>>>>> On 09/11/2016 at 16:13, Dietmar Maurer wrote: >>>>>>>> I wanted to remove the VMs from HA and start them locally, but I >>>>>>>> can't even do >>>>>>>> that (nothing happens.) >>>>> You can remove them from HA by emptying the HA resource file (this >>>>> also deletes >>>>> comments and group settings, but if you need to start them _now_ that >>>>> shouldn't be a problem): >>>>> >>>>> echo "" > /etc/pve/ha/resources.cfg >>>>> >>>>> Afterwards you should be able to start them manually. >>>>> >>>>> >>>>>>> How do you do that exactly (in the GUI)? You should be able to start >>>>>>> them >>>>>>> manually afterwards. >>>>>>> >>>>>> _______________________________________________ >>>>>> pve-user mailing list >>>>>> pve-user at pve.proxmox.com >>>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>>>> >>>>> _______________________________________________ >>>>> pve-user mailing list >>>>> pve-user at pve.proxmox.com >>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>>> _______________________________________________ >>>> pve-user mailing list >>>> pve-user at pve.proxmox.com >>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >>> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Thu Nov 10 21:34:56 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 11 Nov 2016 06:34:56 +1000 Subject: [PVE-User] online migration broken in latest updates - "unknown command 'mtunnel'" Message-ID: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> qm migrate 506 vnb --online 400 Parameter verification failed. target: target is local node.
qm migrate [OPTIONS] root at vnb:/etc/pve/softlog# qm migrate 506 vng --online ERROR: unknown command 'mtunnel' -- Lindsay Mathieson From t.lamprecht at proxmox.com Thu Nov 10 22:11:46 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 10 Nov 2016 22:11:46 +0100 Subject: [PVE-User] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> Message-ID: <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> On 10.11.2016 21:34, Lindsay Mathieson wrote: > qm migrate 506 vnb --online > 400 Parameter verification failed. > target: target is local node. > qm migrate [OPTIONS] > root at vnb:/etc/pve/softlog# qm migrate 506 vng --online > ERROR: unknown command 'mtunnel' > > Are you sure you upgraded all, i.e. used: apt update apt full-upgrade or apt-get update apt-get dist-upgrade Can you post: pveversion -v From lindsay.mathieson at gmail.com Thu Nov 10 22:35:37 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Fri, 11 Nov 2016 07:35:37 +1000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > Are you sure you upgraded all, i.e. used: > apt update > apt full-upgrade Resolved it thanks Thomas - I hadn't updated the *destination* server. Thanks, -- Lindsay Mathieson From lists at hexis.consulting Thu Nov 10 22:53:55 2016 From: lists at hexis.consulting (Hexis) Date: Thu, 10 Nov 2016 15:53:55 -0600 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi Message-ID: I am trying to run Proxmox PVE 4.3 inside of VMware ESXi, which I was advised would work (obviously issues would occur with KVM). All has gone well so far and containers run fine; however, for some reason, the containers cannot reach their gateway when routing through the Linux bridge, which corresponds to an interface on the VM. The management interface of Proxmox, which works the same way, is fine. Any ideas? Thanks, -Hexis From t.lamprecht at proxmox.com Fri Nov 11 08:05:42 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 11 Nov 2016 08:05:42 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> Are you sure you upgraded all, i.e. used: >> apt update >> apt full-upgrade > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > Makes sense - this should have been mentioned a few days ago, it would not have been too hard to catch :/ anyway, for anyone reading this: When upgrading qemu-server to version 4.0.93 or newer you should upgrade all other nodes' pve-cluster package to version 4.0-47 or newer, else migrations to those nodes will not work - as we use a new command to detect if we should send the traffic over a separate migration network.
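A quick way to verify this across a cluster before migrating - a rough sketch only, assuming root SSH between the nodes (node names below are placeholders):

  for n in pve1 pve2 pve3; do
      echo "== $n =="
      ssh root@$n "pveversion -v | grep -E 'qemu-server|pve-cluster'"
  done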
cheers, Thomas From colonellor at gmail.com Fri Nov 11 08:48:00 2016 From: colonellor at gmail.com (Roberto Colonello) Date: Fri, 11 Nov 2016 08:48:00 +0100 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: References: Message-ID: On Thu, Nov 10, 2016 at 10:53 PM, Hexis wrote: > > Any ideas? Ciao, have you tried to set "Promiscuos mode: Accept" into vSwitch's Security tab ? -- /roby.deb -- "There are only 10 types of people in the world:Those who understand binary, and those who don't" SOFTWARE is like SEX IT's better when it's FREE https://linuxcounter.net/ Counter Number: 552671 Favorite Distro : Debian From yannis.milios at gmail.com Fri Nov 11 13:11:27 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Fri, 11 Nov 2016 12:11:27 +0000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: Not sure if it's related, but after upgrading yesterday to the latest updates, Ceph snapshots take a very long time to complete and finally they fail. This happens only if the VM is running and if I check the 'include RAM' box in snapshot window. All 3 pve/ceph nodes are upgraded to the latest updates. I have 3 pve nodes with ceph storage role on them. Below follows some more info: proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) pve-kernel-4.4.21-1-pve: 4.4.21-71 pve-kernel-4.4.19-1-pve: 4.4.19-66 lvm2: 2.02.116-pve3 corosync-pve: 2.4.0-1 libqb0: 1.0-1 pve-cluster: 4.0-47 qemu-server: 4.0-94 pve-firmware: 1.1-10 libpve-common-perl: 4.0-80 libpve-access-control: 4.0-19 libpve-storage-perl: 4.0-68 pve-libspice-server1: 0.12.8-1 vncterm: 1.2-1 pve-docs: 4.3-14 pve-qemu-kvm: 2.7.0-6 pve-container: 1.0-81 pve-firewall: 2.0-31 pve-ha-manager: 1.0-35 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 2.0.5-1 lxcfs: 2.0.4-pve2 criu: 1.6.0-1 novnc-pve: 0.5-8 smartmontools: 6.5+svn4324-1~pve80 zfsutils: 0.6.5.8-pve13~bpo80 openvswitch-switch: 2.5.0-1 ceph: 0.94.9-1~bpo80+1 ceph status cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 health HEALTH_OK monmap e3: 3 mons at {0= 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} election epoch 260, quorum 0,1,2 0,1,2 osdmap e740: 6 osds: 6 up, 6 in pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects 393 GB used, 2183 GB / 2576 GB avail 120 active+clean client io 4973 B/s rd, 115 kB/s wr, 35 op/s On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht wrote: > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> >>> Are you sure you upgraded all, i.e. used: >>> apt update >>> apt full-upgrade >>> >> >> Resolved it thanks Thomas - I hadn't updated the *destination* server. >> >> > > makes sense, should have been made sense a few days ago this, would not be > too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to detect > if we should send the traffic over a separate migration network. 
> > cheers, > Thomas > > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From w.bumiller at proxmox.com Fri Nov 11 13:28:06 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 11 Nov 2016 13:28:06 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: <20161111122806.GA13820@olga.wb> Any chance you could compare pve-qemu-kvm 2.7.0-5 and this test build: ? On Fri, Nov 11, 2016 at 12:11:27PM +0000, Yannis Milios wrote: > Not sure if it's related, but after upgrading yesterday to the latest > updates, Ceph snapshots take a very long time to complete and finally they > fail. > This happens only if the VM is running and if I check the 'include RAM' box > in snapshot window. All 3 pve/ceph nodes are upgraded to the latest updates. > > I have 3 pve nodes with ceph storage role on them. Below follows some more > info: > > proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) > pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) > pve-kernel-4.4.21-1-pve: 4.4.21-71 > pve-kernel-4.4.19-1-pve: 4.4.19-66 > lvm2: 2.02.116-pve3 > corosync-pve: 2.4.0-1 > libqb0: 1.0-1 > pve-cluster: 4.0-47 > qemu-server: 4.0-94 > pve-firmware: 1.1-10 > libpve-common-perl: 4.0-80 > libpve-access-control: 4.0-19 > libpve-storage-perl: 4.0-68 > pve-libspice-server1: 0.12.8-1 > vncterm: 1.2-1 > pve-docs: 4.3-14 > pve-qemu-kvm: 2.7.0-6 > pve-container: 1.0-81 > pve-firewall: 2.0-31 > pve-ha-manager: 1.0-35 > ksm-control-daemon: 1.2-1 > glusterfs-client: 3.5.2-2+deb8u2 > lxc-pve: 2.0.5-1 > lxcfs: 2.0.4-pve2 > criu: 1.6.0-1 > novnc-pve: 0.5-8 > smartmontools: 6.5+svn4324-1~pve80 > zfsutils: 0.6.5.8-pve13~bpo80 > openvswitch-switch: 2.5.0-1 > ceph: 0.94.9-1~bpo80+1 > > ceph status > cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 > health HEALTH_OK > monmap e3: 3 mons at {0= > 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} > election epoch 260, quorum 0,1,2 0,1,2 > osdmap e740: 6 osds: 6 up, 6 in > pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects > 393 GB used, 2183 GB / 2576 GB avail > 120 active+clean > client io 4973 B/s rd, 115 kB/s wr, 35 op/s > > > > On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht > wrote: > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> > >>> Are you sure you upgraded all, i.e. used: > >>> apt update > >>> apt full-upgrade > >>> > >> > >> Resolved it thanks Thomas - I hadn't updated the *destination* server. > >> > >> > > > > makes sense, should have been made sense a few days ago this, would not be > > too hard to catch :/ > > > > anyway, for anyone reading this: > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > migrations to those nodes will not work - as we use a new command to detect > > if we should send the traffic over a separate migration network. 
> > > > cheers, > > Thomas > > > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-devel mailing list > pve-devel at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel From yannis.milios at gmail.com Fri Nov 11 13:45:16 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Fri, 11 Nov 2016 12:45:16 +0000 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <20161111122806.GA13820@olga.wb> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <20161111122806.GA13820@olga.wb> Message-ID: Just tested it with pve-qemu-kvm 2.7.0-6 and it works fine, thanks! On Fri, Nov 11, 2016 at 12:28 PM, Wolfgang Bumiller wrote: > Any chance you could compare pve-qemu-kvm 2.7.0-5 and this test build: > ? > > On Fri, Nov 11, 2016 at 12:11:27PM +0000, Yannis Milios wrote: > > Not sure if it's related, but after upgrading yesterday to the latest > > updates, Ceph snapshots take a very long time to complete and finally > they > > fail. > > This happens only if the VM is running and if I check the 'include RAM' > box > > in snapshot window. All 3 pve/ceph nodes are upgraded to the latest > updates. > > > > I have 3 pve nodes with ceph storage role on them. Below follows some > more > > info: > > > > proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) > > pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) > > pve-kernel-4.4.21-1-pve: 4.4.21-71 > > pve-kernel-4.4.19-1-pve: 4.4.19-66 > > lvm2: 2.02.116-pve3 > > corosync-pve: 2.4.0-1 > > libqb0: 1.0-1 > > pve-cluster: 4.0-47 > > qemu-server: 4.0-94 > > pve-firmware: 1.1-10 > > libpve-common-perl: 4.0-80 > > libpve-access-control: 4.0-19 > > libpve-storage-perl: 4.0-68 > > pve-libspice-server1: 0.12.8-1 > > vncterm: 1.2-1 > > pve-docs: 4.3-14 > > pve-qemu-kvm: 2.7.0-6 > > pve-container: 1.0-81 > > pve-firewall: 2.0-31 > > pve-ha-manager: 1.0-35 > > ksm-control-daemon: 1.2-1 > > glusterfs-client: 3.5.2-2+deb8u2 > > lxc-pve: 2.0.5-1 > > lxcfs: 2.0.4-pve2 > > criu: 1.6.0-1 > > novnc-pve: 0.5-8 > > smartmontools: 6.5+svn4324-1~pve80 > > zfsutils: 0.6.5.8-pve13~bpo80 > > openvswitch-switch: 2.5.0-1 > > ceph: 0.94.9-1~bpo80+1 > > > > ceph status > > cluster 32d19f44-fcef-4863-ad94-cb8d738fe179 > > health HEALTH_OK > > monmap e3: 3 mons at {0= > > 192.168.148.65:6789/0,1=192.168.149.95:6789/0,2=192.168.149.115:6789/0} > > election epoch 260, quorum 0,1,2 0,1,2 > > osdmap e740: 6 osds: 6 up, 6 in > > pgmap v2319446: 120 pgs, 1 pools, 198 GB data, 51642 objects > > 393 GB used, 2183 GB / 2576 GB avail > > 120 active+clean > > client io 4973 B/s rd, 115 kB/s wr, 35 op/s > > > > > > > > On Fri, Nov 11, 2016 at 7:05 AM, Thomas Lamprecht < > t.lamprecht at proxmox.com> > > wrote: > > > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > > > >> On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > > >> > > >>> Are you sure you upgraded all, i.e. used: > > >>> apt update > > >>> apt full-upgrade > > >>> > > >> > > >> Resolved it thanks Thomas - I hadn't updated the *destination* server. 
> > >> > > >> > > > > > > makes sense, should have been made sense a few days ago this, would not be > > > too hard to catch :/ > > > > > > anyway, for anyone reading this: > > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > > migrations to those nodes will not work - as we use a new command to detect > > > if we should send the traffic over a separate migration network. > > > > > > cheers, > > > Thomas > > > > > > > > > > > > > > > _______________________________________________ > > > pve-user mailing list > > > pve-user at pve.proxmox.com > > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > _______________________________________________ > > pve-devel mailing list > > pve-devel at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > > From ADhaussy at voyages-sncf.com Fri Nov 11 15:56:32 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 14:56:32 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> Message-ID: <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> I really hope to find an explanation for all this mess, because I'm not very confident right now.. So far, if I understand all this correctly, I'm not very fond of how the watchdog behaves with the CRM/LRM. To make a comparison with PVE 3 (Red Hat cluster), fencing happened on the corosync/cluster communication stack, but not on the resource manager stack. On PVE 3, several times I found rgmanager was stuck. I just had to find the culprit process (usually pve status), kill it, et voila. But it never caused an outage. > > 2 - There seems to be a bug in lrm. > > > > Tonight i have seen timeouts in qmstarts in /var/log/pve/tasks/active. > > Just after the timeouts, lrm was kind of stuck doing nothing. > > If it's doing nothing it would be interesting to see in which state it is. > Because if it's already online and active the watchdog must trigger if > it is stuck for ~60 seconds or more. I'll try to grab some info if it happens again. > Hmm, this means the watchdog was already running out. Do you have a hint why there are no messages in the logs when the watchdog actually seems to trigger fencing? Because when a node suddenly reboots, I can't be sure whether it's the watchdog, a hardware bug, a kernel bug or whatever.. > Yeah I looked a bit through logs of two of your nodes, it looks like the > system hit quite some bottle necks.. > CRM/LRM run often in 'loop took to long' errors the filesystem also is > sometimes not writable. > You have in some logs some huge retransmit list from corosync. Yes, there were many retransmits at "9 Nov 14:56". This matches when we tried to switch network paths, because at that time the nodes did not seem to talk to each other correctly (LRM waiting for quorum.) Anyway, I need to triple-check (again) IGMP snooping on all network switches, plus check the HP blade Virtual Connect modules and firmware.. > Where does your cluster communication happens, not on the storage > network? Storage is on Fibre Channel. Cluster communication happens on a dedicated network VLAN (shared with VMware.) I also use another VLAN for live migrations.
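For anyone double-checking the same multicast suspicion: the cluster network can be tested with omping, roughly as described in the PVE multicast notes - a sketch only, assuming omping is installed on every node and that the hostnames below (taken from this thread) resolve to the cluster VLAN addresses:

  apt-get install omping
  # run the same command on all nodes at roughly the same time
  omping -c 10000 -i 0.001 -F -q proxmoxt20 proxmoxt21 proxmoxt30
  # longer run (~10 minutes) to catch IGMP snooping/querier timeouts
  omping -c 600 -i 1 -q proxmoxt20 proxmoxt21 proxmoxt30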
From ADhaussy at voyages-sncf.com Fri Nov 11 16:28:09 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 15:28:09 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> Message-ID: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> > Do you have a hint why there is no messages in the logs when watchdog > actually seems to trigger fencing ? > Because when a node suddently reboots, i can't be sure if it's the watchdog, > a hardware bug, kernel bug or whatever.. Responding to myself, i find this interesting : Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt31 corosync[23483]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:40:01 proxmoxt31 watchdog-mux[22395]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt30 corosync[24634]: [TOTEM ] A new membership (10.xx.xx.11:684) was formed. Members joined: 13 Nov 8 10:40:00 proxmoxt30 watchdog-mux[23492]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt20 corosync[42543]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt20 corosync[42543]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt20 watchdog-mux[41401]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt21 corosync[16184]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt21 corosync[16184]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt21 watchdog-mux[42853]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt30 corosync[16159]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt30 corosync[16159]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt30 watchdog-mux[43148]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt31 corosync[16297]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt31 corosync[16297]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt31 watchdog-mux[42761]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt34 corosync[41330]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt34 corosync[41330]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. Members joined: 7 Nov 9 10:06:42 proxmoxt34 watchdog-mux[40262]: client watchdog expired - disable watchdog updates Nov 9 10:05:41 proxmoxt35 corosync[16158]: [TOTEM ] A new membership (10.xx.xx.11:796) was formed. Members left: 7 Nov 9 10:05:46 proxmoxt35 corosync[16158]: [TOTEM ] A new membership (10.xx.xx.11:800) was formed. 
Members joined: 7 Nov 9 10:06:42 proxmoxt35 watchdog-mux[42684]: client watchdog expired - disable watchdog updates From mir at miras.org Fri Nov 11 16:31:54 2016 From: mir at miras.org (Michael Rasmussen) Date: Fri, 11 Nov 2016 16:31:54 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: A long shot. Do you have a hardware watchdog enabled in bios? On November 11, 2016 4:28:09 PM GMT+01:00, Dhaussy Alexandre wrote: >> Do you have a hint why there is no messages in the logs when watchdog >> actually seems to trigger fencing ? >> Because when a node suddently reboots, i can't be sure if it's the >watchdog, >> a hardware bug, kernel bug or whatever.. > >Responding to myself, i find this interesting : > >Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired >- disable watchdog updates > >Nov 8 10:39:01 proxmoxt31 corosync[23483]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:40:01 proxmoxt31 watchdog-mux[22395]: client watchdog expired >- disable watchdog updates > >Nov 8 10:39:01 proxmoxt30 corosync[24634]: [TOTEM ] A new membership >(10.xx.xx.11:684) was formed. Members joined: 13 >Nov 8 10:40:00 proxmoxt30 watchdog-mux[23492]: client watchdog expired >- disable watchdog updates > > >Nov 9 10:05:41 proxmoxt20 corosync[42543]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt20 corosync[42543]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt20 watchdog-mux[41401]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt21 corosync[16184]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt21 corosync[16184]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt21 watchdog-mux[42853]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt30 corosync[16159]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt30 corosync[16159]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt30 watchdog-mux[43148]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt31 corosync[16297]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt31 corosync[16297]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt31 watchdog-mux[42761]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt34 corosync[41330]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt34 corosync[41330]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. 
Members joined: 7 >Nov 9 10:06:42 proxmoxt34 watchdog-mux[40262]: client watchdog expired >- disable watchdog updates > >Nov 9 10:05:41 proxmoxt35 corosync[16158]: [TOTEM ] A new membership >(10.xx.xx.11:796) was formed. Members left: 7 >Nov 9 10:05:46 proxmoxt35 corosync[16158]: [TOTEM ] A new membership >(10.xx.xx.11:800) was formed. Members joined: 7 >Nov 9 10:06:42 proxmoxt35 watchdog-mux[42684]: client watchdog expired >- disable watchdog updates >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ---- This mail was virus scanned and spam checked before delivery. This mail is also DKIM signed. See header dkim-signature. From dietmar at proxmox.com Fri Nov 11 17:43:23 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 11 Nov 2016 17:43:23 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: <1880338119.114.1478882604308@webmail.proxmox.com> > Responding to myself, i find this interesting : > > Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership > (10.xx.xx.11:684) was formed. Members joined: 13 > Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - > disable watchdog updates you lost quorum, and the watchdog expired - that is how the watchdog based fencing works. From ADhaussy at voyages-sncf.com Fri Nov 11 17:44:08 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 16:44:08 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> Message-ID: <8af65c49f3f544518bb13459c089e277@ECLIPSE.groupevsc.com> > A long shot. Do you have a hardware watchdog enabled in bios? I didn't modify any BIOS parameters, except power management. So I believe it's enabled. hpwdt module (hp ilo watchdog) is not loaded. HP ASR is enabled (10 min timeout.) Ipmi_watchdog is blacklisted. nmi_watchdog is enabled => I have seen "please disable this" in proxmox wiki, but there is no explaination why you should do it. :) From lists at hexis.consulting Fri Nov 11 17:48:24 2016 From: lists at hexis.consulting (Hexis) Date: Fri, 11 Nov 2016 10:48:24 -0600 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: References: Message-ID: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> You sir are a saint! That makes total sense and was definitely the problem. Everything is up and working. On 11/11/2016 1:48 AM, Roberto Colonello wrote: > On Thu, Nov 10, 2016 at 10:53 PM, Hexis wrote: > >> Any ideas? > > Ciao, > have you tried to set "Promiscuos mode: Accept" into vSwitch's Security > tab ? 
> > From colonellor at gmail.com Fri Nov 11 18:33:00 2016 From: colonellor at gmail.com (Roberto Colonello) Date: Fri, 11 Nov 2016 18:33:00 +0100 Subject: [PVE-User] PVE 4.3 CONTAINERS ONLY on VMware ESXi In-Reply-To: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> References: <878756eb-252b-2eb7-b36c-d11555b650bf@hexis.consulting> Message-ID: On Fri, Nov 11, 2016 at 5:48 PM, Hexis wrote: > You sir are a saint! Please, do not disturb the saints :-) You are lucky, I just finish a VMware training course :-D -- /roby.deb -- "There are only 10 types of people in the world:Those who understand binary, and those who don't" SOFTWARE is like SEX IT's better when it's FREE https://linuxcounter.net/ Counter Number: 552671 Favorite Distro : Debian From ADhaussy at voyages-sncf.com Fri Nov 11 18:41:20 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Fri, 11 Nov 2016 17:41:20 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1880338119.114.1478882604308@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> Message-ID: > you lost quorum, and the watchdog expired - that is how the watchdog > based fencing works. I don't expect to loose quorum when _one_ node joins or leave the cluster. Nov 8 10:38:58 proxmoxt20 pmxcfs[22537]: [status] notice: update cluster info (cluster name pxmcluster, version = 14) Nov 8 10:39:01 proxmoxt20 corosync[22577]: [TOTEM ] A new membership (10.98.187.11:684) was formed. Members joined: 13 Nov 8 10:39:01 proxmoxt20 corosync[22577]: [QUORUM] Members[13]: 9 10 11 13 4 12 3 1 2 5 6 7 8 Nov 8 10:39:59 proxmoxt20 watchdog-mux[23964]: client watchdog expired - disable watchdog updates Nov 8 10:39:01 proxmoxt35 corosync[35250]: [TOTEM ] A new membership (10.98.187.11:684) was formed. Members joined: 13 Nov 8 10:39:01 proxmoxt35 corosync[35250]: [QUORUM] Members[13]: 9 10 11 13 4 12 3 1 2 5 6 7 8 Nov 8 10:39:58 proxmoxt35 watchdog-mux[28239]: client watchdog expired - disable watchdog updates From dietmar at proxmox.com Fri Nov 11 19:43:39 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Fri, 11 Nov 2016 19:43:39 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> Message-ID: <1860956507.131.1478889820301@webmail.proxmox.com> > On November 11, 2016 at 6:41 PM Dhaussy Alexandre > wrote: > > > > you lost quorum, and the watchdog expired - that is how the watchdog > > based fencing works. > > I don't expect to loose quorum when _one_ node joins or leave the cluster. This was probably a long time before - but I have not read through the whole logs ... 
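One way to retrace this kind of event afterwards - a short sketch using only standard PVE/corosync tools; adjust the time window to the incident:

  pvecm status                      # current quorum and membership view
  corosync-quorumtool -s            # the same information straight from corosync
  journalctl -u corosync -u watchdog-mux -u pve-ha-lrm \
      --since "2016-11-09 10:00" --until "2016-11-09 10:15"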
From daniel at linux-nerd.de Sat Nov 12 17:15:11 2016 From: daniel at linux-nerd.de (Daniel) Date: Sat, 12 Nov 2016 17:15:11 +0100 Subject: [PVE-User] Container didnt start or stuck Message-ID: Hi There, after reboot the Host-System i get a problem with some VMs. The VM is booting and haning at such kind of Process: find . -depth -xdev ! -name . ! ( -path ./lost+found -uid 0 ) ! ( -path ./quota.user -uid 0 ) ! ( -path ./aquota.user -uid 0 ) ! ( -path ./quota.group -uid 0 ) ! ( -path ./aquota. after killing that task by hand it begins to boot as expected. Anyone know if this is normal and tooks some time to be finished? Cheers Daniel From daniel at linux-nerd.de Sat Nov 12 20:46:53 2016 From: daniel at linux-nerd.de (Daniel) Date: Sat, 12 Nov 2016 20:46:53 +0100 Subject: [PVE-User] Backup Message-ID: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> Hi there, before we used LVM-THIN we were able to Backup all Contains directly from the Host-System. Now, everythink is LVM. Is there any known and easy way to backup all Hosts including all VMs? For example with rsync or backuppc or how ever? Cheers Daniel From gbr at majentis.com Sun Nov 13 15:15:36 2016 From: gbr at majentis.com (Gerald Brandt) Date: Sun, 13 Nov 2016 08:15:36 -0600 Subject: [PVE-User] Kernel oops Message-ID: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> Hi, I'm getting a lot of crashes on my Proxmox box. I am runing Proxmox on a Debian base install, but I have anther boxes that does the same, and it is fine. Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442402] ------------[ cut here ]------------ Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442408] WARNING: CPU: 2 PID: 0 at kernel/rcu/tree.c:2733 rcu_process_callbacks+0x5bb/0x5e0() Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442409] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442454] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.21-1-pve #1 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442455] Hardware name: To be filled by O.E.M. 
To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 11/24/2011 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442457] 0000000000000086 63ad933f85fa0f2b ffff88083fc83e70 ffffffff813f3f83 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442459] 0000000000000000 ffffffff81ccfadb ffff88083fc83ea8 ffffffff81081806 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442460] ffffffff81e576c0 ffff88083fc97f38 0000000000000246 0000000000000000 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442462] Call Trace: Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442463] [] dump_stack+0x63/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442469] [] warn_slowpath_common+0x86/0xc0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442471] [] warn_slowpath_null+0x1a/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442473] [] rcu_process_callbacks+0x5bb/0x5e0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442475] [] __do_softirq+0x10e/0x2a0 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442476] [] irq_exit+0x8e/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442480] [] smp_apic_timer_interrupt+0x42/0x50 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442481] [] apic_timer_interrupt+0x82/0x90 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442482] [] ? cpuidle_enter_state+0x10a/0x260 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442487] [] ? cpuidle_enter_state+0xe6/0x260 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442488] [] cpuidle_enter+0x17/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442491] [] call_cpuidle+0x3b/0x70 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442492] [] ? cpuidle_select+0x13/0x20 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442494] [] cpu_startup_entry+0x2bf/0x380 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442496] [] start_secondary+0x154/0x190 Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442497] ---[ end trace 8a742910926b0ed4 ]--- Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.617812] BUG: unable to handle kernel paging request at 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618057] IP: [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618662] PGD 5cb1c5067 PUD 5cb0f2067 PMD 0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.619431] Oops: 0000 [#1] SMP Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.620253] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.624994] CPU: 5 PID: 23044 Comm: ps Tainted: G W 4.4.21-1-pve #1 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.626005] Hardware name: To be filled by 
O.E.M. To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 11/24/2011 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.627039] task: ffff880818ed3700 ti: ffff8805cb27c000 task.ti: ffff8805cb27c000 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.628071] RIP: 0010:[] [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.629113] RSP: 0018:ffff8805cb27fc98 EFLAGS: 00010282 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.630145] RAX: 0000000000000000 RBX: 00000000024080c0 RCX: 00000000000c428b Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.631198] RDX: 00000000000c428a RSI: 00000000024080c0 RDI: ffff88081f003700 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.632239] RBP: ffff8805cb27fcc8 R08: 000000000001a480 R09: 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.633275] R10: 0000000000000006 R11: 0000000000000000 R12: 00000000024080c0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.634310] R13: ffffffff8120f26c R14: ffff88081f003700 R15: ffff88081f003700 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.635346] FS: 00007f54269ce700(0000) GS:ffff88083fd40000(0000) knlGS:0000000000000000 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.636350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.637388] CR2: 000000000000bb00 CR3: 000000052f4f5000 CR4: 00000000000406e0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.638425] Stack: Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.639455] ffff8805cb27fcd0 0000000000000000 ffff880819ad3cc0 ffff8805cb27fef4 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.640500] 0000000000000000 ffff8805cb27fdd0 ffff8805cb27fcf0 ffffffff8120f26c Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.641545] ffffffff81217f1d 0000000000008000 ffff8805cb27fef4 ffff8805cb27fdc0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.642587] Call Trace: Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.643623] [] get_empty_filp+0x5c/0x1c0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.644660] [] ? terminate_walk+0xbd/0xd0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.645699] [] path_openat+0x43/0x1530 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.646731] [] ? putname+0x54/0x60 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.647758] [] ? filename_lookup+0xf5/0x180 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.648781] [] do_filp_open+0x91/0x100 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.649802] [] ? common_perm_cond+0x3a/0x50 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.650814] [] ? from_kgid_munged+0x12/0x20 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.651825] [] ? cp_new_stat+0x157/0x190 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.652786] [] ? __alloc_fd+0x46/0x180 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.653804] [] do_sys_open+0x139/0x2a0 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.654795] [] SyS_open+0x1e/0x20 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.655780] [] entry_SYSCALL_64_fastpath+0x16/0x75 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.656766] Code: 08 65 4c 03 05 53 e3 e1 7e 4d 8b 08 4d 85 c9 0f 84 42 01 00 00 49 83 78 10 00 0f 84 37 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 01 4c 89 c8 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.657834] RIP [] kmem_cache_alloc+0x77/0x200 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.658878] RSP Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.659907] CR2: 000000000000bb00 Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.667666] ---[ end trace 8a742910926b0ed5 ]--- I am non-subscriptions, and I just did an update yesterday to see if it would fix the error. 
I'll be running a memtest today to see if I can find anything. I hadn't done an update in awhile before that, so I'm leaning towards a hardware issue. What do you think? Gerald From gbr at majentis.com Sun Nov 13 15:42:43 2016 From: gbr at majentis.com (Gerald Brandt) Date: Sun, 13 Nov 2016 08:42:43 -0600 Subject: [PVE-User] Kernel oops In-Reply-To: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> Message-ID: <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> On 2016-11-13 08:15 AM, Gerald Brandt wrote: > Hi, > > I'm getting a lot of crashes on my Proxmox box. I am runing Proxmox on > a Debian base install, but I have anther boxes that does the same, and > it is fine. > > > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442402] ------------[ cut > here ]------------ > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442408] WARNING: CPU: 2 > PID: 0 at kernel/rcu/tree.c:2733 rcu_process_callbacks+0x5bb/0x5e0() > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442409] Modules linked > in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables > iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs > lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad > ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi > nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi > asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul > snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm > snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 > lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep > i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd > sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp > fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi > vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci > r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442454] CPU: 2 PID: 0 > Comm: swapper/2 Not tainted 4.4.21-1-pve #1 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442455] Hardware name: To > be filled by O.E.M. 
To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 > 11/24/2011 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442457] 0000000000000086 > 63ad933f85fa0f2b ffff88083fc83e70 ffffffff813f3f83 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442459] 0000000000000000 > ffffffff81ccfadb ffff88083fc83ea8 ffffffff81081806 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442460] ffffffff81e576c0 > ffff88083fc97f38 0000000000000246 0000000000000000 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442462] Call Trace: > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442463] > [] dump_stack+0x63/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442469] > [] warn_slowpath_common+0x86/0xc0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442471] > [] warn_slowpath_null+0x1a/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442473] > [] rcu_process_callbacks+0x5bb/0x5e0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442475] > [] __do_softirq+0x10e/0x2a0 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442476] > [] irq_exit+0x8e/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442480] > [] smp_apic_timer_interrupt+0x42/0x50 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442481] > [] apic_timer_interrupt+0x82/0x90 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442482] > [] ? cpuidle_enter_state+0x10a/0x260 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442487] > [] ? cpuidle_enter_state+0xe6/0x260 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442488] > [] cpuidle_enter+0x17/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442491] > [] call_cpuidle+0x3b/0x70 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442492] > [] ? cpuidle_select+0x13/0x20 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442494] > [] cpu_startup_entry+0x2bf/0x380 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442496] > [] start_secondary+0x154/0x190 > Nov 13 06:15:54 gbr-proxmox-1 kernel: [61228.442497] ---[ end trace > 8a742910926b0ed4 ]--- > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.617812] BUG: unable to > handle kernel paging request at 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618057] IP: > [] kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.618662] PGD 5cb1c5067 PUD > 5cb0f2067 PMD 0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.619431] Oops: 0000 [#1] SMP > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.620253] Modules linked > in: nfsv3 rpcsec_gss_krb5 nfsv4 ip_set ip6table_filter ip6_tables > iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs > lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad > ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi > nfnetlink_log nfnetlink xfs snd_hda_codec_hdmi nouveau eeepc_wmi > asus_wmi kvm_amd kvm sparse_keymap irqbypass mxm_wmi crct10dif_pclmul > snd_hda_codec_realtek crc32_pclmul video snd_hda_codec_generic ttm > snd_hda_intel drm_kms_helper drm snd_hda_codec aesni_intel aes_x86_64 > lrw gf128mul glue_helper snd_hda_core ablk_helper cryptd snd_hwdep > i2c_algo_bit snd_pcm fb_sys_fops syscopyarea snd_timer sysfillrect snd > sysimgblt input_leds pcspkr serio_raw soundcore edac_mce_amd k10temp > fam15h_power edac_core shpchp i2c_piix4 8250_fintek mac_hid wmi > vhost_net vhost macvtap macvlan it87 hwmon_vid autofs4 btrfs raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 ses enclosure uas usb_storage firewire_ohci > r8169 mii firewire_core crc_itu_t sata_sil24 ahci libahci fjes > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.624994] CPU: 5 PID: 
23044 > Comm: ps Tainted: G W 4.4.21-1-pve #1 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.626005] Hardware name: To > be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 > 11/24/2011 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.627039] task: > ffff880818ed3700 ti: ffff8805cb27c000 task.ti: ffff8805cb27c000 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.628071] RIP: > 0010:[] [] > kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.629113] RSP: > 0018:ffff8805cb27fc98 EFLAGS: 00010282 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.630145] RAX: > 0000000000000000 RBX: 00000000024080c0 RCX: 00000000000c428b > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.631198] RDX: > 00000000000c428a RSI: 00000000024080c0 RDI: ffff88081f003700 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.632239] RBP: > ffff8805cb27fcc8 R08: 000000000001a480 R09: 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.633275] R10: > 0000000000000006 R11: 0000000000000000 R12: 00000000024080c0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.634310] R13: > ffffffff8120f26c R14: ffff88081f003700 R15: ffff88081f003700 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.635346] FS: > 00007f54269ce700(0000) GS:ffff88083fd40000(0000) knlGS:0000000000000000 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.636350] CS: 0010 DS: 0000 > ES: 0000 CR0: 0000000080050033 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.637388] CR2: > 000000000000bb00 CR3: 000000052f4f5000 CR4: 00000000000406e0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.638425] Stack: > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.639455] ffff8805cb27fcd0 > 0000000000000000 ffff880819ad3cc0 ffff8805cb27fef4 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.640500] 0000000000000000 > ffff8805cb27fdd0 ffff8805cb27fcf0 ffffffff8120f26c > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.641545] ffffffff81217f1d > 0000000000008000 ffff8805cb27fef4 ffff8805cb27fdc0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.642587] Call Trace: > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.643623] > [] get_empty_filp+0x5c/0x1c0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.644660] > [] ? terminate_walk+0xbd/0xd0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.645699] > [] path_openat+0x43/0x1530 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.646731] > [] ? putname+0x54/0x60 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.647758] > [] ? filename_lookup+0xf5/0x180 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.648781] > [] do_filp_open+0x91/0x100 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.649802] > [] ? common_perm_cond+0x3a/0x50 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.650814] > [] ? from_kgid_munged+0x12/0x20 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.651825] > [] ? cp_new_stat+0x157/0x190 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.652786] > [] ? 
__alloc_fd+0x46/0x180 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.653804] > [] do_sys_open+0x139/0x2a0 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.654795] > [] SyS_open+0x1e/0x20 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.655780] > [] entry_SYSCALL_64_fastpath+0x16/0x75 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.656766] Code: 08 65 4c 03 > 05 53 e3 e1 7e 4d 8b 08 4d 85 c9 0f 84 42 01 00 00 49 83 78 10 00 0f > 84 37 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 01 4c 89 c8 > 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.657834] RIP > [] kmem_cache_alloc+0x77/0x200 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.658878] RSP > > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.659907] CR2: > 000000000000bb00 > Nov 13 06:17:06 gbr-proxmox-1 kernel: [61300.667666] ---[ end trace > 8a742910926b0ed5 ]--- > > I am non-subscriptions, and I just did an update yesterday to see if > it would fix the error. I'll be running a memtest today to see if I > can find anything. > > I hadn't done an update in awhile before that, so I'm leaning towards > a hardware issue. What do you think? > > Gerald > root at gbr-proxmox-1:~# pveversion -verbose proxmox-ve: 4.3-71 (running kernel: 4.4.21-1-pve) pve-manager: 4.3-10 (running version: 4.3-10/7230e60f) pve-kernel-4.4.6-1-pve: 4.4.6-48 pve-kernel-4.4.13-1-pve: 4.4.13-56 pve-kernel-4.2.6-1-pve: 4.2.6-36 pve-kernel-4.4.8-1-pve: 4.4.8-52 pve-kernel-4.4.21-1-pve: 4.4.21-71 pve-kernel-4.4.19-1-pve: 4.4.19-66 pve-kernel-4.4.10-1-pve: 4.4.10-54 lvm2: 2.02.116-pve3 corosync-pve: 2.4.0-1 libqb0: 1.0-1 pve-cluster: 4.0-47 qemu-server: 4.0-94 pve-firmware: 1.1-10 libpve-common-perl: 4.0-80 libpve-access-control: 4.0-19 libpve-storage-perl: 4.0-68 pve-libspice-server1: 0.12.8-1 vncterm: 1.2-1 pve-docs: 4.3-14 pve-qemu-kvm: 2.7.0-6 pve-container: 1.0-81 pve-firewall: 2.0-31 pve-ha-manager: 1.0-35 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 2.0.5-1 lxcfs: 2.0.4-pve2 criu: 1.6.0-1 novnc-pve: 0.5-8 smartmontools: 6.5+svn4324-1~pve80 From f.gruenbichler at proxmox.com Mon Nov 14 07:40:21 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 14 Nov 2016 07:40:21 +0100 Subject: [PVE-User] Backup In-Reply-To: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> Message-ID: <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> On Sat, Nov 12, 2016 at 08:46:53PM +0100, Daniel wrote: > Hi there, > > before we used LVM-THIN we were able to Backup all Contains directly from the Host-System. > Now, everythink is LVM. Is there any known and easy way to backup all Hosts including all VMs? > For example with rsync or backuppc or how ever? you can mount a container's volumes to be accessible on the host by calling "pct mount ID". please be aware that this sets a lock on the container and needs to be reversed by "pct unmount ID" afterwards. but I would advise you to use vzdump to backup containers - you get a (compressed) tar archive, the config is backed up as well and you get consistency "for free" (or almost free ;)). normally, you want to restore individual containers anyway. 
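A minimal sketch of both approaches, assuming container ID 101, a storage called "local" and a backup target under /backup (all placeholders); on PVE 4.x the mounted rootfs usually appears under /var/lib/lxc/101/rootfs, but check the path printed by pct mount:

# file-level copy via the host (remember the lock pct mount sets)
pct mount 101
rsync -a /var/lib/lxc/101/rootfs/ /backup/ct-101/
pct unmount 101

# or the recommended vzdump archive (compressed tar plus the container config)
vzdump 101 --mode snapshot --compress lzo --storage local

Individual files can afterwards be pulled out of the resulting archive with an ordinary tar extract, since it is just a compressed tar of the container root.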
From daniel at linux-nerd.de Mon Nov 14 09:43:40 2016 From: daniel at linux-nerd.de (Daniel) Date: Mon, 14 Nov 2016 09:43:40 +0100 Subject: [PVE-User] Backup In-Reply-To: <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> Message-ID: > > but I would advise you to use vzdump to backup containers - you get a > (compressed) tar archive, the config is backed up as well and you get > consistency "for free" (or almost free ;)). normally, you want to > restore individual containers anyway. The problem is that there is no way to restore just simple files and its not incremental. So vzdump make no sense for me :( Cheers From f.gruenbichler at proxmox.com Mon Nov 14 09:53:59 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 14 Nov 2016 09:53:59 +0100 Subject: [PVE-User] Backup In-Reply-To: References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> Message-ID: <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: > > > > but I would advise you to use vzdump to backup containers - you get a > > (compressed) tar archive, the config is backed up as well and you get > > consistency "for free" (or almost free ;)). normally, you want to > > restore individual containers anyway. > > The problem is that there is no way to restore just simple files and its not incremental. > So vzdump make no sense for me :( extracting individual files is not a problem for container backups - they're just compressed tar archives after all. incremental backups are not supported though, that is correct. From daniel at linux-nerd.de Mon Nov 14 10:26:44 2016 From: daniel at linux-nerd.de (Daniel) Date: Mon, 14 Nov 2016 10:26:44 +0100 Subject: [PVE-User] Backup In-Reply-To: <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> Message-ID: <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> > Am 14.11.2016 um 09:53 schrieb Fabian Gr?nbichler : > > On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: >>> >>> but I would advise you to use vzdump to backup containers - you get a >>> (compressed) tar archive, the config is backed up as well and you get >>> consistency "for free" (or almost free ;)). normally, you want to >>> restore individual containers anyway. >> >> The problem is that there is no way to restore just simple files and its not incremental. >> So vzdump make no sense for me :( > > extracting individual files is not a problem for container backups - > they're just compressed tar archives after all. incremental backups are > not supported though, that is correct. Its not a big deal to use backuppc for example on each Container but it was easier before we used LVM-Thin ;) So it will blow up our network. 
Our Mail-Server for example is backuped up hours which is not so easy handled bei vzdump ;) From e.kasper at proxmox.com Mon Nov 14 10:39:22 2016 From: e.kasper at proxmox.com (Emmanuel Kasper) Date: Mon, 14 Nov 2016 10:39:22 +0100 Subject: [PVE-User] Kernel oops In-Reply-To: <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> Message-ID: <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> >> I am non-subscriptions, and I just did an update yesterday to see if >> it would fix the error. I'll be running a memtest today to see if I >> can find anything. >> >> I hadn't done an update in awhile before that, so I'm leaning towards >> a hardware issue. What do you think? Yes, most probably the ram is the culprit. You might also check that the RAM modules are properly seated on the motherboard. From ADhaussy at voyages-sncf.com Mon Nov 14 11:50:57 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 10:50:57 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1860956507.131.1478889820301@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> Message-ID: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : > On November 11, 2016 at 6:41 PM Dhaussy Alexandre > wrote: >>> you lost quorum, and the watchdog expired - that is how the watchdog >>> based fencing works. >> I don't expect to loose quorum when _one_ node joins or leave the cluster. > This was probably a long time before - but I have not read through the whole > logs ... That makes no sense to me.. The fact is : everything have been working fine for weeks. What i can see in the logs is : several reboots of cluster nodes suddently, and exactly one minute after one node joining and/or leaving the cluster. I see no problems with corosync/lrm/crm before that. This leads me to a probable network (multicast) malfunction. I did a bit of homeworks reading the wiki about ha manager.. What i understand so far, is that every state/service change from LRM must be acknowledged (cluster-wise) by CRM master. So if a multicast disruption occurs, and i assume LRM wouldn't be able talk to the CRM MASTER, then it also couldn't reset the watchdog, am i right ? Another thing ; i have checked my network configuration, the cluster ip is set on a linux bridge... By default multicast_snooping is set to 1 on linux bridge, so i think it there's a good chance this is the source of my problems... Note that we don't use IGMP snooping, it is disabled on almost all network switchs. Plus i found a post by A.Derumier (yes, 3 years old..) He did have similar issues with bridge and multicast. 
http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html From t.lamprecht at proxmox.com Mon Nov 14 12:33:27 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Mon, 14 Nov 2016 12:33:27 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Message-ID: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> On 14.11.2016 11:50, Dhaussy Alexandre wrote: > > Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : >> On November 11, 2016 at 6:41 PM Dhaussy Alexandre >> wrote: >>>> you lost quorum, and the watchdog expired - that is how the watchdog >>>> based fencing works. >>> I don't expect to loose quorum when _one_ node joins or leave the cluster. >> This was probably a long time before - but I have not read through the whole >> logs ... > That makes no sense to me.. > The fact is : everything have been working fine for weeks. > > > What i can see in the logs is : several reboots of cluster nodes > suddently, and exactly one minute after one node joining and/or leaving > the cluster. The watchdog is set to an 60 second timeout, meaning that cluster leave caused quorum loss, or other problems (you said you had multicast problems around that time) thus the LRM stopped updating the watchdog, so one minute later it resetted all nodes, which left the quorate partition. > I see no problems with corosync/lrm/crm before that. > This leads me to a probable network (multicast) malfunction. > > I did a bit of homeworks reading the wiki about ha manager.. > > What i understand so far, is that every state/service change from LRM > must be acknowledged (cluster-wise) by CRM master. Yes and no, LRM and CRM are two state machines with synced inputs, but that holds mainly for human triggered commands and the resulting communication. Meaning that commands like start, stop, migrate may not go through from the CRM to the LRM. Fencing and such stuff works none the less, else it would be a major design flaw :) > So if a multicast disruption occurs, and i assume LRM wouldn't be able > talk to the CRM MASTER, then it also couldn't reset the watchdog, am i > right ? > No, the watchdog runs on each node and is CRM independent. As watchdogs are normally not able to server more clients we wrote the watchdog-mux (multiplexer). This is a very simple C program which opens the watchdog with a 60 second timeout and allows multiple clients (at the moment CRM and LRM) to connect to it. If a client does not resets the dog for about 10 seconds, IIRC, the watchdox-mux disables watchdogs updates on the real watchdog. After that a node reset will happen *when* the dog runs out of time, not instantly. So if the LRM cannot communicate (i.e. has no quorum) he will stop updating the dog, thus trigger independent what the CRM says or does. > Another thing ; i have checked my network configuration, the cluster ip > is set on a linux bridge... > By default multicast_snooping is set to 1 on linux bridge, so i think it > there's a good chance this is the source of my problems... 
> Note that we don't use IGMP snooping, it is disabled on almost all > network switchs. > Yes, multicast snooping has to be configured (recommended) or else turned off on the switch. That's stated in some wiki articles, various forum posts and our docs, here: http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements Hope that helps a bit understanding. :) cheers, Thomas > Plus i found a post by A.Derumier (yes, 3 years old..) He did have > similar issues with bridge and multicast. > http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From dietmar at proxmox.com Mon Nov 14 12:34:02 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Mon, 14 Nov 2016 12:34:02 +0100 (CET) Subject: [PVE-User] Cluster disaster In-Reply-To: <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> Message-ID: <462735272.57.1479123243305@webmail.proxmox.com> > What i understand so far, is that every state/service change from LRM > must be acknowledged (cluster-wise) by CRM master. > So if a multicast disruption occurs, and i assume LRM wouldn't be able > talk to the CRM MASTER, then it also couldn't reset the watchdog, am i > right ? Nothing happens as long as you have quorum. And if I understand you correctly, you never lost quorum on those nodes? From ADhaussy at voyages-sncf.com Mon Nov 14 14:25:18 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 13:25:18 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <462735272.57.1479123243305@webmail.proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <462735272.57.1479123243305@webmail.proxmox.com> Message-ID: <812f2710-9973-267f-abca-53789a28162b@voyages-sncf.com> Le 14/11/2016 ? 12:34, Dietmar Maurer a ?crit : >> What i understand so far, is that every state/service change from LRM >> must be acknowledged (cluster-wise) by CRM master. >> So if a multicast disruption occurs, and i assume LRM wouldn't be able >> talk to the CRM MASTER, then it also couldn't reset the watchdog, am i >> right ? > Nothing happens as long as you have quorum. And if I understand you > correctly, you never lost quorum on those nodes? As far as can be told from the log files, yes. 
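A rough sketch of checking and disabling multicast snooping on the cluster bridge, along the lines Thomas recommends above; vmbr0 and the node names are placeholders for the actual cluster bridge and cluster members:

# 1 means snooping is active; without an IGMP querier on the segment this can
# silently drop corosync multicast traffic
cat /sys/class/net/vmbr0/bridge/multicast_snooping

# disable it at runtime; persist it with a post-up line in
# /etc/network/interfaces under the vmbr0 stanza
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

# verify multicast connectivity between all nodes (run on every node in parallel)
omping -c 600 -i 1 -q node1 node2 node3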
From ADhaussy at voyages-sncf.com Mon Nov 14 14:46:40 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Mon, 14 Nov 2016 13:46:40 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: <81da8dca-65d8-95d9-b804-e73ff60a4650@voyages-sncf.com> Le 14/11/2016 ? 12:33, Thomas Lamprecht a ?crit : > Hope that helps a bit understanding. :) Sure, thank you for clearing things up. :) I wish i had done this before, but i learned a lot in the last few days... From gbr at majentis.com Tue Nov 15 14:39:21 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 15 Nov 2016 07:39:21 -0600 Subject: [PVE-User] Kernel oops In-Reply-To: <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> References: <54ba26f3-efcb-c70d-1f3d-aecfccb82a79@majentis.com> <7025c030-25c0-4d54-2ee2-15858d0caf68@majentis.com> <3ca4868b-d596-1df3-fd4b-906482049183@proxmox.com> Message-ID: <29cb9d5d-56b7-6ab2-2d5d-d3e6b19a7fa2@majentis.com> On 2016-11-14 03:39 AM, Emmanuel Kasper wrote: >>> I am non-subscriptions, and I just did an update yesterday to see if >>> it would fix the error. I'll be running a memtest today to see if I >>> can find anything. >>> >>> I hadn't done an update in awhile before that, so I'm leaning towards >>> a hardware issue. What do you think? > Yes, most probably the ram is the culprit. You might also check that the > RAM modules are properly seated on the motherboard. > > > _______________________________________________ > Bad RAM is exactly what it was. 2 of the 4 DIMMs went bad after 4 years. Gerald From m at plus-plus.su Tue Nov 15 15:48:57 2016 From: m at plus-plus.su (Mikhail) Date: Tue, 15 Nov 2016 17:48:57 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS Message-ID: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Hello, Please help me to find why I'm seeing slow speeds when KVM guest is on NFS storage. I have pretty standard setup, running Proxmox 4.1-1. The storage server is on NFS connected directly (no switches/hubs, direct NIC-to-NIC connection) via Gigabit ethernet. I just launched Debian-8.3 stock ISO installation on the KVM guest that's disk resides on NFS and I'm seeing some terribly slow file copy operation speeds on debian install procedure - about 200-600 kilobyte/second according to "bwm-ng" output on storage server. I also tried direct write from my Proxmox host via NFS using "dd" and results are showing near 1gbit speeds: # dd if=/dev/zero of=10G bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB) copied, 115.951 s, 90.4 MB/s What could be an issue? 
On Proxmox host: # cat /proc/mounts |grep vmnf 192.168.4.1:/mnt/vmnfs /mnt/pve/vmnfs nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.1,mountvers=3,mountport=47825,mountproto=udp,local_lock=none,addr=192.168.4.1 0 0 storage.cfg: nfs: vmnfs export /mnt/vmnfs server 192.168.4.1 path /mnt/pve/vmnfs content images options vers=3 maxfiles 1 KVM guest config: # cat /etc/pve/qemu-server/85103.conf bootdisk: virtio0 cores: 1 ide2: ISOimages:iso/debian-8.3.0-amd64-CD-1.iso,media=cdrom memory: 2048 name: WEB net0: virtio=3A:39:66:30:63:32,bridge=vmbr0,tag=85 numa: 0 onboot: 1 ostype: l26 smbios1: uuid=97ea543f-ca64-43ab-9d66-9d1c9cd179b0 sockets: 1 virtio0: vmnfs:85103/vm-85103-disk-1.qcow2,size=50G Any suggestions where to start looking is greatly appreciated. Thanks. From gbr at majentis.com Tue Nov 15 16:09:46 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 15 Nov 2016 09:09:46 -0600 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Message-ID: I don't know if it helps, but I always switch to NFSv4. nfs: storage export /proxmox server 172.23.4.16 path /mnt/pve/storage options vers=4 maxfiles 1 content iso,backup,images Gerald On 2016-11-15 08:48 AM, Mikhail wrote: > Hello, > > Please help me to find why I'm seeing slow speeds when KVM guest is on > NFS storage. I have pretty standard setup, running Proxmox 4.1-1. The > storage server is on NFS connected directly (no switches/hubs, direct > NIC-to-NIC connection) via Gigabit ethernet. > > I just launched Debian-8.3 stock ISO installation on the KVM guest > that's disk resides on NFS and I'm seeing some terribly slow file copy > operation speeds on debian install procedure - about 200-600 > kilobyte/second according to "bwm-ng" output on storage server. I also > tried direct write from my Proxmox host via NFS using "dd" and results > are showing near 1gbit speeds: > > # dd if=/dev/zero of=10G bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB) copied, 115.951 s, 90.4 MB/s > > What could be an issue? > > On Proxmox host: > > # cat /proc/mounts |grep vmnf > 192.168.4.1:/mnt/vmnfs /mnt/pve/vmnfs nfs > rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.4.1,mountvers=3,mountport=47825,mountproto=udp,local_lock=none,addr=192.168.4.1 > 0 0 > > storage.cfg: > nfs: vmnfs > export /mnt/vmnfs > server 192.168.4.1 > path /mnt/pve/vmnfs > content images > options vers=3 > maxfiles 1 > > KVM guest config: > # cat /etc/pve/qemu-server/85103.conf > bootdisk: virtio0 > cores: 1 > ide2: ISOimages:iso/debian-8.3.0-amd64-CD-1.iso,media=cdrom > memory: 2048 > name: WEB > net0: virtio=3A:39:66:30:63:32,bridge=vmbr0,tag=85 > numa: 0 > onboot: 1 > ostype: l26 > smbios1: uuid=97ea543f-ca64-43ab-9d66-9d1c9cd179b0 > sockets: 1 > virtio0: vmnfs:85103/vm-85103-disk-1.qcow2,size=50G > > Any suggestions where to start looking is greatly appreciated. > > Thanks. 
> _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > From m at plus-plus.su Tue Nov 15 18:25:10 2016 From: m at plus-plus.su (Mikhail) Date: Tue, 15 Nov 2016 20:25:10 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> Message-ID: <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> On 11/15/2016 06:09 PM, Gerald Brandt wrote: > I don't know if it helps, but I always switch to NFSv4. Thanks for the tip. This did not help. I also tried with various caching options (writeback, writethrough, etc) and RAW disk format instead of qcow2 - nothing changed. I also have LVM over iSCSI export to that Proxmox host, and using LVM over network (to the same storage server) I'm seeing expected speeds close to 1gbit. So this means something is either wrong with NFS export options, or something related to that part. From ADhaussy at voyages-sncf.com Tue Nov 15 19:04:10 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 15 Nov 2016 18:04:10 +0000 Subject: [PVE-User] weird memory stats in GUI graphs Message-ID: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> Hello, I just noticed two different values on the node Summary tab : Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) And graphs : Total RAM : 540.94GB and Usage : 504.53GB The server has 512G + proxmox-ve: 4.3-66. From weik at bbs-haarentor.de Tue Nov 15 19:16:24 2016 From: weik at bbs-haarentor.de (Ulf Weikert) Date: Tue, 15 Nov 2016 19:16:24 +0100 Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM Message-ID: Hey there, I'm running Proxmox VE 4.2-2 on a HP DL380 G8. Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. Which lead me to this bug [0], [1]. But since the card is displayed in my Proxmox Webinterface, I think it should work at least. So I configured my KVM Container and installed Windows Server 2012 R2 VM according to the best practices wiki [2]. After Installation however the NIC in the VM does not receive an IP. The DL 380 host is directly attached to our coreswitch via fibre channel. To make sure DHCP is working on the coreswitch port I plugged in some old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. And plugged my notebook in the RJ45 port. DHCP works fine there. So in theory it should work in my VM as well. But for some reason it doesn't. See screenshot [3] See my Proxmox Host and VM Setup. [4] & [5]. I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, but that didn't work either. I'm thankful for any tip or advice you can give me. [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices [3] https://postimg.org/image/bmh7iooxp/ [4] https://postimg.org/image/l8aryzg3h/ [5] https://postimg.org/image/z392hgail/ -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. 
+49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. From dietmar at proxmox.com Tue Nov 15 19:48:07 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:48:07 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> Message-ID: <580588460.34.1479235687618@webmail.proxmox.com> > I just noticed two different values on the node Summary tab : > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB Indeed, that looks strange. Please note that the units are different (GiB vs. GB), but values are still wrong. From dietmar at proxmox.com Tue Nov 15 19:49:43 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:49:43 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <580588460.34.1479235687618@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> Message-ID: <762264953.36.1479235783912@webmail.proxmox.com> > On November 15, 2016 at 7:48 PM Dietmar Maurer wrote: > > > > I just noticed two different values on the node Summary tab : > > > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB > > Indeed, that looks strange. Please note that the units are > different (GiB vs. GB), but values are still wrong. No, values are correct - it is just the different unit. From dietmar at proxmox.com Tue Nov 15 19:52:09 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Tue, 15 Nov 2016 19:52:09 +0100 (CET) Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <762264953.36.1479235783912@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> <762264953.36.1479235783912@webmail.proxmox.com> Message-ID: <1387700866.38.1479235929867@webmail.proxmox.com> > On November 15, 2016 at 7:49 PM Dietmar Maurer wrote: > > > > > > On November 15, 2016 at 7:48 PM Dietmar Maurer wrote: > > > > > > > I just noticed two different values on the node Summary tab : > > > > > > Numbers : RAM usage 92.83% (467.65 GiB of 503.79 GiB) > > > > > > And graphs : Total RAM : 540.94GB and Usage : 504.53GB > > > > Indeed, that looks strange. Please note that the units are > > different (GiB vs. GB), but values are still wrong. > > No, values are correct - it is just the different unit. Also see: https://en.wikipedia.org/wiki/Gibibyte And yes, I know it is not ideal to display values with different base unit, but this has technical reasons... 
From ADhaussy at voyages-sncf.com Tue Nov 15 21:12:02 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 15 Nov 2016 20:12:02 +0000 Subject: [PVE-User] weird memory stats in GUI graphs In-Reply-To: <1387700866.38.1479235929867@webmail.proxmox.com> References: <6debaa4a-872c-f94a-0bf5-b4ffcc5886c0@voyages-sncf.com> <580588460.34.1479235687618@webmail.proxmox.com> <762264953.36.1479235783912@webmail.proxmox.com>, <1387700866.38.1479235929867@webmail.proxmox.com> Message-ID: <2EDB1A96-F0FE-4391-B951-4BDF5111602A@voyages-sncf.com> > Le 15 nov. 2016 ? 19:52, Dietmar Maurer a ?crit : >> No, values are correct - it is just the different unit. > > Also see: https://en.wikipedia.org/wiki/Gibibyte > > And yes, I know it is not ideal to display values with different base unit, > but this has technical reasons... > Indeed, i did not pay attention to units. You almost lost me. :-) I guess blame the bad habits of using GB for GiB.. From bc at iptel.co Tue Nov 15 22:33:13 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:33:13 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: 90.4 MB/s isn't that far off. On Tue, Nov 15, 2016 at 5:25 PM, Mikhail wrote: > On 11/15/2016 06:09 PM, Gerald Brandt wrote: >> I don't know if it helps, but I always switch to NFSv4. > > Thanks for the tip. This did not help. I also tried with various caching > options (writeback, writethrough, etc) and RAW disk format instead of > qcow2 - nothing changed. > > I also have LVM over iSCSI export to that Proxmox host, and using LVM > over network (to the same storage server) I'm seeing expected speeds > close to 1gbit. > > So this means something is either wrong with NFS export options, or > something related to that part. > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Tue Nov 15 22:35:55 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:35:55 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: Ignore my reply - just reread the thread fully :) NFS should work just fine.. no idea why you are seeing those lousy speeds. On Tue, Nov 15, 2016 at 9:33 PM, Brian :: wrote: > 90.4 MB/s isn't that far off. > > > On Tue, Nov 15, 2016 at 5:25 PM, Mikhail wrote: >> On 11/15/2016 06:09 PM, Gerald Brandt wrote: >>> I don't know if it helps, but I always switch to NFSv4. >> >> Thanks for the tip. This did not help. I also tried with various caching >> options (writeback, writethrough, etc) and RAW disk format instead of >> qcow2 - nothing changed. >> >> I also have LVM over iSCSI export to that Proxmox host, and using LVM >> over network (to the same storage server) I'm seeing expected speeds >> close to 1gbit. >> >> So this means something is either wrong with NFS export options, or >> something related to that part. 
>> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 22:36:40 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 00:36:40 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> Message-ID: <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> On 11/16/2016 12:33 AM, Brian :: wrote: > 90.4 MB/s isn't that far off. Hello, Yes, but I'm only able to get these results when doing simple "dd" test directly on Proxmox host machine inside NFS-mounted directory. KVM guest's filesystem is not getting even 1/4 of that speed when it's disk resides on the very same NFS (Debian installation from stock ISO takes ~hour to copy first halt of it's files..) From bc at iptel.co Tue Nov 15 22:43:42 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 21:43:42 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> Message-ID: What type of disk controller and what caching mode are you using? On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: > On 11/16/2016 12:33 AM, Brian :: wrote: >> 90.4 MB/s isn't that far off. > > Hello, > > Yes, but I'm only able to get these results when doing simple "dd" test > directly on Proxmox host machine inside NFS-mounted directory. KVM > guest's filesystem is not getting even 1/4 of that speed when it's disk > resides on the very same NFS (Debian installation from stock ISO takes > ~hour to copy first halt of it's files..) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 23:05:00 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:05:00 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> Message-ID: <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> On 11/16/2016 12:43 AM, Brian :: wrote: > What type of disk controller and what caching mode are you using? The storage server is built with 4 x 4TB ST4000NM0034 Seagate disks, attached to LSI Logic SAS3008 controller. Then there's Debian Jessie with software RAID10 using MDADM. This space is given to Proxmox host via iSCSI + LVM via 10 gbit ethernet. There's 32GB of RAM in this storage server, so almost all this RAM can be used for cache (nothing else runs there). I ran various tests on the storage server locally (created local LV, formatted it to EXT4 and ran there various disk-intensive tasks such as copying big files, etc). My average write speed to this MDADM raid10 / LVM / Ext4 filesystem is about 70-80mb/s. I guess it should be much faster then that, but I can't find out where's the bottleneck in this setup.. 
# cat /proc/mdstat Personalities : [raid10] md0 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1] 7811819520 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU] bitmap: 11/59 pages [44KB], 65536KB chunk unused devices: # pvs PV VG Fmt Attr PSize PFree /dev/md0 vg0 lvm2 a-- 7.28t 1.28t Thanks. > > > > On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: >> On 11/16/2016 12:33 AM, Brian :: wrote: >>> 90.4 MB/s isn't that far off. >> >> Hello, >> >> Yes, but I'm only able to get these results when doing simple "dd" test >> directly on Proxmox host machine inside NFS-mounted directory. KVM >> guest's filesystem is not getting even 1/4 of that speed when it's disk >> resides on the very same NFS (Debian installation from stock ISO takes >> ~hour to copy first halt of it's files..) >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From proxmox-user at mattern.org Tue Nov 15 23:12:34 2016 From: proxmox-user at mattern.org (Marcus) Date: Tue, 15 Nov 2016 23:12:34 +0100 Subject: [PVE-User] Backup In-Reply-To: <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> References: <650CDE80-F944-44B3-BF1E-56D592AC63BF@linux-nerd.de> <20161114064021.4dio6u7uf4w7bgjx@nora.maurer-it.com> <20161114085359.im2tpwi2yngdep3v@nora.maurer-it.com> <1912EFC5-BF08-4A3B-AD5D-9E159661D254@linux-nerd.de> Message-ID: <47d2d15f-b3e2-ac2e-4b64-bcec5a0d02f3@mattern.org> Hi, it is also possible to take LVM Snapshots on the host. Than mount the snapshot and take backups with rsync (rsnapshot e. g.) or whatever you prefer. You don't need pct mount or any software inside the container. Am 14.11.2016 um 10:26 schrieb Daniel: >> Am 14.11.2016 um 09:53 schrieb Fabian Gr?nbichler : >> >> On Mon, Nov 14, 2016 at 09:43:40AM +0100, Daniel wrote: >>>> but I would advise you to use vzdump to backup containers - you get a >>>> (compressed) tar archive, the config is backed up as well and you get >>>> consistency "for free" (or almost free ;)). normally, you want to >>>> restore individual containers anyway. >>> The problem is that there is no way to restore just simple files and its not incremental. >>> So vzdump make no sense for me :( >> extracting individual files is not a problem for container backups - >> they're just compressed tar archives after all. incremental backups are >> not supported though, that is correct. > Its not a big deal to use backuppc for example on each Container but it was easier before we used LVM-Thin ;) > So it will blow up our network. Our Mail-Server for example is backuped up hours which is not so easy handled bei vzdump ;) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Tue Nov 15 23:22:29 2016 From: bc at iptel.co (Brian ::) Date: Tue, 15 Nov 2016 22:22:29 +0000 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: Hi Mikhail The guest that is running - what type of controller / cache? 
Thanks On Tue, Nov 15, 2016 at 10:05 PM, Mikhail wrote: > On 11/16/2016 12:43 AM, Brian :: wrote: >> What type of disk controller and what caching mode are you using? > > The storage server is built with 4 x 4TB ST4000NM0034 Seagate disks, > attached to LSI Logic SAS3008 controller. Then there's Debian Jessie > with software RAID10 using MDADM. This space is given to Proxmox host > via iSCSI + LVM via 10 gbit ethernet. There's 32GB of RAM in this > storage server, so almost all this RAM can be used for cache (nothing > else runs there). > > I ran various tests on the storage server locally (created local LV, > formatted it to EXT4 and ran there various disk-intensive tasks such as > copying big files, etc). My average write speed to this MDADM raid10 / > LVM / Ext4 filesystem is about 70-80mb/s. I guess it should be much > faster then that, but I can't find out where's the bottleneck in this > setup.. > > # cat /proc/mdstat > Personalities : [raid10] > md0 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1] > 7811819520 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU] > bitmap: 11/59 pages [44KB], 65536KB chunk > > unused devices: > > # pvs > PV VG Fmt Attr PSize PFree > /dev/md0 vg0 lvm2 a-- 7.28t 1.28t > > Thanks. > >> >> >> >> On Tue, Nov 15, 2016 at 9:36 PM, Mikhail wrote: >>> On 11/16/2016 12:33 AM, Brian :: wrote: >>>> 90.4 MB/s isn't that far off. >>> >>> Hello, >>> >>> Yes, but I'm only able to get these results when doing simple "dd" test >>> directly on Proxmox host machine inside NFS-mounted directory. KVM >>> guest's filesystem is not getting even 1/4 of that speed when it's disk >>> resides on the very same NFS (Debian installation from stock ISO takes >>> ~hour to copy first halt of it's files..) >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From m at plus-plus.su Tue Nov 15 23:33:30 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:33:30 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: On 11/16/2016 01:22 AM, Brian :: wrote: > Hi Mikhail > > The guest that is running - what type of controller / cache? > > Thanks Brian, The guest is Debian Jessie, running VirtIO as controller and "Default (No cache)" cache setting. I tried both writeback / writethrough settings as well, but it did not change things to better.. Btw, just did another "dd" test on the storage server itself (ext4 mounted from LVM lv that resides on top of MDADM RAID10): # dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync 563200+0 records in 563200+0 records out 36909875200 bytes (37 GB) copied, 176.696 s, 209 MB/s I guess the storage server itself is fine. 
Btw, similar results when I run this test from Proxmox host that is attached via 10gbit ethernet to storage server using NFS mount: # dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync 563200+0 records in 563200+0 records out 36909875200 bytes (37 GB) copied, 165.531 s, 223 MB/s At this point, I don't know what else I can check on my systems to find what's the problem with KVM images being put on NFS storage. From m at plus-plus.su Tue Nov 15 23:59:15 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 01:59:15 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> Message-ID: <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> Here's a clean "dd" test of two identical KVM guests that shows how results differ (NFS vs LVM): 1) First guest inside qcow2 image, located on NFS share (via 10gbit ethernet), cache settings "Default (No cache)": $ dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 196.993 s, 51.1 MB/s 2) Second guest runs inside LVM-over-iSCSI logical volume, from the same storage server, via same as first guest 10gbit ethernet, cache settings "Default (No cache)": $ dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 58.474 s, 172 MB/s Mikhail. From dietmar at proxmox.com Wed Nov 16 06:52:58 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Wed, 16 Nov 2016 06:52:58 +0100 (CET) Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> Message-ID: <1057547647.2.1479275578414@webmail.proxmox.com> > 1) First guest inside qcow2 image, located on NFS share (via 10gbit What values do you get with raw images? From m at plus-plus.su Wed Nov 16 11:02:26 2016 From: m at plus-plus.su (Mikhail) Date: Wed, 16 Nov 2016 13:02:26 +0300 Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <1057547647.2.1479275578414@webmail.proxmox.com> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> <1057547647.2.1479275578414@webmail.proxmox.com> Message-ID: <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> On 11/16/2016 08:52 AM, Dietmar Maurer wrote: > >> 1) First guest inside qcow2 image, located on NFS share (via 10gbit > > What values do you get with raw images? > Just now converted guest's disk image to RAW, using default cache settings. 
Seeing much better results - same "dd" test now shows 145 MB/s write speeds: 541590+0 records out 35493642240 bytes (35 GB) copied, 245.511 s, 145 MB/s (dd if=/dev/zero of=test bs=64k count=550k conv=fdatasync) And I also tried same test over 1 gbit network, it shows acceptable results there as well: dd if=/dev/zero of=test bs=64k count=150k conv=fdatasync 153600+0 records in 153600+0 records out 10066329600 bytes (10 GB) copied, 94.1721 s, 107 MB/s So something is not good with QCOW2 disk format. Mikhail. From dietmar at proxmox.com Wed Nov 16 12:06:42 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Wed, 16 Nov 2016 12:06:42 +0100 (CET) Subject: [PVE-User] Slow speeds when KVM guest is on NFS In-Reply-To: <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> References: <8ab339a5-41fe-968c-e8ea-9aab60a9f4e7@plus-plus.su> <6bcf1f73-79f4-4ff7-a337-395a41c326dc@plus-plus.su> <1924770a-6a64-ba77-0ee3-f2f3e00d154b@plus-plus.su> <3dc35b99-c52e-30a2-fd09-1e5298ffedde@plus-plus.su> <0f756f48-8652-37f9-6422-6f62905a5edd@plus-plus.su> <1057547647.2.1479275578414@webmail.proxmox.com> <5e80b121-ba54-e860-8ad6-086095789268@plus-plus.su> Message-ID: <827147981.173.1479294402439@webmail.proxmox.com> > So something is not good with QCOW2 disk format. I guess this is just because it changes a sequential write order to something more random. You will get different results if you use other benchmark tools ... From nick-liste at posteo.eu Wed Nov 16 12:40:06 2016 From: nick-liste at posteo.eu (Nicola Ferrari (#554252)) Date: Wed, 16 Nov 2016 12:40:06 +0100 Subject: [PVE-User] Android app for pve 4.2 management Message-ID: Hi everybody. I'm running a 3-nodes pve 4.2 cluster. In the past I used to overview the cluster at home and while travelling using OpenVPN on my phone, in conjunction with QuadProx Mobile: https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it I recently (sept 2016) upgraded cluster to pve4, and only yesterday I realized that Quadrata App is no more functional on pve4: I can see only a few data (nodes name, free memory/cpu) but I can't see vm list and management options (start, stop, console and so on) . Do you experience the same issue too? Any advice about alternative Android apps to achieve this? Thanks! Nick PS: I tried to use simply a web browser on the mobile (firefox mobile) but it requests too much resources... that's not usable.. -- +---------------------+ | Linux User #554252 | +---------------------+ From daniel at linux-nerd.de Wed Nov 16 13:02:07 2016 From: daniel at linux-nerd.de (Daniel) Date: Wed, 16 Nov 2016 13:02:07 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: References: Message-ID: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> Actually i have the same problem. Thats the reason why i started to develop my own App. But these App will take serval month till i will be able to show something. > Am 16.11.2016 um 12:40 schrieb Nicola Ferrari (#554252) : > > Hi everybody. > > I'm running a 3-nodes pve 4.2 cluster. 
> In the past I used to overview the cluster at home and while travelling > using OpenVPN on my phone, in conjunction with QuadProx Mobile: > https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it > > I recently (sept 2016) upgraded cluster to pve4, and only yesterday I > realized that Quadrata App is no more functional on pve4: > I can see only a few data (nodes name, free memory/cpu) but I can't see > vm list and management options (start, stop, console and so on) . > > Do you experience the same issue too? > Any advice about alternative Android apps to achieve this? > > Thanks! > Nick > > PS: I tried to use simply a web browser on the mobile (firefox mobile) > but it requests too much resources... that's not usable.. > > > > -- > +---------------------+ > | Linux User #554252 | > +---------------------+ > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From aderumier at odiso.com Wed Nov 16 13:47:15 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Wed, 16 Nov 2016 13:47:15 +0100 (CET) Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM In-Reply-To: References: Message-ID: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> maybe you can use tcpdump on vmbrX, and see if you see dhcp queries/responses ? does it work with static ip ? ----- Mail original ----- De: "Ulf Weikert" ?: "proxmoxve" Envoy?: Mardi 15 Novembre 2016 19:16:24 Objet: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM Hey there, I'm running Proxmox VE 4.2-2 on a HP DL380 G8. Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. Which lead me to this bug [0], [1]. But since the card is displayed in my Proxmox Webinterface, I think it should work at least. So I configured my KVM Container and installed Windows Server 2012 R2 VM according to the best practices wiki [2]. After Installation however the NIC in the VM does not receive an IP. The DL 380 host is directly attached to our coreswitch via fibre channel. To make sure DHCP is working on the coreswitch port I plugged in some old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. And plugged my notebook in the RJ45 port. DHCP works fine there. So in theory it should work in my VM as well. But for some reason it doesn't. See screenshot [3] See my Proxmox Host and VM Setup. [4] & [5]. I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, but that didn't work either. I'm thankful for any tip or advice you can give me. [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices [3] https://postimg.org/image/bmh7iooxp/ [4] https://postimg.org/image/l8aryzg3h/ [5] https://postimg.org/image/z392hgail/ -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. +49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From f.gruenbichler at proxmox.com Wed Nov 16 15:02:12 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Wed, 16 Nov 2016 15:02:12 +0100 Subject: [PVE-User] call for testing: updated grub2 packages on pvetest Message-ID: <20161116140212.7i7ply3bzklmlw24@nora.maurer-it.com> Hello, I'd like people that have non-productive test setups to participate in testing the updated grub2 packages that are available in the pvetest repository. We already tested them on all of our available hardware and setups, but since issues with grub tend to be rather ugly to fix once something fails, some exposure to (potentially exotic) configurations cannot hurt. The packages update to the newer release (called "beta3" in the upstream grub project) and drop the patches from the ZoL variant of grub2 that we previously used, since grub now supports ZFS upstream. Thanks in advance for any feedback! http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-common_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-amd64-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-amd64_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-ia32-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi-ia32_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-efi_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc-bin_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc-dbg_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-pc_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-rescue-pc_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub-theme-starfield_2.02-pve5_amd64.deb http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64/grub2-common_2.02-pve5_amd64.deb From weik at bbs-haarentor.de Wed Nov 16 15:44:00 2016 From: weik at bbs-haarentor.de (Ulf Weikert) Date: Wed, 16 Nov 2016 15:44:00 +0100 Subject: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM In-Reply-To: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> References: <109434272.3393159.1479300435920.JavaMail.zimbra@oxygem.tv> Message-ID: <1280bb60-495e-eb03-acf6-84363515252f@bbs-haarentor.de> On 16.11.2016 13:47, Alexandre DERUMIER wrote: > maybe you can use tcpdump on vmbrX, and see if you see dhcp queries/responses ? While running the command, I unplugged the sfp module and put it back in. This is the outcome. 
root at VMC-01-SN:~# tcpdump -i eth4 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes ^C 0 packets captured 2 packets received by filter 0 packets dropped by kernel root at VMC-01-SN:~# tcpdump -i vmbr4 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on vmbr4, link-type EN10MB (Ethernet), capture size 262144 bytes ^C 0 packets captured 2 packets received by filter 0 packets dropped by kernel tcpdump -i eth1 (Interface for the productive system) shows expected behavior. All kinds of traffic which matches the services that are running on the VM. > > does it work with static ip ? In the VM, no. On the host due to the kernel bug I linked to in my first mail, I'm not able to restart the interface seperately. Afaik for know the only way to set an IP on eth4 is to "reboot -f" the host. Which is something I can not due right now because there are productive systems on it. Maybe weekend. > > > ----- Mail original ----- > De: "Ulf Weikert" > ?: "proxmoxve" > Envoy?: Mardi 15 Novembre 2016 19:16:24 > Objet: [PVE-User] Proxmox VE 4.2-2, 10Gbit Fibre, No LAN Connection in VM > > Hey there, > > I'm running Proxmox VE 4.2-2 on a HP DL380 G8. > Yesterday I installed a new 10Gbit HP 560SFP+ Dual Port Fibre NIC. > Which lead me to this bug [0], [1]. > > But since the card is displayed in my Proxmox Webinterface, I think it > should work at least. > > > So I configured my KVM Container and installed Windows Server 2012 R2 VM > according to the best practices wiki [2]. > > After Installation however the NIC in the VM does not receive an IP. > The DL 380 host is directly attached to our coreswitch via fibre channel. > > To make sure DHCP is working on the coreswitch port I plugged in some > old 1 GBit SFP modules. Connected it to an equally old 24 port Switch. > And plugged my notebook in the RJ45 port. DHCP works fine there. > > So in theory it should work in my VM as well. But for some reason it > doesn't. See screenshot [3] > See my Proxmox Host and VM Setup. [4] & [5]. > > I tried using OVS Bridge instead of Linux Bridge as suggested in [0]#20, > but that didn't work either. > > I'm thankful for any tip or advice you can give me. > > > [0] https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1616107 > [2] https://pve.proxmox.com/wiki/Windows_2012_guest_best_practices > [3] https://postimg.org/image/bmh7iooxp/ > [4] https://postimg.org/image/l8aryzg3h/ > [5] https://postimg.org/image/z392hgail/ > -- Freundliche Gr??e Im Auftrag Ulf Weikert Systemadministrator Berufsbildende Schulen Haarentor der Stadt Oldenburg Ammerl?nder Heerstr. 33-39 | 26129 Oldenburg ----------------------------------------- Encryptet 'Signal' Call: +49 441 77915-17 Tel. +49 441 77915-17 E-Mail: weik at bbs-haarentor.de Besuchen Sie uns im Internet unter www.bbs-haarentor.de Schulprogramm unter: http://www.bbs-haarentor.de/index.php?id=323 Diese E-Mail enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. From gaio at sv.lnf.it Wed Nov 16 16:47:18 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 16 Nov 2016 16:47:18 +0100 Subject: [PVE-User] P2V and UEFI... 
Message-ID: <20161116154718.GJ3673@sv.lnf.it> I need to P2V a debian 8 server, installed on UEFI/GPT. A little complication born by the fact that i need to P2V in the same server (eg, image the server, reinstall it with proxmox, then create the VM), but i can move data elsewhere (to keep OS image minimal) and test the image with other PVE installation. Normally, i use 'mondobackup' for that, but mondo does not support UEFI (at least in debian). Also, i prefere to keep data in a second (virtual) disk, and backup that by other mean (bacula) so i need to ''repartition'' (better: reorganize data) in disks. So, summarizing: what tool it is better to use to do a (preferibly offline) image of some partition of a phisical server, respecting UEFI partitioning schema? I hope i was clear. Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From nick-liste at posteo.eu Wed Nov 16 18:43:52 2016 From: nick-liste at posteo.eu (Nicola Ferrari (#554252)) Date: Wed, 16 Nov 2016 18:43:52 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> References: <4C14E5C6-0309-4EC2-A42D-24447FEF7266@linux-nerd.de> Message-ID: On 16/11/2016 13:02, Daniel wrote: > Actually i have the same problem. > > Thats the reason why i started to develop my own App. > But these App will take serval month till i will be able to show something. > > - Thanks Daniel for your response. I'm glad to know about it. - In the meantime, I've just written to the "Quadrata" developers (they are also italian as me), to know something more about the QuadProx future roadmap. - Surfing in the play store, I got into this one: https://play.google.com/store/apps/details?id=com.undatech.opaque Have anybody tried this? Since that's not free, and no trial version is available, I would know your personal opinion about that. ( Please be patient for my poor english... ) Thanks! Nick -- +---------------------+ | Linux User #554252 | +---------------------+ From martin at proxmox.com Wed Nov 16 18:45:50 2016 From: martin at proxmox.com (Martin Maurer) Date: Wed, 16 Nov 2016 18:45:50 +0100 Subject: [PVE-User] Android app for pve 4.2 management In-Reply-To: References: Message-ID: See http://pve.proxmox.com/wiki/Proxmox_VE_Mobile On 16.11.2016 12:40, Nicola Ferrari (#554252) wrote: > Hi everybody. > > I'm running a 3-nodes pve 4.2 cluster. > In the past I used to overview the cluster at home and while travelling > using OpenVPN on my phone, in conjunction with QuadProx Mobile: > https://play.google.com/store/apps/details?id=it.quadrata.android.quad_prox_mob&hl=it > > I recently (sept 2016) upgraded cluster to pve4, and only yesterday I > realized that Quadrata App is no more functional on pve4: > I can see only a few data (nodes name, free memory/cpu) but I can't see > vm list and management options (start, stop, console and so on) . > > Do you experience the same issue too? > Any advice about alternative Android apps to achieve this? > > Thanks! > Nick > > PS: I tried to use simply a web browser on the mobile (firefox mobile) > but it requests too much resources... that's not usable.. 
> > > -- Best Regards, Martin Maurer martin at proxmox.com http://www.proxmox.com ____________________________________________________________________ Proxmox Server Solutions GmbH Br?uhausgasse 37, 1050 Vienna, Austria Commercial register no.: FN 258879 f Registration office: Handelsgericht Wien From yannis.milios at gmail.com Wed Nov 16 19:54:07 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Wed, 16 Nov 2016 18:54:07 +0000 Subject: [PVE-User] P2V and UEFI... In-Reply-To: <20161116154718.GJ3673@sv.lnf.it> References: <20161116154718.GJ3673@sv.lnf.it> Message-ID: I would use plain dd or clonezilla to backup. Then restore to vm and adjust partitions/vdisks as needed by using gparted. On Wednesday, 16 November 2016, Marco Gaiarin wrote: > > I need to P2V a debian 8 server, installed on UEFI/GPT. > > A little complication born by the fact that i need to P2V in the same > server (eg, image the server, reinstall it with proxmox, then create > the VM), but i can move data elsewhere (to keep OS image minimal) and > test the image with other PVE installation. > > Normally, i use 'mondobackup' for that, but mondo does not support UEFI > (at least in debian). > > > Also, i prefere to keep data in a second (virtual) disk, and backup > that by other mean (bacula) so i need to ''repartition'' (better: > reorganize data) in disks. > > > So, summarizing: what tool it is better to use to do a (preferibly > offline) image of some partition of a phisical server, respecting UEFI > partitioning schema? > > > I hope i was clear. Thanks. > > -- > dott. Marco Gaiarin GNUPG Key ID: > 240A3D66 > Associazione ``La Nostra Famiglia'' > http://www.lanostrafamiglia.it/ > Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento > (PN) > marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f > +39-0434-842797 > > Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! > http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 > (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Sent from Gmail Mobile From mityapetuhov at gmail.com Thu Nov 17 06:05:32 2016 From: mityapetuhov at gmail.com (Dmitry Petuhov) Date: Thu, 17 Nov 2016 08:05:32 +0300 Subject: [PVE-User] P2V and UEFI... In-Reply-To: <20161116154718.GJ3673@sv.lnf.it> References: <20161116154718.GJ3673@sv.lnf.it> Message-ID: I've used something like ssh root@ 'tar --one-file-system -C -cf - .' | tar -C -xf - Source and target partition layouts may differ, just don't forget to update fstab accordingly and about boot partition for UEFI and re-run grub-install (maybe not needed for UEFI?). 16.11.2016 18:47, Marco Gaiarin wrote: > I need to P2V a debian 8 server, installed on UEFI/GPT. > > A little complication born by the fact that i need to P2V in the same > server (eg, image the server, reinstall it with proxmox, then create > the VM), but i can move data elsewhere (to keep OS image minimal) and > test the image with other PVE installation. > > Normally, i use 'mondobackup' for that, but mondo does not support UEFI > (at least in debian). > > > Also, i prefere to keep data in a second (virtual) disk, and backup > that by other mean (bacula) so i need to ''repartition'' (better: > reorganize data) in disks. > > > So, summarizing: what tool it is better to use to do a (preferibly > offline) image of some partition of a phisical server, respecting UEFI > partitioning schema? 
> > > I hope i was clear. Thanks. > From dietmar at proxmox.com Thu Nov 17 10:07:06 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 17 Nov 2016 10:07:06 +0100 (CET) Subject: [PVE-User] drbdmanage License change Message-ID: <419261783.47.1479373626167@webmail.proxmox.com> Hi all, We just want to inform you that Linbit changed the License for their 'drbdmanage' toolkit. The commit messages says ("Philipp Reisner"): ------------------ basically we do not want that others (who have not contributed to the development) act as parasites in our support business ------------------ The commit is here: http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 The new License contains the following clause (3.4b): ------------------ 3.4) Without prior written consent of LICENSOR or an authorized partner, LICENSEE is not allowed to: b) provide commercial turn-key solutions based on the LICENSED SOFTWARE or commercial services for the LICENSED SOFTWARE or its modifications to any third party (e.g. software support or trainings). ------------------ So we are basically forced to remove the package from our repository. We will also remove the included storage driver to make sure that we and our customers do not violate that license. Please contact Linbit if you want to use drbdmanage in future. They may provide all necessary packages. From gaio at sv.lnf.it Thu Nov 17 14:07:38 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Thu, 17 Nov 2016 14:07:38 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... Message-ID: <20161117130738.GG3402@sv.lnf.it> I'm still building my ceph cluster, and i've found that i put it under heavy stress migrating data. So i've setup, on a node (so, not replicated) a thin lvm storage and tried to move the disk. My LVM setup: root at thor:~# pvdisplay --- Physical volume --- PV Name /dev/sda5 VG Name pve PV Size 1.37 TiB / not usable 2.00 MiB Allocatable yes (but full) PE Size 4.00 MiB Total PE 358668 Free PE 0 Allocated PE 358668 PV UUID yxx5qG-NAJQ-IqpV-HdJW-7YJS-M2c5-HeQItn root at thor:~# vgdisplay --- Volume group --- VG Name pve System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 10 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 1.37 TiB PE Size 4.00 MiB Total PE 358668 Alloc PE / Size 358668 / 1.37 TiB Free PE / Size 0 / 0 VG UUID VBaahR-ikYG-H2jK-TCdq-SPvE-VbLA-X4fpPd root at thor:~# lvdisplay --- Logical volume --- LV Path /dev/pve/lvol0 LV Name lvol0 VG Name pve LV UUID LR4G8Z-zHoB-t12p-B127-dK8z-GZw1-tZmQHP LV Write Access read/write LV Creation host, time thor, 2016-11-11 12:23:36 +0100 LV Status available # open 0 LV Size 88.00 MiB Current LE 22 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 251:0 --- Logical volume --- LV Name scratch VG Name pve LV UUID fFVtrc-B9lJ-h3gj-ksU6-WICb-w0A6-BVlqlq LV Write Access read/write LV Creation host, time thor, 2016-11-11 12:24:33 +0100 LV Pool metadata scratch_tmeta LV Pool data scratch_tdata LV Status available # open 1 LV Size 1.37 TiB Allocated pool data 48.36% Allocated metadata 99.95% Current LE 358602 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 251:3 (note the 'Allocated pool data 48.36%'). Source disk is 1TB on ceph, rather empty. Target space is 1.37 TB. I've followed the proxmox wiki creating the thin lvm storage (https://pve.proxmox.com/wiki/Storage:_LVM_Thin). 
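In short, that setup boils down to something like this (the size is only an example and should leave some room in the VG for pool metadata; the VG/pool/storage names are the ones from my setup above):

# create an LV and turn it into a thin pool
lvcreate -L 1.3T -n scratch pve
lvconvert --type thin-pool pve/scratch

# then register it as a storage, e.g. in /etc/pve/storage.cfg (or via the GUI):
lvmthin: Scratch
        vgname pve
        thinpool scratch
        content images,rootdir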
I've first tried to move the disk 'online', and log say: create full clone of drive virtio1 (DATA:vm-107-disk-1) Logical volume "vm-107-disk-1" created. drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed transferred: 0 bytes remaining: 1099511627776 bytes total: 1099511627776 bytes progression: 0.00 % busy: true ready: false transferred: 146800640 bytes remaining: 1099364827136 bytes total: 1099511627776 bytes progression: 0.01 % busy: true ready: false transferred: 557842432 bytes remaining: 1098953785344 bytes total: 1099511627776 bytes progression: 0.05 % busy: true ready: false [...] transferred: 727548166144 bytes remaining: 371963461632 bytes total: 1099511627776 bytes progression: 66.17 % busy: true ready: false device-mapper: message ioctl on failed: Operation not supported Failed to resume scratch. lvremove 'pve/vm-107-disk-1' error: Failed to update pool pve/scratch. TASK ERROR: storage migration failed: mirroring error: mirroring job seem to have die. Maybe do you have bad sectors? at /usr/share/perl5/PVE/QemuServer.pm line 5890. In syslog i've catched also: Nov 17 12:59:45 thor lvm[598]: Thin metadata pve-scratch-tpool is now 80% full. Nov 17 13:03:35 thor lvm[598]: Thin metadata pve-scratch-tpool is now 85% full. Nov 17 13:07:25 thor lvm[598]: Thin metadata pve-scratch-tpool is now 90% full. Nov 17 13:11:25 thor lvm[598]: Thin metadata pve-scratch-tpool is now 95% full. Now, if i try again, i simply get (offline or online, make no difference): create full clone of drive virtio1 (DATA:vm-107-disk-1) device-mapper: message ioctl on failed: Operation not supported TASK ERROR: storage migration failed: lvcreate 'pve/vm-107-disk-1' error: Failed to resume scratch. Also, if i go to proxmox web interfce, storage 'Scratch' (the name of the thin lvm storage) is: Usage 48.36% (677.42 GiB of 1.37 TiB but 'content' is empty. And i'm sure i've not 677.42 GiB of data in the source disk... What i'm missing?! Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From IMMO.WETZEL at adtran.com Thu Nov 17 14:49:58 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 13:49:58 +0000 Subject: [PVE-User] api call to get the right node name Message-ID: HI, is there any direct api call to get the node name where the vm is currently running on ? Mit freundlichen Gr??en / With kind regards Immo Wetzel ADTRAN GmbH Siemensallee 1 17489 Greifswald Germany Phone: +49 3834 5352 823 Mobile: +49 151 147 29 225 Skype: immo_wetzel_adtran Immo.Wetzel at Adtran.com PGP-Fingerprint: 7313 7E88 4E19 AACF 45E9 E74D EFF7 0480 F4CF 6426 http://www.adtran.com Sitz der Gesellschaft: Berlin / Registered office: Berlin Registergericht: Berlin / Commercial registry: Amtsgericht Charlottenburg, HRB 135656 B Gesch?ftsf?hrung / Managing Directors: Roger Shannon, James D. Wilson, Jr., Dr. 
Eduard Scheiterer From d.csapak at proxmox.com Thu Nov 17 15:12:48 2016 From: d.csapak at proxmox.com (Dominik Csapak) Date: Thu, 17 Nov 2016 15:12:48 +0100 Subject: [PVE-User] api call to get the right node name In-Reply-To: References: Message-ID: On 11/17/2016 02:49 PM, IMMO WETZEL wrote: > HI, > > is there any direct api call to get the node name where the vm is currently running on ? > not directly no, but you can call /cluster/resources and parse the output for your vm From IMMO.WETZEL at adtran.com Thu Nov 17 17:36:31 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 16:36:31 +0000 Subject: [PVE-User] how to create a snapshot from vm via api2 ? Message-ID: Is that function may be not described at the current api doc ? I would expect at least three parameter node,vmid,snapshotname,description,savevmstate{Boolean} cos qm snapshot allow such parameter root at prox01:~# qm help snapshot USAGE: qm snapshot [OPTIONS] Snapshot a VM. integer (1 - N) The (unique) ID of the VM. string The name of the snapshot. -description string A textual description or comment. -vmstate boolean Save the vmstate From IMMO.WETZEL at adtran.com Thu Nov 17 18:49:11 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 17 Nov 2016 17:49:11 +0000 Subject: [PVE-User] how to get the processstate via API Message-ID: Hi, Every task started by API gets a unique task id. How can I check the state of this task via API? Immo From dietmar at proxmox.com Thu Nov 17 19:54:33 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Thu, 17 Nov 2016 19:54:33 +0100 (CET) Subject: [PVE-User] how to get the processstate via API In-Reply-To: References: Message-ID: <1071762906.117.1479408873531@webmail.proxmox.com> HTTP: GET /api2/json/nodes/{node}/tasks/{upid} CLI: pvesh get /nodes/{node}/tasks/{upid} > On November 17, 2016 at 6:49 PM IMMO WETZEL wrote: > > > Hi, > Every task started by API gets a unique task id. > How can I check the state of this task via API? > > Immo > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From f.gruenbichler at proxmox.com Fri Nov 18 08:38:51 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Fri, 18 Nov 2016 08:38:51 +0100 Subject: [PVE-User] how to create a snapshot from vm via api2 ? In-Reply-To: References: Message-ID: <20161118073851.72gc2qqye6ibgz5z@nora.maurer-it.com> On Thu, Nov 17, 2016 at 04:36:31PM +0000, IMMO WETZEL wrote: > Is that function may be not described at the current api doc ? it is ;) are you using the online version[1]? > I would expect at least three parameter > node,vmid,snapshotname,description,savevmstate{Boolean} > > cos qm snapshot allow such parameter > root at prox01:~# qm help snapshot > USAGE: qm snapshot [OPTIONS] > > Snapshot a VM. > > integer (1 - N) > The (unique) ID of the VM. > string > The name of the snapshot. > -description string > A textual description or comment. > -vmstate boolean > Save the vmstate HTTP: POST /api2/json/nodes/{node}/qemu/{vmid}/snapshot CLI: pvesh create /nodes/{node}/qemu/{vmid}/snapshot node, snapname and vmid are required, and you have the two optional parameters like with "qm snapshot" 1: http://pve.proxmox.com/pve-docs/api-viewer/index.html From gaio at sv.lnf.it Fri Nov 18 14:04:48 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Fri, 18 Nov 2016 14:04:48 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... 
In-Reply-To: <20161117130738.GG3402@sv.lnf.it> References: <20161117130738.GG3402@sv.lnf.it> Message-ID: <20161118130448.GF3291@sv.lnf.it> > What i'm missing?! Thanks. Sorry, probably i'm missing some background info on LVM. Trying to reset and restart from the ground. With LVM, you define a storage with a VG, and proxmox itself create a LV for every disk. Simple, clear. With Thin-LVM, insted, i've to creare a LV, define it as 'thin' with (taken from the wiki): lvcreate -L 100G -n data pve lvconvert --type thin-pool pve/data and in definition of the storage, in proxmox interface, i've to specify the VG (clear) but also the LV. OK. But done that, where the disk image get created? Proxmox take care of formatting and mounting the LV, and create the disk image inside? Sorry, but i've not clear how works... Also, seems that the trouble with Thin LVM came from the fact that i've exausted the 'metadata' space, and the default LVM configuration does not extend automatically the metadata space (thin_pool_autoextend_threshold = 100). Right? Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From f.gruenbichler at proxmox.com Fri Nov 18 14:28:47 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Fri, 18 Nov 2016 14:28:47 +0100 Subject: [PVE-User] Moving a disk from Ceph to thin-lvm, troubles... In-Reply-To: <20161118130448.GF3291@sv.lnf.it> References: <20161117130738.GG3402@sv.lnf.it> <20161118130448.GF3291@sv.lnf.it> Message-ID: <20161118132847.sltboxrjlet4neel@nora.maurer-it.com> On Fri, Nov 18, 2016 at 02:04:48PM +0100, Marco Gaiarin wrote: > > > What i'm missing?! Thanks. > > Sorry, probably i'm missing some background info on LVM. Trying to > reset and restart from the ground. > > > With LVM, you define a storage with a VG, and proxmox itself create a > LV for every disk. Simple, clear. > > > With Thin-LVM, insted, i've to creare a LV, define it as 'thin' with > (taken from the wiki): > > lvcreate -L 100G -n data pve > lvconvert --type thin-pool pve/data you can simply create the thin pool LV in one go, e.g.: lvcreate -L 100G -n mythinpoolname -T myvgname will create a 100G thin pool (volume) called "mythinpoolname" on the volume group "myvgname". optionally you can specify the pool metadata size (with "--poolmetadatasize SIZE"), the default is 64b per chunk of the pool. > > and in definition of the storage, in proxmox interface, i've to specify > the VG (clear) but also the LV. > OK. But done that, where the disk image get created? Proxmox take care > of formatting and mounting the LV, and create the disk image inside? PVE will automatically create thinly provisioned LVs for the disks, and instead of on the VG, they are created on the thin pool. A thin pool cannot be mounted, only the thinly provisioned volumes on it can. If you want to simplify it, a thin pool acts as both an LV (in relation to the VG) and as VG (in relation to the thin volumes stored on it). > > Sorry, but i've not clear how works... hope the explanation helped a bit? 
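To make it a bit more concrete, a rough example (all names and sizes below are placeholders):

# create a 100G thin pool in one step, with an explicit 1G metadata LV
lvcreate -L 100G --poolmetadatasize 1G -T myvgname/mythinpoolname

# roughly what PVE does for each guest disk: allocate a thin volume from the pool
lvcreate -V 32G -T myvgname/mythinpoolname -n vm-100-disk-1

# watch data and metadata usage of the pool and its thin volumes
lvs -o lv_name,lv_size,data_percent,metadata_percent myvgname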
> > Also, seems that the trouble with Thin LVM came from the fact that i've > exausted the 'metadata' space, and the default LVM configuration does > not extend automatically the metadata space > (thin_pool_autoextend_threshold = 100). you can specify the metadata size on creation (see above) - maybe the convert to thin operation does not allocate enough space for the metadata? in our default setup, pool autoextension is not possible (there are no free blocks in the VG to autoextend with).. you can check "man lvmthin" for more examples and explanations for how LVM thin provisioning works. From chance_ellis at yahoo.com Fri Nov 18 17:02:32 2016 From: chance_ellis at yahoo.com (Chance Ellis) Date: Fri, 18 Nov 2016 11:02:32 -0500 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> Message-ID: <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> Hello, I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? Thanks! On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: >> Are you sure you upgraded all, i.e. used: >> apt update >> apt full-upgrade > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > makes sense, should have been made sense a few days ago this, would not be too hard to catch :/ anyway, for anyone reading this: When upgrading qemu-server to version 4.0.93 or newer you should upgrade all other nodes pve-cluster package to version 4.0-47 or newer, else migrations to those nodes will not work - as we use a new command to detect if we should send the traffic over a separate migration network. cheers, Thomas _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From t.lamprecht at proxmox.com Fri Nov 18 17:44:41 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 18 Nov 2016 17:44:41 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> Message-ID: <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> Hi, On 11/18/2016 05:02 PM, Chance Ellis wrote: > Hello, > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. 
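For reference, that offline fallback amounts to something like this (VMID and node names are placeholders, and it assumes the disks sit on shared storage so only the config has to move):

qm shutdown 100
mv /etc/pve/nodes/node-2/qemu-server/100.conf /etc/pve/nodes/node-1/qemu-server/
# then, on node-1:
qm start 100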
The problem I will run into is a requirement for no down time. > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. If I understand you correctly you have new node1 old node2 And migration from node2 -> node1 does not work? That should not be, if you run into this can you post the error from the migrate command? We normally try to guarantee that old -> new works, the other way around cannot be always guaranteed. I tested this also now and it worked. I down graded a test node of mine, started a VM there and live migrated it successfully to a upgraded VM. > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? If you have not configured it it will not be used. Migrate a unimportant test VM first to see if it works. cheers, Thomas > > Thanks! > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> Are you sure you upgraded all, i.e. used: > >> apt update > >> apt full-upgrade > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > makes sense, should have been made sense a few days ago this, would not > be too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to > detect if we should send the traffic over a separate migration network. > > cheers, > Thomas > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From chance_ellis at yahoo.com Fri Nov 18 18:01:44 2016 From: chance_ellis at yahoo.com (Chance Ellis) Date: Fri, 18 Nov 2016 12:01:44 -0500 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> Message-ID: <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> Hi Tom, You have the essential issue. I have my original cluster. All nodes are running these versions: http://pastebin.com/vquKJaKJ As a test, I added a new node to the cluster, running these versions: http://pastebin.com/Jg5LH0RD When I try to migrate from old->new, I get the following error: http://pastebin.com/YazWBtn2 When I try to migrate from new-> old, I get the following error: http://pastebin.com/hBfBnsYP Thanks! 
On 11/18/16, 11:44 AM, "pve-user on behalf of Thomas Lamprecht" wrote: Hi, On 11/18/2016 05:02 PM, Chance Ellis wrote: > Hello, > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. Back to the no down time requirement, this is less than ideal. If I understand you correctly you have new node1 old node2 And migration from node2 -> node1 does not work? That should not be, if you run into this can you post the error from the migrate command? We normally try to guarantee that old -> new works, the other way around cannot be always guaranteed. I tested this also now and it worked. I down graded a test node of mine, started a VM there and live migrated it successfully to a upgraded VM. > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? If you have not configured it it will not be used. Migrate a unimportant test VM first to see if it works. cheers, Thomas > > Thanks! > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > >> Are you sure you upgraded all, i.e. used: > >> apt update > >> apt full-upgrade > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > makes sense, should have been made sense a few days ago this, would not > be too hard to catch :/ > > anyway, for anyone reading this: > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > all other nodes pve-cluster package to version 4.0-47 or newer, else > migrations to those nodes will not work - as we use a new command to > detect if we should send the traffic over a separate migration network. 
> > cheers, > Thomas > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From t.lamprecht at proxmox.com Fri Nov 18 18:21:00 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Fri, 18 Nov 2016 18:21:00 +0100 Subject: [PVE-User] [pve-devel] online migration broken in latest updates - "unknown command 'mtunnel'" In-Reply-To: <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> References: <462da5cc-a39d-4059-64f0-61c1042ed90a@gmail.com> <572ac973-9066-fa58-3651-41ee2475a491@proxmox.com> <1ABC5363-64F3-403C-9DAC-5C74EE712472@yahoo.com> <006a29c1-9d1d-875c-9419-7dfcc8437380@proxmox.com> <5FDE66D8-461C-4D3A-9356-6DAF80FE4F5E@yahoo.com> Message-ID: <40cba59f-ef02-e80a-fdb2-836098544ec1@proxmox.com> Hi, I'm removing the pve-devel list from the reply-to as one is enough. On 11/18/2016 06:01 PM, Chance Ellis wrote: > Hi Tom, > > You have the essential issue. > > I have my original cluster. All nodes are running these versions: http://pastebin.com/vquKJaKJ > > As a test, I added a new node to the cluster, running these versions: http://pastebin.com/Jg5LH0RD > > When I try to migrate from old->new, I get the following error: http://pastebin.com/YazWBtn2 Here the migration network patch has now fault, it comes to this line: > ... > Nov 18 11:59:27 starting online/live migration on tcp:localhost:60000 Here anything related to a dedicated migration network happened already, the rest of the code is independent from it. But I find it interesting that you have "tcp:localhost:60000" in the log. This means that your node still uses the old TCP forward ssh tunnel for the migration. Those did not open reliable so we switched to unix sockets, so the line should be something like: > Nov 18 17:42:38 starting online/live migration on unix:/run/qemu-server/167.migrate As a workaround disable migration_unsecure or delete it from /etc/pve/datacenter.cfg then it should work. I have to look into that, there may be a bug when migrating from old -> new and migration_unsecure on. > > When I try to migrate from new-> old, I get the following error: http://pastebin.com/hBfBnsYP This is expected. But you can solve it by updating at least the pve-cluster pcakage on the old node, then it should work also. cheers, Thomas > Thanks! > > > > > > > On 11/18/16, 11:44 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > Hi, > > > On 11/18/2016 05:02 PM, Chance Ellis wrote: > > Hello, > > > > I am running a small cluster of 3 nodes. I would like to upgrade those nodes to the newer versions. The problem I will run into is a requirement for no down time. > > > > My normal upgrade plan is to live migrate vms off of node-1 to the remaining 2 nodes. I then upgrade and reboot node-1. Once node-1 is operational, I move vms from node-2 to node-1. I upgrade and reboot node-2. I follow the same for node-3. > > > > The issue I will run into is that once I upgrade node-1, I won?t be able to migrate vms from node-2 back to it because of the version mismatch on qemu-server and pve-cluster. The migration will fail. The only option I will have is to shutdown the vm and move the conf file. 
Back to the no down time requirement, this is less than ideal. > > If I understand you correctly you have > new node1 > old node2 > > And migration from node2 -> node1 does not work? > That should not be, if you run into this can you post the error from the > migrate command? > > We normally try to guarantee that old -> new works, the other way around > cannot be always guaranteed. > > I tested this also now and it worked. I down graded a test node of mine, > started a VM there and live migrated it successfully to a upgraded VM. > > > Is there another way to migrate the vms with the new version packages using the old method that doesn?t detect or a separate migration network? > > If you have not configured it it will not be used. > Migrate a unimportant test VM first to see if it works. > > cheers, > Thomas > > > > > Thanks! > > > > > > On 11/11/16, 2:05 AM, "pve-user on behalf of Thomas Lamprecht" wrote: > > > > On 11/10/2016 10:35 PM, Lindsay Mathieson wrote: > > > On 11/11/2016 7:11 AM, Thomas Lamprecht wrote: > > >> Are you sure you upgraded all, i.e. used: > > >> apt update > > >> apt full-upgrade > > > > > > Resolved it thanks Thomas - I hadn't updated the *destination* server. > > > > > > > > > makes sense, should have been made sense a few days ago this, would not > > be too hard to catch :/ > > > > anyway, for anyone reading this: > > When upgrading qemu-server to version 4.0.93 or newer you should upgrade > > all other nodes pve-cluster package to version 4.0-47 or newer, else > > migrations to those nodes will not work - as we use a new command to > > detect if we should send the traffic over a separate migration network. > > > > cheers, > > Thomas > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > > > > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 10:19:11 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 10:19:11 +0100 Subject: [PVE-User] License issue Message-ID: Hi, I?m new here and I never used mailing lists.., so I apologise if I do something stupid. I?m Marcel van Leeuwen and living in the Netherlands. IT stuff is just my hobby but I must admit its a bit out of hand. I'm testing ProxmoxVE at the moment and I really like it. I also considered ESXi but I like the opensource character of ProxmoxVE. I subscribed for a license to support the project and of course to get updates. Now I?m in a testing phase so I installed my license a couple of times. I think I hit a maximum cause I can reactivate my license at the moment. I raised a ticket over at Maurer IT. I was not aware of this limitation. How do I prevent this from happening again? Just not install the license or not re-install ProxmoxVE? 
Regards, Marcel van Leeuwen From dietmar at proxmox.com Sat Nov 19 11:06:38 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 11:06:38 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: References: Message-ID: <1749975450.39.1479549998560@webmail.proxmox.com> > I subscribed for a license to support the project and of course to get > updates. Now I?m in a testing phase so I installed my license a couple of > times. I think I hit a maximum cause I can reactivate my license at the > moment. I raised a ticket over at Maurer IT. I was not aware of this > limitation. How do I prevent this from happening again? Just not install the > license or not re-install ProxmoxVE? It is usually not required to do re-installs (what for?). And I guess it is not necessary to activate the subscription for a test system when you know you will reinstall soon (use pve-no-subscription for updates). From mavleeuwen at icloud.com Sat Nov 19 12:14:06 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 12:14:06 +0100 Subject: [PVE-User] License issue In-Reply-To: <1749975450.39.1479549998560@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. For now I?ve add the pve-no-subscripition repository. What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? > On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: > > >> I subscribed for a license to support the project and of course to get >> updates. Now I?m in a testing phase so I installed my license a couple of >> times. I think I hit a maximum cause I can reactivate my license at the >> moment. I raised a ticket over at Maurer IT. I was not aware of this >> limitation. How do I prevent this from happening again? Just not install the >> license or not re-install ProxmoxVE? > > It is usually not required to do re-installs (what for?). And I guess > it is not necessary to activate the subscription for a test system > when you know you will reinstall soon (use pve-no-subscription for updates). > From bc at iptel.co Sat Nov 19 12:25:16 2016 From: bc at iptel.co (Brian ::) Date: Sat, 19 Nov 2016 11:25:16 +0000 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: Hi Marcel, Its all explained here https://pve.proxmox.com/wiki/Package_Repositories Cheers On Sat, Nov 19, 2016 at 11:14 AM, Marcel van Leeuwen wrote: > Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. > > For now I?ve add the pve-no-subscripition repository. > > What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? > >> On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: >> >> >>> I subscribed for a license to support the project and of course to get >>> updates. Now I?m in a testing phase so I installed my license a couple of >>> times. 
I think I hit a maximum cause I can reactivate my license at the >>> moment. I raised a ticket over at Maurer IT. I was not aware of this >>> limitation. How do I prevent this from happening again? Just not install the >>> license or not re-install ProxmoxVE? >> >> It is usually not required to do re-installs (what for?). And I guess >> it is not necessary to activate the subscription for a test system >> when you know you will reinstall soon (use pve-no-subscription for updates). >> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 12:33:56 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 12:33:56 +0100 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <22442392-FBAC-4FFD-BC32-4EDD6A7CCB79@icloud.com> Hi Brian, Thanks for that link! Checking? Cheers, Marcel > On 19 Nov 2016, at 12:25, Brian :: wrote: > > Hi Marcel, > > Its all explained here https://pve.proxmox.com/wiki/Package_Repositories > > Cheers > > > > On Sat, Nov 19, 2016 at 11:14 AM, Marcel van Leeuwen > wrote: >> Yeah, I agree it?s normally not necessary to do re-installs. The reason I did I was messing with remote NFS shares in LXC containers. So I did a couple of stupid things (i still have not resolved this issue). I already installed the license and was not aware of the limitation. >> >> For now I?ve add the pve-no-subscripition repository. >> >> What?s the difference between the pve-enterprise and the pve-no-subscription repository? Are update just beter tested in the pve-enterprise repo? >> >>> On 19 Nov 2016, at 11:06, Dietmar Maurer wrote: >>> >>> >>>> I subscribed for a license to support the project and of course to get >>>> updates. Now I?m in a testing phase so I installed my license a couple of >>>> times. I think I hit a maximum cause I can reactivate my license at the >>>> moment. I raised a ticket over at Maurer IT. I was not aware of this >>>> limitation. How do I prevent this from happening again? Just not install the >>>> license or not re-install ProxmoxVE? >>> >>> It is usually not required to do re-installs (what for?). And I guess >>> it is not necessary to activate the subscription for a test system >>> when you know you will reinstall soon (use pve-no-subscription for updates). >>> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 14:35:52 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 14:35:52 +0100 Subject: [PVE-User] NFS, LXC Message-ID: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> Hi, I?m trying to mount a remote NFS share (NAS) from a LXC container. I found this on the Proxmox forums and tried it. /etc/apparmor.d/lxc-default-with-nfs # Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which # will source all profiles under /etc/apparmor.d/lxc profile lxc-container-default-with-nfs flags=(attach_disconnected,mediate_deleted) { #include # allow NFS (nfs/nfs4) mounts. 
mount fstype=nfs*, } reload apparmor_parser -r /etc/apparmor.d/lxc-containers add to container config lxc.aa_profile: lxc-container-default-with-nfs I add the above settings to my Proxmox host but when I restart the LXC container with the new settings I can?t access the web app in this container anymore. It looks like all network connectivity is gone. Also tried to ping Goolge.com within the LXC container but no go. When I remove lxc.aa_profile: lxc-container-default-with-nfs everything is okay. Any idea? Cheers, Marcel From lemonnierk at ulrar.net Sat Nov 19 14:50:42 2016 From: lemonnierk at ulrar.net (Kevin Lemonnier) Date: Sat, 19 Nov 2016 14:50:42 +0100 Subject: [PVE-User] License issue In-Reply-To: <1749975450.39.1479549998560@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <20161119135042.GJ24918@luwin.ulrar.net> > > It is usually not required to do re-installs (what for?). [...] > It's so so so so easy to mess up in a cluster and be locked out. Unfortunatly the only way is to re-install, and that's basicaly the only answer you get from both IRC and the forum to those problems. So yes, re-install is unfortunatly necessary. -- Kevin Lemonnier PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Digital signature URL: From mavleeuwen at icloud.com Sat Nov 19 15:33:03 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 15:33:03 +0100 Subject: [PVE-User] NFS, LXC In-Reply-To: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> References: <3922CAC4-2058-4E21-9AB1-F7F45E17C8B4@icloud.com> Message-ID: To reply to my own question you have to mount a NFS share on the host via the webui and use bind mount points. Cheers, Marcel > On 19 Nov 2016, at 14:35, Marcel van Leeuwen wrote: > > Hi, > > I?m trying to mount a remote NFS share (NAS) from a LXC container. I found this on the Proxmox forums and tried it. > > /etc/apparmor.d/lxc-default-with-nfs > > # Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which > # will source all profiles under /etc/apparmor.d/lxc > > profile lxc-container-default-with-nfs flags=(attach_disconnected,mediate_deleted) { > #include > > # allow NFS (nfs/nfs4) mounts. > mount fstype=nfs*, > } > > reload > > apparmor_parser -r /etc/apparmor.d/lxc-containers > > add to container config > > lxc.aa_profile: lxc-container-default-with-nfs > > I add the above settings to my Proxmox host but when I restart the LXC container with the new settings I can?t access the web app in this container anymore. It looks like all network connectivity is gone. Also tried to ping Goolge.com within the LXC container but no go. When I remove > > lxc.aa_profile: lxc-container-default-with-nfs > > everything is okay. Any idea? > > Cheers, > > Marcel > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 15:38:23 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 15:38:23 +0100 Subject: [PVE-User] License issue In-Reply-To: <20161119135042.GJ24918@luwin.ulrar.net> References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> Message-ID: <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Hmmm, also true. 
I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? Cheers, Marcel > On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: > >> >> It is usually not required to do re-installs (what for?). [...] >> > > It's so so so so easy to mess up in a cluster and be locked out. > Unfortunatly the only way is to re-install, and that's basicaly the > only answer you get from both IRC and the forum to those problems. > > So yes, re-install is unfortunatly necessary. > > -- > Kevin Lemonnier > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From bc at iptel.co Sat Nov 19 16:15:24 2016 From: bc at iptel.co (Brian ::) Date: Sat, 19 Nov 2016 15:15:24 +0000 Subject: [PVE-User] License issue In-Reply-To: <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Message-ID: Don't install the licence until you're fully comfortable that you have everything working the way you want it and you won't have any issue! You can use the non sub repo for as long as you need. On Sat, Nov 19, 2016 at 2:38 PM, Marcel van Leeuwen wrote: > Hmmm, also true. I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? > > Cheers, > > Marcel >> On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: >> >>> >>> It is usually not required to do re-installs (what for?). [...] >>> >> >> It's so so so so easy to mess up in a cluster and be locked out. >> Unfortunatly the only way is to re-install, and that's basicaly the >> only answer you get from both IRC and the forum to those problems. >> >> So yes, re-install is unfortunatly necessary. >> >> -- >> Kevin Lemonnier >> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Sat Nov 19 16:39:14 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 16:39:14 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> Message-ID: <547945705.173.1479569954476@webmail.proxmox.com> > For now I?ve add the pve-no-subscripition repository. > > What?s the difference between the pve-enterprise and the pve-no-subscription > repository? Are update just beter tested in the pve-enterprise repo? Basically yes. From mavleeuwen at icloud.com Sat Nov 19 19:15:01 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 19:15:01 +0100 Subject: [PVE-User] License issue In-Reply-To: References: <1749975450.39.1479549998560@webmail.proxmox.com> <20161119135042.GJ24918@luwin.ulrar.net> <66C3EE99-C305-43B7-90A5-63818F1A81C3@icloud.com> Message-ID: I'm certainly going to do this. Thanks! Marcel > Op 19 nov. 2016 om 16:15 heeft Brian :: het volgende geschreven: > > Don't install the licence until you're fully comfortable that you have > everything working the way you want it and you won't have any issue! 
> > You can use the non sub repo for as long as you need. > > On Sat, Nov 19, 2016 at 2:38 PM, Marcel van Leeuwen > wrote: >> Hmmm, also true. I think this surly applies to less experienced Linux user like me but also if you applies when you are not comfortable on a distro? >> >> Cheers, >> >> Marcel >>>> On 19 Nov 2016, at 14:50, Kevin Lemonnier wrote: >>>> >>>> >>>> It is usually not required to do re-installs (what for?). [...] >>>> >>> >>> It's so so so so easy to mess up in a cluster and be locked out. >>> Unfortunatly the only way is to re-install, and that's basicaly the >>> only answer you get from both IRC and the forum to those problems. >>> >>> So yes, re-install is unfortunatly necessary. >>> >>> -- >>> Kevin Lemonnier >>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111 >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mavleeuwen at icloud.com Sat Nov 19 19:20:54 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Sat, 19 Nov 2016 19:20:54 +0100 Subject: [PVE-User] License issue In-Reply-To: <547945705.173.1479569954476@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> Message-ID: <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> What if the license is renewed after a year? Then you have 3 installs again? Op 19 nov. 2016 om 16:39 heeft Dietmar Maurer het volgende geschreven: >> For now I?ve add the pve-no-subscripition repository. >> >> What?s the difference between the pve-enterprise and the pve-no-subscription >> repository? Are update just beter tested in the pve-enterprise repo? > > Basically yes. > From dietmar at proxmox.com Sat Nov 19 20:22:20 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sat, 19 Nov 2016 20:22:20 +0100 (CET) Subject: [PVE-User] License issue In-Reply-To: <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> Message-ID: <94138841.252.1479583340873@webmail.proxmox.com> Please note that our software license is AGPL. You talk about subscriptions here - and this is something very different. > What if the license is renewed after a year? Then you have 3 installs again? Sure. Also, you can simply contact our support if you need more than 3 installs. We usually find a solution ... From lindsay.mathieson at gmail.com Sun Nov 20 00:13:19 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sun, 20 Nov 2016 09:13:19 +1000 Subject: [PVE-User] pve-qemu-kvm 2.7.0-8 Message-ID: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> Does 2.7.0-8 resolve the snapshot problems? 
-- Lindsay Mathieson From lindsay.mathieson at gmail.com Sun Nov 20 01:04:13 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Sun, 20 Nov 2016 10:04:13 +1000 Subject: [PVE-User] pve-qemu-kvm 2.7.0-8 In-Reply-To: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> References: <426a470a-64ad-df2d-51e0-ac789de96e6b@gmail.com> Message-ID: <9673b774-9c47-b3a9-8657-03e4513042ec@gmail.com> On 20/11/2016 9:13 AM, Lindsay Mathieson wrote: > Does 2.7.0-8 resolve the snapshot problems? To answer my own question, it appears that it does. I have successfully snapshotted and restored running VM's. Online migration is ok to. Thanks Devs. -- Lindsay Mathieson From daniel at linux-nerd.de Sun Nov 20 11:08:50 2016 From: daniel at linux-nerd.de (Daniel) Date: Sun, 20 Nov 2016 11:08:50 +0100 Subject: [PVE-User] LXC Live Migration Message-ID: Hi, i didnt test it yet but is LXC Live Migration implemented now? If not, someone knows if there are plans for implementation? Cheers Daniel From marcomgabriel at gmail.com Sun Nov 20 15:27:11 2016 From: marcomgabriel at gmail.com (Marco M. Gabriel) Date: Sun, 20 Nov 2016 14:27:11 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <419261783.47.1479373626167@webmail.proxmox.com> References: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: How does this affect existing Proxmox VE 4.x / DRBD9 setups? Does "removing the storage driver" mean, that there is no DRBD kernel module available from next release oder is it just the manageability due to removal of drbdmanage? thanks for clarification, Marco Dietmar Maurer schrieb am Do., 17. Nov. 2016 um 10:07 Uhr: > Hi all, > > We just want to inform you that Linbit changed the License > for their 'drbdmanage' toolkit. > > The commit messages says ("Philipp Reisner"): > ------------------ > basically we do not want that others (who have not contributed to the > development) act as parasites in our support business > ------------------ > > The commit is here: > > > http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > > The new License contains the following clause (3.4b): > > ------------------ > 3.4) Without prior written consent of LICENSOR or an authorized partner, > LICENSEE is not allowed to: > > b) provide commercial turn-key solutions based on the LICENSED SOFTWARE or > commercial services for the LICENSED SOFTWARE or its modifications to any > third party (e.g. software support or trainings). > ------------------ > > So we are basically forced to remove the package from our repository. We > will > also remove the included storage driver to make sure that we and our > customers do not violate that license. > > Please contact Linbit if you want to use drbdmanage in future. They may > provide all necessary packages. > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From mail at valentinvoigt.info Sun Nov 20 16:00:40 2016 From: mail at valentinvoigt.info (Valentin Voigt) Date: Sun, 20 Nov 2016 15:00:40 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: Hi, I just set up a two-node cluster with Proxmox using DRBD9 for live migration. Does that mean that I should switch technology as long as we're not using that cluster for production? It would be pretty sad when DRBD gets removed from Proxmox once I get to use it. 
I think it's already hard to find solutions for high-availability(ish) clusters for those poor souls with with only two physical machines. Any hint in a better direction would of course be appreciated! Thanks! Valentin ------ Originalnachricht ------ Von: "Dietmar Maurer" An: "PVE Development List" ; "PVE User List" Gesendet: 17.11.2016 10:07:06 Betreff: [PVE-User] drbdmanage License change >Hi all, > >We just want to inform you that Linbit changed the License >for their 'drbdmanage' toolkit. > >The commit messages says ("Philipp Reisner"): >------------------ >basically we do not want that others (who have not contributed to the >development) act as parasites in our support business >------------------ > >The commit is here: > >http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > >The new License contains the following clause (3.4b): > >------------------ >3.4) Without prior written consent of LICENSOR or an authorized >partner, > LICENSEE is not allowed to: > >b) provide commercial turn-key solutions based on the LICENSED SOFTWARE >or > commercial services for the LICENSED SOFTWARE or its modifications to >any > third party (e.g. software support or trainings). >------------------ > >So we are basically forced to remove the package from our repository. >We will >also remove the included storage driver to make sure that we and our >customers do not violate that license. > >Please contact Linbit if you want to use drbdmanage in future. They may >provide all necessary packages. > >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From dietmar at proxmox.com Sun Nov 20 16:25:19 2016 From: dietmar at proxmox.com (Dietmar Maurer) Date: Sun, 20 Nov 2016 16:25:19 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: References: <419261783.47.1479373626167@webmail.proxmox.com> Message-ID: <225246742.23.1479655521325@webmail.proxmox.com> > How does this affect existing Proxmox VE 4.x / DRBD9 setups? > > Does "removing the storage driver" mean, that there is no DRBD kernel > module available from next release oder is it just the manageability due to > removal of drbdmanage? We will keep the kernel module for now, unless Linbit wants that we remove it. From aderumier at odiso.com Sun Nov 20 17:54:47 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Sun, 20 Nov 2016 17:54:47 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: References: Message-ID: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> >>I think it's already hard to find solutions for high-availability(ish) >>clusters for those poor souls with with only two physical machines. Any >>hint in a better direction would of course be appreciated! I think we could manage this with qemu block replication. Live local storage migration is coming, and it's including the mirror through remote node. So I think it should be too difficult to handle continuous replication of qemu block driver. Maybe the only problem is that is not possible to run other qemu jobs (backups for example), at the same time. ----- Mail original ----- De: "Valentin Voigt" ?: "proxmoxve" Envoy?: Dimanche 20 Novembre 2016 16:00:40 Objet: Re: [PVE-User] drbdmanage License change Hi, I just set up a two-node cluster with Proxmox using DRBD9 for live migration. Does that mean that I should switch technology as long as we're not using that cluster for production? 
It would be pretty sad when DRBD gets removed from Proxmox once I get to use it. I think it's already hard to find solutions for high-availability(ish) clusters for those poor souls with with only two physical machines. Any hint in a better direction would of course be appreciated! Thanks! Valentin ------ Originalnachricht ------ Von: "Dietmar Maurer" An: "PVE Development List" ; "PVE User List" Gesendet: 17.11.2016 10:07:06 Betreff: [PVE-User] drbdmanage License change >Hi all, > >We just want to inform you that Linbit changed the License >for their 'drbdmanage' toolkit. > >The commit messages says ("Philipp Reisner"): >------------------ >basically we do not want that others (who have not contributed to the >development) act as parasites in our support business >------------------ > >The commit is here: > >http://git.drbd.org/drbdmanage.git/commitdiff/441dc6a96b0bc6a08d2469fa5a82d97fc08e8ec1 > > >The new License contains the following clause (3.4b): > >------------------ >3.4) Without prior written consent of LICENSOR or an authorized >partner, > LICENSEE is not allowed to: > >b) provide commercial turn-key solutions based on the LICENSED SOFTWARE >or > commercial services for the LICENSED SOFTWARE or its modifications to >any > third party (e.g. software support or trainings). >------------------ > >So we are basically forced to remove the package from our repository. >We will >also remove the included storage driver to make sure that we and our >customers do not violate that license. > >Please contact Linbit if you want to use drbdmanage in future. They may >provide all necessary packages. > >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From lindsay.mathieson at gmail.com Sun Nov 20 22:22:37 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 21 Nov 2016 07:22:37 +1000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> Message-ID: <899652ea-c266-d799-2032-f3e17d156387@gmail.com> On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > I think we could manage this with qemu block replication. Very nice. Is this an existing feature in qemu or still under development? (or planning) -- Lindsay Mathieson From aderumier at odiso.com Mon Nov 21 07:19:41 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Mon, 21 Nov 2016 07:19:41 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: <899652ea-c266-d799-2032-f3e17d156387@gmail.com> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> Message-ID: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> >>Is this an existing feature in qemu or still under development? (or >>planning) qemu already support block migration to remote nbd (network block device) server. qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). I'll would like to implemented this, but first, we need to finish to implement live migration + live local storage migration. 
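For readers who want to try the underlying mechanism by hand, here is a minimal offline sketch of the NBD transport this builds on. The host name, port and volume paths below are placeholders, not taken from this thread; the live path drives the same kind of NBD export through QEMU's drive-mirror block job instead of a one-shot copy.

  # on the target node: export the destination volume over NBD
  qemu-nbd -f raw -x vm-100-disk-1 -p 10809 -t /dev/vg0/vm-100-disk-1

  # on the source node: copy the image into the existing export (-n skips target creation)
  qemu-img convert -n -f qcow2 -O raw /var/lib/vz/images/100/vm-100-disk-1.qcow2 \
      nbd://target-node:10809/vm-100-disk-1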
----- Mail original ----- De: "Lindsay Mathieson" ?: "proxmoxve" Envoy?: Dimanche 20 Novembre 2016 22:22:37 Objet: Re: [PVE-User] drbdmanage License change On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > I think we could manage this with qemu block replication. Very nice. Is this an existing feature in qemu or still under development? (or planning) -- Lindsay Mathieson _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From yannis.milios at gmail.com Mon Nov 21 08:53:01 2016 From: yannis.milios at gmail.com (Yannis Milios) Date: Mon, 21 Nov 2016 07:53:01 +0000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: Regarding drbd, is it possible to include drbd8 kernel module + userland utilities instead which are not affected by the license change? On Mon, 21 Nov 2016 at 06:20, Alexandre DERUMIER wrote: > >>Is this an existing feature in qemu or still under development? (or > >>planning) > > qemu already support block migration to remote nbd (network block device) > server. > > qemu 2.8 have a new feature, COLO, which will allow HA without vm > interruption. (continuous memory + block replication on remote node). > I'll would like to implemented this, but first, we need to finish to > implement live migration + live local storage migration. > > > ----- Mail original ----- > De: "Lindsay Mathieson" > ?: "proxmoxve" > Envoy?: Dimanche 20 Novembre 2016 22:22:37 > Objet: Re: [PVE-User] drbdmanage License change > > On 21/11/2016 2:54 AM, Alexandre DERUMIER wrote: > > I think we could manage this with qemu block replication. > > Very nice. > > > Is this an existing feature in qemu or still under development? (or > planning) > > -- > Lindsay Mathieson > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- Sent from Gmail Mobile From lindsay.mathieson at gmail.com Mon Nov 21 09:56:21 2016 From: lindsay.mathieson at gmail.com (Lindsay Mathieson) Date: Mon, 21 Nov 2016 18:56:21 +1000 Subject: [PVE-User] drbdmanage License change In-Reply-To: <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > qemu already support block migration to remote nbd (network block device) server. Thanks, I'll have a look into that. > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). Wow, very cool. 
-- Lindsay From f.gruenbichler at proxmox.com Mon Nov 21 10:17:29 2016 From: f.gruenbichler at proxmox.com (Fabian =?iso-8859-1?Q?Gr=FCnbichler?=) Date: Mon, 21 Nov 2016 10:17:29 +0100 Subject: [PVE-User] drbdmanage License change In-Reply-To: References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> Message-ID: <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> On Mon, Nov 21, 2016 at 06:56:21PM +1000, Lindsay Mathieson wrote: > On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > > qemu already support block migration to remote nbd (network block device) server. > > Thanks, I'll have a look into that. > > > > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). > > Wow, very cool. > Yes - but also will take some time to test and integrate, so don't expect this to hit the PVE repos right after the 2.8 release ;). Also keep in mind the hardware and network requirements for anything approaching a busy workload running like this - you need to constantly sync the memory and I/O! From aderumier at odiso.com Mon Nov 21 13:52:13 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Mon, 21 Nov 2016 13:52:13 +0100 (CET) Subject: [PVE-User] drbdmanage License change In-Reply-To: <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> References: <255524546.3490258.1479660887587.JavaMail.zimbra@oxygem.tv> <899652ea-c266-d799-2032-f3e17d156387@gmail.com> <1924091778.3494438.1479709181439.JavaMail.zimbra@oxygem.tv> <20161121091729.foqnlctas4y6mh7r@nora.maurer-it.com> Message-ID: <1478687900.3513790.1479732733900.JavaMail.zimbra@oxygem.tv> >>Yes - but also will take some time to test and integrate, so don't >>expect this to hit the PVE repos right after the 2.8 release ;) Yes, I think this need a lot of work :) ----- Mail original ----- De: "Fabian Gr?nbichler" ?: "proxmoxve" Envoy?: Lundi 21 Novembre 2016 10:17:29 Objet: Re: [PVE-User] drbdmanage License change On Mon, Nov 21, 2016 at 06:56:21PM +1000, Lindsay Mathieson wrote: > On 21 November 2016 at 16:19, Alexandre DERUMIER wrote: > > qemu already support block migration to remote nbd (network block device) server. > > Thanks, I'll have a look into that. > > > > > qemu 2.8 have a new feature, COLO, which will allow HA without vm interruption. (continuous memory + block replication on remote node). > > Wow, very cool. > Yes - but also will take some time to test and integrate, so don't expect this to hit the PVE repos right after the 2.8 release ;). Also keep in mind the hardware and network requirements for anything approaching a busy workload running like this - you need to constantly sync the memory and I/O! _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From IMMO.WETZEL at adtran.com Mon Nov 21 22:44:43 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Mon, 21 Nov 2016 21:44:43 +0000 Subject: [PVE-User] Set new vm description via a piece call doesn't work. Any example available? Message-ID: Hi, I try to change the description of a vm via a picture call. But I got always back an empty array. I used the same json call for creating snapshot successfully. The Web interface seem not to use json but adding the optional digest. 
So is there anybody in this round who can tell me if I have to add the digest also for description and how to calculate this one? I tried already to use the digest I got with the get config. My json body just contains the description string. Nothing else and the call path is .../nodes/[node]/qemu/[vmid]/config Immo Sent from Samsung Mobile From t.lamprecht at proxmox.com Tue Nov 22 08:33:57 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Tue, 22 Nov 2016 08:33:57 +0100 Subject: [PVE-User] Set new vm description via a piece call doesn't work. Any example available? In-Reply-To: References: Message-ID: Hi, On 11/21/2016 10:44 PM, IMMO WETZEL wrote: > Hi, > > I try to change the description of a vm via a picture call. pvesh set /nodes/localhost/qemu/100/config --description 'test 12' works for me here. So with HTTP you would use a PUT request (instead of set with pvesh) to /nodes/localhost/qemu/100/config with the description property. > But I got always back an empty array. > I used the same json call for creating snapshot successfully. > > The Web interface seem not to use json but adding the optional digest. > So is there anybody in this round who can tell me if I have to add the digest also for description and how to calculate this one? > I tried already to use the digest I got with the get config. That's the correct value for digest. It is simply a SHA1 hash of the config file, you can then pass the digest to your set command so that this command aborts if someone else changed the config in the mean time. > > My json body just contains the description string. Nothing else and the call path is .../nodes/[node]/qemu/[vmid]/config > > Immo > > > > Sent from Samsung Mobile > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From ADhaussy at voyages-sncf.com Tue Nov 22 17:35:08 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 16:35:08 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: ...sequel to those thrilling adventures... I _still_ have problems with nodes not joining the cluster properly after rebooting... Here's what we have done last night : - Stopped ALL VMs (just to ensure no corruption happen in case of unexpected reboots...) - Patched qemu from 2.6.1 to 2.6.2 to fix live migration issues. - Removed bridge (cluster network) on all nodes to fix multicast issues (11 nodes total.) - Patched all (HP blade/HP ILO/Ethernet/Fiber Channel card) bios and firmwares (13 nodes total.) - Rebooted all nodes, one, two, or three server simultaneously. So far we had absolutly no problems, corosync was still quorate and all nodes leaved and joined the cluster successfully. - Added 2 nodes to the cluster, no problem at all... - Started two VMs on two nodes, and to cut the network on those nodes. 
- As expected, watchdog did its job killing the two nodes, VMs were relocated.... so far so good ! _Except_, the two nodes were never able to join the cluster again after reboot... LVM takes so long to scan all PVs/LVs....somehow, i believe, it ends in an inconsistency when systemd starts cluster services. On the other nodes, i can actually see that corosync does a quick join/leave (and fails) right after booting... Nov 22 02:07:52 proxmoxt21 corosync[22342]: [TOTEM ] A new membership (10.98.x.x:1492) was formed. Members joined: 10 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [TOTEM ] A new membership (10.98.x.x:1496) was formed. Members left: 10 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [CPG ] downlist left_list: 0 received in state 2 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [QUORUM] Members[10]: 9 11 5 4 12 3 1 2 6 8 Nov 22 02:07:52 proxmoxt21 corosync[22342]: [MAIN ] Completed service synchronization, ready to provide service. I tried several reboots...same problem. :( I ended up removing the two freshly added nodes from the cluster, and restarted all VMs. I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... Recall that i have about 1500Vms, 1600LVs, 70PVs on external SAN storage... _Now_ i have a serious lead that this issue could be related to a known racing condition between udev and multipath. I have had this issue previously, but i didnt think i would interact and cause issues with cluster services...what do you think ? See the https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799781 I quickly tried the workaround suggested here : https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799781#32 (remove this rule from udev : ACTION=="add|change", SUBSYSTEM=="block", RUN+="/sbin/multipath -v0 /dev/$name") I can tell it boots _much_ faster, but i will need to give another try and proper testing to see if it fix my issue... Anyhow, i'm open to suggestions or thoughts that could enlighten me... (And sorry for the long story) Le 14/11/2016 ? 12:33, Thomas Lamprecht a ?crit : On 14.11.2016 11:50, Dhaussy Alexandre wrote: Le 11/11/2016 ? 19:43, Dietmar Maurer a ?crit : On November 11, 2016 at 6:41 PM Dhaussy Alexandre wrote: you lost quorum, and the watchdog expired - that is how the watchdog based fencing works. I don't expect to loose quorum when _one_ node joins or leave the cluster. This was probably a long time before - but I have not read through the whole logs ... That makes no sense to me.. The fact is : everything have been working fine for weeks. What i can see in the logs is : several reboots of cluster nodes suddently, and exactly one minute after one node joining and/or leaving the cluster. 
The watchdog is set to an 60 second timeout, meaning that cluster leave caused quorum loss, or other problems (you said you had multicast problems around that time) thus the LRM stopped updating the watchdog, so one minute later it resetted all nodes, which left the quorate partition. I see no problems with corosync/lrm/crm before that. This leads me to a probable network (multicast) malfunction. I did a bit of homeworks reading the wiki about ha manager.. What i understand so far, is that every state/service change from LRM must be acknowledged (cluster-wise) by CRM master. Yes and no, LRM and CRM are two state machines with synced inputs, but that holds mainly for human triggered commands and the resulting communication. Meaning that commands like start, stop, migrate may not go through from the CRM to the LRM. Fencing and such stuff works none the less, else it would be a major design flaw :) So if a multicast disruption occurs, and i assume LRM wouldn't be able talk to the CRM MASTER, then it also couldn't reset the watchdog, am i right ? No, the watchdog runs on each node and is CRM independent. As watchdogs are normally not able to server more clients we wrote the watchdog-mux (multiplexer). This is a very simple C program which opens the watchdog with a 60 second timeout and allows multiple clients (at the moment CRM and LRM) to connect to it. If a client does not resets the dog for about 10 seconds, IIRC, the watchdox-mux disables watchdogs updates on the real watchdog. After that a node reset will happen *when* the dog runs out of time, not instantly. So if the LRM cannot communicate (i.e. has no quorum) he will stop updating the dog, thus trigger independent what the CRM says or does. Another thing ; i have checked my network configuration, the cluster ip is set on a linux bridge... By default multicast_snooping is set to 1 on linux bridge, so i think it there's a good chance this is the source of my problems... Note that we don't use IGMP snooping, it is disabled on almost all network switchs. Yes, multicast snooping has to be configured (recommended) or else turned off on the switch. That's stated in some wiki articles, various forum posts and our docs, here: http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements Hope that helps a bit understanding. :) cheers, Thomas Plus i found a post by A.Derumier (yes, 3 years old..) He did have similar issues with bridge and multicast. 
http://pve.proxmox.com/pipermail/pve-devel/2013-March/006678.html _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mir at miras.org Tue Nov 22 17:56:08 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 17:56:08 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> Message-ID: <20161122175608.50607b2d@sleipner.datanom.net> On Tue, 22 Nov 2016 16:35:08 +0000 Dhaussy Alexandre wrote: > > I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. My own rules as an inspiration: # Do not scan ZFS zvols (to avoid problems on ZFS zvols snapshots) global_filter = [ "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|" ] # Only scan for volumes on local disk and on iSCSI target from Qnap NAS. Block scanning from all # other block devices. filter = [ "a|ata-OCZ-AGILITY3_OCZ-QMZN8K4967DA9NGO.*|", "a|scsi-36001405e38e9f02ddef9d4573db7a0d0|", "r|.*|" ] -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: The trouble with being punctual is that people think you have nothing more important to do. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From ADhaussy at voyages-sncf.com Tue Nov 22 18:12:27 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 17:12:27 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <20161122175608.50607b2d@sleipner.datanom.net> References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : > On Tue, 22 Nov 2016 16:35:08 +0000 > Dhaussy Alexandre wrote: > >> I don't know how, but i feel that every node i add to the cluster currently slows down LVM scan a little more...until it ends up interfering with cluster services at boot... 
> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. Yep, i already tuned filters in lvm config, before that i had "duplicate PVs' messages because of multipath devices. Anyway if i'm not wrong, LVM still has a lot of LVs to activate at boot. nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume group "T_proxmox_1" now active nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume group "proxmoxt34-vg" now active From mir at miras.org Tue Nov 22 18:48:54 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 18:48:54 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Have you tested your filter rules? On November 22, 2016 6:12:27 PM GMT+01:00, Dhaussy Alexandre wrote: > >Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : >> On Tue, 22 Nov 2016 16:35:08 +0000 >> Dhaussy Alexandre wrote: >> >>> I don't know how, but i feel that every node i add to the cluster >currently slows down LVM scan a little more...until it ends up >interfering with cluster services at boot... >> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. > >Yep, i already tuned filters in lvm config, before that i had >"duplicate >PVs' messages because of multipath devices. >Anyway if i'm not wrong, LVM still has a lot of LVs to activate at >boot. > >nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume > >group "T_proxmox_1" now active >nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume >group "proxmoxt34-vg" now active >_______________________________________________ >pve-user mailing list >pve-user at pve.proxmox.com >http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ---- This mail was virus scanned and spam checked before delivery. This mail is also DKIM signed. See header dkim-signature. From gbr at majentis.com Tue Nov 22 19:00:10 2016 From: gbr at majentis.com (Gerald Brandt) Date: Tue, 22 Nov 2016 12:00:10 -0600 Subject: [PVE-User] A stop job is running... (xxx/no limit) Message-ID: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Hi, I'm trying to shut down a server, and it waits on 'A stop job is running... (xx/ no limit). Why is there no time limit, and how can I set one? 
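In case it helps while the root cause is tracked down: the "no limit" part usually means the unit being stopped has its stop timeout disabled (TimeoutStopSec=0), so systemd waits forever. A cap can be set globally or per unit; a rough sketch, assuming stock systemd paths and an example unit name:

  # global cap: in /etc/systemd/system.conf set
  #   DefaultTimeoutStopSec=90s
  # then re-execute the manager so the change is picked up
  systemctl daemon-reexec

  # or cap a single unit with a drop-in (example.service is a placeholder)
  mkdir -p /etc/systemd/system/example.service.d
  printf '[Service]\nTimeoutStopSec=90s\n' > /etc/systemd/system/example.service.d/timeout.conf
  systemctl daemon-reload

As the replies further down note, the real cause here was a hung NFS mount / kernel bug, so a timeout only shortens the wait rather than fixing the hang.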
Gerald From ADhaussy at voyages-sncf.com Tue Nov 22 19:04:39 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 18:04:39 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7ac426e7-23e7-db40-4cb4-b9b2ce04682e@voyages-sncf.com> <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : > Have you tested your filter rules? Yes, i set this filter at install : global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] > > On November 22, 2016 6:12:27 PM GMT+01:00, Dhaussy Alexandre wrote: >> Le 22/11/2016 ? 17:56, Michael Rasmussen a ?crit : >>> On Tue, 22 Nov 2016 16:35:08 +0000 >>> Dhaussy Alexandre wrote: >>> >>>> I don't know how, but i feel that every node i add to the cluster >> currently slows down LVM scan a little more...until it ends up >> interfering with cluster services at boot... >>> Maybe you need to tune the filter rules in /etc/lvm/lvm.conf. >> Yep, i already tuned filters in lvm config, before that i had >> "duplicate >> PVs' messages because of multipath devices. >> Anyway if i'm not wrong, LVM still has a lot of LVs to activate at >> boot. >> >> nov. 22 02:16:21 proxmoxt34 lvm[7279]: 1644 logical volume(s) in volume >> >> group "T_proxmox_1" now active >> nov. 22 02:16:21 proxmoxt34 lvm[7279]: 2 logical volume(s) in volume >> group "proxmoxt34-vg" now active >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mir at miras.org Tue Nov 22 19:18:44 2016 From: mir at miras.org (Michael Rasmussen) Date: Tue, 22 Nov 2016 19:18:44 +0100 Subject: [PVE-User] Cluster disaster In-Reply-To: References: <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> Message-ID: <20161122191844.6e627895@sleipner.datanom.net> On Tue, 22 Nov 2016 18:04:39 +0000 Dhaussy Alexandre wrote: > Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : > > Have you tested your filter rules? > Yes, i set this filter at install : > > global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", > "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] > Does vgscan and lvscan list the expected? 
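A quick way to sanity-check that with the lvm2 tools shipped in PVE 4.x (nothing below is specific to this setup):

  # show the filter the running LVM actually uses
  lvm dumpconfig devices/global_filter

  # list the PVs/VGs that survive the filter
  pvs -o pv_name,vg_name,dev_size
  vgs

  # count the LVs that would be activated at boot
  lvs --noheadings | wc -l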
-- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: We come to bury DOS, not to praise it. -- Paul Vojta, vojta at math.berkeley.edu -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From ADhaussy at voyages-sncf.com Tue Nov 22 19:47:51 2016 From: ADhaussy at voyages-sncf.com (Dhaussy Alexandre) Date: Tue, 22 Nov 2016 18:47:51 +0000 Subject: [PVE-User] Cluster disaster In-Reply-To: <20161122191844.6e627895@sleipner.datanom.net> References: <7d772b79-1c56-ed9e-c384-41360396e8c7@voyages-sncf.com> <201acff5-4140-e7a0-c3a6-cfd84bac8fdb@proxmox.com> <7bb6c8ce-ffa2-6ab1-5526-6d051c33cd52@voyages-sncf.com> <0a7d8757b5234528bd8bcf3926268664@ECLIPSE.groupevsc.com> <10e99daf728b4ad4ad22927039c4eaaa@ECLIPSE.groupevsc.com> <1880338119.114.1478882604308@webmail.proxmox.com> <1860956507.131.1478889820301@webmail.proxmox.com> <4b29d849-db06-9c40-e7a2-46b0f2bd9198@voyages-sncf.com> <1eb12f71-500f-3ae1-dad1-0e37ef74c839@proxmox.com> <20161122175608.50607b2d@sleipner.datanom.net> <20161122191844.6e627895@sleipner.datanom.net> Message-ID: <4a736c76-b470-ea14-eee8-267855fe87cc@voyages-sncf.com> Le 22/11/2016 ? 18:48, Michael Rasmussen a ?crit : >>> Have you tested your filter rules? >> Yes, i set this filter at install : >> >> global_filter = [ "r|sd[b-z].*|", "r|disk|", "r|dm-.*|", >> "r|vm.*disk.*|", "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "a|.*|" ] >> > Does vgscan and lvscan list the expected? > Seems to. root at proxmoxt20:~# vgscan Reading all physical volumes. This may take a while... Found volume group "T_proxmox_1" using metadata type lvm2 Found volume group "pve" using metadata type lvm2 root at proxmoxt20:~# lvscan ACTIVE '/dev/T_proxmox_1/vm-106-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-108-disk-1' [106,00 GiB] inherit inactive '/dev/T_proxmox_1/vm-109-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-110-disk-1' [116,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-111-disk-1' [116,00 GiB] inherit ................ ....cut..... ................ ACTIVE '/dev/T_proxmox_1/vm-451-disk-2' [90,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-451-disk-3' [90,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-1195-disk-2' [128,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-138-disk-1' [106,00 GiB] inherit ACTIVE '/dev/T_proxmox_1/vm-517-disk-1' [101,00 GiB] inherit ACTIVE '/dev/pve/swap' [7,63 GiB] inherit ACTIVE '/dev/pve/root' [95,37 GiB] inherit ACTIVE '/dev/pve/data' [174,46 GiB] inherit From mark at openvs.co.uk Wed Nov 23 10:40:55 2016 From: mark at openvs.co.uk (Mark Adams) Date: Wed, 23 Nov 2016 09:40:55 +0000 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD Message-ID: Hi All, I'm testing out proxmox and trying to get a working ZFS on iSCSI HA setup going. Because ZFS on iSCSI logs on to the iscsi server via ssh and creates a zfs dataset then adds iscsi config to /etc/ietd.conf it works fine when you've got a single iscsi host, but I haven't figured out a way to use it with pacemaker/corosync resources. 
I believe the correct configuration would be for the ZFS on iSCSI script to create the pacemaker iSCSILogicalUnit resource using pcs, after creating the zfs dataset, but this musn't be something that is supported as yet. Has anyone else tried to get this or a similar setup working? Any views greatly received. Thanks, Mark From gaio at sv.lnf.it Wed Nov 23 13:30:12 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 23 Nov 2016 13:30:12 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Message-ID: <20161123123012.GF3383@sv.lnf.it> Mandi! Gerald Brandt In chel di` si favelave... > I'm trying to shut down a server, and it waits on 'A stop job is > running... (xx/ no limit). Why is there no time limit, and how can I > set one? NFS storage? -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From gbr at majentis.com Wed Nov 23 13:50:42 2016 From: gbr at majentis.com (Gerald Brandt) Date: Wed, 23 Nov 2016 06:50:42 -0600 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <20161123123012.GF3383@sv.lnf.it> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> <20161123123012.GF3383@sv.lnf.it> Message-ID: <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> On 2016-11-23 06:30 AM, Marco Gaiarin wrote: > Mandi! Gerald Brandt > In chel di` si favelave... > >> I'm trying to shut down a server, and it waits on 'A stop job is >> running... (xx/ no limit). Why is there no time limit, and how can I >> set one? > NFS storage? > Yup. Why, does that make a difference? Gerald From gaio at sv.lnf.it Wed Nov 23 14:01:03 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 23 Nov 2016 14:01:03 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> <20161123123012.GF3383@sv.lnf.it> <9a66b305-e68f-d17f-016a-a5cda4074dd2@majentis.com> Message-ID: <20161123130103.GL3383@sv.lnf.it> Mandi! Gerald Brandt In chel di` si favelave... > >NFS storage? > Yup. Why, does that make a difference? Look at list archive, some weeks ago: seems that systemd behave not so correctly and tear down the NFS server before proxmox, that stalls. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From alain.pean at lpn.cnrs.fr Wed Nov 23 14:22:28 2016 From: alain.pean at lpn.cnrs.fr (=?UTF-8?Q?Alain_P=c3=a9an?=) Date: Wed, 23 Nov 2016 14:22:28 +0100 Subject: [PVE-User] A stop job is running... (xxx/no limit) In-Reply-To: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> References: <4edecef3-aa7e-1e4c-ee57-c67da78b316e@majentis.com> Message-ID: Le 22/11/2016 ? 
19:00, Gerald Brandt a ?crit : > I'm trying to shut down a server, and it waits on 'A stop job is > running... (xx/ no limit). Why is there no time limit, and how can I > set one? I see also this problem with a Dell R630 server with Broadcom 10g interfaces. It's a known bug. It is resolved in 4.5 kernels. Perhaps we have to wait for a proxmox upgrade to this kernel : https://forum.proxmox.com/threads/no-reboot-with-4-4-pve-kernel.27908/ It seems to be a bug in 4.4 Ubuntu kernels. Alain -- Administrateur Syst?me/R?seau C2N (ex LPN) Centre de Nanosciences et Nanotechnologies (UMR 9001) Site de Marcoussis, Data IV, route de Nozay - 91460 Marcoussis Tel : 01-69-63-61-34 From marcomgabriel at gmail.com Wed Nov 23 16:16:27 2016 From: marcomgabriel at gmail.com (Marco M. Gabriel) Date: Wed, 23 Nov 2016 15:16:27 +0000 Subject: [PVE-User] MTU size changed on a running cluster Message-ID: Hi there, on a productive 5 node Proxmox VE Ceph cluster, we experienced some strange behaviour: Based on http://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports we have an internal network for cluster/corosync communication and another internal network for Ceph Storage traffic. The Ceph OVS bridge was set to MTU 9000 in /etc/network/interfaces and ran without a problem since a week. Today we've seen Ceph errors like "x requests are blocked > 32 sec". After a troubleshooting, we's seen that packets got dropped because they were > 1500 bytes on the Ceph interface. That was strange as we had set them to MTU 9000 and it was running since a week. We checked the Interfaces and on two nodes, we saw a MTU of 1500 while the other three nodes still had MTU 9000. Has anybody experiences something like that? I read that an OVS bridge automatically sets it's own MTU according to the lowest MTU of the member interfaces, but I am not sure if this could be a problem here. Any hints appreciated, Marco From w.link at proxmox.com Wed Nov 23 16:29:01 2016 From: w.link at proxmox.com (Wolfgang Link) Date: Wed, 23 Nov 2016 16:29:01 +0100 Subject: [PVE-User] MTU size changed on a running cluster In-Reply-To: References: Message-ID: <5835B5BD.2070701@proxmox.com> This is a openvswitch bug. The workaround is to use openvswitch 2.6, it is on testing repo and set mtu_reqest on the interface. https://github.com/openvswitch/ovs/blob/master/FAQ.rst Q: How can I configure the bridge internal interface MTU? Why does Open vSwitch keep changing internal ports MTU? A: By default Open vSwitch overrides the internal interfaces (e.g. br0) MTU. If you have just an internal interface (e.g. br0) and a physical interface (e.g. eth0), then every change in MTU to eth0 will be reflected to br0. Any manual MTU configuration using ip or ifconfig on internal interfaces is going to be overridden by Open vSwitch to match the current bridge minimum. Sometimes this behavior is not desirable, for example with tunnels. The MTU of an internal interface can be explicitly set using the following command: $ ovs-vsctl set int br0 mtu_request=1450 After this, Open vSwitch will configure br0 MTU to 1450. Since this setting is in the database it will be persistent (compared to what happens with ip or ifconfig). The MTU configuration can be removed to restore the default behavior with: $ ovs-vsctl set int br0 mtu_request=[] The mtu_request column can be used to configure MTU even for physical interfaces (e.g. eth0). On 11/23/2016 04:16 PM, Marco M. 
Gabriel wrote: > Hi there, > > on a productive 5 node Proxmox VE Ceph cluster, we experienced some strange > behaviour: > > Based on > http://pve.proxmox.com/wiki/Open_vSwitch#Example_2:_Bond_.2B_Bridge_.2B_Internal_Ports > we > have an internal network for cluster/corosync communication and another > internal network for Ceph Storage traffic. The Ceph OVS bridge was set to > MTU 9000 in /etc/network/interfaces and ran without a problem since a week. > > Today we've seen Ceph errors like "x requests are blocked > 32 sec". > > After a troubleshooting, we's seen that packets got dropped because they > were > 1500 bytes on the Ceph interface. That was strange as we had set > them to MTU 9000 and it was running since a week. > > We checked the Interfaces and on two nodes, we saw a MTU of 1500 while the > other three nodes still had MTU 9000. > > Has anybody experiences something like that? I read that an OVS bridge > automatically sets it's own MTU according to the lowest MTU of the member > interfaces, but I am not sure if this could be a problem here. > > Any hints appreciated, > Marco > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > From mir at miras.org Wed Nov 23 21:40:06 2016 From: mir at miras.org (Michael Rasmussen) Date: Wed, 23 Nov 2016 21:40:06 +0100 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD In-Reply-To: References: Message-ID: <20161123214006.26e4e9a9@sleipner.datanom.net> On Wed, 23 Nov 2016 09:40:55 +0000 Mark Adams wrote: > > Has anyone else tried to get this or a similar setup working? Any views > greatly received. > What you are trying to achieve is not a good idea with corosync/pacemaker since iSCSI is a block device. To create a cluster over a LUN will require a cluster aware filesystem like NFS, CIFS etc. The proper way of doing this with iSCSI would be using multipath to a SAN since iSCSI LUNs cannot be shared. Unfortunately the current implementation of ZFS over iSCSI does not support multipath (a limitation in libiscsi). Also may I remind you that Iet development has stopped in favor of LIO targets (http://linux-iscsi.org/wiki/LIO). I am currently working on making an implementation of LIO for proxmox which will use a different architecture than the current ZFS over iSCSI implementation. The new implementation will support multipath. As this is developed in my spare time progress is not a high as it could be. Alternatively you could look at this: http://www.napp-it.org/doc/downloads/z-raid.pdf -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: The computer should be doing the hard work. That's what it's paid to do, after all. -- Larry Wall in <199709012312.QAA08121 at wall.org> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP digital signature URL: From t.lamprecht at proxmox.com Thu Nov 24 10:05:20 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Thu, 24 Nov 2016 10:05:20 +0100 Subject: [PVE-User] HA Changes and Cleanups Message-ID: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> Hi all, regarding the discussion about our HA stack on the pve-user list in October we made some changes, which - hopefully - should address some problems and reduce some common pitfalls. * What has changed or is new: pct shutdown / qm shutdown and the Shutdown button in the web interface work now as expected, if triggered the HA service will be shut down and not automatically started again. If that is needed there is still the 'reset' functionality. We provide now better feedback about the actual state of a HA service. E.g. 'started' will be only shown if the local resource manager confirmed that the service really started, else we show 'starting' so that it's clearer whats currently happening. We merged the GUI's 'Resource' tab into the 'HA' tab, related information is now placed together. This should give a better overview of the current situation. Note, there are some fields in the resource grid which are hidden by default, to show them click on one of the tiny triangles in the column headers: https://i.imgsafe.org/6a271a3cc4.png Improved the built in documentation. We also reworked the request states for services, there is now: * started (replaces 'enabled') The CRM tries to start the resource. Service state is set to started after successful start. On node failures, or when start fails, it tries to recover the resource. If everything fails, service state it set to error. * stopped (new) The CRM tries to keep the resource in stopped state, but it still tries to relocate the resources on node failures. * disabled The CRM tries to put the resource in stopped state, but does not try to relocate the resources on node failures. The main purpose of this state is error recovery, because it is the only way to move a resource out of the error state. So the general used ones should be now 'started' and 'stopped', here its clear what the HA stack will do. 'disabled' should be mainly used to recover a service which is in the error state. ha-manager enabled/disabled was removed, this was not in the API so it should only affect user which called it directly. You can use `ha-manager set SID --state REQUEST_STATE` instead. * What has still to come: A 'ignore' request state in which the service will not be touched by HA but is still in the resource configuration - this was wished a few times. I have WIP patches ready but nothing merged yet. A bit less confusion on task execution logs. Allowing hard stopping of a VM/CT under HA. I hope this addresses some part of the feedback we got. Many thanks to the community for the feedback and to Dietmar who did a lot of the above mentioned work and also Dominik for his help with the UI. User which want to test this changes can use the new packages we pushed to pvetest yesterday evening CET. The changes are include in the packages: pve-ha-manager >= 1.0-38 pve-manager >= 4.3-11 Happy testing and feel free to provide feedback. 
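As a quick reference, the new request states map to the CLI like this (vm:100 is just an example service ID):

  # request a service to be started / kept stopped
  ha-manager set vm:100 --state started
  ha-manager set vm:100 --state stopped

  # error recovery: 'disabled' is the only way out of the error state
  ha-manager set vm:100 --state disabled

  # verify what the CRM/LRM actually did
  ha-manager status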
cheers, Thomas From lists at merit.unu.edu Thu Nov 24 10:22:11 2016 From: lists at merit.unu.edu (mj) Date: Thu, 24 Nov 2016 10:22:11 +0100 Subject: [PVE-User] HA Changes and Cleanups In-Reply-To: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> References: <38040063-eb73-37e6-9fa7-7b5f05f3b269@proxmox.com> Message-ID: <97d4d05d-44b7-be24-4040-d973696ede7e@merit.unu.edu> Hi Thomas, Thank you for these improvements. (I did not participate much in the following discussion, but I was the one who started the thread "[PVE-User] HA question") MJ On 11/24/2016 10:05 AM, Thomas Lamprecht wrote: > Hi all, > > regarding the discussion about our HA stack on the pve-user list in October > we made some changes, which - hopefully - should address some problems and > reduce some common pitfalls. > > * What has changed or is new: > > pct shutdown / qm shutdown and the Shutdown button in the web interface > work > now as expected, if triggered the HA service will be shut down and not > automatically started again. If that is needed there is still the 'reset' > functionality. > > We provide now better feedback about the actual state of a HA service. > E.g. 'started' will be only shown if the local resource manager confirmed > that the service really started, else we show 'starting' so that it's > clearer whats currently happening. > > We merged the GUI's 'Resource' tab into the 'HA' tab, related > information is > now placed together. This should give a better overview of the current > situation. > Note, there are some fields in the resource grid which are hidden by > default, to show them click on one of the tiny triangles in the column > headers: https://i.imgsafe.org/6a271a3cc4.png > > Improved the built in documentation. > > We also reworked the request states for services, there is now: > > * started (replaces 'enabled') > The CRM tries to start the resource. Service state is set to started > after successful start. On node failures, or when start fails, it tries > to recover the resource. If everything fails, service state it set to > error. > > * stopped (new) > The CRM tries to keep the resource in stopped state, but it still > tries to relocate the resources on node failures. > > * disabled > The CRM tries to put the resource in stopped state, but does not > try to relocate the resources on node failures. The main purpose > of this state is error recovery, because it is the only way to > move a resource out of the error state. > > > So the general used ones should be now 'started' and 'stopped', here its > clear what the HA stack will do. > 'disabled' should be mainly used to recover a service which is in the error > state. > > ha-manager enabled/disabled was removed, this was not in the API so it > should only affect user which called it directly. > You can use `ha-manager set SID --state REQUEST_STATE` instead. > > * What has still to come: > > A 'ignore' request state in which the service will not be touched by HA but > is still in the resource configuration - this was wished a few times. > I have WIP patches ready but nothing merged yet. > > A bit less confusion on task execution logs. > > Allowing hard stopping of a VM/CT under HA. > > I hope this addresses some part of the feedback we got. > Many thanks to the community for the feedback and to Dietmar who did a lot > of the above mentioned work and also Dominik for his help with the UI. > > User which want to test this changes can use the new packages we pushed to > pvetest yesterday evening CET. 
> The changes are include in the packages: > pve-ha-manager >= 1.0-38 > pve-manager >= 4.3-11 > > Happy testing and feel free to provide feedback. > > cheers, > Thomas > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From IMMO.WETZEL at adtran.com Thu Nov 24 13:41:16 2016 From: IMMO.WETZEL at adtran.com (IMMO WETZEL) Date: Thu, 24 Nov 2016 12:41:16 +0000 Subject: [PVE-User] pvesh delete /nodes/{node}/qemu/{vmid}/snapshot/{snapname} -force doesnt work as expected ? Message-ID: Hi, help command shows delete snapshot with option force but this isnt executed correctly. See output below. Or do I something wrong ? Background: The snapshot doesnt exists in qcow2 disk file but in config. Therefore force should help as expected and remove the snapshot entry from the config file. root at prox01:/root# pvesh help /nodes/prox05/qemu/161/snapshot/initialSnapShot --verbose help [path] [--verbose] cd [path] ls [path] USAGE: delete /nodes/{node}/qemu/{vmid}/snapshot/{snapname} [OPTIONS] Delete a VM snapshot. -force boolean For removal from config file, even if removing disk snapshots fails. ... So -force should be the one I need. But see here: root at prox01:/root# pvesh delete /nodes/prox05/qemu/161/snapshot/initialSnapShot -force usage: delete [path] So why I see the usage string here ? Following is correct cos snapshot can't be found root at prox01:/root# pvesh delete /nodes/prox05/qemu/161/snapshot/initialSnapShot command '/usr/bin/qemu-img snapshot -d initialSnapShot /mnt/pve/Storage/images/161/vm-161-disk-1.qcow2' failed: exit code 1 qemu-img: Could not delete snapshot 'initialSnapShot': (Can't find the snapshot) UPID:prox05:00007C50:48ACCCD1:5836D31B:qmdelsnapshot:161:root at pam:200 OK From mavleeuwen at icloud.com Thu Nov 24 21:06:46 2016 From: mavleeuwen at icloud.com (Marcel van Leeuwen) Date: Thu, 24 Nov 2016 21:06:46 +0100 Subject: [PVE-User] License issue In-Reply-To: <94138841.252.1479583340873@webmail.proxmox.com> References: <1749975450.39.1479549998560@webmail.proxmox.com> <547945705.173.1479569954476@webmail.proxmox.com> <28B3E5EC-E71F-4864-9CDD-B94E1461B1C9@icloud.com> <94138841.252.1479583340873@webmail.proxmox.com> Message-ID: <8FAC9A84-3E0B-49A7-86DD-8E1000A4B8F0@icloud.com> Yeah, true. Thanks for clarifying! Cheers, Marcel van Leeuwen > On 19 Nov 2016, at 20:22, Dietmar Maurer wrote: > > Please note that our software license is AGPL. > > You talk about subscriptions here - and this is something very different. > >> What if the license is renewed after a year? Then you have 3 installs again? > > Sure. Also, you can simply contact our support if you need more than 3 > installs. We usually find a solution ... > From proxmox-user at mattern.org Fri Nov 25 00:48:59 2016 From: proxmox-user at mattern.org (Marcus) Date: Fri, 25 Nov 2016 00:48:59 +0100 Subject: [PVE-User] Debian initramfs bug #775583 Message-ID: <61939744-e1fa-2b83-3ef9-0794fdd070cc@mattern.org> Hi, it seems that I hit this bug on a Proxmox test installation with Proxmox 4.3. Symptoms are the same as described in the bug report (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=775583) - LVM Volume for /usr is not activated at boot. Manually doing a vgchange -a y in the initramfs shell activates them and the OS boots. There is a "local-block" initramfs script in the Debian latest version of the lvm2 package that fixes this issue. 
If I copy it to my Proxmox test installation and rebuild the initramfs it boots normally. I found that the bug is fixed in Debian lvm2_2.02.111-2.1. Regards. From mark at tuxis.nl Fri Nov 25 11:18:46 2016 From: mark at tuxis.nl (Mark Schouten) Date: Fri, 25 Nov 2016 11:18:46 +0100 Subject: [PVE-User] HA Cluster migration issues Message-ID: <2600018850-5504@kerio.tuxis.nl> Hi, I have a HA cluster running, with Ceph and all, and I have rebooted one of the nodes this week. We now want te migrate the HA-VM's back to the original server, but that fails without a clear error. I can say: root at proxmox01:~# qm migrate 600 proxmox03 -online Executing HA migrate for VM 600 to node proxmox03 I then see kvm starting on node proxmox03, but then something goes wrong after that and migration fails: task started by HA resource agent Nov 25 10:58:05 starting migration of VM 600 to node 'proxmox03' (10.1.1.3) Nov 25 10:58:05 copying disk images Nov 25 10:58:05 starting VM 600 on remote node 'proxmox03' Nov 25 10:58:06 starting ssh migration tunnel Nov 25 10:58:07 starting online/live migration on localhost:60000 Nov 25 10:58:07 migrate_set_speed: 8589934592 Nov 25 10:58:07 migrate_set_downtime: 0.1 Nov 25 10:58:09 ERROR: online migrate failure - aborting Nov 25 10:58:09 aborting phase 2 - cleanup resources Nov 25 10:58:09 migrate_cancel Nov 25 10:58:10 ERROR: migration finished with problems (duration 00:00:05) TASK ERROR: migration problems I can't see any errormessage that is more useful. Can anybody tell me how I can further debug this or maybe somebody knows what's going on? pveversion -v (This is identical on the two machines) proxmox-ve: 4.2-48 (running kernel: 4.4.6-1-pve) pve-manager: 4.2-2 (running version: 4.2-2/725d76f0) pve-kernel-4.4.6-1-pve: 4.4.6-48 lvm2: 2.02.116-pve2 corosync-pve: 2.3.5-2 libqb0: 1.0-1 pve-cluster: 4.0-39 qemu-server: 4.0-72 pve-firmware: 1.1-8 libpve-common-perl: 4.0-59 libpve-access-control: 4.0-16 libpve-storage-perl: 4.0-50 pve-libspice-server1: 0.12.5-2 vncterm: 1.2-1 pve-qemu-kvm: 2.5-14 pve-container: 1.0-62 pve-firewall: 2.0-25 pve-ha-manager: 1.0-28 ksm-control-daemon: 1.2-1 glusterfs-client: 3.5.2-2+deb8u2 lxc-pve: 1.1.5-7 lxcfs: 2.0.0-pve2 cgmanager: 0.39-pve1 criu: 1.6.0-1 Met vriendelijke groeten, --? Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ Mark Schouten | Tuxis Internet Engineering KvK:?61527076?| http://www.tuxis.nl/ T: 0318 200208 | info at tuxis.nl From gwenn+proxmox at beurre.demisel.net Fri Nov 25 11:27:35 2016 From: gwenn+proxmox at beurre.demisel.net (Gwenn Gueguen) Date: Fri, 25 Nov 2016 11:27:35 +0100 Subject: [PVE-User] VMA endianness bug? Message-ID: <20161125112735.623254f1@port-42.amossys.fr> Hi, I had an issue while reading a VMA file written by a proxmox backup on an up-to-date Proxmox 4.3 node. According to vma_spec.txt[1], "All numbers in VMA archive are stored in Big Endian byte order." but it looks like the 2 byte size field at the beginning of each blob are stored in little endian byte order. Here is an extract of the blob buffer: 030000 00 11 00 71 65 6D 75 2D 73 65 72 76 65 72 2E 63 ...qemu-server.c 030020 6F 6E 66 00 24 02 62 61 6C 6C 6F 6F 6E 3A 20 31 onf.$.balloon: 1 Config name length is 17 (0x0011) but is written in file as 4352 (0x1100). Config data length is 548 (0x0224) but is written in file as 9218 (0x2402). Others numbers in the header (version, timestamp, etc.) are written in big endian byte order (0X00000001 for the version). 
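For anyone who wants to verify this on their own archives, the byte order can be eyeballed with plain od; the file name below is a placeholder and OFFSET is the absolute file offset of a blob's 2-byte size field, taken from the blob buffer offset and the config name offsets in the header:

  # dump the two size bytes of a blob entry
  od -An -tx1 -j OFFSET -N 2 backup.vma
  # for the qemu-server.conf name in the dump above this prints "11 00",
  # i.e. length 17 (0x0011) stored little endian, contrary to the spec text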
Cheers, [1] https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=vma_spec.txt -- Gwenn Gueguen From w.bumiller at proxmox.com Fri Nov 25 12:03:14 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 25 Nov 2016 12:03:14 +0100 (CET) Subject: [PVE-User] VMA endianness bug? In-Reply-To: <20161125112735.623254f1@port-42.amossys.fr> References: <20161125112735.623254f1@port-42.amossys.fr> Message-ID: <847334054.38.1480071794867@webmail.proxmox.com> > On November 25, 2016 at 11:27 AM Gwenn Gueguen wrote: > > > > Hi, > > I had an issue while reading a VMA file written by a proxmox backup > on an up-to-date Proxmox 4.3 node. > > According to vma_spec.txt[1], "All numbers in VMA archive are stored in > Big Endian byte order." but it looks like the 2 byte size field at the > beginning of each blob are stored in little endian byte order. Unfortunately this is correct. Also note that the first byte of the blob buffer is unused (iow. the first length starts at an offset of 1). (If you just access it via the offset pointers from the device/config fields you won't run into this, but if you try to just index the size+data pairs you will ;-) ). May I ask what you're working on? From gwenn+proxmox at beurre.demisel.net Fri Nov 25 12:15:11 2016 From: gwenn+proxmox at beurre.demisel.net (Gwenn Gueguen) Date: Fri, 25 Nov 2016 12:15:11 +0100 Subject: [PVE-User] VMA endianness bug? In-Reply-To: <847334054.38.1480071794867@webmail.proxmox.com> References: <20161125112735.623254f1@port-42.amossys.fr> <847334054.38.1480071794867@webmail.proxmox.com> Message-ID: <20161125121511.6dc74781@port-42.amossys.fr> On Fri, 25 Nov 2016 12:03:14 +0100 (CET) Wolfgang Bumiller wrote: > Unfortunately this is correct. Also note that the first byte of the > blob buffer is unused (iow. the first length starts at an offset of > 1). (If you just access it via the offset pointers from the > device/config fields you won't run into this, but if you try to just > index the size+data pairs you will ;-) ). Fortunately, I'm using offsets stored in the header so this is not a problem. > May I ask what you're working on? We will use Proxmox to host an experimentation platform and we'll have to import/export VMs from/to other virtualization platforms so I'm trying to develop a small tool to convert VMA backups to OVA and vice versa. An import/export feature would be great. -- Gwenn Gueguen From hermann at qwer.tk Fri Nov 25 13:43:07 2016 From: hermann at qwer.tk (Hermann Himmelbauer) Date: Fri, 25 Nov 2016 13:43:07 +0100 Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? Message-ID: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> Hi, I recently upgraded the Proxomox community version to the latest version and wonder if a ceph upgrade is recommended, too? Currently my ceph version is 0.94.3 - and I see that there are upgrades to 0.94.9 on the ceph site, does anyone know how to do such an upgrade on proxmox? Is it risky? Best Regards, Hermann -- hermann at qwer.tk PGP/GPG: 299893C7 (on keyservers) From w.bumiller at proxmox.com Fri Nov 25 14:23:13 2016 From: w.bumiller at proxmox.com (Wolfgang Bumiller) Date: Fri, 25 Nov 2016 14:23:13 +0100 (CET) Subject: [PVE-User] VMA endianness bug? 
In-Reply-To: <20161125121511.6dc74781@port-42.amossys.fr> References: <20161125112735.623254f1@port-42.amossys.fr> <847334054.38.1480071794867@webmail.proxmox.com> <20161125121511.6dc74781@port-42.amossys.fr> Message-ID: <1740456382.149.1480080193359@webmail.proxmox.com> > On November 25, 2016 at 12:15 PM Gwenn Gueguen wrote: > > May I ask what you're working on? > > We will use Proxmox to host an experimentation platform and we'll have > to import/export VMs from/to other virtualization platforms so I'm > trying to develop a small tool to convert VMA backups to OVA and vice > versa. > > An import/export feature would be great. There are the `vma create/vma extract` cli tools, alternatively I'll probably be moving the vma handling code into a separate library for easier maintenance (which would also allow it to be reused more easily for such tools). From aderumier at odiso.com Sat Nov 26 09:04:49 2016 From: aderumier at odiso.com (Alexandre DERUMIER) Date: Sat, 26 Nov 2016 09:04:49 +0100 (CET) Subject: [PVE-User] Ceph upgrade from 94.3 - recommendations? In-Reply-To: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> References: <08167395-a642-2200-5585-8b259fbdca84@qwer.tk> Message-ID: <549136967.3727214.1480147489744.JavaMail.zimbra@oxygem.tv> Sure, you can always upgrade to last minor version. (0.94.X) Only jewel is not yet compatible because of a bug, but it'll be fixed in next jewel release (10.2.4) ----- Mail original ----- De: "Hermann Himmelbauer" ?: "proxmoxve" Envoy?: Vendredi 25 Novembre 2016 13:43:07 Objet: [PVE-User] Ceph upgrade from 94.3 - recommendations? Hi, I recently upgraded the Proxomox community version to the latest version and wonder if a ceph upgrade is recommended, too? Currently my ceph version is 0.94.3 - and I see that there are upgrades to 0.94.9 on the ceph site, does anyone know how to do such an upgrade on proxmox? Is it risky? Best Regards, Hermann -- hermann at qwer.tk PGP/GPG: 299893C7 (on keyservers) _______________________________________________ pve-user mailing list pve-user at pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From gaio at sv.lnf.it Mon Nov 28 13:05:11 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 13:05:11 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) Message-ID: <20161128120511.GJ3348@sv.lnf.it> A very strange saturday evening. Hardware tooling, hacking, caffeine, ... I'm still completing my CEPH storage cluster (now 2 node storage, waiting to add the third), but is it mostly ''on production''. So, after playing with server for some month, saturday i've shut down all the cluster, setup all the cables, switches, UPS, ... in a more decent and stable way. To simulate a hard power outgage, i've not set the noout and nodown flags. After that, i've powered up all the cluster (first the 2 ceph storage node, after the 2 pve host nodes) and i've hit the first trouble: 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected The trouble came from the fact that... my NTP server was on a VM, and despite the fact that the status was only 'HEALTH_WARN', i cannot access anymore the storage. I've solved adding more NTP server from other sites, and after some time the cluster go OK: 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK and here the panic start. 
PVE interface report the Ceph cluster OK, report correctly all the stuffs (mon, osd, pools, pool usage, ...) but data cluster was not accessible: a) if i try to move a disk, reply with something like 'no available'. b) if i try to start VMs, they stalls... The only strange things on log was that there's NO pgmap update, like before: 2016-11-26 16:59:31.588695 mon.0 10.27.251.7:6789/0 2317560 : cluster [INF] pgmap v2410540: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13569 kB/s rd, 2731 kB/s wr, 565 op/s but really, on panic, i've not noted that. After some tests, i've finally do the right thing. 1) i've set the noout and nodown flags. 2) i've rebooted the ceph nodes, one by one. After that, all the cluster start. VMs that was on stalls, immediately start. After that, i've understood that NTP is a crucial service for ceph, so it is needed to have a pool of servers. Still, i'm not sure this was the culprit. The second thing i've understood is that Ceph react badly to a total shutdown. In a datacenter this is probably acceptable. I don't know if it is my fault, or at least there's THE RIGTH WAY to start a Ceph cluster from cold metal... Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Mon Nov 28 13:45:24 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Mon, 28 Nov 2016 13:45:24 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161128120511.GJ3348@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> Message-ID: <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> Hi Marco, On 11/28/2016 01:05 PM, Marco Gaiarin wrote: > > A very strange saturday evening. Hardware tooling, hacking, caffeine, > ... > > I'm still completing my CEPH storage cluster (now 2 node storage, > waiting to add the third), but is it mostly ''on production''. > So, after playing with server for some month, saturday i've shut down > all the cluster, setup all the cables, switches, UPS, ... in a more > decent and stable way. > > To simulate a hard power outgage, i've not set the noout and nodown > flags. > > > After that, i've powered up all the cluster (first the 2 ceph storage > node, after the 2 pve host nodes) and i've hit the first trouble: > > 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected > > The trouble came from the fact that... my NTP server was on a VM, and > despite the fact that the status was only 'HEALTH_WARN', i cannot > access anymore the storage. What did the full ceph status show? Did you add all the monitors to your storage config in proxmox? A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. 1 mons down). Did you configure timesyncd properly? On reboot the time has to be synced by the host, so all ceph hosts share the same time. 
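(A quick way to check that on every node is for example:

  timedatectl status   # look for "NTP synchronized: yes"
  ntpq -p              # only if classic ntpd is installed, shows the peers in use

so right after a reboot you can see whether the clocks already agree.)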
The ceph map updates require the proper time, so every host knows which map is the current one. > > I've solved adding more NTP server from other sites, and after some > time the cluster go OK: > > 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK > > and here the panic start. > > > PVE interface report the Ceph cluster OK, report correctly all the stuffs > (mon, osd, pools, pool usage, ...) but data cluster was not accessible: > > a) if i try to move a disk, reply with something like 'no available'. > > b) if i try to start VMs, they stalls... > > The only strange things on log was that there's NO pgmap update, like > before: > > 2016-11-26 16:59:31.588695 mon.0 10.27.251.7:6789/0 2317560 : cluster [INF] pgmap v2410540: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13569 kB/s rd, 2731 kB/s wr, 565 op/s > > but really, on panic, i've not noted that. > > > After some tests, i've finally do the right thing. > > 1) i've set the noout and nodown flags. > > 2) i've rebooted the ceph nodes, one by one. > > After that, all the cluster start. VMs that was on stalls, immediately > start. > > > After that, i've understood that NTP is a crucial service for ceph, so > it is needed to have a pool of servers. Still, i'm not sure this was > the culprit. > > > The second thing i've understood is that Ceph react badly to a total > shutdown. In a datacenter this is probably acceptable. > > I don't know if it is my fault, or at least there's THE RIGTH WAY to > start a Ceph cluster from cold metal... > > > Thanks. > -- Cheers, Alwin From gaio at sv.lnf.it Mon Nov 28 15:31:41 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 15:31:41 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> Message-ID: <20161128143141.GQ3348@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > What did the full ceph status show? Do you mean 'ceph status'? I've not saved it, but was OK, as now: root at thor:~# ceph status cluster 8794c124-c2ec-4e81-8631-742992159bd6 health HEALTH_OK monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} election epoch 94, quorum 0,1,2,3 0,1,2,3 osdmap e114: 6 osds: 6 up, 6 in pgmap v2524432: 768 pgs, 3 pools, 944 GB data, 237 kobjects 1874 GB used, 7435 GB / 9310 GB avail 768 active+clean client io 7693 B/s rd, 302 kB/s wr, 65 op/s > Did you add all the monitors to your storage config in proxmox? > A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be > available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. > 1 mons down). I've currently 4 nodes in my cluster: all node are pve clusterized, 2 are cpu only (ceph mon), 2 (and one more to come) storage node (mon+osd(s)). Yes, i've not changed the storage configuration, and when the CPU nodes started at least the two storage nodes where online. > Did you configure timesyncd properly? > On reboot the time has to be synced by the host, so all ceph hosts share the same time. The ceph map updates require the > proper time, so every host knows which map is the current one. Now, yes. As stated, i've had configured with only a NTP server that was a VM in the same cluster; now, they use two NTP server, one remote. 
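(For the record, on nodes running systemd-timesyncd the relevant bits look
roughly like this -- the server names below are just placeholders for ours:

  # /etc/systemd/timesyncd.conf
  [Time]
  NTP=ntp.internal.example 0.debian.pool.ntp.org
  FallbackNTP=1.debian.pool.ntp.org

  systemctl restart systemd-timesyncd

With classic ntpd it is the equivalent extra "server" lines in /etc/ntp.conf.)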
Fixed the ntp server, servers get in sync, ceph status got OK but mons does not start to peers themself ('pgmap' logs). Thanks. -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Mon Nov 28 15:50:21 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Mon, 28 Nov 2016 15:50:21 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161128143141.GQ3348@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> Message-ID: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Hi Marco, On 11/28/2016 03:31 PM, Marco Gaiarin wrote: > Mandi! Alwin Antreich > In chel di` si favelave... > >> What did the full ceph status show? > > Do you mean 'ceph status'? I've not saved it, but was OK, as now: > > root at thor:~# ceph status > cluster 8794c124-c2ec-4e81-8631-742992159bd6 > health HEALTH_OK > monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > election epoch 94, quorum 0,1,2,3 0,1,2,3 > osdmap e114: 6 osds: 6 up, 6 in > pgmap v2524432: 768 pgs, 3 pools, 944 GB data, 237 kobjects > 1874 GB used, 7435 GB / 9310 GB avail > 768 active+clean > client io 7693 B/s rd, 302 kB/s wr, 65 op/s > Would have been interesting if all OSDs were up & in. As depending on the pool config, the min size for serving data out of that pool might have prevented the storage to serve data. > >> Did you add all the monitors to your storage config in proxmox? >> A client is speaking to the monitor first to get the proper maps and then connects to the OSDs. The storage would not be >> available if you only have one monitor configured on the storage tab in proxmox and that mon would be not avialable (eg. >> 1 mons down). > > I've currently 4 nodes in my cluster: all node are pve clusterized, 2 > are cpu only (ceph mon), 2 (and one more to come) storage node > (mon+osd(s)). > > Yes, i've not changed the storage configuration, and when the CPU nodes > started at least the two storage nodes where online. I see from your ceph status that you have 4 mons, are they all in your storage conf? And are your storage nodes also mons? It is important to have the monitors online, as these are accessed first and if those aren't then no storage is available. With only one OSD node running the storage could be still available, besides a HEALTH_WARN. > > >> Did you configure timesyncd properly? >> On reboot the time has to be synced by the host, so all ceph hosts share the same time. The ceph map updates require the >> proper time, so every host knows which map is the current one. > > Now, yes. As stated, i've had configured with only a NTP server that was > a VM in the same cluster; now, they use two NTP server, one remote. Then a reboot should not do any harm. > > Fixed the ntp server, servers get in sync, ceph status got OK but mons > does not start to peers themself ('pgmap' logs). If your mons aren't peering, then the status wouldn't be OK, so they must have done it after a while. May you please show us the logs? > > > Thanks. 
> -- Cheers, Alwin From gaio at sv.lnf.it Mon Nov 28 16:04:25 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Mon, 28 Nov 2016 16:04:25 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Message-ID: <20161128150425.GT3348@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > Would have been interesting if all OSDs were up & in. As depending on the pool config, the min size for serving data out > of that pool might have prevented the storage to serve data. Ouch! I've forgot to specify... not only the status was OK, but effectively all OSDs was up & in, in 'ceph status' and also in PVE interface. Also, for now i've 2 node storage and my pools size is 2. > I see from your ceph status that you have 4 mons, are they all in your storage conf? And are your storage nodes also mons? Yes. > If your mons aren't peering, then the status wouldn't be OK, so they must have done it after a while. May you please > show us the logs? Tomorrow. ;-) -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bont?, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From rui.godinho.lopes at gmail.com Tue Nov 29 00:20:42 2016 From: rui.godinho.lopes at gmail.com (Rui Lopes) Date: Mon, 28 Nov 2016 23:20:42 +0000 Subject: [PVE-User] lvmconfig binary on debian jessie? Message-ID: Hello, Is there a way to have the lvmconfig binary on debian jessie (the dist that the proxmox iso uses)? Known is there are alternatives? Thanks! From gaio at sv.lnf.it Tue Nov 29 12:17:44 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Tue, 29 Nov 2016 12:17:44 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> Message-ID: <20161129111744.GL3355@sv.lnf.it> Mandi! Alwin Antreich In chel di` si favelave... > May you please show us the logs? Ok, i'm here. With the log. A bit of legenda: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes (mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (only mon). In order, mon.0, mon.1, mon.2 and mon.3. These are the logs of 10.27.251.7 (mon.0); Seems to me that ceph logs are all similar, so i hope these suffices. I've started my activity at 15.00, but before take down all the stuff i've P2V my last server, my Asterisk PBX box. Clearly, cluster worked: [...] 
2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts 2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok 2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s 2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s [...] 2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s 2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s Finished the P2V, i've started to power off the cluster, starting from the cpu nodes. After powering down a node, i've realized that i need it to do another thing, so i've re-powered on. ;-) 2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election 2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election 2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election 2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s 2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s 2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s [...] 
2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s 2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s 2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election 2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election 2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts 2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok 2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts 2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok 2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s 2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s 2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized 2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s [...] 2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s 2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts 2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok 2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s [...] 
2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s 2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s 2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s [...] 2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s 2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts 2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok 2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts 2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok 2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s [...] 2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s 2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s [...] 2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s 2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s 2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s [...] 2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s 2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized 2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s [...] 
2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s 2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s [...] 2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s 2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s 2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s [...] 2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s OK, here server was shut down and so logs stop. At power up, i got as sayed clock skew troubles, so i got status HEALTH_WARN: 2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized 2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized 2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s 2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s 2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized 2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized 2016-11-26 18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 
10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s 2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s 2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized 2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected 2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s 2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s 2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized 2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK 2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s 2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s 2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s 2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized 2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s 2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s 2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s 2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized 2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] message from mon.0 was stamped 0.349322s in the future, clocks not synchronized 2016-11-26 18:19:09.736830 mon.1 
10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized 2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s 2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s 2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s 2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized 2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s 2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s 2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s 2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s 2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s 2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s 2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s 2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s 2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s 2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s 2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s 2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s 2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized 2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized 2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized 2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s 2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s 2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s 2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:33:20.635816 mon.1 10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling 
new monitor election 2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s 2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s 2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s 2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s 2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected 2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s 2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s 2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s 2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s 2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s 2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s 2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s 2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s 2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized 2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election 2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election 2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election 2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK 2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 
10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s 2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s 2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s 2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized 2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s 2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s 2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s 2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized 2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s 2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s 2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s 2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized 2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s 2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s 2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s 2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s 2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s 2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s 2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s 2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.394627s > max 0.05s 2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s 2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock 
skew 0.298756s > max 0.05s 2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s 2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s 2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized 2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s 2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s 2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s 2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected 2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s 2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s 2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s 2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s 2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s 2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s But after adding the new NTP sever and waiting some time, finally clock get in sync and status go to OK. But (this is the PANIC time) despite of the fact that 'ceph status' and pve interface say 'all OK', cluster does not work. 
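(In hindsight, a quick test from one of the PVE nodes, something like

  rbd -p <poolname> ls
  rados -p <poolname> ls

would have shown that client I/O was really hanging even though the mons
reported HEALTH_OK -- both commands talk to the OSDs and simply block when
the requests are stuck.)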
So i've started to reboot the CPU nodes (mon.2 and .3): 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK 2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK 2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : cluster [INF] mon.2 calling new monitor election 2016-11-26 
19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK 2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election 2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized 2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election 2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election 2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election 2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK 2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s 2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in 2016-11-26 19:47:27.462603 
mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized 2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s 2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized 2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected 2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK With no luck. So finally i've set 'nodown' and 'noout' flags and rebooted the storage nodes (mon.0 ad .1). And suddenly all get back as normal: 2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in 2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3 2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set 2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s 2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s 2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s 2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in 2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized 2016-11-26 20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election 2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new 
monitor election 2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized 2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s 2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s 2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s 2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in 2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot 2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot 2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in 2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized 2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s 2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s 2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected 2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs 2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs 2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 
secs 2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs 2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set 2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs 2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs 2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set 2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3 2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3 2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap 
v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs 2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg 2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs 2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg 2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election 2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election 2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election 2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set 2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} 2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up 2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in 2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s 2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized 2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election 2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in 2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot 2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in 2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in 2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 
stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot 2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in 2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in 2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s 2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s 2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in 2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s 2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts 2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok 2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot 2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in 2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s 2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot 2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in 2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts 2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok 2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts 2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok 2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts 2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok 2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts 2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s 2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in 2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 
active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail 2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s 2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s 2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s 2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s 2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s 2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts 2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok 2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s [...] 2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s 2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok 2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s [...] 2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s 2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected 2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s [...] 2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s 2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set 2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s [...] 
2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s
2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in
2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s
[...]
2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s
2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in
2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s
[...]
2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s
2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK

Other things to note: in the syslog (not the ceph log) of mon.0, I found this for the first (failed) boot:

Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 ===
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica...
Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service.
Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 ===
Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700 0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault
Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault
Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault
Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault
Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default'
Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1
Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 ===
Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault
Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault
Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault
Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault
Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault
Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default'
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1
Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate

And for the second (working):

Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 ===
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica...
Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service.
Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica...
Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 ===
Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica...
Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service.
Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 ===
Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map
Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica...
Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service.
Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal

So it seems to me that at the first start (some) OSDs fail to start. But, again, PVE and 'ceph status' report all OSDs as up&in.

Thanks.

--
dott. Marco Gaiarin     GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia''     http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)

From sysadmin-pve at cognitec.com Tue Nov 29 14:40:44 2016
From: sysadmin-pve at cognitec.com (Alwin Antreich)
Date: Tue, 29 Nov 2016 14:40:44 +0100
Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-)
In-Reply-To: <20161129111744.GL3355@sv.lnf.it>
References: <20161128120511.GJ3348@sv.lnf.it>
 <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com>
 <20161128143141.GQ3348@sv.lnf.it>
 <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com>
 <20161129111744.GL3355@sv.lnf.it>
Message-ID: <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com>

Hi Marco,

On 11/29/2016 12:17 PM, Marco Gaiarin wrote:
> Hi, Alwin Antreich!
> On that day you wrote...
>
>> May you please show us the logs?
>
> Ok, I'm here. With the log.
>
> A bit of legend: 10.27.251.7 and 10.27.251.8 are the 'ceph' nodes
> (mon+osd); 10.27.251.11 and 10.27.251.12 are the 'cpu' nodes (only
> mon). In order, mon.0, mon.1, mon.2 and mon.3.
>
> These are the logs of 10.27.251.7 (mon.0); it seems to me that the ceph logs
> are all similar, so I hope these suffice.
>
>
> I started my activity at 15.00, but before taking down all the stuff
> I P2V'd my last server, my Asterisk PBX box. Clearly, the cluster worked:
>
> [...]
> 2016-11-26 16:45:51.900445 osd.4 10.27.251.8:6804/3442 5016 : cluster [INF] 3.68 scrub starts
> 2016-11-26 16:45:52.047932 osd.4 10.27.251.8:6804/3442 5017 : cluster [INF] 3.68 scrub ok
> 2016-11-26 16:45:52.741334 mon.0 10.27.251.7:6789/0 2317313 : cluster [INF] pgmap v2410312: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20533 B/s rd, 945 kB/s wr, 127 op/s
> 2016-11-26 16:45:54.825603 mon.0 10.27.251.7:6789/0 2317314 : cluster [INF] pgmap v2410313: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 261 kB/s wr, 7 op/s
> [...]
> 2016-11-26 16:47:52.741749 mon.0 10.27.251.7:6789/0 2317382 : cluster [INF] pgmap v2410381: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11563 B/s rd, 687 kB/s wr, 124 op/s
> 2016-11-26 16:47:55.002485 mon.0 10.27.251.7:6789/0 2317383 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
>
>
> Having finished the P2V, I started to power off the cluster, starting from
> the cpu nodes. After powering down a node, I realized that I needed it
> to do another thing, so I powered it back on. ;-)
>
> 2016-11-26 16:48:05.018514 mon.1 10.27.251.8:6789/0 129 : cluster [INF] mon.1 calling new monitor election
> 2016-11-26 16:48:05.031761 mon.2 10.27.251.11:6789/0 120 : cluster [INF] mon.2 calling new monitor election
> 2016-11-26 16:48:05.053262 mon.0 10.27.251.7:6789/0 2317384 : cluster [INF] mon.0 calling new monitor election
> 2016-11-26 16:48:10.091773 mon.0 10.27.251.7:6789/0 2317385 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2
> 2016-11-26 16:48:10.104535 mon.0 10.27.251.7:6789/0 2317386 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2
> 2016-11-26 16:48:10.143625 mon.0 10.27.251.7:6789/0 2317387 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0}
> 2016-11-26 16:48:10.143731 mon.0 10.27.251.7:6789/0 2317388 : cluster [INF] pgmap v2410382: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 922 kB/s rd, 292 kB/s wr, 28 op/s
> 2016-11-26 16:48:10.144828 mon.0 10.27.251.7:6789/0 2317389 : cluster [INF] mdsmap e1: 0/0/0 up
> 2016-11-26 16:48:10.148407 mon.0 10.27.251.7:6789/0 2317390 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in
> 2016-11-26 16:48:11.208968 mon.0 10.27.251.7:6789/0 2317391 : cluster [INF] pgmap v2410383: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2174 kB/s rd, 646 kB/s wr, 130 op/s
> 2016-11-26 16:48:13.309644 mon.0 10.27.251.7:6789/0 2317392 : cluster [INF] pgmap v2410384: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2210 kB/s rd, 652 kB/s wr, 135 op/s
> [...]
> 2016-11-26 16:50:04.665220 mon.0 10.27.251.7:6789/0 2317466 : cluster [INF] pgmap v2410458: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2579 B/s rd, 23217 B/s wr, 5 op/s > 2016-11-26 16:50:05.707271 mon.0 10.27.251.7:6789/0 2317467 : cluster [INF] pgmap v2410459: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 157 kB/s rd, 445 kB/s wr, 82 op/s > 2016-11-26 16:50:16.786716 mon.1 10.27.251.8:6789/0 130 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 16:50:16.815156 mon.0 10.27.251.7:6789/0 2317468 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 16:52:51.536024 osd.0 10.27.251.7:6800/3166 7755 : cluster [INF] 1.e8 scrub starts > 2016-11-26 16:52:53.771169 osd.0 10.27.251.7:6800/3166 7756 : cluster [INF] 1.e8 scrub ok > 2016-11-26 16:54:34.558607 osd.0 10.27.251.7:6800/3166 7757 : cluster [INF] 1.ed scrub starts > 2016-11-26 16:54:36.682207 osd.0 10.27.251.7:6800/3166 7758 : cluster [INF] 1.ed scrub ok > 2016-11-26 16:57:07.816187 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 16:57:13.242951 mon.0 10.27.251.7:6789/0 2317469 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 16:57:13.252424 mon.0 10.27.251.7:6789/0 2317470 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 16:57:13.253143 mon.0 10.27.251.7:6789/0 2317471 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155786s > max 0.05s > 2016-11-26 16:57:13.302934 mon.0 10.27.251.7:6789/0 2317472 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 16:57:13.302998 mon.0 10.27.251.7:6789/0 2317473 : cluster [INF] pgmap v2410460: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 77940 B/s rd, 208 kB/s wr, 38 op/s > 2016-11-26 16:57:13.303055 mon.0 10.27.251.7:6789/0 2317474 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 16:57:13.303141 mon.0 10.27.251.7:6789/0 2317475 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 16:57:13.304000 mon.0 10.27.251.7:6789/0 2317476 : cluster [WRN] message from mon.3 was stamped 0.156822s in the future, clocks not synchronized > 2016-11-26 16:57:14.350452 mon.0 10.27.251.7:6789/0 2317477 : cluster [INF] pgmap v2410461: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 43651 B/s rd, 15067 B/s wr, 2 op/s > [...] > 2016-11-26 16:57:30.901532 mon.0 10.27.251.7:6789/0 2317483 : cluster [INF] pgmap v2410467: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1539 kB/s rd, 316 kB/s wr, 172 op/s > 2016-11-26 16:51:13.939571 osd.4 10.27.251.8:6804/3442 5018 : cluster [INF] 4.91 deep-scrub starts > 2016-11-26 16:52:03.663961 osd.4 10.27.251.8:6804/3442 5019 : cluster [INF] 4.91 deep-scrub ok > 2016-11-26 16:57:33.003398 mon.0 10.27.251.7:6789/0 2317484 : cluster [INF] pgmap v2410468: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 20384 kB/s rd, 2424 kB/s wr, 1163 op/s > [...] 
> 2016-11-26 16:57:41.523421 mon.0 10.27.251.7:6789/0 2317489 : cluster [INF] pgmap v2410473: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3654 kB/s rd, 732 kB/s wr, 385 op/s > 2016-11-26 16:57:43.284475 mon.0 10.27.251.7:6789/0 2317490 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.155191s > max 0.05s > 2016-11-26 16:57:43.624090 mon.0 10.27.251.7:6789/0 2317491 : cluster [INF] pgmap v2410474: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2140 kB/s rd, 391 kB/s wr, 233 op/s > [...] > 2016-11-26 16:58:02.688789 mon.0 10.27.251.7:6789/0 2317503 : cluster [INF] pgmap v2410486: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4675 kB/s rd, 184 kB/s wr, 281 op/s > 2016-11-26 16:52:48.308292 osd.3 10.27.251.8:6812/4377 8761 : cluster [INF] 1.55 scrub starts > 2016-11-26 16:52:50.718814 osd.3 10.27.251.8:6812/4377 8762 : cluster [INF] 1.55 scrub ok > 2016-11-26 16:52:59.309398 osd.3 10.27.251.8:6812/4377 8763 : cluster [INF] 4.c7 scrub starts > 2016-11-26 16:53:10.848883 osd.3 10.27.251.8:6812/4377 8764 : cluster [INF] 4.c7 scrub ok > 2016-11-26 16:58:03.759643 mon.0 10.27.251.7:6789/0 2317504 : cluster [INF] pgmap v2410487: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 8311 kB/s rd, 65182 B/s wr, 334 op/s > [...] > 2016-11-26 16:58:11.183400 mon.0 10.27.251.7:6789/0 2317510 : cluster [INF] pgmap v2410493: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 11880 kB/s rd, 507 kB/s wr, 1006 op/s > 2016-11-26 16:58:13.265908 mon.0 10.27.251.7:6789/0 2317511 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 16:58:13.290893 mon.0 10.27.251.7:6789/0 2317512 : cluster [INF] pgmap v2410494: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 9111 kB/s rd, 523 kB/s wr, 718 op/s > [...] > 2016-11-26 16:58:42.309990 mon.0 10.27.251.7:6789/0 2317529 : cluster [INF] pgmap v2410511: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 22701 kB/s rd, 4773 kB/s wr, 834 op/s > 2016-11-26 16:58:43.285715 mon.0 10.27.251.7:6789/0 2317530 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.154781s > max 0.05s > 2016-11-26 16:58:43.358508 mon.0 10.27.251.7:6789/0 2317531 : cluster [INF] pgmap v2410512: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 19916 kB/s rd, 4439 kB/s wr, 741 op/s > [...] > 2016-11-26 16:59:17.933355 mon.0 10.27.251.7:6789/0 2317552 : cluster [INF] pgmap v2410533: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4400 kB/s rd, 2144 kB/s wr, 276 op/s > 2016-11-26 16:59:18.981605 mon.0 10.27.251.7:6789/0 2317553 : cluster [WRN] message from mon.3 was stamped 0.155111s in the future, clocks not synchronized > 2016-11-26 16:59:21.064651 mon.0 10.27.251.7:6789/0 2317554 : cluster [INF] pgmap v2410534: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3909 kB/s rd, 1707 kB/s wr, 232 op/s > [...] 
> 2016-11-26 16:59:58.729775 mon.0 10.27.251.7:6789/0 2317576 : cluster [INF] pgmap v2410556: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4067 kB/s rd, 1372 kB/s wr, 125 op/s > 2016-11-26 17:00:00.000396 mon.0 10.27.251.7:6789/0 2317577 : cluster [INF] HEALTH_WARN; clock skew detected on mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 17:00:00.807659 mon.0 10.27.251.7:6789/0 2317578 : cluster [INF] pgmap v2410557: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 7894 kB/s rd, 1245 kB/s wr, 552 op/s > [...] > 2016-11-26 17:00:11.359226 mon.0 10.27.251.7:6789/0 2317585 : cluster [INF] pgmap v2410564: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2416 kB/s rd, 376 kB/s wr, 191 op/s > 2016-11-26 17:00:13.286867 mon.0 10.27.251.7:6789/0 2317586 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.153666s > max 0.05s > 2016-11-26 17:00:13.481830 mon.0 10.27.251.7:6789/0 2317587 : cluster [INF] pgmap v2410565: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 6266 kB/s rd, 492 kB/s wr, 265 op/s > [...] > 2016-11-26 17:00:15.559867 mon.0 10.27.251.7:6789/0 2317588 : cluster [INF] pgmap v2410566: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 5107 kB/s rd, 176 kB/s wr, 133 op/s > > OK, here server was shut down and so logs stop. > > > At power up, i got as sayed clock skew troubles, so i got status > HEALTH_WARN: > > 2016-11-26 18:16:19.623440 mon.1 10.27.251.8:6789/0 1311 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:16:19.729689 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:16:19.848291 mon.0 10.27.251.7:6789/0 1183 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:16:29.613075 mon.2 10.27.251.11:6789/0 20 : cluster [WRN] message from mon.0 was stamped 0.341880s in the future, clocks not synchronized > 2016-11-26 18:16:29.742328 mon.1 10.27.251.8:6789/0 1332 : cluster [WRN] message from mon.0 was stamped 0.212611s in the future, clocks not synchronized > 2016-11-26 18:16:29.894351 mon.0 10.27.251.7:6789/0 1202 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 > 2016-11-26 18:16:29.901079 mon.0 10.27.251.7:6789/0 1203 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 > 2016-11-26 18:16:29.902069 mon.0 10.27.251.7:6789/0 1204 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347176s > max 0.05s > 2016-11-26 18:16:29.928249 mon.0 10.27.251.7:6789/0 1205 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.203948s > max 0.05s > 2016-11-26 18:16:29.955001 mon.0 10.27.251.7:6789/0 1206 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:16:29.955115 mon.0 10.27.251.7:6789/0 1207 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:16:29.955195 mon.0 10.27.251.7:6789/0 1208 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:16:29.955297 mon.0 10.27.251.7:6789/0 1209 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:16:36.965739 mon.2 10.27.251.11:6789/0 23 : cluster [WRN] message from mon.0 was stamped 0.347450s in the future, clocks not synchronized > 2016-11-26 18:16:37.091476 mon.1 10.27.251.8:6789/0 1335 : cluster [WRN] message from mon.0 was stamped 0.221680s in the future, clocks not synchronized > 2016-11-26 
18:16:59.929488 mon.0 10.27.251.7:6789/0 1212 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.347736s > max 0.05s > 2016-11-26 18:16:59.929541 mon.0 10.27.251.7:6789/0 1213 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.222216s > max 0.05s > 2016-11-26 18:17:02.770378 mon.2 10.27.251.11:6789/0 24 : cluster [WRN] message from mon.0 was stamped 0.345763s in the future, clocks not synchronized > 2016-11-26 18:17:02.902756 mon.1 10.27.251.8:6789/0 1336 : cluster [WRN] message from mon.0 was stamped 0.213372s in the future, clocks not synchronized > 2016-11-26 18:17:29.901353 mon.0 10.27.251.7:6789/0 1218 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2; 1 mons down, quorum 0,1,2 0,1,2; Monitor clock skew detected > 2016-11-26 18:17:59.930852 mon.0 10.27.251.7:6789/0 1219 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.348437s > max 0.05s > 2016-11-26 18:17:59.930923 mon.0 10.27.251.7:6789/0 1220 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.223381s > max 0.05s > 2016-11-26 18:18:24.383970 mon.2 10.27.251.11:6789/0 25 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:18:24.459941 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:18:24.506084 mon.3 10.27.251.12:6789/0 2 : cluster [WRN] message from mon.0 was stamped 0.271532s in the future, clocks not synchronized > 2016-11-26 18:18:24.508845 mon.1 10.27.251.8:6789/0 1337 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:18:24.733137 mon.0 10.27.251.7:6789/0 1221 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:18:24.764445 mon.0 10.27.251.7:6789/0 1222 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 18:18:24.770743 mon.0 10.27.251.7:6789/0 1223 : cluster [INF] HEALTH_OK > 2016-11-26 18:18:24.771644 mon.0 10.27.251.7:6789/0 1224 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.34865s > max 0.05s > 2016-11-26 18:18:24.771763 mon.0 10.27.251.7:6789/0 1225 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272024s > max 0.05s > 2016-11-26 18:18:24.778105 mon.0 10.27.251.7:6789/0 1226 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:18:24.778168 mon.0 10.27.251.7:6789/0 1227 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:18:24.778217 mon.0 10.27.251.7:6789/0 1228 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:18:24.778309 mon.0 10.27.251.7:6789/0 1229 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:18:24.778495 mon.0 10.27.251.7:6789/0 1230 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.217754s > max 0.05s > 2016-11-26 18:18:31.609426 mon.3 10.27.251.12:6789/0 5 : cluster [WRN] message from mon.0 was stamped 0.272441s in the future, clocks not synchronized > 2016-11-26 18:18:54.779742 mon.0 10.27.251.7:6789/0 1231 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.272617s > max 0.05s > 2016-11-26 18:18:54.779795 mon.0 10.27.251.7:6789/0 1232 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.224392s > max 0.05s > 2016-11-26 18:18:54.779834 mon.0 10.27.251.7:6789/0 1233 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349151s > max 0.05s > 2016-11-26 18:18:57.598098 mon.3 10.27.251.12:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.272729s in the future, clocks not synchronized > 2016-11-26 18:19:09.612371 mon.2 10.27.251.11:6789/0 26 : cluster [WRN] 
message from mon.0 was stamped 0.349322s in the future, clocks not synchronized > 2016-11-26 18:19:09.736830 mon.1 10.27.251.8:6789/0 1338 : cluster [WRN] message from mon.0 was stamped 0.224812s in the future, clocks not synchronized > 2016-11-26 18:19:24.770966 mon.0 10.27.251.7:6789/0 1234 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected > 2016-11-26 18:19:54.781002 mon.0 10.27.251.7:6789/0 1235 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.273372s > max 0.05s > 2016-11-26 18:19:54.781078 mon.0 10.27.251.7:6789/0 1236 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.225574s > max 0.05s > 2016-11-26 18:19:54.781120 mon.0 10.27.251.7:6789/0 1237 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.349896s > max 0.05s > 2016-11-26 18:21:03.602890 mon.3 10.27.251.12:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.274203s in the future, clocks not synchronized > 2016-11-26 18:21:24.782299 mon.0 10.27.251.7:6789/0 1238 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27444s > max 0.05s > 2016-11-26 18:21:24.782359 mon.0 10.27.251.7:6789/0 1239 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.351099s > max 0.05s > 2016-11-26 18:21:24.782397 mon.0 10.27.251.7:6789/0 1240 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.227465s > max 0.05s > 2016-11-26 18:23:24.783511 mon.0 10.27.251.7:6789/0 1241 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.275852s > max 0.05s > 2016-11-26 18:23:24.783572 mon.0 10.27.251.7:6789/0 1242 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.352701s > max 0.05s > 2016-11-26 18:23:24.783614 mon.0 10.27.251.7:6789/0 1243 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.229936s > max 0.05s > 2016-11-26 18:25:54.784800 mon.0 10.27.251.7:6789/0 1244 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.277662s > max 0.05s > 2016-11-26 18:25:54.784861 mon.0 10.27.251.7:6789/0 1245 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.354716s > max 0.05s > 2016-11-26 18:25:54.785102 mon.0 10.27.251.7:6789/0 1246 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.232739s > max 0.05s > 2016-11-26 18:28:54.786183 mon.0 10.27.251.7:6789/0 1248 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.27989s > max 0.05s > 2016-11-26 18:28:54.786243 mon.0 10.27.251.7:6789/0 1249 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.23634s > max 0.05s > 2016-11-26 18:28:54.786284 mon.0 10.27.251.7:6789/0 1250 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.35715s > max 0.05s > 2016-11-26 18:29:36.721250 mon.2 10.27.251.11:6789/0 27 : cluster [WRN] message from mon.0 was stamped 0.357750s in the future, clocks not synchronized > 2016-11-26 18:29:36.841757 mon.1 10.27.251.8:6789/0 1339 : cluster [WRN] message from mon.0 was stamped 0.237207s in the future, clocks not synchronized > 2016-11-26 18:31:30.725507 mon.3 10.27.251.12:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.281799s in the future, clocks not synchronized > 2016-11-26 18:32:24.787410 mon.0 10.27.251.7:6789/0 1264 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282481s > max 0.05s > 2016-11-26 18:32:24.787462 mon.0 10.27.251.7:6789/0 1265 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.360058s > max 0.05s > 2016-11-26 18:32:24.787500 mon.0 10.27.251.7:6789/0 1266 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.240569s > max 0.05s > 2016-11-26 18:33:20.594196 mon.3 10.27.251.12:6789/0 9 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:33:20.635816 mon.1 
10.27.251.8:6789/0 1340 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:33:20.894625 mon.0 10.27.251.7:6789/0 1273 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:33:25.919955 mon.0 10.27.251.7:6789/0 1274 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 18:33:25.929393 mon.0 10.27.251.7:6789/0 1275 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 18:33:25.930715 mon.0 10.27.251.7:6789/0 1276 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.282884s > max 0.05s > 2016-11-26 18:33:25.947280 mon.0 10.27.251.7:6789/0 1277 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.234203s > max 0.05s > 2016-11-26 18:33:25.964223 mon.0 10.27.251.7:6789/0 1278 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:33:25.964283 mon.0 10.27.251.7:6789/0 1279 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:33:25.964326 mon.0 10.27.251.7:6789/0 1280 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:33:25.964418 mon.0 10.27.251.7:6789/0 1281 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:33:55.948613 mon.0 10.27.251.7:6789/0 1283 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.28349s > max 0.05s > 2016-11-26 18:33:55.948680 mon.0 10.27.251.7:6789/0 1284 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.242253s > max 0.05s > 2016-11-26 18:34:25.929710 mon.0 10.27.251.7:6789/0 1287 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.3; 1 mons down, quorum 0,1,3 0,1,3; Monitor clock skew detected > 2016-11-26 18:34:55.950050 mon.0 10.27.251.7:6789/0 1288 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.284225s > max 0.05s > 2016-11-26 18:34:55.950117 mon.0 10.27.251.7:6789/0 1289 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243421s > max 0.05s > 2016-11-26 18:36:25.951267 mon.0 10.27.251.7:6789/0 1290 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.285389s > max 0.05s > 2016-11-26 18:36:25.951393 mon.0 10.27.251.7:6789/0 1291 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.245253s > max 0.05s > 2016-11-26 18:38:25.952573 mon.0 10.27.251.7:6789/0 1294 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.286907s > max 0.05s > 2016-11-26 18:38:25.952836 mon.0 10.27.251.7:6789/0 1295 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247648s > max 0.05s > 2016-11-26 18:40:55.954179 mon.0 10.27.251.7:6789/0 1296 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.288735s > max 0.05s > 2016-11-26 18:40:55.954233 mon.0 10.27.251.7:6789/0 1297 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.2506s > max 0.05s > 2016-11-26 18:43:32.915408 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:43:32.916835 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 18:43:32.951384 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.388792s in the future, clocks not synchronized > 2016-11-26 18:43:33.014026 mon.3 10.27.251.12:6789/0 10 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 18:43:33.050896 mon.1 10.27.251.8:6789/0 1341 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 18:43:33.305330 mon.0 10.27.251.7:6789/0 1298 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 18:43:33.324492 mon.0 10.27.251.7:6789/0 1299 : cluster [INF] mon.0 at 0 
won leader election with quorum 0,1,2,3 > 2016-11-26 18:43:33.333626 mon.0 10.27.251.7:6789/0 1300 : cluster [INF] HEALTH_OK > 2016-11-26 18:43:33.334234 mon.0 10.27.251.7:6789/0 1301 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290845s > max 0.05s > 2016-11-26 18:43:33.334321 mon.0 10.27.251.7:6789/0 1302 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.388745s > max 0.05s > 2016-11-26 18:43:33.340638 mon.0 10.27.251.7:6789/0 1303 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 18:43:33.340703 mon.0 10.27.251.7:6789/0 1304 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 18:43:33.340763 mon.0 10.27.251.7:6789/0 1305 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 18:43:33.340858 mon.0 10.27.251.7:6789/0 1306 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 18:43:33.341044 mon.0 10.27.251.7:6789/0 1307 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.247914s > max 0.05s > 2016-11-26 18:43:40.064299 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.388889s in the future, clocks not synchronized > 2016-11-26 18:44:03.342137 mon.0 10.27.251.7:6789/0 1308 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291226s > max 0.05s > 2016-11-26 18:44:03.342225 mon.0 10.27.251.7:6789/0 1309 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.254342s > max 0.05s > 2016-11-26 18:44:03.342281 mon.0 10.27.251.7:6789/0 1310 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.389057s > max 0.05s > 2016-11-26 18:44:06.047499 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.389102s in the future, clocks not synchronized > 2016-11-26 18:44:33.333908 mon.0 10.27.251.7:6789/0 1311 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected > 2016-11-26 18:45:03.343358 mon.0 10.27.251.7:6789/0 1313 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.291989s > max 0.05s > 2016-11-26 18:45:03.343435 mon.0 10.27.251.7:6789/0 1314 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.255536s > max 0.05s > 2016-11-26 18:45:03.343540 mon.0 10.27.251.7:6789/0 1315 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.38983s > max 0.05s > 2016-11-26 18:46:11.549947 mon.2 10.27.251.11:6789/0 8 : cluster [WRN] message from mon.0 was stamped 0.390678s in the future, clocks not synchronized > 2016-11-26 18:46:33.344570 mon.0 10.27.251.7:6789/0 1329 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.29311s > max 0.05s > 2016-11-26 18:46:33.344642 mon.0 10.27.251.7:6789/0 1330 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.257389s > max 0.05s > 2016-11-26 18:46:33.344707 mon.0 10.27.251.7:6789/0 1331 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.391036s > max 0.05s > 2016-11-26 18:48:33.345909 mon.0 10.27.251.7:6789/0 1354 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.294607s > max 0.05s > 2016-11-26 18:48:33.345973 mon.0 10.27.251.7:6789/0 1355 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.392611s > max 0.05s > 2016-11-26 18:48:33.346016 mon.0 10.27.251.7:6789/0 1356 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.259781s > max 0.05s > 2016-11-26 18:51:03.347074 mon.0 10.27.251.7:6789/0 1357 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.296507s > max 0.05s > 2016-11-26 18:51:03.347259 mon.0 10.27.251.7:6789/0 1358 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 
0.394627s > max 0.05s
> 2016-11-26 18:51:03.347311 mon.0 10.27.251.7:6789/0 1359 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.262662s > max 0.05s
> 2016-11-26 18:54:03.348471 mon.0 10.27.251.7:6789/0 1360 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.298756s > max 0.05s
> 2016-11-26 18:54:03.348533 mon.0 10.27.251.7:6789/0 1361 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.397086s > max 0.05s
> 2016-11-26 18:54:03.348580 mon.0 10.27.251.7:6789/0 1362 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.266196s > max 0.05s
> 2016-11-26 18:56:39.053369 mon.2 10.27.251.11:6789/0 9 : cluster [WRN] message from mon.0 was stamped 0.399300s in the future, clocks not synchronized
> 2016-11-26 18:57:33.349690 mon.0 10.27.251.7:6789/0 1363 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192948s > max 0.05s
> 2016-11-26 18:57:33.349743 mon.0 10.27.251.7:6789/0 1364 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.270457s > max 0.05s
> 2016-11-26 18:57:33.349788 mon.0 10.27.251.7:6789/0 1365 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.400016s > max 0.05s
> 2016-11-26 19:00:00.000400 mon.0 10.27.251.7:6789/0 1370 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; Monitor clock skew detected
> 2016-11-26 19:01:33.350738 mon.0 10.27.251.7:6789/0 1389 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192183s > max 0.05s
> 2016-11-26 19:01:33.350800 mon.0 10.27.251.7:6789/0 1390 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.275208s > max 0.05s
> 2016-11-26 19:01:33.350856 mon.0 10.27.251.7:6789/0 1391 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.40334s > max 0.05s
> 2016-11-26 19:06:03.351908 mon.0 10.27.251.7:6789/0 1478 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.192207s > max 0.05s
> 2016-11-26 19:06:03.351997 mon.0 10.27.251.7:6789/0 1479 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.280431s > max 0.05s
> 2016-11-26 19:06:03.352110 mon.0 10.27.251.7:6789/0 1480 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.251491s > max 0.05s
>
> But after adding the new NTP server and waiting some time, the clocks finally
> got in sync and the status went back to OK.
> But (this is the PANIC time) despite the fact that 'ceph status' and the
> PVE interface said 'all OK', the cluster did not work.
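
For reference, a minimal sketch of the two steps touched on here and in the quoted text further below, assuming the standard ntp and Ceph command-line tools are available on every node (commands only, to be adapted to the actual setup):

  # check time sync and any remaining monitor clock skew before trusting HEALTH_OK
  ntpq -p               # peer offsets should be well below the 0.05s mon threshold
  ceph health detail    # lists any "mon.X ... clock skew" warnings still present

  # before rebooting mon/OSD nodes, keep the cluster from reacting to them going away
  ceph osd set noout
  ceph osd set nodown
  # ... reboot the node(s), wait until 'ceph -s' shows all OSDs up and in again ...
  ceph osd unset nodown
  ceph osd unset noout
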
> > So i've started to reboot the CPU nodes (mon.2 and .3): > > 2016-11-26 19:11:33.343818 mon.0 10.27.251.7:6789/0 1581 : cluster [INF] HEALTH_OK > 2016-11-26 19:12:43.854404 mon.1 10.27.251.8:6789/0 1342 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:12:43.856032 mon.3 10.27.251.12:6789/0 11 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:12:43.870922 mon.0 10.27.251.7:6789/0 1590 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:12:48.895683 mon.0 10.27.251.7:6789/0 1591 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 19:12:48.905245 mon.0 10.27.251.7:6789/0 1592 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 19:12:48.951654 mon.0 10.27.251.7:6789/0 1593 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:12:48.951715 mon.0 10.27.251.7:6789/0 1594 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:12:48.951766 mon.0 10.27.251.7:6789/0 1595 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:12:48.951848 mon.0 10.27.251.7:6789/0 1596 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:15:48.583382 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:15:48.584865 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:15:48.589714 mon.0 10.27.251.7:6789/0 1616 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:15:48.589965 mon.1 10.27.251.8:6789/0 1343 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:15:48.591671 mon.3 10.27.251.12:6789/0 12 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:15:48.614007 mon.0 10.27.251.7:6789/0 1617 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:15:48.620602 mon.0 10.27.251.7:6789/0 1618 : cluster [INF] HEALTH_OK > 2016-11-26 19:15:48.633199 mon.0 10.27.251.7:6789/0 1619 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:15:48.633258 mon.0 10.27.251.7:6789/0 1620 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:15:48.633322 mon.0 10.27.251.7:6789/0 1621 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:15:48.633416 mon.0 10.27.251.7:6789/0 1622 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:18:12.415679 mon.0 10.27.251.7:6789/0 1639 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:18:17.444444 mon.0 10.27.251.7:6789/0 1640 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2 > 2016-11-26 19:18:17.453618 mon.0 10.27.251.7:6789/0 1641 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,2 0,1,2 > 2016-11-26 19:18:17.468577 mon.0 10.27.251.7:6789/0 1642 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:18:17.468636 mon.0 10.27.251.7:6789/0 1643 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:18:17.468679 mon.0 10.27.251.7:6789/0 1644 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:18:17.468755 mon.0 10.27.251.7:6789/0 1645 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:21:25.457997 mon.2 10.27.251.11:6789/0 5 : 
cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:21:25.458923 mon.0 10.27.251.7:6789/0 1648 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:21:25.459240 mon.1 10.27.251.8:6789/0 1344 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:21:25.489206 mon.0 10.27.251.7:6789/0 1649 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:21:25.498421 mon.0 10.27.251.7:6789/0 1650 : cluster [INF] HEALTH_OK > 2016-11-26 19:21:25.505645 mon.3 10.27.251.12:6789/0 1 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:21:25.508232 mon.0 10.27.251.7:6789/0 1651 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:21:25.508377 mon.0 10.27.251.7:6789/0 1652 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:21:25.508466 mon.0 10.27.251.7:6789/0 1653 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:21:25.508556 mon.0 10.27.251.7:6789/0 1654 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:44:00.306113 mon.0 10.27.251.7:6789/0 1672 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:44:05.343631 mon.0 10.27.251.7:6789/0 1673 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,3 > 2016-11-26 19:44:05.353082 mon.0 10.27.251.7:6789/0 1674 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 0,1,3 0,1,3 > 2016-11-26 19:44:05.373799 mon.0 10.27.251.7:6789/0 1675 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:44:05.373860 mon.0 10.27.251.7:6789/0 1676 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:44:05.373904 mon.0 10.27.251.7:6789/0 1677 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 19:44:05.373983 mon.0 10.27.251.7:6789/0 1678 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:47:20.297661 mon.2 10.27.251.11:6789/0 1 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:47:20.299406 mon.2 10.27.251.11:6789/0 2 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 19:47:20.357274 mon.2 10.27.251.11:6789/0 3 : cluster [WRN] message from mon.0 was stamped 0.404381s in the future, clocks not synchronized > 2016-11-26 19:47:20.716116 mon.3 10.27.251.12:6789/0 4 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 19:47:20.719435 mon.0 10.27.251.7:6789/0 1679 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 19:47:20.719853 mon.1 10.27.251.8:6789/0 1345 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 19:47:20.747017 mon.0 10.27.251.7:6789/0 1680 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 19:47:20.755302 mon.0 10.27.251.7:6789/0 1681 : cluster [INF] HEALTH_OK > 2016-11-26 19:47:20.755943 mon.0 10.27.251.7:6789/0 1682 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420346s > max 0.05s > 2016-11-26 19:47:20.762042 mon.0 10.27.251.7:6789/0 1683 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 19:47:20.762100 mon.0 10.27.251.7:6789/0 1684 : cluster [INF] pgmap v2410577: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:47:20.762146 mon.0 10.27.251.7:6789/0 1685 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 
19:47:20.762226 mon.0 10.27.251.7:6789/0 1686 : cluster [INF] osdmap e98: 6 osds: 6 up, 6 in > 2016-11-26 19:47:27.462603 mon.2 10.27.251.11:6789/0 6 : cluster [WRN] message from mon.0 was stamped 0.420329s in the future, clocks not synchronized > 2016-11-26 19:47:50.763598 mon.0 10.27.251.7:6789/0 1687 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.420661s > max 0.05s > 2016-11-26 19:47:53.438750 mon.2 10.27.251.11:6789/0 7 : cluster [WRN] message from mon.0 was stamped 0.420684s in the future, clocks not synchronized > 2016-11-26 19:48:20.755382 mon.0 10.27.251.7:6789/0 1688 : cluster [INF] HEALTH_WARN; clock skew detected on mon.2; Monitor clock skew detected > 2016-11-26 19:49:20.755732 mon.0 10.27.251.7:6789/0 1697 : cluster [INF] HEALTH_OK > > > With no luck. So finally i've set 'nodown' and 'noout' flags and > rebooted the storage nodes (mon.0 ad .1). And suddenly all get back as > normal: > > 2016-11-26 19:57:20.090836 mon.0 10.27.251.7:6789/0 1722 : cluster [INF] osdmap e99: 6 osds: 6 up, 6 in > 2016-11-26 19:57:20.110743 mon.0 10.27.251.7:6789/0 1723 : cluster [INF] pgmap v2410578: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:57:20.758100 mon.0 10.27.251.7:6789/0 1724 : cluster [INF] HEALTH_WARN; noout flag(s) set > 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 19:57:24.617480 mon.0 10.27.251.7:6789/0 1727 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 19:57:24.641974 mon.0 10.27.251.7:6789/0 1728 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:00:00.000180 mon.1 10.27.251.8:6789/0 1353 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set; 1 mons down, quorum 1,2,3 1,2,3 > 2016-11-26 20:01:49.705122 mon.0 10.27.251.7:6789/0 1 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:01:49.731728 mon.0 10.27.251.7:6789/0 4 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:01:49.751119 mon.0 10.27.251.7:6789/0 5 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 20:01:49.762503 mon.0 10.27.251.7:6789/0 6 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set > 2016-11-26 20:01:49.788619 mon.0 10.27.251.7:6789/0 7 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.243513s > max 0.05s > 2016-11-26 20:01:49.788699 mon.0 10.27.251.7:6789/0 8 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.240216s > max 0.05s > 2016-11-26 20:01:49.788796 mon.0 10.27.251.7:6789/0 9 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.243912s > max 0.05s > 2016-11-26 20:01:49.797382 mon.0 10.27.251.7:6789/0 10 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:01:49.797669 mon.0 10.27.251.7:6789/0 11 : cluster [INF] pgmap v2410579: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:01:49.797850 mon.0 10.27.251.7:6789/0 12 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:01:49.797960 mon.0 10.27.251.7:6789/0 13 : cluster [INF] osdmap e100: 6 osds: 6 up, 6 in > 2016-11-26 20:01:49.798248 mon.0 10.27.251.7:6789/0 14 : cluster [WRN] message from mon.1 was stamped 0.294517s in the future, clocks not synchronized > 2016-11-26 
20:01:50.014131 mon.3 10.27.251.12:6789/0 6 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 20:01:50.016998 mon.2 10.27.251.11:6789/0 9 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 20:01:50.017895 mon.1 10.27.251.8:6789/0 1354 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:01:57.737260 mon.0 10.27.251.7:6789/0 19 : cluster [WRN] message from mon.3 was stamped 0.291444s in the future, clocks not synchronized > 2016-11-26 20:02:19.789732 mon.0 10.27.251.7:6789/0 20 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.294864s > max 0.05s > 2016-11-26 20:02:19.789786 mon.0 10.27.251.7:6789/0 21 : cluster [WRN] mon.3 10.27.251.12:6789/0 clock skew 0.290951s > max 0.05s > 2016-11-26 20:02:19.789824 mon.0 10.27.251.7:6789/0 22 : cluster [WRN] mon.2 10.27.251.11:6789/0 clock skew 0.29396s > max 0.05s > 2016-11-26 20:02:20.949515 mon.0 10.27.251.7:6789/0 23 : cluster [INF] osdmap e101: 6 osds: 4 up, 6 in > 2016-11-26 20:02:20.985891 mon.0 10.27.251.7:6789/0 24 : cluster [INF] pgmap v2410580: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:21.965798 mon.0 10.27.251.7:6789/0 25 : cluster [INF] osd.0 10.27.251.7:6804/3291 boot > 2016-11-26 20:02:21.965879 mon.0 10.27.251.7:6789/0 26 : cluster [INF] osd.1 10.27.251.7:6800/2793 boot > 2016-11-26 20:02:21.975031 mon.0 10.27.251.7:6789/0 27 : cluster [INF] osdmap e102: 6 osds: 6 up, 6 in > 2016-11-26 20:02:22.022415 mon.0 10.27.251.7:6789/0 28 : cluster [INF] pgmap v2410581: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:23.026342 mon.0 10.27.251.7:6789/0 29 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in > 2016-11-26 20:02:23.026417 mon.0 10.27.251.7:6789/0 30 : cluster [WRN] message from mon.2 was stamped 0.275306s in the future, clocks not synchronized > 2016-11-26 20:02:23.046210 mon.0 10.27.251.7:6789/0 31 : cluster [INF] pgmap v2410582: 768 pgs: 312 stale+active+clean, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:02:25.819773 mon.0 10.27.251.7:6789/0 32 : cluster [INF] pgmap v2410583: 768 pgs: 169 stale+active+clean, 143 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1467 kB/s wr, 276 op/s > 2016-11-26 20:02:26.896658 mon.0 10.27.251.7:6789/0 33 : cluster [INF] pgmap v2410584: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 3337 kB/s wr, 630 op/s > 2016-11-26 20:02:49.763887 mon.0 10.27.251.7:6789/0 34 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1, mon.2, mon.3; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set; Monitor clock skew detected > 2016-11-26 20:02:55.636643 osd.1 10.27.251.7:6800/2793 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.511571 secs > 2016-11-26 20:02:55.636653 osd.1 10.27.251.7:6800/2793 2 : cluster [WRN] slow request 30.511571 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:03:04.727273 osd.0 10.27.251.7:6804/3291 1 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.147061 secs > 2016-11-26 20:03:04.727281 osd.0 10.27.251.7:6804/3291 2 : cluster [WRN] slow request 30.147061 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd 
[stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:03:25.648743 osd.1 10.27.251.7:6800/2793 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.523708 secs > 2016-11-26 20:03:25.648758 osd.1 10.27.251.7:6800/2793 4 : cluster [WRN] slow request 60.523708 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:03:34.737588 osd.0 10.27.251.7:6804/3291 3 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.157392 secs > 2016-11-26 20:03:34.737597 osd.0 10.27.251.7:6804/3291 4 : cluster [WRN] slow request 60.157392 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:03:49.765365 mon.0 10.27.251.7:6789/0 35 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; nodown,noout flag(s) set > 2016-11-26 20:04:25.850414 mon.0 10.27.251.7:6789/0 36 : cluster [INF] pgmap v2410585: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:04:26.890251 mon.0 10.27.251.7:6789/0 37 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:04:25.668335 osd.1 10.27.251.7:6800/2793 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.543296 secs > 2016-11-26 20:04:25.668343 osd.1 10.27.251.7:6800/2793 6 : cluster [WRN] slow request 120.543296 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:04:34.757570 osd.0 10.27.251.7:6804/3291 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 120.177368 secs > 2016-11-26 20:04:34.757595 osd.0 10.27.251.7:6804/3291 6 : cluster [WRN] slow request 120.177368 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:04:49.766694 mon.0 10.27.251.7:6789/0 38 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set > 2016-11-26 20:05:41.864203 mon.0 10.27.251.7:6789/0 39 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:05:46.887853 mon.0 10.27.251.7:6789/0 40 : cluster [INF] mon.0 at 0 won leader election with quorum 0,2,3 > 2016-11-26 20:05:46.897914 mon.0 10.27.251.7:6789/0 41 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set; 1 mons down, quorum 0,2,3 0,2,3 > 2016-11-26 20:05:46.898803 mon.0 10.27.251.7:6789/0 42 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:05:46.898873 mon.0 10.27.251.7:6789/0 43 : cluster [INF] pgmap v2410586: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:05:46.898930 mon.0 10.27.251.7:6789/0 44 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:05:46.899022 mon.0 10.27.251.7:6789/0 45 : cluster [INF] osdmap e103: 6 osds: 6 
up, 6 in > 2016-11-26 20:06:25.875860 mon.0 10.27.251.7:6789/0 46 : cluster [INF] pgmap v2410587: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:06:26.902246 mon.0 10.27.251.7:6789/0 47 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:06:25.708241 osd.1 10.27.251.7:6800/2793 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.583204 secs > 2016-11-26 20:06:25.708251 osd.1 10.27.251.7:6800/2793 8 : cluster [WRN] slow request 240.583204 seconds old, received at 2016-11-26 20:02:25.124993: osd_op(client.7854102.1:1 rbd_id.vm-102-disk-1 [call rbd.get_id] 3.197c044b RETRY=1 retry+read e103) currently reached_pg > 2016-11-26 20:06:34.798235 osd.0 10.27.251.7:6804/3291 7 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 240.218041 secs > 2016-11-26 20:06:34.798247 osd.0 10.27.251.7:6804/3291 8 : cluster [WRN] slow request 240.218041 seconds old, received at 2016-11-26 20:02:34.580154: osd_op(client.7849892.0:1 vm-108-disk-1.rbd [stat] 1.e494866a RETRY=2 ack+retry+read+known_if_redirected e103) currently reached_pg > 2016-11-26 20:07:20.410986 mon.3 10.27.251.12:6789/0 7 : cluster [INF] mon.3 calling new monitor election > 2016-11-26 20:07:20.414159 mon.2 10.27.251.11:6789/0 10 : cluster [INF] mon.2 calling new monitor election > 2016-11-26 20:07:20.421808 mon.0 10.27.251.7:6789/0 48 : cluster [INF] mon.0 calling new monitor election > 2016-11-26 20:07:20.448582 mon.0 10.27.251.7:6789/0 49 : cluster [INF] mon.0 at 0 won leader election with quorum 0,1,2,3 > 2016-11-26 20:07:20.459304 mon.0 10.27.251.7:6789/0 50 : cluster [INF] HEALTH_WARN; 312 pgs peering; 312 pgs stuck inactive; 312 pgs stuck unclean; 2 requests are blocked > 32 sec; nodown,noout flag(s) set > 2016-11-26 20:07:20.465502 mon.0 10.27.251.7:6789/0 51 : cluster [INF] monmap e4: 4 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0} > 2016-11-26 20:07:20.465571 mon.0 10.27.251.7:6789/0 52 : cluster [INF] pgmap v2410588: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:20.465650 mon.0 10.27.251.7:6789/0 53 : cluster [INF] mdsmap e1: 0/0/0 up > 2016-11-26 20:07:20.465750 mon.0 10.27.251.7:6789/0 54 : cluster [INF] osdmap e103: 6 osds: 6 up, 6 in > 2016-11-26 20:07:20.465934 mon.0 10.27.251.7:6789/0 55 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.10054s > max 0.05s > 2016-11-26 20:07:20.478961 mon.0 10.27.251.7:6789/0 56 : cluster [WRN] message from mon.1 was stamped 0.109909s in the future, clocks not synchronized > 2016-11-26 20:07:20.522400 mon.1 10.27.251.8:6789/0 1 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:07:20.541271 mon.1 10.27.251.8:6789/0 2 : cluster [INF] mon.1 calling new monitor election > 2016-11-26 20:07:32.641565 mon.0 10.27.251.7:6789/0 61 : cluster [INF] osdmap e104: 6 osds: 5 up, 6 in > 2016-11-26 20:07:32.665552 mon.0 10.27.251.7:6789/0 62 : cluster [INF] pgmap v2410589: 768 pgs: 72 stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:33.658567 mon.0 10.27.251.7:6789/0 63 : cluster [INF] osd.5 10.27.251.8:6812/4116 boot > 2016-11-26 20:07:33.676112 mon.0 10.27.251.7:6789/0 64 : cluster [INF] osdmap e105: 6 osds: 6 up, 6 in > 2016-11-26 20:07:33.726565 mon.0 10.27.251.7:6789/0 65 : cluster [INF] pgmap v2410590: 768 pgs: 72 
stale+active+clean, 312 peering, 384 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:34.722585 mon.0 10.27.251.7:6789/0 66 : cluster [INF] osdmap e106: 6 osds: 5 up, 6 in > 2016-11-26 20:07:34.785966 mon.0 10.27.251.7:6789/0 67 : cluster [INF] pgmap v2410591: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:35.737328 mon.0 10.27.251.7:6789/0 68 : cluster [INF] osd.4 10.27.251.8:6804/3430 boot > 2016-11-26 20:07:35.757111 mon.0 10.27.251.7:6789/0 69 : cluster [INF] osdmap e107: 6 osds: 6 up, 6 in > 2016-11-26 20:07:35.794812 mon.0 10.27.251.7:6789/0 70 : cluster [INF] pgmap v2410592: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:36.797846 mon.0 10.27.251.7:6789/0 71 : cluster [INF] osdmap e108: 6 osds: 6 up, 6 in > 2016-11-26 20:07:36.842861 mon.0 10.27.251.7:6789/0 72 : cluster [INF] pgmap v2410593: 768 pgs: 160 stale+active+clean, 312 peering, 296 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:38.854149 mon.0 10.27.251.7:6789/0 73 : cluster [INF] pgmap v2410594: 768 pgs: 88 stale+active+clean, 312 peering, 368 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1992 kB/s rd, 683 kB/s wr, 117 op/s > 2016-11-26 20:07:39.923063 mon.0 10.27.251.7:6789/0 74 : cluster [INF] pgmap v2410595: 768 pgs: 312 peering, 456 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1466 kB/s wr, 257 op/s > 2016-11-26 20:07:41.012515 mon.0 10.27.251.7:6789/0 75 : cluster [INF] osdmap e109: 6 osds: 5 up, 6 in > 2016-11-26 20:07:41.039741 mon.0 10.27.251.7:6789/0 76 : cluster [INF] pgmap v2410596: 768 pgs: 142 stale+active+clean, 312 peering, 314 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1110 kB/s wr, 211 op/s > 2016-11-26 20:07:38.817104 osd.0 10.27.251.7:6804/3291 9 : cluster [INF] 1.b7 scrub starts > 2016-11-26 20:07:41.429461 osd.0 10.27.251.7:6804/3291 10 : cluster [INF] 1.b7 scrub ok > 2016-11-26 20:07:42.043092 mon.0 10.27.251.7:6789/0 77 : cluster [INF] osd.2 10.27.251.8:6800/3073 boot > 2016-11-26 20:07:42.074005 mon.0 10.27.251.7:6789/0 78 : cluster [INF] osdmap e110: 6 osds: 5 up, 6 in > 2016-11-26 20:07:42.150211 mon.0 10.27.251.7:6789/0 79 : cluster [INF] pgmap v2410597: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 940 B/s rd, 1 op/s > 2016-11-26 20:07:43.084122 mon.0 10.27.251.7:6789/0 80 : cluster [INF] osd.3 10.27.251.8:6808/3714 boot > 2016-11-26 20:07:43.104296 mon.0 10.27.251.7:6789/0 81 : cluster [INF] osdmap e111: 6 osds: 6 up, 6 in > 2016-11-26 20:07:35.733073 osd.1 10.27.251.7:6800/2793 9 : cluster [INF] 3.37 scrub starts > 2016-11-26 20:07:35.841829 osd.1 10.27.251.7:6800/2793 10 : cluster [INF] 3.37 scrub ok > 2016-11-26 20:07:36.733564 osd.1 10.27.251.7:6800/2793 11 : cluster [INF] 3.7c scrub starts > 2016-11-26 20:07:36.852120 osd.1 10.27.251.7:6800/2793 12 : cluster [INF] 3.7c scrub ok > 2016-11-26 20:07:41.764388 osd.1 10.27.251.7:6800/2793 13 : cluster [INF] 3.fc scrub starts > 2016-11-26 20:07:41.830597 osd.1 10.27.251.7:6800/2793 14 : cluster [INF] 3.fc scrub ok > 2016-11-26 20:07:42.736376 osd.1 10.27.251.7:6800/2793 15 : cluster [INF] 4.9 scrub starts > 2016-11-26 20:07:43.149808 mon.0 10.27.251.7:6789/0 82 : cluster [INF] pgmap v2410598: 768 pgs: 296 stale+active+clean, 223 peering, 248 
active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 982 B/s rd, 1 op/s > 2016-11-26 20:07:44.135066 mon.0 10.27.251.7:6789/0 83 : cluster [INF] osdmap e112: 6 osds: 6 up, 6 in > 2016-11-26 20:07:44.178743 mon.0 10.27.251.7:6789/0 84 : cluster [INF] pgmap v2410599: 768 pgs: 296 stale+active+clean, 223 peering, 248 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail > 2016-11-26 20:07:46.774607 mon.0 10.27.251.7:6789/0 85 : cluster [INF] pgmap v2410600: 768 pgs: 154 stale+active+clean, 223 peering, 390 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2157 kB/s wr, 466 op/s > 2016-11-26 20:07:47.846499 mon.0 10.27.251.7:6789/0 86 : cluster [INF] pgmap v2410601: 768 pgs: 223 peering, 544 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4603 kB/s wr, 748 op/s > 2016-11-26 20:07:48.919366 mon.0 10.27.251.7:6789/0 87 : cluster [INF] pgmap v2410602: 768 pgs: 99 peering, 667 active+clean, 2 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4235 kB/s wr, 495 op/s > 2016-11-26 20:07:49.986068 mon.0 10.27.251.7:6789/0 88 : cluster [INF] pgmap v2410603: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1607 kB/s rd, 30552 B/s wr, 127 op/s > 2016-11-26 20:07:50.468852 mon.0 10.27.251.7:6789/0 89 : cluster [WRN] mon.1 10.27.251.8:6789/0 clock skew 0.105319s > max 0.05s > 2016-11-26 20:07:43.076810 osd.0 10.27.251.7:6804/3291 11 : cluster [INF] 1.17 scrub starts > 2016-11-26 20:07:45.709439 osd.0 10.27.251.7:6804/3291 12 : cluster [INF] 1.17 scrub ok > 2016-11-26 20:07:52.746601 mon.0 10.27.251.7:6789/0 90 : cluster [INF] pgmap v2410604: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 628 kB/s rd, 25525 B/s wr, 139 op/s > [...] > 2016-11-26 20:08:03.325584 mon.0 10.27.251.7:6789/0 98 : cluster [INF] pgmap v2410612: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 387 kB/s rd, 61530 B/s wr, 90 op/s > 2016-11-26 20:08:03.523958 osd.1 10.27.251.7:6800/2793 16 : cluster [INF] 4.9 scrub ok > 2016-11-26 20:08:04.398784 mon.0 10.27.251.7:6789/0 99 : cluster [INF] pgmap v2410613: 768 pgs: 767 active+clean, 1 active+clean+scrubbing; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 2975 kB/s rd, 401 kB/s wr, 419 op/s > [...] > 2016-11-26 20:08:20.340826 mon.0 10.27.251.7:6789/0 112 : cluster [INF] pgmap v2410626: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 384 kB/s rd, 95507 B/s wr, 31 op/s > 2016-11-26 20:08:20.458392 mon.0 10.27.251.7:6789/0 113 : cluster [INF] HEALTH_WARN; clock skew detected on mon.1; nodown,noout flag(s) set; Monitor clock skew detected > 2016-11-26 20:08:22.429360 mon.0 10.27.251.7:6789/0 114 : cluster [INF] pgmap v2410627: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 256 kB/s rd, 65682 B/s wr, 18 op/s > [...] 
> 2016-11-26 20:09:19.885573 mon.0 10.27.251.7:6789/0 160 : cluster [INF] pgmap v2410671: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 33496 kB/s rd, 3219 kB/s wr, 317 op/s > 2016-11-26 20:09:20.458837 mon.0 10.27.251.7:6789/0 161 : cluster [INF] HEALTH_WARN; nodown,noout flag(s) set > 2016-11-26 20:09:20.921396 mon.0 10.27.251.7:6789/0 162 : cluster [INF] pgmap v2410672: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 10498 kB/s rd, 970 kB/s wr, 46 op/s > [...] > 2016-11-26 20:09:40.156783 mon.0 10.27.251.7:6789/0 178 : cluster [INF] pgmap v2410688: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 16202 kB/s rd, 586 kB/s wr, 64 op/s > 2016-11-26 20:09:41.231992 mon.0 10.27.251.7:6789/0 181 : cluster [INF] osdmap e113: 6 osds: 6 up, 6 in > 2016-11-26 20:09:41.260099 mon.0 10.27.251.7:6789/0 182 : cluster [INF] pgmap v2410689: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 13734 kB/s rd, 561 kB/s wr, 58 op/s > [...] > 2016-11-26 20:09:46.764432 mon.0 10.27.251.7:6789/0 187 : cluster [INF] pgmap v2410693: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 4388 kB/s rd, 97979 B/s wr, 18 op/s > 2016-11-26 20:09:46.764614 mon.0 10.27.251.7:6789/0 189 : cluster [INF] osdmap e114: 6 osds: 6 up, 6 in > 2016-11-26 20:09:46.793173 mon.0 10.27.251.7:6789/0 190 : cluster [INF] pgmap v2410694: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 1709 kB/s rd, 75202 B/s wr, 4 op/s > [...] > 2016-11-26 20:10:19.919396 mon.0 10.27.251.7:6789/0 216 : cluster [INF] pgmap v2410719: 768 pgs: 768 active+clean; 936 GB data, 1858 GB used, 7452 GB / 9310 GB avail; 404 kB/s wr, 4 op/s > 2016-11-26 20:10:20.459279 mon.0 10.27.251.7:6789/0 217 : cluster [INF] HEALTH_OK > > > Other things to note. In syslog (not ceph log) of mon.0 I've found for > the first (failed) boot: > > Nov 26 18:05:43 capitanamerica ceph[1714]: === mon.0 === > Nov 26 18:05:43 capitanamerica ceph[1714]: Starting Ceph mon.0 on capitanamerica... > Nov 26 18:05:43 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 18:05:43 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 18:05:43 capitanamerica ceph[1714]: Running as unit ceph-mon.0.1480179943.905192147.service. > Nov 26 18:05:43 capitanamerica ceph[1714]: Starting ceph-create-keys on capitanamerica...
> Nov 26 18:05:44 capitanamerica ceph[1714]: === osd.1 === > Nov 26 18:05:44 capitanamerica ceph[1714]: 2016-11-26 18:05:44.939844 7f7f2478c700 0 -- :/2046852810 >> 10.27.251.7:6789/0 pipe(0x7f7f20061550 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f2005a990).fault > Nov 26 18:05:46 capitanamerica bash[1874]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6 > Nov 26 18:05:52 capitanamerica ceph[1714]: 2016-11-26 18:05:52.234086 7f7f2478c700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400b0c0).fault > Nov 26 18:05:58 capitanamerica ceph[1714]: 2016-11-26 18:05:58.234163 7f7f2458a700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.12:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d240).fault > Nov 26 18:06:04 capitanamerica ceph[1714]: 2016-11-26 18:06:04.234037 7f7f2468b700 0 -- 10.27.251.7:0/2046852810 >> 10.27.251.11:6789/0 pipe(0x7f7f14006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7f1400d310).fault > Nov 26 18:06:14 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 1.82 host=capitanamerica root=default' > Nov 26 18:06:14 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.1']' returned non-zero exit status 1 > Nov 26 18:06:15 capitanamerica ceph[1714]: === osd.0 === > Nov 26 18:06:22 capitanamerica ceph[1714]: 2016-11-26 18:06:22.238039 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000b0c0).fault > Nov 26 18:06:28 capitanamerica ceph[1714]: 2016-11-26 18:06:28.241918 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d240).fault > Nov 26 18:06:34 capitanamerica ceph[1714]: 2016-11-26 18:06:34.242060 7f8bb45b1700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.12:6789/0 pipe(0x7f8ba0006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000d310).fault > Nov 26 18:06:38 capitanamerica ceph[1714]: 2016-11-26 18:06:38.242035 7f8bb44b0700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000de50).fault > Nov 26 18:06:44 capitanamerica ceph[1714]: 2016-11-26 18:06:44.242157 7f8bb46b2700 0 -- 10.27.251.7:0/3291965862 >> 10.27.251.11:6789/0 pipe(0x7f8ba0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f8ba000e0d0).fault > Nov 26 18:06:45 capitanamerica ceph[1714]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 1.82 host=capitanamerica root=default' > Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.0']' returned non-zero exit status 1 > Nov 26 18:06:45 capitanamerica ceph[1714]: ceph-disk: Error: One or more partitions failed to activate > > And for the second (working): > > Nov 26 20:01:49 capitanamerica ceph[1716]: === mon.0 === > Nov 26 20:01:49 capitanamerica ceph[1716]: Starting Ceph mon.0 on capitanamerica... > Nov 26 20:01:49 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... 
> Nov 26 20:01:49 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-mon -i 0 --pid-file /var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:49 capitanamerica ceph[1716]: Running as unit ceph-mon.0.1480186909.457328760.service. > Nov 26 20:01:49 capitanamerica ceph[1716]: Starting ceph-create-keys on capitanamerica... > Nov 26 20:01:49 capitanamerica bash[1900]: starting mon.0 rank 0 at 10.27.251.7:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 8794c124-c2ec-4e81-8631-742992159bd6 > Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.1 === > Nov 26 20:01:50 capitanamerica ceph[1716]: create-or-move updated item name 'osd.1' weight 1.82 at location {host=capitanamerica,root=default} to crush map > Nov 26 20:01:50 capitanamerica ceph[1716]: Starting Ceph osd.1 on capitanamerica... > Nov 26 20:01:50 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 20:01:50 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:50 capitanamerica ceph[1716]: Running as unit ceph-osd.1.1480186910.254183695.service. > Nov 26 20:01:50 capitanamerica bash[2765]: starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal > Nov 26 20:01:50 capitanamerica ceph[1716]: === osd.0 === > Nov 26 20:01:51 capitanamerica ceph[1716]: create-or-move updated item name 'osd.0' weight 1.82 at location {host=capitanamerica,root=default} to crush map > Nov 26 20:01:51 capitanamerica ceph[1716]: Starting Ceph osd.0 on capitanamerica... > Nov 26 20:01:51 capitanamerica systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f... > Nov 26 20:01:51 capitanamerica systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f. > Nov 26 20:01:51 capitanamerica ceph[1716]: Running as unit ceph-osd.0.1480186910.957564523.service. > Nov 26 20:01:51 capitanamerica bash[3281]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal > > > So it seems to me that at the first start (some) OSDs fail to start. But, > again, PVE and 'ceph status' report all OSDs as up&in. What does the following command give you? ceph osd pool get min_size > > > Thanks. > As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to happen. And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again and ceph needs its time, till all services are running. -- Cheers, Alwin From gaio at sv.lnf.it Tue Nov 29 15:05:14 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Tue, 29 Nov 2016 15:05:14 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> Message-ID: <20161129140514.GR3355@sv.lnf.it> Hello! Alwin Antreich, in that message you wrote... > What does the following command give you?
> ceph osd pool get min_size root at capitanamerica:~# ceph osd pool get DATA min_size min_size: 1 root at capitanamerica:~# ceph osd pool get VM min_size min_size: 1 root at capitanamerica:~# ceph osd pool get LXC min_size min_size: 1 > As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to > happen. Ahem, not so unlikely... we have UPSes but not diesel generators... ;-( > And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again > and ceph needs its time, till all services are running. This is not the case. I've started the nodes one by one... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From sysadmin-pve at cognitec.com Tue Nov 29 17:26:25 2016 From: sysadmin-pve at cognitec.com (Alwin Antreich) Date: Tue, 29 Nov 2016 17:26:25 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC? ;-) In-Reply-To: <20161129140514.GR3355@sv.lnf.it> References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> <20161129140514.GR3355@sv.lnf.it> Message-ID: Hi Marco, On 11/29/2016 03:05 PM, Marco Gaiarin wrote: > Hello! Alwin Antreich, > in that message you wrote... > >> What does the following command give you? >> ceph osd pool get min_size > > root at capitanamerica:~# ceph osd pool get DATA min_size > min_size: 1 > root at capitanamerica:~# ceph osd pool get VM min_size > min_size: 1 > root at capitanamerica:~# ceph osd pool get LXC min_size > min_size: 1 The min_size 1 means in a degraded state, ceph serves the data as long as one copy is available. > > >> As a general thought, an HA cluster would always be running, so the event that you shut down all nodes is unlikely to >> happen. > > Ahem, not so unlikely... we have UPSes but not diesel generators... ;-( If they shut down cleanly, then it shouldn't be a problem, as far as I have tested it myself. > > >> And if you decide to shut down all nodes, then a couple of minutes should be ok to get everything running again > and ceph needs its time, till all services are running. > > This is not the case. I've started the nodes one by one... > I don't see this behavior on our test cluster, when we shut down all hosts and start them up at a later time. -- Cheers, Alwin From f.rust at sec.tu-bs.de Wed Nov 30 09:06:30 2016 From: f.rust at sec.tu-bs.de (F.Rust) Date: Wed, 30 Nov 2016 09:06:30 +0100 Subject: [PVE-User] Webfrontent View Message-ID: Hi all, I've been using Proxmox 4.2 for a while now and am quite satisfied. But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future? Thanks for any help, Frank From gaio at sv.lnf.it Wed Nov 30 09:36:47 2016 From: gaio at sv.lnf.it (Marco Gaiarin) Date: Wed, 30 Nov 2016 09:36:47 +0100 Subject: [PVE-User] Ceph: PANIC or DON'T PANIC?
;-) In-Reply-To: References: <20161128120511.GJ3348@sv.lnf.it> <88ea04bd-9167-b11c-acfe-2c918ca41f54@cognitec.com> <20161128143141.GQ3348@sv.lnf.it> <331dbded-d31f-e36a-517c-2c9b9dd6a3b5@cognitec.com> <20161129111744.GL3355@sv.lnf.it> <691959e9-1f92-e2f5-73bd-26a3cf709b11@cognitec.com> <20161129140514.GR3355@sv.lnf.it> Message-ID: <20161130083647.GC3213@sv.lnf.it> Hello! Alwin Antreich, in that message you wrote... > The min_size 1 means in a degraded state, ceph serves the data as long as one copy is available. Yes, I know. > If they shut down cleanly, then it shouldn't be a problem, as far as I have tested it myself. [...] > I don't see this behavior on our test cluster, when we shut down all hosts and start them up at a later time. Boh. It's strange... -- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) From t.lamprecht at proxmox.com Wed Nov 30 10:22:01 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 30 Nov 2016 10:22:01 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: References: Message-ID: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> Hi, On 11/30/2016 09:06 AM, F.Rust wrote: > Hi all, > > I've been using Proxmox 4.2 for a while now and am quite satisfied. > But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future?
> > You can resize the tree pane again and the bottom log window can be toggled, > with the small bar with the triangle on it at the bottom, see: > https://www.pictshare.net/316ece0142.png > > If that does not help can you please upload a screenshot and send > the link as a reply so we can see what's going on :) > > cheers, > Thomas > >> >> Thanks for any help, >> Frank >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > Frank Rust ------------------------------------------------------------------------ Frank Rust Technische Universität Braunschweig Fon: 0531 39155122 Institut für Systemsicherheit Fax: 0531 39155130 Rebenring 56 Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig From t.lamprecht at proxmox.com Wed Nov 30 10:40:46 2016 From: t.lamprecht at proxmox.com (Thomas Lamprecht) Date: Wed, 30 Nov 2016 10:40:46 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: <5E754049-E685-4E4E-8960-4A0A9F2AAC01@sec.tu-bs.de> References: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> <5E754049-E685-4E4E-8960-4A0A9F2AAC01@sec.tu-bs.de> Message-ID: On 11/30/2016 10:34 AM, F.Rust wrote: > Here the requested screenshot. > > https://www.pictshare.net/2477973227.png > > You can see there are no resize handles... The one for the Log Panel below is there but yes, the left tree seems to be missing... Did you try a force reload, which should empty the cache: CTRL + SHIFT + R Else I could imagine that an add-on is interfering here, maybe you accidentally did a "right click + Block Element" on the tree panel so that your ad blocker blocks it, just shooting in the dark here :) cheers, Thomas > >> On 30.11.2016 at 10:22, Thomas Lamprecht wrote: >> >> Hi, >> >> On 11/30/2016 09:06 AM, F.Rust wrote: >>> Hi all, >>> >>> I've been using Proxmox 4.2 for a while now and am quite satisfied. >>> But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future?
>> You can resize the tree pane again and the bottom log window can be toggled, >> with the small bar with the triangle on it at the bottom, see: >> https://www.pictshare.net/316ece0142.png >> >> If that does not help can you please upload a screenshot and send >> the link as a reply so we can see what's going on :) >> >> cheers, >> Thomas >> >>> Thanks for any help, >>> Frank >>> >>> >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> Frank Rust ------------------------------------------------------------------------ Frank Rust Technische Universität Braunschweig Fon: 0531 39155122 Institut für Systemsicherheit Fax: 0531 39155130 Rebenring 56 Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig From regis.houssin at inodbox.com Wed Nov 30 10:42:45 2016 From: regis.houssin at inodbox.com (=?UTF-8?Q?R=c3=a9gis_Houssin?=) Date: Wed, 30 Nov 2016 10:42:45 +0100 Subject: [PVE-User] New VM created after 4.3 upgrade not start ! Message-ID: Hi, after upgrade proxmox with the latest 4.3, I have an error message when starting a new VM : (the VMs created before the update work fine)
> kvm: -drive file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-502-disk-1/0': No such file or directory
> TASK ERROR: start failed: command '/usr/bin/kvm -id 502 -chardev 'socket,id=qmp,path=/var/run/qemu-server/502.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/502.pid -daemonize -smbios 'type=1,uuid=7eab7942-fcaf-48b6-94ac-bad24087e609' -name srv1.happylibre.fr -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/502.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k fr -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:3681fcbb6821' -drive 'file=/var/lib/vz/template/iso/debian-8.4.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-502-disk-1/0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap502i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5A:73:0D:7E:E9:C5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1
the volume and resource "vm-502-disk-1" is ok, but it does not appear with "drbdsetup show", and it does not appear with "drbd-overview" !! what is the problem please ?
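A few checks that may help narrow down why the by-res device is missing even though the volume was created. This is only a sketch and assumes the storage is managed by drbdmanage, as the stock PVE 4.x DRBD9 plugin does; the subcommand names are from drbdmanage 0.9x and may differ on other versions:

    drbdmanage list-nodes           # are all nodes online, with no pending actions?
    drbdmanage list-resources       # does vm-502-disk-1 exist as a resource at all?
    drbdmanage list-assignments     # is the resource assigned/deployed to this node?
    drbdadm status vm-502-disk-1    # runtime state of the resource on this node
    drbdadm adjust vm-502-disk-1    # re-apply the config if the device was never brought up

If the resource is known to drbdmanage but not assigned to the node the VM starts on, a missing /dev/drbd/by-res path would be the expected symptom.
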
thanks Best regards, -- Régis Houssin --------------------------------------------------------- iNodbox (Cap-Networks) 5, rue Corneille 01000 BOURG EN BRESSE FRANCE VoIP: +33 1 83 62 40 03 GSM: +33 6 33 02 07 97 Email: regis.houssin at inodbox.com Web: https://www.inodbox.com/ Development: https://git.framasoft.org/u/inodbox/ Translation: https://www.transifex.com/inodbox/ --------------------------------------------------------- From f.rust at sec.tu-bs.de Wed Nov 30 10:58:26 2016 From: f.rust at sec.tu-bs.de (F.Rust) Date: Wed, 30 Nov 2016 10:58:26 +0100 Subject: [PVE-User] Webfrontent View In-Reply-To: <30893B7D-8A98-48B5-AE61-B7131D5C52D6@tu-bs.de> References: <39912350-41cc-74ec-e0a7-5465b79d4f61@proxmox.com> <30893B7D-8A98-48B5-AE61-B7131D5C52D6@tu-bs.de> Message-ID: You are right! Thanks a lot. BOTH were there, but invisible (at least for me). Under regular circumstances it is impossible to move these drawers to those extreme positions. I have no idea how it could happen (and stay persistent during different user sessions). Best regards, Frank > On 30.11.2016 at 10:33, F. Rust wrote: > > Here the requested screenshot. > > https://www.pictshare.net/2477973227.png > > You can see there are no resize handles... > > >> On 30.11.2016 at 10:22, Thomas Lamprecht wrote: >> >> Hi, >> >> On 11/30/2016 09:06 AM, F.Rust wrote: >>> Hi all, >>> >>> I've been using Proxmox 4.2 for a while now and am quite satisfied. >>> But now I somehow managed to accidentally switch the web frontend to only show the main content, but not the left tree pane and not the bottom "Tasks/Cluster Log" pane. How can I get back the regular view? How can I prevent this in future? >> >> You can resize the tree pane again and the bottom log window can be toggled, >> with the small bar with the triangle on it at the bottom, see: >> https://www.pictshare.net/316ece0142.png >> >> If that does not help can you please upload a screenshot and send >> the link as a reply so we can see what's going on :) >> >> cheers, >> Thomas >> >>> >>> Thanks for any help, >>> Frank >>> >>> >>> _______________________________________________ >>> pve-user mailing list >>> pve-user at pve.proxmox.com >>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> >> >> _______________________________________________ >> pve-user mailing list >> pve-user at pve.proxmox.com >> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >> > Frank Rust > > ------------------------------------------------------------------------ > Frank Rust Technische Universität Braunschweig > > Fon: 0531 39155122 Institut für Systemsicherheit > Fax: 0531 39155130 Rebenring 56 > Mail: f.rust at tu-braunschweig.de D-38106 Braunschweig > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user From mark at openvs.co.uk Wed Nov 30 19:10:55 2016 From: mark at openvs.co.uk (Mark Adams) Date: Wed, 30 Nov 2016 18:10:55 +0000 Subject: [PVE-User] ZFS on iSCSI + Pacemaker/corosync/DRBD In-Reply-To: <20161123214006.26e4e9a9@sleipner.datanom.net> References: <20161123214006.26e4e9a9@sleipner.datanom.net> Message-ID: Hi, Thanks for the response. I was planning on using active/backup bonding on 10GbE for my network fault tolerance, so no multipath support shouldn't be an issue.
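For the active/backup bond mentioned above, a minimal ifupdown sketch for a dedicated storage link (interface names and the address are placeholders, not taken from this thread):

    auto bond0
    iface bond0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode active-backup
        bond-miimon 100
        bond-primary eth2
        # storage/iSCSI traffic only; keep corosync on its own links
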
I've come across some strange behaviour with the IET provider though, in that after 9 LUNs it starts changing the existing LUNs rather than adding additional ones. Hard disk config in proxmox for the VM:
Hard Disk (virtio0) ZFSOVERISCSI:vm-112-disk-1,size=10G
Hard Disk (virtio1) ZFSOVERISCSI:vm-112-disk-2,size=10G
Hard Disk (virtio2) ZFSOVERISCSI:vm-112-disk-3,size=10G
Hard Disk (virtio3) ZFSOVERISCSI:vm-112-disk-4,size=10G
Hard Disk (virtio4) ZFSOVERISCSI:vm-112-disk-5,size=10G
Hard Disk (virtio5) ZFSOVERISCSI:vm-112-disk-6,size=10G
Hard Disk (virtio6) ZFSOVERISCSI:vm-112-disk-7,size=10G
Hard Disk (virtio7) ZFSOVERISCSI:vm-112-disk-8,size=10G
Hard Disk (virtio8) ZFSOVERISCSI:vm-112-disk-9,size=10G
Hard Disk (virtio9) ZFSOVERISCSI:vm-112-disk-10,size=10G
ietd.conf file on the zfs/iscsi storage host:
Lun 0 Path=/dev/VMSTORE/vm-112-disk-1,Type=blockio
Lun 1 Path=/dev/VMSTORE/vm-112-disk-2,Type=blockio
Lun 2 Path=/dev/VMSTORE/vm-112-disk-3,Type=blockio
Lun 3 Path=/dev/VMSTORE/vm-112-disk-4,Type=blockio
Lun 4 Path=/dev/VMSTORE/vm-112-disk-6,Type=blockio
Lun 5 Path=/dev/VMSTORE/vm-112-disk-7,Type=blockio
Lun 6 Path=/dev/VMSTORE/vm-112-disk-8,Type=blockio
Lun 7 Path=/dev/VMSTORE/vm-112-disk-9,Type=blockio
Lun 8 Path=/dev/VMSTORE/vm-112-disk-10,Type=blockio
As you can see, "disk-5" is missing since I added "disk-10". Is anyone using ZFS over iSCSI with IET? Have you seen this behaviour? Thanks, Mark On 23 November 2016 at 20:40, Michael Rasmussen wrote: > On Wed, 23 Nov 2016 09:40:55 +0000 > Mark Adams wrote: > > > > > Has anyone else tried to get this or a similar setup working? Any views > greatly received. > > What you are trying to achieve is not a good idea with > corosync/pacemaker since iSCSI is a block device. To create a cluster > over a LUN will require a cluster-aware filesystem like NFS, CIFS etc. > The proper way of doing this with iSCSI would be using multipath to a > SAN since iSCSI LUNs cannot be shared. Unfortunately the current > implementation of ZFS over iSCSI does not support multipath (a > limitation in libiscsi). Also may I remind you that IET development has > stopped in favor of LIO targets (http://linux-iscsi.org/wiki/LIO). I am > currently working on making an implementation of LIO for proxmox which > will use a different architecture than the current ZFS over iSCSI > implementation. The new implementation will support multipath. As this > is developed in my spare time, progress is not as high as it could be. > > Alternatively you could look at this: > http://www.napp-it.org/doc/downloads/z-raid.pdf > > -- > Hilsen/Regards > Michael Rasmussen > > Get my public GnuPG keys: > michael rasmussen cc > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E > mir datanom net > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C > mir miras org > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 > -------------------------------------------------------------- > /usr/games/fortune -es says: > The computer should be doing the hard work. That's what it's paid to > do, after all. > -- Larry Wall in <199709012312.QAA08121 at wall.org> > > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user >
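A rough way to see what the running target actually exports, as opposed to what the plugin wrote into the config, is to compare the kernel-side view with ietd.conf and the zvols. This is only a sketch, assuming the stock iscsitarget (IET) proc interface and the pool/volume names from the listing above; the config path may differ between distributions:

    # LUNs the kernel target currently exports
    cat /proc/net/iet/volume

    # LUN -> backing device mapping as written to the config
    grep -E '^[[:space:]]*Lun ' /etc/iet/ietd.conf

    # zvols that actually exist for the VM
    ls -l /dev/VMSTORE/ | grep vm-112

    # flag any vm-112 zvol that no Lun line references any more
    for d in /dev/VMSTORE/vm-112-disk-*; do
        grep -q "Path=$d," /etc/iet/ietd.conf || echo "not exported: $d"
    done
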