From gaio at sv.lnf.it Thu Jun 4 09:22:26 2020
From: gaio at sv.lnf.it (Marco Gaiarin)
Date: Thu, 4 Jun 2020 09:22:26 +0200
Subject: [PVE-User] PVE 6, wireless and regulatory database...
In-Reply-To: <20200527070105.GC477557@dona.proxmox.com>
References: <20200518084316.GC3626@lilliput.linux.it>
 <20200519074011.GB4975@lilliput.linux.it>
 <20200519090409.GA406757@dona.proxmox.com>
 <20200520205816.GA23561@lilliput.linux.it>
 <20200522100421.GA1577721@dona.proxmox.com>
 <20200526104430.GK3717@lilliput.linux.it>
 <20200526113514.GB477557@dona.proxmox.com>
 <20200526153146.GL3717@lilliput.linux.it>
 <20200527070105.GC477557@dona.proxmox.com>
Message-ID: <20200604072226.GA3816@lilliput.linux.it>

Hi! Alwin Antreich, on that day, wrote...

> > I've installed the buster package...
> You will need the package from the backports.

Sorry for the late answer, but even at home I've needed to define with
my stakeholder a maintenance window for a cluster reboot. ;-)

I confirm it works as expected.

 Jun 3 23:51:57 ino kernel: [ 7.866523] cfg80211: Loading compiled-in X.509 certificates for regulatory database
 Jun 3 23:51:57 ino kernel: [ 7.878636] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
 Jun 3 23:51:57 ino kernel: [ 8.070696] ath: EEPROM regdomain: 0x809c
 Jun 3 23:51:57 ino kernel: [ 8.070698] ath: EEPROM indicates we should expect a country code
 Jun 3 23:51:57 ino kernel: [ 8.070698] ath: doing EEPROM country->regdmn map search
 Jun 3 23:51:57 ino kernel: [ 8.070699] ath: country maps to regdmn code: 0x52
 Jun 3 23:51:57 ino kernel: [ 8.070700] ath: Country alpha2 being used: CN
 Jun 3 23:51:57 ino kernel: [ 8.070700] ath: Regpair used: 0x52
 Jun 3 23:51:57 ino kernel: [ 8.072132] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
 Jun 3 23:51:57 ino kernel: [ 8.072647] ieee80211 phy0: Atheros AR9287 Rev:2 mem=0xffff9d084dbf0000, irq=16
 Jun 3 23:51:57 ino kernel: [ 8.080354] ath9k 0000:10:00.0 wls1: renamed from wlan0

Thanks.

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797

Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)

From lindsay.mathieson at gmail.com Thu Jun 4 09:42:58 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 4 Jun 2020 17:42:58 +1000
Subject: [PVE-User] PVE 6, wireless and regulatory database...
In-Reply-To: <20200604072226.GA3816@lilliput.linux.it>
References: <20200518084316.GC3626@lilliput.linux.it>
 <20200519074011.GB4975@lilliput.linux.it>
 <20200519090409.GA406757@dona.proxmox.com>
 <20200520205816.GA23561@lilliput.linux.it>
 <20200522100421.GA1577721@dona.proxmox.com>
 <20200526104430.GK3717@lilliput.linux.it>
 <20200526113514.GB477557@dona.proxmox.com>
 <20200526153146.GL3717@lilliput.linux.it>
 <20200527070105.GC477557@dona.proxmox.com>
 <20200604072226.GA3816@lilliput.linux.it>
Message-ID: <18820b24-12bd-aa8a-6016-37c36a276f7a@gmail.com>

On 4/06/2020 5:22 pm, Marco Gaiarin wrote:
> but even at home i've needed to define with
> my stakeholder

SO? :)

-- 
Lindsay

From sivakumar.saravanan.jv.ext at valeo-siemens.com Thu Jun 4 14:52:43 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 14:52:43 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
Message-ID:

Hello,

We have one Proxmox Datacenter and on top of that we have around 15
standalone nodes and cluster defined.

The Datacenter itself is showing "communication error" frequently. All
standalone nodes are unavailable to perform any activities within
the Proxmox Datacenter.

Appreciating your support.

Best regards
Sivakumar

-- 
*This e-mail message is intended for the internal use of the intended recipient(s) only.
The information contained herein is confidential/privileged. Its disclosure or reproduction is strictly prohibited. If you are not the intended recipient, please inform the sender immediately, do not disclose it internally or to third parties and destroy it. In the course of our business relationship and for business purposes only, Valeo may need to process some of your personal data. For more information, please refer to the Valeo Data Protection Statement and Privacy notice available on Valeo.com *

From elacunza at binovo.es Thu Jun 4 14:59:37 2020
From: elacunza at binovo.es (Eneko Lacunza)
Date: Thu, 4 Jun 2020 14:59:37 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To:
References:
Message-ID: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>

Hi,

On 4/6/20 at 14:52, Sivakumar SARAVANAN wrote:
> Hello,
>
> We have one Proxmox Datacenter and on top of that we have around 15
> standalone nodes and cluster defined.
>
> The Datacenter itself is showing "communication error" frequently. All
> standalone nodes are unavailable to perform any activities within
> the Proxmox Datacenter.
>
> Appreciating your support.
>
This is usually a network problem. What version of Proxmox are you running (pveversion -v)?

Cheers
Eneko

-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

From sivakumar.saravanan.jv.ext at valeo-siemens.com Thu Jun 4 15:07:00 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 15:07:00 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
References: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
Message-ID:

Hello

We are using pve-manager/6.1-3/37248ce6.
There is no network issue; we are able to access all the hosts from a
PuTTY session, but not from the Datacenter.

Best regards,

Sivakumar SARAVANAN

From sivakumar.saravanan.jv.ext at valeo-siemens.com Thu Jun 4 15:33:42 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 15:33:42 +0200
Subject: [PVE-User] Concern About removing the host from datacenter
Message-ID:

Hello,

Is there any problem if I remove the standalone host from the Proxmox
Datacenter and add the same host back to the cluster without changing the
IP and hostname?

It is a standalone host and no cluster is defined.

Best regards
Sivakumar SARAVANAN

From harrim4n at harrim4n.com Thu Jun 4 15:41:00 2020
From: harrim4n at harrim4n.com (harrim4n)
Date: Thu, 4 Jun 2020 15:41:00 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To:
References: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
Message-ID:

Hi,

I don't understand your host layout. Are you running a cluster as
described in [1] or not? Does your environment match the requirements in
the wiki?
What do you mean by "15 standalone nodes and cluster defined"? Are they
running in a cluster or not?

Also, are the hosts able to reach each other? Just because you can
access them from your host doesn't mean that they can talk to each other.

Regards,
harrim4n

[1] https://pve.proxmox.com/wiki/Cluster_Manager

From sivakumar.saravanan.jv.ext at valeo-siemens.com Mon Jun 8 10:14:54 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Mon, 8 Jun 2020 10:14:54 +0200
Subject: [PVE-User] VM Power Issue
Message-ID:

Hello,

I am not able to start the VM after adding the PCI device to the VM.
I can see the error message below.

TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name HIL-System096Planned -chardev 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi' -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object 'memory-backend-ram,id=ram-node1,size=16384M' -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -drive 
'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' -netdev 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout

Appreciating your suggestion.

Best regards,

SK

From leesteken at protonmail.ch Mon Jun 8 16:30:57 2020
From: leesteken at protonmail.ch (Arjen)
Date: Mon, 08 Jun 2020 14:30:57 +0000
Subject: [PVE-User] VM Power Issue
In-Reply-To:
References:
Message-ID:

On Monday, June 8, 2020 10:14 AM, Sivakumar SARAVANAN wrote:

> Hello,
>
> I am not able to start the VM after adding the PCI device to VM.
> I can see the below error message.

Maybe your system is very busy? Maybe it takes a while to allocate the memory?
Maybe you could give more information about the VM configuration and your PVE setup?

Can you try running the command below from the command line of your PVE host, to see if it works and how long it takes?
Sometimes (often memory-size related), it just works but takes longer than the time-out.

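One way to put a number on "how long it takes" is sketched below. This is only a minimal shell sketch: the helper name `time_cmd` and the 30-second threshold in the usage comment are assumptions made for illustration, not values taken from Proxmox or from this thread.

```shell
# Hypothetical helper (not part of Proxmox): run a command, report how long
# it took in whole seconds, and succeed only if the command itself succeeded
# and finished within the given threshold.
time_cmd() {
    threshold="$1"; shift
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    elapsed=$((end - start))
    echo "exit=$rc elapsed=${elapsed}s threshold=${threshold}s"
    [ "$rc" -eq 0 ] && [ "$elapsed" -le "$threshold" ]
}

# Example use on the PVE host (VMID 175 is the VM from the error message;
# the threshold of 30 seconds is an assumed value):
#   time_cmd 30 sh -c 'qm showcmd 175 | bash'
```

If the manual start succeeds but the reported elapsed time is large, that would support the idea that the start is merely slow rather than failing outright.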
> TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name > HIL-System096Planned -chardev > 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon > 'chardev=qmp,mode=control' -chardev > 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon > 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid > -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp > '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot > 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' > -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu > 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi' > -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa > 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object > 'memory-backend-ram,id=ram-node1,size=16384M' -numa > 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device > 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device > 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device > 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device > 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device > 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device > 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device > 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev > 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device > 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device > 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi > 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive > 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device > 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive > 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' > -device > 
'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
> -drive
> 'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
> -netdev
> 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
> -device
> 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
> -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
> 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
>
> Appreciating your suggestion.
>
> Best regards,
>
> SK
>
> ----------
>
> > This e-mail message is intended for the internal use of the intended
> > recipient(s) only.
> > The information contained herein is
> > confidential/privileged. Its disclosure or reproduction is strictly
> > prohibited.
> > If you are not the intended recipient, please inform the sender
> > immediately, do not disclose it internally or to third parties and destroy
> > it.
> > In the course of our business relationship and for business purposes
> > only, Valeo may need to process some of your personal data.
> > For more
> > information, please refer to the Valeo Data Protection Statement and
> > Privacy notice available on Valeo.com
> > https://www.valeo.com/en/ethics-and-compliance/#principes

Am I the intended recipient? Otherwise, consider yourself informed immediately. I apologize for disclosing this information on the same mailing-list you sent the original e-mail. Valeo is not allowed to process my personal data, according to the General Data Protection Regulation (GDPR), without prior written consent. Please consider removing such statements when sending email to a (public) mailing list, as it makes it difficult to help you without violating your rules.

From mark at openvs.co.uk Mon Jun 8 16:38:48 2020
From: mark at openvs.co.uk (Mark Adams)
Date: Mon, 8 Jun 2020 15:38:48 +0100
Subject: [PVE-User] VM Power Issue
In-Reply-To:
References:
Message-ID:

Sivakumar - This is a "known issue" as far as I am aware, usually when you
are allocating quite a bit of memory (although 16G is not a lot in your
case, but maybe the server doesn't have much ram?) when starting a vm with
a PCI device passed through to it. It also only seems to happen when you
are nearing "peak" ram usage, so getting close to running out. It never
happens on a fresh boot.

I don't know if it has been acknowledged or even reported to redhat, or
whether simply the timeout should be longer in proxmox.

I wrote to this list about it not long ago and never received a response,
and I have seen at least 1 forum post about it.

Anyway to cut a long story short, just start it manually on the cli, which
has no timeout. "qm showcmd VMID | bash" should start it fine.
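
The workaround in this message can be wrapped so that the generated command is shown before it is executed. The sketch below is illustrative only: the function name `start_vm_manually` is hypothetical, and it assumes `qm showcmd` is run as root on the PVE host that owns the VM.

```shell
# Hypothetical wrapper around the "qm showcmd VMID | bash" workaround.
start_vm_manually() {
    vmid="$1"
    # "qm showcmd" prints the kvm command line for the VM without running it.
    cmd=$(qm showcmd "$vmid") || return 1
    # Show the command for review, then hand it to bash, which applies no
    # start timeout of its own.
    printf '%s\n' "$cmd"
    printf '%s\n' "$cmd" | bash
}

# Example use (101 is a placeholder VMID):
#   start_vm_manually 101
```

Printing the command first makes it easier to spot an unexpected device or memory setting before the VM is actually launched.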
For example: "qm showcmd 101 | bash"

Regards,
Mark

From sivakumar.saravanan.jv.ext at valeo-siemens.com Mon Jun 8 17:15:52 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Mon, 8 Jun 2020 17:15:52 +0200
Subject: [PVE-User] VM Power Issue
In-Reply-To:
References:
Message-ID:

Hello Mark,

Thanks for your support.

It is working fine now.

Best regards
SK

On Mon, Jun 8, 2020 at 4:40 PM Mark Adams via pve-user <
pve-user at pve.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Mark Adams
> To: PVE User List
> Cc:
> Bcc:
> Date: Mon, 8 Jun 2020 15:38:48 +0100
> Subject: Re: [PVE-User] VM Power Issue
> Sivakumar - This is a "known issue" as far as I am aware, usually when you
> are allocating quite a bit of memory (although 16G is not a lot in your
> case, but maybe the server doesn't have much ram?) when starting a vm with
> a PCI device passed through to it. It also only seems to happen when you
> are nearing "peak" ram usage, so getting close to running out. It never
> happens on a fresh boot.
>
> I don't know if it has been acknowledged or even reported to redhat, or
> whether simply the timeout should be longer in proxmox.
>
> I wrote to this list about it not long ago and never received a response,
> and I have seen at least 1 forum post about it.
>
> Anyway to cut a long story short, just start it manually on the cli, which
> has no timeout. "qm showcmd VMID | bash" should start it fine.
IE "qm > showcmd 101 | bash" > > Regards, > Mark > > On Mon, 8 Jun 2020 at 15:31, Arjen via pve-user > wrote: > > > > > > > > > ---------- Forwarded message ---------- > > From: Arjen > > To: PVE User List > > Cc: > > Bcc: > > Date: Mon, 08 Jun 2020 14:30:57 +0000 > > Subject: Re: [PVE-User] VM Power Issue > > On Monday, June 8, 2020 10:14 AM, Sivakumar SARAVANAN < > > sivakumar.saravanan.jv.ext at valeo-siemens.com> wrote: > > > > > Hello, > > > > > > I am not able to start the VM after adding the PCI device to VM. > > > I can see the below error message. > > > > Maybe your system is very busy? Maybe it takes a while to allocate the > > memory? > > Maybe you could give more information about the VM configuration and your > > PVE setup? > > > > Can you try running the command below from the command line of your PVE > > host, to see if it works and how long it takes? > > Sometimes (often memory-size related), it just works but takes longer > than > > the time-out. > > > > > TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name > > > HIL-System096Planned -chardev > > > 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon > > > 'chardev=qmp,mode=control' -chardev > > > 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon > > > 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid > > > -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' > > -smp > > > '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot > > > > > > 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' > > > -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu > > > > > > 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi' > > > -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa > > > 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object > > > 'memory-backend-ram,id=ram-node1,size=16384M' -numa 
> > > 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device > > > 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device > > > 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device > > > 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device > > > 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device > > > 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device > > > 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device > > > 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev > > > 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' > -device > > > 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device > > > 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi > > > 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive > > > 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device > > > 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive > > > > > > 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' > > > -device > > > > > > 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' > > > -drive > > > > > > 'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' > > > -device > > 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb' > > > -netdev > > > > > > 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' > > > -device > > > > > > 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' > > > -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global > > > 'kvm-pit.lost_tick_policy=discard'' failed: got timeout > > > > > > Appreciating your suggestion. 
> > >
> > > Best regards,
> > >
> > > SK
> > >
> > > This e-mail message is intended for the internal use of the intended
> > > recipient(s) only.
> > > The information contained herein is confidential/privileged. Its
> > > disclosure or reproduction is strictly prohibited.
> > > If you are not the intended recipient, please inform the sender
> > > immediately, do not disclose it internally or to third parties and
> > > destroy it.
> > > In the course of our business relationship and for business purposes
> > > only, Valeo may need to process some of your personal data.
> > > For more information, please refer to the Valeo Data Protection
> > > Statement and Privacy notice available on Valeo.com
> > > https://www.valeo.com/en/ethics-and-compliance/#principes
> >
> > Am I the intended recipient? Otherwise, consider yourself informed
> > immediately. I apologize for disclosing this information on the same
> > mailing-list you sent the original e-mail. Valeo is not allowed to process
> > my personal data, according to the General Data Protection Regulation
> > (GDPR), without prior written consent.
Please consider removing such > > statements when sending email to a (public) mailing list, as it makes it > > difficult to help you without violating your rules. > > > > > > > > ---------- Forwarded message ---------- > > From: Arjen via pve-user > > To: PVE User List > > Cc: Arjen > > Bcc: > > Date: Mon, 08 Jun 2020 14:30:57 +0000 > > Subject: Re: [PVE-User] VM Power Issue > > _______________________________________________ > > pve-user mailing list > > pve-user at pve.proxmox.com > > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > > > > ---------- Forwarded message ---------- > From: Mark Adams via pve-user > To: PVE User List > Cc: Mark Adams > Bcc: > Date: Mon, 8 Jun 2020 15:38:48 +0100 > Subject: Re: [PVE-User] VM Power Issue > _______________________________________________ > pve-user mailing list > pve-user at pve.proxmox.com > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user > -- *This e-mail message is intended for the internal use of the intended recipient(s) only. The information contained herein is confidential/privileged. Its disclosure or reproduction is strictly prohibited. If you are not the intended recipient, please inform the sender immediately, do not disclose it internally or to third parties and destroy it. In the course of our business relationship and for business purposes only, Valeo may need to process some of your personal data. For more information, please refer to the Valeo Data Protection Statement and Privacy notice available on Valeo.com * From devzero at web.de Tue Jun 9 11:12:21 2020 From: devzero at web.de (Roland) Date: Tue, 9 Jun 2020 11:12:21 +0200 Subject: [PVE-User] zvol vs qcow2 on zfs Message-ID: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> Hello, i'm currently planning a larger migration from xenserver to proxmox. we want to use a proxmox cluster without shared storage, i.e. local storage only. zfs is perfect for that. 
however, we have found that zvol (the proxmox default) is not optimal for us, for the following reasons:

- zvol cannot be replicated (on a "per dataset" basis) with pve-zsync. we want to replicate our datasets to a central backup server.

- other replication tools (e.g. syncoid) won't handle zvols well, i.e. when a zvol is deleted on the source, it is not deleted on the target. that is a problem when we "shuffle around" zvols between different pools/datasets or servers. they would need extra scripting/handling on the replication target.

- backing up zvols on the replicated server (for example with borgbackup) is also not straightforward (because they are not files, and snapshots from a "backupsnap" aren't easily accessible either)

- zvol has known performance issues, e.g. ( https://github.com/openzfs/zfs/issues/10095 )

is anybody using qcow2 on zfs in production at a larger scale, or would someone like to share their thoughts/experience with using qcow2 on zfs?

regards
roland

From gianni.milo22 at gmail.com Tue Jun 9 19:12:18 2020
From: gianni.milo22 at gmail.com (Gianni Milo)
Date: Tue, 9 Jun 2020 18:12:18 +0100
Subject: [PVE-User] zvol vs qcow2 on zfs
In-Reply-To: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
Message-ID:

> is anybody using qcow2 on zfs in production at a larger scale or someone
> wants to share his thoughts/experience with using qcow2 on zfs ?

I would not use qcow2 images on a zfs dataset. I would prefer raw images instead, because the overhead is lower and you can snapshot the VMs at the zfs layer, which is much faster.

G.

From marco at internet.one Tue Jun 9 19:46:11 2020
From: marco at internet.one (Marco Bellini)
Date: Tue, 9 Jun 2020 17:46:11 +0000
Subject: [PVE-User] CEPH performance
In-Reply-To:
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>,
Message-ID: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>

Dear All,
I'm trying to use proxmox on a 4 nodes cluster with ceph.
every node has a 500G NVMe drive, with a dedicated 10G ceph network with 9000 bytes MTU.

despite the NVMe warp speed I can reach when it is used as an LVM volume, as soon as I convert it into a 4-OSD ceph, performance is very, very poor.

is there any trick to have ceph in proxmox working fast?

thank you everybody for any advice.

-- .- -.-- / - .... . / ..-. --- .-. -.-. . / -... . / .-- .. - .... / -.-- --- ..- -.-.--

Marco Bellini

From elacunza at binovo.es Wed Jun 10 08:30:08 2020
From: elacunza at binovo.es (Eneko Lacunza)
Date: Wed, 10 Jun 2020 08:30:08 +0200
Subject: [PVE-User] CEPH performance
In-Reply-To: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID:

Hi Marco,

On 9/6/20 at 19:46, Marco Bellini wrote:
> Dear All,
> I'm trying to use proxmox on a 4 nodes cluster with ceph.
> every node has a 500G NVMe drive, with a dedicated 10G ceph network with 9000 bytes MTU.
>
> despite the NVMe warp speed I can reach when it is used as an LVM volume, as soon as I convert it into a 4-OSD ceph, performance is very, very poor.
>
> is there any trick to have ceph in proxmox working fast?
>
What is "very very poor"? What specs do the Proxmox nodes have (CPU, RAM)?

AFAIK, it will be a challenge to get more than 2000 IOPS from one VM using Ceph...

How are you performing the benchmark?

Cheers

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

From mark at openvs.co.uk Wed Jun 10 08:38:47 2020
From: mark at openvs.co.uk (Mark Adams)
Date: Wed, 10 Jun 2020 07:38:47 +0100
Subject: [PVE-User] CEPH performance
In-Reply-To:
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID:

The simplest thing to set is also to make sure you are using writeback cache in your VMs with ceph.
It makes a huge difference in performance.

On Wed, 10 Jun 2020, 07:31 Eneko Lacunza, wrote:

> Hi Marco,
>
> On 9/6/20 at 19:46, Marco Bellini wrote:
> > Dear All,
> > I'm trying to use proxmox on a 4 nodes cluster with ceph.
> > every node has a 500G NVMe drive, with a dedicated 10G ceph network with
> > 9000 bytes MTU.
> >
> > despite the NVMe warp speed I can reach when it is used as an LVM volume,
> > as soon as I convert it into a 4-OSD ceph, performance is very, very poor.
> >
> > is there any trick to have ceph in proxmox working fast?
> >
> What is "very very poor"? What specs do the Proxmox nodes have (CPU, RAM)?
>
> AFAIK, it will be a challenge to get more than 2000 IOPS from one VM
> using Ceph...
>
> How are you performing the benchmark?
>
> Cheers
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

From aderumier at odiso.com Wed Jun 10 13:15:20 2020
From: aderumier at odiso.com (Alexandre DERUMIER)
Date: Wed, 10 Jun 2020 13:15:20 +0200 (CEST)
Subject: [PVE-User] CEPH performance
In-Reply-To:
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID: <1188207075.1092058.1591787720490.JavaMail.zimbra@odiso.com>

>>AFAIK, it will be a challenge to get more than 2000 IOPS from one VM
>>using Ceph...

With iodepth=1 (single queue) you are indeed bound by latency, and you shouldn't be able to reach more than 4000-5000 IOPS (this depends mainly on CPU frequency on the client, CPU frequency on the cluster, and network latency).

But with more parallel reads/writes, you should be able to reach 70,000-80,000 IOPS per disk without any problem.
(If you need more, you can use multiple disks with iothreads; I was able to scale up to 500,000-600,000 IOPS with 5-6 disks.)

Depending on your workload, you can enable writeback. It improves the performance of sequential writes of small coalesced blocks (it regroups them into bigger blocks before sending them to ceph).
But currently (nautilus), enabling writeback slows down reads.

With octopus (currently in test: http://download.proxmox.com/debian/ceph-octopus/dists/buster/test/) this is solved, and you can always enable writeback.

Octopus also has other optimisations, and writeback is also able to regroup random non-coalesced blocks.

See my latest benchmarks:

"
Here are some IOPS results with 1 VM - 1 disk - 4k block, iodepth=64, librbd, no iothread.

              nautilus-cache=none  nautilus-cache=writeback  octopus-cache=none  octopus-cache=writeback
randread 4k   62.1k                25.2k                     61.1k               60.8k
randwrite 4k  27.7k                19.5k                     34.5k               53.0k
seqwrite 4k   7850                 37.5k                     24.9k               82.6k
"

----- Original Mail -----
From: "Eneko Lacunza"
To: "proxmoxve"
Sent: Wednesday, 10 June 2020 08:30:08
Subject: Re: [PVE-User] CEPH performance

Hi Marco,

On 9/6/20 at 19:46, Marco Bellini wrote:
> Dear All,
> I'm trying to use proxmox on a 4 nodes cluster with ceph.
> every node has a 500G NVMe drive, with a dedicated 10G ceph network with 9000 bytes MTU.
>
> despite the NVMe warp speed I can reach when it is used as an LVM volume, as soon as I convert it into a 4-OSD ceph, performance is very, very poor.
>
> is there any trick to have ceph in proxmox working fast?
>
What is "very very poor"? What specs do the Proxmox nodes have (CPU, RAM)?

AFAIK, it will be a challenge to get more than 2000 IOPS from one VM using Ceph...

How are you performing the benchmark?

Cheers

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq.
oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From jm at ginernet.com Wed Jun 10 16:24:25 2020
From: jm at ginernet.com (José Manuel Giner)
Date: Wed, 10 Jun 2020 16:24:25 +0200
Subject: [PVE-User] CEPH performance
In-Reply-To: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID: <18faea52-9ae9-06b8-8a7f-d9344c432d04@ginernet.com>

Note that with only a 10 Gbps network you will get only about 1 GB/s, which is only 25-30% of the performance of an NVMe drive.

To make use of 100% of an NVMe drive's performance you need at least a 40G network.

On 09/06/2020 19:46, Marco Bellini wrote:
>
> Dear All,
> I'm trying to use proxmox on a 4 nodes cluster with ceph.
> every node has a 500G NVMe drive, with a dedicated 10G ceph network with 9000 bytes MTU.
>
> despite the NVMe warp speed I can reach when it is used as an LVM volume, as soon as I convert it into a 4-OSD ceph, performance is very, very poor.
>
> is there any trick to have ceph in proxmox working fast?
>
> thank you everybody for any advice.
>
>
>
> -- .- -.-- / - .... . / ..-. --- .-. -.-. . / -... . / .-- .. - .... / -.-- --- ..- -.-.--
>
> Marco Bellini
>
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

--
José Manuel Giner
https://ginernet.com

From sivakumar.saravanan.jv.ext at valeo-siemens.com Wed Jun 10 17:42:19 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Wed, 10 Jun 2020 17:42:19 +0200
Subject: [PVE-User] New host issue while adding to cluster
Message-ID:

Hello,

All hosts and the Datacenter itself became unavailable after adding the new host to the cluster.

What could be the reason?

Best regards
SK

--
*This e-mail message is intended for the internal use of the intended recipient(s) only. The information contained herein is confidential/privileged. Its disclosure or reproduction is strictly prohibited. If you are not the intended recipient, please inform the sender immediately, do not disclose it internally or to third parties and destroy it. In the course of our business relationship and for business purposes only, Valeo may need to process some of your personal data. For more information, please refer to the Valeo Data Protection Statement and Privacy notice available on Valeo.com *

From lindsay.mathieson at gmail.com Tue Jun 16 15:51:17 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 16 Jun 2020 23:51:17 +1000
Subject: [PVE-User] CEPH performance
In-Reply-To:
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de> <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID: <614bc869-beb6-dd69-09ce-bfe2acca4b8a@gmail.com>

On 10/06/2020 4:38 pm, Mark Adams via pve-user wrote:
> The simplest thing to set also is to make sure you are using writeback
> cache in your vms with ceph. It makes a huge difference in performance.

Chiming in - doing some testing with a 5 node ceph/proxmox cluster here. Basic spinners and 4*1G eth, LACP:tcp-balance.

Enabling KRBD on the ceph pool made a huge difference - I presume that uses the rbd kernel driver?

--
Lindsay

From lindsay.mathieson at gmail.com Tue Jun 16 16:00:10 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 00:00:10 +1000
Subject: [PVE-User] Kudo's to the Proxmox team for their ceph integration
Message-ID:

Have been revisiting using ceph after trialing it several years back (ok, but a headache to manage and performance sucked on the limited hardware we had).

Wow, you've really put a lot of effort into integrating it into proxmox, that UI makes the setup and monitoring so easy. Outstanding work.
And the Nautilus features add two key things I really like about zfs - transparent compression and checksumming. BlueStore does seem to have much better performance.

Seems pretty solid too: due to my sleep-deprived state, I managed to crash/hard reboot the entire cluster *twice* today, but ceph recovered flawlessly with no loss both times, and HA brought up my critical VMs with no intervention (pfSense router, AD and SQL Server).

Thanks!

--
Lindsay

From lindsay.mathieson at gmail.com Tue Jun 16 16:11:56 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 00:11:56 +1000
Subject: [PVE-User] Kudo's to the Proxmox team for their ceph integration
In-Reply-To:
References:
Message-ID: <8c85ec0a-62a5-78e6-e223-088cacceaede@gmail.com>

Oh, and the ZFS boot options in the installer are pretty slick too. It was very flaky for me when it first came out, but seems rock solid now. Set up a new server with two SSDs in raid1, no issues.

On 17/06/2020 12:00 am, Lindsay Mathieson wrote:
> Have been revisiting using ceph after trialing it several years back
> (ok, but a headache to manage and performance sucked on the limited
> hardware we had).
>
>
> Wow, you've really put a lot of effort into integrating it into proxmox,
> that UI makes the setup and monitoring so easy. Outstanding work. And
> the Nautilus features add two key things I really like about zfs -
> transparent compression and checksumming. BlueStore does seem to have
> much better performance.
>
>
> Seems pretty solid too: due to my sleep-deprived state, I managed to
> crash/hard reboot the entire cluster *twice* today, but ceph recovered
> flawlessly with no loss both times, and HA brought up my critical VMs
> with no intervention (pfSense router, AD and SQL Server).
>
>
> Thanks!
>

--
Lindsay

From lindsay.mathieson at gmail.com Wed Jun 17 02:16:09 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 10:16:09 +1000
Subject: [PVE-User] Ceph Storage Status Page
Message-ID: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>

Is the "Usage" total displayed there the Actual Size / replication factor? Because my total is 5TB when I have 16TB of disk space.

Which makes sense and is more useful.

--
Lindsay

From alex at calicolabs.com Wed Jun 17 03:09:17 2020
From: alex at calicolabs.com (Alex Chekholko)
Date: Tue, 16 Jun 2020 18:09:17 -0700
Subject: [PVE-User] Ceph Storage Status Page
In-Reply-To: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
References: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
Message-ID:

Maybe compare it to the output of "ceph -s" on the CLI. You can click "Shell" in the upper right WebUI view.

On Tue, Jun 16, 2020 at 5:17 PM Lindsay Mathieson < lindsay.mathieson at gmail.com> wrote:

> Is the "Usage" total displayed there the Actual Size / replication
> factor? Because my total is 5TB when I have 16TB of disk space.
>
>
> Which makes sense and is more useful.
>
> --
> Lindsay
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>

From lindsay.mathieson at gmail.com Thu Jun 18 02:34:41 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 18 Jun 2020 10:34:41 +1000
Subject: [PVE-User] Ceph Storage Status Page
In-Reply-To:
References: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
Message-ID:

On 17/06/2020 11:09 am, Alex Chekholko via pve-user wrote:
> Maybe compare it to the output of "ceph -s" on the CLI. You can click
> "Shell" in the upper right WebUI view.

Thanks, yah I'm familiar with that, just curious as to what Proxmox is displaying.

"ceph -s" shows:

    7.9 TiB used, 23 TiB / 30 TiB avail

Whereas Proxmox shows:

    3.33TiB of 8.59TiB

I do have lz4 compression on though. Maybe that's skewing the figures.

--
Lindsay

From lindsay.mathieson at gmail.com Thu Jun 18 13:30:38 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 18 Jun 2020 21:30:38 +1000
Subject: [PVE-User] Enabling telemetry broke all my ceph managers
Message-ID:

Clean Nautilus install I set up last week

  * 5 Proxmox nodes
      o All on latest updates via no-subscription channel
  * 18 OSDs
  * 3 Managers
  * 3 Monitors
  * Cluster health good
  * In a protracted rebalance phase
  * All managed via proxmox

I thought I would enable telemetry for ceph as per this article:

https://docs.ceph.com/docs/master/mgr/telemetry/

  * Enabled the module (command line)
  * ceph telemetry on
  * Tested getting the status
  * Set the contact and description
    ceph config set mgr mgr/telemetry/contact 'John Doe '
    ceph config set mgr mgr/telemetry/description 'My first Ceph cluster'
    ceph config set mgr mgr/telemetry/channel_ident true
  * Tried sending it
    ceph telemetry send

I *think* this is when the managers died, but it could have been earlier. But around then all ceph IO stopped and I discovered all three managers had crashed and would not restart. I was shitting myself because this was remote and the router is a pfSense VM :) Fortunately it kept going without its disk responding.

systemctl start ceph-mgr@vni.service
Job for ceph-mgr@vni.service failed because the control process exited with error code.
See "systemctl status ceph-mgr@vni.service" and "journalctl -xe" for details.

From journalctl -xe

    -- The unit ceph-mgr@vni.service has entered the 'failed' state with result 'exit-code'.
    Jun 18 21:02:25 vni systemd[1]: Failed to start Ceph cluster manager daemon.
    -- Subject: A start job for unit ceph-mgr@vni.service has failed
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- A start job for unit ceph-mgr@vni.service has finished with a failure.
    --
    -- The job identifier is 91690 and the job result is failed.

From systemctl status ceph-mgr@vni.service

ceph-mgr@vni.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mgr@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: exit-code) since Thu 2020-06-18 20:53:52 AEST; 8min ago
  Process: 415566 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id vni --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
 Main PID: 415566 (code=exited, status=1/FAILURE)

Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Service RestartSec=10s expired, scheduling restart.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Scheduled restart job, restart counter is at 4.
Jun 18 20:53:52 vni systemd[1]: Stopped Ceph cluster manager daemon.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Start request repeated too quickly.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Failed with result 'exit-code'.
Jun 18 20:53:52 vni systemd[1]: Failed to start Ceph cluster manager daemon.

I created a new manager service on an unused node and fortunately that worked. I deleted/recreated the old managers and they started working. It was a sweaty few minutes :)

Everything resumed without a hiccup after that, impressed. Not game to try and reproduce it though.

--
Lindsay

From brians at iptel.co Fri Jun 19 00:06:40 2020
From: brians at iptel.co (Brian :)
Date: Thu, 18 Jun 2020 23:06:40 +0100
Subject: [PVE-User] Enabling telemetry broke all my ceph managers
In-Reply-To:
References:
Message-ID:

Nice save. And thanks for the detailed info.
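
For readers who hit the same "Start request repeated too quickly" state, the recovery described in the report above can be sketched on the command line roughly as follows. This is only a sketch under assumptions: the report does not say whether the managers were recreated from the GUI or the CLI, the `pveceph mgr` subcommands are the PVE 6 spelling, and the helper names `run`, `recover_mgr` and `recreate_mgr` are made up for illustration. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# run: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_mgr <mgr-id>: clear the failed state (and with it systemd's start
# rate limit) for one ceph-mgr instance, try to start it again, and show the
# most recent journal entries of that unit.
recover_mgr() {
    unit="ceph-mgr@$1.service"
    run systemctl reset-failed "$unit"
    run systemctl start "$unit"
    run journalctl -u "$unit" -n 50 --no-pager
}

# recreate_mgr <mgr-id>: assumed CLI equivalent of deleting and recreating a
# manager on the local node (hypothetical helper, PVE 6 pveceph syntax).
recreate_mgr() {
    run pveceph mgr destroy "$1"
    run pveceph mgr create
}
```

Calling `recover_mgr vni` in the default dry-run mode just lists the three commands; setting `DRY_RUN=0` on the affected node would actually run them.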

From rommelrt at nauta.cu Fri Jun 19 17:56:27 2020
From: rommelrt at nauta.cu (Rommel Rodriguez Toirac)
Date: Fri, 19 Jun 2020 11:56:27 -0400
Subject: [PVE-User] Where to find CentOS 8 template
Message-ID: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>

Hello all;

Does anyone know of a place to get a CentOS 8 container file similar to the
ones at https://download.openvz.org/template/precreated/ for CentOS 7
(centos-7-x86_64-minimal.tar.gz)?

https://download.openvz.org/template/precreated/

https://download.openvz.org/template/precreated/centos-7-x86_64-minimal.tar.gz

-- 
Rommel Rodriguez Toirac
rommelrt at nauta.cu

From leesteken at protonmail.ch Fri Jun 19 18:03:48 2020
From: leesteken at protonmail.ch (Arjen)
Date: Fri, 19 Jun 2020 16:03:48 +0000
Subject: [PVE-User] Where to find CentOS 8 template
In-Reply-To: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>
References: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>
Message-ID:

On Friday, June 19, 2020 5:56 PM, Rommel Rodriguez Toirac wrote:

> Hello all;
>
> Does anyone know of a place to get a CentOS 8 container file similar to the
> ones at https://download.openvz.org/template/precreated/ for CentOS 7
> (centos-7-x86_64-minimal.tar.gz)
>
> https://download.openvz.org/template/precreated/
>
> https://download.openvz.org/template/precreated/centos-7-x86_64-minimal.tar.gz

If you run /usr/bin/pveam update, I believe you should be able to download
"centos-8-default (20191016)" using the Templates button on a storage (in the
Proxmox WebGUI) that has container templates enabled.
Is this not working for you? If so, which version of Proxmox do you use?
Or is that template not what you are looking for? Or did I understand your
question wrong?

kind regards, Arjen

From atokovenko at gmail.com Mon Jun 22 21:57:31 2020
From: atokovenko at gmail.com (Oleksii Tokovenko)
Date: Mon, 22 Jun 2020 22:57:31 +0300
Subject: [PVE-User] pve-user Digest, Vol 147, Issue 10
In-Reply-To:
References:
Message-ID:

unsubscribe

From thomas.naumann at ovgu.de Fri Jun 26 09:51:57 2020
From: thomas.naumann at ovgu.de (Naumann, Thomas)
Date: Fri, 26 Jun 2020 07:51:57 +0000
Subject: [PVE-User] osd init authentication failed: (1) Operation not permitted
Message-ID: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>

Hi,

in our production cluster (proxmox 5.4, ceph 12.2) there is an issue
since yesterday. after an increase of a pool 5 OSDs do not start,
status is "down/in", ceph health: HEALTH_WARN nodown,noout flag(s) set,
5 osds down, 128 osds: 123 up, 128 in.

last lines of OSD-logfile:
2020-06-26 08:40:26.240005 7f6d245fff80 1 freelist init
2020-06-26 08:40:26.243779 7f6d245fff80 1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc opening allocation metadata
2020-06-26 08:40:26.251501 7f6d245fff80 1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc loaded 3.47TiB in 1 extents
2020-06-26 08:40:26.253058 7f6d245fff80 0 /mnt/big/pve/ceph/ceph-12.2.13/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2020-06-26 08:40:26.253309 7f6d245fff80 0 _get_class not permitted to load sdk
2020-06-26 08:40:26.256486 7f6d245fff80 0 _get_class not permitted to load kvs
2020-06-26 08:40:26.256611 7f6d245fff80 0 /mnt/big/pve/ceph/ceph-12.2.13/src/cls/hello/cls_hello.cc:296: loading cls_hello
2020-06-26 08:40:26.258362 7f6d245fff80 0 _get_class not permitted to load lua
2020-06-26 08:40:26.259850 7f6d245fff80 0 osd.45 46770 crush map has features 288514051259236352, adjusting msgr requires for clients
2020-06-26 08:40:26.259859 7f6d245fff80 0 osd.45 46770 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2020-06-26 08:40:26.259863 7f6d245fff80 0 osd.45 46770 crush map has features 1009089991638532096, adjusting msgr requires for osds
2020-06-26 08:40:26.305880 7f6d245fff80 0 osd.45 46770 load_pgs
2020-06-26 08:40:28.024638 7f6d245fff80 0 osd.45 46770 load_pgs opened 129 pgs
2020-06-26 08:40:28.024803 7f6d245fff80 0 osd.45 46770 using weightedpriority op queue with priority op cut off at 64.
2020-06-26 08:40:28.025741 7f6d245fff80 -1 osd.45 46770 log_to_monitors {default=true}
2020-06-26 08:40:28.028397 7f6d245fff80 -1 osd.45 46770 init authentication failed: (1) Operation not permitted

Does anyone know how to fix this?
-- 
Thomas Naumann
Abteilung Netze und Kommunikation
Otto-von-Guericke Universität Magdeburg
Universitätsrechenzentrum
Universitätsplatz 2
39106 Magdeburg
fon: +49 391 67-58563
email: thomas.naumann at ovgu.de

From a.antreich at proxmox.com Mon Jun 29 10:36:51 2020
From: a.antreich at proxmox.com (Alwin Antreich)
Date: Mon, 29 Jun 2020 10:36:51 +0200
Subject: [PVE-User] osd init authentication failed: (1) Operation not permitted
In-Reply-To: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
Message-ID: <20200629083651.GB1554173@dona.proxmox.com>

Hello Thomas,

On Fri, Jun 26, 2020 at 07:51:57AM +0000, Naumann, Thomas wrote:
> Does anyone know how to fix this?
Are those OSDs on the same host? What is the current status of the
cluster?

--
Cheers,
Alwin

From thomas.naumann at ovgu.de Mon Jun 29 13:23:31 2020
From: thomas.naumann at ovgu.de (Naumann, Thomas)
Date: Mon, 29 Jun 2020 11:23:31 +0000
Subject: [PVE-User] osd init authentication failed: (1) Operation not permitted
In-Reply-To: <20200629083651.GB1554173@dona.proxmox.com>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de> <20200629083651.GB1554173@dona.proxmox.com>
Message-ID: <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>

Hi Alwin,

yes, all OSDs which did not start were on the same physical cluster node,
and all running VMs on the cluster were dead because of missing objects.

The problem was that those OSDs did not have an entry in "ceph auth list",
so manually adding the OSDs (ceph auth add osd.X osd 'allow *' mon
'allow profile osd' mgr 'allow profile osd' -i
/var/lib/ceph/osd/ceph-X/keyring) solved the problem.

After that, starting the systemd service for each OSD was successful.

Until now I did not find anything related in the logfiles on the cluster node.
Any hint to demystifying the cluster behavior is welcome...
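
The workaround described above can be sketched as a small dry-run script. This is only an illustration under assumptions: the OSD ids in the loop are placeholders (only osd.45 appears in the log), `auth_add_cmd` and `has_auth_entry` are made-up helper names, and the capabilities are copied from the command quoted in this message rather than checked against any particular Ceph release. The script prints the `ceph auth add` commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the "ceph auth add" command from the message
# above for one OSD id. Nothing is executed against the cluster here.
auth_add_cmd() {
    printf "ceph auth add osd.%s osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-%s/keyring\n" "$1" "$1"
}

# Hypothetical helper: succeeds only if the ceph CLI is available and
# "ceph auth get" finds a key for the OSD. It also fails when the CLI is
# missing, in which case the add command is simply printed.
has_auth_entry() {
    command -v ceph >/dev/null 2>&1 && ceph auth get "osd.$1" >/dev/null 2>&1
}

# Placeholder OSD ids; replace them with the OSDs that fail to start.
for id in 45 46 47; do
    if has_auth_entry "$id"; then
        echo "# osd.$id already has an auth entry, skipping"
    else
        auth_add_cmd "$id"
    fi
done
```

Printing rather than executing is a deliberate choice here: on a production cluster it seems safer to compare the generated lines with `ceph auth list` by hand before running anything.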
-- 
Thomas Naumann
Abteilung Netze und Kommunikation
Otto-von-Guericke Universität Magdeburg
Universitätsrechenzentrum
Universitätsplatz 2
39106 Magdeburg
fon: +49 391 67-58563
email: thomas.naumann at ovgu.de

From a.antreich at proxmox.com Mon Jun 29 14:35:21 2020
From: a.antreich at proxmox.com (Alwin Antreich)
Date: Mon, 29 Jun 2020 14:35:21 +0200
Subject: [PVE-User] osd init authentication failed: (1) Operation not permitted
In-Reply-To: <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de> <20200629083651.GB1554173@dona.proxmox.com> <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>
Message-ID: <20200629123521.GA81113@dona.proxmox.com>

On Mon, Jun 29, 2020 at 11:23:31AM +0000, Naumann, Thomas wrote:
> Hi Alwin,
>
> yes, all OSDs, which did not start, were on same physical clusternode
> and all running VMs on cluster were dead because of missing objects.
>
> Problem was that those OSDs did not have an entry in "ceph auth list",
> so manually adding the OSDs (ceph auth add osd.X osd 'allow *' mon
> 'allow profile osd' mgr 'allow profile osd' -i
> /var/lib/ceph/osd/ceph-X/keyring) solved the problem.
>
> After that starting systemd-service for each OSD was successful.
>
> Until now I did not find anything related in logfiles on clusternode.
> Any hint to demystifying the cluster behavior is welcome...
Besides correlating log files, not really. ;)

Not certain, but this issue could also be at play.
https://bugzilla.proxmox.com/show_bug.cgi?id=2053

--
Cheers,
Alwin

From lindsay.mathieson at gmail.com Mon Jun 29 16:07:40 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 30 Jun 2020 00:07:40 +1000
Subject: [PVE-User] Ceph Bluestore - lvmcache versus WAL/DB on SSD
Message-ID:

As per the title :) I have 23 OSD spinners on 5 hosts, Data+WAL+DB all on
the disk. All VMs are Windows, running with Writeback Cache. Performance is
adequate, but I see occasional high IO loads that make the VMs sluggish.

With a lot of work I could move the WAL/DB to separate SSD partitions, or
use lvmcache, which looks to be more transparent to set up.

TBH, write performance could be better, as could IOPS :) The Ethernet
connections never come close to being saturated, so I'm guessing the write
speed of the disks is the limiting factor.

I'm tending towards separate WAL/DB devices, as I prefer to work within the
recommended usages of projects such as Ceph these days, rather than trying
to outguess their design parameters.

Any experiences either way on the list?

-- 
Lindsay

From jameslipski at protonmail.com Tue Jun 30 03:08:49 2020
From: jameslipski at protonmail.com (jameslipski)
Date: Tue, 30 Jun 2020 01:08:49 +0000
Subject: High I/O waits, not sure if it's a ceph issue.
Message-ID: <4GE-3ImIaZ3ujQiKYpuwovUyhUEwt8m_ZZAcH3haKt6ly27BvzznK1BgWvt5-T7tM9X3_79u6PcdPDIpxrfhcXh6bDvfuE07B5f8dSrvBDw=@protonmail.com>

Greetings,

I'm trying out PVE. Currently I'm just doing tests and ran into an issue
relating to high I/O waits.

Just to give a little bit of background, we currently have 6 nodes. We're
running CEPH, and each node has 2 OSDs (each node has 2x Intel
SSDSC2KG019T8). OSD type is bluestore.
The global ceph configuration (at least as shown on the proxmox interface) is as follows:

[global]
auth_client_required = xxxx
auth_cluster_required = xxxx
auth_service_required = xxxx
cluster_network = 10.125.0.0/24
fsid = f64d2a67-98c3-4dbc-abfd-906ea7aaf314
mon_allow_pool_delete = true
mon_host = 10.125.0.101 10.125.0.102 10.125.0.103 10.125.0.105 10.125.0.106 10.125.0.104
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.125.0.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

If I'm missing any relevant information relating to my ceph setup (I'm still
learning this), please let me know.

Each node has 2x Xeon E5-2660 v3. I ran into high I/O waits when running
2 VMs. One VM is a mysql replication server (using 8 cores) and is
performing mostly writes. The second VM is running Debian with Cacti. The
two VMs are on 2 different nodes, but both use CEPH to store the vm-hd.
When I copied files over the network to the VM running Cacti, I noticed
high I/O waits in my mysql VM.
I'm assuming that this has something to do with ceph; though the only thing
I'm seeing in the ceph logs are the following:

02:43:01.062082 mgr.node01 (mgr.2914449) 8009571 : cluster [DBG] pgmap v8009574: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.4 MiB/s wr, 274 op/s
02:43:03.063137 mgr.node01 (mgr.2914449) 8009572 : cluster [DBG] pgmap v8009575: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 3.0 MiB/s wr, 380 op/s
02:43:05.064125 mgr.node01 (mgr.2914449) 8009573 : cluster [DBG] pgmap v8009576: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.9 MiB/s wr, 332 op/s
02:43:07.065373 mgr.node01 (mgr.2914449) 8009574 : cluster [DBG] pgmap v8009577: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.7 MiB/s wr, 313 op/s
02:43:09.066210 mgr.node01 (mgr.2914449) 8009575 : cluster [DBG] pgmap v8009578: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 2.9 MiB/s wr, 350 op/s
02:43:11.066913 mgr.node01 (mgr.2914449) 8009576 : cluster [DBG] pgmap v8009579: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.1 MiB/s wr, 346 op/s
02:43:13.067926 mgr.node01 (mgr.2914449) 8009577 : cluster [DBG] pgmap v8009580: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.5 MiB/s wr, 408 op/s
02:43:15.068834 mgr.node01 (mgr.2914449) 8009578 : cluster [DBG] pgmap v8009581: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.0 MiB/s wr, 320 op/s
02:43:17.069627 mgr.node01 (mgr.2914449) 8009579 : cluster [DBG] pgmap v8009582: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 2.5 MiB/s wr, 285 op/s
02:43:19.070507 mgr.node01 (mgr.2914449) 8009580 : cluster [DBG] pgmap v8009583: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.0 MiB/s wr, 349 op/s
02:43:21.071241 mgr.node01 (mgr.2914449) 8009581 : cluster [DBG] pgmap v8009584: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.8 MiB/s wr, 319 op/s
02:43:23.072286 mgr.node01 (mgr.2914449) 8009582 : cluster [DBG] pgmap v8009585: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.7 MiB/s wr, 329 op/s
02:43:25.073369 mgr.node01 (mgr.2914449) 8009583 : cluster [DBG] pgmap v8009586: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.8 MiB/s wr, 304 op/s
02:43:27.074315 mgr.node01 (mgr.2914449) 8009584 : cluster [DBG] pgmap v8009587: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.2 MiB/s wr, 262 op/s
02:43:29.075284 mgr.node01 (mgr.2914449) 8009585 : cluster [DBG] pgmap v8009588: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 2.9 MiB/s wr, 342 op/s
02:43:31.076180 mgr.node01 (mgr.2914449) 8009586 : cluster [DBG] pgmap v8009589: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 2.4 MiB/s wr, 269 op/s
02:43:33.077523 mgr.node01 (mgr.2914449) 8009587 : cluster [DBG] pgmap v8009590: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.4 MiB/s wr, 389 op/s
02:43:35.078543 mgr.node01 (mgr.2914449) 8009588 : cluster [DBG] pgmap v8009591: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.1 MiB/s wr, 344 op/s
02:43:37.079428 mgr.node01 (mgr.2914449) 8009589 : cluster [DBG] pgmap v8009592: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.0 MiB/s wr, 334 op/s
02:43:39.080419 mgr.node01 (mgr.2914449) 8009590 : cluster [DBG] pgmap v8009593: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.3 MiB/s wr, 377 op/s

I'm not sure what could be causing high I/O waits;
or whether this is an issue relating to my ceph configuration. Any
suggestions would be appreciated, or if you need any additional information,
let me know what you need and I'll post it.

Thank you.

From lindsay.mathieson at gmail.com Tue Jun 30 03:28:51 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 30 Jun 2020 11:28:51 +1000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To:
References:
Message-ID:

On 30/06/2020 11:08 am, jameslipski via pve-user wrote:
> Just to give a little bit of a background, we currently we have 6 nodes. We're running CEPH, and each node consists of
> 2 osds (each node has 2x Intel SSDSC2KG019T8) OSD Type is bluestore. Global ceph configurations (at least as shown on the proxmox interface) is as follows:

Network config? (i.e. speed etc.)

Ceph is Nautilus 14.2.9? (latest on proxmox)

Do you have KRBD set for the Proxmox Ceph Storage? That helps a lot.

-- 
Lindsay

From pve at hw.wewi.de Tue Jun 30 10:51:41 2020
From: pve at hw.wewi.de (Hermann)
Date: Tue, 30 Jun 2020 10:51:41 +0200
Subject: [PVE-User] FC-Luns only local devices?
Message-ID: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>

Dear PVE users,

I would really appreciate being steered in the right direction as to the
connection of Fibre Channel LUNs in Proxmox.

As far as I can see, FC LUNs only appear as local block devices in PVE.
If I have several fibre-optic cables between my cluster and these bloody
expensive storages, do I have to set up multipath manually in Debian?

Or is there a CLI command handling all this?

Any helping hint is heartily appreciated.

Greetings,
Hermann

From chris.hofstaedtler at deduktiva.com Tue Jun 30 12:21:25 2020
From: chris.hofstaedtler at deduktiva.com (Chris Hofstaedtler | Deduktiva)
Date: Tue, 30 Jun 2020 12:21:25 +0200
Subject: [PVE-User] FC-Luns only local devices?
In-Reply-To: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>
References: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>
Message-ID: <20200630102125.nnrf3okkiwtdmojw@zeha.at>

Hi Hermann,

* Hermann [200630 10:51]:
> I would really appreciate being steered in the right direction as to the
> connection of Fibre Channel LUNs in Proxmox.
>
> As far as I can see, FC LUNs only appear as local block devices in PVE.
> If I have several fibre-optic cables between my cluster and these bloody
> expensive storages, do I have to set up multipath manually in Debian?

With most storages you need to configure multipath itself manually,
with the settings your storage vendor hands you.

Our setup for this is:

1. Manual multipath setup; we tend to enable find_multipaths "smart"
   to avoid configuring all WWIDs everywhere and so on.

2. The LVM PVs go directly on the mpathXX devices (no partitioning).

3. One VG per mpath device. The VGs are then seen by Proxmox just
   like always.

You have to take great care when removing block devices again, so
that all PVE nodes release the VGs, PVs, and all underlying device mapper
devices, and remove the physical sdXX devices, before you remove the
exports from the storage side.
Often it's easier to reboot, and during the reboot fence access to
the to-be-removed LUN for the currently rebooting host.

Chris

-- 
Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien)
www.deduktiva.com / +43 1 353 1707

From jameslipski at protonmail.com Tue Jun 30 14:07:59 2020
From: jameslipski at protonmail.com (jameslipski)
Date: Tue, 30 Jun 2020 12:07:59 +0000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To:
References:
Message-ID:

Thanks for the reply.

All nodes are connected to a 10Gbit switch. Ceph is currently running on
14.2.2, but I will update to the latest. KRBD was not enabled for the pool.

Before I update ceph: regarding KRBD, I've just enabled it. Do I have to
re-create the pool, restart ceph, restart the node, etc., or does it just
take effect?
------- Original Message -------
On Monday, June 29, 2020 9:28 PM, Lindsay Mathieson wrote:

> On 30/06/2020 11:08 am, jameslipski via pve-user wrote:
>
> > Just to give a little bit of background: we currently have 6 nodes. We're running Ceph, and each node has
> > 2 OSDs (each node has 2x Intel SSDSC2KG019T8). The OSD type is BlueStore. The global Ceph configuration (at least as shown in the Proxmox interface) is as follows:
>
> Network config? (i.e. speed etc.)
>
> Ceph is Nautilus 14.2.9? (latest on Proxmox)
>
> Do you have KRBD set for the Proxmox Ceph storage? That helps a lot.
>
> --
> Lindsay
>
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

From mark at tuxis.nl Tue Jun 30 15:09:12 2020
From: mark at tuxis.nl (Mark Schouten)
Date: Tue, 30 Jun 2020 15:09:12 +0200
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To:
References:
Message-ID: <20200630130912.qia6rghud5okmnsp@shell.tuxis.net>

On Tue, Jun 30, 2020 at 11:28:51AM +1000, Lindsay Mathieson wrote:
> Do you have KRBD set for the Proxmox Ceph Storage? That helps a lot.

I think this is incorrect. KRBD uses the kernel driver, which is usually older than the userland version. Also, upgrading is easier when not using KRBD.

I'd like to hear that I'm wrong, am I? :)

--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl

From mark at tuxis.nl Tue Jun 30 15:12:33 2020
From: mark at tuxis.nl (Mark Schouten)
Date: Tue, 30 Jun 2020 15:12:33 +0200
Subject: [PVE-User] Ceph Bluestore - lvmcache versus WAL/DB on SSD
In-Reply-To:
References:
Message-ID: <20200630131233.z3ezuatxnys6n637@shell.tuxis.net>

On Tue, Jun 30, 2020 at 12:07:40AM +1000, Lindsay Mathieson wrote:
> As per the title :) I have 23 OSD spinners on 5 hosts, Data+WAL+DB all on
> the disk. All VMs are Windows running with Writeback Cache. Performance is
> adequate but I see occasional high IO loads that make the VMs sluggish.

It could be that (deep) scrubs are periodically killing your performance. There are some tweaks available to make them less invasive:

osd_scrub_chunk_min=20 # default: 5
osd_scrub_sleep=4 # default: 0

And then some: https://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/

The best option is really to replace the spinning disks with SSDs.

--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl

From lindsay.mathieson at gmail.com Tue Jun 30 17:44:58 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 1 Jul 2020 01:44:58 +1000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To:
References:
Message-ID: <12aa72ce-9f3c-035c-dccd-8cf8b3279a9a@gmail.com>

On 30/06/2020 10:07 pm, jameslipski via pve-user wrote:
> Before I update Ceph, regarding KRBD: I've just enabled it. Do I have to re-create the pool, restart Ceph, restart the node, etc., or does it just take effect?

No need to recreate the pool, just stop/start the VMs accessing it.

--
Lindsay
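
For anyone following along, the KRBD switch described above might look roughly like the sketch below. It is a dry run: the function only prints the commands it would suggest and changes nothing. The storage ID "ceph-vm" and the VM IDs 101 and 102 are made-up placeholders, not values from this thread, and the helper name plan_krbd_switch is hypothetical. The sketch assumes the storage is an RBD storage already defined in Proxmox.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# "ceph-vm" and the VM IDs are hypothetical placeholders.

plan_krbd_switch() {
    storage="$1"
    shift
    # Enable the kernel RBD client on the existing storage definition.
    # The pool itself is left untouched.
    echo "pvesm set $storage --krbd 1"
    # Each guest using the storage is stopped and started again so that
    # it re-attaches its disks through the new access path.
    for vmid in "$@"; do
        echo "qm shutdown $vmid && qm start $vmid"
    done
}

plan_krbd_switch ceph-vm 101 102
```

Review the printed commands and run them by hand, one guest at a time, so a failed start only affects a single VM.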