From gaio at sv.lnf.it  Thu Jun  4 09:22:26 2020
From: gaio at sv.lnf.it (Marco Gaiarin)
Date: Thu, 4 Jun 2020 09:22:26 +0200
Subject: [PVE-User] PVE 6, wireless and regulatory database...
In-Reply-To: <20200527070105.GC477557@dona.proxmox.com>
References: <20200518084316.GC3626@lilliput.linux.it>
 <bffb1727-57ac-7ae6-3716-fb39a84547a7@proxmox.com>
 <20200519074011.GB4975@lilliput.linux.it>
 <20200519090409.GA406757@dona.proxmox.com>
 <20200520205816.GA23561@lilliput.linux.it>
 <20200522100421.GA1577721@dona.proxmox.com>
 <20200526104430.GK3717@lilliput.linux.it>
 <20200526113514.GB477557@dona.proxmox.com>
 <20200526153146.GL3717@lilliput.linux.it>
 <20200527070105.GC477557@dona.proxmox.com>
Message-ID: <20200604072226.GA3816@lilliput.linux.it>

Hello! Alwin Antreich
  On that day it was said...

> > I've installed the buster package...
> You will need the package from the backports.

Sorry for the late answer, but even at home I've needed to define with
my stakeholder a maintenance window for a cluster reboot. ;-)

I confirm, it works as expected.

Jun  3 23:51:57 ino kernel: [    7.866523] cfg80211: Loading compiled-in X.509 certificates for regulatory database
Jun  3 23:51:57 ino kernel: [    7.878636] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Jun  3 23:51:57 ino kernel: [    8.070696] ath: EEPROM regdomain: 0x809c
Jun  3 23:51:57 ino kernel: [    8.070698] ath: EEPROM indicates we should expect a country code
Jun  3 23:51:57 ino kernel: [    8.070698] ath: doing EEPROM country->regdmn map search
Jun  3 23:51:57 ino kernel: [    8.070699] ath: country maps to regdmn code: 0x52
Jun  3 23:51:57 ino kernel: [    8.070700] ath: Country alpha2 being used: CN
Jun  3 23:51:57 ino kernel: [    8.070700] ath: Regpair used: 0x52
Jun  3 23:51:57 ino kernel: [    8.072132] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
Jun  3 23:51:57 ino kernel: [    8.072647] ieee80211 phy0: Atheros AR9287 Rev:2 mem=0xffff9d084dbf0000, irq=16
Jun  3 23:51:57 ino kernel: [    8.080354] ath9k 0000:10:00.0 wls1: renamed from wlan0
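
For anyone checking the same thing on another host, here is a minimal shell
sketch that pulls the regulatory country code out of kernel log lines like the
ones above. The function name `regdomain_from_log` is made up for illustration,
and it only assumes the "ath: Country alpha2 being used:" message format shown
in this log; other drivers may log it differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the
# two-letter country code reported by the ath driver, if there is one.
regdomain_from_log() {
    sed -n 's/.*ath: Country alpha2 being used: \([A-Z][A-Z]\).*/\1/p'
}

# Example (reading the kernel log usually needs root):
#   dmesg | regdomain_from_log
```

Lines without that message produce no output, so an empty result means the
driver did not report a country code in the log that was read.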


Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)


From lindsay.mathieson at gmail.com  Thu Jun  4 09:42:58 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 4 Jun 2020 17:42:58 +1000
Subject: [PVE-User] PVE 6, wireless and regulatory database...
In-Reply-To: <20200604072226.GA3816@lilliput.linux.it>
References: <20200518084316.GC3626@lilliput.linux.it>
 <bffb1727-57ac-7ae6-3716-fb39a84547a7@proxmox.com>
 <20200519074011.GB4975@lilliput.linux.it>
 <20200519090409.GA406757@dona.proxmox.com>
 <20200520205816.GA23561@lilliput.linux.it>
 <20200522100421.GA1577721@dona.proxmox.com>
 <20200526104430.GK3717@lilliput.linux.it>
 <20200526113514.GB477557@dona.proxmox.com>
 <20200526153146.GL3717@lilliput.linux.it>
 <20200527070105.GC477557@dona.proxmox.com>
 <20200604072226.GA3816@lilliput.linux.it>
Message-ID: <18820b24-12bd-aa8a-6016-37c36a276f7a@gmail.com>

On 4/06/2020 5:22 pm, Marco Gaiarin wrote:
> but even at home i've needed to define with
> my stakeholder

SO? :)

-- 
Lindsay


From sivakumar.saravanan.jv.ext at valeo-siemens.com  Thu Jun  4 14:52:43 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 14:52:43 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
Message-ID: <CAETBTLT5C+KX0RKaP8foLs8Sc_cEUK+YBDE03jW27yK1F_T+KA@mail.gmail.com>

Hello,

We have one Proxmox Datacenter, and on top of that we have around 15
standalone nodes and cluster defined.

The Datacenter itself frequently shows "communication error". All
standalone nodes are unavailable for any activity within the
Proxmox Datacenter.

I would appreciate your support.

Best regards

Sivakumar

-- 
*This e-mail message is intended for the internal use of the intended 
recipient(s) only.
The information contained herein is 
confidential/privileged. Its disclosure or reproduction is strictly 
prohibited.
If you are not the intended recipient, please inform the sender 
immediately, do not disclose it internally or to third parties and destroy 
it.

In the course of our business relationship and for business purposes 
only, Valeo may need to process some of your personal data. 
For more 
information, please refer to the Valeo Data Protection Statement and 
Privacy notice available on Valeo.com 
<https://www.valeo.com/en/ethics-and-compliance/#principes>*


From elacunza at binovo.es  Thu Jun  4 14:59:37 2020
From: elacunza at binovo.es (Eneko Lacunza)
Date: Thu, 4 Jun 2020 14:59:37 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To: <CAETBTLT5C+KX0RKaP8foLs8Sc_cEUK+YBDE03jW27yK1F_T+KA@mail.gmail.com>
References: <CAETBTLT5C+KX0RKaP8foLs8Sc_cEUK+YBDE03jW27yK1F_T+KA@mail.gmail.com>
Message-ID: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>

Hi,

On 4/6/20 at 14:52, Sivakumar SARAVANAN wrote:
> Hello,
>
> We have one Proxmox Datacenter, and on top of that we have around 15
> standalone nodes and cluster defined.
>
> The Datacenter itself frequently shows "communication error". All
> standalone nodes are unavailable for any activity within the
> Proxmox Datacenter.
>
> I would appreciate your support.
>
This is usually a network problem. What version of Proxmox are you running
(pveversion -v)?


Cheers
Eneko

-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es


From sivakumar.saravanan.jv.ext at valeo-siemens.com  Thu Jun  4 15:07:00 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 15:07:00 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To: <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
References: <CAETBTLT5C+KX0RKaP8foLs8Sc_cEUK+YBDE03jW27yK1F_T+KA@mail.gmail.com>
 <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
Message-ID: <CAETBTLQfNxCnB1UvufFNssdWAyPPmBB-MiGuFSzbqSh5V1m5Jg@mail.gmail.com>

Hello

We are using pve-manager/6.1-3/37248ce6.
There is no network issue: we are able to access all the hosts from a
PuTTY session, but not from the Datacenter.


Best regards,

Sivakumar SARAVANAN


On Thu, Jun 4, 2020 at 3:00 PM Eneko Lacunza <elacunza at binovo.es> wrote:

> Hi,
>
> On 4/6/20 at 14:52, Sivakumar SARAVANAN wrote:
> > Hello,
> >
> > We have one Proxmox Datacenter, and on top of that we have around 15
> > standalone nodes and cluster defined.
> >
> > The Datacenter itself frequently shows "communication error". All
> > standalone nodes are unavailable for any activity within the
> > Proxmox Datacenter.
> >
> > I would appreciate your support.
> >
> This is usually a network problem. What version of Proxmox are you running
> (pveversion -v)?
>
>
> Cheers
> Eneko
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>



From sivakumar.saravanan.jv.ext at valeo-siemens.com  Thu Jun  4 15:33:42 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Thu, 4 Jun 2020 15:33:42 +0200
Subject: [PVE-User] Concern About removing the host from datacenter
Message-ID: <CAETBTLTnRZa_sxKLG6aMNOeRAoG13z6CH+m9pOy66G5dVH7Y5Q@mail.gmail.com>

Hello,

Is there any problem if I remove the standalone host from the Proxmox
Datacenter and add the same host back to the cluster without changing the
IP and hostname?
It is a standalone host and no cluster is defined.

Best regards

Sivakumar SARAVANAN



From harrim4n at harrim4n.com  Thu Jun  4 15:41:00 2020
From: harrim4n at harrim4n.com (harrim4n)
Date: Thu, 4 Jun 2020 15:41:00 +0200
Subject: [PVE-User] Proxmox Datacenter Issue
In-Reply-To: <CAETBTLQfNxCnB1UvufFNssdWAyPPmBB-MiGuFSzbqSh5V1m5Jg@mail.gmail.com>
References: <CAETBTLT5C+KX0RKaP8foLs8Sc_cEUK+YBDE03jW27yK1F_T+KA@mail.gmail.com>
 <11be2b84-b00d-477b-a3a1-86958e341265@binovo.es>
 <CAETBTLQfNxCnB1UvufFNssdWAyPPmBB-MiGuFSzbqSh5V1m5Jg@mail.gmail.com>
Message-ID: <a0fea04b-0f7a-7f2a-c9f3-2c0dcd3b50a8@harrim4n.com>

Hi,

I don't understand your host layout.

Are you running a cluster as described in [1] or not? Does your
environment match the requirements in the wiki?

What do you mean "15 standalone nodes and cluster defined"? Are they
running in a cluster or not?

Also, are the hosts able to reach each other? Just because you can
access them from your host, doesn't mean that they can talk to each other.
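
One rough way to test that from each node is a small shell helper like the
sketch below. It is only an illustration: `check_peer` is a made-up name, the
peer addresses in the example are placeholders, and it checks only that a TCP
port (8006 for the Proxmox web/API service, or 22 for SSH) accepts
connections. It does not test the corosync cluster traffic itself.

```shell
#!/bin/bash
# Hypothetical helper: report whether a TCP port on a peer accepts connections.
# Relies on bash's /dev/tcp pseudo-device and on 'timeout' from coreutils.
check_peer() {
    local host="$1" port="$2"
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} unreachable"
        return 1
    fi
}

# Example, run on each node against the others (addresses are placeholders):
#   for peer in 192.0.2.11 192.0.2.12; do check_peer "$peer" 8006; done
```

If a node shows as unreachable from one host but reachable from your own
workstation, that points at routing or firewalling between the nodes.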


Regards,

harrim4n


[1] https://pve.proxmox.com/wiki/Cluster_Manager


On 04.06.20 15:07, Sivakumar SARAVANAN wrote:
> Hello
>
> We are using pve-manager/6.1-3/37248ce6.
> There is no network issue: we are able to access all the hosts from a
> PuTTY session, but not from the Datacenter.
>
>
> Best regards,
>
> Sivakumar SARAVANAN
>
>
>
> On Thu, Jun 4, 2020 at 3:00 PM Eneko Lacunza <elacunza at binovo.es> wrote:
>
>> Hi,
>>
>> On 4/6/20 at 14:52, Sivakumar SARAVANAN wrote:
>>> Hello,
>>>
>>> We have one Proxmox Datacenter, and on top of that we have around 15
>>> standalone nodes and cluster defined.
>>>
>>> The Datacenter itself frequently shows "communication error". All
>>> standalone nodes are unavailable for any activity within the
>>> Proxmox Datacenter.
>>>
>>> I would appreciate your support.
>>>
>> This is usually a network problem. What version of Proxmox are you running
>> (pveversion -v)?
>>
>>
>> Cheers
>> Eneko
>>


From sivakumar.saravanan.jv.ext at valeo-siemens.com  Mon Jun  8 10:14:54 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Mon, 8 Jun 2020 10:14:54 +0200
Subject: [PVE-User] VM Power Issue
Message-ID: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>

Hello,

I am not able to start the VM after adding a PCI device to it.
I see the error message below.

TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name
HIL-System096Planned -chardev
'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon
'chardev=qmp,mode=control' -chardev
'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon
'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid
-daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp
'4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot
'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
-vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu
'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi'
-m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa
'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object
'memory-backend-ram,id=ram-node1,size=16384M' -numa
'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device
'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device
'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device
'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device
'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device
'usb-tablet,id=tablet,bus=uhci.0,port=1' -device
'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device
'VGA,id=vga,bus=pci.0,addr=0x2' -chardev
'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device
'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device
'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi
'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive
'if=none,id=drive-ide2,media=cdrom,aio=threads' -device
'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive
'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
-device
'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
-drive
'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
-netdev
'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
-device
'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
-rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
'kvm-pit.lost_tick_policy=discard'' failed: got timeout

I would appreciate any suggestions.


Best regards,

SK



From sivakumar.saravanan.jv.ext at valeo-siemens.com  Mon Jun  8 14:11:01 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Mon, 8 Jun 2020 14:11:01 +0200
Subject: [PVE-User] Fwd: VM Power Issue
In-Reply-To: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
References: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
Message-ID: <CAETBTLRPcXuQFN+X7iCb=pVpXSyU+4cubEtbaK9Z=04vbYr_2A@mail.gmail.com>

Hello,

I am not able to start the VM after adding a PCI device to it.
I see the error message below.

TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name
HIL-System096Planned -chardev
'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon
'chardev=qmp,mode=control' -chardev
'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon
'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid
-daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp
'4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot
'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
-vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu
'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi'
-m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa
'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object
'memory-backend-ram,id=ram-node1,size=16384M' -numa
'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device
'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device
'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device
'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device
'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device
'usb-tablet,id=tablet,bus=uhci.0,port=1' -device
'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device
'VGA,id=vga,bus=pci.0,addr=0x2' -chardev
'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device
'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device
'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi
'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive
'if=none,id=drive-ide2,media=cdrom,aio=threads' -device
'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive
'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
-device
'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
-drive
'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
-device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
-netdev
'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
-device
'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
-rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
'kvm-pit.lost_tick_policy=discard'' failed: got timeout

I would appreciate any suggestions.


Best regards,

SK



From leesteken at protonmail.ch  Mon Jun  8 16:30:57 2020
From: leesteken at protonmail.ch (Arjen)
Date: Mon, 08 Jun 2020 14:30:57 +0000
Subject: [PVE-User] VM Power Issue
In-Reply-To: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
References: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
Message-ID: <R7h16kIpnPJPL0qCzqMqjf9N5NaM_keHxZIcWpPdP-Ix_d2Mo80m0BWnNtekly3yrDm3f3PQilWCC6yNXneyjZeuDmvOkR_-dAELveD3awA=@protonmail.ch>

On Monday, June 8, 2020 10:14 AM, Sivakumar SARAVANAN <sivakumar.saravanan.jv.ext at valeo-siemens.com> wrote:

> Hello,
>
> I am not able to start the VM after adding a PCI device to it.
> I see the error message below.

Maybe your system is very busy? Maybe it takes a while to allocate the memory?
Maybe you could give more information about the VM configuration and your PVE setup?

Can you try running the command below from the command line of your PVE host, to see if it works and how long it takes?
Sometimes (often memory-size related), it just works but takes longer than the time-out.

> TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name
> HIL-System096Planned -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -chardev
> 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon
> 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid
> -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp
> '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot
> 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
> -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu
> 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi'
> -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa
> 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object
> 'memory-backend-ram,id=ram-node1,size=16384M' -numa
> 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device
> 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device
> 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device
> 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device
> 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device
> 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device
> 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device
> 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev
> 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device
> 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device
> 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi
> 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive
> 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device
> 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive
> 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> -device
> 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
> -drive
> 'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
> -netdev
> 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
> -device
> 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
> -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
> 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
>
> I would appreciate any suggestions.
>
> Best regards,
>
> SK
>
> --
> This e-mail message is intended for the internal use of the intended
> recipient(s) only.
> The information contained herein is confidential/privileged. Its disclosure
> or reproduction is strictly prohibited.
> If you are not the intended recipient, please inform the sender
> immediately, do not disclose it internally or to third parties and destroy
> it.
>
> In the course of our business relationship and for business purposes
> only, Valeo may need to process some of your personal data.
> For more information, please refer to the Valeo Data Protection Statement
> and Privacy notice available on Valeo.com
> https://www.valeo.com/en/ethics-and-compliance/#principes

Am I the intended recipient? Otherwise, consider yourself informed immediately. I apologize for disclosing this information on the same mailing-list you sent the original e-mail. Valeo is not allowed to process my personal data, according to the General Data Protection Regulation (GDPR), without prior written consent. Please consider removing such statements when sending email to a (public) mailing list, as it makes it difficult to help you without violating your rules.


From mark at openvs.co.uk  Mon Jun  8 16:38:48 2020
From: mark at openvs.co.uk (Mark Adams)
Date: Mon, 8 Jun 2020 15:38:48 +0100
Subject: [PVE-User] VM Power Issue
In-Reply-To: <mailman.160.1591626669.526.pve-user@pve.proxmox.com>
References: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
 <mailman.160.1591626669.526.pve-user@pve.proxmox.com>
Message-ID: <CAHxUxjD9inPyQM9ainXF+q95WMr=HyvToCxV1-yy0yaaxEoW-g@mail.gmail.com>

Sivakumar - This is a "known issue" as far as I am aware. It usually occurs
when you allocate quite a bit of memory (although 16G is not a lot in your
case, but maybe the server doesn't have much RAM?) while starting a VM with
a PCI device passed through to it. It also only seems to happen when you
are nearing "peak" RAM usage, so getting close to running out. It never
happens on a fresh boot.

I don't know if it has been acknowledged or even reported to Red Hat, or
whether the timeout should simply be longer in Proxmox.

I wrote to this list about it not long ago and never received a response,
and I have seen at least one forum post about it.

Anyway, to cut a long story short, just start it manually on the CLI, which
has no timeout. "qm showcmd VMID | bash" should start it fine, e.g. "qm
showcmd 101 | bash".
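
If you also want to see how long the start actually takes, the same command
can be wrapped in a small function, roughly as sketched here. `start_vm_cli`
is a hypothetical name, not something Proxmox ships; the only Proxmox command
it uses is the `qm showcmd` mentioned above, and the timing is whole seconds
from `date`.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of Proxmox): start a VM from the CLI and
# report how long the generated kvm command took to return.
start_vm_cli() {
    local vmid="$1" begin end rc
    begin=$(date +%s)
    # 'qm showcmd' prints the kvm command line for the VM; piping it to
    # bash runs that command directly instead of going through a start task.
    qm showcmd "$vmid" | bash
    rc=$?
    end=$(date +%s)
    echo "VM ${vmid}: start command exited with status ${rc} after $((end - begin)) s"
    return "$rc"
}

# Example: start_vm_cli 101
```

The reported duration gives a figure to compare against the point at which
the task started from the GUI gives up.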

Regards,
Mark

On Mon, 8 Jun 2020 at 15:31, Arjen via pve-user <pve-user at pve.proxmox.com>
wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Arjen <leesteken at protonmail.ch>
> To: PVE User List <pve-user at pve.proxmox.com>
> Cc:
> Bcc:
> Date: Mon, 08 Jun 2020 14:30:57 +0000
> Subject: Re: [PVE-User] VM Power Issue
> On Monday, June 8, 2020 10:14 AM, Sivakumar SARAVANAN <
> sivakumar.saravanan.jv.ext at valeo-siemens.com> wrote:
>
> > Hello,
> >
> > I am not able to start the VM after adding a PCI device to it.
> > I see the error message below.
>
> Maybe your system is very busy? Maybe it takes a while to allocate the
> memory?
> Maybe you could give more information about the VM configuration and your
> PVE setup?
>
> Can you try running the command below from the command line of your PVE
> host, to see if it works and how long it takes?
> Sometimes (often memory-size related), it just works but takes longer than
> the time-out.
>
> > TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name
> > HIL-System096Planned -chardev
> > 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon
> > 'chardev=qmp,mode=control' -chardev
> > 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon
> > 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid
> > -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp
> > '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot
> > 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
> > -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu
> > 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi'
> > -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa
> > 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object
> > 'memory-backend-ram,id=ram-node1,size=16384M' -numa
> > 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device
> > 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device
> > 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device
> > 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device
> > 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device
> > 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device
> > 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device
> > 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev
> > 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device
> > 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device
> > 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi
> > 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive
> > 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device
> > 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive
> > 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> > -device
> > 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
> > -drive
> > 'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> > -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
> > -netdev
> > 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
> > -device
> > 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
> > -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
> > 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
> >
> > I would appreciate any suggestions.
> >
> > Best regards,
> >
> > SK
> >
> >
> > This e-mail message is intended for the internal use of the intended
> > recipient(s) only.
> > The information contained herein is confidential/privileged. Its
> > disclosure or reproduction is strictly prohibited.
> > If you are not the intended recipient, please inform the sender
> > immediately, do not disclose it internally or to third parties and
> > destroy it.
> >
> > In the course of our business relationship and for business purposes
> > only, Valeo may need to process some of your personal data.
> > For more information, please refer to the Valeo Data Protection
> > Statement and Privacy notice available on Valeo.com
> > https://www.valeo.com/en/ethics-and-compliance/#principes
>
> Am I the intended recipient? Otherwise, consider yourself informed
> immediately. I apologize for disclosing this information on the same
> mailing-list you sent the original e-mail. Valeo is not allowed to process
> my personal data, according to the General Data Protection Regulation
> (GDPR), without prior written consent. Please consider removing such
> statements when sending email to a (public) mailing list, as it makes it
> difficult to help you without violating your rules.
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


From sivakumar.saravanan.jv.ext at valeo-siemens.com  Mon Jun  8 17:15:52 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Mon, 8 Jun 2020 17:15:52 +0200
Subject: [PVE-User] VM Power Issue
In-Reply-To: <mailman.161.1591627230.526.pve-user@pve.proxmox.com>
References: <CAETBTLSrAmmv9kMp9rXboi74AdccnMXSSZVdBgR+32u58-gofA@mail.gmail.com>
 <mailman.160.1591626669.526.pve-user@pve.proxmox.com>
 <mailman.161.1591627230.526.pve-user@pve.proxmox.com>
Message-ID: <CAETBTLR-Mp_OpJjkNbxebRiru=7WtcdS1R0pT011Pf=HnosJoA@mail.gmail.com>

Hello Mark,

Thanks for your support.

It is working fine now.

Best regards

SK


On Mon, Jun 8, 2020 at 4:40 PM Mark Adams via pve-user <
pve-user at pve.proxmox.com> wrote:

>
>
>
> ---------- Forwarded message ----------
> From: Mark Adams <mark at openvs.co.uk>
> To: PVE User List <pve-user at pve.proxmox.com>
> Cc:
> Bcc:
> Date: Mon, 8 Jun 2020 15:38:48 +0100
> Subject: Re: [PVE-User] VM Power Issue
> Sivakumar - This is a "known issue" as far as I am aware, usually when you
> are allocating quite a bit of memory (although 16G is not a lot in your
> case, but maybe the server doesn't have much ram?) when starting a vm with
> a PCI device passed through to it. It also only seems to happen when you
> are nearing "peak" ram usage, so getting close to running out. It never
> happens on a fresh boot.
>
> I don't know if it has been acknowledged or even reported to redhat, or
> whether simply the timeout should be longer in proxmox.
>
> I wrote to this list about it not long ago and never received a response,
> and I have seen at least 1 forum post about it.
>
> Anyway, to cut a long story short, just start it manually on the CLI, which
> has no timeout. "qm showcmd VMID | bash" should start it fine, e.g. "qm
> showcmd 101 | bash".
>
> Regards,
> Mark
>
> On Mon, 8 Jun 2020 at 15:31, Arjen via pve-user <pve-user at pve.proxmox.com>
> wrote:
>
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Arjen <leesteken at protonmail.ch>
> > To: PVE User List <pve-user at pve.proxmox.com>
> > Cc:
> > Bcc:
> > Date: Mon, 08 Jun 2020 14:30:57 +0000
> > Subject: Re: [PVE-User] VM Power Issue
> > On Monday, June 8, 2020 10:14 AM, Sivakumar SARAVANAN <
> > sivakumar.saravanan.jv.ext at valeo-siemens.com> wrote:
> >
> > > Hello,
> > >
> > > I am not able to start the VM after adding the PCI device to VM.
> > > I can see the below error message.
> >
> > Maybe your system is very busy? Maybe it takes a while to allocate the
> > memory?
> > Maybe you could give more information about the VM configuration and your
> > PVE setup?
> >
> > Can you try running the command below from the command line of your PVE
> > host, to see if it works and how long it takes?
> > Sometimes (often memory-size related), it just works but takes longer than
> > the time-out.
> >
> > > TASK ERROR: start failed: command '/usr/bin/kvm -id 175 -name
> > > HIL-System096Planned -chardev
> > > 'socket,id=qmp,path=/var/run/qemu-server/175.qmp,server,nowait' -mon
> > > 'chardev=qmp,mode=control' -chardev
> > > 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon
> > > 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/175.pid
> > > -daemonize -smbios 'type=1,uuid=1ab2409d-4b67-4d3c-822a-7a024d05d9bf' -smp
> > > '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot
> > > 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg'
> > > -vnc unix:/var/run/qemu-server/175.vnc,password -no-hpet -cpu
> > > 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,hv_ipi'
> > > -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa
> > > 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object
> > > 'memory-backend-ram,id=ram-node1,size=16384M' -numa
> > > 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -device
> > > 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device
> > > 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device
> > > 'vmgenid,guid=c98f392f-13af-43d9-b26e-ca070177f6bb' -device
> > > 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device
> > > 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device
> > > 'vfio-pci,host=0000:1b:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device
> > > 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev
> > > 'socket,path=/var/run/qemu-server/175.qga,server,nowait,id=qga0' -device
> > > 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device
> > > 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi
> > > 'initiator-name=iqn.1993-08.org.debian:01:626ca038d6c7' -drive
> > > 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device
> > > 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive
> > > 'file=/dev/zvol/SSD-Storage-PRX018/vm-175-disk-0,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> > > -device
> > > 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100'
> > > -drive
> > > 'file=/dev/zvol/HDD-Storage-PRX018/vm-175-disk-1,if=none,id=drive-virtio1,cache=writethrough,format=raw,aio=threads,detect-zeroes=on'
> > > -device 'virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb'
> > > -netdev
> > > 'type=tap,id=net0,ifname=tap175i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown'
> > > -device
> > > 'e1000,mac=F2:3F:4D:48:7B:68,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
> > > -rtc 'driftfix=slew,base=localtime' -machine 'type=pc+pve1' -global
> > > 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
> > >
> > > Appreciating your suggestion.
> > >
> > > Best regards,
> > >
> > > SK
> > >
> > >



From devzero at web.de  Tue Jun  9 11:12:21 2020
From: devzero at web.de (Roland)
Date: Tue, 9 Jun 2020 11:12:21 +0200
Subject: [PVE-User] zvol vs qcow2 on zfs
Message-ID: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>

Hello,

i'm currently planning a larger migration from xenserver to proxmox.

we want to use a proxmox cluster without shared storage, i.e. local
storage only. zfs is perfect for that.

however, we have found that zvol (the proxmox default) is not optimal for
us, for the following reasons:

- zvols cannot be replicated (on a "per dataset" basis) with pve-zsync.
we want to replicate our datasets to a central backup server.

- other replication tools (e.g. syncoid) won't handle zvols well, i.e.
when a zvol is deleted on the source, it is not deleted on the target.
that is a problem when we "shuffle around" zvols between different
pools/datasets or servers. they would need extra scripting/handling on
the replication target.

- backing up zvols on the replicated server (for example with
borgbackup) is also not straightforward (because they are not files, and
snapshots from a "backupsnap" aren't easily accessible either)

- zvol has known performance issues, e.g.
https://github.com/openzfs/zfs/issues/10095

is anybody using qcow2 on zfs in production at a larger scale, or does
someone want to share their thoughts/experience with using qcow2 on zfs?

regards
roland


From gianni.milo22 at gmail.com  Tue Jun  9 19:12:18 2020
From: gianni.milo22 at gmail.com (Gianni Milo)
Date: Tue, 9 Jun 2020 18:12:18 +0100
Subject: [PVE-User] zvol vs qcow2 on zfs
In-Reply-To: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
Message-ID: <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>

> is anybody using qcow2 on zfs in production at a larger scale or someone
> wants to share his thoughts/experience with using qcow2 on zfs ?


I would not use qcow2 images on a zfs dataset. I would prefer raw images
instead because the overhead is less and you can snapshot the VMs at the
zfs layer which is much faster.

G.



From marco at internet.one  Tue Jun  9 19:46:11 2020
From: marco at internet.one (Marco Bellini)
Date: Tue, 9 Jun 2020 17:46:11 +0000
Subject: [PVE-User] CEPH performance
In-Reply-To: <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>,
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
Message-ID: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>


Dear All, 
I'm trying to use proxmox on a 4-node cluster with ceph.
every node has a 500G NVMe drive, with a dedicated 10G ceph network with a 9000-byte MTU.

despite the warp speed the NVMe reaches when used as an lvm volume, as soon as I convert it into a 4-OSD ceph, performance is very, very poor.

is there any trick to get ceph in proxmox working fast?

thank you everybody for any advice. 


-- .- -.-- / - .... . / ..-. --- .-. -.-. . / -... . / .-- .. - .... / -.-- --- ..- -.-.--

Marco Bellini


From elacunza at binovo.es  Wed Jun 10 08:30:08 2020
From: elacunza at binovo.es (Eneko Lacunza)
Date: Wed, 10 Jun 2020 08:30:08 +0200
Subject: [PVE-User] CEPH performance
In-Reply-To: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
 <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID: <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>

Hi Marco,

On 9/6/20 at 19:46, Marco Bellini wrote:
> Dear All,
> I'm trying to use proxmox on a 4 nodes cluster with ceph.
> every node has a 500G NVME drive, with dedicated 10G ceph network with 9000bytes MTU.
>
> despite off nvme warp speed I can reach when used as lvm volume, as soon as I convert it into a 4-osd ceph, performance are very very poor.
>
> is there any trick to have ceph intro proxmox working fast?
>
What is "very very poor"? What specs do the Proxmox nodes have (CPU, RAM)?

AFAIK, it will be a challenge to get more than 2000 IOPS from one VM
using Ceph...

How are you performing the benchmark?

Cheers

-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es


From mark at openvs.co.uk  Wed Jun 10 08:38:47 2020
From: mark at openvs.co.uk (Mark Adams)
Date: Wed, 10 Jun 2020 07:38:47 +0100
Subject: [PVE-User] CEPH performance
In-Reply-To: <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
 <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
 <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>
Message-ID: <CAHxUxjB4YH3G5dvj5gTjtfVLQ_W3kuk_D1aWMtnSgiatH=mydw@mail.gmail.com>

Another simple thing to do is to make sure you are using writeback
cache in your VMs with ceph. It makes a huge difference in performance.



From aderumier at odiso.com  Wed Jun 10 13:15:20 2020
From: aderumier at odiso.com (Alexandre DERUMIER)
Date: Wed, 10 Jun 2020 13:15:20 +0200 (CEST)
Subject: [PVE-User] CEPH performance
In-Reply-To: <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
 <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
 <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>
Message-ID: <1188207075.1092058.1591787720490.JavaMail.zimbra@odiso.com>

>>AFAIK, it will be a challenge to get more that 2000 IOPS from one VM
>>using Ceph...

With iodepth=1 (single queue), you are indeed bound by latency, and you shouldn't expect to reach more than 4000-5000 iops.
(It depends mainly on CPU frequency on the client, CPU frequency on the cluster, and network latency.)

But with more parallel reads/writes, you should be able to reach 70,000-80,000 iops per disk without any problem.
(If you need more, you can use multiple disks with iothreads; I was able to scale up to 500,000-600,000 iops with 5-6 disks.)
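
As a rough illustration of why queue depth matters so much here: if each I/O takes a roughly fixed round-trip time, the achievable IOPS is about the number of in-flight requests divided by that latency (Little's law). The sketch below assumes a per-I/O latency of 0.22 ms, a hypothetical figure picked only so that iodepth=1 lands in the 4000-5000 iops range mentioned above. It is a linear upper bound; a real cluster saturates well before the estimate at high queue depths.

```python
# Rough estimate only: IOPS ~= in-flight requests / per-request latency.
# ASSUMED_LATENCY_S is a hypothetical value, not a measurement from this thread.
ASSUMED_LATENCY_S = 0.00022  # 0.22 ms per 4k I/O round trip (assumption)


def estimated_iops(iodepth: int, latency_s: float = ASSUMED_LATENCY_S) -> float:
    """Linear upper bound on IOPS for a given queue depth, ignoring saturation."""
    if iodepth < 1 or latency_s <= 0:
        raise ValueError("iodepth must be >= 1 and latency must be positive")
    return iodepth / latency_s


if __name__ == "__main__":
    for depth in (1, 16, 64):
        print(f"iodepth={depth:>2}: ~{estimated_iops(depth):,.0f} iops (linear estimate)")
```

The point of the sketch is only that a single outstanding request is limited by latency, while parallel requests hide it.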


Depending on your workload, you can enable writeback; it improves the performance of sequential writes of small coalesced blocks.
(It regroups them into bigger blocks before sending them to ceph.)
But currently (nautilus), enabling writeback slows down reads.

With octopus (currently in testing: http://download.proxmox.com/debian/ceph-octopus/dists/buster/test/),
this is solved, and you can always enable writeback.

Octopus also has other optimisations, and writeback is also able to regroup random, non-coalesced blocks.

See my last benchmarks:

"
Here are some iops results with 1 VM, 1 disk, 4k blocks, iodepth=64, librbd, no iothread.


               nautilus-cache=none   nautilus-cache=writeback   octopus-cache=none   octopus-cache=writeback

randread 4k    62.1k                 25.2k                      61.1k                60.8k
randwrite 4k   27.7k                 19.5k                      34.5k                53.0k
seqwrite 4k    7850                  37.5k                      24.9k                82.6k
"

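To make the effect of writeback easier to read off the benchmark figures above, the small sketch below divides the cache=writeback result by the cache=none result for each test and release. The numbers are copied from the table (with "k" expanded to thousands); the function and dictionary names are only illustrative.

```python
# IOPS figures copied from the benchmark table above
# (1 VM, 1 disk, 4k blocks, iodepth=64), as (cache=none, cache=writeback).
RESULTS = {
    "randread 4k": {"nautilus": (62100, 25200), "octopus": (61100, 60800)},
    "randwrite 4k": {"nautilus": (27700, 19500), "octopus": (34500, 53000)},
    "seqwrite 4k": {"nautilus": (7850, 37500), "octopus": (24900, 82600)},
}


def writeback_ratio(test: str, release: str) -> float:
    """Return IOPS with cache=writeback divided by IOPS with cache=none."""
    cache_none, cache_writeback = RESULTS[test][release]
    return cache_writeback / cache_none


if __name__ == "__main__":
    for test, per_release in RESULTS.items():
        for release in per_release:
            ratio = writeback_ratio(test, release)
            print(f"{test:<13} {release:<9} writeback/none = {ratio:.2f}x")
```

A ratio below 1 means writeback made that test slower, which is what the nautilus random-read figures show.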



From jm at ginernet.com  Wed Jun 10 16:24:25 2020
From: jm at ginernet.com (José Manuel Giner)
Date: Wed, 10 Jun 2020 16:24:25 +0200
Subject: [PVE-User] CEPH performance
In-Reply-To: <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
 <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
Message-ID: <18faea52-9ae9-06b8-8a7f-d9344c432d04@ginernet.com>

Note that with only a 10 Gbps network, you will get only about 1 GB/s, which is
only 25-30% of the performance of an NVMe drive.

To get 100% of the performance of an NVMe drive you need at least a 40G network.
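
The arithmetic behind that estimate can be sketched as follows. The NVMe throughput of 3.5 GB/s is an assumed, typical sequential figure for a PCIe 3.0 x4 drive, not a measurement from this thread, and the link figure is the raw line rate: protocol overhead and Ceph replication traffic are ignored, so real numbers will be lower.

```python
# ASSUMED_NVME_GB_PER_S is a hypothetical drive throughput, not taken from the thread.
ASSUMED_NVME_GB_PER_S = 3.5


def link_gb_per_s(link_gbit_per_s: float) -> float:
    """Convert a raw link speed in Gbit/s to GB/s (8 bits per byte, no overhead)."""
    return link_gbit_per_s / 8.0


def fraction_of_nvme(link_gbit_per_s: float,
                     nvme_gb_per_s: float = ASSUMED_NVME_GB_PER_S) -> float:
    """Fraction of the assumed NVMe throughput that the link can carry."""
    return link_gb_per_s(link_gbit_per_s) / nvme_gb_per_s


if __name__ == "__main__":
    for speed in (10, 25, 40):
        print(f"{speed}G link: {link_gb_per_s(speed):.2f} GB/s raw, "
              f"{fraction_of_nvme(speed):.0%} of the assumed NVMe throughput")
```

Under these assumptions a 10G link carries roughly a third of what the drive can deliver, while a 40G link exceeds it.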



-- 
José Manuel Giner
https://ginernet.com


From sivakumar.saravanan.jv.ext at valeo-siemens.com  Wed Jun 10 17:42:19 2020
From: sivakumar.saravanan.jv.ext at valeo-siemens.com (Sivakumar SARAVANAN)
Date: Wed, 10 Jun 2020 17:42:19 +0200
Subject: [PVE-User] New host issue while adding to cluster
Message-ID: <CAETBTLQy0h=1XZ1n8cOG+x9c2cdX6M0Rky9D19QxXxaB2QxMew@mail.gmail.com>

Hello,

All hosts and the Datacenter itself became unavailable after adding the new
host to the cluster.
What could be the reason?


Best regards

SK



From lindsay.mathieson at gmail.com  Tue Jun 16 15:51:17 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 16 Jun 2020 23:51:17 +1000
Subject: [PVE-User] CEPH performance
In-Reply-To: <mailman.179.1591771201.526.pve-user@pve.proxmox.com>
References: <1efde5a3-70f1-26b6-8c32-5a53b8602b43@web.de>
 <CACzVk9WK0Ur_sVY-oqqyyrigM+BUoPMp5kcfiK_T05Ce6v4wtw@mail.gmail.com>
 <4a73d7f96c0a4ebcbd5bc0650a964a87@internet.one>
 <c556deca-2d81-ee0e-96df-14751494b5d7@binovo.es>
 <mailman.179.1591771201.526.pve-user@pve.proxmox.com>
Message-ID: <614bc869-beb6-dd69-09ce-bfe2acca4b8a@gmail.com>

On 10/06/2020 4:38 pm, Mark Adams via pve-user wrote:
> The simplest thing to set also is to make sure you are using writeback
> cache in your vms with ceph. It makes a huge difference in performance.


Chiming in - doing some testing with a 5 node ceph/proxmox cluster here. 
Basic spinners and 4*1G eth, LACP:tcp-balance.

enabling KRBD on the ceph pool made a huge difference - I presume that 
uses the rbd kernel driver?

-- 
Lindsay


From lindsay.mathieson at gmail.com  Tue Jun 16 16:00:10 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 00:00:10 +1000
Subject: [PVE-User] Kudo's to the Proxmox team for their ceph integration
Message-ID: <a7b309fa-9f6e-269d-b571-8b8b5c7df027@gmail.com>

Have been revisiting ceph after trialing it several years back (ok,
but a headache to manage and performance sucked on the limited hardware
we had).


Wow, you've really put a lot of effort into integrating it into proxmox; that
UI makes the setup and monitoring so easy. Outstanding work. And the
Nautilus features add two key things I really like about zfs -
transparent compression and checksumming. Bluestore does seem to have
much better performance.


Seems pretty solid too. Due to my sleep-deprived state, I managed to
crash/hard reboot the entire cluster *twice* today, but ceph recovered
flawlessly with no loss both times, and HA brought up my critical VMs
with no intervention (pfSense router, AD and SQL Server).


Thanks!

-- 
Lindsay


From lindsay.mathieson at gmail.com  Tue Jun 16 16:11:56 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 00:11:56 +1000
Subject: [PVE-User] Kudo's to the Proxmox team for their ceph integration
In-Reply-To: <a7b309fa-9f6e-269d-b571-8b8b5c7df027@gmail.com>
References: <a7b309fa-9f6e-269d-b571-8b8b5c7df027@gmail.com>
Message-ID: <8c85ec0a-62a5-78e6-e223-088cacceaede@gmail.com>

Oh, and the ZFS boot options in the installer are pretty slick too. It
was very flaky for me when it first came out, but seems rock solid now.
Set up a new server with two SSDs in raid1, no issues.


-- 
Lindsay


From lindsay.mathieson at gmail.com  Wed Jun 17 02:16:09 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 17 Jun 2020 10:16:09 +1000
Subject: [PVE-User] Ceph Storage Status Page
Message-ID: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>

Is the "Usage" total displayed there the Actual Size / replication 
factor? Because my total is 5TB when I have 16TB of disk space.


Which makes sense and is more useful.
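
As a rough check of that guess, assuming a replicated pool with size=3 (an assumption; the replica count is not stated here): usable capacity is approximately raw capacity divided by the number of replicas. The function name is only illustrative, and the sketch ignores compression, overhead and uneven OSD fill.

```python
def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Approximate usable capacity of a replicated pool: raw space / replica count."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return raw_tb / replicas


if __name__ == "__main__":
    raw = 16.0  # TB of raw disk space, as mentioned above
    for size in (2, 3):  # candidate replica counts (assumptions)
        print(f"size={size}: ~{usable_capacity_tb(raw, size):.2f} TB usable")
```

With 16TB raw and three replicas this gives a little over 5TB, which would be consistent with the total shown.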

-- 
Lindsay


From alex at calicolabs.com  Wed Jun 17 03:09:17 2020
From: alex at calicolabs.com (Alex Chekholko)
Date: Tue, 16 Jun 2020 18:09:17 -0700
Subject: [PVE-User] Ceph Storage Status Page
In-Reply-To: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
References: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
Message-ID: <CANcy_PZ-XQoJ9QHJ4-=Yh7SzukTdn9OwBcBwraPbZ0T-KdCaeA@mail.gmail.com>

Maybe compare it to the output of "ceph -s" on the CLI.  You can click
"Shell" in the upper right WebUI view.



From lindsay.mathieson at gmail.com  Thu Jun 18 02:34:41 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 18 Jun 2020 10:34:41 +1000
Subject: [PVE-User] Ceph Storage Status Page
In-Reply-To: <mailman.7.1592356231.538.pve-user@pve.proxmox.com>
References: <07d205c3-d462-8e8a-f8ed-dce8d33961ec@gmail.com>
 <mailman.7.1592356231.538.pve-user@pve.proxmox.com>
Message-ID: <f118b8d2-d3a3-ad89-93d2-3de547861a69@gmail.com>

On 17/06/2020 11:09 am, Alex Chekholko via pve-user wrote:
> Maybe compare it to the output of "ceph -s" on the CLI.  You can click
> "Shell" in the upper right WebUI view.


Thanks, yeah, I'm familiar with that, just curious as to what Proxmox is
displaying. "ceph -s" shows:

    7.9 TiB used, 23 TiB / 30 TiB avail


Whereas Proxmox shows:

    3.33TiB of 8.59TiB


I do have lz4 compression on though. Maybe that's skewing the figures.

-- 
Lindsay


From lindsay.mathieson at gmail.com  Thu Jun 18 13:30:38 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Thu, 18 Jun 2020 21:30:38 +1000
Subject: [PVE-User] Enabling telemetry broke all my ceph managers
Message-ID: <a6481a31-5d59-c13c-dea2-5367842c21e7@gmail.com>

Clean Nautilus install I set up last week:

  * 5 Proxmox nodes
      o All on latest updates via no-subscription channel
  * 18 OSDs
  * 3 Managers
  * 3 Monitors
  * Cluster health good
  * In a protracted rebalance phase
  * All managed via Proxmox

I thought I would enable telemetry for Ceph as per this article:

https://docs.ceph.com/docs/master/mgr/telemetry/


  * Enabled the module (command line)
  * ceph telemetry on
  * Tested getting the status
  * Set the contact and description
    ceph config set mgr mgr/telemetry/contact 'John Doe <john.doe@example.com>'
    ceph config set mgr mgr/telemetry/description 'My first Ceph cluster'
    ceph config set mgr mgr/telemetry/channel_ident true
  * Tried sending it
    ceph telemetry send
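
For anyone retracing the first steps above, a cautious sketch of the opt-in sequence. The exact commands used for "enabled the module" and "tested getting the status" are not given in this message, so the ones below are assumptions taken from the Ceph CLI. The run wrapper and enable_telemetry are hypothetical helpers; by default they only print what would be executed.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
# Set DRY_RUN=0 on a node with a working ceph CLI to run them for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_telemetry() {
    run ceph mgr module enable telemetry   # load the mgr module
    run ceph telemetry show                # preview the report before opting in
    run ceph telemetry on                  # opt in to periodic reports
    run ceph telemetry status              # check the current settings
}

enable_telemetry
```

Previewing with "ceph telemetry show" before switching it on at least shows what would be sent; it would not necessarily have avoided the crash described below.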

I *think* this is when the managers died, but it could have been
earlier. Around then all Ceph IO stopped and I discovered all
three managers had crashed and would not restart. I was shitting myself
because this was remote and the router is a pfSense VM :) Fortunately it
kept going without its disk responding.

systemctl start ceph-mgr@vni.service
Job for ceph-mgr@vni.service failed because the control process exited with error code.
See "systemctl status ceph-mgr@vni.service" and "journalctl -xe" for details.

From journalctl -xe:

    -- The unit ceph-mgr@vni.service has entered the 'failed' state with result 'exit-code'.
    Jun 18 21:02:25 vni systemd[1]: Failed to start Ceph cluster manager daemon.
    -- Subject: A start job for unit ceph-mgr@vni.service has failed
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- A start job for unit ceph-mgr@vni.service has finished with a failure.
    --
    -- The job identifier is 91690 and the job result is failed.


From systemctl status ceph-mgr@vni.service:

ceph-mgr@vni.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mgr@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: exit-code) since Thu 2020-06-18 20:53:52 AEST; 8min ago
  Process: 415566 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id vni --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
 Main PID: 415566 (code=exited, status=1/FAILURE)

Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Service RestartSec=10s expired, scheduling restart.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Scheduled restart job, restart counter is at 4.
Jun 18 20:53:52 vni systemd[1]: Stopped Ceph cluster manager daemon.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Start request repeated too quickly.
Jun 18 20:53:52 vni systemd[1]: ceph-mgr@vni.service: Failed with result 'exit-code'.
Jun 18 20:53:52 vni systemd[1]: Failed to start Ceph cluster manager daemon.

I created a new manager service on an unused node and fortunately that 
worked. I deleted/recreated the old managers and they started working. 
It was a sweaty few minutes :)
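
For reference, a sketch of how that recovery might look from the command line on PVE 6. This message does not say whether the GUI or the CLI was used, so the pveceph calls are an assumption; "vni" is the node name from the log above, and run and recreate_mgr are hypothetical helpers that only print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recreate_mgr NODE: inspect, reset and recreate the manager on NODE.
# Meant to be run on NODE itself, since "pveceph mgr create" acts locally.
recreate_mgr() {
    node="$1"
    run journalctl -u "ceph-mgr@${node}.service" --no-pager -n 50  # real error, if logged
    run systemctl reset-failed "ceph-mgr@${node}.service"          # clear the start rate limit
    run pveceph mgr destroy "$node"
    run pveceph mgr create
}

recreate_mgr vni
```

The reset-failed step is there because of the "Start request repeated too quickly" line: systemd stops retrying once the restart limit is hit, so a later manual start needs that counter cleared.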


Everything resumed without a hiccup after that, impressed. Not game to 
try and reproduce it though.


-- 
Lindsay


From brians at iptel.co  Fri Jun 19 00:06:40 2020
From: brians at iptel.co (Brian :)
Date: Thu, 18 Jun 2020 23:06:40 +0100
Subject: [PVE-User] Enabling telemetry broke all my ceph managers
In-Reply-To: <a6481a31-5d59-c13c-dea2-5367842c21e7@gmail.com>
References: <a6481a31-5d59-c13c-dea2-5367842c21e7@gmail.com>
Message-ID: <CAGPQfi_xwebe=MeekoDhoLN1s30BKX9cDdiEdJVLFvvQZH733Q@mail.gmail.com>

Nice save. And thanks for the detailed info.



From rommelrt at nauta.cu  Fri Jun 19 17:56:27 2020
From: rommelrt at nauta.cu (Rommel Rodriguez Toirac)
Date: Fri, 19 Jun 2020 11:56:27 -0400
Subject: [PVE-User] Where to find CentOS 8 template
Message-ID: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>

Hello all;

Does anyone know of a place to get a CentOS 8 container template similar to the
ones at https://download.openvz.org/template/precreated/ for CentOS 7
(centos-7-x86_64-minimal.tar.gz)?

https://download.openvz.org/template/precreated/

https://download.openvz.org/template/precreated/centos-7-x86_64-minimal.tar.gz


-- 
Rommel Rodriguez Toirac
rommelrt at nauta.cu

From leesteken at protonmail.ch  Fri Jun 19 18:03:48 2020
From: leesteken at protonmail.ch (Arjen)
Date: Fri, 19 Jun 2020 16:03:48 +0000
Subject: [PVE-User] Where to find CentOS 8 template
In-Reply-To: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>
References: <75DEC247-E483-4404-B90E-CC32B8310261@nauta.cu>
Message-ID: <gVkc8IinAqCdQkO-rp5l1N8HrqtQaw_cAk1L-pMSr2nmmayICHX2eFJUcpJ-cJXgGLNNWL5xM_5kI_gIm0gM1Yrg2svyMxuLRviA_Xb6koI=@protonmail.ch>

On Friday, June 19, 2020 5:56 PM, Rommel Rodriguez Toirac <rommelrt at nauta.cu> wrote:

> Hello all;
>
> Does anyone know of a place to get a CentOS 8 container file similar to the
> ones at https://download.openvz.org/template/precreated/ for CentOS 7
> (centos-7-x86_64-minimal.tar.gz)
>
> https://download.openvz.org/template/precreated/
>
> https://download.openvz.org/template/precreated/centos-7-x86_64-minimal.tar.gz

If you run /usr/bin/pveam update, I believe you should be able to download "centos-8-default (20191016)" using the Templates button on a storage (in the Proxmox WebGUI) that has container templates enabled.
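
The same thing can probably be done from the shell. A small sketch, assuming a storage named "local" with container templates enabled; pick_template is a hypothetical helper, and the sample line fed to it below is made up to mimic the two-column layout of "pveam available".

```shell
#!/bin/sh
# pick_template PATTERN: read "pveam available" output on stdin and print
# the first template name (second column) that matches PATTERN.
pick_template() {
    awk -v pat="$1" '$2 ~ pat { print $2; exit }'
}

# On a Proxmox node (not executed here):
#   pveam update
#   tmpl=$(pveam available --section system | pick_template '^centos-8')
#   pveam download local "$tmpl"     # "local" is an assumed storage name

# Demonstration with a made-up line in the same layout:
printf 'system  centos-8-default_EXAMPLE_amd64.tar.xz\n' | pick_template '^centos-8'
```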

Is this not working for you? If so, which version of Proxmox do you use?
Or is that template not what you are looking for? Or did I understand your question wrong?

kind regards, Arjen


From atokovenko at gmail.com  Mon Jun 22 21:57:31 2020
From: atokovenko at gmail.com (Oleksii Tokovenko)
Date: Mon, 22 Jun 2020 22:57:31 +0300
Subject: [PVE-User] pve-user Digest, Vol 147, Issue 10
In-Reply-To: <mailman.5.1592560801.25744.pve-user@pve.proxmox.com>
References: <mailman.5.1592560801.25744.pve-user@pve.proxmox.com>
Message-ID: <CAC=j=pT9bn6n-5hho2TFeNBW88aQi6nrshnMLt02LtstLvEjsA@mail.gmail.com>

unsubscribe



-- 
Best regards,
Oleksii Tokovenko


From thomas.naumann at ovgu.de  Fri Jun 26 09:51:57 2020
From: thomas.naumann at ovgu.de (Naumann, Thomas)
Date: Fri, 26 Jun 2020 07:51:57 +0000
Subject: [PVE-User] osd init authentication failed: (1) Operation not
 permitted
Message-ID: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>

Hi,

In our production cluster (Proxmox 5.4, Ceph 12.2) there has been an issue
since yesterday. After an increase of a pool, 5 OSDs do not start;
their status is "down/in". Ceph health: HEALTH_WARN nodown,noout flag(s) set,
5 osds down, 128 osds: 123 up, 128 in.

Last lines of the OSD log file:
2020-06-26 08:40:26.240005 7f6d245fff80  1 freelist init
2020-06-26 08:40:26.243779 7f6d245fff80  1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc opening allocation metadata
2020-06-26 08:40:26.251501 7f6d245fff80  1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc loaded 3.47TiB in 1 extents
2020-06-26 08:40:26.253058 7f6d245fff80  0 <cls> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2020-06-26 08:40:26.253309 7f6d245fff80  0 _get_class not permitted to load sdk
2020-06-26 08:40:26.256486 7f6d245fff80  0 _get_class not permitted to load kvs
2020-06-26 08:40:26.256611 7f6d245fff80  0 <cls> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/hello/cls_hello.cc:296: loading cls_hello
2020-06-26 08:40:26.258362 7f6d245fff80  0 _get_class not permitted to load lua
2020-06-26 08:40:26.259850 7f6d245fff80  0 osd.45 46770 crush map has features 288514051259236352, adjusting msgr requires for clients
2020-06-26 08:40:26.259859 7f6d245fff80  0 osd.45 46770 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2020-06-26 08:40:26.259863 7f6d245fff80  0 osd.45 46770 crush map has features 1009089991638532096, adjusting msgr requires for osds
2020-06-26 08:40:26.305880 7f6d245fff80  0 osd.45 46770 load_pgs
2020-06-26 08:40:28.024638 7f6d245fff80  0 osd.45 46770 load_pgs opened 129 pgs
2020-06-26 08:40:28.024803 7f6d245fff80  0 osd.45 46770 using weightedpriority op queue with priority op cut off at 64.
2020-06-26 08:40:28.025741 7f6d245fff80 -1 osd.45 46770 log_to_monitors {default=true}
2020-06-26 08:40:28.028397 7f6d245fff80 -1 osd.45 46770 init authentication failed: (1) Operation not permitted

Does anyone know how to fix this?
-- 
Thomas Naumann

Abteilung Netze und Kommunikation
Otto-von-Guericke Universität Magdeburg
Universitätsrechenzentrum
Universitätsplatz 2
39106 Magdeburg

fon: +49 391 67-58563
email: thomas.naumann at ovgu.de

From a.antreich at proxmox.com  Mon Jun 29 10:36:51 2020
From: a.antreich at proxmox.com (Alwin Antreich)
Date: Mon, 29 Jun 2020 10:36:51 +0200
Subject: [PVE-User] osd init authentication failed: (1) Operation not
 permitted
In-Reply-To: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
Message-ID: <20200629083651.GB1554173@dona.proxmox.com>

Hello Thomas,

On Fri, Jun 26, 2020 at 07:51:57AM +0000, Naumann, Thomas wrote:
> Hi,
> 
> in our production cluster (proxmox 5.4, ceph 12.2) there is an issue
> since yesterday. after an increase of a pool 5 OSDs do not start,
> status is "down/in", ceph health: HEALTH_WARN nodown,noout flag(s) set,
> 5 osds down, 128 osds: 123 up, 128 in.
> 
> last lines of OSD-logfile:
> 2020-06-26 08:40:26.240005 7f6d245fff80  1 freelist init
> 2020-06-26 08:40:26.243779 7f6d245fff80  1
> bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc opening allocation
> metadata
> 2020-06-26 08:40:26.251501 7f6d245fff80  1
> bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc loaded 3.47TiB in 1
> extents
> 2020-06-26 08:40:26.253058 7f6d245fff80  0 <cls>
> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/cephfs/cls_cephfs.cc:197:
> loading cephfs
> 2020-06-26 08:40:26.253309 7f6d245fff80  0 _get_class not permitted to
> load sdk
> 2020-06-26 08:40:26.256486 7f6d245fff80  0 _get_class not permitted to
> load kvs
> 2020-06-26 08:40:26.256611 7f6d245fff80  0 <cls>
> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/hello/cls_hello.cc:296: loading
> cls_hello
> 2020-06-26 08:40:26.258362 7f6d245fff80  0 _get_class not permitted to
> load lua
> 2020-06-26 08:40:26.259850 7f6d245fff80  0 osd.45 46770 crush map has
> features 288514051259236352, adjusting msgr requires for clients
> 2020-06-26 08:40:26.259859 7f6d245fff80  0 osd.45 46770 crush map has
> features 288514051259236352 was 8705, adjusting msgr requires for mons
> 2020-06-26 08:40:26.259863 7f6d245fff80  0 osd.45 46770 crush map has
> features 1009089991638532096, adjusting msgr requires for osds
> 2020-06-26 08:40:26.305880 7f6d245fff80  0 osd.45 46770 load_pgs
> 2020-06-26 08:40:28.024638 7f6d245fff80  0 osd.45 46770 load_pgs opened
> 129 pgs
> 2020-06-26 08:40:28.024803 7f6d245fff80  0 osd.45 46770 using
> weightedpriority op queue with priority op cut off at 64.
> 2020-06-26 08:40:28.025741 7f6d245fff80 -1 osd.45 46770 log_to_monitors
> {default=true}
> 2020-06-26 08:40:28.028397 7f6d245fff80 -1 osd.45 46770 init
> authentication failed: (1) Operation not permitted
> 
> Does anyone know how to fix this?
Are those OSDs on the same host? What is the current status of the
cluster?

--
Cheers,
Alwin


From thomas.naumann at ovgu.de  Mon Jun 29 13:23:31 2020
From: thomas.naumann at ovgu.de (Naumann, Thomas)
Date: Mon, 29 Jun 2020 11:23:31 +0000
Subject: [PVE-User] osd init authentication failed: (1) Operation not
 permitted
In-Reply-To: <20200629083651.GB1554173@dona.proxmox.com>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
 <20200629083651.GB1554173@dona.proxmox.com>
Message-ID: <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>

Hi Alwin,

Yes, all the OSDs that did not start were on the same physical cluster node,
and all running VMs on the cluster were dead because of missing objects.

The problem was that those OSDs did not have an entry in "ceph auth list",
so manually adding them solved the problem:

    ceph auth add osd.X osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-X/keyring

After that, starting the systemd service for each OSD was successful.
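
For several OSDs this can be wrapped in a loop. A sketch only: run and restore_osd_auth are hypothetical helpers, the ceph-osd@ID unit name is the usual systemd naming, and osd.45 is the one ID that appears in the log excerpt; the other affected IDs would need to be filled in. By default the commands are only printed.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_osd_auth ID...: re-register each OSD's key with the monitors from
# its on-disk keyring, then start the OSD service.
# ("ceph auth get osd.ID" shows whether an entry already exists.)
restore_osd_auth() {
    for id in "$@"; do
        run ceph auth add "osd.${id}" osd 'allow *' mon 'allow profile osd' \
            mgr 'allow profile osd' -i "/var/lib/ceph/osd/ceph-${id}/keyring"
        run systemctl start "ceph-osd@${id}.service"
    done
}

restore_osd_auth 45
```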

So far I have not found anything related in the log files on the cluster node.
Any hint that helps demystify the cluster's behavior is welcome...
-- 
Thomas Naumann

Abteilung Netze und Kommunikation
Otto-von-Guericke Universität Magdeburg
Universitätsrechenzentrum
Universitätsplatz 2
39106 Magdeburg

fon: +49 391 67-58563
email: thomas.naumann at ovgu.de


From a.antreich at proxmox.com  Mon Jun 29 14:35:21 2020
From: a.antreich at proxmox.com (Alwin Antreich)
Date: Mon, 29 Jun 2020 14:35:21 +0200
Subject: [PVE-User] osd init authentication failed: (1) Operation not
 permitted
In-Reply-To: <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>
References: <853a4b17e0ba833ecb4327274f9c3ecb7c784bf7.camel@ovgu.de>
 <20200629083651.GB1554173@dona.proxmox.com>
 <4e6d946ab3c8395f6912e8c47a7bfe1f0fe0e839.camel@ovgu.de>
Message-ID: <20200629123521.GA81113@dona.proxmox.com>

On Mon, Jun 29, 2020 at 11:23:31AM +0000, Naumann, Thomas wrote:
> Hi Alwin,
> 
> yes, all OSDs, which did not start, were on same physical clusternode
> and all running VMs on cluster were dead because of missing objects.
> 
> Problem was that those OSDs did not have an entry in "ceph auth list",
> so manually adding the OSDs (ceph auth add osd.X osd 'allow *' mon
> 'allow profile osd' mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-
> X/keyring) solved the problem.
> 
> After that starting systemd-service for each OSD war successful.
> 
> Until now I did not find anything related in logfiles on clusternode.
> Any hint to demystifying the cluster behavior is welcome...
Besides correlating the log files, not really. ;)

Not certain, but this issue could also be at play:
https://bugzilla.proxmox.com/show_bug.cgi?id=2053

--
Cheers,
Alwin


From lindsay.mathieson at gmail.com  Mon Jun 29 16:07:40 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 30 Jun 2020 00:07:40 +1000
Subject: [PVE-User] Ceph Bluestore - lvmcache versus WAL/DB on SSD
Message-ID: <c180d6a9-59e2-21ab-a5aa-96577b038aea@gmail.com>

As per the title :) I have 23 OSD spinners on 5 hosts, with Data+WAL+DB all
on the disk. All VMs are Windows, running with writeback cache.
Performance is adequate, but I see occasional high IO loads that make the
VMs sluggish.

With a lot of work I could move the WAL/DB to separate SSD partitions,
or use lvmcache, which looks to be more transparent to set up.


TBH, write performance could be better, as could IOPS :) The Ethernet
connections never come close to being saturated, so I'm guessing the write
speed of the disks is the limiting factor.


I'm tending towards separate WAL/DB devices as I prefer to work within 
the recommended usages of projects such as Ceph these days, rather than 
trying to outguess their design parameters.
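
For the separate WAL/DB route, the "lot of work" is roughly one destroy-and-recreate cycle per OSD. A sketch of a single cycle on PVE 6, under stated assumptions: OSD ID 0 and the device names /dev/sdX and /dev/sdY are placeholders, run and move_db_to_ssd are hypothetical helpers, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# move_db_to_ssd OSD_ID DATA_DEV DB_DEV: recreate one OSD with its DB on DB_DEV.
# This does not wait for the rebalance; in practice re-check safe-to-destroy
# until it reports the OSD as safe before going any further.
move_db_to_ssd() {
    osd_id="$1"; data_dev="$2"; db_dev="$3"
    run ceph osd out "$osd_id"                       # drain the OSD
    run ceph osd safe-to-destroy "osd.${osd_id}"     # check that data is safe elsewhere
    run systemctl stop "ceph-osd@${osd_id}.service"
    run pveceph osd destroy "$osd_id" --cleanup
    run pveceph osd create "$data_dev" --db_dev "$db_dev"
}

move_db_to_ssd 0 /dev/sdX /dev/sdY
```

When only a DB device is given, BlueStore places the WAL alongside the DB, so a separate WAL partition is usually not needed.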


Any experiences either way on the list?

-- 
Lindsay


From jameslipski at protonmail.com  Tue Jun 30 03:08:49 2020
From: jameslipski at protonmail.com (jameslipski)
Date: Tue, 30 Jun 2020 01:08:49 +0000
Subject: High I/O waits, not sure if it's a ceph issue.
Message-ID: <4GE-3ImIaZ3ujQiKYpuwovUyhUEwt8m_ZZAcH3haKt6ly27BvzznK1BgWvt5-T7tM9X3_79u6PcdPDIpxrfhcXh6bDvfuE07B5f8dSrvBDw=@protonmail.com>

Greetings,

I'm trying out PVE. Currently I'm just doing tests, and I ran into an issue relating to high I/O waits.

To give a little background: we currently have 6 nodes. We're running Ceph, and each node has
2 OSDs (2x Intel SSDSC2KG019T8 per node). The OSD type is BlueStore. The global Ceph configuration (at least as shown in the Proxmox interface) is as follows:

[global]
auth_client_required = xxxx
auth_cluster_required = xxxx
auth_service_required = xxxx
cluster_network = 10.125.0.0/24
fsid = f64d2a67-98c3-4dbc-abfd-906ea7aaf314
mon_allow_pool_delete = true
mon_host = 10.125.0.101 10.125.0.102 10.125.0.103 10.125.0.105 10.125.0.106 10.125.0.104
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.125.0.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

If I'm missing any relevant information relating to my ceph setup (I'm still learning this), please let me know.

Each node consists of 2x Xeon E5-2660 v3. Where I ran into high I/O waits is when running 2 VMs. 1 VM is a mysql replication server (using 8 cores), and is performing mostly writes. The second VM is running Debian with Cacti. Both of these systems are on 2 different nodes but uses CEPH to store the vm-hd. When I copied files over the network to the VM running Cacti, I've noticed high I/O waits in my mysql VM.

I'm assuming this has something to do with Ceph, though the only thing I'm seeing in the Ceph logs is the following:

02:43:01.062082 mgr.node01 (mgr.2914449) 8009571 : cluster [DBG] pgmap v8009574: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.4 MiB/s wr, 274 op/s
02:43:03.063137 mgr.node01 (mgr.2914449) 8009572 : cluster [DBG] pgmap v8009575: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 3.0 MiB/s wr, 380 op/s
02:43:05.064125 mgr.node01 (mgr.2914449) 8009573 : cluster [DBG] pgmap v8009576: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.9 MiB/s wr, 332 op/s
02:43:07.065373 mgr.node01 (mgr.2914449) 8009574 : cluster [DBG] pgmap v8009577: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.7 MiB/s wr, 313 op/s
02:43:09.066210 mgr.node01 (mgr.2914449) 8009575 : cluster [DBG] pgmap v8009578: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 2.9 MiB/s wr, 350 op/s
02:43:11.066913 mgr.node01 (mgr.2914449) 8009576 : cluster [DBG] pgmap v8009579: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.1 MiB/s wr, 346 op/s
02:43:13.067926 mgr.node01 (mgr.2914449) 8009577 : cluster [DBG] pgmap v8009580: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.5 MiB/s wr, 408 op/s
02:43:15.068834 mgr.node01 (mgr.2914449) 8009578 : cluster [DBG] pgmap v8009581: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.0 MiB/s wr, 320 op/s
02:43:17.069627 mgr.node01 (mgr.2914449) 8009579 : cluster [DBG] pgmap v8009582: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 2.5 MiB/s wr, 285 op/s
02:43:19.070507 mgr.node01 (mgr.2914449) 8009580 : cluster [DBG] pgmap v8009583: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 341 B/s rd, 3.0 MiB/s wr, 349 op/s
02:43:21.071241 mgr.node01 (mgr.2914449) 8009581 : cluster [DBG] pgmap v8009584: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 0 B/s rd, 2.8 MiB/s wr, 319 op/s
02:43:23.072286 mgr.node01 (mgr.2914449) 8009582 : cluster [DBG] pgmap v8009585: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.7 MiB/s wr, 329 op/s
02:43:25.073369 mgr.node01 (mgr.2914449) 8009583 : cluster [DBG] pgmap v8009586: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.8 MiB/s wr, 304 op/s
02:43:27.074315 mgr.node01 (mgr.2914449) 8009584 : cluster [DBG] pgmap v8009587: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 2.2 MiB/s wr, 262 op/s
02:43:29.075284 mgr.node01 (mgr.2914449) 8009585 : cluster [DBG] pgmap v8009588: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 2.9 MiB/s wr, 342 op/s
02:43:31.076180 mgr.node01 (mgr.2914449) 8009586 : cluster [DBG] pgmap v8009589: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 2.4 MiB/s wr, 269 op/s
02:43:33.077523 mgr.node01 (mgr.2914449) 8009587 : cluster [DBG] pgmap v8009590: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.4 MiB/s wr, 389 op/s
02:43:35.078543 mgr.node01 (mgr.2914449) 8009588 : cluster [DBG] pgmap v8009591: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.1 MiB/s wr, 344 op/s
02:43:37.079428 mgr.node01 (mgr.2914449) 8009589 : cluster [DBG] pgmap v8009592: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.0 MiB/s wr, 334 op/s
02:43:39.080419 mgr.node01 (mgr.2914449) 8009590 : cluster [DBG] pgmap v8009593: 512 pgs: 512 active+clean; 246 GiB data, 712 GiB used, 20 TiB / 21 TiB avail; 682 B/s rd, 3.3 MiB/s wr, 377 op/s

I'm not sure what could be causing the high I/O waits, or whether this is an issue with my Ceph configuration. Any suggestions would be appreciated. If you need any additional information, let me know and I'll post it.
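
For anyone reading the pgmap lines above, a rough way to interpret them is to average the write throughput and op/s and derive a mean size per operation. The following is only a minimal sketch: the regular expression and the function name `summarize` are assumptions made for illustration, the sample uses just the I/O tail of the first three log lines, and op/s counts all client operations (reads are negligible in this log).

```python
import re

# Matches the client I/O tail of a pgmap line, e.g. "2.4 MiB/s wr, 274 op/s".
PGMAP_IO = re.compile(r"([\d.]+) MiB/s wr, (\d+) op/s")


def summarize(lines):
    """Return (mean MiB/s written, mean op/s, mean KiB per op) over pgmap lines."""
    samples = [(float(m.group(1)), int(m.group(2)))
               for m in map(PGMAP_IO.search, lines) if m]
    if not samples:
        raise ValueError("no pgmap I/O samples found")
    mean_wr = sum(wr for wr, _ in samples) / len(samples)
    mean_ops = sum(ops for _, ops in samples) / len(samples)
    # MiB/s divided by op/s gives MiB per op; multiply by 1024 for KiB per op.
    return mean_wr, mean_ops, mean_wr * 1024 / mean_ops


# I/O tails of the first three pgmap lines from the log above.
sample = [
    "avail; 2.4 MiB/s wr, 274 op/s",
    "avail; 0 B/s rd, 3.0 MiB/s wr, 380 op/s",
    "avail; 0 B/s rd, 2.9 MiB/s wr, 332 op/s",
]

mean_wr, mean_ops, kib_per_op = summarize(sample)
print(f"{mean_wr:.2f} MiB/s wr, {mean_ops:.0f} op/s, {kib_per_op:.1f} KiB/op")
```

On these three samples the mean works out to somewhere between 8 and 9 KiB per operation, which would suggest a workload of many small writes rather than one that is limited by raw throughput.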

Thank you.

From lindsay.mathieson at gmail.com  Tue Jun 30 03:28:51 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Tue, 30 Jun 2020 11:28:51 +1000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To: <mailman.66.1593479341.538.pve-user@pve.proxmox.com>
References: <mailman.66.1593479341.538.pve-user@pve.proxmox.com>
Message-ID: <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>

On 30/06/2020 11:08 am, jameslipski via pve-user wrote:
> Just to give a little bit of a background, we currently we have 6 nodes. We're running CEPH, and each node consists of
> 2 osds (each node has 2x Intel SSDSC2KG019T8) OSD Type is bluestore. Global ceph configurations (at least as shown on the proxmox interface) is as follows:

Network config? (ie. speed etc).


Ceph is Nautilus 14.2.9? (latest on proxmox)


Do you have KRBD set for the Proxmox Ceph storage? That helps a lot.

-- 
Lindsay


From pve at hw.wewi.de  Tue Jun 30 10:51:41 2020
From: pve at hw.wewi.de (Hermann)
Date: Tue, 30 Jun 2020 10:51:41 +0200
Subject: [PVE-User] FC-Luns only local devices?
Message-ID: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>

Dear PVE-Users,

I would really appreciate being steered in the right direction as to the
connection of Fibre-Channel-Luns in Proxmox.

As far as I can see, FC LUNs only appear as local block devices in PVE.
If I have several fibre-optic cables between my cluster and these bloody
expensive storage systems, do I have to set up multipath manually in Debian?

Or is there a CLI-command handling all this?

Any helpful hint is heartily appreciated.

Greetings, Hermann


From chris.hofstaedtler at deduktiva.com  Tue Jun 30 12:21:25 2020
From: chris.hofstaedtler at deduktiva.com (Chris Hofstaedtler | Deduktiva)
Date: Tue, 30 Jun 2020 12:21:25 +0200
Subject: [PVE-User] FC-Luns only local devices?
In-Reply-To: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>
References: <8e7395ee-c3ab-03fc-aa2d-9bf2e0781375@hw.wewi.de>
Message-ID: <20200630102125.nnrf3okkiwtdmojw@zeha.at>

Hi Hermann <NoLastName>,

* Hermann <pve at hw.wewi.de> [200630 10:51]:
> I would really appreciate being steered in the right direction as to the
> connection of Fibre-Channel-Luns in Proxmox.
> 
> As far as I can see, FC-LUNs only appear als local blockdevices in PVE.
> If I have several LWL-Cables between my Cluster and these bloody
> expensive Storages, do I have to set up multipath manually in debian?

With most storage systems you need to configure multipath itself
manually, with the settings your storage vendor hands you.

Our setup for this is:

1. Manual multipath setup, we tend to enable find_multipaths "smart"
to avoid configuring all WWIDs everywhere and so on.

2. The LVM PVs go directly on the mpathXX devices (no partitioning).

3. One VG per mpath device. The VGs are then seen by Proxmox just
like always.
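
Steps 2 and 3 above can be sketched as a small helper that only builds the LVM command lines, without running them. This is an illustration under assumptions: the device names `mpatha` and `mpathb`, the `/dev/mapper/` path and the `vg_` naming scheme are hypothetical and depend on the local multipath configuration.

```python
import shlex


def lvm_commands(mpath_names, vg_prefix="vg_"):
    """Build (without running) one pvcreate and one vgcreate per multipath device."""
    commands = []
    for name in mpath_names:
        device = f"/dev/mapper/{name}"  # assumed device path
        # Step 2: the PV goes on the whole multipath device, no partitioning.
        commands.append(["pvcreate", device])
        # Step 3: one VG per multipath device (VG name is hypothetical).
        commands.append(["vgcreate", f"{vg_prefix}{name}", device])
    return commands


# Print the command lines for review instead of executing them.
for argv in lvm_commands(["mpatha", "mpathb"]):
    print(shlex.join(argv))
```

Printing the commands rather than executing them leaves room to check the device names against the output of `multipath -ll` before anything is written to the LUNs.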

You have to take great care when removing block devices again: all
PVE nodes must release the VGs, PVs and all underlying device mapper
devices, and remove the physical sdXX devices, before you remove the
exports on the storage side.
Often it's easier to reboot, and during the reboot fence access to
the to-be-removed LUN for the currently rebooting host.

Chris

-- 
Chris Hofstaedtler / Deduktiva GmbH (FN 418592 b, HG Wien)
www.deduktiva.com / +43 1 353 1707


From jameslipski at protonmail.com  Tue Jun 30 14:07:59 2020
From: jameslipski at protonmail.com (jameslipski)
Date: Tue, 30 Jun 2020 12:07:59 +0000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To: <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>
References: <mailman.66.1593479341.538.pve-user@pve.proxmox.com>
 <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>
Message-ID: <XphGdCwAehBt27RvzAQ4jhwF67RH_vajrhjrw8mI6TgvWFN4m4iMm0L961TA2qoWGfY4mlg3oIniTHMooQvVjsKGpn0oBYvnBFNHo-F_YgQ=@protonmail.com>

Thanks for the reply

All nodes are connected to a 10Gbit switch. Ceph is currently running 14.2.2, but I will update to the latest version. KRBD was not enabled on the pool.

Before I update Ceph: I've just enabled KRBD. Do I have to re-create the pool, restart Ceph, restart the node, etc., or does it just take effect?

------- Original Message -------
On Monday, June 29, 2020 9:28 PM, Lindsay Mathieson <lindsay.mathieson at gmail.com> wrote:

> On 30/06/2020 11:08 am, jameslipski via pve-user wrote:
>
> > Just to give a little bit of a background, we currently we have 6 nodes. We're running CEPH, and each node consists of
> > 2 osds (each node has 2x Intel SSDSC2KG019T8) OSD Type is bluestore. Global ceph configurations (at least as shown on the proxmox interface) is as follows:
>
> Network config? (ie. speed etc).
>
> Ceph is Nautilus 14.2.9? (latest on proxmox)
>
> Do you have KRBD set for the Proxmox Ceph Storage? that help a lot.
>
> --
>
> Lindsay


From mark at tuxis.nl  Tue Jun 30 15:09:12 2020
From: mark at tuxis.nl (Mark Schouten)
Date: Tue, 30 Jun 2020 15:09:12 +0200
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To: <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>
References: <mailman.66.1593479341.538.pve-user@pve.proxmox.com>
 <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>
Message-ID: <20200630130912.qia6rghud5okmnsp@shell.tuxis.net>

On Tue, Jun 30, 2020 at 11:28:51AM +1000, Lindsay Mathieson wrote:
> Do you have KRBD set for the Proxmox Ceph Storage? that help a lot.

I think this is incorrect. KRBD uses the kernel driver, which is
usually older than the userland version. Also, upgrading is easier when
not using KRBD.

I'd like to hear that I'm wrong, am I? :)

-- 
Mark Schouten     | Tuxis B.V.
KvK: 74698818     | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl


From mark at tuxis.nl  Tue Jun 30 15:12:33 2020
From: mark at tuxis.nl (Mark Schouten)
Date: Tue, 30 Jun 2020 15:12:33 +0200
Subject: [PVE-User] Ceph Bluestore - lvmcache versus WAL/DB on SSD
In-Reply-To: <c180d6a9-59e2-21ab-a5aa-96577b038aea@gmail.com>
References: <c180d6a9-59e2-21ab-a5aa-96577b038aea@gmail.com>
Message-ID: <20200630131233.z3ezuatxnys6n637@shell.tuxis.net>

On Tue, Jun 30, 2020 at 12:07:40AM +1000, Lindsay Mathieson wrote:
> As per the title :) I have 23 OSD spinners on 5 hosts, Data+WAL+DB all on
> the disk. All VM's are windows running with Writeback Cache. Performance is
> adequate but see occasional high IO loads that make the VM's sluggish.

It could be that (deep) scrubs are periodically killing your performance.
There are some tweaks available to make them less invasive:

osd_scrub_chunk_min=20 # default: 5
osd_scrub_sleep=4 # default: 0

And then some:
https://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
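
To get a feel for what a scrub sleep value costs in scrub duration, here is a back-of-the-envelope estimate. It is a sketch under assumptions: it supposes the OSD sleeps once after each chunk and that a chunk is a batch of objects; the real chunk size varies between osd_scrub_chunk_min and osd_scrub_chunk_max, and the object count per PG is made up.

```python
import math


def added_scrub_sleep(objects_in_pg, chunk_size, scrub_sleep):
    """Seconds of sleep one PG scrub accumulates, assuming one sleep per chunk."""
    chunks = math.ceil(objects_in_pg / chunk_size)
    return chunks * scrub_sleep


# Hypothetical PG with 10,000 objects, scrubbed in chunks of 20 objects.
print(added_scrub_sleep(10_000, 20, 4))    # with osd_scrub_sleep=4
print(added_scrub_sleep(10_000, 20, 0.1))  # with a much shorter sleep
```

Under these assumptions a sleep of 4 seconds adds 2000 seconds of idle time to the scrub of that one PG, so the setting trades a gentler scrub for a much longer one.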

The best option is really to replace the spinning disks with SSDs.

-- 
Mark Schouten     | Tuxis B.V.
KvK: 74698818     | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl


From lindsay.mathieson at gmail.com  Tue Jun 30 17:44:58 2020
From: lindsay.mathieson at gmail.com (Lindsay Mathieson)
Date: Wed, 1 Jul 2020 01:44:58 +1000
Subject: [PVE-User] High I/O waits, not sure if it's a ceph issue.
In-Reply-To: <mailman.73.1593518889.538.pve-user@pve.proxmox.com>
References: <mailman.66.1593479341.538.pve-user@pve.proxmox.com>
 <ee66fe0e-4f50-99f0-e948-2c6f2e95bb62@gmail.com>
 <mailman.73.1593518889.538.pve-user@pve.proxmox.com>
Message-ID: <12aa72ce-9f3c-035c-dccd-8cf8b3279a9a@gmail.com>

On 30/06/2020 10:07 pm, jameslipski via pve-user wrote:
> Before I update ceph, regarding KRBD, I've just enabled it, do I have do re-create the pool, restart ceph, restart the node, etc... or it just takes into effect?


No need to recreate the pool, just stop/start the VMs accessing it.

-- 
Lindsay