[pve-devel] opensource vm scheduler : btrplace

Alexandre DERUMIER aderumier at odiso.com
Wed May 29 16:39:03 CEST 2019


>>Here is the academic paper on the OpenNebula scheduler:
>>https://is.muni.cz/th/o8t7a/thesis.pdf 
Damn, sorry, this is not the current OpenNebula scheduler implementation; it is a newer,
improved version, but it is Java too :/

----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Cc: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
Sent: Wednesday, May 29, 2019 16:30:57
Subject: Re: [pve-devel] opensource vm scheduler : btrplace

>>Also, in my research, the OpenNebula scheduler is more basic, but should be implementable in Perl without too much difficulty:
>>https://github.com/OpenNebula/one/blob/441cf1f7f9e726cb5f200d661d50e92a4042fff7/src/scheduler/src/sched/Scheduler.cc
>>
>>(It's migrate 1 VM, recompute, migrate 1 VM, recompute, ...).
>>So it's best effort, but could work for basic scheduling (CPU/RAM, HA group, affinity, anti-affinity).
Here is the academic paper on the OpenNebula scheduler:

https://is.muni.cz/th/o8t7a/thesis.pdf 

----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, May 29, 2019 15:44:29
Subject: Re: [pve-devel] opensource vm scheduler : btrplace

>>It also needs to be integrated that a VM which is currently locked (e.g., 
>>for backup or snapshot) must be marked as temporarily non-migratable, 
>>only if such information can be passed to the scheduler and it can use 
>>our (API) methods this could be of use... 
I think it is possible to add new states to the model:
https://github.com/btrplace/scheduler/wiki/VMs-and-nodes-life-cycle 
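
To make the idea concrete, here is a minimal Python sketch of what such an extra state could look like on our side before anything is handed to a scheduler. It is not btrplace code: the VMState enum, the LOCKED state and the split_by_state helper are hypothetical names, and the only assumption about the VM config is that a locked VM carries a non-empty "lock" entry.

```python
from enum import Enum


class VMState(Enum):
    RUNNING = "running"
    # Hypothetical extra state: a backup or snapshot currently holds a lock.
    LOCKED = "locked"


def classify(conf):
    """Map a VM config dict to a state; a non-empty 'lock' entry means LOCKED."""
    return VMState.LOCKED if conf.get("lock") else VMState.RUNNING


def split_by_state(vms):
    """Return (movable, pinned) lists of VM ids, sorted for stable output."""
    movable, pinned = [], []
    for vmid, conf in sorted(vms.items()):
        if classify(conf) is VMState.LOCKED:
            pinned.append(vmid)   # must stay where it is for now
        else:
            movable.append(vmid)  # may be offered to the scheduler
    return movable, pinned
```

The pinned VMs would still count toward the load of their node; they are only excluded from the set of VMs the scheduler is allowed to move.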


>>After a quick skim over those, it seems that they are only related,
>>e.g., showing how one could prove certain things in such an environment
>>but not the (full) idea behind the scheduler itself.
>>The "real deal" which actually describes what they do is sadly behind
>>a paywall: https://ieeexplore.ieee.org/abstract/document/6409358

Here is the full version (note that there have been improvements since 2013):

https://pages.lip6.fr/Julia.Lawall/btrplace-tdsc2013.pdf 



>>Sorry, I did not want to dampen your enthusiasm about finally finding
>>a really good solution for this, just thinking about a realistic
>>integration.. Also the Java part really won't fly, not only from me, but
>>also Dietmar et al. won't like it.

I don't like Java either :p (nor its garbage collector)


>>Do you think you can find out about the real algorithms they use?
>>I guess that porting this over to something without a runtime (Java or
>>else) should not be too problematic (I hope I'm not too naïve here ^^)
>>and more people/projects could benefit from it..

They use:

http://www.choco-solver.org/

(Java too :/)

I don't know if there exists some kind of magic converter from Java to another language (Rust, ...)?


BTW, their demo app is very nice for simulation; it could make a great web version of the HA simulator.



Also, in my research, the OpenNebula scheduler is more basic, but should be implementable in Perl without too much difficulty:
https://github.com/OpenNebula/one/blob/441cf1f7f9e726cb5f200d661d50e92a4042fff7/src/scheduler/src/sched/Scheduler.cc

(It's migrate 1 VM, recompute, migrate 1 VM, recompute, ...).
So it's best effort, but could work for basic scheduling (CPU/RAM, HA group, affinity, anti-affinity).
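
As a rough illustration of the "migrate 1 VM, recompute" loop, here is a minimal Python sketch. It is not the OpenNebula code: the Node class, the load metric (the larger of the CPU and RAM usage ratios) and the rebalance helper are all assumptions made for the example, and HA groups and affinity rules are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: float                                # total CPU capacity (assumed: cores)
    ram: float                                # total RAM capacity (assumed: GiB)
    vms: dict = field(default_factory=dict)   # vm name -> (cpu, ram) demand


def load(node, extra=(0.0, 0.0), skip=None):
    """Larger of the CPU and RAM usage ratios, optionally with one VM
    added (extra) or ignored (skip) to simulate a migration."""
    cpu = extra[0] + sum(c for n, (c, _) in node.vms.items() if n != skip)
    ram = extra[1] + sum(r for n, (_, r) in node.vms.items() if n != skip)
    return max(cpu / node.cpu, ram / node.ram)


def pick_migration(nodes):
    """Pick the single move off the most loaded node that lowers its load
    the most without overloading the target, or None if no move helps."""
    source = max(nodes, key=load)
    current = load(source)
    best = None
    for vm, demand in source.vms.items():
        for target in nodes:
            if target is source:
                continue
            after_target = load(target, extra=demand)
            if after_target > 1.0:
                continue  # the VM would not fit on this target
            peak = max(load(source, skip=vm), after_target)
            if peak < current and (best is None or peak < best[0]):
                best = (peak, vm, source, target)
    return best


def rebalance(nodes, max_moves=100):
    """Migrate one VM, recompute, repeat; return the ordered list of moves."""
    moves = []
    for _ in range(max_moves):
        choice = pick_migration(nodes)
        if choice is None:
            break
        _, vm, source, target = choice
        target.vms[vm] = source.vms.pop(vm)
        moves.append((vm, source.name, target.name))
    return moves
```

Each step only looks at the current state, so the result is best effort rather than a globally optimal placement, which is the trade-off described above.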

----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>, "aderumier" <aderumier at odiso.com>
Sent: Wednesday, May 29, 2019 10:34:01
Subject: Re: [pve-devel] opensource vm scheduler : btrplace

On 5/29/19 10:00 AM, Alexandre DERUMIER wrote: 
> and the algorithm computes the whole placement (which is very difficult to implement efficiently, as the number of combinations to compute can be really huge),
> and gives the whole migration order (benchmarks show a few seconds to compute 10000 VMs on 10000 nodes).

I mean, it's effectively the "knapsack problem".

> It also takes into account the estimated migration time (based on network bandwidth and also the number of dirty page changes in QEMU),
> and does parallel migrations.


> There is a small interactive demo here:
> http://www.btrplace.org/play/ 
> 
> (source code of the demo, frontend: https://github.com/btrplace/play backend: https://github.com/btrplace/playd)

Looks interesting.

> 
> 
> Some presentations (a lot in French, as it's a research project of a French university, but it seems to be used by Nutanix in production):

"Fabien Hermenier", the main contact on the website, works for Nutanix.

> https://webcast.in2p3.fr/video/a_flexible_virtual_machine_placement_algorithm_for_iaas_clouds_to_fit_evolving_user_requirements 
> https://fhermeni.github.io/pubs/hermenier-rescom17.pdf 
> 
> some academic papers: 
> http://www.btrplace.org/pubs/hermenier-socc17.pdf 
> http://www.btrplace.org/pubs/kherbache-tcc17.pdf 

>>After a quick skim over those, it seems that they are only related,
>>e.g., showing how one could prove certain things in such an environment
>>but not the (full) idea behind the scheduler itself.
>>The "real deal" which actually describes what they do is sadly behind
>>a paywall: https://ieeexplore.ieee.org/abstract/document/6409358


>>It seems that the interesting code lives here: 
>>https://github.com/btrplace/scheduler/tree/master/api/src/main/java/org/btrplace 

> 
> Now, the main problem is that it's Java (it seems that scientists like it; Red Hat RHEV/oVirt has also implemented its scheduling algorithm model in Java).
> I don't know if it could be implemented in Proxmox? (or at least with a daemon like the one behind the demo, and REST API calls from Perl to Java? Importing Java classes in Perl ???)
> 

That's not good, we really do not want a Java runtime for anything
in Proxmox, so not completely the holy grail..
But, currently there is some work going on to see if Rust could
be used for things needing to be fast, so maybe we could take the ideas
and do so? I mean, if this should be used in PVE you need integration
anyway; one needs to keep in mind that backups, replications, ... can
happen and one cannot just do a migration at the QEMU level.

It also needs to be integrated that a VM which is currently locked (e.g., 
for backup or snapshot) must be marked as temporarily non-migratable, 
only if such information can be passed to the scheduler and it can use 
our (API) methods this could be of use... 

Sorry, I did not want to dampen your enthusiasm about finally finding
a really good solution for this, just thinking about a realistic
integration.. Also the Java part really won't fly, not only from me, but
also Dietmar et al. won't like it.

Do you think you can find out about the real algorithms they use?
I guess that porting this over to something without a runtime (Java or
else) should not be too problematic (I hope I'm not too naïve here ^^)
and more people/projects could benefit from it..

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
