[pve-devel] has somebody already tested corosync3 alpha and the new knet transport?

Wed Jun 27 08:02:30 CEST 2018

Hi,

On 6/26/18 10:54 PM, Alexandre DERUMIER wrote:
> I have found this presentation about the upcoming corosync3 (seems to have reached alpha recently)
> http://build.clusterlabs.org/corosync/presentations/2017-Kronosnet-The-new-face-of-corosync-communications.pdf
> with the new kronosnet (knet) transport.
> 

Yes, I've been tracking it somewhat for over half a year. It looks really
good on paper, but I haven't had time yet to do much testing -
as it'd be in the PVE 6.X timeframe anyway.

> 
> Latency results are really impressive and no more multicast! (users will be happy ;)

FYI, knet is an abstraction layer, it still uses UDP (aka multicast),
as you do not get to handle a lot of links with a lot of nodes without
multicast - i.e., multicast is a very good thing, even if some hosting
environments and switch default settings are against it :)
It can also use SCTP as transport method, which is a layer 4 protocol,
on the same level as UDP or TCP - i.e., it's not encapsulated in those.
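
To illustrate that last point, a tiny Python sketch (my own illustration,
nothing taken from knet or corosync): SCTP has its own IP protocol number
next to TCP and UDP, so it sits directly on top of IP. The fallback value
132 is the IANA-assigned number, used only in case the local socket module
does not export IPPROTO_SCTP.

```python
import socket

# IP protocol numbers as carried in the IP header's protocol field.
# getattr() fallback: some platforms do not export IPPROTO_SCTP, so we
# fall back to the IANA-assigned value 132.
IP_PROTOCOLS = {
    "TCP": socket.IPPROTO_TCP,
    "UDP": socket.IPPROTO_UDP,
    "SCTP": getattr(socket, "IPPROTO_SCTP", 132),
}

for name, number in IP_PROTOCOLS.items():
    print(f"{name}: IP protocol number {number}")
```

Three distinct numbers means three sibling layer 4 protocols; none of them
is carried inside another.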

> and a lot of other improvements (dynamic MTU, ifdown/ifup without breaking the cluster), and it seems to be compatible with corosync2 (with udp, udpu transports)
> 
> I'm still looking to make bigger proxmox clusters in the future :)

Yes, it looks definitely nice and it's on our radar. I'll try to build
a corosync 3 package if I get a bit of time to spare.

> BTW, I was at a kubernetes/container conference in Paris today,
> and one talk was by a guy trying to create his own orchestrator instead of kubernetes (because of problems with etcd, network lag bringing down the k8s master, ...),
> talking about clusters, paxos, strong consistency.
> 
> He's looking to use a causal consistency model instead of strong consistency. I had never heard about this,
> but it seems really great for managing bigger clusters, and also geo clusters.

You can do more in parallel with it. In strong consistency models all
events (for our case, write/read operations) are ensured to be ordered.
If node A sees write OP-A happen before write OP-B then this principle
guarantees that all other nodes see OP-B after OP-A.
Causal consistency does this too, but only if OP-A and OP-B are related,
i.e., they affect each other (like a write to the same file would).
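
A minimal sketch of the idea in Python, assuming vector clocks as the way
to track whether two operations are related (that is one common technique,
I'm not claiming it is what Cure or Antidote does internally). The function
names and the example clocks are made up for illustration.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from it.
VectorClock = Dict[str, int]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the one stamped `b`."""
    nodes = set(a) | set(b)
    # Every component of a is <= the one in b, and at least one is smaller.
    no_larger = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_larger and some_smaller


def must_be_ordered(a: VectorClock, b: VectorClock) -> bool:
    """Causal consistency only fixes the order of causally related events."""
    return happened_before(a, b) or happened_before(b, a)


# OP-A is written on node A; OP-B is written on node B after it saw OP-A.
op_a = {"A": 1}
op_b = {"A": 1, "B": 1}
# OP-C is written on node C without having seen either of them.
op_c = {"C": 1}

print(must_be_ordered(op_a, op_b))  # related: all nodes must apply A before B
print(must_be_ordered(op_a, op_c))  # unrelated: nodes may apply them in any order
```

A strongly consistent system would force one global order on all three
operations; here only OP-A and OP-B need to be ordered, so OP-C can be
applied in parallel.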

Are there links to the presentation? It could be interesting :)
It seems they use a protocol named "Cure" for the update replication:
https://pages.lip6.fr/Marc.Shapiro/papers/Cure-final-ICDCS16.pdf

> He gave a link to an open source key-value store using causal consistency, called "antidote"
> https://syncfree.github.io/antidote/
> 
> Maybe for the future (proxmox 10 ;), it could be great to have this kind of model.
> (I'm not expert enough to say if it could work, and if it would be possible to reimplement pmxcfs with this kind of protocol, and manage other things like pve-crm/lrm)

Hmm, an academic Erlang project with a rather short whitepaper,
I'm a bit wary of such projects - but it sounds definitely interesting.