In our LUG (Linux User Group) here in Pordenone, last Saturday we had a
little ''briefing'' on Proxmox/Ceph.

A LUG member dismantled their PVE cluster and removed the disks, and we
ran the tests on their boxes, obviously with other (newer) disks.

After the hacking session, they restored every node to its original
shape, but something strange happened.

Stefano had a 3-node cluster (say PVE1, PVE2 and PVE3), and because it
had some useful services running on it, they added a fourth node
(PVE4). But:

1) because there was no VM running on it, they first stopped PVE1 and
 removed its disks.

2) then they hit their head on the wall (damn! Quorum!), so they added
 PVE4 to the cluster.

3) then they moved all running VMs to PVE4 and shut down PVE2 and PVE3.

So, PVE1 was 'not aware' of PVE4 (it had been shut down before PVE4 was
added).
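
For anyone wondering about the ''damn! Quorum!'' moment in step 2, here
is a minimal sketch of the majority arithmetic. It assumes one vote per
node and the plain majority rule, with no special votequorum options
enabled; the function names are ours, not part of any PVE or corosync
API.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(online_votes: int, expected_votes: int) -> bool:
    """True if the online nodes hold a strict majority of the votes."""
    return online_votes >= quorum_threshold(expected_votes)


# (description, nodes online, nodes configured); one vote per node assumed
steps = [
    ("original cluster, all nodes up", 3, 3),
    ("PVE1 stopped", 2, 3),
    ("a second node stopped without adding PVE4", 1, 3),
    ("PVE4 added, PVE1 still off", 3, 4),
]

for description, online, expected in steps:
    state = "quorate" if is_quorate(online, expected) else "NOT quorate"
    print(f"{description}: {online}/{expected} votes, {state}")
```

With three configured nodes, losing a second one would have left a
single vote out of three, which is why PVE4 had to join first.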

When restoring the cluster to its original shape, they first added PVE2
and PVE3 back to the cluster, with no trouble at all.

Then they tried to re-add PVE1, but it did not join the cluster.

They were forced to manually shut down corosync, mount /etc/pve and copy
corosync.conf from another node to PVE1.
After a reboot, PVE1 was aware of the cluster.

Clearly, my friend made some mistakes. But we are curious to know why
PVE/corosync did not recover automatically.

Some ideas:

1) because PVE1 had no DNS info about PVE4: I doubt that, because
 simply the manual ''sync'' of the corosync configuration fixed it.

2) because there's some sort of ''master'' info in corosync, and
 plausibly PVE1 was the master (and so was shut down believing so),
 while in the restore phase PVE4 was the master (because it was the
 only node working). The two masters conflict with each other.
 I doubt that too, because the cluster formed by PVE2-3-4 worked.

3) because corosync recovers automagically if and only if the nodes
 mutually ''know'' each other, and PVE1 knew nothing about PVE4 (which
 was added later).

4) ...
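
To make idea 3 concrete, here is a small sketch of how one could compare
the stale corosync.conf on PVE1 with the one from another node. It only
reads the config_version field of the totem section and the node names
in the nodelist; whether corosync really refuses to join for this reason
is exactly what we are asking. The helper names, the sample node names
and the version numbers are made up for illustration.

```python
import re


def config_version(conf_text: str) -> int:
    """Return the config_version value found in a corosync.conf text."""
    match = re.search(r"^\s*config_version:\s*(\d+)", conf_text, re.MULTILINE)
    if match is None:
        raise ValueError("no config_version found")
    return int(match.group(1))


def node_names(conf_text: str) -> set[str]:
    """Return the node names listed in the nodelist section."""
    return set(re.findall(r"^\s*name:\s*(\S+)", conf_text, re.MULTILINE))


def compare(local_conf: str, peer_conf: str) -> dict:
    """Report whether the local file is older and which nodes it lacks."""
    return {
        "local_is_older": config_version(local_conf) < config_version(peer_conf),
        "missing_locally": sorted(node_names(peer_conf) - node_names(local_conf)),
    }


def make_conf(names, version):
    """Build a tiny, hypothetical corosync.conf text for the example."""
    nodes = "".join(
        f"  node {{\n    name: {name}\n    nodeid: {i}\n  }}\n"
        for i, name in enumerate(names, start=1)
    )
    return (
        f"nodelist {{\n{nodes}}}\n"
        f"totem {{\n  cluster_name: lug\n  config_version: {version}\n}}\n"
    )


# Illustrative data only: PVE1 kept the file from before PVE4 was added.
stale_on_pve1 = make_conf(["pve1", "pve2", "pve3"], version=3)
current_on_pve2 = make_conf(["pve1", "pve2", "pve3", "pve4"], version=4)

print(compare(stale_on_pve1, current_on_pve2))
```

On the real nodes the two texts would come from /etc/pve/corosync.conf
(or /etc/corosync/corosync.conf) on PVE1 and on a working node.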

Does anyone have the answer?! Thanks. ;)
