[pve-devel] [PATCH RFC storage] rbd: fix #3286 add namespace support

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Mar 3 13:17:09 CET 2021


On March 3, 2021 11:10 am, aderumier at odiso.com wrote:
> Is there any plan on the roadmap to generalize namespaces, but at the
> VM level?
> 
> I'm still looking for easy cross-cluster VM migration with shared
> storage.

I recently picked up the remote migration feature, FWIW ;)

> 
> I was thinking about something simple like
> /etc/pve/<node>/qemu-server/<namespace>/<vmid.conf>

> with new disk volumes including the namespace in their path like:
> "scsi0: <storage>:<namespace>/vm-100-disk-0"

I am not sure how that would solve the issue? the problem with sharing a 
storage between clusters is that VMID 100 on cluster A and VMID 100 on 
cluster B are not the same entity, so a volume owned by VMID 100 is not 
attributable to either cluster.
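
to illustrate (pool name and image names are just an example), listing 
the shared pool looks exactly the same from both clusters:

    $ rbd ls -p rbd
    vm-100-disk-0
    vm-100-disk-1
    vm-101-disk-0

nothing in that output tells you whether vm-100-disk-0 was created by 
VMID 100 on cluster A or on cluster B.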

if both clusters are allowed to set up a namespace FOO, then you need to 
manually take care not to duplicate VMIDs inside this namespace across 
all clusters, just like you have to take care not to duplicate VMIDs 
across all clusters right now?

if only one cluster is allowed to use a certain namespace, then migration 
between clusters needs to do a rename (or rather, move the VM and its 
volumes from one namespace to another). that would mean no live-migration, 
since a live-rename of a volume is not possible, unless the namespace is 
not actually encoded in the volume name on the storage. if the namespace 
is not actually encoded in the volume name, it does not protect against 
cross-namespace confusion (since when listing a storage's contents, I 
can't tell which namespace volume BAR belongs to), and we'd be back to 
square one.

IMHO there are things that might help with the issue:
- a client used to manage all clusters that ensures a VMID is not 
  assigned to more than one cluster
- better support for custom volids (reduce chance of clashes, does not 
  solve issue with orphaned/unreferenced volumes)
- allow marking a storage as "don't scan for unreferenced volumes", so 
  that stray volumes likely belonging to other clusters are not picked 
  up when migrating/deleting/... guests (setting this would also need to 
  disallow deleting any volumes via the storage API instead of the guest 
  API, as we don't have any safeguards on the storage level then..) - 
  sketched right below the list
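
purely as a sketch of that last point - the 'ignore-unreferenced' option 
name is made up, nothing like it exists in pve-storage today - it could 
be a per-storage flag in /etc/pve/storage.cfg:

    rbd: shared-ceph
        pool rbd
        content images
        ignore-unreferenced 1

with that set, guest operations (migrate/destroy/..) would only ever 
touch volumes referenced by a local guest config, and freeing volumes on 
that storage via the storage API would be refused altogether.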

the first point is hard to do atomically, since we don't have a 
cross-cluster pmxcfs, but some sort of "assign ranges to clusters, 
remember exceptions for VMs which have been migrated away" could work, 
if ALL management then happens using this client and not the regular 
per-cluster API. this could also be supported in PVE right now 
(configure a range in datacenter.cfg, have some API call to register 
"this VMID is burnt/does not belong to this cluster anymore, ignore it 
for all intents and purposes") - although obviously this would not yet 
guarantee no re-use across clusters, but just enable 
integration/management tools to have some support on the PVE side for 
enforcing those ranges.
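
spelled out, that could look something like the following - to be clear, 
neither a 'vmid-range' option in datacenter.cfg nor a /cluster/burnt-vmids 
API path exist today, the names are made up for illustration:

    # datacenter.cfg on cluster A
    vmid-range: 100-4999

    # datacenter.cfg on cluster B
    vmid-range: 5000-9999

    # after VMID 4711 has been migrated away from cluster A, burn it
    # there so it is never handed out again:
    pvesh create /cluster/burnt-vmids --vmid 4711

new VMID allocation on each cluster would then only hand out IDs inside 
the local range and skip everything on the burnt list - keeping the 
ranges disjoint across clusters would still be up to whatever external 
tooling manages them.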

just some quick thoughts, might not be 100% thought-through in all 
directions :-P




