[pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror

Wed Jan 15 10:51:16 CET 2025

> On 15.01.2025 10:39 CET, Fiona Ebner <f.ebner at proxmox.com> wrote:
> 
>  
> On 14.01.25 at 11:03, DERUMIER, Alexandre wrote:
> >>> If we do need lookup, an idea to get around the character limit is
> >>> using
> >>> a hash of the information to generate the node name, e.g.
> >>> hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
> >>> whatever
> > 
> > yes, I think it should work
> > 
> >>> is actually needed as unique information. Even if we only use
> >>> lowercase
> >>> letters, we have 26 base chars, so 26^31 possible values.
> > 
> > yes, I was thinking about a hash too, but I was not sure how to convert it
> > to the allowed characters (valid chars: alphanumeric, ‘-’, ‘.’ and ‘_’).
> > 
> > 
> > 
> >>> So hashes with up to
> >>>
> >>>> math.log2(26**31)
> >>> 145.71363126237387
> >>>
> >>> bits can still fit, which should be more than enough. Even with an
> >>> enormous number of 2^50 block nodes (realistically, the max values we
> >>> expect to encounter are more like 2^10), the collision probability
> >>> (using a simple approximation for the birthday problem) would only be
> >>>
> >>>> d=2**145
> >>>> n=2**50
> >>>> 1 - math.exp(-(n*n)/(2*d))
> >>> 1.4210854715202004e-14
> > 
> > yes, should be enough
> > 
> > a simple md5 is 128 bits,
> > sha1 is 160 bits (the space is about 150 bits with the extra '-', '.', '_' characters)
> > 
> > Do you know a good hash algorithm?
> 
> I'm not too well-read in cryptography, but AFAIK, you can shorten the
> result of sha256 to get a good hash algorithm with fewer bits. We could
> also have the node-name start with a "h" to make sure it doesn't start
> with a number and then use base32 for the remaining 30 characters. I.e.
> we could take the first 150 bits (32^30 = 2^150) from the sha256 hash
> and convert that to base32.
> 
> @Shannon @Fabian please correct me if I'm wrong.

IMHO this isn't really a cryptographic use case, so I'd not worry too much about any of that.

basically what we have is the following situation:

- we have some input data (volid+snapname)
- we have a key derived from the input data (block node name)
- we have a value (block node)
- we need to be able to map the block node (name) back to the input data

sometimes we need to allocate a second block node temporarily for a given input data (right?), and we can't rename block nodes, so there might be more than one key (block node name) for the same input data. to map back from a block node name to the volid+snapname, we can hash the input data and then use that (shortened) hash as the middle part of the block node name (with a counter as the last part and some static/drive-related prefix).

the only things we need to ensure are:

- the hash is good enough to avoid accidental collisions. given the nature of the input data, I don't think we have to worry about non-accidental collisions either unless we choose a very basic checksum, and even if those were possible, an attacker could only mess with data of a VM where they can already add/remove images anyway.
- we never re-use a block node name for something that doesn't match its input data (I have to admit I lost track a bit of whether that invariant can hold?).
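
to make the naming scheme concrete, here is a rough Python sketch (not the actual qemu-server code, which would be Perl). the layout <prefix><hash>-<counter>, the function names, the "h" prefix and the 31-character limit are assumptions for illustration, taken from the numbers discussed above. the hash is sha256 truncated and base32-encoded, as Fiona suggested:

```python
import base64
import hashlib

MAX_NODE_NAME_LEN = 31  # assumed limit, matching the 26^31 estimate quoted above


def short_hash(volid, snapname, chars):
    """Return the first `chars` base32 characters (5 bits each) of a sha256 digest."""
    data = f"{volid}@{snapname or ''}".encode()
    digest = hashlib.sha256(data).digest()
    # 20 bytes encode to exactly 32 base32 characters, so no '=' padding appears
    encoded = base64.b32encode(digest[:20]).decode("ascii").lower()
    return encoded[:chars]


def node_name(volid, snapname=None, prefix="h", counter=0):
    """Hypothetical layout: <prefix><shortened hash>-<counter>."""
    suffix = f"-{counter}"
    chars = MAX_NODE_NAME_LEN - len(prefix) - len(suffix)
    if not 1 <= chars <= 32:
        raise ValueError("prefix/counter leave no usable room for the hash")
    return f"{prefix}{short_hash(volid, snapname, chars)}{suffix}"


def lookup(name, known_volumes, prefix="h"):
    """Map a node name back to (volid, snapname) by re-hashing known inputs."""
    # base32 never contains '-', so the last '-' always separates the counter
    body, _, _counter = name[len(prefix):].rpartition("-")
    for volid, snapname in known_volumes:
        if short_hash(volid, snapname, len(body)) == body:
            return volid, snapname
    return None
```

with a one-character prefix and a single-digit counter this leaves 28 base32 characters, i.e. 140 bits of hash, which is in the same range as the collision estimate quoted above. two nodes for the same volid+snapname differ only in the counter, so both map back to the same input data. the reverse mapping only works against a list of candidate volumes (e.g. taken from the VM config), since the hash itself cannot be inverted.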