[pve-devel] [PATCH v3 qemu-server 08/11] blockdev: convert drive_mirror to blockdev_mirror

Fiona Ebner f.ebner at proxmox.com
Wed Jan 15 11:06:19 CET 2025


On 15.01.25 at 10:51, Fabian Grünbichler wrote:
> 
>> On 15.01.2025 at 10:39 CET, Fiona Ebner <f.ebner at proxmox.com> wrote:
>>
>>  
>> On 14.01.25 at 11:03, DERUMIER, Alexandre wrote:
>>>>> If we do need lookup, an idea to get around the character limit is
>>>>> using
>>>>> a hash of the information to generate the node name, e.g.
>>>>> hash("fmt-$volid@$snapname"), hash("file-$volid@$snapname") or
>>>>> whatever
>>>
>>> yes, I think it should work
>>>
>>>>> is actually needed as unique information. Even if we only use
>>>>> lowercase
>>>>> letters, we have 26 base chars, so 26^31 possible values.
>>>
>>> yes, I was thinking about a hash too, but I was not sure how to convert it
>>> to the valid characters (alphanumeric, '-', '.' and '_').
>>>
>>>
>>>
>>>>> So hashes with up to
>>>>>
>>>>>> math.log2(26**31)
>>>>> 145.71363126237387
>>>>>
>>>>> bits can still fit, which should be more than enough. Even with an
>>>>> enormous number of 2^50 block nodes (realistically, the max values we
>>>>> expect to encounter are more like 2^10), the collision probability
>>>>> (using a simple approximation for the birthday problem) would only be
>>>>>
>>>>>> d=2**145
>>>>>> n=2**50
>>>>>> 1 - math.exp(-(n*n)/(2*d))
>>>>> 1.4210854715202004e-14
>>>
>>> yes, should be enough
>>>
>>> a simple md5 is 128 bits,
>>> sha1 is 160 bits (it's a 150-bit space with the extra '-', '.' and '_' characters)
>>>
>>> Do you know a good hash algorithm?
>>
>> I'm not too well-read in cryptography, but AFAIK, you can shorten the
>> result of sha256 to get a good hash algorithm with fewer bits. We could
>> also have the node-name start with an "h" to make sure it doesn't start
>> with a number and then use base32 for the remaining 30 characters. I.e.
>> we could take the first 150 bits (32^30 = 2^150) from the sha256 hash
>> and convert that to base32.
>>
>> @Shannon @Fabian please correct me if I'm wrong.
> 
> IMHO this isn't really a cryptographic use case, so I'd not worry too much about any of that.

Yes, we don't need much to get enough collision-resistance. Just wanted
to make sure and check it explicitly.
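
A minimal sketch of the scheme proposed in the quoted text above (an "h"
prefix followed by the first 150 bits of a sha256 digest, written as 30
base32 characters). The function name `node_name`, the `kind` parameter, the
example volid and the lowercase RFC 4648 alphabet are assumptions made for
illustration, not anything taken from the patch series:

```python
import hashlib

# Lowercase RFC 4648 base32 alphabet (assumption: any 32 valid
# node-name characters would do just as well).
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def node_name(volid: str, snapname: str, kind: str = "fmt") -> str:
    """Hypothetical helper: 'h' + 30 base32 chars = 31 characters."""
    data = f"{kind}-{volid}@{snapname}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # 150 bits are not byte-aligned: take 19 bytes (152 bits) and
    # drop the lowest 2 bits to keep the first 150 bits of the digest.
    bits = int.from_bytes(digest[:19], "big") >> 2
    chars = []
    for _ in range(30):
        chars.append(ALPHABET[bits & 31])  # 5 bits per character
        bits >>= 5
    # Characters were collected least-significant first.
    return "h" + "".join(reversed(chars))


if __name__ == "__main__":
    # Example volid and snapshot name are made up.
    print(node_name("local-lvm:vm-100-disk-0", "snap1"))
```

The same input always yields the same name, so a lookup only needs to
recompute the hash from the volid and snapshot name.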

> 
> basically what we have is the following situation:
> 
> - we have some input data (volid+snapname)
> - we have a key derived from the input data (block node name)
> - we have a value (block node)
> - we need to be able to map back the block node (name) to the input data

Oh, we need to map back too? But can't that be done via the filename in
the block node?

> sometimes we need to temporarily allocate a second block node for the same input data (right?), and we can't rename block nodes, so there might be more than one key (block node name) for a given input. to map back from a block node name to the volid+snapname, we can hash the input data and then use that (shortened) hash as the middle part of the block node name (with a counter as the last part and some static/drive-related prefix).
> 
> the only things we need to ensure are:
> 
> - that the hash is good enough to avoid accidental collisions (given the nature of the input data, I don't think we have to worry about non-accidental collisions either unless we choose a very basic checksum, but even if that were possible, an attacker could only mess with data of a VM where they can already add/remove images anyway..)
> - that we never re-use a block node name for something that doesn't match its input data (I have to admit I lost track a bit of whether that invariant can hold?)
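
The layout described in the quoted paragraph (static prefix, shortened hash
in the middle, counter at the end) could be sketched roughly as follows. The
names `hash_part`, `structured_node_name` and `matches`, the "-" separator,
the 15-character hash length and the 31-character limit check are all
assumptions for illustration:

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"  # base32, 5 bits per char
MAX_LEN = 31  # assumed node-name length limit


def hash_part(volid: str, snapname: str, length: int = 15) -> str:
    """Hypothetical: first 5*length bits of sha256 as base32 chars."""
    digest = hashlib.sha256(f"{volid}@{snapname}".encode("utf-8")).digest()
    bits = int.from_bytes(digest, "big") >> (256 - 5 * length)
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[bits & 31])
        bits >>= 5
    return "".join(reversed(chars))


def structured_node_name(prefix: str, volid: str, snapname: str,
                         counter: int) -> str:
    """Hypothetical: '<prefix>-<hash>-<counter>'; prefix must not contain '-'."""
    name = f"{prefix}-{hash_part(volid, snapname)}-{counter}"
    if len(name) > MAX_LEN:
        raise ValueError(f"node name too long: {name}")
    return name


def matches(name: str, volid: str, snapname: str) -> bool:
    """Check whether a node name was derived from the given input data."""
    parts = name.split("-")
    return len(parts) == 3 and parts[1] == hash_part(volid, snapname)
```

With this layout, two temporary nodes for the same volume differ only in the
counter, and `matches()` maps a name back to a candidate volid+snapname by
recomputing the hash rather than by decoding the name.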

Okay, sure. If we need other prefixes or suffixes, we can shorten the hash
part further. Even with only 15 characters for the hash, we have an
extremely low collision probability with about a million nodes:

>>> math.log2(32**15)
75.0
>>> d=2**75
>>> n=2**20
>>> 1 - math.exp(-(n*n)/(2*d))
1.4551915228366852e-11
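
For reference, the birthday-problem approximation used in both calculations
can be wrapped in a small helper. This is only a sketch; the function name
`collision_probability` is made up here, and `math.expm1` is swapped in for
`1 - math.exp(...)` because it stays accurate when the exponent is very
close to zero:

```python
import math


def collision_probability(n_nodes: int, hash_bits: float) -> float:
    """Approximate P(at least one collision) as 1 - exp(-n^2 / (2d))."""
    d = 2.0 ** hash_bits  # number of possible hash values
    x = (n_nodes * n_nodes) / (2.0 * d)
    # -expm1(-x) == 1 - exp(-x), without cancellation for tiny x.
    return -math.expm1(-x)


if __name__ == "__main__":
    # 15 base32 characters (75 bits), about a million nodes:
    print(collision_probability(2**20, 75))
    # 145 bits, 2^50 nodes:
    print(collision_probability(2**50, 145))
```

The printed values should agree with the figures above to many significant
digits (roughly 1.46e-11 and 1.42e-14).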



