[pbs-devel] [PATCH proxmox-backup 2/2] tape: write informational MAM attributes on tapes

Thu May 23 10:10:19 CEST 2024

On 23/05/2024 at 08:09, Dominik Csapak wrote:
> On 5/22/24 19:24, Thomas Lamprecht wrote:
>> What your commit did not mention is why you skip setting a few others, like I
>> could imagine that the following would have some use:
>>
>> - DATE AND TIME LAST WRITTEN
> 
> i hesitated with this one as there is no timezone included and it does not
> specify one either, but i guess we could just use UTC (although that might
> be confusing for some people?)

Ok, that is a good point, and yeah, it's a shame that the spec didn't make
this field 17 bytes wide, which would allow encoding a [+-]ZZZZ UTC offset.

While for small setups in just one location, or for organisations that use
the same timezone everywhere, independent of where the servers are located,
it can even be nice to use the local timezone, that still is a loss of
information. Using UTC and documenting that is the only way to be sure,
in any timezone, of the actual time the tape was written.

So, I'd go for UTC here and document that. If we ever show this in the
UI or CLI we can render it correctly as ISO 8601 indicating that this is
UTC time.
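
As a rough sketch of the UTC idea, assuming the field holds 12 ASCII bytes
in YYYYMMDDHHMM form (which is what the 17-byte remark above implies, but
that should be double checked against the spec), something like the
following could produce the value from a Unix epoch. The function names are
made up for illustration and are not existing proxmox-backup code; the date
conversion is the usual days-to-civil-date algorithm, done by hand only to
keep the example free of dependencies.

```rust
/// Convert days since 1970-01-01 into (year, month, day), proleptic Gregorian.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so that leap days fall at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of (March-based) year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: render a Unix epoch as a 12 character UTC timestamp.
/// The YYYYMMDDHHMM layout is an assumption about the MAM field format.
fn mam_timestamp_utc(epoch: i64) -> String {
    let days = epoch.div_euclid(86_400);
    let secs = epoch.rem_euclid(86_400);
    let (year, month, day) = civil_from_days(days);
    format!(
        "{:04}{:02}{:02}{:02}{:02}",
        year,
        month,
        day,
        secs / 3_600,
        (secs % 3_600) / 60
    )
}
```

For the send time of this mail (epoch 1716451819, i.e. 10:10 CEST) this
would give 202405230810, so the value is unambiguous as long as the docs
state that it is UTC.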

> 
>> - TEXT LOCALIZATION IDENTIFIER (Strings are UTF-8 in rust, and we do not
>>    explicitly keep them in ASCII or the like FWICT)
> 
> that one i explicitly left out because we (currently) only write ascii,
> but yes, we could simply set that to utf-8 for "future-proof"ness

This is mostly enforced indirectly at the moment, isn't it? As the label
depends on the pool name, and that one is enforced to match the "safe" regex?

In that case it might be good to either future-proof this or, alternatively
(IMO not really better), to at least enforce/check directly here that the
saved text is ASCII. The data is a String, which is UTF-8 in Rust, so
coupling the assumption here to the rather distant API schema format seems
not ideal to me.

btw. enforcing the length might be nice too. What would actually happen if
one writes more data than reserved by the spec? Does it spill into the
next field, or does something catch this and error out?
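
To make the two checks above concrete, here is a minimal sketch of what I
have in mind. The helper name and signature are hypothetical, not something
that exists in the code base, and padding short values with spaces is an
assumption about how fixed-width ASCII attributes are filled.

```rust
/// Encode `value` into a fixed-width ASCII field of `len` bytes.
///
/// Hypothetical helper: rejects non-ASCII or over-long input instead of
/// silently truncating, and pads with spaces (the pad byte is an assumption).
fn encode_ascii_field(value: &str, len: usize) -> Result<Vec<u8>, String> {
    if !value.is_ascii() {
        return Err(format!("value {value:?} is not ASCII"));
    }
    if value.len() > len {
        return Err(format!(
            "value {value:?} is {} bytes, field only holds {len}",
            value.len()
        ));
    }
    // For ASCII input the byte length equals the character count.
    let mut data = value.as_bytes().to_vec();
    data.resize(len, b' ');
    Ok(data)
}
```

Erroring out early like this keeps the check next to the code that writes
the attribute, independent of what the API schema allows today.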

>> - APPLICATION FORMAT VERSION (always good to have)
> 
> isn't that implied by the application version?

Not necessarily, there can be a new way to write tapes from PBS in the
future and the version to use might be selectable, or the newer one
backported to an older stable version (at least as an option).

IME, tracking format and program versions as separate things only makes
life easier in the long run.

> 
> we don't really have a 'format version' for the tape format, but each
> archive on it has its own version e.g. the snapshot archives
> have version 1.2 while the chunk archive and catalog archive have 1.1
> and the labels have only 1.0

You could combine those atoms that make up the whole tape format into a
full version by concatenating them with a separator like a semicolon, a
plus or the like.

As this field has 16 characters you could even prefix each version with a
letter to make it slightly simpler to read, e.g.:

A1.2;C1.1;L1.0

Or use the letter as separator, making a bit more space for future version
extension:

A1.2C1.1L1.0

We could even use that now to define a global tape version or a, well,
versioning-version:

T1.0A1.2C1.1L1.0

A bit crowded, but any (future) command of ours that outputs this information
could format that nicely, and documenting it should cover third party tools.
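
A minimal sketch of how such a combined string could be built and checked
against the 16 characters mentioned above. The function name, the constant
and the (tag, major, minor) tuple layout are made up for illustration only.

```rust
/// Width of the APPLICATION FORMAT VERSION field as discussed above.
const FORMAT_VERSION_LEN: usize = 16;

/// Hypothetical helper: join (tag, major, minor) triples into a compact
/// string such as "A1.2C1.1", using the tag letter as the separator.
fn combined_format_version(parts: &[(char, u8, u8)]) -> Result<String, String> {
    let mut out = String::new();
    for (tag, major, minor) in parts {
        if !tag.is_ascii_uppercase() {
            return Err(format!("tag {tag:?} is not an ASCII uppercase letter"));
        }
        out.push_str(&format!("{tag}{major}.{minor}"));
    }
    // Refuse to produce something that would not fit into the field.
    if out.len() > FORMAT_VERSION_LEN {
        return Err(format!("{out:?} exceeds {} bytes", FORMAT_VERSION_LEN));
    }
    Ok(out)
}
```

With T1.0, A1.2, C1.1 and L1.0 as input this fills the field exactly, so
the first two-digit minor version would already need a different scheme,
which is the "crowded" part.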

>>
>> Not so sure off the top of my head about the UIDs, i.e., if we even have something
>> that can be easily mapped to this.
> 
> not sure which field you mean here? in LTO-5 there is only one standardized
> field left and that is the VOLUME COHERENCY INFORMATION
> and i don't think we'll need that

At least in LTO-9 there would be "MEDIUM GLOBALLY UNIQUE IDENTIFIER" and
"MEDIA POOL GLOBALLY UNIQUE IDENTIFIER". That the available fields differ
per LTO version shouldn't (hopefully) be an issue, but these are probably
not _that_ important, at least if we do not have existing information that
can be mapped 1:1 to those two fields already.

btw. there's also BARCODE; as we support barcode labeling, it might be good
to write that out too, I guess?