[pbs-devel] [PATCH proxmox-backup 2/2] docs/scanrefs: fix handling if ref is same as headline

Mon Feb 8 17:06:38 CET 2021

On 2/6/21 9:22 AM, Thomas Lamprecht wrote:
> On 05.02.21 16:10, Aaron Lauterer wrote:
>> If the ref is named the same as the headline (once normalized), sphinx
>> will return a 'idX' value in node['ids'][1] which we use for the label
>> ID. The headline is always present at index 0.
>>
>> Checking for that and using index 0 in case we do get a 'idX' helps us
>> to avoid using the 'idX' as keys in our OnlineHelpInfo.js and actually
>> use the intended key.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
>> ---
>>   docs/_ext/proxmox-scanrefs.py | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/_ext/proxmox-scanrefs.py b/docs/_ext/proxmox-scanrefs.py
>> index 1b3c0615..0d626561 100644
>> --- a/docs/_ext/proxmox-scanrefs.py
>> +++ b/docs/_ext/proxmox-scanrefs.py
>> @@ -90,7 +90,18 @@ class ReflabelMapper(Builder):
>>                   if hasattr(node, 'expect_referenced_by_id') and len(node['ids']) > 1: # explicit labels
>>                       filename = self.env.doc2path(docname)
>>                       filename_html = re.sub('.rst', '.html', filename)
>> -                    labelid = node['ids'][1] # [0] is predefined by sphinx, we need [1] for explicit ones
>> +
>> +                    # node['ids'][0] contains a normalized version of the
>> +                    # headline.  If the ref and headline are the same
>> +                    # (normalized) sphinx will set the node['ids'][1] to a
>> +                    # generic id in the format `idX` where X is numeric. If the
>> +                    # ref and headline are not the same, the ref name will be
>> +                    # stored in node['ids'][1]
> 
> can you point me from where you derived that?
> 
> Because I think there are always two refs in such cases where we set one
> above a heading: the implicit heading one and the explicit from us.
> They always get normalized, but the implicit has a fallback if there's a ref
> conflict with an explicit or even another implicit one, when a title is
> reused in the same chapter or so?
> 
> Do we also have access to the chapter id/name here?
> Then we could enforce that explicit ones must have that prefixed.

I derived that from comparing the output of the debug prints for the different situations. Unfortunately, the Sphinx docs are a bit sparse on this, or my search-fu is not good enough ;)

This is the output when the explicit ref matches the implicit one derived from the headline (I shortened the 'children' element):

{'attributes': {'backrefs': [],
                 'classes': [],
                 'dupnames': [],
                 'ids': ['creating-backups', 'id1'],
                 'names': ['creating backups', 'creating_backups']},
  'children': [<title: <#text: 'Creating Backups'>>,
               <paragraph: <#text: 'This section e ...'>>,
               [.....]
               <literal_block: <#text: '# proxmox-back ...'>>,
               <section "excluding files/folders from a backup": <title...><paragraph...><paragraph...><paragraph...><par ...>],
  'document': <document: <section "backup client usage"...>>,
  'expect_referenced_by_id': {'creating-backups': <target: >},
  'expect_referenced_by_name': {'creating_backups': <target: >},

And now if the explicit ref is different from the headline:

{'attributes': {'backrefs': [],
                 'classes': [],
                 'dupnames': [],
                 'ids': ['creating-backups', 'client-creating-backups'],
                 'names': ['creating backups', 'client_creating_backups']},
  'children': [<title: <#text: 'Creating Backups'>>,
               <paragraph: <#text: 'This section e ...'>>,
               [...]
               <literal_block: <#text: '# proxmox-back ...'>>,
               <section "excluding files/folders from a backup": <title...><paragraph...><paragraph...><paragraph...><par ...>],
  'document': <document: <section "backup client usage"...>>,
  'expect_referenced_by_id': {'client-creating-backups': <target: >},
  'expect_referenced_by_name': {'client_creating_backups': <target: >},

You can see the difference in the 'attributes.ids' array.

One thing I observed, though, is that 'expect_referenced_by_id' contains the actual key used for the ref, AFAICT. So we could use that and not worry about checking whether 'attributes.ids[1]' is a generated string of the form 'idX'. If I set the explicit ref to 'idX' with X being a number, that is then also present in the 'expect_referenced_by_id' field.
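To illustrate the idea, here is a rough, untested-against-Sphinx sketch. `FakeSection` and `explicit_label_id` are made-up names for this mail only; the fake node just mimics the two fields visible in the dumps above, and I am assuming there is exactly one explicit label per section, as in both dumps.

```python
class FakeSection(dict):
    """Hypothetical stand-in for a docutils section node (illustration only)."""

    def __init__(self, ids, referenced_by_id=None):
        super().__init__(ids=ids)
        if referenced_by_id is not None:
            # Going by the hasattr() check in the patch, this attribute is
            # only expected on sections that carry an explicit label.
            self.expect_referenced_by_id = referenced_by_id


def explicit_label_id(node):
    """Return the explicit ref id of a section node, or None if there is none."""
    by_id = getattr(node, 'expect_referenced_by_id', None)
    if not by_id:
        return None
    # Assumption: a single explicit label per section, as in both dumps.
    return next(iter(by_id))


# Values copied from the two dumps; the <target> objects are replaced by None.
same = FakeSection(['creating-backups', 'id1'],
                   {'creating-backups': None})
different = FakeSection(['creating-backups', 'client-creating-backups'],
                        {'client-creating-backups': None})

print(explicit_label_id(same))       # creating-backups
print(explicit_label_id(different))  # client-creating-backups
```

In both cases the key is the intended label, so no pattern matching on 'idX' would be needed.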

On an additional note: right now we do not have any explicit references that match the headlines they reference, because they are all prefixed or unique in some other way. We could add a check here that warns or aborts the build if an explicit ref id matches the normalized headline, to avoid any ambiguity in the refs in the future.

e.g. (pseudo code)
if node['ids'][0] in node.expect_referenced_by_id:
     exit('reference is matching implicit headline ref, consider adding a prefix')

> 
>> +                    if re.match('^id[0-9]*$', node['ids'][1]):
> 
> should be a + not * op? we want to avoid clashes with real possible refs
> as much as possible..
> 
> What happens if I set now one to id1 and there would be already an id1?
> 
> I just really do not want to revisit this again, and losing references
> is a no-go, the docs must work.

See above note, I think that addresses it.
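For completeness, a quick standalone comparison of the two quantifiers. The candidate strings are just examples I picked; only the first pattern is the one from the patch.

```python
import re

star = re.compile(r'^id[0-9]*$')  # pattern used in the patch
plus = re.compile(r'^id[0-9]+$')  # variant suggested in the review

candidates = ('id', 'id1', 'id42', 'idle', 'client-id1')

# Map each candidate to (matches '*' pattern, matches '+' pattern).
matches = {c: (bool(star.match(c)), bool(plus.match(c))) for c in candidates}

for name, (with_star, with_plus) in matches.items():
    print('{:<12} *: {!s:<6} +: {}'.format(name, with_star, with_plus))
```

The two patterns only disagree on the bare string 'id'. Neither of them can tell an explicit ref that was deliberately named 'id1' apart from a generated one, which is why reading the key from 'expect_referenced_by_id' looks like the safer route to me.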

> 
>> +                        labelid = node['ids'][0]
>> +                    else:
>> +                        labelid = node['ids'][1]
>> +
>>                       title = cast(nodes.title, node[0])
>>                       logger.info('traversing section {}'.format(title.astext()))
>>                       ref_name = getattr(title, 'rawsource', title.astext())
>>
>