[pve-devel] [PATCH proxmox-i18n v2] use xgettext to extract translatable strings

Maximiliano Sandoval m.sandoval at proxmox.com
Thu Dec 7 09:42:30 CET 2023


xgettext is a robust tool to extract translatable strings from source
code.

Using msgcat to concatenate pot files is not recommended, and msgcat
also added garbage when there were comments, hence we switch to
xgettext for that step as well.

What do we get for free:

- It de-escapes strings. There are 3 cases in our code base where
  single-quoted strings were used and their `'` had to be escaped; these
  were not de-escaped properly when presented to translators. This is
  one such example:

  ```diff
   #: proxmox-widget-toolkit/src/panel/EmailRecipientPanel.js:39
  -msgid "The notification will be sent to the user\\'s configured mail address"
  +#, fuzzy
  +msgid "The notification will be sent to the user's configured mail address"
   msgstr "La notificación sera enviada a el correo configurado del usuario"
  ```

- xgettext can detect when strings use a certain style of substitutions.
  I was not able to determine the exact conditions, and it only affects
  a single string in the entire code base.

  ```diff
   #: proxmox-widget-toolkit/src/Utils.js:995
  +#, javascript-format
   msgid "{0}% of {1}"
   msgstr "{0}% de {1}"
  ```

- Correct POT-Creation-Date. Note how the new one matches the
  PO-Revision-Date's format.

  ```diff
  @@ -7,7 +7,7 @@ msgid ""
   msgstr ""
   "Project-Id-Version: proxmox translations\n"
   "Report-Msgid-Bugs-To: <support at proxmox.com>\n"
  -"POT-Creation-Date: Wed Nov 22 18:17:30 2023\n"
  +"POT-Creation-Date: 2023-12-01 11:25+0100\n"
   "PO-Revision-Date: 2023-11-27 16:43+0100\n"
   "Last-Translator: Maximiliano Sandoval <m.sandoval at proxmox.com>\n"
   "Language-Team: Spanish\n"
  ```

- Extraction of strings using ngettext, pgettext, etc. Even if we don't
  have js wrappers for these at the moment, they are critical to provide
  good-quality translations and could be added in the future.

- We can extract comments from the source code with `xgettext -c TAG`.

  Code comments in a line above a `gettext` that start with `TRANSLATORS`
  will be added to the po files to provide context for translators.

  Newly added comments won't mark strings as fuzzy but can provide
  helpful context to translators.

  Comments are additive. If, for example, two sources contain the same
  string with different comments and it appears a third time without
  comments, the three sources and the two comments will be shown to
  translators.

  These are a few examples that could be implemented in our codebase:

  It is not clear if "Prune Options" prunes the options or configures
  pruning.
  ```js
  // TRANSLATORS: Opens the panel that allows configuring how Pruning works
  let s = gettext("Prune Options");
  ```

  Adding a source for a concept or its expanded name can help
  translators decide what the gender of a word is in their language.
  ```js
  // TRANSLATORS: TOTP stands for Time-based one-time password
  let s = gettext("Add a TOTP login factor");
  ```

  Some strings are not marked for translation to avoid translating
  certain parts of them. This is a change that could be made:
  ```diff
  -fieldLabel: 'Crush Rule', // do not localize
  +// TRANSLATORS: Do not translate 'Crush', it's a proper name
  +fieldLabel: gettext('Crush Rule'),
  ```

  Or simply to give more context when substitutions are involved.
  ```js
  // TRANSLATORS: For example 'Join CLUSTER_NAME'
  return Ext.String.format(gettext('Join {0}'), `'${cn}'`);
  ```
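
To illustrate the ngettext/pgettext point above, here is a minimal
sketch of what such js wrappers could look like. Everything in it is an
assumption made for illustration: the `catalog` object, its layout and
the wrapper bodies do not exist in our code base. Only the call shapes
(`ngettext(singular, plural, n)`, `pgettext(context, msgid)`) and the
`\u0004` context separator follow the usual gettext conventions.

```js
// Hypothetical in-memory catalog, NOT the format used by proxmox-i18n.
// Keys are msgids; entries with a context use "context\u0004msgid".
const catalog = {
    '{0} backup': ['{0} copia de seguridad', '{0} copias de seguridad'],
    'menu\u0004Open': ['Abrir'],
    'state\u0004Open': ['Abierto'],
};

// Plural rule for languages with two forms (singular when n is 1).
const pluralIndex = (n) => (n !== 1 ? 1 : 0);

// Picks the singular or plural translation; falls back to the source strings.
function ngettext(singular, plural, n) {
    const forms = catalog[singular];
    const idx = pluralIndex(n);
    if (forms && forms[idx] !== undefined) {
        return forms[idx];
    }
    return n === 1 ? singular : plural;
}

// Looks up a msgid within a context; falls back to the untranslated msgid.
function pgettext(context, msgid) {
    const forms = catalog[`${context}\u0004${msgid}`];
    return forms ? forms[0] : msgid;
}

console.log(ngettext('{0} backup', '{0} backups', 2));
console.log(pgettext('menu', 'Open'));
console.log(pgettext('state', 'Open'));
```

As far as I can tell, xgettext's default JavaScript keywords already
cover these two names, so such calls should be extracted without extra
`--keyword` flags.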

Cons:
- In total 3 translations were marked as fuzzy. Translators will have to
  review and mark them as translated again.

- The reordering of sources for each msgstr will create an unnecessarily
  massive (yet ultimately harmless) diff (approx. 50k insertions(+) 50k
  deletions(-)).
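
The fuzzy entries appear to come from the de-escaping change described
above (the example diff shows one of them). The following sketch shows
why a line-based regexp keeps the backslash in the msgid. The regexp is
a simplified JavaScript port of the one in the removed jsgettext.pl (no
possessive quantifiers), and the source line is a made-up example, so
take it as an illustration rather than the exact previous behavior.

```js
// Source text as it would sit in a .js file; String.raw keeps the backslash.
const sourceLine = String.raw`html: gettext('The user\'s configured mail address'),`;

// Simplified port of the old extraction regexp: group 2 holds the content of
// a double-quoted string, group 3 the content of a single-quoted one.
const re = /\bgettext\s*\(("((?:[^"\\]|\\.)*)"|'((?:[^'\\]|\\.)*)')\)/;

const match = re.exec(sourceLine);
const rawMsgid = match[2] ?? match[3];

// Undo only quote and backslash escapes; a real lexer handles many more.
const unescaped = rawMsgid.replace(/\\(['"\\])/g, '$1');

console.log(rawMsgid);  // The user\'s configured mail address
console.log(unescaped); // The user's configured mail address
```

Since the msgid itself changes, the existing translation no longer
matches exactly and gets flagged for review.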

Signed-off-by: Maximiliano Sandoval <m.sandoval at proxmox.com>
---
Differences from v1:
 - Use `find -name` rather than `find -iname`
 - Only extract comments starting with TRANSLATORS. It seems it is not possible to
   specify multiple tags.

 Makefile     |  11 ++++-
 jsgettext.pl | 135 ---------------------------------------------------
 2 files changed, 9 insertions(+), 137 deletions(-)
 delete mode 100755 jsgettext.pl

diff --git a/Makefile b/Makefile
index 1d7af6e..cee10cf 100644
--- a/Makefile
+++ b/Makefile
@@ -97,7 +97,14 @@ pbs-lang-%.js: %.po
 # parameter 1 is the name
 # parameter 2 is the directory
 define potupdate
-    ./jsgettext.pl -p "$(1) $(shell cd $(2);git rev-parse HEAD)" -o $(1).pot $(2)
+	find . -name "*.js" -path "./$(2)*" | xargs xgettext -s \
+      --add-comments=TRANSLATORS \
+      --from-code="UTF-8" \
+      --package-name="$(1)" \
+      --package-version="$(shell cd $(2);git rev-parse HEAD)" \
+      --msgid-bugs-address="<support at proxmox.com>" \
+      --copyright-holder="Copyright (C) Proxmox Server Solutions GmbH <support at proxmox.com> & the translation contributors." \
+      --output="$(1)".pot
 endef
 
 .PHONY: update update_pot do_update
@@ -124,7 +131,7 @@ init-%.po: messages.pot
 
 .INTERMEDIATE: messages.pot
 messages.pot: proxmox-widget-toolkit.pot proxmox-mailgateway.pot pve-manager.pot proxmox-backup.pot
-	msgcat $^ > $@
+	xgettext $^ --msgid-bugs-address="<support at proxmox.com>" -o $@
 
 .PHONY: distclean
 distclean: clean
diff --git a/jsgettext.pl b/jsgettext.pl
deleted file mode 100755
index 7f758fd..0000000
--- a/jsgettext.pl
+++ /dev/null
@@ -1,135 +0,0 @@
-#!/usr/bin/perl
-
-use strict;
-use warnings;
-
-use Encode;
-use Getopt::Long;
-use Locale::PO;
-use Time::Local;
-
-my $options = {};
-GetOptions($options, 'o=s', 'b=s', 'p=s') or die "unable to parse options\n";
-
-my $dirs = [@ARGV];
-
-die "no directory specified\n" if !scalar(@$dirs);
-
-foreach my $dir (@$dirs) {
-    die "no such directory '$dir'\n" if ! -d $dir;
-}
-
-my $projectId = $options->{p} || die "missing project ID\n";
-
-my $basehref = {};
-if (my $base = $options->{b}) {
-    my $aref = Locale::PO->load_file_asarray($base) ||
-	die "unable to load '$base'\n";
-
-    my $charset;
-    my $hpo = $aref->[0] || die "no header";
-    my $header = $hpo->dequote($hpo->msgstr);
-    if ($header =~ m|^Content-Type:\s+text/plain;\s+charset=(\S+)$|im) {
-	$charset = $1;
-    } else {
-	die "unable to get charset\n" if !$charset;
-    }
-
-    foreach my $po (@$aref) {
-	my $qmsgid = decode($charset, $po->msgid);
-	my $msgid = $po->dequote($qmsgid);
-	$basehref->{$msgid} = $po;
-    }
-}
-
-sub find_js_sources {
-    my ($base_dirs) = @_;
-
-    my $find_cmd = 'find ';
-    # shell quote heuristic, with the (here safe) assumption that the dirs don't contain single-quotes
-    $find_cmd .= join(' ', map { "'$_'" } $base_dirs->@*);
-    $find_cmd .= ' -name "*.js"';
-    open(my $find_cmd_output, '-|', "$find_cmd | sort") or die "Failed to execute command: $!";
-
-    my $sources = [];
-    while (my $line = <$find_cmd_output>) {
-	chomp $line;
-	print "F: $line\n";
-	push @$sources, $line;
-    }
-    close($find_cmd_output);
-
-    return $sources;
-}
-
-my $header = <<'__EOD';
-Proxmox message catalog.
-
-Copyright (C) Proxmox Server Solutions GmbH
-
-This file is free software: you can redistribute it and/or modify it under the terms of the GNU
-Affero General Public License as published by the Free Software Foundation, either version 3 of the
-License, or (at your option) any later version.
--- Proxmox Support Team <support\@proxmox.com>
-__EOD
-
-my $ctime = scalar localtime;
-
-my $href = {
-    '' => Locale::PO->new(
-	-msgid => '',
-	-comment => $header,
-	-fuzzy => 1,
-	-msgstr => "Project-Id-Version: $projectId\n"
-	    ."Report-Msgid-Bugs-To: <support\@proxmox.com>\n"
-	    ."POT-Creation-Date: $ctime\n"
-	    ."PO-Revision-Date: YEAR-MO-DA HO:MI +ZONE\n"
-	    ."Last-Translator: FULL NAME <EMAIL\@ADDRESS>\n"
-	    ."Language-Team: LANGUAGE <support\@proxmox.com>\n"
-	    ."MIME-Version: 1.0\n"
-	    ."Content-Type: text/plain; charset=UTF-8\n"
-	    ."Content-Transfer-Encoding: 8bit\n",
-    ),
-};
-
-sub extract_msg {
-    my ($filename, $linenr, $line) = @_;
-
-    my $count = 0;
-
-    while(1) {
-	my $text;
-	if ($line =~ m/\bgettext\s*\((("((?:[^"\\]++|\\.)*+)")|('((?:[^'\\]++|\\.)*+)'))\)/g) {
-	    $text = $3 || $5;
-	}
-	last if !$text;
-	return if $basehref->{$text};
-	$count++;
-
-	my $ref = "$filename:$linenr";
-
-	if (my $po = $href->{$text}) {
-	    $po->reference($po->reference() . " $ref");
-	} else {
-	    $href->{$text} = Locale::PO->new(-msgid=> $text, -reference=> $ref, -msgstr=> '');
-	}
-    }
-    die "can't extract gettext message in '$filename' line $linenr\n" if !$count;
-    return;
-}
-
-my $sources = find_js_sources($dirs);
-
-foreach my $s (@$sources) {
-    open(my $SRC_FH, '<', $s) || die "unable to open file '$s' - $!\n";
-    while(defined(my $line = <$SRC_FH>)) {
-	if ($line =~ m/gettext\s*\(/ && $line !~ m/^\s*function gettext/) {
-	    extract_msg($s, $., $line);
-	}
-    }
-    close($SRC_FH);
-}
-
-my $filename = $options->{o} // "messages.pot";
-Locale::PO->save_file_fromhash($filename, $href);
-
-- 
2.39.2




