diff mbox series

[v3,3/9] Makefile: have "make pot" not "reset --hard"

Message ID 20220523012531.4505-4-worldhello.net@gmail.com (mailing list archive)
State Superseded
Headers show
Series Incremental po/git.pot update and new l10n workflow | expand

Commit Message

Jiang Xin May 23, 2022, 1:25 a.m. UTC
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
custom PRItime format, 2017-07-20) we'd consider source files as-is
with gettext, but because we need to understand PRItime in the same way
that gettext itself understands PRIuMAX we'd first check if we had a
clean checkout, then munge all of the processed files in-place with
"sed", generate "po/git.pot", and then finally "reset --hard" to undo
our changes.

By generating "pot" snippets in ".build/pot/po" for each source file
and rewriting certain source files with PRItime macros to temporary
files in ".build/pot/po", we can avoid running "make pot" by altering
files in place and doing a "reset --hard" afterwards.

This speed of "make pot" is slower than before on an initial run,
because we run "xgettext" many times (once per source file), but it
can be boosted by parallelization. It is *much* faster for incremental
runs, and will allow us to implement related targets in subsequent
commits.

When the "pot" target was originally added in cd5513a7168 (i18n:
Makefile: "pot" target to extract messages marked for translation,
2011-02-22) it behaved like a "normal" target. I.e. we'd skip the
re-generation of the po/git.pot if nothing had to be done.

Then after po/git.pot was checked in in dce37b66fb0 (l10n: initial
git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken
until 1f31963e921 (i18n: treat "make pot" as an explicitly-invoked
target, 2014-08-22) when it was made to depend on "FORCE". I.e. the
Makefile's dependency resolution inherently can't handle incremental
building when the target file may be updated by git (or something else
external to "make"). But this case no longer applies, so FORCE is no
longer needed.

That out of the way, the main logic change here is getting rid of the
"reset --hard":

We'll generate intermediate ".build/pot/po/%.po" files from "%", which
is handy to see at a glance what strings (if any) in a given file are
marked for translation:

	$ make .build/pot/po/pretty.c.po
	[...]
	$ cat .build/pot/po/pretty.c.po
	#: pretty.c:1051
	msgid "unable to parse --pretty format"
	msgstr ""
	$

For these C source files which contain the PRItime macros, we will
create temporary munged "*.c" files in a tree in ".build/pot/po"
corresponding to our source tree, and have "xgettext" consider those.
The rule needs to be careful to "(cd .build/pot/po && ...)", because
otherwise the comments in the po/git.pot file wouldn't refer to the
correct source locations (they'd be prefixed with ".build/pot/po").
These temporary munged "*.c” files will be removed immediately after
the corresponding po files are generated, because some development tools
cannot ignore the duplicate source files in the ".build" directory
according to the ".gitignore" file, and that may cause trouble.

The output of the generated po/git.pot file is changed in one minor
way: Because we're using msgcat(1) instead of xgettext(1) to
concatenate the output we'll now disambiguate where "TRANSLATORS"
comments come from, in cases where a message is the same in N files,
and either only one has a "TRANSLATORS" comment, or they're
different. E.g. for the "Your edited hunk[...]" message we'll now
apply this change (comment content elided):

	+#. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
	 #. TRANSLATORS: do not translate [y/n]
	[...]
	+#. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
	 #. TRANSLATORS: do not translate [y/n]
	[...]
	 #: add-patch.c:1253 git-add--interactive.perl:1244
	 msgid ""
	 "Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? "
	 msgstr ""

There are six such changes, and they all make the context more
understandable, as msgcat(1) is better at handling these edge cases
than xgettext(1)'s previously used "--join-existing" flag.

But filenames in the above disambiguation lines of extracted-comments
have an extra ".po" extension compared to the filenames at the file
locations. While we could rename the intermediate ".build/pot/po/%.po"
files without the ".po" extension to use more intuitive filenames in
the disambiguation lines of extracted-comments, but that will confuse
developer tools with lots of invalid C or other source files in
".build/pot/po" directory.

The addition of "--omit-header" option for xgettext makes the "pot"
snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
subsequent commit this header behavior has been hiding an
encoding-related bug from us, so let's carry it forward instead of
re-generating it with xgettext(1).

The "po/git.pot" file should have a header entry, because a proper
header entry will increase the speed of creating a new po file using
msginit and set a proper "POT-Creation-Date:" field in the header
entry of a "po/XX.po" file. We use xgettext to generate a separate
header file at ".build/pot/git.header" from "/dev/null", and use this
header to assemble "po/git.pot".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore |  1 +
 Makefile   | 80 ++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 60 insertions(+), 21 deletions(-)

Comments

Ævar Arnfjörð Bjarmason May 23, 2022, 7:28 a.m. UTC | #1
On Mon, May 23 2022, Jiang Xin wrote:

> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
> custom PRItime format, 2017-07-20) we'd consider source files as-is
> with gettext, but because we need to understand PRItime in the same way
> that gettext itself understands PRIuMAX we'd first check if we had a
> clean checkout, then munge all of the processed files in-place with
> "sed", generate "po/git.pot", and then finally "reset --hard" to undo
> our changes.
>
> By generating "pot" snippets in ".build/pot/po" for each source file
> and rewriting certain source files with PRItime macros to temporary
> files in ".build/pot/po", we can avoid running "make pot" by altering
> files in place and doing a "reset --hard" afterwards.
>
> This speed of "make pot" is slower than before on an initial run,
> because we run "xgettext" many times (once per source file), but it
> can be boosted by parallelization. It is *much* faster for incremental
> runs, and will allow us to implement related targets in subsequent
> commits.
>
> When the "pot" target was originally added in cd5513a7168 (i18n:
> Makefile: "pot" target to extract messages marked for translation,
> 2011-02-22) it behaved like a "normal" target. I.e. we'd skip the
> re-generation of the po/git.pot if nothing had to be done.
>
> Then after po/git.pot was checked in in dce37b66fb0 (l10n: initial
> git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken
> until 1f31963e921 (i18n: treat "make pot" as an explicitly-invoked
> target, 2014-08-22) when it was made to depend on "FORCE". I.e. the
> Makefile's dependency resolution inherently can't handle incremental
> building when the target file may be updated by git (or something else
> external to "make"). But this case no longer applies, so FORCE is no
> longer needed.
>
> That out of the way, the main logic change here is getting rid of the
> "reset --hard":
>
> We'll generate intermediate ".build/pot/po/%.po" files from "%", which
> is handy to see at a glance what strings (if any) in a given file are
> marked for translation:
>
> 	$ make .build/pot/po/pretty.c.po
> 	[...]
> 	$ cat .build/pot/po/pretty.c.po
> 	#: pretty.c:1051
> 	msgid "unable to parse --pretty format"
> 	msgstr ""
> 	$
>
> For these C source files which contain the PRItime macros, we will
> create temporary munged "*.c" files in a tree in ".build/pot/po"
> corresponding to our source tree, and have "xgettext" consider those.
> The rule needs to be careful to "(cd .build/pot/po && ...)", because
> otherwise the comments in the po/git.pot file wouldn't refer to the
> correct source locations (they'd be prefixed with ".build/pot/po").
> These temporary munged "*.c” files will be removed immediately after
> the corresponding po files are generated, because some development tools
> cannot ignore the duplicate source files in the ".build" directory
> according to the ".gitignore" file, and that may cause trouble.
>
> The output of the generated po/git.pot file is changed in one minor
> way: Because we're using msgcat(1) instead of xgettext(1) to
> concatenate the output we'll now disambiguate where "TRANSLATORS"
> comments come from, in cases where a message is the same in N files,
> and either only one has a "TRANSLATORS" comment, or they're
> different. E.g. for the "Your edited hunk[...]" message we'll now
> apply this change (comment content elided):
>
> 	+#. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
> 	 #. TRANSLATORS: do not translate [y/n]
> 	[...]
> 	+#. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
> 	 #. TRANSLATORS: do not translate [y/n]
> 	[...]
> 	 #: add-patch.c:1253 git-add--interactive.perl:1244
> 	 msgid ""
> 	 "Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? "
> 	 msgstr ""
>
> There are six such changes, and they all make the context more
> understandable, as msgcat(1) is better at handling these edge cases
> than xgettext(1)'s previously used "--join-existing" flag.
>
> But filenames in the above disambiguation lines of extracted-comments
> have an extra ".po" extension compared to the filenames at the file
> locations. While we could rename the intermediate ".build/pot/po/%.po"
> files without the ".po" extension to use more intuitive filenames in
> the disambiguation lines of extracted-comments, but that will confuse
> developer tools with lots of invalid C or other source files in
> ".build/pot/po" directory.
>
> The addition of "--omit-header" option for xgettext makes the "pot"
> snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
> subsequent commit this header behavior has been hiding an
> encoding-related bug from us, so let's carry it forward instead of
> re-generating it with xgettext(1).
>
> The "po/git.pot" file should have a header entry, because a proper
> header entry will increase the speed of creating a new po file using
> msginit and set a proper "POT-Creation-Date:" field in the header
> entry of a "po/XX.po" file. We use xgettext to generate a separate
> header file at ".build/pot/git.header" from "/dev/null", and use this
> header to assemble "po/git.pot".
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  .gitignore |  1 +
>  Makefile   | 80 ++++++++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 60 insertions(+), 21 deletions(-)
>
> diff --git a/.gitignore b/.gitignore
> index e81de1063a..a452215764 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -200,6 +200,7 @@
>  *.[aos]
>  *.o.json
>  *.py[co]
> +.build/
>  .depend/
>  *.gcda
>  *.gcno
> diff --git a/Makefile b/Makefile
> index 46914dcd80..1962999c18 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -569,6 +569,7 @@ INSTALL = install
>  TCL_PATH = tclsh
>  TCLTK_PATH = wish
>  XGETTEXT = xgettext
> +MSGCAT = msgcat
>  MSGFMT = msgfmt
>  CURL_CONFIG = curl-config
>  GCOV = gcov
> @@ -2706,6 +2707,7 @@ XGETTEXT_FLAGS = \
>  	--force-po \
>  	--add-comments=TRANSLATORS: \
>  	--msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
> +	--package-name=Git \
>  	--sort-by-file \
>  	--from-code=UTF-8
>  XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
> @@ -2714,6 +2716,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
>  	--keyword=gettextln --keyword=eval_gettextln
>  XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
>  	--keyword=__ --keyword=N__ --keyword="__n:1,2"
> +MSGCAT_FLAGS = --sort-by-file
>  LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
>  	      $(GENERATED_H)
>  LOCALIZED_SH = $(SCRIPT_SH)
> @@ -2726,34 +2729,68 @@ LOCALIZED_SH += t/t0200/test.sh
>  LOCALIZED_PERL += t/t0200/test.perl
>  endif
>  
> -## Note that this is meant to be run only by the localization coordinator
> -## under a very controlled condition, i.e. (1) it is to be run in a
> -## Git repository (not a tarball extract), (2) any local modifications
> -## will be lost.
> +## We generate intermediate .build/pot/po/%.po files containing a
> +## extract of the translations we find in each file in the source
> +## tree. We will assemble them using msgcat to create the final
> +## "po/git.pot" file.
> +LOCALIZED_ALL_GEN_PO =
> +
> +LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
> +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
> +
> +LOCALIZED_SH_GEN_PO = $(LOCALIZED_SH:%=.build/pot/po/%.po)
> +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
> +
> +LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
> +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
> +
>  ## Gettext tools cannot work with our own custom PRItime type, so
>  ## we replace PRItime with PRIuMAX.  We need to update this to
>  ## PRIdMAX if we switch to a signed type later.
> +$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
> +	$(call mkdir_p_parent_template)
> +	$(QUIET_XGETTEXT) \
> +	    if grep -q PRItime $<; then \
> +		(\
> +			sed -e 's|PRItime|PRIuMAX|g' <$< \
> +				>.build/pot/po/$< && \
> +			cd .build/pot/po && \
> +			$(XGETTEXT) --omit-header \
> +				-o $(@:.build/pot/po/%=%) \
> +				$(XGETTEXT_FLAGS_C) $< && \
> +			rm $<; \
> +		); \
> +	    else \
> +		$(XGETTEXT) --omit-header \
> +			-o $@ $(XGETTEXT_FLAGS_C) $<; \
> +	    fi
>  
> -po/git.pot: $(GENERATED_H) FORCE
> -	# All modifications will be reverted at the end, so we do not
> -	# want to have any local change.
> -	git diff --quiet HEAD && git diff --quiet --cached
> +$(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
> +	$(call mkdir_p_parent_template)
> +	$(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> +		-o$@ $(XGETTEXT_FLAGS_SH) $<
>  
> -	@for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
> -	do \
> -		sed -e 's|PRItime|PRIuMAX|g' <"$$s" >"$$s+" && \
> -		cat "$$s+" >"$$s" && rm "$$s+"; \
> -	done
> +$(LOCALIZED_PERL_GEN_PO): .build/pot/po/%.po: %
> +	$(call mkdir_p_parent_template)
> +	$(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> +		-o$@ $(XGETTEXT_FLAGS_PERL) $<
> +q
> +define gen_pot_header
> +$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
> +	-o - /dev/null | \
> +sed -e 's|charset=CHARSET|charset=UTF-8|' \
> +    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
> +    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|' \
> +    >$@ && \
> +echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
> +endef
>  
> -	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C)
> -	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \
> -		$(LOCALIZED_SH)
> -	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_PERL) \
> -		$(LOCALIZED_PERL)
> +.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
> +	$(call mkdir_p_parent_template)
> +	$(QUIET_GEN)$(gen_pot_header)

So re a previous round I think the addition of a header to po/git.pot
doesn't make sense, i.e. it's not needed, so we can just drop this
entirely.

But if you really want to include it anyway then this dependency is
broken, the header just depends on this gen_pot_header you've defined,
we don't need to re-generate it every time any single source file
changes.

Or is the intent to have its timestamp bumped if any of the source files
change?

If so I think that to the extent we need headers at all (for the *.po
files) it might be worthwhile to just have the timestamp be the same
epoch we use for the tests in t/, i.e. never update it and avoid the
needless churn. If people need to check when things were last changed,
or initially added, they can use this little thing called "git" :)
Junio C Hamano May 23, 2022, 8:15 a.m. UTC | #2
Jiang Xin <worldhello.net@gmail.com> writes:

> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
> custom PRItime format, 2017-07-20) we'd consider source files as-is

", we'd consider".

> with gettext, but because we need to understand PRItime in the same way
> that gettext itself understands PRIuMAX we'd first check if we had a

"PRIuMAX, we'd first"

> clean checkout, then munge all of the processed files in-place with
> "sed", generate "po/git.pot", and then finally "reset --hard" to undo
> our changes.
>
> By generating "pot" snippets in ".build/pot/po" for each source file
> and rewriting certain source files with PRItime macros to temporary
> files in ".build/pot/po", we can avoid running "make pot" by altering
> files in place and doing a "reset --hard" afterwards.

Good.

> This speed of "make pot" is slower than before on an initial run,
> because we run "xgettext" many times (once per source file), but it
> can be boosted by parallelization. It is *much* faster for incremental
> runs, and will allow us to implement related targets in subsequent
> commits.

This is to show my ignorance, but is there any downside, other than
increased overhead coming from runing many instances of the program,
in the "one file at a time" approach?  I was wondering if two or
more identical translatable strings appear in multiple source files,
where they are coalesced into a single entry in the resulting .pot
file, and if xgettext having visibility into all these files would
somehow help the process, but presumably we'd use msgcat to unify
them into one entry so there wouldn't be such a downside there.  But
are there others?

Thanks.
Jiang Xin May 23, 2022, 9:37 a.m. UTC | #3
On Mon, May 23, 2022 at 4:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jiang Xin <worldhello.net@gmail.com> writes:
>
> > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >
> > Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
> > custom PRItime format, 2017-07-20) we'd consider source files as-is
>
> ", we'd consider".
>
> > with gettext, but because we need to understand PRItime in the same way
> > that gettext itself understands PRIuMAX we'd first check if we had a
>
> "PRIuMAX, we'd first"
>
> > clean checkout, then munge all of the processed files in-place with
> > "sed", generate "po/git.pot", and then finally "reset --hard" to undo
> > our changes.
> >
> > By generating "pot" snippets in ".build/pot/po" for each source file
> > and rewriting certain source files with PRItime macros to temporary
> > files in ".build/pot/po", we can avoid running "make pot" by altering
> > files in place and doing a "reset --hard" afterwards.
>
> Good.
>
> > This speed of "make pot" is slower than before on an initial run,
> > because we run "xgettext" many times (once per source file), but it
> > can be boosted by parallelization. It is *much* faster for incremental
> > runs, and will allow us to implement related targets in subsequent
> > commits.
>
> This is to show my ignorance, but is there any downside, other than
> increased overhead coming from runing many instances of the program,
> in the "one file at a time" approach?  I was wondering if two or
> more identical translatable strings appear in multiple source files,
> where they are coalesced into a single entry in the resulting .pot
> file, and if xgettext having visibility into all these files would
> somehow help the process, but presumably we'd use msgcat to unify
> them into one entry so there wouldn't be such a downside there.  But
> are there others?

Both xgettext and msgcat have the ability to unify identical
translatable strings, and list all the filenames and locations in the
reference-comment part of the entry. In the following example, we can
see the message "could not write index" comes from several C source
files. We can also find this message in the corresponding intermediate
po files, such as: ".build/pot/po/add-interactive.c.po" and
".build/pot/po/reset.c.po".

    #: add-interactive.c:709 add-interactive.c:898 reset.c:160 ...
    msgid "could not write index"
    msgstr ""

If there are different contexts (I.E. notes for translators in source
code) for the identical translatable string from different files,
"msgcat" will add additional lines "#. #-#-#-#-# ... #-#-#-#-#" for
disambiguation like below:

    +#. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
     #. TRANSLATORS: 'it' refers to the patch mentioned in the
previous messages.
     #: add-patch.c:1106 git-add--interactive.perl:1129
     msgid ""
     "If it does not apply cleanly, you will be given an opportunity to\n"
     "edit again.  If all lines of the hunk are removed, then the edit is\n"
     "aborted and the hunk is left unchanged.\n"
     msgstr ""

The filename in the disambiguation comment line ends with ".po"
extension. Ævar tries to fix this issue by introducing more
intermediate files (which I will try to understand by some experiment)
or remove ".po" extension from po files, while I think it is
tolerable.
Jiang Xin May 23, 2022, 3 p.m. UTC | #4
On Mon, May 23, 2022 at 4:07 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 23 2022, Jiang Xin wrote:
>
> > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >
> > Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
> > custom PRItime format, 2017-07-20) we'd consider source files as-is
> > with gettext, but because we need to understand PRItime in the same way
> > that gettext itself understands PRIuMAX we'd first check if we had a
> > clean checkout, then munge all of the processed files in-place with
> > "sed", generate "po/git.pot", and then finally "reset --hard" to undo
> > our changes.
> >
> > By generating "pot" snippets in ".build/pot/po" for each source file
> > and rewriting certain source files with PRItime macros to temporary
> > files in ".build/pot/po", we can avoid running "make pot" by altering
> > files in place and doing a "reset --hard" afterwards.
> >
> > This speed of "make pot" is slower than before on an initial run,
> > because we run "xgettext" many times (once per source file), but it
> > can be boosted by parallelization. It is *much* faster for incremental
> > runs, and will allow us to implement related targets in subsequent
> > commits.
> >
> > When the "pot" target was originally added in cd5513a7168 (i18n:
> > Makefile: "pot" target to extract messages marked for translation,
> > 2011-02-22) it behaved like a "normal" target. I.e. we'd skip the
> > re-generation of the po/git.pot if nothing had to be done.
> >
> > Then after po/git.pot was checked in in dce37b66fb0 (l10n: initial
> > git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken
> > until 1f31963e921 (i18n: treat "make pot" as an explicitly-invoked
> > target, 2014-08-22) when it was made to depend on "FORCE". I.e. the
> > Makefile's dependency resolution inherently can't handle incremental
> > building when the target file may be updated by git (or something else
> > external to "make"). But this case no longer applies, so FORCE is no
> > longer needed.
> >
> > That out of the way, the main logic change here is getting rid of the
> > "reset --hard":
> >
> > We'll generate intermediate ".build/pot/po/%.po" files from "%", which
> > is handy to see at a glance what strings (if any) in a given file are
> > marked for translation:
> >
> >       $ make .build/pot/po/pretty.c.po
> >       [...]
> >       $ cat .build/pot/po/pretty.c.po
> >       #: pretty.c:1051
> >       msgid "unable to parse --pretty format"
> >       msgstr ""
> >       $
> >
> > For these C source files which contain the PRItime macros, we will
> > create temporary munged "*.c" files in a tree in ".build/pot/po"
> > corresponding to our source tree, and have "xgettext" consider those.
> > The rule needs to be careful to "(cd .build/pot/po && ...)", because
> > otherwise the comments in the po/git.pot file wouldn't refer to the
> > correct source locations (they'd be prefixed with ".build/pot/po").
> > These temporary munged "*.c” files will be removed immediately after
> > the corresponding po files are generated, because some development tools
> > cannot ignore the duplicate source files in the ".build" directory
> > according to the ".gitignore" file, and that may cause trouble.
> >
> > The output of the generated po/git.pot file is changed in one minor
> > way: Because we're using msgcat(1) instead of xgettext(1) to
> > concatenate the output we'll now disambiguate where "TRANSLATORS"
> > comments come from, in cases where a message is the same in N files,
> > and either only one has a "TRANSLATORS" comment, or they're
> > different. E.g. for the "Your edited hunk[...]" message we'll now
> > apply this change (comment content elided):
> >
> >       +#. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
> >        #. TRANSLATORS: do not translate [y/n]
> >       [...]
> >       +#. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
> >        #. TRANSLATORS: do not translate [y/n]
> >       [...]
> >        #: add-patch.c:1253 git-add--interactive.perl:1244
> >        msgid ""
> >        "Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? "
> >        msgstr ""
> >
> > There are six such changes, and they all make the context more
> > understandable, as msgcat(1) is better at handling these edge cases
> > than xgettext(1)'s previously used "--join-existing" flag.
> >
> > But filenames in the above disambiguation lines of extracted-comments
> > have an extra ".po" extension compared to the filenames at the file
> > locations. While we could rename the intermediate ".build/pot/po/%.po"
> > files without the ".po" extension to use more intuitive filenames in
> > the disambiguation lines of extracted-comments, but that will confuse
> > developer tools with lots of invalid C or other source files in
> > ".build/pot/po" directory.
> >
> > The addition of "--omit-header" option for xgettext makes the "pot"
> > snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
> > subsequent commit this header behavior has been hiding an
> > encoding-related bug from us, so let's carry it forward instead of
> > re-generating it with xgettext(1).
> >
> > The "po/git.pot" file should have a header entry, because a proper
> > header entry will increase the speed of creating a new po file using
> > msginit and set a proper "POT-Creation-Date:" field in the header
> > entry of a "po/XX.po" file. We use xgettext to generate a separate
> > header file at ".build/pot/git.header" from "/dev/null", and use this
> > header to assemble "po/git.pot".
> >
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  .gitignore |  1 +
> >  Makefile   | 80 ++++++++++++++++++++++++++++++++++++++++--------------
> >  2 files changed, 60 insertions(+), 21 deletions(-)
> >
> > diff --git a/.gitignore b/.gitignore
> > index e81de1063a..a452215764 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -200,6 +200,7 @@
> >  *.[aos]
> >  *.o.json
> >  *.py[co]
> > +.build/
> >  .depend/
> >  *.gcda
> >  *.gcno
> > diff --git a/Makefile b/Makefile
> > index 46914dcd80..1962999c18 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -569,6 +569,7 @@ INSTALL = install
> >  TCL_PATH = tclsh
> >  TCLTK_PATH = wish
> >  XGETTEXT = xgettext
> > +MSGCAT = msgcat
> >  MSGFMT = msgfmt
> >  CURL_CONFIG = curl-config
> >  GCOV = gcov
> > @@ -2706,6 +2707,7 @@ XGETTEXT_FLAGS = \
> >       --force-po \
> >       --add-comments=TRANSLATORS: \
> >       --msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
> > +     --package-name=Git \
> >       --sort-by-file \
> >       --from-code=UTF-8
> >  XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
> > @@ -2714,6 +2716,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
> >       --keyword=gettextln --keyword=eval_gettextln
> >  XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
> >       --keyword=__ --keyword=N__ --keyword="__n:1,2"
> > +MSGCAT_FLAGS = --sort-by-file
> >  LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
> >             $(GENERATED_H)
> >  LOCALIZED_SH = $(SCRIPT_SH)
> > @@ -2726,34 +2729,68 @@ LOCALIZED_SH += t/t0200/test.sh
> >  LOCALIZED_PERL += t/t0200/test.perl
> >  endif
> >
> > -## Note that this is meant to be run only by the localization coordinator
> > -## under a very controlled condition, i.e. (1) it is to be run in a
> > -## Git repository (not a tarball extract), (2) any local modifications
> > -## will be lost.
> > +## We generate intermediate .build/pot/po/%.po files containing a
> > +## extract of the translations we find in each file in the source
> > +## tree. We will assemble them using msgcat to create the final
> > +## "po/git.pot" file.
> > +LOCALIZED_ALL_GEN_PO =
> > +
> > +LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
> > +
> > +LOCALIZED_SH_GEN_PO = $(LOCALIZED_SH:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
> > +
> > +LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
> > +
> >  ## Gettext tools cannot work with our own custom PRItime type, so
> >  ## we replace PRItime with PRIuMAX.  We need to update this to
> >  ## PRIdMAX if we switch to a signed type later.
> > +$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT) \
> > +         if grep -q PRItime $<; then \
> > +             (\
> > +                     sed -e 's|PRItime|PRIuMAX|g' <$< \
> > +                             >.build/pot/po/$< && \
> > +                     cd .build/pot/po && \
> > +                     $(XGETTEXT) --omit-header \
> > +                             -o $(@:.build/pot/po/%=%) \
> > +                             $(XGETTEXT_FLAGS_C) $< && \
> > +                     rm $<; \
> > +             ); \
> > +         else \
> > +             $(XGETTEXT) --omit-header \
> > +                     -o $@ $(XGETTEXT_FLAGS_C) $<; \
> > +         fi
> >
> > -po/git.pot: $(GENERATED_H) FORCE
> > -     # All modifications will be reverted at the end, so we do not
> > -     # want to have any local change.
> > -     git diff --quiet HEAD && git diff --quiet --cached
> > +$(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> > +             -o$@ $(XGETTEXT_FLAGS_SH) $<
> >
> > -     @for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
> > -     do \
> > -             sed -e 's|PRItime|PRIuMAX|g' <"$$s" >"$$s+" && \
> > -             cat "$$s+" >"$$s" && rm "$$s+"; \
> > -     done
> > +$(LOCALIZED_PERL_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> > +             -o$@ $(XGETTEXT_FLAGS_PERL) $<
> > +q
> > +define gen_pot_header
> > +$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
> > +     -o - /dev/null | \
> > +sed -e 's|charset=CHARSET|charset=UTF-8|' \
> > +    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
> > +    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|' \
> > +    >$@ && \
> > +echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
> > +endef
> >
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C)
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \
> > -             $(LOCALIZED_SH)
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_PERL) \
> > -             $(LOCALIZED_PERL)
> > +.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_GEN)$(gen_pot_header)
>
> So re a previous round I think the addition of a header to po/git.pot
> doesn't make sense, i.e. it's not needed, so we can just drop this
> entirely.

You missed a note about this in commit log in this reroll:

    The "po/git.pot" file should have a header entry, because a proper
    header entry will increase the speed of creating a new po file using
    msginit and set a proper "POT-Creation-Date:" field in the header
    entry of a "po/XX.po" file. We use xgettext to generate a separate
    header file at ".build/pot/git.header" from "/dev/null", and use this
    header to assemble "po/git.pot".

> But if you really want to include it anyway then this dependency is
> broken, the header just depends on this gen_pot_header you've defined,
> we don't need to re-generate it every time any single source file
> changes.

The header file ".build/pot/git.header" should have the same
prerequisites as the file "po/git.pot". This is because we need to
refresh the timestamp of the "POT-Creation-Date:" field.

> Or is the intent to have its timestamp bumped if any of the source files
> change?

Right. Just like using xgettext to create a "po/git.pot" with a fresh
header entry, we should recreate the header file for msgcat to create
"po/git.pot".

We want to switch to a new l10n workflow, and we want everything works
as usual. When l10n contributors update their "po/XX.po" file, it
should contain the latest "POT-Creation-Date" timestamp instead of
sticking to a fixed and outdated timestamp.

> If so I think that to the extent we need headers at all (for the *.po
> files) it might be worthwhile to just have the timestamp be the same
> epoch we use for the tests in t/, i.e. never update it and avoid the
> needless churn. If people need to check when things were last changed,
> or initially added, they can use this little thing called "git" :)

I stick to my option, see above.
Jiang Xin May 24, 2022, 12:56 a.m. UTC | #5
On Mon, May 23, 2022 at 4:07 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 23 2022, Jiang Xin wrote:
>
> > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >
> > Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our
> > custom PRItime format, 2017-07-20) we'd consider source files as-is
> > with gettext, but because we need to understand PRItime in the same way
> > that gettext itself understands PRIuMAX we'd first check if we had a
> > clean checkout, then munge all of the processed files in-place with
> > "sed", generate "po/git.pot", and then finally "reset --hard" to undo
> > our changes.
> >
> > By generating "pot" snippets in ".build/pot/po" for each source file
> > and rewriting certain source files with PRItime macros to temporary
> > files in ".build/pot/po", we can avoid running "make pot" by altering
> > files in place and doing a "reset --hard" afterwards.
> >
> > This speed of "make pot" is slower than before on an initial run,
> > because we run "xgettext" many times (once per source file), but it
> > can be boosted by parallelization. It is *much* faster for incremental
> > runs, and will allow us to implement related targets in subsequent
> > commits.
> >
> > When the "pot" target was originally added in cd5513a7168 (i18n:
> > Makefile: "pot" target to extract messages marked for translation,
> > 2011-02-22) it behaved like a "normal" target. I.e. we'd skip the
> > re-generation of the po/git.pot if nothing had to be done.
> >
> > Then after po/git.pot was checked in in dce37b66fb0 (l10n: initial
> > git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken
> > until 1f31963e921 (i18n: treat "make pot" as an explicitly-invoked
> > target, 2014-08-22) when it was made to depend on "FORCE". I.e. the
> > Makefile's dependency resolution inherently can't handle incremental
> > building when the target file may be updated by git (or something else
> > external to "make"). But this case no longer applies, so FORCE is no
> > longer needed.
> >
> > That out of the way, the main logic change here is getting rid of the
> > "reset --hard":
> >
> > We'll generate intermediate ".build/pot/po/%.po" files from "%", which
> > is handy to see at a glance what strings (if any) in a given file are
> > marked for translation:
> >
> >       $ make .build/pot/po/pretty.c.po
> >       [...]
> >       $ cat .build/pot/po/pretty.c.po
> >       #: pretty.c:1051
> >       msgid "unable to parse --pretty format"
> >       msgstr ""
> >       $
> >
> > For these C source files which contain the PRItime macros, we will
> > create temporary munged "*.c" files in a tree in ".build/pot/po"
> > corresponding to our source tree, and have "xgettext" consider those.
> > The rule needs to be careful to "(cd .build/pot/po && ...)", because
> > otherwise the comments in the po/git.pot file wouldn't refer to the
> > correct source locations (they'd be prefixed with ".build/pot/po").
> > These temporary munged "*.c” files will be removed immediately after
> > the corresponding po files are generated, because some development tools
> > cannot ignore the duplicate source files in the ".build" directory
> > according to the ".gitignore" file, and that may cause trouble.
> >
> > The output of the generated po/git.pot file is changed in one minor
> > way: Because we're using msgcat(1) instead of xgettext(1) to
> > concatenate the output we'll now disambiguate where "TRANSLATORS"
> > comments come from, in cases where a message is the same in N files,
> > and either only one has a "TRANSLATORS" comment, or they're
> > different. E.g. for the "Your edited hunk[...]" message we'll now
> > apply this change (comment content elided):
> >
> >       +#. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
> >        #. TRANSLATORS: do not translate [y/n]
> >       [...]
> >       +#. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
> >        #. TRANSLATORS: do not translate [y/n]
> >       [...]
> >        #: add-patch.c:1253 git-add--interactive.perl:1244
> >        msgid ""
> >        "Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? "
> >        msgstr ""
> >
> > There are six such changes, and they all make the context more
> > understandable, as msgcat(1) is better at handling these edge cases
> > than xgettext(1)'s previously used "--join-existing" flag.
> >
> > But filenames in the above disambiguation lines of extracted-comments
> > have an extra ".po" extension compared to the filenames at the file
> > locations. While we could rename the intermediate ".build/pot/po/%.po"
> > files without the ".po" extension to use more intuitive filenames in
> > the disambiguation lines of extracted-comments, but that will confuse
> > developer tools with lots of invalid C or other source files in
> > ".build/pot/po" directory.
> >
> > The addition of "--omit-header" option for xgettext makes the "pot"
> > snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
> > subsequent commit this header behavior has been hiding an
> > encoding-related bug from us, so let's carry it forward instead of
> > re-generating it with xgettext(1).
> >
> > The "po/git.pot" file should have a header entry, because a proper
> > header entry will increase the speed of creating a new po file using
> > msginit and set a proper "POT-Creation-Date:" field in the header
> > entry of a "po/XX.po" file. We use xgettext to generate a separate
> > header file at ".build/pot/git.header" from "/dev/null", and use this
> > header to assemble "po/git.pot".
> >
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  .gitignore |  1 +
> >  Makefile   | 80 ++++++++++++++++++++++++++++++++++++++++--------------
> >  2 files changed, 60 insertions(+), 21 deletions(-)
> >
> > diff --git a/.gitignore b/.gitignore
> > index e81de1063a..a452215764 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -200,6 +200,7 @@
> >  *.[aos]
> >  *.o.json
> >  *.py[co]
> > +.build/
> >  .depend/
> >  *.gcda
> >  *.gcno
> > diff --git a/Makefile b/Makefile
> > index 46914dcd80..1962999c18 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -569,6 +569,7 @@ INSTALL = install
> >  TCL_PATH = tclsh
> >  TCLTK_PATH = wish
> >  XGETTEXT = xgettext
> > +MSGCAT = msgcat
> >  MSGFMT = msgfmt
> >  CURL_CONFIG = curl-config
> >  GCOV = gcov
> > @@ -2706,6 +2707,7 @@ XGETTEXT_FLAGS = \
> >       --force-po \
> >       --add-comments=TRANSLATORS: \
> >       --msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
> > +     --package-name=Git \
> >       --sort-by-file \
> >       --from-code=UTF-8
> >  XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
> > @@ -2714,6 +2716,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
> >       --keyword=gettextln --keyword=eval_gettextln
> >  XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
> >       --keyword=__ --keyword=N__ --keyword="__n:1,2"
> > +MSGCAT_FLAGS = --sort-by-file
> >  LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
> >             $(GENERATED_H)
> >  LOCALIZED_SH = $(SCRIPT_SH)
> > @@ -2726,34 +2729,68 @@ LOCALIZED_SH += t/t0200/test.sh
> >  LOCALIZED_PERL += t/t0200/test.perl
> >  endif
> >
> > -## Note that this is meant to be run only by the localization coordinator
> > -## under a very controlled condition, i.e. (1) it is to be run in a
> > -## Git repository (not a tarball extract), (2) any local modifications
> > -## will be lost.
> > +## We generate intermediate .build/pot/po/%.po files containing a
> > +## extract of the translations we find in each file in the source
> > +## tree. We will assemble them using msgcat to create the final
> > +## "po/git.pot" file.
> > +LOCALIZED_ALL_GEN_PO =
> > +
> > +LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
> > +
> > +LOCALIZED_SH_GEN_PO = $(LOCALIZED_SH:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
> > +
> > +LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
> > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
> > +
> >  ## Gettext tools cannot work with our own custom PRItime type, so
> >  ## we replace PRItime with PRIuMAX.  We need to update this to
> >  ## PRIdMAX if we switch to a signed type later.
> > +$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT) \
> > +         if grep -q PRItime $<; then \
> > +             (\
> > +                     sed -e 's|PRItime|PRIuMAX|g' <$< \
> > +                             >.build/pot/po/$< && \
> > +                     cd .build/pot/po && \
> > +                     $(XGETTEXT) --omit-header \
> > +                             -o $(@:.build/pot/po/%=%) \
> > +                             $(XGETTEXT_FLAGS_C) $< && \
> > +                     rm $<; \
> > +             ); \
> > +         else \
> > +             $(XGETTEXT) --omit-header \
> > +                     -o $@ $(XGETTEXT_FLAGS_C) $<; \
> > +         fi
> >
> > -po/git.pot: $(GENERATED_H) FORCE
> > -     # All modifications will be reverted at the end, so we do not
> > -     # want to have any local change.
> > -     git diff --quiet HEAD && git diff --quiet --cached
> > +$(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> > +             -o$@ $(XGETTEXT_FLAGS_SH) $<
> >
> > -     @for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
> > -     do \
> > -             sed -e 's|PRItime|PRIuMAX|g' <"$$s" >"$$s+" && \
> > -             cat "$$s+" >"$$s" && rm "$$s+"; \
> > -     done
> > +$(LOCALIZED_PERL_GEN_PO): .build/pot/po/%.po: %
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
> > +             -o$@ $(XGETTEXT_FLAGS_PERL) $<
> > +q
> > +define gen_pot_header
> > +$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
> > +     -o - /dev/null | \
> > +sed -e 's|charset=CHARSET|charset=UTF-8|' \
> > +    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
> > +    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|' \
> > +    >$@ && \
> > +echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
> > +endef
> >
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C)
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \
> > -             $(LOCALIZED_SH)
> > -     $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_PERL) \
> > -             $(LOCALIZED_PERL)
> > +.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
> > +     $(call mkdir_p_parent_template)
> > +     $(QUIET_GEN)$(gen_pot_header)
>
> So re a previous round I think the addition of a header to po/git.pot
> doesn't make sense, i.e. it's not needed, so we can just drop this
> entirely.
>
> But if you really want to include it anyway then this dependency is
> broken, the header just depends on this gen_pot_header you've defined,
> we don't need to re-generate it every time any single source file
> changes.
>
> Or is the intent to have its timestamp bumped if any of the source files
> change?
>
> If so I think that to the extent we need headers at all (for the *.po
> files) it might be worthwhile to just have the timestamp be the same
> epoch we use for the tests in t/, i.e. never update it and avoid the
> needless churn. If people need to check when things were last changed,
> or initially added, they can use this little thing called "git" :)

After rereading all the replies, I feel it's better to to leave a note
here to make my point clear. As we removed the "po/git.pot" file, it
is not easy to find the base version of a "po/XX.po" file, and we need
the "POT-Creation-Date:" field which comes from the header of the
generated POT file to tell us the version of the base.

    $ git -P grep POT-Creation-Date -- po/
    po/bg.po:"POT-Creation-Date: 2022-01-17 08:34+0800\n"
    po/ca.po:"POT-Creation-Date: 2022-04-06 14:40+0800\n"
    po/de.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/el.po:"POT-Creation-Date: 2019-02-15 10:09+0800\n"
    po/es.po:"POT-Creation-Date: 2021-11-04 08:34+0800\n"
    po/fr.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/id.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/is.po:"POT-Creation-Date: 2016-06-17 18:55+0000\n"
    po/it.po:"POT-Creation-Date: 2020-10-10 09:32+0800\n"
    po/ko.po:"POT-Creation-Date: 2018-06-16 22:06+0800\n"
    po/pl.po:"POT-Creation-Date: 2021-11-10 08:55+0800\n"
    po/pt_PT.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/ru.po:"POT-Creation-Date: 2020-12-21 07:10+0800\n"
    po/sv.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/tr.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/vi.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/zh_CN.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
    po/zh_TW.po:"POT-Creation-Date: 2022-04-13 14:52+0800\n"
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index e81de1063a..a452215764 100644
--- a/.gitignore
+++ b/.gitignore
@@ -200,6 +200,7 @@ 
 *.[aos]
 *.o.json
 *.py[co]
+.build/
 .depend/
 *.gcda
 *.gcno
diff --git a/Makefile b/Makefile
index 46914dcd80..1962999c18 100644
--- a/Makefile
+++ b/Makefile
@@ -569,6 +569,7 @@  INSTALL = install
 TCL_PATH = tclsh
 TCLTK_PATH = wish
 XGETTEXT = xgettext
+MSGCAT = msgcat
 MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 GCOV = gcov
@@ -2706,6 +2707,7 @@  XGETTEXT_FLAGS = \
 	--force-po \
 	--add-comments=TRANSLATORS: \
 	--msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
+	--package-name=Git \
 	--sort-by-file \
 	--from-code=UTF-8
 XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
@@ -2714,6 +2716,7 @@  XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
 	--keyword=gettextln --keyword=eval_gettextln
 XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
 	--keyword=__ --keyword=N__ --keyword="__n:1,2"
+MSGCAT_FLAGS = --sort-by-file
 LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
 	      $(GENERATED_H)
 LOCALIZED_SH = $(SCRIPT_SH)
@@ -2726,34 +2729,68 @@  LOCALIZED_SH += t/t0200/test.sh
 LOCALIZED_PERL += t/t0200/test.perl
 endif
 
-## Note that this is meant to be run only by the localization coordinator
-## under a very controlled condition, i.e. (1) it is to be run in a
-## Git repository (not a tarball extract), (2) any local modifications
-## will be lost.
+## We generate intermediate .build/pot/po/%.po files containing a
+## extract of the translations we find in each file in the source
+## tree. We will assemble them using msgcat to create the final
+## "po/git.pot" file.
+LOCALIZED_ALL_GEN_PO =
+
+LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
+LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
+
+LOCALIZED_SH_GEN_PO = $(LOCALIZED_SH:%=.build/pot/po/%.po)
+LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
+
+LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
+LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
+
 ## Gettext tools cannot work with our own custom PRItime type, so
 ## we replace PRItime with PRIuMAX.  We need to update this to
 ## PRIdMAX if we switch to a signed type later.
+$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
+	$(call mkdir_p_parent_template)
+	$(QUIET_XGETTEXT) \
+	    if grep -q PRItime $<; then \
+		(\
+			sed -e 's|PRItime|PRIuMAX|g' <$< \
+				>.build/pot/po/$< && \
+			cd .build/pot/po && \
+			$(XGETTEXT) --omit-header \
+				-o $(@:.build/pot/po/%=%) \
+				$(XGETTEXT_FLAGS_C) $< && \
+			rm $<; \
+		); \
+	    else \
+		$(XGETTEXT) --omit-header \
+			-o $@ $(XGETTEXT_FLAGS_C) $<; \
+	    fi
 
-po/git.pot: $(GENERATED_H) FORCE
-	# All modifications will be reverted at the end, so we do not
-	# want to have any local change.
-	git diff --quiet HEAD && git diff --quiet --cached
+$(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
+	$(call mkdir_p_parent_template)
+	$(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
+		-o$@ $(XGETTEXT_FLAGS_SH) $<
 
-	@for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
-	do \
-		sed -e 's|PRItime|PRIuMAX|g' <"$$s" >"$$s+" && \
-		cat "$$s+" >"$$s" && rm "$$s+"; \
-	done
+$(LOCALIZED_PERL_GEN_PO): .build/pot/po/%.po: %
+	$(call mkdir_p_parent_template)
+	$(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
+		-o$@ $(XGETTEXT_FLAGS_PERL) $<
+
+define gen_pot_header
+$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
+	-o - /dev/null | \
+sed -e 's|charset=CHARSET|charset=UTF-8|' \
+    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
+    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|' \
+    >$@ && \
+echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
+endef
 
-	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C)
-	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \
-		$(LOCALIZED_SH)
-	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_PERL) \
-		$(LOCALIZED_PERL)
+.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
+	$(call mkdir_p_parent_template)
+	$(QUIET_GEN)$(gen_pot_header)
 
-	# Reverting the munged source, leaving only the updated $@
-	git reset --hard
-	mv $@+ $@
+po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO) FORCE
+	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $(filter-out FORCE,$^) >$@
 
 .PHONY: pot
 pot: po/git.pot
@@ -3292,6 +3329,7 @@  cocciclean:
 	$(RM) contrib/coccinelle/*.cocci.patch*
 
 clean: profile-clean coverage-clean cocciclean
+	$(RM) -r .build
 	$(RM) *.res
 	$(RM) $(OBJECTS)
 	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB)