diff mbox series

[v2,1/9] Makefile: sort "po/git.pot" by file location

Message ID 20220519081548.3380-2-worldhello.net@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/9] Makefile: sort "po/git.pot" by file location | expand

Commit Message

Jiang Xin May 19, 2022, 8:15 a.m. UTC
From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

Before feeding xgettext with more C souce files which may be ignored
by various compiler conditions, add new option "--sort-by-file" to
xgettext program to create stable message template file "po/git.pot".

With this update, the newly generated "po/git.pot" will has the same
entries while in a different order. We won't checkin the newly generated
"po/git.pot", because we will remove it from tree in a later commit.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

Comments

Ævar Arnfjörð Bjarmason May 19, 2022, 8:53 a.m. UTC | #1
On Thu, May 19 2022, Jiang Xin wrote:

> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>
> Before feeding xgettext with more C souce files which may be ignored
> by various compiler conditions, add new option "--sort-by-file" to
> xgettext program to create stable message template file "po/git.pot".
>
> With this update, the newly generated "po/git.pot" will has the same
> entries while in a different order. We won't checkin the newly generated
> "po/git.pot", because we will remove it from tree in a later commit.
>
> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Makefile b/Makefile
> index f8bccfab5e..83e968e2a4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2706,6 +2706,7 @@ XGETTEXT_FLAGS = \
>  	--force-po \
>  	--add-comments=TRANSLATORS: \
>  	--msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
> +	--sort-by-file \
>  	--from-code=UTF-8
>  XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
>  	--keyword=_ --keyword=N_ --keyword="Q_:1,2"

I'm not opposed to this change, but between this and 2/9 I'm still
unsure what the aim is exactly, and if the results are desired.

In 2/9 you correctly point out that we have messages we've missed due to
LOCALIZED_C being platform-specific.

That should never happen (although your implementation in 2/9 may have
some small issues, I'll reply there separately), i.e. we should always
have "make pot" generate the same po/git.pot from the same commit
whether you're on linux, mac os x etc.

But AFAICT we have a "stable" sort order now, it's in whatever order we
feed the files to xgettext, which ultimately comes down to e.g. the list
of $(LIB_OBJS) in the Makefile.

I've been looking over the libintl documentation to see what exactly
these sort options do, and how they differ from the default, and it's
not really described.

AFAICT the xgettext behavior we have now is that we'll process the files
we have in order, and for those files extract the messages we have as we
see them.

One fringe benefit of that is that e.g. "make pot
XGETTEXT_INCLUDE_TESTS=Y" (which I think I've only ever used, and
probably only ~10 years ago) will get added at the end, but now it'll be
added wherever t/t0200 sorts.

Then because compose the builtin objs and lib objs by concatenation, but
don't $(sort) them in the Makefile most of this change is due to us
e.g. sorting builtin/* before parse-options.c or whatever.

But oddly we also have cases like this:

        strbuf_addf(&header, print_file_item_data.modified_fmt,
                    _("staged"), _("unstaged"), _("path"));

Before this we'd list those in that order in the git.pot, but now
because of --sort-by-file we'll list any messages on the same line in
sorted msgid order, not in the orderd they appear in. Another example is
e.g. this in builtin/blame.c:

    OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("ignore <rev> when blaming"))

Where before we'd list them in that order, but now it's with "ignore.."
before "rev".

I think this change would be easier to follow & explain if you first
made this change:
	
	diff --git a/Makefile b/Makefile
	index 61aadf3ce88..3726fe8064a 100644
	--- a/Makefile
	+++ b/Makefile
	@@ -2715,10 +2715,9 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
	 	--keyword=gettextln --keyword=eval_gettextln
	 XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
	 	--keyword=__ --keyword=N__ --keyword="__n:1,2"
	-LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
	-LOCALIZED_SH = $(SCRIPT_SH)
	-LOCALIZED_SH += git-sh-setup.sh
	-LOCALIZED_PERL = $(SCRIPT_PERL)
	+LOCALIZED_C = $(sort $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H))
	+LOCALIZED_SH = $(sort $(SCRIPT_SH) git-sh-setup.sh)
	+LOCALIZED_PERL = $(sort $(SCRIPT_PERL))
	 
	 ifdef XGETTEXT_INCLUDE_TESTS
	 LOCALIZED_C += t/t0200/test.c

Which would sort things within C, SH and Perl files (but not among
them). Then this change would AFAICT only:

 * Change that "within one line" sort order, as noted above
 * Sort across C/SH/Perl.

I'm mostly "meh" on the result, but it's also because I genuinely don't
get what the goal was. Is it because in 2/9 you'll end up using
$(FOUND_C_SOURCES), which we derive from either "git ls-files" or
"find", the latter of which has an unstable sort order?

Even if that's the case a smaller change is to just use $(sort) as noted
above. I like that a tiny bit more because then the "--join-existing" we
do for SH/Perl is an append, whereas now we'll re-sort the whole thing.

I also don't think it matters much either way, so whatever you're happy
with, the above all came from trying to follow along...
Jiang Xin May 19, 2022, 12:41 p.m. UTC | #2
On Thu, May 19, 2022 at 5:17 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Thu, May 19 2022, Jiang Xin wrote:
>
> > From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> >
> > Before feeding xgettext with more C souce files which may be ignored
> > by various compiler conditions, add new option "--sort-by-file" to
> > xgettext program to create stable message template file "po/git.pot".
> >
> > With this update, the newly generated "po/git.pot" will has the same
> > entries while in a different order. We won't checkin the newly generated
> > "po/git.pot", because we will remove it from tree in a later commit.
> >
> > Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  Makefile | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/Makefile b/Makefile
> > index f8bccfab5e..83e968e2a4 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -2706,6 +2706,7 @@ XGETTEXT_FLAGS = \
> >       --force-po \
> >       --add-comments=TRANSLATORS: \
> >       --msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
> > +     --sort-by-file \
> >       --from-code=UTF-8
> >  XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
> >       --keyword=_ --keyword=N_ --keyword="Q_:1,2"
>
> I'm not opposed to this change, but between this and 2/9 I'm still
> unsure what the aim is exactly, and if the results are desired.

We will change the order of input files to feed xgettext to create
"po/git.pot" in patch 2/9, and we may change the order of input files
in future version. With the option "--sort-by-file" we can get a
stable "po/git.pot" and better compression ratios for future versions
of "po/*.po*".

> In 2/9 you correctly point out that we have messages we've missed due to
> LOCALIZED_C being platform-specific.
>
> That should never happen (although your implementation in 2/9 may have
> some small issues, I'll reply there separately), i.e. we should always
> have "make pot" generate the same po/git.pot from the same commit
> whether you're on linux, mac os x etc.

l10n translators may work on different platforms or have different
compiler conditions, and their contributions may have different base
templates (po/git.pot). It is hard for the l10n coordinator to choose
a base template to check contributions from different l10n
contributors.

> But AFAICT we have a "stable" sort order now, it's in whatever order we
> feed the files to xgettext, which ultimately comes down to e.g. the list
> of $(LIB_OBJS) in the Makefile.
>
> I've been looking over the libintl documentation to see what exactly
> these sort options do, and how they differ from the default, and it's
> not really described.
>
> AFAICT the xgettext behavior we have now is that we'll process the files
> we have in order, and for those files extract the messages we have as we
> see them.
>
> One fringe benefit of that is that e.g. "make pot
> XGETTEXT_INCLUDE_TESTS=Y" (which I think I've only ever used, and
> probably only ~10 years ago) will get added at the end, but now it'll be
> added wherever t/t0200 sorts.
>
> Then because compose the builtin objs and lib objs by concatenation, but
> don't $(sort) them in the Makefile most of this change is due to us
> e.g. sorting builtin/* before parse-options.c or whatever.
>
> But oddly we also have cases like this:
>
>         strbuf_addf(&header, print_file_item_data.modified_fmt,
>                     _("staged"), _("unstaged"), _("path"));
>
> Before this we'd list those in that order in the git.pot, but now
> because of --sort-by-file we'll list any messages on the same line in
> sorted msgid order, not in the orderd they appear in. Another example is
> e.g. this in builtin/blame.c:
>
>     OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("ignore <rev> when blaming"))
>
> Where before we'd list them in that order, but now it's with "ignore.."
> before "rev".

These side effects of changing the order of entries in the same line
of the same source file have little effect on the l10n translation.

> I think this change would be easier to follow & explain if you first
> made this change:
>
>         diff --git a/Makefile b/Makefile
>         index 61aadf3ce88..3726fe8064a 100644
>         --- a/Makefile
>         +++ b/Makefile
>         @@ -2715,10 +2715,9 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
>                 --keyword=gettextln --keyword=eval_gettextln
>          XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
>                 --keyword=__ --keyword=N__ --keyword="__n:1,2"
>         -LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
>         -LOCALIZED_SH = $(SCRIPT_SH)
>         -LOCALIZED_SH += git-sh-setup.sh
>         -LOCALIZED_PERL = $(SCRIPT_PERL)
>         +LOCALIZED_C = $(sort $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H))
>         +LOCALIZED_SH = $(sort $(SCRIPT_SH) git-sh-setup.sh)
>         +LOCALIZED_PERL = $(sort $(SCRIPT_PERL))
>
>          ifdef XGETTEXT_INCLUDE_TESTS
>          LOCALIZED_C += t/t0200/test.c

I do not need this intermediate commit and submit new generated
"po/git.pot" file to observe changes of the "po/git.pot" file, because
I have a diff driver install as described here:

    https://github.com/git-l10n/git-po-helper/tree/main/contrib/diff-dirver

So I can tell the newly generated "po/git.pot" will have the same
entries while in a different order. If I want to see changes on raw
files, I can use command:

    git diff --no-textconv

> Which would sort things within C, SH and Perl files (but not among
> them). Then this change would AFAICT only:
>
>  * Change that "within one line" sort order, as noted above
>  * Sort across C/SH/Perl.
>
> I'm mostly "meh" on the result, but it's also because I genuinely don't
> get what the goal was. Is it because in 2/9 you'll end up using
> $(FOUND_C_SOURCES), which we derive from either "git ls-files" or
> "find", the latter of which has an unstable sort order?

The goal is to have a constant order of entries in "po/git.pot" and
"po/*.po", so we can save the space of our repository by better
compression ratio on files inside "po/".

--
Jiang Xin
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index f8bccfab5e..83e968e2a4 100644
--- a/Makefile
+++ b/Makefile
@@ -2706,6 +2706,7 @@  XGETTEXT_FLAGS = \
 	--force-po \
 	--add-comments=TRANSLATORS: \
 	--msgid-bugs-address="Git Mailing List <git@vger.kernel.org>" \
+	--sort-by-file \
 	--from-code=UTF-8
 XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
 	--keyword=_ --keyword=N_ --keyword="Q_:1,2"