[00/10] Factorization of messages with similar meaning

Message ID	pull.1088.git.1638514909.gitgitgadget@gmail.com (mailing list archive)
Headers	Return-Path: <git-owner@kernel.org>
	Message-Id: <pull.1088.git.1638514909.gitgitgadget@gmail.com>
	From: "Jean-Noël Avila via GitGitGadget" <gitgitgadget@gmail.com>
	Date: Fri, 03 Dec 2021 07:01:39 +0000
	Subject: [PATCH 00/10] Factorization of messages with similar meaning
	MIME-Version: 1.0
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	Fcc: Sent
	To: git@vger.kernel.org
	Cc: Jean-Noël Avila <jn.avila@free.fr>
	Precedence: bulk
Series	Factorization of messages with similar meaning
	[00/10] Factorization of messages with similar meaning
	[01/10] i18n: refactor "foo and bar are mutually exclusive"
	[02/10] i18n: refactor "%s, %s and %s are mutually exclusive"
	[03/10] i18n: turn "options are incompatible" into "are mutually exclusive"
	[04/10] i18n: standardize "cannot open" and "cannot read"
	[05/10] i18n: tag.c factorize i18n strings
	[06/10] i18n: factorize "--foo requires --bar" and the like
	[07/10] i18n: factorize "no directory given for --foo"
	[08/10] i18n: refactor "unrecognized %(foo) argument" strings
	[09/10] i18n: factorize "--foo outside a repository"
	[10/10] i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"

Jean-Noël Avila via GitGitGadget Dec. 3, 2021, 7:01 a.m. UTC

This series is a meager attempt at rationalizing a small fraction of the
internationalized messages. Sorry in advance for the dull task of reviewing
these insipid patches.

Doing so has some positive effects:

 * non-translatable constant strings are kept out of the way for translators
 * messages with identical meaning are built identically
 * the total number of messages to translate is decreased.
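To make the first point concrete, here is a minimal sketch, assuming an identity stand-in for git's `_()` gettext wrapper and a hypothetical helper named `report_exclusive()`. Instead of one msgid per pair of options, a single template is translated once, and the option names, which translators never need to touch, are passed as plain arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for git's _() wrapper around gettext(); identity here so the
 * sketch builds without libintl. */
#define _(msgid) (msgid)

/*
 * Before factorization, every pair of options needed its own msgid, e.g.
 *     _("--foo and --bar are mutually exclusive")
 * After, one msgid covers every pair; the option names are untranslated
 * constants supplied at the call site.
 */
int report_exclusive(char *buf, size_t len, const char *opt1, const char *opt2)
{
	assert(buf && opt1 && opt2);
	return snprintf(buf, len, _("%s and %s are mutually exclusive"),
			opt1, opt2);
}
```

Since xgettext merges identical msgids, translators would see this shape once in po/git.pot rather than once per option pair.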

I'm inclined to even go a step further and turn these messages into #define
or const strings. This would have the added benefits:

 * make sure that the messages to translate are identical
 * create a library of message skeletons to be picked up when new messages
   are needed
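The #define idea could look roughly like the sketch below. The macro names (`MSG_MUTUALLY_EXCLUSIVE` and friends) and their exact wordings are hypothetical, loosely modeled on the patch titles of this series; `N_()` and `_()` are identity stand-ins for git's gettext helpers, where `N_()` only marks a literal for extraction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Identity stand-ins for git's gettext helpers: N_() marks a literal for
 * extraction, _() translates it at run time. */
#define N_(msgid) msgid
#define _(msgid) (msgid)

/* Hypothetical library of named message skeletons. */
#define MSG_MUTUALLY_EXCLUSIVE   N_("%s and %s are mutually exclusive")
#define MSG_MUTUALLY_EXCLUSIVE_3 N_("%s, %s and %s are mutually exclusive")
#define MSG_REQUIRES             N_("%s requires %s")

/* A (hypothetical) call site picks a skeleton by name instead of
 * retyping, and possibly rewording, the string. */
int format_requires(char *buf, size_t len, const char *opt, const char *needed)
{
	assert(buf && opt && needed);
	return snprintf(buf, len, _(MSG_REQUIRES), opt, needed);
}
```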

What do you think?

Jean-Noël Avila (10):
  i18n: refactor "foo and bar are mutually exclusive"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: turn "options are incompatible" into "are mutually exclusive"
  i18n: standardize "cannot open" and "cannot read"
  i18n: tag.c factorize i18n strings
  i18n: factorize "--foo requires --bar" and the like
  i18n: factorize "no directory given for --foo"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "--foo outside a repository"
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"

 apply.c                           |  8 ++++----
 archive.c                         |  8 ++++----
 builtin/add.c                     | 12 ++++++------
 builtin/branch.c                  |  2 +-
 builtin/checkout.c                |  8 ++++----
 builtin/clone.c                   |  2 +-
 builtin/commit.c                  |  6 +++---
 builtin/describe.c                |  2 +-
 builtin/diff-tree.c               |  2 +-
 builtin/difftool.c                |  4 ++--
 builtin/fast-export.c             |  4 ++--
 builtin/fetch.c                   |  6 +++---
 builtin/index-pack.c              |  4 ++--
 builtin/init-db.c                 |  2 +-
 builtin/log.c                     |  8 ++++----
 builtin/pack-objects.c            |  2 +-
 builtin/push.c                    |  8 ++++----
 builtin/repack.c                  |  4 ++--
 builtin/reset.c                   |  8 ++++----
 builtin/rm.c                      |  2 +-
 builtin/stash.c                   |  4 ++--
 builtin/submodule--helper.c       |  4 ++--
 builtin/tag.c                     | 10 +++++-----
 builtin/worktree.c                |  6 +++---
 diff.c                            |  2 +-
 fetch-pack.c                      |  2 +-
 git.c                             |  6 +++---
 http-fetch.c                      |  4 ++--
 range-diff.c                      |  2 +-
 ref-filter.c                      | 20 ++++++++++----------
 revision.c                        | 22 +++++++++++-----------
 t/t2026-checkout-pathspec-file.sh |  4 ++--
 t/t2072-restore-pathspec-file.sh  |  2 +-
 t/t3704-add-pathspec-file.sh      |  6 +++---
 t/t3909-stash-pathspec-file.sh    |  2 +-
 t/t5606-clone-options.sh          |  2 +-
 t/t7107-reset-pathspec-file.sh    |  2 +-
 t/t7526-commit-pathspec-file.sh   |  4 ++--
 38 files changed, 103 insertions(+), 103 deletions(-)


base-commit: 35151cf0720460a897cde9b8039af364743240e7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1088%2Fjnavila%2Fi18n-refactor-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1088/jnavila/i18n-refactor-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1088

Jeff King Dec. 3, 2021, 9:55 p.m. UTC | #1

On Fri, Dec 03, 2021 at 07:01:39AM +0000, Jean-Noël Avila via GitGitGadget wrote:

> This series is a meager attempt at rationalizing a small fraction of the
> internationalized messages. Sorry in advance for the dull task of reviewing
> these insipid patches.
> 
> Doing so has some positive effects:
> 
>  * non-translatable constant strings are kept out of the way for translators
>  * messages with identical meaning are built identically
>  * the total number of messages to translate is decreased.
> 
> I'm inclined to even go a step further and turn these messages into #define
> or const strings. This would have the added benefits:
> 
>  * make sure that the messages to translate are identical
>  * create a library of message skeletons to be picked up when new messages
>    are needed
> 
> What do you think?

One slight negative of this approach is that it makes messages a bit
harder to grep for. It's sometimes really nice to "git jump grep" for
specific messages you got to see where they're coming from.

I don't think that's a strong objection, though. If this is making the
translations overall more maintainable it might be worth the tradeoff.

We could also allow GIT_VERBOSE=1 or something to print the file/line of
error(), warning(), and die() messages, which solves the same problem. I
think Ævar might have had some patches in that direction.
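A rough sketch of that idea, assuming a hypothetical GIT_VERBOSE variable (the name is only a placeholder here) and hypothetical helpers `format_report()` and `REPORT()` rather than git's real error(), warning() and die(). The macro captures `__FILE__` and `__LINE__` at each call site, so a factorized message can still be traced back to the place that raised it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns non-zero when the (hypothetical) GIT_VERBOSE variable is "1". */
int verbose_from_env(void)
{
	const char *v = getenv("GIT_VERBOSE");
	return v && !strcmp(v, "1");
}

/* Formats a message into buf, prefixed with "file:line: " when verbose. */
void format_report(char *buf, size_t len, int verbose,
		   const char *file, int line, const char *fmt, ...)
{
	va_list ap;
	size_t used = 0;
	int n;

	assert(buf && len > 0 && fmt);
	buf[0] = '\0';
	if (verbose) {
		n = snprintf(buf, len, "%s:%d: ", file, line);
		if (n < 0 || (size_t)n >= len)
			return; /* prefix failed or already filled the buffer */
		used = (size_t)n;
	}
	va_start(ap, fmt);
	vsnprintf(buf + used, len - used, fmt, ap);
	va_end(ap);
}

/* Call sites pass only the format and its arguments; the macro records
 * where the message was raised. */
#define REPORT(buf, len, ...) \
	format_report((buf), (len), verbose_from_env(), \
		      __FILE__, __LINE__, __VA_ARGS__)
```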

-Peff

Johannes Sixt Dec. 3, 2021, 11:39 p.m. UTC | #2

On 03.12.21 at 22:55, Jeff King wrote:
> On Fri, Dec 03, 2021 at 07:01:39AM +0000, Jean-Noël Avila via GitGitGadget wrote:
> 
>> This series is a meager attempt at rationalizing a small fraction of the
>> internationalized messages. Sorry in advance for the dull task of reviewing
>> these insipid patches.
>>
>> Doing so has some positive effects:
>>
>>  * non-translatable constant strings are kept out of the way for translators
>>  * messages with identical meaning are built identically
>>  * the total number of messages to translate is decreased.
>>
>> I'm inclined to even go a step further and turn these messages into #define
>> or const strings. This would have the added benefits:
>>
>>  * make sure that the messages to translate are identical
>>  * create a library of message skeletons to be picked up when new messages
>>    are needed
>>
>> What do you think?
> 
> One slight negative of this approach is that it makes messages a bit
> harder to grep for. It's sometimes really nice to "git jump grep" for
> specific messages you got to see where they're coming from.

This can be mitigated by using, for example,

  git grep -e --stdin --and -e mutually

as long as the rewrite keeps the arguments on the same line with the
format strings, which it does.
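`git grep --and` only reports lines on which every pattern matches, which is why keeping the arguments on the same line as the format string matters. The sketch below mimics that per-line conjunction with plain substring matching (git grep itself uses regular expressions); `count_lines_matching_both()` is a hypothetical name, and the sample source lines in the test are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Does the n-byte span s contain pat? (Plain substring search.) */
static int span_contains(const char *s, size_t n, const char *pat)
{
	size_t m = strlen(pat), i;

	if (m == 0)
		return 1;
	for (i = 0; i + m <= n; i++)
		if (!memcmp(s + i, pat, m))
			return 1;
	return 0;
}

/* Counts lines of text containing both a and b, like "-e a --and -e b". */
int count_lines_matching_both(const char *text, const char *a, const char *b)
{
	const char *line = text;
	int count = 0;

	assert(text && a && b);
	while (*line) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		if (span_contains(line, len, a) && span_contains(line, len, b))
			count++;
		if (!nl)
			break;
		line = nl + 1;
	}
	return count;
}
```

If a call were wrapped so that "--stdin" landed on the line after the format string, the conjunction would no longer find it.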

Another aspect is that translators lose context. For example, "%s and %s
are mutually exclusive" may have to be translated differently depending
on what kind of text is substituted for %s. In this example it's
probably always command line options (I haven't checked), so not an
immediate problem. But something to keep in mind.

-- Hannes

Junio C Hamano Dec. 5, 2021, 7:31 a.m. UTC | #3

Johannes Sixt <j6t@kdbg.org> writes:

> Another aspect is that translators lose context. For example, "%s and %s
> are mutually exclusive" may have to be translated differently depending
> on what kind of text is substituted for %s. In this example it's
> probably always command line options (I haven't checked), so not an
> immediate problem. But something to keep in mind.

Yup.  I do not think we are quite ready to have two identical msgid
strings to be translated into two different msgstr strings.  We
briefly talked about pgettext() a few months ago, but nothing
concrete came out of it, as far as I can recall.
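A toy sketch of what a message context buys, not gettext itself: the catalog key is the context and the msgid joined by an EOT byte (`\004`), which, as far as I know, mirrors how GNU gettext's pgettext() builds its lookup key. `toy_pgettext()` is a hypothetical name, and the two French strings are made up for illustration; they show the same English msgid needing different agreement ("exclusives" for the feminine "options", "exclusifs" for the masculine "atomes").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_entry {
	const char *key;    /* msgctxt "\004" msgid */
	const char *msgstr; /* translation */
};

/* Made-up French catalog: one English msgid, two contexts. */
static const struct toy_entry toy_catalog[] = {
	{ "command-line options\004%s and %s are mutually exclusive",
	  "les options %s et %s sont mutuellement exclusives" },
	{ "ref-filter atoms\004%s and %s are mutually exclusive",
	  "les atomes %s et %s sont mutuellement exclusifs" },
};

/* Looks up msgid under ctx; falls back to the untranslated msgid, as
 * gettext does when no translation exists. */
const char *toy_pgettext(const char *ctx, const char *msgid)
{
	char key[256];
	size_t i;
	int n;

	assert(ctx && msgid);
	n = snprintf(key, sizeof(key), "%s\004%s", ctx, msgid);
	if (n < 0 || (size_t)n >= sizeof(key))
		return msgid;
	for (i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
		if (!strcmp(toy_catalog[i].key, key))
			return toy_catalog[i].msgstr;
	return msgid;
}
```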

Jean-Noël Avila Dec. 5, 2021, 5:25 p.m. UTC | #4

On Sunday, 5 December 2021 08:31:38 CET Junio C Hamano wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
> > Another aspect is that translators lose context. For example, "%s and %s
> > are mutually exclusive" may have to be translated differently depending
> > on what kind of text is substituted for %s. In this example it's
> > probably always command line options (I haven't checked), so not an
> > immediate problem. But something to keep in mind.
> 
> Yup.  I do not think we are quite ready to have two identical msgid
> strings to be translated into two different msgstr strings.  We
> briefly talked about pgettext() a few months ago, but nothing
> concrete came out of it, as far as I can recall.
> 
> 

As a translator, I made sure that all these messages share the same
grammatical structure, in which the placeholders can only be command-line
options. Identical messages with placeholders are meant to convey exactly
the same meaning wherever they are used. Since we have full control over
the source code, we can tailor them so that each message template is only
used with one specific kind of value (options, here). That's another
reason why I was proposing to define and name them.

If needed, "%s and %s are mutually exclusive" could be turned into
"options %s and %s are mutually exclusive" to make it clear that the
placeholders can only hold option names.

Junio C Hamano Dec. 5, 2021, 7:30 p.m. UTC | #5

"Jean-Noël Avila via GitGitGadget"  <gitgitgadget@gmail.com> writes:

> This series is a meager attempt at rationalizing a small fraction of the
> internationalized messages. Sorry in advance for the dull task of reviewing
> these insipid patches.
>
> Doing so has some positive effects:
>
>  * non-translatable constant strings are kept out of the way for translators
>  * messages with identical meaning are built identically
>  * the total number of messages to translate is decreased.
>
> I'm inclined to even go a step further and turn these messages into #define
> or const strings.

After looking at [01/10] that repeats the same string in many
places, I would have to say that we do not want such C preprocessor
macros.  Having to hunt for an existing message that is close enough
to what you want to say, when you are writing a new message, feels a
bit too much.

I wonder if a tool that

 - looks for "newly added" messages (by scanning "git diff" output)

 - compares them with po/git.pot for existing msgid in a fuzzy way
   to locate the ones that may be candidate for a change like the
   changes in this series

can help developers, though.
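A minimal sketch of the fuzzy comparison step of such a tool, assuming plain Levenshtein edit distance (a choice made for this sketch; no metric was named) and a caller that has already extracted the new msgids from `git diff` output and the existing ones from po/git.pot. `levenshtein()` and `closest_msgid()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Classic edit distance with two rolling rows; returns (size_t)-1 if
 * memory allocation fails. */
size_t levenshtein(const char *a, const char *b)
{
	size_t la, lb, i, j, result;
	size_t *prev, *cur, *tmp;

	assert(a && b);
	la = strlen(a);
	lb = strlen(b);
	prev = malloc((lb + 1) * sizeof(*prev));
	cur = malloc((lb + 1) * sizeof(*cur));
	if (!prev || !cur) {
		free(prev);
		free(cur);
		return (size_t)-1;
	}
	for (j = 0; j <= lb; j++)
		prev[j] = j;
	for (i = 1; i <= la; i++) {
		cur[0] = i;
		for (j = 1; j <= lb; j++) {
			size_t del = prev[j] + 1;
			size_t ins = cur[j - 1] + 1;
			size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
			size_t best = del < ins ? del : ins;
			cur[j] = best < sub ? best : sub;
		}
		tmp = prev;
		prev = cur;
		cur = tmp;
	}
	result = prev[lb];
	free(prev);
	free(cur);
	return result;
}

/*
 * Returns the index of the existing msgid closest to candidate, or -1 if
 * none lies within max_dist edits. When dist_out is not NULL, it receives
 * the best distance found within max_dist, or (size_t)-1 if none was.
 */
int closest_msgid(const char *candidate, const char *const *existing,
		  size_t n, size_t max_dist, size_t *dist_out)
{
	int best = -1;
	size_t best_dist = (size_t)-1;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t d = levenshtein(candidate, existing[i]);
		if (d <= max_dist && d < best_dist) {
			best = (int)i;
			best_dist = d;
		}
	}
	if (dist_out)
		*dist_out = best_dist;
	return best;
}
```

A raw character distance is crude (it ignores that "%s" placeholders carry most of the structure), so a real tool would likely want a threshold relative to message length.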

Thanks.

Johannes Sixt Dec. 5, 2021, 7:50 p.m. UTC | #6

On 05.12.21 at 18:25, Jean-Noël AVILA wrote:
> If needed, "%s and %s are mutually exclusive" could be turned into
> "options %s and %s are mutually exclusive" to make it clear that the
> placeholders can only hold option names.

IMO, being less terse helps not only translators, but also users.

Regarding this particular message, personally, I am not a fan of
"mutually exclusive" (sounds like it's been taken from a law text). How
about "options ... are incompatible" or "... cannot be used together"?

-- Hannes

Junio C Hamano Dec. 6, 2021, 7:18 p.m. UTC | #7

Johannes Sixt <j6t@kdbg.org> writes:

> On 05.12.21 at 18:25, Jean-Noël AVILA wrote:
>> If needed, "%s and %s are mutually exclusive" could be turned into
>> "options %s and %s are mutually exclusive" to make it clear that the
>> placeholders can only hold option names.
>
> IMO, being less terse helps not only translators, but also users.
>
> Regarding this particular message, personally, I am not a fan of
> "mutually exclusive" (sounds like it's been taken from a law text). How
> about "options ... are incompatible" or "... cannot be used together"?

Sounds good.  Or perhaps "X cannot be used with Y", which may be
even shorter and is still clear what it wants to say.

    X and Y are incompatible.
    X and Y cannot be used together.
    X cannot be used with Y.
