[v8,07/15] bugreport: add git-remote-https version

Message ID	20200220015858.181086-8-emilyshaffer@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=QcP8=4I=vger.kernel.org=git-owner@kernel.org> Date: Wed, 19 Feb 2020 17:58:50 -0800 In-Reply-To: <20200220015858.181086-1-emilyshaffer@google.com> Message-Id: <20200220015858.181086-8-emilyshaffer@google.com> Mime-Version: 1.0 References: <20200220015858.181086-1-emilyshaffer@google.com> Subject: [PATCH v8 07/15] bugreport: add git-remote-https version From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com> Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	add git-bugreport tool \| expand [v8,00/15] add git-bugreport tool [v8,01/15] help: move list_config_help to builtin/help [v8,02/15] help: add shell-path to --build-options [v8,03/15] bugreport: add tool to generate debugging info [v8,04/15] bugreport: gather git version and build info [v8,05/15] bugreport: add uname info [v8,06/15] bugreport: add compiler info [v8,07/15] bugreport: add git-remote-https version [v8,08/15] bugreport: include user interactive shell [v8,09/15] bugreport: generate config safelist based on docs [v8,10/15] bugreport: add config values from safelist [v8,11/15] bugreport: collect list of populated hooks [v8,12/15] bugreport: count loose objects [v8,13/15] bugreport: add packed object summary [v8,14/15] bugreport: list contents of $OBJDIR/info [v8,15/15] bugreport: summarize contents of alternates file

Emily Shaffer Feb. 20, 2020, 1:58 a.m. UTC

It's possible for git-remote-curl to be built separately from git; in that
case we want to know what version of cURL is used by git-remote-curl, not
necessarily which version was present at git-bugreport's build time.
So instead, ask git-remote-curl for the version information it knows
about.

Today, "git-remote-http" and "git-remote-https" are aliased to
"git-remote-curl"; but in case we rely on a different library than cURL
in the future, let's not explicitly reference cURL from bugreport.

For longevity purposes, invoke the alias "git-remote-https" instead of
"git-remote-http".

Since it could have been built at a different time, also report the
version and built-from commit of git-remote-curl alongside the cURL info.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Documentation/git-bugreport.txt |  1 +
 bugreport.c                     | 16 ++++++++++++++++
 remote-curl.c                   |  8 ++++++++
 3 files changed, 25 insertions(+)

Junio C Hamano Feb. 20, 2020, 8:35 p.m. UTC | #1

Emily Shaffer <emilyshaffer@google.com> writes:

> +static void get_git_remote_https_version_info(struct strbuf *version_info)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +
> +	cp.git_cmd = 1;
> +	argv_array_push(&cp.args, "remote-https");
> +	argv_array_push(&cp.args, "--build-info");
> +	if (capture_command(&cp, version_info, 0))
> +	    strbuf_addstr(version_info, "'git-remote-https --build-info' not supported\n");
> +}
>  
>  static void get_system_info(struct strbuf *sys_info)
>  {
> @@ -29,6 +41,10 @@ static void get_system_info(struct strbuf *sys_info)
>  	strbuf_addstr(sys_info, "compiler info: ");
>  	get_compiler_info(sys_info);
>  	strbuf_complete_line(sys_info);
> +
> +	strbuf_addstr(sys_info, "git-remote-https --build-info:\n");
> +	get_git_remote_https_version_info(sys_info);
> +	strbuf_complete_line(sys_info);
>  }
>  
>  static const char * const bugreport_usage[] = {
> diff --git a/remote-curl.c b/remote-curl.c
> index 8eb96152f5..73e52175c0 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -17,6 +17,7 @@
>  #include "protocol.h"
>  #include "quote.h"
>  #include "transport.h"
> +#include "version.h"
>  
>  static struct remote *remote;
>  /* always ends with a trailing slash */
> @@ -1375,6 +1376,13 @@ int cmd_main(int argc, const char **argv)
>  	string_list_init(&options.deepen_not, 1);
>  	string_list_init(&options.push_options, 1);
>  
> +	if (!strcmp("--build-info", argv[1])) {
> +		printf("git-http-fetch version: %s\n", git_version_string);

We are letting bugreport grab this information from remote-curl, and
they are different binaries, so git_version_string we see here is
that of the remote-curl.  Good.

> +		printf("built from commit: %s\n", git_built_from_commit_string);
> +		printf("curl version: %s\n", curl_version());
> +		return 0;
> +	}
> +
>  	/*
>  	 * Just report "remote-curl" here (folding all the various aliases
>  	 * ("git-remote-http", "git-remote-https", and etc.) here since they

Makes sense, except that it is not clear what our overall stance on
i18n/l10n of the bugreport output.

I think there are two schools of thought.  

 - You can stick to C local, with an expectation that developers can
   read reports in C locale.  This will allow developers to avoid
   having to read reports written in a language they do not
   understand, and also makes it easier to mechanically process
   reports.

 - You can make sure what matters in the report is localized to the
   end-users' locale.  This will avoid forcing end-users to send out
   a report that they may not even able to read (which is what
   happens if we did not do i18n/l10n).

I am not sure which way you are aiming at.  The boilerplate text
we saw in a very early part of the series were marked N_() but some
of the messages in later parts of the series are not.

Emily Shaffer Feb. 20, 2020, 11:28 p.m. UTC | #2

On Thu, Feb 20, 2020 at 12:35:37PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > +static void get_git_remote_https_version_info(struct strbuf *version_info)
> > +{
> > +	struct child_process cp = CHILD_PROCESS_INIT;
> > +
> > +	cp.git_cmd = 1;
> > +	argv_array_push(&cp.args, "remote-https");
> > +	argv_array_push(&cp.args, "--build-info");
> > +	if (capture_command(&cp, version_info, 0))
> > +	    strbuf_addstr(version_info, "'git-remote-https --build-info' not supported\n");
> > +}
> >  
> >  static void get_system_info(struct strbuf *sys_info)
> >  {
> > @@ -29,6 +41,10 @@ static void get_system_info(struct strbuf *sys_info)
> >  	strbuf_addstr(sys_info, "compiler info: ");
> >  	get_compiler_info(sys_info);
> >  	strbuf_complete_line(sys_info);
> > +
> > +	strbuf_addstr(sys_info, "git-remote-https --build-info:\n");
> > +	get_git_remote_https_version_info(sys_info);
> > +	strbuf_complete_line(sys_info);
> >  }
> >  
> >  static const char * const bugreport_usage[] = {
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 8eb96152f5..73e52175c0 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -17,6 +17,7 @@
> >  #include "protocol.h"
> >  #include "quote.h"
> >  #include "transport.h"
> > +#include "version.h"
> >  
> >  static struct remote *remote;
> >  /* always ends with a trailing slash */
> > @@ -1375,6 +1376,13 @@ int cmd_main(int argc, const char **argv)
> >  	string_list_init(&options.deepen_not, 1);
> >  	string_list_init(&options.push_options, 1);
> >  
> > +	if (!strcmp("--build-info", argv[1])) {
> > +		printf("git-http-fetch version: %s\n", git_version_string);
> 
> We are letting bugreport grab this information from remote-curl, and
> they are different binaries, so git_version_string we see here is
> that of the remote-curl.  Good.
> 
> > +		printf("built from commit: %s\n", git_built_from_commit_string);
> > +		printf("curl version: %s\n", curl_version());
> > +		return 0;
> > +	}
> > +
> >  	/*
> >  	 * Just report "remote-curl" here (folding all the various aliases
> >  	 * ("git-remote-http", "git-remote-https", and etc.) here since they
> 
> Makes sense, except that it is not clear what our overall stance on
> i18n/l10n of the bugreport output.
> 
> I think there are two schools of thought.  
> 
>  - You can stick to C local, with an expectation that developers can
>    read reports in C locale.  This will allow developers to avoid
>    having to read reports written in a language they do not
>    understand, and also makes it easier to mechanically process
>    reports.
> 
>  - You can make sure what matters in the report is localized to the
>    end-users' locale.  This will avoid forcing end-users to send out
>    a report that they may not even able to read (which is what
>    happens if we did not do i18n/l10n).
> 
> I am not sure which way you are aiming at.  The boilerplate text
> we saw in a very early part of the series were marked N_() but some
> of the messages in later parts of the series are not.

I mentioned it in another reply to you at a different point in the v8
conversation; I took away the localization on the boilerplate at your
suggestion. Unfortunately I agree that we wouldn't be equipped to handle
reports in other languages. Not being able to understand the
user-fillable part of the report extends, I think, to not being able to
understand the diagnostic info. If the point of the report is to skip
back-and-forth, then writing back to say "actually, run all this stuff
manually so we can see the output in English" is not going to achieve
that goal.

 - Emily

Junio C Hamano Feb. 21, 2020, 3:44 a.m. UTC | #3

Emily Shaffer <emilyshaffer@google.com> writes:

> ... Unfortunately I agree that we wouldn't be equipped to handle
> reports in other languages.

I actually was anticipating a far better world ;-) 

There is no reason to limit the recipient of reports only to *us*.
Use of Git and population proficient in Git would become wide enough
that we should be able to expect that users who speak language X can
be helped by experts who speak the same language X with their
issues.

Emily Shaffer Feb. 25, 2020, 10:08 p.m. UTC | #4

On Thu, Feb 20, 2020 at 07:44:30PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > ... Unfortunately I agree that we wouldn't be equipped to handle
> > reports in other languages.
> 
> I actually was anticipating a far better world ;-) 
> 
> There is no reason to limit the recipient of reports only to *us*.
> Use of Git and population proficient in Git would become wide enough
> that we should be able to expect that users who speak language X can
> be helped by experts who speak the same language X with their
> issues.

Hm. I guess I got the opposite impression from you way back in v1. I do
wish it had been communicated a little more clearly; it's frustrating to
perceive a reversal after seven months of review. But that's probably on
my own reading comprehension :)

Well, I certainly don't mind - but I did have a pretty long conversation
with Jonathan Nieder last week about whether it's feasible to make the
bugreport extremely stable while also being locale aware. As I
understood it, he worried about someone having a misconfigured locale
causing the bugreport tool to crash when it tries to set the locale.

But I'm having trouble figuring out how that can happen. It looks like
we use libintl.h to do almost all of our string localization, which I
have to assume is pretty robust. Or to put it another way, if the user's
environment has broken libintl.h, then it seems like they also have
bigger problems outside of Git which they would notice already.

(Side note: I went down a rabbit hole of trying to break my locale, and
did manage to generate some garbage by setting 'LANG' to a non-UTF-8
character set and setting 'LANGUAGE' to a character set which does
require UTF-8, that is, 'LANG=es_MX LANGUAGE=ko git status'.)

How about if I localize the bugreport template, headers, and formatted
comments (e.g. "3745 local loose objects"), and include a tip in 'git
help bugreport' suggesting that if it doesn't look right, maybe the user
wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it
actually gets generated?

I had another thought, actually, that this is maybe semantically similar
problem to the malformed config we discussed earlier in the review. Does
it make sense to include some kind of --safemode flag to 'git' which
asks it to not perform localization and not read configs? I would
propose adding it just to git-bugreport, and I could work around some
locale weirdness that way, but with a broken config the attempt dies
before git-bugreport binary is invoked at all.

 - Emily

Junio C Hamano Feb. 25, 2020, 10:26 p.m. UTC | #5

Emily Shaffer <emilyshaffer@google.com> writes:

> How about if I localize the bugreport template, headers, and formatted
> comments (e.g. "3745 local loose objects"), and include a tip in 'git
> help bugreport' suggesting that if it doesn't look right, maybe the user
> wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it
> actually gets generated?

I think that will be what is going to happen anyway in the real
world.  We'd spend a reasonable effort for localization (and I
personally am perfectly OK if the effort for the initial round is
"almost zero"), but make sure C-locale is left as an escape hatch.

> I had another thought, actually, that this is maybe semantically similar
> problem to the malformed config we discussed earlier in the review. Does
> it make sense to include some kind of --safemode flag to 'git' which
> asks it to not perform localization and not read configs?

I suspect that "git --safemode" would always end up being buggy than
running "git" under LANG=C LC_ALL=C GIT_CONFIG=/dev/null or something
like that ;-)

Wouldn't it defeat more than 30% of the value of the tool if we
do not read and report the contents of the configuration file(s)?

Junio C Hamano Feb. 25, 2020, 11:29 p.m. UTC | #6

Emily Shaffer <emilyshaffer@google.com> writes:

> Hm. I guess I got the opposite impression from you way back in v1. I do
> wish it had been communicated a little more clearly; it's frustrating to
> perceive a reversal after seven months of review. But that's probably on
> my own reading comprehension :)

Well, seven months is a long time for anybody to learn from repeated
reading and gained experience to form an opinion and get affected by
others' opinions.  In any case, this is one of the reasons why I try
to discourage people to have too many topics in flight before moving
on to a new and different topic---the risk of ending up with fliping
and flopping is just too high when you give people chance to forget.
Rather, I'd prefer to see something simple to land first and then
later refined.

On this particular issue, I actually do not have a preference.  As
long as the topic has a coherent position/stance, any one of

 (a) we do as much i18n as possible to help end-users, or

 (b) we stay away from i18n to ensure the reports are machine
     readable, or

 (c) somewhere in between with a clear criterion where you are
     drawing the line (e.g. "the introductory text is what we want
     the end-user to read, so it is i18ned, but the report about
     their environment are primarily for our use and we avoid
     localizing so that we can process mechanically"),

is fine.  The important point is that we choose what we do with a
solid guiding principle behind the decision.

In practice, every string in bugreport.c you have control over the
use (or non-use) of _() around it, but codepaths that you call from
existing parts of the system are likely to have their messages
localized if they are meant for Porcelain use.  So from that point
of view, (a) would be easier to arrange than (b), I suspect.

Thanks.

Emily Shaffer Feb. 25, 2020, 11:29 p.m. UTC | #7

On Tue, Feb 25, 2020 at 02:26:34PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > How about if I localize the bugreport template, headers, and formatted
> > comments (e.g. "3745 local loose objects"), and include a tip in 'git
> > help bugreport' suggesting that if it doesn't look right, maybe the user
> > wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it
> > actually gets generated?
> 
> I think that will be what is going to happen anyway in the real
> world.  We'd spend a reasonable effort for localization (and I
> personally am perfectly OK if the effort for the initial round is
> "almost zero"), but make sure C-locale is left as an escape hatch.

OK.

> 
> > I had another thought, actually, that this is maybe semantically similar
> > problem to the malformed config we discussed earlier in the review. Does
> > it make sense to include some kind of --safemode flag to 'git' which
> > asks it to not perform localization and not read configs?
> 
> I suspect that "git --safemode" would always end up being buggy than
> running "git" under LANG=C LC_ALL=C GIT_CONFIG=/dev/null or something
> like that ;-)
> 
> Wouldn't it defeat more than 30% of the value of the tool if we
> do not read and report the contents of the configuration file(s)?

If you mean "the tool" = "Git", I won't argue with you. But if you mean
"the tool" = "git-bugreport", I'd say that 70% of the value of the tool
is better than 0% because my config is broken and I don't know why or
how.

 - Emily

Emily Shaffer Feb. 25, 2020, 11:55 p.m. UTC | #8

On Tue, Feb 25, 2020 at 03:29:08PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Hm. I guess I got the opposite impression from you way back in v1. I do
> > wish it had been communicated a little more clearly; it's frustrating to
> > perceive a reversal after seven months of review. But that's probably on
> > my own reading comprehension :)
> 
> Well, seven months is a long time for anybody to learn from repeated
> reading and gained experience to form an opinion and get affected by
> others' opinions.  In any case, this is one of the reasons why I try
> to discourage people to have too many topics in flight before moving
> on to a new and different topic---the risk of ending up with fliping
> and flopping is just too high when you give people chance to forget.
> Rather, I'd prefer to see something simple to land first and then
> later refined.

I think it's exactly what happened with this topic. It's easy to say
"don't have too many things out at once" in theory - but becomes less
easy when you combine it with things like quarterly planning at dayjob,
etc. In fact I have been trying to revive and push through old topics I
had in flight because I had this same realization - my six topics were
only each updated every month or so, meaning everyone (including me)
forgot what was going on from revision to revision.

I wonder whether it makes sense to try and finalize the bare minimums of
this feature, and then move the rest of the work to their own topics. I
see a couple benefits:

 - Users can start trying out 'git bugreport' when they write us mails
   much sooner. Even the template alone would increase the quality of a
   number of the reports I see come into this list.
 - It may help insulate this topic from 'feature creep,' which in my
   opinion has plagued it from the start. While the entire topic of 15
   patches is in flux, it's hard for someone else to write their own
   patch adding, say, midex/commit-graph support, and so I feel
   obligated to write it myself - which of course increases the amount
   of time the topic lives on in review.
 - Of course smaller topics are easier to review, and a commit-graph
   expert is much more likely to click on the subject "[PATCH]
   bugreport: gather commit-graph diagnostics" than they are to click on
   the subject "[PATCH] add bugreport utility". That is, I'm guessing
   there will be more reviews on the semantics of the patch (is it the
   right diagnostic info? is it useful?) than there have been so far.

The downside is mailing list churn, and the possibility that with a
minimum 'git-bugreport' checked in I find something better to do. I'm
not too worried about the former; after all, that's what this list is
for, right? As for the latter, after 7 months of flip and flop I don't
think the risk is greater than it is now anyways ;) I think it makes
sense to send the following topics:

/* To me, this sounds like minimum viable bugreport. "What went wrong,
 * and how do I build the same binary as you so I can reproduce it?"
add git-bugreport tool:
 - bugreport: add tool to generate debugging info
 - bugreport: gather git version and build info
 - bugreport: add uname info
 - bugreport: add compiler info

bugreport: add git-remote-https version

bugreport: include shell info
 - help: add shell-path to --build-options
 - bugreport: include user interactive shell

bugreport: include config info
 - help: move list_config_help to builtin/help
 - bugreport: generate config safelist based on docs
 - bugreport: add config values from safelist

bugreport: collect list of populated hooks

bugreport: include object store info
 - bugreport: count loose objects
 - bugreport: add packed object summary
 - bugreport: list contents of $OBJDIR/info

I'd like to send that first topic as vN+1 of this one, and I can hold
the rest of the branches locally until we get it ironed out; once that
topic makes it through to 'next' then I can send the rest again. What do
you think?

> On this particular issue, I actually do not have a preference.  As
> long as the topic has a coherent position/stance, any one of
> 
>  (a) we do as much i18n as possible to help end-users, or
> 
>  (b) we stay away from i18n to ensure the reports are machine
>      readable, or
> 
>  (c) somewhere in between with a clear criterion where you are
>      drawing the line (e.g. "the introductory text is what we want
>      the end-user to read, so it is i18ned, but the report about
>      their environment are primarily for our use and we avoid
>      localizing so that we can process mechanically"),
> 
> is fine.  The important point is that we choose what we do with a
> solid guiding principle behind the decision.
> 
> In practice, every string in bugreport.c you have control over the
> use (or non-use) of _() around it, but codepaths that you call from
> existing parts of the system are likely to have their messages
> localized if they are meant for Porcelain use.  So from that point
> of view, (a) would be easier to arrange than (b), I suspect.

Hm. Sounds convincing enough to me.

 - Emily

[v8,07/15] bugreport: add git-remote-https version

Commit Message

Comments

Patch