Message ID | 20200220015858.181086-8-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add git-bugreport tool | expand |
Emily Shaffer <emilyshaffer@google.com> writes: > +static void get_git_remote_https_version_info(struct strbuf *version_info) > +{ > + struct child_process cp = CHILD_PROCESS_INIT; > + > + cp.git_cmd = 1; > + argv_array_push(&cp.args, "remote-https"); > + argv_array_push(&cp.args, "--build-info"); > + if (capture_command(&cp, version_info, 0)) > + strbuf_addstr(version_info, "'git-remote-https --build-info' not supported\n"); > +} > > static void get_system_info(struct strbuf *sys_info) > { > @@ -29,6 +41,10 @@ static void get_system_info(struct strbuf *sys_info) > strbuf_addstr(sys_info, "compiler info: "); > get_compiler_info(sys_info); > strbuf_complete_line(sys_info); > + > + strbuf_addstr(sys_info, "git-remote-https --build-info:\n"); > + get_git_remote_https_version_info(sys_info); > + strbuf_complete_line(sys_info); > } > > static const char * const bugreport_usage[] = { > diff --git a/remote-curl.c b/remote-curl.c > index 8eb96152f5..73e52175c0 100644 > --- a/remote-curl.c > +++ b/remote-curl.c > @@ -17,6 +17,7 @@ > #include "protocol.h" > #include "quote.h" > #include "transport.h" > +#include "version.h" > > static struct remote *remote; > /* always ends with a trailing slash */ > @@ -1375,6 +1376,13 @@ int cmd_main(int argc, const char **argv) > string_list_init(&options.deepen_not, 1); > string_list_init(&options.push_options, 1); > > + if (!strcmp("--build-info", argv[1])) { > + printf("git-http-fetch version: %s\n", git_version_string); We are letting bugreport grab this information from remote-curl, and they are different binaries, so git_version_string we see here is that of the remote-curl. Good. > + printf("built from commit: %s\n", git_built_from_commit_string); > + printf("curl version: %s\n", curl_version()); > + return 0; > + } > + > /* > * Just report "remote-curl" here (folding all the various aliases > * ("git-remote-http", "git-remote-https", and etc.) here since they Makes sense, except that it is not clear what our overall stance on i18n/l10n of the bugreport output. I think there are two schools of thought. - You can stick to C local, with an expectation that developers can read reports in C locale. This will allow developers to avoid having to read reports written in a language they do not understand, and also makes it easier to mechanically process reports. - You can make sure what matters in the report is localized to the end-users' locale. This will avoid forcing end-users to send out a report that they may not even able to read (which is what happens if we did not do i18n/l10n). I am not sure which way you are aiming at. The boilerplate text we saw in a very early part of the series were marked N_() but some of the messages in later parts of the series are not.
On Thu, Feb 20, 2020 at 12:35:37PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@google.com> writes: > > > +static void get_git_remote_https_version_info(struct strbuf *version_info) > > +{ > > + struct child_process cp = CHILD_PROCESS_INIT; > > + > > + cp.git_cmd = 1; > > + argv_array_push(&cp.args, "remote-https"); > > + argv_array_push(&cp.args, "--build-info"); > > + if (capture_command(&cp, version_info, 0)) > > + strbuf_addstr(version_info, "'git-remote-https --build-info' not supported\n"); > > +} > > > > static void get_system_info(struct strbuf *sys_info) > > { > > @@ -29,6 +41,10 @@ static void get_system_info(struct strbuf *sys_info) > > strbuf_addstr(sys_info, "compiler info: "); > > get_compiler_info(sys_info); > > strbuf_complete_line(sys_info); > > + > > + strbuf_addstr(sys_info, "git-remote-https --build-info:\n"); > > + get_git_remote_https_version_info(sys_info); > > + strbuf_complete_line(sys_info); > > } > > > > static const char * const bugreport_usage[] = { > > diff --git a/remote-curl.c b/remote-curl.c > > index 8eb96152f5..73e52175c0 100644 > > --- a/remote-curl.c > > +++ b/remote-curl.c > > @@ -17,6 +17,7 @@ > > #include "protocol.h" > > #include "quote.h" > > #include "transport.h" > > +#include "version.h" > > > > static struct remote *remote; > > /* always ends with a trailing slash */ > > @@ -1375,6 +1376,13 @@ int cmd_main(int argc, const char **argv) > > string_list_init(&options.deepen_not, 1); > > string_list_init(&options.push_options, 1); > > > > + if (!strcmp("--build-info", argv[1])) { > > + printf("git-http-fetch version: %s\n", git_version_string); > > We are letting bugreport grab this information from remote-curl, and > they are different binaries, so git_version_string we see here is > that of the remote-curl. Good. > > > + printf("built from commit: %s\n", git_built_from_commit_string); > > + printf("curl version: %s\n", curl_version()); > > + return 0; > > + } > > + > > /* > > * Just report "remote-curl" here (folding all the various aliases > > * ("git-remote-http", "git-remote-https", and etc.) here since they > > Makes sense, except that it is not clear what our overall stance on > i18n/l10n of the bugreport output. > > I think there are two schools of thought. > > - You can stick to C local, with an expectation that developers can > read reports in C locale. This will allow developers to avoid > having to read reports written in a language they do not > understand, and also makes it easier to mechanically process > reports. > > - You can make sure what matters in the report is localized to the > end-users' locale. This will avoid forcing end-users to send out > a report that they may not even able to read (which is what > happens if we did not do i18n/l10n). > > I am not sure which way you are aiming at. The boilerplate text > we saw in a very early part of the series were marked N_() but some > of the messages in later parts of the series are not. I mentioned it in another reply to you at a different point in the v8 conversation; I took away the localization on the boilerplate at your suggestion. Unfortunately I agree that we wouldn't be equipped to handle reports in other languages. Not being able to understand the user-fillable part of the report extends, I think, to not being able to understand the diagnostic info. If the point of the report is to skip back-and-forth, then writing back to say "actually, run all this stuff manually so we can see the output in English" is not going to achieve that goal. - Emily
Emily Shaffer <emilyshaffer@google.com> writes: > ... Unfortunately I agree that we wouldn't be equipped to handle > reports in other languages. I actually was anticipating a far better world ;-) There is no reason to limit the recipient of reports only to *us*. Use of Git and population proficient in Git would become wide enough that we should be able to expect that users who speak language X can be helped by experts who speak the same language X with their issues.
On Thu, Feb 20, 2020 at 07:44:30PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@google.com> writes: > > > ... Unfortunately I agree that we wouldn't be equipped to handle > > reports in other languages. > > I actually was anticipating a far better world ;-) > > There is no reason to limit the recipient of reports only to *us*. > Use of Git and population proficient in Git would become wide enough > that we should be able to expect that users who speak language X can > be helped by experts who speak the same language X with their > issues. Hm. I guess I got the opposite impression from you way back in v1. I do wish it had been communicated a little more clearly; it's frustrating to perceive a reversal after seven months of review. But that's probably on my own reading comprehension :) Well, I certainly don't mind - but I did have a pretty long conversation with Jonathan Nieder last week about whether it's feasible to make the bugreport extremely stable while also being locale aware. As I understood it, he worried about someone having a misconfigured locale causing the bugreport tool to crash when it tries to set the locale. But I'm having trouble figuring out how that can happen. It looks like we use libintl.h to do almost all of our string localization, which I have to assume is pretty robust. Or to put it another way, if the user's environment has broken libintl.h, then it seems like they also have bigger problems outside of Git which they would notice already. (Side note: I went down a rabbit hole of trying to break my locale, and did manage to generate some garbage by setting 'LANG' to a non-UTF-8 character set and setting 'LANGUAGE' to a character set which does require UTF-8, that is, 'LANG=es_MX LANGUAGE=ko git status'.) How about if I localize the bugreport template, headers, and formatted comments (e.g. "3745 local loose objects"), and include a tip in 'git help bugreport' suggesting that if it doesn't look right, maybe the user wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it actually gets generated? I had another thought, actually, that this is maybe semantically similar problem to the malformed config we discussed earlier in the review. Does it make sense to include some kind of --safemode flag to 'git' which asks it to not perform localization and not read configs? I would propose adding it just to git-bugreport, and I could work around some locale weirdness that way, but with a broken config the attempt dies before git-bugreport binary is invoked at all. - Emily
Emily Shaffer <emilyshaffer@google.com> writes: > How about if I localize the bugreport template, headers, and formatted > comments (e.g. "3745 local loose objects"), and include a tip in 'git > help bugreport' suggesting that if it doesn't look right, maybe the user > wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it > actually gets generated? I think that will be what is going to happen anyway in the real world. We'd spend a reasonable effort for localization (and I personally am perfectly OK if the effort for the initial round is "almost zero"), but make sure C-locale is left as an escape hatch. > I had another thought, actually, that this is maybe semantically similar > problem to the malformed config we discussed earlier in the review. Does > it make sense to include some kind of --safemode flag to 'git' which > asks it to not perform localization and not read configs? I suspect that "git --safemode" would always end up being buggy than running "git" under LANG=C LC_ALL=C GIT_CONFIG=/dev/null or something like that ;-) Wouldn't it defeat more than 30% of the value of the tool if we do not read and report the contents of the configuration file(s)?
Emily Shaffer <emilyshaffer@google.com> writes: > Hm. I guess I got the opposite impression from you way back in v1. I do > wish it had been communicated a little more clearly; it's frustrating to > perceive a reversal after seven months of review. But that's probably on > my own reading comprehension :) Well, seven months is a long time for anybody to learn from repeated reading and gained experience to form an opinion and get affected by others' opinions. In any case, this is one of the reasons why I try to discourage people to have too many topics in flight before moving on to a new and different topic---the risk of ending up with fliping and flopping is just too high when you give people chance to forget. Rather, I'd prefer to see something simple to land first and then later refined. On this particular issue, I actually do not have a preference. As long as the topic has a coherent position/stance, any one of (a) we do as much i18n as possible to help end-users, or (b) we stay away from i18n to ensure the reports are machine readable, or (c) somewhere in between with a clear criterion where you are drawing the line (e.g. "the introductory text is what we want the end-user to read, so it is i18ned, but the report about their environment are primarily for our use and we avoid localizing so that we can process mechanically"), is fine. The important point is that we choose what we do with a solid guiding principle behind the decision. In practice, every string in bugreport.c you have control over the use (or non-use) of _() around it, but codepaths that you call from existing parts of the system are likely to have their messages localized if they are meant for Porcelain use. So from that point of view, (a) would be easier to arrange than (b), I suspect. Thanks.
On Tue, Feb 25, 2020 at 02:26:34PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@google.com> writes: > > > How about if I localize the bugreport template, headers, and formatted > > comments (e.g. "3745 local loose objects"), and include a tip in 'git > > help bugreport' suggesting that if it doesn't look right, maybe the user > > wants to run it with 'LANG= LANGUAGE= git bugreport' to ensure it > > actually gets generated? > > I think that will be what is going to happen anyway in the real > world. We'd spend a reasonable effort for localization (and I > personally am perfectly OK if the effort for the initial round is > "almost zero"), but make sure C-locale is left as an escape hatch. OK. > > > I had another thought, actually, that this is maybe semantically similar > > problem to the malformed config we discussed earlier in the review. Does > > it make sense to include some kind of --safemode flag to 'git' which > > asks it to not perform localization and not read configs? > > I suspect that "git --safemode" would always end up being buggy than > running "git" under LANG=C LC_ALL=C GIT_CONFIG=/dev/null or something > like that ;-) > > Wouldn't it defeat more than 30% of the value of the tool if we > do not read and report the contents of the configuration file(s)? If you mean "the tool" = "Git", I won't argue with you. But if you mean "the tool" = "git-bugreport", I'd say that 70% of the value of the tool is better than 0% because my config is broken and I don't know why or how. - Emily
On Tue, Feb 25, 2020 at 03:29:08PM -0800, Junio C Hamano wrote: > Emily Shaffer <emilyshaffer@google.com> writes: > > > Hm. I guess I got the opposite impression from you way back in v1. I do > > wish it had been communicated a little more clearly; it's frustrating to > > perceive a reversal after seven months of review. But that's probably on > > my own reading comprehension :) > > Well, seven months is a long time for anybody to learn from repeated > reading and gained experience to form an opinion and get affected by > others' opinions. In any case, this is one of the reasons why I try > to discourage people to have too many topics in flight before moving > on to a new and different topic---the risk of ending up with fliping > and flopping is just too high when you give people chance to forget. > Rather, I'd prefer to see something simple to land first and then > later refined. I think it's exactly what happened with this topic. It's easy to say "don't have too many things out at once" in theory - but becomes less easy when you combine it with things like quarterly planning at dayjob, etc. In fact I have been trying to revive and push through old topics I had in flight because I had this same realization - my six topics were only each updated every month or so, meaning everyone (including me) forgot what was going on from revision to revision. I wonder whether it makes sense to try and finalize the bare minimums of this feature, and then move the rest of the work to their own topics. I see a couple benefits: - Users can start trying out 'git bugreport' when they write us mails much sooner. Even the template alone would increase the quality of a number of the reports I see come into this list. - It may help insulate this topic from 'feature creep,' which in my opinion has plagued it from the start. While the entire topic of 15 patches is in flux, it's hard for someone else to write their own patch adding, say, midex/commit-graph support, and so I feel obligated to write it myself - which of course increases the amount of time the topic lives on in review. - Of course smaller topics are easier to review, and a commit-graph expert is much more likely to click on the subject "[PATCH] bugreport: gather commit-graph diagnostics" than they are to click on the subject "[PATCH] add bugreport utility". That is, I'm guessing there will be more reviews on the semantics of the patch (is it the right diagnostic info? is it useful?) than there have been so far. The downside is mailing list churn, and the possibility that with a minimum 'git-bugreport' checked in I find something better to do. I'm not too worried about the former; after all, that's what this list is for, right? As for the latter, after 7 months of flip and flop I don't think the risk is greater than it is now anyways ;) I think it makes sense to send the following topics: /* To me, this sounds like minimum viable bugreport. "What went wrong, * and how do I build the same binary as you so I can reproduce it?" add git-bugreport tool: - bugreport: add tool to generate debugging info - bugreport: gather git version and build info - bugreport: add uname info - bugreport: add compiler info bugreport: add git-remote-https version bugreport: include shell info - help: add shell-path to --build-options - bugreport: include user interactive shell bugreport: include config info - help: move list_config_help to builtin/help - bugreport: generate config safelist based on docs - bugreport: add config values from safelist bugreport: collect list of populated hooks bugreport: include object store info - bugreport: count loose objects - bugreport: add packed object summary - bugreport: list contents of $OBJDIR/info I'd like to send that first topic as vN+1 of this one, and I can hold the rest of the branches locally until we get it ironed out; once that topic makes it through to 'next' then I can send the rest again. What do you think? > On this particular issue, I actually do not have a preference. As > long as the topic has a coherent position/stance, any one of > > (a) we do as much i18n as possible to help end-users, or > > (b) we stay away from i18n to ensure the reports are machine > readable, or > > (c) somewhere in between with a clear criterion where you are > drawing the line (e.g. "the introductory text is what we want > the end-user to read, so it is i18ned, but the report about > their environment are primarily for our use and we avoid > localizing so that we can process mechanically"), > > is fine. The important point is that we choose what we do with a > solid guiding principle behind the decision. > > In practice, every string in bugreport.c you have control over the > use (or non-use) of _() around it, but codepaths that you call from > existing parts of the system are likely to have their messages > localized if they are meant for Porcelain use. So from that point > of view, (a) would be easier to arrange than (b), I suspect. Hm. Sounds convincing enough to me. - Emily
diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt index 643d1b2884..8c7cb9a5d4 100644 --- a/Documentation/git-bugreport.txt +++ b/Documentation/git-bugreport.txt @@ -28,6 +28,7 @@ The following information is captured automatically: - 'git version --build-options' - uname sysname, release, version, and machine strings - Compiler-specific info string + - 'git remote-https --build-info' This tool is invoked via the typical Git setup process, which means that in some cases, it might not be able to launch - for example, if a relevant config file diff --git a/bugreport.c b/bugreport.c index 9a470c5588..fb2adfdf14 100644 --- a/bugreport.c +++ b/bugreport.c @@ -5,6 +5,18 @@ #include "time.h" #include "help.h" #include "compat/compiler.h" +#include "run-command.h" + +static void get_git_remote_https_version_info(struct strbuf *version_info) +{ + struct child_process cp = CHILD_PROCESS_INIT; + + cp.git_cmd = 1; + argv_array_push(&cp.args, "remote-https"); + argv_array_push(&cp.args, "--build-info"); + if (capture_command(&cp, version_info, 0)) + strbuf_addstr(version_info, "'git-remote-https --build-info' not supported\n"); +} static void get_system_info(struct strbuf *sys_info) { @@ -29,6 +41,10 @@ static void get_system_info(struct strbuf *sys_info) strbuf_addstr(sys_info, "compiler info: "); get_compiler_info(sys_info); strbuf_complete_line(sys_info); + + strbuf_addstr(sys_info, "git-remote-https --build-info:\n"); + get_git_remote_https_version_info(sys_info); + strbuf_complete_line(sys_info); } static const char * const bugreport_usage[] = { diff --git a/remote-curl.c b/remote-curl.c index 8eb96152f5..73e52175c0 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -17,6 +17,7 @@ #include "protocol.h" #include "quote.h" #include "transport.h" +#include "version.h" static struct remote *remote; /* always ends with a trailing slash */ @@ -1375,6 +1376,13 @@ int cmd_main(int argc, const char **argv) string_list_init(&options.deepen_not, 1); string_list_init(&options.push_options, 1); + if (!strcmp("--build-info", argv[1])) { + printf("git-http-fetch version: %s\n", git_version_string); + printf("built from commit: %s\n", git_built_from_commit_string); + printf("curl version: %s\n", curl_version()); + return 0; + } + /* * Just report "remote-curl" here (folding all the various aliases * ("git-remote-http", "git-remote-https", and etc.) here since they
It's possible for git-remote-curl to be built separately from git; in that case we want to know what version of cURL is used by git-remote-curl, not necessarily which version was present at git-bugreport's build time. So instead, ask git-remote-curl for the version information it knows about. Today, "git-remote-http" and "git-remote-https" are aliased to "git-remote-curl"; but in case we rely on a different library than cURL in the future, let's not explicitly reference cURL from bugreport. For longevity purposes, invoke the alias "git-remote-https" instead of "git-remote-http". Since it could have been built at a different time, also report the version and built-from commit of git-remote-curl alongside the cURL info. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Documentation/git-bugreport.txt | 1 + bugreport.c | 16 ++++++++++++++++ remote-curl.c | 8 ++++++++ 3 files changed, 25 insertions(+)