bugreport: add tool to generate debugging info

Message ID	20190815023418.33407-1-emilyshaffer@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 14 Aug 2019 19:34:18 -0700 Message-Id: <20190815023418.33407-1-emilyshaffer@google.com> Mime-Version: 1.0 Subject: [PATCH] bugreport: add tool to generate debugging info From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com> Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	bugreport: add tool to generate debugging info \| expand bugreport: add tool to generate debugging info

Emily Shaffer Aug. 15, 2019, 2:34 a.m. UTC

Make it easier for users who encounter a bug to send a report by
collecting some state information, intended to be viewed by humans
familiar with Git.

Teach Git how to prompt the user for a good bug report: reproduction
steps, expected behavior, and actual behavior. Also, teach Git to write
down its own version, the version of some of its dependencies, the
operating system information, the effective gitconfig, the configured
hooks, and the contents of $GIT_DIR/logs. Finally, make sure Git asks
the user to review the contents of the report after it's generated.

If users can send us a well-written bug report complete with system
information, a remote we may be able to clone, and a reflog showing us
where the problem occurred, we can reduce the number of questions and
answers which must travel between the reporter and the Git contributor.

Users may also wish to send a report like this to their local "Git
expert" if they have put their repository into a state they are confused
by.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
There are some things I'm not certain about from this patch, I'd
appreciate feedback.

 - Do we want to advertise the Git mailing list for bug reports?
 - Which config options should we filter to avoid accidentally receiving
   credentials?
 - At the moment, the test is trying to check "everything we thought we
   would populate got populated" - that seemed to me like it would hold
   up best to changes to the set of information being collected. But
   maybe there's something more robust we can do.

And of course, if there are suggestions for other things to include in
the report I'm interested in adding.

An example of the output can be found here:
https://gist.github.com/nasamuffin/2e320892f5c2147e829cbcf5bd0759a2

Thanks.
 - Emily

 .gitignore                      |  1 +
 Documentation/git-bugreport.txt | 48 ++++++++++++++++++
 Makefile                        |  1 +
 command-list.txt                |  1 +
 git-bugreport.sh                | 86 +++++++++++++++++++++++++++++++++
 t/t0091-bugreport.sh            | 41 ++++++++++++++++
 6 files changed, 178 insertions(+)
 create mode 100644 Documentation/git-bugreport.txt
 create mode 100755 git-bugreport.sh
 create mode 100755 t/t0091-bugreport.sh

Derrick Stolee Aug. 15, 2019, 2:15 p.m. UTC | #1

On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> Make it easier for users who encounter a bug to send a report by
> collecting some state information, intended to be viewed by humans
> familiar with Git.

This is an excellent idea! VFS for Git has a similar "diagnose" command
that collects logs, config, and other details (like packfile sizes and
loose object counts). That has been a critical tool for supporting users.

> Teach Git how to prompt the user for a good bug report: reproduction
> steps, expected behavior, and actual behavior. Also, teach Git to write
> down its own version, the version of some of its dependencies, the
> operating system information, the effective gitconfig, the configured
> hooks, and the contents of $GIT_DIR/logs. Finally, make sure Git asks
> the user to review the contents of the report after it's generated.
> 
> If users can send us a well-written bug report complete with system
> information, a remote we may be able to clone, and a reflog showing us
> where the problem occurred, we can reduce the number of questions and
> answers which must travel between the reporter and the Git contributor.
> 
> Users may also wish to send a report like this to their local "Git
> expert" if they have put their repository into a state they are confused
> by.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> There are some things I'm not certain about from this patch, I'd
> appreciate feedback.
> 
>  - Do we want to advertise the Git mailing list for bug reports?

That is possible. Isn't there another mailing list for git users?

I could see a patch added on top of this for git-for-windows/git that
changes the instructions to create issues on GitHub.

>  - Which config options should we filter to avoid accidentally receiving
>    credentials?

The remote URLs are pretty sensitive. Not only do users sometimes put passwords
or PATs into their URLs, the literal name of the repo could be a secret.

>  - At the moment, the test is trying to check "everything we thought we
>    would populate got populated" - that seemed to me like it would hold
>    up best to changes to the set of information being collected. But
>    maybe there's something more robust we can do.
> 
> And of course, if there are suggestions for other things to include in
> the report I'm interested in adding.
> 
> An example of the output can be found here:
> https://gist.github.com/nasamuffin/2e320892f5c2147e829cbcf5bd0759a2
> 
> Thanks.
>  - Emily
> 
>  .gitignore                      |  1 +
>  Documentation/git-bugreport.txt | 48 ++++++++++++++++++
>  Makefile                        |  1 +
>  command-list.txt                |  1 +
>  git-bugreport.sh                | 86 +++++++++++++++++++++++++++++++++
>  t/t0091-bugreport.sh            | 41 ++++++++++++++++
>  6 files changed, 178 insertions(+)
>  create mode 100644 Documentation/git-bugreport.txt
>  create mode 100755 git-bugreport.sh
>  create mode 100755 t/t0091-bugreport.sh
> 
> diff --git a/.gitignore b/.gitignore
> index 521d8f4fb4..b4f5433084 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -25,6 +25,7 @@
>  /git-bisect--helper
>  /git-blame
>  /git-branch
> +/git-bugreport
>  /git-bundle
>  /git-cat-file
>  /git-check-attr
> diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> new file mode 100644
> index 0000000000..c5f45bbee8
> --- /dev/null
> +++ b/Documentation/git-bugreport.txt
> @@ -0,0 +1,48 @@
> +git-bugreport(1)
> +================
> +
> +NAME
> +----
> +git-bugreport - Collect information for user to file a bug report
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git bugreport' [-o | --output <path>]
> +
> +DESCRIPTION
> +-----------
> +Captures information about the user's machine, Git client, and repository state,
> +as well as a form requesting information about the behavior the user observed,
> +into a single text file which the user can then share, for example to the Git
> +mailing list, in order to report an observed bug.
> +
> +The following information is requested from the user:
> +
> + - Reproduction steps
> + - Expected behavior
> + - Actual behavior
> +
> +The following information is captured automatically:
> +
> + - Git version (`git version --build-options`)
> + - Machine information (`uname -a`)
> + - Versions of various dependencies
> + - Git config contents (`git config --show-origin --list`)
> + - The contents of all configured git-hooks in `.git/hooks/`
> + - The contents of `.git/logs`
> +
> +OPTIONS
> +-------
> +-o [<path>]::
> +--output [<path>]::
> +	Place the resulting bug report file in <path> instead of the root of the
> +	Git repository.
> +
> +NOTE
> +----
> +Bug reports can be sent to git@vger.kernel.org.
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> diff --git a/Makefile b/Makefile
> index f9255344ae..69801a1c45 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -606,6 +606,7 @@ TEST_PROGRAMS_NEED_X =
>  unexport CDPATH
>  
>  SCRIPT_SH += git-bisect.sh
> +SCRIPT_SH += git-bugreport.sh
>  SCRIPT_SH += git-difftool--helper.sh
>  SCRIPT_SH += git-filter-branch.sh
>  SCRIPT_SH += git-merge-octopus.sh
> diff --git a/command-list.txt b/command-list.txt
> index a9ac72bef4..be5a605047 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -54,6 +54,7 @@ git-archive                             mainporcelain
>  git-bisect                              mainporcelain           info
>  git-blame                               ancillaryinterrogators          complete
>  git-branch                              mainporcelain           history
> +git-bugreport                           ancillaryinterrogators
>  git-bundle                              mainporcelain
>  git-cat-file                            plumbinginterrogators
>  git-check-attr                          purehelpers
> diff --git a/git-bugreport.sh b/git-bugreport.sh
> new file mode 100755
> index 0000000000..2200703a51
> --- /dev/null
> +++ b/git-bugreport.sh

At first I was alarmed by "What? another shell script?" but this command should
prioritize flexibility and extensibility over speed. Running multiple processes
shouldn't be too taxing for what we are trying to do here.

> @@ -0,0 +1,86 @@
> +#!/bin/sh
> +
> +print_filenames_and_content() {
> +while read -r line; do
> +	echo "$line";
> +	echo "========";
> +	cat "$line";
> +	echo;
> +done
> +}
> +
> +generate_report_text() {
> +
> +	# Generate a form for the user to fill out with their issue.
> +	gettextln "Thank you for filling out a Git bug report!"
> +	gettextln "Please answer the following questions to help us understand your issue."
> +	echo
> +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> +	echo
> +	gettextln "What did you expect to happen? (Expected behavior)"
> +	echo
> +	gettextln "What happened instead? (Actual behavior)"
> +	echo
> +	gettextln "Anything else you want to add:"
> +	echo
> +	gettextln "Please review the rest of the bug report below."
> +	gettextln "You can delete any lines you don't wish to send."
> +	echo
> +
> +	echo "[System Information]"
> +	git version --build-options
> +	uname -a
> +	curl-config --version
> +	ldd --version
> +	echo
> +
> +	echo "[Git Config]"
> +	# TODO: Pass this through grep -v to avoid users sending us their credentials.
> +	git config --show-origin --list
> +	echo

Config options to consider stripping out:

	*url*
	*pass* (anything "password" but also "sendmail.smtppass")

> +	echo "[Configured Hooks]"
> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> +	echo

Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
is a binary file?

> +
> +	echo "[Git Logs]"
> +	find "$GIT_DIR/logs" -type f | print_filenames_and_content
> +	echo

As mentioned before, I've sometimes found it helpful to know the data shape for the object
store. Having a few extra steps such as the following could be nice:

	echo "[Loose Objects]"
	for objdir in $(find "$GIT_DIR/objects/??" -type d)
	do
		echo "$objdir: $(ls $objdir | wc -l)"
	done
	echo

	echo "[Pack Data]"
	ls -l "$GIT_DIR/objects/pack"
	echo

	echo "[Object Info Data]"
	ls -lR "$GIT_DIR/objects/info"
	echo

	echo "[Alternates File]"
	echo "========"
	cat "$GIT_DIR/objects/info/alternates"
	echo

That last one will collect information on the commit-graph file, even if it is an
incremental file.

> +
> +}
> +
> +USAGE="[-o | --output <path>]"
> +
> +SUBDIRECTORY_OK=t
> +OPTIONS_SPEC=
> +. git-sh-setup
> +. git-sh-i18n
> +
> +basedir="$PWD"
> +while :
> +do
> +	case "$1" in
> +	-o|--output)
> +		shift
> +		basedir="$1"
> +		shift
> +		continue
> +		;;
> +	"")
> +		break
> +		;;
> +	*)
> +		usage
> +		;;
> +	esac
> +done
> +
> +
> +# Create bugreport file
> +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"
> +
> +generate_report_text >$BUGREPORT_FILE
> +
> +git_editor $BUGREPORT_FILE
> +
> +eval_gettextln "Your new bug report is in \$BUGREPORT_FILE."
> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> new file mode 100755
> index 0000000000..6eb2ee4f66
> --- /dev/null
> +++ b/t/t0091-bugreport.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash
> +
> +test_description='git bugreport'
> +
> +. ./test-lib.sh
> +
> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then
> +			read -r nextline
> +			if [-z $nextline]; then
> +				return 1;
> +			fi
> +		fi
> +	done
> +}
> +
> +test_expect_success 'creates a report with content in the right places' '
> +	git bugreport &&
> +	check_all_headers_populated <git-bugreport-* &&
> +	rm git-bugreport-*
> +'
> +
> +test_expect_success '--output puts the report in the provided dir' '
> +	mkdir foo/ &&
> +	git bugreport -o foo/ &&
> +	test -f foo/git-bugreport-* &&
> +	rm -fr foo/
> +'
> +
> +test_expect_success 'incorrect arguments abort with usage' '
> +	test_must_fail git bugreport --false 2>output &&
> +	grep usage output &&
> +	test ! -f git-bugreport-*
> +'
> +
> +test_done
> 

I think this is a great start, and I'll take some time later to try it out.

Thanks,
-Stolee

Junio C Hamano Aug. 15, 2019, 2:36 p.m. UTC | #2

Derrick Stolee <stolee@gmail.com> writes:

> Config options to consider stripping out:
>
> 	*url*
> 	*pass* (anything "password" but also "sendmail.smtppass")

Blacklisting?  I wonder if users feel safer if these are limited to
known-benign ones.

>> +	echo "[Configured Hooks]"
>> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
>> +	echo
>
> Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> is a binary file?

This makes me feel very nervous.  $GIT_DIR/hooks/ are private and
people can hardcode credentials in them; $GIT_DIR/hooks/pre-foo may
be written toread from $GIT_DIR/hooks/mypassword with the knowledge
that there won't be any "mypassword" hook.

Junio C Hamano Aug. 15, 2019, 6:10 p.m. UTC | #3

Emily Shaffer <emilyshaffer@google.com> writes:

> diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> new file mode 100644
> index 0000000000..c5f45bbee8
> --- /dev/null
> +++ b/Documentation/git-bugreport.txt
> @@ -0,0 +1,48 @@
> +git-bugreport(1)
> +================
> +
> +NAME
> +----
> +git-bugreport - Collect information for user to file a bug report
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git bugreport' [-o | --output <path>]
> +
> +DESCRIPTION
> +-----------
> +Captures information about the user's machine, Git client, and repository state,
> +as well as a form requesting information about the behavior the user observed,
> +into a single text file which the user can then share, for example to the Git
> +mailing list, in order to report an observed bug.
> +
> +The following information is requested from the user:
> +
> + - Reproduction steps
> + - Expected behavior
> + - Actual behavior

+ - How the above two are different

It is often helpful to have users explain how the expected and
actual are different in their own words.

> +NOTE
> +----
> +Bug reports can be sent to git@vger.kernel.org.

I am not sure if this belongs here.

> diff --git a/git-bugreport.sh b/git-bugreport.sh
> new file mode 100755
> index 0000000000..2200703a51
> --- /dev/null
> +++ b/git-bugreport.sh
> @@ -0,0 +1,86 @@
> +#!/bin/sh
> +
> +print_filenames_and_content() {
> +while read -r line; do
> +	echo "$line";
> +	echo "========";
> +	cat "$line";
> +	echo;
> +done
> +}

Style.

 - have SP on both sides of ()
 - one more HT indent for the function body
 - "do" on its own line
 - no unnecessary semicolons when LF would do

You probably are better off asking xargs to do this instead of
relying on "read -r".

	find ... | xargs -n 1 sh -c 'echo "$1" && cat "$1"' -

or something like that, perhaps.

> +
> +generate_report_text() {
> +
> +	# Generate a form for the user to fill out with their issue.
> +	gettextln "Thank you for filling out a Git bug report!"
> +	gettextln "Please answer the following questions to help us understand your issue."
> +	echo
> +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> +	echo
> +	gettextln "What did you expect to happen? (Expected behavior)"
> +	echo
> +	gettextln "What happened instead? (Actual behavior)"
> +	echo
> +	gettextln "Anything else you want to add:"
> +	echo
> +	gettextln "Please review the rest of the bug report below."
> +	gettextln "You can delete any lines you don't wish to send."
> +	echo

Would we on the receiving end be able to tell these section headers
in translated to 47 different languages?  I am sure that i18n is
used here to encourage non-C-locale users to file bugs in their own
languages, but are we prepared to react to them?

> +	echo "[System Information]"
> +	git version --build-options
> +	uname -a
> +	curl-config --version
> +	ldd --version

"curl-config: command not found" may be clear enough, but would
there be a case where errors from two or more consecutive commands
in the above sequence would make the output confusing to the person
sitting on the receiving end?  Would it help, as a convention, to
always ahve "echo [what am I reporting]" before each of these
commands?

> +# Create bugreport file
> +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"

How portable is -Iminutes?

> diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> new file mode 100755
> index 0000000000..6eb2ee4f66
> --- /dev/null
> +++ b/t/t0091-bugreport.sh
> @@ -0,0 +1,41 @@
> +#!/bin/bash

Make sure /bin/sh suffices to run the test script.

> +test_description='git bugreport'
> +
> +. ./test-lib.sh
> +
> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then

Documentation/CodingGuidelines

I am not sure if the traits this test script checks about the
contents of the output is all that interesting.  Whenever we add new
sections to the main command because we want other kinds of
information collected, we'd update this test script because
otherwise the test would fail, but would it result in quality
bugreport tool, or is it just additional busywork?

If we decide later that a header and its body needs to be separated
with a blank line for better readablity, the check done here would
also need to be updated, but again, that does not feel like anything
more than just busywork to me.

The tests for "-o" and unknown options do make sense, though.

Thanks.

> +# Headers "[System Info]" will be followed by a non-empty line if we put some
> +# information there; we can make sure all our headers were followed by some
> +# information to check if the command was successful.
> +HEADER_PATTERN="^\[.*\]$"
> +check_all_headers_populated() {
> +	while read -r line; do
> +		if [$(grep $HEADER_PATTERN $line)]; then
> +			read -r nextline
> +			if [-z $nextline]; then
> +				return 1;
> +			fi
> +		fi
> +	done
> +}
> +
> +test_expect_success 'creates a report with content in the right places' '
> +	git bugreport &&
> +	check_all_headers_populated <git-bugreport-* &&
> +	rm git-bugreport-*
> +'
> +
> +test_expect_success '--output puts the report in the provided dir' '
> +	mkdir foo/ &&
> +	git bugreport -o foo/ &&
> +	test -f foo/git-bugreport-* &&
> +	rm -fr foo/
> +'
> +
> +test_expect_success 'incorrect arguments abort with usage' '
> +	test_must_fail git bugreport --false 2>output &&
> +	grep usage output &&
> +	test ! -f git-bugreport-*
> +'
> +
> +test_done

Johannes Schindelin Aug. 15, 2019, 8:07 p.m. UTC | #4

Hi,

On Thu, 15 Aug 2019, Derrick Stolee wrote:

> On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > new file mode 100755
> > index 0000000000..2200703a51
> > --- /dev/null
> > +++ b/git-bugreport.sh
>
> At first I was alarmed by "What? another shell script?" but this
> command should prioritize flexibility and extensibility over speed.
> Running multiple processes shouldn't be too taxing for what we are
> trying to do here.

Git for Windows sometimes receives bug reports about Bash not being able
to start (usually it is a DLL base address problem, related to the way
Cygwin and MSYS2 emulate `fork()`).

In such a case, `git bugreport` would only be able to offer a reason for
yet another bug report instead of adding useful metadata.

Something to keep in mind when deciding how to implement this command.

Ciao,
Dscho

Emily Shaffer Aug. 15, 2019, 8:13 p.m. UTC | #5

On Thu, Aug 15, 2019 at 10:15:24AM -0400, Derrick Stolee wrote:

> >  - Do we want to advertise the Git mailing list for bug reports?
> 
> That is possible. Isn't there another mailing list for git users?

I know there's an IRC channel for Git users, I dunno about mailing list.
I'm worried that places I name as points of contact will grow stale,
although I suppose #git on freenode isn't really going anywhere, for
example. (But as a counterexample, I hope that nobody sends something
like this to #git-devel, which seems to have a lower population than
this mailing list.)

> 
> I could see a patch added on top of this for git-for-windows/git that
> changes the instructions to create issues on GitHub.

Indeed; I imagine we'd probably like to patch it to ask for bugs in our
bug tracker.

> 
> >  - Which config options should we filter to avoid accidentally receiving
> >    credentials?
> 
> The remote URLs are pretty sensitive. Not only do users sometimes put passwords
> or PATs into their URLs, the literal name of the repo could be a secret.

Now here's where I start to wonder. We often debug internally by asking
for the remote URL and replicating the issue there, which is why I
mentioned it explicitly in the commit message. But I hadn't considered
folks including the password in the URL.

Well, I suppose anybody can keep a local patch to change the config
filter pattern. I'll try to make it easy to spot and modify.

[snip]

> At first I was alarmed by "What? another shell script?" but this command should
> prioritize flexibility and extensibility over speed. Running multiple processes
> shouldn't be too taxing for what we are trying to do here.

If shell scripts are entirely deprecated I can convert it, but doing it
in C seemed like overkill when I really just wanted "what are all the
commands we would ask the user to run and tell us the output?". I
figured also that it would be a little more immune to bitrot to output
the contents of porcelain commands here.

> > +	echo "[Git Config]"
> > +	# TODO: Pass this through grep -v to avoid users sending us their credentials.
> > +	git config --show-origin --list
> > +	echo
> 
> Config options to consider stripping out:
> 
> 	*url*
> 	*pass* (anything "password" but also "sendmail.smtppass")

Done, thanks.

> 
> > +	echo "[Configured Hooks]"
> > +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> > +	echo
> 
> Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> is a binary file?

Yeah, I'm sure it will. I'll add a check to
print_filenames_and_content() so it can tell us if there is a hook
installed there, even if we can't see the content.

> 
> > +
> > +	echo "[Git Logs]"
> > +	find "$GIT_DIR/logs" -type f | print_filenames_and_content
> > +	echo
> 
> As mentioned before, I've sometimes found it helpful to know the data shape for the object
> store. Having a few extra steps such as the following could be nice:
> 
> 	echo "[Loose Objects]"
> 	for objdir in $(find "$GIT_DIR/objects/??" -type d)
> 	do
> 		echo "$objdir: $(ls $objdir | wc -l)"
`echo "$objdir: $(ls $objdir | wc -l) objects`
I'll add context so we don't need to have the bugreport script memorized
in order to read a bugreport :)
> 	done
> 	echo
> 
> 	echo "[Pack Data]"
> 	ls -l "$GIT_DIR/objects/pack"
> 	echo
> 
> 	echo "[Object Info Data]"
> 	ls -lR "$GIT_DIR/objects/info"
> 	echo
> 
> 	echo "[Alternates File]"
> 	echo "========"
> 	cat "$GIT_DIR/objects/info/alternates"
> 	echo
> 
> That last one will collect information on the commit-graph file, even if it is an
> incremental file.

Thanks Stolee, these are awesome and exactly the kind of feedback I was
hoping for.

> 
> I think this is a great start, and I'll take some time later to try it out.
> 
> Thanks,
> -Stolee

Awesome. I'm excited to hear how it goes.

 - Emily

Emily Shaffer Aug. 15, 2019, 9:52 p.m. UTC | #6

On Thu, Aug 15, 2019 at 11:10:12AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > diff --git a/Documentation/git-bugreport.txt b/Documentation/git-bugreport.txt
> > new file mode 100644
> > index 0000000000..c5f45bbee8
> > --- /dev/null
> > +++ b/Documentation/git-bugreport.txt
> > @@ -0,0 +1,48 @@
> > +git-bugreport(1)
> > +================
> > +
> > +NAME
> > +----
> > +git-bugreport - Collect information for user to file a bug report
> > +
> > +SYNOPSIS
> > +--------
> > +[verse]
> > +'git bugreport' [-o | --output <path>]
> > +
> > +DESCRIPTION
> > +-----------
> > +Captures information about the user's machine, Git client, and repository state,
> > +as well as a form requesting information about the behavior the user observed,
> > +into a single text file which the user can then share, for example to the Git
> > +mailing list, in order to report an observed bug.
> > +
> > +The following information is requested from the user:
> > +
> > + - Reproduction steps
> > + - Expected behavior
> > + - Actual behavior
> 
> + - How the above two are different
> 
> It is often helpful to have users explain how the expected and
> actual are different in their own words.

Good point; added.

> 
> > +NOTE
> > +----
> > +Bug reports can be sent to git@vger.kernel.org.
> 
> I am not sure if this belongs here.

Sure, I wasn't certain either. Would you rather I remove the "what to do
with this bugreport" NOTE section entirely? I worry that folks will
think this generated text file is being magically reported behind the
scenes, or that they won't know what to do with the file now that it's
created.

I suppose I did mention in the DESCRIPTION section that the user can
share the generated text file to report a bug. Maybe I should remove
"for example to the Git ML" from there as well.

> 
> > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > new file mode 100755
> > index 0000000000..2200703a51
> > --- /dev/null
> > +++ b/git-bugreport.sh
> > @@ -0,0 +1,86 @@
> > +#!/bin/sh
> > +
> > +print_filenames_and_content() {
> > +while read -r line; do
> > +	echo "$line";
> > +	echo "========";
> > +	cat "$line";
> > +	echo;
> > +done
> > +}
> 
> Style.
> 
>  - have SP on both sides of ()
>  - one more HT indent for the function body
>  - "do" on its own line
>  - no unnecessary semicolons when LF would do
> 
> You probably are better off asking xargs to do this instead of
> relying on "read -r".
> 
> 	find ... | xargs -n 1 sh -c 'echo "$1" && cat "$1"' -
> 
> or something like that, perhaps.

Hm. In responding to Stolee's comments, I ended up replacing "cat $line"
with another function, and it seems xargs isn't happy directing to
functions which haven't been exported, because it spins a new process
for each argument it's passed. I had originally intended to do this with
xargs but then didn't want to pack many lines into one xargs call.

The way the code is in PS1 I think it's a reasonable replacement, but
the code in PS2 is less so - there's a check for whether a file given is
binary or whether it exists at all (the former for hooks, which I think
you were worried about, and the latter for the alternates file, which I
found I do not have.)

Why is "read -r" unreliable? Is it better to export a function and use
xargs, compared to using "read -r" in a loop like this? I was worried
that exporting a function is similar to namespace pollution in C++, but
maybe I'm mistaken.

> 
> > +
> > +generate_report_text() {
> > +
> > +	# Generate a form for the user to fill out with their issue.
> > +	gettextln "Thank you for filling out a Git bug report!"
> > +	gettextln "Please answer the following questions to help us understand your issue."
> > +	echo
> > +	gettextln "What did you do before the bug happened? (Steps to reproduce your issue)"
> > +	echo
> > +	gettextln "What did you expect to happen? (Expected behavior)"
> > +	echo
> > +	gettextln "What happened instead? (Actual behavior)"
> > +	echo
> > +	gettextln "Anything else you want to add:"
> > +	echo
> > +	gettextln "Please review the rest of the bug report below."
> > +	gettextln "You can delete any lines you don't wish to send."
> > +	echo
> 
> Would we on the receiving end be able to tell these section headers
> in translated to 47 different languages?  I am sure that i18n is
> used here to encourage non-C-locale users to file bugs in their own
> languages, but are we prepared to react to them?

I hope that we could accept reports in many languages with online
translation as a guide, but practically speaking,
machine-learning-powered online translation may not be as good for
technical topics as it is for conversational.

At the same time, it seems unnecessarily restrictive to ask for bugs to
come in English. My heart thinks it's better to encourage other
languages too.

> 
> > +	echo "[System Information]"
> > +	git version --build-options
> > +	uname -a
> > +	curl-config --version
> > +	ldd --version
> 
> "curl-config: command not found" may be clear enough, but would
> there be a case where errors from two or more consecutive commands
> in the above sequence would make the output confusing to the person
> sitting on the receiving end?  Would it help, as a convention, to
> always ahve "echo [what am I reporting]" before each of these
> commands?

Yeah, I think that's a good point.

In responding to Stolee's review, I mentioned adding additional context
to some command's output so that "those reviewing the report don't need
to be intimately familiar with what the bugreport script runs". But it
might actually be most useful to echo the command that's being run
everywhere.

I imagine it's fine to include the [] header as well, to make it easier
for the user to understand what was generated.

> 
> > +# Create bugreport file
> > +BUGREPORT_FILE="$basedir/git-bugreport-$(whoami)-$(hostname)-$(date -Iminutes)"
> 
> How portable is -Iminutes?

Ah, it seems while it works in MingW and bash, it doesn't work in zsh.
I'll figure something else out. Thanks.

> 
> > diff --git a/t/t0091-bugreport.sh b/t/t0091-bugreport.sh
> > new file mode 100755
> > index 0000000000..6eb2ee4f66
> > --- /dev/null
> > +++ b/t/t0091-bugreport.sh
> > @@ -0,0 +1,41 @@
> > +#!/bin/bash
> 
> Make sure /bin/sh suffices to run the test script.

Will do. Thanks.

> 
> > +test_description='git bugreport'
> > +
> > +. ./test-lib.sh
> > +
> > +# Headers "[System Info]" will be followed by a non-empty line if we put some
> > +# information there; we can make sure all our headers were followed by some
> > +# information to check if the command was successful.
> > +HEADER_PATTERN="^\[.*\]$"
> > +check_all_headers_populated() {
> > +	while read -r line; do
> > +		if [$(grep $HEADER_PATTERN $line)]; then
> 
> Documentation/CodingGuidelines
> 
> I am not sure if the traits this test script checks about the
> contents of the output is all that interesting.  Whenever we add new
> sections to the main command because we want other kinds of
> information collected, we'd update this test script because
> otherwise the test would fail, but would it result in quality
> bugreport tool, or is it just additional busywork?

I'm a little confused about this - I checked for a general [...] header
standing alone on a line specifically so that we wouldn't need to update
the test script when adding a new section. What do you see that will
grow stale?

> 
> If we decide later that a header and its body needs to be separated
> with a blank line for better readablity, the check done here would
> also need to be updated, but again, that does not feel like anything
> more than just busywork to me.

Sure, I agree with that.

So, what's your suggestion? Not to check the output at all? (This may
actually be fine; it occurred to me while reading your review that if a
user is filing a bug report about something, one of the diagnostic
commands in bugreport might be what's broken for them. So perhaps it
should be tolerant to missing information...)

> 
> The tests for "-o" and unknown options do make sense, though.
> 
> Thanks.
Thanks for reviewing.

 - Emily

Emily Shaffer Aug. 15, 2019, 10:24 p.m. UTC | #7

On Thu, Aug 15, 2019 at 10:07:57PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 15 Aug 2019, Derrick Stolee wrote:
> 
> > On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > > new file mode 100755
> > > index 0000000000..2200703a51
> > > --- /dev/null
> > > +++ b/git-bugreport.sh
> >
> > At first I was alarmed by "What? another shell script?" but this
> > command should prioritize flexibility and extensibility over speed.
> > Running multiple processes shouldn't be too taxing for what we are
> > trying to do here.
> 
> Git for Windows sometimes receives bug reports about Bash not being able
> to start (usually it is a DLL base address problem, related to the way
> Cygwin and MSYS2 emulate `fork()`).

Hmm. In a case like this, then, how is someone using GfW at all?
Embarrassingly, I don't actually have a way to try it out for myself.
It seems there's no GUI, and users are using it through the command line
in mingw, so when you say "bash doesn't start" do you mean "users can't
use the mingw prompt at all"?

> 
> In such a case, `git bugreport` would only be able to offer a reason for
> yet another bug report instead of adding useful metadata.
> 
> Something to keep in mind when deciding how to implement this command.
> 
> Ciao,
> Dscho

Yeah, that's an interesting point, thanks for bringing it up. So, in the
case when bash won't start at all, what does that failure look like? How
much of Git can users still use? For example, would they be able to get
far enough to run a binary git-bugreport?

 - Emily

Junio C Hamano Aug. 15, 2019, 10:29 p.m. UTC | #8

Emily Shaffer <emilyshaffer@google.com> writes:

>> > +NOTE
>> > +----
>> > +Bug reports can be sent to git@vger.kernel.org.
>> 
>> I am not sure if this belongs here.
>
> Sure, I wasn't certain either. Would you rather I remove the "what to do
> with this bugreport" NOTE section entirely?

Not really.  You are invoking an editor to let the user edit the
pre-populated report, and I would imagine that a comment in that
file would be the best place to give instructions, not a manpage
for "git bugreport" command.

> So, what's your suggestion? Not to check the output at all? (This may
> actually be fine; it occurred to me while reading your review that if a
> user is filing a bug report about something, one of the diagnostic
> commands in bugreport might be what's broken for them. So perhaps it
> should be tolerant to missing information...)

Right.

Emily Shaffer Aug. 15, 2019, 10:52 p.m. UTC | #9

On Thu, Aug 15, 2019 at 07:36:57AM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
> > Config options to consider stripping out:
> >
> > 	*url*
> > 	*pass* (anything "password" but also "sendmail.smtppass")
> 
> Blacklisting?  I wonder if users feel safer if these are limited to
> known-benign ones.

I think a whitelist of config options to print would grow stale
immediately, and the options we're missing would be very likely to be
configs to turn on new experimental features - which is probably what we
most want the bugreport for.

> 
> >> +	echo "[Configured Hooks]"
> >> +	find "$GIT_DIR/hooks/" -type f | grep -v "\.sample$" | print_filenames_and_content
> >> +	echo
> >
> > Remove the sample hooks, but focus on the others. Will this look like garbage if a hook
> > is a binary file?
> 
> This makes me feel very nervous.  $GIT_DIR/hooks/ are private and
> people can hardcode credentials in them; $GIT_DIR/hooks/pre-foo may
> be written toread from $GIT_DIR/hooks/mypassword with the knowledge
> that there won't be any "mypassword" hook.

Hmm. I think the list of valid hooks isn't one that changes often, but
it's also not enumerated in some machine-parseable way - it exists in
Documentation/githooks.txt but that's all. I'd still be a little worried
about bitrot... I think it's probably better to list the filenames in
$GIT_DIR/hooks but not print their contents. I'll modify it.

 - Emily

Emily Shaffer Aug. 15, 2019, 10:54 p.m. UTC | #10

On Thu, Aug 15, 2019 at 03:29:24PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> >> > +NOTE
> >> > +----
> >> > +Bug reports can be sent to git@vger.kernel.org.
> >> 
> >> I am not sure if this belongs here.
> >
> > Sure, I wasn't certain either. Would you rather I remove the "what to do
> > with this bugreport" NOTE section entirely?
> 
> Not really.  You are invoking an editor to let the user edit the
> pre-populated report, and I would imagine that a comment in that
> file would be the best place to give instructions, not a manpage
> for "git bugreport" command.

Oh, I see! In that case, do you still want the Git mailing list address
shown in the report text?

> 
> > So, what's your suggestion? Not to check the output at all? (This may
> > actually be fine; it occurred to me while reading your review that if a
> > user is filing a bug report about something, one of the diagnostic
> > commands in bugreport might be what's broken for them. So perhaps it
> > should be tolerant to missing information...)
> 
> Right.

Ok, will do.

Junio C Hamano Aug. 15, 2019, 11:40 p.m. UTC | #11

Emily Shaffer <emilyshaffer@google.com> writes:

> I think a whitelist of config options to print would grow stale
> immediately, and the options we're missing would be very likely to be
> configs to turn on new experimental features - which is probably what we
> most want the bugreport for.

The implementation of your "git bugreport" command comes from the
same version of Git as such an experimental feature you want
feedback to is shipped with, so I am not sure where your concern
about staleness comes from.

If you really care about getting quality reports, you need to earn
users' trust (one way of doing so would be to maintain a good "list
of benign configuration knobs whose values help diagnosing issues"),
and you need to make sure we obtain relevant pieces of information.
Both of these things are something you need to actively work on.

We have trained ourselves to the point that we consider it a bug if
you add a new command git-bugreport without adding a corresponding
pattern in the .gitignore file.  And thanks to that discipline, we
have been fairly good at keeping .gitignore up to date.  I do not
see a reason why we cannot do something similar to the registry of
configuration variables that we care about, which may be shipped as
part of "git bugreport" command.

Emily Shaffer Aug. 16, 2019, 1:25 a.m. UTC | #12

On Thu, Aug 15, 2019 at 04:40:50PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > I think a whitelist of config options to print would grow stale
> > immediately, and the options we're missing would be very likely to be
> > configs to turn on new experimental features - which is probably what we
> > most want the bugreport for.
> 
> The implementation of your "git bugreport" command comes from the
> same version of Git as such an experimental feature you want
> feedback to is shipped with, so I am not sure where your concern
> about staleness comes from.
> 
> If you really care about getting quality reports, you need to earn
> users' trust (one way of doing so would be to maintain a good "list
> of benign configuration knobs whose values help diagnosing issues"),
> and you need to make sure we obtain relevant pieces of information.
> Both of these things are something you need to actively work on.
> 
> We have trained ourselves to the point that we consider it a bug if
> you add a new command git-bugreport without adding a corresponding
> pattern in the .gitignore file.  And thanks to that discipline, we
> have been fairly good at keeping .gitignore up to date.  I do not
> see a reason why we cannot do something similar to the registry of
> configuration variables that we care about, which may be shipped as
> part of "git bugreport" command.

I think comparing this habit to the .gitignore isn't quite fair -
.gitignore tells me I forgot to add my new command binary to it, when I
run `git status` to see what I need to add to my commit with new
command.

But, I agree that it's not going to be possible to create a completely
leakproof blacklisting heuristic here. So I'll use a whitelist for the
next patchset.

 - Emily

Junio C Hamano Aug. 16, 2019, 4:41 p.m. UTC | #13

Emily Shaffer <emilyshaffer@google.com> writes:

> I think comparing this habit to the .gitignore isn't quite fair -
> .gitignore tells me I forgot to add my new command binary to it, when I
> run `git status` to see what I need to add to my commit with new
> command.

That is why I said that we need to actively work on, if we care
about getting quality reports.

I do not think it is unreasonable to expect the build procedure for
"git bugreport" to involve scanning in Documentation/config/ to pick
up variable names, annotated in such a way that is invisible to
AsciiDoc to allow us tell which ones are sensitive and which ones
are not.  A test in t/ could even check if a documented
configuration variable has such an annotation.  A commit that adds
configuration variables without documentiong them does exist, but
variables without documentation are (1) bugs, and (2) are not worth
serious engineering effort on until they get documented.

I am not saying that you must implement the whitelist exactly like
the above.  I am not even saying that this must be whitelist and not
blacklist---you'd have the same issue maintaining the list anyway.

The above is merely to illustrate the reason why I think that the
kind of "active work" to make sure that the list will not go stale
would be feasible.

Thanks.

Emily Shaffer Aug. 16, 2019, 7:08 p.m. UTC | #14

On Fri, Aug 16, 2019 at 09:41:41AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > I think comparing this habit to the .gitignore isn't quite fair -
> > .gitignore tells me I forgot to add my new command binary to it, when I
> > run `git status` to see what I need to add to my commit with new
> > command.
> 
> That is why I said that we need to actively work on, if we care
> about getting quality reports.
> 
> I do not think it is unreasonable to expect the build procedure for
> "git bugreport" to involve scanning in Documentation/config/ to pick
> up variable names, annotated in such a way that is invisible to
> AsciiDoc to allow us tell which ones are sensitive and which ones
> are not.  A test in t/ could even check if a documented
> configuration variable has such an annotation.  A commit that adds
> configuration variables without documentiong them does exist, but
> variables without documentation are (1) bugs, and (2) are not worth
> serious engineering effort on until they get documented.

Interesting. I think I have an idea for a way to do this, but it ends up
fairly large; I'll send proof of concept as a follow-on patch to this
one. Thanks for the suggestions.

 - Emily

Johannes Schindelin Aug. 16, 2019, 8:19 p.m. UTC | #15

Hi Emily,

On Thu, 15 Aug 2019, Emily Shaffer wrote:

> On Thu, Aug 15, 2019 at 10:07:57PM +0200, Johannes Schindelin wrote:
> >
> > On Thu, 15 Aug 2019, Derrick Stolee wrote:
> >
> > > On 8/14/2019 10:34 PM, Emily Shaffer wrote:
> > > > diff --git a/git-bugreport.sh b/git-bugreport.sh
> > > > new file mode 100755
> > > > index 0000000000..2200703a51
> > > > --- /dev/null
> > > > +++ b/git-bugreport.sh
> > >
> > > At first I was alarmed by "What? another shell script?" but this
> > > command should prioritize flexibility and extensibility over speed.
> > > Running multiple processes shouldn't be too taxing for what we are
> > > trying to do here.
> >
> > Git for Windows sometimes receives bug reports about Bash not being able
> > to start (usually it is a DLL base address problem, related to the way
> > Cygwin and MSYS2 emulate `fork()`).
>
> Hmm. In a case like this, then, how is someone using GfW at all?

On Windows, there are native command-line interpreters available that
are _not_ Bash: PowerShell and CMD.

These days, you can get away with using Git _without_ a working Unix
shell most of the time. There are only preciously few commands left that
are still written as scripts, and most of these seem to be used less
often than one might think:

	- submodule
	- bisect
	- filter-branch
	- instaweb
	- mergetool
	- some uncommon merge strategies
	- rebase -p (already deprecated)
	- several CVS/Subversion/Arch/Quilt integrations
	- send-email (only relevant for mailing list centric projects,
	  again, not very common any longer)
	- request-pull (I would not be surprised if this is specific for
	  the Linux project, as if that project was even close to the
	  main user of Git)

So as long as you don't use submodules, and as long as you are unaware
of the `bisect` command, which would comprise the majority of Git users
in my experience, you can totally use Git for Windows without using Bash
or Perl, i.e. without using the Cygwin/MSYS2 runtime that seems to be
one of the common causes for trouble in Git for Windows.

> Embarrassingly, I don't actually have a way to try it out for myself.
> It seems there's no GUI, and users are using it through the command line
> in mingw, so when you say "bash doesn't start" do you mean "users can't
> use the mingw prompt at all"?

There is no MinGW prompt.

MinGW stands for "Minimal GNU on Windows", but it really refers to a way
to compile native Win32 programs via GCC. Meaning that some of the POSIX
functionality Unix/Linux developers have come to expect is unavailable.
For example, `fork()`. Which means that there is no native port of Bash
to Windows, at least not that _I_ am aware of.

Git for Windows comes with a "Git Bash", which is essentially a Bash
built against the MSYS2 runtime (i.e. it is much more like a Cygwin
executable than like a Win32 executable).

The common way to run Git for Windows is actually via `git.exe`,
independent of what particular command-line interpreter you prefer.

> > In such a case, `git bugreport` would only be able to offer a reason for
> > yet another bug report instead of adding useful metadata.
> >
> > Something to keep in mind when deciding how to implement this command.
>
> Yeah, that's an interesting point, thanks for bringing it up. So, in the
> case when bash won't start at all, what does that failure look like? How
> much of Git can users still use? For example, would they be able to get
> far enough to run a binary git-bugreport?

See above.

If I were you, I would not write `git bugreport` as anything but a
built-in, written in C. Unless there are _really_ good reasons not to
[*1*].  And quite honestly, I don't see any such reasons.

Ciao,
Dscho

Footnote *1*: One good reason to use the MSYS2 Bash would be to be able
to use MSYS2's POSIX emulation layer, e.g. to talk to the SSH agent
(that is also an MSYS2 program, meaning that it communicates via
emulated Unix sockets, which is a facility not available to native Win32
programs).

Another valid reason to use the MSYS2 Bash is to be able to talk to the
(emulated) pseudo TTYs provided by the MSYS2 runtime, e.g. when running
inside a MinTTY window, which is the default for Git Bash in Git for
Windows.

bugreport: add tool to generate debugging info

Commit Message

Comments

Patch