[v4,0/7] scalar: implement the subcommand "diagnose"

Message ID pull.1128.v4.git.1652210824.gitgitgadget@gmail.com (mailing list archive)

Message

Johannes Schindelin via GitGitGadget May 10, 2022, 7:26 p.m. UTC
Over the course of the years, we developed a sub-command that gathers
diagnostic data into a .zip file that can then be attached to bug reports.
This sub-command turned out to be very useful in helping Scalar developers
identify and fix issues.

Changes since v3:

 * We're now using unquote_c_style() instead of rolling our own unquoter.
 * Fixed the added regression test.
 * As pointed out by Scalar's Functional Tests, the
   add_directory_to_archiver() function should not fail when scalar diagnose
   encounters FSMonitor's Unix socket, but only warn instead.
 * Related: add_directory_to_archiver() needs to propagate errors from
   processing subdirectories so that the top-level call returns an error,
   too.
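
To illustrate the last two items, here is a minimal, self-contained sketch
of a directory walk that only warns about entries that are neither file nor
directory, and that propagates errors from subdirectories to the top-level
call. It is not the code from the patch: walk_directory() and count_file()
are hypothetical names, the sketch uses lstat() instead of d_type, and it
calls a callback where the patch pushes --add-file arguments. Because of
lstat(), symbolic links also end up in the "warn and skip" branch here.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback for illustration: counts the files it is shown. */
static void count_file(const char *file, void *data)
{
	(void)file;
	++*(int *)data;
}

/*
 * Call `cb` for every regular file in `path` (and below it, if `recurse`
 * is non-zero). Returns 0 on success, -1 if this directory or any
 * subdirectory could not be processed.
 */
static int walk_directory(const char *path, int recurse,
			  void (*cb)(const char *file, void *data),
			  void *data)
{
	DIR *dir = opendir(path);
	struct dirent *e;
	int res = 0;

	if (!dir) {
		fprintf(stderr, "error: could not open '%s'\n", path);
		return -1;
	}

	while ((e = readdir(dir))) {
		char buf[4096];
		struct stat st;
		int len;

		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;

		len = snprintf(buf, sizeof(buf), "%s/%s", path, e->d_name);
		if (len < 0 || (size_t)len >= sizeof(buf) ||
		    lstat(buf, &st) < 0) {
			res = -1; /* remember the failure, keep going */
			continue;
		}

		if (S_ISREG(st.st_mode))
			cb(buf, data);
		else if (!S_ISDIR(st.st_mode))
			/* e.g. a Unix socket: warn, but do not fail */
			fprintf(stderr, "warning: skipping '%s', which is "
				"neither file nor directory\n", buf);
		else if (recurse &&
			 walk_directory(buf, recurse, cb, data) < 0)
			res = -1; /* propagate errors from below */
	}

	closedir(dir);
	return res;
}
```

A caller would pass an int counter, e.g. walk_directory(dir, 1, count_file,
&n), and treat a negative return value as "some part of the tree could not
be read".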

Changes since v2:

 * Clarified in the commit message what the biggest benefit of
   --add-file-with-content is.
 * The <path> part of the --add-file-with-content argument can now contain
   colons. To do this, the path needs to start and end in double-quote
   characters (which are stripped), and the backslash serves as escape
   character in that case (to allow the path to contain both colons and
   double-quotes).
 * Fixed incorrect grammar.
 * Instead of strcmp(<what-we-don't-want>), we now say
   !strcmp(<what-we-want>).
 * The help text for --add-file-with-content was improved a tiny bit.
 * Adjusted the commit message that still talked about spawning plenty of
   processes and about a throw-away repository for the sake of generating a
   .zip file.
 * Simplified the code that shows the diagnostics and adds them to the .zip
   file.
 * The final message that reports that the archive is complete is now
   printed to stderr instead of stdout.
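
The quoting rule for <path> described above can be sketched as follows.
This is a simplified illustration, not the patch itself (which, as of v4,
relies on unquote_c_style()): split_path_and_content() is a hypothetical
helper, and it treats any backslash-escaped character as a literal instead
of implementing full C-style unquoting.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "<path>:<content>" into a newly allocated path (to be free()d by
 * the caller) and a pointer to the content inside `arg`. The path may be
 * enclosed in double quotes, in which case a backslash escapes the next
 * character, so the path can contain colons and double quotes.
 * Returns 0 on success, -1 on malformed input.
 */
static int split_path_and_content(const char *arg, char **path,
				  const char **content)
{
	const char *p = arg;
	size_t len = 0;
	char *buf = malloc(strlen(arg) + 1);

	if (!buf)
		return -1;

	if (*p == '"') {
		/* Quoted path: copy up to the closing quote. */
		for (p++; *p && *p != '"'; p++) {
			if (*p == '\\' && !*++p)
				break; /* trailing backslash */
			buf[len++] = *p;
		}
		if (*p != '"') {
			free(buf);
			return -1; /* unclosed quote */
		}
		p++;
	} else {
		/* Unquoted path: everything before the first colon. */
		while (*p && *p != ':')
			buf[len++] = *p++;
	}

	if (*p != ':' || !len) {
		free(buf);
		return -1; /* missing colon or empty file name */
	}

	buf[len] = '\0';
	*path = buf;
	*content = p + 1;
	return 0;
}
```

With this sketch, the argument "a:b":hello (quotes included) would yield the
path a:b and the content hello, while the unquoted a:b:hello would yield
the path a and the content b:hello.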

Changes since v1:

 * Instead of creating a throw-away repository, staging the contents of the
   .zip file and then using git write-tree and git archive to write the .zip
   file, the patch series now introduces a new option to git archive and
   uses write_archive() directly (avoiding any separate process).
 * Since the command avoids separate processes, it is now blazing fast on
   Windows, and I dropped the spinner() function because it's no longer
   needed.
 * While reworking the test case, I noticed that scalar [...] <enlistment>
   failed to verify that the specified directory exists, and would happily
   "traverse to its parent directory" on its quest to find a Scalar
   enlistment. That is of course incorrect, and has been fixed as a "while
   at it" sort of preparatory commit.
 * I had forgotten to sign off on all the commits, which has been fixed.
 * Instead of some "home-grown" readdir()-based function, the code now uses
   for_each_file_in_pack_dir() to look through the pack directories.
 * If any alternates are configured, their pack directories are now included
   in the output.
 * The commit message that might be interpreted to promise information about
   large loose files has been corrected to no longer promise that.
 * The test cases have been adjusted to test a little bit more (e.g.
   verifying that specific paths are mentioned in the output, instead of
   merely verifying that the output is non-empty).

Johannes Schindelin (5):
  archive: optionally add "virtual" files
  archive --add-file-with-contents: allow paths containing colons
  scalar: validate the optional enlistment argument
  Implement `scalar diagnose`
  scalar diagnose: include disk space information

Matthew John Cheetham (2):
  scalar: teach `diagnose` to gather packfile info
  scalar: teach `diagnose` to gather loose objects information

 Documentation/git-archive.txt    |  17 ++
 archive.c                        |  61 ++++++-
 contrib/scalar/scalar.c          | 292 ++++++++++++++++++++++++++++++-
 contrib/scalar/scalar.txt        |  12 ++
 contrib/scalar/t/t9099-scalar.sh |  27 +++
 t/t5003-archive-zip.sh           |  20 +++
 6 files changed, 419 insertions(+), 10 deletions(-)


base-commit: ddc35d833dd6f9e8946b09cecd3311b8aa18d295
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1128%2Fdscho%2Fscalar-diagnose-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1128/dscho/scalar-diagnose-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1128

Range-diff vs v3:

 1:  45662cf582a = 1:  45662cf582a archive: optionally add "virtual" files
 2:  ce4b1b680c9 ! 2:  fdba4ed6f4d archive --add-file-with-contents: allow paths containing colons
     @@ Documentation/git-archive.txt: OPTIONS
      -command-line limits. For non-trivial cases, write an untracked file
      -and use `--add-file` instead.
      +The `<path>` argument can start and end with a literal double-quote
     -+character. In this case, the backslash is interpreted as escape
     -+character. The path must be quoted if it contains a colon, to avoid
     -+the colon from being misinterpreted as the separator between the
     -+path and the contents.
     ++character; The contained file name is interpreted as a C-style string,
     ++i.e. the backslash is interpreted as escape character. The path must
     ++be quoted if it contains a colon, to avoid the colon from being
     ++misinterpreted as the separator between the path and the contents, or
     ++if the path begins or ends with a double-quote character.
      ++
      +The file mode is limited to a regular file, and the option may be
      +subject to platform-dependent command-line limits. For non-trivial
     @@ Documentation/git-archive.txt: OPTIONS
       	Look for attributes in .gitattributes files in the working tree
      
       ## archive.c ##
     +@@
     + #include "parse-options.h"
     + #include "unpack-trees.h"
     + #include "dir.h"
     ++#include "quote.h"
     + 
     + static char const * const archive_usage[] = {
     + 	N_("git archive [<options>] <tree-ish> [<path>...]"),
      @@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
       			die(_("Not a regular file: %s"), path);
       		info->content = NULL; /* read the file later */
       	} else {
      -		const char *colon = strchr(arg, ':');
     - 		char *p;
     +-		char *p;
     ++		struct strbuf buf = STRBUF_INIT;
     ++		const char *p = arg;
     ++
     ++		if (*p != '"')
     ++			p = strchr(p, ':');
     ++		else if (unquote_c_style(&buf, p, &p) < 0)
     ++			die(_("unclosed quote: '%s'"), arg);
       
      -		if (!colon)
     --			die(_("missing colon: '%s'"), arg);
     -+		if (*arg != '"') {
     -+			const char *colon = strchr(arg, ':');
     -+
     -+			if (!colon)
     -+				die(_("missing colon: '%s'"), arg);
     -+			p = xstrndup(arg, colon - arg);
     -+			arg = colon + 1;
     -+		} else {
     -+			struct strbuf buf = STRBUF_INIT;
     -+			const char *orig = arg;
     -+
     -+			for (;;) {
     -+				if (!*(++arg))
     -+					die(_("unclosed quote: '%s'"), orig);
     -+				if (*arg == '"')
     -+					break;
     -+				if (*arg == '\\' && *(++arg) == '\0')
     -+					die(_("trailing backslash: '%s"), orig);
     -+				else
     -+					strbuf_addch(&buf, *arg);
     -+			}
     -+
     -+			if (*(++arg) != ':')
     -+				die(_("missing colon: '%s'"), orig);
     -+
     -+			p = strbuf_detach(&buf, NULL);
     -+			arg++;
     -+		}
     ++		if (!p || *p != ':')
     + 			die(_("missing colon: '%s'"), arg);
       
      -		p = xstrndup(arg, colon - arg);
     - 		if (!args->prefix)
     - 			path = p;
     - 		else {
     -@@ archive.c: static int add_file_cb(const struct option *opt, const char *arg, int unset)
     +-		if (!args->prefix)
     +-			path = p;
     +-		else {
     +-			path = prefix_filename(args->prefix, p);
     +-			free(p);
     ++		if (p == arg)
     ++			die(_("empty file name: '%s'"), arg);
     ++
     ++		path = buf.len ?
     ++			strbuf_detach(&buf, NULL) : xstrndup(arg, p - arg);
     ++
     ++		if (args->prefix) {
     ++			char *save = path;
     ++			path = prefix_filename(args->prefix, path);
     ++			free(save);
       		}
       		memset(&info->stat, 0, sizeof(info->stat));
       		info->stat.st_mode = S_IFREG | 0644;
      -		info->content = xstrdup(colon + 1);
     -+		info->content = xstrdup(arg);
     ++		info->content = xstrdup(p + 1);
       		info->stat.st_size = strlen(info->content);
       	}
       	item = string_list_append_nodup(&args->extra_files, path);
 3:  5a3eeb55409 = 3:  da9f52a8240 scalar: validate the optional enlistment argument
 4:  dfe821d10fe ! 4:  87bdc22322b Implement `scalar diagnose`
     @@ contrib/scalar/scalar.c: static int unregister_dir(void)
      +		if (e->d_type == DT_REG)
      +			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
      +		else if (e->d_type != DT_DIR)
     ++			warning(_("skipping '%s', which is neither file nor "
     ++				  "directory"), buf.buf);
     ++		else if (recurse &&
     ++			 add_directory_to_archiver(archiver_args,
     ++						   buf.buf, recurse) < 0)
      +			res = -1;
     -+		else if (recurse)
     -+		     add_directory_to_archiver(archiver_args, buf.buf, recurse);
      +	}
      +
      +	closedir(dir);
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
      +SQ="'"
      +test_expect_success UNZIP 'scalar diagnose' '
      +	scalar clone "file://$(pwd)" cloned --single-branch &&
     -+	scalar diagnose cloned >out &&
     -+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     ++	scalar diagnose cloned >out 2>err &&
     ++	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      +	zip_path=$(cat zip_path) &&
      +	test -n "$zip_path" &&
      +	unzip -v "$zip_path" &&
 5:  bb162abd383 ! 5:  3f63b197d42 scalar diagnose: include disk space information
     @@ contrib/scalar/t/t9099-scalar.sh
      @@ contrib/scalar/t/t9099-scalar.sh: SQ="'"
       test_expect_success UNZIP 'scalar diagnose' '
       	scalar clone "file://$(pwd)" cloned --single-branch &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
      +	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
       	zip_path=$(cat zip_path) &&
       	test -n "$zip_path" &&
 6:  32aaad7cce1 ! 6:  fc1319338fc scalar: teach `diagnose` to gather packfile info
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success '`scalar [...] <dir>` erro
       	scalar clone "file://$(pwd)" cloned --single-branch &&
      +	git repack &&
      +	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	folder=${zip_path%.zip} &&
       	test_path_is_missing "$folder" &&
 7:  322932f0bb8 ! 7:  e8f5b42f7b7 scalar: teach `diagnose` to gather loose objects information
     @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	git repack &&
       	echo "$(pwd)/.git/objects/" >>cloned/src/.git/objects/info/alternates &&
      +	test_commit -C cloned/src loose &&
     - 	scalar diagnose cloned >out &&
     + 	scalar diagnose cloned >out 2>err &&
       	grep "Available space" out &&
     - 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <out >zip_path &&
     + 	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
      @@ contrib/scalar/t/t9099-scalar.sh: test_expect_success UNZIP 'scalar diagnose' '
       	unzip -p "$zip_path" diagnostics.log >out &&
       	test_file_not_empty out &&

Comments

Ævar Arnfjörð Bjarmason May 17, 2022, 3:03 p.m. UTC | #1
On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:

> Over the course of the years, we developed a sub-command that gathers
> diagnostic data into a .zip file that can then be attached to bug reports.
> This sub-command turned out to be very useful in helping Scalar developers
> identify and fix issues.

I don't mind this as some intermediate step, but re the context of the
plan for scalar "eventually going away" (discussed in previous threads)
I wonder why (especially re the earlier thread upthread at [1]) this
isn't being added to "git bugreport".

Is the plan to integrate this into "git bugreport" eventually?

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.bet/

Randall S. Becker May 17, 2022, 3:28 p.m. UTC | #2
On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
>On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
>
>> Over the course of the years, we developed a sub-command that gathers
>> diagnostic data into a .zip file that can then be attached to bug reports.
>> This sub-command turned out to be very useful in helping Scalar
>> developers identify and fix issues.
>
>I don't mind this as some intermediate step, but re the context of the plan for
>scalar "eventually going away" (discussed in previous threads) I wonder why
>(especially re the earlier thread upthread at [1]) this isn't being added to "git
>bugreport".
>
>Is the plan to integrate this into "git bugreport" eventually?
>
>1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202062213030.347@tvgsbejvaqbjf.bet/

Could this also not be useful in fsck, as --diagnose? That's the go-to command when there are issues for many users.
--Randall

Johannes Schindelin May 19, 2022, 6:17 p.m. UTC | #3
Hi Randall and Ævar,

On Tue, 17 May 2022, rsbecker@nexbridge.com wrote:

> On May 17, 2022 11:03 AM, Ævar Arnfjörð Bjarmason wrote:
> >On Tue, May 10 2022, Johannes Schindelin via GitGitGadget wrote:
> >
> >> Over the course of the years, we developed a sub-command that gathers
> >> diagnostic data into a .zip file that can then be attached to bug
> >> reports. This sub-command turned out to be very useful in helping
> >> Scalar developers identify and fix issues.
> >
> >I don't mind this as some intermediate step, but re the context of the
> >plan for scalar "eventually going away" (discussed in previous threads)
> >I wonder why (especially re the earlier thread upthread at [1]) this
> >isn't being added to "git bugreport".
> >
> >Is the plan to integrate this into "git bugreport" eventually?

Potentially a variation of the `scalar diagnose` code could be useful in
`git bugreport`, opt-in via a new option.

But that's not the purpose of this patch series.

> Could this also not be useful in fsck, as --diagnose? That's the go-to
> command when there are issues for many users.

I can see where you're coming from, but `fsck`'s mission is to verify the
integrity of the local Git database. That is very different from the
mission of `scalar diagnose`, which is to help diagnose issues (whether
they are truly bugs or usage patterns causing unfortunate performance).

Ciao,
Dscho