diff mbox series

[v6+,4/7] scalar: implement `scalar diagnose`

Message ID 20220528231118.3504387-5-gitster@pobox.com (mailing list archive)
State Accepted
Commit aa5c79a33156c2f086ca3a149c11f1434d10f5ce
Headers show
Series js/scalar-diagnose rebased | expand

Commit Message

Junio C Hamano May 28, 2022, 11:11 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/scalar/scalar.c          | 144 +++++++++++++++++++++++++++++++
 contrib/scalar/scalar.txt        |  12 +++
 contrib/scalar/t/t9099-scalar.sh |  14 +++
 3 files changed, 170 insertions(+)

Comments

Ævar Arnfjörð Bjarmason June 10, 2022, 2:08 a.m. UTC | #1
On Sat, May 28 2022, Junio C Hamano wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> [...]
> The `diagnose` command is the culmination of this hard-won knowledge: it
> gathers the installed hooks, the config, a couple statistics describing
> the data shape, among other pieces of information, and then wraps
> everything up in a tidy, neat `.zip` archive.
> [...]
> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
> +		goto diagnose_cleanup;

Noticed on top of some local changes I have to not add a .git/hooks (the
--no-template topic), but this fails to diagnose any repo that doesn't
have these paths, which are optional, either because a user could have manually removed them, or used --template=.

although I don't think there's a way to create that sort of repo with
the scalar tooling, it doesn't seem to forward that option, but I didn't
look deeply.

So, no big deal, but it would be nice to have that fixed. Is there a
reason for why this mere addition of various stuff for diagnosis goes
straight to an opendir() and error on failure, as opposed to doing an
lstat() etc. first?
Junio C Hamano June 10, 2022, 4:44 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sat, May 28 2022, Junio C Hamano wrote:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> [...]
>> The `diagnose` command is the culmination of this hard-won knowledge: it
>> gathers the installed hooks, the config, a couple statistics describing
>> the data shape, among other pieces of information, and then wraps
>> everything up in a tidy, neat `.zip` archive.
>> [...]
>> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
>> +		goto diagnose_cleanup;
>
> Noticed on top of some local changes I have to not add a
> .git/hooks (the --no-template topic), but this fails to diagnose
> any repo that doesn't have these paths, which are optional, either
> because a user could have manually removed them, or used
> --template=.

Quite honestly, if it lacks any directory that we traditionally
created upon "git init", with our standard templates, we can and
should call such a repository "broken" and move on.
Ævar Arnfjörð Bjarmason June 10, 2022, 5:35 p.m. UTC | #3
On Fri, Jun 10 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Sat, May 28 2022, Junio C Hamano wrote:
>>
>>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>> [...]
>>> The `diagnose` command is the culmination of this hard-won knowledge: it
>>> gathers the installed hooks, the config, a couple statistics describing
>>> the data shape, among other pieces of information, and then wraps
>>> everything up in a tidy, neat `.zip` archive.
>>> [...]
>>> +	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
>>> +	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
>>> +		goto diagnose_cleanup;
>>
>> Noticed on top of some local changes I have to not add a
>> .git/hooks (the --no-template topic), but this fails to diagnose
>> any repo that doesn't have these paths, which are optional, either
>> because a user could have manually removed them, or used
>> --template=.
>
> Quite honestly, if it lacks any directory that we traditionally
> created upon "git init", with our standard templates, we can and
> should call such a repository "broken" and move on.

In our own test suite we do e.g. (and did more of that until some recent
changes of mine):

    git mv .git/hooks .git/hooks.disabled

We've never documented in "git init" or the like that these very
optional directories in .git/ were some sort of hard requirenment, and
e.g. core.hooksPath and gitrepository-layout(5) explicitly seem to
suggest otherwise.

In any case, there's the golden rule about being strict in what you emit
and loose in what you accept, which we've taken with repository
compatibility. Having a tool that's designed to aid bugreporting be
picky about what sort of repository it supports seems to go against the
point of such a tool.

Particularly in this case, where it seems easy to just guard it with a
stat() check, or not error out if we fail to add this to the *.zip file,
no?
diff mbox series

Patch

diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
index 6d58c7a698..a1e05a2146 100644
--- a/contrib/scalar/scalar.c
+++ b/contrib/scalar/scalar.c
@@ -11,6 +11,7 @@ 
 #include "dir.h"
 #include "packfile.h"
 #include "help.h"
+#include "archive.h"
 
 /*
  * Remove the deepest subdirectory in the provided path string. Path must not
@@ -260,6 +261,47 @@  static int unregister_dir(void)
 	return res;
 }
 
+static int add_directory_to_archiver(struct strvec *archiver_args,
+					  const char *path, int recurse)
+{
+	int at_root = !*path;
+	DIR *dir = opendir(at_root ? "." : path);
+	struct dirent *e;
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	int res = 0;
+
+	if (!dir)
+		return error_errno(_("could not open directory '%s'"), path);
+
+	if (!at_root)
+		strbuf_addf(&buf, "%s/", path);
+	len = buf.len;
+	strvec_pushf(archiver_args, "--prefix=%s", buf.buf);
+
+	while (!res && (e = readdir(dir))) {
+		if (!strcmp(".", e->d_name) || !strcmp("..", e->d_name))
+			continue;
+
+		strbuf_setlen(&buf, len);
+		strbuf_addstr(&buf, e->d_name);
+
+		if (e->d_type == DT_REG)
+			strvec_pushf(archiver_args, "--add-file=%s", buf.buf);
+		else if (e->d_type != DT_DIR)
+			warning(_("skipping '%s', which is neither file nor "
+				  "directory"), buf.buf);
+		else if (recurse &&
+			 add_directory_to_archiver(archiver_args,
+						   buf.buf, recurse) < 0)
+			res = -1;
+	}
+
+	closedir(dir);
+	strbuf_release(&buf);
+	return res;
+}
+
 /* printf-style interface, expects `<key>=<value>` argument */
 static int set_config(const char *fmt, ...)
 {
@@ -500,6 +542,107 @@  static int cmd_clone(int argc, const char **argv)
 	return res;
 }
 
+static int cmd_diagnose(int argc, const char **argv)
+{
+	struct option options[] = {
+		OPT_END(),
+	};
+	const char * const usage[] = {
+		N_("scalar diagnose [<enlistment>]"),
+		NULL
+	};
+	struct strbuf zip_path = STRBUF_INIT;
+	struct strvec archiver_args = STRVEC_INIT;
+	char **argv_copy = NULL;
+	int stdout_fd = -1, archiver_fd = -1;
+	time_t now = time(NULL);
+	struct tm tm;
+	struct strbuf path = STRBUF_INIT, buf = STRBUF_INIT;
+	int res = 0;
+
+	argc = parse_options(argc, argv, NULL, options,
+			     usage, 0);
+
+	setup_enlistment_directory(argc, argv, usage, options, &zip_path);
+
+	strbuf_addstr(&zip_path, "/.scalarDiagnostics/scalar_");
+	strbuf_addftime(&zip_path,
+			"%Y%m%d_%H%M%S", localtime_r(&now, &tm), 0, 0);
+	strbuf_addstr(&zip_path, ".zip");
+	switch (safe_create_leading_directories(zip_path.buf)) {
+	case SCLD_EXISTS:
+	case SCLD_OK:
+		break;
+	default:
+		error_errno(_("could not create directory for '%s'"),
+			    zip_path.buf);
+		goto diagnose_cleanup;
+	}
+	stdout_fd = dup(1);
+	if (stdout_fd < 0) {
+		res = error_errno(_("could not duplicate stdout"));
+		goto diagnose_cleanup;
+	}
+
+	archiver_fd = xopen(zip_path.buf, O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (archiver_fd < 0 || dup2(archiver_fd, 1) < 0) {
+		res = error_errno(_("could not redirect output"));
+		goto diagnose_cleanup;
+	}
+
+	init_zip_archiver();
+	strvec_pushl(&archiver_args, "scalar-diagnose", "--format=zip", NULL);
+
+	strbuf_reset(&buf);
+	strbuf_addstr(&buf, "Collecting diagnostic info\n\n");
+	get_version_info(&buf, 1);
+
+	strbuf_addf(&buf, "Enlistment root: %s\n", the_repository->worktree);
+	write_or_die(stdout_fd, buf.buf, buf.len);
+	strvec_pushf(&archiver_args,
+		     "--add-virtual-file=diagnostics.log:%.*s",
+		     (int)buf.len, buf.buf);
+
+	if ((res = add_directory_to_archiver(&archiver_args, ".git", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/hooks", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/info", 0)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/logs", 1)) ||
+	    (res = add_directory_to_archiver(&archiver_args, ".git/objects/info", 0)))
+		goto diagnose_cleanup;
+
+	strvec_pushl(&archiver_args, "--prefix=",
+		     oid_to_hex(the_hash_algo->empty_tree), "--", NULL);
+
+	/* `write_archive()` modifies the `argv` passed to it. Let it. */
+	argv_copy = xmemdupz(archiver_args.v,
+			     sizeof(char *) * archiver_args.nr);
+	res = write_archive(archiver_args.nr, (const char **)argv_copy, NULL,
+			    the_repository, NULL, 0);
+	if (res) {
+		error(_("failed to write archive"));
+		goto diagnose_cleanup;
+	}
+
+	if (!res)
+		fprintf(stderr, "\n"
+		       "Diagnostics complete.\n"
+		       "All of the gathered info is captured in '%s'\n",
+		       zip_path.buf);
+
+diagnose_cleanup:
+	if (archiver_fd >= 0) {
+		close(1);
+		dup2(stdout_fd, 1);
+	}
+	free(argv_copy);
+	strvec_clear(&archiver_args);
+	strbuf_release(&zip_path);
+	strbuf_release(&path);
+	strbuf_release(&buf);
+
+	return res;
+}
+
 static int cmd_list(int argc, const char **argv)
 {
 	if (argc != 1)
@@ -801,6 +944,7 @@  static struct {
 	{ "reconfigure", cmd_reconfigure },
 	{ "delete", cmd_delete },
 	{ "version", cmd_version },
+	{ "diagnose", cmd_diagnose },
 	{ NULL, NULL},
 };
 
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index cf4e5b889c..c0425e0653 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -14,6 +14,7 @@  scalar register [<enlistment>]
 scalar unregister [<enlistment>]
 scalar run ( all | config | commit-graph | fetch | loose-objects | pack-files ) [<enlistment>]
 scalar reconfigure [ --all | <enlistment> ]
+scalar diagnose [<enlistment>]
 scalar delete <enlistment>
 
 DESCRIPTION
@@ -139,6 +140,17 @@  reconfigure the enlistment.
 With the `--all` option, all enlistments currently registered with Scalar
 will be reconfigured. Use this option after each Scalar upgrade.
 
+Diagnose
+~~~~~~~~
+
+diagnose [<enlistment>]::
+    When reporting issues with Scalar, it is often helpful to provide the
+    information gathered by this command, including logs and certain
+    statistics describing the data shape of the current enlistment.
++
+The output of this command is a `.zip` file that is written into
+a directory adjacent to the worktree in the `src` directory.
+
 Delete
 ~~~~~~
 
diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
index bb42354a8b..fbb1df2049 100755
--- a/contrib/scalar/t/t9099-scalar.sh
+++ b/contrib/scalar/t/t9099-scalar.sh
@@ -98,4 +98,18 @@  test_expect_success '`scalar [...] <dir>` errors out when dir is missing' '
 	grep "cloned. does not exist" err
 '
 
+SQ="'"
+test_expect_success UNZIP 'scalar diagnose' '
+	scalar clone "file://$(pwd)" cloned --single-branch &&
+	scalar diagnose cloned >out 2>err &&
+	sed -n "s/.*$SQ\\(.*\\.zip\\)$SQ.*/\\1/p" <err >zip_path &&
+	zip_path=$(cat zip_path) &&
+	test -n "$zip_path" &&
+	unzip -v "$zip_path" &&
+	folder=${zip_path%.zip} &&
+	test_path_is_missing "$folder" &&
+	unzip -p "$zip_path" diagnostics.log >out &&
+	test_file_not_empty out
+'
+
 test_done