diff mbox series

[v2,1/2] rev-list: add --missing-info to print missing object path

Message ID 20250110053417.2602109-3-jltobler@gmail.com (mailing list archive)
State New
Headers show
Series [v2,1/2] rev-list: add --missing-info to print missing object path | expand

Commit Message

Justin Tobler Jan. 10, 2025, 5:34 a.m. UTC
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.

Add the `--missing-info` option that, when also set with
`--missing=print`, prints additional insight about the missing object
inferred from its containing object. Each line of output for a missing
object is in the form: `?<oid> [<token>=<value>]...`. This format is
kept generic so it can support multiple different attributes and be
extended in the future as needed. Each additional attribute is separated
by a SP. Any value containing a SP or special character is enclosed in
double-quotes in the C style as needed.

For now, only a missing object path attribute is implemented. It follows
the form `path=<path>` and specifies the full path to the object from
the top-level tree. In a subsequent commit, a missing object type
attribute will also be added.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.txt | 12 ++++
 builtin/rev-list.c                 | 98 +++++++++++++++++++++++++-----
 t/t6022-rev-list-missing.sh        | 52 ++++++++++++++++
 3 files changed, 147 insertions(+), 15 deletions(-)

Comments

Christian Couder Jan. 10, 2025, 8:47 a.m. UTC | #1
On Fri, Jan 10, 2025 at 6:38 AM Justin Tobler <jltobler@gmail.com> wrote:

> +--missing-info::
> +       Only useful with `--missing=print`; prints any additional information
> +       about the missing object inferred from its containing object. The
> +       information is all printed on the same line with the missing object ID
> +       in the form: `?<oid> [<token>=<value>]...`. Additional attributes are
> +       each separated by a SP.

Nit: I'd rather say "The `<token>=<value>` pairs containing additional
information are separated from each other by a SP." to avoid
introducing "attributes" which might not be very clear.

> Any value containing a SP or special character
> +       is enclosed in double-quotes in the C style as needed. Each
> +       `<token>=<value>` may be one of the following:

It might be a bit better to decide for each token-value pair how the
value is encoded, instead of deciding in advance for all of them.

> ++
> +The `path=<path>` shows the path of the missing object inferred from a
> +containing object.

For example for the path, I think it might be easier to always enclose
it in double-quotes in the C style rather than checking first if it
contains spaces or other special characters, see below.

> +static void print_missing_object(struct missing_objects_map_entry *entry,
> +                                int print_missing_info)
> +{
> +       struct strbuf sb;
> +
> +       if (!print_missing_info) {
> +               printf("?%s\n", oid_to_hex(&entry->entry.oid));
> +               return;
> +       }
> +
> +       strbuf_init(&sb, 0);

I am not sure it's worth initializing the sb separately from its
declaration above. Using "struct strbuf sb = STRBUF_INIT;" is more
standard in the code base and I think most compilers these days are
likely to be able to optimize away the initialization in the
"!print_missing_info" case.

> +       if (entry->path && *entry->path) {
> +               strbuf_addstr(&sb, " path=");
> +
> +               if (quote_c_style(entry->path, NULL, NULL, 0))
> +                       quote_c_style(entry->path, &sb, NULL, 0);
> +               else if (strchr(entry->path, ' '))
> +                       strbuf_addf(&sb, "\"%s\"", entry->path);
> +               else
> +                       strbuf_addstr(&sb, entry->path);

I think the above code paragraph could be simplified to just:

            quote_c_style(entry->path, &sb, NULL, 0);

if we decided to always quote the path. The decoding part would likely
be simplified too.

> +       }
> +
> +       printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
> +       strbuf_release(&sb);
> +}

> @@ -656,6 +703,15 @@ int cmd_rev_list(int argc,
>                 if (skip_prefix(arg, "--missing=", &arg))
>                         continue; /* already handled above */
>
> +               if (!strcmp(arg, "--missing-info")) {
> +                       if (arg_missing_action != MA_PRINT)
> +                               die(_("the option '%s' requires '%s'"),
> +                                   "--missing-info", "--missing=print");

It seems to me that this check should be performed outside the arg
parsing loop so that this check passes if the user passes
"--missing-info" before "--missing=print" on the command line.

> +                       print_missing_info = 1;
> +                       continue;
> +               }
> +
Junio C Hamano Jan. 10, 2025, 3:22 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> On Fri, Jan 10, 2025 at 6:38 AM Justin Tobler <jltobler@gmail.com> wrote:
>
>> +--missing-info::
>> +       Only useful with `--missing=print`; prints any additional information
>> +       about the missing object inferred from its containing object. The
>> +       information is all printed on the same line with the missing object ID
>> +       in the form: `?<oid> [<token>=<value>]...`. Additional attributes are
>> +       each separated by a SP.
>
> Nit: I'd rather say "The `<token>=<value>` pairs containing additional
> information are separated from each other by a SP." to avoid
> introducing "attributes" which might not be very clear.

Excellent.

>> Any value containing a SP or special character
>> +       is enclosed in double-quotes in the C style as needed. Each
>> +       `<token>=<value>` may be one of the following:
>
> It might be a bit better to decide for each token-value pair how the
> value is encoded, instead of deciding in advance for all of them.

I strongly advise against it.

Declaring the syntax we will use for forseeable future for new
attributes upfront would allow the receiving end, the scripts that
interpret our output, to be written in a futureproof way.

An alternative is "value is encoded in token specific fashion, but
one thing that is common across these future encodings is that SP or
LF contained in value will be represented in such a way that the
resulting encoded value will not have either of these two
problematic bytes".

>> +       if (entry->path && *entry->path) {
>> +               strbuf_addstr(&sb, " path=");
>> +
>> +               if (quote_c_style(entry->path, NULL, NULL, 0))
>> +                       quote_c_style(entry->path, &sb, NULL, 0);
>> +               else if (strchr(entry->path, ' '))
>> +                       strbuf_addf(&sb, "\"%s\"", entry->path);
>> +               else
>> +                       strbuf_addstr(&sb, entry->path);
>
> I think the above code paragraph could be simplified to just:
>
>             quote_c_style(entry->path, &sb, NULL, 0);

Hmph, you two may be both wrong ;-)  It is unfortunate that you
cannot easily configure what is considered "must be quoted" bytes
per invocation of the quote_c_style() function.  Most problematic
is that in the cq_lookup[] table, SP and "!" are treated the same
way, i.e., <a b c> does not need to be quoted.  This comes from the
fact that quote_c_style() was written for the sole purpose of
showing the pathnames on the header lines of "git diff" output.

We may be able to (ab)use quote_path() with QUOTE_PATH_QUOTE_SP
bit set for this purpose, though, to work around the above.

cq_must_quote() also is affected by core.quotepath, which is not a
desirable feature here---we want our machine readable program output
stable regardless of end-user preference.

By the way, perhaps we should propose to make the default value for
core.quotepath to "no" at Git 3.0 boundary?

Thanks.
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 459e5a02f5..8e363f5ae7 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -1028,6 +1028,18 @@  If some tips passed to the traversal are missing, they will be
 considered as missing too, and the traversal will ignore them. In case
 we cannot get their Object ID though, an error will be raised.
 
+--missing-info::
+	Only useful with `--missing=print`; prints any additional information
+	about the missing object inferred from its containing object. The
+	information is all printed on the same line with the missing object ID
+	in the form: `?<oid> [<token>=<value>]...`. Additional attributes are
+	each separated by a SP. Any value containing a SP or special character
+	is enclosed in double-quotes in the C style as needed. Each
+	`<token>=<value>` may be one of the following:
++
+The `path=<path>` shows the path of the missing object inferred from a
+containing object.
+
 --exclude-promisor-objects::
 	(For internal use only.)  Prefilter object traversal at
 	promisor boundary.  This is used with partial clone.  This is
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3196da7b2d..861ffdaf21 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -22,7 +22,10 @@ 
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "oidmap.h"
 #include "packfile.h"
+#include "quote.h"
+#include "strbuf.h"
 
 static const char rev_list_usage[] =
 "git rev-list [<options>] <commit>... [--] [<path>...]\n"
@@ -73,7 +76,11 @@  static unsigned progress_counter;
 static struct oidset omitted_objects;
 static int arg_print_omitted; /* print objects omitted by filter */
 
-static struct oidset missing_objects;
+struct missing_objects_map_entry {
+	struct oidmap_entry entry;
+	const char *path;
+};
+static struct oidmap missing_objects;
 enum missing_action {
 	MA_ERROR = 0,    /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY,    /* silently allow ALL missing objects */
@@ -101,7 +108,47 @@  static off_t get_object_disk_usage(struct object *obj)
 	return size;
 }
 
-static inline void finish_object__ma(struct object *obj)
+static void add_missing_object_entry(struct object_id *oid, const char *path)
+{
+	struct missing_objects_map_entry *entry;
+
+	if (oidmap_get(&missing_objects, oid))
+		return;
+
+	CALLOC_ARRAY(entry, 1);
+	entry->entry.oid = *oid;
+	if (path)
+		entry->path = xstrdup(path);
+	oidmap_put(&missing_objects, entry);
+}
+
+static void print_missing_object(struct missing_objects_map_entry *entry,
+				 int print_missing_info)
+{
+	struct strbuf sb;
+
+	if (!print_missing_info) {
+		printf("?%s\n", oid_to_hex(&entry->entry.oid));
+		return;
+	}
+
+	strbuf_init(&sb, 0);
+	if (entry->path && *entry->path) {
+		strbuf_addstr(&sb, " path=");
+
+		if (quote_c_style(entry->path, NULL, NULL, 0))
+			quote_c_style(entry->path, &sb, NULL, 0);
+		else if (strchr(entry->path, ' '))
+			strbuf_addf(&sb, "\"%s\"", entry->path);
+		else
+			strbuf_addstr(&sb, entry->path);
+	}
+
+	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
+	strbuf_release(&sb);
+}
+
+static inline void finish_object__ma(struct object *obj, const char *name)
 {
 	/*
 	 * Whether or not we try to dynamically fetch missing objects
@@ -119,7 +166,7 @@  static inline void finish_object__ma(struct object *obj)
 		return;
 
 	case MA_PRINT:
-		oidset_insert(&missing_objects, &obj->oid);
+		add_missing_object_entry(&obj->oid, name);
 		return;
 
 	case MA_ALLOW_PROMISOR:
@@ -152,7 +199,7 @@  static void show_commit(struct commit *commit, void *data)
 
 	if (revs->do_not_die_on_missing_objects &&
 	    oidset_contains(&revs->missing_commits, &commit->object.oid)) {
-		finish_object__ma(&commit->object);
+		finish_object__ma(&commit->object, NULL);
 		return;
 	}
 
@@ -268,12 +315,11 @@  static void show_commit(struct commit *commit, void *data)
 	finish_commit(commit);
 }
 
-static int finish_object(struct object *obj, const char *name UNUSED,
-			 void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
-		finish_object__ma(obj);
+		finish_object__ma(obj, name);
 		return 1;
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
@@ -538,6 +584,7 @@  int cmd_rev_list(int argc,
 	int bisect_show_vars = 0;
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
+	int print_missing_info = 0;
 	int filter_provided_objects = 0;
 	const char *show_progress = NULL;
 	int ret = 0;
@@ -656,6 +703,15 @@  int cmd_rev_list(int argc,
 		if (skip_prefix(arg, "--missing=", &arg))
 			continue; /* already handled above */
 
+		if (!strcmp(arg, "--missing-info")) {
+			if (arg_missing_action != MA_PRINT)
+				die(_("the option '%s' requires '%s'"),
+				    "--missing-info", "--missing=print");
+
+			print_missing_info = 1;
+			continue;
+		}
+
 		if (!strcmp(arg, ("--no-object-names"))) {
 			arg_show_object_names = 0;
 			continue;
@@ -782,9 +838,16 @@  int cmd_rev_list(int argc,
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT) {
-		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
+		struct oidset_iter iter;
+		struct object_id *oid;
+
+		oidmap_init(&missing_objects, DEFAULT_OIDSET_SIZE);
+		oidset_iter_init(&revs.missing_commits, &iter);
+
 		/* Add missing tips */
-		oidset_insert_from_set(&missing_objects, &revs.missing_commits);
+		while ((oid = oidset_iter_next(&iter)))
+			add_missing_object_entry(oid, NULL);
+
 		oidset_clear(&revs.missing_commits);
 	}
 
@@ -801,12 +864,17 @@  int cmd_rev_list(int argc,
 		oidset_clear(&omitted_objects);
 	}
 	if (arg_missing_action == MA_PRINT) {
-		struct oidset_iter iter;
-		struct object_id *oid;
-		oidset_iter_init(&missing_objects, &iter);
-		while ((oid = oidset_iter_next(&iter)))
-			printf("?%s\n", oid_to_hex(oid));
-		oidset_clear(&missing_objects);
+		struct missing_objects_map_entry *entry;
+		struct oidmap_iter iter;
+
+		oidmap_iter_init(&missing_objects, &iter);
+
+		while ((entry = oidmap_iter_next(&iter))) {
+			print_missing_object(entry, print_missing_info);
+			free((void *)entry->path);
+		}
+
+		oidmap_free(&missing_objects, true);
 	}
 
 	stop_progress(&progress);
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 7553a9cca2..2eb051028e 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -145,4 +145,56 @@  do
 	done
 done
 
+for obj in "HEAD~1" "HEAD^{tree}" "HEAD:foo" "HEAD:foo/bar" "HEAD:baz baz" "HEAD:bat\""
+do
+	test_expect_success "--missing-info with missing '$obj'" '
+		test_when_finished rm -rf missing-info &&
+
+		git init missing-info &&
+		(
+			cd missing-info &&
+			git commit --allow-empty -m first &&
+
+			mkdir foo &&
+			echo bar >foo/bar &&
+			echo baz >"baz baz" &&
+			echo bat >bat\" &&
+			git add -A &&
+			git commit -m second &&
+
+			oid="$(git rev-parse "$obj")" &&
+			path=".git/objects/$(test_oid_to_path $oid)" &&
+
+			case $obj in
+			HEAD:foo)
+				path_info=" path=foo"
+				;;
+			HEAD:foo/bar)
+				path_info=" path=foo/bar"
+				;;
+			"HEAD:baz baz")
+				path_info=" path=\"baz baz\""
+				;;
+			"HEAD:bat\"")
+				path_info=" path=\"bat\\\"\""
+				;;
+			esac &&
+
+			# Before the object is made missing, we use rev-list to
+			# get the expected oids.
+			git rev-list --objects --no-object-names \
+				HEAD ^"$obj" >expect.raw &&
+			echo "?$oid$path_info" >>expect.raw &&
+
+			mv "$path" "$path.hidden" &&
+			git rev-list --objects --no-object-names \
+				--missing=print --missing-info HEAD >actual.raw &&
+
+			sort actual.raw >actual &&
+			sort expect.raw >expect &&
+			test_cmp expect actual
+		)
+	'
+done
+
 test_done