diff mbox series

[v8,6/8] ls-tree.c: support --object-only option for "git-ls-tree"

Message ID cb881183cb7179e3e7e983b7c9ffe115f845115c.1641043500.git.dyroneteng@gmail.com (mailing list archive)
State Superseded
Headers show
Series ls-tree: "--object-only" and "--format" opts | expand

Commit Message

Teng Long Jan. 1, 2022, 1:50 p.m. UTC
We usually pipe the output from `git ls-trees` to tools like
`sed` or `cut` when we only want to extract some fields.

When we want only the pathname component, we can pass
`--name-only` option to omit such a pipeline, but there are no
options for extracting other fields.

Teach the "--object-only" option to the command to only show the
object name. This option cannot be used together with
"--name-only" or "--long" , they are mutually exclusive (actually
"--name-only" and "--long" can be combined together before, this
commit by the way fix this bug).

A simple refactoring was done to the "show_tree" function, intead by
using bitwise operations to recognize the format for printing to
stdout. The reason for doing this is that we don't want to increase
the readability difficulty with the addition of "-object-only",
making this part of the logic easier to read and expand.

In terms of performance, there is no loss comparing to the
"master" (2ae0a9cb8298185a94e5998086f380a355dd8907), here are the
results of the performance tests in my environment based on linux
repository:

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r HEAD
    Time (mean ± σ):     105.8 ms ±   2.7 ms    [User: 85.7 ms, System: 20.0 ms]
    Range (min … max):   101.5 ms … 111.3 ms    28 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r HEAD
    Time (mean ± σ):     105.0 ms ±   3.0 ms    [User: 83.7 ms, System: 21.2 ms]
    Range (min … max):    99.3 ms … 109.5 ms    27 runs

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r -l HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r -l HEAD
    Time (mean ± σ):     337.4 ms ±  10.9 ms    [User: 308.3 ms, System: 29.0 ms]
    Range (min … max):   323.0 ms … 355.0 ms    10 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r -l HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r -l HEAD
    Time (mean ± σ):     337.6 ms ±   6.2 ms    [User: 309.4 ms, System: 28.1 ms]
    Range (min … max):   330.4 ms … 349.9 ms    10 runs

Signed-off-by: Teng Long <dyroneteng@gmail.com>
---
 Documentation/git-ls-tree.txt |   7 +-
 builtin/ls-tree.c             | 140 +++++++++++++++++++++++++---------
 t/t3104-ls-tree-oid.sh        |  51 +++++++++++++
 3 files changed, 159 insertions(+), 39 deletions(-)
 create mode 100755 t/t3104-ls-tree-oid.sh

Comments

Junio C Hamano Jan. 4, 2022, 1:21 a.m. UTC | #1
Teng Long <dyroneteng@gmail.com> writes:

> A simple refactoring was done to the "show_tree" function, intead by
> using bitwise operations to recognize the format for printing to
> stdout. The reason for doing this is that we don't want to increase
> the readability difficulty with the addition of "-object-only",
> making this part of the logic easier to read and expand.

The resulting code looks unnecessarily complex and brittle; some
SHOW_FOO mean SHOW_FOO_ONLY_AND_NOTHING_ELSE while other SHOW_BAR
means SHOW_BAR_BUT_WE_MAY_SHOW_OTHER_THINGS_IN_LATER_PART, and the
distinction is not clear from their names (which means it is hard
to later extend and enhance the behaviour of the code).

> +	if (!(shown_bits ^ SHOW_FILE_NAME)) {

Is the use of XOR operator significant here?

I.e. "if (shown_bits & SHOW_FILE_NAME)" would have been a much more
natural way to guard "this is a block that shows the file name",
than "the result MUST BE all bits off if we flip SHOW_FILE_NAME bit
off".  If various SHOW_FOO bits are meant to be mutually exclusive,
then "if ((shown_bits & SHOW_FILE_NAME) == SHOW_FILE_NAME)" would
also make sense, but as I said upfront, it is unclear to me if
shown_bits are meant to be a collection of "this bit means this
field is shown (and it implies nothing else)", so I dunno.
Teng Long Jan. 4, 2022, 7:29 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> The resulting code looks unnecessarily complex and brittle; some
> SHOW_FOO mean SHOW_FOO_ONLY_AND_NOTHING_ELSE while other SHOW_BAR
> means SHOW_BAR_BUT_WE_MAY_SHOW_OTHER_THINGS_IN_LATER_PART, and the
> distinction is not clear from their names (which means it is hard
> to later extend and enhance the behaviour of the code).

I agree with you that the relevant code is not very clear, So I
think I will take these steps:

1. Rename "shown_bits" -> "shown_fields"

   essentially we want to show the fields but not bits to user that
   may firstly solve the unclear nameing problem for "shown_bits"
   itself.

2. Rename related macro definitions of "shown_fields by :


	   SHOW_FILE_NAME -> FILE_NAME_FIELD
    	   SHOW_SIZE -> SIZE_FIELD
    	   SHOW_OBJECT_NAME -> OBJECT_NAME_FIELD
    	   SHOW_TYPE -> TYPE_FIELD
    	   SHOW_DEFAULT -> DEFAULT_FIELDS

    I think the confusion comes from " SHOW_FOO_ONLY_AND_NOTHING_ELSE"
    and " SHOW_BAR_BUT_WE_MAY_SHOW_OTHER_THINGS_IN_LATER_PART" is
    because some macros's is named by mixed the "flags" and "the
    operation of flags" together. 

    So with renamings, we try to unify these definitions meaning,
    they are just used for defining a "field" with a specified
    non-repetitive bits.
    
    After that, you can show many "fields" by combining any of "fields",
    such as what the builtin "DEFAULT_FIELDS" does, it shows all the
    fields but except the "size" field.

    By far, the "field(s)" only means the definition themselfs, and
    no business with "which one/ones" or  "how" to shown.

3. Decide "which" fields need to show

   The definition of "WHICH_FIELDS_TO_SHOWN" is by "shown_fields",
   it's used for parse from the options and compute it's value
   by function "parse_shown_fields()".

   Actually, the function already represent what work it do,
   but the problem is I didn't notice to rename "show_bits"
   to "show_fields" before. This may bring some confusion,
   because we are not going to show the bits but the field(s).
   So, I will do the <STEP.1>.

4. Decide "How" fields to be shown

    Now we have already know the field(s) we care about, next
    step is to show the fields.

> > +	if (!(shown_bits ^ SHOW_FILE_NAME)) {

> Is the use of XOR operator significant here?
> 
> I.e. "if (shown_bits & SHOW_FILE_NAME)" would have been a much more
> natural way to guard "this is a block that shows the file name",
> than "the result MUST BE all bits off if we flip SHOW_FILE_NAME bit
> off".  If various SHOW_FOO bits are meant to be mutually exclusive,
> then "if ((shown_bits & SHOW_FILE_NAME) == SHOW_FILE_NAME)" would
> also make sense, but as I said upfront, it is unclear to me if
> shown_bits are meant to be a collection of "this bit means this
> field is shown (and it implies nothing else)", so I dunno.

    Not significant. Both work and It's all right for me. Your
    readability is better, and now I know how to handle this
    situation better.

    E.g, if we only want to show a "filename" field
    (with `--name-only`), we will use a way like
    "if ((shown_fields & FILE_NAME_FIELD) == FILE_NAME_FIELD)"
    to judge this situation.

    And if we want to show fields in a builtin way
    (as described in 'git-ls-tree.txt', default output format
    is compatible with what `--index-info --stdin` of
    'git update-index' expects.), we will use "DEFAULT_FIELDS"
    instead.


I am not sure if this solves the problem you are considering as I
may misunderstand, I can quickly finish this new patch and we can
look at it then。

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index db02d6d79a..729370f235 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -10,7 +10,7 @@  SYNOPSIS
 --------
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
-	    [--name-only] [--name-status] [--full-name] [--full-tree] [--abbrev[=<n>]]
+	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
 	    <tree-ish> [<path>...]
 
 DESCRIPTION
@@ -59,6 +59,11 @@  OPTIONS
 --name-only::
 --name-status::
 	List only filenames (instead of the "long" output), one per line.
+	Cannot be combined with `--object-only`.
+
+--object-only::
+	List only names of the objects, one per line. Cannot be combined
+	with `--name-only` or `--name-status`.
 
 --abbrev[=<n>]::
 	Instead of showing the full 40-byte hexadecimal object
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index df8312408d..85ca7358ba 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -16,22 +16,59 @@ 
 
 static int line_termination = '\n';
 #define LS_RECURSIVE 1
-#define LS_TREE_ONLY 2
-#define LS_SHOW_TREES 4
-#define LS_NAME_ONLY 8
-#define LS_SHOW_SIZE 16
+#define LS_TREE_ONLY (1 << 1)
+#define LS_SHOW_TREES (1 << 2)
+#define LS_NAME_ONLY (1 << 3)
+#define LS_SHOW_SIZE (1 << 4)
+#define LS_OBJECT_ONLY (1 << 5)
 static int abbrev;
 static int ls_options;
 static struct pathspec pathspec;
 static int chomp_prefix;
 static const char *ls_tree_prefix;
+static unsigned int shown_bits;
+#define SHOW_FILE_NAME 1
+#define SHOW_SIZE (1 << 1)
+#define SHOW_OBJECT_NAME (1 << 2)
+#define SHOW_TYPE (1 << 3)
+#define SHOW_MODE (1 << 4)
+#define SHOW_DEFAULT 29 /* 11101 size is not shown to output by default */
 
 static const  char * const ls_tree_usage[] = {
 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
 	NULL
 };
 
-static int show_recursive(const char *base, size_t baselen, const char *pathname)
+enum {
+	MODE_UNSPECIFIED = 0,
+	MODE_NAME_ONLY,
+	MODE_OBJECT_ONLY,
+	MODE_LONG,
+};
+
+static int cmdmode = MODE_UNSPECIFIED;
+
+static int parse_shown_fields(void)
+{
+	if (cmdmode == MODE_NAME_ONLY) {
+		shown_bits = SHOW_FILE_NAME;
+		return 0;
+	}
+	if (cmdmode == MODE_OBJECT_ONLY) {
+		shown_bits = SHOW_OBJECT_NAME;
+		return 0;
+	}
+	if (!ls_options || (ls_options & LS_RECURSIVE)
+	    || (ls_options & LS_SHOW_TREES)
+	    || (ls_options & LS_TREE_ONLY))
+		shown_bits = SHOW_DEFAULT;
+	if (cmdmode == MODE_LONG)
+		shown_bits = SHOW_DEFAULT | SHOW_SIZE;
+	return 1;
+}
+
+static int show_recursive(const char *base, size_t baselen,
+			  const char *pathname)
 {
 	int i;
 
@@ -61,6 +98,39 @@  static int show_recursive(const char *base, size_t baselen, const char *pathname
 	return 0;
 }
 
+static int show_default(const struct object_id *oid, enum object_type type,
+			const char *pathname, unsigned mode,
+			struct strbuf *base)
+{
+	size_t baselen = base->len;
+
+	if (shown_bits & SHOW_SIZE) {
+		char size_text[24];
+		if (type == OBJ_BLOB) {
+			unsigned long size;
+			if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
+				xsnprintf(size_text, sizeof(size_text), "BAD");
+			else
+				xsnprintf(size_text, sizeof(size_text),
+					  "%" PRIuMAX, (uintmax_t)size);
+		} else {
+			xsnprintf(size_text, sizeof(size_text), "-");
+		}
+		printf("%06o %s %s %7s\t", mode, type_name(type),
+		find_unique_abbrev(oid, abbrev), size_text);
+	} else {
+		printf("%06o %s %s\t", mode, type_name(type),
+		find_unique_abbrev(oid, abbrev));
+	}
+	baselen = base->len;
+	strbuf_addstr(base, pathname);
+	write_name_quoted_relative(base->buf,
+				   chomp_prefix ? ls_tree_prefix : NULL, stdout,
+				   line_termination);
+	strbuf_setlen(base, baselen);
+	return 1;
+}
+
 static int show_tree_init(enum object_type *type, struct strbuf *base,
 			  const char *pathname, unsigned mode, int *retval)
 {
@@ -89,34 +159,24 @@  static int show_tree(const struct object_id *oid, struct strbuf *base,
 	if (show_tree_init(&type, base, pathname, mode, &retval))
 		return retval;
 
-	if (!(ls_options & LS_NAME_ONLY)) {
-		if (ls_options & LS_SHOW_SIZE) {
-			char size_text[24];
-			if (type == OBJ_BLOB) {
-				unsigned long size;
-				if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
-					xsnprintf(size_text, sizeof(size_text),
-						  "BAD");
-				else
-					xsnprintf(size_text, sizeof(size_text),
-						  "%"PRIuMAX, (uintmax_t)size);
-			} else {
-				xsnprintf(size_text, sizeof(size_text), "-");
-			}
-			printf("%06o %s %s %7s\t", mode, type_name(type),
-			       find_unique_abbrev(oid, abbrev),
-			       size_text);
-		} else {
-			printf("%06o %s %s\t", mode, type_name(type),
-			       find_unique_abbrev(oid, abbrev));
-		}
+	if (!(shown_bits ^ SHOW_OBJECT_NAME)) {
+		printf("%s%c", find_unique_abbrev(oid, abbrev), line_termination);
+		return retval;
 	}
-	baselen = base->len;
-	strbuf_addstr(base, pathname);
-	write_name_quoted_relative(base->buf,
-				   chomp_prefix ? ls_tree_prefix : NULL,
-				   stdout, line_termination);
-	strbuf_setlen(base, baselen);
+
+	if (!(shown_bits ^ SHOW_FILE_NAME)) {
+		baselen = base->len;
+		strbuf_addstr(base, pathname);
+		write_name_quoted_relative(base->buf,
+					   chomp_prefix ? ls_tree_prefix : NULL,
+					   stdout, line_termination);
+		strbuf_setlen(base, baselen);
+	}
+
+	if (!(shown_bits ^ SHOW_DEFAULT) ||
+	    !(shown_bits ^ (SHOW_DEFAULT | SHOW_SIZE)))
+		show_default(oid, type, pathname, mode, base);
+
 	return retval;
 }
 
@@ -134,12 +194,14 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			LS_SHOW_TREES),
 		OPT_SET_INT('z', NULL, &line_termination,
 			    N_("terminate entries with NUL byte"), 0),
-		OPT_BIT('l', "long", &ls_options, N_("include object size"),
-			LS_SHOW_SIZE),
-		OPT_BIT(0, "name-only", &ls_options, N_("list only filenames"),
-			LS_NAME_ONLY),
-		OPT_BIT(0, "name-status", &ls_options, N_("list only filenames"),
-			LS_NAME_ONLY),
+		OPT_CMDMODE('l', "long", &cmdmode, N_("include object size"),
+			    MODE_LONG),
+		OPT_CMDMODE(0, "name-only", &cmdmode, N_("list only filenames"),
+			    MODE_NAME_ONLY),
+		OPT_CMDMODE(0, "name-status", &cmdmode, N_("list only filenames"),
+			    MODE_NAME_ONLY),
+		OPT_CMDMODE(0, "object-only", &cmdmode, N_("list only objects"),
+			    MODE_OBJECT_ONLY),
 		OPT_SET_INT(0, "full-name", &chomp_prefix,
 			    N_("use full path names"), 0),
 		OPT_BOOL(0, "full-tree", &full_tree,
@@ -170,6 +232,8 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	if (get_oid(argv[0], &oid))
 		die("Not a valid object name %s", argv[0]);
 
+	parse_shown_fields();
+
 	/*
 	 * show_recursive() rolls its own matching code and is
 	 * generally ignorant of 'struct pathspec'. The magic mask
diff --git a/t/t3104-ls-tree-oid.sh b/t/t3104-ls-tree-oid.sh
new file mode 100755
index 0000000000..6ce62bd769
--- /dev/null
+++ b/t/t3104-ls-tree-oid.sh
@@ -0,0 +1,51 @@ 
+#!/bin/sh
+
+test_description='git ls-tree objects handling.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit A &&
+	test_commit B &&
+	mkdir -p C &&
+	test_commit C/D.txt &&
+	find *.txt path* \( -type f -o -type l \) -print |
+	xargs git update-index --add &&
+	tree=$(git write-tree) &&
+	echo $tree
+'
+
+test_expect_success 'usage: --object-only' '
+	git ls-tree --object-only $tree >current &&
+	git ls-tree $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: --object-only with -r' '
+	git ls-tree --object-only -r $tree >current &&
+	git ls-tree -r $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: --object-only with --abbrev' '
+	git ls-tree --object-only --abbrev=6 $tree >current &&
+	git ls-tree --abbrev=6 $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: incompatible options: --name-only with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --name-only $tree
+'
+
+test_expect_success 'usage: incompatible options: --name-status with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --name-status $tree
+'
+
+test_expect_success 'usage: incompatible options: --long with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --long $tree
+'
+
+test_done