diff mbox series

[v10,6/9] ls-tree.c: support --object-only option for "git-ls-tree"

Message ID bcfbc935b80b889273e3e54fec2a896e44acd2b5.1641978175.git.dyroneteng@gmail.com (mailing list archive)
State Superseded
Headers show
Series ls-tree: "--object-only" and "--format" opts | expand

Commit Message

Teng Long Jan. 13, 2022, 3:42 a.m. UTC
We usually pipe the output from `git ls-trees` to tools like
`sed` or `cut` when we only want to extract some fields.

When we want only the pathname component, we can pass
`--name-only` option to omit such a pipeline, but there are no
options for extracting other fields.

Teach the "--object-only" option to the command to only show the
object name. This option cannot be used together with
"--name-only" or "--long" , they are mutually exclusive (actually
"--name-only" and "--long" can be combined together before, this
commit by the way fix this bug).

A simple refactoring was done to the "show_tree" function, intead by
using bitwise operations to recognize the format for printing to
stdout. The reason for doing this is that we don't want to increase
the readability difficulty with the addition of "-object-only",
making this part of the logic easier to read and expand.

In terms of performance, there is no loss comparing to the
"master" (2ae0a9cb8298185a94e5998086f380a355dd8907), here are the
results of the performance tests in my environment based on linux
repository:

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r HEAD
    Time (mean ± σ):     105.8 ms ±   2.7 ms    [User: 85.7 ms, System: 20.0 ms]
    Range (min … max):   101.5 ms … 111.3 ms    28 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r HEAD
    Time (mean ± σ):     105.0 ms ±   3.0 ms    [User: 83.7 ms, System: 21.2 ms]
    Range (min … max):    99.3 ms … 109.5 ms    27 runs

    $hyperfine --warmup=10 "/opt/git/master/bin/git ls-tree -r -l HEAD"
    Benchmark 1: /opt/git/master/bin/git ls-tree -r -l HEAD
    Time (mean ± σ):     337.4 ms ±  10.9 ms    [User: 308.3 ms, System: 29.0 ms]
    Range (min … max):   323.0 ms … 355.0 ms    10 runs

    $hyperfine --warmup=10 "/opt/git/ls-tree-oid-only/bin/git ls-tree -r -l HEAD"
    Benchmark 1: /opt/git/ls-tree-oid-only/bin/git ls-tree -r -l HEAD
    Time (mean ± σ):     337.6 ms ±   6.2 ms    [User: 309.4 ms, System: 28.1 ms]
    Range (min … max):   330.4 ms … 349.9 ms    10 runs

Signed-off-by: Teng Long <dyroneteng@gmail.com>
---
 Documentation/git-ls-tree.txt |   7 +-
 builtin/ls-tree.c             | 141 +++++++++++++++++++++++++---------
 t/t3104-ls-tree-oid.sh        |  51 ++++++++++++
 3 files changed, 160 insertions(+), 39 deletions(-)
 create mode 100755 t/t3104-ls-tree-oid.sh

Comments

Ævar Arnfjörð Bjarmason Jan. 13, 2022, 6:59 a.m. UTC | #1
On Thu, Jan 13 2022, Teng Long wrote:

> We usually pipe the output from `git ls-trees` to tools like
> `sed` or `cut` when we only want to extract some fields.
>
> When we want only the pathname component, we can pass
> `--name-only` option to omit such a pipeline, but there are no
> options for extracting other fields.
>
> Teach the "--object-only" option to the command to only show the
> object name. This option cannot be used together with
> "--name-only" or "--long" , they are mutually exclusive (actually
> "--name-only" and "--long" can be combined together before, this
> commit by the way fix this bug).

In the RFC series I sent this was first implemented in terms of the
--format option, and I skipped the custom implementation you're adding
here:
https://lore.kernel.org/git/RFC-patch-7.7-5e34df4f8dd-20211217T131635Z-avarab@gmail.com/

I think in terms of patch series structure it would make sense to do
that, and then have this custom --object-only implementation in terms of
not-"--format " follow from that, and thus with the tests for the two
(we'd add the tests you're adding here first, just for a
--format="%(objectname)" or whatever) we'd see that the two are 1=1
equivalent in terms of functionality, but that this one is <X>% more
optimized.
Teng Long Jan. 14, 2022, 8:18 a.m. UTC | #2
On Thu, Jan 13, 2022 at 3:02 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> In the RFC series I sent this was first implemented in terms of the
> --format option, and I skipped the custom implementation you're adding
> here:
> https://lore.kernel.org/git/RFC-patch-7.7-5e34df4f8dd-20211217T131635Z-avarab@gmail.com/
>
> I think in terms of patch series structure it would make sense to do
> that, and then have this custom --object-only implementation in terms of
> not-"--format " follow from that, and thus with the tests for the two

Sorry, the "not-"--format" means?

> (we'd add the tests you're adding here first, just for a
> --format="%(objectname)" or whatever) we'd see that the two are 1=1
> equivalent in terms of functionality, but that this one is <X>% more
> optimized.

Please allow me to understand your advice,  if we put the commit of
introducing "--format" before the commit of introducing "--object-only", will
be better because it's possible to supply more optimized performance
(if we have) information in the commit message.
Ævar Arnfjörð Bjarmason Jan. 14, 2022, 11:47 a.m. UTC | #3
On Fri, Jan 14 2022, Teng Long wrote:

> On Thu, Jan 13, 2022 at 3:02 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>
>> In the RFC series I sent this was first implemented in terms of the
>> --format option, and I skipped the custom implementation you're adding
>> here:
>> https://lore.kernel.org/git/RFC-patch-7.7-5e34df4f8dd-20211217T131635Z-avarab@gmail.com/
>>
>> I think in terms of patch series structure it would make sense to do
>> that, and then have this custom --object-only implementation in terms of
>> not-"--format " follow from that, and thus with the tests for the two
>
> Sorry, the "not-"--format" means?

Sorry about not being clear. I mean there's two potential
implementations. One that't is terms of --format='%(objectname)', and
the other with your custom (faster) code to implement it.

>> (we'd add the tests you're adding here first, just for a
>> --format="%(objectname)" or whatever) we'd see that the two are 1=1
>> equivalent in terms of functionality, but that this one is <X>% more
>> optimized.
>
> Please allow me to understand your advice,  if we put the commit of
> introducing "--format" before the commit of introducing "--object-only", will
> be better because it's possible to supply more optimized performance
> (if we have) information in the commit message.

Yes, you get the functionality you need with a simple alias of
--format='%(objectname)' to --object-name (or whatever), so the only
reason to carry the extra code is for optimization.

I wonder if the extra difference in performance is still something you
care about, or if just the --format implementation would be OK.

But in any case, starting with a simpler implementation and testing it
makes the progression easier to reason about.
Teng Long Jan. 18, 2022, 9:55 a.m. UTC | #4
On Fri, Jan 14, 2022 at 7:59 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> Yes, you get the functionality you need with a simple alias of
> --format='%(objectname)' to --object-name (or whatever), so the only
> reason to carry the extra code is for optimization.
>
> I wonder if the extra difference in performance is still something you
> care about, or if just the --format implementation would be OK.
>
> But in any case, starting with a simpler implementation and testing it
> makes the progression easier to reason about.

Actually, at first, I wanted to achieve this in a simple way, as the
"--object-only" implementation.

With the discussion in the community, I think both of them can achieve
this purpose. "--object-only" is more intuitive, while "--format "is
more flexible.
For example, if the terminal supports automatic completion, the function of
this option can be clearly known with typing TAB and lower costs of use and
understanding. "--format"  also works, but maybe have to check the help
document to see if there are fields that support the same purpose.

Because the community had a different opinion about it. Junio, might prefer
an "--object-only" approach, if I understand the context correctly.

So I have some inclination to support both. However, I can accept that only
"--format" is supported.

So in the next patch, I hope to do some refactoring of the commit to support
"--object-only" as the top commit. If in the end, we decide that "--format" is
enough, we can discard the top "--object-only" commit.

I know you guys currently are busy on the new 2.35 release, so a later reply
is OK.

Thanks.
Ævar Arnfjörð Bjarmason Feb. 4, 2022, 12:58 p.m. UTC | #5
On Tue, Jan 18 2022, Teng Long wrote:

> On Fri, Jan 14, 2022 at 7:59 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>
>> Yes, you get the functionality you need with a simple alias of
>> --format='%(objectname)' to --object-name (or whatever), so the only
>> reason to carry the extra code is for optimization.
>>
>> I wonder if the extra difference in performance is still something you
>> care about, or if just the --format implementation would be OK.
>>
>> But in any case, starting with a simpler implementation and testing it
>> makes the progression easier to reason about.
>
> Actually, at first, I wanted to achieve this in a simple way, as the
> "--object-only" implementation.
>
> With the discussion in the community, I think both of them can achieve
> this purpose. "--object-only" is more intuitive, while "--format "is
> more flexible.
> For example, if the terminal supports automatic completion, the function of
> this option can be clearly known with typing TAB and lower costs of use and
> understanding. "--format"  also works, but maybe have to check the help
> document to see if there are fields that support the same purpose.
>
> Because the community had a different opinion about it. Junio, might prefer
> an "--object-only" approach, if I understand the context correctly.
>
> So I have some inclination to support both. However, I can accept that only
> "--format" is supported.

I'm only talking about how it's implemented internally, not whether we
have an --object-only option in the UI. I think it's good to have the
option for completion etc.

I.e. in my RFC implementation of it here it's just a trivial wrapper
around specifying a --format:
https://lore.kernel.org/git/RFC-patch-7.7-5e34df4f8dd-20211217T131635Z-avarab@gmail.com/;
Implementing it is 6 lines of trivial C code boilerplate.

But when you picked that up & ran with it you ended up carrying your
original implementation:
https://lore.kernel.org/git/e0274f079a7d381b9a936bfcd53bad64167c18b8.1641440700.git.dyroneteng@gmail.com/

I'm not saying we shouldn't have that, but that in any case a sequence of:

 1. Add a --format option
 2. Add a --object-only alias for a --format (what my RFC 7/7 does)
 3. Add a custom more optimized --object-only implementation

Would make the patch progression much easier to read, and we'd consider
the correctness of --object-only (1 and 2) separate from the
optimization question (3).

But maybe we won't need (3) at all in the end, i.e. is (1 and 2) fast
enough for it not to matter (I think probably "yes", but I don't have a
strong opinion on that).

> So in the next patch, I hope to do some refactoring of the commit to support
> "--object-only" as the top commit. If in the end, we decide that "--format" is
> enough, we can discard the top "--object-only" commit.

*nod*, now that I read ahead I think you pretty much agree with that plan :)

> I know you guys currently are busy on the new 2.35 release, so a later reply
> is OK.

Now would be a good time :)

I was reminded of this because Junio's proposed it for next at
https://lore.kernel.org/git/xmqqr18jnr2t.fsf@gitster.g/

I think per the above & other replies of mine (including not matters of
code arrangement opinion, but e.g. the doc formatting bug) we'll need at
least one more re-roll of this. Thanks for sticking with this & working
on this!

I'll indicate that in a reply to that "What's Cooking" report.
Teng Long Feb. 7, 2022, 2:22 a.m. UTC | #6
On Fri, Feb 4, 2022 at 9:04 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> I'm not saying we shouldn't have that, but that in any case a sequence of:
>
>  1. Add a --format option
>  2. Add a --object-only alias for a --format (what my RFC 7/7 does)
>  3. Add a custom more optimized --object-only implementation
>
> Would make the patch progression much easier to read, and we'd consider
> the correctness of --object-only (1 and 2) separate from the
> optimization question (3).
>
> But maybe we won't need (3) at all in the end, i.e. is (1 and 2) fast
> enough for it not to matter (I think probably "yes", but I don't have a
> strong opinion on that).

Sorry for the late reply, I had a vacation in the last two weeks (Chinese
New Year).

I have to say it's a very valuable recommendation and at the same time
I recognise that
spending more time on organizing commits ahead is important and make
small steps(or commits) sufficiently.

> Now would be a good time :)
>
> I was reminded of this because Junio's proposed it for next at
> https://lore.kernel.org/git/xmqqr18jnr2t.fsf@gitster.g/
>
> I think per the above & other replies of mine (including not matters of
> code arrangement opinion, but e.g. the doc formatting bug) we'll need at
> least one more re-roll of this. Thanks for sticking with this & working
> on this!
>
> I'll indicate that in a reply to that "What's Cooking" report.

Thanks for mentioning that. I will back work on it this week.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index db02d6d79a..729370f235 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -10,7 +10,7 @@  SYNOPSIS
 --------
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
-	    [--name-only] [--name-status] [--full-name] [--full-tree] [--abbrev[=<n>]]
+	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]]
 	    <tree-ish> [<path>...]
 
 DESCRIPTION
@@ -59,6 +59,11 @@  OPTIONS
 --name-only::
 --name-status::
 	List only filenames (instead of the "long" output), one per line.
+	Cannot be combined with `--object-only`.
+
+--object-only::
+	List only names of the objects, one per line. Cannot be combined
+	with `--name-only` or `--name-status`.
 
 --abbrev[=<n>]::
 	Instead of showing the full 40-byte hexadecimal object
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 9729854a3d..e1a2f8225b 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -16,22 +16,60 @@ 
 
 static int line_termination = '\n';
 #define LS_RECURSIVE 1
-#define LS_TREE_ONLY 2
-#define LS_SHOW_TREES 4
-#define LS_NAME_ONLY 8
-#define LS_SHOW_SIZE 16
+#define LS_TREE_ONLY (1 << 1)
+#define LS_SHOW_TREES (1 << 2)
+#define LS_NAME_ONLY (1 << 3)
+#define LS_SHOW_SIZE (1 << 4)
+#define LS_OBJECT_ONLY (1 << 5)
 static int abbrev;
 static int ls_options;
 static struct pathspec pathspec;
 static int chomp_prefix;
 static const char *ls_tree_prefix;
+static unsigned int shown_fields;
+#define FIELD_FILE_NAME 1
+#define FIELD_SIZE (1 << 1)
+#define FIELD_OBJECT_NAME (1 << 2)
+#define FIELD_TYPE (1 << 3)
+#define FIELD_MODE (1 << 4)
+#define FIELD_DEFAULT 29 /* 11101 size is not shown to output by default */
+#define FIELD_LONG_DEFAULT  (FIELD_DEFAULT | FIELD_SIZE)
 
 static const  char * const ls_tree_usage[] = {
 	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
 	NULL
 };
 
-static int show_recursive(const char *base, size_t baselen, const char *pathname)
+enum {
+	MODE_UNSPECIFIED = 0,
+	MODE_NAME_ONLY,
+	MODE_OBJECT_ONLY,
+	MODE_LONG,
+};
+
+static int cmdmode = MODE_UNSPECIFIED;
+
+static int parse_shown_fields(void)
+{
+	if (cmdmode == MODE_NAME_ONLY) {
+		shown_fields = FIELD_FILE_NAME;
+		return 0;
+	}
+	if (cmdmode == MODE_OBJECT_ONLY) {
+		shown_fields = FIELD_OBJECT_NAME;
+		return 0;
+	}
+	if (!ls_options || (ls_options & LS_RECURSIVE)
+	    || (ls_options & LS_SHOW_TREES)
+	    || (ls_options & LS_TREE_ONLY))
+		shown_fields = FIELD_DEFAULT;
+	if (cmdmode == MODE_LONG)
+		shown_fields = FIELD_LONG_DEFAULT;
+	return 1;
+}
+
+static int show_recursive(const char *base, size_t baselen,
+			  const char *pathname)
 {
 	int i;
 
@@ -70,6 +108,39 @@  static enum object_type get_type(unsigned int mode)
 	        : OBJ_BLOB);
 }
 
+static int show_default(const struct object_id *oid, enum object_type type,
+			const char *pathname, unsigned mode,
+			struct strbuf *base)
+{
+	size_t baselen = base->len;
+
+	if (shown_fields & FIELD_SIZE) {
+		char size_text[24];
+		if (type == OBJ_BLOB) {
+			unsigned long size;
+			if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
+				xsnprintf(size_text, sizeof(size_text), "BAD");
+			else
+				xsnprintf(size_text, sizeof(size_text),
+					  "%" PRIuMAX, (uintmax_t)size);
+		} else {
+			xsnprintf(size_text, sizeof(size_text), "-");
+		}
+		printf("%06o %s %s %7s\t", mode, type_name(type),
+		find_unique_abbrev(oid, abbrev), size_text);
+	} else {
+		printf("%06o %s %s\t", mode, type_name(type),
+		find_unique_abbrev(oid, abbrev));
+	}
+	baselen = base->len;
+	strbuf_addstr(base, pathname);
+	write_name_quoted_relative(base->buf,
+				   chomp_prefix ? ls_tree_prefix : NULL, stdout,
+				   line_termination);
+	strbuf_setlen(base, baselen);
+	return 1;
+}
+
 static int show_tree(const struct object_id *oid, struct strbuf *base,
 		const char *pathname, unsigned mode, void *context)
 {
@@ -84,34 +155,24 @@  static int show_tree(const struct object_id *oid, struct strbuf *base,
 	if (type == OBJ_BLOB && (ls_options & LS_TREE_ONLY))
 		return 0;
 
-	if (!(ls_options & LS_NAME_ONLY)) {
-		if (ls_options & LS_SHOW_SIZE) {
-			char size_text[24];
-			if (type == OBJ_BLOB) {
-				unsigned long size;
-				if (oid_object_info(the_repository, oid, &size) == OBJ_BAD)
-					xsnprintf(size_text, sizeof(size_text),
-						  "BAD");
-				else
-					xsnprintf(size_text, sizeof(size_text),
-						  "%"PRIuMAX, (uintmax_t)size);
-			} else {
-				xsnprintf(size_text, sizeof(size_text), "-");
-			}
-			printf("%06o %s %s %7s\t", mode, type_name(type),
-			       find_unique_abbrev(oid, abbrev),
-			       size_text);
-		} else {
-			printf("%06o %s %s\t", mode, type_name(type),
-			       find_unique_abbrev(oid, abbrev));
-		}
+	if (shown_fields == FIELD_OBJECT_NAME) {
+		printf("%s%c", find_unique_abbrev(oid, abbrev), line_termination);
+		return recurse;
 	}
-	baselen = base->len;
-	strbuf_addstr(base, pathname);
-	write_name_quoted_relative(base->buf,
-				   chomp_prefix ? ls_tree_prefix : NULL,
-				   stdout, line_termination);
-	strbuf_setlen(base, baselen);
+
+	if (shown_fields == FIELD_FILE_NAME) {
+		baselen = base->len;
+		strbuf_addstr(base, pathname);
+		write_name_quoted_relative(base->buf,
+					   chomp_prefix ? ls_tree_prefix : NULL,
+					   stdout, line_termination);
+		strbuf_setlen(base, baselen);
+		return recurse;
+	}
+
+	if (shown_fields >= FIELD_DEFAULT)
+		show_default(oid, type, pathname, mode, base);
+
 	return recurse;
 }
 
@@ -129,12 +190,14 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			LS_SHOW_TREES),
 		OPT_SET_INT('z', NULL, &line_termination,
 			    N_("terminate entries with NUL byte"), 0),
-		OPT_BIT('l', "long", &ls_options, N_("include object size"),
-			LS_SHOW_SIZE),
-		OPT_BIT(0, "name-only", &ls_options, N_("list only filenames"),
-			LS_NAME_ONLY),
-		OPT_BIT(0, "name-status", &ls_options, N_("list only filenames"),
-			LS_NAME_ONLY),
+		OPT_CMDMODE('l', "long", &cmdmode, N_("include object size"),
+			    MODE_LONG),
+		OPT_CMDMODE(0, "name-only", &cmdmode, N_("list only filenames"),
+			    MODE_NAME_ONLY),
+		OPT_CMDMODE(0, "name-status", &cmdmode, N_("list only filenames"),
+			    MODE_NAME_ONLY),
+		OPT_CMDMODE(0, "object-only", &cmdmode, N_("list only objects"),
+			    MODE_OBJECT_ONLY),
 		OPT_SET_INT(0, "full-name", &chomp_prefix,
 			    N_("use full path names"), 0),
 		OPT_BOOL(0, "full-tree", &full_tree,
@@ -165,6 +228,8 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	if (get_oid(argv[0], &oid))
 		die("Not a valid object name %s", argv[0]);
 
+	parse_shown_fields();
+
 	/*
 	 * show_recursive() rolls its own matching code and is
 	 * generally ignorant of 'struct pathspec'. The magic mask
diff --git a/t/t3104-ls-tree-oid.sh b/t/t3104-ls-tree-oid.sh
new file mode 100755
index 0000000000..6ce62bd769
--- /dev/null
+++ b/t/t3104-ls-tree-oid.sh
@@ -0,0 +1,51 @@ 
+#!/bin/sh
+
+test_description='git ls-tree objects handling.'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit A &&
+	test_commit B &&
+	mkdir -p C &&
+	test_commit C/D.txt &&
+	find *.txt path* \( -type f -o -type l \) -print |
+	xargs git update-index --add &&
+	tree=$(git write-tree) &&
+	echo $tree
+'
+
+test_expect_success 'usage: --object-only' '
+	git ls-tree --object-only $tree >current &&
+	git ls-tree $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: --object-only with -r' '
+	git ls-tree --object-only -r $tree >current &&
+	git ls-tree -r $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: --object-only with --abbrev' '
+	git ls-tree --object-only --abbrev=6 $tree >current &&
+	git ls-tree --abbrev=6 $tree >result &&
+	cut -f1 result | cut -d " " -f3 >expected &&
+	test_cmp current expected
+'
+
+test_expect_success 'usage: incompatible options: --name-only with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --name-only $tree
+'
+
+test_expect_success 'usage: incompatible options: --name-status with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --name-status $tree
+'
+
+test_expect_success 'usage: incompatible options: --long with --object-only' '
+	test_expect_code 129 git ls-tree --object-only --long $tree
+'
+
+test_done