[v4,3/3] ref-filter: add support for %(contents:size)

Message ID	20200710164739.6616-4-chriscool@tuxfamily.org (mailing list archive)
State	New, archived
Series	Add support for %(contents:size) in ref-filter:
	[v4,0/3] Add support for %(contents:size) in ref-filter
	[v4,1/3] Documentation: clarify %(contents:XXXX) doc
	[v4,2/3] t6300: test refs pointing to tree and blob
	[v4,3/3] ref-filter: add support for %(contents:size)

Headers

From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
        Christian Couder <chriscool@tuxfamily.org>
Subject: [PATCH v4 3/3] ref-filter: add support for %(contents:size)
Date: Fri, 10 Jul 2020 18:47:39 +0200
Message-Id: <20200710164739.6616-4-chriscool@tuxfamily.org>
In-Reply-To: <20200710164739.6616-1-chriscool@tuxfamily.org>
References: <20200707174049.21714-1-chriscool@tuxfamily.org>
 <20200710164739.6616-1-chriscool@tuxfamily.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Commit Message

Christian Couder July 10, 2020, 4:47 p.m. UTC

It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

As with %(contents), %(contents:size) is silently ignored if a
ref points to something other than a commit or a tag:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-for-each-ref.txt |  3 +++
 ref-filter.c                       |  7 ++++++-
 t/t6300-for-each-ref.sh            | 19 +++++++++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)
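
The off-by-one described in the commit message can be illustrated with a small C sketch. It assumes, for illustration only, that `%(contents)` expands to the raw message bytes (which normally end with their own newline) and that `git for-each-ref` then writes one extra `\n` after each formatted line. The helpers `contents_size()` and `emitted_bytes()` are hypothetical and not part of Git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: what %(contents:size) reports, the byte length of the message. */
static size_t contents_size(const char *message)
{
	assert(message != NULL);
	return strlen(message);
}

/*
 * Hypothetical: bytes written by `git for-each-ref --format='%(contents)'`
 * for one ref, assuming a single '\n' is appended after the expanded
 * format.  This is the number `wc -c` would report.
 */
static size_t emitted_bytes(const char *message)
{
	char line[1024];
	int n = snprintf(line, sizeof(line), "%s\n", message);

	assert(n >= 0 && (size_t)n < sizeof(line));
	return (size_t)n;
}
```

For the message "Initial commit\n", `contents_size()` returns 15 while `emitted_bytes()` returns 16, which is the extra byte that `wc -c` picks up.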

Comments

Junio C Hamano July 10, 2020, 8:38 p.m. UTC | #1

Christian Couder <christian.couder@gmail.com> writes:

> It's useful and efficient to be able to get the size of the
> contents directly without having to pipe through `wc -c`.
>
> Also the result of the following:
>
> `git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`
>
> is off by one as `git for-each-ref` appends a newline character
> after the contents, which can be seen by comparing its output
> with the output from `git cat-file`.
>
> As with %(contents), %(contents:size) is silently ignored if a
> ref points to something other than a commit or a tag:
>
> ```
> $ git update-ref refs/mytrees/first HEAD^{tree}
> $ git for-each-ref --format='%(contents)' refs/mytrees/first
>
> $ git for-each-ref --format='%(contents:size)' refs/mytrees/first
>
> ```
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/git-for-each-ref.txt |  3 +++
>  ref-filter.c                       |  7 ++++++-
>  t/t6300-for-each-ref.sh            | 19 +++++++++++++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index b739412c30..2ea71c5f6c 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -235,6 +235,9 @@ and `date` to extract the named component.
>  The message in a commit or a tag object is `contents`, from which
>  `contents:<part>` can be used to extract various parts out of:
>  
> +contents:size::
> +	The size in bytes of the commit or tag message.
> +
>  contents:subject::
>  	The first paragraph of the message, which typically is a
>  	single line, is taken as the "subject" of the commit or the

OK.

> diff --git a/ref-filter.c b/ref-filter.c
> index 8447cb09be..73d8bfa86d 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -127,7 +127,8 @@ static struct used_atom {
>  			unsigned int nobracket : 1, push : 1, push_remote : 1;
>  		} remote_ref;
>  		struct {
> -			enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
> +			enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
> +			       C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
>  			struct process_trailer_options trailer_opts;
>  			unsigned int nlines;
>  		} contents;
> @@ -338,6 +339,8 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
>  		atom->u.contents.option = C_BARE;
>  	else if (!strcmp(arg, "body"))
>  		atom->u.contents.option = C_BODY;
> +	else if (!strcmp(arg, "size"))
> +		atom->u.contents.option = C_LENGTH;
>  	else if (!strcmp(arg, "signature"))
>  		atom->u.contents.option = C_SIG;
>  	else if (!strcmp(arg, "subject"))
> @@ -1253,6 +1256,8 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
>  			v->s = copy_subject(subpos, sublen);
>  		else if (atom->u.contents.option == C_BODY_DEP)
>  			v->s = xmemdupz(bodypos, bodylen);
> +		else if (atom->u.contents.option == C_LENGTH)
> +			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
>  		else if (atom->u.contents.option == C_BODY)
>  			v->s = xmemdupz(bodypos, nonsiglen);
>  		else if (atom->u.contents.option == C_SIG)

OK.

> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index e9f468d360..467871ac10 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -52,6 +52,25 @@ test_atom() {

You need to stare at the precontext to see if the added lines are
correct.  We have these before the precontext of the patch:

	case "$1" in
		head) ref=refs/heads/master ;;
		 tag) ref=refs/tags/testtag ;;
		 sym) ref=refs/heads/sym ;;
		   *) ref=$1 ;;
	esac
	printf '%s\n' "$3" >expected
	test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
		git for-each-ref --format='%($2)' $ref >actual &&

Here it uses "$1" merely for reporting in the test title, while using
"$ref" as the reliable way to identify the ref uniquely by its full name.

>  		sanitize_pgp <actual >actual.clean &&
>  		test_cmp expected actual.clean
>  	"
> +	# Automatically test "contents:size" atom after testing "contents"
> +	if test "$2" = "contents"
> +	then
> +		case "$1" in
> +		refs/tags/signed-*)

Shouldn't this be $ref to be compared with full refnames like we see
below?

I know the callers won't pass 'head', 'tag' and 'sym' with
'contents' to this helper so the distinction may not currently
matter in practice, but still this use of "$1" does not sound quite
right, no?  

I actually was expecting you to switch on

	case $(git cat-file -t "$ref") in
	tag)
		...;;
	tree | blob)
		...;;
	commit)
		...;;
	esac

instead of the namespace, as %(contents:size) silently becomes empty
due to the underlying object type, not where the object that does
not support the "method" sits in the refs/ namespace.
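
The point above is that whether %(contents:size) is empty follows from the object type, not from the ref's namespace. A hedged C sketch of that decision is shown below; `expected_contents_size()` is a hypothetical helper that takes the type name as printed by `git cat-file -t` and writes the line a test would expect. Extracting the message itself (for example, stripping the header of a signed tag) is assumed to be done by the caller.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: choose the expected %(contents:size) output by
 * object type rather than by ref namespace.
 * Returns 0 on success, -1 for an unknown type.
 */
static int expected_contents_size(const char *type, const char *message,
				  char *out, size_t outlen)
{
	assert(type && out && outlen > 0);
	if (!strcmp(type, "tree") || !strcmp(type, "blob")) {
		/* No message: the atom silently expands to nothing. */
		out[0] = '\0';
		return 0;
	}
	if (!strcmp(type, "commit") || !strcmp(type, "tag")) {
		assert(message != NULL);
		snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)strlen(message));
		return 0;
	}
	return -1;
}
```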

> +			# We cannot use $3 as it expects sanitize_pgp to run
> +			expect=$(git cat-file tag $ref | tail -n +6 | wc -c) ;;
> +		refs/mytrees/* | refs/myblobs/*)
> +			expect='' ;;

Thanks for catching my thinko; I think I wrote 0 here in my
illustration.

> +		*)
> +			expect=$(printf '%s' "$3" | wc -c) ;;
> +		esac
> +		# Leave $expect unquoted to lose possible leading whitespaces
> +		echo $expect >expected

OK.
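
The unquoted `echo $expect` works because field splitting drops the leading blanks that some `wc` implementations (the BSD one shipped with macOS, for example) print before the count. The same normalization, written in C as an illustrative sketch with a hypothetical function name:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical: copy a `wc -c` result into `out` without its leading
 * and trailing whitespace, mimicking what the unquoted `echo $expect`
 * does to a single number (echo then adds its own newline).
 */
static void normalize_count(const char *in, char *out, size_t outlen)
{
	size_t len;

	assert(in && out && outlen > 0);
	while (isspace((unsigned char)*in))
		in++;
	len = strlen(in);
	while (len > 0 && isspace((unsigned char)in[len - 1]))
		len--;
	if (len >= outlen)
		len = outlen - 1;
	memcpy(out, in, len);
	out[len] = '\0';
}
```

For trees and blobs `expect` is empty, so `echo` writes a lone newline, matching the empty line `git for-each-ref` prints in that case.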

> +		test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" "
> +			git for-each-ref --format='%($2:size)' $ref >actual &&
> +			test_cmp expected actual
> +		"

This is harder to read than necessary; let's not say "$2" when we
know it is 'contents' and nothing else.  Also avoid a double-quoted
test body when you can.  The body is evaled, and the $ref we assigned
is visible inside the test just fine, so make it a habit to quote the
body with a pair of single quotes, i.e.

	test_expect_${4:-success} $PREREQ "basic atom: $1 contents:size" '
		git for-each-ref --format="%(contents:size)" "$ref" >actual &&
		test_cmp expected actual
	'

Thanks.

> +	fi
>  }
>  
>  hexlen=$(test_oid hexsz)

Patch

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index b739412c30..2ea71c5f6c 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,9 @@  and `date` to extract the named component.
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
+contents:size::
+	The size in bytes of the commit or tag message.
+
 contents:subject::
 	The first paragraph of the message, which typically is a
 	single line, is taken as the "subject" of the commit or the
diff --git a/ref-filter.c b/ref-filter.c
index 8447cb09be..73d8bfa86d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -127,7 +127,8 @@  static struct used_atom {
 			unsigned int nobracket : 1, push : 1, push_remote : 1;
 		} remote_ref;
 		struct {
-			enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
+			enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
+			       C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
@@ -338,6 +339,8 @@  static int contents_atom_parser(const struct ref_format *format, struct used_ato
 		atom->u.contents.option = C_BARE;
 	else if (!strcmp(arg, "body"))
 		atom->u.contents.option = C_BODY;
+	else if (!strcmp(arg, "size"))
+		atom->u.contents.option = C_LENGTH;
 	else if (!strcmp(arg, "signature"))
 		atom->u.contents.option = C_SIG;
 	else if (!strcmp(arg, "subject"))
@@ -1253,6 +1256,8 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = copy_subject(subpos, sublen);
 		else if (atom->u.contents.option == C_BODY_DEP)
 			v->s = xmemdupz(bodypos, bodylen);
+		else if (atom->u.contents.option == C_LENGTH)
+			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
 		else if (atom->u.contents.option == C_BODY)
 			v->s = xmemdupz(bodypos, nonsiglen);
 		else if (atom->u.contents.option == C_SIG)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index e9f468d360..467871ac10 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -52,6 +52,25 @@  test_atom() {
 		sanitize_pgp <actual >actual.clean &&
 		test_cmp expected actual.clean
 	"
+	# Automatically test "contents:size" atom after testing "contents"
+	if test "$2" = "contents"
+	then
+		case "$1" in
+		refs/tags/signed-*)
+			# We cannot use $3 as it expects sanitize_pgp to run
+			expect=$(git cat-file tag $ref | tail -n +6 | wc -c) ;;
+		refs/mytrees/* | refs/myblobs/*)
+			expect='' ;;
+		*)
+			expect=$(printf '%s' "$3" | wc -c) ;;
+		esac
+		# Leave $expect unquoted to lose possible leading whitespaces
+		echo $expect >expected
+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" "
+			git for-each-ref --format='%($2:size)' $ref >actual &&
+			test_cmp expected actual
+		"
+	fi
 }
 
 hexlen=$(test_oid hexsz)
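
The ref-filter.c change follows the existing pattern: the parser maps the `size` argument to the new `C_LENGTH` value, and the value-filling code formats the message length as a decimal string. The standalone sketch below reproduces that flow outside Git. `struct contents_atom`, `parse_contents_arg()` and `format_contents_size()` are hypothetical stand-ins for `used_atom`, `contents_atom_parser()` and the `C_LENGTH` branch of `grab_sub_body_contents()`, and plain `snprintf()` into a caller buffer replaces Git's allocating `xstrfmt()`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subset of the option enum from the patch, including the new C_LENGTH. */
enum contents_option { C_BARE, C_BODY, C_LENGTH, C_SUB };

/* Hypothetical stand-in for the parsed %(contents:...) atom. */
struct contents_atom {
	enum contents_option option;
};

/* Map the text after "contents:" to an option; NULL means bare %(contents). */
static int parse_contents_arg(const char *arg, struct contents_atom *atom)
{
	if (!arg)
		atom->option = C_BARE;
	else if (!strcmp(arg, "body"))
		atom->option = C_BODY;
	else if (!strcmp(arg, "size"))
		atom->option = C_LENGTH;
	else if (!strcmp(arg, "subject"))
		atom->option = C_SUB;
	else
		return -1; /* unrecognized argument */
	return 0;
}

/*
 * Format the size of the message starting at `subpos`, as the C_LENGTH
 * branch does: strlen() up to the terminating NUL, printed through
 * PRIuMAX after a cast to uintmax_t.
 */
static void format_contents_size(const char *subpos, char *out, size_t outlen)
{
	assert(subpos && out);
	snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)strlen(subpos));
}
```

Using `strlen()` assumes the object buffer is NUL-terminated (Git's object-reading code terminates the buffers it returns), and casting to `uintmax_t` for `PRIuMAX` avoids relying on a particular `printf` length modifier for `size_t`.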
