[v2,2/2] ref-filter: add support for %(contents:size)

Message ID	20200702140845.24945-3-chriscool@tuxfamily.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=gkpi=AN=vger.kernel.org=git-owner@kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>, Christian Couder <chriscool@tuxfamily.org> Subject: [PATCH v2 2/2] ref-filter: add support for %(contents:size) Date: Thu, 2 Jul 2020 16:08:45 +0200 Message-Id: <20200702140845.24945-3-chriscool@tuxfamily.org> In-Reply-To: <20200702140845.24945-1-chriscool@tuxfamily.org> References: <20200702140845.24945-1-chriscool@tuxfamily.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Add support for %(contents:size) in ref-filter \| expand [v2,0/2] Add support for %(contents:size) in ref-filter [v2,1/2] Documentation: clarify %(contents:XXXX) doc [v2,2/2] ref-filter: add support for %(contents:size)

Message ID

20200702140845.24945-3-chriscool@tuxfamily.org (mailing list archive)

State

New, archived

Headers

From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
        Christian Couder <chriscool@tuxfamily.org>
Subject: [PATCH v2 2/2] ref-filter: add support for %(contents:size)
Date: Thu,  2 Jul 2020 16:08:45 +0200
Message-Id: <20200702140845.24945-3-chriscool@tuxfamily.org>
In-Reply-To: <20200702140845.24945-1-chriscool@tuxfamily.org>
References: <20200702140845.24945-1-chriscool@tuxfamily.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

Add support for %(contents:size) in ref-filter | expand

Commit Message

Christian Couder July 2, 2020, 2:08 p.m. UTC

It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its ouput
with the output from `git cat-file`.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/git-for-each-ref.txt |  3 +++
 ref-filter.c                       |  7 ++++++-
 t/t6300-for-each-ref.sh            | 17 +++++++++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

Comments

Junio C Hamano July 6, 2020, 9:44 p.m. UTC | #1

Christian Couder <christian.couder@gmail.com> writes:

> It's useful and efficient to be able to get the size of the
> contents directly without having to pipe through `wc -c`.
>
> Also the result of the following:
>
> `git for-each-ref --format='%(contents)' | wc -c`
>
> is off by one as `git for-each-ref` appends a newline character
> after the contents, which can be seen by comparing its ouput
> with the output from `git cat-file`.

So that's off by number of refs that are shown?

> +contents:size::
> +	The size in bytes of the complete message.
> +

Complete as opposed to what?  

What happens when the object referred to by the ref is not a commit
or a tag?

I am fine if it just is silently ignored (which is consistent with
already existing behaviour of other requests that do not make sense
for the given type) if the thing is a blob or a tree, but we'd need
to cover the case with a test or two.  It seems you only expect this
with a tag object and do not have any test that checks for other
types of objects?

Thanks.

> +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
> +test_tag_contents_size_pgp () {
> +	ref="$1"
> +	test_expect_success $PREREQ "basic atom: $ref contents:size" "
> +		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
> +		git for-each-ref --format='%(contents:size)' $ref >actual &&
> +		test_cmp expected actual
> +	"
> +}
> +
>  PREREQ=GPG
>  test_atom refs/tags/signed-empty subject ''
>  test_atom refs/tags/signed-empty contents:subject ''
> @@ -629,6 +643,7 @@ test_atom refs/tags/signed-empty body "$sig"
>  test_atom refs/tags/signed-empty contents:body ''
>  test_atom refs/tags/signed-empty contents:signature "$sig"
>  test_atom refs/tags/signed-empty contents "$sig"
> +test_tag_contents_size_pgp refs/tags/signed-empty
>  
>  test_atom refs/tags/signed-short subject 'subject line'
>  test_atom refs/tags/signed-short contents:subject 'subject line'
> @@ -637,6 +652,7 @@ test_atom refs/tags/signed-short contents:body ''
>  test_atom refs/tags/signed-short contents:signature "$sig"
>  test_atom refs/tags/signed-short contents "subject line
>  $sig"
> +test_tag_contents_size_pgp refs/tags/signed-short
>  
>  test_atom refs/tags/signed-long subject 'subject line'
>  test_atom refs/tags/signed-long contents:subject 'subject line'
> @@ -649,6 +665,7 @@ test_atom refs/tags/signed-long contents "subject line
>  
>  body contents
>  $sig"
> +test_tag_contents_size_pgp refs/tags/signed-long
>  
>  test_expect_success 'set up multiple-sort tags' '
>  	for when in 100000 200000

Christian Couder July 7, 2020, 8:40 a.m. UTC | #2

On Mon, Jul 6, 2020 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > It's useful and efficient to be able to get the size of the
> > contents directly without having to pipe through `wc -c`.
> >
> > Also the result of the following:
> >
> > `git for-each-ref --format='%(contents)' | wc -c`
> >
> > is off by one as `git for-each-ref` appends a newline character
> > after the contents, which can be seen by comparing its ouput
> > with the output from `git cat-file`.
>
> So that's off by number of refs that are shown?

Yeah, true. I should have added a ref as I mean something like the
following is off by one:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

I have changed it to the above in my current version.

> > +contents:size::
> > +     The size in bytes of the complete message.
> > +
>
> Complete as opposed to what?

In the existing documentation there is already "The complete message
of a commit or tag object is `contents`. "

So yeah I could add another preparatory patch to change the above to
something like:

"The complete message (subject, body, trailers and signature) of a
commit or tag object is `contents`. "

> What happens when the object referred to by the ref is not a commit
> or a tag?

Right now %(contents) shows nothing (a blank line actually) when the
ref points to something other than a commit or a tag, and
%(contents:size) does the same:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

I am not sure if it's worth updating the existing documentation or
just the commit message of this patch. For now I have done the latter
in my current version.

> I am fine if it just is silently ignored (which is consistent with
> already existing behaviour of other requests that do not make sense
> for the given type) if the thing is a blob or a tree, but we'd need
> to cover the case with a test or two.  It seems you only expect this
> with a tag object and do not have any test that checks for other
> types of objects?

My patch already adds 1 test with a commit:

--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -125,6 +125,7 @@ test_atom head contents:body ''
 test_atom head contents:signature ''
 test_atom head contents 'Initial
 '
+test_atom head contents:size '8'
 test_atom head HEAD '*'

There is only one test with a commit, because that's already the case
for %(contents) too.

I am ok with adding another preparatory patch to the series that would
add a few more test cases with commits, trees and blobs though.

Junio C Hamano July 7, 2020, 3:28 p.m. UTC | #3

Christian Couder <christian.couder@gmail.com> writes:

>> I am fine if it just is silently ignored (which is consistent with
>> already existing behaviour of other requests that do not make sense
>> for the given type) if the thing is a blob or a tree, but we'd need
>> to cover the case with a test or two.  It seems you only expect this
>> with a tag object and do not have any test that checks for other
>> types of objects?
>
> My patch already adds 1 test with a commit:

But that is not an interesting case, no?  Unless I missed that there
were new tests to see what happens with a blob and a tree, that is.

> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -125,6 +125,7 @@ test_atom head contents:body ''
>  test_atom head contents:signature ''
>  test_atom head contents 'Initial
>  '
> +test_atom head contents:size '8'
>  test_atom head HEAD '*'
>
> There is only one test with a commit, because that's already the case
> for %(contents) too.
>
> I am ok with adding another preparatory patch to the series that would
> add a few more test cases with commits, trees and blobs though.

I am OK either way, because I know fairly well how these formatting atom
system works (I had heavy involvement in the initial one) and I know
it would be hard to make it do nonsensical things when given an
object of an unexpected type.

But I was hoping some among us experienced contributors would lead
beginners by example of making sure we test both positive ("see, my
shiny new toy does work") and negative ("and it won't do nonsensical
things when given an input it is not designed to work with") cases.

Thanks.

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2db9779d54..833641eacd 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,9 @@  and `date` to extract the named component.
 The complete message of a commit or tag object is `contents`. This
 field can also be used in the following ways:
 
+contents:size::
+	The size in bytes of the complete message.
+
 contents:subject::
 	The "subject" of the commit or tag message.  It's actually the
 	concatenation of all lines of the commit message up to the
diff --git a/ref-filter.c b/ref-filter.c
index bf7b70299b..036a95d0d2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -127,7 +127,8 @@  static struct used_atom {
 			unsigned int nobracket : 1, push : 1, push_remote : 1;
 		} remote_ref;
 		struct {
-			enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
+			enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH,
+			       C_LINES, C_SIG, C_SUB, C_TRAILERS } option;
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
@@ -338,6 +339,8 @@  static int contents_atom_parser(const struct ref_format *format, struct used_ato
 		atom->u.contents.option = C_BARE;
 	else if (!strcmp(arg, "body"))
 		atom->u.contents.option = C_BODY;
+	else if (!strcmp(arg, "size"))
+		atom->u.contents.option = C_LENGTH;
 	else if (!strcmp(arg, "signature"))
 		atom->u.contents.option = C_SIG;
 	else if (!strcmp(arg, "subject"))
@@ -1253,6 +1256,8 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = copy_subject(subpos, sublen);
 		else if (atom->u.contents.option == C_BODY_DEP)
 			v->s = xmemdupz(bodypos, bodylen);
+		else if (atom->u.contents.option == C_LENGTH)
+			v->s = xstrfmt("%ld", strlen(subpos));
 		else if (atom->u.contents.option == C_BODY)
 			v->s = xmemdupz(bodypos, nonsiglen);
 		else if (atom->u.contents.option == C_SIG)
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index da59fadc5d..ceee8d9372 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -125,6 +125,7 @@  test_atom head contents:body ''
 test_atom head contents:signature ''
 test_atom head contents 'Initial
 '
+test_atom head contents:size '8'
 test_atom head HEAD '*'
 
 test_atom tag refname refs/tags/testtag
@@ -170,6 +171,7 @@  test_atom tag contents:body ''
 test_atom tag contents:signature ''
 test_atom tag contents 'Tagging at 1151968727
 '
+test_atom tag contents:size '22'
 test_atom tag HEAD ' '
 
 test_expect_success 'Check invalid atoms names are errors' '
@@ -580,6 +582,7 @@  test_atom refs/tags/subject-body contents 'the subject line
 first body line
 second body line
 '
+test_atom refs/tags/subject-body contents:size '51'
 
 test_expect_success 'create tag with multiline subject' '
 	cat >msg <<-\EOF &&
@@ -606,6 +609,7 @@  second subject line
 first body line
 second body line
 '
+test_atom refs/tags/multiline contents:size '73'
 
 test_expect_success GPG 'create signed tags' '
 	git tag -s -m "" signed-empty &&
@@ -622,6 +626,16 @@  sig='-----BEGIN PGP SIGNATURE-----
 -----END PGP SIGNATURE-----
 '
 
+# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp
+test_tag_contents_size_pgp () {
+	ref="$1"
+	test_expect_success $PREREQ "basic atom: $ref contents:size" "
+		git cat-file tag $ref | tail -n +6 | wc -c >expected &&
+		git for-each-ref --format='%(contents:size)' $ref >actual &&
+		test_cmp expected actual
+	"
+}
+
 PREREQ=GPG
 test_atom refs/tags/signed-empty subject ''
 test_atom refs/tags/signed-empty contents:subject ''
@@ -629,6 +643,7 @@  test_atom refs/tags/signed-empty body "$sig"
 test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
+test_tag_contents_size_pgp refs/tags/signed-empty
 
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -637,6 +652,7 @@  test_atom refs/tags/signed-short contents:body ''
 test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
+test_tag_contents_size_pgp refs/tags/signed-short
 
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -649,6 +665,7 @@  test_atom refs/tags/signed-long contents "subject line
 
 body contents
 $sig"
+test_tag_contents_size_pgp refs/tags/signed-long
 
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000

[v2,2/2] ref-filter: add support for %(contents:size)

Commit Message

Comments

Patch