Message ID | 20200707174049.21714-5-chriscool@tuxfamily.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add support for %(contents:size) in ref-filter | expand |
Christian Couder <christian.couder@gmail.com> writes: > It's useful and efficient to be able to get the size of the > contents directly without having to pipe through `wc -c`. > > Also the result of the following: > > `git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c` > > is off by one as `git for-each-ref` appends a newline character > after the contents, which can be seen by comparing its output > with the output from `git cat-file`. > > As with %(contents), %(contents:size) is silently ignored, if a > ref points to something other than a commit or a tag: > > ``` > $ git update-ref refs/mytrees/first HEAD^{tree} > $ git for-each-ref --format='%(contents)' refs/mytrees/first > > $ git for-each-ref --format='%(contents:size)' refs/mytrees/first > > ``` > > Signed-off-by: Christian Couder <chriscool@tuxfamily.org> > --- > Documentation/git-for-each-ref.txt | 3 +++ > ref-filter.c | 7 ++++++- > t/t6300-for-each-ref.sh | 19 +++++++++++++++++++ > 3 files changed, 28 insertions(+), 1 deletion(-) Nice. The only questionable thing here is if we later regret for assuming that all sizes are always measured in bytes. If we later find an application that wants an efficient access to "| wc -l" (instead of "| wc -c" that motivated this patch), we'd want to be able to say "%(contents:lines)" and at that point we may want to go back in time and call this "%(contents:bytes)" or something. > test_atom head contents 'Initial > ' > +test_atom head contents:size '8' These two are tied together (any change to the test script that causes us to update the former also forces us to update the latter), but I do not think of a way to unify the test without writing too much boilerplate code, so let's say this is good enough at least for now (but I may change my opinion as I read along). > test_atom tag contents 'Tagging at 1151968727 > ' > +test_atom tag contents:size '22' Likewise. > @@ -580,6 +582,7 @@ test_atom refs/tags/subject-body contents 'the subject line > first body line > second body line > ' > +test_atom refs/tags/subject-body contents:size '51' Likewise. Of course, we _could_ update the test_atom to do something magic only when the 'contents' atom is being asked. We notice that $2 is 'contents', do the usual test_expect_success for 'contents', and then measure the byte length of $3 ourselves and test 'contents:size'. That way, all the above test updates would become unnecessary (and the last two hunks of this patch can also go). That approach may even allow you to hide the details of sanitize-pgp in the updated test_atom so that the actual tests may not have to get updated even for signed tags. > +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp > +test_tag_contents_size_pgp () { > + ref="$1" > + test_expect_success $PREREQ "basic atom: $ref contents:size" " > + git cat-file tag $ref | tail -n +6 | wc -c >expected && > + git for-each-ref --format='%(contents:size)' $ref >actual && > + test_cmp expected actual > + " > +} > + > PREREQ=GPG > test_atom refs/tags/signed-empty subject '' > test_atom refs/tags/signed-empty contents:subject '' > @@ -629,6 +643,7 @@ test_atom refs/tags/signed-empty body "$sig" > test_atom refs/tags/signed-empty contents:body '' > test_atom refs/tags/signed-empty contents:signature "$sig" > test_atom refs/tags/signed-empty contents "$sig" > +test_tag_contents_size_pgp refs/tags/signed-empty > > test_atom refs/tags/signed-short subject 'subject line' > test_atom refs/tags/signed-short contents:subject 'subject line' > @@ -637,6 +652,7 @@ test_atom refs/tags/signed-short contents:body '' > test_atom refs/tags/signed-short contents:signature "$sig" > test_atom refs/tags/signed-short contents "subject line > $sig" > +test_tag_contents_size_pgp refs/tags/signed-short > > test_atom refs/tags/signed-long subject 'subject line' > test_atom refs/tags/signed-long contents:subject 'subject line' > @@ -649,6 +665,7 @@ test_atom refs/tags/signed-long contents "subject line > > body contents > $sig" > +test_tag_contents_size_pgp refs/tags/signed-long > > test_expect_success 'set up refs pointing to tree and blob' ' > git update-ref refs/mytrees/first refs/heads/master^{tree} && > @@ -664,6 +681,7 @@ test_atom refs/mytrees/first body "" > test_atom refs/mytrees/first contents:body "" > test_atom refs/mytrees/first contents:signature "" > test_atom refs/mytrees/first contents "" > +test_atom refs/mytrees/first contents:size "" > > test_atom refs/myblobs/first subject "" > test_atom refs/myblobs/first contents:subject "" > @@ -671,6 +689,7 @@ test_atom refs/myblobs/first body "" > test_atom refs/myblobs/first contents:body "" > test_atom refs/myblobs/first contents:signature "" > test_atom refs/myblobs/first contents "" > +test_atom refs/myblobs/first contents:size "" > > test_expect_success 'set up multiple-sort tags' ' > for when in 100000 200000
Christian Couder <christian.couder@gmail.com> writes: > + else if (atom->u.contents.option == C_LENGTH) > + v->s = xstrfmt("%ld", strlen(subpos)); Please squash something like this in. 32-bit builds are failing. ref-filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ref-filter.c b/ref-filter.c index 8ec28f0535..73d8bfa86d 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1257,7 +1257,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf) else if (atom->u.contents.option == C_BODY_DEP) v->s = xmemdupz(bodypos, bodylen); else if (atom->u.contents.option == C_LENGTH) - v->s = xstrfmt("%ld", strlen(subpos)); + v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos)); else if (atom->u.contents.option == C_BODY) v->s = xmemdupz(bodypos, nonsiglen); else if (atom->u.contents.option == C_SIG)
Christian Couder <christian.couder@gmail.com> writes: > +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp > +test_tag_contents_size_pgp () { > + ref="$1" > + test_expect_success $PREREQ "basic atom: $ref contents:size" " > + git cat-file tag $ref | tail -n +6 | wc -c >expected && > + git for-each-ref --format='%(contents:size)' $ref >actual && > + test_cmp expected actual > + " > +} Of course, this will BREAK the tests on macOS and possibly others with "wc" that emits leading whitespaces before the number. The tip of 'seen' has been failing ever since this topic was merged there e.g. https://travis-ci.org/github/git/git/jobs/705986794 Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Of course, we _could_ update the test_atom to do something magic > only when the 'contents' atom is being asked. We notice that $2 is > 'contents', do the usual test_expect_success for 'contents', and > then measure the byte length of $3 ourselves and test > 'contents:size'. That way, all the above test updates would become > unnecessary (and the last two hunks of this patch can also go). > > That approach may even allow you to hide the details of sanitize-pgp > in the updated test_atom so that the actual tests may not have to get > updated even for signed tags. After seeing the "wc -c" portability issues, I am now even more inclined to say that the above is the right direction. The portability worries can and should be encapsulated in a single test_atom helper function, just as it can be used to hide the differences between signed tags, annotated tags and commits. Thanks. >> +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp >> +test_tag_contents_size_pgp () { >> + ref="$1" >> + test_expect_success $PREREQ "basic atom: $ref contents:size" " >> + git cat-file tag $ref | tail -n +6 | wc -c >expected && >> + git for-each-ref --format='%(contents:size)' $ref >actual && >> + test_cmp expected actual >> + " >> +}
On Thu, Jul 9, 2020 at 2:14 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > Of course, we _could_ update the test_atom to do something magic > > only when the 'contents' atom is being asked. We notice that $2 is > > 'contents', do the usual test_expect_success for 'contents', and > > then measure the byte length of $3 ourselves and test > > 'contents:size'. That way, all the above test updates would become > > unnecessary (and the last two hunks of this patch can also go). > > > > That approach may even allow you to hide the details of sanitize-pgp > > in the updated test_atom so that the actual tests may not have to get > > updated even for signed tags. > > After seeing the "wc -c" portability issues, I am now even more > inclined to say that the above is the right direction. The > portability worries can and should be encapsulated in a single > test_atom helper function, just as it can be used to hide the > differences between signed tags, annotated tags and commits. Yeah, I have been working on that and will send a new patch series soon. The current test_atom() change looks like this: diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index 371e45e5ad..e514d98574 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -52,6 +52,26 @@ test_atom() { sanitize_pgp <actual >actual.clean && test_cmp expected actual.clean " + # Automatically test "contents:size" atom after testing "contents" + if test "$2" = "contents" + then + case "$1" in + refs/tags/signed-*) + # We cannot use $3 as it expects sanitize_pgp to run + git cat-file tag $ref | tail -n +6 | \ + wc -c | sed -e 's/^ *//' >expected ;; + refs/mytrees/*) + echo >expected ;; + refs/myblobs/*) + echo >expected ;; + *) + printf '%s' "$3" | wc -c | sed -e 's/^ *//' >expected ;; + esac + test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" " + git for-each-ref --format='%($2:size)' $ref >actual && + test_cmp expected actual + " + fi } I am wondering if it's worth adding a preparatory patch to introduce an helper function like the following in test-lib-functions.sh: +# test_byte_count outputs the number of bytes in files or stdin +# +# It is like wc -c but without portability issues, as on macOS and +# possibly other platforms leading whitespaces are emitted before the +# number. + +test_byte_count () { + wc -c "$@" | sed -e 's/^ *//' +} Not sure about the name of this helper function as it works differently than test_line_count().
Christian Couder <christian.couder@gmail.com> writes: > Yeah, I have been working on that and will send a new patch series soon. > The current test_atom() change looks like this: > > diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh > index 371e45e5ad..e514d98574 100755 > --- a/t/t6300-for-each-ref.sh > +++ b/t/t6300-for-each-ref.sh > @@ -52,6 +52,26 @@ test_atom() { > sanitize_pgp <actual >actual.clean && > test_cmp expected actual.clean > " > + # Automatically test "contents:size" atom after testing "contents" > + if test "$2" = "contents" > + then > + case "$1" in > + refs/tags/signed-*) > + # We cannot use $3 as it expects sanitize_pgp to run > + git cat-file tag $ref | tail -n +6 | \ > + wc -c | sed -e 's/^ *//' >expected ;; > + refs/mytrees/*) > + echo >expected ;; > + refs/myblobs/*) > + echo >expected ;; > + *) > + printf '%s' "$3" | wc -c | sed -e 's/^ *//' >expected ;; > + esac > + test_expect_${4:-success} $PREREQ "basic atom: $1 $2:size" " > + git for-each-ref --format='%($2:size)' $ref >actual && > + test_cmp expected actual > + " > + fi > } > > I am wondering if it's worth adding a preparatory patch to introduce > an helper function like the following in test-lib-functions.sh: > > +# test_byte_count outputs the number of bytes in files or stdin > +# > +# It is like wc -c but without portability issues, as on macOS and > +# possibly other platforms leading whitespaces are emitted before the > +# number. > + > +test_byte_count () { > + wc -c "$@" | sed -e 's/^ *//' > +} > > Not sure about the name of this helper function as it works > differently than test_line_count(). Yeah, if I were writing it, I'd call it "sane_wc_c" or something. But more importantly, I think the invocation of "|sed" is overkill. If I were writing it, I would go more like... if test $2 = contents then case "$1" in ...) expect=$(git cat-file ... | wc -c) ;; refs/mytrees/* | refs/myblobs/*) expect=0 ;; *) expect=$(printf ... | wc -c) ;; esac # leave $expect unquoted to lose possible leading whitespaces echo $expect >expect test_expect_success "..." ' ... test_cmp expect actual ' fi
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index 788258c3ad..049bc93e6a 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -236,6 +236,9 @@ The complete message (subject, body, trailers and signature) of a commit or tag object is `contents`. This field can also be used in the following ways: +contents:size:: + The size in bytes of the complete message. + contents:subject:: The "subject" of the commit or tag message. It's actually the concatenation of all lines of the commit message up to the diff --git a/ref-filter.c b/ref-filter.c index 8447cb09be..8ec28f0535 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -127,7 +127,8 @@ static struct used_atom { unsigned int nobracket : 1, push : 1, push_remote : 1; } remote_ref; struct { - enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, C_SUB, C_TRAILERS } option; + enum { C_BARE, C_BODY, C_BODY_DEP, C_LENGTH, + C_LINES, C_SIG, C_SUB, C_TRAILERS } option; struct process_trailer_options trailer_opts; unsigned int nlines; } contents; @@ -338,6 +339,8 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato atom->u.contents.option = C_BARE; else if (!strcmp(arg, "body")) atom->u.contents.option = C_BODY; + else if (!strcmp(arg, "size")) + atom->u.contents.option = C_LENGTH; else if (!strcmp(arg, "signature")) atom->u.contents.option = C_SIG; else if (!strcmp(arg, "subject")) @@ -1253,6 +1256,8 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf) v->s = copy_subject(subpos, sublen); else if (atom->u.contents.option == C_BODY_DEP) v->s = xmemdupz(bodypos, bodylen); + else if (atom->u.contents.option == C_LENGTH) + v->s = xstrfmt("%ld", strlen(subpos)); else if (atom->u.contents.option == C_BODY) v->s = xmemdupz(bodypos, nonsiglen); else if (atom->u.contents.option == C_SIG) diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index 371e45e5ad..b580e27a32 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -125,6 +125,7 @@ test_atom head contents:body '' test_atom head contents:signature '' test_atom head contents 'Initial ' +test_atom head contents:size '8' test_atom head HEAD '*' test_atom tag refname refs/tags/testtag @@ -170,6 +171,7 @@ test_atom tag contents:body '' test_atom tag contents:signature '' test_atom tag contents 'Tagging at 1151968727 ' +test_atom tag contents:size '22' test_atom tag HEAD ' ' test_expect_success 'Check invalid atoms names are errors' ' @@ -580,6 +582,7 @@ test_atom refs/tags/subject-body contents 'the subject line first body line second body line ' +test_atom refs/tags/subject-body contents:size '51' test_expect_success 'create tag with multiline subject' ' cat >msg <<-\EOF && @@ -606,6 +609,7 @@ second subject line first body line second body line ' +test_atom refs/tags/multiline contents:size '73' test_expect_success GPG 'create signed tags' ' git tag -s -m "" signed-empty && @@ -622,6 +626,16 @@ sig='-----BEGIN PGP SIGNATURE----- -----END PGP SIGNATURE----- ' +# We cannot use test_atom to check contents:size of signed tags due to sanitize_pgp +test_tag_contents_size_pgp () { + ref="$1" + test_expect_success $PREREQ "basic atom: $ref contents:size" " + git cat-file tag $ref | tail -n +6 | wc -c >expected && + git for-each-ref --format='%(contents:size)' $ref >actual && + test_cmp expected actual + " +} + PREREQ=GPG test_atom refs/tags/signed-empty subject '' test_atom refs/tags/signed-empty contents:subject '' @@ -629,6 +643,7 @@ test_atom refs/tags/signed-empty body "$sig" test_atom refs/tags/signed-empty contents:body '' test_atom refs/tags/signed-empty contents:signature "$sig" test_atom refs/tags/signed-empty contents "$sig" +test_tag_contents_size_pgp refs/tags/signed-empty test_atom refs/tags/signed-short subject 'subject line' test_atom refs/tags/signed-short contents:subject 'subject line' @@ -637,6 +652,7 @@ test_atom refs/tags/signed-short contents:body '' test_atom refs/tags/signed-short contents:signature "$sig" test_atom refs/tags/signed-short contents "subject line $sig" +test_tag_contents_size_pgp refs/tags/signed-short test_atom refs/tags/signed-long subject 'subject line' test_atom refs/tags/signed-long contents:subject 'subject line' @@ -649,6 +665,7 @@ test_atom refs/tags/signed-long contents "subject line body contents $sig" +test_tag_contents_size_pgp refs/tags/signed-long test_expect_success 'set up refs pointing to tree and blob' ' git update-ref refs/mytrees/first refs/heads/master^{tree} && @@ -664,6 +681,7 @@ test_atom refs/mytrees/first body "" test_atom refs/mytrees/first contents:body "" test_atom refs/mytrees/first contents:signature "" test_atom refs/mytrees/first contents "" +test_atom refs/mytrees/first contents:size "" test_atom refs/myblobs/first subject "" test_atom refs/myblobs/first contents:subject "" @@ -671,6 +689,7 @@ test_atom refs/myblobs/first body "" test_atom refs/myblobs/first contents:body "" test_atom refs/myblobs/first contents:signature "" test_atom refs/myblobs/first contents "" +test_atom refs/myblobs/first contents:size "" test_expect_success 'set up multiple-sort tags' ' for when in 100000 200000
It's useful and efficient to be able to get the size of the contents directly without having to pipe through `wc -c`. Also the result of the following: `git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c` is off by one as `git for-each-ref` appends a newline character after the contents, which can be seen by comparing its output with the output from `git cat-file`. As with %(contents), %(contents:size) is silently ignored, if a ref points to something other than a commit or a tag: ``` $ git update-ref refs/mytrees/first HEAD^{tree} $ git for-each-ref --format='%(contents)' refs/mytrees/first $ git for-each-ref --format='%(contents:size)' refs/mytrees/first ``` Signed-off-by: Christian Couder <chriscool@tuxfamily.org> --- Documentation/git-for-each-ref.txt | 3 +++ ref-filter.c | 7 ++++++- t/t6300-for-each-ref.sh | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-)