diff mbox series

[3/3] pretty: add abbrev option to %(describe)

Message ID 20211024014256.3569322-4-eschwartz@archlinux.org (mailing list archive)
State Superseded
Headers show
Series Add some more options to the pretty-formats | expand

Commit Message

Eli Schwartz Oct. 24, 2021, 1:42 a.m. UTC
The %(describe) placeholder by default, like `git describe`, uses a
seven-character abbreviated commit hash. This may not be sufficient to
fully describe all git repos, resulting in a placeholder replacement
changing its length because the repository grew in size. This could
cause the output of git-archive to change.

Add the --abbrev option to `git describe` to the placeholder interface
in order to provide tools to the user for fine-tuning project defaults
and ensure reproducible archives.

One alternative would be to just always specify --abbrev=40 but this may
be a bit too biased...

Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
---
 Documentation/pretty-formats.txt | 4 ++++
 pretty.c                         | 2 +-
 t/t4205-log-pretty-formats.sh    | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Oct. 24, 2021, 5:15 a.m. UTC | #1
Eli Schwartz <eschwartz@archlinux.org> writes:

> The %(describe) placeholder by default, like `git describe`, uses a
> seven-character abbreviated commit hash. This may not be sufficient to

"hash" -> "object name".

> fully describe all git repos, resulting in a placeholder replacement

"all git repos" -> "all commits in a given repository" (there may be
other valid way to clarify, but the point is that 'describe' does
not describe 'git repos' in the sense that my repository gets
description X while your repository gets description Y).

> changing its length because the repository grew in size. This could
> cause the output of git-archive to change.
>
> Add the --abbrev option to `git describe` to the placeholder interface
> in order to provide tools to the user for fine-tuning project defaults
> and ensure reproducible archives.

Note that it is sad that --abbrev=<n> does not necessarily ensure
reproducibility.  To be more precise, I do not think it sacrifices
uniqueness to make the output reproducible.  You can get more than N
hex-digits in the output if N is too small to ensure uniquness.

So it indeed is that this line of thought ...

> One alternative would be to just always specify --abbrev=40 but this may
> be a bit too biased...

... to use --abbrev=999 (because 40 is not the length of a full
object name in the SHA-2 world) is the only reasonable way, if what
you care about is the reproducibility.

    Side note.  I think "git describe --no-abbrev" is buggy in that
    it does not give a full object name; I didn't check the code,
    but it appears to be behaving the same way as "git describe
    --abbrev=0" (show no hexdigits).  Fixing this bug may possibly
    be a low-hanging fruit.

But even if the feature cannot be used to guarantee a full
reproducibility, it is a good thing that we can now add this feature
with minimum effort thanks to the previous two steps.

The refactoring I suggested in my review for the previous step will
shine, if we want to do a good job parsing the --abbrev=<n> option,
since such a code organization would make it a fairly easy addition
to introduce "integer" type that calls match_placeholder_arg_value()
to read the option value (like "string" does) and validate that the
value is indeed an integer.

Would we want to support "--contains" as another boolean type?  How
about "--all" and "--long"?  All three sound plausible candidates.

Thanks.
Eli Schwartz Oct. 24, 2021, 3:43 p.m. UTC | #2
On 10/24/21 1:15 AM, Junio C Hamano wrote:
> Eli Schwartz <eschwartz@archlinux.org> writes:
> 
>> The %(describe) placeholder by default, like `git describe`, uses a
>> seven-character abbreviated commit hash. This may not be sufficient to
> 
> "hash" -> "object name".
> 
>> fully describe all git repos, resulting in a placeholder replacement
> 
> "all git repos" -> "all commits in a given repository" (there may be
> other valid way to clarify, but the point is that 'describe' does
> not describe 'git repos' in the sense that my repository gets
> description X while your repository gets description Y).


Good points.


>> changing its length because the repository grew in size. This could
>> cause the output of git-archive to change.
>>
>> Add the --abbrev option to `git describe` to the placeholder interface
>> in order to provide tools to the user for fine-tuning project defaults
>> and ensure reproducible archives.
> 
> Note that it is sad that --abbrev=<n> does not necessarily ensure
> reproducibility.  To be more precise, I do not think it sacrifices
> uniqueness to make the output reproducible.  You can get more than N
> hex-digits in the output if N is too small to ensure uniquness.
> 
> So it indeed is that this line of thought ...
> 
>> One alternative would be to just always specify --abbrev=40 but this may
>> be a bit too biased...
> 
> ... to use --abbrev=999 (because 40 is not the length of a full
> object name in the SHA-2 world) is the only reasonable way, if what
> you care about is the reproducibility.


Right, I keep forgetting about the current work towards SHA-2... that
being said I somehow feel that 40 hex-digits will probably be reasonably
sufficient even if commit object ids can become longer than that. So it
is technically true...


>     Side note.  I think "git describe --no-abbrev" is buggy in that
>     it does not give a full object name; I didn't check the code,
>     but it appears to be behaving the same way as "git describe
>     --abbrev=0" (show no hexdigits).  Fixing this bug may possibly
>     be a low-hanging fruit.


I... did not realize that --abbrev, which takes an integer value, could
be specified with a leading --no- in the first place. :o


> But even if the feature cannot be used to guarantee a full
> reproducibility, it is a good thing that we can now add this feature
> with minimum effort thanks to the previous two steps.
> 
> The refactoring I suggested in my review for the previous step will
> shine, if we want to do a good job parsing the --abbrev=<n> option,
> since such a code organization would make it a fairly easy addition
> to introduce "integer" type that calls match_placeholder_arg_value()
> to read the option value (like "string" does) and validate that the
> value is indeed an integer.
> 
> Would we want to support "--contains" as another boolean type?  How
> about "--all" and "--long"?  All three sound plausible candidates.


I didn't have any immediate use for these options. To be honest, I don't
entirely understand the purpose of:

--all, which causes git describe to report things like "heads/master"

--contains, which causes git describe to report different output
depending on the status of later commits being tagged


They may have their uses, but I'm not sure those uses include writing
metadata to a git-archive tarball at least. I believe the results would
inevitably end up changing after the fact, whereas only matching
existing tags will tend to be pretty reliable as a tag is, in most
(all?) cases, pushed at the same time as, or before:
- the commit it describes
- commits which are later than the tag


On the other hand...

--long could be interesting to some people, although for, say,
generating software version numbers, a tagged release will typically
omit that information in my experience. For the sake of thoroughness I
could add that too.
diff mbox series

Patch

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 14107ac191..317c1382b5 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -221,6 +221,10 @@  The placeholders are:
 			  the same time.
 +
 ** 'tags[=<BOOL>]': Also consider lightweight tags.
+** 'abbrev=<N>': Instead of using the default number of hexadecimal digits
+   (which will vary according to the number of objects in the repository with a
+   default of 7) of the abbreviated object name, use <n> digits, or as many digits
+   as needed to form a unique object name.
 ** 'match=<pattern>': Only consider tags matching the given
    `glob(7)` pattern, excluding the "refs/tags/" prefix.
 ** 'exclude=<pattern>': Do not consider tags matching the given
diff --git a/pretty.c b/pretty.c
index 3a41bedf1a..a092457274 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1217,7 +1217,7 @@  int format_set_trailers_options(struct process_trailer_options *opts,
 static size_t parse_describe_args(const char *start, struct strvec *args)
 {
 	const char *options[] = { "tags" };
-	const char *option_arguments[] = { "match", "exclude" };
+	const char *option_arguments[] = { "match", "exclude", "abbrev" };
 	const char *arg = start;
 
 	for (;;) {
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index d4acf8882f..35eef4c865 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -1010,4 +1010,12 @@  test_expect_success '%(describe:tags) vs git describe --tags' '
 	test_cmp expect actual
 '
 
+test_expect_success '%(describe:abbrev=...) vs git describe --abbrev=...' '
+	test_when_finished "git tag -d tagname" &&
+	git tag -a -m tagged tagname &&
+	git describe --abbrev=15 >expect &&
+	git log -1 --format="%(describe:abbrev=15)" >actual &&
+	test_cmp expect actual
+'
+
 test_done