Message ID | 20211024014256.3569322-4-eschwartz@archlinux.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add some more options to the pretty-formats | expand |
Eli Schwartz <eschwartz@archlinux.org> writes: > The %(describe) placeholder by default, like `git describe`, uses a > seven-character abbreviated commit hash. This may not be sufficient to "hash" -> "object name". > fully describe all git repos, resulting in a placeholder replacement "all git repos" -> "all commits in a given repository" (there may be other valid way to clarify, but the point is that 'describe' does not describe 'git repos' in the sense that my repository gets description X while your repository gets description Y). > changing its length because the repository grew in size. This could > cause the output of git-archive to change. > > Add the --abbrev option to `git describe` to the placeholder interface > in order to provide tools to the user for fine-tuning project defaults > and ensure reproducible archives. Note that it is sad that --abbrev=<n> does not necessarily ensure reproducibility. To be more precise, I do not think it sacrifices uniqueness to make the output reproducible. You can get more than N hex-digits in the output if N is too small to ensure uniquness. So it indeed is that this line of thought ... > One alternative would be to just always specify --abbrev=40 but this may > be a bit too biased... ... to use --abbrev=999 (because 40 is not the length of a full object name in the SHA-2 world) is the only reasonable way, if what you care about is the reproducibility. Side note. I think "git describe --no-abbrev" is buggy in that it does not give a full object name; I didn't check the code, but it appears to be behaving the same way as "git describe --abbrev=0" (show no hexdigits). Fixing this bug may possibly be a low-hanging fruit. But even if the feature cannot be used to guarantee a full reproducibility, it is a good thing that we can now add this feature with minimum effort thanks to the previous two steps. The refactoring I suggested in my review for the previous step will shine, if we want to do a good job parsing the --abbrev=<n> option, since such a code organization would make it a fairly easy addition to introduce "integer" type that calls match_placeholder_arg_value() to read the option value (like "string" does) and validate that the value is indeed an integer. Would we want to support "--contains" as another boolean type? How about "--all" and "--long"? All three sound plausible candidates. Thanks.
On 10/24/21 1:15 AM, Junio C Hamano wrote: > Eli Schwartz <eschwartz@archlinux.org> writes: > >> The %(describe) placeholder by default, like `git describe`, uses a >> seven-character abbreviated commit hash. This may not be sufficient to > > "hash" -> "object name". > >> fully describe all git repos, resulting in a placeholder replacement > > "all git repos" -> "all commits in a given repository" (there may be > other valid way to clarify, but the point is that 'describe' does > not describe 'git repos' in the sense that my repository gets > description X while your repository gets description Y). Good points. >> changing its length because the repository grew in size. This could >> cause the output of git-archive to change. >> >> Add the --abbrev option to `git describe` to the placeholder interface >> in order to provide tools to the user for fine-tuning project defaults >> and ensure reproducible archives. > > Note that it is sad that --abbrev=<n> does not necessarily ensure > reproducibility. To be more precise, I do not think it sacrifices > uniqueness to make the output reproducible. You can get more than N > hex-digits in the output if N is too small to ensure uniquness. > > So it indeed is that this line of thought ... > >> One alternative would be to just always specify --abbrev=40 but this may >> be a bit too biased... > > ... to use --abbrev=999 (because 40 is not the length of a full > object name in the SHA-2 world) is the only reasonable way, if what > you care about is the reproducibility. Right, I keep forgetting about the current work towards SHA-2... that being said I somehow feel that 40 hex-digits will probably be reasonably sufficient even if commit object ids can become longer than that. So it is technically true... > Side note. I think "git describe --no-abbrev" is buggy in that > it does not give a full object name; I didn't check the code, > but it appears to be behaving the same way as "git describe > --abbrev=0" (show no hexdigits). Fixing this bug may possibly > be a low-hanging fruit. I... did not realize that --abbrev, which takes an integer value, could be specified with a leading --no- in the first place. :o > But even if the feature cannot be used to guarantee a full > reproducibility, it is a good thing that we can now add this feature > with minimum effort thanks to the previous two steps. > > The refactoring I suggested in my review for the previous step will > shine, if we want to do a good job parsing the --abbrev=<n> option, > since such a code organization would make it a fairly easy addition > to introduce "integer" type that calls match_placeholder_arg_value() > to read the option value (like "string" does) and validate that the > value is indeed an integer. > > Would we want to support "--contains" as another boolean type? How > about "--all" and "--long"? All three sound plausible candidates. I didn't have any immediate use for these options. To be honest, I don't entirely understand the purpose of: --all, which causes git describe to report things like "heads/master" --contains, which causes git describe to report different output depending on the status of later commits being tagged They may have their uses, but I'm not sure those uses include writing metadata to a git-archive tarball at least. I believe the results would inevitably end up changing after the fact, whereas only matching existing tags will tend to be pretty reliable as a tag is, in most (all?) cases, pushed at the same time as, or before: - the commit it describes - commits which are later than the tag On the other hand... --long could be interesting to some people, although for, say, generating software version numbers, a tagged release will typically omit that information in my experience. For the sake of thoroughness I could add that too.
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt index 14107ac191..317c1382b5 100644 --- a/Documentation/pretty-formats.txt +++ b/Documentation/pretty-formats.txt @@ -221,6 +221,10 @@ The placeholders are: the same time. + ** 'tags[=<BOOL>]': Also consider lightweight tags. +** 'abbrev=<N>': Instead of using the default number of hexadecimal digits + (which will vary according to the number of objects in the repository with a + default of 7) of the abbreviated object name, use <n> digits, or as many digits + as needed to form a unique object name. ** 'match=<pattern>': Only consider tags matching the given `glob(7)` pattern, excluding the "refs/tags/" prefix. ** 'exclude=<pattern>': Do not consider tags matching the given diff --git a/pretty.c b/pretty.c index 3a41bedf1a..a092457274 100644 --- a/pretty.c +++ b/pretty.c @@ -1217,7 +1217,7 @@ int format_set_trailers_options(struct process_trailer_options *opts, static size_t parse_describe_args(const char *start, struct strvec *args) { const char *options[] = { "tags" }; - const char *option_arguments[] = { "match", "exclude" }; + const char *option_arguments[] = { "match", "exclude", "abbrev" }; const char *arg = start; for (;;) { diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh index d4acf8882f..35eef4c865 100755 --- a/t/t4205-log-pretty-formats.sh +++ b/t/t4205-log-pretty-formats.sh @@ -1010,4 +1010,12 @@ test_expect_success '%(describe:tags) vs git describe --tags' ' test_cmp expect actual ' +test_expect_success '%(describe:abbrev=...) vs git describe --abbrev=...' ' + test_when_finished "git tag -d tagname" && + git tag -a -m tagged tagname && + git describe --abbrev=15 >expect && + git log -1 --format="%(describe:abbrev=15)" >actual && + test_cmp expect actual +' + test_done
The %(describe) placeholder by default, like `git describe`, uses a seven-character abbreviated commit hash. This may not be sufficient to fully describe all git repos, resulting in a placeholder replacement changing its length because the repository grew in size. This could cause the output of git-archive to change. Add the --abbrev option to `git describe` to the placeholder interface in order to provide tools to the user for fine-tuning project defaults and ensure reproducible archives. One alternative would be to just always specify --abbrev=40 but this may be a bit too biased... Signed-off-by: Eli Schwartz <eschwartz@archlinux.org> --- Documentation/pretty-formats.txt | 4 ++++ pretty.c | 2 +- t/t4205-log-pretty-formats.sh | 8 ++++++++ 3 files changed, 13 insertions(+), 1 deletion(-)