Message ID | 004f2e4c92918a7a4e452d49e98ef15f1c5ac545.1600427894.git.liu.denton@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | sample hooks: become hash agnostic |
Hi Denton,

On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> When a user needed the null OID for scripting purposes, it used to be
> very easy: hardcode 40 zeros. However, since Git started supporting
> SHA-256, this assumption became false which may break some scripts.
> Allow users to fix their broken scripts by providing users with a
> hash-agnostic method of obtaining the null OID.

I have not been very involved in the hash transition, so please take my
comments with a grain of salt (and if they are misplaced, feel free to
ignore them).

This '--null-oid' thing makes me wonder exactly what it does. Yours
gives a type-less object back, but what about scripts that want the OID
of the empty blob or tree?

Would having something like '--null-oid[=<type>]' be useful for them? On
the one hand, it seems like a thing that would be useful, but on the
other, those aren't *the* null OID when 'type' is 'blob' or 'tree'. A
more appropriate name in that case might be '--empty-oid=tree'.

So, that's an argument that '--null-oid' and '--empty-oid[=<type>]'
should be two distinct things. I think I like that best. Do you have any
thoughts about it?

Thanks,
Taylor

On Fri, Sep 18, 2020 at 10:11:25AM -0400, Taylor Blau wrote:
> Hi Denton,
>
> On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> > When a user needed the null OID for scripting purposes, it used to be
> > very easy: hardcode 40 zeros. However, since Git started supporting
> > SHA-256, this assumption became false which may break some scripts.
> > Allow users to fix their broken scripts by providing users with a
> > hash-agnostic method of obtaining the null OID.
>
> I have not been very involved in the hash transition, so please take my
> comments with a grain of salt (and if they are misplaced, feel free to
> ignore them).

Same disclaimer above applies here, too ;-).

There are a number of spots in the test suite that reference 'ZERO_OID',
as well as OIDs for the empty tree and blob. Maybe the definition of
those could be updated to use any new flags you do/don't introduce?

I'd be just as happy if that were to occur in a different series than
this, since I don't want to hold you up by adding a bunch of new things
to your list.

In either case, I think '--zero-oid' makes more sense than '--null-oid'
(and it matches the tests that are already written). The pair
'--zero-oid' and '--empty-oid=<type>' make sense to me.

Thanks,
Taylor

Taylor Blau <me@ttaylorr.com> writes:

> In either case, I think '--zero-oid' makes more sense than '--null-oid'
> (and it matches the tests that are already written). The pair
> '--zero-oid' and '--empty-oid=<type>' make sense to me.

I am not sure rev-parse should even know about "empty-oid". An end
user or a script who wants to learn what name an empty blob has can
and should ask "git hash-object -t blob --stdin </dev/null".

I can buy --zero-oid might be handy, but don't see a pressing need
if it is merely to support our test suite and sample hooks.
Instead, something like

    ZERO_OID=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')

should suffice, no?

Take this as a mild indifference, not as a strong rejection.

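The one-liner suggested above can be wrapped in a small function so that the zeroing step is reusable. This is only a sketch: the names `zero_oid_of` and `repo_zero_oid` are hypothetical, and `sed` is swapped in for `tr` purely so the character class is unambiguous; it is not what was proposed in the thread.

```shell
#!/bin/sh
# Hypothetical helper: turn any hex OID into an all-zero string of the
# same length, so the result follows whichever hash produced the OID.
zero_oid_of () {
	printf '%s\n' "$1" | sed 's/[0-9a-f]/0/g'
}

# Hypothetical helper: hash empty input with git and zero out the result.
# Assumes `git` is on PATH; the length of the output depends on the
# object format git uses where the command is run.
repo_zero_oid () {
	zero_oid_of "$(git hash-object --stdin </dev/null)"
}
```

A script would then write `ZERO_OID=$(repo_zero_oid)` once near the top and compare against `$ZERO_OID` afterwards.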
On Fri, Sep 18, 2020 at 11:16:54AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > In either case, I think '--zero-oid' makes more sense than '--null-oid'
> > (and it matches the tests that are already written). The pair
> > '--zero-oid' and '--empty-oid=<type>' make sense to me.
>
> I am not sure rev-parse should even know about "empty-oid". An end
> user or a script who wants to learn what name an empty blob has can
> and should ask "git hash-object -t blob --stdin </dev/null".

Yeah, my uncertainty ("should this be '--empty-oid' or '--null-oid'?")
is probably a good indication (to me, at least) that the option
shouldn't even exist.

> I can buy --zero-oid might be handy, but don't see a pressing need
> if it is merely to support our test suite and sample hooks.
> Instead, something like
>
>     ZERO_OID=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')
>
> should suffice, no?

Absolutely.

> Take this as a mild indifference, not as a strong rejection.

For what it's worth, I'm probably as indifferent as you. I would be
slightly less so if there was evidence of lots of out-of-tree scripts
that care about these special OIDs, but I haven't looked too far.

Thanks,
Taylor

On 2020-09-18 at 14:11:25, Taylor Blau wrote:
> Hi Denton,
>
> On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> > When a user needed the null OID for scripting purposes, it used to be
> > very easy: hardcode 40 zeros. However, since Git started supporting
> > SHA-256, this assumption became false which may break some scripts.
> > Allow users to fix their broken scripts by providing users with a
> > hash-agnostic method of obtaining the null OID.
>
> I have not been very involved in the hash transition, so please take my
> comments with a grain of salt (and if they are misplaced, feel free to
> ignore them).
>
> This '--null-oid' thing makes me wonder exactly what it does. Yours
> gives a type-less object back, but what about scripts that want the OID
> of the empty blob or tree?
>
> Would having something like '--null-oid[=<type>]' be useful for them? On
> the one hand, it seems like a thing that would be useful, but on the
> other, those aren't *the* null OID when 'type' is 'blob' or 'tree'. A
> more appropriate name in that case might be '--empty-oid=tree'.
>
> So, that's an argument that '--null-oid' and '--empty-oid[=<type>]'
> should be two distinct things. I think I like that best. Do you have any
> thoughts about it?

So I definitely want to distinguish between the null (all-zeros) OID and
the OID of an empty object, and I think using "null" and "empty" are
fine.

What I typically do when I write shell scripts, and which may obviate
the need for this patch is turn this:

    [ "$oid" = 0000000000000000000000000000000000000000 ]

into this:

    echo "$oid" | grep -qsE '^0+$'

This is slightly less efficient, but it's also backwards compatible
with older Git version assuming you have a POSIX grep.

If you still want this option, then that's fine, but please make
--null-oid take the same arguments as --show-object-format (and default
to the same value). Git will soon learn about writing SHA-1 while
storing in SHA-256, and it makes everyone's life better if we can plan
for the future by making it understand these options now.

I'm not sure we need an empty tree and empty blob object, because it's
pretty easy to write these:

    git hash-object -t tree /dev/null
    git hash-object -t blob /dev/null

That's what I've done in some of the transition code at least.

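The length-independent check described above can be sketched as a predicate function. The name `is_null_oid` is hypothetical, and the sketch assumes a POSIX grep that supports `-q`, `-s` and `-E`.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the argument consists only of zeros,
# whatever its length (40 hex digits for SHA-1, 64 for SHA-256).
# Assumes a POSIX grep with -q (quiet), -s (no error messages) and -E.
is_null_oid () {
	printf '%s\n' "$1" | grep -qsE '^0+$'
}
```

Because the pattern does not encode a length, the same test keeps working in SHA-1 and SHA-256 repositories, for example `if is_null_oid "$oid"; then echo "ref is being deleted"; fi`.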
On Fri, Sep 18, 2020 at 2:34 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> So I definitely want to distinguish between the null (all-zeros) OID and
> the OID of an empty object, and I think using "null" and "empty" are
> fine.

(I like this myself)

> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>     [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>     echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

Note that a lot of `grep`s do not have `-q` and/or `-s` so the
portable variant of this is `grep '^0+$' >/dev/null` (you only need
the `2>&1` part if you're concerned about bad input files or
an error on a pipe or something).

> I'm not sure we need an empty tree and empty blob object, because it's
> pretty easy to write these:
>
>     git hash-object -t tree /dev/null
>     git hash-object -t blob /dev/null
>
> That's what I've done in some of the transition code at least.

That's what's recommended in my 2012 stackoverflow Q&A, too.
The use of `/dev/null` directly here is perhaps unsatisfactory on
old Windows systems, though...?

Chris

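One detail matters when dropping `-E` along with `-q` and `-s`: in a basic regular expression `+` is an ordinary character, so a plain `grep` needs `00*` to mean "one or more zeros". A minimal sketch of that variant follows; the function name `is_null_oid_bre` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: all-zeros check that avoids the -q, -s and -E
# options. '^00*$' is the basic-regex spelling of "one or more zeros";
# output is discarded so that only the exit status is used.
is_null_oid_bre () {
	printf '%s\n' "$1" | grep '^00*$' >/dev/null
}
```

The trade-off is a slightly less readable pattern in exchange for relying only on features every `grep` is expected to have.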
On Fri, Sep 18, 2020 at 09:26:09PM +0000, brian m. carlson wrote:
> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>     [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>     echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

Yeah, I mostly just have no idea how common this is in the wild. If many
scripts care about the null OID, then a '--null-oid' makes sense to me.
But if it's only a few, then it does not.

> If you still want this option, then that's fine, but please make
> --null-oid take the same arguments as --show-object-format (and default
> to the same value). Git will soon learn about writing SHA-1 while
> storing in SHA-256, and it makes everyone's life better if we can plan
> for the future by making it understand these options now.

Agreed.

> I'm not sure we need an empty tree and empty blob object, because it's
> pretty easy to write these:
>
>     git hash-object -t tree /dev/null
>     git hash-object -t blob /dev/null
>
> That's what I've done in some of the transition code at least.

I could go either way. This for some reason seems more common to me, so
I wouldn't mind making it easier for callers, but I don't care so much
because what you already wrote is easy enough as-is.

> --
> brian m. carlson: Houston, Texas, US

Thanks,
Taylor

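To illustrate why the null OID depends on the object format, here is a sketch that prints it for a named algorithm. The function `null_oid_for` is hypothetical and is keyed on the algorithm name (`sha1` or `sha256`), which is an assumption made for illustration; it does not model the argument syntax requested above for `--null-oid`.

```shell
#!/bin/sh
# Hypothetical helper: print the all-zeros OID for a hash algorithm name.
# SHA-1 OIDs are 40 hex digits and SHA-256 OIDs are 64 hex digits.
# Note: plain POSIX sh has no local variables, so `len` is a global.
null_oid_for () {
	case $1 in
	sha1)   len=40 ;;
	sha256) len=64 ;;
	*)      echo "unknown hash algorithm: $1" >&2; return 1 ;;
	esac
	# Zero-padded integer 0 with the requested field width.
	printf "%0${len}d\n" 0
}
```

In a repository, the algorithm name could be taken from `git rev-parse --show-object-format`, as in `null_oid_for "$(git rev-parse --show-object-format)"`.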
On Sep 18 2020, brian m. carlson wrote:

> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>     [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>     echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

You can also use

    case $oid in
        *[1-9a-f]*) ... ;;
        *) ... ;;
    esac

which doesn't need an external process.

Andreas.

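The `case` approach above could be packaged as a function along these lines. The name `is_null_oid_case` is hypothetical, and the guard against an empty argument is an addition for illustration, not part of the suggestion.

```shell
#!/bin/sh
# Hypothetical helper: all-zeros check using only shell pattern matching,
# so no external process is started.
is_null_oid_case () {
	case $1 in
	'')         return 1 ;;  # added guard: an empty string is not an OID
	*[1-9a-f]*) return 1 ;;  # any non-zero hex digit means "not null"
	*)          return 0 ;;
	esac
}
```

Like the original pattern, this assumes the argument is already a lowercase hex OID; it does not validate the input beyond that.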
On 2020-09-20 at 04:25:33, Chris Torek wrote:
> On Fri, Sep 18, 2020 at 2:34 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > What I typically do when I write shell scripts, and which may obviate
> > the need for this patch is turn this:
> >
> >     [ "$oid" = 0000000000000000000000000000000000000000 ]
> >
> > into this:
> >
> >     echo "$oid" | grep -qsE '^0+$'
> >
> > This is slightly less efficient, but it's also backwards compatible
> > with older Git version assuming you have a POSIX grep.
>
> Note that a lot of `grep`s do not have `-q` and/or `-s` so the
> portable variant of this is `grep '^0+$' >/dev/null` (you only need
> the `2>&1` part if you're concerned about bad input files or
> an error on a pipe or something).

If we're looking for best compatibility here, then using egrep and
/dev/null is best, I agree. I personally use the POSIX version because
it's been that way since at least 2001 and I don't have a problem with
requiring compliance with a 19-year-old standard.

But for Git, we should definitely do whatever we do in the testsuite if
we use this approach, since presumably that works everywhere. As
Andreas pointed out, there are ways to avoid the external process that
we could stuff in a shell function. I'm not picky.

> > I'm not sure we need an empty tree and empty blob object, because it's
> > pretty easy to write these:
> >
> >     git hash-object -t tree /dev/null
> >     git hash-object -t blob /dev/null
> >
> > That's what I've done in some of the transition code at least.
>
> That's what's recommended in my 2012 stackoverflow Q&A, too.
> The use of `/dev/null` directly here is perhaps unsatisfactory on
> old Windows systems, though...?

I believe all modern versions of Git for Windows provide /dev/null via
the shell, since it's required for a lot of things to work, so I'm not
worried about this case. It is definitely good to think about Windows,
though.

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 19b12b6d43..b370d425d7 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -285,6 +285,10 @@ print a message to stderr and exit with nonzero status.
 Other Options
 ~~~~~~~~~~~~~
 
+--null-oid::
+	Print the null OID (the OID containing all zeros). This OID is
+	used to represent a non-existent object.
+
 --since=datestring::
 --after=datestring::
 	Parse the date string, and output the corresponding
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index ed200c8af1..4e4ca99775 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -910,6 +910,10 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				}
 				continue;
 			}
+			if (!strcmp(arg, "--null-oid")) {
+				puts(oid_to_hex(&null_oid));
+				continue;
+			}
 			if (skip_prefix(arg, "--since=", &arg)) {
 				show_datestring("--max-age=", arg);
 				continue;
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 408b97d5af..8c1bd543ef 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -185,4 +185,10 @@ test_expect_success 'showing the superproject correctly' '
 	test_cmp expect out
 '
 
+test_expect_success 'rev-parse --null-oid' '
+	echo "$(test_oid zero)" >expect &&
+	git rev-parse --null-oid >actual &&
+	test_cmp expect actual
+'
+
 test_done

When a user needed the null OID for scripting purposes, it used to be
very easy: hardcode 40 zeros. However, since Git started supporting
SHA-256, this assumption became false which may break some scripts.
Allow users to fix their broken scripts by providing users with a
hash-agnostic method of obtaining the null OID.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Documentation/git-rev-parse.txt | 4 ++++
 builtin/rev-parse.c             | 4 ++++
 t/t1500-rev-parse.sh            | 6 ++++++
 3 files changed, 14 insertions(+)
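As an illustration of the kind of script this affects, here is a sketch of a hook-style loop that avoids hardcoding 40 zeros. It assumes input in the pre-push hook format (`<local ref> <local oid> <remote ref> <remote oid>`, one update per line on stdin); the function names `is_zero` and `classify_push` are hypothetical, and the sketch does not use the proposed `--null-oid` option.

```shell
#!/bin/sh
# Hypothetical helper: true when the argument is a non-empty run of zeros,
# regardless of length, so it works for SHA-1 and SHA-256 alike.
is_zero () {
	case $1 in
	''|*[!0]*) return 1 ;;
	*)         return 0 ;;
	esac
}

# Hypothetical helper: read pre-push style lines from stdin and print what
# each line would do to the remote ref.
classify_push () {
	while read -r local_ref local_oid remote_ref remote_oid
	do
		if is_zero "$local_oid"
		then
			echo "delete $remote_ref"   # nothing local: ref is removed
		elif is_zero "$remote_oid"
		then
			echo "create $remote_ref"   # nothing remote yet: new ref
		else
			echo "update $remote_ref"
		fi
	done
}
```

A hook would call `classify_push` with its own stdin and act on each line of output instead of comparing against a fixed-length string of zeros.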