
[2/4] builtin/rev-parse: learn --null-oid

Message ID 004f2e4c92918a7a4e452d49e98ef15f1c5ac545.1600427894.git.liu.denton@gmail.com
State Superseded
Series sample hooks: become hash agnostic

Commit Message

Denton Liu Sept. 18, 2020, 11:19 a.m. UTC
When a user needed the null OID for scripting purposes, it used to be
very easy: hardcode 40 zeros. However, since Git started supporting
SHA-256, this assumption no longer holds, which may break some scripts.
Allow users to fix their broken scripts by providing a hash-agnostic
method of obtaining the null OID.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Documentation/git-rev-parse.txt | 4 ++++
 builtin/rev-parse.c             | 4 ++++
 t/t1500-rev-parse.sh            | 6 ++++++
 3 files changed, 14 insertions(+)

Comments

Taylor Blau Sept. 18, 2020, 2:11 p.m. UTC | #1
Hi Denton,

On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> When a user needed the null OID for scripting purposes, it used to be
> very easy: hardcode 40 zeros. However, since Git started supporting
> SHA-256, this assumption became false which may break some scripts.
> Allow users to fix their broken scripts by providing users with a
> hash-agnostic method of obtaining the null OID.

I have not been very involved in the hash transition, so please take my
comments with a grain of salt (and if they are misplaced, feel free to
ignore them).

This '--null-oid' thing makes me wonder exactly what it does. Yours
gives a type-less object back, but what about scripts that want the OID
of the empty blob or tree?

Would having something like '--null-oid[=<type>]' be useful for them? On
the one hand, it seems like a thing that would be useful, but on the
other, those aren't *the* null OID when 'type' is 'blob' or 'tree'. A
more appropriate name in that case might be '--empty-oid=tree'.

So, that's an argument that '--null-oid' and '--empty-oid[=<type>]'
should be two distinct things. I think I like that best. Do you have any
thoughts about it?

Thanks,
Taylor
Taylor Blau Sept. 18, 2020, 2:16 p.m. UTC | #2
On Fri, Sep 18, 2020 at 10:11:25AM -0400, Taylor Blau wrote:
> Hi Denton,
>
> On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> > When a user needed the null OID for scripting purposes, it used to be
> > very easy: hardcode 40 zeros. However, since Git started supporting
> > SHA-256, this assumption became false which may break some scripts.
> > Allow users to fix their broken scripts by providing users with a
> > hash-agnostic method of obtaining the null OID.
>
> I have not been very involved in the hash transition, so please take my
> comments with a grain of salt (and if they are misplaced, feel free to
> ignore them).

Same disclaimer above applies here, too ;-). There are a number of spots
in the test suite that reference 'ZERO_OID', as well as OIDs for the
empty tree and blob. Maybe the definition of those could be updated to
use any new flags you do/don't introduce?

I'd be just as happy if that were to occur in a different series than
this, since I don't want to hold you up by adding a bunch of new things
to your list.

In either case, I think '--zero-oid' makes more sense than '--null-oid'
(and it matches the tests that are already written). The pair
'--zero-oid' and '--empty-oid=<type>' make sense to me.

Thanks,
Taylor
Junio C Hamano Sept. 18, 2020, 6:16 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> In either case, I think '--zero-oid' makes more sense than '--null-oid'
> (and it matches the tests that are already written). The pair
> '--zero-oid' and '--empty-oid=<type>' make sense to me.

I am not sure rev-parse should even know about "empty-oid".  An end
user or a script who wants to learn what name an empty blob has can
and should ask "git hash-object -t blob --stdin </dev/null".

I can buy --zero-oid might be handy, but don't see a pressing need
if it is merely to support our test suite and sample hooks.
Instead, something like

  ZERO_OID=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')

should suffice, no?

Take this as a mild indifference, not as a strong rejection.
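Junio's one-liner works because it derives the zero OID from a real object name, so its length always matches the repository's hash. A minimal sketch of the same idea, assuming only POSIX `sed`: it replaces every character with `0` instead of relying on how `tr` pads a shorter second set. `zero_oid_like` is a hypothetical helper name, not an existing Git function.

```shell
#!/bin/sh
# zero_oid_like OID
# Print a string of zeros exactly as long as OID. Given any real object
# name (e.g. the output of `git hash-object --stdin </dev/null`), this is
# the all-zeros OID for whatever hash algorithm produced that name.
zero_oid_like () {
	printf '%s\n' "$1" | sed 's/./0/g'
}
```

In a script this would be used as `ZERO_OID=$(zero_oid_like "$(git hash-object --stdin </dev/null)")`, which yields 40 zeros in a SHA-1 repository and 64 in a SHA-256 one.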
Taylor Blau Sept. 18, 2020, 6:21 p.m. UTC | #4
On Fri, Sep 18, 2020 at 11:16:54AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > In either case, I think '--zero-oid' makes more sense than '--null-oid'
> > (and it matches the tests that are already written). The pair
> > '--zero-oid' and '--empty-oid=<type>' make sense to me.
>
> I am not sure rev-parse should even know about "empty-oid".  An end
> user or a script who wants to learn what name an empty blob has can
> and should ask "git hash-object -t blob --stdin </dev/null".

Yeah, my uncertainty ("should this be '--empty-oid' or '--null-oid'?")
is probably a good indication (to me, at least) that the option
shouldn't even exist.

> I can buy --zero-oid might be handy, but don't see a pressing need
> if it is merely to support our test suite and sample hooks.
> Instead, something like
>
>   ZERO_OID=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')
>
> should suffice, no?

Absolutely.

> Take this as a mild indifference, not as a strong rejection.

For what it's worth, I'm probably as indifferent as you. I would be
slightly less so if there was evidence of lots of out-of-tree scripts
that care about these special OIDs, but I haven't looked too far.


Thanks,
Taylor
brian m. carlson Sept. 18, 2020, 9:26 p.m. UTC | #5
On 2020-09-18 at 14:11:25, Taylor Blau wrote:
> Hi Denton,
> 
> On Fri, Sep 18, 2020 at 04:19:03AM -0700, Denton Liu wrote:
> > When a user needed the null OID for scripting purposes, it used to be
> > very easy: hardcode 40 zeros. However, since Git started supporting
> > SHA-256, this assumption became false which may break some scripts.
> > Allow users to fix their broken scripts by providing users with a
> > hash-agnostic method of obtaining the null OID.
> 
> I have not been very involved in the hash transition, so please take my
> comments with a grain of salt (and if they are misplaced, feel free to
> ignore them).
> 
> This '--null-oid' thing makes me wonder exactly what it does. Yours
> gives a type-less object back, but what about scripts that want the OID
> of the empty blob or tree?
> 
> Would having something like '--null-oid[=<type>]' be useful for them? On
> the one hand, it seems like a thing that would be useful, but on the
> other, those aren't *the* null OID when 'type' is 'blob' or 'tree'. A
> more appropriate name in that case might be '--empty-oid=tree'.
> 
> So, that's an argument that '--null-oid' and '--empty-oid[=<type>]'
> should be two distinct things. I think I like that best. Do you have any
> thoughts about it?

So I definitely want to distinguish between the null (all-zeros) OID and
the OID of an empty object, and I think using "null" and "empty" are
fine.

What I typically do when I write shell scripts, and which may obviate
the need for this patch, is turn this:

  [ "$oid" = 0000000000000000000000000000000000000000 ]

into this:

  echo "$oid" | grep -qsE '^0+$'

This is slightly less efficient, but it's also backwards compatible
with older Git versions, assuming you have a POSIX grep.
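For context, hooks such as `post-receive` read lines of the form `<old-value> <new-value> <ref-name>` on standard input, where an all-zeros old value marks a newly created ref and an all-zeros new value marks a deleted one. A minimal sketch of brian's grep pattern applied to that input follows; `is_null_oid` and `classify_updates` are hypothetical names, and the check accepts a run of zeros of any length, so it should behave the same for SHA-1 and SHA-256.

```shell
#!/bin/sh
# is_null_oid OID: succeed if OID is one or more zeros (any hash length).
is_null_oid () {
	printf '%s\n' "$1" | grep -qsE '^0+$'
}

# classify_updates: read "<old> <new> <ref>" lines, as post-receive
# receives them on stdin, and report what happened to each ref.
classify_updates () {
	while read -r old new ref
	do
		if is_null_oid "$old"
		then
			echo "create $ref"
		elif is_null_oid "$new"
		then
			echo "delete $ref"
		else
			echo "update $ref"
		fi
	done
}
```

Because no hash length appears anywhere in the script, nothing needs to change when the repository's object format does.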

If you still want this option, then that's fine, but please make
--null-oid take the same arguments as --show-object-format (and default
to the same value).  Git will soon learn about writing SHA-1 while
storing in SHA-256, and it makes everyone's life better if we can plan
for the future by making it understand these options now.
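To make that request concrete: `git rev-parse --show-object-format` prints the algorithm name (`sha1` or `sha256`), and a format-aware null OID is just that algorithm's hex length in zeros (40 for SHA-1, 64 for SHA-256). The following is a hypothetical shell helper illustrating the mapping, not the C implementation under review; `null_oid_for` is an illustrative name.

```shell
#!/bin/sh
# null_oid_for FORMAT
# Print the all-zeros object name for FORMAT, using the names that
# `git rev-parse --show-object-format` prints ("sha1" or "sha256").
null_oid_for () {
	case $1 in
	sha1)   len=40 ;;
	sha256) len=64 ;;
	*)
		echo "null_oid_for: unknown object format '$1'" >&2
		return 1
		;;
	esac
	# Plain sh has no 'local', so len, i and zeros are global here.
	i=0
	zeros=
	while [ "$i" -lt "$len" ]
	do
		zeros=${zeros}0
		i=$((i + 1))
	done
	printf '%s\n' "$zeros"
}
```

A caller would write `null_oid_for "$(git rev-parse --show-object-format)"`; an option accepting the same `storage`/`input`/`output` selectors would let the caller choose which of those formats to use.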

I'm not sure we need an empty tree and empty blob object, because it's
pretty easy to write these:

  git hash-object -t tree /dev/null
  git hash-object -t blob /dev/null

That's what I've done in some of the transition code at least.
Chris Torek Sept. 20, 2020, 4:25 a.m. UTC | #6
On Fri, Sep 18, 2020 at 2:34 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> So I definitely want to distinguish between the null (all-zeros) OID and
> the OID of an empty object, and I think using "null" and "empty" are
> fine.

(I like this myself)

> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>   [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>   echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

Note that a lot of `grep`s do not have `-q` and/or `-s` so the
portable variant of this is `egrep '^0+$' >/dev/null 2>&1` (you only need
the `2>&1` part if you're concerned about bad input files or
an error on a pipe or something).
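A sketch of a check that needs neither `-q`/`-s` nor `-E` follows. In a POSIX basic regular expression `+` is an ordinary character, so a plain `grep` needs the pattern `'^00*$'` (a zero followed by any number of zeros); `'^0+$'` without `-E` would only match the literal text `0+`. `is_null_oid` is a hypothetical helper name.

```shell
#!/bin/sh
# is_null_oid OID: succeed if OID consists only of zeros (any length).
# Uses a basic regular expression and redirection instead of -E/-q/-s,
# so it should also work with very old grep implementations.
is_null_oid () {
	printf '%s\n' "$1" | grep '^00*$' >/dev/null 2>&1
}
```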

> I'm not sure we need an empty tree and empty blob object, because it's
> pretty easy to write these:
>
>   git hash-object -t tree /dev/null
>   git hash-object -t blob /dev/null
>
> That's what I've done in some of the transition code at least.

That's what's recommended in my 2012 stackoverflow Q&A, too.
The use of `/dev/null` directly here is perhaps unsatisfactory on
old Windows systems, though...?

Chris
Taylor Blau Sept. 20, 2020, 3:35 p.m. UTC | #7
On Fri, Sep 18, 2020 at 09:26:09PM +0000, brian m. carlson wrote:
> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>   [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>   echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

Yeah, I mostly just have no idea how common this is in the wild. If many
scripts care about the null OID, then a '--null-oid' makes sense to me.
But if it's only a few, then it does not.

> If you still want this option, then that's fine, but please make
> --null-oid take the same arguments as --show-object-format (and default
> to the same value).  Git will soon learn about writing SHA-1 while
> storing in SHA-256, and it makes everyone's life better if we can plan
> for the future by making it understand these options now.

Agreed.

> I'm not sure we need an empty tree and empty blob object, because it's
> pretty easy to write these:
>
>   git hash-object -t tree /dev/null
>   git hash-object -t blob /dev/null
>
> That's what I've done in some of the transition code at least.

I could go either way. This for some reason seems more common to me, so
I wouldn't mind making it easier for callers, but I don't care so much
because what you already wrote is easy enough as-is.

> --
> brian m. carlson: Houston, Texas, US

Thanks,
Taylor
Andreas Schwab Sept. 20, 2020, 4:03 p.m. UTC | #8
On Sep 18 2020, brian m. carlson wrote:

> What I typically do when I write shell scripts, and which may obviate
> the need for this patch is turn this:
>
>   [ "$oid" = 0000000000000000000000000000000000000000 ]
>
> into this:
>
>   echo "$oid" | grep -qsE '^0+$'
>
> This is slightly less efficient, but it's also backwards compatible
> with older Git version assuming you have a POSIX grep.

You can also use

  case $oid in *[1-9a-f]*) ... ;; *) ... ;; esac

which doesn't need an external process.

Andreas.
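Andreas's `case` test can be wrapped in a shell function, as brian suggests later in the thread. In this sketch `is_null_oid` is a hypothetical name, and the pattern is inverted to `*[!0]*` so that anything containing a non-zero character (including upper-case or non-hex input) is rejected. The empty string is also rejected explicitly; with `*[1-9a-f]*` alone, an empty `$oid` would fall into the "null" branch.

```shell
#!/bin/sh
# is_null_oid OID: succeed if OID is non-empty and made only of zeros.
# Pure shell pattern matching, so no external process is spawned.
is_null_oid () {
	case $1 in
	'' | *[!0]*)
		return 1 ;;	# empty, or contains a non-zero character
	*)
		return 0 ;;	# one or more zeros: null OID for any hash length
	esac
}
```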
brian m. carlson Sept. 20, 2020, 6:58 p.m. UTC | #9
On 2020-09-20 at 04:25:33, Chris Torek wrote:
> On Fri, Sep 18, 2020 at 2:34 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > What I typically do when I write shell scripts, and which may obviate
> > the need for this patch is turn this:
> >
> >   [ "$oid" = 0000000000000000000000000000000000000000 ]
> >
> > into this:
> >
> >   echo "$oid" | grep -qsE '^0+$'
> >
> > This is slightly less efficient, but it's also backwards compatible
> > with older Git version assuming you have a POSIX grep.
> 
> Note that a lot of `grep`s do not have `-q` and/or `-s` so the
> portable variant of this is `grep '^0+$' >/dev/null` (you only need
> the `2>&1` part if you're concerned about bad input files or
> an error on a pipe or something).

If we're looking for best compatibility here, then using egrep and
/dev/null is best, I agree.  I personally use the POSIX version because
it's been that way since at least 2001 and I don't have a problem with
requiring compliance with a 19-year-old standard.  But for Git, we
should definitely do whatever we do in the testsuite if we use this
approach, since presumably that works everywhere.

As Andreas pointed out, there are ways to avoid the external process
that we could stuff in a shell function.  I'm not picky.

> > I'm not sure we need an empty tree and empty blob object, because it's
> > pretty easy to write these:
> >
> >   git hash-object -t tree /dev/null
> >   git hash-object -t blob /dev/null
> >
> > That's what I've done in some of the transition code at least.
> 
> That's what's recommended in my 2012 stackoverflow Q&A, too.
> The use of `/dev/null` directly here is perhaps unsatisfactory on
> old Windows systems, though...?

I believe all modern versions of Git for Windows provide /dev/null via
the shell, since it's required for a lot of things to work, so I'm not
worried about this case.  It is definitely good to think about Windows,
though.

Patch

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 19b12b6d43..b370d425d7 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -285,6 +285,10 @@  print a message to stderr and exit with nonzero status.
 Other Options
 ~~~~~~~~~~~~~
 
+--null-oid::
+	Print the null OID (the OID containing all zeros). This OID is
+	used to represent a non-existent object.
+
 --since=datestring::
 --after=datestring::
 	Parse the date string, and output the corresponding
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index ed200c8af1..4e4ca99775 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -910,6 +910,10 @@  int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				}
 				continue;
 			}
+			if (!strcmp(arg, "--null-oid")) {
+				puts(oid_to_hex(&null_oid));
+				continue;
+			}
 			if (skip_prefix(arg, "--since=", &arg)) {
 				show_datestring("--max-age=", arg);
 				continue;
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 408b97d5af..8c1bd543ef 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -185,4 +185,10 @@  test_expect_success 'showing the superproject correctly' '
 	test_cmp expect out
 '
 
+test_expect_success 'rev-parse --null-oid' '
+	echo "$(test_oid zero)" >expect &&
+	git rev-parse --null-oid >actual &&
+	test_cmp expect actual
+'
+
 test_done