diff mbox series

[3/3] read_ref_at(): special-case ref@{0} for an empty reflog

Message ID 20240226100803.GC2685600@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 5edd12672086c6b6d92147925dda5dd3bca2b658
Headers show
Series show-branch --reflog fixes | expand

Commit Message

Jeff King Feb. 26, 2024, 10:08 a.m. UTC
The previous commit special-cased get_oid_basic()'s handling of ref@{n}
for a reflog with n entries. But its special case doesn't work for
ref@{0} in an empty reflog, because read_ref_at() dies when it notices
the empty reflog!

We can make this work by special-casing this in read_ref_at(). It's
somewhat gross, for two reasons:

  1. We have no reflog entry to describe in the "msg" out-parameter. So
     we have to leave it uninitialized or make something up.

  2. Likewise, we have no oid to put in the "oid" out-parameter. Leaving
     it untouched is actually the best thing here, as all of the callers
     will have initialized it with the current ref value via
     repo_dwim_log(). This is rather subtle, but it is how things worked
     in 6436a20284 (refs: allow @{n} to work with n-sized reflog,
     2021-01-07) before we reverted it.

The key difference from 6436a20284 here is that we'll return "1" to
indicate that we _didn't_ find the requested reflog entry. Coupled with
the special-casing in get_oid_basic() in the previous commit, that's
enough to make looking up ref@{0} work, and we can flip 6436a20284's
test back to expect_success.

It also means that the call in show-branch which segfaulted with
6436a20284 (and which is now tested in t3202) remains OK. The caller
notices that we could not find any reflog entry, and so it breaks out of
its loop, showing nothing. This is different from the current behavior
of producing an error, but it's just as reasonable (and is exactly what
we'd do if you asked it to walk starting at ref@{1} but there was only 1
entry).

Thus nobody should actually look at the reflog entry info we return. But
we'll still put in some fake values just to be on the safe side, since
this is such a subtle and confusing interface. Likewise, we'll document
what's going on in a comment above the function declaration. If this
were a function with a lot of callers, the footgun would probably not be
worth it. But it has only ever had two callers in its 18-year existence,
and it seems unlikely to grow more. So let's hold our noses and let
users enjoy the convenience of a simulated ref@{0}.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c                     | 15 +++++++++++++++
 refs.h                     | 15 ++++++++++++++-
 t/t1508-at-combinations.sh |  2 +-
 t/t3202-show-branch.sh     |  4 ++--
 4 files changed, 32 insertions(+), 4 deletions(-)

Comments

Jeff King Feb. 26, 2024, 10:10 a.m. UTC | #1
On Mon, Feb 26, 2024 at 05:08:03AM -0500, Jeff King wrote:

> Thus nobody should actually look at the reflog entry info we return. But
> we'll still put in some fake values just to be on the safe side, since
> this is such a subtle and confusing interface. Likewise, we'll document
> what's going on in a comment above the function declaration. If this
> were a function with a lot of callers, the footgun would probably not be
> worth it. But it has only ever had two callers in its 18-year existence,
> and it seems unlikely to grow more. So let's hold our noses and let
> users enjoy the convenience of a simulated ref@{0}.

Obviously I'm sympathetic to Patrick's position that this empty-reflog
special case is kind of gross. ;)

That's one of the reasons I split this out from patch 2; we can see
exactly what must be done to make each case work. And in fact I had
originally started to write a patch that simply changed t1508 to expect
failure. I could still be persuaded to go that way if anybody feels
strongly.

-Peff
Junio C Hamano Feb. 26, 2024, 5:25 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

>  	if (!cb.reccnt) {
> +		if (cnt == 0) {

Style "if (!cnt)" ?  In this particular case I do not think it
actually is an improvement, though, simply because zero is really
special in this logic.

> +			/*
> +			 * The caller asked for ref@{0}, and we had no entries.
> +			 * It's a bit subtle, but in practice all callers have
> +			 * prepped the "oid" field with the current value of
> +			 * the ref, which is the most reasonable fallback.
> +			 *
> +			 * We'll put dummy values into the out-parameters (so
> +			 * they're not just uninitialized garbage), and the
> +			 * caller can take our return value as a hint that
> +			 * we did not find any such reflog.
> +			 */
> +			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
> +			return 1;
> +		}

The dummy value I 100% agree with ;-).

You mentioned the convenience special case for time-based reflog
query for a time older than (e.g. @{20.years.ago}) the reflog
itself, and perhaps this one should be treated as its counterpart,
that is only useful for count-based access.
Junio C Hamano Feb. 26, 2024, 5:25 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> That's one of the reasons I split this out from patch 2; we can see
> exactly what must be done to make each case work. And in fact I had
> originally started to write a patch that simply changed t1508 to expect
> failure. I could still be persuaded to go that way if anybody feels
> strongly.

I do not feel strongly either way myself.  It just is interesting
that the older end of the history is with @{20.years.ago} special
case that is only for time-based query, while the newer end of the
history is with @{0} special case.
Jeff King Feb. 27, 2024, 8:05 a.m. UTC | #4
On Mon, Feb 26, 2024 at 09:25:28AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >  	if (!cb.reccnt) {
> > +		if (cnt == 0) {
> 
> Style "if (!cnt)" ?  In this particular case I do not think it
> actually is an improvement, though, simply because zero is really
> special in this logic.

Yeah, I didn't really choose "== 0" as a conscious decision. But I do
tend to write "!cnt" when available, so I think I sub-consciously chose
this as a better description of the intent.

> 
> > +			/*
> > +			 * The caller asked for ref@{0}, and we had no entries.
> > +			 * It's a bit subtle, but in practice all callers have
> > +			 * prepped the "oid" field with the current value of
> > +			 * the ref, which is the most reasonable fallback.
> > +			 *
> > +			 * We'll put dummy values into the out-parameters (so
> > +			 * they're not just uninitialized garbage), and the
> > +			 * caller can take our return value as a hint that
> > +			 * we did not find any such reflog.
> > +			 */
> > +			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
> > +			return 1;
> > +		}
> 
> The dummy value I 100% agree with ;-).
> 
> You mentioned the convenience special case for time-based reflog
> query for a time older than (e.g. @{20.years.ago}) the reflog
> itself, and perhaps this one should be treated as its counterpart,
> that is only useful for count-based access.

I think the true counterpart to that would be extending what's added by
patch 2 to get_oid_basic() to stop checking that we hit the count
exactly.

Let me try to lay out my thinking. If you _do_ have a reflog and the
request (whether time or count-based) goes too far back, read_ref_at()
will give you the oldest entry and return "1". And then in
get_oid_basic():

  - if it was a time based cutoff (like @{20.years.ago}), we will issue
    a warning ("log only goes back to 2024-01-01") but return the value
    anyway.

  - before this series, if it is a count based cutoff (like @{9999}), we
    fail and say "there are only have N entries".

  - after this series, we special case asking for @{9999} when there are
    9998 entries by returning the "old" oid but not issuing any warning
    (we do not need to, because we found the right answer for what it
    would have been had that old entry not been pruned).

  - what we _could_ do (but this series does not), and what would be the
    true counterpart to the @{20.years.ago} case, is to allow @{9999}
    for a reflog with only 20 entries, returning the old value from 20
    (or the new value if it was a creation!?) and issuing a warning
    saying "well, it only went back 20, but here you go".

I'm not so sure about that last one. It is pretty subjective, but
somehow asking for timestamps feels more "fuzzy" to me, and Git
returning a fuzzy answer is OK. Whereas asking for item 9999 in a list
with 20 items and getting back an answer feels more absolutely wrong. I
could be persuaded if there were a concrete use case, but I can't really
think of one. It seems more likely to confuse and hinder a user than to
help them.

-Peff
Jeff King Feb. 27, 2024, 8:07 a.m. UTC | #5
On Mon, Feb 26, 2024 at 09:25:32AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > That's one of the reasons I split this out from patch 2; we can see
> > exactly what must be done to make each case work. And in fact I had
> > originally started to write a patch that simply changed t1508 to expect
> > failure. I could still be persuaded to go that way if anybody feels
> > strongly.
> 
> I do not feel strongly either way myself.  It just is interesting
> that the older end of the history is with @{20.years.ago} special
> case that is only for time-based query, while the newer end of the
> history is with @{0} special case.

I laid out my thinking on the 20.years.ago special case in another
reply, but I wanted to say one more thing. The special case here in
patch 3 is making @{0} work for the empty reflog, but there is no
matching special case for time-based timestamps. If you have an empty
reflog and ask for @{20.years.ago}, you will get the usual "nope, the
reflog is empty" response (as opposed to having a non-empty reflog that
cuts off before 20 years ago).

Obviously we could make that work, but I think the point is that "@{0}"
is special magic for "the current value" in a way that a timestamp isn't
really.

-Peff
Junio C Hamano Feb. 27, 2024, 5:03 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> Let me try to lay out my thinking. If you _do_ have a reflog and the
> request (whether time or count-based) goes too far back, read_ref_at()
> will give you the oldest entry and return "1". And then in
> get_oid_basic():
> ...
>   - what we _could_ do (but this series does not), and what would be the
>     true counterpart to the @{20.years.ago} case, is to allow @{9999}
>     for a reflog with only 20 entries, returning the old value from 20
>     (or the new value if it was a creation!?) and issuing a warning
>     saying "well, it only went back 20, but here you go".

Ah, I wasn't drawing *that* similarity.  My thinking was more like

 - When you have two entries in reflog, ref@{0} will use and find
   the latest entry whose value is the same as the ref itself.

 - When you have one entry, @{0} will use and find the latest entry
   whose value is the same as the ref itself.

 - When you have zero entry, @{0} can do the same by taking
   advantage of the fact that its value is supposed to be the same
   as the ref itself anyway.

that happens near the youngest end of a reflog, contrasting with the
@{20.years.ago} that happens near the oldest end.

> I'm not so sure about that last one. It is pretty subjective, but
> somehow asking for timestamps feels more "fuzzy" to me, and Git
> returning a fuzzy answer is OK. Whereas asking for item 9999 in a list
> with 20 items and getting back an answer feels more absolutely wrong. I
> could be persuaded if there were a concrete use case, but I can't really
> think of one. It seems more likely to confuse and hinder a user than to
> help them.

I do not think anybody misses @{9999} not giving the oldest
available, simply because "oldest" is a concept that fits better
with time-based queries than count-based queries.
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 6b826b002e..8eaec5eca7 100644
--- a/refs.c
+++ b/refs.c
@@ -1109,6 +1109,21 @@  int read_ref_at(struct ref_store *refs, const char *refname,
 	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
 
 	if (!cb.reccnt) {
+		if (cnt == 0) {
+			/*
+			 * The caller asked for ref@{0}, and we had no entries.
+			 * It's a bit subtle, but in practice all callers have
+			 * prepped the "oid" field with the current value of
+			 * the ref, which is the most reasonable fallback.
+			 *
+			 * We'll put dummy values into the out-parameters (so
+			 * they're not just uninitialized garbage), and the
+			 * caller can take our return value as a hint that
+			 * we did not find any such reflog.
+			 */
+			set_read_ref_cutoffs(&cb, 0, 0, "empty reflog");
+			return 1;
+		}
 		if (flags & GET_OID_QUIETLY)
 			exit(128);
 		else
diff --git a/refs.h b/refs.h
index 303c5fac4d..37116ad2b2 100644
--- a/refs.h
+++ b/refs.h
@@ -440,7 +440,20 @@  int refs_create_reflog(struct ref_store *refs, const char *refname,
 		       struct strbuf *err);
 int safe_create_reflog(const char *refname, struct strbuf *err);
 
-/** Reads log for the value of ref during at_time. **/
+/**
+ * Reads log for the value of ref during at_time (in which case "cnt" should be
+ * negative) or the reflog "cnt" entries from the top (in which case "at_time"
+ * should be 0).
+ *
+ * If we found the reflog entry in question, returns 0 (and details of the
+ * entry can be found in the out-parameters).
+ *
+ * If we ran out of reflog entries, the out-parameters are filled with the
+ * details of the oldest entry we did find, and the function returns 1. Note
+ * that there is one important special case here! If the reflog was empty
+ * and the caller asked for the 0-th cnt, we will return "1" but leave the
+ * "oid" field untouched.
+ **/
 int read_ref_at(struct ref_store *refs,
 		const char *refname, unsigned int flags,
 		timestamp_t at_time, int cnt,
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index 370bf7137e..e841309d0e 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -110,7 +110,7 @@  test_expect_success '@{1} works with only one reflog entry' '
 	test_cmp_rev newbranch~ newbranch@{1}
 '
 
-test_expect_failure '@{0} works with empty reflog' '
+test_expect_success '@{0} works with empty reflog' '
 	git checkout -B newbranch main &&
 	git reflog expire --expire=now refs/heads/newbranch &&
 	test_cmp_rev newbranch newbranch@{0}
diff --git a/t/t3202-show-branch.sh b/t/t3202-show-branch.sh
index 35f35f8091..a1139f79e2 100755
--- a/t/t3202-show-branch.sh
+++ b/t/t3202-show-branch.sh
@@ -279,8 +279,8 @@  test_expect_success '--reflog shows reflog entries' '
 
 test_expect_success '--reflog handles missing reflog' '
 	git reflog expire --expire=now branch &&
-	test_must_fail git show-branch --reflog branch 2>err &&
-	grep "log .* is empty" err
+	git show-branch --reflog branch >actual &&
+	test_must_be_empty actual
 '
 
 test_done