wt-status: expand, not dwim, a "detached from" ref

Message ID	20200513004058.34456-1-jonathantanmy@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=6HsL=63=vger.kernel.org=git-owner@kernel.org> Date: Tue, 12 May 2020 17:40:58 -0700 Message-Id: <20200513004058.34456-1-jonathantanmy@google.com> Mime-Version: 1.0 Subject: [PATCH] wt-status: expand, not dwim, a "detached from" ref From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com> Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	wt-status: expand, not dwim, a "detached from" ref \| expand wt-status: expand, not dwim, a "detached from" ref

Jonathan Tan May 13, 2020, 12:40 a.m. UTC

When a user checks out the upstream branch of HEAD and then runs "git
branch", like this:

  git checkout @{u}
  git branch

an error message is printed:

  fatal: HEAD does not point to a branch

(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)

This is because "git branch" attempts to DWIM "@{u}" when determining
the "HEAD detached" message, but that doesn't work because HEAD no
longer points to a branch.

Therefore, when calculating the status of a worktree, only expand such a
string, not DWIM it. Such strings would not match a ref, and "git
branch" would report "HEAD detached at <hash>" instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
I tried to track down other uses of wt_status_get_detached_from()
(called from wt_status_get_state() with get_detached_from=1) but there
seemed to be quite a few, so I stopped.

One alternative is to plumb down a flag from dwim_ref() to
interpret_branch_mark() that indicates that (1) don't read HEAD when
processing, and (2) when encountering a ref of the form "<optional
ref>@{u}", don't die when the optional ref doesn't exist, is not a
branch, or does not have an upstream. (We cannot change the
functionality outright because other parts of Git rely on such a message
being printed.) But this seems too complicated just for diagnosis.
---
 t/t3203-branch-output.sh | 10 ++++++++++
 wt-status.c              |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

Junio C Hamano May 13, 2020, 5:33 a.m. UTC | #1

Jonathan Tan <jonathantanmy@google.com> writes:

> When a user checks out the upstream branch of HEAD and then runs "git
> branch", like this:
>
>   git checkout @{u}
>   git branch
>
> an error message is printed:
>
>   fatal: HEAD does not point to a branch

I had to spend unreasonable amount of time reproducing this.

    $ git branch --show-current
    master
    $ git checkout -b -t naster master
    $ git branch --show-current
    naster
    $ git checkout @{u}
    $ git branch --show-current
    master
    $ git branch
    * master
      naster

What was missing was that the current branch must be forked from a
remote-tracking branch; with a fork from a local branch, the last
step, i.e. "git branch", works just fine.

> (This error message when running "git branch" persists even after
> checking out other things - it only stops after checking out a branch.)

Correct.  And it is also worth pointing out that this is not "git
branch"; the users would see it primarily as a problem with "git
status", which would die the same way.

> This is because "git branch" attempts to DWIM "@{u}" when determining
> the "HEAD detached" message, but that doesn't work because HEAD no
> longer points to a branch.
>
> Therefore, when calculating the status of a worktree, only expand such a
> string, not DWIM it. Such strings would not match a ref, and "git
> branch" would report "HEAD detached at <hash>" instead.

OK.  "Detached at <hash>" is somewhat unsatisfactory (you most
likely have detached from refs/remotes/origin/<something>), but
it is much better than "fatal" X-<.

> One alternative is to plumb down a flag from dwim_ref() to
> interpret_branch_mark() that indicates that (1) don't read HEAD when
> processing, and (2) when encountering a ref of the form "<optional
> ref>@{u}", don't die when the optional ref doesn't exist, is not a
> branch, or does not have an upstream. (We cannot change the
> functionality outright because other parts of Git rely on such a message
> being printed.)

Thanks for a good analysis.  That likely is a good direction for a
longer term.  @{upstream} is "tell me the upstream of the branch
that is currently checked out", right?  So it is reasonable to
expect that it has no good answer in a detached HEAD state.  And
when on a branch, that branch may not have an upstream, and we
should prepare for such a case, too.

> diff --git a/wt-status.c b/wt-status.c
> index 98dfa6f73f..f84ebc3e2c 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1562,7 +1562,7 @@ static void wt_status_get_detached_from(struct repository *r,
>  		return;
>  	}
>  
> -	if (dwim_ref(cb.buf.buf, cb.buf.len, &oid, &ref) == 1 &&
> +	if (expand_ref(the_repository, cb.buf.buf, cb.buf.len, &oid, &ref) == 1 &&
>  	    /* sha1 is a commit? match without further lookup */
>  	    (oideq(&cb.noid, &oid) ||
>  	     /* perhaps sha1 is a tag, try to dereference to a commit */

Hmph, I just did this:

    $ git branch -b -t next origin/next
    $ git checkout next@{upstream}
    $ git status

It used to say "HEAD detached at origin/next" without this change,
but now it says "HEAD detached at 046d49d455", which smells like a
regression.

Junio C Hamano May 13, 2020, 2:59 p.m. UTC | #2

Junio C Hamano <gitster@pobox.com> writes:

> OK.  "Detached at <hash>" is somewhat unsatisfactory (you most
> likely have detached from refs/remotes/origin/<something>), but
> it is much better than "fatal" X-<.
>
>> One alternative is to plumb down a flag from dwim_ref() to
>> interpret_branch_mark() that indicates that (1) don't read HEAD when
>> processing, and (2) when encountering a ref of the form "<optional
>> ref>@{u}", don't die when the optional ref doesn't exist, is not a
>> branch, or does not have an upstream. (We cannot change the
>> functionality outright because other parts of Git rely on such a message
>> being printed.)
>
> Thanks for a good analysis.  That likely is a good direction for a
> longer term.  @{upstream} is "tell me the upstream of the branch
> that is currently checked out", right?  So it is reasonable to
> expect that it has no good answer in a detached HEAD state.  And
> when on a branch, that branch may not have an upstream, and we
> should prepare for such a case, too.

Well, I have to take this back.

The real cause of the problem is that we record in reflog that we
switched to @{u}, but when get_detached_from() tries to "dwim" it,
the "current branch" that is implicitly there to be applied to @{u}
is different one from what used to compute which branch to check
out, isn't it?  And it probably is not limited to @{upstream} but
other @{<stuff>} that implicitly apply to the "current branch",
e.g. @{push}.

This causes a few potential problems:

 - The "current" branch may be different from the one when such a
   reflog entry that refers to @{<stuff>} when we later yank
   @{<stuff>} out of the reflog and try to use it.  This is the
   problem the patch under discussion tries to hide, but the patch
   also breaks cases that are working fine.

 - Even if the user gave the branch name (i.e. 'next@{upstream}' in
   the example this patch would have introduced a regression) or if
   we updated get_detached_from() to correctly infer the branch that
   was current when the reflog entry in which we found @{upstream}
   was recorded, the upstream for the branch may have been
   reconfigured between the time when the reflog entry was written
   and get_detached_from() is called.  'next@{upstream}' may have
   been 'origin/next' back then but it may be a different
   remote-tracking branch now.  This issue is not solved by the
   patch---the issue is unfixable unless we log the changes to the
   branch configuration so that we can figure out what was the
   upstream of 'next' back then, which we won't do.

Assuming that is the root cause, I think the right solution likely
lies along these three lines:

 (1) record not @{<stuff>} (or <branch>@{<stuff>} for that matter),
     but the actual starting point in the reflog (e.g. in the
     example this patch would have introduced a regression,
     i.e. next@{u}, we should record 'origin/next'.  In the example
     this patch would have used degraded output to prevent dying,
     i.e. @{u}, we should also record 'origin/next')---this also
     will fix the "the branch's upstream may be different now"
     problem.

 (2) a patch like yours to use expand_ref() as a fallback, but only
     for the "@{<stuff>}" notation that depends on what the "then
     current" branch was---this is a mere band-aid, like the patch
     under discussion is, but at least it would not regress the
     cases that are "working".  "The upstream may be different now"
     problem is still there.

 (3) when we have to interpret @{<stuff>} found in the reflog, go
     back to see which branch was current, and compute @{<stuff>}
     relative to that branch---this would "fix" more cases than (2)
     would, but it won't fix "the upstream can change" problem, and
     I think the trade-off is not all that great.

I think the combination of (1) and (2) is likely to be the best way
to go.  (1) will remove the root cause, and (2) will allow us to
cope with reflog entries before (1) is applied.

Jonathan Tan May 18, 2020, 10:24 p.m. UTC | #3

>  (1) record not @{<stuff>} (or <branch>@{<stuff>} for that matter),
>      but the actual starting point in the reflog (e.g. in the
>      example this patch would have introduced a regression,
>      i.e. next@{u}, we should record 'origin/next'.  In the example
>      this patch would have used degraded output to prevent dying,
>      i.e. @{u}, we should also record 'origin/next')---this also
>      will fix the "the branch's upstream may be different now"
>      problem.

This sounds reasonable. I took a look at this.

The part that converts the user-given refname (e.g. "@{u}") into an OID
is the invocation of get_oid_mb() in parse_branchname_arg() in
builtin/checkout.c, and get_oid_mb() eventually calls repo_dwim_ref()
which has access to the absolute branch name ("origin/master"). I did
not try plumbing it all the way, but I tried overriding "arg" with
"refs/remotes/origin/master" after the call to get_oid_mb() and it
worked.

For reference, the stack between get_oid_mb() and repo_dwim_ref() is as
follows (the line numbers may not be accurate because of some debug
statements I added):

  repo_dwim_ref (refs.c:597)                                                                                
  get_oid_basic (sha1-name.c:875)                                                                           
  get_oid_1 (sha1-name.c:1195)                                                                              
  get_oid_with_context_1 (sha1-name.c:1812)                                                         
  get_oid_with_context (sha1-name.c:1959)
  repo_get_oid (sha1-name.c:1610)
  repo_get_oid_mb (sha1-name.c:1382)

Besides the increase in complicatedness of all the listed functions that
we would need in order to plumb the absolute branch name through, I
haven't checked if the absolute branch name is the one that we should
use whenever we write to the reflog, or if there are some times that we
still want to use the user-specified name. I'll take a further look, but
any ideas are welcome.

Jonathan Nieder Aug. 27, 2020, 1:47 a.m. UTC | #4

Hi,

Jonathan Tan wrote:

> When a user checks out the upstream branch of HEAD and then runs "git
> branch", like this:
>
>   git checkout @{u}
>   git branch
>
> an error message is printed:
>
>   fatal: HEAD does not point to a branch

Interesting.  Even worse, it happens when I run "git status".

[...]
> This is because "git branch" attempts to DWIM "@{u}" when determining
> the "HEAD detached" message, but that doesn't work because HEAD no
> longer points to a branch.
>
> Therefore, when calculating the status of a worktree, only expand such a
> string, not DWIM it. Such strings would not match a ref, and "git
> branch" would report "HEAD detached at <hash>" instead.
[...]
> -	if (dwim_ref(cb.buf.buf, cb.buf.len, &oid, &ref) == 1 &&
> +	if (expand_ref(the_repository, cb.buf.buf, cb.buf.len, &oid, &ref) == 1 &&
>  	    /* sha1 is a commit? match without further lookup */

Alas, as Junio mentions downthread, this patch produces a regression
in behavior: before,

	$ git checkout --quiet master@{u}
	$ git status | head -1
	HEAD detached at origin/master

after,

	$ git checkout --quiet master@{u}
	$ git status | head -1
	HEAD detached at 675a4aaf3b2

This ends up being a bit subtle.

The current branch can be in one of three states: I can be on a branch
(HEAD symref pointing to a ref refs/heads/branchname), detached (HEAD
psuedoref pointing to a commit), or on a branch yet to be born.
__git_ps1 in contrib/completion/git-prompt.sh, for example, is able to
show these three states.

Since b397ea4863a (status: show more info than "currently not on any
branch", 2013-03-13), "git status", on the other hand, gives richer
information.  In the detached case, naming the commit that HEAD points
to doesn't really give a user enough information to orient themself,
so we look at the reflog (!) to find out what the user had intended to
check out.  Reflog entries look like

	checkout: moving from master to v1.0

and the "v1.0" can tell us that v1.0 is the tag that we detached HEAD
on.

This strikes me as always having been strange in a few ways:

On one hand, this is using the reflog.  reflog entries are
historically meant to be human-readable.  They are a raw string, not
structured data.  They can be translated to a different human
language.  Other tools interacting with git repositories are not
guaranteed to use the same string.  Changes such as the introduction
of "git switch" could be expected to lead to writing "switch:" instead
of "checkout:" some day.

Beyond that, it involves guesswork.  As b397ea4863a explains,

	When it cannot figure out the original ref, it shows an
	abbreviated SHA-1.

The reflog entry contains the parameter passed to "git checkout" in
the form the user provided --- it is not a full ref name or string
meant to be appended to refs/heads/ but it can be arbitrary syntax
supported by "git checkout".  At the time that the checkout happened,
we know what *commit* that name resolved to, but we do not know what
ref it resolved to:

- refs in the search path can have been created or deleted since then
- with @{u} syntax, the named branch's upstream can have been
  reconfigured
- and so on

If we want to know what ref HEAD was detached at, wouldn't we want
some reliable source of that information that records it at the time
it was known?

Another example is if I used "git checkout -" syntax.  In that case,
the reflog records the commit id that I am checking out, instead of
recording "-".  That's because (after "-" is replaced by "@{-1}") the
"magic branch" handler strbuf_branchname replaces @{-1} with with the
branch name being checked out.  That branch name *also* comes from the
reflog, this time from the <old> side of the

	checkout: moving from <old> to <new>

line, and <old> is populated as a simple commit id or branch name read
from HEAD.  So this creates a bit of an inconsistency: if I run "git
status", "git checkout -", "git checkout -" again, and then "git
status" again, the first "git status" tells me what ref I had used to
detach HEAD but the second does not.

Okay, so much for background.

The motivating error producing

	fatal: HEAD does not point to a branch

is a special case of the "we do not know what ref it resolved to"
problem described above: the string @{upstream} represents the
upstream of the branch that HEAD pointed to *at the time of the
checkout*, but since then HEAD has been detached.

[...]
> One alternative is to plumb down a flag from dwim_ref() to
> interpret_branch_mark() that indicates that (1) don't read HEAD when
> processing, and (2) when encountering a ref of the form "<optional
> ref>@{u}", don't die when the optional ref doesn't exist, is not a
> branch, or does not have an upstream.

Given the context above, two possibilities seem appealing:

 A. Like you're hinting, could dwim_ref get a variant that returns -1
    instead of die()ing on failure?  That way, we could fulfill the
    intent described in b397ea48:

	When it cannot figure out the original ref, it shows an abbreviated
	SHA-1.

 B. Alternatively, in the @{u} case could we switch together the <old>
    and <new> pieces of the context from the

	checkout: moving from master to @{upstream}

    reflog line to make "master@{upstream}"?  It's still possible for
    the upstream to have changed since then, but at least in the
    common case this would match the lookup that happened at checkout
    time.

Thoughts?

Thanks,
Jonathan

Junio C Hamano Aug. 27, 2020, 2:10 a.m. UTC | #5

Jonathan Nieder <jrnieder@gmail.com> writes:

> Given the context above, two possibilities seem appealing:
>
>  A. Like you're hinting, could dwim_ref get a variant that returns -1
>     instead of die()ing on failure?  That way, we could fulfill the
>     intent described in b397ea48:
>
> 	When it cannot figure out the original ref, it shows an abbreviated
> 	SHA-1.
>
>  B. Alternatively, in the @{u} case could we switch together the <old>
>     and <new> pieces of the context from the
>
> 	checkout: moving from master to @{upstream}
>
>     reflog line to make "master@{upstream}"?  It's still possible for
>     the upstream to have changed since then, but at least in the
>     common case this would match the lookup that happened at checkout
>     time.

Ah, blast from the past.  Thanks for resurrecting.

If we are allowed to change what goes to reflog, can we do even
better than recording master@{upstream} at the time "checkout"
records the HEAD movement?  "checkout: moving from next to master"
would be far better than "moving from next to next@{upstream}" or
"moving from next to @{upstream}".  

Can we even change the phrasing altogether, like "checkout: moving
from next to detached e9b77c..."?  That would produce even more
precise report.

wt-status: expand, not dwim, a "detached from" ref

Commit Message

Comments

Patch