diff mbox series

prune: mark rebase autostash and orig-head as reachable

Message ID pull.1656.git.1707411636382.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series prune: mark rebase autostash and orig-head as reachable | expand

Commit Message

Phillip Wood Feb. 8, 2024, 5 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rebase records the oid of HEAD before rebasing and the commit created by
"--autostash" in files in the rebase state directory. This means that
the autostash commit is never reachable from any ref or reflog and when
rebasing a detached HEAD the original HEAD can become unreachable if the
user expires HEAD's the reflog while the rebase is running. Fix this by
reading the relevant files when marking reachable commits.

Note that it is possible for the commit recorded in
.git/rebase-merge/amend to be unreachable but pruning that object does
not affect the operation of "git rebase --continue" as we're only
interested in the object id, not in the object itself.

Reported-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
    prune: mark rebase autostash and orig-head as reachable
    
    Thanks for reporting this Orgad, it reminded me that I've been meaning
    to clean up this patch for a while.
    
    I've got a patch on top of this that marks selected pseudorefs such as
    ORIG_HEAD and AUTO_MERGE as reachable but I've held off sending that as
    I think it would be better to rebase it on kn/for-all-refs once that has
    stabilized as that will hopefully allows us to mark all pseudorefs as
    reachable rather than using a hard-coded list.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1656%2Fphillipwood%2Fprune-protect-rebase-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1656/phillipwood/prune-protect-rebase-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1656

 reachable.c                 | 51 +++++++++++++++++++++++++++++++++++++
 t/t3407-rebase-abort.sh     | 17 ++++++++++++-
 t/t3420-rebase-autostash.sh | 10 ++++++++
 3 files changed, 77 insertions(+), 1 deletion(-)


base-commit: 2a540e432fe5dff3cfa9d3bf7ca56db2ad12ebb9

Comments

Eric Sunshine Feb. 8, 2024, 5:25 p.m. UTC | #1
On Thu, Feb 8, 2024 at 12:00 PM Phillip Wood via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Rebase records the oid of HEAD before rebasing and the commit created by
> "--autostash" in files in the rebase state directory. This means that
> the autostash commit is never reachable from any ref or reflog and when
> rebasing a detached HEAD the original HEAD can become unreachable if the
> user expires HEAD's the reflog while the rebase is running. Fix this by
> reading the relevant files when marking reachable commits.
>
> Note that it is possible for the commit recorded in
> .git/rebase-merge/amend to be unreachable but pruning that object does
> not affect the operation of "git rebase --continue" as we're only
> interested in the object id, not in the object itself.
>
> Reported-by: Orgad Shaneh <orgads@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
> diff --git a/reachable.c b/reachable.c
> @@ -30,6 +31,53 @@ static void update_progress(struct connectivity_progress *cp)
> +static int add_one_file(const char *path, struct rev_info *revs)
> +{
> +       struct strbuf buf = STRBUF_INIT;
> +
> +       if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) {
> +               strbuf_release(&buf);
> +               return 0;
> +       }
> +       strbuf_trim(&buf);
> +       if (!get_oid_hex(buf.buf, &oid)) {
> +               object = parse_object_or_die(&oid, buf.buf);
> +               add_pending_object(revs, object, "");
> +       }
> +       return 0;
> +}

Is this leaking the strbuf? Should the function instead end with:

    strbuf_release(&buf);
    return 0;

Also, what is the significance of the return value of this function?
All code paths seem to be returning 0 unconditionally, and the caller
ignores the return value.

> +/* Mark objects recored in rebase state files as reachable. */

s/recored/recorded/
Junio C Hamano Feb. 8, 2024, 6:06 p.m. UTC | #2
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Rebase records the oid of HEAD before rebasing and the commit created by
> "--autostash" in files in the rebase state directory. This means that
> the autostash commit is never reachable from any ref or reflog and when
> rebasing a detached HEAD the original HEAD can become unreachable if the
> user expires HEAD's the reflog while the rebase is running. Fix this by
> reading the relevant files when marking reachable commits.

I do not like this kind of special casing in general, but because
these are our tools' droppings, I am OK to grandfather them in, as
long as we promise ourselves that we will not add more of these
ad-hoc "text files" that record object names, loss of which affects
correctness.  They should, like "git bisect", be using proper
references to protect these objects instead, of course.

I agree with you that we might want to add pseudorefs as a starting
points of reachability traversal, but I suspect it would add
unnecessary complexity we would rather not want to deal with.

For example, not GC'ing what is pointed at by lines in FETCH_HEAD is
OK.  Excluding those objects that are only reachable from an object
mentioned by a pseudoref, when a new "git fetch" is negotiating with
a remote what objects need to be sent here, might be disastrous, as
the pseudoref that said "this object is here and you can safely
consider everything reachable from it is" will be short-lived and
can go away anytime, and an auto-gc kicking in at a wrong time ...

Thanks.
Phillip Wood Feb. 9, 2024, 11:08 a.m. UTC | #3
Hi Eric

On 08/02/2024 17:25, Eric Sunshine wrote:
> On Thu, Feb 8, 2024 at 12:00 PM Phillip Wood via GitGitGadget

>> +static int add_one_file(const char *path, struct rev_info *revs)
>> +{
>> +       struct strbuf buf = STRBUF_INIT;
>> +
>> +       if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) {
>> +               strbuf_release(&buf);
>> +               return 0;
>> +       }
>> +       strbuf_trim(&buf);
>> +       if (!get_oid_hex(buf.buf, &oid)) {
>> +               object = parse_object_or_die(&oid, buf.buf);
>> +               add_pending_object(revs, object, "");
>> +       }
>> +       return 0;
>> +}
> 
> Is this leaking the strbuf? Should the function instead end with:
> 
>      strbuf_release(&buf);
>      return 0;

Yes, well spotted

> Also, what is the significance of the return value of this function?
> All code paths seem to be returning 0 unconditionally, and the caller
> ignores the return value.

Good point, I think in an earlier draft it returned an error at one 
point, I'll change the return types.

>> +/* Mark objects recored in rebase state files as reachable. */
> 
> s/recored/recorded/

Sharp eyes as ever, thanks for looking at this I'll re-roll

Phillip
Phillip Wood Feb. 9, 2024, 2:15 p.m. UTC | #4
Hi Junio

On 08/02/2024 18:06, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Rebase records the oid of HEAD before rebasing and the commit created by
>> "--autostash" in files in the rebase state directory. This means that
>> the autostash commit is never reachable from any ref or reflog and when
>> rebasing a detached HEAD the original HEAD can become unreachable if the
>> user expires HEAD's the reflog while the rebase is running. Fix this by
>> reading the relevant files when marking reachable commits.
> 
> I do not like this kind of special casing in general, but because
> these are our tools' droppings, I am OK to grandfather them in, as
> long as we promise ourselves that we will not add more of these
> ad-hoc "text files" that record object names, loss of which affects
> correctness.  They should, like "git bisect", be using proper
> references to protect these objects instead, of course.

We should definitely do that for future commands

> I agree with you that we might want to add pseudorefs as a starting
> points of reachability traversal, but I suspect it would add
> unnecessary complexity we would rather not want to deal with.
> 
> For example, not GC'ing what is pointed at by lines in FETCH_HEAD is
> OK.  Excluding those objects that are only reachable from an object
> mentioned by a pseudoref, when a new "git fetch" is negotiating with
> a remote what objects need to be sent here, might be disastrous, as
> the pseudoref that said "this object is here and you can safely
> consider everything reachable from it is" will be short-lived and
> can go away anytime, and an auto-gc kicking in at a wrong time ...

I can see that including pseudorefs when "git fetch" is negotiating 
could be problematic but does it use mark_reachable_objects()? Maybe I'm 
missing something as I've only done a quick grep but it only seems to be 
called from builtin/prune.c and builtin/repack.c and prior to 4421474e06 
(Move traversal of reachable objects into a separate library., 
2007-01-06) it seems to have been a static method in builtin-prune.c

Best Wishes

Phillip
Phil Hord May 14, 2024, 7:37 p.m. UTC | #5
On Thu, Feb 8, 2024 at 10:08 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Rebase records the oid of HEAD before rebasing and the commit created by
> > "--autostash" in files in the rebase state directory. This means that
...
> I do not like this kind of special casing in general, but because
> these are our tools' droppings, I am OK to grandfather them in, as
> long as we promise ourselves that we will not add more of these
> ad-hoc "text files" that record object names, loss of which affects
> correctness.  They should, like "git bisect", be using proper
> references to protect these objects instead, of course.

I have long wanted to have a special ref named "AUTOSTASH" since it
supports my workflow of applying workdir changes to previous commits
during a rebase.  For example, I often do this:

     $ git rebase -i
     Created autostash: c0ffeebea0
     ...
     <stopped to edit some commit in my history>
     $ git stash apply c0ffeebea0
     $ git commit --amend && git rebase --continue

But it requires me to find the text output after "Created autostash:"
from the original rebase command which may have scrolled a lot by now.
It would be easier to say:
     $ git stash apply AUTOSTASH

I see that MERGE_AUTOSTASH has been added lately. And I am inferring
that there's a desire to remove (eventually) these file-based info
trackers such as "rebase-apply/autostash". Is there any reason not to
raise the rebase/autostash notation to a proper ref now?  Should it be
named REBASE_AUTOSTASH if I add this?

Even if we don't remove the file-based notation immediately
"rebase-apply/autostash", I would like to add a ref that duplicates
the information for my workflow. Maybe we can deprecate the file
itself and remove it in some future version.
diff mbox series

Patch

diff --git a/reachable.c b/reachable.c
index f29b06a5d05..fef29422d4a 100644
--- a/reachable.c
+++ b/reachable.c
@@ -17,6 +17,7 @@ 
 #include "pack-mtimes.h"
 #include "config.h"
 #include "run-command.h"
+#include "sequencer.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -30,6 +31,53 @@  static void update_progress(struct connectivity_progress *cp)
 		display_progress(cp->progress, cp->count);
 }
 
+static int add_one_file(const char *path, struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id oid;
+	struct object *object;
+
+	if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) {
+		strbuf_release(&buf);
+		return 0;
+	}
+	strbuf_trim(&buf);
+	if (!get_oid_hex(buf.buf, &oid)) {
+		object = parse_object_or_die(&oid, buf.buf);
+		add_pending_object(revs, object, "");
+	}
+	return 0;
+}
+
+/* Mark objects recored in rebase state files as reachable. */
+static int add_rebase_files(struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t len;
+	const char *path[] = {
+		"rebase-apply/autostash",
+		"rebase-apply/orig-head",
+		"rebase-merge/autostash",
+		"rebase-merge/orig-head",
+	};
+	struct worktree **worktrees = get_worktrees();
+
+	for (struct worktree **wt = worktrees; *wt; wt++) {
+		strbuf_reset(&buf);
+		strbuf_addstr(&buf, get_worktree_git_dir(*wt));
+		strbuf_complete(&buf, '/');
+		len = buf.len;
+		for (size_t i = 0; i < ARRAY_SIZE(path); i++) {
+			strbuf_setlen(&buf, len);
+			strbuf_addstr(&buf, path[i]);
+			add_one_file(buf.buf, revs);
+		}
+	}
+	strbuf_release(&buf);
+	free_worktrees(worktrees);
+	return 0;
+}
+
 static int add_one_ref(const char *path, const struct object_id *oid,
 		       int flag, void *cb_data)
 {
@@ -322,6 +370,9 @@  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	head_ref(add_one_ref, revs);
 	other_head_refs(add_one_ref, revs);
 
+	/* rebase autostash and orig-head */
+	add_rebase_files(revs);
+
 	/* Add all reflog info */
 	if (mark_reflog)
 		add_reflogs_to_pending(revs, 0);
diff --git a/t/t3407-rebase-abort.sh b/t/t3407-rebase-abort.sh
index ebbaed147a6..9f49c4228b6 100755
--- a/t/t3407-rebase-abort.sh
+++ b/t/t3407-rebase-abort.sh
@@ -40,9 +40,24 @@  testrebase() {
 		test_path_is_missing "$state_dir"
 	'
 
+	test_expect_success "pre rebase$type head is marked as reachable" '
+		# Clean up the state from the previous one
+		git checkout -f --detach pre-rebase &&
+		test_tick &&
+		git commit --amend --only -m "reworded" &&
+		orig_head=$(git rev-parse HEAD) &&
+		test_must_fail git rebase$type main &&
+		# Stop ORIG_HEAD marking $state_dir/orig-head as reachable
+		git update-ref -d ORIG_HEAD &&
+		git reflog expire --expire="$GIT_COMMITTER_DATE" --all &&
+		git prune --expire=now &&
+		git rebase --abort &&
+		test_cmp_rev $orig_head HEAD
+	'
+
 	test_expect_success "rebase$type --abort after --skip" '
 		# Clean up the state from the previous one
-		git reset --hard pre-rebase &&
+		git checkout -B to-rebase pre-rebase &&
 		test_must_fail git rebase$type main &&
 		test_path_is_dir "$state_dir" &&
 		test_must_fail git rebase --skip &&
diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 693934ee8be..1a820f14815 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -333,4 +333,14 @@  test_expect_success 'never change active branch' '
 	test_cmp_rev not-the-feature-branch unrelated-onto-branch
 '
 
+test_expect_success 'autostash commit is marked as reachable' '
+	echo changed >file0 &&
+	git rebase --autostash --exec "git prune --expire=now" \
+		feature-branch^ feature-branch &&
+	# git rebase succeeds if the stash cannot be applied so we need to check
+	# the contents of file0
+	echo changed >expect &&
+	test_cmp expect file0
+'
+
 test_done