Message ID | pull.1656.git.1707411636382.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | prune: mark rebase autostash and orig-head as reachable | expand |
On Thu, Feb 8, 2024 at 12:00 PM Phillip Wood via GitGitGadget <gitgitgadget@gmail.com> wrote: > Rebase records the oid of HEAD before rebasing and the commit created by > "--autostash" in files in the rebase state directory. This means that > the autostash commit is never reachable from any ref or reflog and when > rebasing a detached HEAD the original HEAD can become unreachable if the > user expires HEAD's the reflog while the rebase is running. Fix this by > reading the relevant files when marking reachable commits. > > Note that it is possible for the commit recorded in > .git/rebase-merge/amend to be unreachable but pruning that object does > not affect the operation of "git rebase --continue" as we're only > interested in the object id, not in the object itself. > > Reported-by: Orgad Shaneh <orgads@gmail.com> > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> > --- > diff --git a/reachable.c b/reachable.c > @@ -30,6 +31,53 @@ static void update_progress(struct connectivity_progress *cp) > +static int add_one_file(const char *path, struct rev_info *revs) > +{ > + struct strbuf buf = STRBUF_INIT; > + > + if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) { > + strbuf_release(&buf); > + return 0; > + } > + strbuf_trim(&buf); > + if (!get_oid_hex(buf.buf, &oid)) { > + object = parse_object_or_die(&oid, buf.buf); > + add_pending_object(revs, object, ""); > + } > + return 0; > +} Is this leaking the strbuf? Should the function instead end with: strbuf_release(&buf); return 0; Also, what is the significance of the return value of this function? All code paths seem to be returning 0 unconditionally, and the caller ignores the return value. > +/* Mark objects recored in rebase state files as reachable. */ s/recored/recorded/
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes: > Rebase records the oid of HEAD before rebasing and the commit created by > "--autostash" in files in the rebase state directory. This means that > the autostash commit is never reachable from any ref or reflog and when > rebasing a detached HEAD the original HEAD can become unreachable if the > user expires HEAD's the reflog while the rebase is running. Fix this by > reading the relevant files when marking reachable commits. I do not like this kind of special casing in general, but because these are our tools' droppings, I am OK to grandfather them in, as long as we promise ourselves that we will not add more of these ad-hoc "text files" that record object names, loss of which affects correctness. They should, like "git bisect", be using proper references to protect these objects instead, of course. I agree with you that we might want to add pseudorefs as a starting points of reachability traversal, but I suspect it would add unnecessary complexity we would rather not want to deal with. For example, not GC'ing what is pointed at by lines in FETCH_HEAD is OK. Excluding those objects that are only reachable from an object mentioned by a pseudoref, when a new "git fetch" is negotiating with a remote what objects need to be sent here, might be disastrous, as the pseudoref that said "this object is here and you can safely consider everything reachable from it is" will be short-lived and can go away anytime, and an auto-gc kicking in at a wrong time ... Thanks.
Hi Eric On 08/02/2024 17:25, Eric Sunshine wrote: > On Thu, Feb 8, 2024 at 12:00 PM Phillip Wood via GitGitGadget >> +static int add_one_file(const char *path, struct rev_info *revs) >> +{ >> + struct strbuf buf = STRBUF_INIT; >> + >> + if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) { >> + strbuf_release(&buf); >> + return 0; >> + } >> + strbuf_trim(&buf); >> + if (!get_oid_hex(buf.buf, &oid)) { >> + object = parse_object_or_die(&oid, buf.buf); >> + add_pending_object(revs, object, ""); >> + } >> + return 0; >> +} > > Is this leaking the strbuf? Should the function instead end with: > > strbuf_release(&buf); > return 0; Yes, well spotted > Also, what is the significance of the return value of this function? > All code paths seem to be returning 0 unconditionally, and the caller > ignores the return value. Good point, I think in an earlier draft it returned an error at one point, I'll change the return types. >> +/* Mark objects recored in rebase state files as reachable. */ > > s/recored/recorded/ Sharp eyes as ever, thanks for looking at this I'll re-roll Phillip
Hi Junio On 08/02/2024 18:06, Junio C Hamano wrote: > "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Rebase records the oid of HEAD before rebasing and the commit created by >> "--autostash" in files in the rebase state directory. This means that >> the autostash commit is never reachable from any ref or reflog and when >> rebasing a detached HEAD the original HEAD can become unreachable if the >> user expires HEAD's the reflog while the rebase is running. Fix this by >> reading the relevant files when marking reachable commits. > > I do not like this kind of special casing in general, but because > these are our tools' droppings, I am OK to grandfather them in, as > long as we promise ourselves that we will not add more of these > ad-hoc "text files" that record object names, loss of which affects > correctness. They should, like "git bisect", be using proper > references to protect these objects instead, of course. We should definitely do that for future commands > I agree with you that we might want to add pseudorefs as a starting > points of reachability traversal, but I suspect it would add > unnecessary complexity we would rather not want to deal with. > > For example, not GC'ing what is pointed at by lines in FETCH_HEAD is > OK. Excluding those objects that are only reachable from an object > mentioned by a pseudoref, when a new "git fetch" is negotiating with > a remote what objects need to be sent here, might be disastrous, as > the pseudoref that said "this object is here and you can safely > consider everything reachable from it is" will be short-lived and > can go away anytime, and an auto-gc kicking in at a wrong time ... I can see that including pseudorefs when "git fetch" is negotiating could be problematic but does it use mark_reachable_objects()? Maybe I'm missing something as I've only done a quick grep but it only seems to be called from builtin/prune.c and builtin/repack.c and prior to 4421474e06 (Move traversal of reachable objects into a separate library., 2007-01-06) it seems to have been a static method in builtin-prune.c Best Wishes Phillip
On Thu, Feb 8, 2024 at 10:08 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Rebase records the oid of HEAD before rebasing and the commit created by > > "--autostash" in files in the rebase state directory. This means that ... > I do not like this kind of special casing in general, but because > these are our tools' droppings, I am OK to grandfather them in, as > long as we promise ourselves that we will not add more of these > ad-hoc "text files" that record object names, loss of which affects > correctness. They should, like "git bisect", be using proper > references to protect these objects instead, of course. I have long wanted to have a special ref named "AUTOSTASH" since it supports my workflow of applying workdir changes to previous commits during a rebase. For example, I often do this: $ git rebase -i Created autostash: c0ffeebea0 ... <stopped to edit some commit in my history> $ git stash apply c0ffeebea0 $ git commit --amend && git rebase --continue But it requires me to find the text output after "Created autostash:" from the original rebase command which may have scrolled a lot by now. It would be easier to say: $ git stash apply AUTOSTASH I see that MERGE_AUTOSTASH has been added lately. And I am inferring that there's a desire to remove (eventually) these file-based info trackers such as "rebase-apply/autostash". Is there any reason not to raise the rebase/autostash notation to a proper ref now? Should it be named REBASE_AUTOSTASH if I add this? Even if we don't remove the file-based notation immediately "rebase-apply/autostash", I would like to add a ref that duplicates the information for my workflow. Maybe we can deprecate the file itself and remove it in some future version.
diff --git a/reachable.c b/reachable.c index f29b06a5d05..fef29422d4a 100644 --- a/reachable.c +++ b/reachable.c @@ -17,6 +17,7 @@ #include "pack-mtimes.h" #include "config.h" #include "run-command.h" +#include "sequencer.h" struct connectivity_progress { struct progress *progress; @@ -30,6 +31,53 @@ static void update_progress(struct connectivity_progress *cp) display_progress(cp->progress, cp->count); } +static int add_one_file(const char *path, struct rev_info *revs) +{ + struct strbuf buf = STRBUF_INIT; + struct object_id oid; + struct object *object; + + if (!read_oneliner(&buf, path, READ_ONELINER_SKIP_IF_EMPTY)) { + strbuf_release(&buf); + return 0; + } + strbuf_trim(&buf); + if (!get_oid_hex(buf.buf, &oid)) { + object = parse_object_or_die(&oid, buf.buf); + add_pending_object(revs, object, ""); + } + return 0; +} + +/* Mark objects recored in rebase state files as reachable. */ +static int add_rebase_files(struct rev_info *revs) +{ + struct strbuf buf = STRBUF_INIT; + size_t len; + const char *path[] = { + "rebase-apply/autostash", + "rebase-apply/orig-head", + "rebase-merge/autostash", + "rebase-merge/orig-head", + }; + struct worktree **worktrees = get_worktrees(); + + for (struct worktree **wt = worktrees; *wt; wt++) { + strbuf_reset(&buf); + strbuf_addstr(&buf, get_worktree_git_dir(*wt)); + strbuf_complete(&buf, '/'); + len = buf.len; + for (size_t i = 0; i < ARRAY_SIZE(path); i++) { + strbuf_setlen(&buf, len); + strbuf_addstr(&buf, path[i]); + add_one_file(buf.buf, revs); + } + } + strbuf_release(&buf); + free_worktrees(worktrees); + return 0; +} + static int add_one_ref(const char *path, const struct object_id *oid, int flag, void *cb_data) { @@ -322,6 +370,9 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog, head_ref(add_one_ref, revs); other_head_refs(add_one_ref, revs); + /* rebase autostash and orig-head */ + add_rebase_files(revs); + /* Add all reflog info */ if (mark_reflog) add_reflogs_to_pending(revs, 0); diff --git a/t/t3407-rebase-abort.sh b/t/t3407-rebase-abort.sh index ebbaed147a6..9f49c4228b6 100755 --- a/t/t3407-rebase-abort.sh +++ b/t/t3407-rebase-abort.sh @@ -40,9 +40,24 @@ testrebase() { test_path_is_missing "$state_dir" ' + test_expect_success "pre rebase$type head is marked as reachable" ' + # Clean up the state from the previous one + git checkout -f --detach pre-rebase && + test_tick && + git commit --amend --only -m "reworded" && + orig_head=$(git rev-parse HEAD) && + test_must_fail git rebase$type main && + # Stop ORIG_HEAD marking $state_dir/orig-head as reachable + git update-ref -d ORIG_HEAD && + git reflog expire --expire="$GIT_COMMITTER_DATE" --all && + git prune --expire=now && + git rebase --abort && + test_cmp_rev $orig_head HEAD + ' + test_expect_success "rebase$type --abort after --skip" ' # Clean up the state from the previous one - git reset --hard pre-rebase && + git checkout -B to-rebase pre-rebase && test_must_fail git rebase$type main && test_path_is_dir "$state_dir" && test_must_fail git rebase --skip && diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh index 693934ee8be..1a820f14815 100755 --- a/t/t3420-rebase-autostash.sh +++ b/t/t3420-rebase-autostash.sh @@ -333,4 +333,14 @@ test_expect_success 'never change active branch' ' test_cmp_rev not-the-feature-branch unrelated-onto-branch ' +test_expect_success 'autostash commit is marked as reachable' ' + echo changed >file0 && + git rebase --autostash --exec "git prune --expire=now" \ + feature-branch^ feature-branch && + # git rebase succeeds if the stash cannot be applied so we need to check + # the contents of file0 + echo changed >expect && + test_cmp expect file0 +' + test_done