Message ID | 24f2c4a62317231f4eabed23bb24d345abc9d67e.1603807338.git.gitgitgadget@gmail.com |
---|---|
State | New, archived |
Series | [1/4] rebase -i: stop overwriting ORIG_HEAD buffer |
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>

My initial impression after seeing the recent report about ORIG_HEAD
was "hmph, these days, rebasing is done on detached HEAD and the
final step updates the target branch only once, so @{1} is much
easier to use---perhaps it is time to deprecate use of ORIG_HEAD?".
After all, ORIG_HEAD was invented way before we had reflog, and
given that one of the goals of reflog was to give a more general
recovery mechanism than going back one-step like ORIG_HEAD allowed
us to, and "rebase" was taught to work on detached HEAD to make
@{1} more useful, it would not be too bad to eventually retire
ORIG_HEAD in a distant future, I thought.

But it is a good initiative anyway to make ORIG_HEAD again work as
documented. Thanks for working on it.

> After rebasing ORIG_HEAD is supposed to point to the old HEAD of the

A comma after "rebasing".

> rebased branch. Unfortunately the buffer storing the oid was
> overwritten with a new oid before ORIG_HEAD was created. The buffer is
> also used when writing .git/rebase-merge/orig-head which is used by
> `rebase --abort` to restore the previous head. Luckily that file is
> written before the buffer is overwritten. As we want the full oid
> find_unique_abbrev() is replaced with oid_to_hex_r() rather than
> find_unique_abbrev_r().

The above is hard to read and understand. It is unclear where the
observation of the current behaviour (which is often the explanation
of the cause of the bug) ends and the description of new behaviour
begins.

    ... old HEAD of the rebased branch. The code used
    find_unique_abbrev() to obtain the object name of the old HEAD
    and wrote to both .git/rebase-merge/orig-head (used by `rebase
    --abort` to go back to the previous state) and to ORIG_HEAD.
    The buffer find_unique_abbrev() gives back is volatile,
    unfortunately, and was overwritten after the former file is
    written but before ORIG_HEAD is written, leaving an incorrect
    object name in it.

Up to that point is the observation of the current code, which
explains where the bug comes from. Please have a paragraph break
after that, before explaining the solution, e.g.

    Avoid relying on the volatile buffer of find_unique_abbrev(),
    and instead supply our own buffer to keep the object name.
    Because we want to use the full object name, use oid_to_hex_r()
    instead of find_unique_abbrev_r() to do so.

> I think that all of the users of head_hash should actually be using
> opts->orig_head instead as passing a string rather than a struct
> object_id around is a hang over from the scripted implementation. This
> patch just fixes the immediate bug and adds a regression test based on
> Caspar's reproduction example. The users will be converted to use
> struct object_id and head_hash removed in the next few commits.

Makes sense.

>
> Reported-by: Caspar Duregger <herr.kaste@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---

If you cite "Caspar's repro example" like that in the log message,
it is incomplete to leave out a URL to the mail archive.
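To make the "volatile buffer" failure mode concrete, here is a minimal standalone C sketch; it is not Git code. volatile_name() and stable_name_r() are hypothetical stand-ins for find_unique_abbrev() and an *_r() variant, the four-slot ring of static buffers is an assumed implementation detail chosen for illustration, and the "object names" are 16-digit formatted integers rather than real hashes. It shows a pointer into shared static storage changing after later calls, while a caller-owned buffer keeps its value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 4  /* assumed ring size, for illustration only */
#define HEX_DIGITS 16 /* toy "object names" are 16 hex digits */

/* Hypothetical stand-in for find_unique_abbrev(): formats into one slot
 * of a static ring and returns a pointer the caller does not own. */
static const char *volatile_name(unsigned long value)
{
	static char ring[RING_SLOTS][HEX_DIGITS + 1];
	static int next;
	char *buf = ring[next];

	next = (next + 1) % RING_SLOTS;
	snprintf(buf, sizeof(ring[0]), "%0*lx", HEX_DIGITS, value);
	return buf;
}

/* Hypothetical stand-in for an *_r() helper: the caller owns the buffer. */
static char *stable_name_r(char *buf, size_t size, unsigned long value)
{
	assert(size > HEX_DIGITS); /* digits plus the terminating NUL */
	snprintf(buf, size, "%0*lx", HEX_DIGITS, value);
	return buf;
}

/* Buggy pattern: keep the returned pointer, make more calls, use it again.
 * Returns 1 if the saved name changed underneath us. */
static int saved_pointer_clobbered(void)
{
	const char *head = volatile_name(0xabcUL);
	char copy[HEX_DIGITS + 1];

	strcpy(copy, head);                 /* like the early orig-head write */
	for (int i = 0; i < RING_SLOTS; i++)
		volatile_name(0x100UL + i); /* unrelated calls in between */
	return strcmp(head, copy) != 0;     /* like the late ORIG_HEAD write */
}

/* Fixed pattern: the name lives in a buffer we own. Returns 1 on success. */
static int owned_buffer_survives(void)
{
	char head[HEX_DIGITS + 1];

	stable_name_r(head, sizeof(head), 0xabcUL);
	for (int i = 0; i < RING_SLOTS; i++)
		volatile_name(0x100UL + i);
	return strcmp(head, "0000000000000abc") == 0;
}
```

Both saved_pointer_clobbered() and owned_buffer_survives() return 1: once enough later calls cycle through the ring, the first pointer's slot is reused, whereas the caller-owned array is untouched. The exact number of calls it takes to hit the problem depends on the helper's internals, which is why this kind of bug can depend on how much work happens between the two writes.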
On 27/10/2020 21:10, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> My initial impression after seeing the recent report about ORIG_HEAD
> was "hmph, these days, rebasing is done on detached HEAD and the
> final step updates the target branch only once, so @{1} is much
> easier to use---perhaps it is time to deprecate use of ORIG_HEAD?".
> After all, ORIG_HEAD was invented way before we had reflog, and
> given that one of the goals of reflog was to give a more general
> recovery mechanism than going back one-step like ORIG_HEAD allowed
> us to, and "rebase" was taught to work on detached HEAD to make
> @{1} more useful, it would not be too bad to eventually retire
> ORIG_HEAD in a distant future, I thought.

Thanks for filling in the history. As reset sets ORIG_HEAD as well as
rebase, it can be confusing, so retiring it in the future may be a
good idea.

> But it is a good initiative anyway to make ORIG_HEAD again work as
> documented. Thanks for working on it.
>
>> After rebasing ORIG_HEAD is supposed to point to the old HEAD of the
>
> A comma after "rebasing".

Sure

>> rebased branch. Unfortunately the buffer storing the oid was
>> overwritten with a new oid before ORIG_HEAD was created. The buffer is
>> also used when writing .git/rebase-merge/orig-head which is used by
>> `rebase --abort` to restore the previous head. Luckily that file is
>> written before the buffer is overwritten. As we want the full oid
>> find_unique_abbrev() is replaced with oid_to_hex_r() rather than
>> find_unique_abbrev_r().
>
> The above is hard to read and understand. It is unclear where the
> observation of the current behaviour (which is often the explanation
> of the cause of the bug) ends and the description of new behaviour
> begins.
>
>     ... old HEAD of the rebased branch. The code used
>     find_unique_abbrev() to obtain the object name of the old HEAD
>     and wrote to both .git/rebase-merge/orig-head (used by `rebase
>     --abort` to go back to the previous state) and to ORIG_HEAD.
>     The buffer find_unique_abbrev() gives back is volatile,
>     unfortunately, and was overwritten after the former file is
>     written but before ORIG_HEAD is written, leaving an incorrect
>     object name in it.
>
> Up to that point is the observation of the current code, which
> explains where the bug comes from. Please have a paragraph break
> after that, before explaining the solution, e.g.
>
>     Avoid relying on the volatile buffer of find_unique_abbrev(),
>     and instead supply our own buffer to keep the object name.
>     Because we want to use the full object name, use oid_to_hex_r()
>     instead of find_unique_abbrev_r() to do so.

That's much clearer, thanks. I wonder if swapping to oid_to_hex_r()
rather than find_unique_abbrev_r() is complicating this commit
unnecessarily, as the code changed here is deleted in patch 4; maybe
we should do the switch there.

>> I think that all of the users of head_hash should actually be using
>> opts->orig_head instead as passing a string rather than a struct
>> object_id around is a hang over from the scripted implementation. This
>> patch just fixes the immediate bug and adds a regression test based on
>> Caspar's reproduction example. The users will be converted to use
>> struct object_id and head_hash removed in the next few commits.
>
> Makes sense.
>
>>
>> Reported-by: Caspar Duregger <herr.kaste@gmail.com>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>
> If you cite "Caspar's repro example" like that in the log message,
> it is incomplete to leave out a URL to the mail archive.

Good point, I'll add a URL.

Thanks for your comments

Phillip
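Whichever helper ends up being used, one sizing detail is worth keeping in mind: Git's oid_to_hex_r() writes a terminating NUL after the hex digits, so a buffer meant to hold any full object name needs GIT_MAX_HEXSZ + 1 bytes. GIT_MAX_HEXSZ is 64 (the SHA-256 length), so the posted `char head_hash[GIT_MAX_HEXSZ]` looks fine for SHA-1's 40 digits but one byte short for a SHA-256 repository. The standalone sketch below illustrates the arithmetic; hex_r(), MAX_RAWSZ and MAX_HEXSZ are hypothetical names, not Git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed sizes, mirroring SHA-256: 32 raw bytes -> 64 hex digits. */
#define MAX_RAWSZ 32
#define MAX_HEXSZ (2 * MAX_RAWSZ)

/* Hypothetical full-length formatter in the style of oid_to_hex_r():
 * writes 2 * rawsz hex digits followed by a terminating NUL. */
static char *hex_r(char *buf, size_t bufsz, const unsigned char *hash,
		   size_t rawsz)
{
	static const char digits[] = "0123456789abcdef";
	char *p = buf;

	assert(bufsz >= 2 * rawsz + 1); /* the NUL needs a byte too */
	for (size_t i = 0; i < rawsz; i++) {
		*p++ = digits[hash[i] >> 4];
		*p++ = digits[hash[i] & 0xf];
	}
	*p = '\0';
	return buf;
}

/* Formats a maximum-length hash and reports how many bytes were used,
 * counting the terminator. */
static size_t bytes_used_for_max_hash(void)
{
	unsigned char hash[MAX_RAWSZ];
	char buf[MAX_HEXSZ + 1]; /* char buf[MAX_HEXSZ] would be one byte short */

	memset(hash, 0xab, sizeof(hash));
	hex_r(buf, sizeof(buf), hash, sizeof(hash));
	return strlen(buf) + 1;
}
```

bytes_used_for_max_hash() returns 65. A SHA-1 name needs only 41 bytes, which is why an undersized array of 64 can go unnoticed when the test suite runs with SHA-1.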
On Tue, 27 Oct 2020 at 22:10, Junio C Hamano <gitster@pobox.com> wrote:
>
> My initial impression after seeing the recent report about ORIG_HEAD
> was "hmph, these days, rebasing is done on detached HEAD and the
> final step updates the target branch only once, so @{1} is much
> easier to use---perhaps it is time to deprecate use of ORIG_HEAD?".
> After all, ORIG_HEAD was invented way before we had reflog, and
> given that one of the goals of reflog was to give a more general
> recovery mechanism than going back one-step like ORIG_HEAD allowed
> us to, and "rebase" was taught to work on detached HEAD to make
> @{1} more useful, it would not be too bad to eventually retire
> ORIG_HEAD in a distant future, I thought.

Phillip pointed out that ORIG_HEAD is actually not safe *if* there is
a `reset` or `rebase --skip` during the rebase. Otherwise, by design,
ORIG_HEAD would be easier to use, as in the form `<branch_name>@{<n>}`
two things have to be decided and can go wrong.

Wouldn't it be better in the medium term to set ORIG_HEAD at the end of
the rebase operation, basically during cleanup, and make the commitment
that it always points to the right thing, both for undoing the most
recent rebase and for comparing against it (`git diff ORIG_HEAD`) to
spot possible unwanted merge resolutions?

I say medium term because a `git rebase --undo` with additional safety
checks and an `--autostash`, for example, is IMO the better long-run UX,
in a very distant future.

>
> But it is a good initiative anyway to make ORIG_HEAD again work as
> documented. Thanks for working on it.
>

Regards
Caspar Duregger
"herr.kaste" <herr.kaste@gmail.com> writes:

> Phillip pointed out that ORIG_HEAD is actually not safe *if* there is
> a `reset` or `rebase --skip` during the rebase. Otherwise, by design,
> ORIG_HEAD would be easier to use, as in the form `<branch_name>@{<n>}`
> two things have to be decided and can go wrong.

What "two"? You should be able to just say @{1} regardless---that
was the whole point of performing all the intermediate steps while
on the detached HEAD so that you can rely on <n> being 1, and @{<num
or time>} is a short-hand of <branch>@{<num or time>} for the
current branch, and not a short-hand for HEAD@{...}, to help such a
use case.

Or am I missing something?
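Junio's argument rests on the difference between the branch's reflog and HEAD's reflog during a rebase. The toy C model below sketches that difference under loudly stated assumptions: strings stand in for commits, and the names ("old-tip", "onto", "pick-N"), the fixed-size log and the exact number of HEAD entries are hypothetical, not what a real rebase records. The intermediate steps move only the detached HEAD, and the branch is updated once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Toy reflog, newest first: entries[n] is what "<ref>@{n}" names. */
struct toy_reflog {
	const char *entries[MAX_ENTRIES];
	size_t nr;
};

/* Record that the ref now points at `value`. */
static void toy_reflog_update(struct toy_reflog *log, const char *value)
{
	assert(log->nr < MAX_ENTRIES);
	memmove(&log->entries[1], &log->entries[0],
		log->nr * sizeof(log->entries[0]));
	log->entries[0] = value;
	log->nr++;
}

/* Resolve "<ref>@{n}"; NULL if the log is not that long. */
static const char *toy_reflog_nth(const struct toy_reflog *log, size_t n)
{
	return n < log->nr ? log->entries[n] : NULL;
}

/* A four-commit rebase whose intermediate steps happen on a detached
 * HEAD, with the branch itself updated only once at the end. */
static void simulate_rebase(struct toy_reflog *head, struct toy_reflog *branch)
{
	static const char *const steps[] = {
		"onto", "pick-1", "pick-2", "pick-3", "pick-4"
	};

	toy_reflog_update(branch, "old-tip");
	toy_reflog_update(head, "old-tip");
	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		toy_reflog_update(head, steps[i]); /* detached HEAD moves */
	toy_reflog_update(branch, "pick-4");       /* single branch update */
	toy_reflog_update(head, "pick-4");         /* HEAD reattached */
}
```

In this model the branch's @{1} is "old-tip" no matter how many commits were replayed, while the HEAD@{n} that reaches "old-tip" (here n = 6) depends on that count.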
On Tue, 3 Nov 2020 at 01:21, Junio C Hamano <gitster@pobox.com> wrote:
>
> "herr.kaste" <herr.kaste@gmail.com> writes:
>
> > Phillip pointed out that ORIG_HEAD is actually not safe *if* there is
> > a `reset` or `rebase --skip` during the rebase. Otherwise, by design,
> > ORIG_HEAD would be easier to use, as in the form `<branch_name>@{<n>}`
> > two things have to be decided and can go wrong.
>
> What "two"? You should be able to just say @{1} regardless---that
> was the whole point of performing all the intermediate steps while
> on the detached HEAD so that you can rely on <n> being 1, and @{<num
> or time>} is a short-hand of <branch>@{<num or time>} for the
> current branch, and not a short-hand for HEAD@{...}, to help such a
> use case.
>
> Or am I missing something?

Well, "@{1}" basically means: from the stream of things that happened,
take the first. It is very natural to refer to the most recent thing
differently. In practice, until now, I have used the @{...} form only
to refer to older things. To put it differently, when using @{...} I'm
researching history.

From the docs:

    ORIG_HEAD is created by commands that move your HEAD in a drastic
    way, to record the position of the HEAD before their operation, so
    that you can easily change the tip of the branch back to the state
    before you ran them.

That's just humane. You do something, and then you revert. I don't
need a concept of a written history here, just of recency.
diff --git a/builtin/rebase.c b/builtin/rebase.c
index eeca53382f..6def28a533 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -270,15 +270,15 @@ static int edit_todo_file(unsigned flags)
 }
 
 static int get_revision_ranges(struct commit *upstream, struct commit *onto,
-			       struct object_id *orig_head, const char **head_hash,
+			       struct object_id *orig_head, char *head_hash,
 			       char **revisions, char **shortrevisions)
 {
 	struct commit *base_rev = upstream ? upstream : onto;
 	const char *shorthead;
 
-	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
+	oid_to_hex_r(head_hash, orig_head);
 	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
-			     *head_hash);
+			     head_hash);
 
 	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
 
@@ -327,7 +327,7 @@ static void split_exec_commands(const char *cmd, struct string_list *commands)
 static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 {
 	int ret;
-	const char *head_hash = NULL;
+	char head_hash[GIT_MAX_HEXSZ];
 	char *revisions = NULL, *shortrevisions = NULL;
 	struct strvec make_script_args = STRVEC_INIT;
 	struct todo_list todo_list = TODO_LIST_INIT;
@@ -335,7 +335,7 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
 	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
-				&head_hash, &revisions, &shortrevisions))
+				head_hash, &revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 07a1617351..1e56696e4f 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1797,6 +1797,17 @@ test_expect_success 'todo has correct onto hash' '
 	test_i18ngrep "^# Rebase ..* onto $onto" actual
 '
 
+test_expect_success 'ORIG_HEAD is updated correctly' '
+	test_when_finished "git checkout master && git branch -D test-orig-head" &&
+	git checkout -b test-orig-head A &&
+	git commit --allow-empty -m A1 &&
+	git commit --allow-empty -m A2 &&
+	git commit --allow-empty -m A3 &&
+	git commit --allow-empty -m A4 &&
+	git rebase master &&
+	test_cmp_rev ORIG_HEAD test-orig-head@{1}
+'
+
 # This must be the last test in this file
 test_expect_success '$EDITOR and friends are unchanged' '
 	test_editor_unchanged