diff mbox series

sequencer: avoid dropping fixup commit that targets self via commit-ish

Message ID 20220918121053.880225-1-aclopte@gmail.com (mailing list archive)
State Superseded
Headers show
Series sequencer: avoid dropping fixup commit that targets self via commit-ish | expand

Commit Message

Johannes Altmanninger Sept. 18, 2022, 12:10 p.m. UTC
Commit 68d5d03bc4 (rebase: teach --autosquash to match on sha1 in
addition to message, 2010-11-04) made --autosquash apply a commit
with subject "fixup! 012345" to the first commit in the todo list
whose OID starts with 012345. So far so good.

More recently, c44a4c650c (rebase -i: rearrange fixup/squash lines
using the rebase--helper, 2017-07-14) reimplemented this logic in C
and introduced two behavior changes.
First, OID matches are given precedence over subject prefix
matches.  Second, instead of prefix-matching OIDs, we use
lookup_commit_reference_by_name().  This means that if 012345 is a
branch name, we will apply the fixup commit to the tip of that branch
(if that is present in the todo list).

Both behavior changes might be motivated by performance concerns
(since the commit message mentions performance).  Looking through
the todo list to find a commit that matches the given prefix can be
more expensive than looking up an OID.  The runtime of the former is
of O(n*m) where n is the size of the todo list and m is the length
of a commit subject. However, if this is really a problem, we could
easily make it O(m) by constructing a trie (prefix tree).

Demonstrate both behavior changes by adding two test cases for
"fixup! foo" where foo is a commit-ish that is not an OID-prefix.
Arguably, this feature is very weird.  If no one uses it we should
consider removing it.

Regardless, there is one bad edge case to fix.  Let refspec "foo" point
to a commit with the subject "fixup! foo". Since rebase --autosquash
finds the fixup target via lookup_commit_reference_by_name(), the
fixup target is the fixup commit itself. Obviously this can't work.
We proceed with the broken invariant and drop the fixup commit
entirely.

The self-fixup was only allowed because the fixup commit was already
added to the preliminary todo list, which it shouldn't be.  Rather,
we should first compute the fixup target and only then add the fixup
commit to the todo list. Make it so, avoiding this error by design,
and add a third test for this case.

Reported-by: Erik Cervin Edin <erik@cervined.in>
Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
---
 sequencer.c                  |  4 ++--
 t/t3415-rebase-autosquash.sh | 44 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)

Comments

Erik Cervin Edin Sept. 18, 2022, 3:05 p.m. UTC | #1
Thank you for the explanation!

On Sun, Sep 18, 2022 at 2:11 PM Johannes Altmanninger <aclopte@gmail.com> wrote:
>
> Commit 68d5d03bc4 (rebase: teach --autosquash to match on sha1 in
> addition to message, 2010-11-04) made --autosquash apply a commit
> with subject "fixup! 012345" to the first commit in the todo list
> whose OID starts with 012345. So far so good.

I wasn't aware such fixups were recognized. The only way to create
such fixups is manually, correct?
Johannes Altmanninger Sept. 18, 2022, 5:54 p.m. UTC | #2
On Sun, Sep 18, 2022 at 05:05:29PM +0200, Erik Cervin Edin wrote:
> Thank you for the explanation!
> 
> On Sun, Sep 18, 2022 at 2:11 PM Johannes Altmanninger <aclopte@gmail.com> wrote:
> >
> > Commit 68d5d03bc4 (rebase: teach --autosquash to match on sha1 in
> > addition to message, 2010-11-04) made --autosquash apply a commit
> > with subject "fixup! 012345" to the first commit in the todo list
> > whose OID starts with 012345. So far so good.
> 
> I wasn't aware such fixups were recognized.

Neither was I. I've repeatedly had issues where the message created by
"commit --fixup" was ambiguous (user error I know) but I can change my tool
to use OIDs for the cases when I squash them right away..

> The only way to create
> such fixups is manually, correct?

I think so
Junio C Hamano Sept. 19, 2022, 1:11 a.m. UTC | #3
Johannes Altmanninger <aclopte@gmail.com> writes:

> First, OID matches are given precedence over subject prefix
> matches.  Second, instead of prefix-matching OIDs, we use
> lookup_commit_reference_by_name().  This means that if 012345 is a
> branch name, we will apply the fixup commit to the tip of that branch
> (if that is present in the todo list).

Good finding.

I think that both are results from imprecise conversion from the
original done carelessly.  A rewrite does not necessarily have to be
bug-to-bug equivalent, and the precedence between object names vs
subject prefix is something I think does not have to be kept the
same as the original (in other words, a user who gives a token after
"fixup" that is ambiguous between the two deserves whatever the
implementation happens to give).  But use of _by_name() that does
not limit the input to hexadecimal _is_ a problem exactly for the
reason why we have this discussion thread.  The function accepts
more than we want to accept, and anybody who reads the original
commit 68d5d03b (rebase: teach --autosquash to match on sha1 in
addition to message, 2010-11-04) and understands why we added it
wouldn't have used it.

Your solution looks somewhat surprising to me, as I would naively
have thought to fix the use of _by_name() and limit its use only
when the input token is all hexadecimal, or something.  I'd need to
think more to convince myself why this is the right solution.

Thanks.

> Demonstrate both behavior changes by adding two test cases for
> "fixup! foo" where foo is a commit-ish that is not an OID-prefix.
> Arguably, this feature is very weird.  If no one uses it we should
> consider removing it.
>
> Regardless, there is one bad edge case to fix.  Let refspec "foo" point
> to a commit with the subject "fixup! foo". Since rebase --autosquash
> finds the fixup target via lookup_commit_reference_by_name(), the
> fixup target is the fixup commit itself. Obviously this can't work.
> We proceed with the broken invariant and drop the fixup commit
> entirely.
>
> The self-fixup was only allowed because the fixup commit was already
> added to the preliminary todo list, which it shouldn't be.  Rather,
> we should first compute the fixup target and only then add the fixup
> commit to the todo list. Make it so, avoiding this error by design,
> and add a third test for this case.
>
> Reported-by: Erik Cervin Edin <erik@cervined.in>
> Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
> ---
>  sequencer.c                  |  4 ++--
>  t/t3415-rebase-autosquash.sh | 44 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 484ca9aa50..777200a6dc 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -6287,8 +6287,6 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
>  			return error(_("the script was already rearranged."));
>  		}
>  
> -		*commit_todo_item_at(&commit_todo, item->commit) = item;
> -
>  		parse_commit(item->commit);
>  		commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
>  		find_commit_subject(commit_buffer, &subject);
> @@ -6355,6 +6353,8 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
>  					strhash(entry->subject));
>  			hashmap_put(&subject2item, &entry->entry);
>  		}
> +
> +		*commit_todo_item_at(&commit_todo, item->commit) = item;
>  	}
>  
>  	if (rearranged) {
> diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
> index 78c27496d6..879e628512 100755
> --- a/t/t3415-rebase-autosquash.sh
> +++ b/t/t3415-rebase-autosquash.sh
> @@ -232,6 +232,50 @@ test_expect_success 'auto squash that matches longer sha1' '
>  	test_line_count = 1 actual
>  '
>  
> +test_expect_success 'auto squash that matches regex' '
> +	git reset --hard base &&
> +	git commit --allow-empty -m "hay needle hay" &&
> +	git commit --allow-empty -m "fixup! :/[n]eedle" &&
> +	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
> +	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
> +	cat <<-EOF >expect &&
> +	pick HASH hay needle hay # empty
> +	fixup HASH fixup! :/[n]eedle # empty
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'auto squash of fixup commit that matches branch name' '
> +	git reset --hard base &&
> +	git commit --allow-empty -m "wip commit (just a prefix match so overshadowed by branch)" &&
> +	git commit --allow-empty -m "tip of wip" &&
> +	git branch wip &&
> +	git commit --allow-empty -m "unrelated commit" &&
> +	git commit --allow-empty -m "fixup! wip" &&
> +	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^^^ &&
> +	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
> +	cat <<-EOF >expect &&
> +	pick HASH wip commit (just a prefix match so overshadowed by branch) # empty
> +	pick HASH tip of wip # empty
> +	fixup HASH fixup! wip # empty
> +	pick HASH unrelated commit # empty
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'auto squash of fixup commit that matches branch name which points back to fixup commit' '
> +	git reset --hard base &&
> +	git commit --allow-empty -m "fixup! self-cycle" &&
> +	git branch self-cycle &&
> +	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
> +	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
> +	cat <<-EOF >expect &&
> +	pick HASH second commit
> +	pick HASH fixup! self-cycle # empty
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_auto_commit_flags () {
>  	git reset --hard base &&
>  	echo 1 >file1 &&
Junio C Hamano Sept. 19, 2022, 4:07 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> ...  But use of _by_name() that does
> not limit the input to hexadecimal _is_ a problem ...

Ah, no, sorry, this was wrong.  The original used "rev-parse -q --verify"
without restricting the "single word" to "sha1 prefix".
Junio C Hamano Sept. 19, 2022, 4:23 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Your solution looks somewhat surprising to me, as I would naively
> have thought to fix the use of _by_name() and limit its use only
> when the input token is all hexadecimal, or something.  I'd need to
> think more to convince myself why this is the right solution.

OK, we try to find what to amend with the current "fixupish" from
the todo slab, which by definition must be something that we have
already dealt with.  The current "fixupish" should not be found
because we haven't finished dealing with it, so delaying the
addition of the current one to the todo slab is a must.

There is no leaving or continuing the loop, other than an error
return that aborts the whole thing, that may break the resulting
todo slab in the normal case due to this change, either.  

The fix makes sense to me.  Will queue.

Thanks.
Junio C Hamano Sept. 19, 2022, 11:09 p.m. UTC | #6
Johannes Altmanninger <aclopte@gmail.com> writes:

> +test_expect_success 'auto squash of fixup commit that matches branch name which points back to fixup commit' '
> ...
> +	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&

The construct is rejected by sed implementations of BSD descent, it
seems.

https://github.com/git/git/actions/runs/3084784922/jobs/4987337058#step:4:1844

++ sed -ne '/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}' tmp
sed: 1: "/^[^#]/{s/[0-9a-f]\{7,\ ...": extra characters at the end of p command
error: last command exited with $?=1
not ok 11 - auto squash that matches regex

Here is a fix-up that can be squashed in.

Thanks.

 t/t3415-rebase-autosquash.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
index 879e628512..d65d2258c3 100755
--- a/t/t3415-rebase-autosquash.sh
+++ b/t/t3415-rebase-autosquash.sh
@@ -237,7 +237,7 @@ test_expect_success 'auto squash that matches regex' '
 	git commit --allow-empty -m "hay needle hay" &&
 	git commit --allow-empty -m "fixup! :/[n]eedle" &&
 	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
-	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p;}" tmp >actual &&
 	cat <<-EOF >expect &&
 	pick HASH hay needle hay # empty
 	fixup HASH fixup! :/[n]eedle # empty
@@ -253,7 +253,7 @@ test_expect_success 'auto squash of fixup commit that matches branch name' '
 	git commit --allow-empty -m "unrelated commit" &&
 	git commit --allow-empty -m "fixup! wip" &&
 	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^^^ &&
-	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p;}" tmp >actual &&
 	cat <<-EOF >expect &&
 	pick HASH wip commit (just a prefix match so overshadowed by branch) # empty
 	pick HASH tip of wip # empty
@@ -268,7 +268,7 @@ test_expect_success 'auto squash of fixup commit that matches branch name which
 	git commit --allow-empty -m "fixup! self-cycle" &&
 	git branch self-cycle &&
 	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
-	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p;}" tmp >actual &&
 	cat <<-EOF >expect &&
 	pick HASH second commit
 	pick HASH fixup! self-cycle # empty
Johannes Altmanninger Sept. 20, 2022, 3:20 a.m. UTC | #7
On Mon, Sep 19, 2022 at 09:07:27AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > ...  But use of _by_name() that does
> > not limit the input to hexadecimal _is_ a problem ...
> 
> Ah, no, sorry, this was wrong.  The original used "rev-parse -q --verify"
> without restricting the "single word" to "sha1 prefix".
> 

Yeah, I've sent a v2 that removes the false verdicts from the commit message.

Even if we restricted it to SHAs it would still be ambiguous.. I guess it
doesn't matter in practice.

Thanks
Johannes Altmanninger Sept. 20, 2022, 3:27 a.m. UTC | #8
On Mon, Sep 19, 2022 at 04:09:57PM -0700, Junio C Hamano wrote:
> Johannes Altmanninger <aclopte@gmail.com> writes:
> 
> > +test_expect_success 'auto squash of fixup commit that matches branch name which points back to fixup commit' '
> > ...
> > +	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&

FWIW I copied and adapted this one from the remerge-diff tests.  Not sure
what other tests use. Maybe the HASH replacement is worth a test helper even
though it is a heuristic that can go wrong.
diff mbox series

Patch

diff --git a/sequencer.c b/sequencer.c
index 484ca9aa50..777200a6dc 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -6287,8 +6287,6 @@  int todo_list_rearrange_squash(struct todo_list *todo_list)
 			return error(_("the script was already rearranged."));
 		}
 
-		*commit_todo_item_at(&commit_todo, item->commit) = item;
-
 		parse_commit(item->commit);
 		commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
 		find_commit_subject(commit_buffer, &subject);
@@ -6355,6 +6353,8 @@  int todo_list_rearrange_squash(struct todo_list *todo_list)
 					strhash(entry->subject));
 			hashmap_put(&subject2item, &entry->entry);
 		}
+
+		*commit_todo_item_at(&commit_todo, item->commit) = item;
 	}
 
 	if (rearranged) {
diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
index 78c27496d6..879e628512 100755
--- a/t/t3415-rebase-autosquash.sh
+++ b/t/t3415-rebase-autosquash.sh
@@ -232,6 +232,50 @@  test_expect_success 'auto squash that matches longer sha1' '
 	test_line_count = 1 actual
 '
 
+test_expect_success 'auto squash that matches regex' '
+	git reset --hard base &&
+	git commit --allow-empty -m "hay needle hay" &&
+	git commit --allow-empty -m "fixup! :/[n]eedle" &&
+	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	cat <<-EOF >expect &&
+	pick HASH hay needle hay # empty
+	fixup HASH fixup! :/[n]eedle # empty
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'auto squash of fixup commit that matches branch name' '
+	git reset --hard base &&
+	git commit --allow-empty -m "wip commit (just a prefix match so overshadowed by branch)" &&
+	git commit --allow-empty -m "tip of wip" &&
+	git branch wip &&
+	git commit --allow-empty -m "unrelated commit" &&
+	git commit --allow-empty -m "fixup! wip" &&
+	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^^^ &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	cat <<-EOF >expect &&
+	pick HASH wip commit (just a prefix match so overshadowed by branch) # empty
+	pick HASH tip of wip # empty
+	fixup HASH fixup! wip # empty
+	pick HASH unrelated commit # empty
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'auto squash of fixup commit that matches branch name which points back to fixup commit' '
+	git reset --hard base &&
+	git commit --allow-empty -m "fixup! self-cycle" &&
+	git branch self-cycle &&
+	GIT_SEQUENCE_EDITOR="cat >tmp" git rebase --autosquash -i HEAD^^ &&
+	sed -ne "/^[^#]/{s/[0-9a-f]\{7,\}/HASH/g;p}" tmp >actual &&
+	cat <<-EOF >expect &&
+	pick HASH second commit
+	pick HASH fixup! self-cycle # empty
+	EOF
+	test_cmp expect actual
+'
+
 test_auto_commit_flags () {
 	git reset --hard base &&
 	echo 1 >file1 &&