rebase -i: check labels and refs when parsing todo list

Message ID	pull.1482.git.1676644675638.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> Message-Id: <pull.1482.git.1676644675638.gitgitgadget@gmail.com> Date: Fri, 17 Feb 2023 14:37:55 +0000 Subject: [PATCH] rebase -i: check labels and refs when parsing todo list Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee <derrickstolee@github.com>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Phillip Wood <phillip.wood@dunelm.org.uk>, Phillip Wood <phillip.wood@dunelm.org.uk> Precedence: bulk From: Phillip Wood <phillip.wood@dunelm.org.uk>
Series	rebase -i: check labels and refs when parsing todo list \| expand rebase -i: check labels and refs when parsing todo list

Message ID

pull.1482.git.1676644675638.gitgitgadget@gmail.com (mailing list archive)

State

Superseded

Headers

Message-Id: <pull.1482.git.1676644675638.gitgitgadget@gmail.com>
Date: Fri, 17 Feb 2023 14:37:55 +0000
Subject: [PATCH] rebase -i: check labels and refs when parsing todo list
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>,
        Johannes Schindelin <Johannes.Schindelin@gmx.de>,
        Phillip Wood <phillip.wood@dunelm.org.uk>,
        Phillip Wood <phillip.wood@dunelm.org.uk>
Precedence: bulk
From: Phillip Wood <phillip.wood@dunelm.org.uk>

Series

rebase -i: check labels and refs when parsing todo list | expand

Commit Message

Phillip Wood Feb. 17, 2023, 2:37 p.m. UTC

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Check that the argument to the "label" and "update-ref" commands is a
valid refname when the todo list is parsed rather than waiting until the
command is executed. This means that the user can deal with any errors
at the beginning of the rebase rather than having it stop halfway
through due to a typo in a label name. The "update-ref" command is
changed to reject single level refs as it is all to easy to type
"update-ref branch" which is incorrect rather than "update-ref
refs/heads/branch"

Note that it is not straight forward to check the arguments to "reset"
and "merge" commands as they may be any revision, not just a refname and
we do not have an equivalent of check_refname_format() for revisions.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
    rebase -i: check labels and refs when parsing todo list
    
    Hopefully detecting these errors when the user edits the todo list will
    give a better experience.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1482%2Fphillipwood%2Frebase-check-labels-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1482/phillipwood/rebase-check-labels-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1482

 sequencer.c                   | 37 ++++++++++++++++++++++++++++++++++-
 t/t3404-rebase-interactive.sh | 25 ++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 2 deletions(-)


base-commit: b1485644f936ee83a995ec24d23f713f4230a1ae

Comments

Derrick Stolee Feb. 17, 2023, 3:32 p.m. UTC | #1

On 2/17/2023 9:37 AM, Phillip Wood via GitGitGadget wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> Check that the argument to the "label" and "update-ref" commands is a
> valid refname when the todo list is parsed rather than waiting until the
> command is executed. This means that the user can deal with any errors
> at the beginning of the rebase rather than having it stop halfway
> through due to a typo in a label name.

Thanks for thinking about this user experience. This is a good
improvement to the user interaction.

> The "update-ref" command is
> changed to reject single level refs as it is all to easy to type
> "update-ref branch" which is incorrect rather than "update-ref
> refs/heads/branch"

I think it's good to start by adding the restriction in this
check, but we could revisit this requirement in the future to
see if it is worth allowing the user to drop "refs/heads/" and
let it be implied. It just adds some complexity to the parsing,
so this patch adds helpful scaffolding (in checks and tests)
such that we could do that later in a safer way.

> +static int check_label_or_ref_arg(enum todo_command command, const char *arg)
> +{
> +	int allow_onelevel =
> +		command == TODO_LABEL ? REFNAME_ALLOW_ONELEVEL : 0;
> +
> +	if ((command == TODO_LABEL && !strcmp(arg, "#")) ||

Interesting that "#" means something special for the label, and
it's not limited to just the start of the label name, but must
be the entire string. Is this not something that
check_refname_format() would catch? Is the motivation that users
might add what they think is a comment, such as:

  label # make a label here

but oddly, this doesn't include something strange like

  label #make a label here

> +	    check_refname_format(arg, allow_onelevel)) {
> +		if (command == TODO_LABEL)
> +			error(_("'%s' is not a valid label"), arg);

If we have any kind of error and we are in TODO_LABEL, then
we can use a label message. Good.

> +		else if (check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
> +			error(_("'%s' is not a valid refname"), arg);
> +		else
> +			error(_("update-ref requires a fully qualified refname e.g. refs/heads/%s"),
> +			      arg);

This took me a little while to grok, but I think I have it
now: when in the update-ref mode, it could fail because of a
one-level ref (the else case) or it could fail because the
ref name uses forbidden characters (the else if case).

This nesting of conditions seems a bit fragile if we were to
add a new todo_command to check here. Perhaps reorganize it
to switch on the command?

	switch (command) {
	case TODO_LABEL:
		if (!strcmp(arg, "#") ||
		    check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
			return error(_("'%s' is not a valid label", arg);
		break;

	case TODO_UPDATE_REF:
		if (check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
			return error(_("'%s' is not a valid refname"), arg);
		else if (check_refname_format(arg, 0))
			return error(_("update-ref reqruies a fully qualified refname (e.g. refs/heads/%s)",
				     arg);
		break;

	default:
		BUG("unexpected todo_command");
	}

	return 0;

> @@ -2523,8 +2543,23 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  		return error(_("missing arguments for %s"),
>  			     command_to_string(item->command));
>  
> -	if (item->command == TODO_EXEC || item->command == TODO_LABEL ||
> +	if (item->command == TODO_LABEL ||
>  	    item->command == TODO_RESET || item->command == TODO_UPDATE_REF) {
> +		int ret = 0;
> +
> +		item->commit = NULL;
> +		item->arg_offset = bol - buf;
> +		item->arg_len = (int)(eol - bol);
> +		if (item->command != TODO_RESET) {
> +			saved = *eol;
> +			*eol = '\0';
> +			ret = check_label_or_ref_arg(item->command, bol);
> +			*eol = saved;
> +		}
> +		return ret;
> +	}
> +
> +	if (item->command == TODO_EXEC) {
>  		item->commit = NULL;
>  		item->arg_offset = bol - buf;
>  		item->arg_len = (int)(eol - bol);

(What's missing from this context is "return 0;")

Is there an important reason why you separated TODO_EXEC and
its identical item->arg_(offset|len) parsing into its own
block? It seems like we could modify your change to look like
this:

	if (item->command == TODO_EXEC || item->command == TODO_LABEL ||
  	    item->command == TODO_RESET || item->command == TODO_UPDATE_REF) {
		int ret = 0;

		item->commit = NULL;
		item->arg_offset = bol - buf;
		item->arg_len = (int)(eol - bol);
		if (item->command == TODO_RESET ||
		    item->command == TODO_UPDATE_REF) {
			saved = *eol;
			*eol = '\0';
			ret = check_label_or_ref_arg(item->command, bol);
			*eol = saved;
		}
		return ret;
	}

and the diff will have fewer new lines as well as fewer
duplicate lines in the post-image. Am I missing something
about TODO_EXEC being special?

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 462cefd25df..2cf2d2b8a24 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -2120,7 +2120,30 @@ test_expect_success '--update-refs: check failed ref update' '
>  	tail -n 6 err >err.last &&
>  	sed -e "s/Rebasing.*Successfully/Successfully/g" -e "s/^\t//g" \
>  		<err.last >err.trimmed &&
> -	test_cmp expect err.trimmed
> +	test_cmp expect err.trimmed &&
> +	git rebase --abort
> +'

Perhaps this `git rebase --abort` should be part of a
`test_when_finished test_may_fail git rebase --abort` at
the start of the test so that your new test can succeed
even if an earlier test step caused the test to fail.

> +test_expect_success 'bad labels and refs rejected when parsing todo list' '
> +	cat >todo <<-\EOF &&
> +	exec >execed
> +	label #
> +	label :invalid
> +	update-ref :bad
> +	update-ref topic
> +	EOF
> +	rm -f execed &&
> +	(
> +		set_replace_editor todo &&
> +		test_must_fail git rebase -i HEAD 2>err
> +	) &&
> +	grep "'\''#'\'' is not a valid label" err &&
> +	grep "'\'':invalid'\'' is not a valid label" err &&
> +	grep "'\'':bad'\'' is not a valid refname" err &&
> +	grep "update-ref requires a fully qualified refname e.g. refs/heads/topic" \
> +		err &&
> +	test_path_is_missing execed &&
> +	git rebase --abort
>  '

Again, the `git rebase --abort` seems like protection for
future tests, so a test_when_finished would help.

Thanks,
-Stolee

Phillip Wood Feb. 20, 2023, 9:24 a.m. UTC | #2

Hi Stolee

On 17/02/2023 15:32, Derrick Stolee wrote:
> On 2/17/2023 9:37 AM, Phillip Wood via GitGitGadget wrote:
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Check that the argument to the "label" and "update-ref" commands is a
>> valid refname when the todo list is parsed rather than waiting until the
>> command is executed. This means that the user can deal with any errors
>> at the beginning of the rebase rather than having it stop halfway
>> through due to a typo in a label name.
> 
> Thanks for thinking about this user experience. This is a good
> improvement to the user interaction.
> 
>> The "update-ref" command is
>> changed to reject single level refs as it is all to easy to type
>> "update-ref branch" which is incorrect rather than "update-ref
>> refs/heads/branch"
> 
> I think it's good to start by adding the restriction in this
> check, but we could revisit this requirement in the future to
> see if it is worth allowing the user to drop "refs/heads/" and
> let it be implied. It just adds some complexity to the parsing,
> so this patch adds helpful scaffolding (in checks and tests)
> such that we could do that later in a safer way.
> 
>> +static int check_label_or_ref_arg(enum todo_command command, const char *arg)
>> +{
>> +	int allow_onelevel =
>> +		command == TODO_LABEL ? REFNAME_ALLOW_ONELEVEL : 0;
>> +
>> +	if ((command == TODO_LABEL && !strcmp(arg, "#")) ||
> 
> Interesting that "#" means something special for the label, and
> it's not limited to just the start of the label name, but must
> be the entire string. Is this not something that
> check_refname_format() would catch?

"#" is a valid refname, but the merge command uses it to separate the 
merge parents from the commit summary so it cannot be used as a label. 
It is rejected at the start of do_label() (arguably that check could be 
removed with this series but I'm tempted to leave it alone)

> Is the motivation that users
> might add what they think is a comment, such as:
> 
>    label # make a label here
> 
> but oddly, this doesn't include something strange like
> 
>    label #make a label here

I've actually got some patches based on this that add support for 
comments like this, but they are not the reason that "#" is forbidden as 
a label here.

>> +	    check_refname_format(arg, allow_onelevel)) {
>> +		if (command == TODO_LABEL)
>> +			error(_("'%s' is not a valid label"), arg);
> 
> If we have any kind of error and we are in TODO_LABEL, then
> we can use a label message. Good.
> 
>> +		else if (check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
>> +			error(_("'%s' is not a valid refname"), arg);
>> +		else
>> +			error(_("update-ref requires a fully qualified refname e.g. refs/heads/%s"),
>> +			      arg);
> 
> This took me a little while to grok, but I think I have it
> now: when in the update-ref mode, it could fail because of a
> one-level ref (the else case) or it could fail because the
> ref name uses forbidden characters (the else if case).
> 
> This nesting of conditions seems a bit fragile if we were to
> add a new todo_command to check here. Perhaps reorganize it
> to switch on the command?
> 
> 	switch (command) {
> 	case TODO_LABEL:
> 		if (!strcmp(arg, "#") ||
> 		    check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
> 			return error(_("'%s' is not a valid label", arg);
> 		break;
> 
> 	case TODO_UPDATE_REF:
> 		if (check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
> 			return error(_("'%s' is not a valid refname"), arg);
> 		else if (check_refname_format(arg, 0))
> 			return error(_("update-ref reqruies a fully qualified refname (e.g. refs/heads/%s)",
> 				     arg);
> 		break;
> 
> 	default:
> 		BUG("unexpected todo_command");
> 	}
> 
> 	return 0;

That's definitely clearer, thanks for the suggestion

>> @@ -2523,8 +2543,23 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>>   		return error(_("missing arguments for %s"),
>>   			     command_to_string(item->command));
>>   
>> -	if (item->command == TODO_EXEC || item->command == TODO_LABEL ||
>> +	if (item->command == TODO_LABEL ||
>>   	    item->command == TODO_RESET || item->command == TODO_UPDATE_REF) {
>> +		int ret = 0;
>> +
>> +		item->commit = NULL;
>> +		item->arg_offset = bol - buf;
>> +		item->arg_len = (int)(eol - bol);
>> +		if (item->command != TODO_RESET) {
>> +			saved = *eol;
>> +			*eol = '\0';
>> +			ret = check_label_or_ref_arg(item->command, bol);
>> +			*eol = saved;
>> +		}
>> +		return ret;
>> +	}
>> +
>> +	if (item->command == TODO_EXEC) {
>>   		item->commit = NULL;
>>   		item->arg_offset = bol - buf;
>>   		item->arg_len = (int)(eol - bol);
> 
> (What's missing from this context is "return 0;")
> 
> Is there an important reason why you separated TODO_EXEC and
> its identical item->arg_(offset|len) parsing into its own
> block?

In a word no! This patch was pulled out from the series I mentioned 
above that adds support for comments. There it makes sense to treat exec 
commands separately as they cannot have comments but you're right that 
there is no need to do that in this patch.

> It seems like we could modify your change to look like
> this:
> 
> 
> 	if (item->command == TODO_EXEC || item->command == TODO_LABEL ||
>    	    item->command == TODO_RESET || item->command == TODO_UPDATE_REF) {
> 		int ret = 0;
> 
> 		item->commit = NULL;
> 		item->arg_offset = bol - buf;
> 		item->arg_len = (int)(eol - bol);
> 		if (item->command == TODO_RESET ||
> 		    item->command == TODO_UPDATE_REF) {
> 			saved = *eol;
> 			*eol = '\0';
> 			ret = check_label_or_ref_arg(item->command, bol);
> 			*eol = saved;
> 		}
> 		return ret;
> 	}
> 
> and the diff will have fewer new lines as well as fewer
> duplicate lines in the post-image. Am I missing something
> about TODO_EXEC being special?

I'll update as you suggest

>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> index 462cefd25df..2cf2d2b8a24 100755
>> --- a/t/t3404-rebase-interactive.sh
>> +++ b/t/t3404-rebase-interactive.sh
>> @@ -2120,7 +2120,30 @@ test_expect_success '--update-refs: check failed ref update' '
>>   	tail -n 6 err >err.last &&
>>   	sed -e "s/Rebasing.*Successfully/Successfully/g" -e "s/^\t//g" \
>>   		<err.last >err.trimmed &&
>> -	test_cmp expect err.trimmed
>> +	test_cmp expect err.trimmed &&
>> +	git rebase --abort
>> +'
> 
> Perhaps this `git rebase --abort` should be part of a
> `test_when_finished test_may_fail git rebase --abort` at
> the start of the test so that your new test can succeed
> even if an earlier test step caused the test to fail.

That's a good idea (and for the next test as well)

Thanks for you're thoughtful comments and suggestions

Best Wishes

Phillip

>> +test_expect_success 'bad labels and refs rejected when parsing todo list' '
>> +	cat >todo <<-\EOF &&
>> +	exec >execed
>> +	label #
>> +	label :invalid
>> +	update-ref :bad
>> +	update-ref topic
>> +	EOF
>> +	rm -f execed &&
>> +	(
>> +		set_replace_editor todo &&
>> +		test_must_fail git rebase -i HEAD 2>err
>> +	) &&
>> +	grep "'\''#'\'' is not a valid label" err &&
>> +	grep "'\'':invalid'\'' is not a valid label" err &&
>> +	grep "'\'':bad'\'' is not a valid refname" err &&
>> +	grep "update-ref requires a fully qualified refname e.g. refs/heads/topic" \
>> +		err &&
>> +	test_path_is_missing execed &&
>> +	git rebase --abort
>>   '
> 
> Again, the `git rebase --abort` seems like protection for
> future tests, so a test_when_finished would help.
> 
> Thanks,
> -Stolee

Phillip Wood Feb. 20, 2023, 10:03 a.m. UTC | #3

On 20/02/2023 09:24, Phillip Wood wrote:
> Thanks for you're thoughtful comments and suggestions

Hmm, perhaps I'm not quite as awake as I thought I was. I meant "your 
thoughtful comments" of course.

Best Wishes

Phillip

diff --git a/sequencer.c b/sequencer.c
index 3e4a1972897..b4b5c3f331d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2477,6 +2477,26 @@  static int is_command(enum todo_command command, const char **bol)
 		 (*bol = p));
 }
 
+static int check_label_or_ref_arg(enum todo_command command, const char *arg)
+{
+	int allow_onelevel =
+		command == TODO_LABEL ? REFNAME_ALLOW_ONELEVEL : 0;
+
+	if ((command == TODO_LABEL && !strcmp(arg, "#")) ||
+	    check_refname_format(arg, allow_onelevel)) {
+		if (command == TODO_LABEL)
+			error(_("'%s' is not a valid label"), arg);
+		else if (check_refname_format(arg, REFNAME_ALLOW_ONELEVEL))
+			error(_("'%s' is not a valid refname"), arg);
+		else
+			error(_("update-ref requires a fully qualified refname e.g. refs/heads/%s"),
+			      arg);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int parse_insn_line(struct repository *r, struct todo_item *item,
 			   const char *buf, const char *bol, char *eol)
 {
@@ -2523,8 +2543,23 @@  static int parse_insn_line(struct repository *r, struct todo_item *item,
 		return error(_("missing arguments for %s"),
 			     command_to_string(item->command));
 
-	if (item->command == TODO_EXEC || item->command == TODO_LABEL ||
+	if (item->command == TODO_LABEL ||
 	    item->command == TODO_RESET || item->command == TODO_UPDATE_REF) {
+		int ret = 0;
+
+		item->commit = NULL;
+		item->arg_offset = bol - buf;
+		item->arg_len = (int)(eol - bol);
+		if (item->command != TODO_RESET) {
+			saved = *eol;
+			*eol = '\0';
+			ret = check_label_or_ref_arg(item->command, bol);
+			*eol = saved;
+		}
+		return ret;
+	}
+
+	if (item->command == TODO_EXEC) {
 		item->commit = NULL;
 		item->arg_offset = bol - buf;
 		item->arg_len = (int)(eol - bol);
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 462cefd25df..2cf2d2b8a24 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -2120,7 +2120,30 @@  test_expect_success '--update-refs: check failed ref update' '
 	tail -n 6 err >err.last &&
 	sed -e "s/Rebasing.*Successfully/Successfully/g" -e "s/^\t//g" \
 		<err.last >err.trimmed &&
-	test_cmp expect err.trimmed
+	test_cmp expect err.trimmed &&
+	git rebase --abort
+'
+
+test_expect_success 'bad labels and refs rejected when parsing todo list' '
+	cat >todo <<-\EOF &&
+	exec >execed
+	label #
+	label :invalid
+	update-ref :bad
+	update-ref topic
+	EOF
+	rm -f execed &&
+	(
+		set_replace_editor todo &&
+		test_must_fail git rebase -i HEAD 2>err
+	) &&
+	grep "'\''#'\'' is not a valid label" err &&
+	grep "'\'':invalid'\'' is not a valid label" err &&
+	grep "'\'':bad'\'' is not a valid refname" err &&
+	grep "update-ref requires a fully qualified refname e.g. refs/heads/topic" \
+		err &&
+	test_path_is_missing execed &&
+	git rebase --abort
 '
 
 # This must be the last test in this file

rebase -i: check labels and refs when parsing todo list

Commit Message

Comments

Patch