[1/3,RFC] tests: add test_todo() to mark known breakages

Message ID	472d05111a38276192e30f454f42aa39df51d604.1665068476.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <472d05111a38276192e30f454f42aa39df51d604.1665068476.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1374.git.1665068476.gitgitgadget@gmail.com> References: <pull.1374.git.1665068476.gitgitgadget@gmail.com> Date: Thu, 06 Oct 2022 15:01:14 +0000 Subject: [PATCH 1/3] [RFC] tests: add test_todo() to mark known breakages Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>, Derrick Stolee <stolee@gmail.com>, Phillip Wood <phillip.wood@dunelm.org.uk>, Phillip Wood <phillip.wood@dunelm.org.uk> Precedence: bulk From: Phillip Wood <phillip.wood@dunelm.org.uk>
Series	tests: add test_todo() for known failures \| expand [0/3,RFC] tests: add test_todo() for known failures [1/3,RFC] tests: add test_todo() to mark known breakages [2/3,RFC] test_todo: allow [!] grep as the command [3/3,RFC] test_todo: allow [verbose] test as the command

Phillip Wood Oct. 6, 2022, 3:01 p.m. UTC

From: Phillip Wood <phillip.wood@dunelm.org.uk>

test_todo() is intended as a fine grained replacement for
test_expect_failure(). Rather than marking the whole test as failing
test_todo() is used to mark individual failing commands within a
test. This approach to writing failing tests allows us to detect
unexpected failures that are hidden by test_expect_failure().

Failing commands are reported by the test harness in the same way as
test_expect_failure() so there is no change in output when migrating
from test_expect_failure() to test_todo(). If a command marked with
test_todo() succeeds then the test will fail. This is designed to make
it easier to see when a command starts succeeding in our CI compared
to using test_expect_failure() where it is easy to fix a failing test
case and not realize it.

test_todo() is built upon test_expect_failure() but accepts commands
starting with test_* in addition to git. As our test_* assertions use
BUG() to signal usage errors any such error will not be hidden by
test_todo().

This commit coverts a few tests to show the intended use of
test_todo().  A limitation of test_todo() as it is currently
implemented is that it cannot be used in a subshell.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/README                        |  12 +++
 t/t0000-basic.sh                |  64 ++++++++++++++
 t/t3401-rebase-and-am-rename.sh |  12 +--
 t/t3424-rebase-empty.sh         |   6 +-
 t/t3600-rm.sh                   |   8 +-
 t/test-lib-functions.sh         | 147 +++++++++++++++++++++++---------
 6 files changed, 194 insertions(+), 55 deletions(-)

Ævar Arnfjörð Bjarmason Oct. 6, 2022, 3:36 p.m. UTC | #1

On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> test_todo() is intended as a fine grained replacement for
> test_expect_failure(). Rather than marking the whole test as failing
> test_todo() is used to mark individual failing commands within a
> test. This approach to writing failing tests allows us to detect
> unexpected failures that are hidden by test_expect_failure().
>
> Failing commands are reported by the test harness in the same way as
> test_expect_failure() so there is no change in output when migrating
> from test_expect_failure() to test_todo(). If a command marked with
> test_todo() succeeds then the test will fail. This is designed to make
> it easier to see when a command starts succeeding in our CI compared
> to using test_expect_failure() where it is easy to fix a failing test
> case and not realize it.
>
> test_todo() is built upon test_expect_failure() but accepts commands
> starting with test_* in addition to git. As our test_* assertions use
> BUG() to signal usage errors any such error will not be hidden by
> test_todo().

I think they will, unless I'm missing something. E.g. try out:
	
	diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
	index 80e76a4695e..1be895abba6 100755
	--- a/t/t0210-trace2-normal.sh
	+++ b/t/t0210-trace2-normal.sh
	@@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' '
	 
	 test_expect_success 'bug messages with BUG_if_bug() are written to trace2' '
	 	test_when_finished "rm trace.normal actual expect" &&
	-	test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \
	+	test_todo env GIT_TRACE2="$(pwd)/trace.normal" \
	 		test-tool trace2 008bug 2>err &&
	 	cat >expect <<-\EOF &&
	 	a bug message

I.e. in our tests you need to look out for exit code 99, not the usual
abort().

I have local patches to fix this, previously submitted as an RFC here:
https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/

I just rebased that & CI is currently running, I'll see how it does:
https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2

When I merged your patches here with that topic yours started doing the
right thing in this case, i.e. a "test_todo" that would get a BUG()"
would be reported as a failure.

> +	test_true () {
> +		true
> +	}
> +	test_expect_success "pretend we have fixed a test_todo breakage" \
> +		"test_todo test_true"

"Why the indirection", until I realized that it's because you're working
around the whitelist of commands that we have, i.e. we allow 'git' and
'test-tool', but not 'grep' or whatever.

I'm of the opinion that we should just drop that limitation altogether,
which is shown to be pretty silly in this case. I.e. at some point we
listed "test_*" as "this invokes a git binary", but a lot of our test_*
commands don't, including this one.

So in general I think we should just allow any command in
"test_must_fail" et al, and just catch in code review if someone uses
"test_must_fail grep" as opposed to "! grep".

But for the particular case of "test_todo" doing so seems like pointless
work, if we think we're going to miss this sort of thing in review in
general, then surely that's not a concern if we're going to very
prominently mark tests as TODO tests, given how the test of the output
shows them?

> +test_must_fail_helper () {
> +	test_must_fail_name_="$1"
> +	shift
> +	case "$1" in
> +	ok=*)
> +		_test_ok=${1#ok=}
> +		shift
> +		;;
> +	*)
> +		_test_ok=
> +		;;
> +	esac
> +	if ! test_must_fail_acceptable $test_must_fail_name_ "$@"
> +	then
> +		echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*"
> +		return 1
> +	fi
> +	"$@" 2>&7
> +	exit_code=$?
> +	if test $exit_code -eq 0 && ! list_contains "$_test_ok" success
> +	then
> +		echo >&4 "test_$test_must_fail_name_: command succeeded: $*"
> +		return 1
> +	elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe
> +	then
> +		return 0
> +	elif test $exit_code -gt 129 && test $exit_code -le 192
> +	then
> +		echo >&4 "test_$test_must_fail_name_: died by signal $(($exit_code - 128)): $*"
> +		return 1
> +	elif test $exit_code -eq 127
> +	then
> +		echo >&4 "test_$test_must_fail_name_: command not found: $*"
> +		return 1
> +	elif test $exit_code -eq 126
> +	then
> +		echo >&4 "test_$test_must_fail_name_: valgrind error: $*"
> +		return 1
> +	fi
> +
> +	return 0
> +} 7>&2 2>&4
> +
> +# This is used to mark commands that should succeed but do not due to
> +# a known issue. Marking the individual commands that fail rather than
> +# using test_expect_failure allows us to detect any unexpected
> +# failures. As with test_must_fail if the command is killed by a
> +# signal the test will fail. If the command unexpectedly succeeds then
> +# the test will also fail. For example:
> +#
> +#	test_expect_success 'test a known failure' '
> +#		git foo 2>err &&
> +#		test_todo test_must_be_empty err
> +#	'
> +#
> +# This test will fail if "git foo" fails or err is unexpectedly empty.
> +# test_todo can be used with "git" or any of the "test_*" assertions
> +# such as test_cmp().
> +
> +test_todo () {
> +	if test "$test_todo_" = "test_expect_failure"
> +	then
> +		BUG "test_todo_ cannot be used inside test_expect_failure"
> +	fi
> +	test_todo_=todo
> +	test_must_fail_helper todo "$@" 2>&7
> +} 7>&2 2>&4
> +
>  # This is not among top-level (test_expect_success | test_expect_failure)
>  # but is a prefix that can be used in the test script, like:
>  #
> @@ -1061,43 +1160,7 @@ test_must_fail_acceptable () {
>  #    ! grep pattern output
>  
>  test_must_fail () {
> -	case "$1" in
> -	ok=*)
> -		_test_ok=${1#ok=}
> -		shift
> -		;;
> -	*)
> -		_test_ok=
> -		;;
> -	esac
> -	if ! test_must_fail_acceptable "$@"
> -	then
> -		echo >&7 "test_must_fail: only 'git' is allowed: $*"
> -		return 1
> -	fi
> -	"$@" 2>&7
> -	exit_code=$?
> -	if test $exit_code -eq 0 && ! list_contains "$_test_ok" success
> -	then
> -		echo >&4 "test_must_fail: command succeeded: $*"
> -		return 1
> -	elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe
> -	then
> -		return 0
> -	elif test $exit_code -gt 129 && test $exit_code -le 192
> -	then
> -		echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*"
> -		return 1
> -	elif test $exit_code -eq 127
> -	then
> -		echo >&4 "test_must_fail: command not found: $*"
> -		return 1
> -	elif test $exit_code -eq 126
> -	then
> -		echo >&4 "test_must_fail: valgrind error: $*"
> -		return 1
> -	fi
> -	return 0
> +	test_must_fail_helper must_fail "$@" 2>&7
>  } 7>&2 2>&4
>  
>  # Similar to test_must_fail, but tolerates success, too.  This is
> @@ -1114,7 +1177,7 @@ test_must_fail () {
>  # Accepts the same options as test_must_fail.
>  
>  test_might_fail () {
> -	test_must_fail ok=success "$@" 2>&7
> +	test_must_fail_helper might_fail ok=success "$@" 2>&7
>  } 7>&2 2>&4
>  
>  # Similar to test_must_fail and test_might_fail, but check that a

I remember finding it annoying that "test_might_fail" is misreported
from test_must_fail_acceptable as being called "test_must_fail", so this
refactoring is very welcome.

But can you please split this into its own commit? I.e. that improvement
can stand on its own, i.e. just a change that has "test_must_fail" and
"test_might_fail" reporting their correct name.

Then this commit can follow, and then you'll just need to add (for this part):

	test_must_fail_helper todo "$@" 2>&7

And it'll make the resulting diff much smaller & easier to follow.

Phillip Wood Oct. 6, 2022, 4:10 p.m. UTC | #2

Hi Ævar

On 06/10/2022 16:36, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
 >>
>> test_todo() is built upon test_expect_failure() but accepts commands
>> starting with test_* in addition to git. As our test_* assertions use
>> BUG() to signal usage errors any such error will not be hidden by
>> test_todo().
> 
> I think they will, unless I'm missing something. E.g. try out:

It's talking about BUG() in test-lib.sh, not the C function. For example

test_path_is_file () {
	test "$#" -ne 1 && BUG "1 param"
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist"
		false
	fi
}

So a test containing "test_todo test_path_is_file a b" should fail as 
BUG calls exit rather than returning non-zero (I should probably test 
that in 0000-basic.sh)

> 	diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> 	index 80e76a4695e..1be895abba6 100755
> 	--- a/t/t0210-trace2-normal.sh
> 	+++ b/t/t0210-trace2-normal.sh
> 	@@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' '
> 	
> 	 test_expect_success 'bug messages with BUG_if_bug() are written to trace2' '
> 	 	test_when_finished "rm trace.normal actual expect" &&
> 	-	test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \
> 	+	test_todo env GIT_TRACE2="$(pwd)/trace.normal" \
> 	 		test-tool trace2 008bug 2>err &&
> 	 	cat >expect <<-\EOF &&
> 	 	a bug message
> 
> I.e. in our tests you need to look out for exit code 99, not the usual
> abort().
> 
> I have local patches to fix this, previously submitted as an RFC here:
> https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/
> 
> I just rebased that & CI is currently running, I'll see how it does:
> https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2
> 
> When I merged your patches here with that topic yours started doing the
> right thing in this case, i.e. a "test_todo" that would get a BUG()"
> would be reported as a failure.
> 
>> +	test_true () {
>> +		true
>> +	}
>> +	test_expect_success "pretend we have fixed a test_todo breakage" \
>> +		"test_todo test_true"
> 
> "Why the indirection", until I realized that it's because you're working
> around the whitelist of commands that we have, i.e. we allow 'git' and
> 'test-tool', but not 'grep' or whatever.
> 
> I'm of the opinion that we should just drop that limitation altogether,
> which is shown to be pretty silly in this case. I.e. at some point we
> listed "test_*" as "this invokes a git binary", but a lot of our test_*
> commands don't, including this one.

test_expect_failure does not allow test_* are you thinking of test-tool?

> So in general I think we should just allow any command in
> "test_must_fail" et al, and just catch in code review if someone uses
> "test_must_fail grep" as opposed to "! grep".

That is not going to scale well

> But for the particular case of "test_todo" doing so seems like pointless
> work, if we think we're going to miss this sort of thing in review in
> general, then surely that's not a concern if we're going to very
> prominently mark tests as TODO tests, given how the test of the output
> shows them?

 >[...]
>>   test_might_fail () {
>> -	test_must_fail ok=success "$@" 2>&7
>> +	test_must_fail_helper might_fail ok=success "$@" 2>&7
>>   } 7>&2 2>&4
>>   
>>   # Similar to test_must_fail and test_might_fail, but check that a
> 
> I remember finding it annoying that "test_might_fail" is misreported
> from test_must_fail_acceptable as being called "test_must_fail", so this
> refactoring is very welcome.
> 
> But can you please split this into its own commit? I.e. that improvement
> can stand on its own, i.e. just a change that has "test_must_fail" and
> "test_might_fail" reporting their correct name.
 >
> Then this commit can follow, and then you'll just need to add (for this part) >
> 	test_must_fail_helper todo "$@" 2>&7
> 
> And it'll make the resulting diff much smaller & easier to follow.

Sure, I should also improve the error message in

 >> +		echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*"

for test_todo.

Best Wishes

Phillip

Ævar Arnfjörð Bjarmason Oct. 6, 2022, 8:33 p.m. UTC | #3

On Thu, Oct 06 2022, Phillip Wood wrote:

> Hi Ævar
>
> On 06/10/2022 16:36, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> test_todo() is built upon test_expect_failure() but accepts commands
>>> starting with test_* in addition to git. As our test_* assertions use
>>> BUG() to signal usage errors any such error will not be hidden by
>>> test_todo().
>> I think they will, unless I'm missing something. E.g. try out:
>
> It's talking about BUG() in test-lib.sh, not the C function. For example
>
> test_path_is_file () {
> 	test "$#" -ne 1 && BUG "1 param"
> 	if ! test -f "$1"
> 	then
> 		echo "File $1 doesn't exist"
> 		false
> 	fi
> }
>
> So a test containing "test_todo test_path_is_file a b" should fail as
> BUG calls exit rather than returning non-zero (I should probably test 
> that in 0000-basic.sh)

Ah, anyway, I think getting that to behave correctly is *the* thing any
less sucky test_expect_failure replacement needs to get right, i.e. to
handle BUG() (in C code), abort(), SANITIZE=undefined (and friends, all
of whom abort()), segfaults etc.

>> 	diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
>> 	index 80e76a4695e..1be895abba6 100755
>> 	--- a/t/t0210-trace2-normal.sh
>> 	+++ b/t/t0210-trace2-normal.sh
>> 	@@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' '
>> 	
>> 	 test_expect_success 'bug messages with BUG_if_bug() are written to trace2' '
>> 	 	test_when_finished "rm trace.normal actual expect" &&
>> 	-	test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \
>> 	+	test_todo env GIT_TRACE2="$(pwd)/trace.normal" \
>> 	 		test-tool trace2 008bug 2>err &&
>> 	 	cat >expect <<-\EOF &&
>> 	 	a bug message
>> I.e. in our tests you need to look out for exit code 99, not the
>> usual
>> abort().
>> I have local patches to fix this, previously submitted as an RFC
>> here:
>> https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/
>> I just rebased that & CI is currently running, I'll see how it does:
>> https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2
>> When I merged your patches here with that topic yours started doing
>> the
>> right thing in this case, i.e. a "test_todo" that would get a BUG()"
>> would be reported as a failure.
>> 
>>> +	test_true () {
>>> +		true
>>> +	}
>>> +	test_expect_success "pretend we have fixed a test_todo breakage" \
>>> +		"test_todo test_true"
>> "Why the indirection", until I realized that it's because you're
>> working
>> around the whitelist of commands that we have, i.e. we allow 'git' and
>> 'test-tool', but not 'grep' or whatever.
>> I'm of the opinion that we should just drop that limitation
>> altogether,
>> which is shown to be pretty silly in this case. I.e. at some point we
>> listed "test_*" as "this invokes a git binary", but a lot of our test_*
>> commands don't, including this one.
>
> test_expect_failure does not allow test_* are you thinking of test-tool?

I'm talking about test_todo, and the reason for that "test_true" being
needed is because test_must_fail_acceptable is picky, but we could also
(I just adjusted that one test):
	
	diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
	index 52362ad3dd3..921e0401eb5 100755
	--- a/t/t0000-basic.sh
	+++ b/t/t0000-basic.sh
	@@ -143,11 +143,8 @@ test_expect_success 'subtest: a passing TODO test' '
	 
	 test_expect_success 'subtest: a failing test_todo' '
	 	write_and_run_sub_test_lib_test failing-test-todo <<-\EOF &&
	-	test_false () {
	-		false
	-	}
	 	test_expect_success "passing test" "true"
	-	test_expect_success "known todo" "test_todo test_false"
	+	test_expect_success "known todo" "test_todo false"
	 	test_done
	 	EOF
	 	check_sub_test_lib_test failing-test-todo <<-\EOF
	diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
	index 8978709b231..9300eaa2c62 100644
	--- a/t/test-lib-functions.sh
	+++ b/t/test-lib-functions.sh
	@@ -1034,6 +1034,11 @@ test_must_fail_acceptable () {
	 		done
	 	fi
	 
	+	if test "$name" = "todo"
	+	then
	+		return 0
	+	fi
	+
	 	case "$1" in
	 	git|__git*|test-tool|test_terminal)
	 		return 0
	@@ -1050,10 +1055,6 @@ test_must_fail_acceptable () {
	 		fi
	 		return 1
	 		;;
	-	test_*)
	-		test "$name" = "todo"
	-		return $?
	-		;;
	 	*)
	 		return 1
	 		;;
	

>> So in general I think we should just allow any command in
>> "test_must_fail" et al, and just catch in code review if someone uses
>> "test_must_fail grep" as opposed to "! grep".
>
> That is not going to scale well

Well, you're throwing the scaling out the window by whitelisting test_*
in your 1/3 :)

I don't think we really need it, but *the* reason it's there is to
distinguish things that run our own C code from things that don't, and a
lot of test_* helpers don't.

So if you want to pursue making this correct per the current intent it
should really use the current whitelist to decide whether or not to pass
things through the "should we change the exit code if it's a signal,
segfault etc?" part.

Otherwise you should just negate it or whatever, as the "test_todo" I
showed in
https://lore.kernel.org/git/221006.86v8owr986.gmgdl@evledraar.gmail.com/
does. I.e. we shouldn't be detecting an abort() in /bin/false and the
like.

Victoria Dye Dec. 6, 2022, 10:37 p.m. UTC | #4

Phillip Wood via GitGitGadget wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> test_todo() is intended as a fine grained replacement for
> test_expect_failure(). Rather than marking the whole test as failing
> test_todo() is used to mark individual failing commands within a
> test. This approach to writing failing tests allows us to detect
> unexpected failures that are hidden by test_expect_failure().

I love this idea! I've nearly been burned a couple of times by the wrong
line in a 'test_expect_failure' triggering the error (e.g., due to bad
syntax earlier in the test). The added specificity of 'test_todo' will help
both reviewers and people fixing the underlying issues demonstrated by
expected-failing tests.

> 
> Failing commands are reported by the test harness in the same way as
> test_expect_failure() so there is no change in output when migrating
> from test_expect_failure() to test_todo(). If a command marked with
> test_todo() succeeds then the test will fail. This is designed to make
> it easier to see when a command starts succeeding in our CI compared
> to using test_expect_failure() where it is easy to fix a failing test
> case and not realize it.
> 
> test_todo() is built upon test_expect_failure() but accepts commands
> starting with test_* in addition to git. As our test_* assertions use
> BUG() to signal usage errors any such error will not be hidden by
> test_todo().

Should this be so restrictive? I think 'test_todo' would need to handle any
arbitrary command (mostly because of custom functions like
'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement
for 'test_expect_failure'. 

I see there's some related discussion in another subthread [1], but I don't
necessarily think removing restrictions (i.e. that the tested command must
be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for
'test_must_fail' et al. to be internally consistent. On one hand,
'test_todo' could be interpreted as an assertion (like 'test_must_fail'),
where we only want to assert on our code - hence the restrictions. From that
perspective, it would make sense to ease restrictions uniformly on all of
our assertion helpers. 

On the other hand, I'm interpreting 'test_todo' as
'test_expect_failure_on_line_N' - more of a "post-test result interpreter"
than an assertion helper. So because 'test_expect_failure' doesn't require
the failing line to come from a particular command, I don't think
'test_todo' needs to either. That leaves assertion helpers like
'test_must_fail' out of the scope of this change, avoiding any hairiness of
allowing them to assert on arbitrary code.

What do you think?

[1] https://lore.kernel.org/git/221006.86mta8r860.gmgdl@evledraar.gmail.com/

> 
> This commit coverts a few tests to show the intended use of
> test_todo().  A limitation of test_todo() as it is currently
> implemented is that it cannot be used in a subshell.
> 
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>

Ævar Arnfjörð Bjarmason Dec. 7, 2022, 12:08 p.m. UTC | #5

On Tue, Dec 06 2022, Victoria Dye wrote:

> Phillip Wood via GitGitGadget wrote:
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>> 
>> test_todo() is intended as a fine grained replacement for
>> test_expect_failure(). Rather than marking the whole test as failing
>> test_todo() is used to mark individual failing commands within a
>> test. This approach to writing failing tests allows us to detect
>> unexpected failures that are hidden by test_expect_failure().
>
> I love this idea! I've nearly been burned a couple of times by the wrong
> line in a 'test_expect_failure' triggering the error (e.g., due to bad
> syntax earlier in the test). The added specificity of 'test_todo' will help
> both reviewers and people fixing the underlying issues demonstrated by
> expected-failing tests.
>
>> 
>> Failing commands are reported by the test harness in the same way as
>> test_expect_failure() so there is no change in output when migrating
>> from test_expect_failure() to test_todo(). If a command marked with
>> test_todo() succeeds then the test will fail. This is designed to make
>> it easier to see when a command starts succeeding in our CI compared
>> to using test_expect_failure() where it is easy to fix a failing test
>> case and not realize it.
>> 
>> test_todo() is built upon test_expect_failure() but accepts commands
>> starting with test_* in addition to git. As our test_* assertions use
>> BUG() to signal usage errors any such error will not be hidden by
>> test_todo().
>
> Should this be so restrictive? I think 'test_todo' would need to handle any
> arbitrary command (mostly because of custom functions like
> 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement
> for 'test_expect_failure'. 
>
> I see there's some related discussion in another subthread [1], but I don't
> necessarily think removing restrictions (i.e. that the tested command must
> be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for
> 'test_must_fail' et al. to be internally consistent. On one hand,
> 'test_todo' could be interpreted as an assertion (like 'test_must_fail'),
> where we only want to assert on our code - hence the restrictions. From that
> perspective, it would make sense to ease restrictions uniformly on all of
> our assertion helpers. 
>
> On the other hand, I'm interpreting 'test_todo' as
> 'test_expect_failure_on_line_N' - more of a "post-test result interpreter"
> than an assertion helper. So because 'test_expect_failure' doesn't require
> the failing line to come from a particular command, I don't think
> 'test_todo' needs to either. That leaves assertion helpers like
> 'test_must_fail' out of the scope of this change, avoiding any hairiness of
> allowing them to assert on arbitrary code.
>
> What do you think?

Are you saying that for the "test_todo" we shouldn't care whether it
exits with a "normal" non-zero or a segfault, abort() (e.g. BUG()) etc?
That's what the "test_must_fail" v.s. "!" is about.

Even if we erased tat distinction I think such a thing would be a
marginal improvement on "test_expect_failure", as we'd at least mark
what line fails, but like "test_expect_failure" we'd accept segfaults as
failures.

but as noted in the upthread discussions I think we should do better and
still check for segfaults etc. I think we have a couple of
"test_expect_failure" now where we expect a segfault, but for the rest
we'd like to know if they start segfaulting.

Phillip Wood Dec. 8, 2022, 3:06 p.m. UTC | #6

Hi Victoria

On 06/12/2022 22:37, Victoria Dye wrote:
> Phillip Wood via GitGitGadget wrote:
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
 >>
>> Failing commands are reported by the test harness in the same way as
>> test_expect_failure() so there is no change in output when migrating
>> from test_expect_failure() to test_todo(). If a command marked with
>> test_todo() succeeds then the test will fail. This is designed to make
>> it easier to see when a command starts succeeding in our CI compared
>> to using test_expect_failure() where it is easy to fix a failing test
>> case and not realize it.
>>
>> test_todo() is built upon test_expect_failure() but accepts commands
>> starting with test_* in addition to git. As our test_* assertions use
>> BUG() to signal usage errors any such error will not be hidden by
>> test_todo().
> 
> Should this be so restrictive? I think 'test_todo' would need to handle any
> arbitrary command (mostly because of custom functions like
> 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement
> for 'test_expect_failure'.
> 
> I see there's some related discussion in another subthread [1], but I don't
> necessarily think removing restrictions (i.e. that the tested command must
> be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for
> 'test_must_fail' et al. to be internally consistent. On one hand,
> 'test_todo' could be interpreted as an assertion (like 'test_must_fail'),
> where we only want to assert on our code - hence the restrictions. From that
> perspective, it would make sense to ease restrictions uniformly on all of
> our assertion helpers.
> 
> On the other hand, I'm interpreting 'test_todo' as
> 'test_expect_failure_on_line_N' - more of a "post-test result interpreter"
> than an assertion helper. So because 'test_expect_failure' doesn't require
> the failing line to come from a particular command, I don't think
> 'test_todo' needs to either. That leaves assertion helpers like
> 'test_must_fail' out of the scope of this change, avoiding any hairiness of
> allowing them to assert on arbitrary code.
> 
> What do you think?

I don't think we need to remove the restrictions on 'test_must_fail', 
they seem to be there for a good reason and I'm not aware of anyone 
complaining about being inconvenienced by them. I think of 'test_todo' 
and 'test_must_fail' as being distinct, 'test_todo' only reuses the 
implementation of 'test_must_fail' for convenience rather than any other 
deep reason.

I added the restrictions to 'test_todo' to try and stop it being misused 
but I'm happy to relax them if needed. I'm keen that test_todo is able 
to distinguish between an expected failure and a failure due to the 
wrapped command being misused e.g. 'test_todo grep --invalid-option' 
should report an error. Restricting the commands makes it easier to 
guarantee that but we can always just add checks for other commands as 
we use them. In a way the existing restrictions are kind of pointless 
because test authors can always name their helper functions test_... to 
get round them.

I think you've convinced be to remove the restrictions on what can be 
wrapped by 'test_todo' when I re-roll.

Thanks for your thoughtful comments

Phillip

> [1] https://lore.kernel.org/git/221006.86mta8r860.gmgdl@evledraar.gmail.com/
> 
>>
>> This commit coverts a few tests to show the intended use of
>> test_todo().  A limitation of test_todo() as it is currently
>> implemented is that it cannot be used in a subshell.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>

Junio C Hamano Dec. 9, 2022, 1:09 a.m. UTC | #7

Phillip Wood <phillip.wood123@gmail.com> writes:

> I added the restrictions to 'test_todo' to try and stop it being
> misused but I'm happy to relax them if needed. I'm keen that test_todo
> is able to distinguish between an expected failure and a failure due
> to the wrapped command being misused e.g. 'test_todo grep
> --invalid-option' should report an error.

Hmm, but it is not useful if the failure is from "you cannot use the
system command grep with test_todo", as the (implicitly) encouraged
"fix" the developer who wrote the test would pick would be to use
"! grep --invalid-option" which would still fail for a wrong reason.

If a "git" command is expected to run to a completion but is
currently broken and produces a wrong output, it would be very
useful to be able to write

	git command --args >actual &&
	test_todo grep -e "$expected_token" actual

to say "when 'git command' is fixed, the output should contains
this, but we know it currently is broken". 

> I think you've convinced be to remove the restrictions on what can be
> wrapped by 'test_todo' when I re-roll.
>
> Thanks for your thoughtful comments

Thanks.

Phillip Wood Dec. 9, 2022, 9:04 a.m. UTC | #8

Hi Junio

On 09/12/2022 01:09, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> I added the restrictions to 'test_todo' to try and stop it being
>> misused but I'm happy to relax them if needed. I'm keen that test_todo
>> is able to distinguish between an expected failure and a failure due
>> to the wrapped command being misused e.g. 'test_todo grep
>> --invalid-option' should report an error.
> 
> Hmm, but it is not useful if the failure is from "you cannot use the
> system command grep with test_todo", as the (implicitly) encouraged
> "fix" the developer who wrote the test would pick would be to use
> "! grep --invalid-option" which would still fail for a wrong reason.
> 
> If a "git" command is expected to run to a completion but is
> currently broken and produces a wrong output, it would be very
> useful to be able to write
> 
> 	git command --args >actual &&
> 	test_todo grep -e "$expected_token" actual
> 
> to say "when 'git command' is fixed, the output should contains
> this, but we know it currently is broken".

You can do that now, the next patch adds support for "test_todo [!] 
grep" but you currently cannot pass arbitrary commands/helper functions.

Best Wishes

Phillip

>> I think you've convinced be to remove the restrictions on what can be
>> wrapped by 'test_todo' when I re-roll.
>>
>> Thanks for your thoughtful comments
> 
> Thanks.

[1/3,RFC] tests: add test_todo() to mark known breakages

Commit Message

Comments

Patch