Message ID | 472d05111a38276192e30f454f42aa39df51d604.1665068476.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tests: add test_todo() for known failures | expand |
On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote: > From: Phillip Wood <phillip.wood@dunelm.org.uk> > > test_todo() is intended as a fine grained replacement for > test_expect_failure(). Rather than marking the whole test as failing > test_todo() is used to mark individual failing commands within a > test. This approach to writing failing tests allows us to detect > unexpected failures that are hidden by test_expect_failure(). > > Failing commands are reported by the test harness in the same way as > test_expect_failure() so there is no change in output when migrating > from test_expect_failure() to test_todo(). If a command marked with > test_todo() succeeds then the test will fail. This is designed to make > it easier to see when a command starts succeeding in our CI compared > to using test_expect_failure() where it is easy to fix a failing test > case and not realize it. > > test_todo() is built upon test_expect_failure() but accepts commands > starting with test_* in addition to git. As our test_* assertions use > BUG() to signal usage errors any such error will not be hidden by > test_todo(). I think they will, unless I'm missing something. E.g. try out: diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh index 80e76a4695e..1be895abba6 100755 --- a/t/t0210-trace2-normal.sh +++ b/t/t0210-trace2-normal.sh @@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' ' test_expect_success 'bug messages with BUG_if_bug() are written to trace2' ' test_when_finished "rm trace.normal actual expect" && - test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \ + test_todo env GIT_TRACE2="$(pwd)/trace.normal" \ test-tool trace2 008bug 2>err && cat >expect <<-\EOF && a bug message I.e. in our tests you need to look out for exit code 99, not the usual abort(). I have local patches to fix this, previously submitted as an RFC here: https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/ I just rebased that & CI is currently running, I'll see how it does: https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2 When I merged your patches here with that topic yours started doing the right thing in this case, i.e. a "test_todo" that would get a BUG()" would be reported as a failure. > + test_true () { > + true > + } > + test_expect_success "pretend we have fixed a test_todo breakage" \ > + "test_todo test_true" "Why the indirection", until I realized that it's because you're working around the whitelist of commands that we have, i.e. we allow 'git' and 'test-tool', but not 'grep' or whatever. I'm of the opinion that we should just drop that limitation altogether, which is shown to be pretty silly in this case. I.e. at some point we listed "test_*" as "this invokes a git binary", but a lot of our test_* commands don't, including this one. So in general I think we should just allow any command in "test_must_fail" et al, and just catch in code review if someone uses "test_must_fail grep" as opposed to "! grep". But for the particular case of "test_todo" doing so seems like pointless work, if we think we're going to miss this sort of thing in review in general, then surely that's not a concern if we're going to very prominently mark tests as TODO tests, given how the test of the output shows them? > +test_must_fail_helper () { > + test_must_fail_name_="$1" > + shift > + case "$1" in > + ok=*) > + _test_ok=${1#ok=} > + shift > + ;; > + *) > + _test_ok= > + ;; > + esac > + if ! test_must_fail_acceptable $test_must_fail_name_ "$@" > + then > + echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*" > + return 1 > + fi > + "$@" 2>&7 > + exit_code=$? > + if test $exit_code -eq 0 && ! list_contains "$_test_ok" success > + then > + echo >&4 "test_$test_must_fail_name_: command succeeded: $*" > + return 1 > + elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe > + then > + return 0 > + elif test $exit_code -gt 129 && test $exit_code -le 192 > + then > + echo >&4 "test_$test_must_fail_name_: died by signal $(($exit_code - 128)): $*" > + return 1 > + elif test $exit_code -eq 127 > + then > + echo >&4 "test_$test_must_fail_name_: command not found: $*" > + return 1 > + elif test $exit_code -eq 126 > + then > + echo >&4 "test_$test_must_fail_name_: valgrind error: $*" > + return 1 > + fi > + > + return 0 > +} 7>&2 2>&4 > + > +# This is used to mark commands that should succeed but do not due to > +# a known issue. Marking the individual commands that fail rather than > +# using test_expect_failure allows us to detect any unexpected > +# failures. As with test_must_fail if the command is killed by a > +# signal the test will fail. If the command unexpectedly succeeds then > +# the test will also fail. For example: > +# > +# test_expect_success 'test a known failure' ' > +# git foo 2>err && > +# test_todo test_must_be_empty err > +# ' > +# > +# This test will fail if "git foo" fails or err is unexpectedly empty. > +# test_todo can be used with "git" or any of the "test_*" assertions > +# such as test_cmp(). > + > +test_todo () { > + if test "$test_todo_" = "test_expect_failure" > + then > + BUG "test_todo_ cannot be used inside test_expect_failure" > + fi > + test_todo_=todo > + test_must_fail_helper todo "$@" 2>&7 > +} 7>&2 2>&4 > + > # This is not among top-level (test_expect_success | test_expect_failure) > # but is a prefix that can be used in the test script, like: > # > @@ -1061,43 +1160,7 @@ test_must_fail_acceptable () { > # ! grep pattern output > > test_must_fail () { > - case "$1" in > - ok=*) > - _test_ok=${1#ok=} > - shift > - ;; > - *) > - _test_ok= > - ;; > - esac > - if ! test_must_fail_acceptable "$@" > - then > - echo >&7 "test_must_fail: only 'git' is allowed: $*" > - return 1 > - fi > - "$@" 2>&7 > - exit_code=$? > - if test $exit_code -eq 0 && ! list_contains "$_test_ok" success > - then > - echo >&4 "test_must_fail: command succeeded: $*" > - return 1 > - elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe > - then > - return 0 > - elif test $exit_code -gt 129 && test $exit_code -le 192 > - then > - echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*" > - return 1 > - elif test $exit_code -eq 127 > - then > - echo >&4 "test_must_fail: command not found: $*" > - return 1 > - elif test $exit_code -eq 126 > - then > - echo >&4 "test_must_fail: valgrind error: $*" > - return 1 > - fi > - return 0 > + test_must_fail_helper must_fail "$@" 2>&7 > } 7>&2 2>&4 > > # Similar to test_must_fail, but tolerates success, too. This is > @@ -1114,7 +1177,7 @@ test_must_fail () { > # Accepts the same options as test_must_fail. > > test_might_fail () { > - test_must_fail ok=success "$@" 2>&7 > + test_must_fail_helper might_fail ok=success "$@" 2>&7 > } 7>&2 2>&4 > > # Similar to test_must_fail and test_might_fail, but check that a I remember finding it annoying that "test_might_fail" is misreported from test_must_fail_acceptable as being called "test_must_fail", so this refactoring is very welcome. But can you please split this into its own commit? I.e. that improvement can stand on its own, i.e. just a change that has "test_must_fail" and "test_might_fail" reporting their correct name. Then this commit can follow, and then you'll just need to add (for this part): test_must_fail_helper todo "$@" 2>&7 And it'll make the resulting diff much smaller & easier to follow.
Hi Ævar On 06/10/2022 16:36, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote: > >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> test_todo() is built upon test_expect_failure() but accepts commands >> starting with test_* in addition to git. As our test_* assertions use >> BUG() to signal usage errors any such error will not be hidden by >> test_todo(). > > I think they will, unless I'm missing something. E.g. try out: It's talking about BUG() in test-lib.sh, not the C function. For example test_path_is_file () { test "$#" -ne 1 && BUG "1 param" if ! test -f "$1" then echo "File $1 doesn't exist" false fi } So a test containing "test_todo test_path_is_file a b" should fail as BUG calls exit rather than returning non-zero (I should probably test that in 0000-basic.sh) > diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh > index 80e76a4695e..1be895abba6 100755 > --- a/t/t0210-trace2-normal.sh > +++ b/t/t0210-trace2-normal.sh > @@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' ' > > test_expect_success 'bug messages with BUG_if_bug() are written to trace2' ' > test_when_finished "rm trace.normal actual expect" && > - test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \ > + test_todo env GIT_TRACE2="$(pwd)/trace.normal" \ > test-tool trace2 008bug 2>err && > cat >expect <<-\EOF && > a bug message > > I.e. in our tests you need to look out for exit code 99, not the usual > abort(). > > I have local patches to fix this, previously submitted as an RFC here: > https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/ > > I just rebased that & CI is currently running, I'll see how it does: > https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2 > > When I merged your patches here with that topic yours started doing the > right thing in this case, i.e. a "test_todo" that would get a BUG()" > would be reported as a failure. > >> + test_true () { >> + true >> + } >> + test_expect_success "pretend we have fixed a test_todo breakage" \ >> + "test_todo test_true" > > "Why the indirection", until I realized that it's because you're working > around the whitelist of commands that we have, i.e. we allow 'git' and > 'test-tool', but not 'grep' or whatever. > > I'm of the opinion that we should just drop that limitation altogether, > which is shown to be pretty silly in this case. I.e. at some point we > listed "test_*" as "this invokes a git binary", but a lot of our test_* > commands don't, including this one. test_expect_failure does not allow test_* are you thinking of test-tool? > So in general I think we should just allow any command in > "test_must_fail" et al, and just catch in code review if someone uses > "test_must_fail grep" as opposed to "! grep". That is not going to scale well > But for the particular case of "test_todo" doing so seems like pointless > work, if we think we're going to miss this sort of thing in review in > general, then surely that's not a concern if we're going to very > prominently mark tests as TODO tests, given how the test of the output > shows them? >[...] >> test_might_fail () { >> - test_must_fail ok=success "$@" 2>&7 >> + test_must_fail_helper might_fail ok=success "$@" 2>&7 >> } 7>&2 2>&4 >> >> # Similar to test_must_fail and test_might_fail, but check that a > > I remember finding it annoying that "test_might_fail" is misreported > from test_must_fail_acceptable as being called "test_must_fail", so this > refactoring is very welcome. > > But can you please split this into its own commit? I.e. that improvement > can stand on its own, i.e. just a change that has "test_must_fail" and > "test_might_fail" reporting their correct name. > > Then this commit can follow, and then you'll just need to add (for this part) > > test_must_fail_helper todo "$@" 2>&7 > > And it'll make the resulting diff much smaller & easier to follow. Sure, I should also improve the error message in >> + echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*" for test_todo. Best Wishes Phillip
On Thu, Oct 06 2022, Phillip Wood wrote: > Hi Ævar > > On 06/10/2022 16:36, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Oct 06 2022, Phillip Wood via GitGitGadget wrote: >> >>> From: Phillip Wood <phillip.wood@dunelm.org.uk> >>> >>> test_todo() is built upon test_expect_failure() but accepts commands >>> starting with test_* in addition to git. As our test_* assertions use >>> BUG() to signal usage errors any such error will not be hidden by >>> test_todo(). >> I think they will, unless I'm missing something. E.g. try out: > > It's talking about BUG() in test-lib.sh, not the C function. For example > > test_path_is_file () { > test "$#" -ne 1 && BUG "1 param" > if ! test -f "$1" > then > echo "File $1 doesn't exist" > false > fi > } > > So a test containing "test_todo test_path_is_file a b" should fail as > BUG calls exit rather than returning non-zero (I should probably test > that in 0000-basic.sh) Ah, anyway, I think getting that to behave correctly is *the* thing any less sucky test_expect_failure replacement needs to get right, i.e. to handle BUG() (in C code), abort(), SANITIZE=undefined (and friends, all of whom abort()), segfaults etc. >> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh >> index 80e76a4695e..1be895abba6 100755 >> --- a/t/t0210-trace2-normal.sh >> +++ b/t/t0210-trace2-normal.sh >> @@ -170,7 +170,7 @@ test_expect_success 'BUG messages are written to trace2' ' >> >> test_expect_success 'bug messages with BUG_if_bug() are written to trace2' ' >> test_when_finished "rm trace.normal actual expect" && >> - test_expect_code 99 env GIT_TRACE2="$(pwd)/trace.normal" \ >> + test_todo env GIT_TRACE2="$(pwd)/trace.normal" \ >> test-tool trace2 008bug 2>err && >> cat >expect <<-\EOF && >> a bug message >> I.e. in our tests you need to look out for exit code 99, not the >> usual >> abort(). >> I have local patches to fix this, previously submitted as an RFC >> here: >> https://lore.kernel.org/git/RFC-cover-0.3-00000000000-20220525T234908Z-avarab@gmail.com/ >> I just rebased that & CI is currently running, I'll see how it does: >> https://github.com/avar/git/tree/avar/usage-do-not-abort-on-BUG-to-get-trace2-event-2 >> When I merged your patches here with that topic yours started doing >> the >> right thing in this case, i.e. a "test_todo" that would get a BUG()" >> would be reported as a failure. >> >>> + test_true () { >>> + true >>> + } >>> + test_expect_success "pretend we have fixed a test_todo breakage" \ >>> + "test_todo test_true" >> "Why the indirection", until I realized that it's because you're >> working >> around the whitelist of commands that we have, i.e. we allow 'git' and >> 'test-tool', but not 'grep' or whatever. >> I'm of the opinion that we should just drop that limitation >> altogether, >> which is shown to be pretty silly in this case. I.e. at some point we >> listed "test_*" as "this invokes a git binary", but a lot of our test_* >> commands don't, including this one. > > test_expect_failure does not allow test_* are you thinking of test-tool? I'm talking about test_todo, and the reason for that "test_true" being needed is because test_must_fail_acceptable is picky, but we could also (I just adjusted that one test): diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index 52362ad3dd3..921e0401eb5 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -143,11 +143,8 @@ test_expect_success 'subtest: a passing TODO test' ' test_expect_success 'subtest: a failing test_todo' ' write_and_run_sub_test_lib_test failing-test-todo <<-\EOF && - test_false () { - false - } test_expect_success "passing test" "true" - test_expect_success "known todo" "test_todo test_false" + test_expect_success "known todo" "test_todo false" test_done EOF check_sub_test_lib_test failing-test-todo <<-\EOF diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 8978709b231..9300eaa2c62 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1034,6 +1034,11 @@ test_must_fail_acceptable () { done fi + if test "$name" = "todo" + then + return 0 + fi + case "$1" in git|__git*|test-tool|test_terminal) return 0 @@ -1050,10 +1055,6 @@ test_must_fail_acceptable () { fi return 1 ;; - test_*) - test "$name" = "todo" - return $? - ;; *) return 1 ;; >> So in general I think we should just allow any command in >> "test_must_fail" et al, and just catch in code review if someone uses >> "test_must_fail grep" as opposed to "! grep". > > That is not going to scale well Well, you're throwing the scaling out the window by whitelisting test_* in your 1/3 :) I don't think we really need it, but *the* reason it's there is to distinguish things that run our own C code from things that don't, and a lot of test_* helpers don't. So if you want to pursue making this correct per the current intent it should really use the current whitelist to decide whether or not to pass things through the "should we change the exit code if it's a signal, segfault etc?" part. Otherwise you should just negate it or whatever, as the "test_todo" I showed in https://lore.kernel.org/git/221006.86v8owr986.gmgdl@evledraar.gmail.com/ does. I.e. we shouldn't be detecting an abort() in /bin/false and the like.
Phillip Wood via GitGitGadget wrote: > From: Phillip Wood <phillip.wood@dunelm.org.uk> > > test_todo() is intended as a fine grained replacement for > test_expect_failure(). Rather than marking the whole test as failing > test_todo() is used to mark individual failing commands within a > test. This approach to writing failing tests allows us to detect > unexpected failures that are hidden by test_expect_failure(). I love this idea! I've nearly been burned a couple of times by the wrong line in a 'test_expect_failure' triggering the error (e.g., due to bad syntax earlier in the test). The added specificity of 'test_todo' will help both reviewers and people fixing the underlying issues demonstrated by expected-failing tests. > > Failing commands are reported by the test harness in the same way as > test_expect_failure() so there is no change in output when migrating > from test_expect_failure() to test_todo(). If a command marked with > test_todo() succeeds then the test will fail. This is designed to make > it easier to see when a command starts succeeding in our CI compared > to using test_expect_failure() where it is easy to fix a failing test > case and not realize it. > > test_todo() is built upon test_expect_failure() but accepts commands > starting with test_* in addition to git. As our test_* assertions use > BUG() to signal usage errors any such error will not be hidden by > test_todo(). Should this be so restrictive? I think 'test_todo' would need to handle any arbitrary command (mostly because of custom functions like 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement for 'test_expect_failure'. I see there's some related discussion in another subthread [1], but I don't necessarily think removing restrictions (i.e. that the tested command must be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for 'test_must_fail' et al. to be internally consistent. On one hand, 'test_todo' could be interpreted as an assertion (like 'test_must_fail'), where we only want to assert on our code - hence the restrictions. From that perspective, it would make sense to ease restrictions uniformly on all of our assertion helpers. On the other hand, I'm interpreting 'test_todo' as 'test_expect_failure_on_line_N' - more of a "post-test result interpreter" than an assertion helper. So because 'test_expect_failure' doesn't require the failing line to come from a particular command, I don't think 'test_todo' needs to either. That leaves assertion helpers like 'test_must_fail' out of the scope of this change, avoiding any hairiness of allowing them to assert on arbitrary code. What do you think? [1] https://lore.kernel.org/git/221006.86mta8r860.gmgdl@evledraar.gmail.com/ > > This commit coverts a few tests to show the intended use of > test_todo(). A limitation of test_todo() as it is currently > implemented is that it cannot be used in a subshell. > > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
On Tue, Dec 06 2022, Victoria Dye wrote: > Phillip Wood via GitGitGadget wrote: >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> test_todo() is intended as a fine grained replacement for >> test_expect_failure(). Rather than marking the whole test as failing >> test_todo() is used to mark individual failing commands within a >> test. This approach to writing failing tests allows us to detect >> unexpected failures that are hidden by test_expect_failure(). > > I love this idea! I've nearly been burned a couple of times by the wrong > line in a 'test_expect_failure' triggering the error (e.g., due to bad > syntax earlier in the test). The added specificity of 'test_todo' will help > both reviewers and people fixing the underlying issues demonstrated by > expected-failing tests. > >> >> Failing commands are reported by the test harness in the same way as >> test_expect_failure() so there is no change in output when migrating >> from test_expect_failure() to test_todo(). If a command marked with >> test_todo() succeeds then the test will fail. This is designed to make >> it easier to see when a command starts succeeding in our CI compared >> to using test_expect_failure() where it is easy to fix a failing test >> case and not realize it. >> >> test_todo() is built upon test_expect_failure() but accepts commands >> starting with test_* in addition to git. As our test_* assertions use >> BUG() to signal usage errors any such error will not be hidden by >> test_todo(). > > Should this be so restrictive? I think 'test_todo' would need to handle any > arbitrary command (mostly because of custom functions like > 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement > for 'test_expect_failure'. > > I see there's some related discussion in another subthread [1], but I don't > necessarily think removing restrictions (i.e. that the tested command must > be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for > 'test_must_fail' et al. to be internally consistent. On one hand, > 'test_todo' could be interpreted as an assertion (like 'test_must_fail'), > where we only want to assert on our code - hence the restrictions. From that > perspective, it would make sense to ease restrictions uniformly on all of > our assertion helpers. > > On the other hand, I'm interpreting 'test_todo' as > 'test_expect_failure_on_line_N' - more of a "post-test result interpreter" > than an assertion helper. So because 'test_expect_failure' doesn't require > the failing line to come from a particular command, I don't think > 'test_todo' needs to either. That leaves assertion helpers like > 'test_must_fail' out of the scope of this change, avoiding any hairiness of > allowing them to assert on arbitrary code. > > What do you think? Are you saying that for the "test_todo" we shouldn't care whether it exits with a "normal" non-zero or a segfault, abort() (e.g. BUG()) etc? That's what the "test_must_fail" v.s. "!" is about. Even if we erased tat distinction I think such a thing would be a marginal improvement on "test_expect_failure", as we'd at least mark what line fails, but like "test_expect_failure" we'd accept segfaults as failures. but as noted in the upthread discussions I think we should do better and still check for segfaults etc. I think we have a couple of "test_expect_failure" now where we expect a segfault, but for the rest we'd like to know if they start segfaulting.
Hi Victoria On 06/12/2022 22:37, Victoria Dye wrote: > Phillip Wood via GitGitGadget wrote: >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> Failing commands are reported by the test harness in the same way as >> test_expect_failure() so there is no change in output when migrating >> from test_expect_failure() to test_todo(). If a command marked with >> test_todo() succeeds then the test will fail. This is designed to make >> it easier to see when a command starts succeeding in our CI compared >> to using test_expect_failure() where it is easy to fix a failing test >> case and not realize it. >> >> test_todo() is built upon test_expect_failure() but accepts commands >> starting with test_* in addition to git. As our test_* assertions use >> BUG() to signal usage errors any such error will not be hidden by >> test_todo(). > > Should this be so restrictive? I think 'test_todo' would need to handle any > arbitrary command (mostly because of custom functions like > 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement > for 'test_expect_failure'. > > I see there's some related discussion in another subthread [1], but I don't > necessarily think removing restrictions (i.e. that the tested command must > be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for > 'test_must_fail' et al. to be internally consistent. On one hand, > 'test_todo' could be interpreted as an assertion (like 'test_must_fail'), > where we only want to assert on our code - hence the restrictions. From that > perspective, it would make sense to ease restrictions uniformly on all of > our assertion helpers. > > On the other hand, I'm interpreting 'test_todo' as > 'test_expect_failure_on_line_N' - more of a "post-test result interpreter" > than an assertion helper. So because 'test_expect_failure' doesn't require > the failing line to come from a particular command, I don't think > 'test_todo' needs to either. That leaves assertion helpers like > 'test_must_fail' out of the scope of this change, avoiding any hairiness of > allowing them to assert on arbitrary code. > > What do you think? I don't think we need to remove the restrictions on 'test_must_fail', they seem to be there for a good reason and I'm not aware of anyone complaining about being inconvenienced by them. I think of 'test_todo' and 'test_must_fail' as being distinct, 'test_todo' only reuses the implementation of 'test_must_fail' for convenience rather than any other deep reason. I added the restrictions to 'test_todo' to try and stop it being misused but I'm happy to relax them if needed. I'm keen that test_todo is able to distinguish between an expected failure and a failure due to the wrapped command being misused e.g. 'test_todo grep --invalid-option' should report an error. Restricting the commands makes it easier to guarantee that but we can always just add checks for other commands as we use them. In a way the existing restrictions are kind of pointless because test authors can always name their helper functions test_... to get round them. I think you've convinced be to remove the restrictions on what can be wrapped by 'test_todo' when I re-roll. Thanks for your thoughtful comments Phillip > [1] https://lore.kernel.org/git/221006.86mta8r860.gmgdl@evledraar.gmail.com/ > >> >> This commit coverts a few tests to show the intended use of >> test_todo(). A limitation of test_todo() as it is currently >> implemented is that it cannot be used in a subshell. >> >> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Phillip Wood <phillip.wood123@gmail.com> writes: > I added the restrictions to 'test_todo' to try and stop it being > misused but I'm happy to relax them if needed. I'm keen that test_todo > is able to distinguish between an expected failure and a failure due > to the wrapped command being misused e.g. 'test_todo grep > --invalid-option' should report an error. Hmm, but it is not useful if the failure is from "you cannot use the system command grep with test_todo", as the (implicitly) encouraged "fix" the developer who wrote the test would pick would be to use "! grep --invalid-option" which would still fail for a wrong reason. If a "git" command is expected to run to a completion but is currently broken and produces a wrong output, it would be very useful to be able to write git command --args >actual && test_todo grep -e "$expected_token" actual to say "when 'git command' is fixed, the output should contains this, but we know it currently is broken". > I think you've convinced be to remove the restrictions on what can be > wrapped by 'test_todo' when I re-roll. > > Thanks for your thoughtful comments Thanks.
Hi Junio On 09/12/2022 01:09, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@gmail.com> writes: > >> I added the restrictions to 'test_todo' to try and stop it being >> misused but I'm happy to relax them if needed. I'm keen that test_todo >> is able to distinguish between an expected failure and a failure due >> to the wrapped command being misused e.g. 'test_todo grep >> --invalid-option' should report an error. > > Hmm, but it is not useful if the failure is from "you cannot use the > system command grep with test_todo", as the (implicitly) encouraged > "fix" the developer who wrote the test would pick would be to use > "! grep --invalid-option" which would still fail for a wrong reason. > > If a "git" command is expected to run to a completion but is > currently broken and produces a wrong output, it would be very > useful to be able to write > > git command --args >actual && > test_todo grep -e "$expected_token" actual > > to say "when 'git command' is fixed, the output should contains > this, but we know it currently is broken". You can do that now, the next patch adds support for "test_todo [!] grep" but you currently cannot pass arbitrary commands/helper functions. Best Wishes Phillip >> I think you've convinced be to remove the restrictions on what can be >> wrapped by 'test_todo' when I re-roll. >> >> Thanks for your thoughtful comments > > Thanks.
diff --git a/t/README b/t/README index 979b2d4833d..642aeab80b4 100644 --- a/t/README +++ b/t/README @@ -892,6 +892,10 @@ see test-lib-functions.sh for the full list and their options. - test_expect_failure [<prereq>] <message> <script> + Where possible new tests should use test_expect_success and mark + the individual failing commands with test_todo (see below) rather + than using test_expect_failure. + This is NOT the opposite of test_expect_success, but is used to mark a test that demonstrates a known breakage. Unlike the usual test_expect_success tests, which say "ok" on @@ -902,6 +906,14 @@ see test-lib-functions.sh for the full list and their options. Like test_expect_success this function can optionally use a three argument invocation with a prerequisite as the first argument. + - test_todo <command> + + This is used to mark commands that should succeed but do not due to + a known issue. The test will be reported as a known failure if the + issue still exists and won’t cause -i (immediate) to stop. If the + command unexpectedly succeeds then the test will be reported as a + failing. test_todo cannot be used in a subshell. + - test_debug <script> This takes a single argument, <script>, and evaluates it only diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index 502b4bcf9ea..52362ad3dd3 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -141,6 +141,70 @@ test_expect_success 'subtest: a passing TODO test' ' EOF ' +test_expect_success 'subtest: a failing test_todo' ' + write_and_run_sub_test_lib_test failing-test-todo <<-\EOF && + test_false () { + false + } + test_expect_success "passing test" "true" + test_expect_success "known todo" "test_todo test_false" + test_done + EOF + check_sub_test_lib_test failing-test-todo <<-\EOF + > ok 1 - passing test + > not ok 2 - known todo # TODO known breakage + > # still have 1 known breakage(s) + > # passed all remaining 1 test(s) + > 1..2 + EOF +' + +test_expect_success 'subtest: a passing test_todo' ' + write_and_run_sub_test_lib_test_err passing-test-todo <<-\EOF && + test_true () { + true + } + test_expect_success "pretend we have fixed a test_todo breakage" \ + "test_todo test_true" + test_done + EOF + check_sub_test_lib_test passing-test-todo <<-\EOF + > not ok 1 - pretend we have fixed a test_todo breakage + > # test_todo test_true + > # failed 1 among 1 test(s) + > 1..1 + EOF +' + +test_expect_success 'subtest: test_todo allowed arguments' ' + write_and_run_sub_test_lib_test_err acceptable-test-todo <<-\EOF && + # This an acceptable command for test_todo but not test_must_fail + test_true () { + return 0 + } + test_expect_success "test_todo skips env and accepts good command" \ + "test_todo env Var=Value git --invalid-option" + test_expect_success "test_todo skips env and rejects bad command" \ + "test_todo env Var=Value false" + test_expect_success "test_todo test_must_fail accepts good command" \ + "test_todo test_must_fail git --version" + test_expect_success "test_todo test_must_fail rejects bad command" \ + "test_todo test_must_fail test_true" + test_done + EOF + check_sub_test_lib_test acceptable-test-todo <<-\EOF + > not ok 1 - test_todo skips env and accepts good command # TODO known breakage + > not ok 2 - test_todo skips env and rejects bad command + > # test_todo env Var=Value false + > not ok 3 - test_todo test_must_fail accepts good command # TODO known breakage + > not ok 4 - test_todo test_must_fail rejects bad command + > # test_todo test_must_fail test_true + > # still have 2 known breakage(s) + > # failed 2 among remaining 2 test(s) + > 1..4 + EOF +' + test_expect_success 'subtest: 2 TODO tests, one passin' ' write_and_run_sub_test_lib_test partially-passing-todos <<-\EOF && test_expect_failure "pretend we have a known breakage" "false" diff --git a/t/t3401-rebase-and-am-rename.sh b/t/t3401-rebase-and-am-rename.sh index f18bae94507..cc5da9f5afe 100755 --- a/t/t3401-rebase-and-am-rename.sh +++ b/t/t3401-rebase-and-am-rename.sh @@ -52,7 +52,7 @@ test_expect_success 'rebase --interactive: directory rename detected' ' ) ' -test_expect_failure 'rebase --apply: directory rename detected' ' +test_expect_success 'rebase --apply: directory rename detected' ' ( cd dir-rename && @@ -63,8 +63,8 @@ test_expect_failure 'rebase --apply: directory rename detected' ' git ls-files -s >out && test_line_count = 5 out && - test_path_is_file y/d && - test_path_is_missing x/d + test_todo test_path_is_file y/d && + test_todo test_path_is_missing x/d ) ' @@ -84,7 +84,7 @@ test_expect_success 'rebase --merge: directory rename detected' ' ) ' -test_expect_failure 'am: directory rename detected' ' +test_expect_success 'am: directory rename detected' ' ( cd dir-rename && @@ -97,8 +97,8 @@ test_expect_failure 'am: directory rename detected' ' git ls-files -s >out && test_line_count = 5 out && - test_path_is_file y/d && - test_path_is_missing x/d + test_todo test_path_is_file y/d && + test_todo test_path_is_missing x/d ) ' diff --git a/t/t3424-rebase-empty.sh b/t/t3424-rebase-empty.sh index 5e1045a0afc..b7cae260759 100755 --- a/t/t3424-rebase-empty.sh +++ b/t/t3424-rebase-empty.sh @@ -34,15 +34,15 @@ test_expect_success 'setup test repository' ' git commit -m "Five letters ought to be enough for anybody" ' -test_expect_failure 'rebase (apply-backend)' ' - test_when_finished "git rebase --abort" && +test_expect_success 'rebase (apply-backend)' ' + test_when_finished "test_might_fail git rebase --abort" && git checkout -B testing localmods && # rebase (--apply) should not drop commits that start empty git rebase --apply upstream && test_write_lines D C B A >expect && git log --format=%s >actual && - test_cmp expect actual + test_todo test_cmp expect actual ' test_expect_success 'rebase --merge --empty=drop' ' diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index e74a318ac33..fa7831c0674 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -790,7 +790,7 @@ test_expect_success SYMLINKS 'rm across a symlinked leading path (no index)' ' test_path_is_file e/f ' -test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' ' +test_expect_success SYMLINKS 'rm across a symlinked leading path (w/ index)' ' rm -rf d e && mkdir d && echo content >d/f && @@ -798,10 +798,10 @@ test_expect_failure SYMLINKS 'rm across a symlinked leading path (w/ index)' ' git commit -m "d/f exists" && mv d e && ln -s e d && - test_must_fail git rm d/f && - git rev-parse --verify :d/f && + test_todo test_must_fail git rm d/f && + test_todo git rev-parse --verify :d/f && test -h d && - test_path_is_file e/f + test_todo test_path_is_file e/f ' test_expect_success 'setup for testing rm messages' ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index c6479f24eb5..8978709b231 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -802,6 +802,7 @@ test_expect_failure () { export test_prereq if ! test_skip "$@" then + test_todo_=test_expect_failure test -n "$test_skip_test_preamble" || say >&3 "checking known breakage of $TEST_NUMBER.$test_count '$1': $2" if test_run_ "$2" expecting_failure @@ -825,9 +826,15 @@ test_expect_success () { then test -n "$test_skip_test_preamble" || say >&3 "expecting success of $TEST_NUMBER.$test_count '$1': $2" + test_todo_=test_expect_success if test_run_ "$2" then - test_ok_ "$1" + if test "$test_todo_" = "todo" + then + test_known_broken_failure_ "$1" + else + test_ok_ "$1" + fi else test_failure_ "$@" fi @@ -999,10 +1006,18 @@ list_contains () { } # Returns success if the arguments indicate that a command should be -# accepted by test_must_fail(). If the command is run with env, the env -# and its corresponding variable settings will be stripped before we -# test the command being run. +# accepted by test_must_fail() or test_todo(). If the command is run +# with env, the env and its corresponding variable settings will be +# stripped before we we test the command being run. +# +# test_todo() allows any of the assertions beginning test_ such as +# test_cmp in addition the commands allowed by test_must_fail(). + test_must_fail_acceptable () { + local name + name="$1" + shift + if test "$1" = "env" then shift @@ -1023,12 +1038,96 @@ test_must_fail_acceptable () { git|__git*|test-tool|test_terminal) return 0 ;; + test_might_fail|test_todo|test_when_finished) + return 1 + ;; + test_must_fail) + if test "$name" = "todo" + then + shift + test_must_fail_acceptable must_fail "$@" + return $? + fi + return 1 + ;; + test_*) + test "$name" = "todo" + return $? + ;; *) return 1 ;; esac } +test_must_fail_helper () { + test_must_fail_name_="$1" + shift + case "$1" in + ok=*) + _test_ok=${1#ok=} + shift + ;; + *) + _test_ok= + ;; + esac + if ! test_must_fail_acceptable $test_must_fail_name_ "$@" + then + echo >&7 "test_$test_must_fail_name_: only 'git' is allowed: $*" + return 1 + fi + "$@" 2>&7 + exit_code=$? + if test $exit_code -eq 0 && ! list_contains "$_test_ok" success + then + echo >&4 "test_$test_must_fail_name_: command succeeded: $*" + return 1 + elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe + then + return 0 + elif test $exit_code -gt 129 && test $exit_code -le 192 + then + echo >&4 "test_$test_must_fail_name_: died by signal $(($exit_code - 128)): $*" + return 1 + elif test $exit_code -eq 127 + then + echo >&4 "test_$test_must_fail_name_: command not found: $*" + return 1 + elif test $exit_code -eq 126 + then + echo >&4 "test_$test_must_fail_name_: valgrind error: $*" + return 1 + fi + + return 0 +} 7>&2 2>&4 + +# This is used to mark commands that should succeed but do not due to +# a known issue. Marking the individual commands that fail rather than +# using test_expect_failure allows us to detect any unexpected +# failures. As with test_must_fail if the command is killed by a +# signal the test will fail. If the command unexpectedly succeeds then +# the test will also fail. For example: +# +# test_expect_success 'test a known failure' ' +# git foo 2>err && +# test_todo test_must_be_empty err +# ' +# +# This test will fail if "git foo" fails or err is unexpectedly empty. +# test_todo can be used with "git" or any of the "test_*" assertions +# such as test_cmp(). + +test_todo () { + if test "$test_todo_" = "test_expect_failure" + then + BUG "test_todo_ cannot be used inside test_expect_failure" + fi + test_todo_=todo + test_must_fail_helper todo "$@" 2>&7 +} 7>&2 2>&4 + # This is not among top-level (test_expect_success | test_expect_failure) # but is a prefix that can be used in the test script, like: # @@ -1061,43 +1160,7 @@ test_must_fail_acceptable () { # ! grep pattern output test_must_fail () { - case "$1" in - ok=*) - _test_ok=${1#ok=} - shift - ;; - *) - _test_ok= - ;; - esac - if ! test_must_fail_acceptable "$@" - then - echo >&7 "test_must_fail: only 'git' is allowed: $*" - return 1 - fi - "$@" 2>&7 - exit_code=$? - if test $exit_code -eq 0 && ! list_contains "$_test_ok" success - then - echo >&4 "test_must_fail: command succeeded: $*" - return 1 - elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe - then - return 0 - elif test $exit_code -gt 129 && test $exit_code -le 192 - then - echo >&4 "test_must_fail: died by signal $(($exit_code - 128)): $*" - return 1 - elif test $exit_code -eq 127 - then - echo >&4 "test_must_fail: command not found: $*" - return 1 - elif test $exit_code -eq 126 - then - echo >&4 "test_must_fail: valgrind error: $*" - return 1 - fi - return 0 + test_must_fail_helper must_fail "$@" 2>&7 } 7>&2 2>&4 # Similar to test_must_fail, but tolerates success, too. This is @@ -1114,7 +1177,7 @@ test_must_fail () { # Accepts the same options as test_must_fail. test_might_fail () { - test_must_fail ok=success "$@" 2>&7 + test_must_fail_helper might_fail ok=success "$@" 2>&7 } 7>&2 2>&4 # Similar to test_must_fail and test_might_fail, but check that a