Message ID: pull.1117.v2.git.1646130289.gitgitgadget@gmail.com (mailing list archive)
Series: ci: make Git's GitHub workflow output much more helpful

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since v1:
>
>  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
>  * The comment about set -x has been adjusted because it no longer applies as-is.
>  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.

Will queue.  Thanks.
On Tue, Mar 01 2022, Johannes Schindelin via GitGitGadget wrote:

> Changes since v1:
>
>  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
>  * The comment about set -x has been adjusted because it no longer applies as-is.
>  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.

Just briefly: Much of the feedback I had on v1 is still unanswered, or in the case of the performance issues I think just saying that this output is aimed at non-long-time-contributors doesn't really justify the large observed slowdowns. I've been clicking around on the various "seen" links you posted, and while some of the output is improved it's *much* slower, to the point where I don't find those gains worth it.

For the early part of this series, teaching "ci/" this GitHub markdown is something that's unnecessary in my alternate approach in [1]. This series is obviously a relatively shorter way to get there, but knowing that you *can* do it in a simpler way now, it's odd that the v2 doesn't discuss why getting from "A" to that "B" is still desirable with this method.

As for the later parts:

> 5: 94dcbe1bc43 = 5: 9eda6574313 tests: refactor --write-junit-xml code
> 6: 41230100091 = 6: c8b240af749 test(junit): avoid line feeds in XML attributes
> 7: 98b32630fcd = 7: 15f199e810e ci: optionally mark up output in the GitHub workflow
> 8: 1a6bd1846bc = 8: 91ea54f36c5 ci: use `--github-workflow-markup` in the GitHub workflow
> 9: 992b1575889 = 9: be2a83f5da3 ci: call `finalize_test_case_output` a little later

I really don't see (and I pointed out to you before) why teaching test-lib.sh N output formats makes sense when we already have a machine-readable output format with TAP *and* are in fact running code with "prove" that is parsing that output format into nicely arranged objects/structures.

So as a demo for how that can work, here's a quick POC I hacked up a while ago to use "prove" output plugins for this sort of thing. First we intentionally break a test:

    $ git -P diff
    diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
    index 76052cb5620..f085188baf9 100755
    --- a/t/t0002-gitfile.sh
    +++ b/t/t0002-gitfile.sh
    @@ -44,7 +44,7 @@ test_expect_success 'check hash-object' '
     test_expect_success 'check cat-file' '
     	git cat-file blob $SHA >actual &&
    -	test_cmp bar actual
    +	test_cmp /dev/null actual
     '

     test_expect_success 'check update-index' '
    @@ -67,7 +67,7 @@ test_expect_success 'check commit-tree' '
     test_expect_success 'check rev-list' '
     	git update-ref "HEAD" "$SHA" &&
    -	test "$SHA" = "$(git rev-list HEAD)"
    +	test "$SHA" = "$(xgit rev-list HEAD)"
     '

And here's the resulting output, which is the same as the existing "prove" summary output, but then followed by nicely formatted details about just the failing tests:

    $ PERL5LIB=$PWD prove --formatter Fmt t0002-gitfile.sh :: -vx
    test_expect_success 'setup_git_dir twice in subdir' '
    t0002-gitfile.sh ..
    Dubious, test returned 1 (wstat 256, 0x100)
    Failed 2/14 subtests

    Test Summary Report
    -------------------
    t0002-gitfile.sh (Wstat: 256 Tests: 14 Failed: 2)
      Failed tests:  6, 10
      Non-zero exit status: 1
    Files=1, Tests=14,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.13 cusr  0.05 csys =  0.20 CPU)
    Result: FAIL
    Failed in t0002-gitfile.sh#6:
     ==> test   => not ok 6 - check cat-file
     ==> source => #
     ==> source => #	git cat-file blob $SHA >actual &&
     ==> source => #	test_cmp /dev/null actual
     ==> source => #
     ==> trace  => + git cat-file blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
     ==> trace  => + test_cmp /dev/null actual
     ==> trace  => + test 2 -ne 2
     ==> trace  => + eval diff -u "$@"
     ==> trace  => + diff -u /dev/null actual
     ==> output => --- /dev/null	2022-01-25 04:40:50.187529644 +0000
     ==> output => +++ actual	2022-03-02 12:29:28.217960535 +0000
     ==> output => @@ -0,0 +1 @@
     ==> output => +foo
     ==> output => error: last command exited with $?=1
     ==> output =>
    Failed in t0002-gitfile.sh#10:
     ==> test   => not ok 10 - check rev-list
     ==> source => #
     ==> source => #	git update-ref "HEAD" "$SHA" &&
     ==> source => #	test "$SHA" = "$(xgit rev-list HEAD)"
     ==> source => #
     ==> trace  => + git update-ref HEAD f2e10ff57e8c01fe514c650d9de97e913257ba0c
     ==> trace  => + xgit rev-list HEAD
     ==> trace  => + test f2e10ff57e8c01fe514c650d9de97e913257ba0c =
     ==> output => t0002-gitfile.sh: 5: eval: xgit: not found
     ==> output => error: last command exited with $?=1
     ==> output =>

Now, some of that is on top of some output changes to test-lib.sh I had locally, but the "test", "source" etc. there is not a hardcoded part of the output; it's corresponding to the individual object types the TAP::Parser emits (some of which I re-labeled, e.g. "comment" => "source").

The code to do that is below, with brief (unindented) comments:

    package Git::TAP::Formatter::Session;
    use v5.18.2;
    use strict;
    use warnings;
    use base 'TAP::Formatter::Console::ParallelSession';

    our %STATE;

    sub result {
        my $self = shift;
        my $result = shift;

So here we just have the TAP object for the current tests (as in "ok", "not ok" etc.), and save all of those away (both for successful and non-successful tests) in %STATE for later:

        my $res = $self->SUPER::result($result);
        my $test_name = $self->name;

        # An AoO of test numbers and their output lines
        $STATE{$test_name} ||= [{lines => []}];
        push @{$STATE{$test_name}->[-1]->{lines}} => $result;

        # When we see a new test add a new AoA for its output. We do
        # end up with the "plan" type as part of the last test, and
        # might want to split it up
        if ($result->type eq 'test') {
            push @{$STATE{$test_name}} => {};
        }

        return $res;
    }

This "Fmt" package is a "prove" plugin. It's just hooking into the code that emits the current summary; for now it just shows the "Test Summary Report" that your CL notes issues with, but this shows how easy it is to change or amend it (you can override another accessor here to fully suppress the current output, I just didn't do that):

    package Fmt;
    use strict;
    use warnings;
    use List::MoreUtils qw(firstidx);
    use base 'TAP::Formatter::Console';

    sub open_test {
        my $self = shift;
        my $session = $self->SUPER::open_test(@_);
        use Data::Dumper;
        #warn "session is = " . Dumper $session;
        return bless $session => 'Git::TAP::Formatter::Session';
    }

    sub summary {
        my $self = shift;
        $self->SUPER::summary(@_);

        ## This state machine needs to go past the "ok" line and grab
        ## the comments emitted by e.g. "say_color_tap_comment_lines"
        ## in test_ok_()
        for my $test (sort keys %STATE) {
            for (my $i = 1; $i <= $#{$STATE{$test}}; $i++) {
                my @lines = @{$STATE{$test}->[$i]->{lines}};
                my $break = firstidx { $_->raw eq '' } @lines;
                my @source = splice @lines, 0, $break;
                splice @lines, 0, 1; # Splice out the '' item
                push @{$STATE{$test}->[$i - 1]->{lines}} => @source;
                $STATE{$test}->[$i]->{lines} = \@lines;

                # Since we parsed out the source already, let's make it
                # easily machine-readable, and parse the rest.
                $STATE{$test}->[$i]->{source} = \@source;
                my @trace = map { $_->{is_trace} = 1 } grep { $_->raw =~ /^\+ / } @lines;
                $STATE{$test}->[$i]->{trace} = \@trace if @trace;
            }
        }

        my $aggregator = $_[0];
        for my $failed ($aggregator->failed) {
            my ($parser) = $aggregator->parsers($failed);
            for my $i ($parser->failed) {
                my $idx = $i - 1;
                my @lines = @{$STATE{$failed}->[$idx]->{lines}};
                my ($test) = grep { $_->is_test } @lines;

                say "Failed in $failed#$i:";
                say join "\n", map {
                    s/^/ ==> /gr;
                } map {
                    sprintf("%-6s => %s",
                            $_->{is_trace} ? "trace" :
                            $_->is_unknown ? "output" :
                            $_->is_comment ? "source" :
                            $_->type,
                            $_->raw,
                    );
                } sort {
                    # the "is_trace" may be undef
                    no warnings qw(uninitialized);

                    # The "[not ]ok" line first...
                    $b->is_test <=> $a->is_test
                        ||
                    # Then "comment" (i.e. test source)
                    $b->is_comment <=> $a->is_comment
                        ||
                    # Then the "+ " trace of execution
                    $b->{is_trace} <=> $a->{is_trace}
                } @lines;
            }
        }
    }

    1;

1. https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@gmail.com/
2. https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/
Hi Ævar,

On Wed, 2 Mar 2022, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Mar 01 2022, Johannes Schindelin via GitGitGadget wrote:
>
> > Changes since v1:
> >
> >  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
> >  * The comment about set -x has been adjusted because it no longer applies as-is.
> >  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.
>
> Just briefly: Much of the feedback I had on v1 is still unanswered,

Yes, because I got the sense that your feedback ignores the goal of making it easier to diagnose test failures.

> or in the case of the performance issues I think just saying that this output is aimed at non-long-time-contributors doesn't really justify the large observed slowdowns.

What good is a quickly-loading web site when it is less than useful? I'd much rather have a slow-loading web site that gets me to where I need to be than a fast-loading one that leaves me as puzzled as before.

Ciao,
Johannes
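(Background for this exchange: the "grouping" and error annotations under discussion are GitHub's workflow commands, plain magic strings that any CI step can print to stdout. A minimal sketch in shell of the mechanism; the test names and log file are illustrative, not Git's actual output:

    #!/bin/sh
    # Open a collapsible group; GitHub's web UI renders it folded by default.
    echo "::group::ok: t0002.1 setup"
    cat t0002.1.out   # hypothetical per-test log file
    echo "::endgroup::"
    # An error annotation; GitHub also surfaces these on the run's summary
    # page, which is what makes individual failures clickable from there.
    echo "::error::failed: t0002.6 check cat-file"

The ::group::/::endgroup:: and ::error:: commands are documented GitHub Actions features; how best to map test output onto them is what this thread debates.)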
On Mon, Mar 07 2022, Johannes Schindelin wrote:

> On Wed, 2 Mar 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> On Tue, Mar 01 2022, Johannes Schindelin via GitGitGadget wrote:
>>
>> > Changes since v1:
>> >
>> >  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
>> >  * The comment about set -x has been adjusted because it no longer applies as-is.
>> >  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.
>>
>> Just briefly: Much of the feedback I had on v1 is still unanswered,
>
> Yes, because I got the sense that your feedback ignores the goal of making it easier to diagnose test failures.

I think that's a rather strange conclusion given that I've submitted a parallel series that makes some of those failures easier to diagnose than the same changes in this series. I.e. the failures in build vs. test phases, not the individual test format output (but those are orthogonal).

I think it's a fair summary of our differences that we're just placing different values on UX responsiveness. I'm pretty sure there's some amount of UX slowdown you'd consider unacceptable, no matter how much the output was improved.

Clearly we just use it differently.

>> or in the case of the performance issues I think just saying that this output is aimed at non-long-time-contributors doesn't really justify the large observed slowdowns.
>
> What good is a quickly-loading web site when it is less than useful?

For all the flaws in the current output there are cases now where you can click on a failure, see a summary of the 1-2 tests that failed, and even find your way through the right place in the rather verbose raw log output in 1/4 or 1/2 the time it takes the initial page on the new UX to load.

> I'd much rather have a slow-loading web site that gets me to where I need to be than a fast-loading one that leaves me as puzzled as before.

I think it's clear that we're going to disagree on this point, but I'd still think that:

 * In a re-roll, you should amend these patches to clearly note that's a UX trade-off you're making, perhaps with rough before/after timings similar to the ones I've posted.

   I.e. now those patches say nothing about the UX change resulting in UX that's *much* slower than before. Clearly noting that trade-off for reviewers is not the same as saying the trade-off can't be made.

 * I don't see why the changes here can't be made configurable (and perhaps you'd argue they should be on by default) via the ci-config phase.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think that's a rather strange conclusion given that I've submitted a parallel series that makes some of those failures easier to diagnose than the same changes in this series. I.e. the failures in build vs. test phases, not the individual test format output (but those are orthogonal).

If you have a counter-proposal that you feel is solid enough, I do not mind dropping the topic in question and replacing it with the counter-proposal to let people see how it fares for a few days. If you revert the merge of this topic into 'seen', replace it with the counter-proposal, and push the result to your own repository so that others can view the output easily, that would be an even better way to highlight the differences between the two approaches, as that would allow us to see the same failures side-by-side.

Am I correct to understand that one of the common goals here is to eliminate the need to discover how to get to the first failure output, without turning it slow by 10x to load the output?

> I think it's clear that we're going to disagree on this point, but I'd still think that:
>
>  * In a re-roll, you should amend these patches to clearly note that's a UX trade-off you're making, perhaps with rough before/after timings similar to the ones I've posted.
>
>    I.e. now those patches say nothing about the UX change resulting in UX that's *much* slower than before. Clearly noting that trade-off for reviewers is not the same as saying the trade-off can't be made.

Whether we perform counter-proposal comparison or not, the above is a reasonable thing to ask.

>  * I don't see why the changes here can't be made configurable (and perhaps you'd argue they should be on by default) via the ci-config phase.

I do not know if such a knob is feasible, though.

Thanks.
On Mon, Mar 07 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I think that's a rather strange conclusion given that I've submitted a parallel series that makes some of those failures easier to diagnose than the same changes in this series. I.e. the failures in build vs. test phases, not the individual test format output (but those are orthogonal).
>
> If you have a counter-proposal that you feel is solid enough, I do not mind dropping the topic in question and replacing it with the counter-proposal to let people see how it fares for a few days. If you revert the merge of this topic into 'seen', replace it with the counter-proposal, and push the result to your own repository so that others can view the output easily, that would be an even better way to highlight the differences between the two approaches, as that would allow us to see the same failures side-by-side.

I proposed [1] as a counter-proposal to *part of* this series, which has various UX comparisons. Note that it doesn't aim to change the test failure output significantly, but to structurally simplify .github/workflows/main.yml and ci/ for a significant line-count reduction, to improve part of the output that we get for "make" vs. "make test", and to make it obvious what the parameters of each step are. That partially overlaps with half of Johannes's series here, so I think it makes sense to split up those two concerns & address this more incrementally.

I.e. my series shows that you can get what the first half of this series proposes to do by adding GitHub-specific output to ci/*, but without any such CI-backend-specific output, and the resulting presentation in the UX is better.

It'll still apply to "master" with this topic ejected; there was a minor comment (needing commit message rephrasing) from SZEDER Gábor on it. I could re-roll it if you're interested.

> Am I correct to understand that one of the common goals here is to eliminate the need to discover how to get to the first failure output, without turning it slow by 10x to load the output?

That's definitely an eventual common goal. I have a POC for how to do that with an alternate approach that doesn't suffer from that slowdown, and shows you much more targeted failure output (only the specific tests that failed) at [2]. I just think it makes sense to split up the concerns of how we arrange .github/workflows/main.yml & ci/* and how doing that differently can improve the CI UX, vs. the mostly orthogonal concern of how test-lib.sh + prove can best summarize their failure output.

>> I think it's clear that we're going to disagree on this point, but I'd still think that:
>>
>>  * In a re-roll, you should amend these patches to clearly note that's a UX trade-off you're making, perhaps with rough before/after timings similar to the ones I've posted.
>>
>>    I.e. now those patches say nothing about the UX change resulting in UX that's *much* slower than before. Clearly noting that trade-off for reviewers is not the same as saying the trade-off can't be made.
>
> Whether we perform counter-proposal comparison or not, the above is a reasonable thing to ask.
>
>>  * I don't see why the changes here can't be made configurable (and perhaps you'd argue they should be on by default) via the ci-config phase.
>
> I do not know if such a knob is feasible, though.

It would be rather trivial.
Basically a matter of adding an "if: needs.ci-config.outputs.style == 'basic'" to ci/print-test-failures.sh, and a corresponding flag passed down to ci/lib.sh to have it invoke test-lib.sh with --github-workflow-markup or not. I.e. this series detects that it's running in GitHub CI and optionally tells test-lib.sh to emit different output, so to get the old output you'd just need to not opt in to that.

I think we can have our cake and eat it too though, so I don't think there's any point in such a knob per se. The only reason I'm suggesting it is because Johannes doesn't otherwise seem to want to address the significant CI UX slowdowns in this series.

I do think the approach taken by this series is inherently limited in solving that problem though, in a way that the approach outlined in [2] isn't. I.e. the problem is that we're throwing say ~30k-90k lines of raw CI output at some GitHub JavaScript to format and present. Your browser needs to download all the output, and then the browser needs to spin at 100% CPU to present it to you.

The reason for that is that anything that tweaks the test-lib.sh output is something you need to consume *in its entirety*. I.e. when you have a sequence like:

    ok 1 test one
    ok 2 test two
    not ok 3 test three
    1..3

you don't know until the third test that you've had a failure. Short of teaching test-lib.sh even more complexity (e.g. pre-buffering its "passing" output) a UX layer needs to present all of it, and won't benefit from a parsed representation of it. Which is why I think adding other output formatters to test-lib.sh is a dead-end approach to this problem.

I mean, sure, we could start parsing the new output emitted here, but that doesn't make sense either. We already run a full TAP parser over the output of the entire test suite, which we can easily use to only print details about those tests that failed[2]. We could also still have the full & raw output, but that could be in some other tab (or "stage", just like "ci/print-test-failures.sh" is now).

To be fair, that isn't quite as trivial in terms of patch count. In particular we have edge cases currently where the TAP output isn't valid due to bugs in test-lib.sh and/or tests, and we can't combine it with the --verbose output. The demo at [2] is on top of some patches I've got locally to fix all of that. But I really think it's worth it to move a bit slower there & get it right rather than heading down the wrong direction of having another not-quite-machine-readable output target.

I.e. once it's machine-readable & parsed we can present it however we like, and can do *much better* output such as correlating trace output to the test source, and e.g. showing what specific line we failed on. Right now all of that needs to be eyeballed & inferred from the "--verbose -x" output (which is often non-obvious & a pain, especially when the "test_expect_success" block calls helper functions).

1. https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@gmail.com/
2. https://lore.kernel.org/git/220302.86mti87cj2.gmgdl@evledraar.gmail.com/
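(To make that knob concrete: a minimal sketch in shell of what the ci/lib.sh side could look like, assuming the ci-config job's choice reaches the script as an environment variable. The name CI_OUTPUT_STYLE and the exact plumbing are assumptions, not code from either series:

    # ci/lib.sh (hypothetical excerpt): pick the test output flavor per run.
    : "${CI_OUTPUT_STYLE:=markup}"
    if test "$CI_OUTPUT_STYLE" = markup
    then
        # the collapsible per-test-case output this series adds
        GIT_TEST_OPTS="--github-workflow-markup"
    else
        # roughly the pre-series behavior
        GIT_TEST_OPTS="--verbose-log -x"
    fi
    export GIT_TEST_OPTS

Here --github-workflow-markup is the option added later in this series, and --verbose-log -x approximates what the CI scripts pass today.)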
Hi Ævar,

On Mon, 7 Mar 2022, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Mar 07 2022, Johannes Schindelin wrote:
>
> > On Wed, 2 Mar 2022, Ævar Arnfjörð Bjarmason wrote:
> >
> >> On Tue, Mar 01 2022, Johannes Schindelin via GitGitGadget wrote:
> >>
> >> > Changes since v1:
> >> >
> >> >  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
> >> >  * The comment about set -x has been adjusted because it no longer applies as-is.
> >> >  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.
> >>
> >> Just briefly: Much of the feedback I had on v1 is still unanswered,
> >
> > Yes, because I got the sense that your feedback ignores the goal of making it easier to diagnose test failures.
>
> I think that's a rather strange conclusion given that I've submitted a parallel series that makes some of those failures easier to diagnose than the same changes in this series. I.e. the failures in build vs. test phases, not the individual test format output (but those are orthogonal).

I do not know what your parallel series implements, as I did not have the time to read it yet (and it contains about double the number of patches of my series, hopefully not intended to make it impossible for me to spare the time to even take a glimpse at it).

Also: I thought we had the rule of trying to be mindful of other contributors and avoid interfering with patch series that are in flight? It could be viewed as unnecessarily adversarial.

> I think it's a fair summary of our differences that we're just placing different values on UX responsiveness. I'm pretty sure there's some amount of UX slowdown you'd consider unacceptable, no matter how much the output was improved.
>
> Clearly we just use it differently.

I would gladly trade my convenience in return for making it easier for others to diagnose why their PR builds fail.

At the moment, the way our CI/PR builds present test failures very likely makes every new contributor feel stupid for not knowing how to proceed. But they are not stupid. The failure is not theirs. The fault lies squarely with us, for making it so frigging hard.

> >> or in the case of the performance issues I think just saying that this output is aimed at non-long-time-contributors doesn't really justify the large observed slowdowns.
> >
> > What good is a quickly-loading web site when it is less than useful?
>
> For all the flaws in the current output there are cases now where you can click on a failure, see a summary of the 1-2 tests that failed, and even find your way through the right place in the rather verbose raw log output in 1/4 or 1/2 the time it takes the initial page on the new UX to load.

I wonder where the hard data is that backs up these numbers.

I do not have hard data, either, except for one: apart from you and Junio, I have yet to talk to any contributor who said "oh yeah, I found the logs right away" rather than things like "when I finally figured out that the logs were in `print-test-failures`, I had a chance to make sense of the failures" or even "zOMG _that_ is where the logs are?". And let me tell you that I heard this from a lot of people. Way more than a mere two. Far, far more.

> > I'd much rather have a slow-loading web site that gets me to where I need to be than a fast-loading one that leaves me as puzzled as before.
> I think it's clear that we're going to disagree on this point, but I'd still think that:
>
>  * In a re-roll, you should amend these patches to clearly note that's a UX trade-off you're making, perhaps with rough before/after timings similar to the ones I've posted.
>
>    I.e. now those patches say nothing about the UX change resulting in UX that's *much* slower than before. Clearly noting that trade-off for reviewers is not the same as saying the trade-off can't be made.

Sure, I can do that.

>  * I don't see why the changes here can't be made configurable (and perhaps you'd argue they should be on by default) via the ci-config phase.

I do see why. If my goal is to unhide the logs by default, so that new contributors can find them more easily, I will not hide that new behavior behind a hard-to-find config option, an option that new contributors are even less likely to find. That would be highly counterproductive. If your goal is to help new contributors, I am certain that you will agree.

Ciao,
Johannes
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I do not have hard data, either, except for one: apart from you and Junio, I have yet to talk to any contributor who said "oh yeah, I found the logs right away" rather than things like "when I finally figured out that the logs were in `print-test-failures`, I had a chance to make sense of the failures" or even "zOMG _that_ is where the logs are?". And let me tell you that I heard this from a lot of people. Way more than a mere two. Far, far more.

Stop counting me there. I didn't find the logs right away, and I already said that it is a good idea to eliminate the need to open the thing other than the one that opens by default.
On Wed, Mar 09 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 7 Mar 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Mar 07 2022, Johannes Schindelin wrote:
>>
>> > On Wed, 2 Mar 2022, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> On Tue, Mar 01 2022, Johannes Schindelin via GitGitGadget wrote:
>> >>
>> >> > Changes since v1:
>> >> >
>> >> >  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
>> >> >  * The comment about set -x has been adjusted because it no longer applies as-is.
>> >> >  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.
>> >>
>> >> Just briefly: Much of the feedback I had on v1 is still unanswered,
>> >
>> > Yes, because I got the sense that your feedback ignores the goal of making it easier to diagnose test failures.
>>
>> I think that's a rather strange conclusion given that I've submitted a parallel series that makes some of those failures easier to diagnose than the same changes in this series. I.e. the failures in build vs. test phases, not the individual test format output (but those are orthogonal).
>
> I do not know what your parallel series implements, as I did not have the time to read it yet (and it contains about double the number of patches of my series, hopefully not intended to make it impossible for me to spare the time to even take a glimpse at it).

No, I'm not arranging patches in such a way as to make them harder for you to review specifically. I thought those changes made sense as a logical progression.

> Also: I thought we had the rule of trying to be mindful of other contributors and avoid interfering with patch series that are in flight? It could be viewed as unnecessarily adversarial.

You don't need to look at the whole thing, but in https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@gmail.com/ scroll down to "The following compares CI output" and compare:

    master: https://github.com/avar/git/runs/5274251909?check_suite_focus=true
    this:   https://github.com/avar/git/runs/5274464670?check_suite_focus=true
    js:     https://github.com/avar/git/runs/5272239403?check_suite_focus=true

I.e. for the build vs. test "grouping" you're doing early in your series we can get the same with a significantly negative instead of positive diffstat to .github & ci/, and it frees up the "nested groups" that you note as a limitation in your 4/9 with another potential group level (your 4/9: https://lore.kernel.org/git/9333ba781b8240f704e739b00d274f8c3d887e39.1643050574.git.gitgitgadget@gmail.com/).

So it's not meant to be adversarial, but to help out this topic at large by showing that a constraint you ran up against is something we don't need to be limited by, and it makes that part of the CI output better. I think posting working code to demonstrate that we can indeed do that is the most productive thing to do, talk being cheap & all that.

So yes, I agree that in general it's better to avoid conflicting topics etc., but in a case where a topic proposes to add a significant amount of code & complexity to get to some desired end-state, it makes sense to demonstrate with a patch or patches that we can get to the same end-state in some simpler way.

>> I think it's a fair summary of our differences that we're just placing different values on UX responsiveness.
>> I'm pretty sure there's some amount of UX slowdown you'd consider unacceptable, no matter how much the output was improved.
>>
>> Clearly we just use it differently.
>
> I would gladly trade my convenience in return for making it easier for others to diagnose why their PR builds fail.
>
> At the moment, the way our CI/PR builds present test failures very likely makes every new contributor feel stupid for not knowing how to proceed. But they are not stupid. The failure is not theirs. The fault lies squarely with us, for making it so frigging hard.

I agree with you that it's relatively bad & could be improved. I don't have much issue with the end result we're left with once the page actually loads at the end of this series, just the practicalities of how slow the resulting UX is.

>> >> or in the case of the performance issues I think just saying that this output is aimed at non-long-time-contributors doesn't really justify the large observed slowdowns.
>> >
>> > What good is a quickly-loading web site when it is less than useful?
>>
>> For all the flaws in the current output there are cases now where you can click on a failure, see a summary of the 1-2 tests that failed, and even find your way through the right place in the rather verbose raw log output in 1/4 or 1/2 the time it takes the initial page on the new UX to load.
>
> I wonder where the hard data is that backs up these numbers.

I've posted some in several replies to this series, e.g. https://lore.kernel.org/git/220222.86tucr6kz5.gmgdl@evledraar.gmail.com/; have you tried to reproduce some of those?

I.e. the "hard data" is that it usually takes me 10-20 seconds to go from a CI link to the summary output & opening the "raw dump" view now, and the same page is taking about a minute to just load with the new UX.

> [...]
>> > I'd much rather have a slow-loading web site that gets me to where I need to be than a fast-loading one that leaves me as puzzled as before.
>>
>> I think it's clear that we're going to disagree on this point, but I'd still think that:
>>
>>  * In a re-roll, you should amend these patches to clearly note that's a UX trade-off you're making, perhaps with rough before/after timings similar to the ones I've posted.
>>
>>    I.e. now those patches say nothing about the UX change resulting in UX that's *much* slower than before. Clearly noting that trade-off for reviewers is not the same as saying the trade-off can't be made.
>
> Sure, I can do that.
>
>>  * I don't see why the changes here can't be made configurable (and perhaps you'd argue they should be on by default) via the ci-config phase.
>
> I do see why. If my goal is to unhide the logs by default, so that new contributors can find them more easily, I will not hide that new behavior behind a hard-to-find config option, an option that new contributors are even less likely to find. That would be highly counterproductive. If your goal is to help new contributors, I am certain that you will agree.

I meant that they could be on by default, but that this would relatively easily give us an out to A/B test the performance of the new fancy rendering vs. the dumb raw dump we have now. If that's done via CI config it's a rather trivial matter of e.g. re-pushing "master" somewhere, whereas if it needs a patch/revert on top...
Johannes Schindelin via GitGitGadget wrote: > Changes since v1: > > * In the patch that removed MAKE_TARGETS, a stale comment about that > variable is also removed. > * The comment about set -x has been adjusted because it no longer applies > as-is. > * The commit message of "ci: make it easier to find failed tests' logs in > the GitHub workflow" has been adjusted to motivate the improvement > better. > > > Background > ========== > > Recent patches intended to help readers figure out CI failures much quicker > than before. Unfortunately, they haven't been entirely positive for me. For > example, they broke the branch protections in Microsoft's fork of Git, where > we require Pull Requests to pass a certain set of Checks (which are > identified by their names) and therefore caused follow-up work. > > Using CI and in general making it easier for new contributors is an area I'm > passionate about, and one I'd like to see improved. > > > The current situation > ===================== > > Let me walk you through the current experience when a PR build fails: I get > a notification mail that only says that a certain job failed. There's no > indication of which test failed (or was it the build?). I can click on a > link at it takes me to the workflow run. Once there, all it says is "Process > completed with exit code 1", or even "code 2". Sure, I can click on one of > the failed jobs. It even expands the failed step's log (collapsing the other > steps). And what do I see there? > > Let's look at an example of a failed linux-clang (ubuntu-latest) job > [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]: > > [...] > Test Summary Report > ------------------- > t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1) > Failed test: 49 > Non-zero exit status: 1 > t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0) > TODO passed: 45, 47 > Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU) > Result: FAIL > make[1]: *** [Makefile:53: prove] Error 1 > make[1]: Leaving directory '/home/runner/work/git/git/t' > make: *** [Makefile:3018: test] Error 2 > > > That's it. I count myself lucky not to be a new contributor being faced with > something like this. > > Now, since I am active in the Git project for a couple of days or so, I can > make sense of the "TODO passed" label and know that for the purpose of > fixing the build failures, I need to ignore this, and that I need to focus > on the "Failed test" part instead. > > I also know that I do not have to get myself an ubuntu-latest box just to > reproduce the error, I do not even have to check out the code and run it > just to learn what that "49" means. > > I know, and I do not expect any new contributor, not even most seasoned > contributors to know, that I have to patiently collapse the "Run > ci/run-build-and-tests.sh" job's log, and instead expand the "Run > ci/print-test-failures.sh" job log (which did not fail and hence does not > draw any attention to it). > You can count me as one of the ones that, until this series, had no idea this information existed in the CI logs. And it definitely would have been helpful in figuring out some of the OS-specific bugs I've run into in the past. :) > I know, and again: I do not expect many others to know this, that I then > have to click into the "Search logs" box (not the regular web browser's > search via Ctrl+F!) 
and type in "not ok" to find the log of the failed test > case (and this might still be a "known broken" one that is marked via > test_expect_failure and once again needs to be ignored). > > To be excessively clear: This is not a great experience! > > > Improved output > =============== > > Our previous Azure Pipelines-based CI builds had a much nicer UI, one that > even showed flaky tests, and trends e.g. how long the test cases ran. When I > ported Git's CI over to GitHub workflows (to make CI more accessible to new > contributors), I knew fully well that we would leave this very nice UI > behind, and I had hoped that we would get something similar back via new, > community-contributed GitHub Actions that can be used in GitHub workflows. > However, most likely because we use a home-grown test framework implemented > in opinionated POSIX shells scripts, that did not happen. > > So I had a look at what standards exist e.g. when testing PowerShell > modules, in the way of marking up their test output in GitHub workflows, and > I was not disappointed: GitHub workflows support "grouping" of output lines, > i.e. marking sections of the output as a group that is then collapsed by > default and can be expanded. And it is this feature I decided to use in this > patch series, along with GitHub workflows' commands to display errors or > notices that are also shown on the summary page of the workflow run. Now, in > addition to "Process completed with exit code" on the summary page, we also > read something like: > > ⊗ linux-gcc (ubuntu-latest) > failed: t9800.20 submit from detached head > > > Even better, this message is a link, and following that, the reader is > presented with something like this > [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]: > > ⏵ Run ci/run-build-and-tests.sh > ⏵ CI setup > + ln -s /home/runner/none/.prove t/.prove > + run_tests=t > + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main > + group Build make > + set +x > ⏵ Build > ⏵ Run tests > === Failed test: t9800-git-p4-basic === > ⏵ ok: t9800.1 start p4d > ⏵ ok: t9800.2 add p4 files > ⏵ ok: t9800.3 basic git p4 clone > ⏵ ok: t9800.4 depot typo error > ⏵ ok: t9800.5 git p4 clone @all > ⏵ ok: t9800.6 git p4 sync uninitialized repo > ⏵ ok: t9800.7 git p4 sync new branch > ⏵ ok: t9800.8 clone two dirs > ⏵ ok: t9800.9 clone two dirs, @all > ⏵ ok: t9800.10 clone two dirs, @all, conflicting files > ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit > ⏵ ok: t9800.12 clone using non-numeric revision ranges > ⏵ ok: t9800.13 clone with date range, excluding some changes > ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output > ⏵ ok: t9800.15 exit gracefully for p4 server errors > ⏵ ok: t9800.16 clone --bare should make a bare repository > ⏵ ok: t9800.17 initial import time from top change time > ⏵ ok: t9800.18 unresolvable host in P4PORT should display error > ⏵ ok: t9800.19 run hook p4-pre-submit before submit > Error: failed: t9800.20 submit from detached head > ⏵ failure: t9800.20 submit from detached head > Error: failed: t9800.21 submit from worktree > ⏵ failure: t9800.21 submit from worktree > === Failed test: t9801-git-p4-branch === > [...] > > > The "Failed test:" lines are colored in yellow to give a better visual clue > about the logs' structure, the "Error:" label is colored in red to draw the > attention to the important part of the log, and the "⏵" characters indicate > that part of the log is collapsed and can be expanded by clicking on it. 
> > To drill down, the reader merely needs to expand the (failed) test case's > log by clicking on it, and then study the log. If needed (e.g. when the test > case relies on side effects from previous test cases), the logs of preceding > test cases can be expanded as well. In this example, when expanding > t9800.20, it looks like this (for ease of reading, I cut a few chunks of > lines, indicated by "[...]"): > > [...] > ⏵ ok: t9800.19 run hook p4-pre-submit before submit > Error: failed: t9800.20 submit from detached head > ⏷ failure: t9800.20 submit from detached head > test_when_finished cleanup_git && > git p4 clone --dest="$git" //depot && > ( > cd "$git" && > git checkout p4/master && > >detached_head_test && > git add detached_head_test && > git commit -m "add detached_head" && > git config git-p4.skipSubmitEdit true && > git p4 submit && > git p4 rebase && > git log p4/master | grep detached_head > ) > [...] > Depot paths: //depot/ > Import destination: refs/remotes/p4/master > > Importing revision 9 (100%)Perforce db files in '.' will be created if missing... > Perforce db files in '.' will be created if missing... > > Traceback (most recent call last): > File "/home/runner/work/git/git/git-p4", line 4455, in <module> > main() > File "/home/runner/work/git/git/git-p4", line 4449, in main > if not cmd.run(args): > File "/home/runner/work/git/git/git-p4", line 2590, in run > rebase.rebase() > File "/home/runner/work/git/git/git-p4", line 4121, in rebase > if len(read_pipe("git diff-index HEAD --")) > 0: > File "/home/runner/work/git/git/git-p4", line 297, in read_pipe > retcode, out, err = read_pipe_full(c, *k, **kw) > File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full > p = subprocess.Popen( > File "/usr/lib/python3.8/subprocess.py", line 858, in __init__ > self._execute_child(args, executable, preexec_fn, close_fds, > File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child > raise child_exception_type(errno_num, err_msg, err_filename) > FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --' > error: last command exited with $?=1 > + cleanup_git > + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + nr_tries_left=60 > + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + test 1 -ne 1 > + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + nr_tries_left=60 > + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git > + exit 1 > + eval_ret=1 > + : > not ok 20 - submit from detached head > # > # test_when_finished cleanup_git && > # git p4 clone --dest="$git" //depot && > # ( > # cd "$git" && > # git checkout p4/master && > # >detached_head_test && > # git add detached_head_test && > # git commit -m "add detached_head" && > # git config git-p4.skipSubmitEdit true && > # git p4 submit && > # git p4 rebase && > # git log p4/master | grep detached_head > # ) > # > Error: failed: t9800.21 submit from worktree > [...] > > > Is this the best UI we can have for test failures in CI runs? I hope we can > do better. Having said that, this patch series presents a pretty good start, > and offers a basis for future improvements. 
> I think these are really valuable improvements over our current state, but I also understand the concerns about performance elsewhere in this thread (it's really slow to load for me as well, and scrolling/expanding the log groups can be a bit glitchy in my browser). That said, I think there are a couple ways you could improve the load time without sacrificing the (very helpful) improvements you've made to error log visibility. I experimented a bit (example result [1]) and came up with some things that may help: * group errors by test file, rather than by test case (to reduce parsing/rendering time for lots of groups). * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests). * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means). If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere. [1] https://github.com/vdye/git/runs/5666973267 > Johannes Schindelin (9): > ci: fix code style > ci/run-build-and-tests: take a more high-level view > ci: make it easier to find failed tests' logs in the GitHub workflow > ci/run-build-and-tests: add some structure to the GitHub workflow > output > tests: refactor --write-junit-xml code > test(junit): avoid line feeds in XML attributes > ci: optionally mark up output in the GitHub workflow > ci: use `--github-workflow-markup` in the GitHub workflow > ci: call `finalize_test_case_output` a little later > The organization of these commits makes the series a bit confusing to read, mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to be dependent on patches 1-4, so they could be moved to the beginning (after patch 1). With that change, I think this series would flow more smoothly: "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX improvements". > .github/workflows/main.yml | 12 --- > ci/lib.sh | 82 +++++++++++++++-- > ci/run-build-and-tests.sh | 14 +-- > ci/run-test-slice.sh | 5 +- > t/test-lib-functions.sh | 4 +- > t/test-lib-github-workflow-markup.sh | 50 ++++++++++ > t/test-lib-junit.sh | 132 +++++++++++++++++++++++++++ > t/test-lib.sh | 128 ++++---------------------- > 8 files changed, 288 insertions(+), 139 deletions(-) > create mode 100644 t/test-lib-github-workflow-markup.sh > create mode 100644 t/test-lib-junit.sh > > > base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/1117 > > Range-diff vs v1: > > 1: db08b07c37a = 1: db08b07c37a ci: fix code style > 2: d2ff51bb5da ! 2: 42ff3e170bf ci/run-build-and-tests: take a more high-level view > @@ ci/run-build-and-tests.sh: pedantic) > ;; > esac > > - # Any new "test" targets should not go after this "make", but should > - # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will > - # start running tests. 
> +-# Any new "test" targets should not go after this "make", but should > +-# adjust $MAKE_TARGETS. Otherwise compilation-only targets above will > +-# start running tests. > -make $MAKE_TARGETS > +make > +if test -n "$run_tests" > 3: 98891b0d3f7 ! 3: bbbe1623257 ci: make it easier to find failed tests' logs in the GitHub workflow > @@ Metadata > ## Commit message ## > ci: make it easier to find failed tests' logs in the GitHub workflow > > + When investigating a test failure, the time that matters most is the > + time it takes from getting aware of the failure to displaying the output > + of the failing test case. > + > You currently have to know a lot of implementation details when > investigating test failures in the CI runs. The first step is easy: the > failed job is marked quite clearly, but when opening it, the failed step > @@ Commit message > The actually interesting part is in the detailed log of said failed > test script. But that log is shown in the CI run's step that runs > `ci/print-test-failures.sh`. And that step is _not_ expanded in the web > - UI by default. > + UI by default. It is even marked as "successful", which makes it very > + easy to miss that there is useful information hidden in there. > > Let's help the reader by showing the failed tests' detailed logs in the > step that is expanded automatically, i.e. directly after the test suite > 4: 9333ba781b8 ! 4: f72254a9ac6 ci/run-build-and-tests: add some structure to the GitHub workflow output > @@ ci/lib.sh > + > +# Set 'exit on error' for all CI scripts to let the caller know that > +# something went wrong. > -+# Set tracing executed commands, primarily setting environment variables > -+# and installing dependencies. > ++# > ++# We already enabled tracing executed commands earlier. This helps by showing > ++# how # environment variables are set and and dependencies are installed. > +set -e > + > skip_branch_tip_with_tag () { > @@ ci/lib.sh: linux-leaks) > +set -x > > ## ci/run-build-and-tests.sh ## > -@@ ci/run-build-and-tests.sh: esac > - # Any new "test" targets should not go after this "make", but should > - # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will > - # start running tests. > +@@ ci/run-build-and-tests.sh: pedantic) > + ;; > + esac > + > -make > +group Build make > if test -n "$run_tests" > 5: 94dcbe1bc43 = 5: 9eda6574313 tests: refactor --write-junit-xml code > 6: 41230100091 = 6: c8b240af749 test(junit): avoid line feeds in XML attributes > 7: 98b32630fcd = 7: 15f199e810e ci: optionally mark up output in the GitHub workflow > 8: 1a6bd1846bc = 8: 91ea54f36c5 ci: use `--github-workflow-markup` in the GitHub workflow > 9: 992b1575889 = 9: be2a83f5da3 ci: call `finalize_test_case_output` a little later >
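(To make the second suggestion in Victoria's list above, verbose logs only for failed test cases, concrete: a minimal sketch in shell. The hook name finalize_test_case_output comes from this series, but its arguments and body here are assumptions for illustration, not her actual change:

    # Hypothetical test-lib hook body: only a failed case emits a GitHub
    # group with its verbose log, so passing cases contribute almost
    # nothing to the output the browser has to render.
    finalize_test_case_output () {
        result=$1   # e.g. "ok" or "failure"
        title=$2    # the test title only ('$1'-style, not '$*')
        case "$result" in
        failure)
            echo "::group::failed: $this_test.$test_count $title"
            cat "$TEST_CASE_LOG"   # assumed per-case verbose log file
            echo "::endgroup::"
            ;;
        esac
    }

$this_test and $test_count exist in test-lib.sh; $TEST_CASE_LOG is made up for this sketch.)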
On Thu, Mar 24 2022, Victoria Dye wrote:

> Johannes Schindelin via GitGitGadget wrote:
> [...]
>> Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements.
>
> I think these are really valuable improvements over our current state, but I also understand the concerns about performance elsewhere in this thread (it's really slow to load for me as well, and scrolling/expanding the log groups can be a bit glitchy in my browser). That said, I think there are a couple ways you could improve the load time without sacrificing the (very helpful) improvements you've made to error log visibility. I experimented a bit (example result [1]) and came up with some things that may help:
>
>  * group errors by test file, rather than by test case (to reduce parsing/rendering time for lots of groups).
>  * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests).
>  * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means).
>
> If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere.
>
> [1] https://github.com/vdye/git/runs/5666973267

Thanks a lot for trying to address those concerns.

I took a look at this and it definitely performs better, although in this case the overall output is ~3k lines. I'd be curious to see how it performs on some of the cases discussed in earlier threads of >~50k lines, although it looks like in this case that would require failures to be really widespread in the test suite.

I just looked at this briefly, but looking at the branch I see you removed the "checking known breakage of[...]" etc. from the non-GitHub markdown output; I didn't spot how that was related/needed.

>> Johannes Schindelin (9):
>>   ci: fix code style
>>   ci/run-build-and-tests: take a more high-level view
>>   ci: make it easier to find failed tests' logs in the GitHub workflow
>>   ci/run-build-and-tests: add some structure to the GitHub workflow output
>>   tests: refactor --write-junit-xml code
>>   test(junit): avoid line feeds in XML attributes
>>   ci: optionally mark up output in the GitHub workflow
>>   ci: use `--github-workflow-markup` in the GitHub workflow
>>   ci: call `finalize_test_case_output` a little later
>
> The organization of these commits makes the series a bit confusing to read, mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to be dependent on patches 1-4, so they could be moved to the beginning (after patch 1). With that change, I think this series would flow more smoothly: "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX improvements".

Have you had a chance to look at the approach my suggested alternative to the early part of this series takes?
https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@gmail.com/

I.e. to not build up ci/lib.sh to know to group the "build" etc.
within the "run-build-and-test" step, but instead just to pull those to the top-level by running separate build & test steps.
Ævar Arnfjörð Bjarmason wrote: > > On Thu, Mar 24 2022, Victoria Dye wrote: > >> Johannes Schindelin via GitGitGadget wrote: >> [...] >>> Is this the best UI we can have for test failures in CI runs? I hope we can >>> do better. Having said that, this patch series presents a pretty good start, >>> and offers a basis for future improvements. >>> >> >> I think these are really valuable improvements over our current state, but I >> also understand the concerns about performance elsewhere in this thread >> (it's really slow to load for me as well, and scrolling/expanding the log >> groups can be a bit glitchy in my browser). That said, I think there are a >> couple ways you could improve the load time without sacrificing the (very >> helpful) improvements you've made to error log visibility. I experimented a >> bit (example result [1]) and came up with some things that may help: >> >> * group errors by test file, rather than by test case (to reduce >> parsing/rendering time for lots of groups). >> * print the verbose logs only for the failed test cases (to massively cut >> down on the size of the log, particularly when there's only a couple >> failures in a test file with a lot of passing tests). >> * skip printing the full text of the test in 'finalize_test_case_output' >> when creating the group, i.e., use '$1' instead of '$*' (in both passing >> and failing tests, this information is already printed via some other >> means). >> >> If you wanted to make sure a user could still access the full failure logs >> (i.e., including the "ok" test results), you could print a link to the >> artifacts page as well - that way, all of the information we currently >> provide to users can still be found somewhere. >> >> [1] https://github.com/vdye/git/runs/5666973267 > > Thanks a lot for trying to address those concerns. > > I took a look at this and it definitely performs better, although in > this case the overall output is ~3k lines. > > I'd be curious to see how it performs on some of the cases discussed in > earlier threads of >~50k lines, although it looks like in this case that > would require failures to be really widespread in the test suite. > Unfortunately, I don't have a direct comparison to that (the longest I found elsewhere in the thread was ~33k lines [1], but those failures came from strange interactions on the 'shears/seen' branch of Git for Windows that I couldn't easily replicate). If it helps, though, here's a 1:1 comparison of my "experiment" branch's forced test failures with and without the optimizations I tried (without optimization, the total log is ~28k lines): without optimization: https://github.com/vdye/git/runs/5696305589 with optimization: https://github.com/vdye/git/runs/5666973267 So it's definitely faster - it still takes a couple seconds to load, but not so long that my browser struggles with it (which was my main issue with the original approach). [1] https://github.com/dscho/git/runs/4840190622 > I just looked at this briefly, but looking at the branch I see you > removed the "checking known breakage of[...]" etc. from the non-GitHub > markdown output, I didn't spot how that was related/needed. > It was mostly just another attempt to cut down on extraneous output (since, if a test fails, the test definition is printed after the failure, so we would end up with the same information twice). 
That said, if that were to be incorporated here, it'd need to be smarter than what I tried - my change removed it entirely from the '.out' logs, and it means that any test that *does* pass wouldn't have its test definition logged anywhere (I think). The ideal situation would be the extraneous test definition is only removed from the '.markup' files, but that's probably a change better saved for a future patch/series. >>> Johannes Schindelin (9): >>> ci: fix code style >>> ci/run-build-and-tests: take a more high-level view >>> ci: make it easier to find failed tests' logs in the GitHub workflow >>> ci/run-build-and-tests: add some structure to the GitHub workflow >>> output >>> tests: refactor --write-junit-xml code >>> test(junit): avoid line feeds in XML attributes >>> ci: optionally mark up output in the GitHub workflow >>> ci: use `--github-workflow-markup` in the GitHub workflow >>> ci: call `finalize_test_case_output` a little later >>> >> >> The organization of these commits makes the series a bit confusing to read, >> mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to >> be dependent on patches 1-4, so they could be moved to the beginning (after >> patch 1). With that change, I think this series would flow more smoothly: >> "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX >> improvements". > > Have you had a change to look at the approach my suggestion of an > alternate approach to the early part of this series takes?: > https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@gmail.com/ > > I.e. to not build up ci/lib.sh to know to group the "build" etc. within > the "run-build-and-test" step, but instead just to pull those to the > top-level by running separate build & test steps. > I looked at it a while ago, but I actually had a similar issue following that series as I did this one; it's difficult to tell what's cleanup, what's refactoring unrelated to this series, and what's an explicit difference in approach compared with this series. Revisiting it now, I did the same thing I did with dscho's series: ran your branch with some forced test failures and looked at the results [2]. Based on that, there are a couple of helpful things I see in your series that contribute to the same overarching goal as this dscho's: * Separating build & test into different steps. * This makes it more immediately obvious to a user whether the issue was a compiler error or a test failure. Since test failures can only even happen if the compilation passes, this doesn't create (another) situation where the relevant failure information is in a different step than the auto-expanded failing one. * Separating 'lib.sh --build' and 'make' into different steps. * I was initially unsure of the value of this (conceptually, wouldn't they both be part of "build"?), but I eventually understood it to be "setup the environment for [build|test]" followed by "run the [build|test]". Since the main thing dscho's series is addressing is information visibility, I like that this similarly "unburies" the environment configuration at the beginning of build/test. Those changes are great (and they probably have some positive impact on load times). But as far as I can tell, nothing else in your series directly addresses the main problem dscho is fixing here, which is that the verbose failure logs are effectively hidden from the user (unless you know exactly where to look). As a result, it doesn't really fit as a "replacement" to this one for me. 
Honestly, my ideal "final form" of all of this may be a combination of both series, having the CI steps:

- setup build environment
- run build (make)
- setup test environment
- run test (make test) & print failure logs

You can still pull the 'make' executions out of 'run-build-and-test.sh', but I think the "& print failure logs" part of the 'test' step (i.e., the added '|| handle_failed_tests'; see the sketch after this message) is the critical piece: although it would slow things down to some extent (and, of course, it's subjective where the "too slow" line is), it would make relevant failure information a whole lot more accessible. That's the real "value-add" of this series for me, if only because I know it would have helped me a bunch of times in the past - I absolutely believe it would similarly help new contributors in the future.

[2] https://github.com/vdye/git/runs/5695895629
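To make that combined 'test' step concrete, here is a minimal shell sketch. The `group` helper and `$run_tests` appear in the series' range-diff further down in this thread, and `handle_failed_tests` is the name used in the discussion above; treat the exact wiring as an assumption, not the series' actual code:

    # build and test as separate, visible phases; if the tests fail,
    # print the failed tests' logs in this same (failing) step
    group Build make
    if test -n "$run_tests"
    then
        group "Run tests" make test ||
        handle_failed_tests
    fi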
Hi Victoria,

[sorry for the long delay, I was being pulled in too many directions, but am incredibly grateful for your fantastic investigation.]

On Thu, 24 Mar 2022, Victoria Dye wrote:

> Johannes Schindelin via GitGitGadget wrote:
> > Changes since v1:
> >
> >  * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
> >  * The comment about set -x has been adjusted because it no longer applies as-is.
> >  * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.
> >
> >
> > Background
> > ==========
> >
> > Recent patches intended to help readers figure out CI failures much quicker than before. Unfortunately, they haven't been entirely positive for me. For example, they broke the branch protections in Microsoft's fork of Git, where we require Pull Requests to pass a certain set of Checks (which are identified by their names) and therefore caused follow-up work.
> >
> > Using CI and in general making it easier for new contributors is an area I'm passionate about, and one I'd like to see improved.
> >
> >
> > The current situation
> > =====================
> >
> > Let me walk you through the current experience when a PR build fails: I get a notification mail that only says that a certain job failed. There's no indication of which test failed (or was it the build?). I can click on a link and it takes me to the workflow run. Once there, all it says is "Process completed with exit code 1", or even "code 2". Sure, I can click on one of the failed jobs. It even expands the failed step's log (collapsing the other steps). And what do I see there?
> >
> > Let's look at an example of a failed linux-clang (ubuntu-latest) job [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
> >
> >  [...]
> >  Test Summary Report
> >  -------------------
> >  t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1)
> >    Failed test: 49
> >    Non-zero exit status: 1
> >  t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0)
> >    TODO passed: 45, 47
> >  Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> >  Result: FAIL
> >  make[1]: *** [Makefile:53: prove] Error 1
> >  make[1]: Leaving directory '/home/runner/work/git/git/t'
> >  make: *** [Makefile:3018: test] Error 2
> >
> > That's it. I count myself lucky not to be a new contributor being faced with something like this.
> >
> > Now, since I am active in the Git project for a couple of days or so, I can make sense of the "TODO passed" label and know that for the purpose of fixing the build failures, I need to ignore this, and that I need to focus on the "Failed test" part instead.
> >
> > I also know that I do not have to get myself an ubuntu-latest box just to reproduce the error, I do not even have to check out the code and run it just to learn what that "49" means.
> >
> > I know, and I do not expect any new contributor, not even most seasoned contributors to know, that I have to patiently collapse the "Run ci/run-build-and-tests.sh" job's log, and instead expand the "Run ci/print-test-failures.sh" job log (which did not fail and hence does not draw any attention to it).
>
> You can count me as one of the ones that, until this series, had no idea this information existed in the CI logs.
> And it definitely would have been helpful in figuring out some of the OS-specific bugs I've run into in the past. :)

Excellent. I think I have enough accounts to consider it a universal truth that our CI output _needs_ something like this here patch series.

> > I know, and again: I do not expect many others to know this, that I then have to click into the "Search logs" box (not the regular web browser's search via Ctrl+F!) and type in "not ok" to find the log of the failed test case (and this might still be a "known broken" one that is marked via test_expect_failure and once again needs to be ignored).
> >
> > To be excessively clear: This is not a great experience!
> >
> >
> > Improved output
> > ===============
> >
> > Our previous Azure Pipelines-based CI builds had a much nicer UI, one that even showed flaky tests, and trends e.g. how long the test cases ran. When I ported Git's CI over to GitHub workflows (to make CI more accessible to new contributors), I knew full well that we would leave this very nice UI behind, and I had hoped that we would get something similar back via new, community-contributed GitHub Actions that can be used in GitHub workflows. However, most likely because we use a home-grown test framework implemented in opinionated POSIX shell scripts, that did not happen.
> >
> > So I had a look at what standards exist e.g. when testing PowerShell modules, in the way of marking up their test output in GitHub workflows, and I was not disappointed: GitHub workflows support "grouping" of output lines, i.e. marking sections of the output as a group that is then collapsed by default and can be expanded. And it is this feature I decided to use in this patch series, along with GitHub workflows' commands to display errors or notices that are also shown on the summary page of the workflow run.
> > Now, in addition to "Process completed with exit code" on the summary page, we also read something like:
> >
> >  ⊗ linux-gcc (ubuntu-latest)
> >    failed: t9800.20 submit from detached head
> >
> > Even better, this message is a link, and following that, the reader is presented with something like this [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:
> >
> >  ⏵ Run ci/run-build-and-tests.sh
> >  ⏵ CI setup
> >  + ln -s /home/runner/none/.prove t/.prove
> >  + run_tests=t
> >  + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> >  + group Build make
> >  + set +x
> >  ⏵ Build
> >  ⏵ Run tests
> >  === Failed test: t9800-git-p4-basic ===
> >  ⏵ ok: t9800.1 start p4d
> >  ⏵ ok: t9800.2 add p4 files
> >  ⏵ ok: t9800.3 basic git p4 clone
> >  ⏵ ok: t9800.4 depot typo error
> >  ⏵ ok: t9800.5 git p4 clone @all
> >  ⏵ ok: t9800.6 git p4 sync uninitialized repo
> >  ⏵ ok: t9800.7 git p4 sync new branch
> >  ⏵ ok: t9800.8 clone two dirs
> >  ⏵ ok: t9800.9 clone two dirs, @all
> >  ⏵ ok: t9800.10 clone two dirs, @all, conflicting files
> >  ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit
> >  ⏵ ok: t9800.12 clone using non-numeric revision ranges
> >  ⏵ ok: t9800.13 clone with date range, excluding some changes
> >  ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
> >  ⏵ ok: t9800.15 exit gracefully for p4 server errors
> >  ⏵ ok: t9800.16 clone --bare should make a bare repository
> >  ⏵ ok: t9800.17 initial import time from top change time
> >  ⏵ ok: t9800.18 unresolvable host in P4PORT should display error
> >  ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> >  Error: failed: t9800.20 submit from detached head
> >  ⏵ failure: t9800.20 submit from detached head
> >  Error: failed: t9800.21 submit from worktree
> >  ⏵ failure: t9800.21 submit from worktree
> >  === Failed test: t9801-git-p4-branch ===
> >  [...]
> >
> > The "Failed test:" lines are colored in yellow to give a better visual clue about the logs' structure, the "Error:" label is colored in red to draw the attention to the important part of the log, and the "⏵" characters indicate that part of the log is collapsed and can be expanded by clicking on it.
> >
> > To drill down, the reader merely needs to expand the (failed) test case's log by clicking on it, and then study the log. If needed (e.g. when the test case relies on side effects from previous test cases), the logs of preceding test cases can be expanded as well. In this example, when expanding t9800.20, it looks like this (for ease of reading, I cut a few chunks of lines, indicated by "[...]"):
> >
> >  [...]
> >  ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> >  Error: failed: t9800.20 submit from detached head
> >  ⏷ failure: t9800.20 submit from detached head
> >    test_when_finished cleanup_git &&
> >    git p4 clone --dest="$git" //depot &&
> >    (
> >      cd "$git" &&
> >      git checkout p4/master &&
> >      >detached_head_test &&
> >      git add detached_head_test &&
> >      git commit -m "add detached_head" &&
> >      git config git-p4.skipSubmitEdit true &&
> >      git p4 submit &&
> >      git p4 rebase &&
> >      git log p4/master | grep detached_head
> >    )
> >  [...]
> >  Depot paths: //depot/
> >  Import destination: refs/remotes/p4/master
> >
> >  Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
> >  Perforce db files in '.' will be created if missing...
> >
> >  Traceback (most recent call last):
> >    File "/home/runner/work/git/git/git-p4", line 4455, in <module>
> >      main()
> >    File "/home/runner/work/git/git/git-p4", line 4449, in main
> >      if not cmd.run(args):
> >    File "/home/runner/work/git/git/git-p4", line 2590, in run
> >      rebase.rebase()
> >    File "/home/runner/work/git/git/git-p4", line 4121, in rebase
> >      if len(read_pipe("git diff-index HEAD --")) > 0:
> >    File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
> >      retcode, out, err = read_pipe_full(c, *k, **kw)
> >    File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
> >      p = subprocess.Popen(
> >    File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
> >      self._execute_child(args, executable, preexec_fn, close_fds,
> >    File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
> >      raise child_exception_type(errno_num, err_msg, err_filename)
> >  FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
> >  error: last command exited with $?=1
> >  + cleanup_git
> >  + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + nr_tries_left=60
> >  + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + test 1 -ne 1
> >  + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + nr_tries_left=60
> >  + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> >  + exit 1
> >  + eval_ret=1
> >  + :
> >  not ok 20 - submit from detached head
> >  #
> >  #	test_when_finished cleanup_git &&
> >  #	git p4 clone --dest="$git" //depot &&
> >  #	(
> >  #		cd "$git" &&
> >  #		git checkout p4/master &&
> >  #		>detached_head_test &&
> >  #		git add detached_head_test &&
> >  #		git commit -m "add detached_head" &&
> >  #		git config git-p4.skipSubmitEdit true &&
> >  #		git p4 submit &&
> >  #		git p4 rebase &&
> >  #		git log p4/master | grep detached_head
> >  #	)
> >  #
> >  Error: failed: t9800.21 submit from worktree
> >  [...]
> >
> > Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements.
>
> I think these are really valuable improvements over our current state, but I also understand the concerns about performance elsewhere in this thread (it's really slow to load for me as well, and scrolling/expanding the log groups can be a bit glitchy in my browser). That said, I think there are a couple ways you could improve the load time without sacrificing the (very helpful) improvements you've made to error log visibility. I experimented a bit (example result [1]) and came up with some things that may help:
>
> * group errors by test file, rather than by test case (to reduce parsing/rendering time for lots of groups).

I really would like to avoid that, based on my past experience with diagnosing test failures. It is definitely helpful if the structure lets the reader expand individual test cases.

> * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests).

That's an amazingly simple trick to improve the speed by a ton, indeed. Thank you for this splendid idea!
> * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means).
>
> If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere.

That's a good point; I added that hint to the output (the link is unfortunately not available at the time we print that advice).

> [1] https://github.com/vdye/git/runs/5666973267
>
> > Johannes Schindelin (9):
> >   ci: fix code style
> >   ci/run-build-and-tests: take a more high-level view
> >   ci: make it easier to find failed tests' logs in the GitHub workflow
> >   ci/run-build-and-tests: add some structure to the GitHub workflow
> >     output
> >   tests: refactor --write-junit-xml code
> >   test(junit): avoid line feeds in XML attributes
> >   ci: optionally mark up output in the GitHub workflow
> >   ci: use `--github-workflow-markup` in the GitHub workflow
> >   ci: call `finalize_test_case_output` a little later
> >
>
> The organization of these commits makes the series a bit confusing to read, mainly due to the JUnit changes in the middle. Patches 5-6 don't appear to be dependent on patches 1-4, so they could be moved to the beginning (after patch 1). With that change, I think this series would flow more smoothly: "Cleanup/non-functional change" -> "JUnit XML improvements" -> "Log UX improvements".

Great feedback! I changed the order as suggested.

Again, thank you so much for helping me improve the user experience of Git's CI/PR builds.

Ciao,
Dscho
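The effect of Victoria's "verbose logs only for failures" trick can be sketched as follows. This is an illustration, not the code from the series: `finalize_test_case_output` is the real hook name, but its exact signature and the `$trace_file` variable are assumptions made up for this example (`::group::`, `::endgroup::` and `::error::` are GitHub's actual workflow commands):

    # emit one collapsible group per test case, but include the verbose
    # trace only for failures, so passing tests add almost nothing to
    # the rendered page
    finalize_test_case_output () {
        test_case_result=$1
        shift
        case "$test_case_result" in
        failure)
            echo "::error::failed: $this_test.$test_count $1"
            echo "::group::failure: $this_test.$test_count $1"
            cat "$trace_file" # hypothetical: this case's verbose -x trace
            echo "::endgroup::"
            ;;
        ok)
            : # passing tests contribute no verbose output at all
            ;;
        esac
    }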
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests).
>
> That's an amazingly simple trick to improve the speed by a ton, indeed. Thank you for this splendid idea!
>
>> * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means).
>>
>> If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere.
>
> That's a good point; I added that hint to the output (the link is unfortunately not available at the time we print that advice).

https://github.com/git/git/runs/6539786128 shows that all in-flight topics merged to 'seen', except for the ds/bundle-uri-more, pass the linux-leaks job. The ds/bundle-uri-more topic introduces some leaks to commands that happen to be used in tests that are marked as leak-checker clean, making the job fail.

Which makes a great guinea pig for the CI output improvement topic.

So, I created two variants of 'seen' with this linux-leaks breakage. One is with the js/ci-github-workflow-markup topic on this thread. The other one is with the ab/ci-github-workflow-markup topic (which uses a preliminary clean-up ab/ci-setup-simplify topic as its base). They should show the identical test results and failures.

And here are their outputs:

- https://github.com/git/git/runs/6539835065
- https://github.com/git/git/runs/6539900608

If I recall correctly, the selling point of the ab/* variant over the js/* variant was that it would give quicker UI response compared to the former, but other than that, both variants' UI are supposed to be as newbie friendly as the other.

When I tried the former, it reacted so poorly to my attempt to scroll (with a mouse scroll wheel, if it makes a difference) that sometimes I was staring at a blank dark-gray space for a few seconds waiting for it to be filled by something, which was a bit of a jarring experience. When I tried the latter, it didn't show anything to help diagnose the details of the breakage in the "run make test" step, and the user needed to know that "print test failures" needs to be looked at, which I am not sure is an inherent limitation of the approach. After the single extra click, navigating the test output to find the failed steps among many others that succeeded was not a very pleasant experience.

Those who are interested in the UX experiment may want to visit these two outputs to see how usable each of these is for themselves.

Thanks.
Hi Junio,

On Sat, 21 May 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests).
> >
> > That's an amazingly simple trick to improve the speed by a ton, indeed. Thank you for this splendid idea!
> >
> >> * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means).
> >>
> >> If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere.
> >
> > That's a good point; I added that hint to the output (the link is unfortunately not available at the time we print that advice).
>
> https://github.com/git/git/runs/6539786128 shows that all in-flight topics merged to 'seen', except for the ds/bundle-uri-more, pass the linux-leaks job. The ds/bundle-uri-more topic introduces some leaks to commands that happen to be used in tests that are marked as leak-checker clean, making the job fail.
>
> Which makes a great guinea pig for the CI output improvement topic.
>
> So, I created two variants of 'seen' with this linux-leaks breakage. One is with the js/ci-github-workflow-markup topic on this thread. The other one is with the ab/ci-github-workflow-markup topic (which uses a preliminary clean-up ab/ci-setup-simplify topic as its base). They should show the identical test results and failures.
>
> And here are their outputs:
>
> - https://github.com/git/git/runs/6539835065

I see that this is still with the previous iteration, and therefore exposes the same speed (or slowness) as was investigated so wonderfully by Victoria.

So I really do not understand why you pointed to that run, given that it still contains all the successful test cases' logs, which contributes in a major way to said slowness.

Maybe you meant to refer to https://github.com/git/git/runs/6540394142 instead, which at least for me loads much faster _and_ makes the output as helpful as I intended?

Ciao,
Dscho

> - https://github.com/git/git/runs/6539900608
>
> If I recall correctly, the selling point of the ab/* variant over the js/* variant was that it would give quicker UI response compared to the former, but other than that, both variants' UI are supposed to be as newbie friendly as the other.
>
> When I tried the former, it reacted so poorly to my attempt to scroll (with a mouse scroll wheel, if it makes a difference) that sometimes I was staring at a blank dark-gray space for a few seconds waiting for it to be filled by something, which was a bit of a jarring experience. When I tried the latter, it didn't show anything to help diagnose the details of the breakage in the "run make test" step, and the user needed to know that "print test failures" needs to be looked at, which I am not sure is an inherent limitation of the approach. After the single extra click, navigating the test output to find the failed steps among many others that succeeded was not a very pleasant experience.
>
> Those who are interested in the UX experiment may want to visit these two outputs to see how usable each of these is for themselves.
>
> Thanks.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I see that this is still with the previous iteration, and therefore exposes the same speed (or slowness) as was investigated so wonderfully by Victoria.
>
> So I really do not understand why you pointed to that run, given that it

Simply because your updated version came to my tree long after I prepared two trees that are otherwise identical for comparison to write the message you are responding to. If the new round is much improved over the previous one, that is very good news.

I do not appreciate that you have to always talk back to others in such an aggressive tone, and I do not think it is only toward me, by the way. You could have said the same thing in a much more cordial way, e.g. "There is a newer version than those being compared---could you look at this run instead for comparison, even though admittedly there probably are changes in other topics in flight so the exact failures may be different?"
Junio C Hamano <gitster@pobox.com> writes:

> Which makes a great guinea pig for the CI output improvement topic.
>
> So, I created two variants of 'seen' with this linux-leaks breakage. One is with the js/ci-github-workflow-markup topic on this thread. The other one is with the ab/ci-github-workflow-markup topic (which uses a preliminary clean-up ab/ci-setup-simplify topic as its base). They should show the identical test results and failures.

The two runs to look at have been updated.

- The one with Ævar's change was missing the primary "workflow markup" topic (it only had the preliminary clean-up topic), so it is not a fair feature-to-feature comparison to begin with.

- The other one with Johannes's change was done with the version before the latest round from yesterday, which has improvements.

With all the other in-flight topics (including the one that shows failures in the linux-leaks job) merged to the same base in the same order, I prepared two variants of 'seen' that resulted in these logs:

- https://github.com/git/git/runs/6546816978
- https://github.com/git/git/runs/6546750379

One is with both of the required topics from Ævar (with a fix-up [*]), and the other is with the latest from Johannes's series.

I do not want to taint other folks' eyes with my observations, so I'd send my impression in a separate message as a response to this message after waiting for some time.

Thanks.

[Footnote]

* 76253615 (ci: optionally mark up output in the GitHub workflow, 2022-04-21) added references to ci/print-test-failures.sh and ci/print-test-failures-github.sh to the workflow file, while the latter script does not exist, but it appears that these references want to run the same script, so I've made a stupid and obvious fix-up today before pushing the result of merging all out. This prevented "make test || ci/print-test-failures.sh" from running correctly ever since 76253615 (ci: optionally mark up output in the GitHub workflow, 2022-04-21) was queued, and it seems that nobody noticed nor complained. Sigh.
On Sat, May 21 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> * print the verbose logs only for the failed test cases (to massively cut down on the size of the log, particularly when there's only a couple failures in a test file with a lot of passing tests).
>>
>> That's an amazingly simple trick to improve the speed by a ton, indeed. Thank you for this splendid idea!
>>
>>> * skip printing the full text of the test in 'finalize_test_case_output' when creating the group, i.e., use '$1' instead of '$*' (in both passing and failing tests, this information is already printed via some other means).
>>>
>>> If you wanted to make sure a user could still access the full failure logs (i.e., including the "ok" test results), you could print a link to the artifacts page as well - that way, all of the information we currently provide to users can still be found somewhere.
>>
>> That's a good point; I added that hint to the output (the link is unfortunately not available at the time we print that advice).
>
> https://github.com/git/git/runs/6539786128 shows that all in-flight topics merged to 'seen', except for the ds/bundle-uri-more, pass the linux-leaks job. The ds/bundle-uri-more topic introduces some leaks to commands that happen to be used in tests that are marked as leak-checker clean, making the job fail.
>
> Which makes a great guinea pig for the CI output improvement topic.
>
> So, I created two variants of 'seen' with this linux-leaks breakage. One is with the js/ci-github-workflow-markup topic on this thread. The other one is with the ab/ci-github-workflow-markup topic (which uses a preliminary clean-up ab/ci-setup-simplify topic as its base). They should show the identical test results and failures.
>
> And here are their outputs:
>
> - https://github.com/git/git/runs/6539835065
> - https://github.com/git/git/runs/6539900608
>
> If I recall correctly, the selling point of the ab/* variant over the js/* variant was that it would give quicker UI response compared to the former, but other than that, both variants' UI are supposed to be as newbie friendly as the other.
...
> When I tried the former, it reacted so poorly to my attempt to scroll (with a mouse scroll wheel, if it makes a difference) that sometimes I was staring at a blank dark-gray space for a few seconds waiting for it to be filled by something, which was a bit of a jarring experience. When I tried the latter, it didn't show anything to help diagnose the details of the breakage in the "run make test" step, and the user needed to know that "print test failures" needs to be looked at, which I am not sure is an inherent limitation of the approach. After the single extra click, navigating the test output to find the failed steps among many others that succeeded was not a very pleasant experience.
>
> Those who are interested in the UX experiment may want to visit these two outputs to see how usable each of these is for themselves.

Re selling point & feature comparison: The point of the ab/* variant was to re-roll Johannes's onto a "base" topic that made much of it unnecessary, because the building up of features to emit GitHub markup can be replaced by unrolling things like "make" and "make test" to the top-level.

That has its own UX benefits, e.g. you can see at a glance what command was run and what the environment was, and "make" and "make test" are now split up from one monolithic "build and test" step.
But the primary intention was not to provide a prettier UX, but to show that this arrangement made sense. I was hoping that Johannes would reply with some variant of "ah, I see what you mean, that does make things simpler!" and run with it, but alas...

So small bits in the UX like what you pointed out with needing an extra click are in there. That one would be easy to solve: it's because we "focus" on the last step with a false exit code, so we'd just have to arrange for the "print" step to be that step.

Anyway, the v3 CL of Johannes's series claims that the re-roll "improves the time to load pages". I ran both the Firefox and Chrome debugger with performance benchmarks against:

    https://github.com/git/git/runs/6540394142

And:

    https://github.com/avar/git/runs/6551581584?check_suite_focus=true

The former is what Johannes noted as the correct v3 in https://lore.kernel.org/git/nycvar.QRO.7.76.6.2205222045130.352@tvgsbejvaqbjf.bet/; the latter is the current "seen" with ab/ci-github-workflow-markup reverted, i.e. just my "base" changes.

In Chrome/Firefox the time to load the page (as in the spinner stops, and we "focus" on the right content) is:

    JS: ~60s / ~80s
    Æ:  ~25s / ~18s

This is with Chrome Version 101.0.4951.54 (Official Build) (64-bit) and Firefox 91.8.0esr (64-bit), both on a Debian Linux x86_64 Dell laptop.

The case of Chrome is quite revealing (since its developer tools seem to show a better summary). It shows that the "Æ" version spent ~200ms on "scripting", ~1ms on "rendering", and ~20k ms on "idle". For "JS" that's ~30k ms on "scripting", 15k ms on "rendering", then 7k ms on "painting" (which is ~0ms in the other one). 7k ms are spent on "idle".

So these are basically the same performance results as I reported in earlier iterations. I think a v4 of this series really deserves a much less terse CL. I.e. there are specific reports about major slowdowns in the UX. Any re-roll should really be re-testing those with the same/similar software and reporting before/after results.

Clearly the primary goal of improving the CI UX should not be to optimize the rendering of the results as an end in itself, but in this case it becomes *so slow* that it precludes certain major use-cases, such as seeing a failure and being able to view it in some timely fashion.
Hi Junio,

On Sun, 22 May 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I see that this is still with the previous iteration, and therefore exposes the same speed (or slowness) as was investigated so wonderfully by Victoria.
> >
> > So I really do not understand why you pointed to that run, given that it
>
> Simply because your updated version came to my tree long after I prepared two trees that are otherwise identical for comparison to write the message you are responding to.

Oh sorry, I only noticed that your mail came in after I sent the new iteration, and I incorrectly thought that you had put the patch series into the Stalled/To-Drop pile, so I assumed that your mail was in response to my new iteration. I had missed that you replied to v2 instead of to v3.

> I do not appreciate that you have to always talk back to others in such an aggressive tone

I apologize for that. As you might have guessed, it was not my intention to be aggressive in any way. I merely meant to express my puzzlement, and your explanation resolved that very nicely.

Thank you,
Dscho
Hi Ævar,

On Mon, 23 May 2022, Ævar Arnfjörð Bjarmason wrote:

> Re selling point & feature comparison: The point of the ab/* variant was to re-roll Johannes's onto a "base" topic that made much of it unnecessary, because the building up of features to emit GitHub markup can be replaced by unrolling things like "make" and "make test" to the top-level.
>
> That has its own UX benefits, e.g. you can see at a glance what command was run and what the environment was, and "make" and "make test" are now split up from one monolithic "build and test" step.
>
> But the primary intention was not to provide a prettier UX, but to show that this arrangement made sense. I was hoping that Johannes would reply with some variant of "ah, I see what you mean, that does make things simpler!" and run with it, but alas...

I believe that we share the goal of making the Git project more welcoming and easier to navigate for new contributors.

The patch series you wanted me to look at claims to make the CI/PR definitions/scripts simpler. As it matters more to contributors how to investigate test failures, i.e. what information they are provided about the failures, I disagree that that patch series needs to be connected to my patch series in any way.

Further, the result does not look like a simplification to me. For example, I consider it an absolute no-go to remove the remnants of Azure Pipelines support. As I had hinted, and as you saw on the git-security list, I require this support for embargoed releases. That’s what I did when working on the patches that made it into v2.35.2. In my book, removing such vital (if dormant) code is not a simplification, but a Chesterton’s Fence. While we do not need to use Azure Pipelines for our regular CI, we definitely need it for embargoed releases. “Simply revert it back” is not an excuse for removing something that should not be removed in the first place.

As another example where I have a different concept of what constitutes “simple”: In Git for Windows’ fork, we carry a patch that integrates the `git-subtree` tests into the CI builds. This patch touches two places, `ci/run-build-and-tests.sh` and `ci/run-test-slice.sh`. These changes would be inherited by any CI definition that uses the scripts in `ci/`. With the proposed patches, there are four places to patch, and they are all limited to the GitHub workflow definition. Since you asked me for my assessment: this is de-DRYing the code, making it more cumbersome instead of simpler.

In other words, I have fundamental objections about the approach and about tying it to the patches that improve the output of Git’s CI/PR runs.

> In Chrome/Firefox the time to load the page (as in the spinner stops, and we "focus" on the right content) is:
>
>     JS: ~60s / ~80s
>     Æ:  ~25s / ~18s

My focus is on the experience of occasional and new contributors who need to investigate test failures in the CI/PR runs. In this thread, we already discussed the balance between speed of loading the page on the one hand and how well the reader is guided toward the relevant parts on the other hand. I disagree with you that the former should be prioritized over the latter; on the contrary, guiding the readers along a path to success is much more important than optimizing for a quick page load. Most contributors who chimed in seemed to not mind a longer page load time anyway, as long as the result would help them identify quickly what causes the test failures.
Besides, the page load times are only likely to get better anyway, as GitHub engineers continuously improve Actions.

Ciao,
Dscho
Junio C Hamano <gitster@pobox.com> writes:

> I do not want to taint other folks' eyes with my observations, so I'd send my impression in a separate message as a response to this message after waiting for some time.

Between the previous and latest of Johannes's topic, the test output got a lot shorter by discarding the "ok" output and keeping only the failures and skips. The readers are mostly interested in seeing failures (they can download the full log if they want to), so this design decision probably makes sense to me.

The same "while scrolling, the user has to stare into the gray void for several seconds" is still there and needs a bit of getting used to (I do not know if it is a browser's problem, or something the output can help with to give a better user experience---the lines in the folded part are probably not "counted" correctly or something silly like that).

The one with the topic from Ævar last night, as I've mentioned already, lacked the main part of the logic, and it wouldn't have worked correctly because there was a show-stopper bug in one of the steps in it. With that fixed, the "extra click" I complained about last night seems to be gone. I guess the same "discard the test steps that successfully ran" trick would give us the same "shorter" output. I observe the same "staring into the gray void while scrolling" when it comes to the print-test-failures output, just as in the output from Johannes's topic.

Common to both approaches, folding output from each test piece to one line (typically an "ok" but sometimes a "failed" heading) may be the source of the UI responsiveness irritation I have been observing, but I wonder whether, with the removal of all "ok" pieces, it may make sense not to fold anything and instead give a flat "here are the traces of all failed and skipped tests".

In any case, either implementation seems to give us a good improvement over what is in 'master'.

Thanks.
Hi Junio,

On Mon, 23 May 2022, Junio C Hamano wrote:

> [...] the test output got a lot shorter by discarding the "ok" output and keeping only the failures and skips. The readers are mostly interested in seeing failures (they can download the full log if they want to), so this design decision probably makes sense to me.

For the record, Victoria suggested grouping by file rather than by failed test case. However, I do speak from a lot of experience diagnosing test failures in CI/PR runs when I say: it is frequently very helpful to have a look at one failed test case at a time. I'd much rather suffer a minor lag while scrolling than have to find the boundaries manually, in particular when `test_expect_failure` test cases are present (which are reported as "broken" in the current iteration instead of "failed").

Besides, the scroll issue is probably similar between both approaches to grouping (and may be independent of the grouping, as you pointed out by reporting similar issues in the current `print-test-failures` step), and is something I hope the Actions engineers are working on.

> Common to both approaches, folding output from each test piece to one line (typically an "ok" but sometimes a "failed" heading) may be the source of the UI responsiveness irritation I have been observing, but I wonder whether, with the removal of all "ok" pieces, it may make sense not to fold anything and instead give a flat "here are the traces of all failed and skipped tests".

As I mentioned above, I'd rather keep the grouping by failed test case. Obviously, the ideal way to decide would be to set up some A/B testing with real people, but I have no way to set up anything like that.

> In any case, either implementation seems to give us a good improvement over what is in 'master'.

There are two things I would like to add:

- In the current iteration's summary page, you will see the failed test cases' titles in the errors, and they are clickable (and will get you to the corresponding part of the logs). I find this very convenient.

- The addition of the suggestion to look at the run's artifacts for the full logs might not look like a big deal, but I bet that it will help new contributors in particular. This was yet another great suggestion by Victoria.

Thanks,
Dscho
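Those clickable summary entries come from GitHub's "error" workflow command. As a one-line sketch (the message text is modeled on the t9800 example earlier in the thread, not taken from the series' actual code):

    # printed from within a step, this surfaces the message on the run's
    # summary page as a clickable annotation that jumps to this log line
    echo "::error::failed: t9800.20 submit from detached head"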
On Mon, May 23 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 23 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> Re selling point & feature comparison: The point of the ab/* variant was to re-roll Johannes's onto a "base" topic that made much of it unnecessary, because the building up of features to emit GitHub markup can be replaced by unrolling things like "make" and "make test" to the top-level.
>>
>> That has its own UX benefits, e.g. you can see at a glance what command was run and what the environment was, and "make" and "make test" are now split up from one monolithic "build and test" step.
>>
>> But the primary intention was not to provide a prettier UX, but to show that this arrangement made sense. I was hoping that Johannes would reply with some variant of "ah, I see what you mean, that does make things simpler!" and run with it, but alas...
>
> I believe that we share the goal of making the Git project more welcoming and easier to navigate for new contributors.

Yes, definitely.

> The patch series you wanted me to look at claims to make the CI/PR definitions/scripts simpler. As it matters more to contributors how to investigate test failures, i.e. what information they are provided about the failures, I disagree that that patch series needs to be connected to my patch series in any way.

Our two sets of patches change different parts of the CI UX, so no. The set of patches I've been proposing isn't just making CI/PR definitions/scripts simpler, although it also does that.

So e.g. in your patches you need to massage the CI output to split the "build" step from the "test" step. As you can see in an earlier RFC re-roll of them on top of my topic, that's something you'd get for free: https://lore.kernel.org/git/RFC-cover-v5-00.10-00000000000-20220421T183001Z-avarab@gmail.com/

> Further, the result does not look like a simplification to me. For example, I consider it an absolute no-go to remove the remnants of Azure Pipelines support. As I had hinted, and as you saw on the git-security list, I require this support for embargoed releases. That’s what I did when working on the patches that made it into v2.35.2. In my book, removing such vital (if dormant) code is not a simplification, but a Chesterton’s Fence. While we do not need to use Azure Pipelines for our regular CI, we definitely need it for embargoed releases. “Simply revert it back” is not an excuse for removing something that should not be removed in the first place.

Can you please reply to this 3-month-old and still-waiting-on-your-reply E-Mail on this topic so we can figure out a way forward with this: https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/

> As another example where I have a different concept of what constitutes “simple”: In Git for Windows’ fork, we carry a patch that integrates the `git-subtree` tests into the CI builds. This patch touches two places, `ci/run-build-and-tests.sh` and `ci/run-test-slice.sh`. These changes would be inherited by any CI definition that uses the scripts in `ci/`. With the proposed patches, there are four places to patch, and they are all limited to the GitHub workflow definition. Since you asked me for my assessment: this is de-DRYing the code, making it more cumbersome instead of simpler.

No, you'd still have two places to patch:

1. The top-level Makefile to have "make test" run those subtree tests depending on some flag, i.e. the same as your ci/run-build-and-tests.sh.
2. ci/run-test-slice.sh as before (which is only needed for the Windows-specific tests).

Because we'd be having the Makefile drive the logic, you could also run such a "make test" locally, which is something we should have anyway. E.g. when I build my own git I run the subtree tests, and would like to eventually make "run contrib tests too" some configurable option.

So it is exactly the DRY principle. By avoiding making things needlessly CI-specific we can just control this behavior with flags, both in and outside CI.

> In other words, I have fundamental objections about the approach and about tying it to the patches that improve the output of Git’s CI/PR runs.

I would too if after my series you needed to patch every place we run "make test" or whatever to run your subtree tests, but as noted above that's not the case. So hopefully this addresses that.

More generally: I noted a while ago that if you pointed out issues like that I'd be happy to address them for you. Based on this I see d08496f2c40 (ci: run `contrib/subtree` tests in CI builds, 2021-08-05), and that would be easy to generalize.

>> In Chrome/Firefox the time to load the page (as in the spinner stops, and we "focus" on the right content) is:
>>
>>     JS: ~60s / ~80s
>>     Æ:  ~25s / ~18s
>
> My focus is on the experience of occasional and new contributors who need to investigate test failures in the CI/PR runs. In this thread, we already discussed the balance between speed of loading the page on the one hand and how well the reader is guided toward the relevant parts on the other hand.

First, your re-roll claims that it "improves the time to load pages", but based on the sort of testing I'd done before when I reported the severe slowness introduced by this topic I can't reproduce that.

So how exactly are you testing the performance of these load times, and can you share the numbers you have for master, your previous iteration and this re-roll?

> I disagree with you that the former should be prioritized over the latter; on the contrary, guiding the readers along a path to success is much more important than optimizing for a quick page load.

I think a better UX is certainly worth some cost to load times, so I'm not trying to be difficult in saying that this costs us some milliseconds so it's a no-go. But really, this is making it so slow that it's borderline unusable.

The main way I use this interface is that I'll get an E-Mail with a failure report, or see the "X" in the UX and click through to the failure, then see the logs etc, and hopefully be able to see from that what's wrong, or how I could begin to reproduce it.

Right now that's fast enough that I'll do that all in one browser click-through session, but if I'm having to wait *more than a minute* vs. the current 10-20 seconds (which is already quite bad)?

Your latest series also seems either to be buggy or to trigger some bug in GitHub Actions: even after that minute you'll see almost nothing on your screen. So a user who doesn't know the UX would end up waiting much longer than that. You seemingly need to know that it's done when it shows you that blank screen, and trigger a re-render by scrolling up or down, which will show you your actual failures. That's not an issue I saw in any iteration of this before this v3.

> Most contributors who chimed in seemed to not mind a longer page load time anyway, as long as the result would help them identify quickly what causes the test failures.
Wasn't much of that discussion a follow-up to your initial demos of this topic? I don't think those were as slow as what I'm pointing out above, which I think is just because those failures happened to involve much fewer lines of log. The slowness seems to be correlated with how many lines we're dealing with in total.

> Besides, the page load times are only likely to get better anyway, as GitHub engineers continuously improve Actions.

Sure, and if this were all magically made better by GH engineers these concerns would be addressed. But right now that isn't the case, and we don't know if/when that would happen, so we need to review these proposed changes on the basis of how they'd change the current GitHub CI UX overall.
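To illustrate the flag-driven shape described in the two-item list above, the test-suite side could boil down to shell logic like the following. The flag name GIT_TEST_CONTRIB_SUBTREE is made up for this sketch; the real knob would presumably be wired through the Makefile:

    # hypothetical hook in the test suite's driver: run the
    # contrib/subtree tests when the (made-up) flag is set, whether we
    # are in CI or in a local "make test"
    if test -n "$GIT_TEST_CONTRIB_SUBTREE"
    then
        make -C contrib/subtree test
    fi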
Changes since v1:

 * In the patch that removed MAKE_TARGETS, a stale comment about that variable is also removed.
 * The comment about set -x has been adjusted because it no longer applies as-is.
 * The commit message of "ci: make it easier to find failed tests' logs in the GitHub workflow" has been adjusted to motivate the improvement better.


Background
==========

Recent patches intended to help readers figure out CI failures much quicker than before. Unfortunately, they haven't been entirely positive for me. For example, they broke the branch protections in Microsoft's fork of Git, where we require Pull Requests to pass a certain set of Checks (which are identified by their names) and therefore caused follow-up work.

Using CI and in general making it easier for new contributors is an area I'm passionate about, and one I'd like to see improved.


The current situation
=====================

Let me walk you through the current experience when a PR build fails: I get a notification mail that only says that a certain job failed. There's no indication of which test failed (or was it the build?). I can click on a link and it takes me to the workflow run. Once there, all it says is "Process completed with exit code 1", or even "code 2". Sure, I can click on one of the failed jobs. It even expands the failed step's log (collapsing the other steps). And what do I see there?

Let's look at an example of a failed linux-clang (ubuntu-latest) job [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:

    [...]
    Test Summary Report
    -------------------
    t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1)
      Failed test: 49
      Non-zero exit status: 1
    t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0)
      TODO passed: 45, 47
    Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
    Result: FAIL
    make[1]: *** [Makefile:53: prove] Error 1
    make[1]: Leaving directory '/home/runner/work/git/git/t'
    make: *** [Makefile:3018: test] Error 2

That's it. I count myself lucky not to be a new contributor being faced with something like this.

Now, since I am active in the Git project for a couple of days or so, I can make sense of the "TODO passed" label and know that for the purpose of fixing the build failures, I need to ignore this, and that I need to focus on the "Failed test" part instead.

I also know that I do not have to get myself an ubuntu-latest box just to reproduce the error, I do not even have to check out the code and run it just to learn what that "49" means.

I know, and I do not expect any new contributor, not even most seasoned contributors to know, that I have to patiently collapse the "Run ci/run-build-and-tests.sh" job's log, and instead expand the "Run ci/print-test-failures.sh" job log (which did not fail and hence does not draw any attention to it).

I know, and again: I do not expect many others to know this, that I then have to click into the "Search logs" box (not the regular web browser's search via Ctrl+F!) and type in "not ok" to find the log of the failed test case (and this might still be a "known broken" one that is marked via test_expect_failure and once again needs to be ignored).

To be excessively clear: This is not a great experience!


Improved output
===============

Our previous Azure Pipelines-based CI builds had a much nicer UI, one that even showed flaky tests, and trends e.g. how long the test cases ran.
When I ported Git's CI over to GitHub workflows (to make CI more accessible to new contributors), I knew full well that we would leave this very nice UI behind, and I had hoped that we would get something similar back via new, community-contributed GitHub Actions that can be used in GitHub workflows. However, most likely because we use a home-grown test framework implemented in opinionated POSIX shell scripts, that did not happen.

So I had a look at what standards exist e.g. when testing PowerShell modules, in the way of marking up their test output in GitHub workflows, and I was not disappointed: GitHub workflows support "grouping" of output lines, i.e. marking sections of the output as a group that is then collapsed by default and can be expanded. And it is this feature I decided to use in this patch series, along with GitHub workflows' commands to display errors or notices that are also shown on the summary page of the workflow run. Now, in addition to "Process completed with exit code" on the summary page, we also read something like:

    ⊗ linux-gcc (ubuntu-latest)
      failed: t9800.20 submit from detached head

Even better, this message is a link, and following that, the reader is presented with something like this [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:

    ⏵ Run ci/run-build-and-tests.sh
    ⏵ CI setup
    + ln -s /home/runner/none/.prove t/.prove
    + run_tests=t
    + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    + group Build make
    + set +x
    ⏵ Build
    ⏵ Run tests
    === Failed test: t9800-git-p4-basic ===
    ⏵ ok: t9800.1 start p4d
    ⏵ ok: t9800.2 add p4 files
    ⏵ ok: t9800.3 basic git p4 clone
    ⏵ ok: t9800.4 depot typo error
    ⏵ ok: t9800.5 git p4 clone @all
    ⏵ ok: t9800.6 git p4 sync uninitialized repo
    ⏵ ok: t9800.7 git p4 sync new branch
    ⏵ ok: t9800.8 clone two dirs
    ⏵ ok: t9800.9 clone two dirs, @all
    ⏵ ok: t9800.10 clone two dirs, @all, conflicting files
    ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit
    ⏵ ok: t9800.12 clone using non-numeric revision ranges
    ⏵ ok: t9800.13 clone with date range, excluding some changes
    ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
    ⏵ ok: t9800.15 exit gracefully for p4 server errors
    ⏵ ok: t9800.16 clone --bare should make a bare repository
    ⏵ ok: t9800.17 initial import time from top change time
    ⏵ ok: t9800.18 unresolvable host in P4PORT should display error
    ⏵ ok: t9800.19 run hook p4-pre-submit before submit
    Error: failed: t9800.20 submit from detached head
    ⏵ failure: t9800.20 submit from detached head
    Error: failed: t9800.21 submit from worktree
    ⏵ failure: t9800.21 submit from worktree
    === Failed test: t9801-git-p4-branch ===
    [...]

The "Failed test:" lines are colored in yellow to give a better visual clue about the logs' structure, the "Error:" label is colored in red to draw the attention to the important part of the log, and the "⏵" characters indicate that part of the log is collapsed and can be expanded by clicking on it.

To drill down, the reader merely needs to expand the (failed) test case's log by clicking on it, and then study the log. If needed (e.g. when the test case relies on side effects from previous test cases), the logs of preceding test cases can be expanded as well. In this example, when expanding t9800.20, it looks like this (for ease of reading, I cut a few chunks of lines, indicated by "[...]"):

    [...]
    ⏵ ok: t9800.19 run hook p4-pre-submit before submit
    Error: failed: t9800.20 submit from detached head
    ⏷ failure: t9800.20 submit from detached head
      test_when_finished cleanup_git &&
      git p4 clone --dest="$git" //depot &&
      (
        cd "$git" &&
        git checkout p4/master &&
        >detached_head_test &&
        git add detached_head_test &&
        git commit -m "add detached_head" &&
        git config git-p4.skipSubmitEdit true &&
        git p4 submit &&
        git p4 rebase &&
        git log p4/master | grep detached_head
      )
    [...]
    Depot paths: //depot/
    Import destination: refs/remotes/p4/master

    Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
    Perforce db files in '.' will be created if missing...

    Traceback (most recent call last):
      File "/home/runner/work/git/git/git-p4", line 4455, in <module>
        main()
      File "/home/runner/work/git/git/git-p4", line 4449, in main
        if not cmd.run(args):
      File "/home/runner/work/git/git/git-p4", line 2590, in run
        rebase.rebase()
      File "/home/runner/work/git/git/git-p4", line 4121, in rebase
        if len(read_pipe("git diff-index HEAD --")) > 0:
      File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
        retcode, out, err = read_pipe_full(c, *k, **kw)
      File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
        p = subprocess.Popen(
      File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
    error: last command exited with $?=1
    + cleanup_git
    + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + nr_tries_left=60
    + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + test 1 -ne 1
    + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + nr_tries_left=60
    + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + exit 1
    + eval_ret=1
    + :
    not ok 20 - submit from detached head
    #
    #	test_when_finished cleanup_git &&
    #	git p4 clone --dest="$git" //depot &&
    #	(
    #		cd "$git" &&
    #		git checkout p4/master &&
    #		>detached_head_test &&
    #		git add detached_head_test &&
    #		git commit -m "add detached_head" &&
    #		git config git-p4.skipSubmitEdit true &&
    #		git p4 submit &&
    #		git p4 rebase &&
    #		git log p4/master | grep detached_head
    #	)
    #
    Error: failed: t9800.21 submit from worktree
    [...]

Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements.
Johannes Schindelin (9):
  ci: fix code style
  ci/run-build-and-tests: take a more high-level view
  ci: make it easier to find failed tests' logs in the GitHub workflow
  ci/run-build-and-tests: add some structure to the GitHub workflow
    output
  tests: refactor --write-junit-xml code
  test(junit): avoid line feeds in XML attributes
  ci: optionally mark up output in the GitHub workflow
  ci: use `--github-workflow-markup` in the GitHub workflow
  ci: call `finalize_test_case_output` a little later

 .github/workflows/main.yml           |  12 ---
 ci/lib.sh                            |  82 +++++++++++++++--
 ci/run-build-and-tests.sh            |  14 +--
 ci/run-test-slice.sh                 |   5 +-
 t/test-lib-functions.sh              |   4 +-
 t/test-lib-github-workflow-markup.sh |  50 ++++++++++
 t/test-lib-junit.sh                  | 132 +++++++++++++++++++++++++++
 t/test-lib.sh                        | 128 ++++----------------------
 8 files changed, 288 insertions(+), 139 deletions(-)
 create mode 100644 t/test-lib-github-workflow-markup.sh
 create mode 100644 t/test-lib-junit.sh


base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1117

Range-diff vs v1:

 1:  db08b07c37a = 1:  db08b07c37a ci: fix code style
 2:  d2ff51bb5da ! 2:  42ff3e170bf ci/run-build-and-tests: take a more high-level view
     @@ ci/run-build-and-tests.sh: pedantic)
      	;;
      esac
      
     - # Any new "test" targets should not go after this "make", but should
     - # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
     - # start running tests.
     +-# Any new "test" targets should not go after this "make", but should
     +-# adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
     +-# start running tests.
      -make $MAKE_TARGETS
      +make
      +if test -n "$run_tests"
 3:  98891b0d3f7 ! 3:  bbbe1623257 ci: make it easier to find failed tests' logs in the GitHub workflow
     @@ Metadata
      ## Commit message ##
         ci: make it easier to find failed tests' logs in the GitHub workflow
     
     +    When investigating a test failure, the time that matters most is the
     +    time it takes from getting aware of the failure to displaying the output
     +    of the failing test case.
     +
         You currently have to know a lot of implementation details when
         investigating test failures in the CI runs. The first step is easy: the
         failed job is marked quite clearly, but when opening it, the failed step
     @@ Commit message
         The actually interesting part is in the detailed log of said failed
         test script. But that log is shown in the CI run's step that runs
         `ci/print-test-failures.sh`. And that step is _not_ expanded in the web
     -   UI by default.
     +   UI by default. It is even marked as "successful", which makes it very
     +   easy to miss that there is useful information hidden in there.
     
         Let's help the reader by showing the failed tests' detailed logs in the
         step that is expanded automatically, i.e. directly after the test suite
 4:  9333ba781b8 ! 4:  f72254a9ac6 ci/run-build-and-tests: add some structure to the GitHub workflow output
     @@ ci/lib.sh
      +
      +# Set 'exit on error' for all CI scripts to let the caller know that
      +# something went wrong.
     -+# Set tracing executed commands, primarily setting environment variables
     -+# and installing dependencies.
     ++#
     ++# We already enabled tracing executed commands earlier. This helps by showing
     ++# how environment variables are set and dependencies are installed.
      +set -e
      +
      +skip_branch_tip_with_tag () {
     @@ ci/lib.sh: linux-leaks)
      +set -x
     
      ## ci/run-build-and-tests.sh ##
     -@@ ci/run-build-and-tests.sh: esac
     - # Any new "test" targets should not go after this "make", but should
     - # adjust $MAKE_TARGETS. Otherwise compilation-only targets above will
     - # start running tests.
     +@@ ci/run-build-and-tests.sh: pedantic)
     + 	;;
     + esac
     + 
      -make
      +group Build make
       if test -n "$run_tests"
 5:  94dcbe1bc43 = 5:  9eda6574313 tests: refactor --write-junit-xml code
 6:  41230100091 = 6:  c8b240af749 test(junit): avoid line feeds in XML attributes
 7:  98b32630fcd = 7:  15f199e810e ci: optionally mark up output in the GitHub workflow
 8:  1a6bd1846bc = 8:  91ea54f36c5 ci: use `--github-workflow-markup` in the GitHub workflow
 9:  992b1575889 = 9:  be2a83f5da3 ci: call `finalize_test_case_output` a little later
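For readers wondering what the `group Build make` call in the range-diff amounts to: conceptually, `group` wraps a command in GitHub's `::group::`/`::endgroup::` workflow commands, which is what produces the collapsible sections shown in the cover letter (the sample trace above also shows it running `set +x`). This is a simplified sketch of the idea, not the implementation from the series:

    # run a command inside a collapsible GitHub workflow log group,
    # preserving its exit status without tripping 'set -e'
    group () {
        set +x
        echo "::group::$1"
        shift
        if "$@"
        then
            res=0
        else
            res=$?
        fi
        echo "::endgroup::"
        set -x
        return $res
    }

    group Build make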