Message ID: pull.1117.git.1643050574.gitgitgadget@gmail.com
Series: ci: make Git's GitHub workflow output much more helpful
On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> Background
> ==========
>
> Recent patches intended to help readers figure out CI failures much quicker
> than before. Unfortunately, they haven't been entirely positive for me. For
> example, they broke the branch protections in Microsoft's fork of Git, where
> we require Pull Requests to pass a certain set of Checks (which are
> identified by their names) and therefore caused follow-up work.

This seems to be a reference to my df7375d7728 (CI: use shorter names that fit in UX tooltips, 2021-11-23), merged as part of ab/ci-updates, and I understand from this summary that you had some custom job somewhere that scraped the job names, which broke. That's unfortunate, but I do think being able to actually read the tooltips in the GitHub UI was a worthwhile trade-off in the end.

But I'm entirely confused about what any of that has to do with this series, which is about changing how the job output itself is presented and summarized, not about the job names or making them fit in tooltips. Later in the summary you note:

> Using CI and in general making it easier for new contributors is an area I'm
> passionate about, and one I'd like to see improved.
> [...]
> ⊗ linux-gcc (ubuntu-latest)
>   failed: t9800.20 submit from detached head

Which has one of the new and shorter job names, but in a part of the UX where the length didn't matter, and I can't find a case where it does.
[CC-ing some people who've been interested in CI architecture]

On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> [...]
> The current situation
> =====================
>
> Let me walk you through the current experience when a PR build fails: I get
> a notification mail that only says that a certain job failed. There's no
> indication of which test failed (or was it the build?). I can click on a
> link and it takes me to the workflow run. Once there, all it says is "Process
> completed with exit code 1", or even "code 2". Sure, I can click on one of
> the failed jobs. It even expands the failed step's log (collapsing the other
> steps). And what do I see there?
>
> Let's look at an example of a failed linux-clang (ubuntu-latest) job
> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
>
> [...]
> Test Summary Report
> -------------------
> t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1)
>   Failed test: 49
>   Non-zero exit status: 1
> t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0)
>   TODO passed: 45, 47
> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> Result: FAIL
> make[1]: *** [Makefile:53: prove] Error 1
> make[1]: Leaving directory '/home/runner/work/git/git/t'
> make: *** [Makefile:3018: test] Error 2

Firstly, I very much applaud any effort to move the CI UX forward. I know we haven't seen eye-to-eye on some of the trade-offs there, but I think something like this series is a step in the right direction, i.e. trying harder to summarize the output for the user, and making use of some CI platform-specific features.

I sent a reply in this thread purely on some implementation concerns related to that in https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/, but let's leave that aside for now...

> [...]
> So I had a look at what standards exist e.g.
> when testing PowerShell
> modules, in the way of marking up their test output in GitHub workflows, and
> I was not disappointed: GitHub workflows support "grouping" of output lines,
> i.e. marking sections of the output as a group that is then collapsed by
> default and can be expanded. And it is this feature I decided to use in this
> patch series, along with GitHub workflows' commands to display errors or
> notices that are also shown on the summary page of the workflow run. Now, in
> addition to "Process completed with exit code" on the summary page, we also
> read something like:
>
> ⊗ linux-gcc (ubuntu-latest)
>   failed: t9800.20 submit from detached head
>
> Even better, this message is a link, and following that, the reader is
> presented with something like this
> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:

This series is doing several different things, at least:

1) "Grouping" the ci/ output, i.e. "make" from "make test"

2) Doing likewise for t/test-lib.sh

3) In doing that for t/test-lib.sh, also "signalling" the GitHub CI, to e.g. get the "submit from detached head" output you quote just a few lines above

I'd like to focus on just #1 here. Where I was going with that in my last CI series was to make a start at eventually being able to run simply "make" at the top-level "step", i.e. to have a recipe that looks like:

    - run: make
    - run: make test

I feel strongly that that's where we should be heading, and the #1 part of this series is basically trying to emulate what you'd get for free if we simply did that. I.e. if you run single commands at the "step" level (in GitHub CI nomenclature), you'll get what you're doing with groupings in this series for free, and without any special code in ci/*. Better yet, if you then do want grouping *within* that step, you're free to do so without having already spent your one level of grouping on distinguishing "make" from "make test".
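For reference, the "grouping" being discussed here is driven by magic workflow commands printed to standard output. A minimal sketch of the mechanism (not the actual ci/lib.sh code; the `begin_group`/`end_group` helper names are made up for illustration):

```shell
# Anything printed between "::group::<title>" and "::endgroup::" is
# collapsed under <title> in the GitHub Actions log viewer.
begin_group () {
	printf '::group::%s\n' "$1"
}

end_group () {
	printf '::endgroup::\n'
}

begin_group "Build"
echo "CC git.o"   # stand-in for the real build output
end_group
```

Outside GitHub Actions these commands are printed literally and harmlessly; the web UI only interprets them when the job runs there.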
IOW our CI now looks like this (pseudocode):

    - job:
      - step1:
        - use ci/lib.sh to set env vars
        - run a script like ci/run-build-and-tests.sh
      - step2:
        - use ci/lib.sh to set env vars
        - run a script like print-test-failures.sh

But should instead look like:

    - job:
      - step1:
        - set variables in $GITHUB_ENV using ci/lib.sh
      - step2:
        - make
      - step3:
        - make test
      - step4:
        - run a script like print-test-failures.sh

Well, we can quibble about "step4", but again, let's focus on #1 here; that's more #2-#3 territory.

I had some WIP code to do that which I polished up. Here's how e.g. a build failure looks in your implementation (again, just focusing on how "make" and "make test" are divided out, not the rest):

https://github.com/dscho/git/runs/4840190622?check_suite_focus=true#step:4:62

I.e. you've made "build" an expandable group at the same level as a single failed test, and still all under the opaque ci/run-build-and-tests.sh script.

And here's mine. This is using a semi-recent version of my patches that happened to have a failure; not quite what I've got now, but close enough for this E-Mail:

https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:1

Now, notice two things. One, we've made "make" and "make test" top-level steps, and more importantly, if you expand that "make test" step you'll get the full "make test" output.

And two, it's got something you don't have at all: we're now making use of the GitHub CI feature of pre-declaring an environment for "make test", which the CI knows about (you need to click to expand it):

https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:4

Right now that's something we hardly make use of at all, but with my changes the environment is the *only* special sauce we specify before the step, i.e. GIT_PROVE_OPTS=.. DEFAULT_TEST_TARGET=... etc.
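The "$GITHUB_ENV" mechanism in the second layout above is a file GitHub Actions provides to each step; key=value lines appended to it become environment variables in all subsequent steps of the job. A hedged sketch of what a variables-only ci/lib.sh step might do (the `setenv` helper and the temp-file fallback are made up for illustration; the exact variable set is an assumption):

```shell
# Export settings for later steps by appending to $GITHUB_ENV.
# Outside GitHub Actions, fall back to a temporary file so this
# sketch stays runnable.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

setenv () {
	printf '%s=%s\n' "$1" "$2" >>"$GITHUB_ENV"
}

# Example variables mentioned in this thread; illustrative only.
setenv DEFAULT_TEST_TARGET prove
setenv GIT_PROVE_OPTS "--timer"
```

A later `- run: make test` step would then see DEFAULT_TEST_TARGET and GIT_PROVE_OPTS in its environment without any ci/* wrapper script in between.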
I think I've run out of my ML quota for now, but here's the branch that implements it:

https://github.com/git/git/compare/master...avar:avar/ci-unroll-make-commands-to-ci-recipe

That's "282 additions and 493 deletions". Much of what was required to do this was to eject the remaining support for the dead Travis and Azure CIs that we don't run, i.e. to entirely remove any sort of state management or job control from ci/lib.sh, and have it *only* be tasked with setting variables for subsequent steps to use. That makes it much simpler; my end-state of it is ~170 lines vs. your ~270 (but to be fair, some of that's deleted Travis code):

https://github.com/avar/git/blob/avar/ci-unroll-make-commands-to-ci-recipe/ci/lib.sh
https://github.com/gitgitgadget/git/blob/pr-1117/dscho/use-grouping-in-ci-v1/ci/lib.sh

And much of the rest is just gone, e.g. ci/run-build-and-tests.sh isn't there anymore; instead you simply run "make" or "make test" (or the equivalent on Windows, which also works):

https://github.com/avar/git/tree/avar/ci-unroll-make-commands-to-ci-recipe/ci
https://github.com/gitgitgadget/git/tree/pr-1117/dscho/use-grouping-in-ci-v1/ci

Anyway, I hope we can find some sort of joint way forward with this, because I think your #1 at least is going in the opposite direction from where we should be heading to achieve much the same ends you'd like to achieve. We can really do this in a much simpler way once we stop treating ci/lib.sh and friends as monolithic ball-of-mud entry points.

But I'd really like us not to go in this direction of using markup to "sub-divide" the "steps" within a given job, when we can relatively easily just ... divide the steps. As shown above, that UI plays much more naturally into the CI's native features and how it likes to arrange and present things.

And again, all of this is *only* discussing the "step #1" noted above. Using "grouping" for presenting the test failures themselves, or sending summaries to the CI "Summary", is a different matter.

Thanks!
Hi Junio,

I notice that you did not take this into `seen` yet. I find that a little sad, because it would potentially have helped others to figure out the failure in the latest `seen`:

https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162

Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

Ciao,
Dscho

On Mon, 24 Jan 2022, Johannes Schindelin via GitGitGadget wrote:

> Background
> ==========
>
> Recent patches intended to help readers figure out CI failures much quicker
> than before. Unfortunately, they haven't been entirely positive for me. For
> example, they broke the branch protections in Microsoft's fork of Git, where
> we require Pull Requests to pass a certain set of Checks (which are
> identified by their names) and therefore caused follow-up work.
>
> Using CI and in general making it easier for new contributors is an area I'm
> passionate about, and one I'd like to see improved.
>
> The current situation
> =====================
>
> Let me walk you through the current experience when a PR build fails: I get
> a notification mail that only says that a certain job failed. There's no
> indication of which test failed (or was it the build?). I can click on a
> link and it takes me to the workflow run. Once there, all it says is "Process
> completed with exit code 1", or even "code 2". Sure, I can click on one of
> the failed jobs. It even expands the failed step's log (collapsing the other
> steps). And what do I see there?
>
> Let's look at an example of a failed linux-clang (ubuntu-latest) job
> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
>
> [...]
> Test Summary Report
> -------------------
> t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1)
>   Failed test: 49
>   Non-zero exit status: 1
> t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0)
>   TODO passed: 45, 47
> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> Result: FAIL
> make[1]: *** [Makefile:53: prove] Error 1
> make[1]: Leaving directory '/home/runner/work/git/git/t'
> make: *** [Makefile:3018: test] Error 2
>
> That's it. I count myself lucky not to be a new contributor being faced with
> something like this.
>
> Now, since I have been active in the Git project for a couple of days or so, I can
> make sense of the "TODO passed" label and know that for the purpose of
> fixing the build failures, I need to ignore this, and that I need to focus
> on the "Failed test" part instead.
>
> I also know that I do not have to get myself an ubuntu-latest box just to
> reproduce the error; I do not even have to check out the code and run it
> just to learn what that "49" means.
>
> I know, and I do not expect any new contributor, nor even most seasoned
> contributors, to know, that I have to patiently collapse the "Run
> ci/run-build-and-tests.sh" job's log, and instead expand the "Run
> ci/print-test-failures.sh" job log (which did not fail and hence does not
> draw any attention to it).
>
> I know, and again: I do not expect many others to know this, that I then
> have to click into the "Search logs" box (not the regular web browser's
> search via Ctrl+F!) and type in "not ok" to find the log of the failed test
> case (and this might still be a "known broken" one that is marked via
> test_expect_failure and once again needs to be ignored).
>
> To be excessively clear: This is not a great experience!
>
> Improved output
> ===============
>
> Our previous Azure Pipelines-based CI builds had a much nicer UI, one that
> even showed flaky tests, and trends, e.g. how long the test cases ran.
> When I
> ported Git's CI over to GitHub workflows (to make CI more accessible to new
> contributors), I knew full well that we would leave this very nice UI
> behind, and I had hoped that we would get something similar back via new,
> community-contributed GitHub Actions that can be used in GitHub workflows.
> However, most likely because we use a home-grown test framework implemented
> in opinionated POSIX shell scripts, that did not happen.
>
> So I had a look at what standards exist e.g. when testing PowerShell
> modules, in the way of marking up their test output in GitHub workflows, and
> I was not disappointed: GitHub workflows support "grouping" of output lines,
> i.e. marking sections of the output as a group that is then collapsed by
> default and can be expanded. And it is this feature I decided to use in this
> patch series, along with GitHub workflows' commands to display errors or
> notices that are also shown on the summary page of the workflow run. Now, in
> addition to "Process completed with exit code" on the summary page, we also
> read something like:
>
> ⊗ linux-gcc (ubuntu-latest)
>   failed: t9800.20 submit from detached head
>
> Even better, this message is a link, and following that, the reader is
> presented with something like this
> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:
>
> ⏵ Run ci/run-build-and-tests.sh
> ⏵ CI setup
> + ln -s /home/runner/none/.prove t/.prove
> + run_tests=t
> + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> + group Build make
> + set +x
> ⏵ Build
> ⏵ Run tests
> === Failed test: t9800-git-p4-basic ===
> ⏵ ok: t9800.1 start p4d
> ⏵ ok: t9800.2 add p4 files
> ⏵ ok: t9800.3 basic git p4 clone
> ⏵ ok: t9800.4 depot typo error
> ⏵ ok: t9800.5 git p4 clone @all
> ⏵ ok: t9800.6 git p4 sync uninitialized repo
> ⏵ ok: t9800.7 git p4 sync new branch
> ⏵ ok: t9800.8 clone two dirs
> ⏵ ok: t9800.9 clone two dirs, @all
> ⏵ ok: t9800.10 clone two dirs, @all, conflicting files
> ⏵ ok: t9800.11
> clone two dirs, each edited by submit, single git commit
> ⏵ ok: t9800.12 clone using non-numeric revision ranges
> ⏵ ok: t9800.13 clone with date range, excluding some changes
> ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
> ⏵ ok: t9800.15 exit gracefully for p4 server errors
> ⏵ ok: t9800.16 clone --bare should make a bare repository
> ⏵ ok: t9800.17 initial import time from top change time
> ⏵ ok: t9800.18 unresolvable host in P4PORT should display error
> ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> Error: failed: t9800.20 submit from detached head
> ⏵ failure: t9800.20 submit from detached head
> Error: failed: t9800.21 submit from worktree
> ⏵ failure: t9800.21 submit from worktree
> === Failed test: t9801-git-p4-branch ===
> [...]
>
> The "Failed test:" lines are colored in yellow to give a better visual clue
> about the logs' structure, the "Error:" label is colored in red to draw the
> attention to the important part of the log, and the "⏵" characters indicate
> that part of the log is collapsed and can be expanded by clicking on it.
>
> To drill down, the reader merely needs to expand the (failed) test case's
> log by clicking on it, and then study the log. If needed (e.g. when the test
> case relies on side effects from previous test cases), the logs of preceding
> test cases can be expanded as well. In this example, when expanding
> t9800.20, it looks like this (for ease of reading, I cut a few chunks of
> lines, indicated by "[...]"):
>
> [...]
> ⏵ ok: t9800.19 run hook p4-pre-submit before submit
> Error: failed: t9800.20 submit from detached head
> ⏷ failure: t9800.20 submit from detached head
>     test_when_finished cleanup_git &&
>     git p4 clone --dest="$git" //depot &&
>     (
>         cd "$git" &&
>         git checkout p4/master &&
>         >detached_head_test &&
>         git add detached_head_test &&
>         git commit -m "add detached_head" &&
>         git config git-p4.skipSubmitEdit true &&
>         git p4 submit &&
>         git p4 rebase &&
>         git log p4/master | grep detached_head
>     )
> [...]
> Depot paths: //depot/
> Import destination: refs/remotes/p4/master
>
> Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
> Perforce db files in '.' will be created if missing...
>
> Traceback (most recent call last):
>   File "/home/runner/work/git/git/git-p4", line 4455, in <module>
>     main()
>   File "/home/runner/work/git/git/git-p4", line 4449, in main
>     if not cmd.run(args):
>   File "/home/runner/work/git/git/git-p4", line 2590, in run
>     rebase.rebase()
>   File "/home/runner/work/git/git/git-p4", line 4121, in rebase
>     if len(read_pipe("git diff-index HEAD --")) > 0:
>   File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
>     retcode, out, err = read_pipe_full(c, *k, **kw)
>   File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
>     p = subprocess.Popen(
>   File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
>     self._execute_child(args, executable, preexec_fn, close_fds,
>   File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
>     raise child_exception_type(errno_num, err_msg, err_filename)
> FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
> error: last command exited with $?=1
> + cleanup_git
> + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> + nr_tries_left=60
> + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> + test_path_is_missing /home/runner/work/git/git/t/trash
directory.t9800-git-p4-basic/git
> + test 1 -ne 1
> + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> + nr_tries_left=60
> + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
> + exit 1
> + eval_ret=1
> + :
> not ok 20 - submit from detached head
> #
> #	test_when_finished cleanup_git &&
> #	git p4 clone --dest="$git" //depot &&
> #	(
> #		cd "$git" &&
> #		git checkout p4/master &&
> #		>detached_head_test &&
> #		git add detached_head_test &&
> #		git commit -m "add detached_head" &&
> #		git config git-p4.skipSubmitEdit true &&
> #		git p4 submit &&
> #		git p4 rebase &&
> #		git log p4/master | grep detached_head
> #	)
> #
> Error: failed: t9800.21 submit from worktree
> [...]
>
> Is this the best UI we can have for test failures in CI runs? I hope we can
> do better. Having said that, this patch series presents a pretty good start,
> and offers a basis for future improvements.
>
> Johannes Schindelin (9):
>   ci: fix code style
>   ci/run-build-and-tests: take a more high-level view
>   ci: make it easier to find failed tests' logs in the GitHub workflow
>   ci/run-build-and-tests: add some structure to the GitHub workflow
>     output
>   tests: refactor --write-junit-xml code
>   test(junit): avoid line feeds in XML attributes
>   ci: optionally mark up output in the GitHub workflow
>   ci: use `--github-workflow-markup` in the GitHub workflow
>   ci: call `finalize_test_case_output` a little later
>
>  .github/workflows/main.yml           |  12 ---
>  ci/lib.sh                            |  81 ++++++++++++++--
>  ci/run-build-and-tests.sh            |  11 ++-
>  ci/run-test-slice.sh                 |   5 +-
>  t/test-lib-functions.sh              |   4 +-
>  t/test-lib-github-workflow-markup.sh |  50 ++++++++++
>  t/test-lib-junit.sh                  | 132 +++++++++++++++++++++++++++
>  t/test-lib.sh                        | 128 ++++----------------------
>  8 files changed, 287 insertions(+), 136 deletions(-)
>  create mode 100644 t/test-lib-github-workflow-markup.sh
>  create mode 100644 t/test-lib-junit.sh
>
> base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1117
> --
> gitgitgadget
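The red "Error:" lines and the summary-page entries shown in the quoted cover letter above are produced by another family of GitHub workflow commands, alongside the grouping ones. A minimal sketch of the mechanism (the `report_failure` helper name is hypothetical, not the series' actual code):

```shell
# "::error::<message>" makes GitHub Actions render the message in red
# in the log and surface it as an annotation on the run's summary page;
# "::notice::" does the same for informational text.
report_failure () {
	printf '::error::failed: %s\n' "$1"
}

report_failure "t9800.20 submit from detached head"
# → ::error::failed: t9800.20 submit from detached head
```

As with grouping, a runner outside GitHub Actions just sees the literal text, so emitting these commands unconditionally is harmless.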
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I saw the thread; I saw a few patches were commented on, and a few were left unanswered, but one was replied to by the original submitter with a "Good catch!", making me expect the topic to be discussed or rerolled to become ready relatively soon. But nothing happened, so I even forgot to take a look myself by picking it up in 'seen'. It does sound sad that the topic was left hanging there for 3 weeks or so in that state without any reroll or response.
On Sun, Feb 20 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I left some feedback on your submission ~3 weeks ago that you haven't responded to:

https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/

I think you should really reply to that before this moves forward; the concerns are not trivial. I think that, to get from our current "X" to your aimed-for "Y", your way of doing it (for part of this series) is really an overly complex way of getting there; we can do it much more simply, and the simpler way integrates much better with the GitHub CI UI.

The feedback I left is on the part of this that's not directly relevant to what you're pointing out here (which is the grouping of the per-test failure output), but if your series is picked up as-is we'd need to undo rather big parts of it to get to what I consider a better state for the "grouping" of the "make" vs. "make test" etc. output.

I can just submit my version of that and we can hash out what direction makes sense there; how does that sound? I've been running with it for about a month, and really think that part of the failure output is much better. Here's an example of that part:

https://github.com/avar/git/runs/5259000590?check_suite_focus=true

I.e. note how we now just have a "make" and a "make test" step, and we failed there on the "make". So we'd get to the point of simply invoking those build steps as 1:1 mapped CI steps, as opposed to "improving" ci/run-build-and-tests.sh to emulate that (I've just git rm'd it in my version).
> > On Mon, 24 Jan 2022, Johannes Schindelin via GitGitGadget wrote: > >> >> Background >> ========== >> >> Recent patches intended to help readers figure out CI failures much quicker >> than before. Unfortunately, they haven't been entirely positive for me. For >> example, they broke the branch protections in Microsoft's fork of Git, where >> we require Pull Requests to pass a certain set of Checks (which are >> identified by their names) and therefore caused follow-up work. >> >> Using CI and in general making it easier for new contributors is an area I'm >> passionate about, and one I'd like to see improved. >> >> >> The current situation >> ===================== >> >> Let me walk you through the current experience when a PR build fails: I get >> a notification mail that only says that a certain job failed. There's no >> indication of which test failed (or was it the build?). I can click on a >> link at it takes me to the workflow run. Once there, all it says is "Process >> completed with exit code 1", or even "code 2". Sure, I can click on one of >> the failed jobs. It even expands the failed step's log (collapsing the other >> steps). And what do I see there? >> >> Let's look at an example of a failed linux-clang (ubuntu-latest) job >> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]: >> >> [...] >> Test Summary Report >> ------------------- >> t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1) >> Failed test: 49 >> Non-zero exit status: 1 >> t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0) >> TODO passed: 45, 47 >> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU) >> Result: FAIL >> make[1]: *** [Makefile:53: prove] Error 1 >> make[1]: Leaving directory '/home/runner/work/git/git/t' >> make: *** [Makefile:3018: test] Error 2 >> >> >> That's it. I count myself lucky not to be a new contributor being faced with >> something like this. 
>> >> Now, since I am active in the Git project for a couple of days or so, I can >> make sense of the "TODO passed" label and know that for the purpose of >> fixing the build failures, I need to ignore this, and that I need to focus >> on the "Failed test" part instead. >> >> I also know that I do not have to get myself an ubuntu-latest box just to >> reproduce the error, I do not even have to check out the code and run it >> just to learn what that "49" means. >> >> I know, and I do not expect any new contributor, not even most seasoned >> contributors to know, that I have to patiently collapse the "Run >> ci/run-build-and-tests.sh" job's log, and instead expand the "Run >> ci/print-test-failures.sh" job log (which did not fail and hence does not >> draw any attention to it). >> >> I know, and again: I do not expect many others to know this, that I then >> have to click into the "Search logs" box (not the regular web browser's >> search via Ctrl+F!) and type in "not ok" to find the log of the failed test >> case (and this might still be a "known broken" one that is marked via >> test_expect_failure and once again needs to be ignored). >> >> To be excessively clear: This is not a great experience! >> >> >> Improved output >> =============== >> >> Our previous Azure Pipelines-based CI builds had a much nicer UI, one that >> even showed flaky tests, and trends e.g. how long the test cases ran. When I >> ported Git's CI over to GitHub workflows (to make CI more accessible to new >> contributors), I knew fully well that we would leave this very nice UI >> behind, and I had hoped that we would get something similar back via new, >> community-contributed GitHub Actions that can be used in GitHub workflows. >> However, most likely because we use a home-grown test framework implemented >> in opinionated POSIX shells scripts, that did not happen. >> >> So I had a look at what standards exist e.g. 
when testing PowerShell >> modules, in the way of marking up their test output in GitHub workflows, and >> I was not disappointed: GitHub workflows support "grouping" of output lines, >> i.e. marking sections of the output as a group that is then collapsed by >> default and can be expanded. And it is this feature I decided to use in this >> patch series, along with GitHub workflows' commands to display errors or >> notices that are also shown on the summary page of the workflow run. Now, in >> addition to "Process completed with exit code" on the summary page, we also >> read something like: >> >> ⊗ linux-gcc (ubuntu-latest) >> failed: t9800.20 submit from detached head >> >> >> Even better, this message is a link, and following that, the reader is >> presented with something like this >> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]: >> >> ⏵ Run ci/run-build-and-tests.sh >> ⏵ CI setup >> + ln -s /home/runner/none/.prove t/.prove >> + run_tests=t >> + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main >> + group Build make >> + set +x >> ⏵ Build >> ⏵ Run tests >> === Failed test: t9800-git-p4-basic === >> ⏵ ok: t9800.1 start p4d >> ⏵ ok: t9800.2 add p4 files >> ⏵ ok: t9800.3 basic git p4 clone >> ⏵ ok: t9800.4 depot typo error >> ⏵ ok: t9800.5 git p4 clone @all >> ⏵ ok: t9800.6 git p4 sync uninitialized repo >> ⏵ ok: t9800.7 git p4 sync new branch >> ⏵ ok: t9800.8 clone two dirs >> ⏵ ok: t9800.9 clone two dirs, @all >> ⏵ ok: t9800.10 clone two dirs, @all, conflicting files >> ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit >> ⏵ ok: t9800.12 clone using non-numeric revision ranges >> ⏵ ok: t9800.13 clone with date range, excluding some changes >> ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output >> ⏵ ok: t9800.15 exit gracefully for p4 server errors >> ⏵ ok: t9800.16 clone --bare should make a bare repository >> ⏵ ok: t9800.17 initial import time from top change time >> ⏵ ok: t9800.18 unresolvable host in 
P4PORT should display error >> ⏵ ok: t9800.19 run hook p4-pre-submit before submit >> Error: failed: t9800.20 submit from detached head >> ⏵ failure: t9800.20 submit from detached head >> Error: failed: t9800.21 submit from worktree >> ⏵ failure: t9800.21 submit from worktree >> === Failed test: t9801-git-p4-branch === >> [...] >> >> >> The "Failed test:" lines are colored in yellow to give a better visual clue >> about the logs' structure, the "Error:" label is colored in red to draw the >> attention to the important part of the log, and the "⏵" characters indicate >> that part of the log is collapsed and can be expanded by clicking on it. >> >> To drill down, the reader merely needs to expand the (failed) test case's >> log by clicking on it, and then study the log. If needed (e.g. when the test >> case relies on side effects from previous test cases), the logs of preceding >> test cases can be expanded as well. In this example, when expanding >> t9800.20, it looks like this (for ease of reading, I cut a few chunks of >> lines, indicated by "[...]"): >> >> [...] >> ⏵ ok: t9800.19 run hook p4-pre-submit before submit >> Error: failed: t9800.20 submit from detached head >> ⏷ failure: t9800.20 submit from detached head >> test_when_finished cleanup_git && >> git p4 clone --dest="$git" //depot && >> ( >> cd "$git" && >> git checkout p4/master && >> >detached_head_test && >> git add detached_head_test && >> git commit -m "add detached_head" && >> git config git-p4.skipSubmitEdit true && >> git p4 submit && >> git p4 rebase && >> git log p4/master | grep detached_head >> ) >> [...] >> Depot paths: //depot/ >> Import destination: refs/remotes/p4/master >> >> Importing revision 9 (100%)Perforce db files in '.' will be created if missing... >> Perforce db files in '.' will be created if missing... 
>> >> Traceback (most recent call last): >> File "/home/runner/work/git/git/git-p4", line 4455, in <module> >> main() >> File "/home/runner/work/git/git/git-p4", line 4449, in main >> if not cmd.run(args): >> File "/home/runner/work/git/git/git-p4", line 2590, in run >> rebase.rebase() >> File "/home/runner/work/git/git/git-p4", line 4121, in rebase >> if len(read_pipe("git diff-index HEAD --")) > 0: >> File "/home/runner/work/git/git/git-p4", line 297, in read_pipe >> retcode, out, err = read_pipe_full(c, *k, **kw) >> File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full >> p = subprocess.Popen( >> File "/usr/lib/python3.8/subprocess.py", line 858, in __init__ >> self._execute_child(args, executable, preexec_fn, close_fds, >> File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child >> raise child_exception_type(errno_num, err_msg, err_filename) >> FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --' >> error: last command exited with $?=1 >> + cleanup_git >> + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + nr_tries_left=60 >> + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + test 1 -ne 1 >> + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + nr_tries_left=60 >> + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git >> + exit 1 >> + eval_ret=1 >> + : >> not ok 20 - submit from detached head >> # >> # test_when_finished cleanup_git && >> # git p4 clone --dest="$git" //depot && >> # ( >> # cd "$git" && >> # git checkout p4/master && >> # >detached_head_test && >> # git add detached_head_test && >> # git commit -m "add detached_head" && >> # git config git-p4.skipSubmitEdit true && >> # git p4 
submit && >> # git p4 rebase && >> # git log p4/master | grep detached_head >> # ) >> # >> Error: failed: t9800.21 submit from worktree >> [...] >> >> >> Is this the best UI we can have for test failures in CI runs? I hope we can >> do better. Having said that, this patch series presents a pretty good start, >> and offers a basis for future improvements. >> >> Johannes Schindelin (9): >> ci: fix code style >> ci/run-build-and-tests: take a more high-level view >> ci: make it easier to find failed tests' logs in the GitHub workflow >> ci/run-build-and-tests: add some structure to the GitHub workflow >> output >> tests: refactor --write-junit-xml code >> test(junit): avoid line feeds in XML attributes >> ci: optionally mark up output in the GitHub workflow >> ci: use `--github-workflow-markup` in the GitHub workflow >> ci: call `finalize_test_case_output` a little later >> >> .github/workflows/main.yml | 12 --- >> ci/lib.sh | 81 ++++++++++++++-- >> ci/run-build-and-tests.sh | 11 ++- >> ci/run-test-slice.sh | 5 +- >> t/test-lib-functions.sh | 4 +- >> t/test-lib-github-workflow-markup.sh | 50 ++++++++++ >> t/test-lib-junit.sh | 132 +++++++++++++++++++++++++++ >> t/test-lib.sh | 128 ++++---------------------- >> 8 files changed, 287 insertions(+), 136 deletions(-) >> create mode 100644 t/test-lib-github-workflow-markup.sh >> create mode 100644 t/test-lib-junit.sh >> >> >> base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a >> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v1 >> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v1 >> Pull-Request: https://github.com/gitgitgadget/git/pull/1117 >> -- >> gitgitgadget >> >>
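For readers unfamiliar with how the collapsible sections described in the cover letter are produced: GitHub Actions recognizes so-called "workflow commands" in a step's stdout, and `::group::`/`::endgroup::` pairs are what render as the "⏵" markers in the log viewer. A minimal sketch of that mechanism follows; the helper names are hypothetical, not the series' actual code, and only the `::group::`/`::endgroup::` syntax is GitHub's documented interface:

```shell
# Minimal sketch of GitHub Actions' grouping workflow commands.
# The helper names (begin_group/end_group) are made up for illustration.

begin_group () {
	# Start a collapsible section; the runner renders its title
	# behind a clickable "⏵" marker.
	printf '::group::%s\n' "$1"
}

end_group () {
	# Close the current collapsible section.
	printf '::endgroup::\n'
}

# A passing test case's log, collapsed by default:
begin_group "ok: t9800.19 run hook p4-pre-submit before submit"
echo "... full trace output of the test case goes here ..."
end_group
```

When the GitHub runner sees these lines it folds everything between them into a section that expands on click; in a local terminal they are simply printed literally, so such markup has to be emitted conditionally.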
Hi Junio, On Sat, 19 Feb 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > I notice that you did not take this into `seen` yet. I find that a little > > sad because it would potentially have helped others to figure out the > > failure in the latest `seen`: > > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162 > > > > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3. > > I saw the thread, I saw a few patches were commented on, and a few > were left unanswered, but one was replied by the original submitter > with a "Good catch!", making me expect the topic to be discussed or > rerolled to become ready relatively soon. Yes, I have local changes, but I had really hoped that this patch series would get a chance to prove its point by example, i.e. by offering the improved output for the failures in `seen`. I hoped that because I think that those improvements speak for themselves when you see them. Ciao, Dscho
On Sun, Feb 20 2022, Johannes Schindelin wrote: > Hi Junio, > > On Sat, 19 Feb 2022, Junio C Hamano wrote: > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >> > I notice that you did not take this into `seen` yet. I find that a little >> > sad because it would potentially have helped others to figure out the >> > failure in the latest `seen`: >> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162 >> > >> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3. >> >> I saw the thread, I saw a few patches were commented on, and a few >> were left unanswered, but one was replied by the original submitter >> with a "Good catch!", making me expect the topic to be discussed or >> rerolled to become ready relatively soon. > > Yes, I have local changes, but I had really hoped that this patch series > would get a chance to prove its point by example, i.e. by offering the > improved output for the failures in `seen`. I hoped that because I think > that those improvements speak for themselves when you see them. I think it's a good idea to get wider exposure in "seen", "next" etc. for topics where the bottleneck is lack of feedback due to lack of wider exposure. But in this case I've pointed out both direction & UX issues to you that you haven't addressed. Both what I sent a reminder of yesterday in [1], and more relevant to what you're discussing here a reply [2] where I looked & tested your new output vs. the old, and found that on test failures it: * Replaced summary output with a much more verbose version. * Turned the GitHub UI from usable (but sometimes hard to find the needle in the haystack) to *extremely slow*, seemingly because the browser was asked to make sense of ~30k lines of output, with some of it hidden dynamically by JavaScript. 1. https://lore.kernel.org/git/220220.86bkz1d7hm.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/
Hi Ævar, On Mon, 21 Feb 2022, Ævar Arnfjörð Bjarmason wrote: > On Sun, Feb 20 2022, Johannes Schindelin wrote: > > > On Sat, 19 Feb 2022, Junio C Hamano wrote: > > > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > >> > >> > I notice that you did not take this into `seen` yet. I find that a little > >> > sad because it would potentially have helped others to figure out the > >> > failure in the latest `seen`: > >> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162 > >> > > >> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3. > >> > >> I saw the thread, I saw a few patches were commented on, and a few > >> were left unanswered, but one was replied by the original submitter > >> with a "Good catch!", making me expect the topic to be discussed or > >> rerolled to become ready relatively soon. > > > > Yes, I have local changes, but I had really hoped that this patch series > > would get a chance to prove its point by example, i.e. by offering the > > improved output for the failures in `seen`. I hoped that because I think > > that those improvements speak for themselves when you see them. > > I think it's a good idea to get wider expose in "seen", "next" etc. for > topics where the bottleneck is lack of feedback due to lack of wider > exposure. Having this in `seen` will give the patch series a chance to show in real life how it improves the process of analyzing regressions. Ciao, Johannes
Hi Ævar, On Sun, 20 Feb 2022, Ævar Arnfjörð Bjarmason wrote: > On Sun, Feb 20 2022, Johannes Schindelin wrote: > > > I notice that you did not take this into `seen` yet. I find that a little > > sad because it would potentially have helped others to figure out the > > failure in the latest `seen`: > > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162 > > > > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3. > > I left some feedback on your submission ~3 weeks ago that you haven't > responded to: > https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/ You answered my goal of making it easier to figure out regressions by doubling down on hiding the logs even better. That's not feedback, that's just ignoring the goal. You answered my refactor of the Azure Pipelines support with the question "why?" that I had answered already a long time ago. That's not feedback, that's ignoring the answers I already provided. I don't know how to respond to that, therefore I didn't. Ciao, Johannes
On Tue, Feb 22 2022, Johannes Schindelin wrote: > Hi Ævar, > > On Sun, 20 Feb 2022, Ævar Arnfjörð Bjarmason wrote: > >> On Sun, Feb 20 2022, Johannes Schindelin wrote: >> >> > I notice that you did not take this into `seen` yet. I find that a little >> > sad because it would potentially have helped others to figure out the >> > failure in the latest `seen`: >> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162 >> > >> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3. >> >> I left some feedback on your submission ~3 weeks ago that you haven't >> responded to: >> https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/ > > You answered my goal of making it easier to figure out regressions by > doubling down on hiding the logs even better. That's not feedback, that's > just ignoring the goal. I think it's clear to anyone reading my feedback that that's either a gross misreading of the feedback I provided or an intentional misrepresentation. I don't mention the second one of those lightly, but I think after some months of that pattern now when commenting on various patches of yours it's not an unfair claim. I.e. you generally seem to latch onto some very narrow interpretation or comment in some larger feedback pointing out various issues to you, and use that as a reason not to respond to or address any of the rest. So just to make the point about one of those mentioned in my [1] with some further details (I won't go into the whole thing to avoid repeating myself): I opened both of: https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true https://github.com/dscho/git/runs/4840190622?check_suite_focus=true Just now in Firefox 91.5.0esr-1. Both having been opened before, so they're in cache, and I've got a current 40MB/s real downlink speed etc. The former fully loads in around 5100ms, with your series here that's just short of 18000ms. 
So your CI changes are making the common case of just looking at a CI failure more than **3x as slow as before**. That's according to the "performance" timeline, and not some abstract "some JS was still running in the background". It lines up with the time that the scroll bar on the side of the screen stops moving, and the viewport does the "zoom to end" thing in GitHub CI's UI, focusing on the error reported. It was really slow before, but it's SLOOOOOW now. All of which (and I'm no webdev expert) seems to have to do with the browser engine struggling to keep up with a much larger set of log data being thrown at it, which despite the eliding you're adding can be seen in the ~1.7k lines of output growing to beyond ~33k now. Once it loads, the end result after all of that (re your "doubling down on hiding the logs even better") is that I need to (and I've got a sizable vertically mounted screen) scroll through around 6 pages of output, each of which takes around 3 seconds of Firefox churning on more than 100% CPU before it shows me the next page. And even *if* it were instant, the names of the failing tests are now spread across several pages of output, whereas in the "prove" output we have a quick overall summary separated from the "ci/print-test-failures.sh" output. Does that mean the current output is perfect and can't be improved? No, I also think it sucks. I just think that the current implementation you've proposed for improving it is making it worse overall. Which doesn't mean that it couldn't be addressed, fixed, or that the core idea of using that "group" syntax to aggregate that output into sections is bad. I think we should use it, just not as it's currently implemented. If that's "not feedback" I don't know what is. It's all relevant, and while I'm elaborating further here, [1] sent almost a month ago notes the same issues. > You answered my refactor of the Azure Pipelines support with the question > "why?" that I had answered already a long time ago. 
That's not feedback, > that's ignoring the answers I already provided. I think the gap between that answer and what I was asking you is clear from the parallel follow-up discussion at [2]. But even your answer there of just wanting to keep it in place doesn't really answer that question for this series. You're not just keeping that stale code in place, but actively changing it. I.e. even if you run with all that, how are others meant to test and review the changes being proposed here? I.e. is resurrecting Azure CI required to test this series, or should reviewers ignore those parts and just hope it all works etc.? > I don't know how to respond to that, therefore I didn't. I think that, whatever differences in direction for this CI feature we have, or troubles understanding one another, your update after 3 weeks of not replying to that feedback [3], asking why Junio didn't pick up your patches as though nothing had been said about them at all, is not a good way to proceed. I.e. we're not the only people talking here, there's presumably others who'll read these threads and will want to comment on the direction of any CI changes. Knowing from you that you read outstanding feedback and didn't understand it, or read it and summarized it but ultimately decided to change nothing, etc., makes for a much better flow on the ML than just ignoring that feedback entirely. 1. https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/ 3. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202200043590.26495@tvgsbejvaqbjf.bet/
On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote: > [...] > So just to make the point about one of those mentioned in my [1] with > some further details (I won't go into the whole thing to avoid repeating > myself): > > I opened both of: > > https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true > https://github.com/dscho/git/runs/4840190622?check_suite_focus=true > > Just now in Firefox 91.5.0esr-1. Both having been opened before, so > they're in cache, and I've got a current 40MB/s real downlink speed etc. > > The former fully loads in around 5100ms, with your series here that's > just short of 18000ms. > > So your CI changes are making the common case of just looking at a CI > failure more than **3x as slow as before**. I don't think that is the most useful comparison between the two. When I am investigating a test failure, the time that matters to me is the time it takes to display the output of the failing test case. With the first link above the initial page load is faster, but to get to the output of the failing test case I have to click on "Run ci/print_test_failures.sh", then wait for that to load and then search for "not ok" to actually get to the information I'm after. With the second link the initial page load does feel slower, but then I'm presented with the test failures nicely highlighted in red; all I have to do is click on one and I've got the information I'm after. Overall that is much faster and easier to use. Best Wishes Phillip
On Wed, Feb 23 2022, Phillip Wood wrote: > On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote: >> [...] >> So just to make the point about one of those mentioned in my [1] with >> some further details (I won't go into the whole thing to avoid repeating >> myself): >> I opened both of: >> https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true >> https://github.com/dscho/git/runs/4840190622?check_suite_focus=true >> Just now in Firefox 91.5.0esr-1. Both having been opened before, so >> they're in cache, and I've got a current 40MB/s real downlink speed etc. >> The former fully loads in around 5100ms, with your series here >> that's >> just short of 18000ms. >> So your CI changes are making the common case of just looking at a >> CI >> failure more than **3x as slow as before**. > > I don't think that is the most useful comparison between the two.[...] I'm not saying that it's the most useful comparison between the two, but that there's a major performance regression introduced in this series that so far isn't addressed or noted. > [...]When > I am investigating a test failure the time that matters to me is the > time it takes to display the output of the failing test case. With the > first link above the initial page load is faster but to get to the > output of the failing test case I have click on "Run > ci/print_test_failures.sh" then wait for that to load and then search > for "not ok" to actually get to the information I'm after. With the > second link the initial page load does feel slower but then I'm > presented with the test failures nicely highlighted in red, all I > have to do is click on one and I've got the information I'm > after. Overall that is much faster and easier to use. Whether you think the regression is worth the end result is a subjective judgement. I don't think it is, but I don't think you or anyone else is wrong if they don't agree. 
If you think it's OK to spend ~20s instead of ~5s on rendering the new output, that's something that clearly depends on how much you value the new output, and how much you're willing to wait. What I am saying, and what I hope you'll agree with, is that it's something that should be addressed in some way by this series. One way to do that would be to note the performance regression in a commit message, and argue that despite the slowdown it's worth it.
Hi Phillip, On Wed, 23 Feb 2022, Phillip Wood wrote: > On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote: > > [...] > > So just to make the point about one of those mentioned in my [1] with > > some further details (I won't go into the whole thing to avoid repeating > > myself): > > > > I opened both of: > > > > https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true > > https://github.com/dscho/git/runs/4840190622?check_suite_focus=true > > > > Just now in Firefox 91.5.0esr-1. Both having been opened before, so > > they're in cache, and I've got a current 40MB/s real downlink speed etc. > > > > The former fully loads in around 5100ms, with your series here that's > > just short of 18000ms. > > > > So your CI changes are making the common case of just looking at a CI > > failure more than **3x as slow as before**. > > I don't think that is the most useful comparison between the two. When I > am investigating a test failure the time that matters to me is the time > it takes to display the output of the failing test case. Thank you for expressing this so clearly. I will adopt a variation of this phrasing in my commit message, if you don't mind? > With the first link above the initial page load is faster but to get to > the output of the failing test case I have click on "Run > ci/print_test_failures.sh" then wait for that to load and then search > for "not ok" to actually get to the information I'm after. And that's only because you are familiar with what you have to do. Any new contributor would be stuck with the information presented on the initial load, without any indication that more information _is_ available, just hidden away in the next step's log (which is marked as "succeeding", therefore misleading the inclined reader into thinking that this cannot potentially contain any information pertinent to the _failure_ that needs to be investigated). 
> With the second link the initial page load does feel slower but then I'm > presented with the test failures nicely highlighted in red, all I have > to do is click on one and I've got the information I'm after. > > Overall that is much faster and easier to use. Thank you for your comment. I really started to doubt myself, getting the idea that it's just a case of me holding this thing wrong. For what it's worth, I did make a grave mistake by using that particular `seen` CI failure with all of those failing p4 tests, which obviously resulted in an incredibly large amount of logs. Obviously that _must_ be slow to load. I just did not have the time to fabricate a CI failure. However, I do agree with you that the large amount of logs would have to be looked at _anyway_, whether it is shown upon loading the job's logs or only when expanding the `print-test-failures` step's logs. The amount of the logs is a constant, after all, I did not change anything there (nor would I). So a better example might be my concrete use case yesterday: the CI build of `seen` failed. Here is the link to the regular output: https://github.com/git/git/actions/runs/1890665968 On that page, you see the following: Annotations 8 errors and 1 warning ⓧ win test (3) Process completed with exit code 2. ⓧ win test (6) Process completed with exit code 2. ⓧ win test (2) Process completed with exit code 2. ⓧ win+VS test (3) Process completed with exit code 2. ⓧ win+VS test (6) Process completed with exit code 2. ⓧ win+VS test (2) Process completed with exit code 2. ⓧ osx-gcc (macos-latest) Process completed with exit code 2. ⓧ osx-clang (macos-latest) Process completed with exit code 2. ⚠ CI: .github#L1 windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856 So I merged my branch into `seen` and pushed it. 
The corresponding run can be seen here: https://github.com/dscho/git/actions/runs/1892982393 On that page, you see the following: Annotations 50 errors and 1 warning ⓧ win test (3) failed: t7527.1 explicit daemon start and stop ⓧ win test (3) failed: t7527.2 implicit daemon start ⓧ win test (3) failed: t7527.3 implicit daemon stop (delete .git) ⓧ win test (3) failed: t7527.4 implicit daemon stop (rename .git) ⓧ win test (3) failed: t7527.5 implicit daemon stop (rename GIT~1) ⓧ win test (3) failed: t7527.6 implicit daemon stop (rename GIT~2) ⓧ win test (3) failed: t7527.8 cannot start multiple daemons ⓧ win test (3) failed: t7527.10 update-index implicitly starts daemon ⓧ win test (3) failed: t7527.11 status implicitly starts daemon ⓧ win test (3) failed: t7527.12 edit some files ⓧ win test (2) failed: t0012.81 fsmonitor--daemon can handle -h ⓧ win test (2) Process completed with exit code 1. ⓧ win test (6) failed: t7519.2 run fsmonitor-daemon in bare repo ⓧ win test (6) failed: t7519.3 run fsmonitor-daemon in virtual repo ⓧ win test (6) Process completed with exit code 1. ⓧ win+VS test (3) failed: t7527.1 explicit daemon start and stop ⓧ win+VS test (3) failed: t7527.2 implicit daemon start ⓧ win+VS test (3) failed: t7527.3 implicit daemon stop (delete .git) ⓧ win+VS test (3) failed: t7527.4 implicit daemon stop (rename .git) ⓧ win+VS test (3) failed: t7527.5 implicit daemon stop (rename GIT~1) ⓧ win+VS test (3) failed: t7527.6 implicit daemon stop (rename GIT~2) ⓧ win+VS test (3) failed: t7527.8 cannot start multiple daemons ⓧ win+VS test (3) failed: t7527.10 update-index implicitly starts daemon ⓧ win+VS test (3) failed: t7527.11 status implicitly starts daemon ⓧ win+VS test (3) failed: t7527.12 edit some files ⓧ win+VS test (2) failed: t0012.81 fsmonitor--daemon can handle -h ⓧ win+VS test (2) Process completed with exit code 1. 
ⓧ win+VS test (6) failed: t7519.2 run fsmonitor-daemon in bare repo ⓧ win+VS test (6) failed: t7519.3 run fsmonitor-daemon in virtual repo ⓧ win+VS test (6) Process completed with exit code 1. ⓧ osx-clang (macos-latest) failed: t0012.81 fsmonitor--daemon can handle -h ⓧ osx-clang (macos-latest) failed: t7519.2 run fsmonitor-daemon in bare repo ⓧ osx-clang (macos-latest) failed: t7527.1 explicit daemon start and stop ⓧ osx-clang (macos-latest) failed: t7527.2 implicit daemon start ⓧ osx-clang (macos-latest) failed: t7527.3 implicit daemon stop (delete .git) ⓧ osx-clang (macos-latest) failed: t7527.4 implicit daemon stop (rename .git) ⓧ osx-clang (macos-latest) failed: t7527.7 MacOS event spelling (rename .GIT) ⓧ osx-clang (macos-latest) failed: t7527.8 cannot start multiple daemons ⓧ osx-clang (macos-latest) failed: t7527.10 update-index implicitly starts daemon ⓧ osx-clang (macos-latest) failed: t7527.11 status implicitly starts daemon ⓧ osx-gcc (macos-latest) failed: t0012.81 fsmonitor--daemon can handle -h ⓧ osx-gcc (macos-latest) failed: t7519.2 run fsmonitor-daemon in bare repo ⓧ osx-gcc (macos-latest) failed: t7527.1 explicit daemon start and stop ⓧ osx-gcc (macos-latest) failed: t7527.2 implicit daemon start ⓧ osx-gcc (macos-latest) failed: t7527.3 implicit daemon stop (delete .git) ⓧ osx-gcc (macos-latest) failed: t7527.4 implicit daemon stop (rename .git) ⓧ osx-gcc (macos-latest) failed: t7527.7 MacOS event spelling (rename .GIT) ⓧ osx-gcc (macos-latest) failed: t7527.8 cannot start multiple daemons ⓧ osx-gcc (macos-latest) failed: t7527.10 update-index implicitly starts daemon ⓧ osx-gcc (macos-latest) failed: t7527.11 status implicitly starts daemon ⚠ CI: .github#L1 windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856 In my mind, this is already an improvement. 
(Even if this is a _lot_ of output, and a lot of individual errors, given that all of them are fixed with a single, small patch to adjust an option usage string, but that's not the fault of my patch series, but of the suggestion to put the check for the option usage string linting into the `parse_options()` machinery instead of into the static analysis job.) Since there are still plenty of failures, the page admittedly does load relatively slowly. But that's not the time I was trying to optimize for. My time comes at quite a premium these days, so if the computer has to work a little harder while I can do something else, as long as it saves _me_ time, I'll take that time. Every time. Ciao, Dscho
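As background for the "Annotations" lists compared in the two runs above: each entry corresponds to an error annotation workflow command printed by a step, which GitHub collects onto the run's summary page, labeled with the job name. A hypothetical sketch of how a test harness might emit one annotation per failed test case (the helper name is made up for illustration; only the `::error::` syntax is GitHub's documented interface):

```shell
# Hypothetical helper emitting one "::error::" workflow command per
# failed test case; GitHub surfaces each such line on the run's summary
# page as an annotation like
# "win test (3) / failed: t7527.1 explicit daemon start and stop".

report_failed_test () {
	test_id=$1
	shift
	printf '::error::failed: %s %s\n' "$test_id" "$*"
}

report_failed_test t7527.1 "explicit daemon start and stop"
```

Because the annotation text replaces the generic "Process completed with exit code 2", the summary page itself names the failing test cases, which is the improvement being claimed here.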
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > So I merged my branch into `seen` and pushed it. The corresponding run can > be seen here: > > https://github.com/dscho/git/actions/runs/1892982393 I visited this page (while logged in to GitHub---I am saying this for others who may not know that the output is shown differently for visitors that are not logged in and for logged-in users). > On that page, you see the following: > > Annotations > 50 errors and 1 warning > > ⓧ win test (3) > failed: t7527.1 explicit daemon start and stop > ... > > ⚠ CI: .github#L1 > windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856 > > In my mind, this is already an improvement. (Even if this is a _lot_ of > output, and a lot of individual errors, given that all of them are fixed > with a single, small patch to adjust an option usage string, but that's > not the fault of my patch series, but of the suggestion to put the check > for the option usage string linting into the `parse_options()` machinery > instead of into the static analysis job.) It is not obvious to your readers what aspect of the new output _you_ found "an improvement", because you didn't spell it out. That makes "in my mind, this is already an improvement" a claim that is unnecessarily weaker than it really is. Let me tell my experience: - Clicking on macos+clang in the map-looking thing, it did show and scroll down automatically to show the last failure link ready to be clicked after a few seconds, which was nice, but made me scroll back to see the first failure, which could have been better. - Clicking on win+VS test (2), the failed <test> part was automatically opened, and a circle spun for several dozens of seconds to make me wait, but after that, nothing happened. 
It was somewhat hard to know if I were expected to do something to view the first error and when the UI is ready to let me do so, or if I were just expected to wait a bit longer for it to all happen automatically. Either case, the presentation to fold all the pieces that finished successfully made it usable, as that saved human time to scan to where failures are shown. I personally do not care about the initial latency when viewing the output from CI run that may have happened a few dozens of minutes ago (I do not sit in front of GitHub CI UI and wait until it finishes). As long as it is made clear when I can start interacting with it, I can just open the page and let it load while I am working on something else. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Let me tell my experience: > > - Clicking on macos+clang in the map-looking thing, it did show and > scroll down automatically to show the last failure link ready to > be clicked after a few seconds, which was nice, but made me > scroll back to see the first failure, which could have been > better. > > - Clicking on win+VS test (2), the failed <test> part was > automatically opened, and a circle spinned for several dozens of > seconds to make me wait, but after that, nothing happened. It > was somewhat hard to know if I were expected to do something to > view the first error and when the UI is ready to let me do so, or > if I were just expected to wait a bit longer for it to all happen > automatically. > > Either case, the presentation to fold all the pieces that finished > successfully made it usable, as that saved human time to scan to > where failures are shown. > > I personally do not care about the initial latency when viewing the > output from CI run that may have happened a few dozens of minutes > ago (I do not sit in front of GitHub CI UI and wait until it > finishes). As long as it is made clear when I can start interacting > with it, I can just open the page and let it load while I am working > on something else. FWIW, CI run on "seen" uses this series. When I highlight a failure at CI, I often give a URL like this: https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520 I notice that this "hide by default" forces the recipient of the URL to click the line after the line with a red highlight before they can view the breakage. For example, an URL to show a similar breakage from the old run (without this series) looks like this: https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968 This directly jumps to the error and the recipient of the URL does not have to do anything special, which I have been using as a convenient way to give developers a starting point. 
I haven't compared the implementation of this one and Ævar's series that aims for a different goal, so I do not yet have an opinion on which one should come first (if we want to achieve both of what each of them wants to achieve, that is). Thanks.
Junio C Hamano <gitster@pobox.com> writes:
> FWIW, CI run on "seen" uses this series.
Another "early impression". I had to open this one today,
https://github.com/git/git/runs/5367854000?check_suite_focus=true
which was a jarring experience. It correctly painted the fourth
circle "Run ci/run-build-and-tests.sh" in red with X in it, and
after waiting for a while (which I already said that I do not mind
at all), showed a bunch of lines, and then auto-scrolled down to the
end of that section.
It _looked_ like that it was now ready for me to interact with it,
so I started to scroll up to the beginning of that section, but I
had to stare at blank space for several minutes before lines are
shown to occupy that space. During the repainting, unlike the
initial delay-wait that lets me know that it is not ready by showing
the spinning circle, there was no indication that it wants me to
wait until it fills the blank space with lines. Not very pleasant.
I do not think it is so bad to say that it is less pleasant than
opening the large "print test failures" section and looking for "not
ok", which was what the original CI UI we had before this series.
But at least with the old one, once the UI becomes ready for me to
interact with, I didn't have to wait for (for the lack of better
phrase) such UI hiccups. Responses to looking for the next instance
of "not ok" was predictable.
Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> FWIW, CI run on "seen" uses this series. > > Another "early impression". I had to open this one today, > > https://github.com/git/git/runs/5367854000?check_suite_focus=true > > which was a jarring experience. It correctly painted the fourth > circle "Run ci/run-build-and-tests.sh" in red with X in it, and > after waiting for a while (which I already said that I do not mind > at all), showed a bunch of line, and then auto-scrolled down to the > end of that section. > > It _looked_ like that it was now ready for me to interact with it, > so I started to scroll up to the beginning of that section, but I > had to stare at blank space for several minutes before lines are Nah, that was several seconds, not minutes. Even though I am on Chromebooks, they are not _that_ slow ;-) > shown to occupy that space. During the repainting, unlike the > initial delay-wait that lets me know that it is not ready by showing > the spinning circle, there was no indication that it wants me to > wait until it fills the blank space with lines. Not very pleasant. > > I do not think it is so bad to say that it is less pleasant than > opening the large "print test failures" section and looking for "not > ok", which was what the original CI UI we had before this series. > But at least with the old one, once the UI becomes ready for me to > interact with, I didn't have to wait for (for the lack of better > phrase) such UI hiccups. Responses to looking for the next instance > of "not ok" was predictable. > > Thanks.
Hi Junio,

On Sat, 26 Feb 2022, Junio C Hamano wrote:

> When I highlight a failure at CI, I often give a URL like this:
>
>     https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
> I notice that this "hide by default" forces the recipient of the URL to click the line after the line with a red highlight before they can view the breakage.

Yes, that's because line 5520 is the header of that group.  If you direct the reader to

    https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5674

instead, it will get expanded.

If it would not get expanded, that would be a bug, but obviously not in my patch series.

Ciao,
Dscho
Hi Junio,

On Mon, 28 Feb 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> FWIW, CI run on "seen" uses this series.
>
> Another "early impression".  I had to open this one today,
>
>     https://github.com/git/git/runs/5367854000?check_suite_focus=true
>
> which was a jarring experience.  It correctly painted the fourth circle "Run ci/run-build-and-tests.sh" in red with X in it, and after waiting for a while (which I already said that I do not mind at all), showed a bunch of lines, and then auto-scrolled down to the end of that section.
>
> It _looked_ like it was now ready for me to interact with, so I started to scroll up to the beginning of that section, but I had to stare at blank space for several minutes before lines are shown to occupy that space.  During the repainting, unlike the initial delay-wait that lets me know that it is not ready by showing the spinning circle, there was no indication that it wants me to wait until it fills the blank space with lines.  Not very pleasant.
>
> I do not think it is so bad to say that it is less pleasant than opening the large "print test failures" section and looking for "not ok", which was what the original CI UI we had before this series.  But at least with the old one, once the UI becomes ready for me to interact with, I didn't have to wait for (for the lack of better phrase) such UI hiccups.  Responses to looking for the next instance of "not ok" was predictable.

Let me again state my goal clearly, because some readers seem to be confused and believe that I want to improve the developer experience of veterans of the Git mailing list who are more than capable of finding their way through the build failures.

My goal is instead to make new contributors' lives easier.  It is a pretty high bar we set, expecting a new contributor faced with a build failure to figure out how to fix the breakage.  It's as if we _wanted_ to instill impostors' syndrome in them, which is probably not intentional.

In that respect, a relatively responsive page that utterly fails to direct the reader to the culprit is far, far worse than a slightly bumpy page that _does_ lead the reader in the right direction.

In any case, thank you for integrating the patches into `seen` so that the impact of the patches can be "seen".

Ciao,
Dscho
Hi Junio,

On Fri, 25 Feb 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On that page, you see the following:
>>
>>     Annotations
>>     50 errors and 1 warning
>>
>>     ⓧ win test (3)
>>     failed: t7527.1 explicit daemon start and stop
>>     ...
>>
>>     ⚠ CI: .github#L1
>>     windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
>>
>> In my mind, this is already an improvement.  (Even if this is a _lot_ of output, and a lot of individual errors, given that all of them are fixed with a single, small patch to adjust an option usage string, but that's not the fault of my patch series, but of the suggestion to put the check for the option usage string linting into the `parse_options()` machinery instead of into the static analysis job.)
>
> It is not obvious what aspect in the new output _you_ found "an improvement" to your readers, because you didn't spell it out.

True.  The difference is those added lines that clearly indicate not only the failed job, but also the precise test case that failed.  All in the summary page, without requiring any clicking nor scrolling.

Thank you,
Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Let me again state my goal clearly, because some readers seem to be confused and believe that I want to improve the developer experience of veterans of the Git mailing list who are more than capable of finding their way through the build failures.
>
> My goal is instead to make new contributors' lives easier.

I understand who your primary target audiences are, and making them happy without robbing too much from others is a good thing to do.

Instead of saying "Nah, this is not targeted for me" and ignoring the topic, I have been reporting my dogfood experience exactly to help that process---to notice if this variant worsens end-user experience and see if that is within the reasonable pros-and-cons tolerance, especially because they will start noticing the same hiccups as new contributors stay longer and gain experience.

> In any case, thank you for integrating the patches into `seen` so that the impact of the patches can be "seen".

No, do not thank me.  Thank yourself for writing it, and thank others who try it and tell you their experiences to help you make it better.

Thanks.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Yes, that's because line 5520 is the header of that group.  If you direct the reader to
>
>     https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5674
>
> instead, it will get expanded.

Ah, that's a good trick to know.  I presume 5674 can be any random line in the block that is hidden by default that I want the viewer to expand, right?

> If it would not get expanded, that would be a bug, but obviously not in my patch series.

Well, the choice to use a mechanism that has such a bug is made by this patch series.  If "allowing users to easily exchange a URL that points at the exact place that the tests broke" were one of the goals, it does not matter whose fault it is---the aggregate result is that the new UI would have failed that goal.

Thankfully there does not seem to be such a bug in the UI mechanism used by this series, so we are in good shape.

Thanks.
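[Editorial aside: the collapse/expand behaviour discussed in the exchange above comes from GitHub Actions log grouping, where a "::group::" line becomes the clickable header and everything up to "::endgroup::" is collapsed by default.  A minimal hedged sketch of how a CI script can emit such a group; the function and file names below are illustrative, not taken from Git's actual ci/ scripts:]

```shell
# Sketch of GitHub Actions log grouping: "::group::<title>" starts a
# collapsed section, "::endgroup::" ends it.  A deep link to the header
# line lands on the collapsed group; a link to a line inside the body
# forces the group open, which matches the behaviour described above.
emit_failure_group () {
	test_id=$1
	log_file=$2
	echo "::group::failed: $test_id"
	cat "$log_file"
	echo "::endgroup::"
}

# Illustrative usage with a fake per-test log file:
printf '%s\n' 'not ok 49 - sparse checkout' >t1092.out
emit_failure_group t1092.49 t1092.out
```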
Hi Dscho

On 25/02/2022 14:10, Johannes Schindelin wrote:
> Hi Phillip,
>
> On Wed, 23 Feb 2022, Phillip Wood wrote:
>
>> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
>>> [...]
>>> So just to make the point about one of those mentioned in my [1] with some further details (I won't go into the whole thing to avoid repeating myself):
>>>
>>> I opened both of:
>>>
>>>     https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
>>>     https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
>>>
>>> Just now in Firefox 91.5.0esr-1.  Both having been opened before, so they're in cache, and I've got a current 40MB/s real downlink speed etc.
>>>
>>> The former fully loads in around 5100ms, with your series here that's just short of 18000ms.
>>>
>>> So your CI changes are making the common case of just looking at a CI failure more than **3x as slow as before**.
>>
>> I don't think that is the most useful comparison between the two.  When I am investigating a test failure the time that matters to me is the time it takes to display the output of the failing test case.
>
> Thank you for expressing this so clearly.  I will adopt a variation of this phrasing in my commit message, if you don't mind?

That's fine

>> With the first link above the initial page load is faster but to get to the output of the failing test case I have to click on "Run ci/print_test_failures.sh", then wait for that to load, and then search for "not ok" to actually get to the information I'm after.
>
> And that's only because you are familiar with what you have to do.
>
> Any new contributor would be stuck with the information presented on the initial load, without any indication that more information _is_ available, just hidden away in the next step's log (which is marked as "succeeding", therefore misleading the inclined reader into thinking that this cannot potentially contain any information pertinent to the _failure_ that needs to be investigated).

Yes, it took me a while to realize how to get to the test output when I first started looking at the CI results.

One thing I forgot to mention was that when you expand a failing test it shows the test script twice before the output, e.g.

    Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
    failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
        git config core.fsmonitor true &&
        git fsmonitor--daemon start &&
        git update-index --fsmonitor

    expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
        git config core.fsmonitor true &&
        git fsmonitor--daemon start &&
        git update-index --fsmonitor

    ++ git config core.fsmonitor true
    ++ git fsmonitor--daemon start
    ...

I don't know how easy it would be to fix that so that we only show "expecting success of ..." without the test being printed first

Best Wishes

Phillip

>> With the second link the initial page load does feel slower but then I'm presented with the test failures nicely highlighted in red; all I have to do is click on one and I've got the information I'm after.
>>
>> Overall that is much faster and easier to use.
>
> Thank you for your comment.  I really started to doubt myself, getting the idea that it's just a case of me holding this thing wrong.
>
> For what it's worth, I did make a grave mistake by using that particular `seen` CI failure with all of those failing p4 tests, which obviously resulted in an incredibly large amount of logs.  Obviously that _must_ be slow to load.  I just did not have the time to fabricate a CI failure.
> However, I do agree with you that the large amount of logs would have to be looked at _anyway_, whether it is shown upon loading the job's logs or only when expanding the `print-test-failures` step's logs.  The amount of the logs is a constant, after all, I did not change anything there (nor would I).
>
> So a better example might be my concrete use case yesterday: the CI build of `seen` failed.  Here is the link to the regular output:
>
>     https://github.com/git/git/actions/runs/1890665968
>
> On that page, you see the following:
>
>     Annotations
>     8 errors and 1 warning
>
>     ⓧ win test (3)
>     Process completed with exit code 2.
>
>     ⓧ win test (6)
>     Process completed with exit code 2.
>
>     ⓧ win test (2)
>     Process completed with exit code 2.
>
>     ⓧ win+VS test (3)
>     Process completed with exit code 2.
>
>     ⓧ win+VS test (6)
>     Process completed with exit code 2.
>
>     ⓧ win+VS test (2)
>     Process completed with exit code 2.
>
>     ⓧ osx-gcc (macos-latest)
>     Process completed with exit code 2.
>
>     ⓧ osx-clang (macos-latest)
>     Process completed with exit code 2.
>
>     ⚠ CI: .github#L1
>     windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
>
> So I merged my branch into `seen` and pushed it.  The corresponding run can be seen here:
>
>     https://github.com/dscho/git/actions/runs/1892982393
>
> On that page, you see the following:
>
>     Annotations
>     50 errors and 1 warning
>
>     ⓧ win test (3)
>     failed: t7527.1 explicit daemon start and stop
>
>     ⓧ win test (3)
>     failed: t7527.2 implicit daemon start
>
>     ⓧ win test (3)
>     failed: t7527.3 implicit daemon stop (delete .git)
>
>     ⓧ win test (3)
>     failed: t7527.4 implicit daemon stop (rename .git)
>
>     ⓧ win test (3)
>     failed: t7527.5 implicit daemon stop (rename GIT~1)
>
>     ⓧ win test (3)
>     failed: t7527.6 implicit daemon stop (rename GIT~2)
>
>     ⓧ win test (3)
>     failed: t7527.8 cannot start multiple daemons
>
>     ⓧ win test (3)
>     failed: t7527.10 update-index implicitly starts daemon
>
>     ⓧ win test (3)
>     failed: t7527.11 status implicitly starts daemon
>
>     ⓧ win test (3)
>     failed: t7527.12 edit some files
>
>     ⓧ win test (2)
>     failed: t0012.81 fsmonitor--daemon can handle -h
>
>     ⓧ win test (2)
>     Process completed with exit code 1.
>
>     ⓧ win test (6)
>     failed: t7519.2 run fsmonitor-daemon in bare repo
>
>     ⓧ win test (6)
>     failed: t7519.3 run fsmonitor-daemon in virtual repo
>
>     ⓧ win test (6)
>     Process completed with exit code 1.
>
>     ⓧ win+VS test (3)
>     failed: t7527.1 explicit daemon start and stop
>
>     ⓧ win+VS test (3)
>     failed: t7527.2 implicit daemon start
>
>     ⓧ win+VS test (3)
>     failed: t7527.3 implicit daemon stop (delete .git)
>
>     ⓧ win+VS test (3)
>     failed: t7527.4 implicit daemon stop (rename .git)
>
>     ⓧ win+VS test (3)
>     failed: t7527.5 implicit daemon stop (rename GIT~1)
>
>     ⓧ win+VS test (3)
>     failed: t7527.6 implicit daemon stop (rename GIT~2)
>
>     ⓧ win+VS test (3)
>     failed: t7527.8 cannot start multiple daemons
>
>     ⓧ win+VS test (3)
>     failed: t7527.10 update-index implicitly starts daemon
>
>     ⓧ win+VS test (3)
>     failed: t7527.11 status implicitly starts daemon
>
>     ⓧ win+VS test (3)
>     failed: t7527.12 edit some files
>
>     ⓧ win+VS test (2)
>     failed: t0012.81 fsmonitor--daemon can handle -h
>
>     ⓧ win+VS test (2)
>     Process completed with exit code 1.
>
>     ⓧ win+VS test (6)
>     failed: t7519.2 run fsmonitor-daemon in bare repo
>
>     ⓧ win+VS test (6)
>     failed: t7519.3 run fsmonitor-daemon in virtual repo
>
>     ⓧ win+VS test (6)
>     Process completed with exit code 1.
>
>     ⓧ osx-clang (macos-latest)
>     failed: t0012.81 fsmonitor--daemon can handle -h
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7519.2 run fsmonitor-daemon in bare repo
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.1 explicit daemon start and stop
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.2 implicit daemon start
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.3 implicit daemon stop (delete .git)
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.4 implicit daemon stop (rename .git)
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.7 MacOS event spelling (rename .GIT)
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.8 cannot start multiple daemons
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.10 update-index implicitly starts daemon
>
>     ⓧ osx-clang (macos-latest)
>     failed: t7527.11 status implicitly starts daemon
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t0012.81 fsmonitor--daemon can handle -h
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7519.2 run fsmonitor-daemon in bare repo
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.1 explicit daemon start and stop
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.2 implicit daemon start
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.3 implicit daemon stop (delete .git)
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.4 implicit daemon stop (rename .git)
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.7 MacOS event spelling (rename .GIT)
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.8 cannot start multiple daemons
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.10 update-index implicitly starts daemon
>
>     ⓧ osx-gcc (macos-latest)
>     failed: t7527.11 status implicitly starts daemon
>
>     ⚠ CI: .github#L1
>     windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
>
> In my mind, this is already an improvement.  (Even if this is a _lot_ of output, and a lot of individual errors, given that all of them are fixed with a single, small patch to adjust an option usage string, but that's not the fault of my patch series, but of the suggestion to put the check for the option usage string linting into the `parse_options()` machinery instead of into the static analysis job.)
>
> Since there are still plenty of failures, the page admittedly does load relatively slowly.  But that's not the time I was trying to optimize for.  My time comes at quite a premium these days, so if the computer has to work a little harder while I can do something else, as long as it saves _me_ time, I'll take that time.  Every time.
>
> Ciao,
> Dscho
GitHub CI seems to fail due to lack of "paste" for the win+VS job.  This was somewhat unexpected, as our test scripts seem to make liberal use of "cut", which goes together with it.

    https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199

The particular failure at the URL comes from the use of "paste" in 5ea4f3a5 (name-rev: use generation numbers if available, 2022-02-28), but it hardly is the first use of the command.  There is one use of it in t/aggregate-results.sh in 'master/main' already.

We could rewrite the tests that use "paste", but looking at the use of the tool in the test (and the aggregate thing), rewriting them due to lack of a tool, whose source should be freely available from where "cut" was taken from, does not sound like too attractive a direction to go in.  But I do not know how much work is involved in adding it (and in general, any basic tool of similar complexity that we may find missing in the future) to the win+VS environment.

Thoughts?

Thanks.
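[Editorial aside: for readers unfamiliar with the tool at issue, "paste" joins corresponding lines of its input files, which is the kind of use t/aggregate-results.sh and the name-rev test make of it.  A small hedged illustration, with an awk stand-in for environments that lack the binary (the fallback is a suggestion here, not something the test suite actually does):]

```shell
# "paste" merges corresponding lines of its inputs, tab-separated.
printf 'a\nb\n' >left.txt
printf '1\n2\n' >right.txt

paste left.txt right.txt    # emits "a<TAB>1" and "b<TAB>2"

# One possible stand-in where "paste" is missing (illustrative only):
# buffer the first file, then join line-by-line with the second.
awk 'NR==FNR { l[FNR] = $0; next } { print l[FNR] "\t" $0 }' \
	left.txt right.txt
```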
On Thu, Mar 03 2022, Junio C Hamano wrote:

> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This was somewhat unexpected, as our test scripts seem to make liberal use of "cut" that goes together with it.
>
>     https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199
>
> The particular failure at the URL comes from the use of "paste" in 5ea4f3a5 (name-rev: use generation numbers if available, 2022-02-28), but it hardly is the first use of the command.  There is one use of it in t/aggregate-results.sh in 'master/main' already.

I think it's the first use; the t/aggregate-results.sh one is run by "DEFAULT_TEST_TARGET=test make -C t", but we use "DEFAULT_TEST_TARGET=prove".

Re your upthread:

> I personally do not care about the initial latency when viewing the output from CI run that may have happened a few dozens of minutes ago (I do not sit in front of GitHub CI UI and wait until it finishes).

I think this URL is a good example of what I noted in [1].  Your link loads relatively quickly, but I then saw a "linux-TEST-vars" failure and clicked on it, wanting to see why that fails.  It opens relatively quickly, but no failure can be seen.  It stalls with a spinner next to "t/run-build-and-test.sh", and stalled like that for 75[2] seconds before finally loading past line ~3.5k to line ~70k, showing the relevant failure in t6120*.sh.

I really don't think it's a reasonable claim to say that only "veterans" of git development[3] are likely to find the workflow of seeing a CI failure right away useful, or wanting to browse through the N different "job" failures without having to pre-open them, go find something else to do, then come back to it etc.

I also noted in [1] that it takes a lot more CPU now, so even if that is your workflow for looking at CI you'll need a fairly performant machine if you have a few job failures (which isn't atypical), as each tab will be pegging a CPU core at ~100% for a while.  I have a fairly normally spec'd quad-core laptop that I almost never hear, and this new CI UI is pretty reliable in making it sound as though it's about to take flight.

1. https://lore.kernel.org/git/220222.86tucr6kz5.gmgdl@evledraar.gmail.com/
2. I reported large N seconds, but nothing so bad before.  For some reason this one's particularly bad, but in [1] it was the same CPU use with ~20s etc. (but that one was 1/2 the amount of lines)
3. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2203011111150.11118@tvgsbejvaqbjf.bet/
Hi Junio,

On Thu, 3 Mar 2022, Junio C Hamano wrote:

> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This was somewhat unexpected, as our test scripts seem to make liberal use of "cut" that goes together with it.
>
>     https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199
>
> The particular failure at the URL comes from the use of "paste" in 5ea4f3a5 (name-rev: use generation numbers if available, 2022-02-28), but it hardly is the first use of the command.  There is one use of it in t/aggregate-results.sh in 'master/main' already.
>
> We could rewrite the tests that use "paste" but looking at the use of the tool in the test (and the aggregate thing), rewriting them due to lack of a tool, whose source should be freely available from where "cut" was taken from, does not sound like too attractive a direction to go in, but I do not know how much work is involved in adding it (and in general, any basic tool with similar complexity that we may find missing in the future) to the win+VS environment.

I added it:

    https://github.com/git-for-windows/git-sdk-64/commit/e3ade7eef2503149dfefe630037c2fd6d24f2c14

It will take ~35 minutes (at time of writing) for https://dev.azure.com/Git-for-Windows/git/_build/results?buildId=95542&view=results to make the corresponding artifact available to the `setup-git-for-windows-sdk` Action we use.

Ciao,
Dscho
Hi Ævar,

On Fri, 4 Mar 2022, Ævar Arnfjörð Bjarmason wrote:

> I really don't think it's a reasonable claim to say that only "veterans" of git development[3] are likely to find the workflow of seeing a CI failure right away useful, or wanting to browse through the N different "job" failures without having to pre-open them, go find something else to do, then come back to it etc.

I said that the current output is only useful to veterans: the output that hides the detailed log under a separate job that is marked as non-failing.

That's still as true as when I said it. :-)

Ciao,
Johannes
Hi Phillip,

On Wed, 2 Mar 2022, Phillip Wood wrote:

> On 25/02/2022 14:10, Johannes Schindelin wrote:
>
>> On Wed, 23 Feb 2022, Phillip Wood wrote:
>>
>>> With the first link above the initial page load is faster but to get to the output of the failing test case I have to click on "Run ci/print_test_failures.sh", then wait for that to load, and then search for "not ok" to actually get to the information I'm after.
>>
>> And that's only because you are familiar with what you have to do.
>>
>> Any new contributor would be stuck with the information presented on the initial load, without any indication that more information _is_ available, just hidden away in the next step's log (which is marked as "succeeding", therefore misleading the inclined reader into thinking that this cannot potentially contain any information pertinent to the _failure_ that needs to be investigated).
>
> Yes, it took me a while to realize how to get to the test output when I first started looking at the CI results.

Thank you for saying that.  Since nobody else said it as clearly as you, I really started to doubt myself here.

> One thing I forgot to mention was that when you expand a failing test it shows the test script twice before the output, e.g.
>
>     Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>     failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>         git config core.fsmonitor true &&
>         git fsmonitor--daemon start &&
>         git update-index --fsmonitor
>
>     expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>         git config core.fsmonitor true &&
>         git fsmonitor--daemon start &&
>         git update-index --fsmonitor
>
>     ++ git config core.fsmonitor true
>     ++ git fsmonitor--daemon start
>     ...
>
> I don't know how easy it would be to fix that so that we only show "expecting success of ..." without the test being printed first

It's not _super_ easy: right now, the patch series does not touch the code that prints the latter message.  In fact, that latter message is generated _before_ the test fails, and redirected via `tee` into `GIT_TEST_TEE_OUTPUT_FILE`.  Then, once the verdict is clear that this test case failed, the first message is printed (the one that is colored in the output via `::error::`), and the output from the `GIT_TEST_TEE_OUTPUT_FILE` file is pasted, starting at the offset marking the start of the test case.

The easiest workaround would probably be to add a flag that suppresses the header `expecting success` in case we're running with the `--github-workflow-markup` option.

I'll put it on my ever-growing TODO list, but maybe, in the interest of heeding the mantra "the perfect is the enemy of the good", this can be done on top of this series rather than blocking it?

Thanks,
Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This was somewhat unexpected, as our test scripts seem to make liberal use of "cut" that goes together with it.
>> ...
> I added it:
> https://github.com/git-for-windows/git-sdk-64/commit/e3ade7eef2503149dfefe630037c2fd6d24f2c14

Thanks.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I said that the current output is only useful to veterans.  The output that hides the detailed log, under a separate job that is marked as non-failing.
>
> That's still as true as when I said it. :-)

I think getting rid of the separate "print failures" CI step and making it more discoverable how to reveal the details of the failing test step is a usability improvement.

I am not Ævar, but I think what was questioned was whether the improvement justifies multi dozens of seconds of extra waiting time, which is a usability dis-improvement.  I do not have a good answer to that question.

I am probably nearing being a veteran who knows when to brew my tea in my work cycle, and waiting for an extra minute or two while browsing CI output is not a problem for me.  But new "non-veteran" users may not share that.  That is something a bit worrying about the new UI.

Thanks.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> One thing I forgot to mention was that when you expand a failing test it shows the test script twice before the output, e.g.
>>
>>     Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>     failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>         git config core.fsmonitor true &&
>>         git fsmonitor--daemon start &&
>>         git update-index --fsmonitor
>>
>>     expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>>         git config core.fsmonitor true &&
>>         git fsmonitor--daemon start &&
>>         git update-index --fsmonitor
>>
>>     ++ git config core.fsmonitor true
>>     ++ git fsmonitor--daemon start
>>     ...
>>
>> I don't know how easy it would be to fix that so that we only show "expecting success of ..." without the test being printed first
>
> It's not _super_ easy: right now, the patch series does not touch the code

In other words, it is not a new issue introduced by this series, right?

> The easiest workaround would probably be to add a flag that suppresses the header `expecting success` in case we're running with the `--github-workflow-markup` option.

If that is the case, let's leave it outside the series.

If we do not have to hide the solution behind any option specific to "--github-workflow-markup", then even users (like me) who regularly run "cd t && sh ./t1234-a-particular-test.sh -i -v" would benefit if we no longer have to look at the duplicated test script in the output.

Thanks.
Hi Dscho

On 07/03/2022 16:07, Johannes Schindelin wrote:
> Hi Phillip,
>
> On Wed, 2 Mar 2022, Phillip Wood wrote:
>
>> On 25/02/2022 14:10, Johannes Schindelin wrote:
>>
>>> On Wed, 23 Feb 2022, Phillip Wood wrote:
>>>
>>>> With the first link above the initial page load is faster but to get to the output of the failing test case I have to click on "Run ci/print_test_failures.sh", then wait for that to load, and then search for "not ok" to actually get to the information I'm after.
>>>
>>> And that's only because you are familiar with what you have to do.
>>>
>>> Any new contributor would be stuck with the information presented on the initial load, without any indication that more information _is_ available, just hidden away in the next step's log (which is marked as "succeeding", therefore misleading the inclined reader into thinking that this cannot potentially contain any information pertinent to the _failure_ that needs to be investigated).
>>
>> Yes, it took me a while to realize how to get to the test output when I first started looking at the CI results.
>
> Thank you for saying that.  Since nobody else said it as clearly as you, I really started to doubt myself here.
>
>> One thing I forgot to mention was that when you expand a failing test it shows the test script twice before the output, e.g.
>>
>>     Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>     failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>         git config core.fsmonitor true &&
>>         git fsmonitor--daemon start &&
>>         git update-index --fsmonitor
>>
>>     expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>>         git config core.fsmonitor true &&
>>         git fsmonitor--daemon start &&
>>         git update-index --fsmonitor
>>
>>     ++ git config core.fsmonitor true
>>     ++ git fsmonitor--daemon start
>>     ...
>>
>> I don't know how easy it would be to fix that so that we only show "expecting success of ..." without the test being printed first
>
> It's not _super_ easy: right now, the patch series does not touch the code that prints the latter message.  In fact, that latter message is generated _before_ the test fails, and redirected via `tee` into `GIT_TEST_TEE_OUTPUT_FILE`.  Then, once the verdict is clear that this test case failed, the first message is printed (the one that is colored in the output via `::error::`), and the output from the `GIT_TEST_TEE_OUTPUT_FILE` file is pasted, starting at the offset marking the start of the test case.
>
> The easiest workaround would probably be to add a flag that suppresses the header `expecting success` in case we're running with the `--github-workflow-markup` option.
>
> I'll put it on my ever-growing TODO list, but maybe in the interest of heeding the mantra "the perfect is the enemy of the good", this can be done on top of this series rather than blocking it?

Sure, I mentioned it in case it was easy to fix, but I don't think it should stop these patches moving forward.

Best Wishes

Phillip

> Thanks,
> Dscho
On Mon, Mar 07 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > >>> One thing I forgot to mention was that when you expand a failing test it shows >>> the test script twice before the output e.g. >>> >>> Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor >>> failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor >>> git config core.fsmonitor true && >>> git fsmonitor--daemon start && >>> git update-index --fsmonitor >>> >>> expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor': >>> git config core.fsmonitor true && >>> git fsmonitor--daemon start && >>> git update-index --fsmonitor >>> >>> ++ git config core.fsmonitor true >>> ++ git fsmonitor--daemon start >>> ... >>> >>> I don't know how easy it would be to fix that so that we only show "expecting >>> success of ..." without the test being printed first >> >> It's not _super_ easy: right now, the patch series does not touch the code > > In other words, it is not a new issue introduced by this series, right? It is a new issue in this series, specifically how "finalize_test_case_output" interacts with "test_{ok,failure}_" and friends. >> The easiest workaround would probably to add a flag that suppresses the >> header `expecting success` in case we're running with the >> `--github-workflow-markup` option. > > If that is the case, let's leave it outside the series. > > If we do not have to hide the solution behind any option specific to > "--github-workflow-markup", then even users (like me) who reguarly > run "cd t && sh ./t1234-a-particular-test.sh -i -v" would benefit if > we no longer have to look at the duplicated test script in the > output. Unless you invoke it with --github-workflow-markup you won't see the duplication. I had some comments about inherent limitations in the approach in this series vis-a-vis parsing markup after the fact[1]. But that really doesn't seem to apply here. 
We're just printing the test source into the *.markup file twice for no particular reason, aren't we? *tests locally* Hrm, so first this is a bug: $ ./t0002-gitfile.sh --github-workflow-markup FATAL: Unexpected exit with code 1 FATAL: Unexpected exit with code 1 Seems it wants --tee but doesn't declare it, this works: $ rm -rf test-results/; ./t0002-gitfile.sh --github-workflow-markup --tee; cat test-results/t0002-gitfile.markup Isn't this a matter of making finalize_test_case_output not print the full $* (including the test source) for failures? 1. https://lore.kernel.org/git/220309.861qzbnymn.gmgdl@evledraar.gmail.com/
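If the duplication indeed comes from the markup hook echoing the full "$*", the fix might look something like this — a sketch only, with an assumed function shape (the real hook in t/test-lib-github-workflow-markup.sh takes different arguments):

```shell
#!/bin/sh
# Sketch of the idea above (assumed function shape, not the real
# t/test-lib-github-workflow-markup.sh code): on failure, emit only the
# test *title* in the annotation and group header, and let the replayed
# trace supply the "expecting success of ..." header with the source,
# instead of echoing the source a second time.

finalize_test_case_output () {
	result=$1
	title=$2

	case "$result" in
	ok)
		echo "::group::ok: $title"
		echo "::endgroup::"
		;;
	failure)
		echo "::error::failed: $title"
		# title only here; the test source follows in the trace
		echo "::group::failure: $title"
		echo "::endgroup::"
		;;
	esac
}

finalize_test_case_output failure \
	"t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor"
```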
Hi Junio, On Mon, 7 Mar 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > I said that the current output is only useful to veterans. The output that > > hides the detailed log, under a separate job that is marked as > > non-failing. > > > > That's still as true as when I said it. :-) > > I think getting rid of the separate "print failures" CI step and > making it more discoverable how to reveal the details of failing > test step is a usability improvement. I'm so glad you're saying that! I was starting to doubt whether my caring about getting rid of that `print failures` step was maybe misguided. > I am not Ævar, but I think what was questioned was the improvement > justifies multi dozens of seconds extra waiting time, which is a > usability dis-improvement. I do not have a good answer to that > question. It is definitely worrisome that we have to pay such a price. And if there was a good answer how to improve that (without sacrificing the discoverability of the command trace corresponding to the test failure), I would be more than just eager to hear it. I did consider generating a HTML-formatted report that would then be attached as a build artifact. But that would hide the relevant information even worse than a "print failures" step. Maybe I should just get rid of the grouping? But that _really_ helped me when I investigated the recent "usage string updates" vs "FSMonitor" problem, because the test case boundaries were so much clearer. Plus, as far as I saw, GitHub workflow logs always scroll to the end of the logs of the failing step, which would not help _at all_ here. So I dunno. > I am probably nearing to be a veteran who knows when to brew my tea > in my work cycle, and waiting for an extra minute or two while > browsing CI output is not a problem for me. :-) > But new "non-veteran" users may not share that. That is something a > bit worrying about the new UI. Indeed. 
My goal, after all, is to improve the experience of contributors, not to make it harder. Still, given that you currently have to click quite a few times until you get to where you need to be, I have my doubts that what this patch series does is actually making things slower, measured in terms of the total time from seeing a failed build to being able to diagnose the cause by inspecting the command trace. Ciao, Dscho
On Wed, Mar 09 2022, Johannes Schindelin wrote: > Hi Junio, > > On Mon, 7 Mar 2022, Junio C Hamano wrote: > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >> > I said that the current output is only useful to veterans. The output that >> > hides the detailed log, under a separate job that is marked as >> > non-failing. >> > >> > That's still as true as when I said it. :-) >> >> I think getting rid of the separate "print failures" CI step and >> making it more discoverable how to reveal the details of failing >> test step is a usability improvement. > > I'm so glad you're saying that! I was starting to doubt whether my caring > about getting rid of that `print failures` step was maybe misguided. I don't think anyone's been maintaining that getting rid of it wouldn't be ideal. I for one have just been commenting on issues in the proposed implementation. I think we might still want to retain some such steps in the future, i.e. if we have a failure have subsequent steps that on failure() bump varying levels of verbosity / raw logs etc., or even try re-running the test in different ways (e.g. narrow it down with --run). But the failure step you see when something fails should ideally have the failure plus the relevant error, just as we do with compile errors. >> I am not Ævar, but I think what was questioned was the improvement >> justifies multi dozens of seconds extra waiting time, which is a >> usability dis-improvement. I do not have a good answer to that >> question. > > It is definitely worrisome that we have to pay such a price. And if there > was a good answer how to improve that (without sacrificing the > discoverability of the command trace corresponding to the test failure), I > would be more than just eager to hear it. Isn't the answer to that what I suggested in [1]; I.e. the performance problem being that we include N number of lines of the output that *didn't fail*, and that's what slows down showing the relevant output that *did* fail. 
I.e. if say t3070-wildmatch.sh fails in a couple of tests we'd show a *lot* of lines between the relevant failing tests, let's just elide the non-failing ones and show the output for the failing ones specifically. *Sometimes* (but very rarely) it's relevant to still look at the full output, since the failure might be due to an earlier silent failure in a previous test (or the state it left behind), but I think that's rare enough that the right thing to do is just to stick that in a subsequent "verbose dump" step or whatever. >> But new "non-veteran" users may not share that. That is something a >> bit worrying about the new UI. > > Indeed. My goal, after all, is to improve the experience of contributors, > not to make it harder. > > Still, given that you currently have to click quite a few times until you > get to where you need to be, I have my doubts that what this patch series > does is actually making things slower, measured in terms of the total time > from seeing a failed build to being able to diagnose the cause by > inspecting the command trace. Yes, but wouldn't the "Test Summary Report" in [1] be the best of both worlds[1] (with some minor changes to adapt it to the GitHub "grouping" output, perhaps)? Then you'd always see the specific of the failing test at the end, if you had N number of failures you might have a lot of output above that, but even that we could always tweak with some smart heuristic. I.e. show verbose "not ok" output if failures <10, if 10..100 elide some for the raw log, if >100 just print "this is completely screwed" or whatever :) 1. https://lore.kernel.org/git/220302.86mti87cj2.gmgdl@evledraar.gmail.com/
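The heuristic floated above could be as small as this — the thresholds (<10, 10..100, >100) are the ones from the mail, everything else is made up for illustration:

```shell
#!/bin/sh
# Toy version of the verbosity heuristic suggested above; only the
# thresholds come from the mail, the mode names are invented.

failure_report_mode () {
	nfail=$1
	if test "$nfail" -lt 10
	then
		echo "verbose" # full "not ok" output for every failure
	elif test "$nfail" -le 100
	then
		echo "elided"  # summarize; point at the raw log artifact
	else
		echo "summary" # too broken to itemize; counts only
	fi
}

failure_report_mode 3   # verbose
failure_report_mode 42  # elided
failure_report_mode 500 # summary
```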
Background ========== Recent patches intended to help readers figure out CI failures much quicker than before. Unfortunately, they haven't been entirely positive for me. For example, they broke the branch protections in Microsoft's fork of Git, where we require Pull Requests to pass a certain set of Checks (which are identified by their names) and therefore caused follow-up work. Using CI and in general making it easier for new contributors is an area I'm passionate about, and one I'd like to see improved. The current situation ===================== Let me walk you through the current experience when a PR build fails: I get a notification mail that only says that a certain job failed. There's no indication of which test failed (or was it the build?). I can click on a link and it takes me to the workflow run. Once there, all it says is "Process completed with exit code 1", or even "code 2". Sure, I can click on one of the failed jobs. It even expands the failed step's log (collapsing the other steps). And what do I see there? Let's look at an example of a failed linux-clang (ubuntu-latest) job [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]: [...] Test Summary Report ------------------- t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1) Failed test: 49 Non-zero exit status: 1 t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0) TODO passed: 45, 47 Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr 1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU) Result: FAIL make[1]: *** [Makefile:53: prove] Error 1 make[1]: Leaving directory '/home/runner/work/git/git/t' make: *** [Makefile:3018: test] Error 2 That's it. I count myself lucky not to be a new contributor being faced with something like this.
Now, since I am active in the Git project for a couple of days or so, I can make sense of the "TODO passed" label and know that for the purpose of fixing the build failures, I need to ignore this, and that I need to focus on the "Failed test" part instead. I also know that I do not have to get myself an ubuntu-latest box just to reproduce the error, I do not even have to check out the code and run it just to learn what that "49" means. I know, and I do not expect any new contributor, not even most seasoned contributors to know, that I have to patiently collapse the "Run ci/run-build-and-tests.sh" job's log, and instead expand the "Run ci/print-test-failures.sh" job log (which did not fail and hence does not draw any attention to it). I know, and again: I do not expect many others to know this, that I then have to click into the "Search logs" box (not the regular web browser's search via Ctrl+F!) and type in "not ok" to find the log of the failed test case (and this might still be a "known broken" one that is marked via test_expect_failure and once again needs to be ignored). To be excessively clear: This is not a great experience! Improved output =============== Our previous Azure Pipelines-based CI builds had a much nicer UI, one that even showed flaky tests, and trends, e.g. how long the test cases ran. When I ported Git's CI over to GitHub workflows (to make CI more accessible to new contributors), I knew full well that we would leave this very nice UI behind, and I had hoped that we would get something similar back via new, community-contributed GitHub Actions that can be used in GitHub workflows. However, most likely because we use a home-grown test framework implemented in opinionated POSIX shell scripts, that did not happen. So I had a look at what standards exist e.g. when testing PowerShell modules, in the way of marking up their test output in GitHub workflows, and I was not disappointed: GitHub workflows support "grouping" of output lines, i.e.
marking sections of the output as a group that is then collapsed by default and can be expanded. And it is this feature I decided to use in this patch series, along with GitHub workflows' commands to display errors or notices that are also shown on the summary page of the workflow run. Now, in addition to "Process completed with exit code" on the summary page, we also read something like: ⊗ linux-gcc (ubuntu-latest) failed: t9800.20 submit from detached head Even better, this message is a link, and following that, the reader is presented with something like this [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]: ⏵ Run ci/run-build-and-tests.sh ⏵ CI setup + ln -s /home/runner/none/.prove t/.prove + run_tests=t + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main + group Build make + set +x ⏵ Build ⏵ Run tests === Failed test: t9800-git-p4-basic === ⏵ ok: t9800.1 start p4d ⏵ ok: t9800.2 add p4 files ⏵ ok: t9800.3 basic git p4 clone ⏵ ok: t9800.4 depot typo error ⏵ ok: t9800.5 git p4 clone @all ⏵ ok: t9800.6 git p4 sync uninitialized repo ⏵ ok: t9800.7 git p4 sync new branch ⏵ ok: t9800.8 clone two dirs ⏵ ok: t9800.9 clone two dirs, @all ⏵ ok: t9800.10 clone two dirs, @all, conflicting files ⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit ⏵ ok: t9800.12 clone using non-numeric revision ranges ⏵ ok: t9800.13 clone with date range, excluding some changes ⏵ ok: t9800.14 exit when p4 fails to produce marshaled output ⏵ ok: t9800.15 exit gracefully for p4 server errors ⏵ ok: t9800.16 clone --bare should make a bare repository ⏵ ok: t9800.17 initial import time from top change time ⏵ ok: t9800.18 unresolvable host in P4PORT should display error ⏵ ok: t9800.19 run hook p4-pre-submit before submit Error: failed: t9800.20 submit from detached head ⏵ failure: t9800.20 submit from detached head Error: failed: t9800.21 submit from worktree ⏵ failure: t9800.21 submit from worktree === Failed test: t9801-git-p4-branch === [...] 
The "Failed test:" lines are colored in yellow to give a better visual clue about the logs' structure, the "Error:" label is colored in red to draw the attention to the important part of the log, and the "⏵" characters indicate that part of the log is collapsed and can be expanded by clicking on it. To drill down, the reader merely needs to expand the (failed) test case's log by clicking on it, and then study the log. If needed (e.g. when the test case relies on side effects from previous test cases), the logs of preceding test cases can be expanded as well. In this example, when expanding t9800.20, it looks like this (for ease of reading, I cut a few chunks of lines, indicated by "[...]"): [...] ⏵ ok: t9800.19 run hook p4-pre-submit before submit Error: failed: t9800.20 submit from detached head ⏷ failure: t9800.20 submit from detached head test_when_finished cleanup_git && git p4 clone --dest="$git" //depot && ( cd "$git" && git checkout p4/master && >detached_head_test && git add detached_head_test && git commit -m "add detached_head" && git config git-p4.skipSubmitEdit true && git p4 submit && git p4 rebase && git log p4/master | grep detached_head ) [...] Depot paths: //depot/ Import destination: refs/remotes/p4/master Importing revision 9 (100%)Perforce db files in '.' will be created if missing... Perforce db files in '.' will be created if missing... 
Traceback (most recent call last): File "/home/runner/work/git/git/git-p4", line 4455, in <module> main() File "/home/runner/work/git/git/git-p4", line 4449, in main if not cmd.run(args): File "/home/runner/work/git/git/git-p4", line 2590, in run rebase.rebase() File "/home/runner/work/git/git/git-p4", line 4121, in rebase if len(read_pipe("git diff-index HEAD --")) > 0: File "/home/runner/work/git/git/git-p4", line 297, in read_pipe retcode, out, err = read_pipe_full(c, *k, **kw) File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full p = subprocess.Popen( File "/usr/lib/python3.8/subprocess.py", line 858, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --' error: last command exited with $?=1 + cleanup_git + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + nr_tries_left=60 + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + test 1 -ne 1 + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + nr_tries_left=60 + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git + exit 1 + eval_ret=1 + : not ok 20 - submit from detached head # # test_when_finished cleanup_git && # git p4 clone --dest="$git" //depot && # ( # cd "$git" && # git checkout p4/master && # >detached_head_test && # git add detached_head_test && # git commit -m "add detached_head" && # git config git-p4.skipSubmitEdit true && # git p4 submit && # git p4 rebase && # git log p4/master | grep detached_head # ) # Error: failed: t9800.21 submit from worktree [...] 
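The markup seen in the expanded log above is built from GitHub's workflow commands: a "::group::"/"::endgroup::" pair delimits a collapsible span, and "::error::" produces a red annotation that also surfaces on the run's summary page. A self-contained illustration (outside a GitHub runner these are just plain text lines; only GitHub's log viewer interprets them):

```shell
#!/bin/sh
# Stand-alone demonstration of the workflow commands this series emits.
# The "group" helper mirrors the shape of the one in ci/lib.sh, but this
# is a sketch, not the actual CI code.

group () {
	title=$1
	shift
	echo "::group::$title" # start a collapsed section
	"$@"
	echo "::endgroup::"    # close it
}

group "ok: t9800.19 run hook p4-pre-submit before submit" \
	echo "+ test_hook p4-pre-submit"

echo "::error::failed: t9800.20 submit from detached head"
group "failure: t9800.20 submit from detached head" \
	echo "+ git p4 submit (exited with 1)"
```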
Is this the best UI we can have for test failures in CI runs? I hope we can do better. Having said that, this patch series presents a pretty good start, and offers a basis for future improvements. Johannes Schindelin (9): ci: fix code style ci/run-build-and-tests: take a more high-level view ci: make it easier to find failed tests' logs in the GitHub workflow ci/run-build-and-tests: add some structure to the GitHub workflow output tests: refactor --write-junit-xml code test(junit): avoid line feeds in XML attributes ci: optionally mark up output in the GitHub workflow ci: use `--github-workflow-markup` in the GitHub workflow ci: call `finalize_test_case_output` a little later .github/workflows/main.yml | 12 --- ci/lib.sh | 81 ++++++++++++++-- ci/run-build-and-tests.sh | 11 ++- ci/run-test-slice.sh | 5 +- t/test-lib-functions.sh | 4 +- t/test-lib-github-workflow-markup.sh | 50 ++++++++++ t/test-lib-junit.sh | 132 +++++++++++++++++++++++++++ t/test-lib.sh | 128 ++++---------------------- 8 files changed, 287 insertions(+), 136 deletions(-) create mode 100644 t/test-lib-github-workflow-markup.sh create mode 100644 t/test-lib-junit.sh base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1117