[0/9] ci: make Git's GitHub workflow output much more helpful

Message ID pull.1117.git.1643050574.gitgitgadget@gmail.com (mailing list archive)
Series ci: make Git's GitHub workflow output much more helpful

Message

Johannes Schindelin via GitGitGadget Jan. 24, 2022, 6:56 p.m. UTC
Background
==========

Recent patches were intended to help readers figure out CI failures much
more quickly than before. Unfortunately, they have not been entirely
positive for me. For example, they broke the branch protections in
Microsoft's fork of Git, where we require Pull Requests to pass a certain
set of Checks (which are identified by their names), and therefore caused
follow-up work.

Using CI, and in general making contributing easier for new contributors,
is an area I'm passionate about, and one I'd like to see improved.


The current situation
=====================

Let me walk you through the current experience when a PR build fails: I get
a notification mail that only says that a certain job failed. There's no
indication of which test failed (or was it the build?). I can click on a
link and it takes me to the workflow run. Once there, all it says is "Process
completed with exit code 1", or even "code 2". Sure, I can click on one of
the failed jobs. It even expands the failed step's log (collapsing the other
steps). And what do I see there?

Let's look at an example of a failed linux-clang (ubuntu-latest) job
[https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:

[...]
Test Summary Report
-------------------
t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
  Failed test:  49
  Non-zero exit status: 1
t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
  TODO passed:   45, 47
Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
Result: FAIL
make[1]: *** [Makefile:53: prove] Error 1
make[1]: Leaving directory '/home/runner/work/git/git/t'
make: *** [Makefile:3018: test] Error 2


That's it. I count myself lucky not to be a new contributor faced with
something like this.

Now, since I have been active in the Git project for a couple of days or
so, I can make sense of the "TODO passed" label and know that, for the
purpose of fixing the build failures, I need to ignore it and focus on the
"Failed test" part instead.

I also know that I do not have to get myself an ubuntu-latest box just to
reproduce the error; I do not even have to check out the code and run it
just to learn what that "49" means.

I know, and I do not expect any new contributor, nor even most seasoned
contributors, to know, that I have to patiently collapse the "Run
ci/run-build-and-tests.sh" step's log and instead expand the "Run
ci/print-test-failures.sh" step's log (which did not fail and hence does
not draw any attention to itself).

I know, and again: I do not expect many others to know this, that I then
have to click into the "Search logs" box (not the regular web browser's
search via Ctrl+F!) and type in "not ok" to find the log of the failed test
case (and this might still be a "known broken" one that is marked via
test_expect_failure and once again needs to be ignored).

To be excessively clear: This is not a great experience!


Improved output
===============

Our previous Azure Pipelines-based CI builds had a much nicer UI, one that
even showed flaky tests and trends, e.g. how long the test cases ran. When
I ported Git's CI over to GitHub workflows (to make CI more accessible to
new contributors), I knew full well that we would leave this very nice UI
behind, and I had hoped that we would get something similar back via new,
community-contributed GitHub Actions that can be used in GitHub workflows.
However, most likely because we use a home-grown test framework implemented
in opinionated POSIX shell scripts, that did not happen.

So I had a look at what standards exist for marking up test output in
GitHub workflows, e.g. when testing PowerShell modules, and I was not
disappointed: GitHub workflows support "grouping" of output lines,
i.e. marking sections of the output as a group that is then collapsed by
default and can be expanded. And it is this feature I decided to use in this
patch series, along with GitHub workflows' commands to display errors or
notices that are also shown on the summary page of the workflow run. Now, in
addition to "Process completed with exit code" on the summary page, we also
read something like:

⊗ linux-gcc (ubuntu-latest)
   failed: t9800.20 submit from detached head


Even better, this message is a link, and following that, the reader is
presented with something like this
[https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:

⏵ Run ci/run-build-and-tests.sh
⏵ CI setup
  + ln -s /home/runner/none/.prove t/.prove
  + run_tests=t
  + export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
  + group Build make
  + set +x
⏵ Build
⏵ Run tests
  === Failed test: t9800-git-p4-basic ===
⏵ ok: t9800.1 start p4d
⏵ ok: t9800.2 add p4 files
⏵ ok: t9800.3 basic git p4 clone
⏵ ok: t9800.4 depot typo error
⏵ ok: t9800.5 git p4 clone @all
⏵ ok: t9800.6 git p4 sync uninitialized repo
⏵ ok: t9800.7 git p4 sync new branch
⏵ ok: t9800.8 clone two dirs
⏵ ok: t9800.9 clone two dirs, @all
⏵ ok: t9800.10 clone two dirs, @all, conflicting files
⏵ ok: t9800.11 clone two dirs, each edited by submit, single git commit
⏵ ok: t9800.12 clone using non-numeric revision ranges
⏵ ok: t9800.13 clone with date range, excluding some changes
⏵ ok: t9800.14 exit when p4 fails to produce marshaled output
⏵ ok: t9800.15 exit gracefully for p4 server errors
⏵ ok: t9800.16 clone --bare should make a bare repository
⏵ ok: t9800.17 initial import time from top change time
⏵ ok: t9800.18 unresolvable host in P4PORT should display error
⏵ ok: t9800.19 run hook p4-pre-submit before submit
  Error: failed: t9800.20 submit from detached head
⏵ failure: t9800.20 submit from detached head 
  Error: failed: t9800.21 submit from worktree
⏵ failure: t9800.21 submit from worktree 
  === Failed test: t9801-git-p4-branch ===
  [...]


The "Failed test:" lines are colored in yellow to give a better visual clue
about the logs' structure, the "Error:" label is colored in red to draw the
attention to the important part of the log, and the "⏵" characters indicate
that part of the log is collapsed and can be expanded by clicking on it.
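
For reference, here is a minimal sketch (not this series' actual
implementation; the group titles, helper names, and log file name are
made-up examples) of the GitHub workflow commands that produce such
collapsible groups and red "Error:" annotations:

    # Start a collapsible group; everything printed until the matching
    # "::endgroup::" line is collapsed by default in the web UI.
    begin_group () {
            printf '::group::%s\n' "$1"
    }
    end_group () {
            printf '::endgroup::\n'
    }

    begin_group "ok: t9800.19 run hook p4-pre-submit before submit"
    cat t9800.19.out        # hypothetical log file for that test case
    end_group

    # Emit an error annotation; it is rendered in red in the log and is
    # also surfaced on the summary page of the workflow run.
    printf '::error::%s\n' "failed: t9800.20 submit from detached head"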

To drill down, the reader merely needs to expand the (failed) test case's
log by clicking on it, and then study the log. If needed (e.g. when the test
case relies on side effects from previous test cases), the logs of preceding
test cases can be expanded as well. In this example, when expanding
t9800.20, it looks like this (for ease of reading, I cut a few chunks of
lines, indicated by "[...]"):

[...]
⏵ ok: t9800.19 run hook p4-pre-submit before submit
  Error: failed: t9800.20 submit from detached head
⏷ failure: t9800.20 submit from detached head 
      test_when_finished cleanup_git &&
      git p4 clone --dest="$git" //depot &&
        (
          cd "$git" &&
          git checkout p4/master &&
          >detached_head_test &&
          git add detached_head_test &&
          git commit -m "add detached_head" &&
          git config git-p4.skipSubmitEdit true &&
          git p4 submit &&
            git p4 rebase &&
            git log p4/master | grep detached_head
        )
    [...]
    Depot paths: //depot/
    Import destination: refs/remotes/p4/master
    
    Importing revision 9 (100%)Perforce db files in '.' will be created if missing...
    Perforce db files in '.' will be created if missing...
    
    Traceback (most recent call last):
      File "/home/runner/work/git/git/git-p4", line 4455, in <module>
        main()
      File "/home/runner/work/git/git/git-p4", line 4449, in main
        if not cmd.run(args):
      File "/home/runner/work/git/git/git-p4", line 2590, in run
        rebase.rebase()
      File "/home/runner/work/git/git/git-p4", line 4121, in rebase
        if len(read_pipe("git diff-index HEAD --")) > 0:
      File "/home/runner/work/git/git/git-p4", line 297, in read_pipe
        retcode, out, err = read_pipe_full(c, *k, **kw)
      File "/home/runner/work/git/git/git-p4", line 284, in read_pipe_full
        p = subprocess.Popen(
      File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'git diff-index HEAD --'
    error: last command exited with $?=1
    + cleanup_git
    + retry_until_success rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + nr_tries_left=60
    + rm -r /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + test_path_is_missing /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + test 1 -ne 1
    + test -e /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + retry_until_success mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + nr_tries_left=60
    + mkdir /home/runner/work/git/git/t/trash directory.t9800-git-p4-basic/git
    + exit 1
    + eval_ret=1
    + :
    not ok 20 - submit from detached head
    #    
    #        test_when_finished cleanup_git &&
    #        git p4 clone --dest="$git" //depot &&
    #        (
    #            cd "$git" &&
    #            git checkout p4/master &&
    #            >detached_head_test &&
    #            git add detached_head_test &&
    #            git commit -m "add detached_head" &&
    #            git config git-p4.skipSubmitEdit true &&
    #            git p4 submit &&
    #            git p4 rebase &&
    #            git log p4/master | grep detached_head
    #        )
    #    
  Error: failed: t9800.21 submit from worktree
  [...]


Is this the best UI we can have for test failures in CI runs? I hope we can
do better. Having said that, this patch series presents a pretty good start,
and offers a basis for future improvements.

Johannes Schindelin (9):
  ci: fix code style
  ci/run-build-and-tests: take a more high-level view
  ci: make it easier to find failed tests' logs in the GitHub workflow
  ci/run-build-and-tests: add some structure to the GitHub workflow
    output
  tests: refactor --write-junit-xml code
  test(junit): avoid line feeds in XML attributes
  ci: optionally mark up output in the GitHub workflow
  ci: use `--github-workflow-markup` in the GitHub workflow
  ci: call `finalize_test_case_output` a little later

 .github/workflows/main.yml           |  12 ---
 ci/lib.sh                            |  81 ++++++++++++++--
 ci/run-build-and-tests.sh            |  11 ++-
 ci/run-test-slice.sh                 |   5 +-
 t/test-lib-functions.sh              |   4 +-
 t/test-lib-github-workflow-markup.sh |  50 ++++++++++
 t/test-lib-junit.sh                  | 132 +++++++++++++++++++++++++++
 t/test-lib.sh                        | 128 ++++----------------------
 8 files changed, 287 insertions(+), 136 deletions(-)
 create mode 100644 t/test-lib-github-workflow-markup.sh
 create mode 100644 t/test-lib-junit.sh


base-commit: af4e5f569bc89f356eb34a9373d7f82aca6faa8a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1117%2Fdscho%2Fuse-grouping-in-ci-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1117/dscho/use-grouping-in-ci-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1117

Comments

Ævar Arnfjörð Bjarmason Jan. 26, 2022, 12:25 a.m. UTC | #1
On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> Background
> ==========
>
> Recent patches intended to help readers figure out CI failures much quicker
> than before. Unfortunately, they haven't been entirely positive for me. For
> example, they broke the branch protections in Microsoft's fork of Git, where
> we require Pull Requests to pass a certain set of Checks (which are
> identified by their names) and therefore caused follow-up work.

This seems to be a reference to my df7375d7728 (CI: use shorter names
that fit in UX tooltips, 2021-11-23) merged as part of ab/ci-updates,
and I understand from this summary that you had some custom job
somewhere that scraped the job names which broke.

That's unfortunate, but I do think being able to actually read the
tooltips in the GitHub UI was a worthwhile trade-off in the end.

But I'm entirely confused about what any of that has to do with this
series, which is about changing how the job output itself is presented
and summarized, not about the job names and making them fit in tooltips.

Later in the summary you note: 

> Using CI and in general making it easier for new contributors is an area I'm
> passionate about, and one I'd like to see improved.
> [...]
> ⊗ linux-gcc (ubuntu-latest)
>    failed: t9800.20 submit from detached head

Which has one of the new and shorter job names, but in a part of the UX
where the length didn't matter, and I can't find a case where it does.
Ævar Arnfjörð Bjarmason Jan. 27, 2022, 4:31 p.m. UTC | #2
[CC-ing some people who've been interested in CI architecture]

On Mon, Jan 24 2022, Johannes Schindelin via GitGitGadget wrote:

> [...]
> The current situation
> =====================
>
> Let me walk you through the current experience when a PR build fails: I get
> a notification mail that only says that a certain job failed. There's no
> indication of which test failed (or was it the build?). I can click on a
> link at it takes me to the workflow run. Once there, all it says is "Process
> completed with exit code 1", or even "code 2". Sure, I can click on one of
> the failed jobs. It even expands the failed step's log (collapsing the other
> steps). And what do I see there?
>
> Let's look at an example of a failed linux-clang (ubuntu-latest) job
> [https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true]:
>
> [...]
> Test Summary Report
> -------------------
> t1092-sparse-checkout-compatibility.sh           (Wstat: 256 Tests: 53 Failed: 1)
>   Failed test:  49
>   Non-zero exit status: 1
> t3701-add-interactive.sh                         (Wstat: 0 Tests: 71 Failed: 0)
>   TODO passed:   45, 47
> Files=957, Tests=25489, 645 wallclock secs ( 5.74 usr  1.56 sys + 866.28 cusr 364.34 csys = 1237.92 CPU)
> Result: FAIL
> make[1]: *** [Makefile:53: prove] Error 1
> make[1]: Leaving directory '/home/runner/work/git/git/t'
> make: *** [Makefile:3018: test] Error 2
>

Firstly I very much applaud any effort to move the CI UX forward. I know
we haven't seen eye-to-eye on some of the trade-offs there, but I think
something like this series is a step in the right direction. I.e. trying
harder to summarize the output for the user, and making use of some CI
platform-specific features.

I sent a reply in this thread purely on some implementation concerns
related to that in
https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/,
but let's leave that aside for now...

> [...]
> So I had a look at what standards exist e.g. when testing PowerShell
> modules, in the way of marking up their test output in GitHub workflows, and
> I was not disappointed: GitHub workflows support "grouping" of output lines,
> i.e. marking sections of the output as a group that is then collapsed by
> default and can be expanded. And it is this feature I decided to use in this
> patch series, along with GitHub workflows' commands to display errors or
> notices that are also shown on the summary page of the workflow run. Now, in
> addition to "Process completed with exit code" on the summary page, we also
> read something like:
>
> ⊗ linux-gcc (ubuntu-latest)
>    failed: t9800.20 submit from detached head
>
> Even better, this message is a link, and following that, the reader is
> presented with something like this
> [https://github.com/dscho/git/runs/4840190622?check_suite_focus=true]:

This series is doing several different things, at least:

 1) "Grouping" the ci/ output, i.e. "make" from "make test"
 2) Doing likewise for t/test-lib.sh
 3) In doing that for t/test-lib.sh, also "signalling" the GitHub CI,
    to e.g. get the "submit from detached head" output you quote just
    a few lines above

I'd like to focus on just #1 here.

Where I was going with that in my last CI series was to make a start at
eventually being able to run simply "make" at the top-level
"step". I.e. to have a recipe that looks like:

    - run: make
    - run: make test

I feel strongly that that's where we should be heading, and the #1 part
of this series is basically trying to emulate what you'd get for free if
we simply did that.

I.e. if you run single commands at the "step" level (in GitHub CI
nomenclature) you'll get what you're doing with groupings in this series
for free, and without any special code in ci/*. Better yet, if you then
want grouping *within* that step you're free to add it without having
already used up your one level of grouping on distinguishing "make" from
"make test".

IOW our CI now looks like this (pseudocode):

     - job:
       - step1:
         - use ci/lib.sh to set env vars
         - run a script like ci/run-build-and-tests.sh
       - step2:
         - use ci/lib.sh to set env vars
         - run a script like print-test-failures.sh

But should instead look like:

     - job:
       - step1:
         - set variables in $GITHUB_ENV using ci/lib.sh
       - step2:
         - make
       - step3:
         - make test
       - step4:
         - run a script like print-test-failures.sh

Well, we can quibble about "step4", but again, let's focus on #1 here,
that's more #2-#3 territory.

I had some WIP code to do that which I polished up; here's how e.g. a
build failure looks in your implementation (again, just focusing on how
"make" and "make test" are divided out, not the rest):

    https://github.com/dscho/git/runs/4840190622?check_suite_focus=true#step:4:62

I.e. you've made "build" an expandable group at the same level as a
single failed test, and still all under the opaque
ci/run-build-and-tests.sh script.

And here's mine. This is using a semi-recent version of my patches that
happened to have a failure, not quite what I've got now, but close
enough for this E-Mail:

    https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:1

Now, notice two things. One, we've made "make" and "make test" top-level
steps, but more importantly, if you expand that "make test" step on yours
you'll get the full "make test" output.

And two, it's got something you don't have at all, which is that we're
now making use of the GitHub CI feature of having a pre-declared
environment for "make test", which the CI knows about (you need to click
to expand it):

    https://github.com/avar/git/runs/4956260395?check_suite_focus=true#step:7:4

Right now that's something we hardly make use of at all, but with my
changes the environment is the *only* special sauce we specify before
the step, i.e. GIT_PROVE_OPTS=.. DEFAULT_TEST_TARGET=... etc.
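
A minimal sketch of that mechanism (the variable values below are made
up; the point is only the mechanism itself, appending to the file named
by $GITHUB_ENV so that later steps see those variables):

    # Hypothetical "set up environment" step: anything appended to the
    # file named by $GITHUB_ENV becomes part of the declared environment
    # of all subsequent steps, e.g. a plain "make test" step.
    {
            echo "DEFAULT_TEST_TARGET=prove"
            echo "GIT_PROVE_OPTS=--timer --jobs 8"
            echo "GIT_TEST_OPTS=--verbose-log -x"
    } >>"$GITHUB_ENV"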

I think I've run out of my ML quota for now, but here's the branch that implements it:

    https://github.com/git/git/compare/master...avar:avar/ci-unroll-make-commands-to-ci-recipe

That's "282 additions and 493 deletions.", much of what was required to
do this was to eject the remaining support for the dead Travis and Azure
CI's that we don't run, i.e. to entirely remove any sort of state
management or job control from ci/lib.sh, and have it *only* be tasked
with setting variables for subsequent steps to use.

That makes it much simpler; my end state of it is ~170 lines vs. your
~270 (but to be fair, some of that's deleted Travis code):

    https://github.com/avar/git/blob/avar/ci-unroll-make-commands-to-ci-recipe/ci/lib.sh
    https://github.com/gitgitgadget/git/blob/pr-1117/dscho/use-grouping-in-ci-v1/ci/lib.sh

And much of the rest is just gone, e.g. ci/run-build-and-tests.sh isn't
there anymore; instead you simply run "make" or "make test" (or the
equivalent on Windows, which also works):

    https://github.com/avar/git/tree/avar/ci-unroll-make-commands-to-ci-recipe/ci
    https://github.com/gitgitgadget/git/tree/pr-1117/dscho/use-grouping-in-ci-v1/ci

Anyway, I hope we can find some sort of joint way forward with this,
because I think your #1 at least is going in the opposite direction from
where we should be going to achieve much the same ends you'd like to
achieve.

We can really just do this in a much simpler way once we stop treating
ci/lib.sh and friends as monolithic ball-of-mud entry points.

But I'd really like us not to go in this direction of using markup to
"sub-divide" the "steps" within a given job, when we can relatively
easily just ... divide the steps.

As shown above that UI plays much more naturally into the CI's native
features & how it likes to arrange & present things.

And again, all of this is *only* discussing the "step #1" noted
above. Using "grouping" for presenting the test failures themselves or
sending summaries to the CI "Summary" is a different matter.

Thanks!
Johannes Schindelin Feb. 19, 2022, 11:46 p.m. UTC | #3
Hi Junio,

I notice that you did not take this into `seen` yet. I find that a little
sad because it would potentially have helped others to figure out the
failure in the latest `seen`:
https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162

Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

Ciao,
Dscho


Junio C Hamano Feb. 20, 2022, 2:44 a.m. UTC | #4
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I saw the thread, I saw a few patches were commented on, and a few were
left unanswered, but one was replied to by the original submitter with a
"Good catch!", making me expect the topic to be discussed or rerolled to
become ready relatively soon.

But nothing happened, so I even forgot to take a look myself by
picking it up in 'seen'.  It does sound sad that the topic was left
hanging there for 3 weeks or so in that state without any reroll or
response.
Ævar Arnfjörð Bjarmason Feb. 20, 2022, 12:47 p.m. UTC | #5
On Sun, Feb 20 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> I notice that you did not take this into `seen` yet. I find that a little
> sad because it would potentially have helped others to figure out the
> failure in the latest `seen`:
> https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>
> Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.

I left some feedback on your submission ~3 weeks ago that you haven't
responded to:
https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/

I think you should really reply to that before this moves forward,
i.e. these are not trivial concerns. I think that to get from our current
"X" to your aims of "Y", your way of doing that (for part of this series)
is really an overly complex way of getting there; we can do it much more
simply, and the simpler way integrates much better with the GitHub CI
UI.

The feedback I left is on the part of this that's not directly relevant
to what you're pointing out here (which is the grouping of the per-test
failure output), but if your series is picked up as-is we'd need to undo
rather big parts of it to get to what I consider a better state for the
"grouping" of the "make" vs. "make test" etc. output.

I can just submit my version of that & we can hash out what direction
makes sense there; how does that sound? I've been running with it for
about a month, and really think that part of the failure output is much
better.

Here's an example of that part:
https://github.com/avar/git/runs/5259000590?check_suite_focus=true

I.e. note how we'll now just have a "make" and "make test" step, and we
failed there on the "make".

So we'd get to the point of simply invoking those build steps as 1:1
mapped CI steps, as opposed to "improving" ci/run-build-and-tests.sh to
emulate that (I've just git rm'd it in my version).

Johannes Schindelin Feb. 20, 2022, 3:25 p.m. UTC | #6
Hi Junio,

On Sat, 19 Feb 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I notice that you did not take this into `seen` yet. I find that a little
> > sad because it would potentially have helped others to figure out the
> > failure in the latest `seen`:
> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >
> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>
> I saw the thread, I saw a few patches were commented on, and a few
> were left unanswered, but one was replied by the original submitter
> with a "Good catch!", making me expect the topic to be discussed or
> rerolled to become ready relatively soon.

Yes, I have local changes, but I had really hoped that this patch series
would get a chance to prove its point by example, i.e. by offering the
improved output for the failures in `seen`. I hoped that because I think
that those improvements speak for themselves when you see them.

Ciao,
Dscho
Ævar Arnfjörð Bjarmason Feb. 21, 2022, 8:09 a.m. UTC | #7
On Sun, Feb 20 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Sat, 19 Feb 2022, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > I notice that you did not take this into `seen` yet. I find that a little
>> > sad because it would potentially have helped others to figure out the
>> > failure in the latest `seen`:
>> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>> >
>> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>>
>> I saw the thread, I saw a few patches were commented on, and a few
>> were left unanswered, but one was replied by the original submitter
>> with a "Good catch!", making me expect the topic to be discussed or
>> rerolled to become ready relatively soon.
>
> Yes, I have local changes, but I had really hoped that this patch series
> would get a chance to prove its point by example, i.e. by offering the
> improved output for the failures in `seen`. I hoped that because I think
> that those improvements speak for themselves when you see them.

I think it's a good idea to get wider exposure in "seen", "next" etc. for
topics where the bottleneck is lack of feedback due to lack of wider
exposure.

But in this case I've pointed out both direction & UX issues to you that
you haven't addressed: both what I sent a reminder of yesterday in [1]
and, more relevant to what you're discussing here, a reply [2] where I
looked at & tested your new output vs. the old, and found that on test
failures it:

 * Replaced summary output with a much more verbose version.

 * Turned the GitHub UI from usable (but sometimes hard to find the needle in
   the haystack) to *extremely slow*, seemingly because the browser was asked
   to make sense of ~30k lines of output, with some of it hidden dynamically
   by JavaScript.

1. https://lore.kernel.org/git/220220.86bkz1d7hm.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/
Johannes Schindelin Feb. 22, 2022, 10:26 a.m. UTC | #8
Hi Ævar,

On Mon, 21 Feb 2022, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 20 2022, Johannes Schindelin wrote:
>
> > On Sat, 19 Feb 2022, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >>
> >> > I notice that you did not take this into `seen` yet. I find that a little
> >> > sad because it would potentially have helped others to figure out the
> >> > failure in the latest `seen`:
> >> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >> >
> >> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
> >>
> >> I saw the thread, I saw a few patches were commented on, and a few
> >> were left unanswered, but one was replied by the original submitter
> >> with a "Good catch!", making me expect the topic to be discussed or
> >> rerolled to become ready relatively soon.
> >
> > Yes, I have local changes, but I had really hoped that this patch series
> > would get a chance to prove its point by example, i.e. by offering the
> > improved output for the failures in `seen`. I hoped that because I think
> > that those improvements speak for themselves when you see them.
>
> I think it's a good idea to get wider expose in "seen", "next" etc. for
> topics where the bottleneck is lack of feedback due to lack of wider
> exposure.

Having this in `seen` will give the patch series a chance to show in real
life how it improves the process of analyzing regressions.

Ciao,
Johannes
Johannes Schindelin Feb. 22, 2022, 10:30 a.m. UTC | #9
Hi Ævar,

On Sun, 20 Feb 2022, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Feb 20 2022, Johannes Schindelin wrote:
>
> > I notice that you did not take this into `seen` yet. I find that a little
> > sad because it would potentially have helped others to figure out the
> > failure in the latest `seen`:
> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
> >
> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>
> I left some feedback on your submission ~3 weeks ago that you haven't
> responded to:
> https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/

You answered my goal of making it easier to figure out regressions by
doubling down on hiding the logs even better. That's not feedback, that's
just ignoring the goal.

You answered my refactor of the Azure Pipelines support with the question
"why?" that I had answered already a long time ago. That's not feedback,
that's ignoring the answers I already provided.

I don't know how to respond to that, therefore I didn't.

Ciao,
Johannes
Ævar Arnfjörð Bjarmason Feb. 22, 2022, 1:31 p.m. UTC | #10
On Tue, Feb 22 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Sun, 20 Feb 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sun, Feb 20 2022, Johannes Schindelin wrote:
>>
>> > I notice that you did not take this into `seen` yet. I find that a little
>> > sad because it would potentially have helped others to figure out the
>> > failure in the latest `seen`:
>> > https://github.com/git/git/runs/5255378056?check_suite_focus=true#step:5:162
>> >
>> > Essentially, a recent patch introduces hard-coded SHA-1 hashes in t3007.3.
>>
>> I left some feedback on your submission ~3 weeks ago that you haven't
>> responded to:
>> https://lore.kernel.org/git/220127.86ilu5cdnf.gmgdl@evledraar.gmail.com/
>
> You answered my goal of making it easier to figure out regressions by
> doubling down on hiding the logs even better. That's not feedback, that's
> just ignoring the goal.

I think it's clear to anyone reading my feedback that that's either a
gross misreading of the feedback I provided or an intentional
misrepresentation.

I don't mention the second one of those lightly, but I think after some
months of that pattern now when commenting on various patches of yours
it's not an unfair claim.

I.e. you generally seem to latch onto some very narrow interpretation or
comment in some larger feedback pointing out various issues to you, and
use that as a reason not to respond to or address any of the rest.

So just to make the point about one of those mentioned in my [1] with
some further details (I won't go into the whole thing to avoid repeating
myself):

I opened both of:

    https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
    https://github.com/dscho/git/runs/4840190622?check_suite_focus=true

Just now in Firefox 91.5.0esr-1. Both having been opened before, so
they're in cache, and I've got a current 40MB/s real downlink speed etc.

The former fully loads in around 5100ms; with your series here it's
just short of 18000ms.

So your CI changes are making the common case of just looking at a CI
failure more than **3x as slow as before**.

That's according to the "performance" timeline, and not some abstract
"some JS was still running in the background". It lines up with the time
that the scroll bar on the side of the screen stops moving, and the
viewport does the "zoom to end" thing in GitHub CI's UI, focusing on the
error reported. It was really slow before, but it's SLOOOOOW now.

All of which (and I'm no webdev expert) seems to have to do with the
browser engine struggling to keep up with a much larger set of log data
being thrown at it: despite the eliding you're adding, the ~1.7k lines
of output have grown to beyond ~33k now.

Once it loads, the end result after all of that (re your "doubling down
on hiding the logs even better") is that I need to scroll through around
6 pages of output (and I've got a sizable vertically mounted screen),
each of which takes around 3 seconds of Firefox churning at more than
100% CPU before it shows me the next page.

And even *if* it were instant, the names of the failing tests are now
spread across several pages of output, whereas in the "prove" output we
have a quick overall summary separated from the
"ci/print-test-failures.sh" output.

Does that mean the current output is perfect and can't be improved? No,
I also think it sucks. I just think that the current implementation
you've proposed for improving it is making it worse overall.

Which doesn't mean that it couldn't be addressed, fixed, or that the
core idea of using that "group" syntax to aggregate that output into
sections is bad. I think we should use it, just not as it's currently
implemented.
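
For readers who haven't used it: the "group" syntax here refers to
GitHub's log-grouping workflow commands, which a ci/ shell script can
emit around a chunk of output. A minimal sketch, with a hypothetical
section title and log file, looks something like:

    echo "::group::t1092-sparse-checkout-compatibility.sh"
    cat t/test-results/t1092-sparse-checkout-compatibility.out
    echo "::endgroup::"

The web UI then renders everything between the two markers as a single
collapsible section.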

If that's "not feedback" I don't know what is. It's all relevant, and
while I'm elaborating further here, [1] (sent almost a month ago) notes
the same issues.

> You answered my refactor of the Azure Pipelines support with the question
> "why?" that I had answered already a long time ago. That's not feedback,
> that's ignoring the answers I already provided.

I think the gap between that answer and what I was asking you is clear
from the parallel follow-up discussion at [2].

But even your answer there of just wanting to keep it in place doesn't
really answer that question for this series. You're not just keeping
that stale code in place, but actively changing it.

I.e. even if you run with all that, how are others meant to test and
review the changes being proposed here?

I.e. is resurrecting Azure CI required to test this series, or should
reviewers ignore those parts and just hope it all works etc?

> I don't know how to respond to that, therefore I didn't.

I think that whatever differences in direction we have for this CI
feature, or whatever trouble we have understanding one another, your
update after 3 weeks of not replying to that feedback [3], asking why
Junio didn't pick up your patches as though nothing had been said about
them at all, is not a good way to proceed.

I.e. we're not the only people talking here; there are presumably
others who'll read these threads and will want to comment on the
direction of any CI changes.

Knowing from you that you read outstanding feedback and didn't
understand it, or read it and summarized it but ultimately decided to
change nothing etc., makes for a much better flow on the ML than just
ignoring that feedback entirely.

1. https://lore.kernel.org/git/220126.86sftbfjl4.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220222.86y2236ndp.gmgdl@evledraar.gmail.com/
3. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2202200043590.26495@tvgsbejvaqbjf.bet/
Phillip Wood Feb. 23, 2022, 12:07 p.m. UTC | #11
On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
> [...]
> So just to make the point about one of those mentioned in my [1] with
> some further details (I won't go into the whole thing to avoid repeating
> myself):
> 
> I opened both of:
> 
>      https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
>      https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
> 
> Just now in Firefox 91.5.0esr-1. Both having been opened before, so
> they're in cache, and I've got a current 40MB/s real downlink speed etc.
> 
> The former fully loads in around 5100ms, with your series here that's
> just short of 18000ms.
> 
> So your CI changes are making the common case of just looking at a CI
> failure more than **3x as slow as before**.

I don't think that is the most useful comparison between the two. When I
am investigating a test failure, the time that matters to me is the time
it takes to display the output of the failing test case. With the first
link above the initial page load is faster, but to get to the output of
the failing test case I have to click on "Run ci/print_test_failures.sh",
then wait for that to load, and then search for "not ok" to actually get
to the information I'm after. With the second link the initial page load
does feel slower, but then I'm presented with the test failures nicely
highlighted in red; all I have to do is click on one and I've got the
information I'm after. Overall that is much faster and easier to use.

Best Wishes

Phillip
Ævar Arnfjörð Bjarmason Feb. 25, 2022, 12:39 p.m. UTC | #12
On Wed, Feb 23 2022, Phillip Wood wrote:

> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
>> [...]
>> So just to make the point about one of those mentioned in my [1] with
>> some further details (I won't go into the whole thing to avoid repeating
>> myself):
>> I opened both of:
>>      https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
>>      https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
>> Just now in Firefox 91.5.0esr-1. Both having been opened before, so
>> they're in cache, and I've got a current 40MB/s real downlink speed etc.
>> The former fully loads in around 5100ms, with your series here
>> that's
>> just short of 18000ms.
>> So your CI changes are making the common case of just looking at a
>> CI
>> failure more than **3x as slow as before**.
>
> I don't think that is the most useful comparison between the two.[...]

I'm not saying that it's the most useful comparison between the two, but
that there's a major performance regression introduced in this series
that so far isn't addressed or noted.

> [...]When
> I am investigating a test failure the time that matters to me is the
> time it takes to display the output of the failing test case. With the
> first link above the initial page load is faster but to get to the
> output of the failing test case I have click on "Run
> ci/print_test_failures.sh" then wait for that to load and then search
> for "not ok" to actually get to the information I'm after. With the
> second link the initial page load does feel slower but then I'm
> presented  with the test failures nicely highlighted in red, all I
> have to do is click on one and I've got the information I'm
> after. Overall that is much faster and easier to use.

Whether you think the regression is worth the end result is a subjective
judgement. I don't think it is, but I don't think you or anyone else is
wrong if they don't agree.

If you think it's OK to spend ~20s instead of ~5s on rendering the new
output, that's something that clearly depends on how much you value the
new output, and how much you're willing to wait.

What I am saying, and what I hope you'll agree with, is that it's
something that should be addressed in some way by this series.

One way to do that would be to note the performance regression in a
commit message, and argue that despite the slowdown it's worth it.
Johannes Schindelin Feb. 25, 2022, 2:10 p.m. UTC | #13
Hi Phillip,

On Wed, 23 Feb 2022, Phillip Wood wrote:

> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > So just to make the point about one of those mentioned in my [1] with
> > some further details (I won't go into the whole thing to avoid repeating
> > myself):
> >
> > I opened both of:
> >
> > https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
> > https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
> >
> > Just now in Firefox 91.5.0esr-1. Both having been opened before, so
> > they're in cache, and I've got a current 40MB/s real downlink speed etc.
> >
> > The former fully loads in around 5100ms, with your series here that's
> > just short of 18000ms.
> >
> > So your CI changes are making the common case of just looking at a CI
> > failure more than **3x as slow as before**.
>
> I don't think that is the most useful comparison between the two. When I
> am investigating a test failure the time that matters to me is the time
> it takes to display the output of the failing test case.

Thank you for expressing this so clearly. I will adopt a variation of this
phrasing in my commit message, if you don't mind?

> With the first link above the initial page load is faster but to get to
> the output of the failing test case I have click on "Run
> ci/print_test_failures.sh" then wait for that to load and then search
> for "not ok" to actually get to the information I'm after.

And that's only because you are familiar with what you have to do.

Any new contributor would be stuck with the information presented on the
initial load, without any indication that more information _is_ available,
just hidden away in the next step's log (which is marked as "succeeding",
therefore misleading the inclined reader into thinking that this cannot
potentially contain any information pertinent to the _failure_ that needs
to be investigated).

> With the second link the initial page load does feel slower but then I'm
> presented with the test failures nicely highlighted in red, all I have
> to do is click on one and I've got the information I'm after.
>
> Overall that is much faster and easier to use.

Thank you for your comment. I really started to doubt myself, getting the
idea that it's just a case of me holding this thing wrong.

For what it's worth, I did make a grave mistake by using that particular
`seen` CI failure with all of those failing p4 tests, which obviously
resulted in an incredibly large amount of logs. Obviously that _must_ be
slow to load. I just did not have the time to fabricate a CI failure.

However, I do agree with you that the large amount of logs would have to
be looked at _anyway_, whether it is shown upon loading the job's logs or
only when expanding the `print-test-failures` step's logs. The amount of
the logs is a constant, after all; I did not change anything there (nor
would I).

So a better example might be my concrete use case yesterday: the CI build
of `seen` failed. Here is the link to the regular output:

	https://github.com/git/git/actions/runs/1890665968

On that page, you see the following:


	Annotations
	8 errors and 1 warning

	ⓧ win test (3)
	  Process completed with exit code 2.

	ⓧ win test (6)
	  Process completed with exit code 2.

	ⓧ win test (2)
	  Process completed with exit code 2.

	ⓧ win+VS test (3)
	  Process completed with exit code 2.

	ⓧ win+VS test (6)
	  Process completed with exit code 2.

	ⓧ win+VS test (2)
	  Process completed with exit code 2.

	ⓧ osx-gcc (macos-latest)
	  Process completed with exit code 2.

	ⓧ osx-clang (macos-latest)
	  Process completed with exit code 2.

	⚠ CI: .github#L1
	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856

So I merged my branch into `seen` and pushed it. The corresponding run can
be seen here:

	https://github.com/dscho/git/actions/runs/1892982393

On that page, you see the following:

	Annotations
	50 errors and 1 warning

	ⓧ win test (3)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ win test (3)
	  failed: t7527.2 implicit daemon start

	ⓧ win test (3)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ win test (3)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ win test (3)
	  failed: t7527.5 implicit daemon stop (rename GIT~1)

	ⓧ win test (3)
	  failed: t7527.6 implicit daemon stop (rename GIT~2)

	ⓧ win test (3)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ win test (3)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ win test (3)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ win test (3)
	  failed: t7527.12 edit some files

	ⓧ win test (2)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ win test (2)
	  Process completed with exit code 1.

	ⓧ win test (6)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ win test (6)
	  failed: t7519.3 run fsmonitor-daemon in virtual repo

	ⓧ win test (6)
	  Process completed with exit code 1.

	ⓧ win+VS test (3)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ win+VS test (3)
	  failed: t7527.2 implicit daemon start

	ⓧ win+VS test (3)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ win+VS test (3)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ win+VS test (3)
	  failed: t7527.5 implicit daemon stop (rename GIT~1)

	ⓧ win+VS test (3)
	  failed: t7527.6 implicit daemon stop (rename GIT~2)

	ⓧ win+VS test (3)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ win+VS test (3)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ win+VS test (3)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ win+VS test (3)
	  failed: t7527.12 edit some files

	ⓧ win+VS test (2)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ win+VS test (2)
	  Process completed with exit code 1.

	ⓧ win+VS test (6)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ win+VS test (6)
	  failed: t7519.3 run fsmonitor-daemon in virtual repo

	ⓧ win+VS test (6)
	  Process completed with exit code 1.

	ⓧ osx-clang (macos-latest)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ osx-clang (macos-latest)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ osx-clang (macos-latest)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ osx-clang (macos-latest)
	  failed: t7527.2 implicit daemon start

	ⓧ osx-clang (macos-latest)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.7 MacOS event spelling (rename .GIT)

	ⓧ osx-clang (macos-latest)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ osx-clang (macos-latest)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ osx-clang (macos-latest)
	  failed: t7527.11 status implicitly starts daemon

	ⓧ osx-gcc (macos-latest)
	  failed: t0012.81 fsmonitor--daemon can handle -h

	ⓧ osx-gcc (macos-latest)
	  failed: t7519.2 run fsmonitor-daemon in bare repo

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.1 explicit daemon start and stop

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.2 implicit daemon start

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.3 implicit daemon stop (delete .git)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.4 implicit daemon stop (rename .git)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.7 MacOS event spelling (rename .GIT)

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.8 cannot start multiple daemons

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.10 update-index implicitly starts daemon

	ⓧ osx-gcc (macos-latest)
	  failed: t7527.11 status implicitly starts daemon

	⚠ CI: .github#L1
	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856

In my mind, this is already an improvement. (Even if this is a _lot_ of
output, and a lot of individual errors, given that all of them are fixed
with a single, small patch to adjust an option usage string, but that's
not the fault of my patch series, but of the suggestion to put the check
for the option usage string linting into the `parse_options()` machinery
instead of into the static analysis job.)

Since there are still plenty of failures, the page admittedly does load
relatively slowly. But that's not the time I was trying to optimize for.
My time comes at quite a premium these days, so if the computer has to
work a little harder while I can do something else, as long as it saves
_me_ time, I'll take that time. Every time.

Ciao,
Dscho
Junio C Hamano Feb. 25, 2022, 6:16 p.m. UTC | #14
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So I merged my branch into `seen` and pushed it. The corresponding run can
> be seen here:
>
> 	https://github.com/dscho/git/actions/runs/1892982393

I visited this page (while logged in to GitHub---I am saying this
for others who may not know that the output is shown differently for
visitors who are not logged in and for users who are).

> On that page, you see the following:
>
> 	Annotations
> 	50 errors and 1 warning
>
> 	ⓧ win test (3)
> 	  failed: t7527.1 explicit daemon start and stop
> ...
>
> 	⚠ CI: .github#L1
> 	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
>
> In my mind, this is already an improvement. (Even if this is a _lot_ of
> output, and a lot of individual errors, given that all of them are fixed
> with a single, small patch to adjust an option usage string, but that's
> not the fault of my patch series, but of the suggestion to put the check
> for the option usage string linting into the `parse_options()` machinery
> instead of into the static analysis job.)

It is not obvious to your readers what aspect of the new output _you_
found "an improvement", because you didn't spell it out.  That makes
"in my mind, this is already an improvement" sound like a weaker claim
than it really is.

Let me tell my experience:

 - Clicking on macos+clang in the map-looking thing, it did show and
   scroll down automatically to show the last failure link ready to
   be clicked after a few seconds, which was nice, but made me
   scroll back to see the first failure, which could have been
   better.

 - Clicking on win+VS test (2), the failed <test> part was
   automatically opened, and a circle spun for several dozen
   seconds to make me wait, but after that, nothing happened.  It
   was somewhat hard to know whether I was expected to do something to
   view the first error and when the UI was ready to let me do so, or
   whether I was just expected to wait a bit longer for it to all happen
   automatically.

In either case, the presentation that folds all the pieces that finished
successfully made it usable, as that saved the human time spent scanning
to where the failures are shown.

I personally do not care about the initial latency when viewing the
output from a CI run that may have happened a few dozen minutes
ago (I do not sit in front of the GitHub CI UI and wait until it
finishes). As long as it is made clear when I can start interacting
with it, I can just open the page and let it load while I am working
on something else.

Thanks.
Junio C Hamano Feb. 26, 2022, 6:43 p.m. UTC | #15
Junio C Hamano <gitster@pobox.com> writes:

> Let me tell my experience:
>
>  - Clicking on macos+clang in the map-looking thing, it did show and
>    scroll down automatically to show the last failure link ready to
>    be clicked after a few seconds, which was nice, but made me
>    scroll back to see the first failure, which could have been
>    better.
>
>  - Clicking on win+VS test (2), the failed <test> part was
>    automatically opened, and a circle spinned for several dozens of
>    seconds to make me wait, but after that, nothing happened.  It
>    was somewhat hard to know if I were expected to do something to
>    view the first error and when the UI is ready to let me do so, or
>    if I were just expected to wait a bit longer for it to all happen
>    automatically.
>
> Either case, the presentation to fold all the pieces that finished
> successfully made it usable, as that saved human time to scan to
> where failures are shown.
>
> I personally do not care about the initial latency when viewing the
> output from CI run that may have happened a few dozens of minutes
> ago (I do not sit in front of GitHub CI UI and wait until it
> finishes). As long as it is made clear when I can start interacting
> with it, I can just open the page and let it load while I am working
> on something else.

FWIW, CI run on "seen" uses this series.

When I highlight a failure at CI, I often give a URL like this:

https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520

I notice that this "hide by default" forces the recipient of the URL
to click the line after the line with a red highlight before they
can view the breakage.

For example, a URL to show a similar breakage from the old run
(without this series) looks like this:

https://github.com/git/git/runs/5341052811?check_suite_focus=true#step:5:3968

This directly jumps to the error and the recipient of the URL does
not have to do anything special, which I have been using as a
convenient way to give developers a starting point.

I haven't compared the implementation of this one and Ævar's series
that aims for a different goal, so I do not yet have an opinion on
which one should come first (if we want to achieve both of what each
of them wants to achieve, that is).

Thanks.
Junio C Hamano March 1, 2022, 2:59 a.m. UTC | #16
Junio C Hamano <gitster@pobox.com> writes:

> FWIW, CI run on "seen" uses this series.

Another "early impression".  I had to open this one today,

    https://github.com/git/git/runs/5367854000?check_suite_focus=true

which was a jarring experience.  It correctly painted the fourth
circle "Run ci/run-build-and-tests.sh" in red with X in it, and
after waiting for a while (which I already said that I do not mind
at all), showed a bunch of lines, and then auto-scrolled down to the
end of that section.

It _looked_ like that it was now ready for me to interact with it,
so I started to scroll up to the beginning of that section, but I
had to stare at blank space for several minutes before lines are
shown to occupy that space.  During the repainting, unlike the
initial delay-wait that lets me know that it is not ready by showing
the spinning circle, there was no indication that it wants me to
wait until it fills the blank space with lines.  Not very pleasant.

I do not think it is so bad as to say that it is less pleasant than
opening the large "print test failures" section and looking for "not
ok", which was the workflow with the original CI UI we had before this
series.  But at least with the old one, once the UI became ready for me
to interact with, I didn't have to wait through (for lack of a better
phrase) such UI hiccups.  Responses to looking for the next instance
of "not ok" were predictable.

Thanks.
Junio C Hamano March 1, 2022, 6:35 a.m. UTC | #17
Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> FWIW, CI run on "seen" uses this series.
>
> Another "early impression".  I had to open this one today,
>
>     https://github.com/git/git/runs/5367854000?check_suite_focus=true
>
> which was a jarring experience.  It correctly painted the fourth
> circle "Run ci/run-build-and-tests.sh" in red with X in it, and
> after waiting for a while (which I already said that I do not mind
> at all), showed a bunch of line, and then auto-scrolled down to the
> end of that section.
>
> It _looked_ like that it was now ready for me to interact with it,
> so I started to scroll up to the beginning of that section, but I
> had to stare at blank space for several minutes before lines are

Nah, that was several seconds, not minutes.  Even though I am on
Chromebooks, they are not _that_ slow ;-)

> shown to occupy that space.  During the repainting, unlike the
> initial delay-wait that lets me know that it is not ready by showing
> the spinning circle, there was no indication that it wants me to
> wait until it fills the blank space with lines.  Not very pleasant.
>
> I do not think it is so bad to say that it is less pleasant than
> opening the large "print test failures" section and looking for "not
> ok", which was what the original CI UI we had before this series.
> But at least with the old one, once the UI becomes ready for me to
> interact with, I didn't have to wait for (for the lack of better
> phrase) such UI hiccups.  Responses to looking for the next instance
> of "not ok" was predictable.
>
> Thanks.
Johannes Schindelin March 1, 2022, 10:10 a.m. UTC | #18
Hi Junio,

On Sat, 26 Feb 2022, Junio C Hamano wrote:

> When I highlight a failure at CI, I often give a URL like this:
>
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5520
>
> I notice that this "hide by default" forces the recipient of the URL
> to click the line after the line with a red highlight before they
> can view the breakage.

Yes, that's because line 5520 is the header of that group. If you direct
the reader to
https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5674
instead, it will get expanded.
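
For reference, the anchor format being passed around in this thread is,
roughly:

    https://github.com/<owner>/<repo>/runs/<run-id>?check_suite_focus=true#step:<step>:<line>

i.e. the trailing "#step:<step>:<line>" selects a step of the job and a
line within that step's log; whether the enclosing group gets expanded
depends on which line the anchor points at, as described above.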

If it would not get expanded, that would be a bug, but obviously not in my
patch series.

Ciao,
Dscho
Johannes Schindelin March 1, 2022, 10:18 a.m. UTC | #19
Hi Junio,

On Mon, 28 Feb 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > FWIW, CI run on "seen" uses this series.
>
> Another "early impression".  I had to open this one today,
>
>     https://github.com/git/git/runs/5367854000?check_suite_focus=true
>
> which was a jarring experience.  It correctly painted the fourth
> circle "Run ci/run-build-and-tests.sh" in red with X in it, and
> after waiting for a while (which I already said that I do not mind
> at all), showed a bunch of line, and then auto-scrolled down to the
> end of that section.
>
> It _looked_ like that it was now ready for me to interact with it,
> so I started to scroll up to the beginning of that section, but I
> had to stare at blank space for several minutes before lines are
> shown to occupy that space.  During the repainting, unlike the
> initial delay-wait that lets me know that it is not ready by showing
> the spinning circle, there was no indication that it wants me to
> wait until it fills the blank space with lines.  Not very pleasant.
>
> I do not think it is so bad to say that it is less pleasant than
> opening the large "print test failures" section and looking for "not
> ok", which was what the original CI UI we had before this series.
> But at least with the old one, once the UI becomes ready for me to
> interact with, I didn't have to wait for (for the lack of better
> phrase) such UI hiccups.  Responses to looking for the next instance
> of "not ok" was predictable.

Let me again state my goal clearly, because some readers seem to be
confused and believe that I want to improve the developer experience of
veterans of the Git mailing list who are more than capable of finding
their way through the build failures.

My goal is instead to make new contributors' lives easier.

It is a pretty high bar we set, expecting a new contributor faced with a
build failure to figure out how to fix the breakage. It's as if we
_wanted_ to instill impostor syndrome in them, which is probably not
intentional.

In that respect, a relatively responsive page that utterly fails to direct
the reader to the culprit is far, far worse than a slightly bumpy page
that _does_ lead the reader in the right direction.

In any case, thank you for integrating the patches into `seen` so that the
impact of the patches can be "seen".

Ciao,
Dscho
Johannes Schindelin March 1, 2022, 10:20 a.m. UTC | #20
Hi Junio,

On Fri, 25 Feb 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On that page, you see the following:
> >
> > 	Annotations
> > 	50 errors and 1 warning
> >
> > 	ⓧ win test (3)
> > 	  failed: t7527.1 explicit daemon start and stop
> > ...
> >
> > 	⚠ CI: .github#L1
> > 	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
> >
> > In my mind, this is already an improvement. (Even if this is a _lot_ of
> > output, and a lot of individual errors, given that all of them are fixed
> > with a single, small patch to adjust an option usage string, but that's
> > not the fault of my patch series, but of the suggestion to put the check
> > for the option usage string linting into the `parse_options()` machinery
> > instead of into the static analysis job.)
>
> It is not obvious what aspect in the new output _you_ found "an
> improvement" to your readers, because you didn't spell it out.

True.

The difference is those added lines that clearly indicate not only the
failed job, but also the precise test case that failed. All in the summary
page, without requiring any clicking nor scrolling.
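
For the record, those summary entries are GitHub "annotations", and they
are produced simply by printing a workflow command from the job's
output; a minimal sketch, reusing one of the failures above, would be:

    echo "::error::failed: t7527.1 explicit daemon start and stop"

GitHub then surfaces every such line on the run's summary page,
attributed to the job that emitted it.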

Thank you,
Dscho
Junio C Hamano March 1, 2022, 4:52 p.m. UTC | #21
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Let me again state my goal clearly, because some readers seem to be
> confused and believe that I want to improve the developer experience of
> veterans of the Git mailing list who are more than capable of finding
> their way through the build failures.
>
> My goal is instead to make new contributors' lives easier.

I understand who your primary target audiences are, and making them
happy without robbing too much from others is a good thing to do.
Instead of "Nah, this is not targeted for me" and ignoring the
topic, I have been reporting my dogfood experience exactly to help
that process---to notice if this variant worsens end-user experience
and see if that is within the reasonable pros-and-cons tolerance,
especially because they will start noticing the same hiccup as new
contributors stay longer and gain experience.

> In any case, thank you for integrating the patches into `seen` so that the
> impact of the patches can be "seen".

No, do not thank me.  Thank yourself for writing it, and thank the
others who try it and tell you their experiences to help you make it
better.

Thanks.
Junio C Hamano March 1, 2022, 4:57 p.m. UTC | #22
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Yes, that's because line 5520 is the header of that group. If you direct
> the reader to
> https://github.com/git/git/runs/5343133021?check_suite_focus=true#step:4:5674
> instead, it will get expanded.

Ah, that's a good trick to know.  I presume 5674 can be any random
line in the block that is hidden by default that I want the viewer
to expand, right?

> If it would not get expanded, that would be a bug, but obviously not in my
> patch series.

Well, the choice to use a mechanism that has such a bug is made by
this patch series.  If "allowing users to easily exchange a URL that
points at the exact place where the tests broke" were one of the goals,
it does not matter whose fault it is---the aggregate result is that
the new UI would have failed that goal.

Thankfully there does not seem to be such a bug in the UI mechanism
used by this series, so we are in good shape.

Thanks.
Phillip Wood March 2, 2022, 10:58 a.m. UTC | #23
Hi Dscho

On 25/02/2022 14:10, Johannes Schindelin wrote:
> Hi Phillip,
> 
> On Wed, 23 Feb 2022, Phillip Wood wrote:
> 
>> On 22/02/2022 13:31, Ævar Arnfjörð Bjarmason wrote:
>>> [...]
>>> So just to make the point about one of those mentioned in my [1] with
>>> some further details (I won't go into the whole thing to avoid repeating
>>> myself):
>>>
>>> I opened both of:
>>>
>>> https://github.com/git-for-windows/git/runs/4822802185?check_suite_focus=true
>>> https://github.com/dscho/git/runs/4840190622?check_suite_focus=true
>>>
>>> Just now in Firefox 91.5.0esr-1. Both having been opened before, so
>>> they're in cache, and I've got a current 40MB/s real downlink speed etc.
>>>
>>> The former fully loads in around 5100ms, with your series here that's
>>> just short of 18000ms.
>>>
>>> So your CI changes are making the common case of just looking at a CI
>>> failure more than **3x as slow as before**.
>>
>> I don't think that is the most useful comparison between the two. When I
>> am investigating a test failure the time that matters to me is the time
>> it takes to display the output of the failing test case.
> 
> Thank you for expressing this so clearly. I will adopt a variation of this
> phrasing in my commit message, if you don't mind?

That's fine

>> With the first link above the initial page load is faster but to get to
>> the output of the failing test case I have click on "Run
>> ci/print_test_failures.sh" then wait for that to load and then search
>> for "not ok" to actually get to the information I'm after.
> 
> And that's only because you are familiar with what you have to do.
> 
> Any new contributor would be stuck with the information presented on the
> initial load, without any indication that more information _is_ available,
> just hidden away in the next step's log (which is marked as "succeeding",
> therefore misleading the inclined reader into thinking that this cannot
> potentially contain any information pertinent to the _failure_ that needs
> to be investigated).

Yes, it took me a while to realize how to get to the test output when I
first started looking at the CI results.

One thing I forgot to mention was that when you expand a failing test it
shows the test script twice before the output, e.g.

Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
   				git config core.fsmonitor true &&
   				git fsmonitor--daemon start &&
   				git update-index --fsmonitor
   			
   expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable 
fsmonitor':
   				git config core.fsmonitor true &&
   				git fsmonitor--daemon start &&
   				git update-index --fsmonitor

  ++ git config core.fsmonitor true
  ++ git fsmonitor--daemon start 			
  ...

I don't know how easy it would be to fix that so that we only show
"expecting success of ..." without the test being printed first.

Best Wishes

Phillip


>> With the second link the initial page load does feel slower but then I'm
>> presented with the test failures nicely highlighted in red, all I have
>> to do is click on one and I've got the information I'm after.
>>
>> Overall that is much faster and easier to use.
> 
> Thank you for your comment. I really started to doubt myself, getting the
> idea that it's just a case of me holding this thing wrong.
> 
> For what it's worth, I did make a grave mistake by using that particular
> `seen` CI failure with all of those failing p4 tests, which obviously
> resulted in an incredibly large amount of logs. Obviously that _must_ be
> slow to load. I just did not have the time to fabricate a CI failure.
> 
> However, I do agree with you that the large amount of logs would have to
> be looked at _anyway_, whether it is shown upon loading the job's logs or
> only when expanding the `print-test-failures` step's logs. The amount of
> the logs is a constant, after all, I did not change anything there (nor
> would I).
> 
> So a better example might be my concrete use case yesterday: the CI build
> of `seen` failed. Here is the link to the regular output:
> 
> 	https://github.com/git/git/actions/runs/1890665968
> 
> On that page, you see the following:
> 
> 
> 	Annotations
> 	8 errors and 1 warning
> 
> 	ⓧ win test (3)
> 	  Process completed with exit code 2.
> 
> 	ⓧ win test (6)
> 	  Process completed with exit code 2.
> 
> 	ⓧ win test (2)
> 	  Process completed with exit code 2.
> 
> 	ⓧ win+VS test (3)
> 	  Process completed with exit code 2.
> 
> 	ⓧ win+VS test (6)
> 	  Process completed with exit code 2.
> 
> 	ⓧ win+VS test (2)
> 	  Process completed with exit code 2.
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  Process completed with exit code 2.
> 
> 	ⓧ osx-clang (macos-latest)
> 	  Process completed with exit code 2.
> 
> 	⚠ CI: .github#L1
> 	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
> 
> So I merged my branch into `seen` and pushed it. The corresponding run can
> be seen here:
> 
> 	https://github.com/dscho/git/actions/runs/1892982393
> 
> On that page, you see the following:
> 
> 	Annotations
> 	50 errors and 1 warning
> 
> 	ⓧ win test (3)
> 	  failed: t7527.1 explicit daemon start and stop
> 
> 	ⓧ win test (3)
> 	  failed: t7527.2 implicit daemon start
> 
> 	ⓧ win test (3)
> 	  failed: t7527.3 implicit daemon stop (delete .git)
> 
> 	ⓧ win test (3)
> 	  failed: t7527.4 implicit daemon stop (rename .git)
> 
> 	ⓧ win test (3)
> 	  failed: t7527.5 implicit daemon stop (rename GIT~1)
> 
> 	ⓧ win test (3)
> 	  failed: t7527.6 implicit daemon stop (rename GIT~2)
> 
> 	ⓧ win test (3)
> 	  failed: t7527.8 cannot start multiple daemons
> 
> 	ⓧ win test (3)
> 	  failed: t7527.10 update-index implicitly starts daemon
> 
> 	ⓧ win test (3)
> 	  failed: t7527.11 status implicitly starts daemon
> 
> 	ⓧ win test (3)
> 	  failed: t7527.12 edit some files
> 
> 	ⓧ win test (2)
> 	  failed: t0012.81 fsmonitor--daemon can handle -h
> 
> 	ⓧ win test (2)
> 	  Process completed with exit code 1.
> 
> 	ⓧ win test (6)
> 	  failed: t7519.2 run fsmonitor-daemon in bare repo
> 
> 	ⓧ win test (6)
> 	  failed: t7519.3 run fsmonitor-daemon in virtual repo
> 
> 	ⓧ win test (6)
> 	  Process completed with exit code 1.
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.1 explicit daemon start and stop
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.2 implicit daemon start
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.3 implicit daemon stop (delete .git)
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.4 implicit daemon stop (rename .git)
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.5 implicit daemon stop (rename GIT~1)
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.6 implicit daemon stop (rename GIT~2)
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.8 cannot start multiple daemons
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.10 update-index implicitly starts daemon
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.11 status implicitly starts daemon
> 
> 	ⓧ win+VS test (3)
> 	  failed: t7527.12 edit some files
> 
> 	ⓧ win+VS test (2)
> 	  failed: t0012.81 fsmonitor--daemon can handle -h
> 
> 	ⓧ win+VS test (2)
> 	  Process completed with exit code 1.
> 
> 	ⓧ win+VS test (6)
> 	  failed: t7519.2 run fsmonitor-daemon in bare repo
> 
> 	ⓧ win+VS test (6)
> 	  failed: t7519.3 run fsmonitor-daemon in virtual repo
> 
> 	ⓧ win+VS test (6)
> 	  Process completed with exit code 1.
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t0012.81 fsmonitor--daemon can handle -h
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7519.2 run fsmonitor-daemon in bare repo
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.1 explicit daemon start and stop
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.2 implicit daemon start
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.3 implicit daemon stop (delete .git)
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.4 implicit daemon stop (rename .git)
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.7 MacOS event spelling (rename .GIT)
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.8 cannot start multiple daemons
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.10 update-index implicitly starts daemon
> 
> 	ⓧ osx-clang (macos-latest)
> 	  failed: t7527.11 status implicitly starts daemon
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t0012.81 fsmonitor--daemon can handle -h
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7519.2 run fsmonitor-daemon in bare repo
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.1 explicit daemon start and stop
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.2 implicit daemon start
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.3 implicit daemon stop (delete .git)
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.4 implicit daemon stop (rename .git)
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.7 MacOS event spelling (rename .GIT)
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.8 cannot start multiple daemons
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.10 update-index implicitly starts daemon
> 
> 	ⓧ osx-gcc (macos-latest)
> 	  failed: t7527.11 status implicitly starts daemon
> 
> 	⚠ CI: .github#L1
> 	  windows-latest workflows now use windows-2022. For more details, see https://github.com/actions/virtual-environments/issues/4856
> 
> In my mind, this is already an improvement. (Even if this is a _lot_ of
> output, and a lot of individual errors, given that all of them are fixed
> with a single, small patch to adjust an option usage string, but that's
> not the fault of my patch series, but of the suggestion to put the check
> for the option usage string linting into the `parse_options()` machinery
> instead of into the static analysis job.)
> 
> Since there are still plenty of failures, the page admittedly does load
> relatively slowly. But that's not the time I was trying to optimize for.
> My time comes at quite a premium these days, so if the computer has to
> work a little harder while I can do something else, as long as it saves
> _me_ time, I'll take that time. Every time.
> 
> Ciao,
> Dscho
Junio C Hamano March 4, 2022, 7:38 a.m. UTC | #24
GitHub CI seems to fail due to lack of "paste" for win+VS job.  This
was somewhat unexpected, as our test scripts seem to make liberal
use of "cut" that goes together with it.

    https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199

The particular failure at the URL comes from the use of "paste" in
5ea4f3a5 (name-rev: use generation numbers if available,
2022-02-28), but it hardly is the first use of the command.  There
is one use of it in t/aggregate-results.sh in 'master/main' already.

We could rewrite the tests that use "paste", but looking at how the
tool is used in the test (and in the aggregate script), rewriting them
due to the lack of a tool, whose source should be freely available from
wherever "cut" was taken from, does not sound like too attractive a
direction to go in.  On the other hand, I do not know how much work is
involved in adding it (and, in general, any basic tool of similar
complexity that we may find missing in the future) to the win+VS
environment.
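
For context, a couple of typical "paste" uses (unrelated to the specific
test that broke), just to illustrate the kind of shell code that stops
working without it:

    # join two files line by line, tab-separated
    printf 'a\nb\n' >left &&
    printf '1\n2\n' >right &&
    paste left right

    # fold a list onto a single comma-separated line
    printf '1\n2\n3\n' | paste -s -d, -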

Thoughts?

Thanks.
Ævar Arnfjörð Bjarmason March 4, 2022, 9:04 a.m. UTC | #25
On Thu, Mar 03 2022, Junio C Hamano wrote:

> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This
> was somewhat unexpected, as our test scripts seem to make liberal
> use of "cut" that goes together with it.
>
>     https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199
>
> The particular failure at the URL comes from the use of "paste" in
> 5ea4f3a5 (name-rev: use generation numbers if available,
> 2022-02-28), but it hardly is the first use of the command.  There
> is one use of it in t/aggregate-results.sh in 'master/main' already.

I think it's the first use; t/aggregate-results.sh is run under
"DEFAULT_TEST_TARGET=test make -C t", but we use
"DEFAULT_TEST_TARGET=prove".

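To spell that out, the difference is roughly between these two
invocations (an illustration of the contrast, not the exact CI command
lines):

    DEFAULT_TEST_TARGET=test make -C t   # runs t/aggregate-results.sh at the end
    DEFAULT_TEST_TARGET=prove make -C t  # what the CI uses; "prove" prints its own summary

which is why the existing "paste" call in t/aggregate-results.sh never
gets exercised in CI.
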
Re your upthread:

> I personally do not care about the initial latency when viewing the
> output from CI run that may have happened a few dozens of minutes
> ago (I do not sit in front of GitHub CI UI and wait until it
> finishes). 

I think this URL is a good example of what I noted in [1]. Your link
loads relatively quickly, but I then saw a "linux-TEST-vars" failure and
clicked on it, wanting to see why that fails.

It opens relatively quickly, but no failure can be seen. It stalls with
a spinner next to "t/run-build-and-test.sh", and stays like that for
75[2] seconds before finally loading past line ~3.5k to line ~70k,
showing the relevant failure in t6120*.sh.

I really don't think it's a reasonable claim to say that only "veterans"
of git development[3] are likely to find the workflow of seeing a CI
failure right away useful, or to want to browse through the N different
"job" failures without having to pre-open them, go find something else
to do, and then come back to them etc.

I also noted in [1] that it takes a lot more CPU now, so even if that is
your workflow for looking at CI, you'll need a fairly performant machine
if you have a few job failures (which isn't atypical), as each tab will
be pegging a CPU core at ~100% for a while.

I have a fairly normally spec'd quad-core laptop that I almost never
hear, and this new CI UI pretty reliably makes it sound as though it's
about to take flight.

1. https://lore.kernel.org/git/220222.86tucr6kz5.gmgdl@evledraar.gmail.com/
2. I reported a large number of seconds before, but nothing this bad. For
   some reason this one's particularly bad, but in [1] it was the same CPU
   use with ~20s etc. (but that one had 1/2 the amount of lines)
3. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2203011111150.11118@tvgsbejvaqbjf.bet/
Johannes Schindelin March 7, 2022, 3:48 p.m. UTC | #26
Hi Junio,

On Thu, 3 Mar 2022, Junio C Hamano wrote:

> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This
> was somewhat unexpected, as our test scripts seem to make liberal
> use of "cut" that goes together with it.
>
>     https://github.com/git/git/runs/5415486631?check_suite_focus=true#step:5:6199
>
> The particular failure at the URL comes from the use of "paste" in
> 5ea4f3a5 (name-rev: use generation numbers if available,
> 2022-02-28), but it hardly is the first use of the command.  There
> is one use of it in t/aggregate-results.sh in 'master/main' already.
>
> We could rewrite the tests that use "paste" but looking at the use
> of the tool in the test (and the aggregate thing), rewriting them
> due to lack of a tool, whose source should be freely available from
> where "cut" was taken from, does not sound like too attractive a
> direction to go in, but I do not know how much work is involved in
> adding it (and in general, any basic tool with similar complexity
> that we may find missing in the future) to the win+VS environment.

I added it:
https://github.com/git-for-windows/git-sdk-64/commit/e3ade7eef2503149dfefe630037c2fd6d24f2c14

It will take ~35 minutes (at time of writing) for
https://dev.azure.com/Git-for-Windows/git/_build/results?buildId=95542&view=results
to make the corresponding artifact available to the
`setup-git-for-windows-sdk` Action we use.

Ciao,
Dscho
Johannes Schindelin March 7, 2022, 3:51 p.m. UTC | #27
Hi Ævar,

On Fri, 4 Mar 2022, Ævar Arnfjörð Bjarmason wrote:

> I really don't think it's a reasonable claim to say that only "veterans"
> of git development[3] are likely to find the workflow of seeing a CI
> failure right away useful, or wanting to browse through the N different
> "job" failures without having to pre-open them, go find something else
> to do, then come back to it etc.

I said that the current output is only useful to veterans: the output
that hides the detailed log under a separate job that is marked as
non-failing.

That's still as true as when I said it. :-)

Ciao,
Johannes
Johannes Schindelin March 7, 2022, 4:07 p.m. UTC | #28
Hi Phillip,

On Wed, 2 Mar 2022, Phillip Wood wrote:

> On 25/02/2022 14:10, Johannes Schindelin wrote:
>
> > On Wed, 23 Feb 2022, Phillip Wood wrote:
> >
> > > With the first link above the initial page load is faster but to get
> > > to the output of the failing test case I have click on "Run
> > > ci/print_test_failures.sh" then wait for that to load and then
> > > search for "not ok" to actually get to the information I'm after.
> >
> > And that's only because you are familiar with what you have to do.
> >
> > Any new contributor would be stuck with the information presented on the
> > initial load, without any indication that more information _is_ available,
> > just hidden away in the next step's log (which is marked as "succeeding",
> > therefore misleading the inclined reader into thinking that this cannot
> > potentially contain any information pertinent to the _failure_ that needs
> > to be investigated).
>
> Yes it took we a while to realize how to get to the test output when I first
> started looking at the CI results.

Thank you for saying that. Since nobody else said it as clearly as you, I
really started to doubt myself here.

> One thing I forgot to mention was that when you expand a failing test it shows
> the test script twice before the output e.g.
>
> Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
> failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>   				git config core.fsmonitor true &&
>   				git fsmonitor--daemon start &&
>   				git update-index --fsmonitor
>
>   expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>   				git config core.fsmonitor true &&
>   				git fsmonitor--daemon start &&
>   				git update-index --fsmonitor
>
>  ++ git config core.fsmonitor true
>  ++ git fsmonitor--daemon start
>  ...
>
> I don't know how easy it would be to fix that so that we only show "expecting
> success of ..." without the test being printed first

It's not _super_ easy: right now, the patch series does not touch the code
that prints the latter message. In fact, that latter message is generated
_before_ the test fails, and redirected via `tee` into
`GIT_TEST_TEE_OUTPUT_FILE`. Then, once the verdict is clear that this test
case failed, the first message is printed (the one that is colored in the
output via `::error::`), and the output from the
`GIT_TEST_TEE_OUTPUT_FILE` file is pasted, starting at the offset marking
the start of the test case.
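
A self-contained toy that mimics that flow (capture the step's output
into a file, remember the byte offset where a test case starts, and on
failure emit an `::error::` annotation followed by a replay of just that
slice) might look like this; the helper name is made up and a plain
redirect stands in for the `tee`, so it is a sketch of the idea, not the
actual test-lib code:

    log=markup-demo.log
    : >"$log"

    run_case () {
        name=$1
        shift
        offset=$(wc -c <"$log")   # where this case's output begins
        {
            echo "expecting success of $name"
            "$@"
        } >>"$log" 2>&1
        if test $? -ne 0
        then
            echo "::error::failed: $name"
            # replay only this case's slice of the captured log
            tail -c +$((offset + 1)) "$log"
        fi
    }

    run_case demo.1 true
    run_case demo.2 false
    rm -f "$log"

The replayed slice necessarily begins at the recorded offset, i.e. at
the "expecting success" header, which is where the duplicated test
script that Phillip noticed comes from.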

The easiest workaround would probably be to add a flag that suppresses the
header `expecting success` in case we're running with the
`--github-workflow-markup` option.

I'll put it on my ever-growing TODO list, but maybe in the interest of
heeding the mantra "the perfect is the enemy of the good", this can be
done on top of this series rather than blocking it?

Thanks,
Dscho
Junio C Hamano March 7, 2022, 4:58 p.m. UTC | #29
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> GitHub CI seems to fail due to lack of "paste" for win+VS job.  This
>> was somewhat unexpected, as our test scripts seem to make liberal
>> use of "cut" that goes together with it.
>> ...
> I added it:
> https://github.com/git-for-windows/git-sdk-64/commit/e3ade7eef2503149dfefe630037c2fd6d24f2c14

Thanks.
Junio C Hamano March 7, 2022, 5:05 p.m. UTC | #30
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I said that the current output is only useful to veterans. The output that
> hides the detailed log, under a separate job that is marked as
> non-failing.
>
> That's still as true as when I said it. :-)

I think getting rid of the separate "print failures" CI step and
making it more discoverable how to reveal the details of the failing
test step is a usability improvement.

I am not Ævar, but I think what was questioned was whether the
improvement justifies multiple dozens of seconds of extra waiting time,
which is a usability dis-improvement.  I do not have a good answer to
that question.

I am probably nearing being a veteran who knows when to brew my tea
in my work cycle, and waiting for an extra minute or two while
browsing CI output is not a problem for me.

But new "non-veteran" users may not share that.  That is something a
bit worrying about the new UI.

Thanks.
Junio C Hamano March 7, 2022, 5:11 p.m. UTC | #31
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> One thing I forgot to mention was that when you expand a failing test it shows
>> the test script twice before the output e.g.
>>
>> Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>> failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>   				git config core.fsmonitor true &&
>>   				git fsmonitor--daemon start &&
>>   				git update-index --fsmonitor
>>
>>   expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>>   				git config core.fsmonitor true &&
>>   				git fsmonitor--daemon start &&
>>   				git update-index --fsmonitor
>>
>>  ++ git config core.fsmonitor true
>>  ++ git fsmonitor--daemon start
>>  ...
>>
>> I don't know how easy it would be to fix that so that we only show "expecting
>> success of ..." without the test being printed first
>
> It's not _super_ easy: right now, the patch series does not touch the code

In other words, it is not a new issue introduced by this series, right?

> The easiest workaround would probably to add a flag that suppresses the
> header `expecting success` in case we're running with the
> `--github-workflow-markup` option.

If that is the case, let's leave it outside the series.

If we do not have to hide the solution behind any option specific to
"--github-workflow-markup", then even users (like me) who regularly
run "cd t && sh ./t1234-a-particular-test.sh -i -v" would benefit if
we no longer have to look at the duplicated test script in the
output.

Thanks.
Phillip Wood March 7, 2022, 5:12 p.m. UTC | #32
Hi Dscho
On 07/03/2022 16:07, Johannes Schindelin wrote:
> Hi Phillip,
> 
> On Wed, 2 Mar 2022, Phillip Wood wrote:
> 
>> On 25/02/2022 14:10, Johannes Schindelin wrote:
>>
>>> On Wed, 23 Feb 2022, Phillip Wood wrote:
>>>
>>>> With the first link above the initial page load is faster but to get
>>>> to the output of the failing test case I have click on "Run
>>>> ci/print_test_failures.sh" then wait for that to load and then
>>>> search for "not ok" to actually get to the information I'm after.
>>>
>>> And that's only because you are familiar with what you have to do.
>>>
>>> Any new contributor would be stuck with the information presented on the
>>> initial load, without any indication that more information _is_ available,
>>> just hidden away in the next step's log (which is marked as "succeeding",
>>> therefore misleading the inclined reader into thinking that this cannot
>>> potentially contain any information pertinent to the _failure_ that needs
>>> to be investigated).
>>
>> Yes it took we a while to realize how to get to the test output when I first
>> started looking at the CI results.
> 
> Thank you for saying that. Since nobody else said it as clearly as you, I
> really started to doubt myself here.
> 
>> One thing I forgot to mention was that when you expand a failing test it shows
>> the test script twice before the output e.g.
>>
>> Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>> failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>    				git config core.fsmonitor true &&
>>    				git fsmonitor--daemon start &&
>>    				git update-index --fsmonitor
>>
>>    expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>>    				git config core.fsmonitor true &&
>>    				git fsmonitor--daemon start &&
>>    				git update-index --fsmonitor
>>
>>   ++ git config core.fsmonitor true
>>   ++ git fsmonitor--daemon start
>>   ...
>>
>> I don't know how easy it would be to fix that so that we only show "expecting
>> success of ..." without the test being printed first
> 
> It's not _super_ easy: right now, the patch series does not touch the code
> that prints the latter message. In fact, that latter message is generated
> _before_ the test fails, and redirected via `tee` into
> `GIT_TEST_TEE_OUTPUT_FILE`. Then, once the verdict is clear that this test
> case failed, the first message is printed (the one that is colored in the
> output via `::error::`), and the output from the
> `GIT_TEST_TEE_OUTPUT_FILE` file is pasted, starting at the offset marking
> the start of the test case.
> 
> The easiest workaround would probably to add a flag that suppresses the
> header `expecting success` in case we're running with the
> `--github-workflow-markup` option.
> 
> I'll put it on my ever-growing TODO list, but maybe in the interest of
> heeding the mantra "the perfect is the enemy of the good", this can be
> done on top of this series rather than blocking it?

Sure, I mentioned it in case it was easy to fix, but I don't think it
should stop these patches moving forward.

Best Wishes

Phillip

> Thanks,
> Dscho
Ævar Arnfjörð Bjarmason March 9, 2022, 11:44 a.m. UTC | #33
On Mon, Mar 07 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> One thing I forgot to mention was that when you expand a failing test it shows
>>> the test script twice before the output e.g.
>>>
>>> Error: failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>> failure: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor
>>>   				git config core.fsmonitor true &&
>>>   				git fsmonitor--daemon start &&
>>>   				git update-index --fsmonitor
>>>
>>>   expecting success of 7527.35 'Matrix[uc:false][fsm:true] enable fsmonitor':
>>>   				git config core.fsmonitor true &&
>>>   				git fsmonitor--daemon start &&
>>>   				git update-index --fsmonitor
>>>
>>>  ++ git config core.fsmonitor true
>>>  ++ git fsmonitor--daemon start
>>>  ...
>>>
>>> I don't know how easy it would be to fix that so that we only show "expecting
>>> success of ..." without the test being printed first
>>
>> It's not _super_ easy: right now, the patch series does not touch the code
>
> In other words, it is not a new issue introduced by this series, right?

It is a new issue in this series, specifically how
"finalize_test_case_output" interacts with "test_{ok,failure}_" and
friends.

>> The easiest workaround would probably to add a flag that suppresses the
>> header `expecting success` in case we're running with the
>> `--github-workflow-markup` option.
>
> If that is the case, let's leave it outside the series.
>
> If we do not have to hide the solution behind any option specific to
> "--github-workflow-markup", then even users (like me) who reguarly
> run "cd t && sh ./t1234-a-particular-test.sh -i -v" would benefit if
> we no longer have to look at the duplicated test script in the
> output.

Unless you invoke it with --github-workflow-markup, you won't see the
duplication.

I had some comments about inherent limitations in the approach in this
series vis-a-vis parsing markup after the fact[1]. But that really
doesn't seem to apply here. We're just printing the test source into the
*.markup file twice for no particular reason, aren't we?

*tests locally*

Hrm, so first this is a bug:
    
    $ ./t0002-gitfile.sh  --github-workflow-markup
    FATAL: Unexpected exit with code 1
    FATAL: Unexpected exit with code 1
    
Seems it wants --tee but doesn't declare it; this works:

    $ rm -rf test-results/; ./t0002-gitfile.sh  --github-workflow-markup --tee; cat test-results/t0002-gitfile.markup
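
One possible way to address that (heavily paraphrased; the surrounding
test-lib.sh variable names and option spellings here are assumptions, not an
actual hunk) would be to let the new option trigger the same tee re-exec as
--tee:

    # sketch only: include the new option in the patterns that trigger the
    # tee re-exec, so GIT_TEST_TEE_OUTPUT_FILE is populated without an
    # explicit --tee (names/patterns approximate)
    case "$GIT_TEST_TEE_STARTED, $* " in
    done,*)
        ;; # we are the re-executed instance, do not redirect again
    *' --tee '*|*' --verbose-log '*|*' --github-workflow-markup '*)
        mkdir -p "$TEST_OUTPUT_DIRECTORY/test-results"
        (GIT_TEST_TEE_STARTED=done "$SHELL_PATH" "$0" "$@" 2>&1
         echo $? >"$GIT_TEST_TEE_OUTPUT_FILE.exit") |
            tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
        exit "$(cat "$GIT_TEST_TEE_OUTPUT_FILE.exit")"
        ;;
    esac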

Isn't this a matter of making finalize_test_case_output not print the
full $* (including the test source) for failures?
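
For what it's worth, a purely hypothetical sketch of that idea (the series'
real finalize_test_case_output and its arguments are not reproduced here;
this only illustrates printing the failure header without repeating the test
source):

    # hypothetical illustration only; $this_test and $test_count as
    # maintained by test-lib.sh (assumed), argument convention invented
    finalize_test_case_output () {
        result=$1
        title=$2
        case "$result" in
        failure)
            # print only the one-line header; the command trace gets
            # replayed from GIT_TEST_TEE_OUTPUT_FILE afterwards, so do
            # not echo the test source a second time
            echo "::error::failed: $this_test.$test_count $title"
            ;;
        ok)
            echo "ok: $this_test.$test_count $title"
            ;;
        esac
    }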

1. https://lore.kernel.org/git/220309.861qzbnymn.gmgdl@evledraar.gmail.com/
Johannes Schindelin March 9, 2022, 1:02 p.m. UTC | #34
Hi Junio,

On Mon, 7 Mar 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I said that the current output is only useful to veterans: it hides the
> > detailed log under a separate job that is marked as non-failing.
> >
> > That's still as true as when I said it. :-)
>
> I think getting rid of the separate "print failures" CI step and
> making it more discoverable how to reveal the details of the failing
> test step is a usability improvement.

I'm so glad you're saying that! I was starting to doubt whether my caring
about getting rid of that `print failures` step was maybe misguided.

> I am not Ævar, but I think what was questioned was whether the improvement
> justifies many dozens of seconds of extra waiting time, which is a
> usability dis-improvement.  I do not have a good answer to that
> question.

It is definitely worrisome that we have to pay such a price. And if there
were a good answer for how to improve that (without sacrificing the
discoverability of the command trace corresponding to the test failure), I
would be more than eager to hear it.

I did consider generating an HTML-formatted report that would then be
attached as a build artifact. But that would hide the relevant information
even more thoroughly than a "print failures" step does.

Maybe I should just get rid of the grouping? But that _really_ helped me
when I investigated the recent "usage string updates" vs "FSMonitor"
problem, because the test case boundaries were so much clearer.

Plus, as far as I saw, GitHub workflow logs always scroll to the end of
the logs of the failing step, which would not help _at all_ here.
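
For reference, the grouping boils down to GitHub's workflow log markers; a
generic illustration of how test-case boundaries end up in the log (the exact
labels the series emits may differ):

    # everything between ::group:: and ::endgroup:: is collapsed by
    # default, and ::error:: lines are highlighted, so failing test
    # cases stand out (labels here are illustrative)
    echo "::group::ok: t7527.34 (an earlier, passing test case)"
    echo "... its trace, collapsed by default ..."
    echo "::endgroup::"
    echo "::error::failed: t7527.35 Matrix[uc:false][fsm:true] enable fsmonitor"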

So I dunno.

> I am probably close to being a veteran who knows when to brew my tea
> in my work cycle, and waiting for an extra minute or two while
> browsing CI output is not a problem for me.

:-)

> But new "non-veteran" users may not share that.  That is something a
> bit worrying about the new UI.

Indeed. My goal, after all, is to improve the experience of contributors,
not to make it harder.

Still, given that you currently have to click quite a few times until you
get to where you need to be, I have my doubts that what this patch series
does is actually making things slower, measured in terms of the total time
from seeing a failed build to being able to diagnose the cause by
inspecting the command trace.

Ciao,
Dscho
Ævar Arnfjörð Bjarmason March 10, 2022, 3:23 p.m. UTC | #35
On Wed, Mar 09 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Mon, 7 Mar 2022, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > I said that the current output is only useful to veterans: it hides the
>> > detailed log under a separate job that is marked as non-failing.
>> >
>> > That's still as true as when I said it. :-)
>>
>> I think getting rid of the separate "print failures" CI step and
>> making it more discoverable how to reveal the details of the failing
>> test step is a usability improvement.
>
> I'm so glad you're saying that! I was starting to doubt whether my caring
> about getting rid of that `print failures` step was maybe misguided.

I don't think anyone's been maintaining that getting rid of it wouldn't
be ideal. I for one have just been commenting on issues in the proposed
implementation.

I think we might still want to retain some such steps in the future,
i.e. if we have a failure, have subsequent steps that on failure() bump
the verbosity to varying levels / dump raw logs etc., or even try
re-running the test in different ways (e.g. narrowing it down with --run).
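
For instance, such a follow-up step could re-run just the failing script with
the knobs the test suite already has (the script name below is only an
example):

    # re-run a single failing test case with more context, using
    # existing test-lib.sh options: --run (select), -v (verbose),
    # -x (trace), -i (stop at first failure)
    cd t &&
    ./t7527-builtin-fsmonitor.sh --run=35 -v -x -i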

But the failure step you see when something fails should ideally have
the failure plus the relevant error, just as we do with compile errors.

>> I am not Ævar, but I think what was questioned was whether the improvement
>> justifies many dozens of seconds of extra waiting time, which is a
>> usability dis-improvement.  I do not have a good answer to that
>> question.
>
> It is definitely worrisome that we have to pay such a price. And if there
> were a good answer for how to improve that (without sacrificing the
> discoverability of the command trace corresponding to the test failure), I
> would be more than eager to hear it.

Isn't the answer to that what I suggested in [1]? I.e. the performance
problem is that we include N number of lines of the output that
*didn't fail*, and that's what slows down showing the relevant output
that *did* fail.

I.e. if, say, t3070-wildmatch.sh fails in a couple of tests, we'd show a
*lot* of lines between the relevant failing tests; let's just elide the
non-failing ones and show the output for the failing ones specifically.
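
A rough sketch of that kind of elision over the plain verbose log, assuming
the usual "expecting success of" / "ok N" / "not ok N" marker lines (the log
path is just an example):

    # print only the chunks belonging to failing tests
    awk '
        /^expecting success of / { collecting = 1; buf = "" }
        collecting { buf = buf $0 "\n" }
        /^not ok / { if (collecting) printf "%s", buf; collecting = 0 }
        /^ok /     { collecting = 0 }
    ' test-results/t3070-wildmatch.out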

*Sometimes* (but very rarely) it's relevant to still look at the full
output, since the failure might be due to an earlier silent failure in a
previous test (or the state it left behind), but I think that's rare
enough that the right thing to do is just to stick that in a subsequent
"verbose dump" step or whatever.

>> But new "non-veteran" users may not share that.  That is something a
>> bit worrying about the new UI.
>
> Indeed. My goal, after all, is to improve the experience of contributors,
> not to make it harder.
>
> Still, given that you currently have to click quite a few times until you
> get to where you need to be, I have my doubts that what this patch series
> does is actually making things slower, measured in terms of the total time
> from seeing a failed build to being able to diagnose the cause by
> inspecting the command trace.

Yes, but wouldn't the "Test Summary Report" in [1] be the best of both
worlds (with some minor changes to adapt it to the GitHub "grouping"
output, perhaps)?

Then you'd always see the specifics of the failing test at the end; if
you had N number of failures you might have a lot of output above that,
but even that we could always tweak with some smart heuristic. I.e. show
verbose "not ok" output if there are <10 failures, elide some of it and
point at the raw log for 10..100, and if >100 just print "this is
completely screwed" or whatever :)
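
In shell terms that heuristic could be as simple as the following (the
thresholds, actions and log path are made up for illustration, not part of
the series):

    log=test-results/t3070-wildmatch.out
    failures=$(grep -c "^not ok " "$log")
    if test "$failures" -lt 10
    then
        echo "show verbose output for each failing test"
    elif test "$failures" -le 100
    then
        echo "show a summary here, point at the raw log for details"
    else
        echo "this is completely screwed; see the raw log"
    fi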

1. https://lore.kernel.org/git/220302.86mti87cj2.gmgdl@evledraar.gmail.com/