
[1/3] MAINTAINERS: Introduce V: field for required tests

Message ID 20231115175146.9848-2-Nikolai.Kondrashov@redhat.com (mailing list archive)
State New
Series MAINTAINERS: Introduce V: field for required tests

Commit Message

Nikolai Kondrashov Nov. 15, 2023, 5:43 p.m. UTC
Introduce a new 'V:' ("Verify") field to MAINTAINERS. The field accepts
a name of a test suite which is required to be executed for each
contribution to the subsystem.
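
For illustration only, a MAINTAINERS entry using the new field might look
like this (the subsystem entry and test suite name below are hypothetical,
not taken from the patch):

  EXAMPLE SUBSYSTEM
  M:  Example Maintainer <maintainer@example.org>
  S:  Maintained
  V:  kselftests-example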

Each referenced test suite is expected to be documented in the new
Documentation/process/tests.rst file, which must have enough structure
(documented inside) for the tools to make use of it. Apart from basic
data, each test can refer to its "superset" - a test suite which this
one is a part of. The expected use is to describe both a large test
suite and its subsets, so the former would also be accepted, if a
subsystem requires only a subset.
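
As a rough sketch (the field names and layout here are assumptions based on
the description above; the authoritative structure is defined in the patch
itself), a tests.rst entry could look something like:

  kselftests-example
    Summary: Example subsystem kernel selftests
    Superset: kselftests
    Command: make -C tools/testing/selftests TARGETS=example run_tests

where "Superset" names the larger suite this one is a part of, and "Command"
gives a way to run the suite from the kernel tree.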

Introduce a new tag, 'Tested-with:', documented in the
Documentation/process/submitting-patches.rst file. The tag is expected
to reference the documented test suites, similarly to the 'V:' field,
and to certify that the submitter executed the test suite on the change,
and that it passed.
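
For example, a submitter who ran the (hypothetical) suite above would add a
trailer such as:

  Tested-with: kselftests-example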

Make scripts/checkpatch.pl ensure any added V: fields reference
documented test suites only, and output a warning if a change to a
subsystem doesn't certify the required test suites were executed,
if any.

If the test suite description includes a "Command", then checkpatch.pl
will output it as the command to execute the suite. The command should
run with only the kernel tree and the regular developer environment set
up, but it could also simply print instructions for installing any extra
dependencies (or pull some in automatically). The idea is to get the
developer into the feedback loop quicker and more easily, so they have
something to run and iterate on, even if it involves installing some
more stuff first. Therefore it's a good idea to add such wrapper scripts
to the kernel tree proper and refer to them from tests.rst.
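
Not part of this patch, but as a rough sketch of what such a wrapper could
look like (written in Python here; the paths, architecture and package names
are purely illustrative):

  #!/usr/bin/env python3
  # Hypothetical wrapper: run the required suite if the environment is
  # ready, otherwise tell the developer what to install first.
  import shutil
  import subprocess
  import sys

  def main():
      # Running KUnit for x86_64 boots the kernel under QEMU, so check
      # for the emulator before starting.
      if shutil.which("qemu-system-x86_64") is None:
          print("qemu-system-x86_64 not found.")
          print("Install it first, e.g.: sudo apt install qemu-system-x86")
          return 1
      # Run the suite from the top of the kernel tree.
      return subprocess.call(
          ["./tools/testing/kunit/kunit.py", "run", "--arch=x86_64"])

  if __name__ == "__main__":
      sys.exit(main())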

Extend scripts/get_maintainer.pl to support retrieving the V: fields,
and scripts/parse-maintainers.pl to maintain their ordering.

Signed-off-by: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
---
 Documentation/process/submitting-patches.rst |  19 +++
 Documentation/process/tests.rst              |  35 ++++++
 MAINTAINERS                                  |   6 +
 scripts/checkpatch.pl                        | 118 ++++++++++++++++++-
 scripts/get_maintainer.pl                    |  17 ++-
 scripts/parse-maintainers.pl                 |   3 +-
 6 files changed, 194 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/process/tests.rst

Comments

Joe Perches Nov. 15, 2023, 6:31 p.m. UTC | #1
On Wed, 2023-11-15 at 19:43 +0200, Nikolai Kondrashov wrote:
> Introduce a new 'V:' ("Verify") field to MAINTAINERS. The field accepts
> a name of a test suite which is required to be executed for each
> contribution to the subsystem.

Perhaps this is simply too much overhead in
process requirements for most kernel work.

While the addition of V: seems ok, I think
putting the verification in checkpatch is
odd at best and the verification of test
execution should be a separate script.
Mark Brown Nov. 15, 2023, 8:01 p.m. UTC | #2
On Wed, Nov 15, 2023 at 10:31:21AM -0800, Joe Perches wrote:
> On Wed, 2023-11-15 at 19:43 +0200, Nikolai Kondrashov wrote:
> > Introduce a new 'V:' ("Verify") field to MAINTAINERS. The field accepts
> > a name of a test suite which is required to be executed for each
> > contribution to the subsystem.

> Perhaps this is simply too much overhead
> process requirements for most kernel work.

> While the addition of V: seems ok, I think
> putting the verification in checkpatch is
> odd at best and the verification of test
> execution should be a separate script.

Verifying that the expected tags are present and valid does seem firmly
within checkpatch's wheelhouse though?
Mark Brown Nov. 15, 2023, 8:14 p.m. UTC | #3
On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:

> Introduce a new tag, 'Tested-with:', documented in the
> Documentation/process/submitting-patches.rst file. The tag is expected
> to reference the documented test suites, similarly to the 'V:' field,
> and to certify that the submitter executed the test suite on the change,
> and that it passed.

This doesn't feel like it fits so well with git based flows - generally
the tags end up in git one way or another so there'll be a strong
tendency for this to end up getting added for one version and then
carried forward to the next version.  The way the tooling is at present
it doesn't really feel like there's a good point at which to insert the
tag.

I'm not sure exactly what'd be better though.
Konstantin Ryabitsev Nov. 15, 2023, 8:38 p.m. UTC | #4
On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> Introduce a new tag, 'Tested-with:', documented in the
> Documentation/process/submitting-patches.rst file. The tag is expected
> to reference the documented test suites, similarly to the 'V:' field,
> and to certify that the submitter executed the test suite on the change,
> and that it passed.

I'm not sure it should be a tag that stays all the way through git commit.
How about making it a cover letter/first patch footer?

tested-with: <suite>

-K
Nikolai Kondrashov Nov. 16, 2023, noon UTC | #5
On 11/15/23 20:31, Joe Perches wrote:
> On Wed, 2023-11-15 at 19:43 +0200, Nikolai Kondrashov wrote:
>> Introduce a new 'V:' ("Verify") field to MAINTAINERS. The field accepts
>> a name of a test suite which is required to be executed for each
>> contribution to the subsystem.
> 
> Perhaps this is simply too much overhead
> process requirements for most kernel work.
> 
> While the addition of V: seems ok, I think
> putting the verification in checkpatch is
> odd at best and the verification of test
> execution should be a separate script.

I agree that this extends checkpatch.pl's responsibilities somewhat, in the
sense that it requires you to do something besides changing the patch itself.
OTOH, checkpatch.pl already requires Signed-off-by:, which prompts you to
check and clear up your authorship, similarly requiring work outside the patch.

At the same time, you're supposed to test your changes anyway. Sometimes that
testing is manual and one-off, but oftentimes running an existing test suite
is at least beneficial, if not required.

In a sense, it is not *checkpatch.pl* itself requiring testing, but the
subsystem maintainers (who are opting in); checkpatch.pl simply provides a
convenient entry point for drawing attention to the maintainers' requests,
and for making the tests easier to discover.

It also does *not* verify test execution; it only alerts contributors to the
need and requires certification. Again, similar to Signed-off-by:.

Nick
Nikolai Kondrashov Nov. 16, 2023, 12:09 p.m. UTC | #6
On 11/15/23 22:14, Mark Brown wrote:
> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> 
>> Introduce a new tag, 'Tested-with:', documented in the
>> Documentation/process/submitting-patches.rst file. The tag is expected
>> to reference the documented test suites, similarly to the 'V:' field,
>> and to certify that the submitter executed the test suite on the change,
>> and that it passed.
> 
> This doesn't feel like it fits so well with git based flows - generally
> the tags end up in git one way or another so there'll be a strong
> tendency for this to end up getting added for one version and then
> carried forward to the next version.  The way the tooling is at present
> it doesn't really feel like there's a good point at which to insert the
> tag.
> 
> I'm not sure exactly what'd be better though.

Yeah, I agree that's a bit of a problem. One that only automated 
tools/testing/CI could fully solve. Cough, git forges, cough.

OTOH, once you've managed to run an automated suite, it's much easier to do
it again, and most of the time developers *want* their code to work and pass 
the tests (it's much easier than manual testing after all). So it's likely 
they will keep running them for new revisions, even though they might not 
notice they simply reused the previously-added Tested-with: tag.

Still, one way to make this better could be to require a URL pointing at test
results to follow the test suite name in the Tested-with: tag. Then the
maintainer could check that the results are indeed fresh.
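
For instance (suite name and URL purely illustrative):

  Tested-with: xfstests https://ci.example.org/results/12345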

This would, however, be getting way ahead of ourselves and, with the current
(average) state of testing infra, hard to do. Perhaps sometime later.

For now, I think this could already go a long way towards having more (and 
better) testing.

Nick
Nikolai Kondrashov Nov. 16, 2023, 12:14 p.m. UTC | #7
On 11/15/23 22:38, Konstantin Ryabitsev wrote:
> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
>> Introduce a new tag, 'Tested-with:', documented in the
>> Documentation/process/submitting-patches.rst file. The tag is expected
>> to reference the documented test suites, similarly to the 'V:' field,
>> and to certify that the submitter executed the test suite on the change,
>> and that it passed.
> 
> I'm not sure it should be a tag that stays all the way through git commit.
> How about making it a cover letter/first patch footer?
> 
> tested-with: <suite>

Yes, that would be better indeed. However, checkpatch.pl doesn't process cover 
letters, and so we would have no automated way to advertise and nudge people 
towards testing.

Nick

P.S. Git forges do that for you by nature of actually running the tests 
themselves, automatically. *Ducks*
Bagas Sanjaya Nov. 16, 2023, 1:20 p.m. UTC | #8
On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> Make scripts/checkpatch.pl ensure any added V: fields reference
> documented test suites only, and output a warning if a change to a
> subsystem doesn't certify the required test suites were executed,
> if any.
> 
> If the test suite description includes a "Command", then checkpatch.pl
> will output it as the one executing the suite. The command should run
> with only the kernel tree and the regular developer environment set up.
> But, at the same time, could simply output instructions for installing
> any extra dependencies (or pull some automatically). The idea is to
> get the developer into feedback loop quicker and easier, so they have
> something to run and iterate on, even if it involves installing some
> more stuff first. Therefore it's a good idea to add such wrappers to the
> kernel tree proper and refer to them from the tests.rst.

Does it also apply to trivial patches (e.g. spelling or checkpatch fixes
as seen on drivers/staging/)?
Mark Brown Nov. 16, 2023, 1:26 p.m. UTC | #9
On Thu, Nov 16, 2023 at 02:14:24PM +0200, Nikolai Kondrashov wrote:

> Yes, that would be better indeed. However, checkpatch.pl doesn't process
> cover letters, and so we would have no automated way to advertise and nudge
> people towards testing.

Back when I used to run checkpatch it seemed to cope, it obviously
wouldn't find much to look at in there but you could feed it an entire
series with cover letter and the cover letter wouldn't cause any extra
issues.

> P.S. Git forges do that for you by nature of actually running the tests
> themselves, automatically. *Ducks*

The ability of forges to run tests is not hugely relevant to large
portions of the kernel, for drivers you're wanting the tests to run on
the relevant hardware and even core changes will often need hardware
that exercises the relevant features to run.  In these areas you're more
just looking for someone to say that they've done relevant testing but
there's not a substantial difference between say a comment on a pull
request or a followup email.
Nikolai Kondrashov Nov. 16, 2023, 1:41 p.m. UTC | #10
On 11/16/23 15:20, Bagas Sanjaya wrote:
> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
>> Make scripts/checkpatch.pl ensure any added V: fields reference
>> documented test suites only, and output a warning if a change to a
>> subsystem doesn't certify the required test suites were executed,
>> if any.
>>
>> If the test suite description includes a "Command", then checkpatch.pl
>> will output it as the one executing the suite. The command should run
>> with only the kernel tree and the regular developer environment set up.
>> But, at the same time, could simply output instructions for installing
>> any extra dependencies (or pull some automatically). The idea is to
>> get the developer into feedback loop quicker and easier, so they have
>> something to run and iterate on, even if it involves installing some
>> more stuff first. Therefore it's a good idea to add such wrappers to the
>> kernel tree proper and refer to them from the tests.rst.
> 
> Does it also apply to trivial patches (e.g. spelling or checkpatch fixes
> as seen on drivers/staging/)?

Do you mean, will checkpatch.pl suggest executing test suites for trivial 
patches as well? If so, then yes, of course. These are inevitable victims of 
such mechanisms in general, and it's hard to make an exception for them, but 
we have to consider the overall benefit of having more uniform testing vs. 
making trivial changes a bit more difficult.

Nick
Bagas Sanjaya Nov. 16, 2023, 1:43 p.m. UTC | #11
On 11/16/23 20:41, Nikolai Kondrashov wrote:
> On 11/16/23 15:20, Bagas Sanjaya wrote:
>> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
>>> Make scripts/checkpatch.pl ensure any added V: fields reference
>>> documented test suites only, and output a warning if a change to a
>>> subsystem doesn't certify the required test suites were executed,
>>> if any.
>>>
>>> If the test suite description includes a "Command", then checkpatch.pl
>>> will output it as the one executing the suite. The command should run
>>> with only the kernel tree and the regular developer environment set up.
>>> But, at the same time, could simply output instructions for installing
>>> any extra dependencies (or pull some automatically). The idea is to
>>> get the developer into feedback loop quicker and easier, so they have
>>> something to run and iterate on, even if it involves installing some
>>> more stuff first. Therefore it's a good idea to add such wrappers to the
>>> kernel tree proper and refer to them from the tests.rst.
>>
>> Does it also apply to trivial patches (e.g. spelling or checkpatch fixes
>> as seen on drivers/staging/)?
> 
> Do you mean, will checkpatch.pl suggest executing test suites for
> trivial patches as well? If so, then yes, of course. These are
> inevitable victims of such mechanisms in general, and it's hard to make
> an exception for them, but we have to consider the overall benefit of
> having more uniform testing vs. making trivial changes a bit more
> difficult.
> 

Yes, that's what I mean. Thanks anyway.
Nikolai Kondrashov Nov. 16, 2023, 1:52 p.m. UTC | #12
On 11/16/23 15:26, Mark Brown wrote:
> On Thu, Nov 16, 2023 at 02:14:24PM +0200, Nikolai Kondrashov wrote:
> 
>> Yes, that would be better indeed. However, checkpatch.pl doesn't process
>> cover letters, and so we would have no automated way to advertise and nudge
>> people towards testing.
> 
> Back when I used to run checkpatch it seemed to cope, it obviously
> wouldn't find much to look at in there but you could feed it an entire
> series with cover letter and the cover letter wouldn't cause any extra
> issues.

Ah, good to know, thank you. The question now is whether we can expect, or 
require, submitters to run checkpatch.pl on the complete patchset, cover 
letter included, *before* sending it.

>> P.S. Git forges do that for you by nature of actually running the tests
>> themselves, automatically. *Ducks*
> 
> The ability of forges to run tests is not hugely relevant to large
> portions of the kernel, for drivers you're wanting the tests to run on
> the relevant hardware and even core changes will often need hardware
> that exercises the relevant features to run.  In these areas you're more
> just looking for someone to say that they've done relevant testing but
> there's not a substantial difference between say a comment on a pull
> request or a followup email.

Agreed.

Still, there *are* largely hardware-independent (and thus maybe more 
impactful) parts of the kernel too.

Plus, you know yourself there's a bunch of companies willing (if not outright 
clamoring) to contribute their machine time for testing, provided some good 
will and authentication on the part of the contributors. And git forges 
provide good support for the latter. So perhaps *some* special hardware 
*could* be arranged there too, making it more useful.

Nick
Greg KH Nov. 16, 2023, 1:59 p.m. UTC | #13
On Thu, Nov 16, 2023 at 08:20:18PM +0700, Bagas Sanjaya wrote:
> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> > Make scripts/checkpatch.pl ensure any added V: fields reference
> > documented test suites only, and output a warning if a change to a
> > subsystem doesn't certify the required test suites were executed,
> > if any.
> > 
> > If the test suite description includes a "Command", then checkpatch.pl
> > will output it as the one executing the suite. The command should run
> > with only the kernel tree and the regular developer environment set up.
> > But, at the same time, could simply output instructions for installing
> > any extra dependencies (or pull some automatically). The idea is to
> > get the developer into feedback loop quicker and easier, so they have
> > something to run and iterate on, even if it involves installing some
> > more stuff first. Therefore it's a good idea to add such wrappers to the
> > kernel tree proper and refer to them from the tests.rst.
> 
> Does it also apply to trivial patches (e.g. spelling or checkpatch fixes
> as seen on drivers/staging/)?

You are assuming that drivers/staging/ has actual tests :)
Geert Uytterhoeven Nov. 16, 2023, 2:21 p.m. UTC | #14
Hi Bagas,

On Thu, Nov 16, 2023 at 2:20 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> > Make scripts/checkpatch.pl ensure any added V: fields reference
> > documented test suites only, and output a warning if a change to a
> > subsystem doesn't certify the required test suites were executed,
> > if any.
> >
> > If the test suite description includes a "Command", then checkpatch.pl
> > will output it as the one executing the suite. The command should run
> > with only the kernel tree and the regular developer environment set up.
> > But, at the same time, could simply output instructions for installing
> > any extra dependencies (or pull some automatically). The idea is to
> > get the developer into feedback loop quicker and easier, so they have
> > something to run and iterate on, even if it involves installing some
> > more stuff first. Therefore it's a good idea to add such wrappers to the
> > kernel tree proper and refer to them from the tests.rst.
>
> Does it also apply to trivial patches (e.g. spelling or checkpatch fixes
> as seen on drivers/staging/)?

After having seen the introduction of too many build failures, I'm
inclined to ask for an even stronger proof of testing for "trivial"
fixes for drivers/staging...

Gr{oetje,eeting}s,

                        Geert
Gustavo Padovan Nov. 20, 2023, 12:40 p.m. UTC | #15
On Thursday, November 16, 2023 09:14 -03, Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com> wrote:

> On 11/15/23 22:38, Konstantin Ryabitsev wrote:
> > On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
> >> Introduce a new tag, 'Tested-with:', documented in the
> >> Documentation/process/submitting-patches.rst file. The tag is expected
> >> to reference the documented test suites, similarly to the 'V:' field,
> >> and to certify that the submitter executed the test suite on the change,
> >> and that it passed.
> > 
> > I'm not sure it should be a tag that stays all the way through git commit.
> > How about making it a cover letter/first patch footer?
> > 
> > tested-with: <suite>
> 
> Yes, that would be better indeed. However, checkpatch.pl doesn't process cover 
> letters, and so we would have no automated way to advertise and nudge people 
> towards testing.
 
At this year's LPC, there was quite some discussion around improving testing information, so this patch of yours comes with great timing. :)

All you are proposing here is pretty interesting both for developers and CI systems, but it seems that a "Tested-with:" tag and checkpatch.pl validation will take quite some time to consolidate.

Would it make sense to split out just the part that adds the V: field to MAINTAINERS and submit that as a separate patchset together with its documentation? That way we can enable subsystems to start annotating their test suites already.

I also wonder how to handle subsystems that have multiple test suites (e.g. something in kselftests plus an external test suite). Would an alternative be pointing to a Documentation page with detailed info?

- Gus
Mark Brown Nov. 20, 2023, 1:31 p.m. UTC | #16
On Mon, Nov 20, 2023 at 12:40:39PM +0000, Gustavo Padovan wrote:

> I also wonder how to make for subsystems that will have different test
> suites (eg something in kselftests and an external test suite). Would
> an alternative be pointing to a Documentation page with detailed info?

Why not just add multiple test suite lines like we do for other things
where there can be more than one example?

Please fix your mail client to word wrap within paragraphs at something
substantially less than 80 columns.  Doing this makes your messages much
easier to read and reply to.
Theodore Ts'o Nov. 20, 2023, 8:51 p.m. UTC | #17
On Mon, Nov 20, 2023 at 02:30:49PM +0100, Ricardo Cañuelo wrote:
> 
> This is not trivial because tests vary a lot and we'd first need to
> define which artifacts to link to, and because whatever is linked (test
> commands, output log, results summary) would need to be stored
> forever. But since we're doing that already for basically all kernel
> mailing lists, I wonder if something like "public-inbox for test
> results" could be possible some day.

What we have at work is a way to upload the test results summary
(e.g., just KTAP result lines, or the xfstests junit XML) along with
test run metadata (e.g., what was the kernel commit on which the test
was run, and the test hardware), and this would be stored permanently.
Test artifacts are also preserved, but for a limited amount of time
(e.g., some number of months or a year).

The difference in storage lifetimes is because the junit XML file
might be a few kilobytes to tens of kilobytes, but the test artifacts
might be a few megabytes to tens of megabytes.

Of course once you have this data, it becomes possible to detect when
a test may have regressed, or to detect flaky tests, and perhaps to
figure out if certain hardware configurations or kernel configurations
are more likely to trigger a particular test to fail.  So having all
of this data stored centrally would be really cool.  The only question
is who might be able to create such an infrastructure, and be able to
pay for the ongoing development and operational costs....

    	    	    		    		- Ted
Mark Brown Nov. 20, 2023, 10:27 p.m. UTC | #18
On Mon, Nov 20, 2023 at 03:51:31PM -0500, Theodore Ts'o wrote:

> What we have at work is a way to upload the test results summary
> (e.g., just KTAP result lines, or the xfstests junit XML) along with
> test run metadata (e.g., what was the kernel commit on which the test
> was run, and the test hardware), and this would be stored permanently.
> Test artifacts is also preserved but for a limited amount of time
> (e.g., some number of months or a year).

> The difference in storage lifetimes is because the junit XML file
> might be a few kilobytes to tens of kilobytes. but the test artifacts
> might be a few megabytes to tens of megabytes.

This is the sort of thing that kcidb (which Nikolai works on) is good at
ingesting, I actually do push all my CI's test results into there
already:

   https://github.com/kernelci/kcidb/

(the dashboard is down currently.)  A few other projects including the
current KernelCI and RedHat's CKI push their data in there too, I'm sure
Nikolai would be delighted to get more people pushing data in.  The goal
is to merge this with the main KernelCI infrastructure, it's currently
separate while people figure out the whole big data thing.

> Of course once you have this data, it becomes possible to detect when
> a test may have regressed, or to detect flaky tests, and perhaps to
> figure out if certain hardware configurations or kernel configurations
> are more likely to trigger a particular test to fail.  So having all
> of this data stored centrally would be really cool.  The only question
> is who might be able to create such an infrastructure, and be able to
> pay for the ongoing development and operational costs....

The KernelCI LF project is funding kcidb with precisely this goal for
the reasons you outline, the data collection part seems to be relatively
mature at this point but AIUI there's a bunch of open questions with the
analysis and usage side, partly due to needing to find people to work on
it.  My understanding is that ingesting large data sets into cloud
providers is pretty tractable, as with a lot of this stuff it gets more
interesting trying to pull the data out and comprehend it in a practical
fashion.  It'd be really cool to see more people working on that side of
things.

On the submission side it'd be interesting to start collecting more data
about the test systems used to run things, might be useful to add a new
schema for that which can be referenced from the test schema.
Theodore Ts'o Nov. 21, 2023, 6:04 a.m. UTC | #19
On Mon, Nov 20, 2023 at 10:27:33PM +0000, Mark Brown wrote:
> This is the sort of thing that kcidb (which Nikolai works on) is good at
> ingesting, I actually do push all my CI's test results into there
> already:
> 
>    https://github.com/kernelci/kcidb/
> 
> (the dashboard is down currently.)  A few other projects including the
> current KernelCI and RedHat's CKI push their data in there too, I'm sure
> Nikolai would be delighted to get more people pushing data in.  The goal
> is to merge this with the main KernelCI infrastructure, it's currently
> separate while people figure out the whole big data thing.

Looking at the kernelci, it appears that it's using a JSON submission
format.  Are there conversion scripts that take a KTAP test report, or
a Junit XML test report?

> The KernelCI LF project is funding kcidb with precisely this goal for
> the reasons you outline, the data collection part seems to be relatively
> mature at this point but AIUI there's a bunch of open questions with the
> analysis and usage side, partly due to needing to find people to work on
> it.

Indeed, this is the super hard part.  Having looked at the kernelci
web site, its dashboard isn't particularly useful for what I'm trying
to do with it.  For my part, when analyizing a single test run, the
kernelci dashboard isn't particularly helpful.  What I need is
something more like this:

ext4/4k: 554 tests, 48 skipped, 4301 seconds
ext4/1k: 550 tests, 3 failures, 51 skipped, 6739 seconds
  Failures: generic/051 generic/475 generic/476
ext4/ext3: 546 tests, 138 skipped, 4239 seconds
ext4/encrypt: 532 tests, 3 failures, 159 skipped, 3218 seconds
  Failures: generic/681 generic/682 generic/691
ext4/nojournal: 549 tests, 3 failures, 118 skipped, 4477 seconds
  Failures: ext4/301 ext4/304 generic/455
ext4/ext3conv: 551 tests, 49 skipped, 4655 seconds
ext4/adv: 551 tests, 4 failures, 56 skipped, 4987 seconds
  Failures: generic/477 generic/506
  Flaky: generic/269: 40% (2/5)   generic/455: 40% (2/5)
ext4/dioread_nolock: 552 tests, 48 skipped, 4538 seconds
ext4/data_journal: 550 tests, 2 failures, 120 skipped, 4401 seconds
  Failures: generic/455 generic/484
ext4/bigalloc_4k: 526 tests, 53 skipped, 4537 seconds
ext4/bigalloc_1k: 526 tests, 61 skipped, 4847 seconds
ext4/dax: 541 tests, 1 failures, 152 skipped, 3069 seconds
  Flaky: generic/269: 60% (3/5)
Totals: 6592 tests, 1053 skipped, 72 failures, 0 errors, 50577s

... which summarizes 6,592 tests in 20 lines, and for any test that
has failed, we rerun it four more times, so we can get an indication
of whether a test is a hard failure, or a flaky failure.

(I don't need to see all of the tests that passes; it's the test
failures or the test flakes that are significant.)

And then when comparing multiple test runs, that's when I'm
interested in seeing which tests may have regressed, or which tests may
have been fixed when going from version A to version B.

And right now, kernelci doesn't have any of that.  So it might be hard
to convince overloaded maintainers to upload test runs to kernelci,
when they don't see any immediate benefit from uploading to the kernelci db.

There is a bit of a chicken-and-egg problem, since without the test
results getting uploaded, it's hard to get the analysis functionality
implemented, and without the analysis features, it's hard to get
developers to upload the data.

That being said, a number of file system developers probably have
several years' worth of test results that we could give you.
I have hundreds of junit.xml files, with information about which kernel
version, which version of xfstests, etc., was used.  I'm happy to
make samples available for anyone who is interested.

Cheers,

						- Ted
David Gow Nov. 21, 2023, 10:36 a.m. UTC | #20
Thanks so much for doing this! I think everyone agrees that we need
_some_ way of documenting which tests to run, and I think this is our
best option.

In any case, this patch does a lot of things, and I'll comment on them
one-by-one. (It may be worth splitting this patch up into a few
separate bits, if only so that we can better separate the
uncontroversial bits from the open questions.)

On Thu, 16 Nov 2023 at 01:52, Nikolai Kondrashov
<Nikolai.Kondrashov@redhat.com> wrote:
>
> Introduce a new 'V:' ("Verify") field to MAINTAINERS. The field accepts
> a name of a test suite which is required to be executed for each
> contribution to the subsystem.

Yes -- this is exactly what I'd like. (As much as I'd love 'T' to have
been available. Alas...)

The other thing discussed at plumbers was to include this in the
'maintainer profile', but having it as a separate MAINTAINERS entry is
my preference, and is better for automation.

The question for what the tag actually contains brings us to...
>
> Each referenced test suite is expected to be documented in the new
> Documentation/process/tests.rst file, which must have enough structure
> (documented inside) for the tools to make use of it. Apart from basic
> data, each test can refer to its "superset" - a test suite which this
> one is a part of. The expected use is to describe both a large test
> suite and its subsets, so the former would also be accepted, if a
> subsystem requires only a subset.

I think this could work, but is a bit complicated.

My initial thought was to have this as a more free-form field, which
either contained a:
- Path to a command to run (e.g. tools/testing/kunit/run_checks.py)
- Path to a documentation file describing the test.
- URL to a page describing the test
- (Maybe) freeform text.

It's probably worth also looking at this proposal to auto-generate
similar documentation:
https://lore.kernel.org/linux-kselftest/cover.1689171160.git.mchehab@kernel.org/

The other question is how to handle outdated results when a new patch
revision is sent out. Personally, I think this is something we can
solve similarly to 'Reviewed-by', depending on the extent of the
changes and cost of the tests. I suspect for most automated tests,
this would mean never carrying the 'Tested-with' tag over, but if
testing it involved manually building and running kernels against 50
different hardware setups, I could imagine it making sense to not
re-do this if a new revision just changed a doc typo. If a URL is used
here, it could contain version info, too.

>
> Introduce a new tag, 'Tested-with:', documented in the
> Documentation/process/submitting-patches.rst file. The tag is expected
> to reference the documented test suites, similarly to the 'V:' field,
> and to certify that the submitter executed the test suite on the change,
> and that it passed.

I'm also 100% for this, though I'd considered it separately from the
MAINTAINERS change.

I think, in the ideal case, we want this to link to the results
somehow. kcidb would seem to be the obvious choice there.

Again, as a fallback, a plain text field would be useful to describe
cases where a patch was tested by some means other than a formal test
suite. This might not be ideal, but I'd still rather have people
describe that something "builds and boots on <x> hardware" than have
to guess if a patch was tested at all.

Of course, it'd then be up to maintainers to decide what they'd
accept: I'd expect that some would require there be a 'Tested-with'
header which links to valid results for the tests described in
MAINTAINERS.

>
> Make scripts/checkpatch.pl ensure any added V: fields reference
> documented test suites only, and output a warning if a change to a
> subsystem doesn't certify the required test suites were executed,
> if any.
>

I'd definitely want something like this to run at some point in the
patch-submission workflow. I think that, ultimately, we'll want to be
able to run some tests automatically (e.g., a git hook could run the
tests and add the 'Tested-with' line).

Personally, I'd like to require that all patches have a 'Tested-with'
field, even if there's not a corresponding 'V' MAINTAINERS entry, as
people should at least think of how something's tested, even if
there's not a formal 'test suite' for it. Though that seems a
longer-term goal


> If the test suite description includes a "Command", then checkpatch.pl
> will output it as the one executing the suite. The command should run
> with only the kernel tree and the regular developer environment set up.
> But, at the same time, could simply output instructions for installing
> any extra dependencies (or pull some automatically). The idea is to
> get the developer into feedback loop quicker and easier, so they have
> something to run and iterate on, even if it involves installing some
> more stuff first. Therefore it's a good idea to add such wrappers to the
> kernel tree proper and refer to them from the tests.rst.
>
> Extend scripts/get_maintainer.pl to support retrieving the V: fields,
> and scripts/parse-maintainers.pl to maintain their ordering.
>
> Signed-off-by: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
> ---

The questions I think we need to answer to get this in are:
1. Do we want to split this up (and potentially land it
piece-by-piece), or is it more valuable to have a stricter, more
complete system from the get-go?
2. What format should the 'V' line take? If it is simply a test name,
do we use a doc as suggested (or one generated in part from some other
process), or something like a command name or URL? Can it just be
freeform text?
3. Should 'Tested-with' be a test name in the same format as 'V', a
URL to results (any URL, or just kcidb?), or freeform text? How does
this evolve with multiple versions of patches?
4. How should this be enforced? A warning (not an 'error') from
checkpatch? A separate script?

Personally, my gut feeling is that we should land the simplest, most
minimal version of this (the 'V' field, as freeform text) now, and
build on that as consensus and tooling permits. I'd probably also add
the 'Tested-with' or similar tag, as freeform text, too. I don't think
either of those would cause major problems if we needed to change or
restrict the format later; I imagine there won't be a huge need to
parse old commits for test data, and even if so, it wouldn't be too
hard to ignore any which don't conform to any stricter future
convention.

But I don't think there's anything fundamentally wrong with the full
plan as-is, so if everyone's happy with it, I'd not object to having
it.

Cheers,
-- David
David Gow Nov. 21, 2023, 10:37 a.m. UTC | #21
On Tue, 21 Nov 2023 at 14:05, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Mon, Nov 20, 2023 at 10:27:33PM +0000, Mark Brown wrote:
> > This is the sort of thing that kcidb (which Nikolai works on) is good at
> > ingesting, I actually do push all my CI's test results into there
> > already:
> >
> >    https://github.com/kernelci/kcidb/
> >
> > (the dashboard is down currently.)  A few other projects including the
> > current KernelCI and RedHat's CKI push their data in there too, I'm sure
> > Nikolai would be delighted to get more people pushing data in.  The goal
> > is to merge this with the main KernelCI infrastructure, it's currently
> > separate while people figure out the whole big data thing.
>
> Looking at the kernelci, it appears that it's using a JSON submission
> format.  Is there conversion scripts that take a KTAP test report, or
> a Junit XML test report?

The kunit.py script has a very basic KCIDB JSON exporter, via the
--json option. This can be used as a generic KTAP -> KCIDB converter
with
kunit.py parse --json

It definitely still needs some work (there are a bunch of bugs,
hardcoded fields for things KTAP doesn't expose, some other output may
get mixed in, etc), but does exist as a starting point.

-- David
Mark Brown Nov. 21, 2023, 1:27 p.m. UTC | #22
On Tue, Nov 21, 2023 at 01:04:50AM -0500, Theodore Ts'o wrote:
> On Mon, Nov 20, 2023 at 10:27:33PM +0000, Mark Brown wrote:

> > This is the sort of thing that kcidb (which Nikolai works on) is good at
> > ingesting, I actually do push all my CI's test results into there
> > already:

> >    https://github.com/kernelci/kcidb/

> > (the dashboard is down currently.)  A few other projects including the
> > current KernelCI and RedHat's CKI push their data in there too, I'm sure
> > Nikolai would be delighted to get more people pushing data in.  The goal
> > is to merge this with the main KernelCI infrastructure, it's currently
> > separate while people figure out the whole big data thing.

> Looking at the kernelci, it appears that it's using a JSON submission
> format.  Is there conversion scripts that take a KTAP test report, or
> a Junit XML test report?

Probably - I know I've got something for KUnit which is annoyingly
difficult to publish for non-technical reasons and is a little broken
(things weren't visible in the dashboard when it was up which might mean
some missing field or a date set wrong).  My KTAP stuff is all mediated
through LAVA, that can push results into a web hook directly so it's
really easy to just add a notifier to your job and stream the results in
directly (I intend to push that into kcidb in my copious free time so
other people can use my code there).  It's relatively straightforward to
write these things.

> > The KernelCI LF project is funding kcidb with precisely this goal for
> > the reasons you outline, the data collection part seems to be relatively
> > mature at this point but AIUI there's a bunch of open questions with the
> > analysis and usage side, partly due to needing to find people to work on
> > it.

> Indeed, this is the super hard part.  Having looked at the kernelci
> web site, its dashboard isn't particularly useful for what I'm trying
> to do with it.  For my part, when analyizing a single test run, the
> kernelci dashboard isn't particularly helpful.  What I need is
> something more like this:
> 
> ext4/4k: 554 tests, 48 skipped, 4301 seconds
> ext4/1k: 550 tests, 3 failures, 51 skipped, 6739 seconds
>   Failures: generic/051 generic/475 generic/476

That should be achievable with the KernelCI stuff (which is different to
kcidb at present) - you're a lot of the way there with how kselftest is
currently reported modulo the list of failures which currently requires
you to drill down to a second level page.

> ... which summarizes 6,592 tests in 20 lines, and for any test that
> has failed, we rerun it four more times, so we can get an indication
> of whether a test is a hard failure, or a flaky failure.

> (I don't need to see all of the tests that passes; it's the test
> failures or the test flakes that are significant.)

The listing of tests does get a bit more complex when you mix in running
on different platforms.

> And then when comparing between multiple test runs, that's when I'm
> interesting in see which tests may have regressed, or which tests may
> have been fixed when going in between version A and version B.

Yup, that comparison stuff is useful.  The landing pages for individual
tests do have something there but not really anything higher level:

   https://linux.kernelci.org/test/case/id/655b0fa18dc4b7e0c47e4a88/

> And right now, kernelci doesn't have any of that.  So it might be hard
> to convinced overloaded maintainers to upload test runs to kernelci,
> when they don't see any immediate benefit of uploading the kernelci db.

Note that kcidb and KernelCI are currently different databases - with
the dashboard being done kcidb has no UI at all.  Personally I'm pushing
my data in on the basis that it costs me basically nothing to do so
given that I'm already running the tests.

> There is a bit of a chicken-and-egg problem, since without the test
> results getting uploaded, it's hard to get the analysis functionality
> implemented, and without the analysis features, it's hard to get
> developers to upload the data.

I think if we get tooling in place so that people can just run a script,
add a flag to their tools or whatever to ingest results from the
standard testsuites the barrier to reporting becomes sufficiently low
that it's more of a "why not?" type thing.

There's also other things we can do beyond big data analysis, some of
which are a lot easier - for example checking other people's CI results
for your branch before sending or accepting a pull request (if you've
got a one stop shop to pull data from that's a lot easier than if you
have to go round a bunch of places).

> That being said, a number of file system developers probably have
> several years worth of test results that we could probably give you.
> I have hundreds of junit.xml files, with information about how kernel
> version, what version of xfstesets, etc, that was used.  I'm happy to
> make samples of it available for anyone who is interested.

Right, I've likewise got a pile of results I can reprocess at will.
Nikolai Kondrashov Nov. 21, 2023, 6:24 p.m. UTC | #23
Hi Theodore,

On 11/20/23 22:51, Theodore Ts'o wrote:
> On Mon, Nov 20, 2023 at 02:30:49PM +0100, Ricardo Cañuelo wrote:
>>
>> This is not trivial because tests vary a lot and we'd first need to
>> define which artifacts to link to, and because whatever is linked (test
>> commands, output log, results summary) would need to be stored
>> forever. But since we're doing that already for basically all kernel
>> mailing lists, I wonder if something like "public-inbox for test
>> results" could be possible some day.
> 
> What we have at work is a way to upload the test results summary
> (e.g., just KTAP result lines, or the xfstests junit XML) along with
> test run metadata (e.g., what was the kernel commit on which the test
> was run, and the test hardware), and this would be stored permanently.
> Test artifacts is also preserved but for a limited amount of time
> (e.g., some number of months or a year).
> 
> The difference in storage lifetimes is because the junit XML file
> might be a few kilobytes to tens of kilobytes. but the test artifacts
> might be a few megabytes to tens of megabytes.
> 
> Of course once you have this data, it becomes possible to detect when
> a test may have regressed, or to detect flaky tests, and perhaps to
> figure out if certain hardware configurations or kernel configurations
> are more likely to trigger a particular test to fail.  So having all
> of this data stored centrally would be really cool.  The only question
> is who might be able to create such an infrastructure, and be able to
> pay for the ongoing development and operational costs....

Yes, I agree, having public result storage would be awesome. I think KCIDB is 
relatively-well positioned to take on some of that work. We have the POC 
dashboard already. Well, *had* until somebody scraped it and exhausted our 
cloud budget, I'm working on making it cheaper before bringing it back.

Meanwhile you can get a glimpse of it in one of my slide decks:
https://static.sched.com/hosted_files/eoss2023/ef/Kernel%20CI%20%E2%80%93%20How%20Far%20Can%20It%20Go%20%E2%80%93%20EOSS%202023.pdf

We also have an artifact-mirroring system (quite basic at the moment), 
expecting the submitter to already have the files hosted somewhere. We would 
need to add a file upload service to that.

Finally, I'm considering opening up submissions to (Google-authenticated) 
public, as opposed to just CI systems, so we could support this. That's still 
more work though.

Regarding analysis, I have plenty of ideas for that too, and even more ideas 
are explored by others lately. I just need to bring back the dashboard and 
find the time for all of it :D

Anyone interested in helping with any of this?

Nick
Mark Brown Nov. 21, 2023, 8:48 p.m. UTC | #24
On Tue, Nov 21, 2023 at 06:36:10PM +0800, David Gow wrote:

> The other question is how to handle outdated results when a new patch
> revision is sent out. Personally, I think this is something we can
> solve similarly to 'Reviewed-by', depending on the extent of the
> changes and cost of the tests. I suspect for most automated tests,
> this would mean never carrying the 'Tested-with' tag over, but if
> testing it involved manually building and running kernels against 50
> different hardware setups, I could imagine it making sense to not
> re-do this if a new revision just changed a doc typo. If a URL is used
> here, it could contain version info, too.

One thing with Reviewed-by that's a bit different to testing is that
Reviewed-by is generally invalidated by doing a change to the specific
patch that needs at least a commit --amend.

> Personally, I'd like to require that all patches have a 'Tested-with'
> field, even if there's not a corresponding 'V' MAINTAINERS entry, as
> people should at least think of how something's tested, even if
> there's not a formal 'test suite' for it. Though that seems a
> longer-term goal

A requirement feels like it'd be pretty painful for my workflow, or at
least result in me adding the thing in hope of what I'm actually going
to do rather than as a result of the testing - all my CI stuff
(including what I do for outgoing patches) is keyed off the git commits
being tested so updating the commits to reflect testing would have
unfortunate side effects.

> The questions I think we need to answer to get this in are:
> 1. Do we want to split this up (and potentially land it
> piece-by-piece), or is it more valuable to have a stricter, more
> complete system from the get-go?

I think splitting things makes sense.
kernel test robot Nov. 22, 2023, 1:08 a.m. UTC | #25
Hi Nikolai,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.7-rc2 next-20231121]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nikolai-Kondrashov/MAINTAINERS-Introduce-V-field-for-required-tests/20231116-015419
base:   linus/master
patch link:    https://lore.kernel.org/r/20231115175146.9848-2-Nikolai.Kondrashov%40redhat.com
patch subject: [PATCH 1/3] MAINTAINERS: Introduce V: field for required tests
reproduce: (https://download.01.org/0day-ci/archive/20231122/202311220843.vh7WyxDF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311220843.vh7WyxDF-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Documentation/process/tests.rst: WARNING: document isn't included in any toctree
Theodore Ts'o Nov. 22, 2023, 4:16 p.m. UTC | #26
On Tue, Nov 21, 2023 at 01:27:44PM +0000, Mark Brown wrote:
> > (I don't need to see all of the tests that passes; it's the test
> > failures or the test flakes that are significant.)
> 
> The listing of tests does get a bit more complex when you mix in running
> on different platforms.

Yeah, that's part of the aggregation reports problem.  Given a
particular test, say, generic/475, I'd love to see a summary of which
file system config (e.g., ext4/4k, ext4/1k) and on which architectures a
particular test is failing or flaky.  Right now, I do this
manually using a combination of a mutt mail reader (the test summaries
are e-mailed to me), and emacs....

> I think if we get tooling in place so that people can just run a script,
> add a flag to their tools or whatever to ingest results from the
> standard testsuites the barrier to reporting becomes sufficiently low
> that it's more of a "why not?" type thing.

Sure, I'm happy to add something like that to my test runners:
kvm-xfstests, gce-xfstests, and android-xfstests.  Then anyone who
uses my test runner infrastructure would get uploading for free.  We
might need to debate whether I enable uploading as something which is
enabled by default or not (I can imagine some people won't want to
upload information to a public site, lest it leak information about an
upcoming mobile handset :-), but that's a minor point.

Personally, I'm not going to have time to look into this for a while,
but... patches welcome.  Or even something that takes a junit xml
file, and uploads it to the kcidb.  If someone can make something like
that available, I should be able to take it the rest of the way.
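
Roughly, the conversion half could be as little as this (a completely
untested sketch; the kcidb field names are guesses that would need to be
checked against the actual kcidb submission schema):

  #!/usr/bin/env python3
  # Untested sketch: convert a junit XML report into kcidb-style JSON.
  import json
  import sys
  import xml.etree.ElementTree as ET

  def convert(junit_path, origin, build_id):
      tests = []
      for case in ET.parse(junit_path).getroot().iter("testcase"):
          name = "%s.%s" % (case.get("classname", ""), case.get("name", ""))
          if case.find("failure") is not None or case.find("error") is not None:
              status = "FAIL"
          elif case.find("skipped") is not None:
              status = "SKIP"
          else:
              status = "PASS"
          tests.append({
              "id": "%s:%s" % (origin, name),  # assumed id convention
              "origin": origin,
              "build_id": build_id,
              "path": name,
              "status": status,
          })
      # Schema version below is an assumption, not taken from kcidb docs.
      return {"version": {"major": 4, "minor": 0}, "tests": tests}

  if __name__ == "__main__":
      json.dump(convert(sys.argv[1], sys.argv[2], sys.argv[3]),
                sys.stdout, indent=2)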

Cheers,

						- Ted
Nikolai Kondrashov Nov. 22, 2023, 5:19 p.m. UTC | #27
On 11/21/23 12:36, David Gow wrote:
> Thanks so much for doing this! I think everyone agrees that we need
> _some_ way of documenting which tests to run, and I think this is our
> best option.

Awesome :D

> In any case, this patch does a lot, and I'll comment on them
> one-by-one. (It may be worth splitting this patch up into a few
> separate bits, if only so that we can better separate the
> uncontroversial bits from the open questions.)

Yeah, I'll try to figure out a less controversial split.

> On Thu, 16 Nov 2023 at 01:52, Nikolai Kondrashov
>> Each referenced test suite is expected to be documented in the new
>> Documentation/process/tests.rst file, which must have enough structure
>> (documented inside) for the tools to make use of it. Apart from basic
>> data, each test can refer to its "superset" - a test suite which this
>> one is a part of. The expected use is to describe both a large test
>> suite and its subsets, so the former would also be accepted, if a
>> subsystem requires only a subset.
> 
> I think this could work, but is a bit complicated.
> 
> My initial thought was to have this as a more free-form field, which
> either contained a:
> - Path to a command to run (e.g. tools/testing/kunit/run_checks.py)
> - Path to a documentation file describing the test.
> - URL to a page describing the test
> - (Maybe) freeform text.

I think we should be careful about having too much flexibility here, if we
want to have tools process this. I mean, we would then have to formalize and
define the syntax for all the cases, which could then become too obscure for
humans to easily read and write.

As I said elsewhere, our ultimate (even if unachievable) target should be to
have commands to execute (instead of long setup and execution instructions),
for *all* V: entries. So that definitely should be supported. The current
patch already supports putting a command in the tests.rst to be printed by
checkpatch.pl. Perhaps we can allow putting it into the V: entry directly. I
have one idea on how to do that.

OTOH, I think there's value in an overall catalogue of tests and having
easily-accessible documentation for that command (even if brief), and that's
what going for the command via tests.rst allows.

Regarding a path to the documentation file, that goes against the idea of a
catalogue file, again, so I'm reluctant to let it go. The same goes for a
documentation URL. Both of these can be expressed via the catalogue too.

I wonder if it could be useful to add another option to get_maintainer.pl, or
implement a new script, which would just dump the referenced test catalogue
sections for a patchset, for a longer-form read than what checkpatch.pl can
produce, but faster than scanning the whole catalogue personally.

> It's probably worth also looking at this proposal to auto-generate
> similar documentation:
> https://lore.kernel.org/linux-kselftest/cover.1689171160.git.mchehab@kernel.org/

IIRC, Mauro has mentioned this effort to me at EOSS in Prague, but I still
haven't found the time to look at it closely. I think this is a worthy effort
overall, but at a somewhat lower level. What I would like to be in the
tests.rst is a basic intro and help to get each corresponding test suite
running, to help bridge the gap between trying to contribute and having your
contribution tested with what maintainers prescribe. The docs in tests.rst can
point to the more detailed docs, in turn. I also think it's good to have a
document with an overview of what tests exist in general and which areas they
address.

> The other question is how to handle outdated results when a new patch
> revision is sent out. Personally, I think this is something we can
> solve similarly to 'Reviewed-by', depending on the extent of the
> changes and cost of the tests. I suspect for most automated tests,
> this would mean never carrying the 'Tested-with' tag over, but if
> testing it involved manually building and running kernels against 50
> different hardware setups, I could imagine it making sense to not
> re-do this if a new revision just changed a doc typo. If a URL is used
> here, it could contain version info, too.

Yeah, that would be painful.

> Paste encouragement for in-tree test suites triggered by git forges here <

I think what Ricardo is suggesting in another branch, regarding adding result
URLs, could work. So it would be nice to account for that in the change, but
the policy would probably depend on subsystem/maintainer/the change in question.

>> Introduce a new tag, 'Tested-with:', documented in the
>> Documentation/process/submitting-patches.rst file. The tag is expected
>> to reference the documented test suites, similarly to the 'V:' field,
>> and to certify that the submitter executed the test suite on the change,
>> and that it passed.
> 
> I'm also 100% for this, though I'd considered it separately from the
> MAINTAINERS change.
> 
> I think, in the ideal case, we want this to link to the results
> somehow. kcidb would seem to be the obvious choice there.
> 
> Again, as a fallback, a plain text field would be useful to describe
> cases where a patch was tested by some means other than a formal test
> suite. This might not be ideal, but I'd still rather have people
> describe that something "builds and boots on <x> hardware" than have
> to guess if a patch was tested at all.
> 
> Of course, it'd then be up to maintainers to decide what they'd
> accept: I'd expect that some would require there be a 'Tested-with'
> header which links to valid results for the tests described in
> MAINTAINERS.

I'm thinking that maybe we should just not *require* a valid reference to a
documented test in the Tested-with: field. I.e. only verify that all the V:
values are listed, but ignore everything unknown.

>> Make scripts/checkpatch.pl ensure any added V: fields reference
>> documented test suites only, and output a warning if a change to a
>> subsystem doesn't certify the required test suites were executed,
>> if any.
> 
> I'd definitely want something like this to run at some point in the
> patch-submission workflow. I think that, ultimately, we'll want to be
> able to run some tests automatically (e.g., a git hook could run the
> tests and add the 'Tested-with' line).
> 
> Personally, I'd like to require that all patches have a 'Tested-with'
> field, even if there's not a corresponding 'V' MAINTAINERS entry, as
> people should at least think of how something's tested, even if
> there's not a formal 'test suite' for it. Though that seems a
> longer-term goal

Yeah, that would be nice from a maintainer's POV, but probably not very popular.

>> If the test suite description includes a "Command", then checkpatch.pl
>> will output it as the one executing the suite. The command should run
>> with only the kernel tree and the regular developer environment set up.
>> But, at the same time, could simply output instructions for installing
>> any extra dependencies (or pull some automatically). The idea is to
>> get the developer into feedback loop quicker and easier, so they have
>> something to run and iterate on, even if it involves installing some
>> more stuff first. Therefore it's a good idea to add such wrappers to the
>> kernel tree proper and refer to them from the tests.rst.
>>
>> Extend scripts/get_maintainer.pl to support retrieving the V: fields,
>> and scripts/parse-maintainers.pl to maintain their ordering.
>>
>> Signed-off-by: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>> ---
> 
> The questions I think we need to answer to get this in are:
> 1. Do we want to split this up (and potentially land it
> piece-by-piece), or is it more valuable to have a stricter, more
> complete system from the get-go?

I'll see what I can do about splitting it.

> 2. What format should the 'V' line take? If it is simply a test name,
> do we use a doc as suggested (or one generated in part from some other
> process), or something like a command name or URL? Can it just be
> freeform text?
> 3. Should 'Tested-with' be a test name in the same format as 'V', a
> URL to results (any URL, or just kcidb?), or freeform text? How does
> this evolve with multiple versions of patches?

I don't think it's useful to restrict this to kcidb. I think the tools should
generally allow anything there, but verify that the entries required by
MAINTAINERS are present, as I wrote above.
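
FWIW, the verification side already has a hook for that in the patch:
checkpatch.pl asks get_maintainer.pl which suites a changed file requires,
roughly like this (the file and the reported suite are made up, assuming the
corresponding MAINTAINERS section gained a V: line):

    $ ./scripts/get_maintainer.pl --test --multiline --nogit --nogit-fallback \
          -f fs/xfs/xfs_super.c
    xfstests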

> 4. How should this be enforced? A warning (not an 'error') from
> checkpatch? A separate script?
> 
> Personally, my gut feeling is that we should land the simplest, most
> minimal version of this (the 'V' field, as freeform text) now, and
> build on that as consensus and tooling permits. I'd probably also add
> the 'Tested-with' or similar tag, as freeform text, too. I don't think
> either of those would cause major problems if we needed to change or
> restrict the format later; I imagine there won't be a huge need to
> parse old commits for test data, and even if so, it wouldn't be too
> hard to ignore any which don't conform to any stricter future
> convention.
> 
> But I don't think there's anything fundamentally wrong with the full
> plan as-is, so if everyone's happy with it, I'd not object to having
> it.

I agree that the minimum support (just the freeform V:/Tested-with:) would be
easier to get merged, but would also be somewhat toothless. I think some sort
of test documentation/catalogue adds value, and a message from checkpatch.pl
even more so, as it greatly aids test discoverability, which is a major point
of the change. We can lower the WARN to INFO to reduce resistance, or maybe
even make the level configurable in the V: field.
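
With the patch as posted, the nudge would look roughly like this (the file
name, summary and command are invented for the example):

    WARNING: Changes to fs/xfs/xfs_super.c require running Filesystem
    regression test suite ("xfstests"), but no corresponding Tested-with:
    tag found.
    Run the test with "tools/testing/run-xfstests.sh".
    See Documentation/process/tests.rst for instructions.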

Anyway, I'll work on splitting this.

Thanks a lot for the extensive and insightful comments, David!

Nick
Nikolai Kondrashov Nov. 22, 2023, 5:41 p.m. UTC | #28
Hi Gustavo,

On 11/20/23 14:40, Gustavo Padovan wrote:
> On Thursday, November 16, 2023 09:14 -03, Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com> wrote:
> 
>> On 11/15/23 22:38, Konstantin Ryabitsev wrote:
>>> On Wed, Nov 15, 2023 at 07:43:49PM +0200, Nikolai Kondrashov wrote:
>>>> Introduce a new tag, 'Tested-with:', documented in the
>>>> Documentation/process/submitting-patches.rst file. The tag is expected
>>>> to reference the documented test suites, similarly to the 'V:' field,
>>>> and to certify that the submitter executed the test suite on the change,
>>>> and that it passed.
>>>
>>> I'm not sure it should be a tag that stays all the way through git commit.
>>> How about making it a cover letter/first patch footer?
>>>
>>> tested-with: <suite>
>>
>> Yes, that would be better indeed. However, checkpatch.pl doesn't process cover
>> letters, and so we would have no automated way to advertise and nudge people
>> towards testing.
>   
> At this year's LPC, there was quite some discussion around improving testing information, so this patch of yours lands with great timing. :)

Lucky me :D

> All you are proposing here is pretty interesting both for developers and CI systems, but it seems that a "Tested-with:" tag and checkpatch.pl validation will take quite some time to consolidate.
> 
> Would it make sense to split out just the part that adds the V: field to MAINTAINERS and submit that as a separate patchset together with its documentation? That way we can enable subsystems to start annotating their test suites already.

Yeah, I'll try to split this along controversy lines in the next version.

> I also wonder what to do for subsystems that have several different test suites (e.g. something in kselftests and an external test suite). Would an alternative be pointing to a Documentation page with detailed info?

As Mark already noted, you can always just put more V: entries there, but you
can also create a "meta-test" in the catalogue listing all the different test
suites, and reference it from the V: entry, if you'd like.
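
For example (all the names are made up), the catalogue could gain an umbrella
entry, with the individual suites pointing at it via the Superset field:

    fs-tests
    --------
    :Summary: All the filesystem test suites

    xfstests
    --------
    :Summary: Filesystem regression test suite
    :Superset: fs-tests

A subsystem could then put "V: fs-tests" in MAINTAINERS to ask for the whole
set, and, conversely, a subsystem requiring only "V: xfstests" would also
accept "Tested-with: fs-tests", since checkpatch.pl treats a superset as
covering its subsets.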

Nick
diff mbox series

Patch

diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 86d346bcb8ef0..8f0f3c9f753dd 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -228,6 +228,25 @@  You should be able to justify all violations that remain in your
 patch.
 
 
+Test your changes
+-----------------
+
+Test the patch to the best of your ability. Check the MAINTAINERS file for the
+subsystem(s) you are changing to see if there are any **V:** entries requiring
+particular test suites to be executed. If any are required, follow the
+instructions in the Documentation/process/tests.rst under the headings
+matching the V: entries.
+
+If you ran any test suites documented in the Documentation/process/tests.rst
+file, and they passed, add a 'Tested-with: <test_suite>' line to the messages
+of the commits you tested, one for every test suite, substituting
+'<test_suite>' with their names.
+
+If a subsystem you're changing requires a test suite to be executed and the
+commit lacks the 'Tested-with:' line referring to it (or its documented
+superset), scripts/checkpatch.pl will produce a WARNING reminding you to run
+it.
+
 Select the recipients for your patch
 ------------------------------------
 
diff --git a/Documentation/process/tests.rst b/Documentation/process/tests.rst
new file mode 100644
index 0000000000000..907311e91ec45
--- /dev/null
+++ b/Documentation/process/tests.rst
@@ -0,0 +1,35 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _tests:
+
+Tests you can run
+=================
+
+There are many automated tests available for the Linux kernel, and some
+userspace tests which happen to also test the kernel. Here are some of them,
+along with the instructions on where to get them and how to run them for
+various purposes.
+
+This document has to follow a certain structure to allow tool access.
+Second-level headers (underscored with dashes '-') must contain test suite
+names, and the corresponding section must contain the test description.
+
+The test suites can be referred to by name, from the "V:" lines in the
+MAINTAINERS file, as well as from the "Tested-with:" tag in commit messages.
+
+The test suite description should contain the test documentation in general:
+where to get the test, how to run it, and how to interpret its results, but
+could also start with a "field list", with the following entries recognized by
+the tools (regardless of the case):
+
+:Summary: A single-line summary of the test suite
+:Superset: The name of the test suite this one is a subset of
+:Command: A shell command executing the test suite, not requiring setup
+          beyond the kernel tree and regular developer environment
+          (even if only to report what else needs setting up)
+
+Any other entries are accepted, but not processed. The following could be
+particularly useful:
+
+:Source: A URL pointing to the source code of the test suite
+:Docs: A URL pointing to further test suite documentation
diff --git a/MAINTAINERS b/MAINTAINERS
index 5c9f868e13b6e..2565c04f0490e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -59,6 +59,12 @@  Descriptions of section entries and preferred order
 	      matches patches or files that contain one or more of the words
 	      printk, pr_info or pr_err
 	   One regex pattern per line.  Multiple K: lines acceptable.
+	V: *Test* recommended for execution. The name of a test suite
+	   that should be executed for changes to the maintained subsystem.
+	   The test suite must be documented in a structured way in
+	   Documentation/process/tests.rst under a heading of the same name.
+	   V: xfstests
+	   One test suite per line.
 
 Maintainers List
 ----------------
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda1128..92ee221caa4d3 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -66,6 +66,9 @@  my $codespellfile = "/usr/share/codespell/dictionary.txt";
 my $user_codespellfile = "";
 my $conststructsfile = "$D/const_structs.checkpatch";
 my $docsfile = "$D/../Documentation/dev-tools/checkpatch.rst";
+my $testsrelfile = "Documentation/process/tests.rst";
+my $testsfile = "$D/../$testsrelfile";
+my %tests = ();
 my $typedefsfile;
 my $color = "auto";
 my $allow_c99_comments = 1; # Can be overridden by --ignore C99_COMMENT_TOLERANCE
@@ -282,6 +285,39 @@  sub load_docs {
 	close($docs);
 }
 
+sub load_tests {
+	open(my $tests, '<', "$testsfile")
+	    or warn "$P: Can't read the tests file $testsfile $!\n";
+
+	my $name = undef;
+	my $prev_line = undef;
+	my $in_field_list = 0;
+
+	while (<$tests>) {
+		my $line = $_;
+		$line =~ s/\s+$//;
+
+		# If the previous line was a second-level header (test name)
+		if ($line =~ /^-+$/ &&
+		    defined($prev_line) &&
+		    length($line) == length($prev_line)) {
+			$name = $prev_line;
+			$tests{$name} = {};
+			$in_field_list = 1;
+		# Else, if we're parsing the test's header field list
+		} elsif ($in_field_list) {
+			if ($line =~ /^:([^:]+):\s+(.*)/) {
+				$tests{$name}{lc($1)} = $2;
+			} else {
+				$in_field_list = !$line;
+			}
+		}
+
+		$prev_line = $line;
+	}
+	close($tests);
+}
+
 # Perl's Getopt::Long allows options to take optional arguments after a space.
 # Prevent --color by itself from consuming other arguments
 foreach (@ARGV) {
@@ -372,6 +408,7 @@  if ($color =~ /^[01]$/) {
 
 load_docs() if ($verbose);
 list_types(0) if ($list_types);
+load_tests();
 
 $fix = 1 if ($fix_inplace);
 $check_orig = $check;
@@ -1144,6 +1181,26 @@  sub is_maintained_obsolete {
 	return $maintained_status{$filename} =~ /obsolete/i;
 }
 
+# Test suites required per changed file
+our %file_required_tests = ();
+
+# Return a list of test suites required for execution for a particular file
+sub get_file_required_tests {
+	my ($filename) = @_;
+	my $file_required_tests;
+
+	return () if (!$tree || !(-e "$root/scripts/get_maintainer.pl"));
+
+	if (!exists($file_required_tests{$filename})) {
+		my $output = `perl $root/scripts/get_maintainer.pl --test --multiline --nogit --nogit-fallback -f $filename 2>&1`;
+		die "Failed retrieving tests required for changes to \"$filename\":\n$output" if ($?);
+		$file_required_tests{$filename} = [grep { !/@/ } split("\n", $output)]
+	}
+
+	$file_required_tests = $file_required_tests{$filename};
+	return @$file_required_tests;
+}
+
 sub is_SPDX_License_valid {
 	my ($license) = @_;
 
@@ -2689,6 +2746,9 @@  sub process {
 	my @setup_docs = ();
 	my $setup_docs = 0;
 
+	# Test suites which should not be required for execution
+	my %not_required_tests = ();
+
 	my $camelcase_file_seeded = 0;
 
 	my $checklicenseline = 1;
@@ -2907,6 +2967,27 @@  sub process {
 				}
 			}
 
+			# Check if tests are required for changes to the file
+			foreach my $name (get_file_required_tests($realfile)) {
+				next if exists $not_required_tests{$name};
+				my $test_ref = "\"$name\"";
+				my $summary = $tests{$name}{"summary"};
+				my $command = $tests{$name}{"command"};
+				my $instructions = "";
+				if (defined($summary)) {
+					$test_ref = "$summary ($test_ref)";
+				}
+				if (defined($command)) {
+					$instructions .= "\nRun the test with \"$command\".";
+				}
+				$instructions .= "\nSee $testsrelfile for instructions.";
+				WARN("TEST_REQUIRED",
+				     "Changes to $realfile require running $test_ref, " .
+				     "but no corresponding Tested-with: tag found." .
+				     "$instructions");
+				$not_required_tests{$name} = 1;
+			}
+
 			next;
 		}
 
@@ -3233,6 +3314,29 @@  sub process {
 			}
 		}
 
+# Check and accumulate executed test suites
+		if (!$in_commit_log && $line =~ /^\s*Tested-with:\s*(.*)/i) {
+			my $name = $1;
+			my $sub_found = 0;
+			if (!exists $tests{$name}) {
+				ERROR("TEST_NAME",
+				      "Test suite \"$name\" not documented in $testsrelfile\n" . $herecurr);
+			}
+			# Do not require this test suite and all its subsets
+			local *dont_require_test = sub {
+				my ($name) = @_;
+				$not_required_tests{$name} = 1;
+				foreach my $sub_name (keys %tests) {
+					my $sub_data = $tests{$sub_name};
+					my $superset = $sub_data->{"superset"};
+					if (defined($superset) and $superset eq $name) {
+						dont_require_test($sub_name);
+					}
+				}
+			};
+			dont_require_test($name);
+		}
+
 # Check email subject for common tools that don't need to be mentioned
 		if ($in_header_lines &&
 		    $line =~ /^Subject:.*\b(?:checkpatch|sparse|smatch)\b[^:]/i) {
@@ -3657,7 +3761,7 @@  sub process {
 				}
 			}
 # check MAINTAINERS entries for the right ordering too
-			my $preferred_order = 'MRLSWQBCPTFXNK';
+			my $preferred_order = 'MRLSWQBCPVTFXNK';
 			if ($rawline =~ /^\+[A-Z]:/ &&
 			    $prevrawline =~ /^[\+ ][A-Z]:/) {
 				$rawline =~ /^\+([A-Z]):\s*(.*)/;
@@ -3683,6 +3787,18 @@  sub process {
 					}
 				}
 			}
+# check MAINTAINERS V: entries are valid and refer to a documented test suite
+			if ($rawline =~ /^\+V:\s*(.*)/) {
+				my $name = $1;
+				if ($name =~ /@/) {
+					ERROR("TEST_NAME",
+					      "Test suite name cannot contain '@' character\n" . $herecurr);
+				}
+				if (!exists $tests{$name}) {
+					ERROR("TEST_NAME",
+					      "Test suite \"$name\" not documented in $testsrelfile\n" . $herecurr);
+				}
+			}
 		}
 
 		if (($realfile =~ /Makefile.*/ || $realfile =~ /Kbuild.*/) &&
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index 16d8ac6005b6f..d4ae27b67becb 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -53,6 +53,7 @@  my $output_section_maxlen = 50;
 my $scm = 0;
 my $tree = 1;
 my $web = 0;
+my $test = 0;
 my $subsystem = 0;
 my $status = 0;
 my $letters = "";
@@ -270,6 +271,7 @@  if (!GetOptions(
 		'scm!' => \$scm,
 		'tree!' => \$tree,
 		'web!' => \$web,
+		'test!' => \$test,
 		'letters=s' => \$letters,
 		'pattern-depth=i' => \$pattern_depth,
 		'k|keywords!' => \$keywords,
@@ -319,13 +321,14 @@  if ($sections || $letters ne "") {
     $status = 0;
     $subsystem = 0;
     $web = 0;
+    $test = 0;
     $keywords = 0;
     $keywords_in_file = 0;
     $interactive = 0;
 } else {
-    my $selections = $email + $scm + $status + $subsystem + $web;
+    my $selections = $email + $scm + $status + $subsystem + $web + $test;
     if ($selections == 0) {
-	die "$P:  Missing required option: email, scm, status, subsystem or web\n";
+	die "$P:  Missing required option: email, scm, status, subsystem, web or test\n";
     }
 }
 
@@ -630,6 +633,7 @@  my %hash_list_to;
 my @list_to = ();
 my @scm = ();
 my @web = ();
+my @test = ();
 my @subsystem = ();
 my @status = ();
 my %deduplicate_name_hash = ();
@@ -661,6 +665,11 @@  if ($web) {
     output(@web);
 }
 
+if ($test) {
+    @test = uniq(@test);
+    output(@test);
+}
+
 exit($exit);
 
 sub self_test {
@@ -846,6 +855,7 @@  sub get_maintainers {
     @list_to = ();
     @scm = ();
     @web = ();
+    @test = ();
     @subsystem = ();
     @status = ();
     %deduplicate_name_hash = ();
@@ -1068,6 +1078,7 @@  MAINTAINER field selection options:
   --status => print status if any
   --subsystem => print subsystem name if any
   --web => print website(s) if any
+  --test => print test(s) if any
 
 Output type options:
   --separator [, ] => separator for multiple entries on 1 line
@@ -1378,6 +1389,8 @@  sub add_categories {
 		push(@scm, $pvalue . $suffix);
 	    } elsif ($ptype eq "W") {
 		push(@web, $pvalue . $suffix);
+	    } elsif ($ptype eq "V") {
+		push(@test, $pvalue . $suffix);
 	    } elsif ($ptype eq "S") {
 		push(@status, $pvalue . $suffix);
 	    }
diff --git a/scripts/parse-maintainers.pl b/scripts/parse-maintainers.pl
index 2ca4eb3f190d6..dbc2b79bcaa46 100755
--- a/scripts/parse-maintainers.pl
+++ b/scripts/parse-maintainers.pl
@@ -44,6 +44,7 @@  usage: $P [options] <pattern matching regexes>
       B:  URI for bug tracking/submission
       C:  URI for chat
       P:  URI or file for subsystem specific coding styles
+      V:  Test suite name
       T:  SCM tree type and location
       F:  File and directory pattern
       X:  File and directory exclusion pattern
@@ -73,7 +74,7 @@  sub by_category($$) {
 
 sub by_pattern($$) {
     my ($a, $b) = @_;
-    my $preferred_order = 'MRLSWQBCPTFXNK';
+    my $preferred_order = 'MRLSWQBCPVTFXNK';
 
     my $a1 = uc(substr($a, 0, 1));
     my $b1 = uc(substr($b, 0, 1));