[v5] unit tests: Add a project plan document

Message ID	c7dca1a805a16fd4fd68e86efeec97510e3ac4b8.1691449216.git.steadmon@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@vger.kernel.org> Date: Mon, 7 Aug 2023 16:07:55 -0700 In-Reply-To: <0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com> Mime-Version: 1.0 References: <0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com> Message-ID: <c7dca1a805a16fd4fd68e86efeec97510e3ac4b8.1691449216.git.steadmon@google.com> Subject: [PATCH v5] unit tests: Add a project plan document From: Josh Steadmon <steadmon@google.com> To: git@vger.kernel.org Cc: linusa@google.com, calvinwan@google.com, phillip.wood123@gmail.com, chooglen@google.com, gitster@pobox.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk
Series	[v5] unit tests: Add a project plan document \| expand [v5] unit tests: Add a project plan document

Message ID

c7dca1a805a16fd4fd68e86efeec97510e3ac4b8.1691449216.git.steadmon@google.com (mailing list archive)

State

New, archived

Headers

Date: Mon,  7 Aug 2023 16:07:55 -0700
In-Reply-To: 
 <0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com>
Mime-Version: 1.0
References: 
 <0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com>
Message-ID: 
 <c7dca1a805a16fd4fd68e86efeec97510e3ac4b8.1691449216.git.steadmon@google.com>
Subject: [PATCH v5] unit tests: Add a project plan document
From: Josh Steadmon <steadmon@google.com>
To: git@vger.kernel.org
Cc: linusa@google.com, calvinwan@google.com, phillip.wood123@gmail.com,
        chooglen@google.com, gitster@pobox.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

Series

[v5] unit tests: Add a project plan document | expand

Commit Message

Josh Steadmon Aug. 7, 2023, 11:07 p.m. UTC

In our current testing environment, we spend a significant amount of
effort crafting end-to-end tests for error conditions that could easily
be captured by unit tests (or we simply forgo some hard-to-setup and
rare error conditions). Describe what we hope to accomplish by
implementing unit tests, and explain some open questions and milestones.
Discuss desired features for test frameworks/harnesses, and provide a
preliminary comparison of several different frameworks.

Coauthored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
In our current testing environment, we spend a significant amount of
effort crafting end-to-end tests for error conditions that could easily
be captured by unit tests (or we simply forgo some hard-to-setup and
rare error conditions). Unit tests additionally provide stability to the
codebase and can simplify debugging through isolation. Turning parts of
Git into libraries[1] gives us the ability to run unit tests on the
libraries and to write unit tests in C. Writing unit tests in pure C,
rather than with our current shell/test-tool helper setup, simplifies
test setup, simplifies passing data around (no shell-isms required), and
reduces testing runtime by not spawning a separate process for every
test invocation.

This patch adds a project document describing our goals for adding unit
tests, as well as a discussion of features needed from prospective test
frameworks or harnesses. It also includes a WIP comparison of various
proposed frameworks. Later iterations of this series will probably
include a sample unit test and Makefile integration once we've settled
on a framework. A rendered preview of this doc can be found at [2].

[1] https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com/
[2] https://github.com/steadmon/git/blob/unit-tests-dev/Documentation/technical/unit-tests.adoc

Reviewers can help this series progress by discussing whether it's
acceptable to rely on `prove` as a test harness for unit tests. We
support this for the current shell tests suite, but it is not strictly
required.

TODOs remaining:
- Discuss pre-existing harnesses for the current test suite
- Figure out how to evaluate frameworks on additional OSes such as *BSD
  and NonStop

Changes in v5:
- Add comparison point "License".
- Discuss feature priorities
- Drop frameworks:
  - Incompatible licenses: libtap, cmocka
  - Missing source: MyTAP
  - No TAP support: µnit, cmockery, cmockery2, Unity, minunit, CUnit
- Drop comparison point "Coverage reports": this can generally be
  handled by tools such as `gcov` regardless of the framework used.
- Drop comparison point "Inline tests": there didn't seem to be
  strong interest from reviewers for this feature.
- Drop comparison point "Scheduling / re-running": this was not
  supported by any of the main contenders, and is generally better
  handled by the harness rather than framework.
- Drop comparison point "Lazy test planning": this was supported by
  all frameworks that provide TAP output.

Changes in v4:
- Add link anchors for the framework comparison dimensions
- Explain "Partial" results for each dimension
- Use consistent dimension names in the section headers and comparison
  tables
- Add "Project KLOC", "Adoption", and "Inline tests" dimensions
- Fill in a few of the missing entries in the comparison table

Changes in v3:
- Expand the doc with discussion of desired features and a WIP
  comparison.
- Drop all implementation patches until a framework is selected.
- Link to v2: https://lore.kernel.org/r/20230517-unit-tests-v2-v2-0-21b5b60f4b32@google.com

 Documentation/Makefile                 |   1 +
 Documentation/technical/unit-tests.txt | 217 +++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 Documentation/technical/unit-tests.txt


base-commit: a9e066fa63149291a55f383cfa113d8bdbdaa6b3

Comments

Phillip Wood Aug. 14, 2023, 1:29 p.m. UTC | #1

Hi Josh

On 08/08/2023 00:07, Josh Steadmon wrote:
> 
> Reviewers can help this series progress by discussing whether it's
> acceptable to rely on `prove` as a test harness for unit tests. We
> support this for the current shell tests suite, but it is not strictly
> required.

If possible it would be good to be able to run individual test programs 
without a harness as we can for our integration tests. For running more 
than one test program in parallel I think it is fine to require a harness.

I don't have a strong preference for which harness we use so long as it 
provides a way to (a) run tests that previously failed tests first and 
(b) run slow tests first. I do have a strong preference for using the 
same harness for both the unit tests and the integration tests so 
developers don't have to learn two different tools. Unless there is a 
problem with prove it would probably make sense just to keep using that 
as the project test harness.

> TODOs remaining:
> - Discuss pre-existing harnesses for the current test suite
> - Figure out how to evaluate frameworks on additional OSes such as *BSD
>    and NonStop

We have .cirrus.yml in tree which I think gitgitgadget uses to run our 
test suite on freebsd so we could leverage that for the unit tests. As 
for NonStop I think we'd just have to rely on Randall running the tests 
as he does now for the integration tests.

> Changes in v5:
> - Add comparison point "License".
> - Discuss feature priorities
> - Drop frameworks:
>    - Incompatible licenses: libtap, cmocka
>    - Missing source: MyTAP
>    - No TAP support: µnit, cmockery, cmockery2, Unity, minunit, CUnit
> - Drop comparison point "Coverage reports": this can generally be
>    handled by tools such as `gcov` regardless of the framework used.
> - Drop comparison point "Inline tests": there didn't seem to be
>    strong interest from reviewers for this feature.
> - Drop comparison point "Scheduling / re-running": this was not
>    supported by any of the main contenders, and is generally better
>    handled by the harness rather than framework.
> - Drop comparison point "Lazy test planning": this was supported by
>    all frameworks that provide TAP output.

These changes all sound sensible to me

The trimmed down table is most welcome. The custom implementation 
supports partial parallel execution. It shouldn't be too difficult to 
add some signal handling to it depending on what we want it to do.

It sounds like we're getting to the point where we have pinned down our 
requirements and the available alternatives well enough to make a decision.

Best Wishes

Phillip

Josh Steadmon Aug. 15, 2023, 10:55 p.m. UTC | #2

On 2023.08.14 14:29, Phillip Wood wrote:
> Hi Josh
> 
> On 08/08/2023 00:07, Josh Steadmon wrote:
> > 
> > Reviewers can help this series progress by discussing whether it's
> > acceptable to rely on `prove` as a test harness for unit tests. We
> > support this for the current shell tests suite, but it is not strictly
> > required.
> 
> If possible it would be good to be able to run individual test programs
> without a harness as we can for our integration tests. For running more than
> one test program in parallel I think it is fine to require a harness.

Sounds good. This is working in v6 which I hope to send to the list
soon.


> I don't have a strong preference for which harness we use so long as it
> provides a way to (a) run tests that previously failed tests first and (b)
> run slow tests first. I do have a strong preference for using the same
> harness for both the unit tests and the integration tests so developers
> don't have to learn two different tools. Unless there is a problem with
> prove it would probably make sense just to keep using that as the project
> test harness.

To be clear, it sounds like both of these can be done with `prove`
(using the various --state settings) without any further support from
our unit tests, right? I see that we do have a "failed" target for
re-running integration tests, but that relies on some test-lib.sh
features that currently have no equivalent in the unit test framework.


> > TODOs remaining:
> > - Discuss pre-existing harnesses for the current test suite
> > - Figure out how to evaluate frameworks on additional OSes such as *BSD
> >    and NonStop
> 
> We have .cirrus.yml in tree which I think gitgitgadget uses to run our test
> suite on freebsd so we could leverage that for the unit tests. As for
> NonStop I think we'd just have to rely on Randall running the tests as he
> does now for the integration tests.

Thanks for the pointer. I've updated .cirrus.yml (as well as the GitHub
CI configs) for v6.


> > Changes in v5:
> > - Add comparison point "License".
> > - Discuss feature priorities
> > - Drop frameworks:
> >    - Incompatible licenses: libtap, cmocka
> >    - Missing source: MyTAP
> >    - No TAP support: µnit, cmockery, cmockery2, Unity, minunit, CUnit
> > - Drop comparison point "Coverage reports": this can generally be
> >    handled by tools such as `gcov` regardless of the framework used.
> > - Drop comparison point "Inline tests": there didn't seem to be
> >    strong interest from reviewers for this feature.
> > - Drop comparison point "Scheduling / re-running": this was not
> >    supported by any of the main contenders, and is generally better
> >    handled by the harness rather than framework.
> > - Drop comparison point "Lazy test planning": this was supported by
> >    all frameworks that provide TAP output.
> 
> These changes all sound sensible to me
> 
> The trimmed down table is most welcome. The custom implementation supports
> partial parallel execution. It shouldn't be too difficult to add some signal
> handling to it depending on what we want it to do.
> 
> It sounds like we're getting to the point where we have pinned down our
> requirements and the available alternatives well enough to make a decision.

Yes, v6 will include your TAP implementation (I assume you are still OK
if I include your patch in this series?).

> Best Wishes
> 
> Phillip
>

Phillip Wood Aug. 17, 2023, 9:05 a.m. UTC | #3

Hi Josh

On 15/08/2023 23:55, Josh Steadmon wrote:
> On 2023.08.14 14:29, Phillip Wood wrote:
>> [...]

>> I don't have a strong preference for which harness we use so long as it
>> provides a way to (a) run tests that previously failed tests first and (b)
>> run slow tests first. I do have a strong preference for using the same
>> harness for both the unit tests and the integration tests so developers
>> don't have to learn two different tools. Unless there is a problem with
>> prove it would probably make sense just to keep using that as the project
>> test harness.
> 
> To be clear, it sounds like both of these can be done with `prove`
> (using the various --state settings) without any further support from
> our unit tests, right? 

Yes

> I see that we do have a "failed" target for
> re-running integration tests, but that relies on some test-lib.sh
> features that currently have no equivalent in the unit test framework.

Ooh, I didn't know about that, I think we could add something similar to 
the framework if we wanted.

> [...]
>> It sounds like we're getting to the point where we have pinned down our
>> requirements and the available alternatives well enough to make a decision.
> 
> Yes, v6 will include your TAP implementation (I assume you are still OK
> if I include your patch in this series?).

Yes that's fine. I'm about to go off the list for a couple of weeks, 
I'll take a proper look at v6 once I'm back.

Best Wishes

Phillip

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b629176d7d..3f2383a12c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -122,6 +122,7 @@  TECH_DOCS += technical/scalar
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/trivial-merge
+TECH_DOCS += technical/unit-tests
 SP_ARTICLES += $(TECH_DOCS)
 SP_ARTICLES += technical/api-index
 
diff --git a/Documentation/technical/unit-tests.txt b/Documentation/technical/unit-tests.txt
new file mode 100644
index 0000000000..fad9ec9279
--- /dev/null
+++ b/Documentation/technical/unit-tests.txt
@@ -0,0 +1,217 @@ 
+= Unit Testing
+
+In our current testing environment, we spend a significant amount of effort
+crafting end-to-end tests for error conditions that could easily be captured by
+unit tests (or we simply forgo some hard-to-setup and rare error conditions).
+Unit tests additionally provide stability to the codebase and can simplify
+debugging through isolation. Writing unit tests in pure C, rather than with our
+current shell/test-tool helper setup, simplifies test setup, simplifies passing
+data around (no shell-isms required), and reduces testing runtime by not
+spawning a separate process for every test invocation.
+
+We believe that a large body of unit tests, living alongside the existing test
+suite, will improve code quality for the Git project.
+
+== Definitions
+
+For the purposes of this document, we'll use *test framework* to refer to
+projects that support writing test cases and running tests within the context
+of a single executable. *Test harness* will refer to projects that manage
+running multiple executables (each of which may contain multiple test cases) and
+aggregating their results.
+
+In reality, these terms are not strictly defined, and many of the projects
+discussed below contain features from both categories.
+
+For now, we will evaluate projects solely on their framework features. Since we
+are relying on having TAP output (see below), we can assume that any framework
+can be made to work with a harness that we can choose later.
+
+
+== Choosing a framework
+
+=== Desired features & feature priority
+
+There are a variety of features we can use to rank the candidate frameworks, and
+those features have different priorities:
+
+* Critical features: we probably won't consider a framework without these
+** Can we legally / easily use the project?
+*** <<license,License>>
+*** <<vendorable-or-ubiquitous,Vendorable or ubiquitous>>
+*** <<maintainable-extensible,Maintainable / extensible>>
+*** <<major-platform-support,Major platform support>>
+** Does the project support our bare-minimum needs?
+*** <<tap-support,TAP support>>
+*** <<diagnostic-output,Diagnostic output>>
+*** <<runtime-skippable-tests,Runtime-skippable tests>>
+* Nice-to-have features:
+** <<parallel-execution,Parallel execution>>
+** <<mock-support,Mock support>>
+** <<signal-error-handling,Signal & error-handling>>
+* Tie-breaker stats
+** <<project-kloc,Project KLOC>>
+** <<adoption,Adoption>>
+
+[[license]]
+==== License
+
+We must be able to legally use the framework in connection with Git. As Git is
+licensed only under GPLv2, we must eliminate any LGPLv3, GPLv3, or Apache 2.0
+projects.
+
+[[vendorable-or-ubiquitous]]
+==== Vendorable or ubiquitous
+
+We want to avoid forcing Git developers to install new tools just to run unit
+tests. Any prospective frameworks and harnesses must either be vendorable
+(meaning, we can copy their source directly into Git's repository), or so
+ubiquitous that it is reasonable to expect that most developers will have the
+tools installed already.
+
+[[maintainable-extensible]]
+==== Maintainable / extensible
+
+It is unlikely that any pre-existing project perfectly fits our needs, so any
+project we select will need to be actively maintained and open to accepting
+changes. Alternatively, assuming we are vendoring the source into our repo, it
+must be simple enough that Git developers can feel comfortable making changes as
+needed to our version.
+
+In the comparison table below, "True" means that the framework seems to have
+active developers, that it is simple enough that Git developers can make changes
+to it, and that the project seems open to accepting external contributions (or
+that it is vendorable). "Partial" means that at least one of the above
+conditions holds.
+
+[[major-platform-support]]
+==== Major platform support
+
+At a bare minimum, unit-testing must work on Linux, MacOS, and Windows.
+
+In the comparison table below, "True" means that it works on all three major
+platforms with no issues. "Partial" means that there may be annoyances on one or
+more platforms, but it is still usable in principle.
+
+[[tap-support]]
+==== TAP support
+
+The https://testanything.org/[Test Anything Protocol] is a text-based interface
+that allows tests to communicate with a test harness. It is already used by
+Git's integration test suite. Supporting TAP output is a mandatory feature for
+any prospective test framework.
+
+In the comparison table below, "True" means this is natively supported.
+"Partial" means TAP output must be generated by post-processing the native
+output.
+
+Frameworks that do not have at least Partial support will not be evaluated
+further.
+
+[[diagnostic-output]]
+==== Diagnostic output
+
+When a test case fails, the framework must generate enough diagnostic output to
+help developers find the appropriate test case in source code in order to debug
+the failure.
+
+[[runtime-skippable-tests]]
+==== Runtime-skippable tests
+
+Test authors may wish to skip certain test cases based on runtime circumstances,
+so the framework should support this.
+
+[[parallel-execution]]
+==== Parallel execution
+
+Ideally, we will build up a significant collection of unit test cases, most
+likely split across multiple executables. It will be necessary to run these
+tests in parallel to enable fast develop-test-debug cycles.
+
+In the comparison table below, "True" means that individual test cases within a
+single test executable can be run in parallel. We assume that executable-level
+parallelism can be handled by the test harness.
+
+[[mock-support]]
+==== Mock support
+
+Unit test authors may wish to test code that interacts with objects that may be
+inconvenient to handle in a test (e.g. interacting with a network service).
+Mocking allows test authors to provide a fake implementation of these objects
+for more convenient tests.
+
+[[signal-error-handling]]
+==== Signal & error handling
+
+The test framework should fail gracefully when test cases are themselves buggy
+or when they are interrupted by signals during runtime.
+
+[[project-kloc]]
+==== Project KLOC
+
+The size of the project, in thousands of lines of code as measured by
+https://dwheeler.com/sloccount/[sloccount] (rounded up to the next multiple of
+1,000). As a tie-breaker, we probably prefer a project with fewer LOC.
+
+[[adoption]]
+==== Adoption
+
+As a tie-breaker, we prefer a more widely-used project. We use the number of
+GitHub / GitLab stars to estimate this.
+
+
+=== Comparison
+
+[format="csv",options="header",width="33%"]
+|=====
+Framework,"<<license,License>>","<<vendorable-or-ubiquitous,Vendorable or ubiquitous>>","<<maintainable-extensible,Maintainable / extensible>>","<<major-platform-support,Major platform support>>","<<tap-support,TAP support>>","<<diagnostic-output,Diagnostic output>>","<<runtime--skippable-tests,Runtime- skippable tests>>","<<parallel-execution,Parallel execution>>","<<mock-support,Mock support>>","<<signal-error-handling,Signal & error handling>>","<<project-kloc,Project KLOC>>","<<adoption,Adoption>>"
+https://lore.kernel.org/git/c902a166-98ce-afba-93f2-ea6027557176@gmail.com/[Custom Git impl.],[lime-background]#GPL v2#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[red-background]#False#,[red-background]#False#,[red-background]#False#,1,0
+https://github.com/silentbicycle/greatest[Greatest],[lime-background]#ISC#,[lime-background]#True#,[yellow-background]#Partial#,[lime-background]#True#,[yellow-background]#Partial#,[lime-background]#True#,[lime-background]#True#,[red-background]#False#,[red-background]#False#,[red-background]#False#,3,1400
+https://github.com/Snaipe/Criterion[Criterion],[lime-background]#MIT#,[red-background]#False#,[yellow-background]#Partial#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[red-background]#False#,[lime-background]#True#,19,1800
+https://github.com/rra/c-tap-harness/[C TAP],[lime-background]#Expat#,[lime-background]#True#,[yellow-background]#Partial#,[yellow-background]#Partial#,[lime-background]#True#,[red-background]#False#,[lime-background]#True#,[red-background]#False#,[red-background]#False#,[red-background]#False#,4,33
+https://libcheck.github.io/check/[Check],[lime-background]#LGPL v2.1#,[red-background]#False#,[yellow-background]#Partial#,[lime-background]#True#,[lime-background]#True#,[lime-background]#True#,[red-background]#False#,[red-background]#False#,[red-background]#False#,[lime-background]#True#,17,973
+|=====
+
+==== Alternatives considered
+
+Several suggested frameworks have been eliminated from consideration:
+
+* Incompatible licenses:
+** https://github.com/zorgnax/libtap[libtap] (LGPL v3)
+** https://cmocka.org/[cmocka] (Apache 2.0)
+* Missing source: https://www.kindahl.net/mytap/doc/index.html[MyTap]
+* No TAP support:
+** https://nemequ.github.io/munit/[µnit]
+** https://github.com/google/cmockery[cmockery]
+** https://github.com/lpabon/cmockery2[cmockery2]
+** https://github.com/ThrowTheSwitch/Unity[Unity]
+** https://github.com/siu/minunit[minunit]
+** https://cunit.sourceforge.net/[CUnit]
+
+==== Suggested framework
+
+Considering the feature priorities and comparison listed above, a custom
+framework seems to be the best option.
+
+
+== Choosing a test harness
+
+During upstream discussion, it was occasionally noted that `prove` provides many
+convenient features. While we already support the use of `prove` as a test
+harness for the shell tests, it is not strictly required.
+
+IMPORTANT: It is an open question whether or not we wish to rely on `prove` as a
+strict dependency for running unit tests.
+
+
+== Milestones
+
+* Get upstream agreement on implementing a custom test framework
+* Determine if it's OK to rely on `prove` for running unit tests
+* Add useful tests of library-like code
+* Integrate with Makefile
+* Integrate with CI
+* Integrate with
+  https://lore.kernel.org/git/20230502211454.1673000-1-calvinwan@google.com/[stdlib
+  work]
+* Run alongside regular `make test` target

[v5] unit tests: Add a project plan document

Commit Message

Comments

Patch