ci: add address and undefined sanitizer tasks

Message ID	xmqq8rlo62ih.fsf@gitster.g (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> From: Junio C Hamano <gitster@pobox.com> To: git@vger.kernel.org Subject: [PATCH] ci: add address and undefined sanitizer tasks Date: Sun, 09 Oct 2022 15:44:22 -0700 Message-ID: <xmqq8rlo62ih.fsf@gitster.g> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk
Series	ci: add address and undefined sanitizer tasks \| expand ci: add address and undefined sanitizer tasks

Junio C Hamano Oct. 9, 2022, 10:44 p.m. UTC

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 * I've been running my local post-integration-pre-pushout tests of
   'seen' with these two sanitizer tests, which has saved me from a
   few potential embarrassments early.  As it takes a lot extra time
   to run these locally, I am aiming to burden contributors who run
   their due diligence "before sending the patch" checks using the
   GitHub Actions CI ;-).

   The way the patch adds jobs to CI just imitates how -leaks one is
   defined.

 .github/workflows/main.yml | 6 ++++++
 ci/lib.sh                  | 3 +++
 2 files changed, 9 insertions(+)

Junio C Hamano Oct. 10, 2022, 11:25 p.m. UTC | #1

Junio C Hamano <gitster@pobox.com> writes:

> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  * I've been running my local post-integration-pre-pushout tests of
>    'seen' with these two sanitizer tests, which has saved me from a
>    few potential embarrassments early.  As it takes a lot extra time
>    to run these locally, I am aiming to burden contributors who run
>    their due diligence "before sending the patch" checks using the
>    GitHub Actions CI ;-).
>
>    The way the patch adds jobs to CI just imitates how -leaks one is
>    defined.
>
>  .github/workflows/main.yml | 6 ++++++
>  ci/lib.sh                  | 3 +++
>  2 files changed, 9 insertions(+)

One downside is that the usual CI cycle for a branch that takes a
bit shorter than 40 minutes seems to take between 50 to 60 minutes
(the primary culprit seems to be the address sanitizer).

Arguably, that 10 or 20 minutes are time saved from human
developers' time, so it might not be too bad, but I'll keep
it out of 'next' for now.

> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index 831f4df56c..2f80da7cfb 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -251,6 +251,12 @@ jobs:
>            - jobname: linux-leaks
>              cc: gcc
>              pool: ubuntu-latest
> +          - jobname: linux-address
> +            cc: gcc
> +            pool: ubuntu-latest
> +          - jobname: linux-undefined
> +            cc: gcc
> +            pool: ubuntu-latest
>      env:
>        CC: ${{matrix.vector.cc}}
>        CC_PACKAGE: ${{matrix.vector.cc_package}}
> diff --git a/ci/lib.sh b/ci/lib.sh
> index 1b0cc2b57d..678edd5abb 100755
> --- a/ci/lib.sh
> +++ b/ci/lib.sh
> @@ -278,6 +278,9 @@ linux-leaks)
>  	export GIT_TEST_PASSING_SANITIZE_LEAK=true
>  	export GIT_TEST_SANITIZE_LEAK_LOG=true
>  	;;
> +linux-address | linux-undefined)
> +	export SANITIZE=${jobname#linux-}
> +	;;
>  esac
>  
>  MAKEFLAGS="$MAKEFLAGS CC=${CC:-cc}"

Jeff King Oct. 11, 2022, midnight UTC | #2

On Mon, Oct 10, 2022 at 04:25:23PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  * I've been running my local post-integration-pre-pushout tests of
> >    'seen' with these two sanitizer tests, which has saved me from a
> >    few potential embarrassments early.  As it takes a lot extra time
> >    to run these locally, I am aiming to burden contributors who run
> >    their due diligence "before sending the patch" checks using the
> >    GitHub Actions CI ;-).
> >
> >    The way the patch adds jobs to CI just imitates how -leaks one is
> >    defined.
> 
> One downside is that the usual CI cycle for a branch that takes a
> bit shorter than 40 minutes seems to take between 50 to 60 minutes
> (the primary culprit seems to be the address sanitizer).
> 
> Arguably, that 10 or 20 minutes are time saved from human
> developers' time, so it might not be too bad, but I'll keep
> it out of 'next' for now.

You can make it a bit faster by running both at once as
SANITIZE=address,undefined.

Since we expect both to pass cleanly, there's really no other
complication; the user will either see errors (and correct them) or they
won't. The signal of "passed with asan, but not ubsan" (or vice versa)
is not that useful in practice.

In the long run, I hope that "leaks" can run in the same way, but we're
not there yet since there's a lot of selective test-running. In theory
the regular linux test run is not necessary with this job, but I think
the signal for "broke with/without sanitizers" is a little higher (and
anyway, we have to run the regular suite on a zillion other platforms,
too).

-Peff

Junio C Hamano Oct. 11, 2022, 12:09 a.m. UTC | #3

Jeff King <peff@peff.net> writes:

> The signal of "passed with asan, but not ubsan" (or vice versa)
> is not that useful in practice.

Ah, that may be true.  I'll try that in the next version.

Junio C Hamano Oct. 21, 2022, 5:59 a.m. UTC | #4

Jeff King <peff@peff.net> writes:

> You can make it a bit faster by running both at once as
> SANITIZE=address,undefined.

With a single combined sanitier job, we saw failures at CI of p4
tests, where the server "goes away" in the middle of tests, for
apparently no reason, and I just speculated perhaps the tests taking
too long may be causing an impatient server side to go away before
the client side finishes its work or something.  After splitting the
ASan and UBSan jobs into two and ejecting a topic out of 'seen', CI
started passing for the first time in several days.

So, here is what I ended up using, at least for now.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
Subject: [PATCH] ci: add address and undefined sanitizer tasks

The current code is clean with these two sanitizers, and we would
like to keep it that way by running the checks for any new code.

The signal of "passed with asan, but not ubsan" (or vice versa) is
not that useful in practice, so it is tempting to run both santizers
in a single task, but it seems to take forever, so tentatively let's
try having two separate ones.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .github/workflows/main.yml | 6 ++++++
 ci/lib.sh                  | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 831f4df56c..bd6f75b8e0 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -251,6 +251,12 @@ jobs:
           - jobname: linux-leaks
             cc: gcc
             pool: ubuntu-latest
+          - jobname: linux-asan
+            cc: gcc
+            pool: ubuntu-latest
+          - jobname: linux-ubsan
+            cc: gcc
+            pool: ubuntu-latest
     env:
       CC: ${{matrix.vector.cc}}
       CC_PACKAGE: ${{matrix.vector.cc_package}}
diff --git a/ci/lib.sh b/ci/lib.sh
index 1b0cc2b57d..e3d49d3296 100755
--- a/ci/lib.sh
+++ b/ci/lib.sh
@@ -278,6 +278,12 @@ linux-leaks)
 	export GIT_TEST_PASSING_SANITIZE_LEAK=true
 	export GIT_TEST_SANITIZE_LEAK_LOG=true
 	;;
+linux-asan)
+	export SANITIZE=address
+	;;
+linux-ubsan)
+	export SANITIZE=undefined
+	;;
 esac

 MAKEFLAGS="$MAKEFLAGS CC=${CC:-cc}"

Jeff King Oct. 21, 2022, 6:25 a.m. UTC | #5

On Thu, Oct 20, 2022 at 10:59:28PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > You can make it a bit faster by running both at once as
> > SANITIZE=address,undefined.
> 
> With a single combined sanitier job, we saw failures at CI of p4
> tests, where the server "goes away" in the middle of tests, for
> apparently no reason, and I just speculated perhaps the tests taking
> too long may be causing an impatient server side to go away before
> the client side finishes its work or something.  After splitting the
> ASan and UBSan jobs into two and ejecting a topic out of 'seen', CI
> started passing for the first time in several days.

I am not too surprised to hear that. We flushed out many test races in
the past from using valgrind and SANITIZE. But I don't run the p4 tests
locally, so this is likely some of their first exposure to slower runs.

But if that is the case, then you are probably just papering over
failures here which will likely come back, at least racily.

I admit I was surprised how much slower the combined one is. Here are a
few timings on my laptop:

  $ for i in '' address undefined address,undefined; do
      echo -n "SANITIZE=$i"
      make SANITIZE=$i >/dev/null 2>&1 &&
      (cd t && time ./t3700-add.sh >/dev/null 2>&1)
      echo
    done

  SANITIZE=
  real	0m1.109s
  user	0m0.533s
  sys	0m0.589s

  SANITIZE=address
  real	0m1.816s
  user	0m0.999s
  sys	0m0.806s

  SANITIZE=undefined
  real	0m1.336s
  user	0m0.618s
  sys	0m0.710s

  SANITIZE=address,undefined
  real	0m2.928s
  user	0m1.304s
  sys	0m1.635s

Curiously, with CC=clang the results are closer to what I'd expect:

  SANITIZE=
  real	0m1.186s
  user	0m0.608s
  sys	0m0.579s

  SANITIZE=address
  real	0m1.910s
  user	0m1.014s
  sys	0m0.890s

  SANITIZE=undefined
  real	0m1.477s
  user	0m0.611s
  sys	0m0.865s

  SANITIZE=address,undefined
  real	0m2.309s
  user	0m1.126s
  sys	0m1.191s

I'm not sure what the takeaway is. If using two jobs produces
appreciably fewer races in practice, it may be worth doing. But I wonder
if just using clang would work better.

-Peff

ci: add address and undefined sanitizer tasks

Commit Message

Comments

Patch