mbox series

[00/19] Add Thread Sanitizer support to QEMU

Message ID 20200522160755.886-1-robert.foley@linaro.org (mailing list archive)
Headers show
Series Add Thread Sanitizer support to QEMU | expand

Message

Robert Foley May 22, 2020, 4:07 p.m. UTC
This patch series continues the work done by Emilio Cota and others to add
Thread Sanitizer (TSan) support to QEMU.

The starting point for this work was Emilio's branch here:
https://github.com/cota/qemu/commits/tsan
specifically this commit: 0be125fc0afd47218b34d2019abdd19b644f3199

The purpose of this patch is not to fix all the TSan warnings, but to enable
the TSan support so that QEMU developers can start using the tool.  
We found this tool useful and even ran it on our recent changes in
the cpu-locks series.
Clearly there is work to do here to clean up all the warnings. :)  
We have made a start to cleaning up these warnings by getting a VM to boot 
cleanly with no TSan warnings.  
We have also made an effort to introduce enough of the TSan suppression
mechanisms, so that others can continue this work.

This series adds support for:
- configure option for --enable-tsan.
- testing.rst has the full details on how to use TSan with docker
  and also outside of docker.
- Docker builds with TSan.
  - We added an Ubuntu 20.04 docker that supports TSan builds.
  - Something like this will build TSan
    make docker-test-build@ubuntu2004 DEBUG=1 TSAN=1
  - Testing with TSan is also supported with docker,
    although, be forwarned that test-quick currently fails.  
    See "Issues" section below for the current failures.
    make docker-test-quick@ubuntu2004 DEBUG=1 TSAN=1
  - We recommend using the DEBUG=1 option and launching the test 
   (like test-quick) from inside the docker so that when the test is done,
    you can review the warnings from inside the docker.
  - testing.rst has the full details on how to use TSan with docker.
- We added a blacklist file for files/functions
  TSan should ignore at compile time.
- And added a suppression file for TSan to suppress certain warnings at
  run time.  
  We found both of these mechanisms are needed when suppressing warnings.
- It is also worth mentioning that we were able to suppress/fix enough errors
  to allow an Ubuntu 18.04 aarch64 VM to boot with zero TSan warnings.  
  When we started this effort, there were ~300 warnings reported by 
  TSan during the same VM boot !

Issues:
- When running docker-test-quick under TSan there are several tests which hang
  - The unit tests which seem to hang under TSan:
    test-char, test-qdev-global-props, and test-qga.  
  - If we comment out those tests, check-unit finishes, albeit with 
    a couple of warnings. :)


Emilio G. Cota (7):
  cpu: convert queued work to a QSIMPLEQ
  thread: add qemu_spin_destroy
  cputlb: destroy CPUTLB with tlb_destroy
  qht: call qemu_spin_destroy for head buckets
  tcg: call qemu_spin_destroy for tb->jmp_lock
  translate-all: call qemu_spin_destroy for PageDesc
  thread: add tsan annotations to QemuSpin

Lingfeng Yang (1):
  configure: add --enable-tsan flag + fiber annotations for
    coroutine-ucontext

Robert Foley (11):
  tests/docker: Added docker build support for TSan.
  include/qemu: Added tsan.h for annotations.
  accel/tcg: Fixed tsan warnings related to parallel_cpus
  configure: added tsan support for blacklist.
  accel/tcg: Fixed tsan warnings.
  util/async: Fixed tsan warnings
  qht: Fix tsan warnings.
  util: fixed tsan warnings in thread_pool.c
  util: Added tsan annotate for thread name.
  target/arm: Fix tsan warning in cpu.c
  docs: Added details on TSan to testing.rst

 accel/tcg/cpu-exec.c                       |  4 +-
 accel/tcg/cputlb.c                         | 15 ++++
 accel/tcg/tcg-all.c                        |  4 +-
 accel/tcg/tcg-runtime.c                    |  7 +-
 accel/tcg/translate-all.c                  | 25 +++++-
 configure                                  | 40 +++++++++
 cpus-common.c                              | 25 ++----
 cpus.c                                     | 16 +++-
 docs/devel/testing.rst                     | 72 ++++++++++++++++
 exec.c                                     |  1 +
 hw/core/cpu.c                              |  3 +-
 include/exec/exec-all.h                    | 10 ++-
 include/hw/core/cpu.h                      |  6 +-
 include/qemu/thread.h                      | 38 ++++++++-
 include/qemu/tsan.h                        | 48 +++++++++++
 include/tcg/tcg.h                          |  3 +-
 linux-user/syscall.c                       |  4 +-
 target/arm/cpu.c                           |  2 +-
 tcg/tcg.c                                  | 19 ++++-
 tests/docker/Makefile.include              |  2 +
 tests/docker/common.rc                     | 19 +++++
 tests/docker/dockerfiles/ubuntu2004.docker | 65 +++++++++++++++
 tests/tsan/blacklist.tsan                  |  5 ++
 tests/tsan/suppressions.tsan               | 14 ++++
 util/async.c                               | 11 ++-
 util/coroutine-ucontext.c                  | 97 ++++++++++++++++++++--
 util/qemu-thread-posix.c                   |  2 +
 util/qht.c                                 |  4 +
 util/thread-pool.c                         |  5 +-
 29 files changed, 514 insertions(+), 52 deletions(-)
 create mode 100644 include/qemu/tsan.h
 create mode 100644 tests/docker/dockerfiles/ubuntu2004.docker
 create mode 100644 tests/tsan/blacklist.tsan
 create mode 100644 tests/tsan/suppressions.tsan

Comments

no-reply@patchew.org May 22, 2020, 9:07 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200522160755.886-1-robert.foley@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200522160755.886-1-robert.foley@linaro.org
Subject: [PATCH 00/19] Add Thread Sanitizer support to QEMU
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
1737663 docs: Added details on TSan to testing.rst
82aa460 target/arm: Fix tsan warning in cpu.c
4a3bd5a util: Added tsan annotate for thread name.
f2614bb util: fixed tsan warnings in thread_pool.c
56529c7 qht: Fix tsan warnings.
128f63c util/async: Fixed tsan warnings
f86c38c accel/tcg: Fixed tsan warnings.
0d7ee16 configure: added tsan support for blacklist.
cfb2d34 accel/tcg: Fixed tsan warnings related to parallel_cpus
1ef1ed2 include/qemu: Added tsan.h for annotations.
bd287e9 tests/docker: Added docker build support for TSan.
6baf0d3 thread: add tsan annotations to QemuSpin
bbf88d9 translate-all: call qemu_spin_destroy for PageDesc
5f0a213 tcg: call qemu_spin_destroy for tb->jmp_lock
fb19649 qht: call qemu_spin_destroy for head buckets
688ca64 cputlb: destroy CPUTLB with tlb_destroy
be8d1f8 thread: add qemu_spin_destroy
2a326b6 cpu: convert queued work to a QSIMPLEQ
7fb7830 configure: add --enable-tsan flag + fiber annotations for coroutine-ucontext

=== OUTPUT BEGIN ===
1/19 Checking commit 7fb7830797be (configure: add --enable-tsan flag + fiber annotations for coroutine-ucontext)
2/19 Checking commit 2a326b6f7215 (cpu: convert queued work to a QSIMPLEQ)
3/19 Checking commit be8d1f8ff517 (thread: add qemu_spin_destroy)
4/19 Checking commit 688ca64764bf (cputlb: destroy CPUTLB with tlb_destroy)
5/19 Checking commit fb19649f7025 (qht: call qemu_spin_destroy for head buckets)
6/19 Checking commit 5f0a21365e6e (tcg: call qemu_spin_destroy for tb->jmp_lock)
7/19 Checking commit bbf88d92575a (translate-all: call qemu_spin_destroy for PageDesc)
8/19 Checking commit 6baf0d37cfdf (thread: add tsan annotations to QemuSpin)
9/19 Checking commit bd287e96cf4a (tests/docker: Added docker build support for TSan.)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#74: 
new file mode 100644

total: 0 errors, 1 warnings, 118 lines checked

Patch 9/19 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
10/19 Checking commit 1ef1ed22be4b (include/qemu: Added tsan.h for annotations.)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

total: 0 errors, 1 warnings, 48 lines checked

Patch 10/19 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
11/19 Checking commit cfb2d343b6ed (accel/tcg: Fixed tsan warnings related to parallel_cpus)
12/19 Checking commit 0d7ee16ffe83 (configure: added tsan support for blacklist.)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#28: 
new file mode 100644

total: 0 errors, 1 warnings, 14 lines checked

Patch 12/19 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
13/19 Checking commit f86c38c6aa87 (accel/tcg: Fixed tsan warnings.)
14/19 Checking commit 128f63c37b76 (util/async: Fixed tsan warnings)
15/19 Checking commit 56529c7ff837 (qht: Fix tsan warnings.)
16/19 Checking commit f2614bbafdb4 (util: fixed tsan warnings in thread_pool.c)
17/19 Checking commit 4a3bd5a64414 (util: Added tsan annotate for thread name.)
18/19 Checking commit 82aa460722b3 (target/arm: Fix tsan warning in cpu.c)
19/19 Checking commit 17376630a146 (docs: Added details on TSan to testing.rst)
ERROR: trailing whitespace
#34: FILE: docs/devel/testing.rst:413:
+  $

ERROR: trailing whitespace
#40: FILE: docs/devel/testing.rst:419:
+the files with TSan warnings.  $

total: 2 errors, 0 warnings, 78 lines checked

Patch 19/19 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200522160755.886-1-robert.foley@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Emilio Cota May 23, 2020, 9:36 p.m. UTC | #2
On Fri, May 22, 2020 at 12:07:36 -0400, Robert Foley wrote:
> This patch series continues the work done by Emilio Cota and others to add
> Thread Sanitizer (TSan) support to QEMU.
> 
> The starting point for this work was Emilio's branch here:
> https://github.com/cota/qemu/commits/tsan
> specifically this commit: 0be125fc0afd47218b34d2019abdd19b644f3199
> 
> The purpose of this patch is not to fix all the TSan warnings, but to enable
> the TSan support so that QEMU developers can start using the tool.  
> We found this tool useful and even ran it on our recent changes in
> the cpu-locks series.
> Clearly there is work to do here to clean up all the warnings. :)  
> We have made a start to cleaning up these warnings by getting a VM to boot 
> cleanly with no TSan warnings.  
> We have also made an effort to introduce enough of the TSan suppression
> mechanisms, so that others can continue this work.

Thank you for doing this work! Great to see this finally coming along.

What are your plans wrt the per-cpu-lock branch? Given that some of
the races reported here are fixed there, I think it would make sense to
defer those patches to the per-cpu-lock series (i.e. patches 2/19, parts
of 13/19, and 18/19) so that this series goes in first (it is a lot
less likely to break anything).

Also, I would not worry too much about rushing to bring warnings to
0; what's important is that with the warnings we now know where to
focus on, and then we can carefully fix races. In particular I think
all annotations we add must have a comment, since otherwise we are
at the risk of blindlessly silencing warnings of real races.

> Issues:
> - When running docker-test-quick under TSan there are several tests which hang
>   - The unit tests which seem to hang under TSan:
>     test-char, test-qdev-global-props, and test-qga.  
>   - If we comment out those tests, check-unit finishes, albeit with 
>     a couple of warnings. :)

I've noticed another issue (that I did not notice on my previous
machine), which is that tsan blows up when in qht we lock all
of the bucket's locks. We then get this assert from tsan, since
it has a static limit of 64 mutexes held at any given time:

FATAL: ThreadSanitizer CHECK failed: /build/llvm-toolchain-10-yegZYJ/llvm-toolchain-10-10.0.0/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)
    #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) <null> (qemu-system-aarch64+0x49f9f5)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) <null> (qemu-system-aarch64+0x4b651f)
    #2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned long, unsigned int) <null> (qemu-system-aarch64+0x4aacbc)
    #3 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, __sanitizer::DDMutex*, bool, bool) <null> (qemu-system-aarch64+0x4aa22e)
    #4 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int, int) <null> (qemu-system-aarch64+0x49ded8)
    #5 __tsan_mutex_post_lock <null> (qemu-system-aarch64+0x48175a)
    #6 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:245:5 (qemu-system-aarch64+0xb1283b)
    #7 qht_map_lock_buckets /home/cota/src/qemu/util/qht.c:253:9 (qemu-system-aarch64+0xb1283b)

A workaround for now is to run qemu with TSAN_OPTIONS=detect_deadlocks=0,
although selectively disabling tsan for qht_map_lock/unlock_buckets would
be ideal (not sure if it's possible).

Another warning I'm reliably seen is:
WARNING: ThreadSanitizer: double lock of a mutex (pid=489006)
    #0 pthread_mutex_lock <null> (qemu-system-aarch64+0x457596)
    #1 qemu_mutex_lock_impl /home/cota/src/qemu/util/qemu-thread-posix.c:79:11 (qemu-system-aarch64+0xaf7e3c)

  Location is heap block of size 328 at 0x7b4800114900 allocated by main thread:
    #0 calloc <null> (qemu-system-aarch64+0x438b80)
    #1 g_malloc0 <null> (libglib-2.0.so.0+0x57d30)

  Mutex M57 (0x7b4800114960) created at:
    #0 pthread_mutex_init <null> (qemu-system-aarch64+0x43b74d)
    #1 qemu_rec_mutex_init /home/cota/src/qemu/util/qemu-thread-posix.c:119:11 (qemu-system-aarch64+0xaf815b)

But this one seems safe to ignore.

Thanks,
		E.
Robert Foley May 26, 2020, 3:18 p.m. UTC | #3
On Sat, 23 May 2020 at 17:36, Emilio G. Cota <cota@braap.org> wrote:
>
> On Fri, May 22, 2020 at 12:07:36 -0400, Robert Foley wrote:
> > This patch series continues the work done by Emilio Cota and others to add
> > Thread Sanitizer (TSan) support to QEMU.
> >
> > The starting point for this work was Emilio's branch here:
> > https://github.com/cota/qemu/commits/tsan
> > specifically this commit: 0be125fc0afd47218b34d2019abdd19b644f3199
> >
> > The purpose of this patch is not to fix all the TSan warnings, but to enable
> > the TSan support so that QEMU developers can start using the tool.
> > We found this tool useful and even ran it on our recent changes in
> > the cpu-locks series.
> > Clearly there is work to do here to clean up all the warnings. :)
> > We have made a start to cleaning up these warnings by getting a VM to boot
> > cleanly with no TSan warnings.
> > We have also made an effort to introduce enough of the TSan suppression
> > mechanisms, so that others can continue this work.
>
> Thank you for doing this work! Great to see this finally coming along.
>
> What are your plans wrt the per-cpu-lock branch? Given that some of
> the races reported here are fixed there, I think it would make sense to
> defer those patches to the per-cpu-lock series (i.e. patches 2/19, parts
> of 13/19, and 18/19) so that this series goes in first (it is a lot
> less likely to break anything).

I agree with you that we should defer my patches which overlap with
the per-cpu-lock patch series.

>
> Also, I would not worry too much about rushing to bring warnings to
> 0; what's important is that with the warnings we now know where to
> focus on, and then we can carefully fix races. In particular I think
> all annotations we add must have a comment, since otherwise we are
> at the risk of blindlessly silencing warnings of real races.

In order to re-focus the series a bit, we are planning to drop the
annotations from the series.  This allows for a bit more focus on
enabling TSan and less on bringing the warnings to 0 as you mentioned.

At the same time, we're also going to add in a bit more details to
testing.rst on how to use the various suppression mechanisms,
annotations, blacklist and suppressions.txt.

>
> > Issues:
> > - When running docker-test-quick under TSan there are several tests which hang
> >   - The unit tests which seem to hang under TSan:
> >     test-char, test-qdev-global-props, and test-qga.
> >   - If we comment out those tests, check-unit finishes, albeit with
> >     a couple of warnings. :)
>
> I've noticed another issue (that I did not notice on my previous
> machine), which is that tsan blows up when in qht we lock all
> of the bucket's locks. We then get this assert from tsan, since
> it has a static limit of 64 mutexes held at any given time:
>
> FATAL: ThreadSanitizer CHECK failed: /build/llvm-toolchain-10-yegZYJ/llvm-toolchain-10-10.0.0/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)
<snip>

> A workaround for now is to run qemu with TSAN_OPTIONS=detect_deadlocks=0,
> although selectively disabling tsan for qht_map_lock/unlock_buckets would
> be ideal (not sure if it's possible).

We have been using detect_deadlocks=0 to avoid this.
We will see if it is possible to disable tsan for just
qht_map_lock/unlock_buckets

>
> Another warning I'm reliably seen is:
> WARNING: ThreadSanitizer: double lock of a mutex (pid=489006)
>     #0 pthread_mutex_lock <null> (qemu-system-aarch64+0x457596)
>     #1 qemu_mutex_lock_impl /home/cota/src/qemu/util/qemu-thread-posix.c:79:11 (qemu-system-aarch64+0xaf7e3c)
>
>   Location is heap block of size 328 at 0x7b4800114900 allocated by main thread:
>     #0 calloc <null> (qemu-system-aarch64+0x438b80)
>     #1 g_malloc0 <null> (libglib-2.0.so.0+0x57d30)
>
>   Mutex M57 (0x7b4800114960) created at:
>     #0 pthread_mutex_init <null> (qemu-system-aarch64+0x43b74d)
>     #1 qemu_rec_mutex_init /home/cota/src/qemu/util/qemu-thread-posix.c:119:11 (qemu-system-aarch64+0xaf815b)
>
> But this one seems safe to ignore.
>

This one we disabled in the suppressions.txt file.
TSan reports a double lock even when the mutex is flagged as recursive.


Thanks & Regards,
-Rob
> Thanks,
>                 E.