diff mbox

linux-user: call cpu_copy under clone_lock

Message ID 20180330133525.10994-1-jcmvbkbc@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Max Filippov March 30, 2018, 1:35 p.m. UTC
cpu_copy adds newly created CPU object to container/machine/unattached,
but does it w/o proper locking. As a result when multiple threads are
created rapidly QEMU may abort with the following message:

  GLib-CRITICAL **: g_hash_table_iter_next: assertion
  'ri->version == ri->hash_table->version' failed

  ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component:
  code should not be reached

Move cpu_copy invocation under clone_lock to fix that.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 linux-user/syscall.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Laurent Vivier March 30, 2018, 2 p.m. UTC | #1
Le 30/03/2018 à 15:35, Max Filippov a écrit :
> cpu_copy adds newly created CPU object to container/machine/unattached,
> but does it w/o proper locking. As a result when multiple threads are
> created rapidly QEMU may abort with the following message:
> 
>   GLib-CRITICAL **: g_hash_table_iter_next: assertion
>   'ri->version == ri->hash_table->version' failed
> 
>   ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component:
>   code should not be reached

Also reported in https://bugs.launchpad.net/qemu/+bug/1756519

> Move cpu_copy invocation under clone_lock to fix that.
> 
> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
> ---
>  linux-user/syscall.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 889abbda1e65..18ea79140f16 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6346,6 +6346,10 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
>  
>          ts = g_new0(TaskState, 1);
>          init_task_state(ts);
> +
> +        /* Grab a mutex so that thread setup appears atomic.  */
> +        pthread_mutex_lock(&clone_lock);
> +
>          /* we create a new CPU instance. */
>          new_env = cpu_copy(env);
>          /* Init regs that differ from the parent.  */
> @@ -6364,9 +6368,6 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
>              cpu_set_tls (new_env, newtls);
>          }
>  
> -        /* Grab a mutex so that thread setup appears atomic.  */
> -        pthread_mutex_lock(&clone_lock);
> -
>          memset(&info, 0, sizeof(info));
>          pthread_mutex_init(&info.mutex, NULL);
>          pthread_mutex_lock(&info.mutex);
> 

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
no-reply@patchew.org March 31, 2018, 8:55 a.m. UTC | #2
Hi,

This series failed docker-quick@centos6 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180330133525.10994-1-jcmvbkbc@gmail.com
Subject: [Qemu-devel] [PATCH] linux-user: call cpu_copy under clone_lock

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
92f42e73b0 linux-user: call cpu_copy under clone_lock

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-1gu7zpou/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-1gu7zpou/src'
  GEN     /var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138/qemu.tar.vroot'...
done.
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138/qemu.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
    RUN test-quick in qemu:centos6 
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
bzip2-devel-1.0.5-7.el6_0.x86_64
ccache-3.1.6-2.el6.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el6.x86_64
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
gettext-0.17-18.el6.x86_64
git-1.7.1-9.el6_9.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libepoxy-devel-1.2-3.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
librdmacm-devel-1.0.21-0.el6.x86_64
lzo-devel-2.03-3.1.el6_5.1.x86_64
make-3.81-23.el6.x86_64
mesa-libEGL-devel-11.0.7-4.el6.x86_64
mesa-libgbm-devel-11.0.7-4.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
spice-glib-devel-0.26-8.el6.x86_64
spice-server-devel-0.12.4-16.el6.x86_64
tar-1.23-15.el6_8.x86_64
vte-devel-0.25.1-9.el6.x86_64
xen-devel-4.6.6-2.el6.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=bison     bzip2-devel     ccache     csnappy-devel     flex     g++     gcc     gettext     git     glib2-devel     libepoxy-devel     libfdt-devel     librdmacm-devel     lzo-devel     make     mesa-libEGL-devel     mesa-libgbm-devel     pixman-devel     SDL-devel     spice-glib-devel     spice-server-devel     tar     vte-devel     xen-devel     zlib-devel
HOSTNAME=53ead536cc4a
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

/var/tmp/qemu/run: line 52: cd: /tmp/qemu-test/src/tests/docker: No such file or directory
/var/tmp/qemu/run: line 57: /test-quick: No such file or directory
/var/tmp/qemu/run: line 57: exec: /test-quick: cannot execute: No such file or directory
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=3958b04c34c111e8aedb52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-1gu7zpou/src/docker-src.2018-03-31-04.54.53.27138:/var/tmp/qemu:z,ro', 'qemu:centos6', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 126
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-1gu7zpou/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-quick@centos6] Error 2

real	0m31.842s
user	0m9.193s
sys	0m6.370s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Alex Bennée April 3, 2018, 4:26 p.m. UTC | #3
Max Filippov <jcmvbkbc@gmail.com> writes:

> cpu_copy adds newly created CPU object to container/machine/unattached,
> but does it w/o proper locking. As a result when multiple threads are
> created rapidly QEMU may abort with the following message:
>
>   GLib-CRITICAL **: g_hash_table_iter_next: assertion
>   'ri->version == ri->hash_table->version' failed
>
>   ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component:
>   code should not be reached
>
> Move cpu_copy invocation under clone_lock to fix that.

So my main concern is: are we duplicating something that is already
handled (or should be handled) by fork_start/fork_end?

This serialises forks and ensures things like the cpu_list (which ~ a
thread list for linux-user) are updated safely.

>
> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
> ---
>  linux-user/syscall.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 889abbda1e65..18ea79140f16 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6346,6 +6346,10 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
>
>          ts = g_new0(TaskState, 1);
>          init_task_state(ts);
> +
> +        /* Grab a mutex so that thread setup appears atomic.  */
> +        pthread_mutex_lock(&clone_lock);
> +
>          /* we create a new CPU instance. */
>          new_env = cpu_copy(env);
>          /* Init regs that differ from the parent.  */
> @@ -6364,9 +6368,6 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
>              cpu_set_tls (new_env, newtls);
>          }
>
> -        /* Grab a mutex so that thread setup appears atomic.  */
> -        pthread_mutex_lock(&clone_lock);
> -
>          memset(&info, 0, sizeof(info));
>          pthread_mutex_init(&info.mutex, NULL);
>          pthread_mutex_lock(&info.mutex);


--
Alex Bennée
Max Filippov April 3, 2018, 5:11 p.m. UTC | #4
Hi Alex,

On Tue, Apr 3, 2018 at 9:26 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
> Max Filippov <jcmvbkbc@gmail.com> writes:
>
>> cpu_copy adds newly created CPU object to container/machine/unattached,
>> but does it w/o proper locking. As a result when multiple threads are
>> created rapidly QEMU may abort with the following message:
>>
>>   GLib-CRITICAL **: g_hash_table_iter_next: assertion
>>   'ri->version == ri->hash_table->version' failed
>>
>>   ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component:
>>   code should not be reached
>>
>> Move cpu_copy invocation under clone_lock to fix that.
>
> So my main concern is are we duplicating something already (should be?)
> handled by fork_start/fork_end?

clone_lock already exists; it protects state during thread creation.
It just didn't protect enough of it.

The work done by fork_start/fork_end appears to be heavier than
what's needed for thread creation, because fork_start stops all
other CPUs (to make sure that the child process won't inherit locks
owned by threads that no longer exist in it). That is not required
for thread creation, hence thread creation uses clone_lock.

> This serialises forks and ensures things like the cpu_list (which ~ a
> thread list for linux-user) are updated safely.
Alex Bennée April 4, 2018, 10:27 a.m. UTC | #5
Max Filippov <jcmvbkbc@gmail.com> writes:

> Hi Alex,
>
> On Tue, Apr 3, 2018 at 9:26 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>> Max Filippov <jcmvbkbc@gmail.com> writes:
>>
>>> cpu_copy adds newly created CPU object to container/machine/unattached,
>>> but does it w/o proper locking. As a result when multiple threads are
>>> created rapidly QEMU may abort with the following message:
>>>
>>>   GLib-CRITICAL **: g_hash_table_iter_next: assertion
>>>   'ri->version == ri->hash_table->version' failed
>>>
>>>   ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component:
>>>   code should not be reached
>>>
>>> Move cpu_copy invocation under clone_lock to fix that.
>>
>> So my main concern is are we duplicating something already (should be?)
>> handled by fork_start/fork_end?
>
> clone_lock already exists, it protects state in case of thread creation,
> it just didn't protect enough of it.
>
> The work done by fork_start/fork_end appears to be heavier than
> what's needed for thread creation, because fork_start stops all
> other CPUs (to make sure that child process won't get locks owned
> by threads that no longer exist in the child process), which is not
> required for thread creation, hence thread creation uses clone_lock.

I'm wondering if it should be doing more. After all start/end_exclusive
rely on the cpu list and that isn't updated on thread creation - and
without that a bunch of other things fail like ld/st exclusive after
your first new thread is spawned.

This really needs some test cases to check.

So while I think clone_lock fixes this immediate problem I suspect there
is more to do for this case.

>
>> This serialises forks and ensures things like the cpu_list (which ~ a
>> thread list for linux-user) are updated safely.


--
Alex Bennée
Peter Maydell April 4, 2018, 11:20 a.m. UTC | #6
On 4 April 2018 at 11:27, Alex Bennée <alex.bennee@linaro.org> wrote:
> I'm wondering if it should be doing more. After all start/end_exclusive
> rely on the cpu list and that isn't updated on thread creation - and
> without that a bunch of other things fail like ld/st exclusive after
> your first new thread is spawned.

I think that is handled because creating a new thread calls
cpu_copy(), which creates a new CPU object, which adds itself
to the CPU list in its realize function (same as for system
emulation new CPU objects).

thanks
-- PMM
Alex Bennée April 4, 2018, 12:52 p.m. UTC | #7
Peter Maydell <peter.maydell@linaro.org> writes:

> On 4 April 2018 at 11:27, Alex Bennée <alex.bennee@linaro.org> wrote:
>> I'm wondering if it should be doing more. After all start/end_exclusive
>> rely on the cpu list and that isn't updated on thread creation - and
>> without that a bunch of other things fail like ld/st exclusive after
>> your first new thread is spawned.
>
> I think that is handled because creating a new thread calls
> cpu_copy(), which creates a new CPU object, which adds itself
> to the CPU list in its realize function (same as for system
> emulation new CPU objects).

Ahh that makes sense - I missed it hidden behind the realize.

--
Alex Bennée
diff mbox

Patch

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 889abbda1e65..18ea79140f16 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6346,6 +6346,10 @@  static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
 
         ts = g_new0(TaskState, 1);
         init_task_state(ts);
+
+        /* Grab a mutex so that thread setup appears atomic.  */
+        pthread_mutex_lock(&clone_lock);
+
         /* we create a new CPU instance. */
         new_env = cpu_copy(env);
         /* Init regs that differ from the parent.  */
@@ -6364,9 +6368,6 @@  static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
             cpu_set_tls (new_env, newtls);
         }
 
-        /* Grab a mutex so that thread setup appears atomic.  */
-        pthread_mutex_lock(&clone_lock);
-
         memset(&info, 0, sizeof(info));
         pthread_mutex_init(&info.mutex, NULL);
         pthread_mutex_lock(&info.mutex);