Message ID | 1560173684-6264-1-git-send-email-lizhengui@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | file-posix: unlock qemu_global_mutex before pread when attach disk | expand |
On 10/06/19 15:34, Zhengui li wrote: > > when do qmp sush as drive_add, qemu main thread locks the > qemu_global_mutex and do pread in raw_probe_alignmen. Pread is a > synchronous operation. If backend storage network has a large delay > or IO pressure is too large, the pread operation will not return for > a long time, which make vcpu thread can't acquire qemu_global_mutex > for a long time and make the vcpu thread unable to be scheduled for a > long time. So virtual machine cpu soft lockup happened. > > qemu main thread should not hold qemu_global_mutex for a long time > when do qmp that involving IO synchronous operation sush pread , > ioctl, etc. So this patch unlock qemu_global_mutex before IO > synchronous operation sush pread. These preads are for 512-4096 bytes, can they really last much longer than the "open" that precedes them? If pread of 4K can trigger a soft lockup, things are really screwed up---and it's hard to be sure that all callers of raw_probe_alignment are okay with releasing the global mutex. Paolo
Patchew URL: https://patchew.org/QEMU/1560173684-6264-1-git-send-email-lizhengui@huawei.com/ Hi, This series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [PATCH] file-posix: unlock qemu_global_mutex before pread when attach disk Type: series Message-id: 1560173684-6264-1-git-send-email-lizhengui@huawei.com === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === From https://github.com/patchew-project/qemu * [new tag] patchew/1560173684-6264-1-git-send-email-lizhengui@huawei.com -> patchew/1560173684-6264-1-git-send-email-lizhengui@huawei.com Switched to a new branch 'test' a5549055a3 file-posix: unlock qemu_global_mutex before pread when attach disk === OUTPUT BEGIN === ERROR: Missing Signed-off-by: line(s) total: 1 errors, 0 warnings, 9 lines checked Commit a5549055a30d (file-posix: unlock qemu_global_mutex before pread when attach disk) has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/1560173684-6264-1-git-send-email-lizhengui@huawei.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
The pread will hang in attaching disk just when backend storage network disconnection . I think the locking range of qemu_global_mutex is too large when do qmp operation. what does the qemu_global_mutex really protect? what is the risk of unlocking qemu_global_mutex in qmp? On 2019/6/10 21:51, Paolo Bonzini wrote: > On 10/06/19 15:34, Zhengui li wrote: >> when do qmp sush as drive_add, qemu main thread locks the >> qemu_global_mutex and do pread in raw_probe_alignmen. Pread is a >> synchronous operation. If backend storage network has a large delay >> or IO pressure is too large, the pread operation will not return for >> a long time, which make vcpu thread can't acquire qemu_global_mutex >> for a long time and make the vcpu thread unable to be scheduled for a >> long time. So virtual machine cpu soft lockup happened. >> >> qemu main thread should not hold qemu_global_mutex for a long time >> when do qmp that involving IO synchronous operation sush pread , >> ioctl, etc. So this patch unlock qemu_global_mutex before IO >> synchronous operation sush pread. > These preads are for 512-4096 bytes, can they really last much longer > than the "open" that precedes them? If pread of 4K can trigger a soft > lockup, things are really screwed up---and it's hard to be sure that all > callers of raw_probe_alignment are okay with releasing the global mutex. > > Paolo > > . > null
On 10/06/19 16:51, l00284672 wrote: > The pread will hang in attaching disk just when backend storage network > disconnection . Would the "open" hang as well in that case? > I think the locking range of qemu_global_mutex is too large when do qmp > operation. what > does the qemu_global_mutex really protect? Basically everything. > what is the risk of unlocking qemu_global_mutex in qmp? It's not QMP; it's specifically the code that calls raw_probe_alignment. Paolo
-- Would the "open" hang as well in that case? The "open" doesn't hang in that case. Do you have any better solutions to solve this problem in the case? On 2019/6/11 0:03, Paolo Bonzini wrote: > On 10/06/19 16:51, l00284672 wrote: >> The pread will hang in attaching disk just when backend storage network >> disconnection . > Would the "open" hang as well in that case? >> I think the locking range of qemu_global_mutex is too large when do qmp >> operation. what >> does the qemu_global_mutex really protect? > Basically everything. > >> what is the risk of unlocking qemu_global_mutex in qmp? > It's not QMP; it's specifically the code that calls raw_probe_alignment. > > Paolo > > . > null
Am 11.06.2019 um 04:53 hat l00284672 geschrieben: > -- Would the "open" hang as well in that case? > The "open" doesn't hang in that case. > > Do you have any better solutions to solve this problem in the case? Yes, but unfortunately it's a lot harder. This is roughly what you'd have to do: 1. Make QMP command handlers async (patches from Marc-André are on the list) 2. Stop using HMP drive_add and instead switch to QMP blockdev-add 3. Move the blockdev-add code into a coroutine 4. Make .bdrv_open a coroutine_fn 5. Move the pread() in file-posix to the thread pool and let the coroutine yield while the request is running Only with all of these pieces in place we'll be able to release the global mutex while we're waiting for the pread() to complete. Kevin
diff --git a/block/file-posix.c b/block/file-posix.c index 1cf4ee4..192c779 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -297,7 +297,9 @@ static int probe_physical_blocksize(int fd, unsigned int *blk_size) */ static bool raw_is_io_aligned(int fd, void *buf, size_t len) { + qemu_mutex_unlock_iothread(); ssize_t ret = pread(fd, buf, len, 0); + qemu_mutex_lock_iothread(); if (ret >= 0) { return true;