Message ID | 1560276131-683243-7-git-send-email-andrey.shinkevich@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow Valgrind checking all QEMU processes | expand |
11.06.2019 21:02, Andrey Shinkevich wrote: > Processes are dying harder under the Valgring. It results in counting > the dying process as a newborn one. Make it sure that old NBD job get > finished before starting a new one. > > Suggested-by: Roman Kagan <rkagan@virtuozzo.com> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > --- > tests/qemu-iotests/common.nbd | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd > index 25fc9ff..e3dcc60 100644 > --- a/tests/qemu-iotests/common.nbd > +++ b/tests/qemu-iotests/common.nbd > @@ -22,6 +22,7 @@ > nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock" > nbd_tcp_addr="127.0.0.1" > nbd_pid_file="${TEST_DIR}/qemu-nbd.pid" > +nbd_job_pid="" > > nbd_server_stop() > { > @@ -33,6 +34,9 @@ nbd_server_stop() > kill "$NBD_PID" > fi > fi Honestly, I don't understand the problem from commit message, but anyway comment should be added here, to mark that this is for valgrind... But you don't check for VALGRIND enabled.. I it intentional? > + if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then > + wait "$nbd_job_pid" > + fi > rm -f "$nbd_unix_socket" > } > > @@ -61,6 +65,7 @@ nbd_server_start_unix_socket() > { > nbd_server_stop > $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" & > + nbd_job_pid=$! > nbd_server_wait_for_unix_socket $! > } > > @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket() > { > nbd_server_stop > $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" & > + nbd_job_pid=$! > nbd_server_wait_for_tcp_socket $! > } >
On Tue, Jun 11, 2019 at 09:02:10PM +0300, Andrey Shinkevich wrote: > Processes are dying harder under the Valgring. It results in counting > the dying process as a newborn one. Make it sure that old NBD job get > finished before starting a new one. I think this log message is confusing. The problem this patch addresses is that nbd_server_stop only sends a signal to the nbd process and immediately returns, without waiting for it to actually terminate. The next operation is often starting a new instance of nbd; this races with the termination of the old one, and may result in various failures (like nbd_server_start_* taking the terminating nbd as the one just started, or the starting nbd encountering a busy listening socket). Without valgrind the race window is very small and the problem didn't surface for long. However, under valgrind process termination takes much longer so the race bites every test run. Since nbd is run in a background job of the test, record the nbd pid at the daemon start in a shell variable and perform a wait for it when terminating it. Roman. > Suggested-by: Roman Kagan <rkagan@virtuozzo.com> > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > --- > tests/qemu-iotests/common.nbd | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd > index 25fc9ff..e3dcc60 100644 > --- a/tests/qemu-iotests/common.nbd > +++ b/tests/qemu-iotests/common.nbd > @@ -22,6 +22,7 @@ > nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock" > nbd_tcp_addr="127.0.0.1" > nbd_pid_file="${TEST_DIR}/qemu-nbd.pid" > +nbd_job_pid="" > > nbd_server_stop() > { > @@ -33,6 +34,9 @@ nbd_server_stop() > kill "$NBD_PID" > fi > fi > + if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then > + wait "$nbd_job_pid" > + fi > rm -f "$nbd_unix_socket" > } > > @@ -61,6 +65,7 @@ nbd_server_start_unix_socket() > { > nbd_server_stop > $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" & > + nbd_job_pid=$! > nbd_server_wait_for_unix_socket $! > } > > @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket() > { > nbd_server_stop > $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" & > + nbd_job_pid=$! > nbd_server_wait_for_tcp_socket $! > } > -- > 1.8.3.1 >
On Thu, Jun 13, 2019 at 12:59:53PM +0300, Vladimir Sementsov-Ogievskiy wrote: > 11.06.2019 21:02, Andrey Shinkevich wrote: > > Processes are dying harder under the Valgring. It results in counting > > the dying process as a newborn one. Make it sure that old NBD job get > > finished before starting a new one. > > > > Suggested-by: Roman Kagan <rkagan@virtuozzo.com> > > Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> > > --- > > tests/qemu-iotests/common.nbd | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd > > index 25fc9ff..e3dcc60 100644 > > --- a/tests/qemu-iotests/common.nbd > > +++ b/tests/qemu-iotests/common.nbd > > @@ -22,6 +22,7 @@ > > nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock" > > nbd_tcp_addr="127.0.0.1" > > nbd_pid_file="${TEST_DIR}/qemu-nbd.pid" > > +nbd_job_pid="" > > > > nbd_server_stop() > > { > > @@ -33,6 +34,9 @@ nbd_server_stop() > > kill "$NBD_PID" > > fi > > fi > > Honestly, I don't understand the problem from commit message, but anyway comment > should be added here, to mark that this is for valgrind... But you don't check for > VALGRIND enabled.. I it intentional? It is. The problem this patch fixes exists regardless of valgrind. valgrind just makes it easier to notice. See my reply to the patch itself. Roman. > > > + if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then > > + wait "$nbd_job_pid" > > + fi > > rm -f "$nbd_unix_socket" > > } > > > > @@ -61,6 +65,7 @@ nbd_server_start_unix_socket() > > { > > nbd_server_stop > > $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" & > > + nbd_job_pid=$! > > nbd_server_wait_for_unix_socket $! > > } > > > > @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket() > > { > > nbd_server_stop > > $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" & > > + nbd_job_pid=$! > > nbd_server_wait_for_tcp_socket $! > > } > > > > > -- > Best regards, > Vladimir
diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd index 25fc9ff..e3dcc60 100644 --- a/tests/qemu-iotests/common.nbd +++ b/tests/qemu-iotests/common.nbd @@ -22,6 +22,7 @@ nbd_unix_socket="${TEST_DIR}/qemu-nbd.sock" nbd_tcp_addr="127.0.0.1" nbd_pid_file="${TEST_DIR}/qemu-nbd.pid" +nbd_job_pid="" nbd_server_stop() { @@ -33,6 +34,9 @@ nbd_server_stop() kill "$NBD_PID" fi fi + if [ -n "$nbd_job_pid" ] && kill -s 0 "$nbd_job_pid" 2>/dev/null; then + wait "$nbd_job_pid" + fi rm -f "$nbd_unix_socket" } @@ -61,6 +65,7 @@ nbd_server_start_unix_socket() { nbd_server_stop $QEMU_NBD -v -t -k "$nbd_unix_socket" "$@" & + nbd_job_pid=$! nbd_server_wait_for_unix_socket $! } @@ -105,5 +110,6 @@ nbd_server_start_tcp_socket() { nbd_server_stop $QEMU_NBD -v -t -b $nbd_tcp_addr -p $nbd_tcp_port "$@" & + nbd_job_pid=$! nbd_server_wait_for_tcp_socket $! }
Processes are dying harder under the Valgring. It results in counting the dying process as a newborn one. Make it sure that old NBD job get finished before starting a new one. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- tests/qemu-iotests/common.nbd | 6 ++++++ 1 file changed, 6 insertions(+)