Message ID | 20200508051441.8143-2-fengli@smartx.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3,1/2] io/channel: fix crash when qio_channel_readv_all return 0 | expand |
Hi On Fri, May 8, 2020 at 7:14 AM Li Feng <fengli@smartx.com> wrote: > > Root cause: > From `man recvmsg`, the RETURN VALUE says: > These calls return the number of bytes received, or -1 if an error occurred. > In the event of an error, errno is set to indicate the error. > The return value will be 0 when the peer has performed an orderly shutdown. > > When an error happens, the socket will be closed, and recvmsg return 0, > then error_setg will trigger a crash. > > This unit test could reproduce this issue: > tests/test-char -p /char/socket/client/reconnect-error/unix Current master doesn't trigger the backtrace, it's only after your patch 2. > > The core file backtrace is : > > (gdb) bt > #0 0x00007ffff5ac3277 in raise () from /lib64/libc.so.6 > #1 0x00007ffff5ac4968 in abort () from /lib64/libc.so.6 > #2 0x00005555555aaa94 in error_handle_fatal (errp=<optimized out>, err=0x7fffec0012d0) at util/error.c:40 > #3 0x00005555555aab6d in error_setv (errp=0x555555802a08 <error_abort>, src=0x5555555c4280 "io/channel.c", line=148, > func=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", err_class=ERROR_CLASS_GENERIC_ERROR, > fmt=<optimized out>, ap=0x7ffff423bae0, suffix=0x0) at util/error.c:73 > #4 0x00005555555aacf0 in error_setg_internal (errp=errp@entry=0x555555802a08 <error_abort>, > src=src@entry=0x5555555c4280 "io/channel.c", line=line@entry=148, > func=func@entry=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", > fmt=fmt@entry=0x5555555c43a0 "Unexpected end-of-file before all bytes were read") at util/error.c:97 > #5 0x000055555556c25c in qio_channel_readv_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, > errp=0x555555802a08 <error_abort>) at io/channel.c:147 > #6 0x000055555556c29a in qio_channel_read_all (ioc=<optimized out>, buf=<optimized out>, buflen=<optimized out>, > errp=<optimized out>) at io/channel.c:247 > #7 0x000055555556ad22 in char_socket_ping_pong (ioc=0x7fffec0008c0) at tests/test-char.c:732 > #8 0x000055555556ae12 in char_socket_client_server_thread (data=data@entry=0x55555582e350) at tests/test-char.c:891 > #9 0x00005555555a95b6 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 > #10 0x00007ffff5e61e25 in start_thread () from /lib64/libpthread.so.0 > #11 0x00007ffff5b8bbad in clone () from /lib64/libc.so.6 > > Signed-off-by: Li Feng <fengli@smartx.com> > --- > io/channel.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/io/channel.c b/io/channel.c > index e4376eb0bc..1a4a505f01 100644 > --- a/io/channel.c > +++ b/io/channel.c > @@ -144,8 +144,6 @@ int qio_channel_readv_all(QIOChannel *ioc, > > if (ret == 0) { > ret = -1; > - error_setg(errp, > - "Unexpected end-of-file before all bytes were read"); Nack, this code is fine. The problem is that the test case doesn't expect a disconnect in char_socket_ping_pong().
Marc-André Lureau <marcandre.lureau@gmail.com> 于2020年5月8日周五 下午8:32写道: > > Hi > > On Fri, May 8, 2020 at 7:14 AM Li Feng <fengli@smartx.com> wrote: > > > > Root cause: > > From `man recvmsg`, the RETURN VALUE says: > > These calls return the number of bytes received, or -1 if an error occurred. > > In the event of an error, errno is set to indicate the error. > > The return value will be 0 when the peer has performed an orderly shutdown. > > > > When an error happens, the socket will be closed, and recvmsg return 0, > > then error_setg will trigger a crash. > > > > This unit test could reproduce this issue: > > tests/test-char -p /char/socket/client/reconnect-error/unix > > Current master doesn't trigger the backtrace, it's only after your patch 2. Yes. However, the issue did exist in the master code base. The test case in patch 2 revealed this issue. > > > > > The core file backtrace is : > > > > (gdb) bt > > #0 0x00007ffff5ac3277 in raise () from /lib64/libc.so.6 > > #1 0x00007ffff5ac4968 in abort () from /lib64/libc.so.6 > > #2 0x00005555555aaa94 in error_handle_fatal (errp=<optimized out>, err=0x7fffec0012d0) at util/error.c:40 > > #3 0x00005555555aab6d in error_setv (errp=0x555555802a08 <error_abort>, src=0x5555555c4280 "io/channel.c", line=148, > > func=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", err_class=ERROR_CLASS_GENERIC_ERROR, > > fmt=<optimized out>, ap=0x7ffff423bae0, suffix=0x0) at util/error.c:73 > > #4 0x00005555555aacf0 in error_setg_internal (errp=errp@entry=0x555555802a08 <error_abort>, > > src=src@entry=0x5555555c4280 "io/channel.c", line=line@entry=148, > > func=func@entry=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", > > fmt=fmt@entry=0x5555555c43a0 "Unexpected end-of-file before all bytes were read") at util/error.c:97 > > #5 0x000055555556c25c in qio_channel_readv_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, > > errp=0x555555802a08 <error_abort>) at io/channel.c:147 > > #6 0x000055555556c29a in qio_channel_read_all (ioc=<optimized out>, buf=<optimized out>, buflen=<optimized out>, > > errp=<optimized out>) at io/channel.c:247 > > #7 0x000055555556ad22 in char_socket_ping_pong (ioc=0x7fffec0008c0) at tests/test-char.c:732 > > #8 0x000055555556ae12 in char_socket_client_server_thread (data=data@entry=0x55555582e350) at tests/test-char.c:891 > > #9 0x00005555555a95b6 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 > > #10 0x00007ffff5e61e25 in start_thread () from /lib64/libpthread.so.0 > > #11 0x00007ffff5b8bbad in clone () from /lib64/libc.so.6 > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > --- > > io/channel.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/io/channel.c b/io/channel.c > > index e4376eb0bc..1a4a505f01 100644 > > --- a/io/channel.c > > +++ b/io/channel.c > > @@ -144,8 +144,6 @@ int qio_channel_readv_all(QIOChannel *ioc, > > > > if (ret == 0) { > > ret = -1; > > - error_setg(errp, > > - "Unexpected end-of-file before all bytes were read"); > > Nack, this code is fine. > > The problem is that the test case doesn't expect a disconnect in > char_socket_ping_pong(). Yes, in the test case I try to inject an error in char_socket_ping_pong. Any concerns about these two patches? > > -- > Marc-André Lureau
On Fri, May 08, 2020 at 08:42:22PM +0800, Li Feng wrote: > Marc-André Lureau <marcandre.lureau@gmail.com> 于2020年5月8日周五 下午8:32写道: > > > > Hi > > > > On Fri, May 8, 2020 at 7:14 AM Li Feng <fengli@smartx.com> wrote: > > > > > > Root cause: > > > From `man recvmsg`, the RETURN VALUE says: > > > These calls return the number of bytes received, or -1 if an error occurred. > > > In the event of an error, errno is set to indicate the error. > > > The return value will be 0 when the peer has performed an orderly shutdown. > > > > > > When an error happens, the socket will be closed, and recvmsg return 0, > > > then error_setg will trigger a crash. > > > > > > This unit test could reproduce this issue: > > > tests/test-char -p /char/socket/client/reconnect-error/unix > > > > Current master doesn't trigger the backtrace, it's only after your patch 2. > Yes. However, the issue did exist in the master code base. > The test case in patch 2 revealed this issue. > > > > > > > > > The core file backtrace is : > > > > > > (gdb) bt > > > #0 0x00007ffff5ac3277 in raise () from /lib64/libc.so.6 > > > #1 0x00007ffff5ac4968 in abort () from /lib64/libc.so.6 > > > #2 0x00005555555aaa94 in error_handle_fatal (errp=<optimized out>, err=0x7fffec0012d0) at util/error.c:40 > > > #3 0x00005555555aab6d in error_setv (errp=0x555555802a08 <error_abort>, src=0x5555555c4280 "io/channel.c", line=148, > > > func=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", err_class=ERROR_CLASS_GENERIC_ERROR, > > > fmt=<optimized out>, ap=0x7ffff423bae0, suffix=0x0) at util/error.c:73 > > > #4 0x00005555555aacf0 in error_setg_internal (errp=errp@entry=0x555555802a08 <error_abort>, > > > src=src@entry=0x5555555c4280 "io/channel.c", line=line@entry=148, > > > func=func@entry=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", > > > fmt=fmt@entry=0x5555555c43a0 "Unexpected end-of-file before all bytes were read") at util/error.c:97 > > > #5 0x000055555556c25c in qio_channel_readv_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, > > > errp=0x555555802a08 <error_abort>) at io/channel.c:147 > > > #6 0x000055555556c29a in qio_channel_read_all (ioc=<optimized out>, buf=<optimized out>, buflen=<optimized out>, > > > errp=<optimized out>) at io/channel.c:247 > > > #7 0x000055555556ad22 in char_socket_ping_pong (ioc=0x7fffec0008c0) at tests/test-char.c:732 > > > #8 0x000055555556ae12 in char_socket_client_server_thread (data=data@entry=0x55555582e350) at tests/test-char.c:891 > > > #9 0x00005555555a95b6 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 > > > #10 0x00007ffff5e61e25 in start_thread () from /lib64/libpthread.so.0 > > > #11 0x00007ffff5b8bbad in clone () from /lib64/libc.so.6 > > > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > > --- > > > io/channel.c | 2 -- > > > 1 file changed, 2 deletions(-) > > > > > > diff --git a/io/channel.c b/io/channel.c > > > index e4376eb0bc..1a4a505f01 100644 > > > --- a/io/channel.c > > > +++ b/io/channel.c > > > @@ -144,8 +144,6 @@ int qio_channel_readv_all(QIOChannel *ioc, > > > > > > if (ret == 0) { > > > ret = -1; > > > - error_setg(errp, > > > - "Unexpected end-of-file before all bytes were read"); > > > > Nack, this code is fine. > > > > The problem is that the test case doesn't expect a disconnect in > > char_socket_ping_pong(). > Yes, in the test case I try to inject an error in char_socket_ping_pong. > Any concerns about these two patches? I agree with Marc-André - this patch is wrong. qio_channel_readv_all *MUST* always set 'errp' if the return value is -1. To not set 'errp' is violating the calling convention. The bug here is in your test case - it is passing '&error_abort' to the 'qio_channel_readv_all' call. If you don't want the test to crash, then don't pass &error_abort Regards, Daniel
Thank you, Daniel and Marc-André. I understand now. The error_abort is used to abort the program. I should rewrite the char_socket_ping_pong. I will post a new version without this wrong patch. Thanks, Feng Li Daniel P. Berrangé <berrange@redhat.com> 于2020年5月11日周一 下午6:03写道: > > On Fri, May 08, 2020 at 08:42:22PM +0800, Li Feng wrote: > > Marc-André Lureau <marcandre.lureau@gmail.com> 于2020年5月8日周五 下午8:32写道: > > > > > > Hi > > > > > > On Fri, May 8, 2020 at 7:14 AM Li Feng <fengli@smartx.com> wrote: > > > > > > > > Root cause: > > > > From `man recvmsg`, the RETURN VALUE says: > > > > These calls return the number of bytes received, or -1 if an error occurred. > > > > In the event of an error, errno is set to indicate the error. > > > > The return value will be 0 when the peer has performed an orderly shutdown. > > > > > > > > When an error happens, the socket will be closed, and recvmsg return 0, > > > > then error_setg will trigger a crash. > > > > > > > > This unit test could reproduce this issue: > > > > tests/test-char -p /char/socket/client/reconnect-error/unix > > > > > > Current master doesn't trigger the backtrace, it's only after your patch 2. > > Yes. However, the issue did exist in the master code base. > > The test case in patch 2 revealed this issue. > > > > > > > > > > > > > The core file backtrace is : > > > > > > > > (gdb) bt > > > > #0 0x00007ffff5ac3277 in raise () from /lib64/libc.so.6 > > > > #1 0x00007ffff5ac4968 in abort () from /lib64/libc.so.6 > > > > #2 0x00005555555aaa94 in error_handle_fatal (errp=<optimized out>, err=0x7fffec0012d0) at util/error.c:40 > > > > #3 0x00005555555aab6d in error_setv (errp=0x555555802a08 <error_abort>, src=0x5555555c4280 "io/channel.c", line=148, > > > > func=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", err_class=ERROR_CLASS_GENERIC_ERROR, > > > > fmt=<optimized out>, ap=0x7ffff423bae0, suffix=0x0) at util/error.c:73 > > > > #4 0x00005555555aacf0 in error_setg_internal (errp=errp@entry=0x555555802a08 <error_abort>, > > > > src=src@entry=0x5555555c4280 "io/channel.c", line=line@entry=148, > > > > func=func@entry=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", > > > > fmt=fmt@entry=0x5555555c43a0 "Unexpected end-of-file before all bytes were read") at util/error.c:97 > > > > #5 0x000055555556c25c in qio_channel_readv_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, > > > > errp=0x555555802a08 <error_abort>) at io/channel.c:147 > > > > #6 0x000055555556c29a in qio_channel_read_all (ioc=<optimized out>, buf=<optimized out>, buflen=<optimized out>, > > > > errp=<optimized out>) at io/channel.c:247 > > > > #7 0x000055555556ad22 in char_socket_ping_pong (ioc=0x7fffec0008c0) at tests/test-char.c:732 > > > > #8 0x000055555556ae12 in char_socket_client_server_thread (data=data@entry=0x55555582e350) at tests/test-char.c:891 > > > > #9 0x00005555555a95b6 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 > > > > #10 0x00007ffff5e61e25 in start_thread () from /lib64/libpthread.so.0 > > > > #11 0x00007ffff5b8bbad in clone () from /lib64/libc.so.6 > > > > > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > > > --- > > > > io/channel.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/io/channel.c b/io/channel.c > > > > index e4376eb0bc..1a4a505f01 100644 > > > > --- a/io/channel.c > > > > +++ b/io/channel.c > > > > @@ -144,8 +144,6 @@ int qio_channel_readv_all(QIOChannel *ioc, > > > > > > > > if (ret == 0) { > > > > ret = -1; > > > > - error_setg(errp, > > > > - "Unexpected end-of-file before all bytes were read"); > > > > > > Nack, this code is fine. > > > > > > The problem is that the test case doesn't expect a disconnect in > > > char_socket_ping_pong(). > > Yes, in the test case I try to inject an error in char_socket_ping_pong. > > Any concerns about these two patches? > > I agree with Marc-André - this patch is wrong. qio_channel_readv_all > *MUST* always set 'errp' if the return value is -1. To not set 'errp' > is violating the calling convention. > > The bug here is in your test case - it is passing '&error_abort' to the > 'qio_channel_readv_all' call. If you don't want the test to crash, then > don't pass &error_abort > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| >
diff --git a/io/channel.c b/io/channel.c index e4376eb0bc..1a4a505f01 100644 --- a/io/channel.c +++ b/io/channel.c @@ -144,8 +144,6 @@ int qio_channel_readv_all(QIOChannel *ioc, if (ret == 0) { ret = -1; - error_setg(errp, - "Unexpected end-of-file before all bytes were read"); } else if (ret == 1) { ret = 0; }
Root cause: From `man recvmsg`, the RETURN VALUE says: These calls return the number of bytes received, or -1 if an error occurred. In the event of an error, errno is set to indicate the error. The return value will be 0 when the peer has performed an orderly shutdown. When an error happens, the socket will be closed, and recvmsg return 0, then error_setg will trigger a crash. This unit test could reproduce this issue: tests/test-char -p /char/socket/client/reconnect-error/unix The core file backtrace is : (gdb) bt #0 0x00007ffff5ac3277 in raise () from /lib64/libc.so.6 #1 0x00007ffff5ac4968 in abort () from /lib64/libc.so.6 #2 0x00005555555aaa94 in error_handle_fatal (errp=<optimized out>, err=0x7fffec0012d0) at util/error.c:40 #3 0x00005555555aab6d in error_setv (errp=0x555555802a08 <error_abort>, src=0x5555555c4280 "io/channel.c", line=148, func=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", err_class=ERROR_CLASS_GENERIC_ERROR, fmt=<optimized out>, ap=0x7ffff423bae0, suffix=0x0) at util/error.c:73 #4 0x00005555555aacf0 in error_setg_internal (errp=errp@entry=0x555555802a08 <error_abort>, src=src@entry=0x5555555c4280 "io/channel.c", line=line@entry=148, func=func@entry=0x5555555c4580 <__func__.17489> "qio_channel_readv_all", fmt=fmt@entry=0x5555555c43a0 "Unexpected end-of-file before all bytes were read") at util/error.c:97 #5 0x000055555556c25c in qio_channel_readv_all (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, errp=0x555555802a08 <error_abort>) at io/channel.c:147 #6 0x000055555556c29a in qio_channel_read_all (ioc=<optimized out>, buf=<optimized out>, buflen=<optimized out>, errp=<optimized out>) at io/channel.c:247 #7 0x000055555556ad22 in char_socket_ping_pong (ioc=0x7fffec0008c0) at tests/test-char.c:732 #8 0x000055555556ae12 in char_socket_client_server_thread (data=data@entry=0x55555582e350) at tests/test-char.c:891 #9 0x00005555555a95b6 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 #10 0x00007ffff5e61e25 in start_thread () from /lib64/libpthread.so.0 #11 0x00007ffff5b8bbad in clone () from /lib64/libc.so.6 Signed-off-by: Li Feng <fengli@smartx.com> --- io/channel.c | 2 -- 1 file changed, 2 deletions(-)