Message ID | b86f80bb47ac66b73b2afe80218c9913722c606a.1586370737.git.lukasstraub2@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | colo-compare bugfixes | expand |
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Thursday, April 9, 2020 2:34 AM > To: qemu-devel <qemu-devel@nongnu.org> > Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > <pbonzini@redhat.com> > Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > AioContext > > qemu_bh_new will set the bh to be executed in the main loop. This causes > problems as colo_compare_handle_event assumes that it has exclusive > access the queues, which are also accessed in the iothread. It also assumes > that it runs in a different thread than the caller and takes the appropriate > locks. > > Create the bh with the AioContext of the iothread to fulfill these > assumptions. > Looks good for me, I assume it will increase performance. Do you have related data? Thanks Zhang Chen > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > --- > net/colo-compare.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > 10c0239f9d..1de4220fe2 100644 > --- a/net/colo-compare.c > +++ b/net/colo-compare.c > @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > *opaque) > > static void colo_compare_iothread(CompareState *s) { > + AioContext *ctx = iothread_get_aio_context(s->iothread); > object_ref(OBJECT(s->iothread)); > s->worker_context = iothread_get_g_main_context(s->iothread); > > @@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState *s) > } > > colo_compare_timer_init(s); > - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); > } > > static char *compare_get_pri_indev(Object *obj, Error **errp) > -- > 2.20.1
On Wed, 22 Apr 2020 08:29:39 +0000 "Zhang, Chen" <chen.zhang@intel.com> wrote: > > -----Original Message----- > > From: Lukas Straub <lukasstraub2@web.de> > > Sent: Thursday, April 9, 2020 2:34 AM > > To: qemu-devel <qemu-devel@nongnu.org> > > Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > <pbonzini@redhat.com> > > Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > > AioContext > > > > qemu_bh_new will set the bh to be executed in the main loop. This causes > > problems as colo_compare_handle_event assumes that it has exclusive > > access the queues, which are also accessed in the iothread. It also assumes > > that it runs in a different thread than the caller and takes the appropriate > > locks. > > > > Create the bh with the AioContext of the iothread to fulfill these > > assumptions. > > > > Looks good for me, I assume it will increase performance. Do you have related data? No, this fixes several crashes because the queues where accessed concurrently from multiple threads. Sorry for my bad wording. Regards, Lukas Straub > Thanks > Zhang Chen > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > --- > > net/colo-compare.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > > 10c0239f9d..1de4220fe2 100644 > > --- a/net/colo-compare.c > > +++ b/net/colo-compare.c > > @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > > *opaque) > > > > static void colo_compare_iothread(CompareState *s) { > > + AioContext *ctx = iothread_get_aio_context(s->iothread); > > object_ref(OBJECT(s->iothread)); > > s->worker_context = iothread_get_g_main_context(s->iothread); > > > > @@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState *s) > > } > > > > colo_compare_timer_init(s); > > - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > > + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); > > } > > > > static char *compare_get_pri_indev(Object *obj, Error **errp) > > -- > > 2.20.1 >
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Wednesday, April 22, 2020 4:43 PM > To: Zhang, Chen <chen.zhang@intel.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > <pbonzini@redhat.com> > Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > AioContext > > On Wed, 22 Apr 2020 08:29:39 +0000 > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > -----Original Message----- > > > From: Lukas Straub <lukasstraub2@web.de> > > > Sent: Thursday, April 9, 2020 2:34 AM > > > To: qemu-devel <qemu-devel@nongnu.org> > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > > > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > > <pbonzini@redhat.com> > > > Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with the > > > right AioContext > > > > > > qemu_bh_new will set the bh to be executed in the main loop. This > > > causes problems as colo_compare_handle_event assumes that it has > > > exclusive access the queues, which are also accessed in the > > > iothread. It also assumes that it runs in a different thread than > > > the caller and takes the appropriate locks. > > > > > > Create the bh with the AioContext of the iothread to fulfill these > > > assumptions. > > > > > > > Looks good for me, I assume it will increase performance. Do you have > related data? > > No, this fixes several crashes because the queues where accessed > concurrently from multiple threads. Sorry for my bad wording. Can you describe some details about the crash? Step by step? Maybe I can re-produce and test it for this patch. Thanks Zhang Chen > > Regards, > Lukas Straub > > > Thanks > > Zhang Chen > > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > > --- > > > net/colo-compare.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > > > 10c0239f9d..1de4220fe2 100644 > > > --- a/net/colo-compare.c > > > +++ b/net/colo-compare.c > > > @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > > > *opaque) > > > > > > static void colo_compare_iothread(CompareState *s) { > > > + AioContext *ctx = iothread_get_aio_context(s->iothread); > > > object_ref(OBJECT(s->iothread)); > > > s->worker_context = iothread_get_g_main_context(s->iothread); > > > > > > @@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState > *s) > > > } > > > > > > colo_compare_timer_init(s); > > > - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > > > + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); > > > } > > > > > > static char *compare_get_pri_indev(Object *obj, Error **errp) > > > -- > > > 2.20.1 > >
On Wed, 22 Apr 2020 09:03:00 +0000 "Zhang, Chen" <chen.zhang@intel.com> wrote: > > -----Original Message----- > > From: Lukas Straub <lukasstraub2@web.de> > > Sent: Wednesday, April 22, 2020 4:43 PM > > To: Zhang, Chen <chen.zhang@intel.com> > > Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > <pbonzini@redhat.com> > > Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > > AioContext > > > > On Wed, 22 Apr 2020 08:29:39 +0000 > > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > > > -----Original Message----- > > > > From: Lukas Straub <lukasstraub2@web.de> > > > > Sent: Thursday, April 9, 2020 2:34 AM > > > > To: qemu-devel <qemu-devel@nongnu.org> > > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > > > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > > > > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > > > <pbonzini@redhat.com> > > > > Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with the > > > > right AioContext > > > > > > > > qemu_bh_new will set the bh to be executed in the main loop. This > > > > causes problems as colo_compare_handle_event assumes that it has > > > > exclusive access the queues, which are also accessed in the > > > > iothread. It also assumes that it runs in a different thread than > > > > the caller and takes the appropriate locks. > > > > > > > > Create the bh with the AioContext of the iothread to fulfill these > > > > assumptions. > > > > > > > > > > Looks good for me, I assume it will increase performance. Do you have > > related data? > > > > No, this fixes several crashes because the queues where accessed > > concurrently from multiple threads. Sorry for my bad wording. > > Can you describe some details about the crash? Step by step? > Maybe I can re-produce and test it for this patch. There is no clear test case. For me the crashes happened after 1-20h of runtime with lots of checkpoints (800ms) and some network traffic. The coredump always showed that two threads where doing operations on the queues simultaneously. Unfortunately, I don't have the coredumps anymore. Regards, Lukas Straub > Thanks > Zhang Chen > > > > > Regards, > > Lukas Straub > > > > > Thanks > > > Zhang Chen > > > > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > > > --- > > > > net/colo-compare.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > > > > 10c0239f9d..1de4220fe2 100644 > > > > --- a/net/colo-compare.c > > > > +++ b/net/colo-compare.c > > > > @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > > > > *opaque) > > > > > > > > static void colo_compare_iothread(CompareState *s) { > > > > + AioContext *ctx = iothread_get_aio_context(s->iothread); > > > > object_ref(OBJECT(s->iothread)); > > > > s->worker_context = iothread_get_g_main_context(s->iothread); > > > > > > > > @@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState > > *s) > > > > } > > > > > > > > colo_compare_timer_init(s); > > > > - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > > > > + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); > > > > } > > > > > > > > static char *compare_get_pri_indev(Object *obj, Error **errp) > > > > -- > > > > 2.20.1 > > > >
> -----Original Message----- > From: Lukas Straub <lukasstraub2@web.de> > Sent: Wednesday, April 22, 2020 5:40 PM > To: Zhang, Chen <chen.zhang@intel.com> > Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > <pbonzini@redhat.com> > Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > AioContext > > On Wed, 22 Apr 2020 09:03:00 +0000 > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > -----Original Message----- > > > From: Lukas Straub <lukasstraub2@web.de> > > > Sent: Wednesday, April 22, 2020 4:43 PM > > > To: Zhang, Chen <chen.zhang@intel.com> > > > Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > > > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > > <pbonzini@redhat.com> > > > Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with > > > the right AioContext > > > > > > On Wed, 22 Apr 2020 08:29:39 +0000 > > > "Zhang, Chen" <chen.zhang@intel.com> wrote: > > > > > > > > -----Original Message----- > > > > > From: Lukas Straub <lukasstraub2@web.de> > > > > > Sent: Thursday, April 9, 2020 2:34 AM > > > > > To: qemu-devel <qemu-devel@nongnu.org> > > > > > Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > > > > > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; > > > > > Marc- André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > > > > > <pbonzini@redhat.com> > > > > > Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with > > > > > the right AioContext > > > > > > > > > > qemu_bh_new will set the bh to be executed in the main loop. > > > > > This causes problems as colo_compare_handle_event assumes that > > > > > it has exclusive access the queues, which are also accessed in > > > > > the iothread. It also assumes that it runs in a different thread > > > > > than the caller and takes the appropriate locks. > > > > > > > > > > Create the bh with the AioContext of the iothread to fulfill > > > > > these assumptions. > > > > > > > > > > > > > Looks good for me, I assume it will increase performance. Do you > > > > have > > > related data? > > > > > > No, this fixes several crashes because the queues where accessed > > > concurrently from multiple threads. Sorry for my bad wording. > > > > Can you describe some details about the crash? Step by step? > > Maybe I can re-produce and test it for this patch. > > There is no clear test case. For me the crashes happened after 1-20h of > runtime with lots of checkpoints (800ms) and some network traffic. The > coredump always showed that two threads where doing operations on the > queues simultaneously. > Unfortunately, I don't have the coredumps anymore. OK, Although I have not encountered the problem you described. I have test this patch, looks running fine. Reviewed-by: Zhang Chen <chen.zhang@intel.com> Thanks Zhang Chen > > Regards, > Lukas Straub > > > Thanks > > Zhang Chen > > > > > > > > Regards, > > > Lukas Straub > > > > > > > Thanks > > > > Zhang Chen > > > > > > > > > Signed-off-by: Lukas Straub <lukasstraub2@web.de> > > > > > --- > > > > > net/colo-compare.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index > > > > > 10c0239f9d..1de4220fe2 100644 > > > > > --- a/net/colo-compare.c > > > > > +++ b/net/colo-compare.c > > > > > @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > > > > > *opaque) > > > > > > > > > > static void colo_compare_iothread(CompareState *s) { > > > > > + AioContext *ctx = iothread_get_aio_context(s->iothread); > > > > > object_ref(OBJECT(s->iothread)); > > > > > s->worker_context = > > > > > iothread_get_g_main_context(s->iothread); > > > > > > > > > > @@ -906,7 +907,7 @@ static void > > > > > colo_compare_iothread(CompareState > > > *s) > > > > > } > > > > > > > > > > colo_compare_timer_init(s); > > > > > - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > > > > > + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, > > > > > + s); > > > > > } > > > > > > > > > > static char *compare_get_pri_indev(Object *obj, Error **errp) > > > > > -- > > > > > 2.20.1 > > > > > >
On 2020/4/23 下午3:29, Zhang, Chen wrote: > > >> -----Original Message----- >> From: Lukas Straub <lukasstraub2@web.de> >> Sent: Wednesday, April 22, 2020 5:40 PM >> To: Zhang, Chen <chen.zhang@intel.com> >> Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian >> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- >> André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini >> <pbonzini@redhat.com> >> Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right >> AioContext >> >> On Wed, 22 Apr 2020 09:03:00 +0000 >> "Zhang, Chen" <chen.zhang@intel.com> wrote: >> >>>> -----Original Message----- >>>> From: Lukas Straub <lukasstraub2@web.de> >>>> Sent: Wednesday, April 22, 2020 4:43 PM >>>> To: Zhang, Chen <chen.zhang@intel.com> >>>> Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian >>>> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- >>>> André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini >>>> <pbonzini@redhat.com> >>>> Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with >>>> the right AioContext >>>> >>>> On Wed, 22 Apr 2020 08:29:39 +0000 >>>> "Zhang, Chen" <chen.zhang@intel.com> wrote: >>>> >>>>>> -----Original Message----- >>>>>> From: Lukas Straub <lukasstraub2@web.de> >>>>>> Sent: Thursday, April 9, 2020 2:34 AM >>>>>> To: qemu-devel <qemu-devel@nongnu.org> >>>>>> Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian >>>>>> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; >>>>>> Marc- André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini >>>>>> <pbonzini@redhat.com> >>>>>> Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with >>>>>> the right AioContext >>>>>> >>>>>> qemu_bh_new will set the bh to be executed in the main loop. >>>>>> This causes problems as colo_compare_handle_event assumes that >>>>>> it has exclusive access the queues, which are also accessed in >>>>>> the iothread. It also assumes that it runs in a different thread >>>>>> than the caller and takes the appropriate locks. >>>>>> >>>>>> Create the bh with the AioContext of the iothread to fulfill >>>>>> these assumptions. >>>>>> >>>>> >>>>> Looks good for me, I assume it will increase performance. Do you >>>>> have >>>> related data? >>>> >>>> No, this fixes several crashes because the queues where accessed >>>> concurrently from multiple threads. Sorry for my bad wording. >>> >>> Can you describe some details about the crash? Step by step? >>> Maybe I can re-produce and test it for this patch. >> >> There is no clear test case. For me the crashes happened after 1-20h of >> runtime with lots of checkpoints (800ms) and some network traffic. The >> coredump always showed that two threads where doing operations on the >> queues simultaneously. >> Unfortunately, I don't have the coredumps anymore. > > OK, Although I have not encountered the problem you described. > I have test this patch, looks running fine. > > Reviewed-by: Zhang Chen <chen.zhang@intel.com> > > Thanks > Zhang Chen Hi, I encountered PVM crash caused by the race condition issue in v4.2.0. Here is the coredump for reference. ``` warning: core file may not match specified executable file. Core was generated by `qemu-system-x86_64 -name source -enable-kvm -cpu core2duo -m 1024 -global kvm-a'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000055cb478bcd25 in qemu_hexdump (buf=0x0, fp=0x7f6e9122b680 <IO_2_1_stderr>, prefix=0x55cb47a388f5 "colo-compare spkt", size=1514) at util/hexdump.c:34 34 fprintf(fp, " %02x", (unsigned char)buf[b + i]); [Current thread is 1 (Thread 0x7f6da1ade700 (LWP 6119))] (gdb) where #0 0x000055cb478bcd25 in qemu_hexdump (buf=0x0, fp=0x7f6e9122b680 <IO_2_1_stderr>, prefix=0x55cb47a388f5 "colo-compare spkt", size=1514) at util/hexdump.c:34 #1 0x000055cb476fa1b5 in colo_compare_tcp (s=0x55cb496429f0, conn=0x7f6e78003e30) at net/colo-compare.c:462 #2 0x000055cb476fa8c1 in colo_compare_connection (opaque=0x7f6e78003e30, user_data=0x55cb496429f0) at net/colo-compare.c:687 #3 0x000055cb476fb4ab in compare_pri_rs_finalize (pri_rs=0x55cb49642b18) at net/colo-compare.c:1001 #4 0x000055cb476eb46f in net_fill_rstate (rs=0x55cb49642b18, buf=0x7f6da1add2c8 "", size=1064) at net/net.c:1764 #5 0x000055cb476faafa in compare_pri_chr_in (opaque=0x55cb496429f0, buf=0x7f6da1adc6f0 "`E˧V\210RT", size=4096) at net/colo-compare.c:776 #6 0x000055cb47815363 in qemu_chr_be_write_impl (s=0x55cb48c87ec0, buf=0x7f6da1adc6f0 "`E˧V\210RT", len=4096) at chardev/char.c:177 #7 0x000055cb478153c7 in qemu_chr_be_write (s=0x55cb48c87ec0, buf=0x7f6da1adc6f0 "`E˧V\210RT", len=4096) at chardev/char.c:189 #8 0x000055cb4781e002 in tcp_chr_read (chan=0x55cb48ef7690, cond=G_IO_IN, opaque=0x55cb48c87ec0) at chardev/char-socket.c:525 #9 0x000055cb47839655 in qio_channel_fd_source_dispatch (source=0x7f6e78002050, callback=0x55cb4781de53 <tcp_chr_read>, user_data=0x55cb48c87ec0) at io/channel-watch.c:84 #10 0x00007f6e950e1285 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #11 0x00007f6e950e1650 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #12 0x00007f6e950e1962 in g_main_loop_run () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #13 0x000055cb474920ae in iothread_run (opaque=0x55cb48c67f10) at iothread.c:82 #14 0x000055cb478a699d in qemu_thread_start (args=0x55cb498035d0) at util/qemu-thread-posix.c:519 #15 0x00007f6e912376db in start_thread (arg=0x7f6da1ade700) at pthread_create.c:463 #16 0x00007f6e90f6088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) ``` COLO works well after applying this patch in my tests. Reviewed-by: Derek Su <dereksu@qnap.com> Tested-by: Derek Su <dereksu@qnap.com> Regards, Derek > >> >> Regards, >> Lukas Straub >> >>> Thanks >>> Zhang Chen >>> >>>> >>>> Regards, >>>> Lukas Straub >>>> >>>>> Thanks >>>>> Zhang Chen >>>>> >>>>>> Signed-off-by: Lukas Straub <lukasstraub2@web.de> >>>>>> --- >>>>>> net/colo-compare.c | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index >>>>>> 10c0239f9d..1de4220fe2 100644 >>>>>> --- a/net/colo-compare.c >>>>>> +++ b/net/colo-compare.c >>>>>> @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void >>>>>> *opaque) >>>>>> >>>>>> static void colo_compare_iothread(CompareState *s) { >>>>>> + AioContext *ctx = iothread_get_aio_context(s->iothread); >>>>>> object_ref(OBJECT(s->iothread)); >>>>>> s->worker_context = >>>>>> iothread_get_g_main_context(s->iothread); >>>>>> >>>>>> @@ -906,7 +907,7 @@ static void >>>>>> colo_compare_iothread(CompareState >>>> *s) >>>>>> } >>>>>> >>>>>> colo_compare_timer_init(s); >>>>>> - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); >>>>>> + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, >>>>>> + s); >>>>>> } >>>>>> >>>>>> static char *compare_get_pri_indev(Object *obj, Error **errp) >>>>>> -- >>>>>> 2.20.1 >>>>> >>> >
> -----Original Message----- > From: Derek Su <dereksu@qnap.com> > Sent: Friday, April 24, 2020 12:37 PM > To: Zhang, Chen <chen.zhang@intel.com>; Lukas Straub > <lukasstraub2@web.de> > Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > <pbonzini@redhat.com> > Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the right > AioContext > > On 2020/4/23 下午3:29, Zhang, Chen wrote: > > > > > >> -----Original Message----- > >> From: Lukas Straub <lukasstraub2@web.de> > >> Sent: Wednesday, April 22, 2020 5:40 PM > >> To: Zhang, Chen <chen.zhang@intel.com> > >> Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > >> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; Marc- > >> André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > >> <pbonzini@redhat.com> > >> Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with the > >> right AioContext > >> > >> On Wed, 22 Apr 2020 09:03:00 +0000 > >> "Zhang, Chen" <chen.zhang@intel.com> wrote: > >> > >>>> -----Original Message----- > >>>> From: Lukas Straub <lukasstraub2@web.de> > >>>> Sent: Wednesday, April 22, 2020 4:43 PM > >>>> To: Zhang, Chen <chen.zhang@intel.com> > >>>> Cc: qemu-devel <qemu-devel@nongnu.org>; Li Zhijian > >>>> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; > Marc- > >>>> André Lureau <marcandre.lureau@redhat.com>; Paolo Bonzini > >>>> <pbonzini@redhat.com> > >>>> Subject: Re: [PATCH 1/3] net/colo-compare.c: Create event_bh with > >>>> the right AioContext > >>>> > >>>> On Wed, 22 Apr 2020 08:29:39 +0000 > >>>> "Zhang, Chen" <chen.zhang@intel.com> wrote: > >>>> > >>>>>> -----Original Message----- > >>>>>> From: Lukas Straub <lukasstraub2@web.de> > >>>>>> Sent: Thursday, April 9, 2020 2:34 AM > >>>>>> To: qemu-devel <qemu-devel@nongnu.org> > >>>>>> Cc: Zhang, Chen <chen.zhang@intel.com>; Li Zhijian > >>>>>> <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>; > >>>>>> Marc- André Lureau <marcandre.lureau@redhat.com>; Paolo > Bonzini > >>>>>> <pbonzini@redhat.com> > >>>>>> Subject: [PATCH 1/3] net/colo-compare.c: Create event_bh with the > >>>>>> right AioContext > >>>>>> > >>>>>> qemu_bh_new will set the bh to be executed in the main loop. > >>>>>> This causes problems as colo_compare_handle_event assumes that > it > >>>>>> has exclusive access the queues, which are also accessed in the > >>>>>> iothread. It also assumes that it runs in a different thread than > >>>>>> the caller and takes the appropriate locks. > >>>>>> > >>>>>> Create the bh with the AioContext of the iothread to fulfill > >>>>>> these assumptions. > >>>>>> > >>>>> > >>>>> Looks good for me, I assume it will increase performance. Do you > >>>>> have > >>>> related data? > >>>> > >>>> No, this fixes several crashes because the queues where accessed > >>>> concurrently from multiple threads. Sorry for my bad wording. > >>> > >>> Can you describe some details about the crash? Step by step? > >>> Maybe I can re-produce and test it for this patch. > >> > >> There is no clear test case. For me the crashes happened after 1-20h > >> of runtime with lots of checkpoints (800ms) and some network traffic. > >> The coredump always showed that two threads where doing operations > on > >> the queues simultaneously. > >> Unfortunately, I don't have the coredumps anymore. > > > > OK, Although I have not encountered the problem you described. > > I have test this patch, looks running fine. > > > > Reviewed-by: Zhang Chen <chen.zhang@intel.com> > > > > Thanks > > Zhang Chen > > > Hi, > > I encountered PVM crash caused by the race condition issue in v4.2.0. > Here is the coredump for reference. > > ``` > warning: core file may not match specified executable file. > Core was generated by `qemu-system-x86_64 -name source -enable-kvm - > cpu core2duo -m 1024 -global kvm-a'. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x000055cb478bcd25 in qemu_hexdump (buf=0x0, fp=0x7f6e9122b680 > <IO_2_1_stderr>, prefix=0x55cb47a388f5 "colo-compare spkt", size=1514) at > util/hexdump.c:34 > 34 fprintf(fp, " %02x", (unsigned char)buf[b + i]); > [Current thread is 1 (Thread 0x7f6da1ade700 (LWP 6119))] > (gdb) where > #0 0x000055cb478bcd25 in qemu_hexdump (buf=0x0, fp=0x7f6e9122b680 > <IO_2_1_stderr>, prefix=0x55cb47a388f5 "colo-compare spkt", size=1514) at > util/hexdump.c:34 > #1 0x000055cb476fa1b5 in colo_compare_tcp (s=0x55cb496429f0, > conn=0x7f6e78003e30) at net/colo-compare.c:462 > #2 0x000055cb476fa8c1 in colo_compare_connection > (opaque=0x7f6e78003e30, user_data=0x55cb496429f0) at net/colo- > compare.c:687 > #3 0x000055cb476fb4ab in compare_pri_rs_finalize > (pri_rs=0x55cb49642b18) at net/colo-compare.c:1001 > #4 0x000055cb476eb46f in net_fill_rstate (rs=0x55cb49642b18, > buf=0x7f6da1add2c8 "", size=1064) at net/net.c:1764 > #5 0x000055cb476faafa in compare_pri_chr_in (opaque=0x55cb496429f0, > buf=0x7f6da1adc6f0 "`E˧V\210RT", size=4096) at net/colo-compare.c:776 > #6 0x000055cb47815363 in qemu_chr_be_write_impl (s=0x55cb48c87ec0, > buf=0x7f6da1adc6f0 "`E˧V\210RT", len=4096) at chardev/char.c:177 > #7 0x000055cb478153c7 in qemu_chr_be_write (s=0x55cb48c87ec0, > buf=0x7f6da1adc6f0 "`E˧V\210RT", len=4096) at chardev/char.c:189 > #8 0x000055cb4781e002 in tcp_chr_read (chan=0x55cb48ef7690, > cond=G_IO_IN, opaque=0x55cb48c87ec0) at chardev/char-socket.c:525 > #9 0x000055cb47839655 in qio_channel_fd_source_dispatch > (source=0x7f6e78002050, callback=0x55cb4781de53 <tcp_chr_read>, > user_data=0x55cb48c87ec0) at io/channel-watch.c:84 > #10 0x00007f6e950e1285 in g_main_context_dispatch () at > /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 > #11 0x00007f6e950e1650 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 > #12 0x00007f6e950e1962 in g_main_loop_run () at > /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 > #13 0x000055cb474920ae in iothread_run (opaque=0x55cb48c67f10) at > iothread.c:82 > #14 0x000055cb478a699d in qemu_thread_start (args=0x55cb498035d0) at > util/qemu-thread-posix.c:519 > #15 0x00007f6e912376db in start_thread (arg=0x7f6da1ade700) at > pthread_create.c:463 > #16 0x00007f6e90f6088f in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > (gdb) > ``` > > COLO works well after applying this patch in my tests. > > Reviewed-by: Derek Su <dereksu@qnap.com> > Tested-by: Derek Su <dereksu@qnap.com> > Thanks Derek. > Regards, > Derek > > > > > > > >> > >> Regards, > >> Lukas Straub > >> > >>> Thanks > >>> Zhang Chen > >>> > >>>> > >>>> Regards, > >>>> Lukas Straub > >>>> > >>>>> Thanks > >>>>> Zhang Chen > >>>>> > >>>>>> Signed-off-by: Lukas Straub <lukasstraub2@web.de> > >>>>>> --- > >>>>>> net/colo-compare.c | 3 ++- > >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index > >>>>>> 10c0239f9d..1de4220fe2 100644 > >>>>>> --- a/net/colo-compare.c > >>>>>> +++ b/net/colo-compare.c > >>>>>> @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void > >>>>>> *opaque) > >>>>>> > >>>>>> static void colo_compare_iothread(CompareState *s) { > >>>>>> + AioContext *ctx = iothread_get_aio_context(s->iothread); > >>>>>> object_ref(OBJECT(s->iothread)); > >>>>>> s->worker_context = > >>>>>> iothread_get_g_main_context(s->iothread); > >>>>>> > >>>>>> @@ -906,7 +907,7 @@ static void > >>>>>> colo_compare_iothread(CompareState > >>>> *s) > >>>>>> } > >>>>>> > >>>>>> colo_compare_timer_init(s); > >>>>>> - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); > >>>>>> + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); > >>>>>> } > >>>>>> > >>>>>> static char *compare_get_pri_indev(Object *obj, Error **errp) > >>>>>> -- > >>>>>> 2.20.1 > >>>>> > >>> > >
diff --git a/net/colo-compare.c b/net/colo-compare.c index 10c0239f9d..1de4220fe2 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -890,6 +890,7 @@ static void colo_compare_handle_event(void *opaque) static void colo_compare_iothread(CompareState *s) { + AioContext *ctx = iothread_get_aio_context(s->iothread); object_ref(OBJECT(s->iothread)); s->worker_context = iothread_get_g_main_context(s->iothread); @@ -906,7 +907,7 @@ static void colo_compare_iothread(CompareState *s) } colo_compare_timer_init(s); - s->event_bh = qemu_bh_new(colo_compare_handle_event, s); + s->event_bh = aio_bh_new(ctx, colo_compare_handle_event, s); } static char *compare_get_pri_indev(Object *obj, Error **errp)
qemu_bh_new will set the bh to be executed in the main loop. This causes problems as colo_compare_handle_event assumes that it has exclusive access the queues, which are also accessed in the iothread. It also assumes that it runs in a different thread than the caller and takes the appropriate locks. Create the bh with the AioContext of the iothread to fulfill these assumptions. Signed-off-by: Lukas Straub <lukasstraub2@web.de> --- net/colo-compare.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)