Message ID | 1476267508-19499-1-git-send-email-xiecl.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/12/2016 06:18 PM, Changlong Xie wrote: > ---------------------------------------------------------------------------------------- > time request Actions > 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false > ---------------------------------------------------------------------------------------- > > Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition > variable", this patch replaces CoMutex with CoQueue. > > Cc: Wen Congyang<wency@cn.fujitsu.com> > Reported-by: zhanghailiang<zhang.zhanghailiang@huawei.com> > Suggested-by: Paolo Bonzini<pbonzini@redhat.com> > Signed-off-by: Changlong Xie<xiecl.fnst@cn.fujitsu.com> > --- Please refer to: http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg02172.html
On 12/10/2016 12:18, Changlong Xie wrote: > + if (s->in_flight == MAX_NBD_REQUESTS) { > + qemu_co_queue_wait(&s->free_sema); > assert(s->in_flight < MAX_NBD_REQUESTS); > } I was wondering if this should be a while loop instead, but the assertion protects against that. So: Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Ping. Any comments? It's really a problem for NBD. Thanks -Xie On 10/12/2016 06:18 PM, Changlong Xie wrote: > NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are > N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev > N times. > ---------------------------------------------------------------------------------------- > time request Actions > 1 1 in_flight=1, Coroutine=C1 > 2 2 in_flight=2, Coroutine=C2 > ... > 15 15 in_flight=15, Coroutine=C15 > 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true > 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue > 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue > ... > 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue > ---------------------------------------------------------------------------------------- > > Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because > it's equal to 'free_sema->holder'. > ---------------------------------------------------------------------------------------- > time request Actions > 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false > ---------------------------------------------------------------------------------------- > > Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from > free_sema->queue's head and enter C17. More free_sema->holder is C17 now. > ---------------------------------------------------------------------------------------- > time request Actions > 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true > ---------------------------------------------------------------------------------------- > > In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will > almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this > case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc > "coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request > No.15' reply, qemu will stop unexpectedly: > ---------------------------------------------------------------------------------------- > time request Actions > 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false > ---------------------------------------------------------------------------------------- > > Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition > variable", this patch replaces CoMutex with CoQueue. > > Cc: Wen Congyang <wency@cn.fujitsu.com> > Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> > --- > block/nbd-client.c | 8 ++++---- > block/nbd-client.h | 2 +- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/block/nbd-client.c b/block/nbd-client.c > index 2cf3237..40b28ab 100644 > --- a/block/nbd-client.c > +++ b/block/nbd-client.c > @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s, > { > /* Poor man semaphore. The free_sema is locked when no other request > * can be accepted, and unlocked after receiving one reply. */ > - if (s->in_flight >= MAX_NBD_REQUESTS - 1) { > - qemu_co_mutex_lock(&s->free_sema); > + if (s->in_flight == MAX_NBD_REQUESTS) { > + qemu_co_queue_wait(&s->free_sema); > assert(s->in_flight < MAX_NBD_REQUESTS); > } > s->in_flight++; > @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s, > int i = HANDLE_TO_INDEX(s, request->handle); > s->recv_coroutine[i] = NULL; > if (s->in_flight-- == MAX_NBD_REQUESTS) { > - qemu_co_mutex_unlock(&s->free_sema); > + qemu_co_queue_next(&s->free_sema); > } > } > > @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs, > } > > qemu_co_mutex_init(&client->send_mutex); > - qemu_co_mutex_init(&client->free_sema); > + qemu_co_queue_init(&client->free_sema); > client->sioc = sioc; > object_ref(OBJECT(client->sioc)); > > diff --git a/block/nbd-client.h b/block/nbd-client.h > index 044aca4..307b8b1 100644 > --- a/block/nbd-client.h > +++ b/block/nbd-client.h > @@ -24,7 +24,7 @@ typedef struct NbdClientSession { > off_t size; > > CoMutex send_mutex; > - CoMutex free_sema; > + CoQueue free_sema; > Coroutine *send_coroutine; > int in_flight; > >
On 24/10/2016 03:44, Changlong Xie wrote: > Ping. Any comments? It's really a problem for NBD. Sorry, I haven't been sending pull requests. I'll do it this week. Paolo > Thanks > -Xie > > On 10/12/2016 06:18 PM, Changlong Xie wrote: >> NBD is using the CoMutex in a way that wasn't anticipated. For >> example, if there are >> N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke >> nbd_client_co_pwritev >> N times. >> ---------------------------------------------------------------------------------------- >> >> time request Actions >> 1 1 in_flight=1, Coroutine=C1 >> 2 2 in_flight=2, Coroutine=C2 >> ... >> 15 15 in_flight=15, Coroutine=C15 >> 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, >> mutex->locked=true >> 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue >> 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue >> ... >> 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue >> ---------------------------------------------------------------------------------------- >> >> >> Once nbd client recieves request No.16' reply, we will re-enter C16. >> It's ok, because >> it's equal to 'free_sema->holder'. >> ---------------------------------------------------------------------------------------- >> >> time request Actions >> 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, >> mutex->locked=false >> ---------------------------------------------------------------------------------------- >> >> >> Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop >> coroutines from >> free_sema->queue's head and enter C17. More free_sema->holder is C17 now. >> ---------------------------------------------------------------------------------------- >> >> time request Actions >> 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, >> mutex->locked=true >> ---------------------------------------------------------------------------------------- >> >> >> In above scenario, we only recieves request No.16' reply. As time goes >> by, nbd client will >> almostly recieves replies from requests 1 to 15 rather than request 17 >> who owns C17. In this >> case, we will encounter assert "mutex->holder == self" failed since >> Kevin's commit 0e438cdc >> "coroutine: Let CoMutex remember who holds it". For example, if nbd >> client recieves request >> No.15' reply, qemu will stop unexpectedly: >> ---------------------------------------------------------------------------------------- >> >> time request Actions >> 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, >> mutex->locked=false >> ---------------------------------------------------------------------------------------- >> >> >> Per Paolo's suggestion "The simplest fix is to change it to CoQueue, >> which is like a condition >> variable", this patch replaces CoMutex with CoQueue. >> >> Cc: Wen Congyang <wency@cn.fujitsu.com> >> Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> >> --- >> block/nbd-client.c | 8 ++++---- >> block/nbd-client.h | 2 +- >> 2 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/block/nbd-client.c b/block/nbd-client.c >> index 2cf3237..40b28ab 100644 >> --- a/block/nbd-client.c >> +++ b/block/nbd-client.c >> @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s, >> { >> /* Poor man semaphore. The free_sema is locked when no other >> request >> * can be accepted, and unlocked after receiving one reply. */ >> - if (s->in_flight >= MAX_NBD_REQUESTS - 1) { >> - qemu_co_mutex_lock(&s->free_sema); >> + if (s->in_flight == MAX_NBD_REQUESTS) { >> + qemu_co_queue_wait(&s->free_sema); >> assert(s->in_flight < MAX_NBD_REQUESTS); >> } >> s->in_flight++; >> @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s, >> int i = HANDLE_TO_INDEX(s, request->handle); >> s->recv_coroutine[i] = NULL; >> if (s->in_flight-- == MAX_NBD_REQUESTS) { >> - qemu_co_mutex_unlock(&s->free_sema); >> + qemu_co_queue_next(&s->free_sema); >> } >> } >> >> @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs, >> } >> >> qemu_co_mutex_init(&client->send_mutex); >> - qemu_co_mutex_init(&client->free_sema); >> + qemu_co_queue_init(&client->free_sema); >> client->sioc = sioc; >> object_ref(OBJECT(client->sioc)); >> >> diff --git a/block/nbd-client.h b/block/nbd-client.h >> index 044aca4..307b8b1 100644 >> --- a/block/nbd-client.h >> +++ b/block/nbd-client.h >> @@ -24,7 +24,7 @@ typedef struct NbdClientSession { >> off_t size; >> >> CoMutex send_mutex; >> - CoMutex free_sema; >> + CoQueue free_sema; >> Coroutine *send_coroutine; >> int in_flight; >> >> > > > >
On 10/24/2016 05:36 PM, Paolo Bonzini wrote: > > > On 24/10/2016 03:44, Changlong Xie wrote: >> Ping. Any comments? It's really a problem for NBD. > > Sorry, I haven't been sending pull requests. I'll do it this week. > Thanks : ) > Paolo > >> Thanks >> -Xie >> >> On 10/12/2016 06:18 PM, Changlong Xie wrote: >>> NBD is using the CoMutex in a way that wasn't anticipated. For >>> example, if there are >>> N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke >>> nbd_client_co_pwritev >>> N times. >>> ---------------------------------------------------------------------------------------- >>> >>> time request Actions >>> 1 1 in_flight=1, Coroutine=C1 >>> 2 2 in_flight=2, Coroutine=C2 >>> ... >>> 15 15 in_flight=15, Coroutine=C15 >>> 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, >>> mutex->locked=true >>> 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue >>> 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue >>> ... >>> 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue >>> ---------------------------------------------------------------------------------------- >>> >>> >>> Once nbd client recieves request No.16' reply, we will re-enter C16. >>> It's ok, because >>> it's equal to 'free_sema->holder'. >>> ---------------------------------------------------------------------------------------- >>> >>> time request Actions >>> 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, >>> mutex->locked=false >>> ---------------------------------------------------------------------------------------- >>> >>> >>> Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop >>> coroutines from >>> free_sema->queue's head and enter C17. More free_sema->holder is C17 now. >>> ---------------------------------------------------------------------------------------- >>> >>> time request Actions >>> 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, >>> mutex->locked=true >>> ---------------------------------------------------------------------------------------- >>> >>> >>> In above scenario, we only recieves request No.16' reply. As time goes >>> by, nbd client will >>> almostly recieves replies from requests 1 to 15 rather than request 17 >>> who owns C17. In this >>> case, we will encounter assert "mutex->holder == self" failed since >>> Kevin's commit 0e438cdc >>> "coroutine: Let CoMutex remember who holds it". For example, if nbd >>> client recieves request >>> No.15' reply, qemu will stop unexpectedly: >>> ---------------------------------------------------------------------------------------- >>> >>> time request Actions >>> 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, >>> mutex->locked=false >>> ---------------------------------------------------------------------------------------- >>> >>> >>> Per Paolo's suggestion "The simplest fix is to change it to CoQueue, >>> which is like a condition >>> variable", this patch replaces CoMutex with CoQueue. >>> >>> Cc: Wen Congyang <wency@cn.fujitsu.com> >>> Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> >>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> >>> --- >>> block/nbd-client.c | 8 ++++---- >>> block/nbd-client.h | 2 +- >>> 2 files changed, 5 insertions(+), 5 deletions(-) >>> >>> diff --git a/block/nbd-client.c b/block/nbd-client.c >>> index 2cf3237..40b28ab 100644 >>> --- a/block/nbd-client.c >>> +++ b/block/nbd-client.c >>> @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s, >>> { >>> /* Poor man semaphore. The free_sema is locked when no other >>> request >>> * can be accepted, and unlocked after receiving one reply. */ >>> - if (s->in_flight >= MAX_NBD_REQUESTS - 1) { >>> - qemu_co_mutex_lock(&s->free_sema); >>> + if (s->in_flight == MAX_NBD_REQUESTS) { >>> + qemu_co_queue_wait(&s->free_sema); >>> assert(s->in_flight < MAX_NBD_REQUESTS); >>> } >>> s->in_flight++; >>> @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s, >>> int i = HANDLE_TO_INDEX(s, request->handle); >>> s->recv_coroutine[i] = NULL; >>> if (s->in_flight-- == MAX_NBD_REQUESTS) { >>> - qemu_co_mutex_unlock(&s->free_sema); >>> + qemu_co_queue_next(&s->free_sema); >>> } >>> } >>> >>> @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs, >>> } >>> >>> qemu_co_mutex_init(&client->send_mutex); >>> - qemu_co_mutex_init(&client->free_sema); >>> + qemu_co_queue_init(&client->free_sema); >>> client->sioc = sioc; >>> object_ref(OBJECT(client->sioc)); >>> >>> diff --git a/block/nbd-client.h b/block/nbd-client.h >>> index 044aca4..307b8b1 100644 >>> --- a/block/nbd-client.h >>> +++ b/block/nbd-client.h >>> @@ -24,7 +24,7 @@ typedef struct NbdClientSession { >>> off_t size; >>> >>> CoMutex send_mutex; >>> - CoMutex free_sema; >>> + CoQueue free_sema; >>> Coroutine *send_coroutine; >>> int in_flight; >>> >>> >> >> >> >> > > > . >
diff --git a/block/nbd-client.c b/block/nbd-client.c index 2cf3237..40b28ab 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s, { /* Poor man semaphore. The free_sema is locked when no other request * can be accepted, and unlocked after receiving one reply. */ - if (s->in_flight >= MAX_NBD_REQUESTS - 1) { - qemu_co_mutex_lock(&s->free_sema); + if (s->in_flight == MAX_NBD_REQUESTS) { + qemu_co_queue_wait(&s->free_sema); assert(s->in_flight < MAX_NBD_REQUESTS); } s->in_flight++; @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s, int i = HANDLE_TO_INDEX(s, request->handle); s->recv_coroutine[i] = NULL; if (s->in_flight-- == MAX_NBD_REQUESTS) { - qemu_co_mutex_unlock(&s->free_sema); + qemu_co_queue_next(&s->free_sema); } } @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs, } qemu_co_mutex_init(&client->send_mutex); - qemu_co_mutex_init(&client->free_sema); + qemu_co_queue_init(&client->free_sema); client->sioc = sioc; object_ref(OBJECT(client->sioc)); diff --git a/block/nbd-client.h b/block/nbd-client.h index 044aca4..307b8b1 100644 --- a/block/nbd-client.h +++ b/block/nbd-client.h @@ -24,7 +24,7 @@ typedef struct NbdClientSession { off_t size; CoMutex send_mutex; - CoMutex free_sema; + CoQueue free_sema; Coroutine *send_coroutine; int in_flight;
NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev N times. ---------------------------------------------------------------------------------------- time request Actions 1 1 in_flight=1, Coroutine=C1 2 2 in_flight=2, Coroutine=C2 ... 15 15 in_flight=15, Coroutine=C15 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue ... 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue ---------------------------------------------------------------------------------------- Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because it's equal to 'free_sema->holder'. ---------------------------------------------------------------------------------------- time request Actions 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false ---------------------------------------------------------------------------------------- Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from free_sema->queue's head and enter C17. More free_sema->holder is C17 now. ---------------------------------------------------------------------------------------- time request Actions 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true ---------------------------------------------------------------------------------------- In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc "coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request No.15' reply, qemu will stop unexpectedly: ---------------------------------------------------------------------------------------- time request Actions 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false ---------------------------------------------------------------------------------------- Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition variable", this patch replaces CoMutex with CoQueue. Cc: Wen Congyang <wency@cn.fujitsu.com> Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> --- block/nbd-client.c | 8 ++++---- block/nbd-client.h | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-)