Message ID | 20230922065625.21848-2-elena.ufimtseva@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | multifd: various fixes | expand |
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes: > In multifd_send_sync_main we need to wait for channels_ready > before submitting sync packet as the threads may still be sending > their previous pages. > There is also no need to check for channels_ready in the loop > before the wait for sem_sync, next iteration of sending pages > or another sync will start with waiting for channels_ready > semaphore. > Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5 > ("multifd: Fix the number of channels ready") > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> > --- > migration/multifd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index 0f6b203877..e61e458151 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f) > } > } > > + qemu_sem_wait(&multifd_send_state->channels_ready); > /* > * When using zero-copy, it's necessary to flush the pages before any of > * the pages can be sent again, so we'll make sure the new version of the > @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f) > for (i = 0; i < migrate_multifd_channels(); i++) { > MultiFDSendParams *p = &multifd_send_state->params[i]; > > - qemu_sem_wait(&multifd_send_state->channels_ready); > trace_multifd_send_sync_main_wait(p->id); > qemu_sem_wait(&p->sem_sync); Please take a look at the series I just sent. Basically, I think we should wait on 'sem' for the number of existing channels and not just once per sync. Otherwise I think we'd hit the same issue this patch is trying to fix when we loop into the n+1 channels. I think the assert(!p->pending_job) in patch 3 helps prove that's more appropriate. Let me know what you think. Thanks
On Fri, Sep 22, 2023 at 01:06:53PM -0300, Fabiano Rosas wrote: > Elena Ufimtseva <elena.ufimtseva@oracle.com> writes: > > > In multifd_send_sync_main we need to wait for channels_ready > > before submitting sync packet as the threads may still be sending > > their previous pages. > > There is also no need to check for channels_ready in the loop > > before the wait for sem_sync, next iteration of sending pages > > or another sync will start with waiting for channels_ready > > semaphore. > > Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5 > > ("multifd: Fix the number of channels ready") > > > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> > > --- > > migration/multifd.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > index 0f6b203877..e61e458151 100644 > > --- a/migration/multifd.c > > +++ b/migration/multifd.c > > @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f) > > } > > } > > > > + qemu_sem_wait(&multifd_send_state->channels_ready); > > /* > > * When using zero-copy, it's necessary to flush the pages before any of > > * the pages can be sent again, so we'll make sure the new version of the > > @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f) > > for (i = 0; i < migrate_multifd_channels(); i++) { > > MultiFDSendParams *p = &multifd_send_state->params[i]; > > > > - qemu_sem_wait(&multifd_send_state->channels_ready); > > trace_multifd_send_sync_main_wait(p->id); > > qemu_sem_wait(&p->sem_sync); > > Please take a look at the series I just sent. Basically, I think we > should wait on 'sem' for the number of existing channels and not just > once per sync. Otherwise I think we'd hit the same issue this patch is > trying to fix when we loop into the n+1 channels. I think the > assert(!p->pending_job) in patch 3 helps prove that's more appropriate. Thank you! These patches make sense to me. Agree on redundant sem_sync. Lets see what others think. I will run some tests as well with your patches and spend more time looking at [2/3] patch. Elena U. > > Let me know what you think. > > Thanks
diff --git a/migration/multifd.c b/migration/multifd.c index 0f6b203877..e61e458151 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f) } } + qemu_sem_wait(&multifd_send_state->channels_ready); /* * When using zero-copy, it's necessary to flush the pages before any of * the pages can be sent again, so we'll make sure the new version of the @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; - qemu_sem_wait(&multifd_send_state->channels_ready); trace_multifd_send_sync_main_wait(p->id); qemu_sem_wait(&p->sem_sync);
In multifd_send_sync_main we need to wait for channels_ready before submitting sync packet as the threads may still be sending their previous pages. There is also no need to check for channels_ready in the loop before the wait for sem_sync, next iteration of sending pages or another sync will start with waiting for channels_ready semaphore. Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5 ("multifd: Fix the number of channels ready") Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> --- migration/multifd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)