diff mbox series

[1/4] multifd: wait for channels_ready before sending sync

Message ID 20230922065625.21848-2-elena.ufimtseva@oracle.com (mailing list archive)
State New, archived
Headers show
Series multifd: various fixes | expand

Commit Message

Elena Ufimtseva Sept. 22, 2023, 6:56 a.m. UTC
In multifd_send_sync_main we need to wait for channels_ready
before submitting sync packet as the threads may still be sending
their previous pages.
There is also no need to check for channels_ready in the loop
before the wait for sem_sync, next iteration of sending pages
or another sync will start with waiting for channels_ready
semaphore.
Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
("multifd: Fix the number of channels ready")

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Fabiano Rosas Sept. 22, 2023, 4:06 p.m. UTC | #1
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> In multifd_send_sync_main we need to wait for channels_ready
> before submitting sync packet as the threads may still be sending
> their previous pages.
> There is also no need to check for channels_ready in the loop
> before the wait for sem_sync, next iteration of sending pages
> or another sync will start with waiting for channels_ready
> semaphore.
> Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
> ("multifd: Fix the number of channels ready")
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/multifd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0f6b203877..e61e458151 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f)
>          }
>      }
>  
> +    qemu_sem_wait(&multifd_send_state->channels_ready);
>      /*
>       * When using zero-copy, it's necessary to flush the pages before any of
>       * the pages can be sent again, so we'll make sure the new version of the
> @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f)
>      for (i = 0; i < migrate_multifd_channels(); i++) {
>          MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> -        qemu_sem_wait(&multifd_send_state->channels_ready);
>          trace_multifd_send_sync_main_wait(p->id);
>          qemu_sem_wait(&p->sem_sync);

Please take a look at the series I just sent. Basically, I think we
should wait on 'sem' for the number of existing channels and not just
once per sync. Otherwise I think we'd hit the same issue this patch is
trying to fix when we loop into the n+1 channels. I think the
assert(!p->pending_job) in patch 3 helps prove that's more appropriate.

Let me know what you think.

Thanks
Elena Ufimtseva Sept. 22, 2023, 11:18 p.m. UTC | #2
On Fri, Sep 22, 2023 at 01:06:53PM -0300, Fabiano Rosas wrote:
> Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:
> 
> > In multifd_send_sync_main we need to wait for channels_ready
> > before submitting sync packet as the threads may still be sending
> > their previous pages.
> > There is also no need to check for channels_ready in the loop
> > before the wait for sem_sync, next iteration of sending pages
> > or another sync will start with waiting for channels_ready
> > semaphore.
> > Changes to commit 90b3cec351996dd8ef4eb847ad38607812c5e7f5
> > ("multifd: Fix the number of channels ready")
> >
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > ---
> >  migration/multifd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 0f6b203877..e61e458151 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -595,6 +595,7 @@ int multifd_send_sync_main(QEMUFile *f)
> >          }
> >      }
> >  
> > +    qemu_sem_wait(&multifd_send_state->channels_ready);
> >      /*
> >       * When using zero-copy, it's necessary to flush the pages before any of
> >       * the pages can be sent again, so we'll make sure the new version of the
> > @@ -630,7 +631,6 @@ int multifd_send_sync_main(QEMUFile *f)
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >  
> > -        qemu_sem_wait(&multifd_send_state->channels_ready);
> >          trace_multifd_send_sync_main_wait(p->id);
> >          qemu_sem_wait(&p->sem_sync);
> 
> Please take a look at the series I just sent. Basically, I think we
> should wait on 'sem' for the number of existing channels and not just
> once per sync. Otherwise I think we'd hit the same issue this patch is
> trying to fix when we loop into the n+1 channels. I think the
> assert(!p->pending_job) in patch 3 helps prove that's more appropriate.

Thank you!

These patches make sense to me.
Agree on redundant sem_sync. Lets see what others think.

I will run some tests as well with your patches and spend
more time looking at [2/3] patch.

Elena U.

> 
> Let me know what you think.
> 
> Thanks
diff mbox series

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..e61e458151 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -595,6 +595,7 @@  int multifd_send_sync_main(QEMUFile *f)
         }
     }
 
+    qemu_sem_wait(&multifd_send_state->channels_ready);
     /*
      * When using zero-copy, it's necessary to flush the pages before any of
      * the pages can be sent again, so we'll make sure the new version of the
@@ -630,7 +631,6 @@  int multifd_send_sync_main(QEMUFile *f)
     for (i = 0; i < migrate_multifd_channels(); i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
 
-        qemu_sem_wait(&multifd_send_state->channels_ready);
         trace_multifd_send_sync_main_wait(p->id);
         qemu_sem_wait(&p->sem_sync);