Message ID | 20230922065625.21848-5-elena.ufimtseva@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | multifd: various fixes | expand |
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes: > Sometimes multifd sends just sync packet with no pages > (normal_num is 0). In this case the old value is being > preserved and being accounted for while only packet_len > is being transferred. > Reset it to 0 after sending and accounting for. > > TODO: Fix the same packet ids in the stream. > with this patch, there is still an issue with the duplicated > packets ids being sent (with different number of pages/flags). > See in below multifd_send trace (before this change): > multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000 > multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000 > > With this commit there are still duplicated packets, but since no pages > are being sent with sync flag set, next_packet_size is 0: > multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000 > multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0 > If there is a suggestion how to fix this properly, I will be > glad to use it. > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> > --- > migration/multifd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/migration/multifd.c b/migration/multifd.c > index 3281397b18..8b4e26051b 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque) > p->next_packet_size + p->packet_len); > stat64_add(&mig_stats.transferred, > p->next_packet_size + p->packet_len); > + p->next_packet_size = 0; > qemu_mutex_lock(&p->mutex); > p->pending_job--; > qemu_mutex_unlock(&p->mutex); Reviewed-by: Fabiano Rosas <farosas@suse.de>
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes: > Sometimes multifd sends just sync packet with no pages > (normal_num is 0). In this case the old value is being > preserved and being accounted for while only packet_len > is being transferred. > Reset it to 0 after sending and accounting for. > Usually you'd finish your commit message here with your Signed off tag and the comments below would be included after the following --- sign. That way we can merge this patch without bring the unrelated discussion along. > TODO: Fix the same packet ids in the stream. > with this patch, there is still an issue with the duplicated > packets ids being sent (with different number of pages/flags). > See in below multifd_send trace (before this change): > multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000 > multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000 > > With this commit there are still duplicated packets, but since no pages > are being sent with sync flag set, next_packet_size is 0: > multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000 > multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0 > If there is a suggestion how to fix this properly, I will be > glad to use it. What is going on here? Is it that we're incrementing the multifd_send_state->packet_num under different locks? Because p->mutex is per-channel? You could try turning that into an atomic operation instead.
diff --git a/migration/multifd.c b/migration/multifd.c index 3281397b18..8b4e26051b 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque) p->next_packet_size + p->packet_len); stat64_add(&mig_stats.transferred, p->next_packet_size + p->packet_len); + p->next_packet_size = 0; qemu_mutex_lock(&p->mutex); p->pending_job--; qemu_mutex_unlock(&p->mutex);
Sometimes multifd sends just sync packet with no pages (normal_num is 0). In this case the old value is being preserved and being accounted for while only packet_len is being transferred. Reset it to 0 after sending and accounting for. TODO: Fix the same packet ids in the stream. with this patch, there is still an issue with the duplicated packets ids being sent (with different number of pages/flags). See in below multifd_send trace (before this change): multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000 multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000 With this commit there are still duplicated packets, but since no pages are being sent with sync flag set, next_packet_size is 0: multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000 multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0 If there is a suggestion how to fix this properly, I will be glad to use it. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> --- migration/multifd.c | 1 + 1 file changed, 1 insertion(+)