[0/4] Trivial patches from multifd device state transfer support patch set

Message ID cover.1730203967.git.maciej.szmigiero@oracle.com (mailing list archive)

Maciej S. Szmigiero Oct. 29, 2024, 2:58 p.m. UTC
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

A new version of the multifd device state transfer support with VFIO consumer
patch set is being prepared; the previous version and the associated
discussion are available here:
https://lore.kernel.org/qemu-devel/cover.1724701542.git.maciej.szmigiero@oracle.com/

This new version was originally targeting QEMU 9.2, but that schedule proved
too optimistic due to the sheer number of invasive changes/rework required,
especially with respect to the VFIO internal threads management and their
synchronization with the migration core.

In addition to these changes, the recently merged commit 3b5948f808e3
("vfio/migration: Report only stop-copy size in vfio_state_pending_exact()")
seems to have uncovered a race between multifd RAM and device state transfers:
the RAM transfer sender finishes the multifd stream with a SYNC in
ram_save_complete(), but the multifd receive channels are only released
from this SYNC after the migration is wholly complete, in
process_incoming_migration_bh().

The above causes problems if the multifd channels still need to be
running after the RAM transfer is completed, for example because
there is still remaining device state to be transferred.
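
The ordering problem described above can be illustrated with a toy model.
This is only a sketch under stated assumptions: the type and function names
(recv_channel, channel_offer, channel_release, the pkt_kind values) are
hypothetical and are not QEMU's actual multifd code. It shows a receive
channel that parks on a SYNC and cannot handle a later device state packet
until something releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet kinds; illustrative names, not QEMU's. */
enum pkt_kind { PKT_RAM, PKT_SYNC, PKT_DEVICE_STATE };

/* Toy model of a single multifd receive channel. */
struct recv_channel {
    bool parked_on_sync;   /* set once a SYNC packet has been handled */
    int handled;           /* number of packets handled so far */
};

/*
 * Offer a packet to the channel.
 * Returns true if the packet was handled, false if the channel is still
 * parked on an earlier SYNC and therefore cannot make progress.
 */
static bool channel_offer(struct recv_channel *ch, enum pkt_kind kind)
{
    assert(ch != NULL);
    if (ch->parked_on_sync) {
        return false;
    }
    ch->handled++;
    if (kind == PKT_SYNC) {
        ch->parked_on_sync = true;
    }
    return true;
}

/*
 * Models the release from SYNC. In the scenario described in the text,
 * this only happens once the whole migration is complete.
 */
static void channel_release(struct recv_channel *ch)
{
    assert(ch != NULL);
    ch->parked_on_sync = false;
}
```

In this model, offering PKT_RAM and then PKT_SYNC succeeds, but a
PKT_DEVICE_STATE packet offered afterwards is refused until
channel_release() is called, which mirrors device state arriving on a
channel that is still waiting on the final RAM SYNC.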

Since the QEMU 9.2 code freeze is coming, I've separated the small,
uncontroversial commits from that WiP main patch set here, some of which were
already reviewed during previous main patch set iterations.

This way, at least future code conflicts can be reduced, along with the
number of patches that need to be carried in future versions of the main
patch set.


Maciej S. Szmigiero (4):
  vfio/migration: Add save_{iterate,complete_precopy}_started trace
    events
  migration/ram: Add load start trace event
  migration/multifd: Zero p->flags before starting filling a packet
  migration: Document the BQL behavior of load SaveVMHandlers

 hw/vfio/migration.c           | 13 +++++++++++++
 hw/vfio/trace-events          |  3 +++
 include/hw/vfio/vfio-common.h |  3 +++
 include/migration/register.h  |  4 ++++
 migration/multifd.c           |  2 +-
 migration/ram.c               |  1 +
 migration/trace-events        |  1 +
 7 files changed, 26 insertions(+), 1 deletion(-)
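
As background for the third patch in the list above ("migration/multifd:
Zero p->flags before starting filling a packet"), the following is a
minimal sketch of why clearing the flags of a reused per-channel structure
matters. It is not the actual patch: the struct layout, the PKT_FLAG_SYNC
bit and the helper names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_FLAG_SYNC 0x1u   /* hypothetical flag bit */

/* Hypothetical per-channel send state, reused for every packet. */
struct send_params {
    uint32_t flags;
    uint32_t payload_len;
};

/*
 * Begin filling a new packet. Zeroing the flags first ensures that a flag
 * set for the previous packet cannot leak into this one.
 */
static void packet_begin(struct send_params *p)
{
    assert(p != NULL);
    p->flags = 0;
    p->payload_len = 0;
}

/* Mark the packet currently being filled as a SYNC packet. */
static void packet_mark_sync(struct send_params *p)
{
    assert(p != NULL);
    p->flags |= PKT_FLAG_SYNC;
}
```

If packet_begin() did not reset p->flags, a packet filled after a SYNC
packet on the same structure would still carry the SYNC bit.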

Comments

Peter Xu Oct. 29, 2024, 8:40 p.m. UTC | #1
On Tue, Oct 29, 2024 at 03:58:12PM +0100, Maciej S. Szmigiero wrote:
> Maciej S. Szmigiero (4):
>   vfio/migration: Add save_{iterate,complete_precopy}_started trace
>     events
>   migration/ram: Add load start trace event
>   migration/multifd: Zero p->flags before starting filling a packet
>   migration: Document the BQL behavior of load SaveVMHandlers

I queued patches 2-3.  Patch 4 is OK to be merged even after soft freeze if
it's a doc-only change, but we don't need to rush either.

Thanks,
Maciej S. Szmigiero Oct. 29, 2024, 8:46 p.m. UTC | #2
On 29.10.2024 21:40, Peter Xu wrote:
> On Tue, Oct 29, 2024 at 03:58:12PM +0100, Maciej S. Szmigiero wrote:
>> Maciej S. Szmigiero (4):
>>    vfio/migration: Add save_{iterate,complete_precopy}_started trace
>>      events
>>    migration/ram: Add load start trace event
>>    migration/multifd: Zero p->flags before starting filling a packet
>>    migration: Document the BQL behavior of load SaveVMHandlers
> 
> I queued patches 2-3.  Patch 4 is OK to be merged even after soft freeze if
> it's a doc-only change, but we don't need to rush either.
> 
> Thanks,
> 

Thanks!

Maciej