Message ID | 20200506094948.76388-8-david@redhat.com (mailing list archive)
---|---
State | New, archived
Series | virtio-mem: Paravirtualized memory hot(un)plug
* David Hildenbrand (david@redhat.com) wrote:
> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> to mark RAM block discards to be broken - however, to keep it simple
> use ram_block_discard_is_required() instead of inhibiting.

Should this be dependent on whether rdma->pin_all is set?
Even with !pin_all some will be pinned at any given time
(when it's registered with the rdma stack).

Dave

> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/rdma.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f61587891b..029adbb950 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -29,6 +29,7 @@
>  #include "qemu/sockets.h"
>  #include "qemu/bitmap.h"
>  #include "qemu/coroutine.h"
> +#include "exec/memory.h"
>  #include <sys/socket.h>
>  #include <netdb.h>
>  #include <arpa/inet.h>
> @@ -4017,8 +4018,14 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>      Error *local_err = NULL;
>
>      trace_rdma_start_incoming_migration();
> -    rdma = qemu_rdma_data_init(host_port, &local_err);
>
> +    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
> +    if (ram_block_discard_is_required()) {
> +        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
> +        return;
> +    }
> +
> +    rdma = qemu_rdma_data_init(host_port, &local_err);
>      if (rdma == NULL) {
>          goto err;
>      }
> @@ -4064,10 +4071,17 @@ void rdma_start_outgoing_migration(void *opaque,
>                                     const char *host_port, Error **errp)
>  {
>      MigrationState *s = opaque;
> -    RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
>      RDMAContext *rdma_return_path = NULL;
> +    RDMAContext *rdma;
>      int ret = 0;
>
> +    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
> +    if (ram_block_discard_is_required()) {
> +        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
> +        return;
> +    }
> +
> +    rdma = qemu_rdma_data_init(host_port, errp);
>      if (rdma == NULL) {
>          goto err;
>      }
> --
> 2.25.3

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>> to mark RAM block discards to be broken - however, to keep it simple
>> use ram_block_discard_is_required() instead of inhibiting.
>
> Should this be dependent on whether rdma->pin_all is set?
> Even with !pin_all some will be pinned at any given time
> (when it's registered with the rdma stack).

Do you know how much memory this is? Is such memory only temporarily pinned?

At least with special-cases of vfio, it's acceptable if some memory is
temporarily pinned - we assume it's only the working set of the driver,
which guests will not inflate as long as they don't want to shoot
themselves in the foot.

This here sounds like the guest does not know the pinned memory is
special, right?
* David Hildenbrand (david@redhat.com) wrote:
> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> >> to mark RAM block discards to be broken - however, to keep it simple
> >> use ram_block_discard_is_required() instead of inhibiting.
> >
> > Should this be dependent on whether rdma->pin_all is set?
> > Even with !pin_all some will be pinned at any given time
> > (when it's registered with the rdma stack).
>
> Do you know how much memory this is? Is such memory only temporarily pinned?

With pin_all not set, only a subset of memory, I think multiple 1MB
chunks, are pinned at any one time.

> At least with special-cases of vfio, it's acceptable if some memory is
> temporarily pinned - we assume it's only the working set of the driver,
> which guests will not inflate as long as they don't want to shoot
> themselves in the foot.
>
> This here sounds like the guest does not know the pinned memory is
> special, right?

Right - for RDMA it's all of memory that's being transferred, and the
guest doesn't see when each part is transferred.

Dave

> --
> Thanks,
>
> David / dhildenb

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>>>> to mark RAM block discards to be broken - however, to keep it simple
>>>> use ram_block_discard_is_required() instead of inhibiting.
>>>
>>> Should this be dependent on whether rdma->pin_all is set?
>>> Even with !pin_all some will be pinned at any given time
>>> (when it's registered with the rdma stack).
>>
>> Do you know how much memory this is? Is such memory only temporarily pinned?
>
> With pin_all not set, only a subset of memory, I think multiple 1MB
> chunks, are pinned at any one time.
>
>> At least with special-cases of vfio, it's acceptable if some memory is
>> temporarily pinned - we assume it's only the working set of the driver,
>> which guests will not inflate as long as they don't want to shoot
>> themselves in the foot.
>>
>> This here sounds like the guest does not know the pinned memory is
>> special, right?
>
> Right - for RDMA it's all of memory that's being transferred, and the
> guest doesn't see when each part is transferred.

Okay, so all memory will eventually be pinned, just not at the same
time, correct?

I think this implies that any memory that was previously discarded will
be backed by new pages, meaning we will consume more memory than intended.

If so, always disabling discarding of RAM seems to be the right thing to do.
* David Hildenbrand (david@redhat.com) wrote:
> On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> >>> * David Hildenbrand (david@redhat.com) wrote:
> >>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> >>>> to mark RAM block discards to be broken - however, to keep it simple
> >>>> use ram_block_discard_is_required() instead of inhibiting.
> >>>
> >>> Should this be dependent on whether rdma->pin_all is set?
> >>> Even with !pin_all some will be pinned at any given time
> >>> (when it's registered with the rdma stack).
> >>
> >> Do you know how much memory this is? Is such memory only temporarily pinned?
> >
> > With pin_all not set, only a subset of memory, I think multiple 1MB
> > chunks, are pinned at any one time.
> >
> >> At least with special-cases of vfio, it's acceptable if some memory is
> >> temporarily pinned - we assume it's only the working set of the driver,
> >> which guests will not inflate as long as they don't want to shoot
> >> themselves in the foot.
> >>
> >> This here sounds like the guest does not know the pinned memory is
> >> special, right?
> >
> > Right - for RDMA it's all of memory that's being transferred, and the
> > guest doesn't see when each part is transferred.
>
> Okay, so all memory will eventually be pinned, just not at the same
> time, correct?
>
> I think this implies that any memory that was previously discarded will
> be backed by new pages, meaning we will consume more memory than intended.
>
> If so, always disabling discarding of RAM seems to be the right thing to do.

Yeh that's probably true, although there's a check for 'buffer_is_zero'
in the !rdma->pin_all case, if the entire area is zero (or probably if
unmapped) then it sends a notification rather than registering; see
qemu_rdma_write_one and search for 'This chunk has not yet been
registered, so first check to see'

Dave

> --
> Thanks,
>
> David / dhildenb

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 15.05.20 20:36, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
>>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>>>>>> to mark RAM block discards to be broken - however, to keep it simple
>>>>>> use ram_block_discard_is_required() instead of inhibiting.
>>>>>
>>>>> Should this be dependent on whether rdma->pin_all is set?
>>>>> Even with !pin_all some will be pinned at any given time
>>>>> (when it's registered with the rdma stack).
>>>>
>>>> Do you know how much memory this is? Is such memory only temporarily pinned?
>>>
>>> With pin_all not set, only a subset of memory, I think multiple 1MB
>>> chunks, are pinned at any one time.
>>>
>>>> At least with special-cases of vfio, it's acceptable if some memory is
>>>> temporarily pinned - we assume it's only the working set of the driver,
>>>> which guests will not inflate as long as they don't want to shoot
>>>> themselves in the foot.
>>>>
>>>> This here sounds like the guest does not know the pinned memory is
>>>> special, right?
>>>
>>> Right - for RDMA it's all of memory that's being transferred, and the
>>> guest doesn't see when each part is transferred.
>>
>> Okay, so all memory will eventually be pinned, just not at the same
>> time, correct?
>>
>> I think this implies that any memory that was previously discarded will
>> be backed by new pages, meaning we will consume more memory than intended.
>>
>> If so, always disabling discarding of RAM seems to be the right thing to do.
>
> Yeh that's probably true, although there's a check for 'buffer_is_zero'
> in the !rdma->pin_all case, if the entire area is zero (or probably if
> unmapped) then it sends a notification rather than registering; see
> qemu_rdma_write_one and search for 'This chunk has not yet been
> registered, so first check to see'

Right, if the whole chunk is zero, it will send a "compressed" zero
chunk to the target. That will result in a memset() in case the
destination is not already zero. So, both the source and the
destination will at least be read.

But this only works if a complete chunk (1MB) is zero IIUC. If only one
page within a chunk is not zero (e.g., not inflated), the whole chunk
will be pinned.

Also, "disabled chunk registration" seems to be another case.

https://wiki.qemu.org/Features/RDMALiveMigration

"Finally, zero pages are only checked if a page has not yet been
registered using chunk registration (or not checked at all and
unconditionally written if chunk registration is disabled. This is
accomplished using the "Compress" command listed above. If the page
*has* been registered then we check the entire chunk for zero. Only if
the entire chunk is zero, then we send a compress command to zap the
page on the other side."
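The chunk-granular behaviour discussed above can be sketched as a small model. This is illustrative only, not QEMU's code: `chunk_is_zero` stands in for QEMU's `buffer_is_zero()`, and the 1 MiB chunk size and the `chunk_needs_registration` helper are assumptions matching the granularity mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk size, matching the ~1MB granularity discussed above. */
#define CHUNK_SIZE (1024 * 1024)

/* Stand-in for QEMU's buffer_is_zero(): true only if every byte is zero. */
static bool chunk_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* A fully-zero chunk can be announced as a "compressed" zero chunk;
 * a single non-zero page anywhere in the chunk forces the whole chunk
 * to be registered with (and pinned by) the RDMA stack. */
static bool chunk_needs_registration(const uint8_t *chunk, size_t len)
{
    return !chunk_is_zero(chunk, len);
}

/* Example chunk; static storage is zero-initialized by the C runtime. */
static uint8_t example_chunk[CHUNK_SIZE];
```

The point made in the thread falls out directly: discarded (zero) memory stays unpinned only while its entire enclosing chunk is zero.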
diff --git a/migration/rdma.c b/migration/rdma.c
index f61587891b..029adbb950 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -29,6 +29,7 @@
 #include "qemu/sockets.h"
 #include "qemu/bitmap.h"
 #include "qemu/coroutine.h"
+#include "exec/memory.h"
 #include <sys/socket.h>
 #include <netdb.h>
 #include <arpa/inet.h>
@@ -4017,8 +4018,14 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     Error *local_err = NULL;

     trace_rdma_start_incoming_migration();
-    rdma = qemu_rdma_data_init(host_port, &local_err);

+    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+    if (ram_block_discard_is_required()) {
+        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+        return;
+    }
+
+    rdma = qemu_rdma_data_init(host_port, &local_err);
     if (rdma == NULL) {
         goto err;
     }
@@ -4064,10 +4071,17 @@ void rdma_start_outgoing_migration(void *opaque,
                                    const char *host_port, Error **errp)
 {
     MigrationState *s = opaque;
-    RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
     RDMAContext *rdma_return_path = NULL;
+    RDMAContext *rdma;
     int ret = 0;

+    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+    if (ram_block_discard_is_required()) {
+        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+        return;
+    }
+
+    rdma = qemu_rdma_data_init(host_port, errp);
     if (rdma == NULL) {
         goto err;
     }
RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
to mark RAM block discards to be broken - however, to keep it simple
use ram_block_discard_is_required() instead of inhibiting.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/rdma.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
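The commit message above leans on ram_block_discard_is_required() rather than registering RDMA as a party that inhibits discards. A rough model of the coordination this implies is sketched below; the counter scheme and the ram_block_discard_require()/ram_block_discard_disable() names are a simplification for illustration, not QEMU's exact implementation (the patch comment refers to an interface called ram_block_discard_set_broken()).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative sketch: "requirers" (e.g. virtio-mem) need RAM discards to
 * keep working; "disablers" (e.g. RDMA migration, vfio) cannot tolerate
 * them. The two sides exclude each other via two counters. */
static int discard_disabled_cnt;
static int discard_required_cnt;

static bool ram_block_discard_is_required(void)
{
    return discard_required_cnt > 0;
}

static int ram_block_discard_require(bool state)
{
    if (!state) {
        discard_required_cnt--;
        return 0;
    }
    if (discard_disabled_cnt) {
        return -EBUSY; /* something (e.g. a migration) cannot handle discards */
    }
    discard_required_cnt++;
    return 0;
}

static int ram_block_discard_disable(bool state)
{
    if (!state) {
        discard_disabled_cnt--;
        return 0;
    }
    if (discard_required_cnt) {
        return -EBUSY; /* something (e.g. virtio-mem) requires discards */
    }
    discard_disabled_cnt++;
    return 0;
}
```

Under this model, checking ram_block_discard_is_required() up front in rdma_start_*_migration() is the simple one-shot variant the commit message describes: it fails migration start when a discard-requiring device is present, instead of holding a disable reference for the migration's lifetime.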