Message ID | 1520426065-40265-3-git-send-email-wei.w.wang@intel.com (mailing list archive) |
---|---|
State | New, archived |
* Wei Wang (wei.w.wang@intel.com) wrote:
> This patch adds an API to clear bits corresponding to guest free pages
> from the dirty bitmap. Split the free page block if it crosses the QEMU
> RAMBlock boundary.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/migration/misc.h |  2 ++
>  migration/ram.c          | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 77fd4f5..fae1acf 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -14,11 +14,13 @@
>  #ifndef MIGRATION_MISC_H
>  #define MIGRATION_MISC_H
>
> +#include "exec/cpu-common.h"
>  #include "qemu/notify.h"
>
>  /* migration/ram.c */
>
>  void ram_mig_init(void);
> +void qemu_guest_free_page_hint(void *addr, size_t len);
>
>  /* migration/block.c */
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 5e33e5c..e172798 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>      return 0;
>  }
>

This could do with some comments

> +void qemu_guest_free_page_hint(void *addr, size_t len)
> +{
> +    RAMBlock *block;
> +    ram_addr_t offset;
> +    size_t used_len, start, npages;

From your use I think the addr and len are coming raw from the guest;
so we need to take some care.

> +
> +    for (used_len = len; len > 0; len -= used_len) {

That initialisation of used_len is unusual; I'd rather put that in the
body.

> +        block = qemu_ram_block_from_host(addr, false, &offset);

Check for block != 0

> +        if (unlikely(offset + len > block->used_length)) {

I think to make that overflow safe, that should be:
    if (len > (block->used_length - offset)) {

But we'll need another test before it, because qemu_ram_block_from_host
seems to check max_length not used_length, so we need to check
for offset > block->used_length first

> +            used_len = block->used_length - offset;
> +            addr += used_len;
> +        }
> +
> +        start = offset >> TARGET_PAGE_BITS;
> +        npages = used_len >> TARGET_PAGE_BITS;
> +        ram_state->migration_dirty_pages -=
> +                bitmap_count_one_with_offset(block->bmap, start, npages);
> +        bitmap_clear(block->bmap, start, npages);

If this is happening while the migration is running, this isn't safe -
the migration code could clear a bit at about the same point this
happens, so that the count returned by bitmap_count_one_with_offset
wouldn't match the word that was cleared by bitmap_clear.

The only way I can see to fix it is to run over the range using
bitmap_test_and_clear_atomic, using the return value to decrement
the number of dirty pages.
But you also need to be careful with the update of the
migration_dirty_pages value itself, because that's also being read
by the migration thread.

Dave

> +    }
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section. When rcu-reclaims in the code
> --
> 1.8.3.1
>

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
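For reference, one way the bounds handling asked for above could look once folded into the posted function. This is only an illustrative sketch against the patch in this thread (ram_state, bitmap_count_one_with_offset and TARGET_PAGE_BITS come from the series and the QEMU tree), not the version that was eventually applied; the concurrent-clear problem raised above is deliberately left unsolved here.

```c
/* Sketch only: the posted qemu_guest_free_page_hint() with the NULL-block
 * check, the overflow-safe length clamp, and the used_len initialisation
 * moved into the loop body, as suggested in the review above. */
void qemu_guest_free_page_hint(void *addr, size_t len)
{
    RAMBlock *block;
    ram_addr_t offset;
    size_t used_len, start, npages;

    for (; len > 0; len -= used_len, addr += used_len) {
        block = qemu_ram_block_from_host(addr, false, &offset);
        if (unlikely(!block)) {
            return;
        }
        /* qemu_ram_block_from_host() validates against max_length, so the
         * offset may still lie beyond used_length; give up in that case. */
        if (unlikely(offset >= block->used_length)) {
            return;
        }
        /* Overflow-safe form of "offset + len > block->used_length". */
        if (len > block->used_length - offset) {
            used_len = block->used_length - offset;
        } else {
            used_len = len;
        }

        start = offset >> TARGET_PAGE_BITS;
        npages = used_len >> TARGET_PAGE_BITS;
        /* Still racy against the migration thread clearing bits at the
         * same time; see the rest of the discussion below. */
        ram_state->migration_dirty_pages -=
            bitmap_count_one_with_offset(block->bmap, start, npages);
        bitmap_clear(block->bmap, start, npages);
    }
}
```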
On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > +            used_len = block->used_length - offset;
> > +            addr += used_len;
> > +        }
> > +
> > +        start = offset >> TARGET_PAGE_BITS;
> > +        npages = used_len >> TARGET_PAGE_BITS;
> > +        ram_state->migration_dirty_pages -=
> > +                bitmap_count_one_with_offset(block->bmap, start, npages);
> > +        bitmap_clear(block->bmap, start, npages);
>
> If this is happening while the migration is running, this isn't safe -
> the migration code could clear a bit at about the same point this
> happens, so that the count returned by bitmap_count_one_with_offset
> wouldn't match the word that was cleared by bitmap_clear.
>
> The only way I can see to fix it is to run over the range using
> bitmap_test_and_clear_atomic, using the return value to decrement
> the number of dirty pages.
> But you also need to be careful with the update of the
> migration_dirty_pages value itself, because that's also being read
> by the migration thread.
>
> Dave

I see that there's migration_bitmap_sync but it does not seem to be
taken on all paths. E.g. migration_bitmap_clear_dirty and
migration_bitmap_find_dirty are called without that lock sometimes.
Thoughts?
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > +            used_len = block->used_length - offset;
> > > +            addr += used_len;
> > > +        }
> > > +
> > > +        start = offset >> TARGET_PAGE_BITS;
> > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > +        ram_state->migration_dirty_pages -=
> > > +                bitmap_count_one_with_offset(block->bmap, start, npages);
> > > +        bitmap_clear(block->bmap, start, npages);
> >
> > If this is happening while the migration is running, this isn't safe -
> > the migration code could clear a bit at about the same point this
> > happens, so that the count returned by bitmap_count_one_with_offset
> > wouldn't match the word that was cleared by bitmap_clear.
> >
> > The only way I can see to fix it is to run over the range using
> > bitmap_test_and_clear_atomic, using the return value to decrement
> > the number of dirty pages.
> > But you also need to be careful with the update of the
> > migration_dirty_pages value itself, because that's also being read
> > by the migration thread.
> >
> > Dave
>
> I see that there's migration_bitmap_sync but it does not seem to be

Do you mean bitmap_mutex?

> taken on all paths. E.g. migration_bitmap_clear_dirty and
> migration_bitmap_find_dirty are called without that lock sometimes.
> Thoughts?

Hmm, that doesn't seem to protect much at all! It looks like it was
originally added to handle hotplug causing the bitmaps to be resized;
that extension code was removed in 66103a5 so that lock can probably go.

I don't see how the lock would help us though; the migration thread is
scanning it most of the time so would have to have the lock held
most of the time.

Dave

> --
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, Mar 14, 2018 at 07:42:59PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > > +            used_len = block->used_length - offset;
> > > > +            addr += used_len;
> > > > +        }
> > > > +
> > > > +        start = offset >> TARGET_PAGE_BITS;
> > > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > > +        ram_state->migration_dirty_pages -=
> > > > +                bitmap_count_one_with_offset(block->bmap, start, npages);
> > > > +        bitmap_clear(block->bmap, start, npages);
> > >
> > > If this is happening while the migration is running, this isn't safe -
> > > the migration code could clear a bit at about the same point this
> > > happens, so that the count returned by bitmap_count_one_with_offset
> > > wouldn't match the word that was cleared by bitmap_clear.
> > >
> > > The only way I can see to fix it is to run over the range using
> > > bitmap_test_and_clear_atomic, using the return value to decrement
> > > the number of dirty pages.
> > > But you also need to be careful with the update of the
> > > migration_dirty_pages value itself, because that's also being read
> > > by the migration thread.
> > >
> > > Dave
> >
> > I see that there's migration_bitmap_sync but it does not seem to be
>
> Do you mean bitmap_mutex?

Yes. Sorry.

> > taken on all paths. E.g. migration_bitmap_clear_dirty and
> > migration_bitmap_find_dirty are called without that lock sometimes.
> > Thoughts?
>
> Hmm, that doesn't seem to protect much at all! It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
>
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
>
> Dave
>
> > --
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> * Wei Wang (wei.w.wang@intel.com) wrote:
>> This patch adds an API to clear bits corresponding to guest free pages
>> from the dirty bitmap. Split the free page block if it crosses the QEMU
>> RAMBlock boundary.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   include/migration/misc.h |  2 ++
>>   migration/ram.c          | 21 +++++++++++++++++++++
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 77fd4f5..fae1acf 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -14,11 +14,13 @@
>>   #ifndef MIGRATION_MISC_H
>>   #define MIGRATION_MISC_H
>>
>> +#include "exec/cpu-common.h"
>>   #include "qemu/notify.h"
>>
>>   /* migration/ram.c */
>>
>>   void ram_mig_init(void);
>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>
>>   /* migration/block.c */
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5e33e5c..e172798 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>       return 0;
>>   }
>>
> This could do with some comments

OK, I'll add some.

>
>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>> +{
>> +    RAMBlock *block;
>> +    ram_addr_t offset;
>> +    size_t used_len, start, npages;
> From your use I think the addr and len are coming raw from the guest;
> so we need to take some care.
>

Actually the "addr" here has been the host address that corresponds to
the guest free page. It's from elem->in_sg[0].iov_base.

>
>> +        if (unlikely(offset + len > block->used_length)) {
> I think to make that overflow safe, that should be:
>     if (len > (block->used_length - offset)) {
>
> But we'll need another test before it, because qemu_ram_block_from_host
> seems to check max_length not used_length, so we need to check
> for offset > block->used_length first

OK, how about adding an assert above, like this:

    block = qemu_ram_block_from_host(addr, false, &offset);
    assert (offset < block->used_length );
    if (!block)
        ...

The address corresponds to a guest free page, which means it should be
within used_length. If not, something weird happens, I think we'd better
to assert it in that case.

Best,
Wei
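Since the hint address is described here as coming from elem->in_sg[0].iov_base, the device-side caller would look roughly like the hypothetical sketch below. The handler name and the single-sg-entry assumption are illustrative and not taken from the series; only VirtQueueElement and qemu_guest_free_page_hint come from the code under discussion.

```c
/* Hypothetical virtio device-side caller: pass each free-page hint's
 * host address and length straight to the migration code. */
static void handle_free_page_hint(VirtQueueElement *elem)
{
    /* assumes the guest put the hinted range in a single in_sg entry */
    void *hint_addr = elem->in_sg[0].iov_base;
    size_t hint_len = elem->in_sg[0].iov_len;

    qemu_guest_free_page_hint(hint_addr, hint_len);
}
```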
On 03/15/2018 03:42 AM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
>>>> +            used_len = block->used_length - offset;
>>>> +            addr += used_len;
>>>> +        }
>>>> +
>>>> +        start = offset >> TARGET_PAGE_BITS;
>>>> +        npages = used_len >> TARGET_PAGE_BITS;
>>>> +        ram_state->migration_dirty_pages -=
>>>> +                bitmap_count_one_with_offset(block->bmap, start, npages);
>>>> +        bitmap_clear(block->bmap, start, npages);
>>> If this is happening while the migration is running, this isn't safe -
>>> the migration code could clear a bit at about the same point this
>>> happens, so that the count returned by bitmap_count_one_with_offset
>>> wouldn't match the word that was cleared by bitmap_clear.
>>>
>>> The only way I can see to fix it is to run over the range using
>>> bitmap_test_and_clear_atomic, using the return value to decrement
>>> the number of dirty pages.
>>> But you also need to be careful with the update of the
>>> migration_dirty_pages value itself, because that's also being read
>>> by the migration thread.
>>>
>>> Dave
>> I see that there's migration_bitmap_sync but it does not seem to be
> Do you mean bitmap_mutex?
>
>> taken on all paths. E.g. migration_bitmap_clear_dirty and
>> migration_bitmap_find_dirty are called without that lock sometimes.
>> Thoughts?

Right. The bitmap_mutex claims to protect modification of the bitmap, but
migration_bitmap_clear_dirty doesn't strictly follow the rule.

> Hmm, that doesn't seem to protect much at all! It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
>
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
>

How about adding the lock to migration_bitmap_clear_dirty, so that we
would have something like this:

    migration_bitmap_clear_dirty()
    {
        qemu_mutex_lock(&rs->bitmap_mutex);
        ret = test_and_clear_bit(page, rb->bmap);
        if (ret) {
            rs->migration_dirty_pages--;
        }
        ...
        qemu_mutex_unlock(&rs->bitmap_mutex);
    }

    qemu_guest_free_page_hint()
    {
        qemu_mutex_lock(&rs->bitmap_mutex);
        ...
        ram_state->migration_dirty_pages -=
            bitmap_count_one_with_offset(block->bmap, start, npages);
        bitmap_clear(block->bmap, start, npages);
        qemu_mutex_unlock(&rs->bitmap_mutex);
    }

The migration thread will hold the lock only when it clears a bit from
the bitmap. Or would you consider changing it to qemu_spin_lock?

Best,
Wei
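A slightly more concrete form of what Wei sketches above, assuming RAMState keeps a bitmap_mutex (a QemuMutex) and the migration_bitmap_clear_dirty() signature in ram.c at the time; the helper name on the hint side is illustrative, and this is only a sketch of the proposed locking, not necessarily what was merged.

```c
/* Migration-thread side: clear one dirty bit under the bitmap mutex so the
 * dirty-page counter stays consistent with the bitmap. */
static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                RAMBlock *rb,
                                                unsigned long page)
{
    bool ret;

    qemu_mutex_lock(&rs->bitmap_mutex);
    ret = test_and_clear_bit(page, rb->bmap);
    if (ret) {
        rs->migration_dirty_pages--;
    }
    qemu_mutex_unlock(&rs->bitmap_mutex);

    return ret;
}

/* Hint side (hypothetical helper): take the same mutex around the count
 * plus clear pair so the value subtracted from migration_dirty_pages
 * matches the bits that bitmap_clear() actually clears. */
static void free_page_hint_clear_dirty(RAMState *rs, RAMBlock *block,
                                       size_t start, size_t npages)
{
    qemu_mutex_lock(&rs->bitmap_mutex);
    rs->migration_dirty_pages -=
        bitmap_count_one_with_offset(block->bmap, start, npages);
    bitmap_clear(block->bmap, start, npages);
    qemu_mutex_unlock(&rs->bitmap_mutex);
}
```

The trade-off being discussed is that the migration thread only takes the mutex for the short test-and-clear of a single bit, so the hint path can grab it for a whole range without stalling the scan for long.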
On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> > * Wei Wang (wei.w.wang@intel.com) wrote:
> > > This patch adds an API to clear bits corresponding to guest free pages
> > > from the dirty bitmap. Split the free page block if it crosses the QEMU
> > > RAMBlock boundary.
> > >
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   include/migration/misc.h |  2 ++
> > >   migration/ram.c          | 21 +++++++++++++++++++++
> > >   2 files changed, 23 insertions(+)
> > >
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index 77fd4f5..fae1acf 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -14,11 +14,13 @@
> > >   #ifndef MIGRATION_MISC_H
> > >   #define MIGRATION_MISC_H
> > > +#include "exec/cpu-common.h"
> > >   #include "qemu/notify.h"
> > >   /* migration/ram.c */
> > >   void ram_mig_init(void);
> > > +void qemu_guest_free_page_hint(void *addr, size_t len);
> > >   /* migration/block.c */
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5e33e5c..e172798 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
> > >       return 0;
> > >   }
> > This could do with some comments
>
> OK, I'll add some.
>
> >
> > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > +{
> > > +    RAMBlock *block;
> > > +    ram_addr_t offset;
> > > +    size_t used_len, start, npages;
> > From your use I think the addr and len are coming raw from the guest;
> > so we need to take some care.
> >
>
> Actually the "addr" here has been the host address that corresponds to the
> guest free page. It's from elem->in_sg[0].iov_base.
>
> >
> > > +        if (unlikely(offset + len > block->used_length)) {
> > I think to make that overflow safe, that should be:
> > if (len > (block->used_length - offset)) {
> >
> > But we'll need another test before it, because qemu_ram_block_from_host
> > seems to check max_length not used_length, so we need to check
> > for offset > block->used_length first
>
> OK, how about adding an assert above, like this:
>
> block = qemu_ram_block_from_host(addr, false, &offset);
> assert (offset < block->used_length );
> if (!block)
> ...
>
> The address corresponds to a guest free page, which means it should be
> within used_length. If not, something weird happens, I think we'd better to
> assert it in that case.
>
> Best,
> Wei

What if memory has been removed by hotunplug after guest sent the
free page notification?

This seems to actually be likely to happen as memory being unplugged
would typically be mostly free.
On 03/15/2018 09:50 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
>> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
>>> * Wei Wang (wei.w.wang@intel.com) wrote:
>>>> This patch adds an API to clear bits corresponding to guest free pages
>>>> from the dirty bitmap. Split the free page block if it crosses the QEMU
>>>> RAMBlock boundary.
>>>>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>> ---
>>>>    include/migration/misc.h |  2 ++
>>>>    migration/ram.c          | 21 +++++++++++++++++++++
>>>>    2 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>>> index 77fd4f5..fae1acf 100644
>>>> --- a/include/migration/misc.h
>>>> +++ b/include/migration/misc.h
>>>> @@ -14,11 +14,13 @@
>>>>    #ifndef MIGRATION_MISC_H
>>>>    #define MIGRATION_MISC_H
>>>> +#include "exec/cpu-common.h"
>>>>    #include "qemu/notify.h"
>>>>    /* migration/ram.c */
>>>>    void ram_mig_init(void);
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>>>    /* migration/block.c */
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 5e33e5c..e172798 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>>>        return 0;
>>>>    }
>>> This could do with some comments
>> OK, I'll add some.
>>
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>>>> +{
>>>> +    RAMBlock *block;
>>>> +    ram_addr_t offset;
>>>> +    size_t used_len, start, npages;
>>> From your use I think the addr and len are coming raw from the guest;
>>> so we need to take some care.
>>>
>> Actually the "addr" here has been the host address that corresponds to the
>> guest free page. It's from elem->in_sg[0].iov_base.
>>
>>>> +        if (unlikely(offset + len > block->used_length)) {
>>> I think to make that overflow safe, that should be:
>>> if (len > (block->used_length - offset)) {
>>>
>>> But we'll need another test before it, because qemu_ram_block_from_host
>>> seems to check max_length not used_length, so we need to check
>>> for offset > block->used_length first
>> OK, how about adding an assert above, like this:
>>
>> block = qemu_ram_block_from_host(addr, false, &offset);
>> assert (offset < block->used_length );
>> if (!block)
>> ...
>>
>> The address corresponds to a guest free page, which means it should be
>> within used_length. If not, something weird happens, I think we'd better to
>> assert it in that case.
>>
>> Best,
>> Wei
> What if memory has been removed by hotunplug after guest sent the
> free page notification?
>
> This seems to actually be likely to happen as memory being unplugged
> would typically be mostly free.

OK, thanks for the reminder. Instead of using an assert, I think we can
let the function just return if (offset > block->used_length).

Best,
Wei
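The direction agreed at this point is to drop the assert and silently ignore hints that no longer fall inside the block's used range. As a sketch (not the committed code), the check at the top of the loop would become something like:

```c
block = qemu_ram_block_from_host(addr, false, &offset);
if (!block || offset >= block->used_length) {
    /* The hinted memory may have been hot-unplugged (or the block
     * resized) after the guest reported it as free; just ignore it. */
    return;
}
```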
```diff
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 77fd4f5..fae1acf 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5c..e172798 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;
+
+    for (used_len = len; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(offset + len > block->used_length)) {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+        ram_state->migration_dirty_pages -=
+                bitmap_count_one_with_offset(block->bmap, start, npages);
+        bitmap_clear(block->bmap, start, npages);
+    }
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section. When rcu-reclaims in the code
```
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/migration/misc.h |  2 ++
 migration/ram.c          | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)