
[v4,2/4] migration: API to clear bits of guest free pages from the dirty bitmap

Message ID 1520426065-40265-3-git-send-email-wei.w.wang@intel.com (mailing list archive)
State New, archived

Commit Message

Wang, Wei W March 7, 2018, 12:34 p.m. UTC
This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/migration/misc.h |  2 ++
 migration/ram.c          | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

Comments

Dr. David Alan Gilbert March 14, 2018, 6:11 p.m. UTC | #1
* Wei Wang (wei.w.wang@intel.com) wrote:
> This patch adds an API to clear bits corresponding to guest free pages
> from the dirty bitmap. Spilt the free page block if it crosses the QEMU
> RAMBlock boundary.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/migration/misc.h |  2 ++
>  migration/ram.c          | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 77fd4f5..fae1acf 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -14,11 +14,13 @@
>  #ifndef MIGRATION_MISC_H
>  #define MIGRATION_MISC_H
>  
> +#include "exec/cpu-common.h"
>  #include "qemu/notify.h"
>  
>  /* migration/ram.c */
>  
>  void ram_mig_init(void);
> +void qemu_guest_free_page_hint(void *addr, size_t len);
>  
>  /* migration/block.c */
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 5e33e5c..e172798 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>      return 0;
>  }
>  

This could do with some comments

> +void qemu_guest_free_page_hint(void *addr, size_t len)
> +{
> +    RAMBlock *block;
> +    ram_addr_t offset;
> +    size_t used_len, start, npages;

From your use I think the addr and len are coming raw from the guest;
so we need to take some care.

> +
> +    for (used_len = len; len > 0; len -= used_len) {

That initialisation of used_len is unusual; I'd rather put that
in the body.

> +        block = qemu_ram_block_from_host(addr, false, &offset);

Check that block != NULL before dereferencing it.

> +        if (unlikely(offset + len > block->used_length)) {

I think to make that overflow safe, that should be:
  if (len > (block->used_length - offset)) {

But we'll need another test before it, because qemu_ram_block_from_host
seems to check max_length not used_length, so we need to check
for offset > block->used_length first.

> +            used_len = block->used_length - offset;
> +            addr += used_len;
> +        }
> +
> +        start = offset >> TARGET_PAGE_BITS;
> +        npages = used_len >> TARGET_PAGE_BITS;
> +        ram_state->migration_dirty_pages -=
> +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> +        bitmap_clear(block->bmap, start, npages);

If this is happening while the migration is running, this isn't safe -
the migration code could clear a bit at about the same point this
happens, so that the count returned by bitmap_count_one_with_offset
wouldn't match the word that was cleared by bitmap_clear.

The only way I can see to fix it is to run over the range using
bitmap_test_and_clear_atomic, using the return value to decrement
the number of dirty pages.
But you also need to be careful with the update of the
migration_dirty_pages value itself, because that's also being read
by the migration thread.

Dave

> +    }
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Michael S. Tsirkin March 14, 2018, 7:16 p.m. UTC | #2
On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > +            used_len = block->used_length - offset;
> > +            addr += used_len;
> > +        }
> > +
> > +        start = offset >> TARGET_PAGE_BITS;
> > +        npages = used_len >> TARGET_PAGE_BITS;
> > +        ram_state->migration_dirty_pages -=
> > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > +        bitmap_clear(block->bmap, start, npages);
> 
> If this is happening while the migration is running, this isn't safe -
> the migration code could clear a bit at about the same point this
> happens, so that the count returned by bitmap_count_one_with_offset
> wouldn't match the word that was cleared by bitmap_clear.
> 
> The only way I can see to fix it is to run over the range using
> bitmap_test_and_clear_atomic, using the return value to decrement
> the number of dirty pages.
> But you also need to be careful with the update of the
> migration_dirty_pages value itself, because that's also being read
> by the migration thread.
> 
> Dave

I see that there's migration_bitmap_sync but it does not seem to be
taken on all paths. E.g. migration_bitmap_clear_dirty and
migration_bitmap_find_dirty are called without that lock sometimes.
Thoughts?
Dr. David Alan Gilbert March 14, 2018, 7:42 p.m. UTC | #3
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > +            used_len = block->used_length - offset;
> > > +            addr += used_len;
> > > +        }
> > > +
> > > +        start = offset >> TARGET_PAGE_BITS;
> > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > +        ram_state->migration_dirty_pages -=
> > > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > > +        bitmap_clear(block->bmap, start, npages);
> > 
> > If this is happening while the migration is running, this isn't safe -
> > the migration code could clear a bit at about the same point this
> > happens, so that the count returned by bitmap_count_one_with_offset
> > wouldn't match the word that was cleared by bitmap_clear.
> > 
> > The only way I can see to fix it is to run over the range using
> > bitmap_test_and_clear_atomic, using the return value to decrement
> > the number of dirty pages.
> > But you also need to be careful with the update of the
> > migration_dirty_pages value itself, because that's also being read
> > by the migration thread.
> > 
> > Dave
> 
> I see that there's migration_bitmap_sync but it does not seem to be

Do you mean bitmap_mutex?

> taken on all paths. E.g. migration_bitmap_clear_dirty and
> migration_bitmap_find_dirty are called without that lock sometimes.
> Thoughts?

Hmm, that doesn't seem to protect much at all!  It looks like it was
originally added to handle hotplug causing the bitmaps to be resized;
that extension code was removed in 66103a5 so that lock can probably go.

I don't see how the lock would help us though; the migration thread is
scanning it most of the time so would have to have the lock held
most of the time.

Dave

> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Michael S. Tsirkin March 14, 2018, 8:38 p.m. UTC | #4
On Wed, Mar 14, 2018 at 07:42:59PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > > +            used_len = block->used_length - offset;
> > > > +            addr += used_len;
> > > > +        }
> > > > +
> > > > +        start = offset >> TARGET_PAGE_BITS;
> > > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > > +        ram_state->migration_dirty_pages -=
> > > > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > > > +        bitmap_clear(block->bmap, start, npages);
> > > 
> > > If this is happening while the migration is running, this isn't safe -
> > > the migration code could clear a bit at about the same point this
> > > happens, so that the count returned by bitmap_count_one_with_offset
> > > wouldn't match the word that was cleared by bitmap_clear.
> > > 
> > > The only way I can see to fix it is to run over the range using
> > > bitmap_test_and_clear_atomic, using the return value to decrement
> > > the number of dirty pages.
> > > But you also need to be careful with the update of the
> > > migration_dirty_pages value itself, because that's also being read
> > > by the migration thread.
> > > 
> > > Dave
> > 
> > I see that there's migration_bitmap_sync but it does not seem to be
> 
> Do you mean bitmap_mutex?

Yes. Sorry.

> > taken on all paths. E.g. migration_bitmap_clear_dirty and
> > migration_bitmap_find_dirty are called without that lock sometimes.
> > Thoughts?
> 
> Hmm, that doesn't seem to protect much at all!  It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
> 
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
> 
> Dave
> 
> > -- 
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Wang, Wei W March 15, 2018, 10:52 a.m. UTC | #5
On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> * Wei Wang (wei.w.wang@intel.com) wrote:
>> This patch adds an API to clear bits corresponding to guest free pages
>> from the dirty bitmap. Spilt the free page block if it crosses the QEMU
>> RAMBlock boundary.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   include/migration/misc.h |  2 ++
>>   migration/ram.c          | 21 +++++++++++++++++++++
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 77fd4f5..fae1acf 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -14,11 +14,13 @@
>>   #ifndef MIGRATION_MISC_H
>>   #define MIGRATION_MISC_H
>>   
>> +#include "exec/cpu-common.h"
>>   #include "qemu/notify.h"
>>   
>>   /* migration/ram.c */
>>   
>>   void ram_mig_init(void);
>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>   
>>   /* migration/block.c */
>>   
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5e33e5c..e172798 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>       return 0;
>>   }
>>   
> This could do with some comments

OK, I'll add some.

>
>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>> +{
>> +    RAMBlock *block;
>> +    ram_addr_t offset;
>> +    size_t used_len, start, npages;
>  From your use I think the addr and len are coming raw from the guest;
> so we need to take some care.
>

Actually, the "addr" here is already the host address that corresponds to
the guest free page. It's from elem->in_sg[0].iov_base.

>
>> +        if (unlikely(offset + len > block->used_length)) {
> I think to make that overflow safe, that should be:
>    if (len > (block->used_length - offset)) {
>
> But we'll need another test before it, because qemu_ram_block_from_host
> seems to check max_length not used_length, so we need to check
> for offset > block->used_length first

OK, how about adding an assert above, like this:

block = qemu_ram_block_from_host(addr, false, &offset);
assert(offset < block->used_length);
if (!block)
     ...

The address corresponds to a guest free page, which means it should be 
within used_length. If not, something weird has happened, and I think we'd 
better assert in that case.

Best,
Wei
Wang, Wei W March 15, 2018, 11:10 a.m. UTC | #6
On 03/15/2018 03:42 AM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
>>>> +            used_len = block->used_length - offset;
>>>> +            addr += used_len;
>>>> +        }
>>>> +
>>>> +        start = offset >> TARGET_PAGE_BITS;
>>>> +        npages = used_len >> TARGET_PAGE_BITS;
>>>> +        ram_state->migration_dirty_pages -=
>>>> +                      bitmap_count_one_with_offset(block->bmap, start, npages);
>>>> +        bitmap_clear(block->bmap, start, npages);
>>> If this is happening while the migration is running, this isn't safe -
>>> the migration code could clear a bit at about the same point this
>>> happens, so that the count returned by bitmap_count_one_with_offset
>>> wouldn't match the word that was cleared by bitmap_clear.
>>>
>>> The only way I can see to fix it is to run over the range using
>>> bitmap_test_and_clear_atomic, using the return value to decrement
>>> the number of dirty pages.
>>> But you also need to be careful with the update of the
>>> migration_dirty_pages value itself, because that's also being read
>>> by the migration thread.
>>>
>>> Dave
>> I see that there's migration_bitmap_sync but it does not seem to be
> Do you mean bitmap_mutex?
>
>> taken on all paths. E.g. migration_bitmap_clear_dirty and
>> migration_bitmap_find_dirty are called without that lock sometimes.
>> Thoughts?

Right. The bitmap_mutex claims to protect modification of the bitmap, but 
migration_bitmap_clear_dirty doesn't strictly follow the rule.

> Hmm, that doesn't seem to protect much at all!  It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
>
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
>



How about adding the lock to migration_bitmap_clear_dirty, and we will 
have something like this:

migration_bitmap_clear_dirty()
{
     qemu_mutex_lock(&rs->bitmap_mutex);
     ret = test_and_clear_bit(page, rb->bmap);
      if (ret) {
         rs->migration_dirty_pages--;
     }
     ...
     qemu_mutex_unlock(&rs->bitmap_mutex);
}


qemu_guest_free_page_hint()
{
     qemu_mutex_lock(&rs->bitmap_mutex);
     ...
     ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);
     bitmap_clear(block->bmap, start, npages);
     qemu_mutex_unlock(&rs->bitmap_mutex);
}


The migration thread will hold the lock only when it clears a bit from 
the bitmap. Or would you consider changing it to a qemu_spin_lock?

Best,
Wei
Michael S. Tsirkin March 15, 2018, 1:50 p.m. UTC | #7
On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> > * Wei Wang (wei.w.wang@intel.com) wrote:
> > > This patch adds an API to clear bits corresponding to guest free pages
> > > from the dirty bitmap. Spilt the free page block if it crosses the QEMU
> > > RAMBlock boundary.
> > > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   include/migration/misc.h |  2 ++
> > >   migration/ram.c          | 21 +++++++++++++++++++++
> > >   2 files changed, 23 insertions(+)
> > > 
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index 77fd4f5..fae1acf 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -14,11 +14,13 @@
> > >   #ifndef MIGRATION_MISC_H
> > >   #define MIGRATION_MISC_H
> > > +#include "exec/cpu-common.h"
> > >   #include "qemu/notify.h"
> > >   /* migration/ram.c */
> > >   void ram_mig_init(void);
> > > +void qemu_guest_free_page_hint(void *addr, size_t len);
> > >   /* migration/block.c */
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5e33e5c..e172798 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
> > >       return 0;
> > >   }
> > This could do with some comments
> 
> OK, I'll add some.
> 
> > 
> > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > +{
> > > +    RAMBlock *block;
> > > +    ram_addr_t offset;
> > > +    size_t used_len, start, npages;
> >  From your use I think the addr and len are coming raw from the guest;
> > so we need to take some care.
> > 
> 
> Actually the "addr" here has been the host address that corresponds to the
> guest free page. It's from elem->in_sg[0].iov_base.
> 
> > 
> > > +        if (unlikely(offset + len > block->used_length)) {
> > I think to make that overflow safe, that should be:
> >    if (len > (block->used_length - offset)) {
> > 
> > But we'll need another test before it, because qemu_ram_block_from_host
> > seems to check max_length not used_length, so we need to check
> > for offset > block->used_length first
> 
> OK, how about adding an assert above, like this:
> 
> block = qemu_ram_block_from_host(addr, false, &offset);
> assert (offset  < block->used_length );
> if (!block)
>     ...
> 
> The address corresponds to a guest free page, which means it should be
> within used_length. If not, something weird happens, I think we'd better to
> assert it in that case.
> 
> Best,
> Wei

What if memory has been removed by hotunplug after guest sent the
free page notification?

This actually seems likely to happen, as memory being unplugged
would typically be mostly free.
Wang, Wei W March 16, 2018, 11:24 a.m. UTC | #8
On 03/15/2018 09:50 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
>> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
>>> * Wei Wang (wei.w.wang@intel.com) wrote:
>>>> This patch adds an API to clear bits corresponding to guest free pages
>>>> from the dirty bitmap. Spilt the free page block if it crosses the QEMU
>>>> RAMBlock boundary.
>>>>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>> ---
>>>>    include/migration/misc.h |  2 ++
>>>>    migration/ram.c          | 21 +++++++++++++++++++++
>>>>    2 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>>> index 77fd4f5..fae1acf 100644
>>>> --- a/include/migration/misc.h
>>>> +++ b/include/migration/misc.h
>>>> @@ -14,11 +14,13 @@
>>>>    #ifndef MIGRATION_MISC_H
>>>>    #define MIGRATION_MISC_H
>>>> +#include "exec/cpu-common.h"
>>>>    #include "qemu/notify.h"
>>>>    /* migration/ram.c */
>>>>    void ram_mig_init(void);
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>>>    /* migration/block.c */
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 5e33e5c..e172798 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>>>        return 0;
>>>>    }
>>> This could do with some comments
>> OK, I'll add some.
>>
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>>>> +{
>>>> +    RAMBlock *block;
>>>> +    ram_addr_t offset;
>>>> +    size_t used_len, start, npages;
>>>   From your use I think the addr and len are coming raw from the guest;
>>> so we need to take some care.
>>>
>> Actually the "addr" here has been the host address that corresponds to the
>> guest free page. It's from elem->in_sg[0].iov_base.
>>
>>>> +        if (unlikely(offset + len > block->used_length)) {
>>> I think to make that overflow safe, that should be:
>>>     if (len > (block->used_length - offset)) {
>>>
>>> But we'll need another test before it, because qemu_ram_block_from_host
>>> seems to check max_length not used_length, so we need to check
>>> for offset > block->used_length first
>> OK, how about adding an assert above, like this:
>>
>> block = qemu_ram_block_from_host(addr, false, &offset);
>> assert (offset  < block->used_length );
>> if (!block)
>>      ...
>>
>> The address corresponds to a guest free page, which means it should be
>> within used_length. If not, something weird happens, I think we'd better to
>> assert it in that case.
>>
>> Best,
>> Wei
> What if memory has been removed by hotunplug after guest sent the
> free page notification?
>
> This seems to actually be likely to happen as memory being unplugged
> would typically be mostly free.


OK, thanks for the reminder. Instead of using an assert, I think we can 
let the function just return if (offset > block->used_length).

Best,
Wei

Patch

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 77fd4f5..fae1acf 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@ 
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5c..e172798 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;
+
+    for (used_len = len; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(offset + len > block->used_length)) {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+        ram_state->migration_dirty_pages -=
+                      bitmap_count_one_with_offset(block->bmap, start, npages);
+        bitmap_clear(block->bmap, start, npages);
+    }
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code