
[v3] migration: clear the memory region dirty bitmap when skipping free pages

Message ID 20210722083055.23352-1-wei.w.wang@intel.com (mailing list archive)
State New, archived
Series [v3] migration: clear the memory region dirty bitmap when skipping free pages

Commit Message

Wang, Wei W July 22, 2021, 8:30 a.m. UTC
When skipping free pages to send, their corresponding dirty bits in the
memory region dirty bitmap need to be cleared. Otherwise the skipped
pages will be sent in the next round after the migration thread syncs
dirty bits from the memory region dirty bitmap.

Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 migration/ram.c | 74 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 18 deletions(-)

v2->v3 changelog:
- change migration_clear_memory_region_dirty_bitmap_range to use
  QEMU_ALIGN

v1->v2 changelog:
- move migration_clear_memory_region_dirty_bitmap under bitmap_mutex, as
  we are not yet confident it is safe outside the lock.
- drop the unnecessary subproject commit.
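For background on the mechanism the commit message describes, here is a minimal,
self-contained toy model (plain C, not QEMU code; the bitmap size and helper names
are made up for illustration) of the two bitmap levels involved and of why clearing
only the migration bitmap is not enough:

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 8

static bool mr_dirty[NPAGES];   /* stand-in for the memory region dirty bitmap */
static bool mig_dirty[NPAGES];  /* stand-in for the per-RAMBlock bitmap (block->bmap) */

/* Migration-thread sync: pull dirty bits from the MR bitmap into the migration bitmap. */
static void sync_bitmaps(void)
{
    for (int i = 0; i < NPAGES; i++) {
        if (mr_dirty[i]) {
            mig_dirty[i] = true;
            mr_dirty[i] = false;
        }
    }
}

/* Skip a guest-reported free page; the patch adds the clear_mr_too part. */
static void skip_free_page(int page, bool clear_mr_too)
{
    mig_dirty[page] = false;
    if (clear_mr_too) {
        mr_dirty[page] = false;
    }
}

static void run(bool patched)
{
    for (int i = 0; i < NPAGES; i++) {
        mr_dirty[i] = mig_dirty[i] = true;  /* everything starts out dirty */
    }
    skip_free_page(3, patched);             /* page 3 is reported free by the guest */
    sync_bitmaps();                         /* next migration round */
    printf("%s: page 3 %s sent again\n", patched ? "patched" : "unpatched",
           mig_dirty[3] ? "is" : "is not");
}

int main(void)
{
    run(false);
    run(true);
    return 0;
}

The unpatched run reports that page 3 is sent again after the sync; the patched run
does not, which is the behavior the patch is after.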

Comments

David Hildenbrand July 22, 2021, 9:47 a.m. UTC | #1
On 22.07.21 10:30, Wei Wang wrote:
> When skipping free pages to send, their corresponding dirty bits in the
> memory region dirty bitmap need to be cleared. Otherwise the skipped
> pages will be sent in the next round after the migration thread syncs
> dirty bits from the memory region dirty bitmap.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>   migration/ram.c | 74 +++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 56 insertions(+), 18 deletions(-)
> 

LGTM, thanks

Reviewed-by: David Hildenbrand <david@redhat.com>
Wang, Wei W July 22, 2021, 9:57 a.m. UTC | #2
On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
> On 22.07.21 10:30, Wei Wang wrote:
> > When skipping free pages to send, their corresponding dirty bits in
> > the memory region dirty bitmap need to be cleared. Otherwise the
> > skipped pages will be sent in the next round after the migration
> > thread syncs dirty bits from the memory region dirty bitmap.
> >
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Reported-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > ---
> >   migration/ram.c | 74
> +++++++++++++++++++++++++++++++++++++------------
> >   1 file changed, 56 insertions(+), 18 deletions(-)
> >
> 
> LGTM, thanks
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 

Thanks. Please remember to run a regression test together with Peterx's patch when you get a chance.

Best,
Wei
Peter Xu July 22, 2021, 2:51 p.m. UTC | #3
On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
> > On 22.07.21 10:30, Wei Wang wrote:
> > > When skipping free pages to send, their corresponding dirty bits in
> > > the memory region dirty bitmap need to be cleared. Otherwise the
> > > skipped pages will be sent in the next round after the migration
> > > thread syncs dirty bits from the memory region dirty bitmap.
> > >
> > > Cc: David Hildenbrand <david@redhat.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Reported-by: David Hildenbrand <david@redhat.com>
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > ---
> > >   migration/ram.c | 74
> > +++++++++++++++++++++++++++++++++++++------------
> > >   1 file changed, 56 insertions(+), 18 deletions(-)
> > >
> > 
> > LGTM, thanks
> > 
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > 
> 
> Thanks. Please remember to run a regression test together with Peterx's patch when you get a chance.

I can continue to try that, but it's slightly low priority for me, so it could be
a bit delayed.  If either of you could help, that would be great, as I still
don't know why I couldn't get free-page-hint working last time (note: I definitely
checked with lsmod, so the module was there).  I'll need to figure that out first.

Thanks,
David Hildenbrand July 22, 2021, 2:51 p.m. UTC | #4
On 22.07.21 16:51, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
>> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
>>> On 22.07.21 10:30, Wei Wang wrote:
>>>> When skipping free pages to send, their corresponding dirty bits in
>>>> the memory region dirty bitmap need to be cleared. Otherwise the
>>>> skipped pages will be sent in the next round after the migration
>>>> thread syncs dirty bits from the memory region dirty bitmap.
>>>>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Reported-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> ---
>>>>    migration/ram.c | 74
>>> +++++++++++++++++++++++++++++++++++++------------
>>>>    1 file changed, 56 insertions(+), 18 deletions(-)
>>>>
>>>
>>> LGTM, thanks
>>>
>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>
>>
>> Thanks. Please remember to run a regression test together with Peterx's patch when you get a chance.
> 
> I can continue to try that, but it's slightly low priority for me, so it could be
> a bit delayed.  If either of you could help, that would be great, as I still
> don't know why I couldn't get free-page-hint working last time (note: I definitely
> checked with lsmod, so the module was there).  I'll need to figure that out first.

I'll give it a churn.
Peter Xu July 22, 2021, 5:41 p.m. UTC | #5
On Thu, Jul 22, 2021 at 04:51:48PM +0200, David Hildenbrand wrote:
> I'll give it a churn.

Thanks, David.
David Hildenbrand July 23, 2021, 7:50 a.m. UTC | #6
On 22.07.21 19:41, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 04:51:48PM +0200, David Hildenbrand wrote:
>> I'll give it a churn.
> 
> Thanks, David.
> 

Migration of an 8 GiB VM
* within the same host
* after Linux is up and idle
* free page hinting enabled
* after dirtying most VM memory using memhog
* keeping bandwidth set to QEMU defaults
* On my 16 GiB notebook with other stuff running


Current upstream with 63268c4970a, without this patch:

total time: 28606 ms
downtime: 33 ms
setup: 3 ms
transferred ram: 3722913 kbytes
throughput: 1066.37 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 21674 pages
skipped: 0 pages
normal: 928866 pages
normal bytes: 3715464 kbytes
dirty sync count: 5
pages-per-second: 32710

Current upstream without 63268c4970a, without this patch:

total time: 28530 ms
downtime: 277 ms
setup: 4 ms
transferred ram: 3726266 kbytes
throughput: 1070.21 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 21890 pages
skipped: 0 pages
normal: 929702 pages
normal bytes: 3718808 kbytes
dirty sync count: 5
pages-per-second: 32710


Current upstream without 63268c4970a, with this patch:

total time: 5115 ms
downtime: 37 ms
setup: 5 ms
transferred ram: 659532 kbytes
throughput: 1057.94 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 20748 pages
skipped: 0 pages
normal: 164516 pages
normal bytes: 658064 kbytes
dirty sync count: 4
pages-per-second: 32710


Current upstream with 63268c4970a, with this patch:

total time: 5205 ms
downtime: 45 ms
setup: 3 ms
transferred ram: 659636 kbytes
throughput: 1039.39 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 20264 pages
skipped: 0 pages
normal: 164543 pages
normal bytes: 658172 kbytes
dirty sync count: 4
pages-per-second: 32710



I repeated the last two measurements two times and took the "better" 
results.

Looks like this patch does its job and that 63268c4970a doesn't seem to 
degrade migration in this combination/setup significantly (if at all; we 
would have to do more measurements).
David Hildenbrand July 23, 2021, 8:03 a.m. UTC | #7
On 22.07.21 16:51, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
>> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
>>> On 22.07.21 10:30, Wei Wang wrote:
>>>> When skipping free pages to send, their corresponding dirty bits in
>>>> the memory region dirty bitmap need to be cleared. Otherwise the
>>>> skipped pages will be sent in the next round after the migration
>>>> thread syncs dirty bits from the memory region dirty bitmap.
>>>>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Peter Xu <peterx@redhat.com>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Reported-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> ---
>>>>    migration/ram.c | 74
>>> +++++++++++++++++++++++++++++++++++++------------
>>>>    1 file changed, 56 insertions(+), 18 deletions(-)
>>>>
>>>
>>> LGTM, thanks
>>>
>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>
>>
>> Thanks. Please remember to run a regression test together with Peterx's patch when you get a chance.
> 
> I can continue to try that, but it's slightly low priority for me, so it could be
> a bit delayed.  If either of you could help, that would be great, as I still
> don't know why I couldn't get free-page-hint working last time (note: I definitely
> checked with lsmod, so the module was there).  I'll need to figure that out first.

Just for the records:

Add the following to your QEMU cmdline to make it work:

     -object iothread,id=t0 \
     -device virtio-balloon-pci,id=balloon,iothread=t0,free-page-hint=on

Most upstream distributions (e.g., Fedora) should have virtio-balloon 
built as a module, and there is no action required inside the guest to get 
it running. Just wait until Linux in the guest has booted up.
Wang, Wei W July 23, 2021, 8:14 a.m. UTC | #8
On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
> 
> Migration of an 8 GiB VM
> * within the same host
> * after Linux is up and idle
> * free page hinting enabled
> * after dirtying most VM memory using memhog

Thanks for the tests!

I think it would be better to test using idle guests (no memhog).
With memhog eating most of the guest free pages, it's likely no or very few free pages are reported during the test.

Best,
Wei
David Hildenbrand July 23, 2021, 8:16 a.m. UTC | #9
On 23.07.21 10:14, Wang, Wei W wrote:
> On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
>>
>> Migration of an 8 GiB VM
>> * within the same host
>> * after Linux is up and idle
>> * free page hinting enabled
>> * after dirtying most VM memory using memhog
> 
> Thanks for the tests!
> 
> I think it would be better to test using idle guests (no memhog).
> With memhog eating most of the guest free pages, it's likely no or very few free pages are reported during the test.

*After dirtying*. memhog is no longer running.

... and also look again at the numbers showing how much memory we actually 
migrate :)
Wang, Wei W July 23, 2021, 8:32 a.m. UTC | #10
On Friday, July 23, 2021 4:17 PM, David Hildenbrand wrote:
> > On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
> >>
> >> Migration of an 8 GiB VM
> >> * within the same host
> >> * after Linux is up and idle
> >> * free page hinting enabled
> >> * after dirtying most VM memory using memhog
> >
> > Thanks for the tests!
> >
> > I think it would be better to test using idle guests (no memhog).
> > With memhog eating most of the guest free pages, it's likely no or very few
> > free pages are reported during the test.
> 
> *After dirtying*. memhog is no longer running.
> 
> ... and also look again at the numbers showing how much memory we actually migrate :)
> 

OK, I was looking at the duplicate number, which was big in the idle guest case you posted before.
This looks good to me: the dirtied pages became free and were skipped as well. Thanks!

Best,
Wei
Peter Xu July 23, 2021, 12:51 p.m. UTC | #11
On Fri, Jul 23, 2021 at 09:50:18AM +0200, David Hildenbrand wrote:
> [... full quote of the measurement results snipped ...]
> 
> Looks like this patch does its job and that 63268c4970a doesn't seem to
> degrade migration in this combination/setup significantly (if at all; we
> would have to do more measurements).

Thanks again for helping!

Just to double check: the loop in qemu_guest_free_page_hint() won't run for a
lot of iterations, right?  It looks like that only happens when crossing ramblock
boundaries.  Otherwise we may also want to move that mutex out of the loop at
some point, because the atomics indeed look expensive on huge hosts.
David Hildenbrand July 23, 2021, 12:52 p.m. UTC | #12
On 23.07.21 14:51, Peter Xu wrote:
> On Fri, Jul 23, 2021 at 09:50:18AM +0200, David Hildenbrand wrote:
>> [... measurement results snipped ...]
> 
> Thanks again for helping!
> 
> Just to double check: the loop in qemu_guest_free_page_hint() won't run for a
> lot of iterations, right?  It looks like that only happens when crossing ramblock
> boundaries.  Otherwise we may also want to move that mutex out of the loop at
> some point, because the atomics indeed look expensive on huge hosts.

I'd expect it never ever happens.
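
For context on the question above, here is a self-contained toy model (plain C, not
the actual QEMU function; the block layout, sizes and names are made up) of the range
walk in qemu_guest_free_page_hint(): each iteration consumes the part of the hinted
buffer that fits into the current block, so the loop body runs more than once only
when the buffer straddles a RAMBlock boundary.

#include <stdint.h>
#include <stdio.h>

/* Two hypothetical 4 GiB blocks, laid out back to back in "host" address space. */
typedef struct {
    uint64_t host_start;
    uint64_t used_length;
} ToyBlock;

static const ToyBlock blocks[] = {
    { 0,          4ULL << 30 },
    { 4ULL << 30, 4ULL << 30 },
};

static const ToyBlock *block_from_addr(uint64_t addr, uint64_t *offset)
{
    for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
        if (addr >= blocks[i].host_start &&
            addr < blocks[i].host_start + blocks[i].used_length) {
            *offset = addr - blocks[i].host_start;
            return &blocks[i];
        }
    }
    return NULL;
}

/* Mirrors the loop shape: one iteration per block touched by [addr, addr + len). */
static unsigned free_page_hint_iterations(uint64_t addr, uint64_t len)
{
    unsigned iters = 0;
    uint64_t used_len;

    for (; len > 0; len -= used_len, addr += used_len) {
        uint64_t offset;
        const ToyBlock *block = block_from_addr(addr, &offset);

        if (!block) {
            break;
        }
        /* Consume at most what is left of the current block. */
        used_len = len <= block->used_length - offset
                       ? len : block->used_length - offset;
        iters++; /* in the real code this is where bitmap_mutex is taken and bits are cleared */
    }
    return iters;
}

int main(void)
{
    /* A 2 MiB hint inside one block: one iteration. */
    printf("%u\n", free_page_hint_iterations(1ULL << 20, 2ULL << 20));
    /* A 2 MiB hint straddling the 4 GiB boundary: two iterations. */
    printf("%u\n", free_page_hint_iterations((4ULL << 30) - (1ULL << 20),
                                             2ULL << 20));
    return 0;
}

It prints 1 and 2 for the two calls, matching the expectation that more than one
iteration requires crossing a RAMBlock boundary.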

Patch

diff --git a/migration/ram.c b/migration/ram.c
index b5fc454b2f..c8262cf3bc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,6 +789,53 @@  unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return find_next_bit(bitmap, size, start);
 }
 
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+                                                       RAMBlock *rb,
+                                                       unsigned long page)
+{
+    uint8_t shift;
+    hwaddr size, start;
+
+    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
+        return;
+    }
+
+    shift = rb->clear_bmap_shift;
+    /*
+     * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+     * can make things easier sometimes since then start address
+     * of the small chunk will always be 64 pages aligned so the
+     * bitmap will always be aligned to unsigned long. We should
+     * even be able to remove this restriction but I'm simply
+     * keeping it.
+     */
+    assert(shift >= 6);
+
+    size = 1ULL << (TARGET_PAGE_BITS + shift);
+    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+    memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
+static void
+migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
+                                                 RAMBlock *rb,
+                                                 unsigned long start,
+                                                 unsigned long npages)
+{
+    unsigned long i, chunk_pages = 1UL << rb->clear_bmap_shift;
+    unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
+    unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);
+
+    /*
+     * Clear pages from start to start + npages - 1, so the end boundary is
+     * exclusive.
+     */
+    for (i = chunk_start; i < chunk_end; i += chunk_pages) {
+        migration_clear_memory_region_dirty_bitmap(rs, rb, i);
+    }
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -803,26 +850,9 @@  static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * the page in the chunk we clear the remote dirty bitmap for all.
      * Clearing it earlier won't be a problem, but too late will.
      */
-    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-        uint8_t shift = rb->clear_bmap_shift;
-        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-        /*
-         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
-         * can make things easier sometimes since then start address
-         * of the small chunk will always be 64 pages aligned so the
-         * bitmap will always be aligned to unsigned long.  We should
-         * even be able to remove this restriction but I'm simply
-         * keeping it.
-         */
-        assert(shift >= 6);
-        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-        memory_region_clear_dirty_bitmap(rb->mr, start, size);
-    }
+    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
 
     ret = test_and_clear_bit(page, rb->bmap);
-
     if (ret) {
         rs->migration_dirty_pages--;
     }
@@ -2741,6 +2771,14 @@  void qemu_guest_free_page_hint(void *addr, size_t len)
         npages = used_len >> TARGET_PAGE_BITS;
 
         qemu_mutex_lock(&ram_state->bitmap_mutex);
+        /*
+         * The skipped free pages are equivalent to being sent from clear_bmap's
+         * perspective, so clear the bits from the memory region bitmap which
+         * are initially set. Otherwise those skipped pages will be sent in
+         * the next round after syncing from the memory region bitmap.
+         */
+        migration_clear_memory_region_dirty_bitmap_range(ram_state, block,
+                                                         start, npages);
         ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);
         bitmap_clear(block->bmap, start, npages);
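
To illustrate the chunk rounding that migration_clear_memory_region_dirty_bitmap_range()
performs (and the QEMU_ALIGN change mentioned in the v2->v3 changelog), here is a
standalone sketch in plain C; the alignment macros are redefined locally, and the shift
and page numbers are made-up values for illustration, not taken from this patch.

#include <stdio.h>

#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define QEMU_ALIGN_UP(n, m)   QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

int main(void)
{
    /* Hypothetical chunk size: shift 18 => 262144 pages per clear_bmap chunk. */
    unsigned long clear_bmap_shift = 18;
    unsigned long chunk_pages = 1UL << clear_bmap_shift;

    /* A hypothetical free-page hint covering pages [200000, 600000). */
    unsigned long start = 200000, npages = 400000;

    unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
    unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);

    for (unsigned long i = chunk_start; i < chunk_end; i += chunk_pages) {
        /* In the patch, this is where migration_clear_memory_region_dirty_bitmap() runs. */
        printf("clear chunk starting at page %lu\n", i);
    }
    return 0;
}

With these values the hinted range is widened to three whole chunks, so the program
prints the chunk start pages 0, 262144 and 524288.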