| Message ID | 20210722083055.23352-1-wei.w.wang@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v3] migration: clear the memory region dirty bitmap when skipping free pages |
On 22.07.21 10:30, Wei Wang wrote:
> When skipping free pages to send, their corresponding dirty bits in the
> memory region dirty bitmap need to be cleared. Otherwise the skipped
> pages will be sent in the next round after the migration thread syncs
> dirty bits from the memory region dirty bitmap.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  migration/ram.c | 74 +++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 56 insertions(+), 18 deletions(-)
>

LGTM, thanks

Reviewed-by: David Hildenbrand <david@redhat.com>
On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
> On 22.07.21 10:30, Wei Wang wrote:
> > When skipping free pages to send, their corresponding dirty bits in
> > the memory region dirty bitmap need to be cleared. Otherwise the
> > skipped pages will be sent in the next round after the migration
> > thread syncs dirty bits from the memory region dirty bitmap.
> > [...]
>
> LGTM, thanks
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
>

Thanks. Please remember to run a regression test together with Peter's patch
when you get a chance.

Best,
Wei
On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
> > On 22.07.21 10:30, Wei Wang wrote:
> > > When skipping free pages to send, their corresponding dirty bits in
> > > the memory region dirty bitmap need to be cleared. Otherwise the
> > > skipped pages will be sent in the next round after the migration
> > > thread syncs dirty bits from the memory region dirty bitmap.
> > > [...]
> >
> > LGTM, thanks
> >
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> >
>
> Thanks. Please remember to run a regression test together with Peter's patch
> when you get a chance.

I can continue to try that, but it's slightly low priority for me, so it could
be a bit delayed. If either of you could help, that would be great, as I still
don't know why I couldn't get free-page-hint working last time (note: I
definitely checked lsmod, so the module was there). I'll need to figure that
out first.

Thanks,
On 22.07.21 16:51, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
>> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
>>> On 22.07.21 10:30, Wei Wang wrote:
>>>> When skipping free pages to send, their corresponding dirty bits in
>>>> the memory region dirty bitmap need to be cleared. Otherwise the
>>>> skipped pages will be sent in the next round after the migration
>>>> thread syncs dirty bits from the memory region dirty bitmap.
>>>> [...]
>>>
>>> LGTM, thanks
>>>
>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>
>>
>> Thanks. Please remember to run a regression test together with Peter's patch
>> when you get a chance.
>
> I can continue to try that, but it's slightly low priority for me, so it could
> be a bit delayed. If either of you could help, that would be great, as I still
> don't know why I couldn't get free-page-hint working last time (note: I
> definitely checked lsmod, so the module was there). I'll need to figure that
> out first.

I'll give it a churn.
On Thu, Jul 22, 2021 at 04:51:48PM +0200, David Hildenbrand wrote:
> I'll give it a churn.
Thanks, David.
On 22.07.21 19:41, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 04:51:48PM +0200, David Hildenbrand wrote:
>> I'll give it a churn.
>
> Thanks, David.
>

Migration of an 8 GiB VM
* within the same host
* after Linux is up and idle
* free page hinting enabled
* after dirtying most VM memory using memhog
* keeping bandwidth set to QEMU defaults
* on my 16 GiB notebook with other stuff running


Current upstream with 63268c4970a, without this patch:

total time: 28606 ms
downtime: 33 ms
setup: 3 ms
transferred ram: 3722913 kbytes
throughput: 1066.37 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 21674 pages
skipped: 0 pages
normal: 928866 pages
normal bytes: 3715464 kbytes
dirty sync count: 5
pages-per-second: 32710

Current upstream without 63268c4970a, without this patch:

total time: 28530 ms
downtime: 277 ms
setup: 4 ms
transferred ram: 3726266 kbytes
throughput: 1070.21 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 21890 pages
skipped: 0 pages
normal: 929702 pages
normal bytes: 3718808 kbytes
dirty sync count: 5
pages-per-second: 32710

Current upstream without 63268c4970a, with this patch:

total time: 5115 ms
downtime: 37 ms
setup: 5 ms
transferred ram: 659532 kbytes
throughput: 1057.94 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 20748 pages
skipped: 0 pages
normal: 164516 pages
normal bytes: 658064 kbytes
dirty sync count: 4
pages-per-second: 32710

Current upstream with 63268c4970a, with this patch:

total time: 5205 ms
downtime: 45 ms
setup: 3 ms
transferred ram: 659636 kbytes
throughput: 1039.39 mbps
remaining ram: 0 kbytes
total ram: 8389384 kbytes
duplicate: 20264 pages
skipped: 0 pages
normal: 164543 pages
normal bytes: 658172 kbytes
dirty sync count: 4
pages-per-second: 32710


I repeated the last two measurements twice and took the "better" results.

Looks like this patch does its job, and 63268c4970a doesn't seem to degrade
migration significantly in this combination/setup (if at all; we would have
to do more measurements).
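A quick cross-check of those numbers: "normal bytes" is simply "normal" pages
x 4 KiB (928,866 x 4 KiB = 3,715,464 KiB without the patch, 164,516 x 4 KiB =
658,064 KiB with it), and the roughly 5.6x reduction in transferred RAM
(3,722,913 KiB to 659,532 KiB) matches the roughly 5.6x reduction in total
time (28.6 s to 5.1 s) at essentially unchanged throughput, which is exactly
what skipping the hinted free pages should buy.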
On 22.07.21 16:51, Peter Xu wrote:
> On Thu, Jul 22, 2021 at 09:57:13AM +0000, Wang, Wei W wrote:
>> On Thursday, July 22, 2021 5:48 PM, David Hildenbrand wrote:
>>> [...]
>>
>> Thanks. Please remember to run a regression test together with Peter's patch
>> when you get a chance.
>
> I can continue to try that, but it's slightly low priority for me, so it could
> be a bit delayed. If either of you could help, that would be great, as I still
> don't know why I couldn't get free-page-hint working last time (note: I
> definitely checked lsmod, so the module was there). I'll need to figure that
> out first.

Just for the records: add the following to your QEMU command line to make it
work:

-object iothread,id=t0 \
-device virtio-balloon-pci,id=balloon,iothread=t0,free-page-hint=on

Most upstream distributions (e.g., Fedora) build virtio-balloon as a module,
and no action is required inside the guest to get it running. Just wait until
Linux in the guest has booted up.
On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
>
> Migration of an 8 GiB VM
> * within the same host
> * after Linux is up and idle
> * free page hinting enabled
> * after dirtying most VM memory using memhog

Thanks for the tests!

I think it would be better to test using idle guests (no memhog). With memhog
eating most of the guest free pages, it's likely that no or very few free
pages would be reported during the test.

Best,
Wei
On 23.07.21 10:14, Wang, Wei W wrote:
> On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
>>
>> Migration of an 8 GiB VM
>> * within the same host
>> * after Linux is up and idle
>> * free page hinting enabled
>> * after dirtying most VM memory using memhog
>
> Thanks for the tests!
>
> I think it would be better to test using idle guests (no memhog). With memhog
> eating most of the guest free pages, it's likely that no or very few free
> pages would be reported during the test.

*After dirtying*. memhog is no longer running.

... and also look again at the numbers for how much memory we actually
migrate :)
On Friday, July 23, 2021 4:17 PM, David Hildenbrand wrote:
> > On Friday, July 23, 2021 3:50 PM, David Hildenbrand wrote:
> >>
> >> Migration of an 8 GiB VM
> >> * within the same host
> >> * after Linux is up and idle
> >> * free page hinting enabled
> >> * after dirtying most VM memory using memhog
> >
> > Thanks for the tests!
> >
> > I think it would be better to test using idle guests (no memhog). With
> > memhog eating most of the guest free pages, it's likely that no or very
> > few free pages would be reported during the test.
>
> *After dirtying*. memhog is no longer running.
>
> ... and also look again at the numbers for how much memory we actually
> migrate :)
>

OK, I was looking at the "duplicate" number, which was big in the idle guest
case you posted before. This looks good to me: the dirtied pages became free
and got skipped as well. Thanks!

Best,
Wei
On Fri, Jul 23, 2021 at 09:50:18AM +0200, David Hildenbrand wrote:
> On 22.07.21 19:41, Peter Xu wrote:
> > On Thu, Jul 22, 2021 at 04:51:48PM +0200, David Hildenbrand wrote:
> > > I'll give it a churn.
> >
> > Thanks, David.
> >
>
> Migration of an 8 GiB VM
> * within the same host
> * after Linux is up and idle
> * free page hinting enabled
> * after dirtying most VM memory using memhog
> * keeping bandwidth set to QEMU defaults
> * on my 16 GiB notebook with other stuff running
>
> [... measurement results snipped; see the previous mail ...]
>
> I repeated the last two measurements twice and took the "better" results.
>
> Looks like this patch does its job, and 63268c4970a doesn't seem to degrade
> migration significantly in this combination/setup (if at all; we would have
> to do more measurements).

Thanks again for helping!

Just to double check: the loop in qemu_guest_free_page_hint() won't run for a
lot of iterations, right? It looks like that only happens when a hint crosses
ramblock boundaries. Otherwise we may also want to move that mutex out of the
loop at some point, because the atomics do look expensive on huge hosts.
On 23.07.21 14:51, Peter Xu wrote:
> On Fri, Jul 23, 2021 at 09:50:18AM +0200, David Hildenbrand wrote:
>> Migration of an 8 GiB VM
>> * within the same host
>> * after Linux is up and idle
>> * free page hinting enabled
>> * after dirtying most VM memory using memhog
>> * keeping bandwidth set to QEMU defaults
>> * on my 16 GiB notebook with other stuff running
>>
>> [... measurement results snipped ...]
>>
>> Looks like this patch does its job, and 63268c4970a doesn't seem to degrade
>> migration significantly in this combination/setup (if at all; we would have
>> to do more measurements).
>
> Thanks again for helping!
>
> Just to double check: the loop in qemu_guest_free_page_hint() won't run for a
> lot of iterations, right? It looks like that only happens when a hint crosses
> ramblock boundaries. Otherwise we may also want to move that mutex out of the
> loop at some point, because the atomics do look expensive on huge hosts.

I'd expect it never ever happens.
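For a rough sense of scale (assuming the usual 4 KiB target pages, QEMU's
default clear_bmap_shift of 18, and virtio-balloon hints that typically cover
only a few MiB each): such a hint lies well inside a single RAMBlock, so the
outer loop in qemu_guest_free_page_hint() runs its body once per hint, and
the chunk-clearing loop added by this patch spans at most two of the
2^(12+18)-byte = 1 GiB clear-bitmap chunks, i.e. one or two iterations.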
diff --git a/migration/ram.c b/migration/ram.c
index b5fc454b2f..c8262cf3bc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -789,6 +789,53 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return find_next_bit(bitmap, size, start);
 }
 
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+                                                       RAMBlock *rb,
+                                                       unsigned long page)
+{
+    uint8_t shift;
+    hwaddr size, start;
+
+    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page)) {
+        return;
+    }
+
+    shift = rb->clear_bmap_shift;
+    /*
+     * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
+     * can make things easier sometimes since then start address
+     * of the small chunk will always be 64 pages aligned so the
+     * bitmap will always be aligned to unsigned long. We should
+     * even be able to remove this restriction but I'm simply
+     * keeping it.
+     */
+    assert(shift >= 6);
+
+    size = 1ULL << (TARGET_PAGE_BITS + shift);
+    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+    memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
+static void
+migration_clear_memory_region_dirty_bitmap_range(RAMState *rs,
+                                                 RAMBlock *rb,
+                                                 unsigned long start,
+                                                 unsigned long npages)
+{
+    unsigned long i, chunk_pages = 1UL << rb->clear_bmap_shift;
+    unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
+    unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);
+
+    /*
+     * Clear pages from start to start + npages - 1, so the end boundary is
+     * exclusive.
+     */
+    for (i = chunk_start; i < chunk_end; i += chunk_pages) {
+        migration_clear_memory_region_dirty_bitmap(rs, rb, i);
+    }
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -803,26 +850,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * the page in the chunk we clear the remote dirty bitmap for all.
      * Clearing it earlier won't be a problem, but too late will.
      */
-    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-        uint8_t shift = rb->clear_bmap_shift;
-        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-        /*
-         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
-         * can make things easier sometimes since then start address
-         * of the small chunk will always be 64 pages aligned so the
-         * bitmap will always be aligned to unsigned long. We should
-         * even be able to remove this restriction but I'm simply
-         * keeping it.
-         */
-        assert(shift >= 6);
-        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-        memory_region_clear_dirty_bitmap(rb->mr, start, size);
-    }
+    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
 
     ret = test_and_clear_bit(page, rb->bmap);
-
     if (ret) {
         rs->migration_dirty_pages--;
     }
@@ -2741,6 +2771,14 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
         npages = used_len >> TARGET_PAGE_BITS;
 
         qemu_mutex_lock(&ram_state->bitmap_mutex);
+        /*
+         * The skipped free pages are equavalent to be sent from clear_bmap's
+         * perspective, so clear the bits from the memory region bitmap which
+         * are initially set. Otherwise those skipped pages will be sent in
+         * the next round after syncing from the memory region bitmap.
+         */
+        migration_clear_memory_region_dirty_bitmap_range(ram_state, block,
+                                                         start, npages);
         ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);
         bitmap_clear(block->bmap, start, npages);
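As an aside on the alignment math in the new helper, the following standalone
sketch (not QEMU code: QEMU_ALIGN_DOWN/QEMU_ALIGN_UP are re-defined locally,
and 4 KiB target pages with the default clear_bmap_shift of 18 are assumed)
mirrors migration_clear_memory_region_dirty_bitmap_range() and prints which
clear-bitmap chunks a hinted page range would touch:

```c
#include <stdio.h>

/* Local stand-ins for QEMU's helpers/constants (assumed values). */
#define TARGET_PAGE_BITS   12   /* 4 KiB pages */
#define CLEAR_BMAP_SHIFT   18   /* assumed default clear_bmap_shift */
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define QEMU_ALIGN_UP(n, m)   (QEMU_ALIGN_DOWN((n) + (m) - 1, (m)))

/* Mirror of the patch's range helper: visit every chunk that overlaps
 * [start, start + npages) exactly once. */
static void clear_chunks_for_range(unsigned long start, unsigned long npages)
{
    unsigned long chunk_pages = 1UL << CLEAR_BMAP_SHIFT;
    unsigned long chunk_start = QEMU_ALIGN_DOWN(start, chunk_pages);
    unsigned long chunk_end = QEMU_ALIGN_UP(start + npages, chunk_pages);
    unsigned long i;

    for (i = chunk_start; i < chunk_end; i += chunk_pages) {
        printf("  clear chunk at page %lu (guest addr 0x%llx, %llu MiB wide)\n",
               i, (unsigned long long)i << TARGET_PAGE_BITS,
               ((unsigned long long)chunk_pages << TARGET_PAGE_BITS) >> 20);
    }
}

int main(void)
{
    /* A 4 MiB hint (1024 pages) that sits inside one 1 GiB chunk ... */
    printf("hint at page 1000, 1024 pages:\n");
    clear_chunks_for_range(1000, 1024);

    /* ... and one that straddles a chunk boundary, touching two chunks. */
    printf("hint at page (1 << 18) - 512, 1024 pages:\n");
    clear_chunks_for_range((1UL << 18) - 512, 1024);
    return 0;
}
```

Built with any C compiler, the first example hint stays within a single 1 GiB
chunk, while the second straddles a chunk boundary and touches two.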
When skipping free pages to send, their corresponding dirty bits in the
memory region dirty bitmap need to be cleared. Otherwise the skipped
pages will be sent in the next round after the migration thread syncs
dirty bits from the memory region dirty bitmap.

Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 migration/ram.c | 74 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 18 deletions(-)

v2->v3 changelog:
- change migration_clear_memory_region_dirty_bitmap_range to use QEMU_ALIGN

v1->v2 changelog:
- move migration_clear_memory_region_dirty_bitmap under bitmap_mutex as
  we lack confidence to have it outside the lock for now.
- clean the unnecessary subproject commit.
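To make the reasoning in the commit message concrete, here is a deliberately
simplified standalone model (not QEMU's actual data structures; the array and
function names below are invented for illustration) of the two layers of
dirty tracking involved: the per-block bitmap the migration thread walks, and
the lower-level memory region dirty bitmap that is folded back into it at
every sync. Clearing only the upper bitmap for a hinted free page is undone
by the next sync; also clearing the lower one, as the patch does, makes the
skip stick:

```c
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 8

/* Toy model: 'bmap' is what the migration thread walks, 'mr_dirty' stands in
 * for the memory region / KVM dirty bitmap that a sync folds back into bmap. */
static bool bmap[NPAGES];
static bool mr_dirty[NPAGES];

static void dirty_all(void)
{
    for (int i = 0; i < NPAGES; i++) {
        bmap[i] = mr_dirty[i] = true;
    }
}

/* Like a dirty bitmap sync: pages dirty at the lower layer get re-marked. */
static void sync_dirty_bitmap(void)
{
    for (int i = 0; i < NPAGES; i++) {
        if (mr_dirty[i]) {
            bmap[i] = true;
            mr_dirty[i] = false;
        }
    }
}

/* A free-page hint for 'page': skip sending it. Without 'fixed', only the
 * upper bitmap bit is cleared, which is the behavior the patch changes. */
static void free_page_hint(int page, bool fixed)
{
    bmap[page] = false;
    if (fixed) {
        mr_dirty[page] = false;   /* what the patch adds, conceptually */
    }
}

static int pages_to_send(void)
{
    int n = 0;
    for (int i = 0; i < NPAGES; i++) {
        n += bmap[i];
    }
    return n;
}

int main(void)
{
    for (int fixed = 0; fixed <= 1; fixed++) {
        dirty_all();
        free_page_hint(3, fixed);   /* page 3 is reported free */
        printf("fixed=%d: before sync %d pages queued", fixed, pages_to_send());
        sync_dirty_bitmap();        /* next round's bitmap sync */
        printf(", after sync %d pages queued\n", pages_to_send());
    }
    return 0;
}
```

Running it prints 8 pages queued after the sync in the unfixed case and 7 in
the fixed case.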