Message ID | 20180903092644.25812-2-xiaoguangrong@tencent.com (mailing list archive)
---|---
State | New, archived
Series | migration: compression optimization
guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>
> flush_compressed_data() needs to wait for all compression threads to
> finish their work; after that, all threads are idle until the
> migration feeds new requests to them. Reducing its calls can improve
> throughput and use CPU resources more effectively.
>
> We do not need to flush all threads at the end of each iteration; the
> data can be kept locally until the memory block is changed or memory
> migration starts over, in which case we will meet a dirtied page that
> may still exist in a compression thread's ring.
>
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>

I am not so sure about this patch.

Right now, we guarantee that after each iteration, all data is written
before we start a new round.

This patch changes it to only "flush" the compression threads if we
have "synched" with the kvm migration bitmap.  The idea is good, but as
far as I can see:

- we already call flush_compressed_data() inside find_dirty_block() if
  we synchronize the bitmap.  So, at least, we need to update
  dirty_sync_count there.

- queued pages are "interesting", but I am not sure if compression and
  postcopy work well together.

So, if we don't need to call flush_compressed_data() every round, then
the one inside find_dirty_block() should be enough.  Otherwise, I can't
see why we need this other one.

Independent of this patch:

- We always send data for every compression thread without testing if
  there is any there.

> @@ -3212,7 +3225,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>          }
>          i++;
>      }
> -    flush_compressed_data(rs);
>      rcu_read_unlock();
>
>      /*

Why is it not enough just to remove this call to flush_compressed_data()?

Later, Juan.
On 09/04/2018 12:38 AM, Juan Quintela wrote:
> guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> flush_compressed_data() needs to wait for all compression threads to
>> finish their work; after that, all threads are idle until the
>> migration feeds new requests to them. Reducing its calls can improve
>> throughput and use CPU resources more effectively.
>>
>> We do not need to flush all threads at the end of each iteration; the
>> data can be kept locally until the memory block is changed or memory
>> migration starts over, in which case we will meet a dirtied page that
>> may still exist in a compression thread's ring.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>
> I am not so sure about this patch.
>
> Right now, we guarantee that after each iteration, all data is written
> before we start a new round.
>
> This patch changes it to only "flush" the compression threads if we
> have "synched" with the kvm migration bitmap.  The idea is good, but
> as far as I can see:
>
> - we already call flush_compressed_data() inside find_dirty_block() if
>   we synchronize the bitmap.

The one called in find_dirty_block() is as follows:

        if (migrate_use_xbzrle()) {
            /* If xbzrle is on, stop using the data compression at this
             * point. In theory, xbzrle can do better than compression.
             */
            flush_compressed_data(rs);
        }

We will call it only if xbzrle is also enabled; in that case, we will
disable compression and xbzrle for the following pages, please refer
to save_page_use_compression(). So it cannot help us if we just enable
compression separately.

> So, at least, we need to update
> dirty_sync_count there.

I understand this is an optimization that reduces flush_compressed_data()
calls when both xbzrle and compression are enabled. However, we can
avoid it by using save_page_use_compression() to replace
migrate_use_compression().

Furthermore, in our work which will be pushed out after this patchset
(https://marc.info/?l=kvm&m=152810616708459&w=2), we make
flush_compressed_data() really light if there is nothing to be flushed.

So, how about we just keep it in this patch and optimize it in our
next work? :)

>
> - queued pages are "interesting", but I am not sure if compression and
>   postcopy work well together.
>

Compression and postcopy cannot both be enabled; please refer to the
code in migrate_caps_check():

    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
            /* The decompression threads asynchronously write into RAM
             * rather than use the atomic copies needed to avoid
             * userfaulting.  It should be possible to fix the
             * decompression threads for compatibility in future.
             */
            error_setg(errp, "Postcopy is not currently compatible "
                       "with compression");
            return false;
        }

> So, if we don't need to call flush_compressed_data() every round, then
> the one inside find_dirty_block() should be enough.  Otherwise, I
> can't see why we need this other one.
>
>
> Independent of this patch:
> - We always send data for every compression thread without testing if
>   there is any there.
>

Yes. That's because we will never doubly handle the page in the
iteration. :)

>
>> @@ -3212,7 +3225,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>          }
>>          i++;
>>      }
>> -    flush_compressed_data(rs);
>>      rcu_read_unlock();
>>
>>      /*
>
> Why is it not enough just to remove this call to flush_compressed_data()?

Consider this case; thanks to Dave for pointing it out. :)

===============

iteration one:
        thread 1: Save compressed page 'n'

iteration two:
        thread 2: Save compressed page 'n'

What guarantees that the version of page 'n' from thread 2 reaches the
destination after the version from thread 1, without this flush?

===============

If we just remove it, we cannot get this guarantee. :)
On 09/04/2018 11:54 AM, Xiao Guangrong wrote:
>
> We will call it only if xbzrle is also enabled; in that case, we will
> disable compression and xbzrle for the following pages, please refer
                          ^and use xbzrle

Sorry for the typo.
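For readers without the tree handy, the save_page_use_compression()
logic referenced above looked roughly like this in that era of
migration/ram.c (a reconstruction for illustration, not a verbatim copy
of the tree at the time):

    static bool save_page_use_compression(RAMState *rs)
    {
        if (!migrate_use_compression()) {
            return false;
        }

        /*
         * If xbzrle is on, stop using data compression after the first
         * round of migration even if compression is enabled; in
         * theory, xbzrle can do better than compression.
         */
        if (rs->ram_bulk_stage || !migrate_use_xbzrle()) {
            return true;
        }

        return false;
    }

This is why the flush in find_dirty_block() only matters when xbzrle
and compression are enabled together: once the bulk stage ends with
xbzrle on, compression is skipped entirely for subsequent pages.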
Xiao Guangrong <guangrong.xiao@gmail.com> wrote:
> On 09/04/2018 12:38 AM, Juan Quintela wrote:
>> guangrong.xiao@gmail.com wrote:
>>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>
>>> flush_compressed_data() needs to wait for all compression threads to
>>> finish their work; after that, all threads are idle until the
>>> migration feeds new requests to them. Reducing its calls can improve
>>> throughput and use CPU resources more effectively.
>>>
>>> We do not need to flush all threads at the end of each iteration; the
>>> data can be kept locally until the memory block is changed or memory
>>> migration starts over, in which case we will meet a dirtied page that
>>> may still exist in a compression thread's ring.
>>>
>>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> I am not so sure about this patch.
>>
>> Right now, we guarantee that after each iteration, all data is written
>> before we start a new round.
>>
>> This patch changes it to only "flush" the compression threads if we
>> have "synched" with the kvm migration bitmap.  The idea is good, but
>> as far as I can see:
>>
>> - we already call flush_compressed_data() inside find_dirty_block() if
>>   we synchronize the bitmap.
>
> The one called in find_dirty_block() is as follows:
>
>         if (migrate_use_xbzrle()) {
>             /* If xbzrle is on, stop using the data compression at this
>              * point. In theory, xbzrle can do better than compression.
>              */
>             flush_compressed_data(rs);
>         }

Ouch.  I didn't notice the use_xbzrle() test there.

I think that the proper solution is just to call flush_compressed_data()
there unconditionally, and then remove the call that I pointed at the
end, no?

> We will call it only if xbzrle is also enabled; in that case, we will
> disable compression and xbzrle for the following pages, please refer
> to save_page_use_compression(). So it cannot help us if we just enable
> compression separately.

Point taken, but I still think that it should be unconditional.

>
>> So, at least, we need to update
>> dirty_sync_count there.
>
> I understand this is an optimization that reduces flush_compressed_data()
> calls when both xbzrle and compression are enabled. However, we can
> avoid it by using save_page_use_compression() to replace
> migrate_use_compression().
>
> Furthermore, in our work which will be pushed out after this patchset
> (https://marc.info/?l=kvm&m=152810616708459&w=2), we make
> flush_compressed_data() really light if there is nothing to be flushed.
>
> So, how about we just keep it in this patch and optimize it in our
> next work? :)

I still think that it is not needed, and that we can do better by just
removing the migrate_use_xbzrle() test and removing the call in
ram_save_iterate(), but we can optimize it later.

>> - queued pages are "interesting", but I am not sure if compression and
>>   postcopy work well together.
>>
>
> Compression and postcopy cannot both be enabled; please refer to the
> code in migrate_caps_check():
>
>     if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
>         if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
>             /* The decompression threads asynchronously write into RAM
>              * rather than use the atomic copies needed to avoid
>              * userfaulting.  It should be possible to fix the
>              * decompression threads for compatibility in future.
>              */
>             error_setg(errp, "Postcopy is not currently compatible "
>                        "with compression");
>             return false;
>         }

Ok.

>> So, if we don't need to call flush_compressed_data() every round, then
>> the one inside find_dirty_block() should be enough.  Otherwise, I
>> can't see why we need this other one.
>>
>>
>> Independent of this patch:
>> - We always send data for every compression thread without testing if
>>   there is any there.
>>
>
> Yes. That's because we will never doubly handle the page in the
> iteration. :)

I mean something different here.  Think about the case where we have
four compression threads and only three pages left to send.  We have to
call flush_compressed_data(), and it sends data for all four threads,
even when one of them is empty.

>>> @@ -3212,7 +3225,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>>>          }
>>>          i++;
>>>      }
>>> -    flush_compressed_data(rs);
>>>      rcu_read_unlock();
>>>
>>>      /*
>>
>> Why is it not enough just to remove this call to flush_compressed_data()?
>
> Consider this case; thanks to Dave for pointing it out. :)
>
> ===============
>
> iteration one:
>         thread 1: Save compressed page 'n'
>
> iteration two:
>         thread 2: Save compressed page 'n'
>
> What guarantees that the version of page 'n' from thread 2 reaches the
> destination after the version from thread 1, without this flush?
>
> ===============
>
> If we just remove it, we cannot get this guarantee. :)

Ok.  I think that the proper fix is to remove the things that you said
previously; we will wait until you merge the other bits to optimize.
I agree that there is no harm in the change.

Later, Juan.
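To make the "empty thread" point concrete: flush_compressed_data() at
the time first waited for every worker to go idle and then drained
every worker's buffer unconditionally, roughly along these lines (a
simplified sketch, not a verbatim copy; the emptiness guard described
in the second comment is what Xiao's follow-up series adds):

    static void flush_compressed_data(RAMState *rs)
    {
        int idx, len, thread_count;

        if (!migrate_use_compression()) {
            return;
        }
        thread_count = migrate_compress_threads();

        /* Wait until every compression thread has finished its work. */
        qemu_mutex_lock(&comp_done_lock);
        for (idx = 0; idx < thread_count; idx++) {
            while (!comp_param[idx].done) {
                qemu_cond_wait(&comp_done_cond, &comp_done_lock);
            }
        }
        qemu_mutex_unlock(&comp_done_lock);

        /*
         * Drain each thread's buffer into the migration stream, even
         * if that thread compressed nothing this round; skipping
         * threads with empty buffers is the later optimization
         * discussed above.
         */
        for (idx = 0; idx < thread_count; idx++) {
            qemu_mutex_lock(&comp_param[idx].mutex);
            if (!comp_param[idx].quit) {
                len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
                ram_counters.transferred += len;
            }
            qemu_mutex_unlock(&comp_param[idx].mutex);
        }
    }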
diff --git a/migration/ram.c b/migration/ram.c
index 79c89425a3..2ad07b5e15 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -307,6 +307,8 @@ struct RAMState {
     uint64_t iterations;
     /* number of dirty bits in the bitmap */
     uint64_t migration_dirty_pages;
+    /* last dirty_sync_count we have seen */
+    uint64_t dirty_sync_count;
     /* protects modification of the bitmap */
     QemuMutex bitmap_mutex;
     /* The RAMBlock used in the last src_page_requests */
@@ -3180,6 +3182,17 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     ram_control_before_iterate(f, RAM_CONTROL_ROUND);
 
+    /*
+     * If memory migration starts over, we will meet a dirtied page
+     * which may still exist in a compression thread's ring, so we
+     * should flush the compressed data to make sure the new page is
+     * not overwritten by the old one in the destination.
+     */
+    if (ram_counters.dirty_sync_count != rs->dirty_sync_count) {
+        rs->dirty_sync_count = ram_counters.dirty_sync_count;
+        flush_compressed_data(rs);
+    }
+
     t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0 ||
@@ -3212,7 +3225,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
         }
         i++;
     }
-    flush_compressed_data(rs);
     rcu_read_unlock();
 
     /*
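For context on the guard added above: ram_counters.dirty_sync_count is
incremented each time the dirty bitmap is re-synchronized, at the top
of migration_bitmap_sync(), so with this patch the flush fires at most
once per sync round instead of once per ram_save_iterate() call.
Schematically:

    static void migration_bitmap_sync(RAMState *rs)
    {
        ram_counters.dirty_sync_count++;

        /* ... fetch the dirty log from KVM, merge it into the
         * migration bitmap, and update rate-limiting statistics ... */
    }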