Message ID | 1495642203-12702-5-git-send-email-felipe@nutanix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote: > The commit message from 070afca25 suggests that dirty_rate_high_cnt > should be used more aggressively to start throttling after two > iterations instead of four. The code, however, only changes the auto > convergence behaviour to throttle after three iterations. This makes the > behaviour more aggressive by kicking off throttling after two iterations > as originally intended. This description looks suspect vs the code. You say it is changing from "after four" to "after two", but you are merely switching a post-increment to a pre-increment which can only reduce the boundary condition by 1, not 2. So either you mean to write "after two iterations instead of three" or "after three iterations intead of four" > > Signed-off-by: Felipe Franciosi <felipe@nutanix.com> > --- > migration/ram.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 1a3d9e6..26e03a5 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -708,7 +708,7 @@ static void migration_bitmap_sync(RAMState *rs) > > if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE > > (bytes_xfer_now - rs->bytes_xfer_prev) / 2) && > - (rs->dirty_rate_high_cnt++ >= 2)) { > + (++rs->dirty_rate_high_cnt >= 2)) { > trace_migration_throttle(); > rs->dirty_rate_high_cnt = 0; > mig_throttle_guest_down(); > -- > 1.9.5 > > Regards, Daniel
> On 24 May 2017, at 17:25, Daniel P. Berrange <berrange@redhat.com> wrote: > > On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote: >> The commit message from 070afca25 suggests that dirty_rate_high_cnt >> should be used more aggressively to start throttling after two >> iterations instead of four. The code, however, only changes the auto >> convergence behaviour to throttle after three iterations. This makes the >> behaviour more aggressive by kicking off throttling after two iterations >> as originally intended. > > This description looks suspect vs the code. You say it is changing > from "after four" to "after two", but you are merely switching a > post-increment to a pre-increment which can only reduce the boundary > condition by 1, not 2. So either you mean to write > > "after two iterations instead of three" > > or > > "after three iterations intead of four" Hi Daniel, Thanks for the quick feedback. Let me clarify what I meant. If there's still confusion I'll update the commit message to reflect it better. Pre-070afca25: throttling kicked in after four iterations. Post-070afca25: throttling kicked in after three iterations (but Jason wrote he meant to start throttling after two). This patch: throttles kicks in after two iterations (so Jason's code, minus one). Perhaps where I said "The code, however," I could say "Commit 070afca25, however,". Let me know. Thanks, Felipe > >> >> Signed-off-by: Felipe Franciosi <felipe@nutanix.com> >> --- >> migration/ram.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/migration/ram.c b/migration/ram.c >> index 1a3d9e6..26e03a5 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -708,7 +708,7 @@ static void migration_bitmap_sync(RAMState *rs) >> >> if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE > >> (bytes_xfer_now - rs->bytes_xfer_prev) / 2) && >> - (rs->dirty_rate_high_cnt++ >= 2)) { >> + (++rs->dirty_rate_high_cnt >= 2)) { >> trace_migration_throttle(); >> rs->dirty_rate_high_cnt = 0; >> mig_throttle_guest_down(); >> -- >> 1.9.5 >> >> > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote: > The commit message from 070afca25 suggests that dirty_rate_high_cnt > should be used more aggressively to start throttling after two > iterations instead of four. The code, however, only changes the auto > convergence behaviour to throttle after three iterations. This makes the > behaviour more aggressive by kicking off throttling after two iterations > as originally intended. For this one, I don't think fixing the code to match the commit message is that important. Instead, for me this patch looks more like something "changed iteration loops from 3 to 2". So the point is, what would be the best practical number for it. And when we change an existing value, we should have some reason, since it'll change behavior of existing user (though I'm not sure whether this one will affect much). I believe with higher dirty_rate_high_cnt, we have more smooth throttling, but it'll be slower in responding; While if lower or even remove it, we'll get very fast throttling response speed but I guess it may be more possible to report a false positive? IMHO here 3 is okay since after all we are solving the problem of unconverged migration, so as long as we can converge, I think it'll be fine. Thanks, > > Signed-off-by: Felipe Franciosi <felipe@nutanix.com> > --- > migration/ram.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 1a3d9e6..26e03a5 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -708,7 +708,7 @@ static void migration_bitmap_sync(RAMState *rs) > > if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE > > (bytes_xfer_now - rs->bytes_xfer_prev) / 2) && > - (rs->dirty_rate_high_cnt++ >= 2)) { > + (++rs->dirty_rate_high_cnt >= 2)) { > trace_migration_throttle(); > rs->dirty_rate_high_cnt = 0; > mig_throttle_guest_down(); > -- > 1.9.5 >
+ Matthew Rosato, original reviewer of 070afca25 > On 25 May 2017, at 02:03, Peter Xu <peterx@redhat.com> wrote: > > On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote: >> The commit message from 070afca25 suggests that dirty_rate_high_cnt >> should be used more aggressively to start throttling after two >> iterations instead of four. The code, however, only changes the auto >> convergence behaviour to throttle after three iterations. This makes the >> behaviour more aggressive by kicking off throttling after two iterations >> as originally intended. > > For this one, I don't think fixing the code to match the commit > message is that important. Instead, for me this patch looks more like > something "changed iteration loops from 3 to 2". I agree. If we decide a v2 is needed I can amend the commit message accordingly. > So the point is, what > would be the best practical number for it. And when we change an > existing value, we should have some reason, since it'll change > behavior of existing user (though I'm not sure whether this one will > affect much). We've done a few tests with this internally using various workloads (DBs, synthetic mem stress, &c.). In summary, and along the lines with how Qemu implements auto convergence today, I would say this counter should be removed. Consider that when live migrating a large-ish VM (100GB+ RAM), the network will be under stress for (at least) several minutes. At the same time, the sole purpose of this counter is to attempt convergence without having to throttle the guest. That is, it defers throttling in the hope that either the guest calms down or that the network gets less congested. For real workloads, both cases are unlikely to happen. Throttling will eventually be needed otherwise the migration will not converge. > I believe with higher dirty_rate_high_cnt, we have more smooth > throttling, but it'll be slower in responding; While if lower or even > remove it, we'll get very fast throttling response speed but I guess > it may be more possible to report a false positive? With a higher dirty_rate_high_cnt, migration will simply take longer (if not converging). For real workloads, it doesn't change how much throttling is required. For spiky or varied workloads, it gives a chance for migration to converge without throttling, at the cost of massive network stress and a longer overall migration time (which is bad user experience IMO). > IMHO here 3 is > okay since after all we are solving the problem of unconverged > migration, so as long as we can converge, I think it'll be fine. Based on 070afca25's commit message, Jason seemed to think that four was too much and meant to change it to two. As explained above, I'd be in favour of removing this counter altogether, or at least make it configurable (perhaps a #define would be enough an improvement for now). This patch is intended as a compromise by effectively using two. Great feedback! Thanks again. Felipe > > Thanks, > >> >> Signed-off-by: Felipe Franciosi <felipe@nutanix.com> >> --- >> migration/ram.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/migration/ram.c b/migration/ram.c >> index 1a3d9e6..26e03a5 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -708,7 +708,7 @@ static void migration_bitmap_sync(RAMState *rs) >> >> if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE > >> (bytes_xfer_now - rs->bytes_xfer_prev) / 2) && >> - (rs->dirty_rate_high_cnt++ >= 2)) { >> + (++rs->dirty_rate_high_cnt >= 2)) { >> trace_migration_throttle(); >> rs->dirty_rate_high_cnt = 0; >> mig_throttle_guest_down(); >> -- >> 1.9.5 >> > > -- > Peter Xu
On Thu, May 25, 2017 at 11:20:02AM +0000, Felipe Franciosi wrote: > + Matthew Rosato, original reviewer of 070afca25 > > > On 25 May 2017, at 02:03, Peter Xu <peterx@redhat.com> wrote: > > > > On Wed, May 24, 2017 at 05:10:03PM +0100, Felipe Franciosi wrote: > >> The commit message from 070afca25 suggests that dirty_rate_high_cnt > >> should be used more aggressively to start throttling after two > >> iterations instead of four. The code, however, only changes the auto > >> convergence behaviour to throttle after three iterations. This makes the > >> behaviour more aggressive by kicking off throttling after two iterations > >> as originally intended. > > > > For this one, I don't think fixing the code to match the commit > > message is that important. Instead, for me this patch looks more like > > something "changed iteration loops from 3 to 2". > > I agree. If we decide a v2 is needed I can amend the commit message accordingly. > > > So the point is, what > > would be the best practical number for it. And when we change an > > existing value, we should have some reason, since it'll change > > behavior of existing user (though I'm not sure whether this one will > > affect much). > > We've done a few tests with this internally using various workloads (DBs, synthetic mem stress, &c.). In summary, and along the lines with how Qemu implements auto convergence today, I would say this counter should be removed. > > Consider that when live migrating a large-ish VM (100GB+ RAM), the network will be under stress for (at least) several minutes. At the same time, the sole purpose of this counter is to attempt convergence without having to throttle the guest. That is, it defers throttling in the hope that either the guest calms down or that the network gets less congested. > > For real workloads, both cases are unlikely to happen. Throttling will eventually be needed otherwise the migration will not converge. I am not much experienced with using auto convergence with real workload, but from what you explained I see it a reasonable change to even remove it (that sounds more persuasive to me than "just try to follow what the commit message said", with test results :), especially considering that we have cpu-throttle-initial and cpu-throttle-increment parameters as tunables, so we should be fine even we want to tune the speed down a bit in the future. > > > I believe with higher dirty_rate_high_cnt, we have more smooth > > throttling, but it'll be slower in responding; While if lower or even > > remove it, we'll get very fast throttling response speed but I guess > > it may be more possible to report a false positive? > > With a higher dirty_rate_high_cnt, migration will simply take longer (if not converging). For real workloads, it doesn't change how much throttling is required. For spiky or varied workloads, it gives a chance for migration to converge without throttling, at the cost of massive network stress and a longer overall migration time (which is bad user experience IMO). > > > IMHO here 3 is > > okay since after all we are solving the problem of unconverged > > migration, so as long as we can converge, I think it'll be fine. > > Based on 070afca25's commit message, Jason seemed to think that four was too much and meant to change it to two. As explained above, I'd be in favour of removing this counter altogether, or at least make it configurable (perhaps a #define would be enough an improvement for now). This patch is intended as a compromise by effectively using two. If to be a tunable, maybe another MigrationParameter? But I don't know whether it would really worth it since the other two can do more fine-grained tunes already. So I would prefer to remove it as well if thorough tests are carried out. Maybe we can also wait to see how other people think about it. Thanks,
Felipe Franciosi <felipe@nutanix.com> wrote: > The commit message from 070afca25 suggests that dirty_rate_high_cnt > should be used more aggressively to start throttling after two > iterations instead of four. The code, however, only changes the auto > convergence behaviour to throttle after three iterations. This makes the > behaviour more aggressive by kicking off throttling after two iterations > as originally intended. > > Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Juan Quintela <quintela@redhat.com> First, Felipe is doing performance testing with the changes. Second, if you want autoconverge slower, you can use smaller increments. So, I think that this is ok. It is more, if you want you can send numbers of your workloads with current parameter and removing it altogether to convince me that it is a good idea just to drop it. Thanks, Juan.
diff --git a/migration/ram.c b/migration/ram.c index 1a3d9e6..26e03a5 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -708,7 +708,7 @@ static void migration_bitmap_sync(RAMState *rs) if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2) && - (rs->dirty_rate_high_cnt++ >= 2)) { + (++rs->dirty_rate_high_cnt >= 2)) { trace_migration_throttle(); rs->dirty_rate_high_cnt = 0; mig_throttle_guest_down();
The commit message from 070afca25 suggests that dirty_rate_high_cnt should be used more aggressively to start throttling after two iterations instead of four. The code, however, only changes the auto convergence behaviour to throttle after three iterations. This makes the behaviour more aggressive by kicking off throttling after two iterations as originally intended. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> --- migration/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)