Message ID | 20190122150543.16889-2-bala24@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | migration: calculate expected_downtime considering redirtied ram |
bala24@linux.vnet.ibm.com wrote:
> From: Balamuruhan S <bala24@linux.vnet.ibm.com>
>
> currently we calculate expected_downtime by time taken to transfer
> remaining ram, but during the time we had transferred remaining ram
> few pages of ram might be redirtied and we need to retransfer it,
> so it is better to consider them for calculating expected_downtime
> for getting more accurate values.
>
> Total ram to be transferred = remaining ram + (redirtied ram at the
>                                                time when the remaining
>                                                ram gets transferred)
>
> redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
>
> redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
>
> expected_downtime = (remaining ram + redirtied ram) / bandwidth
>
> Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
>  migration/migration.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index ffc4d9e556..dc38e9a380 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
>       * recalculate. 10000 is a small enough number for our purposes
>       */
>      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> -        s->expected_downtime = ram_counters.remaining / bandwidth;
> +        /* Time required to transfer remaining ram */
> +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth

missing semicolon

> +
> +        /* redirty of ram at the time remaining ram gets transferred*/
> +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time

the same.

Declaration of the new variables is also missing.

> +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
> +
>      }
>
>      qemu_file_reset_rate_limit(s->to_dst_file);

About the numbers, I am not against it. It is a heuristic. Without
numbers (and it is very load dependent) it is not clear that this one is
going to be much worse/better than the previous one (this should be a
bit better, though).

Thanks, Juan.
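To make the review comments concrete, here is a minimal standalone sketch of the heuristic the patch proposes, with the missing declarations and semicolons Juan points out filled in. The function name, the parameter types, and the assumption that the dirty rate is already expressed in bytes per second (the patch uses ram_counters.dirty_pages_rate directly) are illustrative only, not taken from QEMU.

#include <stdint.h>

/*
 * Illustrative sketch only, not QEMU code: estimate downtime as the
 * time needed to send the pages that remain now plus the pages
 * expected to be redirtied while those are being sent.
 *
 * remaining  - bytes still to be transferred
 * dirty_rate - bytes assumed to be redirtied per second
 * bandwidth  - transfer rate in bytes per second
 *
 * Returns the estimated downtime in seconds.
 */
double estimate_expected_downtime(uint64_t remaining,
                                  double dirty_rate,
                                  double bandwidth)
{
    /* Time required to transfer the currently remaining ram */
    double remaining_transfer_time = remaining / bandwidth;

    /* Ram expected to be redirtied while the remaining ram is sent */
    double newly_dirtied = dirty_rate * remaining_transfer_time;

    return (remaining + newly_dirtied) / bandwidth;
}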
On Wed, Jan 23, 2019 at 05:35:03PM +0100, Juan Quintela wrote:
> bala24@linux.vnet.ibm.com wrote:
> > From: Balamuruhan S <bala24@linux.vnet.ibm.com>
> >
> > currently we calculate expected_downtime by time taken to transfer
> > remaining ram, but during the time we had transferred remaining ram
> > few pages of ram might be redirtied and we need to retransfer it,
> > so it is better to consider them for calculating expected_downtime
> > for getting more accurate values.
> >
> > Total ram to be transferred = remaining ram + (redirtied ram at the
> >                                                time when the remaining
> >                                                ram gets transferred)
> >
> > redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
> >
> > redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
> >
> > expected_downtime = (remaining ram + redirtied ram) / bandwidth
> >
> > Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> > Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > ---
> >  migration/migration.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index ffc4d9e556..dc38e9a380 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
> >       * recalculate. 10000 is a small enough number for our purposes
> >       */
> >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > -        s->expected_downtime = ram_counters.remaining / bandwidth;
> > +        /* Time required to transfer remaining ram */
> > +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
>
> missing semicolon
>
> > +
> > +        /* redirty of ram at the time remaining ram gets transferred*/
> > +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
>
> the same.
>
> Declaration of the new variables is also missing.
>
> > +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
> >      }
> >
> >      qemu_file_reset_rate_limit(s->to_dst_file);
>
> About the numbers, I am not against it. It is a heuristic. Without
> numbers (and it is very load dependent) it is not clear that this one is
> going to be much worse/better than the previous one (this should be a
> bit better, though).

Actually I have had a question on how expected_downtime is defined and
how it will be used by users.

My understanding is that the expected_downtime is defined as: how long
the guest will be down if we stop the VM right now and migrate all the
rest of the pages.

This definition makes sense in that it helps the customer to
dynamically decide whether it's a good point to go into the last phase
of migration. Currently we should be able to achieve that by setting a
very high target downtime.

And if that definition is the thing we want, the current calculation
seems to be exactly the number we want, since if we stop the VM right
now then there won't be any more data to be dirtied as well.

Regards,
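To illustrate the difference between the two readings discussed here, the toy program below plugs made-up figures into both formulas: the current "stop the VM right now" estimate and the patch's "guest keeps dirtying memory while the remaining ram is sent" estimate. The numbers, and the assumption that the dirty rate is expressed in bytes per second, are purely illustrative.

#include <stdio.h>

int main(void)
{
    /* Example figures only -- not measurements from a real migration */
    double remaining  = 2.0 * (1ULL << 30);   /* 2 GiB left to send   */
    double bandwidth  = 1.0 * (1ULL << 30);   /* 1 GiB/s link         */
    double dirty_rate = 256.0 * (1ULL << 20); /* 256 MiB/s redirtied  */

    /* Current QEMU heuristic: stop the VM now and send what is left */
    double stop_now = remaining / bandwidth;

    /* Patch's heuristic: also resend what gets redirtied meanwhile */
    double with_redirty = (remaining + dirty_rate * stop_now) / bandwidth;

    printf("stop-now estimate:     %.2f s\n", stop_now);      /* 2.00 s */
    printf("with-redirty estimate: %.2f s\n", with_redirty);  /* 2.50 s */
    return 0;
}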
On Thu, Jan 24, 2019 at 03:18:22PM +0800, Peter Xu wrote:
> On Wed, Jan 23, 2019 at 05:35:03PM +0100, Juan Quintela wrote:
> > bala24@linux.vnet.ibm.com wrote:
> > > From: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > >
> > > currently we calculate expected_downtime by time taken to transfer
> > > remaining ram, but during the time we had transferred remaining ram
> > > few pages of ram might be redirtied and we need to retransfer it,
> > > so it is better to consider them for calculating expected_downtime
> > > for getting more accurate values.
> > >
> > > Total ram to be transferred = remaining ram + (redirtied ram at the
> > >                                                time when the remaining
> > >                                                ram gets transferred)
> > >
> > > redirtied ram = dirty_pages_rate * time taken to transfer remaining ram
> > >
> > > redirtied ram = dirty_pages_rate * (remaining ram / bandwidth)
> > >
> > > expected_downtime = (remaining ram + redirtied ram) / bandwidth
> > >
> > > Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> > > Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > ---
> > >  migration/migration.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ffc4d9e556..dc38e9a380 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
> > >       * recalculate. 10000 is a small enough number for our purposes
> > >       */
> > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > -        s->expected_downtime = ram_counters.remaining / bandwidth;
> > > +        /* Time required to transfer remaining ram */
> > > +        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
> >
> > missing semicolon
> >
> > > +
> > > +        /* redirty of ram at the time remaining ram gets transferred*/
> > > +        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
> >
> > the same.
> >
> > Declaration of the new variables is also missing.
> >
> > > +        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
> > >      }
> > >
> > >      qemu_file_reset_rate_limit(s->to_dst_file);
> >
> > About the numbers, I am not against it. It is a heuristic. Without
> > numbers (and it is very load dependent) it is not clear that this one is
> > going to be much worse/better than the previous one (this should be a
> > bit better, though).
>
> Actually I have had a question on how expected_downtime is defined and
> how it will be used by users.
>
> My understanding is that the expected_downtime is defined as: how long
> the guest will be down if we stop the VM right now and migrate all the
> rest of the pages.
>
> This definition makes sense in that it helps the customer to
> dynamically decide whether it's a good point to go into the last phase
> of migration. Currently we should be able to achieve that by setting a
> very high target downtime.
>
> And if that definition is the thing we want, the current calculation
> seems to be exactly the number we want, since if we stop the VM right
> now then there won't be any more data to be dirtied as well.

Thank you, Peter. I thought about your definition and it makes sense;
with that definition, the existing calculation is appropriate and
correct.

--
Bala

>
> Regards,
>
> --
> Peter Xu
>
diff --git a/migration/migration.c b/migration/migration.c
index ffc4d9e556..dc38e9a380 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2903,7 +2903,13 @@ static void migration_update_counters(MigrationState *s,
      * recalculate. 10000 is a small enough number for our purposes
      */
     if (ram_counters.dirty_pages_rate && transferred > 10000) {
-        s->expected_downtime = ram_counters.remaining / bandwidth;
+        /* Time required to transfer remaining ram */
+        remaining_ram_transfer_time = ram_counters.remaining / bandwidth
+
+        /* redirty of ram at the time remaining ram gets transferred*/
+        newly_dirtied_ram = ram_counters.dirty_pages_rate * remaining_ram_transfer_time
+
+        s->expected_downtime = (ram_counters.remaining + newly_dirtied_ram) / bandwidth;
     }
 
     qemu_file_reset_rate_limit(s->to_dst_file);