[RFC] migration: Introduce migration throttle event

Message ID 4df13a8005170ad42cbbc883a0a8fdbb1ab94ac1.1739846274.git.yong.huang@smartx.com (mailing list archive)
State New

Commit Message

Yong Huang Feb. 18, 2025, 2:39 a.m. UTC
From: Hyman Huang <yong.huang@smartx.com>

When the developer is examining the time distribution of
the migration, it is useful to record the migration throttle
timestamp. Consequently, include the migration throttle event.

Signed-off-by: Hyman Huang <yong.huang@smartx.com>
---
 migration/ram.c     |  1 +
 qapi/migration.json | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

Comments

Markus Armbruster Feb. 18, 2025, 5:44 a.m. UTC | #1
yong.huang@smartx.com writes:

> From: Hyman Huang <yong.huang@smartx.com>
>
> When the developer is examining the time distribution of
> the migration, it is useful to record the migration throttle
> timestamp. Consequently, include the migration throttle event.

Can you explain what you'd like to do with the information in a little
more detail?

> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/ram.c     |  1 +
>  qapi/migration.json | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 589b6505eb..725e029927 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
>  
>      /* We have not started throttling yet. Let's start it. */
>      if (!cpu_throttle_active()) {
> +        qapi_event_send_migration_throttle();
>          cpu_throttle_set(pct_initial);
>      } else {
>          /* Throttling already on, just increase the rate */

I guess the percentage is uninteresting because it changes too quickly.
Correct?

Would it make sense to track cpu_throttle_stop(), too?

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8b9c53595c..0495065b5d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1393,6 +1393,21 @@
>  { 'event': 'MIGRATION_PASS',
>    'data': { 'pass': 'int' } }
>  
> +##
> +# @MIGRATION_THROTTLE:
> +#
> +# Emitted from the source side of a migration at the start of vCPU throttle
> +#
> +# Since: 10.0
> +#
> +# Example:
> +#
> +# <- { "event": "MIGRATION_THROTTLE",
> +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> +#
> +##
> +{ 'event': 'MIGRATION_THROTTLE' }
> +
>  ##
>  # @COLOMessage:
>  #

Standard question for events: if a management application misses an
event, say because it restarts and reconnects, is there a way to obtain
the missed information with a query command?
Yong Huang Feb. 18, 2025, 6:50 a.m. UTC | #2
On Tue, Feb 18, 2025 at 1:44 PM Markus Armbruster <armbru@redhat.com> wrote:

> yong.huang@smartx.com writes:
>
> > From: Hyman Huang <yong.huang@smartx.com>
> >
> > When the developer is examining the time distribution of
> > the migration, it is useful to record the migration throttle
> > timestamp. Consequently, include the migration throttle event.
>
> Can you explain what you'd like to do with the information in a little
> more detail?


Throttle degrades guest performance during live migration;
with respect to the performance degradation aspect, migration
can be divided into the following phases when there is an excessive
memory load:

1. setup -> throttle
2. throttle -> switch-over
3. switch-over -> finished

In the 1st phase, performance degradation is mostly due to dirty
tracking.
In the 2nd phase, performance degradation is due to dirty tracking
plus throttling.
In the 3rd phase, performance degradation is due to stopping the vCPUs.
A throttle timestamp makes it possible to tell these three phases
apart, determine which one contributes most to the performance
degradation, and then optimize accordingly or generate a performance
report.
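
For illustration only, the phase breakdown could be computed along these
lines. This is a minimal sketch, not part of the patch: the timestamp
layout follows the "timestamp" member in the event example, while
MigrationTimeline, PhaseDurations and phase_durations() are hypothetical
names, and the setup, switch-over and finished timestamps are assumed to
have been collected elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the "timestamp" member of a QMP event. */
typedef struct {
    int64_t seconds;
    int64_t microseconds;
} QmpTimestamp;

/* Hypothetical record of the four points that delimit the phases. */
typedef struct {
    QmpTimestamp setup;
    QmpTimestamp throttle;    /* would come from MIGRATION_THROTTLE */
    QmpTimestamp switchover;
    QmpTimestamp finished;
} MigrationTimeline;

/* Hypothetical result: length of each phase in microseconds. */
typedef struct {
    int64_t pre_throttle_us;  /* 1. setup -> throttle */
    int64_t throttled_us;     /* 2. throttle -> switch-over */
    int64_t downtime_us;      /* 3. switch-over -> finished */
} PhaseDurations;

/* Flatten a seconds/microseconds pair into microseconds. */
static int64_t ts_to_us(QmpTimestamp ts)
{
    return ts.seconds * 1000000 + ts.microseconds;
}

/* Compute the three phase lengths from the four timestamps. */
static PhaseDurations phase_durations(const MigrationTimeline *t)
{
    PhaseDurations d;

    assert(t != NULL);
    d.pre_throttle_us = ts_to_us(t->throttle) - ts_to_us(t->setup);
    d.throttled_us = ts_to_us(t->switchover) - ts_to_us(t->throttle);
    d.downtime_us = ts_to_us(t->finished) - ts_to_us(t->switchover);
    return d;
}
```

A report generator could then compare the three values to see which
phase dominates.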

This patch has two goals: logging the throttle timestamp and generating
an event for management applications.


>
> > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> > ---
> >  migration/ram.c     |  1 +
> >  qapi/migration.json | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 589b6505eb..725e029927 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
> >
> >      /* We have not started throttling yet. Let's start it. */
> >      if (!cpu_throttle_active()) {
> > +        qapi_event_send_migration_throttle();
> >          cpu_throttle_set(pct_initial);
> >      } else {
> >          /* Throttling already on, just increase the rate */
>
> I guess the percentage is uninteresting because it changes too quickly.
> Correct?
>
>
QMP can already query the throttle percentage, but there is no way
to obtain the timestamp at which throttling started.


> Would it make sense to track cpu_throttle_stop(), too?
>

IMHO, the CPU throttle stop event might be less helpful when considering
the three phases I described above because it isn't an essential event for
guest performance deterioration investigation.


>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8b9c53595c..0495065b5d 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1393,6 +1393,21 @@
> >  { 'event': 'MIGRATION_PASS',
> >    'data': { 'pass': 'int' } }
> >
> > +##
> > +# @MIGRATION_THROTTLE:
> > +#
> > +# Emitted from the source side of a migration at the start of vCPU throttle
> > +#
> > +# Since: 10.0
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "MIGRATION_THROTTLE",
> > +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> > +#
> > +##
> > +{ 'event': 'MIGRATION_THROTTLE' }
> > +
> >  ##
> >  # @COLOMessage:
> >  #
>
> Standard question for events: if a management application misses an
> event, say because it restarts and reconnects, is there a way to obtain
> the missed information with a query command?
>

During live migration, such an event is not inevitable: the management
application ought to be aware of this.

Thanks for the comment,
Yong
Peter Xu Feb. 18, 2025, 6:15 p.m. UTC | #3
On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
> From: Hyman Huang <yong.huang@smartx.com>
> 
> When the developer is examining the time distribution of
> the migration, it is useful to record the migration throttle
> timestamp. Consequently, include the migration throttle event.

Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
also the new percentage of throttle.

I don't feel strongly that we must not introduce QMP events for debugging,
but allowing it means we could eventually end up with tons of events, as
people start requesting many more of them, and we will need some way to
justify each one.

One way to justify an event is that it can be consumed by mgmt.  I'm not
yet sure that holds for this one, so ideally tracepoints would already do
the job.
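
For illustration only, a tool consuming the tracepoint could look roughly
like this. It is a minimal sketch under one loud assumption: that the
trace "log" backend prints lines of the form
"<pid>@<sec>.<usec>:cpu_throttle_set set guest CPU throttled by <pct>%".
That layout may differ between QEMU versions and trace backends, and
ThrottleSample and parse_throttle_trace() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result of parsing one cpu_throttle_set trace line. */
typedef struct {
    int64_t timestamp_us;  /* wall-clock time in microseconds */
    int pct;               /* new throttle percentage */
} ThrottleSample;

/*
 * Parse one trace line in the ASSUMED log-backend layout
 * "<pid>@<sec>.<usec>:cpu_throttle_set set guest CPU throttled by <pct>%".
 * Returns true and fills *out on a match, false for any other line.
 */
static bool parse_throttle_trace(const char *line, ThrottleSample *out)
{
    int pid, pct;
    long long sec, usec;

    assert(line != NULL && out != NULL);
    if (sscanf(line,
               "%d@%lld.%lld:cpu_throttle_set set guest CPU throttled by %d",
               &pid, &sec, &usec, &pct) != 4) {
        return false;
    }
    /* Assumes the fractional part is always printed as six digits. */
    out->timestamp_us = sec * 1000000 + usec;
    out->pct = pct;
    return true;
}
```

The first matching line would give the throttle start time, and later
lines would give each percentage change.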

> 
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/ram.c     |  1 +
>  qapi/migration.json | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 589b6505eb..725e029927 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
>  
>      /* We have not started throttling yet. Let's start it. */
>      if (!cpu_throttle_active()) {
> +        qapi_event_send_migration_throttle();
>          cpu_throttle_set(pct_initial);
>      } else {
>          /* Throttling already on, just increase the rate */
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8b9c53595c..0495065b5d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1393,6 +1393,21 @@
>  { 'event': 'MIGRATION_PASS',
>    'data': { 'pass': 'int' } }
>  
> +##
> +# @MIGRATION_THROTTLE:
> +#
> +# Emitted from the source side of a migration at the start of vCPU throttle
> +#
> +# Since: 10.0
> +#
> +# Example:
> +#
> +# <- { "event": "MIGRATION_THROTTLE",
> +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> +#
> +##
> +{ 'event': 'MIGRATION_THROTTLE' }
> +
>  ##
>  # @COLOMessage:
>  #
> -- 
> 2.27.0
>
Markus Armbruster Feb. 18, 2025, 7:21 p.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
>> From: Hyman Huang <yong.huang@smartx.com>
>> 
>> When the developer is examining the time distribution of
>> the migration, it is useful to record the migration throttle
>> timestamp. Consequently, include the migration throttle event.
>
> Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
> also the new percentage of throttle.
>
> I don't feel strongly that we must not introduce qmp events for debugging,
> but allowing that to happen means we can get tons of events at last.. as
> people can start requesting many more events, and we'll need one way to
> justify them at last.
>
> One way to justify events can be that it could be consumed by mgmt.  On
> that, this one I'm not yet sure.. so ideally tracepoints could work already.

Good point.
Yong Huang Feb. 19, 2025, 1:31 a.m. UTC | #5
On Wed, Feb 19, 2025 at 4:24 AM Markus Armbruster <armbru@redhat.com> wrote:

> Peter Xu <peterx@redhat.com> writes:
>
> > On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
> >> From: Hyman Huang <yong.huang@smartx.com>
> >>
> >> When the developer is examining the time distribution of
> >> the migration, it is useful to record the migration throttle
> >> timestamp. Consequently, include the migration throttle event.
> >
> > > Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
> > > also the new percentage of throttle.
> >
> > > I don't feel strongly that we must not introduce qmp events for debugging,
> > but allowing that to happen means we can get tons of events at last.. as
> > people can start requesting many more events, and we'll need one way to
> > justify them at last.
> >
> > One way to justify events can be that it could be consumed by mgmt.  On
> > > that, this one I'm not yet sure.. so ideally tracepoints could work already.
>
> Good point.
>
>
Ack
Patch

diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb..725e029927 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -524,6 +524,7 @@  static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
 
     /* We have not started throttling yet. Let's start it. */
     if (!cpu_throttle_active()) {
+        qapi_event_send_migration_throttle();
         cpu_throttle_set(pct_initial);
     } else {
         /* Throttling already on, just increase the rate */
diff --git a/qapi/migration.json b/qapi/migration.json
index 8b9c53595c..0495065b5d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1393,6 +1393,21 @@ 
 { 'event': 'MIGRATION_PASS',
   'data': { 'pass': 'int' } }
 
+##
+# @MIGRATION_THROTTLE:
+#
+# Emitted from the source side of a migration at the start of vCPU throttle
+#
+# Since: 10.0
+#
+# Example:
+#
+# <- { "event": "MIGRATION_THROTTLE",
+#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
+#
+##
+{ 'event': 'MIGRATION_THROTTLE' }
+
 ##
 # @COLOMessage:
 #