Message ID | 4df13a8005170ad42cbbc883a0a8fdbb1ab94ac1.1739846274.git.yong.huang@smartx.com (mailing list archive) |
---|---|
State | New |
Series | [RFC] migration: Introduce migration throttle event |
yong.huang@smartx.com writes:

> From: Hyman Huang <yong.huang@smartx.com>
>
> When the developer is examining the time distribution of
> the migration, it is useful to record the migration throttle
> timestamp. Consequently, include the migration throttle event.

Can you explain what you'd like to do with the information in a little
more detail?

> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/ram.c     |  1 +
>  qapi/migration.json | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 589b6505eb..725e029927 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
>
>      /* We have not started throttling yet. Let's start it. */
>      if (!cpu_throttle_active()) {
> +        qapi_event_send_migration_throttle();
>          cpu_throttle_set(pct_initial);
>      } else {
>          /* Throttling already on, just increase the rate */

I guess the percentage is uninteresting because it changes too quickly.
Correct?

Would it make sense to track cpu_throttle_stop(), too?

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8b9c53595c..0495065b5d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1393,6 +1393,21 @@
>  { 'event': 'MIGRATION_PASS',
>    'data': { 'pass': 'int' } }
>
> +##
> +# @MIGRATION_THROTTLE:
> +#
> +# Emitted from the source side of a migration at the start of vCPU throttle
> +#
> +# Since: 10.0
> +#
> +# Example:
> +#
> +# <- { "event": "MIGRATION_THROTTLE",
> +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> +#
> +##
> +{ 'event': 'MIGRATION_THROTTLE' }
> +
>  ##
>  # @COLOMessage:
>  #

Standard question for events: if a management application misses an
event, say because it restarts and reconnects, is there a way to obtain
the missed information with a query command?
On Tue, Feb 18, 2025 at 1:44 PM Markus Armbruster <armbru@redhat.com> wrote:

> yong.huang@smartx.com writes:
>
> > From: Hyman Huang <yong.huang@smartx.com>
> >
> > When the developer is examining the time distribution of
> > the migration, it is useful to record the migration throttle
> > timestamp. Consequently, include the migration throttle event.
>
> Can you explain what you'd like to do with the information in a little
> more detail?

Throttle degrades guest performance during live migration. In terms of
performance degradation, a migration under excessive memory load can be
divided into the following phases:

1. setup -> throttle
2. throttle -> switch-over
3. switch-over -> finished

In the 1st phase, performance degradation is mostly caused by dirty tracking.
In the 2nd phase, it is caused by dirty tracking plus throttling.
In the 3rd phase, it is caused by stopping the vCPUs.

A throttle timestamp helps to tell these three phases apart, to determine
which one contributes most to the performance degradation, and then to
optimize performance, generate a performance report, or similar.

This patch has two goals: logging the throttle timestamp and generating an
event for management applications.

> > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> > ---
> >  migration/ram.c     |  1 +
> >  qapi/migration.json | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 589b6505eb..725e029927 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
> >
> >      /* We have not started throttling yet. Let's start it. */
> >      if (!cpu_throttle_active()) {
> > +        qapi_event_send_migration_throttle();
> >          cpu_throttle_set(pct_initial);
> >      } else {
> >          /* Throttling already on, just increase the rate */
>
> I guess the percentage is uninteresting because it changes too quickly.
> Correct?

QMP can already query the throttle percentage, but there is no way to
obtain the timestamp at which throttling started.

> Would it make sense to track cpu_throttle_stop(), too?

IMHO, the CPU throttle stop event might be less helpful when considering the
three phases I described above because it isn't an essential event for
investigating guest performance degradation.

> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8b9c53595c..0495065b5d 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1393,6 +1393,21 @@
> >  { 'event': 'MIGRATION_PASS',
> >    'data': { 'pass': 'int' } }
> >
> > +##
> > +# @MIGRATION_THROTTLE:
> > +#
> > +# Emitted from the source side of a migration at the start of vCPU throttle
> > +#
> > +# Since: 10.0
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "MIGRATION_THROTTLE",
> > +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> > +#
> > +##
> > +{ 'event': 'MIGRATION_THROTTLE' }
> > +
> >  ##
> >  # @COLOMessage:
> >  #
>
> Standard question for events: if a management application misses an
> event, say because it restarts and reconnects, is there a way to obtain
> the missed information with a query command?

During live migration, such an event is not inevitable: the management
application ought to be aware of this.

Thanks for the comment,

Yong
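
The three phases described in the reply above become measurable once a timestamp exists for each boundary. Below is a minimal sketch in C, assuming every boundary is available as a QMP-style seconds/microseconds pair. The names `qmp_ts`, `mig_marks`, `elapsed_us` and `phase_durations` are hypothetical and are not part of QEMU.

```c
#include <stdint.h>

/* Hypothetical mirror of the "timestamp" member carried by QMP events. */
struct qmp_ts {
    int64_t seconds;
    int64_t microseconds;
};

/* Hypothetical record of the four phase boundaries named in the thread. */
struct mig_marks {
    struct qmp_ts setup;
    struct qmp_ts throttle;
    struct qmp_ts switchover;
    struct qmp_ts finished;
};

/* Microseconds elapsed from 'from' to 'to'. */
static int64_t elapsed_us(struct qmp_ts from, struct qmp_ts to)
{
    return (to.seconds - from.seconds) * 1000000
           + (to.microseconds - from.microseconds);
}

/*
 * out[0]: setup -> throttle
 * out[1]: throttle -> switch-over
 * out[2]: switch-over -> finished
 */
static void phase_durations(const struct mig_marks *m, int64_t out[3])
{
    out[0] = elapsed_us(m->setup, m->throttle);
    out[1] = elapsed_us(m->throttle, m->switchover);
    out[2] = elapsed_us(m->switchover, m->finished);
}
```

Without a throttle timestamp, only the sum of the first two durations could be computed, which is the gap the patch description points at.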
On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
> From: Hyman Huang <yong.huang@smartx.com>
>
> When the developer is examining the time distribution of
> the migration, it is useful to record the migration throttle
> timestamp. Consequently, include the migration throttle event.

Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
also the new percentage of throttle.

I don't feel strongly that we must not introduce qmp events for debugging,
but allowing that to happen means we can get tons of events at last.. as
people can start requesting many more events, and we'll need one way to
justify them at last.

One way to justify events can be that it could be consumed by mgmt.  On
that, this one I'm not yet sure.. so ideally tracepoints could work already.

>
> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
> ---
>  migration/ram.c     |  1 +
>  qapi/migration.json | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 589b6505eb..725e029927 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
>
>      /* We have not started throttling yet. Let's start it. */
>      if (!cpu_throttle_active()) {
> +        qapi_event_send_migration_throttle();
>          cpu_throttle_set(pct_initial);
>      } else {
>          /* Throttling already on, just increase the rate */
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8b9c53595c..0495065b5d 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1393,6 +1393,21 @@
>  { 'event': 'MIGRATION_PASS',
>    'data': { 'pass': 'int' } }
>
> +##
> +# @MIGRATION_THROTTLE:
> +#
> +# Emitted from the source side of a migration at the start of vCPU throttle
> +#
> +# Since: 10.0
> +#
> +# Example:
> +#
> +# <- { "event": "MIGRATION_THROTTLE",
> +#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
> +#
> +##
> +{ 'event': 'MIGRATION_THROTTLE' }
> +
>  ##
>  # @COLOMessage:
>  #
> --
> 2.27.0
>
Peter Xu <peterx@redhat.com> writes:

> On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
>> From: Hyman Huang <yong.huang@smartx.com>
>>
>> When the developer is examining the time distribution of
>> the migration, it is useful to record the migration throttle
>> timestamp. Consequently, include the migration throttle event.
>
> Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
> also the new percentage of throttle.
>
> I don't feel strongly that we must not introduce qmp events for debugging,
> but allowing that to happen means we can get tons of events at last.. as
> people can start requesting many more events, and we'll need one way to
> justify them at last.
>
> One way to justify events can be that it could be consumed by mgmt.  On
> that, this one I'm not yet sure.. so ideally tracepoints could work already.

Good point.
On Wed, Feb 19, 2025 at 4:24 AM Markus Armbruster <armbru@redhat.com> wrote:

> Peter Xu <peterx@redhat.com> writes:
>
> > On Tue, Feb 18, 2025 at 10:39:55AM +0800, yong.huang@smartx.com wrote:
> >> From: Hyman Huang <yong.huang@smartx.com>
> >>
> >> When the developer is examining the time distribution of
> >> the migration, it is useful to record the migration throttle
> >> timestamp. Consequently, include the migration throttle event.
> >
> > Would trace_cpu_throttle_set() work too?  That can provide a timestamp and
> > also the new percentage of throttle.
> >
> > I don't feel strongly that we must not introduce qmp events for debugging,
> > but allowing that to happen means we can get tons of events at last.. as
> > people can start requesting many more events, and we'll need one way to
> > justify them at last.
> >
> > One way to justify events can be that it could be consumed by mgmt.  On
> > that, this one I'm not yet sure.. so ideally tracepoints could work already.
>
> Good point.
>

Ack
diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb..725e029927 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -524,6 +524,7 @@ static void mig_throttle_guest_down(uint64_t bytes_dirty_period,
 
     /* We have not started throttling yet. Let's start it. */
     if (!cpu_throttle_active()) {
+        qapi_event_send_migration_throttle();
         cpu_throttle_set(pct_initial);
     } else {
         /* Throttling already on, just increase the rate */
diff --git a/qapi/migration.json b/qapi/migration.json
index 8b9c53595c..0495065b5d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1393,6 +1393,21 @@
 { 'event': 'MIGRATION_PASS',
   'data': { 'pass': 'int' } }
 
+##
+# @MIGRATION_THROTTLE:
+#
+# Emitted from the source side of a migration at the start of vCPU throttle
+#
+# Since: 10.0
+#
+# Example:
+#
+# <- { "event": "MIGRATION_THROTTLE",
+#      "timestamp": { "seconds": 1267041730, "microseconds": 281295 } }
+#
+##
+{ 'event': 'MIGRATION_THROTTLE' }
+
 ##
 # @COLOMessage:
 #
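
For illustration, a consumer of the event proposed in this RFC could pick the throttle timestamp out of a QMP event line roughly as follows. This is only a sketch under stated assumptions: the event exists only in this patch, the helper name `parse_throttle_event` is hypothetical, and the code scans for substrings instead of parsing JSON, which a real management application would do with a proper JSON library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 and fill *sec / *usec if 'line' looks like the proposed
 * MIGRATION_THROTTLE event, 0 otherwise.
 * Illustration only: substring scanning, not real JSON parsing.
 */
static int parse_throttle_event(const char *line, long long *sec, long *usec)
{
    const char *p;

    if (!strstr(line, "\"MIGRATION_THROTTLE\"")) {
        return 0;
    }

    /* The leading quote keeps the "microseconds" key from matching here. */
    p = strstr(line, "\"seconds\"");
    if (!p || sscanf(p, "\"seconds\" : %lld", sec) != 1) {
        return 0;
    }

    p = strstr(line, "\"microseconds\"");
    if (!p || sscanf(p, "\"microseconds\" : %ld", usec) != 1) {
        return 0;
    }

    return 1;
}
```

Fed the example line from the schema comment in the patch, the helper reports seconds 1267041730 and microseconds 281295. Any other event, such as MIGRATION_PASS, is rejected.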