Message ID | 1516369485-5374-11-git-send-email-zhangckid@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Zhang Chen <zhangckid@gmail.com> writes:

> From: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> If some errors happen during VM's COLO FT stage, it's important to
> notify the users of this event. Together with 'x-colo-lost-heartbeat',
> Users can intervene in COLO's failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exited COLO mode.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangckid@gmail.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 70e7b67..6fc95b7 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -869,6 +869,41 @@
>    'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
>
>  ##
> +# @COLO_EXIT:
> +#
> +# Emitted when VM finishes COLO mode due to some errors happening or
> +# at the request of users.
> +#
> +# @mode: which COLO mode the VM was in when it exited.
> +#
> +# @reason: describes the reason for the COLO exit.
> +#
> +# Since: 2.12
> +#
> +# Example:
> +#
> +# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
> +#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
> +#
> +##
> +{ 'event': 'COLO_EXIT',
> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }

Standard question when I see a new event: is there a way to poll for the
event's information?  If not, why don't we need one?

Remember, management applications might miss events when they lose the
connection and have to reconnect, say because the management application
needs to be restarted.

> +
> +##
> +# @COLOExitReason:
> +#
> +# The reason for a COLO exit
> +#
> +# @request: COLO exit is due to an external request
> +#
> +# @error: COLO exit is due to an internal error
> +#
> +# Since: 2.12
> +##
> +{ 'enum': 'COLOExitReason',
> +  'data': [ 'request', 'error' ] }
> +
> +##
>  # @x-colo-lost-heartbeat:
>  #
>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
On Sat, Feb 3, 2018 at 3:49 PM, Markus Armbruster <armbru@redhat.com> wrote:

> Zhang Chen <zhangckid@gmail.com> writes:
>
> [...]
> > +{ 'event': 'COLO_EXIT',
> > +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
>
> Standard question when I see a new event: is there a way to poll for the
> event's information?  If not, why don't we need one?

Do you mean we'd better print the information to a log file or something
like that for all qemu events?
CC Eric Blake <eblake@redhat.com>
any idea about this?

Thanks
Zhang Chen

> Remember, management applications might miss events when they lose the
> connection and have to reconnect, say because the management application
> needs to be restarted.
>
> [...]
Zhang Chen <zhangckid@gmail.com> writes:

> On Sat, Feb 3, 2018 at 3:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>
>> Zhang Chen <zhangckid@gmail.com> writes:
>>
>> [...]
>> > +{ 'event': 'COLO_EXIT',
>> > +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
>>
>> Standard question when I see a new event: is there a way to poll for the
>> event's information?  If not, why don't we need one?
>
> Do you mean we'd better print the information to a log file or something
> like that for all qemu events?
> CC Eric Blake <eblake@redhat.com>
> any idea about this?

Events carrying state change information management applications want to
track are generally paired with a query- command.  While the management
application is connected, it can track by passively listening for state
change events.  After (re)connect, it has to actively query the current
state.

Questions?

>> Remember, management applications might miss events when they lose the
>> connection and have to reconnect, say because the management application
>> needs to be restarted.
>>
>> [...]
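The pattern described in this reply (listen for events while connected, query after reconnect) can be sketched from the management application's side. This is only an illustration in C, not code from QEMU or libvirt: `ColoView`, `on_colo_exit_event`, `on_reconnect` and `on_query_reply` are hypothetical names, and the query reply is assumed to report whether the VM is still in COLO mode plus the exit reason, since this series does not define a query command yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the 'COLOExitReason' values from the patch, plus a local "unknown". */
typedef enum { REASON_UNKNOWN, REASON_REQUEST, REASON_ERROR } ExitReason;

/* Hypothetical client-side cache of one VM's COLO state. */
typedef struct {
    bool in_colo;       /* VM is believed to be in COLO mode */
    ExitReason reason;  /* meaningful only when in_colo is false */
    bool synced;        /* false until the state was queried on this connection */
} ColoView;

/* Map the event's "reason" string onto the local enum. */
static ExitReason parse_reason(const char *s)
{
    if (strcmp(s, "request") == 0) {
        return REASON_REQUEST;
    }
    if (strcmp(s, "error") == 0) {
        return REASON_ERROR;
    }
    return REASON_UNKNOWN;
}

/* While connected: a COLO_EXIT event arrived, update the cache passively. */
static void on_colo_exit_event(ColoView *v, const char *reason)
{
    assert(v != NULL && reason != NULL);
    v->in_colo = false;
    v->reason = parse_reason(reason);
}

/* After (re)connect the cache may be stale, so flag it for an active query. */
static void on_reconnect(ColoView *v)
{
    assert(v != NULL);
    v->synced = false;
}

/* Reply of the assumed query command: it overrides whatever was cached. */
static void on_query_reply(ColoView *v, bool in_colo, ExitReason reason)
{
    assert(v != NULL);
    v->in_colo = in_colo;
    v->reason = in_colo ? REASON_UNKNOWN : reason;
    v->synced = true;
}
```

The point of the split is that the event handler only ever refines a view that the query established, so a missed event costs nothing worse than a stale cache until the next query.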
On Tue, Feb 6, 2018 at 3:27 PM, Markus Armbruster <armbru@redhat.com> wrote:

> Zhang Chen <zhangckid@gmail.com> writes:
>
> [...]
> >> Standard question when I see a new event: is there a way to poll for the
> >> event's information?  If not, why don't we need one?
> >
> > Do you mean we'd better print the information to a log file or
> > something like that for all qemu events?
> > CC Eric Blake <eblake@redhat.com>
> > any idea about this?
>
> Events carrying state change information management applications want to
> track are generally paired with a query- command.  While the management
> application is connected, it can track by passively listening for state
> change events.  After (re)connect, it has to actively query the current
> state.
>
> Questions?

If I understand correctly, maybe we need a qemu events general history
mechanism to solve this problem, because lots of qemu events can't resend
the current state. Yes, when the management application (like libvirt)
loses the connection to qemu, it can't get the information after
reconnecting.

Thanks
Zhang Chen

> >> Remember, management applications might miss events when they lose the
> >> connection and have to reconnect, say because the management application
> >> needs to be restarted.
> >>
> >> [...]
Zhang Chen <zhangckid@gmail.com> writes:

> On Tue, Feb 6, 2018 at 3:27 PM, Markus Armbruster <armbru@redhat.com> wrote:
>
>> Zhang Chen <zhangckid@gmail.com> writes:
>>
>> > On Sat, Feb 3, 2018 at 3:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> >> Standard question when I see a new event: is there a way to poll for the
>> >> event's information?  If not, why don't we need one?
>> >
>> > Do you mean we'd better print the information to a log file or something
>> > like that for all qemu events?
>> > CC Eric Blake <eblake@redhat.com>
>> > any idea about this?
>>
>> Events carrying state change information management applications want to
>> track are generally paired with a query- command.  While the management
>> application is connected, it can track by passively listening for state
>> change events.  After (re)connect, it has to actively query the current
>> state.
>>
>> Questions?
>
> If I understand correctly, maybe we need a qemu events general history
> mechanism to solve this problem, because lots of qemu events can't resend
> the current state. Yes, when the management application (like libvirt)
> loses the connection to qemu, it can't get the information after
> reconnecting.

Events can't resend the current state, but query commands can.

Designing an "events general history mechanism" could well be
non-trivial.  Its implementation might not be simple, either.  Query
commands, on the other hand, are well understood and easy to implement.
On Tue, Feb 6, 2018 at 5:53 PM, Markus Armbruster <armbru@redhat.com> wrote:

> Zhang Chen <zhangckid@gmail.com> writes:
>
> [...]
> > If I understand correctly, maybe we need a qemu events general history
> > mechanism to solve this problem, because lots of qemu events can't
> > resend the current state. Yes, when the management application (like
> > libvirt) loses the connection to qemu, it can't get the information
> > after reconnecting.
>
> Events can't resend the current state, but query commands can.
>
> Designing an "events general history mechanism" could well be
> non-trivial.  Its implementation might not be simple, either.  Query
> commands, on the other hand, are well understood and easy to implement.

OK, I got it.
I will add a new query command for the COLO state in the next version.

Thanks for your comments.
Zhang Chen
On 02/05/2018 09:13 PM, Zhang Chen wrote:

>>> +##
>>> +{ 'event': 'COLO_EXIT',
>>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
>>
>> Standard question when I see a new event: is there a way to poll for the
>> event's information?  If not, why don't we need one?
>
> Do you mean we'd better print the information to a log file or something
> like that for all qemu events?
> CC Eric Blake <eblake@redhat.com>
> any idea about this?

Nothing to add, Markus is right - implementing a new mechanism that logs
all events as they are issued, and teaching libvirt to parse that log at
startup, is more work than just implementing a query-foo command that
libvirt already knows how to use to query current state on first connect
(and based on that query, make an intelligent decision on whether at
least one event was missed during downtime).

So far, no one has come up with an event that is so important it must be
logged, when compared to the working alternative of just having events
be ways to optimize performance so that the query- command doesn't have
to be polled all the time, but no severe loss if the event is missed
because the query- can be used in its place.
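The "intelligent decision on whether at least one event was missed" can be sketched as a comparison between the state cached before the disconnect and the state queried on reconnect. This is a hedged illustration in C, not libvirt code: `ColoSnapshot` and `colo_exit_missed` are hypothetical, and the snapshot fields are an assumption about what a future query command might report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the 'COLOExitReason' values from the patch. */
typedef enum { EXIT_REASON_REQUEST, EXIT_REASON_ERROR } ExitReason;

/* Hypothetical snapshot of what a query command might report. */
typedef struct {
    bool in_colo;       /* VM is currently in COLO mode */
    ExitReason reason;  /* meaningful only when in_colo is false */
} ColoSnapshot;

/*
 * Compare the state cached before the disconnect with the state queried
 * after reconnecting.  Returns true if the VM left COLO mode while nobody
 * was listening, and stores the queried exit reason in *missed_reason.
 */
static bool colo_exit_missed(const ColoSnapshot *before,
                             const ColoSnapshot *queried,
                             ExitReason *missed_reason)
{
    assert(before != NULL && queried != NULL && missed_reason != NULL);

    if (before->in_colo && !queried->in_colo) {
        *missed_reason = queried->reason;
        return true;
    }
    return false;
}
```

A comparison like this can only tell that at least one exit happened; it cannot see an exit that was followed by re-entering COLO mode during the downtime, which matches the "at least one event" wording above.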
diff --git a/migration/colo.c b/migration/colo.c
index 8d2e3f8..790b122 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -516,6 +516,18 @@ out:
         qemu_fclose(fb);
     }
 
+    /*
+     * There are only two reasons we can go here, some error happened.
+     * Or the user triggered failover.
+     */
+    if (failover_get_state() == FAILOVER_STATUS_NONE) {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
+
     /* Hope this not to be too long to wait here */
     qemu_sem_wait(&s->colo_exit_sem);
     qemu_sem_destroy(&s->colo_exit_sem);
@@ -746,6 +758,13 @@ out:
     if (local_err) {
         error_report_err(local_err);
     }
+    if (failover_get_state() == FAILOVER_STATUS_NONE) {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
 
     if (fb) {
         qemu_fclose(fb);
diff --git a/qapi/migration.json b/qapi/migration.json
index 70e7b67..6fc95b7 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -869,6 +869,41 @@
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
 ##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.12
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.12
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat:
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
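
Both colo.c hunks make the same decision: a failover state of "none" means nobody requested the exit, so it is reported as an error, and any other state is reported as a request. A minimal standalone sketch of that mapping follows. The enums here are local stand-ins that reuse the names from the patch so the block compiles outside the QEMU tree, and `colo_exit_reason_for` is a hypothetical helper that is not part of the patch.

```c
/*
 * Local stand-ins for the QAPI-generated enums, declared here only so the
 * sketch is self-contained.  The value lists follow the schema in the diff.
 */
typedef enum {
    FAILOVER_STATUS_NONE,
    FAILOVER_STATUS_REQUIRE,
    FAILOVER_STATUS_ACTIVE,
    FAILOVER_STATUS_COMPLETED,
    FAILOVER_STATUS_RELAUNCH,
} FailoverStatus;

typedef enum {
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} COLOExitReason;

/*
 * Hypothetical helper: derive the exit reason from the failover state.
 * No failover in progress means the exit was not requested, so it is
 * treated as an error; every other state counts as a request.
 */
static COLOExitReason colo_exit_reason_for(FailoverStatus state)
{
    return state == FAILOVER_STATUS_NONE ? COLO_EXIT_REASON_ERROR
                                         : COLO_EXIT_REASON_REQUEST;
}
```

With a helper of this shape, each call site would reduce to a single event-sending call taking the mode and `colo_exit_reason_for(failover_get_state())`; whether that refactoring is wanted is a judgment for the patch author.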