Message ID | 1526268228-27951-11-git-send-email-zhangckid@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Zhang Chen <zhangckid@gmail.com> writes:

> From: zhanghailiang <zhang.zhanghailiang@huawei.com>
>
> If some errors happen during VM's COLO FT stage, it's important to
> notify the users of this event. Together with 'x-colo-lost-heartbeat',
> Users can intervene in COLO's failover work immediately.
> If users don't want to get involved in COLO's failover verdict,
> it is still necessary to notify users that we exited COLO mode.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangckid@gmail.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  migration/colo.c    | 20 ++++++++++++++++++++
>  qapi/migration.json | 37 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index c083d36..8ca6381 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -28,6 +28,7 @@
>  #include "net/colo-compare.h"
>  #include "net/colo.h"
>  #include "block/block.h"
> +#include "qapi/qapi-events-migration.h"
>
>  static bool vmstate_loading;
>  static Notifier packets_compare_notifier;
> @@ -514,6 +515,18 @@ out:
>          qemu_fclose(fb);
>      }
>
> +    /*
> +     * There are only two reasons we can go here, some error happened.
> +     * Or the user triggered failover.
> +     */
> +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> +                                  COLO_EXIT_REASON_ERROR, NULL);
> +    } else {
> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> +                                  COLO_EXIT_REASON_REQUEST, NULL);
> +    }

Your comment makes me suspect failover_get_state() can only be
FAILOVER_STATUS_NONE or FAILOVER_STATUS_REQUIRE here.  Is that correct?

If yes, I recommend to add a suitable assertion.

> +
>      /* Hope this not to be too long to wait here */
>      qemu_sem_wait(&s->colo_exit_sem);
>      qemu_sem_destroy(&s->colo_exit_sem);
> @@ -744,6 +757,13 @@ out:
>      if (local_err) {
>          error_report_err(local_err);
>      }
> +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> +                                  COLO_EXIT_REASON_ERROR, NULL);
> +    } else {
> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> +                                  COLO_EXIT_REASON_REQUEST, NULL);
> +    }

Same question.

>
>      if (fb) {
>          qemu_fclose(fb);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index f3974c6..55dae48 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -875,6 +875,43 @@
>    'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
>
>  ##
> +# @COLO_EXIT:
> +#
> +# Emitted when VM finishes COLO mode due to some errors happening or
> +# at the request of users.
> +#
> +# @mode: report COLO mode when COLO exited.
> +#
> +# @reason: describes the reason for the COLO exit.
> +#
> +# Since: 2.13
> +#
> +# Example:
> +#
> +# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
> +#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
> +#
> +##
> +{ 'event': 'COLO_EXIT',
> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }

'data' duplicates the next patch's ColoStatus, except it lacks
@colo-running.  Factoring out the common part doesn't seem worth the
bother.  Okay as is.

> +
> +##
> +# @COLOExitReason:
> +#
> +# The reason for a COLO exit
> +#
> +# @none: no failover has ever happened.

This can't occur in the COLO_EXIT event, only in the result of
query-colo-status, can it?  Worth spelling that out in the
documentation?

> +#
> +# @request: COLO exit is due to an external request
> +#
> +# @error: COLO exit is due to an internal error
> +#
> +# Since: 2.13
> +##
> +{ 'enum': 'COLOExitReason',
> +  'data': [ 'none', 'request', 'error' ] }
> +
> +##
>  # @x-colo-lost-heartbeat:
>  #
>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
On Tue, May 15, 2018 at 10:29 PM, Markus Armbruster <armbru@redhat.com> wrote:

> Zhang Chen <zhangckid@gmail.com> writes:
>
> > From: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >
> > If some errors happen during VM's COLO FT stage, it's important to
> > notify the users of this event. Together with 'x-colo-lost-heartbeat',
> > Users can intervene in COLO's failover work immediately.
> > If users don't want to get involved in COLO's failover verdict,
> > it is still necessary to notify users that we exited COLO mode.
> >
> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> > Signed-off-by: Zhang Chen <zhangckid@gmail.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > ---
> >  migration/colo.c    | 20 ++++++++++++++++++++
> >  qapi/migration.json | 37 +++++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index c083d36..8ca6381 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -28,6 +28,7 @@
> >  #include "net/colo-compare.h"
> >  #include "net/colo.h"
> >  #include "block/block.h"
> > +#include "qapi/qapi-events-migration.h"
> >
> >  static bool vmstate_loading;
> >  static Notifier packets_compare_notifier;
> > @@ -514,6 +515,18 @@ out:
> >          qemu_fclose(fb);
> >      }
> >
> > +    /*
> > +     * There are only two reasons we can go here, some error happened.
> > +     * Or the user triggered failover.
> > +     */
> > +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> > +                                  COLO_EXIT_REASON_ERROR, NULL);
> > +    } else {
> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> > +                                  COLO_EXIT_REASON_REQUEST, NULL);
> > +    }
>
> Your comment makes me suspect failover_get_state() can only be
> FAILOVER_STATUS_NONE or FAILOVER_STATUS_REQUIRE here.  Is that correct?
>
> If yes, I recommend to add a suitable assertion.
>

Yes, and what kinds of 'suitable assertion'?
Just for the 'failover_get_state()' ?

>
> > +
> >      /* Hope this not to be too long to wait here */
> >      qemu_sem_wait(&s->colo_exit_sem);
> >      qemu_sem_destroy(&s->colo_exit_sem);
> > @@ -744,6 +757,13 @@ out:
> >      if (local_err) {
> >          error_report_err(local_err);
> >      }
> > +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
> > +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> > +                                  COLO_EXIT_REASON_ERROR, NULL);
> > +    } else {
> > +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> > +                                  COLO_EXIT_REASON_REQUEST, NULL);
> > +    }
>
> Same question.
>
> >
> >      if (fb) {
> >          qemu_fclose(fb);
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index f3974c6..55dae48 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -875,6 +875,43 @@
> >    'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
> >
> >  ##
> > +# @COLO_EXIT:
> > +#
> > +# Emitted when VM finishes COLO mode due to some errors happening or
> > +# at the request of users.
> > +#
> > +# @mode: report COLO mode when COLO exited.
> > +#
> > +# @reason: describes the reason for the COLO exit.
> > +#
> > +# Since: 2.13
> > +#
> > +# Example:
> > +#
> > +# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
> > +#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
> > +#
> > +##
> > +{ 'event': 'COLO_EXIT',
> > +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
>
> 'data' duplicates the next patch's ColoStatus, except it lacks
> @colo-running.  Factoring out the common part doesn't seem worth the
> bother.  Okay as is.
>
> > +
> > +##
> > +# @COLOExitReason:
> > +#
> > +# The reason for a COLO exit
> > +#
> > +# @none: no failover has ever happened.
>
> This can't occur in the COLO_EXIT event, only in the result of
> query-colo-status, can it?

Yes.

> Worth spelling that out in the documentation?
>

OK, I will add more comments here in next version.

Thanks
Zhang Chen

> > +#
> > +# @request: COLO exit is due to an external request
> > +#
> > +# @error: COLO exit is due to an internal error
> > +#
> > +# Since: 2.13
> > +##
> > +{ 'enum': 'COLOExitReason',
> > +  'data': [ 'none', 'request', 'error' ] }
> > +
> > +##
> >  # @x-colo-lost-heartbeat:
> >  #
> >  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>
Zhang Chen <zhangckid@gmail.com> writes:

> On Tue, May 15, 2018 at 10:29 PM, Markus Armbruster <armbru@redhat.com>
> wrote:
>
>> Zhang Chen <zhangckid@gmail.com> writes:
>>
>> > From: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> >
>> > If some errors happen during VM's COLO FT stage, it's important to
>> > notify the users of this event. Together with 'x-colo-lost-heartbeat',
>> > Users can intervene in COLO's failover work immediately.
>> > If users don't want to get involved in COLO's failover verdict,
>> > it is still necessary to notify users that we exited COLO mode.
>> >
>> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> > Signed-off-by: Zhang Chen <zhangckid@gmail.com>
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> > ---
>> >  migration/colo.c    | 20 ++++++++++++++++++++
>> >  qapi/migration.json | 37 +++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 57 insertions(+)
>> >
>> > diff --git a/migration/colo.c b/migration/colo.c
>> > index c083d36..8ca6381 100644
>> > --- a/migration/colo.c
>> > +++ b/migration/colo.c
>> > @@ -28,6 +28,7 @@
>> >  #include "net/colo-compare.h"
>> >  #include "net/colo.h"
>> >  #include "block/block.h"
>> > +#include "qapi/qapi-events-migration.h"
>> >
>> >  static bool vmstate_loading;
>> >  static Notifier packets_compare_notifier;
>> > @@ -514,6 +515,18 @@ out:
>> >          qemu_fclose(fb);
>> >      }
>> >
>> > +    /*
>> > +     * There are only two reasons we can go here, some error happened.
>> > +     * Or the user triggered failover.
>> > +     */
>> > +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
>> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>> > +                                  COLO_EXIT_REASON_ERROR, NULL);
>> > +    } else {
>> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>> > +                                  COLO_EXIT_REASON_REQUEST, NULL);
>> > +    }
>>
>> Your comment makes me suspect failover_get_state() can only be
>> FAILOVER_STATUS_NONE or FAILOVER_STATUS_REQUIRE here.  Is that correct?
>>
>> If yes, I recommend to add a suitable assertion.

... to make the possible states immediately obvious.  The fact that you
felt a need for a comment is further evidence of non-obviousness.

>
> Yes, and what kinds of 'suitable assertion'? Just for the
> 'failover_get_state()' ?

Here's one way to skin this cat:

    failover_state = failover_get_state();
    if (failover_state == FAILOVER_STATUS_NONE) {
        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                  COLO_EXIT_REASON_ERROR, NULL);
    } else {
        assert(failover_state == FAILOVER_STATUS_REQUIRE);
        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                  COLO_EXIT_REASON_REQUEST, NULL);
    }

Another one:

    switch (failover_get_state()) {
    case FAILOVER_STATUS_NONE:
        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                  COLO_EXIT_REASON_ERROR, NULL);
        break;
    case FAILOVER_STATUS_REQUIRE:
        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
                                  COLO_EXIT_REASON_REQUEST, NULL);
        break;
    default:
        abort();
    }

Either way, the possible states are immediately obvious.  The run time
check is a nice bonus.

With just your comment, the reader still has to make the connection from
the comment's prose to states, i.e. from "some error happened" to
FAILOVER_STATUS_NONE, and from "user triggered failover" to
FAILOVER_STATUS_REQUIRE.

[...]
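Since the same check is needed on both the primary and the secondary exit path, the mapping from failover state to exit reason could also live in one small helper. The following is only a sketch under stated assumptions: the enum definitions are stand-ins so the fragment compiles on its own (in QEMU they would come from the generated QAPI headers, and the failover value list is assumed to be the one visible in the schema context of the patch), and colo_exit_reason_from_failover() is a hypothetical name that does not appear in the patch.

```c
#include <stdlib.h>

/*
 * Stand-in definitions (assumption): in QEMU these enums are generated
 * from qapi/migration.json. They are repeated here only to make the
 * sketch self-contained.
 */
typedef enum {
    FAILOVER_STATUS_NONE,
    FAILOVER_STATUS_REQUIRE,
    FAILOVER_STATUS_ACTIVE,
    FAILOVER_STATUS_COMPLETED,
    FAILOVER_STATUS_RELAUNCH,
} FailoverStatus;

typedef enum {
    COLO_EXIT_REASON_NONE,
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} COLOExitReason;

/*
 * Hypothetical helper, not part of the patch: translate the failover
 * state seen on the COLO exit path into the reason to report.
 */
static COLOExitReason colo_exit_reason_from_failover(FailoverStatus state)
{
    switch (state) {
    case FAILOVER_STATUS_NONE:
        /* No failover was requested, so COLO ended because of an error. */
        return COLO_EXIT_REASON_ERROR;
    case FAILOVER_STATUS_REQUIRE:
        /* The user asked for failover. */
        return COLO_EXIT_REASON_REQUEST;
    default:
        /* Any other state is assumed to be impossible at this point. */
        abort();
    }
}
```

Each call site would then pass the returned reason to qapi_event_send_colo_exit() together with its own COLO_MODE_* value, so the set of permitted states is spelled out in a single place.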
On Thu, May 17, 2018 at 4:19 PM, Markus Armbruster <armbru@redhat.com> wrote:

> Zhang Chen <zhangckid@gmail.com> writes:
>
> > On Tue, May 15, 2018 at 10:29 PM, Markus Armbruster <armbru@redhat.com>
> > wrote:
> >
> >> Zhang Chen <zhangckid@gmail.com> writes:
> >>
> >> > From: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> >
> >> > If some errors happen during VM's COLO FT stage, it's important to
> >> > notify the users of this event. Together with 'x-colo-lost-heartbeat',
> >> > Users can intervene in COLO's failover work immediately.
> >> > If users don't want to get involved in COLO's failover verdict,
> >> > it is still necessary to notify users that we exited COLO mode.
> >> >
> >> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> > Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >> > Signed-off-by: Zhang Chen <zhangckid@gmail.com>
> >> > Reviewed-by: Eric Blake <eblake@redhat.com>
> >> > ---
> >> >  migration/colo.c    | 20 ++++++++++++++++++++
> >> >  qapi/migration.json | 37 +++++++++++++++++++++++++++++++++++++
> >> >  2 files changed, 57 insertions(+)
> >> >
> >> > diff --git a/migration/colo.c b/migration/colo.c
> >> > index c083d36..8ca6381 100644
> >> > --- a/migration/colo.c
> >> > +++ b/migration/colo.c
> >> > @@ -28,6 +28,7 @@
> >> >  #include "net/colo-compare.h"
> >> >  #include "net/colo.h"
> >> >  #include "block/block.h"
> >> > +#include "qapi/qapi-events-migration.h"
> >> >
> >> >  static bool vmstate_loading;
> >> >  static Notifier packets_compare_notifier;
> >> > @@ -514,6 +515,18 @@ out:
> >> >          qemu_fclose(fb);
> >> >      }
> >> >
> >> > +    /*
> >> > +     * There are only two reasons we can go here, some error happened.
> >> > +     * Or the user triggered failover.
> >> > +     */
> >> > +    if (failover_get_state() == FAILOVER_STATUS_NONE) {
> >> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> >> > +                                  COLO_EXIT_REASON_ERROR, NULL);
> >> > +    } else {
> >> > +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
> >> > +                                  COLO_EXIT_REASON_REQUEST, NULL);
> >> > +    }
> >>
> >> Your comment makes me suspect failover_get_state() can only be
> >> FAILOVER_STATUS_NONE or FAILOVER_STATUS_REQUIRE here.  Is that correct?
> >>
> >> If yes, I recommend to add a suitable assertion.
>
> ... to make the possible states immediately obvious.  The fact that you
> felt a need for a comment is further evidence of non-obviousness.
>
> >
> > Yes, and what kinds of 'suitable assertion'? Just for the
> > 'failover_get_state()' ?
>
> Here's one way to skin this cat:
>
>     failover_state = failover_get_state();
>     if (failover_state == FAILOVER_STATUS_NONE) {
>         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                   COLO_EXIT_REASON_ERROR, NULL);
>     } else {
>         assert(failover_state == FAILOVER_STATUS_REQUIRE);
>         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                   COLO_EXIT_REASON_REQUEST, NULL);
>     }
>
> Another one:
>
>     switch (failover_get_state()) {
>     case FAILOVER_STATUS_NONE:
>         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                   COLO_EXIT_REASON_ERROR, NULL);
>         break;
>     case FAILOVER_STATUS_REQUIRE:
>         qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
>                                   COLO_EXIT_REASON_REQUEST, NULL);
>         break;
>     default:
>         abort();
>     }
>
> Either way, the possible states are immediately obvious.  The run time
> check is a nice bonus.
>
> With just your comment, the reader still has to make the connection from
> the comment's prose to states, i.e. from "some error happened" to
> FAILOVER_STATUS_NONE, and from "user triggered failover" to
> FAILOVER_STATUS_REQUIRE.
>
> [...]
>

I got it, thanks for your detailed reply.
I will fix it in next version.

Thanks
Zhang Chen
diff --git a/migration/colo.c b/migration/colo.c
index c083d36..8ca6381 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -28,6 +28,7 @@
 #include "net/colo-compare.h"
 #include "net/colo.h"
 #include "block/block.h"
+#include "qapi/qapi-events-migration.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -514,6 +515,18 @@ out:
         qemu_fclose(fb);
     }
 
+    /*
+     * There are only two reasons we can go here, some error happened.
+     * Or the user triggered failover.
+     */
+    if (failover_get_state() == FAILOVER_STATUS_NONE) {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
+
     /* Hope this not to be too long to wait here */
     qemu_sem_wait(&s->colo_exit_sem);
     qemu_sem_destroy(&s->colo_exit_sem);
@@ -744,6 +757,13 @@ out:
     if (local_err) {
         error_report_err(local_err);
     }
+    if (failover_get_state() == FAILOVER_STATUS_NONE) {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_ERROR, NULL);
+    } else {
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_REQUEST, NULL);
+    }
 
     if (fb) {
         qemu_fclose(fb);
diff --git a/qapi/migration.json b/qapi/migration.json
index f3974c6..55dae48 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -875,6 +875,43 @@
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
 ##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: report COLO mode when COLO exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.13
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @none: no failover has ever happened.
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.13
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'none', 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat:
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
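To make the documented event shape concrete, here is a rough sketch of how the "data" member shown in the schema example is composed, including the point raised in review that 'none' should never be reported in a COLO_EXIT event. Everything named below (SketchColoMode, SketchExitReason, format_colo_exit_data() and the string helpers) is a hypothetical stand-in for illustration. QEMU itself emits the event through the generated qapi_event_send_colo_exit(), not through code like this, and the "secondary" wire string is assumed from the usual QAPI naming rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the QAPI-generated enums (assumption, see lead-in). */
typedef enum {
    SKETCH_MODE_PRIMARY,
    SKETCH_MODE_SECONDARY,
} SketchColoMode;

typedef enum {
    SKETCH_REASON_NONE,
    SKETCH_REASON_REQUEST,
    SKETCH_REASON_ERROR,
} SketchExitReason;

/* Wire name of the mode; "secondary" is an assumed spelling. */
static const char *sketch_mode_str(SketchColoMode mode)
{
    return mode == SKETCH_MODE_PRIMARY ? "primary" : "secondary";
}

/* Wire name of the exit reason, following the enum in the schema. */
static const char *sketch_reason_str(SketchExitReason reason)
{
    switch (reason) {
    case SKETCH_REASON_REQUEST:
        return "request";
    case SKETCH_REASON_ERROR:
        return "error";
    default:
        return "none";
    }
}

/*
 * Write the "data" object of a COLO_EXIT event into buf and return the
 * snprintf() result. 'none' is rejected because, per the review, it can
 * only show up in a status query, never in the exit event.
 */
static int format_colo_exit_data(char *buf, size_t len,
                                 SketchColoMode mode,
                                 SketchExitReason reason)
{
    assert(reason != SKETCH_REASON_NONE);
    return snprintf(buf, len, "{\"mode\": \"%s\", \"reason\": \"%s\"}",
                    sketch_mode_str(mode), sketch_reason_str(reason));
}
```

A management tool consuming the event only ever needs to handle "request" and "error" in the reason field, which is what the assertion in the sketch encodes.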