Message ID | 20211020144519.10362-1-julien@xen.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.16] tools/xenstored: Ignore domain we were unable to restore | expand |
Julien Grall writes ("[PATCH for-4.16] tools/xenstored: Ignore domain we were unable to restore"): > From: Julien Grall <jgrall@amazon.com> > > Commit 939775cfd3 "handle dying domains in live update" was meant to > handle gracefully dying domain. However, the @releaseDomain watch > will end up to be sent as soon as we finished to restore Xenstored > state. > > This may be before Xen reports the domain to be dying (such as if > the guest decided to revoke access to the xenstore page). Consequently > daemon like xenconsoled will not clean-up the domain and it will be > left as a zombie. > > To avoid the problem, mark the connection as ignored. This also > requires to tweak conn_can_write() and conn_can_read() to prevent > dereferencing a NULL pointer (the interface will not mapped). > > The check conn->is_ignored was originally added after the callbacks > because the helpers for a socket connection may close the fd. However, > ignore_connection() will close a socket connection directly. So it is > fine to do the re-order. Release-Acked-by: Ian Jackson <iwj@xenproject.org>
On 20.10.21 16:45, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > Commit 939775cfd3 "handle dying domains in live update" was meant to > handle gracefully dying domain. However, the @releaseDomain watch > will end up to be sent as soon as we finished to restore Xenstored > state. > > This may be before Xen reports the domain to be dying (such as if > the guest decided to revoke access to the xenstore page). Consequently > daemon like xenconsoled will not clean-up the domain and it will be > left as a zombie. > > To avoid the problem, mark the connection as ignored. This also > requires to tweak conn_can_write() and conn_can_read() to prevent > dereferencing a NULL pointer (the interface will not mapped). > > The check conn->is_ignored was originally added after the callbacks > because the helpers for a socket connection may close the fd. However, > ignore_connection() will close a socket connection directly. So it is > fine to do the re-order. > > Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Juergen
> On 20 Oct 2021, at 15:45, Julien Grall <julien@xen.org> wrote: > > From: Julien Grall <jgrall@amazon.com> > > Commit 939775cfd3 "handle dying domains in live update" was meant to > handle gracefully dying domain. However, the @releaseDomain watch > will end up to be sent as soon as we finished to restore Xenstored > state. > > This may be before Xen reports the domain to be dying (such as if > the guest decided to revoke access to the xenstore page). Consequently > daemon like xenconsoled will not clean-up the domain and it will be > left as a zombie. > > To avoid the problem, mark the connection as ignored. This also > requires to tweak conn_can_write() and conn_can_read() to prevent > dereferencing a NULL pointer (the interface will not mapped). > > The check conn->is_ignored was originally added after the callbacks > because the helpers for a socket connection may close the fd. However, > ignore_connection() will close a socket connection directly. So it is > fine to do the re-order. > > Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> > > --- > > This issue was originally found when developping commit 939775cfd3. > Unfortunately, due to some miscommunication the wrong patch was sent > upstream. The approach is re-using the one we discussed back then. > > This was tested by modifying Linux to revoke the Xenstore grant during > boot. Without this patch, the domain will be left after Live-Update. Note > that on a basic setup (only xenconsoled and xl watch @releaseDomain), > the domain may be cleaned on the next domain shutdown/start. > > Xenstore Live-Update is so far a tech preview feature. But I would still > like to request this patch to be included in 4.16 as this was meant to > be part of the original work. > --- > tools/xenstore/xenstored_core.c | 8 ++++---- > tools/xenstore/xenstored_core.h | 1 + > tools/xenstore/xenstored_domain.c | 21 ++++++++++++--------- > 3 files changed, 17 insertions(+), 13 deletions(-) > > diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c > index 0d4c73d6e20c..91d093a12ea6 100644 > --- a/tools/xenstore/xenstored_core.c > +++ b/tools/xenstore/xenstored_core.c > @@ -338,10 +338,10 @@ static int destroy_conn(void *_conn) > > static bool conn_can_read(struct connection *conn) > { > - if (!conn->funcs->can_read(conn)) > + if (conn->is_ignored) > return false; > > - if (conn->is_ignored) > + if (!conn->funcs->can_read(conn)) > return false; > > /* > @@ -356,7 +356,7 @@ static bool conn_can_read(struct connection *conn) > > static bool conn_can_write(struct connection *conn) > { > - return conn->funcs->can_write(conn) && !conn->is_ignored; > + return !conn->is_ignored && conn->funcs->can_write(conn); > } > > /* This function returns index inside the array if succeed, -1 if fail */ > @@ -1466,7 +1466,7 @@ static struct { > * > * All watches, transactions, buffers will be freed. > */ > -static void ignore_connection(struct connection *conn) > +void ignore_connection(struct connection *conn) > { > struct buffered_data *out, *tmp; > > diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h > index 258f6ff38279..07d861d92499 100644 > --- a/tools/xenstore/xenstored_core.h > +++ b/tools/xenstore/xenstored_core.h > @@ -206,6 +206,7 @@ struct node *read_node(struct connection *conn, const void *ctx, > > struct connection *new_connection(const struct interface_funcs *funcs); > struct connection *get_connection_by_id(unsigned int conn_id); > +void ignore_connection(struct connection *conn); > void check_store(void); > void corrupt(struct connection *conn, const char *fmt, ...); > > diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c > index 47e9107c144e..d03c7d93a9e7 100644 > --- a/tools/xenstore/xenstored_domain.c > +++ b/tools/xenstore/xenstored_domain.c > @@ -268,14 +268,7 @@ void check_domains(void) > domain->shutdown = true; > notify = 1; > } > - /* > - * On Restore, we may have been unable to remap the > - * interface and the port. As we don't know whether > - * this was because of a dying domain, we need to > - * check if the interface and port are still valid. > - */ > - if (!dominfo.dying && domain->port && > - domain->interface) > + if (!dominfo.dying) > continue; > } > if (domain->conn) { > @@ -1303,6 +1296,17 @@ void read_state_connection(const void *ctx, const void *state) > if (!domain) > barf("domain allocation error"); > > + conn = domain->conn; > + > + /* > + * We may not have been able to restore the domain (for > + * instance because it revoked the Xenstore grant). We need > + * to keep it around to send @releaseDomain when it is > + * dead. So mark it as ignored. > + */ > + if (!domain->port || !domain->interface) > + ignore_connection(conn); > + > if (sc->spec.ring.tdomid != DOMID_INVALID) { > tdomain = find_or_alloc_domain(ctx, > sc->spec.ring.tdomid); > @@ -1311,7 +1315,6 @@ void read_state_connection(const void *ctx, const void *state) > talloc_reference(domain->conn, tdomain->conn); > domain->conn->target = tdomain->conn; > } > - conn = domain->conn; > } > > conn->conn_id = sc->conn_id; > -- > 2.32.0 > >
Thanks everyone, committed. Ian.
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c index 0d4c73d6e20c..91d093a12ea6 100644 --- a/tools/xenstore/xenstored_core.c +++ b/tools/xenstore/xenstored_core.c @@ -338,10 +338,10 @@ static int destroy_conn(void *_conn) static bool conn_can_read(struct connection *conn) { - if (!conn->funcs->can_read(conn)) + if (conn->is_ignored) return false; - if (conn->is_ignored) + if (!conn->funcs->can_read(conn)) return false; /* @@ -356,7 +356,7 @@ static bool conn_can_read(struct connection *conn) static bool conn_can_write(struct connection *conn) { - return conn->funcs->can_write(conn) && !conn->is_ignored; + return !conn->is_ignored && conn->funcs->can_write(conn); } /* This function returns index inside the array if succeed, -1 if fail */ @@ -1466,7 +1466,7 @@ static struct { * * All watches, transactions, buffers will be freed. */ -static void ignore_connection(struct connection *conn) +void ignore_connection(struct connection *conn) { struct buffered_data *out, *tmp; diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h index 258f6ff38279..07d861d92499 100644 --- a/tools/xenstore/xenstored_core.h +++ b/tools/xenstore/xenstored_core.h @@ -206,6 +206,7 @@ struct node *read_node(struct connection *conn, const void *ctx, struct connection *new_connection(const struct interface_funcs *funcs); struct connection *get_connection_by_id(unsigned int conn_id); +void ignore_connection(struct connection *conn); void check_store(void); void corrupt(struct connection *conn, const char *fmt, ...); diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c index 47e9107c144e..d03c7d93a9e7 100644 --- a/tools/xenstore/xenstored_domain.c +++ b/tools/xenstore/xenstored_domain.c @@ -268,14 +268,7 @@ void check_domains(void) domain->shutdown = true; notify = 1; } - /* - * On Restore, we may have been unable to remap the - * interface and the port. As we don't know whether - * this was because of a dying domain, we need to - * check if the interface and port are still valid. - */ - if (!dominfo.dying && domain->port && - domain->interface) + if (!dominfo.dying) continue; } if (domain->conn) { @@ -1303,6 +1296,17 @@ void read_state_connection(const void *ctx, const void *state) if (!domain) barf("domain allocation error"); + conn = domain->conn; + + /* + * We may not have been able to restore the domain (for + * instance because it revoked the Xenstore grant). We need + * to keep it around to send @releaseDomain when it is + * dead. So mark it as ignored. + */ + if (!domain->port || !domain->interface) + ignore_connection(conn); + if (sc->spec.ring.tdomid != DOMID_INVALID) { tdomain = find_or_alloc_domain(ctx, sc->spec.ring.tdomid); @@ -1311,7 +1315,6 @@ void read_state_connection(const void *ctx, const void *state) talloc_reference(domain->conn, tdomain->conn); domain->conn->target = tdomain->conn; } - conn = domain->conn; } conn->conn_id = sc->conn_id;