Message ID | 20231013230624.1007969-1-sstabellini@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cxenstored: wait until after reset to notify dom0less domains | expand |
On 14.10.23 01:06, Stefano Stabellini wrote: > From: George Dunlap <george.dunlap@cloud.com> > > Commit fc2b57c9a ("xenstored: send an evtchn notification on > introduce_domain") introduced the sending of an event channel to the > guest when first introduced, so that dom0less domains waiting for the > connection would know that xenstore was ready to use. > > Unfortunately, it was introduced in introduce_domain(), which 1) is > called by other functions, where such functionality is unneeded, and > 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This > introduces a race condition, whereby if xenstored is delayed, a domain > can wake up, send messages to the buffer, only to have them deleted by > xenstore before finishing its processing of the XS_INTRODUCE message. > > Move the connect-and-notfy call into do_introduce() instead, after the > domain_conn_rest(); predicated on the state being in the > XENSTORE_RECONNECT state. > > (We don't need to check for "restoring", since that value is always > passed as "false" from do_domain_introduce()). > > Also take the opportunity to add a missing wmb barrier after resetting > the indexes of the ring in domain_conn_reset. > > This change will also remove an extra event channel notification for > dom0 (because the notification is now done by do_introduce which is not > called for dom0.) The extra dom0 event channel notification was only > introduced by fc2b57c9a and was never present before. It is not needed > because dom0 is the one to tell xenstored the connection parameters, so > dom0 has to know that the ring page is setup correctly by the time > xenstored starts looking at it. It is dom0 that performs the ring page > init. > > Signed-off-by: George Dunlap <george.dunlap@cloud.com> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> > CC: jgross@suse.com > CC: julien@xen.org > CC: wl@xen.org Two nits below, with those fixed: Reviewed-by: Juergen Gross <jgross@suse.com> > --- > tools/xenstored/domain.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c > index a6cd199fdc..f6ef37856c 100644 > --- a/tools/xenstored/domain.c > +++ b/tools/xenstored/domain.c > @@ -923,6 +923,7 @@ static void domain_conn_reset(struct domain *domain) > > domain->interface->req_cons = domain->interface->req_prod = 0; > domain->interface->rsp_cons = domain->interface->rsp_prod = 0; > + xen_wmb(); > } > > /* > @@ -988,12 +989,6 @@ static struct domain *introduce_domain(const void *ctx, > /* Now domain belongs to its connection. */ > talloc_steal(domain->conn, domain); > > - if (!restore) { > - /* Notify the domain that xenstore is available */ > - interface->connection = XENSTORE_CONNECTED; > - xenevtchn_notify(xce_handle, domain->port); > - } > - > if (!is_master_domain && !restore) > fire_special_watches("@introduceDomain"); > } else { > @@ -1033,6 +1028,13 @@ int do_introduce(const void *ctx, struct connection *conn, > > domain_conn_reset(domain); > > + if (domain->interface != NULL && > + domain->interface->connection == XENSTORE_RECONNECT) { Identation should be fixed (both tested conditions at the same position). > + /* Notify the domain that xenstore is available */ Please add a "." at the end of the sentence while you are moving this line. > + domain->interface->connection = XENSTORE_CONNECTED; > + xenevtchn_notify(xce_handle, domain->port); > + } > + > send_ack(conn, XS_INTRODUCE); > > return 0; Juergen
On Mon, Oct 16, 2023 at 8:38 AM Juergen Gross <jgross@suse.com> wrote: > On 14.10.23 01:06, Stefano Stabellini wrote: > > + /* Notify the domain that xenstore is available */ > > Please add a "." at the end of the sentence while you are moving this line. CODING_STYLE explicitly allows the omission of a '.' when a comment is only a single sentence. More generally, in this context in English at least, a comment seems to me more like a sign (which should not have punctuation) than a normal English sentence (which does). As such, to me the period actually seems wrong. In any case, CODING_STYLE was modified to avoid having to spend time arguing about this sort of thing. :-) -George
On 16.10.23 09:45, George Dunlap wrote: > On Mon, Oct 16, 2023 at 8:38 AM Juergen Gross <jgross@suse.com> wrote: >> On 14.10.23 01:06, Stefano Stabellini wrote: >>> + /* Notify the domain that xenstore is available */ >> >> Please add a "." at the end of the sentence while you are moving this line. > > CODING_STYLE explicitly allows the omission of a '.' when a comment is > only a single sentence. > > More generally, in this context in English at least, a comment seems > to me more like a sign (which should not have punctuation) than a > normal English sentence (which does). As such, to me the period > actually seems wrong. > > In any case, CODING_STYLE was modified to avoid having to spend time > arguing about this sort of thing. :-) I won't fight for the period. :-) Maybe I've just read too many kernel reviews asking for it. Juergen
+Henry for release ack On Fri, 13 Oct 2023, Stefano Stabellini wrote: > From: George Dunlap <george.dunlap@cloud.com> > > Commit fc2b57c9a ("xenstored: send an evtchn notification on > introduce_domain") introduced the sending of an event channel to the > guest when first introduced, so that dom0less domains waiting for the > connection would know that xenstore was ready to use. > > Unfortunately, it was introduced in introduce_domain(), which 1) is > called by other functions, where such functionality is unneeded, and > 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This > introduces a race condition, whereby if xenstored is delayed, a domain > can wake up, send messages to the buffer, only to have them deleted by > xenstore before finishing its processing of the XS_INTRODUCE message. > > Move the connect-and-notfy call into do_introduce() instead, after the > domain_conn_rest(); predicated on the state being in the > XENSTORE_RECONNECT state. > > (We don't need to check for "restoring", since that value is always > passed as "false" from do_domain_introduce()). > > Also take the opportunity to add a missing wmb barrier after resetting > the indexes of the ring in domain_conn_reset. > > This change will also remove an extra event channel notification for > dom0 (because the notification is now done by do_introduce which is not > called for dom0.) The extra dom0 event channel notification was only > introduced by fc2b57c9a and was never present before. It is not needed > because dom0 is the one to tell xenstored the connection parameters, so > dom0 has to know that the ring page is setup correctly by the time > xenstored starts looking at it. It is dom0 that performs the ring page > init. > > Signed-off-by: George Dunlap <george.dunlap@cloud.com> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> > CC: jgross@suse.com > CC: julien@xen.org > CC: wl@xen.org > --- > tools/xenstored/domain.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c > index a6cd199fdc..f6ef37856c 100644 > --- a/tools/xenstored/domain.c > +++ b/tools/xenstored/domain.c > @@ -923,6 +923,7 @@ static void domain_conn_reset(struct domain *domain) > > domain->interface->req_cons = domain->interface->req_prod = 0; > domain->interface->rsp_cons = domain->interface->rsp_prod = 0; > + xen_wmb(); > } > > /* > @@ -988,12 +989,6 @@ static struct domain *introduce_domain(const void *ctx, > /* Now domain belongs to its connection. */ > talloc_steal(domain->conn, domain); > > - if (!restore) { > - /* Notify the domain that xenstore is available */ > - interface->connection = XENSTORE_CONNECTED; > - xenevtchn_notify(xce_handle, domain->port); > - } > - > if (!is_master_domain && !restore) > fire_special_watches("@introduceDomain"); > } else { > @@ -1033,6 +1028,13 @@ int do_introduce(const void *ctx, struct connection *conn, > > domain_conn_reset(domain); > > + if (domain->interface != NULL && > + domain->interface->connection == XENSTORE_RECONNECT) { > + /* Notify the domain that xenstore is available */ > + domain->interface->connection = XENSTORE_CONNECTED; > + xenevtchn_notify(xce_handle, domain->port); > + } > + > send_ack(conn, XS_INTRODUCE); > > return 0; > -- > 2.25.1 >
Hi Stefano, > On Oct 17, 2023, at 07:14, Stefano Stabellini <sstabellini@kernel.org> wrote: > > +Henry for release ack Thanks, I think we did have Juergen for reviewing this patch so: Release-acked-by: Henry Wang <Henry.Wang@arm.com> Kind regards, Henry > > > On Fri, 13 Oct 2023, Stefano Stabellini wrote: >> From: George Dunlap <george.dunlap@cloud.com> >> >> Commit fc2b57c9a ("xenstored: send an evtchn notification on >> introduce_domain") introduced the sending of an event channel to the >> guest when first introduced, so that dom0less domains waiting for the >> connection would know that xenstore was ready to use. >> >> Unfortunately, it was introduced in introduce_domain(), which 1) is >> called by other functions, where such functionality is unneeded, and >> 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This >> introduces a race condition, whereby if xenstored is delayed, a domain >> can wake up, send messages to the buffer, only to have them deleted by >> xenstore before finishing its processing of the XS_INTRODUCE message. >> >> Move the connect-and-notfy call into do_introduce() instead, after the >> domain_conn_rest(); predicated on the state being in the >> XENSTORE_RECONNECT state. >> >> (We don't need to check for "restoring", since that value is always >> passed as "false" from do_domain_introduce()). >> >> Also take the opportunity to add a missing wmb barrier after resetting >> the indexes of the ring in domain_conn_reset. >> >> This change will also remove an extra event channel notification for >> dom0 (because the notification is now done by do_introduce which is not >> called for dom0.) The extra dom0 event channel notification was only >> introduced by fc2b57c9a and was never present before. It is not needed >> because dom0 is the one to tell xenstored the connection parameters, so >> dom0 has to know that the ring page is setup correctly by the time >> xenstored starts looking at it. It is dom0 that performs the ring page >> init. >> >> Signed-off-by: George Dunlap <george.dunlap@cloud.com> >> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> >> CC: jgross@suse.com >> CC: julien@xen.org >> CC: wl@xen.org >> --- >> tools/xenstored/domain.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c >> index a6cd199fdc..f6ef37856c 100644 >> --- a/tools/xenstored/domain.c >> +++ b/tools/xenstored/domain.c >> @@ -923,6 +923,7 @@ static void domain_conn_reset(struct domain *domain) >> >> domain->interface->req_cons = domain->interface->req_prod = 0; >> domain->interface->rsp_cons = domain->interface->rsp_prod = 0; >> + xen_wmb(); >> } >> >> /* >> @@ -988,12 +989,6 @@ static struct domain *introduce_domain(const void *ctx, >> /* Now domain belongs to its connection. */ >> talloc_steal(domain->conn, domain); >> >> - if (!restore) { >> - /* Notify the domain that xenstore is available */ >> - interface->connection = XENSTORE_CONNECTED; >> - xenevtchn_notify(xce_handle, domain->port); >> - } >> - >> if (!is_master_domain && !restore) >> fire_special_watches("@introduceDomain"); >> } else { >> @@ -1033,6 +1028,13 @@ int do_introduce(const void *ctx, struct connection *conn, >> >> domain_conn_reset(domain); >> >> + if (domain->interface != NULL && >> + domain->interface->connection == XENSTORE_RECONNECT) { >> + /* Notify the domain that xenstore is available */ >> + domain->interface->connection = XENSTORE_CONNECTED; >> + xenevtchn_notify(xce_handle, domain->port); >> + } >> + >> send_ack(conn, XS_INTRODUCE); >> >> return 0; >> -- >> 2.25.1 >>
On 14.10.2023 01:06, Stefano Stabellini wrote: > From: George Dunlap <george.dunlap@cloud.com> > > Commit fc2b57c9a ("xenstored: send an evtchn notification on > introduce_domain") introduced the sending of an event channel to the > guest when first introduced, so that dom0less domains waiting for the > connection would know that xenstore was ready to use. > > Unfortunately, it was introduced in introduce_domain(), which 1) is > called by other functions, where such functionality is unneeded, and > 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This > introduces a race condition, whereby if xenstored is delayed, a domain > can wake up, send messages to the buffer, only to have them deleted by > xenstore before finishing its processing of the XS_INTRODUCE message. > > Move the connect-and-notfy call into do_introduce() instead, after the > domain_conn_rest(); predicated on the state being in the > XENSTORE_RECONNECT state. > > (We don't need to check for "restoring", since that value is always > passed as "false" from do_domain_introduce()). > > Also take the opportunity to add a missing wmb barrier after resetting > the indexes of the ring in domain_conn_reset. > > This change will also remove an extra event channel notification for > dom0 (because the notification is now done by do_introduce which is not > called for dom0.) The extra dom0 event channel notification was only > introduced by fc2b57c9a and was never present before. It is not needed > because dom0 is the one to tell xenstored the connection parameters, so > dom0 has to know that the ring page is setup correctly by the time > xenstored starts looking at it. It is dom0 that performs the ring page > init. > > Signed-off-by: George Dunlap <george.dunlap@cloud.com> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Should this have had a Fixes: tag thus marking it to pick up for backport? Jan
On Wed, 17 Oct 2023, Jan Beulich wrote: > > On 14.10.2023 01:06, Stefano Stabellini wrote: > > From: George Dunlap <george.dunlap@cloud.com> > > > > Commit fc2b57c9a ("xenstored: send an evtchn notification on > > introduce_domain") introduced the sending of an event channel to the > > guest when first introduced, so that dom0less domains waiting for the > > connection would know that xenstore was ready to use. > > > > Unfortunately, it was introduced in introduce_domain(), which 1) is > > called by other functions, where such functionality is unneeded, and > > 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This > > introduces a race condition, whereby if xenstored is delayed, a domain > > can wake up, send messages to the buffer, only to have them deleted by > > xenstore before finishing its processing of the XS_INTRODUCE message. > > > > Move the connect-and-notfy call into do_introduce() instead, after the > > domain_conn_rest(); predicated on the state being in the > > XENSTORE_RECONNECT state. > > > > (We don't need to check for "restoring", since that value is always > > passed as "false" from do_domain_introduce()). > > > > Also take the opportunity to add a missing wmb barrier after resetting > > the indexes of the ring in domain_conn_reset. > > > > This change will also remove an extra event channel notification for > > dom0 (because the notification is now done by do_introduce which is not > > called for dom0.) The extra dom0 event channel notification was only > > introduced by fc2b57c9a and was never present before. It is not needed > > because dom0 is the one to tell xenstored the connection parameters, so > > dom0 has to know that the ring page is setup correctly by the time > > xenstored starts looking at it. It is dom0 that performs the ring page > > init. > > > > Signed-off-by: George Dunlap <george.dunlap@cloud.com> > > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> > > Should this have had a Fixes: tag thus marking it to pick up for > backport? Sorry this was committed already
On 18.10.2023 23:31, Stefano Stabellini wrote: > On Wed, 17 Oct 2023, Jan Beulich wrote: >> >> On 14.10.2023 01:06, Stefano Stabellini wrote: >>> From: George Dunlap <george.dunlap@cloud.com> >>> >>> Commit fc2b57c9a ("xenstored: send an evtchn notification on >>> introduce_domain") introduced the sending of an event channel to the >>> guest when first introduced, so that dom0less domains waiting for the >>> connection would know that xenstore was ready to use. >>> >>> Unfortunately, it was introduced in introduce_domain(), which 1) is >>> called by other functions, where such functionality is unneeded, and >>> 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This >>> introduces a race condition, whereby if xenstored is delayed, a domain >>> can wake up, send messages to the buffer, only to have them deleted by >>> xenstore before finishing its processing of the XS_INTRODUCE message. >>> >>> Move the connect-and-notfy call into do_introduce() instead, after the >>> domain_conn_rest(); predicated on the state being in the >>> XENSTORE_RECONNECT state. >>> >>> (We don't need to check for "restoring", since that value is always >>> passed as "false" from do_domain_introduce()). >>> >>> Also take the opportunity to add a missing wmb barrier after resetting >>> the indexes of the ring in domain_conn_reset. >>> >>> This change will also remove an extra event channel notification for >>> dom0 (because the notification is now done by do_introduce which is not >>> called for dom0.) The extra dom0 event channel notification was only >>> introduced by fc2b57c9a and was never present before. It is not needed >>> because dom0 is the one to tell xenstored the connection parameters, so >>> dom0 has to know that the ring page is setup correctly by the time >>> xenstored starts looking at it. It is dom0 that performs the ring page >>> init. >>> >>> Signed-off-by: George Dunlap <george.dunlap@cloud.com> >>> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> >> >> Should this have had a Fixes: tag thus marking it to pick up for >> backport? > > Sorry this was committed already That's why I used "have had". I still need an answer to the question though; your reply only hints towards "yes". Jan
On Thu, 19 Oct 2023, Jan Beulich wrote: > On 18.10.2023 23:31, Stefano Stabellini wrote: > > On Wed, 17 Oct 2023, Jan Beulich wrote: > >> > >> On 14.10.2023 01:06, Stefano Stabellini wrote: > >>> From: George Dunlap <george.dunlap@cloud.com> > >>> > >>> Commit fc2b57c9a ("xenstored: send an evtchn notification on > >>> introduce_domain") introduced the sending of an event channel to the > >>> guest when first introduced, so that dom0less domains waiting for the > >>> connection would know that xenstore was ready to use. > >>> > >>> Unfortunately, it was introduced in introduce_domain(), which 1) is > >>> called by other functions, where such functionality is unneeded, and > >>> 2) after the main XS_INTRODUCE call, calls domain_conn_reset(). This > >>> introduces a race condition, whereby if xenstored is delayed, a domain > >>> can wake up, send messages to the buffer, only to have them deleted by > >>> xenstore before finishing its processing of the XS_INTRODUCE message. > >>> > >>> Move the connect-and-notfy call into do_introduce() instead, after the > >>> domain_conn_rest(); predicated on the state being in the > >>> XENSTORE_RECONNECT state. > >>> > >>> (We don't need to check for "restoring", since that value is always > >>> passed as "false" from do_domain_introduce()). > >>> > >>> Also take the opportunity to add a missing wmb barrier after resetting > >>> the indexes of the ring in domain_conn_reset. > >>> > >>> This change will also remove an extra event channel notification for > >>> dom0 (because the notification is now done by do_introduce which is not > >>> called for dom0.) The extra dom0 event channel notification was only > >>> introduced by fc2b57c9a and was never present before. It is not needed > >>> because dom0 is the one to tell xenstored the connection parameters, so > >>> dom0 has to know that the ring page is setup correctly by the time > >>> xenstored starts looking at it. It is dom0 that performs the ring page > >>> init. > >>> > >>> Signed-off-by: George Dunlap <george.dunlap@cloud.com> > >>> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> > >> > >> Should this have had a Fixes: tag thus marking it to pick up for > >> backport? > > > > Sorry this was committed already > > That's why I used "have had". I still need an answer to the question > though; your reply only hints towards "yes". Yes, I would add the Fixes tag
diff --git a/tools/xenstored/domain.c b/tools/xenstored/domain.c index a6cd199fdc..f6ef37856c 100644 --- a/tools/xenstored/domain.c +++ b/tools/xenstored/domain.c @@ -923,6 +923,7 @@ static void domain_conn_reset(struct domain *domain) domain->interface->req_cons = domain->interface->req_prod = 0; domain->interface->rsp_cons = domain->interface->rsp_prod = 0; + xen_wmb(); } /* @@ -988,12 +989,6 @@ static struct domain *introduce_domain(const void *ctx, /* Now domain belongs to its connection. */ talloc_steal(domain->conn, domain); - if (!restore) { - /* Notify the domain that xenstore is available */ - interface->connection = XENSTORE_CONNECTED; - xenevtchn_notify(xce_handle, domain->port); - } - if (!is_master_domain && !restore) fire_special_watches("@introduceDomain"); } else { @@ -1033,6 +1028,13 @@ int do_introduce(const void *ctx, struct connection *conn, domain_conn_reset(domain); + if (domain->interface != NULL && + domain->interface->connection == XENSTORE_RECONNECT) { + /* Notify the domain that xenstore is available */ + domain->interface->connection = XENSTORE_CONNECTED; + xenevtchn_notify(xce_handle, domain->port); + } + send_ack(conn, XS_INTRODUCE); return 0;