Message ID | 1416408407-6774-4-git-send-email-kaike.wan@intel.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
> Ibacm may cause segfault when the SM restarts: when the SM restarts, ibacm > will > receive P_Key change event and instruct ibacmp to close all endpoints. > However, > ibacmp only resets the core endpoint pointer in its ep structure and keeps > the ep > in the port's ep_list. Afterwards, the ibacm core will ask ibacmp to > create > an ep for each pkey enumerated from the local port. The ep will be found > from the port's ep_list if it exists. However, if an old pkey is not > present > in the new SM configuration, the old ep will still be linked in the port's > ep_list with the ep->endpoint being set to NULL. When the ibacm core > forwards > the client reregistration event to ibacmp, ibacmp will enumerate the > ep_list and > try to join multicast group for each ep, including any one with ep- > >endpoint > set to NULL. In this case, it will cause segfault in acm_send_sa_mad(). > Additional check should be able to avoid the crash. I'm having trouble following this. Is the problem that the provider is not cleaning up properly/completely when being told to close its endpoint? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Ibacm may cause segfault when the SM restarts: when the SM restarts, > > ibacm will receive P_Key change event and instruct ibacmp to close all > > endpoints. > > However, > > ibacmp only resets the core endpoint pointer in its ep structure and > > keeps the ep in the port's ep_list. Afterwards, the ibacm core will > > ask ibacmp to create an ep for each pkey enumerated from the local > > port. The ep will be found from the port's ep_list if it exists. > > However, if an old pkey is not present in the new SM configuration, > > the old ep will still be linked in the port's ep_list with the > > ep->endpoint being set to NULL. When the ibacm core forwards the > > client reregistration event to ibacmp, ibacmp will enumerate the > > ep_list and try to join multicast group for each ep, including any one > > with ep- > > >endpoint > > set to NULL. In this case, it will cause segfault in acm_send_sa_mad(). > > Additional check should be able to avoid the crash. > > I'm having trouble following this. Is the problem that the provider is not > cleaning up properly/completely when being told to close its endpoint? > [Wan, Kaike] Yes, it's a problem for the current implementation of the default provider ibacmp. If the provider can clean up properly, it will not have such a problem. Nevertheless, the ibacm core should validate incoming parameters for a SA request from any provider. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c index 7568b9c..2b85958 100644 --- a/prov/acmp/src/acmp.c +++ b/prov/acmp/src/acmp.c @@ -1446,6 +1446,10 @@ static int acmp_port_join(void *port_context) for (ep_entry = port->ep_list.Next; ep_entry != &port->ep_list; ep_entry = ep_entry->Next) { ep = container_of(ep_entry, struct acmp_ep, entry); + if (!ep->endpoint) { + /* Stale endpoint */ + continue; + } acmp_ep_join(ep); } acm_log(1, "joins for device %s port %d complete\n", diff --git a/src/acm.c b/src/acm.c index d807c73..2d0d2e1 100644 --- a/src/acm.c +++ b/src/acm.c @@ -2352,6 +2352,10 @@ acm_alloc_sa_mad(const struct acm_endpoint *endpoint, void *context, { struct acmc_sa_req *req; + if (!endpoint) { + acm_log(0, "Error: NULL endpoint\n"); + return NULL; + } req = calloc(1, sizeof (*req)); if (!req) { acm_log(0, "Error: failed to allocate sa request\n");