From patchwork Fri Dec 15 17:28:00 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Gunthorpe X-Patchwork-Id: 10115663 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1D1C160231 for ; Fri, 15 Dec 2017 17:28:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EA96E290FE for ; Fri, 15 Dec 2017 17:28:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DF1322912D; Fri, 15 Dec 2017 17:28:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30DB7290FE for ; Fri, 15 Dec 2017 17:28:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755501AbdLOR2H (ORCPT ); Fri, 15 Dec 2017 12:28:07 -0500 Received: from mail-wr0-f194.google.com ([209.85.128.194]:43371 "EHLO mail-wr0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755428AbdLOR2G (ORCPT ); Fri, 15 Dec 2017 12:28:06 -0500 Received: by mail-wr0-f194.google.com with SMTP id z34so8653874wrz.10 for ; Fri, 15 Dec 2017 09:28:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=SU3TXhgLQLmc/eCng7Ey7QTq+xm7l9xGGNt0AF9XvIE=; b=kvtXSZfNLGHXjXBQ57+jFZctoNo55azZD18HZOnW9cJOZx6RqZBiDSKasGYhFJjoob ifV/PzJSW6yhF4foJdJdast5zhjU/oJ31Cd56zaM2fegJhj+yoXaei6Up8y5pGiMPkZl j6IOR2WMFpWH5xYQMHjTQx6F6Qh8EYRsKznE6b20OaA1N3hQtoKt/a3ea0FKXjPwyToT cxgUr5XaqUUIClxRRBh3j7ulvAa5iqKccXtSCQXhQDQyd/QYDEt6fB0C6DEQxUPKLQgb lSb0eeQvGwyPHNxMZ12EUxyBAHheKHYr2v5MSqQtgsKZRn6rCCgznQJLJjuV3BVEMbSr QIkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=SU3TXhgLQLmc/eCng7Ey7QTq+xm7l9xGGNt0AF9XvIE=; b=hEKBazDYADbtwte41W2V6h+frhE3SSs3XG4cOuGRkPdL2heBsp0k/WOhRB4aiwBwMn eImkHDtOYVyTgv5LJil8CdeK2I6RaSNmNdHWhOoDqWPtCqBjA7QbUwf64dJxnBRMxu/T RrWrafrohdmglQBWC+Ti0kbuuCAOLfNMvCpY03tKDDrAHptPvC6eqf1s3J/jzjaoBTCw 8qroURuGJUU+LCKWtscPvlzfmsV31IeZZQ+DOv/8IMh+MDEFssuHyRmDdhw6eitqKIRd hzCbsfvO2VUso+EzmsbsNlOBYZ6E0kiyPDhBbKn6Z7AXUckG+BkW3cbcdBSdb8b3bore 3NEg== X-Gm-Message-State: AKGB3mK7erNUwZ04BGeJZjFZ7M/uEbzRntvE0tET3JNmK0rpKzAV40nl 5PCHEsV7UtNu9c+1asuM9sQZ4g== X-Google-Smtp-Source: ACJfBouT18FJRJepiomL/RdU7LJtDI5eQyWih86gq+GsYiouhdVVXEGzEMGi2SwI4koToQyRdd0kkg== X-Received: by 10.223.161.210 with SMTP id v18mr9273558wrv.170.1513358884719; Fri, 15 Dec 2017 09:28:04 -0800 (PST) Received: from ziepe.ca (S010614cc2056d97f.ed.shawcable.net. [70.74.179.152]) by smtp.gmail.com with ESMTPSA id k19sm6097684wrk.88.2017.12.15.09.28.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 15 Dec 2017 09:28:04 -0800 (PST) Received: from jgg by mlx.ziepe.ca with local (Exim 4.86_2) (envelope-from ) id 1ePtmC-0003m0-Ls; Fri, 15 Dec 2017 10:28:00 -0700 Date: Fri, 15 Dec 2017 10:28:00 -0700 From: Jason Gunthorpe To: Honggang LI Cc: Bart Van Assche , "linux-rdma@vger.kernel.org" Subject: Re: [PATCH rdma-core] srp_daemon: Install signal handler for ibsrpdm Message-ID: <20171215172800.GA12434@ziepe.ca> References: <20171214110241.4701-1-honli@redhat.com> <1513263572.2986.2.camel@wdc.com> <20171215013628.GA743@dhcp-13-42.nay.redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20171215013628.GA743@dhcp-13-42.nay.redhat.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Fri, Dec 15, 2017 at 09:36:28AM +0800, Honggang LI wrote: > On Thu, Dec 14, 2017 at 02:59:34PM +0000, Bart Van Assche wrote: > > On Thu, 2017-12-14 at 19:02 +0800, Honggang LI wrote: > > > fd3005f0cd34 moves the signal handler setup from ibsrpdm path. So, > > > default signal handler will be used when it receives signal SIGINT. > > > ibsrpdm will exit with non-zero exit code as default signal handler > > > killed it. > > > > Can you explain why you think that ibsrpdm needs a signal handler? > > main > ibsrpdm > alloc_res > pthread_create(&res->res.async_ev_thread [1] > .... > free_res > if (res->async_ev_thread) { pthread_kill(res->async_ev_thread, SIGINT); [2] > > > The ibsrpdm program create a thread in [1], and send a SIGINT in [2]. > The default behavior of SIGINT is to terminate the process. Yuk, no, using signals like this is horrifyingly buggy. Here is a sketch on how to fix it properly. All the users of pthread_kill should be eliminated. Though overall, there is really no reason to even cleanup the threads, just call exit? --- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c index cec36db2e0f12e..fe307705ad6a51 100644 --- a/srp_daemon/srp_daemon.c +++ b/srp_daemon/srp_daemon.c @@ -61,6 +61,7 @@ #include #include #include +#include #include #include #include @@ -1887,7 +1888,9 @@ static void free_res(struct resources *res) modify_qp_to_err(res->ud_res->qp); if (res->reconnect_thread) { - pthread_kill(res->reconnect_thread, SIGINT); + uint64_t val = 1; + + write(res->sync_res->stop_event_fd, &val, sizeof(val)); pthread_join(res->reconnect_thread, &status); } if (res->async_ev_thread) { @@ -1947,6 +1950,7 @@ static struct resources *alloc_res(void) goto err; } + res->sync_res.stop_event_fd = eventfd(0, EFD_CLOEXEC); ret = pthread_create(&res->res.async_ev_thread, NULL, run_thread_listen_to_events, &res->res); if (ret) diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h index 5d268ed395e17e..8cbee09d7c688b 100644 --- a/srp_daemon/srp_daemon.h +++ b/srp_daemon/srp_daemon.h @@ -245,6 +245,7 @@ enum { }; struct sync_resources { + int stop_event_fd; int stop_threads; int next_task; struct timespec next_recalc_time; diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c index 6b36b15cc84c16..cedc2b0ab8e5af 100644 --- a/srp_daemon/srp_handle_traps.c +++ b/srp_daemon/srp_handle_traps.c @@ -44,6 +44,7 @@ #include #include #include +#include #include "srp_ib_types.h" @@ -788,69 +789,94 @@ int register_to_traps(struct resources *res, int subscribe) } -void *run_thread_listen_to_events(void *res_in) +static int do_async_event(struct resources *res) { - struct resources *res = (struct resources *)res_in; struct ibv_async_event event; - while (!stop_threads(res->sync_res)) { - if (ibv_get_async_event(res->ud_res->ib_ctx, &event)) { - if (errno != EINTR) - pr_err("ibv_get_async_event failed (errno = %d)\n", - errno); - break; + if (ibv_get_async_event(res->ud_res->ib_ctx, &event)) { + if (errno != EINTR) + pr_err("ibv_get_async_event failed (errno = %d)\n", + errno); + return -1; + } + + pr_debug("event_type %d, port %d\n", event.event_type, + event.element.port_num); + + switch (event.event_type) { + case IBV_EVENT_PORT_ACTIVE: + case IBV_EVENT_SM_CHANGE: + case IBV_EVENT_LID_CHANGE: + case IBV_EVENT_CLIENT_REREGISTER: + case IBV_EVENT_PKEY_CHANGE: + if (event.element.port_num == config->port_num) { + pthread_mutex_lock(&res->sync_res->mutex); + __schedule_rescan(res->sync_res, 0); + wake_up_main_loop(0); + pthread_mutex_unlock(&res->sync_res->mutex); } + break; - pr_debug("event_type %d, port %d\n", - event.event_type, event.element.port_num); - - switch (event.event_type) { - case IBV_EVENT_PORT_ACTIVE: - case IBV_EVENT_SM_CHANGE: - case IBV_EVENT_LID_CHANGE: - case IBV_EVENT_CLIENT_REREGISTER: - case IBV_EVENT_PKEY_CHANGE: - if (event.element.port_num == config->port_num) { - pthread_mutex_lock(&res->sync_res->mutex); - __schedule_rescan(res->sync_res, 0); - wake_up_main_loop(0); - pthread_mutex_unlock(&res->sync_res->mutex); - } - break; - - case IBV_EVENT_DEVICE_FATAL: - case IBV_EVENT_CQ_ERR: - case IBV_EVENT_QP_FATAL: - /* clean and restart */ - pr_err("Critical event %d, raising catastrophic " - "error signal\n", event.event_type); - raise(SRP_CATAS_ERR); - break; + case IBV_EVENT_DEVICE_FATAL: + case IBV_EVENT_CQ_ERR: + case IBV_EVENT_QP_FATAL: + /* clean and restart */ + pr_err("Critical event %d, raising catastrophic " + "error signal\n", + event.event_type); + raise(SRP_CATAS_ERR); + break; - /* + /* - case IBV_EVENT_PORT_ERR: - case IBV_EVENT_QP_REQ_ERR: - case IBV_EVENT_QP_ACCESS_ERR: - case IBV_EVENT_COMM_EST: - case IBV_EVENT_SQ_DRAINED: - case IBV_EVENT_PATH_MIG: - case IBV_EVENT_PATH_MIG_ERR: - case IBV_EVENT_SRQ_ERR: - case IBV_EVENT_SRQ_LIMIT_REACHED: - case IBV_EVENT_QP_LAST_WQE_REACHED: + case IBV_EVENT_PORT_ERR: + case IBV_EVENT_QP_REQ_ERR: + case IBV_EVENT_QP_ACCESS_ERR: + case IBV_EVENT_COMM_EST: + case IBV_EVENT_SQ_DRAINED: + case IBV_EVENT_PATH_MIG: + case IBV_EVENT_PATH_MIG_ERR: + case IBV_EVENT_SRQ_ERR: + case IBV_EVENT_SRQ_LIMIT_REACHED: + case IBV_EVENT_QP_LAST_WQE_REACHED: - */ + */ - default: + default: + break; + } + + ibv_ack_async_event(&event); + return 0; +} + +void *run_thread_listen_to_events(void *res_in) +{ + struct pollfd fds[2]; + struct resources *res = (struct resources *)res_in; + + fds[0].fd = res->ud_res->ib_ctx->async_fd; + fds[0].events = POLLIN; + fds[1].fd = res->sync_res->stop_event_fd; + fds[1].events = POLLIN; + + while (1) { + if (poll(fds, 2, -1) == -1) { + if (errno == EINTR) + continue; + pr_err("ibv_get_async_event failed (errno = %d)\n", + errno); break; } - ibv_ack_async_event(&event); + if (fds[1].revents & POLLIN) + break; + if (fds[0].revents & POLLIN) + if (do_async_event(res)) + break; } return NULL; } -