From patchwork Wed Jul 2 22:46:22 2014
X-Patchwork-Submitter: "Hefty, Sean"
X-Patchwork-Id: 4468351
From: sean.hefty@intel.com
To: linux-rdma@vger.kernel.org, hal@mellanox.com
Cc: Sean Hefty
Subject: [PATCH] rsocket: Fix crash resulting from keepalive timeout
Date: Wed, 2 Jul 2014 15:46:22 -0700
Message-Id: <1404341182-12533-1-git-send-email-sean.hefty@intel.com>
X-Mailer: git-send-email 1.7.3
X-Mailing-List: linux-rdma@vger.kernel.org

From: Sean Hefty

The following crash was reported by Hal Rosenstock with keepalive
enabled. The crash occurs in the keepalive thread while it attempts to
send a keepalive message.

report:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecf08700 (LWP 6013)]
rs_post_write (rs=, sgl=0x0, nsge=0, wr_data=3758096385, flags=0, addr=0, rkey=0) at src/rsocket.c:1660
1660		return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) p/x rs
$1 = value has been optimized out

So I added the following to debug:

1660		if (rs == NULL)
1661			abort();
1662		if (rs->cm_id == NULL)
1663			abort();
1664		if (rs->cm_id->qp == NULL)
1665			abort();
1666		return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
1667	}

And saw in gdb:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffecf08700 (LWP 8096)]
0x00000030d50328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) bt
#0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
#1  0x00000030d5034085 in abort () from /lib64/libc.so.6
#2  0x00007ffff057fe23 in rs_post_write (rs=, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
#3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20) at src/rsocket.c:4245
#4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
#5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030d50e890d in clone () from /lib64/libc.so.6
(gdb) fr 2
#2  0x00007ffff057fe23 in rs_post_write (rs=, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
1665			abort();

So qp is NULL somehow...
:end report

There is an issue if an rsocket is closed without going through
rshutdown():

int rshutdown(int socket, int how)
{
	...
	if (rs->opts & RS_OPT_SVC_ACTIVE)
		rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

We remove the rsocket from the keepalive thread in rshutdown().

int rclose(int socket)
{
	...
	if (rs->state & rs_connected)
		rshutdown(socket, SHUT_RDWR);
	...
	rs_free(rs);

rclose() calls rshutdown() only if the socket is connected. However, if
the keepalive failed, the socket is in an error state, so rshutdown() is
never called and the freed rsocket is left on the keepalive thread's
list. The keepalive thread then dereferences that freed rsocket the next
time it sends a keepalive, which is the crash in rs_post_write() shown
in the backtrace above.

The fix is to have rclose() remove an rsocket from being processed by a
service thread if it is still active.

Signed-off-by: Sean Hefty
Tested-by: Hal Rosenstock
---
 src/rsocket.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/rsocket.c b/src/rsocket.c
index 3048e5e..f81fb1b 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -3265,6 +3265,8 @@ int rclose(int socket)
 	if (rs->type == SOCK_STREAM) {
 		if (rs->state & rs_connected)
 			rshutdown(socket, SHUT_RDWR);
+		else if (rs->opts & RS_OPT_SVC_ACTIVE)
+			rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);
 	} else {
 		ds_shutdown(rs);
 	}
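The lifetime rule behind this fix (a socket must leave the service thread's list before it is freed, regardless of connection state) can be sketched as a small single-threaded C model. Everything in it is hypothetical: `struct model_sock`, `svc_add()`, `svc_remove()`, `model_close()` and the `MODEL_*` flags only mirror the shape of the rsocket logic described above. It is not the librdmacm API and omits the real service thread, locking, and `rs_notify_svc()` messaging. `model_close()` reports whether the socket is still registered at the point where rsocket would call `rs_free()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state/option bits; they only mirror rs_connected and
 * RS_OPT_SVC_ACTIVE from src/rsocket.c, not their real values. */
enum { MODEL_CONNECTED = 1 << 0, MODEL_ERROR = 1 << 1 };
enum { MODEL_OPT_SVC_ACTIVE = 1 << 0 };

struct model_sock {
	int state;
	int opts;
};

/* Hypothetical keepalive list: the sockets a service thread would walk. */
#define MODEL_SVC_MAX 8
static struct model_sock *svc_list[MODEL_SVC_MAX];
static size_t svc_count;

static void svc_add(struct model_sock *s)
{
	assert(svc_count < MODEL_SVC_MAX);
	svc_list[svc_count++] = s;
	s->opts |= MODEL_OPT_SVC_ACTIVE;
}

static void svc_remove(struct model_sock *s)
{
	for (size_t i = 0; i < svc_count; i++) {
		if (svc_list[i] == s) {
			/* Swap in the last entry to keep the array dense. */
			svc_list[i] = svc_list[--svc_count];
			break;
		}
	}
	s->opts &= ~MODEL_OPT_SVC_ACTIVE;
}

static bool svc_contains(const struct model_sock *s)
{
	for (size_t i = 0; i < svc_count; i++)
		if (svc_list[i] == s)
			return true;
	return false;
}

/* Forget all registrations (used to reset between scenarios). */
static void svc_reset(void)
{
	svc_count = 0;
}

/* A failed keepalive leaves the socket in an error state, not connected. */
static void model_keepalive_failed(struct model_sock *s)
{
	s->state = MODEL_ERROR;
}

/* Mirrors rshutdown(): deregister from the service thread if active. */
static void model_shutdown(struct model_sock *s)
{
	if (s->opts & MODEL_OPT_SVC_ACTIVE)
		svc_remove(s);
	s->state &= ~MODEL_CONNECTED;
}

/* Mirrors rclose() with or without the patch. Returns true if the socket
 * is still on the service list where rsocket would call rs_free(), i.e.
 * the service thread would be left holding a dangling pointer. */
static bool model_close(struct model_sock *s, bool with_fix)
{
	if (s->state & MODEL_CONNECTED)
		model_shutdown(s);
	else if (with_fix && (s->opts & MODEL_OPT_SVC_ACTIVE))
		svc_remove(s); /* the branch the patch adds */

	return svc_contains(s);
}
```

For example, registering a socket, calling `model_keepalive_failed()` on it and then `model_close(s, false)` returns true, which is the stale entry behind the crash; the same sequence with `model_close(s, true)` returns false and empties the list. As in the patch, the extra removal is keyed on the service-active option rather than on the connection state, so it covers any socket that is still registered when it is closed.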