From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Mike Christie, Mel Gorman, Sage Weil
Subject: [PATCH] libceph: don't set memalloc flags in loopback case
Date: Wed, 1 Apr 2015 20:19:20 +0300
Message-Id: <1427908760-7083-1-git-send-email-idryomov@gmail.com>
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 6141531

Following nbd and iscsi, commit 89baaa570ab0 ("libceph: use memalloc
flags for net IO") set SOCK_MEMALLOC and PF_MEMALLOC flags for rbd and
cephfs.  However, it turned out not to play nicely with the loopback
scenario, leading to lockups with a full socket send-q and empty recv-q.

While we always advised against colocating the kernel client and ceph
servers on the same box, a few people are doing it and it's also useful
for light development testing, so rather than reverting, make sure not
to set those flags in the loopback case.

Cc: Mike Christie
Cc: Mel Gorman
Cc: Sage Weil
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov
---
 net/ceph/messenger.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 6b3f54ed65ba..9fa2cce71164 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -101,6 +101,7 @@
 #define CON_FLAG_WRITE_PENDING	   2  /* we have data ready to send */
 #define CON_FLAG_SOCK_CLOSED	   3  /* socket state changed to closed */
 #define CON_FLAG_BACKOFF           4  /* need to retry queuing delayed work */
+#define CON_FLAG_LOCAL             5  /* using loopback interface */
 
 static bool con_flag_valid(unsigned long con_flag)
 {
@@ -110,6 +111,7 @@ static bool con_flag_valid(unsigned long con_flag)
 	case CON_FLAG_WRITE_PENDING:
 	case CON_FLAG_SOCK_CLOSED:
 	case CON_FLAG_BACKOFF:
+	case CON_FLAG_LOCAL:
 		return true;
 	default:
 		return false;
@@ -470,6 +472,18 @@ static void set_sock_callbacks(struct socket *sock,
  * socket helpers
  */
 
+static bool sk_is_loopback(struct sock *sk)
+{
+	struct dst_entry *dst = sk_dst_get(sk);
+	bool ret = false;
+
+	if (dst) {
+		ret = dst->dev && (dst->dev->flags & IFF_LOOPBACK);
+		dst_release(dst);
+	}
+	return ret;
+}
+
 /*
  * initiate connection to a remote socket.
  */
@@ -484,7 +498,7 @@ static int ceph_tcp_connect(struct ceph_connection *con)
 			       IPPROTO_TCP, &sock);
 	if (ret)
 		return ret;
-	sock->sk->sk_allocation = GFP_NOFS | __GFP_MEMALLOC;
+	sock->sk->sk_allocation = GFP_NOFS;
 
 #ifdef CONFIG_LOCKDEP
 	lockdep_set_class(&sock->sk->sk_lock, &socket_class);
@@ -510,6 +524,11 @@ static int ceph_tcp_connect(struct ceph_connection *con)
 		return ret;
 	}
 
+	if (sk_is_loopback(sock->sk))
+		con_flag_set(con, CON_FLAG_LOCAL);
+	else
+		con_flag_clear(con, CON_FLAG_LOCAL);
+
 	if (con->msgr->tcp_nodelay) {
 		int optval = 1;
 
@@ -520,7 +539,18 @@ static int ceph_tcp_connect(struct ceph_connection *con)
 		       ret);
 	}
 
-	sk_set_memalloc(sock->sk);
+	/*
+	 * Tagging with SOCK_MEMALLOC / setting PF_MEMALLOC may lead to
+	 * lockups if our peer is on the same host (communicating via
+	 * loopback) due to sk_filter() mercilessly dropping pfmemalloc
+	 * skbs on the receiving side - receiving loopback socket is
+	 * not going to be tagged with SOCK_MEMALLOC.  See:
+	 *
+	 * - http://article.gmane.org/gmane.linux.kernel/1418791
+	 * - http://article.gmane.org/gmane.linux.kernel.stable/46128
+	 */
+	if (!con_flag_test(con, CON_FLAG_LOCAL))
+		sk_set_memalloc(sock->sk);
 
 	con->sock = sock;
 	return 0;
@@ -2811,7 +2841,11 @@ static void con_work(struct work_struct *work)
 	unsigned long pflags = current->flags;
 	bool fault;
 
-	current->flags |= PF_MEMALLOC;
+	/*
+	 * See SOCK_MEMALLOC comment in ceph_tcp_connect().
+	 */
+	if (!con_flag_test(con, CON_FLAG_LOCAL))
+		current->flags |= PF_MEMALLOC;
 
 	mutex_lock(&con->mutex);
 	while (true) {
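
For anyone who wants to poke at the decision logic outside the kernel, here is a rough userspace model of it. This is only a sketch and not part of the patch: the model_* structs, functions and constants are hypothetical stand-ins for dst_entry, net_device, IFF_LOOPBACK and PF_MEMALLOC, and nothing here touches real sockets or task flags. It mirrors the two checks above: a missing route or device counts as "not loopback", and the memalloc flag is only ORed in for non-local connections.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in constants for this model only; not pulled from kernel headers. */
#define MODEL_IFF_LOOPBACK 0x8u
#define MODEL_PF_MEMALLOC  0x800u

/* Hypothetical, stripped-down stand-ins for net_device and dst_entry. */
struct model_dev {
	unsigned int flags;
};

struct model_dst {
	const struct model_dev *dev;
};

/*
 * Models sk_is_loopback(): a missing route or a route without a device
 * is treated as non-loopback, otherwise the device flag decides.
 */
static bool model_is_loopback(const struct model_dst *dst)
{
	return dst && dst->dev && (dst->dev->flags & MODEL_IFF_LOOPBACK);
}

/*
 * Models the con_work() change: return the flags the worker would run
 * with, adding the memalloc bit only when the connection is not local.
 */
static unsigned int model_worker_flags(unsigned int task_flags, bool con_local)
{
	if (!con_local)
		task_flags |= MODEL_PF_MEMALLOC;
	return task_flags;
}
```

Calling model_worker_flags(flags, model_is_loopback(&dst)) with a dst whose device carries MODEL_IFF_LOOPBACK leaves flags untouched; any other dst, including NULL, gets the memalloc bit set.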