From: Scott Mayhew
Date: Fri, 28 Feb 2014 17:29:56 -0500
To: linux-nfs@vger.kernel.org
Cc: jlayton@redhat.com
Subject: [PATCH/RFC] Add simple backoff logic when reconnecting to a server that recently initiated a connection close
Message-ID: <20140228222956.GA1544@tonberry.usersys.redhat.com>
X-Patchwork-Id: 3744781

We recently had a customer whose filer began closing the client's
connection upon receipt of a PATHCONF operation. The environment had a
mix of RHEL 6.2 and RHEL 6.4 clients. The RHEL 6.2 clients would wait
3 seconds before reconnecting, while the RHEL 6.4 clients would
reconnect immediately, triggering what could be described as a DoS on
the filer.

The difference in behavior was due to the inclusion of commit a519fc7
(SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT) in
the RHEL 6.4 kernel. With this commit in place, when the server
initiates a close we wind up destroying the transport socket, and on a
subsequent call to xs_connect() we attempt to connect right away. Prior
to this commit, we would arrive in xs_connect() with a non-NULL socket,
and as a result we'd delay for the reestablish_timeout before trying to
connect.

It's still unknown what originally caused the filer to behave in this
manner, but I'm able to reproduce this behavior with a Linux NFS server
patched to close the connection upon receipt of a PATHCONF operation
(patch attached).
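To make the before/after difference concrete, here is a minimal userspace sketch of the delay decision described above. It is not the kernel code: struct model_xprt and both model_* functions are hypothetical names invented for illustration, and the boolean stands in for "xs_connect() sees a non-NULL socket".

```c
#include <stdbool.h>

/* Hypothetical model of the transport state; not the kernel's struct. */
struct model_xprt {
	bool has_socket;                   /* stands in for a non-NULL transport socket */
	unsigned long reestablish_timeout; /* seconds; 3 in the case described */
};

/*
 * Model of a server-initiated close. With a519fc7 applied the socket is
 * destroyed; without it the socket is still present at the next connect.
 */
void model_server_close(struct model_xprt *xprt, bool has_a519fc7)
{
	if (has_a519fc7)
		xprt->has_socket = false;
}

/*
 * Seconds to wait before the next connect attempt: a leftover socket
 * means "delay for reestablish_timeout", no socket means "connect now".
 */
unsigned long model_connect_delay(const struct model_xprt *xprt)
{
	return xprt->has_socket ? xprt->reestablish_timeout : 0;
}
```

With reestablish_timeout set to 3, a close modeled without a519fc7 yields a 3 second delay (the RHEL 6.2 behavior), and a close modeled with a519fc7 yields 0 (the RHEL 6.4 behavior).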

I've also attached two possible fixes using the old xprt->shutdown
field to indicate that the other end has initiated a shutdown of the
connection. The goal of both patches is to bring back some of the old
backoff logic without undoing the fix brought about by a519fc7.

The first option will delay any time the xprt->shutdown field is set;
the drawback is that in the case of a simple restart of the NFS server,
the client may delay for 3 seconds before trying to reconnect. The
second option will only delay after 3 successive connect/close
sequences where we've not received a proper reply to an RPC request.

-Scott

From 275ac2c1ba2279b23b191825fa190f5b6f7882c5 Mon Sep 17 00:00:00 2001
From: Scott Mayhew
Date: Fri, 28 Feb 2014 16:34:24 -0500
Subject: [PATCH] svcrpc: close connection when nfs v3 pathconf is received

reproducer for case 1046417
---
 net/sunrpc/svc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 5de6801..52866dc 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1113,6 +1113,12 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 	rqstp->rq_vers = vers = svc_getnl(argv);	/* version number */
 	rqstp->rq_proc = proc = svc_getnl(argv);	/* procedure number */
 
+	if (prog == 100003 && vers == 3 && proc == 20) {
+		dprintk("received pathconf, closing connection\n");
+		svc_close_xprt(rqstp->rq_xprt);
+		goto dropit;
+	}
+
 	for (progp = serv->sv_program; progp; progp = progp->pg_next)
 		if (prog == progp->pg_prog)
 			break;
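
For comparison, the two backoff options described in the cover letter can be sketched as a small userspace model. This is not taken from either proposed fix: struct model_conn, the model_* helpers, and both constants are hypothetical. The counter handling (incremented on each peer-initiated close, reset on a proper RPC reply) and the 3 second delay for the second option are assumptions made for illustration.

```c
#include <stdbool.h>

#define MODEL_BACKOFF_SECS      3 /* assumed delay, matching the 3 seconds above */
#define MODEL_MAX_FAILED_CYCLES 3 /* connect/close cycles before option 2 delays */

/* Hypothetical per-connection state; not the kernel's struct rpc_xprt. */
struct model_conn {
	bool peer_shutdown;         /* stands in for xprt->shutdown */
	unsigned int failed_cycles; /* closes seen without a proper RPC reply */
};

/* The server closed the connection on us. */
void model_on_peer_close(struct model_conn *c)
{
	c->peer_shutdown = true;
	c->failed_cycles++;
}

/* A proper reply to an RPC request arrived, so the connection works. */
void model_on_rpc_reply(struct model_conn *c)
{
	c->peer_shutdown = false;
	c->failed_cycles = 0;
}

/* Option 1: delay whenever the peer initiated the shutdown. */
unsigned int model_delay_option1(const struct model_conn *c)
{
	return c->peer_shutdown ? MODEL_BACKOFF_SECS : 0;
}

/* Option 2: delay only after several successive failed cycles. */
unsigned int model_delay_option2(const struct model_conn *c)
{
	if (c->peer_shutdown && c->failed_cycles >= MODEL_MAX_FAILED_CYCLES)
		return MODEL_BACKOFF_SECS;
	return 0;
}
```

In this model a single server restart (one peer close) makes option 1 wait 3 seconds while option 2 reconnects immediately. Only the repeated close pattern seen with the filer pushes option 2 into the delay.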