From patchwork Mon Nov 16 00:59:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11907027 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5ADA9138B for ; Mon, 16 Nov 2020 01:01:33 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3E35120578 for ; Mon, 16 Nov 2020 01:01:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3E35120578 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A9B57306E0E; Sun, 15 Nov 2020 17:01:09 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1E666308152 for ; Sun, 15 Nov 2020 17:00:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 72F93224F; Sun, 15 Nov 2020 20:00:06 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 710A12C7E6; Sun, 15 Nov 2020 20:00:06 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 15 Nov 2020 19:59:51 -0500 Message-Id: <1605488401-981-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1605488401-981-1-git-send-email-jsimmons@infradead.org> References: <1605488401-981-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/28] lustre: ptlrpc: throttle RPC resend if network error X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Aurelien Degremont When sending a callback AST to a non-responding client, the server retries endlessly until the client is eventually evicted. When using ksocklnd, it will retry after each AST timeout, until the socket is eventually closed, after sock_timeout sec, where the retry will fail immediately, returning -110, as no socket could be established. The thread will spin on retrying and failing, until eventual client eviction. This will cause high thread CPU usage and possible resource denial. To workaround that, this patch avoids re-trying callback resend if: - the request is flagged with network error and timeout - last try was less than 1 sec ago In worst case, retry will happen after a timeout based on req->rq_deadline. If there is nothing else to handle, thread will be sleeping during that time, removing CPU overhead. WC-bug-id: https://jira.whamcloud.com/browse/LU-13984 Lustre-commit: 4103527c1c9b38 ("LU-13984 ptlrpc: throttle RPC resend if network error") Signed-off-by: Aurelien Degremont Reviewed-on: https://review.whamcloud.com/40020 Reviewed-by: Andreas Dilger Reviewed-by: Alexander Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/client.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index c9d9fe9..0e01ab33 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -1900,6 +1900,26 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set) goto interpret; } + /* don't resend too fast in case of network + * errors. + */ + if (ktime_get_real_seconds() < (req->rq_sent + 1) + && req->rq_net_err && req->rq_timedout) { + DEBUG_REQ(D_INFO, req, + "throttle request"); + /* Don't try to resend RPC right away + * as it is likely it will fail again + * and ptlrpc_check_set() will be + * called again, keeping this thread + * busy. Instead, wait for the next + * timeout. Flag it as resend to + * ensure we don't wait to long. + */ + req->rq_resend = 1; + spin_unlock(&imp->imp_lock); + continue; + } + list_move_tail(&req->rq_list, &imp->imp_sending_list);