[35/45] lustre: osc: Do not wait for grants for too long


Message ID

1590444502-20533-36-git-send-email-jsimmons@infradead.org (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 569DC2071A
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>, Oleg Drokin <green@whamcloud.com>,
 NeilBrown <neilb@suse.de>
Date: Mon, 25 May 2020 18:08:12 -0400
Message-Id: <1590444502-20533-36-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1590444502-20533-1-git-send-email-jsimmons@infradead.org>
References: <1590444502-20533-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 35/45] lustre: osc: Do not wait for grants
 for too long
Precedence: list
Cc: Lustre Development List <lustre-devel@lists.lustre.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel" <lustre-devel-bounces@lists.lustre.org>

Series

lustre: merged OpenSFS client patches from April 30 to today

Commit Message

James Simmons May 25, 2020, 10:08 p.m. UTC

From: Oleg Drokin <green@whamcloud.com>

obd_timeout is far too long to wait for grants, considering we are
holding a lock that might be contended. If the OST is slow to respond,
we might get evicted, so limit the wait to half of the shortest
possible max wait a server might have before switching to synchronous IO.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13131
Lustre-commit: 1eee11c75ca13 ("LU-13131 osc: Do not wait for grants for too long")
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38283
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_dlm.h |  2 ++
 fs/lustre/ldlm/ldlm_request.c  |  1 +
 fs/lustre/osc/osc_cache.c      | 13 ++++++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index f67b612..174b314 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -1320,6 +1320,8 @@  int ldlm_cli_cancel_list(struct list_head *head, int count,
 			 enum ldlm_cancel_flags flags);
 /** @} ldlm_cli_api */
 
+extern unsigned int ldlm_enqueue_min;
+
 int ldlm_inodebits_drop(struct ldlm_lock *lock, u64 to_drop);
 int ldlm_cli_inodebits_convert(struct ldlm_lock *lock,
 			       enum ldlm_cancel_flags cancel_flags);
diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 5f06def..12ee241 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -69,6 +69,7 @@ 
 unsigned int ldlm_enqueue_min = OBD_TIMEOUT_DEFAULT;
 module_param(ldlm_enqueue_min, uint, 0644);
 MODULE_PARM_DESC(ldlm_enqueue_min, "lock enqueue timeout minimum");
+EXPORT_SYMBOL(ldlm_enqueue_min);
 
 /* in client side, whether the cached locks will be canceled before replay */
 unsigned int ldlm_cancel_unused_locks_before_replay = 1;
diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 9e28ff6..c7f1502 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -39,6 +39,7 @@ 
 #define DEBUG_SUBSYSTEM S_OSC
 
 #include <lustre_osc.h>
+#include <lustre_dlm.h>
 
 #include "osc_internal.h"
 
@@ -1630,10 +1631,20 @@  static int osc_enter_cache(const struct lu_env *env, struct client_obd *cli,
 {
 	struct osc_object *osc = oap->oap_obj;
 	struct lov_oinfo *loi = osc->oo_oinfo;
-	unsigned long timeout = (AT_OFF ? obd_timeout : at_max) * HZ;
 	int rc = -EDQUOT;
 	int remain;
 	bool entered = false;
+	/* We cannot wait for a long time here since we are holding ldlm lock
+	 * across the actual IO. If no requests complete fast (e.g. due to
+	 * overloaded OST that takes a long time to process everything), we'd
+	 * get evicted if we wait for a normal obd_timeout or some such.
+	 * So we try to wait half the time it would take the client to be
+	 * evicted by server which is half obd_timeout when AT is off
+	 * or at least ldlm_enqueue_min with AT on.
+	 * See LU-13131
+	 */
+	unsigned long timeout = (AT_OFF ? obd_timeout / 2 :
+					  ldlm_enqueue_min / 2) * HZ;
 
 	OSC_DUMP_GRANT(D_CACHE, cli, "need:%d\n", bytes);
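
For illustration only, the arithmetic behind the old and new wait limits can be sketched as a standalone userspace program fragment. This is not part of the patch: the helper names grant_wait_old() and grant_wait_new() are hypothetical, and the numeric values (HZ of 250, obd_timeout of 100 s, at_max of 600 s, ldlm_enqueue_min of 100 s) are assumptions chosen for the example, since the real values depend on kernel configuration and tunables.

```c
#include <stdbool.h>

/* Assumed values, for illustration only (not taken from the patch). */
#define HZ_ASSUMED               250UL /* jiffies per second */
#define OBD_TIMEOUT_ASSUMED      100U  /* seconds */
#define AT_MAX_ASSUMED           600U  /* seconds */
#define LDLM_ENQUEUE_MIN_ASSUMED 100U  /* seconds */

/* Hypothetical helper: wait limit in jiffies as computed before the patch,
 * i.e. the full obd_timeout with AT off, or at_max with AT on.
 */
unsigned long grant_wait_old(bool at_off, unsigned int obd_timeout,
			     unsigned int at_max, unsigned long hz)
{
	return (at_off ? obd_timeout : at_max) * hz;
}

/* Hypothetical helper: wait limit in jiffies as computed after the patch,
 * i.e. half of obd_timeout with AT off, or half of ldlm_enqueue_min with
 * AT on. The division is done in seconds before scaling to jiffies,
 * mirroring the expression in the hunk above.
 */
unsigned long grant_wait_new(bool at_off, unsigned int obd_timeout,
			     unsigned int ldlm_enqueue_min, unsigned long hz)
{
	return (at_off ? obd_timeout / 2 : ldlm_enqueue_min / 2) * hz;
}
```

Under these assumed values, the limit with AT on drops from 600 s (150000 jiffies) to 50 s (12500 jiffies), and with AT off from 100 s to 50 s.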
