From patchwork Thu Jun 8 19:47:49 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Olga Kornievskaia X-Patchwork-Id: 9776377 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id C5F416034B for ; Thu, 8 Jun 2017 19:47:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AFA4A269A3 for ; Thu, 8 Jun 2017 19:47:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A462228472; Thu, 8 Jun 2017 19:47:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A48BA269A3 for ; Thu, 8 Jun 2017 19:47:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751470AbdFHTrw (ORCPT ); Thu, 8 Jun 2017 15:47:52 -0400 Received: from mail-qk0-f172.google.com ([209.85.220.172]:34367 "EHLO mail-qk0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751467AbdFHTrv (ORCPT ); Thu, 8 Jun 2017 15:47:51 -0400 Received: by mail-qk0-f172.google.com with SMTP id d14so9467285qkb.1 for ; Thu, 08 Jun 2017 12:47:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-transfer-encoding; bh=3GgQGlj/Zz7CszJw68P+WHUxanMED+d+s6+Hc3btt1I=; b=CU+T2I7wJVFRUWSGUxDS6cuP/pnzMDuCceuwHKuAt1XFYI5urOYG2ZmZ5RhdbjUoHa SjPgWmBfVgzkLupyspfVVXV2lqGIeAPzAnq6APt+4FB1WTrW0zdHgjnwt9i49LE/oiB9 yaHeNO7RoResWS0Hu1zNYlE2O2eGoq4OU9pV8IKu3cgoEBZXhy9SPlDLUBar93J14ohM euYQ60k8jtx9n+EIXCPAXn2CxdShlPfYpyOsagUH8qe2wSz78mMDaO/FCC5dh0TsLFIn /Z56/ZKjj1WMrlF9eU8zOr6KcLpabT37LlNJnctTmBrXXsoHqWElQA8F0PLRoGAav8D3 E10w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc:content-transfer-encoding; bh=3GgQGlj/Zz7CszJw68P+WHUxanMED+d+s6+Hc3btt1I=; b=bmPK1x/nASFTr+4xj5eYonux87t2xTn4Phaq4OrShysVkQtyPEY5wAkD1RwaCmXcwZ 2BKndcza+u1RwHtsNhXaEoaEWPKChCAsdn+ZMeK91Z23YA16bSqFvFu42RGgUodVmbum MjSoaXs/oPJKudXuyqp/pztciuGcR+v7hNRRtEGuIAn4jkd1J2O2HPjMvdo4+ed2ZVUq xwjt4dXD53jj6+lWIuxDBhnG5cHpQWb8yzC2RnlgTi0PGVcFy7HC8LBnruIP1OncRzRC pSddoo4fcPaPnubolqmknusrMQZQU46bfOOgso+H7+n0hyvrS6LEC5WLNqkSx1Adtow3 sZYA== X-Gm-Message-State: AKS2vOweGaXovvQZ6qHch22cvl+gxQoxJGjIzIBUe2/OOiT+sPPGP8Jn D7OAUjvQrgQ4bqxhK0q0HGx7rBh/piY0 X-Received: by 10.55.217.136 with SMTP id q8mr33248115qkl.220.1496951270292; Thu, 08 Jun 2017 12:47:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.12.177.39 with HTTP; Thu, 8 Jun 2017 12:47:49 -0700 (PDT) In-Reply-To: References: From: Olga Kornievskaia Date: Thu, 8 Jun 2017 15:47:49 -0400 X-Google-Sender-Auth: uU0Or_kuohHNTjUQCu0QqvxI90I Message-ID: Subject: Re: race in state manager leads to failed recovery To: Trond Myklebust Cc: linux-nfs Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Trond, any thoughts on this? Another idea I had and tested to be effective was to introduce a new state bit "NFS_STATE_RECLAIM_INPROGRESS". If it's set then the new calls to nfs4_state_mark_reclaim_nograce() are just ignored. set_bit(NFS_OWNER_RECLAIM_NOGRACE, &state->owner->so_flags); @@ -1522,6 +1524,7 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs continue; atomic_inc(&state->count); spin_unlock(&sp->so_lock); + set_bit(NFS_STATE_RECLAIM_INPROGRESS, &state->flags); status = ops->recover_open(sp, state); if (status >= 0) { status = nfs4_reclaim_locks(state, ops); @@ -1538,6 +1541,8 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs } clear_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags); + clear_bit(NFS_STATE_RECLAIM_INPROGRESS, + &state->flags); nfs4_put_open_state(state); spin_lock(&sp->so_lock); goto restart; On Tue, Jun 6, 2017 at 12:39 PM, Olga Kornievskaia wrote: > An application can fail with EIO when the server reboots and the > client while doing the recovery is getting into the following > situation where it incorrectly chooses "nograce" recovery instead of > "reboot" recovery. The necessary trigger is a pnfs mount and 2 DS > reads on the same file fail with BAD_STATEID each read uses its own > lock stateid (and MDS server reboots losing state). Since server > rebooted it will fail recovery ops with BAD_SESSION and then also > STALE_ID triggering session and clientid recovery. > > 1. Thread1 gets BAD_STATEID initiates stateid recovery calls > nfs4_state_mark_reclaim_no_grace() which sets the cl_state->flags and > state->flags respective _NOGRACE flags. > > 2. State manager gets to run and it clears the _NOGRACE from the > cl_state. Calls nfs4_do_reclaim() which sends TEST_STATEID which gets > a BAD_SESSION. > > 3. Thread2 now gets control and it also processing its BAD_STATEID and > calls nfs4_state_mark_reclaim_no_grace() and it again sets the > state/cl_state->flags to have _NOGRACE. > > 4. State manager runs and continues with state recovery by sending > DESTROY_SESSION, and then CREATE_SESSION which fails with > STALE_CLIENTID. Now we are recovering the lease. > > 5. State manager in nfs4_recovery_handle_error for STALE_CLIENTID, > calls nfs4_state_start_reclaim_reboot() calls > nfs4_state_mark_reclaim_reboot() which has a check > > /* Don't recover state that expired before the reboot */ > if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) { > clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags); > return 0; > } > > Because the _NOGRACE was set again in Step3, we end up clearing > _RECLAIM_REBOOT and not setting _RECLAIM_REBOOT in the cl_state->flags > and choose to do "nograce" operation instead of recovery. > > Removing this check removes the problem. > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index 06274e3..2c668a0 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -1321,11 +1321,6 @@ static int > nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_st > if (!nfs4_valid_open_stateid(state)) > return 0; > set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags); > - /* Don't recover state that expired before the reboot */ > - if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) { > - clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags); > - return 0; > - } > set_bit(NFS_OWNER_RECLAIM_REBOOT, &state->owner->so_flags); > set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state); > return 1; > > I'm confused by the comment why would we not what to recover state? > > When both threads execute before the state manager starts this problem > doesn't exist. > > If helpful here are two annotated (**) snippet from var log message > from the non-working and then one without the check: > > Jun 6 12:06:42 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff880072f50c00 > ** first thread sets the nograce recovery > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce > cleared NOGRACE from client state ffff88006fc23d28 > ** this is from the state manager’s loop from after > test_and_clear(NFS4CLNT_RECLAIM_NO_GRACE, cl_state) > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_do_reclaim start > Jun 6 12:06:42 ipa18 kernel: NFS call test_stateid ffff880077ac0af0 > Jun 6 12:06:42 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff880072f50c00 > ** 2nd thread sets the nograce recovery > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_do_reclaim status end=-11 > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_state_mark_reclaim_reboot > clears RECLAIM_REBOOT from state=ffff880072f50c00 > ** this is from nfs4_state_mark_reclaim_reboot() and ends up not > setting the _RECLAIM_REBOOT flag in cl_state and clears it from the > state->flags > Jun 6 12:06:42 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_reclaim_lease client > state=ffff88006fc23d28 flag has NOGRACE > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce > cleared NOGRACE from client state ffff88006fc23d28 > ** this is in the state manager loop after the "reclaim_nograce" is > chosen and clearing the _NOGRACE flag > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_do_reclaim start > Jun 6 12:06:43 ipa18 kernel: NFS call test_stateid ffff880077ac0af0 > Jun 6 12:06:43 ipa18 kernel: NFS call test_stateid ffff880077ac02f0 > Jun 6 12:06:43 ipa18 kernel: NFS call test_stateid ffff880072f50c68 > Jun 6 12:06:43 ipa18 kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed! > Jun 6 12:06:43 ipa18 kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed! > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_reclaim_open_state clearing > NOGRACE from state=ffff880072f50c00 > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_do_reclaim end=0 > Jun 6 12:06:43 ipa18 kernel: AGLO: nfs4_end_drain_session > > This is a snippet from when we ignore that _NOGRACE is set and proceed > to set the _RECLAIM_REBOOT > Jun 6 12:21:58 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff88007363bbc0 > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce > cleared NOGRACE from client state ffff88002febe528 > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_do_reclaim start > Jun 6 12:21:58 ipa18 kernel: NFS call test_stateid ffff880077ac4cf0 > Jun 6 12:21:58 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff88007363bbc0 > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_do_reclaim status end=-11 > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_state_mark_reclaim_reboot NOT > clearing _RECLAIM_REBOOT from state=ffff88007363bbc0 > *** Removing the check and instead proceeding to set the > _RECLAIM_REBOOT in the cl_state->flags so that > Jun 6 12:21:58 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_begin_drain_session > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_reclaim_lease client > state=ffff88002febe528 flag has NOGRACE > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_state_manager reclaim reboot > op cl_state=ffff88002febe528 > *** Unlike the previous case the state manager now goes into the > reclaim reboot state. > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim start > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_reclaim_open_state clearing > NOGRACE from state=ffff88007363bbc0 > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim end=0 > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce > cleared NOGRACE from client state ffff88002febe528 > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim start > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim end=0 > Jun 6 12:21:59 ipa18 kernel: AGLO: nfs4_end_drain_session --- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index af285cc..0f20f12 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -158,6 +158,7 @@ enum { NFS_O_RDWR_STATE, /* OPEN stateid has read/write state */ NFS_STATE_RECLAIM_REBOOT, /* OPEN stateid server rebooted */ NFS_STATE_RECLAIM_NOGRACE, /* OPEN stateid needs to recover state */ + NFS_STATE_RECLAIM_INPROGRESS, /* OPEN stateid recovery is in progress */ NFS_STATE_POSIX_LOCKS, /* Posix locks are supported */ NFS_STATE_RECOVERY_FAILED, /* OPEN stateid state recovery failed */ NFS_STATE_MAY_NOTIFY_LOCK, /* server may CB_NOTIFY_LOCK */ diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index b34de03..4eef2eb 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1335,6 +1335,8 @@ int nfs4_state_mark_reclaim_nograce(struct nfs_client *clp, struct nfs4_state *s { if (!nfs4_valid_open_stateid(state)) return 0; + if (test_bit(NFS_STATE_RECLAIM_INPROGRESS, &state->flags)) + return 1; set_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags); clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);