From patchwork Thu May 7 17:04:58 2015
X-Patchwork-Submitter: Olga Kornievskaia
X-Patchwork-Id: 6359661
Date: Thu, 7 May 2015 13:04:58 -0400
Subject: 4.0 NFS client in infinite loop in state recovery after getting BAD_STATEID
From: Olga Kornievskaia
To: Trond Myklebust, linux-nfs
X-Mailing-List: linux-nfs@vger.kernel.org

Hi folks,

Problem: the upstream NFS v4.0 client has a problem where it goes into an
infinite loop of re-sending an OPEN while trying to recover from a
BAD_STATEID error received on an I/O operation such as READ or WRITE.

How to reproduce it easily (using fault injection):

1. Do an NFS v4.0 mount to a server.
2. Open a file such that the server gives you a write delegation.
3. Do a write and have the server return BAD_STATEID. One way to do so is to
   use a Python proxy, nfs4proxy, and inject a BAD_STATEID error on the WRITE
   (a minimal user-space sketch of steps 2-3 follows this list).
4. And off it goes with the loop.
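For reference, a minimal user-space sketch of steps 2-3. The mount point and
file name are made-up examples, the BAD_STATEID itself is injected by the
proxy rather than by this program, and whether the server actually hands out
a write delegation depends on its configuration:

/* Sketch only: paths are assumptions.  Opening read-write and issuing a
 * WRITE is enough to exercise the delegation; the BAD_STATEID reply is
 * produced by the fault-injecting proxy in front of the server. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char buf[] = "write that the proxy fails with BAD_STATEID\n";
	int fd = open("/mnt/nfs4/delegtest", O_CREAT | O_RDWR, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, buf, sizeof(buf) - 1) < 0)
		perror("write");
	close(fd);
	return 0;
}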
Here is why. An I/O op like WRITE receives a BAD_STATEID:

1. For this error, the async error handler calls nfs4_schedule_stateid_recovery().
2. That in turn calls nfs4_state_mark_reclaim_nograce(), which sets
   RECLAIM_NOGRACE in the state flags.
3. The state manager thread runs and calls nfs4_do_reclaim() to recover.
4. That calls nfs4_reclaim_open_state(). In that function:

   restart:
     for each open state of the state owner:
       test if RECLAIM_NOGRACE is set in the state flags and, if so, clear it
         (it is set and we clear it)
       check the open stateid (checks that RECOVERY_FAILED is not set) (it is not)
       check that we have state
       call ops->recover_open(); for NFS v4.0 that is nfs40_open_expired(),
         which calls nfs40_clear_delegation_stateid(),
           which calls nfs_finish_clear_delegation_stateid(),
             which calls nfs_remove_bad_delegation(),
               which calls nfs_inode_find_state_and_recover(),
                 which calls nfs4_state_mark_reclaim_nograce()
                 **** this sets RECLAIM_NOGRACE in the state flags again
       return from recover_open() with status 0
       call nfs4_reclaim_locks(), which returns 0
       then goto restart;

What happens is that, since we have re-set the flag in the state flags, the
whole loop starts over again. (A small user-space model of this loop follows
the patch below.)

Solution: nfs_remove_bad_delegation() is only called from
nfs_finish_clear_delegation_stateid(), which is called from the 4.0 and 4.1
recover_open functions in the nograce case. In both cases the state manager is
already doing recovery based on the RECLAIM_NOGRACE flag and is walking the
opens that need to be recovered, so scheduling recovery again from there is
redundant. I propose to break the loop by removing the call:

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 4711d04..b322823 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -632,10 +632,8 @@ void nfs_remove_bad_delegation(struct inode *inode)
 
 	nfs_revoke_delegation(inode);
 	delegation = nfs_inode_detach_delegation(inode);
-	if (delegation) {
-		nfs_inode_find_state_and_recover(inode, &delegation->stateid);
+	if (delegation)
 		nfs_free_delegation(delegation);
-	}
 }
 EXPORT_SYMBOL_GPL(nfs_remove_bad_delegation);
 
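For illustration only, here is a small stand-alone user-space model of the
loop described above. This is an addition for clarity, not kernel code; the
names merely mirror the kernel ones. It shows why re-marking the state from
inside recover_open() makes the test-and-clear / goto-restart pattern spin
forever, and why dropping the nfs_inode_find_state_and_recover() call lets
recovery finish:

/* Toy model only -- not kernel code.  reclaim_nograce stands in for the
 * RECLAIM_NOGRACE bit on a single open state. */
#include <stdbool.h>
#include <stdio.h>

static bool reclaim_nograce;
static bool patched;	/* false: remove_bad_delegation() re-marks the state */

/* models nfs_remove_bad_delegation() */
static void remove_bad_delegation(void)
{
	if (!patched)
		reclaim_nograce = true;	/* nfs_inode_find_state_and_recover() */
}

/* models ops->recover_open() for v4.0 (nfs40_open_expired() and below) */
static int recover_open(void)
{
	remove_bad_delegation();
	return 0;
}

/* models the restart loop in nfs4_reclaim_open_state() */
static void reclaim_open_state(void)
{
	int passes = 0;

restart:
	if (++passes > 5) {		/* cut the demo short */
		printf("still looping after %d passes\n", passes);
		return;
	}
	if (reclaim_nograce) {
		reclaim_nograce = false;	/* test_and_clear_bit() */
		if (recover_open() == 0)
			goto restart;		/* flag was set again -> loop */
	}
	printf("recovery done after %d passes\n", passes);
}

int main(void)
{
	reclaim_nograce = true;		/* BAD_STATEID scheduled nograce recovery */
	reclaim_open_state();		/* spins until the demo cut-off */

	patched = true;			/* with the proposed change applied */
	reclaim_nograce = true;
	reclaim_open_state();		/* finishes on the second pass */
	return 0;
}

Compiled with a plain cc, the first call keeps looping while the second
finishes after one successful recovery pass.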