From patchwork Fri Sep 23 17:40:12 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Olga Kornievskaia
X-Patchwork-Id: 9348533
From: Olga Kornievskaia
Date: Fri, 23 Sep 2016 13:40:12 -0400
Subject: reuse of slot and seq# when RPC was interrupted
To: linux-nfs
X-Mailing-List: linux-nfs@vger.kernel.org

Hi folks,

I'd like to raise an issue with the slot->interrupted case in
nfs41_sequence_done(). A comment there says that if the RPC was
interrupted, we don't know whether the server has processed the slot,
so the slot is marked as interrupted. In that case the sequence number
is not bumped. Later there is logic that bumps the sequence number if
we receive SEQ_MISORDERED and the slot was marked interrupted.

The problem is that when the sequence number is not incremented, the
reply is not necessarily SEQ_MISORDERED.
Instead, the reply is a "cached" reply of the operation that was
interrupted. That leads to the xdr code returning "Remote EIO"
(unrecoverable in some cases). If we always bump the sequence number,
then we should get the SEQ_MISORDERED error, from which we can recover.

A reproducer to see an operation reuse a seq# and get a cached reply
is as follows:
1. on the shell do "rm "
2. at the nfs_proxy, delay the reply from the server long enough to
   send a ctrl-c to the shell.
3. do something else on nfs.

If we instead bump the sequence number in the interrupted case and do:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a1a3b4c..b78dac5 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -728,6 +728,7 @@ int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
 		 * operation..
 		 * Mark the slot as having hosted an interrupted RPC call.
 		 */
+		++slot->seq_nr;
 		slot->interrupted = 1;
 		goto out;
 	case -NFS4ERR_DELAY:
@@ -748,14 +749,6 @@ int nfs41_sequence_done(struct rpc_task *task, struct nfs4_sequence_res *res)
 		goto retry_nowait;
 	case -NFS4ERR_SEQ_MISORDERED:
 		/*
-		 * Was the last operation on this sequence interrupted?
-		 * If so, retry after bumping the sequence number.
-		 */
-		if (interrupted) {
-			++slot->seq_nr;
-			goto retry_nowait;
-		}
-		/*
 		 * Could this slot have been previously retired?
 		 * If so, then the server may be expecting seq_nr = 1!
 		 */

then:
1. if the server received the interrupted call, we bump and the next
   operation has the correct number.
2. if the server didn't receive it and we bump, the next operation
   gets SEQ_MISORDERED. Will that reset the slot/session?