From patchwork Mon Jun 13 00:34:32 2016
X-Patchwork-Submitter: Marc Eshel
X-Patchwork-Id: 9171799
From: "Marc Eshel"
To: "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org, "Srikanth Srinivasan", "Trond Myklebust", "Venkateswara R Puvvada"
Subject: Re: NFS fixes
Date: Sun, 12 Jun 2016 17:34:32 -0700

We are seeing data corruption when putting a very high load on the NFSv3 client while reading multi-gigabyte files in parallel. Checksums on the files reveal the corruption, and looking at the data we see that one block contains data that belongs in another block, though not the full block. The test was run on multiple sets of hardware using different types of server, including kNFS and Ganesha servers with EXT3 or GPFS file systems. The only common part in all the tests is the NFSv3 client on RHEL 7.0, 7.1 and 7.2.

Is there anything upstream that might fix data corruption in the NFSv3 client, or do we know whether this problem has been reported by other users? The only fix I see that might be related is attached; can it explain data corruption?

Thanks, Marc.

Author: Trond Myklebust
Date:   Mon Aug 17 12:57:07 2015 -0500

    NFS: nfs_set_pgio_error sometimes misses errors

    We should ensure that we always set the pgio_header's error field
    if a READ or WRITE RPC call returns an error. The current code depends
    on 'hdr->good_bytes' always being initialised to a large value, which
    is not always done correctly by callers.
    When this happens, applications may end up missing important errors.

    Cc: stable@vger.kernel.org
    Signed-off-by: Trond Myklebust
---
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 4984bbe..7c5718b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -77,8 +77,8 @@ EXPORT_SYMBOL_GPL(nfs_pgheader_init);
 void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
 {
 	spin_lock(&hdr->lock);
-	if (pos < hdr->io_start + hdr->good_bytes) {
-		set_bit(NFS_IOHDR_ERROR, &hdr->flags);
+	if (!test_and_set_bit(NFS_IOHDR_ERROR, &hdr->flags)
+	    || pos < hdr->io_start + hdr->good_bytes) {
 		clear_bit(NFS_IOHDR_EOF, &hdr->flags);
 		hdr->good_bytes = pos - hdr->io_start;
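
To make the commit message's point concrete, here is a minimal user-space sketch of the old and new conditions. It is not kernel code: struct model_hdr, set_error_old(), set_error_new() and run_demo() are hypothetical names, the two bool fields stand in for the NFS_IOHDR_ERROR and NFS_IOHDR_EOF flag bits, locking is omitted, and the assignment to the error field is an assumption about what follows the part of the hunk shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields of struct nfs_pgio_header that the
 * patch touches. This is an illustration, not the kernel definition. */
struct model_hdr {
	long long io_start;   /* file offset where this I/O begins */
	long long good_bytes; /* bytes believed good, counted from io_start */
	bool error_flag;      /* models the NFS_IOHDR_ERROR bit */
	bool eof_flag;        /* models the NFS_IOHDR_EOF bit */
	int error;            /* assumed: recorded error code */
};

/* Old condition: the error is recorded only when pos lies inside the
 * range currently considered good. */
void set_error_old(struct model_hdr *h, int error, long long pos)
{
	if (pos < h->io_start + h->good_bytes) {
		h->error_flag = true;
		h->eof_flag = false;
		h->good_bytes = pos - h->io_start;
		h->error = error;
	}
}

/* New condition: the first error is always recorded (models
 * test_and_set_bit); later errors only when they occur earlier in the file. */
void set_error_new(struct model_hdr *h, int error, long long pos)
{
	bool was_set = h->error_flag;

	h->error_flag = true;
	if (!was_set || pos < h->io_start + h->good_bytes) {
		h->eof_flag = false;
		h->good_bytes = pos - h->io_start;
		h->error = error;
	}
}

/* A caller that left good_bytes at 0 instead of a large value. */
void run_demo(void)
{
	struct model_hdr old_hdr = { .io_start = 4096, .good_bytes = 0 };
	struct model_hdr new_hdr = { .io_start = 4096, .good_bytes = 0 };

	set_error_old(&old_hdr, -5, 4096);
	set_error_new(&new_hdr, -5, 4096);

	assert(!old_hdr.error_flag); /* old logic drops the error */
	assert(new_hdr.error_flag);  /* new logic records it */
	printf("old: flag=%d error=%d, new: flag=%d error=%d\n",
	       old_hdr.error_flag, old_hdr.error,
	       new_hdr.error_flag, new_hdr.error);
}
```

In this model the old condition silently drops an error reported at the very start of the request when good_bytes was left at 0, which matches the "missing important errors" case in the commit message. Whether that path can also produce the misplaced partial-block data described above is a separate question that this sketch does not answer.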