From patchwork Sat Nov 6 01:16:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jane Chu
X-Patchwork-Id: 12606197
From: Jane Chu
To: david@fromorbit.com, djwong@kernel.org, dan.j.williams@intel.com,
 hch@infradead.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
 agk@redhat.com, snitzer@redhat.com, dm-devel@redhat.com,
 ira.weiny@intel.com, willy@infradead.org, vgoyal@redhat.com,
 linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: [PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes
Date: Fri, 5 Nov 2021 19:16:37 -0600
Message-Id: <20211106011638.2613039-2-jane.chu@oracle.com>
X-Mailer: git-send-email 2.18.4
In-Reply-To: <20211106011638.2613039-1-jane.chu@oracle.com>
References: <20211106011638.2613039-1-jane.chu@oracle.com>
Precedence: bulk
X-Mailing-List: nvdimm@lists.linux.dev

Introduce DAX_OP_NORMAL and DAX_OP_RECOVERY operation modes to
{dax_direct_access, dax_copy_from_iter, dax_copy_to_iter}.
DAX_OP_NORMAL is the default, i.e. the existing behavior, and
DAX_OP_RECOVERY is a new mode for data recovery.

When a dax filesystem suspects that a read or write might encounter a
dax media error, it can request a recovery-mode read or write by
passing DAX_OP_RECOVERY to the aforementioned APIs. A read in recovery
mode attempts to fetch as much data as possible until the first
poisoned page is encountered. A write in recovery mode attempts to
clear the poison in a page-aligned range and then writes the
user-provided data over it.

DAX_OP_NORMAL should be used for all non-recovery code paths.

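The intended read and write semantics of the two modes can be modeled
in user space. The following is a minimal sketch, not kernel code and
not the implementation in this series: struct toy_pmem, toy_read(),
toy_write() and the tiny page size are hypothetical names invented for
illustration. Only the DAX_OP_NORMAL and DAX_OP_RECOVERY values come
from this patch. The sketch assumes that a normal-mode access to a range
containing poison transfers nothing, and that clearing poison zeroes the
affected page; both are simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mode values as defined by the patch; everything else is hypothetical. */
#define DAX_OP_NORMAL   0
#define DAX_OP_RECOVERY 1

#define TOY_PAGE_SIZE 8 /* tiny "page" to keep the model readable */
#define TOY_NR_PAGES  4

struct toy_pmem {
	unsigned char data[TOY_NR_PAGES * TOY_PAGE_SIZE];
	bool poisoned[TOY_NR_PAGES]; /* one poison flag per page */
};

/* Index of the first poisoned page overlapping [off, off + len), or -1. */
static int toy_first_poison(const struct toy_pmem *p, size_t off, size_t len)
{
	if (len == 0)
		return -1;
	for (size_t pg = off / TOY_PAGE_SIZE;
	     pg <= (off + len - 1) / TOY_PAGE_SIZE; pg++)
		if (p->poisoned[pg])
			return (int)pg;
	return -1;
}

/* Copy from the toy device into dst; returns the number of bytes copied. */
size_t toy_read(const struct toy_pmem *p, size_t off, void *dst, size_t len,
		int mode)
{
	int bad;

	assert(off + len <= sizeof(p->data));
	bad = toy_first_poison(p, off, len);
	if (bad >= 0) {
		size_t stop = (size_t)bad * TOY_PAGE_SIZE;

		if (mode != DAX_OP_RECOVERY)
			return 0; /* assumption: normal mode gives up */
		/* Recovery: fetch everything before the first poisoned page. */
		len = stop > off ? stop - off : 0;
	}
	memcpy(dst, p->data + off, len);
	return len;
}

/* Copy from src into the toy device; returns the number of bytes written. */
size_t toy_write(struct toy_pmem *p, size_t off, const void *src, size_t len,
		 int mode)
{
	assert(off + len <= sizeof(p->data));
	if (len == 0)
		return 0;
	if (toy_first_poison(p, off, len) >= 0) {
		if (mode != DAX_OP_RECOVERY)
			return 0; /* assumption: normal mode gives up */
		/* Recovery: clear poison over the page-aligned covering range. */
		for (size_t pg = off / TOY_PAGE_SIZE;
		     pg <= (off + len - 1) / TOY_PAGE_SIZE; pg++) {
			if (!p->poisoned[pg])
				continue;
			memset(p->data + pg * TOY_PAGE_SIZE, 0, TOY_PAGE_SIZE);
			p->poisoned[pg] = false;
		}
	}
	memcpy(p->data + off, src, len);
	return len;
}
```

In this model, with page 2 poisoned, a 20-byte recovery read starting at
offset 4 returns the 12 bytes that precede the poisoned page, and a
recovery write into page 2 clears the poison so that later normal-mode
accesses to that range succeed.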
Signed-off-by: Jane Chu
---
 drivers/dax/super.c             | 15 +++++++++------
 drivers/md/dm-linear.c          | 14 ++++++++------
 drivers/md/dm-log-writes.c      | 19 +++++++++++--------
 drivers/md/dm-stripe.c          | 14 ++++++++------
 drivers/md/dm-target.c          |  2 +-
 drivers/md/dm-writecache.c      |  8 +++++---
 drivers/md/dm.c                 | 14 ++++++++------
 drivers/nvdimm/pmem.c           | 11 ++++++-----
 drivers/nvdimm/pmem.h           |  2 +-
 drivers/s390/block/dcssblk.c    | 13 ++++++++-----
 fs/dax.c                        | 14 ++++++++------
 fs/fuse/dax.c                   |  4 ++--
 fs/fuse/virtio_fs.c             | 12 ++++++++----
 include/linux/dax.h             | 18 +++++++++++-------
 include/linux/device-mapper.h   |  5 +++--
 tools/testing/nvdimm/pmem-dax.c |  2 +-
 16 files changed, 98 insertions(+), 69 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c0910687fbcb..90cae9d84b9c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -110,6 +110,7 @@ enum dax_device_flags {
  * @dax_dev: a dax_device instance representing the logical memory range
  * @pgoff: offset in pages from the start of the device to translate
  * @nr_pages: number of consecutive pages caller can handle relative to @pfn
+ * @mode: indicate whether dax operation is in normal or recovery mode
  * @kaddr: output parameter that returns a virtual address mapping of pfn
  * @pfn: output parameter that returns an absolute pfn translation of @pgoff
  *
@@ -117,7 +118,7 @@ enum dax_device_flags {
  * pages accessible at the device relative @pgoff.
  */
 long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
-		void **kaddr, pfn_t *pfn)
+		int mode, void **kaddr, pfn_t *pfn)
 {
 	long avail;
 
@@ -131,7 +132,7 @@ long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
 		return -EINVAL;
 
 	avail = dax_dev->ops->direct_access(dax_dev, pgoff, nr_pages,
-			kaddr, pfn);
+			mode, kaddr, pfn);
 	if (!avail)
 		return -ERANGE;
 	return min(avail, nr_pages);
@@ -139,22 +140,24 @@ long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
 EXPORT_SYMBOL_GPL(dax_direct_access);
 
 size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
-		size_t bytes, struct iov_iter *i)
+		size_t bytes, struct iov_iter *i, int mode)
 {
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i,
+			mode);
 }
 EXPORT_SYMBOL_GPL(dax_copy_from_iter);
 
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
-		size_t bytes, struct iov_iter *i)
+		size_t bytes, struct iov_iter *i, int mode)
 {
 	if (!dax_alive(dax_dev))
 		return 0;
 
-	return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i,
+			mode);
 }
 EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 90de42f6743a..c73ac6b98801 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -173,27 +173,29 @@ static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
 }
 
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+	return dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index df3cd78223fb..1e9847f904ef 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -959,16 +959,18 @@ static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
 }
 
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-					 long nr_pages, void **kaddr, pfn_t *pfn)
+					 long nr_pages, int mode,
+					 void **kaddr, pfn_t *pfn)
 {
 	struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+	return dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
-					    pgoff_t pgoff, void *addr, size_t bytes,
-					    struct iov_iter *i)
+					    pgoff_t pgoff, void *addr,
+					    size_t bytes, struct iov_iter *i,
+					    int mode)
 {
 	struct log_writes_c *lc = ti->private;
 	sector_t sector = pgoff * PAGE_SECTORS;
@@ -985,16 +987,17 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
 		return 0;
 	}
 dax_copy:
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
-					  pgoff_t pgoff, void *addr, size_t bytes,
-					  struct iov_iter *i)
+					  pgoff_t pgoff, void *addr,
+					  size_t bytes, struct iov_iter *i,
+					  int mode)
 {
 	struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 50dba3f39274..4c098452268b 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -317,27 +317,29 @@ static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
 }
 
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-	return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+	return dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+	return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i, mode);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
diff --git a/drivers/md/dm-target.c b/drivers/md/dm-target.c
index 64dd0b34fcf4..2de1073dbad6 100644
--- a/drivers/md/dm-target.c
+++ b/drivers/md/dm-target.c
@@ -142,7 +142,7 @@ static void io_err_release_clone_rq(struct request *clone,
 }
 
 static long io_err_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	return -EIO;
 }
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 0af464a863fe..b2e4ff922fe2 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -286,7 +286,8 @@ static int persistent_memory_claim(struct dm_writecache *wc)
 
 	id = dax_read_lock();
 
-	da = dax_direct_access(wc->ssd_dev->dax_dev, offset, p, &wc->memory_map, &pfn);
+	da = dax_direct_access(wc->ssd_dev->dax_dev, offset, p, DAX_OP_NORMAL,
+			&wc->memory_map, &pfn);
 	if (da < 0) {
 		wc->memory_map = NULL;
 		r = da;
@@ -308,8 +309,9 @@ static int persistent_memory_claim(struct dm_writecache *wc)
 		i = 0;
 		do {
 			long daa;
-			daa = dax_direct_access(wc->ssd_dev->dax_dev, offset + i, p - i,
-						NULL, &pfn);
+			daa = dax_direct_access(wc->ssd_dev->dax_dev,
+						offset + i, p - i,
+						DAX_OP_NORMAL, NULL, &pfn);
 			if (daa <= 0) {
 				r = daa ? daa : -EINVAL;
 				goto err3;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 282008afc465..dc354db22ef9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1001,7 +1001,8 @@ static struct dm_target *dm_dax_get_live_target(struct mapped_device *md,
 }
 
 static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
-				 long nr_pages, void **kaddr, pfn_t *pfn)
+				 long nr_pages, int mode, void **kaddr,
+				 pfn_t *pfn)
 {
 	struct mapped_device *md = dax_get_private(dax_dev);
 	sector_t sector = pgoff * PAGE_SECTORS;
@@ -1019,7 +1020,7 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 	if (len < 1)
 		goto out;
 	nr_pages = min(len, nr_pages);
-	ret = ti->type->direct_access(ti, pgoff, nr_pages, kaddr, pfn);
+	ret = ti->type->direct_access(ti, pgoff, nr_pages, mode, kaddr, pfn);
 
  out:
 	dm_put_live_table(md, srcu_idx);
@@ -1028,7 +1029,8 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 }
 
 static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-				    void *addr, size_t bytes, struct iov_iter *i)
+				    void *addr, size_t bytes,
+				    struct iov_iter *i, int mode)
 {
 	struct mapped_device *md = dax_get_private(dax_dev);
 	sector_t sector = pgoff * PAGE_SECTORS;
@@ -1044,7 +1046,7 @@ static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 		ret = copy_from_iter(addr, bytes, i);
 		goto out;
 	}
-	ret = ti->type->dax_copy_from_iter(ti, pgoff, addr, bytes, i);
+	ret = ti->type->dax_copy_from_iter(ti, pgoff, addr, bytes, i, mode);
  out:
 	dm_put_live_table(md, srcu_idx);
 
@@ -1052,7 +1054,7 @@ static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 }
 
 static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
 	struct mapped_device *md = dax_get_private(dax_dev);
 	sector_t sector = pgoff * PAGE_SECTORS;
@@ -1068,7 +1070,7 @@ static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 		ret = copy_to_iter(addr, bytes, i);
 		goto out;
 	}
-	ret = ti->type->dax_copy_to_iter(ti, pgoff, addr, bytes, i);
+	ret = ti->type->dax_copy_to_iter(ti, pgoff, addr, bytes, i, mode);
  out:
 	dm_put_live_table(md, srcu_idx);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 0d6633987552..3dc99e0bf633 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -255,7 +255,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 
 /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
 __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
 
@@ -294,11 +294,12 @@ static int pmem_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 }
 
 static long pmem_dax_direct_access(struct dax_device *dax_dev,
-		pgoff_t pgoff, long nr_pages, void **kaddr, pfn_t *pfn)
+		pgoff_t pgoff, long nr_pages, int mode, void **kaddr,
+		pfn_t *pfn)
 {
 	struct pmem_device *pmem = dax_get_private(dax_dev);
 
-	return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn);
+	return __pmem_direct_access(pmem, pgoff, nr_pages, mode, kaddr, pfn);
 }
 
 /*
@@ -308,13 +309,13 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
  * dax_iomap_actor()
  */
 static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
 	return _copy_from_iter_flushcache(addr, bytes, i);
 }
 
 static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i)
+		void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
 	return _copy_mc_to_iter(addr, bytes, i);
 }
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 59cfe13ea8a8..bda6a898ba81 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -27,7 +27,7 @@ struct pmem_device {
 };
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn);
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn);
 
 #ifdef CONFIG_MEMORY_FAILURE
 static inline bool test_and_clear_pmem_poison(struct page *page)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index e65e83764d1c..fb9f768e12a1 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -32,7 +32,7 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static void dcssblk_submit_bio(struct bio *bio);
 static long dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn);
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -45,13 +45,15 @@ static const struct block_device_operations dcssblk_devops = {
 };
 
 static size_t dcssblk_dax_copy_from_iter(struct dax_device *dax_dev,
-		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	return copy_from_iter(addr, bytes, i);
 }
 
 static size_t dcssblk_dax_copy_to_iter(struct dax_device *dax_dev,
-		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
+		pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i,
+		int mode)
 {
 	return copy_to_iter(addr, bytes, i);
 }
@@ -62,7 +64,8 @@ static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,
 	long rc;
 	void *kaddr;
 
-	rc = dax_direct_access(dax_dev, pgoff, nr_pages, &kaddr, NULL);
+	rc = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_OP_NORMAL,
+			&kaddr, NULL);
 	if (rc < 0)
 		return rc;
 	memset(kaddr, 0, nr_pages << PAGE_SHIFT);
@@ -941,7 +944,7 @@ __dcssblk_direct_access(struct dcssblk_dev_info *dev_info, pgoff_t pgoff,
 
 static long
 dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	struct dcssblk_dev_info *dev_info = dax_get_private(dax_dev);
 
diff --git a/fs/dax.c b/fs/dax.c
index eb715363fd66..bea6df1498c3 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -735,7 +735,8 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
 		return rc;
 
 	id = dax_read_lock();
-	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+	rc = dax_direct_access(dax_dev, pgoff, 1, DAX_OP_NORMAL, &kaddr,
+			NULL);
 	if (rc < 0) {
 		dax_read_unlock(id);
 		return rc;
@@ -1036,7 +1037,7 @@ static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 		return rc;
 	id = dax_read_lock();
 	length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
-				   NULL, pfnp);
+				   DAX_OP_NORMAL, NULL, pfnp);
 	if (length < 0) {
 		rc = length;
 		goto out;
@@ -1162,7 +1163,8 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 	if (page_aligned)
 		rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
 	else
-		rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
+		rc = dax_direct_access(iomap->dax_dev, pgoff, 1,
+				DAX_OP_NORMAL, &kaddr, NULL);
 	if (rc < 0) {
 		dax_read_unlock(id);
 		return rc;
@@ -1231,7 +1233,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 			break;
 
 		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
-				&kaddr, NULL);
+				DAX_OP_NORMAL, &kaddr, NULL);
 		if (map_len < 0) {
 			ret = map_len;
 			break;
@@ -1250,10 +1252,10 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		 */
 		if (iov_iter_rw(iter) == WRITE)
 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
-					map_len, iter);
+					map_len, iter, DAX_OP_NORMAL);
 		else
 			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
-					map_len, iter);
+					map_len, iter, DAX_OP_NORMAL);
 
 		pos += xfer;
 		length -= xfer;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 713818d74de6..755d8d4b7d34 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1241,8 +1241,8 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	INIT_DELAYED_WORK(&fcd->free_work, fuse_dax_free_mem_worker);
 
 	id = dax_read_lock();
-	nr_pages = dax_direct_access(fcd->dev, 0, PHYS_PFN(dax_size), NULL,
-				     NULL);
+	nr_pages = dax_direct_access(fcd->dev, 0, PHYS_PFN(dax_size),
+				     DAX_OP_NORMAL, NULL, NULL);
 	dax_read_unlock(id);
 	if (nr_pages < 0) {
 		pr_debug("dax_direct_access() returned %ld\n", nr_pages);
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index b4c7c7fa987f..fb5433a37a7b 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -739,7 +739,8 @@ static void virtio_fs_cleanup_vqs(struct virtio_device *vdev,
  * offset.
  */
 static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
-				    long nr_pages, void **kaddr, pfn_t *pfn)
+				    long nr_pages, int mode, void **kaddr,
+				    pfn_t *pfn)
 {
 	struct virtio_fs *fs = dax_get_private(dax_dev);
 	phys_addr_t offset = PFN_PHYS(pgoff);
@@ -755,14 +756,16 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
 
 static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
 				       pgoff_t pgoff, void *addr,
-				       size_t bytes, struct iov_iter *i)
+				       size_t bytes, struct iov_iter *i,
+				       int mode)
 {
 	return copy_from_iter(addr, bytes, i);
 }
 
 static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
 				       pgoff_t pgoff, void *addr,
-				       size_t bytes, struct iov_iter *i)
+				       size_t bytes, struct iov_iter *i,
+				       int mode)
 {
 	return copy_to_iter(addr, bytes, i);
 }
@@ -773,7 +776,8 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
 	long rc;
 	void *kaddr;
 
-	rc = dax_direct_access(dax_dev, pgoff, nr_pages, &kaddr, NULL);
+	rc = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_OP_NORMAL,
+			&kaddr, NULL);
 	if (rc < 0)
 		return rc;
 	memset(kaddr, 0, nr_pages << PAGE_SHIFT);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 324363b798ec..931586df2905 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -9,6 +9,10 @@
 /* Flag for synchronous flush */
 #define DAXDEV_F_SYNC (1UL << 0)
 
+/* dax operation mode dynamically set by caller */
+#define DAX_OP_NORMAL		0
+#define DAX_OP_RECOVERY		1
+
 typedef unsigned long dax_entry_t;
 
 struct dax_device;
@@ -22,8 +26,8 @@ struct dax_operations {
 	 * logical-page-offset into an absolute physical pfn. Return the
 	 * number of pages available for DAX at that pfn.
 	 */
-	long (*direct_access)(struct dax_device *, pgoff_t, long,
-			void **, pfn_t *);
+	long (*direct_access)(struct dax_device *, pgoff_t, long, int,
+			void **, pfn_t *);
 	/*
 	 * Validate whether this device is usable as an fsdax backing
 	 * device.
@@ -32,10 +36,10 @@ struct dax_operations {
 			sector_t, sector_t);
 	/* copy_from_iter: required operation for fs-dax direct-i/o */
 	size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
-			struct iov_iter *);
+			struct iov_iter *, int);
 	/* copy_to_iter: required operation for fs-dax direct-i/o */
 	size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
-			struct iov_iter *);
+			struct iov_iter *, int);
 	/* zero_page_range: required operation. Zero page range */
 	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
 };
@@ -186,11 +190,11 @@ static inline void dax_read_unlock(int id)
 bool dax_alive(struct dax_device *dax_dev);
 void *dax_get_private(struct dax_device *dax_dev);
 long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
-		void **kaddr, pfn_t *pfn);
+		int mode, void **kaddr, pfn_t *pfn);
 size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
-		size_t bytes, struct iov_iter *i);
+		size_t bytes, struct iov_iter *i, int mode);
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
-		size_t bytes, struct iov_iter *i);
+		size_t bytes, struct iov_iter *i, int mode);
 int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
 			size_t nr_pages);
 void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a7df155ea49b..6596a8e0ceed 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -146,9 +146,10 @@ typedef int (*dm_busy_fn) (struct dm_target *ti);
  * >= 0 : the number of bytes accessible at the address
  */
 typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn);
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn);
 typedef size_t (*dm_dax_copy_iter_fn)(struct dm_target *ti, pgoff_t pgoff,
-		void *addr, size_t bytes, struct iov_iter *i);
+		void *addr, size_t bytes, struct iov_iter *i,
+		int mode);
 typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 		size_t nr_pages);
 
diff --git a/tools/testing/nvdimm/pmem-dax.c b/tools/testing/nvdimm/pmem-dax.c
index af19c85558e7..71c225630e7e 100644
--- a/tools/testing/nvdimm/pmem-dax.c
+++ b/tools/testing/nvdimm/pmem-dax.c
@@ -8,7 +8,7 @@
 #include
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
-		long nr_pages, void **kaddr, pfn_t *pfn)
+		long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
 	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
 

From patchwork Sat Nov 6 01:16:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jane Chu
X-Patchwork-Id: 12606195
(2603:10b6:a03:3ef::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4669.13; Sat, 6 Nov 2021 01:17:13 +0000 Received: from SJ0PR10MB4429.namprd10.prod.outlook.com ([fe80::418c:dfe4:f3ee:feaa]) by SJ0PR10MB4429.namprd10.prod.outlook.com ([fe80::418c:dfe4:f3ee:feaa%6]) with mapi id 15.20.4669.013; Sat, 6 Nov 2021 01:17:13 +0000 From: Jane Chu To: david@fromorbit.com, djwong@kernel.org, dan.j.williams@intel.com, hch@infradead.org, vishal.l.verma@intel.com, dave.jiang@intel.com, agk@redhat.com, snitzer@redhat.com, dm-devel@redhat.com, ira.weiny@intel.com, willy@infradead.org, vgoyal@redhat.com, linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org Subject: [PATCH v2 2/2] dax,pmem: Implement pmem based dax data recovery Date: Fri, 5 Nov 2021 19:16:38 -0600 Message-Id: <20211106011638.2613039-3-jane.chu@oracle.com> X-Mailer: git-send-email 2.18.4 In-Reply-To: <20211106011638.2613039-1-jane.chu@oracle.com> References: <20211106011638.2613039-1-jane.chu@oracle.com> X-ClientProxiedBy: SJ0PR03CA0357.namprd03.prod.outlook.com (2603:10b6:a03:39c::32) To SJ0PR10MB4429.namprd10.prod.outlook.com (2603:10b6:a03:2d1::14) Precedence: bulk X-Mailing-List: nvdimm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Received: from brm-x62-16.us.oracle.com (2606:b400:8004:44::1c) by SJ0PR03CA0357.namprd03.prod.outlook.com (2603:10b6:a03:39c::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4669.11 via Frontend Transport; Sat, 6 Nov 2021 01:17:12 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: d3a5fe25-00db-411b-1651-08d9a0c32a31 X-MS-TrafficTypeDiagnostic: SJ0PR10MB5890: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:1148; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
d8lN4WQn8vgs29dN3eW8aEm7O1FDMLSZtI9scVD7shXvHMP/5zJm1sgPDMUU7c/40eePadtjofBwjhxpUMiDW1x4Kp8isvCKI070G0urN/JZyCgxIm1yWewVLWN14nDrt6WZeDrg2hqZnr2IKtMejrf3D6S2sVH6JdUv2ThkbHJ+S59Ho59s6f9p0QUxfsuHKDKlfoRbvxzhir8P4mvNEYLNrE9D+b/KnlnW2okBo0hAcat4h9TnuP87dOwtq0hWcLX1H/+iKZ4LgE9NNtW0wIaFpBajf+5cN/OVIbX7xvcod1vJAD7sNCBNSwoiXotwvm4CyeQzNWeaIDmQNu+ISB2XDlBKT/Hii0sEaA69idMsEW6pk9BxQatHTEqokslI3SR00TFAUK09o7AZeD//BphQCj4iaqfFncQWrBBsZ2mpin/IO9+kdAJ7GWRv5ev8E142xQxncwiz7KAx6RDJb6Vw/KiyHZoK3BiaRWgN4y5WKwWf7jVz7hQnqm0fJ/goRqxTwm/2/6Zec7+2iP8Cx+5QZ5WasWnv/Ti5Cz+wOJINd7/XfRsBcHEv7jYb4p6jf02hm8Njdu2O9Wx7dQw9cnGLjuAHHDU6nwYAoLspgfaoFXNk7Ns8K9wPL0oR/m4yebF/vqRb6N/m7RzNzQPBW3DFGNl/cOlP/tI5SQ7ejRbJ9CACWExr6/+dDq0hShNMmYEP1kh2HlMdv8ScEUDaQFN/9FhKF6KYxJrmt1EB1LU= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:SJ0PR10MB4429.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(366004)(921005)(36756003)(8676002)(316002)(86362001)(66476007)(66946007)(508600001)(2906002)(44832011)(6486002)(83380400001)(2616005)(5660300002)(186003)(52116002)(7416002)(7696005)(1076003)(6666004)(38100700002)(8936002)(66556008)(309714004);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
DjQAfA7cdzLYsIHL2GD14ScvUs5h2xheFKca+ftoQnzohGlOhTCq8euL9laUVMolmDB9ElzldgAMPpfpn8EOmcmQGTG/jaqKqifVtcoMPdpPsakarCJR9O7FFwaS/pfKMVhAhznd/tvwMgGzDhGOkN3hgzvwiaODS2WMX1ez+PNvRROwq0ts3eJ7R6RrbA5upL5HoP5a9IelMSixNzVuQ2sgUA2L4MAwDGMKai/oNZsCvooX2hI82jfAIa/O7BNXJZLc0XWinw5K+eoBgrSd+0VNAedUIrZ5nZ31kcjHfnX+5FHpa57bW3pD3OyOjteO9OTUHAGBw5HrCFO8UTkWmaEFWa/LAakbUANMhHxTOK3939VWdiyZkx6N6MpGQUMv/wcdFsgiVmvl8Ir+293mO45QSMepwAxK66XGR4/75NDzas8p7I91Oe5ONIUn3AtzNbkyoNgJf1oD24uw0z0aO3DAE0MfcYcNZP2tuyEoc2VR0BCKYuqZEnuMsM8b+Nz6m6K6WiWGHaUJivUdEFwbxXzMVSyCErp5CPPygekimc0VDYxkLDun4Lfwntb07AuJKbvLOo+vKl6wAf7AgxuRliTpqUZhyZwfPUgAC3mZbYLoi9axSO2Y8o3EqRm70mDHdXEqkR/kp5jDrkA2KcD9aiYhmB752uRcojPCG2uRIRsqsXcAhVQ9kLnsleFoBQhoare4LcWO1LAZn527qxjRwI5mj/Z7kxNYLsDxyRX1q3wAoJnzY2H8NH/4+I+7qYY0HibOR6Cvzw93dWLrqAqa9hGjg10wwVagdeLTwGUeMZ5QqYMwUoSSsu+jG/HFgYt2RAtZOJZCajQvYrcocVHmB2/e62mDDdkpAqsJ6YUSL27foJtk11OEFDIWffxN1Rbc36bMyTZlSEqsfd6Attrbxm9BsV526BgZ+Dj4E4lYmYVkzi483OQxVa4Wc3OxnNbakLKgZTUqGqJtfj3FIYeDJ6zbS7jU+jyetx7Wov/r1okt7ETaaiTdN1FWE2QUfxFOV5s5eIW3lM2jrjpyKRa4dxSYMwTI5ewS4HH1yaeMxJNFjwyEMaFRijehtA71cIc7FzinzPJBV+pgnPeWT0h3ILfjMo6gOhjwCb2F2GZMIJ8Qw2UCx3ZQQOLBSpfCo1JMQQtoZl3xL+4DExGWfboEgCbdv5Z5q5xj6L2LHjQC0DRFcc+0iNAwrKN4zYBe4gSfyZOGbTEkFp1dWlr1cn1C7XV8VX8cDkJU/aDxC82fNthh7NtiDP9adrEZReTEATN5W4/Vdh5RDZ5A0Ff6M238X/6E7etl5h0o5bk52cZ4CaGT+D6RbJmsm3EK3hBpOgMyELRuT2qzi/IHtqOh9QqSNNl1jU4Zom8S1WSgzaLYdRm+Q3Id+UaEfyTOHkIlSziLJSBhseppUk0YfbG6kSw4cm6DWdYg5wXViNVFumbFc2mOd1NeUoUib1XG7mjf/MBXXLVaXlYXVsL+YFDovDkZrBy1xJDh3I0QIqk9b4oJlynlxWqX9mqG+YUvEPZvvKpt7UyyHejciuYOECDOT13Pa2xcQaCF7M+fmBizXJckCf5zgin7YSVTMm1u66WEio2uW8O7ahqGazzphxoxwX5212Yy/EKOYlhOD9ZCt8zn/e4SAS2fozYKaHsENJN5y75t X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: d3a5fe25-00db-411b-1651-08d9a0c32a31 X-MS-Exchange-CrossTenant-AuthSource: SJ0PR10MB4429.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Nov 2021 01:17:13.8016 (UTC) 
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: Zb0nlpiDQVi6hP2CxTIL9rA0Lj9ujTB8CmkxIMuIz8TU4FiH3ZPjVUPJlD7Uc3GVnwsA0V3PKX870Q6byNzucQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ0PR10MB5890 X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10159 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 suspectscore=0 bulkscore=0 spamscore=0 phishscore=0 malwarescore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2110150000 definitions=main-2111060005 X-Proofpoint-ORIG-GUID: BB5v6OVno30v7F4afHqD5xMuINTcBkMn X-Proofpoint-GUID: BB5v6OVno30v7F4afHqD5xMuINTcBkMn For /dev/pmem based dax, enable DAX_OP_RECOVERY mode for dax_direct_access to translate 'kaddr' over a range that may contain poison(s); and enable dax_copy_to_iter to read as much data as possible up till a poisoned page is encountered; and enable dax_copy_from_iter to clear poison among a page-aligned range, and then write the good data over. 
Signed-off-by: Jane Chu
---
 drivers/md/dm.c       |  2 ++
 drivers/nvdimm/pmem.c | 75 ++++++++++++++++++++++++++++++++++++++++---
 fs/dax.c              | 24 +++++++++++---
 3 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dc354db22ef9..9b3dac916f22 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1043,6 +1043,7 @@ static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 	if (!ti)
 		goto out;
 	if (!ti->type->dax_copy_from_iter) {
+		WARN_ON(mode == DAX_OP_RECOVERY);
 		ret = copy_from_iter(addr, bytes, i);
 		goto out;
 	}
@@ -1067,6 +1068,7 @@ static size_t dm_dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 	if (!ti)
 		goto out;
 	if (!ti->type->dax_copy_to_iter) {
+		WARN_ON(mode == DAX_OP_RECOVERY);
 		ret = copy_to_iter(addr, bytes, i);
 		goto out;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3dc99e0bf633..8ae6aa678c51 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -260,7 +260,7 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
 	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
 
 	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
-					PFN_PHYS(nr_pages))))
+				PFN_PHYS(nr_pages)) && mode == DAX_OP_NORMAL))
 		return -EIO;
 
 	if (kaddr)
@@ -303,20 +303,85 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,
 }
 
 /*
- * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
- * checking, both file offset and device offset, is handled by
- * dax_iomap_actor()
+ * Even though the 'no check' versions of copy_from_iter_flushcache()
+ * and copy_mc_to_iter() are used to bypass HARDENED_USERCOPY overhead,
+ * 'read'/'write' aren't always safe when poison is consumed. They happen
+ * to be safe because the 'read'/'write' range has been guaranteed to
+ * be free of poison(s) by a prior call to dax_direct_access() on the
+ * caller stack.
+ * But on a data recovery code path, the 'read'/'write' range is expected
+ * to contain poison(s), and so poison(s) is explicitly checked, such that
+ * 'read' can fetch data from clean page(s) up till the first poison is
+ * encountered, and 'write' requires the range be page aligned in order
+ * to restore the poisoned page's memory type back to "rw" after clearing
+ * the poison(s).
+ * In the event of poison related failure, (size_t) -EIO is returned and
+ * caller may check the return value after casting it to (ssize_t).
+ *
+ * TODO: add support for CPUs that support MOVDIR64B instruction for
+ * faster poison clearing, and possibly smaller error blast radius.
  */
 static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
+	phys_addr_t pmem_off;
+	size_t len, lead_off;
+	struct pmem_device *pmem = dax_get_private(dax_dev);
+	struct device *dev = pmem->bb.dev;
+
+	if (unlikely(mode == DAX_OP_RECOVERY)) {
+		lead_off = (unsigned long)addr & ~PAGE_MASK;
+		len = PFN_PHYS(PFN_UP(lead_off + bytes));
+		if (is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512, len)) {
+			if (lead_off || !(PAGE_ALIGNED(bytes))) {
+				dev_warn(dev, "Found poison, but addr(%p) and/or bytes(%#zx) not page aligned\n",
+					addr, bytes);
+				return (size_t) -EIO;
+			}
+			pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
+			if (pmem_clear_poison(pmem, pmem_off, bytes) !=
+						BLK_STS_OK)
+				return (size_t) -EIO;
+		}
+	}
+
 	return _copy_from_iter_flushcache(addr, bytes, i);
 }
 
 static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
+	int num_bad;
+	size_t len, lead_off;
+	unsigned long bad_pfn;
+	bool bad_pmem = false;
+	size_t adj_len = bytes;
+	sector_t sector, first_bad;
+	struct pmem_device *pmem = dax_get_private(dax_dev);
+	struct device *dev = pmem->bb.dev;
+
+	if (unlikely(mode == DAX_OP_RECOVERY)) {
+		sector = PFN_PHYS(pgoff) / 512;
+		lead_off = (unsigned long)addr & ~PAGE_MASK;
+		len = PFN_PHYS(PFN_UP(lead_off + bytes));
+		if (pmem->bb.count)
+			bad_pmem = !!badblocks_check(&pmem->bb, sector,
+					len / 512, &first_bad, &num_bad);
+		if (bad_pmem) {
+			bad_pfn = PHYS_PFN(first_bad * 512);
+			if (bad_pfn == pgoff) {
+				dev_warn(dev, "Found poison in page: pgoff(%#lx)\n",
+					pgoff);
+				return -EIO;
+			}
+			adj_len = PFN_PHYS(bad_pfn - pgoff) - lead_off;
+			dev_WARN_ONCE(dev, (adj_len > bytes),
+					"out-of-range first_bad?");
+		}
+		if (adj_len == 0)
+			return (size_t) -EIO;
+	}
+
 	return _copy_mc_to_iter(addr, bytes, i);
 }
 
diff --git a/fs/dax.c b/fs/dax.c
index bea6df1498c3..7640be6b6a97 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1219,6 +1219,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		const size_t size = ALIGN(length + offset, PAGE_SIZE);
 		const sector_t sector = dax_iomap_sector(iomap, pos);
+		long nr_page = PHYS_PFN(size);
+		int dax_mode = DAX_OP_NORMAL;
 		ssize_t map_len;
 		pgoff_t pgoff;
 		void *kaddr;
@@ -1232,8 +1234,13 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		if (ret)
 			break;
 
-		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
-					    DAX_OP_NORMAL, &kaddr, NULL);
+		map_len = dax_direct_access(dax_dev, pgoff, nr_page, dax_mode,
+					    &kaddr, NULL);
+		if (unlikely(map_len == -EIO)) {
+			dax_mode = DAX_OP_RECOVERY;
+			map_len = dax_direct_access(dax_dev, pgoff, nr_page,
+						    dax_mode, &kaddr, NULL);
+		}
 		if (map_len < 0) {
 			ret = map_len;
 			break;
@@ -1252,11 +1259,20 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 		 */
 		if (iov_iter_rw(iter) == WRITE)
 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
-					map_len, iter, DAX_OP_NORMAL);
+					map_len, iter, dax_mode);
 		else
 			xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr,
-					map_len, iter, DAX_OP_NORMAL);
+					map_len, iter, dax_mode);
 
+		/*
+		 * If dax data recovery is enacted via DAX_OP_RECOVERY,
+		 * recovery failure would be indicated by a -EIO return
+		 * in 'xfer' cast to (size_t).
+		 */
+		if ((ssize_t)xfer == -EIO) {
+			ret = -EIO;
+			break;
+		}
 		pos += xfer;
 		length -= xfer;
 		done += xfer;
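
For readers following the arithmetic in the recovery paths, here is a
minimal userspace sketch of the page-rounding and error-encoding
conventions the patch relies on. It is not kernel code. The sketch_*
names and the fixed 4 KiB page size are assumptions made for
illustration only; the kernel uses PAGE_MASK, PFN_UP(), PFN_PHYS() and
the real badblocks list instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

/* Assumption: 4 KiB pages. The kernel derives this from PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((size_t)1 << SKETCH_PAGE_SHIFT)

/* Offset of addr inside its page, mirroring (addr & ~PAGE_MASK). */
static size_t sketch_lead_off(uintptr_t addr)
{
	return (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/*
 * Length of the page-aligned span that covers lead_off + bytes,
 * mirroring PFN_PHYS(PFN_UP(lead_off + bytes)).
 */
static size_t sketch_span_len(size_t lead_off, size_t bytes)
{
	return (lead_off + bytes + SKETCH_PAGE_SIZE - 1) &
		~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Bytes readable before the first bad page, mirroring the adj_len
 * computation. Returns (size_t)-EIO when the first page is itself bad.
 */
static size_t sketch_readable_len(size_t pgoff, size_t first_bad_pfn,
				  size_t lead_off)
{
	assert(first_bad_pfn >= pgoff);
	if (first_bad_pfn == pgoff)
		return (size_t)-EIO;
	return ((first_bad_pfn - pgoff) << SKETCH_PAGE_SHIFT) - lead_off;
}

/* Caller-side check: recover the error by casting back to ssize_t. */
static int sketch_is_eio(size_t xfer)
{
	return (ssize_t)xfer == -EIO;
}
```

With these helpers, a read that starts 0x200 bytes into page 10 and
hits its first bad block in page 12 has 2 * 4096 - 0x200 readable
bytes, while a read whose first page is bad yields a value for which
sketch_is_eio() is true. This matches the check that dax_iomap_iter()
performs on 'xfer' in the hunk above.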