From patchwork Sun Nov 8 19:28:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7579471
Return-Path:
X-Original-To: patchwork-linux-nvdimm@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 8BCD2C05C6
	for ; Sun, 8 Nov 2015 19:34:24 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 7D63B20601
	for ; Sun, 8 Nov 2015 19:34:23 +0000 (UTC)
Received: from ml01.01.org (ml01.01.org [198.145.21.10])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 7F2E42058C
	for ; Sun, 8 Nov 2015 19:34:22 +0000 (UTC)
Received: from ml01.vlan14.01.org (localhost [IPv6:::1])
	by ml01.01.org (Postfix) with ESMTP id 727821A1F7F;
	Sun, 8 Nov 2015 11:34:22 -0800 (PST)
X-Original-To: linux-nvdimm@lists.01.org
Delivered-To: linux-nvdimm@lists.01.org
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
	by ml01.01.org (Postfix) with ESMTP id 596171A1F82
	for ; Sun, 8 Nov 2015 11:34:21 -0800 (PST)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
	by fmsmga102.fm.intel.com with ESMTP; 08 Nov 2015 11:34:21 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,263,1444719600"; d="scan'208";a="814804255"
Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.39])
	by orsmga001.jf.intel.com with ESMTP; 08 Nov 2015 11:34:22 -0800
Subject: [PATCH v4 14/14] block, dax: opt-in control for raw block dax support
From: Dan Williams
To: axboe@fb.com
Date: Sun, 08 Nov 2015 14:28:39 -0500
Message-ID: <20151108192838.9104.11328.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <20151108192722.9104.86664.stgit@dwillia2-desk3.amr.corp.intel.com>
References:
 <20151108192722.9104.86664.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.17.1-9-g687f
MIME-Version: 1.0
Cc: jack@suse.cz, linux-nvdimm@lists.01.org, david@fromorbit.com,
	linux-block@vger.kernel.org, hch@lst.de
X-BeenThere: linux-nvdimm@lists.01.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "Linux-nvdimm developer list."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW,
	T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Now that we have the ability to dynamically enable/disable DAX for a raw
block inode, make the default behavior a compile-time decision. DAX does
not yet have feature parity with pagecache backed mappings, and it may
disable statistics that an application depends on, so environments should
knowingly enable DAX semantics.

Note that this does not affect the mmap path for filesystems on top of a
DAX capable block device. They currently open code a check for the
->direct_access() op in the gendisk. That said, DAX support is already
opt-in for filesystems via a mount flag.

Cc: Dave Chinner
[dgc: leave the dax_do_io() path alone, let it honor S_DAX]
Signed-off-by: Dan Williams
---
 block/Kconfig      |   15 +++++++++++++++
 block/ioctl.c      |    2 +-
 fs/block_dev.c     |   16 ++++++++--------
 include/linux/fs.h |    8 ++++++++
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 161491d0a879..6fb05c570332 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -111,6 +111,21 @@ config BLK_CMDLINE_PARSER

 	See Documentation/block/cmdline-partition.txt for more information.

+config BLK_DEV_DAX
+	bool "Block layer DAX support default"
+	depends on FS_DAX
+	help
+	  When DAX support is available (CONFIG_FS_DAX) raw block devices
+	  can also support direct userspace access to the storage capacity
+	  via MMAP(2) similar to a file on a DAX-enabled filesystem.
+	  However, the DAX I/O-path disables some standard I/O-statistics,
+	  and the MMAP(2) path has some functional differences due to
+	  bypassing the page cache. The choice here can be overridden at
+	  run time via the BLKDAXSET ioctl. If you are unsure if the DAX
+	  behavior is compatible with your environment, say N. Otherwise
+	  DAX is a significantly faster way to access persistent memory
+	  from NVDIMM devices.
+
 menu "Partition Types"

 source "block/partitions/Kconfig"
diff --git a/block/ioctl.c b/block/ioctl.c
index 604438f36ddd..a353bcd29987 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -296,7 +296,7 @@ static inline int is_unrecognized_ioctl(int ret)
 }

 #ifdef CONFIG_FS_DAX
-static bool blkdev_dax_capable(struct block_device *bdev)
+bool blkdev_dax_capable(struct block_device *bdev)
 {
 	struct gendisk *disk = bdev->bd_disk;

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 09d10667cc19..43af861463f4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1185,7 +1185,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
-		bdev->bd_inode->i_flags = disk->fops->direct_access ? S_DAX : 0;
+		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
+			bdev->bd_inode->i_flags = S_DAX;
 		if (!partno) {
 			ret = -ENXIO;
 			bdev->bd_part = disk_get_part(disk, partno);
@@ -1212,8 +1213,11 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				}
 			}

-			if (!ret)
+			if (!ret) {
 				bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
+				if (!blkdev_dax_capable(bdev))
+					bdev->bd_inode->i_flags &= ~S_DAX;
+			}

 			/*
 			 * If the device is invalidated, rescan partition
@@ -1227,6 +1231,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				else if (ret == -ENOMEDIUM)
 					invalidate_partitions(disk, bdev);
 			}
+
 			if (ret)
 				goto out_clear;
 		} else {
@@ -1247,12 +1252,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				goto out_clear;
 			}
 			bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
-			/*
-			 * If the partition is not aligned on a page
-			 * boundary, we can't do dax I/O to it.
-			 */
-			if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
-			    (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
+			if (!blkdev_dax_capable(bdev))
 				bdev->bd_inode->i_flags &= ~S_DAX;
 		}
 	} else {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8fb2d4b848bf..5a9e14538f69 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2282,6 +2282,14 @@ extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+#ifdef CONFIG_FS_DAX
+extern bool blkdev_dax_capable(struct block_device *bdev);
+#else
+static inline bool blkdev_dax_capable(struct block_device *bdev)
+{
+	return false;
+}
+#endif

 extern struct super_block *blockdev_superblock;
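
Reviewer note: the fs/block_dev.c hunk above drops the open-coded partition alignment test from __blkdev_get() and calls blkdev_dax_capable() instead. The body of that helper is not part of this patch, so the following is only a user-space sketch of the removed lines, not the kernel implementation. The name partition_is_page_aligned() and the SKETCH_* constants are hypothetical, and a 4096-byte page is assumed where the kernel would use its own PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SECTOR_SIZE 512u

/*
 * Hypothetical helper mirroring the check removed from __blkdev_get():
 * DAX I/O to a partition is only possible when both its starting sector
 * and its length in sectors are multiples of the number of 512-byte
 * sectors that make up one page.
 */
bool partition_is_page_aligned(uint64_t start_sect, uint64_t nr_sects)
{
	const uint64_t sectors_per_page = SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE;

	return (start_sect % sectors_per_page) == 0 &&
	       (nr_sects % sectors_per_page) == 0;
}
```

Under these assumed sizes a page spans 8 sectors, so a partition starting at sector 2048 passes the test while one starting at sector 63 does not. In the patch, a failed check clears S_DAX on the block device inode.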