From patchwork Sun Nov 8 19:28:39 2015
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7579461
Subject: [PATCH v4 14/14] block, dax: opt-in control for raw block dax support
From: Dan Williams
To: axboe@fb.com
Cc: jack@suse.cz, linux-nvdimm@lists.01.org, david@fromorbit.com, linux-block@vger.kernel.org, ross.zwisler@linux.intel.com, hch@lst.de
Date: Sun, 08 Nov 2015 14:28:39 -0500
Message-ID: <20151108192838.9104.11328.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <20151108192722.9104.86664.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <20151108192722.9104.86664.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.17.1-9-g687f
X-Mailing-List: linux-block@vger.kernel.org

Now that we have the ability to dynamically enable/disable DAX for a raw
block inode, make the default behavior a compile time decision.  DAX
does not yet have feature parity with pagecache backed mappings, and it
may disable statistics that an application depends on, so environments
should knowingly enable DAX semantics.

Note, that this does not affect the mmap path for filesystems on top of
a DAX capable block device.  They currently open code a check for the
->direct_access() op in the gendisk.  That said, DAX support is already
opt-in for filesystems via a mount flag.

Cc: Dave Chinner
[dgc: leave the dax_do_io() path alone, let it honor S_DAX]
Signed-off-by: Dan Williams
---
 block/Kconfig      |   15 +++++++++++++++
 block/ioctl.c      |    2 +-
 fs/block_dev.c     |   16 ++++++++--------
 include/linux/fs.h |    8 ++++++++
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 161491d0a879..6fb05c570332 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -111,6 +111,21 @@ config BLK_CMDLINE_PARSER
 
 	See Documentation/block/cmdline-partition.txt for more information.
 
+config BLK_DEV_DAX
+	bool "Block layer DAX support default"
+	depends on FS_DAX
+	help
+	  When DAX support is available (CONFIG_FS_DAX) raw block devices
+	  can also support direct userspace access to the storage capacity
+	  via MMAP(2) similar to a file on a DAX-enabled filesystem.
+	  However, the DAX I/O-path disables some standard I/O-statistics,
+	  and the MMAP(2) path has some functional differences due to
+	  bypassing the page cache.  The choice here can be overridden at
+	  run time via the BLKDAXSET ioctl. If you are unsure if the DAX
+	  behavior is compatible with your environment, say N.  Otherwise
+	  DAX is a significantly faster way to access persistent memory
+	  from NVDIMM devices.
+
 menu "Partition Types"
 
 source "block/partitions/Kconfig"
diff --git a/block/ioctl.c b/block/ioctl.c
index 604438f36ddd..a353bcd29987 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -296,7 +296,7 @@ static inline int is_unrecognized_ioctl(int ret)
 }
 
 #ifdef CONFIG_FS_DAX
-static bool blkdev_dax_capable(struct block_device *bdev)
+bool blkdev_dax_capable(struct block_device *bdev)
 {
 	struct gendisk *disk = bdev->bd_disk;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 09d10667cc19..43af861463f4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1185,7 +1185,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
-		bdev->bd_inode->i_flags = disk->fops->direct_access ? S_DAX : 0;
+		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
+			bdev->bd_inode->i_flags = S_DAX;
 		if (!partno) {
 			ret = -ENXIO;
 			bdev->bd_part = disk_get_part(disk, partno);
@@ -1212,8 +1213,11 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				}
 			}
 
-			if (!ret)
+			if (!ret) {
 				bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
+				if (!blkdev_dax_capable(bdev))
+					bdev->bd_inode->i_flags &= ~S_DAX;
+			}
 
 			/*
 			 * If the device is invalidated, rescan partition
@@ -1227,6 +1231,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				else if (ret == -ENOMEDIUM)
 					invalidate_partitions(disk, bdev);
 			}
+
 			if (ret)
 				goto out_clear;
 		} else {
@@ -1247,12 +1252,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				goto out_clear;
 			}
 			bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
-			/*
-			 * If the partition is not aligned on a page
-			 * boundary, we can't do dax I/O to it.
-			 */
-			if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
-			    (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
+			if (!blkdev_dax_capable(bdev))
 				bdev->bd_inode->i_flags &= ~S_DAX;
 		}
 	} else {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8fb2d4b848bf..5a9e14538f69 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2282,6 +2282,14 @@ extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+#ifdef CONFIG_FS_DAX
+extern bool blkdev_dax_capable(struct block_device *bdev);
+#else
+static inline bool blkdev_dax_capable(struct block_device *bdev)
+{
+	return false;
+}
+#endif
 
 extern struct super_block *blockdev_superblock;
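
Illustration only, not part of the patch: the open-coded check removed from the partition branch of __blkdev_get() clears S_DAX when a partition's start or length is not a whole number of pages. A minimal userspace sketch of that arithmetic follows. It assumes a 4096-byte page and 512-byte sectors, and sketch_partition_page_aligned() is a hypothetical name, not a kernel function. The body of blkdev_dax_capable() is not shown in this patch, so the sketch mirrors only the removed condition and makes no claim about what else that helper checks.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values for illustration. The kernel code uses PAGE_SIZE for
 * the running architecture and a literal 512 for the sector size.
 */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SECTOR_SIZE 512u

/*
 * Hypothetical helper: returns true when both the start and the length
 * of a partition, given in 512-byte sectors, are multiples of one page.
 * This is the inverse of the condition under which the removed code
 * cleared S_DAX.
 */
static inline bool sketch_partition_page_aligned(uint64_t start_sect,
						 uint64_t nr_sects)
{
	const uint64_t sectors_per_page = SKETCH_PAGE_SIZE / SKETCH_SECTOR_SIZE;

	return (start_sect % sectors_per_page) == 0 &&
	       (nr_sects % sectors_per_page) == 0;
}
```

Under these assumptions a partition starting at sector 2048 with 8192 sectors is page aligned, while one starting at sector 2049, or one with 8191 sectors, is not and would have S_DAX cleared by the removed check.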