diff mbox

[v4,04/18] dax: require 'struct page' by default for filesystem dax

Message ID 151407698249.38751.17338746909239708376.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State Accepted
Commit 569d0365f571
Headers show

Commit Message

Dan Williams Dec. 24, 2017, 12:56 a.m. UTC
If a dax buffer from a device that does not map pages is passed to
read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
gdb attempts to examine the contents of a dax buffer from a device that
does not map pages it triggers SIGBUS. If fork(2) is called on a process
with a dax mapping from a device that does not map pages it triggers
SIGBUS. 'struct page' is required otherwise several kernel code paths
break in surprising ways. Disable filesystem-dax on devices that do not
map pages.

In addition to needing pfn_to_page() to be valid we also require devmap
pages.  We need this to detect dax pages in the get_user_pages_fast()
path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
drivers that have not supported get_user_pages() to date we allow them
to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
option which requires ->direct_access() to return pfn_t_special() pfns.
This leaves DAX support in brd disabled and scheduled for removal.

Note that when the initial dax support was being merged a few years back
there was concern that struct page was unsuitable for use with next
generation persistent memory devices. The theoretical concern was that
struct page access, being such a hotly used data structure in the
kernel, would lead to media wear out. While that was a reasonable
conservative starting position it has not held true in practice. We have
long since committed to using devm_memremap_pages() to support higher
order kernel functionality that needs get_user_pages() and
pfn_to_page().

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/platforms/Kconfig |    1 +
 drivers/dax/super.c            |   10 ++++++++++
 drivers/s390/block/Kconfig     |    1 +
 fs/Kconfig                     |    7 +++++++
 4 files changed, 19 insertions(+)

Comments

Jan Kara Jan. 3, 2018, 3:29 p.m. UTC | #1
On Sat 23-12-17 16:56:22, Dan Williams wrote:
> If a dax buffer from a device that does not map pages is passed to
> read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
> gdb attempts to examine the contents of a dax buffer from a device that
> does not map pages it triggers SIGBUS. If fork(2) is called on a process
> with a dax mapping from a device that does not map pages it triggers
> SIGBUS. 'struct page' is required otherwise several kernel code paths
> break in surprising ways. Disable filesystem-dax on devices that do not
> map pages.
> 
> In addition to needing pfn_to_page() to be valid we also require devmap
> pages.  We need this to detect dax pages in the get_user_pages_fast()
> path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
> drivers that have not supported get_user_pages() to date we allow them
> to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
> option which requires ->direct_access() to return pfn_t_special() pfns.
> This leaves DAX support in brd disabled and scheduled for removal.
> 
> Note that when the initial dax support was being merged a few years back
> there was concern that struct page was unsuitable for use with next
> generation persistent memory devices. The theoretical concern was that
> struct page access, being such a hotly used data structure in the
> kernel, would lead to media wear out. While that was a reasonable
> conservative starting position it has not held true in practice. We have
> long since committed to using devm_memremap_pages() to support higher
> order kernel functionality that needs get_user_pages() and
> pfn_to_page().
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  arch/powerpc/platforms/Kconfig |    1 +
>  drivers/dax/super.c            |   10 ++++++++++
>  drivers/s390/block/Kconfig     |    1 +
>  fs/Kconfig                     |    7 +++++++
>  4 files changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index 5a96a2763e4a..2ce89b42a9f4 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -297,6 +297,7 @@ config AXON_RAM
>  	tristate "Axon DDR2 memory device driver"
>  	depends on PPC_IBM_CELL_BLADE && BLOCK
>  	select DAX
> +	select FS_DAX_LIMITED
>  	default m
>  	help
>  	  It registers one block device per Axon's DDR2 memory bank found
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 3ec804672601..473af694ad1c 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -15,6 +15,7 @@
>  #include <linux/mount.h>
>  #include <linux/magic.h>
>  #include <linux/genhd.h>
> +#include <linux/pfn_t.h>
>  #include <linux/cdev.h>
>  #include <linux/hash.h>
>  #include <linux/slab.h>
> @@ -123,6 +124,15 @@ int __bdev_dax_supported(struct super_block *sb, int blocksize)
>  		return len < 0 ? len : -EIO;
>  	}
>  
> +	if ((IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn))
> +			|| pfn_t_devmap(pfn))
> +		/* pass */;
> +	else {
> +		pr_debug("VFS (%s): error: dax support not enabled\n",
> +				sb->s_id);
> +		return -EOPNOTSUPP;
> +	}
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(__bdev_dax_supported);
> diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
> index 31f014b57bfc..594ae5fc8e9d 100644
> --- a/drivers/s390/block/Kconfig
> +++ b/drivers/s390/block/Kconfig
> @@ -15,6 +15,7 @@ config BLK_DEV_XPRAM
>  config DCSSBLK
>  	def_tristate m
>  	select DAX
> +	select FS_DAX_LIMITED
>  	prompt "DCSSBLK support"
>  	depends on S390 && BLOCK
>  	help
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7aee6d699fd6..b40128bf6d1a 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -58,6 +58,13 @@ config FS_DAX_PMD
>  	depends on ZONE_DEVICE
>  	depends on TRANSPARENT_HUGEPAGE
>  
> +# Selected by DAX drivers that do not expect filesystem DAX to support
> +# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
> +# for fork() of processes with MAP_SHARED mappings or support for
> +# direct-I/O to a DAX mapping.
> +config FS_DAX_LIMITED
> +	bool
> +
>  endif # BLOCK
>  
>  # Posix ACL utility routines
>
Christoph Hellwig Jan. 4, 2018, 8:16 a.m. UTC | #2
Fine with me:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Gerald Schaefer Jan. 8, 2018, 11:58 a.m. UTC | #3
On Sat, 23 Dec 2017 16:56:22 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> If a dax buffer from a device that does not map pages is passed to
> read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
> gdb attempts to examine the contents of a dax buffer from a device that
> does not map pages it triggers SIGBUS. If fork(2) is called on a process
> with a dax mapping from a device that does not map pages it triggers
> SIGBUS. 'struct page' is required otherwise several kernel code paths
> break in surprising ways. Disable filesystem-dax on devices that do not
> map pages.
> 
> In addition to needing pfn_to_page() to be valid we also require devmap
> pages.  We need this to detect dax pages in the get_user_pages_fast()
> path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
> drivers that have not supported get_user_pages() to date we allow them
> to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
> option which requires ->direct_access() to return pfn_t_special() pfns.
> This leaves DAX support in brd disabled and scheduled for removal.
> 
> Note that when the initial dax support was being merged a few years back
> there was concern that struct page was unsuitable for use with next
> generation persistent memory devices. The theoretical concern was that
> struct page access, being such a hotly used data structure in the
> kernel, would lead to media wear out. While that was a reasonable
> conservative starting position it has not held true in practice. We have
> long since committed to using devm_memremap_pages() to support higher
> order kernel functionality that needs get_user_pages() and
> pfn_to_page().
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/powerpc/platforms/Kconfig |    1 +
>  drivers/dax/super.c            |   10 ++++++++++
>  drivers/s390/block/Kconfig     |    1 +
>  fs/Kconfig                     |    7 +++++++
>  4 files changed, 19 insertions(+)

dcssblk seems to work fine, I did not see any SIGBUS while "executing
in place" from dcssblk with the current upstream kernel, maybe because
we only use dcssblk with fs dax in read-only mode.

Anyway, the dcssblk change is fine with me. I will look into adding
struct pages for dcssblk memory later, to make it work again with
this change, but for now I do not know of anyone needing this in the
upstream kernel.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
diff mbox

Patch

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 5a96a2763e4a..2ce89b42a9f4 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -297,6 +297,7 @@  config AXON_RAM
 	tristate "Axon DDR2 memory device driver"
 	depends on PPC_IBM_CELL_BLADE && BLOCK
 	select DAX
+	select FS_DAX_LIMITED
 	default m
 	help
 	  It registers one block device per Axon's DDR2 memory bank found
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 3ec804672601..473af694ad1c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -15,6 +15,7 @@ 
 #include <linux/mount.h>
 #include <linux/magic.h>
 #include <linux/genhd.h>
+#include <linux/pfn_t.h>
 #include <linux/cdev.h>
 #include <linux/hash.h>
 #include <linux/slab.h>
@@ -123,6 +124,15 @@  int __bdev_dax_supported(struct super_block *sb, int blocksize)
 		return len < 0 ? len : -EIO;
 	}
 
+	if ((IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn))
+			|| pfn_t_devmap(pfn))
+		/* pass */;
+	else {
+		pr_debug("VFS (%s): error: dax support not enabled\n",
+				sb->s_id);
+		return -EOPNOTSUPP;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__bdev_dax_supported);
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index 31f014b57bfc..594ae5fc8e9d 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -15,6 +15,7 @@  config BLK_DEV_XPRAM
 config DCSSBLK
 	def_tristate m
 	select DAX
+	select FS_DAX_LIMITED
 	prompt "DCSSBLK support"
 	depends on S390 && BLOCK
 	help
diff --git a/fs/Kconfig b/fs/Kconfig
index 7aee6d699fd6..b40128bf6d1a 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -58,6 +58,13 @@  config FS_DAX_PMD
 	depends on ZONE_DEVICE
 	depends on TRANSPARENT_HUGEPAGE
 
+# Selected by DAX drivers that do not expect filesystem DAX to support
+# get_user_pages() of DAX mappings. I.e. "limited" indicates no support
+# for fork() of processes with MAP_SHARED mappings or support for
+# direct-I/O to a DAX mapping.
+config FS_DAX_LIMITED
+	bool
+
 endif # BLOCK
 
 # Posix ACL utility routines