[V7,9/9] Documentation/dax: Update Usage section

Message ID: 20200413054046.1560106-10-ira.weiny@intel.com
State: Superseded, archived
Series: Enable per-file/per-directory DAX operations V7

Commit Message

Ira Weiny April 13, 2020, 5:40 a.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
 1 file changed, 163 insertions(+), 3 deletions(-)

Comments

Darrick J. Wong April 13, 2020, 4:19 p.m. UTC | #1
On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.

Yum. :)

> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V6:
> 	Update to allow setting FS_XFLAG_DAX any time.
> 	Update with list of behaviors from Darrick
> 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> 
> Changes from V5:
> 	Update to reflect the agreed upon semantics
> 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
>  1 file changed, 163 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..af14c1b330a9 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>  
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> +different at this time.

I thought ext2 supports DAX?

> +Enabling DAX on ext4
> +--------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel access mode flag S_DAX that is set when
> +    file accesses go directly to persistent memory, bypassing the page
> +    cache.  Applications must call statx to discover the current S_DAX
> +    state (STATX_ATTR_DAX).
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> +    creation time.  This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> +    and the fs is on pmem then it will enable S_DAX at inode load time;
> +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX (at least on pmem),
> +                    and ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is an alias for "dax=always".
> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> +    be set or cleared at any time.  The flag state is inherited by any files or
> +    subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> +    can do one of the following:
> +
> +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> +        needed; or
> +
> +    (b) Have the administrator set an override via mount option; or
> +
> +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> +        must then cause the kernel to evict the inode from memory.  This
> +        can be done by:
> +
> +        i>  Closing the file and re-opening the file and using statx to
> +            see if the fs has changed the S_DAX flag; and
> +
> +        ii> If the file still does not have the desired S_DAX access
> +            mode, either unmount and remount the filesystem, or close
> +            the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> +    out of the particular rough and tumble bits of their storage will also be
> +    exposed to the difficulties of what happens when the operating system can't
> +    totally virtualize those hardware capabilities.  DAX is such a feature.
> +    Basically, Formula-1 cars require a bit more care and feeding than your
> +    averaged Toyota minivan, as it were.

I think we can omit this last sentence for the formal documentation...
:)

> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> +within the file system.  This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +	dax: a,e
> +	no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +	dax: a,b,c,d
> +	no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> +	dax: c,d
> +	no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting.  See below.
> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> +set and therefore statx will always return false on directories.

"statx will never indicate that S_DAX is set on directories."

> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option.  Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> +        "-o dax" by itself means "dax=always" to remain compatible with older
> +	         kernels
> +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'.  Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> +	S_DAX = FS_XFLAG_DAX;
> +
> +	if (dax_mount == "always")
> +		S_DAX = true;
> +	else if (dax_mount == "off"
> +		S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override.  However, file enabled
> +state, S_DAX, will continue to be the overridden until the file system is
> +remounted with dax=inode.

"However, in-core inode state (S_DAX) will continue to be overridden
until the filesystem is remounted with dax=inode and the inode is
evicted."

...since we don't currently evict inodes just because a remount occurred.
:)

--D

>  
>  
>  Implementation Tips for Block Driver Writers
> -- 
> 2.25.1
>
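
For readers following the thread, the S_DAX selection rule in the quoted
"Overriding FS_XFLAG_DAX" section can be modelled as a small C function.
This is only an illustrative sketch, not kernel code: the enum, the function
name and the media_ok parameter are hypothetical.  The quoted pseudocode
tests for "off", while the option list names the value "never"; the sketch
assumes "never" is what was meant.  It models regular files only, since the
patch states that directories never have S_DAX set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three states of the dax= mount option. */
enum dax_mount_mode {
	DAX_MOUNT_INODE,	/* "-o dax=inode": follow FS_XFLAG_DAX (default) */
	DAX_MOUNT_ALWAYS,	/* "-o dax=always", or a bare "-o dax" */
	DAX_MOUNT_NEVER,	/* "-o dax=never" */
};

/*
 * Model of the S_DAX state a file would get when its inode is loaded.
 * media_ok stands in for "the underlying device supports DAX".
 */
bool effective_s_dax(bool media_ok, bool fs_xflag_dax,
		     enum dax_mount_mode mode)
{
	assert(mode == DAX_MOUNT_INODE || mode == DAX_MOUNT_ALWAYS ||
	       mode == DAX_MOUNT_NEVER);

	if (!media_ok)
		return false;	/* no DAX without capable media */
	if (mode == DAX_MOUNT_ALWAYS)
		return true;	/* override: FS_XFLAG_DAX is ignored */
	if (mode == DAX_MOUNT_NEVER)
		return false;	/* override: FS_XFLAG_DAX is ignored */
	return fs_xflag_dax;	/* dax=inode: follow the on-disk flag */
}
```

For example, effective_s_dax(true, false, DAX_MOUNT_ALWAYS) yields true, and
effective_s_dax(true, true, DAX_MOUNT_NEVER) yields false.  In both cases the
on-disk flag is left untouched and only the in-core state is overridden.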
Ira Weiny April 14, 2020, 4:38 a.m. UTC | #2
On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> 
> Yum. :)
> 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from V6:
> > 	Update to allow setting FS_XFLAG_DAX any time.
> > 	Update with list of behaviors from Darrick
> > 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > 
> > Changes from V5:
> > 	Update to reflect the agreed upon semantics
> > 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> >  1 file changed, 163 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..af14c1b330a9 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >  
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > +different at this time.
> 
> I thought ext2 supports DAX?

Not that I know of?  Does it?

> 
> > +Enabling DAX on ext4
> > +--------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > +    file accesses go directly to persistent memory, bypassing the page
> > +    cache.  Applications must call statx to discover the current S_DAX
> > +    state (STATX_ATTR_DAX).
> > +
> > + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> > +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> > +    creation time.  This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> > +
> > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > +
> > + 3. There exists a dax= mount option.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX (at least on pmem),
> > +                    and ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is an alias for "dax=always".
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> > +    be set or cleared at any time.  The flag state is inherited by any files or
> > +    subdirectories when they are created within that directory.
> > +
> > + 5. Programs that require a specific file access mode (DAX or not DAX)
> > +    can do one of the following:
> > +
> > +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> > +        needed; or
> > +
> > +    (b) Have the administrator set an override via mount option; or
> > +
> > +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> > +        must then cause the kernel to evict the inode from memory.  This
> > +        can be done by:
> > +
> > +        i>  Closing the file and re-opening the file and using statx to
> > +            see if the fs has changed the S_DAX flag; and
> > +
> > +        ii> If the file still does not have the desired S_DAX access
> > +            mode, either unmount and remount the filesystem, or close
> > +            the file and use drop_caches.
> > +
> > + 6. It is expected that users who want to squeeze every last bit of performance
> > +    out of the particular rough and tumble bits of their storage will also be
> > +    exposed to the difficulties of what happens when the operating system can't
> > +    totally virtualize those hardware capabilities.  DAX is such a feature.
> > +    Basically, Formula-1 cars require a bit more care and feeding than your
> > +    averaged Toyota minivan, as it were.
> 
> I think we can omit this last sentence for the formal documentation...

Done.

> :)
> 
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> > +and the other a currently enabled state (S_DAX).
> > +
> > +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> > +within the file system.  This 'physical' config setting can be set using an
> > +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> > +directories automatically inherit FS_XFLAG_DAX from their parent directory
> > +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> > +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> > +root directory acts to set a default for the entire file system.)
> > +
> > +To clarify inheritance here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +	dax: a,e
> > +	no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +	dax: a,b,c,d
> > +	no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > +	dax: c,d
> > +	no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> > +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> > +dax mount option setting.  See below.
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > +set and therefore statx will always return false on directories.
> 
> "statx will never indicate that S_DAX is set on directories."

Done.

> 
> > +
> > +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> > +even if the underlying media does not support dax and/or the file system is
> > +overridden with a mount option.
> > +
> > +
> > +Overriding FS_XFLAG_DAX (dax= mount option)
> > +-------------------------------------------
> > +
> > +There exists a dax mount option.  Using the mount option does not change the
> > +physical configured state of individual files but overrides the S_DAX operating
> > +state when inodes are loaded.
> > +
> > +Given underlying media support, the dax mount option is a tri-state option
> > +(never, always, inode) with the following meanings:
> > +
> > +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> > +        "-o dax" by itself means "dax=always" to remain compatible with older
> > +	         kernels
> > +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> > +
> > +The default state is 'inode'.  Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > +	S_DAX = FS_XFLAG_DAX;
> > +
> > +	if (dax_mount == "always")
> > +		S_DAX = true;
> > +	else if (dax_mount == "off"
> > +		S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override.  However, file enabled
> > +state, S_DAX, will continue to be the overridden until the file system is
> > +remounted with dax=inode.
> 
> "However, in-core inode state (S_DAX) will continue to be overridden
> until the filesystem is remounted with dax=inode and the inode is
> evicted."
> 
> ...since we don't currently evict inodes just because a remount occurred.
> :)

Done

Thanks again for the review!  :-D

Ira

> 
> --D
> 
> >  
> >  
> >  Implementation Tips for Block Driver Writers
> > -- 
> > 2.25.1
> >
Dan Williams April 14, 2020, 5:12 a.m. UTC | #3
On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > Update the Usage section to reflect the new individual dax selection
> > > functionality.
> >
> > Yum. :)
> >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >
> > > ---
> > > Changes from V6:
> > >     Update to allow setting FS_XFLAG_DAX any time.
> > >     Update with list of behaviors from Darrick
> > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > >
> > > Changes from V5:
> > >     Update to reflect the agreed upon semantics
> > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > ---
> > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > index 679729442fd2..af14c1b330a9 100644
> > > --- a/Documentation/filesystems/dax.txt
> > > +++ b/Documentation/filesystems/dax.txt
> > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > >  Usage
> > >  -----
> > >
> > > -If you have a block device which supports DAX, you can make a filesystem
> > > +If you have a block device which supports DAX, you can make a file system
> > >  on it as usual.  The DAX code currently only supports files with a block
> > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > +size when creating the file system.
> > > +
> > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > +different at this time.
> >
> > I thought ext2 supports DAX?
>
> Not that I know of?  Does it?

Yes. Seemed like a good idea at the time, but in retrospect...

In fairness I believe this was also an olive branch to XIP users that
were transitioned to DAX, so they did not also need to transition
filesystems.
Dan Williams April 14, 2020, 5:21 a.m. UTC | #4
On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
> ---
> Changes from V6:
>         Update to allow setting FS_XFLAG_DAX any time.
>         Update with list of behaviors from Darrick
>         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
>         Update to reflect the agreed upon semantics
>         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
>  1 file changed, 163 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..af14c1b330a9 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> +different at this time.
> +
> +Enabling DAX on ext4
> +--------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel access mode flag S_DAX that is set when
> +    file accesses go directly to persistent memory, bypassing the page
> +    cache.

I had reserved some quibbling with this wording, but now that this is
being proposed as documentation I'll let my quibbling fly. "dax" may
imply, but does not require persistent memory nor does it necessarily
"bypass page cache". For example on configurations that support dax,
but turn off MAP_SYNC (like virtio-pmem), a software flush is
required. Instead, if we're going to define "dax" here I'd prefer it
be a #include of the man page definition that is careful (IIRC) to
only talk about semantics and not backend implementation details. In
other words, dax is to page-cache as direct-io is to page cache,
effectively not there, but dig a bit deeper and you may find it.

> Applications must call statx to discover the current S_DAX
> +    state (STATX_ATTR_DAX).
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> +    creation time.  This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> +    and the fs is on pmem then it will enable S_DAX at inode load time;
> +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX (at least on pmem),
> +                    and ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is an alias for "dax=always".
> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> +    be set or cleared at any time.  The flag state is inherited by any files or
> +    subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> +    can do one of the following:
> +
> +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> +        needed; or
> +
> +    (b) Have the administrator set an override via mount option; or
> +
> +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> +        must then cause the kernel to evict the inode from memory.  This
> +        can be done by:
> +
> +        i>  Closing the file and re-opening the file and using statx to
> +            see if the fs has changed the S_DAX flag; and
> +
> +        ii> If the file still does not have the desired S_DAX access
> +            mode, either unmount and remount the filesystem, or close
> +            the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> +    out of the particular rough and tumble bits of their storage will also be
> +    exposed to the difficulties of what happens when the operating system can't
> +    totally virtualize those hardware capabilities.  DAX is such a feature.
> +    Basically, Formula-1 cars require a bit more care and feeding than your
> +    averaged Toyota minivan, as it were.
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> +within the file system.  This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +       dax: a,e
> +       no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +       dax: a,b,c,d
> +       no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> +       dax: c,d
> +       no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting.  See below.
> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> +set and therefore statx will always return false on directories.
> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option.  Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> +        "-o dax" by itself means "dax=always" to remain compatible with older
> +                kernels
> +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'.  Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> +       S_DAX = FS_XFLAG_DAX;
> +
> +       if (dax_mount == "always")
> +               S_DAX = true;
> +       else if (dax_mount == "off"
> +               S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override.  However, file enabled
> +state, S_DAX, will continue to be the overridden until the file system is
> +remounted with dax=inode.
>
>
>  Implementation Tips for Block Driver Writers
> --
> 2.25.1
>
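
Item 1 of the quoted Summary says applications must call statx to discover
the current S_DAX state (STATX_ATTR_DAX).  A minimal sketch of such a check,
assuming the glibc statx() wrapper is available, could look like this.  The
function name is hypothetical, and the fallback define is an assumption for
older headers that do not yet provide STATX_ATTR_DAX.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for statx() and struct statx in <sys/stat.h> */
#endif
#include <assert.h>
#include <fcntl.h>	/* AT_FDCWD */
#include <stddef.h>
#include <sys/stat.h>

#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000	/* assumed value, as in linux/stat.h */
#endif

/*
 * Report whether a file is currently in DAX state.
 * Returns 1 if STATX_ATTR_DAX is set, 0 if it is not, -1 on error.
 */
int file_is_dax(const char *path)
{
	struct statx stx;

	assert(path != NULL);

	if (statx(AT_FDCWD, path, 0, STATX_TYPE, &stx) != 0)
		return -1;

	return (stx.stx_attributes & STATX_ATTR_DAX) ? 1 : 0;
}
```

A program that changes FS_XFLAG_DAX would close the file, re-open it, and
call a check like this again, as described in item 5(c) of the Summary.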
Darrick J. Wong April 14, 2020, 4:15 p.m. UTC | #5
On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> > ---
> > Changes from V6:
> >         Update to allow setting FS_XFLAG_DAX any time.
> >         Update with list of behaviors from Darrick
> >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> >
> > Changes from V5:
> >         Update to reflect the agreed upon semantics
> >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> >  1 file changed, 163 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..af14c1b330a9 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > +different at this time.
> > +
> > +Enabling DAX on ext4
> > +--------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > +    file accesses go directly to persistent memory, bypassing the page
> > +    cache.
> 
> I had reserved some quibbling with this wording, but now that this is
> being proposed as documentation I'll let my quibbling fly. "dax" may
> imply, but does not require persistent memory nor does it necessarily
> "bypass page cache". For example on configurations that support dax,
> but turn off MAP_SYNC (like virtio-pmem), a software flush is
> required. Instead, if we're going to define "dax" here I'd prefer it
> be a #include of the man page definition that is careful (IIRC) to
> only talk about semantics and not backend implementation details. In
> other words, dax is to page-cache as direct-io is to page cache,
> effectively not there, but dig a bit deeper and you may find it.

Uh, which manpage?  Are you talking about the MAP_SYNC documentation?

I don't mind rewording this to say "There exists an in-kernel access mode
flag S_DAX that, when set, enables MAP_SYNC semantics.  Refer to mmap(2)
for more details about what that means."

--D

> > Applications must call statx to discover the current S_DAX
> > +    state (STATX_ATTR_DAX).
> > +
> > + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> > +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> > +    creation time.  This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> > +
> > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > +
> > + 3. There exists a dax= mount option.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX (at least on pmem),
> > +                    and ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is an alias for "dax=always".
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> > +    be set or cleared at any time.  The flag state is inherited by any files or
> > +    subdirectories when they are created within that directory.
> > +
> > + 5. Programs that require a specific file access mode (DAX or not DAX)
> > +    can do one of the following:
> > +
> > +    (a) Create files in directories that have the FS_XFLAG_DAX flag set as
> > +        needed; or
> > +
> > +    (b) Have the administrator set an override via mount option; or
> > +
> > +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> > +        must then cause the kernel to evict the inode from memory.  This
> > +        can be done by:
> > +
> > +        i>  Closing the file and re-opening the file and using statx to
> > +            see if the fs has changed the S_DAX flag; and
> > +
> > +        ii> If the file still does not have the desired S_DAX access
> > +            mode, either unmount and remount the filesystem, or close
> > +            the file and use drop_caches.
> > +
> > + 6. It is expected that users who want to squeeze every last bit of performance
> > +    out of the particular rough and tumble bits of their storage will also be
> > +    exposed to the difficulties of what happens when the operating system can't
> > +    totally virtualize those hardware capabilities.  DAX is such a feature.
> > +    Basically, Formula-1 cars require a bit more care and feeding than your
> > +    average Toyota minivan, as it were.
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> > +and the other a currently enabled state (S_DAX).
> > +
> > +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> > +within the file system.  This 'physical' config setting can be set using an
> > +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> > +directories automatically inherit FS_XFLAG_DAX from their parent directory
> > +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> > +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> > +root directory acts to set a default for the entire file system.)
> > +
> > +To clarify inheritance here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +       dax: a,e
> > +       no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io -c 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +       dax: a,b,c,d
> > +       no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a/b/c
> > +mkdir a/b/c/d
> > +
> > +       dax: c,d
> > +       no dax: a,b
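
For anyone wanting to do the equivalent of the chattr examples from a program, here is a rough sketch in C.  It assumes the generic FS_IOC_FSGETXATTR / FS_IOC_FSSETXATTR ioctls and struct fsxattr from <linux/fs.h> are the interface in use; the helper names get_dax_hint() and set_dax_hint() are made up for illustration and are not part of the patch.

```c
#include <fcntl.h>
#include <linux/fs.h>     /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_DAX */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if FS_XFLAG_DAX is set on the inode, 0 if clear,
 * -1 on any error (including a file system without fsxattr support). */
int get_dax_hint(const char *path)
{
	struct fsxattr fsx;
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	close(fd);
	if (ret < 0)
		return -1;
	return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}

/* Hypothetical helper: read-modify-write the xflags so that only the DAX
 * bit changes.  Returns 0 on success, -1 on error.  This changes only the
 * on-disk hint; S_DAX is re-evaluated when the inode is next loaded. */
int set_dax_hint(const char *path, int enable)
{
	struct fsxattr fsx;
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	if (ret == 0) {
		if (enable)
			fsx.fsx_xflags |= FS_XFLAG_DAX;
		else
			fsx.fsx_xflags &= ~FS_XFLAG_DAX;
		ret = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
	}
	close(fd);
	return ret < 0 ? -1 : 0;
}
```

Calling set_dax_hint("a", 1) before creating anything under a/ would correspond to Example B; as with chattr, existing children are not touched.
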
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> > +the underlying media support, the value of FS_XFLAG_DAX, and the file system's
> > +dax mount option setting.  See below.
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > +set and therefore statx will always return false on directories.
> > +
> > +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> > +even if the underlying media does not support dax and/or the file system is
> > +overridden with a mount option.
> > +
> > +
> > +Overriding FS_XFLAG_DAX (dax= mount option)
> > +-------------------------------------------
> > +
> > +There exists a dax mount option.  Using the mount option does not change the
> > +physical configured state of individual files but overrides the S_DAX operating
> > +state when inodes are loaded.
> > +
> > +Given underlying media support, the dax mount option is a tri-state option
> > +(never, always, inode) with the following meanings:
> > +
> > +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> > +        "-o dax" by itself means "dax=always" to remain compatible with older
> > +                kernels
> > +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> > +
> > +The default state is 'inode'.  Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > +       S_DAX = FS_XFLAG_DAX;
> > +
> > +       if (dax_mount == "always")
> > +               S_DAX = true;
> > +       else if (dax_mount == "never")
> > +               S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override.  However, file enabled
> > +state, S_DAX, will continue to be overridden until the file system is
> > +remounted with dax=inode.
> >
> >
> >  Implementation Tips for Block Driver Writers
> > --
> > 2.25.1
> >
Dan Williams April 14, 2020, 7:04 p.m. UTC | #6
On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > Update the Usage section to reflect the new individual dax selection
> > > functionality.
> > >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >
> > > ---
> > > Changes from V6:
> > >         Update to allow setting FS_XFLAG_DAX any time.
> > >         Update with list of behaviors from Darrick
> > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > >
> > > Changes from V5:
> > >         Update to reflect the agreed upon semantics
> > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > ---
> > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > index 679729442fd2..af14c1b330a9 100644
> > > --- a/Documentation/filesystems/dax.txt
> > > +++ b/Documentation/filesystems/dax.txt
> > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > >  Usage
> > >  -----
> > >
> > > -If you have a block device which supports DAX, you can make a filesystem
> > > +If you have a block device which supports DAX, you can make a file system
> > >  on it as usual.  The DAX code currently only supports files with a block
> > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > +size when creating the file system.
> > > +
> > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > +different at this time.
> > > +
> > > +Enabling DAX on ext4
> > > +--------------------
> > > +
> > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > +add 'dax' to the options in /etc/fstab.
> > > +
> > > +
> > > +Enabling DAX on xfs
> > > +-------------------
> > > +
> > > +Summary
> > > +-------
> > > +
> > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > +    file accesses go directly to persistent memory, bypassing the page
> > > +    cache.
> >
> > I had reserved some quibbling with this wording, but now that this is
> > being proposed as documentation I'll let my quibbling fly. "dax" may
> > imply, but does not require persistent memory nor does it necessarily
> > "bypass page cache". For example on configurations that support dax,
> > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > required. Instead, if we're going to define "dax" here I'd prefer it
> > be a #include of the man page definition that is careful (IIRC) to
> > only talk about semantics and not backend implementation details. In
> > other words, dax is to page-cache as direct-io is to page cache,
> > effectively not there, but dig a bit deeper and you may find it.
>
> Uh, which manpage?  Are you talking about the MAP_SYNC documentation?

No, I was referring to the proposed wording for STATX_ATTR_DAX.
There's no reason for this description to say anything divergent from
that description.
Ira Weiny April 14, 2020, 7:48 p.m. UTC | #7
On Mon, Apr 13, 2020 at 10:12:22PM -0700, Dan Williams wrote:
> On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > >
> > > Yum. :)
> > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >     Update to allow setting FS_XFLAG_DAX any time.
> > > >     Update with list of behaviors from Darrick
> > > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >     Update to reflect the agreed upon semantics
> > > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > >
> > > I thought ext2 supports DAX?
> >
> > Not that I know of?  Does it?
> 
> Yes. Seemed like a good idea at the time, but in retrospect...

Ah ok...   Is there an objection to leaving ext2 as a global mount option?
Updating the doc is easy enough.

Ira

> 
> In fairness I believe this was also an olive branch to XIP users that
> were transitioned to DAX, so they did not also need to transition
> filesystems.
Darrick J. Wong April 14, 2020, 7:57 p.m. UTC | #8
On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > > >
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >         Update to allow setting FS_XFLAG_DAX any time.
> > > >         Update with list of behaviors from Darrick
> > > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >         Update to reflect the agreed upon semantics
> > > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > > > +
> > > > +Enabling DAX on ext4
> > > > +--------------------
> > > > +
> > > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > > +add 'dax' to the options in /etc/fstab.
> > > > +
> > > > +
> > > > +Enabling DAX on xfs
> > > > +-------------------
> > > > +
> > > > +Summary
> > > > +-------
> > > > +
> > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > +    cache.
> > >
> > > I had reserved some quibbling with this wording, but now that this is
> > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > imply, but does not require persistent memory nor does it necessarily
> > > "bypass page cache". For example on configurations that support dax,
> > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > be a #include of the man page definition that is careful (IIRC) to
> > > only talk about semantics and not backend implementation details. In
> > > other words, dax is to page-cache as direct-io is to page cache,
> > > effectively not there, but dig a bit deeper and you may find it.
> >
> > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> 
> No, I was referring to the proposed wording for STATX_ATTR_DAX.
> There's no reason for this description to say anything divergent from
> that description.

Ahh, ok.  Something like this, then:

 1. There exists an in-kernel access mode flag S_DAX.  When set, the
    file is in the DAX (cpu direct access) state.  DAX state attempts to
    minimize software cache effects for both I/O and memory mappings of
    this file.  The S_DAX state is exposed to userspace via the
    STATX_ATTR_DAX statx flag.

    See the STATX_ATTR_DAX in the statx(2) manpage for more information.

--D
Ira Weiny April 14, 2020, 7:58 p.m. UTC | #9
On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > > >
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >         Update to allow setting FS_XFLAG_DAX any time.
> > > >         Update with list of behaviors from Darrick
> > > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >         Update to reflect the agreed upon semantics
> > > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > > > +
> > > > +Enabling DAX on ext4
> > > > +--------------------
> > > > +
> > > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > > +add 'dax' to the options in /etc/fstab.
> > > > +
> > > > +
> > > > +Enabling DAX on xfs
> > > > +-------------------
> > > > +
> > > > +Summary
> > > > +-------
> > > > +
> > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > +    cache.
> > >
> > > I had reserved some quibbling with this wording, but now that this is
> > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > imply, but does not require persistent memory nor does it necessarily
> > > "bypass page cache". For example on configurations that support dax,
> > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > be a #include of the man page definition that is careful (IIRC) to
> > > only talk about semantics and not backend implementation details. In
> > > other words, dax is to page-cache as direct-io is to page cache,
> > > effectively not there, but dig a bit deeper and you may find it.
> >
> > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> 
> No, I was referring to the proposed wording for STATX_ATTR_DAX.
> There's no reason for this description to say anything divergent from
> that description.

Ok I think the best text would be to simply refer to the STATX_ATTR_DAX man
page here.  Something like:

<quote>
 1. There exists an in-kernel access mode flag S_DAX that is set when file
    access is enabled for 'DAX'.  Applications must call statx to discover
    the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
    more details.
</quote>

Ira
Ira Weiny April 14, 2020, 8 p.m. UTC | #10
On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:

[snip]

> > > > > +
> > > > > +Enabling DAX on xfs
> > > > > +-------------------
> > > > > +
> > > > > +Summary
> > > > > +-------
> > > > > +
> > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > +    cache.
> > > >
> > > > I had reserved some quibbling with this wording, but now that this is
> > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > imply, but does not require persistent memory nor does it necessarily
> > > > "bypass page cache". For example on configurations that support dax,
> > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > be a #include of the man page definition that is careful (IIRC) to
> > > > only talk about semantics and not backend implementation details. In
> > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > effectively not there, but dig a bit deeper and you may find it.
> > >
> > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > 
> > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > There's no reason for this description to say anything divergent from
> > that description.
> 
> Ahh, ok.  Something like this, then:
> 
>  1. There exists an in-kernel access mode flag S_DAX.  When set, the
>     file is in the DAX (cpu direct access) state.  DAX state attempts to
>     minimize software cache effects for both I/O and memory mappings of
>     this file.  The S_DAX state is exposed to userspace via the
>     STATX_ATTR_DAX statx flag.
> 
>     See the STATX_ATTR_DAX in the statx(2) manpage for more information.

We crossed in the ether!!!  I propose even less details here...  Leave all the
details to the man page.

<quote>
 1. There exists an in-kernel access mode flag S_DAX that is set when file
    access is enabled for 'DAX'.  Applications must call statx to discover
    the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
    more details.
</quote>

Ira
Darrick J. Wong April 14, 2020, 8:18 p.m. UTC | #11
On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> [snip]
> 
> > > > > > +
> > > > > > +Enabling DAX on xfs
> > > > > > +-------------------
> > > > > > +
> > > > > > +Summary
> > > > > > +-------
> > > > > > +
> > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > +    cache.
> > > > >
> > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > "bypass page cache". For example on configurations that support dax,
> > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > only talk about semantics and not backend implementation details. In
> > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > effectively not there, but dig a bit deeper and you may find it.
> > > >
> > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > 
> > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > There's no reason for this description to say anything divergent from
> > > that description.
> > 
> > Ahh, ok.  Something like this, then:
> > 
> >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> >     minimize software cache effects for both I/O and memory mappings of
> >     this file.  The S_DAX state is exposed to userspace via the
> >     STATX_ATTR_DAX statx flag.
> > 
> >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> 
> We crossed in the ether!!!  I propose even less details here...  Leave all the
> details to the man page.
> 
> <quote>
> 1. There exists an in-kernel access mode flag S_DAX that is set when file
>     accesses is enabled for 'DAX'.  Applications must call statx to discover
>     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
>     more details.
> </quote>

Why stop cutting there? :)

 1. There exists an in-kernel file access mode flag S_DAX that
    corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
    manpage for statx(2) for details about this access mode.

--D

> Ira
>
Ira Weiny April 14, 2020, 8:54 p.m. UTC | #12
On Tue, Apr 14, 2020 at 01:18:08PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> > On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > [snip]
> > 
> > > > > > > +
> > > > > > > +Enabling DAX on xfs
> > > > > > > +-------------------
> > > > > > > +
> > > > > > > +Summary
> > > > > > > +-------
> > > > > > > +
> > > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > > +    cache.
> > > > > >
> > > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > > "bypass page cache". For example on configurations that support dax,
> > > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > > only talk about semantics and not backend implementation details. In
> > > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > > effectively not there, but dig a bit deeper and you may find it.
> > > > >
> > > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > > 
> > > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > > There's no reason for this description to say anything divergent from
> > > > that description.
> > > 
> > > Ahh, ok.  Something like this, then:
> > > 
> > >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> > >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> > >     minimize software cache effects for both I/O and memory mappings of
> > >     this file.  The S_DAX state is exposed to userspace via the
> > >     STATX_ATTR_DAX statx flag.
> > > 
> > >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> > 
> > We crossed in the ether!!!  I propose even less details here...  Leave all the
> > details to the man page.
> > 
> > <quote>
> > 1. There exists an in-kernel access mode flag S_DAX that is set when file
> >     accesses is enabled for 'DAX'.  Applications must call statx to discover
> >     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
> >     more details.
> > </quote>
> 
> Why stop cutting there? :)
> 
>  1. There exists an in-kernel file access mode flag S_DAX that
>     corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
>     manpage for statx(2) for details about this access mode.

Sure!  But I'm holding to STATX_ATTR_DAX...  I don't like introducing another
alias for this stuff.  Why have '-o dax=x' and then have some other term here?

Keep the name the same for consistency.

Searching for 'DAX Linux'[*] results in 'About 877,000 results' on Google.

While "'direct load store' Linux" results in 'About 2,630 results'.

I'll update the rest of the text though!  :-D

Ira

[*] Because 'DAX' is some company index and/or a rapper...  <sigh>
Darrick J. Wong April 14, 2020, 9:02 p.m. UTC | #13
On Tue, Apr 14, 2020 at 01:54:44PM -0700, Ira Weiny wrote:
> On Tue, Apr 14, 2020 at 01:18:08PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> > > On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > 
> > > [snip]
> > > 
> > > > > > > > +
> > > > > > > > +Enabling DAX on xfs
> > > > > > > > +-------------------
> > > > > > > > +
> > > > > > > > +Summary
> > > > > > > > +-------
> > > > > > > > +
> > > > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > > > +    cache.
> > > > > > >
> > > > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > > > "bypass page cache". For example on configurations that support dax,
> > > > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > > > only talk about semantics and not backend implementation details. In
> > > > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > > > effectively not there, but dig a bit deeper and you may find it.
> > > > > >
> > > > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > > > 
> > > > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > > > There's no reason for this description to say anything divergent from
> > > > > that description.
> > > > 
> > > > Ahh, ok.  Something like this, then:
> > > > 
> > > >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> > > >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> > > >     minimize software cache effects for both I/O and memory mappings of
> > > >     this file.  The S_DAX state is exposed to userspace via the
> > > >     STATX_ATTR_DAX statx flag.
> > > > 
> > > >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> > > 
> > > We crossed in the ether!!!  I propose even less details here...  Leave all the
> > > details to the man page.
> > > 
> > > <quote>
> > > 1. There exists an in-kernel access mode flag S_DAX that is set when file
> > >     accesses is enabled for 'DAX'.  Applications must call statx to discover
> > >     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
> > >     more details.
> > > </quote>
> > 
> > Why stop cutting there? :)
> > 
> >  1. There exists an in-kernel file access mode flag S_DAX that
> >     corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
> >     manpage for statx(2) for details about this access mode.
> 
> Sure!  But I'm holding to STATX_ATTR_DAX...  I don't like introducing another
> alias for this stuff.  Why have '-o dax=x' and then have some other term here?

Ok, STATX_ATTR_DAX then.

> Keep the name the same for consistency.
> 
> Searching for 'DAX Linux'[*] results in 'About 877,000 results' on Google.
> 
> While "'direct load store' Linux" results in 'About 2,630 results'.
> 
> I'll update the rest of the text though!  :-D
> 
> Ira
> 
> [*] Because 'DAX' is some company index and or a rapper...  <sigh>

Don't forget Jadzia and Ezri. ;)

--D
Jan Kara April 15, 2020, 8:23 a.m. UTC | #14
On Tue 14-04-20 12:48:48, Ira Weiny wrote:
> On Mon, Apr 13, 2020 at 10:12:22PM -0700, Dan Williams wrote:
> > On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
> > >
> > > On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > > > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > >
> > > > > Update the Usage section to reflect the new individual dax selection
> > > > > functionality.
> > > >
> > > > Yum. :)
> > > >
> > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > > >
> > > > > ---
> > > > > Changes from V6:
> > > > >     Update to allow setting FS_XFLAG_DAX any time.
> > > > >     Update with list of behaviors from Darrick
> > > > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > > >
> > > > > Changes from V5:
> > > > >     Update to reflect the agreed upon semantics
> > > > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > > ---
> > > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > > index 679729442fd2..af14c1b330a9 100644
> > > > > --- a/Documentation/filesystems/dax.txt
> > > > > +++ b/Documentation/filesystems/dax.txt
> > > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > > >  Usage
> > > > >  -----
> > > > >
> > > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > > +If you have a block device which supports DAX, you can make a file system
> > > > >  on it as usual.  The DAX code currently only supports files with a block
> > > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > > +size when creating the file system.
> > > > > +
> > > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > > +different at this time.
> > > >
> > > > I thought ext2 supports DAX?
> > >
> > > Not that I know of?  Does it?
> > 
> > Yes. Seemed like a good idea at the time, but in retrospect...
> 
> Ah ok...   Is there an objection to leaving ext2 as a global mount option?
> Updating the doc is easy enough.

I'm fine with that. I wouldn't really bother with per-inode DAX flag for
ext2.

								Honza

Patch
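
The inheritance rule in the Details section of this patch (FS_XFLAG_DAX is
copied from the parent directory only when a file or directory is created)
can be modeled in a few lines of Python.  This is a toy sketch, not kernel
code: the dictionary of flags and the helper names mkdir_p, chattr_plus_x and
dax_dirs are hypothetical, the top-level parent is assumed to be unflagged,
and Example C is assumed to set the flag on a/b/c.

```python
# Toy model of FS_XFLAG_DAX inheritance; this is not kernel code.
# Each directory path maps to a bool standing in for the on-disk flag.

def mkdir_p(flags, path):
    """Create each missing component of path, copying the parent's flag."""
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        name = "/".join(parts[:i])
        if name in flags:
            continue  # existing directories keep whatever flag they have
        parent = "/".join(parts[:i - 1])
        # Inheritance happens only here, at creation time.
        flags[name] = flags.get(parent, False)

def chattr_plus_x(flags, path):
    """Set the flag on one existing directory; descendants are untouched."""
    flags[path] = True

def dax_dirs(flags):
    """Return the sorted list of directories that carry the flag."""
    return sorted(name for name, flag in flags.items() if flag)

# Example A: "a" is flagged after "a/b" and "a/b/c" already exist.
example_a = {}
mkdir_p(example_a, "a/b/c")
chattr_plus_x(example_a, "a")
mkdir_p(example_a, "a/b/c/d")
mkdir_p(example_a, "a/e")

# Example B: "a" is flagged before anything is created beneath it.
example_b = {}
mkdir_p(example_b, "a")
chattr_plus_x(example_b, "a")
mkdir_p(example_b, "a/b/c/d")

# Example C: only "a/b/c" is flagged, then "d" is created inside it.
example_c = {}
mkdir_p(example_c, "a/b/c")
chattr_plus_x(example_c, "a/b/c")
mkdir_p(example_c, "a/b/c/d")
```

Calling dax_dirs() on each dictionary lists the flagged directories, which
can be compared against the "dax:" lines of Examples A to C in the patch.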

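The patch text derives the effective S_DAX state from three inputs: media
support, the dax= mount option and FS_XFLAG_DAX.  A minimal Python sketch of
that decision follows.  It is a model only, not the kernel implementation:
the function name effective_s_dax and its parameters are hypothetical, and
treating unsupported media as "never S_DAX" is an assumption drawn from the
"given underlying media support" wording.

```python
def effective_s_dax(fs_xflag_dax, dax_mount="inode",
                    media_supports_dax=True, is_directory=False):
    """Model of the S_DAX value chosen when an inode is loaded."""
    if dax_mount not in ("inode", "always", "never"):
        raise ValueError("dax= must be one of inode, always, never")
    # The patch text states that directories never have S_DAX set.
    if is_directory:
        return False
    # Assumption: without media support S_DAX cannot be enabled at all.
    if not media_supports_dax:
        return False
    if dax_mount == "always":
        return True           # mount option overrides FS_XFLAG_DAX
    if dax_mount == "never":
        return False          # mount option overrides FS_XFLAG_DAX
    return fs_xflag_dax       # dax=inode (the default): follow the flag
```

For example, effective_s_dax(True, dax_mount="never") models a file whose
FS_XFLAG_DAX is set on a file system mounted with "-o dax=never": the flag
stays on disk, but the file is not accessed in DAX mode.
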
diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..af14c1b330a9 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,171 @@  For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 2 filesystems support DAX, ext4 and xfs.  The method of enabling DAX
+differs between them at this time.
+
+Enabling DAX on ext4
+--------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel access mode flag S_DAX that is set when
+    file accesses go directly to persistent memory, bypassing the page
+    cache.  Applications must call statx to discover the current S_DAX
+    state (STATX_ATTR_DAX).
+
+ 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
+    inherited from the parent directory's FS_XFLAG_DAX inode flag at file
+    creation time.  This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
+    and the fs is on pmem then it will enable S_DAX at inode load time;
+    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
+
+ 3. There exists a dax= mount option.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX (at least on pmem),
+                    and ignore FS_XFLAG_DAX."
+
+    "-o dax"        is an alias for "dax=always".
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+ 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
+    be set or cleared at any time.  The flag state is inherited by any files or
+    subdirectories when they are created within that directory.
+
+ 5. Programs that require a specific file access mode (DAX or not DAX)
+    can do one of the following:
+
+    (a) Create files in directories that have the FS_XFLAG_DAX flag set as
+        needed; or
+
+    (b) Have the administrator set an override via mount option; or
+
+    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
+        must then cause the kernel to evict the inode from memory.  This
+        can be done by:
+
+        i>  Closing and re-opening the file, then using statx to see if
+            the fs has changed the S_DAX flag; and
+
+        ii> If the file still does not have the desired S_DAX access
+            mode, either unmount and remount the filesystem, or close
+            the file and use drop_caches.
+
+ 6. It is expected that users who want to squeeze every last bit of performance
+    out of the particular rough and tumble bits of their storage will also be
+    exposed to the difficulties of what happens when the operating system can't
+    totally virtualize those hardware capabilities.  DAX is such a feature.
+    Basically, Formula-1 cars require a bit more care and feeding than your
+    average Toyota minivan, as it were.
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
+and the other a currently enabled state (S_DAX).
+
+FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
+within the file system.  This 'physical' config setting can be set using an
+ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
+directories automatically inherit FS_XFLAG_DAX from their parent directory
+_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
+can be used to set a default behavior for an entire sub-tree.  (Doing so on the
+root directory acts to set a default for the entire file system.)
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a/b/c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
+the underlying media support, the value of FS_XFLAG_DAX, and the file system's
+dax mount option setting.  See below.
+
+statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
+set and therefore statx will always return false on directories.
+
+NOTE: Setting FS_XFLAG_DAX (explicitly or through inheritance) occurs even if
+the underlying media does not support dax and/or the file system is
+overridden with a mount option.
+
+
+Overriding FS_XFLAG_DAX (dax= mount option)
+-------------------------------------------
+
+There exists a dax mount option.  Using the mount option does not change the
+physical configured state of individual files but overrides the S_DAX operating
+state when inodes are loaded.
+
+Given underlying media support, the dax mount option is a tri-state option
+(never, always, inode) with the following meanings:
+
+   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
+   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
+        "-o dax" by itself means "dax=always" to remain compatible with older
+	         kernels
+   "-o dax=inode" means "follow FS_XFLAG_DAX"
+
+The default state is 'inode'.  On a device that supports DAX, the following
+algorithm determines the effective access mode (S_DAX) of a file when its
+inode is loaded.
+
+	S_DAX = FS_XFLAG_DAX;
+
+	if (dax_mount == "always")
+		S_DAX = true;
+	else if (dax_mount == "never")
+		S_DAX = false;
+
+To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
+while the file system is mounted with a dax override.  However, the enabled
+state of a file, S_DAX, will continue to be overridden until the file system
+is remounted with dax=inode.
 
 
 Implementation Tips for Block Driver Writers