[v3,2/2] ext4: handle layout changes to pinned DAX mappings

Message ID	20180702172912.329-1-ross.zwisler@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nvdimm-bounces@lists.01.org> Received-SPF: None (no SPF record) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=ross.zwisler@linux.intel.com; receiver=linux-nvdimm@lists.01.org From: Ross Zwisler <ross.zwisler@linux.intel.com> To: Jan Kara <jack@suse.cz>, Dan Williams <dan.j.williams@intel.com>, Dave Chinner <david@fromorbit.com>, "Darrick J. Wong" <darrick.wong@oracle.com>, Christoph Hellwig <hch@lst.de>, linux-nvdimm@lists.01.org, Jeff Moyer <jmoyer@redhat.com>, linux-ext4@vger.kernel.org Subject: [PATCH v3 2/2] ext4: handle layout changes to pinned DAX mappings Date: Mon, 2 Jul 2018 11:29:12 -0600 Message-Id: <20180702172912.329-1-ross.zwisler@linux.intel.com> In-reply-to: <20180627212252.31032-3-ross.zwisler@linux.intel.com> Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" <linux-nvdimm-bounces@lists.01.org>

Ross Zwisler July 2, 2018, 5:29 p.m. UTC

Follow the lead of xfs_break_dax_layouts() and add synchronization between
operations in ext4 which remove blocks from an inode (hole punch, truncate
down, etc.) and pages which are pinned due to DAX DMA operations.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
---

Changes since v2:
 * Added a comment to ext4_insert_range() explaining why we don't call
   ext4_break_layouts(). (Jan)

 * Added Lukas's reviewed-by.

Thanks again to Lukas & Jan for their reviews.

---
 fs/ext4/ext4.h     |  1 +
 fs/ext4/extents.c  | 17 +++++++++++++++++
 fs/ext4/inode.c    | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/truncate.h |  4 ++++
 4 files changed, 68 insertions(+)

Dave Chinner July 4, 2018, 12:49 a.m. UTC | #1

On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> Follow the lead of xfs_break_dax_layouts() and add synchronization between
> operations in ext4 which remove blocks from an inode (hole punch, truncate
> down, etc.) and pages which are pinned due to DAX DMA operations.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> ---
> 
> Changes since v2:
>  * Added a comment to ext4_insert_range() explaining why we don't call
>    ext4_break_layouts(). (Jan)

Which I think is wrong and will cause data corruption.

> @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
>  			LLONG_MAX);
>  	if (ret)
>  		goto out_mmap;
> +	/*
> +	 * We don't need to call ext4_break_layouts() because we aren't
> +	 * removing any blocks from the inode.  We are just changing their
> +	 * offset by inserting a hole.
> +	 */

The entire point of these leases is so that a thrid party can
directly access the blocks underlying the file. That means they are
keeping their own file offset<->disk block mapping internally, and
they are assuming that it is valid for as long as they hold the
lease. If the filesystem modifies the extent map - even something
like a shift here which changes the offset<->disk block mapping -
the userspace app now has a stale mapping and so the lease *must be
broken* to tell it that it's mappings are now stale and it needs to
refetch them.

If the app doesn't refetch it's mappings after something like a
shift, it will be reading and writing data from the wrong file
offset, and that will lead to the app silently corrupting it's data.
IOWs, layouts need to be broken by any operation that modifies the
extent map in any way, not just those operations that remove blocks.

Cheers,

Dave.

Jan Kara July 4, 2018, 12:27 p.m. UTC | #2

On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > down, etc.) and pages which are pinned due to DAX DMA operations.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> > 
> > Changes since v2:
> >  * Added a comment to ext4_insert_range() explaining why we don't call
> >    ext4_break_layouts(). (Jan)
> 
> Which I think is wrong and will cause data corruption.
> 
> > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> >  			LLONG_MAX);
> >  	if (ret)
> >  		goto out_mmap;
> > +	/*
> > +	 * We don't need to call ext4_break_layouts() because we aren't
> > +	 * removing any blocks from the inode.  We are just changing their
> > +	 * offset by inserting a hole.
> > +	 */
> 
> The entire point of these leases is so that a thrid party can
> directly access the blocks underlying the file. That means they are
> keeping their own file offset<->disk block mapping internally, and
> they are assuming that it is valid for as long as they hold the
> lease. If the filesystem modifies the extent map - even something
> like a shift here which changes the offset<->disk block mapping -
> the userspace app now has a stale mapping and so the lease *must be
> broken* to tell it that it's mappings are now stale and it needs to
> refetch them.

Well, ext4 has no real concept of leases and no pNFS support. And DAX
requirements wrt consistency are much weaker than those of pNFS. This is
mostly caused by the fact that calls like invalidate_mapping_pages() will
flush offset<->pfn mappings DAX maintains in the radix tree automatically
(similarly as it happens when page cache is used).

What Ross did just keeps ext4 + DAX behave similarly as ext4 + page cache
does - i.e., if you use mmaped file as a buffer e.g. for direct IO and mix
your direct IO with extent manipulations on that buffer file, you will get
inconsistent results. With page cache, pages you use as buffers will get
detached from the inode during extent manipulations and discarded once you
drop your page references. With DAX, changes will land at a different
offset of the file than you might have thought. But that is the same as if
we just waited for the IO to complete (which is what ext4_break_layouts()
effectively does) and then shifted those blocks.

So the biggest problem I can see here is that ext4_break_layouts() is a
misnomer as it promises more than the function actually does (wait for page
references to be dropped). If we called it like ext4_dax_unmap_pages(),
things would be clearer I guess. Ross?

								Honza

Dave Chinner July 4, 2018, 11:54 p.m. UTC | #3

On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > 
> > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > ---
> > > 
> > > Changes since v2:
> > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > >    ext4_break_layouts(). (Jan)
> > 
> > Which I think is wrong and will cause data corruption.
> > 
> > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > >  			LLONG_MAX);
> > >  	if (ret)
> > >  		goto out_mmap;
> > > +	/*
> > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > +	 * removing any blocks from the inode.  We are just changing their
> > > +	 * offset by inserting a hole.
> > > +	 */
> > 
> > The entire point of these leases is so that a thrid party can
> > directly access the blocks underlying the file. That means they are
> > keeping their own file offset<->disk block mapping internally, and
> > they are assuming that it is valid for as long as they hold the
> > lease. If the filesystem modifies the extent map - even something
> > like a shift here which changes the offset<->disk block mapping -
> > the userspace app now has a stale mapping and so the lease *must be
> > broken* to tell it that it's mappings are now stale and it needs to
> > refetch them.
> 
> Well, ext4 has no real concept of leases and no pNFS support. And DAX
> requirements wrt consistency are much weaker than those of pNFS. This is
> mostly caused by the fact that calls like invalidate_mapping_pages() will
> flush offset<->pfn mappings DAX maintains in the radix tree automatically
> (similarly as it happens when page cache is used).

I'm more concerned about apps that use file leases behaving the same
way, not just the pNFS stuff. if we are /delegating file layouts/ to
3rd parties, then all filesystems *need* to behave the same way.
We've already defined those semantics with XFS - every time the
filesystem changes an extent layout in any way it needs to break
existing layout delegations...

> What Ross did just keeps ext4 + DAX behave similarly as ext4 + page cache
> does

Sure. But the issue I'm raising is that ext4 is not playing by the
same extent layout delegation rules that XFS has already defined for
3rd party use.

i.e. don't fuck up layout delegation behaviour consistency right
from the start just because "<this subset of functionality> is all
we need right now for ext4". All the filesystems should implement
the same semantics and behaviour right from the start, otherwise
we're just going to make life a misery for anyone who tries to use
layout delegations in future.

Haven't we learnt this lesson the hard way enough times already?

Cheers,

Dave.

Darrick J. Wong July 5, 2018, 3:59 a.m. UTC | #4

On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > 
> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > > ---
> > > > 
> > > > Changes since v2:
> > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > >    ext4_break_layouts(). (Jan)
> > > 
> > > Which I think is wrong and will cause data corruption.
> > > 
> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > >  			LLONG_MAX);
> > > >  	if (ret)
> > > >  		goto out_mmap;
> > > > +	/*
> > > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > > +	 * removing any blocks from the inode.  We are just changing their
> > > > +	 * offset by inserting a hole.
> > > > +	 */

Does calling ext4_break_layouts from insert range not work?

It's my understanding that file leases work are a mechanism for the
filesystem to delegate some of its authority over physical space
mappings to "client" software.  AFAICT it's used for mmap'ing pmem
directly into userspace and exporting space on shared storage over
pNFS.  Some day we might use the same mechanism for the similar things
that RDMA does, or the swapfile code since that's essentially how it
works today.

The other part of these file leases is that the filesystem revokes them
any time it wants to perform a mapping operation on a file.  This breaks
my mental model of how leases work, and if you commit to this for ext4
then I'll have to remember that leases are different between xfs and
ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
be the implementation detail that "DAX won't care", then someone else
wiring up pNFS/future RDMA/whatever will also have to remember to put it
back into ext4 or else kaboom.

Granted, Dave said all these things already, but I actually feel
strongly enough to reiterate.

--D

> > > 
> > > The entire point of these leases is so that a thrid party can
> > > directly access the blocks underlying the file. That means they are
> > > keeping their own file offset<->disk block mapping internally, and
> > > they are assuming that it is valid for as long as they hold the
> > > lease. If the filesystem modifies the extent map - even something
> > > like a shift here which changes the offset<->disk block mapping -
> > > the userspace app now has a stale mapping and so the lease *must be
> > > broken* to tell it that it's mappings are now stale and it needs to
> > > refetch them.
> > 
> > Well, ext4 has no real concept of leases and no pNFS support. And DAX
> > requirements wrt consistency are much weaker than those of pNFS. This is
> > mostly caused by the fact that calls like invalidate_mapping_pages() will
> > flush offset<->pfn mappings DAX maintains in the radix tree automatically
> > (similarly as it happens when page cache is used).
> 
> I'm more concerned about apps that use file leases behaving the same
> way, not just the pNFS stuff. if we are /delegating file layouts/ to
> 3rd parties, then all filesystems *need* to behave the same way.
> We've already defined those semantics with XFS - every time the
> filesystem changes an extent layout in any way it needs to break
> existing layout delegations...
> 
> > What Ross did just keeps ext4 + DAX behave similarly as ext4 + page cache
> > does
> 
> Sure. But the issue I'm raising is that ext4 is not playing by the
> same extent layout delegation rules that XFS has already defined for
> 3rd party use.
> 
> i.e. don't fuck up layout delegation behaviour consistency right
> from the start just because "<this subset of functionality> is all
> we need right now for ext4". All the filesystems should implement
> the same semantics and behaviour right from the start, otherwise
> we're just going to make life a misery for anyone who tries to use
> layout delegations in future.
> 
> Haven't we learnt this lesson the hard way enough times already?
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

Ross Zwisler July 5, 2018, 4:53 p.m. UTC | #5

On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > 
> > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > > > ---
> > > > > 
> > > > > Changes since v2:
> > > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > > >    ext4_break_layouts(). (Jan)
> > > > 
> > > > Which I think is wrong and will cause data corruption.
> > > > 
> > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > > >  			LLONG_MAX);
> > > > >  	if (ret)
> > > > >  		goto out_mmap;
> > > > > +	/*
> > > > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > > > +	 * removing any blocks from the inode.  We are just changing their
> > > > > +	 * offset by inserting a hole.
> > > > > +	 */
> 
> Does calling ext4_break_layouts from insert range not work?
> 
> It's my understanding that file leases work are a mechanism for the
> filesystem to delegate some of its authority over physical space
> mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> directly into userspace and exporting space on shared storage over
> pNFS.  Some day we might use the same mechanism for the similar things
> that RDMA does, or the swapfile code since that's essentially how it
> works today.
> 
> The other part of these file leases is that the filesystem revokes them
> any time it wants to perform a mapping operation on a file.  This breaks
> my mental model of how leases work, and if you commit to this for ext4
> then I'll have to remember that leases are different between xfs and
> ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> be the implementation detail that "DAX won't care", then someone else
> wiring up pNFS/future RDMA/whatever will also have to remember to put it
> back into ext4 or else kaboom.
> 
> Granted, Dave said all these things already, but I actually feel
> strongly enough to reiterate.

Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() to
keep the lease mechanism consistent between ext4 and XFS, or would you prefer
the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?

Dan Williams July 5, 2018, 8:40 p.m. UTC | #6

On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
>> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
>> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
>> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
>> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
>> > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
>> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
>> > > >
>> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
>> > > > Reviewed-by: Jan Kara <jack@suse.cz>
>> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
>> > > > ---
>> > > >
>> > > > Changes since v2:
>> > > >  * Added a comment to ext4_insert_range() explaining why we don't call
>> > > >    ext4_break_layouts(). (Jan)
>> > >
>> > > Which I think is wrong and will cause data corruption.
>> > >
>> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
>> > > >                         LLONG_MAX);
>> > > >         if (ret)
>> > > >                 goto out_mmap;
>> > > > +       /*
>> > > > +        * We don't need to call ext4_break_layouts() because we aren't
>> > > > +        * removing any blocks from the inode.  We are just changing their
>> > > > +        * offset by inserting a hole.
>> > > > +        */
>
> Does calling ext4_break_layouts from insert range not work?
>
> It's my understanding that file leases work are a mechanism for the
> filesystem to delegate some of its authority over physical space
> mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> directly into userspace and exporting space on shared storage over
> pNFS.  Some day we might use the same mechanism for the similar things
> that RDMA does, or the swapfile code since that's essentially how it
> works today.
>
> The other part of these file leases is that the filesystem revokes them
> any time it wants to perform a mapping operation on a file.  This breaks
> my mental model of how leases work, and if you commit to this for ext4
> then I'll have to remember that leases are different between xfs and
> ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> be the implementation detail that "DAX won't care", then someone else
> wiring up pNFS/future RDMA/whatever will also have to remember to put it
> back into ext4 or else kaboom.
>
> Granted, Dave said all these things already, but I actually feel
> strongly enough to reiterate.

This patch kit is only for the DAX fix, this isn't full layout lease
support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
So ext4 is achieving feature parity for BREAK_UNMAP, just not
BREAK_WRITE, yet.

Dave Chinner July 5, 2018, 11:29 p.m. UTC | #7

On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> >> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> >> > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> >> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> >> > > >
> >> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> >> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> >> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> >> > > > ---
> >> > > >
> >> > > > Changes since v2:
> >> > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> >> > > >    ext4_break_layouts(). (Jan)
> >> > >
> >> > > Which I think is wrong and will cause data corruption.
> >> > >
> >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> >> > > >                         LLONG_MAX);
> >> > > >         if (ret)
> >> > > >                 goto out_mmap;
> >> > > > +       /*
> >> > > > +        * We don't need to call ext4_break_layouts() because we aren't
> >> > > > +        * removing any blocks from the inode.  We are just changing their
> >> > > > +        * offset by inserting a hole.
> >> > > > +        */
> >
> > Does calling ext4_break_layouts from insert range not work?
> >
> > It's my understanding that file leases work are a mechanism for the
> > filesystem to delegate some of its authority over physical space
> > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > directly into userspace and exporting space on shared storage over
> > pNFS.  Some day we might use the same mechanism for the similar things
> > that RDMA does, or the swapfile code since that's essentially how it
> > works today.
> >
> > The other part of these file leases is that the filesystem revokes them
> > any time it wants to perform a mapping operation on a file.  This breaks
> > my mental model of how leases work, and if you commit to this for ext4
> > then I'll have to remember that leases are different between xfs and
> > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > be the implementation detail that "DAX won't care", then someone else
> > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > back into ext4 or else kaboom.
> >
> > Granted, Dave said all these things already, but I actually feel
> > strongly enough to reiterate.
> 
> This patch kit is only for the DAX fix, this isn't full layout lease
> support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> So ext4 is achieving feature parity for BREAK_UNMAP, just not
> BREAK_WRITE, yet.

BREAK_UNMAP is issued unconditionally by XFS for all fallocate
operations. There is no special except for extent shifting (up or
down) in XFS as this patch set is making for ext4.  IOWs, this
patchset does not implement BREAK_UNMAP with the same semantics as
XFS.

Cheers,

Dave.

Dan Williams July 6, 2018, 5:08 a.m. UTC | #8

On Thu, Jul 5, 2018 at 4:29 PM Dave Chinner <david@fromorbit.com> wrote:

> On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> > On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong <darrick.wong@oracle.com>
> wrote:
> > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > >> > > > Follow the lead of xfs_break_dax_layouts() and add
> synchronization between
> > >> > > > operations in ext4 which remove blocks from an inode (hole
> punch, truncate
> > >> > > > down, etc.) and pages which are pinned due to DAX DMA
> operations.
> > >> > > >
> > >> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > >> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > >> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > >> > > > ---
> > >> > > >
> > >> > > > Changes since v2:
> > >> > > >  * Added a comment to ext4_insert_range() explaining why we
> don't call
> > >> > > >    ext4_break_layouts(). (Jan)
> > >> > >
> > >> > > Which I think is wrong and will cause data corruption.
> > >> > >
> > >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode
> *inode, loff_t offset, loff_t len)
> > >> > > >                         LLONG_MAX);
> > >> > > >         if (ret)
> > >> > > >                 goto out_mmap;
> > >> > > > +       /*
> > >> > > > +        * We don't need to call ext4_break_layouts() because
> we aren't
> > >> > > > +        * removing any blocks from the inode.  We are just
> changing their
> > >> > > > +        * offset by inserting a hole.
> > >> > > > +        */
> > >
> > > Does calling ext4_break_layouts from insert range not work?
> > >
> > > It's my understanding that file leases work are a mechanism for the
> > > filesystem to delegate some of its authority over physical space
> > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > directly into userspace and exporting space on shared storage over
> > > pNFS.  Some day we might use the same mechanism for the similar things
> > > that RDMA does, or the swapfile code since that's essentially how it
> > > works today.
> > >
> > > The other part of these file leases is that the filesystem revokes them
> > > any time it wants to perform a mapping operation on a file.  This
> breaks
> > > my mental model of how leases work, and if you commit to this for ext4
> > > then I'll have to remember that leases are different between xfs and
> > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > be the implementation detail that "DAX won't care", then someone else
> > > wiring up pNFS/future RDMA/whatever will also have to remember to put
> it
> > > back into ext4 or else kaboom.
> > >
> > > Granted, Dave said all these things already, but I actually feel
> > > strongly enough to reiterate.
> >
> > This patch kit is only for the DAX fix, this isn't full layout lease
> > support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> > So ext4 is achieving feature parity for BREAK_UNMAP, just not
> > BREAK_WRITE, yet.
>
> BREAK_UNMAP is issued unconditionally by XFS for all fallocate
> operations. There is no special except for extent shifting (up or
> down) in XFS as this patch set is making for ext4.  IOWs, this
> patchset does not implement BREAK_UNMAP with the same semantics as
> XFS.



Ah true, I see that now.

Lukas Czerner July 9, 2018, 9:59 a.m. UTC | #9

On Fri, Jul 06, 2018 at 09:29:34AM +1000, Dave Chinner wrote:
> On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> > On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > >> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > >> > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > >> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > >> > > >
> > >> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > >> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > >> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > >> > > > ---
> > >> > > >
> > >> > > > Changes since v2:
> > >> > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > >> > > >    ext4_break_layouts(). (Jan)
> > >> > >
> > >> > > Which I think is wrong and will cause data corruption.
> > >> > >
> > >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > >> > > >                         LLONG_MAX);
> > >> > > >         if (ret)
> > >> > > >                 goto out_mmap;
> > >> > > > +       /*
> > >> > > > +        * We don't need to call ext4_break_layouts() because we aren't
> > >> > > > +        * removing any blocks from the inode.  We are just changing their
> > >> > > > +        * offset by inserting a hole.
> > >> > > > +        */
> > >
> > > Does calling ext4_break_layouts from insert range not work?
> > >
> > > It's my understanding that file leases work are a mechanism for the
> > > filesystem to delegate some of its authority over physical space
> > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > directly into userspace and exporting space on shared storage over
> > > pNFS.  Some day we might use the same mechanism for the similar things
> > > that RDMA does, or the swapfile code since that's essentially how it
> > > works today.
> > >
> > > The other part of these file leases is that the filesystem revokes them
> > > any time it wants to perform a mapping operation on a file.  This breaks
> > > my mental model of how leases work, and if you commit to this for ext4
> > > then I'll have to remember that leases are different between xfs and
> > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > be the implementation detail that "DAX won't care", then someone else
> > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > back into ext4 or else kaboom.
> > >
> > > Granted, Dave said all these things already, but I actually feel
> > > strongly enough to reiterate.
> > 
> > This patch kit is only for the DAX fix, this isn't full layout lease
> > support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> > So ext4 is achieving feature parity for BREAK_UNMAP, just not
> > BREAK_WRITE, yet.
> 
> BREAK_UNMAP is issued unconditionally by XFS for all fallocate
> operations. There is no special except for extent shifting (up or
> down) in XFS as this patch set is making for ext4.  IOWs, this
> patchset does not implement BREAK_UNMAP with the same semantics as
> XFS.

If anything this is very usefull discussion ( at least for me ) and what
I do take away from it is that there is no documentation, nor
specification of the leases nor BREAK_UNMAP nor BREAK_WRITE.

grep -iR -e break_layout -e BREAK_UNMAP -e BREAK_WRITE Documentation/*

Maybe someone with a good understanding of how this stuff is supposed to
be done could write it down so filesystem devs can make it behave the
same.

Thanks!
-Lukas


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Jan Kara July 9, 2018, 12:33 p.m. UTC | #10

On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > > 
> > > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > > > > ---
> > > > > > 
> > > > > > Changes since v2:
> > > > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > > > >    ext4_break_layouts(). (Jan)
> > > > > 
> > > > > Which I think is wrong and will cause data corruption.
> > > > > 
> > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > > > >  			LLONG_MAX);
> > > > > >  	if (ret)
> > > > > >  		goto out_mmap;
> > > > > > +	/*
> > > > > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > > > > +	 * removing any blocks from the inode.  We are just changing their
> > > > > > +	 * offset by inserting a hole.
> > > > > > +	 */
> > 
> > Does calling ext4_break_layouts from insert range not work?
> > 
> > It's my understanding that file leases work are a mechanism for the
> > filesystem to delegate some of its authority over physical space
> > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > directly into userspace and exporting space on shared storage over
> > pNFS.  Some day we might use the same mechanism for the similar things
> > that RDMA does, or the swapfile code since that's essentially how it
> > works today.
> > 
> > The other part of these file leases is that the filesystem revokes them
> > any time it wants to perform a mapping operation on a file.  This breaks
> > my mental model of how leases work, and if you commit to this for ext4
> > then I'll have to remember that leases are different between xfs and
> > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > be the implementation detail that "DAX won't care", then someone else
> > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > back into ext4 or else kaboom.
> > 
> > Granted, Dave said all these things already, but I actually feel
> > strongly enough to reiterate.
> 
> Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() to
> keep the lease mechanism consistent between ext4 and XFS, or would you prefer
> the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?

Let's just call it from ext4_insert_range(). I think the simple semantics
Dave and Darrick defend is more maintainable and insert range isn't really
performance critical operation.

The question remains whether equivalent of BREAK_UNMAP is really required
also for allocation of new blocks using fallocate. Because that doesn't
really seem fundamentally different from normal write which uses
BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
so bothering with GUP synchronization when not needed could hurt there.
Dave, Darrick?

								Honza

Darrick J. Wong July 9, 2018, 4:18 p.m. UTC | #11

On Mon, Jul 09, 2018 at 11:59:07AM +0200, Lukas Czerner wrote:
> On Fri, Jul 06, 2018 at 09:29:34AM +1000, Dave Chinner wrote:
> > On Thu, Jul 05, 2018 at 01:40:17PM -0700, Dan Williams wrote:
> > > On Wed, Jul 4, 2018 at 8:59 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > >> On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > >> > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > >> > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > >> > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > >> > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > >> > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > >> > > >
> > > >> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > >> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > >> > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > >> > > > ---
> > > >> > > >
> > > >> > > > Changes since v2:
> > > >> > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > >> > > >    ext4_break_layouts(). (Jan)
> > > >> > >
> > > >> > > Which I think is wrong and will cause data corruption.
> > > >> > >
> > > >> > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > >> > > >                         LLONG_MAX);
> > > >> > > >         if (ret)
> > > >> > > >                 goto out_mmap;
> > > >> > > > +       /*
> > > >> > > > +        * We don't need to call ext4_break_layouts() because we aren't
> > > >> > > > +        * removing any blocks from the inode.  We are just changing their
> > > >> > > > +        * offset by inserting a hole.
> > > >> > > > +        */
> > > >
> > > > Does calling ext4_break_layouts from insert range not work?
> > > >
> > > > It's my understanding that file leases work are a mechanism for the
> > > > filesystem to delegate some of its authority over physical space
> > > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > > directly into userspace and exporting space on shared storage over
> > > > pNFS.  Some day we might use the same mechanism for the similar things
> > > > that RDMA does, or the swapfile code since that's essentially how it
> > > > works today.
> > > >
> > > > The other part of these file leases is that the filesystem revokes them
> > > > any time it wants to perform a mapping operation on a file.  This breaks
> > > > my mental model of how leases work, and if you commit to this for ext4
> > > > then I'll have to remember that leases are different between xfs and
> > > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > > be the implementation detail that "DAX won't care", then someone else
> > > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > > back into ext4 or else kaboom.
> > > >
> > > > Granted, Dave said all these things already, but I actually feel
> > > > strongly enough to reiterate.
> > > 
> > > This patch kit is only for the DAX fix, this isn't full layout lease
> > > support. Even XFS is special casing unmap with the BREAK_UNMAP flag.
> > > So ext4 is achieving feature parity for BREAK_UNMAP, just not
> > > BREAK_WRITE, yet.
> > 
> > BREAK_UNMAP is issued unconditionally by XFS for all fallocate
> > operations. There is no special except for extent shifting (up or
> > down) in XFS as this patch set is making for ext4.  IOWs, this
> > patchset does not implement BREAK_UNMAP with the same semantics as
> > XFS.
> 
> If anything this is very usefull discussion ( at least for me ) and what
> I do take away from it is that there is no documentation, nor
> specification of the leases nor BREAK_UNMAP nor BREAK_WRITE.
> 
> grep -iR -e break_layout -e BREAK_UNMAP -e BREAK_WRITE Documentation/*
> 
> Maybe someone with a good understanding of how this stuff is supposed to
> be done could write it down so filesystem devs can make it behave the
> same.

Dan? :)

IIRC, BREAK_WRITE means "terminate all leases immediately" as the caller
prepares to write to a file range (which may or may not involve adding
more mappings), whereas BREAK_UNMAP means "terminate all leases and wait
until the lessee acknowledges" as the caller prepares to remove (or
move) file extent mappings.

--D

> Thanks!
> -Lukas
> 
> 
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

Darrick J. Wong July 9, 2018, 4:23 p.m. UTC | #12

On Mon, Jul 09, 2018 at 02:33:47PM +0200, Jan Kara wrote:
> On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> > On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > > > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > > > 
> > > > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > > > > > ---
> > > > > > > 
> > > > > > > Changes since v2:
> > > > > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > > > > >    ext4_break_layouts(). (Jan)
> > > > > > 
> > > > > > Which I think is wrong and will cause data corruption.
> > > > > > 
> > > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > > > > >  			LLONG_MAX);
> > > > > > >  	if (ret)
> > > > > > >  		goto out_mmap;
> > > > > > > +	/*
> > > > > > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > > > > > +	 * removing any blocks from the inode.  We are just changing their
> > > > > > > +	 * offset by inserting a hole.
> > > > > > > +	 */
> > > 
> > > Does calling ext4_break_layouts from insert range not work?
> > > 
> > > It's my understanding that file leases work are a mechanism for the
> > > filesystem to delegate some of its authority over physical space
> > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > directly into userspace and exporting space on shared storage over
> > > pNFS.  Some day we might use the same mechanism for the similar things
> > > that RDMA does, or the swapfile code since that's essentially how it
> > > works today.
> > > 
> > > The other part of these file leases is that the filesystem revokes them
> > > any time it wants to perform a mapping operation on a file.  This breaks
> > > my mental model of how leases work, and if you commit to this for ext4
> > > then I'll have to remember that leases are different between xfs and
> > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > be the implementation detail that "DAX won't care", then someone else
> > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > back into ext4 or else kaboom.
> > > 
> > > Granted, Dave said all these things already, but I actually feel
> > > strongly enough to reiterate.
> > 
> > Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() to
> > keep the lease mechanism consistent between ext4 and XFS, or would you prefer
> > the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?
> 
> Let's just call it from ext4_insert_range(). I think the simple semantics
> Dave and Darrick defend is more maintainable and insert range isn't really
> performance critical operation.
> 
> The question remains whether equivalent of BREAK_UNMAP is really required
> also for allocation of new blocks using fallocate. Because that doesn't
> really seem fundamentally different from normal write which uses
> BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
> so bothering with GUP synchronization when not needed could hurt there.
> Dave, Darrick?

Hmm, IIRC BREAK_UNMAP is supposed to be for callers who are going to
remove (or move) mappings that already exist, so that the caller blocks
until the lessee acknowledges that they've forgotten all the mappings
they knew about.  So I /think/ for fallocate mode 0 I think this could
be BREAK_WRITE instead of _UNMAP, though (at least for xfs) the other
modes all need _UNMAP.

Side question: in xfs_file_aio_write_checks, do we need to do
BREAK_UNMAP if is possible that writeback will end up performing a copy
write?  Granted, the pnfs export and dax stuff don't support reflink or
cow so I guess this is an academic question for now...

--D

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Jan Kara July 9, 2018, 7:49 p.m. UTC | #13

On Mon 09-07-18 09:23:38, Darrick J. Wong wrote:
> On Mon, Jul 09, 2018 at 02:33:47PM +0200, Jan Kara wrote:
> > On Thu 05-07-18 10:53:10, Ross Zwisler wrote:
> > > On Wed, Jul 04, 2018 at 08:59:52PM -0700, Darrick J. Wong wrote:
> > > > On Thu, Jul 05, 2018 at 09:54:14AM +1000, Dave Chinner wrote:
> > > > > On Wed, Jul 04, 2018 at 02:27:23PM +0200, Jan Kara wrote:
> > > > > > On Wed 04-07-18 10:49:23, Dave Chinner wrote:
> > > > > > > On Mon, Jul 02, 2018 at 11:29:12AM -0600, Ross Zwisler wrote:
> > > > > > > > Follow the lead of xfs_break_dax_layouts() and add synchronization between
> > > > > > > > operations in ext4 which remove blocks from an inode (hole punch, truncate
> > > > > > > > down, etc.) and pages which are pinned due to DAX DMA operations.
> > > > > > > > 
> > > > > > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > > > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > > > > Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> > > > > > > > ---
> > > > > > > > 
> > > > > > > > Changes since v2:
> > > > > > > >  * Added a comment to ext4_insert_range() explaining why we don't call
> > > > > > > >    ext4_break_layouts(). (Jan)
> > > > > > > 
> > > > > > > Which I think is wrong and will cause data corruption.
> > > > > > > 
> > > > > > > > @@ -5651,6 +5663,11 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
> > > > > > > >  			LLONG_MAX);
> > > > > > > >  	if (ret)
> > > > > > > >  		goto out_mmap;
> > > > > > > > +	/*
> > > > > > > > +	 * We don't need to call ext4_break_layouts() because we aren't
> > > > > > > > +	 * removing any blocks from the inode.  We are just changing their
> > > > > > > > +	 * offset by inserting a hole.
> > > > > > > > +	 */
> > > > 
> > > > Does calling ext4_break_layouts from insert range not work?
> > > > 
> > > > It's my understanding that file leases work are a mechanism for the
> > > > filesystem to delegate some of its authority over physical space
> > > > mappings to "client" software.  AFAICT it's used for mmap'ing pmem
> > > > directly into userspace and exporting space on shared storage over
> > > > pNFS.  Some day we might use the same mechanism for the similar things
> > > > that RDMA does, or the swapfile code since that's essentially how it
> > > > works today.
> > > > 
> > > > The other part of these file leases is that the filesystem revokes them
> > > > any time it wants to perform a mapping operation on a file.  This breaks
> > > > my mental model of how leases work, and if you commit to this for ext4
> > > > then I'll have to remember that leases are different between xfs and
> > > > ext4.  Worse, since the reason for skipping ext4_break_layouts seems to
> > > > be the implementation detail that "DAX won't care", then someone else
> > > > wiring up pNFS/future RDMA/whatever will also have to remember to put it
> > > > back into ext4 or else kaboom.
> > > > 
> > > > Granted, Dave said all these things already, but I actually feel
> > > > strongly enough to reiterate.
> > > 
> > > Jan, would you like me to call ext4_break_layouts() in ext4_insert_range() to
> > > keep the lease mechanism consistent between ext4 and XFS, or would you prefer
> > > the s/ext4_break_layouts()/ext4_dax_unmap_pages()/ rename?
> > 
> > Let's just call it from ext4_insert_range(). I think the simple semantics
> > Dave and Darrick defend is more maintainable and insert range isn't really
> > performance critical operation.
> > 
> > The question remains whether equivalent of BREAK_UNMAP is really required
> > also for allocation of new blocks using fallocate. Because that doesn't
> > really seem fundamentally different from normal write which uses
> > BREAK_WRITE for xfs_break_layouts(). And that it more often used operation
> > so bothering with GUP synchronization when not needed could hurt there.
> > Dave, Darrick?
> 
> Hmm, IIRC BREAK_UNMAP is supposed to be for callers who are going to
> remove (or move) mappings that already exist, so that the caller blocks
> until the lessee acknowledges that they've forgotten all the mappings
> they knew about.  So I /think/ for fallocate mode 0 I think this could
> be BREAK_WRITE instead of _UNMAP, though (at least for xfs) the other
> modes all need _UNMAP.

OK, so we are on the same page here :)

> Side question: in xfs_file_aio_write_checks, do we need to do
> BREAK_UNMAP if is possible that writeback will end up performing a copy
> write?  Granted, the pnfs export and dax stuff don't support reflink or
> cow so I guess this is an academic question for now...

My naive understanding would be that yes, BREAK_UNMAP is needed in such
case...

								Honza

[v3,2/2] ext4: handle layout changes to pinned DAX mappings

Commit Message

Comments

Patch