Message ID | 20190802192956.GA3032@redhat.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | d75996dd022b6d83bd14af59b2775b1aa639e4b9 |
Series | dax: dax_layout_busy_page() should not unmap cow pages |
On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
> argument as 1, which says even unmap cow pages. I am wondering who needs
> to get rid of cow pages as well.
>
> I noticed one interesting side affect of this. I mount xfs with -o dax and
> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
> cow page. Then I called fallocate() on that file to zero a page of file.
> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
> and then I tried to read back the data I wrote and what I get is old
> data from persistent memory. I lost the data I had written. This
> read basically resulted in new fault and read back the data from
> persistent memory.
>
> This sounds wrong. Are there any users which need to unmap cow pages
> as well? If not, I am proposing changing it to not unmap cow pages.
>
> I noticed this while while writing virtio_fs code where when I tried
> to reclaim a memory range and that corrupted the executable and I
> was running from virtio-fs and program got segment violation.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: rhvgoyal-linux/fs/dax.c
> ===================================================================
> --- rhvgoyal-linux.orig/fs/dax.c 2019-08-01 17:03:10.574675652 -0400
> +++ rhvgoyal-linux/fs/dax.c 2019-08-02 14:32:28.809639116 -0400
> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
>  	 * guaranteed to either see new references or prevent new
>  	 * references from being established.
>  	 */
> -	unmap_mapping_range(mapping, 0, 0, 1);
> +	unmap_mapping_range(mapping, 0, 0, 0);

Good find, yes, this looks correct to me and should also go to -stable.
On 02/08/2019 22:37, Dan Williams wrote:
> On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
>> argument as 1, which says even unmap cow pages. I am wondering who needs
>> to get rid of cow pages as well.
>>
>> I noticed one interesting side affect of this. I mount xfs with -o dax and
>> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
>> cow page. Then I called fallocate() on that file to zero a page of file.
>> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
>> and then I tried to read back the data I wrote and what I get is old
>> data from persistent memory. I lost the data I had written. This
>> read basically resulted in new fault and read back the data from
>> persistent memory.
>>
>> This sounds wrong. Are there any users which need to unmap cow pages
>> as well? If not, I am proposing changing it to not unmap cow pages.
>>
>> I noticed this while while writing virtio_fs code where when I tried
>> to reclaim a memory range and that corrupted the executable and I
>> was running from virtio-fs and program got segment violation.
>>
>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>> ---
>>  fs/dax.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: rhvgoyal-linux/fs/dax.c
>> ===================================================================
>> --- rhvgoyal-linux.orig/fs/dax.c 2019-08-01 17:03:10.574675652 -0400
>> +++ rhvgoyal-linux/fs/dax.c 2019-08-02 14:32:28.809639116 -0400
>> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
>>  	 * guaranteed to either see new references or prevent new
>>  	 * references from being established.
>>  	 */
>> -	unmap_mapping_range(mapping, 0, 0, 1);
>> +	unmap_mapping_range(mapping, 0, 0, 0);
>
> Good find, yes, this looks correct to me and should also go to -stable.
>

Please pay attention that unmap_mapping_range(mapping, ..., 1) is for the truncate case and friends

So as I understand the man page:
fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays

Just saying I have not followed the above code path
(We should have an xfstest for this?)

Cheers
Boaz
On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
> On 02/08/2019 22:37, Dan Williams wrote:
> > On Fri, Aug 2, 2019 at 12:30 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >>
> >> As of now dax_layout_busy_page() calls unmap_mapping_range() with last
> >> argument as 1, which says even unmap cow pages. I am wondering who needs
> >> to get rid of cow pages as well.
> >>
> >> I noticed one interesting side affect of this. I mount xfs with -o dax and
> >> mmaped a file with MAP_PRIVATE and wrote some data to a page which created
> >> cow page. Then I called fallocate() on that file to zero a page of file.
> >> fallocate() called dax_layout_busy_page() which unmapped cow pages as well
> >> and then I tried to read back the data I wrote and what I get is old
> >> data from persistent memory. I lost the data I had written. This
> >> read basically resulted in new fault and read back the data from
> >> persistent memory.
> >>
> >> This sounds wrong. Are there any users which need to unmap cow pages
> >> as well? If not, I am proposing changing it to not unmap cow pages.
> >>
> >> I noticed this while while writing virtio_fs code where when I tried
> >> to reclaim a memory range and that corrupted the executable and I
> >> was running from virtio-fs and program got segment violation.
> >>
> >> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >> ---
> >>  fs/dax.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Index: rhvgoyal-linux/fs/dax.c
> >> ===================================================================
> >> --- rhvgoyal-linux.orig/fs/dax.c 2019-08-01 17:03:10.574675652 -0400
> >> +++ rhvgoyal-linux/fs/dax.c 2019-08-02 14:32:28.809639116 -0400
> >> @@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
> >>  	 * guaranteed to either see new references or prevent new
> >>  	 * references from being established.
> >>  	 */
> >> -	unmap_mapping_range(mapping, 0, 0, 1);
> >> +	unmap_mapping_range(mapping, 0, 0, 0);
> >
> > Good find, yes, this looks correct to me and should also go to -stable.
> >
>
> Please pay attention that unmap_mapping_range(mapping, ..., 1) is for the truncate case and friends
>
> So as I understand the man page:
> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays

I tested fallocate(FL_PUNCH_HOLE) on xfs (non-dax) and it does not seem to
get rid of COW pages and my test case still can read the data it wrote
in private pages.

>
> Just saying I have not followed the above code path
> (We should have an xfstest for this?)

I don't know either. It indeed is interesting to figure out what's the
expected behavior with fallocate() and truncate() for COW pages and cover
that using xfstest (if not already done).

Irrespective of that, for dax, it seems particularly bad because we call
unmap_mapping_range() for the whole file. So even if we are punching hole
on a single page and expected cow page to go away associated with that
page, currently it will get rid of all COW pages in whole file.

So to me it makes sense to not get rid of COW pages and possibly introduce
option of performing dax_layout_busy_page() on a range of pages (as opposed
to whole file) and caller can specify whether to zap cow pages or not
in the specified range.

Thanks
Vivek
On 05/08/2019 21:49, Vivek Goyal wrote:
> On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
<>
>> So as I understand the man page:
>> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
>> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays
>
> I tested fallocate(FL_PUNCH_HOLE) on xfs (non-dax) and it does not seem to
> get rid of COW pages and my test case still can read the data it wrote
> in private pages.
>

It seems you are right and I am wrong. This is what the Kernel code has to say about it:

/*
 * Unlike in truncate_pagecache, unmap_mapping_range is called only
 * once (before truncating pagecache), and without "even_cows" flag:
 * hole-punching should not remove private COWed pages from the hole.
 */

For me this is confusing but that is what it is. So remove private COWed pages
is only done when we do an setattr(ATTR_SIZE).

>>
>> Just saying I have not followed the above code path
>> (We should have an xfstest for this?)
>
> I don't know either. It indeed is interesting to figure out what's the
> expected behavior with fallocate() and truncate() for COW pages and cover
> that using xfstest (if not already done).
>

I could not find any test for the COW positive FL_PUNCH_HOLE (I have that bug)
could be nice to make one, and let FSs like mine fail.
Any way very nice catch.

>
> Thanks
> Vivek
>

Thanks
Boaz
On Mon, Aug 5, 2019 at 12:17 PM Boaz Harrosh <boaz@plexistor.com> wrote:
>
> On 05/08/2019 21:49, Vivek Goyal wrote:
> > On Mon, Aug 05, 2019 at 02:53:06PM +0300, Boaz Harrosh wrote:
> <>
> >> So as I understand the man page:
> >> fallocate(FL_PUNCH_HOLE); means user is asking to get rid also of COW pages.
> >> On the other way fallocate(FL_ZERO_RANGE) only the pmem portion is zeroed and COW (private pages) stays
> >
> > I tested fallocate(FL_PUNCH_HOLE) on xfs (non-dax) and it does not seem to
> > get rid of COW pages and my test case still can read the data it wrote
> > in private pages.
> >
>
> It seems you are right and I am wrong. This is what the Kernel code has to say about it:
>
> /*
>  * Unlike in truncate_pagecache, unmap_mapping_range is called only
>  * once (before truncating pagecache), and without "even_cows" flag:
>  * hole-punching should not remove private COWed pages from the hole.
>  */
>
> For me this is confusing but that is what it is. So remove private COWed pages
> is only done when we do an setattr(ATTR_SIZE).
>
> >>
> >> Just saying I have not followed the above code path
> >> (We should have an xfstest for this?)
> >
> > I don't know either. It indeed is interesting to figure out what's the
> > expected behavior with fallocate() and truncate() for COW pages and cover
> > that using xfstest (if not already done).
> >
>
> I could not find any test for the COW positive FL_PUNCH_HOLE (I have that bug)
> could be nice to make one, and let FSs like mine fail.
> Any way very nice catch.
>

Yes, and this bug is worse because it affects COW pages that are not
the direct target of the truncate / hole punch. This unmap in
dax_layout_busy_page() is only there to allow the fs to synchronize
against get_user_pages_fast() which might otherwise race to grab a
page reference and prevent the fs from making forward progress.

The unmap_mapping_range() that addresses COW pages in the truncated
range occurs later after the filesystem has regained control of the
extent layout (i.e. break layouts has succeeded).
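
For readers unfamiliar with the last argument of unmap_mapping_range(), the distinction discussed above can be modelled in a few lines of user-space C. This is only a toy sketch: `struct toy_pte` and `toy_unmap_range()` are hypothetical names invented for illustration and are not kernel code. The model assumes the behaviour described in this thread, namely that file-backed mappings are always dropped while private COW copies are dropped only when `even_cows` is nonzero.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapped page in a process address space. */
struct toy_pte {
    bool present;      /* page is currently mapped */
    bool private_cow;  /* page is a private copy created by a COW fault */
};

/*
 * Toy analogue of unmap_mapping_range(mapping, start, len, even_cows):
 * file-backed entries are always unmapped, private COW copies only when
 * even_cows is nonzero. Returns the number of entries unmapped.
 */
size_t toy_unmap_range(struct toy_pte *ptes, size_t n, int even_cows)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ptes[i].present)
            continue;
        if (ptes[i].private_cow && !even_cows)
            continue;   /* keep the process's private data */
        ptes[i].present = false;
        unmapped++;
    }
    return unmapped;
}
```

In this model, calling `toy_unmap_range(ptes, n, 0)` is enough to force fresh faults on the shared file pages, which is what the synchronization against get_user_pages_fast() needs, while leaving private copies alone.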
Index: rhvgoyal-linux/fs/dax.c
===================================================================
--- rhvgoyal-linux.orig/fs/dax.c	2019-08-01 17:03:10.574675652 -0400
+++ rhvgoyal-linux/fs/dax.c	2019-08-02 14:32:28.809639116 -0400
@@ -600,7 +600,7 @@ struct page *dax_layout_busy_page(struct
 	 * guaranteed to either see new references or prevent new
 	 * references from being established.
 	 */
-	unmap_mapping_range(mapping, 0, 0, 1);
+	unmap_mapping_range(mapping, 0, 0, 0);
 
 	xas_lock_irq(&xas);
 	xas_for_each(&xas, entry, ULONG_MAX) {
As of now dax_layout_busy_page() calls unmap_mapping_range() with last
argument as 1, which says even unmap cow pages. I am wondering who needs
to get rid of cow pages as well.

I noticed one interesting side effect of this. I mounted xfs with -o dax,
mmapped a file with MAP_PRIVATE and wrote some data to a page, which created
a cow page. Then I called fallocate() on that file to zero a page of the file.
fallocate() called dax_layout_busy_page(), which unmapped cow pages as well.
When I then tried to read back the data I wrote, what I got was old
data from persistent memory. I lost the data I had written. This
read basically resulted in a new fault and read back the data from
persistent memory.

This sounds wrong. Are there any users which need to unmap cow pages
as well? If not, I am proposing changing it to not unmap cow pages.

I noticed this while writing virtio_fs code, where I tried
to reclaim a memory range and that corrupted the executable I
was running from virtio-fs, and the program got a segmentation violation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
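
The scenario in the commit message can be approximated from user space. The following is a minimal sketch and not the author's actual test case: the function name `private_cow_survives_punch()` is hypothetical, and `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` is an assumption, since the message only says fallocate() was used "to zero a page". The path is supplied by the caller and should live on the filesystem under test (for example xfs mounted with -o dax).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for fallocate() and FALLOC_FL_* */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if data written through a MAP_PRIVATE mapping is still visible
 * after a hole is punched over the same page, 0 if it was lost, and -1 if
 * the check could not run (for example, the filesystem rejects the punch).
 */
int private_cow_survives_punch(const char *path)
{
    static const char marker[] = "private COW data";
    long page = sysconf(_SC_PAGESIZE);
    char *map = MAP_FAILED;
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (page <= 0 || ftruncate(fd, page) != 0)
        goto out;

    map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* The first write to a private mapping creates a COW page. */
    memcpy(map, marker, sizeof(marker));

    /* Punch out the backing page; the private copy is expected to stay. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, page) != 0)
        goto out;

    /* If the COW page was unmapped, this read refaults from the file. */
    result = (memcmp(map, marker, sizeof(marker)) == 0);

out:
    if (map != MAP_FAILED)
        munmap(map, page);
    close(fd);
    unlink(path);
    return result;
}
```

On a kernel with the behaviour described in the commit message, a call on a DAX mount would be expected to return 0; with the one-line change applied, or on a non-DAX filesystem as tested earlier in the thread, it would be expected to return 1.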