Message ID | 1450974037-24775-8-git-send-email-matthew.r.wilcox@intel.com (mailing list archive) |
---|---|
State | Superseded |
On Thu, Dec 24, 2015 at 11:20:36AM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> Call into DAX to provide support for PUD pages, just like the PMD cases.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  fs/xfs/xfs_file.c  | 33 +++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_trace.h |  1 +
>  2 files changed, 34 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index f5392ab..a81b942 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1600,6 +1600,38 @@ xfs_filemap_pmd_fault(
>  	return ret;
>  }
>
> +STATIC int
> +xfs_filemap_pud_fault(
> +	struct vm_area_struct	*vma,
> +	unsigned long		addr,
> +	pud_t			*pud,
> +	unsigned int		flags)
> +{
> +	struct inode		*inode = file_inode(vma->vm_file);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	int			ret;
> +
> +	if (!IS_DAX(inode))
> +		return VM_FAULT_FALLBACK;
> +
> +	trace_xfs_filemap_pud_fault(ip);
> +
> +	if (flags & FAULT_FLAG_WRITE) {
> +		sb_start_pagefault(inode->i_sb);
> +		file_update_time(vma->vm_file);
> +	}
> +
> +	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +	ret = __dax_pud_fault(vma, addr, pud, flags, xfs_get_blocks_dax_fault,
> +			      NULL);
> +	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +
> +	if (flags & FAULT_FLAG_WRITE)
> +		sb_end_pagefault(inode->i_sb);
> +
> +	return ret;
> +}
> +
>  /*
>   * pfn_mkwrite was originally inteneded to ensure we capture time stamp
>   * updates on write faults. In reality, it's need to serialise against
> @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
>  static const struct vm_operations_struct xfs_file_vm_ops = {
>  	.fault		= xfs_filemap_fault,
>  	.pmd_fault	= xfs_filemap_pmd_fault,
> +	.pud_fault	= xfs_filemap_pud_fault,

This is getting silly - we now have 3 different page fault handlers
that all do exactly the same thing. Please abstract this so that the
page/pmd/pud is transparent and gets passed through to the generic
handler code that then handles the differences between page/pmd/pud
internally.

This, after all, is the original reason that the ->fault handler was
introduced....

Cheers,

Dave.

On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> >  static const struct vm_operations_struct xfs_file_vm_ops = {
> >  	.fault		= xfs_filemap_fault,
> >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > +	.pud_fault	= xfs_filemap_pud_fault,
>
> This is getting silly - we now have 3 different page fault handlers
> that all do exactly the same thing. Please abstract this so that the
> page/pmd/pud is transparent and gets passed through to the generic
> handler code that then handles the differences between page/pmd/pud
> internally.
>
> This, after all, is the original reason that the ->fault handler was
> introduced....

I agree that it's silly, but this is the direction I was asked to go in by
the MM people at the last MM summit. There was agreement that this needs
to be abstracted, but that should be left for a separate cleanup round.
I did prototype something I called a vpte (virtual pte), but that's very
much on the back burner for now.

On Sat, Jan 02, 2016 at 11:43:09AM -0500, Matthew Wilcox wrote:
> On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> > >  static const struct vm_operations_struct xfs_file_vm_ops = {
> > >  	.fault		= xfs_filemap_fault,
> > >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > > +	.pud_fault	= xfs_filemap_pud_fault,
> >
> > This is getting silly - we now have 3 different page fault handlers
> > that all do exactly the same thing. Please abstract this so that the
> > page/pmd/pud is transparent and gets passed through to the generic
> > handler code that then handles the differences between page/pmd/pud
> > internally.
> >
> > This, after all, is the original reason that the ->fault handler was
> > introduced....
>
> I agree that it's silly, but this is the direction I was asked to go in by
> the MM people at the last MM summit. There was agreement that this needs
> to be abstracted, but that should be left for a separate cleanup round.

Ok, so it's time to abstract it now, before we end up with another
round of broken filesystem code (like the first attempts at the
XFS pmd_fault code).

> I did prototype something I called a vpte (virtual pte), but that's very
> much on the back burner for now.

It's trivial to pack the parameters for pmd_fault and pud_fault
into the struct vm_fault - all you need to do is add pmd_t/pud_t
pointers to the structure, and everything else can be put into
existing members of that structure. There's no need for a "virtual
pte" type anywhere - you can do this effectively with an anonymous
union for the pte/pmd/pud pointer and a flag to indicate the fault
type.

Then in __dax_fault() you can check vmf->flags and call the
appropriate __dax_p{te,md,ud}_fault function, all without the
filesystem having to care about the different fault types. Similar
can be done with filemap_fault() - if it gets pmd/pud fault flags
set it can just reject them as they should never occur right now...

Cheers,

Dave.
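
For readers following the thread, the layout suggested in the message above might look roughly like the userspace sketch below. It is not kernel code and not taken from any posted patch: the `sketch_` names, the `vm_fault_sketch` structure, the flag values, and the return codes are all hypothetical stand-ins chosen for illustration. The only ideas carried over from the message are the anonymous union for the pte/pmd/pud pointer, a flag that names the fault type, a single dispatch point on the DAX side, and a page-cache path that rejects pmd/pud faults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's page-table entry types. */
typedef struct { unsigned long val; } pte_t;
typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } pud_t;

/* Hypothetical flag bits naming the fault size; not real kernel flags. */
#define SKETCH_FAULT_SIZE_PTE	0x0000u
#define SKETCH_FAULT_SIZE_PMD	0x1000u
#define SKETCH_FAULT_SIZE_PUD	0x2000u
#define SKETCH_FAULT_SIZE_MASK	0x3000u

/* Illustrative return codes for this sketch only. */
enum sketch_result {
	SKETCH_HANDLED_PTE = 1,
	SKETCH_HANDLED_PMD,
	SKETCH_HANDLED_PUD,
	SKETCH_REJECTED,
};

/* Simplified fault descriptor: one pointer slot shared by all three levels. */
struct vm_fault_sketch {
	unsigned int	flags;
	unsigned long	address;
	union {			/* anonymous union (C11) */
		pte_t	*pte;
		pmd_t	*pmd;
		pud_t	*pud;
	};
};

/* Stub per-level handlers; real ones would install the mapping. */
static int sketch_pte_fault(struct vm_fault_sketch *vmf)
{
	assert(vmf->pte != NULL);
	return SKETCH_HANDLED_PTE;
}

static int sketch_pmd_fault(struct vm_fault_sketch *vmf)
{
	assert(vmf->pmd != NULL);
	return SKETCH_HANDLED_PMD;
}

static int sketch_pud_fault(struct vm_fault_sketch *vmf)
{
	assert(vmf->pud != NULL);
	return SKETCH_HANDLED_PUD;
}

/* Single DAX entry point: the filesystem never sees the fault size. */
int sketch_dax_fault(struct vm_fault_sketch *vmf)
{
	switch (vmf->flags & SKETCH_FAULT_SIZE_MASK) {
	case SKETCH_FAULT_SIZE_PTE:
		return sketch_pte_fault(vmf);
	case SKETCH_FAULT_SIZE_PMD:
		return sketch_pmd_fault(vmf);
	case SKETCH_FAULT_SIZE_PUD:
		return sketch_pud_fault(vmf);
	default:
		return SKETCH_REJECTED;
	}
}

/* Page-cache path: only PTE-sized faults are expected, reject the rest. */
int sketch_filemap_fault(struct vm_fault_sketch *vmf)
{
	if (vmf->flags & SKETCH_FAULT_SIZE_MASK)
		return SKETCH_REJECTED;
	return SKETCH_HANDLED_PTE;
}
```

With a layout like this, a filesystem would register one handler that takes its locks and calls the generic entry point, rather than three near-identical handlers. How a rejected fault should be reported to the caller (for example, by falling back to a smaller page size) is left open here, since the message does not specify it.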

On Mon, Jan 04, 2016 at 07:33:56AM +1100, Dave Chinner wrote:
> On Sat, Jan 02, 2016 at 11:43:09AM -0500, Matthew Wilcox wrote:
> > On Thu, Dec 31, 2015 at 10:30:27AM +1100, Dave Chinner wrote:
> > > > @@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
> > > >  static const struct vm_operations_struct xfs_file_vm_ops = {
> > > >  	.fault		= xfs_filemap_fault,
> > > >  	.pmd_fault	= xfs_filemap_pmd_fault,
> > > > +	.pud_fault	= xfs_filemap_pud_fault,
> > >
> > > This is getting silly - we now have 3 different page fault handlers
> > > that all do exactly the same thing. Please abstract this so that the
> > > page/pmd/pud is transparent and gets passed through to the generic
> > > handler code that then handles the differences between page/pmd/pud
> > > internally.
> > >
> > > This, after all, is the original reason that the ->fault handler was
> > > introduced....
> >
> > I agree that it's silly, but this is the direction I was asked to go in by
> > the MM people at the last MM summit. There was agreement that this needs
> > to be abstracted, but that should be left for a separate cleanup round.
>
> Ok, so it's time to abstract it now, before we end up with another
> round of broken filesystem code (like the first attempts at the
> XFS pmd_fault code).
>
> > I did prototype something I called a vpte (virtual pte), but that's very
> > much on the back burner for now.
>
> It's trivial to pack the parameters for pmd_fault and pud_fault
> into the struct vm_fault - all you need to do is add pmd_t/pud_t
> pointers to the structure, and everything else can be put into
> existing members of that structure. There's no need for a "virtual
> pte" type anywhere - you can do this effectively with an anonymous
> union for the pte/pmd/pud pointer and a flag to indicate the fault
> type.
>
> Then in __dax_fault() you can check vmf->flags and call the
> appropriate __dax_p{te,md,ud}_fault function, all without the
> filesystem having to care about the different fault types. Similar
> can be done with filemap_fault() - if it gets pmd/pud fault flags
> set it can just reject them as they should never occur right now...

I think the first 4 patches of my hugetmpfs RFD patchset[1] are relevant
here. Looks like it shouldn't be a big deal to extend the approach to
cover DAX case.

[1] http://lkml.kernel.org./r/1447889136-6928-1-git-send-email-kirill.shutemov@linux.intel.com

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index f5392ab..a81b942 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1600,6 +1600,38 @@ xfs_filemap_pmd_fault(
 	return ret;
 }

+STATIC int
+xfs_filemap_pud_fault(
+	struct vm_area_struct	*vma,
+	unsigned long		addr,
+	pud_t			*pud,
+	unsigned int		flags)
+{
+	struct inode		*inode = file_inode(vma->vm_file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			ret;
+
+	if (!IS_DAX(inode))
+		return VM_FAULT_FALLBACK;
+
+	trace_xfs_filemap_pud_fault(ip);
+
+	if (flags & FAULT_FLAG_WRITE) {
+		sb_start_pagefault(inode->i_sb);
+		file_update_time(vma->vm_file);
+	}
+
+	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	ret = __dax_pud_fault(vma, addr, pud, flags, xfs_get_blocks_dax_fault,
+			      NULL);
+	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+
+	if (flags & FAULT_FLAG_WRITE)
+		sb_end_pagefault(inode->i_sb);
+
+	return ret;
+}
+
 /*
  * pfn_mkwrite was originally inteneded to ensure we capture time stamp
  * updates on write faults. In reality, it's need to serialise against
@@ -1637,6 +1669,7 @@ xfs_filemap_pfn_mkwrite(
 static const struct vm_operations_struct xfs_file_vm_ops = {
 	.fault		= xfs_filemap_fault,
 	.pmd_fault	= xfs_filemap_pmd_fault,
+	.pud_fault	= xfs_filemap_pud_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= xfs_filemap_page_mkwrite,
 	.pfn_mkwrite	= xfs_filemap_pfn_mkwrite,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb..16442bb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -688,6 +688,7 @@ DEFINE_INODE_EVENT(xfs_inode_free_eofblocks_invalid);
 DEFINE_INODE_EVENT(xfs_filemap_fault);
 DEFINE_INODE_EVENT(xfs_filemap_pmd_fault);
+DEFINE_INODE_EVENT(xfs_filemap_pud_fault);
 DEFINE_INODE_EVENT(xfs_filemap_page_mkwrite);
 DEFINE_INODE_EVENT(xfs_filemap_pfn_mkwrite);