Message ID | 1448309120-20911-1-git-send-email-toshi.kani@hpe.com (mailing list archive) |
---|---|
State | Superseded |
On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.
>
> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW. However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
>
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>

I thought the patch from Ross already addressed the infinite loop:

https://patchwork.kernel.org/patch/7653731/

> ---
>  fs/dax.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 43671b6..3405583 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 return VM_FAULT_FALLBACK;
>
>         /* Fall back to PTEs if we're going to COW */
> -       if (write && !(vma->vm_flags & VM_SHARED))
> +       if (write && !(vma->vm_flags & VM_SHARED)) {
> +               split_huge_page_pmd(vma, address, pmd);
>                 return VM_FAULT_FALLBACK;
> +       }
>         /* If the PMD would extend outside the VMA */
>         if (pmd_addr < vma->vm_start)
>                 return VM_FAULT_FALLBACK;

This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
a complete fix.
On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > An infinite loop of PMD faults was observed when attempted to
> > mlock() a private read-only PMD mmap'd range of a DAX file.
> >
> > __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> > falling back to PTE on COW. However, __handle_mm_fault()
> > returns without falling back to handle_pte_fault() because
> > a PMD map is present in this case.
> >
> > Change __dax_pmd_fault() to split the PMD map, if present,
> > before returning with VM_FAULT_FALLBACK.
> >
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Cc: Matthew Wilcox <willy@linux.intel.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>
> I thought the patch from Ross already addressed the infinite loop:
>
> https://patchwork.kernel.org/patch/7653731/

This fixes a different issue. I hit this one while testing my other patch
along with Ross's patch.

> > ---
> >  fs/dax.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 43671b6..3405583 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma,
> > unsigned long address,
> >                 return VM_FAULT_FALLBACK;
> >
> >         /* Fall back to PTEs if we're going to COW */
> > -       if (write && !(vma->vm_flags & VM_SHARED))
> > +       if (write && !(vma->vm_flags & VM_SHARED)) {
> > +               split_huge_page_pmd(vma, address, pmd);
> >                 return VM_FAULT_FALLBACK;
> > +       }
> >         /* If the PMD would extend outside the VMA */
> >         if (pmd_addr < vma->vm_start)
> >                 return VM_FAULT_FALLBACK;
>
> This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
> a complete fix.

Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.

Thanks,
-Toshi
On Mon, Nov 23, 2015 at 12:45 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
>> On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
[..]
>> This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
>> a complete fix.
>
> Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.
>

Indeed it is... I think that's wrong because transparent huge pages
rely on struct page??
On Mon, 2015-11-23 at 12:56 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:45 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> > On Mon, 2015-11-23 at 12:45 -0800, Dan Williams wrote:
> > > On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> [..]
> > > This is a nop if CONFIG_TRANSPARENT_HUGEPAGE=n, so I don't think it's
> > > a complete fix.
> >
> > Well, __dax_pmd_fault() itself depends on CONFIG_TRANSPARENT_HUGEPAGE.
> >
>
> Indeed it is... I think that's wrong because transparent huge pages
> rely on struct page??

I do not think this issue is related to struct page. wp_huge_pmd() calls
either do_huge_pmd_wp_page() or dax_pmd_fault(). do_huge_pmd_wp_page()
splits a pmd page when it returns with VM_FAULT_FALLBACK. So, this change
keeps them consistent on VM_FAULT_FALLBACK.

Thanks,
-Toshi
On Mon, 2015-11-23 at 13:05 -0700, Toshi Kani wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.

Typo: the above description should be (remove "read-only"):

  An infinite loop of PMD faults was observed when attempted to
  mlock() a private PMD mmap'd range of a DAX file.

-Toshi

> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW. However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
>
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.
On Mon, Nov 23, 2015 at 12:05 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> An infinite loop of PMD faults was observed when attempted to
> mlock() a private read-only PMD mmap'd range of a DAX file.
>
> __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
> falling back to PTE on COW. However, __handle_mm_fault()
> returns without falling back to handle_pte_fault() because
> a PMD map is present in this case.
>
> Change __dax_pmd_fault() to split the PMD map, if present,
> before returning with VM_FAULT_FALLBACK.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  fs/dax.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 43671b6..3405583 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 return VM_FAULT_FALLBACK;
>
>         /* Fall back to PTEs if we're going to COW */
> -       if (write && !(vma->vm_flags & VM_SHARED))
> +       if (write && !(vma->vm_flags & VM_SHARED)) {
> +               split_huge_page_pmd(vma, address, pmd);
>                 return VM_FAULT_FALLBACK;
> +       }

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

I took a closer look at dax's CONFIG_TRANSPARENT_HUGEPAGE interactions
and it turns out THP is a performance enhancement not a functional
dependency. I.e. a performance enhancement to use a huge_zero_page
where available, but not a requirement.

I'll fold this in with my series make pmd_trans_huge() return false
for non-huge_zero_page dax mappings, and in that case I'll need to
up-level the call to pmdp_huge_clear_flush_notify() from
__split_huge_page_pmd.
diff --git a/fs/dax.c b/fs/dax.c
index 43671b6..3405583 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -546,8 +546,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		return VM_FAULT_FALLBACK;
 
 	/* Fall back to PTEs if we're going to COW */
-	if (write && !(vma->vm_flags & VM_SHARED))
+	if (write && !(vma->vm_flags & VM_SHARED)) {
+		split_huge_page_pmd(vma, address, pmd);
 		return VM_FAULT_FALLBACK;
+	}
 	/* If the PMD would extend outside the VMA */
 	if (pmd_addr < vma->vm_start)
 		return VM_FAULT_FALLBACK;
An infinite loop of PMD faults was observed when attempted to
mlock() a private read-only PMD mmap'd range of a DAX file.

__dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when
falling back to PTE on COW. However, __handle_mm_fault()
returns without falling back to handle_pte_fault() because
a PMD map is present in this case.

Change __dax_pmd_fault() to split the PMD map, if present,
before returning with VM_FAULT_FALLBACK.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
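
Editorial illustration (not part of the patch): the loop described in the changelog can be modelled in user space. This is only a toy sketch in plain C, not kernel code. Every name in it (toy_mm, toy_pmd_fault, toy_handle_fault, toy_fault_until_resolved) is hypothetical, and the control flow is a simplified reading of the changelog: the PMD handler reports a fallback, and the generic path skips PTE handling while a PMD map is still present.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy result codes; illustrative values only, not the kernel's. */
enum toy_fault { TOY_FAULT_DONE, TOY_FAULT_FALLBACK };

/* Toy mapping state for a single PMD-sized range. */
struct toy_mm {
	bool pmd_mapped;	/* a PMD map is present */
	bool pte_private;	/* a private PTE copy was installed (COW done) */
};

/* Models the COW branch of the PMD fault handler (hypothetical). */
static enum toy_fault toy_pmd_fault(struct toy_mm *mm, bool split_on_fallback)
{
	if (split_on_fallback)
		mm->pmd_mapped = false;	/* stands in for splitting the PMD map */
	return TOY_FAULT_FALLBACK;
}

/* Models one pass of the generic fault path; true if the fault was resolved. */
static bool toy_handle_fault(struct toy_mm *mm, bool split_on_fallback)
{
	if (mm->pmd_mapped) {
		if (toy_pmd_fault(mm, split_on_fallback) != TOY_FAULT_FALLBACK)
			return true;
		/* PMD map still present: return without PTE handling. */
		if (mm->pmd_mapped)
			return false;
	}
	mm->pte_private = true;	/* stands in for the PTE-level COW */
	return true;
}

/*
 * Retries the write fault until it is resolved. Returns the number of
 * passes taken, or -1 if max_tries passes made no progress.
 */
int toy_fault_until_resolved(bool split_on_fallback, int max_tries)
{
	struct toy_mm mm = { .pmd_mapped = true, .pte_private = false };

	for (int tries = 1; tries <= max_tries; tries++) {
		if (toy_handle_fault(&mm, split_on_fallback)) {
			assert(mm.pte_private && !mm.pmd_mapped);
			return tries;
		}
	}
	return -1;
}
```

In this model, toy_fault_until_resolved(false, 1000) never makes progress and gives up, which corresponds to the reported loop, while toy_fault_until_resolved(true, 1000) resolves on the first pass because the PMD map is gone by the time the generic path checks it.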