Message ID | 1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Hi Kirill,

On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
>
> __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
>
> Re-aquiring the lock should be fine since we check i_size after the
> point.
>
> Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  fs/dax.c    | 35 +++++++++++++++++++----------------
>  mm/memory.c | 11 ++---------
>  2 files changed, 21 insertions(+), 25 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 9ef9b80cc132..ed54efedade6 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>  		goto fallback;
>
> +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> +		int i;
> +		for (i = 0; i < PTRS_PER_PMD; i++)
> +			clear_page(kaddr + i * PAGE_SIZE);

This patch, now upstream as commit 46c043ede471, moves the call to
clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
set at this point, so I'm not sure this path was ever tested.

I'm also not sure why the compiler is not complaining about an
uninitialized variable?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> Hi Kirill,
>
> On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> >
> > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> >
> > Re-aquiring the lock should be fine since we check i_size after the
> > point.
> >
> > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  fs/dax.c    | 35 +++++++++++++++++++----------------
> >  mm/memory.c | 11 ++---------
> >  2 files changed, 21 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9ef9b80cc132..ed54efedade6 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> >  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> >  		goto fallback;
> >
> > +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > +		int i;
> > +		for (i = 0; i < PTRS_PER_PMD; i++)
> > +			clear_page(kaddr + i * PAGE_SIZE);
>
> This patch, now upstream as commit 46c043ede471, moves the call to
> clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
> set at this point, so I'm not sure this path was ever tested.

Ughh. It's obviously broken.

I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
block around. It should work fine where it was before. Right?
Matthew?

> I'm also not sure why the compiler is not complaining about an
> uninitialized variable?

No idea.
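On the compiler question, one possible explanation (an assumption; nobody in this thread confirms it) is that 'kaddr' is filled in further down through an out-parameter, and GCC has historically been unreliable at warning about early reads of a local whose address is taken. The user-space sketch below is not kernel code: toy_lookup(), toy_zero_pages() and toy_fault() are hypothetical names. It only shows how initializing the pointer to NULL turns the ordering bug into a checkable error instead of a read of an indeterminate value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64 /* hypothetical, tiny so the demo stays cheap */

/* Hypothetical stand-in for a lookup that returns an address through an
 * out-parameter. */
static int toy_lookup(unsigned char *backing, void **kaddr_out)
{
	if (backing == NULL)
		return -1;
	*kaddr_out = backing;
	return 0;
}

/* Zero npages pages starting at kaddr; refuse a pointer that was never set. */
static int toy_zero_pages(void *kaddr, size_t npages)
{
	size_t i;

	if (kaddr == NULL)
		return -1;
	for (i = 0; i < npages; i++)
		memset((unsigned char *)kaddr + i * TOY_PAGE_SIZE, 0, TOY_PAGE_SIZE);
	return 0;
}

/* zero_first != 0 reproduces the ordering bug: zeroing before the lookup. */
static int toy_fault(unsigned char *backing, size_t npages, int zero_first)
{
	void *kaddr = NULL; /* explicit NULL instead of an indeterminate value */

	assert(npages > 0);
	if (zero_first)
		return toy_zero_pages(kaddr, npages); /* kaddr not resolved yet */
	if (toy_lookup(backing, &kaddr) != 0)
		return -1;
	return toy_zero_pages(kaddr, npages);
}
```

With zero_first set, toy_fault() fails cleanly and leaves the buffer untouched; with the lookup done first, the buffer is zeroed as intended.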
On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> > Hi Kirill,
> >
> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> > >
> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> > >
> > > Re-aquiring the lock should be fine since we check i_size after the
> > > point.
> > >
> > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  fs/dax.c    | 35 +++++++++++++++++++----------------
> > >  mm/memory.c | 11 ++---------
> > >  2 files changed, 21 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index 9ef9b80cc132..ed54efedade6 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > >  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> > >  		goto fallback;
> > >
> > > +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > > +		int i;
> > > +		for (i = 0; i < PTRS_PER_PMD; i++)
> > > +			clear_page(kaddr + i * PAGE_SIZE);
> >
> > This patch, now upstream as commit 46c043ede471, moves the call to
> > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
> > set at this point, so I'm not sure this path was ever tested.
>
> Ughh. It's obviously broken.
>
> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
> block around. It should work fine where it was before. Right?
> Matthew?

Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
correct to me. Matthew is out for a while, so we should probably take care of
this without him.

Kirill, do you want to whip up a quick patch? I'm happy to do it if you're
busy.
On Thu, Sep 17, 2015 at 8:41 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
>> > Hi Kirill,
>> >
>> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
>> > <kirill.shutemov@linux.intel.com> wrote:
>> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
>> > >
>> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
>> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
>> > >
>> > > Re-aquiring the lock should be fine since we check i_size after the
>> > > point.
>> > >
>> > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > > ---
>> > >  fs/dax.c    | 35 +++++++++++++++++++----------------
>> > >  mm/memory.c | 11 ++---------
>> > >  2 files changed, 21 insertions(+), 25 deletions(-)
>> > >
>> > > diff --git a/fs/dax.c b/fs/dax.c
>> > > index 9ef9b80cc132..ed54efedade6 100644
>> > > --- a/fs/dax.c
>> > > +++ b/fs/dax.c
>> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>> > >  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>> > >  		goto fallback;
>> > >
>> > > +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
>> > > +		int i;
>> > > +		for (i = 0; i < PTRS_PER_PMD; i++)
>> > > +			clear_page(kaddr + i * PAGE_SIZE);
>> >
>> > This patch, now upstream as commit 46c043ede471, moves the call to
>> > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
>> > set at this point, so I'm not sure this path was ever tested.
>>
>> Ughh. It's obviously broken.
>>
>> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
>> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
>> block around. It should work fine where it was before. Right?
>> Matthew?
>
> Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
> correct to me. Matthew is out for a while, so we should probably take care of
> this without him.

I'd say leave it at its current location and add a local call to
bdev_direct_access() as I'm not sure you'd want to trigger one of the
failure conditions without having zeroed the page. I.e. right before
vmf_insert_pfn_pmd() is probably too late.
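The ordering suggested above (resolve the address locally, zero the new storage, and only then run the checks that can bail out) can be sketched in user space as follows. This is an illustration under stated assumptions, not the eventual kernel fix: toy_direct_access(), toy_pmd_fault(), struct toy_block and the TOY_* constants are all hypothetical stand-ins, and the "misaligned" flag merely simulates one of the later failure conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64   /* hypothetical page size for the demo */
#define TOY_PTRS_PER_PMD 4 /* hypothetical pages per huge mapping */
#define TOY_PMD_SIZE (TOY_PAGE_SIZE * TOY_PTRS_PER_PMD)

enum toy_result { TOY_MAPPED, TOY_FALLBACK };

struct toy_block {
	unsigned char data[TOY_PMD_SIZE];
	int is_new;     /* freshly allocated, contents are stale */
	int misaligned; /* forces the later fallback check to fail */
};

/* Hypothetical stand-in for the address lookup; returns the usable length. */
static long toy_direct_access(struct toy_block *blk, void **kaddr_out)
{
	*kaddr_out = blk->data;
	return TOY_PMD_SIZE;
}

static enum toy_result toy_pmd_fault(struct toy_block *blk)
{
	void *kaddr = NULL;
	int i;

	assert(blk != NULL);
	if (blk->is_new) {
		/* Local lookup so kaddr is valid before it is used. */
		long length = toy_direct_access(blk, &kaddr);

		if (length < TOY_PMD_SIZE)
			return TOY_FALLBACK;
		for (i = 0; i < TOY_PTRS_PER_PMD; i++)
			memset((unsigned char *)kaddr + i * TOY_PAGE_SIZE, 0,
			       TOY_PAGE_SIZE);
		blk->is_new = 0;
	}

	/* Later failure conditions run only after the storage is zeroed. */
	if (blk->misaligned)
		return TOY_FALLBACK;

	return TOY_MAPPED;
}
```

Even when the fault falls back, the new block has already been zeroed, which is the property the suggestion is trying to preserve.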
Ross Zwisler wrote:
> On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> > > Hi Kirill,
> > >
> > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> > > >
> > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> > > >
> > > > Re-aquiring the lock should be fine since we check i_size after the
> > > > point.
> > > >
> > > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > ---
> > > >  fs/dax.c    | 35 +++++++++++++++++++----------------
> > > >  mm/memory.c | 11 ++---------
> > > >  2 files changed, 21 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index 9ef9b80cc132..ed54efedade6 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > >  	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> > > >  		goto fallback;
> > > >
> > > > +	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > > > +		int i;
> > > > +		for (i = 0; i < PTRS_PER_PMD; i++)
> > > > +			clear_page(kaddr + i * PAGE_SIZE);
> > >
> > > This patch, now upstream as commit 46c043ede471, moves the call to
> > > clear_page() earlier in __dax_pmd_fault(). However, 'kaddr' is not
> > > set at this point, so I'm not sure this path was ever tested.
> >
> > Ughh. It's obviously broken.
> >
> > I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
> > would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
> > block around. It should work fine where it was before. Right?
> > Matthew?
>
> Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
> correct to me. Matthew is out for a while, so we should probably take care of
> this without him.
>
> Kirill, do you want to whip up a quick patch? I'm happy to do it if you're
> busy.

It would be better if you prepared the patch. Thanks.
diff --git a/fs/dax.c b/fs/dax.c
index 9ef9b80cc132..ed54efedade6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
 		goto fallback;

+	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
+		int i;
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			clear_page(kaddr + i * PAGE_SIZE);
+		count_vm_event(PGMAJFAULT);
+		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+		result |= VM_FAULT_MAJOR;
+	}
+
+	/*
+	 * If we allocated new storage, make sure no process has any
+	 * zero pages covering this hole
+	 */
+	if (buffer_new(&bh)) {
+		i_mmap_unlock_write(mapping);
+		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
+		i_mmap_lock_write(mapping);
+	}
+
 	/*
 	 * If a truncate happened while we were allocating blocks, we may
 	 * leave blocks allocated to the file that are beyond EOF. We can't
@@ -568,13 +587,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if ((pgoff | PG_PMD_COLOUR) >= size)
 		goto fallback;

-	/*
-	 * If we allocated new storage, make sure no process has any
-	 * zero pages covering this hole
-	 */
-	if (buffer_new(&bh))
-		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
-
 	if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
 		spinlock_t *ptl;
 		pmd_t entry;
@@ -605,15 +617,6 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;

-		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-			int i;
-			for (i = 0; i < PTRS_PER_PMD; i++)
-				clear_page(kaddr + i * PAGE_SIZE);
-			count_vm_event(PGMAJFAULT);
-			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-			result |= VM_FAULT_MAJOR;
-		}
-
 		result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
 	}

diff --git a/mm/memory.c b/mm/memory.c
index 5a3427bb3f32..670cdfa9f33e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2426,17 +2426,10 @@ void unmap_mapping_range(struct address_space *mapping,
 	if (details.last_index < details.first_index)
 		details.last_index = ULONG_MAX;

-
-	/*
-	 * DAX already holds i_mmap_lock to serialise file truncate vs
-	 * page fault and page fault vs page fault.
-	 */
-	if (!IS_DAX(mapping->host))
-		i_mmap_lock_write(mapping);
+	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
-	if (!IS_DAX(mapping->host))
-		i_mmap_unlock_write(mapping);
+	i_mmap_unlock_write(mapping);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.

__dax_pmd_fault() uses unmap_mapping_range() to shoot out the zero page
from all mappings. We need to drop i_mmap_lock there to avoid a deadlock.

Re-acquiring the lock should be fine since we check i_size after that
point.

Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/dax.c    | 35 +++++++++++++++++++----------------
 mm/memory.c | 11 ++---------
 2 files changed, 21 insertions(+), 25 deletions(-)
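The locking change described in the commit message can be illustrated with a single-threaded user-space model. Everything here is a hypothetical stand-in (struct toy_lock, toy_unmap_range(), toy_fault()); the toy lock reports a second acquisition as an error instead of hanging, so the self-deadlock is observable. The sketch assumes, as the commit message says, that the size check happens after the lock is re-acquired.

```c
#include <assert.h>
#include <stddef.h>

/* Toy non-recursive lock: acquiring it twice reports the self-deadlock
 * instead of hanging, which keeps the demo single-threaded. */
struct toy_lock { int held; };

static int toy_lock_write(struct toy_lock *l)
{
	if (l->held)
		return -1; /* a real non-recursive lock would deadlock here */
	l->held = 1;
	return 0;
}

static void toy_unlock_write(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_mapping {
	struct toy_lock i_mmap_lock;
	size_t size; /* stand-in for i_size */
};

/* Like the patched unmap_mapping_range(): always takes the lock itself. */
static int toy_unmap_range(struct toy_mapping *m)
{
	if (toy_lock_write(&m->i_mmap_lock) != 0)
		return -1;
	/* ... the mappings would be walked here ... */
	toy_unlock_write(&m->i_mmap_lock);
	return 0;
}

/* The caller holds the lock on entry and on return. Returns 0 on success,
 * -1 if the helper would have deadlocked, 1 if offset is past the size. */
static int toy_fault(struct toy_mapping *m, size_t offset, int drop_lock)
{
	int ret;

	assert(m->i_mmap_lock.held);
	if (drop_lock)
		toy_unlock_write(&m->i_mmap_lock);
	ret = toy_unmap_range(m);
	if (drop_lock && toy_lock_write(&m->i_mmap_lock) != 0)
		return -1;
	if (ret != 0)
		return -1;
	/* State may have changed while unlocked, so check the size afterwards. */
	if (offset >= m->size)
		return 1;
	return 0;
}
```

Calling toy_fault() with drop_lock set to 0 models the deadlock the patch avoids; with drop_lock set to 1 the helper succeeds and the size check still runs under the re-acquired lock.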