mm: take i_mmap_lock in unmap_mapping_range() for DAX

Message ID 1438948423-128882-1-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Kirill A. Shutemov Aug. 7, 2015, 11:53 a.m. UTC
DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.

__dax_pmd_fault() uses unmap_mapping_range() to shoot out the zero page
from all mappings. We need to drop i_mmap_lock there to avoid a deadlock.

Re-acquiring the lock should be fine since we check i_size after that
point.

Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/dax.c    | 35 +++++++++++++++++++----------------
 mm/memory.c | 11 ++---------
 2 files changed, 21 insertions(+), 25 deletions(-)
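
The drop-and-retake ordering described in the commit message can be modeled in userspace. The following is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the "lock" is a plain flag rather than a kernel rwsem, and the model is single-threaded, so it shows the call ordering and nothing about real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a flag is enough to show the ordering. */
struct toy_mapping {
	bool locked;
	int unmapped;	/* how many times the range was unmapped */
};

static void toy_lock(struct toy_mapping *m)
{
	/* Taking a real non-recursive lock twice would deadlock here. */
	assert(!m->locked);
	m->locked = true;
}

static void toy_unlock(struct toy_mapping *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Models the helper after the patch: it always takes the lock itself. */
static void toy_unmap_range(struct toy_mapping *m)
{
	toy_lock(m);
	m->unmapped++;
	toy_unlock(m);
}

/* Models the fault path: it holds the lock, so it drops it around the helper. */
static void toy_fault(struct toy_mapping *m, bool newly_allocated)
{
	toy_lock(m);
	if (newly_allocated) {
		toy_unlock(m);
		toy_unmap_range(m);
		toy_lock(m);
		/* State may have changed while unlocked; recheck it here. */
	}
	toy_unlock(m);
}
```

Calling `toy_fault()` with `newly_allocated` set passes through the helper without tripping the assertion in `toy_lock()`. Removing the unlock/lock pair around `toy_unmap_range()` makes that assertion fire, which stands in for the deadlock the commit message mentions.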

Comments

Dan Williams Sept. 15, 2015, 11:52 p.m. UTC | #1
Hi Kirill,

On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
>
> __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
>
> Re-aquiring the lock should be fine since we check i_size after the
> point.
>
> Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  fs/dax.c    | 35 +++++++++++++++++++----------------
>  mm/memory.c | 11 ++---------
>  2 files changed, 21 insertions(+), 25 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 9ef9b80cc132..ed54efedade6 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>                 goto fallback;
>
> +       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> +               int i;
> +               for (i = 0; i < PTRS_PER_PMD; i++)
> +                       clear_page(kaddr + i * PAGE_SIZE);

This patch, now upstream as commit 46c043ede471, moves the call to
clear_page() earlier in __dax_pmd_fault().  However, 'kaddr' is not
set at this point, so I'm not sure this path was ever tested.  I'm
also not sure why the compiler is not complaining about an
uninitialized variable?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirill A. Shutemov Sept. 16, 2015, 11:12 a.m. UTC | #2
On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> Hi Kirill,
> 
> On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> >
> > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> >
> > Re-aquiring the lock should be fine since we check i_size after the
> > point.
> >
> > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  fs/dax.c    | 35 +++++++++++++++++++----------------
> >  mm/memory.c | 11 ++---------
> >  2 files changed, 21 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9ef9b80cc132..ed54efedade6 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> >         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> >                 goto fallback;
> >
> > +       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > +               int i;
> > +               for (i = 0; i < PTRS_PER_PMD; i++)
> > +                       clear_page(kaddr + i * PAGE_SIZE);
> 
> This patch, now upstream as commit 46c043ede471, moves the call to
> clear_page() earlier in __dax_pmd_fault().  However, 'kaddr' is not
> set at this point, so I'm not sure this path was ever tested.

Ughh. It's obviously broken.

I took the fs/dax.c part of the patch from Matthew, and I'm no longer sure
we need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
block around at all. It should work fine where it was before. Right?
Matthew?

> I'm also not sure why the compiler is not complaining about an
> uninitialized variable?

No idea.
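
One plausible explanation, offered as an assumption rather than something confirmed in this thread: older GCC documentation states that uninitialized-variable warnings are issued only for variables that are candidates for register allocation, and not for a variable whose address is taken. If 'kaddr' is handed to a lookup helper by address later in the function, an earlier read could go unreported. The sketch below uses hypothetical `model_*` names and starts the pointer at NULL so that the wrong ordering is detected instead of writing through a garbage pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SIZE 64

static unsigned char model_store[MODEL_SIZE];

/* Hypothetical stand-in for a helper that reports the mapped address. */
static long model_lookup(void **kaddr)
{
	assert(kaddr != NULL);
	*kaddr = model_store;
	return MODEL_SIZE;
}

/*
 * Zero the range either before or after the lookup. The pointer starts
 * as NULL here (an assumption of this model), so using it before the
 * lookup is caught and reported as -1.
 */
static int model_zero(int lookup_first)
{
	void *kaddr = NULL;

	if (lookup_first)
		model_lookup(&kaddr);
	if (kaddr == NULL)
		return -1;	/* used before it was set */
	memset(kaddr, 0, MODEL_SIZE);
	return 0;
}
```

Here `model_zero(0)` models the ordering Dan points out and fails cleanly, while `model_zero(1)` models zeroing after the address is known.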
Ross Zwisler Sept. 17, 2015, 3:41 p.m. UTC | #3
On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> > Hi Kirill,
> > 
> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> > >
> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> > >
> > > Re-aquiring the lock should be fine since we check i_size after the
> > > point.
> > >
> > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  fs/dax.c    | 35 +++++++++++++++++++----------------
> > >  mm/memory.c | 11 ++---------
> > >  2 files changed, 21 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index 9ef9b80cc132..ed54efedade6 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > >         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> > >                 goto fallback;
> > >
> > > +       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > > +               int i;
> > > +               for (i = 0; i < PTRS_PER_PMD; i++)
> > > +                       clear_page(kaddr + i * PAGE_SIZE);
> > 
> > This patch, now upstream as commit 46c043ede471, moves the call to
> > clear_page() earlier in __dax_pmd_fault().  However, 'kaddr' is not
> > set at this point, so I'm not sure this path was ever tested.
> 
> Ughh. It's obviously broken.
> 
> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
> block around. It should work fine where it was before. Right?
> Matthew?

Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
correct to me.  Matthew is out for a while, so we should probably take care of
this without him.

Kirill, do you want to whip up a quick patch?  I'm happy to do it if you're
busy.
Dan Williams Sept. 17, 2015, 3:46 p.m. UTC | #4
On Thu, Sep 17, 2015 at 8:41 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
>> > Hi Kirill,
>> >
>> > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
>> > <kirill.shutemov@linux.intel.com> wrote:
>> > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
>> > >
>> > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
>> > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
>> > >
>> > > Re-aquiring the lock should be fine since we check i_size after the
>> > > point.
>> > >
>> > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > > ---
>> > >  fs/dax.c    | 35 +++++++++++++++++++----------------
>> > >  mm/memory.c | 11 ++---------
>> > >  2 files changed, 21 insertions(+), 25 deletions(-)
>> > >
>> > > diff --git a/fs/dax.c b/fs/dax.c
>> > > index 9ef9b80cc132..ed54efedade6 100644
>> > > --- a/fs/dax.c
>> > > +++ b/fs/dax.c
>> > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>> > >         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>> > >                 goto fallback;
>> > >
>> > > +       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
>> > > +               int i;
>> > > +               for (i = 0; i < PTRS_PER_PMD; i++)
>> > > +                       clear_page(kaddr + i * PAGE_SIZE);
>> >
>> > This patch, now upstream as commit 46c043ede471, moves the call to
>> > clear_page() earlier in __dax_pmd_fault().  However, 'kaddr' is not
>> > set at this point, so I'm not sure this path was ever tested.
>>
>> Ughh. It's obviously broken.
>>
>> I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
>> would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
>> block around. It should work fine where it was before. Right?
>> Matthew?
>
> Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
> correct to me.  Matthew is out for a while, so we should probably take care of
> this without him.

I'd say leave it at its current location and add a local call to
bdev_direct_access() as I'm not sure you'd want to trigger one of the
failure conditions without having zeroed the page.  I.e. right before
vmf_insert_pfn_pmd() is probably too late.
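
A userspace sketch of that ordering, assuming hypothetical `sketch_*` names and a byte array in place of real storage: the address is looked up locally, right where the zeroing happens, so the range is already cleared even if a later check sends the fault to the fallback path. It is not the kernel code and does not reproduce the bdev_direct_access() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SIZE 64

enum { SKETCH_OK = 0, SKETCH_FALLBACK = 1 };

static unsigned char sketch_store[SKETCH_SIZE];

/* Hypothetical stand-in for the address lookup; returns the mapped length. */
static long sketch_lookup(void **kaddr)
{
	assert(kaddr != NULL);
	*kaddr = sketch_store;
	return SKETCH_SIZE;
}

static int sketch_fault(int is_new, int later_check_fails)
{
	void *kaddr;

	if (is_new) {
		/* Local lookup: the address is valid before it is used. */
		if (sketch_lookup(&kaddr) < SKETCH_SIZE)
			return SKETCH_FALLBACK;
		memset(kaddr, 0, SKETCH_SIZE);
	}

	/* Any failure from here on happens with the storage already zeroed. */
	if (later_check_fails)
		return SKETCH_FALLBACK;

	return SKETCH_OK;
}
```

With this ordering, `sketch_fault(1, 1)` reports a fallback but leaves `sketch_store` cleared, which is the property the comment above is concerned about.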
Kirill A. Shutemov Sept. 17, 2015, 3:47 p.m. UTC | #5
Ross Zwisler wrote:
> On Wed, Sep 16, 2015 at 02:12:18PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Sep 15, 2015 at 04:52:42PM -0700, Dan Williams wrote:
> > > Hi Kirill,
> > > 
> > > On Fri, Aug 7, 2015 at 4:53 AM, Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > > DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
> > > >
> > > > __dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
> > > > all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
> > > >
> > > > Re-aquiring the lock should be fine since we check i_size after the
> > > > point.
> > > >
> > > > Not-yet-signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > ---
> > > >  fs/dax.c    | 35 +++++++++++++++++++----------------
> > > >  mm/memory.c | 11 ++---------
> > > >  2 files changed, 21 insertions(+), 25 deletions(-)
> > > >
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index 9ef9b80cc132..ed54efedade6 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -554,6 +554,25 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > > >         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
> > > >                 goto fallback;
> > > >
> > > > +       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> > > > +               int i;
> > > > +               for (i = 0; i < PTRS_PER_PMD; i++)
> > > > +                       clear_page(kaddr + i * PAGE_SIZE);
> > > 
> > > This patch, now upstream as commit 46c043ede471, moves the call to
> > > clear_page() earlier in __dax_pmd_fault().  However, 'kaddr' is not
> > > set at this point, so I'm not sure this path was ever tested.
> > 
> > Ughh. It's obviously broken.
> > 
> > I took fs/dax.c part of the patch from Matthew. And I'm not sure now we
> > would need to move this "if (buffer_unwritten(&bh) || buffer_new(&bh)) {"
> > block around. It should work fine where it was before. Right?
> > Matthew?
> 
> Moving the "if (buffer_unwritten(&bh) || buffer_new(&bh)) {" block back seems
> correct to me.  Matthew is out for a while, so we should probably take care of
> this without him.
> 
> Kirill, do you want to whip up a quick patch?  I'm happy to do it if you're
> busy.

It would be better if you prepared the patch. Thanks.
Patch

diff --git a/fs/dax.c b/fs/dax.c
index 9ef9b80cc132..ed54efedade6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -554,6 +554,25 @@  int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
 		goto fallback;
 
+	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
+		int i;
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			clear_page(kaddr + i * PAGE_SIZE);
+		count_vm_event(PGMAJFAULT);
+		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+		result |= VM_FAULT_MAJOR;
+	}
+
+	/*
+	 * If we allocated new storage, make sure no process has any
+	 * zero pages covering this hole
+	 */
+	if (buffer_new(&bh)) {
+		i_mmap_unlock_write(mapping);
+		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
+		i_mmap_lock_write(mapping);
+	}
+
 	/*
 	 * If a truncate happened while we were allocating blocks, we may
 	 * leave blocks allocated to the file that are beyond EOF.  We can't
@@ -568,13 +587,6 @@  int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if ((pgoff | PG_PMD_COLOUR) >= size)
 		goto fallback;
 
-	/*
-	 * If we allocated new storage, make sure no process has any
-	 * zero pages covering this hole
-	 */
-	if (buffer_new(&bh))
-		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
-
 	if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
 		spinlock_t *ptl;
 		pmd_t entry;
@@ -605,15 +617,6 @@  int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;
 
-		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-			int i;
-			for (i = 0; i < PTRS_PER_PMD; i++)
-				clear_page(kaddr + i * PAGE_SIZE);
-			count_vm_event(PGMAJFAULT);
-			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-			result |= VM_FAULT_MAJOR;
-		}
-
 		result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 5a3427bb3f32..670cdfa9f33e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2426,17 +2426,10 @@  void unmap_mapping_range(struct address_space *mapping,
 	if (details.last_index < details.first_index)
 		details.last_index = ULONG_MAX;
 
-
-	/*
-	 * DAX already holds i_mmap_lock to serialise file truncate vs
-	 * page fault and page fault vs page fault.
-	 */
-	if (!IS_DAX(mapping->host))
-		i_mmap_lock_write(mapping);
+	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
-	if (!IS_DAX(mapping->host))
-		i_mmap_unlock_write(mapping);
+	i_mmap_unlock_write(mapping);
 }
 EXPORT_SYMBOL(unmap_mapping_range);