diff mbox

ext4: drop file_update_time from ext4_dax_fault

Message ID 20151120001813.27997.41722.stgit@dwillia2-desk3.jf.intel.com (mailing list archive)
State Superseded
Headers show

Commit Message

Dan Williams Nov. 20, 2015, 12:18 a.m. UTC
Neither the filemap_fault() nor the xfs dax fault path is updating time.
This call leads to the following WARN() when the block device has been
torn down:

 WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
 bdi-block not registered
 [..]
 Call Trace:
  [<ffffffff81459f62>] dump_stack+0x44/0x62
  [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
  [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
  [<ffffffff8126d109>] generic_update_time+0x79/0xd0
  [<ffffffff8126d28d>] file_update_time+0xbd/0x110
  [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
  [<ffffffff811f816e>] __do_fault+0x4e/0xf0
  [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
  [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
  [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
  [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
  [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
  [<ffffffff81902678>] async_page_fault+0x28/0x30

Cc: Jan Kara <jack@suse.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/ext4/file.c |    2 --
 1 file changed, 2 deletions(-)

Comments

Dave Chinner Nov. 20, 2015, 3:03 a.m. UTC | #1
On Thu, Nov 19, 2015 at 04:18:13PM -0800, Dan Williams wrote:
> Neither the filemap_fault() nor the xfs dax fault path is updating time.
> This call leads to the following WARN() when the block device has been
> torn down:

I don't think that is right. In xfs_filemap_fault():


....
        /* DAX can shortcut the normal fault path on write faults! */
        if ((vmf->flags & FAULT_FLAG_WRITE) && IS_DAX(inode))
                return xfs_filemap_page_mkwrite(vma, vmf);
....

And xfs_filemap_page_mkwrite() most definitely calls file_update_time():

....
        trace_xfs_filemap_page_mkwrite(XFS_I(inode));

        sb_start_pagefault(inode->i_sb);
        file_update_time(vma->vm_file);
        xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
....

And, finally, in xfs_filemap_pmd_fault():

....
        if (flags & FAULT_FLAG_WRITE) {
                sb_start_pagefault(inode->i_sb);
                file_update_time(vma->vm_file);
        }
....

So we are clearly updating timestamps in XFS on every write fault
that occurs, whether it be through ->page_mkwrite, ->fault or
->pmd_fault. Hence removing those from ext4 can't be the righ tthing
to do.

> 
>  WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
>  bdi-block not registered
   ^^^^^^^^^^^^^^^^^^^^^^^^
>  [..]
>  Call Trace:
>   [<ffffffff81459f62>] dump_stack+0x44/0x62
>   [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
>   [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
>   [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
>   [<ffffffff8126d109>] generic_update_time+0x79/0xd0
>   [<ffffffff8126d28d>] file_update_time+0xbd/0x110
>   [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
>   [<ffffffff811f816e>] __do_fault+0x4e/0xf0
>   [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
>   [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
>   [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
>   [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
>   [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
>   [<ffffffff81902678>] async_page_fault+0x28/0x30

Doesn't this indicate some problem at the block/bdi level?
__mark_inode_dirty() should not throw warnings like this regardless
of where it is called from...

Cheers,

Dave.
Dan Williams Nov. 20, 2015, 4:39 a.m. UTC | #2
On Thu, Nov 19, 2015 at 7:03 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Nov 19, 2015 at 04:18:13PM -0800, Dan Williams wrote:
>> Neither the filemap_fault() nor the xfs dax fault path is updating time.
>> This call leads to the following WARN() when the block device has been
>> torn down:
>
> I don't think that is right. In xfs_filemap_fault():
>
>
> ....
>         /* DAX can shortcut the normal fault path on write faults! */
>         if ((vmf->flags & FAULT_FLAG_WRITE) && IS_DAX(inode))
>                 return xfs_filemap_page_mkwrite(vma, vmf);
> ....
>
> And xfs_filemap_page_mkwrite() most definitely calls file_update_time():
>
> ....
>         trace_xfs_filemap_page_mkwrite(XFS_I(inode));
>
>         sb_start_pagefault(inode->i_sb);
>         file_update_time(vma->vm_file);
>         xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> ....
>
> And, finally, in xfs_filemap_pmd_fault():
>
> ....
>         if (flags & FAULT_FLAG_WRITE) {
>                 sb_start_pagefault(inode->i_sb);
>                 file_update_time(vma->vm_file);
>         }
> ....
>
> So we are clearly updating timestamps in XFS on every write fault
> that occurs, whether it be through ->page_mkwrite, ->fault or
> ->pmd_fault. Hence removing those from ext4 can't be the righ tthing
> to do.
>

Ah sorry I missed that.  When I saw that xfs did not trigger the same
warning as ext4 I just assumed it wasn't doing the time update.

>>
>>  WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
>>  bdi-block not registered
>    ^^^^^^^^^^^^^^^^^^^^^^^^
>>  [..]
>>  Call Trace:
>>   [<ffffffff81459f62>] dump_stack+0x44/0x62
>>   [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
>>   [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
>>   [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
>>   [<ffffffff8126d109>] generic_update_time+0x79/0xd0
>>   [<ffffffff8126d28d>] file_update_time+0xbd/0x110
>>   [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
>>   [<ffffffff811f816e>] __do_fault+0x4e/0xf0
>>   [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
>>   [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
>>   [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
>>   [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
>>   [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
>>   [<ffffffff81902678>] async_page_fault+0x28/0x30
>
> Doesn't this indicate some problem at the block/bdi level?
> __mark_inode_dirty() should not throw warnings like this regardless
> of where it is called from...
>

I'll look closer at how the xfs path avoids triggering this...
Dave Chinner Nov. 21, 2015, 9:32 p.m. UTC | #3
On Thu, Nov 19, 2015 at 08:39:06PM -0800, Dan Williams wrote:
> On Thu, Nov 19, 2015 at 7:03 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Nov 19, 2015 at 04:18:13PM -0800, Dan Williams wrote:
> >>  WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
> >>  bdi-block not registered
> >    ^^^^^^^^^^^^^^^^^^^^^^^^
> >>  [..]
> >>  Call Trace:
> >>   [<ffffffff81459f62>] dump_stack+0x44/0x62
> >>   [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
> >>   [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
> >>   [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
> >>   [<ffffffff8126d109>] generic_update_time+0x79/0xd0
> >>   [<ffffffff8126d28d>] file_update_time+0xbd/0x110
> >>   [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
> >>   [<ffffffff811f816e>] __do_fault+0x4e/0xf0
> >>   [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
> >>   [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
> >>   [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
> >>   [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
> >>   [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
> >>   [<ffffffff81902678>] async_page_fault+0x28/0x30
> >
> > Doesn't this indicate some problem at the block/bdi level?
> > __mark_inode_dirty() should not throw warnings like this regardless
> > of where it is called from...
> >
> 
> I'll look closer at how the xfs path avoids triggering this...

XFS doesn't mark inodes dirty at the VFS level for inode metadata
changes - dirty metadata is tracked by the journalling subsystem,
not the VFS.

Cheers,

Dave.
Jan Kara Nov. 23, 2015, 2:15 p.m. UTC | #4
On Fri 20-11-15 14:03:44, Dave Chinner wrote:
> On Thu, Nov 19, 2015 at 04:18:13PM -0800, Dan Williams wrote:
> >  WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
> >  bdi-block not registered
>    ^^^^^^^^^^^^^^^^^^^^^^^^
> >  [..]
> >  Call Trace:
> >   [<ffffffff81459f62>] dump_stack+0x44/0x62
> >   [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
> >   [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
> >   [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
> >   [<ffffffff8126d109>] generic_update_time+0x79/0xd0
> >   [<ffffffff8126d28d>] file_update_time+0xbd/0x110
> >   [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
> >   [<ffffffff811f816e>] __do_fault+0x4e/0xf0
> >   [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
> >   [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
> >   [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
> >   [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
> >   [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
> >   [<ffffffff81902678>] async_page_fault+0x28/0x30
> 
> Doesn't this indicate some problem at the block/bdi level?
> __mark_inode_dirty() should not throw warnings like this regardless
> of where it is called from...

The warning warns that we are dirtying inodes against a backing device that
advertises it does writeback but is not registered with writeback code (so
flusher does not run for it). That makes sense but people seem to be
experimenting a lot with removing devices under filesystems lately and so
this warning triggers. Probably we have to refine it to not trigger for
such cases...

								Honza
Dave Chinner Nov. 23, 2015, 7:55 p.m. UTC | #5
On Mon, Nov 23, 2015 at 03:15:56PM +0100, Jan Kara wrote:
> On Fri 20-11-15 14:03:44, Dave Chinner wrote:
> > On Thu, Nov 19, 2015 at 04:18:13PM -0800, Dan Williams wrote:
> > >  WARNING: CPU: 0 PID: 2133 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
> > >  bdi-block not registered
> >    ^^^^^^^^^^^^^^^^^^^^^^^^
> > >  [..]
> > >  Call Trace:
> > >   [<ffffffff81459f62>] dump_stack+0x44/0x62
> > >   [<ffffffff810a2052>] warn_slowpath_common+0x82/0xc0
> > >   [<ffffffff810a20ec>] warn_slowpath_fmt+0x5c/0x80
> > >   [<ffffffff812831a1>] __mark_inode_dirty+0x261/0x350
> > >   [<ffffffff8126d109>] generic_update_time+0x79/0xd0
> > >   [<ffffffff8126d28d>] file_update_time+0xbd/0x110
> > >   [<ffffffff812e4bc8>] ext4_dax_fault+0x68/0x110
> > >   [<ffffffff811f816e>] __do_fault+0x4e/0xf0
> > >   [<ffffffff811fc2a7>] handle_mm_fault+0x5e7/0x1b50
> > >   [<ffffffff811fbd11>] ? handle_mm_fault+0x51/0x1b50
> > >   [<ffffffff810689c1>] __do_page_fault+0x191/0x3f0
> > >   [<ffffffff81068cef>] trace_do_page_fault+0x4f/0x120
> > >   [<ffffffff8106314a>] do_async_page_fault+0x1a/0xa0
> > >   [<ffffffff81902678>] async_page_fault+0x28/0x30
> > 
> > Doesn't this indicate some problem at the block/bdi level?
> > __mark_inode_dirty() should not throw warnings like this regardless
> > of where it is called from...
> 
> The warning warns that we are dirtying inodes against a backing device that
> advertises it does writeback but is not registered with writeback code (so
> flusher does not run for it). That makes sense but people seem to be
> experimenting a lot with removing devices under filesystems lately and so
> this warning triggers. Probably we have to refine it to not trigger for
> such cases...

Or, alternatively, have the block devices inform the filesystem that
it no longer has a valid bdi backing it. That way the filesystem can
error out at a higher layer and we never get to the point of trying
to mark an inode dirty when the backing device has been unplugged...

Cheers,

Dave.
diff mbox

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 113837e7ba98..76854dd6dd7d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -214,7 +214,6 @@  static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	if (write) {
 		sb_start_pagefault(sb);
-		file_update_time(vma->vm_file);
 		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
 						EXT4_DATA_TRANS_BLOCKS(sb));
 	}
@@ -245,7 +244,6 @@  static int ext4_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
 
 	if (write) {
 		sb_start_pagefault(sb);
-		file_update_time(vma->vm_file);
 		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
 				ext4_chunk_trans_blocks(inode,
 							PMD_SIZE / PAGE_SIZE));