diff mbox

[4/5] dax: use sb_issue_zerout instead of calling dax_clear_sectors

Message ID 1458861450-17705-5-git-send-email-vishal.l.verma@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Verma, Vishal L March 24, 2016, 11:17 p.m. UTC
From: Matthew Wilcox <matthew.r.wilcox@intel.com>

dax_clear_sectors() cannot handle poisoned blocks.  These must be
zeroed using the BIO interface instead.  Convert ext2 and XFS to use
only sb_issue_zerout().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 fs/dax.c               | 32 --------------------------------
 fs/ext2/inode.c        |  7 +++----
 fs/xfs/xfs_bmap_util.c |  9 ---------
 include/linux/dax.h    |  1 -
 4 files changed, 3 insertions(+), 46 deletions(-)

Comments

Christoph Hellwig March 25, 2016, 10:44 a.m. UTC | #1
On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:
> @@ -72,16 +72,7 @@ xfs_zero_extent(
>  	struct xfs_mount *mp = ip->i_mount;
>  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
>  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
> -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
>  
> -	if (IS_DAX(VFS_I(ip)))
> -		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
> -				sector, size);
> -
> -	/*
> -	 * let the block layer decide on the fastest method of
> -	 * implementing the zeroing.
> -	 */
>  	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);

While not new: using sb_issue_zeroout in XFS is wrong as it doesn't
account for the RT device.  We need the xfs_find_bdev_for_inode and
call blkdev_issue_zeroout directly with the bdev it returned.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams March 25, 2016, 6:47 p.m. UTC | #2
On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>
> dax_clear_sectors() cannot handle poisoned blocks.  These must be
> zeroed using the BIO interface instead.  Convert ext2 and XFS to use
> only sb_issue_zerout().
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> [vishal: Also remove the dax_clear_sectors function entirely]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  fs/dax.c               | 32 --------------------------------
>  fs/ext2/inode.c        |  7 +++----
>  fs/xfs/xfs_bmap_util.c |  9 ---------
>  include/linux/dax.h    |  1 -
>  4 files changed, 3 insertions(+), 46 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index bb7e9f8..a30481e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
>         return page;
>  }
>
> -/*
> - * dax_clear_sectors() is called from within transaction context from XFS,
> - * and hence this means the stack from this point must follow GFP_NOFS
> - * semantics for all operations.
> - */
> -int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
> -{
> -       struct blk_dax_ctl dax = {
> -               .sector = _sector,
> -               .size = _size,
> -       };
> -
> -       might_sleep();
> -       do {
> -               long count, sz;
> -
> -               count = dax_map_atomic(bdev, &dax);
> -               if (count < 0)
> -                       return count;
> -               sz = min_t(long, count, SZ_128K);
> -               clear_pmem(dax.addr, sz);
> -               dax.size -= sz;
> -               dax.sector += sz / 512;
> -               dax_unmap_atomic(bdev, &dax);
> -               cond_resched();
> -       } while (dax.size);
> -
> -       wmb_pmem();
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(dax_clear_sectors);

What about the other unwritten extent conversions in the dax path?
Shouldn't those be converted to block-layer zero-outs as well?
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Verma, Vishal L March 25, 2016, 9:01 p.m. UTC | #3
On Fri, 2016-03-25 at 03:44 -0700, Christoph Hellwig wrote:
> On Thu, Mar 24, 2016 at 05:17:29PM -0600, Vishal Verma wrote:

> > 

> > @@ -72,16 +72,7 @@ xfs_zero_extent(

> >  	struct xfs_mount *mp = ip->i_mount;

> >  	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);

> >  	sector_t	block = XFS_BB_TO_FSBT(mp, sector);

> > -	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);

> >  

> > -	if (IS_DAX(VFS_I(ip)))

> > -		return

> > dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),

> > -				sector, size);

> > -

> > -	/*

> > -	 * let the block layer decide on the fastest method of

> > -	 * implementing the zeroing.

> > -	 */

> >  	return sb_issue_zeroout(mp->m_super, block, count_fsb,

> > GFP_NOFS);

> While not new: using sb_issue_zeroout in XFS is wrong as it doesn't

> account for the RT device.  We need the xfs_find_bdev_for_inode and

> call blkdev_issue_zeroout directly with the bdev it returned.


Ok, I'll fix and send a v2. Thanks!
>
Verma, Vishal L March 25, 2016, 9:03 p.m. UTC | #4
On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c

> om> wrote:

> > 

> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>

> > 

> > dax_clear_sectors() cannot handle poisoned blocks.  These must be

> > zeroed using the BIO interface instead.  Convert ext2 and XFS to

> > use

> > only sb_issue_zerout().

> > 

> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>

> > [vishal: Also remove the dax_clear_sectors function entirely]

> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

> > ---

> >  fs/dax.c               | 32 --------------------------------

> >  fs/ext2/inode.c        |  7 +++----

> >  fs/xfs/xfs_bmap_util.c |  9 ---------

> >  include/linux/dax.h    |  1 -

> >  4 files changed, 3 insertions(+), 46 deletions(-)

> > 

> > diff --git a/fs/dax.c b/fs/dax.c

> > index bb7e9f8..a30481e 100644

> > --- a/fs/dax.c

> > +++ b/fs/dax.c

> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device

> > *bdev, sector_t n)

> >         return page;

> >  }

> > 

> > -/*

> > - * dax_clear_sectors() is called from within transaction context

> > from XFS,

> > - * and hence this means the stack from this point must follow

> > GFP_NOFS

> > - * semantics for all operations.

> > - */

> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,

> > long _size)

> > -{

> > -       struct blk_dax_ctl dax = {

> > -               .sector = _sector,

> > -               .size = _size,

> > -       };

> > -

> > -       might_sleep();

> > -       do {

> > -               long count, sz;

> > -

> > -               count = dax_map_atomic(bdev, &dax);

> > -               if (count < 0)

> > -                       return count;

> > -               sz = min_t(long, count, SZ_128K);

> > -               clear_pmem(dax.addr, sz);

> > -               dax.size -= sz;

> > -               dax.sector += sz / 512;

> > -               dax_unmap_atomic(bdev, &dax);

> > -               cond_resched();

> > -       } while (dax.size);

> > -

> > -       wmb_pmem();

> > -       return 0;

> > -}

> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);

> What about the other unwritten extent conversions in the dax path?

> Shouldn't those be converted to block-layer zero-outs as well?


Could you point me to where these might be? I thought once we've
converted all the zeroout type callers (by removing dax_clear_sectors),
and fixed up dax_do_io to try a driver fallback, we've handled all the
media error cases in dax..
Dan Williams March 25, 2016, 9:20 p.m. UTC | #5
On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@intel.c
>> om> wrote:
>> >
>> > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> >
>> > dax_clear_sectors() cannot handle poisoned blocks.  These must be
>> > zeroed using the BIO interface instead.  Convert ext2 and XFS to
>> > use
>> > only sb_issue_zerout().
>> >
>> > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > [vishal: Also remove the dax_clear_sectors function entirely]
>> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > ---
>> >  fs/dax.c               | 32 --------------------------------
>> >  fs/ext2/inode.c        |  7 +++----
>> >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> >  include/linux/dax.h    |  1 -
>> >  4 files changed, 3 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/fs/dax.c b/fs/dax.c
>> > index bb7e9f8..a30481e 100644
>> > --- a/fs/dax.c
>> > +++ b/fs/dax.c
>> > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct block_device
>> > *bdev, sector_t n)
>> >         return page;
>> >  }
>> >
>> > -/*
>> > - * dax_clear_sectors() is called from within transaction context
>> > from XFS,
>> > - * and hence this means the stack from this point must follow
>> > GFP_NOFS
>> > - * semantics for all operations.
>> > - */
>> > -int dax_clear_sectors(struct block_device *bdev, sector_t _sector,
>> > long _size)
>> > -{
>> > -       struct blk_dax_ctl dax = {
>> > -               .sector = _sector,
>> > -               .size = _size,
>> > -       };
>> > -
>> > -       might_sleep();
>> > -       do {
>> > -               long count, sz;
>> > -
>> > -               count = dax_map_atomic(bdev, &dax);
>> > -               if (count < 0)
>> > -                       return count;
>> > -               sz = min_t(long, count, SZ_128K);
>> > -               clear_pmem(dax.addr, sz);
>> > -               dax.size -= sz;
>> > -               dax.sector += sz / 512;
>> > -               dax_unmap_atomic(bdev, &dax);
>> > -               cond_resched();
>> > -       } while (dax.size);
>> > -
>> > -       wmb_pmem();
>> > -       return 0;
>> > -}
>> > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> What about the other unwritten extent conversions in the dax path?
>> Shouldn't those be converted to block-layer zero-outs as well?
>
> Could you point me to where these might be? I thought once we've
> converted all the zeroout type callers (by removing dax_clear_sectors),
> and fixed up dax_do_io to try a driver fallback, we've handled all the
> media error cases in dax..

grep for usages of clear_pmem()... which I was hoping to eliminate
after this change to push zeroing down to the driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Verma, Vishal L March 28, 2016, 8:01 p.m. UTC | #6
On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L

> <vishal.l.verma@intel.com> wrote:

> > 

> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:

> > > 

> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int

> > > el.c

> > > om> wrote:

> > > > 

> > > > 

> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>

> > > > 

> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must

> > > > be

> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS

> > > > to

> > > > use

> > > > only sb_issue_zerout().

> > > > 

> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>

> > > > [vishal: Also remove the dax_clear_sectors function entirely]

> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>

> > > > ---

> > > >  fs/dax.c               | 32 --------------------------------

> > > >  fs/ext2/inode.c        |  7 +++----

> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------

> > > >  include/linux/dax.h    |  1 -

> > > >  4 files changed, 3 insertions(+), 46 deletions(-)

> > > > 

> > > > diff --git a/fs/dax.c b/fs/dax.c

> > > > index bb7e9f8..a30481e 100644

> > > > --- a/fs/dax.c

> > > > +++ b/fs/dax.c

> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct

> > > > block_device

> > > > *bdev, sector_t n)

> > > >         return page;

> > > >  }

> > > > 

> > > > -/*

> > > > - * dax_clear_sectors() is called from within transaction

> > > > context

> > > > from XFS,

> > > > - * and hence this means the stack from this point must follow

> > > > GFP_NOFS

> > > > - * semantics for all operations.

> > > > - */

> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t

> > > > _sector,

> > > > long _size)

> > > > -{

> > > > -       struct blk_dax_ctl dax = {

> > > > -               .sector = _sector,

> > > > -               .size = _size,

> > > > -       };

> > > > -

> > > > -       might_sleep();

> > > > -       do {

> > > > -               long count, sz;

> > > > -

> > > > -               count = dax_map_atomic(bdev, &dax);

> > > > -               if (count < 0)

> > > > -                       return count;

> > > > -               sz = min_t(long, count, SZ_128K);

> > > > -               clear_pmem(dax.addr, sz);

> > > > -               dax.size -= sz;

> > > > -               dax.sector += sz / 512;

> > > > -               dax_unmap_atomic(bdev, &dax);

> > > > -               cond_resched();

> > > > -       } while (dax.size);

> > > > -

> > > > -       wmb_pmem();

> > > > -       return 0;

> > > > -}

> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);

> > > What about the other unwritten extent conversions in the dax

> > > path?

> > > Shouldn't those be converted to block-layer zero-outs as well?

> > Could you point me to where these might be? I thought once we've

> > converted all the zeroout type callers (by removing

> > dax_clear_sectors),

> > and fixed up dax_do_io to try a driver fallback, we've handled all

> > the

> > media error cases in dax..

> grep for usages of clear_pmem()... which I was hoping to eliminate

> after this change to push zeroing down to the driver.


Ok, so I looked at these, and it looks like the majority of callers of
clear_pmem are from the fault path (either pmd or regular), and in
those cases we should be 'protected', as we would have failed at a
prior step (dax_map_atomic).

The two cases that may not be well handled are the calls to
dax_zero_page_range and dax_truncate_page which are called from file
systems. I think we may need to do a fallback to the driver for those
cases just like we do for dax_direct_io.. Thoughts?
Dan Williams March 28, 2016, 11:34 p.m. UTC | #7
On Mon, Mar 28, 2016 at 1:01 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Fri, 2016-03-25 at 14:20 -0700, Dan Williams wrote:
>> On Fri, Mar 25, 2016 at 2:03 PM, Verma, Vishal L
>> <vishal.l.verma@intel.com> wrote:
>> >
>> > On Fri, 2016-03-25 at 11:47 -0700, Dan Williams wrote:
>> > >
>> > > On Thu, Mar 24, 2016 at 4:17 PM, Vishal Verma <vishal.l.verma@int
>> > > el.c
>> > > om> wrote:
>> > > >
>> > > >
>> > > > From: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > >
>> > > > dax_clear_sectors() cannot handle poisoned blocks.  These must
>> > > > be
>> > > > zeroed using the BIO interface instead.  Convert ext2 and XFS
>> > > > to
>> > > > use
>> > > > only sb_issue_zerout().
>> > > >
>> > > > Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
>> > > > [vishal: Also remove the dax_clear_sectors function entirely]
>> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
>> > > > ---
>> > > >  fs/dax.c               | 32 --------------------------------
>> > > >  fs/ext2/inode.c        |  7 +++----
>> > > >  fs/xfs/xfs_bmap_util.c |  9 ---------
>> > > >  include/linux/dax.h    |  1 -
>> > > >  4 files changed, 3 insertions(+), 46 deletions(-)
>> > > >
>> > > > diff --git a/fs/dax.c b/fs/dax.c
>> > > > index bb7e9f8..a30481e 100644
>> > > > --- a/fs/dax.c
>> > > > +++ b/fs/dax.c
>> > > > @@ -78,38 +78,6 @@ struct page *read_dax_sector(struct
>> > > > block_device
>> > > > *bdev, sector_t n)
>> > > >         return page;
>> > > >  }
>> > > >
>> > > > -/*
>> > > > - * dax_clear_sectors() is called from within transaction
>> > > > context
>> > > > from XFS,
>> > > > - * and hence this means the stack from this point must follow
>> > > > GFP_NOFS
>> > > > - * semantics for all operations.
>> > > > - */
>> > > > -int dax_clear_sectors(struct block_device *bdev, sector_t
>> > > > _sector,
>> > > > long _size)
>> > > > -{
>> > > > -       struct blk_dax_ctl dax = {
>> > > > -               .sector = _sector,
>> > > > -               .size = _size,
>> > > > -       };
>> > > > -
>> > > > -       might_sleep();
>> > > > -       do {
>> > > > -               long count, sz;
>> > > > -
>> > > > -               count = dax_map_atomic(bdev, &dax);
>> > > > -               if (count < 0)
>> > > > -                       return count;
>> > > > -               sz = min_t(long, count, SZ_128K);
>> > > > -               clear_pmem(dax.addr, sz);
>> > > > -               dax.size -= sz;
>> > > > -               dax.sector += sz / 512;
>> > > > -               dax_unmap_atomic(bdev, &dax);
>> > > > -               cond_resched();
>> > > > -       } while (dax.size);
>> > > > -
>> > > > -       wmb_pmem();
>> > > > -       return 0;
>> > > > -}
>> > > > -EXPORT_SYMBOL_GPL(dax_clear_sectors);
>> > > What about the other unwritten extent conversions in the dax
>> > > path?
>> > > Shouldn't those be converted to block-layer zero-outs as well?
>> > Could you point me to where these might be? I thought once we've
>> > converted all the zeroout type callers (by removing
>> > dax_clear_sectors),
>> > and fixed up dax_do_io to try a driver fallback, we've handled all
>> > the
>> > media error cases in dax..
>> grep for usages of clear_pmem()... which I was hoping to eliminate
>> after this change to push zeroing down to the driver.
>
> Ok, so I looked at these, and it looks like the majority of callers of
> clear_pmem are from the fault path (either pmd or regular), and in
> those cases we should be 'protected', as we would have failed at a
> prior step (dax_map_atomic).

Seems kind of sad to fail the fault due to a bad block when we were
going to zero it anyway, right?  I'm not seeing a compelling reason to
keep any zeroing in fs/dax.c.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams March 29, 2016, 7:37 p.m. UTC | #8
On Tue, Mar 29, 2016 at 11:57 AM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
>
> <>
>
>> Seems kind of sad to fail the fault due to a bad block when we were
>> going to zero it anyway, right?  I'm not seeing a compelling reason to
>> keep any zeroing in fs/dax.c.
>
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

I think we're ok because clear_pmem() will always happen on PAGE_SIZE,
or HPAGE_SIZE boundaries.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara March 30, 2016, 7:49 a.m. UTC | #9
On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> 
> <>
> 
> > Seems kind of sad to fail the fault due to a bad block when we were
> > going to zero it anyway, right?  I'm not seeing a compelling reason to
> > keep any zeroing in fs/dax.c.
> 
> Agreed - but how do we do this? clear_pmem needs to be able to clear an
> arbitrary number of bytes, but to go through the driver, we'd need to
> send down a bio? If only the driver had an rw_bytes like interface that
> could be used by anyone... :)

Actually, my patches for page fault locking remove zeroing from
dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens from
the filesystem only and the zeroing in those two functions is just a dead
code...

								Honza
Verma, Vishal L April 1, 2016, 7:17 p.m. UTC | #10
On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:

> > 

> > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:

> > 

> > <>

> > 

> > > 

> > > Seems kind of sad to fail the fault due to a bad block when we

> > > were

> > > going to zero it anyway, right?  I'm not seeing a compelling

> > > reason to

> > > keep any zeroing in fs/dax.c.

> > Agreed - but how do we do this? clear_pmem needs to be able to clear

> > an

> > arbitrary number of bytes, but to go through the driver, we'd need

> > to

> > send down a bio? If only the driver had an rw_bytes like interface

> > that

> > could be used by anyone... :)

> Actually, my patches for page fault locking remove zeroing from

> dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens

> from

> the filesystem only and the zeroing in those two functions is just a

> dead

> code...


That should make things easier! Do you have a tree I could merge in to
get this? (WIP is ok as we know that my series will depend on yours..)
or, if you can distill out that patch on a 4.6-rc1 base, I could carry
it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

Thanks,
	-Vishal
Jan Kara April 4, 2016, 12:09 p.m. UTC | #11
On Fri 01-04-16 19:17:52, Verma, Vishal L wrote:
> On Wed, 2016-03-30 at 09:49 +0200, Jan Kara wrote:
> > On Tue 29-03-16 18:57:16, Verma, Vishal L wrote:
> > > 
> > > On Mon, 2016-03-28 at 16:34 -0700, Dan Williams wrote:
> > > 
> > > <>
> > > 
> > > > 
> > > > Seems kind of sad to fail the fault due to a bad block when we
> > > > were
> > > > going to zero it anyway, right?  I'm not seeing a compelling
> > > > reason to
> > > > keep any zeroing in fs/dax.c.
> > > Agreed - but how do we do this? clear_pmem needs to be able to clear
> > > an
> > > arbitrary number of bytes, but to go through the driver, we'd need
> > > to
> > > send down a bio? If only the driver had an rw_bytes like interface
> > > that
> > > could be used by anyone... :)
> > Actually, my patches for page fault locking remove zeroing from
> > dax_insert_mapping() and __dax_pmd_fault() - the zeroing now happens
> > from
> > the filesystem only and the zeroing in those two functions is just a
> > dead
> > code...
> 
> That should make things easier! Do you have a tree I could merge in to
> get this? (WIP is ok as we know that my series will depend on yours..)
> or, if you can distill out that patch on a 4.6-rc1 base, I could carry
> it in my series too (your v2's 3/10 doesn't apply on 4.6-rc1..)

I'll CC you on the next posting of the series which I want to do this week.

								Honza
diff mbox

Patch

diff --git a/fs/dax.c b/fs/dax.c
index bb7e9f8..a30481e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -78,38 +78,6 @@  struct page *read_dax_sector(struct block_device *bdev, sector_t n)
 	return page;
 }
 
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
-	struct blk_dax_ctl dax = {
-		.sector = _sector,
-		.size = _size,
-	};
-
-	might_sleep();
-	do {
-		long count, sz;
-
-		count = dax_map_atomic(bdev, &dax);
-		if (count < 0)
-			return count;
-		sz = min_t(long, count, SZ_128K);
-		clear_pmem(dax.addr, sz);
-		dax.size -= sz;
-		dax.sector += sz / 512;
-		dax_unmap_atomic(bdev, &dax);
-		cond_resched();
-	} while (dax.size);
-
-	wmb_pmem();
-	return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
 static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
 		loff_t pos, loff_t end)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6bd58e6..824f249 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@ 
 #include <linux/highuid.h>
 #include <linux/pagemap.h>
 #include <linux/dax.h>
+#include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/writeback.h>
 #include <linux/buffer_head.h>
@@ -737,10 +738,8 @@  static int ext2_get_blocks(struct inode *inode,
 		 * so that it's not found by another thread before it's
 		 * initialised
 		 */
-		err = dax_clear_sectors(inode->i_sb->s_bdev,
-				le32_to_cpu(chain[depth-1].key) <<
-				(inode->i_blkbits - 9),
-				1 << inode->i_blkbits);
+		err = sb_issue_zeroout(inode->i_sb,
+				le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
 		if (err) {
 			mutex_unlock(&ei->truncate_mutex);
 			goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6c87601..23a759a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,16 +72,7 @@  xfs_zero_extent(
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
 	sector_t	block = XFS_BB_TO_FSBT(mp, sector);
-	ssize_t		size = XFS_FSB_TO_B(mp, count_fsb);
 
-	if (IS_DAX(VFS_I(ip)))
-		return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
-				sector, size);
-
-	/*
-	 * let the block layer decide on the fastest method of
-	 * implementing the zeroing.
-	 */
 	return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
 
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 636dd59..933198a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,7 +7,6 @@ 
 
 ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
 		  get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
 int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
 int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,