Message ID | 20161215140715.12732-3-mhocko@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
> KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
> also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
> KM_NOFS tags to keep lockdep happy") and use the new flag for them
> instead. There is really no reason to make these allocations contexts
> weaker just because of the lockdep which even might not be enabled
> in most cases.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

I'd suggest that it might be better to drop this patch for now -
it's not necessary for the context flag changeover but does
introduce a risk of regressions if the conversion is wrong.

Hence I think this is better as a completely separate series
which audits and changes all the unnecessary KM_NOFS allocations
in one go. I've never liked whack-a-mole style changes like this -
do it once, do it properly....

Cheers,

Dave.
On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
> > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
> > also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
> > KM_NOFS tags to keep lockdep happy") and use the new flag for them
> > instead. There is really no reason to make these allocations contexts
> > weaker just because of the lockdep which even might not be enabled
> > in most cases.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>
> I'd suggest that it might be better to drop this patch for now -
> it's not necessary for the context flag changeover but does
> introduce a risk of regressions if the conversion is wrong.

I was just about to write in that while I didn't see anything obviously
wrong with the NOFS removals, I also don't know for sure that we can't
end up recursively in those code paths (specifically the directory
traversal thing).

--D

> Hence I think this is better as a completely separate series
> which audits and changes all the unnecessary KM_NOFS allocations
> in one go. I've never liked whack-a-mole style changes like this -
> do it once, do it properly....
>
> Cheers,
>
> Dave.
On Tue 20-12-16 08:24:13, Dave Chinner wrote:
> On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
> > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
> > also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
> > KM_NOFS tags to keep lockdep happy") and use the new flag for them
> > instead. There is really no reason to make these allocations contexts
> > weaker just because of the lockdep which even might not be enabled
> > in most cases.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>
> I'd suggest that it might be better to drop this patch for now -
> it's not necessary for the context flag changeover but does
> introduce a risk of regressions if the conversion is wrong.
>
> Hence I think this is better as a completely separate series
> which audits and changes all the unnecessary KM_NOFS allocations
> in one go. I've never liked whack-a-mole style changes like this -
> do it once, do it properly....

OK, fair enough. I thought it might be better to have an example user so
that others can follow, but as you say, the risk of regression is real
and these kinds of changes definitely need a thorough review. I am not
sure I will be able to post more of those changes because that requires
intimate knowledge of the fs, so I hope somebody can take over there and
follow up.

Thanks!
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
> > > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
> > > also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
> > > KM_NOFS tags to keep lockdep happy") and use the new flag for them
> > > instead. There is really no reason to make these allocations contexts
> > > weaker just because of the lockdep which even might not be enabled
> > > in most cases.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> >
> > I'd suggest that it might be better to drop this patch for now -
> > it's not necessary for the context flag changeover but does
> > introduce a risk of regressions if the conversion is wrong.
>
> I was just about to write in that while I didn't see anything obviously
> wrong with the NOFS removals, I also don't know for sure that we can't
> end up recursively in those code paths (specifically the directory
> traversal thing).

The issue is with code paths that can be called from both inside and
outside transaction context - lockdep complains when it sees an
allocation path that is used with both GFP_NOFS and GFP_KERNEL
context, as it doesn't know whether the GFP_KERNEL usage is safe or not.

So things like the directory buffer path, which can be called from
readdir without a transaction context, have various KM_NOFS flags
scattered through it so that lockdep doesn't get all upset every time
readdir is called...

There are other cases like this - btree manipulation via bunmapi() can
be called without transaction context to remove delayed alloc extents,
and that puts all of the btree cursor and incore extent list handling
in the same boat (all those allocations are KM_NOFS), etc.

So it's not really recursion that is the problem here - it's different
allocation contexts that lockdep can't know about unless it's told
about them. We've done that with KM_NOFS in the past; in future we
should use this KM_NOLOCKDEP flag, though I'd prefer a better name for
it, e.g. KM_NOTRANS to indicate that the allocation can occur both
inside and outside of transaction context....

Cheers,

Dave.
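Dave's point that the problem is allocation context rather than recursion can be illustrated with a toy model of lockdep's reclaim tracking. This is a deliberately simplified sketch, not lockdep's implementation: the struct, the `model_*` function names and the `MODEL_GFP_*` bits are all hypothetical. Per lock class it records whether the class is taken from fs reclaim and whether it is held across an allocation that may enter fs reclaim, and reports the class when both are true. Because the state is per class rather than per lock instance, a GFP_KERNEL allocation made safely outside a transaction looks the same as an unsafe one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gfp bits for this model only (not the kernel's values). */
#define MODEL_GFP_FS        0x02u
#define MODEL_GFP_NOLOCKDEP 0x10u

/*
 * Toy per-lock-class state, loosely inspired by lockdep's RECLAIM_FS
 * tracking. Not lockdep's real data structures or rules.
 */
struct model_lock_class {
	bool used_in_fs_reclaim;  /* taken from the fs-reclaim path */
	bool held_over_fs_alloc;  /* held across an allocation that may enter fs reclaim */
};

/* Record that fs reclaim acquired a lock of this class. */
static void model_acquire_in_reclaim(struct model_lock_class *c)
{
	c->used_in_fs_reclaim = true;
}

/* Record an allocation made with @gfp while a lock of class @c is held. */
static void model_alloc_while_holding(struct model_lock_class *c,
				      unsigned int gfp)
{
	/* Without the fs bit (GFP_NOFS-like) the allocation cannot recurse into fs reclaim. */
	if (!(gfp & MODEL_GFP_FS))
		return;
	/* The caller has asked the checker to ignore this allocation. */
	if (gfp & MODEL_GFP_NOLOCKDEP)
		return;
	c->held_over_fs_alloc = true;
}

/*
 * Report when one class is both taken in reclaim and held over a
 * reclaim-capable allocation. State is per class, not per instance, so
 * the model cannot tell whether the two uses involve the same object.
 */
static bool model_reports_problem(const struct model_lock_class *c)
{
	return c->used_in_fs_reclaim && c->held_over_fs_alloc;
}
```

Running three scenarios against a class that reclaim also takes: an allocation with only `MODEL_GFP_FS` (GFP_KERNEL-like) is reported; an allocation without it (the KM_NOFS workaround) is not; and `MODEL_GFP_FS | MODEL_GFP_NOLOCKDEP` is not reported while keeping the stronger reclaim context, which is the trade-off KM_NOLOCKDEP is meant to offer.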

    diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
    index 689f746224e7..ea3984091d58 100644
    --- a/fs/xfs/kmem.h
    +++ b/fs/xfs/kmem.h
    @@ -33,6 +33,7 @@ typedef unsigned __bitwise xfs_km_flags_t;
     #define KM_NOFS		((__force xfs_km_flags_t)0x0004u)
     #define KM_MAYFAIL	((__force xfs_km_flags_t)0x0008u)
     #define KM_ZERO		((__force xfs_km_flags_t)0x0010u)
    +#define KM_NOLOCKDEP	((__force xfs_km_flags_t)0x0020u)
     
     /*
      * We use a special process flag to avoid recursive callbacks into
    @@ -57,6 +58,9 @@ kmem_flags_convert(xfs_km_flags_t flags)
     	if (flags & KM_ZERO)
     		lflags |= __GFP_ZERO;
     
    +	if (flags & KM_NOLOCKDEP)
    +		lflags |= __GFP_NOLOCKDEP;
    +
     	return lflags;
     }
     
    diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
    index f2dc1a950c85..b8b5f6914863 100644
    --- a/fs/xfs/libxfs/xfs_da_btree.c
    +++ b/fs/xfs/libxfs/xfs_da_btree.c
    @@ -2429,7 +2429,7 @@ xfs_buf_map_from_irec(
     
     	if (nirecs > 1) {
     		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map),
    -				  KM_SLEEP | KM_NOFS);
    +				  KM_SLEEP | KM_NOLOCKDEP);
     		if (!map)
     			return -ENOMEM;
     		*mapp = map;
    @@ -2488,7 +2488,7 @@ xfs_dabuf_map(
     		 */
     		if (nfsb != 1)
     			irecs = kmem_zalloc(sizeof(irec) * nfsb,
    -					KM_SLEEP | KM_NOFS);
    +					KM_SLEEP | KM_NOLOCKDEP);
     
     		nirecs = nfsb;
     		error = xfs_bmapi_read(dp, (xfs_fileoff_t)bno, nfsb, irecs,
    diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
    index 7f0a01f7b592..f31ae592dcae 100644
    --- a/fs/xfs/xfs_buf.c
    +++ b/fs/xfs/xfs_buf.c
    @@ -1785,7 +1785,7 @@ xfs_alloc_buftarg(
     {
     	xfs_buftarg_t		*btp;
     
    -	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOFS);
    +	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOLOCKDEP);
     
     	btp->bt_mount = mp;
     	btp->bt_dev =  bdev->bd_dev;
    diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
    index 003a99b83bd8..033ed65d7ce6 100644
    --- a/fs/xfs/xfs_dir2_readdir.c
    +++ b/fs/xfs/xfs_dir2_readdir.c
    @@ -503,7 +503,7 @@ xfs_dir2_leaf_getdents(
     	length = howmany(bufsize + geo->blksize, (1 << geo->fsblog));
     	map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
     				(length * sizeof(struct xfs_bmbt_irec)),
    -			       KM_SLEEP | KM_NOFS);
    +			       KM_SLEEP | KM_NOLOCKDEP);
     	map_info->map_size = length;
     
     	/*
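To make the difference between the two flags concrete, here is a simplified user-space model of how the patched kmem_flags_convert() treats KM_NOFS versus KM_NOLOCKDEP. It is a sketch, not the kernel code: the `MODEL_GFP_*` bit values and the `model_kmem_flags_convert()` name are hypothetical, it always starts from a GFP_KERNEL-like mask, and it leaves out the KM_NOSLEEP branch, __GFP_NOWARN and the per-task NOFS check. Only the KM_* values are taken from the kmem.h hunk. It shows the point of the commit message: KM_NOFS clears the fs bit and so weakens what reclaim may do, while KM_NOLOCKDEP keeps the fs bit and only adds an annotation for lockdep.

```c
#include <assert.h>

/*
 * Simplified user-space model of the patched kmem_flags_convert().
 * MODEL_GFP_* values are hypothetical stand-ins for the kernel's gfp
 * bits; only the KM_* values come from the kmem.h hunk.
 */
typedef unsigned int model_gfp_t;
typedef unsigned int xfs_km_flags_t;

#define MODEL_GFP_IO        0x01u
#define MODEL_GFP_FS        0x02u
#define MODEL_GFP_RECLAIM   0x04u
#define MODEL_GFP_ZERO      0x08u
#define MODEL_GFP_NOLOCKDEP 0x10u
/* GFP_KERNEL-like mask: may reclaim, do I/O and call into filesystems. */
#define MODEL_GFP_KERNEL    (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

#define KM_NOFS      0x0004u
#define KM_ZERO      0x0010u
#define KM_NOLOCKDEP 0x0020u

_Static_assert((KM_NOLOCKDEP & (KM_NOFS | KM_ZERO)) == 0,
	       "KM_NOLOCKDEP must not overlap existing KM_* bits");

static model_gfp_t model_kmem_flags_convert(xfs_km_flags_t flags)
{
	/* Omits the KM_NOSLEEP branch, __GFP_NOWARN and the per-task NOFS check. */
	model_gfp_t lflags = MODEL_GFP_KERNEL;

	/* KM_NOFS weakens the context: reclaim may not re-enter the fs. */
	if (flags & KM_NOFS)
		lflags &= ~MODEL_GFP_FS;
	if (flags & KM_ZERO)
		lflags |= MODEL_GFP_ZERO;
	/* KM_NOLOCKDEP only annotates for lockdep; the fs bit is kept. */
	if (flags & KM_NOLOCKDEP)
		lflags |= MODEL_GFP_NOLOCKDEP;
	return lflags;
}
```

For example, `assert(!(model_kmem_flags_convert(KM_NOFS) & MODEL_GFP_FS))` holds, while `model_kmem_flags_convert(KM_NOLOCKDEP)` keeps `MODEL_GFP_FS` and adds `MODEL_GFP_NOLOCKDEP`. That is why swapping KM_NOFS for KM_NOLOCKDEP at a call site strengthens the allocation context, and why reviewers wanted each conversion audited for genuine reclaim recursion first.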