Message ID | 20140416040336.10604.58240.stgit@notabene.brown (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote: > Currently both xfs and nfs will handle PF_FSTRANS by disabling > __GFP_FS. > > Make this effect global by repurposing memalloc_noio_flags (which > does the same thing for PF_MEMALLOC_NOIO and __GFP_IO) to generally > impost the task flags on a gfp_t. > Due to this repurposing we change the name of memalloc_noio_flags > to gfp_from_current(). > > As PF_FSTRANS now uniformly removes __GFP_FS we can remove special > code for this from xfs and nfs. > > As we can now expect other code to set PF_FSTRANS, its meaning is more > general, so the WARN_ON in xfs_vm_writepage() which checks PF_FSTRANS > is not set is no longer appropriate. PF_FSTRANS may be set for other > reasons than an XFS transaction. So PF_FSTRANS no longer means "filesystem in transaction context". Are you going to rename to match whatever it's meaning is now? I'm not exactly clear on what it means now... > As lockdep cares about __GFP_FS, we need to translate PF_FSTRANS to > __GFP_FS before calling lockdep_alloc_trace() in various places. > > Signed-off-by: NeilBrown <neilb@suse.de> .... > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h > index 64db0e53edea..882b86270ebe 100644 > --- a/fs/xfs/kmem.h > +++ b/fs/xfs/kmem.h > @@ -50,8 +50,6 @@ kmem_flags_convert(xfs_km_flags_t flags) > lflags = GFP_ATOMIC | __GFP_NOWARN; > } else { > lflags = GFP_KERNEL | __GFP_NOWARN; > - if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS)) > - lflags &= ~__GFP_FS; > } I think KM_NOFS needs to remain here, as it has use outside of transaction contexts that set PF_FSTRANS.... > if (flags & KM_ZERO) > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index db2cfb067d0b..207a7f86d5d7 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -952,13 +952,6 @@ xfs_vm_writepage( > PF_MEMALLOC)) > goto redirty; > > - /* > - * Given that we do not allow direct reclaim to call us, we should > - * never be called while in a filesystem transaction. > - */ > - if (WARN_ON(current->flags & PF_FSTRANS)) > - goto redirty; We still need to ensure this rule isn't broken. If it is, the filesystem will silently deadlock in delayed allocation rather than gracefully handle the problem with a warning.... Cheers, Dave.
On Wed, 16 Apr 2014 15:37:56 +1000 Dave Chinner <david@fromorbit.com> wrote: > On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote: > > Currently both xfs and nfs will handle PF_FSTRANS by disabling > > __GFP_FS. > > > > Make this effect global by repurposing memalloc_noio_flags (which > > does the same thing for PF_MEMALLOC_NOIO and __GFP_IO) to generally > > impost the task flags on a gfp_t. > > Due to this repurposing we change the name of memalloc_noio_flags > > to gfp_from_current(). > > > > As PF_FSTRANS now uniformly removes __GFP_FS we can remove special > > code for this from xfs and nfs. > > > > As we can now expect other code to set PF_FSTRANS, its meaning is more > > general, so the WARN_ON in xfs_vm_writepage() which checks PF_FSTRANS > > is not set is no longer appropriate. PF_FSTRANS may be set for other > > reasons than an XFS transaction. > > So PF_FSTRANS no longer means "filesystem in transaction context". > Are you going to rename to match whatever it's meaning is now? > I'm not exactly clear on what it means now... I did consider renaming it to "PF_MEMALLOC_NOFS" as it is similar to "PF_MEMALLOC_NOIO", except that it disables __GFP_FS rather than __GFP_IO. Maybe I should go ahead with that. > > > > As lockdep cares about __GFP_FS, we need to translate PF_FSTRANS to > > __GFP_FS before calling lockdep_alloc_trace() in various places. > > > > Signed-off-by: NeilBrown <neilb@suse.de> > .... > > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h > > index 64db0e53edea..882b86270ebe 100644 > > --- a/fs/xfs/kmem.h > > +++ b/fs/xfs/kmem.h > > @@ -50,8 +50,6 @@ kmem_flags_convert(xfs_km_flags_t flags) > > lflags = GFP_ATOMIC | __GFP_NOWARN; > > } else { > > lflags = GFP_KERNEL | __GFP_NOWARN; > > - if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS)) > > - lflags &= ~__GFP_FS; > > } > > I think KM_NOFS needs to remain here, as it has use outside of > transaction contexts that set PF_FSTRANS.... Argh, yes of course. I'll have to re-test the other xfs changes now to see if they are really needed. Thanks! > > > if (flags & KM_ZERO) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > index db2cfb067d0b..207a7f86d5d7 100644 > > --- a/fs/xfs/xfs_aops.c > > +++ b/fs/xfs/xfs_aops.c > > @@ -952,13 +952,6 @@ xfs_vm_writepage( > > PF_MEMALLOC)) > > goto redirty; > > > > - /* > > - * Given that we do not allow direct reclaim to call us, we should > > - * never be called while in a filesystem transaction. > > - */ > > - if (WARN_ON(current->flags & PF_FSTRANS)) > > - goto redirty; > > We still need to ensure this rule isn't broken. If it is, the > filesystem will silently deadlock in delayed allocation rather than > gracefully handle the problem with a warning.... Hmm... that might be tricky. The 'new' PF_FSTRANS can definitely be set when xfs_vm_writepage is called and we really want the write to happen. I don't suppose there is any other way to detect if a transaction is happening? Thanks, NeilBrown
On Wed, 16 Apr 2014 16:17:26 +1000 NeilBrown <neilb@suse.de> wrote: > On Wed, 16 Apr 2014 15:37:56 +1000 Dave Chinner <david@fromorbit.com> wrote: > > > On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote: > > > - /* > > > - * Given that we do not allow direct reclaim to call us, we should > > > - * never be called while in a filesystem transaction. > > > - */ > > > - if (WARN_ON(current->flags & PF_FSTRANS)) > > > - goto redirty; > > > > We still need to ensure this rule isn't broken. If it is, the > > filesystem will silently deadlock in delayed allocation rather than > > gracefully handle the problem with a warning.... > > Hmm... that might be tricky. The 'new' PF_FSTRANS can definitely be set when > xfs_vm_writepage is called and we really want the write to happen. > I don't suppose there is any other way to detect if a transaction is > happening? I've been thinking about this some more.... That code is in xfs_vm_writepage which is only called as ->writepage. xfs never calls that directly so it could only possibly be called during reclaim? We know that doesn't happen, but if it does then PF_MEMALLOC would be set, but PF_KSWAPD would not... and you already have a test for that. How about every time we set PF_FSTRANS, we store the corresponding xfs_trans_t in current->journal_info, and clear that field when PF_FSTRANS is cleared. Then xfs_vm_writepage can test for current->journal_info being clear. That is the field that several other filesystems use to keep track of the 'current' transaction. ?? I don't know what xfs_trans_t we would use in xfs_bmapi_allocate_worker, but I suspect you do :-) Thanks, NeilBrown
On Thu, Apr 17, 2014 at 11:03:50AM +1000, NeilBrown wrote: > On Wed, 16 Apr 2014 16:17:26 +1000 NeilBrown <neilb@suse.de> wrote: > > > On Wed, 16 Apr 2014 15:37:56 +1000 Dave Chinner <david@fromorbit.com> wrote: > > > > > On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote: > > > > > - /* > > > > - * Given that we do not allow direct reclaim to call us, we should > > > > - * never be called while in a filesystem transaction. > > > > - */ > > > > - if (WARN_ON(current->flags & PF_FSTRANS)) > > > > - goto redirty; > > > > > > We still need to ensure this rule isn't broken. If it is, the > > > filesystem will silently deadlock in delayed allocation rather than > > > gracefully handle the problem with a warning.... > > > > Hmm... that might be tricky. The 'new' PF_FSTRANS can definitely be set when > > xfs_vm_writepage is called and we really want the write to happen. > > I don't suppose there is any other way to detect if a transaction is > > happening? > > I've been thinking about this some more.... > > That code is in xfs_vm_writepage which is only called as ->writepage. > xfs never calls that directly so it could only possibly be called during > reclaim? __filemap_fdatawrite_range or __writeback_single_inode do_writepages ->writepages xfs_vm_writepages write_cache_pages ->writepage xfs_vm_writepage So explicit data flushes or background writeback still end up in xfs_vm_writepage. > We know that doesn't happen, but if it does then PF_MEMALLOC would be set, > but PF_KSWAPD would not... and you already have a test for that. > > How about every time we set PF_FSTRANS, we store the corresponding > xfs_trans_t in current->journal_info, and clear that field when PF_FSTRANS is > cleared. Then xfs_vm_writepage can test for current->journal_info being > clear. > That is the field that several other filesystems use to keep track of the > 'current' transaction. The difference is that we have an explicit transaction handle in XFS which defines the transaction context. i.e. we don't hide transactions in thread contexts - the transaction defines the atomic context of the modification being made..... > I don't know what xfs_trans_t we would use in > xfs_bmapi_allocate_worker, but I suspect you do :-) The same one we use now. But that's exactly my point. i.e. the transaction handle belongs to the operation being executed, not the thread that is currently executing it. We also hand transaction contexts to IO completion, do interesting things with log space reservations for operations that require multiple commits to complete and so pass state when handles are duplicated prior to commit, etc. We still need direct manipulation and control of the transaction structure, regardless of where it is stored. Cheers, Dave.
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 5bb790a69c71..ed863f52bae7 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -472,8 +472,7 @@ static int nfs_release_page(struct page *page, gfp_t gfp) /* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not * doing this memory reclaim for a fs-related allocation. */ - if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL && - !(current->flags & PF_FSTRANS)) { + if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { int how = FLUSH_SYNC; /* Don't let kswapd deadlock waiting for OOM RPC calls */ diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h index 64db0e53edea..882b86270ebe 100644 --- a/fs/xfs/kmem.h +++ b/fs/xfs/kmem.h @@ -50,8 +50,6 @@ kmem_flags_convert(xfs_km_flags_t flags) lflags = GFP_ATOMIC | __GFP_NOWARN; } else { lflags = GFP_KERNEL | __GFP_NOWARN; - if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS)) - lflags &= ~__GFP_FS; } if (flags & KM_ZERO) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index db2cfb067d0b..207a7f86d5d7 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -952,13 +952,6 @@ xfs_vm_writepage( PF_MEMALLOC)) goto redirty; - /* - * Given that we do not allow direct reclaim to call us, we should - * never be called while in a filesystem transaction. - */ - if (WARN_ON(current->flags & PF_FSTRANS)) - goto redirty; - /* Is this page beyond the end of the file? */ offset = i_size_read(inode); end_index = offset >> PAGE_CACHE_SHIFT; diff --git a/include/linux/sched.h b/include/linux/sched.h index 56fa52a0654c..f3291ed33c27 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1860,10 +1860,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, #define used_math() tsk_used_math(current) /* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */ -static inline gfp_t memalloc_noio_flags(gfp_t flags) +/* __GFP_FS isn't allowed if PF_FSTRANS is set in current->flags */ +static inline gfp_t gfp_from_current(gfp_t flags) { if (unlikely(current->flags & PF_MEMALLOC_NOIO)) flags &= ~__GFP_IO; + if (unlikely(current->flags & PF_FSTRANS)) + flags &= ~__GFP_FS; return flags; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ff8b91aa0b87..5e9225df3447 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2718,6 +2718,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct mem_cgroup *memcg = NULL; gfp_mask &= gfp_allowed_mask; + gfp_mask = gfp_from_current(gfp_mask); lockdep_trace_alloc(gfp_mask); @@ -2765,7 +2766,7 @@ retry_cpuset: * can deadlock because I/O on the device might not * complete. */ - gfp_mask = memalloc_noio_flags(gfp_mask); + gfp_mask = gfp_from_current(gfp_mask); page = __alloc_pages_slowpath(gfp_mask, order, zonelist, high_zoneidx, nodemask, preferred_zone, migratetype); diff --git a/mm/slab.c b/mm/slab.c index b264214c77ea..914d88661f3d 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3206,6 +3206,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, int slab_node = numa_mem_id(); flags &= gfp_allowed_mask; + flags = gfp_from_current(flags); lockdep_trace_alloc(flags); @@ -3293,6 +3294,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller) void *objp; flags &= gfp_allowed_mask; + flags = gfp_from_current(flags); lockdep_trace_alloc(flags); diff --git a/mm/slob.c b/mm/slob.c index 4bf8809dfcce..18206f54d227 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -431,6 +431,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller) void *ret; gfp &= gfp_allowed_mask; + flags = gfp_from_current(flags); lockdep_trace_alloc(gfp); @@ -539,6 +540,7 @@ void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node) void *b; flags &= gfp_allowed_mask; + flags = gfp_from_current(flags); lockdep_trace_alloc(flags); diff --git a/mm/slub.c b/mm/slub.c index 25f14ad8f817..ff7c15e977da 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -961,6 +961,7 @@ static inline void kfree_hook(const void *x) static inline int slab_pre_alloc_hook(struct kmem_cache *s, gfp_t flags) { flags &= gfp_allowed_mask; + flags = gfp_from_current(flags); lockdep_trace_alloc(flags); might_sleep_if(flags & __GFP_WAIT); diff --git a/mm/vmscan.c b/mm/vmscan.c index 67165f839936..05de3289d031 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2592,7 +2592,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order, { unsigned long nr_reclaimed; struct scan_control sc = { - .gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)), + .gfp_mask = (gfp_mask = gfp_from_current(gfp_mask)), .may_writepage = !laptop_mode, .nr_to_reclaim = SWAP_CLUSTER_MAX, .may_unmap = 1, @@ -3524,7 +3524,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP), .may_swap = 1, .nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX), - .gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)), + .gfp_mask = (gfp_mask = gfp_from_current(gfp_mask)), .order = order, .priority = ZONE_RECLAIM_PRIORITY, };
Currently both xfs and nfs will handle PF_FSTRANS by disabling __GFP_FS. Make this effect global by repurposing memalloc_noio_flags (which does the same thing for PF_MEMALLOC_NOIO and __GFP_IO) to generally impost the task flags on a gfp_t. Due to this repurposing we change the name of memalloc_noio_flags to gfp_from_current(). As PF_FSTRANS now uniformly removes __GFP_FS we can remove special code for this from xfs and nfs. As we can now expect other code to set PF_FSTRANS, its meaning is more general, so the WARN_ON in xfs_vm_writepage() which checks PF_FSTRANS is not set is no longer appropriate. PF_FSTRANS may be set for other reasons than an XFS transaction. As lockdep cares about __GFP_FS, we need to translate PF_FSTRANS to __GFP_FS before calling lockdep_alloc_trace() in various places. Signed-off-by: NeilBrown <neilb@suse.de> --- fs/nfs/file.c | 3 +-- fs/xfs/kmem.h | 2 -- fs/xfs/xfs_aops.c | 7 ------- include/linux/sched.h | 5 ++++- mm/page_alloc.c | 3 ++- mm/slab.c | 2 ++ mm/slob.c | 2 ++ mm/slub.c | 1 + mm/vmscan.c | 4 ++-- 9 files changed, 14 insertions(+), 15 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html