Message ID | 20240126220756.395187-2-kent.overstreet@linux.dev (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] mm: introduce memalloc_flags_{save,restore} | expand |
On Fri 26-01-24 17:07:56, Kent Overstreet wrote: > If we're using PF_MEMALLOC, we might have a fallback and might not to > warn about a failing allocation - thus we need a PF_* equivalent of > __GFP_NOWARN. Could you be more specific about the user? Is this an allocation from the reclaim path or an explicit PF_MEMALLOC one? It would be also really helpful to explain why GFP_NOWARN cannot be used directly. Thanks!
On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote: > On Fri 26-01-24 17:07:56, Kent Overstreet wrote: > > If we're using PF_MEMALLOC, we might have a fallback and might not to > > warn about a failing allocation - thus we need a PF_* equivalent of > > __GFP_NOWARN. > > Could you be more specific about the user? Is this an allocation from > the reclaim path or an explicit PF_MEMALLOC one? It would be also really > helpful to explain why GFP_NOWARN cannot be used directly. Explicit PF_MEMALLOC. It's for a call to alloc_inode(), which doesn't take gfp flags, and plumbing it would require modifying a s_ops callback and would touch every filesystem - but we want to get away from gfp flags anyways :) More specifically, the code where I'm using it is doing a "try GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance; it's part of a cleanup for some weird lifetime stuff related to fs/inode.c. #define memalloc_flags_do(_flags, _do) \ ({ \ unsigned _saved_flags = memalloc_flags_save(_flags); \ typeof(_do) _ret = _do; \ memalloc_noreclaim_restore(_saved_flags); \ _ret; \ }) /* * Allocate a new inode, dropping/retaking btree locks if necessary: */ static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) { struct bch_fs *c = trans->c; struct bch_inode_info *inode = memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb))); if (unlikely(!inode)) { int ret = drop_locks_do(trans, (inode = to_bch_ei(new_inode(c->vfs_sb))) ? 0 : -ENOMEM); if (ret && inode) discard_new_inode(&inode->v); if (ret) return ERR_PTR(ret); } return inode; }
On Sun 28-01-24 14:43:16, Kent Overstreet wrote: > On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote: > > On Fri 26-01-24 17:07:56, Kent Overstreet wrote: > > > If we're using PF_MEMALLOC, we might have a fallback and might not to > > > warn about a failing allocation - thus we need a PF_* equivalent of > > > __GFP_NOWARN. > > > > Could you be more specific about the user? Is this an allocation from > > the reclaim path or an explicit PF_MEMALLOC one? It would be also really > > helpful to explain why GFP_NOWARN cannot be used directly. > > Explicit PF_MEMALLOC. > > It's for a call to alloc_inode(), which doesn't take gfp flags, and > plumbing it would require modifying a s_ops callback and would touch > every filesystem - OK, I see. This should be part of the changelog. > but we want to get away from gfp flags anyways :) > > More specifically, the code where I'm using it is doing a "try > GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance; > it's part of a cleanup for some weird lifetime stuff related to > fs/inode.c. > > #define memalloc_flags_do(_flags, _do) \ > ({ \ > unsigned _saved_flags = memalloc_flags_save(_flags); \ > typeof(_do) _ret = _do; \ > memalloc_noreclaim_restore(_saved_flags); \ > _ret; \ > }) > > /* > * Allocate a new inode, dropping/retaking btree locks if necessary: > */ > static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) > { > struct bch_fs *c = trans->c; > > struct bch_inode_info *inode = > memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb))); Is this what you meant by GFP_NOWAIT allocation? You would be right about this allocation not entering the direct reclaim but do you realize this is not an ordinary NOWAIT request beause PF_MEMALLOC will allow to dip into memory reserves without any limits? (Unless the specific allocation down the road is explicitly GFP_NOMEMALLOC) A failure here means that the system is really in a desparate state with all the memory reserves gone which migt break reclaimers who rely on those reserves. Are you sure it is a good idea? Unless I am missing something you are just giving an ordinary user access to those reserves by creating inodes without any bounds.
On Mon, Jan 29, 2024 at 11:48:00AM +0100, Michal Hocko wrote: > On Sun 28-01-24 14:43:16, Kent Overstreet wrote: > > On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote: > > > On Fri 26-01-24 17:07:56, Kent Overstreet wrote: > > > > If we're using PF_MEMALLOC, we might have a fallback and might not to > > > > warn about a failing allocation - thus we need a PF_* equivalent of > > > > __GFP_NOWARN. > > > > > > Could you be more specific about the user? Is this an allocation from > > > the reclaim path or an explicit PF_MEMALLOC one? It would be also really > > > helpful to explain why GFP_NOWARN cannot be used directly. > > > > Explicit PF_MEMALLOC. > > > > It's for a call to alloc_inode(), which doesn't take gfp flags, and > > plumbing it would require modifying a s_ops callback and would touch > > every filesystem - > > OK, I see. This should be part of the changelog. > > > but we want to get away from gfp flags anyways :) > > > > More specifically, the code where I'm using it is doing a "try > > GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance; > > it's part of a cleanup for some weird lifetime stuff related to > > fs/inode.c. > > > > #define memalloc_flags_do(_flags, _do) \ > > ({ \ > > unsigned _saved_flags = memalloc_flags_save(_flags); \ > > typeof(_do) _ret = _do; \ > > memalloc_noreclaim_restore(_saved_flags); \ > > _ret; \ > > }) > > > > /* > > * Allocate a new inode, dropping/retaking btree locks if necessary: > > */ > > static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) > > { > > struct bch_fs *c = trans->c; > > > > struct bch_inode_info *inode = > > memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb))); > > Is this what you meant by GFP_NOWAIT allocation? You would be right > about this allocation not entering the direct reclaim but do you realize > this is not an ordinary NOWAIT request beause PF_MEMALLOC will allow to > dip into memory reserves without any limits? (Unless the specific > allocation down the road is explicitly GFP_NOMEMALLOC) > A failure here means that the system is really in a desparate state with all > the memory reserves gone which migt break reclaimers who rely on those > reserves. > > Are you sure it is a good idea? Unless I am missing something you are > just giving an ordinary user access to those reserves by creating inodes > without any bounds. Did not realize that; thank you. How about this version? From 0e87e55058ccddde4b6bcc092f43e66a4e632575 Mon Sep 17 00:00:00 2001 From: Kent Overstreet <kent.overstreet@linux.dev> Date: Thu, 25 Jan 2024 19:00:24 -0500 Subject: [PATCH] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN Introduce PF_MEMALLOC_* equivalents of some GFP_ flags: PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT PF_MEMALLOC_NOWARN -> __GFP_NOWARN Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> diff --git a/include/linux/sched.h b/include/linux/sched.h index cdb8ea53c365..36de7a2343c3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1632,12 +1632,12 @@ extern struct pid *cad_pid; #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ -#define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to, +#define PF_MEMALLOC_NORECLAIM 0x00100000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF_MEMALLOC_NOWARN 0x00200000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF_LOCAL_THROTTLE 0x00400000 /* Throttle writes only against the bdi I write to, * I am cleaning dirty pages from some other bdi. */ -#define PF_KTHREAD 0x00200000 /* I am a kernel thread */ -#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF__HOLE__00800000 0x00800000 -#define PF__HOLE__01000000 0x01000000 +#define PF_KTHREAD 0x00800000 /* I am a kernel thread */ +#define PF_RANDOMIZE 0x01000000 /* Randomize virtual address space */ #define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index f00d7ecc2adf..c29059a76052 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags) { unsigned int pflags = READ_ONCE(current->flags); - if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) { + if (unlikely(pflags & (PF_MEMALLOC_NOIO | + PF_MEMALLOC_NOFS | + PF_MEMALLOC_NORECLAIM | + PF_MEMALLOC_NOWARN | + PF_MEMALLOC_PIN))) { /* - * NOIO implies both NOIO and NOFS and it is a weaker context - * so always make sure it makes precedence + * Stronger flags before weaker flags: + * NORECLAIM implies NOIO, which in turn implies NOFS */ - if (pflags & PF_MEMALLOC_NOIO) + if (pflags & PF_MEMALLOC_NORECLAIM) + flags &= ~__GFP_DIRECT_RECLAIM; + else if (pflags & PF_MEMALLOC_NOIO) flags &= ~(__GFP_IO | __GFP_FS); else if (pflags & PF_MEMALLOC_NOFS) flags &= ~__GFP_FS; + if (pflags & PF_MEMALLOC_NOWARN) + flags |= __GFP_NOWARN; + if (pflags & PF_MEMALLOC_PIN) flags &= ~__GFP_MOVABLE; }
On Thu 01-02-24 06:03:27, Kent Overstreet wrote: > On Mon, Jan 29, 2024 at 11:48:00AM +0100, Michal Hocko wrote: [...] > >From 0e87e55058ccddde4b6bcc092f43e66a4e632575 Mon Sep 17 00:00:00 2001 > From: Kent Overstreet <kent.overstreet@linux.dev> > Date: Thu, 25 Jan 2024 19:00:24 -0500 > Subject: [PATCH] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN > > Introduce PF_MEMALLOC_* equivalents of some GFP_ flags: > > PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT scoped NOWAIT/NORECLAIM semantic has been proposed in the past. I haven't been fan of it TBH. This is really tricky because unlike other scoped flags this changes the allocation failure semantic for the code that is not really aware of that. Even worse if the code inside the scope has a completely different allocation failure requirements - e.g. consider a nested GFP_NOFAIL request. Your implementation would simply override that and cause an unexpected failure. Now you could exclude GFP_NOFAIL explicitly but that brings more problems down the road - e.g. consider the scope NOWAIT is a way to prevent from sleeping from within atomic context. Now nested GFP_NOFAIL request either busy waits or sleeps inside of an atomic context. No good!
diff --git a/include/linux/sched.h b/include/linux/sched.h index cdb8ea53c365..36a5046ea3fa 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1632,11 +1632,11 @@ extern struct pid *cad_pid; #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ -#define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to, +#define PF_MEMALLOC_NOWARN 0x00100000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF_LOCAL_THROTTLE 0x00200000 /* Throttle writes only against the bdi I write to, * I am cleaning dirty pages from some other bdi. */ -#define PF_KTHREAD 0x00200000 /* I am a kernel thread */ -#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF__HOLE__00800000 0x00800000 +#define PF_KTHREAD 0x00400000 /* I am a kernel thread */ +#define PF_RANDOMIZE 0x00800000 /* Randomize virtual address space */ #define PF__HOLE__01000000 0x01000000 #define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index f00d7ecc2adf..817f5c1bf3d5 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -236,7 +236,10 @@ static inline gfp_t current_gfp_context(gfp_t flags) { unsigned int pflags = READ_ONCE(current->flags); - if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) { + if (unlikely(pflags & (PF_MEMALLOC_NOIO | + PF_MEMALLOC_NOFS | + PF_MEMALLOC_NOWARN | + PF_MEMALLOC_PIN))) { /* * NOIO implies both NOIO and NOFS and it is a weaker context * so always make sure it makes precedence @@ -246,6 +249,9 @@ static inline gfp_t current_gfp_context(gfp_t flags) else if (pflags & PF_MEMALLOC_NOFS) flags &= ~__GFP_FS; + if (pflags & PF_MEMALLOC_NOWARN) + flags |= __GFP_NOWARN; + if (pflags & PF_MEMALLOC_PIN) flags &= ~__GFP_MOVABLE; }
If we're using PF_MEMALLOC, we might have a fallback and might not to warn about a failing allocation - thus we need a PF_* equivalent of __GFP_NOWARN. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> --- include/linux/sched.h | 8 ++++---- include/linux/sched/mm.h | 8 +++++++- 2 files changed, 11 insertions(+), 5 deletions(-)