diff mbox series

[2/2] mm: introduce PF_MEMALLOC_NOWARN

Message ID 20240126220756.395187-2-kent.overstreet@linux.dev (mailing list archive)
State New
Headers show
Series [1/2] mm: introduce memalloc_flags_{save,restore} | expand

Commit Message

Kent Overstreet Jan. 26, 2024, 10:07 p.m. UTC
If we're using PF_MEMALLOC, we might have a fallback and might not to
warn about a failing allocation - thus we need a PF_* equivalent of
__GFP_NOWARN.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
---
 include/linux/sched.h    | 8 ++++----
 include/linux/sched/mm.h | 8 +++++++-
 2 files changed, 11 insertions(+), 5 deletions(-)

Comments

Michal Hocko Jan. 28, 2024, 3:45 p.m. UTC | #1
On Fri 26-01-24 17:07:56, Kent Overstreet wrote:
> If we're using PF_MEMALLOC, we might have a fallback and might not to
> warn about a failing allocation - thus we need a PF_* equivalent of
> __GFP_NOWARN.

Could you be more specific about the user? Is this an allocation from
the reclaim path or an explicit PF_MEMALLOC one? It would be also really
helpful to explain why GFP_NOWARN cannot be used directly.

Thanks!
Kent Overstreet Jan. 28, 2024, 7:43 p.m. UTC | #2
On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote:
> On Fri 26-01-24 17:07:56, Kent Overstreet wrote:
> > If we're using PF_MEMALLOC, we might have a fallback and might not to
> > warn about a failing allocation - thus we need a PF_* equivalent of
> > __GFP_NOWARN.
> 
> Could you be more specific about the user? Is this an allocation from
> the reclaim path or an explicit PF_MEMALLOC one? It would be also really
> helpful to explain why GFP_NOWARN cannot be used directly.

Explicit PF_MEMALLOC.

It's for a call to alloc_inode(), which doesn't take gfp flags, and
plumbing it would require modifying a s_ops callback and would touch
every filesystem - but we want to get away from gfp flags anyways :)

More specifically, the code where I'm using it is doing a "try
GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance;
it's part of a cleanup for some weird lifetime stuff related to
fs/inode.c.

#define memalloc_flags_do(_flags, _do)						\
({										\
	unsigned _saved_flags = memalloc_flags_save(_flags);			\
	typeof(_do) _ret = _do;							\
	memalloc_noreclaim_restore(_saved_flags);				\
	_ret;									\
})

/*
 * Allocate a new inode, dropping/retaking btree locks if necessary:
 */
static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
{
	struct bch_fs *c = trans->c;

	struct bch_inode_info *inode =
		memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb)));

	if (unlikely(!inode)) {
		int ret = drop_locks_do(trans, (inode = to_bch_ei(new_inode(c->vfs_sb))) ? 0 : -ENOMEM);
		if (ret && inode)
			discard_new_inode(&inode->v);
		if (ret)
			return ERR_PTR(ret);
	}

	return inode;
}
Michal Hocko Jan. 29, 2024, 10:48 a.m. UTC | #3
On Sun 28-01-24 14:43:16, Kent Overstreet wrote:
> On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote:
> > On Fri 26-01-24 17:07:56, Kent Overstreet wrote:
> > > If we're using PF_MEMALLOC, we might have a fallback and might not to
> > > warn about a failing allocation - thus we need a PF_* equivalent of
> > > __GFP_NOWARN.
> > 
> > Could you be more specific about the user? Is this an allocation from
> > the reclaim path or an explicit PF_MEMALLOC one? It would be also really
> > helpful to explain why GFP_NOWARN cannot be used directly.
> 
> Explicit PF_MEMALLOC.
> 
> It's for a call to alloc_inode(), which doesn't take gfp flags, and
> plumbing it would require modifying a s_ops callback and would touch
> every filesystem -

OK, I see. This should be part of the changelog.

> but we want to get away from gfp flags anyways :)
> 
> More specifically, the code where I'm using it is doing a "try
> GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance;
> it's part of a cleanup for some weird lifetime stuff related to
> fs/inode.c.
> 
> #define memalloc_flags_do(_flags, _do)						\
> ({										\
> 	unsigned _saved_flags = memalloc_flags_save(_flags);			\
> 	typeof(_do) _ret = _do;							\
> 	memalloc_noreclaim_restore(_saved_flags);				\
> 	_ret;									\
> })
> 
> /*
>  * Allocate a new inode, dropping/retaking btree locks if necessary:
>  */
> static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
> {
> 	struct bch_fs *c = trans->c;
> 
> 	struct bch_inode_info *inode =
> 		memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb)));

Is this what you meant by GFP_NOWAIT allocation? You would be right
about this allocation not entering the direct reclaim but do you realize
this is not an ordinary NOWAIT request beause PF_MEMALLOC will allow to
dip into memory reserves without any limits? (Unless the specific
allocation down the road is explicitly GFP_NOMEMALLOC)
A failure here means that the system is really in a desparate state with all
the memory reserves gone which migt break reclaimers who rely on those
reserves.

Are you sure it is a good idea? Unless I am missing something you are
just giving an ordinary user access to those reserves by creating inodes
without any bounds.
Kent Overstreet Feb. 1, 2024, 11:03 a.m. UTC | #4
On Mon, Jan 29, 2024 at 11:48:00AM +0100, Michal Hocko wrote:
> On Sun 28-01-24 14:43:16, Kent Overstreet wrote:
> > On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote:
> > > On Fri 26-01-24 17:07:56, Kent Overstreet wrote:
> > > > If we're using PF_MEMALLOC, we might have a fallback and might not to
> > > > warn about a failing allocation - thus we need a PF_* equivalent of
> > > > __GFP_NOWARN.
> > > 
> > > Could you be more specific about the user? Is this an allocation from
> > > the reclaim path or an explicit PF_MEMALLOC one? It would be also really
> > > helpful to explain why GFP_NOWARN cannot be used directly.
> > 
> > Explicit PF_MEMALLOC.
> > 
> > It's for a call to alloc_inode(), which doesn't take gfp flags, and
> > plumbing it would require modifying a s_ops callback and would touch
> > every filesystem -
> 
> OK, I see. This should be part of the changelog.
> 
> > but we want to get away from gfp flags anyways :)
> > 
> > More specifically, the code where I'm using it is doing a "try
> > GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance;
> > it's part of a cleanup for some weird lifetime stuff related to
> > fs/inode.c.
> > 
> > #define memalloc_flags_do(_flags, _do)						\
> > ({										\
> > 	unsigned _saved_flags = memalloc_flags_save(_flags);			\
> > 	typeof(_do) _ret = _do;							\
> > 	memalloc_noreclaim_restore(_saved_flags);				\
> > 	_ret;									\
> > })
> > 
> > /*
> >  * Allocate a new inode, dropping/retaking btree locks if necessary:
> >  */
> > static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
> > {
> > 	struct bch_fs *c = trans->c;
> > 
> > 	struct bch_inode_info *inode =
> > 		memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb)));
> 
> Is this what you meant by GFP_NOWAIT allocation? You would be right
> about this allocation not entering the direct reclaim but do you realize
> this is not an ordinary NOWAIT request beause PF_MEMALLOC will allow to
> dip into memory reserves without any limits? (Unless the specific
> allocation down the road is explicitly GFP_NOMEMALLOC)
> A failure here means that the system is really in a desparate state with all
> the memory reserves gone which migt break reclaimers who rely on those
> reserves.
> 
> Are you sure it is a good idea? Unless I am missing something you are
> just giving an ordinary user access to those reserves by creating inodes
> without any bounds.

Did not realize that; thank you.

How about this version?

From 0e87e55058ccddde4b6bcc092f43e66a4e632575 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kent.overstreet@linux.dev>
Date: Thu, 25 Jan 2024 19:00:24 -0500
Subject: [PATCH] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN

Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:

PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT
PF_MEMALLOC_NOWARN	-> __GFP_NOWARN

Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index cdb8ea53c365..36de7a2343c3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1632,12 +1632,12 @@ extern struct pid *cad_pid;
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
-#define PF_LOCAL_THROTTLE	0x00100000	/* Throttle writes only against the bdi I write to,
+#define PF_MEMALLOC_NORECLAIM	0x00100000	/* All allocation requests will inherit __GFP_NOWARN */
+#define PF_MEMALLOC_NOWARN	0x00200000	/* All allocation requests will inherit __GFP_NOWARN */
+#define PF_LOCAL_THROTTLE	0x00400000	/* Throttle writes only against the bdi I write to,
 						 * I am cleaning dirty pages from some other bdi. */
-#define PF_KTHREAD		0x00200000	/* I am a kernel thread */
-#define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
-#define PF__HOLE__01000000	0x01000000
+#define PF_KTHREAD		0x00800000	/* I am a kernel thread */
+#define PF_RANDOMIZE		0x01000000	/* Randomize virtual address space */
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f00d7ecc2adf..c29059a76052 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
+			       PF_MEMALLOC_NOFS |
+			       PF_MEMALLOC_NORECLAIM |
+			       PF_MEMALLOC_NOWARN |
+			       PF_MEMALLOC_PIN))) {
 		/*
-		 * NOIO implies both NOIO and NOFS and it is a weaker context
-		 * so always make sure it makes precedence
+		 * Stronger flags before weaker flags:
+		 * NORECLAIM implies NOIO, which in turn implies NOFS
 		 */
-		if (pflags & PF_MEMALLOC_NOIO)
+		if (pflags & PF_MEMALLOC_NORECLAIM)
+			flags &= ~__GFP_DIRECT_RECLAIM;
+		else if (pflags & PF_MEMALLOC_NOIO)
 			flags &= ~(__GFP_IO | __GFP_FS);
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
 
+		if (pflags & PF_MEMALLOC_NOWARN)
+			flags |= __GFP_NOWARN;
+
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 	}
Michal Hocko Feb. 1, 2024, 3:59 p.m. UTC | #5
On Thu 01-02-24 06:03:27, Kent Overstreet wrote:
> On Mon, Jan 29, 2024 at 11:48:00AM +0100, Michal Hocko wrote:
[...]
> >From 0e87e55058ccddde4b6bcc092f43e66a4e632575 Mon Sep 17 00:00:00 2001
> From: Kent Overstreet <kent.overstreet@linux.dev>
> Date: Thu, 25 Jan 2024 19:00:24 -0500
> Subject: [PATCH] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN
> 
> Introduce PF_MEMALLOC_* equivalents of some GFP_ flags:
> 
> PF_MEMALLOC_NORECLAIM	-> GFP_NOWAIT

scoped NOWAIT/NORECLAIM semantic has been proposed in the past. I
haven't been fan of it TBH. This is really tricky because unlike other
scoped flags this changes the allocation failure semantic for the code
that is not really aware of that. Even worse if the code inside the
scope has a completely different allocation failure requirements - e.g.
consider a nested GFP_NOFAIL request. Your implementation would simply
override that and cause an unexpected failure. Now you could exclude
GFP_NOFAIL explicitly but that brings more problems down the road - e.g.
consider the scope NOWAIT is a way to prevent from sleeping from within
atomic context. Now nested GFP_NOFAIL request either busy waits or
sleeps inside of an atomic context. No good!
diff mbox series

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index cdb8ea53c365..36a5046ea3fa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1632,11 +1632,11 @@  extern struct pid *cad_pid;
 #define PF_KSWAPD		0x00020000	/* I am kswapd */
 #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
 #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
-#define PF_LOCAL_THROTTLE	0x00100000	/* Throttle writes only against the bdi I write to,
+#define PF_MEMALLOC_NOWARN	0x00100000	/* All allocation requests will inherit __GFP_NOWARN */
+#define PF_LOCAL_THROTTLE	0x00200000	/* Throttle writes only against the bdi I write to,
 						 * I am cleaning dirty pages from some other bdi. */
-#define PF_KTHREAD		0x00200000	/* I am a kernel thread */
-#define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
+#define PF_KTHREAD		0x00400000	/* I am a kernel thread */
+#define PF_RANDOMIZE		0x00800000	/* Randomize virtual address space */
 #define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f00d7ecc2adf..817f5c1bf3d5 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -236,7 +236,10 @@  static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
+			       PF_MEMALLOC_NOFS |
+			       PF_MEMALLOC_NOWARN |
+			       PF_MEMALLOC_PIN))) {
 		/*
 		 * NOIO implies both NOIO and NOFS and it is a weaker context
 		 * so always make sure it makes precedence
@@ -246,6 +249,9 @@  static inline gfp_t current_gfp_context(gfp_t flags)
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
 
+		if (pflags & PF_MEMALLOC_NOWARN)
+			flags |= __GFP_NOWARN;
+
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 	}