Message ID | 96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [net] sunrpc: Use GFP_NOFS to prevent use of current->task_frag. |
On Wed, 2022-09-21 at 14:16 +0200, Guillaume Nault wrote:
> Commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all
> rpciod/xprtiod jobs") stopped setting sk->sk_allocation explicitly in
> favor of using memalloc_nofs_save()/memalloc_nofs_restore() critical
> sections.
>
> However, ->sk_allocation isn't used just by the memory allocator.
> In particular, sk_page_frag() uses it to figure out if it can return
> the page_frag from current or if it has to use the socket one.
> With ->sk_allocation set to the default GFP_KERNEL, sk_page_frag() now
> returns current->page_frag, which might already be in use in the
> current context if the call happens during memory reclaim.
>
> Fix this by setting ->sk_allocation to GFP_NOFS.
> Note that we can't just instruct sk_page_frag() to look at
> current->flags, because it could generate a cache miss, thus slowing
> down the TCP fast path.
>
> This is similar to the problems fixed by the following two commits:
> * cifs: commit dacb5d8875cc ("tcp: fix page frag corruption on page
> fault").
> * nbd: commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
> memory reclaim").
>
> Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
> Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>

It's unfortunate, but I think we need to keep both memalloc_nofs_save()
and sk_allocation for the time being.

Thanks Guillaume, patch LGTM.

Acked-by: Paolo Abeni <pabeni@redhat.com>
On 21 Sep 2022, at 8:16, Guillaume Nault wrote:
> Commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all
> rpciod/xprtiod jobs") stopped setting sk->sk_allocation explicitly in
> favor of using memalloc_nofs_save()/memalloc_nofs_restore() critical
> sections.
>
> However, ->sk_allocation isn't used just by the memory allocator.
> In particular, sk_page_frag() uses it to figure out if it can return
> the page_frag from current or if it has to use the socket one.
> With ->sk_allocation set to the default GFP_KERNEL, sk_page_frag() now
> returns current->page_frag, which might already be in use in the
> current context if the call happens during memory reclaim.
>
> Fix this by setting ->sk_allocation to GFP_NOFS.
> Note that we can't just instruct sk_page_frag() to look at
> current->flags, because it could generate a cache miss, thus slowing
> down the TCP fast path.
>
> This is similar to the problems fixed by the following two commits:
> * cifs: commit dacb5d8875cc ("tcp: fix page frag corruption on page
> fault").
> * nbd: commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
> memory reclaim").
>
> Link:
> https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
> Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all
> rpciod/xprtiod jobs")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>

Looks good, and thanks for looking through all the options.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Ben
On Wed, 2022-09-21 at 14:16 +0200, Guillaume Nault wrote:
> Commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all
> rpciod/xprtiod jobs") stopped setting sk->sk_allocation explicitly in
> favor of using memalloc_nofs_save()/memalloc_nofs_restore() critical
> sections.
>
> However, ->sk_allocation isn't used just by the memory allocator.
> In particular, sk_page_frag() uses it to figure out if it can return
> the page_frag from current or if it has to use the socket one.
> With ->sk_allocation set to the default GFP_KERNEL, sk_page_frag() now
> returns current->page_frag, which might already be in use in the
> current context if the call happens during memory reclaim.
>
> Fix this by setting ->sk_allocation to GFP_NOFS.
> Note that we can't just instruct sk_page_frag() to look at
> current->flags, because it could generate a cache miss, thus slowing
> down the TCP fast path.
>
> This is similar to the problems fixed by the following two commits:
> * cifs: commit dacb5d8875cc ("tcp: fix page frag corruption on page
> fault").
> * nbd: commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
> memory reclaim").
>
> Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
> Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs")
> Signed-off-by: Guillaume Nault <gnault@redhat.com>

@Trond, @Anna, @Chuck: are you ok with this patch? Should we take it via
the net tree or will you merge it?

Thanks!

Paolo
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index e976007f4fd0..1bd3048d43ae 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1882,6 +1882,7 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 		sk->sk_write_space = xs_udp_write_space;
 		sk->sk_state_change = xs_local_state_change;
 		sk->sk_error_report = xs_error_report;
+		sk->sk_allocation = GFP_NOFS;
 
 		xprt_clear_connected(xprt);
 
@@ -2083,6 +2084,7 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		sk->sk_user_data = xprt;
 		sk->sk_data_ready = xs_data_ready;
 		sk->sk_write_space = xs_udp_write_space;
+		sk->sk_allocation = GFP_NOFS;
 
 		xprt_set_connected(xprt);
 
@@ -2250,6 +2252,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		sk->sk_state_change = xs_tcp_state_change;
 		sk->sk_write_space = xs_tcp_write_space;
 		sk->sk_error_report = xs_error_report;
+		sk->sk_allocation = GFP_NOFS;
 
 		/* socket options */
 		sock_reset_flag(sk, SOCK_LINGER);
Commit a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all
rpciod/xprtiod jobs") stopped setting sk->sk_allocation explicitly in
favor of using memalloc_nofs_save()/memalloc_nofs_restore() critical
sections.

However, ->sk_allocation isn't used just by the memory allocator.
In particular, sk_page_frag() uses it to figure out if it can return
the page_frag from current or if it has to use the socket one.
With ->sk_allocation set to the default GFP_KERNEL, sk_page_frag() now
returns current->page_frag, which might already be in use in the
current context if the call happens during memory reclaim.

Fix this by setting ->sk_allocation to GFP_NOFS.
Note that we can't just instruct sk_page_frag() to look at
current->flags, because it could generate a cache miss, thus slowing
down the TCP fast path.

This is similar to the problems fixed by the following two commits:
* cifs: commit dacb5d8875cc ("tcp: fix page frag corruption on page
  fault").
* nbd: commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
  memory reclaim").

Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
Fixes: a1231fda7e94 ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 net/sunrpc/xprtsock.c | 3 +++
 1 file changed, 3 insertions(+)
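The distinction the commit message draws, between what the memory allocator honors and what sk_page_frag() looks at, can be sketched as a small userspace model. This is an illustration only, not kernel source: every MODEL_* value and model_* name below is hypothetical, and the mask comparison is an assumption modeled on the check described for commit dacb5d8875cc (use the per-task frag only when the socket mask allows both direct reclaim and FS recursion, and is not a MEMALLOC mask).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel defines the real ones. */
#define MODEL_GFP_DIRECT_RECLAIM 0x01u
#define MODEL_GFP_KSWAPD_RECLAIM 0x02u
#define MODEL_GFP_IO             0x04u
#define MODEL_GFP_FS             0x08u
#define MODEL_GFP_MEMALLOC       0x10u

#define MODEL_GFP_NOFS \
	(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_NOFS | MODEL_GFP_FS)

/* Stand-in for the task flag that memalloc_nofs_save() sets. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u

struct model_task { unsigned int flags; };
struct model_sock { unsigned int sk_allocation; };

/* Model of memalloc_nofs_save(): mark the task, return the previous state. */
static unsigned int model_nofs_save(struct model_task *task)
{
	unsigned int old = task->flags & MODEL_PF_MEMALLOC_NOFS;

	task->flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Model of memalloc_nofs_restore(): put the saved state back. */
static void model_nofs_restore(struct model_task *task, unsigned int old)
{
	task->flags = (task->flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* What the allocator ends up using: the task flag strips the FS bit. */
static unsigned int model_allocator_gfp(const struct model_task *task,
					unsigned int gfp)
{
	if (task->flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* What the frag selection looks at: only the socket's own mask. */
static bool model_uses_task_frag(const struct model_sock *sk)
{
	unsigned int mask = MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_MEMALLOC |
			    MODEL_GFP_FS;

	return (sk->sk_allocation & mask) ==
	       (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_FS);
}

/* Walk through the situation before and after the patch. */
static void model_self_check(void)
{
	struct model_task task = { 0 };
	struct model_sock sk = { MODEL_GFP_KERNEL };
	unsigned int old = model_nofs_save(&task);

	/* Inside the NOFS section the allocator is constrained... */
	assert(!(model_allocator_gfp(&task, sk.sk_allocation) & MODEL_GFP_FS));
	/* ...but the frag choice still selects the per-task frag. */
	assert(model_uses_task_frag(&sk));

	/* With the patch applied, the socket mask itself says NOFS. */
	sk.sk_allocation = MODEL_GFP_NOFS;
	assert(!model_uses_task_frag(&sk));

	model_nofs_restore(&task, old);
	assert(task.flags == 0);
}
```

Calling model_self_check() from a test main() exercises the three cases. In this model, the save/restore pair changes only what the allocator sees, which is why the patch sets sk->sk_allocation in addition to keeping the memalloc_nofs_save() sections.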