Message ID | 20250218143005.1318886-1-wangzhaolong1@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | smb: client: Fix netns refcount imbalance causing leaks and use-after-free | expand |
tentatively merged into cifs-2.6.git for-next pending more review and testing On Tue, Feb 18, 2025 at 8:34 AM Wang Zhaolong <wangzhaolong1@huawei.com> wrote: > > Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network > namespace.") attempted to fix a netns use-after-free issue by manually > adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add(). > > However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock > after rmmod") pointed out that the approach of manually setting > sk->sk_net_refcnt in the first commit was technically incorrect, as > sk->sk_net_refcnt should only be set for user sockets. It led to issues > like TCP timers not being cleared properly on close. The second commit > moved to a model of just holding an extra netns reference for > server->ssocket using get_net(), and dropping it when the server is torn > down. > > But there remain some gaps in the get_net()/put_net() balancing added by > these commits. The incomplete reference handling in these fixes results > in two issues: > > 1. Netns refcount leaks[1] > > The problem process is as follows: > > ``` > mount.cifs cifsd > > cifs_do_mount > cifs_mount > cifs_mount_get_session > cifs_get_tcp_session > get_net() /* First get net. */ > ip_connect > generic_ip_connect /* Try port 445 */ > get_net() > ->connect() /* Failed */ > put_net() > generic_ip_connect /* Try port 139 */ > get_net() /* Missing matching put_net() for this get_net().*/ > cifs_get_smb_ses > cifs_negotiate_protocol > smb2_negotiate > SMB2_negotiate > cifs_send_recv > wait_for_response > cifs_demultiplex_thread > cifs_read_from_socket > cifs_readv_from_socket > cifs_reconnect > cifs_abort_connection > sock_release(); > server->ssocket = NULL; > /* Missing put_net() here. */ > generic_ip_connect > get_net() > ->connect() /* Failed */ > put_net() > sock_release(); > server->ssocket = NULL; > free_rsp_buf > ... > clean_demultiplex_info > /* It's only called once here. */ > put_net() > ``` > > When cifs_reconnect() is triggered, the server->ssocket is released > without a corresponding put_net() for the reference acquired in > generic_ip_connect() before. it ends up calling generic_ip_connect() > again to retry get_net(). After that, server->ssocket is set to NULL > in the error path of generic_ip_connect(), and the net count cannot be > released in the final clean_demultiplex_info() function. > > 2. Potential use-after-free > > The current refcounting scheme can lead to a potential use-after-free issue > in the following scenario: > > ``` > cifs_do_mount > cifs_mount > cifs_mount_get_session > cifs_get_tcp_session > get_net() /* First get net */ > ip_connect > generic_ip_connect > get_net() > bind_socket > kernel_bind /* failed */ > put_net() > /* after out_err_crypto_release label */ > put_net() > /* after out_err label */ > put_net() > ``` > > In the exception handling process where binding the socket fails, the > get_net() and put_net() calls are unbalanced, which may cause the > server->net reference count to drop to zero and be prematurely released. > > To address both issues, this patch ties the netns reference counting to > the server->ssocket and server lifecycles. The extra reference is now > acquired when the server or socket is created, and released when the > socket is destroyed or the server is torn down. > > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=219792 > > Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") > Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") > Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> > --- > fs/smb/client/connect.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c > index f917de020dd5..0d454149f3b4 100644 > --- a/fs/smb/client/connect.c > +++ b/fs/smb/client/connect.c > @@ -300,6 +300,7 @@ cifs_abort_connection(struct TCP_Server_Info *server) > server->ssocket->flags); > sock_release(server->ssocket); > server->ssocket = NULL; > + put_net(cifs_net_ns(server)); > } > server->sequence_number = 0; > server->session_estab = false; > @@ -3115,8 +3116,12 @@ generic_ip_connect(struct TCP_Server_Info *server) > /* > * Grab netns reference for the socket. > * > - * It'll be released here, on error, or in clean_demultiplex_info() upon server > - * teardown. > + * This reference will be released in several situations: > + * - In the failure path before the cifsd thread is started. > + * - In the all place where server->socket is released, it is > + * also set to NULL. > + * - Ultimately in clean_demultiplex_info(), during the final > + * teardown. > */ > get_net(net); > > @@ -3132,10 +3137,8 @@ generic_ip_connect(struct TCP_Server_Info *server) > } > > rc = bind_socket(server); > - if (rc < 0) { > - put_net(cifs_net_ns(server)); > + if (rc < 0) > return rc; > - } > > /* > * Eventually check for other socket options to change from > @@ -3181,9 +3184,6 @@ generic_ip_connect(struct TCP_Server_Info *server) > if (sport == htons(RFC1001_PORT)) > rc = ip_rfc1001_connect(server); > > - if (rc < 0) > - put_net(cifs_net_ns(server)); > - > return rc; > } > > -- > 2.34.3 > >
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index f917de020dd5..0d454149f3b4 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -300,6 +300,7 @@ cifs_abort_connection(struct TCP_Server_Info *server) server->ssocket->flags); sock_release(server->ssocket); server->ssocket = NULL; + put_net(cifs_net_ns(server)); } server->sequence_number = 0; server->session_estab = false; @@ -3115,8 +3116,12 @@ generic_ip_connect(struct TCP_Server_Info *server) /* * Grab netns reference for the socket. * - * It'll be released here, on error, or in clean_demultiplex_info() upon server - * teardown. + * This reference will be released in several situations: + * - In the failure path before the cifsd thread is started. + * - In the all place where server->socket is released, it is + * also set to NULL. + * - Ultimately in clean_demultiplex_info(), during the final + * teardown. */ get_net(net); @@ -3132,10 +3137,8 @@ generic_ip_connect(struct TCP_Server_Info *server) } rc = bind_socket(server); - if (rc < 0) { - put_net(cifs_net_ns(server)); + if (rc < 0) return rc; - } /* * Eventually check for other socket options to change from @@ -3181,9 +3184,6 @@ generic_ip_connect(struct TCP_Server_Info *server) if (sport == htons(RFC1001_PORT)) rc = ip_rfc1001_connect(server); - if (rc < 0) - put_net(cifs_net_ns(server)); - return rc; }
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") attempted to fix a netns use-after-free issue by manually adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add(). However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") pointed out that the approach of manually setting sk->sk_net_refcnt in the first commit was technically incorrect, as sk->sk_net_refcnt should only be set for user sockets. It led to issues like TCP timers not being cleared properly on close. The second commit moved to a model of just holding an extra netns reference for server->ssocket using get_net(), and dropping it when the server is torn down. But there remain some gaps in the get_net()/put_net() balancing added by these commits. The incomplete reference handling in these fixes results in two issues: 1. Netns refcount leaks[1] The problem process is as follows: ``` mount.cifs cifsd cifs_do_mount cifs_mount cifs_mount_get_session cifs_get_tcp_session get_net() /* First get net. */ ip_connect generic_ip_connect /* Try port 445 */ get_net() ->connect() /* Failed */ put_net() generic_ip_connect /* Try port 139 */ get_net() /* Missing matching put_net() for this get_net().*/ cifs_get_smb_ses cifs_negotiate_protocol smb2_negotiate SMB2_negotiate cifs_send_recv wait_for_response cifs_demultiplex_thread cifs_read_from_socket cifs_readv_from_socket cifs_reconnect cifs_abort_connection sock_release(); server->ssocket = NULL; /* Missing put_net() here. */ generic_ip_connect get_net() ->connect() /* Failed */ put_net() sock_release(); server->ssocket = NULL; free_rsp_buf ... clean_demultiplex_info /* It's only called once here. */ put_net() ``` When cifs_reconnect() is triggered, the server->ssocket is released without a corresponding put_net() for the reference acquired in generic_ip_connect() before. it ends up calling generic_ip_connect() again to retry get_net(). After that, server->ssocket is set to NULL in the error path of generic_ip_connect(), and the net count cannot be released in the final clean_demultiplex_info() function. 2. Potential use-after-free The current refcounting scheme can lead to a potential use-after-free issue in the following scenario: ``` cifs_do_mount cifs_mount cifs_mount_get_session cifs_get_tcp_session get_net() /* First get net */ ip_connect generic_ip_connect get_net() bind_socket kernel_bind /* failed */ put_net() /* after out_err_crypto_release label */ put_net() /* after out_err label */ put_net() ``` In the exception handling process where binding the socket fails, the get_net() and put_net() calls are unbalanced, which may cause the server->net reference count to drop to zero and be prematurely released. To address both issues, this patch ties the netns reference counting to the server->ssocket and server lifecycles. The extra reference is now acquired when the server or socket is created, and released when the socket is destroyed or the server is torn down. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=219792 Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> --- fs/smb/client/connect.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)