Message ID | 20200401125026.4899-1-aaptel@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] cifs: ignore cached share root handle closing errors | expand |
ср, 1 апр. 2020 г. в 05:50, Aurelien Aptel <aaptel@suse.com>: > > Fix tcon use-after-free and NULL ptr deref. > > Customer system crashes with the following kernel log: > > [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR > [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347107] CIFS VFS: Close unmatched open > [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > ... > [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server) > #6 [...] smb2_cancelled_close_fid at ... [cifs] > #7 [...] process_one_work at ... > #8 [...] worker_thread at ... > #9 [...] kthread at ... > > The most likely explanation we have is: > > * When we put the last reference of a tcon (refcount=0), we close the > cached share root handle. > * If closing a handle is interupted, SMB2_close() will > queue a SMB2_close() in a work thread. > * The queued object keeps a tcon ref so we bump the tcon > refcount, jumping from 0 to 1. > * We reach the end of cifs_put_tcon(), we free the tcon object despite > it now having a refcount of 1. > * The queued work now runs, but the tcon, ses & server was freed in > the meantime resulting in a crash. > > THREAD 1 > ======== > cifs_put_tcon => tcon refcount reach 0 > SMB2_tdis > close_shroot_lease > close_shroot_lease_locked => if cached root has lease && refcount reach 0 > smb2_close_cached_fid => if cached root valid > SMB2_close => retry close in a worker thread if interrupted > smb2_handle_cancelled_close > __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !! > INIT_WORK(&cancelled->work, smb2_cancelled_close_fid); > queue_work(cifsiod_wq, &cancelled->work) => queue work > tconInfoFree(tcon); ==> freed! > cifs_put_smb_ses(ses); ==> freed! > > THREAD 2 (workqueue) > ======== > smb2_cancelled_close_fid > SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon > cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time > *CRASH* > > Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease") > Signed-off-by: Aurelien Aptel <aaptel@suse.com> > --- > fs/cifs/smb2misc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c > index 0511aaf451d4..965276aeffcf 100644 > --- a/fs/cifs/smb2misc.c > +++ b/fs/cifs/smb2misc.c > @@ -766,6 +766,12 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid, > > cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count); > spin_lock(&cifs_tcp_ses_lock); > + if (tcon->tc_count <= 0) { > + spin_unlock(&cifs_tcp_ses_lock); > + cifs_dbg(VFS, "tcon is closing, skipping async close retry of fid %llu %llu\n", Thanks for a good catch! Some suggestions below: - Why VFS? Share is being disconnected anyway, so all associated handles should be closed by the server eventually. I think FYI is sufficient here. - cifs_dbg_server may be better than cifs_dbg and please include tcon ID here otherwise FIDs may be meaningless if there are several shares mounted on a host. - May be WARN_ONCE if tc_count is negative since it is a possible bug? Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> -- Best regards, Pavel Shilovsky
Hi Pavel, Thanks for the review, will send v3 :) I thought this one would be of interest to you as it's related to your patch in the Fixes tag. I've noticed the shared root handle is causing us lots of issues lately. Paulo is working on a DFS bug related to it too (when switching servers the handle becomes invalid). Cheers,
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c index 0511aaf451d4..965276aeffcf 100644 --- a/fs/cifs/smb2misc.c +++ b/fs/cifs/smb2misc.c @@ -766,6 +766,12 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid, cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count); spin_lock(&cifs_tcp_ses_lock); + if (tcon->tc_count <= 0) { + spin_unlock(&cifs_tcp_ses_lock); + cifs_dbg(VFS, "tcon is closing, skipping async close retry of fid %llu %llu\n", + persistent_fid, volatile_fid); + return 0; + } tcon->tc_count++; spin_unlock(&cifs_tcp_ses_lock);
Fix tcon use-after-free and NULL ptr deref. Customer system crashes with the following kernel log: [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 [462233.347107] CIFS VFS: Close unmatched open [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 ... [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server) #6 [...] smb2_cancelled_close_fid at ... [cifs] #7 [...] process_one_work at ... #8 [...] worker_thread at ... #9 [...] kthread at ... The most likely explanation we have is: * When we put the last reference of a tcon (refcount=0), we close the cached share root handle. * If closing a handle is interupted, SMB2_close() will queue a SMB2_close() in a work thread. * The queued object keeps a tcon ref so we bump the tcon refcount, jumping from 0 to 1. * We reach the end of cifs_put_tcon(), we free the tcon object despite it now having a refcount of 1. * The queued work now runs, but the tcon, ses & server was freed in the meantime resulting in a crash. THREAD 1 ======== cifs_put_tcon => tcon refcount reach 0 SMB2_tdis close_shroot_lease close_shroot_lease_locked => if cached root has lease && refcount reach 0 smb2_close_cached_fid => if cached root valid SMB2_close => retry close in a worker thread if interrupted smb2_handle_cancelled_close __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !! INIT_WORK(&cancelled->work, smb2_cancelled_close_fid); queue_work(cifsiod_wq, &cancelled->work) => queue work tconInfoFree(tcon); ==> freed! cifs_put_smb_ses(ses); ==> freed! THREAD 2 (workqueue) ======== smb2_cancelled_close_fid SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time *CRASH* Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease") Signed-off-by: Aurelien Aptel <aaptel@suse.com> --- fs/cifs/smb2misc.c | 6 ++++++ 1 file changed, 6 insertions(+)