Message ID | 20241118215028.1066662-1-paul@darkrain42.org (mailing list archive) |
---|---|
Headers | show |
Series | SMB cached directory fixes around reconnection/unmounting | expand |
Looks like you dropped the patch: "smb: No need to wait for work when cleaning up cached directories" Otherwise for the four remaining patches, looks like the first patch stayed the same (trivial comment change). Can you remind me which of these three changed: smb: Don't leak cfid when reconnect races with open_cached_dir smb: prevent use-after-free due to open_cached_dir error paths smb: During unmount, ensure all cached dir instances drop their dentry On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote: > > v2: > - Added locking in closed_all_cached_dirs() > - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached > dir dentries, and split out the "drop dentry" work from "potential > SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on > server traffic, but can ensure all "drop dentry" work has run. > - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry > > > The SMB client cached directory functionality can either leak a cfid if > open_cached_dir() races with a reconnect, or can have races between the > unmount process and cached dir cleanup/lease breaks that all lead to > a cached_dir instance not dropping its dentry ref in close_all_cached_dirs(). > These all manifest as a pair of BUGs when unmounting: > > [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] > [18645.789274] VFS: Busy inodes after unmount of cifs (cifs) > > These issues started with the lease directory cache handling introduced in > commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is > held"), and go away if I mount with 'nohandlecache'. > > I'm able to reproduce the "Dentry still in use" errors by connecting to an > actively-used SMB share (the server organically generates lease breaks) and > leaving these running for 'a while': > > - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done > - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done > > ('a while' is anywhere from 10 minutes to overnight. Also, it's not the > cleanest reproducer, but I stopped iterating once I had something that was > even remotely reliable for me...) > > This series attempts to fix these, as well as a use-after-free that could > occur because open_cached_dir() explicitly frees the cached_fid, rather than > relying on reference counting. > Paul Aurich (4): > smb: cached directories can be more than root file handle > smb: Don't leak cfid when reconnect races with open_cached_dir > smb: prevent use-after-free due to open_cached_dir error paths > smb: During unmount, ensure all cached dir instances drop their dentry > > fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------ > fs/smb/client/cached_dir.h | 6 +- > fs/smb/client/cifsfs.c | 14 ++- > fs/smb/client/cifsglob.h | 3 +- > fs/smb/client/inode.c | 3 - > fs/smb/client/trace.h | 3 + > 6 files changed, 179 insertions(+), 78 deletions(-) > > -- > 2.45.2 > >
On 2024-11-18 18:55:23 -0600, Steve French wrote: >Looks like you dropped the patch: >"smb: No need to wait for work when cleaning up cached directories" > >Otherwise for the four remaining patches, looks like the first patch >stayed the same (trivial comment change). > >Can you remind me which of these three changed: > > smb: Don't leak cfid when reconnect races with open_cached_dir > smb: prevent use-after-free due to open_cached_dir error paths > smb: During unmount, ensure all cached dir instances drop their dentry All the substantive changes are in the last patch. I should have clarified, but I just folded the changes from "smb: No need to wait for work when cleaning up cached directories" into that patch, as well. >On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote: >> >> v2: >> - Added locking in closed_all_cached_dirs() >> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached >> dir dentries, and split out the "drop dentry" work from "potential >> SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on >> server traffic, but can ensure all "drop dentry" work has run. >> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry >> >> >> The SMB client cached directory functionality can either leak a cfid if >> open_cached_dir() races with a reconnect, or can have races between the >> unmount process and cached dir cleanup/lease breaks that all lead to >> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs(). >> These all manifest as a pair of BUGs when unmounting: >> >> [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] >> [18645.789274] VFS: Busy inodes after unmount of cifs (cifs) >> >> These issues started with the lease directory cache handling introduced in >> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is >> held"), and go away if I mount with 'nohandlecache'. >> >> I'm able to reproduce the "Dentry still in use" errors by connecting to an >> actively-used SMB share (the server organically generates lease breaks) and >> leaving these running for 'a while': >> >> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done >> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done >> >> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the >> cleanest reproducer, but I stopped iterating once I had something that was >> even remotely reliable for me...) >> >> This series attempts to fix these, as well as a use-after-free that could >> occur because open_cached_dir() explicitly frees the cached_fid, rather than >> relying on reference counting. >> Paul Aurich (4): >> smb: cached directories can be more than root file handle >> smb: Don't leak cfid when reconnect races with open_cached_dir >> smb: prevent use-after-free due to open_cached_dir error paths >> smb: During unmount, ensure all cached dir instances drop their dentry >> >> fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------ >> fs/smb/client/cached_dir.h | 6 +- >> fs/smb/client/cifsfs.c | 14 ++- >> fs/smb/client/cifsglob.h | 3 +- >> fs/smb/client/inode.c | 3 - >> fs/smb/client/trace.h | 3 + >> 6 files changed, 179 insertions(+), 78 deletions(-) >> >> -- >> 2.45.2 >> >> > > >-- >Thanks, > >Steve
I noticed a hang today in generic/246 immediately after a crash in umount in the previous test, generic/241 running with 6.12 kernel with recent cifs-2.6.git/for-next which includes e.g. the 5 directory caching fixes) generic/241 to Windows. See attached dmesg log. Any thoughts? I did not see it hang on a previous run. Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0 Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10 Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0 Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0 Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220 Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0 Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40 Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs] On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote: > > v2: > - Added locking in closed_all_cached_dirs() > - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached > dir dentries, and split out the "drop dentry" work from "potential > SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on > server traffic, but can ensure all "drop dentry" work has run. > - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry > > > The SMB client cached directory functionality can either leak a cfid if > open_cached_dir() races with a reconnect, or can have races between the > unmount process and cached dir cleanup/lease breaks that all lead to > a cached_dir instance not dropping its dentry ref in close_all_cached_dirs(). > These all manifest as a pair of BUGs when unmounting: > > [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] > [18645.789274] VFS: Busy inodes after unmount of cifs (cifs) > > These issues started with the lease directory cache handling introduced in > commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is > held"), and go away if I mount with 'nohandlecache'. > > I'm able to reproduce the "Dentry still in use" errors by connecting to an > actively-used SMB share (the server organically generates lease breaks) and > leaving these running for 'a while': > > - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done > - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done > > ('a while' is anywhere from 10 minutes to overnight. Also, it's not the > cleanest reproducer, but I stopped iterating once I had something that was > even remotely reliable for me...) > > This series attempts to fix these, as well as a use-after-free that could > occur because open_cached_dir() explicitly frees the cached_fid, rather than > relying on reference counting. > Paul Aurich (4): > smb: cached directories can be more than root file handle > smb: Don't leak cfid when reconnect races with open_cached_dir > smb: prevent use-after-free due to open_cached_dir error paths > smb: During unmount, ensure all cached dir instances drop their dentry > > fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------ > fs/smb/client/cached_dir.h | 6 +- > fs/smb/client/cifsfs.c | 14 ++- > fs/smb/client/cifsglob.h | 3 +- > fs/smb/client/inode.c | 3 - > fs/smb/client/trace.h | 3 + > 6 files changed, 179 insertions(+), 78 deletions(-) > > -- > 2.45.2 > >