mbox series

[v2,0/4] SMB cached directory fixes around reconnection/unmounting

Message ID 20241118215028.1066662-1-paul@darkrain42.org (mailing list archive)
Headers show
Series SMB cached directory fixes around reconnection/unmounting | expand

Message

Paul Aurich Nov. 18, 2024, 9:50 p.m. UTC
v2:
- Added locking in closed_all_cached_dirs()
- Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
  dir dentries, and split out the "drop dentry" work from "potential
  SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
  server traffic, but can ensure all "drop dentry" work has run.
- Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry


The SMB client cached directory functionality can either leak a cfid if
open_cached_dir() races with a reconnect, or can have races between the
unmount process and cached dir cleanup/lease breaks that all lead to
a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
These all manifest as a pair of BUGs when unmounting:

    [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
    [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)

These issues started with the lease directory cache handling introduced in
commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
held"), and go away if I mount with 'nohandlecache'.

I'm able to reproduce the "Dentry still in use" errors by connecting to an
actively-used SMB share (the server organically generates lease breaks) and
leaving these running for 'a while':

- while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
- while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done

('a while' is anywhere from 10 minutes to overnight. Also, it's not the
cleanest reproducer, but I stopped iterating once I had something that was
even remotely reliable for me...)

This series attempts to fix these, as well as a use-after-free that could
occur because open_cached_dir() explicitly frees the cached_fid, rather than
relying on reference counting.
Paul Aurich (4):
  smb: cached directories can be more than root file handle
  smb: Don't leak cfid when reconnect races with open_cached_dir
  smb: prevent use-after-free due to open_cached_dir error paths
  smb: During unmount, ensure all cached dir instances drop their dentry

 fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
 fs/smb/client/cached_dir.h |   6 +-
 fs/smb/client/cifsfs.c     |  14 ++-
 fs/smb/client/cifsglob.h   |   3 +-
 fs/smb/client/inode.c      |   3 -
 fs/smb/client/trace.h      |   3 +
 6 files changed, 179 insertions(+), 78 deletions(-)

--
2.45.2

Comments

Steve French Nov. 19, 2024, 12:55 a.m. UTC | #1
Looks like you dropped the patch:
"smb: No need to wait for work when cleaning up cached directories"

Otherwise for the four remaining patches, looks like the first patch
stayed the same (trivial comment change).

Can you remind me which of these three changed:

  smb: Don't leak cfid when reconnect races with open_cached_dir
  smb: prevent use-after-free due to open_cached_dir error paths
  smb: During unmount, ensure all cached dir instances drop their dentry

On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> v2:
> - Added locking in closed_all_cached_dirs()
> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
>   dir dentries, and split out the "drop dentry" work from "potential
>   SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
>   server traffic, but can ensure all "drop dentry" work has run.
> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>
>
> The SMB client cached directory functionality can either leak a cfid if
> open_cached_dir() races with a reconnect, or can have races between the
> unmount process and cached dir cleanup/lease breaks that all lead to
> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
> These all manifest as a pair of BUGs when unmounting:
>
>     [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
>     [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>
> These issues started with the lease directory cache handling introduced in
> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
> held"), and go away if I mount with 'nohandlecache'.
>
> I'm able to reproduce the "Dentry still in use" errors by connecting to an
> actively-used SMB share (the server organically generates lease breaks) and
> leaving these running for 'a while':
>
> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>
> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> cleanest reproducer, but I stopped iterating once I had something that was
> even remotely reliable for me...)
>
> This series attempts to fix these, as well as a use-after-free that could
> occur because open_cached_dir() explicitly frees the cached_fid, rather than
> relying on reference counting.
> Paul Aurich (4):
>   smb: cached directories can be more than root file handle
>   smb: Don't leak cfid when reconnect races with open_cached_dir
>   smb: prevent use-after-free due to open_cached_dir error paths
>   smb: During unmount, ensure all cached dir instances drop their dentry
>
>  fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
>  fs/smb/client/cached_dir.h |   6 +-
>  fs/smb/client/cifsfs.c     |  14 ++-
>  fs/smb/client/cifsglob.h   |   3 +-
>  fs/smb/client/inode.c      |   3 -
>  fs/smb/client/trace.h      |   3 +
>  6 files changed, 179 insertions(+), 78 deletions(-)
>
> --
> 2.45.2
>
>
Paul Aurich Nov. 19, 2024, 2:29 a.m. UTC | #2
On 2024-11-18 18:55:23 -0600, Steve French wrote:
>Looks like you dropped the patch:
>"smb: No need to wait for work when cleaning up cached directories"
>
>Otherwise for the four remaining patches, looks like the first patch
>stayed the same (trivial comment change).
>
>Can you remind me which of these three changed:
>
>  smb: Don't leak cfid when reconnect races with open_cached_dir
>  smb: prevent use-after-free due to open_cached_dir error paths
>  smb: During unmount, ensure all cached dir instances drop their dentry

All the substantive changes are in the last patch.  I should have clarified, 
but I just folded the changes from "smb: No need to wait for work when 
cleaning up cached directories" into that patch, as well.

>On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>>
>> v2:
>> - Added locking in closed_all_cached_dirs()
>> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
>>   dir dentries, and split out the "drop dentry" work from "potential
>>   SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
>>   server traffic, but can ensure all "drop dentry" work has run.
>> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>>
>>
>> The SMB client cached directory functionality can either leak a cfid if
>> open_cached_dir() races with a reconnect, or can have races between the
>> unmount process and cached dir cleanup/lease breaks that all lead to
>> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
>> These all manifest as a pair of BUGs when unmounting:
>>
>>     [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
>>     [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>>
>> These issues started with the lease directory cache handling introduced in
>> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
>> held"), and go away if I mount with 'nohandlecache'.
>>
>> I'm able to reproduce the "Dentry still in use" errors by connecting to an
>> actively-used SMB share (the server organically generates lease breaks) and
>> leaving these running for 'a while':
>>
>> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
>> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>>
>> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
>> cleanest reproducer, but I stopped iterating once I had something that was
>> even remotely reliable for me...)
>>
>> This series attempts to fix these, as well as a use-after-free that could
>> occur because open_cached_dir() explicitly frees the cached_fid, rather than
>> relying on reference counting.
>> Paul Aurich (4):
>>   smb: cached directories can be more than root file handle
>>   smb: Don't leak cfid when reconnect races with open_cached_dir
>>   smb: prevent use-after-free due to open_cached_dir error paths
>>   smb: During unmount, ensure all cached dir instances drop their dentry
>>
>>  fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
>>  fs/smb/client/cached_dir.h |   6 +-
>>  fs/smb/client/cifsfs.c     |  14 ++-
>>  fs/smb/client/cifsglob.h   |   3 +-
>>  fs/smb/client/inode.c      |   3 -
>>  fs/smb/client/trace.h      |   3 +
>>  6 files changed, 179 insertions(+), 78 deletions(-)
>>
>> --
>> 2.45.2
>>
>>
>
>
>-- 
>Thanks,
>
>Steve
Steve French Nov. 21, 2024, 8:59 p.m. UTC | #3
I noticed a hang today in generic/246 immediately after a crash in
umount in the previous test, generic/241 running with 6.12 kernel with
recent cifs-2.6.git/for-next which includes e.g. the 5 directory
caching fixes) generic/241 to Windows. See attached dmesg log.  Any
thoughts?
I did not see it hang on a previous run.

Nov 21 12:36:09 fedora29 kernel: ? umount_check+0xc3/0xf0
Nov 21 12:36:09 fedora29 kernel: ? __pfx_umount_check+0x10/0x10
Nov 21 12:36:09 fedora29 kernel: d_walk+0xf3/0x4e0
Nov 21 12:36:09 fedora29 kernel: ? d_walk+0x4b/0x4e0
Nov 21 12:36:09 fedora29 kernel: shrink_dcache_for_umount+0x6d/0x220
Nov 21 12:36:09 fedora29 kernel: generic_shutdown_super+0x4a/0x1c0
Nov 21 12:36:09 fedora29 kernel: kill_anon_super+0x22/0x40
Nov 21 12:36:09 fedora29 kernel: cifs_kill_sb+0x78/0x90 [cifs]




On Mon, Nov 18, 2024 at 3:53 PM Paul Aurich <paul@darkrain42.org> wrote:
>
> v2:
> - Added locking in closed_all_cached_dirs()
> - Replaced use of the cifsiod_wq with a new workqueue used for dropping cached
>   dir dentries, and split out the "drop dentry" work from "potential
>   SMB2_close + cleanup" work so that close_all_cached_dirs() doesn't block on
>   server traffic, but can ensure all "drop dentry" work has run.
> - Repurposed the (essentially unused) cfid->fid_lock to protect cfid->dentry
>
>
> The SMB client cached directory functionality can either leak a cfid if
> open_cached_dir() races with a reconnect, or can have races between the
> unmount process and cached dir cleanup/lease breaks that all lead to
> a cached_dir instance not dropping its dentry ref in close_all_cached_dirs().
> These all manifest as a pair of BUGs when unmounting:
>
>     [18645.013550] BUG: Dentry ffff888140590ba0{i=1000000000080,n=/}  still in use (2) [unmount of cifs cifs]
>     [18645.789274] VFS: Busy inodes after unmount of cifs (cifs)
>
> These issues started with the lease directory cache handling introduced in
> commit ebe98f1447bb ("cifs: enable caching of directories for which a lease is
> held"), and go away if I mount with 'nohandlecache'.
>
> I'm able to reproduce the "Dentry still in use" errors by connecting to an
> actively-used SMB share (the server organically generates lease breaks) and
> leaving these running for 'a while':
>
> - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done
> - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10; echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20; echo "recovering"; iptables -F OUTPUT; sleep 10; done
>
> ('a while' is anywhere from 10 minutes to overnight. Also, it's not the
> cleanest reproducer, but I stopped iterating once I had something that was
> even remotely reliable for me...)
>
> This series attempts to fix these, as well as a use-after-free that could
> occur because open_cached_dir() explicitly frees the cached_fid, rather than
> relying on reference counting.
> Paul Aurich (4):
>   smb: cached directories can be more than root file handle
>   smb: Don't leak cfid when reconnect races with open_cached_dir
>   smb: prevent use-after-free due to open_cached_dir error paths
>   smb: During unmount, ensure all cached dir instances drop their dentry
>
>  fs/smb/client/cached_dir.c | 228 +++++++++++++++++++++++++------------
>  fs/smb/client/cached_dir.h |   6 +-
>  fs/smb/client/cifsfs.c     |  14 ++-
>  fs/smb/client/cifsglob.h   |   3 +-
>  fs/smb/client/inode.c      |   3 -
>  fs/smb/client/trace.h      |   3 +
>  6 files changed, 179 insertions(+), 78 deletions(-)
>
> --
> 2.45.2
>
>