Message ID: 20210129171316.13160-1-aaptel@suse.com (mailing list archive)
State: New, archived
Series: [v1] cifs: make nested cifs mount point dentries always valid to deal with signaled 'df'
I'm assuming that the revalidation is failing in cifs_revalidate_dentry,
because it returns -ERESTARTSYS.

Going by the documentation of d_revalidate:
> This function should return a positive value if the dentry is still
> valid, and zero or a negative error code if it isn't.

In case of error, can we try returning the rc itself (rather than 0),
and see if the VFS avoids a dentry put? Because theoretically, the call
execution has failed, and the dentry has not been found to be invalid.

Regards,
Shyam

On Fri, Jan 29, 2021 at 9:16 AM Aurélien Aptel <aaptel@suse.com> wrote:
>
> From: Aurelien Aptel <aaptel@suse.com>
>
> Assuming
> - //HOST/a is mounted on /mnt
> - //HOST/b is mounted on /mnt/b
>
> On a slow connection, running 'df' and killing it while it's
> processing /mnt/b can make cifs_get_inode_info() return -ERESTARTSYS.
>
> This triggers the following chain of events:
> => the dentry revalidation fails
> => the dentry is put and released
> => the superblock associated with the dentry is put
> => /mnt/b is unmounted
>
> This quick fix makes cifs_d_revalidate() always succeed (mark as
> valid) on cifs dentries which are also mount points.
>
> Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
> ---
>
> I have managed to reproduce this bug with the following script. It
> uses tc with the netem discipline (CONFIG_NET_SCH_NETEM=y) to simulate
> network delays.
>
> #!/bin/bash
> #
> # reproducing bsc#1177440
> #
> # nested mount point gets unmounted when process is signaled
> #
> set -e
>
> share1=//192.168.2.110/scratch
> share2=//192.168.2.110/test
> opts="username=administrator,password=aaptel-42,vers=1.0,actimeo=0"
>
> cleanup() {
>     # reset delay
>     tc qdisc del dev eth0 root
>     mount|grep -q /mnt/nest/a && umount /mnt/nest/a
>     mount|grep -q /mnt/nest && umount /mnt/nest
>
>     echo 'module cifs -p' > /sys/kernel/debug/dynamic_debug/control
>     echo 'file fs/cifs/* -p' > /sys/kernel/debug/dynamic_debug/control
>     echo 0 > /proc/fs/cifs/cifsFYI
>     echo 0 > /sys/module/dns_resolver/parameters/debug
>     echo 1 > /proc/sys/kernel/printk_ratelimit
> }
>
> trap cleanup EXIT
>
> nbcifs() {
>     mount|grep cifs|wc -l
> }
>
> reset() {
>     echo "unmounting and reloading cifs.ko"
>     mount|grep -q /mnt/nest/a && umount /mnt/nest/a
>     mount|grep -q /mnt/nest && umount /mnt/nest
>     sleep 1
>     lsmod|grep -q cifs && ( modprobe -r cifs &> /dev/null || true )
>     lsmod|grep -q cifs || ( modprobe cifs &> /dev/null || true )
> }
>
> mnt() {
>     dmesg --clear
>     echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
>     echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
>     echo 1 > /proc/fs/cifs/cifsFYI
>     echo 1 > /sys/module/dns_resolver/parameters/debug
>     echo 0 > /proc/sys/kernel/printk_ratelimit
>
>     echo "mounting"
>     mkdir -p /mnt/nest
>     mount.cifs $share1 /mnt/nest -o "$opts"
>     mkdir -p /mnt/nest/a
>     mount.cifs $share2 /mnt/nest/a -o "$opts"
> }
>
> # add fake delay
> tc qdisc add dev eth0 root netem delay 300ms
>
> while :; do
>     reset
>     mnt
>     n=$(nbcifs)
>     echo "starting df with $n mounts"
>     df &
>     pid=$!
>     sleep 1.5
>     kill $pid || true
>     x=$(nbcifs)
>     echo "stopping with $x mounts"
>     if [ $x -lt $n ]; then
>         echo "lost mounts"
>         dmesg > kernel.log
>         exit 1
>     fi
> done
>
>  fs/cifs/dir.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index 68900f1629bff..876ef01628538 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -741,6 +741,10 @@ cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>
> +	/* nested cifs mount points are always valid */
> +	if (d_mountpoint(direntry))
> +		return 1;
> +
>  	if (d_really_is_positive(direntry)) {
>  		inode = d_inode(direntry);
>  		if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode)))
> --
> 2.29.2
Shyam Prasad N <nspmangalore@gmail.com> writes:
> Going by the documentation of d_revalidate:
>> This function should return a positive value if the dentry is still
>> valid, and zero or a negative error code if it isn't.
>
> In case of error, can we try returning the rc itself (rather than 0),
> and see if VFS avoids a dentry put?
> Because theoretically, the call execution has failed, and the dentry
> is not found to be invalid.

AFAIK mount points are pinned (you cannot rm or mv them), so it seems we
could make them always valid. I don't know if there are deeper and more
subtle implications.

The recent signal fixes do not fix this issue.

Cheers,
I'm okay with returning valid for a directory mount point. But the point
I'm trying to make here is that the VFS reacts differently when
d_revalidate returns an error vs. when it just returns invalid:
https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1409

Notice how it calls d_invalidate() only when there is no error, and
d_invalidate() ends up in detach_mounts(). It is likely that the umount
happens there.

I'm suggesting that we should return errors from the d_revalidate
handlers, rather than just 0 or 1. Makes sense?

Regards,
Shyam

On Mon, Feb 1, 2021 at 4:01 PM Aurélien Aptel <aaptel@suse.com> wrote:
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
> > Going by the documentation of d_revalidate:
> >> This function should return a positive value if the dentry is still
> >> valid, and zero or a negative error code if it isn't.
> >
> > In case of error, can we try returning the rc itself (rather than 0),
> > and see if VFS avoids a dentry put?
> > Because theoretically, the call execution has failed, and the dentry
> > is not found to be invalid.
>
> AFAIK mount points are pinned, you cannot rm or mv them so it seems we
> could make them always valid. I don't know if there are deeper and more
> subtle implications.
>
> The recent signal fixes are not fixing this issue.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
Shyam Prasad N <nspmangalore@gmail.com> writes:
> But the point that I'm trying to make here is that VFS reacts
> differently when d_revalidate returns an error VS when it just returns
> invalid:
> https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1409

I've tried returning the error with the repro script and it also worked.

> Notice how it calls d_invalidate only when there's no error. And
> d_invalidate seems to have detach_mounts.
> It is likely that the umount happens there.

I've dumped the call stacks. The revalidation is triggered by
filename_lookup():

[   31.913092] [<ffffffff8133e3d1>] ? SendReceive+0x2b1/0x2d0
[   31.913093] [<ffffffff8132cbd2>] ? CIFSSMBQPathInfo+0x152/0x260
[   31.913096] [<ffffffff81343c9f>] ? cifs_query_path_info+0x6f/0x1a0
[   31.913098] [<ffffffff81366953>] ? cifs_get_inode_info.cold+0x44f/0x6bb
[   31.913100] [<ffffffff810bf935>] ? wake_up_process+0x15/0x20
[   31.913102] [<ffffffff810e9c30>] ? vprintk_emit+0x200/0x500
[   31.913103] [<ffffffff810ea099>] ? vprintk_default+0x29/0x40
[   31.913105] [<ffffffff811a896f>] ? printk+0x50/0x52
[   31.913107] [<ffffffff813678c1>] ? cifs_revalidate_dentry_attr.cold+0x71/0x79
[   31.913109] [<ffffffff8133b194>] ? cifs_revalidate_dentry+0x14/0x30
[   31.913110] [<ffffffff813327e5>] ? cifs_d_revalidate+0x25/0xb0
[   31.913112] [<ffffffff812404ff>] ? lookup_fast+0x1bf/0x220
[   31.913113] [<ffffffff812407bc>] ? walk_component+0x3c/0x3f0
[   31.913114] [<ffffffff8123fe5b>] ? path_init+0x23b/0x450
[   31.913116] [<ffffffff812412bf>] ? path_lookupat+0x7f/0x110
[   31.913118] [<ffffffff81242ae7>] ? filename_lookup+0x97/0x190
[   31.913120] [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913122] [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913124] [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913125] [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913127] [<ffffffff8106de53>] ? trace_do_page_fault+0x43/0x150
[   31.913130] [<ffffffff8180359c>] ? async_page_fault+0x3c/0x60
[   31.913131] [<ffffffff8126ad6e>] ? SyS_statfs+0xe/0x10
[   31.913132] [<ffffffff81800021>] ? entry_SYSCALL_64_fastpath+0x20/0xee

d_invalidate()->...->drop_mountpoint() adds a callback to unmount at a
later time:

[   31.913246] [<ffffffff812554da>] ? drop_mountpoint+0x6a/0x70
[   31.913247] [<ffffffff8126b0b8>] ? pin_kill+0x88/0x160
[   31.913249] [<ffffffff810d6a20>] ? prepare_to_wait_event+0x100/0x100
[   31.913250] [<ffffffff810d6a20>] ? prepare_to_wait_event+0x100/0x100
[   31.913252] [<ffffffff8126b205>] ? group_pin_kill+0x25/0x50
[   31.913253] [<ffffffff812564fa>] ? __detach_mounts+0x13a/0x140
[   31.913255] [<ffffffff8124bf24>] ? d_invalidate+0xa4/0x100
[   31.913256] [<ffffffff812404cd>] ? lookup_fast+0x18d/0x220
[   31.913257] [<ffffffff812407bc>] ? walk_component+0x3c/0x3f0
[   31.913258] [<ffffffff8123fe5b>] ? path_init+0x23b/0x450
[   31.913259] [<ffffffff812412bf>] ? path_lookupat+0x7f/0x110
[   31.913261] [<ffffffff81242ae7>] ? filename_lookup+0x97/0x190
[   31.913262] [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913263] [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913265] [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913266] [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913268] [<ffffffff8106de53>] ? trace_do_page_fault+0x43/0x150
[   31.913269] [<ffffffff8180359c>] ? async_page_fault+0x3c/0x60
[   31.913270] [<ffffffff8126ad6e>] ? SyS_statfs+0xe/0x10
[   31.913272] [<ffffffff81800021>] ? entry_SYSCALL_64_fastpath+0x20/0xee

The actual unmount call:

[   31.913594] [<ffffffff81362248>] ? cifs_put_tcon.part.0+0x71/0x123
[   31.913597] [<ffffffff81362309>] ? cifs_put_tlink.cold+0x5/0xa
[   31.913599] [<ffffffff813318a5>] ? cifs_umount+0x65/0xd0
[   31.913601] [<ffffffff8132976f>] ? cifs_kill_sb+0x1f/0x30
[   31.913603] [<ffffffff8123527a>] ? deactivate_locked_super+0x4a/0xd0
[   31.913605] [<ffffffff81235344>] ? deactivate_super+0x44/0x50
[   31.913609] [<ffffffff81253adb>] ? cleanup_mnt+0x3b/0x90
[   31.913610] [<ffffffff81253b82>] ? __cleanup_mnt+0x12/0x20
[   31.913613] [<ffffffff810af9eb>] ? task_work_run+0x7b/0xb0
[   31.913616] [<ffffffff810a0a3a>] ? get_signal+0x4ea/0x4f0
[   31.913618] [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913621] [<ffffffff810185d1>] ? do_signal+0x21/0x100
[   31.913624] [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913626] [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913628] [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913630] [<ffffffff81003517>] ? exit_to_usermode_loop+0x87/0xc0
[   31.913632] [<ffffffff81003bed>] ? syscall_return_slowpath+0xed/0x180
[   31.913634] [<ffffffff818001b1>] ? int_ret_from_sys_call+0x8/0x6d

> I'm suggesting that we should return errors inside d_revalidate
> handlers, rather than just 0 or 1.
> Makes sense?

Yes, that worked too. I'll send a v2.

Cheers,
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 68900f1629bff..876ef01628538 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -741,6 +741,10 @@ cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
+	/* nested cifs mount points are always valid */
+	if (d_mountpoint(direntry))
+		return 1;
+
 	if (d_really_is_positive(direntry)) {
 		inode = d_inode(direntry);
 		if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode)))