Message ID | 3756406.1712244064@warthog.procyon.org.uk (mailing list archive)
---|---
State | New, archived
Series | cifs: Fix reacquisition of volume cookie on still-live connection
Hi Dave,

David Howells <dhowells@redhat.com> writes:

> During mount, cifs_mount_get_tcon() gets a tcon resource connection record
> and then attaches an fscache volume cookie to it.  However, it does this
> irrespective of whether or not the tcon returned from cifs_get_tcon() is a
> new record or one that's already in use.  This leads to a warning about a
> volume cookie collision and a leaked volume cookie because tcon->fscache
> gets reset.
>
> Fix this by adding a mutex and a "we've already tried this" flag and only
> doing it once for the lifetime of the tcon.

Can't we just move the cookie acquisition to cifs_get_tcon() before it gets
added to the list @ses->tcon_list?  This way we'll guarantee that the cookie
is set only once for the new tcon.

Besides, do we want to share a tcon with two different superblocks that have
'fsc' and 'nofsc', respectively?  If not, it would be better to fix
match_tcon() as well to handle such a case.

> [!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
> actually be required.  Reacquiring the volume cookie isn't the only thing
> that function does: it also partially reinitialises the tcon record without
> any locking - which may cause live filesystem ops already using the tcon
> through a previous mount to malfunction.

Agreed.
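Paulo's suggestion - acquire the cookie before the tcon is published on @ses->tcon_list - is an instance of the initialise-before-publish pattern: a record that is fully set up before it becomes reachable from a shared list can never be initialised twice. A minimal userspace sketch of that idea (the names `tcon`, `get_tcon` and `acquire_volume_cookie` are simplified stand-ins, not the real cifs.ko code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel structure under discussion. */
struct tcon {
	char share[64];
	void *fscache;		/* volume cookie, set once before publish */
	struct tcon *next;
};

static struct tcon *tcon_list;	/* plays the role of @ses->tcon_list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void *acquire_volume_cookie(const char *share)
{
	return strdup(share);	/* placeholder for the real acquisition */
}

/* Look up an existing tcon or create one.  The cookie is attached
 * BEFORE the record is added to the shared list, so no caller can
 * observe a published tcon without its cookie, and the cookie can
 * never be acquired twice for the same record. */
static struct tcon *get_tcon(const char *share)
{
	struct tcon *t;

	pthread_mutex_lock(&list_lock);
	for (t = tcon_list; t; t = t->next)
		if (strcmp(t->share, share) == 0)
			goto out;		/* reuse: cookie already set */
	t = calloc(1, sizeof(*t));
	snprintf(t->share, sizeof(t->share), "%s", share);
	t->fscache = acquire_volume_cookie(share);	/* init fully... */
	t->next = tcon_list;
	tcon_list = t;					/* ...then publish */
out:
	pthread_mutex_unlock(&list_lock);
	return t;
}
```

This avoids the need for a per-tcon "already tried" flag entirely, at the cost of doing the acquisition under the list lock.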
Paulo Alcantara <pc@manguebit.com> wrote:

> Can't we just move the cookie acquisition to cifs_get_tcon() before it
> gets added to the list @ses->tcon_list?  This way we'll guarantee that
> the cookie is set only once for the new tcon.

cifs_get_tcon() is used from more than one place and I'm not sure the second
place (__cifs_construct_tcon()) actually wants a cookie.  I'm not sure what
that path is for.

Could all the (re)setting up being done in cifs_mount_get_tcon() be pushed
back into cifs_get_tcon()?

> Besides, do we want to share a tcon with two different superblocks that
> have 'fsc' and 'nofsc', respectively?  If not, it would be better to fix
> match_tcon() as well to handle such a case.

Maybe?  What does a tcon *actually* represent?  I note that in
cifs_match_super() it's not the only criterion matched upon - so you can, at
least in apparent theory, get different superblocks for the same tcon anyway.

This suggests that the tcon might not be the best place for the fscache
volume cookie, as you can have multiple inodes wishing to use the same file
cookie if there are multiple mounts mounting the same tcon but, say, with
different mount parameters.

I'm not sure what the right way around this is.  The root of the problem is
coherency management.  If we make a change to an inode on one mounted
superblock and this bounces a change notification over to the server that
then pokes an inode in another mounted superblock on the same machine and
causes it to be invalidated, you lose your local cache if both inodes refer
to the same fscache cookie.

Remember: fscache does not do this for you!  It's just a facility by which
data can be stored and retrieved.  The netfs is responsible for telling it
when to invalidate and handling coherency.

That said, it might be possible to time-share a cookie on cifs with leases,
but the local superblocks would have to know about each other - in which
case, why are they separate superblocks?

David
Paulo Alcantara <pc@manguebit.com> wrote:

> > [!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
> > actually be required.  Reacquiring the volume cookie isn't the only thing
> > that function does: it also partially reinitialises the tcon record without
> > any locking - which may cause live filesystem ops already using the tcon
> > through a previous mount to malfunction.
>
> Agreed.

Looking over the code again, I'm not sure whether it is actually necessary -
or whether it is necessary and will be a bit nasty to implement, as it will
require read locking also.

Firstly, reset_cifs_unix_caps() seems to re-set tcon->fsUnixInfo.Capability
and tcon->unix_ext, which it would presumably set to the same things - which
is probably fine.

However, cifs_qfs_tcon() makes RPC operations that reload tcon->fsDevInfo
and tcon->fsAttrInfo - both of which may be being accessed without locks.
smb2_qfs_tcon() and smb3_qfs_tcon() alter everything cifs_qfs_tcon() does,
plus a bunch of extra tcon members.

Can this locally cached information change over time on the server whilst we
have a connection to it?

David
David Howells <dhowells@redhat.com> writes:

> Paulo Alcantara <pc@manguebit.com> wrote:
>
>> Can't we just move the cookie acquisition to cifs_get_tcon() before it
>> gets added to the list @ses->tcon_list?  This way we'll guarantee that
>> the cookie is set only once for the new tcon.
>
> cifs_get_tcon() is used from more than one place and I'm not sure the second
> place (__cifs_construct_tcon()) actually wants a cookie.  I'm not sure what
> that path is for.

__cifs_construct_tcon() is used for creating sessions and tcons under
multiuser mounts.  Whenever a user accesses a multiuser mount and the client
can't find a credential for it, a new session and tcon will be created for
the user accessing the mount -- new accesses from the same user will end up
reusing the created session and tcon.

And yes, I don't think we'll need a cookie for those tcons, as the client
seems to get the fscache cookie from the master tcon (the one created from
mount credentials).

> Could all the (re)setting up being done in cifs_mount_get_tcon() be
> pushed back into cifs_get_tcon()?

AFAICT, yes.  I'd need to look into it to make sure that's safe.

>> Besides, do we want to share a tcon with two different superblocks that
>> have 'fsc' and 'nofsc', respectively?  If not, it would be better to fix
>> match_tcon() as well to handle such a case.
>
> Maybe?  What does a tcon *actually* represent?  I note that in
> cifs_match_super() it's not the only criterion matched upon - so you can, at
> least in apparent theory, get different superblocks for the same tcon anyway.

A tcon simply represents a tree-connected SMB share.  It can be either an
IPC share (\\srv\IPC$) or the actual share (\\srv\share) we're accessing the
files from.

Consider the following example where a tcon is reused from different CIFS
superblocks:

	mount.cifs //srv/share /mnt/1 -o ${opts}	# new super, new tcon
	mount.cifs //srv/share/dir /mnt/2 -o ${opts}	# new super, reused tcon

So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.

The two mounts are accessing the same tcon (\\srv\share) but the new
superblock was created because the prefix path "\dir" didn't match in
cifs_match_super().  Trust me, that's a very common scenario.

> This suggests that the tcon might not be the best place for the fscache
> volume cookie, as you can have multiple inodes wishing to use the same file
> cookie if there are multiple mounts mounting the same tcon but, say, with
> different mount parameters.

We're not supposed to allow mounts with different parameters to reuse
servers, sessions or tcons, so that should be no issue.

> I'm not sure what the right way around this is.  The root of the problem is
> coherency management.  If we make a change to an inode on one mounted
> superblock and this bounces a change notification over to the server that
> then pokes an inode in another mounted superblock on the same machine and
> causes it to be invalidated, you lose your local cache if both inodes refer
> to the same fscache cookie.

Yes, that could be a problem.  Perhaps placing the fscache cookie in the
superblock would be the way to go, so we can handle different fscache
cookies for superblocks that contain different prefix paths and access the
same tcon.

> Remember: fscache does not do this for you!  It's just a facility by which
> data can be stored and retrieved.  The netfs is responsible for telling it
> when to invalidate and handling coherency.

ACK.

> That said, it might be possible to time-share a cookie on cifs with leases,
> but the local superblocks would have to know about each other - in which
> case, why are they separate superblocks?

See above why they could be separate superblocks.
Paulo Alcantara <pc@manguebit.com> wrote:

> Consider the following example where a tcon is reused from different
> CIFS superblocks:
>
> 	mount.cifs //srv/share /mnt/1 -o ${opts}	# new super, new tcon
> 	mount.cifs //srv/share/dir /mnt/2 -o ${opts}	# new super, reused tcon
>
> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>
> The two mounts are accessing the same tcon (\\srv\share) but the new
> superblock was created because the prefix path "\dir" didn't match in
> cifs_match_super().  Trust me, that's a very common scenario.

Why does it need to lead to a different superblock, assuming ${opts} is the
same in both cases?  Can we not do as NFS does and share the superblock,
walking during the mount process through the directory prefix to the root
object?

In other words, why does:

	mount.cifs //srv/share /mnt/1 -o ${opts}
	mount.cifs //srv/share/dir /mnt/2 -o ${opts}

give you a different result to:

	mount.cifs //srv/share /mnt/1 -o ${opts}
	mount --bind /mnt/1/dir /mnt/2

David
On 4/17/2024 10:38 AM, David Howells wrote:
> Paulo Alcantara <pc@manguebit.com> wrote:
>
>> Consider the following example where a tcon is reused from different
>> CIFS superblocks:
>>
>> 	mount.cifs //srv/share /mnt/1 -o ${opts}	# new super, new tcon
>> 	mount.cifs //srv/share/dir /mnt/2 -o ${opts}	# new super, reused tcon
>>
>> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>>
>> The two mounts are accessing the same tcon (\\srv\share) but the new
>> superblock was created because the prefix path "\dir" didn't match in
>> cifs_match_super().  Trust me, that's a very common scenario.
>
> Why does it need to lead to a different superblock, assuming ${opts} is the

The tcon is a property of the SMB3 session, it's not shared nor is it
necessarily created at mount time.

Tom.

> same in both cases?  Can we not do as NFS does and share the superblock,
> walking during the mount process through the directory prefix to the root
> object?
>
> In other words, why does:
>
> 	mount.cifs //srv/share /mnt/1 -o ${opts}
> 	mount.cifs //srv/share/dir /mnt/2 -o ${opts}
>
> give you a different result to:
>
> 	mount.cifs //srv/share /mnt/1 -o ${opts}
> 	mount --bind /mnt/1/dir /mnt/2
>
> David
David Howells <dhowells@redhat.com> writes:

> Paulo Alcantara <pc@manguebit.com> wrote:
>
>> Consider the following example where a tcon is reused from different
>> CIFS superblocks:
>>
>> 	mount.cifs //srv/share /mnt/1 -o ${opts}	# new super, new tcon
>> 	mount.cifs //srv/share/dir /mnt/2 -o ${opts}	# new super, reused tcon
>>
>> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>>
>> The two mounts are accessing the same tcon (\\srv\share) but the new
>> superblock was created because the prefix path "\dir" didn't match in
>> cifs_match_super().  Trust me, that's a very common scenario.
>
> Why does it need to lead to a different superblock, assuming ${opts} is the
> same in both cases?  Can we not do as NFS does and share the superblock,
> walking during the mount process through the directory prefix to the root
> object?

I don't know why it was designed that way, but the reason we have two
different superblocks with ${opts} being the same is because cifs.ko relies
on the value of cifs_sb_info::prepath to build paths out of dentries.  See
build_path_from_dentry().  So, when you access /mnt/2/foo, cifs.ko will
build a path like '[optional tree name prefix] + cifs_sb_info::prepath +
\foo' and then reuse connections (server+session+tcon) from the first
superblock to perform I/O on that file.

> In other words, why does:
>
> 	mount.cifs //srv/share /mnt/1 -o ${opts}
> 	mount.cifs //srv/share/dir /mnt/2 -o ${opts}
>
> give you a different result to:
>
> 	mount.cifs //srv/share /mnt/1 -o ${opts}
> 	mount --bind /mnt/1/dir /mnt/2

Honestly, I don't know how bind works at the VFS level.  I see that the new
superblock isn't created, and when I access /mnt/2/foo,
build_path_from_dentry() correctly returns '\dir\foo'.
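The path fabrication Paulo describes - '[optional tree name prefix] + cifs_sb_info::prepath + <path relative to the mount>' - can be sketched as plain string assembly. This is a simplified userspace model, not the real build_path_from_dentry(); the function name and parameters are illustrative only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of how a CIFS wire path is fabricated:
 * an optional tree-name prefix, then the superblock's prepath, then the
 * path relative to the mount point, all using backslash separators. */
static void build_path(char *out, size_t outlen, const char *tree_prefix,
		       const char *prepath, const char *rel)
{
	snprintf(out, outlen, "%s%s%s",
		 tree_prefix ? tree_prefix : "",
		 prepath ? prepath : "",
		 rel);
}
```

With prepath "\dir" (the //srv/share/dir mount), accessing "\foo" relative to the mount yields "\dir\foo", matching the behaviour described above; with no prepath it yields "\foo".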
Tom Talpey <tom@talpey.com> wrote:

> The tcon is a property of the SMB3 session, it's not shared nor is
> it necessarily created at mount time.

Trust me, it can be shared between superblocks.

David
Paulo Alcantara <pc@manguebit.com> wrote:

> I don't know why it was designed that way, but the reason we have two
> different superblocks with ${opts} being the same is because cifs.ko
> relies on the value of cifs_sb_info::prepath to build paths out of
> dentries.  See build_path_from_dentry().  So, when you access
> /mnt/2/foo, cifs.ko will build a path like '[optional tree name prefix]
> + cifs_sb_info::prepath + \foo' and then reuse connections
> (server+session+tcon) from the first superblock to perform I/O on that
> file.

Yep.  You don't *need* prepath.  You could always build from the sb->s_root
without a prepath and have mnt->mnt_root offset the place the VFS thinks you
are:

	[rootdir]/	<--- s_root points here
	    |
	    v
	   foo/
	    |
	    v
	   bar/		<--- mnt_root points here
	    |
	    v
	    a

Without prepath, you build back up the tree { a, bar/, foo/, [rootdir] };
with prepath you insert the prepath at the end.

Bind mounts just make the VFS think it's starting midway down, but you build
up back to s_root.

Think of a mount as just referring to a subtree of the tree inside the
superblock.  The prepath is just an optimisation - but possibly one that
makes sense for cifs if you're having to do pathname fabrication a lot.

David
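David's point - that the full path can always be rebuilt by walking from the dentry back up to sb->s_root, with prepath being a mere optimisation - can be sketched with a toy dentry chain. This is a simplified userspace model, not the VFS API; the structure and function names are invented for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy dentry: each node knows its name and its parent; the root has
 * parent == NULL and plays the role of sb->s_root. */
struct dentry {
	const char *name;
	struct dentry *parent;
};

/* Walk from the dentry back up to the root, then emit the components
 * top-down: the chain { a, bar/, foo/, [rootdir] } becomes "\foo\bar\a".
 * A bind mount merely starts the VFS midway down this chain; the walk
 * still terminates at s_root, so no prepath is needed. */
static void path_from_dentry(const struct dentry *d, char *out, size_t outlen)
{
	size_t len;

	if (!d->parent) {		/* reached s_root: nothing to emit */
		out[0] = '\0';
		return;
	}
	path_from_dentry(d->parent, out, outlen);
	len = strlen(out);
	snprintf(out + len, outlen - len, "\\%s", d->name);
}
```

The prepath optimisation would simply append a fixed prefix once instead of walking the extra ancestors on every call, which is why it pays off when pathname fabrication is frequent.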
David Howells <dhowells@redhat.com> writes:

> Yep.  You don't *need* prepath.  You could always build from the sb->s_root
> without a prepath and have mnt->mnt_root offset the place the VFS thinks
> you are:
>
> 	[rootdir]/	<--- s_root points here
> 	    |
> 	    v
> 	   foo/
> 	    |
> 	    v
> 	   bar/		<--- mnt_root points here
> 	    |
> 	    v
> 	    a
>
> Without prepath, you build back up the tree { a, bar/, foo/, [rootdir] };
> with prepath you insert the prepath at the end.
>
> Bind mounts just make the VFS think it's starting midway down, but you
> build up back to s_root.
>
> Think of a mount as just referring to a subtree of the tree inside the
> superblock.  The prepath is just an optimisation - but possibly one that
> makes sense for cifs if you're having to do pathname fabrication a lot.

Thanks a lot, Dave!  Great explanation.

We also need to remember that those prefix paths can also be changed over
reconnect.  That is, if you're currently mounted to a DFS link target
'\srv1\share' and the client fails over to the next target
'\srv2\share\foo\bar', cifs_sb_info::prepath will be set to '\foo\bar'.  And
if you mounted the DFS link as `mount.cifs //dfs/link/some/dir`,
cifs_sb_info::prepath would be set to '\some\dir\foo\bar'.  Yeah, a lot of
corner cases to handle...

Anyway, don't worry much about all of this as we can handle it in follow-up
patches.  FWIW, the patch looks good:

Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 7ed9d05f6890..43319288b4e3 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1275,7 +1275,9 @@ struct cifs_tcon {
 	__u32 max_cached_dirs;
 #ifdef CONFIG_CIFS_FSCACHE
 	u64 resource_id;		/* server resource id */
+	bool fscache_acquired;		/* T if we've tried acquiring a cookie */
 	struct fscache_volume *fscache;	/* cookie for share */
+	struct mutex fscache_lock;	/* Prevent regetting a cookie */
 #endif
 	struct list_head pending_opens;	/* list of incomplete opens */
 	struct cached_fids *cfids;
diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
index 340efce8f052..113bde8f1e61 100644
--- a/fs/smb/client/fscache.c
+++ b/fs/smb/client/fscache.c
@@ -43,12 +43,23 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 	char *key;
 	int ret = -ENOMEM;
 
+	if (tcon->fscache_acquired)
+		return 0;
+
+	mutex_lock(&tcon->fscache_lock);
+	if (tcon->fscache_acquired) {
+		mutex_unlock(&tcon->fscache_lock);
+		return 0;
+	}
+	tcon->fscache_acquired = true;
+
 	tcon->fscache = NULL;
 	switch (sa->sa_family) {
 	case AF_INET:
 	case AF_INET6:
 		break;
 	default:
+		mutex_unlock(&tcon->fscache_lock);
 		cifs_dbg(VFS, "Unknown network family '%d'\n", sa->sa_family);
 		return -EINVAL;
 	}
@@ -57,6 +68,7 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 
 	sharename = extract_sharename(tcon->tree_name);
 	if (IS_ERR(sharename)) {
+		mutex_unlock(&tcon->fscache_lock);
 		cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__);
 		return PTR_ERR(sharename);
 	}
@@ -90,6 +102,7 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 	kfree(key);
 out:
 	kfree(sharename);
+	mutex_unlock(&tcon->fscache_lock);
 	return ret;
 }
 
diff --git a/fs/smb/client/misc.c b/fs/smb/client/misc.c
index c3771fc81328..b27fbb840539 100644
--- a/fs/smb/client/misc.c
+++ b/fs/smb/client/misc.c
@@ -141,6 +141,9 @@ tcon_info_alloc(bool dir_leases_enabled)
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	INIT_LIST_HEAD(&ret_buf->dfs_ses_list);
 #endif
+#ifdef CONFIG_CIFS_FSCACHE
+	mutex_init(&ret_buf->fscache_lock);
+#endif
 
 	return ret_buf;
 }
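The once-only acquisition in the patch is the classic check / lock / re-check idiom: an unlocked fast-path test of the flag, then the mutex is taken and the flag re-checked so that exactly one caller ever performs the acquisition. A userspace analogue with pthreads (a sketch of the pattern only; the names mirror the patch but nothing here is kernel code, and `acquire_cookie` is a placeholder):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct tcon {
	bool fscache_acquired;		/* T once we've tried to get a cookie */
	void *fscache;			/* the volume cookie, or NULL */
	pthread_mutex_t fscache_lock;
};

static int acquire_count;		/* counts real acquisition attempts */

static void *acquire_cookie(void)
{
	acquire_count++;
	return malloc(1);		/* placeholder for the real acquisition */
}

/* Mirror of the patch's flow: cheap unlocked check, then take the lock
 * and re-check, so only one caller acquires the cookie for the lifetime
 * of the tcon no matter how many mounts reuse it.  Note the flag marks
 * "we've tried", so even a failed attempt is not repeated. */
static int get_super_cookie(struct tcon *tcon)
{
	if (tcon->fscache_acquired)
		return 0;

	pthread_mutex_lock(&tcon->fscache_lock);
	if (tcon->fscache_acquired) {
		pthread_mutex_unlock(&tcon->fscache_lock);
		return 0;
	}
	tcon->fscache_acquired = true;
	tcon->fscache = acquire_cookie();
	pthread_mutex_unlock(&tcon->fscache_lock);
	return tcon->fscache ? 0 : -1;
}
```

A second mount of the same tcon takes the fast path and never reaches acquire_cookie(), which is exactly what prevents the cookie-collision warning and the leak described in the commit message.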
During mount, cifs_mount_get_tcon() gets a tcon resource connection record
and then attaches an fscache volume cookie to it.  However, it does this
irrespective of whether or not the tcon returned from cifs_get_tcon() is a
new record or one that's already in use.  This leads to a warning about a
volume cookie collision and a leaked volume cookie because tcon->fscache
gets reset.

Fix this by adding a mutex and a "we've already tried this" flag and only
doing it once for the lifetime of the tcon.

[!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
actually be required.  Reacquiring the volume cookie isn't the only thing
that function does: it also partially reinitialises the tcon record without
any locking - which may cause live filesystem ops already using the tcon
through a previous mount to malfunction.

This can be reproduced simply by something like:

	mount //example.com/test /xfstest.test -o user=shares,pass=xxx,fsc
	mount //example.com/test /mnt -o user=shares,pass=xxx,fsc

Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/smb/client/cifsglob.h |  2 ++
 fs/smb/client/fscache.c  | 13 +++++++++++++
 fs/smb/client/misc.c     |  3 +++
 3 files changed, 18 insertions(+)