Message ID | 8b114189-e943-a7e6-3d31-16aa8a148da6@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] namei: don't drop link paths acquired under LOOKUP_RCU | expand |
On Sun, Feb 07, 2021 at 01:26:19PM -0700, Jens Axboe wrote: > Al, not sure if this is the right fix for the situation, but it's > definitely a problem. Observed by doing a LOOKUP_CACHED of something with > links, using /proc/self/comm as the example in the attached way to > demonstrate this problem. That's definitely not the right fix. What your analysis has missed is what legitimize_links() does to nd->depth when called. IOW, on transitions from RCU mode you want nd->depth to set according the number of links we'd grabbed references to. Flatly setting it to 0 on failure exit will lead to massive leaks. Could you check if the following fixes your reproducers? diff --git a/fs/namei.c b/fs/namei.c index 4cae88733a5c..afb293b39be7 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -687,7 +687,7 @@ static bool try_to_unlazy(struct nameidata *nd) nd->flags &= ~LOOKUP_RCU; if (nd->flags & LOOKUP_CACHED) - goto out1; + goto out2; if (unlikely(!legitimize_links(nd))) goto out1; if (unlikely(!legitimize_path(nd, &nd->path, nd->seq))) @@ -698,6 +698,8 @@ static bool try_to_unlazy(struct nameidata *nd) BUG_ON(nd->inode != parent->d_inode); return true; +out2: + nd->depth = 0; // as we hadn't gotten to legitimize_links() out1: nd->path.mnt = NULL; nd->path.dentry = NULL; @@ -725,7 +727,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi nd->flags &= ~LOOKUP_RCU; if (nd->flags & LOOKUP_CACHED) - goto out2; + goto out3; if (unlikely(!legitimize_links(nd))) goto out2; if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) @@ -753,6 +755,8 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi rcu_read_unlock(); return true; +out3: + nd->depth = 0; // as we hadn't gotten to legitimize_links() out2: nd->path.mnt = NULL; out1:
On Sun, Feb 14, 2021 at 04:05:22PM +0000, Al Viro wrote: > On Sun, Feb 07, 2021 at 01:26:19PM -0700, Jens Axboe wrote: > > > Al, not sure if this is the right fix for the situation, but it's > > definitely a problem. Observed by doing a LOOKUP_CACHED of something with > > links, using /proc/self/comm as the example in the attached way to > > demonstrate this problem. > > That's definitely not the right fix. What your analysis has missed is > what legitimize_links() does to nd->depth when called. IOW, on transitions > from RCU mode you want nd->depth to set according the number of links we'd > grabbed references to. Flatly setting it to 0 on failure exit will lead > to massive leaks. > > Could you check if the following fixes your reproducers? > > diff --git a/fs/namei.c b/fs/namei.c > index 4cae88733a5c..afb293b39be7 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -687,7 +687,7 @@ static bool try_to_unlazy(struct nameidata *nd) > > nd->flags &= ~LOOKUP_RCU; > if (nd->flags & LOOKUP_CACHED) > - goto out1; > + goto out2; > if (unlikely(!legitimize_links(nd))) > goto out1; > if (unlikely(!legitimize_path(nd, &nd->path, nd->seq))) > @@ -698,6 +698,8 @@ static bool try_to_unlazy(struct nameidata *nd) > BUG_ON(nd->inode != parent->d_inode); > return true; > > +out2: > + nd->depth = 0; // as we hadn't gotten to legitimize_links() > out1: > nd->path.mnt = NULL; > nd->path.dentry = NULL; > @@ -725,7 +727,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi > > nd->flags &= ~LOOKUP_RCU; > if (nd->flags & LOOKUP_CACHED) > - goto out2; > + goto out3; > if (unlikely(!legitimize_links(nd))) > goto out2; > if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) > @@ -753,6 +755,8 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi > rcu_read_unlock(); > return true; > > +out3: > + nd->depth = 0; // as we hadn't gotten to legitimize_links() > out2: > nd->path.mnt = NULL; > out1: Alternatively, we could use the fact that legitimize_links() is not called anywhere other than these two places and have LOOKUP_CACHED checked there. As in diff --git a/fs/namei.c b/fs/namei.c index 4cae88733a5c..58962569cc20 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -630,6 +630,10 @@ static inline bool legitimize_path(struct nameidata *nd, static bool legitimize_links(struct nameidata *nd) { int i; + if (unlikely(nd->flags & LOOKUP_CACHED)) { + nd->depth = 0; + return false; + } for (i = 0; i < nd->depth; i++) { struct saved *last = nd->stack + i; if (unlikely(!legitimize_path(nd, &last->link, last->seq))) { @@ -686,8 +690,6 @@ static bool try_to_unlazy(struct nameidata *nd) BUG_ON(!(nd->flags & LOOKUP_RCU)); nd->flags &= ~LOOKUP_RCU; - if (nd->flags & LOOKUP_CACHED) - goto out1; if (unlikely(!legitimize_links(nd))) goto out1; if (unlikely(!legitimize_path(nd, &nd->path, nd->seq))) @@ -724,8 +726,6 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi BUG_ON(!(nd->flags & LOOKUP_RCU)); nd->flags &= ~LOOKUP_RCU; - if (nd->flags & LOOKUP_CACHED) - goto out2; if (unlikely(!legitimize_links(nd))) goto out2; if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) That would be shorter, but might be harder to follow for reader. Not sure...
On 2/14/21 9:40 AM, Al Viro wrote: > On Sun, Feb 14, 2021 at 04:05:22PM +0000, Al Viro wrote: >> On Sun, Feb 07, 2021 at 01:26:19PM -0700, Jens Axboe wrote: >> >>> Al, not sure if this is the right fix for the situation, but it's >>> definitely a problem. Observed by doing a LOOKUP_CACHED of something with >>> links, using /proc/self/comm as the example in the attached way to >>> demonstrate this problem. >> >> That's definitely not the right fix. What your analysis has missed is >> what legitimize_links() does to nd->depth when called. IOW, on transitions >> from RCU mode you want nd->depth to set according the number of links we'd >> grabbed references to. Flatly setting it to 0 on failure exit will lead >> to massive leaks. >> >> Could you check if the following fixes your reproducers? >> >> diff --git a/fs/namei.c b/fs/namei.c >> index 4cae88733a5c..afb293b39be7 100644 >> --- a/fs/namei.c >> +++ b/fs/namei.c >> @@ -687,7 +687,7 @@ static bool try_to_unlazy(struct nameidata *nd) >> >> nd->flags &= ~LOOKUP_RCU; >> if (nd->flags & LOOKUP_CACHED) >> - goto out1; >> + goto out2; >> if (unlikely(!legitimize_links(nd))) >> goto out1; >> if (unlikely(!legitimize_path(nd, &nd->path, nd->seq))) >> @@ -698,6 +698,8 @@ static bool try_to_unlazy(struct nameidata *nd) >> BUG_ON(nd->inode != parent->d_inode); >> return true; >> >> +out2: >> + nd->depth = 0; // as we hadn't gotten to legitimize_links() >> out1: >> nd->path.mnt = NULL; >> nd->path.dentry = NULL; >> @@ -725,7 +727,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi >> >> nd->flags &= ~LOOKUP_RCU; >> if (nd->flags & LOOKUP_CACHED) >> - goto out2; >> + goto out3; >> if (unlikely(!legitimize_links(nd))) >> goto out2; >> if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) >> @@ -753,6 +755,8 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi >> rcu_read_unlock(); >> return true; >> >> +out3: >> + nd->depth = 0; // as we hadn't gotten to legitimize_links() >> out2: >> nd->path.mnt = NULL; >> out1: > > Alternatively, we could use the fact that legitimize_links() is not > called anywhere other than these two places and have LOOKUP_CACHED > checked there. As in Both fix the issue for me, just tested them. The second one seems cleaner to me, would probably be nice to have a comment on that in either the two callers or at least in legitimize_links() though.
On Sun, Feb 14, 2021 at 09:45:39AM -0700, Jens Axboe wrote: > >> +out3: > >> + nd->depth = 0; // as we hadn't gotten to legitimize_links() > >> out2: > >> nd->path.mnt = NULL; > >> out1: > > > > Alternatively, we could use the fact that legitimize_links() is not > > called anywhere other than these two places and have LOOKUP_CACHED > > checked there. As in > > Both fix the issue for me, just tested them. The second one seems > cleaner to me, would probably be nice to have a comment on that in > either the two callers or at least in legitimize_links() though. Hmm... Do you have anything on top of that branch? If you do, there's no way to avoid leaving bisect hazard; if not, I'd rather fold a fix into the broken commit...
On 2/14/21 3:57 PM, Al Viro wrote: > On Sun, Feb 14, 2021 at 09:45:39AM -0700, Jens Axboe wrote: > >>>> +out3: >>>> + nd->depth = 0; // as we hadn't gotten to legitimize_links() >>>> out2: >>>> nd->path.mnt = NULL; >>>> out1: >>> >>> Alternatively, we could use the fact that legitimize_links() is not >>> called anywhere other than these two places and have LOOKUP_CACHED >>> checked there. As in >> >> Both fix the issue for me, just tested them. The second one seems >> cleaner to me, would probably be nice to have a comment on that in >> either the two callers or at least in legitimize_links() though. > > Hmm... Do you have anything on top of that branch? If you do, there's > no way to avoid leaving bisect hazard; if not, I'd rather fold a fix > into the broken commit... I do, that's basically the base of that series, -rc6 + that branch. So I'd prefer if you just apply the fixup, which I do think is pretty low risk even if it is a potential bisection pain point. But not really a huge one, in this case. That said, if you do want to rebase it, I can rebase mine. That's not the end of the world either.
diff --git a/fs/namei.c b/fs/namei.c index 4cae88733a5c..20e706fe505a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -701,6 +701,7 @@ static bool try_to_unlazy(struct nameidata *nd) out1: nd->path.mnt = NULL; nd->path.dentry = NULL; + nd->depth = 0; out: rcu_read_unlock(); return false; @@ -755,6 +756,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi out2: nd->path.mnt = NULL; + nd->depth = 0; out1: nd->path.dentry = NULL; out:
Ran into an issue where testing with LOOKUP_CACHED ended up complaining about a mount count mismatch: WARNING: CPU: 3 PID: 368 at fs/namespace.c:1168 mntput_no_expire+0x1b5/0x270 Modules linked in: CPU: 3 PID: 368 Comm: al Not tainted 5.11.0-rc6+ #9166 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:mntput_no_expire+0x1b5/0x270 Code: 0f 84 b9 fe ff ff 48 8b 35 28 d7 fd 00 b9 01 00 00 00 bf 08 00 00 00 48 c7 c2 80 1b 17 82 e8 d2 10 e2 ff e9 97 fe ff ff 79 02 <0f> 0b e8 44 d4 e7 ff 83 05 0d 91 da 00 01 48 c7 c7 04 96 00 82 e8 RSP: 0018:ffffc900002bbe68 EFLAGS: 00010286 RAX: 0000000000000008 RBX: 00000000ffffffff RCX: 0000000000000008 RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000008 RBP: ffff888102790000 R08: 0000000000000000 R09: 0000000000000008 R10: ffff8881b9ce9c80 R11: 0000000000002000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff888102790088 R15: ffff888102790050 FS: 00007f35e29f9580(0000) GS:ffff8881b9cc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000558689fde004 CR3: 0000000108403003 CR4: 00000000001706e0 Call Trace: path_umount+0x224/0x510 __x64_sys_umount+0x6f/0x80 do_syscall_64+0x31/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f35e292f2cb Code: 1b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 71 1b 0c 00 f7 d8 RSP: 002b:00007ffc6536e638 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f35e292f2cb RDX: 00007ffc6536e640 RSI: 0000000000000000 RDI: 0000558689fde004 RBP: 0000558689fdd220 R08: 00007f35e2a194c0 R09: 0000000000000000 R10: 0000558689fdc448 R11: 0000000000000246 R12: 0000558689fdd120 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 and I believe this is due to the stack links being put, even though they were acquired under LOOKUP_RCU, and hence don't hold an actual reference. Before LOOKUP_CACHED, we always retried without LOOKUP_RCU, and hence they'd end up being valid. But if a caller specifies LOOKUP_CACHED, then we will not be retrying without LOOKUP_RCU. Fix this by clearing the stack link depth when we unlazy. Fixes: 6c6ec2b0a3e0 ("fs: add support for LOOKUP_CACHED") Signed-off-by: Jens Axboe <axboe@kernel.dk> --- Al, not sure if this is the right fix for the situation, but it's definitely a problem. Observed by doing a LOOKUP_CACHED of something with links, using /proc/self/comm as the example in the attached way to demonstrate this problem.