Message ID | 1449511503-7543-1-git-send-email-suzuki.poulose@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Dec 7, 2015 at 10:05 AM, Suzuki K. Poulose <suzuki.poulose@arm.com> wrote: > blkdev_open() doesn't release the bdev, it attached to a given > inode, if blkdev_get() fails (e.g, due to absence of a device). > This can cause kernel crashes when the original filesystem > tries to flush the data during evict_inode. Ugh. This code is a mess. Al, can you please comment? So what happens is that when "blkdev_get()" fails, it will do a bdput() on the bdev. But blkdev_open() hasn't done a bdget(). It's done a bd_acquire(). Which will do the whole "add inodes to bd_inodes". And yes, bd_forget() will undo that. HOWEVER. bd_forget() will undo that unconditionally, but bd_acquire() has *not* unconditionally done that bd_inodes list operation. It might already have been there. So as far as I can tell, the patch here undoes things potentially too much. Shouldn't the last bdput() already end up doing a bd_forget()? We'd have bdput -> iput -> iput_final -> evict -> bd_forget. but the fact that Suzuki shows an oops clearly shows that something is badly wrong. IOW, the path looks simple and apparently fixes an oops, but I'd like much more of an explanation for what happens, because it all feels wrong to me. Why doesn't the bdput() end up undoing the bd_acquire() properly? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/block_dev.c b/fs/block_dev.c index c25639e..7d7f322 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1484,6 +1484,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev); static int blkdev_open(struct inode * inode, struct file * filp) { + int rc; struct block_device *bdev; /* @@ -1507,7 +1508,11 @@ static int blkdev_open(struct inode * inode, struct file * filp) filp->f_mapping = bdev->bd_inode->i_mapping; - return blkdev_get(bdev, filp->f_mode, filp); + rc = blkdev_get(bdev, filp->f_mode, filp); + if (rc) + bd_forget(inode); + + return rc; } static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
blkdev_open() doesn't release the bdev, it attached to a given inode, if blkdev_get() fails (e.g, due to absence of a device). This can cause kernel crashes when the original filesystem tries to flush the data during evict_inode. This can be triggered easily with virtio-9p fs using the following simple steps. root@localhost:~# mknod disk b 9 1 root@localhost:~# cat disk Unable to handle kernel NULL pointer dereference at virtual address 00000214 pgd = bea40000 [00000214] *pgd=be9eb831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 0 PID: 1094 Comm: cat Not tainted 4.3.0 #3 Hardware name: Generic DT based system task: bf186600 ti: be822000 task.ti: be822000 PC is at blk_get_backing_dev_info+0x4/0x10 LR is at __filemap_fdatawrite_range+0x88/0x94 pc : [<80317e00>] lr : [<801995d4>] psr: 60010013 sp : be823db0 ip : 00000000 fp : 00000024 r10: fffffffa r9 : be86a240 r8 : bec87e58 r7 : 00000001 r6 : 80615640 r5 : 7fffffff r4 : bec03354 r3 : 00000000 r2 : bf006c00 r1 : 7fffffff r0 : bec03200 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5383d Table: bea4006a DAC: 00000051 Process cat (pid: 1094, stack limit = 0xbe822210) [... stack contents trimmed ...] [<80317e00>] (blk_get_backing_dev_info) from [<801995d4>] (__filemap_fdatawrite_range+0x88/0x94) [<801995d4>] (__filemap_fdatawrite_range) from [<80199608>] (filemap_fdatawrite+0x28/0x30) [<80199608>] (filemap_fdatawrite) from [<802fa830>] (v9fs_evict_inode+0x20/0x3c) [<802fa830>] (v9fs_evict_inode) from [<801ef4fc>] (evict+0xb0/0x188) [<801ef4fc>] (evict) from [<801eb998>] (__dentry_kill+0x1ec/0x250) [<801eb998>] (__dentry_kill) from [<801ec2d8>] (dput+0x188/0x28c) [<801ec2d8>] (dput) from [<801e0858>] (path_put+0x10/0x1c) [<801e0858>] (path_put) from [<801e08a0>] (terminate_walk+0x3c/0x98) [<801e08a0>] (terminate_walk) from [<801e3d54>] (path_openat+0x1ec/0xeac) [<801e3d54>] (path_openat) from [<801e56c8>] (do_filp_open+0x60/0xb4) [<801e56c8>] (do_filp_open) from [<801d7850>] (do_sys_open+0x124/0x1d0) [<801d7850>] (do_sys_open) from [<80107340>] (ret_fast_syscall+0x0/0x3c) Code: 806d3ca0 80941b7c 807175d8 e590305c (e5930214) ---[ end trace b61b160a3217ae29 ]--- Fixes: e525fd89d380c4a94c0d63913a1dd1a593ed25e7 Cc: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> --- fs/block_dev.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)