diff mbox

blkdev: Fix blkdev_open to release the bdev on error

Message ID 1449511503-7543-1-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suzuki K Poulose Dec. 7, 2015, 6:05 p.m. UTC
blkdev_open() doesn't release the bdev, it attached to a given
inode, if blkdev_get() fails (e.g, due to absence of a device).
This can cause kernel crashes when the original filesystem
tries to flush the data during evict_inode.

This can be triggered easily with virtio-9p fs using the following
simple steps.

root@localhost:~# mknod disk b 9 1
root@localhost:~# cat disk
Unable to handle kernel NULL pointer dereference at virtual address 00000214
pgd = bea40000
[00000214] *pgd=be9eb831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1094 Comm: cat Not tainted 4.3.0 #3
Hardware name: Generic DT based system
task: bf186600 ti: be822000 task.ti: be822000
PC is at blk_get_backing_dev_info+0x4/0x10
LR is at __filemap_fdatawrite_range+0x88/0x94
pc : [<80317e00>]    lr : [<801995d4>]    psr: 60010013
sp : be823db0  ip : 00000000  fp : 00000024
r10: fffffffa  r9 : be86a240  r8 : bec87e58
r7 : 00000001  r6 : 80615640  r5 : 7fffffff  r4 : bec03354
r3 : 00000000  r2 : bf006c00  r1 : 7fffffff  r0 : bec03200
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5383d  Table: bea4006a  DAC: 00000051
Process cat (pid: 1094, stack limit = 0xbe822210)

[... stack contents trimmed ...]

[<80317e00>] (blk_get_backing_dev_info) from [<801995d4>] (__filemap_fdatawrite_range+0x88/0x94)
[<801995d4>] (__filemap_fdatawrite_range) from [<80199608>] (filemap_fdatawrite+0x28/0x30)
[<80199608>] (filemap_fdatawrite) from [<802fa830>] (v9fs_evict_inode+0x20/0x3c)
[<802fa830>] (v9fs_evict_inode) from [<801ef4fc>] (evict+0xb0/0x188)
[<801ef4fc>] (evict) from [<801eb998>] (__dentry_kill+0x1ec/0x250)
[<801eb998>] (__dentry_kill) from [<801ec2d8>] (dput+0x188/0x28c)
[<801ec2d8>] (dput) from [<801e0858>] (path_put+0x10/0x1c)
[<801e0858>] (path_put) from [<801e08a0>] (terminate_walk+0x3c/0x98)
[<801e08a0>] (terminate_walk) from [<801e3d54>] (path_openat+0x1ec/0xeac)
[<801e3d54>] (path_openat) from [<801e56c8>] (do_filp_open+0x60/0xb4)
[<801e56c8>] (do_filp_open) from [<801d7850>] (do_sys_open+0x124/0x1d0)
[<801d7850>] (do_sys_open) from [<80107340>] (ret_fast_syscall+0x0/0x3c)
Code: 806d3ca0 80941b7c 807175d8 e590305c (e5930214)
---[ end trace b61b160a3217ae29 ]---

Fixes: e525fd89d380c4a94c0d63913a1dd1a593ed25e7
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 fs/block_dev.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Linus Torvalds Dec. 7, 2015, 6:49 p.m. UTC | #1
On Mon, Dec 7, 2015 at 10:05 AM, Suzuki K. Poulose
<suzuki.poulose@arm.com> wrote:
> blkdev_open() doesn't release the bdev, it attached to a given
> inode, if blkdev_get() fails (e.g, due to absence of a device).
> This can cause kernel crashes when the original filesystem
> tries to flush the data during evict_inode.

Ugh. This code is a mess. Al, can you please comment?

So what happens is that when "blkdev_get()" fails, it will do a
bdput() on the bdev.

But blkdev_open() hasn't done a bdget(). It's done a bd_acquire().
Which will do the whole "add inodes to bd_inodes". And yes,
bd_forget() will undo that.

HOWEVER.

bd_forget() will undo that unconditionally, but bd_acquire() has *not*
unconditionally done that bd_inodes list operation. It might already
have been there.

So as far as I can tell, the patch here undoes things potentially too much.

Shouldn't the last bdput() already end up doing a bd_forget()? We'd have

 bdput -> iput -> iput_final -> evict -> bd_forget.

but the fact that Suzuki shows an oops clearly shows that something is
badly wrong.

IOW, the path looks simple and apparently fixes an oops, but I'd like
much more of an explanation for what happens, because it all feels
wrong to me. Why doesn't the bdput() end up undoing the bd_acquire()
properly?

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c25639e..7d7f322 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1484,6 +1484,7 @@  EXPORT_SYMBOL(blkdev_get_by_dev);
 
 static int blkdev_open(struct inode * inode, struct file * filp)
 {
+	int rc;
 	struct block_device *bdev;
 
 	/*
@@ -1507,7 +1508,11 @@  static int blkdev_open(struct inode * inode, struct file * filp)
 
 	filp->f_mapping = bdev->bd_inode->i_mapping;
 
-	return blkdev_get(bdev, filp->f_mode, filp);
+	rc = blkdev_get(bdev, filp->f_mode, filp);
+	if (rc)
+		bd_forget(inode);
+
+	return rc;
 }
 
 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)