From patchwork Thu Jan 11 17:00:08 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara <jack@suse.cz>
X-Patchwork-Id: 10158243
Return-Path: <linux-block-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	A6996602D8 for <patchwork-linux-block@patchwork.kernel.org>;
	Thu, 11 Jan 2018 17:00:24 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E21A2881D
	for <patchwork-linux-block@patchwork.kernel.org>;
	Thu, 11 Jan 2018 17:00:24 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 2C92728874; Thu, 11 Jan 2018 17:00:24 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCB032881D
	for <patchwork-linux-block@patchwork.kernel.org>;
	Thu, 11 Jan 2018 17:00:14 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934269AbeAKRAN (ORCPT
	<rfc822;patchwork-linux-block@patchwork.kernel.org>);
	Thu, 11 Jan 2018 12:00:13 -0500
Received: from mx2.suse.de ([195.135.220.15]:52966 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934102AbeAKRAL (ORCPT <rfc822;linux-block@vger.kernel.org>);
	Thu, 11 Jan 2018 12:00:11 -0500
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id 004A9AEE1;
	Thu, 11 Jan 2018 17:00:09 +0000 (UTC)
Received: by quack2.suse.cz (Postfix, from userid 1000)
	id 42D741E39A2; Thu, 11 Jan 2018 18:00:08 +0100 (CET)
Date: Thu, 11 Jan 2018 18:00:08 +0100
From: Jan Kara <jack@suse.cz>
To: Hou Tao <houtao1@huawei.com>
Cc: Foy <long7573@126.com>, Jens Axboe <axboe@kernel.dk>,
	Jan Kara <jack@suse.cz>, Dan Carpenter <dan.carpenter@oracle.com>,
	Paolo Valente <paolo.valente@linaro.org>, security@kernel.org,
	syzkaller@googlegroups.com, linux-block@vger.kernel.org
Subject: Re: blkdev loop UAF
Message-ID: <20180111170008.24crzej4utoh5s3p@quack2.suse.cz>
References: <654967ea.6cce.160e43379f6.Coremail.long7573@126.com>
	<20180111082441.myaxc2upbjhcucqh@mwanda>
	<69a2465f-dd67-6962-4f15-57fa9cbafe4d@huawei.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <69a2465f-dd67-6962-4f15-57fa9cbafe4d@huawei.com>
User-Agent: NeoMutt/20170421 (1.8.2)
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Thu 11-01-18 19:22:39, Hou Tao wrote:
> Hi,
> 
> On 2018/1/11 16:24, Dan Carpenter wrote:
> > Thanks for your report and the patch.  I am sending it to the
> > linux-block devs since it's already public.
> > 
> > regards,
> > dan carpenter
> 
> The User-after-free problem is not specific for loop device, it can also
> be reproduced on scsi device, and there are more race problems caused by
> the race between bdev open and gendisk shutdown [1].
> 
> The cause of the UAF problem is that there are two instances of gendisk which share
> the same bdev. After the process owning the new gendisk increases bdev->bd_openers,
> the other process which owns the older gendisk will find bdev->bd_openers is not zero
> and will put the last reference of the older gendisk and cause User-after-free.
> 
> I had proposed a patch for the problem, but it's still an incomplete fix for the race
> between gendisk shutdown and bdev opening.
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..5ecdb9f 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1510,6 +1510,11 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>                 if (bdev->bd_bdi == &noop_backing_dev_info)
>                         bdev->bd_bdi = bdi_get(disk->queue->backing_dev_info);
>         } else {
> +               if (bdev->bd_disk != disk) {
> +                       ret = -ENXIO;
> +                       goto out_unlock_bdev;
> +               }
> +
>                 if (bdev->bd_contains == bdev) {
>                         ret = 0;
>                         if (bdev->bd_disk->fops->open)
> 
> 
> As far as I know, Jan Kara is working on these problems. So, Jan, any suggestions ?

Yeah, thanks for the ping. I was working today to get this fixed. I have
about 6 patches (attached) which should fix all the outstanding issues I'm
aware of.  So far they are only boot tested. I will give them more
stress-testing next week and then post them officially but if you have time,
feel free to try them out as well. Thanks!

								Honza

From f560a27dd0938300a6bb5b16d9164e29b5abf980 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 11 Jan 2018 17:29:03 +0100
Subject: [PATCH 6/6] blockdev: Avoid two active bdev inodes for one device

When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:

CPU0					CPU1
blkdev_open()
  bdev = bd_acquire()
					del_gendisk()
					  bdev_unhash_inode(bdev);
					remove device
					create new device with the same number
  __blkdev_get()
    disk = get_gendisk()
      - gets reference to gendisk of the new device

Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.

Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index fe41a76769fa..fe09ef9c21f3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1058,6 +1058,27 @@ static int bd_prepare_to_claim(struct block_device *bdev,
 	return 0;
 }
 
+static struct gendisk *bdev_get_gendisk(struct block_device *bdev, int *partno)
+{
+	struct gendisk *disk = get_gendisk(bdev->bd_dev, partno);
+
+	if (!disk)
+		return NULL;
+	/*
+	 * Now that we hold gendisk reference we make sure bdev we looked up is
+	 * not stale. If it is, it means device got removed and created before
+	 * we looked up gendisk and we fail open in such case. Associating
+	 * unhashed bdev with newly created gendisk could lead to two bdevs
+	 * (and thus two independent caches) being associated with one device
+	 * which is bad.
+	 */
+	if (inode_unhashed(bdev->bd_inode)) {
+		put_disk_and_module(disk);
+		return NULL;
+	}
+	return disk;
+}
+
 /**
  * bd_start_claiming - start claiming a block device
  * @bdev: block device of interest
@@ -1094,7 +1115,7 @@ static struct block_device *bd_start_claiming(struct block_device *bdev,
 	 * @bdev might not have been initialized properly yet, look up
 	 * and grab the outer block device the hard way.
 	 */
-	disk = get_gendisk(bdev->bd_dev, &partno);
+	disk = bdev_get_gendisk(bdev, &partno);
 	if (!disk)
 		return ERR_PTR(-ENXIO);
 
@@ -1429,7 +1450,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
  restart:
 
 	ret = -ENXIO;
-	disk = get_gendisk(bdev->bd_dev, &partno);
+	disk = bdev_get_gendisk(bdev, &partno);
 	if (!disk)
 		goto out;
 
-- 
2.13.6