From patchwork Tue Dec 8 07:25:08 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Al Viro
X-Patchwork-Id: 7794981
Date: Tue, 8 Dec 2015 07:25:08 +0000
From: Al Viro
To: "Suzuki K. Poulose"
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	marc.zyngier@arm.com, torvalds@linux-foundation.org, Tejun Heo,
	stable@vger.kernel.org
Subject: Re: [PATCH] blkdev: Fix blkdev_open to release the bdev on error
Message-ID: <20151208072508.GM20997@ZenIV.linux.org.uk>
References: <1449511503-7543-1-git-send-email-suzuki.poulose@arm.com>
In-Reply-To: <1449511503-7543-1-git-send-email-suzuki.poulose@arm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Mon, Dec 07, 2015 at 06:05:03PM +0000, Suzuki K. Poulose wrote:
> blkdev_open() doesn't release the bdev, it attached to a given
> inode, if blkdev_get() fails (e.g, due to absence of a device).
> This can cause kernel crashes when the original filesystem
> tries to flush the data during evict_inode.
>
> This can be triggered easily with virtio-9p fs using the following
> simple steps.

??? How can filesystem type affect the behaviour of block devices?  Having
	mknod /tmp/splat b 8 1; rm /tmp/splat
try to evict the pagecache of /dev/sda1 is simply wrong, no matter what type
/tmp happens to have.  And they must share pagecache, or you'll get one hell
of cache coherency problems.

As it is, that pagecache belongs to inode on bdevfs (see fs/block_dev.c; not
mountable anywhere visible, the one and only mount is internal).  That inode
is tied to struct bdev, ditto for its lifetime.
Block device inodes on anything else have their ->i_mapping pointing to the
corresponding (unique for given major/minor) inode on bdevfs; that gives us
the coherency, but that also means that their *own* pagecache (->i_data) is
empty.

Which is just fine, since inode eviction should get rid of everything in its
embedded struct address_space.  In case of block device inodes on ext2, 9p,
etc. that amounts to no pages at all.  In case of bdevfs, it contains the
page cache of block device.

Aha...
	truncate_inode_pages_final(inode->i_mapping);
	clear_inode(inode);
	filemap_fdatawrite(inode->i_mapping);
in there is obviously wrong - it should be
	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
	filemap_fdatawrite(&inode->i_data);
and if you check other filesystems' ->evict_inode() you'll see the same thing
there.

We should not do bd_forget() upon failing open() - what for?  As long as
->i_rdev remains the same, the pointer to struct bdev is valid.  It doesn't
pin bdev down; having it (or any other alias) opened does.  When we decide
to evict bdev, *all* aliasing inodes are dissociated from it; none of them
is open at that point, so we are OK.  When an aliasing inode gets evicted,
we have it dissociated from its ->i_bdev (if any).  Since we only access
the ->i_mapping of aliasing inode while it's open, those places are fine and
anything that wants ->i_data of alias will simply find it empty.

AFAICS, the cause of your oopsen is that 9p evict_inode is accessing the
object it has no business to touch.  Could you confirm that the patch below
fixes your problem?
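
For anyone following along, the aliasing described above can be modelled in
userspace.  What follows is only a toy sketch under stated assumptions, not
kernel code: struct inode and struct address_space are cut down to the two
fields discussed here, nrpages stands in for the page cache contents, and
every model_*() function is a hypothetical name made up for illustration.
All it tries to show is why truncating through ->i_mapping from an alias
reaches the block device's page cache, while &inode->i_data does not.

```c
/*
 * Userspace model only.  These structs are simplified stand-ins for the
 * kernel's, and all model_*() names are hypothetical.
 */
#include <assert.h>

struct address_space {
	unsigned long nrpages;	/* stand-in for the cached pages */
};

struct inode {
	struct address_space *i_mapping;	/* where I/O on this inode goes */
	struct address_space i_data;		/* the inode's own embedded mapping */
};

/* A fresh inode starts out using its own (empty) embedded mapping. */
void model_inode_init(struct inode *inode)
{
	inode->i_data.nrpages = 0;
	inode->i_mapping = &inode->i_data;
}

/* Point a block device alias (ext2, 9p, ...) at the bdevfs inode's mapping. */
void model_alias_to_bdev(struct inode *alias, struct inode *bdev_inode)
{
	assert(alias != bdev_inode);
	alias->i_mapping = bdev_inode->i_mapping;
}

/* Stand-in for truncate_inode_pages_final(): drop everything in a mapping. */
void model_truncate(struct address_space *mapping)
{
	mapping->nrpages = 0;
}

/* Eviction that follows ->i_mapping, as in the lines being removed. */
void model_evict_via_i_mapping(struct inode *inode)
{
	model_truncate(inode->i_mapping);
}

/* Eviction that touches only the embedded ->i_data, as in the patch. */
void model_evict_via_i_data(struct inode *inode)
{
	model_truncate(&inode->i_data);
}

/*
 * Evict an alias of a block device whose page cache holds 8 pages and
 * report how many pages the block device has left afterwards.
 */
unsigned long model_pages_left(int use_i_data)
{
	struct inode bdev_inode, alias;

	model_inode_init(&bdev_inode);
	model_inode_init(&alias);
	bdev_inode.i_data.nrpages = 8;
	model_alias_to_bdev(&alias, &bdev_inode);

	if (use_i_data)
		model_evict_via_i_data(&alias);
	else
		model_evict_via_i_mapping(&alias);

	return bdev_inode.i_data.nrpages;
}
```

In this model, model_pages_left(0) wipes the block device's pages when the
alias goes away, while model_pages_left(1) leaves all 8 in place, since the
alias's own ->i_data never held anything.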
---
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 699941e..5110785 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -451,9 +451,9 @@ void v9fs_evict_inode(struct inode *inode)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
 
-	truncate_inode_pages_final(inode->i_mapping);
+	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
-	filemap_fdatawrite(inode->i_mapping);
+	filemap_fdatawrite(&inode->i_data);
 
 	v9fs_cache_inode_put_cookie(inode);
 	/* clunk the fid stashed in writeback_fid */