From patchwork Fri Jan 5 19:51:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Liu Bo
X-Patchwork-Id: 10147051
From: Liu Bo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 02/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
Date: Fri, 5 Jan 2018 12:51:09 -0700
Message-Id: <20180105195117.5131-3-bo.li.liu@oracle.com>
X-Mailer: git-send-email 2.9.4
In-Reply-To: <20180105195117.5131-1-bo.li.liu@oracle.com>
References: <20180105195117.5131-1-bo.li.liu@oracle.com>

This fixes a corner case that is caused by a race of dio write vs dio
read/write.

Here is how the race could happen.

Suppose that no extent map has been loaded into memory yet.
There is a file extent [0, 32K) and two jobs are running concurrently
against it: t1 is doing a dio write to [8K, 32K) and t2 is doing a dio
read from [0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) into em [0, 8K) and [8K, 32K).

------------------------------------------------------
         t1                                t2
      btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
       -> btrfs_get_extent()              -> btrfs_get_extent()
           -> lookup_extent_mapping()
           -> add_extent_mapping()            -> lookup_extent_mapping()
              # load [0, 32K)
       -> btrfs_new_extent_direct()
           -> btrfs_drop_extent_cache()
              # split [0, 32K) and
              # drop [8K, 32K)
           -> add_extent_mapping()
              # add [8K, 32K)
                                              -> add_extent_mapping()
                                                 # handle -EEXIST when adding
                                                 # [0, 32K)
------------------------------------------------------

How t2 (dio read/write) runs into -EEXIST:

a) add_extent_mapping() gets -EEXIST for adding em [0, 32k),

b) search_extent_mapping() then returns [0, 8k) as the existing em;
   although start == existing->start, em is [0, 32k), so
   extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k, and
   the "existing encompasses em" check does not match,

c) then it goes through merge_extent_mapping(), which tries to add
   [8k, 8k) (with a length of 0) and returns -EEXIST as [8k, 32k) is
   already in the tree,

d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write,
   which confuses applications.

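The a) to d) walk-through above can be replayed in user space. What follows
is only a minimal sketch, not the kernel code: struct em_range, old_decide()
and trim_to_gap() are hypothetical names, the block_start comparison is left
out, and the trimming step is an assumption about what the merge does before
it re-adds the candidate em.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct extent_map: only the byte range is kept. */
struct em_range {
	uint64_t start;
	uint64_t len;
};

static uint64_t em_end(const struct em_range *em)
{
	/* Guard against wrap-around of start + len. */
	assert(em->len <= UINT64_MAX - em->start);
	return em->start + em->len;
}

enum old_path { OLD_USE_EXISTING, OLD_TRY_MERGE };

/*
 * Branch order of the pre-patch -EEXIST handling, reduced to ranges.
 * Assumption: the block_start comparison would match, so it is omitted.
 */
static enum old_path old_decide(const struct em_range *em,
				const struct em_range *existing,
				uint64_t start)
{
	if (existing->start == em->start && em_end(existing) >= em_end(em))
		return OLD_USE_EXISTING;
	if (start >= em_end(existing) || start <= existing->start)
		return OLD_TRY_MERGE;
	return OLD_USE_EXISTING;
}

/*
 * Hypothetical helper: clamp the candidate em to the gap between the end
 * of the previous em and the start of the next one.
 */
static struct em_range trim_to_gap(const struct em_range *em,
				   uint64_t prev_end, uint64_t next_start)
{
	uint64_t lo = prev_end > em->start ? prev_end : em->start;
	uint64_t hi = next_start < em_end(em) ? next_start : em_end(em);
	struct em_range out;

	assert(lo <= hi);
	out.start = lo;
	out.len = hi - lo;
	return out;
}
```

With em = [0, 32K), existing = [0, 8K) and start = 0, old_decide() picks the
merge path, and trim_to_gap() with prev_end = 8K and next_start = 8K leaves
the zero-length range [8K, 8K) described in step c).
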
Here is a summary of all the possible situations:

1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
                +
                |
                +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
         start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

As we can see, if start is within the existing em (front inclusive),
the existing em should be returned as is. Otherwise, we try our best
to merge the candidate em with its sibling ems to form a larger em
(in order to reduce the total number of ems).

Reported-by: David Vallender
Signed-off-by: Liu Bo
Reviewed-by: Josef Bacik
---
v2: Improve commit log to provide more details about the bug.

 fs/btrfs/inode.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2784bb3..a270fe2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7162,19 +7162,12 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			em = existing;
 			err = 0;
-
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
@@ -7186,10 +7179,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 				free_extent_map(em);
 				em = NULL;
 			}
-		} else {
-			free_extent_map(em);
-			em = existing;
-			err = 0;
 		}
 	}
 	write_unlock(&em_tree->lock);
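
For reference, the rule the new condition encodes can be checked against the
four situations from the commit log with a small user-space sketch. It is
illustrative only: struct em_range, situation() and reuse_existing() are
hypothetical names that model just the start/len range, not the kernel's
struct extent_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct extent_map: only the byte range is kept. */
struct em_range {
	uint64_t start;
	uint64_t len;
};

static uint64_t em_end(const struct em_range *em)
{
	/* Guard against wrap-around of start + len. */
	assert(em->len <= UINT64_MAX - em->start);
	return em->start + em->len;
}

/* Map a requested start offset to situation 1..4 from the commit log. */
static int situation(const struct em_range *existing, uint64_t start)
{
	if (start < existing->start)
		return 1;
	if (start == existing->start)
		return 2;
	if (start < em_end(existing))
		return 3;
	return 4;
}

/*
 * Same test as the patched condition: reuse the existing em when start
 * falls inside it (front inclusive, end exclusive); otherwise the caller
 * would fall through to the merge path.
 */
static bool reuse_existing(const struct em_range *existing, uint64_t start)
{
	return start >= existing->start && start < em_end(existing);
}
```

Under these assumptions, situations 2 and 3 reuse the existing em and
situations 1 and 4 go to the merge path. In the race above, existing is
[0, 8K) and start is 0, so the existing em is returned and the zero-length
merge is never attempted.
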