From patchwork Wed Mar 15 12:24:14 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikolay Borisov X-Patchwork-Id: 9625495 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B243D6048C for ; Wed, 15 Mar 2017 12:26:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A81FB2807B for ; Wed, 15 Mar 2017 12:26:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9CD1028619; Wed, 15 Mar 2017 12:26:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1EDBD2807B for ; Wed, 15 Mar 2017 12:26:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752651AbdCOM0B (ORCPT ); Wed, 15 Mar 2017 08:26:01 -0400 Received: from mx2.suse.de ([195.135.220.15]:44135 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752438AbdCOM0A (ORCPT ); Wed, 15 Mar 2017 08:26:00 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id D019DADF6; Wed, 15 Mar 2017 12:25:57 +0000 (UTC) From: Nikolay Borisov To: stable@vger.kernel.org Cc: darrick.wong@oracle.com, linux-xfs@vger.kernel.org, david@fromorbit.com, Brian Foster , Nikolay Borisov Subject: xfs: pass total block res. as total xfs_bmapi_write() parameter Date: Wed, 15 Mar 2017 14:24:14 +0200 Message-Id: <1489580654-29669-1-git-send-email-nborisov@suse.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <153397e1-ba41-20b3-455e-d622c234bc3a@gmail.com> References: <153397e1-ba41-20b3-455e-d622c234bc3a@gmail.com> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Brian Foster The total field from struct xfs_alloc_arg is a bit of an unknown commodity. It is documented as the total block requirement for the transaction and is used in this manner from most call sites by virtue of passing the total block reservation of the transaction associated with an allocation. Several xfs_bmapi_write() callers pass hardcoded values of 0 or 1 for the total block requirement, which is a historical oddity without any clear reasoning. The xfs_iomap_write_direct() caller, for example, passes 0 for the total block requirement. This has been determined to cause problems in the form of ABBA deadlocks of AGF buffers due to incorrect AG selection in the block allocator. Specifically, the xfs_alloc_space_available() function incorrectly selects an AG that doesn't actually have sufficient space for the allocation. This occurs because the args.total field is 0 and thus the remaining free space check on the AG doesn't actually consider the size of the allocation request. This locks the AGF buffer, the allocation attempt proceeds and ultimately fails (in xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next AG. In turn, this can lead to incorrect AG locking order (if the allocator wraps around, attempting to lock AG 0 after acquiring AG N) and thus deadlock if racing with another operation. This problem has been reproduced via generic/299 on smallish (1GB) ramdisk test devices. To avoid this problem, replace the undocumented hardcoded total parameters from the iomap and utility callers to pass the block reservation used for the associated transaction. This is consistent with other xfs_bmapi_write() callers throughout XFS. The assumption is that the total field allows the selection of an AG that can handle the entire operation rather than simply the allocation/range being requested (e.g., resulting btree splits, etc.). This addresses the aforementioned generic/299 hang by ensuring AG selection only occurs when the allocation can be satisfied by the AG. Reported-by: Ross Zwisler Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Nikolay Borisov Acked-by: Brian Foster --- Here is the backport for 3.12 fs/xfs/xfs_bmap_util.c | 2 +- fs/xfs/xfs_iomap.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 2843a0426bca..66172581dacb 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1146,7 +1146,7 @@ retry: xfs_bmap_init(&free_list, &firstfsb); error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb, alloc_type, &firstfsb, - 0, imapp, &nimaps, &free_list); + resblks, imapp, &nimaps, &free_list); if (error) { goto error0; } diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 43e31b0b288b..da87415ac561 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -217,7 +217,7 @@ xfs_iomap_write_direct( xfs_bmap_init(&free_list, &firstfsb); nimaps = 1; error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flag, - &firstfsb, 0, imap, &nimaps, &free_list); + &firstfsb, resblks, imap, &nimaps, &free_list); if (error) goto out_bmap_cancel; @@ -762,7 +762,7 @@ xfs_iomap_write_allocate( error = xfs_bmapi_write(tp, ip, map_start_fsb, count_fsb, XFS_BMAPI_STACK_SWITCH, - &first_block, 1, + &first_block, nres, imap, &nimaps, &free_list); if (error) goto trans_cancel; @@ -877,8 +877,8 @@ xfs_iomap_write_unwritten( xfs_bmap_init(&free_list, &firstfsb); nimaps = 1; error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, - XFS_BMAPI_CONVERT, &firstfsb, - 1, &imap, &nimaps, &free_list); + XFS_BMAPI_CONVERT, &firstfsb, resblks, + &imap, &nimaps, &free_list); if (error) goto error_on_bmapi_transaction;