From patchwork Tue Apr 12 16:42:53 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 8814081
X-Patchwork-Delegate: snitzer@redhat.com
Return-Path:
X-Original-To: patchwork-dm-devel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 3441FC0553
	for ; Tue, 12 Apr 2016 18:24:01 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 63D20201CD
	for ; Tue, 12 Apr 2016 18:24:00 +0000 (UTC)
Received: from mx3-phx2.redhat.com (mx3-phx2.redhat.com [209.132.183.24])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 76541201BB
	for ; Tue, 12 Apr 2016 18:23:59 +0000 (UTC)
Received: from lists01.pubmisc.prod.ext.phx2.redhat.com
	(lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33])
	by mx3-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id u3CIJxiP014209;
	Tue, 12 Apr 2016 14:20:00 -0400
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
	(int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])
	by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
	id u3CGgtUl005732
	for ; Tue, 12 Apr 2016 12:42:55 -0400
Received: from bfoster.bfoster (dhcp-41-153.bos.redhat.com [10.18.41.153])
	by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id u3CGgtsg019024; Tue, 12 Apr 2016 12:42:55 -0400
Received: by bfoster.bfoster (Postfix, from userid 1000)
	id B2815125488; Tue, 12 Apr 2016 12:42:53 -0400 (EDT)
From: Brian Foster
To: xfs@oss.sgi.com
Date: Tue, 12 Apr 2016 12:42:53 -0400
Message-Id: <1460479373-63317-11-git-send-email-bfoster@redhat.com>
In-Reply-To: <1460479373-63317-1-git-send-email-bfoster@redhat.com>
References: <1460479373-63317-1-git-send-email-bfoster@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26
X-loop: dm-devel@redhat.com
X-Mailman-Approved-At: Tue, 12 Apr 2016 14:18:51 -0400
Cc: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	dm-devel@redhat.com
Subject: [dm-devel] [RFC v2 PATCH 10/10] xfs: use contiguous bdev
	reservation for file preallocation
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.12
Precedence: junk
List-Id: device-mapper development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

The block device reservation that occurs as part of transaction
reservation uses a worst case algorithm to determine the amount of
reservation required to satisfy the transaction. This means that one
bdev (i.e., device-mapper) block is reserved per required filesystem
block, even though the former block size is likely much larger than the
latter.

Worst case reservation is required in most cases because, from the
perspective of the transaction, block allocation can occur throughout
the block address space. This is unnecessary for some operations where
more context is available, however.

xfs_alloc_file_space() is one such case. It calls xfs_bmapi_write() in a
loop, once per transaction. Since it also passes nmap == 1, each call
maps a single extent and thus allocates contiguous blocks. Based on
that, the bdev reservation can be reduced from the worst case 1-1
mapping to a more optimal 1-N mapping of dm blocks to fs blocks (i.e.,
one dm block can cover many fs blocks).

Update xfs_alloc_file_space() to bypass transaction based bdev reservation.
Instead, open-code the bdev reservation using the more optimal
contiguous reservation value. This allows fallocate requests to consume
just about all of the available space in a thin volume without premature
ENOSPC errors.

Signed-off-by: Brian Foster
---
 fs/xfs/xfs_bmap_util.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b63098..c2e1215 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -40,6 +40,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_thin.h"
 
 /* Kernel only BMAP related definitions and functions */
 
@@ -1035,9 +1036,11 @@ xfs_alloc_file_space(
 		}
 
 		/*
-		 * Allocate and setup the transaction.
+		 * Allocate and setup the transaction. The noblkres flag tells
+		 * the reservation infrastructure to skip bdev reservation.
 		 */
 		tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+		tp->t_flags |= XFS_TRANS_NOBLKRES;
 		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
 					  resblks, resrtextents);
 		/*
@@ -1051,6 +1054,30 @@ xfs_alloc_file_space(
 			xfs_trans_cancel(tp);
 			break;
 		}
+
+		/*
+		 * We disabled the transaction bdev reservation because the
+		 * trans infrastructure uses a worst case reservation. Since we
+		 * call xfs_bmapi_write() one mapping at a time, we can assume
+		 * the allocated blocks will be contiguous and thus can use a
+		 * more optimal reservation value. Acquire the reservation here
+		 * and attach it to the transaction.
+		 *
+		 * XXX: Need to take apart data and metadata block parts of res
+		 * (see XFS_DIOSTRAT_SPACE_RES()). The latter still needs
+		 * worst-case.
+		 */
+		if (mp->m_thin_res) {
+			sector_t res = xfs_fsb_res(mp, resblks, true);
+
+			error = xfs_thin_reserve(mp, res);
+			if (error) {
+				xfs_trans_cancel(tp);
+				break;
+			}
+			tp->t_blk_thin_res = res;
+		}
+
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks,
 						      0, quota_flag);
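
For illustration only, and not part of the patch: the difference between
the worst case 1-1 reservation and the contiguous 1-N reservation
described in the commit message can be sketched in userspace C as
follows. The helper names and the "ratio" parameter (fs blocks per dm
block) are hypothetical. This is not the xfs_fsb_res() used above, whose
implementation lives elsewhere in the series and may round differently
and returns sectors rather than a block count. The sketch assumes the dm
block size is a whole multiple of the fs block size and that the start
of the extent is not aligned to a dm block boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration; both return a count of dm
 * blocks. 'ratio' is the assumed number of fs blocks per dm block.
 */

/* Worst case: every fs block may land in a different dm block. */
static uint64_t worst_case_dm_blocks(uint64_t fsblocks, uint64_t ratio)
{
	assert(ratio > 0);
	return fsblocks;
}

/*
 * Contiguous case: a run of adjacent fs blocks with unknown alignment
 * touches at most ceil((fsblocks - 1) / ratio) + 1 dm blocks.
 */
static uint64_t contig_dm_blocks(uint64_t fsblocks, uint64_t ratio)
{
	assert(ratio > 0);
	if (fsblocks == 0)
		return 0;
	return (fsblocks - 1 + ratio - 1) / ratio + 1;
}
```

Under these assumptions, with 4k fs blocks and 64k dm blocks (ratio 16),
contig_dm_blocks(1024, 16) yields 65 dm blocks where
worst_case_dm_blocks(1024, 16) yields 1024, which is the gap that leads
to premature ENOSPC on a nearly full thin volume.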