From patchwork Sun Jul 31 19:19:00 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 9253437 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 512F7601C0 for ; Sun, 31 Jul 2016 19:19:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3BB6428460 for ; Sun, 31 Jul 2016 19:19:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2D2E92848A; Sun, 31 Jul 2016 19:19:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=unavailable version=3.3.1 Received: from oss.sgi.com (oss.sgi.com [192.48.182.195]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BA54028460 for ; Sun, 31 Jul 2016 19:19:16 +0000 (UTC) Received: from oss.sgi.com (localhost [IPv6:::1]) by oss.sgi.com (Postfix) with ESMTP id 75D3B7CA1; Sun, 31 Jul 2016 14:19:14 -0500 (CDT) X-Original-To: xfs@oss.sgi.com Delivered-To: xfs@oss.sgi.com Received: from relay.sgi.com (relay3.corp.sgi.com [198.149.34.15]) by oss.sgi.com (Postfix) with ESMTP id 6B6777CA0 for ; Sun, 31 Jul 2016 14:19:12 -0500 (CDT) Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11]) by relay3.corp.sgi.com (Postfix) with ESMTP id E0BA1AC001 for ; Sun, 31 Jul 2016 12:19:08 -0700 (PDT) X-ASG-Debug-ID: 1469992741-0bf8157e6c3025d0001-NocioJ Received: from newverein.lst.de (verein.lst.de [213.95.11.211]) by cuda.sgi.com with ESMTP id W3Ltq22yxnfB01fR (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Sun, 31 Jul 2016 12:19:02 -0700 (PDT) X-Barracuda-Envelope-From: hch@lst.de X-Barracuda-Effective-Source-IP: verein.lst.de[213.95.11.211] X-Barracuda-Apparent-Source-IP: 213.95.11.211 Received: by newverein.lst.de (Postfix, from userid 2407) id 6893D68D24; Sun, 31 Jul 2016 21:19:00 +0200 (CEST) Date: Sun, 31 Jul 2016 21:19:00 +0200 From: Christoph Hellwig To: Dave Chinner Subject: Re: iomap infrastructure and multipage writes V5 Message-ID: <20160731191900.GA29301@lst.de> X-ASG-Orig-Subj: Re: iomap infrastructure and multipage writes V5 References: <1464792297-13185-1-git-send-email-hch@lst.de> <20160628002649.GI12670@dastard> <20160630172239.GA23082@lst.de> <20160718111400.GC16044@dastard> <20160718111851.GD16044@dastard> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20160718111851.GD16044@dastard> User-Agent: Mutt/1.5.17 (2007-11-01) X-Barracuda-Connect: verein.lst.de[213.95.11.211] X-Barracuda-Start-Time: 1469992742 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://192.48.157.11:443/cgi-mod/mark.cgi X-Barracuda-Scan-Msg-Size: 2431 X-Virus-Scanned: by bsmtpd at sgi.com X-Barracuda-BRTS-Status: 1 X-Barracuda-Spam-Score: 0.00 X-Barracuda-Spam-Status: No, SCORE=0.00 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=2.7 tests=BSF_SC0_MISMATCH_TO X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.31654 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- 0.00 BSF_SC0_MISMATCH_TO Envelope rcpt doesn't match header Cc: rpeterso@redhat.com, linux-fsdevel@vger.kernel.org, Christoph Hellwig , xfs@oss.sgi.com X-BeenThere: xfs@oss.sgi.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: XFS Filesystem from SGI List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: xfs-bounces@oss.sgi.com Sender: xfs-bounces@oss.sgi.com X-Virus-Scanned: ClamAV using ClamSMTP Another quiet weekend trying to debug this, and only minor progress. The biggest different in traces of the old vs new code is that we manage to allocate much bigger delalloc reservations at a time in xfs_bmapi_delay -> xfs_bmapi_reserve_delalloc. The old code always went for a single FSB, which also meant allocating an indlen of 7 FSBs. With the iomap code we always allocate at least 4FSB (aka a page), and sometimes 8 or 12. All of these still need 7 FSBs for the worst case indirect blocks. So what happens here is that in an ENOSPC case we manage to allocate more actual delalloc blocks before hitting ENOSPC - notwithstanding that the old case would immediately release them a little later in xfs_bmap_add_extent_hole_delay after merging the delalloc extents. On the writeback side I don't see to many changes either. We'll eventually run out of blocks when allocating the transaction in xfs_iomap_write_allocate because the reserved pool is too small. The only real difference to before is that under the ENOSPC / out of memory case we have allocated between 4 to 12 times more blocks, so we have to clean up 4 to 12 times as much while write_cache_pages continues iterating over these dirty delalloc blocks. For me this happens ~6 times as much as before, but I still don't manage to hit an endless loop. Now after spending this much time I've started wondering why we even reserve blocks in xfs_iomap_write_allocate - after all we've reserved space for the actual data blocks and the indlen worst case in xfs_bmapi_reserve_delalloc. And in fact a little hack to drop that reservation seems to solve both the root cause (depleted reserved pool) and the cleanup mess. I just haven't spend enought time to convince myself that it's actually safe, and in fact looking at the allocator makes me thing it only works by accident currently despite generally postive test results. Here is the quick patch if anyone wants to chime in: diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 620fc91..67c317f 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -717,7 +717,7 @@ xfs_iomap_write_allocate( nimaps = 0; while (nimaps == 0) { - nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); + nres = 0; // XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres, 0, XFS_TRANS_RESERVE, &tp);