From: Omar Sandoval
To: linux-xfs@vger.kernel.org
Cc: kernel-team@fb.com
Subject: [PATCH v2] xfs: cache minimum realtime summary level
Date: Sat, 10 Nov 2018 19:59:58 -0800
Message-Id: <49a31389f88ab40b11f07afa5c728a1368e39a45.1541908447.git.osandov@fb.com>
X-Mailer: git-send-email 2.19.1
X-Mailing-List: linux-xfs@vger.kernel.org

From: Omar Sandoval

The realtime summary is a two-dimensional array on disk, effectively:

	u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]

rsum[log][bbno] is the number of extents of size 2**log which start in
bitmap block bbno.

xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
rsum[log][bbno] != 0 for any log level. However, the summary array is
stored in row-major order (i.e., like an array in C), so all of these
entries are not adjacent, but rather spread across the entire summary
file. In the worst case (a full bitmap block), xfs_rtany_summary() has
to check every level.

This means that on a moderately-used realtime device, an allocation will
waste a lot of time finding, reading, and releasing buffers for the
realtime summary. In particular, one of our storage services (which runs
on servers with 8 very slow CPUs and 15 8 TB XFS realtime filesystems)
spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
xfs_trans_brelse() called from xfs_rtany_summary().

One solution would be to also store the summary with the dimensions
swapped. However, this would require a disk format change to a very old
component of XFS.

Instead, we can cache the minimum size which contains any extents. We do
so lazily; rather than guaranteeing that the cache contains the precise
minimum, it always contains a loose lower bound which we tighten when we
read or update a summary block. This only uses a few kilobytes of memory
and is already serialized via the realtime bitmap and summary inode
locks, so the cost is minimal. With this change, the same workload only
spends 0.2% of its CPU cycles in the realtime allocator.

Signed-off-by: Omar Sandoval
---
Based on Linus' master branch.

Changes from v1:

- Clarify comment in xfs_rtmount_inodes().
- Use kmem_* instead of kvmalloc/kvfree

 fs/xfs/libxfs/xfs_rtbitmap.c |  4 ++++
 fs/xfs/xfs_mount.h           |  6 ++++++
 fs/xfs/xfs_rtalloc.c         | 27 +++++++++++++++++++++++----
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index b228c821bae6..6d4990717cee 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -505,6 +505,10 @@ xfs_rtmodify_summary_int(
 		uint first = (uint)((char *)sp - (char *)bp->b_addr);
 
 		*sp += delta;
+		if (*sp == 0 && log == mp->m_rsum_cache[bbno])
+			mp->m_rsum_cache[bbno]++;
+		if (*sp != 0 && log < mp->m_rsum_cache[bbno])
+			mp->m_rsum_cache[bbno] = log;
 		xfs_trans_log_buf(tp, bp, first, first + sizeof(*sp) - 1);
 	}
 	if (sum)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7964513c3128..2b626a4b4824 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -89,6 +89,12 @@ typedef struct xfs_mount {
 	int			m_logbsize;	/* size of each log buffer */
 	uint			m_rsumlevels;	/* rt summary levels */
 	uint			m_rsumsize;	/* size of rt summary, bytes */
+	/*
+	 * Cache of rt summary level per bitmap block with the invariant that
+	 * m_rsum_cache[bbno] <= the minimum i for which rsum[i][bbno] != 0.
+	 * Reads and writes are serialized by the rsumip inode lock.
+	 */
+	uint8_t			*m_rsum_cache;
 	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
 	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
 	struct xfs_inode	*m_rootip;	/* pointer to root directory */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 926ed314ffba..330406e0af14 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -64,8 +64,12 @@ xfs_rtany_summary(
 	int		log;		/* loop counter, log2 of ext. size */
 	xfs_suminfo_t	sum;		/* summary data */
 
+	/* There are no extents at levels < m_rsum_cache[bbno]. */
+	if (low < mp->m_rsum_cache[bbno])
+		low = mp->m_rsum_cache[bbno];
+
 	/*
-	 * Loop over logs of extent sizes. Order is irrelevant.
+	 * Loop over logs of extent sizes.
 	 */
 	for (log = low; log <= high; log++) {
 		/*
@@ -80,13 +84,17 @@ xfs_rtany_summary(
 		 */
 		if (sum) {
 			*stat = 1;
-			return 0;
+			goto out;
 		}
 	}
 	/*
 	 * Found nothing, return failure.
 	 */
 	*stat = 0;
+out:
+	/* There were no extents at levels < log. */
+	if (log > mp->m_rsum_cache[bbno])
+		mp->m_rsum_cache[bbno] = log;
 	return 0;
 }
 
@@ -1187,8 +1195,8 @@ xfs_rtmount_init(
 }
 
 /*
- * Get the bitmap and summary inodes into the mount structure
- * at mount time.
+ * Get the bitmap and summary inodes and the summary cache into the mount
+ * structure at mount time.
  */
 int					/* error */
 xfs_rtmount_inodes(
@@ -1211,6 +1219,16 @@ xfs_rtmount_inodes(
 		return error;
 	}
 	ASSERT(mp->m_rsumip != NULL);
+	/*
+	 * The rsum cache is initialized to all zeroes, which is trivially a
+	 * lower bound on the minimum level with any free extents.
+	 */
+	mp->m_rsum_cache = kmem_zalloc_large(sbp->sb_rbmblocks, KM_SLEEP);
+	if (!mp->m_rsum_cache) {
+		xfs_irele(mp->m_rbmip);
+		xfs_irele(mp->m_rsumip);
+		return -ENOMEM;
+	}
 	return 0;
 }
 
@@ -1218,6 +1236,7 @@ void
 xfs_rtunmount_inodes(
 	struct xfs_mount	*mp)
 {
+	kmem_free(mp->m_rsum_cache);
 	if (mp->m_rbmip)
 		xfs_irele(mp->m_rbmip);
 	if (mp->m_rsumip)