From patchwork Thu Jan 18 22:19:38 2024
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Cc: willy@infradead.org, linux-mm@kvack.org
Subject: [RFC] [PATCH 0/3] xfs: use large folios for buffers
Date: Fri, 19 Jan 2024 09:19:38 +1100
Message-ID: <20240118222216.4131379-1-david@fromorbit.com>

The XFS buffer cache supports metadata buffers up to 64kB, and it does
so by aggregating multiple pages into a single contiguous memory
region using vmapping.
This is expensive (both the setup and the runtime TLB mapping cost),
and would be unnecessary if we could allocate large contiguous memory
regions for the buffers in the first place. Enter multi-page folios.

This patchset converts the buffer cache to use the folio API, then
enhances it to optimistically use large folios where possible. It
retains the old "vmap an array of single page folios" functionality as
a fallback when large folio allocation fails. This means that, like
page cache support for large folios, we aren't dependent on large
folio allocation succeeding all the time. It also relegates the single
page array allocation mechanism to a "slow path" whose performance we
no longer have to care so much about. This might allow us to simplify
it a bit in future.

One of the issues with the folio conversion is that we use a couple of
APIs that take struct page ** (i.e. pointers to page pointer arrays)
and there aren't folio counterparts. These are the bulk page allocator
and vm_map_ram(). In the cases where they are used, we cast
&bp->b_folios[] to (struct page **), knowing that this array will only
contain single page folios, and that a single page folio and a struct
page are the same structure and so have the same address. This is a
bit of a hack (hence the RFC), but I'm not sure it's worth adding
folio versions of these interfaces right now. We don't need to use the
bulk page allocator so much any more, because that's now a slow path
and we could probably just call folio_alloc() in a loop like we used
to. What to do about vm_map_ram() is a little less clear....

The other issue I tripped over in doing this conversion is that the
discontiguous buffer straddling code in the buf log item dirty region
tracking is broken. We don't actually exercise that code on existing
configurations, and I tripped over it when tracking down a bug in the
folio conversion.
I fixed it and short-circuited the check for contiguous buffers, but
that didn't fix the failure I was seeing (which was not handling
bp->b_offset and large folios properly when building bios).

Apart from those issues, the conversion and enhancement are relatively
straightforward. They pass fstests on both 512 and 4096 byte sector
size storage (512 byte sectors exercise the XBF_KMEM path, which has
non-zero bp->b_offset values) and don't appear to cause any problems
with large directory buffers, though I haven't done any real testing
on those yet. Large folio allocations are definitely being exercised,
though, as all the inode cluster buffers are 16kB on a 512 byte inode
V5 filesystem.

Thoughts, comments, etc?

Note: this patchset is on top of the NOFS removal patchset I sent a
few days ago. That can be pulled from this git branch:

https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-kmem-cleanup