From patchwork Mon Jan 30 16:14:11 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13121412
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Aravinda Herle, "Ritesh Harjani (IBM)"
Subject: [RFCv2 1/3] iomap: Move creation of iomap_page early in __iomap_write_begin
Date: Mon, 30 Jan 2023 21:44:11 +0530
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

Before commit [1], we used to call iomap_page_create() before checking
folio_test_uptodate() in __iomap_write_begin(). Commit [1] moved the iop
creation later, i.e. after checking whether the folio is uptodate; if the
folio is uptodate, the function simply returns and never allocates an iop.

As a result, during __iomap_write_begin() with blocksize < pagesize we can
encounter a folio which is marked uptodate but has no iomap_page structure
attached. (I think one way this can happen is memory pressure: we can end
up freeing the folio->private resource while the folio itself stays
uptodate in the page cache.) The iop then only gets allocated at writeback
time, in iomap_writepage_map().

I think this has not been a problem so far, since the iop only tracks
uptodate status (there is no dirty bitmap tracking yet; later patches add
it), and iomap_page_create() anyway sets all the uptodate bits when the
folio is already uptodate.

[1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/

Signed-off-by: Ritesh Harjani (IBM)
---
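Note (illustrative sketch only, not part of the patch): in simplified form,
the ordering in __iomap_write_begin() changes as follows; the actual hunk is
below.

    /* after commit [1] */
    if (folio_test_uptodate(folio))
        return 0;        /* early return: no iop gets attached */
    ...
    iop = iomap_page_create(iter->inode, folio, iter->flags);

    /* with this patch */
    iop = iomap_page_create(iter->inode, folio, iter->flags);
    ...
    if (folio_test_uptodate(folio))
        return 0;        /* iop is already attached at this point */

So an uptodate folio passing through write_begin always gets an iop attached
again, which matters once the later patches in this series start tracking
dirty bits in the same structure.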
 fs/iomap/buffered-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..e9c85fcf7a1f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -527,7 +527,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		size_t len, struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	struct iomap_page *iop;
+	struct iomap_page *iop = iomap_page_create(iter->inode, folio,
+			iter->flags);
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
@@ -539,7 +540,6 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		return 0;
 	folio_clear_error(folio);
 
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
 	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
 		return -EAGAIN;

From patchwork Mon Jan 30 16:14:12 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13121413
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Aravinda Herle, "Ritesh Harjani (IBM)"
Subject: [RFCv2 2/3] iomap: Change uptodate variable name to state
Date: Mon, 30 Jan 2023 21:44:12 +0530
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-fsdevel@vger.kernel.org
This patch just changes the struct iomap_page uptodate & uptodate_lock
member names to state and state_lock, to better reflect their purpose for
the upcoming patch.

Signed-off-by: Ritesh Harjani (IBM)
---
 fs/iomap/buffered-io.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e9c85fcf7a1f..faee2852db8f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -25,13 +25,13 @@
 
 /*
  * Structure allocated for each folio when block size < folio size
- * to track sub-folio uptodate status and I/O completions.
+ * to track sub-folio uptodate state and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_bytes_pending;
 	atomic_t		write_bytes_pending;
-	spinlock_t		uptodate_lock;
-	unsigned long		uptodate[];
+	spinlock_t		state_lock;
+	unsigned long		state[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct folio *folio)
@@ -58,12 +58,12 @@ iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
 	else
 		gfp = GFP_NOFS | __GFP_NOFAIL;
 
-	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+	iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(nr_blocks)),
 		      gfp);
 	if (iop) {
-		spin_lock_init(&iop->uptodate_lock);
+		spin_lock_init(&iop->state_lock);
 		if (folio_test_uptodate(folio))
-			bitmap_fill(iop->uptodate, nr_blocks);
+			bitmap_fill(iop->state, nr_blocks);
 		folio_attach_private(folio, iop);
 	}
 	return iop;
@@ -79,7 +79,7 @@ static void iomap_page_release(struct folio *folio)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
 	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
-	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+	WARN_ON_ONCE(bitmap_full(iop->state, nr_blocks) !=
 		     folio_test_uptodate(folio));
 	kfree(iop);
 }
@@ -110,7 +110,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* move forward for each leading block marked uptodate */
 		for (i = first; i <= last; i++) {
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, iop->state))
 				break;
 			*pos += block_size;
 			poff += block_size;
@@ -120,7 +120,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 
 		/* truncate len if we find any trailing uptodate block(s) */
 		for ( ; i <= last; i++) {
-			if (test_bit(i, iop->uptodate)) {
+			if (test_bit(i, iop->state)) {
 				plen -= (last - i + 1) * block_size;
 				last = i - 1;
 				break;
@@ -152,11 +152,11 @@ static void iomap_iop_set_range_uptodate(struct folio *folio,
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
 	unsigned long flags;
 
-	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	bitmap_set(iop->uptodate, first, last - first + 1);
-	if (bitmap_full(iop->uptodate, i_blocks_per_folio(inode, folio)))
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_set(iop->state, first, last - first + 1);
+	if (bitmap_full(iop->state, i_blocks_per_folio(inode, folio)))
 		folio_mark_uptodate(folio);
-	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
 }
 
 static void iomap_set_range_uptodate(struct folio *folio,
@@ -451,7 +451,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
 	last = (from + count - 1) >> inode->i_blkbits;
 
 	for (i = first; i <= last; i++)
-		if (!test_bit(i, iop->uptodate))
+		if (!test_bit(i, iop->state))
 			return false;
 	return true;
 }
@@ -1606,7 +1606,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * invalid, grab a new one.
 	 */
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (iop && !test_bit(i, iop->uptodate))
+		if (iop && !test_bit(i, iop->state))
 			continue;
 		error = wpc->ops->map_blocks(wpc, inode, pos);

From patchwork Mon Jan 30 16:14:13 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13121414
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Aravinda Herle, "Ritesh Harjani (IBM)"
Subject: [RFCv2 3/3] iomap: Support subpage size dirty tracking to improve write performance
Date: Mon, 30 Jan 2023 21:44:13 +0530
Message-Id: <5e49fa975ce9d719f5b6f765aa5d3a1d44d98d1d.1675093524.git.ritesh.list@gmail.com>
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 64k pagesize platforms (especially Power and/or aarch64) with a 4k
filesystem blocksize, this patch should improve write performance by
writing out only the dirty subpage blocks. It should also reduce write
amplification, since we can now track subpage dirty status within the
state bitmap. Earlier we had to write the entire 64k page even if only a
part of it (e.g. 4k) was updated, i.e. up to a 16x write amplification for
small random writes, which matches the gain observed below.

Performance testing of the below fio workload shows a ~16x improvement on
nvme with XFS (4k blocksize) on Power (64k pagesize): FIO-reported write
bandwidth improved from around ~28 MBps to ~452 MBps.

[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1

[fio-run]

Reported-by: Aravinda Herle
Signed-off-by: Ritesh Harjani (IBM)
---
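Note (illustrative, not part of the patch): with this change iop->state
becomes a single bitmap of 2 * nr_blocks bits; the first nr_blocks bits
track per-block uptodate state and the next nr_blocks bits track per-block
dirty state. Hypothetical helpers (names are mine, not in the patch)
showing the indexing the patch uses:

    /* uptodate bit for block i lives at bit index i */
    static inline unsigned int iop_uptodate_bit(unsigned int block)
    {
        return block;
    }

    /* dirty bit for block i lives at bit index i + nr_blocks */
    static inline unsigned int iop_dirty_bit(unsigned int block,
                                             unsigned int nr_blocks)
    {
        return block + nr_blocks;
    }

This is why iomap_iop_set_range_dirty() and iomap_iop_clear_range_dirty()
below offset "first" and "last" by nr_blocks, and why iomap_writepage_map()
now tests test_bit(i + nblocks, iop->state) to decide which blocks need
writeback.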
 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 103 +++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/super.c      |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index e782b4f1d104..b9c35288a5eb 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -741,7 +741,7 @@ static const struct address_space_operations gfs2_aops = {
 	.writepages = gfs2_writepages,
 	.read_folio = gfs2_read_folio,
 	.readahead = gfs2_readahead,
-	.dirty_folio = filemap_dirty_folio,
+	.dirty_folio = iomap_dirty_folio,
 	.release_folio = iomap_release_folio,
 	.invalidate_folio = iomap_invalidate_folio,
 	.bmap = gfs2_bmap,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index faee2852db8f..af3e77276dab 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -44,7 +44,8 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
 static struct bio_set iomap_ioend_bioset;
 
 static struct iomap_page *
-iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
+iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags,
+		  bool from_writeback)
 {
 	struct iomap_page *iop = to_iomap_page(folio);
 	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
@@ -58,12 +59,32 @@ iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
 	else
 		gfp = GFP_NOFS | __GFP_NOFAIL;
 
-	iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(nr_blocks)),
+	iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(2 * nr_blocks)),
 		      gfp);
 	if (iop) {
 		spin_lock_init(&iop->state_lock);
-		if (folio_test_uptodate(folio))
-			bitmap_fill(iop->state, nr_blocks);
+		/*
+		 * iomap_page_create can get called from writeback after
+		 * a truncate_inode_partial_folio operation on a large folio.
+		 * For large folio the iop structure is freed in
+		 * iomap_invalidate_folio() to ensure we can split the folio.
+		 * That means we will have to let go of the optimization of
+		 * tracking dirty bits here and set all bits as dirty if
+		 * the folio is marked uptodate.
+		 */
+		if (from_writeback && folio_test_uptodate(folio))
+			bitmap_fill(iop->state, 2 * nr_blocks);
+		else if (folio_test_uptodate(folio)) {
+			unsigned start = offset_in_folio(folio,
+					folio_pos(folio)) >> inode->i_blkbits;
+			bitmap_set(iop->state, start, nr_blocks);
+		}
+		if (folio_test_dirty(folio)) {
+			unsigned start = offset_in_folio(folio,
+					folio_pos(folio)) >> inode->i_blkbits;
+			start = start + nr_blocks;
+			bitmap_set(iop->state, start, nr_blocks);
+		}
 		folio_attach_private(folio, iop);
 	}
 	return iop;
@@ -168,6 +189,48 @@ static void iomap_set_range_uptodate(struct folio *folio,
 		folio_mark_uptodate(folio);
 }
 
+static void iomap_iop_set_range_dirty(struct folio *folio,
+		struct iomap_page *iop, size_t off, size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+	unsigned first = (off >> inode->i_blkbits) + nr_blocks;
+	unsigned last = ((off + len - 1) >> inode->i_blkbits) + nr_blocks;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_set(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void iomap_set_range_dirty(struct folio *folio,
+		struct iomap_page *iop, size_t off, size_t len)
+{
+	if (iop)
+		iomap_iop_set_range_dirty(folio, iop, off, len);
+}
+
+static void iomap_iop_clear_range_dirty(struct folio *folio,
+		struct iomap_page *iop, size_t off, size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+	unsigned first = (off >> inode->i_blkbits) + nr_blocks;
+	unsigned last = ((off + len - 1) >> inode->i_blkbits) + nr_blocks;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_clear(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void iomap_clear_range_dirty(struct folio *folio,
+		struct iomap_page *iop, size_t off, size_t len)
+{
+	if (iop)
+		iomap_iop_clear_range_dirty(folio, iop, off, len);
+}
+
 static void iomap_finish_folio_read(struct folio *folio, size_t offset,
 		size_t len, int error)
 {
@@ -231,7 +294,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
 	if (WARN_ON_ONCE(size > iomap->length))
 		return -EIO;
 	if (offset > 0)
-		iop = iomap_page_create(iter->inode, folio, iter->flags);
+		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
 	else
 		iop = to_iomap_page(folio);
 
@@ -269,7 +332,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		return iomap_read_inline_data(iter, folio);
 
 	/* zero post-eof blocks as the page may be mapped */
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
+	iop = iomap_page_create(iter->inode, folio, iter->flags, false);
 	iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
@@ -497,6 +560,17 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
 
+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
+{
+	unsigned int nr_blocks = i_blocks_per_folio(mapping->host, folio);
+	struct iomap_page *iop = iomap_page_create(mapping->host, folio, 0, false);
+
+	iomap_set_range_dirty(folio, iop, offset_in_folio(folio, folio_pos(folio)),
+			      nr_blocks << mapping->host->i_blkbits);
+	return filemap_dirty_folio(mapping, folio);
+}
+EXPORT_SYMBOL_GPL(iomap_dirty_folio);
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -528,7 +602,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	struct iomap_page *iop = iomap_page_create(iter->inode, folio,
-			iter->flags);
+			iter->flags, false);
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
@@ -686,6 +760,7 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	if (unlikely(copied < len && !folio_test_uptodate(folio)))
 		return 0;
 	iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len);
+	iomap_set_range_dirty(folio, iop, offset_in_folio(folio, pos), len);
 	filemap_dirty_folio(inode->i_mapping, folio);
 	return copied;
 }
@@ -1231,6 +1306,13 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
 		block_commit_write(&folio->page, 0, length);
 	} else {
 		WARN_ON_ONCE(!folio_test_uptodate(folio));
+		/*
+		 * TODO: We need not set range of dirty bits in iop here.
+		 * This will be taken care by iomap_dirty_folio callback
+		 * function which gets called from folio_mark_dirty().
+		 */
+		iomap_set_range_dirty(folio, to_iomap_page(folio),
+				offset_in_folio(folio, iter->pos), length);
 		folio_mark_dirty(folio);
 	}
 
@@ -1590,7 +1672,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct inode *inode,
 		struct folio *folio, u64 end_pos)
 {
-	struct iomap_page *iop = iomap_page_create(inode, folio, 0);
+	struct iomap_page *iop = iomap_page_create(inode, folio, 0, true);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
@@ -1606,7 +1688,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * invalid, grab a new one.
 	 */
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (iop && !test_bit(i, iop->state))
+		if (iop && !test_bit(i + nblocks, iop->state))
 			continue;
 		error = wpc->ops->map_blocks(wpc, inode, pos);
@@ -1650,6 +1732,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		}
 	}
 
+	iomap_clear_range_dirty(folio, iop,
+				offset_in_folio(folio, folio_pos(folio)),
+				end_pos - folio_pos(folio));
 	folio_start_writeback(folio);
 	folio_unlock(folio);
 
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 41734202796f..7e6c54955b4f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -571,7 +571,7 @@ const struct address_space_operations xfs_address_space_operations = {
 	.read_folio = xfs_vm_read_folio,
 	.readahead = xfs_vm_readahead,
 	.writepages = xfs_vm_writepages,
-	.dirty_folio = filemap_dirty_folio,
+	.dirty_folio = iomap_dirty_folio,
 	.release_folio = iomap_release_folio,
 	.invalidate_folio = iomap_invalidate_folio,
 	.bmap = xfs_vm_bmap,
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index a9c5c3f720ad..4cefc2af87f3 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -267,7 +267,7 @@ static const struct address_space_operations zonefs_file_aops = {
 	.read_folio = zonefs_read_folio,
 	.readahead = zonefs_readahead,
 	.writepages = zonefs_writepages,
-	.dirty_folio = filemap_dirty_folio,
+	.dirty_folio = iomap_dirty_folio,
 	.release_folio = iomap_release_folio,
 	.invalidate_folio = iomap_invalidate_folio,
 	.migrate_folio = filemap_migrate_folio,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0983dfc9a203..b60562a0b893 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -262,6 +262,7 @@ void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,