From patchwork Tue Dec 3 15:31:47 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13892628
From: Jens Axboe
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org,
    willy@infradead.org, kirill@shutemov.name, bfoster@redhat.com, Jens Axboe
Subject: [PATCH 11/12] mm/filemap: make buffered writes work with RWF_UNCACHED
Date: Tue, 3 Dec 2024 08:31:47 -0700
Message-ID: <20241203153232.92224-13-axboe@kernel.dk>
In-Reply-To: <20241203153232.92224-2-axboe@kernel.dk>
References: <20241203153232.92224-2-axboe@kernel.dk>

If RWF_UNCACHED is set for a write, mark new folios being written as
uncached. This is done by passing in the fact that it's an uncached write
through the folio pointer. We can only get there when IOCB_UNCACHED was
allowed, which can only happen if the file system opts in.
Opting in means they need to check whether the folio pointer passed in
holds the foliop_uncached magic value to know if it's an uncached write
or not. If it does, then FGP_UNCACHED should be used if creating new
folios is necessary.

Uncached writes will drop any folios they create upon writeback
completion, but leave folios that may already exist in that range alone.
Since ->write_begin() doesn't currently take any flags, and to avoid
needing to change the callback kernel wide, use the foliop being passed
in to ->write_begin() to signal if this is an uncached write or not. File
systems can then use that to mark newly created folios as uncached.

This provides similar benefits to using RWF_UNCACHED with reads. Testing
buffered writes on 32 files:

writing bs 65536, uncached 0
  1s: 196035MB/sec
  2s: 132308MB/sec
  3s: 132438MB/sec
  4s: 116528MB/sec
  5s: 103898MB/sec
  6s: 108893MB/sec
  7s: 99678MB/sec
  8s: 106545MB/sec
  9s: 106826MB/sec
 10s: 101544MB/sec
 11s: 111044MB/sec
 12s: 124257MB/sec
 13s: 116031MB/sec
 14s: 114540MB/sec
 15s: 115011MB/sec
 16s: 115260MB/sec
 17s: 116068MB/sec
 18s: 116096MB/sec

where it's quite obvious where the page cache filled, and performance
dropped to about half of where it started, settling in at around
115GB/sec. Meanwhile, 32 kswapds were running full steam trying to
reclaim pages.

Running the same test with uncached buffered writes:

writing bs 65536, uncached 1
  1s: 198974MB/sec
  2s: 189618MB/sec
  3s: 193601MB/sec
  4s: 188582MB/sec
  5s: 193487MB/sec
  6s: 188341MB/sec
  7s: 194325MB/sec
  8s: 188114MB/sec
  9s: 192740MB/sec
 10s: 189206MB/sec
 11s: 193442MB/sec
 12s: 189659MB/sec
 13s: 191732MB/sec
 14s: 190701MB/sec
 15s: 191789MB/sec
 16s: 191259MB/sec
 17s: 190613MB/sec
 18s: 191951MB/sec

and the behavior is fully predictable, performing the same throughout
even after the page cache would otherwise have fully filled with dirty
data. It's also about 65% faster, and using half the CPU of the system
compared to the normal buffered write.
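
As a rough sanity check of the "about 65% faster" figure, the sketch
below averages the last six samples (13s through 18s) of each run and
takes the ratio. The choice of that window as "steady state" is an
assumption made here for illustration, not something the patch states,
and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Samples for seconds 13..18, copied from the two runs quoted above (MB/sec). */
static const unsigned long cached_tail[] = {
	116031, 114540, 115011, 115260, 116068, 116096,
};
static const unsigned long uncached_tail[] = {
	191732, 190701, 191789, 191259, 190613, 191951,
};

/* Arithmetic mean of n throughput samples. */
static double mean_mb_per_sec(const unsigned long *samples, size_t n)
{
	unsigned long sum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return (double)sum / (double)n;
}

/* Ratio of uncached to cached throughput over the assumed steady-state window. */
static double tail_speedup(void)
{
	size_t n = sizeof(cached_tail) / sizeof(cached_tail[0]);
	double cached = mean_mb_per_sec(cached_tail, n);
	double uncached = mean_mb_per_sec(uncached_tail, n);

	return uncached / cached;
}
```

Over that window the ratio comes out a little above 1.65, which is in
line with the figure quoted in the commit message.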
Signed-off-by: Jens Axboe
---
 include/linux/fs.h      |  5 +++++
 include/linux/pagemap.h |  9 +++++++++
 mm/filemap.c            | 12 +++++++++++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 40383f5cc6a2..32255473f79d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2912,6 +2912,11 @@ static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
 				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
 		if (ret)
 			return ret;
+	} else if (iocb->ki_flags & IOCB_UNCACHED) {
+		struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+		filemap_fdatawrite_range_kick(mapping, iocb->ki_pos,
+					      iocb->ki_pos + count);
 	}
 
 	return count;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f2d49dccb7c1..e49587c40157 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,7 @@
 #include
 #include
 #include /* for in_interrupt() */
+#include
 #include
 
 struct folio_batch;
@@ -70,6 +71,14 @@ static inline int filemap_write_and_wait(struct address_space *mapping)
 	return filemap_write_and_wait_range(mapping, 0, LLONG_MAX);
 }
 
+/*
+ * Value passed in to ->write_begin() if IOCB_UNCACHED is set for the write,
+ * and the ->write_begin() handler on a file system supporting FOP_UNCACHED
+ * must check for this and pass FGP_UNCACHED for folio creation.
+ */
+#define foliop_uncached			((struct folio *) 0xfee1c001)
+#define foliop_is_uncached(foliop)	(*(foliop) == foliop_uncached)
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
diff --git a/mm/filemap.c b/mm/filemap.c
index 826df99e294f..00f3c6c58629 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4095,7 +4095,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	ssize_t written = 0;
 
 	do {
-		struct folio *folio;
+		struct folio *folio = NULL;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
@@ -4123,6 +4123,16 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
+		/*
+		 * If IOCB_UNCACHED is set here, we know the file system
+		 * supports it. And hence it'll know to check foliop for being
+		 * set to this magic value. If so, it's an uncached write.
+		 * Whenever ->write_begin() changes prototypes again, this
+		 * can go away and just pass iocb or iocb flags.
+		 */
+		if (iocb->ki_flags & IOCB_UNCACHED)
+			folio = foliop_uncached;
+
 		status = a_ops->write_begin(file, mapping, pos, bytes,
 						&folio, &fsdata);
 		if (unlikely(status < 0))
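
The file system side of this contract is not part of this patch. The
userspace mock below is only a sketch of how an opted-in ->write_begin()
handler might consume the sentinel: it checks the incoming foliop before
overwriting it, then requests an uncached folio. Everything prefixed
with mock_ or MOCK_ is a hypothetical stand-in (including the one-field
struct folio), not kernel API; only the two foliop_* macros mirror the
patch, with a uintptr_t cast added so the mock builds cleanly on 64-bit
userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct folio (hypothetical field). */
struct folio {
	bool uncached;	/* stand-in for a per-folio "uncached" marker */
};

/* Same sentinel idea as the patch: a pointer value that is never a real folio. */
#define foliop_uncached			((struct folio *)(uintptr_t)0xfee1c001)
#define foliop_is_uncached(foliop)	(*(foliop) == foliop_uncached)

#define MOCK_FGP_UNCACHED	0x1u	/* hypothetical stand-in for FGP_UNCACHED */

static struct folio mock_storage;

/* Hypothetical folio lookup/creation helper; marks the folio per the flags. */
static struct folio *mock_get_folio(unsigned int fgp_flags)
{
	mock_storage.uncached = (fgp_flags & MOCK_FGP_UNCACHED) != 0;
	return &mock_storage;
}

/* Hypothetical ->write_begin(): read the sentinel first, then replace it. */
static int mock_write_begin(struct folio **foliop)
{
	unsigned int fgp_flags = 0;
	struct folio *folio;

	if (foliop_is_uncached(foliop))
		fgp_flags |= MOCK_FGP_UNCACHED;

	folio = mock_get_folio(fgp_flags);
	if (!folio)
		return -1;

	*foliop = folio;	/* the sentinel never escapes to the caller */
	return 0;
}

/* Caller side, mirroring the generic_perform_write() hunk above. */
static struct folio *mock_perform_write(bool iocb_uncached)
{
	struct folio *folio = NULL;

	if (iocb_uncached)
		folio = foliop_uncached;

	if (mock_write_begin(&folio) < 0)
		return NULL;

	assert(folio != foliop_uncached);
	return folio;
}
```

The point of the ordering in mock_write_begin() is that foliop doubles
as an input and an output: the handler has to sample the sentinel before
storing the real folio, and must never dereference the sentinel itself.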