From: Joanne Koong
To: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, jefflexu@linux.alibaba.com, josef@toxicpanda.com,
 linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
Subject: [PATCH v5 0/5] fuse: remove temp page copies in writeback
Date: Fri, 15 Nov 2024 14:44:54 -0800
Message-ID: <20241115224459.427610-1-joannelkoong@gmail.com>

The purpose of this patchset is to help make writeback-cache write
performance in FUSE filesystems as fast as possible.

In the current FUSE writeback design (see commit 3be5a52b30aa
("fuse: support writable mmap")), a temp page is allocated for every dirty
page to be written back, the contents of the dirty page are copied over to
the temp page, and the temp page gets handed to the server to write back.
This is done so that writeback may be immediately cleared on the dirty
page, and this in turn is done for two reasons:

a) to mitigate the following deadlock scenario, which may arise if reclaim
   waits for writeback on the dirty page to complete (more details can be
   found in this thread [1]):
   * a single-threaded FUSE server is in the middle of handling a request
     that needs a memory allocation
   * the memory allocation triggers direct reclaim
   * direct reclaim waits on a folio under writeback
   * the FUSE server can't write back the folio since it's stuck in
     direct reclaim
b) to unblock internal (e.g. sync, page compaction) waits on writeback
   without needing the server to complete writing back to disk, which may
   take an indeterminate amount of time.

Allocating and copying dirty pages to temp pages is the biggest
performance bottleneck for FUSE writeback. This patchset aims to get rid of
the temp page altogether (which will also allow us to get rid of the
internal FUSE rb tree that is needed to keep track of writeback status on
the temp pages). Benchmarks show approximately a 20% improvement in
throughput for 4k block-size writes and a 45% improvement for 1M block-size
writes.

With the temp page removed, writeback state is only cleared on the dirty
page after the server has written it back to disk. This may take an
indeterminate amount of time. There is also the possibility of malicious,
or well-intentioned but buggy, servers where writeback may in the worst
case never complete. This means that any folio_wait_writeback() on a dirty
page belonging to a FUSE filesystem needs to be carefully audited.
In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction

This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
filesystems may set on their inode mappings to indicate that writeback
operations may take an indeterminate amount of time to complete. FUSE will
set this flag on its mappings. This patchset adds checks to the critical
parts of reclaim, sync, and page migration logic where writeback may be
waited on.

Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has
  no effect on existing behavior. Dirty FUSE pages are already not
  guaranteed to be written to disk by the time sync(2) returns (e.g.
  writeback is cleared on the dirty page but the server may not have
  written out the temp page to disk yet). If the caller wishes to ensure
  the data has actually been synced to disk, they should use
  fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_INDETERMINATE does not indicate that the folios should
  never be waited on when in writeback. There are some cases where the wait
  is desirable. For example, for the sync_file_range() syscall, it is fine
  to wait on the writeback since the caller passes in a fd for the
  operation.
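The flag check described above can be pictured with a small userspace
model. This is only a sketch, not code from the series: the *_model struct
names, the bit position, the helper names and may_wait_on_writeback() are
all assumptions made for illustration, and it only covers call sites that
skip the wait (it does not model cases such as sync_file_range(), where
waiting remains fine).

```c
/* Userspace model for illustration only; this is NOT kernel code. */
#include <stdbool.h>

/* Assumed bit position; the real flag lives in the kernel's mapping flags. */
enum mapping_flags_model {
	AS_WRITEBACK_INDETERMINATE = 0,
};

/* Stand-in for the per-inode mapping, reduced to its flags word. */
struct address_space_model {
	unsigned long flags;
};

/* Stand-in for a folio: which mapping it belongs to and its writeback state. */
struct folio_model {
	struct address_space_model *mapping;
	bool under_writeback;
};

/* Hypothetical setter: a filesystem such as FUSE marks its mapping. */
static inline void
mapping_set_writeback_indeterminate(struct address_space_model *m)
{
	m->flags |= 1UL << AS_WRITEBACK_INDETERMINATE;
}

/* Hypothetical test helper: is the flag set on this mapping? */
static inline bool
mapping_writeback_indeterminate(const struct address_space_model *m)
{
	return (m->flags & (1UL << AS_WRITEBACK_INDETERMINATE)) != 0;
}

/*
 * Hypothetical decision helper for a caller in reclaim, sync(2), or
 * migration: block on the folio's writeback only if it is under
 * writeback and its mapping has not declared writeback indeterminate.
 */
static inline bool may_wait_on_writeback(const struct folio_model *f)
{
	if (!f->under_writeback)
		return false;	/* nothing to wait for */
	if (f->mapping && mapping_writeback_indeterminate(f->mapping))
		return false;	/* skip: completion time is unbounded */
	return true;
}
```

In this model, a folio under writeback whose mapping has the flag set is
skipped rather than waited on, while a folio on an unflagged mapping is
still waited on as before.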
[1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/

Changelog
---------
v4:
https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
* AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
* Drop memory hotplug patch (David and Shakeel)
* Remove some more unnecessary writeback waits in fuse code (Jingbo)
* Make commit message for reclaim patch more concise - drop part about
  deadlock and just focus on how it may stall waits

v3:
https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
* Use filemap_fdatawait_range() instead of filemap_range_has_writeback()
  in readahead

v2:
https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
* Account for sync and page migration cases as well (Miklos)
* Change AS_NO_WRITEBACK_RECLAIM to the more generic
  AS_WRITEBACK_MAY_BLOCK
* For fuse inodes, set mapping_writeback_may_block only if
  fc->writeback_cache is enabled

v1:
https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
Changes from v1 -> v2:
* Have flag in "enum mapping_flags" instead of creating asop_flags
  (Shakeel)
* Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)

Joanne Koong (5):
  mm: add AS_WRITEBACK_INDETERMINATE mapping flag
  mm: skip reclaiming folios in legacy memcg writeback indeterminate
    contexts
  fs/writeback: in wait_sb_inodes(), skip wait for
    AS_WRITEBACK_INDETERMINATE mappings
  mm/migrate: skip migrating folios under writeback with
    AS_WRITEBACK_INDETERMINATE mappings
  fuse: remove tmp folio for writebacks and internal rb tree

 fs/fs-writeback.c       |   3 +
 fs/fuse/file.c          | 339 +++-------------------------------------
 include/linux/pagemap.h |  11 ++
 mm/migrate.c            |   5 +-
 mm/vmscan.c             |  10 +-
 5 files changed, 45 insertions(+), 323 deletions(-)