From patchwork Mon May 27 10:32:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 10962501 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C520933 for ; Mon, 27 May 2019 10:32:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 38A352897A for ; Mon, 27 May 2019 10:32:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2C0FD28A86; Mon, 27 May 2019 10:32:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 685062897A for ; Mon, 27 May 2019 10:32:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C3E56B0273; Mon, 27 May 2019 06:32:18 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 050EF6B0276; Mon, 27 May 2019 06:32:17 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D49E46B0277; Mon, 27 May 2019 06:32:17 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by kanga.kvack.org (Postfix) with ESMTP id 58FBF6B0273 for ; Mon, 27 May 2019 06:32:17 -0400 (EDT) Received: by mail-ed1-f72.google.com with SMTP id e21so27401354edr.18 for ; Mon, 27 May 2019 03:32:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=zAmtlpMVFnlR1RireToMtQMxx6kCml4rqb9ty+U79tA=; b=CeTMB84lSjcxXvN+7Jx9HiaxyPxYpAtJsYJ8zwQ29k5FYkJ9A0VXrxvqOZqQ1uhTiX uuuxx8+1AKVYYQaqKRvnjyb2HysYH/62EwPZpixRyZzVbYb5DDnLREep6vtB9wITtaFc QY/64shOWwZBWiIwY+zDKXZ/E5MEjN+9Wv+tkXpm0l0GyK9dEHGdI6ZOsfUhe8yVwPB3 2CR9d7A0TsRtoiZnlNIT7RjJwlST2uu2Rx1Gi3iPMvBhjidrKBsbbAmgU1rWgU8IsUsQ 5ELg4loaoB3aDVVz4ob5kUxHUqED6nTFDEMR/W23jBecDX5KQHzndg4lDtPRAnAVnZG+ ghCQ== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com X-Gm-Message-State: APjAAAVFIxE9ZyVz53oK91bNsNfIPDa/R2rHDnY0vsNpQdA18sfVOqvj aT3qRMoRsoaRxVBgmwWCjIzkKGPVHeKT55FcyVCLf8Up0hfn8QHPShyCKL/Qwqstl8HBkY2gWkP SeD4kPj7mJfqersQm3b+UaUhSgpZUbcOFyrUGUI1HMwiGswSVpkgjTbHLtOKbShx1AA== X-Received: by 2002:a17:906:6c8:: with SMTP id v8mr34831233ejb.14.1558953136655; Mon, 27 May 2019 03:32:16 -0700 (PDT) X-Google-Smtp-Source: APXvYqzwNAUR0lIFzWxVc0s9sI0+cHSI/+YNpabJ3LE6GEeKdjxX5iwFSH0L4BHWpQJNOGNJAWhL X-Received: by 2002:a17:906:6c8:: with SMTP id v8mr34831019ejb.14.1558953133198; Mon, 27 May 2019 03:32:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558953133; cv=none; d=google.com; s=arc-20160816; b=o1HQvk4vfm8iXJN42n5kgldhDW+YD8qd85GXRqJ4qx5vM+d6A98mrtUbmD9nN8aJnx yHG8kz5owMM1u+W11oy2h8DjfiGMkk4QTldP5wOd3lpTOu+Zon5BFD0I8bc9lmtluQji 4db1CvVMd744HKbt508e8HVv/PGwilf8QFU9P3GAsdmpJ6e3E3zsoJCrdUr6uHlA71iX +FseDJ6+rOWmI7h+KOG2c+T0heeqinxJ9YRUN6KczmYCTUA5L8OEKkuoa5w1gYNXJ/n2 eErVqrGVDC0O6BBgmGH1CtQTy699coalOervBNAVKVeXitUbu6gLPoEjjUxp6cdUx2V7 d5SQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=zAmtlpMVFnlR1RireToMtQMxx6kCml4rqb9ty+U79tA=; b=L7yCItELiabNOTF9sj4lcfLlfXWVgK3szjdMiXkc0K2w0d6gPrChvQpzCO8Xn5MAN0 tYsxEb3M8oBjxdyfBGdmSTU5+ZREbaPjpLZtlPESt8olv8M9dK7TdJOM6+/2eTfbSZ38 7smtm841jEV3XACy/JoSZsXzy/utGb4pkLnDsPiElAqT9uRs+36euP/okx3Vat9X8gqS JnUHxq/+UTQokQdO+Y4Rr46CkLLboxTX6wkP8wsLXoGETtC0bcMQlu1RPwhrCTl1D4AV S42G+HFBNSHMl8hH8xsv+NEjOHGN5u/vfWKtKR7mFMqzQPM0oOOWpt9K3lt+0BcjBe+J ZybQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id k7si6357778edi.117.2019.05.27.03.32.13 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 May 2019 03:32:13 -0700 (PDT) Received-SPF: pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) client-ip=195.135.220.15; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 45060AE79; Mon, 27 May 2019 10:32:12 +0000 (UTC) From: Juergen Gross To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-erofs@lists.ozlabs.org, devel@driverdev.osuosl.org, linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org Cc: Juergen Gross , Jonathan Corbet , Gao Xiang , Chao Yu , Greg Kroah-Hartman , Alexander Viro , Chris Mason , Josef Bacik , David Sterba , "Theodore Ts'o" , Andreas Dilger , Jaegeuk Kim , Mark Fasheh , Joel Becker , Joseph Qi , ocfs2-devel@oss.oracle.com Subject: [PATCH 2/3] mm: remove cleancache.c Date: Mon, 27 May 2019 12:32:06 +0200 Message-Id: <20190527103207.13287-3-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190527103207.13287-1-jgross@suse.com> References: <20190527103207.13287-1-jgross@suse.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP With the removal of tmem and xen-selfballoon the only user of cleancache is gone. Remove it, too. Signed-off-by: Juergen Gross Acked-by: David Sterba Reviewed-by: Joseph Qi Acked-by: Chao Yu --- Documentation/vm/cleancache.rst | 296 ------------------------------------ Documentation/vm/frontswap.rst | 10 +- Documentation/vm/index.rst | 1 - MAINTAINERS | 7 - drivers/staging/erofs/data.c | 6 - drivers/staging/erofs/internal.h | 1 - fs/block_dev.c | 5 - fs/btrfs/extent_io.c | 9 -- fs/btrfs/super.c | 2 - fs/ext4/readpage.c | 6 - fs/ext4/super.c | 2 - fs/f2fs/data.c | 3 +- fs/mpage.c | 7 - fs/ocfs2/super.c | 2 - fs/super.c | 3 - include/linux/cleancache.h | 124 --------------- include/linux/fs.h | 5 - mm/Kconfig | 22 --- mm/Makefile | 1 - mm/cleancache.c | 317 --------------------------------------- mm/filemap.c | 11 -- mm/truncate.c | 15 +- 22 files changed, 10 insertions(+), 845 deletions(-) delete mode 100644 Documentation/vm/cleancache.rst delete mode 100644 include/linux/cleancache.h delete mode 100644 mm/cleancache.c diff --git a/Documentation/vm/cleancache.rst b/Documentation/vm/cleancache.rst deleted file mode 100644 index 68cba9131c31..000000000000 --- a/Documentation/vm/cleancache.rst +++ /dev/null @@ -1,296 +0,0 @@ -.. _cleancache: - -========== -Cleancache -========== - -Motivation -========== - -Cleancache is a new optional feature provided by the VFS layer that -potentially dramatically increases page cache effectiveness for -many workloads in many environments at a negligible cost. - -Cleancache can be thought of as a page-granularity victim cache for clean -pages that the kernel's pageframe replacement algorithm (PFRA) would like -to keep around, but can't since there isn't enough memory. So when the -PFRA "evicts" a page, it first attempts to use cleancache code to -put the data contained in that page into "transcendent memory", memory -that is not directly accessible or addressable by the kernel and is -of unknown and possibly time-varying size. - -Later, when a cleancache-enabled filesystem wishes to access a page -in a file on disk, it first checks cleancache to see if it already -contains it; if it does, the page of data is copied into the kernel -and a disk access is avoided. - -Transcendent memory "drivers" for cleancache are currently implemented -in Xen (using hypervisor memory) and zcache (using in-kernel compressed -memory) and other implementations are in development. - -:ref:`FAQs ` are included below. - -Implementation Overview -======================= - -A cleancache "backend" that provides transcendent memory registers itself -to the kernel's cleancache "frontend" by calling cleancache_register_ops, -passing a pointer to a cleancache_ops structure with funcs set appropriately. -The functions provided must conform to certain semantics as follows: - -Most important, cleancache is "ephemeral". Pages which are copied into -cleancache have an indefinite lifetime which is completely unknowable -by the kernel and so may or may not still be in cleancache at any later time. -Thus, as its name implies, cleancache is not suitable for dirty pages. -Cleancache has complete discretion over what pages to preserve and what -pages to discard and when. - -Mounting a cleancache-enabled filesystem should call "init_fs" to obtain a -pool id which, if positive, must be saved in the filesystem's superblock; -a negative return value indicates failure. A "put_page" will copy a -(presumably about-to-be-evicted) page into cleancache and associate it with -the pool id, a file key, and a page index into the file. (The combination -of a pool id, a file key, and an index is sometimes called a "handle".) -A "get_page" will copy the page, if found, from cleancache into kernel memory. -An "invalidate_page" will ensure the page no longer is present in cleancache; -an "invalidate_inode" will invalidate all pages associated with the specified -file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate -all pages in all files specified by the given pool id and also surrender -the pool id. - -An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache -to treat the pool as shared using a 128-bit UUID as a key. On systems -that may run multiple kernels (such as hard partitioned or virtualized -systems) that may share a clustered filesystem, and where cleancache -may be shared among those kernels, calls to init_shared_fs that specify the -same UUID will receive the same pool id, thus allowing the pages to -be shared. Note that any security requirements must be imposed outside -of the kernel (e.g. by "tools" that control cleancache). Or a -cleancache implementation can simply disable shared_init by always -returning a negative value. - -If a get_page is successful on a non-shared pool, the page is invalidated -(thus making cleancache an "exclusive" cache). On a shared pool, the page -is NOT invalidated on a successful get_page so that it remains accessible to -other sharers. The kernel is responsible for ensuring coherency between -cleancache (shared or not), the page cache, and the filesystem, using -cleancache invalidate operations as required. - -Note that cleancache must enforce put-put-get coherency and get-get -coherency. For the former, if two puts are made to the same handle but -with different data, say AAA by the first put and BBB by the second, a -subsequent get can never return the stale data (AAA). For get-get coherency, -if a get for a given handle fails, subsequent gets for that handle will -never succeed unless preceded by a successful put with that handle. - -Last, cleancache provides no SMP serialization guarantees; if two -different Linux threads are simultaneously putting and invalidating a page -with the same handle, the results are indeterminate. Callers must -lock the page to ensure serial behavior. - -Cleancache Performance Metrics -============================== - -If properly configured, monitoring of cleancache is done via debugfs in -the `/sys/kernel/debug/cleancache` directory. The effectiveness of cleancache -can be measured (across all filesystems) with: - -``succ_gets`` - number of gets that were successful - -``failed_gets`` - number of gets that failed - -``puts`` - number of puts attempted (all "succeed") - -``invalidates`` - number of invalidates attempted - -A backend implementation may provide additional metrics. - -.. _faq: - -FAQ -=== - -* Where's the value? (Andrew Morton) - -Cleancache provides a significant performance benefit to many workloads -in many environments with negligible overhead by improving the -effectiveness of the pagecache. Clean pagecache pages are -saved in transcendent memory (RAM that is otherwise not directly -addressable to the kernel); fetching those pages later avoids "refaults" -and thus disk reads. - -Cleancache (and its sister code "frontswap") provide interfaces for -this transcendent memory (aka "tmem"), which conceptually lies between -fast kernel-directly-addressable RAM and slower DMA/asynchronous devices. -Disallowing direct kernel or userland reads/writes to tmem -is ideal when data is transformed to a different form and size (such -as with compression) or secretly moved (as might be useful for write- -balancing for some RAM-like devices). Evicted page-cache pages (and -swap pages) are a great use for this kind of slower-than-RAM-but-much- -faster-than-disk transcendent memory, and the cleancache (and frontswap) -"page-object-oriented" specification provides a nice way to read and -write -- and indirectly "name" -- the pages. - -In the virtual case, the whole point of virtualization is to statistically -multiplex physical resources across the varying demands of multiple -virtual machines. This is really hard to do with RAM and efforts to -do it well with no kernel change have essentially failed (except in some -well-publicized special-case workloads). Cleancache -- and frontswap -- -with a fairly small impact on the kernel, provide a huge amount -of flexibility for more dynamic, flexible RAM multiplexing. -Specifically, the Xen Transcendent Memory backend allows otherwise -"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple -virtual machines, but the pages can be compressed and deduplicated to -optimize RAM utilization. And when guest OS's are induced to surrender -underutilized RAM (e.g. with "self-ballooning"), page cache pages -are the first to go, and cleancache allows those pages to be -saved and reclaimed if overall host system memory conditions allow. - -And the identical interface used for cleancache can be used in -physical systems as well. The zcache driver acts as a memory-hungry -device that stores pages of data in a compressed state. And -the proposed "RAMster" driver shares RAM across multiple physical -systems. - -* Why does cleancache have its sticky fingers so deep inside the - filesystems and VFS? (Andrew Morton and Christoph Hellwig) - -The core hooks for cleancache in VFS are in most cases a single line -and the minimum set are placed precisely where needed to maintain -coherency (via cleancache_invalidate operations) between cleancache, -the page cache, and disk. All hooks compile into nothingness if -cleancache is config'ed off and turn into a function-pointer- -compare-to-NULL if config'ed on but no backend claims the ops -functions, or to a compare-struct-element-to-negative if a -backend claims the ops functions but a filesystem doesn't enable -cleancache. - -Some filesystems are built entirely on top of VFS and the hooks -in VFS are sufficient, so don't require an "init_fs" hook; the -initial implementation of cleancache didn't provide this hook. -But for some filesystems (such as btrfs), the VFS hooks are -incomplete and one or more hooks in fs-specific code are required. -And for some other filesystems, such as tmpfs, cleancache may -be counterproductive. So it seemed prudent to require a filesystem -to "opt in" to use cleancache, which requires adding a hook in -each filesystem. Not all filesystems are supported by cleancache -only because they haven't been tested. The existing set should -be sufficient to validate the concept, the opt-in approach means -that untested filesystems are not affected, and the hooks in the -existing filesystems should make it very easy to add more -filesystems in the future. - -The total impact of the hooks to existing fs and mm files is only -about 40 lines added (not counting comments and blank lines). - -* Why not make cleancache asynchronous and batched so it can more - easily interface with real devices with DMA instead of copying each - individual page? (Minchan Kim) - -The one-page-at-a-time copy semantics simplifies the implementation -on both the frontend and backend and also allows the backend to -do fancy things on-the-fly like page compression and -page deduplication. And since the data is "gone" (copied into/out -of the pageframe) before the cleancache get/put call returns, -a great deal of race conditions and potential coherency issues -are avoided. While the interface seems odd for a "real device" -or for real kernel-addressable RAM, it makes perfect sense for -transcendent memory. - -* Why is non-shared cleancache "exclusive"? And where is the - page "invalidated" after a "get"? (Minchan Kim) - -The main reason is to free up space in transcendent memory and -to avoid unnecessary cleancache_invalidate calls. If you want inclusive, -the page can be "put" immediately following the "get". If -put-after-get for inclusive becomes common, the interface could -be easily extended to add a "get_no_invalidate" call. - -The invalidate is done by the cleancache backend implementation. - -* What's the performance impact? - -Performance analysis has been presented at OLS'09 and LCA'10. -Briefly, performance gains can be significant on most workloads, -especially when memory pressure is high (e.g. when RAM is -overcommitted in a virtual workload); and because the hooks are -invoked primarily in place of or in addition to a disk read/write, -overhead is negligible even in worst case workloads. Basically -cleancache replaces I/O with memory-copy-CPU-overhead; on older -single-core systems with slow memory-copy speeds, cleancache -has little value, but in newer multicore machines, especially -consolidated/virtualized machines, it has great value. - -* How do I add cleancache support for filesystem X? (Boaz Harrash) - -Filesystems that are well-behaved and conform to certain -restrictions can utilize cleancache simply by making a call to -cleancache_init_fs at mount time. Unusual, misbehaving, or -poorly layered filesystems must either add additional hooks -and/or undergo extensive additional testing... or should just -not enable the optional cleancache. - -Some points for a filesystem to consider: - - - The FS should be block-device-based (e.g. a ram-based FS such - as tmpfs should not enable cleancache) - - To ensure coherency/correctness, the FS must ensure that all - file removal or truncation operations either go through VFS or - add hooks to do the equivalent cleancache "invalidate" operations - - To ensure coherency/correctness, either inode numbers must - be unique across the lifetime of the on-disk file OR the - FS must provide an "encode_fh" function. - - The FS must call the VFS superblock alloc and deactivate routines - or add hooks to do the equivalent cleancache calls done there. - - To maximize performance, all pages fetched from the FS should - go through the do_mpag_readpage routine or the FS should add - hooks to do the equivalent (cf. btrfs) - - Currently, the FS blocksize must be the same as PAGESIZE. This - is not an architectural restriction, but no backends currently - support anything different. - - A clustered FS should invoke the "shared_init_fs" cleancache - hook to get best performance for some backends. - -* Why not use the KVA of the inode as the key? (Christoph Hellwig) - -If cleancache would use the inode virtual address instead of -inode/filehandle, the pool id could be eliminated. But, this -won't work because cleancache retains pagecache data pages -persistently even when the inode has been pruned from the -inode unused list, and only invalidates the data page if the file -gets removed/truncated. So if cleancache used the inode kva, -there would be potential coherency issues if/when the inode -kva is reused for a different file. Alternately, if cleancache -invalidated the pages when the inode kva was freed, much of the value -of cleancache would be lost because the cache of pages in cleanache -is potentially much larger than the kernel pagecache and is most -useful if the pages survive inode cache removal. - -* Why is a global variable required? - -The cleancache_enabled flag is checked in all of the frequently-used -cleancache hooks. The alternative is a function call to check a static -variable. Since cleancache is enabled dynamically at runtime, systems -that don't enable cleancache would suffer thousands (possibly -tens-of-thousands) of unnecessary function calls per second. So the -global variable allows cleancache to be enabled by default at compile -time, but have insignificant performance impact when cleancache remains -disabled at runtime. - -* Does cleanache work with KVM? - -The memory model of KVM is sufficiently different that a cleancache -backend may have less value for KVM. This remains to be tested, -especially in an overcommitted system. - -* Does cleancache work in userspace? It sounds useful for - memory hungry caches like web browsers. (Jamie Lokier) - -No plans yet, though we agree it sounds useful, at least for -apps that bypass the page cache (e.g. O_DIRECT). - -Last updated: Dan Magenheimer, April 13 2011 diff --git a/Documentation/vm/frontswap.rst b/Documentation/vm/frontswap.rst index 1979f430c1c5..511c921bc8d2 100644 --- a/Documentation/vm/frontswap.rst +++ b/Documentation/vm/frontswap.rst @@ -8,8 +8,8 @@ Frontswap provides a "transcendent memory" interface for swap pages. In some environments, dramatic performance savings may be obtained because swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk. -(Note, frontswap -- and :ref:`cleancache` (merged at 3.0) -- are the "frontends" -and the only necessary changes to the core kernel for transcendent memory; +(Note, frontswap is the "frontend" +and the only necessary change to the core kernel for transcendent memory; all other supporting code -- the "backends" -- is implemented as drivers. See the LWN.net article `Transcendent memory in a nutshell`_ for a detailed overview of frontswap and related kernel parts) @@ -87,11 +87,11 @@ This interface is ideal when data is transformed to a different form and size (such as with compression) or secretly moved (as might be useful for write-balancing for some RAM-like devices). Swap pages (and evicted page-cache pages) are a great use for this kind of slower-than-RAM- -but-much-faster-than-disk "pseudo-RAM device" and the frontswap (and -cleancache) interface to transcendent memory provides a nice way to read +but-much-faster-than-disk "pseudo-RAM device" and the frontswap +interface to transcendent memory provides a nice way to read and write -- and indirectly "name" -- the pages. -Frontswap -- and cleancache -- with a fairly small impact on the kernel, +Frontswap with a fairly small impact on the kernel, provides a huge amount of flexibility for more dynamic, flexible RAM utilization in various system configurations: diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst index e8d943b21cf9..9e42be5f0c44 100644 --- a/Documentation/vm/index.rst +++ b/Documentation/vm/index.rst @@ -30,7 +30,6 @@ descriptions of data structures and algorithms. active_mm balance - cleancache frontswap highmem hmm diff --git a/MAINTAINERS b/MAINTAINERS index 2336dd26ece4..d33b46475629 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3928,13 +3928,6 @@ M: Miguel Ojeda S: Maintained F: .clang-format -CLEANCACHE API -M: Konrad Rzeszutek Wilk -L: linux-kernel@vger.kernel.org -S: Maintained -F: mm/cleancache.c -F: include/linux/cleancache.h - CLK API M: Russell King L: linux-clk@vger.kernel.org diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c index 746685f90564..0f5f9429a01c 100644 --- a/drivers/staging/erofs/data.c +++ b/drivers/staging/erofs/data.c @@ -205,12 +205,6 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, goto has_updated; } - if (cleancache_get_page(page) == 0) { - err = 0; - SetPageUptodate(page); - goto has_updated; - } - /* note that for readpage case, bio also equals to NULL */ if (bio && /* not continuous */ diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index c47778b3fabd..a31b26b28985 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -19,7 +19,6 @@ #include #include #include -#include #include #include #include "erofs_fs.h" diff --git a/fs/block_dev.c b/fs/block_dev.c index e6886c93c89d..5ddd460a927b 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include @@ -96,10 +95,6 @@ void invalidate_bdev(struct block_device *bdev) lru_add_drain_all(); /* make sure all lru add caches are flushed */ invalidate_mapping_pages(mapping, 0, -1); } - /* 99% of the time, we don't need to flush the cleancache on the bdev. - * But, for the strange corners, lets be cautious - */ - cleancache_invalidate_inode(mapping); } EXPORT_SYMBOL(invalidate_bdev); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index db337e53aab3..07f468b4dcea 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -12,7 +12,6 @@ #include #include #include -#include #include "extent_io.h" #include "extent_map.h" #include "ctree.h" @@ -3015,14 +3014,6 @@ static int __do_readpage(struct extent_io_tree *tree, set_page_extent_mapped(page); - if (!PageUptodate(page)) { - if (cleancache_get_page(page) == 0) { - BUG_ON(blocksize != PAGE_SIZE); - unlock_extent(tree, start, end); - goto out; - } - } - if (page->index == last_byte >> PAGE_SHIFT) { char *userpage; size_t zero_offset = offset_in_page(last_byte); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 0645ec428b4f..707f0096b437 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -1228,7 +1227,6 @@ static int btrfs_fill_super(struct super_block *sb, goto fail_close; } - cleancache_init_fs(sb); sb->s_flags |= SB_ACTIVE; return 0; diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c index c916017db334..5b471552e794 100644 --- a/fs/ext4/readpage.c +++ b/fs/ext4/readpage.c @@ -43,7 +43,6 @@ #include #include #include -#include #include "ext4.h" @@ -225,11 +224,6 @@ int ext4_mpage_readpages(struct address_space *mapping, } else if (fully_mapped) { SetPageMappedToDisk(page); } - if (fully_mapped && blocks_per_page == 1 && - !PageUptodate(page) && cleancache_get_page(page) == 0) { - SetPageUptodate(page); - goto confused; - } /* * This page will go to BIO. Do we need to send this diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 4079605d437a..04d083364497 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -39,7 +39,6 @@ #include #include #include -#include #include #include #include @@ -2307,7 +2306,6 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es, EXT4_INODES_PER_GROUP(sb), sbi->s_mount_opt, sbi->s_mount_opt2); - cleancache_init_fs(sb); return err; } diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index eda4181d2092..935c98ef6169 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -16,7 +16,6 @@ #include #include #include -#include #include #include "f2fs.h" @@ -1563,7 +1562,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page, block_nr = map->m_pblk + block_in_file - map->m_lblk; SetPageMappedToDisk(page); - if (!PageUptodate(page) && !cleancache_get_page(page)) { + if (!PageUptodate(page)) { SetPageUptodate(page); goto confused; } diff --git a/fs/mpage.c b/fs/mpage.c index 436a85260394..744b18ec3d98 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -29,7 +29,6 @@ #include #include #include -#include #include "internal.h" /* @@ -284,12 +283,6 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) SetPageMappedToDisk(page); } - if (fully_mapped && blocks_per_page == 1 && !PageUptodate(page) && - cleancache_get_page(page) == 0) { - SetPageUptodate(page); - goto confused; - } - /* * This page will go to BIO. Do we need to send this BIO off first? */ diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 8821bc7b9c72..b3f18d3b9da0 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -41,7 +41,6 @@ #include #include #include -#include #include #define CREATE_TRACE_POINTS @@ -2328,7 +2327,6 @@ static int ocfs2_initialize_super(struct super_block *sb, mlog_errno(status); goto bail; } - cleancache_init_shared_fs(sb); osb->ocfs2_wq = alloc_ordered_workqueue("ocfs2_wq", WQ_MEM_RECLAIM); if (!osb->ocfs2_wq) { diff --git a/fs/super.c b/fs/super.c index 2739f57515f8..5ce5121e9e76 100644 --- a/fs/super.c +++ b/fs/super.c @@ -31,7 +31,6 @@ #include #include #include -#include #include #include #include @@ -257,7 +256,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, s->s_maxbytes = MAX_NON_LFS; s->s_op = &default_op; s->s_time_gran = 1000000000; - s->cleancache_poolid = CLEANCACHE_NO_POOL; s->s_shrink.seeks = DEFAULT_SEEKS; s->s_shrink.scan_objects = super_cache_scan; @@ -326,7 +324,6 @@ void deactivate_locked_super(struct super_block *s) { struct file_system_type *fs = s->s_type; if (atomic_dec_and_test(&s->s_active)) { - cleancache_invalidate_fs(s); unregister_shrinker(&s->s_shrink); fs->kill_sb(s); diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h deleted file mode 100644 index 5f5730c1d324..000000000000 --- a/include/linux/cleancache.h +++ /dev/null @@ -1,124 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_CLEANCACHE_H -#define _LINUX_CLEANCACHE_H - -#include -#include -#include - -#define CLEANCACHE_NO_POOL -1 -#define CLEANCACHE_NO_BACKEND -2 -#define CLEANCACHE_NO_BACKEND_SHARED -3 - -#define CLEANCACHE_KEY_MAX 6 - -/* - * cleancache requires every file with a page in cleancache to have a - * unique key unless/until the file is removed/truncated. For some - * filesystems, the inode number is unique, but for "modern" filesystems - * an exportable filehandle is required (see exportfs.h) - */ -struct cleancache_filekey { - union { - ino_t ino; - __u32 fh[CLEANCACHE_KEY_MAX]; - u32 key[CLEANCACHE_KEY_MAX]; - } u; -}; - -struct cleancache_ops { - int (*init_fs)(size_t); - int (*init_shared_fs)(uuid_t *uuid, size_t); - int (*get_page)(int, struct cleancache_filekey, - pgoff_t, struct page *); - void (*put_page)(int, struct cleancache_filekey, - pgoff_t, struct page *); - void (*invalidate_page)(int, struct cleancache_filekey, pgoff_t); - void (*invalidate_inode)(int, struct cleancache_filekey); - void (*invalidate_fs)(int); -}; - -extern int cleancache_register_ops(const struct cleancache_ops *ops); -extern void __cleancache_init_fs(struct super_block *); -extern void __cleancache_init_shared_fs(struct super_block *); -extern int __cleancache_get_page(struct page *); -extern void __cleancache_put_page(struct page *); -extern void __cleancache_invalidate_page(struct address_space *, struct page *); -extern void __cleancache_invalidate_inode(struct address_space *); -extern void __cleancache_invalidate_fs(struct super_block *); - -#ifdef CONFIG_CLEANCACHE -#define cleancache_enabled (1) -static inline bool cleancache_fs_enabled_mapping(struct address_space *mapping) -{ - return mapping->host->i_sb->cleancache_poolid >= 0; -} -static inline bool cleancache_fs_enabled(struct page *page) -{ - return cleancache_fs_enabled_mapping(page->mapping); -} -#else -#define cleancache_enabled (0) -#define cleancache_fs_enabled(_page) (0) -#define cleancache_fs_enabled_mapping(_page) (0) -#endif - -/* - * The shim layer provided by these inline functions allows the compiler - * to reduce all cleancache hooks to nothingness if CONFIG_CLEANCACHE - * is disabled, to a single global variable check if CONFIG_CLEANCACHE - * is enabled but no cleancache "backend" has dynamically enabled it, - * and, for the most frequent cleancache ops, to a single global variable - * check plus a superblock element comparison if CONFIG_CLEANCACHE is enabled - * and a cleancache backend has dynamically enabled cleancache, but the - * filesystem referenced by that cleancache op has not enabled cleancache. - * As a result, CONFIG_CLEANCACHE can be enabled by default with essentially - * no measurable performance impact. - */ - -static inline void cleancache_init_fs(struct super_block *sb) -{ - if (cleancache_enabled) - __cleancache_init_fs(sb); -} - -static inline void cleancache_init_shared_fs(struct super_block *sb) -{ - if (cleancache_enabled) - __cleancache_init_shared_fs(sb); -} - -static inline int cleancache_get_page(struct page *page) -{ - if (cleancache_enabled && cleancache_fs_enabled(page)) - return __cleancache_get_page(page); - return -1; -} - -static inline void cleancache_put_page(struct page *page) -{ - if (cleancache_enabled && cleancache_fs_enabled(page)) - __cleancache_put_page(page); -} - -static inline void cleancache_invalidate_page(struct address_space *mapping, - struct page *page) -{ - /* careful... page->mapping is NULL sometimes when this is called */ - if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping)) - __cleancache_invalidate_page(mapping, page); -} - -static inline void cleancache_invalidate_inode(struct address_space *mapping) -{ - if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping)) - __cleancache_invalidate_inode(mapping); -} - -static inline void cleancache_invalidate_fs(struct super_block *sb) -{ - if (cleancache_enabled) - __cleancache_invalidate_fs(sb); -} - -#endif /* _LINUX_CLEANCACHE_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index f7fdfe93e25d..a2a965bd30f2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1475,11 +1475,6 @@ struct super_block { const struct dentry_operations *s_d_op; /* default d_op for dentries */ - /* - * Saved pool identifier for cleancache (-1 means none) - */ - int cleancache_poolid; - struct shrinker s_shrink; /* per-sb shrinker handle */ /* Number of inodes with nlink == 0 but still referenced */ diff --git a/mm/Kconfig b/mm/Kconfig index f0c76ba47695..5166fe4af00b 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -435,28 +435,6 @@ config NEED_PER_CPU_KM bool default y -config CLEANCACHE - bool "Enable cleancache driver to cache clean pages if tmem is present" - help - Cleancache can be thought of as a page-granularity victim cache - for clean pages that the kernel's pageframe replacement algorithm - (PFRA) would like to keep around, but can't since there isn't enough - memory. So when the PFRA "evicts" a page, it first attempts to use - cleancache code to put the data contained in that page into - "transcendent memory", memory that is not directly accessible or - addressable by the kernel and is of unknown and possibly - time-varying size. And when a cleancache-enabled - filesystem wishes to access a page in a file on disk, it first - checks cleancache to see if it already contains it; if it does, - the page is copied into the kernel and a disk access is avoided. - When a transcendent memory driver is available (such as zcache or - Xen transcendent memory), a significant I/O reduction - may be achieved. When none is available, all cleancache calls - are reduced to a single pointer-compare-against-NULL resulting - in a negligible performance hit. - - If unsure, say Y to enable cleancache - config FRONTSWAP bool "Enable frontswap to cache swap pages if tmem is present" depends on SWAP diff --git a/mm/Makefile b/mm/Makefile index ac5e5ba78874..312cc7de894c 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -85,7 +85,6 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o obj-$(CONFIG_PAGE_OWNER) += page_owner.o -obj-$(CONFIG_CLEANCACHE) += cleancache.o obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o obj-$(CONFIG_ZPOOL) += zpool.o obj-$(CONFIG_ZBUD) += zbud.o diff --git a/mm/cleancache.c b/mm/cleancache.c deleted file mode 100644 index 2bf12da9baa0..000000000000 --- a/mm/cleancache.c +++ /dev/null @@ -1,317 +0,0 @@ -/* - * Cleancache frontend - * - * This code provides the generic "frontend" layer to call a matching - * "backend" driver implementation of cleancache. See - * Documentation/vm/cleancache.rst for more information. - * - * Copyright (C) 2009-2010 Oracle Corp. All rights reserved. - * Author: Dan Magenheimer - * - * This work is licensed under the terms of the GNU GPL, version 2. - */ - -#include -#include -#include -#include -#include -#include - -/* - * cleancache_ops is set by cleancache_register_ops to contain the pointers - * to the cleancache "backend" implementation functions. - */ -static const struct cleancache_ops *cleancache_ops __read_mostly; - -/* - * Counters available via /sys/kernel/debug/cleancache (if debugfs is - * properly configured. These are for information only so are not protected - * against increment races. - */ -static u64 cleancache_succ_gets; -static u64 cleancache_failed_gets; -static u64 cleancache_puts; -static u64 cleancache_invalidates; - -static void cleancache_register_ops_sb(struct super_block *sb, void *unused) -{ - switch (sb->cleancache_poolid) { - case CLEANCACHE_NO_BACKEND: - __cleancache_init_fs(sb); - break; - case CLEANCACHE_NO_BACKEND_SHARED: - __cleancache_init_shared_fs(sb); - break; - } -} - -/* - * Register operations for cleancache. Returns 0 on success. - */ -int cleancache_register_ops(const struct cleancache_ops *ops) -{ - if (cmpxchg(&cleancache_ops, NULL, ops)) - return -EBUSY; - - /* - * A cleancache backend can be built as a module and hence loaded after - * a cleancache enabled filesystem has called cleancache_init_fs. To - * handle such a scenario, here we call ->init_fs or ->init_shared_fs - * for each active super block. To differentiate between local and - * shared filesystems, we temporarily initialize sb->cleancache_poolid - * to CLEANCACHE_NO_BACKEND or CLEANCACHE_NO_BACKEND_SHARED - * respectively in case there is no backend registered at the time - * cleancache_init_fs or cleancache_init_shared_fs is called. - * - * Since filesystems can be mounted concurrently with cleancache - * backend registration, we have to be careful to guarantee that all - * cleancache enabled filesystems that has been mounted by the time - * cleancache_register_ops is called has got and all mounted later will - * get cleancache_poolid. This is assured by the following statements - * tied together: - * - * a) iterate_supers skips only those super blocks that has started - * ->kill_sb - * - * b) if iterate_supers encounters a super block that has not finished - * ->mount yet, it waits until it is finished - * - * c) cleancache_init_fs is called from ->mount and - * cleancache_invalidate_fs is called from ->kill_sb - * - * d) we call iterate_supers after cleancache_ops has been set - * - * From a) it follows that if iterate_supers skips a super block, then - * either the super block is already dead, in which case we do not need - * to bother initializing cleancache for it, or it was mounted after we - * initiated iterate_supers. In the latter case, it must have seen - * cleancache_ops set according to d) and initialized cleancache from - * ->mount by itself according to c). This proves that we call - * ->init_fs at least once for each active super block. - * - * From b) and c) it follows that if iterate_supers encounters a super - * block that has already started ->init_fs, it will wait until ->mount - * and hence ->init_fs has finished, then check cleancache_poolid, see - * that it has already been set and therefore do nothing. This proves - * that we call ->init_fs no more than once for each super block. - * - * Combined together, the last two paragraphs prove the function - * correctness. - * - * Note that various cleancache callbacks may proceed before this - * function is called or even concurrently with it, but since - * CLEANCACHE_NO_BACKEND is negative, they will all result in a noop - * until the corresponding ->init_fs has been actually called and - * cleancache_ops has been set. - */ - iterate_supers(cleancache_register_ops_sb, NULL); - return 0; -} -EXPORT_SYMBOL(cleancache_register_ops); - -/* Called by a cleancache-enabled filesystem at time of mount */ -void __cleancache_init_fs(struct super_block *sb) -{ - int pool_id = CLEANCACHE_NO_BACKEND; - - if (cleancache_ops) { - pool_id = cleancache_ops->init_fs(PAGE_SIZE); - if (pool_id < 0) - pool_id = CLEANCACHE_NO_POOL; - } - sb->cleancache_poolid = pool_id; -} -EXPORT_SYMBOL(__cleancache_init_fs); - -/* Called by a cleancache-enabled clustered filesystem at time of mount */ -void __cleancache_init_shared_fs(struct super_block *sb) -{ - int pool_id = CLEANCACHE_NO_BACKEND_SHARED; - - if (cleancache_ops) { - pool_id = cleancache_ops->init_shared_fs(&sb->s_uuid, PAGE_SIZE); - if (pool_id < 0) - pool_id = CLEANCACHE_NO_POOL; - } - sb->cleancache_poolid = pool_id; -} -EXPORT_SYMBOL(__cleancache_init_shared_fs); - -/* - * If the filesystem uses exportable filehandles, use the filehandle as - * the key, else use the inode number. - */ -static int cleancache_get_key(struct inode *inode, - struct cleancache_filekey *key) -{ - int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *); - int len = 0, maxlen = CLEANCACHE_KEY_MAX; - struct super_block *sb = inode->i_sb; - - key->u.ino = inode->i_ino; - if (sb->s_export_op != NULL) { - fhfn = sb->s_export_op->encode_fh; - if (fhfn) { - len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL); - if (len <= FILEID_ROOT || len == FILEID_INVALID) - return -1; - if (maxlen > CLEANCACHE_KEY_MAX) - return -1; - } - } - return 0; -} - -/* - * "Get" data from cleancache associated with the poolid/inode/index - * that were specified when the data was put to cleanache and, if - * successful, use it to fill the specified page with data and return 0. - * The pageframe is unchanged and returns -1 if the get fails. - * Page must be locked by caller. - * - * The function has two checks before any action is taken - whether - * a backend is registered and whether the sb->cleancache_poolid - * is correct. - */ -int __cleancache_get_page(struct page *page) -{ - int ret = -1; - int pool_id; - struct cleancache_filekey key = { .u.key = { 0 } }; - - if (!cleancache_ops) { - cleancache_failed_gets++; - goto out; - } - - VM_BUG_ON_PAGE(!PageLocked(page), page); - pool_id = page->mapping->host->i_sb->cleancache_poolid; - if (pool_id < 0) - goto out; - - if (cleancache_get_key(page->mapping->host, &key) < 0) - goto out; - - ret = cleancache_ops->get_page(pool_id, key, page->index, page); - if (ret == 0) - cleancache_succ_gets++; - else - cleancache_failed_gets++; -out: - return ret; -} -EXPORT_SYMBOL(__cleancache_get_page); - -/* - * "Put" data from a page to cleancache and associate it with the - * (previously-obtained per-filesystem) poolid and the page's, - * inode and page index. Page must be locked. Note that a put_page - * always "succeeds", though a subsequent get_page may succeed or fail. - * - * The function has two checks before any action is taken - whether - * a backend is registered and whether the sb->cleancache_poolid - * is correct. - */ -void __cleancache_put_page(struct page *page) -{ - int pool_id; - struct cleancache_filekey key = { .u.key = { 0 } }; - - if (!cleancache_ops) { - cleancache_puts++; - return; - } - - VM_BUG_ON_PAGE(!PageLocked(page), page); - pool_id = page->mapping->host->i_sb->cleancache_poolid; - if (pool_id >= 0 && - cleancache_get_key(page->mapping->host, &key) >= 0) { - cleancache_ops->put_page(pool_id, key, page->index, page); - cleancache_puts++; - } -} -EXPORT_SYMBOL(__cleancache_put_page); - -/* - * Invalidate any data from cleancache associated with the poolid and the - * page's inode and page index so that a subsequent "get" will fail. - * - * The function has two checks before any action is taken - whether - * a backend is registered and whether the sb->cleancache_poolid - * is correct. - */ -void __cleancache_invalidate_page(struct address_space *mapping, - struct page *page) -{ - /* careful... page->mapping is NULL sometimes when this is called */ - int pool_id = mapping->host->i_sb->cleancache_poolid; - struct cleancache_filekey key = { .u.key = { 0 } }; - - if (!cleancache_ops) - return; - - if (pool_id >= 0) { - VM_BUG_ON_PAGE(!PageLocked(page), page); - if (cleancache_get_key(mapping->host, &key) >= 0) { - cleancache_ops->invalidate_page(pool_id, - key, page->index); - cleancache_invalidates++; - } - } -} -EXPORT_SYMBOL(__cleancache_invalidate_page); - -/* - * Invalidate all data from cleancache associated with the poolid and the - * mappings's inode so that all subsequent gets to this poolid/inode - * will fail. - * - * The function has two checks before any action is taken - whether - * a backend is registered and whether the sb->cleancache_poolid - * is correct. - */ -void __cleancache_invalidate_inode(struct address_space *mapping) -{ - int pool_id = mapping->host->i_sb->cleancache_poolid; - struct cleancache_filekey key = { .u.key = { 0 } }; - - if (!cleancache_ops) - return; - - if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0) - cleancache_ops->invalidate_inode(pool_id, key); -} -EXPORT_SYMBOL(__cleancache_invalidate_inode); - -/* - * Called by any cleancache-enabled filesystem at time of unmount; - * note that pool_id is surrendered and may be returned by a subsequent - * cleancache_init_fs or cleancache_init_shared_fs. - */ -void __cleancache_invalidate_fs(struct super_block *sb) -{ - int pool_id; - - pool_id = sb->cleancache_poolid; - sb->cleancache_poolid = CLEANCACHE_NO_POOL; - - if (cleancache_ops && pool_id >= 0) - cleancache_ops->invalidate_fs(pool_id); -} -EXPORT_SYMBOL(__cleancache_invalidate_fs); - -static int __init init_cleancache(void) -{ -#ifdef CONFIG_DEBUG_FS - struct dentry *root = debugfs_create_dir("cleancache", NULL); - if (root == NULL) - return -ENXIO; - debugfs_create_u64("succ_gets", 0444, root, &cleancache_succ_gets); - debugfs_create_u64("failed_gets", 0444, root, &cleancache_failed_gets); - debugfs_create_u64("puts", 0444, root, &cleancache_puts); - debugfs_create_u64("invalidates", 0444, root, &cleancache_invalidates); -#endif - return 0; -} -module_init(init_cleancache) diff --git a/mm/filemap.c b/mm/filemap.c index df2006ba0cfa..de7c4655d752 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -35,7 +35,6 @@ #include #include #include -#include #include #include #include @@ -157,16 +156,6 @@ static void unaccount_page_cache_page(struct address_space *mapping, { int nr; - /* - * if we're uptodate, flush out into the cleancache, otherwise - * invalidate any existing cleancache entries. We can't leave - * stale data around in the cleancache once our page is gone - */ - if (PageUptodate(page) && PageMappedToDisk(page)) - cleancache_put_page(page); - else - cleancache_invalidate_page(mapping, page); - VM_BUG_ON_PAGE(PageTail(page), page); VM_BUG_ON_PAGE(page_mapped(page), page); if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(page_mapped(page))) { diff --git a/mm/truncate.c b/mm/truncate.c index 8563339041f6..9eb4060b7a93 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -22,7 +22,6 @@ #include /* grr. try_to_release_page, do_invalidatepage */ #include -#include #include #include "internal.h" @@ -301,7 +300,7 @@ void truncate_inode_pages_range(struct address_space *mapping, int i; if (mapping->nrpages == 0 && mapping->nrexceptional == 0) - goto out; + return; /* Offsets within partial pages */ partial_start = lstart & (PAGE_SIZE - 1); @@ -382,7 +381,6 @@ void truncate_inode_pages_range(struct address_space *mapping, } wait_on_page_writeback(page); zero_user_segment(page, partial_start, top); - cleancache_invalidate_page(mapping, page); if (page_has_private(page)) do_invalidatepage(page, partial_start, top - partial_start); @@ -395,7 +393,6 @@ void truncate_inode_pages_range(struct address_space *mapping, if (page) { wait_on_page_writeback(page); zero_user_segment(page, 0, partial_end); - cleancache_invalidate_page(mapping, page); if (page_has_private(page)) do_invalidatepage(page, 0, partial_end); @@ -408,7 +405,7 @@ void truncate_inode_pages_range(struct address_space *mapping, * will be released, just zeroed, so we can bail out now. */ if (start >= end) - goto out; + return; index = start; for ( ; ; ) { @@ -453,9 +450,6 @@ void truncate_inode_pages_range(struct address_space *mapping, pagevec_release(&pvec); index++; } - -out: - cleancache_invalidate_inode(mapping); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -681,7 +675,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping->nrpages == 0 && mapping->nrexceptional == 0) - goto out; + return 0; pagevec_init(&pvec); index = start; @@ -751,8 +745,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } -out: - cleancache_invalidate_inode(mapping); + return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); From patchwork Mon May 27 10:32:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 10962499 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8FF076 for ; Mon, 27 May 2019 10:32:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A87BA2896F for ; Mon, 27 May 2019 10:32:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B49D28A31; Mon, 27 May 2019 10:32:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B171E2896F for ; Mon, 27 May 2019 10:32:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8BCFC6B0272; Mon, 27 May 2019 06:32:15 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 83B786B0273; Mon, 27 May 2019 06:32:15 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 68EEA6B0274; Mon, 27 May 2019 06:32:15 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by kanga.kvack.org (Postfix) with ESMTP id DE0956B0272 for ; Mon, 27 May 2019 06:32:14 -0400 (EDT) Received: by mail-ed1-f72.google.com with SMTP id c1so27389315edi.20 for ; Mon, 27 May 2019 03:32:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=8CGFP6Qn2xvR/4ig5R52dtdRE57fbpM5XHPmlSASgvE=; b=TsSYGsPDUHv8AMkiX6RNt9Xbb0OMf61CsGjA0pRzf5Yusds/wEG5NMna8HTGDxpo1a pq/ZGdq0N8zy8gy4xxV5yXk+tI9xscCZ2E0yqZeuSxffNZRqgxaGPYFQLEEWvGELm5SP joaVrXrxTw+02NTYOFAPqxupFsoj4VktNtMApQn1RKvt93wEyiQEiv1Cgs2i8Jm+9zPK fUjKHaodTSCdtYrAWzMqjOz7SFTA69AsgUTbqZxcJIGiulFfMeGhyhjS3qgKFq7oazYg vGp3PYMvR35fF3BR+H9/1h9ax/n3CphvkVfA4NAswEOER7nSfM0P3/wjOb//xzGxzM6Q 0PCw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com X-Gm-Message-State: APjAAAXDS85np+yUX6i+/79lB2/NFp3Wrt0/CFPXJkGio3ad0wHO0GDG BI584zFfRHCEB8Um37wLAjswEKlyneC1o7k0C/gpmh2X9UHqCqPeTBlp1dJirmQ1jJtTswAF0Xb ZijDIqtbOQhI/G0dlEcTv28gGb1wStkQraKSuOsXMNHpvM7r9iCrw4cUPAcOa61J2Xw== X-Received: by 2002:a17:906:4e87:: with SMTP id v7mr89361594eju.150.1558953134290; Mon, 27 May 2019 03:32:14 -0700 (PDT) X-Google-Smtp-Source: APXvYqx8f5RVzHm9dpqaO3nZ4eayaRInAJ8Q6exxd31CB6aEc4PGYPNxGWFupbBEzmfch7BZBqsI X-Received: by 2002:a17:906:4e87:: with SMTP id v7mr89361515eju.150.1558953132925; Mon, 27 May 2019 03:32:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558953132; cv=none; d=google.com; s=arc-20160816; b=e1OhV4Nyjy7T1XlIbKD8NbWJ7XoDsOcK+2PFjG/udNn9j93vlG55kiFb5NXCyrlRlG D/HUlKz54yMVhh9mVX6jDohe+WwQUjv+h6zQqTPw+ytw78Zgz262KeNbND+PnYFgBxvg hBzp1ZEI+Ckh4k0E9yAYIIifsJGjxzCFegUEhAtJIIVSvPGdIZUpjwNe4lD4C2QPovzx K9drAWXl4z7sxWjShp5cA0xIF28bbDRISLEGFMTRh2LNkfP/09hLOYvTAuXcz8Fmr89X Or2rkyZPeljCfDehYnGi7/Y2c2Ki2atxhduW8zogzy5IqxSZ/Qgb2eN5ZE1bviArYOxT /VBQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=8CGFP6Qn2xvR/4ig5R52dtdRE57fbpM5XHPmlSASgvE=; b=OhgQj/OPpTG3i3kIcB+q16FPX4GshpQXWJ20xNxZGGNcwWeoLGeZVqN2BCOomY/eX6 /LNbKr+oFS4t+I+tbb5V68ZbnEvCcEsMB13i/Zv5fxxX4s9Gmf4/PZJmeDytW39CdWxl gfSEiVaZFfC6kHYoMmhMt9gDBMIHR2qrPREPxRwB0gfnS6PwG9R7DeVGwI7KYp7atx0R Nb1Fo0c2rms0ICFR4KsPw4ujL4tsYZ2J7p+FXQzMeI67xD6kIzX83SmfYkRkFmtbW4/M Q0SH87gc8+itWHG7PrDBZfbxGIe34D+spP2yH5MB7f4PYovrThc6UvmoT5BsFPxLMKue T+0g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id d21si8482193edb.358.2019.05.27.03.32.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 May 2019 03:32:12 -0700 (PDT) Received-SPF: pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) client-ip=195.135.220.15; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jgross@suse.com designates 195.135.220.15 as permitted sender) smtp.mailfrom=jgross@suse.com X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 00898ADE6; Mon, 27 May 2019 10:32:12 +0000 (UTC) From: Juergen Gross To: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org Cc: Juergen Gross , Jonathan Corbet , Konrad Rzeszutek Wilk Subject: [PATCH 3/3] mm: remove tmem specifics from frontswap Date: Mon, 27 May 2019 12:32:07 +0200 Message-Id: <20190527103207.13287-4-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190527103207.13287-1-jgross@suse.com> References: <20190527103207.13287-1-jgross@suse.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The frontswap module contains several parts which are specific to tmem. With that no longer present those parts can be removed. Signed-off-by: Juergen Gross --- Documentation/vm/frontswap.rst | 17 +---- include/linux/frontswap.h | 5 -- mm/Kconfig | 16 ++--- mm/frontswap.c | 156 +---------------------------------------- 4 files changed, 7 insertions(+), 187 deletions(-) diff --git a/Documentation/vm/frontswap.rst b/Documentation/vm/frontswap.rst index 511c921bc8d2..2c674c0c6a77 100644 --- a/Documentation/vm/frontswap.rst +++ b/Documentation/vm/frontswap.rst @@ -19,7 +19,7 @@ for a detailed overview of frontswap and related kernel parts) Frontswap is so named because it can be thought of as the opposite of a "backing" store for a swap device. The storage is assumed to be a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming -to the requirements of transcendent memory (such as Xen's "tmem", or +to the requirements of transcendent memory (such as in-kernel compressed memory, aka "zcache", or future RAM-like devices); this pseudo-RAM device is not directly accessible or addressable by the kernel and is of unknown and possibly time-varying size. The driver @@ -113,21 +113,6 @@ many servers in a cluster can swap, dynamically as needed, to a single server configured with a large amount of RAM... without pre-configuring how much of the RAM is available for each of the clients! -In the virtual case, the whole point of virtualization is to statistically -multiplex physical resources across the varying demands of multiple -virtual machines. This is really hard to do with RAM and efforts to do -it well with no kernel changes have essentially failed (except in some -well-publicized special-case workloads). -Specifically, the Xen Transcendent Memory backend allows otherwise -"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple -virtual machines, but the pages can be compressed and deduplicated to -optimize RAM utilization. And when guest OS's are induced to surrender -underutilized RAM (e.g. with "selfballooning"), sudden unexpected -memory pressure may result in swapping; frontswap allows those pages -to be swapped to and from hypervisor RAM (if overall host system memory -conditions allow), thus mitigating the potentially awful performance impact -of unplanned swapping. - A KVM implementation is underway and has been RFC'ed to lkml. And, using frontswap, investigation is also underway on the use of NVM as a memory extension technology. diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h index 6d775984905b..052480aa3756 100644 --- a/include/linux/frontswap.h +++ b/include/linux/frontswap.h @@ -24,11 +24,6 @@ struct frontswap_ops { }; extern void frontswap_register_ops(struct frontswap_ops *ops); -extern void frontswap_shrink(unsigned long); -extern unsigned long frontswap_curr_pages(void); -extern void frontswap_writethrough(bool); -#define FRONTSWAP_HAS_EXCLUSIVE_GETS -extern void frontswap_tmem_exclusive_gets(bool); extern bool __frontswap_test(struct swap_info_struct *, pgoff_t); extern void __frontswap_init(unsigned type, unsigned long *map); diff --git a/mm/Kconfig b/mm/Kconfig index 5166fe4af00b..971b615ad3a6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -436,20 +436,14 @@ config NEED_PER_CPU_KM default y config FRONTSWAP - bool "Enable frontswap to cache swap pages if tmem is present" + bool "Enable frontswap to cache swap pages if zswap is present" depends on SWAP help Frontswap is so named because it can be thought of as the opposite - of a "backing" store for a swap device. The data is stored into - "transcendent memory", memory that is not directly accessible or - addressable by the kernel and is of unknown and possibly - time-varying size. When space in transcendent memory is available, - a significant swap I/O reduction may be achieved. When none is - available, all frontswap calls are reduced to a single pointer- - compare-against-NULL resulting in a negligible performance hit - and swap data is stored as normal on the matching swap device. - - If unsure, say Y to enable frontswap. + of a "backing" store for a swap device. The only user right now is + zswap. + + If unsure, say "n". config CMA bool "Contiguous Memory Allocator" diff --git a/mm/frontswap.c b/mm/frontswap.c index 157e5bf63504..e3370e46a0a5 100644 --- a/mm/frontswap.c +++ b/mm/frontswap.c @@ -33,23 +33,6 @@ static struct frontswap_ops *frontswap_ops __read_mostly; #define for_each_frontswap_ops(ops) \ for ((ops) = frontswap_ops; (ops); (ops) = (ops)->next) -/* - * If enabled, frontswap_store will return failure even on success. As - * a result, the swap subsystem will always write the page to swap, in - * effect converting frontswap into a writethrough cache. In this mode, - * there is no direct reduction in swap writes, but a frontswap backend - * can unilaterally "reclaim" any pages in use with no data loss, thus - * providing increases control over maximum memory usage due to frontswap. - */ -static bool frontswap_writethrough_enabled __read_mostly; - -/* - * If enabled, the underlying tmem implementation is capable of doing - * exclusive gets, so frontswap_load, on a successful tmem_get must - * mark the page as no longer in frontswap AND mark it dirty. - */ -static bool frontswap_tmem_exclusive_gets_enabled __read_mostly; - #ifdef CONFIG_DEBUG_FS /* * Counters available via /sys/kernel/debug/frontswap (if debugfs is @@ -167,24 +150,6 @@ void frontswap_register_ops(struct frontswap_ops *ops) } EXPORT_SYMBOL(frontswap_register_ops); -/* - * Enable/disable frontswap writethrough (see above). - */ -void frontswap_writethrough(bool enable) -{ - frontswap_writethrough_enabled = enable; -} -EXPORT_SYMBOL(frontswap_writethrough); - -/* - * Enable/disable frontswap exclusive gets (see above). - */ -void frontswap_tmem_exclusive_gets(bool enable) -{ - frontswap_tmem_exclusive_gets_enabled = enable; -} -EXPORT_SYMBOL(frontswap_tmem_exclusive_gets); - /* * Called when a swap device is swapon'd. */ @@ -280,9 +245,6 @@ int __frontswap_store(struct page *page) } else { inc_frontswap_failed_stores(); } - if (frontswap_writethrough_enabled) - /* report failure so swap also writes to swap device */ - ret = -1; return ret; } EXPORT_SYMBOL(__frontswap_store); @@ -314,13 +276,8 @@ int __frontswap_load(struct page *page) if (!ret) /* successful load */ break; } - if (ret == 0) { + if (ret == 0) inc_frontswap_loads(); - if (frontswap_tmem_exclusive_gets_enabled) { - SetPageDirty(page); - __frontswap_clear(sis, offset); - } - } return ret; } EXPORT_SYMBOL(__frontswap_load); @@ -369,117 +326,6 @@ void __frontswap_invalidate_area(unsigned type) } EXPORT_SYMBOL(__frontswap_invalidate_area); -static unsigned long __frontswap_curr_pages(void) -{ - unsigned long totalpages = 0; - struct swap_info_struct *si = NULL; - - assert_spin_locked(&swap_lock); - plist_for_each_entry(si, &swap_active_head, list) - totalpages += atomic_read(&si->frontswap_pages); - return totalpages; -} - -static int __frontswap_unuse_pages(unsigned long total, unsigned long *unused, - int *swapid) -{ - int ret = -EINVAL; - struct swap_info_struct *si = NULL; - int si_frontswap_pages; - unsigned long total_pages_to_unuse = total; - unsigned long pages = 0, pages_to_unuse = 0; - - assert_spin_locked(&swap_lock); - plist_for_each_entry(si, &swap_active_head, list) { - si_frontswap_pages = atomic_read(&si->frontswap_pages); - if (total_pages_to_unuse < si_frontswap_pages) { - pages = pages_to_unuse = total_pages_to_unuse; - } else { - pages = si_frontswap_pages; - pages_to_unuse = 0; /* unuse all */ - } - /* ensure there is enough RAM to fetch pages from frontswap */ - if (security_vm_enough_memory_mm(current->mm, pages)) { - ret = -ENOMEM; - continue; - } - vm_unacct_memory(pages); - *unused = pages_to_unuse; - *swapid = si->type; - ret = 0; - break; - } - - return ret; -} - -/* - * Used to check if it's necessory and feasible to unuse pages. - * Return 1 when nothing to do, 0 when need to shink pages, - * error code when there is an error. - */ -static int __frontswap_shrink(unsigned long target_pages, - unsigned long *pages_to_unuse, - int *type) -{ - unsigned long total_pages = 0, total_pages_to_unuse; - - assert_spin_locked(&swap_lock); - - total_pages = __frontswap_curr_pages(); - if (total_pages <= target_pages) { - /* Nothing to do */ - *pages_to_unuse = 0; - return 1; - } - total_pages_to_unuse = total_pages - target_pages; - return __frontswap_unuse_pages(total_pages_to_unuse, pages_to_unuse, type); -} - -/* - * Frontswap, like a true swap device, may unnecessarily retain pages - * under certain circumstances; "shrink" frontswap is essentially a - * "partial swapoff" and works by calling try_to_unuse to attempt to - * unuse enough frontswap pages to attempt to -- subject to memory - * constraints -- reduce the number of pages in frontswap to the - * number given in the parameter target_pages. - */ -void frontswap_shrink(unsigned long target_pages) -{ - unsigned long pages_to_unuse = 0; - int uninitialized_var(type), ret; - - /* - * we don't want to hold swap_lock while doing a very - * lengthy try_to_unuse, but swap_list may change - * so restart scan from swap_active_head each time - */ - spin_lock(&swap_lock); - ret = __frontswap_shrink(target_pages, &pages_to_unuse, &type); - spin_unlock(&swap_lock); - if (ret == 0) - try_to_unuse(type, true, pages_to_unuse); - return; -} -EXPORT_SYMBOL(frontswap_shrink); - -/* - * Count and return the number of frontswap pages across all - * swap devices. This is exported so that backend drivers can - * determine current usage without reading debugfs. - */ -unsigned long frontswap_curr_pages(void) -{ - unsigned long totalpages = 0; - - spin_lock(&swap_lock); - totalpages = __frontswap_curr_pages(); - spin_unlock(&swap_lock); - - return totalpages; -} -EXPORT_SYMBOL(frontswap_curr_pages); - static int __init init_frontswap(void) { #ifdef CONFIG_DEBUG_FS