Message ID | 20230530210251.493194-1-yosryahmed@google.com (mailing list archive)
---|---
State | New
Series | mm: zswap: support exclusive loads
On Tue, 30 May 2023 21:02:51 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:

> Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
> removed support for exclusive loads from frontswap as it was not used.
>
> Bring back exclusive loads support to frontswap by adding an
> exclusive_loads argument to frontswap_ops. Add support for exclusive
> loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.

Why is this Kconfigurable? Why not just enable the feature for all
builds?

> Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
> into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().
>
> With exclusive loads, we avoid having two copies of the same page in
> memory (compressed & uncompressed) after faulting it in from zswap. On
> the other hand, if the page is to be reclaimed again without being
> dirtied, it will be re-compressed. Compression is not usually slow, and
> a page that was just faulted in is less likely to be reclaimed again
> soon.
>
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> 	  The selection made here can be overridden by using the kernel
> 	  command line 'zswap.enabled=' option.
>
> +config ZSWAP_EXCLUSIVE_LOADS
> +	bool "Invalidate zswap entries when pages are loaded"
> +	depends on ZSWAP
> +	help
> +	  If selected, when a page is loaded from zswap, the zswap entry is
> +	  invalidated at once, as opposed to leaving it in zswap until the
> +	  swap entry is freed.
> +
> +	  This avoids having two copies of the same page in memory
> +	  (compressed and uncompressed) after faulting in a page from zswap.
> +	  The cost is that if the page was never dirtied and needs to be
> +	  swapped out again, it will be re-compressed.

So it's a speed-vs-space tradeoff? I'm not sure how users are to
decide whether they want this. Did we help them as much as possible?
On Tue, May 30, 2023 at 2:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 30 May 2023 21:02:51 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
> > removed support for exclusive loads from frontswap as it was not used.
> >
> > Bring back exclusive loads support to frontswap by adding an
> > exclusive_loads argument to frontswap_ops. Add support for exclusive
> > loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.
>
> Why is this Kconfigurable? Why not just enable the feature for all
> builds?

I assumed that some users want the current behavior, where reclaiming
clean pages that were once in zswap would be faster. If no one cares, I
can remove the config option and have it always on.

> > Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
> > into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().
> >
> > With exclusive loads, we avoid having two copies of the same page in
> > memory (compressed & uncompressed) after faulting it in from zswap. On
> > the other hand, if the page is to be reclaimed again without being
> > dirtied, it will be re-compressed. Compression is not usually slow, and
> > a page that was just faulted in is less likely to be reclaimed again
> > soon.
> >
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> > 	  The selection made here can be overridden by using the kernel
> > 	  command line 'zswap.enabled=' option.
> >
> > +config ZSWAP_EXCLUSIVE_LOADS
> > +	bool "Invalidate zswap entries when pages are loaded"
> > +	depends on ZSWAP
> > +	help
> > +	  If selected, when a page is loaded from zswap, the zswap entry is
> > +	  invalidated at once, as opposed to leaving it in zswap until the
> > +	  swap entry is freed.
> > +
> > +	  This avoids having two copies of the same page in memory
> > +	  (compressed and uncompressed) after faulting in a page from zswap.
> > +	  The cost is that if the page was never dirtied and needs to be
> > +	  swapped out again, it will be re-compressed.
>
> So it's a speed-vs-space tradeoff? I'm not sure how users are to
> decide whether they want this. Did we help them as much as possible?

Yes, it is a reclaim speed vs. space tradeoff. My intuition is that it
should be more useful to have this enabled, as the memory savings should
be more important than having reclaim be a little bit faster in some
specific situations. We can make the configuration on by default if
others agree.

I would imagine users would turn this configuration on and observe
memory usage of zswap vs. reclaim speed, and decide based on the
numbers.
On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
>
>  	/* Try loading from each implementation, until one succeeds. */
>  	ret = frontswap_ops->load(type, offset, page);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		inc_frontswap_loads();
> +		if (frontswap_ops->exclusive_loads) {
> +			SetPageDirty(page);
> +			__frontswap_clear(sis, offset);
> +		}

Somewhat tangential, but is there still a point to the frontswap layer?
It seems usecases other than zswap have never materialized, at least
not in tree. Life would be a lot easier if we were to just hardcode the
zswap callbacks in the swap functions.

It's not the patch's fault, but it highlights the boilerplate the
indirection causes: ->load() already has the page and could just dirty
it directly. Instead, now both layers have to handle invalidation,
which is a vector for bugs.

Can somebody think of reasons to keep it? If not, I'd take a stab at
removing it.
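[Editor's note: "hardcoding the zswap callbacks" could look roughly like the
sketch below. The function names swap_read_page(), zswap_load(), and
submit_swap_bio() are hypothetical stand-ins, not an in-tree interface at the
time of this thread; the point is only that the swap-in path would call zswap
directly and dirty the page in one place.]

```c
/* Sketch only: swap-in path calling zswap directly, no frontswap_ops. */
static void swap_read_page(struct page *page, swp_entry_t entry)
{
	if (zswap_load(page)) {		/* page found in zswap... */
		SetPageDirty(page);	/* ...dirty it right here */
		return;
	}
	submit_swap_bio(page, entry);	/* fall back to the backing device */
}
```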
On Tue, May 30, 2023 at 4:54 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> > @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
> >
> >  	/* Try loading from each implementation, until one succeeds. */
> >  	ret = frontswap_ops->load(type, offset, page);
> > -	if (ret == 0)
> > +	if (ret == 0) {
> >  		inc_frontswap_loads();
> > +		if (frontswap_ops->exclusive_loads) {
> > +			SetPageDirty(page);
> > +			__frontswap_clear(sis, offset);
> > +		}
>
> Somewhat tangential, but is there still a point to the frontswap
> layer? It seems usecases other than zswap have never materialized, at
> least not in tree. Life would be a lot easier if we were to just
> hardcode the zswap callbacks in the swap functions.
>
> It's not the patch's fault, but it highlights the boilerplate the
> indirection causes: ->load() already has the page and could just dirty
> it directly. Instead, now both layers have to handle invalidation,
> which is a vector for bugs.
>
> Can somebody think of reasons to keep it? If not, I'd take a stab at
> removing it.

I was intending to eventually remove it as part of the swap abstraction
proposal I am working on, but this will take some time. If you want to
remove it now, even better :)
On Tue, May 30, 2023 at 07:54:47PM -0400, Johannes Weiner wrote:
> Somewhat tangential, but is there still a point to the frontswap
> layer? It seems usecases other than zswap have never materialized, at
> least not in tree. Life would be a lot easier if we were to just
> hardcode the zswap callbacks in the swap functions.

I've been wanting to remove it for a while, as it really is rather
pointless.
On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> 	  The selection made here can be overridden by using the kernel
> 	  command line 'zswap.enabled=' option.
>
> +config ZSWAP_EXCLUSIVE_LOADS
> +	bool "Invalidate zswap entries when pages are loaded"
> +	depends on ZSWAP
> +	help
> +	  If selected, when a page is loaded from zswap, the zswap entry is
> +	  invalidated at once, as opposed to leaving it in zswap until the
> +	  swap entry is freed.
> +
> +	  This avoids having two copies of the same page in memory
> +	  (compressed and uncompressed) after faulting in a page from zswap.
> +	  The cost is that if the page was never dirtied and needs to be
> +	  swapped out again, it will be re-compressed.
> +
>  choice
>  	prompt "Default compressor"
>  	depends on ZSWAP
> diff --git a/mm/frontswap.c b/mm/frontswap.c
> index 279e55b4ed87..e5d6825110f4 100644
> --- a/mm/frontswap.c
> +++ b/mm/frontswap.c
> @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
>
>  	/* Try loading from each implementation, until one succeeds. */
>  	ret = frontswap_ops->load(type, offset, page);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		inc_frontswap_loads();
> +		if (frontswap_ops->exclusive_loads) {
> +			SetPageDirty(page);
> +			__frontswap_clear(sis, offset);
> +		}
> +	}
>  	return ret;

This would be a much more accessible feature (distro kernels,
experimenting, adapting to different workloads) if it were runtime
switchable.

That should be possible, right? As long as frontswap and zswap are
coordinated, this can be done on a per-entry basis:

	exclusive = READ_ONCE(frontswap_ops->exclusive_loads);

	ret = frontswap_ops->load(type, offset, page, exclusive);
	if (ret == 0) {
		if (exclusive) {
			SetPageDirty(page);
			__frontswap_clear(sis, offset);
		}
	}
On Wed, Jun 7, 2023 at 7:13 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, May 30, 2023 at 09:02:51PM +0000, Yosry Ahmed wrote:
> > @@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
> > 	  The selection made here can be overridden by using the kernel
> > 	  command line 'zswap.enabled=' option.
> >
> > +config ZSWAP_EXCLUSIVE_LOADS
> > +	bool "Invalidate zswap entries when pages are loaded"
> > +	depends on ZSWAP
> > +	help
> > +	  If selected, when a page is loaded from zswap, the zswap entry is
> > +	  invalidated at once, as opposed to leaving it in zswap until the
> > +	  swap entry is freed.
> > +
> > +	  This avoids having two copies of the same page in memory
> > +	  (compressed and uncompressed) after faulting in a page from zswap.
> > +	  The cost is that if the page was never dirtied and needs to be
> > +	  swapped out again, it will be re-compressed.
> > +
> >  choice
> >  	prompt "Default compressor"
> >  	depends on ZSWAP
> > diff --git a/mm/frontswap.c b/mm/frontswap.c
> > index 279e55b4ed87..e5d6825110f4 100644
> > --- a/mm/frontswap.c
> > +++ b/mm/frontswap.c
> > @@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
> >
> >  	/* Try loading from each implementation, until one succeeds. */
> >  	ret = frontswap_ops->load(type, offset, page);
> > -	if (ret == 0)
> > +	if (ret == 0) {
> >  		inc_frontswap_loads();
> > +		if (frontswap_ops->exclusive_loads) {
> > +			SetPageDirty(page);
> > +			__frontswap_clear(sis, offset);
> > +		}
> > +	}
> >  	return ret;
>
> This would be a much more accessible feature (distro kernels,
> experimenting, adapting to different workloads) if it were runtime
> switchable.
>
> That should be possible, right? As long as frontswap and zswap are
> coordinated, this can be done on a per-entry basis:
>
> 	exclusive = READ_ONCE(frontswap_ops->exclusive_loads);
>
> 	ret = frontswap_ops->load(type, offset, page, exclusive);
> 	if (ret == 0) {
> 		if (exclusive) {
> 			SetPageDirty(page);
> 			__frontswap_clear(sis, offset);
> 		}
> 	}

Sure, I can do that.

My thought was that this isn't something that someone usually wants to
change at runtime, but at least for experimentation, it does make things
a lot easier. I can add a zswap module parameter.

I am guessing that for consistency with other zswap settings and for
convenience, I can leave the config option to set the default value
with, aka CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON or something.

Does this make sense?
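[Editor's note: the proposed module parameter could look roughly like the
sketch below. This is not the patch itself, just a plausible shape for it;
CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON is the name proposed above, and the
parameter name `exclusive_loads` is an assumption.]

```c
/* Sketch for mm/zswap.c: a runtime-switchable knob, with Kconfig only
 * choosing the boot-time default. */
static bool zswap_exclusive_loads_enabled = IS_ENABLED(
		CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
```

With mode 0644, the policy could then be flipped at runtime via
/sys/module/zswap/parameters/exclusive_loads, or set at boot with
zswap.exclusive_loads=Y on the kernel command line.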
diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index a631bac12220..289561e12cad 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -13,6 +13,7 @@ struct frontswap_ops {
 	int (*load)(unsigned, pgoff_t, struct page *); /* load a page */
 	void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */
 	void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */
+	bool exclusive_loads; /* pages are invalidated after being loaded */
 };
 
 int frontswap_register_ops(const struct frontswap_ops *ops);
diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..92c30879bf67 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -46,6 +46,19 @@ config ZSWAP_DEFAULT_ON
 	  The selection made here can be overridden by using the kernel
 	  command line 'zswap.enabled=' option.
 
+config ZSWAP_EXCLUSIVE_LOADS
+	bool "Invalidate zswap entries when pages are loaded"
+	depends on ZSWAP
+	help
+	  If selected, when a page is loaded from zswap, the zswap entry is
+	  invalidated at once, as opposed to leaving it in zswap until the
+	  swap entry is freed.
+
+	  This avoids having two copies of the same page in memory
+	  (compressed and uncompressed) after faulting in a page from zswap.
+	  The cost is that if the page was never dirtied and needs to be
+	  swapped out again, it will be re-compressed.
+
 choice
 	prompt "Default compressor"
 	depends on ZSWAP
diff --git a/mm/frontswap.c b/mm/frontswap.c
index 279e55b4ed87..e5d6825110f4 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -216,8 +216,13 @@ int __frontswap_load(struct page *page)
 
 	/* Try loading from each implementation, until one succeeds. */
 	ret = frontswap_ops->load(type, offset, page);
-	if (ret == 0)
+	if (ret == 0) {
 		inc_frontswap_loads();
+		if (frontswap_ops->exclusive_loads) {
+			SetPageDirty(page);
+			__frontswap_clear(sis, offset);
+		}
+	}
 	return ret;
 }
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 59da2a415fbb..fba80330afd1 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1329,6 +1329,16 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 	goto reject;
 }
 
+static void zswap_invalidate_entry(struct zswap_tree *tree,
+				   struct zswap_entry *entry)
+{
+	/* remove from rbtree */
+	zswap_rb_erase(&tree->rbroot, entry);
+
+	/* drop the initial reference from entry creation */
+	zswap_entry_put(tree, entry);
+}
+
 /*
  * returns 0 if the page was successfully decompressed
  * return -1 on entry not found or error
@@ -1403,6 +1413,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
 		count_objcg_event(entry->objcg, ZSWPIN);
 freeentry:
 	spin_lock(&tree->lock);
+	if (!ret && IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS))
+		zswap_invalidate_entry(tree, entry);
 	zswap_entry_put(tree, entry);
 	spin_unlock(&tree->lock);
 
@@ -1423,13 +1435,7 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 		spin_unlock(&tree->lock);
 		return;
 	}
-
-	/* remove from rbtree */
-	zswap_rb_erase(&tree->rbroot, entry);
-
-	/* drop the initial reference from entry creation */
-	zswap_entry_put(tree, entry);
-
+	zswap_invalidate_entry(tree, entry);
 	spin_unlock(&tree->lock);
 }
 
@@ -1472,7 +1478,8 @@ static const struct frontswap_ops zswap_frontswap_ops = {
 	.load = zswap_frontswap_load,
 	.invalidate_page = zswap_frontswap_invalidate_page,
 	.invalidate_area = zswap_frontswap_invalidate_area,
-	.init = zswap_frontswap_init
+	.init = zswap_frontswap_init,
+	.exclusive_loads = IS_ENABLED(CONFIG_ZSWAP_EXCLUSIVE_LOADS),
 };
 
 /*********************************
Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets")
removed support for exclusive loads from frontswap as it was not used.

Bring back exclusive loads support to frontswap by adding an
exclusive_loads argument to frontswap_ops. Add support for exclusive
loads to zswap behind CONFIG_ZSWAP_EXCLUSIVE_LOADS.

Refactor zswap entry invalidation in zswap_frontswap_invalidate_page()
into zswap_invalidate_entry() to reuse it in zswap_frontswap_load().

With exclusive loads, we avoid having two copies of the same page in
memory (compressed & uncompressed) after faulting it in from zswap. On
the other hand, if the page is to be reclaimed again without being
dirtied, it will be re-compressed. Compression is not usually slow, and
a page that was just faulted in is less likely to be reclaimed again
soon.

Suggested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/frontswap.h |  1 +
 mm/Kconfig                | 13 +++++++++++++
 mm/frontswap.c            |  7 ++++++-
 mm/zswap.c                | 23 +++++++++++++--------
 4 files changed, 35 insertions(+), 9 deletions(-)