Message ID | 20240814062830.26833-3-kanchana.p.sridhar@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: ZSWAP swap-out of mTHP folios | expand |
On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar <kanchana.p.sridhar@intel.com> wrote: > > Added vmstat event counters per mTHP-size that can be used to account > for folios of different sizes being successfully stored in ZSWAP. > > For this RFC, it is not clear if these zswpout counters should instead > be added as part of the existing mTHP stats in > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. > > The following is also a viable option, should it make better sense, > for instance, as: > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. > > If so, we would be able to distinguish between mTHP zswap and > non-zswap swapouts through: > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout > > and > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout > > respectively. > > Comments would be appreciated as to which approach is preferable. Even though swapout might go through zswap, from the perspective of the mm core, it shouldn't be aware of that. Shouldn't zswpout be part of swpout? Why are they separate? no matter if a mTHP has been put in zswap, it has been swapped-out to mm-core? No? > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > --- > include/linux/vm_event_item.h | 15 +++++++++++++++ > mm/vmstat.c | 15 +++++++++++++++ > 2 files changed, 30 insertions(+) > > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h > index 747943bc8cc2..2451bcfcf05c 100644 > --- a/include/linux/vm_event_item.h > +++ b/include/linux/vm_event_item.h > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > THP_ZERO_PAGE_ALLOC, > THP_ZERO_PAGE_ALLOC_FAILED, > THP_SWPOUT, > +#ifdef CONFIG_ZSWAP > + ZSWPOUT_PMD_THP_FOLIO, > +#endif > THP_SWPOUT_FALLBACK, > #endif > #ifdef CONFIG_MEMORY_BALLOON > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > ZSWPIN, > ZSWPOUT, > ZSWPWB, > + ZSWPOUT_4KB_FOLIO, > +#ifdef CONFIG_THP_SWAP > + mTHP_ZSWPOUT_8kB, > + mTHP_ZSWPOUT_16kB, > + mTHP_ZSWPOUT_32kB, > + mTHP_ZSWPOUT_64kB, > + mTHP_ZSWPOUT_128kB, > + mTHP_ZSWPOUT_256kB, > + mTHP_ZSWPOUT_512kB, > + mTHP_ZSWPOUT_1024kB, > + mTHP_ZSWPOUT_2048kB, > +#endif This implementation hardcodes assumptions about the page size being 4KB, but page sizes can vary, and so can the THP orders? > #endif > #ifdef CONFIG_X86 > DIRECT_MAP_LEVEL2_SPLIT, > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 8507c497218b..0e66c8b0c486 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { > "thp_zero_page_alloc", > "thp_zero_page_alloc_failed", > "thp_swpout", > +#ifdef CONFIG_ZSWAP > + "zswpout_pmd_thp_folio", > +#endif > "thp_swpout_fallback", > #endif > #ifdef CONFIG_MEMORY_BALLOON > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { > "zswpin", > "zswpout", > "zswpwb", > + "zswpout_4kb_folio", > +#ifdef CONFIG_THP_SWAP > + "mthp_zswpout_8kb", > + "mthp_zswpout_16kb", > + "mthp_zswpout_32kb", > + "mthp_zswpout_64kb", > + "mthp_zswpout_128kb", > + "mthp_zswpout_256kb", > + "mthp_zswpout_512kb", > + "mthp_zswpout_1024kb", > + "mthp_zswpout_2048kb", > +#endif The issue here is that the number of THP orders can vary across different platforms. > #endif > #ifdef CONFIG_X86 > "direct_map_level2_splits", > -- > 2.27.0 > Thanks Barry
Hi Barry, > -----Original Message----- > From: Barry Song <21cnbao@gmail.com> > Sent: Wednesday, August 14, 2024 12:49 AM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; akpm@linux- > foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store > vmstat event counters. > > On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar > <kanchana.p.sridhar@intel.com> wrote: > > > > Added vmstat event counters per mTHP-size that can be used to account > > for folios of different sizes being successfully stored in ZSWAP. > > > > For this RFC, it is not clear if these zswpout counters should instead > > be added as part of the existing mTHP stats in > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. > > > > The following is also a viable option, should it make better sense, > > for instance, as: > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. > > > > If so, we would be able to distinguish between mTHP zswap and > > non-zswap swapouts through: > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout > > > > and > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout > > > > respectively. > > > > Comments would be appreciated as to which approach is preferable. > > Even though swapout might go through zswap, from the perspective of > the mm core, it shouldn't be aware of that. Shouldn't zswpout be part > of swpout? Why are they separate? no matter if a mTHP has been > put in zswap, it has been swapped-out to mm-core? No? Thanks for the code review comments. This is a good point. I was keeping in mind the convention used by existing vmstat event counters that distinguish zswpout/zswpin from pswpout/pswpin events. If we want to keep the distinction in mTHP swapouts, would adding a separate MTHP_STAT_ZSWPOUT to "enum mthp_stat_item" be Ok? In any case, it looks like all that would be needed is a call to count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT) in the general case. I will make this change in v2, depending on whether or not the separation of zswpout vs. non-zswap swpout is recommended for mTHP. > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > --- > > include/linux/vm_event_item.h | 15 +++++++++++++++ > > mm/vmstat.c | 15 +++++++++++++++ > > 2 files changed, 30 insertions(+) > > > > diff --git a/include/linux/vm_event_item.h > b/include/linux/vm_event_item.h > > index 747943bc8cc2..2451bcfcf05c 100644 > > --- a/include/linux/vm_event_item.h > > +++ b/include/linux/vm_event_item.h > > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, > PSWPIN, PSWPOUT, > > THP_ZERO_PAGE_ALLOC, > > THP_ZERO_PAGE_ALLOC_FAILED, > > THP_SWPOUT, > > +#ifdef CONFIG_ZSWAP > > + ZSWPOUT_PMD_THP_FOLIO, > > +#endif > > THP_SWPOUT_FALLBACK, > > #endif > > #ifdef CONFIG_MEMORY_BALLOON > > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, > PSWPIN, PSWPOUT, > > ZSWPIN, > > ZSWPOUT, > > ZSWPWB, > > + ZSWPOUT_4KB_FOLIO, > > +#ifdef CONFIG_THP_SWAP > > + mTHP_ZSWPOUT_8kB, > > + mTHP_ZSWPOUT_16kB, > > + mTHP_ZSWPOUT_32kB, > > + mTHP_ZSWPOUT_64kB, > > + mTHP_ZSWPOUT_128kB, > > + mTHP_ZSWPOUT_256kB, > > + mTHP_ZSWPOUT_512kB, > > + mTHP_ZSWPOUT_1024kB, > > + mTHP_ZSWPOUT_2048kB, > > +#endif > > This implementation hardcodes assumptions about the page size being 4KB, > but page sizes can vary, and so can the THP orders? Agreed, will address in v2. > > > #endif > > #ifdef CONFIG_X86 > > DIRECT_MAP_LEVEL2_SPLIT, > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 8507c497218b..0e66c8b0c486 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { > > "thp_zero_page_alloc", > > "thp_zero_page_alloc_failed", > > "thp_swpout", > > +#ifdef CONFIG_ZSWAP > > + "zswpout_pmd_thp_folio", > > +#endif > > "thp_swpout_fallback", > > #endif > > #ifdef CONFIG_MEMORY_BALLOON > > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { > > "zswpin", > > "zswpout", > > "zswpwb", > > + "zswpout_4kb_folio", > > +#ifdef CONFIG_THP_SWAP > > + "mthp_zswpout_8kb", > > + "mthp_zswpout_16kb", > > + "mthp_zswpout_32kb", > > + "mthp_zswpout_64kb", > > + "mthp_zswpout_128kb", > > + "mthp_zswpout_256kb", > > + "mthp_zswpout_512kb", > > + "mthp_zswpout_1024kb", > > + "mthp_zswpout_2048kb", > > +#endif > > The issue here is that the number of THP orders > can vary across different platforms. Agreed, will address in v2. Thanks, Kanchana > > > #endif > > #ifdef CONFIG_X86 > > "direct_map_level2_splits", > > -- > > 2.27.0 > > > > Thanks > Barry
On Thu, Aug 15, 2024 at 5:40 AM Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> wrote: > > Hi Barry, > > > -----Original Message----- > > From: Barry Song <21cnbao@gmail.com> > > Sent: Wednesday, August 14, 2024 12:49 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; akpm@linux- > > foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store > > vmstat event counters. > > > > On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > Added vmstat event counters per mTHP-size that can be used to account > > > for folios of different sizes being successfully stored in ZSWAP. > > > > > > For this RFC, it is not clear if these zswpout counters should instead > > > be added as part of the existing mTHP stats in > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. > > > > > > The following is also a viable option, should it make better sense, > > > for instance, as: > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. > > > > > > If so, we would be able to distinguish between mTHP zswap and > > > non-zswap swapouts through: > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout > > > > > > and > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout > > > > > > respectively. > > > > > > Comments would be appreciated as to which approach is preferable. > > > > Even though swapout might go through zswap, from the perspective of > > the mm core, it shouldn't be aware of that. Shouldn't zswpout be part > > of swpout? Why are they separate? no matter if a mTHP has been > > put in zswap, it has been swapped-out to mm-core? No? > > Thanks for the code review comments. This is a good point. I was keeping in > mind the convention used by existing vmstat event counters that distinguish > zswpout/zswpin from pswpout/pswpin events. > > If we want to keep the distinction in mTHP swapouts, would adding a > separate MTHP_STAT_ZSWPOUT to "enum mthp_stat_item" be Ok? > I'm not entirely sure how important the zswpout counter is. To me, it doesn't seem as critical as swpout and swpout_fallback, which are more useful for system profiling. zswapout feels more like an internal detail related to how the swap-out process is handled? If this is the case, we might not need this per-size counter. Otherwise, I believe sysfs is a better place to avoid all the chaos in vmstat to handle various orders and sizes. So the question is, per-size zswpout counter is really important or just for debugging purposes? > In any case, it looks like all that would be needed is a call to > count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT) in the > general case. > > I will make this change in v2, depending on whether or not the > separation of zswpout vs. non-zswap swpout is recommended for > mTHP. > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > --- > > > include/linux/vm_event_item.h | 15 +++++++++++++++ > > > mm/vmstat.c | 15 +++++++++++++++ > > > 2 files changed, 30 insertions(+) > > > > > > diff --git a/include/linux/vm_event_item.h > > b/include/linux/vm_event_item.h > > > index 747943bc8cc2..2451bcfcf05c 100644 > > > --- a/include/linux/vm_event_item.h > > > +++ b/include/linux/vm_event_item.h > > > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > PSWPIN, PSWPOUT, > > > THP_ZERO_PAGE_ALLOC, > > > THP_ZERO_PAGE_ALLOC_FAILED, > > > THP_SWPOUT, > > > +#ifdef CONFIG_ZSWAP > > > + ZSWPOUT_PMD_THP_FOLIO, > > > +#endif > > > THP_SWPOUT_FALLBACK, > > > #endif > > > #ifdef CONFIG_MEMORY_BALLOON > > > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > PSWPIN, PSWPOUT, > > > ZSWPIN, > > > ZSWPOUT, > > > ZSWPWB, > > > + ZSWPOUT_4KB_FOLIO, > > > +#ifdef CONFIG_THP_SWAP > > > + mTHP_ZSWPOUT_8kB, > > > + mTHP_ZSWPOUT_16kB, > > > + mTHP_ZSWPOUT_32kB, > > > + mTHP_ZSWPOUT_64kB, > > > + mTHP_ZSWPOUT_128kB, > > > + mTHP_ZSWPOUT_256kB, > > > + mTHP_ZSWPOUT_512kB, > > > + mTHP_ZSWPOUT_1024kB, > > > + mTHP_ZSWPOUT_2048kB, > > > +#endif > > > > This implementation hardcodes assumptions about the page size being 4KB, > > but page sizes can vary, and so can the THP orders? > > Agreed, will address in v2. > > > > > > #endif > > > #ifdef CONFIG_X86 > > > DIRECT_MAP_LEVEL2_SPLIT, > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > index 8507c497218b..0e66c8b0c486 100644 > > > --- a/mm/vmstat.c > > > +++ b/mm/vmstat.c > > > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { > > > "thp_zero_page_alloc", > > > "thp_zero_page_alloc_failed", > > > "thp_swpout", > > > +#ifdef CONFIG_ZSWAP > > > + "zswpout_pmd_thp_folio", > > > +#endif > > > "thp_swpout_fallback", > > > #endif > > > #ifdef CONFIG_MEMORY_BALLOON > > > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { > > > "zswpin", > > > "zswpout", > > > "zswpwb", > > > + "zswpout_4kb_folio", > > > +#ifdef CONFIG_THP_SWAP > > > + "mthp_zswpout_8kb", > > > + "mthp_zswpout_16kb", > > > + "mthp_zswpout_32kb", > > > + "mthp_zswpout_64kb", > > > + "mthp_zswpout_128kb", > > > + "mthp_zswpout_256kb", > > > + "mthp_zswpout_512kb", > > > + "mthp_zswpout_1024kb", > > > + "mthp_zswpout_2048kb", > > > +#endif > > > > The issue here is that the number of THP orders > > can vary across different platforms. > > Agreed, will address in v2. > > Thanks, > Kanchana > > > > > > #endif > > > #ifdef CONFIG_X86 > > > "direct_map_level2_splits", > > > -- > > > 2.27.0 > > > Thanks Barry
Hi Barry, > -----Original Message----- > From: Barry Song <21cnbao@gmail.com> > Sent: Wednesday, August 14, 2024 4:25 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; akpm@linux- > foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store > vmstat event counters. > > On Thu, Aug 15, 2024 at 5:40 AM Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> wrote: > > > > Hi Barry, > > > > > -----Original Message----- > > > From: Barry Song <21cnbao@gmail.com> > > > Sent: Wednesday, August 14, 2024 12:49 AM > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>; > akpm@linux- > > > foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store > > > vmstat event counters. > > > > > > On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar > > > <kanchana.p.sridhar@intel.com> wrote: > > > > > > > > Added vmstat event counters per mTHP-size that can be used to account > > > > for folios of different sizes being successfully stored in ZSWAP. > > > > > > > > For this RFC, it is not clear if these zswpout counters should instead > > > > be added as part of the existing mTHP stats in > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. > > > > > > > > The following is also a viable option, should it make better sense, > > > > for instance, as: > > > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. > > > > > > > > If so, we would be able to distinguish between mTHP zswap and > > > > non-zswap swapouts through: > > > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout > > > > > > > > and > > > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout > > > > > > > > respectively. > > > > > > > > Comments would be appreciated as to which approach is preferable. > > > > > > Even though swapout might go through zswap, from the perspective of > > > the mm core, it shouldn't be aware of that. Shouldn't zswpout be part > > > of swpout? Why are they separate? no matter if a mTHP has been > > > put in zswap, it has been swapped-out to mm-core? No? > > > > Thanks for the code review comments. This is a good point. I was keeping in > > mind the convention used by existing vmstat event counters that distinguish > > zswpout/zswpin from pswpout/pswpin events. > > > > If we want to keep the distinction in mTHP swapouts, would adding a > > separate MTHP_STAT_ZSWPOUT to "enum mthp_stat_item" be Ok? > > > > I'm not entirely sure how important the zswpout counter is. To me, it doesn't > seem as critical as swpout and swpout_fallback, which are more useful for > system profiling. zswapout feels more like an internal detail related to > how the swap-out process is handled? If this is the case, we might not > need this per-size counter. > > Otherwise, I believe sysfs is a better place to avoid all the chaos in vmstat > to handle various orders and sizes. So the question is, per-size zswpout > counter is really important or just for debugging purposes? I agree, sysfs would be a cleaner mTHP stats accounting solution, given the existing mTHP swpout stats under the per-order /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout. I personally find distinct zswap vs. bdev/fs swapout accounting useful for debugging, and for overall reclaim path characterization. For instance, the impact of different zswap compressors' compress latency on zswpout activity for a given workload. Is a slowdown in compress latency causing active/hot memory to be reclaimed and immediately faulted in? Does better zswap compress efficiency co-relate to more cold memory as mTHP to be reclaimed? How does the reclaim path efficiency improvement resulting from improving zswap_store mTHP performance co-relate with ZSWAP utilization and memory savings? I have found these counters useful in understanding some of these characteristics. I also believe it helps to account for the number of mTHP being stored in different compress tiers. For e.g. how many mTHP were stored in zswap vs. being rejected and stored in the backing swap device. This could help say in provisioning zswap memory, and knowing the impact of zswap compress path latency on scaling. Another interesting characteristic that mTHP zswpout accounting could help understand would be compressor incompressibility and/or zpool fragmentation; and being able to better co-relate the zswap/reject_* sysfs counters with mTHP [z]swpout stats. Look forward to inputs from yourself and others on the direction and next steps. Thanks, Kanchana > > > In any case, it looks like all that would be needed is a call to > > count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT) in the > > general case. > > > > I will make this change in v2, depending on whether or not the > > separation of zswpout vs. non-zswap swpout is recommended for > > mTHP. > > > > > > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > > --- > > > > include/linux/vm_event_item.h | 15 +++++++++++++++ > > > > mm/vmstat.c | 15 +++++++++++++++ > > > > 2 files changed, 30 insertions(+) > > > > > > > > diff --git a/include/linux/vm_event_item.h > > > b/include/linux/vm_event_item.h > > > > index 747943bc8cc2..2451bcfcf05c 100644 > > > > --- a/include/linux/vm_event_item.h > > > > +++ b/include/linux/vm_event_item.h > > > > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > > PSWPIN, PSWPOUT, > > > > THP_ZERO_PAGE_ALLOC, > > > > THP_ZERO_PAGE_ALLOC_FAILED, > > > > THP_SWPOUT, > > > > +#ifdef CONFIG_ZSWAP > > > > + ZSWPOUT_PMD_THP_FOLIO, > > > > +#endif > > > > THP_SWPOUT_FALLBACK, > > > > #endif > > > > #ifdef CONFIG_MEMORY_BALLOON > > > > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > > PSWPIN, PSWPOUT, > > > > ZSWPIN, > > > > ZSWPOUT, > > > > ZSWPWB, > > > > + ZSWPOUT_4KB_FOLIO, > > > > +#ifdef CONFIG_THP_SWAP > > > > + mTHP_ZSWPOUT_8kB, > > > > + mTHP_ZSWPOUT_16kB, > > > > + mTHP_ZSWPOUT_32kB, > > > > + mTHP_ZSWPOUT_64kB, > > > > + mTHP_ZSWPOUT_128kB, > > > > + mTHP_ZSWPOUT_256kB, > > > > + mTHP_ZSWPOUT_512kB, > > > > + mTHP_ZSWPOUT_1024kB, > > > > + mTHP_ZSWPOUT_2048kB, > > > > +#endif > > > > > > This implementation hardcodes assumptions about the page size being > 4KB, > > > but page sizes can vary, and so can the THP orders? > > > > Agreed, will address in v2. > > > > > > > > > #endif > > > > #ifdef CONFIG_X86 > > > > DIRECT_MAP_LEVEL2_SPLIT, > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > index 8507c497218b..0e66c8b0c486 100644 > > > > --- a/mm/vmstat.c > > > > +++ b/mm/vmstat.c > > > > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { > > > > "thp_zero_page_alloc", > > > > "thp_zero_page_alloc_failed", > > > > "thp_swpout", > > > > +#ifdef CONFIG_ZSWAP > > > > + "zswpout_pmd_thp_folio", > > > > +#endif > > > > "thp_swpout_fallback", > > > > #endif > > > > #ifdef CONFIG_MEMORY_BALLOON > > > > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { > > > > "zswpin", > > > > "zswpout", > > > > "zswpwb", > > > > + "zswpout_4kb_folio", > > > > +#ifdef CONFIG_THP_SWAP > > > > + "mthp_zswpout_8kb", > > > > + "mthp_zswpout_16kb", > > > > + "mthp_zswpout_32kb", > > > > + "mthp_zswpout_64kb", > > > > + "mthp_zswpout_128kb", > > > > + "mthp_zswpout_256kb", > > > > + "mthp_zswpout_512kb", > > > > + "mthp_zswpout_1024kb", > > > > + "mthp_zswpout_2048kb", > > > > +#endif > > > > > > The issue here is that the number of THP orders > > > can vary across different platforms. > > > > Agreed, will address in v2. > > > > Thanks, > > Kanchana > > > > > > > > > #endif > > > > #ifdef CONFIG_X86 > > > > "direct_map_level2_splits", > > > > -- > > > > 2.27.0 > > > > > > Thanks > Barry
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 747943bc8cc2..2451bcfcf05c 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_ZERO_PAGE_ALLOC, THP_ZERO_PAGE_ALLOC_FAILED, THP_SWPOUT, +#ifdef CONFIG_ZSWAP + ZSWPOUT_PMD_THP_FOLIO, +#endif THP_SWPOUT_FALLBACK, #endif #ifdef CONFIG_MEMORY_BALLOON @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, ZSWPIN, ZSWPOUT, ZSWPWB, + ZSWPOUT_4KB_FOLIO, +#ifdef CONFIG_THP_SWAP + mTHP_ZSWPOUT_8kB, + mTHP_ZSWPOUT_16kB, + mTHP_ZSWPOUT_32kB, + mTHP_ZSWPOUT_64kB, + mTHP_ZSWPOUT_128kB, + mTHP_ZSWPOUT_256kB, + mTHP_ZSWPOUT_512kB, + mTHP_ZSWPOUT_1024kB, + mTHP_ZSWPOUT_2048kB, +#endif #endif #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, diff --git a/mm/vmstat.c b/mm/vmstat.c index 8507c497218b..0e66c8b0c486 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { "thp_zero_page_alloc", "thp_zero_page_alloc_failed", "thp_swpout", +#ifdef CONFIG_ZSWAP + "zswpout_pmd_thp_folio", +#endif "thp_swpout_fallback", #endif #ifdef CONFIG_MEMORY_BALLOON @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { "zswpin", "zswpout", "zswpwb", + "zswpout_4kb_folio", +#ifdef CONFIG_THP_SWAP + "mthp_zswpout_8kb", + "mthp_zswpout_16kb", + "mthp_zswpout_32kb", + "mthp_zswpout_64kb", + "mthp_zswpout_128kb", + "mthp_zswpout_256kb", + "mthp_zswpout_512kb", + "mthp_zswpout_1024kb", + "mthp_zswpout_2048kb", +#endif #endif #ifdef CONFIG_X86 "direct_map_level2_splits",
Added vmstat event counters per mTHP-size that can be used to account for folios of different sizes being successfully stored in ZSWAP. For this RFC, it is not clear if these zswpout counters should instead be added as part of the existing mTHP stats in /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. The following is also a viable option, should it make better sense, for instance, as: /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. If so, we would be able to distinguish between mTHP zswap and non-zswap swapouts through: /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout and /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout respectively. Comments would be appreciated as to which approach is preferable. Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> --- include/linux/vm_event_item.h | 15 +++++++++++++++ mm/vmstat.c | 15 +++++++++++++++ 2 files changed, 30 insertions(+)