Message ID | 20230310182851.2579138-3-shr@devkernel.io (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: process/cgroup ksm support | expand |
On 10.03.23 19:28, Stefan Roesch wrote: > This adds the general_profit KSM sysfs knob and the process profit metric > and process merge type knobs to ksm_stat. > > 1) split off pages_volatile function > > This splits off the pages_volatile function. The next patch will > use this function. > > 2) expose general_profit metric > > The documentation mentions a general profit metric, however this > metric is not calculated. In addition the formula depends on the size > of internal structures, which makes it more difficult for an > administrator to make the calculation. Adding the metric for a better > user experience. > > 3) document general_profit sysfs knob > > 4) calculate ksm process profit metric > > The ksm documentation mentions the process profit metric and how to > calculate it. This adds the calculation of the metric. > > 5) add ksm_merge_type() function > > This adds the ksm_merge_type function. The function returns the > merge type for the process. For madvise it returns "madvise", for > prctl it returns "process" and otherwise it returns "none". > > 6) mm: expose ksm process profit metric and merge type in ksm_stat > > This exposes the ksm process profit metric in /proc/<pid>/ksm_stat. > The name of the value is ksm_merge_type. The documentation mentions > the formula for the ksm process profit metric, however it does not > calculate it. In addition the formula depends on the size of internal > structures. So it makes sense to expose it. > > 7) document new procfs ksm knobs > Often, when you have to start making a list of things that a patch does, it might make sense to split some of the items into separate patches such that you can avoid lists and just explain in list-free text how the pieces in the patch fit together. I'd suggest splitting this patch into logical pieces. For example, separating the general profit calculation/exposure from the per-mm profit and the per-mm ksm type indication. > Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io > Signed-off-by: Stefan Roesch <shr@devkernel.io> > Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> > Cc: David Hildenbrand <david@redhat.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Rik van Riel <riel@surriel.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- [...] > KSM_ATTR_RO(pages_volatile); > > @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject *kobj, > } > KSM_ATTR_RO(zero_pages_sharing); > > +static ssize_t general_profit_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + long general_profit; > + long all_rmap_items; > + > + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared + > + ksm_pages_unshared + pages_volatile(); Are you sure you want to count a config knob (ksm_max_page_sharing) into that formula? I yet have to digest what this calculation implies, but it does feel odd. Further, maybe just avoid pages_volatile(). Expanding the formula (excluding ksm_max_page_sharing for now): all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile(); -> expand pages_volatile() (ignoring the < 0 case) all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items - ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared; -> simplify all_rmap = ksm_rmap_items + ksm_pages_sharing; Or is the < 0 case relevant here? > + general_profit = ksm_pages_sharing * PAGE_SIZE - > + all_rmap_items * sizeof(struct ksm_rmap_item); > + > + return sysfs_emit(buf, "%ld\n", general_profit); > +} > +KSM_ATTR_RO(general_profit); > + > static ssize_t stable_node_dups_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > { > @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = { > &stable_node_dups_attr.attr, > &stable_node_chains_prune_millisecs_attr.attr, > &use_zero_pages_attr.attr, > + &general_profit_attr.attr, > NULL, > }; > The calculations (profit) don't include when KSM places the shared zeropage I guess. Accounting that per MM (and eventually globally) is in the works. [1] [1] https://lore.kernel.org/lkml/20230328153852.26c2577e4bd921c371c47a7e@linux-foundation.org/t/
David Hildenbrand <david@redhat.com> writes: > On 10.03.23 19:28, Stefan Roesch wrote: >> This adds the general_profit KSM sysfs knob and the process profit metric >> and process merge type knobs to ksm_stat. >> 1) split off pages_volatile function >> This splits off the pages_volatile function. The next patch will >> use this function. >> 2) expose general_profit metric >> The documentation mentions a general profit metric, however this >> metric is not calculated. In addition the formula depends on the size >> of internal structures, which makes it more difficult for an >> administrator to make the calculation. Adding the metric for a better >> user experience. >> 3) document general_profit sysfs knob >> 4) calculate ksm process profit metric >> The ksm documentation mentions the process profit metric and how to >> calculate it. This adds the calculation of the metric. >> 5) add ksm_merge_type() function >> This adds the ksm_merge_type function. The function returns the >> merge type for the process. For madvise it returns "madvise", for >> prctl it returns "process" and otherwise it returns "none". >> 6) mm: expose ksm process profit metric and merge type in ksm_stat >> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat. >> The name of the value is ksm_merge_type. The documentation mentions >> the formula for the ksm process profit metric, however it does not >> calculate it. In addition the formula depends on the size of internal >> structures. So it makes sense to expose it. >> 7) document new procfs ksm knobs >> > > Often, when you have to start making a list of things that a patch does, it > might make sense to split some of the items into separate patches such that you > can avoid lists and just explain in list-free text how the pieces in the patch > fit together. > > I'd suggest splitting this patch into logical pieces. For example, separating > the general profit calculation/exposure from the per-mm profit and the per-mm > ksm type indication. > Originally these were individual patches. If I recall correctly Johannes Weiner wanted them as one patch. I can certainly split them again. >> Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io >> Signed-off-by: Stefan Roesch <shr@devkernel.io> >> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> >> Cc: David Hildenbrand <david@redhat.com> >> Cc: Johannes Weiner <hannes@cmpxchg.org> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Rik van Riel <riel@surriel.com> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> >> --- > > > [...] > >> KSM_ATTR_RO(pages_volatile); >> @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject >> *kobj, >> } >> KSM_ATTR_RO(zero_pages_sharing); >> +static ssize_t general_profit_show(struct kobject *kobj, >> + struct kobj_attribute *attr, char *buf) >> +{ >> + long general_profit; >> + long all_rmap_items; >> + >> + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared + >> + ksm_pages_unshared + pages_volatile(); > > Are you sure you want to count a config knob (ksm_max_page_sharing) into that > formula? I yet have to digest what this calculation implies, but it does feel > odd. > This was a mistake. I wanted ksm_pages_sharing instead of ksm_max_page_sharing. > > Further, maybe just avoid pages_volatile(). Expanding the formula (excluding > ksm_max_page_sharing for now): > > > all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile(); > > -> expand pages_volatile() (ignoring the < 0 case) > > all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items - > ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared; > > -> simplify > > all_rmap = ksm_rmap_items + ksm_pages_sharing; > I'll simplify it. > Or is the < 0 case relevant here? > A negative profit is ok. >> + general_profit = ksm_pages_sharing * PAGE_SIZE - >> + all_rmap_items * sizeof(struct ksm_rmap_item); >> + >> + return sysfs_emit(buf, "%ld\n", general_profit); >> +} >> +KSM_ATTR_RO(general_profit); >> + >> static ssize_t stable_node_dups_show(struct kobject *kobj, >> struct kobj_attribute *attr, char *buf) >> { >> @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = { >> &stable_node_dups_attr.attr, >> &stable_node_chains_prune_millisecs_attr.attr, >> &use_zero_pages_attr.attr, >> + &general_profit_attr.attr, >> NULL, >> }; >> > > The calculations (profit) don't include when KSM places the shared zeropage I > guess. Accounting that per MM (and eventually globally) is in the works. [1] > > > [1] > https://lore.kernel.org/lkml/20230328153852.26c2577e4bd921c371c47a7e@linux-foundation.org/t/
>> >> Often, when you have to start making a list of things that a patch does, it >> might make sense to split some of the items into separate patches such that you >> can avoid lists and just explain in list-free text how the pieces in the patch >> fit together. >> >> I'd suggest splitting this patch into logical pieces. For example, separating >> the general profit calculation/exposure from the per-mm profit and the per-mm >> ksm type indication. >> > > Originally these were individual patches. If I recall correctly Johannes > Weiner wanted them as one patch. I can certainly split them again. That's why I remember that v1 contained more patches :) Again, just my opinion on patches that require a description in form of a list ... > >>> Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io >>> Signed-off-by: Stefan Roesch <shr@devkernel.io> >>> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> >>> Cc: David Hildenbrand <david@redhat.com> >>> Cc: Johannes Weiner <hannes@cmpxchg.org> >>> Cc: Michal Hocko <mhocko@suse.com> >>> Cc: Rik van Riel <riel@surriel.com> >>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> >>> --- >> >> >> [...] >> >>> KSM_ATTR_RO(pages_volatile); >>> @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject >>> *kobj, >>> } >>> KSM_ATTR_RO(zero_pages_sharing); >>> +static ssize_t general_profit_show(struct kobject *kobj, >>> + struct kobj_attribute *attr, char *buf) >>> +{ >>> + long general_profit; >>> + long all_rmap_items; >>> + >>> + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared + >>> + ksm_pages_unshared + pages_volatile(); >> >> Are you sure you want to count a config knob (ksm_max_page_sharing) into that >> formula? I yet have to digest what this calculation implies, but it does feel >> odd. >> > > This was a mistake. I wanted ksm_pages_sharing instead of > ksm_max_page_sharing. > >> >> Further, maybe just avoid pages_volatile(). Expanding the formula (excluding >> ksm_max_page_sharing for now): >> >> >> all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile(); >> >> -> expand pages_volatile() (ignoring the < 0 case) >> >> all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items - >> ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared; >> >> -> simplify >> >> all_rmap = ksm_rmap_items + ksm_pages_sharing; >> > I'll simplify it. Cool.
On Thu, Apr 06, 2023 at 03:23:11PM +0200, David Hildenbrand wrote: > > > > > > Often, when you have to start making a list of things that a patch does, it > > > might make sense to split some of the items into separate patches such that you > > > can avoid lists and just explain in list-free text how the pieces in the patch > > > fit together. > > > > > > I'd suggest splitting this patch into logical pieces. For example, separating > > > the general profit calculation/exposure from the per-mm profit and the per-mm > > > ksm type indication. > > > > > > > Originally these were individual patches. If I recall correctly Johannes > > Weiner wanted them as one patch. I can certainly split them again. > > That's why I remember that v1 contained more patches :) > > Again, just my opinion on patches that require a description in form of a > list ... My concern was the initial splitting of patch 1. I found it easier to review with the new prctl() being one logical change, and fully implemented in one go. The changelog is still in the form of a list, but it essentially enumerates diff hunks that make up the interface. No objection to splitting out the multiple sysfs knobs, though! The strategy I usually follow is this: 1. Split out changes based on user-visible behavior (new prctl(), new sysfs knob) 2. Extract changes made along the way that have stand-alone value in existing code (bug fixes, simplifying/documenting tricky sections, cleanups). 3. Split out noisy prep work such as renames and refactors that make the user-visible functional changes more difficult to understand. and then order the series into sections 2, 3, and 1.
On 06.04.23 16:16, Johannes Weiner wrote: > On Thu, Apr 06, 2023 at 03:23:11PM +0200, David Hildenbrand wrote: >>>> >>>> Often, when you have to start making a list of things that a patch does, it >>>> might make sense to split some of the items into separate patches such that you >>>> can avoid lists and just explain in list-free text how the pieces in the patch >>>> fit together. >>>> >>>> I'd suggest splitting this patch into logical pieces. For example, separating >>>> the general profit calculation/exposure from the per-mm profit and the per-mm >>>> ksm type indication. >>>> >>> >>> Originally these were individual patches. If I recall correctly Johannes >>> Weiner wanted them as one patch. I can certainly split them again. >> >> That's why I remember that v1 contained more patches :) >> >> Again, just my opinion on patches that require a description in form of a >> list ... > > My concern was the initial splitting of patch 1. I found it easier to > review with the new prctl() being one logical change, and fully > implemented in one go. The changelog is still in the form of a list, > but it essentially enumerates diff hunks that make up the interface. > > No objection to splitting out the multiple sysfs knobs, though! > > The strategy I usually follow is this: > > 1. Split out changes based on user-visible behavior (new prctl(), new > sysfs knob) > > 2. Extract changes made along the way that have stand-alone value in > existing code (bug fixes, simplifying/documenting tricky sections, > cleanups). > > 3. Split out noisy prep work such as renames and refactors that make > the user-visible functional changes more difficult to understand. > > and then order the series into sections 2, 3, and 1. > I agree. The most important part is the "logical change" part. Once it's down to a list of 3 items or so for a single commit we can usually express it like (just an example that does no longer apply due to pages_volatile() not being required) the following when combining items 1+2+3 from the list: " Let's expose the general KSM profit via sysfs and document that new toggle. [add details about that and especially why we are doing that] As we need the number of volatile pages to calculate the general KSM profit, factor out existing functionality into a helper. " Much easier to read.
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm index d244674a9480..7768e90f7a8f 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm @@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes. When it is set to 0 only pages from the same node are merged, otherwise pages from all nodes can be merged together (default). + +What: /sys/kernel/mm/ksm/general_profit +Date: January 2023 +KernelVersion: 6.1 +Contact: Linux memory management mailing list <linux-mm@kvack.org> +Description: Measure how effective KSM is. + general_profit: how effective is KSM. The formula for the + calculation is in Documentation/admin-guide/mm/ksm.rst. diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst index 270560fef3b2..d2929964cd0f 100644 --- a/Documentation/admin-guide/mm/ksm.rst +++ b/Documentation/admin-guide/mm/ksm.rst @@ -157,6 +157,8 @@ stable_node_chains_prune_millisecs The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``: +general_profit + how effective is KSM. The calculation is explained below. pages_shared how many shared pages are being used pages_sharing @@ -214,7 +216,8 @@ several times, which are unprofitable memory consumed. ksm_rmap_items * sizeof(rmap_item). where ksm_merging_pages is shown under the directory ``/proc/<pid>/``, - and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. + and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit + is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit. From the perspective of application, a high ratio of ``ksm_rmap_items`` to ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or @@ -225,6 +228,9 @@ so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on 64-bit CPU or exceeds 128 on 32-bit CPU, then the app's madvise policy should be dropped, because the ksm profit is approximately zero or negative. +The ksm_merge_type in ``/proc/<pid>/ksm_stat`` shows the merge type of the +process. Valid values are ``none``, ``madvise`` and ``process``. + Monitoring KSM events ===================== diff --git a/fs/proc/base.c b/fs/proc/base.c index 07463ad4a70a..c74450318e05 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -96,6 +96,7 @@ #include <linux/time_namespace.h> #include <linux/resctrl.h> #include <linux/cn_proc.h> +#include <linux/ksm.h> #include <trace/events/oom.h> #include "internal.h" #include "fd.h" @@ -3199,6 +3200,7 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace * return 0; } + static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -3208,6 +3210,9 @@ static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns, if (mm) { seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items); seq_printf(m, "zero_pages_sharing %lu\n", mm->ksm_zero_pages_sharing); + seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages); + seq_printf(m, "ksm_merge_type %s\n", ksm_merge_type(mm)); + seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm)); mmput(mm); } diff --git a/include/linux/ksm.h b/include/linux/ksm.h index d38a05a36298..d5f69f18ee5a 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -55,6 +55,11 @@ struct page *ksm_might_need_to_copy(struct page *page, void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc); void folio_migrate_ksm(struct folio *newfolio, struct folio *folio); +#ifdef CONFIG_PROC_FS +long ksm_process_profit(struct mm_struct *); +const char *ksm_merge_type(struct mm_struct *mm); +#endif /* CONFIG_PROC_FS */ + #else /* !CONFIG_KSM */ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) diff --git a/mm/ksm.c b/mm/ksm.c index b8e6e734dd69..5fa7f239dbd8 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -3009,6 +3009,25 @@ static void wait_while_offlining(void) } #endif /* CONFIG_MEMORY_HOTREMOVE */ +#ifdef CONFIG_PROC_FS +long ksm_process_profit(struct mm_struct *mm) +{ + return (long)mm->ksm_merging_pages * PAGE_SIZE - + mm->ksm_rmap_items * sizeof(struct ksm_rmap_item); +} + +/* Return merge type name as string. */ +const char *ksm_merge_type(struct mm_struct *mm) +{ + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags)) + return "process"; + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags)) + return "madvise"; + else + return "none"; +} +#endif /* CONFIG_PROC_FS */ + #ifdef CONFIG_SYSFS /* * This all compiles without CONFIG_SYSFS, but is a waste of space. @@ -3256,8 +3275,7 @@ static ssize_t pages_unshared_show(struct kobject *kobj, } KSM_ATTR_RO(pages_unshared); -static ssize_t pages_volatile_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) +static long pages_volatile(void) { long ksm_pages_volatile; @@ -3269,7 +3287,14 @@ static ssize_t pages_volatile_show(struct kobject *kobj, */ if (ksm_pages_volatile < 0) ksm_pages_volatile = 0; - return sysfs_emit(buf, "%ld\n", ksm_pages_volatile); + + return ksm_pages_volatile; +} + +static ssize_t pages_volatile_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%ld\n", pages_volatile()); } KSM_ATTR_RO(pages_volatile); @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject *kobj, } KSM_ATTR_RO(zero_pages_sharing); +static ssize_t general_profit_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + long general_profit; + long all_rmap_items; + + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared + + ksm_pages_unshared + pages_volatile(); + general_profit = ksm_pages_sharing * PAGE_SIZE - + all_rmap_items * sizeof(struct ksm_rmap_item); + + return sysfs_emit(buf, "%ld\n", general_profit); +} +KSM_ATTR_RO(general_profit); + static ssize_t stable_node_dups_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = { &stable_node_dups_attr.attr, &stable_node_chains_prune_millisecs_attr.attr, &use_zero_pages_attr.attr, + &general_profit_attr.attr, NULL, };