Message ID | 20220116152150.859520-1-yang.yang29@zte.com.cn (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | psi: Treat ksm swapping in copy as memstall | expand |
On Mon, Jan 17, 2022 at 07:14:53AM -0500, Johannes Weiner wrote: > Hello Yang, > > On Sun, Jan 16, 2022 at 03:21:51PM +0000, cgel.zte@gmail.com wrote: > > From: Yang Yang <yang.yang29@zte.com.cn> > > > > When faults in from swap what used to be a ksm page and that page > > had been swapped in before, system has to make a copy. Obviously > > this kind of copy is related to high memory pressure, so we treat > > it as memstall. Although ksm page merging is not because of high > > memory pressure. > > > > Information of this new kind of stall will help psi to account > > memory pressure more precise. > > Thanks for your patch. I'm curious if you have a concrete use case > where this makes a difference, or if this is something you found while > reading the code? > Thanks for your reply. I found it while reading the code, and did a test, please see below. > > Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> > > --- > > mm/ksm.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/mm/ksm.c b/mm/ksm.c > > index 4a7f8614e57d..d4ec6773f9b8 100644 > > --- a/mm/ksm.c > > +++ b/mm/ksm.c > > @@ -39,6 +39,7 @@ > > #include <linux/freezer.h> > > #include <linux/oom.h> > > #include <linux/numa.h> > > +#include <linux/psi.h> > > > > #include <asm/tlbflush.h> > > #include "internal.h" > > @@ -2569,6 +2570,7 @@ struct page *ksm_might_need_to_copy(struct page *page, > > { > > struct anon_vma *anon_vma = page_anon_vma(page); > > struct page *new_page; > > + unsigned long pflags; > > > > if (PageKsm(page)) { > > if (page_stable_node(page) && > > @@ -2583,6 +2585,7 @@ struct page *ksm_might_need_to_copy(struct page *page, > > if (!PageUptodate(page)) > > return page; /* let do_swap_page report the error */ > > > > + psi_memstall_enter(&pflags); > > new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); > > if (new_page && > > mem_cgroup_charge(page_folio(new_page), vma->vm_mm, GFP_KERNEL)) { > > @@ -2600,6 +2603,7 @@ struct page *ksm_might_need_to_copy(struct page *page, > > #endif > > } > > > > + psi_memstall_leave(&pflags); > > This does unconditional stall accounting for a swapin operation. But > if you take a look at workingset_refault() -> folio_wait_bit_common(), > we only count memory stalls when the page is thrashing, not when it's > a transitionary refault (which happen even when there is enough memory > to hold the workingset). You need to check PageWorkingset() at least. > I see PSI already does stall accounting for a swapin operation of zram in most conditions. See swap_readpage(), it calls psi_memstall_enter(). > But again I'd be curious first if this is a practical concern. Swapins > should be IO dominated - or in the case of zswap dominated by the > decompression. Does a page copy really matter? I did a test, when we use zram, it takes longer time for ksm copying than swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() average takes 55639ns. So I think this patch is reasonable. I use ktime_get_ts64() for time calcuating in swap_readpage() and ksm_might_need_to_copy(). Code likes: psi_memstall_enter() ktime_get_ts64(&ts_start) //ksm copy or swapin ktime_get_ts64(&ts_end) time = timespec64_sub(ts_end, ts_start) psi_memstall_leave()
Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > I did a test, when we use zram, it takes longer time for ksm copying than > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > average takes 55639ns. So I think this patch is reasonable. > > Ok, that sounds reasonable to me as well. Please add the > PageWorkingset() check and resubmit the patch. Thanks! I am a litte confused about adding PageWorkingset(), since I think ksm_might_need_to_copy() memstall is like swap_readpage() memstall and swap_readpage() doesn't add PageWorkingset(). Would please make it cleaner? Thanks!
On Thu, Jan 27, 2022 at 08:29:08PM -0500, Johannes Weiner wrote: > On Fri, Jan 21, 2022 at 09:51:08AM +0000, CGEL wrote: > > Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > > > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > > > I did a test, when we use zram, it takes longer time for ksm copying than > > > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > > > average takes 55639ns. So I think this patch is reasonable. > > > > > > Ok, that sounds reasonable to me as well. Please add the > > > PageWorkingset() check and resubmit the patch. Thanks! > > I am a litte confused about adding PageWorkingset(), since I > > think ksm_might_need_to_copy() memstall is like swap_readpage() > > memstall and swap_readpage() doesn't add PageWorkingset(). > > That's actually a bug! It should do that. I recently found that too. Please CC to me your new patch, thanks! And I will send V2 of this patch "psi: Treat ksm swapping in copy as memstall" with PageWorkingset(). > That psi_memstall_enter() in swap_readpage() was added by commit > 937790699be9c8100e5358625e7dfa8b32bd33f2. It's for catching the > scenarios that don't go through submit_bio(). When you look at > submit_bio(), it counts stalls only if we have workingset pages: > > /* > * If we're reading data that is part of the userspace workingset, count > * submission time as memory stall. When the device is congested, or > * the submitting cgroup IO-throttled, submission can be a significant > * part of overall IO time. > */ > if (unlikely(bio_op(bio) == REQ_OP_READ && > bio_flagged(bio, BIO_WORKINGSET))) { > unsigned long pflags; > > psi_memstall_enter(&pflags); > submit_bio_noacct(bio); > psi_memstall_leave(&pflags); > return; > } > > I hope that clarifies it. I'll send a patch to fix up swap_readpage(). > > Thanks
On Fri, 28 Jan 2022, CGEL wrote: > On Thu, Jan 27, 2022 at 08:29:08PM -0500, Johannes Weiner wrote: > > On Fri, Jan 21, 2022 at 09:51:08AM +0000, CGEL wrote: > > > Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > > > > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > > > > I did a test, when we use zram, it takes longer time for ksm copying than > > > > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > > > > average takes 55639ns. So I think this patch is reasonable. > > > > > > > > Ok, that sounds reasonable to me as well. Please add the > > > > PageWorkingset() check and resubmit the patch. Thanks! > > > I am a litte confused about adding PageWorkingset(), since I > > > think ksm_might_need_to_copy() memstall is like swap_readpage() > > > memstall and swap_readpage() doesn't add PageWorkingset(). > > > > That's actually a bug! It should do that. > I recently found that too. Please CC to me your new patch, thanks! > And I will send V2 of this patch "psi: Treat ksm swapping in copy > as memstall" with PageWorkingset(). I'm entirely PSI-ignorant, and reluctant to disagree with Johannes, but I don't see how your patch to ksm_might_need_to_copy() could be correct - maybe the "swapping" in your subject is confusing. There is no PSI enter and exit around the page allocation and copying in wp_page_copy(), so why in the analogous ksm_might_need_to_copy()? Hugh
On Mon, Feb 07, 2022 at 07:22:22PM -0800, Hugh Dickins wrote: > On Fri, 28 Jan 2022, CGEL wrote: > > On Thu, Jan 27, 2022 at 08:29:08PM -0500, Johannes Weiner wrote: > > > On Fri, Jan 21, 2022 at 09:51:08AM +0000, CGEL wrote: > > > > Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > > > > > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > > > > > I did a test, when we use zram, it takes longer time for ksm copying than > > > > > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > > > > > average takes 55639ns. So I think this patch is reasonable. > > > > > > > > > > Ok, that sounds reasonable to me as well. Please add the > > > > > PageWorkingset() check and resubmit the patch. Thanks! > > > > I am a litte confused about adding PageWorkingset(), since I > > > > think ksm_might_need_to_copy() memstall is like swap_readpage() > > > > memstall and swap_readpage() doesn't add PageWorkingset(). > > > > > > That's actually a bug! It should do that. > > I recently found that too. Please CC to me your new patch, thanks! > > And I will send V2 of this patch "psi: Treat ksm swapping in copy > > as memstall" with PageWorkingset(). > > I'm entirely PSI-ignorant, and reluctant to disagree with Johannes, > but I don't see how your patch to ksm_might_need_to_copy() could be > correct - maybe the "swapping" in your subject is confusing. > > There is no PSI enter and exit around the page allocation and copying > in wp_page_copy(), so why in the analogous ksm_might_need_to_copy()? > I think it's two questions, first why PSI didn't treat wp_page_copy() as memstall, second why should PSI treat ksm_might_need_to_copy() as memstall. The first question is unrelated with this patch. I think the reason is PSI focous on memory contending(see Documentation/accounting/psi.rst), and wp_page_copy() is not caused by memory contending. Actually wp_page_copy() will still be called if memory is not contending. For the second question, ksm_might_need_to_copy() is called only becaused of swapping, and swap is caused by memory contending, so PSI better treat it as memstall. Thanks. > Hugh
On Tue, 8 Feb 2022, CGEL wrote: > On Mon, Feb 07, 2022 at 07:22:22PM -0800, Hugh Dickins wrote: > > On Fri, 28 Jan 2022, CGEL wrote: > > > On Thu, Jan 27, 2022 at 08:29:08PM -0500, Johannes Weiner wrote: > > > > On Fri, Jan 21, 2022 at 09:51:08AM +0000, CGEL wrote: > > > > > Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > > > > > > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > > > > > > I did a test, when we use zram, it takes longer time for ksm copying than > > > > > > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > > > > > > average takes 55639ns. So I think this patch is reasonable. > > > > > > > > > > > > Ok, that sounds reasonable to me as well. Please add the > > > > > > PageWorkingset() check and resubmit the patch. Thanks! > > > > > I am a litte confused about adding PageWorkingset(), since I > > > > > think ksm_might_need_to_copy() memstall is like swap_readpage() > > > > > memstall and swap_readpage() doesn't add PageWorkingset(). > > > > > > > > That's actually a bug! It should do that. > > > I recently found that too. Please CC to me your new patch, thanks! > > > And I will send V2 of this patch "psi: Treat ksm swapping in copy > > > as memstall" with PageWorkingset(). > > > > I'm entirely PSI-ignorant, and reluctant to disagree with Johannes, > > but I don't see how your patch to ksm_might_need_to_copy() could be > > correct - maybe the "swapping" in your subject is confusing. > > > > There is no PSI enter and exit around the page allocation and copying > > in wp_page_copy(), so why in the analogous ksm_might_need_to_copy()? > > > I think it's two questions, first why PSI didn't treat wp_page_copy() as > memstall, second why should PSI treat ksm_might_need_to_copy() as memstall. > > The first question is unrelated with this patch. I think the reason is PSI > focous on memory contending(see Documentation/accounting/psi.rst), and > wp_page_copy() is not caused by memory contending. Actually wp_page_copy() > will still be called if memory is not contending. Agreed. > > For the second question, ksm_might_need_to_copy() is called only becaused > of swapping, and swap is caused by memory contending, so PSI better treat > it as memstall. But there I'm not at all convinced. * psi_memstall_enter - mark the beginning of a memory stall section * @flags: flags to handle nested sections * * Marks the calling task as being stalled due to a lack of memory, * such as waiting for a refault or performing reclaim. psi_memstall_enter() will have been called if do_swap_page() had to read back from swap or wait on page lock; and psi_memstall_enter() will be called if ksm_might_need_to_copy()'s alloc_page_vma() or mem_cgroup_charge() goes into page reclaim. Being stalled due to a lack of memory is fully covered there. Your argument is that copy_user_highpage() is a significant overhead (but not a memstall), and it might have been avoided if there was no KSM or no swapping - though doing the copy there often(?) saves doing the wp_page_copy() when do_swap_page() goes on to do_wp_page() (and that one you're not asking to count). I can understand you wanting to keep track of page copying overhead; and I've grown uneasy recently about the way CONFIG_KSM=y can add a ksm_might_need_to_copy() overhead for pages which never went near KSM; but I still don't see how psi_memstall is appropriate here. Hugh
On Tue, Feb 08, 2022 at 09:55:48PM -0800, Hugh Dickins wrote: > On Tue, 8 Feb 2022, CGEL wrote: > > On Mon, Feb 07, 2022 at 07:22:22PM -0800, Hugh Dickins wrote: > > > On Fri, 28 Jan 2022, CGEL wrote: > > > > On Thu, Jan 27, 2022 at 08:29:08PM -0500, Johannes Weiner wrote: > > > > > On Fri, Jan 21, 2022 at 09:51:08AM +0000, CGEL wrote: > > > > > > Wed, Jan 19, 2022 at 07:58:23AM -0500, Johannes Weiner wrote: > > > > > > > On Wed, Jan 19, 2022 at 06:13:54AM +0000, CGEL wrote: > > > > > > > > I did a test, when we use zram, it takes longer time for ksm copying than > > > > > > > > swap_readpage(). Ksm copying average takes 147263ns, swap_readpage() > > > > > > > > average takes 55639ns. So I think this patch is reasonable. > > > > > > > > > > > > > > Ok, that sounds reasonable to me as well. Please add the > > > > > > > PageWorkingset() check and resubmit the patch. Thanks! > > > > > > I am a litte confused about adding PageWorkingset(), since I > > > > > > think ksm_might_need_to_copy() memstall is like swap_readpage() > > > > > > memstall and swap_readpage() doesn't add PageWorkingset(). > > > > > > > > > > That's actually a bug! It should do that. > > > > I recently found that too. Please CC to me your new patch, thanks! > > > > And I will send V2 of this patch "psi: Treat ksm swapping in copy > > > > as memstall" with PageWorkingset(). > > > > > > I'm entirely PSI-ignorant, and reluctant to disagree with Johannes, > > > but I don't see how your patch to ksm_might_need_to_copy() could be > > > correct - maybe the "swapping" in your subject is confusing. > > > > > > There is no PSI enter and exit around the page allocation and copying > > > in wp_page_copy(), so why in the analogous ksm_might_need_to_copy()? > > > > > I think it's two questions, first why PSI didn't treat wp_page_copy() as > > memstall, second why should PSI treat ksm_might_need_to_copy() as memstall. > > > > The first question is unrelated with this patch. I think the reason is PSI > > focous on memory contending(see Documentation/accounting/psi.rst), and > > wp_page_copy() is not caused by memory contending. Actually wp_page_copy() > > will still be called if memory is not contending. > > Agreed. > > > > > For the second question, ksm_might_need_to_copy() is called only becaused > > of swapping, and swap is caused by memory contending, so PSI better treat > > it as memstall. > > But there I'm not at all convinced. > > * psi_memstall_enter - mark the beginning of a memory stall section > * @flags: flags to handle nested sections > * > * Marks the calling task as being stalled due to a lack of memory, > * such as waiting for a refault or performing reclaim. > > psi_memstall_enter() will have been called if do_swap_page() had to > read back from swap or wait on page lock; and psi_memstall_enter() > will be called if ksm_might_need_to_copy()'s alloc_page_vma() or > mem_cgroup_charge() goes into page reclaim. Being stalled due to > a lack of memory is fully covered there. PSI counts "read back from swap time" as memstall time, because the reason why we had to "read back from swap" is memory contending had happened before and caused swap out. ksm_might_need_to_copy() is just like it, we had to do this copy because of swap out happened before, not because ksm_might_need_to_copy() may goes into page reclaim. And if page reclaim happens, doesn't matter, psi_memstall_enter() will handle nested sections. > Your argument is that copy_user_highpage() is a significant overhead > (but not a memstall), and it might have been avoided if there was > no KSM or no swapping - though doing the copy there often(?) saves > doing the wp_page_copy() when do_swap_page() goes on to do_wp_page() > (and that one you're not asking to count). "Doing the copy" doesn't always means "saves doing the wp_page_copy()". For example, if there is no FAULT_FLAG_WRITE flag. What about we only count this case as memstall: ksm_might_need_to_copy() do the copy and not "saves doing the wp_page_copy()"? As this copy takes 2 times longer than swap_readpage() when use zram(see early mail on this maillist), we'd better not just leave it. > I can understand you wanting to keep track of page copying overhead; > and I've grown uneasy recently about the way CONFIG_KSM=y can add a > ksm_might_need_to_copy() overhead for pages which never went near KSM; > but I still don't see how psi_memstall is appropriate here. For "ksm_might_need_to_copy() overhead for pages which never went near KSM", is there a plan to optimize it? Or should I work on this? Thanks. > Hugh
diff --git a/mm/ksm.c b/mm/ksm.c index 4a7f8614e57d..d4ec6773f9b8 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -39,6 +39,7 @@ #include <linux/freezer.h> #include <linux/oom.h> #include <linux/numa.h> +#include <linux/psi.h> #include <asm/tlbflush.h> #include "internal.h" @@ -2569,6 +2570,7 @@ struct page *ksm_might_need_to_copy(struct page *page, { struct anon_vma *anon_vma = page_anon_vma(page); struct page *new_page; + unsigned long pflags; if (PageKsm(page)) { if (page_stable_node(page) && @@ -2583,6 +2585,7 @@ struct page *ksm_might_need_to_copy(struct page *page, if (!PageUptodate(page)) return page; /* let do_swap_page report the error */ + psi_memstall_enter(&pflags); new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); if (new_page && mem_cgroup_charge(page_folio(new_page), vma->vm_mm, GFP_KERNEL)) { @@ -2600,6 +2603,7 @@ struct page *ksm_might_need_to_copy(struct page *page, #endif } + psi_memstall_leave(&pflags); return new_page; }