diff mbox series

[RFC] mm: don't miss the last page because of round-off error

Message ID 20180817231834.15959-1-guro@fb.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm: don't miss the last page because of round-off error | expand

Commit Message

Roman Gushchin Aug. 17, 2018, 11:18 p.m. UTC
I've noticed, that dying memory cgroups are  often pinned
in memory by a single pagecache page. Even under moderate
memory pressure they sometimes stayed in such state
for a long time. That looked strange.

My investigation showed that the problem is caused by
applying the LRU pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator,
if the initial scan size is 1, the result is always 0.

This means the last page is not scanned and has
no chances to be reclaimed.

Fix this by skipping the balancing logic if the initial
scan count is 1.

In practice this change significantly improves the speed
of dying cgroups reclaim.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
---
 mm/vmscan.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox (Oracle) Aug. 18, 2018, 1:22 a.m. UTC | #1
On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> -			scan = div64_u64(scan * fraction[file],
> -					 denominator);
> +			if (scan > 1)
> +				scan = div64_u64(scan * fraction[file],
> +						 denominator);

Wouldn't we be better off doing a div_round_up?  ie:

	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);

although i'd rather hide that in a new macro in math64.h than opencode it
here.
Roman Gushchin Aug. 20, 2018, 5:19 p.m. UTC | #2
On Fri, Aug 17, 2018 at 06:22:13PM -0700, Matthew Wilcox wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> > -			scan = div64_u64(scan * fraction[file],
> > -					 denominator);
> > +			if (scan > 1)
> > +				scan = div64_u64(scan * fraction[file],
> > +						 denominator);
> 
> Wouldn't we be better off doing a div_round_up?  ie:
> 
> 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> 
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

Good idea! Will do in v2.

Thanks!
Konstantin Khlebnikov Aug. 21, 2018, 5:11 a.m. UTC | #3
On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> -                     scan = div64_u64(scan * fraction[file],
>> -                                      denominator);
>> +                     if (scan > 1)
>> +                             scan = div64_u64(scan * fraction[file],
>> +                                              denominator);
>
> Wouldn't we be better off doing a div_round_up?  ie:
>
>         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

All numbers here should be up to nr_pages * 200 and fit into unsigned long.
I see no reason for u64. If they overflow then u64 wouldn't help either.

There is macro DIV_ROUND_UP in kernel.h
Matthew Wilcox (Oracle) Aug. 21, 2018, 1:35 p.m. UTC | #4
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -                     scan = div64_u64(scan * fraction[file],
> >> -                                      denominator);
> >> +                     if (scan > 1)
> >> +                             scan = div64_u64(scan * fraction[file],
> >> +                                              denominator);
> >
> > Wouldn't we be better off doing a div_round_up?  ie:
> >
> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
> 
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

Shaohua added the div64 usage initially, adding him to the cc.

> There is macro DIV_ROUND_UP in kernel.h

Indeed there is.
Johannes Weiner Aug. 21, 2018, 5:15 p.m. UTC | #5
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -                     scan = div64_u64(scan * fraction[file],
> >> -                                      denominator);
> >> +                     if (scan > 1)
> >> +                             scan = div64_u64(scan * fraction[file],
> >> +                                              denominator);
> >
> > Wouldn't we be better off doing a div_round_up?  ie:
> >
> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
> 
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
to four times of what's on the LRUs. That can overflow a u32 with even
small amounts of memory.
Konstantin Khlebnikov Aug. 22, 2018, 6:01 a.m. UTC | #6
On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
>> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> >> -                     scan = div64_u64(scan * fraction[file],
>> >> -                                      denominator);
>> >> +                     if (scan > 1)
>> >> +                             scan = div64_u64(scan * fraction[file],
>> >> +                                              denominator);
>> >
>> > Wouldn't we be better off doing a div_round_up?  ie:
>> >
>> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>> >
>> > although i'd rather hide that in a new macro in math64.h than opencode it
>> > here.
>>
>> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
>> I see no reason for u64. If they overflow then u64 wouldn't help either.
>
> It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> to four times of what's on the LRUs. That can overflow a u32 with even
> small amounts of memory.

Ah, this thing is inverted because it aims to proportional reactivation rate
rather than the proportional pressure to reclaimable pages.
That's not obvious. I suppose this should be in comment above it.

Well, at least denominator should fit into unsigned long. So full
64/64 division is redundant.
Roman Gushchin Aug. 22, 2018, 5:50 p.m. UTC | #7
On Wed, Aug 22, 2018 at 09:01:19AM +0300, Konstantin Khlebnikov wrote:
> On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> >> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> >> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> >> -                     scan = div64_u64(scan * fraction[file],
> >> >> -                                      denominator);
> >> >> +                     if (scan > 1)
> >> >> +                             scan = div64_u64(scan * fraction[file],
> >> >> +                                              denominator);
> >> >
> >> > Wouldn't we be better off doing a div_round_up?  ie:
> >> >
> >> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >> >
> >> > although i'd rather hide that in a new macro in math64.h than opencode it
> >> > here.
> >>
> >> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> >> I see no reason for u64. If they overflow then u64 wouldn't help either.
> >
> > It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> > to four times of what's on the LRUs. That can overflow a u32 with even
> > small amounts of memory.
> 
> Ah, this thing is inverted because it aims to proportional reactivation rate
> rather than the proportional pressure to reclaimable pages.
> That's not obvious. I suppose this should be in comment above it.
> 
> Well, at least denominator should fit into unsigned long. So full
> 64/64 division is redundant.

In any case it's not related to the original issue and should be
treated separately. I'd like to keep the patch simple to make
backporting to stable easy.

All refactorings can be done separately, if necessarily.

Thanks!
diff mbox series

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..f85c5ec01886 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2287,9 +2287,12 @@  static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			/*
 			 * Scan types proportional to swappiness and
 			 * their relative recent reclaim efficiency.
+			 * Make sure we don't miss the last page
+			 * because of a round-off error.
 			 */
-			scan = div64_u64(scan * fraction[file],
-					 denominator);
+			if (scan > 1)
+				scan = div64_u64(scan * fraction[file],
+						 denominator);
 			break;
 		case SCAN_FILE:
 		case SCAN_ANON: