Message ID | 20161227155532.GI1308@dhcp22.suse.cz (mailing list archive) |
---|---|
State | Not Applicable |
Hi Michal,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.10-rc1 next-20161224]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-vmscan-consider-eligible-zones-in-get_scan_count/20161228-000917
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   mm/vmscan.c: In function 'lruvec_lru_size_zone_idx':
>> mm/vmscan.c:264:10: error: implicit declaration of function 'lruvec_zone_lru_size' [-Werror=implicit-function-declaration]
      size = lruvec_zone_lru_size(lruvec, lru, zid);
             ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/lruvec_zone_lru_size +264 mm/vmscan.c

   258		struct zone *zone = &pgdat->node_zones[zid];
   259		unsigned long size;
   260
   261		if (!managed_zone(zone))
   262			continue;
   263
 > 264		size = lruvec_zone_lru_size(lruvec, lru, zid);
   265		lru_size -= min(size, lru_size);
   266	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> Hi,
> could you try to run with the following patch on top of the previous
> one? I do not think it will make a large change in your workload but
> I think we need something like that so some testing under which is known
> to make a high lowmem pressure would be really appreciated. If you have
> more time to play with it then running with and without the patch with
> mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> whether it make any difference at all.

Of course, no problem!

First, about the events to trace: mm_vmscan_direct_reclaim_start
doesn't seem to exist, but mm_vmscan_direct_reclaim_begin does. I'm
sure that's what you meant, and so I took that one instead.

Then I have to admit that in both cases (once without the latest patch,
once with) very little trace data was actually produced. In the case
without the patch, the reclaim was started more often and reclaimed a
smaller number of pages each time; in the case with the patch it was
invoked less often, and the last time it was invoked it reclaimed a
rather big number of pages. I have no clue, however, if that happened
"by chance" or if it was actually caused by the patch and is thus an
expected change.

In both cases, my test case was: reboot, set up logging, do "emerge
firefox" (which unpacks and builds the firefox sources), then, when the
emerge had come far enough that the unpacking was done and the building
had started, switch to another console and untar the latest kernel,
libreoffice and (once more) firefox sources there. After that had
completed, I aborted the emerge build process and stopped tracing.

Here's the trace data captured without the latest patch applied:

khugepaged-22 [000] .... 566.123383: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [000] .N.. 566.165520: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
khugepaged-22 [001] .... 587.515424: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [000] .... 587.596035: mm_vmscan_direct_reclaim_end: nr_reclaimed=1029
khugepaged-22 [001] .... 599.879536: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [000] .... 601.000812: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
khugepaged-22 [001] .... 601.228137: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 601.309952: mm_vmscan_direct_reclaim_end: nr_reclaimed=1081
khugepaged-22 [001] .... 694.935267: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .N.. 695.081943: mm_vmscan_direct_reclaim_end: nr_reclaimed=1071
khugepaged-22 [001] .... 701.370707: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 701.372798: mm_vmscan_direct_reclaim_end: nr_reclaimed=1089
khugepaged-22 [001] .... 764.752036: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [000] .... 771.047905: mm_vmscan_direct_reclaim_end: nr_reclaimed=1039
khugepaged-22 [000] .... 781.760515: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 781.826543: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22 [001] .... 782.595575: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [000] .... 782.638591: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22 [001] .... 782.930455: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 782.993608: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
khugepaged-22 [001] .... 783.330378: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 783.369653: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040

And this is the same with the patch applied:

khugepaged-22 [001] .... 523.599997: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 523.683110: mm_vmscan_direct_reclaim_end: nr_reclaimed=1092
khugepaged-22 [001] .... 535.345477: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 535.401189: mm_vmscan_direct_reclaim_end: nr_reclaimed=1078
khugepaged-22 [000] .... 692.876716: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
khugepaged-22 [001] .... 703.312399: mm_vmscan_direct_reclaim_end: nr_reclaimed=197759

If my test case and thus the results don't sound good, I could of
course try some other test cases ... like capturing for a longer period
of time, or trying to produce more memory pressure by running more
processes at the same time, or something like that.

Besides that, I can say that the patch hasn't produced any warnings or
other issues so far, so at first glance it doesn't seem to hurt
anything.

Greetings
Nils
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 28-12-16 00:28:38, kbuild test robot wrote:
> Hi Michal,
>
> [auto build test ERROR on mmotm/master]
> [also build test ERROR on v4.10-rc1 next-20161224]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-vmscan-consider-eligible-zones-in-get_scan_count/20161228-000917
> base:   git://git.cmpxchg.org/linux-mmotm.git master
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>    mm/vmscan.c: In function 'lruvec_lru_size_zone_idx':
> >> mm/vmscan.c:264:10: error: implicit declaration of function 'lruvec_zone_lru_size' [-Werror=implicit-function-declaration]
>       size = lruvec_zone_lru_size(lruvec, lru, zid);

this patch depends on the previous one:
http://lkml.kernel.org/r/20161226124839.GB20715@dhcp22.suse.cz
On Tue 27-12-16 20:33:09, Nils Holland wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload but
> > I think we need something like that so some testing under which is known
> > to make a high lowmem pressure would be really appreciated. If you have
> > more time to play with it then running with and without the patch with
> > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > whether it make any difference at all.
>
> Of course, no problem!
>
> First, about the events to trace: mm_vmscan_direct_reclaim_start
> doesn't seem to exist, but mm_vmscan_direct_reclaim_begin does. I'm
> sure that's what you meant, and so I took that one instead.

yes, sorry about the confusion

> Then I have to admit that in both cases (once without the latest patch,
> once with) very little trace data was actually produced. In the case
> without the patch, the reclaim was started more often and reclaimed a
> smaller number of pages each time; in the case with the patch it was
> invoked less often, and the last time it was invoked it reclaimed a
> rather big number of pages. I have no clue, however, if that happened
> "by chance" or if it was actually caused by the patch and is thus an
> expected change.

yes, that seems to be a variation of the workload, I would say, because
if anything the patch should reduce the number of scanned pages.

> In both cases, my test case was: reboot, set up logging, do "emerge
> firefox" (which unpacks and builds the firefox sources), then, when the
> emerge had come far enough that the unpacking was done and the building
> had started, switch to another console and untar the latest kernel,
> libreoffice and (once more) firefox sources there. After that had
> completed, I aborted the emerge build process and stopped tracing.
>
> Here's the trace data captured without the latest patch applied:
>
> khugepaged-22 [000] .... 566.123383: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .N.. 566.165520: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22 [001] .... 587.515424: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 587.596035: mm_vmscan_direct_reclaim_end: nr_reclaimed=1029
> khugepaged-22 [001] .... 599.879536: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 601.000812: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22 [001] .... 601.228137: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 601.309952: mm_vmscan_direct_reclaim_end: nr_reclaimed=1081
> khugepaged-22 [001] .... 694.935267: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .N.. 695.081943: mm_vmscan_direct_reclaim_end: nr_reclaimed=1071
> khugepaged-22 [001] .... 701.370707: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 701.372798: mm_vmscan_direct_reclaim_end: nr_reclaimed=1089
> khugepaged-22 [001] .... 764.752036: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 771.047905: mm_vmscan_direct_reclaim_end: nr_reclaimed=1039
> khugepaged-22 [000] .... 781.760515: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 781.826543: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 782.595575: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 782.638591: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 782.930455: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 782.993608: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 783.330378: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 783.369653: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
>
> And this is the same with the patch applied:
>
> khugepaged-22 [001] .... 523.599997: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 523.683110: mm_vmscan_direct_reclaim_end: nr_reclaimed=1092
> khugepaged-22 [001] .... 535.345477: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 535.401189: mm_vmscan_direct_reclaim_end: nr_reclaimed=1078
> khugepaged-22 [000] .... 692.876716: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 703.312399: mm_vmscan_direct_reclaim_end: nr_reclaimed=197759

In these cases there is no real difference, because this is not lowmem
pressure; those requests can go to the highmem zone.

> If my test case and thus the results don't sound good, I could of
> course try some other test cases ... like capturing for a longer period
> of time, or trying to produce more memory pressure by running more
> processes at the same time, or something like that.

yes, a stronger memory pressure would be needed. I suspect that your
original issue was more about active list aging than really strong
memory pressure, so it might be possible that your workload will not
notice. If you can collect those two tracepoints over a longer time it
can still tell us something, but I do not want you to burn a lot of
time on this. The main issue seems to be fixed, and the follow-up fix
can wait for a thorough review after both Mel and Johannes are back
from holiday.

> Besides that, I can say that the patch hasn't produced any warnings or
> other issues so far, so at first glance it doesn't seem to hurt
> anything.

Thanks!
On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> Hi,
> could you try to run with the following patch on top of the previous
> one? I do not think it will make a large change in your workload but
> I think we need something like that so some testing under which is known
> to make a high lowmem pressure would be really appreciated. If you have
> more time to play with it then running with and without the patch with
> mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> whether it make any difference at all.
>
> I would also appreciate if Mel and Johannes had a look at it. I am not
> yet sure whether we need the same thing for anon/file balancing in
> get_scan_count. I suspect we need but need to think more about that.
>
> Thanks a lot again!
> ---
> From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 27 Dec 2016 16:28:44 +0100
> Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
>
> get_scan_count considers the whole node LRU size when
> - doing SCAN_FILE due to many page cache inactive pages
> - calculating the number of pages to scan
>
> in both cases this might lead to unexpected behavior especially on 32b
> systems where we can expect lowmem memory pressure very often.
>
> A large highmem zone can easily distort SCAN_FILE heuristic because
> there might be only few file pages from the eligible zones on the node
> lru and we would still enforce file lru scanning which can lead to
> trashing while we could still scan anonymous pages.

Nit:
It doesn't cause thrashing, because isolate_lru_pages filters them out,
but I agree it makes for pointless CPU burning to find eligible pages.

> The later use of lruvec_lru_size can be problematic as well. Especially
> when there are not many pages from the eligible zones. We would have to
> skip over many pages to find anything to reclaim but shrink_node_memcg
> would only reduce the remaining number to scan by SWAP_CLUSTER_MAX
> at maximum. Therefore we can end up going over a large LRU many times
> without actually having chance to reclaim much if anything at all. The
> closer we are out of memory on lowmem zone the worse the problem will
> be.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmscan.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c98b1a585992..785b4d7fb8a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int
>  }
>
>  /*
> + * Return the number of pages on the given lru which are eligibne for the

eligible

> + * given zone_idx
> + */
> +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> +					enum lru_list lru, int zone_idx)

Nit:

Although there is a comment, the function name is rather confusing when
compared with lruvec_zone_lru_size. Would lruvec_eligible_zones_lru_size
be better?

> +{
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +	unsigned long lru_size;
> +	int zid;
> +
> +	lru_size = lruvec_lru_size(lruvec, lru);
> +	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> +		struct zone *zone = &pgdat->node_zones[zid];
> +		unsigned long size;
> +
> +		if (!managed_zone(zone))
> +			continue;
> +
> +		size = lruvec_zone_lru_size(lruvec, lru, zid);
> +		lru_size -= min(size, lru_size);
> +	}
> +
> +	return lru_size;
> +}
> +
> +/*
>   * Add a shrinker callback to be called from the vm.
>   */
>  int register_shrinker(struct shrinker *shrinker)
> @@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * system is under heavy pressure.
>  	 */
>  	if (!inactive_list_is_low(lruvec, true, sc) &&
> -	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
> +	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
>  		scan_balance = SCAN_FILE;
>  		goto out;
>  	}
> @@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		unsigned long size;
>  		unsigned long scan;
>
> -		size = lruvec_lru_size(lruvec, lru);
> +		size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
>  		scan = size >> sc->priority;
>
>  		if (!scan && pass && force_scan)
> --
> 2.10.2

Nit:

With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx
rather than its own custom calculation to filter out non-eligible pages.

Anyway, I think this patch does the right things, so I support it.

Acked-by: Minchan Kim <minchan@kernel.org>
On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload but
> > I think we need something like that so some testing under which is known
> > to make a high lowmem pressure would be really appreciated. If you have
> > more time to play with it then running with and without the patch with
> > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > whether it make any difference at all.
> >
> > I would also appreciate if Mel and Johannes had a look at it. I am not
> > yet sure whether we need the same thing for anon/file balancing in
> > get_scan_count. I suspect we need but need to think more about that.
> >
> > Thanks a lot again!
> > ---
> > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> >
> > get_scan_count considers the whole node LRU size when
> > - doing SCAN_FILE due to many page cache inactive pages
> > - calculating the number of pages to scan
> >
> > in both cases this might lead to unexpected behavior especially on 32b
> > systems where we can expect lowmem memory pressure very often.
> >
> > A large highmem zone can easily distort SCAN_FILE heuristic because
> > there might be only few file pages from the eligible zones on the node
> > lru and we would still enforce file lru scanning which can lead to
> > trashing while we could still scan anonymous pages.
>
> Nit:
> It doesn't cause thrashing, because isolate_lru_pages filters them out,
> but I agree it makes for pointless CPU burning to find eligible pages.

This is not about isolate_lru_pages. The thrashing could happen if we
had a lowmem pagecache user which would constantly reclaim recently
faulted-in pages while there is anonymous memory in lowmem which could
be reclaimed instead.

[...]
> > /*
> > + * Return the number of pages on the given lru which are eligibne for the
>
> eligible

fixed

> > + * given zone_idx
> > + */
> > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > +					enum lru_list lru, int zone_idx)
>
> Nit:
>
> Although there is a comment, the function name is rather confusing when
> compared with lruvec_zone_lru_size.

I am all for a better name.

> Would lruvec_eligible_zones_lru_size be better?

This would be too easy to confuse with lruvec_eligible_zone_lru_size.
What about lruvec_lru_size_eligible_zones?

> Nit:
>
> With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx
> rather than its own custom calculation to filter out non-eligible pages.

Yes, that would be possible and I was considering it. But then I found
it useful to see both the total and the reduced numbers in the tracepoint
http://lkml.kernel.org/r/20161228153032.10821-8-mhocko@kernel.org
and didn't want to call lruvec_lru_size twice. But if you insist then I
can just do that.

> Anyway, I think this patch does the right things, so I support it.
>
> Acked-by: Minchan Kim <minchan@kernel.org>

Thanks for the review!
On Thu, Dec 29, 2016 at 10:04:32AM +0100, Michal Hocko wrote:
> On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> > On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > > Hi,
> > > could you try to run with the following patch on top of the previous
> > > one? I do not think it will make a large change in your workload but
> > > I think we need something like that so some testing under which is known
> > > to make a high lowmem pressure would be really appreciated. If you have
> > > more time to play with it then running with and without the patch with
> > > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > > whether it make any difference at all.
> > >
> > > I would also appreciate if Mel and Johannes had a look at it. I am not
> > > yet sure whether we need the same thing for anon/file balancing in
> > > get_scan_count. I suspect we need but need to think more about that.
> > >
> > > Thanks a lot again!
> > > ---
> > > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> > >
> > > get_scan_count considers the whole node LRU size when
> > > - doing SCAN_FILE due to many page cache inactive pages
> > > - calculating the number of pages to scan
> > >
> > > in both cases this might lead to unexpected behavior especially on 32b
> > > systems where we can expect lowmem memory pressure very often.
> > >
> > > A large highmem zone can easily distort SCAN_FILE heuristic because
> > > there might be only few file pages from the eligible zones on the node
> > > lru and we would still enforce file lru scanning which can lead to
> > > trashing while we could still scan anonymous pages.
> >
> > Nit:
> > It doesn't cause thrashing, because isolate_lru_pages filters them out,
> > but I agree it makes for pointless CPU burning to find eligible pages.
>
> This is not about isolate_lru_pages. The thrashing could happen if we
> had a lowmem pagecache user which would constantly reclaim recently
> faulted-in pages while there is anonymous memory in lowmem which could
> be reclaimed instead.
>
> [...]
> > > /*
> > > + * Return the number of pages on the given lru which are eligibne for the
> >
> > eligible
>
> fixed
>
> > > + * given zone_idx
> > > + */
> > > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > > +					enum lru_list lru, int zone_idx)
> >
> > Nit:
> >
> > Although there is a comment, the function name is rather confusing when
> > compared with lruvec_zone_lru_size.
>
> I am all for a better name.
>
> > Would lruvec_eligible_zones_lru_size be better?
>
> This would be too easy to confuse with lruvec_eligible_zone_lru_size.
> What about lruvec_lru_size_eligible_zones?

I don't mind.

> > Nit:
> >
> > With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx
> > rather than its own custom calculation to filter out non-eligible pages.
>
> Yes, that would be possible and I was considering it. But then I found
> it useful to see both the total and the reduced numbers in the tracepoint
> http://lkml.kernel.org/r/20161228153032.10821-8-mhocko@kernel.org
> and didn't want to call lruvec_lru_size twice. But if you insist then I
> can just do that.

I don't mind either, but I think we need to describe the reason if you
want to go with your open-coded version. Otherwise, someone will try to
"fix" it.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c98b1a585992..785b4d7fb8a0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int
 }
 
 /*
+ * Return the number of pages on the given lru which are eligibne for the
+ * given zone_idx
+ */
+static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
+					enum lru_list lru, int zone_idx)
+{
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+	unsigned long lru_size;
+	int zid;
+
+	lru_size = lruvec_lru_size(lruvec, lru);
+	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
+		struct zone *zone = &pgdat->node_zones[zid];
+		unsigned long size;
+
+		if (!managed_zone(zone))
+			continue;
+
+		size = lruvec_zone_lru_size(lruvec, lru, zid);
+		lru_size -= min(size, lru_size);
+	}
+
+	return lru_size;
+}
+
+/*
  * Add a shrinker callback to be called from the vm.
  */
 int register_shrinker(struct shrinker *shrinker)
@@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * system is under heavy pressure.
 	 */
 	if (!inactive_list_is_low(lruvec, true, sc) &&
-	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
+	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		unsigned long size;
 		unsigned long scan;
 
-		size = lruvec_lru_size(lruvec, lru);
+		size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
 		scan = size >> sc->priority;
 
 		if (!scan && pass && force_scan)