Message ID | 1638760762-27239-1-git-send-email-huangzhaoyang@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: mask DIRECT_RECLAIM in kswapd | expand |
On Mon, 6 Dec 2021 11:19:22 +0800 Huangzhaoyang <huangzhaoyang@gmail.com> wrote: > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com> > > As the eg bellowing, using GFP_KERNEL could confuse the registered .releasepage > or .shrinker functions when called in kswapd and have them acting wrongly.Mask > __GFP_DIRECT_RECLAIM in kswapd. > > eg, > kswapd > shrink_page_list > try_to_release_page > __fscache_maybe_release_page > ... > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) { > fscache_stat(&fscache_n_store_vmscan_busy); > return false; > } Well, we have thus far been permitting kswapd's memory allocations to enter direct reclaim. Forbidding that kernel-wide might be the right thing to do, or might not be. But disabling it kernel-wide because of a peculiar hack in fscache is not good justification. > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) > bool boosted; > struct zone *zone; > struct scan_control sc = { > - .gfp_mask = GFP_KERNEL, > + .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, > .order = order, > .may_unmap = 1, > }; Maybe hack the hack like this? --- a/fs/fscache/page.c~a +++ a/fs/fscache/page.c @@ -126,8 +126,10 @@ page_busy: * sleeping on memory allocation, so we may need to impose a timeout * too. */ if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) { - fscache_stat(&fscache_n_store_vmscan_busy); - return false; + if (!current_is_kswapd()) { + fscache_stat(&fscache_n_store_vmscan_busy); + return false; + } } fscache_stat(&fscache_n_store_vmscan_wait);
On Tue, Dec 7, 2021 at 9:23 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 6 Dec 2021 11:19:22 +0800 Huangzhaoyang <huangzhaoyang@gmail.com> wrote: > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com> > > > > As the eg bellowing, using GFP_KERNEL could confuse the registered .releasepage > > or .shrinker functions when called in kswapd and have them acting wrongly.Mask > > __GFP_DIRECT_RECLAIM in kswapd. > > > > eg, > > kswapd > > shrink_page_list > > try_to_release_page > > __fscache_maybe_release_page > > ... > > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) { > > fscache_stat(&fscache_n_store_vmscan_busy); > > return false; > > } > > Well, we have thus far been permitting kswapd's memory allocations to > enter direct reclaim. Forbidding that kernel-wide might be the right > thing to do, or might not be. But disabling it kernel-wide because of > a peculiar hack in fscache is not good justification. By checking the whole path of kswapd reclaiming, I don't find any steps need __GFP_DIRECT_RECLAIM but the hooked slab shrinker and fs's releasepage functions. It doesn't make sense for kswapd be aware of there is a concurrent direct reclaim. > > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) > > bool boosted; > > struct zone *zone; > > struct scan_control sc = { > > - .gfp_mask = GFP_KERNEL, > > + .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, > > .order = order, > > .may_unmap = 1, > > }; > > Maybe hack the hack like this? > > --- a/fs/fscache/page.c~a > +++ a/fs/fscache/page.c > @@ -126,8 +126,10 @@ page_busy: > * sleeping on memory allocation, so we may need to impose a timeout > * too. */ > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) { > - fscache_stat(&fscache_n_store_vmscan_busy); > - return false; > + if (!current_is_kswapd()) { > + fscache_stat(&fscache_n_store_vmscan_busy); > + return false; > + } > } > > fscache_stat(&fscache_n_store_vmscan_wait); This method works. However, there are several other hook functions as below using this flag for judging the context. IMHO, __GFP_DIRECT_RECLAIM just only takes affection for two points. Have page_alloc_slow_path judging if enter direct_reclaim and the reclaimer tell the context. eg. xfs_qm_shrink_scan( ... if ((sc->gfp_mask & (__GFP_FS|__GFP_DIRECT_RECLAIM)) != (__GFP_FS|__GFP_DIRECT_RECLAIM)) return 0; static int ceph_releasepage(struct page *page, gfp_t gfp) ... if (PageFsCache(page)) { if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) return 0; static int afs_releasepage(struct page *page, gfp_t gfp_flags) ... if (PageFsCache(page)) { if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) return false; > _ > > But please, do cc the fscache mailing list and maintainer when mucking > with these things.
Andrew Morton <akpm@linux-foundation.org> wrote: > > __fscache_maybe_release_page > > ... > > if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS)) { > > fscache_stat(&fscache_n_store_vmscan_busy); > > return false; > > } > > Well, we have thus far been permitting kswapd's memory allocations to > enter direct reclaim. Forbidding that kernel-wide might be the right > thing to do, or might not be. But disabling it kernel-wide because of > a peculiar hack in fscache is not good justification. It's avoiding sleeping in ->releasepage() if fscache is doing something with the page. With the old I/O still used by nfs and cifs, PG_fscache means that the page is known to fscache and it might be doing something with it in the background. You have to ask fscache to release the page, which may require I/O to take place, to get rid of the mark. With the new I/O, as used by 9p, afs and ceph, where we're doing async DIO between the page and the cache, PG_fscache just means that there's a DIO write in progress from the page. It will be cleared when the DIO completes. I'm fine with changing the condition in the if-statement. Note that in my fscache-rewrite branch: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite I've been changing this to: if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS)) and the old I/O is gone. This is aimed at the next merge window. If you want me to change it there, let me know. David
diff --git a/mm/vmscan.c b/mm/vmscan.c index ef4a6dc..3b5c5e6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4083,7 +4083,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) bool boosted; struct zone *zone; struct scan_control sc = { - .gfp_mask = GFP_KERNEL, + .gfp_mask = GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, .order = order, .may_unmap = 1, };