Message ID | 163712397076.13692.4727608274002939094@noble.neil.brown.name (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | MM: discard __GFP_ATOMIC | expand |
On Wed, Nov 17, 2021 at 03:39:30PM +1100, NeilBrown wrote: > +++ b/drivers/iommu/tegra-smmu.c > @@ -676,12 +676,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as, > * allocate page in a sleeping context if GFP flags permit. Hence > * spinlock needs to be unlocked and re-locked after allocation. > */ > - if (!(gfp & __GFP_ATOMIC)) > + if (gfp & __GFP_DIRECT_RECLAIM) > spin_unlock_irqrestore(&as->lock, *flags); > > page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); > > - if (!(gfp & __GFP_ATOMIC)) > + if (gfp & __GFP_DIRECT_RECLAIM) > spin_lock_irqsave(&as->lock, *flags); > > /* Surely this should be gfpflags_allow_blocking() instead of poking about in the innards of gfp flags? This patch seems like a good simplification to me. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[Cc Mel] On Wed 17-11-21 15:39:30, Neil Brown wrote: > > __GFP_ATOMIC serves little purpose. > It's main effect is to set ALLOC_HARDER which adds a few little boosts to > increase the chance of an allocation succeeding, one of which is to > lower the water-mark at which it will succeed. > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > adjusts this watermark. It is probable that other users of __GFP_HIGH > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > There is little point to this. We already get a might_sleep() warning > if __GFP_DIRECT_RECLAIM is set. > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > probable that testing ALLOC_HARDER is a better fit here. > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > This patch: > - removes __GFP_ATOMIC > - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set > (as well as ALLOC_HIGH). > - makes other adjustments as suggested by the above. > > The net result is not change to GFP_ATOMIC allocations. Other > allocations that use __GFP_HIGH will benefit from a few different extra > privileges. This affects: > xen, dm, md, ntfs3 > the vermillion frame buffer > hibernation > ksm > swap > all of which likely produce more benefit than cost if these selected > allocation are more likely to succeed quickly. > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > > This patch came out of my attempts to review and document some of the > __GFP flags. I couldn't find that __GFP_ATOMIC meant anything really > useful, so I thought that discarding it would be best. > > Obviously GFP_ATOMIC is useful - just not __GFP_ATOMIC (and the fact > that both names exist could be seen as a warning sign). > > Thanks, > NeilBrown > > > Documentation/vm/balance.rst | 2 +- > drivers/iommu/tegra-smmu.c | 4 ++-- > include/linux/gfp.h | 12 ++++-------- > include/trace/events/mmflags.h | 1 - > lib/test_printf.c | 8 ++++---- > mm/internal.h | 3 +-- > mm/memcontrol.c | 2 +- > mm/page_alloc.c | 16 ++++------------ > tools/perf/builtin-kmem.c | 1 - > tools/testing/radix-tree/linux/gfp.h | 3 +-- > 10 files changed, 18 insertions(+), 34 deletions(-) > > diff --git a/Documentation/vm/balance.rst b/Documentation/vm/balance.rst > index 6a1fadf3e173..e38e9d83c1c7 100644 > --- a/Documentation/vm/balance.rst > +++ b/Documentation/vm/balance.rst > @@ -6,7 +6,7 @@ Memory Balancing > > Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com> > > -Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as > +Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as > well as for non __GFP_IO allocations. > > The first reason why a caller may avoid reclaim is that the caller can not > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c > index e900e3c46903..c5fa8b8673b6 100644 > --- a/drivers/iommu/tegra-smmu.c > +++ b/drivers/iommu/tegra-smmu.c > @@ -676,12 +676,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as, > * allocate page in a sleeping context if GFP flags permit. Hence > * spinlock needs to be unlocked and re-locked after allocation. > */ > - if (!(gfp & __GFP_ATOMIC)) > + if (gfp & __GFP_DIRECT_RECLAIM) > spin_unlock_irqrestore(&as->lock, *flags); > > page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); > > - if (!(gfp & __GFP_ATOMIC)) > + if (gfp & __GFP_DIRECT_RECLAIM) > spin_lock_irqsave(&as->lock, *flags); > > /* > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index b976c4177299..676c813bd93f 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -39,7 +39,7 @@ struct vm_area_struct; > #define ___GFP_IO 0x40u > #define ___GFP_FS 0x80u > #define ___GFP_ZERO 0x100u > -#define ___GFP_ATOMIC 0x200u > +/* 0x200u unused */ > #define ___GFP_DIRECT_RECLAIM 0x400u > #define ___GFP_KSWAPD_RECLAIM 0x800u > #define ___GFP_WRITE 0x1000u > @@ -116,11 +116,8 @@ struct vm_area_struct; > * > * %__GFP_HIGH indicates that the caller is high-priority and that granting > * the request is necessary before the system can make forward progress. > - * For example, creating an IO context to clean pages. > - * > - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is > - * high priority. Users are typically interrupt handlers. This may be > - * used in conjunction with %__GFP_HIGH > + * For example creating an IO context to clean pages and requests > + * from atomic context. > * > * %__GFP_MEMALLOC allows access to all memory. This should only be used when > * the caller guarantees the allocation will allow more memory to be freed > @@ -135,7 +132,6 @@ struct vm_area_struct; > * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. > * This takes precedence over the %__GFP_MEMALLOC flag if both are set. > */ > -#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC) > #define __GFP_HIGH ((__force gfp_t)___GFP_HIGH) > #define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC) > #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) > @@ -320,7 +316,7 @@ struct vm_area_struct; > * version does not attempt reclaim/compaction at all and is by default used > * in page fault path, while the non-light is used by khugepaged. > */ > -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) > +#define GFP_ATOMIC (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) > #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) > #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) > #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) > diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h > index 116ed4d5d0f8..30f492256b8c 100644 > --- a/include/trace/events/mmflags.h > +++ b/include/trace/events/mmflags.h > @@ -29,7 +29,6 @@ > {(unsigned long)__GFP_HIGHMEM, "__GFP_HIGHMEM"}, \ > {(unsigned long)GFP_DMA32, "GFP_DMA32"}, \ > {(unsigned long)__GFP_HIGH, "__GFP_HIGH"}, \ > - {(unsigned long)__GFP_ATOMIC, "__GFP_ATOMIC"}, \ > {(unsigned long)__GFP_IO, "__GFP_IO"}, \ > {(unsigned long)__GFP_FS, "__GFP_FS"}, \ > {(unsigned long)__GFP_NOWARN, "__GFP_NOWARN"}, \ > diff --git a/lib/test_printf.c b/lib/test_printf.c > index 07309c45f327..8010de49b6c5 100644 > --- a/lib/test_printf.c > +++ b/lib/test_printf.c > @@ -673,17 +673,17 @@ flags(void) > gfp = GFP_ATOMIC|__GFP_DMA; > test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp); > > - gfp = __GFP_ATOMIC; > - test("__GFP_ATOMIC", "%pGg", &gfp); > + gfp = __GFP_HIGH; > + test("__GFP_HIGH", "%pGg", &gfp); > > /* Any flags not translated by the table should remain numeric */ > gfp = ~__GFP_BITS_MASK; > snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp); > test(cmp_buffer, "%pGg", &gfp); > > - snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx", > + snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx", > (unsigned long) gfp); > - gfp |= __GFP_ATOMIC; > + gfp |= __GFP_HIGH; > test(cmp_buffer, "%pGg", &gfp); > > kfree(cmp_buffer); > diff --git a/mm/internal.h b/mm/internal.h > index 3b79a5c9427a..4564067065f9 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -20,8 +20,7 @@ > */ > #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\ > __GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\ > - __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\ > - __GFP_ATOMIC) > + __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC) > > /* The GFP flags allowed during early boot */ > #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS)) > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 781605e92015..1ff55347e4b9 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2621,7 +2621,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, > * put the burden of reclaim on regular allocation requests > * and let these go through as privileged allocations. > */ > - if (gfp_mask & __GFP_ATOMIC) > + if (gfp_mask & __GFP_HIGH) > goto force; > > /* > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c5952749ad40..4fff90ca145b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3932,12 +3932,12 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, > free_pages)) > return true; > /* > - * Ignore watermark boosting for GFP_ATOMIC order-0 allocations > + * Ignore watermark boosting for GFP_HIGH order-0 allocations > * when checking the min watermark. The min watermark is the > * point where boosting is ignored so that kswapd is woken up > * when below the low watermark. > */ > - if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost > + if (unlikely(!order && (alloc_flags & ALLOC_HARDER) && z->watermark_boost > && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) { > mark = z->_watermark[WMARK_MIN]; > return __zone_watermark_ok(z, order, mark, highest_zoneidx, > @@ -4661,12 +4661,12 @@ gfp_to_alloc_flags(gfp_t gfp_mask) > * The caller may dip into page reserves a bit more if the caller > * cannot run direct reclaim, or if the caller has realtime scheduling > * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will > - * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH). > + * set both ALLOC_HARDER (unless __GFP_NOMEMALLOC) and ALLOC_HIGH. > */ > alloc_flags |= (__force int) > (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)); > > - if (gfp_mask & __GFP_ATOMIC) { > + if (gfp_mask & __GFP_HIGH) { > /* > * Not worth trying to allocate harder for __GFP_NOMEMALLOC even > * if it can't schedule. > @@ -4859,14 +4859,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, > unsigned int cpuset_mems_cookie; > int reserve_flags; > > - /* > - * We also sanity check to catch abuse of atomic reserves being used by > - * callers that are not in atomic context. > - */ > - if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) == > - (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) > - gfp_mask &= ~__GFP_ATOMIC; > - > retry_cpuset: > compaction_retries = 0; > no_progress_loops = 0; > diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c > index da03a341c63c..17e6eeb6e196 100644 > --- a/tools/perf/builtin-kmem.c > +++ b/tools/perf/builtin-kmem.c > @@ -640,7 +640,6 @@ static const struct { > { "__GFP_HIGHMEM", "HM" }, > { "GFP_DMA32", "D32" }, > { "__GFP_HIGH", "H" }, > - { "__GFP_ATOMIC", "_A" }, > { "__GFP_IO", "I" }, > { "__GFP_FS", "F" }, > { "__GFP_NOWARN", "NWR" }, > diff --git a/tools/testing/radix-tree/linux/gfp.h b/tools/testing/radix-tree/linux/gfp.h > index 32159c08a52e..0a0741104dfe 100644 > --- a/tools/testing/radix-tree/linux/gfp.h > +++ b/tools/testing/radix-tree/linux/gfp.h > @@ -12,7 +12,6 @@ > #define __GFP_FS 0x80u > #define __GFP_NOWARN 0x200u > #define __GFP_ZERO 0x8000u > -#define __GFP_ATOMIC 0x80000u > #define __GFP_ACCOUNT 0x100000u > #define __GFP_DIRECT_RECLAIM 0x400000u > #define __GFP_KSWAPD_RECLAIM 0x2000000u > @@ -20,7 +19,7 @@ > #define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM|__GFP_KSWAPD_RECLAIM) > > #define GFP_ZONEMASK 0x0fu > -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) > +#define GFP_ATOMIC (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) > #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) > #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) > > -- > 2.33.1
On Thu, Nov 18, 2021 at 10:22:36AM +0100, Michal Hocko wrote: > [Cc Mel] > I think this patch should be ok. There are few direct users of __GFP_HIGH and some of them are borderline silly (e.g. mm/shmem.c specifying __GFP_HIGH|__GFP_NOMEMALLOC) while others just look questionable ( drivers/md/raid10.c seems to assume __GFP_HIGH guarantees allocation success). Xen appears to be the worst abuser of __GFP_HIGH.
On Fri, 19 Nov 2021, Mel Gorman wrote: > On Thu, Nov 18, 2021 at 10:22:36AM +0100, Michal Hocko wrote: > > [Cc Mel] > > > > I think this patch should be ok. There are few direct users of __GFP_HIGH > and some of them are borderline silly (e.g. mm/shmem.c specifying > __GFP_HIGH|__GFP_NOMEMALLOC) while others just look questionable ( > drivers/md/raid10.c seems to assume __GFP_HIGH guarantees allocation > success). Xen appears to be the worst abuser of __GFP_HIGH. That __GFP_HIGH in raid10.c is passed to mempool_alloc(), so there is no assumption that __GFP_HIGH will provide guarantees - the mempool does that. The comment - which I wrote 4 years ago and don't recall at all - suggest it was purely about performance - get error handling out of the way quickly. I doubt I could justify it if challenged... Thanks, NeilBrown > > -- > Mel Gorman > SUSE Labs > >
On Thu, 18 Nov 2021, Matthew Wilcox wrote: > On Wed, Nov 17, 2021 at 03:39:30PM +1100, NeilBrown wrote: > > +++ b/drivers/iommu/tegra-smmu.c > > @@ -676,12 +676,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as, > > * allocate page in a sleeping context if GFP flags permit. Hence > > * spinlock needs to be unlocked and re-locked after allocation. > > */ > > - if (!(gfp & __GFP_ATOMIC)) > > + if (gfp & __GFP_DIRECT_RECLAIM) > > spin_unlock_irqrestore(&as->lock, *flags); > > > > page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); > > > > - if (!(gfp & __GFP_ATOMIC)) > > + if (gfp & __GFP_DIRECT_RECLAIM) > > spin_lock_irqsave(&as->lock, *flags); > > > > /* > > Surely this should be gfpflags_allow_blocking() instead of poking about > in the innards of gfp flags? Possibly. Didn't know about gfpflags_allow_blocking(). From a quick grep in the kernel, a whole lot of other people don't know about it either, though clearly some do. Maybe we should reaname "__GFP_DIRECT_RECLAIM" to "__GFP_ALLOW_BLOCKING", because that is what most users seems to care about. If not, then we probably want a gfpflags_without_block() function that removes that flag, as lots of code wants to do that - and using the flag for one, and an inline for the other is not consistent. My leaning would be to __GFP_ALLOW_BLOCKING NeilBrown > > This patch seems like a good simplification to me. > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Thanks, NeilBrown
On Fri, Nov 19, 2021 at 10:14:38AM +1100, NeilBrown wrote: > On Thu, 18 Nov 2021, Matthew Wilcox wrote: > > Surely this should be gfpflags_allow_blocking() instead of poking about > > in the innards of gfp flags? > > Possibly. Didn't know about gfpflags_allow_blocking(). From a quick > grep in the kernel, a whole lot of other people don't know about it > either, though clearly some do. > > Maybe we should reaname "__GFP_DIRECT_RECLAIM" to > "__GFP_ALLOW_BLOCKING", because that is what most users seems to care > about. I tend towards the school of thought that the __GFP flags should make sense to the implementation and users should use either GFP_ or functions. When we see users adding or subtracting __GFP flags, that's a problem. > If not, then we probably want a gfpflags_without_block() function that > removes that flag, as lots of code wants to do that - and using the flag > for one, and an inline for the other is not consistent. It's not a _lot_ of code ... block/bio.c drivers/net/ethernet/mellanox/mlx4/icm.c drivers/vhost/net.c fs/btrfs/disk-io.c fs/btrfs/volumes.c fs/erofs/zdata.c fs/fscache/cookie.c fs/fscache/page.c kernel/cgroup/cgroup.c net/core/skbuff.c net/core/sock.c net/netlink/af_netlink.c (excluding mm -- if mm wants to meddle with GFP flags, that's fine) I think a lot of these are probably misguided, eg the filesystems should probably be using GFP_NOFS to prevent recursion.
On Sat, 20 Nov 2021, Matthew Wilcox wrote: > On Fri, Nov 19, 2021 at 10:14:38AM +1100, NeilBrown wrote: > > On Thu, 18 Nov 2021, Matthew Wilcox wrote: > > > Surely this should be gfpflags_allow_blocking() instead of poking about > > > in the innards of gfp flags? > > > > Possibly. Didn't know about gfpflags_allow_blocking(). From a quick > > grep in the kernel, a whole lot of other people don't know about it > > either, though clearly some do. > > > > Maybe we should reaname "__GFP_DIRECT_RECLAIM" to > > "__GFP_ALLOW_BLOCKING", because that is what most users seems to care > > about. > > I tend towards the school of thought that the __GFP flags should make > sense to the implementation and users should use either GFP_ or functions. > When we see users adding or subtracting __GFP flags, that's a problem. Except __GFP_NOWARN of course, or __GFP_ZERO, or __GFP_NOFAIL. What about __GFP_HIGHMEM? __GFP_DMA? __GFP_HIGH? They all seem to be quite meaningful to the caller - explicitly specifying properties of the memory or properties of the service. (But maybe you would prefer __GFP_HIGH be spelled "__GFP_LOW_WATERMARK" so it would make more sense to the implementation). __GFP_DIRECTRECLAIM seems to me to be more the exception than the rule - specifying internal implementation details. Actually ... I take it back about __GFP_NOWARN. That probably shouldn't exist at all. Warnings should be based on how stressed the mm system is, not on whether the caller wants thinks failure is manageable. NeilBrown
On Wed 17-11-21 15:39:30, Neil Brown wrote: > > __GFP_ATOMIC serves little purpose. > It's main effect is to set ALLOC_HARDER which adds a few little boosts to > increase the chance of an allocation succeeding, one of which is to > lower the water-mark at which it will succeed. > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > adjusts this watermark. It is probable that other users of __GFP_HIGH > should benefit from the other little bonuses that __GFP_ATOMIC gets. While I like to see __GFP_ATOMIC going away I am not really sure about this particular part. We have 3 ways to get to memory reserves. One of thme is directly controlable by __GFP_HIGH and two are internal to the allocator to handle different situations - ALLOC_OOM is to help the oom victim to make a fwd progress and ALLOC_HARDER should be for contexts which cannot rely on the memory reclaim to continue. What is the point of having ALLOC_HIGH and ALLOC_HARDER if you just add both of them for __GFP_HIGH? I think you should be instead really get back to pre d0164adc89f6b and allow ALLOC_HARDER for requests which have neither of the reclaim allowed. That would require tweaking GFP_ATOMIC as well I suspect and drop __GFP_KSWAPD_RECLAIM. Or do something else. > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > There is little point to this. We already get a might_sleep() warning > if __GFP_DIRECT_RECLAIM is set. I believe the point of the warning was to stop any abuse of an additional memory reserves for context which can reclaim and to spare those to interrupt handlers - which usually use GFP_ATOMIC. A lack of any reports suggests this hasn't happened and the warning can be dropped. Would be worth a patch on its own with this explanation. > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > probable that testing ALLOC_HARDER is a better fit here. This has been introduced by f80b08fc44536 but I have to say that I haven't understood why this couldn't check for __GFP_DIRECT_RECLAIM or one ALLOC_$FOO boosters rather than __GFP_ATOMIC. Again something for a separate patch. > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > sleep. This should test __GFP_DIRECT_RECLAIM instead. Willy has already proposed a better alternative. Thanks!
On Sat 20-11-21 21:51:20, Neil Brown wrote: > On Sat, 20 Nov 2021, Matthew Wilcox wrote: > > On Fri, Nov 19, 2021 at 10:14:38AM +1100, NeilBrown wrote: > > > On Thu, 18 Nov 2021, Matthew Wilcox wrote: > > > > Surely this should be gfpflags_allow_blocking() instead of poking about > > > > in the innards of gfp flags? > > > > > > Possibly. Didn't know about gfpflags_allow_blocking(). From a quick > > > grep in the kernel, a whole lot of other people don't know about it > > > either, though clearly some do. > > > > > > Maybe we should reaname "__GFP_DIRECT_RECLAIM" to > > > "__GFP_ALLOW_BLOCKING", because that is what most users seems to care > > > about. > > > > I tend towards the school of thought that the __GFP flags should make > > sense to the implementation and users should use either GFP_ or functions. > > When we see users adding or subtracting __GFP flags, that's a problem. > > Except __GFP_NOWARN of course, or __GFP_ZERO, or __GFP_NOFAIL. > What about __GFP_HIGHMEM? __GFP_DMA? __GFP_HIGH? > > They all seem to be quite meaningful to the caller - explicitly > specifying properties of the memory or properties of the service. > (But maybe you would prefer __GFP_HIGH be spelled "__GFP_LOW_WATERMARK" > so it would make more sense to the implementation). > > __GFP_DIRECTRECLAIM seems to me to be more the exception than the rule - > specifying internal implementation details. I do not think it is viable to fix up gfp flags to be consistent :/ Both __GFP_DIRECT_RECLAIM and __GFP_KSWAPD_RECLAIM are way too lowlevel but historically we've had requests to inhibit kswapd for a particular requests because that has led to problems - fun reading caf491916b1c1. __GFP_ALLOW_BLOCKING would make a lot of sense but I am not sure it would be a good match to __GFP_KSWAPD_RECLAIM. > Actually ... I take it back about __GFP_NOWARN. That probably shouldn't > exist at all. Warnings should be based on how stressed the mm system is, > not on whether the caller wants thinks failure is manageable. Unless we change the way when allocation warnings are triggered then we really need this. There are many opportunistic allocations with a fallback behavior which do not want to swamp kernel logs with failures that are of no use. Think of a THP allocation that really want to be just very quick and falls back to normal base pages otherwise. Deducing context which is just fine to not report failures is quite tricky and it can get wrong easily. Callers should know whether warning can be of any use in many cases.
On Tue, 23 Nov 2021, Michal Hocko wrote: > On Sat 20-11-21 21:51:20, Neil Brown wrote: > > On Sat, 20 Nov 2021, Matthew Wilcox wrote: > > > On Fri, Nov 19, 2021 at 10:14:38AM +1100, NeilBrown wrote: > > > > On Thu, 18 Nov 2021, Matthew Wilcox wrote: > > > > > Surely this should be gfpflags_allow_blocking() instead of poking about > > > > > in the innards of gfp flags? > > > > > > > > Possibly. Didn't know about gfpflags_allow_blocking(). From a quick > > > > grep in the kernel, a whole lot of other people don't know about it > > > > either, though clearly some do. > > > > > > > > Maybe we should reaname "__GFP_DIRECT_RECLAIM" to > > > > "__GFP_ALLOW_BLOCKING", because that is what most users seems to care > > > > about. > > > > > > I tend towards the school of thought that the __GFP flags should make > > > sense to the implementation and users should use either GFP_ or functions. > > > When we see users adding or subtracting __GFP flags, that's a problem. > > > > Except __GFP_NOWARN of course, or __GFP_ZERO, or __GFP_NOFAIL. > > What about __GFP_HIGHMEM? __GFP_DMA? __GFP_HIGH? > > > > They all seem to be quite meaningful to the caller - explicitly > > specifying properties of the memory or properties of the service. > > (But maybe you would prefer __GFP_HIGH be spelled "__GFP_LOW_WATERMARK" > > so it would make more sense to the implementation). > > > > __GFP_DIRECTRECLAIM seems to me to be more the exception than the rule - > > specifying internal implementation details. > > I do not think it is viable to fix up gfp flags to be consistent :/ You may be right :-) Of course if we don't try - you'll definitely be right. > Both __GFP_DIRECT_RECLAIM and __GFP_KSWAPD_RECLAIM are way too lowlevel > but historically we've had requests to inhibit kswapd for a particular > requests because that has led to problems - fun reading caf491916b1c1. Unfortunately that commit doesn't provide any reasoning, just an assertion. The best reasoning I could find was in caf491916b1c1 which was the initial revert. There the primary reasoning was "there is a bug that we don't have time for a proper fix before the next release, so let's just use this quick fix". ... and maybe "the quick fix" was "the right fix", but I cannot tell from the commit logs :-( > __GFP_ALLOW_BLOCKING would make a lot of sense but I am not sure it > would be a good match to __GFP_KSWAPD_RECLAIM. So? __GFP_ALLOW_BLOCKING makes it clear what is, or is not, acceptable to the caller. How much reclaim, or other activity, alloc_page() engages in is largely irrelevant to the caller as lock as it doesn't block if asked not to (and doesn't enter an FS if asked not to, etc). > > > Actually ... I take it back about __GFP_NOWARN. That probably shouldn't > > exist at all. Warnings should be based on how stressed the mm system is, > > not on whether the caller wants thinks failure is manageable. > > Unless we change the way when allocation warnings are triggered then we > really need this. There are many opportunistic allocations with a > fallback behavior which do not want to swamp kernel logs with failures > that are of no use. Think of a THP allocation that really want to be > just very quick and falls back to normal base pages otherwise. Deducing > context which is just fine to not report failures is quite tricky and it > can get wrong easily. Callers should know whether warning can be of any > use in many cases. "Unless" being the key work. It makes sense to warn when a __GFP_HIGH or __GFP_MEMALLOC allocation fails, because they are clearly important. It makes sense to warning if direct reclaim and retrying were enabled, as then alloc_page() has tried really hard, but failed anyway. Thought maybe if COSTLY_ORDER is exceeded, then the warning is unlikely to be interesting. But does it ever make sense to warn if either of __GFP_RETRY_MAYFAIL __GFP_NORETRY are present? If we always suppressed warning when those flags were present, then many (most?) uses for __GFP_NOWARN can be discarded. I can see that some of the __GFP flags are designed to each perform a single well-defined function and internally to mm/ that makes sense. But exposing those flags to all users appears to be a recipe for trouble. Hiding them all behind "__" doesn't stop people from using and misusing them. Others are externally meaningful. Making them visually similar to the ones we want to hide isn't helping anyone. When Willy wrote: > When we see users adding or subtracting __GFP flags, that's a problem. the "problem" is not so much in the fact that they *do* but in the fact that they *can*. I would be greatly in favour of GFP flags which made sense to callers, and a mapping to ALLOC_ flags in mm/ which makes sense to allocators. I doubt anything outside mm/ cares about whether KSWAPD is woken or not. IT probably should be for small-order allocations, and not so much for large-order allocations. but mm/khugepaged.c might make other decisions. Thanks, NeilBrown
On Tue, 23 Nov 2021, Michal Hocko wrote: > On Wed 17-11-21 15:39:30, Neil Brown wrote: > > > > __GFP_ATOMIC serves little purpose. > > It's main effect is to set ALLOC_HARDER which adds a few little boosts to > > increase the chance of an allocation succeeding, one of which is to > > lower the water-mark at which it will succeed. > > > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > > adjusts this watermark. It is probable that other users of __GFP_HIGH > > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > While I like to see __GFP_ATOMIC going away I am not really sure about > this particular part. We have 3 ways to get to memory reserves. One of > thme is directly controlable by __GFP_HIGH and two are internal to the > allocator to handle different situations - ALLOC_OOM is to help the oom > victim to make a fwd progress and ALLOC_HARDER should be for contexts > which cannot rely on the memory reclaim to continue. ALLOC_OOM certainly makes sense. > > What is the point of having ALLOC_HIGH and ALLOC_HARDER if you just > add both of them for __GFP_HIGH? I think you should be instead really > get back to pre d0164adc89f6b and allow ALLOC_HARDER for requests which > have neither of the reclaim allowed. That would require tweaking > GFP_ATOMIC as well I suspect and drop __GFP_KSWAPD_RECLAIM. Or do > something else. NONONONO. Tying ALLOC_HARDER to "no reclaim" is a mistake. From the caller's perspective they are very different statements, which might sometimes go together (and GFP_ATOMIC is exactly where they go together). "no reclaim" is a question "am I willing to pay the price of performing reclaim", and there might be various different reasons for choosing not to. "ALLOC_HARDER" is a question of "can I justify imposing on other threads by taking memory that they might want". Again there may be different reasons, but they will not always align with the first set. With my patch there is still a difference between ALLOC_HIGH and ALLOC_HARDER, but not much. __GFP_HIGH combined with __GFP_NOMEMALLOC - which could be seen as "high priority, but not too high" delivers ALLOC_HIGH without ALLOC_HARDER. It may not be a useful distinction, but it seems to preserve most of what I didn't want to change. > > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > > There is little point to this. We already get a might_sleep() warning > > if __GFP_DIRECT_RECLAIM is set. > > I believe the point of the warning was to stop any abuse of an > additional memory reserves for context which can reclaim and to spare > those to interrupt handlers - which usually use GFP_ATOMIC. A lack of > any reports suggests this hasn't happened and the warning can be > dropped. Would be worth a patch on its own with this explanation. > > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > > probable that testing ALLOC_HARDER is a better fit here. > > This has been introduced by f80b08fc44536 but I have to say that I > haven't understood why this couldn't check for __GFP_DIRECT_RECLAIM > or one ALLOC_$FOO boosters rather than __GFP_ATOMIC. Again something for > a separate patch. Yes - the commit description contrasts "atomic" with "sleepable" allocations. Given that, __GFP_DIRECT_RECLAIM would make make sense. > > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > Willy has already proposed a better alternative. I'm happy to resend the patch with this change (despite it seeming like a bandaid on a blowhole). Thanks for the thorough review!! NeilBrown > > Thanks! > -- > Michal Hocko > SUSE Labs > >
On Tue 23-11-21 15:33:19, Neil Brown wrote: [...] > "ALLOC_HARDER" is a question of "can I justify imposing on other threads > by taking memory that they might want". Again there may be different > reasons, but they will not always align with the first set. > > With my patch there is still a difference between ALLOC_HIGH and > ALLOC_HARDER, but not much. > __GFP_HIGH combined with __GFP_NOMEMALLOC - which could be seen as "high > priority, but not too high" delivers ALLOC_HIGH without ALLOC_HARDER. > It may not be a useful distinction, but it seems to preserve most of > what I didn't want to change. I am not sure this is really a helpful distinction. I would even say that an explicit use of __GFP_NOMEMALLOC | __GFP_HIGH is actively confusing as that would mean that you do not allow access to reserves while you want to dip into them anyway. Anyway, I still think that ALLOC_HARDER should stay under control of the allocator as a heuristic rather being imprinted into gfp flags directly. Having two levels of memory reserves access is just too complicated for users and I wouldn't be surprised if most callers would just consider their usecase important enough to justify as much reserves as possible. Allocation from an interrupt context sounds like a good usecase for ALLOC_HARDER. I am not sure about rt_task one but that one can be reasoned about as well. All/most __GFP_HIGH allocations just look like an overuse and conflation of the two modes. Both these were the primary usecase for ALLOC_HARDER historically we just tried to find a way how to express the former by gfp flags.
On Tue 23-11-21 15:15:20, Neil Brown wrote: > On Tue, 23 Nov 2021, Michal Hocko wrote: [...] > > Both __GFP_DIRECT_RECLAIM and __GFP_KSWAPD_RECLAIM are way too lowlevel > > but historically we've had requests to inhibit kswapd for a particular > > requests because that has led to problems - fun reading caf491916b1c1. > > Unfortunately that commit doesn't provide any reasoning, just an > assertion. > The best reasoning I could find was in caf491916b1c1 which was the initial > revert. There the primary reasoning was "there is a bug that we don't > have time for a proper fix before the next release, so let's just use > this quick fix". > ... and maybe "the quick fix" was "the right fix", but I cannot tell from > the commit logs :-( Yeah, that was not entirely fair from me but I just found it a nice example of how fun our process around gpf has been historically. A more fair would be to point you at 32dba98e085f ("thp: _GFP_NO_KSWAPD") which has introduced for THP use. Mostly as a workaround to existing reclaim problems because THPs have been enabled by default for everybody and that had backfired. Rik has tried to remove the flag c654345924f7 ("mm: remove __GFP_NO_KSWAPD") because most problems had been fixed - he believed. But that has turned out to be not the case 82b212f40059 ("Revert "mm: remove __GFP_NO_KSWAPD"") and swap storms triggered by THP peak loads were still observed. THP still seem to remain to be the biggest user of the flag (read only to care to not have the flag. Maybe another round of the check whether we need it... > > __GFP_ALLOW_BLOCKING would make a lot of sense but I am not sure it > > would be a good match to __GFP_KSWAPD_RECLAIM. > > So? __GFP_ALLOW_BLOCKING makes it clear what is, or is not, acceptable > to the caller. How much reclaim, or other activity, alloc_page() > engages in is largely irrelevant to the caller as lock as it doesn't > block if asked not to (and doesn't enter an FS if asked not to, etc). Hmm, maybe you are right. > > > Actually ... I take it back about __GFP_NOWARN. That probably shouldn't > > > exist at all. Warnings should be based on how stressed the mm system is, > > > not on whether the caller wants thinks failure is manageable. > > > > Unless we change the way when allocation warnings are triggered then we > > really need this. There are many opportunistic allocations with a > > fallback behavior which do not want to swamp kernel logs with failures > > that are of no use. Think of a THP allocation that really want to be > > just very quick and falls back to normal base pages otherwise. Deducing > > context which is just fine to not report failures is quite tricky and it > > can get wrong easily. Callers should know whether warning can be of any > > use in many cases. > > "Unless" being the key work. > It makes sense to warn when a __GFP_HIGH or __GFP_MEMALLOC allocation > fails, because they are clearly important. > > It makes sense to warning if direct reclaim and retrying were enabled, > as then alloc_page() has tried really hard, but failed anyway. Thought > maybe if COSTLY_ORDER is exceeded, then the warning is unlikely to be > interesting. For "normal" small allocations we usually get an OOM report if the memory is depleted. That will provide quite a lot of potentially useful context to debug memory usage. Non reclaiming allocations can be just opportunistic that choose to not reclaim with an other approach as a fallback but there are others that really cannot reclaim because they are in an atomic context. I do not see an easy way to tell one from the other. Simirarly for higher order allocations it can be useful to see whether the memory is depletely or just fragmented. > But does it ever make sense to warn if either of > __GFP_RETRY_MAYFAIL __GFP_NORETRY are present? > If we always suppressed warning when those flags were present, then many > (most?) uses for __GFP_NOWARN can be discarded. Yes __GFP_NORETRY is mostly (maybe always) used with __GFP_NOWARN. Coccinelle would be a good way to check. I do remember MAYFAIL is used for page migration to allocate target memory. It is often useful to see that the migration is failing because of lack of memory. > I can see that some of the __GFP flags are designed to each perform a > single well-defined function and internally to mm/ that makes sense. > But exposing those flags to all users appears to be a recipe for > trouble. Hiding them all behind "__" doesn't stop people from using and > misusing them. Others are externally meaningful. Making them visually > similar to the ones we want to hide isn't helping anyone. I do agree here.
Folks, Discussion seemed to meander then peter out. What's the thinking on this patch now? Thanks. From: "NeilBrown" <neilb@suse.de> Subject: mm: discard __GFP_ATOMIC __GFP_ATOMIC serves little purpose. Its main effect is to set ALLOC_HARDER which adds a few little boosts to increase the chance of an allocation succeeding, one of which is to lower the water-mark at which it will succeed. It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also adjusts this watermark. It is probable that other users of __GFP_HIGH should benefit from the other little bonuses that __GFP_ATOMIC gets. __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. There is little point to this. We already get a might_sleep() warning if __GFP_DIRECT_RECLAIM is set. __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is probable that testing ALLOC_HARDER is a better fit here. __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might sleep. This should test __GFP_DIRECT_RECLAIM instead. This patch: - removes __GFP_ATOMIC - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set (as well as ALLOC_HIGH). - makes other adjustments as suggested by the above. The net result is not change to GFP_ATOMIC allocations. Other allocations that use __GFP_HIGH will benefit from a few different extra privileges. This affects: xen, dm, md, ntfs3 the vermillion frame buffer hibernation ksm swap all of which likely produce more benefit than cost if these selected allocation are more likely to succeed quickly. Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/balance.rst | 2 +- drivers/iommu/tegra-smmu.c | 4 ++-- include/linux/gfp.h | 12 ++++-------- include/trace/events/mmflags.h | 1 - lib/test_printf.c | 8 ++++---- mm/internal.h | 2 +- mm/page_alloc.c | 16 ++++------------ tools/include/linux/gfp.h | 3 +-- tools/perf/builtin-kmem.c | 1 - 9 files changed, 17 insertions(+), 32 deletions(-) --- a/Documentation/vm/balance.rst~mm-discard-__gfp_atomic +++ a/Documentation/vm/balance.rst @@ -6,7 +6,7 @@ Memory Balancing Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com> -Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as +Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as well as for non __GFP_IO allocations. The first reason why a caller may avoid reclaim is that the caller can not --- a/drivers/iommu/tegra-smmu.c~mm-discard-__gfp_atomic +++ a/drivers/iommu/tegra-smmu.c @@ -671,12 +671,12 @@ static struct page *as_get_pde_page(stru * allocate page in a sleeping context if GFP flags permit. Hence * spinlock needs to be unlocked and re-locked after allocation. */ - if (!(gfp & __GFP_ATOMIC)) + if (gfp & __GFP_DIRECT_RECLAIM) spin_unlock_irqrestore(&as->lock, *flags); page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); - if (!(gfp & __GFP_ATOMIC)) + if (gfp & __GFP_DIRECT_RECLAIM) spin_lock_irqsave(&as->lock, *flags); /* --- a/include/linux/gfp.h~mm-discard-__gfp_atomic +++ a/include/linux/gfp.h @@ -39,7 +39,7 @@ struct vm_area_struct; #define ___GFP_IO 0x40u #define ___GFP_FS 0x80u #define ___GFP_ZERO 0x100u -#define ___GFP_ATOMIC 0x200u +/* 0x200u unused */ #define ___GFP_DIRECT_RECLAIM 0x400u #define ___GFP_KSWAPD_RECLAIM 0x800u #define ___GFP_WRITE 0x1000u @@ -124,11 +124,8 @@ struct vm_area_struct; * * %__GFP_HIGH indicates that the caller is high-priority and that granting * the request is necessary before the system can make forward progress. - * For example, creating an IO context to clean pages. - * - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is - * high priority. Users are typically interrupt handlers. This may be - * used in conjunction with %__GFP_HIGH + * For example creating an IO context to clean pages and requests + * from atomic context. * * %__GFP_MEMALLOC allows access to all memory. This should only be used when * the caller guarantees the allocation will allow more memory to be freed @@ -143,7 +140,6 @@ struct vm_area_struct; * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. * This takes precedence over the %__GFP_MEMALLOC flag if both are set. */ -#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC) #define __GFP_HIGH ((__force gfp_t)___GFP_HIGH) #define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC) #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) @@ -337,7 +333,7 @@ struct vm_area_struct; * version does not attempt reclaim/compaction at all and is by default used * in page fault path, while the non-light is used by khugepaged. */ -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) +#define GFP_ATOMIC (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) --- a/include/trace/events/mmflags.h~mm-discard-__gfp_atomic +++ a/include/trace/events/mmflags.h @@ -29,7 +29,6 @@ {(unsigned long)__GFP_HIGHMEM, "__GFP_HIGHMEM"}, \ {(unsigned long)GFP_DMA32, "GFP_DMA32"}, \ {(unsigned long)__GFP_HIGH, "__GFP_HIGH"}, \ - {(unsigned long)__GFP_ATOMIC, "__GFP_ATOMIC"}, \ {(unsigned long)__GFP_IO, "__GFP_IO"}, \ {(unsigned long)__GFP_FS, "__GFP_FS"}, \ {(unsigned long)__GFP_NOWARN, "__GFP_NOWARN"}, \ --- a/lib/test_printf.c~mm-discard-__gfp_atomic +++ a/lib/test_printf.c @@ -673,17 +673,17 @@ flags(void) gfp = GFP_ATOMIC|__GFP_DMA; test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp); - gfp = __GFP_ATOMIC; - test("__GFP_ATOMIC", "%pGg", &gfp); + gfp = __GFP_HIGH; + test("__GFP_HIGH", "%pGg", &gfp); /* Any flags not translated by the table should remain numeric */ gfp = ~__GFP_BITS_MASK; snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp); test(cmp_buffer, "%pGg", &gfp); - snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx", + snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx", (unsigned long) gfp); - gfp |= __GFP_ATOMIC; + gfp |= __GFP_HIGH; test(cmp_buffer, "%pGg", &gfp); kfree(cmp_buffer); --- a/mm/internal.h~mm-discard-__gfp_atomic +++ a/mm/internal.h @@ -24,7 +24,7 @@ struct folio_batch; #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\ __GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\ __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\ - __GFP_ATOMIC|__GFP_NOLOCKDEP) + __GFP_NOLOCKDEP) /* The GFP flags allowed during early boot */ #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS)) --- a/mm/page_alloc.c~mm-discard-__gfp_atomic +++ a/mm/page_alloc.c @@ -3918,12 +3918,12 @@ static inline bool zone_watermark_fast(s free_pages)) return true; /* - * Ignore watermark boosting for GFP_ATOMIC order-0 allocations + * Ignore watermark boosting for GFP_HIGH order-0 allocations * when checking the min watermark. The min watermark is the * point where boosting is ignored so that kswapd is woken up * when below the low watermark. */ - if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost + if (unlikely(!order && (alloc_flags & ALLOC_HARDER) && z->watermark_boost && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) { mark = z->_watermark[WMARK_MIN]; return __zone_watermark_ok(z, order, mark, highest_zoneidx, @@ -4657,12 +4657,12 @@ gfp_to_alloc_flags(gfp_t gfp_mask) * The caller may dip into page reserves a bit more if the caller * cannot run direct reclaim, or if the caller has realtime scheduling * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will - * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH). + * set both ALLOC_HARDER (unless __GFP_NOMEMALLOC) and ALLOC_HIGH. */ alloc_flags |= (__force int) (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)); - if (gfp_mask & __GFP_ATOMIC) { + if (gfp_mask & __GFP_HIGH) { /* * Not worth trying to allocate harder for __GFP_NOMEMALLOC even * if it can't schedule. @@ -4855,14 +4855,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, u unsigned int cpuset_mems_cookie; int reserve_flags; - /* - * We also sanity check to catch abuse of atomic reserves being used by - * callers that are not in atomic context. - */ - if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) == - (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) - gfp_mask &= ~__GFP_ATOMIC; - retry_cpuset: compaction_retries = 0; no_progress_loops = 0; --- a/tools/include/linux/gfp.h~mm-discard-__gfp_atomic +++ a/tools/include/linux/gfp.h @@ -12,7 +12,6 @@ #define __GFP_FS 0x80u #define __GFP_NOWARN 0x200u #define __GFP_ZERO 0x8000u -#define __GFP_ATOMIC 0x80000u #define __GFP_ACCOUNT 0x100000u #define __GFP_DIRECT_RECLAIM 0x400000u #define __GFP_KSWAPD_RECLAIM 0x2000000u @@ -20,7 +19,7 @@ #define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM) #define GFP_ZONEMASK 0x0fu -#define GFP_ATOMIC (__GFP_HIGH | __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM) +#define GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) --- a/tools/perf/builtin-kmem.c~mm-discard-__gfp_atomic +++ a/tools/perf/builtin-kmem.c @@ -640,7 +640,6 @@ static const struct { { "__GFP_HIGHMEM", "HM" }, { "GFP_DMA32", "D32" }, { "__GFP_HIGH", "H" }, - { "__GFP_ATOMIC", "_A" }, { "__GFP_IO", "I" }, { "__GFP_FS", "F" }, { "__GFP_NOWARN", "NWR" },
On Sat 30-04-22 11:30:28, Andrew Morton wrote: > Folks, > > Discussion seemed to meander then peter out. What's the thinking on > this patch now? I like the simplification. This requires a deeper review. I cannot promise anything before LSFMM ends and who knows what my inbox will look like afterwards. In general I think we have just too many levels of memory reserves access and most people have no clue how to use them properly.
On Sat 30-04-22 11:30:28, Andrew Morton wrote: Sorry, this got lost in my inbox. Thanks Andrew for poking me. > From: "NeilBrown" <neilb@suse.de> > Subject: mm: discard __GFP_ATOMIC > > __GFP_ATOMIC serves little purpose. Its main effect is to set > ALLOC_HARDER which adds a few little boosts to increase the chance of an > allocation succeeding, one of which is to lower the water-mark at which it > will succeed. > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > adjusts this watermark. It is probable that other users of __GFP_HIGH > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > There is little point to this. We already get a might_sleep() warning if > __GFP_DIRECT_RECLAIM is set. > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > probable that testing ALLOC_HARDER is a better fit here. > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > This patch: > - removes __GFP_ATOMIC > - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set > (as well as ALLOC_HIGH). > - makes other adjustments as suggested by the above. > > The net result is not change to GFP_ATOMIC allocations. Other > allocations that use __GFP_HIGH will benefit from a few different extra > privileges. This affects: > xen, dm, md, ntfs3 > the vermillion frame buffer > hibernation > ksm > swap > all of which likely produce more benefit than cost if these selected > allocation are more likely to succeed quickly. This is a good summary of the current usage and existing issues. It also shows that the naming is tricky and allows people to make wrong calls (tegra-smmu.c). I also thing that it is wrong to couple memory reserves access to the reclaim constrains/expectations of the caller. > Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name > Signed-off-by: NeilBrown <neilb@suse.de> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Thierry Reding <thierry.reding@gmail.com> > Cc: Mel Gorman <mgorman@techsingularity.net> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Yes, I am all for dropping the gfp flag. One thing that is not really entirely clear to me, though, is whether we still need 3 levels of memory reserves access. Can we just drop ALLOC_HARDER? With this patch applied it serves RT tasks and conflates it with __GFP_HIGH users essentially. So why do we need that additional level of reserves?
On Tue, Sep 06, 2022 at 09:35:41AM +0200, Michal Hocko wrote: > > From: "NeilBrown" <neilb@suse.de> > > Subject: mm: discard __GFP_ATOMIC > > > > __GFP_ATOMIC serves little purpose. Its main effect is to set > > ALLOC_HARDER which adds a few little boosts to increase the chance of an > > allocation succeeding, one of which is to lower the water-mark at which it > > will succeed. > > > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > > adjusts this watermark. It is probable that other users of __GFP_HIGH > > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > > There is little point to this. We already get a might_sleep() warning if > > __GFP_DIRECT_RECLAIM is set. > > > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > > probable that testing ALLOC_HARDER is a better fit here. > > > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > > > This patch: > > - removes __GFP_ATOMIC > > - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set > > (as well as ALLOC_HIGH). > > - makes other adjustments as suggested by the above. > > > > The net result is not change to GFP_ATOMIC allocations. Other > > allocations that use __GFP_HIGH will benefit from a few different extra > > privileges. This affects: > > xen, dm, md, ntfs3 > > the vermillion frame buffer > > hibernation > > ksm > > swap > > all of which likely produce more benefit than cost if these selected > > allocation are more likely to succeed quickly. > > This is a good summary of the current usage and existing issues. It also > shows that the naming is tricky and allows people to make wrong calls > (tegra-smmu.c). I also thing that it is wrong to couple memory reserves > access to the reclaim constrains/expectations of the caller. > I think it's worth trying to get rid of __GFP_ATOMIC although this patch needs to be rebased. Without rebasing it, I suspect there is a corner case for reserving high order atomic blocks. A high-order atomic allocation might get confused with a __GFP_HIGH high-order allocation that can sleep. It would not be completely irrational to have such a caller if it was in a path that can tolerate a stall but stalling might have visible consequences. I'm also worried that the patch might allow __GFP_HIGH to ignore cpusets which is probably not intended by direct users like ksm. > > Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name > > Signed-off-by: NeilBrown <neilb@suse.de> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Thierry Reding <thierry.reding@gmail.com> > > Cc: Mel Gorman <mgorman@techsingularity.net> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > Yes, I am all for dropping the gfp flag. One thing that is not really > entirely clear to me, though, is whether we still need 3 levels of > memory reserves access. Can we just drop ALLOC_HARDER? With this patch > applied it serves RT tasks and conflates it with __GFP_HIGH users > essentially. So why do we need that additional level of reserves? I think this would fall under the "naming is hard". If __GFP_ATOMIC was removed, the ALLOC_ flags might need renaming to detect differences in high priority allocations (RT + GFP_ATOMIC), critical allocations (OOM) and ones that can access special reserves (GFP_ATOMIC high-order).
On Wed, 7 Sep 2022 10:47:24 +0100 Mel Gorman <mgorman@techsingularity.net> wrote: > On Tue, Sep 06, 2022 at 09:35:41AM +0200, Michal Hocko wrote: > > > From: "NeilBrown" <neilb@suse.de> > > > Subject: mm: discard __GFP_ATOMIC > > > > > > __GFP_ATOMIC serves little purpose. Its main effect is to set > > > ALLOC_HARDER which adds a few little boosts to increase the chance of an > > > allocation succeeding, one of which is to lower the water-mark at which it > > > will succeed. > > > > > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > > > adjusts this watermark. It is probable that other users of __GFP_HIGH > > > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > > > > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > > > There is little point to this. We already get a might_sleep() warning if > > > __GFP_DIRECT_RECLAIM is set. > > > > > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > > > probable that testing ALLOC_HARDER is a better fit here. > > > > > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > > > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > > > > > This patch: > > > - removes __GFP_ATOMIC > > > - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set > > > (as well as ALLOC_HIGH). > > > - makes other adjustments as suggested by the above. > > > > > > The net result is not change to GFP_ATOMIC allocations. Other > > > allocations that use __GFP_HIGH will benefit from a few different extra > > > privileges. This affects: > > > xen, dm, md, ntfs3 > > > the vermillion frame buffer > > > hibernation > > > ksm > > > swap > > > all of which likely produce more benefit than cost if these selected > > > allocation are more likely to succeed quickly. > > > > This is a good summary of the current usage and existing issues. It also > > shows that the naming is tricky and allows people to make wrong calls > > (tegra-smmu.c). I also thing that it is wrong to couple memory reserves > > access to the reclaim constrains/expectations of the caller. > > > > I think it's worth trying to get rid of __GFP_ATOMIC although this patch > needs to be rebased. Without rebasing it, I suspect there is a corner case > for reserving high order atomic blocks. A high-order atomic allocation > might get confused with a __GFP_HIGH high-order allocation that can sleep. > It would not be completely irrational to have such a caller if it was in a > path that can tolerate a stall but stalling might have visible consequences. > I'm also worried that the patch might allow __GFP_HIGH to ignore cpusets > which is probably not intended by direct users like ksm. Unclear what you mean by "rebased". You're saying the patch might have issues - doesn't that mean it needs to be "fixed"? Anyway, I've been maintaining this change for nearly a year - if nothing happens soon I guess I'll drop it so it doesn't get in people's way.
On Sun, Oct 16, 2022 at 07:38:27PM -0700, Andrew Morton wrote: > On Wed, 7 Sep 2022 10:47:24 +0100 Mel Gorman <mgorman@techsingularity.net> wrote: > > > On Tue, Sep 06, 2022 at 09:35:41AM +0200, Michal Hocko wrote: > > > > From: "NeilBrown" <neilb@suse.de> > > > > Subject: mm: discard __GFP_ATOMIC > > > > > > > > __GFP_ATOMIC serves little purpose. Its main effect is to set > > > > ALLOC_HARDER which adds a few little boosts to increase the chance of an > > > > allocation succeeding, one of which is to lower the water-mark at which it > > > > will succeed. > > > > > > > > It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also > > > > adjusts this watermark. It is probable that other users of __GFP_HIGH > > > > should benefit from the other little bonuses that __GFP_ATOMIC gets. > > > > > > > > __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. > > > > There is little point to this. We already get a might_sleep() warning if > > > > __GFP_DIRECT_RECLAIM is set. > > > > > > > > __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is > > > > probable that testing ALLOC_HARDER is a better fit here. > > > > > > > > __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might > > > > sleep. This should test __GFP_DIRECT_RECLAIM instead. > > > > > > > > This patch: > > > > - removes __GFP_ATOMIC > > > > - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set > > > > (as well as ALLOC_HIGH). > > > > - makes other adjustments as suggested by the above. > > > > > > > > The net result is not change to GFP_ATOMIC allocations. Other > > > > allocations that use __GFP_HIGH will benefit from a few different extra > > > > privileges. This affects: > > > > xen, dm, md, ntfs3 > > > > the vermillion frame buffer > > > > hibernation > > > > ksm > > > > swap > > > > all of which likely produce more benefit than cost if these selected > > > > allocation are more likely to succeed quickly. > > > > > > This is a good summary of the current usage and existing issues. It also > > > shows that the naming is tricky and allows people to make wrong calls > > > (tegra-smmu.c). I also thing that it is wrong to couple memory reserves > > > access to the reclaim constrains/expectations of the caller. > > > > > > > I think it's worth trying to get rid of __GFP_ATOMIC although this patch > > needs to be rebased. Without rebasing it, I suspect there is a corner case > > for reserving high order atomic blocks. A high-order atomic allocation > > might get confused with a __GFP_HIGH high-order allocation that can sleep. > > It would not be completely irrational to have such a caller if it was in a > > path that can tolerate a stall but stalling might have visible consequences. > > I'm also worried that the patch might allow __GFP_HIGH to ignore cpusets > > which is probably not intended by direct users like ksm. > > Unclear what you mean by "rebased". I was looking at the original patch, not what was in mm-unstable. > You're saying the patch might have > issues - doesn't that mean it needs to be "fixed"? > Yes, I think it needs some follow-on work. > Anyway, I've been maintaining this change for nearly a year - if > nothing happens soon I guess I'll drop it so it doesn't get in people's > way. I think it's easier to highlight my concerns by renaming ALLOC_HARDER to ALLOC_MIN_RESERVE because that's effectively what it means. It then becomes a bit more obvious where the problems might be. They're all fixable but do people agree that these problems are real or my imagination? --- mm/internal.h | 2 +- mm/page_alloc.c | 51 ++++++++++++++++++++++++++++++++++++++++----------- 2 files changed, 41 insertions(+), 12 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 4b44ced87fff..8c7fd034b277 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -743,7 +743,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, #define ALLOC_OOM ALLOC_NO_WATERMARKS #endif -#define ALLOC_HARDER 0x10 /* try to alloc harder */ +#define ALLOC_MIN_RESERVE 0x10 /* allow allocations below min watermark */ #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */ #define ALLOC_CPUSET 0x40 /* check for correct cpuset */ #define ALLOC_CMA 0x80 /* allow allocations from CMA areas */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1a9a844dc197..9ff4cd4c5d11 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3708,8 +3708,16 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, * due to non-CMA allocation context. HIGHATOMIC area is * reserved for high-order atomic allocation, so order-0 * request should skip it. + * + * BZZT: __GFP_HIGH sets ALLOC_MIN_RESERVE and while __GFP_HIGH + * is set for GFP_ATOMIC, it does not mean that all + * __GFP_HIGH allocations also do not specify + * _GFP_DIRECT_RECLAIM. While there do not appear to + * be any in-tree users that allocation high-order pages + * using __GFP_HIGH && !__GFP_DIRECT_RECLAIM, it's a + * potential issue. */ - if (order > 0 && alloc_flags & ALLOC_HARDER) + if (order > 0 && alloc_flags & ALLOC_MIN_RESERVE) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); if (!page) { page = __rmqueue(zone, order, migratetype, alloc_flags); @@ -3943,11 +3951,11 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE); static inline long __zone_watermark_unusable_free(struct zone *z, unsigned int order, unsigned int alloc_flags) { - const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); + const bool alloc_harder = (alloc_flags & (ALLOC_MIN_RESERVE|ALLOC_OOM)); long unusable_free = (1 << order) - 1; /* - * If the caller does not have rights to ALLOC_HARDER then subtract + * If the caller does not have rights to ALLOC_MIN_RESERVE then subtract * the high-atomic reserves. This will over-estimate the size of the * atomic reserve but it avoids a search. */ @@ -3975,17 +3983,21 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, { long min = mark; int o; - const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); + const bool alloc_harder = (alloc_flags & (ALLOC_MIN_RESERVE|ALLOC_OOM)); /* free_pages may go negative - that's OK */ free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags); + /* + * BZZT: ALLOC_HIGH is now effectively an alias of ALLOC_MIN_RESERVE + * so we end up double dipping below. + */ if (alloc_flags & ALLOC_HIGH) min -= min / 2; if (unlikely(alloc_harder)) { /* - * OOM victims can try even harder than normal ALLOC_HARDER + * OOM victims can try even harder than normal ALLOC_MIN_RESERVE * users on the grounds that it's definitely going to be in * the exit path shortly and free memory. Any allocation it * makes during the free path will be small and short-lived. @@ -4027,6 +4039,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, return true; } #endif + /* BZZT: ALLOC_MIN_RESERVE does not imply access to highatomic reserves */ if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC)) return true; } @@ -4069,12 +4082,12 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, free_pages)) return true; /* - * Ignore watermark boosting for GFP_HIGH order-0 allocations + * Ignore watermark boosting for __GFP_HIGH order-0 allocations * when checking the min watermark. The min watermark is the * point where boosting is ignored so that kswapd is woken up * when below the low watermark. */ - if (unlikely(!order && (alloc_flags & ALLOC_HARDER) && z->watermark_boost + if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) { mark = z->_watermark[WMARK_MIN]; return __zone_watermark_ok(z, order, mark, highest_zoneidx, @@ -4289,8 +4302,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags, /* * If this is a high-order atomic allocation then check * if the pageblock should be reserved for the future + * + * BZZT: __GFP_HIGH does not imply high-order atomic */ - if (unlikely(order && (alloc_flags & ALLOC_HARDER))) + if (unlikely(order && (alloc_flags & ALLOC_MIN_RESERVE))) reserve_highatomic_pageblock(page, zone, order); return page; @@ -4834,6 +4849,10 @@ gfp_to_alloc_flags(gfp_t gfp_mask) * cannot run direct reclaim, or if the caller has realtime scheduling * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will * set both ALLOC_HARDER (unless __GFP_NOMEMALLOC) and ALLOC_HIGH. + * + * BZZT: Minor conflation issue. __GFP_HIGH == ALLOC_HIGH but they are + * now very similar to each other. Probably should remove + * ALLOC_HIGH and modify __GFP_HIGH == ALLOC_MIN_RESERVE */ alloc_flags |= (__force int) (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)); @@ -4844,14 +4863,24 @@ gfp_to_alloc_flags(gfp_t gfp_mask) * if it can't schedule. */ if (!(gfp_mask & __GFP_NOMEMALLOC)) - alloc_flags |= ALLOC_HARDER; + alloc_flags |= ALLOC_MIN_RESERVE; /* * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the * comment for __cpuset_node_allowed(). + * + * BZZT: Not necessarily "atomic", should have checked + * __GFP_DIRECT_RECLAIM? This is a matter of + * definition as __GFP_HIGH is documented as being + * necessary for the system to make progress. + * Either enforce cpusets or update __GFP_HIGH + * documentation stating that CPUSETS may be + * broken. Based on the current __GFP_HIGH + * direct users, documenting that cpusets can + * be broken appears to be the safest option. */ alloc_flags &= ~ALLOC_CPUSET; } else if (unlikely(rt_task(current)) && in_task()) - alloc_flags |= ALLOC_HARDER; + alloc_flags |= ALLOC_MIN_RESERVE; alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags); @@ -5277,7 +5306,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * could deplete whole memory reserves which would just make * the situation worse */ - page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac); + page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac); if (page) goto got_pg;
diff --git a/Documentation/vm/balance.rst b/Documentation/vm/balance.rst index 6a1fadf3e173..e38e9d83c1c7 100644 --- a/Documentation/vm/balance.rst +++ b/Documentation/vm/balance.rst @@ -6,7 +6,7 @@ Memory Balancing Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com> -Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as +Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as well as for non __GFP_IO allocations. The first reason why a caller may avoid reclaim is that the caller can not diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index e900e3c46903..c5fa8b8673b6 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -676,12 +676,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as, * allocate page in a sleeping context if GFP flags permit. Hence * spinlock needs to be unlocked and re-locked after allocation. */ - if (!(gfp & __GFP_ATOMIC)) + if (gfp & __GFP_DIRECT_RECLAIM) spin_unlock_irqrestore(&as->lock, *flags); page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); - if (!(gfp & __GFP_ATOMIC)) + if (gfp & __GFP_DIRECT_RECLAIM) spin_lock_irqsave(&as->lock, *flags); /* diff --git a/include/linux/gfp.h b/include/linux/gfp.h index b976c4177299..676c813bd93f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -39,7 +39,7 @@ struct vm_area_struct; #define ___GFP_IO 0x40u #define ___GFP_FS 0x80u #define ___GFP_ZERO 0x100u -#define ___GFP_ATOMIC 0x200u +/* 0x200u unused */ #define ___GFP_DIRECT_RECLAIM 0x400u #define ___GFP_KSWAPD_RECLAIM 0x800u #define ___GFP_WRITE 0x1000u @@ -116,11 +116,8 @@ struct vm_area_struct; * * %__GFP_HIGH indicates that the caller is high-priority and that granting * the request is necessary before the system can make forward progress. - * For example, creating an IO context to clean pages. - * - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is - * high priority. Users are typically interrupt handlers. This may be - * used in conjunction with %__GFP_HIGH + * For example creating an IO context to clean pages and requests + * from atomic context. * * %__GFP_MEMALLOC allows access to all memory. This should only be used when * the caller guarantees the allocation will allow more memory to be freed @@ -135,7 +132,6 @@ struct vm_area_struct; * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. * This takes precedence over the %__GFP_MEMALLOC flag if both are set. */ -#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC) #define __GFP_HIGH ((__force gfp_t)___GFP_HIGH) #define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC) #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) @@ -320,7 +316,7 @@ struct vm_area_struct; * version does not attempt reclaim/compaction at all and is by default used * in page fault path, while the non-light is used by khugepaged. */ -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) +#define GFP_ATOMIC (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 116ed4d5d0f8..30f492256b8c 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -29,7 +29,6 @@ {(unsigned long)__GFP_HIGHMEM, "__GFP_HIGHMEM"}, \ {(unsigned long)GFP_DMA32, "GFP_DMA32"}, \ {(unsigned long)__GFP_HIGH, "__GFP_HIGH"}, \ - {(unsigned long)__GFP_ATOMIC, "__GFP_ATOMIC"}, \ {(unsigned long)__GFP_IO, "__GFP_IO"}, \ {(unsigned long)__GFP_FS, "__GFP_FS"}, \ {(unsigned long)__GFP_NOWARN, "__GFP_NOWARN"}, \ diff --git a/lib/test_printf.c b/lib/test_printf.c index 07309c45f327..8010de49b6c5 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -673,17 +673,17 @@ flags(void) gfp = GFP_ATOMIC|__GFP_DMA; test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp); - gfp = __GFP_ATOMIC; - test("__GFP_ATOMIC", "%pGg", &gfp); + gfp = __GFP_HIGH; + test("__GFP_HIGH", "%pGg", &gfp); /* Any flags not translated by the table should remain numeric */ gfp = ~__GFP_BITS_MASK; snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp); test(cmp_buffer, "%pGg", &gfp); - snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx", + snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx", (unsigned long) gfp); - gfp |= __GFP_ATOMIC; + gfp |= __GFP_HIGH; test(cmp_buffer, "%pGg", &gfp); kfree(cmp_buffer); diff --git a/mm/internal.h b/mm/internal.h index 3b79a5c9427a..4564067065f9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -20,8 +20,7 @@ */ #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\ __GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\ - __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\ - __GFP_ATOMIC) + __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC) /* The GFP flags allowed during early boot */ #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS)) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 781605e92015..1ff55347e4b9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2621,7 +2621,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, * put the burden of reclaim on regular allocation requests * and let these go through as privileged allocations. */ - if (gfp_mask & __GFP_ATOMIC) + if (gfp_mask & __GFP_HIGH) goto force; /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..4fff90ca145b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3932,12 +3932,12 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, free_pages)) return true; /* - * Ignore watermark boosting for GFP_ATOMIC order-0 allocations + * Ignore watermark boosting for GFP_HIGH order-0 allocations * when checking the min watermark. The min watermark is the * point where boosting is ignored so that kswapd is woken up * when below the low watermark. */ - if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost + if (unlikely(!order && (alloc_flags & ALLOC_HARDER) && z->watermark_boost && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) { mark = z->_watermark[WMARK_MIN]; return __zone_watermark_ok(z, order, mark, highest_zoneidx, @@ -4661,12 +4661,12 @@ gfp_to_alloc_flags(gfp_t gfp_mask) * The caller may dip into page reserves a bit more if the caller * cannot run direct reclaim, or if the caller has realtime scheduling * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will - * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH). + * set both ALLOC_HARDER (unless __GFP_NOMEMALLOC) and ALLOC_HIGH. */ alloc_flags |= (__force int) (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)); - if (gfp_mask & __GFP_ATOMIC) { + if (gfp_mask & __GFP_HIGH) { /* * Not worth trying to allocate harder for __GFP_NOMEMALLOC even * if it can't schedule. @@ -4859,14 +4859,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, unsigned int cpuset_mems_cookie; int reserve_flags; - /* - * We also sanity check to catch abuse of atomic reserves being used by - * callers that are not in atomic context. - */ - if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) == - (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM))) - gfp_mask &= ~__GFP_ATOMIC; - retry_cpuset: compaction_retries = 0; no_progress_loops = 0; diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index da03a341c63c..17e6eeb6e196 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -640,7 +640,6 @@ static const struct { { "__GFP_HIGHMEM", "HM" }, { "GFP_DMA32", "D32" }, { "__GFP_HIGH", "H" }, - { "__GFP_ATOMIC", "_A" }, { "__GFP_IO", "I" }, { "__GFP_FS", "F" }, { "__GFP_NOWARN", "NWR" }, diff --git a/tools/testing/radix-tree/linux/gfp.h b/tools/testing/radix-tree/linux/gfp.h index 32159c08a52e..0a0741104dfe 100644 --- a/tools/testing/radix-tree/linux/gfp.h +++ b/tools/testing/radix-tree/linux/gfp.h @@ -12,7 +12,6 @@ #define __GFP_FS 0x80u #define __GFP_NOWARN 0x200u #define __GFP_ZERO 0x8000u -#define __GFP_ATOMIC 0x80000u #define __GFP_ACCOUNT 0x100000u #define __GFP_DIRECT_RECLAIM 0x400000u #define __GFP_KSWAPD_RECLAIM 0x2000000u @@ -20,7 +19,7 @@ #define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM|__GFP_KSWAPD_RECLAIM) #define GFP_ZONEMASK 0x0fu -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) +#define GFP_ATOMIC (__GFP_HIGH|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
__GFP_ATOMIC serves little purpose. It's main effect is to set ALLOC_HARDER which adds a few little boosts to increase the chance of an allocation succeeding, one of which is to lower the water-mark at which it will succeed. It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also adjusts this watermark. It is probable that other users of __GFP_HIGH should benefit from the other little bonuses that __GFP_ATOMIC gets. __GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM. There is little point to this. We already get a might_sleep() warning if __GFP_DIRECT_RECLAIM is set. __GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is probable that testing ALLOC_HARDER is a better fit here. __GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might sleep. This should test __GFP_DIRECT_RECLAIM instead. This patch: - removes __GFP_ATOMIC - causes __GFP_HIGH to set ALLOC_HARDER unless __GFP_NOMEMALLOC is set (as well as ALLOC_HIGH). - makes other adjustments as suggested by the above. The net result is not change to GFP_ATOMIC allocations. Other allocations that use __GFP_HIGH will benefit from a few different extra privileges. This affects: xen, dm, md, ntfs3 the vermillion frame buffer hibernation ksm swap all of which likely produce more benefit than cost if these selected allocation are more likely to succeed quickly. Signed-off-by: NeilBrown <neilb@suse.de> --- This patch came out of my attempts to review and document some of the __GFP flags. I couldn't find that __GFP_ATOMIC meant anything really useful, so I thought that discarding it would be best. Obviously GFP_ATOMIC is useful - just not __GFP_ATOMIC (and the fact that both names exist could be seen as a warning sign). Thanks, NeilBrown Documentation/vm/balance.rst | 2 +- drivers/iommu/tegra-smmu.c | 4 ++-- include/linux/gfp.h | 12 ++++-------- include/trace/events/mmflags.h | 1 - lib/test_printf.c | 8 ++++---- mm/internal.h | 3 +-- mm/memcontrol.c | 2 +- mm/page_alloc.c | 16 ++++------------ tools/perf/builtin-kmem.c | 1 - tools/testing/radix-tree/linux/gfp.h | 3 +-- 10 files changed, 18 insertions(+), 34 deletions(-)