Message ID | 20190419094335.GJ18914@techsingularity.net (mailing list archive) |
---|---|
State | Awaiting Upstream |
Headers | show |
Series | mm: Do not boost watermarks to avoid fragmentation for the DISCONTIG memory model | expand |
On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > DISCONTIG is essentially deprecated and even parisc plans to move to > SPARSEMEM so there is no need to be fancy, this patch simply disables > watermark boosting by default on DISCONTIGMEM. I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA scenarios. Grepping the arch/ directories shows: alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) arc (for supporting more than 1GB of memory) ia64 (looks complicated ...) m68k (for multiple chunks of memory) mips (does support NUMA but also non-NUMA) parisc (both NUMA and non-NUMA) I'm not sure that these architecture maintainers even know that DISCONTIGMEM is deprecated. Adding linux-arch to the cc.
On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: > On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > > DISCONTIG is essentially deprecated and even parisc plans to move to > > SPARSEMEM so there is no need to be fancy, this patch simply disables > > watermark boosting by default on DISCONTIGMEM. > > I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA > scenarios. Grepping the arch/ directories shows: > > alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) > arc (for supporting more than 1GB of memory) > ia64 (looks complicated ...) > m68k (for multiple chunks of memory) > mips (does support NUMA but also non-NUMA) > parisc (both NUMA and non-NUMA) > > I'm not sure that these architecture maintainers even know that DISCONTIGMEM > is deprecated. Adding linux-arch to the cc. Poor wording then -- yes, DISCONTIGMEM is still used but look where it's used. I find it impossible to believe that any new arch would support DISCONTIGMEM or that DISCONTIGMEM would be selected when SPARSEMEM is available.`It's even more insane when you consider that SPARSEMEM can be extended to support VMEMMAP so that it has similar overhead to FLATMEM when mapping pfns to struct pages and vice-versa.
On 19.04.19 16:28, Mel Gorman wrote: > On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: >> On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: >>> DISCONTIG is essentially deprecated and even parisc plans to move to >>> SPARSEMEM so there is no need to be fancy, this patch simply disables >>> watermark boosting by default on DISCONTIGMEM. >> >> I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA >> scenarios. Grepping the arch/ directories shows: >> >> alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) >> arc (for supporting more than 1GB of memory) >> ia64 (looks complicated ...) >> m68k (for multiple chunks of memory) >> mips (does support NUMA but also non-NUMA) >> parisc (both NUMA and non-NUMA) >> >> I'm not sure that these architecture maintainers even know that DISCONTIGMEM >> is deprecated. Adding linux-arch to the cc. > > Poor wording then -- yes, DISCONTIGMEM is still used but look where it's > used. I find it impossible to believe that any new arch would support > DISCONTIGMEM or that DISCONTIGMEM would be selected when SPARSEMEM is > available.`It's even more insane when you consider that SPARSEMEM can be > extended to support VMEMMAP so that it has similar overhead to FLATMEM > when mapping pfns to struct pages and vice-versa. FYI, on parisc we will switch from DISCONTIGMEM to SPARSEMEM with kernel 5.2. The patch was quite simple and it's currently in the for-next tree: https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=281b718721a5e78288271d632731cea9697749f7 Helge
On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: > On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > > DISCONTIG is essentially deprecated and even parisc plans to move to > > SPARSEMEM so there is no need to be fancy, this patch simply disables > > watermark boosting by default on DISCONTIGMEM. > > I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA > scenarios. Grepping the arch/ directories shows: > > alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) > arc (for supporting more than 1GB of memory) > ia64 (looks complicated ...) > m68k (for multiple chunks of memory) > mips (does support NUMA but also non-NUMA) > parisc (both NUMA and non-NUMA) i386 NUMA as well > I'm not sure that these architecture maintainers even know that DISCONTIGMEM > is deprecated. Adding linux-arch to the cc. >
On Sun, Apr 21, 2019 at 09:38:59AM +0300, Mike Rapoport wrote: > On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: > > On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > > > DISCONTIG is essentially deprecated and even parisc plans to move to > > > SPARSEMEM so there is no need to be fancy, this patch simply disables > > > watermark boosting by default on DISCONTIGMEM. > > > > I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA > > scenarios. Grepping the arch/ directories shows: > > > > alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) > > arc (for supporting more than 1GB of memory) > > ia64 (looks complicated ...) > > m68k (for multiple chunks of memory) > > mips (does support NUMA but also non-NUMA) > > parisc (both NUMA and non-NUMA) > > i386 NUMA as well I clearly over-trimmed. The original assumption that Mel had was that DISCONTIGMEM => NUMA, and that's not true on the above six architectures. It is true on i386 ;-)
On Sun, Apr 21, 2019 at 06:26:07AM -0700, Matthew Wilcox wrote: > On Sun, Apr 21, 2019 at 09:38:59AM +0300, Mike Rapoport wrote: > > On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: > > > On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > > > > DISCONTIG is essentially deprecated and even parisc plans to move to > > > > SPARSEMEM so there is no need to be fancy, this patch simply disables > > > > watermark boosting by default on DISCONTIGMEM. > > > > > > I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA > > > scenarios. Grepping the arch/ directories shows: > > > > > > alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) > > > arc (for supporting more than 1GB of memory) > > > ia64 (looks complicated ...) > > > m68k (for multiple chunks of memory) > > > mips (does support NUMA but also non-NUMA) > > > parisc (both NUMA and non-NUMA) > > > > i386 NUMA as well > > I clearly over-trimmed. The original assumption that Mel had was that > DISCONTIGMEM => NUMA, and that's not true on the above six architectures. > It is true on i386 ;-) 32-bit NUMA systems should be non-existent in practice. The last NUMA system I'm aware of that was both NUMA and 32-bit only died somewhere between 2004 and 2007. If someone is running a 64-bit capable system in 32-bit mode with NUMA, they really are just punishing themselves for fun.
On Fri, 19 Apr 2019, Matthew Wilcox wrote:
> ia64 (looks complicated ...)
Well as far as I can tell it was not even used 12 or so years ago on
Itanium when I worked on that stuff.
On 4/19/19 11:43 AM, Mel Gorman wrote: > Mikulas Patocka reported that 1c30844d2dfe ("mm: reclaim small amounts > of memory when an external fragmentation event occurs") "broke" memory > management on parisc. The machine is not NUMA but the DISCONTIG model > creates three pgdats even though it's a UMA machine for the following > ranges > > 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB > 1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB > 2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB > > From his own report > > With the patch 1c30844d2, the kernel will incorrectly reclaim the > first zone when it fills up, ignoring the fact that there are two > completely free zones. Basiscally, it limits cache size to 1GiB. > > For example, if I run: > # dd if=/dev/sda of=/dev/null bs=1M count=2048 > > - with the proper kernel, there should be "Buffers - 2GiB" > when this command finishes. With the patch 1c30844d2, buffers > will consume just 1GiB or slightly more, because the kernel was > incorrectly reclaiming them. > > The page allocator and reclaim makes assumptions that pgdats really > represent NUMA nodes and zones represent ranges and makes decisions > on that basis. Watermark boosting for small pgdats leads to unexpected > results even though this would have behaved reasonably on SPARSEMEM. > > DISCONTIG is essentially deprecated and even parisc plans to move to > SPARSEMEM so there is no need to be fancy, this patch simply disables > watermark boosting by default on DISCONTIGMEM. > > Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") > Reported-by: Mikulas Patocka <mpatocka@redhat.com> > Tested-by: Mikulas Patocka <mpatocka@redhat.com> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > Documentation/sysctl/vm.txt | 16 ++++++++-------- > mm/page_alloc.c | 13 +++++++++++++ > 2 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt > index 6af24cdb25cc..3f13d8599337 100644 > --- a/Documentation/sysctl/vm.txt > +++ b/Documentation/sysctl/vm.txt > @@ -866,14 +866,14 @@ The intent is that compaction has less work to do in the future and to > increase the success rate of future high-order allocations such as SLUB > allocations, THP and hugetlbfs pages. > > -To make it sensible with respect to the watermark_scale_factor parameter, > -the unit is in fractions of 10,000. The default value of 15,000 means > -that up to 150% of the high watermark will be reclaimed in the event of > -a pageblock being mixed due to fragmentation. The level of reclaim is > -determined by the number of fragmentation events that occurred in the > -recent past. If this value is smaller than a pageblock then a pageblocks > -worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor > -of 0 will disable the feature. > +To make it sensible with respect to the watermark_scale_factor > +parameter, the unit is in fractions of 10,000. The default value of > +15,000 on !DISCONTIGMEM configurations means that up to 150% of the high > +watermark will be reclaimed in the event of a pageblock being mixed due > +to fragmentation. The level of reclaim is determined by the number of > +fragmentation events that occurred in the recent past. If this value is > +smaller than a pageblock then a pageblocks worth of pages will be reclaimed > +(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature. > > ============================================================= > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cfaba3889fa2..86c3806f1070 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -266,7 +266,20 @@ compound_page_dtor * const compound_page_dtors[] = { > > int min_free_kbytes = 1024; > int user_min_free_kbytes = -1; > +#ifdef CONFIG_DISCONTIGMEM > +/* > + * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges > + * are not on separate NUMA nodes. Functionally this works but with > + * watermark_boost_factor, it can reclaim prematurely as the ranges can be > + * quite small. By default, do not boost watermarks on discontigmem as in > + * many cases very high-order allocations like THP are likely to be > + * unsupported and the premature reclaim offsets the advantage of long-term > + * fragmentation avoidance. > + */ > +int watermark_boost_factor __read_mostly; > +#else > int watermark_boost_factor __read_mostly = 15000; > +#endif > int watermark_scale_factor = 10; > > static unsigned long nr_kernel_pages __initdata; >
On Sun, Apr 21, 2019 at 10:16:04PM +0100, Mel Gorman wrote: > 32-bit NUMA systems should be non-existent in practice. The last NUMA > system I'm aware of that was both NUMA and 32-bit only died somewhere > between 2004 and 2007. If someone is running a 64-bit capable system in > 32-bit mode with NUMA, they really are just punishing themselves for fun. Can we mark it as BROKEN to see if someone shouts and then remove it a year or two down the road? Or just kill it off now..
>> ia64 (looks complicated ...) > > Well as far as I can tell it was not even used 12 or so years ago on > Itanium when I worked on that stuff. My notes tell that on UP ia64 (RX2620), !NUMA was broken with both SPARSEMEM and DISCONTIGMEM. NUMA+SPARSEMEM or !NUMA worked. Even NUMA+DISCONTIGMEM worked, that was my config on 2-CPU RX2660.
On Tue, Apr 23, 2019 at 12:13:54AM -0700, Christoph Hellwig wrote: > On Sun, Apr 21, 2019 at 10:16:04PM +0100, Mel Gorman wrote: > > 32-bit NUMA systems should be non-existent in practice. The last NUMA > > system I'm aware of that was both NUMA and 32-bit only died somewhere > > between 2004 and 2007. If someone is running a 64-bit capable system in > > 32-bit mode with NUMA, they really are just punishing themselves for fun. > > Can we mark it as BROKEN to see if someone shouts and then remove it > a year or two down the road? Or just kill it off now.. How about making SPARSEMEM default for x86-32? From ac2dc27414e26f799ea063fd1d01e19d70056f43 Mon Sep 17 00:00:00 2001 From: Mike Rapoport <rppt@linux.ibm.com> Date: Wed, 24 Apr 2019 14:32:12 +0300 Subject: [PATCH] x86/Kconfig: make SPARSEMEM default for X86_32 Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> --- arch/x86/Kconfig | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 62fc3fd..77b17af 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1609,10 +1609,6 @@ config ARCH_DISCONTIGMEM_ENABLE def_bool y depends on NUMA && X86_32 -config ARCH_DISCONTIGMEM_DEFAULT - def_bool y - depends on NUMA && X86_32 - config ARCH_SPARSEMEM_ENABLE def_bool y depends on X86_64 || NUMA || X86_32 || X86_32_NON_STANDARD @@ -1621,7 +1617,7 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y - depends on X86_64 + depends on X86_64 || (NUMA && X86_32) config ARCH_SELECT_MEMORY_MODEL def_bool y
On Wed, Apr 24, 2019 at 02:33:53PM +0300, Mike Rapoport wrote: > On Tue, Apr 23, 2019 at 12:13:54AM -0700, Christoph Hellwig wrote: > > On Sun, Apr 21, 2019 at 10:16:04PM +0100, Mel Gorman wrote: > > > 32-bit NUMA systems should be non-existent in practice. The last NUMA > > > system I'm aware of that was both NUMA and 32-bit only died somewhere > > > between 2004 and 2007. If someone is running a 64-bit capable system in > > > 32-bit mode with NUMA, they really are just punishing themselves for fun. > > > > Can we mark it as BROKEN to see if someone shouts and then remove it > > a year or two down the road? Or just kill it off now.. > > How about making SPARSEMEM default for x86-32? > While an improvement, I tend to agree with Christoph that marking it BROKEN as a patch on top of this makes sense and wait to see who, if anyone, screams. If it's quiet for long enough then we can remove it entirely.
On Wed, Apr 24, 2019 at 02:33:53PM +0300, Mike Rapoport wrote: > On Tue, Apr 23, 2019 at 12:13:54AM -0700, Christoph Hellwig wrote: > > On Sun, Apr 21, 2019 at 10:16:04PM +0100, Mel Gorman wrote: > > > 32-bit NUMA systems should be non-existent in practice. The last NUMA > > > system I'm aware of that was both NUMA and 32-bit only died somewhere > > > between 2004 and 2007. If someone is running a 64-bit capable system in > > > 32-bit mode with NUMA, they really are just punishing themselves for fun. > > > > Can we mark it as BROKEN to see if someone shouts and then remove it > > a year or two down the road? Or just kill it off now.. > > How about making SPARSEMEM default for x86-32? Sounds good. Another question: I always found the option to even select the memory models like a bad tradeoff. Can we really expect a user to make a sane choice? I'd rather stick to a relativelty optimal choice based on arch and maybe a few other parameters (NUMA or not for example) and stick to it, reducing the testing matrix.
On Tue, Apr 23, 2019 at 07:49:57PM +0300, Meelis Roos wrote: > > > ia64 (looks complicated ...) > > > > Well as far as I can tell it was not even used 12 or so years ago on > > Itanium when I worked on that stuff. > > My notes tell that on UP ia64 (RX2620), !NUMA was broken with both > SPARSEMEM and DISCONTIGMEM. NUMA+SPARSEMEM or !NUMA worked. Even > NUMA+DISCONTIGMEM worked, that was my config on 2-CPU RX2660. ia64 has a such a huge number of memory model choices. Maybe we need to cut it down to a small set that actually work. That includes fund bits like the 'VIRTUAL_MEM_MAP' option where the comment claims: # VIRTUAL_MEM_MAP and FLAT_NODE_MEM_MAP are functionally equivalent. # VIRTUAL_MEM_MAP has been retained for historical reasons. but it still is selected as the default if sparsemem is not enabled..
> ia64 has a such a huge number of memory model choices. Maybe we > need to cut it down to a small set that actually work. SGI systems had extremely discontiguous memory (they used some high order physical address bits in the tens/hundreds of terabyte range for the node number ... so there would be a few GBytes of actual memory then a huge gap before the next node had a few more Gbytes). I don't know of anyone still booting upstream on an SN2, so if we start doing serious hack and slash the chances are high that SN2 will be broken (if it isn't already). -Tony
On Mon, Apr 29, 2019 at 04:58:09PM +0000, Luck, Tony wrote: > > ia64 has a such a huge number of memory model choices. Maybe we > > need to cut it down to a small set that actually work. > > SGI systems had extremely discontiguous memory (they used some high > order physical address bits in the tens/hundreds of terabyte range for the > node number ... so there would be a few GBytes of actual memory then > a huge gap before the next node had a few more Gbytes). > > I don't know of anyone still booting upstream on an SN2, so if we start doing > serious hack and slash the chances are high that SN2 will be broken (if it isn't > already). When I wrote this, I thought of !NUMA: flat mem NUMA: sparsemem SN2: discontig based on Meelis report. But now that you mention it, I bet SN2 has already died slow death from bitrot. It is so different in places, and it doesn't seem like anyone care - if people want room sized SGI machines the Origin is much more sexy (hello Thomas!) :) So maybe it it time to mark SN2 broken and see if anyone screams? Without SN2 the whole machvec mess could basically go away - the only real difference between the remaining machvecs is which iommu if any we set up.
On Mon, 29 Apr 2019, Christoph Hellwig wrote: > So maybe it it time to mark SN2 broken and see if anyone screams? > > Without SN2 the whole machvec mess could basically go away - the > only real difference between the remaining machvecs is which iommu > if any we set up. SPARSEMEM with VMEMMAP was developed to address these issues and allow one mapping scheme across the different platforms. You do not need DISCONTIGMEM support for SN2. And as far as I know (from a decade ago ok....) the distributions were using VMEMMAP instead.
On Sun, Apr 28, 2019 at 01:11:07AM -0700, Christoph Hellwig wrote: > On Wed, Apr 24, 2019 at 02:33:53PM +0300, Mike Rapoport wrote: > > On Tue, Apr 23, 2019 at 12:13:54AM -0700, Christoph Hellwig wrote: > > > On Sun, Apr 21, 2019 at 10:16:04PM +0100, Mel Gorman wrote: > > > > 32-bit NUMA systems should be non-existent in practice. The last NUMA > > > > system I'm aware of that was both NUMA and 32-bit only died somewhere > > > > between 2004 and 2007. If someone is running a 64-bit capable system in > > > > 32-bit mode with NUMA, they really are just punishing themselves for fun. > > > > > > Can we mark it as BROKEN to see if someone shouts and then remove it > > > a year or two down the road? Or just kill it off now.. > > > > How about making SPARSEMEM default for x86-32? > > Sounds good. > > Another question: I always found the option to even select the memory > models like a bad tradeoff. Can we really expect a user to make a sane > choice? I'd rather stick to a relativelty optimal choice based on arch > and maybe a few other parameters (NUMA or not for example) and stick to > it, reducing the testing matrix. I've sent patches that remove ARCH_SELECT_MEMORY_MODEL from arm, s390 and sparc where it anyway has no effect [1]. That leaves arm64, ia64, parisc, powerpc, sh and i386. I'd say that for i386 selecting between FLAT and SPARSE based on NUMA sounds reasonable. I'm not familiar enough with others to say if such enforcement makes any sense. Probably powerpc and sh can enable the preferred memory model in platform/board part of their Kconfig, just like arm. [1] https://lore.kernel.org/lkml/1556740577-4140-1-git-send-email-rppt@linux.ibm.com
Hi, On Fri, Apr 19, 2019 at 10:08:31PM +0200, Helge Deller wrote: > On 19.04.19 16:28, Mel Gorman wrote: > > On Fri, Apr 19, 2019 at 07:05:21AM -0700, Matthew Wilcox wrote: > >> On Fri, Apr 19, 2019 at 10:43:35AM +0100, Mel Gorman wrote: > >>> DISCONTIG is essentially deprecated and even parisc plans to move to > >>> SPARSEMEM so there is no need to be fancy, this patch simply disables > >>> watermark boosting by default on DISCONTIGMEM. > >> > >> I don't think parisc is the only arch which uses DISCONTIGMEM for !NUMA > >> scenarios. Grepping the arch/ directories shows: > >> > >> alpha (does support NUMA, but also non-NUMA DISCONTIGMEM) > >> arc (for supporting more than 1GB of memory) > >> ia64 (looks complicated ...) > >> m68k (for multiple chunks of memory) > >> mips (does support NUMA but also non-NUMA) > >> parisc (both NUMA and non-NUMA) > >> > >> I'm not sure that these architecture maintainers even know that DISCONTIGMEM > >> is deprecated. Adding linux-arch to the cc. > > > > Poor wording then -- yes, DISCONTIGMEM is still used but look where it's > > used. I find it impossible to believe that any new arch would support > > DISCONTIGMEM or that DISCONTIGMEM would be selected when SPARSEMEM is > > available.`It's even more insane when you consider that SPARSEMEM can be > > extended to support VMEMMAP so that it has similar overhead to FLATMEM > > when mapping pfns to struct pages and vice-versa. > > FYI, on parisc we will switch from DISCONTIGMEM to SPARSEMEM with kernel 5.2. > The patch was quite simple and it's currently in the for-next tree: > https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=281b718721a5e78288271d632731cea9697749f7 A while ago I've sent a patch that removes ARCH_DISCARD_MEMBLOCK option [1] so the hunk below is not needed: diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index c8038165b81f..26c215570adf 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -36,6 +36,7 @@ config PARISC select GENERIC_STRNCPY_FROM_USER select SYSCTL_ARCH_UNALIGN_ALLOW select SYSCTL_EXCEPTION_TRACE + select ARCH_DISCARD_MEMBLOCK select HAVE_MOD_ARCH_SPECIFIC select VIRT_TO_BUS select MODULES_USE_ELF_RELA [1] https://lore.kernel.org/lkml/1556102150-32517-1-git-send-email-rppt@linux.ibm.com/ > Helge >
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 6af24cdb25cc..3f13d8599337 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -866,14 +866,14 @@ The intent is that compaction has less work to do in the future and to increase the success rate of future high-order allocations such as SLUB allocations, THP and hugetlbfs pages. -To make it sensible with respect to the watermark_scale_factor parameter, -the unit is in fractions of 10,000. The default value of 15,000 means -that up to 150% of the high watermark will be reclaimed in the event of -a pageblock being mixed due to fragmentation. The level of reclaim is -determined by the number of fragmentation events that occurred in the -recent past. If this value is smaller than a pageblock then a pageblocks -worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor -of 0 will disable the feature. +To make it sensible with respect to the watermark_scale_factor +parameter, the unit is in fractions of 10,000. The default value of +15,000 on !DISCONTIGMEM configurations means that up to 150% of the high +watermark will be reclaimed in the event of a pageblock being mixed due +to fragmentation. The level of reclaim is determined by the number of +fragmentation events that occurred in the recent past. If this value is +smaller than a pageblock then a pageblocks worth of pages will be reclaimed +(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature. ============================================================= diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cfaba3889fa2..86c3806f1070 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -266,7 +266,20 @@ compound_page_dtor * const compound_page_dtors[] = { int min_free_kbytes = 1024; int user_min_free_kbytes = -1; +#ifdef CONFIG_DISCONTIGMEM +/* + * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges + * are not on separate NUMA nodes. Functionally this works but with + * watermark_boost_factor, it can reclaim prematurely as the ranges can be + * quite small. By default, do not boost watermarks on discontigmem as in + * many cases very high-order allocations like THP are likely to be + * unsupported and the premature reclaim offsets the advantage of long-term + * fragmentation avoidance. + */ +int watermark_boost_factor __read_mostly; +#else int watermark_boost_factor __read_mostly = 15000; +#endif int watermark_scale_factor = 10; static unsigned long nr_kernel_pages __initdata;