mm: be more verbose for alloc_contig_range failures

Message ID	20210217163603.429062-1-minchan@kernel.org (mailing list archive)
State	New, archived
From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>, mhocko@suse.com, david@redhat.com, joaodias@google.com, Minchan Kim <minchan@kernel.org>
Subject: [PATCH] mm: be more verbose for alloc_contig_range faliures
Date: Wed, 17 Feb 2021 08:36:03 -0800

Minchan Kim Feb. 17, 2021, 4:36 p.m. UTC

alloc_contig_range is usually used on a CMA area or the movable zone.
A page migration failure in those areas is critical, so dump more
debugging messages, as memory_hotplug does, unless the user specifies
__GFP_NOWARN.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/page_alloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

David Hildenbrand Feb. 17, 2021, 4:51 p.m. UTC | #1

On 17.02.21 17:36, Minchan Kim wrote:
> alloc_contig_range is usually used on cma area or movable zone.
> It's critical if the page migration fails on those areas so
> dump more debugging message like memory_hotplug unless user
> specifiy __GFP_NOWARN.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>   mm/page_alloc.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0b55c9c95364..67f3ee3a1528 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8486,6 +8486,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>   	}
>   	if (ret < 0) {
> +		if (!(cc->gfp_mask & __GFP_NOWARN)) {
> +			struct page *page;
> +
> +			list_for_each_entry(page, &cc->migratepages, lru) {
> +				pr_warn("migrating pfn %lx failed ret:%d ",
> +						page_to_pfn(page), ret);
> +				dump_page(page, "migration failure");
> +			}

This can create *a lot* of noise. For example, until huge pages are 
actually considered, we will choke on each and every huge page - and 
might do so over and over again.

This might be helpful for debugging, but is unacceptable for production 
systems for now I think. Maybe for now, do it based on CONFIG_DEBUG_VM.
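
A minimal userspace sketch of what gating the dump on a build-time option could look like. `SKETCH_DEBUG_VM` and `sketch_dump_failed_pfns()` are hypothetical stand-ins for CONFIG_DEBUG_VM and the per-page dump in the patch; none of this is kernel code.

```c
#include <stdio.h>

/*
 * SKETCH_DEBUG_VM is a hypothetical stand-in for CONFIG_DEBUG_VM.
 * Build with -DSKETCH_DEBUG_VM to compile the verbose path in.
 *
 * Returns the number of pages that were dumped.
 */
static int sketch_dump_failed_pfns(const unsigned long *pfns, int n, int ret)
{
#ifdef SKETCH_DEBUG_VM
	/* Debug build: report every page that failed to migrate. */
	for (int i = 0; i < n; i++)
		fprintf(stderr, "migrating pfn %lx failed ret:%d\n",
			pfns[i], ret);
	return n;
#else
	/* Production build: the dump is compiled out entirely. */
	(void)pfns;
	(void)n;
	(void)ret;
	return 0;
#endif
}
```

With such a gate, a production build pays nothing for the dump and logs nothing, at the cost of needing a rebuild to get the extra output.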

> +		}
>   		putback_movable_pages(&cc->migratepages);
>   		return ret;
>   	}
> @@ -8728,6 +8737,8 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
>   		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
>   		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
>   			if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
> +				unsigned long gfp_flags;
> +
>   				/*
>   				 * We release the zone lock here because
>   				 * alloc_contig_range() will also lock the zone
> @@ -8736,8 +8747,11 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
>   				 * and cause alloc_contig_range() to fail...
>   				 */
>   				spin_unlock_irqrestore(&zone->lock, flags);
> +
> +				if (zone_idx(zone) != ZONE_MOVABLE)
> +					gfp_flags = gfp_mask | __GFP_NOWARN;

This feels wrong. It might be better to make that decision inside 
__alloc_contig_migrate_range() based on cc->zone.

Minchan Kim Feb. 17, 2021, 5:26 p.m. UTC | #2

On Wed, Feb 17, 2021 at 05:51:27PM +0100, David Hildenbrand wrote:
> On 17.02.21 17:36, Minchan Kim wrote:
> > alloc_contig_range is usually used on cma area or movable zone.
> > It's critical if the page migration fails on those areas so
> > dump more debugging message like memory_hotplug unless user
> > specifiy __GFP_NOWARN.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >   mm/page_alloc.c | 16 +++++++++++++++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0b55c9c95364..67f3ee3a1528 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -8486,6 +8486,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> >   				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> >   	}
> >   	if (ret < 0) {
> > +		if (!(cc->gfp_mask & __GFP_NOWARN)) {
> > +			struct page *page;
> > +
> > +			list_for_each_entry(page, &cc->migratepages, lru) {
> > +				pr_warn("migrating pfn %lx failed ret:%d ",
> > +						page_to_pfn(page), ret);
> > +				dump_page(page, "migration failure");
> > +			}
> 
> This can create *a lot* of noise. For example, until huge pages are actually
> considered, we will choke on each end every huge page - and might do so over
> and over again.

I am not familiar with huge page status at this moment but why couldn't
they use __GFP_NOWARN if they are supposed to fail frequently?

> 
> This might be helpful for debugging, but is unacceptable for production
> systems for now I think. Maybe for now, do it based on CONFIG_DEBUG_VM.

If it's due to the huge pages you mentioned above and the caller passes
__GFP_NOWARN in that case, couldn't we leave this always enabled?

Actually, I am targeting CMA allocation failures, which should be
rather rare compared to other call sites but are critical when they occur.
If emitting too many warning messages is a concern, I will scope this
down to CMA allocation call sites only.

> 
> > +		}
> >   		putback_movable_pages(&cc->migratepages);
> >   		return ret;
> >   	}
> > @@ -8728,6 +8737,8 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
> >   		pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> >   		while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
> >   			if (pfn_range_valid_contig(zone, pfn, nr_pages)) {
> > +				unsigned long gfp_flags;
> > +
> >   				/*
> >   				 * We release the zone lock here because
> >   				 * alloc_contig_range() will also lock the zone
> > @@ -8736,8 +8747,11 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
> >   				 * and cause alloc_contig_range() to fail...
> >   				 */
> >   				spin_unlock_irqrestore(&zone->lock, flags);
> > +
> > +				if (zone_idx(zone) != ZONE_MOVABLE)
> > +					gfp_flags = gfp_mask | __GFP_NOWARN;
> 
> This feels wrong. It might be better to make that decision inside
> __alloc_contig_migrate_range() based on cc->zone.

CMA could be in any normal zone, so the suggestion would make it silent.

David Hildenbrand Feb. 17, 2021, 5:34 p.m. UTC | #3

On 17.02.21 18:26, Minchan Kim wrote:
> On Wed, Feb 17, 2021 at 05:51:27PM +0100, David Hildenbrand wrote:
>> On 17.02.21 17:36, Minchan Kim wrote:
>>> alloc_contig_range is usually used on cma area or movable zone.
>>> It's critical if the page migration fails on those areas so
>>> dump more debugging message like memory_hotplug unless user
>>> specifiy __GFP_NOWARN.
>>>
>>> Signed-off-by: Minchan Kim <minchan@kernel.org>
>>> ---
>>>    mm/page_alloc.c | 16 +++++++++++++++-
>>>    1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 0b55c9c95364..67f3ee3a1528 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -8486,6 +8486,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>>    				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>>>    	}
>>>    	if (ret < 0) {
>>> +		if (!(cc->gfp_mask & __GFP_NOWARN)) {
>>> +			struct page *page;
>>> +
>>> +			list_for_each_entry(page, &cc->migratepages, lru) {
>>> +				pr_warn("migrating pfn %lx failed ret:%d ",
>>> +						page_to_pfn(page), ret);
>>> +				dump_page(page, "migration failure");
>>> +			}
>>
>> This can create *a lot* of noise. For example, until huge pages are actually
>> considered, we will choke on each end every huge page - and might do so over
>> and over again.
> 
> I am not familiar with huge page status at this moment but why couldn't
> they use __GFP_NOWARN if they are supposed to fail frequently?

Any alloc_contig_range() user will fail on hugetlbfs pages right now 
when they are placed into CMA/ZONE_MOVABLE. Oscar is working on that 
upstream.

> 
>>
>> This might be helpful for debugging, but is unacceptable for production
>> systems for now I think. Maybe for now, do it based on CONFIG_DEBUG_VM.
> 
> If it's due to huge page you mentioned above and caller passes
> __GFP_NOWARN in that case, couldn't we enable always-on?

It would make sense to add that for virtio-mem when calling 
alloc_contig_range(). For now I didn't do so, because there were not 
that many messages yet - alloc_contig_range() essentially didn't 
understand __GFP_NOWARN.

We should then also stop printing the "PFNs busy ..." part from 
alloc_contig_range() with __GFP_NOWARN.

> 
> Actually, I am targeting cma allocation failure, which should
> be rather rare compared to other call sites but critical to fail.
> If it's concern to emit too many warning message, I will scope
> down for site for only cma allocation.

If you add "__GFP_NOWARN" when !ZONE_MOVABLE, how would you ever print 
something for CMA? What am I missing? CMA is usually not on ZONE_MOVABLE.

Minchan Kim Feb. 17, 2021, 5:45 p.m. UTC | #4

On Wed, Feb 17, 2021 at 06:34:13PM +0100, David Hildenbrand wrote:
> On 17.02.21 18:26, Minchan Kim wrote:
> > On Wed, Feb 17, 2021 at 05:51:27PM +0100, David Hildenbrand wrote:
> > > On 17.02.21 17:36, Minchan Kim wrote:
> > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > It's critical if the page migration fails on those areas so
> > > > dump more debugging message like memory_hotplug unless user
> > > > specifiy __GFP_NOWARN.
> > > > 
> > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > ---
> > > >    mm/page_alloc.c | 16 +++++++++++++++-
> > > >    1 file changed, 15 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 0b55c9c95364..67f3ee3a1528 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -8486,6 +8486,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> > > >    				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> > > >    	}
> > > >    	if (ret < 0) {
> > > > +		if (!(cc->gfp_mask & __GFP_NOWARN)) {
> > > > +			struct page *page;
> > > > +
> > > > +			list_for_each_entry(page, &cc->migratepages, lru) {
> > > > +				pr_warn("migrating pfn %lx failed ret:%d ",
> > > > +						page_to_pfn(page), ret);
> > > > +				dump_page(page, "migration failure");
> > > > +			}
> > > 
> > > This can create *a lot* of noise. For example, until huge pages are actually
> > > considered, we will choke on each end every huge page - and might do so over
> > > and over again.
> > 
> > I am not familiar with huge page status at this moment but why couldn't
> > they use __GFP_NOWARN if they are supposed to fail frequently?
> 
> any alloc_contig_range() user will fail on hugetlbfs pages right now when
> they are placed into CMA/ZONE_MOVABLE. Oscar is working on that upstream.

Until then, how about adding this under !CONFIG_HUGETLBFS?

> 
> > 
> > > 
> > > This might be helpful for debugging, but is unacceptable for production
> > > systems for now I think. Maybe for now, do it based on CONFIG_DEBUG_VM.
> > 
> > If it's due to huge page you mentioned above and caller passes
> > __GFP_NOWARN in that case, couldn't we enable always-on?
> 
> It would make sense to add that for virito-mem when calling
> alloc_contig_range(). For now I didn't do so, because there were not that
> many messages yet - alloc_contig_range() essentially didn't understand
> __GFP_NOWARN.
> 
> We should then also stop printing the "PFNs busy ..." part from
> alloc_contig_range() with __GFP_NOWARN.

Yup.

> 
> > 
> > Actually, I am targeting cma allocation failure, which should
> > be rather rare compared to other call sites but critical to fail.
> > If it's concern to emit too many warning message, I will scope
> > down for site for only cma allocation.
> 
> If you add "__GFP_NOWARN" when !ZONE_MOVABLE, how would you ever print
> something for CMA? What am I missing? CMA is usually not on ZONE_MOVABLE.

If the caller of cma_alloc passes __GFP_NOWARN, I don't care, since the
caller explicitly declares that the failure is not critical. What I'd like
to catch are the cma_alloc call sites that do not pass __GFP_NOWARN.

Michal Hocko Feb. 18, 2021, 8:56 a.m. UTC | #5

On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> alloc_contig_range is usually used on cma area or movable zone.
> It's critical if the page migration fails on those areas so
> dump more debugging message like memory_hotplug unless user
> specifiy __GFP_NOWARN.

I agree with David that this has a potential to generate a lot of output
and it is not really clear whether it is worth it. Page isolation code
already has REPORT_FAILURE mode which is currently used only for the memory
hotplug because this was just too noisy from the CMA path - d381c54760dc
("mm: only report isolation failures when offlining memory").

Maybe migration is less likely to fail, but still. Doesn't CMA
allocator provide some useful error reporting on its own?

[...]
> @@ -8736,8 +8747,11 @@ struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
>  				 * and cause alloc_contig_range() to fail...
>  				 */
>  				spin_unlock_irqrestore(&zone->lock, flags);
> +
> +				if (zone_idx(zone) != ZONE_MOVABLE)
> +					gfp_flags = gfp_mask | __GFP_NOWARN;

Nack to this. The callee shouldn't tweak the gfp mask of the caller. If we
really want to control the reporting based on __GFP_NOWARN or a lack of
it then this has to be under control of the caller.
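
A minimal userspace sketch of that rule: the callee reads the caller's mask to decide whether to report, but forwards it unchanged. `gfp_t`, `SKETCH_GFP_NOWARN` and both functions are hypothetical stand-ins defined here for illustration, not the kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp_t and __GFP_NOWARN. */
typedef unsigned int gfp_t;
#define SKETCH_GFP_NOWARN 0x1u

/* Decide whether to report, using the caller's mask exactly as passed in. */
static bool should_report_failure(gfp_t caller_mask)
{
	return !(caller_mask & SKETCH_GFP_NOWARN);
}

/*
 * Model of an allocation wrapper that never modifies the caller's mask.
 * "migrate_ok" simulates whether page migration succeeded.
 * Returns 0 on success and -1 on failure; *reported is set to true only
 * when a failure message would have been printed.
 */
static int sketch_alloc_contig(gfp_t caller_mask, bool migrate_ok,
			       bool *reported)
{
	*reported = false;
	if (migrate_ok)
		return 0;
	*reported = should_report_failure(caller_mask);
	return -1;
}
```

In this model only a caller that passes `SKETCH_GFP_NOWARN` itself gets a silent failure; the wrapper never adds the flag on the caller's behalf.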

>  				ret = __alloc_contig_pages(pfn, nr_pages,
> -							gfp_mask);
> +							gfp_flags);
>  				if (!ret)
>  					return pfn_to_page(pfn);
>  				spin_lock_irqsave(&zone->lock, flags);
> -- 
> 2.30.0.478.g8a0d178c01-goog
>

David Hildenbrand Feb. 18, 2021, 9:02 a.m. UTC | #6

On 18.02.21 09:56, Michal Hocko wrote:
> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>> alloc_contig_range is usually used on cma area or movable zone.
>> It's critical if the page migration fails on those areas so
>> dump more debugging message like memory_hotplug unless user
>> specifiy __GFP_NOWARN.
> 
> I agree with David that this has a potential to generate a lot of output
> and it is not really clear whether it is worth it. Page isolation code
> already has REPORT_FAILURE mode which currently used only for the memory
> hotplug because this was just too noisy from the CMA path - d381c54760dc
> ("mm: only report isolation failures when offlining memory").
> 
> Maybe migration failures are less likely to fail but still.

Side note: I really dislike that uncontrolled error reporting on memory 
offlining path we have enabled as default. Yeah, it might be useful for 
ZONE_MOVABLE in some cases, but otherwise it's just noise.

Just do a "sudo stress-ng --memhotplug 1" and see the log getting 
flooded ...

Michal Hocko Feb. 18, 2021, 9:35 a.m. UTC | #7

On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> On 18.02.21 09:56, Michal Hocko wrote:
> > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > alloc_contig_range is usually used on cma area or movable zone.
> > > It's critical if the page migration fails on those areas so
> > > dump more debugging message like memory_hotplug unless user
> > > specifiy __GFP_NOWARN.
> > 
> > I agree with David that this has a potential to generate a lot of output
> > and it is not really clear whether it is worth it. Page isolation code
> > already has REPORT_FAILURE mode which currently used only for the memory
> > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > ("mm: only report isolation failures when offlining memory").
> > 
> > Maybe migration failures are less likely to fail but still.
> 
> Side note: I really dislike that uncontrolled error reporting on memory
> offlining path we have enabled as default. Yeah, it might be useful for
> ZONE_MOVABLE in some cases, but otherwise it's just noise.
> 
> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded

Anyway we can discuss this in a separate thread but I think this is not
a representative workload.

David Hildenbrand Feb. 18, 2021, 9:43 a.m. UTC | #8

On 18.02.21 10:35, Michal Hocko wrote:
> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>> On 18.02.21 09:56, Michal Hocko wrote:
>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>> It's critical if the page migration fails on those areas so
>>>> dump more debugging message like memory_hotplug unless user
>>>> specifiy __GFP_NOWARN.
>>>
>>> I agree with David that this has a potential to generate a lot of output
>>> and it is not really clear whether it is worth it. Page isolation code
>>> already has REPORT_FAILURE mode which currently used only for the memory
>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>> ("mm: only report isolation failures when offlining memory").
>>>
>>> Maybe migration failures are less likely to fail but still.
>>
>> Side note: I really dislike that uncontrolled error reporting on memory
>> offlining path we have enabled as default. Yeah, it might be useful for
>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>
>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> 
> Anyway we can discuss this in a separate thread but I think this is not
> a representative workload.

Sure, but the essence is "this is noise", and we'll have more noise on 
alloc_contig_range() as we see these calls more frequently. There should 
be an explicit way to enable such *debug* messages.

Michal Hocko Feb. 18, 2021, 9:59 a.m. UTC | #9

On Thu 18-02-21 10:43:21, David Hildenbrand wrote:
> On 18.02.21 10:35, Michal Hocko wrote:
> > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > It's critical if the page migration fails on those areas so
> > > > > dump more debugging message like memory_hotplug unless user
> > > > > specifiy __GFP_NOWARN.
> > > > 
> > > > I agree with David that this has a potential to generate a lot of output
> > > > and it is not really clear whether it is worth it. Page isolation code
> > > > already has REPORT_FAILURE mode which currently used only for the memory
> > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > ("mm: only report isolation failures when offlining memory").
> > > > 
> > > > Maybe migration failures are less likely to fail but still.
> > > 
> > > Side note: I really dislike that uncontrolled error reporting on memory
> > > offlining path we have enabled as default. Yeah, it might be useful for
> > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > 
> > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > 
> > Anyway we can discuss this in a separate thread but I think this is not
> > a representative workload.
> 
> Sure, but the essence is "this is noise", and we'll have more noise on
> alloc_contig_range() as we see these calls more frequently. There should be
> an explicit way to enable such *debug* messages.

There is a dynamic debugging framework available. I do not have much
experience with it, but maybe that is the way to go.

Minchan Kim Feb. 18, 2021, 4:10 p.m. UTC | #10

On Thu, Feb 18, 2021 at 09:56:18AM +0100, Michal Hocko wrote:
> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > alloc_contig_range is usually used on cma area or movable zone.
> > It's critical if the page migration fails on those areas so
> > dump more debugging message like memory_hotplug unless user
> > specifiy __GFP_NOWARN.
> 
> I agree with David that this has a potential to generate a lot of output
> and it is not really clear whether it is worth it. Page isolation code
> already has REPORT_FAILURE mode which currently used only for the memory
> hotplug because this was just too noisy from the CMA path - d381c54760dc
> ("mm: only report isolation failures when offlining memory").
> 
> Maybe migration failures are less likely to fail but still. Doesn't CMA
> allocator provide some useful error reporting on its own?

Unfortunately, it's very useless. :(

In cma.c:

```
                pr_debug("%s(): memory range at %p is busy, retrying\n",
                         __func__, pfn_to_page(pfn));
```

Moreover, the pfn it prints is not even the page that failed.
Originally, I wanted to handle this in cma.c to minimize changes,
but that was hard because the CMA code has no access to the list of failed pages.

Minchan Kim Feb. 18, 2021, 4:19 p.m. UTC | #11

On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> On 18.02.21 10:35, Michal Hocko wrote:
> > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > It's critical if the page migration fails on those areas so
> > > > > dump more debugging message like memory_hotplug unless user
> > > > > specifiy __GFP_NOWARN.
> > > > 
> > > > I agree with David that this has a potential to generate a lot of output
> > > > and it is not really clear whether it is worth it. Page isolation code
> > > > already has REPORT_FAILURE mode which currently used only for the memory
> > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > ("mm: only report isolation failures when offlining memory").
> > > > 
> > > > Maybe migration failures are less likely to fail but still.
> > > 
> > > Side note: I really dislike that uncontrolled error reporting on memory
> > > offlining path we have enabled as default. Yeah, it might be useful for
> > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > 
> > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > 
> > Anyway we can discuss this in a separate thread but I think this is not
> > a representative workload.
> 
> Sure, but the essence is "this is noise", and we'll have more noise on
> alloc_contig_range() as we see these calls more frequently. There should be
> an explicit way to enable such *debug* messages.

alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
Why shouldn't people use it if they don't care about the failure?
Semantically, it makes sense to me.

About the message flooding, shouldn't we go with ratelimiting?
I see those two problems as orthogonal.
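
A minimal userspace sketch of the interval/burst style of ratelimiting. It is loosely modelled on the idea behind the kernel's `__ratelimit()` helper, but the struct, the function and the tick-based clock are hypothetical and defined here only for illustration.

```c
#include <stdbool.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct sketch_ratelimit {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages emitted in the current window */
	unsigned int missed;	/* messages suppressed in the current window */
	bool started;		/* false until the first call opens a window */
};

/* Returns true if a message may be emitted at tick "now". */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, unsigned long now)
{
	/* Open a fresh window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget for this window: count the message and drop it. */
	rs->missed++;
	return false;
}
```

A limiter like this bounds how much is logged per window, but it does not decide whether a given caller wants the messages at all, which is why the two questions can be treated separately.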

David Hildenbrand Feb. 18, 2021, 4:26 p.m. UTC | #12

On 18.02.21 17:19, Minchan Kim wrote:
> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>> On 18.02.21 10:35, Michal Hocko wrote:
>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>> It's critical if the page migration fails on those areas so
>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>> specifiy __GFP_NOWARN.
>>>>>
>>>>> I agree with David that this has a potential to generate a lot of output
>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>> already has REPORT_FAILURE mode which currently used only for the memory
>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>
>>>>> Maybe migration failures are less likely to fail but still.
>>>>
>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>
>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>
>>> Anyway we can discuss this in a separate thread but I think this is not
>>> a representative workload.
>>
>> Sure, but the essence is "this is noise", and we'll have more noise on
>> alloc_contig_range() as we see these calls more frequently. There should be
>> an explicit way to enable such *debug* messages.
> 
> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.

I am not 100% sure it does.

> Why shouldn't people use it if they don't care the failure?

Because flooding the log with noise maybe a handful of people on this 
planet care about is absolutely useless. With the warnings in 
warn_alloc() people can at least conclude something reasonable.

> Semantically, it makes sense to me.
> 
> About the messeage flooding, shouldn't we go with ratelimiting?

At least that (see warn_alloc()). But I'd even want to see some other 
trigger to enable this explicitly on demand.

Minchan Kim Feb. 18, 2021, 4:47 p.m. UTC | #13

On Thu, Feb 18, 2021 at 05:26:08PM +0100, David Hildenbrand wrote:
> On 18.02.21 17:19, Minchan Kim wrote:
> > On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > > On 18.02.21 10:35, Michal Hocko wrote:
> > > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > > It's critical if the page migration fails on those areas so
> > > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > specifiy __GFP_NOWARN.
> > > > > > 
> > > > > > I agree with David that this has a potential to generate a lot of output
> > > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > already has REPORT_FAILURE mode which currently used only for the memory
> > > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > > ("mm: only report isolation failures when offlining memory").
> > > > > > 
> > > > > > Maybe migration failures are less likely to fail but still.
> > > > > 
> > > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > > 
> > > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > > 
> > > > Anyway we can discuss this in a separate thread but I think this is not
> > > > a representative workload.
> > > 
> > > Sure, but the essence is "this is noise", and we'll have more noise on
> > > alloc_contig_range() as we see these calls more frequently. There should be
> > > an explicit way to enable such *debug* messages.
> > 
> > alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> 
> I am not 100% sure it does.

Oh, it should. Otherwise, let's fix either the caller or
alloc_contig_range, since we already have a user:

```
    ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
            GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
```

> 
> > Why shouldn't people use it if they don't care the failure?
> 
> Because flooding the log with noise maybe a handful of people on this planet
> care about is absolutely useless. With the warnings in warn_alloc() people
> can at least conclude something reasonable.
> 
> > Semantically, it makes sense to me.
> > 
> > About the messeage flooding, shouldn't we go with ratelimiting?
> 
> At least that (see warn_alloc()). But I'd even want to see some other
> trigger to enable this explicitly on demand.

No objection.

How about adding verbose knob under CONFIG_CMA_DEBUGFS with
alloc_contig_range(..., bool verbose) like start_isolate_page_range?

If admin turns on the verbose mode under CONFIG_CMA_DEBUGFS,
cma_alloc will pass alloc_contig_range(...., true).

David Hildenbrand Feb. 18, 2021, 4:53 p.m. UTC | #14

On 18.02.21 17:47, Minchan Kim wrote:
> On Thu, Feb 18, 2021 at 05:26:08PM +0100, David Hildenbrand wrote:
>> On 18.02.21 17:19, Minchan Kim wrote:
>>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>>> It's critical if the page migration fails on those areas so
>>>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>>>> specifiy __GFP_NOWARN.
>>>>>>>
>>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>>> already has REPORT_FAILURE mode which currently used only for the memory
>>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>>
>>>>>>> Maybe migration failures are less likely to fail but still.
>>>>>>
>>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>>
>>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>>
>>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>>> a representative workload.
>>>>
>>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>>> alloc_contig_range() as we see these calls more frequently. There should be
>>>> an explicit way to enable such *debug* messages.
>>>
>>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>>
>> I am not 100% sure it does.
> 
> Oh, it should. Otherwise, let's fix either of caller or
> alloc_contig_range since we have a customer.
> 
> ```
>      ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
>              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0))
> ```
> 

Oh, interesting. So I certainly want to add that for virtio-mem as well 
- thanks.

And as discussed, we should hide the one pr_info... in alloc_contig_range() 
as well.

>>
>>> Why shouldn't people use it if they don't care the failure?
>>
>> Because flooding the log with noise maybe a handful of people on this planet
>> care about is absolutely useless. With the warnings in warn_alloc() people
>> can at least conclude something reasonable.
>>
>>> Semantically, it makes sense to me.
>>>
>>> About the messeage flooding, shouldn't we go with ratelimiting?
>>
>> At least that (see warn_alloc()). But I'd even want to see some other
>> trigger to enable this explicitly on demand.
> 
> No objection.
> 
> How about adding verbose knob under CONFIG_CMA_DEBUGFS with
> alloc_contig_range(..., bool verbose) like start_isolate_page_range?
> 
> If admin turns on the verbose mode under CONFIG_CMA_DEBUGFS,
> cma_alloc will pass alloc_contig_range(...., true).

I'd handle this internally in alloc_contig_range and not pass magic 
flags around. Some kind of debug knob to enable advanced messages 
(exceeding the usual allocation warnings we would usually see).
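
A minimal userspace sketch of such an internal knob: the verbose path is switched at run time inside the allocator's own code, so the function signature callers see stays the same. The knob and both function names are hypothetical; in a kernel the switch might be backed by debugfs or a similar interface, which is not modelled here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical run-time knob, off by default. */
static bool sketch_contig_debug;

/* Stand-in for whatever interface an admin would use to flip the knob. */
static void sketch_set_contig_debug(bool on)
{
	sketch_contig_debug = on;
}

/*
 * Called on migration failure. Callers pass no verbosity flag; the
 * decision is made here. Returns the number of pages that were dumped.
 */
static int sketch_report_migration_failure(const unsigned long *pfns, int n)
{
	int dumped = 0;

	if (!sketch_contig_debug)
		return 0;

	for (int i = 0; i < n; i++) {
		fprintf(stderr, "migrating pfn %lx failed\n", pfns[i]);
		dumped++;
	}
	return dumped;
}
```

Nothing is printed until the knob is turned on, and no `bool verbose` argument has to be threaded through the call chain.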

Michal Hocko Feb. 19, 2021, 9:28 a.m. UTC | #15

On Thu 18-02-21 08:19:50, Minchan Kim wrote:
> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > On 18.02.21 10:35, Michal Hocko wrote:
> > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > It's critical if the page migration fails on those areas so
> > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > specify __GFP_NOWARN.
> > > > > 
> > > > > I agree with David that this has a potential to generate a lot of output
> > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > already has REPORT_FAILURE mode which is currently used only for the memory
> > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > ("mm: only report isolation failures when offlining memory").
> > > > > 
> > > > > Maybe migration failures are less likely to fail but still.
> > > > 
> > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > 
> > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > 
> > > Anyway we can discuss this in a separate thread but I think this is not
> > > a representative workload.
> > 
> > Sure, but the essence is "this is noise", and we'll have more noise on
> > alloc_contig_range() as we see these calls more frequently. There should be
> > an explicit way to enable such *debug* messages.
> 
> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> Why shouldn't people use it if they don't care the failure?
> Semantically, it makes sense to me.

Well, alloc_contig_range doesn't really have to implement all the gfp
flags. This is a matter of practicality. alloc_contig_range is quite
different from the page allocator because it is to be expected that it
can fail the request. This is a very optimistic allocation request. That
would suggest that complaining about allocation failures is rather
noisy.

Now I do understand that some users would like to see why those
allocations have failed. The question is whether that information is
generally useful or it is more of a debugging aid. The amount of
information is also an important aspect. It would be rather unfortunate
to dump thousands of pages just because they cannot be migrated.

I do not have a strong opinion here. We can make all alloc_contig_range
users use __GFP_NOWARN by default and only skip the flag from the cma
allocator but I am slowly leaning towards (ab)using the dynamic debugging
infrastructure for this.
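
The "silent by default, CMA opts in" alternative mentioned above can be sketched in user space as follows. This is only an illustration under stated assumptions: `gfp_sketch_t`, `SKETCH_GFP_KERNEL` and `SKETCH_GFP_NOWARN` are placeholder names with made-up bit values, not the kernel's `gfp_t` or its flag encodings, and the helper functions are hypothetical. `cma_gfp_mask()` mirrors the shape of the call quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_sketch_t;

/* Placeholder bit values for illustration; NOT the kernel's encodings. */
#define SKETCH_GFP_KERNEL 0x1u
#define SKETCH_GFP_NOWARN 0x2u

/* The allocator would consult this before emitting any failure report. */
static bool should_report_failure(gfp_sketch_t gfp_mask)
{
	/* Only the two sketch bits are defined in this example. */
	assert((gfp_mask & ~(SKETCH_GFP_KERNEL | SKETCH_GFP_NOWARN)) == 0);
	return !(gfp_mask & SKETCH_GFP_NOWARN);
}

/* CMA-style caller: warnings stay on unless the caller asked for silence. */
static gfp_sketch_t cma_gfp_mask(bool no_warn)
{
	return SKETCH_GFP_KERNEL | (no_warn ? SKETCH_GFP_NOWARN : 0);
}

/* Every other caller: silent by default, since failure is expected. */
static gfp_sketch_t default_contig_gfp_mask(void)
{
	return SKETCH_GFP_KERNEL | SKETCH_GFP_NOWARN;
}
```

Under this scheme only a CMA caller that passes `no_warn == false` would ever trigger a report; the per-call-site decision travels in the mask rather than in a separate argument.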

David Hildenbrand Feb. 19, 2021, 9:30 a.m. UTC | #16

On 19.02.21 10:28, Michal Hocko wrote:
> On Thu 18-02-21 08:19:50, Minchan Kim wrote:
>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>> It's critical if the page migration fails on those areas so
>>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>>> specify __GFP_NOWARN.
>>>>>>
>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>> already has REPORT_FAILURE mode which is currently used only for the memory
>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>
>>>>>> Maybe migration failures are less likely to fail but still.
>>>>>
>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>
>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>
>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>> a representative workload.
>>>
>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>> alloc_contig_range() as we see these calls more frequently. There should be
>>> an explicit way to enable such *debug* messages.
>>
>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>> Why shouldn't people use it if they don't care the failure?
>> Semantically, it makes sense to me.
> 
> Well, alloc_contig_range doesn't really have to implement all the gfp
> flags. This is a matter of practicality. alloc_contig_range is quite
> different from the page allocator because it is to be expected that it
> can fail the request. This is a very optimistic allocation request. That
> would suggest that complaining about allocation failures is rather
> noisy.
> 
> Now I do understand that some users would like to see why those
> allocations have failed. The question is whether that information is
> generally useful or it is more of a debugging aid. The amount of
> information is also an important aspect. It would be rather unfortunate
> to dump thousands of pages just because they cannot be migrated.
> 
> I do not have a strong opinion here. We can make all alloc_contig_range
> users use GFP_NOWARN by default and only skip the flag from the cma
> allocator but I am slowly leaning towards (ab)using dynamic debugging
> infrastructure for this.

Just so I understand what you are referring to - trace points?

Michal Hocko Feb. 19, 2021, 10:02 a.m. UTC | #17

On Fri 19-02-21 10:30:12, David Hildenbrand wrote:
> On 19.02.21 10:28, Michal Hocko wrote:
> > On Thu 18-02-21 08:19:50, Minchan Kim wrote:
> > > On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > > > On 18.02.21 10:35, Michal Hocko wrote:
> > > > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > > > It's critical if the page migration fails on those areas so
> > > > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > > > specify __GFP_NOWARN.
> > > > > > > 
> > > > > > > I agree with David that this has a potential to generate a lot of output
> > > > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > > > already has REPORT_FAILURE mode which is currently used only for the memory
> > > > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > > > ("mm: only report isolation failures when offlining memory").
> > > > > > > 
> > > > > > > Maybe migration failures are less likely to fail but still.
> > > > > > 
> > > > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > > > 
> > > > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > > > 
> > > > > Anyway we can discuss this in a separate thread but I think this is not
> > > > > a representative workload.
> > > > 
> > > > Sure, but the essence is "this is noise", and we'll have more noise on
> > > > alloc_contig_range() as we see these calls more frequently. There should be
> > > > an explicit way to enable such *debug* messages.
> > > 
> > > alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> > > Why shouldn't people use it if they don't care the failure?
> > > Semantically, it makes sense to me.
> > 
> > Well, alloc_contig_range doesn't really have to implement all the gfp
> > flags. This is a matter of practicality. alloc_contig_range is quite
> > different from the page allocator because it is to be expected that it
> > can fail the request. This is a very optimistic allocation request. That
> > would suggest that complaining about allocation failures is rather
> > noisy.
> > 
> > Now I do understand that some users would like to see why those
> > allocations have failed. The question is whether that information is
> > generally useful or it is more of a debugging aid. The amount of
> > information is also an important aspect. It would be rather unfortunate
> > to dump thousands of pages just because they cannot be migrated.
> > 
> > I do not have a strong opinion here. We can make all alloc_contig_range
> > users use GFP_NOWARN by default and only skip the flag from the cma
> > allocator but I am slowly leaning towards (ab)using dynamic debugging
> > infrastructure for this.
> 
> Just so I understand what you are referring to - trace points?

Documentation/admin-guide/dynamic-debug-howto.rst
but I have to confess I have 0 experience with this.

David Hildenbrand Feb. 19, 2021, 10:34 a.m. UTC | #18

On 19.02.21 11:02, Michal Hocko wrote:
> On Fri 19-02-21 10:30:12, David Hildenbrand wrote:
>> On 19.02.21 10:28, Michal Hocko wrote:
>>> On Thu 18-02-21 08:19:50, Minchan Kim wrote:
>>>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>>>> It's critical if the page migration fails on those areas so
>>>>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>>>>> specify __GFP_NOWARN.
>>>>>>>>
>>>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>>>> already has REPORT_FAILURE mode which is currently used only for the memory
>>>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>>>
>>>>>>>> Maybe migration failures are less likely to fail but still.
>>>>>>>
>>>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>>>
>>>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>>>
>>>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>>>> a representative workload.
>>>>>
>>>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>>>> alloc_contig_range() as we see these calls more frequently. There should be
>>>>> an explicit way to enable such *debug* messages.
>>>>
>>>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>>>> Why shouldn't people use it if they don't care the failure?
>>>> Semantically, it makes sense to me.
>>>
>>> Well, alloc_contig_range doesn't really have to implement all the gfp
>>> flags. This is a matter of practicality. alloc_contig_range is quite
>>> different from the page allocator because it is to be expected that it
>>> can fail the request. This is a very optimistic allocation request. That
>>> would suggest that complaining about allocation failures is rather
>>> noisy.
>>>
>>> Now I do understand that some users would like to see why those
>>> allocations have failed. The question is whether that information is
>>> generally useful or it is more of a debugging aid. The amount of
>>> information is also an important aspect. It would be rather unfortunate
>>> to dump thousands of pages just because they cannot be migrated.
>>>
>>> I do not have a strong opinion here. We can make all alloc_contig_range
>>> users use GFP_NOWARN by default and only skip the flag from the cma
>>> allocator but I am slowly leaning towards (ab)using dynamic debugging
>>> infrastructure for this.
>>
>> Just so I understand what you are referring to - trace points?
> 
> Documentation/admin-guide/dynamic-debug-howto.rst
> but I have to confess I have 0 experience with this.

Me too, but it does sound like a good fit.

Minchan Kim March 4, 2021, 4:01 p.m. UTC | #19

On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
> On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
> > On Thu 18-02-21 08:19:50, Minchan Kim wrote:
> > > On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > > > On 18.02.21 10:35, Michal Hocko wrote:
> > > > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > > > It's critical if the page migration fails on those areas so
> > > > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > > > specify __GFP_NOWARN.
> > > > > > > 
> > > > > > > I agree with David that this has a potential to generate a lot of output
> > > > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > > > already has REPORT_FAILURE mode which is currently used only for the memory
> > > > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > > > ("mm: only report isolation failures when offlining memory").
> > > > > > > 
> > > > > > > Maybe migration failures are less likely to fail but still.
> > > > > > 
> > > > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > > > 
> > > > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > > > 
> > > > > Anyway we can discuss this in a separate thread but I think this is not
> > > > > a representative workload.
> > > > 
> > > > Sure, but the essence is "this is noise", and we'll have more noise on
> > > > alloc_contig_range() as we see these calls more frequently. There should be
> > > > an explicit way to enable such *debug* messages.
> > > 
> > > alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> > > Why shouldn't people use it if they don't care the failure?
> > > Semantically, it makes sense to me.
> 
> Sorry for the late response.
> 
> > 
> > Well, alloc_contig_range doesn't really have to implement all the gfp
> > flags. This is a matter of practicality. alloc_contig_range is quite
> > different from the page allocator because it is to be expected that it
> > can fail the request. This is a very optimistic allocation request. That
> > would suggest that complaining about allocation failures is rather
> > noisy.
> 
> That was why I'd like to approach for per-call site indicator with
> __GFP_NOWARN. Even though it was allocation from CMA, some of them
> wouldn't be critical for the failure so those wouldn't care of
> the failure. cma_alloc already has carried on "bool no_warn"
> which was changed into gfp_t recently. What alloc_contig_range
> should do is to take care of the request.
> 
> > 
> > Now I do understand that some users would like to see why those
> > allocations have failed. The question is whether that information is
> > generally useful or it is more of a debugging aid. The amount of
> > information is also an important aspect. It would be rather unfortunate
> > to dump thousands of pages just because they cannot be migrated.
> 
> Totally agree, dumping thousands of pages as a debugging aid is bad.
> Couldn't we simply ratelimit them like other places?
> 
> > 
> > I do not have a strong opinion here. We can make all alloc_contig_range
> > users use GFP_NOWARN by default and only skip the flag from the cma
> > allocator but I am slowly leaning towards (ab)using dynamic debugging
> 
> I agree the rest of the places are GFP_NOWARN by default except CMA
> if they expect alloc_contig_range are optimistic allocation request.
> However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
> and take care of the __GFP_NOWARN since some sites of CMA could be
> fault tolerant so no need to get the warning.

Any thoughts on how to proceed?

> 
> > infrastructure for this.
> 
> Dynamic debugging is a system-wide flag, so how do we deal with it if we
> want to see a specific allocation failure, not all call sites?
> That's why I'd still like to go with the per-call-site approach.

David Hildenbrand March 4, 2021, 4:10 p.m. UTC | #20

On 04.03.21 17:01, Minchan Kim wrote:
> On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
>> On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
>>> On Thu 18-02-21 08:19:50, Minchan Kim wrote:
>>>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>>>> It's critical if the page migration fails on those areas so
>>>>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>>>>> specify __GFP_NOWARN.
>>>>>>>>
>>>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>>>> already has REPORT_FAILURE mode which is currently used only for the memory
>>>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>>>
>>>>>>>> Maybe migration failures are less likely to fail but still.
>>>>>>>
>>>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>>>
>>>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>>>
>>>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>>>> a representative workload.
>>>>>
>>>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>>>> alloc_contig_range() as we see these calls more frequently. There should be
>>>>> an explicit way to enable such *debug* messages.
>>>>
>>>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>>>> Why shouldn't people use it if they don't care the failure?
>>>> Semantically, it makes sense to me.
>>
>> Sorry for the late response.
>>
>>>
>>> Well, alloc_contig_range doesn't really have to implement all the gfp
>>> flags. This is a matter of practicality. alloc_contig_range is quite
>>> different from the page allocator because it is to be expected that it
>>> can fail the request. This is a very optimistic allocation request. That
>>> would suggest that complaining about allocation failures is rather
>>> noisy.
>>
>> That was why I'd like to approach for per-call site indicator with
>> __GFP_NOWARN. Even though it was allocation from CMA, some of them
>> wouldn't be critical for the failure so those wouldn't care of
>> the failure. cma_alloc already has carried on "bool no_warn"
>> which was changed into gfp_t recently. What alloc_contig_range
>> should do is to take care of the request.
>>
>>>
>>> Now I do understand that some users would like to see why those
>>> allocations have failed. The question is whether that information is
>>> generally useful or it is more of a debugging aid. The amount of
>>> information is also an important aspect. It would be rather unfortunate
>>> to dump thousands of pages just because they cannot be migrated.
>>
>> Totally agree, dumping thousands of pages as a debugging aid is bad.
>> Couldn't we simply ratelimit them like other places?
>>
>>>
>>> I do not have a strong opinion here. We can make all alloc_contig_range
>>> users use GFP_NOWARN by default and only skip the flag from the cma
>>> allocator but I am slowly leaning towards (ab)using dynamic debugging
>>
>> I agree the rest of the places are GFP_NOWARN by default except CMA
>> if they expect alloc_contig_range are optimistic allocation request.
>> However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
>> and take care of the __GFP_NOWARN since some sites of CMA could be
>> fault tolerant so no need to get the warning.
> 
> Any thoughts on how to proceed?

IMHO, add some proper debug mechanisms and don't try squeezing debug 
messages into "WARN" semantics.

Any alloc_contig_range() user can benefit from that.

Minchan Kim March 4, 2021, 4:23 p.m. UTC | #21

On Thu, Mar 04, 2021 at 05:10:52PM +0100, David Hildenbrand wrote:
> On 04.03.21 17:01, Minchan Kim wrote:
> > On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
> > > On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
> > > > On Thu 18-02-21 08:19:50, Minchan Kim wrote:
> > > > > On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > > > > > On 18.02.21 10:35, Michal Hocko wrote:
> > > > > > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > > > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > > > > > It's critical if the page migration fails on those areas so
> > > > > > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > > > > > specify __GFP_NOWARN.
> > > > > > > > > 
> > > > > > > > > I agree with David that this has a potential to generate a lot of output
> > > > > > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > > > > > already has REPORT_FAILURE mode which is currently used only for the memory
> > > > > > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > > > > > ("mm: only report isolation failures when offlining memory").
> > > > > > > > > 
> > > > > > > > > Maybe migration failures are less likely to fail but still.
> > > > > > > > 
> > > > > > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > > > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > > > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > > > > > 
> > > > > > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > > > > > 
> > > > > > > Anyway we can discuss this in a separate thread but I think this is not
> > > > > > > a representative workload.
> > > > > > 
> > > > > > Sure, but the essence is "this is noise", and we'll have more noise on
> > > > > > alloc_contig_range() as we see these calls more frequently. There should be
> > > > > > an explicit way to enable such *debug* messages.
> > > > > 
> > > > > alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> > > > > Why shouldn't people use it if they don't care the failure?
> > > > > Semantically, it makes sense to me.
> > > 
> > > Sorry for the late response.
> > > 
> > > > 
> > > > Well, alloc_contig_range doesn't really have to implement all the gfp
> > > > flags. This is a matter of practicality. alloc_contig_range is quite
> > > > different from the page allocator because it is to be expected that it
> > > > can fail the request. This is a very optimistic allocation request. That
> > > > would suggest that complaining about allocation failures is rather
> > > > noisy.
> > > 
> > > That was why I'd like to approach for per-call site indicator with
> > > __GFP_NOWARN. Even though it was allocation from CMA, some of them
> > > wouldn't be critical for the failure so those wouldn't care of
> > > the failure. cma_alloc already has carried on "bool no_warn"
> > > which was changed into gfp_t recently. What alloc_contig_range
> > > should do is to take care of the request.
> > > 
> > > > 
> > > > Now I do understand that some users would like to see why those
> > > > allocations have failed. The question is whether that information is
> > > > generally useful or it is more of a debugging aid. The amount of
> > > > information is also an important aspect. It would be rather unfortunate
> > > > to dump thousands of pages just because they cannot be migrated.
> > > 
> > > Totally agree, dumping thousands of pages as a debugging aid is bad.
> > > Couldn't we simply ratelimit them like other places?
> > > 
> > > > 
> > > > I do not have a strong opinion here. We can make all alloc_contig_range
> > > > users use GFP_NOWARN by default and only skip the flag from the cma
> > > > allocator but I am slowly leaning towards (ab)using dynamic debugging
> > > 
> > > I agree the rest of the places are GFP_NOWARN by default except CMA
> > > if they expect alloc_contig_range are optimistic allocation request.
> > > However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
> > > and take care of the __GFP_NOWARN since some sites of CMA could be
> > > fault tolerant so no need to get the warning.
> > 
> > Any thoughts on how to proceed?
> 
> IMHO, add some proper debug mechanisms and don't try squeezing debug
> messages into "WARN" semantics.
> 
> Any alloc_contig_range() user can benefit from that.

So the point is how we could add a proper debug mechanism here.
Suppose call site A is not critical on failure but is called very
frequently, while call site B is critical on failure but is called
very rarely, so someone turns on system-wide dynamic debugging.
You could see a lot of debug messages from A even though we
don't want them. They could even hide B's debugging messages
through ratelimiting.
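
The starvation concern above can be made concrete with a small user-space simulation. This is a sketch under stated assumptions, not the kernel's ratelimit implementation: `struct ratelimit_sketch`, `ratelimit_ok()` and `site_b_visible()` are hypothetical names, and the limiter is a simple "at most `burst` messages per `interval` ticks" window shared by both call sites.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified shared limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit_sketch {
	unsigned long interval;     /* window length in ticks */
	unsigned int burst;         /* messages allowed per window */
	unsigned long window_start; /* tick at which the current window began */
	unsigned int printed;       /* messages already let through this window */
};

/* Returns true if a message may be printed at time `now`. */
static bool ratelimit_ok(struct ratelimit_sketch *rs, unsigned long now)
{
	assert(rs->interval > 0);
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/*
 * Site A logs a_calls times, then site B logs once, all within the same
 * window and through the same limiter. Returns whether B's single message
 * got through.
 */
static bool site_b_visible(unsigned int a_calls, unsigned int burst)
{
	struct ratelimit_sketch rs = { .interval = 100, .burst = burst };
	unsigned long now = 0;
	unsigned int i;

	for (i = 0; i < a_calls; i++)
		(void)ratelimit_ok(&rs, now);
	return ratelimit_ok(&rs, now);
}
```

In this model, once the frequent site has used up the burst for the current window, the rare site's message is dropped. A separate limiter per call site, or a per-call-site enable flag, would avoid that.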

David Hildenbrand March 4, 2021, 4:28 p.m. UTC | #22

On 04.03.21 17:23, Minchan Kim wrote:
> On Thu, Mar 04, 2021 at 05:10:52PM +0100, David Hildenbrand wrote:
>> On 04.03.21 17:01, Minchan Kim wrote:
>>> On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
>>>> On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
>>>>> On Thu 18-02-21 08:19:50, Minchan Kim wrote:
>>>>>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>>>>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>>>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>>>>>> It's critical if the page migration fails on those areas so
>>>>>>>>>>> dump more debugging message like memory_hotplug unless user
>>>>>>>>>>> specify __GFP_NOWARN.
>>>>>>>>>>
>>>>>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>>>>>> already has REPORT_FAILURE mode which is currently used only for the memory
>>>>>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>>>>>
>>>>>>>>>> Maybe migration failures are less likely to fail but still.
>>>>>>>>>
>>>>>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>>>>>
>>>>>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>>>>>
>>>>>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>>>>>> a representative workload.
>>>>>>>
>>>>>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>>>>>> alloc_contig_range() as we see these calls more frequently. There should be
>>>>>>> an explicit way to enable such *debug* messages.
>>>>>>
>>>>>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>>>>>> Why shouldn't people use it if they don't care the failure?
>>>>>> Semantically, it makes sense to me.
>>>>
>>>> Sorry for the late response.
>>>>
>>>>>
>>>>> Well, alloc_contig_range doesn't really have to implement all the gfp
>>>>> flags. This is a matter of practicality. alloc_contig_range is quite
>>>>> different from the page allocator because it is to be expected that it
>>>>> can fail the request. This is a very optimistic allocation request. That
>>>>> would suggest that complaining about allocation failures is rather
>>>>> noisy.
>>>>
>>>> That was why I'd like to approach for per-call site indicator with
>>>> __GFP_NOWARN. Even though it was allocation from CMA, some of them
>>>> wouldn't be critical for the failure so those wouldn't care of
>>>> the failure. cma_alloc already has carried on "bool no_warn"
>>>> which was changed into gfp_t recently. What alloc_contig_range
>>>> should do is to take care of the request.
>>>>
>>>>>
>>>>> Now I do understand that some users would like to see why those
>>>>> allocations have failed. The question is whether that information is
>>>>> generally useful or it is more of a debugging aid. The amount of
>>>>> information is also an important aspect. It would be rather unfortunate
>>>>> to dump thousands of pages just because they cannot be migrated.
>>>>
>>>> Totally agree, dumping thousands of pages as a debugging aid is bad.
>>>> Couldn't we simply ratelimit them like other places?
>>>>
>>>>>
>>>>> I do not have a strong opinion here. We can make all alloc_contig_range
>>>>> users use GFP_NOWARN by default and only skip the flag from the cma
>>>>> allocator but I am slowly leaning towards (ab)using dynamic debugging
>>>>
>>>> I agree the rest of the places are GFP_NOWARN by default except CMA
>>>> if they expect alloc_contig_range are optimistic allocation request.
>>>> However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
>>>> and take care of the __GFP_NOWARN since some sites of CMA could be
>>>> fault tolerant so no need to get the warning.
>>>
>>> Any thoughts on how to proceed?
>>
>> IMHO, add some proper debug mechanisms and don't try squeezing debug
>> messages into "WARN" semantics.
>>
>> Any alloc_contig_range() user can benefit from that.
> 
> So the point is how we could add a proper debug mechanism here.
> Suppose call site A is not critical on failure but is called very
> frequently, while call site B is critical on failure but is called
> very rarely, so someone turns on system-wide dynamic debugging.
> You could see a lot of debug messages from A even though we
> don't want them. They could even hide B's debugging messages
> through ratelimiting.

Do you have a real-life example of how this would be an issue? This sounds 
like a purely theoretical construct.

You want to debug something, so you try triggering it and capturing 
debug data. There are not that many alloc_contig_range() users such that 
this would really be an issue to isolate ...

Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is 
problematic (putting NORETRY logic and similar aside). So any such 
page you hit is worth investigating and, therefore, worth getting logged 
for debugging purposes.

Minchan Kim March 4, 2021, 5:11 p.m. UTC | #23

On Thu, Mar 04, 2021 at 05:28:32PM +0100, David Hildenbrand wrote:
> On 04.03.21 17:23, Minchan Kim wrote:
> > On Thu, Mar 04, 2021 at 05:10:52PM +0100, David Hildenbrand wrote:
> > > On 04.03.21 17:01, Minchan Kim wrote:
> > > > On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
> > > > > On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
> > > > > > On Thu 18-02-21 08:19:50, Minchan Kim wrote:
> > > > > > > On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
> > > > > > > > On 18.02.21 10:35, Michal Hocko wrote:
> > > > > > > > > On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
> > > > > > > > > > On 18.02.21 09:56, Michal Hocko wrote:
> > > > > > > > > > > On Wed 17-02-21 08:36:03, Minchan Kim wrote:
> > > > > > > > > > > > alloc_contig_range is usually used on cma area or movable zone.
> > > > > > > > > > > > It's critical if the page migration fails on those areas so
> > > > > > > > > > > > dump more debugging message like memory_hotplug unless user
> > > > > > > > > > > > > specifies __GFP_NOWARN.
> > > > > > > > > > > 
> > > > > > > > > > > I agree with David that this has a potential to generate a lot of output
> > > > > > > > > > > and it is not really clear whether it is worth it. Page isolation code
> > > > > > > > > > > already has REPORT_FAILURE mode which currently used only for the memory
> > > > > > > > > > > hotplug because this was just too noisy from the CMA path - d381c54760dc
> > > > > > > > > > > ("mm: only report isolation failures when offlining memory").
> > > > > > > > > > > 
> > > > > > > > > > > Maybe migration failures are less likely to fail but still.
> > > > > > > > > > 
> > > > > > > > > > Side note: I really dislike that uncontrolled error reporting on memory
> > > > > > > > > > offlining path we have enabled as default. Yeah, it might be useful for
> > > > > > > > > > ZONE_MOVABLE in some cases, but otherwise it's just noise.
> > > > > > > > > > 
> > > > > > > > > > Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
> > > > > > > > > 
> > > > > > > > > Anyway we can discuss this in a separate thread but I think this is not
> > > > > > > > > a representative workload.
> > > > > > > > 
> > > > > > > > Sure, but the essence is "this is noise", and we'll have more noise on
> > > > > > > > alloc_contig_range() as we see these calls more frequently. There should be
> > > > > > > > an explicit way to enable such *debug* messages.
> > > > > > > 
> > > > > > > alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
> > > > > > > Why shouldn't people use it if they don't care the failure?
> > > > > > > Semantically, it makes sense to me.
> > > > > 
> > > > > Sorry for the late response.
> > > > > 
> > > > > > 
> > > > > > Well, alloc_contig_range doesn't really have to implement all the gfp
> > > > > > flags. This is a matter of practicality. alloc_contig_range is quite
> > > > > > different from the page allocator because it is to be expected that it
> > > > > > can fail the request. This is a very optimistic allocation request. That
> > > > > > would suggest that complaining about allocation failures is rather
> > > > > > noisy.
> > > > > 
> > > > > That was why I'd like to approach for per-call site indicator with
> > > > > __GFP_NOWARN. Even though it was allocation from CMA, some of them
> > > > > wouldn't be critical for the failure so those wouldn't care of
> > > > > the failure. cma_alloc already has carried on "bool no_warn"
> > > > > which was changed into gfp_t recently. What alloc_contig_range
> > > > > should do is to take care of the request.
> > > > > 
> > > > > > 
> > > > > > Now I do understand that some users would like to see why those
> > > > > > allocations have failed. The question is whether that information is
> > > > > > generally useful or it is more of a debugging aid. The amount of
> > > > > > information is also an important aspect. It would be rather unfortunate
> > > > > > to dump thousands of pages just because they cannot be migrated.
> > > > > 
> > > > > Totally agree, dumping thousands of pages as a debugging aid is bad.
> > > > > Couldn't we simply ratelimit them like other places?
> > > > > 
> > > > > > 
> > > > > > I do not have a strong opinion here. We can make all alloc_contig_range
> > > > > > users use GFP_NOWARN by default and only skip the flag from the cma
> > > > > > allocator but I am slowly leaning towards (ab)using dynamic debugging
> > > > > 
> > > > > I agree the rest of the places are GFP_NOWARN by default except CMA
> > > > > if they expect alloc_contig_range are optimistic allocation request.
> > > > > However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
> > > > > and take care of the __GFP_NOWARN since some sites of CMA could be
> > > > > fault tolerant so no need to get the warning.
> > > > 
> > > > Any thought to proceed?
> > > 
> > > IMHO, add some proper debug mechanisms and don't try squeezing debug
> > > messages into "WARN" semantics.
> > > 
> > > Any alloc_contig_range() user can benefit from that.
> > 
> > So the point is how we could add a proper debug mechanism here.
> > Think about call site A, which is not critical for the failure but
> > is called very frequently. Call site B is critical for the failure
> > but is called very rarely, so it turns on system-wide dynamic debugging.
> > You could see a lot of debug messages from A even though we
> > don't want them. They could even hide B's debugging messages
> > through ratelimiting.
> 
> Do you have a real-life example of how this would be an issue? This sounds like
> a purely theoretical construct.

I don't have a specific example at the moment, but it is not hard
to think of a use case. Someone wants to use big contiguous memory
to do their job faster, but it isn't required since they have a
fallback option that works with order-0 pages. In that case, the
failure is not critical. However, another user needs big contiguous
memory to enable some feature for the end user on a mobile phone.
In that case, the allocation failure is significant, so they are
really looking for clues.
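
The concern quoted above, that a frequent but tolerant call site can use
up a shared rate limit before a rare but critical call site logs anything,
can be modelled in userspace. This is only a sketch under assumptions:
sketch_ratelimit, sketch_ratelimit_ok() and sketch_b_is_logged() are
hypothetical names, and the fixed-window limiter is a simplification, not
the kernel's __ratelimit() implementation.

```c
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct sketch_ratelimit {
	long interval;
	int burst;
	long window_start;
	int printed;
};

/* Returns true if the caller may print, false if it is suppressed. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	/* Open a fresh window once the current one has expired. */
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/*
 * Call site A fails `a_failures` times inside one window, then call
 * site B fails once in the same window. Returns true if B's single
 * message still passes the limiter that both sites share.
 */
static bool sketch_b_is_logged(int burst, int a_failures)
{
	struct sketch_ratelimit shared = {
		.interval = 5000, .burst = burst,
		.window_start = 0, .printed = 0,
	};
	int i;

	for (i = 0; i < a_failures; i++)
		(void)sketch_ratelimit_ok(&shared, 0);

	return sketch_ratelimit_ok(&shared, 1);
}
```

With a burst of 10, B is logged after 3 failures from A but suppressed
after 10 or more, which is the "A hides B" case.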

> 
> You want to debug something, so you try triggering it and capturing debug
> data. There are not that many alloc_contig_range() users such that this
> would really be an issue to isolate ...

cma_alloc uses alloc_contig_range, and cma_alloc has lots of users.
It is even exported through dmabuf, so any userspace process could
trigger the allocation on its own. Some of them could tolerate the
failure, the rest could be critical. We shouldn't judge this from a
limited set of kernel use cases.

> 
> Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
> problematic (putting NORETRY logic and similar aside). So any such
> page you hit is worth investigating and, therefore, worth getting logged for
> debugging purposes.

If you believe that every alloc_contig_range failure is problematic
and that no real example like the one I mentioned above exists,
I am happy to use this chunk to support dynamic debugging.
Okay?

+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+        (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
+               DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
+int alloc_contig_ratelimit(void)
+{
+       return __ratelimit(&alloc_contig_ratelimit_state);
+}
+
+void dump_migrate_failure_pages(struct list_head *page_list)
+{
+       DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+                       "migrate failure");
+       if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
+                       alloc_contig_ratelimit()) {
+               struct page *page;
+
+               WARN(1, "failed callstack");
+               list_for_each_entry(page, page_list, lru)
+                       dump_page(page, "migration failure");
+       }
+}
+#else
+static inline void dump_migrate_failure_pages(struct list_head *page_list)
+{
+}
+#endif
+
 /* [start, end) must belong to a single zone. */
 static int __alloc_contig_migrate_range(struct compact_control *cc,
                                        unsigned long start, unsigned long end)
@@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
                                NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
        }
        if (ret < 0) {
+               dump_migrate_failure_pages(&cc->migratepages);
                putback_movable_pages(&cc->migratepages);
                return ret;
        }
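
For readers who want to see the gating in isolation, here is a userspace
sketch of what the chunk above does: dump only when the call site has been
switched on at runtime and the rate limit allows it. All names here
(sketch_site, sketch_take_budget, sketch_dump_failures) are hypothetical,
the budget counter is a simplified stand-in for the ratelimit state, and a
plain array of PFNs stands in for the page list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a dynamic debug descriptor: one switch per site. */
struct sketch_site {
	bool enabled;
};

/* Simplified rate limit: allow a print while budget remains. */
static bool sketch_take_budget(int *budget)
{
	if (*budget <= 0)
		return false;
	(*budget)--;
	return true;
}

/* Dumps every failed page if allowed; returns the number of pages dumped. */
static size_t sketch_dump_failures(const struct sketch_site *site, int *budget,
				   const unsigned long *pfns, size_t nr)
{
	size_t i;

	/*
	 * Same ordering as the patch: the runtime switch is tested first,
	 * so a disabled site never consumes rate limit budget.
	 */
	if (!site->enabled || !sketch_take_budget(budget))
		return 0;

	for (i = 0; i < nr; i++)
		printf("migration failure: pfn %#lx\n", pfns[i]);

	return nr;
}
```

Because of the short-circuit, leaving the site disabled costs nothing
beyond the flag test.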

David Hildenbrand March 4, 2021, 5:23 p.m. UTC | #24

>> You want to debug something, so you try triggering it and capturing debug
>> data. There are not that many alloc_contig_range() users such that this
>> would really be an issue to isolate ...
> 
> cma_alloc uses alloc_contig_range, and cma_alloc has lots of users.
> It is even exported through dmabuf, so any userspace process could
> trigger the allocation on its own. Some of them could tolerate the
> failure, the rest could be critical. We shouldn't judge this from a
> limited set of kernel use cases.

Assume you are debugging allocation failures. You either collect the 
data yourself or ask someone to send you that output. You care about any 
alloc_contig_range() allocation failures that shouldn't happen, don't you?

> 
>>
>> Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
>> problematic (putting NORETRY logic and similar aside). So any such
>> page you hit is worth investigating and, therefore, worth getting logged for
>> debugging purposes.
> 
> If you believe that every alloc_contig_range failure is problematic

Every one where we should have guarantees I guess: ZONE_MOVABLE or 
MIGRATE_CMA. On ZONE_NORMAL, there are no guarantees.

> and that no real example like the one I mentioned above exists,
> I am happy to use this chunk to support dynamic debugging.
> Okay?
> 
> +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> +        (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
> +               DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
> +int alloc_contig_ratelimit(void)
> +{
> +       return __ratelimit(&alloc_contig_ratelimit_state);
> +}
> +

^ do we need ratelimiting with dynamic debugging enabled?

> +void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> +       DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> +                       "migrate failure");
> +       if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> +                       alloc_contig_ratelimit()) {
> +               struct page *page;
> +
> +               WARN(1, "failed callstack");
> +               list_for_each_entry(page, page_list, lru)
> +                       dump_page(page, "migration failure");

Are all pages on the list guaranteed to be problematic, or only the 
first entry? I assume all.

> +       }
> +}
> +#else
> +static inline void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> +}
> +#endif
> +
>   /* [start, end) must belong to a single zone. */
>   static int __alloc_contig_migrate_range(struct compact_control *cc,
>                                          unsigned long start, unsigned long end)
> @@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>                                  NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>          }
>          if (ret < 0) {
> +               dump_migrate_failure_pages(&cc->migratepages);
>                  putback_movable_pages(&cc->migratepages);
>                  return ret;
>          }
> 
> 

If that's the way dynamic debugging is configured/enabled (still have to 
look into it) - yes, that goes in the right direction. As I said
above, you should dump only where we have some kind of guarantees I assume.

Minchan Kim March 4, 2021, 6:11 p.m. UTC | #25

On Thu, Mar 04, 2021 at 06:23:09PM +0100, David Hildenbrand wrote:
> > > You want to debug something, so you try triggering it and capturing debug
> > > data. There are not that many alloc_contig_range() users such that this
> > > would really be an issue to isolate ...
> > 
> > cma_alloc uses alloc_contig_range, and cma_alloc has lots of users.
> > It is even exported through dmabuf, so any userspace process could
> > trigger the allocation on its own. Some of them could tolerate the
> > failure, the rest could be critical. We shouldn't judge this from a
> > limited set of kernel use cases.
> 
> Assume you are debugging allocation failures. You either collect the data
> yourself or ask someone to send you that output. You care about any
> alloc_contig_range() allocation failures that shouldn't happen, don't you?
> 
> > 
> > > 
> > > Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
> > > problematic (putting NORETRY logic and similar aside). So any such
> > > page you hit is worth investigating and, therefore, worth getting logged for
> > > debugging purposes.
> > 
> > If you believe that every alloc_contig_range failure is problematic
> 
> Every one where we should have guarantees I guess: ZONE_MOVABLE or
> MIGRATE_CMA. On ZONE_NORMAL, there are no guarantees.

Indeed.

> 
> > and that no real example like the one I mentioned above exists,
> > I am happy to use this chunk to support dynamic debugging.
> > Okay?
> > 
> > +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> > +        (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> > +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
> > +               DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
> > +int alloc_contig_ratelimit(void)
> > +{
> > +       return __ratelimit(&alloc_contig_ratelimit_state);
> > +}
> > +
> 
> ^ do we need ratelimiting with dynamic debugging enabled?

The main argument was debug message flooding. Even if we
use dynamic debugging, that issue doesn't disappear.

> 
> > +void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > +       DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> > +                       "migrate failure");
> > +       if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> > +                       alloc_contig_ratelimit()) {
> > +               struct page *page;
> > +
> > +               WARN(1, "failed callstack");
> > +               list_for_each_entry(page, page_list, lru)
> > +                       dump_page(page, "migration failure");
> 
> Are all pages on the list guaranteed to be problematic, or only the first
> entry? I assume all.

All.

> 
> > +       }
> > +}
> > +#else
> > +static inline void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > +}
> > +#endif
> > +
> >   /* [start, end) must belong to a single zone. */
> >   static int __alloc_contig_migrate_range(struct compact_control *cc,
> >                                          unsigned long start, unsigned long end)
> > @@ -8496,6 +8522,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> >                                  NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
> >          }
> >          if (ret < 0) {
> > +               dump_migrate_failure_pages(&cc->migratepages);
> >                  putback_movable_pages(&cc->migratepages);
> >                  return ret;
> >          }
> > 
> > 
> 
> If that's the way dynamic debugging is configured/enabled (still have to
> look into it) - yes, that goes in the right direction. As I said above,
> you should dump only where we have some kind of guarantees I assume.

Sure, let me wait for your review before sending the next revision.
Thanks for the review!

Minchan Kim March 4, 2021, 6:22 p.m. UTC | #26

On Thu, Mar 04, 2021 at 10:11:35AM -0800, Minchan Kim wrote:
> On Thu, Mar 04, 2021 at 06:23:09PM +0100, David Hildenbrand wrote:
> > > > You want to debug something, so you try triggering it and capturing debug
> > > > data. There are not that many alloc_contig_range() users such that this
> > > > would really be an issue to isolate ...
> > > 
> > > cma_alloc uses alloc_contig_range, and cma_alloc has lots of users.
> > > It is even exported through dmabuf, so any userspace process could
> > > trigger the allocation on its own. Some of them could tolerate the
> > > failure, the rest could be critical. We shouldn't judge this from a
> > > limited set of kernel use cases.
> > 
> > Assume you are debugging allocation failures. You either collect the data
> > yourself or ask someone to send you that output. You care about any
> > alloc_contig_range() allocation failures that shouldn't happen, don't you?
> > 
> > > 
> > > > 
> > > > Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is
> > > > problematic (putting NORETRY logic and similar aside). So any such
> > > > page you hit is worth investigating and, therefore, worth getting logged for
> > > > debugging purposes.
> > > 
> > > If you believe that every alloc_contig_range failure is problematic
> > 
> > Every one where we should have guarantees I guess: ZONE_MOVABLE or
> > MIGRATE_CMA. On ZONE_NORMAL, there are no guarantees.
> 
> Indeed.

How about this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 238d0fc232aa..489e557b9390 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8481,7 +8481,8 @@ static inline void dump_migrate_failure_pages(struct list_head *page_list)

 /* [start, end) must belong to a single zone. */
 static int __alloc_contig_migrate_range(struct compact_control *cc,
-                                       unsigned long start, unsigned long end)
+                                       unsigned long start, unsigned long end,
+                                       bool nofail)
 {
        /* This function is based on compact_zone() from compaction.c. */
        unsigned int nr_reclaimed;
@@ -8522,7 +8523,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
                                NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
        }
        if (ret < 0) {
-               dump_migrate_failure_pages(&cc->migratepages);
+               if (ret == -EBUSY && nofail)
+                       dump_migrate_failure_pages(&cc->migratepages);
                putback_movable_pages(&cc->migratepages);
                return ret;
        }
@@ -8610,7 +8612,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
         * allocated.  So, if we fall through be sure to clear ret so that
         * -EBUSY is not accidentally used or returned to caller.
         */
-       ret = __alloc_contig_migrate_range(&cc, start, end);
+       ret = __alloc_contig_migrate_range(&cc, start, end,
+                                       migratetype == MIGRATE_CMA ||
+                                       zone_idx(cc.zone) == ZONE_MOVABLE);
        if (ret && ret != -EBUSY)
                goto done;
        ret =0;
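
The condition this diff introduces can be read in isolation as: dump only
when migration failed with -EBUSY and the range belongs to a region where
migration is expected to succeed (CMA or ZONE_MOVABLE). Below is a
userspace sketch of that predicate, assuming simplified stand-in enums;
the SKETCH_* names and sketch_should_dump() are hypothetical and are not
kernel definitions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the kernel's migratetype and zone index. */
enum sketch_migratetype {
	SKETCH_MIGRATE_MOVABLE,
	SKETCH_MIGRATE_CMA,
};

enum sketch_zone {
	SKETCH_ZONE_NORMAL,
	SKETCH_ZONE_MOVABLE,
};

/* Returns true if a failed migration should have its pages dumped. */
static bool sketch_should_dump(int ret, enum sketch_migratetype mt,
			       enum sketch_zone zone)
{
	/* Regions where migration is expected to succeed in practice. */
	bool expected_to_succeed = mt == SKETCH_MIGRATE_CMA ||
				   zone == SKETCH_ZONE_MOVABLE;

	/* Only -EBUSY is treated as worth dumping here, as in the diff. */
	return ret == -EBUSY && expected_to_succeed;
}
```

A failure in ZONE_NORMAL with a non-CMA migratetype, or any error other
than -EBUSY, is not dumped under this rule.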

Michal Hocko March 8, 2021, 12:49 p.m. UTC | #27

On Thu 04-03-21 10:22:51, Minchan Kim wrote:
[...]
> How about this?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 238d0fc232aa..489e557b9390 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8481,7 +8481,8 @@ static inline void dump_migrate_failure_pages(struct list_head *page_list)
> 
>  /* [start, end) must belong to a single zone. */
>  static int __alloc_contig_migrate_range(struct compact_control *cc,
> -                                       unsigned long start, unsigned long end)
> +                                       unsigned long start, unsigned long end,
> +                                       bool nofail)

This sounds like a very bad idea to me. Your nofail definition might
differ from what we actually define as __GFP_NOFAIL but I do not think
this interface should ever promise anything that strong.
Sure movable, cma regions should effectively never fail but there will
never be any _guarantee_ for that.

Earlier in the discussion I have suggested dynamic debugging facility.
Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
look into that direction?

>  {
>         /* This function is based on compact_zone() from compaction.c. */
>         unsigned int nr_reclaimed;
> @@ -8522,7 +8523,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>                                 NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
>         }
>         if (ret < 0) {
> -               dump_migrate_failure_pages(&cc->migratepages);
> +               if (ret == -EBUSY && nofail)
> +                       dump_migrate_failure_pages(&cc->migratepages);
>                 putback_movable_pages(&cc->migratepages);
>                 return ret;
>         }
> @@ -8610,7 +8612,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>          * allocated.  So, if we fall through be sure to clear ret so that
>          * -EBUSY is not accidentally used or returned to caller.
>          */
> -       ret = __alloc_contig_migrate_range(&cc, start, end);
> +       ret = __alloc_contig_migrate_range(&cc, start, end,
> +                                       migratetype == MIGRATE_CMA ||
> +                                       zone_idx(cc.zone) == ZONE_MOVABLE);
>         if (ret && ret != -EBUSY)
>                 goto done;
>         ret =0;

David Hildenbrand March 8, 2021, 1:22 p.m. UTC | #28

On 08.03.21 13:49, Michal Hocko wrote:
> On Thu 04-03-21 10:22:51, Minchan Kim wrote:
> [...]
>> How about this?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 238d0fc232aa..489e557b9390 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8481,7 +8481,8 @@ static inline void dump_migrate_failure_pages(struct list_head *page_list)
>>
>>   /* [start, end) must belong to a single zone. */
>>   static int __alloc_contig_migrate_range(struct compact_control *cc,
>> -                                       unsigned long start, unsigned long end)
>> +                                       unsigned long start, unsigned long end,
>> +                                       bool nofail)
> 
> This sounds like a very bad idea to me. Your nofail definition might
> differ from what we actually define as __GFP_NOFAIL but I do not think
> this interface should ever promise anything that strong.
> Sure movable, cma regions should effectively never fail but there will
> never be any _guarantee_ for that.

While there are no guarantees, we want to make such allocations as 
likely as possible to succeed. Not succeeding should be the corner case 
and is worth investigating.

> 
> Earlier in the discussion I have suggested dynamic debugging facility.
> Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> look into that direction?

Did you see the previous mail this is based on:

https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com

I agree that "nofail" is misleading. Rather something like 
"dump_on_failure", just a better name :)

Michal Hocko March 8, 2021, 2:11 p.m. UTC | #29

On Mon 08-03-21 14:22:12, David Hildenbrand wrote:
> On 08.03.21 13:49, Michal Hocko wrote:
[...]
> > Earlier in the discussion I have suggested dynamic debugging facility.
> > Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> > look into that direction?
> 
> Did you see the previous mail this is based on:
> 
> https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com
> 
> I agree that "nofail" is misleading. Rather something like
> "dump_on_failure", just a better name :)

Yeah, I have read through the email thread. I just do not get why we
cannot make it pr_debug() and add -DDYNAMIC_DEBUG_MODULE for
page_alloc.c (I haven't checked whether that is possible for built in
compile units; maybe it is not, but from a quick look it seems it should).

I really do not like this to be a part of the API. alloc_contig_range is
a best effort allocator. Complaining about failure is too noisy. I do
agree that some sort of easy to enable debugging is due but please let's
make it as transparent to the code as possible.

David Hildenbrand March 8, 2021, 2:13 p.m. UTC | #30

On 08.03.21 15:11, Michal Hocko wrote:
> On Mon 08-03-21 14:22:12, David Hildenbrand wrote:
>> On 08.03.21 13:49, Michal Hocko wrote:
> [...]
>>> Earlier in the discussion I have suggested dynamic debugging facility.
>>> Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
>>> look into that direction?
>>
>> Did you see the previous mail this is based on:
>>
>> https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com
>>
>> I agree that "nofail" is misleading. Rather something like
>> "dump_on_failure", just a better name :)
> 
> Yeah, I have read through the email thread. I just do not get why we
> cannot make it pr_debug() and add -DDYNAMIC_DEBUG_MODULE for
> page_alloc.c (I haven't checked whether that is possible for built in
> compile units; maybe it is not, but from a quick look it seems it should).
> 
> I really do not like this to be a part of the API. alloc_contig_range is

Which API? It does not affect alloc_contig_range() itself, it's used 
internally only. Sure, we could simply pr_debug() for each and every 
migration failure. As long as it's default-disabled, sure.

I do agree that we should look into properly including this into the 
dynamic debugging infrastructure.

Michal Hocko March 8, 2021, 3:42 p.m. UTC | #31

On Mon 08-03-21 15:13:35, David Hildenbrand wrote:
> On 08.03.21 15:11, Michal Hocko wrote:
> > On Mon 08-03-21 14:22:12, David Hildenbrand wrote:
> > > On 08.03.21 13:49, Michal Hocko wrote:
> > [...]
> > > > Earlier in the discussion I have suggested dynamic debugging facility.
> > > > Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> > > > look into that direction?
> > > 
> > > Did you see the previous mail this is based on:
> > > 
> > > https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com
> > > 
> > > I agree that "nofail" is misleading. Rather something like
> > > "dump_on_failure", just a better name :)
> > 
> > Yeah, I have read through the email thread. I just do not get why we
> > cannot make it pr_debug() and add -DDYNAMIC_DEBUG_MODULE for
> > page_alloc.c (I haven't checked whether that is possible for built in
> > compile units; maybe it is not, but from a quick look it seems it should).
> > 
> > I really do not like this to be a part of the API. alloc_contig_range is
> 
> Which API?

Any level of the alloc_contig_range api because I strongly suspect that
once there is something on the lower levels there will be a push to have
it in the directly consumed api as well. Besides that I think this is
just a wrong way to approach the problem.

> It does not affect alloc_contig_range() itself, it's used
> internally only. Sure, we could simply pr_debug() for each and every
> migration failure. As long as it's default-disabled, sure.
> 
> I do agree that we should look into properly including this into the dynamic
> debugging infrastructure.

Yeah, unless we learn this is not feasible for some reason, which I do
not see right now, then let's just make it pr_debug with the runtime
control.

Minchan Kim March 8, 2021, 3:58 p.m. UTC | #32

On Mon, Mar 08, 2021 at 04:42:43PM +0100, Michal Hocko wrote:
> On Mon 08-03-21 15:13:35, David Hildenbrand wrote:
> > On 08.03.21 15:11, Michal Hocko wrote:
> > > On Mon 08-03-21 14:22:12, David Hildenbrand wrote:
> > > > On 08.03.21 13:49, Michal Hocko wrote:
> > > [...]
> > > > > Earlier in the discussion I have suggested dynamic debugging facility.
> > > > > Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> > > > > look into that direction?
> > > > 
> > > > Did you see the previous mail this is based on:
> > > > 
> > > > https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com
> > > > 
> > > > I agree that "nofail" is misleading. Rather something like
> > > > "dump_on_failure", just a better name :)
> > > 
> > > Yeah, I have read through the email thread. I just do not get why we
> > > cannot make it pr_debug() and add -DDYNAMIC_DEBUG_MODULE for
> > > page_alloc.c (I haven't checked whether that is possible for built in
> > > compile units; maybe it is not, but from a quick look it seems it should).
> > > 
> > > I really do not like this to be a part of the API. alloc_contig_range is
> > 
> > Which API?
> 
> Any level of the alloc_contig_range api because I strongly suspect that
> once there is something on the lower levels there will be a push to have
> it in the directly consumed api as well. Besides that I think this is
> just a wrong way to approach the problem.
> 
> > It does not affect alloc_contig_range() itself, it's used
> > internally only. Sure, we could simply pr_debug() for each and every
> > migration failure. As long as it's default-disabled, sure.
> > 
> > I do agree that we should look into properly including this into the dynamic
> > debugging infrastructure.
> 
> Yeah, unless we learn this is not feasible for some reason, which I do
> not see right now, then let's just make it pr_debug with the runtime
> control.

What problem do you see? It's the dynamic debugging facility,
enabled only when the admin wants to use it. Otherwise, it's a nop.
Furthermore, it avoids having to invent a custom dump_page
implementation (including dump_page_owner) by changing it to
use pr_debug.
Could you clarify your requirement?

https://lore.kernel.org/linux-mm/YEEUq8ZRn4WyYWVx@google.com/

Since David agreed to drop the nofail option from the API, I will
keep the patch at the URL above.

Michal Hocko March 8, 2021, 4:21 p.m. UTC | #33

On Mon 08-03-21 07:58:11, Minchan Kim wrote:
[...]
> It's the dynamic debugging facility,
> enabled only when the admin wants to use it. Otherwise, it's a nop.
> Furthermore, it avoids having to invent a custom dump_page
> implementation (including dump_page_owner) by changing it to
> use pr_debug.
> Could you clarify your requirement?
> 
> https://lore.kernel.org/linux-mm/YEEUq8ZRn4WyYWVx@google.com/

I am not really sure this is the right way to enable dynamic logging.
Maybe it is. I thought we can go with something as simple as pr_debug.
You are right that we do not have dump_page with the kernel log level.
This is rather annoying but a) do we need a full dump_page functionality
and b) can we make it log level aware with the dynamic debug
infrastructure preserved? If not, then explicit handling is
probably the only way, and this should be reviewed by people who are more
familiar with that framework than me.

Minchan Kim March 8, 2021, 5:01 p.m. UTC | #34

On Mon, Mar 08, 2021 at 05:21:47PM +0100, Michal Hocko wrote:
> On Mon 08-03-21 07:58:11, Minchan Kim wrote:
> [...]
> > It's the dynamic debugging facility,
> > enabled only when the admin wants to use it. Otherwise, it's a nop.
> > Furthermore, it avoids having to invent a custom dump_page
> > implementation (including dump_page_owner) by changing it to
> > use pr_debug.
> > Could you clarify your requirement?
> > 
> > https://lore.kernel.org/linux-mm/YEEUq8ZRn4WyYWVx@google.com/
> 
> I am not really sure this is the right way to enable dynamic logging.
> Maybe it is. I thought we can go with something as simple as pr_debug.
> You are right that we do not have dump_page with the kernel log level.
> This is rather annoying but a) do we need a full dump_page functionality

The parts I care about most are

        pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n",
                        page, page_ref_count(head), mapcount, mapping,
                        page_to_pgoff(page), page_to_pfn(page));

        pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
                page_cma ? " CMA" : "");


And dump_page_owner, which was super helpful for finding GUP places.

That looks like most of dump_page.

> and b) can we make it log level aware with the dynamic debug
> infrastructure preserved? If not, then explicit handling is

If we made it loglevel aware, we would need to enable each site one by one.
IOW, what we want to enable is mm/page_alloc.c line #1492, for example.

However, we would also have to enable
  mm/debug.c line #88
  mm/debug.c line #102
  mm/debug.c line #121
  mm/page_owner.c line #448
  mm/page_owner.c line #450
  kernel/stacktrace.c line #32

And so on. Furthermore, the user would have to be aware of how the kernel
code changes at all those sites, and reconfigure to follow newly added
code.

So, I chose explicit handling.

> probably the only way and this should be reviewed by people who are more
> familiar with that framework than me.

> -- 
> Michal Hocko
> SUSE Labs

Minchan Kim March 8, 2021, 8:27 p.m. UTC | #35

On Mon, Mar 08, 2021 at 07:58:11AM -0800, Minchan Kim wrote:
> On Mon, Mar 08, 2021 at 04:42:43PM +0100, Michal Hocko wrote:
> > On Mon 08-03-21 15:13:35, David Hildenbrand wrote:
> > > On 08.03.21 15:11, Michal Hocko wrote:
> > > > On Mon 08-03-21 14:22:12, David Hildenbrand wrote:
> > > > > On 08.03.21 13:49, Michal Hocko wrote:
> > > > [...]
> > > > > > Earlier in the discussion I have suggested dynamic debugging facility.
> > > > > > Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> > > > > > look into that direction?
> > > > > 
> > > > > Did you see the previous mail this is based on:
> > > > > 
> > > > > https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@google.com
> > > > > 
> > > > > I agree that "nofail" is misleading. Rather something like
> > > > > "dump_on_failure", just a better name :)
> > > > 
> > > > Yeah, I have read through the email thread. I just do not get why we
> > > > cannot make it pr_debug() and add -DDYNAMIC_DEBUG_MODULE for
> > > > page_alloc.c (I haven't checked whether that is possible for built in
> > > > compile units, maybe it is not but from a quick look it seems it should).
> > > > 
> > > > I really do not like this to be a part of the API. alloc_contig_range is
> > > 
> > > Which API?
> > 
> > Any level of the alloc_contig_range api because I strongly suspect that
> > once there is something on the lower levels there will be a push to have
> > it in the directly consumed api as well. Besides that I think this is
> > just a wrong way to approach the problem.
> > 
> > > It does not affect alloc_contig_range() itself, it's used
> > > internally only. Sure, we could simply pr_debug() for each and every
> > > migration failure. As long as it's default-disabled, sure.
> > > 
> > > I do agree that we should look into properly including this into the dynamic
> > > debugging infrastructure.
> > 
> > Yeah, unless we learn this is not feasible for some reason, which I do
> > not see right now, then let's just make it pr_debug with the runtime
> > control.
> 
> What problem do you see? It's a dynamic debugging facility
> that is enabled only when the admin wants to use it. Otherwise, it's
> a nop. Furthermore, it doesn't need to invent a
> custom dump_page implementation (including dump_page_owner) by
> changing everything to pr_debug.
> Could you clarify your requirement?
> 
> https://lore.kernel.org/linux-mm/YEEUq8ZRn4WyYWVx@google.com/
> 
> Since David agreed to drop nofail option in the API, I will
> keep the URL patch.

I posted a formal patch, Cc'ing the dynamic debug maintainer.
https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@kernel.org/

Let's discuss stuff related to dynamic debug there.
