
[v2,2/9] mm/vmstat: show start_pfn when zone spans pages

Message ID 20220928223301.375229-3-opendmb@gmail.com (mailing list archive)
State New
Series mm: introduce Designated Movable Blocks

Commit Message

Doug Berger Sept. 28, 2022, 10:32 p.m. UTC
A zone that overlaps with another zone may span a range of pages
that are not present. In this case, displaying the start_pfn of
the zone allows the zone page range to be identified.

Signed-off-by: Doug Berger <opendmb@gmail.com>
---
 mm/vmstat.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

David Hildenbrand Sept. 29, 2022, 8:15 a.m. UTC | #1
On 29.09.22 00:32, Doug Berger wrote:
> A zone that overlaps with another zone may span a range of pages
> that are not present. In this case, displaying the start_pfn of
> the zone allows the zone page range to be identified.
> 

I don't understand the intention here.

"/* If unpopulated, no other information is useful */"

Why would the start pfn be of any use here?

What is the user visible impact without that change?

> Signed-off-by: Doug Berger <opendmb@gmail.com>
> ---
>   mm/vmstat.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 90af9a8572f5..e2f19f2b7615 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1717,6 +1717,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>   
>   	/* If unpopulated, no other information is useful */
>   	if (!populated_zone(zone)) {
> +		/* Show start_pfn for empty overlapped zones */
> +		if (zone->spanned_pages)
> +			seq_printf(m,
> +				   "\n  start_pfn:           %lu",
> +				   zone->zone_start_pfn);
>   		seq_putc(m, '\n');
>   		return;
>   	}
Doug Berger Oct. 1, 2022, 1:28 a.m. UTC | #2
On 9/29/2022 1:15 AM, David Hildenbrand wrote:
> On 29.09.22 00:32, Doug Berger wrote:
>> A zone that overlaps with another zone may span a range of pages
>> that are not present. In this case, displaying the start_pfn of
>> the zone allows the zone page range to be identified.
>>
> 
> I don't understand the intention here.
> 
> "/* If unpopulated, no other information is useful */"
> 
> Why would the start pfn be of any use here?
> 
> What is the user visible impact without that change?
Yes, this is very subtle. I only caught it while testing some 
pathological cases.

If you take the example system:
The 7278 device has four ARMv8 CPU cores in an SMP cluster and two 
memory controllers (MEMCs). Each MEMC is capable of controlling up to 
8GB of DRAM. An example 7278 system might have 1GB on each controller, 
so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and 
1GB on MEMC1 at 0x300000000-0x33FFFFFFF.

Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to 
the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the 
ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.

If instead you specified 'movablecore=256M@0x70000000,512M' you would 
get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span 
0x300000000-0x32fffffff. The requested 512M of movablecore would be 
divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable 
zone, whose start would be displayed in the bootlog as:
[    0.000000] Movable zone start for each node
[    0.000000]   Node 0: 0x000000330000000
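
The split described above can be checked with a little arithmetic. The sketch below is only a single-node model of that description, not code from the series: the helper name and its parameters are hypothetical, and it assumes the classic portion is carved from the top of the node.

```c
#include <stdint.h>

/*
 * Hypothetical single-node model: the part of the movablecore request
 * not already satisfied by DMBs is taken from the end of the node.
 * node_end is the exclusive physical end address of the node.
 */
static uint64_t ex_classic_movable_start(uint64_t node_end,
                                         uint64_t requested,
                                         uint64_t dmb_total)
{
    /* Whatever the DMBs do not cover is left for the classic zone. */
    uint64_t remaining = requested > dmb_total ? requested - dmb_total : 0;

    return node_end - remaining;
}
```

With node_end = 0x340000000, requested = 512M and dmb_total = 256M, this yields 0x330000000, which agrees with the "Movable zone start" reported in the boot log.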

Finally, if you specified the pathological 
'movablecore=256M@0x70000000,1G@12G' you would still have the same 
ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to 
0x300000000-0x33fffffff. However, because the second DMB (1G@12G) 
completely overlaps the ZONE_NORMAL there would be no pages present in 
ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned 
262144', but not where those pages are. This commit adds the 'start_pfn' 
back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
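
For reference, the 'spanned 262144' figure follows from the ZONE_NORMAL range if pages are 4 KiB. That page size is an assumption here (arm64 can also be built with 16K or 64K pages), and the helper names below are hypothetical, but the sketch shows the arithmetic:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define EX_PAGE_SHIFT 12

/* Convert a physical address to a page frame number. */
static uint64_t ex_phys_to_pfn(uint64_t phys)
{
    return phys >> EX_PAGE_SHIFT;
}

/* Pages spanned by the inclusive physical range [start, last]. */
static uint64_t ex_span_pages(uint64_t start, uint64_t last)
{
    return ex_phys_to_pfn(last) - ex_phys_to_pfn(start) + 1;
}
```

For 0x300000000-0x33fffffff this gives 262144 spanned pages starting at pfn 0x300000, which is the value that would be printed, in decimal, as 3145728 on the added start_pfn line.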

Regards,
     Doug

> 
>> Signed-off-by: Doug Berger <opendmb@gmail.com>
>> ---
>>   mm/vmstat.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 90af9a8572f5..e2f19f2b7615 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1717,6 +1717,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>       /* If unpopulated, no other information is useful */
>>       if (!populated_zone(zone)) {
>> +        /* Show start_pfn for empty overlapped zones */
>> +        if (zone->spanned_pages)
>> +            seq_printf(m,
>> +                   "\n  start_pfn:           %lu",
>> +                   zone->zone_start_pfn);
>>           seq_putc(m, '\n');
>>           return;
>>       }
David Hildenbrand Oct. 5, 2022, 6:09 p.m. UTC | #3
On 01.10.22 03:28, Doug Berger wrote:
> On 9/29/2022 1:15 AM, David Hildenbrand wrote:
>> On 29.09.22 00:32, Doug Berger wrote:
>>> A zone that overlaps with another zone may span a range of pages
>>> that are not present. In this case, displaying the start_pfn of
>>> the zone allows the zone page range to be identified.
>>>
>>
>> I don't understand the intention here.
>>
>> "/* If unpopulated, no other information is useful */"
>>
>> Why would the start pfn be of any use here?
>>
>> What is the user visible impact without that change?
> Yes, this is very subtle. I only caught it while testing some
> pathological cases.
> 
> If you take the example system:
> The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
> memory controllers (MEMCs). Each MEMC is capable of controlling up to
> 8GB of DRAM. An example 7278 system might have 1GB on each controller,
> so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
> 1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
> 

Okay, thanks. You should make it clearer in the patch description -- 
especially how this relates to DMB. Having said that, I still have to 
digest your examples:

> Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to
> the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
> ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.

Why is ZONE_MOVABLE spanning more than 256M? It should span

0x70000000-0x80000000

Or what am I missing?

> 
> If instead you specified 'movablecore=256M@0x70000000,512M' you would
> get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
> 0x300000000-0x32fffffff. The requested 512M of movablecore would be
> divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
> zone start would be displayed in the bootlog as:
> [    0.000000] Movable zone start for each node
> [    0.000000]   Node 0: 0x000000330000000


Okay, so that's the movable zone range excluding DMB.

> 
> Finally, if you specified the pathological
> 'movablecore=256M@0x70000000,1G@12G' you would still have the same
> ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
> 0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
> completely overlaps the ZONE_NORMAL there would be no pages present in
> ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
> 262144', but not where those pages are. This commit adds the 'start_pfn'
> back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.

... but why? If there are no pages present, there is no ZONE_NORMAL we 
care about. The zone span should be 0. Does this maybe rather indicate 
that there is a zone span processing issue in your DMB implementation?

Special-casing zones based on DMBs feels wrong. But most probably I am 
missing something important :)
Doug Berger Oct. 12, 2022, 11:57 p.m. UTC | #4
On 10/5/2022 11:09 AM, David Hildenbrand wrote:
> On 01.10.22 03:28, Doug Berger wrote:
>> On 9/29/2022 1:15 AM, David Hildenbrand wrote:
>>> On 29.09.22 00:32, Doug Berger wrote:
>>>> A zone that overlaps with another zone may span a range of pages
>>>> that are not present. In this case, displaying the start_pfn of
>>>> the zone allows the zone page range to be identified.
>>>>
>>>
>>> I don't understand the intention here.
>>>
>>> "/* If unpopulated, no other information is useful */"
>>>
>>> Why would the start pfn be of any use here?
>>>
>>> What is the user visible impact without that change?
>> Yes, this is very subtle. I only caught it while testing some
>> pathological cases.
>>
>> If you take the example system:
>> The 7278 device has four ARMv8 CPU cores in an SMP cluster and two
>> memory controllers (MEMCs). Each MEMC is capable of controlling up to
>> 8GB of DRAM. An example 7278 system might have 1GB on each controller,
>> so an arm64 kernel might see 1GB on MEMC0 at 0x40000000-0x7FFFFFFF and
>> 1GB on MEMC1 at 0x300000000-0x33FFFFFFF.
>>
> 
> Okay, thanks. You should make it clearer in the patch description -- 
> especially how this relates to DMB. Having that said, I still have to 
> digest your examples:
> 
>> Placing a DMB on MEMC0 with 'movablecore=256M@0x70000000' will lead to
>> the ZONE_MOVABLE zone spanning from 0x70000000-0x33fffffff and the
>> ZONE_NORMAL zone spanning from 0x300000000-0x33fffffff.
> 
> Why is ZONE_MOVABLE spanning more than 256M? It should span
> 
> 0x70000000-0x80000000
> 
> Or what am I missing?
I was working from the notion that the classic 'movablecore' 
implementation keeps the ZONE_MOVABLE zone the last zone on System RAM 
so it always spans the last page on the node (i.e. 0x33ffff000). My 
implementation moves the start of ZONE_MOVABLE up to the lowest page of 
any defined DMBs on the node.
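
As a rough illustration of the rule described here, the start of ZONE_MOVABLE can be modeled as the lowest of the classic start and every DMB base, while the zone still ends at the last page of the node. The types and names below are hypothetical and work on physical addresses for readability. This is a model of the description, not the code in the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one Designated Movable Block. */
struct ex_dmb {
    uint64_t base; /* physical base address */
    uint64_t size; /* length in bytes */
};

/*
 * Start of ZONE_MOVABLE under the described rule. classic_start is the
 * start chosen by classic 'movablecore' handling; pass the node end
 * when there is no classic portion.
 */
static uint64_t ex_movable_zone_start(uint64_t classic_start,
                                      const struct ex_dmb *dmb, size_t n)
{
    uint64_t start = classic_start;
    size_t i;

    /* Pull the start down to the lowest DMB base, if any is lower. */
    for (i = 0; i < n; i++)
        if (dmb[i].base < start)
            start = dmb[i].base;

    return start;
}
```

With one DMB at 0x70000000 and a node ending at 0x340000000, the modeled span is 0x70000000-0x33fffffff whether or not a classic portion at 0x330000000 exists, which matches the spans given in the examples earlier in the thread.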

I see that memory hotplug does not behave this way, which is probably 
more intuitive (though less consistent with the classic zone layout). I 
could attempt to change this in a v3 if desired.

> 
>>
>> If instead you specified 'movablecore=256M@0x70000000,512M' you would
>> get the same ZONE_MOVABLE span, but the ZONE_NORMAL would now span
>> 0x300000000-0x32fffffff. The requested 512M of movablecore would be
>> divided into a 256MB DMB at 0x70000000 and a 256MB "classic" movable
>> zone start would be displayed in the bootlog as:
>> [    0.000000] Movable zone start for each node
>> [    0.000000]   Node 0: 0x000000330000000
> 
> 
> Okay, so that's the movable zone range excluding DMB.
> 
>>
>> Finally, if you specified the pathological
>> 'movablecore=256M@0x70000000,1G@12G' you would still have the same
>> ZONE_MOVABLE span, and the ZONE_NORMAL span would go back to
>> 0x300000000-0x33fffffff. However, because the second DMB (1G@12G)
>> completely overlaps the ZONE_NORMAL there would be no pages present in
>> ZONE_NORMAL and /proc/zoneinfo would report ZONE_NORMAL 'spanned
>> 262144', but not where those pages are. This commit adds the 'start_pfn'
>> back to the /proc/zoneinfo for ZONE_NORMAL so the span has context.
> 
> ... but why? If there are no pages present, there is no ZONE_NORMAL we 
> care about. The zone span should be 0. Does this maybe rather indicate 
> that there is a zone span processing issue in your DMB implementation?
My implementation uses the zones created by the classic 'movablecore' 
behavior and relocates the pages within DMBs. In this case the 
ZONE_NORMAL still has a span which gets output but no present pages so 
the output didn't show where the zone was without this patch. This is a 
convenience to avoid adding zone resizing and destruction logic outside 
of memory hotplug support, but I could attempt to add that code in a v3 
if desired.

> 
> Special-casing zones based on DMBs feels wrong. But most probably I am 
> missing something important :)
> 

Thanks for making me aware of your confusion so I can attempt to make it 
clearer.
-Doug
Michal Hocko Oct. 13, 2022, 11:44 a.m. UTC | #5
On Wed 12-10-22 16:57:53, Doug Berger wrote:
[...]
> I was working from the notion that the classic 'movablecore' implementation
> keeps the ZONE_MOVABLE zone the last zone on System RAM so it always spans
> the last page on the node (i.e. 0x33ffff000). My implementation moves the
> start of ZONE_MOVABLE up to the lowest page of any defined DMBs on the node.

I wouldn't rely on movablecore specific implementation. ZONE_MOVABLE can
span any physical address range. ZONE_NORMAL usually covers any ranges
not covered by more specific zones like ZONE_DMA{32}. At least on most
architectures I am familiar with.

Patch

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 90af9a8572f5..e2f19f2b7615 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1717,6 +1717,11 @@  static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 
 	/* If unpopulated, no other information is useful */
 	if (!populated_zone(zone)) {
+		/* Show start_pfn for empty overlapped zones */
+		if (zone->spanned_pages)
+			seq_printf(m,
+				   "\n  start_pfn:           %lu",
+				   zone->zone_start_pfn);
 		seq_putc(m, '\n');
 		return;
 	}
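
The effect of the hunk can be tried outside the kernel with a small userspace model. Everything below is a sketch under stated assumptions: struct ex_zone is a hypothetical stand-in for the three struct zone fields involved, snprintf() replaces seq_printf()/seq_putc(), and ex_populated_zone() assumes populated_zone() simply tests present_pages.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct zone fields the hunk touches. */
struct ex_zone {
    unsigned long zone_start_pfn;
    unsigned long spanned_pages;
    unsigned long present_pages;
};

/* Assumed to mirror populated_zone(): populated means present pages. */
static int ex_populated_zone(const struct ex_zone *z)
{
    return z->present_pages != 0;
}

/*
 * Format what the patched unpopulated branch would emit into buf.
 * Returns the number of characters written, or -1 if the zone is
 * populated or buf is too small.
 */
static int ex_show_unpopulated(const struct ex_zone *z, char *buf, size_t len)
{
    int n = 0;

    if (len == 0 || ex_populated_zone(z))
        return -1;
    buf[0] = '\0';

    /* Show start_pfn only for empty zones that still span pages. */
    if (z->spanned_pages) {
        n = snprintf(buf, len, "\n  start_pfn:           %lu",
                     z->zone_start_pfn);
        if (n < 0)
            return -1;
    }

    /* Need room for the trailing newline and the terminator. */
    if ((size_t)n + 1 >= len)
        return -1;
    buf[n++] = '\n';
    buf[n] = '\0';
    return n;
}
```

For an empty zone with a zero span, the model emits only the newline, as before the patch. For the pathological ZONE_NORMAL case from the thread (start pfn 0x300000, 262144 spanned pages, none present), it emits the start_pfn line followed by the newline.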