
[1/2] arm64: mem-model: add flatmem model for arm64

Message ID 1459844572-53069-1-git-send-email-puck.chen@hisilicon.com (mailing list archive)
State New, archived

Commit Message

Chen Feng April 5, 2016, 8:22 a.m. UTC
We can reduce the memory allocated for the mem-map
array by using flatmem.

Currently, the default memory model on arm64 is
sparsemem. The mem-map array is not freed in that
case. If the physical address range is very wide,
too much memory is reserved for the mem-map
array.

Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
Signed-off-by: Fu Jun <oliver.fu@hisilicon.com>
---
 arch/arm64/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chen Feng April 7, 2016, 7:38 a.m. UTC | #1
add Mel Gorman

On 2016/4/5 16:22, Chen Feng wrote:
> We can reduce the memory allocated at mem-map
> by flatmem.
> 
> currently, the default memory-model in arm64 is
> sparse memory. The mem-map array is not freed in
> this scene. If the physical address is too long,
> it will reserved too much memory for the mem-map
> array.
> 
> Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
> Signed-off-by: Fu Jun <oliver.fu@hisilicon.com>
> ---
>  arch/arm64/Kconfig | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4f43622..c18930d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -559,6 +559,9 @@ config ARCH_SPARSEMEM_ENABLE
>  	def_bool y
>  	select SPARSEMEM_VMEMMAP_ENABLE
>  
> +config ARCH_FLATMEM_ENABLE
> +	def_bool y
> +
>  config ARCH_SPARSEMEM_DEFAULT
>  	def_bool ARCH_SPARSEMEM_ENABLE
>  
>
Will Deacon April 7, 2016, 2:21 p.m. UTC | #2
On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
> We can reduce the memory allocated at mem-map
> by flatmem.
> 
> currently, the default memory-model in arm64 is
> sparse memory. The mem-map array is not freed in
> this scene. If the physical address is too long,
> it will reserved too much memory for the mem-map
> array.

Can you elaborate a bit more on this, please? We use the vmemmap, so any
spaces between memory banks only burn up virtual space. What exactly is
the problem you're seeing that makes you want to use flatmem (which is
probably unsuitable for the majority of arm64 machines)?

Will
Chen Feng April 11, 2016, 2:49 a.m. UTC | #3
Hi Will,
Thanks for the review.

On 2016/4/7 22:21, Will Deacon wrote:
> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>> We can reduce the memory allocated at mem-map
>> by flatmem.
>>
>> currently, the default memory-model in arm64 is
>> sparse memory. The mem-map array is not freed in
>> this scene. If the physical address is too long,
>> it will reserved too much memory for the mem-map
>> array.
> 
> Can you elaborate a bit more on this, please? We use the vmemmap, so any
> spaces between memory banks only burns up virtual space. What exactly is
> the problem you're seeing that makes you want to use flatmem (which is
> probably unsuitable for the majority of arm64 machines).
> 
The root cause of why we want to use flat-mem is that the mem_map allocated
in sparse-mem is not freed.

Take a look here:
arm64/mm/init.c
void __init mem_init(void)
{
#ifndef CONFIG_SPARSEMEM_VMEMMAP
	free_unused_memmap();
#endif
}

Memory layout (3GB)

 0             1.5G    2G             3.5G            4G
 |              |      |               |              |
 +--------------+------+---------------+--------------+
 |    MEM       | hole |     MEM       |   IO (regs)  |
 +--------------+------+---------------+--------------+


Memory layout (4GB)

 0                                    3.5G            4G    4.5G
 |                                     |              |       |
 +-------------------------------------+--------------+-------+
 |                   MEM               |   IO (regs)  |  MEM  |
 +-------------------------------------+--------------+-------+

Currently, the sparse memory section is 1GB.

3GB ddr: the 1.5 ~ 2G and 3.5 ~ 4G ranges are holes.
4GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G ranges are holes.

This allocates 1G/4K * sizeof(struct page) of memory per section for the mem_map array.
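
Spelled out, assuming 4K pages and sizeof(struct page) == 64 (the usual
arm64 values):

  pages_per_section  = 1G / 4K      = 262144 pages
  memmap_per_section = 262144 * 64B = 16MB

and under sparsemem that amount is allocated for the whole section even
when part of the section is a hole.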

We want to use flat-mem to reduce the allocated mem_map.

I don't understand why you say that flatmem is unsuitable for the
majority of arm64 machines. Can you tell us the reason?

And we are not going to restrict the memory model on arm64; we just want
to make flat-mem an optional item on arm64.


puck,


> Will
> 
> .
>
Ard Biesheuvel April 11, 2016, 7:35 a.m. UTC | #4
On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
> Hi will,
> Thanks for review.
>
> On 2016/4/7 22:21, Will Deacon wrote:
>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>> We can reduce the memory allocated at mem-map
>>> by flatmem.
>>>
>>> currently, the default memory-model in arm64 is
>>> sparse memory. The mem-map array is not freed in
>>> this scene. If the physical address is too long,
>>> it will reserved too much memory for the mem-map
>>> array.
>>
>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>> spaces between memory banks only burns up virtual space. What exactly is
>> the problem you're seeing that makes you want to use flatmem (which is
>> probably unsuitable for the majority of arm64 machines).
>>
> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
> is not freed.
>
> take a look at here:
> arm64/mm/init.c
> void __init mem_init(void)
> {
> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>         free_unused_memmap();
> #endif
> }
>
> Memory layout (3GB)
>
>  0             1.5G    2G             3.5G            4G
>  |              |      |               |              |
>  +--------------+------+---------------+--------------+
>  |    MEM       | hole |     MEM       |   IO (regs)  |
>  +--------------+------+---------------+--------------+
>
>
> Memory layout (4GB)
>
>  0                                    3.5G            4G    4.5G
>  |                                     |              |       |
>  +-------------------------------------+--------------+-------+
>  |                   MEM               |   IO (regs)  |  MEM  |
>  +-------------------------------------+--------------+-------+
>
> Currently, the sparse memory section is 1GB.
>
> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>
> This will alloc 1G/4K * (struct page) memory for mem_map array.
>

No, this is incorrect. Sparsemem vmemmap only allocates struct pages
for memory regions that are actually populated.

For instance, on the Foundation model with 4 GB of memory, you may see
something like this in the boot log

[    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
(     8 GB maximum)
[    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
(   544 MB actual)

but in reality, only the following regions have been allocated

---[ vmemmap start ]---
0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
      BLK UXN MEM/NORMAL
0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
      BLK UXN MEM/NORMAL
---[ vmemmap end ]---

so only 64 MB is used to back 4 GB of RAM with struct pages, which is
minimal. Moving to flatmem will not reduce the memory footprint at
all.
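
As a quick sanity check on those numbers, assuming sizeof(struct page)
== 64:

  4GB / 4K      = 1048576 struct pages
  1048576 * 64B = 64MB of memmap

which is exactly what the two 32MB vmemmap blocks above add up to.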
Chen Feng April 11, 2016, 7:55 a.m. UTC | #5
Hi Ard,

On 2016/4/11 15:35, Ard Biesheuvel wrote:
> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>> Hi will,
>> Thanks for review.
>>
>> On 2016/4/7 22:21, Will Deacon wrote:
>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>> We can reduce the memory allocated at mem-map
>>>> by flatmem.
>>>>
>>>> currently, the default memory-model in arm64 is
>>>> sparse memory. The mem-map array is not freed in
>>>> this scene. If the physical address is too long,
>>>> it will reserved too much memory for the mem-map
>>>> array.
>>>
>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>> spaces between memory banks only burns up virtual space. What exactly is
>>> the problem you're seeing that makes you want to use flatmem (which is
>>> probably unsuitable for the majority of arm64 machines).
>>>
>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>> is not freed.
>>
>> take a look at here:
>> arm64/mm/init.c
>> void __init mem_init(void)
>> {
>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>         free_unused_memmap();
>> #endif
>> }
>>
>> Memory layout (3GB)
>>
>>  0             1.5G    2G             3.5G            4G
>>  |              |      |               |              |
>>  +--------------+------+---------------+--------------+
>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>  +--------------+------+---------------+--------------+
>>
>>
>> Memory layout (4GB)
>>
>>  0                                    3.5G            4G    4.5G
>>  |                                     |              |       |
>>  +-------------------------------------+--------------+-------+
>>  |                   MEM               |   IO (regs)  |  MEM  |
>>  +-------------------------------------+--------------+-------+
>>
>> Currently, the sparse memory section is 1GB.
>>
>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>
>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>
> 
> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
> for memory regions that are actually populated.
> 
> For instance, on the Foundation model with 4 GB of memory, you may see
> something like this in the boot log
> 
> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
> (     8 GB maximum)
> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
> (   544 MB actual)
> 
> but in reality, only the following regions have been allocated
> 
> ---[ vmemmap start ]---
> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>       BLK UXN MEM/NORMAL
> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>       BLK UXN MEM/NORMAL
> ---[ vmemmap end ]---
> 
> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
> minimal. Moving to flatmem will not reduce the memory footprint at
> all.

Yes, but the populate granularity is a section, which is 1GB. Take a look
at the above memory layout.

The range 1G ~ 2G is one section, but 1.5G ~ 2G is a hole.

The range 3G ~ 4G is one section, but 3.5G ~ 4G is a hole.
>>  0             1.5G    2G             3.5G            4G
>>  |              |      |               |              |
>>  +--------------+------+---------------+--------------+
>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>  +--------------+------+---------------+--------------+
The mem-map array is also allocated for the hole at 1.5G ~ 2G, and likewise for the one at 3.5G ~ 4G.

We want to free that mem-map array. With flat-mem we can handle this scenario very well.
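
A rough sketch of the cost for the 3GB layout above, with 1GB sections
(again assuming 4K pages and a 64-byte struct page):

  sections spanned: [0,1G) [1G,2G) [2G,3G) [3G,4G) -> 4 * 16MB = 64MB
  memory present:   3GB                            -> 3 * 16MB = 48MB

so the two half-GB holes (1.5G ~ 2G and 3.5G ~ 4G) cost about 16MB of
mem-map that free_unused_memmap() would have reclaimed in the
non-vmemmap case.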

Thanks,


> 
> 
> .
>
Ard Biesheuvel April 11, 2016, 8 a.m. UTC | #6
On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
> Hi Ard,
>
> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>> Hi will,
>>> Thanks for review.
>>>
>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>> We can reduce the memory allocated at mem-map
>>>>> by flatmem.
>>>>>
>>>>> currently, the default memory-model in arm64 is
>>>>> sparse memory. The mem-map array is not freed in
>>>>> this scene. If the physical address is too long,
>>>>> it will reserved too much memory for the mem-map
>>>>> array.
>>>>
>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>> probably unsuitable for the majority of arm64 machines).
>>>>
>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>> is not freed.
>>>
>>> take a look at here:
>>> arm64/mm/init.c
>>> void __init mem_init(void)
>>> {
>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>         free_unused_memmap();
>>> #endif
>>> }
>>>
>>> Memory layout (3GB)
>>>
>>>  0             1.5G    2G             3.5G            4G
>>>  |              |      |               |              |
>>>  +--------------+------+---------------+--------------+
>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>  +--------------+------+---------------+--------------+
>>>
>>>
>>> Memory layout (4GB)
>>>
>>>  0                                    3.5G            4G    4.5G
>>>  |                                     |              |       |
>>>  +-------------------------------------+--------------+-------+
>>>  |                   MEM               |   IO (regs)  |  MEM  |
>>>  +-------------------------------------+--------------+-------+
>>>
>>> Currently, the sparse memory section is 1GB.
>>>
>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>
>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>
>>
>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>> for memory regions that are actually populated.
>>
>> For instance, on the Foundation model with 4 GB of memory, you may see
>> something like this in the boot log
>>
>> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>> (     8 GB maximum)
>> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
>> (   544 MB actual)
>>
>> but in reality, only the following regions have been allocated
>>
>> ---[ vmemmap start ]---
>> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>>       BLK UXN MEM/NORMAL
>> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>>       BLK UXN MEM/NORMAL
>> ---[ vmemmap end ]---
>>
>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>> minimal. Moving to flatmem will not reduce the memory footprint at
>> all.
>
> Yes,but the populate is section, which is 1GB. Take a look at the above
> memory layout.
>
> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>
> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>  0             1.5G    2G             3.5G            4G
>>>  |              |      |               |              |
>>>  +--------------+------+---------------+--------------+
>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>  +--------------+------+---------------+--------------+
> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>

No, it is not. It may be covered by a section, but that does not mean
sparsemem vmemmap will actually allocate backing for it. The
granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
due to the fact that the backing is performed at PMD granularity.
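
(The 128 MB figure falls out of the arithmetic: one 2 MB PMD block of
vmemmap holds 2MB / 64B = 32768 struct pages, and 32768 * 4KB = 128 MB
of RAM described per PMD block, assuming a 64-byte struct page.)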

Please, could you share the contents of the vmemmap section in
/sys/kernel/debug/kernel_page_tables of your system running with
sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
Chen Feng April 11, 2016, 9:59 a.m. UTC | #7
Hi Ard,

On 2016/4/11 16:00, Ard Biesheuvel wrote:
> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
>> Hi Ard,
>>
>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>> Hi will,
>>>> Thanks for review.
>>>>
>>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>>> We can reduce the memory allocated at mem-map
>>>>>> by flatmem.
>>>>>>
>>>>>> currently, the default memory-model in arm64 is
>>>>>> sparse memory. The mem-map array is not freed in
>>>>>> this scene. If the physical address is too long,
>>>>>> it will reserved too much memory for the mem-map
>>>>>> array.
>>>>>
>>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>>> probably unsuitable for the majority of arm64 machines).
>>>>>
>>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>>> is not freed.
>>>>
>>>> take a look at here:
>>>> arm64/mm/init.c
>>>> void __init mem_init(void)
>>>> {
>>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>>         free_unused_memmap();
>>>> #endif
>>>> }
>>>>
>>>> Memory layout (3GB)
>>>>
>>>>  0             1.5G    2G             3.5G            4G
>>>>  |              |      |               |              |
>>>>  +--------------+------+---------------+--------------+
>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>  +--------------+------+---------------+--------------+
>>>>
>>>>
>>>> Memory layout (4GB)
>>>>
>>>>  0                                    3.5G            4G    4.5G
>>>>  |                                     |              |       |
>>>>  +-------------------------------------+--------------+-------+
>>>>  |                   MEM               |   IO (regs)  |  MEM  |
>>>>  +-------------------------------------+--------------+-------+
>>>>
>>>> Currently, the sparse memory section is 1GB.
>>>>
>>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>>
>>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>>
>>>
>>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>>> for memory regions that are actually populated.
>>>
>>> For instance, on the Foundation model with 4 GB of memory, you may see
>>> something like this in the boot log
>>>
>>> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>>> (     8 GB maximum)
>>> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
>>> (   544 MB actual)
>>>
>>> but in reality, only the following regions have been allocated
>>>
>>> ---[ vmemmap start ]---
>>> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>>>       BLK UXN MEM/NORMAL
>>> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>>>       BLK UXN MEM/NORMAL
>>> ---[ vmemmap end ]---
>>>
>>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>>> minimal. Moving to flatmem will not reduce the memory footprint at
>>> all.
>>
>> Yes,but the populate is section, which is 1GB. Take a look at the above
>> memory layout.
>>
>> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>>
>> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>>  0             1.5G    2G             3.5G            4G
>>>>  |              |      |               |              |
>>>>  +--------------+------+---------------+--------------+
>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>  +--------------+------+---------------+--------------+
>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>
> 
> No, it is not. It may be covered by a section, but that does not mean
> sparsemem vmemmap will actually allocate backing for it. The
> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
> due to the fact that the backing is performed at PMD granularity.
> 
> Please, could you share the contents of the vmemmap section in
> /sys/kernel/debug/kernel_page_tables of your system running with
> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>

Please see the pg-tables below.


With sparsemem and vmemmap enabled.

---[ vmemmap start ]---
0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
---[ vmemmap end ]---


The board has 4GB, and the memmap is 70MB.
1G of memory --- a 14MB mem_map array.
So the 4GB has 5 sections, which use 5 * 14MB of memory.

> .
>
Ard Biesheuvel April 11, 2016, 10:31 a.m. UTC | #8
On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> Hi Ard,
>
> On 2016/4/11 16:00, Ard Biesheuvel wrote:
>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
>>> Hi Ard,
>>>
>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>>> Hi will,
>>>>> Thanks for review.
>>>>>
>>>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>>>> We can reduce the memory allocated at mem-map
>>>>>>> by flatmem.
>>>>>>>
>>>>>>> currently, the default memory-model in arm64 is
>>>>>>> sparse memory. The mem-map array is not freed in
>>>>>>> this scene. If the physical address is too long,
>>>>>>> it will reserved too much memory for the mem-map
>>>>>>> array.
>>>>>>
>>>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>>>> probably unsuitable for the majority of arm64 machines).
>>>>>>
>>>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>>>> is not freed.
>>>>>
>>>>> take a look at here:
>>>>> arm64/mm/init.c
>>>>> void __init mem_init(void)
>>>>> {
>>>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>>>         free_unused_memmap();
>>>>> #endif
>>>>> }
>>>>>
>>>>> Memory layout (3GB)
>>>>>
>>>>>  0             1.5G    2G             3.5G            4G
>>>>>  |              |      |               |              |
>>>>>  +--------------+------+---------------+--------------+
>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>  +--------------+------+---------------+--------------+
>>>>>
>>>>>
>>>>> Memory layout (4GB)
>>>>>
>>>>>  0                                    3.5G            4G    4.5G
>>>>>  |                                     |              |       |
>>>>>  +-------------------------------------+--------------+-------+
>>>>>  |                   MEM               |   IO (regs)  |  MEM  |
>>>>>  +-------------------------------------+--------------+-------+
>>>>>
>>>>> Currently, the sparse memory section is 1GB.
>>>>>
>>>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>>>
>>>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>>>
>>>>
>>>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>>>> for memory regions that are actually populated.
>>>>
>>>> For instance, on the Foundation model with 4 GB of memory, you may see
>>>> something like this in the boot log
>>>>
>>>> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>>>> (     8 GB maximum)
>>>> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
>>>> (   544 MB actual)
>>>>
>>>> but in reality, only the following regions have been allocated
>>>>
>>>> ---[ vmemmap start ]---
>>>> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>>>>       BLK UXN MEM/NORMAL
>>>> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>>>>       BLK UXN MEM/NORMAL
>>>> ---[ vmemmap end ]---
>>>>
>>>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>>>> minimal. Moving to flatmem will not reduce the memory footprint at
>>>> all.
>>>
>>> Yes,but the populate is section, which is 1GB. Take a look at the above
>>> memory layout.
>>>
>>> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>>>
>>> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>>>  0             1.5G    2G             3.5G            4G
>>>>>  |              |      |               |              |
>>>>>  +--------------+------+---------------+--------------+
>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>  +--------------+------+---------------+--------------+
>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>>
>>
>> No, it is not. It may be covered by a section, but that does not mean
>> sparsemem vmemmap will actually allocate backing for it. The
>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
>> due to the fact that the backing is performed at PMD granularity.
>>
>> Please, could you share the contents of the vmemmap section in
>> /sys/kernel/debug/kernel_page_tables of your system running with
>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>>
>
> Please see the pg-tables below.
>
>
> With sparse and vmemmap enable.
>
> ---[ vmemmap start ]---
> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> ---[ vmemmap end ]---
>

OK, I see what you mean now. Sorry for taking so long to catch up.

> The board is 4GB, and the memap is 70MB
> 1G memory --- 14MB mem_map array.

No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
pages assuming sizeof(struct page) == 64

So you are losing 6 MB to rounding here, which I agree is significant.
I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
on 4k pages kernels, but perhaps we're better off asking the opinion
of the other cc'ees.

Thanks,
Ard.
Will Deacon April 11, 2016, 10:40 a.m. UTC | #9
On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> > Please see the pg-tables below.
> >
> >
> > With sparse and vmemmap enable.
> >
> > ---[ vmemmap start ]---
> > 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> > ---[ vmemmap end ]---
> >
> 
> OK, I see what you mean now. Sorry for taking so long to catch up.
> 
> > The board is 4GB, and the memap is 70MB
> > 1G memory --- 14MB mem_map array.
> 
> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
> pages assuming sizeof(struct page) == 64
> 
> So you are losing 6 MB to rounding here, which I agree is significant.
> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> on 4k pages kernels, but perhaps we're better off asking the opinion
> of the other cc'ees.

You need to be really careful making SECTION_SIZE_BITS smaller because
it has a direct impact on the use of page->flags and you can end up
running out of bits fairly easily.

Will
Chen Feng April 11, 2016, 10:48 a.m. UTC | #10
On 2016/4/11 17:59, Chen Feng wrote:
> Hi Ard,
> 
> On 2016/4/11 16:00, Ard Biesheuvel wrote:
>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
>>> Hi Ard,
>>>
>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>>> Hi will,
>>>>> Thanks for review.
>>>>>
>>>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>>>> We can reduce the memory allocated at mem-map
>>>>>>> by flatmem.
>>>>>>>
>>>>>>> currently, the default memory-model in arm64 is
>>>>>>> sparse memory. The mem-map array is not freed in
>>>>>>> this scene. If the physical address is too long,
>>>>>>> it will reserved too much memory for the mem-map
>>>>>>> array.
>>>>>>
>>>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>>>> probably unsuitable for the majority of arm64 machines).
>>>>>>
>>>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>>>> is not freed.
>>>>>
>>>>> take a look at here:
>>>>> arm64/mm/init.c
>>>>> void __init mem_init(void)
>>>>> {
>>>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>>>         free_unused_memmap();
>>>>> #endif
>>>>> }
>>>>>
>>>>> Memory layout (3GB)
>>>>>
>>>>>  0             1.5G    2G             3.5G            4G
>>>>>  |              |      |               |              |
>>>>>  +--------------+------+---------------+--------------+
>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>  +--------------+------+---------------+--------------+
>>>>>
>>>>>
>>>>> Memory layout (4GB)
>>>>>
>>>>>  0                                    3.5G            4G    4.5G
>>>>>  |                                     |              |       |
>>>>>  +-------------------------------------+--------------+-------+
>>>>>  |                   MEM               |   IO (regs)  |  MEM  |
>>>>>  +-------------------------------------+--------------+-------+
>>>>>
>>>>> Currently, the sparse memory section is 1GB.
>>>>>
>>>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>>>
>>>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>>>
>>>>
>>>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>>>> for memory regions that are actually populated.
>>>>
>>>> For instance, on the Foundation model with 4 GB of memory, you may see
>>>> something like this in the boot log
>>>>
>>>> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>>>> (     8 GB maximum)
>>>> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
>>>> (   544 MB actual)
>>>>
>>>> but in reality, only the following regions have been allocated
>>>>
>>>> ---[ vmemmap start ]---
>>>> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>>>>       BLK UXN MEM/NORMAL
>>>> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>>>>       BLK UXN MEM/NORMAL
>>>> ---[ vmemmap end ]---
>>>>
>>>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>>>> minimal. Moving to flatmem will not reduce the memory footprint at
>>>> all.
>>>
>>> Yes,but the populate is section, which is 1GB. Take a look at the above
>>> memory layout.
>>>
>>> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>>>
>>> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>>>  0             1.5G    2G             3.5G            4G
>>>>>  |              |      |               |              |
>>>>>  +--------------+------+---------------+--------------+
>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>  +--------------+------+---------------+--------------+
>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>>
>>
>> No, it is not. It may be covered by a section, but that does not mean
>> sparsemem vmemmap will actually allocate backing for it. The
>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
>> due to the fact that the backing is performed at PMD granularity.
>>
>> Please, could you share the contents of the vmemmap section in
>> /sys/kernel/debug/kernel_page_tables of your system running with
>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>>
> 
> Please see the pg-tables below.
> 
> 
> With sparse and vmemmap enable.
> 
> ---[ vmemmap start ]---
> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> ---[ vmemmap end ]---
> 
> 
> The board is 4GB, and the memap is 70MB
> 1G memory --- 14MB mem_map array.
> So the 4GB has 5 sections, which used 5 * 14MB memory.
> 
>
Sorry, 1G of memory needs 16MB:
5 sections is 5 * 16 = 80MB

1G / 4K * sizeof(struct page) (64B) = 16MB

I don't know why the vmemmap dump in the pg-tables says 70MB.

I added hack code in vmemmap_populate and sparse_mem_map_populate.

here is the log:
sparse_mem_map_populate 188 start ffffffbdc0000000 end ffffffbdc1000000 PAGES_PER_SECTION 40000 nid 0
vmemmap_populate 549 size 200000 total 200000 addr ffffffbdc0000000
vmemmap_populate 549 size 200000 total 400000 addr ffffffbdc0200000
vmemmap_populate 549 size 200000 total 600000 addr ffffffbdc0400000
vmemmap_populate 549 size 200000 total 800000 addr ffffffbdc0600000
vmemmap_populate 549 size 200000 total a00000 addr ffffffbdc0800000
vmemmap_populate 549 size 200000 total c00000 addr ffffffbdc0a00000
vmemmap_populate 549 size 200000 total e00000 addr ffffffbdc0c00000
vmemmap_populate 549 size 200000 total 1000000 addr ffffffbdc0e00000
sparse_mem_map_populate 188 start ffffffbdc1000000 end ffffffbdc2000000 PAGES_PER_SECTION 40000 nid 0
...
sparse_mem_map_populate 188 start ffffffbdc2000000 end ffffffbdc3000000 PAGES_PER_SECTION 40000 nid 0
sparse_mem_map_populate 188 start ffffffbdc3000000 end ffffffbdc4000000 PAGES_PER_SECTION 40000 nid 0
sparse_mem_map_populate 188 start ffffffbdc4000000 end ffffffbdc5000000 PAGES_PER_SECTION 40000 nid 0


With 4GB of memory, it allocated 2MB * 8 * 5 = 80MB.
>  0                                    3.5G            4G    4.5G
>  |                                     |              |       |
>  +-------------------------------------+--------------+-------+
>  |                   MEM               |   IO (regs)  |  MEM  |
>  +-------------------------------------+--------------+-------+

4GB of memory, 5 sections: 80MB of mem_map allocated.
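
That is consistent with the per-section arithmetic:

  1 section  = 8 PMD blocks * 2MB = 16MB of vmemmap
  5 sections = 5 * 16MB           = 80MB

even though 4GB of RAM only needs 4G / 4K * 64B = 64MB of struct pages.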

> 
> 
> 
> 
>> .
>>
Chen Feng April 11, 2016, 10:57 a.m. UTC | #11
Hi Will,

On 2016/4/11 18:40, Will Deacon wrote:
> On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
>> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
>>> Please see the pg-tables below.
>>>
>>>
>>> With sparse and vmemmap enable.
>>>
>>> ---[ vmemmap start ]---
>>> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
>>> ---[ vmemmap end ]---
>>>
>>
>> OK, I see what you mean now. Sorry for taking so long to catch up.
>>
>>> The board is 4GB, and the memap is 70MB
>>> 1G memory --- 14MB mem_map array.
>>
>> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
>> pages assuming sizeof(struct page) == 64
>>
>> So you are losing 6 MB to rounding here, which I agree is significant.
>> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
>> on 4k pages kernels, but perhaps we're better off asking the opinion
>> of the other cc'ees.
> 
> You need to be really careful making SECTION_SIZE_BITS smaller because
> it has a direct correlation on the use of page->flags and you can end up
> running out of bits fairly easily.

Yes, making SECTION_SIZE_BITS smaller can solve the current situation.

But consider the case where the phys-addr space is 64GB while only 4GB of
ddr is valid address space, and the holes are not always 512MB.

Also, can you tell us why a *smaller SECTION_SIZE makes running out of
bits fairly easy*?

And how about the flat-mem model?

> 
> Will
> 
> .
>
Ard Biesheuvel April 11, 2016, 11:02 a.m. UTC | #12
On 11 April 2016 at 12:48, Chen Feng <puck.chen@hisilicon.com> wrote:
>
>
> On 2016/4/11 17:59, Chen Feng wrote:
>> Hi Ard,
>>
>> On 2016/4/11 16:00, Ard Biesheuvel wrote:
>>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>> Hi Ard,
>>>>
>>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>>>> Hi will,
>>>>>> Thanks for review.
>>>>>>
>>>>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>>>>> We can reduce the memory allocated at mem-map
>>>>>>>> by flatmem.
>>>>>>>>
>>>>>>>> currently, the default memory-model in arm64 is
>>>>>>>> sparse memory. The mem-map array is not freed in
>>>>>>>> this scene. If the physical address is too long,
>>>>>>>> it will reserved too much memory for the mem-map
>>>>>>>> array.
>>>>>>>
>>>>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>>>>> probably unsuitable for the majority of arm64 machines).
>>>>>>>
>>>>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>>>>> is not freed.
>>>>>>
>>>>>> take a look at here:
>>>>>> arm64/mm/init.c
>>>>>> void __init mem_init(void)
>>>>>> {
>>>>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>>>>         free_unused_memmap();
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> Memory layout (3GB)
>>>>>>
>>>>>>  0             1.5G    2G             3.5G            4G
>>>>>>  |              |      |               |              |
>>>>>>  +--------------+------+---------------+--------------+
>>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>>  +--------------+------+---------------+--------------+
>>>>>>
>>>>>>
>>>>>> Memory layout (4GB)
>>>>>>
>>>>>>  0                                    3.5G            4G    4.5G
>>>>>>  |                                     |              |       |
>>>>>>  +-------------------------------------+--------------+-------+
>>>>>>  |                   MEM               |   IO (regs)  |  MEM  |
>>>>>>  +-------------------------------------+--------------+-------+
>>>>>>
>>>>>> Currently, the sparse memory section is 1GB.
>>>>>>
>>>>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>>>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>>>>
>>>>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>>>>
>>>>>
>>>>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>>>>> for memory regions that are actually populated.
>>>>>
>>>>> For instance, on the Foundation model with 4 GB of memory, you may see
>>>>> something like this in the boot log
>>>>>
>>>>> [    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>>>>> (     8 GB maximum)
>>>>> [    0.000000]               0xffffffbdc0000000 - 0xffffffbde2000000
>>>>> (   544 MB actual)
>>>>>
>>>>> but in reality, only the following regions have been allocated
>>>>>
>>>>> ---[ vmemmap start ]---
>>>>> 0xffffffbdc0000000-0xffffffbdc2000000          32M       RW NX SHD AF
>>>>>       BLK UXN MEM/NORMAL
>>>>> 0xffffffbde0000000-0xffffffbde2000000          32M       RW NX SHD AF
>>>>>       BLK UXN MEM/NORMAL
>>>>> ---[ vmemmap end ]---
>>>>>
>>>>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>>>>> minimal. Moving to flatmem will not reduce the memory footprint at
>>>>> all.
>>>>
>>>> Yes,but the populate is section, which is 1GB. Take a look at the above
>>>> memory layout.
>>>>
>>>> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>>>>
>>>> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>>>>  0             1.5G    2G             3.5G            4G
>>>>>>  |              |      |               |              |
>>>>>>  +--------------+------+---------------+--------------+
>>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>>  +--------------+------+---------------+--------------+
>>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>>>
>>>
>>> No, it is not. It may be covered by a section, but that does not mean
>>> sparsemem vmemmap will actually allocate backing for it. The
>>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
>>> due to the fact that the backing is performed at PMD granularity.
>>>
>>> Please, could you share the contents of the vmemmap section in
>>> /sys/kernel/debug/kernel_page_tables of your system running with
>>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>>>
>>
>> Please see the pg-tables below.
>>
>>
>> With sparse and vmemmap enable.
>>
>> ---[ vmemmap start ]---
>> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
>> ---[ vmemmap end ]---
>>
>>
>> The board is 4GB, and the memap is 70MB
>> 1G memory --- 14MB mem_map array.
>> So the 4GB has 5 sections, which used 5 * 14MB memory.
>>
>>
> Sorry, 1G memory is 16GB
> 5 sections is 5 * 16 = 80MB
>
> 1G / 4K * (struct page) 64B = 16MB
>
> I don't know why the vmemap dump in pg-tables is 70MB.
>

It may be the PTDUMP code that emits the vmemmap start marker
incorrectly. Could you please double check?

> I add hack code in vmemmap_populate sparse_mem_map_populate.
>
> here is the log:
> sparse_mem_map_populate 188 start ffffffbdc0000000 end ffffffbdc1000000 PAGES_PER_SECTION 40000 nid 0
> vmemmap_populate 549 size 200000 total 200000 addr ffffffbdc0000000
> vmemmap_populate 549 size 200000 total 400000 addr ffffffbdc0200000
> vmemmap_populate 549 size 200000 total 600000 addr ffffffbdc0400000
> vmemmap_populate 549 size 200000 total 800000 addr ffffffbdc0600000
> vmemmap_populate 549 size 200000 total a00000 addr ffffffbdc0800000
> vmemmap_populate 549 size 200000 total c00000 addr ffffffbdc0a00000
> vmemmap_populate 549 size 200000 total e00000 addr ffffffbdc0c00000
> vmemmap_populate 549 size 200000 total 1000000 addr ffffffbdc0e00000
> sparse_mem_map_populate 188 start ffffffbdc1000000 end ffffffbdc2000000 PAGES_PER_SECTION 40000 nid 0
> ...
> sparse_mem_map_populate 188 start ffffffbdc2000000 end ffffffbdc3000000 PAGES_PER_SECTION 40000 nid 0
> sparse_mem_map_populate 188 start ffffffbdc3000000 end ffffffbdc4000000 PAGES_PER_SECTION 40000 nid 0
> sparse_mem_map_populate 188 start ffffffbdc4000000 end ffffffbdc5000000 PAGES_PER_SECTION 40000 nid 0
>
>
> With 4GB memory, it allocated 2MB *  8  * 5 = 80MB.
>>  0                                    3.5G            4G    4.5G
>>  |                                     |              |       |
>>  +-------------------------------------+--------------+-------+
>>  |                   MEM               |   IO (regs)  |  MEM  |
>>  +-------------------------------------+--------------+-------+
>
> 4GB memory ,5 sections. 80MB mem_map allocated.
>

I suppose using

#define SECTION_SIZE_BITS      29

in arch/arm64/include/asm/sparsemem.h would get rid of the overhead
completely in this particular case. Could you confirm, please?
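
With 29 bits (512MB sections), every bank boundary in the layout above
becomes section-aligned, so no section straddles a hole:

  0 ~ 3.5G  -> 7 fully populated 512MB sections
  4G ~ 4.5G -> 1 fully populated 512MB section
  8 sections * 8MB of memmap each = 64MB

which is the minimum for 4GB of RAM, i.e. the 16MB of rounding waste
disappears (8MB per section assumes 4K pages and a 64-byte struct page).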

@Will: is the rationale for the default value of 30 for
SECTION_SIZE_BITS documented anywhere? Compared to other
architectures, it seems on the high side, but I did notice that 64k
granule kernels require at least 28 in order not to trigger the
following assertion

include/linux/mmzone.h:1029:2: error: #error Allocator MAX_ORDER
exceeds SECTION_SIZE
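
For reference, the check being triggered lives in
include/linux/mmzone.h and, in kernels of this vintage, reads:

#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif

i.e. a section must be at least as large as the largest buddy
allocation, which is what pushes 64k-granule kernels towards a larger
SECTION_SIZE_BITS.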
Laura Abbott April 11, 2016, 6:11 p.m. UTC | #13
On 04/11/2016 03:57 AM, Chen Feng wrote:
> Hi Will,
>
> On 2016/4/11 18:40, Will Deacon wrote:
>> On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
>>> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>> Please see the pg-tables below.
>>>>
>>>>
>>>> With sparse and vmemmap enable.
>>>>
>>>> ---[ vmemmap start ]---
>>>> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
>>>> ---[ vmemmap end ]---
>>>>
>>>
>>> OK, I see what you mean now. Sorry for taking so long to catch up.
>>>
>>>> The board is 4GB, and the memap is 70MB
>>>> 1G memory --- 14MB mem_map array.
>>>
>>> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
>>> pages assuming sizeof(struct page) == 64
>>>
>>> So you are losing 6 MB to rounding here, which I agree is significant.
>>> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
>>> on 4k pages kernels, but perhaps we're better off asking the opinion
>>> of the other cc'ees.
>>
>> You need to be really careful making SECTION_SIZE_BITS smaller because
>> it has a direct correlation on the use of page->flags and you can end up
>> running out of bits fairly easily.
>
> Yes, making SECTION_SIZE_BITS smaller can solve the current situation.
>
> But if the phys-addr is 64GB, but only 4GB ddr is the valid address. And the
>
> holes are not always 512MB.
>
> But, can you tell us why *smaller SIZE makes running out of bits fairly easily*?
>

Think about page tables and TLB pressure. A larger page size can cover the
same memory area with fewer page table entries. The same type of logic applies
to memory sections here as well. If the section size is smaller, you need
more bits to represent the number of sections used, and page->flags is a
single unsigned long.

In include/linux/mm.h

/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */

and

#if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
#error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
#endif

So it's a trade off of what can be encoded in an unsigned long.

We're hitting the upper bound on zones as well (see 033fbae988fc 'mm:
ZONE_DEVICE for "device memory"')
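
Concretely, the section field width is derived in the generic headers
as

#define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

so on arm64 with a 48-bit PA, SECTION_SIZE_BITS == 30 costs 48 - 30 = 18
bits of page->flags in the !vmemmap case, and each halving of the
section size costs one more bit.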


> And how about the flat-mem model?
>
>>
>> Will
>>
>> .
>>
>
Jungseok Lee April 12, 2016, 2:03 p.m. UTC | #14
On Apr 11, 2016, at 11:49 AM, Chen Feng wrote:

Dear Chen,

> Hi will,
> Thanks for review.
> 
> On 2016/4/7 22:21, Will Deacon wrote:
>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>> We can reduce the memory allocated at mem-map
>>> by flatmem.
>>> 
>>> currently, the default memory-model in arm64 is
>>> sparse memory. The mem-map array is not freed in
>>> this scene. If the physical address is too long,
>>> it will reserved too much memory for the mem-map
>>> array.
>> 
>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>> spaces between memory banks only burns up virtual space. What exactly is
>> the problem you're seeing that makes you want to use flatmem (which is
>> probably unsuitable for the majority of arm64 machines).
>> 
> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
> is not freed.
> 
> take a look at here:
> arm64/mm/init.c
> void __init mem_init(void)
> {
> #ifndef CONFIG_SPARSEMEM_VMEMMAP
> 	free_unused_memmap();
> #endif
> }
> 
> Memory layout (3GB)
> 
> 0             1.5G    2G             3.5G            4G
> |              |      |               |              |
> +--------------+------+---------------+--------------+
> |    MEM       | hole |     MEM       |   IO (regs)  |
> +--------------+------+---------------+--------------+
> 
> 
> Memory layout (4GB)
> 
> 0                                    3.5G            4G    4.5G
> |                                     |              |       |
> +-------------------------------------+--------------+-------+
> |                   MEM               |   IO (regs)  |  MEM  |
> +-------------------------------------+--------------+-------+
> 
> Currently, the sparse memory section is 1GB.
> 
> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
> 
> This will alloc 1G/4K * (struct page) memory for mem_map array.
> 
> We want to use flat-mem to reduce the alloced mem_map.
> 
> I don't know why you tell us the flatmem is unsuitable for the
> majority of arm64 machines. Can tell us the reason of it?
> 
> And we are not going to limit the memdel in arm64, we just want to
> make the flat-mem is an optional item in arm64.

I've experienced the same problem and considered the ideas mentioned
in this thread: flatmem and a small SECTION_SIZE_BITS. However, I was
reluctant to post any patch since the issue is highly related to the
memory map design document, [1], which recommends 1GB-aligned RAM. The
majority of arm64 platforms might follow that guidance although it is not
a spec. IOW, the machine I played with was at least unusual *at that
time*, so I didn't consider upstream work.

[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf  

Best Regards
Jungseok Lee
Catalin Marinas April 12, 2016, 2:44 p.m. UTC | #15
On Mon, Apr 11, 2016 at 11:40:13AM +0100, Will Deacon wrote:
> On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> > On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> > > Please see the pg-tables below.
> > >
> > >
> > > With sparse and vmemmap enable.
> > >
> > > ---[ vmemmap start ]---
> > > 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> > > ---[ vmemmap end ]---
> > >
> > 
> > OK, I see what you mean now. Sorry for taking so long to catch up.
> > 
> > > The board is 4GB, and the memap is 70MB
> > > 1G memory --- 14MB mem_map array.
> > 
> > No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
> > pages assuming sizeof(struct page) == 64
> > 
> > So you are losing 6 MB to rounding here, which I agree is significant.
> > I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> > on 4k pages kernels, but perhaps we're better off asking the opinion
> > of the other cc'ees.
> 
> You need to be really careful making SECTION_SIZE_BITS smaller because
> it has a direct correlation on the use of page->flags and you can end up
> running out of bits fairly easily.

With SPARSEMEM_VMEMMAP, SECTION_SIZE_BITS no longer affects the page
flags, since we no longer need to encode the section number in
page->flags.
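
The generic code makes this explicit; include/linux/page-flags-layout.h
has, roughly:

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTIONS_WIDTH		SECTIONS_SHIFT
#else
#define SECTIONS_WIDTH		0
#endif

so with vmemmap enabled the section field in page->flags is zero bits
wide.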
Catalin Marinas April 12, 2016, 2:59 p.m. UTC | #16
On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> > On 2016/4/11 16:00, Ard Biesheuvel wrote:
> >> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
> >>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>>>>  0             1.5G    2G             3.5G            4G
> >>>>>  |              |      |               |              |
> >>>>>  +--------------+------+---------------+--------------+
> >>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
> >>>>>  +--------------+------+---------------+--------------+
> >>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
> >>>
> >>
> >> No, it is not. It may be covered by a section, but that does not mean
> >> sparsemem vmemmap will actually allocate backing for it. The
> >> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
> >> due to the fact that the backing is performed at PMD granularity.
> >>
> >> Please, could you share the contents of the vmemmap section in
> >> /sys/kernel/debug/kernel_page_tables of your system running with
> >> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
> >
> > Please see the pg-tables below.
> >
> > With sparse and vmemmap enable.
> >
> > ---[ vmemmap start ]---
> > 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> > ---[ vmemmap end ]---
[...]
> > The board is 4GB, and the memap is 70MB
> > 1G memory --- 14MB mem_map array.
> 
> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
> pages assuming sizeof(struct page) == 64
> 
> So you are losing 6 MB to rounding here, which I agree is significant.
> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> on 4k pages kernels, but perhaps we're better off asking the opinion
> of the other cc'ees.

IIRC, SECTION_SIZE_BITS was chosen to be the maximum sane value we were
thinking of at the time, assuming 1GB RAM alignment to be fairly
normal. For the !SPARSEMEM_VMEMMAP case, we should probably be fine with
29 but, as Will said, we need to be careful with the page flags. At a
quick look, we have 25 page flags, 2 bits per zone, NUMA nodes and (48 -
section_size_bits) for the section width. We also need to take into
account 4 more bits for 52-bit PA support (ARMv8.2). So, without NUMA
nodes, we are currently at 49 bits used in page->flags.
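
Tallying that up for SECTION_SIZE_BITS == 30 and a 48-bit PA:

  25 page flags
   2 zone bits
  18 section bits (48 - 30, !SPARSEMEM_VMEMMAP only)
   4 bits of headroom for 52-bit PA (ARMv8.2)
  --
  49 of the 64 bits in page->flags, before any NUMA node bits.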

For the SPARSEMEM_VMEMMAP case, we can decrease SECTION_SIZE_BITS within
the MAX_ORDER limit.

An alternative would be to free the vmemmap holes later (but still keep
the vmemmap mapping alias). Yet another option would be to change the
sparse_mem_map_populate() logic to get the actual section end rather than
always assuming PAGES_PER_SECTION. But I don't think any of these are
worth it if we can safely reduce SECTION_SIZE_BITS.
Chen Feng April 20, 2016, 3:18 a.m. UTC | #17
Hi Catalin,

Thanks for your reply.
On 2016/4/12 22:59, Catalin Marinas wrote:
> On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
>> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
>>> On 2016/4/11 16:00, Ard Biesheuvel wrote:
>>>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
>>>>>>>  0             1.5G    2G             3.5G            4G
>>>>>>>  |              |      |               |              |
>>>>>>>  +--------------+------+---------------+--------------+
>>>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
>>>>>>>  +--------------+------+---------------+--------------+
>>>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>>>>
>>>>
>>>> No, it is not. It may be covered by a section, but that does not mean
>>>> sparsemem vmemmap will actually allocate backing for it. The
>>>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
>>>> due to the fact that the backing is performed at PMD granularity.
>>>>
>>>> Please, could you share the contents of the vmemmap section in
>>>> /sys/kernel/debug/kernel_page_tables of your system running with
>>>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>>>
>>> Please see the pg-tables below.
>>>
>>> With sparse and vmemmap enable.
>>>
>>> ---[ vmemmap start ]---
>>> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
>>> ---[ vmemmap end ]---
> [...]
>>> The board is 4GB, and the memap is 70MB
>>> 1G memory --- 14MB mem_map array.
>>
>> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
>> pages assuming sizeof(struct page) == 64
>>
>> So you are losing 6 MB to rounding here, which I agree is significant.
>> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
>> on 4k pages kernels, but perhaps we're better off asking the opinion
>> of the other cc'ees.
> 
> IIRC, SECTION_SIZE_BITS was chosen to be the maximum sane value we were
> thinking of at the time, assuming that 1GB RAM alignment to be fairly
> normal. For the !SPARSEMEM_VMEMMAP case, we should probably be fine with
> 29 but, as Will said, we need to be careful with the page flags. At a
> quick look, we have 25 page flags, 2 bits per zone, NUMA nodes and (48 -
> section_size_bits) for the section width. We also need to take into
> account 4 more bits for 52-bit PA support (ARMv8.2). So, without NUMA
> nodes, we are currently at 49 bits used in page->flags.
> 
> For the SPARSEMEM_VMEMMAP case, we can decrease the SECTION_SIZE_BITS in
> the MAX_ORDER limit.
> 
> An alternative would be to free the vmemmap holes later (but still keep
> the vmemmap mapping alias). Yet another option would be to change the
> sparse_mem_map_populate() logic get the actual section end rather than
> always assuming PAGES_PER_SECTION. But I don't think any of these are
> worth if we can safely reduce SECTION_SIZE_BITS.
> 
Yes,
currently it is safe to reduce SECTION_SIZE_BITS, and that matches this
issue very well.

As I mentioned before, if the memory layout is not like this scene, it
will not be suitable to reduce SECTION_SIZE_BITS.

We have 4GB of memory, and a 64GB phys address space.

There will be a lot of holes in the memory layout.
And the *hole sizes are not always the same*.

So that's the reason I want to enable flat-mem in the ARM64 arch. Why not
make flat-mem an optional setting for arm64?
Catalin Marinas April 20, 2016, 9:32 a.m. UTC | #18
Hi Chen,

On Wed, Apr 20, 2016 at 11:18:54AM +0800, Chen Feng wrote:
> Thanks for your reply.
> On 2016/4/12 22:59, Catalin Marinas wrote:
> > On Mon, Apr 11, 2016 at 12:31:53PM +0200, Ard Biesheuvel wrote:
> >> On 11 April 2016 at 11:59, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>> On 2016/4/11 16:00, Ard Biesheuvel wrote:
> >>>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
> >>>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@hisilicon.com> wrote:
> >>>>>>>  0             1.5G    2G             3.5G            4G
> >>>>>>>  |              |      |               |              |
> >>>>>>>  +--------------+------+---------------+--------------+
> >>>>>>>  |    MEM       | hole |     MEM       |   IO (regs)  |
> >>>>>>>  +--------------+------+---------------+--------------+
> >>>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
> >>>>
> >>>> No, it is not. It may be covered by a section, but that does not mean
> >>>> sparsemem vmemmap will actually allocate backing for it. The
> >>>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
> >>>> due to the fact that the backing is performed at PMD granularity.
> >>>>
> >>>> Please, could you share the contents of the vmemmap section in
> >>>> /sys/kernel/debug/kernel_page_tables of your system running with
> >>>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
> >>>
> >>> Please see the pg-tables below.
> >>>
> >>> With sparse and vmemmap enable.
> >>>
> >>> ---[ vmemmap start ]---
> >>> 0xffffffbdc0200000-0xffffffbdc4800000          70M     RW NX SHD AF    UXN MEM/NORMAL
> >>> ---[ vmemmap end ]---
> > [...]
> >>> The board is 4GB, and the memap is 70MB
> >>> 1G memory --- 14MB mem_map array.
> >>
> >> No, this is incorrect. 1 GB corresponds with 16 MB worth of struct
> >> pages assuming sizeof(struct page) == 64
> >>
> >> So you are losing 6 MB to rounding here, which I agree is significant.
> >> I wonder if it makes sense to use a lower value for SECTION_SIZE_BITS
> >> on 4k pages kernels, but perhaps we're better off asking the opinion
> >> of the other cc'ees.
> > 
> > IIRC, SECTION_SIZE_BITS was chosen to be the maximum sane value we were
> > thinking of at the time, assuming that 1GB RAM alignment to be fairly
> > normal. For the !SPARSEMEM_VMEMMAP case, we should probably be fine with
> > 29 but, as Will said, we need to be careful with the page flags. At a
> > quick look, we have 25 page flags, 2 bits per zone, NUMA nodes and (48 -
> > section_size_bits) for the section width. We also need to take into
> > account 4 more bits for 52-bit PA support (ARMv8.2). So, without NUMA
> > nodes, we are currently at 49 bits used in page->flags.
> > 
> > For the SPARSEMEM_VMEMMAP case, we can decrease the SECTION_SIZE_BITS in
> > the MAX_ORDER limit.
> > 
> > An alternative would be to free the vmemmap holes later (but still keep
> > the vmemmap mapping alias). Yet another option would be to change the
> > sparse_mem_map_populate() logic get the actual section end rather than
> > always assuming PAGES_PER_SECTION. But I don't think any of these are
> > worth if we can safely reduce SECTION_SIZE_BITS.
> 
> Yes,
> currently,it's safely to reduce the SECTION_SIZE_BITS to match this issue
> very well.
> 
> As I mentioned before, if the memory layout is not like this scene. There
> will be not suitable to reduce the SECTION_SIZE_BITS.

SECTION_SIZE_BITS is not meant to cover all possible combinations but
only sane ones and it was primarily targeted at the ARM memory map
recommendations:

http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf

As you have now reported a platform that uses half-GB-aligned RAM
blocks/sizes, I'm fine with changing SECTION_SIZE_BITS. If in the future
we see even more insane configurations and the memory wasted is
significant, we may have to revisit this (I also proposed an alternative
above, freeing the vmemmap holes, which is not too different from a flat
memmap array).

> We have 4G memory, and 64GB phys address.
> 
> There will be a lot of holes in the memory layout.
> And the *holes size are not always the same*.

It's not the hole size that matters but rather the section size and
alignment.

> So,it's the reason I want to enable flat-mem in ARM64-ARCH. Why not makes
> the flat-mem an optional setting for arm64?

Because (a) I strongly believe in a single Image, (b) I do not want to
increase the configuration space unnecessarily (it is already large
enough with all the page and VA size combinations) and (c) I don't see
any advantage in flatmem compared to sparsemem+vmemmap.

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4f43622..c18930d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -559,6 +559,9 @@  config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	select SPARSEMEM_VMEMMAP_ENABLE
 
+config ARCH_FLATMEM_ENABLE
+	def_bool y
+
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool ARCH_SPARSEMEM_ENABLE