
mm/sparse: Consistently do not zero memmap

Message ID 20191030131122.8256-1-vincent.whitchurch@axis.com (mailing list archive)
State New, archived

Commit Message

Vincent Whitchurch Oct. 30, 2019, 1:11 p.m. UTC
sparsemem without VMEMMAP has two allocation paths to allocate the
memory needed for its memmap (done in sparse_mem_map_populate()).

In one allocation path (sparse_buffer_alloc() succeeds), the memory is
not zeroed (since it was previously allocated with
memblock_alloc_try_nid_raw()).

In the other allocation path (sparse_buffer_alloc() fails and
sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
memory is zeroed.

AFAICS this difference does not appear to be on purpose.  If the code is
supposed to work with non-initialized memory (__init_single_page() takes
care of zeroing the struct pages which are actually used), we should
consistently not zero the memory, to avoid masking bugs.

(I noticed this because on my ARM64 platform, with 1 GiB of memory the
 first [and only] section is allocated from the zeroing path while with
 2 GiB of memory the first 1 GiB section is allocated from the
 non-zeroing path.)

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko Oct. 30, 2019, 1:29 p.m. UTC | #1
On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.

You are right that this is not intentional.

> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)

Do I get it right that sparse_buffer_init couldn't allocate memmap for
the full node for some reason and so sparse_init_nid would have to
allocate one for each memory section?

> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

Anyway the patch is OK. Even though this is not a bug strictly speaking,
it is certainly suboptimal behavior because zeroing takes time, so
I would flag this for the stable trees (4.19+). There is no clear Fixes tag
to apply (35fd1eb1e8212 would get closest I guess).

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>  					  PAGE_SIZE, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
> -- 
> 2.20.0
David Hildenbrand Oct. 30, 2019, 1:31 p.m. UTC | #2
On 30.10.19 14:11, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.

I agree

Acked-by: David Hildenbrand <david@redhat.com>

> 
> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>   first [and only] section is allocated from the zeroing path while with
>   2 GiB of memory the first 1 GiB section is allocated from the
>   non-zeroing path.)
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
> ---
>   mm/sparse.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>   	if (map)
>   		return map;
>   
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>   					  PAGE_SIZE, addr,
>   					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>   	if (!map)
>
Oscar Salvador Oct. 30, 2019, 1:38 p.m. UTC | #3
On Wed, Oct 30, 2019 at 02:11:22PM +0100, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.
> 
> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

Good catch

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>  					  PAGE_SIZE, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
> -- 
> 2.20.0
>
Vincent Whitchurch Oct. 30, 2019, 2:02 p.m. UTC | #4
On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> >  first [and only] section is allocated from the zeroing path while with
> >  2 GiB of memory the first 1 GiB section is allocated from the
> >  non-zeroing path.)
> 
> Do I get it right that sparse_buffer_init couldn't allocate memmap for
> the full node for some reason and so sparse_init_nid would have to
> allocate one for each memory section?

Not quite.  The sparsemap_buf is successfully allocated with the correct
size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
allocate the same size from it.

The reason it fails is that sparse_buffer_alloc() for some reason wants
to return a pointer which is aligned to the allocation size.  But the
sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
enough space to align it.

I don't understand the reason for this alignment requirement since the
fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
Michal Hocko Oct. 30, 2019, 2:12 p.m. UTC | #5
[Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
 but it used your old email address]

On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > >  first [and only] section is allocated from the zeroing path while with
> > >  2 GiB of memory the first 1 GiB section is allocated from the
> > >  non-zeroing path.)
> > 
> > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > the full node for some reason and so sparse_init_nid would have to
> > allocate one for each memory section?
> 
> Not quite.  The sparsemap_buf is successfully allocated with the correct
> size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> allocate the same size from it.
> 
> The reason it fails is that sparse_buffer_alloc() for some reason wants
> to return a pointer which is aligned to the allocation size.  But the
> sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> enough space to align it.
> 
> I don't understand the reason for this alignment requirement since the
> fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?

I am not 100% sure TBH. Aligning makes some sense when mapping the
memmaps to page tables but that would suggest that sparse_buffer_init
is using a wrong alignment then. It is quite wasteful to allocate
a large misaligned block like that.

Your patch still makes sense but this is something to look into.

Pavel?
Pasha Tatashin Oct. 30, 2019, 3:20 p.m. UTC | #6
On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> [Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
>  but it used your old email address]
>
> On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> > On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > > >  first [and only] section is allocated from the zeroing path while with
> > > >  2 GiB of memory the first 1 GiB section is allocated from the
> > > >  non-zeroing path.)
> > >
> > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > the full node for some reason and so sparse_init_nid would have to
> > > allocate one for each memory section?
> >
> > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > allocate the same size from it.
> >
> > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > to return a pointer which is aligned to the allocation size.  But the
> > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > enough space to align it.
> >
> > I don't understand the reason for this alignment requirement since the
> > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
>
> I am not 100% sure TBH. Aligning makes some sense when mapping the
> memmaps to page tables but that would suggest that sparse_buffer_init
> is using a wrong alignment then. It is quite wasteful to allocate
> a large misaligned block like that.
>
> Your patch still makes sense but this is something to look into.
>
> Pavel?

I remember thinking about this large alignment, as it looked out of
place to me also.
It was there to keep memmap in single chunks on larger x86 machines.
Perhaps it can be revisited now.

The patch that introduced this alignment:
9bdac914240759457175ac0d6529a37d2820bc4d

vmemmap_alloc_block_buf
+       ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);

Pasha
Pasha Tatashin Oct. 30, 2019, 3:21 p.m. UTC | #7
> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)
>
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

LGTM, Thank you!
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Michal Hocko Oct. 30, 2019, 3:31 p.m. UTC | #8
On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
> >  but it used your old email address]
> >
> > On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> > > On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > > > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > > > >  first [and only] section is allocated from the zeroing path while with
> > > > >  2 GiB of memory the first 1 GiB section is allocated from the
> > > > >  non-zeroing path.)
> > > >
> > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > the full node for some reason and so sparse_init_nid would have to
> > > > allocate one for each memory section?
> > >
> > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > allocate the same size from it.
> > >
> > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > to return a pointer which is aligned to the allocation size.  But the
> > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > enough space to align it.
> > >
> > > I don't understand the reason for this alignment requirement since the
> > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
> >
> > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > memmaps to page tables but that would suggest that sparse_buffer_init
> > is using a wrong alignment then. It is quite wasteful to allocate
> > a large misaligned block like that.
> >
> > Your patch still makes sense but this is something to look into.
> >
> > Pavel?
> 
> I remember thinking about this large alignment, as it looked out of
> place to me also.
> It was there to keep memmap in single chunks on larger x86 machines.
> Perhaps it can be revisited now.

Don't we need 2MB aligned memmaps for their PMD mappings?

> The patch that introduced this alignment:
> 9bdac914240759457175ac0d6529a37d2820bc4d
> 
> vmemmap_alloc_block_buf
> +       ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
> 
> Pasha
Pasha Tatashin Oct. 30, 2019, 4:53 p.m. UTC | #9
On Wed, Oct 30, 2019 at 11:31 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> > On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > [Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
> > >  but it used your old email address]
> > >
> > > On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> > > > On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > > > > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > > > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > > > > >  first [and only] section is allocated from the zeroing path while with
> > > > > >  2 GiB of memory the first 1 GiB section is allocated from the
> > > > > >  non-zeroing path.)
> > > > >
> > > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > > the full node for some reason and so sparse_init_nid would have to
> > > > > allocate one for each memory section?
> > > >
> > > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > > allocate the same size from it.
> > > >
> > > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > > to return a pointer which is aligned to the allocation size.  But the
> > > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > > enough space to align it.
> > > >
> > > > I don't understand the reason for this alignment requirement since the
> > > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > > alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
> > >
> > > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > > memmaps to page tables but that would suggest that sparse_buffer_init
> > > is using a wrong alignment then. It is quite wasteful to allocate
> > > a large misaligned block like that.
> > >
> > > Your patch still makes sense but this is something to look into.
> > >
> > > Pavel?
> >
> > I remember thinking about this large alignment, as it looked out of
> > place to me also.
> > It was there to keep memmap in single chunks on larger x86 machines.
> > Perhaps it can be revisited now.
>
> Don't we need 2MB aligned memmaps for their PMD mappings?

Yes, PMD_SIZE should be the alignment here. It just does not make
sense to align to size.

Pasha
Michal Hocko Oct. 30, 2019, 5:31 p.m. UTC | #10
On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 11:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> > > On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > [Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
> > > >  but it used your old email address]
> > > >
> > > > On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> > > > > On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > > > > > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > > > > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > > > > > >  first [and only] section is allocated from the zeroing path while with
> > > > > > >  2 GiB of memory the first 1 GiB section is allocated from the
> > > > > > >  non-zeroing path.)
> > > > > >
> > > > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > > > the full node for some reason and so sparse_init_nid would have to
> > > > > > allocate one for each memory section?
> > > > >
> > > > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > > > allocate the same size from it.
> > > > >
> > > > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > > > to return a pointer which is aligned to the allocation size.  But the
> > > > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > > > enough space to align it.
> > > > >
> > > > > I don't understand the reason for this alignment requirement since the
> > > > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > > > alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
> > > >
> > > > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > > > memmaps to page tables but that would suggest that sparse_buffer_init
> > > > is using a wrong alignment then. It is quite wasteful to allocate
> > > > a large misaligned block like that.
> > > >
> > > > Your patch still makes sense but this is something to look into.
> > > >
> > > > Pavel?
> > >
> > > I remember thinking about this large alignment, as it looked out of
> > > place to me also.
> > > It was there to keep memmap in single chunks on larger x86 machines.
> > > Perhaps it can be revisited now.
> >
> > Don't we need 2MB aligned memmaps for their PMD mappings?
> 
> Yes, PMD_SIZE should be the alignment here. It just does not make
> sense to align to size.

What about this? It still aligns to the size, but now the alignment is
correctly done at the section size level.

diff --git a/mm/sparse.c b/mm/sparse.c
index 72f010d9bff5..ab1e6175ac9a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
 	if (map)
 		return map;
 
-	map = memblock_alloc_try_nid(size,
-					  PAGE_SIZE, addr,
+	map = memblock_alloc_try_nid(size, size, addr,
 					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	if (!map)
 		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
@@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 {
 	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
 	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
+	/*
+	 * Pre-allocated buffer is mainly used by __populate_section_memmap
+	 * and we want it to be properly aligned to the section size - this is
+	 * especially the case for VMEMMAP which maps memmap to PMDs
+	 */
 	sparsemap_buf =
-		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
+		memblock_alloc_try_nid_raw(size, section_map_size(),
 						addr,
 						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	sparsemap_buf_end = sparsemap_buf + size;
Pasha Tatashin Oct. 30, 2019, 5:53 p.m. UTC | #11
On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> > On Wed, Oct 30, 2019 at 11:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> > > > On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > [Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
> > > > >  but it used your old email address]
> > > > >
> > > > > On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> > > > > > On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > > > > > > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > > > > > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > > > > > > >  first [and only] section is allocated from the zeroing path while with
> > > > > > > >  2 GiB of memory the first 1 GiB section is allocated from the
> > > > > > > >  non-zeroing path.)
> > > > > > >
> > > > > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > > > > the full node for some reason and so sparse_init_nid would have to
> > > > > > > allocate one for each memory section?
> > > > > >
> > > > > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > > > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > > > > allocate the same size from it.
> > > > > >
> > > > > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > > > > to return a pointer which is aligned to the allocation size.  But the
> > > > > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > > > > enough space to align it.
> > > > > >
> > > > > > I don't understand the reason for this alignment requirement since the
> > > > > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > > > > alignment is for the VMEMMAP code which also uses sparse_buffer_alloc()?
> > > > >
> > > > > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > > > > memmaps to page tables but that would suggest that sparse_buffer_init
> > > > > is using a wrong alignment then. It is quite wasteful to allocate
> > > > > a large misaligned block like that.
> > > > >
> > > > > Your patch still makes sense but this is something to look into.
> > > > >
> > > > > Pavel?
> > > >
> > > > I remember thinking about this large alignment, as it looked out of
> > > > place to me also.
> > > > It was there to keep memmap in single chunks on larger x86 machines.
> > > > Perhaps it can be revisited now.
> > >
> > > Don't we need 2MB aligned memmaps for their PMD mappings?
> >
> > Yes, PMD_SIZE should be the alignment here. It just does not make
> > sense to align to size.
>
> What about this? It still aligns to the size but that should be
> correctly done to the section size level.
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 72f010d9bff5..ab1e6175ac9a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>         if (map)
>                 return map;
>
> -       map = memblock_alloc_try_nid(size,
> -                                         PAGE_SIZE, addr,
> +       map = memblock_alloc_try_nid(size, size, addr,
>                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>         if (!map)
>                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>  {
>         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
>         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> +       /*
> +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> +        * and we want it to be properly aligned to the section size - this is
> +        * especially the case for VMEMMAP which maps memmap to PMDs
> +        */
>         sparsemap_buf =
> -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> +               memblock_alloc_try_nid_raw(size, section_map_size(),
>                                                 addr,
>                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>         sparsemap_buf_end = sparsemap_buf + size;

This looks good. I think we should also change the alignment in the
fallback of vmemmap_alloc_block() to be section_map_size().

+++ b/mm/sparse-vmemmap.c
@@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
                        warned = true;
                }
                return NULL;
-       } else
-               return __earlyonly_bootmem_alloc(node, size, size,
+       } else {
+               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
                                __pa(MAX_DMA_ADDRESS));
+       }
 }

Pasha


>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Oct. 30, 2019, 6:01 p.m. UTC | #12
On Wed 30-10-19 13:53:52, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
[...]
> > > Yes, PMD_SIZE should be the alignment here. It just does not make
> > > sense to align to size.
> >
> > What about this? It still aligns to the size but that should be
> > correctly done to the section size level.
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 72f010d9bff5..ab1e6175ac9a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> >         if (map)
> >                 return map;
> >
> > -       map = memblock_alloc_try_nid(size,
> > -                                         PAGE_SIZE, addr,
> > +       map = memblock_alloc_try_nid(size, size, addr,
> >                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >         if (!map)
> >                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  {
> >         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> >         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> > +       /*
> > +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> > +        * and we want it to be properly aligned to the section size - this is
> > +        * especially the case for VMEMMAP which maps memmap to PMDs
> > +        */
> >         sparsemap_buf =
> > -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > +               memblock_alloc_try_nid_raw(size, section_map_size(),
> >                                                 addr,
> >                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >         sparsemap_buf_end = sparsemap_buf + size;
> 
> This looks good, I think we should also change alignment in fallback
> of vmemmap_alloc_block() to be
> section_map_size().
> 
> +++ b/mm/sparse-vmemmap.c
> @@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
>                         warned = true;
>                 }
>                 return NULL;
> -       } else
> -               return __earlyonly_bootmem_alloc(node, size, size,
> +       } else {
> +               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
>                                 __pa(MAX_DMA_ADDRESS));
> +       }
>  }

Are you sure? Doesn't this provide the proper alignment already? Most
callers use PAGE_SIZE, and vmemmap_populate_hugepages() uses PMD_SIZE, so
the resulting alignment looks good to me.
Pasha Tatashin Oct. 30, 2019, 6:23 p.m. UTC | #13
On Wed, Oct 30, 2019 at 2:01 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 13:53:52, Pavel Tatashin wrote:
> > On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> [...]
> > > > Yes, PMD_SIZE should be the alignment here. It just does not make
> > > > sense to align to size.
> > >
> > > What about this? It still aligns to the size but that should be
> > > correctly done to the section size level.
> > >
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > >         if (map)
> > >                 return map;
> > >
> > > -       map = memblock_alloc_try_nid(size,
> > > -                                         PAGE_SIZE, addr,
> > > +       map = memblock_alloc_try_nid(size, size, addr,
> > >                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >         if (!map)
> > >                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > >  {
> > >         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > >         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> > > +       /*
> > > +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > +        * and we want it to be properly aligned to the section size - this is
> > > +        * especially the case for VMEMMAP which maps memmap to PMDs
> > > +        */
> > >         sparsemap_buf =
> > > -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > +               memblock_alloc_try_nid_raw(size, section_map_size(),
> > >                                                 addr,
> > >                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >         sparsemap_buf_end = sparsemap_buf + size;
> >
> > This looks good, I think we should also change alignment in fallback
> > of vmemmap_alloc_block() to be
> > section_map_size().
> >
> > +++ b/mm/sparse-vmemmap.c
> > @@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> >                         warned = true;
> >                 }
> >                 return NULL;
> > -       } else
> > -               return __earlyonly_bootmem_alloc(node, size, size,
> > +       } else {
> > +               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
> >                                 __pa(MAX_DMA_ADDRESS));
> > +       }
> >  }
>
> Are you sure? Doesn't this provide the proper alignment already? Most
> callers use PAGE_SIZE, and vmemmap_populate_hugepages() uses PMD_SIZE, so
> the resulting alignment looks good to me.

Nevermind, you are right. I tracked only one path, and forgot about
the normal PAGE_SIZE path.

Thank you,
Pasha

> --
> Michal Hocko
> SUSE Labs
Michal Hocko Oct. 31, 2019, 7:25 a.m. UTC | #14
Vincent, could you give this a try please? It would be even better if
you could add some debugging to measure the overhead. Let me know if you
need any help with a debugging patch.

Thanks!

On Wed 30-10-19 18:31:23, Michal Hocko wrote:
[...]
> What about this? It still aligns to the size but that should be
> correctly done to the section size level.
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 72f010d9bff5..ab1e6175ac9a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> -					  PAGE_SIZE, addr,
> +	map = memblock_alloc_try_nid(size, size, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
>  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>  {
>  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
>  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> +	/*
> +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> +	 * and we want it to be properly aligned to the section size - this is
> +	 * especially the case for VMEMMAP which maps memmap to PMDs
> +	 */
>  	sparsemap_buf =
> -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> +		memblock_alloc_try_nid_raw(size, section_map_size(),
>  						addr,
>  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	sparsemap_buf_end = sparsemap_buf + size;
> 
> -- 
> Michal Hocko
> SUSE Labs
Vincent Whitchurch Nov. 4, 2019, 3:51 p.m. UTC | #15
On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> [...]
> > What about this? It still aligns to the size but that should be
> > correctly done to the section size level.
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 72f010d9bff5..ab1e6175ac9a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> >  	if (map)
> >  		return map;
> >  
> > -	map = memblock_alloc_try_nid(size,
> > -					  PAGE_SIZE, addr,
> > +	map = memblock_alloc_try_nid(size, size, addr,
> >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >  	if (!map)
> >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  {
> >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > +	/*
> > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > +	 * and we want it to be properly aligned to the section size - this is
> > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > +	 */
> >  	sparsemap_buf =
> > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> >  						addr,
> >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >  	sparsemap_buf_end = sparsemap_buf + size;
>
> Vincent, could you give this a try please? It would be even better if
> you could add some debugging to measure the overhead. Let me know if you
> need any help with a debugging patch.

I've tested this patch and it works on my platform: the allocations
from sparse_buffer_alloc() now succeed and the fallback path is not
taken.

I'm not sure what kind of overhead you're interested in.  The memory
overhead of the initial allocations (as measured via the "Memory: XX/YY
available" prints and MemTotal), is identical with and without this
patch.  I don't see any difference in the time taken for the
initializations either.
Michal Hocko Nov. 5, 2019, 8:43 a.m. UTC | #16
On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > [...]
> > > What about this? It still aligns to the size but that should be
> > > correctly done to the section size level.
> > > 
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > >  	if (map)
> > >  		return map;
> > >  
> > > -	map = memblock_alloc_try_nid(size,
> > > -					  PAGE_SIZE, addr,
> > > +	map = memblock_alloc_try_nid(size, size, addr,
> > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >  	if (!map)
> > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > >  {
> > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > +	/*
> > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > +	 * and we want it to be properly aligned to the section size - this is
> > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > +	 */
> > >  	sparsemap_buf =
> > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > >  						addr,
> > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >  	sparsemap_buf_end = sparsemap_buf + size;
> >
> > Vincent, could you give this a try please? It would be even better if
> > you could add some debugging to measure the overhead. Let me know if you
> > need any help with a debugging patch.
> 
> I've tested this patch and it works on my platform:  The allocations
> from sparse_buffer_alloc() now succeed and the fallback path is not
> taken.

Thanks a lot. I will try to prepare the full patch with a proper
changelog sometime this week.

> I'm not sure what kind of overhead you're interested in.

The memory overhead when sparsemap_buf only has PAGE_SIZE alignment, in
other words (ptr - sparsemap_buf) in sparse_buffer_alloc(). If
sparsemap_buf were properly aligned, then the difference should be 0.
Andrew Morton Nov. 15, 2019, 11:55 p.m. UTC | #17
On Tue, 5 Nov 2019 09:43:52 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> > On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > > [...]
> > > > What about this? It still aligns to the size but that should be
> > > > correctly done to the section size level.
> > > > 
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > > >  	if (map)
> > > >  		return map;
> > > >  
> > > > -	map = memblock_alloc_try_nid(size,
> > > > -					  PAGE_SIZE, addr,
> > > > +	map = memblock_alloc_try_nid(size, size, addr,
> > > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > >  	if (!map)
> > > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > > >  {
> > > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > > +	/*
> > > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > > +	 * and we want it to be properly aligned to the section size - this is
> > > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > > +	 */
> > > >  	sparsemap_buf =
> > > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > > >  						addr,
> > > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > >  	sparsemap_buf_end = sparsemap_buf + size;
> > >
> > > Vincent, could you give this a try please? It would be even better if
> > > you could add some debugging to measure the overhead. Let me know if you
> > > need any help with a debugging patch.
> > 
> > I've tested this patch and it works on my platform:  The allocations
> > from sparse_buffer_alloc() now succeed and the fallback path is not
> > taken.
> 
> Thanks a lot. I will try to prepare the full patch with a proper
> changelog sometime this week.
> 

We're late in -rc7.  Should we run with Vincent's original for now?

And I'm wondering why this is -stable -material?  You said

: Anyway the patch is OK.  Even though this is not a bug strictly
: speaking it is certainly a suboptimal behavior because zeroing takes
: time so I would flag this for a stable tree 4.19+.  There is no clear
: Fixes tag to apply (35fd1eb1e8212 would get closest I guess).

I'm not seeing any description of any runtime effect of the bug at
present.  When would unzeroed sparsemem pageframes cause a problem? 
Could they be visible during deferred initialization or mem hotadd?
Michal Hocko Nov. 18, 2019, 11:28 a.m. UTC | #18
On Fri 15-11-19 15:55:35, Andrew Morton wrote:
> On Tue, 5 Nov 2019 09:43:52 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> > > On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > > > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > > > [...]
> > > > > What about this? It still aligns to the size but that should be
> > > > > correctly done to the section size level.
> > > > > 
> > > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > > > --- a/mm/sparse.c
> > > > > +++ b/mm/sparse.c
> > > > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > > > >  	if (map)
> > > > >  		return map;
> > > > >  
> > > > > -	map = memblock_alloc_try_nid(size,
> > > > > -					  PAGE_SIZE, addr,
> > > > > +	map = memblock_alloc_try_nid(size, size, addr,
> > > > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > > >  	if (!map)
> > > > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > > > >  {
> > > > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > > > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > > > +	/*
> > > > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > > > +	 * and we want it to be properly aligned to the section size - this is
> > > > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > > > +	 */
> > > > >  	sparsemap_buf =
> > > > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > > > >  						addr,
> > > > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > > >  	sparsemap_buf_end = sparsemap_buf + size;
> > > >
> > > > Vincent, could you give this a try please? It would be even better if
> > > > you could add some debugging to measure the overhead. Let me know if you
> > > > need any help with a debugging patch.
> > > 
> > > I've tested this patch and it works on my platform:  The allocations
> > > from sparse_buffer_alloc() now succeed and the fallback path is not
> > > taken.
> > 
> > Thanks a lot. I will try to prepare the full patch with a proper
> > changelog sometimes this week.
> > 
> 
> We're late in -rc7.  Should we run with Vincent's original for now?

Yes, that patch is correct on its own. I have still the follow up clean
up on my todo list. I will get to this hopefully soon.

> And I'm wondering why this is -stable -material?  You said
> 
> : Anyway the patch is OK.  Even though this is not a bug strictly
> : speaking it is certainly a suboptimal behavior because zeroing takes
> : time so I would flag this for a stable tree 4.19+.  There is no clear
> : Fixes tag to apply (35fd1eb1e8212 would get closest I guess).
> 
> I'm not seeing any description of any runtime effect of the bug at
> present.  When would unzeroed sparsemem pageframes cause a problem? 
> Could they be visible during deferred initialization or mem hotadd?

The main user-visible problem is memory wastage. The overall amount of
memory wasted should be small. I wouldn't call it stable material.
diff mbox series

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index f6891c1992b1..01e467adc219 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -458,7 +458,7 @@  struct page __init *__populate_section_memmap(unsigned long pfn,
 	if (map)
 		return map;
 
-	map = memblock_alloc_try_nid(size,
+	map = memblock_alloc_try_nid_raw(size,
 					  PAGE_SIZE, addr,
 					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	if (!map)