diff mbox series

[2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn

Message ID 20200206231629.14151-3-richardw.yang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Fixes "mm/sparsemem: support sub-section hotplug" | expand

Commit Message

Wei Yang Feb. 6, 2020, 11:16 p.m. UTC
memmap should be the physical address to page struct instead of virtual
address to pfn.

Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
this point.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dan Williams Feb. 7, 2020, 2:19 a.m. UTC | #1
On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>
> memmap should be the physical address to page struct instead of virtual
> address to pfn.
>
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>         /* Align memmap to section boundary in the subsection case */
>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>                 section_nr_to_pfn(section_nr) != start_pfn)
> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));

Yes, this looks obviously correct. This might be tripping up
makedumpfile. Do you see any practical effects of this bug? The kernel
mostly avoids ->section_mem_map in the vmemmap case and in the
!vmemmap case section_nr_to_pfn(section_nr) should always equal
start_pfn.
Baoquan He Feb. 7, 2020, 3:10 a.m. UTC | #2
Hi Dan,

On 02/06/20 at 06:19pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index b5da121bdd6e..56816f653588 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >         /* Align memmap to section boundary in the subsection case */
> >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >                 section_nr_to_pfn(section_nr) != start_pfn)
> > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> 
> Yes, this looks obviously correct. This might be tripping up
> makedumpfile. Do you see any practical effects of this bug? The kernel
> mostly avoids ->section_mem_map in the vmemmap case and in the
> !vmemmap case section_nr_to_pfn(section_nr) should always equal
> start_pfn.

The practical effects is that the memmap for the first unaligned section will be lost
when destroy namespace to hot remove it. Because we encode the ->section_mem_map
into mem_section, and get memmap from the related mem_section to free it in
section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
with memmap.

By the way, sub-section support is only valid in vmemmap case, right?
Seems yes from code, but I don't find any document to prove it.

Thanks
Baoquan
Dan Williams Feb. 7, 2020, 3:21 a.m. UTC | #3
On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>
> Hi Dan,
>
> On 02/06/20 at 06:19pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index b5da121bdd6e..56816f653588 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > >         /* Align memmap to section boundary in the subsection case */
> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> > Yes, this looks obviously correct. This might be tripping up
> > makedumpfile. Do you see any practical effects of this bug? The kernel
> > mostly avoids ->section_mem_map in the vmemmap case and in the
> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > start_pfn.
>
> The practical effects is that the memmap for the first unaligned section will be lost
> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> into mem_section, and get memmap from the related mem_section to free it in
> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> with memmap.

Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?

> By the way, sub-section support is only valid in vmemmap case, right?

Yes.

> Seems yes from code, but I don't find any document to prove it.

check_pfn_span() enforces this requirement.
Baoquan He Feb. 7, 2020, 3:36 a.m. UTC | #4
On 02/06/20 at 07:21pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > Hi Dan,
> >
> > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index b5da121bdd6e..56816f653588 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > >         /* Align memmap to section boundary in the subsection case */
> > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > >
> > > Yes, this looks obviously correct. This might be tripping up
> > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > start_pfn.
> >
> > The practical effects is that the memmap for the first unaligned section will be lost
> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > into mem_section, and get memmap from the related mem_section to free it in
> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > with memmap.
> 
> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?

I think no, the lost memmap should only happen in vmemmap case.

> 
> > By the way, sub-section support is only valid in vmemmap case, right?
> 
> Yes.
> 
> > Seems yes from code, but I don't find any document to prove it.
> 
> check_pfn_span() enforces this requirement.

Thanks for your confirmation. Do you mind if I add some document
sentences somewhere make clear this?
Dan Williams Feb. 7, 2020, 4:05 a.m. UTC | #5
On Thu, Feb 6, 2020 at 7:36 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 02/06/20 at 07:21pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > Hi Dan,
> > >
> > > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > > index b5da121bdd6e..56816f653588 100644
> > > > > --- a/mm/sparse.c
> > > > > +++ b/mm/sparse.c
> > > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > > >         /* Align memmap to section boundary in the subsection case */
> > > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > > >
> > > > Yes, this looks obviously correct. This might be tripping up
> > > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > > start_pfn.
> > >
> > > The practical effects is that the memmap for the first unaligned section will be lost
> > > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > > into mem_section, and get memmap from the related mem_section to free it in
> > > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > > with memmap.
> >
> > Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
> I think no, the lost memmap should only happen in vmemmap case.
>
> >
> > > By the way, sub-section support is only valid in vmemmap case, right?
> >
> > Yes.
> >
> > > Seems yes from code, but I don't find any document to prove it.
> >
> > check_pfn_span() enforces this requirement.
>
> Thanks for your confirmation. Do you mind if I add some document
> sentences somewhere make clear this?
>

Sure, I'd be happy to review as well.
Baoquan He Feb. 7, 2020, 4:11 a.m. UTC | #6
On 02/07/20 at 07:16am, Wei Yang wrote:
> memmap should be the physical address to page struct instead of virtual
> address to pfn.

Maybe not, memmap stores a virtual address.

> 
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  	/* Align memmap to section boundary in the subsection case */
>  	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>  		section_nr_to_pfn(section_nr) != start_pfn)
> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));

With Dan's confirmation, sub-section is only valid in vmemmap case. I
think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
to filter out non vmemmap case. So only below code is good:

 +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));

>  	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>  
>  	return 0;
> -- 
> 2.17.1
>
Wei Yang Feb. 7, 2020, 10:51 a.m. UTC | #7
On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>         /* Align memmap to section boundary in the subsection case */
>>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>                 section_nr_to_pfn(section_nr) != start_pfn)
>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.

To summarize:

 * vmemmap, ->section_mem_map is not used mostly
 * !vmemmap, we are sure range is section aligned

Sounds we don't need to handle this?
Wei Yang Feb. 7, 2020, 10:53 a.m. UTC | #8
On Fri, Feb 07, 2020 at 12:11:34PM +0800, Baoquan He wrote:
>On 02/07/20 at 07:16am, Wei Yang wrote:
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>
>Maybe not, memmap stores a virtual address.
>
>> 
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>> 
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>  	/* Align memmap to section boundary in the subsection case */
>>  	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>  		section_nr_to_pfn(section_nr) != start_pfn)
>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>With Dan's confirmation, sub-section is only valid in vmemmap case. I
>think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
>to filter out non vmemmap case. So only below code is good:
>
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>

You mean replace pfn_to_kaddr with pfn_to_page ?

>>  	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>  
>>  	return 0;
>> -- 
>> 2.17.1
>>
Wei Yang Feb. 7, 2020, 11:13 a.m. UTC | #9
On Fri, Feb 07, 2020 at 11:36:36AM +0800, Baoquan He wrote:
>On 02/06/20 at 07:21pm, Dan Williams wrote:
>> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>> >
>> > Hi Dan,
>> >
>> > On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > > index b5da121bdd6e..56816f653588 100644
>> > > > --- a/mm/sparse.c
>> > > > +++ b/mm/sparse.c
>> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > > >         /* Align memmap to section boundary in the subsection case */
>> > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > > >                 section_nr_to_pfn(section_nr) != start_pfn)
>> > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> > >
>> > > Yes, this looks obviously correct. This might be tripping up
>> > > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > > start_pfn.
>> >
>> > The practical effects is that the memmap for the first unaligned section will be lost
>> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> > into mem_section, and get memmap from the related mem_section to free it in
>> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> > with memmap.
>> 
>> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>I think no, the lost memmap should only happen in vmemmap case.
>
>> 
>> > By the way, sub-section support is only valid in vmemmap case, right?
>> 
>> Yes.
>> 
>> > Seems yes from code, but I don't find any document to prove it.
>> 
>> check_pfn_span() enforces this requirement.

I saw this function, but those combination of vmemmap and !vmemmap make my
brain not work properly.

>
>Thanks for your confirmation. Do you mind if I add some document
>sentences somewhere make clear this?

Thanks, hope this would help the future audience.
Wei Yang Feb. 7, 2020, 11:26 a.m. UTC | #10
On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>         /* Align memmap to section boundary in the subsection case */
>>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>                 section_nr_to_pfn(section_nr) != start_pfn)
>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.

I took another look into the code. Looks there is no practical effect after
this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
the real memmap.

But leave a inconsistent data in section_mem_map is not a good practice.
Wei Yang Feb. 7, 2020, 12:14 p.m. UTC | #11
On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>>
>> Hi Dan,
>>
>> On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > index b5da121bdd6e..56816f653588 100644
>> > > --- a/mm/sparse.c
>> > > +++ b/mm/sparse.c
>> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > >         /* Align memmap to section boundary in the subsection case */
>> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > >                 section_nr_to_pfn(section_nr) != start_pfn)
>> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> >
>> > Yes, this looks obviously correct. This might be tripping up
>> > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > start_pfn.
>>
>> The practical effects is that the memmap for the first unaligned section will be lost
>> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> into mem_section, and get memmap from the related mem_section to free it in
>> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> with memmap.
>
>Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>> By the way, sub-section support is only valid in vmemmap case, right?
>
>Yes.

Just one question from curiosity. Why we don't want sub-section for !vmemmap
case? Because it will wast memory for memmap?

>
>> Seems yes from code, but I don't find any document to prove it.
>
>check_pfn_span() enforces this requirement.
Dan Williams Feb. 7, 2020, 4:44 p.m. UTC | #12
On Fri, Feb 7, 2020 at 4:15 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> >>
> >> Hi Dan,
> >>
> >> On 02/06/20 at 06:19pm, Dan Williams wrote:
> >> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> >> > > diff --git a/mm/sparse.c b/mm/sparse.c
> >> > > index b5da121bdd6e..56816f653588 100644
> >> > > --- a/mm/sparse.c
> >> > > +++ b/mm/sparse.c
> >> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >> > >         /* Align memmap to section boundary in the subsection case */
> >> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >> > >                 section_nr_to_pfn(section_nr) != start_pfn)
> >> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >> >
> >> > Yes, this looks obviously correct. This might be tripping up
> >> > makedumpfile. Do you see any practical effects of this bug? The kernel
> >> > mostly avoids ->section_mem_map in the vmemmap case and in the
> >> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> >> > start_pfn.
> >>
> >> The practical effects is that the memmap for the first unaligned section will be lost
> >> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> >> into mem_section, and get memmap from the related mem_section to free it in
> >> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> >> with memmap.
> >
> >Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
> >
> >> By the way, sub-section support is only valid in vmemmap case, right?
> >
> >Yes.
>
> Just one question from curiosity. Why we don't want sub-section for !vmemmap
> case? Because it will wast memory for memmap?

The effort and maintenance burden outweighs the benefit.
Baoquan He Feb. 9, 2020, 1:50 p.m. UTC | #13
On 02/07/20 at 11:26am, Wei Yang wrote:
> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> >>
> >> memmap should be the physical address to page struct instead of virtual
> >> address to pfn.
> >>
> >> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> >> this point.
> >>
> >> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> CC: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >>  mm/sparse.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/sparse.c b/mm/sparse.c
> >> index b5da121bdd6e..56816f653588 100644
> >> --- a/mm/sparse.c
> >> +++ b/mm/sparse.c
> >> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >>         /* Align memmap to section boundary in the subsection case */
> >>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >>                 section_nr_to_pfn(section_nr) != start_pfn)
> >> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> >Yes, this looks obviously correct. This might be tripping up
> >makedumpfile. Do you see any practical effects of this bug? The kernel
> >mostly avoids ->section_mem_map in the vmemmap case and in the
> >!vmemmap case section_nr_to_pfn(section_nr) should always equal
> >start_pfn.
> 
> I took another look into the code. Looks there is no practical effect after
> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
> the real memmap.
> 
> But leave a inconsistent data in section_mem_map is not a good practice.

Yeah, it does has no pratical effect. I tried to create sub-section
alighed namespace, then trigger crash, makedumpfile isn't impacted.
Because pmem memory is only added, but not onlined. We don't report it
to kdump, makedumpfile will ignore it.

I think it's worth fixing it to encode a correct memmap address. We
don't know if in the future this will break anything.

Thanks
Baoquan
David Hildenbrand Feb. 9, 2020, 2:14 p.m. UTC | #14
> Am 09.02.2020 um 14:50 schrieb Baoquan He <bhe@redhat.com>:
> 
> On 02/07/20 at 11:26am, Wei Yang wrote:
>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>>> 
>>>> memmap should be the physical address to page struct instead of virtual
>>>> address to pfn.
>>>> 
>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>> this point.
>>>> 
>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>> ---
>>>> mm/sparse.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index b5da121bdd6e..56816f653588 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>        /* Align memmap to section boundary in the subsection case */
>>>>        if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>>                section_nr_to_pfn(section_nr) != start_pfn)
>>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>> 
>>> Yes, this looks obviously correct. This might be tripping up
>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>> start_pfn.
>> 
>> I took another look into the code. Looks there is no practical effect after
>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>> the real memmap.
>> 
>> But leave a inconsistent data in section_mem_map is not a good practice.
> 
> Yeah, it does has no pratical effect. I tried to create sub-section
> alighed namespace, then trigger crash, makedumpfile isn't impacted.
> Because pmem memory is only added, but not onlined. We don't report it
> to kdump, makedumpfile will ignore it.
> 
> I think it's worth fixing it to encode a correct memmap address. We
> don't know if in the future this will break anything.

We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).

E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.

So some of that memory will be kdumped - and should be fixed if broken.

Cheers


> 
> Thanks
> Baoquan
Wei Yang Feb. 10, 2020, 12:36 a.m. UTC | #15
On Sun, Feb 09, 2020 at 03:14:28PM +0100, David Hildenbrand wrote:
>
>
>> Am 09.02.2020 um 14:50 schrieb Baoquan He <bhe@redhat.com>:
>> 
>> On 02/07/20 at 11:26am, Wei Yang wrote:
>>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>>>> 
>>>>> memmap should be the physical address to page struct instead of virtual
>>>>> address to pfn.
>>>>> 
>>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>>> this point.
>>>>> 
>>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>>> ---
>>>>> mm/sparse.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>>> index b5da121bdd6e..56816f653588 100644
>>>>> --- a/mm/sparse.c
>>>>> +++ b/mm/sparse.c
>>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>>        /* Align memmap to section boundary in the subsection case */
>>>>>        if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>>>                section_nr_to_pfn(section_nr) != start_pfn)
>>>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>> 
>>>> Yes, this looks obviously correct. This might be tripping up
>>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>>> start_pfn.
>>> 
>>> I took another look into the code. Looks there is no practical effect after
>>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>>> the real memmap.
>>> 
>>> But leave a inconsistent data in section_mem_map is not a good practice.
>> 
>> Yeah, it does has no pratical effect. I tried to create sub-section
>> alighed namespace, then trigger crash, makedumpfile isn't impacted.
>> Because pmem memory is only added, but not onlined. We don't report it
>> to kdump, makedumpfile will ignore it.
>> 
>> I think it's worth fixing it to encode a correct memmap address. We
>> don't know if in the future this will break anything.
>
>We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).
>
>E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.
>
>So some of that memory will be kdumped - and should be fixed if broken.
>
>Cheers

Thanks for the explanation, I will add this in the changelog.

>
>
>> 
>> Thanks
>> Baoquan
diff mbox series

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index b5da121bdd6e..56816f653588 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -888,7 +888,7 @@  int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	/* Align memmap to section boundary in the subsection case */
 	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
 		section_nr_to_pfn(section_nr) != start_pfn)
-		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
+		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
 	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
 
 	return 0;