Message ID | 20200209104826.3385-8-bhe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/hotplug: Only use subsection in VMEMMAP case and fix hot add/remove failure in SPARSEMEM\|!VMEMMAP case |
On Sun, Feb 09, 2020 at 06:48:26PM +0800, Baoquan He wrote:
>In section_deactivate(), pfn_to_page() no longer works after
>ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
>This caused a hot remove failure; the trace is:
>
>kernel BUG at mm/page_alloc.c:4806!
>invalid opcode: 0000 [#1] SMP PTI
>CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>RIP: 0010:free_pages+0x85/0xa0
>Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
>Let's defer resetting ->section_mem_map until after
>depopulate_section_memmap() to fix it.
>
>Signed-off-by: Baoquan He <bhe@redhat.com>
>---
> mm/sparse.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/mm/sparse.c b/mm/sparse.c
>index 623755e88255..345d065ef6ce 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> 			ms->usage = NULL;
> 		}
> 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>-		ms->section_mem_map = (unsigned long)NULL;
> 	}
>
> 	if (section_is_early && memmap)
> 		free_map_bootmem(memmap);
> 	else
> 		depopulate_section_memmap(pfn, nr_pages, altmap);

The crash happens in depopulate_section_memmap() when trying to get memmap by
pfn_to_page(). Can we pass memmap directly?

>+
>+	if(!rc)
>+		ms->section_mem_map = (unsigned long)NULL;
> }
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>--
>2.17.2
On 02/10/20 at 07:52am, Wei Yang wrote:
> >---
> > mm/sparse.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/mm/sparse.c b/mm/sparse.c
> >index 623755e88255..345d065ef6ce 100644
> >--- a/mm/sparse.c
> >+++ b/mm/sparse.c
> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > 			ms->usage = NULL;
> > 		}
> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> >-		ms->section_mem_map = (unsigned long)NULL;
> > 	}
> >
> > 	if (section_is_early && memmap)
> > 		free_map_bootmem(memmap);
> > 	else
> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
>
> The crash happens in depopulate_section_memmap() when trying to get memmap by
> pfn_to_page(). Can we pass memmap directly?

Yes, that's also a good idea. While it needs to add a parameter for
depopulate_section_memmap(), the parameter is useless for VMEMMAP
though, I personally prefer the current fix which is a little simpler.

Anyway, both is fine to me, I can update if you think passing memmap is
better.

>
> >+
> >+	if(!rc)
> >+		ms->section_mem_map = (unsigned long)NULL;
> > }
> >
> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
> >--
> >2.17.2
>
> --
> Wei Yang
> Help you, Help me
On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
>On 02/10/20 at 07:52am, Wei Yang wrote:
>> >---
>> > mm/sparse.c | 4 +++-
>> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/mm/sparse.c b/mm/sparse.c
>> >index 623755e88255..345d065ef6ce 100644
>> >--- a/mm/sparse.c
>> >+++ b/mm/sparse.c
>> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>> > 			ms->usage = NULL;
>> > 		}
>> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> >-		ms->section_mem_map = (unsigned long)NULL;
>> > 	}
>> >
>> > 	if (section_is_early && memmap)
>> > 		free_map_bootmem(memmap);
>> > 	else
>> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
>>
>> The crash happens in depopulate_section_memmap() when trying to get memmap by
>> pfn_to_page(). Can we pass memmap directly?
>
>Yes, that's also a good idea. While it needs to add a parameter for
>depopulate_section_memmap(), the parameter is useless for VMEMMAP
>though, I personally prefer the current fix which is a little simpler.
>

Not a new parameter, but replace pfn with memmap.

Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
start and finally pass to vmemmap_free().

>Anyway, both is fine to me, I can update if you think passing memmap is
>better.
>
>>
>> >+
>> >+	if(!rc)
>> >+		ms->section_mem_map = (unsigned long)NULL;
>> > }
>> >
>> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
>> >--
>> >2.17.2
>>
>> --
>> Wei Yang
>> Help you, Help me
On 02/10/20 at 02:08pm, Wei Yang wrote:
> On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
> >On 02/10/20 at 07:52am, Wei Yang wrote:
> >> >---
> >> > mm/sparse.c | 4 +++-
> >> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >> >
> >> >diff --git a/mm/sparse.c b/mm/sparse.c
> >> >index 623755e88255..345d065ef6ce 100644
> >> >--- a/mm/sparse.c
> >> >+++ b/mm/sparse.c
> >> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >> > 			ms->usage = NULL;
> >> > 		}
> >> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> >> >-		ms->section_mem_map = (unsigned long)NULL;
> >> > 	}
> >> >
> >> > 	if (section_is_early && memmap)
> >> > 		free_map_bootmem(memmap);
> >> > 	else
> >> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
> >>
> >> The crash happens in depopulate_section_memmap() when trying to get memmap by
> >> pfn_to_page(). Can we pass memmap directly?
> >
> >Yes, that's also a good idea. While it needs to add a parameter for
> >depopulate_section_memmap(), the parameter is useless for VMEMMAP
> >though, I personally prefer the current fix which is a little simpler.
> >
>
> Not a new parameter, but replace pfn with memmap.
>
> Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
> start and finally pass to vmemmap_free().

In section_deactivate(), per the code comments from Dan, we can know
that:

	/*
	 * section which only contains bootmem will be handled by
	 * free_map_bootmem(), including a complete section, or partial
	 * section which only has memory starting from the beginning.
	 */
	if (section_is_early && memmap)
		free_map_bootmem(memmap);
	else
		/*
		 * section which contains region mixing bootmem with hot added
		 * sub-section region, only sub-section region, complete
		 * section. And in the mixed case, if hot remove the hot added
		 * sub-section aligned part, no memmap is got in the current
		 * code. So we still need pfn to calculate it for vmemmap case.
		 * To me, whenever we need, it looks better that we always use
		 * pfn to get its own memmap.
		 */
		depopulate_section_memmap(pfn, nr_pages, altmap);

This is why I would like to keep the current logic as is, only one line
of code adjusting can fix the issue. Please let me know if I miss
anything.

>
> >Anyway, both is fine to me, I can update if you think passing memmap is
> >better.
> >
> >>
> >> >+
> >> >+	if(!rc)
> >> >+		ms->section_mem_map = (unsigned long)NULL;
> >> > }
> >> >
> >> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
> >> >--
> >> >2.17.2
> >>
> >> --
> >> Wei Yang
> >> Help you, Help me
> >>
>
> --
> Wei Yang
> Help you, Help me
On Mon, Feb 10, 2020 at 03:54:06PM +0800, Baoquan He wrote:
>On 02/10/20 at 02:08pm, Wei Yang wrote:
>> On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
>> >On 02/10/20 at 07:52am, Wei Yang wrote:
>> >> >---
>> >> > mm/sparse.c | 4 +++-
>> >> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >> >
>> >> >diff --git a/mm/sparse.c b/mm/sparse.c
>> >> >index 623755e88255..345d065ef6ce 100644
>> >> >--- a/mm/sparse.c
>> >> >+++ b/mm/sparse.c
>> >> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>> >> > 			ms->usage = NULL;
>> >> > 		}
>> >> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> >> >-		ms->section_mem_map = (unsigned long)NULL;
>> >> > 	}
>> >> >
>> >> > 	if (section_is_early && memmap)
>> >> > 		free_map_bootmem(memmap);
>> >> > 	else
>> >> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
>> >>
>> >> The crash happens in depopulate_section_memmap() when trying to get memmap by
>> >> pfn_to_page(). Can we pass memmap directly?
>> >
>> >Yes, that's also a good idea. While it needs to add a parameter for
>> >depopulate_section_memmap(), the parameter is useless for VMEMMAP
>> >though, I personally prefer the current fix which is a little simpler.
>> >
>>
>> Not a new parameter, but replace pfn with memmap.
>>
>> Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
>> start and finally pass to vmemmap_free().
>
>In section_deactivate(), per the code comments from Dan, we can know
>that:
>
>	/*
>	 * section which only contains bootmem will be handled by
>	 * free_map_bootmem(), including a complete section, or partial
>	 * section which only has memory starting from the beginning.
>	 */
>	if (section_is_early && memmap)
>		free_map_bootmem(memmap);
>	else
>		/*
>		 * section which contains region mixing bootmem with hot added
>		 * sub-section region, only sub-section region, complete
>		 * section. And in the mixed case, if hot remove the hot added
>		 * sub-section aligned part, no memmap is got in the current
>		 * code. So we still need pfn to calculate it for vmemmap case.
>		 * To me, whenever we need, it looks better that we always use
>		 * pfn to get its own memmap.
>		 */
>		depopulate_section_memmap(pfn, nr_pages, altmap);
>
>This is why I would like to keep the current logic as is, only one line
>of code adjusting can fix the issue. Please let me know if I miss
>anything.
>

You are right. I missed this point.

>
>>
>> >Anyway, both is fine to me, I can update if you think passing memmap is
>> >better.
>> >
>> >>
>> >> >+
>> >> >+	if(!rc)
>> >> >+		ms->section_mem_map = (unsigned long)NULL;
>> >> > }
>> >> >
>> >> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
>> >> >--
>> >> >2.17.2
>> >>
>> >> --
>> >> Wei Yang
>> >> Help you, Help me
>> >>
>>
>> --
>> Wei Yang
>> Help you, Help me
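The point settled in the exchange above can be sketched in user-space C. This is only a toy model under stated assumptions: `toy_vmemmap`, `toy_vmemmap_pfn_to_page()` and `toy_decoded_memmap()` are hypothetical names, not kernel functions. It shows that a memmap decoded only when the whole section becomes empty is NULL in the mixed (sub-section) removal case, while a pfn-based lookup in a vmemmap-style array is always available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; every name here is hypothetical, not the kernel's. */
#define TOY_NR_PFNS 64UL

struct page { int unused; };

/* Models the virtually contiguous memmap array of the VMEMMAP case. */
static struct page toy_vmemmap[TOY_NR_PFNS];

/* With VMEMMAP, pfn -> page is plain indexing and needs no section state. */
static struct page *toy_vmemmap_pfn_to_page(unsigned long pfn)
{
	assert(pfn < TOY_NR_PFNS);
	return &toy_vmemmap[pfn];
}

/*
 * Models the caller side described in the thread: memmap is only
 * decoded once the whole section has become empty, otherwise the
 * local variable keeps its initial NULL value.
 */
static struct page *toy_decoded_memmap(bool section_empty,
				       struct page *section_memmap)
{
	return section_empty ? section_memmap : NULL;
}
```

In this model, removing only part of a section leaves `toy_decoded_memmap()` returning NULL, so a teardown helper that took memmap instead of pfn would have nothing to work with, which matches the reason given for keeping pfn as the argument.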
diff --git a/mm/sparse.c b/mm/sparse.c
index 623755e88255..345d065ef6ce 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-		ms->section_mem_map = (unsigned long)NULL;
 	}
 
 	if (section_is_early && memmap)
 		free_map_bootmem(memmap);
 	else
 		depopulate_section_memmap(pfn, nr_pages, altmap);
+
+	if(!rc)
+		ms->section_mem_map = (unsigned long)NULL;
 }
 
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
In section_deactivate(), pfn_to_page() no longer works after
ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
This caused a hot remove failure; the trace is:

kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
 __remove_pages+0x99/0xc0
 arch_remove_memory+0x23/0x4d
 try_remove_memory+0xc8/0x130
 ? walk_memory_blocks+0x72/0xa0
 __remove_memory+0xa/0x11
 acpi_memory_device_remove+0x72/0x100
 acpi_bus_trim+0x55/0x90
 acpi_device_hotplug+0x2eb/0x3d0
 acpi_hotplug_work_fn+0x1a/0x30
 process_one_work+0x1a7/0x370
 worker_thread+0x30/0x380
 ? flush_rcu_work+0x30/0x30
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40

Let's defer resetting ->section_mem_map until after
depopulate_section_memmap() to fix it.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
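
The ordering problem described in this commit message can be illustrated with a small user-space C model. It is a sketch under stated assumptions, not the kernel implementation: `encode_mem_map()`, `toy_pfn_to_page()` and the two `deactivate_*` helpers are hypothetical, the section size is made up, and the toy lookup returns NULL for a cleared section where the real pfn_to_page() has no such check and would simply compute a bogus pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for kernel types; sizes and names are illustrative only. */
#define PAGES_PER_SECTION 8UL

struct page { int refcount; };

struct mem_section {
	uintptr_t section_mem_map;	/* encoded memmap address, 0 once reset */
};

/* Bias the memmap address by the section's first pfn, so that adding
 * a pfn later lands on the matching struct page. */
static uintptr_t encode_mem_map(struct page *memmap, unsigned long section_nr)
{
	assert(memmap != NULL);
	return (uintptr_t)memmap -
	       section_nr * PAGES_PER_SECTION * sizeof(struct page);
}

/* Toy lookup for the SPARSEMEM without VMEMMAP case: it depends on
 * section_mem_map. The NULL return is an assumption of this model. */
static struct page *toy_pfn_to_page(const struct mem_section *ms,
				    unsigned long pfn)
{
	if (!ms->section_mem_map)
		return NULL;
	return (struct page *)(ms->section_mem_map + pfn * sizeof(struct page));
}

/* Ordering before the patch: reset first, then look the page up. */
static struct page *deactivate_reset_first(struct mem_section *ms,
					   unsigned long pfn)
{
	ms->section_mem_map = 0;
	return toy_pfn_to_page(ms, pfn);	/* lookup no longer works */
}

/* Ordering after the patch: look the page up, then reset. */
static struct page *deactivate_reset_last(struct mem_section *ms,
					  unsigned long pfn)
{
	struct page *memmap = toy_pfn_to_page(ms, pfn);

	ms->section_mem_map = 0;
	return memmap;
}
```

In this model only `deactivate_reset_last()` hands a usable page pointer to the teardown step, which is the effect the one-line move in the patch is aiming for.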