| Message ID | 20210203104750.23405-3-osalvador@suse.de (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Cleanup and fixups for vmemmap handling |
On 03.02.21 11:47, Oscar Salvador wrote:
> We never get to allocate 1GB pages when mapping the vmemmap range.
> Drop the dead code both for the aligned and unaligned cases and leave
> only the direct map handling.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/x86/mm/init_64.c | 31 ++++---------------------------
>  1 file changed, 4 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index b0e1d215c83e..28729c6b9775 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>  	unsigned long next, pages = 0;
>  	pmd_t *pmd_base;
>  	pud_t *pud;
> -	void *page_addr;
>
>  	pud = pud_start + pud_index(addr);
>  	for (; addr < end; addr = next, pud++) {
> @@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>  			continue;
>
>  		if (pud_large(*pud)) {
> -			if (IS_ALIGNED(addr, PUD_SIZE) &&
> -			    IS_ALIGNED(next, PUD_SIZE)) {
> -				if (!direct)
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -				spin_lock(&init_mm.page_table_lock);
> -				pud_clear(pud);
> -				spin_unlock(&init_mm.page_table_lock);
> -				pages++;
> -			} else {
> -				/* If here, we are freeing vmemmap pages. */
> -				memset((void *)addr, PAGE_INUSE, next - addr);
> -
> -				page_addr = page_address(pud_page(*pud));
> -				if (!memchr_inv(page_addr, PAGE_INUSE,
> -						PUD_SIZE)) {
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -					spin_lock(&init_mm.page_table_lock);
> -					pud_clear(pud);
> -					spin_unlock(&init_mm.page_table_lock);
> -				}
> -			}
> -
> +			spin_lock(&init_mm.page_table_lock);
> +			pud_clear(pud);
> +			spin_unlock(&init_mm.page_table_lock);
> +			pages++;
>  			continue;
>  		}

One problem I see with the existing code, which this change makes more
obvious, is that when trying to remove at a different granularity than we
added (e.g., unplug a 128MB DIMM available during boot), we remove the
direct map of unrelated DIMMs.

I think we should keep the

	if (IS_ALIGNED(addr, PUD_SIZE) &&
	    IS_ALIGNED(next, PUD_SIZE)) {
		...
	}

bits.

Thoughts? Apart from that, looks good.
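For illustration only, here is a minimal sketch of what the pud_large() branch could look like with the alignment guard kept while the dead vmemmap handling is still dropped. This is not the patch as posted (a v3 is only announced later in the thread); it is a fragment of remove_pud_table() reusing the identifiers from the quoted arch/x86/mm/init_64.c hunk.

```c
/*
 * Sketch only, not the posted patch: keep the PUD_SIZE alignment guard
 * so that removing a sub-range that only partially covers a 1GB mapping
 * leaves the PUD, and thus the direct map of the rest of that range,
 * untouched.
 */
if (pud_large(*pud)) {
	if (IS_ALIGNED(addr, PUD_SIZE) &&
	    IS_ALIGNED(next, PUD_SIZE)) {
		spin_lock(&init_mm.page_table_lock);
		pud_clear(pud);
		spin_unlock(&init_mm.page_table_lock);
		pages++;
	}
	continue;
}
```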
On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> One problem I see with the existing code, which this change makes more
> obvious, is that when trying to remove at a different granularity than we
> added (e.g., unplug a 128MB DIMM available during boot), we remove the
> direct map of unrelated DIMMs.

So, let me see if I understand your concern.

We have a range that was mapped with a 1GB page, and we try to remove
a 128MB chunk from it.
Yes, in that case we would clear the pud, and that is bad, so we should
keep the PAGE_ALIGNED checks.

Now, let us assume that scenario.
If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
(e.g.: 128M), the direct mapping of that range will never be cleared (and the
page tables will not be freed), unless I am missing something?
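As a side note, the arithmetic behind this scenario can be checked with a small self-contained userspace sketch (not kernel code; the PUD_SIZE and chunk values below are assumptions matching x86-64 1GB pages and a hypothetical 128MB DIMM): no 128MB sub-range of a 1GB mapping has both its start and end PUD-aligned, so with the alignment checks kept the PUD is never cleared, while without them each 128MB removal would clear the whole 1GB mapping.

```c
/*
 * Self-contained userspace illustration (not kernel code) of the scenario
 * discussed above: removing 128MB chunks out of a region backed by a single
 * 1GB (PUD-sized) mapping. Both alignment checks can never hold at once for
 * such a chunk, so a guarded remove_pud_table() would never clear the PUD.
 */
#include <stdio.h>
#include <stdbool.h>

#define PUD_SIZE   (1UL << 30)	/* 1GB, x86-64 PUD-mapped page size */
#define CHUNK_SIZE (1UL << 27)	/* 128MB, the hypothetical DIMM size */

static bool is_aligned(unsigned long x, unsigned long align)
{
	return (x & (align - 1)) == 0;
}

int main(void)
{
	unsigned long base = 0x100000000UL;	/* some 1GB-aligned start */

	for (int k = 0; k < 8; k++) {
		unsigned long addr = base + k * CHUNK_SIZE;
		unsigned long next = addr + CHUNK_SIZE;
		bool covers_whole_pud = is_aligned(addr, PUD_SIZE) &&
					is_aligned(next, PUD_SIZE);

		printf("chunk %d: [%#lx, %#lx) clears the PUD? %s\n",
		       k, addr, next, covers_whole_pud ? "yes" : "no");
	}
	return 0;
}
```

Every iteration prints "no", which is exactly why the guarded version leaves the direct mapping and page tables behind when removing in smaller chunks.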
On 03.02.21 15:10, Oscar Salvador wrote:
> On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
>> One problem I see with the existing code, which this change makes more
>> obvious, is that when trying to remove at a different granularity than we
>> added (e.g., unplug a 128MB DIMM available during boot), we remove the
>> direct map of unrelated DIMMs.
>
> So, let me see if I understand your concern.
>
> We have a range that was mapped with a 1GB page, and we try to remove
> a 128MB chunk from it.
> Yes, in that case we would clear the pud, and that is bad, so we should
> keep the PAGE_ALIGNED checks.
>
> Now, let us assume that scenario.
> If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
> (e.g.: 128M), the direct mapping of that range will never be cleared unless

No, that's exactly what's happening. The good thing is that it barely ever
happens, so I assume leaving behind some of the direct mapping / page tables
is not that bad.
On Wed, Feb 03, 2021 at 03:12:05PM +0100, David Hildenbrand wrote:
> On 03.02.21 15:10, Oscar Salvador wrote:
> > On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> > > One problem I see with the existing code, which this change makes more
> > > obvious, is that when trying to remove at a different granularity than
> > > we added (e.g., unplug a 128MB DIMM available during boot), we remove
> > > the direct map of unrelated DIMMs.
> >
> > So, let me see if I understand your concern.
> >
> > We have a range that was mapped with a 1GB page, and we try to remove
> > a 128MB chunk from it.
> > Yes, in that case we would clear the pud, and that is bad, so we should
> > keep the PAGE_ALIGNED checks.
> >
> > Now, let us assume that scenario.
> > If you have a 1GB mapped range and you remove it in smaller chunks bit by
> > bit (e.g.: 128M), the direct mapping of that range will never be cleared
> > unless
>
> No, that's exactly what's happening. The good thing is that it barely ever
> happens, so I assume leaving behind some of the direct mapping / page tables
> is not that bad.

Sorry, I meant that that is the current situation.

Then let us keep the PAGE_ALIGNED stuff. I shall resend a v3 later today.

Thanks for the review ;-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b0e1d215c83e..28729c6b9775 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
 	pud_t *pud;
-	void *page_addr;
 
 	pud = pud_start + pud_index(addr);
 	for (; addr < end; addr = next, pud++) {
@@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 			continue;
 
 		if (pud_large(*pud)) {
-			if (IS_ALIGNED(addr, PUD_SIZE) &&
-			    IS_ALIGNED(next, PUD_SIZE)) {
-				if (!direct)
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-				spin_lock(&init_mm.page_table_lock);
-				pud_clear(pud);
-				spin_unlock(&init_mm.page_table_lock);
-				pages++;
-			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pud_page(*pud));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PUD_SIZE)) {
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-					spin_lock(&init_mm.page_table_lock);
-					pud_clear(pud);
-					spin_unlock(&init_mm.page_table_lock);
-				}
-			}
-
+			spin_lock(&init_mm.page_table_lock);
+			pud_clear(pud);
+			spin_unlock(&init_mm.page_table_lock);
+			pages++;
 			continue;
 		}
We never get to allocate 1GB pages when mapping the vmemmap range.
Drop the dead code both for the aligned and unaligned cases and leave
only the direct map handling.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/mm/init_64.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)