[v2,2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges

Message ID 20210203104750.23405-3-osalvador@suse.de (mailing list archive)
State New, archived
Series: Cleanup and fixups for vmemmap handling

Commit Message

Oscar Salvador Feb. 3, 2021, 10:47 a.m. UTC
We never get to allocate 1GB pages when mapping the vmemmap range.
Drop the dead code both for the aligned and unaligned cases and leave
only the direct map handling.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/mm/init_64.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)
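
For context on why this path is dead code: with the usual x86_64
constants (128MiB memory sections, 4KiB base pages, a 64-byte struct
page), the vmemmap backing a single hotplugged section is only 2MiB,
i.e. one PMD-sized mapping, so the 1GiB PUD case is never reached. The
following back-of-the-envelope check is an illustration only (plain
userspace C, not code from the patch):

/*
 * Illustration only: vmemmap size for one 128MiB section, assuming
 * 4KiB pages and a 64-byte struct page as on x86_64.
 */
#include <stdio.h>

#define SZ_4K         (4UL << 10)
#define SZ_128M       (128UL << 20)
#define PMD_SIZE      (2UL << 20)   /* 2MiB */
#define PUD_SIZE      (1UL << 30)   /* 1GiB */
#define STRUCT_PAGE   64UL          /* assumed sizeof(struct page) */

int main(void)
{
	unsigned long pages_per_section = SZ_128M / SZ_4K;
	unsigned long vmemmap_bytes = pages_per_section * STRUCT_PAGE;

	printf("vmemmap per 128MiB section: %lu MiB\n", vmemmap_bytes >> 20);
	printf("fits in a single PMD:       %s\n",
	       vmemmap_bytes <= PMD_SIZE ? "yes" : "no");
	printf("ever needs a 1GiB PUD:      %s\n",
	       vmemmap_bytes >= PUD_SIZE ? "yes" : "no");
	return 0;
}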

Comments

David Hildenbrand Feb. 3, 2021, 1:33 p.m. UTC | #1
On 03.02.21 11:47, Oscar Salvador wrote:
> We never get to allocate 1GB pages when mapping the vmemmap range.
> Drop the dead code both for the aligned and unaligned cases and leave
> only the direct map handling.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> ---
>   arch/x86/mm/init_64.c | 31 ++++---------------------------
>   1 file changed, 4 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index b0e1d215c83e..28729c6b9775 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>   	unsigned long next, pages = 0;
>   	pmd_t *pmd_base;
>   	pud_t *pud;
> -	void *page_addr;
>   
>   	pud = pud_start + pud_index(addr);
>   	for (; addr < end; addr = next, pud++) {
> @@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>   			continue;
>   
>   		if (pud_large(*pud)) {
> -			if (IS_ALIGNED(addr, PUD_SIZE) &&
> -			    IS_ALIGNED(next, PUD_SIZE)) {
> -				if (!direct)
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -				spin_lock(&init_mm.page_table_lock);
> -				pud_clear(pud);
> -				spin_unlock(&init_mm.page_table_lock);
> -				pages++;
> -			} else {
> -				/* If here, we are freeing vmemmap pages. */
> -				memset((void *)addr, PAGE_INUSE, next - addr);
> -
> -				page_addr = page_address(pud_page(*pud));
> -				if (!memchr_inv(page_addr, PAGE_INUSE,
> -						PUD_SIZE)) {
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -					spin_lock(&init_mm.page_table_lock);
> -					pud_clear(pud);
> -					spin_unlock(&init_mm.page_table_lock);
> -				}
> -			}
> -
> +			spin_lock(&init_mm.page_table_lock);
> +			pud_clear(pud);
> +			spin_unlock(&init_mm.page_table_lock);
> +			pages++;
>   			continue;
>   		}

One problem I see with the existing code, and which this change makes
more obvious, is that when we try to remove memory at a different
granularity than it was added (e.g., unplugging a 128MB DIMM that was
available during boot), we remove the direct map of unrelated DIMMs.

I think we should keep the

if (IS_ALIGNED(addr, PUD_SIZE) &&
     IS_ALIGNED(next, PUD_SIZE)) {
...
}

bits. Thoughts?

Apart from that looks good.
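
To make the concern concrete, here is a small userspace sketch of the
alignment guard referred to above (the addresses are made up for
illustration): a 128MiB unplug range that sits inside a 1GiB PUD
mapping is not PUD-aligned on both ends, and that failing check is what
keeps the shared 1GiB direct mapping from being cleared.

/*
 * Illustration only: models the IS_ALIGNED(addr/next, PUD_SIZE) guard.
 * A 128MiB removal range inside a 1GiB mapping fails the check, so the
 * PUD that still maps neighbouring memory must not be cleared.
 */
#include <stdio.h>

#define PUD_SIZE          (1UL << 30)                    /* 1GiB */
#define IS_ALIGNED(x, a)  (((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long addr = (4UL << 30) + (512UL << 20); /* made-up: 4GiB + 512MiB */
	unsigned long next = addr + (128UL << 20);        /* a 128MB DIMM */

	if (IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(next, PUD_SIZE))
		printf("range covers the whole PUD: safe to clear it\n");
	else
		printf("partial PUD: clearing it would unmap unrelated memory\n");

	return 0;
}
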
Oscar Salvador Feb. 3, 2021, 2:10 p.m. UTC | #2
On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> One problem I see with the existing code, and which this change makes more
> obvious, is that when we try to remove memory at a different granularity
> than it was added (e.g., unplugging a 128MB DIMM that was available during
> boot), we remove the direct map of unrelated DIMMs.

So, let me see if I understand your concern.

We have a range that was mapped with a 1GB page, and we try to remove
a 128MB chunk from it.
Yes, in that case we would clear the pud, and that is bad, so we should
keep the PAGE_ALIGNED checks.

Now, let us assume that scenario.
If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
(e.g., 128M), the direct mapping of that range will never be cleared unless
I am missing something (and the page tables won't be freed)?
David Hildenbrand Feb. 3, 2021, 2:12 p.m. UTC | #3
On 03.02.21 15:10, Oscar Salvador wrote:
> On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
>> One problem I see with the existing code, and which this change makes more
>> obvious, is that when we try to remove memory at a different granularity
>> than it was added (e.g., unplugging a 128MB DIMM that was available during
>> boot), we remove the direct map of unrelated DIMMs.
> 
> So, let me see if I understand your concern.
> 
> We have a range that was mapped with a 1GB page, and we try to remove
> a 128MB chunk from it.
> Yes, in that case we would clear the pud, and that is bad, so we should
> keep the PAGE_ALIGNED checks.
> 
> Now, let us assume that scenario.
> If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
> (e.g., 128M), the direct mapping of that range will never be cleared unless

No, you are not missing anything; that's exactly what's happening. The
good thing is that it barely ever happens, so I assume leaving behind
some direct mapping / page tables is not that bad.
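
As a quick illustration of the scenario discussed above (values made up
for illustration): walking a PUD-mapped 1GiB range in 128MiB chunks, no
single chunk ever covers the whole PUD, so with the alignment check in
place the PUD is never cleared and the page tables behind it are never
freed. That matches the "leftover direct mapping / page tables"
behaviour described in this thread.

/*
 * Illustration only: remove a 1GiB PUD-mapped range in 128MiB steps and
 * count how many per-chunk calls would satisfy the alignment guard.
 * None do, so the PUD (and its page tables) would be left behind.
 */
#include <stdio.h>

#define PUD_SIZE          (1UL << 30)   /* 1GiB */
#define CHUNK             (128UL << 20) /* 128MiB */
#define IS_ALIGNED(x, a)  (((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long start = 4UL << 30;  /* made-up, PUD-aligned base */
	unsigned long addr, cleared = 0;

	for (addr = start; addr < start + PUD_SIZE; addr += CHUNK) {
		unsigned long next = addr + CHUNK;

		/* mirrors the guarded path in remove_pud_table() */
		if (IS_ALIGNED(addr, PUD_SIZE) && IS_ALIGNED(next, PUD_SIZE))
			cleared++;
	}

	printf("chunks that would clear the PUD: %lu (expected 0)\n", cleared);
	return 0;
}
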
Oscar Salvador Feb. 3, 2021, 2:15 p.m. UTC | #4
On Wed, Feb 03, 2021 at 03:12:05PM +0100, David Hildenbrand wrote:
> On 03.02.21 15:10, Oscar Salvador wrote:
> > On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> > > One problem I see with the existing code, and which this change makes more
> > > obvious, is that when we try to remove memory at a different granularity
> > > than it was added (e.g., unplugging a 128MB DIMM that was available during
> > > boot), we remove the direct map of unrelated DIMMs.
> > 
> > So, let me see if I understand your concern.
> > 
> > We have a range that was mapped with a 1GB page, and we try to remove
> > a 128MB chunk from it.
> > Yes, in that case we would clear the pud, and that is bad, so we should
> > keep the PAGE_ALIGNED checks.
> > 
> > Now, let us assume that scenario.
> > If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
> > (e.g., 128M), the direct mapping of that range will never be cleared unless
> 
> No, you are not missing anything; that's exactly what's happening. The good
> thing is that it barely ever happens, so I assume leaving behind some direct
> mapping / page tables is not that bad.

Sorry, I meant that this is already the current situation.

Then let us keep the PAGE_ALIGNED stuff.

I shall resend a v3 later today.


thanks for the review ;-)

Patch

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b0e1d215c83e..28729c6b9775 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
 	pud_t *pud;
-	void *page_addr;
 
 	pud = pud_start + pud_index(addr);
 	for (; addr < end; addr = next, pud++) {
@@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 			continue;
 
 		if (pud_large(*pud)) {
-			if (IS_ALIGNED(addr, PUD_SIZE) &&
-			    IS_ALIGNED(next, PUD_SIZE)) {
-				if (!direct)
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-				spin_lock(&init_mm.page_table_lock);
-				pud_clear(pud);
-				spin_unlock(&init_mm.page_table_lock);
-				pages++;
-			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pud_page(*pud));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PUD_SIZE)) {
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-					spin_lock(&init_mm.page_table_lock);
-					pud_clear(pud);
-					spin_unlock(&init_mm.page_table_lock);
-				}
-			}
-
+			spin_lock(&init_mm.page_table_lock);
+			pud_clear(pud);
+			spin_unlock(&init_mm.page_table_lock);
+			pages++;
 			continue;
 		}