diff mbox

[2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()

Message ID 20170330163849.18402-3-punit.agrawal@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Punit Agrawal March 30, 2017, 4:38 p.m. UTC
huge_pte_offset() does not correctly handle poisoned or migration page
table entries. Not knowing the size of the hugepage entry being
requested only compounded the problem.

The recently added hstate parameter can be used to determine the size of
hugepage being accessed. Use the size to find the correct page table
entry to return when coming across a swap page table entry.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

Comments

Mark Rutland March 31, 2017, 9:52 a.m. UTC | #1
Hi Punit,

On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
> huge_pte_offset() does not correctly handle poisoned or migration page
> table entries. 

What exactly does it do wrong?

Judging by the patch, we return NULL in some cases we shouldn't, right?

What can result from this? e.g. can we see data corruption?

> Not knowing the size of the hugepage entry being
> requested only compounded the problem.
> 
> The recently added hstate parameter can be used to determine the size of
> hugepage being accessed. Use the size to find the correct page table
> entry to return when coming across a swap page table entry.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: David Woods <dwoods@mellanox.com>

Given this is a fix for a bug, it sounds like it should have a fixes
tag, or a Cc stable...

Thanks,
Mark.

> ---
>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 9ca742c4c1ab..44014403081f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, struct hstate *h)
>  {
> +	unsigned long sz = huge_page_size(h);
>  	pgd_t *pgd;
>  	pud_t *pud;
> -	pmd_t *pmd = NULL;
> -	pte_t *pte = NULL;
> +	pmd_t *pmd;
> +	pte_t *pte;
>  
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>  	if (!pgd_present(*pgd))
>  		return NULL;
> +
>  	pud = pud_offset(pgd, addr);
> -	if (!pud_present(*pud))
> +	if (pud_none(*pud) && sz != PUD_SIZE)
>  		return NULL;
> -
> -	if (pud_huge(*pud))
> +	else if (!pud_table(*pud))
>  		return (pte_t *)pud;
> +
> +	if (sz == CONT_PMD_SIZE)
> +		addr &= CONT_PMD_MASK;
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd))
> +	if (pmd_none(*pmd) &&
> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>  		return NULL;
> -
> -	if (pte_cont(pmd_pte(*pmd))) {
> -		pmd = pmd_offset(
> -			pud, (addr & CONT_PMD_MASK));
> -		return (pte_t *)pmd;
> -	}
> -	if (pmd_huge(*pmd))
> +	else if (!pmd_table(*pmd))
>  		return (pte_t *)pmd;
> -	pte = pte_offset_kernel(pmd, addr);
> -	if (pte_present(*pte) && pte_cont(*pte)) {
> +
> +	if (sz == CONT_PTE_SIZE) {
>  		pte = pte_offset_kernel(
>  			pmd, (addr & CONT_PTE_MASK));
>  		return pte;
>  	}
> +
>  	return NULL;
>  }
>  
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Punit Agrawal April 4, 2017, 6:47 p.m. UTC | #2
Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> Hi Punit,
>
> On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
>> huge_pte_offset() does not correctly handle poisoned or migration page
>> table entries. 
>
> What exactly does it do wrong?
>
> Judging by the patch, we return NULL in some cases we shouldn't, right?

huge_pte_offset() returns NULL when it comes across swap entries for any
of the supported hugepage sizes.

>
> What can result from this? e.g. can we see data corruption?

In the tests I am running, it results in an error in the log -

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

when unmapping the page tables for the process that owns the poisoned
page.

In some instances, returning NULL instead of swap entries could lead to
data corruption - especially when the page tables contain migration swap
entries. But since hugepage migration is not enabled on arm64 I haven't
seen any corruption.

I've updated the commit log with more details locally.

>
>> Not knowing the size of the hugepage entry being
>> requested only compounded the problem.
>> 
>> The recently added hstate parameter can be used to determine the size of
>> hugepage being accessed. Use the size to find the correct page table
>> entry to return when coming across a swap page table entry.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: David Woods <dwoods@mellanox.com>
>
> Given this is a fix for a bug, it sounds like it should have a fixes
> tag, or a Cc stable...

The problem doesn't occur until we enable memory failure handling. So
there shouldn't be a problem on earlier kernels.

Thanks,
Punit

>
> Thanks,
> Mark.
>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>>  1 file changed, 16 insertions(+), 15 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca742c4c1ab..44014403081f 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  pte_t *huge_pte_offset(struct mm_struct *mm,
>>  		       unsigned long addr, struct hstate *h)
>>  {
>> +	unsigned long sz = huge_page_size(h);
>>  	pgd_t *pgd;
>>  	pud_t *pud;
>> -	pmd_t *pmd = NULL;
>> -	pte_t *pte = NULL;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>>  
>>  	pgd = pgd_offset(mm, addr);
>>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>>  	if (!pgd_present(*pgd))
>>  		return NULL;
>> +
>>  	pud = pud_offset(pgd, addr);
>> -	if (!pud_present(*pud))
>> +	if (pud_none(*pud) && sz != PUD_SIZE)
>>  		return NULL;
>> -
>> -	if (pud_huge(*pud))
>> +	else if (!pud_table(*pud))
>>  		return (pte_t *)pud;
>> +
>> +	if (sz == CONT_PMD_SIZE)
>> +		addr &= CONT_PMD_MASK;
>> +
>>  	pmd = pmd_offset(pud, addr);
>> -	if (!pmd_present(*pmd))
>> +	if (pmd_none(*pmd) &&
>> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>>  		return NULL;
>> -
>> -	if (pte_cont(pmd_pte(*pmd))) {
>> -		pmd = pmd_offset(
>> -			pud, (addr & CONT_PMD_MASK));
>> -		return (pte_t *)pmd;
>> -	}
>> -	if (pmd_huge(*pmd))
>> +	else if (!pmd_table(*pmd))
>>  		return (pte_t *)pmd;
>> -	pte = pte_offset_kernel(pmd, addr);
>> -	if (pte_present(*pte) && pte_cont(*pte)) {
>> +
>> +	if (sz == CONT_PTE_SIZE) {
>>  		pte = pte_offset_kernel(
>>  			pmd, (addr & CONT_PTE_MASK));
>>  		return pte;
>>  	}
>> +
>>  	return NULL;
>>  }
>>  
>> -- 
>> 2.11.0
>> 
>> 
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff mbox

Patch

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca742c4c1ab..44014403081f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -192,38 +192,39 @@  pte_t *huge_pte_alloc(struct mm_struct *mm,
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, struct hstate *h)
 {
+	unsigned long sz = huge_page_size(h);
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd = NULL;
-	pte_t *pte = NULL;
+	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
 	if (!pgd_present(*pgd))
 		return NULL;
+
 	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
-
-	if (pud_huge(*pud))
+	else if (!pud_table(*pud))
 		return (pte_t *)pud;
+
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
-
-	if (pte_cont(pmd_pte(*pmd))) {
-		pmd = pmd_offset(
-			pud, (addr & CONT_PMD_MASK));
-		return (pte_t *)pmd;
-	}
-	if (pmd_huge(*pmd))
+	else if (!pmd_table(*pmd))
 		return (pte_t *)pmd;
-	pte = pte_offset_kernel(pmd, addr);
-	if (pte_present(*pte) && pte_cont(*pte)) {
+
+	if (sz == CONT_PTE_SIZE) {
 		pte = pte_offset_kernel(
 			pmd, (addr & CONT_PTE_MASK));
 		return pte;
 	}
+
 	return NULL;
 }