
[v2,1/1] arm64: mm: correct the inside linear map boundaries during hotplug check

Message ID 20210215192237.362706-2-pasha.tatashin@soleen.com (mailing list archive)
State New, archived
Series correct the inside linear map boundaries during hotplug check

Commit Message

Pasha Tatashin Feb. 15, 2021, 7:22 p.m. UTC
Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
linear map range is not checked correctly.

Because of randomization, the start physical address covered by the linear
map can actually be at the end of the range, i.e. numerically larger than
the end physical address. Check for that and, if so, reduce the start to 0.

This can be verified on QEMU by setting kaslr-seed to ~0ul:

memstart_offset_seed = 0xffff
START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000
END:   __pa(PAGE_END - 1) =  1000bfffffff

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping")
Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
---
 arch/arm64/mm/mmu.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
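For illustration, here is a minimal user-space sketch (not kernel code) that plugs the START and END values from the QEMU log into the pre-patch and patched comparisons. The helper names `old_inside()` and `new_inside()`, the `u64` typedef and the sample hotplug range are assumptions made for the example; `assert()` is a stricter stand-in for `WARN_ON()`, which in the kernel only warns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* START/END as printed in the QEMU log above (kaslr-seed = ~0ul). */
static const u64 start_linear_pa = 0xffff9000c0000000ULL;
static const u64 end_linear_pa   = 0x00001000bfffffffULL;

/* Pre-patch check (hypothetical name): compares against the wrapped start. */
bool old_inside(u64 start, u64 size)
{
	return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
}

/* Patched check (hypothetical name): a start above the end means the range wrapped. */
bool new_inside(u64 start, u64 size)
{
	u64 lo = start_linear_pa;

	if (lo > end_linear_pa)
		lo = 0;				/* [0, end] still covers all PAs */
	assert(lo <= end_linear_pa);		/* stand-in for WARN_ON() */
	return start >= lo && (start + size - 1) <= end_linear_pa;
}
```

With these values, a hypothetical 1 GiB hot-add at PA 0x100000000 is rejected by `old_inside()`, because it lies below the wrapped START, but accepted by `new_inside()`.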

Comments

Ard Biesheuvel Feb. 15, 2021, 7:27 p.m. UTC | #1
On Mon, 15 Feb 2021 at 20:22, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
> linear map range is not checked correctly.
>
> The start physical address that linear map covers can be actually at the
> end of the range because of randomization. Check that and if so reduce it
> to 0.
>
> This can be verified on QEMU with setting kaslr-seed to ~0ul:
>
> memstart_offset_seed = 0xffff
> START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000
> END:   __pa(PAGE_END - 1) =  1000bfffffff
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping")
> Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>

> ---
>  arch/arm64/mm/mmu.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ae0c3d023824..cc16443ea67f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>
>  static bool inside_linear_region(u64 start, u64 size)
>  {
> +       u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
> +       u64 end_linear_pa = __pa(PAGE_END - 1);
> +
> +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
> +               /*
> +                * Check for a wrap, it is possible because of randomized linear
> +                * mapping the start physical address is actually bigger than
> +                * the end physical address. In this case set start to zero
> +                * because [0, end_linear_pa] range must still be able to cover
> +                * all addressable physical addresses.
> +                */
> +               if (start_linear_pa > end_linear_pa)
> +                       start_linear_pa = 0;
> +       }
> +
> +       WARN_ON(start_linear_pa > end_linear_pa);
> +
>         /*
>          * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>          * accommodating both its ends but excluding PAGE_END. Max physical
>          * range which can be mapped inside this linear mapping range, must
>          * also be derived from its end points.
>          */
> -       return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
> -              (start + size - 1) <= __pa(PAGE_END - 1);

Can't we simply use signed arithmetic here? This expression works fine
if the quantities are all interpreted as s64 instead of u64


> +       return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
>  }
>
>  int arch_add_memory(int nid, u64 start, u64 size,
> --
> 2.25.1
>
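Ard's alternative can be sketched the same way: cast every quantity to `s64` and keep the original comparison. This is a hypothetical user-space helper, not proposed kernel code; it assumes the usual two's-complement wrap when an out-of-range `u64` is converted to `s64`, which ISO C leaves implementation-defined but GCC and Clang define that way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Hypothetical helper: the original expression, evaluated in s64. */
bool inside_signed(u64 start, u64 size, u64 start_linear_pa, u64 end_linear_pa)
{
	assert(size != 0);		/* start + size - 1 would underflow */

	s64 s = (s64)start;
	s64 e = (s64)(start + size - 1);

	/* A wrapped start_linear_pa becomes negative, so s >= it holds. */
	return s >= (s64)start_linear_pa && e <= (s64)end_linear_pa;
}
```

With START = 0xffff9000c0000000 and END = 0x1000bfffffff, START is negative as an `s64`, so no explicit clamp is needed. The catch, as raised in the reply below, is that correctness then rests entirely on the casts.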
Pasha Tatashin Feb. 15, 2021, 7:30 p.m. UTC | #2
> Can't we simply use signed arithmetic here? This expression works fine
> if the quantities are all interpreted as s64 instead of u64

I was thinking about that, but I do not like the idea of using signed
arithmetic for physical addresses. Also, I am worried that someone in
the future will unknowingly change the types to unsigned or to
phys_addr_t. It is safer to explicitly set start to 0 in case of a wrap.
Ard Biesheuvel Feb. 15, 2021, 7:34 p.m. UTC | #3
On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> > Can't we simply use signed arithmetic here? This expression works fine
> > if the quantities are all interpreted as s64 instead of u64
>
> I was thinking about that, but I do not like the idea of using sign
> arithmetics for physical addresses. Also, I am worried that someone in
> the future will unknowingly change it to unsigns or to phys_addr_t. It
> is safer to have start explicitly set to 0 in case of wrap.

memstart_addr is already a s64 for this exact reason.

Btw, the KASLR check is incorrect: memstart_addr could also be
negative when running the 52-bit VA kernel on hardware that is only
48-bit VA capable.
Pasha Tatashin Feb. 15, 2021, 7:51 p.m. UTC | #4
On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > > Can't we simply use signed arithmetic here? This expression works fine
> > > if the quantities are all interpreted as s64 instead of u64
> >
> > I was thinking about that, but I do not like the idea of using sign
> > arithmetics for physical addresses. Also, I am worried that someone in
> > the future will unknowingly change it to unsigns or to phys_addr_t. It
> > is safer to have start explicitly set to 0 in case of wrap.
>
> memstart_addr is already a s64 for this exact reason.

memstart_addr is basically an offset, and it must be allowed to be
negative. For example, this would not work if it were not signed:
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))

However, on powerpc it is of type phys_addr_t.
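To make the `vmemmap` point concrete, this sketch isolates the `memstart_addr >> PAGE_SHIFT` step, using the negative memstart_addr value reported later in this thread (fff1000040000000). It assumes 4K pages (`PAGE_SHIFT` = 12) and that `>>` on a negative signed value is an arithmetic shift, which ISO C leaves implementation-defined but GCC and Clang implement that way. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */

/* Negative memstart_addr (52-bit VA kernel on 48-bit VA hardware). */
static const uint64_t memstart_bits = 0xfff1000040000000ULL;

/* Signed: the arithmetic shift keeps the sign, giving a small negative PFN offset. */
int64_t pfn_offset_signed(void)
{
	return (int64_t)memstart_bits >> PAGE_SHIFT;
}

/* Unsigned: the logical shift turns the same bits into a huge positive offset. */
uint64_t pfn_offset_unsigned(void)
{
	return memstart_bits >> PAGE_SHIFT;
}
```

The two results differ in their top 12 bits, so subtracting the unsigned one from `VMEMMAP_START` would not yield the intended `struct page` base.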

>
> Btw, the KASLR check is incorrect: memstart_addr could also be
> negative when running the 52-bit VA kernel on hardware that is only
> 48-bit VA capable.

Good point!

if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
    memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);

So, I will remove the IS_ENABLED(CONFIG_RANDOMIZE_BASE) check again.

I am OK with changing start_linear_pa and end_linear_pa to signed types,
but IMO what I have now is actually safer and makes sure this does not
break again in the future.
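The adjustment above can be checked numerically. This user-space sketch assumes `_PAGE_OFFSET(va)` expands to `-(1UL << (va))` and that the kernel is built with 52-bit VA, so the `IS_ENABLED(CONFIG_ARM64_VA_BITS_52)` test is taken as true; `adjust_memstart()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Assumed definition: the linear map starts at -2^va. */
#define _PAGE_OFFSET(va)	(-(UINT64_C(1) << (va)))

/* Hypothetical helper mirroring the snippet above, for a VA_BITS_52 kernel. */
s64 adjust_memstart(s64 memstart_addr, int vabits_actual)
{
	assert(vabits_actual == 48 || vabits_actual == 52);

	/* On 48-bit hardware, shift down by the gap between the two PAGE_OFFSETs. */
	if (vabits_actual != 52)
		memstart_addr -= (s64)(_PAGE_OFFSET(48) - _PAGE_OFFSET(52));
	return memstart_addr;
}
```

Starting from memstart_addr = 0x40000000 on 48-bit hardware, the gap is 0x000f000000000000 and the adjusted value is 0xfff1000040000000, i.e. negative as an `s64`.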
Pasha Tatashin Feb. 15, 2021, 10:28 p.m. UTC | #5
> >
> > Btw, the KASLR check is incorrect: memstart_addr could also be
> > negative when running the 52-bit VA kernel on hardware that is only
> > 48-bit VA capable.
>
> Good point!
>
> if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
>     memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
>
> So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.

Hi Ard,

Actually, looking more closely at this, I do not see how, with 52-bit VA
on a 48-bit VA processor, the start offset can become negative unless
randomization is involved. The start of the linear map points to the
first physical address reported by memblock_start_of_DRAM(), even though
memstart_addr itself is negative. So, I think the current approach using
IS_ENABLED(CONFIG_RANDOMIZE_BASE) is good.

48VA processor with VA_BITS_48:
memstart_addr 40000000
start_linear_pa 40000000
end_linear_pa 80003fffffff

48VA processor with VA_BITS_52:
memstart_addr fff1000040000000   <- Negative
start_linear_pa 40000000  <- positive, and the first PA address
end_linear_pa 80003fffffff

Thank you,
Pasha
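The dumps above, and the KASLR values in the commit message, are consistent with a simple model of the linear-map `__pa()`: offset the VA from the compile-time `PAGE_OFFSET` (derived from `VA_BITS`) and add memstart_addr, wrapping modulo 2^64. The sketch assumes that model, a `PAGE_END` derived from a `VA_BITS_MIN` of 48, and the hypothetical names `linear_pa_range()` and `struct linear_pa`; it is not the kernel's `__pa()` implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define _PAGE_OFFSET(va)	(-(UINT64_C(1) << (va)))
#define _PAGE_END(va)		(-(UINT64_C(1) << ((va) - 1)))

/* Hypothetical result type: the PA bounds checked by inside_linear_region(). */
struct linear_pa { u64 start, end; };

struct linear_pa linear_pa_range(int va_bits, int vabits_actual, u64 memstart_addr)
{
	assert(va_bits >= vabits_actual);

	u64 page_offset = _PAGE_OFFSET(va_bits);	/* compile-time PAGE_OFFSET */
	u64 page_end = _PAGE_END(48);			/* assumption: VA_BITS_MIN == 48 */
	struct linear_pa r;

	/* __pa(va) modeled as (va - PAGE_OFFSET) + memstart_addr, mod 2^64. */
	r.start = (_PAGE_OFFSET(vabits_actual) - page_offset) + memstart_addr;
	r.end = ((page_end - 1) - page_offset) + memstart_addr;
	return r;
}
```

For the two configurations above this yields start 0x40000000 and end 0x80003fffffff in both cases, and with VA_BITS = 48 and memstart_addr = 0xffff9000c0000000 it reproduces the wrapped START/END from the commit message.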
Anshuman Khandual Feb. 16, 2021, 2:55 a.m. UTC | #6
On 2/16/21 12:57 AM, Ard Biesheuvel wrote:
> On Mon, 15 Feb 2021 at 20:22, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>
>> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
>> linear map range is not checked correctly.
>>
>> The start physical address that linear map covers can be actually at the
>> end of the range because of randomization. Check that and if so reduce it
>> to 0.
>>
>> This can be verified on QEMU with setting kaslr-seed to ~0ul:
>>
>> memstart_offset_seed = 0xffff
>> START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000
>> END:   __pa(PAGE_END - 1) =  1000bfffffff
>>
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping")
>> Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> 
>> ---
>>  arch/arm64/mm/mmu.c | 20 ++++++++++++++++++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index ae0c3d023824..cc16443ea67f 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>>
>>  static bool inside_linear_region(u64 start, u64 size)
>>  {
>> +       u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
>> +       u64 end_linear_pa = __pa(PAGE_END - 1);
>> +
>> +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>> +               /*
>> +                * Check for a wrap, it is possible because of randomized linear
>> +                * mapping the start physical address is actually bigger than
>> +                * the end physical address. In this case set start to zero
>> +                * because [0, end_linear_pa] range must still be able to cover
>> +                * all addressable physical addresses.
>> +                */
>> +               if (start_linear_pa > end_linear_pa)
>> +                       start_linear_pa = 0;
>> +       }
>> +
>> +       WARN_ON(start_linear_pa > end_linear_pa);
>> +
>>         /*
>>          * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>>          * accommodating both its ends but excluding PAGE_END. Max physical
>>          * range which can be mapped inside this linear mapping range, must
>>          * also be derived from its end points.
>>          */
>> -       return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
>> -              (start + size - 1) <= __pa(PAGE_END - 1);
> 
> Can't we simply use signed arithmetic here? This expression works fine
> if the quantities are all interpreted as s64 instead of u64

There is a new generic framework which expects the platform to provide two
distinct range points (low and high) for hotplug address comparison. Those
range points can be different depending on whether address randomization
is enabled and the flip occurs. But this comparison here in the platform
code is going away.

This patch needs to be rebased on the new framework, which is part of linux-next.

https://patchwork.kernel.org/project/linux-mm/list/?series=425051
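The split described above can be sketched as follows: the platform reports an inclusive [start, end] window, and generic code does the comparison. `struct pa_range`, `platform_mappable_range()` and `hotplug_range_allowed()` are hypothetical names for illustration, not the interface in the linked series; folding the wrap check into the platform callback is one way the clamp from this patch could carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical range type: inclusive [start, end] physical addresses. */
struct pa_range { u64 start; u64 end; };

/* Platform side (hypothetical): report the mappable window, clamping a wrap. */
struct pa_range platform_mappable_range(u64 start_linear_pa, u64 end_linear_pa)
{
	struct pa_range r = { start_linear_pa, end_linear_pa };

	if (r.start > r.end)
		r.start = 0;
	assert(r.start <= r.end);	/* stand-in for WARN_ON() */
	return r;
}

/* Generic side (hypothetical): the comparison moves out of arch code. */
bool hotplug_range_allowed(u64 start, u64 size, struct pa_range r)
{
	/* Reject size == 0 so that start + size - 1 cannot underflow. */
	return size != 0 && start >= r.start && start + size - 1 <= r.end;
}
```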
Anshuman Khandual Feb. 16, 2021, 3:12 a.m. UTC | #7
On 2/16/21 1:21 AM, Pavel Tatashin wrote:
> On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>>
>>>> Can't we simply use signed arithmetic here? This expression works fine
>>>> if the quantities are all interpreted as s64 instead of u64
>>>
>>> I was thinking about that, but I do not like the idea of using sign
>>> arithmetics for physical addresses. Also, I am worried that someone in
>>> the future will unknowingly change it to unsigns or to phys_addr_t. It
>>> is safer to have start explicitly set to 0 in case of wrap.
>>
>> memstart_addr is already a s64 for this exact reason.
> 
> memstart_addr is basically an offset and it must be negative. For
> example, this would not work if it was not signed:
> #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> 
> However, on powerpc it is phys_addr_t type.
> 
>>
>> Btw, the KASLR check is incorrect: memstart_addr could also be
>> negative when running the 52-bit VA kernel on hardware that is only
>> 48-bit VA capable.
> 
> Good point!
> 
> if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
>     memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
> 
> So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.
> 
> I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
> what I have now is actually safer to make sure that does not break
> again in the future.

An explicit check for the flip-over, providing two different start
address points, would be required in order to use the new framework.
Ard Biesheuvel Feb. 16, 2021, 7:36 a.m. UTC | #8
On Tue, 16 Feb 2021 at 04:12, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> An explicit check for the flip over and providing two different start
> addresses points would be required in order to use the new framework.

I don't think so. We no longer randomize over the same range, but take
the supported PA range into account. (97d6786e0669d)

This should ensure that __pa(_PAGE_OFFSET(vabits_actual)) never
assumes a negative value. And to Pavel's point re 48/52 bit VAs: the
fact that vabits_actual appears in this expression means that it
already takes this into account, so you are correct that we don't have
to care about that here.

So even if memstart_addr could be negative, this expression should
never produce a negative value. And with the patch above applied, it
should never do so when running under KASLR either.

So, a question for Pavel and Tyler: could you please check whether you
have that patch, and whether it fixes the issue? It was introduced in
v5.11 and hasn't been backported yet (it wasn't marked for -stable).
Pasha Tatashin Feb. 16, 2021, 2:34 p.m. UTC | #9
On Tue, Feb 16, 2021 at 2:36 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 16 Feb 2021 at 04:12, Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
> > An explicit check for the flip over and providing two different start
> > addresses points would be required in order to use the new framework.
>
> I don't think so. We no longer randomize over the same range, but take
> the support PA range into account. (97d6786e0669d)
>
> This should ensure that __pa(_PAGE_OFFSET(vabits_actual)) never
> assumes a negative value. And to Pavel's point re 48/52 bit VAs: the
> fact that vabits_actual appears in this expression means that it
> already takes this into account, so you are correct that we don't have
> to care about that here.
>
> So even if memstart_addr could be negative, this expression should
> never produce a negative value. And with the patch above applied, it
> should never do so when running under KASLR either.
>
> So question to Pavel and Tyler: could you please check whether you
> have that patch, and whether it fixes the issue? It was introduced in
> v5.11, and hasn't been backported yet (it wasn't marked for -stable)

97d6786e0669d
arm64: mm: account for hotplug memory when randomizing the linear region

does not address the problem described here. It only addresses the need
to add extra PA space to the linear map, which is indeed needed (btw, is
it possible that hotplug adds memory below memblock_start_of_DRAM()?
That case is not currently accounted for), but not the fact that the
linear map can start at high addresses because of randomization. I have
verified in QEMU, and Tyler verified on real hardware with that commit
backported to 5.10, that the problem this patch fixes is still there.

Pasha
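Whether 97d6786e0669d alone keeps the start non-negative can be estimated with a back-of-the-envelope model of the randomization step. The sketch assumes the step computes `range` as the linear region size minus 2^PARange, then subtracts `ARM64_MEMSTART_ALIGN * ((range / ARM64_MEMSTART_ALIGN * seed) >> 16)` from memstart_addr. It further assumes 1 GiB alignment (4K pages), DRAM at 0x40000000 (QEMU virt), a 44-bit PA range and a 47-bit linear region (48-bit VA). These parameters are guesses, not taken from the thread, and `randomize_memstart()` is a hypothetical name, not the kernel's `arm64_memblock_init()`.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;
typedef uint64_t u64;

#define MEMSTART_ALIGN	(INT64_C(1) << 30)	/* assumption: 4K pages -> 1 GiB */

/* Hypothetical model of the post-97d6786e0669d randomization of memstart_addr. */
s64 randomize_memstart(s64 memstart_addr, int linear_bits, int parange_bits,
		       uint16_t seed)
{
	/* Slack between the linear region and the whole supported PA range. */
	s64 range = (INT64_C(1) << linear_bits) - (INT64_C(1) << parange_bits);

	if (seed > 0 && range >= MEMSTART_ALIGN) {
		range /= MEMSTART_ALIGN;
		memstart_addr -= MEMSTART_ALIGN * ((range * seed) >> 16);
	}
	return memstart_addr;
}
```

Under those assumptions, seed 0xffff gives memstart_addr = 0xffff9000c0000000. In a 48-bit VA configuration that value is also `__pa(_PAGE_OFFSET(vabits_actual))`, and it matches START in the QEMU log, so the linear map start can still wrap with that commit applied.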
Pasha Tatashin Feb. 16, 2021, 2:48 p.m. UTC | #10
> There is a new generic framework which expects the platform to provide two
> distinct range points (low and high) for hotplug address comparison. Those
> range points can be different depending on whether address randomization
> is enabled and the flip occurs. But this comparison here in the platform
> code is going away.
>
> This patch needs to rebased on the new framework which is part of linux-next.
>
> https://patchwork.kernel.org/project/linux-mm/list/?series=425051

Hi Anshuman,

Thanks for letting me know. I will send an updated patch against linux-next.

Thank you,
Pasha

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..cc16443ea67f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 
 static bool inside_linear_region(u64 start, u64 size)
 {
+	u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
+	u64 end_linear_pa = __pa(PAGE_END - 1);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		/*
+		 * Check for a wrap: because the linear mapping is randomized,
+		 * the start physical address may actually be bigger than the
+		 * end physical address. In that case, set start to zero,
+		 * because the [0, end_linear_pa] range must still be able to
+		 * cover all addressable physical addresses.
+		 */
+		if (start_linear_pa > end_linear_pa)
+			start_linear_pa = 0;
+	}
+
+	WARN_ON(start_linear_pa > end_linear_pa);
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,