Message ID | 20210215192237.362706-2-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
State | New, archived |
Series | correct the inside linear map boundaries during hotplug check |
On Mon, 15 Feb 2021 at 20:22, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
> linear map range is not checked correctly.
>
> The start physical address that linear map covers can be actually at the
> end of the range because of randomization. Check that and if so reduce it
> to 0.
>
> This can be verified on QEMU with setting kaslr-seed to ~0ul:
>
> memstart_offset_seed = 0xffff
> START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000
> END: __pa(PAGE_END - 1) = 1000bfffffff
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping")
> Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
>  arch/arm64/mm/mmu.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ae0c3d023824..cc16443ea67f 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>
>  static bool inside_linear_region(u64 start, u64 size)
>  {
> +	u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
> +	u64 end_linear_pa = __pa(PAGE_END - 1);
> +
> +	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
> +		/*
> +		 * Check for a wrap, it is possible because of randomized linear
> +		 * mapping the start physical address is actually bigger than
> +		 * the end physical address. In this case set start to zero
> +		 * because [0, end_linear_pa] range must still be able to cover
> +		 * all addressable physical addresses.
> +		 */
> +		if (start_linear_pa > end_linear_pa)
> +			start_linear_pa = 0;
> +	}
> +
> +	WARN_ON(start_linear_pa > end_linear_pa);
> +
>  	/*
>  	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>  	 * accommodating both its ends but excluding PAGE_END. Max physical
>  	 * range which can be mapped inside this linear mapping range, must
>  	 * also be derived from its end points.
>  	 */
> -	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
> -	       (start + size - 1) <= __pa(PAGE_END - 1);

Can't we simply use signed arithmetic here? This expression works fine
if the quantities are all interpreted as s64 instead of u64

> +	return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
>  }
>
>  int arch_add_memory(int nid, u64 start, u64 size,
> --
> 2.25.1
>
> Can't we simply use signed arithmetic here? This expression works fine
> if the quantities are all interpreted as s64 instead of u64

I was thinking about that, but I do not like the idea of using signed
arithmetic for physical addresses. Also, I am worried that someone in
the future will unknowingly change it to unsigned or to phys_addr_t. It
is safer to have start explicitly set to 0 in case of wrap.
On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> > Can't we simply use signed arithmetic here? This expression works fine
> > if the quantities are all interpreted as s64 instead of u64
>
> I was thinking about that, but I do not like the idea of using signed
> arithmetic for physical addresses. Also, I am worried that someone in
> the future will unknowingly change it to unsigned or to phys_addr_t. It
> is safer to have start explicitly set to 0 in case of wrap.

memstart_addr is already an s64 for this exact reason.

Btw, the KASLR check is incorrect: memstart_addr could also be
negative when running the 52-bit VA kernel on hardware that is only
48-bit VA capable.
On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > > Can't we simply use signed arithmetic here? This expression works fine
> > > if the quantities are all interpreted as s64 instead of u64
> >
> > I was thinking about that, but I do not like the idea of using signed
> > arithmetic for physical addresses. Also, I am worried that someone in
> > the future will unknowingly change it to unsigned or to phys_addr_t. It
> > is safer to have start explicitly set to 0 in case of wrap.
>
> memstart_addr is already an s64 for this exact reason.

memstart_addr is basically an offset and it must be negative. For
example, this would not work if it was not signed:
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))

However, on powerpc it is of type phys_addr_t.

>
> Btw, the KASLR check is incorrect: memstart_addr could also be
> negative when running the 52-bit VA kernel on hardware that is only
> 48-bit VA capable.

Good point!

if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
	memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);

So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.

I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
what I have now is actually safer to make sure that this does not break
again in the future.
> >
> > Btw, the KASLR check is incorrect: memstart_addr could also be
> > negative when running the 52-bit VA kernel on hardware that is only
> > 48-bit VA capable.
>
> Good point!
>
> if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> 	memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
>
> So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.

Hi Ard,

Actually, looking more at this, I do not see how, with 52-bit VA on a
48-bit VA processor, the start offset can become negative unless
randomization is involved. The start of the linear map will point to the
first physical address that is reported by memblock_start_of_DRAM().
memstart_addr, however, will be negative. So, I think the current
approach using IS_ENABLED(CONFIG_RANDOMIZE_BASE) is good.

48VA processor with VA_BITS_48:
memstart_addr   40000000
start_linear_pa 40000000
end_linear_pa   80003fffffff

48VA processor with VA_BITS_52:
memstart_addr   fff1000040000000 <- negative
start_linear_pa 40000000         <- positive, and the first PA address
end_linear_pa   80003fffffff

Thank you,
Pasha
On 2/16/21 12:57 AM, Ard Biesheuvel wrote:
> On Mon, 15 Feb 2021 at 20:22, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>
>> Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
>> linear map range is not checked correctly.
>>
>> The start physical address that linear map covers can be actually at the
>> end of the range because of randomization. Check that and if so reduce it
>> to 0.
>>
>> This can be verified on QEMU with setting kaslr-seed to ~0ul:
>>
>> memstart_offset_seed = 0xffff
>> START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000
>> END: __pa(PAGE_END - 1) = 1000bfffffff
>>
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping")
>> Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
>
>> ---
>>  arch/arm64/mm/mmu.c | 20 ++++++++++++++++++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index ae0c3d023824..cc16443ea67f 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>>
>>  static bool inside_linear_region(u64 start, u64 size)
>>  {
>> +	u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
>> +	u64 end_linear_pa = __pa(PAGE_END - 1);
>> +
>> +	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>> +		/*
>> +		 * Check for a wrap, it is possible because of randomized linear
>> +		 * mapping the start physical address is actually bigger than
>> +		 * the end physical address. In this case set start to zero
>> +		 * because [0, end_linear_pa] range must still be able to cover
>> +		 * all addressable physical addresses.
>> +		 */
>> +		if (start_linear_pa > end_linear_pa)
>> +			start_linear_pa = 0;
>> +	}
>> +
>> +	WARN_ON(start_linear_pa > end_linear_pa);
>> +
>>  	/*
>>  	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>>  	 * accommodating both its ends but excluding PAGE_END. Max physical
>>  	 * range which can be mapped inside this linear mapping range, must
>>  	 * also be derived from its end points.
>>  	 */
>> -	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
>> -	       (start + size - 1) <= __pa(PAGE_END - 1);
>
> Can't we simply use signed arithmetic here? This expression works fine
> if the quantities are all interpreted as s64 instead of u64

There is a new generic framework which expects the platform to provide two
distinct range points (low and high) for hotplug address comparison. Those
range points can be different depending on whether address randomization
is enabled and the flip occurs. But this comparison here in the platform
code is going away.

This patch needs to be rebased on the new framework, which is part of linux-next.

https://patchwork.kernel.org/project/linux-mm/list/?series=425051
On 2/16/21 1:21 AM, Pavel Tatashin wrote:
> On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>>>
>>>> Can't we simply use signed arithmetic here? This expression works fine
>>>> if the quantities are all interpreted as s64 instead of u64
>>>
>>> I was thinking about that, but I do not like the idea of using signed
>>> arithmetic for physical addresses. Also, I am worried that someone in
>>> the future will unknowingly change it to unsigned or to phys_addr_t. It
>>> is safer to have start explicitly set to 0 in case of wrap.
>>
>> memstart_addr is already an s64 for this exact reason.
>
> memstart_addr is basically an offset and it must be negative. For
> example, this would not work if it was not signed:
> #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
>
> However, on powerpc it is of type phys_addr_t.
>
>>
>> Btw, the KASLR check is incorrect: memstart_addr could also be
>> negative when running the 52-bit VA kernel on hardware that is only
>> 48-bit VA capable.
>
> Good point!
>
> if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> 	memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
>
> So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.
>
> I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
> what I have now is actually safer to make sure that this does not break
> again in the future.

An explicit check for the flip-over and providing two different start
address points would be required in order to use the new framework.
On Tue, 16 Feb 2021 at 04:12, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>
>
>
> On 2/16/21 1:21 AM, Pavel Tatashin wrote:
> > On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >>
> >> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >>>
> >>>> Can't we simply use signed arithmetic here? This expression works fine
> >>>> if the quantities are all interpreted as s64 instead of u64
> >>>
> >>> I was thinking about that, but I do not like the idea of using signed
> >>> arithmetic for physical addresses. Also, I am worried that someone in
> >>> the future will unknowingly change it to unsigned or to phys_addr_t. It
> >>> is safer to have start explicitly set to 0 in case of wrap.
> >>
> >> memstart_addr is already an s64 for this exact reason.
> >
> > memstart_addr is basically an offset and it must be negative. For
> > example, this would not work if it was not signed:
> > #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> >
> > However, on powerpc it is of type phys_addr_t.
> >
> >>
> >> Btw, the KASLR check is incorrect: memstart_addr could also be
> >> negative when running the 52-bit VA kernel on hardware that is only
> >> 48-bit VA capable.
> >
> > Good point!
> >
> > if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> > 	memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
> >
> > So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.
> >
> > I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
> > what I have now is actually safer to make sure that this does not break
> > again in the future.
> An explicit check for the flip-over and providing two different start
> address points would be required in order to use the new framework.

I don't think so. We no longer randomize over the same range, but take
the supported PA range into account. (97d6786e0669d)

This should ensure that __pa(_PAGE_OFFSET(vabits_actual)) never
assumes a negative value. And to Pavel's point re 48/52-bit VAs: the
fact that vabits_actual appears in this expression means that it
already takes this into account, so you are correct that we don't have
to care about that here.

So even if memstart_addr could be negative, this expression should
never produce a negative value. And with the patch above applied, it
should never do so when running under KASLR either.

So, a question to Pavel and Tyler: could you please check whether you
have that patch, and whether it fixes the issue? It was introduced in
v5.11, and hasn't been backported yet (it wasn't marked for -stable).
On Tue, Feb 16, 2021 at 2:36 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 16 Feb 2021 at 04:12, Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
> >
> >
> >
> > On 2/16/21 1:21 AM, Pavel Tatashin wrote:
> > > On Mon, Feb 15, 2021 at 2:34 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >>
> > >> On Mon, 15 Feb 2021 at 20:30, Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> > >>>
> > >>>> Can't we simply use signed arithmetic here? This expression works fine
> > >>>> if the quantities are all interpreted as s64 instead of u64
> > >>>
> > >>> I was thinking about that, but I do not like the idea of using signed
> > >>> arithmetic for physical addresses. Also, I am worried that someone in
> > >>> the future will unknowingly change it to unsigned or to phys_addr_t. It
> > >>> is safer to have start explicitly set to 0 in case of wrap.
> > >>
> > >> memstart_addr is already an s64 for this exact reason.
> > >
> > > memstart_addr is basically an offset and it must be negative. For
> > > example, this would not work if it was not signed:
> > > #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> > >
> > > However, on powerpc it is of type phys_addr_t.
> > >
> > >>
> > >> Btw, the KASLR check is incorrect: memstart_addr could also be
> > >> negative when running the 52-bit VA kernel on hardware that is only
> > >> 48-bit VA capable.
> > >
> > > Good point!
> > >
> > > if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> > > 	memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
> > >
> > > So, I will remove IS_ENABLED(CONFIG_RANDOMIZE_BASE) again.
> > >
> > > I am OK to change start_linear_pa, end_linear_pa to signed, but IMO
> > > what I have now is actually safer to make sure that this does not break
> > > again in the future.
> > An explicit check for the flip-over and providing two different start
> > address points would be required in order to use the new framework.
>
> I don't think so. We no longer randomize over the same range, but take
> the supported PA range into account. (97d6786e0669d)
>
> This should ensure that __pa(_PAGE_OFFSET(vabits_actual)) never
> assumes a negative value. And to Pavel's point re 48/52-bit VAs: the
> fact that vabits_actual appears in this expression means that it
> already takes this into account, so you are correct that we don't have
> to care about that here.
>
> So even if memstart_addr could be negative, this expression should
> never produce a negative value. And with the patch above applied, it
> should never do so when running under KASLR either.
>
> So, a question to Pavel and Tyler: could you please check whether you
> have that patch, and whether it fixes the issue? It was introduced in
> v5.11, and hasn't been backported yet (it wasn't marked for -stable).

Commit 97d6786e0669d ("arm64: mm: account for hotplug memory when
randomizing the linear region") does not address the problem that is
described in this bug. It only addresses the problem of adding extra PA
space to the linear map, which is indeed needed (btw, is it possible
that hotplug is going to add memory below memblock_start_of_DRAM()? That
case is not currently accounted for), but not the fact that the linear
map can start from high addresses because of randomization. I have
verified in QEMU, and Tyler verified on real hardware with that commit
backported to 5.10, that the problem this patch fixes is still there.

Pasha
> There is a new generic framework which expects the platform to provide two
> distinct range points (low and high) for hotplug address comparison. Those
> range points can be different depending on whether address randomization
> is enabled and the flip occurs. But this comparison here in the platform
> code is going away.
>
> This patch needs to be rebased on the new framework, which is part of linux-next.
>
> https://patchwork.kernel.org/project/linux-mm/list/?series=425051

Hi Anshuman,

Thanks for letting me know. I will send an updated patch against linux-next.

Thank you,
Pasha
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..cc16443ea67f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,14 +1444,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)

 static bool inside_linear_region(u64 start, u64 size)
 {
+	u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
+	u64 end_linear_pa = __pa(PAGE_END - 1);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		/*
+		 * Check for a wrap, it is possible because of randomized linear
+		 * mapping the start physical address is actually bigger than
+		 * the end physical address. In this case set start to zero
+		 * because [0, end_linear_pa] range must still be able to cover
+		 * all addressable physical addresses.
+		 */
+		if (start_linear_pa > end_linear_pa)
+			start_linear_pa = 0;
+	}
+
+	WARN_ON(start_linear_pa > end_linear_pa);
+
 	/*
 	 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 	 * accommodating both its ends but excluding PAGE_END. Max physical
 	 * range which can be mapped inside this linear mapping range, must
 	 * also be derived from its end points.
 	 */
-	return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-	       (start + size - 1) <= __pa(PAGE_END - 1);
+	return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
 }

 int arch_add_memory(int nid, u64 start, u64 size,