mbox series

[v4,0/5] ARM: decompressor: use by-VA cache maintenance for v7 cores

Message ID 20200226165738.11201-1-ardb@kernel.org (mailing list archive)
Headers show
Series ARM: decompressor: use by-VA cache maintenance for v7 cores | expand

Message

Ard Biesheuvel Feb. 26, 2020, 4:57 p.m. UTC
While making changes to the EFI stub startup code, I noticed that we are
still doing set/way maintenance on the caches when booting on v7 cores.
This works today on VMs by virtue of the fact that KVM traps set/way ops
and cleans the whole address space by VA on behalf of the guest, and on
most v7 hardware, the set/way ops are in fact sufficient when only one
core is running, as there usually is no system cache. But on systems
like SynQuacer, for which 32-bit firmware is available, the current cache
maintenance only pushes the data out to the L3 system cache, where it
is not visible to the CPU once it turns the MMU and caches off.

So instead, switch to the by-VA cache maintenance that the architecture
requires for v7 and later (and ARM1176, as a side effect).

Changes since v3:
- ensure that the region that is cleaned after self-relocation of the zImage
  covers the appended DTB, if present

Apologies to Linus, but due to this change, I decided not to take your
Tested-by into account, and I would appreciate it if you could retest
this version of the series? Thanks.

Changes since v2:
- add a patch to factor out the code sequence that obtains the inflated image
  size by doing an unaligned LE32 load from the end of the compressed data
- use new macro to load the inflated image size instead of doing a potentially
  unaligned load
- omit the stack for getting the base and size of the self-relocated zImage

Changes since v1:
- include the EFI patch that was sent out separately before (#1)
- split the preparatory work to pass the region to clean in r0/r1 in a EFI
  specific one and one for the decompressor - this way, the first two patches
  can go on a stable branch that is shared between the ARM tree and the EFI
  tree
- document the meaning of the values in r0/r1 upon entry to cache_clean_flush
- take care to treat the region end address as exclusive
- switch to clean+invalidate to align with the other implementations
- drop some code that manages the stack pointer value before calling
  cache_clean_flush(), which is no longer necessary
- take care to clean the entire region that is covered by the relocated zImage
  if it needs to relocate itself before decompressing

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm32-efi-cache-ops

[ Several people asked me offline why on earth I am running SynQuacer on 32 bit:
  the answer is that this is simply to prove that it is currently broken, and
  this implies that for 32-bit VMs running under KVM, we are relying on the
  special, non-architectural cache management done by the hypervisor on behalf
  of the guest to be able to run this code. ]

Cc: Russell King <linux@armlinux.org.uk>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Linus Walleij <linus.walleij@linaro.org>

Ard Biesheuvel (5):
  efi/arm: Work around missing cache maintenance in decompressor
    handover
  efi/arm: Pass start and end addresses to cache_clean_flush()
  ARM: decompressor: factor out routine to obtain the inflated image
    size
  ARM: decompressor: prepare cache_clean_flush for doing by-VA
    maintenance
  ARM: decompressor: switch to by-VA cache maintenance for v7 cores

 arch/arm/boot/compressed/head.S | 162 +++++++++++---------
 1 file changed, 86 insertions(+), 76 deletions(-)

Comments

Tony Lindgren Feb. 26, 2020, 7:14 p.m. UTC | #1
* Ard Biesheuvel <ardb@kernel.org> [200226 16:58]:
> While making changes to the EFI stub startup code, I noticed that we are
> still doing set/way maintenance on the caches when booting on v7 cores.
> This works today on VMs by virtue of the fact that KVM traps set/way ops
> and cleans the whole address space by VA on behalf of the guest, and on
> most v7 hardware, the set/way ops are in fact sufficient when only one
> core is running, as there usually is no system cache. But on systems
> like SynQuacer, for which 32-bit firmware is available, the current cache
> maintenance only pushes the data out to the L3 system cache, where it
> is not visible to the CPU once it turns the MMU and caches off.
> 
> So instead, switch to the by-VA cache maintenance that the architecture
> requires for v7 and later (and ARM1176, as a side effect).
> 
> Changes since v3:
> - ensure that the region that is cleaned after self-relocation of the zImage
>   covers the appended DTB, if present

I gave these a try on top of the earlier "arm: fix Kbuild issue caused
by per-task stack protector GCC plugin" and booting still works for
me on armv7 including appended dtb:

Tested-by: Tony Lindgren <tony@atomide.com>
Linus Walleij Feb. 27, 2020, 10:11 a.m. UTC | #2
On Wed, Feb 26, 2020 at 5:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> So instead, switch to the by-VA cache maintenance that the architecture
> requires for v7 and later (and ARM1176, as a side effect).
>
> Changes since v3:
> - ensure that the region that is cleaned after self-relocation of the zImage
>   covers the appended DTB, if present
>
> Apologies to Linus, but due to this change, I decided not to take your
> Tested-by into account, and I would appreciate it if you could retest
> this version of the series? Thanks.

No problem, I have tested it on the following:

- ARMv7 Cortex A9 x 2 Qualcomm APQ8060 DragonBoard
- ARM PB11MPCore (4 x 1176)
- ARMv7 Ux500 Cortex A9 x 2

The PB11MPCore is again the crucial board, if it work on that
board it works on anything, most of the time :D

Tested-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
Marc Zyngier Feb. 27, 2020, 4:01 p.m. UTC | #3
On 2020-02-27 10:11, Linus Walleij wrote:
> On Wed, Feb 26, 2020 at 5:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> 
>> So instead, switch to the by-VA cache maintenance that the 
>> architecture
>> requires for v7 and later (and ARM1176, as a side effect).
>> 
>> Changes since v3:
>> - ensure that the region that is cleaned after self-relocation of the 
>> zImage
>>   covers the appended DTB, if present
>> 
>> Apologies to Linus, but due to this change, I decided not to take your
>> Tested-by into account, and I would appreciate it if you could retest
>> this version of the series? Thanks.
> 
> No problem, I have tested it on the following:
> 
> - ARMv7 Cortex A9 x 2 Qualcomm APQ8060 DragonBoard
> - ARM PB11MPCore (4 x 1176)

<pedant>

The ARM11MPCore isn't a bunch of 1176s glued together. It is actually a 
very
different CPU, designed by a different team.

</pedant>

> - ARMv7 Ux500 Cortex A9 x 2
> 
> The PB11MPCore is again the crucial board, if it work on that
> board it works on anything, most of the time :D

That I can only agree with! ;-)

         M.
Ard Biesheuvel Feb. 27, 2020, 4:47 p.m. UTC | #4
On Thu, 27 Feb 2020 at 17:01, Marc Zyngier <maz@kernel.org> wrote:
>
> On 2020-02-27 10:11, Linus Walleij wrote:
> > On Wed, Feb 26, 2020 at 5:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> >> So instead, switch to the by-VA cache maintenance that the
> >> architecture
> >> requires for v7 and later (and ARM1176, as a side effect).
> >>
> >> Changes since v3:
> >> - ensure that the region that is cleaned after self-relocation of the
> >> zImage
> >>   covers the appended DTB, if present
> >>
> >> Apologies to Linus, but due to this change, I decided not to take your
> >> Tested-by into account, and I would appreciate it if you could retest
> >> this version of the series? Thanks.
> >
> > No problem, I have tested it on the following:
> >
> > - ARMv7 Cortex A9 x 2 Qualcomm APQ8060 DragonBoard
> > - ARM PB11MPCore (4 x 1176)
>
> <pedant>
>
> The ARM11MPCore isn't a bunch of 1176s glued together. It is actually a
> very
> different CPU, designed by a different team.
>
> </pedant>
>

It still takes the same code path in the cache routines, afaict:
- the architecture field in the main id register == 0xf, so it uses
__armv7_mmu_cache_flush
- ID_MMFR1[19:16] == 0x2, so it does not take the 'hierarchical' code
path which is modified by these patches





> > - ARMv7 Ux500 Cortex A9 x 2
> >
> > The PB11MPCore is again the crucial board, if it work on that
> > board it works on anything, most of the time :D
>
> That I can only agree with! ;-)
>
>          M.
> --
> Jazz is not dead. It just smells funny...
Marc Zyngier Feb. 27, 2020, 4:53 p.m. UTC | #5
On 2020-02-27 16:47, Ard Biesheuvel wrote:
> On Thu, 27 Feb 2020 at 17:01, Marc Zyngier <maz@kernel.org> wrote:
>> 
>> On 2020-02-27 10:11, Linus Walleij wrote:
>> > On Wed, Feb 26, 2020 at 5:57 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>> >
>> >> So instead, switch to the by-VA cache maintenance that the
>> >> architecture
>> >> requires for v7 and later (and ARM1176, as a side effect).
>> >>
>> >> Changes since v3:
>> >> - ensure that the region that is cleaned after self-relocation of the
>> >> zImage
>> >>   covers the appended DTB, if present
>> >>
>> >> Apologies to Linus, but due to this change, I decided not to take your
>> >> Tested-by into account, and I would appreciate it if you could retest
>> >> this version of the series? Thanks.
>> >
>> > No problem, I have tested it on the following:
>> >
>> > - ARMv7 Cortex A9 x 2 Qualcomm APQ8060 DragonBoard
>> > - ARM PB11MPCore (4 x 1176)
>> 
>> <pedant>
>> 
>> The ARM11MPCore isn't a bunch of 1176s glued together. It is actually 
>> a
>> very
>> different CPU, designed by a different team.
>> 
>> </pedant>
>> 
> 
> It still takes the same code path in the cache routines, afaict:
> - the architecture field in the main id register == 0xf, so it uses
> __armv7_mmu_cache_flush
> - ID_MMFR1[19:16] == 0x2, so it does not take the 'hierarchical' code
> path which is modified by these patches

Absolutely. From a SW perspective, this is treated in a similar way as
ARM1176. The underlying HW is very different though...

Thanks,

         M.