[v2,0/3] ARM: make use of UAL VFP mnemonics when possible

Message ID: cover.1587299429.git.stefan@agner.ch

Message

Stefan Agner April 19, 2020, 12:35 p.m. UTC
To build the kernel with Clang's integrated assembler the VFP code needs
to make use of the unified assembler language (UAL) VFP mnemonics.

At first I tried to get rid of the co-processor instructions to access
the floating point unit along with the macros completely. However, due
to missing FPINST/FPINST2 argument support in older binutils versions we
have to keep them around. Once we drop support for binutils 2.24 and
older, the move to UAL VFP mnemonics will be straightforward with these
changes applied.

Tested using Clang with the integrated assembler as well as an external
(binutils) assembler, and with various gcc/binutils versions down to
4.7/2.23. Disassembled and compared the object files in arch/arm/vfp/ to
make sure these changes lead to the same code. Besides different inlining
behavior I was not able to spot a difference.

In v2 the check for FPINST argument support is now made in Kconfig.
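
A check of this kind boils down to handing the toolchain a short assembly
snippet and looking at the exit status. The sketch below illustrates that
idea only. The helper name, the probe text and the compiler invocation are
assumptions made for illustration and are not taken from the patches.

```python
import subprocess


def assembler_accepts(cmd, source):
    """Return True if `cmd` exits with status 0 when fed `source` on stdin.

    `cmd` is an argument list for any tool that reads assembly from stdin.
    A missing tool counts as "not supported" rather than raising.
    """
    try:
        result = subprocess.run(
            cmd,
            input=source.encode(),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


# Hypothetical probe: select a VFP unit, then name FPINST as a VMRS operand.
PROBE = ".fpu vfpv2\nvmrs r0, FPINST\n"

# Hypothetical cross-compiler invocation; adjust to the local toolchain.
EXAMPLE_CMD = ["arm-linux-gnueabihf-gcc", "-c", "-x", "assembler",
               "-o", "/dev/null", "-"]

if __name__ == "__main__":
    print("FPINST operand accepted:", assembler_accepts(EXAMPLE_CMD, PROBE))
```

An assembler without FPINST argument support would reject the probe, and
the build would then keep using the co-processor form of the instruction.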

--
Stefan

Stefan Agner (3):
  ARM: use .fpu assembler directives instead of assembler arguments
  ARM: use VFP assembler mnemonics in register load/store macros
  ARM: use VFP assembler mnemonics if available

 arch/arm/Kconfig                 |  2 ++
 arch/arm/Kconfig.assembler       |  6 ++++++
 arch/arm/include/asm/vfp.h       |  2 ++
 arch/arm/include/asm/vfpmacros.h | 31 ++++++++++++++++++++++---------
 arch/arm/vfp/Makefile            |  2 --
 arch/arm/vfp/vfphw.S             | 31 ++++++++++++++++++++-----------
 arch/arm/vfp/vfpinstr.h          | 23 +++++++++++++++++++----
 7 files changed, 71 insertions(+), 26 deletions(-)
 create mode 100644 arch/arm/Kconfig.assembler

Comments

Russell King (Oracle) April 19, 2020, 2:12 p.m. UTC | #1
On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
> To build the kernel with Clang's integrated assembler the VFP code needs
> to make use of the unified assembler language (UAL) VFP mnemonics.
> 
> At first I tried to get rid of the co-processor instructions to access
> the floating point unit along with the macros completely. However, due
> to missing FPINST/FPINST2 argument support in older binutils versions we
> have to keep them around. Once we drop support for binutils 2.24 and
> older, the move to UAL VFP mnemonics will be straightforward with these
> changes applied.
> 
> Tested using Clang with the integrated assembler as well as an external
> (binutils) assembler, and with various gcc/binutils versions down to
> 4.7/2.23. Disassembled and compared the object files in arch/arm/vfp/ to
> make sure these changes lead to the same code. Besides different inlining
> behavior I was not able to spot a difference.
> 
> In v2 the check for FPINST argument support is now made in Kconfig.

Given what I said in the other thread, Clang really _should_ allow
the MCR/MRC et al. instructions to access the VFP registers.  There
is no reason to specifically block them.

As we have seen with FPA, having that ability when iWMMXT comes along
is very useful.  In any case:

1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
instructions for coprocessors 10 and 11." in section A7.8.

2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
being able to access _any_ co-processor.

So, Clang deciding that it's going to block access to coprocessor 10
and 11 because some version of the architecture _also_ defines these
as VFP instructions is really not on, and Clang needs to be fixed
irrespective of these patches - and I want to know that *is* going to
get fixed before I take these patches into the kernel.
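
Point 1 above can be checked at the bit level. The sketch below encodes
both forms as A32 instructions, following the MRC and VMRS encodings as I
understand them from the ARM ARM. The function names are made up for
illustration, and the FPEXC selector value of 8 is taken to match the cr8
naming the kernel uses for that register.

```python
def encode_mrc(coproc, opc1, rt, crn, crm, opc2, cond=0xE):
    """Encode A32 `MRC coproc, opc1, Rt, CRn, CRm, opc2`."""
    return ((cond << 28) | (0b1110 << 24) | (opc1 << 21) | (1 << 20)
            | (crn << 16) | (rt << 12) | (coproc << 8)
            | (opc2 << 5) | (1 << 4) | crm)


def encode_vmrs(rt, reg, cond=0xE):
    """Encode A32 `VMRS Rt, <system register selected by reg>`."""
    return ((cond << 28) | (0b11101111 << 20) | (reg << 16)
            | (rt << 12) | (0b1010 << 8) | 0b00010000)


FPEXC = 0b1000  # system register selector for FPEXC

if __name__ == "__main__":
    mrc = encode_mrc(coproc=10, opc1=7, rt=0, crn=FPEXC, crm=0, opc2=0)
    vmrs = encode_vmrs(rt=0, reg=FPEXC)
    # Both forms are expected to yield the same 32-bit instruction word.
    print(f"mrc p10, 7, r0, cr8, cr0, 0 -> {mrc:#010x}")
    print(f"vmrs r0, fpexc              -> {vmrs:#010x}")
```

Under these assumptions the two spellings are the same instruction word,
so the choice between them only matters to the assembler, not to the CPU.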

Stefan Agner April 19, 2020, 9:20 p.m. UTC | #2
On 2020-04-19 16:12, Russell King - ARM Linux admin wrote:
> On Sun, Apr 19, 2020 at 02:35:48PM +0200, Stefan Agner wrote:
>> To build the kernel with Clang's integrated assembler the VFP code needs
>> to make use of the unified assembler language (UAL) VFP mnemonics.
>>
>> At first I tried to get rid of the co-processor instructions to access
>> the floating point unit along with the macros completely. However, due
>> to missing FPINST/FPINST2 argument support in older binutils versions we
>> have to keep them around. Once we drop support for binutils 2.24 and
>> older, the move to UAL VFP mnemonics will be straightforward with these
>> changes applied.
>>
>> Tested using Clang with the integrated assembler as well as an external
>> (binutils) assembler, and with various gcc/binutils versions down to
>> 4.7/2.23. Disassembled and compared the object files in arch/arm/vfp/ to
>> make sure these changes lead to the same code. Besides different inlining
>> behavior I was not able to spot a difference.
>>
>> In v2 the check for FPINST argument support is now made in Kconfig.
> 
> Given what I said in the other thread, Clang really _should_ allow
> the MCR/MRC et al. instructions to access the VFP registers.  There
> is no reason to specifically block them.

I agree, and I am working on changing this.

There have been discussions about co-processor register access a while
back in the LLVM/Clang community [1]. Peter Smith pointed this out in
the ClangBuiltLinux issue tracker [2], which also has some more context.
I did submit a patch [3] to convert use of cp10/cp11 in ARMv7 contexts
to a warning. However, it went stale, so I'll have to revisit it.

There is actually another case where this issue blocks Clang's
integrated assembler: in arch/arm/kernel/perf_event_v7.c, the function
venum_read_pmresr uses mcr/mrc to access the performance monitor
registers of Qualcomm's Krait/Scorpion PMU, and in this case there is
no mnemonic available.
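
To illustrate why no mnemonic exists for that case, here is a minimal
sketch that maps an MRC operand set on cp10 to a VMRS spelling when the
CRn value names a known VFP system register. The table follows the
cr0..cr10 numbering the kernel uses for these registers. The function
name is hypothetical, and the c11 operand used for the PMU access is my
assumption about that code, not something stated in this thread.

```python
# VFP system registers by CRn number, as named in the kernel's VFP headers.
VFP_SYSREGS = {
    0: "FPSID",
    1: "FPSCR",
    6: "MVFR1",
    7: "MVFR0",
    8: "FPEXC",
    9: "FPINST",
    10: "FPINST2",
}


def vfp_mnemonic_for_mrc(coproc, opc1, rt, crn, crm, opc2):
    """Return a `vmrs` spelling for the MRC operands, or None if there is none."""
    if coproc != 10 or opc1 != 7 or crm != 0 or opc2 != 0:
        return None
    name = VFP_SYSREGS.get(crn)
    if name is None:
        return None
    return f"vmrs r{rt}, {name}"


if __name__ == "__main__":
    # FPEXC read: has a UAL spelling.
    print(vfp_mnemonic_for_mrc(10, 7, 0, 8, 0, 0))
    # Assumed PMU access via c11: no named register, so no spelling.
    print(vfp_mnemonic_for_mrc(10, 7, 0, 11, 0, 0))
```

An access like the second one can only be written with mcr/mrc, which is
why an assembler that rejects cp10/cp11 operands cannot build it at all.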

> 
> As we have seen with FPA, having that ability when iWMMXT comes along
> is very useful.  In any case:
> 
> 1. The ARM ARM (DDI0406) states that "These instructions are MRC and MCR
> instructions for coprocessors 10 and 11." in section A7.8.
> 
> 2. The ARM ARM (DDI0406) describes the MRC and MCR instructions as
> being able to access _any_ co-processor.

These are good arguments I can use in case my patch stirs up a
discussion, thanks for the hints!

> 
> So, Clang deciding that it's going to block access to coprocessor 10
> and 11 because some version of the architecture _also_ defines these
> as VFP instructions is really not on, and Clang needs to be fixed
> irrespective of these patches - and I want to know that *is* going to
> get fixed before I take these patches into the kernel.

I'll try. We'll see.

[1] https://bugs.llvm.org/show_bug.cgi?id=20025
[2] https://github.com/ClangBuiltLinux/linux/issues/306
[3] https://reviews.llvm.org/D59733

--
Stefan