diff mbox series

ARM: vfp: Add vudot opcode to VFP undef hook

Message ID 20230920083907.30479-1-mark-pk.tsai@mediatek.com (mailing list archive)
State New, archived
Headers show
Series ARM: vfp: Add vudot opcode to VFP undef hook | expand

Commit Message

Mark-PK Tsai (蔡沛剛) Sept. 20, 2023, 8:39 a.m. UTC
Add vudot opcode to the VFP undef hook to fix the
potentially undefined instruction error when the
user space executes vudot instruction.

Before this commit, kernel didn't handle the undef exception
caused by vudot and didn't enable VFP in lazy VFP context
switch code like other NEON instructions.
This led to the occurrence of the undefined instruction
error as following:

[  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
...
[  250.741287 ] PC is at 0x4014ec
[  250.741298 ] LR is at 0xb677874f
[  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
[  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
[  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
[  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
[  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
[  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
[  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
[  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)

Below is the assembly of the user program:

0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
                                                  SIGILL(illegal instruction)

Link: https://services.arm.com/support/s/case/5004L00000XsOjP
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
---
 arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Robin Murphy Sept. 20, 2023, 6:49 p.m. UTC | #1
On 2023-09-20 09:39, Mark-PK Tsai wrote:
> Add vudot opcode to the VFP undef hook to fix the
> potentially undefined instruction error when the
> user space executes vudot instruction.

Did the kernel expose a hwcap to say that the dot product extension is 
supported? I'm pretty sure it didn't, so why would userspace expect this 
to work? ;)

IIRC Amit was looking at defining the hwcaps to align with arm64 compat, 
but I believe that series faltered since most of them weren't actually 
needed (and I think at that point it was still missing the VFP support 
code parts). It would be nice if someone could pick up and combine both 
efforts and get this done properly; fill in *all* the hwcaps and 
relevant handling for extensions which Cortex-A55 supports (since 
there's definitely more than just VUDOT), and then hopefully we're done 
for good.

> Before this commit, kernel didn't handle the undef exception
> caused by vudot and didn't enable VFP in lazy VFP context
> switch code like other NEON instructions.
> This led to the occurrence of the undefined instruction
> error as following:
> 
> [  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
> ...
> [  250.741287 ] PC is at 0x4014ec
> [  250.741298 ] LR is at 0xb677874f
> [  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
> [  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
> [  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
> [  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
> [  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
> [  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
> [  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
> [  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
> 
> Below is the assembly of the user program:
> 
> 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
> 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
> 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
> 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
> 0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
>                                                    SIGILL(illegal instruction)
> 
> Link: https://services.arm.com/support/s/case/5004L00000XsOjP

Linking to your private support case is not useful to upstream. Even I 
can't open that link.

> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
> ---
>   arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> index 7e8773a2d99d..7eab8d1019d2 100644
> --- a/arch/arm/vfp/vfpmodule.c
> +++ b/arch/arm/vfp/vfpmodule.c
> @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
>   	.cpsr_mask	= PSR_T_BIT,
>   	.cpsr_val	= 0,
>   	.fn		= vfp_support_entry,
> +}, {
> +	.instr_mask	= 0xffb00000,
> +	.instr_val	= 0xfc200000,
> +	.cpsr_mask	= PSR_T_BIT,
> +	.cpsr_val	= 0,
> +	.fn		= vfp_support_entry,
>   }, {
>   	.instr_mask	= 0xef000000,
>   	.instr_val	= 0xef000000,
> @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
>   	.cpsr_mask	= PSR_T_BIT,
>   	.cpsr_val	= PSR_T_BIT,
>   	.fn		= vfp_support_entry,
> +}, {
> +	.instr_mask	= 0xffb00000,
> +	.instr_val	= 0xfc200000,
> +	.cpsr_mask	= PSR_T_BIT,
> +	.cpsr_val	= PSR_T_BIT,
> +	.fn		= vfp_support_entry,

Why have two entries conditional on each possible value of one bit for 
otherwise identical encodings? Surely it suffices to set both cpsr_mask 
and cpsr_val to 0?

Thanks,
Robin.

>   }};
>   
>   static struct undef_hook vfp_support_hook = {
Mark-PK Tsai (蔡沛剛) Sept. 21, 2023, 2:13 a.m. UTC | #2
> On 2023-09-20 09:39, Mark-PK Tsai wrote:
> > Add vudot opcode to the VFP undef hook to fix the
> > potentially undefined instruction error when the
> > user space executes vudot instruction.
> 
> Did the kernel expose a hwcap to say that the dot product extension is 
> supported? I'm pretty sure it didn't, so why would userspace expect this 
> to work? ;)

The hwcap for dotprod has been exported since commit:

62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd

> 
> IIRC Amit was looking at defining the hwcaps to align with arm64 compat, 
> but I believe that series faltered since most of them weren't actually 
> needed (and I think at that point it was still missing the VFP support 
> code parts). It would be nice if someone could pick up and combine both 

Were the mentioned series related to this commit?

62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd

> efforts and get this done properly; fill in *all* the hwcaps and 
> relevant handling for extensions which Cortex-A55 supports (since 
> there's definitely more than just VUDOT), and then hopefully we're done 
> for good.

Agree.

> 
> > Before this commit, kernel didn't handle the undef exception
> > caused by vudot and didn't enable VFP in lazy VFP context
> > switch code like other NEON instructions.
> > This led to the occurrence of the undefined instruction
> > error as following:
> > 
> > [  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
> > ...
> > [  250.741287 ] PC is at 0x4014ec
> > [  250.741298 ] LR is at 0xb677874f
> > [  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
> > [  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
> > [  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
> > [  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
> > [  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
> > [  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
> > [  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
> > [  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
> > 
> > Below is the assembly of the user program:
> > 
> > 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
> > 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
> > 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
> > 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
> > 0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
> >                                                    SIGILL(illegal instruction)
> > 
> > Link: https://services.arm.com/support/s/case/5004L00000XsOjP
> 
> Linking to your private support case is not useful to upstream. Even I 
> can't open that link.

I thought that maybe someone in arm need this.
But it seems a bit noisy so I will remove the link from v2.

> 
> > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
> > ---
> >   arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> > index 7e8773a2d99d..7eab8d1019d2 100644
> > --- a/arch/arm/vfp/vfpmodule.c
> > +++ b/arch/arm/vfp/vfpmodule.c
> > @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
> >   	.cpsr_mask	= PSR_T_BIT,
> >   	.cpsr_val	= 0,
> >   	.fn		= vfp_support_entry,
> > +}, {
> > +	.instr_mask	= 0xffb00000,
> > +	.instr_val	= 0xfc200000,
> > +	.cpsr_mask	= PSR_T_BIT,
> > +	.cpsr_val	= 0,
> > +	.fn		= vfp_support_entry,
> >   }, {
> >   	.instr_mask	= 0xef000000,
> >   	.instr_val	= 0xef000000,
> > @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
> >   	.cpsr_mask	= PSR_T_BIT,
> >   	.cpsr_val	= PSR_T_BIT,
> >   	.fn		= vfp_support_entry,
> > +}, {
> > +	.instr_mask	= 0xffb00000,
> > +	.instr_val	= 0xfc200000,
> > +	.cpsr_mask	= PSR_T_BIT,
> > +	.cpsr_val	= PSR_T_BIT,
> > +	.fn		= vfp_support_entry,
> 
> Why have two entries conditional on each possible value of one bit for 
> otherwise identical encodings? Surely it suffices to set both cpsr_mask 
> and cpsr_val to 0?

You're right.
I will set both cpsr_mask and cpsr_val to 0 and use single entry,
as you suggested, in the v2 patch.

Thanks.

> 
> Thanks,
> Robin.
> 
> >   }};
> >   
> >   static struct undef_hook vfp_support_hook = {
Xuewen Yan Sept. 21, 2023, 12:05 p.m. UTC | #3
Hi Mark-PK, Robin

We also meets the scene, and has the following stack:

Thread-2 (5361): undefined instruction: pc=d05ae08c
CPU: 5 PID: 5361 Comm: Thread-2 Tainted: G        W  O
5.4.210-android12-9-04458-g56c7c43d3298-ab000045 #98
Hardware name: Generic DT based system
PC is at 0x7d1aa068
LR is at 0x7c22ae50
pc : [<7d1aa068>]    lr : [<7c22ae50>]    psr: 800b0010
sp : 7c78ee20  ip : 7c22ae40  fp : 7c78eea0
r10: 7c22ae60  r9 : 7c22ae70  r8 : 00000008
r7 : 7c0fee80  r6 : 7c0fee70  r5 : 7c0fee60  r4 : 00000010
r3 : 7c0fee50  r2 : 00000000  r1 : 00000010  r0 : a698aee0
Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 10c5383d  Table: 8649c06a  DAC: 00000055
Code: edddcb1e fe22edd4 f4600a0d fc67cd98 (fe20edf4) <<<

So, we also need add the 0xfe000000:

Could you please help add the following patch into the patch-v2?

Thanks!

---
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 7e8773a2d99d..1078c0f169d2 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -788,6 +788,18 @@ static struct undef_hook neon_support_hook[] = {{
        .cpsr_mask      = PSR_T_BIT,
        .cpsr_val       = 0,
        .fn             = vfp_support_entry,
+}, {
+       .instr_mask     = 0xfc000000,
+       .instr_val      = 0xfc000000,
+       .cpsr_mask      = PSR_T_BIT,
+       .cpsr_val       = 0,
+       .fn             = vfp_support_entry,
+}, {
+       .instr_mask     = 0xfe000000,
+       .instr_val      = 0xfe000000,
+       .cpsr_mask      = PSR_T_BIT,
+       .cpsr_val       = 0,
+       .fn             = vfp_support_entry,
 }, {
        .instr_mask     = 0xef000000,
        .instr_val      = 0xef000000,
@@ -800,6 +812,18 @@ static struct undef_hook neon_support_hook[] = {{
        .cpsr_mask      = PSR_T_BIT,
        .cpsr_val       = PSR_T_BIT,
        .fn             = vfp_support_entry,
+}, {
+       .instr_mask     = 0xfc000000,
+       .instr_val      = 0xfc000000,
+       .cpsr_mask      = PSR_T_BIT,
+       .cpsr_val       = PSR_T_BIT,
+       .fn             = vfp_support_entry,
+}, {
+       .instr_mask     = 0xfe000000,
+       .instr_val      = 0xfe000000,
+       .cpsr_mask      = PSR_T_BIT,
+       .cpsr_val       = PSR_T_BIT,
+       .fn             = vfp_support_entry,
 }};

 static struct undef_hook vfp_support_hook = {

On Thu, Sep 21, 2023 at 12:40 PM Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote:
>
> > On 2023-09-20 09:39, Mark-PK Tsai wrote:
> > > Add vudot opcode to the VFP undef hook to fix the
> > > potentially undefined instruction error when the
> > > user space executes vudot instruction.
> >
> > Did the kernel expose a hwcap to say that the dot product extension is
> > supported? I'm pretty sure it didn't, so why would userspace expect this
> > to work? ;)
>
> The hwcap for dotprod has been exported since commit:
>
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
>
> >
> > IIRC Amit was looking at defining the hwcaps to align with arm64 compat,
> > but I believe that series faltered since most of them weren't actually
> > needed (and I think at that point it was still missing the VFP support
> > code parts). It would be nice if someone could pick up and combine both
>
> Were the mentioned series related to this commit?
>
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
>
> > efforts and get this done properly; fill in *all* the hwcaps and
> > relevant handling for extensions which Cortex-A55 supports (since
> > there's definitely more than just VUDOT), and then hopefully we're done
> > for good.
>
> Agree.
>
> >
> > > Before this commit, kernel didn't handle the undef exception
> > > caused by vudot and didn't enable VFP in lazy VFP context
> > > switch code like other NEON instructions.
> > > This led to the occurrence of the undefined instruction
> > > error as following:
> > >
> > > [  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
> > > ...
> > > [  250.741287 ] PC is at 0x4014ec
> > > [  250.741298 ] LR is at 0xb677874f
> > > [  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
> > > [  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
> > > [  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
> > > [  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
> > > [  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
> > > [  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
> > > [  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
> > > [  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
> > >
> > > Below is the assembly of the user program:
> > >
> > > 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
> > > 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
> > > 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
> > > 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
> > > 0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
> > >                                                    SIGILL(illegal instruction)
> > >
> > > Link: https://services.arm.com/support/s/case/5004L00000XsOjP
> >
> > Linking to your private support case is not useful to upstream. Even I
> > can't open that link.
>
> I thought that maybe someone in arm need this.
> But it seems a bit noisy so I will remove the link from v2.
>
> >
> > > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
> > > ---
> > >   arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > >
> > > diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> > > index 7e8773a2d99d..7eab8d1019d2 100644
> > > --- a/arch/arm/vfp/vfpmodule.c
> > > +++ b/arch/arm/vfp/vfpmodule.c
> > > @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
> > >     .cpsr_mask      = PSR_T_BIT,
> > >     .cpsr_val       = 0,
> > >     .fn             = vfp_support_entry,
> > > +}, {
> > > +   .instr_mask     = 0xffb00000,
> > > +   .instr_val      = 0xfc200000,
> > > +   .cpsr_mask      = PSR_T_BIT,
> > > +   .cpsr_val       = 0,
> > > +   .fn             = vfp_support_entry,
> > >   }, {
> > >     .instr_mask     = 0xef000000,
> > >     .instr_val      = 0xef000000,
> > > @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
> > >     .cpsr_mask      = PSR_T_BIT,
> > >     .cpsr_val       = PSR_T_BIT,
> > >     .fn             = vfp_support_entry,
> > > +}, {
> > > +   .instr_mask     = 0xffb00000,
> > > +   .instr_val      = 0xfc200000,
> > > +   .cpsr_mask      = PSR_T_BIT,
> > > +   .cpsr_val       = PSR_T_BIT,
> > > +   .fn             = vfp_support_entry,
> >
> > Why have two entries conditional on each possible value of one bit for
> > otherwise identical encodings? Surely it suffices to set both cpsr_mask
> > and cpsr_val to 0?
>
> You're right.
> I will set both cpsr_mask and cpsr_val to 0 and use single entry,
> as you suggested, in the v2 patch.
>
> Thanks.
>
> >
> > Thanks,
> > Robin.
> >
> > >   }};
> > >
> > >   static struct undef_hook vfp_support_hook = {
Robin Murphy Sept. 21, 2023, 12:07 p.m. UTC | #4
On 21/09/2023 3:13 am, Mark-PK Tsai wrote:
>> On 2023-09-20 09:39, Mark-PK Tsai wrote:
>>> Add vudot opcode to the VFP undef hook to fix the
>>> potentially undefined instruction error when the
>>> user space executes vudot instruction.
>>
>> Did the kernel expose a hwcap to say that the dot product extension is
>> supported? I'm pretty sure it didn't, so why would userspace expect this
>> to work? ;)
> 
> The hwcap for dotprod has been exported since commit:
> 
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
> 
>>
>> IIRC Amit was looking at defining the hwcaps to align with arm64 compat,
>> but I believe that series faltered since most of them weren't actually
>> needed (and I think at that point it was still missing the VFP support
>> code parts). It would be nice if someone could pick up and combine both
> 
> Were the mentioned series related to this commit?
> 
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd

Oh, that did get merged? My apologies, I grepped for the hwcaps in 
arch/arm but somehow failed to spot that some definitions did exist, so 
assumed it hadn't been; not sure what went wrong there :(

In that case, we definitely want this tagged as a fix, and to make sure 
we double-check for any equivalent fixes still needed for the other 
features too. Sorry again for the confusion.

>> efforts and get this done properly; fill in *all* the hwcaps and
>> relevant handling for extensions which Cortex-A55 supports (since
>> there's definitely more than just VUDOT), and then hopefully we're done
>> for good.
> 
> Agree.
> 
>>
>>> Before this commit, kernel didn't handle the undef exception
>>> caused by vudot and didn't enable VFP in lazy VFP context
>>> switch code like other NEON instructions.
>>> This led to the occurrence of the undefined instruction
>>> error as following:
>>>
>>> [  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
>>> ...
>>> [  250.741287 ] PC is at 0x4014ec
>>> [  250.741298 ] LR is at 0xb677874f
>>> [  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
>>> [  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
>>> [  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
>>> [  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
>>> [  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
>>> [  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
>>> [  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
>>> [  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
>>>
>>> Below is the assembly of the user program:
>>>
>>> 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
>>> 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
>>> 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
>>> 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
>>> 0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
>>>                                                     SIGILL(illegal instruction)
>>>
>>> Link: https://services.arm.com/support/s/case/5004L00000XsOjP
>>
>> Linking to your private support case is not useful to upstream. Even I
>> can't open that link.
> 
> I thought that maybe someone in arm need this.
> But it seems a bit noisy so I will remove the link from v2.

Yeah, even within Arm most of us don't have permission to access the 
support system.

Cheers,
Robin.

>>
>>> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
>>> ---
>>>    arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
>>>    1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
>>> index 7e8773a2d99d..7eab8d1019d2 100644
>>> --- a/arch/arm/vfp/vfpmodule.c
>>> +++ b/arch/arm/vfp/vfpmodule.c
>>> @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
>>>    	.cpsr_mask	= PSR_T_BIT,
>>>    	.cpsr_val	= 0,
>>>    	.fn		= vfp_support_entry,
>>> +}, {
>>> +	.instr_mask	= 0xffb00000,
>>> +	.instr_val	= 0xfc200000,
>>> +	.cpsr_mask	= PSR_T_BIT,
>>> +	.cpsr_val	= 0,
>>> +	.fn		= vfp_support_entry,
>>>    }, {
>>>    	.instr_mask	= 0xef000000,
>>>    	.instr_val	= 0xef000000,
>>> @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
>>>    	.cpsr_mask	= PSR_T_BIT,
>>>    	.cpsr_val	= PSR_T_BIT,
>>>    	.fn		= vfp_support_entry,
>>> +}, {
>>> +	.instr_mask	= 0xffb00000,
>>> +	.instr_val	= 0xfc200000,
>>> +	.cpsr_mask	= PSR_T_BIT,
>>> +	.cpsr_val	= PSR_T_BIT,
>>> +	.fn		= vfp_support_entry,
>>
>> Why have two entries conditional on each possible value of one bit for
>> otherwise identical encodings? Surely it suffices to set both cpsr_mask
>> and cpsr_val to 0?
> 
> You're right.
> I will set both cpsr_mask and cpsr_val to 0 and use single entry,
> as you suggested, in the v2 patch.
> 
> Thanks.
> 
>>
>> Thanks,
>> Robin.
>>
>>>    }};
>>>    
>>>    static struct undef_hook vfp_support_hook = {
Mark-PK Tsai (蔡沛剛) Sept. 21, 2023, 3:13 p.m. UTC | #5
> On 21/09/2023 3:13 am, Mark-PK Tsai wrote:
> >> On 2023-09-20 09:39, Mark-PK Tsai wrote:
> >>> Add vudot opcode to the VFP undef hook to fix the
> >>> potentially undefined instruction error when the
> >>> user space executes vudot instruction.
> >>
> >> Did the kernel expose a hwcap to say that the dot product extension is
> >> supported? I'm pretty sure it didn't, so why would userspace expect this
> >> to work? ;)
> > 
> > The hwcap for dotprod has been exported since commit:
> > 
> > 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
> > 
> >>
> >> IIRC Amit was looking at defining the hwcaps to align with arm64 compat,
> >> but I believe that series faltered since most of them weren't actually
> >> needed (and I think at that point it was still missing the VFP support
> >> code parts). It would be nice if someone could pick up and combine both
> > 
> > Were the mentioned series related to this commit?
> > 
> > 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
> 
> Oh, that did get merged? My apologies, I grepped for the hwcaps in 
> arch/arm but somehow failed to spot that some definitions did exist, so 
> assumed it hadn't been; not sure what went wrong there :(
> 
> In that case, we definitely want this tagged as a fix, and to make sure 
> we double-check for any equivalent fixes still needed for the other 
> features too. Sorry again for the confusion.


I have just checked the latest open document for the aarch32 ISA:

https://developer.arm.com/documentation/ddi0597/2023-06/?lang=en

It seems that we may need 3 entries to handle all of the
instructions under the "Unconditional Advanced SIMD and floating-point instructions".

Going from the top-level to "Unconditional Advanced SIMD and floating-point instructions"
would be as follows:

L1: b'xxxx_11xx_____xxxx_xxxx_xxxx_xxxx_xxxx_xxxx -> System register access, Advanced SIMD, floating-point and Supervisor call
L2: b'1111_11(!=11)_xxxx_xxxx_xxxx_1xxx_xxxx_xxxx -> Unconditional Advanced SIMD and floating-point instructions

The code would be like:

.instr_mask = 0xff000800
.instr_val  = 0xfc000800

.instr_mask = 0xff000800
.instr_val  = 0xfd000800

.instr_mask = 0xff000800
.instr_val  = 0xfe000800

I would appreciate any suggestions you may have regarding this approach.
Thanks!

> 
> >> efforts and get this done properly; fill in *all* the hwcaps and
> >> relevant handling for extensions which Cortex-A55 supports (since
> >> there's definitely more than just VUDOT), and then hopefully we're done
> >> for good.
> > 
> > Agree.
> > 
> >>
> >>> Before this commit, kernel didn't handle the undef exception
> >>> caused by vudot and didn't enable VFP in lazy VFP context
> >>> switch code like other NEON instructions.
> >>> This led to the occurrence of the undefined instruction
> >>> error as following:
> >>>
> >>> [  250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
> >>> ...
> >>> [  250.741287 ] PC is at 0x4014ec
> >>> [  250.741298 ] LR is at 0xb677874f
> >>> [  250.741303 ] pc : [<004014ec>]    lr : [<b677874f>]    psr: 80070010
> >>> [  250.741309 ] sp : beffedb0  ip : b67d7864  fp : beffee58
> >>> [  250.741314 ] r10: 00000000  r9 : 00000000  r8 : 00000000
> >>> [  250.741319 ] r7 : 00000001  r6 : 00000001  r5 : beffee90  r4 : 00401470
> >>> [  250.741324 ] r3 : beffee20  r2 : beffee30  r1 : beffee40  r0 : 004003a8
> >>> [  250.741331 ] Flags: Nzcv  IRQs on  FIQs on  Mode USER_32  ISA ARM Segment user
> >>> [  250.741339 ] Control: 10c5383d  Table: 32d0406a  DAC: 00000055
> >>> [  250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
> >>>
> >>> Below is the assembly of the user program:
> >>>
> >>> 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
> >>> 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
> >>> 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
> >>> 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
> >>> 0x4014ec <+124>: vudot.u8 q8, q9, q10         <-- switch in, and FPEXC.EN = 0
> >>>                                                     SIGILL(illegal instruction)
> >>>
> >>> Link: https://services.arm.com/support/s/case/5004L00000XsOjP
> >>
> >> Linking to your private support case is not useful to upstream. Even I
> >> can't open that link.
> > 
> > I thought that maybe someone in arm need this.
> > But it seems a bit noisy so I will remove the link from v2.
> 
> Yeah, even within Arm most of us don't have permission to access the 
> support system.
> 
> Cheers,
> Robin.
> 
> >>
> >>> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
> >>> ---
> >>>    arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
> >>>    1 file changed, 12 insertions(+)
> >>>
> >>> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
> >>> index 7e8773a2d99d..7eab8d1019d2 100644
> >>> --- a/arch/arm/vfp/vfpmodule.c
> >>> +++ b/arch/arm/vfp/vfpmodule.c
> >>> @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
> >>>    .cpsr_mask= PSR_T_BIT,
> >>>    .cpsr_val= 0,
> >>>    .fn= vfp_support_entry,
> >>> +}, {
> >>> +.instr_mask= 0xffb00000,
> >>> +.instr_val= 0xfc200000,
> >>> +.cpsr_mask= PSR_T_BIT,
> >>> +.cpsr_val= 0,
> >>> +.fn= vfp_support_entry,
> >>>    }, {
> >>>    .instr_mask= 0xef000000,
> >>>    .instr_val= 0xef000000,
> >>> @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
> >>>    .cpsr_mask= PSR_T_BIT,
> >>>    .cpsr_val= PSR_T_BIT,
> >>>    .fn= vfp_support_entry,
> >>> +}, {
> >>> +.instr_mask= 0xffb00000,
> >>> +.instr_val= 0xfc200000,
> >>> +.cpsr_mask= PSR_T_BIT,
> >>> +.cpsr_val= PSR_T_BIT,
> >>> +.fn= vfp_support_entry,
> >>
> >> Why have two entries conditional on each possible value of one bit for
> >> otherwise identical encodings? Surely it suffices to set both cpsr_mask
> >> and cpsr_val to 0?
> > 
> > You're right.
> > I will set both cpsr_mask and cpsr_val to 0 and use single entry,
> > as you suggested, in the v2 patch.
> > 
> > Thanks.
> > 
> >>
> >> Thanks,
> >> Robin.
> >>
> >>>    }};
> >>>    
> >>>    static struct undef_hook vfp_support_hook = {
diff mbox series

Patch

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 7e8773a2d99d..7eab8d1019d2 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -788,6 +788,12 @@  static struct undef_hook neon_support_hook[] = {{
 	.cpsr_mask	= PSR_T_BIT,
 	.cpsr_val	= 0,
 	.fn		= vfp_support_entry,
+}, {
+	.instr_mask	= 0xffb00000,
+	.instr_val	= 0xfc200000,
+	.cpsr_mask	= PSR_T_BIT,
+	.cpsr_val	= 0,
+	.fn		= vfp_support_entry,
 }, {
 	.instr_mask	= 0xef000000,
 	.instr_val	= 0xef000000,
@@ -800,6 +806,12 @@  static struct undef_hook neon_support_hook[] = {{
 	.cpsr_mask	= PSR_T_BIT,
 	.cpsr_val	= PSR_T_BIT,
 	.fn		= vfp_support_entry,
+}, {
+	.instr_mask	= 0xffb00000,
+	.instr_val	= 0xfc200000,
+	.cpsr_mask	= PSR_T_BIT,
+	.cpsr_val	= PSR_T_BIT,
+	.fn		= vfp_support_entry,
 }};
 
 static struct undef_hook vfp_support_hook = {