diff mbox series

MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen

Message ID 20200702225334.32414-1-hauke@hauke-m.de (mailing list archive)
State Mainlined
Commit fcec538ef8cca0ad0b84432235dccd9059c8e6f8
Headers show
Series MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen | expand

Commit Message

Hauke Mehrtens July 2, 2020, 10:53 p.m. UTC
This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.

The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:

   Producer | Consumer | Hazard
  ----------|----------|----------------------------
   mtc0     | mfc0     | any coprocessor 0 register

I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:

[    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.107070] Kernel panic - not syncing: Unexpected DSP exception
[    0.113470] Rebooting in 1 seconds..

We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
did not happen.

In the kernel I bisected this problem to commit 9012d011660e ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd935c34b ("MIPS: Always
allocate exception vector for MIPSr2+").

Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
---
 arch/mips/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Hauke Mehrtens July 3, 2020, 10:31 p.m. UTC | #1
On 7/3/20 12:53 AM, Hauke Mehrtens wrote:
> This resolves the hazard between the mtc0 in the change_c0_status() and
> the mfc0 in configure_exception_vector(). Without resolving this hazard
> configure_exception_vector() could read an old value and would restore
> this old value again. This would revert the changes change_c0_status()
> did. I checked this by printing out the read_c0_status() at the end of
> per_cpu_trap_init() and the ST0_MX is not set without this patch.
> 
> The hazard is documented in the MIPS Architecture Reference Manual Vol.
> III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
> 6.03 table 8.1 which includes:
> 
>    Producer | Consumer | Hazard
>   ----------|----------|----------------------------
>    mtc0     | mfc0     | any coprocessor 0 register
> 
> I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
> There the change_c0_status() function would activate the DSPen by
> setting ST0_MX in the c0_status register. This was reverted and then the
> system got a DSP exception when the DSP registers were saved in
> save_dsp() in the first process switch. The crash looks like this:
> 
> [    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.107070] Kernel panic - not syncing: Unexpected DSP exception
> [    0.113470] Rebooting in 1 seconds..
> 
> We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
> not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
> kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
> did not happen.
> 
> In the kernel I bisected this problem to commit 9012d011660e ("compiler:
> allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
> reverted it also happened after commit 172dcd935c34b ("MIPS: Always
> allocate exception vector for MIPSr2+").
> 
> Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
> does similar changes to a different file. I am not sure if there are
> more places affected by this problem.
> 
> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
> Cc: <stable@vger.kernel.org>
> ---
>  arch/mips/kernel/traps.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index 7c32c956156a..1234ea21dd8f 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2169,6 +2169,7 @@ static void configure_status(void)
>  
>  	change_c0_status(ST0_CU|ST0_MX|ST0_RE|ST0_FR|ST0_BEV|ST0_TS|ST0_KX|ST0_SX|ST0_UX,
>  			 status_set);
> +	back_to_back_c0_hazard();
>  }
>  
>  unsigned int hwrena;
> 

The MIPS InterAptiv system software user manual suggests to put this
back_to_back_c0_hazard() close to the consumer. Should I move it there?

Should I also add a back_to_back_c0_hazard() close to the other places
which could be affected by similar problems even when I do not see them
at runtime?

Hauke
Jiaxun Yang July 4, 2020, 7:37 a.m. UTC | #2
在 2020/7/4 上午6:31, Hauke Mehrtens 写道:
> On 7/3/20 12:53 AM, Hauke Mehrtens wrote:
>> This resolves the hazard between the mtc0 in the change_c0_status() and
>> the mfc0 in configure_exception_vector(). Without resolving this hazard
>> configure_exception_vector() could read an old value and would restore
>> this old value again. This would revert the changes change_c0_status()
>> did. I checked this by printing out the read_c0_status() at the end of
>> per_cpu_trap_init() and the ST0_MX is not set without this patch.
>>
>> The hazard is documented in the MIPS Architecture Reference Manual Vol.
>> III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
>> 6.03 table 8.1 which includes:
>>
>>     Producer | Consumer | Hazard
>>    ----------|----------|----------------------------
>>     mtc0     | mfc0     | any coprocessor 0 register
>>
>> I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
>> There the change_c0_status() function would activate the DSPen by
>> setting ST0_MX in the c0_status register. This was reverted and then the
>> system got a DSP exception when the DSP registers were saved in
>> save_dsp() in the first process switch. The crash looks like this:
>>
>> [    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
>> [    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
>> [    0.107070] Kernel panic - not syncing: Unexpected DSP exception
>> [    0.113470] Rebooting in 1 seconds..
>>
>> We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
>> not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
>> kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
>> did not happen.
>>
>> In the kernel I bisected this problem to commit 9012d011660e ("compiler:
>> allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
>> reverted it also happened after commit 172dcd935c34b ("MIPS: Always
>> allocate exception vector for MIPSr2+").
>>
>> Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
>> does similar changes to a different file. I am not sure if there are
>> more places affected by this problem.
>>
>> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
>> Cc: <stable@vger.kernel.org>
>> ---
>>   arch/mips/kernel/traps.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index 7c32c956156a..1234ea21dd8f 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2169,6 +2169,7 @@ static void configure_status(void)
>>   
>>   	change_c0_status(ST0_CU|ST0_MX|ST0_RE|ST0_FR|ST0_BEV|ST0_TS|ST0_KX|ST0_SX|ST0_UX,
>>   			 status_set);
>> +	back_to_back_c0_hazard();
>>   }
>>   
>>   unsigned int hwrena;
>>
> The MIPS InterAptiv system software user manual suggests to put this
> back_to_back_c0_hazard() close to the consumer. Should I move it there?
I assume the suggestion was for performance concern.
I'd prefer just put it after c0 write, so we can easily trace missing 
barriers.
> Should I also add a back_to_back_c0_hazard() close to the other places
> which could be affected by similar problems even when I do not see them
> at runtime?
That's preventing us from potential risks. Please do so.

Thanks.

>
> Hauke
>
- Jiaxun
Hauke Mehrtens July 4, 2020, 12:29 p.m. UTC | #3
On 7/4/20 9:37 AM, Jiaxun Yang wrote:
> 
> 
> 在 2020/7/4 上午6:31, Hauke Mehrtens 写道:
>> On 7/3/20 12:53 AM, Hauke Mehrtens wrote:
>>> This resolves the hazard between the mtc0 in the change_c0_status() and
>>> the mfc0 in configure_exception_vector(). Without resolving this hazard
>>> configure_exception_vector() could read an old value and would restore
>>> this old value again. This would revert the changes change_c0_status()
>>> did. I checked this by printing out the read_c0_status() at the end of
>>> per_cpu_trap_init() and the ST0_MX is not set without this patch.
>>>
>>> The hazard is documented in the MIPS Architecture Reference Manual Vol.
>>> III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
>>> 6.03 table 8.1 which includes:
>>>
>>>     Producer | Consumer | Hazard
>>>    ----------|----------|----------------------------
>>>     mtc0     | mfc0     | any coprocessor 0 register
>>>
>>> I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
>>> There the change_c0_status() function would activate the DSPen by
>>> setting ST0_MX in the c0_status register. This was reverted and then the
>>> system got a DSP exception when the DSP registers were saved in
>>> save_dsp() in the first process switch. The crash looks like this:
>>>
>>> [    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096
>>> bytes, linear)
>>> [    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0,
>>> 4096 bytes, linear)
>>> [    0.107070] Kernel panic - not syncing: Unexpected DSP exception
>>> [    0.113470] Rebooting in 1 seconds..
>>>
>>> We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
>>> not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
>>> kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
>>> did not happen.
>>>
>>> In the kernel I bisected this problem to commit 9012d011660e ("compiler:
>>> allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
>>> reverted it also happened after commit 172dcd935c34b ("MIPS: Always
>>> allocate exception vector for MIPSr2+").
>>>
>>> Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
>>> does similar changes to a different file. I am not sure if there are
>>> more places affected by this problem.
>>>
>>> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>>   arch/mips/kernel/traps.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>>> index 7c32c956156a..1234ea21dd8f 100644
>>> --- a/arch/mips/kernel/traps.c
>>> +++ b/arch/mips/kernel/traps.c
>>> @@ -2169,6 +2169,7 @@ static void configure_status(void)
>>>        
>>> change_c0_status(ST0_CU|ST0_MX|ST0_RE|ST0_FR|ST0_BEV|ST0_TS|ST0_KX|ST0_SX|ST0_UX,
>>>
>>>                status_set);
>>> +    back_to_back_c0_hazard();
>>>   }
>>>     unsigned int hwrena;
>>>
>> The MIPS InterAptiv system software user manual suggests to put this
>> back_to_back_c0_hazard() close to the consumer. Should I move it there?
> I assume the suggestion was for performance concern.
> I'd prefer just put it after c0 write, so we can easily trace missing
> barriers.

Yes, I also think this is for performance reasons, but this is not in a
hotpath. I would leave it as is now and do not plan to change this patch.

>> Should I also add a back_to_back_c0_hazard() close to the other places
>> which could be affected by similar problems even when I do not see them
>> at runtime?
> That's preventing us from potential risks. Please do so.

I will send separate patches for these later.

Hauke
Thomas Bogendoerfer July 5, 2020, 9:46 a.m. UTC | #4
On Fri, Jul 03, 2020 at 12:53:34AM +0200, Hauke Mehrtens wrote:
> This resolves the hazard between the mtc0 in the change_c0_status() and
> the mfc0 in configure_exception_vector(). Without resolving this hazard
> configure_exception_vector() could read an old value and would restore
> this old value again. This would revert the changes change_c0_status()
> did. I checked this by printing out the read_c0_status() at the end of
> per_cpu_trap_init() and the ST0_MX is not set without this patch.
> 
> The hazard is documented in the MIPS Architecture Reference Manual Vol.
> III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
> 6.03 table 8.1 which includes:
> 
>    Producer | Consumer | Hazard
>   ----------|----------|----------------------------
>    mtc0     | mfc0     | any coprocessor 0 register
> 
> I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
> There the change_c0_status() function would activate the DSPen by
> setting ST0_MX in the c0_status register. This was reverted and then the
> system got a DSP exception when the DSP registers were saved in
> save_dsp() in the first process switch. The crash looks like this:
> 
> [    0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.107070] Kernel panic - not syncing: Unexpected DSP exception
> [    0.113470] Rebooting in 1 seconds..
> 
> We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
> not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
> kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
> did not happen.
> 
> In the kernel I bisected this problem to commit 9012d011660e ("compiler:
> allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
> reverted it also happened after commit 172dcd935c34b ("MIPS: Always
> allocate exception vector for MIPSr2+").
> 
> Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
> does similar changes to a different file. I am not sure if there are
> more places affected by this problem.
> 
> Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
> Cc: <stable@vger.kernel.org>
> ---
>  arch/mips/kernel/traps.c | 1 +
>  1 file changed, 1 insertion(+)

applied to mips-fixes.

Thomas.
diff mbox series

Patch

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 7c32c956156a..1234ea21dd8f 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2169,6 +2169,7 @@  static void configure_status(void)
 
 	change_c0_status(ST0_CU|ST0_MX|ST0_RE|ST0_FR|ST0_BEV|ST0_TS|ST0_KX|ST0_SX|ST0_UX,
 			 status_set);
+	back_to_back_c0_hazard();
 }
 
 unsigned int hwrena;