diff mbox series

[2/2] MIPS: cevt-r4k: Offset counter value for clearing compare interrupt

Message ID 20230225221008.8520-3-jiaxun.yang@flygoat.com (mailing list archive)
State Superseded
Headers show
Series MIPS Booting fixes | expand

Commit Message

Jiaxun Yang Feb. 25, 2023, 10:10 p.m. UTC
In c0_compare_int_usable we clear compare interrupt by write value
just read out from counter to compare register.

However sometimes if those all instructions are graduated together
then it's possible that at the time compare register is written, the
counter haven't progressed, thus the interrupt is triggered again.

It also applies to QEMU that instructions is execuated significantly
faster then counter.

Offset the counter value a litlle bit to prevent that happen.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
---
 arch/mips/kernel/cevt-r4k.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Philippe Mathieu-Daudé Feb. 26, 2023, 10:04 p.m. UTC | #1
On 25/2/23 23:10, Jiaxun Yang wrote:
> In c0_compare_int_usable we clear compare interrupt by write value
> just read out from counter to compare register.
> 
> However sometimes if those all instructions are graduated together
> then it's possible that at the time compare register is written, the
> counter haven't progressed, thus the interrupt is triggered again.
> 
> It also applies to QEMU that instructions is execuated significantly

Typo "executed",

> faster then counter.
> 
> Offset the counter value a litlle bit to prevent that happen.

"little".

> 
> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> ---
>   arch/mips/kernel/cevt-r4k.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
> index 32ec67c9ab67..bbc422376e97 100644
> --- a/arch/mips/kernel/cevt-r4k.c
> +++ b/arch/mips/kernel/cevt-r4k.c
> @@ -200,6 +200,8 @@ int c0_compare_int_usable(void)
>   	 */
>   	if (c0_compare_int_pending()) {
>   		cnt = read_c0_count();
> +		// Drawdown a little bit in case counter haven't progressed
> +		cnt -= COMPARE_INT_SEEN_TICKS;
>   		write_c0_compare(cnt);
>   		back_to_back_c0_hazard();
>   		while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
> @@ -228,6 +230,7 @@ int c0_compare_int_usable(void)
>   	if (!c0_compare_int_pending())
>   		return 0;
>   	cnt = read_c0_count();
> +	cnt -= COMPARE_INT_SEEN_TICKS;
>   	write_c0_compare(cnt);
>   	back_to_back_c0_hazard();
>   	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))
Thomas Bogendoerfer Feb. 26, 2023, 11:23 p.m. UTC | #2
On Sat, Feb 25, 2023 at 10:10:08PM +0000, Jiaxun Yang wrote:
> In c0_compare_int_usable we clear compare interrupt by write value
> just read out from counter to compare register.
> 
> However sometimes if those all instructions are graduated together
> then it's possible that at the time compare register is written, the
> counter haven't progressed, thus the interrupt is triggered again.
> 
> It also applies to QEMU that instructions is execuated significantly
> faster then counter.
> 
> Offset the counter value a litlle bit to prevent that happen.
> 
> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> ---
>  arch/mips/kernel/cevt-r4k.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
> index 32ec67c9ab67..bbc422376e97 100644
> --- a/arch/mips/kernel/cevt-r4k.c
> +++ b/arch/mips/kernel/cevt-r4k.c
> @@ -200,6 +200,8 @@ int c0_compare_int_usable(void)
>  	 */
>  	if (c0_compare_int_pending()) {
>  		cnt = read_c0_count();
> +		// Drawdown a little bit in case counter haven't progressed

no C++ comments

> +		cnt -= COMPARE_INT_SEEN_TICKS;
>  		write_c0_compare(cnt);
>  		back_to_back_c0_hazard();
>  		while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))

this doesn't make sense. clearing of the interrupts happes because of
this loop. So either COMPARE_INT_SEEN_TICKS is too small or you are
hunting a different bug.

Thomas.
Jiaxun Yang Feb. 27, 2023, 1:22 a.m. UTC | #3
> 2023年2月26日 23:23,Thomas Bogendoerfer <tsbogend@alpha.franken.de> 写道:
> 
> On Sat, Feb 25, 2023 at 10:10:08PM +0000, Jiaxun Yang wrote:
>> In c0_compare_int_usable we clear compare interrupt by write value
>> just read out from counter to compare register.
>> 
>> However sometimes if those all instructions are graduated together
>> then it's possible that at the time compare register is written, the
>> counter haven't progressed, thus the interrupt is triggered again.
>> 
>> It also applies to QEMU that instructions is execuated significantly
>> faster then counter.
>> 
>> Offset the counter value a litlle bit to prevent that happen.
>> 
>> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
>> ---
>> arch/mips/kernel/cevt-r4k.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
>> index 32ec67c9ab67..bbc422376e97 100644
>> --- a/arch/mips/kernel/cevt-r4k.c
>> +++ b/arch/mips/kernel/cevt-r4k.c
>> @@ -200,6 +200,8 @@ int c0_compare_int_usable(void)
>> */
>> if (c0_compare_int_pending()) {
>> cnt = read_c0_count();
>> + // Drawdown a little bit in case counter haven't progressed
> 
> no C++ comments
> 
>> + cnt -= COMPARE_INT_SEEN_TICKS;
>> write_c0_compare(cnt);
>> back_to_back_c0_hazard();
>> while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
> 
> this doesn't make sense. clearing of the interrupts happes because of
> this loop. So either COMPARE_INT_SEEN_TICKS is too small or you are
> hunting a different bug.

Clearing interrupt happens in write_c0_compare but the problem is the interrupt
does really get cleared because it triggered again straight away.

Had confirmed issue on MIPS I6500 uarch simulation trace.

Ah I see, it makes more sense if I move minus into write_c0_compare so the
loop will still run. Though this loop is not required on any MTI cores, TI is
always clear immediately since very early 4Kc.

Will send v2 later.

Thanks.

 

> 
> Thomas.
> 
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
Thomas Bogendoerfer Feb. 27, 2023, 9:31 a.m. UTC | #4
On Mon, Feb 27, 2023 at 01:22:47AM +0000, Jiaxun Yang wrote:
> 
> 
> > 2023年2月26日 23:23,Thomas Bogendoerfer <tsbogend@alpha.franken.de> 写道:
> > 
> > On Sat, Feb 25, 2023 at 10:10:08PM +0000, Jiaxun Yang wrote:
> >> In c0_compare_int_usable we clear compare interrupt by write value
> >> just read out from counter to compare register.
> >> 
> >> However sometimes if those all instructions are graduated together
> >> then it's possible that at the time compare register is written, the
> >> counter haven't progressed, thus the interrupt is triggered again.
> >> 
> >> It also applies to QEMU that instructions is execuated significantly
> >> faster then counter.
> >> 
> >> Offset the counter value a litlle bit to prevent that happen.
> >> 
> >> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> >> ---
> >> arch/mips/kernel/cevt-r4k.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
> >> index 32ec67c9ab67..bbc422376e97 100644
> >> --- a/arch/mips/kernel/cevt-r4k.c
> >> +++ b/arch/mips/kernel/cevt-r4k.c
> >> @@ -200,6 +200,8 @@ int c0_compare_int_usable(void)
> >> */
> >> if (c0_compare_int_pending()) {
> >> cnt = read_c0_count();
> >> + // Drawdown a little bit in case counter haven't progressed
> > 
> > no C++ comments
> > 
> >> + cnt -= COMPARE_INT_SEEN_TICKS;
> >> write_c0_compare(cnt);
> >> back_to_back_c0_hazard();
> >> while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
> > 
> > this doesn't make sense. clearing of the interrupts happes because of
> > this loop. So either COMPARE_INT_SEEN_TICKS is too small or you are
> > hunting a different bug.
> 
> Clearing interrupt happens in write_c0_compare but the problem is the interrupt
> does really get cleared because it triggered again straight away.

the function you are patching is a test function whether counter/compare
is usable, so it's not in the normal timer handling. See a problem there
would more to broken hardware than to a bug in that function.

> Had confirmed issue on MIPS I6500 uarch simulation trace.

so not seen on real hardware ?

Thomas.
Jiaxun Yang Feb. 27, 2023, 11:44 a.m. UTC | #5
> 2023年2月27日 09:31,Thomas Bogendoerfer <tsbogend@alpha.franken.de> 写道:
> 
> On Mon, Feb 27, 2023 at 01:22:47AM +0000, Jiaxun Yang wrote:
>> 
>> 
>> 
[...]
>> Clearing interrupt happens in write_c0_compare but the problem is the interrupt
>> does really get cleared because it triggered again straight away.
> 
> the function you are patching is a test function whether counter/compare
> is usable, so it's not in the normal timer handling. See a problem there
> would more to broken hardware than to a bug in that function.

The problem is sometimes on a multi-core system this function managed
to run on one of the core but then fail on another.

In our real use of compare interrupt, we are clearing interrupt by writing back
old value of compare, which is guarenteed to be smaller then current count
because there are so many instructions being executed from exception vector
to here.

Also as we have no way to disable compare interrupt, if this function determined
compare to be broken then it won’t register event_handler function, which will
cause panic for null pointer when unexpected-ish interrupt comes.

> 
>> Had confirmed issue on MIPS I6500 uarch simulation trace.
> 
> so not seen on real hardware ?

It was discovered on FPGA and then I tried to debug with simulation to reveal what’s
going on hardware side.

Thanks
- Jiaxun

> 
> Thomas.
> 
> -- 
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea.                                                [ RFC1925, 2.3 ]
Thomas Bogendoerfer Feb. 27, 2023, 4:48 p.m. UTC | #6
On Mon, Feb 27, 2023 at 11:44:48AM +0000, Jiaxun Yang wrote:
> > so not seen on real hardware ?
> 
> It was discovered on FPGA and then I tried to debug with simulation to
> reveal what’s going on hardware side.

ic, I think the best fix would be to do a write_c0_compare(cnt - 1).
Subtracting COMPARE_INT_SEEN_TICKS feels like overkill to me.

Thomas.
diff mbox series

Patch

diff --git a/arch/mips/kernel/cevt-r4k.c b/arch/mips/kernel/cevt-r4k.c
index 32ec67c9ab67..bbc422376e97 100644
--- a/arch/mips/kernel/cevt-r4k.c
+++ b/arch/mips/kernel/cevt-r4k.c
@@ -200,6 +200,8 @@  int c0_compare_int_usable(void)
 	 */
 	if (c0_compare_int_pending()) {
 		cnt = read_c0_count();
+		// Drawdown a little bit in case counter haven't progressed
+		cnt -= COMPARE_INT_SEEN_TICKS;
 		write_c0_compare(cnt);
 		back_to_back_c0_hazard();
 		while (read_c0_count() < (cnt  + COMPARE_INT_SEEN_TICKS))
@@ -228,6 +230,7 @@  int c0_compare_int_usable(void)
 	if (!c0_compare_int_pending())
 		return 0;
 	cnt = read_c0_count();
+	cnt -= COMPARE_INT_SEEN_TICKS;
 	write_c0_compare(cnt);
 	back_to_back_c0_hazard();
 	while (read_c0_count() < (cnt + COMPARE_INT_SEEN_TICKS))