diff mbox series

[v2,2/7] MIPS: csrc-r4k: Apply verification clocksource flags

Message ID 20240612-mips-clks-v2-2-a57e6f49f3db@flygoat.com (mailing list archive)
State Accepted
Commit 7190401fc56fb5f02ee3d04476778ab000bbaf32
Headers show
Series MIPS: clocksource cumulative enhancements | expand

Commit Message

Jiaxun Yang June 12, 2024, 8:54 a.m. UTC
CP0 counter suffers from various problems like SMP sync,
behaviour on wait.

Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
as what x86 did to TSC, to let kernel test it before use.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
---
 arch/mips/kernel/csrc-r4k.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Guenter Roeck Aug. 6, 2024, 4:09 a.m. UTC | #1
Hi,

On Wed, Jun 12, 2024 at 09:54:29AM +0100, Jiaxun Yang wrote:
> CP0 counter suffers from various problems like SMP sync,
> behaviour on wait.
> 
> Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
> as what x86 did to TSC, to let kernel test it before use.
> 
> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>

With this patch in the mainline kernel, about one in five qemu
boot attempts with e1000 Ethernet controller fail to activate
the network interface (specifically, the dhcp client is unable to
get an IP address for the interface). Bisect log is attached below.

For reference, here is an example command line.

qemu-system-mips64 -kernel vmlinux -M malta -cpu 5KEc \
	-initrd rootfs-n32.cpio \
	-device e1000,netdev=net0 -netdev user,id=net0 \
	-vga cirrus -no-reboot -m 256 \
	--append "rdinit=/sbin/init mem=256M console=ttyS0 console=tty " \
	-nographic

Reverting this patch fixes the probem.

Thanks,
Guenter

---
# bad: [de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed] Linux 6.11-rc2
# good: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
git bisect start 'HEAD' 'v6.10'
# good: [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup guard
git bisect good 280e36f0d5b997173d014c07484c03a7f7750668
# good: [a4f9285520584977127946a22eab2adfbc87d1bf] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good a4f9285520584977127946a22eab2adfbc87d1bf
# bad: [8e313211f7d46d42b6aa7601b972fe89dcc4a076] Merge tag 'pinctrl-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad 8e313211f7d46d42b6aa7601b972fe89dcc4a076
# good: [acc5965b9ff8a1889f5b51466562896d59c6e1b9] Merge tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good acc5965b9ff8a1889f5b51466562896d59c6e1b9
# bad: [d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d] Merge tag 'mips_6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect bad d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d
# good: [45659274e60864f9acabba844468e405362bdc8c] Merge branch 'pci/misc'
git bisect good 45659274e60864f9acabba844468e405362bdc8c
# good: [8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74] Merge tag 'input-for-v6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74
# good: [3c3ff7be9729959699eb6cbc7fd7303566d74069] Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 3c3ff7be9729959699eb6cbc7fd7303566d74069
# good: [3de96d810ffd712b7ad2bd764c1390fac2436551] dt-bindings: mips: brcm: Document brcm,bmips-cbr-reg property
git bisect good 3de96d810ffd712b7ad2bd764c1390fac2436551
# bad: [9c7a86c935074525f24cc20e78a7d5150e4600e3] MIPS: lantiq: improve USB initialization
git bisect bad 9c7a86c935074525f24cc20e78a7d5150e4600e3
# bad: [580724fce27f2b71b3e4d58bbe6d83b671929b33] MIPS: sync-r4k: Rework based on x86 tsc_sync
git bisect bad 580724fce27f2b71b3e4d58bbe6d83b671929b33
# good: [c171186c177970d3ec22dd814f2693f1f7fc1e7d] MIPS: csrc-r4k: Refine rating computation
git bisect good c171186c177970d3ec22dd814f2693f1f7fc1e7d
# bad: [426fa8e4fe7bb914b5977cbce453a9926bf5b2e6] MIPS: csrc-r4k: Select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT
git bisect bad 426fa8e4fe7bb914b5977cbce453a9926bf5b2e6
# bad: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS: csrc-r4k: Apply verification clocksource flags
git bisect bad 7190401fc56fb5f02ee3d04476778ab000bbaf32
# first bad commit: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS: csrc-r4k: Apply verification clocksource flags
Jiaxun Yang Aug. 6, 2024, 5:06 a.m. UTC | #2
在2024年8月6日八月 下午12:09,Guenter Roeck写道:
> Hi,
>
> On Wed, Jun 12, 2024 at 09:54:29AM +0100, Jiaxun Yang wrote:
>> CP0 counter suffers from various problems like SMP sync,
>> behaviour on wait.
>> 
>> Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
>> as what x86 did to TSC, to let kernel test it before use.
>> 
>> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>

Hi Guenter,

Thanks for the report, it makes no sense to me though....

I can't reproduce it with QEMU git master, do you mind specifying your QEMU
version for me? Also is it possible to have a copy of dmesg when failure happens.

If I'm unable to resolve it in a couple of days, I'll send a patch to revert this change.

Thanks

>
> With this patch in the mainline kernel, about one in five qemu
> boot attempts with e1000 Ethernet controller fail to activate
> the network interface (specifically, the dhcp client is unable to
> get an IP address for the interface). Bisect log is attached below.
>
> For reference, here is an example command line.
>
> qemu-system-mips64 -kernel vmlinux -M malta -cpu 5KEc \
> 	-initrd rootfs-n32.cpio \
> 	-device e1000,netdev=net0 -netdev user,id=net0 \
> 	-vga cirrus -no-reboot -m 256 \
> 	--append "rdinit=/sbin/init mem=256M console=ttyS0 console=tty " \
> 	-nographic
>
> Reverting this patch fixes the probem.
>
> Thanks,
> Guenter
>
> ---
> # bad: [de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed] Linux 6.11-rc2
> # good: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
> git bisect start 'HEAD' 'v6.10'
> # good: [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup 
> guard
> git bisect good 280e36f0d5b997173d014c07484c03a7f7750668
> # good: [a4f9285520584977127946a22eab2adfbc87d1bf] Merge tag 
> 'clk-for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
> git bisect good a4f9285520584977127946a22eab2adfbc87d1bf
> # bad: [8e313211f7d46d42b6aa7601b972fe89dcc4a076] Merge tag 
> 'pinctrl-v6.11-1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> git bisect bad 8e313211f7d46d42b6aa7601b972fe89dcc4a076
> # good: [acc5965b9ff8a1889f5b51466562896d59c6e1b9] Merge tag 
> 'char-misc-6.11-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
> git bisect good acc5965b9ff8a1889f5b51466562896d59c6e1b9
> # bad: [d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d] Merge tag 'mips_6.11' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
> git bisect bad d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d
> # good: [45659274e60864f9acabba844468e405362bdc8c] Merge branch 
> 'pci/misc'
> git bisect good 45659274e60864f9acabba844468e405362bdc8c
> # good: [8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74] Merge tag 
> 'input-for-v6.11-rc0' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
> git bisect good 8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74
> # good: [3c3ff7be9729959699eb6cbc7fd7303566d74069] Merge tag 
> 'powerpc-6.11-1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
> git bisect good 3c3ff7be9729959699eb6cbc7fd7303566d74069
> # good: [3de96d810ffd712b7ad2bd764c1390fac2436551] dt-bindings: mips: 
> brcm: Document brcm,bmips-cbr-reg property
> git bisect good 3de96d810ffd712b7ad2bd764c1390fac2436551
> # bad: [9c7a86c935074525f24cc20e78a7d5150e4600e3] MIPS: lantiq: improve 
> USB initialization
> git bisect bad 9c7a86c935074525f24cc20e78a7d5150e4600e3
> # bad: [580724fce27f2b71b3e4d58bbe6d83b671929b33] MIPS: sync-r4k: 
> Rework based on x86 tsc_sync
> git bisect bad 580724fce27f2b71b3e4d58bbe6d83b671929b33
> # good: [c171186c177970d3ec22dd814f2693f1f7fc1e7d] MIPS: csrc-r4k: 
> Refine rating computation
> git bisect good c171186c177970d3ec22dd814f2693f1f7fc1e7d
> # bad: [426fa8e4fe7bb914b5977cbce453a9926bf5b2e6] MIPS: csrc-r4k: 
> Select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT
> git bisect bad 426fa8e4fe7bb914b5977cbce453a9926bf5b2e6
> # bad: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS: csrc-r4k: Apply 
> verification clocksource flags
> git bisect bad 7190401fc56fb5f02ee3d04476778ab000bbaf32
> # first bad commit: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS: 
> csrc-r4k: Apply verification clocksource flags
Guenter Roeck Aug. 6, 2024, 5:13 a.m. UTC | #3
On 8/5/24 22:06, Jiaxun Yang wrote:
> 
> 
> 在2024年8月6日八月 下午12:09,Guenter Roeck写道:
>> Hi,
>>
>> On Wed, Jun 12, 2024 at 09:54:29AM +0100, Jiaxun Yang wrote:
>>> CP0 counter suffers from various problems like SMP sync,
>>> behaviour on wait.
>>>
>>> Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
>>> as what x86 did to TSC, to let kernel test it before use.
>>>
>>> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
> 
> Hi Guenter,
> 
> Thanks for the report, it makes no sense to me though....
> 
> I can't reproduce it with QEMU git master, do you mind specifying your QEMU
> version for me? Also is it possible to have a copy of dmesg when failure happens.
> 

I currently use v9.0.2. I'll try with some other versions tomorrow.
A complete log is at
https://kerneltests.org/builders/qemu-mips64-master/builds/241/steps/qemubuildcommand/logs/stdio

Are you trying to instantiate an e1000 (or a variant of it) ? So far
I have only seen the problem with that controller. There is no specific
error message, the network interface just doesn't get an IP address.

Thanks,
Guenter

> If I'm unable to resolve it in a couple of days, I'll send a patch to revert this change.
> 
> Thanks
> 
>>
>> With this patch in the mainline kernel, about one in five qemu
>> boot attempts with e1000 Ethernet controller fail to activate
>> the network interface (specifically, the dhcp client is unable to
>> get an IP address for the interface). Bisect log is attached below.
>>
>> For reference, here is an example command line.
>>
>> qemu-system-mips64 -kernel vmlinux -M malta -cpu 5KEc \
>> 	-initrd rootfs-n32.cpio \
>> 	-device e1000,netdev=net0 -netdev user,id=net0 \
>> 	-vga cirrus -no-reboot -m 256 \
>> 	--append "rdinit=/sbin/init mem=256M console=ttyS0 console=tty " \
>> 	-nographic
>>
>> Reverting this patch fixes the probem.
>>
>> Thanks,
>> Guenter
>>
>> ---
>> # bad: [de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed] Linux 6.11-rc2
>> # good: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
>> git bisect start 'HEAD' 'v6.10'
>> # good: [280e36f0d5b997173d014c07484c03a7f7750668] nsfs: use cleanup
>> guard
>> git bisect good 280e36f0d5b997173d014c07484c03a7f7750668
>> # good: [a4f9285520584977127946a22eab2adfbc87d1bf] Merge tag
>> 'clk-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
>> git bisect good a4f9285520584977127946a22eab2adfbc87d1bf
>> # bad: [8e313211f7d46d42b6aa7601b972fe89dcc4a076] Merge tag
>> 'pinctrl-v6.11-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>> git bisect bad 8e313211f7d46d42b6aa7601b972fe89dcc4a076
>> # good: [acc5965b9ff8a1889f5b51466562896d59c6e1b9] Merge tag
>> 'char-misc-6.11-rc1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
>> git bisect good acc5965b9ff8a1889f5b51466562896d59c6e1b9
>> # bad: [d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d] Merge tag 'mips_6.11'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
>> git bisect bad d2be38b9a5514dbc7dc0c96a2a7f619fcddce00d
>> # good: [45659274e60864f9acabba844468e405362bdc8c] Merge branch
>> 'pci/misc'
>> git bisect good 45659274e60864f9acabba844468e405362bdc8c
>> # good: [8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74] Merge tag
>> 'input-for-v6.11-rc0' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
>> git bisect good 8e5c0abfa02d85b9cd2419567ad2d73ed8fe4b74
>> # good: [3c3ff7be9729959699eb6cbc7fd7303566d74069] Merge tag
>> 'powerpc-6.11-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
>> git bisect good 3c3ff7be9729959699eb6cbc7fd7303566d74069
>> # good: [3de96d810ffd712b7ad2bd764c1390fac2436551] dt-bindings: mips:
>> brcm: Document brcm,bmips-cbr-reg property
>> git bisect good 3de96d810ffd712b7ad2bd764c1390fac2436551
>> # bad: [9c7a86c935074525f24cc20e78a7d5150e4600e3] MIPS: lantiq: improve
>> USB initialization
>> git bisect bad 9c7a86c935074525f24cc20e78a7d5150e4600e3
>> # bad: [580724fce27f2b71b3e4d58bbe6d83b671929b33] MIPS: sync-r4k:
>> Rework based on x86 tsc_sync
>> git bisect bad 580724fce27f2b71b3e4d58bbe6d83b671929b33
>> # good: [c171186c177970d3ec22dd814f2693f1f7fc1e7d] MIPS: csrc-r4k:
>> Refine rating computation
>> git bisect good c171186c177970d3ec22dd814f2693f1f7fc1e7d
>> # bad: [426fa8e4fe7bb914b5977cbce453a9926bf5b2e6] MIPS: csrc-r4k:
>> Select HAVE_UNSTABLE_SCHED_CLOCK if SMP && 64BIT
>> git bisect bad 426fa8e4fe7bb914b5977cbce453a9926bf5b2e6
>> # bad: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS: csrc-r4k: Apply
>> verification clocksource flags
>> git bisect bad 7190401fc56fb5f02ee3d04476778ab000bbaf32
>> # first bad commit: [7190401fc56fb5f02ee3d04476778ab000bbaf32] MIPS:
>> csrc-r4k: Apply verification clocksource flags
>
Guenter Roeck Aug. 6, 2024, 3:06 p.m. UTC | #4
On 8/5/24 22:13, Guenter Roeck wrote:
> On 8/5/24 22:06, Jiaxun Yang wrote:
>>
>>
>> 在2024年8月6日八月 下午12:09,Guenter Roeck写道:
>>> Hi,
>>>
>>> On Wed, Jun 12, 2024 at 09:54:29AM +0100, Jiaxun Yang wrote:
>>>> CP0 counter suffers from various problems like SMP sync,
>>>> behaviour on wait.
>>>>
>>>> Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
>>>> as what x86 did to TSC, to let kernel test it before use.
>>>>
>>>> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
>>
>> Hi Guenter,
>>
>> Thanks for the report, it makes no sense to me though....
>>
>> I can't reproduce it with QEMU git master, do you mind specifying your QEMU
>> version for me? Also is it possible to have a copy of dmesg when failure happens.
>>
> 
> I currently use v9.0.2. I'll try with some other versions tomorrow.
> A complete log is at
> https://kerneltests.org/builders/qemu-mips64-master/builds/241/steps/qemubuildcommand/logs/stdio
> 
> Are you trying to instantiate an e1000 (or a variant of it) ? So far
> I have only seen the problem with that controller. There is no specific
> error message, the network interface just doesn't get an IP address.
> 

I am able to reproduce the problem with qemu 6.2.0 (Debian build).
http://server.roeck-us.net/qemu/mips64/ should have everything needed to
reproduce it. "repeat.sh" repeats the test until it fails.

Hope this helps,
Guenter
Jiaxun Yang Aug. 8, 2024, 7:35 a.m. UTC | #5
在2024年8月6日八月 下午4:06,Guenter Roeck写道:
> On 8/5/24 22:13, Guenter Roeck wrote:
>> On 8/5/24 22:06, Jiaxun Yang wrote:
>>>
>>>
>>> 在2024年8月6日八月 下午12:09,Guenter Roeck写道:
>>>> Hi,
>>>>
>>>> On Wed, Jun 12, 2024 at 09:54:29AM +0100, Jiaxun Yang wrote:
>>>>> CP0 counter suffers from various problems like SMP sync,
>>>>> behaviour on wait.
>>>>>
>>>>> Set CLOCK_SOURCE_MUST_VERIFY and CLOCK_SOURCE_VERIFY_PERCPU,
>>>>> as what x86 did to TSC, to let kernel test it before use.
>>>>>
>>>>> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
>>>
>>> Hi Guenter,
>>>
>>> Thanks for the report, it makes no sense to me though....
>>>
>>> I can't reproduce it with QEMU git master, do you mind specifying your QEMU
>>> version for me? Also is it possible to have a copy of dmesg when failure happens.
>>>
>> 
>> I currently use v9.0.2. I'll try with some other versions tomorrow.
>> A complete log is at
>> https://kerneltests.org/builders/qemu-mips64-master/builds/241/steps/qemubuildcommand/logs/stdio
>> 
>> Are you trying to instantiate an e1000 (or a variant of it) ? So far
>> I have only seen the problem with that controller. There is no specific
>> error message, the network interface just doesn't get an IP address.
>> 
>
> I am able to reproduce the problem with qemu 6.2.0 (Debian build).
> http://server.roeck-us.net/qemu/mips64/ should have everything needed to
> reproduce it. "repeat.sh" repeats the test until it fails.

Thanks for the info, I'm able to reproduce that. It can be reproduced faster
on system with lower CPU performance.

So the actual failure is:

clocksource: timekeeping watchdog on CPU0: Marking clocksource 'MIPS' as unstable because the skew is too large:
clocksource:                       'jiffies' wd_nsec: 500000000 wd_now: ffff8bde wd_last: ffff8bac mask: ffffffff
clocksource:                       'MIPS' cs_nsec: 940634468 cs_now: 310181c4 cs_last: 28090a09 mask: ffffffff
clocksource:                       Clocksource 'MIPS' skewed 440634468 ns (440 ms) over watchdog 'jiffies' interval of 500000000 ns (500 ms)
clocksource:                       'MIPS' is current clocksource.

Jiffies is not an ideal clocksource as watchdog base, really....
I guess clocksource selection process needs to be improved, let me think about it.

>
> Hope this helps,
> Guenter
diff mbox series

Patch

diff --git a/arch/mips/kernel/csrc-r4k.c b/arch/mips/kernel/csrc-r4k.c
index f02ae333f4f9..055747a7417d 100644
--- a/arch/mips/kernel/csrc-r4k.c
+++ b/arch/mips/kernel/csrc-r4k.c
@@ -21,7 +21,9 @@  static struct clocksource clocksource_mips = {
 	.name		= "MIPS",
 	.read		= c0_hpt_read,
 	.mask		= CLOCKSOURCE_MASK(32),
-	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags		= CLOCK_SOURCE_IS_CONTINUOUS |
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 };
 
 static u64 __maybe_unused notrace r4k_read_sched_clock(void)