
MIPS: process: Remove lazy context flags for new kernel thread

Message ID 20231026111715.1281728-1-jiaxun.yang@flygoat.com (mailing list archive)
State Superseded
Series MIPS: process: Remove lazy context flags for new kernel thread

Commit Message

Jiaxun Yang Oct. 26, 2023, 11:17 a.m. UTC
We received a report from the Debian infrastructure team saying that their
build machine crashes regularly with:

[ 4066.698500] do_cpu invoked from kernel context![#1]:
[ 4066.703455] CPU: 1 PID: 76608 Comm: iou-sqp-76326 Not tainted 5.10.0-21-loongson-3 #1 Debian 5.10.162-1
[ 4066.712793] Hardware name: Loongson Lemote-3A4000-7A-1w-V1.00-A1901/Lemote-3A4000-7A-1w-V1.00-A1901, BIOS Loongson-PMON-V3.3-20201222 12/22/2020
[ 4066.725672] $ 0   : 0000000000000000 ffffffff80bf2e48 0000000000000001 9800000200804000
[ 4066.733642] $ 4   : 9800000105115280 ffffffff80db4728 0000000000000008 0000020080000200
[ 4066.741607] $ 8   : 0000000000000001 0000000000000001 0000000000000000 0000000002e85400
[ 4066.749571] $12   : 000000005400cce0 ffffffff80199c00 000000000000036f 000000000000036f
[ 4066.757536] $16   : 980000010025c080 ffffffff80ec4740 0000000000000000 980000000234b8c0
[ 4066.765501] $20   : ffffffff80ec5ce0 9800000105115280 98000001051158a0 0000000000000000
[ 4066.773466] $24   : 0000000000000028 9800000200807e58
[ 4066.781431] $28   : 9800000200804000 9800000200807d40 980000000234b8c0 ffffffff80bf3074
[ 4066.789395] Hi    : 00000000000002fb
[ 4066.792943] Lo    : 00000000428f6816
[ 4066.796500] epc   : ffffffff802177c0 _save_fp+0x10/0xa0
[ 4066.801695] ra    : ffffffff80bf3074 __schedule+0x804/0xe08
[ 4066.807230] Status: 5400cce2 KX SX UX KERNEL EXL
[ 4066.811917] Cause : 1000002c (ExcCode 0b)
[ 4066.815899] PrId  : 0014c004 (ICT Loongson-3)
[ 4066.820228] Modules linked in: asix usbnet mii sg ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables nfnetlink_log nfnetlink xt_hashlimit ipt_REJECT nf_reject_ipv4 xt_NFLOG xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter sch_fq tcp_bbr fuse drm drm_panel_orientation_quirks configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic ohci_pci dm_mod r8169 realtek mdio_devres ohci_hcd ehci_pci of_mdio xhci_pci fixed_phy xhci_hcd ehci_hcd libphy usbcore usb_common
[ 4066.868085] Process iou-sqp-76326 (pid: 76608, threadinfo=0000000056dd346c, task=000000001209ac62, tls=000000fff18298e0)
[ 4066.878897] Stack : ffffffff80ec0000 0000000000000000 ffffffff80ec0000 980000010db34100
[ 4066.886867]         9800000100000004 d253a55201683fdc 9800000105115280 0000000000000000
[ 4066.894832]         0000000000000000 0000000000000001 980000010db340e8 0000000000000001
[ 4066.902796]         0000000000000004 0000000000000000 980000010db33d28 ffffffff80bf36d0
[ 4066.910761]         980000010db340e8 980000010db34100 980000010db340c8 ffffffff8070d740
[ 4066.918726]         980000010946cc80 9800000104b56c80 980000010db340c0 0000000000000000
[ 4066.926690]         ffffffff80ec0000 980000010db340c8 980000010025c080 ffffffff80ec5ce0
[ 4066.934654]         0000000000000000 9800000105115280 ffffffff802c59b8 980000010db34108
[ 4066.942619]         980000010db34108 2d7071732d756f69 ffff003632333637 d253a55201683fdc
[ 4066.950585]         ffffffff8070d1c8 980000010db340c0 98000001092276c8 000000007400cce0
[ 4066.958552]         ...
[ 4066.960981] Call Trace:
[ 4066.963414] [<ffffffff802177c0>] _save_fp+0x10/0xa0
[ 4066.968270] [<ffffffff80bf3074>] __schedule+0x804/0xe08
[ 4066.973462] [<ffffffff80bf36d0>] schedule+0x58/0x150
[ 4066.978397] [<ffffffff8070d740>] io_sq_thread+0x578/0x5a0
[ 4066.983764] [<ffffffff8020518c>] ret_from_kernel_thread+0x14/0x1c
[ 4066.989823]
[ 4066.991297] Code: 000c6940  05a10011  00000000 <f4810af0> f4830b10  f4850b30  f4870b50  f4890b70  f48b0b90

It seems the kernel is trying to save an FP context for a kthread.
Since the kernel does not use the FPU for now, TIF_USEDFPU must have
been set accidentally for that kthread.

Inspecting the code, it seems that create_io_thread may be invoked
from threads that have a live FP context, causing TIF_USEDFPU to be
copied from that context to the kthread unexpectedly.

Move code blocks around to ensure that the flags for lazy hardware
context are cleared for kernel threads as well.

Cc: stable@vger.kernel.org
Reported-by: Aurelien Jarno <aurel32@debian.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
---
Folks, it might be helpful to check ST0_CU1 in is_fpu_owner to catch
this kind of problem in the future. What's your opinion?
---
 arch/mips/kernel/process.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)
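As a cross-check of this analysis, the register values in the oops can be decoded with the standard MIPS CP0 field layouts. The C11 sketch below verifies three things at compile time. First, Cause 0x1000002c ("ExcCode 0b") is a Coprocessor Unusable exception raised for CP1. Second, the faulting word <f4810af0> in the "Code:" line is an SDC1, which is an FP store. Third, Status 0x5400cce2 has CU1 clear. That combination is what _save_fp would hit if it ran for a kthread whose status lacks CU1. The macro names are illustrative and are not kernel APIs. fpu_state_consistent() is a hypothetical user-space model of the ST0_CU1 check proposed above; it is not the kernel's is_fpu_owner().

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* CP0 Cause: ExcCode in bits 6:2, CE (coprocessor that faulted) in bits 29:28. */
#define CAUSE_EXCCODE(c) ((((uint32_t)(c)) >> 2) & 0x1fu)
#define CAUSE_CE(c)      ((((uint32_t)(c)) >> 28) & 0x3u)
#define EXCCODE_CPU      11u                 /* Coprocessor Unusable */

/* CP0 Status: CU1 (bit 29) grants access to coprocessor 1, the FPU. */
#define STATUS_CU1       (UINT32_C(1) << 29)

/* Major opcode is bits 31:26; 0x3d is SDC1 (store doubleword from CP1). */
#define INSN_OPCODE(i)   ((((uint32_t)(i)) >> 26) & 0x3fu)
#define OPCODE_SDC1      0x3du

/* Values copied from the oops above. */
#define OOPS_CAUSE       UINT32_C(0x1000002c)
#define OOPS_STATUS      UINT32_C(0x5400cce2)
#define OOPS_INSN        UINT32_C(0xf4810af0)   /* the <...> word in "Code:" */

/* "ExcCode 0b" is a Coprocessor Unusable exception raised for CP1 ... */
static_assert(CAUSE_EXCCODE(OOPS_CAUSE) == EXCCODE_CPU, "ExcCode 0b is CpU");
static_assert(CAUSE_CE(OOPS_CAUSE) == 1u, "the faulting coprocessor is CP1");
/* ... on an FP store inside _save_fp ... */
static_assert(INSN_OPCODE(OOPS_INSN) == OPCODE_SDC1, "faulting insn is SDC1");
/* ... while CU1 was clear in Status. */
static_assert((OOPS_STATUS & STATUS_CU1) == 0, "CU1 is clear in Status");

/* Hypothetical helper in the spirit of the ST0_CU1 suggestion: a task
 * flagged as owning the FPU should also have CU1 set in its Status. */
static inline bool fpu_state_consistent(bool tif_usedfpu, uint32_t status)
{
	return !tif_usedfpu || (status & STATUS_CU1) != 0;
}
```

With the oops values, fpu_state_consistent(true, OOPS_STATUS) is false. That is exactly the inconsistent state a debug check in is_fpu_owner could flag before _save_fp faults.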

Comments

Philippe Mathieu-Daudé Oct. 26, 2023, 11:36 a.m. UTC | #1
On 26/10/23 13:17, Jiaxun Yang wrote:
> [...]

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Aurelien Jarno Oct. 27, 2023, 6:47 p.m. UTC | #2
On 2023-10-27 16:58, Aurelien Jarno wrote:
> On 2023-10-26 12:17, Jiaxun Yang wrote:
> > [...]
> 
> Thanks for the patch. In the meantime we have found that the problem is
> reproducible by building the kitinerary package. The crash happens when
> cmake starts the build. It's not impossible that other packages are also
> able to trigger the crash, but we haven't identified them yet.

It seems the crash happens with any package built using cmake.
Aurelien Jarno Nov. 18, 2023, 3:36 p.m. UTC | #3
Hi,

On 2023-10-27 16:58, Aurelien Jarno wrote:
> On 2023-10-26 12:17, Jiaxun Yang wrote:
> > [...]
> 
> Thanks for the patch. In the meantime we have found that the problem is
> reproducible by building the kitinerary package. The crash happens when
> cmake starts the build. It's not impossible that other packages are also
> able to trigger the crash, but we haven't identified them yet.
> 
> Anyway, I have been able to test a backport of the patch onto the 5.10
> kernel (with minor adjustments) and I confirm it fixes the reported
> issue.
> 
> Tested-by: Aurelien Jarno <aurel32@debian.org>

It seems that this patch hasn't been merged yet, either in Linus' tree
or in the MIPS tree. Is there anything blocking?

Regards
Aurelien
Thomas Bogendoerfer Nov. 20, 2023, 7:08 p.m. UTC | #4
On Sat, Nov 18, 2023 at 04:36:45PM +0100, Aurelien Jarno wrote:
> > Anyway, I have been able to test a backport of the patch onto the 5.10
> > kernel (with minor adjustments) and I confirm it fixes the reported
> > issue.
> > 
> > Tested-by: Aurelien Jarno <aurel32@debian.org>
> 
> It seems that this patch hasn't been merged yet, either in Linus' tree
> or in the MIPS tree. Is there anything blocking?

Sorry, it took some time to really get back from vacation...

I don't like the patch doing too much code restructuring. I can't
reproduce the problem on my Loongson machine, so I can't test the patch below...

Which cmake version do I need, and which package would
reproduce the bug?

Thomas.

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 5387ed0a5186..b630604c577f 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -121,6 +121,19 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	/*  Put the stack after the struct pt_regs.  */
 	childksp = (unsigned long) childregs;
 	p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
+
+	/*
+	 * New tasks lose permission to use the fpu. This accelerates context
+	 * switching for most programs since they don't use the fpu.
+	 */
+	clear_tsk_thread_flag(p, TIF_USEDFPU);
+	clear_tsk_thread_flag(p, TIF_USEDMSA);
+	clear_tsk_thread_flag(p, TIF_MSA_CTX_LIVE);
+
+#ifdef CONFIG_MIPS_MT_FPAFF
+	clear_tsk_thread_flag(p, TIF_FPUBOUND);
+#endif /* CONFIG_MIPS_MT_FPAFF */
+
 	if (unlikely(args->fn)) {
 		/* kernel thread */
 		unsigned long status = p->thread.cp0_status;
@@ -149,20 +162,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	p->thread.reg29 = (unsigned long) childregs;
 	p->thread.reg31 = (unsigned long) ret_from_fork;
 
-	/*
-	 * New tasks lose permission to use the fpu. This accelerates context
-	 * switching for most programs since they don't use the fpu.
-	 */
 	childregs->cp0_status &= ~(ST0_CU2|ST0_CU1);
 
-	clear_tsk_thread_flag(p, TIF_USEDFPU);
-	clear_tsk_thread_flag(p, TIF_USEDMSA);
-	clear_tsk_thread_flag(p, TIF_MSA_CTX_LIVE);
-
-#ifdef CONFIG_MIPS_MT_FPAFF
-	clear_tsk_thread_flag(p, TIF_FPUBOUND);
-#endif /* CONFIG_MIPS_MT_FPAFF */
-
 #ifdef CONFIG_MIPS_FP_SUPPORT
 	atomic_set(&p->thread.bd_emu_frame, BD_EMUFRAME_NONE);
 #endif
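Both fixes differ only in where the clearing happens relative to the kernel-thread branch. The user-space model below shows why the original early return leaves an inherited TIF_USEDFPU in place, and why hoisting the clearing above the branch, as in the diff above, is enough. All MODEL_* names, the struct, and the functions are hypothetical stand-ins, and the real TIF_* bit values differ. As the commit message describes, the child is assumed to start with a copy of the parent's thread flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TIF_USEDFPU, TIF_USEDMSA, TIF_MSA_CTX_LIVE. */
#define MODEL_TIF_USEDFPU       (1u << 0)
#define MODEL_TIF_USEDMSA       (1u << 1)
#define MODEL_TIF_MSA_CTX_LIVE  (1u << 2)
#define MODEL_LAZY_CTX_FLAGS \
	(MODEL_TIF_USEDFPU | MODEL_TIF_USEDMSA | MODEL_TIF_MSA_CTX_LIVE)

struct model_task {
	unsigned int flags;   /* models the thread_info flags word */
	bool is_kthread;      /* models args->fn != NULL */
};

/* The child starts as a copy of the parent, lazy-context flags included. */
static struct model_task model_dup_task(const struct model_task *parent,
					bool kthread)
{
	struct model_task child = *parent;

	child.is_kthread = kthread;
	return child;
}

/* Before the fix: the kernel-thread branch returns early and skips the
 * flag clearing that only the user-thread path reaches. */
static int model_copy_thread_old(struct model_task *p)
{
	if (p->is_kthread)
		return 0;
	/* ... user-thread register setup ... */
	p->flags &= ~MODEL_LAZY_CTX_FLAGS;
	return 0;
}

/* Hoisted variant: clear the flags before branching, so the early
 * return for kernel threads can stay as it is. */
static int model_copy_thread_hoisted(struct model_task *p)
{
	p->flags &= ~MODEL_LAZY_CTX_FLAGS;
	if (p->is_kthread)
		return 0;
	/* ... user-thread register setup ... */
	return 0;
}

static bool model_has_stale_lazy_ctx(const struct model_task *p)
{
	return (p->flags & MODEL_LAZY_CTX_FLAGS) != 0;
}

/* An FP-using parent spawns a kernel thread, as create_io_thread does
 * for the io_uring SQPOLL worker (iou-sqp-...) in the report. */
static void model_demo(void)
{
	const struct model_task parent = { .flags = MODEL_TIF_USEDFPU };
	struct model_task old_child = model_dup_task(&parent, true);
	struct model_task new_child = model_dup_task(&parent, true);

	model_copy_thread_old(&old_child);
	model_copy_thread_hoisted(&new_child);

	assert(model_has_stale_lazy_ctx(&old_child));   /* the bug */
	assert(!model_has_stale_lazy_ctx(&new_child));  /* the fix */
}
```

Keeping the early return makes the diff small, and the minimal diff was the reason given for preferring this variant over the restructuring.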
Jiaxun Yang Nov. 21, 2023, 12:27 p.m. UTC | #5
On Nov 20, 2023, at 7:08 PM, Thomas Bogendoerfer wrote:
> On Sat, Nov 18, 2023 at 04:36:45PM +0100, Aurelien Jarno wrote:
>> > Anyway, I have been able to test a backport of the patch onto the 5.10
>> > kernel (with minor adjustments) and I confirm it fixes the reported
>> > issue.
>> > 
>> > Tested-by: Aurelien Jarno <aurel32@debian.org>
>> 
>> It seems that this patch hasn't been merged yet, either in Linus' tree
>> or in the MIPS tree. Is there anything blocking?
>
> sorry, took some time to get really back from vacation...
>
> I don't like the patch doing too much code restructuring. I can't
> reproduce the problem on my Loongson machine, so I can't test the patch below...

I intentionally shuffled the code around to match other arches :-)
To reproduce, you can just install Debian sid and build kitinerary with
sbuild. However, it seems that loongson3_defconfig doesn't expose this
problem; you'll have to build the kernel with Debian's config.

I'll test this patch later today.

Thanks
- Jiaxun

>
> Which cmake version do I need, and which package would
> reproduce the bug?
>
> Thomas.
>
[...]
Thomas Bogendoerfer Nov. 21, 2023, 12:38 p.m. UTC | #6
On Tue, Nov 21, 2023 at 12:27:11PM +0000, Jiaxun Yang wrote:
> > I don't like the patch doing too much code restructuring. I can't
> To reproduce, you can just install Debian sid and build kitinerary with

I found an io_uring test program that triggers it. Now my loongson3
machine needs someone to press reset at a remote location... Is there a way
to configure it to start automatically after power-off/power-on?

> sbuild. However, it seems that loongson3_defconfig doesn't expose this
> problem; you'll have to build the kernel with Debian's config.

CONFIG_IO_URING=y

that's the needed config option.

Thomas.
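The io_uring test program mentioned above is not included in the thread. Purely as an illustration, a reproducer consistent with the oops (comm iou-sqp-..., io_sq_thread in the call trace) might look like the sketch below. It rests on two assumptions that have not been tested. The first is that doing FP work immediately before io_uring_setup() leaves TIF_USEDFPU set on the calling task, so that the SQPOLL thread inherits it. The second is that the poller then gets scheduled out during the wait. It needs a kernel built with CONFIG_IO_URING=y, and depending on the kernel version, IORING_SETUP_SQPOLL may require elevated privileges. The helper names are hypothetical. Running it on an affected MIPS kernel may crash the machine.

```c
#define _GNU_SOURCE            /* declare syscall() */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Keep the FPU busy so the calling task plausibly owns a live FP context. */
static double touch_fpu(unsigned int iters)
{
	volatile double acc = 1.0;

	for (unsigned int i = 0; i < iters; i++)
		acc = acc * 1.0001 + 0.5;
	return acc;
}

/* Request a kernel-side submission-queue polling thread. */
static void sqpoll_params_init(struct io_uring_params *p, unsigned int idle_ms)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->flags = IORING_SETUP_SQPOLL;
	p->sq_thread_idle = idle_ms;   /* idle time (ms) before the poller sleeps */
}

/*
 * Hypothetical trigger: use the FPU, then immediately create an SQPOLL ring
 * from the same task and wait while the poller idles and gets scheduled out.
 * Returns 0 on success or a negative errno value.
 */
static int sqpoll_probe(unsigned int idle_ms, unsigned int wait_s)
{
#ifdef __NR_io_uring_setup
	struct io_uring_params p;
	int fd;

	sqpoll_params_init(&p, idle_ms);
	(void)touch_fpu(1000);
	fd = (int)syscall(__NR_io_uring_setup, 8, &p);
	if (fd < 0)
		return -errno;
	sleep(wait_s);
	close(fd);
	return 0;
#else
	(void)idle_ms;
	(void)wait_s;
	return -ENOSYS;
#endif
}
```

The sketch uses the raw syscall instead of liburing, so it builds with only the kernel UAPI headers.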
Jiaxun Yang Nov. 21, 2023, 12:44 p.m. UTC | #7
On Nov 21, 2023, at 12:38 PM, Thomas Bogendoerfer wrote:
> On Tue, Nov 21, 2023 at 12:27:11PM +0000, Jiaxun Yang wrote:
>> > I don't like the patch doing too much code restructuring. I can't
>> To reproduce, you can just install Debian sid and build kitinerary with
>
> I found an io_uring test program that triggers it. Now my loongson3
> machine needs someone to press reset at a remote location... Is there a way
> to configure it to start automatically after power-off/power-on?

There might be a switch in the UEFI firmware, but I'm not 100% sure :-(
WoL may work on that machine as well; my personal remote lab setup uses
an ESP8266 to control the reset and power button signals.

>
>> sbuild. However, it seems that loongson3_defconfig doesn't expose this
>> problem; you'll have to build the kernel with Debian's config.
>
> CONFIG_IO_URING=y
>
> that's the needed config option.

I tried that before, but it seems that's not enough.

Thanks.
>
> Thomas.
>
Jiaxun Yang Nov. 21, 2023, 12:45 p.m. UTC | #8
On Nov 21, 2023, at 12:44 PM, Jiaxun Yang wrote:
> On Nov 21, 2023, at 12:38 PM, Thomas Bogendoerfer wrote:
>> [...]
>>
>> CONFIG_IO_URING=y
>>
>> that's the needed config option.
>
> I tried that before, but it seems that's not enough.
^ Never mind, that was for the cmake workload.

Thanks
Thomas Bogendoerfer Nov. 21, 2023, 12:46 p.m. UTC | #9
On Tue, Nov 21, 2023 at 12:27:11PM +0000, Jiaxun Yang wrote:
> I'll test this patch later today.

Got it reproduced with QEMU, and the fix works there.

Thomas.

Patch

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 5387ed0a5186..fecffa32f3e0 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -136,24 +136,26 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		status |= ST0_EXL;
 #endif
 		childregs->cp0_status = status;
-		return 0;
-	}
+	} else {
+		/* user thread */
+		*childregs = *regs;
+		childregs->regs[7] = 0; /* Clear error flag */
+		childregs->regs[2] = 0; /* Child gets zero as return value */
+		if (usp)
+			childregs->regs[29] = usp;
 
-	/* user thread */
-	*childregs = *regs;
-	childregs->regs[7] = 0; /* Clear error flag */
-	childregs->regs[2] = 0; /* Child gets zero as return value */
-	if (usp)
-		childregs->regs[29] = usp;
+		p->thread.reg29 = (unsigned long) childregs;
+		p->thread.reg31 = (unsigned long) ret_from_fork;
 
-	p->thread.reg29 = (unsigned long) childregs;
-	p->thread.reg31 = (unsigned long) ret_from_fork;
+		/*
+		 * New tasks lose permission to use the fpu. This accelerates context
+		 * switching for most programs since they don't use the fpu.
+		 */
+		childregs->cp0_status &= ~(ST0_CU2|ST0_CU1);
 
-	/*
-	 * New tasks lose permission to use the fpu. This accelerates context
-	 * switching for most programs since they don't use the fpu.
-	 */
-	childregs->cp0_status &= ~(ST0_CU2|ST0_CU1);
+		if (clone_flags & CLONE_SETTLS)
+			ti->tp_value = tls;
+	}
 
 	clear_tsk_thread_flag(p, TIF_USEDFPU);
 	clear_tsk_thread_flag(p, TIF_USEDMSA);
@@ -167,9 +169,6 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	atomic_set(&p->thread.bd_emu_frame, BD_EMUFRAME_NONE);
 #endif
 
-	if (clone_flags & CLONE_SETTLS)
-		ti->tp_value = tls;
-
 	return 0;
 }