Message ID | 20220419112247.711548-1-broonie@kernel.org (mailing list archive) |
---|---|
Series | arm64/sme: Initial support for the Scalable Matrix Extension |
On 2022-04-19 12:22, Mark Brown wrote:
> This series provides initial support for the ARMv9 Scalable Matrix
> Extension (SME). SME takes the approach used for vectors in SVE and
> extends this to provide architectural support for matrix operations. A
> more detailed overview can be found in [1].

For the KVM patches:

Reviewed-by: Marc Zyngier <maz@kernel.org>

Catalin: the KVM patches are likely to clash a bit with the
WFxT stuff as well. It'd be good to swap stable branches!

Thanks,

M.
On Tue, 19 Apr 2022 12:22:08 +0100, Mark Brown wrote:
> This series provides initial support for the ARMv9 Scalable Matrix
> Extension (SME). SME takes the approach used for vectors in SVE and
> extends this to provide architectural support for matrix operations. A
> more detailed overview can be found in [1].
>
> For the kernel SME can be thought of as a series of features which are
> intended to be used together by applications but operate mostly
> orthogonally:
>
> [...]

Applied to arm64 (for-next/kselftest), thanks!

[28/39] kselftest/arm64: Add manual encodings for SME instructions
        https://git.kernel.org/arm64/c/b5d3f4daf4d3
[29/39] kselftest/arm64: sme: Add SME support to vlset
        https://git.kernel.org/arm64/c/0fea47609e48
[30/39] kselftest/arm64: Add tests for TPIDR2
        https://git.kernel.org/arm64/c/f442d9edcff0
[31/39] kselftest/arm64: Extend vector configuration API tests to cover SME
        https://git.kernel.org/arm64/c/7e387a00d640
[32/39] kselftest/arm64: sme: Provide streaming mode SVE stress test
        https://git.kernel.org/arm64/c/aee8a834e3f0
[33/39] kselftest/arm64: signal: Handle ZA signal context in core code
        https://git.kernel.org/arm64/c/f2608edbc17b
[34/39] kselftest/arm64: Add stress test for SME ZA context switching
        https://git.kernel.org/arm64/c/659689a61912
[35/39] kselftest/arm64: signal: Add SME signal handling tests
        https://git.kernel.org/arm64/c/8d41f50ade02
[36/39] kselftest/arm64: Add streaming SVE to SVE ptrace tests
        https://git.kernel.org/arm64/c/e4bbc3f2c589
[37/39] kselftest/arm64: Add coverage for the ZA ptrace interface
        https://git.kernel.org/arm64/c/8f6bb75334f4
[38/39] kselftest/arm64: Add SME support to syscall ABI test
        https://git.kernel.org/arm64/c/5bbfaf598476
[39/39] selftests/arm64: Add a testcase for handling of ZA on clone()
        https://git.kernel.org/arm64/c/fb146c8a0ad9
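For readers who have not seen the vector configuration interface that the vlset and "vector configuration API" tests above exercise: the cover letter says the SME vector length is controlled through an interface similar to SVE's. The following is only a minimal userspace sketch of querying it, not code from the series. It assumes the `PR_SME_GET_VL` prctl and `PR_SME_VL_LEN_MASK` values from the uapi `linux/prctl.h` (defined here as fallbacks for older headers); the helper name `sme_vl_bytes()` is hypothetical.

```c
#include <sys/prctl.h>

/* Fallback definitions, assumed to match the uapi linux/prctl.h values. */
#ifndef PR_SME_GET_VL
#define PR_SME_GET_VL 64
#endif
#ifndef PR_SME_VL_LEN_MASK
#define PR_SME_VL_LEN_MASK 0xffff
#endif

/*
 * Hypothetical helper: return the calling task's SME vector length in
 * bytes, or -1 if the kernel or CPU does not offer SME (the prctl fails).
 */
static int sme_vl_bytes(void)
{
	int ret = prctl(PR_SME_GET_VL, 0, 0, 0, 0);

	if (ret < 0)
		return -1;

	/* The low bits carry the length; higher bits are flags. */
	return ret & PR_SME_VL_LEN_MASK;
}
```

On a system without SME, or on a non-arm64 machine, the call is expected to fail and the helper returns -1, so callers should treat that as "SME unavailable" rather than as an error to report.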
On Fri, Apr 22, 2022 at 06:10:27PM +0100, Marc Zyngier wrote:
> On 2022-04-19 12:22, Mark Brown wrote:
> > This series provides initial support for the ARMv9 Scalable Matrix
> > Extension (SME). SME takes the approach used for vectors in SVE and
> > extends this to provide architectural support for matrix operations. A
> > more detailed overview can be found in [1].
>
> For the KVM patches:
>
> Reviewed-by: Marc Zyngier <maz@kernel.org>
>
> Catalin: the KVM patches are likely to clash a bit with the
> WFxT stuff as well. It'd be good to swap stable branches!

I queued patches 3-27 on the arm64 for-next/sme branch; the rest went on
the for-next/kselftest branch, but b4 didn't handle this well.
On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
> This series provides initial support for the ARMv9 Scalable Matrix
> Extension (SME). SME takes the approach used for vectors in SVE and
> extends this to provide architectural support for matrix operations. A
> more detailed overview can be found in [1].
>
> For the kernel SME can be thought of as a series of features which are
> intended to be used together by applications but operate mostly
> orthogonally:
>
> - The ZA matrix register.
> - Streaming mode, in which ZA can be accessed and a subset of SVE
>   features are available.
> - A second vector length, used for streaming mode SVE and ZA and
>   controlled using a similar interface to that for SVE.
> - TPIDR2, a new userspace controllable system register intended for use
>   by the C library for storing context related to the ZA ABI.
>
> A substantial part of the series is dedicated to refactoring the
> existing SVE support so that we don't need to duplicate code for
> handling vector lengths and the SVE registers, this involves creating an
> array of vector types and making the users take the vector type as a
> parameter. I'm not 100% happy with this but wasn't able to come up with
> anything better, duplicating code definitely felt like a bad idea so
> this felt like the least bad thing. If this approach makes sense to
> people it might make sense to split this off into a separate series
> and/or merge it while the rest is pending review to try to make things a
> little more digestible, the series is very large so it'd probably make
> things easier to digest if some of the preparatory refactoring could be
> merged before the rest is ready.
>
> One feature of the architecture of particular note is that switching
> to and from streaming mode may change the size of and invalidate the
> contents of the SVE registers, and when in streaming mode the FFR is not
> accessible. This complicates aspects of the ABI like signal handling
> and ptrace.
>
> This initial implementation is mainly intended to get the ABI in place,
> there are several areas which will be worked on going forwards - some of
> these will be blockers, others could be handled in followup series:
>
> - SME is currently not supported for KVM guests, this will be done as a
>   followup series. A host system can use SME and run KVM guests but
>   SME is not available in the guests.
> - The KVM host support is done in a very simplistic way, were anyone to
>   attempt to use it in production there would be performance impacts on
>   hosts with SME support. As part of this we also add enumeration of
>   fine grained traps.
> - There is not currently ptrace or signal support for TPIDR2, this will
>   be done as a followup series.
> - No support is currently provided for scheduler control of SME or SME
>   applications, given the size of the SME register state the context
>   switch overhead may be noticeable so this may be needed especially for
>   real time applications. Similar concerns already exist for larger
>   SVE vector lengths but are amplified for SME, particularly as the
>   vector length increases.
> - There has been no work on optimising the performance of anything the
>   kernel does.
>
> It is not expected that any systems will be encountered that support SME
> but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
> The code attempts to handle any such systems that are encountered but
> this hasn't been tested extensively.

Running CPU offline/online on a Neoverse-N1 server will trigger a crash.
One data point: setting CONFIG_ARM64_SVE=n avoids it.

kernel BUG at arch/arm64/kernel/cpufeature.c:1353!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
CPU: 88 PID: 0 Comm: swapper/88 Not tainted 5.18.0-rc4-next-20220426-00006-gfea0cdfbc1de #60
pstate: 204001c9 (nzCv dAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __read_sysreg_by_encoding
lr : has_cpuid_feature
sp : ffff80000a827d10
x29: ffff80000a827d10 x28: 0000000000000000 x27: ffffbb3c708efb8a
x26: 1ffff7678e11df71 x25: 0000000000000002 x24: 0000000000000003
x23: ffffbb3c75870e80 x22: dfff800000000000 x21: 0000000000000029
x20: ffffbb3c708efba0 x19: ffffbb3c708efb80 x18: ffffbb3c73eb7d1c
x17: 000000040044ffff x16: 1fffe7fff0e51474 x15: 1fffe806c1d7b54a
x14: 1fffe7fff0e5146c x13: 0000000000000004 x12: ffff77678e839850
x11: 1ffff7678e83984f x10: ffff77678e83984f x9 : ffffbb3c6d7deef0
x8 : ffffbb3c741cc27f x7 : 0000000000000001 x6 : ffff77678e83984f
x5 : ffffbb3c741cc278 x4 : 0000000000000000 x3 : 1fffe7fff0e51359
x2 : 1ffff7678e11df74 x1 : 0000000000180480 x0 : 00000000001804a0
Call trace:
 __read_sysreg_by_encoding
 has_cpuid_feature
 verify_local_cpu_caps
 verify_local_cpu_capabilities
 check_local_cpu_capabilities
 secondary_start_kernel
 __secondary_switched
Code: 17ffff34 d5380234 17ffff32 f90013f5 (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: 0x3b3c657a0000 from 0xffff800008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x000,0021700d,19801c82
Memory Limit: none
---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
On Wed, Apr 27, 2022 at 01:08:58PM -0400, Qian Cai wrote:
> On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:

> > but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
> > The code attempts to handle any such systems that are encountered but
> > this hasn't been tested extensively.
>
> Running CPU offline/online on a Neoverse-N1 server will trigger a crash.

Can you try with

   https://lore.kernel.org/r/20220427130828.162615-1-broonie@kernel.org

please?
On Wed, Apr 27, 2022 at 06:14:31PM +0100, Mark Brown wrote:
> On Wed, Apr 27, 2022 at 01:08:58PM -0400, Qian Cai wrote:
> > On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
>
> > > but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
> > > The code attempts to handle any such systems that are encountered but
> > > this hasn't been tested extensively.
> >
> > Running CPU offline/online on a Neoverse-N1 server will trigger a crash.
>
> Can you try with
>
>    https://lore.kernel.org/r/20220427130828.162615-1-broonie@kernel.org
>
> please?

Yes, it works fine so far.
On Wed, Apr 27, 2022 at 05:08:00PM -0400, Qian Cai wrote:
> On Wed, Apr 27, 2022 at 06:14:31PM +0100, Mark Brown wrote:

> > Can you try with

> >    https://lore.kernel.org/r/20220427130828.162615-1-broonie@kernel.org

> > please?

> Yes, it works fine so far.

Great, thanks for checking. Catalin applied it now so hopefully -next
will be sorted in the next day or so.
On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
> This series provides initial support for the ARMv9 Scalable Matrix
> Extension (SME). SME takes the approach used for vectors in SVE and
> extends this to provide architectural support for matrix operations. A
> more detailed overview can be found in [1].

Setting CONFIG_ARM64_SME=n fixed a warning while running libhugetlbfs tests.

        /*
         * There are several places where we assume that the order value is sane
         * so bail out early if the request is out of bound.
         */
        if (unlikely(order >= MAX_ORDER)) {
                WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
                return NULL;
        }

WARNING: CPU: 122 PID: 4025 at mm/page_alloc.c:5383 __alloc_pages
CPU: 122 PID: 4025 Comm: brk_near_huge Not tainted 5.18.0-rc5-next-20220503 #79
pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __alloc_pages
lr : alloc_pages
sp : ffff8000505470f0
x29: ffff8000505470f0 x28: ffff40028b3535c0 x27: 0000000000000000
x26: 1ffff0000a0a8ea2 x25: ffff800050547510 x24: 0000000000000dc0
x23: ffff921ddb818000 x22: 000000000000000e x21: 1ffff0000a0a8e28
x20: 0000000000040dc0 x19: ffffae1848c61ae0 x18: ffffae18357e7d24
x17: ffffae182fb75778 x16: 1fffe8005166a7d8 x15: 000000000000001a
x14: 1fffe8005166a7cb x13: 0000000000000004 x12: ffff70000a0a8e03
x11: 1ffff0000a0a8e02 x10: 00000000f204f1f1 x9 : 000000000000f204
x8 : dfff800000000000 x7 : 00000000f3000000 x6 : 00000000f3f3f3f3
x5 : ffff70000a0a8e28 x4 : ffff40028b3535c0 x3 : 0000000000000000
x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000040dc0
Call trace:
 __alloc_pages
 alloc_pages
 kmalloc_order
 kmalloc_order_trace
 __kmalloc
 __regset_get
 regset_get_alloc
 fill_thread_core_info
 fill_note_info
 elf_core_dump
 do_coredump
 get_signal
 do_signal
 do_notify_resume
 el0_svc
 el0t_64_sync_handler
 el0t_64_sync
irq event stamp: 28066
hardirqs last  enabled at (28065): _raw_spin_unlock_irqrestore
hardirqs last disabled at (28066): el1_dbg
softirqs last  enabled at (27438): fpsimd_preserve_current_state
softirqs last disabled at (27436): fpsimd_preserve_current_state
On Tue, May 03, 2022 at 06:23:40PM -0400, Qian Cai wrote:
> On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
> > This series provides initial support for the ARMv9 Scalable Matrix
> > Extension (SME). SME takes the approach used for vectors in SVE and
> > extends this to provide architectural support for matrix operations. A
> > more detailed overview can be found in [1].
>
> Setting CONFIG_ARM64_SME=n fixed a warning while running libhugetlbfs tests.
>
>         /*
>          * There are several places where we assume that the order value is sane
>          * so bail out early if the request is out of bound.
>          */
>         if (unlikely(order >= MAX_ORDER)) {
>                 WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
>                 return NULL;
>         }

Ugh, right. These variable sized register sets really don't map
entirely cleanly onto the ptrace interface but now you point it
out what the code has there is going to give a rather larger
number than is sensible. Not fully checked but does the below
fix things?

Thanks for your testing with this stuff, it's been really
helpful.

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 47d8a7472171..08c1cb43cf33 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1447,8 +1447,8 @@ static const struct user_regset aarch64_regsets[] = {
 	},
 	[REGSET_ZA] = { /* SME ZA */
 		.core_note_type = NT_ARM_ZA,
-		.n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES),
-		.size = SVE_VQ_BYTES,
+		.n = 1,
+		.size = ZA_PT_SIZE(SVE_VQ_MAX),
 		.align = SVE_VQ_BYTES,
 		.regset_get = za_get,
 		.set = za_set,
On Wed, 4 May 2022 at 05:22, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, May 03, 2022 at 06:23:40PM -0400, Qian Cai wrote:
> > On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
> > > This series provides initial support for the ARMv9 Scalable Matrix
> > > Extension (SME). SME takes the approach used for vectors in SVE and
> > > extends this to provide architectural support for matrix operations. A
> > > more detailed overview can be found in [1].
> >
> > Setting CONFIG_ARM64_SME=n fixed a warning while running libhugetlbfs tests.
> >
> >         /*
> >          * There are several places where we assume that the order value is sane
> >          * so bail out early if the request is out of bound.
> >          */
> >         if (unlikely(order >= MAX_ORDER)) {
> >                 WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
> >                 return NULL;
> >         }
>
> Ugh, right. These variable sized register sets really don't map
> entirely cleanly onto the ptrace interface but now you point it
> out what the code has there is going to give a rather larger
> number than is sensible. Not fully checked but does the below
> fix things?
>
> Thanks for your testing with this stuff, it's been really
> helpful.
>
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 47d8a7472171..08c1cb43cf33 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1447,8 +1447,8 @@ static const struct user_regset aarch64_regsets[] = {
>  	},
>  	[REGSET_ZA] = { /* SME ZA */
>  		.core_note_type = NT_ARM_ZA,
> -		.n = DIV_ROUND_UP(ZA_PT_ZA_SIZE(SVE_VQ_MAX), SVE_VQ_BYTES),
> -		.size = SVE_VQ_BYTES,
> +		.n = 1,
> +		.size = ZA_PT_SIZE(SVE_VQ_MAX),
>  		.align = SVE_VQ_BYTES,
>  		.regset_get = za_get,
>  		.set = za_set,

I have tested this patch but it did not fix the warning.
Running the libhugetlbfs tests on qemu_arm64 still triggers it.

Kernel warning:
-------------------
[   13.266791] kauditd_printk_skb: 1 callbacks suppressed
[   13.266794] audit: type=1701 audit(1651640394.652:25): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=463 comm="brk_near_huge" exe="/usr/lib/libhugetlbfs/tests/obj64/brk_near_huge" sig=6 res=1
[   13.267376] ------------[ cut here ]------------
[   13.267378] WARNING: CPU: 2 PID: 463 at mm/page_alloc.c:5368 __alloc_pages+0x624/0xd50
[   13.269956] Modules linked in: crct10dif_ce rfkill fuse
[   13.270357] CPU: 2 PID: 463 Comm: brk_near_huge Not tainted 5.18.0-rc4-next-20220429 #1
[   13.270964] Hardware name: linux,dummy-virt (DT)
[   13.271315] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   13.271841] pc : __alloc_pages+0x624/0xd50
[   13.272157] lr : alloc_pages+0xb8/0x170
[   13.272451] sp : ffff800008873630
[   13.272704] x29: ffff800008873630 x28: 000000000000000f x27: ffffb53cfad91650
[   13.273248] x26: ffff0000c57f2c00 x25: ffff800008873c58 x24: 000000000000000f
[   13.273788] x23: 0000000000000dc0 x22: 0000000000000000 x21: 000000000000000f
[   13.274327] x20: ffffb53cfc8189a0 x19: 0000000000040dc0 x18: 0000000000000000
[   13.274868] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   13.275406] x14: 0000000000000000 x13: 0000000000000000 x12: ffff0000ff7f5b58
[   13.275945] x11: 0000000000000068 x10: ffffb53cfb77d000 x9 : ffffb53cf9d30e78
[   13.276485] x8 : fffffc00038e0001 x7 : dead000000000100 x6 : 0000000000000001
[   13.277021] x5 : 0000000000000000 x4 : ffff0000c55fe180 x3 : 0000000000000000
[   13.277559] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000040dc0
[   13.278096] Call trace:
[   13.278285]  __alloc_pages+0x624/0xd50
[   13.278576]  alloc_pages+0xb8/0x170
[   13.278844]  kmalloc_order+0x40/0x100
[   13.279126]  kmalloc_order_trace+0x38/0x130
[   13.279445]  __kmalloc+0x37c/0x3e0
[   13.279707]  __regset_get+0xa0/0x104
[   13.279983]  regset_get_alloc+0x20/0x2c
[   13.280277]  elf_core_dump+0x3a8/0xd10
[   13.280567]  do_coredump+0xe50/0x138c
[   13.280850]  get_signal+0x860/0x920
[   13.281119]  do_notify_resume+0x184/0x1480
[   13.281428]  el0_svc+0xa8/0xc0
[   13.281666]  el0t_64_sync_handler+0xbc/0x140
[   13.281992]  el0t_64_sync+0x18c/0x190
[   13.282273] ---[ end trace 0000000000000000 ]---

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

url: https://lkft.validation.linaro.org/scheduler/job/4983793#L831
Build link: https://builds.tuxbuild.com/28gVhSoYA4NRerhwD1gkY4QdHLt/

--
Linaro LKFT
https://lkft.linaro.org
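The arithmetic behind both reports can be worked through in a few lines. The sketch below is an illustration, not kernel code: it assumes `SVE_VQ_BYTES` is 16 and `SVE_VQ_MAX` is 512 as in the arm64 uapi headers, that the ZA ptrace header occupies 16 bytes, that pages are 4 KiB, and that `MAX_ORDER` is 11. All names in it are hypothetical stand-ins. Under those assumptions the original `.n * .size` comes to 64 MiB (an order-14 request) and the proposed `.size = ZA_PT_SIZE(SVE_VQ_MAX)` comes to 64 MiB plus the header (rounding up to order 15), so the core dump path would ask for more than the page allocator permits either way.

```c
#include <stdint.h>

/* Assumed constants; see the lead-in. None are taken from this thread. */
#define VQ_BYTES           16u   /* stands in for SVE_VQ_BYTES */
#define VQ_MAX             512u  /* stands in for SVE_VQ_MAX */
#define ZA_HDR_BYTES       16u   /* assumed size of the header before ZA data */
#define PAGE_SHIFT_4K      12u   /* assumed 4 KiB pages */
#define MAX_ORDER_ASSUMED  11u   /* assumed MAX_ORDER for this configuration */

/* ZA is a square matrix: (vq * 16) rows of (vq * 16) bytes each. */
static uint64_t za_data_bytes(uint64_t vq)
{
	uint64_t row = vq * VQ_BYTES;

	return row * row;
}

/* Smallest order such that (1 << order) pages cover the request. */
static unsigned int alloc_order(uint64_t bytes)
{
	uint64_t pages = (bytes + (1ull << PAGE_SHIFT_4K) - 1) >> PAGE_SHIFT_4K;
	unsigned int order = 0;

	while ((1ull << order) < pages)
		order++;
	return order;
}

/* Original regset: n = DIV_ROUND_UP(ZA size, 16), size = 16. */
static uint64_t regset_bytes_original(void)
{
	uint64_t n = (za_data_bytes(VQ_MAX) + VQ_BYTES - 1) / VQ_BYTES;

	return n * VQ_BYTES;
}

/* Proposed regset: n = 1, size = header + ZA size. */
static uint64_t regset_bytes_patched(void)
{
	return ZA_HDR_BYTES + za_data_bytes(VQ_MAX);
}

/* Mirrors the "order >= MAX_ORDER" check quoted earlier in the thread. */
static int exceeds_max_order(uint64_t bytes)
{
	return alloc_order(bytes) >= MAX_ORDER_ASSUMED;
}
```

The computed orders of 14 and 15 may correspond to the 0xe and 0xf values visible in the two register dumps, although that reading is an inference from the traces rather than something either reporter stated. If it holds, it suggests the buffer would need to be sized from the task's actual vector length rather than from the architectural maximum.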
On Wed, 4 May 2022 at 05:22, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, May 03, 2022 at 06:23:40PM -0400, Qian Cai wrote:
> > On Tue, Apr 19, 2022 at 12:22:08PM +0100, Mark Brown wrote:
> > > This series provides initial support for the ARMv9 Scalable Matrix
> > > Extension (SME). SME takes the approach used for vectors in SVE and
> > > extends this to provide architectural support for matrix operations. A
> > > more detailed overview can be found in [1].
> >
> > Setting CONFIG_ARM64_SME=n fixed a warning while running libhugetlbfs tests.

I agree with this.
The reported kernel warning is gone with CONFIG_ARM64_SME=n.

- Naresh