Message ID | 20230221190858.3159617-3-evan@rivosinc.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | RISC-V Hardware Probing User Interface | expand |
Context | Check | Description |
---|---|---|
conchuod/tree_selection | fail | Failed to apply to next/pending-fixes or riscv/for-next |
On Tue, Feb 21, 2023, at 20:08, Evan Green wrote: > We don't have enough space for these all in ELF_HWCAP{,2} and there's no > system call that quite does this, so let's just provide an arch-specific > one to probe for hardware capabilities. This currently just provides > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in > the future. > > Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > Signed-off-by: Evan Green <evan@rivosinc.com> I'm still skeptical about the need for a custom syscall interface here. I had not looked at the interface so far, but there are a few things that stick out: > +RISC-V Hardware Probing Interface > +--------------------------------- > + > +The RISC-V hardware probing interface is based around a single > syscall, which > +is defined in <asm/hwprobe.h>:: > + > + struct riscv_hwprobe { > + __s64 key; > + __u64 value; > + }; The way this is defined, the kernel will always have to know about the specific set of features, it can't just forward unknown features to user space after probing them from an architectured hardware interface or from DT. If 'key' is just an enumerated value with a small number of possible values, I don't see anything wrong with using elf aux data. I understand it's hard to know how many keys might be needed in the long run, from the way you define the key/value pairs here, I would expect it to have a lot of the same limitations that the aux data has, except for a few bytes to be copied. > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t > pair_count, > + size_t cpu_count, cpu_set_t *cpus, > + unsigned long flags); The cpu set argument worries me more: there should never be a need to optimize for broken hardware that has an asymmetric set of features. Just let the kernel figure out the minimum set of features that works across all CPUs and report that like we do with HWCAP. If there is a SoC that is so broken that it has important features on a subset of cores that some user might actually want to rely on, then have them go through the slow sysfs interface for probing the CPUs indidually, but don't make the broken case easier at the expense of normal users that run on working hardware. > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, > uintptr_t, > + uintptr_t, uintptr_t); Why 'uintptr_t' rather than the correct type? Arnd
On Tue, Feb 21, 2023 at 11:08:53AM -0800, Evan Green wrote: > We don't have enough space for these all in ELF_HWCAP{,2} and there's no > system call that quite does this, so let's just provide an arch-specific > one to probe for hardware capabilities. This currently just provides > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in > the future. > > Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > Signed-off-by: Evan Green <evan@rivosinc.com> > +static > +int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count, > + long cpu_count, unsigned long __user *cpus_user, > + unsigned long flags) I almost feel bad commenting this, but this is now the only function split like this w/ the static on its own line. With the same caveat about ignorance about glibc's desires & lingering doubt as to whether this interface is the right way to go, this looks good to me now. I'm reluctant to give an R-b for the latter reason, but assuming the Higher Power deem this approach acceptable: Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Cheers, Conor.
On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Feb 21, 2023, at 20:08, Evan Green wrote: > > We don't have enough space for these all in ELF_HWCAP{,2} and there's no > > system call that quite does this, so let's just provide an arch-specific > > one to probe for hardware capabilities. This currently just provides > > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in > > the future. > > > > Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > Signed-off-by: Evan Green <evan@rivosinc.com> > > I'm still skeptical about the need for a custom syscall interface here. > I had not looked at the interface so far, but there are a few things > that stick out: > > > +RISC-V Hardware Probing Interface > > +--------------------------------- > > + > > +The RISC-V hardware probing interface is based around a single > > syscall, which > > +is defined in <asm/hwprobe.h>:: > > + > > + struct riscv_hwprobe { > > + __s64 key; > > + __u64 value; > > + }; > > The way this is defined, the kernel will always have to know > about the specific set of features, it can't just forward > unknown features to user space after probing them from an > architectured hardware interface or from DT. You're correct that this interface wasn't intended to have usermode come in with augmented data or additional key/value pairs. This was purely meant to provide access to the kernel's repository of architectural and microarchitectural details. If usermode wants to provide extra info in this same form, maybe they could wrap this interface. > > If 'key' is just an enumerated value with a small number of > possible values, I don't see anything wrong with using elf > aux data. I understand it's hard to know how many keys > might be needed in the long run, from the way you define > the key/value pairs here, I would expect it to have a lot > of the same limitations that the aux data has, except for > a few bytes to be copied. Correct, this makes allocating bits out of here cheaper by not requiring that we actively copy them into every new process forever. You're right that the aux vector would work as well, but the thinking behind this series was that an interface like this might be better for an architecture as extensible as risc-v. > > > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t > > pair_count, > > + size_t cpu_count, cpu_set_t *cpus, > > + unsigned long flags); > > The cpu set argument worries me more: there should never be a > need to optimize for broken hardware that has an asymmetric set > of features. Just let the kernel figure out the minimum set > of features that works across all CPUs and report that like we > do with HWCAP. If there is a SoC that is so broken that it has > important features on a subset of cores that some user might > actually want to rely on, then have them go through the slow > sysfs interface for probing the CPUs indidually, but don't make > the broken case easier at the expense of normal users that > run on working hardware. I'm not so sure. While I agree with you for major classes of features (eg one CPU has floating point support but another does not), I expect these bits to contain more subtle details as well, which might vary across asymmetric implementations without breaking ABI compatibility per-se. Maybe some vendor has implemented exotic video decoding acceleration instructions that only work on the big core. Or maybe the big cores support v3.1 of some extension (where certain things run faster), but the little cores only have v3.0, where it's a little slower. Certain apps would likely want to know these things so they can allocate their work optimally across cores. > > > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, > > uintptr_t, > > + uintptr_t, uintptr_t); > > Why 'uintptr_t' rather than the correct type? Fixed. -Evan
Am Donnerstag, 30. März 2023, 20:30:29 CEST schrieb Evan Green: > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Tue, Feb 21, 2023, at 20:08, Evan Green wrote: > > > We don't have enough space for these all in ELF_HWCAP{,2} and there's no > > > system call that quite does this, so let's just provide an arch-specific > > > one to probe for hardware capabilities. This currently just provides > > > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in > > > the future. > > > > > > Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> > > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > Signed-off-by: Evan Green <evan@rivosinc.com> > > > > I'm still skeptical about the need for a custom syscall interface here. > > I had not looked at the interface so far, but there are a few things > > that stick out: > > > > > +RISC-V Hardware Probing Interface > > > +--------------------------------- > > > + > > > +The RISC-V hardware probing interface is based around a single > > > syscall, which > > > +is defined in <asm/hwprobe.h>:: > > > + > > > + struct riscv_hwprobe { > > > + __s64 key; > > > + __u64 value; > > > + }; > > > > The way this is defined, the kernel will always have to know > > about the specific set of features, it can't just forward > > unknown features to user space after probing them from an > > architectured hardware interface or from DT. > > You're correct that this interface wasn't intended to have usermode > come in with augmented data or additional key/value pairs. This was > purely meant to provide access to the kernel's repository of > architectural and microarchitectural details. If usermode wants to > provide extra info in this same form, maybe they could wrap this > interface. > > > If 'key' is just an enumerated value with a small number of > > possible values, I don't see anything wrong with using elf > > aux data. I understand it's hard to know how many keys > > might be needed in the long run, from the way you define > > the key/value pairs here, I would expect it to have a lot > > of the same limitations that the aux data has, except for > > a few bytes to be copied. > > Correct, this makes allocating bits out of here cheaper by not > requiring that we actively copy them into every new process forever. > You're right that the aux vector would work as well, but the thinking > behind this series was that an interface like this might be better for > an architecture as extensible as risc-v. What would be the ramifications of defining some sort of vdso-like data-structure and just putting the address into AT_HWCAP2 ? (similar to what vdso does) - that could then even be re-usable with other OS kernels. And would also save declaring numerous new AT_* keys. Because there are already nearly 130 standard extensions and vendors are allowed to defines their own as well, and we will probably also want to tell userspace about them. Heiko > > > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t > > > pair_count, > > > + size_t cpu_count, cpu_set_t *cpus, > > > + unsigned long flags); > > > > The cpu set argument worries me more: there should never be a > > need to optimize for broken hardware that has an asymmetric set > > of features. Just let the kernel figure out the minimum set > > of features that works across all CPUs and report that like we > > do with HWCAP. If there is a SoC that is so broken that it has > > important features on a subset of cores that some user might > > actually want to rely on, then have them go through the slow > > sysfs interface for probing the CPUs indidually, but don't make > > the broken case easier at the expense of normal users that > > run on working hardware. > > I'm not so sure. While I agree with you for major classes of features > (eg one CPU has floating point support but another does not), I expect > these bits to contain more subtle details as well, which might vary > across asymmetric implementations without breaking ABI compatibility > per-se. Maybe some vendor has implemented exotic video decoding > acceleration instructions that only work on the big core. Or maybe the > big cores support v3.1 of some extension (where certain things run > faster), but the little cores only have v3.0, where it's a little > slower. Certain apps would likely want to know these things so they > can allocate their work optimally across cores. > > > > > > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, > > > uintptr_t, > > > + uintptr_t, uintptr_t); > > > > Why 'uintptr_t' rather than the correct type? > > Fixed. > -Evan >
On Thu, Mar 30, 2023 at 1:20 PM Heiko Stübner <heiko@sntech.de> wrote: > > Am Donnerstag, 30. März 2023, 20:30:29 CEST schrieb Evan Green: > > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > On Tue, Feb 21, 2023, at 20:08, Evan Green wrote: > > > > We don't have enough space for these all in ELF_HWCAP{,2} and there's no > > > > system call that quite does this, so let's just provide an arch-specific > > > > one to probe for hardware capabilities. This currently just provides > > > > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in > > > > the future. > > > > > > > > Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > Signed-off-by: Evan Green <evan@rivosinc.com> > > > > > > I'm still skeptical about the need for a custom syscall interface here. > > > I had not looked at the interface so far, but there are a few things > > > that stick out: > > > > > > > +RISC-V Hardware Probing Interface > > > > +--------------------------------- > > > > + > > > > +The RISC-V hardware probing interface is based around a single > > > > syscall, which > > > > +is defined in <asm/hwprobe.h>:: > > > > + > > > > + struct riscv_hwprobe { > > > > + __s64 key; > > > > + __u64 value; > > > > + }; > > > > > > The way this is defined, the kernel will always have to know > > > about the specific set of features, it can't just forward > > > unknown features to user space after probing them from an > > > architectured hardware interface or from DT. > > > > You're correct that this interface wasn't intended to have usermode > > come in with augmented data or additional key/value pairs. This was > > purely meant to provide access to the kernel's repository of > > architectural and microarchitectural details. If usermode wants to > > provide extra info in this same form, maybe they could wrap this > > interface. > > > > > If 'key' is just an enumerated value with a small number of > > > possible values, I don't see anything wrong with using elf > > > aux data. I understand it's hard to know how many keys > > > might be needed in the long run, from the way you define > > > the key/value pairs here, I would expect it to have a lot > > > of the same limitations that the aux data has, except for > > > a few bytes to be copied. > > > > Correct, this makes allocating bits out of here cheaper by not > > requiring that we actively copy them into every new process forever. > > You're right that the aux vector would work as well, but the thinking > > behind this series was that an interface like this might be better for > > an architecture as extensible as risc-v. > > What would be the ramifications of defining some sort of vdso-like > data-structure and just putting the address into AT_HWCAP2 ? > (similar to what vdso does) - that could then even be re-usable > with other OS kernels. > > And would also save declaring numerous new AT_* keys. > > > Because there are already nearly 130 standard extensions and vendors > are allowed to defines their own as well, and we will probably also want > to tell userspace about them. Yeah I mulled that approach over a bit originally as well. The downside is the vdso data then becomes part of the ABI. So you can never change the layout of that vdso data, and you lose the ability to change what gets cached in the vdso versus what bounces up to the syscall. To poach a scenario from a glibc discussion underway, if for instance cpu hotplug comes along and you need to invalidate some portion of your cached data, that's easy when there's a function in front of it, but difficult if apps are crawling the data themselves. 130 extensions is certainly a lot, and illustrates how auxvec may get out of hand quickly. One nice thing about this mechanism (though other approaches share this trait) is that it's agnostic of where the data comes from. In other words, it doesn't require that data come from the DT, or alternative.c, etc, as long as the kernel can access it and plunk it in a key/value store. -Evan > > > Heiko > > > > > > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t > > > > pair_count, > > > > + size_t cpu_count, cpu_set_t *cpus, > > > > + unsigned long flags); > > > > > > The cpu set argument worries me more: there should never be a > > > need to optimize for broken hardware that has an asymmetric set > > > of features. Just let the kernel figure out the minimum set > > > of features that works across all CPUs and report that like we > > > do with HWCAP. If there is a SoC that is so broken that it has > > > important features on a subset of cores that some user might > > > actually want to rely on, then have them go through the slow > > > sysfs interface for probing the CPUs indidually, but don't make > > > the broken case easier at the expense of normal users that > > > run on working hardware. > > > > I'm not so sure. While I agree with you for major classes of features > > (eg one CPU has floating point support but another does not), I expect > > these bits to contain more subtle details as well, which might vary > > across asymmetric implementations without breaking ABI compatibility > > per-se. Maybe some vendor has implemented exotic video decoding > > acceleration instructions that only work on the big core. Or maybe the > > big cores support v3.1 of some extension (where certain things run > > faster), but the little cores only have v3.0, where it's a little > > slower. Certain apps would likely want to know these things so they > > can allocate their work optimally across cores. > > > > > > > > > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, > > > > uintptr_t, > > > > + uintptr_t, uintptr_t); > > > > > > Why 'uintptr_t' rather than the correct type? > > > > Fixed. > > -Evan > > > > > >
On Thu, Mar 30, 2023, at 20:30, Evan Green wrote: > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <arnd@arndb.de> wrote: >> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t >> > pair_count, >> > + size_t cpu_count, cpu_set_t *cpus, >> > + unsigned long flags); >> >> The cpu set argument worries me more: there should never be a >> need to optimize for broken hardware that has an asymmetric set >> of features. Just let the kernel figure out the minimum set >> of features that works across all CPUs and report that like we >> do with HWCAP. If there is a SoC that is so broken that it has >> important features on a subset of cores that some user might >> actually want to rely on, then have them go through the slow >> sysfs interface for probing the CPUs indidually, but don't make >> the broken case easier at the expense of normal users that >> run on working hardware. > > I'm not so sure. While I agree with you for major classes of features > (eg one CPU has floating point support but another does not), I expect > these bits to contain more subtle details as well, which might vary > across asymmetric implementations without breaking ABI compatibility > per-se. Maybe some vendor has implemented exotic video decoding > acceleration instructions that only work on the big core. Or maybe the > big cores support v3.1 of some extension (where certain things run > faster), but the little cores only have v3.0, where it's a little > slower. Certain apps would likely want to know these things so they > can allocate their work optimally across cores. Do you have a specific feature in mind where hardware would be intentionally designed this way? I still can't come up with a scenario where this would actually work in practice, as having asymmetric features is incompatible with so many other things we normally do. - In a virtual machine, the VCPU tents to get scheduled arbitrarily to physical CPUs, so setting affinity in a guest won't actually guarantee that the feature is still there. - Using a CPU feature from library code is practically impossible if it requires special CPU affinity, as the application may already be started on specific CPUs for another reason, and having a library call sched_setaffinity will conflict with those. - Even in the simplest case of having a standalone application without any shared libraries try to pick a sensible CPU to run on is hard to do in a generic way, as it would need to weigh availabilty of features on certain cores against the number of cores with or without the feature and their current and expected system load. As long as there isn't a specific requirement, I think it's better to not actually encourage hardware vendors to implement designs like that, or at least not designing an interface to make getting this information a few microseconds faster that what already exists. Arnd
Hi Arnd, On Fri, Mar 31, 2023 at 6:21 AM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Mar 30, 2023, at 20:30, Evan Green wrote: > > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <arnd@arndb.de> wrote: > >> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t > >> > pair_count, > >> > + size_t cpu_count, cpu_set_t *cpus, > >> > + unsigned long flags); > >> > >> The cpu set argument worries me more: there should never be a > >> need to optimize for broken hardware that has an asymmetric set > >> of features. Just let the kernel figure out the minimum set > >> of features that works across all CPUs and report that like we > >> do with HWCAP. If there is a SoC that is so broken that it has > >> important features on a subset of cores that some user might > >> actually want to rely on, then have them go through the slow > >> sysfs interface for probing the CPUs indidually, but don't make > >> the broken case easier at the expense of normal users that > >> run on working hardware. > > > > I'm not so sure. While I agree with you for major classes of features > > (eg one CPU has floating point support but another does not), I expect > > these bits to contain more subtle details as well, which might vary > > across asymmetric implementations without breaking ABI compatibility > > per-se. Maybe some vendor has implemented exotic video decoding > > acceleration instructions that only work on the big core. Or maybe the > > big cores support v3.1 of some extension (where certain things run > > faster), but the little cores only have v3.0, where it's a little > > slower. Certain apps would likely want to know these things so they > > can allocate their work optimally across cores. > > Do you have a specific feature in mind where hardware would be > intentionally designed this way? I still can't come up with a > scenario where this would actually work in practice, as having > asymmetric features is incompatible with so many other things > we normally do. Isn't this already a reality, with the ARM folks building systems where only some cores can run arm32 code, and the rest are aarch64 only? https://lwn.net/Articles/838339/ The way I see it, it's going to be close to impossible to _not_ design hardware this way. As long as die area and energy efficiency are a constraint, big.LITTLE and other asymmetric architectures seem inevitable. Given this interface is sort of a software cpuid instruction, pretending there won't be differences between cores feels like an incomplete API. > > - In a virtual machine, the VCPU tents to get scheduled arbitrarily > to physical CPUs, so setting affinity in a guest won't actually > guarantee that the feature is still there. Sure, but I wouldn't expect the guest to be able to observe physical properties from a transient host CPU (I'd consider that a bug). I'd expect either the VCPU is pinned to a physical CPU with no chance of migration, in which case you might reasonably plumb down some physical properties to the guest, or the guest sees a watered down generic CPU containing the least common denominator of features. > > - Using a CPU feature from library code is practically impossible > if it requires special CPU affinity, as the application may > already be started on specific CPUs for another reason, and > having a library call sched_setaffinity will conflict with those. > > - Even in the simplest case of having a standalone application > without any shared libraries try to pick a sensible CPU to > run on is hard to do in a generic way, as it would need to > weigh availabilty of features on certain cores against the > number of cores with or without the feature and their current > and expected system load. In ChromeOS at least, we struggled a lot with chrome getting scheduled on the big vs little cores on some ARM platforms, as we'd get things like dropped video frames if we stayed stuck on the wrong cores. I think we managed to solve that with uclamp, but again that acknowledges that there are differences between cores, which I think justifies the cpuset parameter to a cpu feature interface like this. > > As long as there isn't a specific requirement, I think it's better > to not actually encourage hardware vendors to implement designs > like that, or at least not designing an interface to make getting > this information a few microseconds faster that what already > exists. As this is ABI, there is a bit of "crystal ball gazing" I'm doing, though I hope I've backed it up with enough to show it's not a completely wild parameter. -Evan
diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst new file mode 100644 index 000000000000..88b015a2026e --- /dev/null +++ b/Documentation/riscv/hwprobe.rst @@ -0,0 +1,39 @@ +.. SPDX-License-Identifier: GPL-2.0 + +RISC-V Hardware Probing Interface +--------------------------------- + +The RISC-V hardware probing interface is based around a single syscall, which +is defined in <asm/hwprobe.h>:: + + struct riscv_hwprobe { + __s64 key; + __u64 value; + }; + + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count, + size_t cpu_count, cpu_set_t *cpus, + unsigned long flags); + +The arguments are split into three groups: an array of key-value pairs, a CPU +set, and some flags. The key-value pairs are supplied with a count. Userspace +must prepopulate the key field for each element, and the kernel will fill in the +value if the key is recognized. If a key is unknown to the kernel, its key field +will be cleared to -1, and its value set to 0. The CPU set is defined by +CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will +be only be valid if all CPUs in the given set have the same value. Otherwise -1 +will be returned. For boolean-like keys, the value returned will be a logical +AND of the values for the specified CPUs. Usermode can supply NULL for cpus and +0 for cpu_count as a shortcut for all online CPUs. There are currently no flags, +this value must be zero for future compatibility. + +On success 0 is returned, on failure a negative error code is returned. + +The following keys are defined: + +* :c:macro:`RISCV_HWPROBE_KEY_MVENDORID`: Contains the value of ``mvendorid``, + as defined by the RISC-V privileged architecture specification. +* :c:macro:`RISCV_HWPROBE_KEY_MARCHID`: Contains the value of ``marchid``, as + defined by the RISC-V privileged architecture specification. +* :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as + defined by the RISC-V privileged architecture specification. diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst index 2e5b18fbb145..175a91db0200 100644 --- a/Documentation/riscv/index.rst +++ b/Documentation/riscv/index.rst @@ -7,6 +7,7 @@ RISC-V architecture boot-image-header vm-layout + hwprobe patch-acceptance uabi diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h new file mode 100644 index 000000000000..08d1c3bdd78a --- /dev/null +++ b/arch/riscv/include/asm/hwprobe.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright 2022 Rivos, Inc + */ + +#ifndef _ASM_HWPROBE_H +#define _ASM_HWPROBE_H + +#include <uapi/asm/hwprobe.h> + +#define RISCV_HWPROBE_MAX_KEY 2 + +#endif diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h index 384a63b86420..78a6302ef711 100644 --- a/arch/riscv/include/asm/syscall.h +++ b/arch/riscv/include/asm/syscall.h @@ -75,4 +75,7 @@ static inline int syscall_get_arch(struct task_struct *task) } asmlinkage long sys_riscv_flush_icache(uintptr_t, uintptr_t, uintptr_t); + +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, uintptr_t, + uintptr_t, uintptr_t); #endif /* _ASM_RISCV_SYSCALL_H */ diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h new file mode 100644 index 000000000000..591802047460 --- /dev/null +++ b/arch/riscv/include/uapi/asm/hwprobe.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright 2022 Rivos, Inc + */ + +#ifndef _UAPI_ASM_HWPROBE_H +#define _UAPI_ASM_HWPROBE_H + +#include <linux/types.h> + +/* + * Interface for probing hardware capabilities from userspace, see + * Documentation/riscv/hwprobe.rst for more information. + */ +struct riscv_hwprobe { + __s64 key; + __u64 value; +}; + +#define RISCV_HWPROBE_KEY_MVENDORID 0 +#define RISCV_HWPROBE_KEY_MARCHID 1 +#define RISCV_HWPROBE_KEY_MIMPID 2 +/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */ + +#endif diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h index 73d7cdd2ec49..950ab3fd4409 100644 --- a/arch/riscv/include/uapi/asm/unistd.h +++ b/arch/riscv/include/uapi/asm/unistd.h @@ -43,3 +43,12 @@ #define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15) #endif __SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache) + +/* + * Allows userspace to query the kernel for CPU architecture and + * microarchitecture details across a given set of CPUs. + */ +#ifndef __NR_riscv_hwprobe +#define __NR_riscv_hwprobe (__NR_arch_specific_syscall + 14) +#endif +__SYSCALL(__NR_riscv_hwprobe, sys_riscv_hwprobe) diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 684e5419d37d..d0fb3567cc3d 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -4,15 +4,16 @@ */ #include <linux/cpu.h> +#include <linux/cpuhotplug.h> #include <linux/init.h> #include <linux/seq_file.h> #include <linux/of.h> #include <asm/cpufeature.h> #include <asm/csr.h> #include <asm/hwcap.h> +#include <asm/pgtable.h> #include <asm/sbi.h> #include <asm/smp.h> -#include <asm/pgtable.h> /* * Returns the hart ID of the given device tree node, or -ENODEV if the node diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c index 5d3f2fbeb33c..02c2f1f7417e 100644 --- a/arch/riscv/kernel/sys_riscv.c +++ b/arch/riscv/kernel/sys_riscv.c @@ -6,8 +6,11 @@ */ #include <linux/syscalls.h> -#include <asm/unistd.h> #include <asm/cacheflush.h> +#include <asm/hwprobe.h> +#include <asm/sbi.h> +#include <asm/uaccess.h> +#include <asm/unistd.h> #include <asm-generic/mman-common.h> static long riscv_sys_mmap(unsigned long addr, unsigned long len, @@ -69,3 +72,132 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end, return 0; } + +/* + * The hwprobe interface, for allowing userspace to probe to see which features + * are supported by the hardware. See Documentation/riscv/hwprobe.rst for more + * details. + */ +static void hwprobe_arch_id(struct riscv_hwprobe *pair, + const struct cpumask *cpus) +{ + u64 cpu, id = -1ULL; + bool first = true; + + for_each_cpu(cpu, cpus) { + u64 cpu_id; + + switch (pair->key) { + case RISCV_HWPROBE_KEY_MVENDORID: + cpu_id = riscv_cached_mvendorid(cpu); + break; + case RISCV_HWPROBE_KEY_MIMPID: + cpu_id = riscv_cached_mimpid(cpu); + break; + case RISCV_HWPROBE_KEY_MARCHID: + cpu_id = riscv_cached_marchid(cpu); + break; + } + + if (first) + id = cpu_id; + + /* + * If there's a mismatch for the given set, return -1 in the + * value. + */ + if (id != cpu_id) { + id = -1ULL; + break; + } + } + + pair->value = id; +} + +static void hwprobe_one_pair(struct riscv_hwprobe *pair, + const struct cpumask *cpus) +{ + switch (pair->key) { + case RISCV_HWPROBE_KEY_MVENDORID: + case RISCV_HWPROBE_KEY_MARCHID: + case RISCV_HWPROBE_KEY_MIMPID: + hwprobe_arch_id(pair, cpus); + break; + + /* + * For forward compatibility, unknown keys don't fail the whole + * call, but get their element key set to -1 and value set to 0 + * indicating they're unrecognized. + */ + default: + pair->key = -1; + pair->value = 0; + break; + } +} + +static +int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count, + long cpu_count, unsigned long __user *cpus_user, + unsigned long flags) +{ + size_t out; + int ret; + cpumask_t cpus; + + /* Check the reserved flags. */ + if (flags != 0) + return -EINVAL; + + /* + * The interface supports taking in a CPU mask, and returns values that + * are consistent across that mask. Allow userspace to specify NULL and + * 0 as a shortcut to all online CPUs. + */ + cpumask_clear(&cpus); + if (!cpu_count && !cpus_user) { + cpumask_copy(&cpus, cpu_online_mask); + } else { + if (cpu_count > cpumask_size()) + cpu_count = cpumask_size(); + + ret = copy_from_user(&cpus, cpus_user, cpu_count); + if (!ret) + return -EFAULT; + + /* + * Userspace must provide at least one online CPU, without that + * there's no way to define what is supported. + */ + cpumask_and(&cpus, &cpus, cpu_online_mask); + if (cpumask_empty(&cpus)) + return -EINVAL; + } + + for (out = 0; out < pair_count; out++, pairs++) { + struct riscv_hwprobe pair; + + if (get_user(pair.key, &pairs->key)) + return -EFAULT; + + pair.value = 0; + hwprobe_one_pair(&pair, &cpus); + ret = put_user(pair.key, &pairs->key); + if (ret == 0) + ret = put_user(pair.value, &pairs->value); + + if (ret) + return -EFAULT; + } + + return 0; + +} + +SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count, + uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags) +{ + return do_riscv_hwprobe((void __user *)pairs, pair_count, cpu_count, + (void __user *)cpus, flags); +}