[v9,00/10] riscv: support kernel-mode Vector

Message ID 20231229143627.22898-1-andy.chiu@sifive.com (mailing list archive)

Message

Andy Chiu Dec. 29, 2023, 2:36 p.m. UTC
This series provides support for running Vector in kernel mode.
Additionally, kernel-mode Vector can be configured to run without
turning off preemption on a CONFIG_PREEMPT kernel. Along with this
support, we add Vector-optimized copy_{to,from}_user and provide a
simple size threshold to decide when to run the vectorized functions.
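
A rough sketch of the threshold idea follows; the helper and threshold
names here are illustrative, not the exact ones used in the series:

    static inline unsigned long
    copy_to_user_maybe_vector(void __user *to, const void *from,
                              unsigned long n)
    {
        /*
         * Saving and restoring the V context has a fixed cost, so
         * take the vectorized path only for copies large enough to
         * amortize it.
         */
        if (n >= vector_usercopy_threshold && may_use_simd())
            return copy_to_user_vector(to, from, n);
        return copy_to_user_scalar(to, from, n);
    }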

We decided to drop patch 6 ("riscv: lib: add vectorized mem* routines")
from the last series for the moment. We would like to discuss the issue
below before adding it back (or not). We hope this helps keep the
review process going.

The issue is that the side effects of the mem* functions can break
user expectations. This can happen when the destination or source
memory overlaps with the memory touched by kernel_vector_{begin,end}().
In the observed case, an optimized task_struct assignment in
copy_process() caused an implicit write to current->softirqs_enabled in
kernel_vector_begin(). Since that field lies within the source memory
region, the temporary change to it was copied into the destination
task_struct. After the copy, a softirq assertion check failed because
the kernel expects softirqs not to be disabled for the new task_struct.

Please note that riscv_v_flags in the destination task_struct also has
a stale value after this copy, but that is fine: the flag is not shared
outside of kernel-mode Vector, and we make sure to initialize it before
starting any threads.

(with CONFIG_PROVE_LOCKING)
copy_process():
    arch_dup_task_struct():
        *dst = *src;  # optimized to memcpy():
            kernel_vector_begin():
                current->softirqs_enabled = 0;
            __asm_memcpy_vector();
    DEBUG_LOCKS_WARN_ON(!dst->softirqs_enabled);

A possible solution is to add an alias check and fall back to the
scalar routine if either the source or destination memory overlaps with
the current task_struct. This would increase the overhead for both
versions of the mem* routines.
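
As a rough illustration, the alias check could look like this sketch;
the overlap test and the scalar/vector fallback names are assumptions,
not code from the series:

    /*
     * Overlap test against current's task_struct, whose fields
     * kernel_vector_begin() may write under CONFIG_PROVE_LOCKING.
     */
    static bool overlaps_current_task(const void *p, size_t len)
    {
        uintptr_t a = (uintptr_t)p;
        uintptr_t t = (uintptr_t)current;

        return a < t + sizeof(struct task_struct) && t < a + len;
    }

    void *memcpy(void *dst, const void *src, size_t len)
    {
        if (overlaps_current_task(dst, len) ||
            overlaps_current_task(src, len))
            return __memcpy_scalar(dst, src, len);
        return __memcpy_vector(dst, src, len);
    }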

Alternatively, we could use the vectorized mem* routines only when we
do not have to alter shared state in task_struct; for example, when bh
is already disabled or when we are in softirq context. In these cases
we can proceed to save any dirty context and use Vector. This would
apply only to tightly constrained operations such as mem*, where we
have a clear bound and a fallback.
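
A minimal sketch of such a gate, assuming a non-zero softirq_count()
covers both cases (the helper name is made up):

    static bool may_use_vector_mem_ops(void)
    {
        /*
         * With bh already disabled, or while serving a softirq,
         * kernel_vector_begin() does not have to toggle the
         * lockdep-visible softirqs_enabled state again.
         */
        return has_vector() && softirq_count() != 0;
    }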

Another direction is to minimize the footprint of kernel_vector_begin();
e.g., upgrading local_bh_disable() to preempt_disable() should mitigate
the current issue. This would require us to add per-CPU storage for V
on non-preemptible Vector.
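
A sketch of that direction, with a hypothetical save helper; the
per-CPU buffer replaces task_struct as the save area while preemption
is off:

    static DEFINE_PER_CPU(struct __riscv_v_ext_state, kernel_v_ctx);

    void kernel_vector_begin(void)
    {
        /*
         * preempt_disable() does not write lockdep's
         * current->softirqs_enabled, so a vectorized copy of a
         * task_struct no longer snapshots a transient value.
         */
        preempt_disable();
        save_v_context(this_cpu_ptr(&kernel_v_ctx)); /* hypothetical */
        riscv_v_enable();
    }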

Since preempt_v does not alter any shared state in task_struct when
activated in task context, it is safer for calling vectorized mem*.
However, since the current fallback for preempt_v still touches that
shared state, it is not considered entirely safe to use vectorized
mem*. Still, it is possible to make preempt_v safe for vectorized mem*
by refusing to launch a new kernel thread if its V context allocation
fails, so that kernel threads always use preempt_v in their task
context.
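
A sketch of that allocation rule; the function, cache, and field names
below are assumptions modeled on the kmem_cache patch in this series:

    int riscv_v_alloc_kernel_ctx(struct task_struct *tsk)
    {
        void *datap = kmem_cache_alloc(riscv_v_ctx_cache, GFP_KERNEL);

        if (!datap)
            return -ENOMEM; /* copy_process() then aborts the fork */

        tsk->thread.kernel_vstate.datap = datap;
        return 0;
    }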

This series is composed of 4 parts:
 patch 1-4:  adds basic support for kernel-mode Vector
 patch 5:    adds vectorized copy_{to,from}_user to the kernel
 patch 6:    refactors the context-switch code in fpu [2]
 patch 7-10: provides some code refactoring and support for preemptible
             kernel-mode Vector.

Any part of {1~4, 5, 6, 7~10} in this series can be merged once we feel
it is mature enough.

This series is tested on QEMU with V and verified that booting and
normal userspace operations all work as usual with the thresholds set
to 0. Also, we test by launching multiple kernel threads which
continuously execute and verify Vector operations in the background.
The module that runs these tests is expected to be upstreamed later.

v8 of this series can be found at [1]

[1]: https://lore.kernel.org/all/20231223042914.18599-1-andy.chiu@sifive.com/
[2]: https://lore.kernel.org/linux-riscv/20231221070449.1809020-1-songshuaishuai@tinylab.org/

Patch summary:
 - Updated patches: 1, 4, 10
 - New patch: 6
 - Unchanged patches: 2, 3, 5, 7, 8, 9
 - Deleted patch: 6 (from v8)

Changelog v9:
 - Use one bit to record the on/off status of kernel-mode Vector
 - Temporarily drop vectorized mem* functions
 - Add a patch to refactor context switch in fpu
 - Silence lockdep and use WARN_ON instead

Changelog v8:
 - Address build fail on no-mmu config
 - Fix build fail with W=1
 - Refactor patches (1, 2), Eric

Changelog v7:
 - Fix build fail for allmodconfig and test building the series with
   allmodconfig/allyesconfig

Changelog v6:
 - Provide a more robust check on the use of non-preemptible Vector.
 - Add Kconfigs to set threshold value at compile time. (Charlie)
 - Add a patch to utilize kmem_cache_* for V context allocations.
 - Re-write and add preemptible Vector.

Changelog v5:
 - Rebase on top of riscv for-next (6.7-rc1)

Changelog v4:
 - Use kernel_v_flags and helpers to track vector context.
 - Prevent softirq from nesting V context for non-preempt V
 - Add user copy and mem* routines

Changelog v3:
 - Rebase on top of riscv for-next (6.6-rc1)
 - Fix a build issue (Conor)
 - Guard vstate_save, vstate_restore with {get,put}_cpu_vector_context.
 - Save V context after disabling preemption. (Guo)
 - Remove irqs_disabled() check from may_use_simd(). (Björn)
 - Comment about nesting V context.

Changelog v2:
 - Fix build issues
 - Follow arm's way of starting kernel-mode simd code:
   - add include/asm/simd.h and rename may_use_vector() ->
     may_use_simd()
   - return void in kernel_vector_begin(), and BUG_ON if may_use_simd()
     fails
 - Change naming scheme for functions/macros (Conor):
   - remove KMV
   - 's/rvv/vector/'
   - 's/RISCV_ISA_V_PREEMPTIVE_KMV/RISCV_ISA_V_PREEMPTIVE/'
   - 's/TIF_RISCV_V_KMV/TIF_RISCV_V_KERNEL_MODE/'

Andy Chiu (8):
  riscv: vector: make Vector always available for softirq context
  riscv: sched: defer restoring Vector context for user
  riscv: lib: vectorize copy_to_user/copy_from_user
  riscv: fpu: drop SR_SD bit checking
  riscv: vector: do not pass task_struct into
    riscv_v_vstate_{save,restore}()
  riscv: vector: use a mask to write vstate_ctrl
  riscv: vector: use kmem_cache to manage vector context
  riscv: vector: allow kernel-mode Vector with preemption

Greentime Hu (2):
  riscv: Add support for kernel mode vector
  riscv: Add vector extension XOR implementation

 arch/riscv/Kconfig                      |  22 +++
 arch/riscv/include/asm/asm-prototypes.h |  27 +++
 arch/riscv/include/asm/entry-common.h   |  17 ++
 arch/riscv/include/asm/processor.h      |  42 +++-
 arch/riscv/include/asm/simd.h           |  64 ++++++
 arch/riscv/include/asm/switch_to.h      |   3 +-
 arch/riscv/include/asm/thread_info.h    |   2 +
 arch/riscv/include/asm/vector.h         | 100 ++++++++--
 arch/riscv/include/asm/xor.h            |  68 +++++++
 arch/riscv/kernel/Makefile              |   1 +
 arch/riscv/kernel/entry.S               |   8 +
 arch/riscv/kernel/kernel_mode_vector.c  | 251 ++++++++++++++++++++++++
 arch/riscv/kernel/process.c             |  13 +-
 arch/riscv/kernel/ptrace.c              |   7 +-
 arch/riscv/kernel/signal.c              |   7 +-
 arch/riscv/kernel/vector.c              |  50 ++++-
 arch/riscv/lib/Makefile                 |   7 +-
 arch/riscv/lib/riscv_v_helpers.c        |  44 +++++
 arch/riscv/lib/uaccess.S                |  10 +
 arch/riscv/lib/uaccess_vector.S         |  50 +++++
 arch/riscv/lib/xor.S                    |  81 ++++++++
 21 files changed, 846 insertions(+), 28 deletions(-)
 create mode 100644 arch/riscv/include/asm/simd.h
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c
 create mode 100644 arch/riscv/lib/riscv_v_helpers.c
 create mode 100644 arch/riscv/lib/uaccess_vector.S
 create mode 100644 arch/riscv/lib/xor.S

Comments

Eric Biggers Jan. 5, 2024, 6:59 p.m. UTC | #1
Hi Palmer,

On Fri, Dec 29, 2023 at 02:36:17PM +0000, Andy Chiu wrote:
> This series provides support for running Vector in kernel mode.

Is there any chance that you can take patches 1 and 2 of this series via the
RISC-V tree for v6.8?  The vector crypto patchset depends on those two patches.

Thanks,

- Eric
Andy Chiu Jan. 8, 2024, 3:16 a.m. UTC | #2
On Sat, Jan 6, 2024 at 2:59 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Hi Palmer,
>
> On Fri, Dec 29, 2023 at 02:36:17PM +0000, Andy Chiu wrote:
> > This series provides support for running Vector in kernel mode.
>
> Is there any chance that you can take patches 1 and 2 of this series via the
> RISC-V tree for v6.8?  The vector crypto patchset depends on those two patches.

Patch 4 is very essential for non-preemptible kernel-mode Vector,
unless we want to bear the restoring overhead at each
kernel_vector_end(). It's ~200 cycles on VLEN=128 hardware, and it
happens whenever Vector is also used in user mode, which will be very
common. Please consider including up to patch 4 if we must split.

>
> Thanks,
>
> - Eric

Thanks,
Andy
Eric Biggers Jan. 8, 2024, 3:52 a.m. UTC | #3
On Mon, Jan 08, 2024 at 11:16:27AM +0800, Andy Chiu wrote:
> On Sat, Jan 6, 2024 at 2:59 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Hi Palmer,
> >
> > On Fri, Dec 29, 2023 at 02:36:17PM +0000, Andy Chiu wrote:
> > > This series provides support for running Vector in kernel mode.
> >
> > Is there any chance that you can take patches 1 and 2 of this series via the
> > RISC-V tree for v6.8?  The vector crypto patchset depends on those two patches.
> 
> Patch 4 is very essential for non-preemptible kernel-mode Vector,
> unless we want to bear the restoring overhead at each
> kernel_vector_end(). It's ~200 cycles on VLEN=128 hardware, and it
> happens whenever Vector is also used in user mode, which will be very
> common. Please consider including up to patch 4 if we must split.
> 

It's definitely an important optimization, but I wouldn't call it "very
essential".  x86 didn't implement lazy restores until kernel v5.2.

My hope is that we can just get enough merged soon to unblock the vector crypto
patchset, which just needs kernel_vector_{begin,end} with softirq support.  The
remaining core vector work can happen in parallel.

- Eric
Palmer Dabbelt Jan. 10, 2024, 3:02 p.m. UTC | #4
On Sun, 07 Jan 2024 19:52:09 PST (-0800), ebiggers@kernel.org wrote:
> On Mon, Jan 08, 2024 at 11:16:27AM +0800, Andy Chiu wrote:
>> On Sat, Jan 6, 2024 at 2:59 AM Eric Biggers <ebiggers@kernel.org> wrote:
>> >
>> > Hi Palmer,
>> >
>> > On Fri, Dec 29, 2023 at 02:36:17PM +0000, Andy Chiu wrote:
>> > > This series provides support for running Vector in kernel mode.
>> >
>> > Is there any chance that you can take patches 1 and 2 of this series via the
>> > RISC-V tree for v6.8?  The vector crypto patchset depends on those two patches.
>> 
>> Patch 4 is very essential for non-preemptible kernel-mode Vector,
>> unless we want to bear the restoring overhead at each
>> kernel_vector_end(). It's ~200 cycles on VLEN=128 hardware, and it
>> happens whenever Vector is also used in user mode, which will be very
>> common. Please consider including up to patch 4 if we must split.
>> 
>
> It's definitely an important optimization, but I wouldn't call it "very
> essential".  x86 didn't implement lazy restores until kernel v5.2.
>
> My hope is that we can just get enough merged soon to unblock the vector crypto
> patchset, which just needs kernel_vector_{begin,end} with softirq support.  The
> remaining core vector work can happen in parallel.

Andy says he's going to send a v10 tomorrow, I'll plan on merging that.

>
> - Eric
>