diff mbox series

drm/i915: check to see if the FPU is available before using it

Message ID 20200328000422.98978-1-Jason@zx2c4.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: check to see if the FPU is available before using it | expand

Commit Message

Jason A. Donenfeld March 28, 2020, 12:04 a.m. UTC
It's not safe to just grab the FPU willy nilly without first checking to
see if it's available. This patch adds the usual call to may_use_simd()
and falls back to boring memcpy if it's not available.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/gpu/drm/i915/i915_memcpy.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Jason A. Donenfeld April 8, 2020, 12:40 a.m. UTC | #1
On Sat, Mar 28, 2020 at 1:59 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Jason A. Donenfeld (2020-03-28 00:04:22)
> > It's not safe to just grab the FPU willy nilly without first checking to
> > see if it's available. This patch adds the usual call to may_use_simd()
> > and falls back to boring memcpy if it's not available.
>
> These instructions do not use the fpu, nor are these registers aliased
> over the fpu stack. This description and the may_use_simd() do not
> look like they express the right granularity as to which simd state are
> included

Most of the time when discussing vector instructions in the kernel
with x86, "FPU" is used to denote the whole shebang, because of
similar XSAVE semantics and requirements with actual floating point
instructions and SIMD instructions. So when I say "grab the FPU", I'm
really referring to the act of messing with any registers that aren't
saved and restored by default during context switch and need the
explicit marking for XSAVE to come in -- the kernel_fpu_begin/end
calls that you already have.

With regards to the granularity here, you are in fact touching xmm
registers. That means you need kernel_fpu_begin/end, which you
correctly have. However, it also means that before using those, you
should check to see if that's okay by using the may_use_simd()
instruction.

Now you may claim that at the moment
may_use_simd()-->irq_fpu_usable()-->(!in_interrupt() ||
interrupted_user_mode() || interrupted_kernel_fpu_idle()) always holds
true, and you're a keen follower of the (recently changed) kernel fpu
x86 semantics in case those conditions change, and that your driver is
so strictly written that you know exactly the context this fancy
memcpy will run in, always, and you'll never deviate from it, and
therefore it's okay to depart from the rules and omit the check and
safe fallback code. But c'mon - i915 is complex, and mixed context
bugs abound, and the rules for using those registers might in fact
change without you noticing.

So why not apply this to have a safe fallback for when the numerous
assumptions no longer hold? (If you're extra worried I suppose you
could make it a `if (WARN_ON(!may_use_simd()))` instead or something,
but that seems like a bit much.)

> Look at caller, return the error and let them decide if they can avoid
> the read from WC, which quite often they can. And no, this is not done
> from interrupt context, we would be crucified if we did.

Ahh, now, reading this comment here I realize maybe I've misunderstood
you. Do you mean to say that checking for may_use_simd() is a thing
that you'd like to do after all, but you don't like it falling back to
slow memcpy. Instead, you'd like for the code to return an error
value, and then caller can just optionally skip the memcpy under some
complicated driver circumstances?

Jason
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_memcpy.c b/drivers/gpu/drm/i915/i915_memcpy.c
index fdd550405fd3..7c0e022586bc 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -24,6 +24,7 @@ 
 
 #include <linux/kernel.h>
 #include <asm/fpu/api.h>
+#include <asm/simd.h>
 
 #include "i915_memcpy.h"
 
@@ -38,6 +39,12 @@  static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 #ifdef CONFIG_AS_MOVNTDQA
 static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 {
+	if (unlikely(!may_use_simd())) {
+		memcpy(dst, src, len);
+		return;
+	}
+
+
 	kernel_fpu_begin();
 
 	while (len >= 4) {
@@ -67,6 +74,11 @@  static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 
 static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
 {
+	if (unlikely(!may_use_simd())) {
+		memcpy(dst, src, len);
+		return;
+	}
+
 	kernel_fpu_begin();
 
 	while (len >= 4) {