diff mbox series

[v3,1/3] x86/fpu: Make XCR0 accessors immune to unwanted compiler reordering

Message ID 20220325022219.829-2-chang.seok.bae@intel.com (mailing list archive)
State Handled Elsewhere, archived
Series x86/fpu: Make AMX state ready for CPU idle

Commit Message

Chang S. Bae March 25, 2022, 2:22 a.m. UTC
Some old GCC versions (4.9.x and 5.x) have a bug that incorrectly reorders
volatile asm statements with each other [1]. While this bug was fixed in
later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
read/write code does not appear to be affected, it is a preventive measure
to keep these accessors in program order.

Add a memory clobber to the write accessor to prevent caching/reordering
of memory accesses across XCR0 writes. For the read accessor, a dummy
operand with an arbitrary address can prevent the compiler from reordering
it with other writes. Add the dummy operand for the read, as done for
other accessors in commit aa5cacdc29d ("x86/asm: Replace __force_order
with a memory clobber").

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Add as a new patch (Dave Hansen).
---
 arch/x86/include/asm/fpu/xcr.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Dave Hansen April 1, 2022, 5:58 p.m. UTC | #1
On 3/24/22 19:22, Chang S. Bae wrote:
> Some old GCC versions (4.9.x and 5.x) have a bug that incorrectly reorders
> volatile asm statements with each other [1]. While this bug was fixed in
> later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
> read/write code does not appear to be affected, it is a preventive measure
> to keep these accessors in program order.

I thought you *actually* saw xgetbv() be reordered, though.  Was that on
a buggy compiler?
Dave Hansen April 1, 2022, 6:16 p.m. UTC | #2
On 4/1/22 10:58, Dave Hansen wrote:
> On 3/24/22 19:22, Chang S. Bae wrote:
>> Some old GCC versions (4.9.x and 5.x) have a bug that incorrectly reorders
>> volatile asm statements with each other [1]. While this bug was fixed in
>> later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
>> read/write code does not appear to be affected, it is a preventive measure
>> to keep these accessors in program order.
> I thought you *actually* saw xgetbv() be reordered, though.  Was that on
> a buggy compiler?

Actually, reading the gcc bug[1] a bit more, it says:

> However, correctness here depends on the compiler honouring the
> ordering of volatile asm statements with respect to other volatile
> code, and this patch fixes that.
In your case, there was presumably no volatile code, just a
fpu_state_size_dynamic() call.  The compiler thought the xgetbv() was
safe to reorder, and did so.

So, I'm not quite sure how that bug is relevant.  It just dealt with
multiple "asm volatile" statements that don't have memory clobbers.  We
only have one "asm volatile" in play.  Adding the memory clobber should
make that bug totally irrelevant.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Chang S. Bae April 1, 2022, 10:14 p.m. UTC | #3
On 4/1/2022 11:16 AM, Dave Hansen wrote:
> On 4/1/22 10:58, Dave Hansen wrote:
>> On 3/24/22 19:22, Chang S. Bae wrote:
>>> Some old GCC versions (4.9.x and 5.x) have a bug that incorrectly reorders
>>> volatile asm statements with each other [1]. While this bug was fixed in
>>> later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
>>> read/write code does not appear to be affected, it is a preventive measure
>>> to keep these accessors in program order.
>> I thought you *actually* saw xgetbv() be reordered, though.  Was that on
>> a buggy compiler?

No, sorry, my earlier analysis was naive. The #UD was raised on the 
non-XGETBV1 system because objtool didn't process 
fpu_state_size_dynamic() at build time. This happened only when the 
TILERELEASE opcode was not given. Here is a bit more detail:
https://lore.kernel.org/lkml/aa3ff0f4-d74c-2c84-37c0-0880cabc0dd4@intel.com/

> 
> Actually, reading the gcc bug[1] a bit more, it says:
> 
>> However, correctness here depends on the compiler honouring the
>> ordering of volatile asm statements with respect to other volatile
>> code, and this patch fixes that.
> In your case, there was presumably no volatile code, just a
> fpu_state_size_dynamic() call.  The compiler thought the xgetbv() was
> safe to reorder, and did so.
> 
> So, I'm not quite sure how that bug is relevant.  It just dealt with
> multiple "asm volatile" statements that don't have memory clobbers.  We
> only have one "asm volatile" in play.  Adding the memory clobber should
> make that bug totally irrelevant.
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

Yeah, now this patch looks to have created more confusion than it 
fixed. Let me drop it in v4.

Thanks,
Chang

Patch

diff --git a/arch/x86/include/asm/fpu/xcr.h b/arch/x86/include/asm/fpu/xcr.h
index 9656a5bc6fea..9b513e7c0161 100644
--- a/arch/x86/include/asm/fpu/xcr.h
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -2,6 +2,8 @@ 
 #ifndef _ASM_X86_FPU_XCR_H
 #define _ASM_X86_FPU_XCR_H
 
+#include <asm/special_insns.h>
+
 #define XCR_XFEATURE_ENABLED_MASK	0x00000000
 #define XCR_XFEATURE_IN_USE_MASK	0x00000001
 
@@ -9,7 +11,8 @@  static inline u64 xgetbv(u32 index)
 {
 	u32 eax, edx;
 
-	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
+	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index),
+				__FORCE_ORDER);
 	return eax + ((u64)edx << 32);
 }
 
@@ -18,7 +21,8 @@  static inline void xsetbv(u32 index, u64 value)
 	u32 eax = value;
 	u32 edx = value >> 32;
 
-	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
+	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index)
+			      :	 "memory");
 }
 
 /*