Message ID | 20220325022219.829-2-chang.seok.bae@intel.com (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Series | x86/fpu: Make AMX state ready for CPU idle |
On 3/24/22 19:22, Chang S. Bae wrote:
> Some old GCC versions (4.9.x and 5.x) have an issue that incorrectly
> reordering volatile asm statements with each other [1]. While this bug was
> fixed on later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
> read/write do not appear to be impacted, it is preventive to ensure them on
> the program order.

I thought you *actually* saw xgetbv() be reordered, though. Was that on
a buggy compiler?
On 4/1/22 10:58, Dave Hansen wrote:
> On 3/24/22 19:22, Chang S. Bae wrote:
>> Some old GCC versions (4.9.x and 5.x) have an issue that incorrectly
>> reordering volatile asm statements with each other [1]. While this bug was
>> fixed on later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
>> read/write do not appear to be impacted, it is preventive to ensure them on
>> the program order.
> I thought you *actually* saw xgetbv() be reordered, though. Was that on
> a buggy compiler?

Actually, reading the gcc bug[1] a bit more, it says:

> However, correctness here depends on the compiler honouring the
> ordering of volatile asm statements with respect to other volatile
> code, and this patch fixes that.

In your case, there was presumably no volatile code, just a
fpu_state_size_dynamic() call. The compiler thought the xgetbv() was
safe to reorder, and did so.

So, I'm not quite sure how that bug is relevant. It just dealt with
multiple "asm volatile" statements that don't have memory clobbers. We
only have one "asm volatile" in play. Adding the memory clobber should
make that bug totally irrelevant.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
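For reference, the distinction Dave is drawing can be sketched like this. The
asm bodies below are placeholders (just assembler comments), not kernel code;
the point is only what the compiler is allowed to do with each form:

static inline void no_clobber(unsigned int val)
{
	/*
	 * Volatile asm with no memory clobber: the compiler must emit it,
	 * but only the listed operands constrain scheduling, so nearby
	 * loads/stores (and, on the buggy GCC 4.9.x/5.x, even other
	 * volatile asm statements) may be moved across it.
	 */
	asm volatile("# uses %0" :: "r" (val));
}

static inline void with_clobber(unsigned int val)
{
	/*
	 * A "memory" clobber tells the compiler this asm may read or write
	 * any memory, so it acts as a compiler barrier: surrounding memory
	 * accesses cannot be reordered across it.
	 */
	asm volatile("# uses %0" :: "r" (val) : "memory");
}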
On 4/1/2022 11:16 AM, Dave Hansen wrote:
> On 4/1/22 10:58, Dave Hansen wrote:
>> On 3/24/22 19:22, Chang S. Bae wrote:
>>> Some old GCC versions (4.9.x and 5.x) have an issue that incorrectly
>>> reordering volatile asm statements with each other [1]. While this bug was
>>> fixed on later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
>>> read/write do not appear to be impacted, it is preventive to ensure them on
>>> the program order.
>> I thought you *actually* saw xgetbv() be reordered, though. Was that on
>> a buggy compiler?

No, sorry, my earlier analysis was naive. The #UD was raised on the
non-XGETBV1 system because objtool didn't process fpu_state_size_dynamic()
at build time; that happened only when the TILERELEASE opcode was not given
to it. Here is a bit more detail:
https://lore.kernel.org/lkml/aa3ff0f4-d74c-2c84-37c0-0880cabc0dd4@intel.com/

> Actually, reading the gcc bug[1] a bit more, it says:
>
>> However, correctness here depends on the compiler honouring the
>> ordering of volatile asm statements with respect to other volatile
>> code, and this patch fixes that.
>
> In your case, there was presumably no volatile code, just a
> fpu_state_size_dynamic() call. The compiler thought the xgetbv() was
> safe to reorder, and did so.
>
> So, I'm not quite sure how that bug is relevant. It just dealt with
> multiple "asm volatile" statements that don't have memory clobbers. We
> only have one "asm volatile" in play. Adding the memory clobber should
> make that bug totally irrelevant.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

Yeah, now this patch looks to have created more confusion than fixed any
real problem. Let me drop it in v4.

Thanks,
Chang
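For readers following along: the failure Chang refers to came from an earlier
patch in the series that guarded an XGETBV1 read with
fpu_state_size_dynamic(). The helper below is a hypothetical, simplified
sketch of that pattern, not the actual patch; see the lore link above for the
real analysis.

/*
 * Hypothetical sketch of the guarded read.  xgetbv(1) reads the XINUSE
 * bitmap and raises #UD on CPUs without XGETBV1 support, so it must sit
 * behind a reliable feature check.  If the guard is not wired up as
 * expected at build time, the instruction can be reached on older CPUs.
 */
static inline bool amx_tile_state_in_use(void)
{
	if (!fpu_state_size_dynamic())
		return false;

	return xgetbv(XCR_XFEATURE_IN_USE_MASK) & XFEATURE_MASK_XTILE;
}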
diff --git a/arch/x86/include/asm/fpu/xcr.h b/arch/x86/include/asm/fpu/xcr.h
index 9656a5bc6fea..9b513e7c0161 100644
--- a/arch/x86/include/asm/fpu/xcr.h
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_FPU_XCR_H
 #define _ASM_X86_FPU_XCR_H
 
+#include <asm/special_insns.h>
+
 #define XCR_XFEATURE_ENABLED_MASK	0x00000000
 #define XCR_XFEATURE_IN_USE_MASK	0x00000001
 
@@ -9,7 +11,8 @@ static inline u64 xgetbv(u32 index)
 {
 	u32 eax, edx;
 
-	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
+	asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index),
+			__FORCE_ORDER);
 	return eax + ((u64)edx << 32);
 }
 
@@ -18,7 +21,8 @@ static inline void xsetbv(u32 index, u64 value)
 {
 	u32 eax = value;
 	u32 edx = value >> 32;
 
-	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
+	asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index)
+		     : "memory");
 }
 
 /*
Some old GCC versions (4.9.x and 5.x) have a bug where they incorrectly
reorder volatile asm statements with each other [1]. While this bug was
fixed in later versions (8.1, 7.3, and 6.5), and the kernel's current XCR0
read/write does not appear to be impacted, it is a preventive measure to
ensure their program order.

Add a memory clobber to the write to prevent caching/reordering memory
accesses across other XCR0 writes. A dummy operand with an arbitrary
address can prevent the compiler from reordering with other writes. Add
the dummy operand to the read, as was done for the other accessors in
commit aa5cacdc29d ("x86/asm: Replace __force_order with a memory
clobber").

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v2:
* Add as a new patch (Dave Hansen).
---
 arch/x86/include/asm/fpu/xcr.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
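For context on the new #include: __FORCE_ORDER is the dummy operand the kernel
already uses for the control-register accessors, defined in
arch/x86/include/asm/special_insns.h. At the time of this patch it reads
roughly as follows (quoted from memory, so treat it as a sketch rather than
the authoritative source):

/*
 * A dummy "m" input operand referencing a fixed address.  The compiler
 * must assume the asm reads that memory, which keeps the read ordered
 * against writes that carry a "memory" clobber, without turning the
 * read itself into a full compiler barrier.
 */
#define __FORCE_ORDER "m"(*(unsigned int *)0x1000UL)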