Message ID | 20220120160822.852009966@infradead.org
---|---
State | New
Series | sched: User Managed Concurrency Groups
On Thu, Jan 20, 2022, Peter Zijlstra wrote:
> Do try_cmpxchg() loops on userspace addresses.
>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/uaccess.h | 67 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
>
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -342,6 +342,24 @@ do {							\
>  	     : [umem] "m" (__m(addr))				\
>  	     : : label)
>
> +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
> +	bool success;						\
> +	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
> +	__typeof__(*(_ptr)) __old = *_old;			\
> +	__typeof__(*(_ptr)) __new = (_new);			\
> +	asm_volatile_goto("\n"					\
> +		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
> +		     _ASM_EXTABLE_UA(1b, %l[label])		\
> +		     : CC_OUT(z) (success),			\
> +		       [ptr] "+m" (*_ptr),			\
> +		       [old] "+a" (__old)			\
> +		     : [new] ltype (__new)			\
> +		     : "memory", "cc"				\

IIUC, the "cc" clobber is unnecessary as CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y implies
__GCC_ASM_FLAG_OUTPUTS__=y, i.e. CC_OUT() will resolve to "=@cc".

> +		     : label);					\
> +	if (unlikely(!success))					\
> +		*_old = __old;					\
> +	likely(success); })
> +
>  #else // !CONFIG_CC_HAS_ASM_GOTO_OUTPUT

...

> +extern void __try_cmpxchg_user_wrong_size(void);
> +
> +#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
> +	__typeof__(*(_ptr)) __ret;				\

This should probably be a bool, the return from the lower level helpers is a
bool that's true if the exchange succeeded.  Declaring the type of the target
implies that they return the raw result, which is confusing.

> +	switch (sizeof(__ret)) {				\
> +	case 1:	__ret = __try_cmpxchg_user_asm("b", "q",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 2:	__ret = __try_cmpxchg_user_asm("w", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 4:	__ret = __try_cmpxchg_user_asm("l", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\
> +		break;						\
> +	case 8:	__ret = __try_cmpxchg_user_asm("q", "r",	\
> +					       (_ptr), (_oldp),	\
> +					       (_nval), _label);\

Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
to using it to atomically update guest PAE PTEs and LTR descriptors (yay).

Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
address way ahead of time, so the access_ok() check can be omitted.
Alternatively, KVM could add its own macro, but that seems a little silly.
E.g. something like this, though I don't think this is correct (something is
getting inverted somewhere and the assembly output is a nightmare):

  /* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */
  #define ___try_cmpxchg_user(_ptr, _oldp, _nval, _label)	({	\
  	int ____ret = -EFAULT;						\
  	__uaccess_begin_nospec();					\
  	____ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label);	\
  _label:								\
  	__uaccess_end();						\
  	____ret;							\
  })

Lastly, assuming I get my crap working, mind if I post a variant (Cc'd to
stable@) in the context of a KVM series?  Turns out KVM has an ugly bug where
it completely botches the pfn calculation of memory it remaps and accesses[*];
the easiest fix is to switch to __try_cmpxchg_user() and purge the nastiness.

[*] https://lore.kernel.org/all/20220124172633.103323-1-tadeusz.struk@linaro.org
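For illustration, a minimal sketch of how the unsafe_ variant would be used by
a caller today, following the existing unsafe_get_user()/unsafe_put_user()
convention of an explicit user-access region.  The function and its -EAGAIN
policy are hypothetical, not part of the patch:

  /* Hypothetical caller: uptr is validated (access_ok()) by
   * user_access_begin(), as required before any unsafe_ accessor. */
  static int cmpxchg_user_once(u32 __user *uptr, u32 *old, u32 new)
  {
  	bool ok;

  	if (!user_access_begin(uptr, sizeof(*uptr)))
  		return -EFAULT;
  	/* true = exchanged; false = *uptr != *old and *old now holds the
  	 * current value; jumps to efault if the access faults. */
  	ok = unsafe_try_cmpxchg_user(uptr, old, new, efault);
  	user_access_end();
  	return ok ? 0 : -EAGAIN;

  efault:
  	user_access_end();
  	return -EFAULT;
  }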
On Thu, Jan 27, 2022, Sean Christopherson wrote:
> Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
> to using it to atomically update guest PAE PTEs and LTR descriptors (yay).
>
> Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
> less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
> address way ahead of time, so the access_ok() check can be omitted.
> Alternatively, KVM could add its own macro, but that seems a little silly.
> E.g. something like this, though I don't think this is correct

*sigh*

Finally realized I forgot to add back the page offset after converting from
guest page frame to host virtual address.  Anyways, this is what I ended up
with, will test more tomorrow.

From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 20 Jan 2022 16:55:21 +0100
Subject: [PATCH] x86/uaccess: Implement unsafe_try_cmpxchg_user()

Do try_cmpxchg() loops on userspace addresses.  Provide 8-byte versions for
32-bit kernels so that KVM can do cmpxchg on guest PAE PTEs, which are
accessed via userspace addresses.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/uaccess.h | 129 +++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ac96f9b2d64b..b706008aed28 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -342,6 +342,45 @@ do {							\
 	     : [umem] "m" (__m(addr))				\
 	     : : label)
 
+#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm_volatile_goto("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
+		     _ASM_EXTABLE_UA(1b, %l[label])		\
+		     : CC_OUT(z) (success),			\
+		       [ptr] "+m" (*_ptr),			\
+		       [old] "+a" (__old)			\
+		     : [new] ltype (__new)			\
+		     : "memory"					\
+		     : label);					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+
+#ifdef CONFIG_X86_32
+#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({	\
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm_volatile_goto("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n"	\
+		     _ASM_EXTABLE_UA(1b, %l[label])		\
+		     : CC_OUT(z) (success),			\
+		       "+A" (__old),				\
+		       [ptr] "+m" (*_ptr)			\
+		     : "b" ((u32)__new),			\
+		       "c" ((u32)((u64)__new >> 32))		\
+		     : "memory"					\
+		     : label);					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+#endif // CONFIG_X86_32
+
 #else // !CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
 #ifdef CONFIG_X86_32
@@ -407,6 +446,57 @@ do {							\
 	     : [umem] "m" (__m(addr)),				\
 	       "0" (err))
 
+#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
+	int __err = 0;						\
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm volatile("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
+		     CC_SET(z)					\
+		     "2:\n"					\
+		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG,	\
+					   %[errout])		\
+		     : CC_OUT(z) (success),			\
+		       [errout] "+r" (__err),			\
+		       [ptr] "+m" (*_ptr),			\
+		       [old] "+a" (__old)			\
+		     : [new] ltype (__new)			\
+		     : "memory", "cc");				\
+	if (unlikely(__err))					\
+		goto label;					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+
+#ifdef CONFIG_X86_32
+#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({	\
+	int __err = 0;						\
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm volatile("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n"	\
+		     CC_SET(z)					\
+		     "2:\n"					\
+		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG,	\
+					   %[errout])		\
+		     : CC_OUT(z) (success),			\
+		       [errout] "+r" (__err),			\
+		       "+A" (__old),				\
+		       [ptr] "+m" (*_ptr)			\
+		     : "b" ((u32)__new),			\
+		       "c" ((u32)((u64)__new >> 32))		\
+		     : "memory", "cc");				\
+	if (unlikely(__err))					\
+		goto label;					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+#endif // CONFIG_X86_32
+
 #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
 /* FIXME: this hack is definitely wrong -AK */
@@ -501,6 +591,45 @@ do {							\
 } while (0)
 #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
+extern void __try_cmpxchg_user_wrong_size(void);
+
+#ifndef CONFIG_X86_32
+#define __try_cmpxchg64_user_asm(_ptr, _oldp, _nval, _label)	\
+	__try_cmpxchg_user_asm("q", "r", (_ptr), (_oldp), (_nval), _label)
+#endif
+
+#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
+	bool __ret;						\
+	switch (sizeof(*(_ptr))) {				\
+	case 1:	__ret = __try_cmpxchg_user_asm("b", "q",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 2:	__ret = __try_cmpxchg_user_asm("w", "r",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 4:	__ret = __try_cmpxchg_user_asm("l", "r",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 8:	__ret = __try_cmpxchg64_user_asm((_ptr), (_oldp),	\
+						 (_nval), _label);	\
+		break;						\
+	default: __try_cmpxchg_user_wrong_size();		\
+	}							\
+	__ret; })
+
+/* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */
+#define __try_cmpxchg_user(_ptr, _oldp, _nval, _label)	({	\
+	int __ret = -EFAULT;					\
+	__uaccess_begin_nospec();				\
+	__ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label);	\
+_label:								\
+	__uaccess_end();					\
+	__ret;							\
+	})
+
 /*
  * We want the unsafe accessors to always be inlined and use
  * the error labels - thus the macro games.
--
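For completeness, a hypothetical caller sketch for the __try_cmpxchg_user()
flavor above; the function name is made up, but the 0/1/-EFAULT semantics
follow the comment in the patch, and on 32-bit the 8-byte case resolves to
the cmpxchg8b helper:

  /* Hypothetical: atomically update an 8-byte guest PAE PTE living at a
   * userspace address the caller has already vetted with access_ok().
   * Returns 0 if the exchange happened, 1 if *uptr did not match *old
   * (*old is refreshed with the current value), or -EFAULT on a fault.
   * Note the pte_fault label is declared by the macro itself. */
  static int update_guest_pte(u64 __user *uptr, u64 *old, u64 new)
  {
  	return __try_cmpxchg_user(uptr, old, new, pte_fault);
  }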
On Thu, Jan 27, 2022 at 02:17:20AM +0000, Sean Christopherson wrote:
> On Thu, Jan 20, 2022, Peter Zijlstra wrote:
> > Do try_cmpxchg() loops on userspace addresses.
> >
> > Cc: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/include/asm/uaccess.h | 67 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> > --- a/arch/x86/include/asm/uaccess.h
> > +++ b/arch/x86/include/asm/uaccess.h
> > @@ -342,6 +342,24 @@ do {							\
> >  	     : [umem] "m" (__m(addr))				\
> >  	     : : label)
> >
> > +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
> > +	bool success;						\
> > +	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
> > +	__typeof__(*(_ptr)) __old = *_old;			\
> > +	__typeof__(*(_ptr)) __new = (_new);			\
> > +	asm_volatile_goto("\n"					\
> > +		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
> > +		     _ASM_EXTABLE_UA(1b, %l[label])		\
> > +		     : CC_OUT(z) (success),			\
> > +		       [ptr] "+m" (*_ptr),			\
> > +		       [old] "+a" (__old)			\
> > +		     : [new] ltype (__new)			\
> > +		     : "memory", "cc"				\
>
> IIUC, the "cc" clobber is unnecessary as CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y implies
> __GCC_ASM_FLAG_OUTPUTS__=y, i.e. CC_OUT() will resolve to "=@cc".

Yeah, even without that GCC always assumes 'cc' is clobbered due to hysterical
raisins.

> > +		     : label);					\
> > +	if (unlikely(!success))					\
> > +		*_old = __old;					\
> > +	likely(success); })
> > +
> > #else // !CONFIG_CC_HAS_ASM_GOTO_OUTPUT
>
> ...
>
> > +extern void __try_cmpxchg_user_wrong_size(void);
> > +
> > +#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
> > +	__typeof__(*(_ptr)) __ret;				\
>
> This should probably be a bool, the return from the lower level helpers is a
> bool that's true if the exchange succeeded.  Declaring the type of the target
> implies that they return the raw result, which is confusing.

Fair enough.

> > +	switch (sizeof(__ret)) {				\
> > +	case 1:	__ret = __try_cmpxchg_user_asm("b", "q",	\
> > +					       (_ptr), (_oldp),	\
> > +					       (_nval), _label);\
> > +		break;						\
> > +	case 2:	__ret = __try_cmpxchg_user_asm("w", "r",	\
> > +					       (_ptr), (_oldp),	\
> > +					       (_nval), _label);\
> > +		break;						\
> > +	case 4:	__ret = __try_cmpxchg_user_asm("l", "r",	\
> > +					       (_ptr), (_oldp),	\
> > +					       (_nval), _label);\
> > +		break;						\
> > +	case 8:	__ret = __try_cmpxchg_user_asm("q", "r",	\
> > +					       (_ptr), (_oldp),	\
> > +					       (_nval), _label);\
>
> Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
> to using it to atomically update guest PAE PTEs and LTR descriptors (yay).

:-)  I'm so trying to de-feature 32bit.

> Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
> less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
> address way ahead of time, so the access_ok() check can be omitted.
> Alternatively, KVM could add its own macro, but that seems a little silly.
> E.g. something like this, though I don't think this is correct (something is
> getting inverted somewhere and the assembly output is a nightmare):
>
>   /* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */
>   #define ___try_cmpxchg_user(_ptr, _oldp, _nval, _label)	({	\
>   	int ____ret = -EFAULT;						\
>   	__uaccess_begin_nospec();					\
>   	____ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label);	\
>   _label:								\
>   	__uaccess_end();						\
>   	____ret;							\
>   })

Works for me I suppose, but we really ought to keep usage of that in arch code.

> Lastly, assuming I get my crap working, mind if I post a variant (Cc'd to
> stable@) in the context of a KVM series?

Not at all.
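As an aside, a small standalone sketch of the flag-output form that CC_OUT()
expands to when __GCC_ASM_FLAG_OUTPUTS__ is available; the function is
hypothetical and kernel-independent, with "=@ccz" asking the compiler to read
ZF directly:

  /* Hypothetical: cmpxchg on a kernel pointer, returning ZF via a flag
   * output instead of a setz into a register.  x86 inline asm is assumed
   * to clobber the flags regardless, which is why the extra "cc" clobber
   * discussed above is harmless but redundant. */
  static inline bool cmpxchg_sets_zf(int *p, int old, int new)
  {
  	bool z;

  	asm volatile("lock cmpxchgl %[new], %[ptr]"
  		     : "=@ccz" (z), [ptr] "+m" (*p), "+a" (old)
  		     : [new] "r" (new)
  		     : "memory");
  	return z;
  }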
On Thu, Jan 27, 2022 at 06:36:19AM +0000, Sean Christopherson wrote:
> On Thu, Jan 27, 2022, Sean Christopherson wrote:
> > Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
> > to using it to atomically update guest PAE PTEs and LTR descriptors (yay).
> >
> > Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
> > less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
> > address way ahead of time, so the access_ok() check can be omitted.
> > Alternatively, KVM could add its own macro, but that seems a little silly.
> > E.g. something like this, though I don't think this is correct
>
> *sigh*
>
> Finally realized I forgot to add back the page offset after converting from
> guest page frame to host virtual address.  Anyways, this is what I ended up
> with, will test more tomorrow.

Looks about right :-) (famous last words etc..)
+Nick

On Thu, Jan 27, 2022, Peter Zijlstra wrote:
> On Thu, Jan 27, 2022 at 06:36:19AM +0000, Sean Christopherson wrote:
> > On Thu, Jan 27, 2022, Sean Christopherson wrote:
> > > Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
> > > to using it to atomically update guest PAE PTEs and LTR descriptors (yay).
> > >
> > > Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
> > > less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
> > > address way ahead of time, so the access_ok() check can be omitted.
> > > Alternatively, KVM could add its own macro, but that seems a little silly.
> > > E.g. something like this, though I don't think this is correct
> >
> > *sigh*
> >
> > Finally realized I forgot to add back the page offset after converting from
> > guest page frame to host virtual address.  Anyways, this is what I ended up
> > with, will test more tomorrow.
>
> Looks about right :-) (famous last words etc..)

And it was right, but clang-13 ruined the party :-/

clang barfs on asm goto with a "+m" input/output.  Change the "+m" to "=m" and
clang is happy.  Remove usage of the label, clang is happy.

I tried a bunch of different variants to see if anything would squeak by, but
clang found a way to die on everything I threw at it.

  $ clang --version
  Debian clang version 13.0.0-9+build1
  Target: x86_64-pc-linux-gnu
  Thread model: posix
  InstalledDir: /usr/bin

As written, with a named label param, clang yields:

  $ echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
  <stdin>:1:29: error: invalid operand in inline asm: '.long (${1:l}) - .'
  int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }
                              ^
  <stdin>:1:29: error: unknown token in expression
  <inline asm>:1:9: note: instantiated into assembly here
          .long () - .
                 ^
  2 errors generated.

While clang is perfectly happy switching "+m" to "=m":

  $ echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "=m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null

Referencing the label with a numbered param yields either the original error:

  $ echo 'int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
  <stdin>:1:29: error: invalid operand in inline asm: '.long (${1:l}) - .'
  int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }
                              ^
  <stdin>:1:29: error: unknown token in expression
  <inline asm>:1:9: note: instantiated into assembly here
          .long () - .
                 ^
  2 errors generated.

Bumping the param number (more below) yields a different error (I tried
defining tmp1, that didn't work :-) ).

  $ echo 'int foo(int *x) { asm goto (".long (%l2) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
  error: Undefined temporary symbol .Ltmp1
  1 error generated.

Regarding the param number, gcc also appears to have a goof with asm goto and
"+m", but bumping the param number in that case remedies its woes.

  $ echo 'int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null
  <stdin>: In function ‘foo’:
  <stdin>:1:19: error: invalid 'asm': '%l' operand isn't a label

  $ echo 'int foo(int *x) { asm goto (".long (%l2) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null

So my immediate question: how do we want to deal with this in the kernel?
Keeping in mind that I'd really like to send this to stable@ to fix the KVM
mess.

I can think of a few options that are varying degrees of gross.

 1) Use a more complex sequence for probing CC_HAS_ASM_GOTO_OUTPUT.

 2) Use an output-only "=m" operand.

 3) Use an input register param.

Option #1 has the obvious downside of the fancier asm goto for __get_user_asm()
and friends being collateral damage.  The biggest benefit is it'd reduce the
likelihood of someone else having to debug similar errors, which was quite
painful.

Options #2 and #3 are quite gross, but I _think_ would be ok since the sequence
is tagged as clobbering memory anyways?
On Thu, Jan 27, 2022 at 3:33 PM Sean Christopherson <seanjc@google.com> wrote:
>
> +Nick

(well, you asked the right person; you probably won't like the answer much though)

>
> On Thu, Jan 27, 2022, Peter Zijlstra wrote:
> > On Thu, Jan 27, 2022 at 06:36:19AM +0000, Sean Christopherson wrote:
> > > On Thu, Jan 27, 2022, Sean Christopherson wrote:
> > > > Doh, I should have specified that KVM needs 8-byte CMPXCHG on 32-bit kernels due
> > > > to using it to atomically update guest PAE PTEs and LTR descriptors (yay).
> > > >
> > > > Also, KVM's use case isn't a tight loop, how gross would it be to add a slightly
> > > > less unsafe version that does __uaccess_begin_nospec()?  KVM pre-checks the
> > > > address way ahead of time, so the access_ok() check can be omitted.
> > > > Alternatively, KVM could add its own macro, but that seems a little silly.
> > > > E.g. something like this, though I don't think this is correct
> > >
> > > *sigh*
> > >
> > > Finally realized I forgot to add back the page offset after converting from
> > > guest page frame to host virtual address.  Anyways, this is what I ended up
> > > with, will test more tomorrow.
> >
> > Looks about right :-) (famous last words etc..)
>
> And it was right, but clang-13 ruined the party :-/
>
> clang barfs on asm goto with a "+m" input/output.  Change the "+m" to "=m" and
> clang is happy.  Remove usage of the label, clang is happy.

Yep, sorry, we only recently noticed that this was broken.  I fixed this very
recently (over the holidays) in clang-14, and it is too risky and late to
backport to clang-13 IMO.
https://reviews.llvm.org/rG4edb9983cb8c8b850083ee5941468f5d0ef6fe2c

>
> I tried a bunch of different variants to see if anything would squeak by, but
> clang found a way to die on everything I threw at it.
>
>   $ clang --version
>   Debian clang version 13.0.0-9+build1
>   Target: x86_64-pc-linux-gnu
>   Thread model: posix
>   InstalledDir: /usr/bin
>
> As written, with a named label param, clang yields:
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
>   <stdin>:1:29: error: invalid operand in inline asm: '.long (${1:l}) - .'
>   int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }
>                               ^
>   <stdin>:1:29: error: unknown token in expression
>   <inline asm>:1:9: note: instantiated into assembly here
>           .long () - .
>                  ^
>   2 errors generated.
>
> While clang is perfectly happy switching "+m" to "=m":
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "=m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null

= constraints should work with older clang releases; + constraints are what was
broken (not generally, only when using asm goto with outputs), fixed in
clang-14.

>
> Referencing the label with a numbered param yields either the original error:
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
>   <stdin>:1:29: error: invalid operand in inline asm: '.long (${1:l}) - .'
>   int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }
>                               ^
>   <stdin>:1:29: error: unknown token in expression
>   <inline asm>:1:9: note: instantiated into assembly here
>           .long () - .
>                  ^
>   2 errors generated.

^ That case should not work in either compiler, more info below...

>
> Bumping the param number (more below) yields a different error (I tried
> defining tmp1, that didn't work :-) ).
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l2) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | clang -x c - -c -o /dev/null
>   error: Undefined temporary symbol .Ltmp1
>   1 error generated.

"Bumping the param number" will be required when using numbered references,
more info below...

>
> Regarding the param number, gcc also appears to have a goof with asm goto and
> "+m", but bumping the param number in that case remedies its woes.
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null
>   <stdin>: In function ‘foo’:
>   <stdin>:1:19: error: invalid 'asm': '%l' operand isn't a label
>
>   $ echo 'int foo(int *x) { asm goto (".long (%l2) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null

Right, so in fixing the above issue with tied outputs, I noticed that the GCC
implementation of asm goto with outputs had different behavior.  I changed
clang's implementation in clang-14 (same patch series) to match:
https://reviews.llvm.org/rG5c562f62a4ee15592f5d764d0710553a4b07a6f2
This comment summarizes most of my thoughts on the issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096#c7
Specifically the quote "It appears to me that the GCC decision here was
accidental, and that when pointed out, the bug was simply documented rather
than fixed."

Though I think compatibility between compilers is ultimately more important.
There are no standards bodies involved in these extensions, which is
simultaneously more flexible, yet can also lead to differences in
implementations like this.  Thanks for attending my TED talk.

>
> So my immediate question: how do we want to deal with this in the kernel?
> Keeping in mind that I'd really like to send this to stable@ to fix the KVM
> mess.
>
> I can think of a few options that are varying degrees of gross.
>
>  1) Use a more complex sequence for probing CC_HAS_ASM_GOTO_OUTPUT.
>
>  2) Use an output-only "=m" operand.
>
>  3) Use an input register param.
>
> Option #1 has the obvious downside of the fancier asm goto for __get_user_asm()
> and friends being collateral damage.  The biggest benefit is it'd reduce the
> likelihood of someone else having to debug similar errors, which was quite
> painful.

Right; I assumed we'd hit this at some point, as soon as people wanted to use
tied outputs with asm goto.  I'd rather have a different Kconfig test for
working tied outputs, and that all uses in the kernel used symbolic
references, which are much more readable and less confusing than the rules for
numbered references (which are bug prone IMO).

>
> Options #2 and #3 are quite gross, but I _think_ would be ok since the sequence
> is tagged as clobbering memory anyways?
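Concretely, a dedicated probe for working tied outputs could reuse the
one-liners from above; this is only a sketch, and the Kconfig symbol name is
hypothetical.  Per the discussion, the test succeeds with gcc and clang >= 14
and fails with clang <= 13:

  # Compile test for a CC_HAS_ASM_GOTO_TIED_OUTPUT-style Kconfig symbol:
  # asm goto with a tied "+m" output and a symbolic label reference.
  echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' \
      | $CC -x c - -c -o /dev/null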
On Thu, Jan 27, 2022, Nick Desaulniers wrote:
> On Thu, Jan 27, 2022 at 3:33 PM Sean Christopherson <seanjc@google.com> wrote:
> > Regarding the param number, gcc also appears to have a goof with asm goto and
> > "+m", but bumping the param number in that case remedies its woes.
> >
> >   $ echo 'int foo(int *x) { asm goto (".long (%l1) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null
> >   <stdin>: In function ‘foo’:
> >   <stdin>:1:19: error: invalid 'asm': '%l' operand isn't a label
> >
> >   $ echo 'int foo(int *x) { asm goto (".long (%l2) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; }' | gcc -x c - -c -o /dev/null
>
> Right, so in fixing the above issue with tied outputs, I noticed that the GCC
> implementation of asm goto with outputs had different behavior.  I changed
> clang's implementation in clang-14 (same patch series) to match:
> https://reviews.llvm.org/rG5c562f62a4ee15592f5d764d0710553a4b07a6f2
> This comment summarizes most of my thoughts on the issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096#c7
> Specifically the quote "It appears to me that the GCC decision here was
> accidental, and that when pointed out, the bug was simply documented rather
> than fixed."

I guess that makes a certain amount of sense, but wow that is subtle,
confusing, and potentially dangerous.  Looks like the hidden inputs are
numbered after all explicit inputs, otherwise there would be broken code left
and right, but it means that a typo like so:

  echo 'int foo(int x) { asm ("xor %0, %0; xor %2, %2" : "+a"(x) : "b"(x)); return x; return 0; }' | clang -x c - -c -o tmp.o

will compile cleanly.

> Though I think compatibility between compilers is ultimately more important.
> There are no standards bodies involved in these extensions, which is
> simultaneously more flexible, yet can also lead to differences in
> implementations like this.  Thanks for attending my TED talk.
>
> > So my immediate question: how do we want to deal with this in the kernel?
> > Keeping in mind that I'd really like to send this to stable@ to fix the KVM
> > mess.
> >
> > I can think of a few options that are varying degrees of gross.
> >
> >  1) Use a more complex sequence for probing CC_HAS_ASM_GOTO_OUTPUT.
> >
> >  2) Use an output-only "=m" operand.
> >
> >  3) Use an input register param.
> >
> > Option #1 has the obvious downside of the fancier asm goto for __get_user_asm()
> > and friends being collateral damage.  The biggest benefit is it'd reduce the
> > likelihood of someone else having to debug similar errors, which was quite
> > painful.
>
> Right; I assumed we'd hit this at some point, as soon as people wanted to use
> tied outputs with asm goto.  I'd rather have a different Kconfig test for
> working tied outputs,

Is it all tied outputs, or just "+m"?  E.g. using "+r" compiles cleanly.

  echo 'int foo(int x) { asm goto ("xor %0, %0;.long (%l[bar]) - .\n": "+r"(x) ::: bar); return x; bar: return 0; }' | clang -x c - -c -o /dev/null

It might be a moot point as I can't find any instances of "+"-anything in
conjunction with asm_volatile_goto, i.e. adding
CC_HAS_ASM_GOTO_OUTPUT_TIED_OUTPUTS won't create an inconsistency with existing
code.  Regardless, I like that idea.

> and that all uses in the kernel used symbolic references, which are much more
> readable and less confusing than the rules for numbered references (which are
> bug prone IMO).

100% agree, even though it takes me twice as long to write because I can never
remember the syntax :-)  Converting all existing usage is probably a fool's
errand, but adding a checkpatch rule would get us going in the right direction.
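To make the renumbering hazard concrete, a standalone sketch assuming gcc, or
clang 14+ now that its numbering matches: with a tied "+m" output, the hidden
input shifts the label's number, so the numbered and symbolic forms diverge:

  /* %0 is the "+m" output and %1 its hidden tied input, so the numbered
   * form must reference the label as %l2 even though only one operand is
   * visible in the source. */
  int numbered(int *x)
  {
  	asm goto (".long (%l2) - ." : "+m"(*x) ::: bar);
  	return *x;
  bar:
  	return 0;
  }

  /* The symbolic form is immune to hidden-input renumbering. */
  int symbolic(int *x)
  {
  	asm goto (".long (%l[bar]) - ." : "+m"(*x) ::: bar);
  	return *x;
  bar:
  	return 0;
  }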
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -342,6 +342,24 @@ do {							\
 	     : [umem] "m" (__m(addr))				\
 	     : : label)
 
+#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm_volatile_goto("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
+		     _ASM_EXTABLE_UA(1b, %l[label])		\
+		     : CC_OUT(z) (success),			\
+		       [ptr] "+m" (*_ptr),			\
+		       [old] "+a" (__old)			\
+		     : [new] ltype (__new)			\
+		     : "memory", "cc"				\
+		     : label);					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+
 #else // !CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
 #ifdef CONFIG_X86_32
@@ -407,6 +425,30 @@ do {							\
 	     : [umem] "m" (__m(addr)),				\
 	       "0" (err))
 
+#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \
+	int __err = 0;						\
+	bool success;						\
+	__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);	\
+	__typeof__(*(_ptr)) __old = *_old;			\
+	__typeof__(*(_ptr)) __new = (_new);			\
+	asm volatile("\n"					\
+		     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
+		     CC_SET(z)					\
+		     "2:\n"					\
+		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG,	\
+					   %[errout])		\
+		     : CC_OUT(z) (success),			\
+		       [errout] "+r" (__err),			\
+		       [ptr] "+m" (*_ptr),			\
+		       [old] "+a" (__old)			\
+		     : [new] ltype (__new)			\
+		     : "memory", "cc");				\
+	if (unlikely(__err))					\
+		goto label;					\
+	if (unlikely(!success))					\
+		*_old = __old;					\
+	likely(success); })
+
 #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
 /* FIXME: this hack is definitely wrong -AK */
@@ -501,6 +543,31 @@ do {							\
 } while (0)
 #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
+extern void __try_cmpxchg_user_wrong_size(void);
+
+#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({	\
+	__typeof__(*(_ptr)) __ret;				\
+	switch (sizeof(__ret)) {				\
+	case 1:	__ret = __try_cmpxchg_user_asm("b", "q",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 2:	__ret = __try_cmpxchg_user_asm("w", "r",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 4:	__ret = __try_cmpxchg_user_asm("l", "r",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	case 8:	__ret = __try_cmpxchg_user_asm("q", "r",	\
+					       (_ptr), (_oldp),	\
+					       (_nval), _label);\
+		break;						\
+	default: __try_cmpxchg_user_wrong_size();		\
+	}							\
+	__ret; })
+
 /*
  * We want the unsafe accessors to always be inlined and use
  * the error labels - thus the macro games.
Do try_cmpxchg() loops on userspace addresses.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/uaccess.h | 67 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)