Message ID | 20200327030918.7292-1-richard.henderson@linaro.org (mailing list archive)
---|---
State | New, archived
Series | [v2] arm64: Simplify __range_ok
Hi Richard,

On Thu, Mar 26, 2020 at 08:09:18PM -0700, Richard Henderson wrote:
> The general case is not quite as compact as the inline assembly,
> but with a sufficiently advanced compiler it is only 6 insns vs 5.
>
> The real improvement comes from assuming that limit is never tiny,
> and using __builtin_constant_p to make sure the constant folding
> does not go awry. This produces a 2 insn sequence even for older
> compilers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Thanks for this; it looks good to me, minor comments below. I believe
the const cases capture all realistic constant sizes, and handle those
correctly. We miss a couple of constant cases that we could catch (and
might want to do so for completeness), but either way I think this
looks good.

> ---
>
> Thanks for the v1 review, Mark.
>
> Text section size change:
>
> $ scripts/bloat-o-meter -t vmlinux{-orig,} | tail -1
>
> * When built with gcc-10 (master) + cmpti patch set:
>   Total: Before=12820824, After=12774708, chg -0.36%
>
> * When built with gcc-7.5.0-3ubuntu1~18.04:
>   Total: Before=13564038, After=13513258, chg -0.37%
>
> Changes in v2:
>   * Adjust initialization of iaddr.
>   * Use USER_DS as the constant limit and update commentary.
>
> r~
>
> ---
>  arch/arm64/include/asm/uaccess.h | 32 +++++++++++++-------------------
>  1 file changed, 13 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..04ef201e6179 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -60,7 +60,8 @@ static inline void set_fs(mm_segment_t fs)
>   */
>  static inline unsigned long __range_ok(const void __user *addr, unsigned long size)
>  {
> -	unsigned long ret, limit = current_thread_info()->addr_limit;
> +	unsigned long limit = current_thread_info()->addr_limit;
> +	unsigned long iaddr = (unsigned long)addr;
>
>  	/*
>  	 * Asynchronous I/O running in a kernel thread does not have the
> @@ -72,24 +73,17 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
>  	addr = untagged_addr(addr);
>
>  	__chk_user_ptr(addr);
> -	asm volatile(
> -	// A + B <= C + 1 for all A,B,C, in four easy steps:
> -	// 1: X = A + B; X' = X % 2^64
> -	"	adds	%0, %3, %2\n"
> -	// 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> -	"	csel	%1, xzr, %1, hi\n"
> -	// 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> -	//    to compensate for the carry flag being set in step 4. For
> -	//    X > 2^64, X' merely has to remain nonzero, which it does.
> -	"	csinv	%0, %0, xzr, cc\n"
> -	// 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> -	//    comes from the carry in being clear. Otherwise, we are
> -	//    testing X' - C == 0, subject to the previous adjustments.
> -	"	sbcs	xzr, %0, %1\n"
> -	"	cset	%0, ls\n"
> -	: "=&r" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
> -
> -	return ret;
> +
> +	/*
> +	 * The minimum value for limit is USER_DS, and quite a lot of
> +	 * range checks use sizeof(some_type). With both constants,
> +	 * we can rearrange the computation to avoid the need for
> +	 * 65-bit arithmetic.
> +	 */
> +	if (__builtin_constant_p(size) && size > 0 && size < USER_DS)
> +		return iaddr <= limit + 1 - size;

As above, I believe this can be:

	if (__builtin_constant_p(size) && size > 0 && size <= USER_DS + 1)
		return iaddr <= limit + 1 - size;

... but either way I think this is sound, and that's just for symmetry
of the test cases.

> +
> +	return (__uint128_t)iaddr + size <= (__uint128_t)limit + 1;

I believe this is obviously correct.

So with or without the change suggested above:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> }
>
> #define access_ok(addr, size) __range_ok(addr, size)
> --
> 2.17.1
>