Message ID | 20200321051352.16484-2-richard.henderson@linaro.org (mailing list archive)
---|---
State | New, archived
Series | aarch64: Simplify __range_ok
On Fri, Mar 20, 2020 at 10:13:52PM -0700, Richard Henderson wrote:
> The general case is not quite as compact as the inline assembly,
> but with a sufficiently advanced compiler it is only 6 insns vs 5.
>
> The real improvement comes from assuming that limit is never tiny,
> and using __builtin_constant_p to make sure the constant folding
> does not go awry. This produces a 2 insn sequence even for older
> compilers.

Neat; thanks for putting this together!

Do you happen to have numbers for the impact on a defconfig Image size
(or vmlinux .text size)?

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  arch/arm64/include/asm/uaccess.h | 31 +++++++++++++------------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..683727696dc3 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -60,7 +60,8 @@ static inline void set_fs(mm_segment_t fs)
>   */
>  static inline unsigned long __range_ok(const void __user *addr, unsigned long size)
>  {
> -	unsigned long ret, limit = current_thread_info()->addr_limit;
> +	unsigned long limit = current_thread_info()->addr_limit;
> +	unsigned long iaddr;

Trivial: could we move the initialisation here, please?

>
>  	/*
>  	 * Asynchronous I/O running in a kernel thread does not have the
> @@ -72,24 +73,18 @@ static inline unsigned long __range_ok(const void __user *addr, unsigned long si
>  	addr = untagged_addr(addr);
>
>  	__chk_user_ptr(addr);
> -	asm volatile(
> -	// A + B <= C + 1 for all A,B,C, in four easy steps:
> -	// 1: X = A + B; X' = X % 2^64
> -	"	adds	%0, %3, %2\n"
> -	// 2: Set C = 0 if X > 2^64, to guarantee X' > C in step 4
> -	"	csel	%1, xzr, %1, hi\n"
> -	// 3: Set X' = ~0 if X >= 2^64. For X == 2^64, this decrements X'
> -	//    to compensate for the carry flag being set in step 4. For
> -	//    X > 2^64, X' merely has to remain nonzero, which it does.
> -	"	csinv	%0, %0, xzr, cc\n"
> -	// 4: For X < 2^64, this gives us X' - C - 1 <= 0, where the -1
> -	//    comes from the carry in being clear. Otherwise, we are
> -	//    testing X' - C == 0, subject to the previous adjustments.
> -	"	sbcs	xzr, %0, %1\n"
> -	"	cset	%0, ls\n"
> -	: "=&r" (ret), "+r" (limit) : "Ir" (size), "0" (addr) : "cc");
>
> -	return ret;
> +	/*
> +	 * Quite a lot of range checks use sizeof(some_type), and are
> +	 * therefore constant. If we can assume that limit is never unusably
> +	 * small, then we can rearrange the computation to avoid the need for
> +	 * 65-bit arithmetic. Arbitrary choice for size limit of 1MiB.
> +	 */
> +	iaddr = (unsigned long)addr;
> +	if (__builtin_constant_p(size) && size > 0 && size < 0x100000)
> +		return iaddr <= limit + 1 - size;

The limit should be either USER_DS or KERNEL_DS, where USER_DS is
smaller than KERNEL_DS, so we could derive a less arbitrary bound from
USER_DS.

Thanks,
Mark.

> +
> +	return (__uint128_t)iaddr + size <= (__uint128_t)limit + 1;
> }
>
> #define access_ok(addr, size) __range_ok(addr, size)
> --
> 2.20.1
>