Message ID | 20210218040512.709186-7-yury.norov@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lib/find_bit: fast path for small bitmaps | expand |
On 18/02/2021 05.05, Yury Norov wrote: > Many algorithms become simpler if they are passed with relatively small > input values. One example is bitmap operations when the whole bitmap fits > into one word. To implement such simplifications, linux/bitmap.h declares > small_const_nbits() macro. > > Other subsystems may also benefit from optimizations of this sort, like > find_bit API in the following patches. So it looks helpful to generalize > the macro and extend it's visibility. Perhaps, but SMALL_CONST is too generic a name, it needs to keep "bits" somewhere in there. So why not just keep it at small_const_nbits? > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > include/asm-generic/bitsperlong.h | 2 ++ > include/linux/bitmap.h | 33 ++++++++++++++----------------- > 2 files changed, 17 insertions(+), 18 deletions(-) > > diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h > index 3905c1c93dc2..0eeb77544f1d 100644 > --- a/include/asm-generic/bitsperlong.h > +++ b/include/asm-generic/bitsperlong.h > @@ -23,4 +23,6 @@ > #define BITS_PER_LONG_LONG 64 > #endif > > +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG) > + > #endif /* __ASM_GENERIC_BITS_PER_LONG */ > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > index adf7bd9f0467..e89f1dace846 100644 > --- a/include/linux/bitmap.h > +++ b/include/linux/bitmap.h > @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, > * so make such users (should any ever turn up) call the out-of-line > * versions. > */ > -#define small_const_nbits(nbits) \ > - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) > - > static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) > { > unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); > @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, > static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, > const unsigned long *src2, unsigned int nbits) > { > - if (small_const_nbits(nbits)) > + if (SMALL_CONST(nbits - 1)) Please don't force most users to be changed to something less readable. What's wrong with just keeping small_const_nbits() the way it is, avoiding all this churn and keeping the readability? At a quick reading, one of the very few places where you end up not passing nbits-1 but just nbits is this unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset) { + if (SMALL_CONST(size)) { + unsigned long val = *(const unsigned long *)addr; + + if (unlikely(offset >= size)) + return size; which is a regression, for much the same reason the nbits==0 case was excluded from small_const_nbits in the first place. If size is 0, we used to just return 0 early in _find_next_bit. But you've introduced a dereference of addr before that check is now done, which is theoretically an oops. If find_next_zero_bit_le cannot handle nbits==BITS_PER_LONG efficiently but requires one off-limits bit position, fine, so be it, add an extra "small_const_nbits() && nbits < BITS_PER_LONG" (and a comment). Rasmus
On Fri, Feb 19, 2021 at 12:07:27AM +0100, Rasmus Villemoes wrote: > On 18/02/2021 05.05, Yury Norov wrote: > > Many algorithms become simpler if they are passed with relatively small > > input values. One example is bitmap operations when the whole bitmap fits > > into one word. To implement such simplifications, linux/bitmap.h declares > > small_const_nbits() macro. > > > > Other subsystems may also benefit from optimizations of this sort, like > > find_bit API in the following patches. So it looks helpful to generalize > > the macro and extend it's visibility. > > Perhaps, but SMALL_CONST is too generic a name, it needs to keep "bits" > somewhere in there. So why not just keep it at small_const_nbits? > > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > > --- > > include/asm-generic/bitsperlong.h | 2 ++ > > include/linux/bitmap.h | 33 ++++++++++++++----------------- > > 2 files changed, 17 insertions(+), 18 deletions(-) > > > > diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h > > index 3905c1c93dc2..0eeb77544f1d 100644 > > --- a/include/asm-generic/bitsperlong.h > > +++ b/include/asm-generic/bitsperlong.h > > @@ -23,4 +23,6 @@ > > #define BITS_PER_LONG_LONG 64 > > #endif > > > > +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG) > > + > > #endif /* __ASM_GENERIC_BITS_PER_LONG */ > > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > > index adf7bd9f0467..e89f1dace846 100644 > > --- a/include/linux/bitmap.h > > +++ b/include/linux/bitmap.h > > @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, > > * so make such users (should any ever turn up) call the out-of-line > > * versions. > > */ > > -#define small_const_nbits(nbits) \ > > - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) > > - > > static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) > > { > > unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); > > @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, > > static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, > > const unsigned long *src2, unsigned int nbits) > > { > > - if (small_const_nbits(nbits)) > > + if (SMALL_CONST(nbits - 1)) > > Please don't force most users to be changed to something less readable. > What's wrong with just keeping small_const_nbits() the way it is, > avoiding all this churn and keeping the readability? The wrong thing is that it's defined in include/linux/bitmap.h, and I cannot use it in include/asm-generic/bitops/find.h, so I have to either move it to a separate header, or generalize and share with find.h and other users this way. I prefer the latter option, thougt it's more verbose. > At a quick reading, one of the very few places where you end up not > passing nbits-1 but just nbits is this > > unsigned long find_next_zero_bit_le(const void *addr, unsigned > long size, unsigned long offset) > { > + if (SMALL_CONST(size)) { > + unsigned long val = *(const unsigned long *)addr; > + > + if (unlikely(offset >= size)) > + return size; > > which is a regression, for much the same reason the nbits==0 case was > excluded from small_const_nbits in the first place. If size is 0, we > used to just return 0 early in _find_next_bit. But you've introduced a > dereference of addr before that check is now done, which is > theoretically an oops. > > If find_next_zero_bit_le cannot handle nbits==BITS_PER_LONG efficiently > but requires one off-limits bit position, fine, so be it, add an extra > "small_const_nbits() && nbits < BITS_PER_LONG" (and a comment). Sure, it's my bad. I need to fix it. What would you prefer, moving the small_const_nbits() to a separate header, or introduce something like #define small_const_size(n) SMALL_CONST((n) - 1) to avoid mistakes like this? I think small_const_size() is better because it might be potentially useful for someone else, but I can go either way.
On 12/03/2021 06.28, Yury Norov wrote: > On Fri, Feb 19, 2021 at 12:07:27AM +0100, Rasmus Villemoes wrote: >> On 18/02/2021 05.05, Yury Norov wrote: >>> Many algorithms become simpler if they are passed with relatively small >>> input values. One example is bitmap operations when the whole bitmap fits >>> into one word. To implement such simplifications, linux/bitmap.h declares >>> small_const_nbits() macro. >>> >>> Other subsystems may also benefit from optimizations of this sort, like >>> find_bit API in the following patches. So it looks helpful to generalize >>> the macro and extend it's visibility. >> >> Perhaps, but SMALL_CONST is too generic a name, it needs to keep "bits" >> somewhere in there. So why not just keep it at small_const_nbits? >> >>> Signed-off-by: Yury Norov <yury.norov@gmail.com> >>> --- >>> include/asm-generic/bitsperlong.h | 2 ++ >>> include/linux/bitmap.h | 33 ++++++++++++++----------------- >>> 2 files changed, 17 insertions(+), 18 deletions(-) >>> >>> diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h >>> index 3905c1c93dc2..0eeb77544f1d 100644 >>> --- a/include/asm-generic/bitsperlong.h >>> +++ b/include/asm-generic/bitsperlong.h >>> @@ -23,4 +23,6 @@ >>> #define BITS_PER_LONG_LONG 64 >>> #endif >>> >>> +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG) >>> + >>> #endif /* __ASM_GENERIC_BITS_PER_LONG */ >>> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h >>> index adf7bd9f0467..e89f1dace846 100644 >>> --- a/include/linux/bitmap.h >>> +++ b/include/linux/bitmap.h >>> @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, >>> * so make such users (should any ever turn up) call the out-of-line >>> * versions. >>> */ >>> -#define small_const_nbits(nbits) \ >>> - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) >>> - >>> static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) >>> { >>> unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); >>> @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, >>> static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, >>> const unsigned long *src2, unsigned int nbits) >>> { >>> - if (small_const_nbits(nbits)) >>> + if (SMALL_CONST(nbits - 1)) >> >> Please don't force most users to be changed to something less readable. >> What's wrong with just keeping small_const_nbits() the way it is, >> avoiding all this churn and keeping the readability? > > The wrong thing is that it's defined in include/linux/bitmap.h, and I > cannot use it in include/asm-generic/bitops/find.h, so I have to either > move it to a separate header, or generalize and share with find.h and > other users this way. I prefer the latter option, thougt it's more > verbose. The logical place would be the same place the BITS_PER_LONG macro is defined, no? No need to introduce a new header for that, and all current users of small_const_nbits() must already (very possibly indirectly) include asm-generic/bitsperlong.h. I do prefer to keep both the name small_const_nbits() and its current semantics, which, although not currently spelled out that way anywhere, is "is BITMAP_SIZE(nbits) known at compile time and equal to 1", which is precisely what allows the static inlines to unconditionally dereference the pointer (that's the "exclude the 0 case") and just deal with that one word. I don't like either SMALL_CONST or small_const_size, because nothing in there says it has anything to do with bit ops. As I said, if you have some special place that for some reason cannot handle nbits==BITS_PER_LONG, then just add that as an additional constraint with a comment why. Rasmus
On Fri, Mar 12, 2021 at 10:12:22AM +0100, Rasmus Villemoes wrote: > On 12/03/2021 06.28, Yury Norov wrote: > > On Fri, Feb 19, 2021 at 12:07:27AM +0100, Rasmus Villemoes wrote: > >> On 18/02/2021 05.05, Yury Norov wrote: > >>> Many algorithms become simpler if they are passed with relatively small > >>> input values. One example is bitmap operations when the whole bitmap fits > >>> into one word. To implement such simplifications, linux/bitmap.h declares > >>> small_const_nbits() macro. > >>> > >>> Other subsystems may also benefit from optimizations of this sort, like > >>> find_bit API in the following patches. So it looks helpful to generalize > >>> the macro and extend it's visibility. > >> > >> Perhaps, but SMALL_CONST is too generic a name, it needs to keep "bits" > >> somewhere in there. So why not just keep it at small_const_nbits? > >> > >>> Signed-off-by: Yury Norov <yury.norov@gmail.com> > >>> --- > >>> include/asm-generic/bitsperlong.h | 2 ++ > >>> include/linux/bitmap.h | 33 ++++++++++++++----------------- > >>> 2 files changed, 17 insertions(+), 18 deletions(-) > >>> > >>> diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h > >>> index 3905c1c93dc2..0eeb77544f1d 100644 > >>> --- a/include/asm-generic/bitsperlong.h > >>> +++ b/include/asm-generic/bitsperlong.h > >>> @@ -23,4 +23,6 @@ > >>> #define BITS_PER_LONG_LONG 64 > >>> #endif > >>> > >>> +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG) > >>> + > >>> #endif /* __ASM_GENERIC_BITS_PER_LONG */ > >>> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > >>> index adf7bd9f0467..e89f1dace846 100644 > >>> --- a/include/linux/bitmap.h > >>> +++ b/include/linux/bitmap.h > >>> @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, > >>> * so make such users (should any ever turn up) call the out-of-line > >>> * versions. > >>> */ > >>> -#define small_const_nbits(nbits) \ > >>> - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) > >>> - > >>> static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) > >>> { > >>> unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); > >>> @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, > >>> static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, > >>> const unsigned long *src2, unsigned int nbits) > >>> { > >>> - if (small_const_nbits(nbits)) > >>> + if (SMALL_CONST(nbits - 1)) > >> > >> Please don't force most users to be changed to something less readable. > >> What's wrong with just keeping small_const_nbits() the way it is, > >> avoiding all this churn and keeping the readability? > > > > The wrong thing is that it's defined in include/linux/bitmap.h, and I > > cannot use it in include/asm-generic/bitops/find.h, so I have to either > > move it to a separate header, or generalize and share with find.h and > > other users this way. I prefer the latter option, thougt it's more > > verbose. > > The logical place would be the same place the BITS_PER_LONG macro is > defined, no? Yes. This where I placed SMALL_CONST() in current version. > No need to introduce a new header for that, and all current > users of small_const_nbits() must already (very possibly indirectly) > include asm-generic/bitsperlong.h. > > I do prefer to keep both the name small_const_nbits() and its current > semantics, which, although not currently spelled out that way anywhere, > is "is BITMAP_SIZE(nbits) known at compile time and equal to 1", which > is precisely what allows the static inlines to unconditionally > dereference the pointer (that's the "exclude the 0 case") and just deal > with that one word. > > I don't like either SMALL_CONST or small_const_size, because nothing in > there says it has anything to do with bit ops. As I said, if you have > some special place that for some reason cannot handle > nbits==BITS_PER_LONG, then just add that as an additional constraint > with a comment why. OK, I'll move small_const_nbits() to the bitsperlong header and resubmit shortly. My concern still is that nbits is too specific for bitsperlong.h, but if you're good with it, I'm OK as well.
diff --git a/include/asm-generic/bitsperlong.h b/include/asm-generic/bitsperlong.h index 3905c1c93dc2..0eeb77544f1d 100644 --- a/include/asm-generic/bitsperlong.h +++ b/include/asm-generic/bitsperlong.h @@ -23,4 +23,6 @@ #define BITS_PER_LONG_LONG 64 #endif +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < BITS_PER_LONG) + #endif /* __ASM_GENERIC_BITS_PER_LONG */ diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index adf7bd9f0467..e89f1dace846 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf, * so make such users (should any ever turn up) call the out-of-line * versions. */ -#define small_const_nbits(nbits) \ - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0) - static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return (*dst = *src1 & *src2 & BITS_FIRST(nbits - 1)) != 0; return __bitmap_and(dst, src1, src2, nbits); } @@ -286,7 +283,7 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, static inline void bitmap_or(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = *src1 | *src2; else __bitmap_or(dst, src1, src2, nbits); @@ -295,7 +292,7 @@ static inline void bitmap_or(unsigned long *dst, const unsigned long *src1, static inline void bitmap_xor(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = *src1 ^ *src2; else __bitmap_xor(dst, src1, src2, nbits); @@ -304,7 +301,7 @@ static inline void bitmap_xor(unsigned long *dst, const unsigned long *src1, static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return (*dst = *src1 & ~(*src2) & BITS_FIRST(nbits - 1)) != 0; return __bitmap_andnot(dst, src1, src2, nbits); } @@ -312,7 +309,7 @@ static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1, static inline void bitmap_complement(unsigned long *dst, const unsigned long *src, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = ~(*src); else __bitmap_complement(dst, src, nbits); @@ -328,7 +325,7 @@ static inline void bitmap_complement(unsigned long *dst, const unsigned long *sr static inline int bitmap_equal(const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return !((*src1 ^ *src2) & BITS_FIRST(nbits - 1)); if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) && IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT)) @@ -350,7 +347,7 @@ static inline bool bitmap_or_equal(const unsigned long *src1, const unsigned long *src3, unsigned int nbits) { - if (!small_const_nbits(nbits)) + if (!SMALL_CONST(nbits - 1)) return __bitmap_or_equal(src1, src2, src3, nbits); return !(((*src1 | *src2) ^ *src3) & BITS_FIRST(nbits - 1)); @@ -359,7 +356,7 @@ static inline bool bitmap_or_equal(const unsigned long *src1, static inline int bitmap_intersects(const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return ((*src1 & *src2) & BITS_FIRST(nbits - 1)) != 0; else return __bitmap_intersects(src1, src2, nbits); @@ -368,7 +365,7 @@ static inline int bitmap_intersects(const unsigned long *src1, static inline int bitmap_subset(const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return !((*src1 & ~(*src2)) & BITS_FIRST(nbits - 1)); else return __bitmap_subset(src1, src2, nbits); @@ -376,7 +373,7 @@ static inline int bitmap_subset(const unsigned long *src1, static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return !(*src & BITS_FIRST(nbits - 1)); return find_first_bit(src, nbits) == nbits; @@ -384,7 +381,7 @@ static inline bool bitmap_empty(const unsigned long *src, unsigned nbits) static inline bool bitmap_full(const unsigned long *src, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return !(~(*src) & BITS_FIRST(nbits - 1)); return find_first_zero_bit(src, nbits) == nbits; @@ -392,7 +389,7 @@ static inline bool bitmap_full(const unsigned long *src, unsigned int nbits) static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) return hweight_long(*src & BITS_FIRST(nbits - 1)); return __bitmap_weight(src, nbits); } @@ -428,7 +425,7 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start, static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = (*src & BITS_FIRST(nbits - 1)) >> shift; else __bitmap_shift_right(dst, src, shift, nbits); @@ -437,7 +434,7 @@ static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *s static inline void bitmap_shift_left(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = (*src << shift) & BITS_FIRST(nbits - 1); else __bitmap_shift_left(dst, src, shift, nbits); @@ -449,7 +446,7 @@ static inline void bitmap_replace(unsigned long *dst, const unsigned long *mask, unsigned int nbits) { - if (small_const_nbits(nbits)) + if (SMALL_CONST(nbits - 1)) *dst = (*old & ~(*mask)) | (*new & *mask); else __bitmap_replace(dst, old, new, mask, nbits);
Many algorithms become simpler if they are passed with relatively small input values. One example is bitmap operations when the whole bitmap fits into one word. To implement such simplifications, linux/bitmap.h declares small_const_nbits() macro. Other subsystems may also benefit from optimizations of this sort, like find_bit API in the following patches. So it looks helpful to generalize the macro and extend it's visibility. Signed-off-by: Yury Norov <yury.norov@gmail.com> --- include/asm-generic/bitsperlong.h | 2 ++ include/linux/bitmap.h | 33 ++++++++++++++----------------- 2 files changed, 17 insertions(+), 18 deletions(-)