Message ID | 20240201122216.2634007-2-aleksander.lobakin@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ice: add PFCP filter support | expand |
On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote: > From: Syed Nayyar Waris <syednwaris@gmail.com> > > The two new functions allow reading/writing values of length up to > BITS_PER_LONG bits at arbitrary position in the bitmap. > > The code was taken from "bitops: Introduce the for_each_set_clump macro" > by Syed Nayyar Waris with a number of changes and simplifications: > - instead of using roundup(), which adds an unnecessary dependency > on <linux/math.h>, we calculate space as BITS_PER_LONG-offset; > - indentation is reduced by not using else-clauses (suggested by > checkpatch for bitmap_get_value()); > - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read() > and bitmap_write(); > - some redundant computations are omitted. These functions feel like they should not be inline but are better off in lib/bitmap.c given their length. As far as I can tell, the header ends up being included indirectly almost everywhere, so just parsing these functions likey adds not just dependencies but also compile time. Arnd
On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote: > > From: Syed Nayyar Waris <syednwaris@gmail.com> > > > > The two new functions allow reading/writing values of length up to > > BITS_PER_LONG bits at arbitrary position in the bitmap. > > > > The code was taken from "bitops: Introduce the for_each_set_clump macro" > > by Syed Nayyar Waris with a number of changes and simplifications: > > - instead of using roundup(), which adds an unnecessary dependency > > on <linux/math.h>, we calculate space as BITS_PER_LONG-offset; > > - indentation is reduced by not using else-clauses (suggested by > > checkpatch for bitmap_get_value()); > > - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read() > > and bitmap_write(); > > - some redundant computations are omitted. > > These functions feel like they should not be inline but are > better off in lib/bitmap.c given their length. > > As far as I can tell, the header ends up being included > indirectly almost everywhere, so just parsing these functions > likey adds not just dependencies but also compile time. > > Arnd Removing particular functions from a header to reduce compilation time does not really scale. Do we know this case has a noticeable impact on the compilation time? If yes, maybe we need to tackle this problem in a different way (e.g. reduce the number of dependencies on it)?
On Thu, Feb 1, 2024, at 14:45, Alexander Potapenko wrote: > On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote: >> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote: >> >> As far as I can tell, the header ends up being included >> indirectly almost everywhere, so just parsing these functions >> likey adds not just dependencies but also compile time. >> > > Removing particular functions from a header to reduce compilation time > does not really scale. > Do we know this case has a noticeable impact on the compilation time? > If yes, maybe we need to tackle this problem in a different way (e.g. > reduce the number of dependencies on it)? Cleaning up the header dependencies is definitely possible in theory, and there are other places we could start this, but it's also a multi-year effort that several people have tried without much success. All I'm asking here is to not make it worse by adding this one without need. If the function is not normally inlined anyway, there is no benefit to having it in the header. Arnd
From: Arnd Bergmann <arnd@arndb.de> Date: Thu, 01 Feb 2024 14:23:33 +0100 > On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote: >> From: Syed Nayyar Waris <syednwaris@gmail.com> >> >> The two new functions allow reading/writing values of length up to >> BITS_PER_LONG bits at arbitrary position in the bitmap. >> >> The code was taken from "bitops: Introduce the for_each_set_clump macro" >> by Syed Nayyar Waris with a number of changes and simplifications: >> - instead of using roundup(), which adds an unnecessary dependency >> on <linux/math.h>, we calculate space as BITS_PER_LONG-offset; >> - indentation is reduced by not using else-clauses (suggested by >> checkpatch for bitmap_get_value()); >> - bitmap_get_value()/bitmap_set_value() are renamed to bitmap_read() >> and bitmap_write(); >> - some redundant computations are omitted. > > These functions feel like they should not be inline but are > better off in lib/bitmap.c given their length. When their arguments are compile-time constants, they got optimized well. They're also used on hotpath, so making them external could hurt performance + taking the first sentence into account, making them external will hurt the performance even more, 'cause they won't be then as optimized by the compiler as they are now. > > As far as I can tell, the header ends up being included > indirectly almost everywhere, so just parsing these functions > likey adds not just dependencies but also compile time. > > Arnd Thanks, Olek
On Thu, Feb 01, 2024 at 03:02:50PM +0100, Arnd Bergmann wrote: > On Thu, Feb 1, 2024, at 14:45, Alexander Potapenko wrote: > > On Thu, Feb 1, 2024 at 2:23 PM Arnd Bergmann <arnd@arndb.de> wrote: > >> On Thu, Feb 1, 2024, at 13:21, Alexander Lobakin wrote: > >> > >> As far as I can tell, the header ends up being included > >> indirectly almost everywhere, so just parsing these functions > >> likey adds not just dependencies but also compile time. > >> > > > > Removing particular functions from a header to reduce compilation time > > does not really scale. > > Do we know this case has a noticeable impact on the compilation time? > > If yes, maybe we need to tackle this problem in a different way (e.g. > > reduce the number of dependencies on it)? > > Cleaning up the header dependencies is definitely possible in > theory, and there are other places we could start this, but > it's also a multi-year effort that several people have tried > without much success. > > All I'm asking here is to not make it worse by adding this > one without need. If the function is not normally inlined > anyway, there is no benefit to having it in the header. > > Arnd Hi Arnd, I think Alexander has shown that the functions are normally inlined. If for some target that doesn't hold, we'd use __always_inline. They are very lightweight by nature - one or at max two word fetches followed by some shifting. We spent quite some cycles making sure that the generated code looks efficient, at least not worse than the existing bitmap_{get,set}_value8(), which is a special case of the bitmap_{read,write}. I agree that bitmap header is overwhelmed (like many other kernel headers), and I'm working on unloading it. I checked allyesconfig build time before and after this patch, and I found no difference for me. So if you're concerned about compilation time, this patch doesn't make things worse in this department. With all that, Alexander, can you please double-check that the functions get inlined, and if so: Signed-off-by: Yury Norov <yury.norov@gmail.com>
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 99451431e4d6..7ca0379be8c1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -79,6 +79,10 @@ struct device; * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst * bitmap_get_value8(map, start) Get 8bit value from map at start * bitmap_set_value8(map, value, start) Set 8bit value to map at start + * bitmap_read(map, start, nbits) Read an nbits-sized value from + * map at start + * bitmap_write(map, value, start, nbits) Write an nbits-sized value to + * map at start * * Note, bitmap_zero() and bitmap_fill() operate over the region of * unsigned longs, that is, bits behind bitmap till the unsigned long @@ -636,6 +640,79 @@ static inline void bitmap_set_value8(unsigned long *map, unsigned long value, map[index] |= value << offset; } +/** + * bitmap_read - read a value of n-bits from the memory region + * @map: address to the bitmap memory region + * @start: bit offset of the n-bit value + * @nbits: size of value in bits, nonzero, up to BITS_PER_LONG + * + * Returns: value of @nbits bits located at the @start bit offset within the + * @map memory region. For @nbits = 0 and @nbits > BITS_PER_LONG the return + * value is undefined. + */ +static inline unsigned long bitmap_read(const unsigned long *map, + unsigned long start, + unsigned long nbits) +{ + size_t index = BIT_WORD(start); + unsigned long offset = start % BITS_PER_LONG; + unsigned long space = BITS_PER_LONG - offset; + unsigned long value_low, value_high; + + if (unlikely(!nbits || nbits > BITS_PER_LONG)) + return 0; + + if (space >= nbits) + return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits); + + value_low = map[index] & BITMAP_FIRST_WORD_MASK(start); + value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits); + return (value_low >> offset) | (value_high << space); +} + +/** + * bitmap_write - write n-bit value within a memory region + * @map: address to the bitmap memory region + * @value: value to write, clamped to nbits + * @start: bit offset of the n-bit value + * @nbits: size of value in bits, nonzero, up to BITS_PER_LONG. + * + * bitmap_write() behaves as-if implemented as @nbits calls of __assign_bit(), + * i.e. bits beyond @nbits are ignored: + * + * for (bit = 0; bit < nbits; bit++) + * __assign_bit(start + bit, bitmap, val & BIT(bit)); + * + * For @nbits == 0 and @nbits > BITS_PER_LONG no writes are performed. + */ +static inline void bitmap_write(unsigned long *map, unsigned long value, + unsigned long start, unsigned long nbits) +{ + size_t index; + unsigned long offset; + unsigned long space; + unsigned long mask; + bool fit; + + if (unlikely(!nbits || nbits > BITS_PER_LONG)) + return; + + mask = BITMAP_LAST_WORD_MASK(nbits); + value &= mask; + offset = start % BITS_PER_LONG; + space = BITS_PER_LONG - offset; + fit = space >= nbits; + index = BIT_WORD(start); + + map[index] &= (fit ? (~(mask << offset)) : ~BITMAP_FIRST_WORD_MASK(start)); + map[index] |= value << offset; + if (fit) + return; + + map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits); + map[index + 1] |= (value >> space); +} + #endif /* __ASSEMBLY__ */ #endif /* __LINUX_BITMAP_H */