Message ID | 20240710065616.1060803-1-yang.yang@vivo.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v6] sbitmap: fix io hung due to race on sbitmap_word::cleared |
On 7/9/24 11:56 PM, Yang Yang wrote:
> +	spin_lock_irqsave(&map->swap_lock, flags);

Please use guard(spinlock_irqsave) in new code instead of
spin_lock_irqsave() + goto out_unlock + spin_unlock_irqrestore(). That
will make this function significantly easier to read and to maintain.

> +
> +	if (!map->cleared) {
> +		if (depth > 0) {
> +			word_mask = (~0UL) >> (BITS_PER_LONG - depth);
> +			/*
> +			 * The current behavior is to always retry after moving
> +			 * ->cleared to word, and we change it to retry in case
> +			 * of any free bits. To avoid dead loop, we need to take

What is a "dead loop"? Did you perhaps want to write "infinite loop"?

> +			 * wrap & alloc_hint into account. Without this, a soft
> +			 * lockup was detected in our test environment.

Source code comments should not refer to "our test environment". Code
that is intended for upstream inclusion should work for all setups.

> +			 */
> +			if (!wrap && alloc_hint)
> +				word_mask &= ~((1UL << alloc_hint) - 1);

Above I see an open-coded __clear_bit() operation. Has it been
considered to use __clear_bit() instead of open-coding it?

Thanks,

Bart.
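For reference, a rough sketch of what a guard()-based version of
sbitmap_deferred_clear() could look like, assuming the scoped-lock
guard() helpers from <linux/cleanup.h>/<linux/spinlock.h>. This only
illustrates the suggestion and is not the submitted patch:

static inline bool sbitmap_deferred_clear(struct sbitmap_word *map,
		unsigned int depth, unsigned int alloc_hint, bool wrap)
{
	unsigned long mask, word_mask;

	/* The lock is released automatically when the guard goes out of scope. */
	guard(spinlock_irqsave)(&map->swap_lock);

	if (!map->cleared) {
		if (depth == 0)
			return false;

		word_mask = (~0UL) >> (BITS_PER_LONG - depth);
		if (!wrap && alloc_hint)
			word_mask &= ~((1UL << alloc_hint) - 1);

		/* Retry only if the word still has free bits in range. */
		return (READ_ONCE(map->word) & word_mask) != word_mask;
	}

	/* Grab the deferred bits and fold them back into ->word. */
	mask = xchg(&map->cleared, 0);
	atomic_long_andnot(mask, (atomic_long_t *)&map->word);
	return true;
}

With guard(), the out_unlock label, the flags variable and the explicit
unlock calls all disappear, which is the readability gain being pointed
out above.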
On 7/9/24 11:56 PM, Yang Yang wrote:
> +	/**
> +	 * @swap_lock: Held while swapping word <-> cleared
> +	 */
> +	spinlock_t swap_lock;

Why is only swapping 'word' with 'cleared' protected by the spinlock?
If all 'cleared' changes would be protected by this spinlock then
that would allow to eliminate the expensive xchg() call from
sbitmap_deferred_clear().

Thanks,

Bart.
On 2024/7/11 0:41, Bart Van Assche wrote:
> On 7/9/24 11:56 PM, Yang Yang wrote:
>> +	spin_lock_irqsave(&map->swap_lock, flags);
>
> Please use guard(spinlock_irqsave) in new code instead of
> spin_lock_irqsave() + goto out_unlock + spin_unlock_irqrestore().
> That will make this function significantly easier to read and to
> maintain.

Got it.

>
>> +
>> +	if (!map->cleared) {
>> +		if (depth > 0) {
>> +			word_mask = (~0UL) >> (BITS_PER_LONG - depth);
>> +			/*
>> +			 * The current behavior is to always retry after moving
>> +			 * ->cleared to word, and we change it to retry in case
>> +			 * of any free bits. To avoid dead loop, we need to take
>
> What is a "dead loop"? Did you perhaps want to write "infinite loop"?

Yeah. I suppose so.

>
>> +			 * wrap & alloc_hint into account. Without this, a soft
>> +			 * lockup was detected in our test environment.
>
> Source code comments should not refer to "our test environment". Code
> that is intended for upstream inclusion should work for all setups.

Got it.

>
>> +			 */
>> +			if (!wrap && alloc_hint)
>> +				word_mask &= ~((1UL << alloc_hint) - 1);
>
> Above I see an open-coded __clear_bit() operation. Has it been
> considered to use __clear_bit() instead of open-coding it?

It is meant to reset multiple bits to zero, but __clear_bit() is only
capable of resetting one bit to zero.

Thanks.

>
> Thanks,
>
> Bart.
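To make the distinction concrete, here is a small illustrative snippet
(the alloc_hint value is made up for the example):

	unsigned long word_mask = ~0UL;
	unsigned int alloc_hint = 3;	/* example value only */

	/* Open-coded mask: clears every bit below alloc_hint (bits 0-2) at once. */
	word_mask &= ~((1UL << alloc_hint) - 1);

	/* __clear_bit() clears exactly one bit per call, e.g. only bit 2. */
	__clear_bit(2, &word_mask);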
On 2024/7/11 3:54, Bart Van Assche wrote:
> On 7/9/24 11:56 PM, Yang Yang wrote:
>> +	/**
>> +	 * @swap_lock: Held while swapping word <-> cleared
>> +	 */
>> +	spinlock_t swap_lock;
>
> Why is only swapping 'word' with 'cleared' protected by the spinlock?
> If all 'cleared' changes would be protected by this spinlock then
> that would allow to eliminate the expensive xchg() call from
> sbitmap_deferred_clear().

The spinlock was initially introduced in ea86ea2cdced ("sbitmap:
ammortize cost of clearing bits").
I think if all 'cleared' changes are protected by the spinlock, the
overhead of clearing tags would definitely increase.

Thanks.

>
> Thanks,
>
> Bart.
On Thu, Jul 11, 2024 at 08:48:03PM +0800, YangYang wrote:
> On 2024/7/11 3:54, Bart Van Assche wrote:
> > On 7/9/24 11:56 PM, Yang Yang wrote:
> > > +	/**
> > > +	 * @swap_lock: Held while swapping word <-> cleared
> > > +	 */
> > > +	spinlock_t swap_lock;
> >
> > Why is only swapping 'word' with 'cleared' protected by the spinlock?
> > If all 'cleared' changes would be protected by this spinlock then
> > that would allow to eliminate the expensive xchg() call from
> > sbitmap_deferred_clear().
>
> The spinlock was initially introduced in ea86ea2cdced ("sbitmap:
> ammortize cost of clearing bits").
> I think if all 'cleared' changes are protected by the spinlock, the
> overhead of clearing tags would definitely increase.

There are only two WRITEs to 'cleared':

- xchg(&map->cleared, 0) in sbitmap_deferred_clear()

- set_bit() in sbitmap_deferred_clear_bit()

xchg() is supposed to provide such protection already.

Thanks,
Ming
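For context, the two writers listed above look roughly like this in
lib/sbitmap.c (an abridged sketch, not the full functions):

	/* Writer 1 - sbitmap_deferred_clear(): atomically grab the deferred
	 * bits and reset ->cleared to 0 in a single step, then fold them
	 * back into ->word.
	 */
	mask = xchg(&map->cleared, 0);
	atomic_long_andnot(mask, (atomic_long_t *)&map->word);

	/* Writer 2 - sbitmap_deferred_clear_bit(): record a freed bit. */
	set_bit(SB_NR_TO_BIT(sb, bitnr), &map->cleared);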
On 7/11/24 6:39 AM, Ming Lei wrote:
> There are only two WRITEs to 'cleared':
>
> - xchg(&map->cleared, 0) in sbitmap_deferred_clear()
>
> - set_bit() in sbitmap_deferred_clear_bit()
>
> xchg() is supposed to provide such protection already.

Hi Ming,

The comment above 'swap_lock' in this patch is as follows:

	/**
	 * @swap_lock: Held while swapping word <-> cleared
	 */

In other words, 'swap_lock' is used to serialize *code*. Using
synchronization objects to serialize code is known as an anti-pattern,
something that shouldn't be done. Synchronization objects should be
used to serialize access to data. Hence my question whether it would
be appropriate to protect all 'cleared' changes with the newly
introduced spinlock.

Thanks,

Bart.
On Fri, Jul 12, 2024 at 12:33 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 7/11/24 6:39 AM, Ming Lei wrote:
> > There are only two WRITEs to 'cleared':
> >
> > - xchg(&map->cleared, 0) in sbitmap_deferred_clear()
> >
> > - set_bit() in sbitmap_deferred_clear_bit()
> >
> > xchg() is supposed to provide such protection already.
>
> Hi Ming,
>
> The comment above 'swap_lock' in this patch is as follows:
>
>	/**
>	 * @swap_lock: Held while swapping word <-> cleared
>	 */
>
> In other words, 'swap_lock' is used to serialize *code*. Using
> synchronization objects to serialize code is known as an anti-pattern,
> something that shouldn't be done.
> Synchronization objects should be used
> to serialize access to data.

It can be thought of as serializing UPDATEs on both ->word and
->cleared, instead of code.

> Hence my question whether it would be
> appropriate to protect all 'cleared' changes with the newly introduced
> spinlock.

Wrt. the ->cleared-only update, xchg() is enough.

Thanks,
Ming
On 7/11/24 6:01 PM, Ming Lei wrote:
> On Fri, Jul 12, 2024 at 12:33 AM Bart Van Assche <bvanassche@acm.org> wrote:
>> The comment above 'swap_lock' in this patch is as follows:
>>
>>	/**
>>	 * @swap_lock: Held while swapping word <-> cleared
>>	 */
>>
>> In other words, 'swap_lock' is used to serialize *code*. Using
>> synchronization objects to serialize code is known as an anti-pattern,
>> something that shouldn't be done.
>
>> Synchronization objects should be used
>> to serialize access to data.
>
> It can be thought of as serializing UPDATEs on both ->word and
> ->cleared, instead of code.

It would be appreciated if the comment above swap_lock would be
modified such that it reflects that it serializes simultaneous updates
of ->word and ->cleared.

Thanks,

Bart.
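One possible rewording along those lines (illustrative only; the final
wording is up to the patch author):

	/**
	 * @swap_lock: serializes simultaneous updates of ->word and ->cleared
	 * when the deferred-cleared bits are moved back into ->word
	 */
	spinlock_t swap_lock;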
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index d662cf136021..ec0b0e73c906 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -36,6 +36,11 @@ struct sbitmap_word {
 	 * @cleared: word holding cleared bits
 	 */
 	unsigned long cleared ____cacheline_aligned_in_smp;
+
+	/**
+	 * @swap_lock: Held while swapping word <-> cleared
+	 */
+	spinlock_t swap_lock;
 } ____cacheline_aligned_in_smp;
 
 /**
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 1e453f825c05..22d6e86ba87f 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -60,12 +60,35 @@ static inline void update_alloc_hint_after_get(struct sbitmap *sb,
 /*
  * See if we have deferred clears that we can batch move
  */
-static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
+static inline bool sbitmap_deferred_clear(struct sbitmap_word *map,
+		unsigned int depth, unsigned int alloc_hint, bool wrap)
 {
-	unsigned long mask;
+	unsigned long mask, flags, word_mask;
+	bool ret = false;
 
-	if (!READ_ONCE(map->cleared))
-		return false;
+	spin_lock_irqsave(&map->swap_lock, flags);
+
+	if (!map->cleared) {
+		if (depth > 0) {
+			word_mask = (~0UL) >> (BITS_PER_LONG - depth);
+			/*
+			 * The current behavior is to always retry after moving
+			 * ->cleared to word, and we change it to retry in case
+			 * of any free bits. To avoid dead loop, we need to take
+			 * wrap & alloc_hint into account. Without this, a soft
+			 * lockup was detected in our test environment.
+			 */
+			if (!wrap && alloc_hint)
+				word_mask &= ~((1UL << alloc_hint) - 1);
+
+			if ((READ_ONCE(map->word) & word_mask) == word_mask)
+				ret = false;
+			else
+				ret = true;
+		}
+
+		goto out_unlock;
+	}
 
 	/*
 	 * First get a stable cleared mask, setting the old mask to 0.
@@ -77,7 +100,10 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
 	 */
 	atomic_long_andnot(mask, (atomic_long_t *)&map->word);
 	BUILD_BUG_ON(sizeof(atomic_long_t) != sizeof(map->word));
-	return true;
+	ret = true;
+out_unlock:
+	spin_unlock_irqrestore(&map->swap_lock, flags);
+	return ret;
 }
 
 int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
@@ -85,6 +111,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 		      bool alloc_hint)
 {
 	unsigned int bits_per_word;
+	int i;
 
 	if (shift < 0)
 		shift = sbitmap_calculate_shift(depth);
@@ -116,6 +143,9 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 		return -ENOMEM;
 	}
 
+	for (i = 0; i < sb->map_nr; i++)
+		spin_lock_init(&sb->map[i].swap_lock);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sbitmap_init_node);
@@ -126,7 +156,7 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 	unsigned int i;
 
 	for (i = 0; i < sb->map_nr; i++)
-		sbitmap_deferred_clear(&sb->map[i]);
+		sbitmap_deferred_clear(&sb->map[i], 0, 0, 0);
 
 	sb->depth = depth;
 	sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word);
@@ -179,7 +209,7 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 					alloc_hint, wrap);
 		if (nr != -1)
 			break;
-		if (!sbitmap_deferred_clear(map))
+		if (!sbitmap_deferred_clear(map, depth, alloc_hint, wrap))
 			break;
 	} while (1);
 
@@ -496,7 +526,7 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags,
 		unsigned int map_depth = __map_depth(sb, index);
 		unsigned long val;
 
-		sbitmap_deferred_clear(map);
+		sbitmap_deferred_clear(map, 0, 0, 0);
 		val = READ_ONCE(map->word);
 		if (val == (1UL << (map_depth - 1)) - 1)
 			goto next;