
[v2,2/4] sbitmap: remove swap_lock

Message ID 488177c02dccda60c5e8af2e53156c42b7f1acc0.1606058975.git.asml.silence@gmail.com (mailing list archive)
State New, archived
Series: optimise sbitmap deferred clear

Commit Message

Pavel Begunkov Nov. 22, 2020, 3:35 p.m. UTC
map->swap_lock protects map->cleared from concurrent modification,
however sbitmap_deferred_clear() already drains it atomically, so
it's guaranteed not to lose bits on concurrent
sbitmap_deferred_clear().

A single-threaded, tag-heavy test on top of null_blk showed a ~1.5%
throughput increase, and a 3% -> 1% reduction in sbitmap_get() cycles
according to perf.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/sbitmap.h |  5 -----
 lib/sbitmap.c           | 14 +++-----------
 2 files changed, 3 insertions(+), 16 deletions(-)
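
For reference, a sketch of what sbitmap_deferred_clear() looks like with this
patch applied (reconstructed from the diff below; the xchg() line comes from
unchanged context that the hunks don't show, and the comments are annotations
explaining why concurrent callers cannot lose cleared bits without swap_lock):

static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
{
	unsigned long mask, val;

	if (!READ_ONCE(map->cleared))
		return false;

	/*
	 * xchg() atomically takes the whole cleared mask and zeroes it;
	 * two concurrent callers can never both obtain the same bit, so
	 * no cleared bit is ever dropped.
	 */
	mask = xchg(&map->cleared, 0);

	/*
	 * Fold the taken bits back into ->word; the cmpxchg() loop
	 * retries if ->word changed underneath us.
	 */
	do {
		val = map->word;
	} while (cmpxchg(&map->word, val, val & ~mask) != val);

	return true;
}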

Comments

John Garry Nov. 24, 2020, 2:22 p.m. UTC | #1
On 22/11/2020 15:35, Pavel Begunkov wrote:
> map->swap_lock protects map->cleared from concurrent modification,
> however sbitmap_deferred_clear() already drains it atomically, so
> it's guaranteed not to lose bits on concurrent
> sbitmap_deferred_clear().
> 
> A single-threaded, tag-heavy test on top of null_blk showed a ~1.5% throughput
> increase, and a 3% -> 1% reduction in sbitmap_get() cycles according to perf.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   include/linux/sbitmap.h |  5 -----
>   lib/sbitmap.c           | 14 +++-----------
>   2 files changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> index e40d019c3d9d..74cc6384715e 100644
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -32,11 +32,6 @@ struct sbitmap_word {
>   	 * @cleared: word holding cleared bits
>   	 */
>   	unsigned long cleared ____cacheline_aligned_in_smp;
> -
> -	/**
> -	 * @swap_lock: Held while swapping word <-> cleared
> -	 */
> -	spinlock_t swap_lock;
>   } ____cacheline_aligned_in_smp;
>   
>   /**
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index c1c8a4e69325..4fd877048ba8 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -15,13 +15,9 @@
>   static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>   {
>   	unsigned long mask, val;
> -	bool ret = false;
> -	unsigned long flags;
>   
> -	spin_lock_irqsave(&map->swap_lock, flags);
> -
> -	if (!map->cleared)
> -		goto out_unlock;
> +	if (!READ_ONCE(map->cleared))
> +		return false;

So if we race with another cpu, won't the 2nd cpu see that the mask 
returned from the xchg (not shown) is 0? If so, it's odd to continue to do 
the CAS - or the atomic andnot, from a later patch - on a mask of 0
(see the interleaving sketch after this mail).

Thanks,
John

>   
>   	/*
>   	 * First get a stable cleared mask, setting the old mask to 0.
> @@ -35,10 +31,7 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>   		val = map->word;
>   	} while (cmpxchg(&map->word, val, val & ~mask) != val);
>   
> -	ret = true;
> -out_unlock:
> -	spin_unlock_irqrestore(&map->swap_lock, flags);
> -	return ret;
> +	return true;
>   }
>   
>   int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
> @@ -80,7 +73,6 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
>   	for (i = 0; i < sb->map_nr; i++) {
>   		sb->map[i].depth = min(depth, bits_per_word);
>   		depth -= sb->map[i].depth;
> -		spin_lock_init(&sb->map[i].swap_lock);
>   	}
>   	return 0;
>   }
>
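A rough interleaving sketch of the race described above (not from the patch
itself; the xchg() line is the unchanged context of sbitmap_deferred_clear()):

	CPU0					CPU1
	----					----
	READ_ONCE(map->cleared) != 0
						READ_ONCE(map->cleared) != 0
	mask = xchg(&map->cleared, 0);
	  /* gets the real cleared bits */
						mask = xchg(&map->cleared, 0);
						  /* gets 0, CPU0 drained it */
	cmpxchg() loop applies ~mask
						cmpxchg() loop runs with mask == 0:
						val & ~0UL == val, so ->word is
						written back unchanged - a wasted
						atomic, but no bits are lost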
Pavel Begunkov Nov. 24, 2020, 2:43 p.m. UTC | #2
On 24/11/2020 14:22, John Garry wrote:
> On 22/11/2020 15:35, Pavel Begunkov wrote:
>> map->swap_lock protects map->cleared from concurrent modification,
>> however sbitmap_deferred_clear() already drains it atomically, so
>> it's guaranteed not to lose bits on concurrent
>> sbitmap_deferred_clear().
>>
>> A single-threaded, tag-heavy test on top of null_blk showed a ~1.5% throughput
>> increase, and a 3% -> 1% reduction in sbitmap_get() cycles according to perf.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/linux/sbitmap.h |  5 -----
>>   lib/sbitmap.c           | 14 +++-----------
>>   2 files changed, 3 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
>> index e40d019c3d9d..74cc6384715e 100644
>> --- a/include/linux/sbitmap.h
>> +++ b/include/linux/sbitmap.h
>> @@ -32,11 +32,6 @@ struct sbitmap_word {
>>        * @cleared: word holding cleared bits
>>        */
>>       unsigned long cleared ____cacheline_aligned_in_smp;
>> -
>> -    /**
>> -     * @swap_lock: Held while swapping word <-> cleared
>> -     */
>> -    spinlock_t swap_lock;
>>   } ____cacheline_aligned_in_smp;
>>     /**
>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index c1c8a4e69325..4fd877048ba8 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -15,13 +15,9 @@
>>   static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>>   {
>>       unsigned long mask, val;
>> -    bool ret = false;
>> -    unsigned long flags;
>>   -    spin_lock_irqsave(&map->swap_lock, flags);
>> -
>> -    if (!map->cleared)
>> -        goto out_unlock;
>> +    if (!READ_ONCE(map->cleared))
>> +        return false;
> 
> So if we race with another cpu, won't the 2nd cpu see that the mask returned from the xchg (not shown) is 0? If so, it's odd to continue to do the CAS - or the atomic andnot, from a later patch - on a mask of 0.

Yeah, but this part is legit and I don't expect it to be contended
enough to need an additional check, especially with the atomic and
from [3/4].

I'm more concerned about whether sbitmap_resize*() callers do the right
synchronisation (e.g. quiesce) and don't rely on the critical
section I'm removing. It would be great if anyone could confirm that.

> 
> Thanks,
> John
> 
>>         /*
>>        * First get a stable cleared mask, setting the old mask to 0.
>> @@ -35,10 +31,7 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>>           val = map->word;
>>       } while (cmpxchg(&map->word, val, val & ~mask) != val);
>>   -    ret = true;
>> -out_unlock:
>> -    spin_unlock_irqrestore(&map->swap_lock, flags);
>> -    return ret;
>> +    return true;
>>   }
>>     int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
>> @@ -80,7 +73,6 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
>>       for (i = 0; i < sb->map_nr; i++) {
>>           sb->map[i].depth = min(depth, bits_per_word);
>>           depth -= sb->map[i].depth;
>> -        spin_lock_init(&sb->map[i].swap_lock);
>>       }
>>       return 0;
>>   }
>>
>
Ming Lei Nov. 26, 2020, 2:46 a.m. UTC | #3
On Sun, Nov 22, 2020 at 03:35:46PM +0000, Pavel Begunkov wrote:
> map->swap_lock protects map->cleared from concurrent modification,
> however sbitmap_deferred_clear() already drains it atomically, so
> it's guaranteed not to lose bits on concurrent
> sbitmap_deferred_clear().
> 
> A single-threaded, tag-heavy test on top of null_blk showed a ~1.5% throughput
> increase, and a 3% -> 1% reduction in sbitmap_get() cycles according to perf.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/sbitmap.h |  5 -----
>  lib/sbitmap.c           | 14 +++-----------
>  2 files changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> index e40d019c3d9d..74cc6384715e 100644
> --- a/include/linux/sbitmap.h
> +++ b/include/linux/sbitmap.h
> @@ -32,11 +32,6 @@ struct sbitmap_word {
>  	 * @cleared: word holding cleared bits
>  	 */
>  	unsigned long cleared ____cacheline_aligned_in_smp;
> -
> -	/**
> -	 * @swap_lock: Held while swapping word <-> cleared
> -	 */
> -	spinlock_t swap_lock;
>  } ____cacheline_aligned_in_smp;
>  
>  /**
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index c1c8a4e69325..4fd877048ba8 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -15,13 +15,9 @@
>  static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>  {
>  	unsigned long mask, val;
> -	bool ret = false;
> -	unsigned long flags;
>  
> -	spin_lock_irqsave(&map->swap_lock, flags);
> -
> -	if (!map->cleared)
> -		goto out_unlock;
> +	if (!READ_ONCE(map->cleared))
> +		return false;

This way might break sbitmap_find_bit_in_index()/sbitmap_get_shallow().
Currently if sbitmap_deferred_clear() returns false, it means nothing
can be allocated from this word. With this patch, even though 'false'
is returned, free bits still might be available because another
sbitmap_deferred_clear() can be run concurrently.

Thanks,
Ming
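
For context, this is roughly how sbitmap_find_bit_in_index() consumed that
return value at the time (paraphrased from lib/sbitmap.c of that era; the
exact signature may differ slightly) - a 'false' return ends the retry loop
for the word:

static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index,
				     unsigned int alloc_hint, bool round_robin)
{
	int nr;

	do {
		nr = __sbitmap_get_word(&sb->map[index].word,
					sb->map[index].depth, alloc_hint,
					!round_robin);
		if (nr != -1)
			break;
		/*
		 * No free bit found: try folding ->cleared into ->word and
		 * retry. A 'false' return gives up on this word, which is
		 * the window Ming points at: a concurrent
		 * sbitmap_deferred_clear() may have just made bits free.
		 */
		if (!sbitmap_deferred_clear(&sb->map[index]))
			break;
	} while (1);

	return nr;
}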
Pavel Begunkov Nov. 26, 2020, 1:44 p.m. UTC | #4
On 26/11/2020 02:46, Ming Lei wrote:
> On Sun, Nov 22, 2020 at 03:35:46PM +0000, Pavel Begunkov wrote:
>> map->swap_lock protects map->cleared from concurrent modification,
>> however sbitmap_deferred_clear() already drains it atomically, so
>> it's guaranteed not to lose bits on concurrent
>> sbitmap_deferred_clear().
>>
>> A single-threaded, tag-heavy test on top of null_blk showed a ~1.5% throughput
>> increase, and a 3% -> 1% reduction in sbitmap_get() cycles according to perf.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>  include/linux/sbitmap.h |  5 -----
>>  lib/sbitmap.c           | 14 +++-----------
>>  2 files changed, 3 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
>> index e40d019c3d9d..74cc6384715e 100644
>> --- a/include/linux/sbitmap.h
>> +++ b/include/linux/sbitmap.h
>> @@ -32,11 +32,6 @@ struct sbitmap_word {
>>  	 * @cleared: word holding cleared bits
>>  	 */
>>  	unsigned long cleared ____cacheline_aligned_in_smp;
>> -
>> -	/**
>> -	 * @swap_lock: Held while swapping word <-> cleared
>> -	 */
>> -	spinlock_t swap_lock;
>>  } ____cacheline_aligned_in_smp;
>>  
>>  /**
>> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
>> index c1c8a4e69325..4fd877048ba8 100644
>> --- a/lib/sbitmap.c
>> +++ b/lib/sbitmap.c
>> @@ -15,13 +15,9 @@
>>  static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
>>  {
>>  	unsigned long mask, val;
>> -	bool ret = false;
>> -	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&map->swap_lock, flags);
>> -
>> -	if (!map->cleared)
>> -		goto out_unlock;
>> +	if (!READ_ONCE(map->cleared))
>> +		return false;
> 
> This way might break sbitmap_find_bit_in_index()/sbitmap_get_shallow().
> Currently if sbitmap_deferred_clear() returns false, it means nothing
> can be allocated from this word. With this patch, even though 'false'
> is returned, free bits still might be available because another
> sbitmap_deferred_clear() can be run concurrently.

But that can happen anyway if someone frees a request right after we
return from sbitmap_deferred_clear(). Can you elaborate on what exactly
it breaks? Something in the sbq wakeup paths?
Ming Lei Nov. 27, 2020, 2:06 a.m. UTC | #5
On Thu, Nov 26, 2020 at 01:44:36PM +0000, Pavel Begunkov wrote:
> On 26/11/2020 02:46, Ming Lei wrote:
> > On Sun, Nov 22, 2020 at 03:35:46PM +0000, Pavel Begunkov wrote:
> >> map->swap_lock protects map->cleared from concurrent modification,
> >> however sbitmap_deferred_clear() already drains it atomically, so
> >> it's guaranteed not to lose bits on concurrent
> >> sbitmap_deferred_clear().
> >>
> >> A single-threaded, tag-heavy test on top of null_blk showed a ~1.5% throughput
> >> increase, and a 3% -> 1% reduction in sbitmap_get() cycles according to perf.
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>  include/linux/sbitmap.h |  5 -----
> >>  lib/sbitmap.c           | 14 +++-----------
> >>  2 files changed, 3 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
> >> index e40d019c3d9d..74cc6384715e 100644
> >> --- a/include/linux/sbitmap.h
> >> +++ b/include/linux/sbitmap.h
> >> @@ -32,11 +32,6 @@ struct sbitmap_word {
> >>  	 * @cleared: word holding cleared bits
> >>  	 */
> >>  	unsigned long cleared ____cacheline_aligned_in_smp;
> >> -
> >> -	/**
> >> -	 * @swap_lock: Held while swapping word <-> cleared
> >> -	 */
> >> -	spinlock_t swap_lock;
> >>  } ____cacheline_aligned_in_smp;
> >>  
> >>  /**
> >> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> >> index c1c8a4e69325..4fd877048ba8 100644
> >> --- a/lib/sbitmap.c
> >> +++ b/lib/sbitmap.c
> >> @@ -15,13 +15,9 @@
> >>  static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
> >>  {
> >>  	unsigned long mask, val;
> >> -	bool ret = false;
> >> -	unsigned long flags;
> >>  
> >> -	spin_lock_irqsave(&map->swap_lock, flags);
> >> -
> >> -	if (!map->cleared)
> >> -		goto out_unlock;
> >> +	if (!READ_ONCE(map->cleared))
> >> +		return false;
> > 
> > This way might break sbitmap_find_bit_in_index()/sbitmap_get_shallow().
> > Currently if sbitmap_deferred_clear() returns false, it means nothing
> > can be allocated from this word. With this patch, even though 'false'
> > is returned, free bits still might be available because another
> > sbitmap_deferred_clear() can be run concurrently.
> 
> But that can happen anyway if someone frees a request right after we
> return from sbitmap_deferred_clear().

When one request is freed, if there is any allocator waiting,
sbitmap_queue_wake_up() will wake up the allocator to retry.

If there isn't any allocator waiting, freeing the request will make
empty bits available, and the following pattern is used to make forward
progress for either request allocation or driver tag allocation:

	tag = __blk_mq_get_tag(data, bt);	/* [1] optimistic attempt */
	if (tag != BLK_MQ_NO_TAG)
		break;
	/* register as a waiter first, so a free after [1] can't be missed */
	sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
	tag = __blk_mq_get_tag(data, bt);	/* [2] recheck before sleeping */
	if (tag != BLK_MQ_NO_TAG)
		break;
	io_schedule();	/* sleep until woken by sbitmap_queue_wake_up() */

When allocation [2] and sbitmap_resize() run concurrently,
allocation [2] may fail because the map's cleared bits get cleared.
But it looks like that can happen without your patch, too.

> Can you elaborate on what exactly
> it breaks? Something in the sbq wakeup paths?

Thinking about it further, your patch looks good: when one allocation
wins (removing swap_lock won't change this point), the winning bit will
be freed in the future, and then a wakeup is run for the other waiters,
provided all allocations take the same approach as above.


Thanks,
Ming

Patch

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index e40d019c3d9d..74cc6384715e 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -32,11 +32,6 @@  struct sbitmap_word {
 	 * @cleared: word holding cleared bits
 	 */
 	unsigned long cleared ____cacheline_aligned_in_smp;
-
-	/**
-	 * @swap_lock: Held while swapping word <-> cleared
-	 */
-	spinlock_t swap_lock;
 } ____cacheline_aligned_in_smp;
 
 /**
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index c1c8a4e69325..4fd877048ba8 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -15,13 +15,9 @@ 
 static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
 {
 	unsigned long mask, val;
-	bool ret = false;
-	unsigned long flags;
 
-	spin_lock_irqsave(&map->swap_lock, flags);
-
-	if (!map->cleared)
-		goto out_unlock;
+	if (!READ_ONCE(map->cleared))
+		return false;
 
 	/*
 	 * First get a stable cleared mask, setting the old mask to 0.
@@ -35,10 +31,7 @@  static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
 		val = map->word;
 	} while (cmpxchg(&map->word, val, val & ~mask) != val);
 
-	ret = true;
-out_unlock:
-	spin_unlock_irqrestore(&map->swap_lock, flags);
-	return ret;
+	return true;
 }
 
 int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
@@ -80,7 +73,6 @@  int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 	for (i = 0; i < sb->map_nr; i++) {
 		sb->map[i].depth = min(depth, bits_per_word);
 		depth -= sb->map[i].depth;
-		spin_lock_init(&sb->map[i].swap_lock);
 	}
 	return 0;
 }