Message ID: 20220606065716.270879-1-haoxu.linux@icloud.com (mailing list archive)
Series: cancel_hash per entry lock
On 6/6/22 14:57, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
>
> Add a per-entry lock for the cancel_hash array; this reduces usage of
> completion_lock and contention between cancel_hash entries.
>
> v1->v2:
> - Add per entry lock for poll/apoll task work code which was missed
>   in v1
> - add a member in io_kiocb to track the req's index in cancel_hash

Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
out that there is little completion_lock contention, so no visible change
in the data. But I still think this may be good for cancel_hash access in
some real cases where the completion lock matters.

Regards,
Hao
On 6/6/22 08:06, Hao Xu wrote:
> On 6/6/22 14:57, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>> completion_lock and contention between cancel_hash entries.
>>
>> v1->v2:
>> - Add per entry lock for poll/apoll task work code which was missed
>>   in v1
>> - add a member in io_kiocb to track the req's index in cancel_hash
>
> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
> out that there is little completion_lock contention, so no visible change
> in the data. But I still think this may be good for cancel_hash access in
> some real cases where the completion lock matters.

Conceptually I don't mind it, but let me ask: in what circumstances do you
expect it to make a difference? And what can we do to get favourable
numbers? For instance, how many CPUs was io-wq using?
On 6/6/22 13:02, Pavel Begunkov wrote:
> On 6/6/22 08:06, Hao Xu wrote:
>> On 6/6/22 14:57, Hao Xu wrote:
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>>> completion_lock and contention between cancel_hash entries.
>>>
>>> v1->v2:
>>> - Add per entry lock for poll/apoll task work code which was missed
>>>   in v1
>>> - add a member in io_kiocb to track the req's index in cancel_hash
>>
>> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
>> out that there is little completion_lock contention, so no visible
>> change in the data. But I still think this may be good for cancel_hash
>> access in some real cases where the completion lock matters.
>
> Conceptually I don't mind it, but let me ask: in what circumstances do
> you expect it to make a difference? And what can we do to get favourable
> numbers? For instance, how many CPUs was io-wq using?

Btw, I couldn't find ____cacheline_aligned_in_smp anywhere, which I'd
expect around those new spinlocks to avoid them sharing cache lines.
On 6/6/22 20:02, Pavel Begunkov wrote:
> On 6/6/22 08:06, Hao Xu wrote:
>> On 6/6/22 14:57, Hao Xu wrote:
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>>> completion_lock and contention between cancel_hash entries.
>>>
>>> v1->v2:
>>> - Add per entry lock for poll/apoll task work code which was missed
>>>   in v1
>>> - add a member in io_kiocb to track the req's index in cancel_hash
>>
>> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
>> out that there is little completion_lock contention, so no visible
>> change in the data. But I still think this may be good for cancel_hash
>> access in some real cases where the completion lock matters.
>
> Conceptually I don't mind it, but let me ask: in what circumstances do
> you expect it to make a difference? And

I suppose there are cases where a bunch of users try to access
cancel_hash[] at the same time, e.g. when people use multiple threads to
submit sqes or use IOSQE_ASYNC, and these io-workers or task works run in
parallel on different CPUs.

> what can we do to get favourable numbers? For instance, how many CPUs
> was io-wq using?

It is not easy to construct manually, since it is related to task
scheduling: if we just issue many IOSQE_ASYNC polls on an idle machine
with many CPUs, there won't be much contention, because the threads start
at different times (and thus access cancel_hash at different times).
From: Hao Xu <howeyxu@tencent.com>

Add a per-entry lock for the cancel_hash array; this reduces usage of
completion_lock and contention between cancel_hash entries.

v1->v2:
- Add per entry lock for poll/apoll task work code which was missed in v1
- add a member in io_kiocb to track the req's index in cancel_hash

Hao Xu (3):
  io_uring: add hash_index and its logic to track req in cancel_hash
  io_uring: add an io_hash_bucket structure for smaller granularity lock
  io_uring: switch cancel_hash to use per list spinlock

 io_uring/cancel.c         | 15 +++++++--
 io_uring/cancel.h         |  6 ++++
 io_uring/fdinfo.c         |  9 ++++--
 io_uring/io_uring.c       |  8 +++--
 io_uring/io_uring_types.h |  3 +-
 io_uring/poll.c           | 64 +++++++++++++++++++++------------------
 6 files changed, 67 insertions(+), 38 deletions(-)

base-commit: d8271bf021438f468dab3cd84fe5279b5bbcead8