Message ID: 20220606065716.270879-1-haoxu.linux@icloud.com (mailing list archive)
Series: cancel_hash per entry lock
On 6/6/22 14:57, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
>
> Add a per-entry lock for the cancel_hash array; this reduces usage of
> completion_lock and contention between cancel_hash entries.
>
> v1->v2:
> - Add per entry lock for poll/apoll task work code which was missed
>   in v1
> - add a member in io_kiocb to track the req's index in cancel_hash

Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
out that there is little completion_lock contention, so no visible change
in the data. But I still think this may be good for cancel_hash access in
some real cases where the completion lock matters.

Regards,
Hao
On 6/6/22 08:06, Hao Xu wrote:
> On 6/6/22 14:57, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>> completion_lock and contention between cancel_hash entries.
>>
>> v1->v2:
>> - Add per entry lock for poll/apoll task work code which was missed
>>   in v1
>> - add a member in io_kiocb to track the req's index in cancel_hash
>
> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
> out that there is little completion_lock contention, so no visible change
> in the data. But I still think this may be good for cancel_hash access in
> some real cases where the completion lock matters.

Conceptually I don't mind it, but let me ask: in what circumstances do you
expect it to make a difference? And what can we do to get favourable
numbers? For instance, how many CPUs was io-wq using?
On 6/6/22 13:02, Pavel Begunkov wrote:
> On 6/6/22 08:06, Hao Xu wrote:
>> On 6/6/22 14:57, Hao Xu wrote:
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>>> completion_lock and contention between cancel_hash entries.
>>>
>>> v1->v2:
>>> - Add per entry lock for poll/apoll task work code which was missed
>>>   in v1
>>> - add a member in io_kiocb to track the req's index in cancel_hash
>>
>> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
>> out that there is little completion_lock contention, so no visible
>> change in the data. But I still think this may be good for cancel_hash
>> access in some real cases where the completion lock matters.
>
> Conceptually I don't mind it, but let me ask: in what circumstances do
> you expect it to make a difference? And what can we do to get favourable
> numbers? For instance, how many CPUs was io-wq using?

Btw, I couldn't find ____cacheline_aligned_in_smp anywhere, which I'd
expect around those new spinlocks to avoid them sharing cache lines.
On 6/6/22 20:02, Pavel Begunkov wrote:
> On 6/6/22 08:06, Hao Xu wrote:
>> On 6/6/22 14:57, Hao Xu wrote:
>>> From: Hao Xu <howeyxu@tencent.com>
>>>
>>> Add a per-entry lock for the cancel_hash array; this reduces usage of
>>> completion_lock and contention between cancel_hash entries.
>>>
>>> v1->v2:
>>> - Add per entry lock for poll/apoll task work code which was missed
>>>   in v1
>>> - add a member in io_kiocb to track the req's index in cancel_hash
>>
>> Tried to test it with many poll_add IOSQE_ASYNC requests, but it turned
>> out that there is little completion_lock contention, so no visible
>> change in the data. But I still think this may be good for cancel_hash
>> access in some real cases where the completion lock matters.
>
> Conceptually I don't mind it, but let me ask: in what circumstances do
> you expect it to make a difference? And

I suppose there are cases where a bunch of users try to access
cancel_hash[] at the same time, e.g. when people use multiple threads to
submit sqes or use IOSQE_ASYNC, and these io-workers or task works run in
parallel on different CPUs.

> what can we do to get favourable numbers? For instance, how many CPUs
> was io-wq using?

It is not easy to construct manually, since it is related to task
scheduling: if we just issue many IOSQE_ASYNC polls on an idle machine
with many CPUs, there won't be much contention, because the threads start
at different times (and thus access cancel_hash at different times).
From: Hao Xu <howeyxu@tencent.com>

Add a per-entry lock for the cancel_hash array; this reduces usage of
completion_lock and contention between cancel_hash entries.

v1->v2:
- Add per entry lock for poll/apoll task work code which was missed in v1
- add a member in io_kiocb to track the req's index in cancel_hash

Hao Xu (3):
  io_uring: add hash_index and its logic to track req in cancel_hash
  io_uring: add an io_hash_bucket structure for smaller granularity lock
  io_uring: switch cancel_hash to use per list spinlock

 io_uring/cancel.c         | 15 +++++++--
 io_uring/cancel.h         |  6 ++++
 io_uring/fdinfo.c         |  9 ++++--
 io_uring/io_uring.c       |  8 +++--
 io_uring/io_uring_types.h |  3 +-
 io_uring/poll.c           | 64 +++++++++++++++++++++------------------
 6 files changed, 67 insertions(+), 38 deletions(-)

base-commit: d8271bf021438f468dab3cd84fe5279b5bbcead8