
[PATCHSET 0/3] io-wq locking improvements

Message ID 20230809194306.170979-1-axboe@kernel.dk

Message

Jens Axboe Aug. 9, 2023, 7:43 p.m. UTC
Hi,

In chatting with someone who was trying to use io_uring to read
maildirs, they found that a test case that does:

open file, statx file, read file, close file

performs worse than expected. The culprit here is statx, and arguments
aside on whether it makes sense to statx in the first place, it does
highlight that io-wq is pretty locking intensive.
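
For reference, one iteration of such a test case might look roughly like
the sketch below using liburing; this is only an illustration of the shape
of the workload (path, buffer size, and error handling are invented), not
the reporter's actual test program. Note that statx is one of the requests
that always gets punted to io-wq, which is why it shows up here.

/*
 * Rough liburing sketch of the workload described above: open, statx,
 * read, and close one file via io_uring, one SQE at a time.
 * Illustrative only. Build with: cc -O2 one_file.c -luring
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <sys/stat.h>

static int submit_and_wait(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_submit(ring);
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;
	ret = cqe->res;
	io_uring_cqe_seen(ring, cqe);
	return ret;
}

int main(int argc, char *argv[])
{
	const char *path = argc > 1 ? argv[1] : "testfile";
	struct io_uring_sqe *sqe;
	struct io_uring ring;
	struct statx stx;
	char buf[4096];
	int fd, ret;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);			/* open */
	io_uring_prep_openat(sqe, AT_FDCWD, path, O_RDONLY, 0);
	fd = submit_and_wait(&ring);
	if (fd < 0)
		goto out;

	sqe = io_uring_get_sqe(&ring);			/* statx */
	io_uring_prep_statx(sqe, AT_FDCWD, path, 0, STATX_SIZE, &stx);
	submit_and_wait(&ring);

	sqe = io_uring_get_sqe(&ring);			/* read */
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
	ret = submit_and_wait(&ring);
	printf("read %d bytes from %s\n", ret, path);

	sqe = io_uring_get_sqe(&ring);			/* close */
	io_uring_prep_close(sqe, fd);
	submit_and_wait(&ring);
out:
	io_uring_queue_exit(&ring);
	return 0;
}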

This (very lightly tested [1]) patchset attempts to improve this
situation by reducing the frequency of grabbing wq->lock and
acct->lock.

The first patch gets rid of wq->lock on work insertion. io-wq grabs it
to iterate the free worker list, but that is not necessary.
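
The lookup on insertion only needs to find an idle worker to wake; one way
to do that safely without the lock is to walk the free list under RCU and
leave the lock to the paths that actually modify the list. Below is a
purely illustrative userspace sketch of that pattern with liburcu and C11
atomics; it is not the io-wq code, and every name in it is made up.

/*
 * Illustrative only: the enqueue side walks an RCU-protected list of
 * idle workers to find one to wake, so it never takes wq_lock; only
 * the code that adds/removes idle workers takes the lock.
 * Build with: cc -O2 free_list_sketch.c -lurcu
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <urcu.h>		/* rcu_read_lock(), rcu_register_thread(), ... */
#include <urcu/rculist.h>	/* cds_list_add_rcu(), cds_list_for_each_entry_rcu() */

struct worker {
	struct cds_list_head node;	/* linked into free_list while idle */
	atomic_bool sleeping;		/* cleared by whoever wakes the worker */
	int id;
};

static struct cds_list_head free_list = CDS_LIST_HEAD_INIT(free_list);
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;	/* list writers only */

/* Work insertion path: find one idle worker to wake, without wq_lock. */
static bool activate_free_worker(void)
{
	struct worker *w;
	bool woken = false;

	rcu_read_lock();
	cds_list_for_each_entry_rcu(w, &free_list, node) {
		bool expected = true;

		/* Claim the first worker that is still marked sleeping. */
		if (atomic_compare_exchange_strong(&w->sleeping, &expected, false)) {
			printf("woke worker %d\n", w->id);
			woken = true;
			break;
		}
	}
	rcu_read_unlock();
	return woken;
}

/* Worker going idle: modifying the list still happens under wq_lock. */
static void worker_goes_idle(struct worker *w)
{
	atomic_store(&w->sleeping, true);
	pthread_mutex_lock(&wq_lock);
	cds_list_add_rcu(&w->node, &free_list);
	pthread_mutex_unlock(&wq_lock);
}

int main(void)
{
	static struct worker w = { .id = 1 };

	rcu_register_thread();		/* required before rcu_read_lock() */
	worker_goes_idle(&w);
	activate_free_worker();		/* no wq_lock on this path */
	rcu_unregister_thread();
	return 0;
}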

The second patch reduces the frequency of acct->lock grabs when we need
to run the queue and process new work. We currently grab the lock and
check for work, then drop it, then grab it again to process the work.
That is unnecessary.
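
To make the double acquisition concrete, here is a minimal standalone
sketch of the two patterns with a plain pthread mutex; it is not io-wq
code, and the names are invented for illustration.

/*
 * "Lock, peek, unlock, lock again to dequeue" versus "peek and dequeue
 * under one acquisition". Build with: cc -O2 acct_lock_sketch.c -pthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct work_item {
	struct work_item *next;
	int id;
};

static struct work_item *work_list;	/* queue of pending work */
static pthread_mutex_t acct_lock = PTHREAD_MUTEX_INITIALIZER;

/* Old pattern: one acquisition to check for work, a second one to take it. */
static struct work_item *get_work_two_acquisitions(void)
{
	struct work_item *w = NULL;
	bool have_work;

	pthread_mutex_lock(&acct_lock);
	have_work = work_list != NULL;
	pthread_mutex_unlock(&acct_lock);

	if (!have_work)
		return NULL;

	pthread_mutex_lock(&acct_lock);
	w = work_list;			/* could already be gone by now */
	if (w)
		work_list = w->next;
	pthread_mutex_unlock(&acct_lock);
	return w;
}

/* New pattern: check and dequeue while holding the lock just once. */
static struct work_item *get_work_single_acquisition(void)
{
	struct work_item *w;

	pthread_mutex_lock(&acct_lock);
	w = work_list;
	if (w)
		work_list = w->next;
	pthread_mutex_unlock(&acct_lock);
	return w;
}

int main(void)
{
	struct work_item a = { .next = NULL, .id = 1 };

	work_list = &a;
	printf("two acquisitions: got work %d\n", get_work_two_acquisitions()->id);
	work_list = &a;
	printf("one acquisition: got work %d\n", get_work_single_acquisition()->id);
	return 0;
}

Besides halving the number of lock/unlock pairs on this path, the single
acquisition also closes the window between the check and the dequeue in
which another worker could have taken the work.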

Final patch just optimizes how we activate new workers. It's not related
to the locking itself, just reducing the overhead of activating a new
worker.

Running the above test case on a directory with 50K files, each being
between 10 and 4096 bytes, we spend 160-170ms running the workload
before these patches. With this patchset, we spend 90-100ms doing the
same work. A bit of profile information is included in the patch commit
messages.

Can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-wq-lock

[1] Runs the test suite just fine, with PROVE_LOCKING enabled and raw
    lockdep as well.

Comments

Hao Xu Aug. 11, 2023, 4 a.m. UTC | #1
On 8/10/23 03:43, Jens Axboe wrote:
> Hi,
> 
> In chatting with someone who was trying to use io_uring to read
> maildirs, they found that a test case that does:
> 
> open file, statx file, read file, close file
> 
> performs worse than expected. The culprit here is statx, and arguments
> aside on whether it makes sense to statx in the first place, it does
> highlight that io-wq is pretty locking intensive.
> 
> This (very lightly tested [1]) patchset attempts to improve this
> situation by reducing the frequency of grabbing wq->lock and
> acct->lock.
> 
> The first patch gets rid of wq->lock on work insertion. io-wq grabs it
> to iterate the free worker list, but that is not necessary.
> 
> The second patch reduces the frequency of acct->lock grabs when we need
> to run the queue and process new work. We currently grab the lock and
> check for work, then drop it, then grab it again to process the work.
> That is unnecessary.
> 
> Final patch just optimizes how we activate new workers. It's not related
> to the locking itself, just reducing the overhead of activating a new
> worker.
> 
> Running the above test case on a directory with 50K files, each being
> between 10 and 4096 bytes, we spend 160-170ms running the workload
> before these patches. With this patchset, we spend 90-100ms doing the
> same work. A bit of profile information is included in the patch commit
> messages.
> 
> Can also be found here:
> 
> https://git.kernel.dk/cgit/linux/log/?h=io_uring-wq-lock
> 
> [1] Runs the test suite just fine, with PROVE_LOCKING enabled and raw
>      lockdep as well.
> 

Haven't had time to test it, but the code itself looks good.

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Jens Axboe Aug. 11, 2023, 4:36 p.m. UTC | #2
On 8/10/23 10:00 PM, Hao Xu wrote:
> On 8/10/23 03:43, Jens Axboe wrote:
>> Hi,
>>
>> In chatting with someone who was trying to use io_uring to read
>> maildirs, they found that a test case that does:
>>
>> open file, statx file, read file, close file
>>
>> performs worse than expected. The culprit here is statx, and arguments
>> aside on whether it makes sense to statx in the first place, it does
>> highlight that io-wq is pretty locking intensive.
>>
>> This (very lightly tested [1]) patchset attempts to improve this
>> situation by reducing the frequency of grabbing wq->lock and
>> acct->lock.
>>
>> The first patch gets rid of wq->lock on work insertion. io-wq grabs it
>> to iterate the free worker list, but that is not necessary.
>>
>> The second patch reduces the frequency of acct->lock grabs when we need
>> to run the queue and process new work. We currently grab the lock and
>> check for work, then drop it, then grab it again to process the work.
>> That is unnecessary.
>>
>> Final patch just optimizes how we activate new workers. It's not related
>> to the locking itself, just reducing the overhead of activating a new
>> worker.
>>
>> Running the above test case on a directory with 50K files, each being
>> between 10 and 4096 bytes, we spend 160-170ms running the workload
>> before these patches. With this patchset, we spend 90-100ms doing the
>> same work. A bit of profile information is included in the patch commit
>> messages.
>>
>> Can also be found here:
>>
>> https://git.kernel.dk/cgit/linux/log/?h=io_uring-wq-lock
>>
>> [1] Runs the test suite just fine, with PROVE_LOCKING enabled and raw
>>      lockdep as well.
>>
> 
> Haven't had time to test it, but the code itself looks good.
> 
> Reviewed-by: Hao Xu <howeyxu@tencent.com>

Thanks, added.