mbox series

[PATCHSET,0/4] fs: Support for LOOKUP_CACHED / RESOLVE_CACHED

Message ID 20201217161911.743222-1-axboe@kernel.dk (mailing list archive)
Headers show
Series fs: Support for LOOKUP_CACHED / RESOLVE_CACHED | expand

Message

Jens Axboe Dec. 17, 2020, 4:19 p.m. UTC
Hi,

Here's v3 of the LOOKUP_CACHED change. It's exposed as both a flag for
openat2(), and it's used internally by io_uring to speed up (and make more
efficient) the openat/openat2 support there. As posted in the v3 thread,
performance numbers for various levels of the filename lookup already
being cached:

Cached		5.10-git	5.10-git+LOOKUP_CACHED	Speedup
---------------------------------------------------------------
33%		1,014,975	900,474			1.1x
89%		 545,466	292,937			1.9x
100%		 435,636	151,475			2.9x

The more cache hot we are, the faster the inline LOOKUP_CACHED
optimization helps. This is unsurprising and expected, as a thread
offload becomes a more dominant part of the total overhead. If we look
at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in
the dentry cache will yield:

275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0
275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0
275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991
275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1
275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4

which shows a failed nonblock lookup, then punt to worker, and then we
complete with fd == 4. This takes 65 usec in total. Re-running the same
test case again:

281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0
281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0
281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4

shows the same request completing inline, also returning fd == 4. This
takes 6 usec.

Using openat2, we see that an attempted RESOLVE_CACHED open of an uncached
file will fail with -EAGAIN, and a subsequent attempt will too as it's
still not cached. ls the file and retry, and we successfully open it
with RESOLVE_CACHED:

[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: -1
openat2: Resource temporarily unavailable
[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: -1
openat2: Resource temporarily unavailable
[test@archlinux ~]$ ls -al /etc/nanorc
-rw-r--r-- 1 root root 10066 Dec 17 16:15 /etc/nanorc
[test@archlinux ~]$ ./openat2-cached /etc/nanorc
open: 3

Minor polish since v3:

- Rename LOOKUP_NONBLOCK -> LOOKUP_CACHED, and ditto for the RESOLVE_
  flag. This better explains what the feature does, making it more self
  explanatory in terms of both code readability and for the user visible
  part.

- Remove dead LOOKUP_NONBLOCK check after we've dropped LOOKUP_RCU
  already, spotted by Al.

- Add O_TMPFILE to the checks upfront, so we can drop the checking in
  do_tmpfile().

Comments

Linus Torvalds Dec. 17, 2020, 6:09 p.m. UTC | #1
On Thu, Dec 17, 2020 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
> [..]
> which shows a failed nonblock lookup, then punt to worker, and then we
> complete with fd == 4. This takes 65 usec in total. Re-running the same
> test case again:
> [..]
> shows the same request completing inline, also returning fd == 4. This
> takes 6 usec.

I think this example needs to be part of the git history - either move
it into the individual commits, or we just need to make sure it
doesn't get lost as a cover letter and ends up part of the merge
message.

Despite having seen the patch series so many times now, I'm still just
really impressed by how small and neat it is, considering the above
numbers, and considering just how problematic this case was
historically (ie I remember all the discussions we had about
nonblocking opens back in the days).

So I continue to go "this is the RightWay(tm)" just from that.

              Linus
Jens Axboe Dec. 17, 2020, 7:23 p.m. UTC | #2
On 12/17/20 11:09 AM, Linus Torvalds wrote:
> On Thu, Dec 17, 2020 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
>> [..]
>> which shows a failed nonblock lookup, then punt to worker, and then we
>> complete with fd == 4. This takes 65 usec in total. Re-running the same
>> test case again:
>> [..]
>> shows the same request completing inline, also returning fd == 4. This
>> takes 6 usec.
> 
> I think this example needs to be part of the git history - either move
> it into the individual commits, or we just need to make sure it
> doesn't get lost as a cover letter and ends up part of the merge
> message.

I agree, and I think the best approach will depend a bit on whether
Al takes the core bits, and then I'm left with just the io_uring part,
or if the series goes in as a whole. In any case, I'll make sure it
gets in as either the merge message, or (at the least) in the io_uring
enablement of it.

> Despite having seen the patch series so many times now, I'm still just
> really impressed by how small and neat it is, considering the above
> numbers, and considering just how problematic this case was
> historically (ie I remember all the discussions we had about
> nonblocking opens back in the days).
> 
> So I continue to go "this is the RightWay(tm)" just from that.

Well, that's largely thanks to your and Al's feedback. I did attempt
this one time earlier, and the break through just wasn't there in
terms of condensing it properly. I'm very happy with it now.