| Message ID | d94a4e55-c4f2-73d8-9e2c-e55ae8436622@kernel.dk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [GIT,PULL] io_uring xattr support |
On Sun, May 22, 2022 at 2:26 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On top of the core io_uring changes, this pull request includes support
> for the xattr variants.

So I don't mind the code (having seen the earlier versions), but
looking at this all I *do* end up reacting to this part:

  [torvalds@ryzen linux]$ wc -l fs/io_uring.c
  12744 fs/io_uring.c

and no, this is not due to this xattr pull, but the xattr code did add
another few hundred lines of "io_uring command boilerplate for another
command" to this file that is a nasty file from hell.

I really think that it might be time to start thinking about splitting
that io_uring.c file up. Make it a directory, and have the core
command engine in io_uring/core.c, and then have the different actual
IO_URING_OP_xyz handling in separate files.

And yes, that would probably necessitate making the OP handling use
more of a dispatch table approach, but wouldn't that be good anyway?
That io_uring.c file is starting to have a lot of *big* switch
statements for the different cases.

Wouldn't it be nice to have an "op descriptor array" instead of the

  switch (req->opcode) {
  ...
  case IORING_OP_WRITE:
          return io_prep_rw(req, sqe);
  ...

kind of tables?

Yes, the compiler may end up generating a binary-tree
compare-and-branch thing for a switch like that, and it might be
better than an indirect branch in these days of spectre costs for
branch prediction safety, but if we're talking a few tens of cycles
per op, that's probably not really a big deal.

And from a maintenance standpoint, I really think it would be good to
try to walk away from those "case IORING_OP_xyz" things, and try to
split things up into more manageable pieces.

Hmm?

                Linus
On 5/23/22 1:41 PM, Linus Torvalds wrote:
> On Sun, May 22, 2022 at 2:26 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On top of the core io_uring changes, this pull request includes support
>> for the xattr variants.
>
> So I don't mind the code (having seen the earlier versions), but
> looking at this all I *do* end up reacting to this part:
>
>   [torvalds@ryzen linux]$ wc -l fs/io_uring.c
>   12744 fs/io_uring.c
>
> and no, this is not due to this xattr pull, but the xattr code did add
> another few hundred lines of "io_uring command boilerplate for another
> command" to this file that is a nasty file from hell.

I know, it really is a monster file at this point...

> I really think that it might be time to start thinking about splitting
> that io_uring.c file up. Make it a directory, and have the core
> command engine in io_uring/core.c, and then have the different actual
> IO_URING_OP_xyz handling in separate files.

I've been pondering that for a while actually, and yes, I agree it's
time to organize this a bit differently. When you are in this code all
the time you notice it less, as you know where everything is, but it
would be nice to take the time to split it into some manageable and
separately readable/maintainable pieces.

> And yes, that would probably necessitate making the OP handling use
> more of a dispatch table approach, but wouldn't that be good anyway?
> That io_uring.c file is starting to have a lot of *big* switch
> statements for the different cases.
>
> Wouldn't it be nice to have an "op descriptor array" instead of the
>
>   switch (req->opcode) {
>   ...
>   case IORING_OP_WRITE:
>           return io_prep_rw(req, sqe);
>   ...
>
> kind of tables?
>
> Yes, the compiler may end up generating a binary-tree
> compare-and-branch thing for a switch like that, and it might be
> better than an indirect branch in these days of spectre costs for
> branch prediction safety, but if we're talking a few tens of cycles
> per op, that's probably not really a big deal.
I was resistant to the indirect function call initially because of the
spectre overhead, but that was when the table was a lot smaller. The
tides may indeed have shifted now that the table has grown to the size
that it has. Plus we have both a prep handler and an issue handler for
each opcode, so you end up with two massive switches to deal with that.

> And from a maintenance standpoint, I really think it would be good to
> try to walk away from those "case IORING_OP_xyz" things, and try to
> split things up into more manageable pieces.
>
> Hmm?

As mentioned, it's something I have been thinking about myself for the
past few releases. It's not difficult work and can be done in a
sequential kind of manner, but it will add some pain in terms of
backports. Nothing _really_ major, but... Longer term it'll be nicer
for sure, which is the most important bit.

I've got some ideas on how to split the core bits, and the
related-ops-per-file kind of idea for the rest makes sense. Eg the net
related bits can go in one file, or maybe we can even go finer grained
and (almost) do it per-op.

I'll spend some time after the merge window to try and get this sorted
out.
The pull request you sent on Sun, 22 May 2022 15:26:02 -0600:
> git://git.kernel.dk/linux-block.git tags/for-5.19/io_uring-xattr-2022-05-22
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/09beaff75e4c4cee590b0e547c7587ff6cadb5db
Thank you!
On 5/23/22 1:59 PM, Jens Axboe wrote:
>> And from a maintenance standpoint, I really think it would be good to
>> try to walk away from those "case IORING_OP_xyz" things, and try to
>> split things up into more manageable pieces.
>>
>> Hmm?
>
> As mentioned, it's something I have been thinking about myself for the
> past few releases. It's not difficult work and can be done in a
> sequential kind of manner, but it will add some pain in terms of
> backports. Nothing _really_ major, but... Longer term it'll be nicer
> for sure, which is the most important bit.
>
> I've got some ideas on how to split the core bits, and the
> related-ops-per-file kind of idea for the rest makes sense. Eg the net
> related bits can go in one file, or maybe we can even go finer grained
> and (almost) do it per-op.
>
> I'll spend some time after the merge window to try and get this sorted
> out.

Well, I spent some time yesterday, and 53 patches later, I think it
looks pretty sane. At this point everything is split out except
read/write and poll handling. There are still some things we can split
out, like the io_uring_register() side, but as it stands, ~35% of the
code has been moved into separate files for either opcodes or
infrastructure that belongs together. We can definitely get this down
further, but it's a good start.

Anyway, I pitted -git against -git with this merged in, testing with
and without spectre mitigations, as we now have an indirect ->prep()
and ->issue() call for each opcode in the fast path. I ran my usual
peak testing, which is basically just polled async random reads from 3
devices. The box in question is a 12900K, so a recent CPU.
Mitigation status:

/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable: eIBRS with unprivileged eBPF

Out of 5 runs, here are the results:

-git, RETPOLINE=n, mitigations=off
Maximum IOPS=12837K
Maximum IOPS=12812K
Maximum IOPS=12834K
Maximum IOPS=12856K
Maximum IOPS=12809K

for-5.20/io_uring, RETPOLINE=n, mitigations=off
Maximum IOPS=12877K
Maximum IOPS=12870K
Maximum IOPS=12848K
Maximum IOPS=12837K
Maximum IOPS=12899K

-git, RETPOLINE=y, mitigations=on
Maximum IOPS=12869K
Maximum IOPS=12766K
Maximum IOPS=12817K
Maximum IOPS=12866K
Maximum IOPS=12828K

for-5.20/io_uring, RETPOLINE=y, mitigations=on
Maximum IOPS=12859K
Maximum IOPS=12921K
Maximum IOPS=12899K
Maximum IOPS=12859K
Maximum IOPS=12904K

If anything, the new code seems a smidge faster, both with
mitigations=off and with RETPOLINE=y && mitigations=on. At least from
this first test it looks promising, and it may be a viable path
forward to make this a bit saner.

If you're curious, the git tree is here:

https://git.kernel.dk/cgit/linux-block/log/?h=for-5.20/io_uring

with the first 5 patches being staged for 5.19 as well, as they are
just consistency cleanups.