Message ID | 20210910182536.685100-1-axboe@kernel.dk (mailing list archive) |
---|---|
Headers | show |
Series | Add ability to save/restore iov_iter state | expand |
On 9/10/21 12:25 PM, Jens Axboe wrote: > Hi, > > Linus didn't particularly love the iov_iter->truncated addition and how > it was used, and it was hard to disagree with that. Instead of relying > on tracking ->truncated, add a few pieces of state so we can safely > handle partial or errored read/write attempts (that we want to retry). > > Then we can get rid of the iov_iter addition, and at the same time > handle cases that weren't handled correctly before. > > I've run this through vectored read/write with io_uring on the commonly > problematic cases (dm and low depth SCSI device) which trigger these > conditions often, and it seems to pass muster. > > For a discussion on this topic, see the thread here: > > https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ > > You can find these patches here: > > https://git.kernel.dk/cgit/linux-block/log/?h=iov_iter Al, Linus, are you OK with this? I think we should get this in for 5.15. I didn't resend the whole series, just a v2 of patch 1/3 to fix that bvec vs iovec issue. Let me know if you want the while thing resent.
On Mon, Sep 13, 2021 at 3:43 PM Jens Axboe <axboe@kernel.dk> wrote: > > Al, Linus, are you OK with this? I think we should get this in for 5.15. > I didn't resend the whole series, just a v2 of patch 1/3 to fix that bvec > vs iovec issue. Let me know if you want the while thing resent. So I'm ok with the iov_iter side, but the io_uring side seems still positively buggy, and very confused. It also messes with the state in bad ways and has internal knowledge. And some of it looks completely bogus. For example, I see state->count -= ret; rw->bytes_done += ret; and I go "that's BS". There's no way it's ok to start messing with the byte count inside the state like that. That just means that the state is now no longer the saved state, and it's some random garbage. I also think that the "bytes_done += ret" is a big hint there: any time you restore the iovec state, you should then forward it by "bytes_done". But that's not what the code does. Instead, it will now restore the iovec styate with the *wrong* number of bytes remaining, but will start from the beginning of the iovec. So I think the fs/io_uring.c use of this state buffer is completely wrong. What *may* be the right thing to do is to (a) not mess with state->count (b) when you restore the state you always use iov_iter_restore(iter, state, bytes_done); to actually restore the *correct* state. Because modifying the iovec save state like that cannot be right, and if it's right it's still too ugly and fragile for words. That save state should be treated as a snapshot, not as a random buffer that you can make arbitrary changes to. See what I'm saying? I'd like Al to take a look at the io_uring.c usage too, since this was just my reaction from looking at that diff a bit more. Linus
On 9/13/21 5:23 PM, Linus Torvalds wrote: > On Mon, Sep 13, 2021 at 3:43 PM Jens Axboe <axboe@kernel.dk> wrote: >> >> Al, Linus, are you OK with this? I think we should get this in for 5.15. >> I didn't resend the whole series, just a v2 of patch 1/3 to fix that bvec >> vs iovec issue. Let me know if you want the while thing resent. > > So I'm ok with the iov_iter side, but the io_uring side seems still > positively buggy, and very confused. > > It also messes with the state in bad ways and has internal knowledge. > And some of it looks completely bogus. > > For example, I see > > state->count -= ret; > rw->bytes_done += ret; > > and I go "that's BS". There's no way it's ok to start messing with the > byte count inside the state like that. That just means that the state > is now no longer the saved state, and it's some random garbage. > > I also think that the "bytes_done += ret" is a big hint there: any > time you restore the iovec state, you should then forward it by > "bytes_done". But that's not what the code does. > > Instead, it will now restore the iovec styate with the *wrong* number > of bytes remaining, but will start from the beginning of the iovec. > > So I think the fs/io_uring.c use of this state buffer is completely wrong. > > What *may* be the right thing to do is to > > (a) not mess with state->count > > (b) when you restore the state you always use > > iov_iter_restore(iter, state, bytes_done); > > to actually restore the *correct* state. > > Because modifying the iovec save state like that cannot be right, and > if it's right it's still too ugly and fragile for words. That save > state should be treated as a snapshot, not as a random buffer that you > can make arbitrary changes to. > > See what I'm saying? OK, for the do while loop itself, I do think we should be more consistent and that would also get rid of the state->count manipulation. I do agree that messing with that state is not something that should be done, and we can do this more cleanly and consistently instead. Once we hit the do {} while loop, state should be &rw->state and we can consistently handle it that way. Let me rework that bit and run the tests, and I'll post a v2 tomorrow. Thanks for taking a closer look.