Message ID | 20241110152906.1747545-1-axboe@kernel.dk (mailing list archive) |
---|---|
Series | Uncached buffered IO |
On Sun, Nov 10, 2024 at 08:27:52AM -0700, Jens Axboe wrote:
> 5 years ago I posted patches adding support for RWF_UNCACHED, as a way
> to do buffered IO that isn't page cache persistent. The approach back
> then was to have private pages for IO, and then get rid of them once IO
> was done. But that then runs into all the issues that O_DIRECT has, in
> terms of synchronizing with the page cache.

Today's a holiday, and I suspect you're going to do a v3 before I have
a chance to do a proper review of this version of the series.

I think "uncached" isn't quite the right word. Perhaps 'RWF_STREAMING'
so that userspace is indicating that this is a streaming I/O and the
kernel gets to choose what to do with that information.

Also, do we want to fail I/Os to filesystems which don't support
it? I suppose really sophisticated userspace might fall back to
madvise(DONTNEED), but isn't most userspace going to just clear the flag
and retry the I/O?

Um. Now I've looked, we also have posix_fadvise(POSIX_FADV_NOREUSE),
which is currently a noop. But would we be better off honouring
POSIX_FADV_NOREUSE than introducing RWF_UNCACHED? I'll think about this
some more while I'm offline.
On 11/11/24 10:25 AM, Matthew Wilcox wrote:
> On Sun, Nov 10, 2024 at 08:27:52AM -0700, Jens Axboe wrote:
>> 5 years ago I posted patches adding support for RWF_UNCACHED, as a way
>> to do buffered IO that isn't page cache persistent. The approach back
>> then was to have private pages for IO, and then get rid of them once IO
>> was done. But that then runs into all the issues that O_DIRECT has, in
>> terms of synchronizing with the page cache.
>
> Today's a holiday, and I suspect you're going to do a v3 before I have
> a chance to do a proper review of this version of the series.

Probably, since I've done some fixes since v2 :-). So you can wait for
v3, I'll post it later today anyway.

> I think "uncached" isn't quite the right word. Perhaps 'RWF_STREAMING'
> so that userspace is indicating that this is a streaming I/O and the
> kernel gets to choose what to do with that information.

Yeah not sure, it's the one I used back in the day, and I still haven't
found a more descriptive word for it. That doesn't mean one doesn't
exist, certainly taking suggestions. I don't think STREAMING is the
right one however, you could most certainly be doing random uncached IO.

> Also, do we want to fail I/Os to filesystems which don't support
> it? I suppose really sophisticated userspace might fall back to
> madvise(DONTNEED), but isn't most userspace going to just clear the flag
> and retry the I/O?

Also something that's a bit undecided, you can make arguments for both
ways. For just ignoring the flag if not supported, the argument would be
that the application just wants to do IO, uncached if available. For the
other argument, maybe you have an application that wants to fall back to
O_DIRECT if uncached isn't available. That application certainly wants
to know if it works or not.

Which is why I defaulted to return -EOPNOTSUPP if it's not available.
An application may probe this upfront if it so desires, and just not set
the flag for IO. That'd keep it out of the hot path.

Seems to me that returning whether it's supported or not is the path of
least surprises for applications, which is why I went that way.

> Um. Now I've looked, we also have posix_fadvise(POSIX_FADV_NOREUSE),
> which is currently a noop. But would we be better off honouring
> POSIX_FADV_NOREUSE than introducing RWF_UNCACHED? I'll think about this
> some more while I'm offline.

That would certainly work too, for synchronous IO. But per-file hints
are a bad idea for async IO, for obvious reasons. We really want per-IO
hints for that, we have a long history of messing that up.

That doesn't mean that FMODE_NOREUSE couldn't just set RWF_UNCACHED, if
it's set. That'd be trivial. Then the next question is if setting
POSIX_FADV_NOREUSE should fail if file->f_op->fop_flags & FOP_UNCACHED
isn't true. Probably not, since it'd potentially break applications. So
probably best to just set f_iocb_flags IFF FOP_UNCACHED is true for that
file.

And the bigger question is why on earth do we have this thing in the
kernel that doesn't do anything... But yeah, now we could make it do
something.
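For readers following along, the probe-and-fall-back behaviour discussed above could look roughly like this from userspace. This is only a sketch under assumptions: the numeric value given to RWF_UNCACHED below is a placeholder (the real value comes from the uapi headers of a kernel carrying the series), and the helper names pread_uncached_or_buffered() and demo_roundtrip() are made up for illustration. The only behaviour relied on is the one stated in the thread: the IO fails with EOPNOTSUPP when the flag is not supported.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * ASSUMPTION: placeholder value for illustration only. Use the definition
 * from the uapi headers of a kernel that actually carries the series.
 */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED 0x00000080
#endif

/*
 * Hypothetical helper: try an uncached read first. If the kernel or the
 * filesystem rejects the flag with EOPNOTSUPP, remember that in
 * *try_uncached and retry as a plain buffered read, so later calls skip
 * the flag and the probe stays out of the hot path.
 */
ssize_t pread_uncached_or_buffered(int fd, void *buf, size_t len,
                                   off_t off, bool *try_uncached)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    assert(buf != NULL && try_uncached != NULL);

    if (*try_uncached) {
        ssize_t ret = preadv2(fd, &iov, 1, off, RWF_UNCACHED);

        if (ret >= 0 || errno != EOPNOTSUPP)
            return ret;
        *try_uncached = false;  /* not supported here: stop asking */
    }
    return preadv2(fd, &iov, 1, off, 0);
}

/* Round-trip self-check on a temporary file; returns 0 on success. */
int demo_roundtrip(void)
{
    char path[] = "/tmp/uncached-demo-XXXXXX";
    char buf[8] = { 0 };
    bool try_uncached = true;
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) == 5 &&
        pread_uncached_or_buffered(fd, buf, 5, 0, &try_uncached) == 5 &&
        memcmp(buf, "hello", 5) == 0)
        rc = 0;
    close(fd);
    unlink(path);
    return rc;
}
```

An application that would rather switch to O_DIRECT than to plain buffered IO could inspect *try_uncached after the first call and reopen the file accordingly; that choice is exactly what returning -EOPNOTSUPP leaves open.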
On Mon, Nov 11, 2024 at 10:25 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Nov 10, 2024 at 08:27:52AM -0700, Jens Axboe wrote:
> > 5 years ago I posted patches adding support for RWF_UNCACHED, as a way
> > to do buffered IO that isn't page cache persistent. The approach back
> > then was to have private pages for IO, and then get rid of them once IO
> > was done. But that then runs into all the issues that O_DIRECT has, in
> > terms of synchronizing with the page cache.
>
> Today's a holiday, and I suspect you're going to do a v3 before I have
> a chance to do a proper review of this version of the series.
>
> I think "uncached" isn't quite the right word. Perhaps 'RWF_STREAMING'
> so that userspace is indicating that this is a streaming I/O and the
> kernel gets to choose what to do with that information.
>
> Also, do we want to fail I/Os to filesystems which don't support
> it? I suppose really sophisticated userspace might fall back to
> madvise(DONTNEED), but isn't most userspace going to just clear the flag
> and retry the I/O?
>
> Um. Now I've looked, we also have posix_fadvise(POSIX_FADV_NOREUSE),
> which is currently a noop.

Just to clarify that NOREUSE is NOT a noop since commit 17e8102 ("mm:
support POSIX_FADV_NOREUSE"). And it had at least one user (we made the
userspace change after the kernel supported it): SVT-AV1 [1]; it was
also added to FIO for testing purposes.

[1] https://gitlab.com/AOMediaCodec/SVT-AV1

> But would we be better off honouring
> POSIX_FADV_NOREUSE than introducing RWF_UNCACHED? I'll think about this
> some more while I'm offline.

But I guess the flag isn't honored the way UNCACHED works?
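For contrast with a per-IO flag, the per-file hint mentioned here is set once on the descriptor and then applies to whatever IO follows. A minimal sketch, assuming a hypothetical helper named open_noreuse(); posix_fadvise() and POSIX_FADV_NOREUSE are the standard interfaces, but what the kernel does with the hint depends on the kernel version, as the thread itself notes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: open a file read-only and mark the whole file
 * (offset 0, length 0 meaning "to end of file") as NOREUSE. The hint is
 * advisory, so a failure to apply it is reported through *advice_err
 * rather than failing the open. posix_fadvise() returns an error number
 * directly (0 on success) and does not set errno.
 */
int open_noreuse(const char *path, int *advice_err)
{
    int fd, err;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    err = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    if (advice_err != NULL)
        *advice_err = err;
    return fd;
}
```

Because the hint lives on the open file rather than on each request, every reader sharing that descriptor gets the same treatment, which is the limitation raised earlier in the thread for async IO.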
On Mon, Nov 11, 2024 at 02:24:54PM -0700, Yu Zhao wrote:
> Just to clarify that NOREUSE is NOT a noop since commit 17e8102 ("mm:
maybe you should send a patch to the manpage?
On Mon, Nov 11, 2024 at 2:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Nov 11, 2024 at 02:24:54PM -0700, Yu Zhao wrote:
> > Just to clarify that NOREUSE is NOT a noop since commit 17e8102 ("mm:
>
> maybe you should send a patch to the manpage?

I was under the impression that our engineers took care of that. But
apparently it's still pending:
https://lore.kernel.org/linux-man/20230320222057.1976956-1-talumbau@google.com/

Will find someone else to follow up on that.