Message ID | 20200704140250.423345-2-gregkh@linuxfoundation.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | readfile(2): a new syscall to make open/read/close faster | expand |
Hi Greg, On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > It's a tiny syscall, meant to allow a user to do a single "open this > file, read into this buffer, and close the file" all in a single shot. > > Should be good for reading "tiny" files like sysfs, procfs, and other > "small" files. > > There is no restarting the syscall, this is a "simple" syscall, with the > attempt to make reading "simple" files easier with less syscall > overhead. > > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Thanks for your patch! > --- a/fs/open.c > +++ b/fs/open.c > +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename, > + char __user *, buffer, size_t, bufsize, int, flags) > +{ > + struct open_flags op; > + struct open_how how; > + struct file *file; > + loff_t pos = 0; > + int retval; > + > + /* only accept a small subset of O_ flags that make sense */ > + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags) > + return -EINVAL; > + > + /* add some needed flags to be able to open the file properly */ > + flags |= O_RDONLY | O_LARGEFILE; > + > + how = build_open_how(flags, 0000); > + retval = build_open_flags(&how, &op); > + if (retval) > + return retval; > + > + file = readfile_open(dfd, filename, &op); > + if (IS_ERR(file)) > + return PTR_ERR(file); > + > + retval = vfs_read(file, buffer, bufsize, &pos); Should there be a way for the user to be informed that the file doesn't fit in the provided buffer (.e.g. -EFBIG)? > + > + filp_close(file, NULL); > + > + return retval; > +} Gr{oetje,eeting}s, Geert
On Sat, Jul 04, 2020 at 04:02:47PM +0200, Greg Kroah-Hartman wrote: > + /* only accept a small subset of O_ flags that make sense */ > + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags) > + return -EINVAL; > + > + /* add some needed flags to be able to open the file properly */ > + flags |= O_RDONLY | O_LARGEFILE; Should we also add O_NOCTTY to prevent any problems if this is called on a tty?
On Sat, Jul 4, 2020 at 4:03 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > It's a tiny syscall, meant to allow a user to do a single "open this > file, read into this buffer, and close the file" all in a single shot. > > Should be good for reading "tiny" files like sysfs, procfs, and other > "small" files. > > There is no restarting the syscall, this is a "simple" syscall, with the > attempt to make reading "simple" files easier with less syscall > overhead. > > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > --- > fs/open.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 50 insertions(+) > > diff --git a/fs/open.c b/fs/open.c > index 6cd48a61cda3..4469faa9379c 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -1370,3 +1370,53 @@ int stream_open(struct inode *inode, struct file *filp) > } > > EXPORT_SYMBOL(stream_open); > + > +static struct file *readfile_open(int dfd, const char __user *filename, > + struct open_flags *op) > +{ > + struct filename *tmp; > + struct file *f; > + > + tmp = getname(filename); > + if (IS_ERR(tmp)) > + return (struct file *)tmp; > + > + f = do_filp_open(dfd, tmp, op); > + if (!IS_ERR(f)) > + fsnotify_open(f); > + > + putname(tmp); > + return f; > +} > + > +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename, > + char __user *, buffer, size_t, bufsize, int, flags) > +{ > + struct open_flags op; > + struct open_how how; > + struct file *file; > + loff_t pos = 0; > + int retval; > + > + /* only accept a small subset of O_ flags that make sense */ > + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags) > + return -EINVAL; > + > + /* add some needed flags to be able to open the file properly */ > + flags |= O_RDONLY | O_LARGEFILE; > + > + how = build_open_how(flags, 0000); > + retval = build_open_flags(&how, &op); > + if (retval) > + return retval; > + > + file = readfile_open(dfd, filename, &op); > + if (IS_ERR(file)) > + return PTR_ERR(file); > + > + retval = vfs_read(file, buffer, bufsize, &pos); > + > + filp_close(file, NULL); > + > + return retval; Manpage says: "doing the sequence of open() and then read() and then close()", which is exactly what it does. But then it goes on to say: "If the file is larger than the value provided in count then only count number of bytes will be copied into buf", which is only half true, it should be: "If the file is larger than the value provided in count then at most count number of bytes will be copied into buf", which is not a lot of information. And "If the size of file is smaller than the value provided in count then the whole file will be copied into buf", which is simply a lie; for example seq_file will happily return a smaller-than-PAGE_SIZE chunk if at least one record fits in there. You'll have a very hard time explaining that in the man page. So I think there are two possible ways forward: 1) just leave the first explanation (it's an open + read + close equivalent) and leave out the rest 2) add a loop around the vfs_read() in the code. I'd strongly prefer #2 because with the non-looping read it's impossible to detect whether the file was completely read or not, and that's just going to lead to surprises and bugs in userspace code. Thanks, Miklos
On Sat, Jul 04, 2020 at 09:41:09PM +0200, Miklos Szeredi wrote: > And "If the size of file is smaller than the value provided in count > then the whole file will be copied into buf", which is simply a lie; > for example seq_file will happily return a smaller-than-PAGE_SIZE > chunk if at least one record fits in there. You'll have a very hard > time explaining that in the man page. So I think there are two > possible ways forward: > > 1) just leave the first explanation (it's an open + read + close > equivalent) and leave out the rest > > 2) add a loop around the vfs_read() in the code. 3) don't bother with the entire thing, until somebody manages to demonstrate a setup where it does make a real difference (compared to than the obvious sequence of syscalls, that is). At which point we'll need to figure out what's going on and deal with the underlying problem of that setup.
On Sat, Jul 04, 2020 at 09:12:06PM +0100, Al Viro wrote: > On Sat, Jul 04, 2020 at 09:41:09PM +0200, Miklos Szeredi wrote: > > And "If the size of file is smaller than the value provided in count > > then the whole file will be copied into buf", which is simply a lie; > > for example seq_file will happily return a smaller-than-PAGE_SIZE > > chunk if at least one record fits in there. You'll have a very hard > > time explaining that in the man page. So I think there are two > > possible ways forward: > > > > 1) just leave the first explanation (it's an open + read + close > > equivalent) and leave out the rest > > > > 2) add a loop around the vfs_read() in the code. > > 3) don't bother with the entire thing, until somebody manages to demonstrate > a setup where it does make a real difference (compared to than the obvious > sequence of syscalls, that is). At which point we'll need to figure out > what's going on and deal with the underlying problem of that setup. Incidentally, if that's intended for use on _sysfs_, I would like to see the effects of that sucker being called by many processes in parallel, seeing that sysfs has, er, certain scalability problems in its lookups. And I would be very surprised if they were not heavier than said overhead of two extra syscalls.
diff --git a/fs/open.c b/fs/open.c index 6cd48a61cda3..4469faa9379c 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1370,3 +1370,53 @@ int stream_open(struct inode *inode, struct file *filp) } EXPORT_SYMBOL(stream_open); + +static struct file *readfile_open(int dfd, const char __user *filename, + struct open_flags *op) +{ + struct filename *tmp; + struct file *f; + + tmp = getname(filename); + if (IS_ERR(tmp)) + return (struct file *)tmp; + + f = do_filp_open(dfd, tmp, op); + if (!IS_ERR(f)) + fsnotify_open(f); + + putname(tmp); + return f; +} + +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename, + char __user *, buffer, size_t, bufsize, int, flags) +{ + struct open_flags op; + struct open_how how; + struct file *file; + loff_t pos = 0; + int retval; + + /* only accept a small subset of O_ flags that make sense */ + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags) + return -EINVAL; + + /* add some needed flags to be able to open the file properly */ + flags |= O_RDONLY | O_LARGEFILE; + + how = build_open_how(flags, 0000); + retval = build_open_flags(&how, &op); + if (retval) + return retval; + + file = readfile_open(dfd, filename, &op); + if (IS_ERR(file)) + return PTR_ERR(file); + + retval = vfs_read(file, buffer, bufsize, &pos); + + filp_close(file, NULL); + + return retval; +}
It's a tiny syscall, meant to allow a user to do a single "open this file, read into this buffer, and close the file" all in a single shot. Should be good for reading "tiny" files like sysfs, procfs, and other "small" files. There is no restarting the syscall, this is a "simple" syscall, with the attempt to make reading "simple" files easier with less syscall overhead. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- fs/open.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)