Message ID | m1prfj5qxp.fsf@fess.ebiederm.org (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is > always ready for I/O and return -EIO from all other operations. I think read should return -EIO too. If a program is reading from a /proc file (say), and the thing it's reading suddenly disappears, EOF gives the false impression that it's read to the end of formatted data from that file and it can process the data as if it's complete, which is wrong. -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jamie Lokier <jamie@shareable.org> writes: >> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is >> always ready for I/O and return -EIO from all other operations. > > I think read should return -EIO too. If a program is reading from a > /proc file (say), and the thing it's reading suddenly disappears, EOF > gives the false impression that it's read to the end of formatted data > from that file and it can process the data as if it's complete, which > is wrong. Good point EIO is the current read return value for a removed proc file. For closed pipes, and hung up ttys the read return value is 0, and from my reading that is what bsd returns after a sys_revoke. The reason I have f_op settable is because I never expected complete agreement on the return codes, and because it makes auditing and spotting this kind of thing easier. I guess I should make two variations on revoked_file_ops then. Say eof_file_ops, eio_file_ops. Identical except for their treatment of reads. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman wrote: > >> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is > >> always ready for I/O and return -EIO from all other operations. > > > > I think read should return -EIO too. If a program is reading from a > > /proc file (say), and the thing it's reading suddenly disappears, EOF > > gives the false impression that it's read to the end of formatted data > > from that file and it can process the data as if it's complete, which > > is wrong. > > Good point EIO is the current read return value for a removed proc file. > > For closed pipes, and hung up ttys the read return value is 0, and from > my reading that is what bsd returns after a sys_revoke. A few suggestions below. Feel free to ignore them on account of the basic revoking functionality being more important :-) I'm not sure a revoked pipe should look like a normally closed one. ECONNRESET? For hung up ttys, I agree. But where's the SIGHUP :-) You probably do want the process using it to die if it's not handling SIGHUP, because terminal-using processes don't always terminate themselves on EOF. For things writing to a pipe or file, SIGPIPE may be appropriate in addition to EIO, to avoid runaway processes. Looks odd I know. For writing to a terminal, SIGHUP again. > The reason I have f_op settable is because I never expected complete > agreement on the return codes, and because it makes auditing and spotting > this kind of thing easier. > > I guess I should make two variations on revoked_file_ops then. Say > eof_file_ops, eio_file_ops. Identical except for their treatment of > reads. Fair enough. It's good to have good defaults. I'm not convinced eof_file_ops is ever a good default. sighup_file_ops and sigpipe_file_ops maybe :-) -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ebiederm@xmission.com (Eric W. Biederman) writes: > Jamie Lokier <jamie@shareable.org> writes: > >>> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is >>> always ready for I/O and return -EIO from all other operations. >> >> I think read should return -EIO too. If a program is reading from a >> /proc file (say), and the thing it's reading suddenly disappears, EOF >> gives the false impression that it's read to the end of formatted data >> from that file and it can process the data as if it's complete, which >> is wrong. I just thought about that some more and I am not convinced. In general the current return values from proc after an I/O operation are suspect. seek returns -EINVAL instead of -EIO. poll returns DEFAULT_POLLMASK (which doesn't set POLLERR). So I am not convinced that the existing proc return values on error are correct, and they are recent additions so the historical precedent is not especially large. EOF does give the impression that you have read all of the data from the /proc file, and that is in fact the case. There is no more data coming from that proc file. That the data is stale is well know. That the data is not atomic, anything that spans more than a single read is not atomic. So I don't see what returning EIO adds to the equation. Perhaps that your fragile user space string parser may break? EOF gives a clear indication the application should stop reading the data, because there is no more. EIO only says that the was a problem. I don't know of anything that depends on the rmmod behavior either way. But if we can get away with it I would like to use something that is generally useful instead of something that only makes sense in the context of proc. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman wrote: > I just thought about that some more and I am not convinced. > > In general the current return values from proc after an I/O operation > are suspect. seek returns -EINVAL instead of -EIO. poll returns > DEFAULT_POLLMASK (which doesn't set POLLERR). So I am not convinced > that the existing proc return values on error are correct, and they > are recent additions so the historical precedent is not especially > large. > > EOF does give the impression that you have read all of the data from > the /proc file, and that is in fact the case. There is no more > data coming from that proc file. > > That the data is stale is well know. > > That the data is not atomic, anything that spans more than a single > read is not atomic. > > So I don't see what returning EIO adds to the equation. Perhaps > that your fragile user space string parser may break? > > EOF gives a clear indication the application should stop reading > the data, because there is no more. > > EIO only says that the was a problem. > > I don't know of anything that depends on the rmmod behavior either > way. But if we can get away with it I would like to use something > that is generally useful instead of something that only makes > sense in the context of proc. I'm not thinking of proc, really. More thinking of applications: EOF effectively means "whole file read without error - now do the next thing". If a filesystem file is revoked (umount -f), you definitely want to stop that Makefile which is copying a file from the unmounted filesystem to a target file. Otherwise you get inconsistent states which can only occur as a result of this umount -f, something Makefiles should never have to care about. rmmod behaviour is not something any app should see normally. Unexpected behaviour when files are oddly truncated (despite never being written that way) is not "fragile user space". So whatever it returns, it should be some error code, imho. -- Jamie -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jamie Lokier <jamie@shareable.org> writes: > Eric W. Biederman wrote: >> >> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is >> >> always ready for I/O and return -EIO from all other operations. >> > >> > I think read should return -EIO too. If a program is reading from a >> > /proc file (say), and the thing it's reading suddenly disappears, EOF >> > gives the false impression that it's read to the end of formatted data >> > from that file and it can process the data as if it's complete, which >> > is wrong. >> >> Good point EIO is the current read return value for a removed proc file. >> >> For closed pipes, and hung up ttys the read return value is 0, and from >> my reading that is what bsd returns after a sys_revoke. > > A few suggestions below. Feel free to ignore them on account of the > basic revoking functionality being more important :-) I think I will. This seems to be the part of the code that is easily approachable and it is going to be easy to have different opinions on, and there is no one right answer. For now I'm just going to pick my best understanding of what BSD did. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jamie Lokier <jamie@shareable.org> writes: > Eric W. Biederman wrote: >> I just thought about that some more and I am not convinced. >> >> In general the current return values from proc after an I/O operation >> are suspect. seek returns -EINVAL instead of -EIO. poll returns >> DEFAULT_POLLMASK (which doesn't set POLLERR). So I am not convinced >> that the existing proc return values on error are correct, and they >> are recent additions so the historical precedent is not especially >> large. >> >> EOF does give the impression that you have read all of the data from >> the /proc file, and that is in fact the case. There is no more >> data coming from that proc file. >> >> That the data is stale is well know. >> >> That the data is not atomic, anything that spans more than a single >> read is not atomic. >> >> So I don't see what returning EIO adds to the equation. Perhaps >> that your fragile user space string parser may break? >> >> EOF gives a clear indication the application should stop reading >> the data, because there is no more. >> >> EIO only says that the was a problem. >> >> I don't know of anything that depends on the rmmod behavior either >> way. But if we can get away with it I would like to use something >> that is generally useful instead of something that only makes >> sense in the context of proc. > > I'm not thinking of proc, really. More thinking of applications: EOF > effectively means "whole file read without error - now do the next thing". > > If a filesystem file is revoked (umount -f), you definitely want to > stop that Makefile which is copying a file from the unmounted > filesystem to a target file. Otherwise you get inconsistent states > which can only occur as a result of this umount -f, something > Makefiles should never have to care about. > > rmmod behaviour is not something any app should see normally. > Unexpected behaviour when files are oddly truncated (despite never > being written that way) is not "fragile user space". So whatever it > returns, it should be some error code, imho. Well I just took a look at NetBSD 4.0.1 and it appears they agree with you. Plus I'm starting to feel a lot better about the linux manual pages, as the revoke(2) man pages from the BSDs describe different error codes than the implementation, and they fail to mention revoke appears to work on ordinary files as well. If the file is not a tty EIO is returned from read. opens return ENXIO writes return EIO ioctl returns EBADF close returns 0 Operations that just lookup the vnode simply return EBADF. I don't know if that is perfectly correct for the linux case. EBADF usually means the file descriptor specified isn't open. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/Makefile b/fs/Makefile index af6d047..7787ddd 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -11,7 +11,7 @@ obj-y := open.o read_write.o file_table.o super.o \ attr.o bad_inode.o file.o filesystems.o namespace.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o drop_caches.o splice.o sync.o utimes.o \ - stack.o fs_struct.o + stack.o fs_struct.o revoked_file.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o diff --git a/fs/revoked_file.c b/fs/revoked_file.c new file mode 100644 index 0000000..9936693 --- /dev/null +++ b/fs/revoked_file.c @@ -0,0 +1,181 @@ +/* + * linux/fs/revoked_file.c + * + * Copyright (C) 1997, Stephen Tweedie + * + * Provide stub functions for unreadable inodes + * + * Fabian Frederick : August 2003 - All file operations assigned to EIO + * + * Eric Biederman : 8 April 2008 - Derivied from bad_inode.c + */ + +#include <linux/fs.h> +#include <linux/module.h> +#include <linux/stat.h> +#include <linux/time.h> +#include <linux/namei.h> +#include <linux/poll.h> + +static loff_t revoked_file_llseek(struct file *file, loff_t offset, int origin) +{ + return -EIO; +} + +static ssize_t revoked_file_read(struct file *filp, char __user *buf, + size_t size, loff_t *ppos) +{ + return 0; +} + +static ssize_t revoked_file_write(struct file *filp, const char __user *buf, + size_t siz, loff_t *ppos) +{ + return -EIO; +} + +static ssize_t revoked_file_aio_read(struct kiocb *iocb, const struct iovec *iov, + unsigned long nr_segs, loff_t pos) +{ + return 0; +} + +static ssize_t revoked_file_aio_write(struct kiocb *iocb, const struct iovec *iov, + unsigned long nr_segs, loff_t pos) +{ + return -EIO; +} + +static int revoked_file_readdir(struct file *filp, void *dirent, filldir_t filldir) +{ + return -EIO; +} + +static unsigned int revoked_file_poll(struct file *filp, poll_table *wait) +{ + return POLLIN | POLLOUT | POLLERR | POLLRDNORM | POLLWRNORM; +} + +static int revoked_file_ioctl (struct inode *inode, struct file *filp, + unsigned int cmd, unsigned long arg) +{ + return -EIO; +} + +static long revoked_file_unlocked_ioctl(struct file *file, unsigned cmd, + unsigned long arg) +{ + return -EIO; +} + +static long revoked_file_compat_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + return -EIO; +} + +static int revoked_file_mmap(struct file *file, struct vm_area_struct *vma) +{ + return -EIO; +} + +static int revoked_file_open(struct inode *inode, struct file *filp) +{ + return -EIO; +} + +static int revoked_file_flush(struct file *file, fl_owner_t id) +{ + return 0; +} + +static int revoked_file_release(struct inode *inode, struct file *filp) +{ + return 0; +} + +static int revoked_file_fsync(struct file *file, struct dentry *dentry, + int datasync) +{ + return -EIO; +} + +static int revoked_file_aio_fsync(struct kiocb *iocb, int datasync) +{ + return -EIO; +} + +static int revoked_file_fasync(int fd, struct file *filp, int on) +{ + return -EIO; +} + +static int revoked_file_lock(struct file *file, int cmd, struct file_lock *fl) +{ + return -EIO; +} + +static ssize_t revoked_file_sendpage(struct file *file, struct page *page, + int off, size_t len, loff_t *pos, int more) +{ + return -EIO; +} + +static unsigned long revoked_file_get_unmapped_area(struct file *file, + unsigned long addr, unsigned long len, + unsigned long pgoff, unsigned long flags) +{ + return -EIO; +} + +static int revoked_file_check_flags(int flags) +{ + return -EIO; +} + +static int revoked_file_flock(struct file *filp, int cmd, struct file_lock *fl) +{ + return -EIO; +} + +static ssize_t revoked_file_splice_write(struct pipe_inode_info *pipe, + struct file *out, loff_t *ppos, size_t len, + unsigned int flags) +{ + return -EIO; +} + +static ssize_t revoked_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + return -EIO; +} + +const struct file_operations revoked_file_ops = +{ + .llseek = revoked_file_llseek, + .read = revoked_file_read, + .write = revoked_file_write, + .aio_read = revoked_file_aio_read, + .aio_write = revoked_file_aio_write, + .readdir = revoked_file_readdir, + .poll = revoked_file_poll, + .ioctl = revoked_file_ioctl, + .unlocked_ioctl = revoked_file_unlocked_ioctl, + .compat_ioctl = revoked_file_compat_ioctl, + .mmap = revoked_file_mmap, + .open = revoked_file_open, + .flush = revoked_file_flush, + .release = revoked_file_release, + .fsync = revoked_file_fsync, + .aio_fsync = revoked_file_aio_fsync, + .fasync = revoked_file_fasync, + .lock = revoked_file_lock, + .sendpage = revoked_file_sendpage, + .get_unmapped_area = revoked_file_get_unmapped_area, + .check_flags = revoked_file_check_flags, + .flock = revoked_file_flock, + .splice_write = revoked_file_splice_write, + .splice_read = revoked_file_splice_read, +}; diff --git a/include/linux/fs.h b/include/linux/fs.h index a82a2ea..2fb0871 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -896,6 +896,8 @@ extern int fops_substitute(struct file *file, const struct file_operations *f_op extern void inode_fops_substitute(struct inode *inode, const struct file_operations *f_op, struct vm_operations_struct *vm_ops); +extern const struct file_operations revoked_file_ops; + extern struct mutex files_lock; #define file_list_lock() mutex_lock(&files_lock); #define file_list_unlock() mutex_unlock(&files_lock);
revoked_file_ops is a set of file operations designed to be used when a files backing store has been removed. revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is always ready for I/O and return -EIO from all other operations. This is designed to allow userspace to gracefully file descriptors that enter this unusable state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/Makefile | 2 +- fs/revoked_file.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 + 3 files changed, 184 insertions(+), 1 deletions(-) create mode 100644 fs/revoked_file.c