Message ID | 1421658911-18671-1-git-send-email-tao.peng@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Tao, On Mon, Jan 19, 2015 at 4:15 AM, Peng Tao <tao.peng@primarydata.com> wrote: > Running xfstest generic/036, we hit VM_BUG_ON() in nfs_direct_IO(). > 036 toggles O_DIRECT flag while IO is going on. We cannot simply remove > the VM_BUG_ON() there because we'll have a deadlock in the code path. > inode->i_mutex is taken when calling into ->direct_IO. > And nfs_file_direct_write() would grab inode->i_mutex again. > > nfs_file_write() and generic_file_write_iter() checks for O_DIRECT twice, > and it creates a race window if user space is playing with O_DIRECT flag. > Fix it by calling generic_perform_write() instead, so that nfs_direct_IO() > is only invoked in swap on nfs case. > > Suggested-by: Christoph Hellwig <hch@infradead.org> > Signed-off-by: Peng Tao <tao.peng@primarydata.com> > --- > fs/nfs/file.c | 41 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 40 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index 2ab6f00..e98604a 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -662,6 +662,45 @@ static int nfs_need_sync_write(struct file *filp, struct inode *inode) > return 0; > } > > +static ssize_t nfs_file_buffer_write(struct kiocb *iocb, struct iov_iter *from) > +{ > + struct file *file = iocb->ki_filp; > + struct address_space *mapping = file->f_mapping; > + struct inode *inode = mapping->host; > + ssize_t result = 0; > + size_t count = iov_iter_count(from); > + loff_t pos = iocb->ki_pos; > + int ret; > + > + mutex_lock(&inode->i_mutex); > + /* We can write back this queue in page reclaim */ > + current->backing_dev_info = mapping->backing_dev_info; > + ret = generic_write_checks(file, &pos, &count, 0); > + if (ret) > + goto out; > + > + if (!count) > + goto out; > + > + iov_iter_truncate(from, count); > + ret = file_remove_suid(file); > + if (ret) > + goto out; > + > + ret = file_update_time(file); > + if (ret) > + goto out; > + > + result = generic_perform_write(file, from, pos); > + if (likely(result >= 0)) > + iocb->ki_pos = pos + result; > + > +out: > + current->backing_dev_info = NULL; > + mutex_unlock(&inode->i_mutex); > + return result ? result : ret; > +} > + > ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) > { > struct file *file = iocb->ki_filp; > @@ -697,7 +736,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) > if (!count) > goto out; > > - result = generic_file_write_iter(iocb, from); > + result = nfs_file_buffer_write(iocb, from); > if (result > 0) > written = result; > I still haven't got an answer to my question as to why we need to do all this if we don't actually want anything other than the swapfile code to use nfs_direct_IO(}? The xfstest with toggling O_DIRECT at random doesn't reflect any sane behaviour that we want to support; all we want to do is ensure that it doesn't deadlock. Why wouldn't the proposal that we add a line to return '0' if !IS_SWAPFILE(inode) suffice? Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Trond, On Mon, Jan 19, 2015 at 9:17 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > Hi Tao, > > On Mon, Jan 19, 2015 at 4:15 AM, Peng Tao <tao.peng@primarydata.com> wrote: >> Running xfstest generic/036, we hit VM_BUG_ON() in nfs_direct_IO(). >> 036 toggles O_DIRECT flag while IO is going on. We cannot simply remove >> the VM_BUG_ON() there because we'll have a deadlock in the code path. >> inode->i_mutex is taken when calling into ->direct_IO. >> And nfs_file_direct_write() would grab inode->i_mutex again. >> >> nfs_file_write() and generic_file_write_iter() checks for O_DIRECT twice, >> and it creates a race window if user space is playing with O_DIRECT flag. >> Fix it by calling generic_perform_write() instead, so that nfs_direct_IO() >> is only invoked in swap on nfs case. >> >> Suggested-by: Christoph Hellwig <hch@infradead.org> >> Signed-off-by: Peng Tao <tao.peng@primarydata.com> >> --- >> fs/nfs/file.c | 41 ++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 40 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/file.c b/fs/nfs/file.c >> index 2ab6f00..e98604a 100644 >> --- a/fs/nfs/file.c >> +++ b/fs/nfs/file.c >> @@ -662,6 +662,45 @@ static int nfs_need_sync_write(struct file *filp, struct inode *inode) >> return 0; >> } >> >> +static ssize_t nfs_file_buffer_write(struct kiocb *iocb, struct iov_iter *from) >> +{ >> + struct file *file = iocb->ki_filp; >> + struct address_space *mapping = file->f_mapping; >> + struct inode *inode = mapping->host; >> + ssize_t result = 0; >> + size_t count = iov_iter_count(from); >> + loff_t pos = iocb->ki_pos; >> + int ret; >> + >> + mutex_lock(&inode->i_mutex); >> + /* We can write back this queue in page reclaim */ >> + current->backing_dev_info = mapping->backing_dev_info; >> + ret = generic_write_checks(file, &pos, &count, 0); >> + if (ret) >> + goto out; >> + >> + if (!count) >> + goto out; >> + >> + iov_iter_truncate(from, count); >> + ret = file_remove_suid(file); >> + if (ret) >> + goto out; >> + >> + ret = file_update_time(file); >> + if (ret) >> + goto out; >> + >> + result = generic_perform_write(file, from, pos); >> + if (likely(result >= 0)) >> + iocb->ki_pos = pos + result; >> + >> +out: >> + current->backing_dev_info = NULL; >> + mutex_unlock(&inode->i_mutex); >> + return result ? result : ret; >> +} >> + >> ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) >> { >> struct file *file = iocb->ki_filp; >> @@ -697,7 +736,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) >> if (!count) >> goto out; >> >> - result = generic_file_write_iter(iocb, from); >> + result = nfs_file_buffer_write(iocb, from); >> if (result > 0) >> written = result; >> > > I still haven't got an answer to my question as to why we need to do > all this if we don't actually want anything other than the swapfile > code to use nfs_direct_IO(}? Sorry that I was under the impression that you were asking advice from Al and thus skipped your proposal as Al did not response. > The xfstest with toggling O_DIRECT at > random doesn't reflect any sane behaviour that we want to support; all > we want to do is ensure that it doesn't deadlock. Why wouldn't the > proposal that we add a line to return '0' if !IS_SWAPFILE(inode) > suffice? > The reason for ->direct_IO being called here is because we check O_DIRECT flag twice in the code path and it creates a window for user space application to flip O_DIRECT flag value. The patch fixes that and it matches other filesystem's implementation, which is to check O_DIRECT only once. I agree with you that we can return 0 in nfs_direct_IO for non-swapfile case if we want to simply declare not to support such O_DIRECT toggling behaviour. Given that the generic layer falls back to buffer IO if ->direct_IO returns 0, it will have same effect as this patch. Cheers, Tao -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 2ab6f00..e98604a 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -662,6 +662,45 @@ static int nfs_need_sync_write(struct file *filp, struct inode *inode) return 0; } +static ssize_t nfs_file_buffer_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct address_space *mapping = file->f_mapping; + struct inode *inode = mapping->host; + ssize_t result = 0; + size_t count = iov_iter_count(from); + loff_t pos = iocb->ki_pos; + int ret; + + mutex_lock(&inode->i_mutex); + /* We can write back this queue in page reclaim */ + current->backing_dev_info = mapping->backing_dev_info; + ret = generic_write_checks(file, &pos, &count, 0); + if (ret) + goto out; + + if (!count) + goto out; + + iov_iter_truncate(from, count); + ret = file_remove_suid(file); + if (ret) + goto out; + + ret = file_update_time(file); + if (ret) + goto out; + + result = generic_perform_write(file, from, pos); + if (likely(result >= 0)) + iocb->ki_pos = pos + result; + +out: + current->backing_dev_info = NULL; + mutex_unlock(&inode->i_mutex); + return result ? result : ret; +} + ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; @@ -697,7 +736,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) if (!count) goto out; - result = generic_file_write_iter(iocb, from); + result = nfs_file_buffer_write(iocb, from); if (result > 0) written = result;
Running xfstest generic/036, we hit VM_BUG_ON() in nfs_direct_IO(). 036 toggles O_DIRECT flag while IO is going on. We cannot simply remove the VM_BUG_ON() there because we'll have a deadlock in the code path. inode->i_mutex is taken when calling into ->direct_IO. And nfs_file_direct_write() would grab inode->i_mutex again. nfs_file_write() and generic_file_write_iter() checks for O_DIRECT twice, and it creates a race window if user space is playing with O_DIRECT flag. Fix it by calling generic_perform_write() instead, so that nfs_direct_IO() is only invoked in swap on nfs case. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peng Tao <tao.peng@primarydata.com> --- fs/nfs/file.c | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-)