Message ID | 20160109220826.GA11174@kvack.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote: > > Please consider pulling the following changes to fix a couple of issues > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git . No. This is much too late for this kind of hackery. That second patch in particular is both subtle and ugly, and is messing with lockdep. No way will I take something like this the last fay before a release. It's not even a regression, nor did you send me anything at all for this release. Trying to sneak something in just before 4.4 is not ok. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote: > On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote: > > > > Please consider pulling the following changes to fix a couple of issues > > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git . > > No. This is much too late for this kind of hackery. That second patch > in particular is both subtle and ugly, and is messing with lockdep. I wasn't particularly fond of how Jan implemented that. Any ideas on a better way to avoid lockdep complaining about this? My initial feedback to Jan was exactly that this should be handled within the file_start_write() and file_end_write() APIs -- AIO really shouldn't need to be mucking in what ought to be hidden behind that API. > No way will I take something like this the last fay before a release. > > It's not even a regression, nor did you send me anything at all for > this release. Trying to sneak something in just before 4.4 is not ok. Okay, no worries. -ben > Linus
On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote: > On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote: > > > > Please consider pulling the following changes to fix a couple of issues > > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git . > > No. This is much too late for this kind of hackery. That second patch > in particular is both subtle and ugly, and is messing with lockdep. > > No way will I take something like this the last fay before a release. > > It's not even a regression, nor did you send me anything at all for > this release. Trying to sneak something in just before 4.4 is not ok. BTW, right now vfs.git#for-linus contains minimal compat_ioctl patches; unless you want those in right now, I'm holding them back until Monday. It's two patches Jann has posted (with trivial fix in the first one folded in) + not passing fd around where it's not needed (basically, those calls of do_ioctl() in there are guaranteed to go into vfs_ioctl() - the paths in do_vfs_ioctl() that care about the descriptor numbers are not reachable with the arguments do_ioctl() is getting). If for some reason you want those in before -final - yell and I'll send a pull request, otherwise they'll be in one of the first pull requests on Monday. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jan 9, 2016 at 3:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > If for some reason you want those in before -final - yell and I'll send > a pull request, otherwise they'll be in one of the first pull requests > on Monday. No, I definitely don't want anything now unless it's a major regression or security issue. Other stuff can wait until the merge window and perhaps be marked for stable if required. That way they'll get testing. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 09-01-16 18:00:36, Benjamin LaHaise wrote: > On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote: > > On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote: > > > > > > Please consider pulling the following changes to fix a couple of issues > > > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git . > > > > No. This is much too late for this kind of hackery. That second patch > > in particular is both subtle and ugly, and is messing with lockdep. > > I wasn't particularly fond of how Jan implemented that. Any ideas on a > better way to avoid lockdep complaining about this? My initial feedback > to Jan was exactly that this should be handled within the file_start_write() > and file_end_write() APIs -- AIO really shouldn't need to be mucking in what > ought to be hidden behind that API. [To introduce Peter to what we are talking about: This is about filesystem freezing protection implemented effectively as a RW semaphore which must be held for reading whenever we modify filesystem and held for writing when we want to freeze the filesystem. With AIO, we need to return to userspace with the lock held and thus lockdep complains.] Thinking about this some more maybe we could improve lockdep in the following way: We could flag a locking class that it is OK to return with this lock to userspace. When returning to userspace with such lock, lockdep would just silently discard the information about lock being held. When freeing such lock without lockdep knowing we acquired it (i.e., we were called by userspace / IRQ when holding the lock), lockdep would just silently ignore such request. This way we could avoid the hacks with telling lockdep we dropped the lock when we actually did not and we would also get improved coverage to the current state. Peter, would something like the above be doable? Bonus would be if we verified that all locks held when releasing lock we "inherited" rank below the lock we are releasing (that way we would simulate acquiring the lock when entering kernel from userspace). However we would have to be careful about handling locks acquired via trylock (those cannot create deadlocks) or when release happens from softirq / irq - there we care only about locks acquired since softirq started - this happens e.g. when we release the lock on IO completion like in the AIO case. Honza
diff --git a/fs/aio.c b/fs/aio.c index 155f842..e0d5398 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1269,6 +1269,8 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, if (unlikely(copy_from_user(&ts, timeout, sizeof(ts)))) return -EFAULT; + if (!timespec_valid(&ts)) + return -EINVAL; until = timespec_to_ktime(ts); } -- 2.5.0 From 3b9688ff1e083a3c981bbc795f823fb0b0f2aacc Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Thu, 7 Jan 2016 16:03:04 +0100 Subject: [PATCH 2/2] aio: Fix freeze protection of aio writes Currently we dropped freeze protection of aio writes just after IO was submitted. Thus aio write could be in flight while the filesystem was frozen and that could result in unexpected situation like aio completion wanting to convert extent type on frozen filesystem. Testcase from Dmitry triggering this is like: for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite Fix the problem by dropping freeze protection only once IO is completed in aio_complete(). Reported-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> diff --git a/fs/aio.c b/fs/aio.c index e0d5398..a574944 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1065,6 +1065,19 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2) unsigned tail, pos, head; unsigned long flags; + if (kiocb->ki_flags & IOCB_WRITE) { + struct file *f = kiocb->ki_filp; + + /* + * Tell lockdep we inherited freeze protection from submission + * thread. + */ + percpu_rwsem_acquire( + &f->f_inode->i_sb->s_writers.rw_sem[SB_FREEZE_WRITE-1], + 1, _THIS_IP_); + file_end_write(f); + } + /* * Special case handling for sync iocbs: * - events go directly into the iocb for fast handling @@ -1451,13 +1464,25 @@ rw_common: len = ret; - if (rw == WRITE) + if (rw == WRITE) { file_start_write(file); + req->ki_flags |= IOCB_WRITE; + } ret = iter_op(req, &iter); - if (rw == WRITE) - file_end_write(file); + if (rw == WRITE) { + /* + * We release freeze protection in aio_complete(). Fool + * lockdep by telling it the lock got released so that + * it doesn't complain about held lock when we return + * to userspace. + */ + percpu_rwsem_release( + &file->f_inode->i_sb->s_writers.rw_sem[SB_FREEZE_WRITE-1], + 1, _THIS_IP_); + } + kfree(iovec); break; diff --git a/include/linux/fs.h b/include/linux/fs.h index 3aa5142..54af40e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -319,6 +319,7 @@ struct writeback_control; #define IOCB_EVENTFD (1 << 0) #define IOCB_APPEND (1 << 1) #define IOCB_DIRECT (1 << 2) +#define IOCB_WRITE (1 << 3) struct kiocb { struct file *ki_filp;