Message ID | 20151124132421.GG25232@quack.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote: > Well, this problem seems to suggest that we have the freeze protection for > AIO writes wrong. We should call file_end_write() from aio_complete() and > not from aio_run_iocb()... I believe XFS and other filesystems may have > problems with this as well (CCed). Attached patch (so far only compile > tested since my test machine is pondering on something else) should fix > this. Sounds like one way to do it, but we'd really want a vfs_* helper for this so that it doesn't have to duplicated in other write_iter users like the loop driver, which seems to be missing file file_start_write/file_end_write entirely. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue 24-11-15 20:55:40, Dmitry Monakhov wrote: > On Nov 24, 2015 16:25, "Jan Kara" <jack@suse.cz> wrote: > > On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote: > > > After freeze_fs was revoked (from Jan Kara) pages's write-back > completion > > > is deffered before unwritten conversion, so explicit > flush_unwritten_io() > > > was removed here: c724585b62411 > > > But we still may face deferred conversion for aio-dio case > > > # Trivial testcase > > > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done > & > > > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ > > > --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite > > > NOTE: Sane testcase should be integrated to xfstests, but it requires > > > changes in common/* code, so let's use this this test at the moment. > > > > > > In order to fix this race we have to guard journal transaction with > explicit > > > sb_{start,end}_intwrite() as we do with ext4_evict_inode here:8e8ad8a5 > > > > Well, this problem seems to suggest that we have the freeze protection for > > AIO writes wrong. We should call file_end_write() from aio_complete() and > > not from aio_run_iocb()... > Yep. It was my first attempt to fix that issue, but unfortunately this > trick will break lockdep. Caller will do file_start_write and exit to > userspace. Lockdep treats such behaviour as bug (return to userspace with a > lock held) > > There are two way to fix that > 1) add specific 'long' lock primitive to lockdep The way we tell lockdep about transfer of context is that we just lie to lockdep and tell it that the lock got unlocked at appropriate place and then tell it we locked it again at another place. It is somewhat ugly but not that hard to do... Generally lockdep is a tool that should help but by no means it should be a reason for poor locking decisions just because lockdep cannot handle them. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue 24-11-15 08:07:23, Christoph Hellwig wrote: > On Tue, Nov 24, 2015 at 02:24:21PM +0100, Jan Kara wrote: > > Well, this problem seems to suggest that we have the freeze protection for > > AIO writes wrong. We should call file_end_write() from aio_complete() and > > not from aio_run_iocb()... I believe XFS and other filesystems may have > > problems with this as well (CCed). Attached patch (so far only compile > > tested since my test machine is pondering on something else) should fix > > this. > > Sounds like one way to do it, but we'd really want a vfs_* helper for > this so that it doesn't have to duplicated in other write_iter users > like the loop driver, which seems to be missing file > file_start_write/file_end_write entirely. That is mostly a separate issue, isn't it? I guess you mean a helper like vfs_write_iter() that would get freeze protection and call ->write_iter()? And what about files which have ->write (or ->splice_write()) and don't end up calling ->write_iter? Also there is stuff like __kernel_write() which ends up calling ->write_iter() but e.g. kernel/acct.c: do_acct_process() wants to do it's own thing... So the current status is not ideal but I don't see how to make it substantially better... Honza
From a7332719d80dc94c11d1c1cb32c88b7f25e1ae61 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Tue, 24 Nov 2015 14:19:22 +0100 Subject: [PATCH] aio: Fix freeze protection of aio writes Currently we dropped freeze protection of aio writes just after IO was submitted. Thus aio write could be in flight while the filesystem was frozen and that could result in unexpected situation like aio completion wanting to convert extent type on frozen filesystem. Testcase from Dmitry triggering this is like: for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite Fix the problem by dropping freeze protection only once IO is completed in aio_complete(). Reported-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/aio.c | 10 +++++++--- include/linux/fs.h | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 155f84253f33..3775030053f7 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1065,6 +1065,9 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2) unsigned tail, pos, head; unsigned long flags; + if (kiocb->ki_flags & IOCB_WRITE) + file_end_write(kiocb->ki_filp); + /* * Special case handling for sync iocbs: * - events go directly into the iocb for fast handling @@ -1449,13 +1452,14 @@ rw_common: len = ret; - if (rw == WRITE) + /* We drop freeze protection in aio_complete() */ + if (rw == WRITE) { file_start_write(file); + req->ki_flags |= IOCB_WRITE; + } ret = iter_op(req, &iter); - if (rw == WRITE) - file_end_write(file); kfree(iovec); break; diff --git a/include/linux/fs.h b/include/linux/fs.h index 3aa514254161..54af40ed6a26 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -319,6 +319,7 @@ struct writeback_control; #define IOCB_EVENTFD (1 << 0) #define IOCB_APPEND (1 << 1) #define IOCB_DIRECT (1 << 2) +#define IOCB_WRITE (1 << 3) struct kiocb { struct file *ki_filp; -- 2.1.4