[3/8] swap: don't add ITER_BVEC flag to direct_IO rw

Message ID 5f9e8a7dcdf08bd2dd433f1a42690ab8e67e7915.1418618044.git.osandov@osandov.com (mailing list archive)
State New, archived

Commit Message

Omar Sandoval Dec. 15, 2014, 5:26 a.m. UTC
The rw argument to direct_IO has some ill-defined semantics. Some
filesystems (e.g., ext4, FAT) decide whether they're doing a write with
rw == WRITE, but others (e.g., XFS) check rw & WRITE. Let's set a good
example in the swap file code and say ITER_BVEC belongs in
iov_iter->flags but not in rw. This caters to the least common
denominator and avoids a sweeping change of every direct_IO
implementation for now.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 mm/page_io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
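
The gap between the two styles of check described above can be seen in a small userspace sketch. The constants are redefined here as assumptions (they are meant to mirror the kernel's values at the time, but are not taken from kernel headers), and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, redefined for a userspace sketch; not kernel headers. */
#define READ      0
#define WRITE     1
#define ITER_BVEC 4

/* Equality style (what the commit message attributes to ext4 and FAT). */
static bool is_write_eq(int rw)
{
	return rw == WRITE;
}

/* Bitwise style (what the commit message attributes to XFS). */
static bool is_write_mask(int rw)
{
	return (rw & WRITE) != 0;
}

/* The styles agree on plain READ/WRITE and differ once a flag is ORed in. */
static void check_styles(void)
{
	assert(is_write_eq(WRITE) && is_write_mask(WRITE));
	assert(!is_write_eq(READ) && !is_write_mask(READ));
	assert(!is_write_eq(ITER_BVEC | WRITE));  /* swapout not seen as a write */
	assert(is_write_mask(ITER_BVEC | WRITE)); /* swapout seen as a write */
}
```

Under these assumptions, passing plain WRITE from the swap code, as this patch does, makes both styles classify the swapout as a write.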

Comments

Al Viro Dec. 15, 2014, 6:16 a.m. UTC | #1
On Sun, Dec 14, 2014 at 09:26:57PM -0800, Omar Sandoval wrote:
> The rw argument to direct_IO has some ill-defined semantics. Some
> filesystems (e.g., ext4, FAT) decide whether they're doing a write with
> rw == WRITE, but others (e.g., XFS) check rw & WRITE. Let's set a good
> example in the swap file code and say ITER_BVEC belongs in
> iov_iter->flags but not in rw. This caters to the least common
> denominator and avoids a sweeping change of every direct_IO
> implementation for now.

Frankly, this is bogus.  If anything, let's just kill the first argument
completely - ->direct_IO() can always pick it from iter->type.

As for catering to the least common denominator...  To hell with the lowest
common denominator.  How many instances of ->direct_IO() do we have, anyway?
24 in the mainline (and we don't give a flying fuck for out-of-tree code, as
a matter of policy).  Moreover, several are of "do nothing" variety.

FWIW, 'rw' is a mess.  We used to have this:
	READ: O_DIRECT read
	WRITE: O_DIRECT write
	KERNEL_WRITE: swapout

These days KERNEL_WRITE got replaced with ITER_BVEC | WRITE.  The thing is,
we have a bunch of places where we explicitly checked for being _equal_ to
WRITE.  I.e. the checks that gave a negative on swapouts.  I suspect that most
of them are wrong and should trigger on all writes, including swapouts, but
I really didn't want to dig into that pile of fun back then.  That's the
main reason why 'rw' argument has survived at all...
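
A rough userspace sketch of the idea of dropping the first argument and reading the direction from the iterator instead. Everything here is an assumption for illustration: struct sketch_iter stands in for struct iov_iter, RW_MASK is a hypothetical mask for the direction bit, and the constant values are redefined locally rather than taken from kernel headers:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for a userspace sketch; not taken from kernel headers. */
#define READ      0
#define WRITE     1
#define RW_MASK   1   /* hypothetical mask selecting the direction bit */
#define ITER_BVEC 4

/* Hypothetical stand-in for struct iov_iter: only the type field. */
struct sketch_iter {
	int type;
};

/* The direction lives in the iterator, so no separate rw argument is needed. */
static int sketch_iter_rw(const struct sketch_iter *iter)
{
	return iter->type & RW_MASK;
}

/* Hypothetical direct_IO-like check that takes no rw parameter. */
static bool sketch_direct_io_is_write(const struct sketch_iter *iter)
{
	return sketch_iter_rw(iter) == WRITE;
}

/* A swapout (ITER_BVEC | WRITE) and an O_DIRECT write both count as writes. */
static void sketch_check(void)
{
	struct sketch_iter swapout = { .type = ITER_BVEC | WRITE };
	struct sketch_iter dio_write = { .type = WRITE };
	struct sketch_iter dio_read = { .type = READ };

	assert(sketch_direct_io_is_write(&swapout));
	assert(sketch_direct_io_is_write(&dio_write));
	assert(!sketch_direct_io_is_write(&dio_read));
}
```

With the direction masked out of the type, an equality comparison against WRITE is safe again, because the iterator-kind bits never reach the comparison.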

Omar Sandoval Dec. 15, 2014, 3:57 p.m. UTC | #2
On Mon, Dec 15, 2014 at 06:16:02AM +0000, Al Viro wrote:
> On Sun, Dec 14, 2014 at 09:26:57PM -0800, Omar Sandoval wrote:
> > The rw argument to direct_IO has some ill-defined semantics. Some
> > filesystems (e.g., ext4, FAT) decide whether they're doing a write with
> > rw == WRITE, but others (e.g., XFS) check rw & WRITE. Let's set a good
> > example in the swap file code and say ITER_BVEC belongs in
> > iov_iter->flags but not in rw. This caters to the least common
> > denominator and avoids a sweeping change of every direct_IO
> > implementation for now.
> 
> Frankly, this is bogus.  If anything, let's just kill the first argument
> completely - ->direct_IO() can always pick it from iter->type.
> 
> As for catering to the least common denominator...  To hell with the lowest
> common denominator.  How many instances of ->direct_IO() do we have, anyway?
> 24 in the mainline (and we don't give a flying fuck for out-of-tree code, as
> a matter of policy).  Moreover, several are of "do nothing" variety.
> 
> FWIW, 'rw' is a mess.  We used to have this:
> 	READ: O_DIRECT read
> 	WRITE: O_DIRECT write
> 	KERNEL_WRITE: swapout
> 
> These days KERNEL_WRITE got replaced with ITER_BVEC | WRITE.  The thing is,
> we have a bunch of places where we explicitly checked for being _equal_ to
> WRITE.  I.e. the checks that gave a negative on swapouts.  I suspect that most
> of them are wrong and should trigger on all writes, including swapouts, but
> I really didn't want to dig into that pile of fun back then.  That's the
> main reason why 'rw' argument has survived at all...
>
In that case, I'll take a stab at nuking rw. I'm almost certain that
some of these are completely wrong (for example, of the form
if (rw == WRITE) do_write(); else do_read();). This isn't an immediate
problem for swap files on BTRFS, as __blockdev_direct_IO does a bitwise
test, so I think I'll split it out into its own series.

Thanks,

Patch

diff --git a/mm/page_io.c b/mm/page_io.c
index 1630ac0..c229f88 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -285,8 +285,7 @@  int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		set_page_writeback(page);
 		unlock_page(page);
 		mutex_lock(&inode->i_mutex);
-		ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
-						&kiocb, &from,
+		ret = mapping->a_ops->direct_IO(WRITE, &kiocb, &from,
 						kiocb.ki_pos);
 		mutex_unlock(&inode->i_mutex);
 		if (ret == PAGE_SIZE) {