Message ID | 20141216085624.GA25256@mew (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Dec 16, 2014 at 12:56:24AM -0800, Omar Sandoval wrote: > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) > } > > if (mapping->a_ops->swap_activate) { > + if (!mapping->a_ops->direct_IO) > + return -EINVAL; > + swap_file->f_flags |= O_DIRECT; > ret = mapping->a_ops->swap_activate(sis, swap_file, span); > if (!ret) { > sis->flags |= SWP_FILE; This needs to hold swap_file->f_lock, but otherwise looks good. > This seems to be more or less equivalent to doing a fcntl(F_SETFL) to > add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff > calls filp_close on swap_file, so I don't see why it's necessary to > clear the flag. filp_lose doesn't nessecarily destroy the file structure, there might be other reference to it, e.g. from dup() or descriptor passing. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 17, 2014 at 12:06:10AM -0800, Christoph Hellwig wrote: > > This seems to be more or less equivalent to doing a fcntl(F_SETFL) to > > add the O_DIRECT flag to swap_file (which is a struct file *). Swapoff > > calls filp_close on swap_file, so I don't see why it's necessary to > > clear the flag. > > filp_lose doesn't nessecarily destroy the file structure, there might be > other reference to it, e.g. from dup() or descriptor passing. Where the hell would those other references come from? We open the damn thing in sys_swapon(), never put it into descriptor tables, etc. and the only reason why we use filp_close() instead of fput() is that we would miss ->flush() otherwise. Said that, why not simply *open* it with O_DIRECT to start with and be done with that? It's not as if those guys came preopened by caller - swapon(2) gets a pathname and does opening itself. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 17, 2014 at 08:20:21AM +0000, Al Viro wrote: > Where the hell would those other references come from? We open the damn > thing in sys_swapon(), never put it into descriptor tables, etc. and > the only reason why we use filp_close() instead of fput() is that we > would miss ->flush() otherwise. > > Said that, why not simply *open* it with O_DIRECT to start with and be done > with that? It's not as if those guys came preopened by caller - swapon(2) > gets a pathname and does opening itself. Oops, should have dug deeper into the code. For some reason I assumed the fd is passed in from userspace. The suggestion from Al is much better, given that we never do normal I/O on the swapfile, just the bmap + direct bio submission which I hope could go away in favor of the direct I/O variant in the long run. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 17, 2014 at 12:24:37AM -0800, Christoph Hellwig wrote: > On Wed, Dec 17, 2014 at 08:20:21AM +0000, Al Viro wrote: > > Where the hell would those other references come from? We open the damn > > thing in sys_swapon(), never put it into descriptor tables, etc. and > > the only reason why we use filp_close() instead of fput() is that we > > would miss ->flush() otherwise. > > > > Said that, why not simply *open* it with O_DIRECT to start with and be done > > with that? It's not as if those guys came preopened by caller - swapon(2) > > gets a pathname and does opening itself. > > Oops, should have dug deeper into the code. For some reason I assumed > the fd is passed in from userspace. > > The suggestion from Al is much better, given that we never do normal > I/O on the swapfile, just the bmap + direct bio submission which I hope > could go away in favor of the direct I/O variant in the long run. See my previous message. If we use O_DIRECT on the original open, then filesystems that implement bmap but not direct_IO will no longer work. These are the ones that I found in my tree: adfs befs bfs ecryptfs efs freevxfs hpfs isofs minix ntfs omfs qnx4 qnx6 sysv ufs Several of these are read only, and I can't imagine that anyone is using a swapfile on any of the rest, but if someone is, this would be a regression, wouldn't it?
On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote: > See my previous message. If we use O_DIRECT on the original open, then > filesystems that implement bmap but not direct_IO will no longer work. > These are the ones that I found in my tree: In the long run I don't think they are worth keeping. But to keep you out of that discussion you can just try an open without O_DIRECT if the open with the flag failed. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 17, 2014 at 10:52:56AM -0800, Christoph Hellwig wrote: > On Wed, Dec 17, 2014 at 06:58:32AM -0800, Omar Sandoval wrote: > > See my previous message. If we use O_DIRECT on the original open, then > > filesystems that implement bmap but not direct_IO will no longer work. > > These are the ones that I found in my tree: > > In the long run I don't think they are worth keeping. But to keep you > out of that discussion you can just try an open without O_DIRECT if the > open with the flag failed. Umm... That's one possibility, of course (and if swapon(2) is on someone's hotpath, I really would like to see what the hell they are doing - it has to be interesting in a sick way). Said that, there's an interesting problem with O_DIRECT. It's irrelevant in this case, but it *can* be changed halfway through e.g write(2) and AFAICS we have at least some suspicious codepaths. Look at ext4_file_write_iter(), for example. We check O_DIRECT, then grab some locks, then proceed to look at the results of that check, do some work... and call __generic_file_write_iter(), which checks O_DIRECT again. If it has been cleared (or, probably worse, set) it looks like we'll get an interesting bunch of holes. Should we just labda-expand that call of __generic_file_write_iter() and replace its if (unlikely(file->f_flags & O_DIRECT)) { with if (odirect) to be guaranteed that it'll match the things we'd done before the call? I'm looking through the callchains right now in search of similar places right now, will follow up when I'm done... BTW, speaking of read/write vs. swap - what's the story with e.g. AFS write() checking IS_SWAPFILE() and failing with -EBUSY? Note that * it's done before acquiring i_mutex, so it isn't race-free * it's dubious from the POSIX POV - EBUSY isn't in the error list for write(2). * other filesystems generally don't have anything of that sort. NFS does, but local ones do not... Besides, do we even allow swapfiles on AFS? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/mm/swapfile.c b/mm/swapfile.c index 8798b2e..5145c09 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1728,6 +1728,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) } if (mapping->a_ops->swap_activate) { + if (!mapping->a_ops->direct_IO) + return -EINVAL; + swap_file->f_flags |= O_DIRECT; ret = mapping->a_ops->swap_activate(sis, swap_file, span); if (!ret) { sis->flags |= SWP_FILE;