diff mbox

[04/11] fs: add support for allowing applications to pass in write life time hints

Message ID 1497729594-4707-5-git-send-email-axboe@kernel.dk (mailing list archive)
State New, archived
Headers show

Commit Message

Jens Axboe June 17, 2017, 7:59 p.m. UTC
Add four flags for the pwritev2(2) system call, allowing an application
to give the kernel a hint about what on-media life times can be
expected from a given write.

The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.

Set aside 3 bits in the iocb flags structure to carry this information
over from the pwritev2 RWF_WRITE_LIFE_* flags.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/read_write.c         | 12 +++++++++++-
 include/linux/fs.h      | 12 ++++++++++++
 include/uapi/linux/fs.h | 10 ++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig June 19, 2017, 6:27 a.m. UTC | #1
On Sat, Jun 17, 2017 at 01:59:47PM -0600, Jens Axboe wrote:
> Add four flags for the pwritev2(2) system call, allowing an application
> to give the kernel a hint about what on-media life times can be
> expected from a given write.
> 
> The intent is for these values to be relative to each other, no
> absolute meaning should be attached to these flag names.
> 
> Set aside 3 bits in the iocb flags structure to carry this information
> over from the pwritev2 RWF_WRITE_LIFE_* flags.

What is the strong use case for the per-I/O flags?  I'd much rather
stick to fcntl only for now if we can.
Jens Axboe June 19, 2017, 2:56 p.m. UTC | #2
On 06/19/2017 12:27 AM, Christoph Hellwig wrote:
> On Sat, Jun 17, 2017 at 01:59:47PM -0600, Jens Axboe wrote:
>> Add four flags for the pwritev2(2) system call, allowing an application
>> to give the kernel a hint about what on-media life times can be
>> expected from a given write.
>>
>> The intent is for these values to be relative to each other, no
>> absolute meaning should be attached to these flag names.
>>
>> Set aside 3 bits in the iocb flags structure to carry this information
>> over from the pwritev2 RWF_WRITE_LIFE_* flags.
> 
> What is the strong use case for the per-I/O flags?  I'd much rather
> stick to fcntl only for now if we can.

Fine, I guess I should just have dusted off the 2 year old patchset,
as that was _exactly_ what that did.
Jens Axboe June 19, 2017, 4:02 p.m. UTC | #3
On 06/19/2017 08:56 AM, Jens Axboe wrote:
> On 06/19/2017 12:27 AM, Christoph Hellwig wrote:
>> On Sat, Jun 17, 2017 at 01:59:47PM -0600, Jens Axboe wrote:
>>> Add four flags for the pwritev2(2) system call, allowing an application
>>> to give the kernel a hint about what on-media life times can be
>>> expected from a given write.
>>>
>>> The intent is for these values to be relative to each other, no
>>> absolute meaning should be attached to these flag names.
>>>
>>> Set aside 3 bits in the iocb flags structure to carry this information
>>> over from the pwritev2 RWF_WRITE_LIFE_* flags.
>>
>> What is the strong use case for the per-I/O flags?  I'd much rather
>> stick to fcntl only for now if we can.
> 
> Fine, I guess I should just have dusted off the 2 year old patchset,
> as that was _exactly_ what that did.

Actually, one good use case is O_DIRECT on a block device. Since I'm
not a huge fan of having per-call hints that is only useful for a
single case, how about we add the hints to the struct file as well?
For buffered IO, just grab it from the inode. If we have a file
available, then that overrides the per-inode setting.
Christoph Hellwig June 19, 2017, 6:58 p.m. UTC | #4
On Mon, Jun 19, 2017 at 10:02:09AM -0600, Jens Axboe wrote:
> Actually, one good use case is O_DIRECT on a block device. Since I'm
> not a huge fan of having per-call hints that is only useful for a
> single case, how about we add the hints to the struct file as well?
> For buffered IO, just grab it from the inode. If we have a file
> available, then that overrides the per-inode setting.

Even for buffered I/O per-fіle would seem more useful to be honest.
For the buffer_head based file systems this could even be done fairly
easily.
Jens Axboe June 19, 2017, 7 p.m. UTC | #5
On 06/19/2017 12:58 PM, Christoph Hellwig wrote:
> On Mon, Jun 19, 2017 at 10:02:09AM -0600, Jens Axboe wrote:
>> Actually, one good use case is O_DIRECT on a block device. Since I'm
>> not a huge fan of having per-call hints that is only useful for a
>> single case, how about we add the hints to the struct file as well?
>> For buffered IO, just grab it from the inode. If we have a file
>> available, then that overrides the per-inode setting.
> 
> Even for buffered I/O per-fіle would seem more useful to be honest.
> For the buffer_head based file systems this could even be done fairly
> easily.

If I add the per-file hint as well, then anywhere that has the file should
just grab it from there. Unless not set, then grab from inode.

That does raise an issue with the NONE hint being 0. We can tell right now
if NONE was set, or nothing was set. This becomes a problem if we want the
file hint to override the inode hint. Should probably just bump the values
up by one, so that NONE is 1, SHORT is 2, etc.
Jens Axboe June 19, 2017, 7:10 p.m. UTC | #6
On 06/19/2017 01:00 PM, Jens Axboe wrote:
> On 06/19/2017 12:58 PM, Christoph Hellwig wrote:
>> On Mon, Jun 19, 2017 at 10:02:09AM -0600, Jens Axboe wrote:
>>> Actually, one good use case is O_DIRECT on a block device. Since I'm
>>> not a huge fan of having per-call hints that is only useful for a
>>> single case, how about we add the hints to the struct file as well?
>>> For buffered IO, just grab it from the inode. If we have a file
>>> available, then that overrides the per-inode setting.
>>
>> Even for buffered I/O per-fіle would seem more useful to be honest.
>> For the buffer_head based file systems this could even be done fairly
>> easily.
> 
> If I add the per-file hint as well, then anywhere that has the file should
> just grab it from there. Unless not set, then grab from inode.
> 
> That does raise an issue with the NONE hint being 0. We can tell right now
> if NONE was set, or nothing was set. This becomes a problem if we want the
> file hint to override the inode hint. Should probably just bump the values
> up by one, so that NONE is 1, SHORT is 2, etc.

Actually, we don't have to, as long as the file inherits the inode mask.
Then we can just use the file hint if it differs from the inode hint.
diff mbox

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index 19d4d88fa285..975fe1d46a59 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -675,10 +675,11 @@  EXPORT_SYMBOL(iov_shorten);
 static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 		loff_t *ppos, int type, int flags)
 {
+	struct inode *inode = file_inode(filp);
 	struct kiocb kiocb;
 	ssize_t ret;
 
-	if (flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC))
+	if (flags & ~(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_WRITE_LIFE_MASK))
 		return -EOPNOTSUPP;
 
 	init_sync_kiocb(&kiocb, filp);
@@ -688,6 +689,15 @@  static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 		kiocb.ki_flags |= IOCB_DSYNC;
 	if (flags & RWF_SYNC)
 		kiocb.ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
+	if ((flags & RWF_WRITE_LIFE_MASK) ||
+	    mask_to_write_hint(inode->i_flags, S_WRITE_LIFE_SHIFT)) {
+		enum rw_hint hint;
+
+		hint = mask_to_write_hint(flags, RWF_WRITE_LIFE_SHIFT);
+
+		inode_set_write_hint(inode, hint);
+		kiocb.ki_flags |= write_hint_to_mask(hint, IOCB_WRITE_LIFE_SHIFT);
+	}
 	kiocb.ki_pos = *ppos;
 
 	if (type == READ)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 472c83156606..a024b32259bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -269,6 +269,12 @@  struct writeback_control;
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 
+/*
+ * Steal 3 bits for stream information, this allows 8 valid streams
+ */
+#define IOCB_WRITE_LIFE_SHIFT	7
+#define IOCB_WRITE_LIFE_MASK	(BIT(7) | BIT(8) | BIT(9))
+
 struct kiocb {
 	struct file		*ki_filp;
 	loff_t			ki_pos;
@@ -292,6 +298,12 @@  static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 	};
 }
 
+static inline int iocb_write_hint(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_WRITE_LIFE_MASK) >>
+			IOCB_WRITE_LIFE_SHIFT;
+}
+
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 8fb3b5a6e1ec..0d9d331d3b61 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -374,4 +374,14 @@  enum rw_hint {
 #define RWF_DSYNC			0x00000002 /* per-IO O_DSYNC */
 #define RWF_SYNC			0x00000004 /* per-IO O_SYNC */
 
+/*
+ * Data life time write flags, steal 3 bits for that
+ */
+#define RWF_WRITE_LIFE_SHIFT		4
+#define RWF_WRITE_LIFE_MASK		0x00000070 /* 3 bits of write hints */
+#define RWF_WRITE_LIFE_SHORT		(WRITE_LIFE_SHORT << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_MEDIUM		(WRITE_LIFE_MEDIUM << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_LONG		(WRITE_LIFE_LONG << RWF_WRITE_LIFE_SHIFT)
+#define RWF_WRITE_LIFE_EXTREME		(WRITE_LIFE_EXTREME << RWF_WRITE_LIFE_SHIFT)
+
 #endif /* _UAPI_LINUX_FS_H */