From patchwork Thu Jun 15 04:38:23 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 9788037 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1710D60384 for ; Thu, 15 Jun 2017 04:44:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EA9712855D for ; Thu, 15 Jun 2017 04:44:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CDD0028572; Thu, 15 Jun 2017 04:44:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E77162855D for ; Thu, 15 Jun 2017 04:44:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750801AbdFOEop (ORCPT ); Thu, 15 Jun 2017 00:44:45 -0400 Received: from mail-pf0-f171.google.com ([209.85.192.171]:33114 "EHLO mail-pf0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750794AbdFOEoo (ORCPT ); Thu, 15 Jun 2017 00:44:44 -0400 Received: by mail-pf0-f171.google.com with SMTP id 83so2497230pfr.0 for ; Wed, 14 Jun 2017 21:44:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:from:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=mI/KwJdJAEg/NB803yuQ2hTlTAMuqie0e5Gqn94i/2Q=; b=wXPEZwhRoOV15Klf6q3ACngAgvUhaHn1ilT2VaX82nBxSEW55jKlpee5lqkWbhhf25 SBRgEySeVLTiQFKhbhzBEr7zhzYH0leWF2XRX5SrM3uBIuHJ+n9koeHDOuITgCoKV6XE rOTj5VU+gn7SQZOEBm3FNBRnJs9m7k0wgauVeg/sWbW5jCbOrK4JNQMuAqbt25ceGOQk DFKzIveeJwnfGkizxpupFi4FDz1wLure2rpPUTDBj5+Z82nujh7mkzXPeNYnfoKkUTwb nxxo9QBZsvWjJEyFykgVrizNHUH54zFI+EGCog3miTxonuy5M7kLaljJcUf5FuTAmxne O55w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=mI/KwJdJAEg/NB803yuQ2hTlTAMuqie0e5Gqn94i/2Q=; b=XsewV4Sk8jHiFuuVkbLCxestIjQtymcwvp3rOWHlKFYaxQw3SomMemQs1lnkxLowBO LZRjBs5G+2jayMyJ3vK/8zxoc97u/S3W/Xe7E8afUeWTLrG5XX/ISiNT0SXT8Axsgovs PKoFGee/mgM3tIIYuYw9W48SfQC/Ox2wYMUSQAXnDBh3iPhYYDJ02mFr7o03c36Bx4Vh jtWpYa9aBbcIZ429dR+tPytU+dZhQU5v9QVm7L592u2lciMCYvb8xo+ZlVWS5NojuJWw gyK3wE/8B0VoJLBX3JZck5csw+47s8HEksi2PGCCGB+kLWVPAQWwSdscm84gOiAXhNzj NFcA== X-Gm-Message-State: AKS2vOzaNNJ8a07Kcpi5dYDjSIxECTd+c1lu4qdNCHeldZcygiqPTvvc djmut1+/JR/sR7O9rsdilg== X-Received: by 10.84.129.99 with SMTP id 90mr3933460plb.62.1497501505648; Wed, 14 Jun 2017 21:38:25 -0700 (PDT) Received: from [192.168.1.176] (66.29.164.166.static.utbb.net. [66.29.164.166]) by smtp.gmail.com with ESMTPSA id p8sm3284062pgd.23.2017.06.14.21.38.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Jun 2017 21:38:25 -0700 (PDT) Subject: Re: [PATCHSET v2] Add support for write life time hints From: Jens Axboe To: Andreas Dilger Cc: Martin Petersen , Christoph Hellwig , fsdevel , linux-block References: <1497412919-19400-1-git-send-email-axboe@kernel.dk> <20170614160127.GA30644@infradead.org> <20382261-3180-41F0-B959-9181C2FF4BAA@dilger.ca> <9e152ed7-ed49-bb62-902c-0daac7646a86@kernel.dk> <48A74D71-9765-4881-B337-761FE66757BB@dilger.ca> Message-ID: <47c6fb6d-d672-a230-890f-68c1db22e277@kernel.dk> Date: Wed, 14 Jun 2017 22:38:23 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 06/14/2017 09:57 PM, Jens Axboe wrote: > On 06/14/2017 09:53 PM, Andreas Dilger wrote: >> On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote: >>> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote: >>>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen wrote: >>>>> Christoph, >>>>> >>>>>> I think what Martin wants (or at least what I'd want him to want) is >>>>>> to define a few REQ_* bits that mirror the RWF bits, use that to >>>>>> transfer the information down the stack, and then only translate it >>>>>> to stream ids in the driver. >>>>> >>>>> Yup. If we have enough space in the existing flags that's perfect (I >>>>> lost count after your op/flag shuffle). >>>> >>>> Just to clarify, does this mean that Jens' "lifetime" hints are going to >>>> be independent of separate "stream ID" values in the future (needing a >>>> similar, but independent, set of fields for stream IDs, or will they >>>> both be implemented using a common transport in the kernel (i.e. both >>>> share a single set of fields in the inode/bio/etc? >>> >>> There's no reason they can't coexist. Now that bio->bi_stream is gone >>> and we just have the flags, if we want to expose separate "real" stream >>> IDs, then we could later re-introduce that. It'd be pretty trivial to >>> just use the most pertinent information on the driver side. >>> >>> From my perspective, all I really care about is the 4 hints. It's a >>> simple enough interface that applications can understand and use it, and >>> we don't need any management of actual stream IDs. I think that has the >>> highest chance of success. Modifying an application to use it is >>> trivial, even something like RocksDB (if you havehad to make changes >>> to RocksDB, you'll get this). >> >> If this is really to be only flags, rather than a hybrid of flags and IDs >> (as I had thought), then it probably makes sense to limit the inode usage >> to a few bits in i_flags using S_LIFETIME_* values rather than a separate >> 16-bit i_stream field, which can be used for the actual stream IDs later. > > Christoph alluded to that as well. And yes, if we are contemplating > something else later on, then that does make more sense. I'll make that > change. Easy enough to do, see attached incremental patch. Only issue is that we then have to lock the inode, and use the atomic flag setting. I'm assuming that's safe to do from that path. We only need to grab the lock if the hint changes, which is basically never should. diff --git a/fs/inode.c b/fs/inode.c index bd8bf44f3f31..db5914783a71 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -149,7 +149,6 @@ int inode_init_always(struct super_block *sb, struct inode *inode) inode->i_blocks = 0; inode->i_bytes = 0; inode->i_generation = 0; - inode->i_write_hint = 0; inode->i_pipe = NULL; inode->i_bdev = NULL; inode->i_cdev = NULL; diff --git a/fs/read_write.c b/fs/read_write.c index 9cb2314efca3..70f8764ae117 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -675,6 +675,7 @@ EXPORT_SYMBOL(iov_shorten); static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter, loff_t *ppos, int type, int flags) { + struct inode *inode = file_inode(filp); struct kiocb kiocb; ssize_t ret; @@ -688,12 +689,19 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter, kiocb.ki_flags |= IOCB_DSYNC; if (flags & RWF_SYNC) kiocb.ki_flags |= (IOCB_DSYNC | IOCB_SYNC); - if (flags & RWF_WRITE_LIFE_MASK) { - struct inode *inode = file_inode(filp); + if ((flags & RWF_WRITE_LIFE_MASK) || + (inode->i_flags & S_WRITE_LIFE_MASK)) { + unsigned int hint, iflags; - inode->i_write_hint = (flags & RWF_WRITE_LIFE_MASK) >> - RWF_WRITE_LIFE_SHIFT; - kiocb.ki_flags |= inode->i_write_hint << IOCB_WRITE_LIFE_SHIFT; + hint = (flags & RWF_WRITE_LIFE_MASK) >> RWF_WRITE_LIFE_SHIFT; + iflags = hint << S_WRITE_LIFE_SHIFT; + + if ((inode->i_flags & S_WRITE_LIFE_MASK) != iflags) { + inode_lock(inode); + inode_set_flags(inode, iflags, iflags); + inode_unlock(inode); + } + kiocb.ki_flags |= hint << IOCB_WRITE_LIFE_SHIFT; } kiocb.ki_pos = *ppos; diff --git a/include/linux/fs.h b/include/linux/fs.h index 63798b67fcfe..929f1fc088c6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -270,10 +270,10 @@ struct writeback_control; #define IOCB_WRITE (1 << 6) /* - * Steal 4-bits for stream information, this allows 16 valid streams + * Steal 3 bits for stream information, this allows 8 valid streams */ #define IOCB_WRITE_LIFE_SHIFT 7 -#define IOCB_WRITE_LIFE_MASK (BIT(7) | BIT(8) | BIT(9) | BIT(10)) +#define IOCB_WRITE_LIFE_MASK (BIT(7) | BIT(8) | BIT(9)) struct kiocb { struct file *ki_filp; @@ -603,7 +603,6 @@ struct inode { struct timespec i_ctime; spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; - unsigned short i_write_hint; unsigned int i_blkbits; blkcnt_t i_blocks; @@ -668,14 +667,6 @@ struct inode { void *i_private; /* fs or device private pointer */ }; -static inline unsigned int inode_write_hint(struct inode *inode) -{ - if (inode) - return inode->i_write_hint; - - return 0; -} - static inline unsigned int i_blocksize(const struct inode *node) { return (1 << node->i_blkbits); @@ -1849,6 +1840,17 @@ struct super_operations { #endif /* + * Expected life time hint of a write for this inode. This maps directly + * to the RWF_WRITE_LIFE_* hints + */ +#define S_WRITE_LIFE_SHIFT 14 /* 16384, next bit */ +#define S_WRITE_LIFE_MASK 0x1c000 /* bits 14..16 */ +#define S_WRITE_LIFE_SHORT (1 << S_WRITE_LIFE_SHIFT) +#define S_WRITE_LIFE_MEDIUM (2 << S_WRITE_LIFE_SHIFT) +#define S_WRITE_LIFE_LONG (3 << S_WRITE_LIFE_SHIFT) +#define S_WRITE_LIFE_EXTREME (4 << S_WRITE_LIFE_SHIFT) + +/* * Note that nosuid etc flags are inode-specific: setting some file-system * flags just means all the inodes inherit those flags by default. It might be * possible to override it selectively if you really wanted to with some @@ -1894,6 +1896,15 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode) return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid); } +static inline unsigned int inode_write_hint(struct inode *inode) +{ + if (inode) + return (inode->i_flags & S_WRITE_LIFE_MASK) >> + S_WRITE_LIFE_SHIFT; + + return 0; +} + /* * Inode state bits. Protected by inode->i_lock * diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 58b7ee06b380..b68dc74236c1 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -362,10 +362,10 @@ struct fscrypt_key { #define RWF_SYNC 0x00000004 /* per-IO O_SYNC */ /* - * Data life time write flags, steal 4 bits for that + * Data life time write flags, steal 3 bits for that */ #define RWF_WRITE_LIFE_SHIFT 4 -#define RWF_WRITE_LIFE_MASK 0x000000f0 /* 4 bits of stream ID */ +#define RWF_WRITE_LIFE_MASK 0x00000070 /* 3 bits of stream ID */ #define RWF_WRITE_LIFE_SHORT (1 << RWF_WRITE_LIFE_SHIFT) #define RWF_WRITE_LIFE_MEDIUM (2 << RWF_WRITE_LIFE_SHIFT) #define RWF_WRITE_LIFE_LONG (3 << RWF_WRITE_LIFE_SHIFT)