Message ID | 20200901204045.1250822-1-vgoyal@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | fuse: Enable SB_NOSEC if filesystem is not shared | expand |
On Tue, Sep 01, 2020 at 04:40:43PM -0400, Vivek Goyal wrote: > Hi, > > I want to enable SB_NOSEC in fuse in some form so that performance of > small random writes can be improved. As of now, we call file_remove_privs(), > which results in fuse always sending getxattr(security.capability) to > server to figure out if security.capability has been set on file or not. > If it has been set, it needs to be cleared. This slows down small > random writes tremendously. > > I posted couple of proposals in the past here. > > Proposal 1: > > https://lore.kernel.org/linux-fsdevel/20200716144032.GC422759@redhat.com/ > > Proposal 2: > > https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/ > > This is 3rd proposal now. One of the roadblocks in enabling SB_NOSEC > is shared filesystem. It is possible that another client modified the > file data and this client does not know about it. So we might regress > if we don't fetch security.capability always. > > So looks like this needs to be handled different for shared filesystems > and non-shared filesystems. non-shared filesystems will be more like > local filesystems where fuse does not expect file data/metadata to > change outside the fuse. And we should be able to enable SB_NOSEC > optimization. This is what this proposal does. > > It does not handle the case of shared filesystems. I believe solution > to that will depend on filesystem based on what's the cache coherency > guarantees filesystem provides and what's the cache invalidation > mechanism it uses. > > For now, all filesystems which are not shared can benefit from this > optimization. I am interested in virtiofs which is not shared in > many of the cases. In fact we don't even support shared mode yet. Here is the corresponding qemu virtiofsd patch which can enable this feature. Subject: virtiofsd: Enable FUSE_NONSHARED_FS by default Set FUSE_NONSHARED_FS flag by default as we don't support sharing of virtiofs filesystem yet. Once that is supported, it should not be set for shared mode. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- include/standard-headers/linux/fuse.h | 4 ++++ tools/virtiofsd/fuse_lowlevel.c | 9 +++++++++ 2 files changed, 13 insertions(+) Index: qemu/include/standard-headers/linux/fuse.h =================================================================== --- qemu.orig/include/standard-headers/linux/fuse.h 2020-09-01 15:22:31.449707002 -0400 +++ qemu/include/standard-headers/linux/fuse.h 2020-09-01 15:23:18.776668542 -0400 @@ -172,6 +172,8 @@ * - add FUSE_WRITE_KILL_PRIV flag * - add FUSE_SETUPMAPPING and FUSE_REMOVEMAPPING * - add map_alignment to fuse_init_out, add FUSE_MAP_ALIGNMENT flag + * 7.32 + * - add FUSE_NONSHARED_FS flag */ #ifndef _LINUX_FUSE_H @@ -310,6 +312,7 @@ struct fuse_file_lock { * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request * FUSE_MAP_ALIGNMENT: map_alignment field is valid + * FUSE_NONSHARED_FS: Filesystem is not shared. */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -338,6 +341,7 @@ struct fuse_file_lock { #define FUSE_NO_OPENDIR_SUPPORT (1 << 24) #define FUSE_EXPLICIT_INVAL_DATA (1 << 25) #define FUSE_MAP_ALIGNMENT (1 << 26) +#define FUSE_NONSHARED_FS (1 << 27) /** * CUSE INIT request/reply flags Index: qemu/tools/virtiofsd/fuse_lowlevel.c =================================================================== --- qemu.orig/tools/virtiofsd/fuse_lowlevel.c 2020-09-01 15:22:31.449707002 -0400 +++ qemu/tools/virtiofsd/fuse_lowlevel.c 2020-09-01 15:23:18.777668583 -0400 @@ -2218,6 +2218,15 @@ static void do_init(fuse_req_t req, fuse outarg.map_alignment = ffsl(sysconf(_SC_PAGE_SIZE)) - 1; } + if (arg->flags & FUSE_NONSHARED_FS) { + /* + * By default virtiofs is not shared. Once we support sharing, + * this will be have to a conditional and driven by user's + * selection. + */ + outarg.flags |= FUSE_NONSHARED_FS; + } + fuse_log(FUSE_LOG_DEBUG, " INIT: %u.%u\n", outarg.major, outarg.minor); fuse_log(FUSE_LOG_DEBUG, " flags=0x%08x\n", outarg.flags); fuse_log(FUSE_LOG_DEBUG, " max_readahead=0x%08x\n", outarg.max_readahead);
On Tue, Sep 01, 2020 at 04:40:43PM -0400, Vivek Goyal wrote: > Hi, > > I want to enable SB_NOSEC in fuse in some form so that performance of > small random writes can be improved. As of now, we call file_remove_privs(), > which results in fuse always sending getxattr(security.capability) to > server to figure out if security.capability has been set on file or not. > If it has been set, it needs to be cleared. This slows down small > random writes tremendously. > > I posted couple of proposals in the past here. > > Proposal 1: > > https://lore.kernel.org/linux-fsdevel/20200716144032.GC422759@redhat.com/ > > Proposal 2: > > https://lore.kernel.org/linux-fsdevel/20200724183812.19573-1-vgoyal@redhat.com/ > > This is 3rd proposal now. One of the roadblocks in enabling SB_NOSEC > is shared filesystem. It is possible that another client modified the > file data and this client does not know about it. So we might regress > if we don't fetch security.capability always. > > So looks like this needs to be handled different for shared filesystems > and non-shared filesystems. non-shared filesystems will be more like > local filesystems where fuse does not expect file data/metadata to > change outside the fuse. And we should be able to enable SB_NOSEC > optimization. This is what this proposal does. > > It does not handle the case of shared filesystems. I believe solution > to that will depend on filesystem based on what's the cache coherency > guarantees filesystem provides and what's the cache invalidation > mechanism it uses. > > For now, all filesystems which are not shared can benefit from this > optimization. I am interested in virtiofs which is not shared in > many of the cases. In fact we don't even support shared mode yet. Well, I was hoping that virtiofs and kata containers can directly benefit from this mode for root filesystem image. But Eric Ernst says that kata containers keep bunch of things in a single directory being exported to guest. And while rootfs image is not expected to be updated later, it is possile kubernetes updates other parts later. And that most likely means kata will not use virtiofs non-shared mode. I guess I need to keep this idea on hold for now because I will not have any immediate users. And go back to drawing board and figure out how to not query security.capability on every WRITE. Thanks Vivek > > Any comments or feedback is welcome. > > Thanks > Vivek > > Vivek Goyal (2): > fuse: Add a flag FUSE_NONSHARED_FS > fuse: Enable SB_NOSEC if filesystem is not shared > > fs/fuse/fuse_i.h | 3 +++ > fs/fuse/inode.c | 12 +++++++++++- > include/uapi/linux/fuse.h | 4 ++++ > 3 files changed, 18 insertions(+), 1 deletion(-) > > -- > 2.25.4 >