Message ID | 20230223024412.3522465-6-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New |
Series | tmpfs: add the option to disable swap |
On Wed, Feb 22, 2023 at 06:44:12PM -0800, Luis Chamberlain wrote:
> In doing experimentations with shmem having the option to avoid swap
> becomes a useful mechanism. One of the *raves* about brd over shmem is
> you can avoid swap, but that's not really a good reason to use brd if
> we can instead use shmem. Using brd has its own good reasons to exist,
> but just because "tmpfs" doesn't let you do that is not a great reason
> to avoid it if we can easily add support for it.
>
> I don't add support for reconfiguring incompatible options, but if
> we really wanted to we can add support for that.
>
> To avoid swap we use mapping_set_unevictable() upon inode creation,
> and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---

We would have use-cases for this in systemd. We currently use ramfs for
systemd's credential logic since ramfs is unswappable. It'd be very neat
if we could use tmpfs instead.

Acked-by: Christian Brauner <brauner@kernel.org>
On Thu, Feb 23, 2023 at 01:26:31PM +0100, Christian Brauner wrote:
> We would have use-cases for this in systemd. We currently use ramfs for
> systemd's credential logic since ramfs is unswappable. It'd be very neat
> if we could use tmpfs instead.

What is the advantage of using a swapless tmpfs over ramfs?
On Thu, Feb 23, 2023 at 07:16:09AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 23, 2023 at 01:26:31PM +0100, Christian Brauner wrote:
> > We would have use-cases for this in systemd. We currently use ramfs for
> > systemd's credential logic since ramfs is unswappable. It'd be very neat
> > if we could use tmpfs instead.
>
> What is the advantage of using a swapless tmpfs over ramfs?

There are a few reasons we usually prefer tmpfs over ramfs. Iirc, ramfs
doesn't have limits and grows dynamically. So we currently only use it
from the most privileged process where we do our own accounting and
immediately remount the superblock read-only. Tmpfs on the other hand
offers various ways to restrict memory consumption.

Other reasons are that ramfs doesn't support selinux labels, xattrs, and
acls in general which come in quite handy. Starting with kernel v6.3
tmpfs does also support idmapped mounts. So we usually prefer tmpfs
over ramfs unless we have a very specific need such as the memory not
being swapped out.
On Thu, Feb 23, 2023 at 05:09:14PM +0100, Christian Brauner wrote:
> On Thu, Feb 23, 2023 at 07:16:09AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 23, 2023 at 01:26:31PM +0100, Christian Brauner wrote:
> > > We would have use-cases for this in systemd. We currently use ramfs for
> > > systemd's credential logic since ramfs is unswappable. It'd be very neat
> > > if we could use tmpfs instead.
> >
> > What is the advantage of using a swapless tmpfs over ramfs?
>
> There are a few reasons we usually prefer tmpfs over ramfs. Iirc, ramfs
> doesn't have limits and grows dynamically. So we currently only use it
> from the most privileged process where we do our own accounting and
> immediately remount the superblock read-only. Tmpfs on the other hand
> offers various ways to restrict memory consumption.

Size limits are just one of the bells and whistles; ramfs has no
configurable options. So *all* options parsed in shmem_parse_options()
are only available with tmpfs. Some of the options are:

  * size
  * number of blocks
  * number of inodes
  * NUMA memory allocation policy
  * huge pages

> Other reasons are that ramfs doesn't support selinux labels, xattrs, and
> acls in general which come in quite handy. Starting with kernel v6.3
> tmpfs does also support idmapped mounts. So we usually prefer tmpfs
> over ramfs unless we have a very specific need such as the memory not
> being swapped out.

I guess it's time to update Documentation/filesystems/tmpfs.rst.

  Luis
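As a rough illustration of the options listed above, the sketch below
builds the kind of option string a caller might pass as the data
argument of mount(2) for a tmpfs mount. The helper name
tmpfs_build_opts() and its parameters are hypothetical and not part of
any kernel or libc API; size and nr_inodes are existing tmpfs options,
while noswap is the flag proposed by this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a tmpfs option string such as
 * "size=64m,nr_inodes=1024,noswap" into buf.
 *
 * Returns 0 on success, -1 if formatting failed or buf is too small.
 */
int tmpfs_build_opts(char *buf, size_t len, const char *size,
                     unsigned long nr_inodes, bool noswap)
{
    /* "noswap" is a bare flag, so it is appended without a value. */
    int n = snprintf(buf, len, "size=%s,nr_inodes=%lu%s",
                     size, nr_inodes, noswap ? ",noswap" : "");

    /* snprintf() reports the length it wanted; detect truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be handed to something like
mount("tmpfs", target, "tmpfs", flags, opts). On a kernel without this
patch the noswap flag would presumably be rejected as an unknown
parameter.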
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 53e59433497a..d7e11f492289 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -44,6 +44,8 @@ The unevictable list addresses the following classes of unevictable pages:
 
  * Those owned by ramfs.
 
+ * Those owned by tmpfs with the noswap option.
+
  * Those mapped into SHM_LOCK'd shared memory regions.
 
  * Those mapped into VM_LOCKED [mlock()ed] VMAs.
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d09d54be4ffd..98a7d53f6cc5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -45,6 +45,7 @@ struct shmem_sb_info {
 	kuid_t uid;		    /* Mount uid for root directory */
 	kgid_t gid;		    /* Mount gid for root directory */
 	bool full_inums;	    /* If i_ino should be uint or ino_t */
+	bool noswap;		    /* ignores VM reclaim / swap requests */
 	ino_t next_ino;		    /* The next per-sb inode number to use */
 	ino_t __percpu *ino_batch;  /* The next per-cpu inode number to use */
 	struct mempolicy *mpol;     /* default memory policy for mappings */
diff --git a/mm/shmem.c b/mm/shmem.c
index a49b31d38627..d2f34147fc66 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -116,10 +116,12 @@ struct shmem_options {
 	bool full_inums;
 	int huge;
 	int seen;
+	bool noswap;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
 #define SHMEM_SEEN_INUMS 8
+#define SHMEM_SEEN_NOSWAP 16
 };
 
 #ifdef CONFIG_TMPFS
@@ -1334,6 +1336,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct address_space *mapping = folio->mapping;
 	struct inode *inode = mapping->host;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	swp_entry_t swap;
 	pgoff_t index;
 
@@ -1349,7 +1352,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		goto redirty;
 	}
 
-	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
+	if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
 		goto redirty;
 
 	if (!total_swap_pages)
@@ -2374,6 +2377,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
 		shmem_set_inode_flags(inode, info->fsflags);
 		INIT_LIST_HEAD(&info->shrinklist);
 		INIT_LIST_HEAD(&info->swaplist);
+		if (sbinfo->noswap)
+			mapping_set_unevictable(inode->i_mapping);
 		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 		mapping_set_large_folios(inode->i_mapping);
@@ -3461,6 +3466,7 @@ enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_noswap,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3482,6 +3488,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_flag  ("noswap",	Opt_noswap),
 	{}
 };
 
@@ -3565,6 +3572,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_noswap:
+		ctx->noswap = true;
+		ctx->seen |= SHMEM_SEEN_NOSWAP;
+		break;
 	}
 	return 0;
 
@@ -3663,6 +3674,14 @@ static int shmem_reconfigure(struct fs_context *fc)
 		err = "Current inum too high to switch to 32-bit inums";
 		goto out;
 	}
+	if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
+		err = "Cannot disable swap on remount";
+		goto out;
+	}
+	if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
+		err = "Cannot enable swap on remount if it was disabled on first mount";
+		goto out;
+	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
 		sbinfo->huge = ctx->huge;
@@ -3683,6 +3702,10 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->mpol = ctx->mpol;	/* transfers initial ref */
 		ctx->mpol = NULL;
 	}
+
+	if (ctx->noswap)
+		sbinfo->noswap = true;
+
 	raw_spin_unlock(&sbinfo->stat_lock);
 	mpol_put(mpol);
 	return 0;
@@ -3780,6 +3803,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 			ctx->inodes = shmem_default_max_inodes();
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
+		sbinfo->noswap = ctx->noswap;
 	} else {
 		sb->s_flags |= SB_NOUSER;
 	}
In doing experimentations with shmem having the option to avoid swap
becomes a useful mechanism. One of the *raves* about brd over shmem is
you can avoid swap, but that's not really a good reason to use brd if
we can instead use shmem. Using brd has its own good reasons to exist,
but just because "tmpfs" doesn't let you do that is not a great reason
to avoid it if we can easily add support for it.

I don't add support for reconfiguring incompatible options, but if
we really wanted to we can add support for that.

To avoid swap we use mapping_set_unevictable() upon inode creation,
and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 Documentation/mm/unevictable-lru.rst |  2 ++
 include/linux/shmem_fs.h             |  1 +
 mm/shmem.c                           | 26 +++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)
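
To make the "no reconfiguring incompatible options" rule easier to
reason about, here is a small userspace model of the two remount checks
this patch adds to shmem_reconfigure(). It is only a sketch: the struct
and function names are hypothetical stand-ins for the kernel's
shmem_options and shmem_sb_info, and only the two conditions and their
error strings are taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same bit value as SHMEM_SEEN_NOSWAP in the patch. */
#define SEEN_NOSWAP 16

/* Hypothetical stand-in for the options parsed from a remount request. */
struct remount_request {
    int seen;     /* bitmask of options that appeared in the request */
    bool noswap;  /* true if "noswap" was given */
};

/* Hypothetical stand-in for the state of the mounted superblock. */
struct mounted_state {
    bool noswap;  /* true if the first mount used "noswap" */
};

/*
 * Mirrors the two checks from the diff: returns the error string the
 * patch would report, or NULL if the remount request is compatible.
 */
const char *check_noswap_remount(const struct remount_request *ctx,
                                 const struct mounted_state *sb)
{
    /* "noswap" requested on a mount that was created swappable. */
    if ((ctx->seen & SEEN_NOSWAP) && ctx->noswap && !sb->noswap)
        return "Cannot disable swap on remount";

    /* "noswap" omitted on a mount that was created with it. */
    if (!(ctx->seen & SEEN_NOSWAP) && !ctx->noswap && sb->noswap)
        return "Cannot enable swap on remount if it was disabled on first mount";

    return NULL;
}
```

In this model a remount succeeds only when the request agrees with the
first mount: noswap given on a noswap mount, or omitted on a swappable
one.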