From patchwork Thu Sep 5 19:02:49 2024
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 13792878
From: André Almeida
To: Hugh Dickins, Andrew Morton, Alexander Viro, Christian Brauner, Jan Kara, krisman@kernel.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, kernel-dev@igalia.com, Daniel Rosenberg, smcv@collabora.com, Christoph Hellwig, Theodore Ts'o, André Almeida
Subject: [PATCH v3 6/9] tmpfs: Add casefold lookup support
Date: Thu, 5 Sep 2024 16:02:49 -0300
Message-ID: <20240905190252.461639-7-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20240905190252.461639-1-andrealmeid@igalia.com>
References: <20240905190252.461639-1-andrealmeid@igalia.com>

Enable casefold lookup in tmpfs, based on the encoding defined by
userspace. Instead of comparing file names byte by byte, lookup compares
the case-insensitive equivalents of the Unicode strings.

* Dcache handling

Case-insensitive dentries need special care. First, every negative
casefold dentry is currently invalidated. The VFS has no proper support
for them: it could incorrectly reuse the name stored in an old negative
dentry for a new file that is a casefold match. For instance, this could
happen:

  $ mkdir DIR
  $ rm -r DIR
  $ mkdir dir
  $ ls
  DIR/

Userspace would perceive this as an inconsistency: even though files are
matched in a case-insensitive manner, the name a file was created with
must still be honored.

In addition, tmpfs keeps only the first equivalent name used as a dentry
in the dcache, which prevents duplicate dentries for the same file. The
d_compare() implementation for casefolded files works on normalized
strings, so the name under lookup is compared against the normalized
name of the existing file, achieving a casefolded lookup.

* Enabling casefold via mount options

Most filesystems store their data on disk, so the casefold option has to
be enabled when the filesystem is created on a device (via mkfs).
However, as tmpfs is a RAM-backed filesystem, there is no on-disk
information and thus no mkfs step to store information about casefold.

For tmpfs, add casefold mount options instead. Userspace can then enable
casefold support for a mount point using:

  $ mount -t tmpfs -o casefold=utf8-12.1.0 fs_name mount_dir/

Userspace selects which version of the Unicode standard to use. The
available versions depend on what the kernel Unicode subsystem supports.
If `casefold` is given without a version, the latest UTF-8 version known
to the kernel is loaded.

And for strict encoding:

  $ mount -t tmpfs -o casefold=utf8-12.1.0,strict_encoding fs_name mount_dir/

Strict encoding means that tmpfs refuses to create file names containing
invalid UTF-8 sequences. When this option is not enabled, a name with an
invalid sequence is treated as an opaque byte sequence: the encoding is
ignored, so the name cannot be looked up in a case-insensitive way.

Signed-off-by: André Almeida
---
Changes from v2:
- shmem_lookup() now sets d_ops
- reworked shmem_parse_opt_casefold()
- if `mount -o casefold` has no param, load latest UTF-8 version
- using IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir) when possible
---
 mm/shmem.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5a77acf6ac6a..6b61fc5dc0b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -40,6 +40,8 @@
 #include
 #include
 #include
+#include
+#include
 #include "swap.h"
 
 static struct vfsmount *shm_mnt __ro_after_init;
@@ -123,6 +125,8 @@ struct shmem_options {
 	bool noswap;
 	unsigned short quota_types;
 	struct shmem_quota_limits qlimits;
+	struct unicode_map *encoding;
+	bool strict_encoding;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
@@ -3427,6 +3431,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
+	if (IS_ENABLED(CONFIG_UNICODE))
+		if (!generic_ci_validate_strict_name(dir, &dentry->d_name))
+			return -EINVAL;
+
 	error = simple_acl_create(dir, inode);
 	if (error)
 		goto out_iput;
@@ -3442,7 +3450,12 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
 	dir->i_size += BOGO_DIRENT_SIZE;
 	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
 	inode_inc_iversion(dir);
-	d_instantiate(dentry, inode);
+
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
+
 	dget(dentry); /* Extra count - pin the dentry in core */
 	return error;
 
@@ -3533,7 +3546,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir,
 	inc_nlink(inode);
 	ihold(inode);	/* New dentry reference */
 	dget(dentry);	/* Extra pinning count for the created dentry */
-	d_instantiate(dentry, inode);
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
 out:
 	return ret;
 }
@@ -3553,6 +3569,14 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 	inode_inc_iversion(dir);
 	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - does all the work */
+
+	/*
+	 * For now, VFS can't deal with case-insensitive negative dentries, so
+	 * we invalidate them
+	 */
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_invalidate(dentry);
+
 	return 0;
 }
 
@@ -3697,7 +3721,10 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	dir->i_size += BOGO_DIRENT_SIZE;
 	inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
 	inode_inc_iversion(dir);
-	d_instantiate(dentry, inode);
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		d_add(dentry, inode);
+	else
+		d_instantiate(dentry, inode);
 	dget(dentry);
 	return 0;
 
@@ -4050,6 +4077,9 @@ enum shmem_param {
 	Opt_usrquota_inode_hardlimit,
 	Opt_grpquota_block_hardlimit,
 	Opt_grpquota_inode_hardlimit,
+	Opt_casefold_version,
+	Opt_casefold,
+	Opt_strict_encoding,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -4081,9 +4111,62 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_string("grpquota_block_hardlimit", Opt_grpquota_block_hardlimit),
 	fsparam_string("grpquota_inode_hardlimit", Opt_grpquota_inode_hardlimit),
 #endif
+	fsparam_string("casefold",	Opt_casefold_version),
+	fsparam_flag  ("casefold",	Opt_casefold),
+	fsparam_flag  ("strict_encoding", Opt_strict_encoding),
 	{}
 };
 
+#if IS_ENABLED(CONFIG_UNICODE)
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+				    bool latest_version)
+{
+	struct shmem_options *ctx = fc->fs_private;
+	unsigned int maj = 0, min = 0, rev = 0, version = 0;
+	struct unicode_map *encoding;
+	char *version_str = param->string + 5;
+	int ret;
+
+	if (latest_version) {
+		version = UTF8_LATEST;
+	} else {
+		if (strncmp(param->string, "utf8-", 5))
+			return invalfc(fc, "Only UTF-8 encodings are supported "
+				       "in the format: utf8-");
+
+		ret = utf8_parse_version(version_str, &maj, &min, &rev);
+		if (ret)
+			return invalfc(fc, "Invalid UTF-8 version: %s", version_str);
+
+		version = UNICODE_AGE(maj, min, rev);
+	}
+
+	encoding = utf8_load(version);
+
+	if (IS_ERR(encoding)) {
+		if (latest_version)
+			return invalfc(fc, "Failed loading latest UTF-8 version");
+		else
+			return invalfc(fc, "Failed loading UTF-8 version: %s", version_str);
+	}
+
+	if (latest_version)
+		pr_info("tmpfs: Using the latest UTF-8 version available\n");
+	else
+		pr_info("tmpfs: Using encoding provided by mount options: %s\n", param->string);
+
+	ctx->encoding = encoding;
+
+	return 0;
+}
+#else
+static int shmem_parse_opt_casefold(struct fs_context *fc, struct fs_parameter *param,
+				    bool latest_version)
+{
+	return invalfc(fc, "tmpfs: No kernel support for casefold filesystems\n");
+}
+#endif
+
 static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 {
 	struct shmem_options *ctx = fc->fs_private;
@@ -4242,6 +4325,13 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 				       "Group quota inode hardlimit too large.");
 		ctx->qlimits.grpquota_ihardlimit = size;
 		break;
+	case Opt_casefold_version:
+		return shmem_parse_opt_casefold(fc, param, false);
+	case Opt_casefold:
+		return shmem_parse_opt_casefold(fc, param, true);
+	case Opt_strict_encoding:
+		ctx->strict_encoding = true;
+		break;
 	}
 	return 0;
 
@@ -4471,6 +4561,11 @@ static void shmem_put_super(struct super_block *sb)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
 
+#if IS_ENABLED(CONFIG_UNICODE)
+	if (sb->s_encoding)
+		utf8_unload(sb->s_encoding);
+#endif
+
 #ifdef CONFIG_TMPFS_QUOTA
 	shmem_disable_quotas(sb);
 #endif
@@ -4515,6 +4610,16 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 	sb->s_export_op = &shmem_export_ops;
 	sb->s_flags |= SB_NOSEC | SB_I_VERSION;
+
+#if IS_ENABLED(CONFIG_UNICODE)
+	if (ctx->encoding) {
+		sb->s_encoding = ctx->encoding;
+		generic_set_sb_d_ops(sb);
+		if (ctx->strict_encoding)
+			sb->s_encoding_flags = SB_ENC_STRICT_MODE_FL;
+	}
+#endif
+
 #else
 	sb->s_flags |= SB_NOUSER;
 #endif
@@ -4704,11 +4809,38 @@ static const struct inode_operations shmem_inode_operations = {
 #endif
 };
 
+static struct dentry *shmem_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
+{
+	const struct dentry_operations *d_ops = &simple_dentry_operations;
+
+#if IS_ENABLED(CONFIG_UNICODE)
+	if (dentry->d_sb->s_encoding)
+		d_ops = &generic_ci_always_del_dentry_ops;
+#endif
+
+	if (dentry->d_name.len > NAME_MAX)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	if (!dentry->d_sb->s_d_op)
+		d_set_d_op(dentry, d_ops);
+
+	/*
+	 * For now, VFS can't deal with case-insensitive negative dentries, so
+	 * we prevent them from being created
+	 */
+	if (IS_ENABLED(CONFIG_UNICODE) && IS_CASEFOLDED(dir))
+		return NULL;
+
+	d_add(dentry, NULL);
+
+	return NULL;
+}
+
 static const struct inode_operations shmem_dir_inode_operations = {
 #ifdef CONFIG_TMPFS
 	.getattr	= shmem_getattr,
 	.create		= shmem_create,
-	.lookup		= simple_lookup,
+	.lookup		= shmem_lookup,
 	.link		= shmem_link,
 	.unlink		= shmem_unlink,
 	.symlink	= shmem_symlink,
@@ -4791,6 +4923,8 @@ int shmem_init_fs_context(struct fs_context *fc)
 	ctx->uid = current_fsuid();
 	ctx->gid = current_fsgid();
 
+	ctx->encoding = NULL;
+
 	fc->fs_private = ctx;
 	fc->ops = &shmem_fs_context_ops;
 	return 0;
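
Illustration only, not part of the patch: the versioned casefold option
handled by shmem_parse_opt_casefold() boils down to checking the "utf8-"
prefix, splitting the remainder into major.minor.revision, and packing
the three numbers into a single version value. A minimal user-space
sketch of that flow follows. parse_casefold_opt() and pack_unicode_age()
are hypothetical names, and the 16/8-bit packing and the 0..255 range
check are assumptions standing in for the kernel's utf8_parse_version()
and UNICODE_AGE(), whose exact behaviour may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's UNICODE_AGE(): pack the three
 * version components into one integer. The shift widths are an assumption.
 */
static unsigned int pack_unicode_age(unsigned int maj, unsigned int min,
				     unsigned int rev)
{
	return (maj << 16) | (min << 8) | rev;
}

/*
 * Parse an option value such as "utf8-12.1.0".
 * Returns 0 and stores the packed version in *version on success,
 * or -1 if the value is not in the "utf8-MAJ.MIN.REV" format.
 */
static int parse_casefold_opt(const char *opt, unsigned int *version)
{
	unsigned int maj, min, rev;
	char trailing;

	assert(opt != NULL && version != NULL);

	/* Only the "utf8-" encoding prefix is accepted. */
	if (strncmp(opt, "utf8-", 5) != 0)
		return -1;

	/*
	 * Expect exactly three dot-separated numbers. The extra %c catches
	 * trailing garbage: a clean match converts exactly 3 items.
	 */
	if (sscanf(opt + 5, "%u.%u.%u%c", &maj, &min, &rev, &trailing) != 3)
		return -1;

	/* Assumed limit so each component fits its slot in the packing. */
	if (maj > 255 || min > 255 || rev > 255)
		return -1;

	*version = pack_unicode_age(maj, min, rev);
	return 0;
}
```

Calling parse_casefold_opt("utf8-12.1.0", &v) would succeed and leave the
packed value for 12.1.0 in v, while values such as "latin1-1.0.0" or
"utf8-12.1" are rejected. In the patch itself the rejected cases are
reported through invalfc(), and a bare `casefold` flag skips parsing and
requests UTF8_LATEST.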