Message ID | 20240208064334.268216-2-eugen.hristev@collabora.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce case-insensitive string comparison helper | expand |
Eugen Hristev <eugen.hristev@collabora.com> writes: > From: Gabriel Krisman Bertazi <krisman@collabora.com> > > generic_ci_match can be used by case-insensitive filesystems to compare > strings under lookup with dirents in a case-insensitive way. This > function is currently reimplemented by each filesystem supporting > casefolding, so this reduces code duplication in filesystem-specific > code. > > Reviewed-by: Eric Biggers <ebiggers@google.com> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> > Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Hi Eugen, Thanks for picking this up. Please, CC me in future versions. > --- > fs/libfs.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/fs.h | 4 +++ > 2 files changed, 72 insertions(+) > > diff --git a/fs/libfs.c b/fs/libfs.c > index bb18884ff20e..f80cb982ac89 100644 > --- a/fs/libfs.c > +++ b/fs/libfs.c > @@ -1773,6 +1773,74 @@ static const struct dentry_operations generic_ci_dentry_ops = { > .d_hash = generic_ci_d_hash, > .d_compare = generic_ci_d_compare, > }; > + > +/** > + * generic_ci_match() - Match a name (case-insensitively) with a dirent. > + * @parent: Inode of the parent of the dirent under comparison > + * @name: name under lookup. > + * @folded_name: Optional pre-folded name under lookup > + * @de_name: Dirent name. > + * @de_name_len: dirent name length. > + * > + * > + * Test whether a case-insensitive directory entry matches the filename > + * being searched. If @folded_name is provided, it is used instead of > + * recalculating the casefold of @name. Can we add a note that this is a filesystem helper for comparison with directory entries, and VFS' ->d_compare should use generic_ci_d_compare. > + * > + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or > + * < 0 on error. > + */ > +int generic_ci_match(const struct inode *parent, > + const struct qstr *name, > + const struct qstr *folded_name, > + const u8 *de_name, u32 de_name_len) > +{ > + const struct super_block *sb = parent->i_sb; > + const struct unicode_map *um = sb->s_encoding; > + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len); > + struct qstr dirent = QSTR_INIT(de_name, de_name_len); > + int res, match = false; I know I originally wrote it this way, but match is an integer, so let's use integers instead of false/true. > + > + if (IS_ENCRYPTED(parent)) { > + const struct fscrypt_str encrypted_name = > + FSTR_INIT((u8 *) de_name, de_name_len); > + > + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent))) > + return -EINVAL; > + > + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL); > + if (!decrypted_name.name) > + return -ENOMEM; > + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name, > + &decrypted_name); > + if (res < 0) > + goto out; > + dirent.name = decrypted_name.name; > + dirent.len = decrypted_name.len; > + } > + > + if (folded_name->name) > + res = utf8_strncasecmp_folded(um, folded_name, &dirent); > + else > + res = utf8_strncasecmp(um, name, &dirent); Similar to https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git/commit/?h=for-next&id=367122c529f35b4655acbe33c0cc4d6d3b32ba71 We should be checking for an exact-match first to avoid the utf8 comparison cost unnecessarily. The only problem is that we need to ensure we fail for an invalid utf-8 de_name in "strict mode". Fortunately, if folded_name->name exists, we know the name-under-lookup was validated when initialized, so an exact-match of de_name must also be valid. If folded_name is NULL, though, we might either have an invalid utf-8 dentry-under-lookup or the allocation itself failed, so we need to utf8_validate it. Honestly, I don't care much about this !folded_name->name case, since utf8_strncasecmp will do the right thing and an invalid utf8 on case-insensitive directories should be an exception, not the norm. but the code might get simpler if we do both: (untested) if (folded_name->name) { if (dirent.len == folded_name->len && !memcmp(folded_name->name, dirent.name, dirent.len)) { res = 1; goto out; } res = utf8_strncasecmp_folded(um, folded_name, &dirent); } else { if (dirent.len == name->len && !memcmp(name->name, dirent.name, dirent.len) && (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) { res = 1; goto out; } res = utf8_strncasecmp(um, name, &dirent); } > + > + if (!res) > + match = true; > + else if (res < 0 && !sb_has_strict_encoding(sb)) { > + /* > + * In non-strict mode, fallback to a byte comparison if > + * the names have invalid characters. > + */ > + res = 0; > + match = ((name->len == dirent.len) && > + !memcmp(name->name, dirent.name, dirent.len)); > + } This goes away entirely. > + > +out: > + kfree(decrypted_name.name); > + return (res >= 0) ? match : res; and this becomes: return res;
On 2/8/24 20:38, Gabriel Krisman Bertazi wrote: > Eugen Hristev <eugen.hristev@collabora.com> writes: > >> From: Gabriel Krisman Bertazi <krisman@collabora.com> >> >> generic_ci_match can be used by case-insensitive filesystems to compare >> strings under lookup with dirents in a case-insensitive way. This >> function is currently reimplemented by each filesystem supporting >> casefolding, so this reduces code duplication in filesystem-specific >> code. >> >> Reviewed-by: Eric Biggers <ebiggers@google.com> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> >> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> > > Hi Eugen, > > Thanks for picking this up. Please, CC me in future versions. Hello Gabriel, Thanks for reviewing :) > >> --- >> fs/libfs.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++ >> include/linux/fs.h | 4 +++ >> 2 files changed, 72 insertions(+) >> >> diff --git a/fs/libfs.c b/fs/libfs.c >> index bb18884ff20e..f80cb982ac89 100644 >> --- a/fs/libfs.c >> +++ b/fs/libfs.c >> @@ -1773,6 +1773,74 @@ static const struct dentry_operations generic_ci_dentry_ops = { >> .d_hash = generic_ci_d_hash, >> .d_compare = generic_ci_d_compare, >> }; >> + >> +/** >> + * generic_ci_match() - Match a name (case-insensitively) with a dirent. >> + * @parent: Inode of the parent of the dirent under comparison >> + * @name: name under lookup. >> + * @folded_name: Optional pre-folded name under lookup >> + * @de_name: Dirent name. >> + * @de_name_len: dirent name length. >> + * >> + * >> + * Test whether a case-insensitive directory entry matches the filename >> + * being searched. If @folded_name is provided, it is used instead of >> + * recalculating the casefold of @name. > > Can we add a note that this is a filesystem helper for comparison with > directory entries, and VFS' ->d_compare should use generic_ci_d_compare. > >> + * >> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or >> + * < 0 on error. >> + */ >> +int generic_ci_match(const struct inode *parent, >> + const struct qstr *name, >> + const struct qstr *folded_name, >> + const u8 *de_name, u32 de_name_len) >> +{ >> + const struct super_block *sb = parent->i_sb; >> + const struct unicode_map *um = sb->s_encoding; >> + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len); >> + struct qstr dirent = QSTR_INIT(de_name, de_name_len); >> + int res, match = false; > > I know I originally wrote it this way, but match is an integer, so > let's use integers instead of false/true. With the changes below, 'match' is no longer needed > >> + >> + if (IS_ENCRYPTED(parent)) { >> + const struct fscrypt_str encrypted_name = >> + FSTR_INIT((u8 *) de_name, de_name_len); >> + >> + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent))) >> + return -EINVAL; >> + >> + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL); >> + if (!decrypted_name.name) >> + return -ENOMEM; >> + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name, >> + &decrypted_name); >> + if (res < 0) >> + goto out; >> + dirent.name = decrypted_name.name; >> + dirent.len = decrypted_name.len; >> + } >> + >> + if (folded_name->name) >> + res = utf8_strncasecmp_folded(um, folded_name, &dirent); >> + else >> + res = utf8_strncasecmp(um, name, &dirent); > > Similar to > > https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git/commit/?h=for-next&id=367122c529f35b4655acbe33c0cc4d6d3b32ba71 > > We should be checking for an exact-match first to avoid the utf8 > comparison cost unnecessarily. The only problem is that we need to > ensure we fail for an invalid utf-8 de_name in "strict mode". > > Fortunately, if folded_name->name exists, we know the name-under-lookup > was validated when initialized, so an exact-match of de_name must also > be valid. If folded_name is NULL, though, we might either have an > invalid utf-8 dentry-under-lookup or the allocation itself failed, so we > need to utf8_validate it. > > Honestly, I don't care much about this !folded_name->name case, since > utf8_strncasecmp will do the right thing and an invalid utf8 on > case-insensitive directories should be an exception, not the norm. but > the code might get simpler if we do both: > > (untested) I implemented your suggestion, but any idea about testing ? I ran smoke on xfstests and it appears to be fine, but maybe some specific test case might try the different paths here ? Let me know, Eugen > > if (folded_name->name) { > if (dirent.len == folded_name->len && > !memcmp(folded_name->name, dirent.name, dirent.len)) { > res = 1; > goto out; > } > res = utf8_strncasecmp_folded(um, folded_name, &dirent); > } else { > if (dirent.len == name->len && > !memcmp(name->name, dirent.name, dirent.len) && > (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) { > res = 1; > goto out; > } > res = utf8_strncasecmp(um, name, &dirent); > } > >> + >> + if (!res) >> + match = true; >> + else if (res < 0 && !sb_has_strict_encoding(sb)) { >> + /* >> + * In non-strict mode, fallback to a byte comparison if >> + * the names have invalid characters. >> + */ >> + res = 0; >> + match = ((name->len == dirent.len) && >> + !memcmp(name->name, dirent.name, dirent.len)); >> + } > > This goes away entirely. > >> + >> +out: >> + kfree(decrypted_name.name); >> + return (res >= 0) ? match : res; > > and this becomes: > > return res; >
Eugen Hristev <eugen.hristev@collabora.com> writes: > On 2/8/24 20:38, Gabriel Krisman Bertazi wrote: >> (untested) > > I implemented your suggestion, but any idea about testing ? I ran smoke on xfstests > and it appears to be fine, but maybe some specific test case might try the > different paths here ? Other than running the fstests quick group for each affected filesystems looking for regressions, the way I'd do it is create a few files and look them up with exact and inexact name matches. While doing that, observe through bpftrace which functions got called and what they returned. Here, since you are testing the uncached lookup, you want to make sure to drop the cached version prior to each lookup.
On 2/9/24 16:40, Gabriel Krisman Bertazi wrote: > Eugen Hristev <eugen.hristev@collabora.com> writes: > >> On 2/8/24 20:38, Gabriel Krisman Bertazi wrote: > >>> (untested) >> >> I implemented your suggestion, but any idea about testing ? I ran smoke on xfstests >> and it appears to be fine, but maybe some specific test case might try the >> different paths here ? > > Other than running the fstests quick group for each affected filesystems > looking for regressions, the way I'd do it is create a few files and > look them up with exact and inexact name matches. While doing that, > observe through bpftrace which functions got called and what they > returned. > > Here, since you are testing the uncached lookup, you want to make sure > to drop the cached version prior to each lookup. > Hello Gabriel, With the changes you suggested, I get these errors now : [ 107.409410] EXT4-fs error (device sda1): ext4_lookup:1816: inode #521217: comm ls: 'CUC' linked to parent dir ls: cannot access '/media/CI_dir/CUC': Structure needs cleaning total 8 drwxr-xr-x 2 root root 4096 Feb 12 11:51 . drwxr-xr-x 4 root root 4096 Feb 12 11:47 .. -????????? ? ? ? ? ? CUC Do you have any idea about what is wrong ? Thanks, Eugen
Eugen Hristev <eugen.hristev@collabora.com> writes: > On 2/9/24 16:40, Gabriel Krisman Bertazi wrote: >> Eugen Hristev <eugen.hristev@collabora.com> writes: > With the changes you suggested, I get these errors now : > > [ 107.409410] EXT4-fs error (device sda1): ext4_lookup:1816: inode #521217: comm > ls: 'CUC' linked to parent dir > ls: cannot access '/media/CI_dir/CUC': Structure needs cleaning > total 8 > drwxr-xr-x 2 root root 4096 Feb 12 11:51 . > drwxr-xr-x 4 root root 4096 Feb 12 11:47 .. > -????????? ? ? ? ? ? CUC > > Do you have any idea about what is wrong ? Hm, there's a bug somewhere. The lookup got broken and ls got an error. Did you debug it a bit? can you share the code and a reproducer? From a quick look at the example I suggested, utf8_strncasecmp* return 0 on match, but ext4_match should return true when matched. So remember to negate the output: ... res = !utf8_strncasecmp(um, name, &dirent); ...
diff --git a/fs/libfs.c b/fs/libfs.c index bb18884ff20e..f80cb982ac89 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1773,6 +1773,74 @@ static const struct dentry_operations generic_ci_dentry_ops = { .d_hash = generic_ci_d_hash, .d_compare = generic_ci_d_compare, }; + +/** + * generic_ci_match() - Match a name (case-insensitively) with a dirent. + * @parent: Inode of the parent of the dirent under comparison + * @name: name under lookup. + * @folded_name: Optional pre-folded name under lookup + * @de_name: Dirent name. + * @de_name_len: dirent name length. + * + * + * Test whether a case-insensitive directory entry matches the filename + * being searched. If @folded_name is provided, it is used instead of + * recalculating the casefold of @name. + * + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or + * < 0 on error. + */ +int generic_ci_match(const struct inode *parent, + const struct qstr *name, + const struct qstr *folded_name, + const u8 *de_name, u32 de_name_len) +{ + const struct super_block *sb = parent->i_sb; + const struct unicode_map *um = sb->s_encoding; + struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len); + struct qstr dirent = QSTR_INIT(de_name, de_name_len); + int res, match = false; + + if (IS_ENCRYPTED(parent)) { + const struct fscrypt_str encrypted_name = + FSTR_INIT((u8 *) de_name, de_name_len); + + if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent))) + return -EINVAL; + + decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL); + if (!decrypted_name.name) + return -ENOMEM; + res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name, + &decrypted_name); + if (res < 0) + goto out; + dirent.name = decrypted_name.name; + dirent.len = decrypted_name.len; + } + + if (folded_name->name) + res = utf8_strncasecmp_folded(um, folded_name, &dirent); + else + res = utf8_strncasecmp(um, name, &dirent); + + if (!res) + match = true; + else if (res < 0 && !sb_has_strict_encoding(sb)) { + /* + * In non-strict mode, fallback to a byte comparison if + * the names have invalid characters. + */ + res = 0; + match = ((name->len == dirent.len) && + !memcmp(name->name, dirent.name, dirent.len)); + } + +out: + kfree(decrypted_name.name); + return (res >= 0) ? match : res; +} +EXPORT_SYMBOL(generic_ci_match); #endif #ifdef CONFIG_FS_ENCRYPTION diff --git a/include/linux/fs.h b/include/linux/fs.h index 820b93b2917f..7af691ff8d44 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3296,6 +3296,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int); extern int generic_check_addressable(unsigned, u64); extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry); +extern int generic_ci_match(const struct inode *parent, + const struct qstr *name, + const struct qstr *folded_name, + const u8 *de_name, u32 de_name_len); static inline bool sb_has_encoding(const struct super_block *sb) {