diff mbox series

[f2fs-dev,v10,3/8] libfs: Introduce case-insensitive string comparison helper

Message ID 20240215042654.359210-4-eugen.hristev@collabora.com (mailing list archive)
State New
Headers show
Series [f2fs-dev,v10,1/8] ext4: Simplify the handling of cached insensitive names | expand

Commit Message

Eugen Hristev Feb. 15, 2024, 4:26 a.m. UTC
From: Gabriel Krisman Bertazi <krisman@collabora.com>

generic_ci_match can be used by case-insensitive filesystems to compare
strings under lookup with dirents in a case-insensitive way.  This
function is currently reimplemented by each filesystem supporting
casefolding, so this reduces code duplication in filesystem-specific
code.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
[eugen.hristev@collabora.com: rework to first test the exact match]
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
---
 fs/libfs.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  4 +++
 2 files changed, 84 insertions(+)

Comments

Gabriel Krisman Bertazi Feb. 16, 2024, 4:12 p.m. UTC | #1
Eugen Hristev <eugen.hristev@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> generic_ci_match can be used by case-insensitive filesystems to compare
> strings under lookup with dirents in a case-insensitive way.  This
> function is currently reimplemented by each filesystem supporting
> casefolding, so this reduces code duplication in filesystem-specific
> code.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> [eugen.hristev@collabora.com: rework to first test the exact match]
> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
> ---
>  fs/libfs.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |  4 +++
>  2 files changed, 84 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bb18884ff20e..82871fa1b066 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1773,6 +1773,86 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>  	.d_hash = generic_ci_d_hash,
>  	.d_compare = generic_ci_d_compare,
>  };
> +
> +/**
> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
> + * This is a filesystem helper for comparison with directory entries.
> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
> + *
> + * @parent: Inode of the parent of the dirent under comparison
> + * @name: name under lookup.
> + * @folded_name: Optional pre-folded name under lookup
> + * @de_name: Dirent name.
> + * @de_name_len: dirent name length.
> + *
> + *

Since this need a respin, mind dropping the extra empty line here?

> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched.  If @folded_name is provided, it is used instead of
> + * recalculating the casefold of @name.
> + *
> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
> + * < 0 on error.
> + */
> +int generic_ci_match(const struct inode *parent,
> +		     const struct qstr *name,
> +		     const struct qstr *folded_name,
> +		     const u8 *de_name, u32 de_name_len)
> +{
> +	const struct super_block *sb = parent->i_sb;
> +	const struct unicode_map *um = sb->s_encoding;
> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
> +	int res;
> +
> +	if (IS_ENCRYPTED(parent)) {
> +		const struct fscrypt_str encrypted_name =
> +			FSTR_INIT((u8 *) de_name, de_name_len);
> +
> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
> +			return -EINVAL;
> +
> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
> +		if (!decrypted_name.name)
> +			return -ENOMEM;
> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
> +						&decrypted_name);
> +		if (res < 0)
> +			goto out;
> +		dirent.name = decrypted_name.name;
> +		dirent.len = decrypted_name.len;
> +	}
> +
> +	/*
> +	 * Attempt a case-sensitive match first. It is cheaper and
> +	 * should cover most lookups, including all the sane
> +	 * applications that expect a case-sensitive filesystem.
> +	 *


> +	 * This comparison is safe under RCU because the caller
> +	 * guarantees the consistency between str and len. See
> +	 * __d_lookup_rcu_op_compare() for details.
> +	 */

This paragraph doesn't really make sense here.  It is originally from
the d_compare hook, which can be called under RCU, but there is no RCU
here.  Also, here we are comparing the dirent with the
name-under-lookup, name which is already safe.


> +	if (folded_name->name) {
> +		if (dirent.len == folded_name->len &&
> +		    !memcmp(folded_name->name, dirent.name, dirent.len)) {
> +			res = 1;
> +			goto out;
> +		}
> +		res = !utf8_strncasecmp_folded(um, folded_name, &dirent);

Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
which CAN happen for the first time here, if the dirent itself is
corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop the
error, but we want to propagate it from here, such that the warning on
patch 6 can trigger.

This is why I did that match dance on the original submission.  Sorry
for suggesting it.  We really want to get the error from utf8 and
propagate it if it is negative. basically:

        res > 0: match
        res == 0: no match.
        res < 0: propagate error and let the caller handle it


> +	} else {
> +		if (dirent.len == name->len &&
> +		    !memcmp(name->name, dirent.name, dirent.len) &&
> +		    (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
> +			res = 1;
> +			goto out;
> +		}
> +		res = !utf8_strncasecmp(um, name, &dirent);
> +	}
> +
> +out:
> +	kfree(decrypted_name.name);
> +	return res;
> +}
> +EXPORT_SYMBOL(generic_ci_match);
>  #endif
>  
>  #ifdef CONFIG_FS_ENCRYPTION
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 820b93b2917f..7af691ff8d44 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3296,6 +3296,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
>  extern int generic_check_addressable(unsigned, u64);
>  
>  extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
> +extern int generic_ci_match(const struct inode *parent,
> +			    const struct qstr *name,
> +			    const struct qstr *folded_name,
> +			    const u8 *de_name, u32 de_name_len);
>  
>  static inline bool sb_has_encoding(const struct super_block *sb)
>  {
Eugen Hristev Feb. 19, 2024, 4:22 a.m. UTC | #2
On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
> Eugen Hristev <eugen.hristev@collabora.com> writes:
> 
>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>
>> generic_ci_match can be used by case-insensitive filesystems to compare
>> strings under lookup with dirents in a case-insensitive way.  This
>> function is currently reimplemented by each filesystem supporting
>> casefolding, so this reduces code duplication in filesystem-specific
>> code.
>>
>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>> [eugen.hristev@collabora.com: rework to first test the exact match]
>> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
>> ---
>>  fs/libfs.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/fs.h |  4 +++
>>  2 files changed, 84 insertions(+)
>>
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index bb18884ff20e..82871fa1b066 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -1773,6 +1773,86 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>>  	.d_hash = generic_ci_d_hash,
>>  	.d_compare = generic_ci_d_compare,
>>  };
>> +
>> +/**
>> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
>> + * This is a filesystem helper for comparison with directory entries.
>> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
>> + *
>> + * @parent: Inode of the parent of the dirent under comparison
>> + * @name: name under lookup.
>> + * @folded_name: Optional pre-folded name under lookup
>> + * @de_name: Dirent name.
>> + * @de_name_len: dirent name length.
>> + *
>> + *
> 
> Since this need a respin, mind dropping the extra empty line here?
> 
>> + * Test whether a case-insensitive directory entry matches the filename
>> + * being searched.  If @folded_name is provided, it is used instead of
>> + * recalculating the casefold of @name.
>> + *
>> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
>> + * < 0 on error.
>> + */
>> +int generic_ci_match(const struct inode *parent,
>> +		     const struct qstr *name,
>> +		     const struct qstr *folded_name,
>> +		     const u8 *de_name, u32 de_name_len)
>> +{
>> +	const struct super_block *sb = parent->i_sb;
>> +	const struct unicode_map *um = sb->s_encoding;
>> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
>> +	int res;
>> +
>> +	if (IS_ENCRYPTED(parent)) {
>> +		const struct fscrypt_str encrypted_name =
>> +			FSTR_INIT((u8 *) de_name, de_name_len);
>> +
>> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>> +			return -EINVAL;
>> +
>> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>> +		if (!decrypted_name.name)
>> +			return -ENOMEM;
>> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>> +						&decrypted_name);
>> +		if (res < 0)
>> +			goto out;
>> +		dirent.name = decrypted_name.name;
>> +		dirent.len = decrypted_name.len;
>> +	}
>> +
>> +	/*
>> +	 * Attempt a case-sensitive match first. It is cheaper and
>> +	 * should cover most lookups, including all the sane
>> +	 * applications that expect a case-sensitive filesystem.
>> +	 *
> 
> 
>> +	 * This comparison is safe under RCU because the caller
>> +	 * guarantees the consistency between str and len. See
>> +	 * __d_lookup_rcu_op_compare() for details.
>> +	 */
> 
> This paragraph doesn't really make sense here.  It is originally from
> the d_compare hook, which can be called under RCU, but there is no RCU
> here.  Also, here we are comparing the dirent with the
> name-under-lookup, name which is already safe.
> 
> 
>> +	if (folded_name->name) {
>> +		if (dirent.len == folded_name->len &&
>> +		    !memcmp(folded_name->name, dirent.name, dirent.len)) {
>> +			res = 1;
>> +			goto out;
>> +		}
>> +		res = !utf8_strncasecmp_folded(um, folded_name, &dirent);
> 
> Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
> which CAN happen for the first time here, if the dirent itself is
> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop the
> error, but we want to propagate it from here, such that the warning on
> patch 6 can trigger.
> 
> This is why I did that match dance on the original submission.  Sorry
> for suggesting it.  We really want to get the error from utf8 and
> propagate it if it is negative. basically:
> 
>         res > 0: match
>         res == 0: no match.
>         res < 0: propagate error and let the caller handle it

In that case I will revert to the original v9 implementation and send a v11 to
handle that.

Eugen
> 
> 
>> +	} else {
>> +		if (dirent.len == name->len &&
>> +		    !memcmp(name->name, dirent.name, dirent.len) &&
>> +		    (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
>> +			res = 1;
>> +			goto out;
>> +		}
>> +		res = !utf8_strncasecmp(um, name, &dirent);
>> +	}
>> +
>> +out:
>> +	kfree(decrypted_name.name);
>> +	return res;
>> +}
>> +EXPORT_SYMBOL(generic_ci_match);
>>  #endif
>>  
>>  #ifdef CONFIG_FS_ENCRYPTION
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 820b93b2917f..7af691ff8d44 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -3296,6 +3296,10 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
>>  extern int generic_check_addressable(unsigned, u64);
>>  
>>  extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
>> +extern int generic_ci_match(const struct inode *parent,
>> +			    const struct qstr *name,
>> +			    const struct qstr *folded_name,
>> +			    const u8 *de_name, u32 de_name_len);
>>  
>>  static inline bool sb_has_encoding(const struct super_block *sb)
>>  {
>
Gabriel Krisman Bertazi Feb. 19, 2024, 2:55 p.m. UTC | #3
Eugen Hristev <eugen.hristev@collabora.com> writes:

> On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
>> Eugen Hristev <eugen.hristev@collabora.com> writes:
>> 
>>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>>
>>> generic_ci_match can be used by case-insensitive filesystems to compare
>>> strings under lookup with dirents in a case-insensitive way.  This
>>> function is currently reimplemented by each filesystem supporting
>>> casefolding, so this reduces code duplication in filesystem-specific
>>> code.
>>>
>>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>>> [eugen.hristev@collabora.com: rework to first test the exact match]
>>> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
>>> ---
>>>  fs/libfs.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/fs.h |  4 +++
>>>  2 files changed, 84 insertions(+)
>>>
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index bb18884ff20e..82871fa1b066 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -1773,6 +1773,86 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>>>  	.d_hash = generic_ci_d_hash,
>>>  	.d_compare = generic_ci_d_compare,
>>>  };
>>> +
>>> +/**
>>> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
>>> + * This is a filesystem helper for comparison with directory entries.
>>> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
>>> + *
>>> + * @parent: Inode of the parent of the dirent under comparison
>>> + * @name: name under lookup.
>>> + * @folded_name: Optional pre-folded name under lookup
>>> + * @de_name: Dirent name.
>>> + * @de_name_len: dirent name length.
>>> + *
>>> + *
>> 
>> Since this need a respin, mind dropping the extra empty line here?
>> 
>>> + * Test whether a case-insensitive directory entry matches the filename
>>> + * being searched.  If @folded_name is provided, it is used instead of
>>> + * recalculating the casefold of @name.
>>> + *
>>> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
>>> + * < 0 on error.
>>> + */
>>> +int generic_ci_match(const struct inode *parent,
>>> +		     const struct qstr *name,
>>> +		     const struct qstr *folded_name,
>>> +		     const u8 *de_name, u32 de_name_len)
>>> +{
>>> +	const struct super_block *sb = parent->i_sb;
>>> +	const struct unicode_map *um = sb->s_encoding;
>>> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>>> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
>>> +	int res;
>>> +
>>> +	if (IS_ENCRYPTED(parent)) {
>>> +		const struct fscrypt_str encrypted_name =
>>> +			FSTR_INIT((u8 *) de_name, de_name_len);
>>> +
>>> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>> +			return -EINVAL;
>>> +
>>> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>> +		if (!decrypted_name.name)
>>> +			return -ENOMEM;
>>> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>> +						&decrypted_name);
>>> +		if (res < 0)
>>> +			goto out;
>>> +		dirent.name = decrypted_name.name;
>>> +		dirent.len = decrypted_name.len;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Attempt a case-sensitive match first. It is cheaper and
>>> +	 * should cover most lookups, including all the sane
>>> +	 * applications that expect a case-sensitive filesystem.
>>> +	 *
>> 
>> 
>>> +	 * This comparison is safe under RCU because the caller
>>> +	 * guarantees the consistency between str and len. See
>>> +	 * __d_lookup_rcu_op_compare() for details.
>>> +	 */
>> 
>> This paragraph doesn't really make sense here.  It is originally from
>> the d_compare hook, which can be called under RCU, but there is no RCU
>> here.  Also, here we are comparing the dirent with the
>> name-under-lookup, name which is already safe.
>> 
>> 
>>> +	if (folded_name->name) {
>>> +		if (dirent.len == folded_name->len &&
>>> +		    !memcmp(folded_name->name, dirent.name, dirent.len)) {
>>> +			res = 1;
>>> +			goto out;
>>> +		}
>>> +		res = !utf8_strncasecmp_folded(um, folded_name, &dirent);
>> 
>> Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
>> which CAN happen for the first time here, if the dirent itself is
>> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop the
>> error, but we want to propagate it from here, such that the warning on
>> patch 6 can trigger.
>> 
>> This is why I did that match dance on the original submission.  Sorry
>> for suggesting it.  We really want to get the error from utf8 and
>> propagate it if it is negative. basically:
>> 
>>         res > 0: match
>>         res == 0: no match.
>>         res < 0: propagate error and let the caller handle it
>
> In that case I will revert to the original v9 implementation and send a v11 to
> handle that.

Please, note that the memcmp optimization is still valid. On match, we
know the name is valid utf8.  It is just a matter of propagating the
error code from utf8 to the caller if we need to call it.
Eugen Hristev Feb. 20, 2024, 7:36 a.m. UTC | #4
On 2/19/24 16:55, Gabriel Krisman Bertazi wrote:
> Eugen Hristev <eugen.hristev@collabora.com> writes:
> 
>> On 2/16/24 18:12, Gabriel Krisman Bertazi wrote:
>>> Eugen Hristev <eugen.hristev@collabora.com> writes:
>>>
>>>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>>>
>>>> generic_ci_match can be used by case-insensitive filesystems to compare
>>>> strings under lookup with dirents in a case-insensitive way.  This
>>>> function is currently reimplemented by each filesystem supporting
>>>> casefolding, so this reduces code duplication in filesystem-specific
>>>> code.
>>>>
>>>> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>>>> [eugen.hristev@collabora.com: rework to first test the exact match]
>>>> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
>>>> ---
>>>>  fs/libfs.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/fs.h |  4 +++
>>>>  2 files changed, 84 insertions(+)
>>>>
>>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>>> index bb18884ff20e..82871fa1b066 100644
>>>> --- a/fs/libfs.c
>>>> +++ b/fs/libfs.c
>>>> @@ -1773,6 +1773,86 @@ static const struct dentry_operations generic_ci_dentry_ops = {
>>>>  	.d_hash = generic_ci_d_hash,
>>>>  	.d_compare = generic_ci_d_compare,
>>>>  };
>>>> +
>>>> +/**
>>>> + * generic_ci_match() - Match a name (case-insensitively) with a dirent.
>>>> + * This is a filesystem helper for comparison with directory entries.
>>>> + * generic_ci_d_compare should be used in VFS' ->d_compare instead.
>>>> + *
>>>> + * @parent: Inode of the parent of the dirent under comparison
>>>> + * @name: name under lookup.
>>>> + * @folded_name: Optional pre-folded name under lookup
>>>> + * @de_name: Dirent name.
>>>> + * @de_name_len: dirent name length.
>>>> + *
>>>> + *
>>>
>>> Since this need a respin, mind dropping the extra empty line here?
>>>
>>>> + * Test whether a case-insensitive directory entry matches the filename
>>>> + * being searched.  If @folded_name is provided, it is used instead of
>>>> + * recalculating the casefold of @name.
>>>> + *
>>>> + * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
>>>> + * < 0 on error.
>>>> + */
>>>> +int generic_ci_match(const struct inode *parent,
>>>> +		     const struct qstr *name,
>>>> +		     const struct qstr *folded_name,
>>>> +		     const u8 *de_name, u32 de_name_len)
>>>> +{
>>>> +	const struct super_block *sb = parent->i_sb;
>>>> +	const struct unicode_map *um = sb->s_encoding;
>>>> +	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
>>>> +	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
>>>> +	int res;
>>>> +
>>>> +	if (IS_ENCRYPTED(parent)) {
>>>> +		const struct fscrypt_str encrypted_name =
>>>> +			FSTR_INIT((u8 *) de_name, de_name_len);
>>>> +
>>>> +		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
>>>> +			return -EINVAL;
>>>> +
>>>> +		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
>>>> +		if (!decrypted_name.name)
>>>> +			return -ENOMEM;
>>>> +		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
>>>> +						&decrypted_name);
>>>> +		if (res < 0)
>>>> +			goto out;
>>>> +		dirent.name = decrypted_name.name;
>>>> +		dirent.len = decrypted_name.len;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Attempt a case-sensitive match first. It is cheaper and
>>>> +	 * should cover most lookups, including all the sane
>>>> +	 * applications that expect a case-sensitive filesystem.
>>>> +	 *
>>>
>>>
>>>> +	 * This comparison is safe under RCU because the caller
>>>> +	 * guarantees the consistency between str and len. See
>>>> +	 * __d_lookup_rcu_op_compare() for details.
>>>> +	 */
>>>
>>> This paragraph doesn't really make sense here.  It is originally from
>>> the d_compare hook, which can be called under RCU, but there is no RCU
>>> here.  Also, here we are comparing the dirent with the
>>> name-under-lookup, name which is already safe.
>>>
>>>
>>>> +	if (folded_name->name) {
>>>> +		if (dirent.len == folded_name->len &&
>>>> +		    !memcmp(folded_name->name, dirent.name, dirent.len)) {
>>>> +			res = 1;
>>>> +			goto out;
>>>> +		}
>>>> +		res = !utf8_strncasecmp_folded(um, folded_name, &dirent);
>>>
>>> Hmm, second thought on this.  This will ignore errors from utf8_strncasecmp*,
>>> which CAN happen for the first time here, if the dirent itself is
>>> corrupted on disk (exactly why we have patch 6).  Yes, ext4_match will drop the
>>> error, but we want to propagate it from here, such that the warning on
>>> patch 6 can trigger.
>>>
>>> This is why I did that match dance on the original submission.  Sorry
>>> for suggesting it.  We really want to get the error from utf8 and
>>> propagate it if it is negative. basically:
>>>
>>>         res > 0: match
>>>         res == 0: no match.
>>>         res < 0: propagate error and let the caller handle it
>>
>> In that case I will revert to the original v9 implementation and send a v11 to
>> handle that.
> 
> Please, note that the memcmp optimization is still valid. On match, we
> know the name is valid utf8.  It is just a matter of propagating the
> error code from utf8 to the caller if we need to call it.
> 


Okay, I am changing it.

By the way, is this supposed to work like this on case-insensitive directories ?

user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*cuc
ls: cannot access '/media/CI_dir/*cuc': No such file or directory
user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*CUC
-rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/CUC
user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/cuc
-rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/cuc
user@debian-rockchip-rock5b-rk3588:~$


basically wildcards don't work.

Thanks,
Eugen
Gabriel Krisman Bertazi Feb. 20, 2024, 2:59 p.m. UTC | #5
Eugen Hristev <eugen.hristev@collabora.com> writes:

> Okay, I am changing it.
>
> By the way, is this supposed to work like this on case-insensitive directories ?
>
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*cuc
> ls: cannot access '/media/CI_dir/*cuc': No such file or directory
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/*CUC
> -rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/CUC
> user@debian-rockchip-rock5b-rk3588:~$ ls -la /media/CI_dir/cuc
> -rw-r--r-- 1 root root 0 Feb 12 17:47 /media/CI_dir/cuc
> user@debian-rockchip-rock5b-rk3588:~$
>
>
> basically wildcards don't work.

Yes, at least from a kernel point of view.  Your shell does wildcards in
userspace, probably by doing getdents and then comparing with possible
matches.  Since the shell itself is not case-insensitive aware, its
comparison is case-sensitive, and you get these apparent weird
semantics.

Not ideal from a user point of view.  But not a kernel bug.  If it
pushes people away from using case-insensitive directories in their
day-to-day work and leave it to only be used by Windows compatibility
layers, maybe that's a win? :)
diff mbox series

Patch

diff --git a/fs/libfs.c b/fs/libfs.c
index bb18884ff20e..82871fa1b066 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1773,6 +1773,86 @@  static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
 };
+
+/**
+ * generic_ci_match() - Match a name (case-insensitively) with a dirent.
+ * This is a filesystem helper for comparison with directory entries.
+ * generic_ci_d_compare should be used in VFS' ->d_compare instead.
+ *
+ * @parent: Inode of the parent of the dirent under comparison
+ * @name: name under lookup.
+ * @folded_name: Optional pre-folded name under lookup
+ * @de_name: Dirent name.
+ * @de_name_len: dirent name length.
+ *
+ *
+ * Test whether a case-insensitive directory entry matches the filename
+ * being searched.  If @folded_name is provided, it is used instead of
+ * recalculating the casefold of @name.
+ *
+ * Return: > 0 if the directory entry matches, 0 if it doesn't match, or
+ * < 0 on error.
+ */
+int generic_ci_match(const struct inode *parent,
+		     const struct qstr *name,
+		     const struct qstr *folded_name,
+		     const u8 *de_name, u32 de_name_len)
+{
+	const struct super_block *sb = parent->i_sb;
+	const struct unicode_map *um = sb->s_encoding;
+	struct fscrypt_str decrypted_name = FSTR_INIT(NULL, de_name_len);
+	struct qstr dirent = QSTR_INIT(de_name, de_name_len);
+	int res;
+
+	if (IS_ENCRYPTED(parent)) {
+		const struct fscrypt_str encrypted_name =
+			FSTR_INIT((u8 *) de_name, de_name_len);
+
+		if (WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)))
+			return -EINVAL;
+
+		decrypted_name.name = kmalloc(de_name_len, GFP_KERNEL);
+		if (!decrypted_name.name)
+			return -ENOMEM;
+		res = fscrypt_fname_disk_to_usr(parent, 0, 0, &encrypted_name,
+						&decrypted_name);
+		if (res < 0)
+			goto out;
+		dirent.name = decrypted_name.name;
+		dirent.len = decrypted_name.len;
+	}
+
+	/*
+	 * Attempt a case-sensitive match first. It is cheaper and
+	 * should cover most lookups, including all the sane
+	 * applications that expect a case-sensitive filesystem.
+	 *
+	 * This comparison is safe under RCU because the caller
+	 * guarantees the consistency between str and len. See
+	 * __d_lookup_rcu_op_compare() for details.
+	 */
+	if (folded_name->name) {
+		if (dirent.len == folded_name->len &&
+		    !memcmp(folded_name->name, dirent.name, dirent.len)) {
+			res = 1;
+			goto out;
+		}
+		res = !utf8_strncasecmp_folded(um, folded_name, &dirent);
+	} else {
+		if (dirent.len == name->len &&
+		    !memcmp(name->name, dirent.name, dirent.len) &&
+		    (!sb_has_strict_encoding(sb) || !utf8_validate(um, name))) {
+			res = 1;
+			goto out;
+		}
+		res = !utf8_strncasecmp(um, name, &dirent);
+	}
+
+out:
+	kfree(decrypted_name.name);
+	return res;
+}
+EXPORT_SYMBOL(generic_ci_match);
 #endif
 
 #ifdef CONFIG_FS_ENCRYPTION
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 820b93b2917f..7af691ff8d44 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3296,6 +3296,10 @@  extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_check_addressable(unsigned, u64);
 
 extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
+extern int generic_ci_match(const struct inode *parent,
+			    const struct qstr *name,
+			    const struct qstr *folded_name,
+			    const u8 *de_name, u32 de_name_len);
 
 static inline bool sb_has_encoding(const struct super_block *sb)
 {