[RFC,11/11] xfs: use common file type conversion

Message ID	1482178268-22883-12-git-send-email-amir73il@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> From: Amir Goldstein <amir73il@gmail.com> To: Jan Kara <jack@suse.com>, Theodore Ts'o <tytso@mit.edu>, Dave Chinner <david@fromorbit.com> Cc: "Darrick J . Wong" <darrick.wong@oracle.com>, Chris Mason <clm@fb.com>, Boaz Harrosh <boaz@plexistor.com>, Jaegeuk Kim <jaegeuk@kernel.org>, Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>, Mark Fasheh <mfasheh@versity.com>, Evgeniy Dushistov <dushistov@mail.ru>, Miklos Szeredi <miklos@szeredi.hu>, Al Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org Subject: [RFC][PATCH 11/11] xfs: use common file type conversion Date: Mon, 19 Dec 2016 22:11:08 +0200 Message-Id: <1482178268-22883-12-git-send-email-amir73il@gmail.com> In-Reply-To: <1482178268-22883-1-git-send-email-amir73il@gmail.com> References: <1482178268-22883-1-git-send-email-amir73il@gmail.com> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk

Amir Goldstein Dec. 19, 2016, 8:11 p.m. UTC

deduplicate the xfs file type conversion implementation.

xfs readdir code may expose DT_WHT type to user if that
type was set on disk, but xfs code never set a file type
of WHT on disk.

If it is acceptable to expose to user DT_UNKNOWN in case
WHT type somehow got to disk, then xfs_dir3_filetype_table
could also be replaced with the common fs_dtype() helper.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/xfs/libxfs/xfs_da_format.h |  5 +++--
 fs/xfs/libxfs/xfs_dir2.c      | 17 -----------------
 fs/xfs/libxfs/xfs_dir2.h      |  6 ------
 fs/xfs/xfs_iops.c             |  2 +-
 4 files changed, 4 insertions(+), 26 deletions(-)

Darrick J. Wong Dec. 19, 2016, 9:55 p.m. UTC | #1

On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
> deduplicate the xfs file type conversion implementation.
> 
> xfs readdir code may expose DT_WHT type to user if that
> type was set on disk, but xfs code never set a file type
> of WHT on disk.
> 
> If it is acceptable to expose to user DT_UNKNOWN in case
> WHT type somehow got to disk, then xfs_dir3_filetype_table
> could also be replaced with the common fs_dtype() helper.

AFAIK XFS has never actually written XFS_DIR3_FT_WHT to disk.
I see that overlayfs whiteouts are now some sort of weird
chardev with rdev == 0, so I guess overlayfs doesn't either...?

--D

> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_da_format.h |  5 +++--
>  fs/xfs/libxfs/xfs_dir2.c      | 17 -----------------
>  fs/xfs/libxfs/xfs_dir2.h      |  6 ------
>  fs/xfs/xfs_iops.c             |  2 +-
>  4 files changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index 9a492a9..c66c26f 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -169,8 +169,9 @@ struct xfs_da3_icnode_hdr {
>  
>  /*
>   * Dirents in version 3 directories have a file type field. Additions to this
> - * list are an on-disk format change, requiring feature bits. Valid values
> - * are as follows:
> + * list are an on-disk format change, requiring feature bits.
> + * Values 0..7 should match common file type values in file_type.h.
> + * Valid values are as follows:
>   */
>  #define XFS_DIR3_FT_UNKNOWN		0
>  #define XFS_DIR3_FT_REG_FILE		1
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index c58d72c..645a542 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -36,23 +36,6 @@
>  struct xfs_name xfs_name_dotdot = { (unsigned char *)"..", 2, XFS_DIR3_FT_DIR };
>  
>  /*
> - * @mode, if set, indicates that the type field needs to be set up.
> - * This uses the transformation from file mode to DT_* as defined in linux/fs.h
> - * for file type specification. This will be propagated into the directory
> - * structure if appropriate for the given operation and filesystem config.
> - */
> -const unsigned char xfs_mode_to_ftype[S_IFMT >> S_SHIFT] = {
> -	[0]			= XFS_DIR3_FT_UNKNOWN,
> -	[S_IFREG >> S_SHIFT]    = XFS_DIR3_FT_REG_FILE,
> -	[S_IFDIR >> S_SHIFT]    = XFS_DIR3_FT_DIR,
> -	[S_IFCHR >> S_SHIFT]    = XFS_DIR3_FT_CHRDEV,
> -	[S_IFBLK >> S_SHIFT]    = XFS_DIR3_FT_BLKDEV,
> -	[S_IFIFO >> S_SHIFT]    = XFS_DIR3_FT_FIFO,
> -	[S_IFSOCK >> S_SHIFT]   = XFS_DIR3_FT_SOCK,
> -	[S_IFLNK >> S_SHIFT]    = XFS_DIR3_FT_SYMLINK,
> -};
> -
> -/*
>   * ASCII case-insensitive (ie. A-Z) support for directories that was
>   * used in IRIX.
>   */
> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> index 0197590..f9b9b50 100644
> --- a/fs/xfs/libxfs/xfs_dir2.h
> +++ b/fs/xfs/libxfs/xfs_dir2.h
> @@ -32,12 +32,6 @@ struct xfs_dir2_data_unused;
>  extern struct xfs_name	xfs_name_dotdot;
>  
>  /*
> - * directory filetype conversion tables.
> - */
> -#define S_SHIFT 12
> -extern const unsigned char xfs_mode_to_ftype[];
> -
> -/*
>   * directory operations vector for encode/decode routines
>   */
>  struct xfs_dir_ops {
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 308bebb..c122827 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -103,7 +103,7 @@ xfs_dentry_to_name(
>  {
>  	namep->name = dentry->d_name.name;
>  	namep->len = dentry->d_name.len;
> -	namep->type = xfs_mode_to_ftype[(mode & S_IFMT) >> S_SHIFT];
> +	namep->type = fs_umode_to_ftype(mode);
>  }
>  
>  STATIC void
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Dec. 20, 2016, 12:28 a.m. UTC | #2

On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
> deduplicate the xfs file type conversion implementation.
> 
> xfs readdir code may expose DT_WHT type to user if that
> type was set on disk, but xfs code never set a file type
> of WHT on disk.
> 
> If it is acceptable to expose to user DT_UNKNOWN in case
> WHT type somehow got to disk, then xfs_dir3_filetype_table
> could also be replaced with the common fs_dtype() helper.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/xfs/libxfs/xfs_da_format.h |  5 +++--
>  fs/xfs/libxfs/xfs_dir2.c      | 17 -----------------
>  fs/xfs/libxfs/xfs_dir2.h      |  6 ------
>  fs/xfs/xfs_iops.c             |  2 +-
>  4 files changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index 9a492a9..c66c26f 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -169,8 +169,9 @@ struct xfs_da3_icnode_hdr {
>  
>  /*
>   * Dirents in version 3 directories have a file type field. Additions to this
> - * list are an on-disk format change, requiring feature bits. Valid values
> - * are as follows:
> + * list are an on-disk format change, requiring feature bits.
> + * Values 0..7 should match common file type values in file_type.h.
> + * Valid values are as follows:

They have to stay in sync... but there's nothing here to enforce this.

Originally XFS takes the mode value passed in by the VFS and converts
that to an XFS_DIR3_FT_* equivalent, which is passed around internally
before being written to disk.  On the way up (readdir), the _DIR3_FT_
values are converted to their DT_* equivalents before being passed back
to the other VFS directory iterator functions.

With this patch applied, XFS no longer gets to write its own ftype
values to disk -- all that is now opaque in the VFS.  Effectively, we
lose control of that part of our own disk format.  You can subtly change
the range of fs_umode_to_ftype() and that'll get written to disk.

Normally we evaluate creating new feature flags when the on-disk format
changes, to try to minimize user problems.  These problems could arise
in xfs_dir3_get_dtype() which still relies on converting XFS_DIR3_FT_
values in to DT_ values, or with old versions of xfsprogs getting very
confused when it encounters a new value.

The proposed FT_* space also seems inflexible to the VFS -- the upper
5 bits are reserved for private use and the lower 3 bits are all in use,
which means that the VFS cannot later add more type codes bits unless
we can find bits that *nobody* else is using.  (The same theoretically
applies in reverse to XFS but as we don't have any private type codes,
it's not an issue yet.)

Furthermore, xfsprogs uses (roughly) the same libxfs code as the kernel.
Any code hoisted out of the kernel libxfs into the VFS must still be
provided for in xfsprogs so that we can build new xfsprogs on old kernel
headers.  If the hoist is into linux/fs.h, as is the case here, then
xfsprogs must retain a private version of the definitions, grow a bunch
of m4/autoconf macros to check for the presence of the linux/fs.h
versions, and wire up the private versions if linux/fs.h doesn't provide
one.  We also have to make sure that anything added to the VFS version
gets added to the xfsprogs version.

Note: We did this all this work a short time ago so that ext4 and XFS
could present the same FSGETXATTR ioctl to user programs (major benefit)
but it was kind of a pain to get right.  I don't see an upside to ceding
control of this part of the disk format to the VFS.

--D

>   */
>  #define XFS_DIR3_FT_UNKNOWN		0
>  #define XFS_DIR3_FT_REG_FILE		1
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index c58d72c..645a542 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -36,23 +36,6 @@
>  struct xfs_name xfs_name_dotdot = { (unsigned char *)"..", 2, XFS_DIR3_FT_DIR };
>  
>  /*
> - * @mode, if set, indicates that the type field needs to be set up.
> - * This uses the transformation from file mode to DT_* as defined in linux/fs.h
> - * for file type specification. This will be propagated into the directory
> - * structure if appropriate for the given operation and filesystem config.
> - */
> -const unsigned char xfs_mode_to_ftype[S_IFMT >> S_SHIFT] = {
> -	[0]			= XFS_DIR3_FT_UNKNOWN,
> -	[S_IFREG >> S_SHIFT]    = XFS_DIR3_FT_REG_FILE,
> -	[S_IFDIR >> S_SHIFT]    = XFS_DIR3_FT_DIR,
> -	[S_IFCHR >> S_SHIFT]    = XFS_DIR3_FT_CHRDEV,
> -	[S_IFBLK >> S_SHIFT]    = XFS_DIR3_FT_BLKDEV,
> -	[S_IFIFO >> S_SHIFT]    = XFS_DIR3_FT_FIFO,
> -	[S_IFSOCK >> S_SHIFT]   = XFS_DIR3_FT_SOCK,
> -	[S_IFLNK >> S_SHIFT]    = XFS_DIR3_FT_SYMLINK,
> -};
> -
> -/*
>   * ASCII case-insensitive (ie. A-Z) support for directories that was
>   * used in IRIX.
>   */
> diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
> index 0197590..f9b9b50 100644
> --- a/fs/xfs/libxfs/xfs_dir2.h
> +++ b/fs/xfs/libxfs/xfs_dir2.h
> @@ -32,12 +32,6 @@ struct xfs_dir2_data_unused;
>  extern struct xfs_name	xfs_name_dotdot;
>  
>  /*
> - * directory filetype conversion tables.
> - */
> -#define S_SHIFT 12
> -extern const unsigned char xfs_mode_to_ftype[];
> -
> -/*
>   * directory operations vector for encode/decode routines
>   */
>  struct xfs_dir_ops {
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 308bebb..c122827 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -103,7 +103,7 @@ xfs_dentry_to_name(
>  {
>  	namep->name = dentry->d_name.name;
>  	namep->len = dentry->d_name.len;
> -	namep->type = xfs_mode_to_ftype[(mode & S_IFMT) >> S_SHIFT];
> +	namep->type = fs_umode_to_ftype(mode);
>  }
>  
>  STATIC void
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 20, 2016, 5:20 a.m. UTC | #3

On Tue, Dec 20, 2016 at 2:28 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
>> deduplicate the xfs file type conversion implementation.
>>
>> xfs readdir code may expose DT_WHT type to user if that
>> type was set on disk, but xfs code never set a file type
>> of WHT on disk.
>>
>> If it is acceptable to expose to user DT_UNKNOWN in case
>> WHT type somehow got to disk, then xfs_dir3_filetype_table
>> could also be replaced with the common fs_dtype() helper.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>  fs/xfs/libxfs/xfs_da_format.h |  5 +++--
>>  fs/xfs/libxfs/xfs_dir2.c      | 17 -----------------
>>  fs/xfs/libxfs/xfs_dir2.h      |  6 ------
>>  fs/xfs/xfs_iops.c             |  2 +-
>>  4 files changed, 4 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
>> index 9a492a9..c66c26f 100644
>> --- a/fs/xfs/libxfs/xfs_da_format.h
>> +++ b/fs/xfs/libxfs/xfs_da_format.h
>> @@ -169,8 +169,9 @@ struct xfs_da3_icnode_hdr {
>>
>>  /*
>>   * Dirents in version 3 directories have a file type field. Additions to this
>> - * list are an on-disk format change, requiring feature bits. Valid values
>> - * are as follows:
>> + * list are an on-disk format change, requiring feature bits.
>> + * Values 0..7 should match common file type values in file_type.h.
>> + * Valid values are as follows:
>
> They have to stay in sync... but there's nothing here to enforce this.
>
> Originally XFS takes the mode value passed in by the VFS and converts
> that to an XFS_DIR3_FT_* equivalent, which is passed around internally
> before being written to disk.  On the way up (readdir), the _DIR3_FT_
> values are converted to their DT_* equivalents before being passed back
> to the other VFS directory iterator functions.
>
> With this patch applied, XFS no longer gets to write its own ftype
> values to disk -- all that is now opaque in the VFS.  Effectively, we
> lose control of that part of our own disk format.  You can subtly change
> the range of fs_umode_to_ftype() and that'll get written to disk.
>

No way that the common conversion code will change those values
too many fs depend on it. that should be made clear.

> Normally we evaluate creating new feature flags when the on-disk format
> changes, to try to minimize user problems.  These problems could arise
> in xfs_dir3_get_dtype() which still relies on converting XFS_DIR3_FT_
> values in to DT_ values, or with old versions of xfsprogs getting very
> confused when it encounters a new value.
>
> The proposed FT_* space also seems inflexible to the VFS -- the upper
> 5 bits are reserved for private use and the lower 3 bits are all in use,
> which means that the VFS cannot later add more type codes bits unless
> we can find bits that *nobody* else is using.  (The same theoretically
> applies in reverse to XFS but as we don't have any private type codes,
> it's not an issue yet.)
>

btrfs has a private flag on bit4 (FT_XTTR) and this implementation
does not prevent btrfs from using the private bit.
If XFS ever wants to write a private bit it will have to not blindly
translate mode to ftype.
The scope of the common implementation should remain only
the 3bits that translate directly to POSIX types.

> Furthermore, xfsprogs uses (roughly) the same libxfs code as the kernel.
> Any code hoisted out of the kernel libxfs into the VFS must still be
> provided for in xfsprogs so that we can build new xfsprogs on old kernel
> headers.  If the hoist is into linux/fs.h, as is the case here, then
> xfsprogs must retain a private version of the definitions, grow a bunch
> of m4/autoconf macros to check for the presence of the linux/fs.h
> versions, and wire up the private versions if linux/fs.h doesn't provide
> one.  We also have to make sure that anything added to the VFS version
> gets added to the xfsprogs version.
>

As a matter of fact, I would be very much interested to standardize
use the upper 5 bits some can remain private and some can be used
for generic fs purpose. But doing that will require shring a common
library (or at least a spec) between fs offline tools, so that's a
bigger challenge.

> Note: We did this all this work a short time ago so that ext4 and XFS
> could present the same FSGETXATTR ioctl to user programs (major benefit)
> but it was kind of a pain to get right.  I don't see an upside to ceding
> control of this part of the disk format to the VFS.
>

The answer "this is not wanted for XFS" is perfectly valid :)
The common implementation can be used by the small fs, which never
want anymore than the basic ext2 implementation.

However, since the DT_ values and  XFS_DIR3_FT_ values of 0..7
are already carved in stone, as long as comments in both common code
and libxfs code makes that clear, I see no harm in using the common
implementation in xfs, besides the need to yank more code from fs.h to
libxfs, but it's your decision.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 20, 2016, 6:17 a.m. UTC | #4

[adding linux-unionfs]

On Mon, Dec 19, 2016 at 11:55 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
>> deduplicate the xfs file type conversion implementation.
>>
>> xfs readdir code may expose DT_WHT type to user if that
>> type was set on disk, but xfs code never set a file type
>> of WHT on disk.
>>
>> If it is acceptable to expose to user DT_UNKNOWN in case
>> WHT type somehow got to disk, then xfs_dir3_filetype_table
>> could also be replaced with the common fs_dtype() helper.
>
> AFAIK XFS has never actually written XFS_DIR3_FT_WHT to disk.
> I see that overlayfs whiteouts are now some sort of weird
> chardev with rdev == 0, so I guess overlayfs doesn't either...?
>

Nope. overlayfs calls vfs_whiteout() which calls i_op->mknod(.. S_IFCHR, 0)
So AFAIK, there is no evidence of DT_WHT even being use in Linux.

From overlayfs perspective, it could have been nice if conversion functions
took mode+rdev instead of just mode and produced the DT_WHT value,
but it is not all that easy to know how applications would react to this change.

I suppose there shouldn't be a problem to expose DT_WHT d_type in
iterate_dir() and convert it to DT_CHR in getdents' filldir().
It will be beneficial to overlayfs in case of a directory with tons of
whiteouts,
not having to stat all those inodes is a big win.
Not sure how common this use case is, but it is quite easy for users to
get to this sort of state when using inefficient container layouts.

How about xfs_repair? will it complain if it sees DT_WHT and a chardev
inode? does it check at all that the type and mode match?

Being able to play with these sort of mode+rdev+? generic conversions
is one of the reasons I proposed the common DT_ implementation.

Since XFS already has that WHT bit reserved on-disk I may as well
post some POC to use it correctly, so overlayfs can benefit from DT_WHT
in the best case scenario.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 20, 2016, 2:07 p.m. UTC | #5

On Tue, Dec 20, 2016 at 8:17 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> [adding linux-unionfs]
>
> On Mon, Dec 19, 2016 at 11:55 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
>> On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
>>> deduplicate the xfs file type conversion implementation.
>>>
>>> xfs readdir code may expose DT_WHT type to user if that
>>> type was set on disk, but xfs code never set a file type
>>> of WHT on disk.
>>>
>>> If it is acceptable to expose to user DT_UNKNOWN in case
>>> WHT type somehow got to disk, then xfs_dir3_filetype_table
>>> could also be replaced with the common fs_dtype() helper.
>>
>> AFAIK XFS has never actually written XFS_DIR3_FT_WHT to disk.
>> I see that overlayfs whiteouts are now some sort of weird
>> chardev with rdev == 0, so I guess overlayfs doesn't either...?
>>
>
> Nope. overlayfs calls vfs_whiteout() which calls i_op->mknod(.. S_IFCHR, 0)
> So AFAIK, there is no evidence of DT_WHT even being use in Linux.
>
> From overlayfs perspective, it could have been nice if conversion functions
> took mode+rdev instead of just mode and produced the DT_WHT value,
> but it is not all that easy to know how applications would react to this change.
>
> I suppose there shouldn't be a problem to expose DT_WHT d_type in
> iterate_dir() and convert it to DT_CHR in getdents' filldir().
> It will be beneficial to overlayfs in case of a directory with tons of
> whiteouts,
> not having to stat all those inodes is a big win.
> Not sure how common this use case is, but it is quite easy for users to
> get to this sort of state when using inefficient container layouts.
>
> How about xfs_repair? will it complain if it sees DT_WHT and a chardev
> inode? does it check at all that the type and mode match?
>

To answer my own question, yes, xfs_repair would complain and fix this,
so not possible to set DT_WHT type for the VFS whiteout creature
without adding a new feature flag.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Dec. 21, 2016, 5:23 a.m. UTC | #6

On Tue, Dec 20, 2016 at 07:20:22AM +0200, Amir Goldstein wrote:
> On Tue, Dec 20, 2016 at 2:28 AM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
> > Note: We did this all this work a short time ago so that ext4 and XFS
> > could present the same FSGETXATTR ioctl to user programs (major benefit)
> > but it was kind of a pain to get right.  I don't see an upside to ceding
> > control of this part of the disk format to the VFS.
> >
> 
> The answer "this is not wanted for XFS" is perfectly valid :)
> The common implementation can be used by the small fs, which never
> want anymore than the basic ext2 implementation.
> 
> However, since the DT_ values and  XFS_DIR3_FT_ values of 0..7
> are already carved in stone, as long as comments in both common code
> and libxfs code makes that clear, I see no harm in using the common
> implementation in xfs, besides the need to yank more code from fs.h to
> libxfs, but it's your decision.

It's a philoophical and architectural issue. We currently have a
distinct separation between generic functionality and per-filesystem
specific definitions. The on-disk definitions are owned by the
filesystem not the generic code, and the generic code /never/ sees
what the filesystem stores on disk. THe VFS is supposed to be
completely independent of what the filesystems store on disk - it's
an abstract concept - so that it can morph into whatever is required
to support the functionality the different filesystem provides.

The way we normally handle this is a method callout of some kind
into the filesystem to do the filesystem specific function, and
if there are multiple filesystems that do the same thing, they use a
common function. So that part of the patchset - providing common
helpers to inode mode/filesytem d_type conversion - is fine.

The part that isn't fine, IMO, is defining the filesystem d_type
values in generic code. They should be defined by the filesystem and
passed to the generic conversion functions as a constant. It may
require a different structure to do this cleanly (i.e. something
other than a sparse array keyed on S_IFMT), but I think that having
the VFS define on-disk formats like this is a slippery slope that
only leads to long term pain and ossification.

Cheers,

Dave.

Amir Goldstein Dec. 21, 2016, 6:37 a.m. UTC | #7

On Wed, Dec 21, 2016 at 7:23 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Dec 20, 2016 at 07:20:22AM +0200, Amir Goldstein wrote:
>> On Tue, Dec 20, 2016 at 2:28 AM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>> > On Mon, Dec 19, 2016 at 10:11:08PM +0200, Amir Goldstein wrote:
>> > Note: We did this all this work a short time ago so that ext4 and XFS
>> > could present the same FSGETXATTR ioctl to user programs (major benefit)
>> > but it was kind of a pain to get right.  I don't see an upside to ceding
>> > control of this part of the disk format to the VFS.
>> >
>>
>> The answer "this is not wanted for XFS" is perfectly valid :)
>> The common implementation can be used by the small fs, which never
>> want anymore than the basic ext2 implementation.
>>
>> However, since the DT_ values and  XFS_DIR3_FT_ values of 0..7
>> are already carved in stone, as long as comments in both common code
>> and libxfs code makes that clear, I see no harm in using the common
>> implementation in xfs, besides the need to yank more code from fs.h to
>> libxfs, but it's your decision.
>
> It's a philoophical and architectural issue. We currently have a
> distinct separation between generic functionality and per-filesystem
> specific definitions. The on-disk definitions are owned by the
> filesystem not the generic code, and the generic code /never/ sees
> what the filesystem stores on disk. THe VFS is supposed to be
> completely independent of what the filesystems store on disk - it's
> an abstract concept - so that it can morph into whatever is required
> to support the functionality the different filesystem provides.
>

Technically, this abstraction is not within the scope of VFS, but
more something of a "libfs", or code that belongs under fs/common,
much like the new fs/crypto and in theory those common bits
should also belong to a userspace libfs library, but that's taking
this a bit too far now.

> The way we normally handle this is a method callout of some kind
> into the filesystem to do the filesystem specific function, and
> if there are multiple filesystems that do the same thing, they use a
> common function. So that part of the patchset - providing common
> helpers to inode mode/filesytem d_type conversion - is fine.
>
> The part that isn't fine, IMO, is defining the filesystem d_type
> values in generic code. They should be defined by the filesystem and
> passed to the generic conversion functions as a constant. It may
> require a different structure to do this cleanly (i.e. something
> other than a sparse array keyed on S_IFMT), but I think that having
> the VFS define on-disk formats like this is a slippery slope that
> only leads to long term pain and ossification.
>

I agree.
I will add this fs-specfic-type-conversion to the common helpers.

common code will convert i_mode => DT_ => common FT_
and if fs provides a specific conversion table that will also
be used to convert common FT_ values to FS_FT_ values.

I'll see if that turns up to look useful. if not I will just leave the
common conversions from sparse DT values in common code
and leave the rest in fs specific code.

Thanks for the feedback.


Ted, Chris,

Do you share a similar "philosophy" on the matter?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o Dec. 21, 2016, 3:06 p.m. UTC | #8

On Wed, Dec 21, 2016 at 04:23:17PM +1100, Dave Chinner wrote:
> It's a philoophical and architectural issue. We currently have a
> distinct separation between generic functionality and per-filesystem
> specific definitions. The on-disk definitions are owned by the
> filesystem not the generic code, and the generic code /never/ sees
> what the filesystem stores on disk. THe VFS is supposed to be
> completely independent of what the filesystems store on disk - it's
> an abstract concept - so that it can morph into whatever is required
> to support the functionality the different filesystem provides.

I had the same concerns as Dave and Darrick --- which is that it just
"feels" wrong to define file system encoding semantics in generic
code.  So it's really much more of a taste issue than anything else.
I'll note that we've already started taking some steps down this
slippery path with the definition of whiteout --- where we define
WHITEOUT_MODE and WHITEOUT_DEV to be 0, and both the generic code and
the file system code need to agree that the on-disk encoding of a
whiteout is an inode with mode S_IFCHR | WHITEOUT_MODE, and with the
device number set to WHITEOUT_DEV.

(Why we define WHITEOUT_MODE to be zero when everyone has to *know*
and use the same on-disk encoding of S_IFCHR | WhiTEOUT_MODE doesn't
make any sense to me; it would be much simpler to #define
WHITEOUT_MODE to be S_IFCHR since if it ever changed, all file on-disk
encodings of overlay file systems would go *boom*.)

The fact that the three bit encoding of the directory file types is
fully defined, and can *not* ever be extended without breaking things
means that the chances that it would be a problem is much less.  So I
don't object quite as much as Dave and Darrick --- but I'll also point
out the benefits are quite small.  All we are saving is 16 bytes per
file system compiled into the kernel.

So it's a lot of discussion / changes for very little gain.

> The way we normally handle this is a method callout of some kind
> into the filesystem to do the filesystem specific function, and
> if there are multiple filesystems that do the same thing, they use a
> common function. So that part of the patchset - providing common
> helpers to inode mode/filesytem d_type conversion - is fine.

The common helpers are inline functions that do an array lookup.  If
we provide some other more generic way of letting the file systems
provide conversion functions, at that point we're not really saving
anything, and just adding complexity for no real benefit.

Which perhaps is my biggest argument against it; making this be
generic adds complexity where it's not really buying us anything
(except for 16 bytes of data segment).  But whether we think it's a
bad idea due to a slippery slope argument or due to a complexity
argument, it's clear that the gains are marginal, and while I
personally am willing compromise a little on the slippery slope
argument if the gains are large enough --- for example, the fs/crypto
code is certainly adding some commonality in ways that affect on-disk
encodings, in this particular case, the wins just don't seem high
enough.

Cheers,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 21, 2016, 4:37 p.m. UTC | #9

On Wed, Dec 21, 2016 at 5:06 PM, Theodore Ts'o <tytso@mit.edu> wrote:
...

>
> The fact that the three bit encoding of the directory file types is
> fully defined, and can *not* ever be extended without breaking things
> means that the chances that it would be a problem is much less.  So I
> don't object quite as much as Dave and Darrick --- but I'll also point
> out the benefits are quite small.  All we are saving is 16 bytes per
> file system compiled into the kernel.
>
> So it's a lot of discussion / changes for very little gain.
>

Right. never claimed that saving data segment space is the issue
here. To be completely honest, I started looking at ways to optimize
whiteout iteration and was appalled to see all that code copy&paste,
so went on an OCD cleanup spree.

So it is really a lot more about clumsiness of the current code then
about saving actual lines on code or space in data segment.

Perhaps I should have settled with defining this common section:

+#define S_DT_SHIFT     12
+#define S_DT(mode)     (((mode) & S_IFMT) >> S_DT_SHIFT)
+#define S_DT_MASK      (S_IFMT >> S_DT_SHIFT)
+
+#define DT_UNKNOWN     0
+#define DT_FIFO                S_DT(S_IFIFO) /* 1 */
+#define DT_CHR         S_DT(S_IFCHR) /* 2 */
+#define DT_DIR         S_DT(S_IFDIR) /* 4 */
+#define DT_BLK         S_DT(S_IFBLK) /* 6 */
+#define DT_REG         S_DT(S_IFREG) /* 8 */
+#define DT_LNK         S_DT(S_IFLNK) /* 10 */
+#define DT_SOCK                S_DT(S_IFSOCK) /* 12 */
+#define DT_WHT         14
+
+#define DT_MAX         (S_DT_MASK + 1) /* 16 */

and using S_DT(mode) and DT_MAX in place of all the many places in the
code that open code the >> S_SHIFT.

Note that all those file system copy&pasted a potential bug.

This is an array of size 15:

static unsigned char ext2_type_by_mode[S_IFMT >> S_SHIFT] = {

and it is always addressed this way:

ext2_type_by_mode[(inode->i_mode & S_IFMT)>>S_SHIFT];

Which *can* try to address ext2_type_by_mode[15]

Now, can you say for certain that you can never get a malformed i_mode
with ((inode->i_mode & S_IFMT) == S_IFMT) either from disk or from
some code calling into VFS functions, which do not sanity check the
file type of the mode argument.

Regardless, setting an array size of 15 is just as buggy as setting an
array size of DT_SOCK+1 (13) because no syscall sets a larger value
for the type bits.

>> The way we normally handle this is a method callout of some kind
>> into the filesystem to do the filesystem specific function, and
>> if there are multiple filesystems that do the same thing, they use a
>> common function. So that part of the patchset - providing common
>> helpers to inode mode/filesytem d_type conversion - is fine.
>
> The common helpers are inline functions that do an array lookup.  If
> we provide some other more generic way of letting the file systems
> provide conversion functions, at that point we're not really saving
> anything, and just adding complexity for no real benefit.
>

You are absolutely right. Not sure if you replied after seeing
v2 of this patch, but I did not use any fs provided complex conversion
functions.

I left the XFS code mostly as it was (same complexity) and sorted
it out to use some common defines and helpers in a way that
seem cleaner to me. At least that minor bug has been addressed.

I could further simplify the patch to use the S_DT(mode) macro
instead of the fs_umode_to_ftype(mode) helper.
If I do that, then the small snippet above that defines S_DT()
and DT_MAX is the only code that needs to be carried from
linux/fs.h to xfsprogs for building libxfs (same goes for e2fsprogs).

That's would be a technical code cleanup that does not mess
with on-disk format definitions.
I will send out an ext4 sample patch for you to consider.

Thanks for the feedback,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Miklos Szeredi Dec. 21, 2016, 4:59 p.m. UTC | #10

On Wed, Dec 21, 2016 at 4:06 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Dec 21, 2016 at 04:23:17PM +1100, Dave Chinner wrote:
>> It's a philoophical and architectural issue. We currently have a
>> distinct separation between generic functionality and per-filesystem
>> specific definitions. The on-disk definitions are owned by the
>> filesystem not the generic code, and the generic code /never/ sees
>> what the filesystem stores on disk. THe VFS is supposed to be
>> completely independent of what the filesystems store on disk - it's
>> an abstract concept - so that it can morph into whatever is required
>> to support the functionality the different filesystem provides.
>
> I had the same concerns as Dave and Darrick --- which is that it just
> "feels" wrong to define file system encoding semantics in generic
> code.  So it's really much more of a taste issue than anything else.
> I'll note that we've already started taking some steps down this
> slippery path with the definition of whiteout --- where we define
> WHITEOUT_MODE and WHITEOUT_DEV to be 0, and both the generic code and
> the file system code need to agree that the on-disk encoding of a
> whiteout is an inode with mode S_IFCHR | WHITEOUT_MODE, and with the
> device number set to WHITEOUT_DEV.

WHITEOUT_DEV is not meant to be an encoding, it's meant to be an API.

There are reasons why we didn't want to invent a new kind of object
for whiteouts and so the choice fell on some special kind of already
existing object type.  So it's now a char dev with 0 device number and
that's set in stone, at least as far as the userspace API goes.

Filesystems are, on the other hand, free to encode this object in
whatever way they wish.  And in the VFS we could represent it with a
new kind of object to be converted to the char dev on the userspace
interface, if that turns out to be a better design choice.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o Dec. 21, 2016, 10:56 p.m. UTC | #11

On Wed, Dec 21, 2016 at 06:37:42PM +0200, Amir Goldstein wrote:
> 
> Right. never claimed that saving data segment space is the issue
> here. To be completely honest, I started looking at ways to optimize
> whiteout iteration and was appalled to see all that code copy&paste,
> so went on an OCD cleanup spree.
> 
> So it is really a lot more about clumsiness of the current code then
> about saving actual lines on code or space in data segment.
> 
> Perhaps I should have settled with defining this common section:

> +#define DT_WHT         14

What are you trying to accomplish here?  Unless we actually encoding
DT_WHT into the on-disk format, it's not really going to buy you much.

And if you are going to encode this into the on-disk format, each file
system is going to have to set some kind of flag in the superblock to
indicate whether it's doing this new thing or not.  And this is
*precisely* why Dave and Darrick are objecting --- if we are going to
make any kind of file system encoding change, the file system has to
know about it.

> Note that all those file system copy&pasted a potential bug.
> 
> This is an array of size 15:
> 
> static unsigned char ext2_type_by_mode[S_IFMT >> S_SHIFT] = {
> 
> and it is always addressed this way:
> 
> ext2_type_by_mode[(inode->i_mode & S_IFMT)>>S_SHIFT];
> 
> Which *can* try to address ext2_type_by_mode[15]
> 
> Now, can you say for certain that you can never get a malformed i_mode
> with ((inode->i_mode & S_IFMT) == S_IFMT) either from disk or from
> some code calling into VFS functions, which do not sanity check the
> file type of the mode argument.

It's not a problem and not a bug, because we only assign the file type
when the inode is created, and at that point, the i_mode is set by the
*kernel*, and not by the on-disk encoding.

And BTW this is why DT_WHT makes no sense.  We the DT encodings come
from the directory entry, and *never* come from the inode i_mode read
from disk --- since at the time when we do the readdir(2), the inode
may not have been read into memory.  So we can't add DT_WHT unless we
also change the on-disk encoding of the directory entry on a file
system by file system basis.

Cheers,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 22, 2016, 5:54 a.m. UTC | #12

On Thu, Dec 22, 2016 at 12:56 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Dec 21, 2016 at 06:37:42PM +0200, Amir Goldstein wrote:
>>
>> Right. never claimed that saving data segment space is the issue
>> here. To be completely honest, I started looking at ways to optimize
>> whiteout iteration and was appalled to see all that code copy&paste,
>> so went on an OCD cleanup spree.
>>
>> So it is really a lot more about clumsiness of the current code then
>> about saving actual lines on code or space in data segment.
>>
>> Perhaps I should have settled with defining this common section:
>
>> +#define DT_WHT         14
>
> What are you trying to accomplish here?  Unless we actually encoding
> DT_WHT into the on-disk format, it's not really going to buy you much.
>
> And if you are going to encode this into the on-disk format, each file
> system is going to have to set some kind of flag in the superblock to
> indicate whether it's doing this new thing or not.  And this is
> *precisely* why Dave and Darrick are objecting --- if we are going to
> make any kind of file system encoding change, the file system has to
> know about it.
>

Ted,

There is a bit of a confusion here.
Although I would have liked to be able to add DT_WHT to dirent
I found there is no east way to do it without adding fs feature flags.

I did not add the DT_WHT definion, I kept it as is just moved
from fs.h to file_type.h because some fs (e.g. xfs) still use this
ghost define.

>> Note that all those file system copy&pasted a potential bug.
>>
>> This is an array of size 15:
>>
>> static unsigned char ext2_type_by_mode[S_IFMT >> S_SHIFT] = {
>>
>> and it is always addressed this way:
>>
>> ext2_type_by_mode[(inode->i_mode & S_IFMT)>>S_SHIFT];
>>
>> Which *can* try to address ext2_type_by_mode[15]
>>
>> Now, can you say for certain that you can never get a malformed i_mode
>> with ((inode->i_mode & S_IFMT) == S_IFMT) either from disk or from
>> some code calling into VFS functions, which do not sanity check the
>> file type of the mode argument.
>
> It's not a problem and not a bug, because we only assign the file type
> when the inode is created, and at that point, the i_mode is set by the
> *kernel*, and not by the on-disk encoding.
>

Not entirley true.
On rename some fs set dentry type of target by converting the mode of
the renamed object.
I don't remember what ext4 does, but I think it does the same,
so malformed imode on disk could potentially get you there.

> And BTW this is why DT_WHT makes no sense.  We the DT encodings come
> from the directory entry, and *never* come from the inode i_mode read
> from disk --- since at the time when we do the readdir(2), the inode
> may not have been read into memory.  So we can't add DT_WHT unless we
> also change the on-disk encoding of the directory entry on a file
> system by file system basis.
>


True. it makes no sense at all, but we can't remove it.

XFS can decide to start treating it as an error if found on disk,
but that's another story.

Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Dec. 22, 2016, 8:30 p.m. UTC | #13

On Thu, Dec 22, 2016 at 7:54 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 12:56 AM, Theodore Ts'o <tytso@mit.edu> wrote:
...

>>> Note that all those file system copy&pasted a potential bug.
>>>
>>> This is an array of size 15:
>>>
>>> static unsigned char ext2_type_by_mode[S_IFMT >> S_SHIFT] = {
>>>
>>> and it is always addressed this way:
>>>
>>> ext2_type_by_mode[(inode->i_mode & S_IFMT)>>S_SHIFT];
>>>
>>> Which *can* try to address ext2_type_by_mode[15]
>>>
>>> Now, can you say for certain that you can never get a malformed i_mode
>>> with ((inode->i_mode & S_IFMT) == S_IFMT) either from disk or from
>>> some code calling into VFS functions, which do not sanity check the
>>> file type of the mode argument.
>>
>> It's not a problem and not a bug, because we only assign the file type
>> when the inode is created, and at that point, the i_mode is set by the
>> *kernel*, and not by the on-disk encoding.
>>
>
> Not entirley true.
> On rename some fs set dentry type of target by converting the mode of
> the renamed object.
> I don't remember what ext4 does, but I think it does the same,
> so malformed imode on disk could potentially get you there.
>

So I checked the code.

ext4 is off the hook because it sanitizes mode in ext4_iget and
returns -ECORRUPTEDFS

e2fsprogs' ext2_file_type() is off the hook as well, not using a
conversion table at all.

ext2 on the other hand just calls init_special_inode() for the else case
which in turn print a debug message "init_special_inode: bogus i_mode (%o)"
and doesn't do anything about it, so a later rename can certainly try to convert
a bogus i_mode of S_IFMT and escape the boundaries of the conversion table.

Anyway, I won't try to force feed anyone fix patches.

I am going to re-send the xfs patch as an independent fix patch and will
send a matching patch to xfsprogs.

Any other fs maintainers that want the fix are welcome to apply their
own version
or ask me to send a fix their way.

Cheers,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC,11/11] xfs: use common file type conversion

Commit Message

Comments

Patch