[4/7] xfs: Implement fallocate query support mode

Message ID	1505749947-26360-5-git-send-email-lczerner@redhat.com (mailing list archive)
State	Deferred, archived
Headers	show Return-Path: <linux-xfs-owner@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 36DCA81DEB From: Lukas Czerner <lczerner@redhat.com> To: linux-fsdevel@vger.kernel.org Cc: Lukas Czerner <lczerner@redhat.com>, linux-xfs@vger.kernel.org Subject: [PATCH 4/7] xfs: Implement fallocate query support mode Date: Mon, 18 Sep 2017 17:52:24 +0200 Message-Id: <1505749947-26360-5-git-send-email-lczerner@redhat.com> In-Reply-To: <1505749947-26360-1-git-send-email-lczerner@redhat.com> References: <1505749947-26360-1-git-send-email-lczerner@redhat.com> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk

Lukas Czerner Sept. 18, 2017, 3:52 p.m. UTC

Return all fallcoate modes supported by xfs file system.

Cc: linux-xfs@vger.kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/xfs/xfs_file.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Christoph Hellwig Sept. 18, 2017, 5:56 p.m. UTC | #1

This seems to miss any explanation and 6 out of 7 patches.

That being said, I really detest adding weird query modes to random
syscalls.  We should expose capabilities through the interface that
was designed for such things: (f)pathconf, as there are a lot more
important parameters we currently fail to expose to userspace to
start with.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Sept. 18, 2017, 8:48 p.m. UTC | #2

On Mon, Sep 18, 2017 at 05:52:24PM +0200, Lukas Czerner wrote:
> Return all fallcoate modes supported by xfs file system.
> 
> Cc: linux-xfs@vger.kernel.org
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index ebdd0bd..85e06c6 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -746,7 +746,8 @@ xfs_file_write_iter(
>  #define	XFS_FALLOC_FL_SUPPORTED						\
>  		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
>  		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
> -		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
> +		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |	\
> +		 FALLOC_FL_QUERY_SUPPORT | FALLOC_FL_PREALLOC_RANGE)
>  
>  STATIC long
>  xfs_file_fallocate(
> @@ -768,6 +769,9 @@ xfs_file_fallocate(
>  	if (mode & ~XFS_FALLOC_FL_SUPPORTED)
>  		return -EOPNOTSUPP;
>  
> +	if (mode & FALLOC_FL_QUERY_SUPPORT)
> +		return XFS_FALLOC_FL_SUPPORTED;

Urk, manpage update describing the goals of the query interface and how
this is supposed to work is needed.

Are we required to return only the mode flags that would reasonably be
expected to work on this file, or the fs in general?  Do we return zero
if the file is immutable (I guess the fd has to be O_RDONLY?) or if the
fs is readonly?

And like hch said, why not {f,}pathconf?

--D

> +
>  	xfs_ilock(ip, iolock);
>  	error = xfs_break_layouts(inode, &iolock);
>  	if (error)
> -- 
> 2.7.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andreas Dilger Sept. 18, 2017, 9:55 p.m. UTC | #3

On Sep 18, 2017, at 2:48 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> On Mon, Sep 18, 2017 at 05:52:24PM +0200, Lukas Czerner wrote:
>> Return all fallcoate modes supported by xfs file system.
>> 
>> Cc: linux-xfs@vger.kernel.org
>> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
>> ---
>> fs/xfs/xfs_file.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index ebdd0bd..85e06c6 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -746,7 +746,8 @@ xfs_file_write_iter(
>> #define	XFS_FALLOC_FL_SUPPORTED						\
>> 		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
>> 		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
>> -		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
>> +		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |	\
>> +		 FALLOC_FL_QUERY_SUPPORT | FALLOC_FL_PREALLOC_RANGE)
>> 
>> STATIC long
>> xfs_file_fallocate(
>> @@ -768,6 +769,9 @@ xfs_file_fallocate(
>> 	if (mode & ~XFS_FALLOC_FL_SUPPORTED)
>> 		return -EOPNOTSUPP;
>> 
>> +	if (mode & FALLOC_FL_QUERY_SUPPORT)
>> +		return XFS_FALLOC_FL_SUPPORTED;
> 
> Urk, manpage update describing the goals of the query interface and how
> this is supposed to work is needed.
> 
> Are we required to return only the mode flags that would reasonably be
> expected to work on this file, or the fs in general?  Do we return zero
> if the file is immutable (I guess the fd has to be O_RDONLY?) or if the
> fs is readonly?
> 
> And like hch said, why not {f,}pathconf?

The problem with pathconf() is that this is just made up by GlibC, and
doesn't actually communicate with the kernel or the filesystem at all.
For example, _PC_LINK_MAX does not return different values for ext2/ext4
filesystems based on feature flags, only what is hard-coded into GlibC
based on the filesystem magic.  This is not going to work as a per-file
source of information.

For example, even though EXT4_LINK_MAX has been 65000 for many years,
GlibC reports 32000 for an ext4 filesystem since it only uses the magic:

$ df -T /
Filesystem     Type 1024-blocks     Used Available Capacity Mounted on
/dev/mapper/vg_twoshoes-lvroot
               ext4    25991484 22475376   2205504      92% /
$ strace getconf LINK_MAX /
:
:
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=6497871,
	     f_bfree=879030, f_bavail=551379, f_files=819200, f_ffree=405130,
	     f_fsid={1950084367, 1019861045}, f_namelen=255}) = 0
write(1, "32000\n",) = 6


Cheers, Andreas

OGAWA Hirofumi Sept. 19, 2017, 3:28 a.m. UTC | #4

Christoph Hellwig <hch@infradead.org> writes:

I myself

Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

to [5/7] fat part though,

> That being said, I really detest adding weird query modes to random
> syscalls.  We should expose capabilities through the interface that
> was designed for such things: (f)pathconf, as there are a lot more
> important parameters we currently fail to expose to userspace to
> start with.

This is true. For example, max link count. And it might be able to use
for fs's bug/issue detection too (in past, userland has to add
workaround for all cases, because noway to detect the buggy or not).

Thanks.

Lukas Czerner Sept. 19, 2017, 8:15 a.m. UTC | #5

On Mon, Sep 18, 2017 at 10:56:51AM -0700, Christoph Hellwig wrote:
> This seems to miss any explanation and 6 out of 7 patches.

See linux-fsdevel for entire patch set.
https://www.spinics.net/lists/linux-fsdevel/msg115740.html

> 
> That being said, I really detest adding weird query modes to random
> syscalls.  We should expose capabilities through the interface that
> was designed for such things: (f)pathconf, as there are a lot more
> important parameters we currently fail to expose to userspace to
> start with.

(f)pathconf on it's own is useless because I need to get this
information directly from the file system. Supported fallocate modes can
differ with inode flags (encrypted files, extent/indirect based files on
ext4). I also need in-kernel interface.

That's why I though fallocate is the convenient place to put this. I
have not looked to it, but other option might be to use fstat to get
this.

-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner Sept. 19, 2017, 8:20 a.m. UTC | #6

On Mon, Sep 18, 2017 at 01:48:38PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 18, 2017 at 05:52:24PM +0200, Lukas Czerner wrote:
> > Return all fallcoate modes supported by xfs file system.
> > 
> > Cc: linux-xfs@vger.kernel.org
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index ebdd0bd..85e06c6 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -746,7 +746,8 @@ xfs_file_write_iter(
> >  #define	XFS_FALLOC_FL_SUPPORTED						\
> >  		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
> >  		 FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |	\
> > -		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
> > +		 FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE |	\
> > +		 FALLOC_FL_QUERY_SUPPORT | FALLOC_FL_PREALLOC_RANGE)
> >  
> >  STATIC long
> >  xfs_file_fallocate(
> > @@ -768,6 +769,9 @@ xfs_file_fallocate(
> >  	if (mode & ~XFS_FALLOC_FL_SUPPORTED)
> >  		return -EOPNOTSUPP;
> >  
> > +	if (mode & FALLOC_FL_QUERY_SUPPORT)
> > +		return XFS_FALLOC_FL_SUPPORTED;
> 
> Urk, manpage update describing the goals of the query interface and how
> this is supposed to work is needed.

See linux-fsdevel for entire patch set. It's RFC, manpage update will
come later.
https://www.spinics.net/lists/linux-fsdevel/msg115740.html

> 
> Are we required to return only the mode flags that would reasonably be
> expected to work on this file, or the fs in general?  Do we return zero

Yes, as mentioned in cover letter it's supposed to return flags that
would be expected to work on this file.

> if the file is immutable (I guess the fd has to be O_RDONLY?) or if the
> fs is readonly?

It's going through the same path as any fallocate call.

> 
> And like hch said, why not {f,}pathconf?

I do not think fpathconf is suitable for this use case, see my reply to hch
and reply form Andreas.

-Lukas

> 
> --D
> 
> > +
> >  	xfs_ilock(ip, iolock);
> >  	error = xfs_break_layouts(inode, &iolock);
> >  	if (error)
> > -- 
> > 2.7.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christoph Hellwig Sept. 19, 2017, 2:13 p.m. UTC | #7

On Tue, Sep 19, 2017 at 10:15:38AM +0200, Lukas Czerner wrote:
> On Mon, Sep 18, 2017 at 10:56:51AM -0700, Christoph Hellwig wrote:
> > This seems to miss any explanation and 6 out of 7 patches.
> 
> See linux-fsdevel for entire patch set.
> https://www.spinics.net/lists/linux-fsdevel/msg115740.html

But only one made it to linux-xfs, so please fix your patch sending.

> (f)pathconf on it's own is useless because I need to get this
> information directly from the file system. Supported fallocate modes can
> differ with inode flags (encrypted files, extent/indirect based files on
> ext4). I also need in-kernel interface.

And that is exactly what I want you to do:  finally bite the bullet
and add pathconf inode operation.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o Sept. 19, 2017, 2:55 p.m. UTC | #8

On Mon, Sep 18, 2017 at 03:55:40PM -0600, Andreas Dilger wrote:
> > And like hch said, why not {f,}pathconf?
> 
> The problem with pathconf() is that this is just made up by GlibC, and
> doesn't actually communicate with the kernel or the filesystem at all.
> For example, _PC_LINK_MAX does not return different values for ext2/ext4
> filesystems based on feature flags, only what is hard-coded into GlibC
> based on the filesystem magic.  This is not going to work as a per-file
> source of information.
> 
> For example, even though EXT4_LINK_MAX has been 65000 for many years,
> GlibC reports 32000 for an ext4 filesystem since it only uses the magic:

Maybe the right answer is we should define a new pathconfat(2) system
call which can be used as part of a C library's implementation of
pathconf() and fpathconf()?  glibc probably won't use it for years, of
course.  But we can at least provide the information via an interface
which we can control, and which is capable of returning correct
results?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner Sept. 19, 2017, 3:33 p.m. UTC | #9

On Tue, Sep 19, 2017 at 10:55:20AM -0400, Theodore Ts'o wrote:
> On Mon, Sep 18, 2017 at 03:55:40PM -0600, Andreas Dilger wrote:
> > > And like hch said, why not {f,}pathconf?
> > 
> > The problem with pathconf() is that this is just made up by GlibC, and
> > doesn't actually communicate with the kernel or the filesystem at all.
> > For example, _PC_LINK_MAX does not return different values for ext2/ext4
> > filesystems based on feature flags, only what is hard-coded into GlibC
> > based on the filesystem magic.  This is not going to work as a per-file
> > source of information.
> > 
> > For example, even though EXT4_LINK_MAX has been 65000 for many years,
> > GlibC reports 32000 for an ext4 filesystem since it only uses the magic:
> 
> Maybe the right answer is we should define a new pathconfat(2) system
> call which can be used as part of a C library's implementation of
> pathconf() and fpathconf()?  glibc probably won't use it for years, of
> course.  But we can at least provide the information via an interface
> which we can control, and which is capable of returning correct
> results?
> 
> 						- Ted

Right, *I think* this is what Christoph suggested in his recent reply.
It sounds fair enough to me and this is much more usefull beyond just
fallocate, so I can look into it.

-Lukas

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christoph Hellwig Sept. 19, 2017, 3:55 p.m. UTC | #10

On Tue, Sep 19, 2017 at 10:55:20AM -0400, Theodore Ts'o wrote:
> Maybe the right answer is we should define a new pathconfat(2) system
> call which can be used as part of a C library's implementation of
> pathconf() and fpathconf()?  glibc probably won't use it for years, of
> course.  But we can at least provide the information via an interface
> which we can control, and which is capable of returning correct
> results?

glibc is very fast at picking up new kernel interface these days
as long as they aren't too controversial.  Implementing a syscall
that backs a function they implement should not be in the controversial
category I think.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Florian Weimer Sept. 19, 2017, 7:17 p.m. UTC | #11

On 09/19/2017 05:55 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 10:55:20AM -0400, Theodore Ts'o wrote:
>> Maybe the right answer is we should define a new pathconfat(2) system
>> call which can be used as part of a C library's implementation of
>> pathconf() and fpathconf()?  glibc probably won't use it for years, of
>> course.  But we can at least provide the information via an interface
>> which we can control, and which is capable of returning correct
>> results?
> 
> glibc is very fast at picking up new kernel interface these days
> as long as they aren't too controversial.  Implementing a syscall
> that backs a function they implement should not be in the controversial
> category I think.

We have a significant backlog, but I don't expect opposition to patches 
implementing syscall wrappers which are just slightly generic (and 
pathconfat should really be fine).

Technically, pathconfat might be tricky to maintain if it's expected to 
be a variadic dispatcher function, and the accompanying UAPI header 
isn't very clean.

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christoph Hellwig Sept. 19, 2017, 8:37 p.m. UTC | #12

On Tue, Sep 19, 2017 at 09:17:28PM +0200, Florian Weimer wrote:
> We have a significant backlog, but I don't expect opposition to patches
> implementing syscall wrappers which are just slightly generic (and
> pathconfat should really be fine).
> 
> Technically, pathconfat might be tricky to maintain if it's expected to be a
> variadic dispatcher function, and the accompanying UAPI header isn't very
> clean.

(f)pathconf is defined as:

       long fpathconf(int fd, int name);
       long pathconf(const char *path, int name);

so there really shouldn't be any issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Theodore Ts'o Sept. 19, 2017, 11:17 p.m. UTC | #13

On Tue, Sep 19, 2017 at 01:37:19PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 09:17:28PM +0200, Florian Weimer wrote:
> > We have a significant backlog, but I don't expect opposition to patches
> > implementing syscall wrappers which are just slightly generic (and
> > pathconfat should really be fine).
> > 
> > Technically, pathconfat might be tricky to maintain if it's expected to be a
> > variadic dispatcher function, and the accompanying UAPI header isn't very
> > clean.
> 
> (f)pathconf is defined as:
> 
>        long fpathconf(int fd, int name);
>        long pathconf(const char *path, int name);
> 
> so there really shouldn't be any issue.

There might be some tricky bits will be if glibc feels it needs to
override the values returned by the kernel (if it believes it may
impose some limitations due to its implementation).  Also, not all
file systems will implement some or all pathconf parameters initially
(or perhaps ever).  So should the VFS be responsible for returning
some intelligent defaults, or glibc?

Also, will application programs want to know whether they are getting
kernel values versus glibc's best guess (which may be depedent on the
kernel version, modulo distributions / handset/tablet vendors who
might decide to backport the syscall to their kernels)?

It might be a good idea to come to a rough agreement (between the
kernel and glibc devs) about how to handle some of these practical
issues up front.

Cheers,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner Sept. 21, 2017, 1:17 p.m. UTC | #14

On Tue, Sep 19, 2017 at 07:17:16PM -0400, Theodore Ts'o wrote:
> On Tue, Sep 19, 2017 at 01:37:19PM -0700, Christoph Hellwig wrote:
> > On Tue, Sep 19, 2017 at 09:17:28PM +0200, Florian Weimer wrote:
> > > We have a significant backlog, but I don't expect opposition to patches
> > > implementing syscall wrappers which are just slightly generic (and
> > > pathconfat should really be fine).
> > > 
> > > Technically, pathconfat might be tricky to maintain if it's expected to be a
> > > variadic dispatcher function, and the accompanying UAPI header isn't very
> > > clean.
> > 
> > (f)pathconf is defined as:
> > 
> >        long fpathconf(int fd, int name);
> >        long pathconf(const char *path, int name);
> > 
> > so there really shouldn't be any issue.
> 
> There might be some tricky bits will be if glibc feels it needs to
> override the values returned by the kernel (if it believes it may
> impose some limitations due to its implementation).  Also, not all
> file systems will implement some or all pathconf parameters initially
> (or perhaps ever).  So should the VFS be responsible for returning
> some intelligent defaults, or glibc?
> 
> Also, will application programs want to know whether they are getting
> kernel values versus glibc's best guess (which may be depedent on the
> kernel version, modulo distributions / handset/tablet vendors who
> might decide to backport the syscall to their kernels)?
> 
> It might be a good idea to come to a rough agreement (between the
> kernel and glibc devs) about how to handle some of these practical
> issues up front.
> 
> Cheers,
> 
> 					- Ted

Hi,

I think that it's reasonable to expect that we will implement the
following basic set of options. If I am not mistaken this is the basic
set that must be implemented by pathconf/fpathconf and it's also the
set of options that currently has linux specific implementation (or just
value definition) in glibc.


I think that most of those can be implemented directly in the vfs with
some exceptions, where we'll need to ask the file system. However this
should not be problem to implement right away when we introduce
pathconf.

Here is the list, although I am still not sure about some of those. Any
clarification on these would help.

_PC_LINK_MAX		include/uapi/linux/limits.h (linux limits)
_PC_MAX_CANON		include/uapi/linux/limits.h (linux limits)
_PC_MAX_INPUT		linux limits
_PC_NAME_MAX		linux limits
_PC_PATH_MAX		linux limits
_PC_PIPE_BUF		linux limits
_PC_CHOWN_RESTRICTED	currently yes
_PC_NO_TRUNC		specified per fs (at least affs set at mount)
_PC_VDISABLE		include/linux/tty.h __DISABLED_CHAR
_PC_SYNC_IO		we support O_SYNC, the question is how is it
			honored ? Currently my system returns undefined
_PC_ASYNC_IO		Not sure what exctly is ment by async here ?
			We always do asynchronous IO unless O_SYNC
			My system returns undefined
_PC_PRIO_IO		no support
_PC_SOCK_MAXBUF		?
_PC_FILESIZEBITS	per fs (s_maxbytes)
_PC_REC_INCR_XFER_SIZE 	?
_PC_REC_MAX_XFER_SIZE	?
_PC_REC_MIN_XFER_SIZE	? block size ?
_PC_REC_XFER_ALIGN	per fs (block size)
_PC_ALLOC_SIZE_MIN	per fs (block size)
_PC_SYMLINK_MAX		per fs (but we do not have limit AFACT)
_PC_2_SYMLINKS		per fs (do we have fs that don't have symlinks ?


As for the cases where glibc want's to override the value returned by
the kernel - I am not sure in what scenarios will they need to do it,
but if they do, it's just a matter of changing it...Not sure that I
understand you concern.

Also, at the moment we do not even know if it's a guess or a value returned
by kernel, so I'd say it does not matter. It can only get better from here.

Adding libc-alpha so they can share some of their wisdom. It'd be very
appreciated.

Thanks!
-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Florian Weimer Sept. 21, 2017, 1:49 p.m. UTC | #15

On 09/21/2017 03:17 PM, Lukas Czerner wrote:

> _PC_NAME_MAX		linux limits

Note that _PC_NAME_MAX is basically a lie.  There are file systems which 
support longer names than 255 bytes.  It's quite common for file systems 
which store names in UCS-2 or UTF-16 and have code point limit of 255, 
which translates to something larger after UTF-8 conversion in the 
kernel.  This value is available today through statfs/f_namemax, but 
historically, the reported value was not correct.

But due to file bind mounts, any value the kernel can return is useless 
for application use anyway.

The only real user of this constant was readdir_r, and I've been trying 
to kill it (it's still in POSIX, but will be removed in a future edition).

> _PC_REC_INCR_XFER_SIZE 	?
> _PC_REC_MAX_XFER_SIZE	?
> _PC_REC_MIN_XFER_SIZE	? block size ?
> _PC_REC_XFER_ALIGN	per fs (block size)

NFS (and perhaps other network file systems) treats f_bsize to signal 
something similar.  We had to remove f_bsize-based tuning from glibc as 
a result because the value is quite misleading for many workloads.

> _PC_ALLOC_SIZE_MIN	per fs (block size)

Like the other size limits quoted above, the semantics are quite 
unclear.  POSIX documents them mostly in the context of the RT 
extensions, so the acceptable values likely depend whether the 
application uses kernel-assisted aio, or aio based on userspace threads.

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Florian Weimer Sept. 21, 2017, 1:54 p.m. UTC | #16

On 09/20/2017 01:17 AM, Theodore Ts'o wrote:
> There might be some tricky bits will be if glibc feels it needs to
> override the values returned by the kernel (if it believes it may
> impose some limitations due to its implementation).  Also, not all
> file systems will implement some or all pathconf parameters initially
> (or perhaps ever).  So should the VFS be responsible for returning
> some intelligent defaults, or glibc?

I'd prefer if we could export pathconfat directly on the glibc side, 
without any emulation, and have the application deal with ENOSYS.

We would use it to implement parts of pathconf/fpathconf if available, 
but I don't think we should copy the existing emulation for pathconfat 
because every time we add such emulation, it comes back to haunt us.

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner Sept. 22, 2017, 8:38 a.m. UTC | #17

On Thu, Sep 21, 2017 at 03:49:48PM +0200, Florian Weimer wrote:
> On 09/21/2017 03:17 PM, Lukas Czerner wrote:
> 
> > _PC_NAME_MAX		linux limits
> 
> Note that _PC_NAME_MAX is basically a lie.  There are file systems which
> support longer names than 255 bytes.  It's quite common for file systems
> which store names in UCS-2 or UTF-16 and have code point limit of 255, which
> translates to something larger after UTF-8 conversion in the kernel.  This
> value is available today through statfs/f_namemax, but historically, the
> reported value was not correct.

Right, we can do this correctly from kernel though.

> 
> But due to file bind mounts, any value the kernel can return is useless for
> application use anyway.
> 
> The only real user of this constant was readdir_r, and I've been trying to
> kill it (it's still in POSIX, but will be removed in a future edition).

Good to know, but we still need to implement this, at least for now.

> 
> > _PC_REC_INCR_XFER_SIZE 	?
> > _PC_REC_MAX_XFER_SIZE	?
> > _PC_REC_MIN_XFER_SIZE	? block size ?
> > _PC_REC_XFER_ALIGN	per fs (block size)
> 
> NFS (and perhaps other network file systems) treats f_bsize to signal
> something similar.  We had to remove f_bsize-based tuning from glibc as a
> result because the value is quite misleading for many workloads.
> 
> > _PC_ALLOC_SIZE_MIN	per fs (block size)
> 
> Like the other size limits quoted above, the semantics are quite unclear.
> POSIX documents them mostly in the context of the RT extensions, so the
> acceptable values likely depend whether the application uses kernel-assisted
> aio, or aio based on userspace threads.


Thanks!
-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Lukas Czerner Sept. 22, 2017, 8:40 a.m. UTC | #18

On Thu, Sep 21, 2017 at 03:54:01PM +0200, Florian Weimer wrote:
> On 09/20/2017 01:17 AM, Theodore Ts'o wrote:
> > There might be some tricky bits will be if glibc feels it needs to
> > override the values returned by the kernel (if it believes it may
> > impose some limitations due to its implementation).  Also, not all
> > file systems will implement some or all pathconf parameters initially
> > (or perhaps ever).  So should the VFS be responsible for returning
> > some intelligent defaults, or glibc?
> 
> I'd prefer if we could export pathconfat directly on the glibc side, without
> any emulation, and have the application deal with ENOSYS.
> 
> We would use it to implement parts of pathconf/fpathconf if available, but I
> don't think we should copy the existing emulation for pathconfat because
> every time we add such emulation, it comes back to haunt us.

Fair enough, I'll get back with some kernel code.

Thanks!
-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[4/7] xfs: Implement fallocate query support mode

Commit Message

Comments

Patch