diff mbox

vfs: Add support to check max and min inode times

Message ID 1456933894-17001-1-git-send-email-deepa.kernel@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Deepa Dinamani March 2, 2016, 3:51 p.m. UTC
This is a preparation patch to add range checking for inode
timestamps.

These range checks will be used to clamp timestamps to filesystem
allowed ranges.

Individual filesystems do not have the same on disk format as
the in memory inodes. Range checking and clamping times assigned
to inodes will help keep in memory and on-disk timestamps to be
in sync.

Extend struct super_block to include information about the max
and min inode times each filesystem can hold. These are dependent
on the on-disk format of filesystems.

Every time a new superblock is created, make sure that the superblock
max and min timestamp fields are assigned invalid values.

Another series will initialize these fields to appropriate values for
every filesystem.

The values are currently ignored. The exact policy and behavior will be
decided in a separate patch.

MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN
and S64_MAX respectively so that even if one of the fields is
uninitialized, it can be detected by using the condition
max_time < min_time.

The original idea for the feature comes from the discussion:
https://lkml.org/lkml/2014/5/30/669

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---

The intention is to include this as part of 4.6 so that the follow on
patches can go into 4.7.

The series and the plan have been discussed with Arnd Bergmann.


fs/super.c         |  2 ++
 include/linux/fs.h | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Arnd Bergmann March 2, 2016, 4:26 p.m. UTC | #1
On Wednesday 02 March 2016 07:51:34 Deepa Dinamani wrote:
> This is a preparation patch to add range checking for inode
> timestamps.
> 
> These range checks will be used to clamp timestamps to filesystem
> allowed ranges.
> 
> Individual filesystems do not have the same on disk format as
> the in memory inodes. Range checking and clamping times assigned
> to inodes will help keep in memory and on-disk timestamps to be
> in sync.
> 
> Extend struct super_block to include information about the max
> and min inode times each filesystem can hold. These are dependent
> on the on-disk format of filesystems.
> 
> Every time a new superblock is created, make sure that the superblock
> max and min timestamp fields are assigned invalid values.
> 
> Another series will initialize these fields to appropriate values for
> every filesystem.
> 
> The values are currently ignored. The exact policy and behavior will be
> decided in a separate patch.
> 
> MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN
> and S64_MAX respectively so that even if one of the fields is
> uninitialized, it can be detected by using the condition
> max_time < min_time.
> 
> The original idea for the feature comes from the discussion:
> https://lkml.org/lkml/2014/5/30/669
> 
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> ---
> 
> The intention is to include this as part of 4.6 so that the follow on
> patches can go into 4.7.
> 
> The series and the plan have been discussed with Arnd Bergmann.
> 
> 

Acked-by: Arnd Bergmann <arnd@arndb.de>

> fs/super.c         |  2 ++
>  include/linux/fs.h | 13 ++++++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 1182af8..d70a8f6 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -239,6 +239,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
>  	s->s_maxbytes = MAX_NON_LFS;
>  	s->s_op = &default_op;
>  	s->s_time_gran = 1000000000;
> +	s->s_time_max = MAX_INVALID_VFS_TIME;
> +	s->s_time_min = MIN_INVALID_VFS_TIME;
>  	s->cleancache_poolid = CLEANCACHE_NO_POOL;
>  
>  	s->s_shrink.seeks = DEFAULT_SEEKS;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1af4727..15b41e6 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
>  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
>  #endif
>  
> +#define MAX_VFS_TIME	S64_MAX
> +#define MIN_VFS_TIME	S64_MIN
> +
> +#define MAX_INVALID_VFS_TIME	S64_MIN
> +#define MIN_INVALID_VFS_TIME	S64_MAX
> +
>  #define FL_POSIX	1
>  #define FL_FLOCK	2
>  #define FL_DELEG	4	/* NFSv4 delegation */
> @@ -1343,7 +1349,12 @@ struct super_block {
>  
>  	/* Granularity of c/m/atime in ns.
>  	   Cannot be worse than a second */
> -	u32		   s_time_gran;
> +	u32		s_time_gran;
> +	/* Max and min values for timestamps
> +	 * according to the range supported by filesystems.
> +	 */
> +	time64_t	s_time_max;
> +	time64_t	s_time_min;
>  
>  	/*
>  	 * The next field is for VFS *only*. No filesystems have any business
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner March 2, 2016, 10:19 p.m. UTC | #2
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
> MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN
> and S64_MAX respectively so that even if one of the fields is
> uninitialized, it can be detected by using the condition
> max_time < min_time.

I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean
when I see it in the code. does it mean time that lies between
MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid
(unlikely, but that's the obvious reading :)?

Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I
can't tell...

Normally limits are specified by "min valid" and "max valid"
defines, which are pretty clear in their meaning. Like:

> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
>  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
>  #endif
>  
> +#define MAX_VFS_TIME	S64_MAX
> +#define MIN_VFS_TIME	S64_MIN

These. Anything ouside these ranges is invalid.

As such, I think this is wrong for 32 bit systems as the min/max VFS
times right now are S32_MAX/S32_MIN...

Cheers,

Dave.
Arnd Bergmann March 2, 2016, 11:07 p.m. UTC | #3
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
> On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
> > MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN
> > and S64_MAX respectively so that even if one of the fields is
> > uninitialized, it can be detected by using the condition
> > max_time < min_time.
> 
> I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean
> when I see it in the code. does it mean time that lies between
> MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid
> (unlikely, but that's the obvious reading :)?
> 
> Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I
> can't tell...
> 
> Normally limits are specified by "min valid" and "max valid"
> defines, which are pretty clear in their meaning. Like:

Hmm, I had meant to comment on this in my private discussion.

I agree this needs some more explanation in the source code,
the idea is that once we have modified all file systems to
correctly set the actual limits, we can then detect any newly
added file system that forgets to set them, so we don't get
accidental incorrect limits.

> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
> >  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
> >  #endif
> >  
> > +#define MAX_VFS_TIME	S64_MAX
> > +#define MIN_VFS_TIME	S64_MIN
> 
> These. Anything ouside these ranges is invalid.
> 
> As such, I think this is wrong for 32 bit systems as the min/max VFS
> times right now are S32_MAX/S32_MIN...

I think the file system should always set the actual on-disk format
limits in the superblock, regardless of the word size of the CPU
architecture.

We already support 32-bit time_t in compat tasks on 64-bit
architectures, and we will support  64-bit time_t in normal
tasks on 32-bit architectures in the future, and there is no
need to range-check the timestamps written to disk.

The one range-check that we probably want for current 32-bit tasks
is in the vfs_fstatat/vfs_getattr code, so a kernel that internally
handles post-2038 timestamps can limit them to S32_MAX instead of
wrapping.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner March 2, 2016, 11:45 p.m. UTC | #4
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
> On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
> > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
> > > MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN
> > > and S64_MAX respectively so that even if one of the fields is
> > > uninitialized, it can be detected by using the condition
> > > max_time < min_time.
> > 
> > I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean
> > when I see it in the code. does it mean time that lies between
> > MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid
> > (unlikely, but that's the obvious reading :)?
> > 
> > Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I
> > can't tell...
> > 
> > Normally limits are specified by "min valid" and "max valid"
> > defines, which are pretty clear in their meaning. Like:
> 
> Hmm, I had meant to comment on this in my private discussion.
> 
> I agree this needs some more explanation in the source code,
> the idea is that once we have modified all file systems to
> correctly set the actual limits, we can then detect any newly
> added file system that forgets to set them, so we don't get
> accidental incorrect limits.

But if a superblock supports full 64 bit time resolution (i.e.
MIN_VFS_TIME to MAX_VFS_TIME) then you can't tell the difference
and will warn inappropriately.

> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
> > >  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
> > >  #endif
> > >  
> > > +#define MAX_VFS_TIME	S64_MAX
> > > +#define MIN_VFS_TIME	S64_MIN
> > 
> > These. Anything ouside these ranges is invalid.
> > 
> > As such, I think this is wrong for 32 bit systems as the min/max VFS
> > times right now are S32_MAX/S32_MIN...
> 
> I think the file system should always set the actual on-disk format
> limits in the superblock, regardless of the word size of the CPU
> architecture.

Eventually, yes. But right now, 32 bit systems are limited by the
platform architecture. Hence no matter what the filesystem on-disk
format supports, these are going to hard limits until filesystems
start modifying the defaults to indicate what their on-disk format
supports.

i.e. 32 bit systems should default to 32 bit time limits until the
filesystem developers come along and verify that conversion from
their on-disk format to 64 bit time works correctly. Then they set
their real limits which may (or may not) be >y2038 compatible, and
this means that filesystems that have not been converted to >y2038
compatible do not need any modifications to behave correctly on 32
bit systems...

> We already support 32-bit time_t in compat tasks on 64-bit
> architectures, and we will support  64-bit time_t in normal
> tasks on 32-bit architectures in the future, and there is no
> need to range-check the timestamps written to disk.
> 
> The one range-check that we probably want for current 32-bit tasks
> is in the vfs_fstatat/vfs_getattr code, so a kernel that internally
> handles post-2038 timestamps can limit them to S32_MAX instead of
> wrapping.

Those should return EOVERFLOW rather than an incorrect timestamp,
like we do for other 64 bit fields stat returns that can't fit in 32
bit stat fields. That's a clear indication of a process that should
be using stat64() instead.

Cheers,

Dave.
Deepa Dinamani March 3, 2016, 6:24 a.m. UTC | #5
>> I agree this needs some more explanation in the source code,
>> the idea is that once we have modified all file systems to
>> correctly set the actual limits, we can then detect any newly
>> added file system that forgets to set them, so we don't get
>> accidental incorrect limits.
>
> But if a superblock supports full 64 bit time resolution (i.e.
> MIN_VFS_TIME to MAX_VFS_TIME) then you can't tell the difference
> and will warn inappropriately.

No, this will not happen.
The max and min values are reversed for invalid values and the condition as
mentioned in the commit text for catching invalid ranges in the vfs layer is
max_time < min_time.

I will add a comment in the code to clarify.

Quoting from the patch:

+#define MAX_INVALID_VFS_TIME   S64_MIN
+#define MIN_INVALID_VFS_TIME   S64_MAX

I will post a v2 which removes these macros and
makes use of MAX/MIN_VFS_TIME macros instead.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann March 3, 2016, 2:08 p.m. UTC | #6
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
> On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
> > On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
> > > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:

> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
> > > >  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
> > > >  #endif
> > > >  
> > > > +#define MAX_VFS_TIME	S64_MAX
> > > > +#define MIN_VFS_TIME	S64_MIN
> > > 
> > > These. Anything ouside these ranges is invalid.
> > > 
> > > As such, I think this is wrong for 32 bit systems as the min/max VFS
> > > times right now are S32_MAX/S32_MIN...
> > 
> > I think the file system should always set the actual on-disk format
> > limits in the superblock, regardless of the word size of the CPU
> > architecture.
> 
> Eventually, yes. But right now, 32 bit systems are limited by the
> platform architecture. Hence no matter what the filesystem on-disk
> format supports, these are going to hard limits until filesystems
> start modifying the defaults to indicate what their on-disk format
> supports.

The purpose of the patch for now is to allow file systems to set
the defaults, we are not checking them yet, until all file systems
are converted, hence setting the defaults to invalid.

Our initial idea was to set the defaults for minimum and maximum
to zero, which would have made this clearer, but unfortunately
that doesn't allow us to detect whether a file system actually
supports time stamps at all: some file systems hardcode all
timestamps to zero (1970) because they don't store them on disk.

How about setting the defaults for minimum and maximum to '-1'?
That would allow us to detect whether file systems have set
this correctly, and it would be less confusing to the reader.

> i.e. 32 bit systems should default to 32 bit time limits until the
> filesystem developers come along and verify that conversion from
> their on-disk format to 64 bit time works correctly. Then they set
> their real limits which may (or may not) be >y2038 compatible, and
> this means that filesystems that have not been converted to >y2038
> compatible do not need any modifications to behave correctly on 32
> bit systems...

We can set the limits in the superblock if you like, but I would
not do it by changing the constants here:

Most file systems have arbitrary other maximum dates that they
need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 
2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).

The file systems specific code should just set whatever it actually
supports and then we can find a way to limit that further for
32-bit tasks.

> > We already support 32-bit time_t in compat tasks on 64-bit
> > architectures, and we will support  64-bit time_t in normal
> > tasks on 32-bit architectures in the future, and there is no
> > need to range-check the timestamps written to disk.
> > 
> > The one range-check that we probably want for current 32-bit tasks
> > is in the vfs_fstatat/vfs_getattr code, so a kernel that internally
> > handles post-2038 timestamps can limit them to S32_MAX instead of
> > wrapping.
> 
> Those should return EOVERFLOW rather than an incorrect timestamp,
> like we do for other 64 bit fields stat returns that can't fit in 32
> bit stat fields. That's a clear indication of a process that should
> be using stat64() instead.

On most 32-bit architectures, stat64 still has a 32-bit time value,
and even if it is 64-bit wide, glibc turns it into a 32-bit
number when copying to its version of 'struct stat64' that differs
from the kernel's and contains a 'struct timespec'.

I don't care if we return EOVERFLOW or give some other indication
of the overflow, but this is fundamentally a separate problem from
large file support, and we need a new syscall for it.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner March 4, 2016, 1:10 a.m. UTC | #7
On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
> On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
> > On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
> > > On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
> > > > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
> > > > > --- a/include/linux/fs.h
> > > > > +++ b/include/linux/fs.h
> > > > > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
> > > > >  #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
> > > > >  #endif
> > > > >  
> > > > > +#define MAX_VFS_TIME	S64_MAX
> > > > > +#define MIN_VFS_TIME	S64_MIN
> > > > 
> > > > These. Anything ouside these ranges is invalid.
> > > > 
> > > > As such, I think this is wrong for 32 bit systems as the min/max VFS
> > > > times right now are S32_MAX/S32_MIN...
> > > 
> > > I think the file system should always set the actual on-disk format
> > > limits in the superblock, regardless of the word size of the CPU
> > > architecture.
> > 
> > Eventually, yes. But right now, 32 bit systems are limited by the
> > platform architecture. Hence no matter what the filesystem on-disk
> > format supports, these are going to hard limits until filesystems
> > start modifying the defaults to indicate what their on-disk format
> > supports.
> 
> The purpose of the patch for now is to allow file systems to set
> the defaults, we are not checking them yet, until all file systems
> are converted, hence setting the defaults to invalid.

Wrong way around, IMO. Set the defaults to be limiting first, then
as filesystems are converted the limits get expanded.

> How about setting the defaults for minimum and maximum to '-1'?
> That would allow us to detect whether file systems have set
> this correctly, and it would be less confusing to the reader.

Perhaps, but I'm yet to understand why we even care if filesystems
have set limits or not? If the filesystem has not set it's own
limits, then surely we should assume that the filesystem is not
y2038k compatible and have the default limits set appropriately for
that?

> > i.e. 32 bit systems should default to 32 bit time limits until the
> > filesystem developers come along and verify that conversion from
> > their on-disk format to 64 bit time works correctly. Then they set
> > their real limits which may (or may not) be >y2038 compatible, and
> > this means that filesystems that have not been converted to >y2038
> > compatible do not need any modifications to behave correctly on 32
> > bit systems...
> 
> We can set the limits in the superblock if you like, but I would
> not do it by changing the constants here:
> 
> Most file systems have arbitrary other maximum dates that they
> need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 
> 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).

Yes, and that needs to be done when the filesystem is converted to
use the proper timespec64 representation.

i.e. the "provide timestamp limit support" patchset starts out by
limiting all times on all filesystems on all platforms to 32 bit
times. Then after each filesystem is converted to use the VFS native
64 bit timestamps, we ensure that the filesystem sets it's
known-valid timestamp limits in the superblock to correctly indicate
the range of timestamps they support.

Hence by the end of the patchset we have the situation where
filesystems that are not >y2038 compatible are limited to 32 bit
timestamps, and those that can represent larger dates expose only
the exact dates they can support to the VFS.

> The file systems specific code should just set whatever it actually
> supports and then we can find a way to limit that further for
> 32-bit tasks.

Yes, 32 bit tasks is something for the syscall layer to care about.

But that doesn't change the fact that the *default timestamp range*
should be "no >y2038 support", and it is up to the filesystem to
override that with it's own limits if they are different.

> > > We already support 32-bit time_t in compat tasks on 64-bit
> > > architectures, and we will support  64-bit time_t in normal
> > > tasks on 32-bit architectures in the future, and there is no
> > > need to range-check the timestamps written to disk.
> > > 
> > > The one range-check that we probably want for current 32-bit tasks
> > > is in the vfs_fstatat/vfs_getattr code, so a kernel that internally
> > > handles post-2038 timestamps can limit them to S32_MAX instead of
> > > wrapping.
> > 
> > Those should return EOVERFLOW rather than an incorrect timestamp,
> > like we do for other 64 bit fields stat returns that can't fit in 32
> > bit stat fields. That's a clear indication of a process that should
> > be using stat64() instead.
> 
> On most 32-bit architectures, stat64 still has a 32-bit time value,
> and even if it is 64-bit wide, glibc turns it into a 32-bit
> number when copying to its version of 'struct stat64' that differs
> from the kernel's and contains a 'struct timespec'.

So stat64 also returns EOVERLFOW to indicate to the application
that it needs to use a different, 64 bit time aware syscall.

> I don't care if we return EOVERFLOW or give some other indication
> of the overflow, but this is fundamentally a separate problem from
> large file support, and we need a new syscall for it.

Yes, but you seem to have missed my point that the we don't need to
care about 32 bit times, whether it be syscalls or tasks, at the
VFS/filesystem level. That's all a syscall layer issue. i.e.
vfs_fstatat() only uses 64 bit times and only places where struct
kstat contents get converted to whatever structure userspace has
supplied (e.g. cp_new_stat()) need to detect/care about timestamp
overflows like they already do with inode numbers and link counts.

Cheers,

Dave.
Steve French March 4, 2016, 6:31 a.m. UTC | #8
On Thu, Mar 3, 2016 at 8:08 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
>> On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
>> > On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
>> > > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
>
>> > > > --- a/include/linux/fs.h
>> > > > +++ b/include/linux/fs.h
>> > > > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
>> > > >  #define MAX_LFS_FILESIZE       ((loff_t)0x7fffffffffffffffLL)
>> > > >  #endif
>> > > >
>> > > > +#define MAX_VFS_TIME   S64_MAX
>> > > > +#define MIN_VFS_TIME   S64_MIN
>> > >
>> > > These. Anything ouside these ranges is invalid.
>> > >
>> > > As such, I think this is wrong for 32 bit systems as the min/max VFS
>> > > times right now are S32_MAX/S32_MIN...
>> >
>> > I think the file system should always set the actual on-disk format
>> > limits in the superblock, regardless of the word size of the CPU
>> > architecture.
>>
>> Eventually, yes. But right now, 32 bit systems are limited by the
>> platform architecture. Hence no matter what the filesystem on-disk
>> format supports, these are going to hard limits until filesystems
>> start modifying the defaults to indicate what their on-disk format
>> supports.
>
> The purpose of the patch for now is to allow file systems to set
> the defaults, we are not checking them yet, until all file systems
> are converted, hence setting the defaults to invalid.
>
> Our initial idea was to set the defaults for minimum and maximum
> to zero, which would have made this clearer, but unfortunately
> that doesn't allow us to detect whether a file system actually
> supports time stamps at all: some file systems hardcode all
> timestamps to zero (1970) because they don't store them on disk.
>
> How about setting the defaults for minimum and maximum to '-1'?
> That would allow us to detect whether file systems have set
> this correctly, and it would be less confusing to the reader.
>
>> i.e. 32 bit systems should default to 32 bit time limits until the
>> filesystem developers come along and verify that conversion from
>> their on-disk format to 64 bit time works correctly. Then they set
>> their real limits which may (or may not) be >y2038 compatible, and
>> this means that filesystems that have not been converted to >y2038
>> compatible do not need any modifications to behave correctly on 32
>> bit systems...
>
> We can set the limits in the superblock if you like, but I would
> not do it by changing the constants here:
>
> Most file systems have arbitrary other maximum dates that they
> need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus),
> 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).

I am puzzled why 2107 would be the maximum date for cifs.  My
calculation comes to a
maximum date of approximately the year 60,000AD for 64 bit DCE time
(cifs.ko gets DCE time back in all time fields when using CIFS, SMB2 or SMB3
except for the oldest dialects of CIFS).   DCE time is
100 nanoseconds since 1601 - see the definition of FILETIME in section 2.3.3
of https://msdn.microsoft.com/en-us/library/cc230324.aspx).  And shouldn't
this be the same for NTFS as they use a similar DCE time internally?

> The file systems specific code should just set whatever it actually
> supports and then we can find a way to limit that further for
> 32-bit tasks.
Deepa Dinamani March 4, 2016, 7:49 a.m. UTC | #9
On Fri, Mar 4, 2016 at 6:40 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
>> On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
>> > On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
>> > > On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
>> > > > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
>> > > > > --- a/include/linux/fs.h
>> > > > > +++ b/include/linux/fs.h
>> > > > > @@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
>> > > > >  #define MAX_LFS_FILESIZE     ((loff_t)0x7fffffffffffffffLL)
>> > > > >  #endif
>> > > > >
>> > > > > +#define MAX_VFS_TIME S64_MAX
>> > > > > +#define MIN_VFS_TIME S64_MIN
>> > > >
>> > > > These. Anything ouside these ranges is invalid.
>> > > >
>> > > > As such, I think this is wrong for 32 bit systems as the min/max VFS
>> > > > times right now are S32_MAX/S32_MIN...
>> > >
>> > > I think the file system should always set the actual on-disk format
>> > > limits in the superblock, regardless of the word size of the CPU
>> > > architecture.
>> >
>> > Eventually, yes. But right now, 32 bit systems are limited by the
>> > platform architecture. Hence no matter what the filesystem on-disk
>> > format supports, these are going to hard limits until filesystems
>> > start modifying the defaults to indicate what their on-disk format
>> > supports.
>>
>> The purpose of the patch for now is to allow file systems to set
>> the defaults, we are not checking them yet, until all file systems
>> are converted, hence setting the defaults to invalid.
>
> Wrong way around, IMO. Set the defaults to be limiting first, then
> as filesystems are converted the limits get expanded.

The picture is not clear until you see how we use these values and
these are just
values until then.
You should be able to get a better picture on what we are trying to do and the
discussion makes more sense when we post the patches that use these.
We are just trying to get some initial infrastructure so that we can
proceed to post
the subsequent patches.

Here are the approaches that were considered and why this one was picked:

Approach 1

Use default values in the vfs layer
This was the first version considered (Arnd has a patch from me).

Steps involved:

a. Use default values to initialize limits in alloc_super()
b. implement vfs policy
c. Each filesystem chooses to fill in their actual values, at its own pace.
d. Initialize to invalid defaults in vfs.

Cons:

a. There are unmaintained filesystems that will never be changed.
b. What about the filesystems that support ranges lesser than S32_MAX and
greater than S32_MIN Eg: ceph, fat, iso9660? vfs policies will not be safe here.
c. wherever these values are clamped, it will lead to changing every
filesystems twice, before and after right limits are filled in.

Approach 2

Use invalid values in the vfs layer.

Steps involved:
a. Use invalid values to initialize limits in alloc_super()
b. implement vfs policy, but only limit to filesystems which fill in
actual range.
c. We change each filesystem to fill in their actual values.
d. Maybe make compulsory to have a valid range, according to policy chosen.

This approach was chosen because it is clear with this approach as it addresses
all the cons in approach 1.


-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann March 4, 2016, 4:21 p.m. UTC | #10
On Friday 04 March 2016 00:31:09 Steve French wrote:
> >> i.e. 32 bit systems should default to 32 bit time limits until the
> >> filesystem developers come along and verify that conversion from
> >> their on-disk format to 64 bit time works correctly. Then they set
> >> their real limits which may (or may not) be >y2038 compatible, and
> >> this means that filesystems that have not been converted to >y2038
> >> compatible do not need any modifications to behave correctly on 32
> >> bit systems...
> >
> > We can set the limits in the superblock if you like, but I would
> > not do it by changing the constants here:
> >
> > Most file systems have arbitrary other maximum dates that they
> > need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus),
> > 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
> 
> I am puzzled why 2107 would be the maximum date for cifs.  My
> calculation comes to a
> maximum date of approximately the year 60,000AD for 64 bit DCE time
> (cifs.ko gets DCE time back in all time fields when using CIFS, SMB2 or SMB3
> except for the oldest dialects of CIFS).   DCE time is
> 100 nanoseconds since 1601 - see the definition of FILETIME in section 2.3.3
> of https://msdn.microsoft.com/en-us/library/cc230324.aspx).  And shouldn't
> this be the same for NTFS as they use a similar DCE time internally?
> 

The value I was looking up in my table at http://kernelnewbies.org/y2038
indeed referred to the ancient SMB timestamps that use a 7-bit year
number starting in 1980, just as fat has.

The year I have listed in the table for modern cifs and ntfs is 30328,
which assumes that it's a signed 64-bit multiple of 100 nanoseconds, and
that's what I see in cifs_NTtimeToUnix() as well. The documentation
you point to describes it as unsigned, which matches your 60,000AD
date, so before we add the limits, we should try to clarify which one
was intended.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann March 4, 2016, 4:54 p.m. UTC | #11
On Friday 04 March 2016 12:10:08 Dave Chinner wrote:
> On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
> > On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
> > > On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
> > > > On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
> > > > > On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
> > > Eventually, yes. But right now, 32 bit systems are limited by the
> > > platform architecture. Hence no matter what the filesystem on-disk
> > > format supports, these are going to hard limits until filesystems
> > > start modifying the defaults to indicate what their on-disk format
> > > supports.
> > 
> > The purpose of the patch for now is to allow file systems to set
> > the defaults, we are not checking them yet, until all file systems
> > are converted, hence setting the defaults to invalid.
> 
> Wrong way around, IMO. Set the defaults to be limiting first, then
> as filesystems are converted the limits get expanded.

That would break perfectly working 64-bit systems, along with 32-bit
systems that have the new syscalls, unless we intentionally treat
them differently. :(

> > How about setting the defaults for minimum and maximum to '-1'?
> > That would allow us to detect whether file systems have set
> > this correctly, and it would be less confusing to the reader.
> 
> Perhaps, but I'm yet to understand why we even care if filesystems
> have set limits or not? If the filesystem has not set it's own
> limits, then surely we should assume that the filesystem is not
> y2038k compatible and have the default limits set appropriately for
> that?

I don't think we can make any such assumption. Out of the ~50
file systems that have store timestamps on disk or on the network,
I only found eight that are use the 1902..2038 range that matches
the 32-bit time_t.

> > > i.e. 32 bit systems should default to 32 bit time limits until the
> > > filesystem developers come along and verify that conversion from
> > > their on-disk format to 64 bit time works correctly. Then they set
> > > their real limits which may (or may not) be >y2038 compatible, and
> > > this means that filesystems that have not been converted to >y2038
> > > compatible do not need any modifications to behave correctly on 32
> > > bit systems...
> > 
> > We can set the limits in the superblock if you like, but I would
> > not do it by changing the constants here:
> > 
> > Most file systems have arbitrary other maximum dates that they
> > need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 
> > 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
> 
> Yes, and that needs to be done when the filesystem is converted to
> use the proper timespec64 representation.

No, it's completely unrelated to the timespec64 change, other than
the fact that you asked for the limits to be implemented as a
result of the y2038 patch series. This is absolute reasonable because
we are touching the same code, and we have to audit it anyway.

> > The file systems specific code should just set whatever it actually
> > supports and then we can find a way to limit that further for
> > 32-bit tasks.
> 
> Yes, 32 bit tasks is something for the syscall layer to care about.
> 
> But that doesn't change the fact that the *default timestamp range*
> should be "no >y2038 support", and it is up to the filesystem to
> override that with it's own limits if they are different.

I still think the opposite approach makes more sense here: Right
now, we have 64-bit systems that have file systems with arbitrary
limitations working in random ways, most of them correctly
supporting at least time stamps between 1970 and 2106, some of them
much wider ranges.

The 32-bit systems are currently all broken beyond 2038, and they
have always been that way.

Changing everything to use timespec64 brings 32-bit systems to
the same level of functionality as 64-bit systems, and extends
the maximum supported timestamp range to what each file system
can store on disk.

Adding the range checking ensures that we don't run into whatever
limit a file system has in an uncontrolled way, but get a
specific error, or controlled clamping of the range. We would need
this even if all 32-bit systems would die out tomorrow and we
only worried about 64-bit machines.

> > On most 32-bit architectures, stat64 still has a 32-bit time value,
> > and even if it is 64-bit wide, glibc turns it into a 32-bit
> > number when copying to its version of 'struct stat64' that differs
> > from the kernel's and contains a 'struct timespec'.
> 
> So stat64 also returns EOVERLFOW to indicate to the application
> that it needs to use a different, 64 bit time aware syscall.

Right, except that for an application programmer this is not
visible. The application won't be able to call a replacement system
call, but it has to be built with -D_TIME_BITS=64 so all time_t
uses get changed to 64-bit.

I believe for LFS support, user space was allowed to call
a stat64() glibc function, but that won't be provided here.

> > I don't care if we return EOVERFLOW or give some other indication
> > of the overflow, but this is fundamentally a separate problem from
> > large file support, and we need a new syscall for it.
> 
> Yes, but you seem to have missed my point that the we don't need to
> care about 32 bit times, whether it be syscalls or tasks, at the
> VFS/filesystem level. That's all a syscall layer issue. i.e.
> vfs_fstatat() only uses 64 bit times and only places where struct
> kstat contents get converted to whatever structure userspace has
> supplied (e.g. cp_new_stat()) need to detect/care about timestamp
> overflows like they already do with inode numbers and link counts.

I think that was exactly the point I was making.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/super.c b/fs/super.c
index 1182af8..d70a8f6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -239,6 +239,8 @@  static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_maxbytes = MAX_NON_LFS;
 	s->s_op = &default_op;
 	s->s_time_gran = 1000000000;
+	s->s_time_max = MAX_INVALID_VFS_TIME;
+	s->s_time_min = MIN_INVALID_VFS_TIME;
 	s->cleancache_poolid = CLEANCACHE_NO_POOL;
 
 	s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1af4727..15b41e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@  static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE 	((loff_t)0x7fffffffffffffffLL)
 #endif
 
+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
+
+#define MAX_INVALID_VFS_TIME	S64_MIN
+#define MIN_INVALID_VFS_TIME	S64_MAX
+
 #define FL_POSIX	1
 #define FL_FLOCK	2
 #define FL_DELEG	4	/* NFSv4 delegation */
@@ -1343,7 +1349,12 @@  struct super_block {
 
 	/* Granularity of c/m/atime in ns.
 	   Cannot be worse than a second */
-	u32		   s_time_gran;
+	u32		s_time_gran;
+	/* Max and min values for timestamps
+	 * according to the range supported by filesystems.
+	 */
+	time64_t	s_time_max;
+	time64_t	s_time_min;
 
 	/*
 	 * The next field is for VFS *only*. No filesystems have any business