Message ID | 20180122020426.2988-4-deepa.kernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: > + t.tv_nsec -= t.tv_nsec % gran; This doesn't actuall ywork if tv_nsec is negative. Which may not be an issue in most cases, but did somebody check utimensat() or whatever? > + WARN(1, "illegal file time granularity: %u", gran); .. small nit: we generally should use 'invalid' rather than 'illegal'. No cops will hunt you down for things like this. Linus
On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >> + t.tv_nsec -= t.tv_nsec % gran; > > This doesn't actuall ywork if tv_nsec is negative. Right. > Which may not be an issue in most cases, but did somebody check > utimensat() or whatever? I checked POSIX again. There is no mention of tv_nsec being positive always for utimes. And, the long term plan is to replace all the callers of timespec_trunc() to use this new api instead for filesystems. So this will need to be fixed. I will fix this and post an update. >> + WARN(1, "illegal file time granularity: %u", gran); > > .. small nit: we generally should use 'invalid' rather than 'illegal'. Will update this as well. Thanks, Deepa
On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: > On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >>> + t.tv_nsec -= t.tv_nsec % gran; >> >> This doesn't actuall ywork if tv_nsec is negative. > > Right. > >> Which may not be an issue in most cases, but did somebody check >> utimensat() or whatever? > > I checked POSIX again. There is no mention of tv_nsec being positive > always for utimes. > And, the long term plan is to replace all the callers of > timespec_trunc() to use this new api instead for filesystems. > So this will need to be fixed. I will fix this and post an update. I found this on http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html ERRORS These functions shall fail if: ... [EINVAL] Either of the times argument structures specified a tv_nsec value that was neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or greater than or equal to 1000 million. which is the same as the Linux man page and what the kernel actually does for all the syscalls. The POSIX description seems a bit ambiguous to whether it also expects or allows EINVAL for utimes() with a tv_usec over 1000000 microseconds, or if it just applies to the utimensat and futimens(). Older descriptions that only explain utimes() don't mention the range check on tv_usec either. Arnd
On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >> On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >>> On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >>>> + t.tv_nsec -= t.tv_nsec % gran; >>> >>> This doesn't actuall ywork if tv_nsec is negative. >> >> Right. >> >>> Which may not be an issue in most cases, but did somebody check >>> utimensat() or whatever? >> >> I checked POSIX again. There is no mention of tv_nsec being positive >> always for utimes. >> And, the long term plan is to replace all the callers of >> timespec_trunc() to use this new api instead for filesystems. >> So this will need to be fixed. I will fix this and post an update. > > I found this on > http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html > > ERRORS > These functions shall fail if: > ... > [EINVAL] > Either of the times argument structures specified a tv_nsec value that was > neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or > greater than or equal to 1000 million. > > which is the same as the Linux man page and what the kernel actually > does for all the syscalls. The POSIX description seems a bit ambiguous > to whether it also expects or allows EINVAL for utimes() with a tv_usec > over 1000000 microseconds, or if it just applies to the utimensat and > futimens(). Older descriptions that only explain utimes() don't mention > the range check on tv_usec either. Right. This is in keeping with the kernel implementation of the corresponding syscalls. But, this timespec_truncate() is being called from current_time() and will be extended to other calls. C99 says "When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded (often called "truncation toward zero"). If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a." Also, we are already checking for gran being non-zero and in the nanoseconds range. So I think the right answer here is to add a comment that the function expects timespec to be normalized, and the functions calling it can take care of validation. -Deepa
On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >> >> I checked POSIX again. There is no mention of tv_nsec being positive >> always for utimes. >> And, the long term plan is to replace all the callers of >> timespec_trunc() to use this new api instead for filesystems. >> So this will need to be fixed. I will fix this and post an update. > > I found this on > http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html > > ERRORS > These functions shall fail if: > ... > [EINVAL] > Either of the times argument structures specified a tv_nsec value that was > neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or > greater than or equal to 1000 million. The thing is, we shouldn't check the standards, we should check what we actually _do_. And what we actually _do_ is to have a tv_nsec that is of type "long", and while we do have a timespec64_valid(const struct timespec64 *ts) and fs/utimes.c has a 'nsec_valid()' that apparently the utimes* family of system calls all do end up using, I'm more worried about something where we don't. Because I'm almost certain that we've had cases where we just normalize things after-the-fact. This all likely isn't a big deal, but it does end up worrying me if we then _depend_ on that granularity thing actually giving the proper granularity even for odd inputs. If they can happen. Maybe we don't care? Linus
On Wed, Jan 24, 2018 at 7:00 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: >> On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote: >>> >>> I checked POSIX again. There is no mention of tv_nsec being positive >>> always for utimes. >>> And, the long term plan is to replace all the callers of >>> timespec_trunc() to use this new api instead for filesystems. >>> So this will need to be fixed. I will fix this and post an update. >> >> I found this on >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html >> >> ERRORS >> These functions shall fail if: >> ... >> [EINVAL] >> Either of the times argument structures specified a tv_nsec value that was >> neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or >> greater than or equal to 1000 million. > > The thing is, we shouldn't check the standards, we should check what > we actually _do_. The issue for tv_sec is that we don't do anything interesting at the moment, and that's bad. - The Linux man page says we return -EINVAL for an out-of-range tv_sec, but the VFS code at the moment ends up running into an integer overflow, resulting in an arbitrary (i.e. file system specific) tv_sec when we read back a number that was set out of range. - POSIX (IEEE Std 1003.1-2008) appears to say we should cap the tv_sec to the maximum supported value to update the inode /and/ return -EINVAL, which seems better than what we do today, but would be surprising behavior, as -EINVAL usually indicates that we did not do anything. My best guess is that this is not intentional in the spec and should not be done like that. - Deepa's patch implements a third option, which is to cap to the maximum (or minimum for times before the fs specific epoch) and keep always returning success for utimensat() etc. This seems the most reasonable behavior. Between these three, we really need to make a decision. > And what we actually _do_ is to have a tv_nsec that is of type "long", > and while we do have a > > timespec64_valid(const struct timespec64 *ts) > > and fs/utimes.c has a 'nsec_valid()' that apparently the utimes* > family of system calls all do end up using, I'm more worried about > something where we don't. > > Because I'm almost certain that we've had cases where we just > normalize things after-the-fact. > > This all likely isn't a big deal, but it does end up worrying me if we > then _depend_ on that granularity thing actually giving the proper > granularity even for odd inputs. If they can happen. > > Maybe we don't care? This part seems easy, while there are two aspects here, I think they each have just one answer: - truncating the nanoseconds in the in-memory inode to the resolution supported by the file system is currently done by Linux (since 2.6.10). This behavior matches the Linux and POSIX documentation and is sensible, so there is no reason to change it. Before 2.6.10, we could end up with a timestamp moving backwards when an inode got evicted from the icache and loaded back from disk. - the range nsec validation is currently correct, I double-checked the relevant corner cases. We have to be careful when we introduce the 64-bit time_t based system call to make sure we can deal with glibc using 32-bit padding in the upper bits. For 64-bit user space, we must keep returning -EINVAL when those bits are nonzero, but for 32-bit tasks (compat or native), the current plan is to ignore the padding and instead take only the lower 32-bit before performing the range check. Deepa has posted patches for this in the past, but that's a differnent series. Arnd
diff --git a/fs/inode.c b/fs/inode.c index ef362364d396..0e2820ade554 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2107,6 +2107,36 @@ void inode_nohighmem(struct inode *inode) } EXPORT_SYMBOL(inode_nohighmem); +/** + * timestamp_truncate - Truncate timespec to a granularity + * @t: Timespec + * @inode: inode being updated + * + * Truncate a timespec to the granularity supported by the fs + * containing the inode. Always rounds down. gran must + * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns). + */ +struct timespec timestamp_truncate(struct timespec t, struct inode *inode) +{ + struct super_block *sb = inode->i_sb; + unsigned int gran = sb->s_time_gran; + + t.tv_sec = clamp_t(time64_t, t.tv_sec, sb->s_time_min, sb->s_time_max); + + /* Avoid division in the common cases 1 ns and 1 s. */ + if (gran == 1) { + /* nothing */ + } else if (gran == NSEC_PER_SEC) { + t.tv_nsec = 0; + } else if (gran > 1 && gran < NSEC_PER_SEC) { + t.tv_nsec -= t.tv_nsec % gran; + } else { + WARN(1, "illegal file time granularity: %u", gran); + } + return t; +} +EXPORT_SYMBOL(timestamp_truncate); + /** * current_time - Return FS time * @inode: inode. @@ -2126,6 +2156,6 @@ struct timespec current_time(struct inode *inode) return now; } - return timespec_trunc(now, inode->i_sb->s_time_gran); + return timestamp_truncate(now, inode); } EXPORT_SYMBOL(current_time);
timespec_trunc() function is used to truncate a filesystem timestamp to the right granularity. But, the function does not clamp tv_sec part of the timestamps according to the filesystem timestamp limits. Also, timespec_trunc() is exclusively used for filesystem timestamps. Move the api to be part of vfs. The replacement api: timestamp_truncate() also alters the signature of the function to accommodate filesystem timestamp clamping according to flesystem limits. Note that the clamp_t macro is used for clamping here as vfs is not yet using struct timespec64 internally. This is required for compilation purposes. Also note that clamp won't do the right thing for timestamps beyond 2038 on 32-bit machines until the vfs uses timespec64. After the vfs is transitioned to use timespec64 for timestamps, clamp_t() can be replaced by clamp(). Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> --- fs/inode.c | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-)