[RFC,4/5] xfs: extend inode format for 40-bit timestamps

Message ID 20191112120910.1977003-5-arnd@arndb.de (mailing list archive)
State Deferred, archived
Series xfs: y2038 conversion

Commit Message

Arnd Bergmann Nov. 12, 2019, 12:09 p.m. UTC
XFS is the only major file system that lacks timestamps beyond year 2038,
and is already being deployed in systems that may have to be supported
beyond that time.

Fortunately, the inode format still has a few reserved bits that can be
used to extend the current format. There are two bits in the nanosecond
portion that could be used in the same way that ext4 does, extending
the timestamps until year 2446, as well as 12 unused bytes after the
already allocated fields.

There are four timestamps that need to be extended, so using four
bytes out of the reserved space gets us all the way until year 36744,
by extending the current 1902-2038 with another 255 epochs, which
seems to be a reasonable range.
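
The range arithmetic above can be checked with a small standalone sketch.
It is not part of the patch: the helper names (ts_max_for_epoch_bits,
approx_year) are made up for illustration, and the year estimate assumes
an average Gregorian year of 31556952 seconds.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_SPAN		0x100000000LL	/* 2^32 seconds, about 136.1 years */
#define AVG_YEAR_SECONDS	31556952LL	/* 365.2425 days, an approximation */

/*
 * Largest representable time when `bits` extra epoch bits extend a
 * signed 32-bit seconds field (hypothetical helper, not in the patch).
 */
static int64_t ts_max_for_epoch_bits(unsigned int bits)
{
	assert(bits <= 16);
	return (int64_t)INT32_MAX + (((int64_t)1 << bits) - 1) * EPOCH_SPAN;
}

/* Rough calendar year for a non-negative time in seconds since 1970. */
static int64_t approx_year(int64_t seconds)
{
	assert(seconds >= 0);
	return 1970 + seconds / AVG_YEAR_SECONDS;
}
```

Under these assumptions, zero extra bits end in 2038, two extra bits
(the ext4-style layout) end in 2446, and eight extra bits end in 36744.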

I am not sure whether this change to the inode format requires a
new version for the inode. All existing file system images remain
compatible, while mounting a file system with extended timestamps
beyond 2038 would report that timestamp incorrectly in the 1902
through 2038 range, matching the traditional Linux behavior of
wrapping timestamps.
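
The wrapping behaviour described above can be illustrated with a short
sketch of what a reader sees when it decodes only the low 32 bits as a
signed value. The function name legacy_view is hypothetical and does not
appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * What a reader sees if it ignores the extra epoch byte and treats the
 * low 32 bits of the seconds value as a signed quantity.
 */
static int64_t legacy_view(int64_t seconds)
{
	uint32_t lo = (uint32_t)seconds;	/* conversion is modulo 2^32 */
	int64_t seen;

	if (lo >= 0x80000000u)
		seen = (int64_t)lo - 0x100000000LL;
	else
		seen = (int64_t)lo;

	/* The result always falls in the traditional 1902 through 2038 range. */
	assert(seen >= INT32_MIN && seen <= INT32_MAX);
	return seen;
}
```

For example, a timestamp one second past the 2038 limit (2147483648)
would be reported as INT32_MIN, and a timestamp exactly 2^32 seconds
after 1970 would be reported as 0.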

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_format.h      |  6 +++++-
 fs/xfs/libxfs/xfs_inode_buf.c   | 28 ++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_inode_buf.h   |  1 +
 fs/xfs/libxfs/xfs_log_format.h  |  6 +++++-
 fs/xfs/libxfs/xfs_trans_inode.c |  3 ++-
 fs/xfs/xfs_inode.c              |  3 ++-
 fs/xfs/xfs_inode_item.c         | 10 +++++++---
 fs/xfs/xfs_iops.c               |  3 ++-
 fs/xfs/xfs_itable.c             |  2 +-
 fs/xfs/xfs_super.c              |  2 +-
 10 files changed, 46 insertions(+), 18 deletions(-)

Comments

Christoph Hellwig Nov. 12, 2019, 2:16 p.m. UTC | #1
Amir just sent another patch dealing with the time stamps.  I'd suggest
you chime into the discussion in that thread.
Amir Goldstein Nov. 12, 2019, 3:02 p.m. UTC | #2
On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Amir just sent another patch dealing with the time stamps.  I'd suggest
> you chime into the discussion in that thread.

That's right, I just posted the ext4-style extension to 34 bits yesterday [1],
but I like your version so much better, so I will withdraw mine.

Sorry I did not CC you, Deepa, or the y2038 list.
I did not think you were going to actually deal with specific filesystems.

I'd also like to hear people's thoughts about the migration process.
Should the new feature be ro_compat as I defined it or incompat?

If all agree that Arnd's format change is preferred, I can assist with
xfsprogs patches, tests or whatnot.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/20191112082524.GA18779@infradead.org/T/#mfa11ea3c035d4c21ec6a56b7c83a6dfa76e48068
Arnd Bergmann Nov. 12, 2019, 3:29 p.m. UTC | #3
On Tue, Nov 12, 2019 at 4:02 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Amir just sent another patch dealing with the time stamps.  I'd suggest
> > you chime into the discussion in that thread.
>
> That's right, I just posted the ext4-style extension to 34 bits yesterday [1],
> but I like your version so much better, so I will withdraw mine.

Thanks! I guess we probably want part of both in the end. I considered
adding wrappers for encoding and decoding the timestamp like yours
but in the end went with open-coding that part. The difference is pretty
minimal, should we leave it with the open-coded addition?
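
For comparison, a wrapper pair for the 40-bit layout might look like the
sketch below. This is an illustration under stated assumptions, not code
from either patch: struct ts40, ts40_encode and ts40_decode are
hypothetical names, the low word is assumed to be decoded as a signed
32-bit value, and the epoch byte is derived by biasing the time by
INT32_MIN, which differs from the open-coded upper_32_bits() calls in
the diff.

```c
#include <assert.h>
#include <stdint.h>

#define TS40_MIN	((int64_t)INT32_MIN)
#define TS40_MAX	((int64_t)INT32_MAX + 255 * 0x100000000LL)

/* Hypothetical split representation: 32 low bits plus an 8-bit epoch. */
struct ts40 {
	uint32_t lo;
	uint8_t  hi;
};

/* Split a 64-bit seconds value into the low word and the epoch byte. */
static struct ts40 ts40_encode(int64_t seconds)
{
	struct ts40 d;

	assert(seconds >= TS40_MIN && seconds <= TS40_MAX);
	d.lo = (uint32_t)seconds;
	/* Bias by TS40_MIN so that 1902 through 2038 maps to epoch 0. */
	d.hi = (uint8_t)((uint64_t)(seconds - TS40_MIN) >> 32);
	return d;
}

/* Recombine: sign-extend the low word, then add hi epochs of 2^32 seconds. */
static int64_t ts40_decode(struct ts40 d)
{
	int64_t lo = d.lo >= 0x80000000u ?
		     (int64_t)d.lo - 0x100000000LL : (int64_t)d.lo;

	return lo + ((int64_t)d.hi << 32);
}
```

With this encoding, every value between TS40_MIN and TS40_MAX survives a
round trip, including times before 1970, and anything up to 2038 keeps
an epoch byte of zero so that existing images decode unchanged.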

One part that I was missing (as described in my changelog) is any
versioning or feature flag, including the dynamically set sb->s_time_max
value. From looking at your patch, I guess this is something we want.

> Sorry I did not CC you, Deepa, or the y2038 list.
> I did not think you were going to actually deal with specific filesystems.

I was not going to, but in the process of cleaning up the remaining ioctls
I came to xfs last week and thought it would be silly to extend the uapi
but not the file system ;-)

> I'd also like to hear people's thoughts about the migration process.
> Should the new feature be ro_compat as I defined it or incompat?
>
> If all agree that Arnd's format change is preferred, I can assist with
> xfsprogs patches, tests or whatnot.

Awesome, that would be great indeed!

      Arnd
Dave Chinner Nov. 12, 2019, 9:32 p.m. UTC | #4
On Tue, Nov 12, 2019 at 01:09:09PM +0100, Arnd Bergmann wrote:
> XFS is the only major file system that lacks timestamps beyond year 2038,
> and is already being deployed in systems that may have to be supported
> beyond that time.
> 
> Fortunately, the inode format still has a few reserved bits that can be
> used to extend the current format. There are two bits in the nanosecond
> portion that could be used in the same way that ext4 does, extending
> the timestamps until year 2446, as well as 12 unused bytes after the
> already allocated fields.
> 
> There are four timestamps that need to be extended, so using four
> bytes out of the reserved space gets us all the way until year 36744,
> by extending the current 1902-2038 with another 255 epochs, which
> seems to be a reasonable range.
> 
> I am not sure whether this change to the inode format requires a
> new version for the inode. All existing file system images remain
> compatible, while mounting a file system with extended timestamps
> beyond 2038 would report that timestamp incorrectly in the 1902
> through 2038 range, matching the traditional Linux behavior of
> wrapping timestamps.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

This is basically what I proposed ~5 years or so ago, when I posted a
patch to implement it in an early y2038 discussion with you. I just
mentioned that very patch in my response to Amir's timestamp
extension patchset, pointing out that this isn't the way we want
to proceed with >y2038 on-disk support.

https://lore.kernel.org/linux-xfs/20191112161242.GA19334@infradead.org/T/#maf6b2719ed561cc2865cc5e7eb82df206b971261

I'd suggest taking the discussion there....

Cheers,

Dave.
Patch

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index c968b60cee15..dc8d160775fb 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -883,7 +883,11 @@  typedef struct xfs_dinode {
 	__be64		di_lsn;		/* flush sequence */
 	__be64		di_flags2;	/* more random flags */
 	__be32		di_cowextsize;	/* basic cow extent size for file */
-	__u8		di_pad2[12];	/* more padding for future expansion */
+	__u8		di_atime_hi;	/* upper 8 bits of di_atime */
+	__u8		di_mtime_hi;	/* upper 8 bits of di_mtime */
+	__u8		di_ctime_hi;	/* upper 8 bits of di_ctime */
+	__u8		di_crtime_hi;	/* upper 8 bits of di_crtime */
+	__u8		di_pad2[8];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_timestamp_t	di_crtime;	/* time created */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 28ab3c5255e1..4989b6f1ac6f 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -228,16 +228,19 @@  xfs_inode_from_disk(
 	to->di_flushiter = be16_to_cpu(from->di_flushiter);
 
 	/*
-	 * Time is signed, so need to convert to signed 32 bit before
-	 * storing in inode timestamp which may be 64 bit. Otherwise
-	 * a time before epoch is converted to a time long after epoch
-	 * on 64 bit systems.
+	 * The supported time range starts at INT_MIN, corresponding to
+	 * year 1902. With the traditional low 32 bits, this ends in
+	 * year 2038, the extra 8 bits extend it by another 255 epochs
+	 * of 136.1 years each, up to year 36744.
 	 */
-	inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec);
+	inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec) +
+				((u64)from->di_atime_hi << 32);
 	inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec);
-	inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec);
+	inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec) +
+				((u64)from->di_mtime_hi << 32);
 	inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec);
-	inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec);
+	inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec) +
+				((u64)from->di_ctime_hi << 32);
 	inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec);
 	inode->i_generation = be32_to_cpu(from->di_gen);
 	inode->i_mode = be16_to_cpu(from->di_mode);
@@ -256,7 +259,8 @@  xfs_inode_from_disk(
 	if (to->di_version == 3) {
 		inode_set_iversion_queried(inode,
 					   be64_to_cpu(from->di_changecount));
-		to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
+		to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec) +
+				((u64)from->di_crtime_hi << 32);
 		to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
 		to->di_flags2 = be64_to_cpu(from->di_flags2);
 		to->di_cowextsize = be32_to_cpu(from->di_cowextsize);
@@ -284,10 +288,13 @@  xfs_inode_to_disk(
 
 	memset(to->di_pad, 0, sizeof(to->di_pad));
 	to->di_atime.t_sec = cpu_to_be32(inode->i_atime.tv_sec);
+	to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec);
 	to->di_atime.t_nsec = cpu_to_be32(inode->i_atime.tv_nsec);
 	to->di_mtime.t_sec = cpu_to_be32(inode->i_mtime.tv_sec);
+	to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec);
 	to->di_mtime.t_nsec = cpu_to_be32(inode->i_mtime.tv_nsec);
 	to->di_ctime.t_sec = cpu_to_be32(inode->i_ctime.tv_sec);
+	to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec);
 	to->di_ctime.t_nsec = cpu_to_be32(inode->i_ctime.tv_nsec);
 	to->di_nlink = cpu_to_be32(inode->i_nlink);
 	to->di_gen = cpu_to_be32(inode->i_generation);
@@ -307,6 +314,7 @@  xfs_inode_to_disk(
 	if (from->di_version == 3) {
 		to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
+		to->di_crtime_hi = upper_32_bits(from->di_crtime.t_sec);
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
 		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
@@ -338,10 +346,13 @@  xfs_log_dinode_to_disk(
 	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
 
 	to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
+	to->di_atime_hi = from->di_atime_hi;
 	to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
 	to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec);
+	to->di_mtime_hi = from->di_mtime_hi;
 	to->di_mtime.t_nsec = cpu_to_be32(from->di_mtime.t_nsec);
 	to->di_ctime.t_sec = cpu_to_be32(from->di_ctime.t_sec);
+	to->di_ctime_hi = from->di_ctime_hi;
 	to->di_ctime.t_nsec = cpu_to_be32(from->di_ctime.t_nsec);
 
 	to->di_size = cpu_to_be64(from->di_size);
@@ -359,6 +370,7 @@  xfs_log_dinode_to_disk(
 	if (from->di_version == 3) {
 		to->di_changecount = cpu_to_be64(from->di_changecount);
 		to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
+		to->di_crtime_hi = from->di_crtime_hi;
 		to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
 		to->di_flags2 = cpu_to_be64(from->di_flags2);
 		to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
index ab0f84165317..49556e1898da 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.h
+++ b/fs/xfs/libxfs/xfs_inode_buf.h
@@ -38,6 +38,7 @@  struct xfs_icdinode {
 	uint32_t	di_cowextsize;	/* basic cow extent size for file */
 
 	xfs_ictimestamp_t di_crtime;	/* time created */
+	uint8_t		di_crtime_hi;	/* upper 8 bits of di_crtime */
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index e5f97c69b320..c17e7c6511ff 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -414,7 +414,11 @@  struct xfs_log_dinode {
 	xfs_lsn_t	di_lsn;		/* flush sequence */
 	uint64_t	di_flags2;	/* more random flags */
 	uint32_t	di_cowextsize;	/* basic cow extent size for file */
-	uint8_t		di_pad2[12];	/* more padding for future expansion */
+	uint8_t		di_atime_hi;	/* upper 8 bits of di_atime */
+	uint8_t		di_mtime_hi;	/* upper 8 bits of di_mtime */
+	uint8_t		di_ctime_hi;	/* upper 8 bits of di_ctime */
+	uint8_t		di_crtime_hi;	/* upper 8 bits of di_crtime */
+	uint8_t		di_pad2[8];	/* more padding for future expansion */
 
 	/* fields only written to during inode creation */
 	xfs_ictimestamp_t di_crtime;	/* time created */
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index a9ad90926b87..419356eec52c 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -67,7 +67,8 @@  xfs_trans_ichgtime(
 	if (flags & XFS_ICHGTIME_CHG)
 		inode->i_ctime = tv;
 	if (flags & XFS_ICHGTIME_CREATE) {
-		ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
+		ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec);
+		ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec);
 		ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec;
 	}
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 18f4b262e61c..c0d9d568ea4f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -845,7 +845,8 @@  xfs_ialloc(
 		inode_set_iversion(inode, 1);
 		ip->i_d.di_flags2 = 0;
 		ip->i_d.di_cowextsize = 0;
-		ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
+		ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec);
+		ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec);
 		ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec;
 	}
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index bb8f076805b9..338188a5a698 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -314,11 +314,14 @@  xfs_inode_to_log_dinode(
 
 	memset(to->di_pad, 0, sizeof(to->di_pad));
 	memset(to->di_pad3, 0, sizeof(to->di_pad3));
-	to->di_atime.t_sec = inode->i_atime.tv_sec;
+	to->di_atime.t_sec = lower_32_bits(inode->i_atime.tv_sec);
+	to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec);
 	to->di_atime.t_nsec = inode->i_atime.tv_nsec;
-	to->di_mtime.t_sec = inode->i_mtime.tv_sec;
+	to->di_mtime.t_sec = lower_32_bits(inode->i_mtime.tv_sec);
+	to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec);
 	to->di_mtime.t_nsec = inode->i_mtime.tv_nsec;
-	to->di_ctime.t_sec = inode->i_ctime.tv_sec;
+	to->di_ctime.t_sec = lower_32_bits(inode->i_ctime.tv_sec);
+	to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec);
 	to->di_ctime.t_nsec = inode->i_ctime.tv_nsec;
 	to->di_nlink = inode->i_nlink;
 	to->di_gen = inode->i_generation;
@@ -341,6 +344,7 @@  xfs_inode_to_log_dinode(
 	if (from->di_version == 3) {
 		to->di_changecount = inode_peek_iversion(inode);
 		to->di_crtime.t_sec = from->di_crtime.t_sec;
+		to->di_crtime_hi = from->di_crtime_hi;
 		to->di_crtime.t_nsec = from->di_crtime.t_nsec;
 		to->di_flags2 = from->di_flags2;
 		to->di_cowextsize = from->di_cowextsize;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index fe285d123d69..72d40ae1e91f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -516,7 +516,8 @@  xfs_vn_getattr(
 	if (ip->i_d.di_version == 3) {
 		if (request_mask & STATX_BTIME) {
 			stat->result_mask |= STATX_BTIME;
-			stat->btime.tv_sec = ip->i_d.di_crtime.t_sec;
+			stat->btime.tv_sec = ip->i_d.di_crtime.t_sec +
+					((u64)ip->i_d.di_crtime_hi << 32);
 			stat->btime.tv_nsec = ip->i_d.di_crtime.t_nsec;
 		}
 	}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 884950adbd16..ea4bf4475727 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -97,7 +97,7 @@  xfs_bulkstat_one_int(
 	buf->bs_mtime_nsec = inode->i_mtime.tv_nsec;
 	buf->bs_ctime = inode->i_ctime.tv_sec;
 	buf->bs_ctime_nsec = inode->i_ctime.tv_nsec;
-	buf->bs_btime = dic->di_crtime.t_sec;
+	buf->bs_btime = dic->di_crtime.t_sec + ((u64)dic->di_crtime_hi << 32);
 	buf->bs_btime_nsec = dic->di_crtime.t_nsec;
 	buf->bs_gen = inode->i_generation;
 	buf->bs_mode = inode->i_mode;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8d1df9f8be07..2adfe1039693 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1665,7 +1665,7 @@  xfs_fs_fill_super(
 	sb->s_max_links = XFS_MAXLINK;
 	sb->s_time_gran = 1;
 	sb->s_time_min = S32_MIN;
-	sb->s_time_max = S32_MAX;
+	sb->s_time_max = S32_MAX + 255 * 0x100000000ull;
 	sb->s_iflags |= SB_I_CGROUPWB;
 
 	set_posix_acl_flag(sb);