Message ID | 20190829041132.26677-1-deepa.kernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [GIT,PULL] vfs: Add support for timestamp limits |
On Thu, Aug 29, 2019 at 6:12 AM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> Hi Al, Arnd,
>
> This is a pull request for filling in min and max timestamps for filesystems.
> I've added all the acks, and dropped the adfs patch. That will be merged through
> Russell's tree.
>
> Thanks,
> Deepa
>
> The following changes since commit 5d18cb62218608a1388858880ad3ec76d6cb0d3b:
>
>   Add linux-next specific files for 20190828 (2019-08-28 19:59:14 +1000)
>
> are available in the Git repository at:
>
>   https://github.com/deepa-hub/vfs limits

Please rebase this branch on top of linux-5.3-rc6 and resend. I can't pull
a branch that contains linux-next.

Maybe drop the orangefs patch for now, at least until we have come to a
conclusion on that.

      Arnd

I spoke to Walt Ligon about the versioning code, and also shared
this thread with him. He isn't a fan of the versioning code,
and we think it should go. As I read through the commit messages
from when the versioning code was added, it relates to mtime
on directories. If a directory is read, and it has enough entries,
it might take several operations to collect all the entries. During
this time the directory might change. The versioning is a way to
tell that something changed between one of the operations...

commit: 7878027e9c2 (Oct 2004)
  - added a directory version that is passed back from the server to
    the client on each successful readdir call (happens to be the
    directory's mtime encoded as an opaque uint64_t)

We will work to see if we can figure out what we need to do to Orangefs
on both the userspace side and the kernel module side to have all 64 bit
time values.

I've also read up some on the y2038 cleanup work that Arnd and Deepa
have been doing...

Thanks to Arnd and Deepa for looking so deeply into the Orangefs
userspace code...

-Mike

On Sat, Aug 31, 2019 at 6:50 PM Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>
> > I think it's unclear from the orangefs source code what the intention is,
> > as there is a mix of signed and unsigned types used for the inode
> > stamps:
> >
> > #define encode_PVFS_time encode_int64_t
> > #define encode_int64_t(pptr,x) do { \
> >     *(int64_t*) *(pptr) = cpu_to_le64(*(x)); \
> >     *(pptr) += 8; \
> > } while (0)
> > #define decode_PVFS_time decode_int64_t
> > #define decode_int64_t(pptr,x) do { \
> >     *(x) = le64_to_cpu(*(int64_t*) *(pptr)); \
> >     *(pptr) += 8; \
> > } while (0)
> >
> > This suggests that making it unsigned may have been an accident.
> >
> > Then again, it's clearly and consistently printed as unsigned in
> > user space:
> >
> >     gossip_debug(
> >         GOSSIP_GETATTR_DEBUG, " VERSION is %llu, mtime is %llu\n",
> >         llu(s_op->attr.mtime), llu(resp_attr->mtime));
>
> I think I had noticed these two and decided maybe the intention was to
> use unsigned types.
>
> > A related issue I noticed is this:
> >
> > PVFS_time PINT_util_mktime_version(PVFS_time time)
> > {
> >     struct timeval t = {0,0};
> >     PVFS_time version = (time << 32);
> >
> >     gettimeofday(&t, NULL);
> >     version |= (PVFS_time)t.tv_usec;
> >     return version;
> > }
> > PVFS_time PINT_util_mkversion_time(PVFS_time version)
> > {
> >     return (PVFS_time)(version >> 32);
> > }
> > static PINT_sm_action getattr_verify_attribs(
> >     struct PINT_smcb *smcb, job_status_s *js_p)
> > {
> >     ...
> >     resp_attr->mtime = PINT_util_mkversion_time(s_op->attr.mtime);
> >     ...
> > }
> >
> > which suggests that at least for some purposes, the mtime field
> > is only an unsigned 32-bit number (1970..2106). From my reading,
> > this affects the on-disk format, but not the protocol implemented
> > by the kernel.
> >
> > atime and ctime are apparently 64-bit, but mtime is only 32-bit
> > seconds, plus a 32-bit 'version'. I suppose the server could be
> > fixed to allow a larger range, but probably would take it out of
> > the 'version' bits, not the upper half.
>
> I had missed this part. Thanks.
>
> > To be on the safe side, I suppose the kernel can only assume
> > an unsigned 32-bit range to be available. If the server gets
> > extended beyond that, it would have to pass a feature flag.
>
> This makes sense to me also. And, as Arnd pointed out on IRC, if
> there are negative timestamps that are already in use, this will be a
> problem for those use cases.
> I can update the patch to use limits 0-u32_max.
>
> -Deepa

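To make the 32-bit limitation discussed above concrete, here is a minimal sketch of the mtime/version packing and of a clamp to the proposed 0..u32_max range. It is an illustration only: the function names are hypothetical, the microsecond value is passed in as a parameter rather than read with gettimeofday(), and an unsigned 64-bit type is assumed for PVFS_time, which the thread itself notes is unclear.

```c
#include <stdint.h>

/* Hypothetical stand-in for PINT_util_mktime_version(): seconds go into
 * the upper 32 bits, microseconds into the lower 32 bits. Any bits of
 * `seconds` above bit 31 are shifted out and lost. */
static uint64_t pack_timeversion(uint64_t seconds, uint32_t usec)
{
    return (seconds << 32) | usec;
}

/* Hypothetical stand-in for PINT_util_mkversion_time(): recover the
 * seconds, which can never exceed UINT32_MAX after the round trip. */
static uint64_t unpack_seconds(uint64_t version)
{
    return version >> 32;
}

/* Clamp a 64-bit timestamp to the range the packed format can hold,
 * i.e. 0..UINT32_MAX (roughly 1970..2106). */
static int64_t clamp_to_u32_range(int64_t t)
{
    if (t < 0)
        return 0;
    if (t > (int64_t)UINT32_MAX)
        return (int64_t)UINT32_MAX;
    return t;
}
```

With this layout, a seconds value of 2^32 or more does not survive the round trip, and a negative (pre-1970) value has no representation, which is why limits of 0 and u32_max look like the safe choice for the kernel side.
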
On Thu, Sep 5, 2019 at 6:58 PM Mike Marshall <hubcap@omnibond.com> wrote:
>
> I spoke to Walt Ligon about the versioning code, and also shared
> this thread with him. He isn't a fan of the versioning code,
> and we think it should go. As I read through the commit messages
> from when the versioning code was added, it relates to mtime
> on directories. If a directory is read, and it has enough entries,
> it might take several operations to collect all the entries. During
> this time the directory might change. The versioning is a way to
> tell that something changed between one of the operations...
>
> commit: 7878027e9c2 (Oct 2004)
>   - added a directory version that is passed back from the server to
>     the client on each successful readdir call (happens to be the
>     directory's mtime encoded as an opaque uint64_t)
>
> We will work to see if we can figure out what we need to do to Orangefs
> on both the userspace side and the kernel module side to have all 64 bit
> time values.

Ok, sounds good. For the time being, I have applied the patch that limits
the kernel to timestamps in the 1970 to 2106 range, which is compatible
with the existing user space and will be good enough for a while.

If you can ignore the old pre-versioning interfaces, you can decide to
encode the epoch number in the remaining 12 bits of the on-disk
representation, like

    time64 = (u32)(timeversion >> 32) + ((s64)(timeversion & 0xffc00000) << 12);

to extend it into the far-enough future (136 years times 2^12), or
possibly using

    time64 = (s32)(timeversion >> 32) + ((s64)(timeversion & 0xffc00000) << 12);

to interpret existing timestamps with the msb set as dates between
1902 and 1970, which would fix the test case that broke, but disallow
dates past 2038 with unmodified kernels.

      Arnd
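
The two decoding variants in the message above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the eventual OrangeFS format: the function names and the EPOCH_MASK constant are hypothetical. The message writes the mask as 0xffc00000, which selects bits 22..31 (10 bits); the sketch assumes 0xfff00000 (bits 20..31) was meant, because tv_usec fits in 20 bits and only then does the shifted epoch land directly above the 32-bit seconds.

```c
#include <stdint.h>

/* Assumed mask: the 12 bits of the low word not needed for tv_usec
 * (microseconds are below 2^20). The email itself writes 0xffc00000. */
#define EPOCH_MASK 0xfff00000ULL

/* Variant 1: seconds are unsigned 32-bit, epoch bits extend the range
 * forward from 1970 in steps of 2^32 seconds (about 136 years). */
static int64_t timeversion_to_time64_unsigned(uint64_t timeversion)
{
    return (int64_t)(uint32_t)(timeversion >> 32) +
           (int64_t)((timeversion & EPOCH_MASK) << 12);
}

/* Variant 2: seconds are signed 32-bit, so a value with the msb set is
 * read as a date between 1902 and 1970. The uint32_t to int32_t cast
 * relies on the usual two's-complement conversion. */
static int64_t timeversion_to_time64_signed(uint64_t timeversion)
{
    return (int64_t)(int32_t)(uint32_t)(timeversion >> 32) +
           (int64_t)((timeversion & EPOCH_MASK) << 12);
}
```

For a timeversion whose upper word is 0x80000000 and whose epoch bits are zero, the first variant yields 2147483648 (the year 2038) and the second yields -2147483648 (the year 1901), which is the trade-off described in the message.
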