Message ID | 1455269766-2994-12-git-send-email-deepa.kernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote: > The VFS inode timestamps are not y2038 safe as they use > struct timespec. These will be changed to use struct timespec64 > instead and that is y2038 safe. > But, since the above data type conversion will break the end > file systems, use vfs_time aliases here to access inode times. > > These timestamps are passed in as arguments to functions > using inode timestamps. Hence, these need to change along > with vfs to support 64 bit timestamps. vfs_time helps do > this transition. > > Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Just a point to highlight the problem with this approach: > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > index f8f2359..1273db6 100644 > --- a/net/ceph/osd_client.c > +++ b/net/ceph/osd_client.c > @@ -2401,7 +2401,7 @@ bad: > */ > void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, > struct ceph_snap_context *snapc, u64 snap_id, > - struct timespec *mtime) > + struct vfs_time *mtime) > { > struct ceph_msg *msg = req->r_request; > void *p; So this change assumes that mtime is not passed by reference to another function. If we change vfs_time to be a timespec64, then dereferencing in this function works fine, but passing to another function will not because that function will be expecting a timespec. That, indeed, is what happens here. A few lines into this function: if (req->r_flags & CEPH_OSD_FLAG_WRITE) ceph_encode_timespec(p, mtime); And that function: static inline void ceph_encode_timespec(struct ceph_timespec *tv, const struct timespec *ts) { tv->tv_sec = cpu_to_le32((u32)ts->tv_sec); tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec); } expects a timespec. It will silently lose 64 bit times even if it did compile. I note in version 2b, the mtime was not passed by reference as a vfs time, but converted at the call site to a timespec and so the internal usage of the timestamp remains unchanged and unaffected by a VFS level timespec->timespec64 change. I think an approach that requires changes to the API without actually beign able to verify they are correct, fully propagated or don't impact on write/disk formats before the final change of the VFS type is not going to fly. This is the sort of subtle bug that can occur with type changes, and hence why I think that the fs developers should be left to do the conversion of their filesystem to support 64 bit times (i.e. approach 2b). Any change is going to take a significant amount of testing and verification, and that's something we don't have yet. Nobody has written any tests for xfstests to verify correct 64 bit timestamp behaviour, nor do we have tests to verify 32 bit timestamp behaviour on a 64 bit time kernel. These are things that we are going to need; all filesystems should behave the same w.r.t. these configurations, so we really do need regression tests for this.... Cheers, Dave.
On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote: > On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote: >> The VFS inode timestamps are not y2038 safe as they use >> struct timespec. These will be changed to use struct timespec64 >> instead and that is y2038 safe. >> But, since the above data type conversion will break the end >> file systems, use vfs_time aliases here to access inode times. >> >> These timestamps are passed in as arguments to functions >> using inode timestamps. Hence, these need to change along >> with vfs to support 64 bit timestamps. vfs_time helps do >> this transition. >> >> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> > > Just a point to highlight the problem with this approach: > >> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c >> index f8f2359..1273db6 100644 >> --- a/net/ceph/osd_client.c >> +++ b/net/ceph/osd_client.c >> @@ -2401,7 +2401,7 @@ bad: >> */ >> void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, >> struct ceph_snap_context *snapc, u64 snap_id, >> - struct timespec *mtime) >> + struct vfs_time *mtime) >> { >> struct ceph_msg *msg = req->r_request; >> void *p; > > So this change assumes that mtime is not passed by reference to > another function. If we change vfs_time to be a timespec64, then > dereferencing in this function works fine, but passing to another > function will not because that function will be expecting a > timespec. > > That, indeed, is what happens here. A few lines into this function: > > if (req->r_flags & CEPH_OSD_FLAG_WRITE) > ceph_encode_timespec(p, mtime); > > And that function: > > static inline void ceph_encode_timespec(struct ceph_timespec *tv, > const struct timespec *ts) > { > tv->tv_sec = cpu_to_le32((u32)ts->tv_sec); > tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec); > } I'm not sure where you picked up this encode function from. You might be missing the patches( 9 and 10) before this?: 2b5f8e517c6fd7121fc1b890c51c6256bc21beb6 net: ceph: use vfs_time data type instead of timespec ca5b82952a6a522ae058ccede57ba1a71da498c5 fs: ceph: Replace timespec data type with vfs_time 3a3ac0bdd23284c4f27a7ab1c133056c1a998075 fs: ceph: Change encode and decode functions to use vfs_time So encode function actually looks like -static inline void ceph_decode_timespec(struct timespec *ts, +static inline void ceph_decode_timespec(struct vfs_time *ts, const struct ceph_timespec *tv) { - ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); - ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); + ts->tv_sec = (s64)(u32)le32_to_cpu(tv->tv_sec); + ts->tv_nsec = (long)(u32)le32_to_cpu(tv->tv_nsec); } There is a conversion error here which I will be fixing separately from the series. Also, there is another commit in my tree that is pointed to in the cover letter that is also required here: 40219ae801c0284a233ed908b07bb468d219cbc8 net: ceph: Remove CURRENT_TIME references Changes have been split so that they can done in manageable chunks just like how we are not changing all filesystems at once. > I think an approach that requires changes to the API without > actually beign able to verify they are correct, fully propagated or > don't impact on write/disk formats before the final change of the > VFS type is not going to fly. This is the sort of subtle bug that > can occur with type changes, and hence why I think that the fs > developers should be left to do the conversion of their filesystem > to support 64 bit times (i.e. approach 2b). Approach 2b has the same problem of not being able to verify the conversion before the vfs switch. Consider what happens if you miss changing one of the instances of direct inode time access. So 2b is also not completely verifiable that the changes are completely propagated. The only approach that does this is the 2c because the data types in the individual filesystems are decoupled from vfs data types from the get go. Besides, anything omitted by 2a or 2b in the process of conversion should be easily verifiable when vfs does switch. At this point, there will be warnings in case of pointer conversion or errors in case of pass by value, if the data types do not match. Apart from this, process of how individual filesystems are converted will help avoid these bugs. Here is one of the tricks I plan to do (consider example approach 2a): 1. Change an individual filesystem to use vfs_time. 2. Change vfs to timespec64 and verify that the targeted filesystem will actually compile. Tagging tricks are also useful here. Keep in mind that timespec = timespec64 already on 64 bit systems. But, still there might be some tricky cases which should be okay because it will have to reviewed by individual filesystem maintainers. > Any change is going to take a significant amount of testing and > verification, and that's something we don't have yet. Nobody has > written any tests for xfstests to verify correct 64 bit timestamp > behaviour, nor do we have tests to verify 32 bit timestamp behaviour > on a 64 bit time kernel. These are things that we are going to need; > all filesystems should behave the same w.r.t. these configurations, > so we really do need regression tests for this.... Agree. This is needed regardless of what approach is chosen. And, this is a problem for all filesystems even today. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> static inline void ceph_encode_timespec(struct ceph_timespec *tv, >> const struct timespec *ts) >> { >> tv->tv_sec = cpu_to_le32((u32)ts->tv_sec); >> tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec); >> } Pointed to decode function change in previous email. Pasting encode function change also here: static inline void ceph_encode_timespec(struct ceph_timespec *tv, - const struct timespec *ts) + const struct vfs_time *ts) -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 13, 2016 at 05:46:11PM -0800, Deepa Dinamani wrote: > On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote: > >> The VFS inode timestamps are not y2038 safe as they use > >> struct timespec. These will be changed to use struct timespec64 > >> instead and that is y2038 safe. > >> But, since the above data type conversion will break the end > >> file systems, use vfs_time aliases here to access inode times. > >> > >> These timestamps are passed in as arguments to functions > >> using inode timestamps. Hence, these need to change along > >> with vfs to support 64 bit timestamps. vfs_time helps do > >> this transition. > >> > >> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> > > > > Just a point to highlight the problem with this approach: > > > >> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > >> index f8f2359..1273db6 100644 > >> --- a/net/ceph/osd_client.c > >> +++ b/net/ceph/osd_client.c > >> @@ -2401,7 +2401,7 @@ bad: > >> */ > >> void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, > >> struct ceph_snap_context *snapc, u64 snap_id, > >> - struct timespec *mtime) > >> + struct vfs_time *mtime) > >> { > >> struct ceph_msg *msg = req->r_request; > >> void *p; > > > > So this change assumes that mtime is not passed by reference to > > another function. If we change vfs_time to be a timespec64, then > > dereferencing in this function works fine, but passing to another > > function will not because that function will be expecting a > > timespec. > > > > That, indeed, is what happens here. A few lines into this function: > > > > if (req->r_flags & CEPH_OSD_FLAG_WRITE) > > ceph_encode_timespec(p, mtime); > > > > And that function: > > > > static inline void ceph_encode_timespec(struct ceph_timespec *tv, > > const struct timespec *ts) > > { > > tv->tv_sec = cpu_to_le32((u32)ts->tv_sec); > > tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec); > > } > > I'm not sure where you picked up this encode function from. > > You might be missing the patches( 9 and 10) before this?: > > 2b5f8e517c6fd7121fc1b890c51c6256bc21beb6 net: ceph: use vfs_time data > type instead of timespec > ca5b82952a6a522ae058ccede57ba1a71da498c5 fs: ceph: Replace timespec > data type with vfs_time > 3a3ac0bdd23284c4f27a7ab1c133056c1a998075 fs: ceph: Change encode and > decode functions to use vfs_time So I missed this last patch when quickly reading throught he series. I think this is being broken up way too much, and that makes it hard to see how the changes relate. i.e. this series could have easily been just 3 or 4 patches - it's only ~100 lines of code total that is changed by the series. FWIW, let me put this in context for you, so maybe you'll understand why I think this timespec64 changeover is actually a trivial, simply thing. Last week I reviewed about 10,000 lines of new code amongst some 14,000 lines of change (twice) for xfstests. That was amongst getting ~1000 lines of my own changes reviewed and committed into the XFS kernel tree, handling user problems, reviewing and commenting on DAX changes, etc. This week I only have a libxfs kernel/userspace code sync to do (only 500 lines of change this time), and then I've about 10,000 lines of complex new XFS kernel code to review (i.e. for reverse mapping, reflink, dedupe and copy-on-write support). I'll have to review that more than once, and once that is done and all th changes have been propagated over into the userspace code, I've got another 10,000 lines of code in userspace (again for reflink, etc) to review, test and merge. So, excuse me if I made a mistake and missed something in a patchset that a) had 3 different versions posted, b) is way too fine-grained, and c) being treated like a mountain when it's really a tiny molehill. I do have much more important things to do with my time than be dragged into another silly "oh this is so difficult and hard" bikeshedding argument when I could easily write the entire patchset to do a timespec64 changeover for the VFS in a couple of hours. It's just not that hard to do. And, FWIW, I'm still waiting to hear how we're going to regression test all this. Has anyone written any xfstests yet to ensure that all the filesystems behave the same and we won't break anything in future as we add 64 bit timestamp support to filesystem on-disk formats? IMO, there's more work in writing the regression tests to make sure everything works correctly in all the different possible combinations of filesystem, kernel and userspace support (e.g. 32 on 32, 32 on 64, 64 on 32 and 64 on 64). I'm much more concerned about this aspect of the problem than actually changing the VFS code, because without it we can't verify the changes we are making are behaving correctly... Cheers, Dave.
On Monday 15 February 2016 08:00:50 Dave Chinner wrote: > On Sat, Feb 13, 2016 at 05:46:11PM -0800, Deepa Dinamani wrote: > > On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote: > > > On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote: > So, excuse me if I made a mistake and missed something in a patchset > that a) had 3 different versions posted, b) is way too fine-grained, > and c) being treated like a mountain when it's really a tiny > molehill. I do have much more important things to do with my time > than be dragged into another silly "oh this is so difficult and > hard" bikeshedding argument when I could easily write the entire > patchset to do a timespec64 changeover for the VFS in a couple of > hours. It's just not that hard to do. > > And, FWIW, I'm still waiting to hear how we're going to regression > test all this. Has anyone written any xfstests yet to ensure that > all the filesystems behave the same and we won't break anything in > future as we add 64 bit timestamp support to filesystem on-disk > formats? IMO, there's more work in writing the regression tests to > make sure everything works correctly in all the different possible > combinations of filesystem, kernel and userspace support (e.g. 32 on 32, > 32 on 64, 64 on 32 and 64 on 64). I'm much more concerned about > this aspect of the problem than actually changing the VFS > code, because without it we can't verify the changes we are making > are behaving correctly... You are mixing up way too many things here, for this series all we need is for you to say that one of the approaches is ok, and they are all to the point where they are simple enough that they don't really do much at all. Deepa is taking baby steps here because you complained about v1 being too complex. This series is not about changing the on-disk format, it is not even changing the VFS time format (yet), it's just a preparation so we can eventually change it. There are four different things that are going on at the same time, all independent of one another: 1. Changing the file systems so we are able to do the change in struct inode, this series. The *only* part we care about here is that this does not change the existing behavior on either 32-bit or 64-bit systems, and that should be trivial to review. 2. Changing the file systems to provide information to VFS about the time stamp ranges they support in order to do proper handling of overflows in VFS. Deepa has posted a first set of patches to always use current_fs_time() consistently, work on that is continuing and once done, we can debate the policy for what should happen in case of overflow. 3. Writing test cases in xfstests and/or LTP. Yes, we need them, and I think Deepa has started on those, but I don't think they are needed at this point as there is little to test before steps 1 and 2 are done. 4. Changing file systems to use longer on-disk timestamps where needed. This is completely independent of anything else and up to the individual file system developers. Anyone can test this now on 64-bit architectures, and most file systems we care about (xfs being the notable exception, ext4 also until very recently) already do this properly. After 1, 2 and 3 are done, the simple patch to switch over VFS can be implemented and tested, followed by whatever work remains to switch over file systems to use 64-bit timestamps in the kernel (independent of what they use on disk, again). My current line of thinking is that for step 1, I'd let Deepa pick one of the three approaches she posted (I don't think we found any showstoppers), and put the patches in my y2038 tree for merging in 4.6. We can easily leave out the file systems that have conflicts against linux-next, and you can put Deepa's patch or another implementation of that into 4.7. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index afe886b..28bba12 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -8,6 +8,7 @@ #include <linux/radix-tree.h> #include <linux/uio.h> #include <linux/workqueue.h> +#include <linux/fs.h> #include <net/net_namespace.h> #include <linux/ceph/types.h> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 7506b48..2b6f08b 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -322,7 +322,7 @@ extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client * extern void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, struct ceph_snap_context *snapc, u64 snap_id, - struct timespec *mtime); + struct vfs_time *mtime); extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *, struct ceph_file_layout *layout, @@ -364,7 +364,7 @@ extern int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_snap_context *sc, u64 off, u64 len, u32 truncate_seq, u64 truncate_size, - struct timespec *mtime, + struct vfs_time *mtime, struct page **pages, int nr_pages); /* watch/notify events */ diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index f8f2359..1273db6 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -2401,7 +2401,7 @@ bad: */ void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off, struct ceph_snap_context *snapc, u64 snap_id, - struct timespec *mtime) + struct vfs_time *mtime) { struct ceph_msg *msg = req->r_request; void *p;
The VFS inode timestamps are not y2038 safe as they use struct timespec. These will be changed to use struct timespec64 instead and that is y2038 safe. But, since the above data type conversion will break the end file systems, use vfs_time aliases here to access inode times. These timestamps are passed in as arguments to functions using inode timestamps. Hence, these need to change along with vfs to support 64 bit timestamps. vfs_time helps do this transition. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> --- include/linux/ceph/messenger.h | 1 + include/linux/ceph/osd_client.h | 4 ++-- net/ceph/osd_client.c | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-)