Message ID | 20191114105736.8636-1-lhenriques@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | ceph: safely use 'copy-from' Op on Octopus OSDs | expand |
On Thu, 2019-11-14 at 10:57 +0000, Luis Henriques wrote: > Hi! > > So, after the feedback I got from v1 [1] I've sent out a pull-request > for the OSDs [2] which encodes require_osd_release into the OSDMap > client data. This allows the client to figure out which ceph release > the OSDs cluster is running and decide whether or not it's safe to use > the copy-from Op for copy_file_range. > > This new patchset I'm sending simply adds enough functionality to the > kernel client so that it can take advantage of this OSD patch: > > 0001 - adds the ability to decode TYPE_MSGR2 addresses. This is a > required functionality for enabling SERVER_NAUTILUS in the > client. I hope I got the new format right, as I couldn't figure > out what the hard-coded values (see comments) really mean. > nit: the first 3 patch subject lines should probably be prefixed with "libceph:" > 0002 - allows the client to retrieve the new require_osd_release field > from the OSDMap if available. This patch also adds SERVER_MIMIC, > SERVER_NAUTILUS and SERVER_OCTOPUS to the supported features, > which TBH I'm not sure if that's a safe thing to do -- the only > issue I've seen was that Nautilus requires the ability to decode > TYPE_MSGR2 address, but I may have missed others. > Yes, this needs to be done with care. We have to ensure that the server side isn't assuming that the client supports something that it doesn't. I think that means just trawling through the code and verifying whether this is safe. > 0003 - debug code to add require_osd_release to the osdmap debugfs file. > > 0004 - adds the truncate_{seq,size} fields to the 'copy-from' operation > if the OSDs are >= Octopus. > > Also note that, as suggested by Ilya, I've dropped the patch that would > change the default mount options to 'copyfrom'. > > These patches have been tested with the xfstests generic test suite, and > with a couple of other (local) tests that exercise the cephfs > copy_file_range syscall. I didn't saw any issues, but as I said above, > I'm not really sure if adding the SERVER_* flags to the supported > features have other side effects. > > [1] https://lore.kernel.org/lkml/20191108141555.31176-1-lhenriques@suse.com/ > [2] https://github.com/ceph/ceph/pull/31611 > I'm just getting caught up on the discussion here, but why was it decided to do it this way instead of just adding a new OSD "copy-from-no-truncseq" operation? Once you tried it once and an OSD didn't support it, you could just give up on using it any longer? That seems a lot simpler than trying to monkey with feature bits.
On Thu, 14 Nov 2019, Jeff Layton wrote: > On Thu, 2019-11-14 at 10:57 +0000, Luis Henriques wrote: > > Hi! > > > > So, after the feedback I got from v1 [1] I've sent out a pull-request > > for the OSDs [2] which encodes require_osd_release into the OSDMap > > client data. This allows the client to figure out which ceph release > > the OSDs cluster is running and decide whether or not it's safe to use > > the copy-from Op for copy_file_range. > > > > This new patchset I'm sending simply adds enough functionality to the > > kernel client so that it can take advantage of this OSD patch: > > > > 0001 - adds the ability to decode TYPE_MSGR2 addresses. This is a > > required functionality for enabling SERVER_NAUTILUS in the > > client. I hope I got the new format right, as I couldn't figure > > out what the hard-coded values (see comments) really mean. > > > > nit: the first 3 patch subject lines should probably be prefixed with > "libceph:" > > > 0002 - allows the client to retrieve the new require_osd_release field > > from the OSDMap if available. This patch also adds SERVER_MIMIC, > > SERVER_NAUTILUS and SERVER_OCTOPUS to the supported features, > > which TBH I'm not sure if that's a safe thing to do -- the only > > issue I've seen was that Nautilus requires the ability to decode > > TYPE_MSGR2 address, but I may have missed others. > > > > Yes, this needs to be done with care. We have to ensure that the server > side isn't assuming that the client supports something that it doesn't. > I think that means just trawling through the code and verifying whether > this is safe. > > > 0003 - debug code to add require_osd_release to the osdmap debugfs file. > > > > 0004 - adds the truncate_{seq,size} fields to the 'copy-from' operation > > if the OSDs are >= Octopus. > > > > Also note that, as suggested by Ilya, I've dropped the patch that would > > change the default mount options to 'copyfrom'. > > > > These patches have been tested with the xfstests generic test suite, and > > with a couple of other (local) tests that exercise the cephfs > > copy_file_range syscall. I didn't saw any issues, but as I said above, > > I'm not really sure if adding the SERVER_* flags to the supported > > features have other side effects. > > > > [1] https://lore.kernel.org/lkml/20191108141555.31176-1-lhenriques@suse.com/ > > [2] https://github.com/ceph/ceph/pull/31611 > > > > I'm just getting caught up on the discussion here, but why was it > decided to do it this way instead of just adding a new OSD > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > didn't support it, you could just give up on using it any longer? That > seems a lot simpler than trying to monkey with feature bits. I don't remember the original discussion either, but in retrospect that does seem much simpler--especially since hte client is conditioning sending this based on the the require_osd_release. It seems like passing a flag to the copy-from op would be more reasonable instead of conditional feature-based behavior. Greg, do you remember why we ended up here? sage
On Thu, Nov 14, 2019 at 2:28 PM Sage Weil <sweil@redhat.com> wrote: > > On Thu, 14 Nov 2019, Jeff Layton wrote: > > On Thu, 2019-11-14 at 10:57 +0000, Luis Henriques wrote: > > > Hi! > > > > > > So, after the feedback I got from v1 [1] I've sent out a pull-request > > > for the OSDs [2] which encodes require_osd_release into the OSDMap > > > client data. This allows the client to figure out which ceph release > > > the OSDs cluster is running and decide whether or not it's safe to use > > > the copy-from Op for copy_file_range. > > > > > > This new patchset I'm sending simply adds enough functionality to the > > > kernel client so that it can take advantage of this OSD patch: > > > > > > 0001 - adds the ability to decode TYPE_MSGR2 addresses. This is a > > > required functionality for enabling SERVER_NAUTILUS in the > > > client. I hope I got the new format right, as I couldn't figure > > > out what the hard-coded values (see comments) really mean. > > > > > > > nit: the first 3 patch subject lines should probably be prefixed with > > "libceph:" > > > > > 0002 - allows the client to retrieve the new require_osd_release field > > > from the OSDMap if available. This patch also adds SERVER_MIMIC, > > > SERVER_NAUTILUS and SERVER_OCTOPUS to the supported features, > > > which TBH I'm not sure if that's a safe thing to do -- the only > > > issue I've seen was that Nautilus requires the ability to decode > > > TYPE_MSGR2 address, but I may have missed others. > > > > > > > Yes, this needs to be done with care. We have to ensure that the server > > side isn't assuming that the client supports something that it doesn't. > > I think that means just trawling through the code and verifying whether > > this is safe. > > > > > 0003 - debug code to add require_osd_release to the osdmap debugfs file. > > > > > > 0004 - adds the truncate_{seq,size} fields to the 'copy-from' operation > > > if the OSDs are >= Octopus. > > > > > > Also note that, as suggested by Ilya, I've dropped the patch that would > > > change the default mount options to 'copyfrom'. > > > > > > These patches have been tested with the xfstests generic test suite, and > > > with a couple of other (local) tests that exercise the cephfs > > > copy_file_range syscall. I didn't saw any issues, but as I said above, > > > I'm not really sure if adding the SERVER_* flags to the supported > > > features have other side effects. > > > > > > [1] https://lore.kernel.org/lkml/20191108141555.31176-1-lhenriques@suse.com/ > > > [2] https://github.com/ceph/ceph/pull/31611 > > > > > > > I'm just getting caught up on the discussion here, but why was it > > decided to do it this way instead of just adding a new OSD > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > > didn't support it, you could just give up on using it any longer? That > > seems a lot simpler than trying to monkey with feature bits. > > I don't remember the original discussion either, but in retrospect that > does seem much simpler--especially since hte client is conditioning > sending this based on the the require_osd_release. It seems like passing > a flag to the copy-from op would be more reasonable instead of conditional > feature-based behavior. Yeah, I suggested adding require_osd_release to the client portion just because we are running into it more and more: Objecter relies on it for RESEND_ON_SPLIT for example. It needs to be accessible so that patches like that can be carried over to the kernel client without workarounds. copy-from in its existing form is another example. AFAIU the problem is that copy-from op doesn't reject unknown flags. Luis added a flag in https://github.com/ceph/ceph/pull/25374, but it is simply ignored on nautilus and older releases, potentially leading to data corruption. Adding a new op that would be an alias for CEPH_OSD_OP_COPY_FROM with CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ like Jeff is suggesting, or a new copy-from2 op that would behave just like copy-from, but reject unknown flags to avoid similar compatibility issues in the future is probably the best thing we can do from the client perspective. Thanks, Ilya
On Thu, 14 Nov 2019, Ilya Dryomov wrote: > > > I'm just getting caught up on the discussion here, but why was it > > > decided to do it this way instead of just adding a new OSD > > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > > > didn't support it, you could just give up on using it any longer? That > > > seems a lot simpler than trying to monkey with feature bits. > > > > I don't remember the original discussion either, but in retrospect that > > does seem much simpler--especially since hte client is conditioning > > sending this based on the the require_osd_release. It seems like passing > > a flag to the copy-from op would be more reasonable instead of conditional > > feature-based behavior. > > Yeah, I suggested adding require_osd_release to the client portion just > because we are running into it more and more: Objecter relies on it for > RESEND_ON_SPLIT for example. It needs to be accessible so that patches > like that can be carried over to the kernel client without workarounds. > > copy-from in its existing form is another example. AFAIU the problem > is that copy-from op doesn't reject unknown flags. Luis added a flag > in https://github.com/ceph/ceph/pull/25374, but it is simply ignored on > nautilus and older releases, potentially leading to data corruption. > > Adding a new op that would be an alias for CEPH_OSD_OP_COPY_FROM with > CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ like Jeff is suggesting, or a new > copy-from2 op that would behave just like copy-from, but reject unknown > flags to avoid similar compatibility issues in the future is probably > the best thing we can do from the client perspective. Yeah, I think copy-from2 is the best path. I think that means we should revert what we merged to ceph.git a few weeks back, Luis! Sorry we didn't put all the pieces together before... sage
On Thu, Nov 14, 2019 at 02:17:44PM +0000, Sage Weil wrote: > On Thu, 14 Nov 2019, Ilya Dryomov wrote: > > > > I'm just getting caught up on the discussion here, but why was it > > > > decided to do it this way instead of just adding a new OSD > > > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > > > > didn't support it, you could just give up on using it any longer? That > > > > seems a lot simpler than trying to monkey with feature bits. > > > > > > I don't remember the original discussion either, but in retrospect that > > > does seem much simpler--especially since hte client is conditioning > > > sending this based on the the require_osd_release. It seems like passing > > > a flag to the copy-from op would be more reasonable instead of conditional > > > feature-based behavior. > > > > Yeah, I suggested adding require_osd_release to the client portion just > > because we are running into it more and more: Objecter relies on it for > > RESEND_ON_SPLIT for example. It needs to be accessible so that patches > > like that can be carried over to the kernel client without workarounds. > > > > copy-from in its existing form is another example. AFAIU the problem > > is that copy-from op doesn't reject unknown flags. Luis added a flag > > in https://github.com/ceph/ceph/pull/25374, but it is simply ignored on > > nautilus and older releases, potentially leading to data corruption. > > > > Adding a new op that would be an alias for CEPH_OSD_OP_COPY_FROM with > > CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ like Jeff is suggesting, or a new > > copy-from2 op that would behave just like copy-from, but reject unknown > > flags to avoid similar compatibility issues in the future is probably > > the best thing we can do from the client perspective. > > Yeah, I think copy-from2 is the best path. I think that means we should > revert what we merged to ceph.git a few weeks back, Luis! Sorry we didn't > put all the pieces together before... Well, that's an unexpected turn. I'm not disagreeing with that decision but since my initial pull request was done one year ago (almost to the day!), it's a bit disappointing to see that in the end I'm back to square one :-) I guess that the PR I mentioned in the cover letter can also be dropped, as it's not really usable by the kernel client (at least not until it fully supports all the features up to SERVER_OCTOPUS). Anyway, thanks everyone for the review. Cheers, -- Luís
On Thu, Nov 14, 2019 at 4:24 PM Luis Henriques <lhenriques@suse.com> wrote: > > On Thu, Nov 14, 2019 at 02:17:44PM +0000, Sage Weil wrote: > > On Thu, 14 Nov 2019, Ilya Dryomov wrote: > > > > > I'm just getting caught up on the discussion here, but why was it > > > > > decided to do it this way instead of just adding a new OSD > > > > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > > > > > didn't support it, you could just give up on using it any longer? That > > > > > seems a lot simpler than trying to monkey with feature bits. > > > > > > > > I don't remember the original discussion either, but in retrospect that > > > > does seem much simpler--especially since hte client is conditioning > > > > sending this based on the the require_osd_release. It seems like passing > > > > a flag to the copy-from op would be more reasonable instead of conditional > > > > feature-based behavior. > > > > > > Yeah, I suggested adding require_osd_release to the client portion just > > > because we are running into it more and more: Objecter relies on it for > > > RESEND_ON_SPLIT for example. It needs to be accessible so that patches > > > like that can be carried over to the kernel client without workarounds. > > > > > > copy-from in its existing form is another example. AFAIU the problem > > > is that copy-from op doesn't reject unknown flags. Luis added a flag > > > in https://github.com/ceph/ceph/pull/25374, but it is simply ignored on > > > nautilus and older releases, potentially leading to data corruption. > > > > > > Adding a new op that would be an alias for CEPH_OSD_OP_COPY_FROM with > > > CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ like Jeff is suggesting, or a new > > > copy-from2 op that would behave just like copy-from, but reject unknown > > > flags to avoid similar compatibility issues in the future is probably > > > the best thing we can do from the client perspective. > > > > Yeah, I think copy-from2 is the best path. I think that means we should > > revert what we merged to ceph.git a few weeks back, Luis! Sorry we didn't > > put all the pieces together before... > > Well, that's an unexpected turn. I'm not disagreeing with that decision > but since my initial pull request was done one year ago (almost to the > day!), it's a bit disappointing to see that in the end I'm back to > square one :-) Well, I think literally every line from that PR will still go in, just wrapped in a new OSD op. Backwards compatibility is hard... > > I guess that the PR I mentioned in the cover letter can also be dropped, > as it's not really usable by the kernel client (at least not until it > fully supports all the features up to SERVER_OCTOPUS). No, some form of https://github.com/ceph/ceph/pull/31611 should go in. I'm pretty certain it will come up at some point in the future, even if the new field isn't immediately usable today. Someone porting a change to the kernel client a couple years from now will thank you for it ;) Thanks, Ilya
On Thu, Nov 14, 2019 at 5:28 AM Sage Weil <sweil@redhat.com> wrote: > > On Thu, 14 Nov 2019, Jeff Layton wrote: > > On Thu, 2019-11-14 at 10:57 +0000, Luis Henriques wrote: > > > Hi! > > > > > > So, after the feedback I got from v1 [1] I've sent out a pull-request > > > for the OSDs [2] which encodes require_osd_release into the OSDMap > > > client data. This allows the client to figure out which ceph release > > > the OSDs cluster is running and decide whether or not it's safe to use > > > the copy-from Op for copy_file_range. > > > > > > This new patchset I'm sending simply adds enough functionality to the > > > kernel client so that it can take advantage of this OSD patch: > > > > > > 0001 - adds the ability to decode TYPE_MSGR2 addresses. This is a > > > required functionality for enabling SERVER_NAUTILUS in the > > > client. I hope I got the new format right, as I couldn't figure > > > out what the hard-coded values (see comments) really mean. > > > > > > > nit: the first 3 patch subject lines should probably be prefixed with > > "libceph:" > > > > > 0002 - allows the client to retrieve the new require_osd_release field > > > from the OSDMap if available. This patch also adds SERVER_MIMIC, > > > SERVER_NAUTILUS and SERVER_OCTOPUS to the supported features, > > > which TBH I'm not sure if that's a safe thing to do -- the only > > > issue I've seen was that Nautilus requires the ability to decode > > > TYPE_MSGR2 address, but I may have missed others. > > > > > > > Yes, this needs to be done with care. We have to ensure that the server > > side isn't assuming that the client supports something that it doesn't. > > I think that means just trawling through the code and verifying whether > > this is safe. > > > > > 0003 - debug code to add require_osd_release to the osdmap debugfs file. > > > > > > 0004 - adds the truncate_{seq,size} fields to the 'copy-from' operation > > > if the OSDs are >= Octopus. > > > > > > Also note that, as suggested by Ilya, I've dropped the patch that would > > > change the default mount options to 'copyfrom'. > > > > > > These patches have been tested with the xfstests generic test suite, and > > > with a couple of other (local) tests that exercise the cephfs > > > copy_file_range syscall. I didn't saw any issues, but as I said above, > > > I'm not really sure if adding the SERVER_* flags to the supported > > > features have other side effects. > > > > > > [1] https://lore.kernel.org/lkml/20191108141555.31176-1-lhenriques@suse.com/ > > > [2] https://github.com/ceph/ceph/pull/31611 > > > > > > > I'm just getting caught up on the discussion here, but why was it > > decided to do it this way instead of just adding a new OSD > > "copy-from-no-truncseq" operation? Once you tried it once and an OSD > > didn't support it, you could just give up on using it any longer? That > > seems a lot simpler than trying to monkey with feature bits. > > I don't remember the original discussion either, but in retrospect that > does seem much simpler--especially since hte client is conditioning > sending this based on the the require_osd_release. It seems like passing > a flag to the copy-from op would be more reasonable instead of conditional > feature-based behavior. > > Greg, do you remember why we ended up here? Well, I can look it up. We discussed it being a different op in February ("OSD 'copy-from' operation and truncate_seq value") and...it looks like that conversation ended with that being the plan? I can't see why it changed in the making though, and everyone seems to have forgotten about it at the next pass. -Greg > > sage >