Message ID | 20210105104326.67895-1-galpress@amazon.com (mailing list archive) |
---|---|
Headers | show |
Series | Host information userspace version | expand |
On 05/01/2021 12:43, Gal Pressman wrote: > The following two patches add the userspace version to the host > information struct reported to the device, used for debugging and > troubleshooting purposes. > > PR was sent: > https://github.com/linux-rdma/rdma-core/pull/918 > > Thanks, > Gal Anything stopping this series from being merged?
On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > On 05/01/2021 12:43, Gal Pressman wrote: > > The following two patches add the userspace version to the host > > information struct reported to the device, used for debugging and > > troubleshooting purposes. > > > > PR was sent: > > https://github.com/linux-rdma/rdma-core/pull/918 > > > > Thanks, > > Gal > > Anything stopping this series from being merged? It is unclear when this forwarding of non-verbs data to the FW will stop. Thanks
On 19/01/2021 10:46, Leon Romanovsky wrote: > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: >> On 05/01/2021 12:43, Gal Pressman wrote: >>> The following two patches add the userspace version to the host >>> information struct reported to the device, used for debugging and >>> troubleshooting purposes. >>> >>> PR was sent: >>> https://github.com/linux-rdma/rdma-core/pull/918 >>> >>> Thanks, >>> Gal >> >> Anything stopping this series from being merged? > > It is unclear when this forwarding of non-verbs data to the FW will stop. This was already discussed in the PR. Not everything should be passed through this interface, there should be a limit and it should be examined per case. rdma-core version is clearly related to an RDMA device. BTW, if you have any concerns about a patch you can state them, you don't have to ignore it and wait for the submitter to ask what's wrong..
On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote: > On 19/01/2021 10:46, Leon Romanovsky wrote: > > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > >> On 05/01/2021 12:43, Gal Pressman wrote: > >>> The following two patches add the userspace version to the host > >>> information struct reported to the device, used for debugging and > >>> troubleshooting purposes. > >>> > >>> PR was sent: > >>> https://github.com/linux-rdma/rdma-core/pull/918 > >>> > >>> Thanks, > >>> Gal > >> > >> Anything stopping this series from being merged? > > > > It is unclear when this forwarding of non-verbs data to the FW will stop. > > This was already discussed in the PR. Not everything should be passed through > this interface, there should be a limit and it should be examined per case. > rdma-core version is clearly related to an RDMA device. "Clearly or not" - it depends on the observer. > > BTW, if you have any concerns about a patch you can state them, you don't have > to ignore it and wait for the submitter to ask what's wrong.. Didn't you mistake me with anyone else? I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches. Thanks
On 19/01/2021 13:58, Leon Romanovsky wrote: > On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote: >> On 19/01/2021 10:46, Leon Romanovsky wrote: >>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: >>>> On 05/01/2021 12:43, Gal Pressman wrote: >>>>> The following two patches add the userspace version to the host >>>>> information struct reported to the device, used for debugging and >>>>> troubleshooting purposes. >>>>> >>>>> PR was sent: >>>>> https://github.com/linux-rdma/rdma-core/pull/918 >>>>> >>>>> Thanks, >>>>> Gal >>>> >>>> Anything stopping this series from being merged? >>> >>> It is unclear when this forwarding of non-verbs data to the FW will stop. >> >> This was already discussed in the PR. Not everything should be passed through >> this interface, there should be a limit and it should be examined per case. >> rdma-core version is clearly related to an RDMA device. > > "Clearly or not" - it depends on the observer. > >> >> BTW, if you have any concerns about a patch you can state them, you don't have >> to ignore it and wait for the submitter to ask what's wrong.. > > Didn't you mistake me with anyone else? No, you decided to answer my original question :). > I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches. Don't get me wrong, your review is very appreciated, but this series is 20 LOC which you already reviewed two weeks ago, and I replied to all comments. Ignoring patches is fine, but please don't review, ignore and wait for the last minute to say they shouldn't be merged.
On Tue, Jan 19, 2021 at 03:19:15PM +0200, Gal Pressman wrote: > On 19/01/2021 13:58, Leon Romanovsky wrote: > > On Tue, Jan 19, 2021 at 11:10:59AM +0200, Gal Pressman wrote: > >> On 19/01/2021 10:46, Leon Romanovsky wrote: > >>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > >>>> On 05/01/2021 12:43, Gal Pressman wrote: > >>>>> The following two patches add the userspace version to the host > >>>>> information struct reported to the device, used for debugging and > >>>>> troubleshooting purposes. > >>>>> > >>>>> PR was sent: > >>>>> https://github.com/linux-rdma/rdma-core/pull/918 > >>>>> > >>>>> Thanks, > >>>>> Gal > >>>> > >>>> Anything stopping this series from being merged? > >>> > >>> It is unclear when this forwarding of non-verbs data to the FW will stop. > >> > >> This was already discussed in the PR. Not everything should be passed through > >> this interface, there should be a limit and it should be examined per case. > >> rdma-core version is clearly related to an RDMA device. > > > > "Clearly or not" - it depends on the observer. > > > >> > >> BTW, if you have any concerns about a patch you can state them, you don't have > >> to ignore it and wait for the submitter to ask what's wrong.. > > > > Didn't you mistake me with anyone else? > > No, you decided to answer my original question :). > > > I'm reviewer in the kernel exactly like you and it gives me nice thing - ignore patches. > > Don't get me wrong, your review is very appreciated, but this series is 20 LOC > which you already reviewed two weeks ago, and I replied to all comments. > Ignoring patches is fine, but please don't review, ignore and wait for the last > minute to say they shouldn't be merged. Sorry, but it is impossible to get it right. I gave you hint WHY it takes so long, it is not review/ack/nack or anything like this. Thanks
On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > On 05/01/2021 12:43, Gal Pressman wrote: > > The following two patches add the userspace version to the host > > information struct reported to the device, used for debugging and > > troubleshooting purposes. > > > > PR was sent: > > https://github.com/linux-rdma/rdma-core/pull/918 > > > > Thanks, > > Gal > > Anything stopping this series from being merged? Honestly, I'm not very keen on this Why does this have to go through a kernel driver, can't you collect OS telemetry some other way? Jason
On 21/01/2021 20:35, Jason Gunthorpe wrote: > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: >> On 05/01/2021 12:43, Gal Pressman wrote: >>> The following two patches add the userspace version to the host >>> information struct reported to the device, used for debugging and >>> troubleshooting purposes. >>> >>> PR was sent: >>> https://github.com/linux-rdma/rdma-core/pull/918 >>> >>> Thanks, >>> Gal >> >> Anything stopping this series from being merged? > > Honestly, I'm not very keen on this > > Why does this have to go through a kernel driver, can't you collect > OS telemetry some other way? Hmm, it has to go through rdma-core somehow, what sort of component can rdma-core interact with to pass such data? The only one I could think of is the RDMA driver :). As I said, I get your concern, I was going on and off about this as well, but the userspace version is a very useful piece of information in the context of a kernel bypass device. It's just as important as the kernel version. I agree that this is not the place to pass things like gcc version, but I don't think that's the case here :). Do you absolutely hate the idea of passing the userspace version, or are you worried about what's next to come? If it's the latter, we don't really have plans to push anything similar anytime soon, and even if we did, I don't think it should block this series.
On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote: > On 21/01/2021 20:35, Jason Gunthorpe wrote: > > On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > >> On 05/01/2021 12:43, Gal Pressman wrote: > >>> The following two patches add the userspace version to the host > >>> information struct reported to the device, used for debugging and > >>> troubleshooting purposes. > >>> > >>> PR was sent: > >>> https://github.com/linux-rdma/rdma-core/pull/918 > >>> > >>> Thanks, > >>> Gal > >> > >> Anything stopping this series from being merged? > > > > Honestly, I'm not very keen on this > > > > Why does this have to go through a kernel driver, can't you collect > > OS telemetry some other way? > > Hmm, it has to go through rdma-core somehow, what sort of component can > rdma-core interact with to pass such data? The only one I could think of is the > RDMA driver :). > > As I said, I get your concern, I was going on and off about this as well, but > the userspace version is a very useful piece of information in the context of a > kernel bypass device. It's just as important as the kernel version. > I agree that this is not the place to pass things like gcc version, but I don't > think that's the case here :). Well, if we were to do this for mlx5 we'd want to pass UCX and maybe other stuff, it seems like it gets quickly out of hand. I think telemetry is better done as some telemetry subsystem, not integrated all over the place Jason
On 27/01/2021 18:57, Jason Gunthorpe wrote: > On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote: >> On 21/01/2021 20:35, Jason Gunthorpe wrote: >>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: >>>> On 05/01/2021 12:43, Gal Pressman wrote: >>>>> The following two patches add the userspace version to the host >>>>> information struct reported to the device, used for debugging and >>>>> troubleshooting purposes. >>>>> >>>>> PR was sent: >>>>> https://github.com/linux-rdma/rdma-core/pull/918 >>>>> >>>>> Thanks, >>>>> Gal >>>> >>>> Anything stopping this series from being merged? >>> >>> Honestly, I'm not very keen on this >>> >>> Why does this have to go through a kernel driver, can't you collect >>> OS telemetry some other way? >> >> Hmm, it has to go through rdma-core somehow, what sort of component can >> rdma-core interact with to pass such data? The only one I could think of is the >> RDMA driver :). >> >> As I said, I get your concern, I was going on and off about this as well, but >> the userspace version is a very useful piece of information in the context of a >> kernel bypass device. It's just as important as the kernel version. >> I agree that this is not the place to pass things like gcc version, but I don't >> think that's the case here :). > > Well, if we were to do this for mlx5 we'd want to pass UCX and maybe > other stuff, it seems like it gets quickly out of hand. Agree, that's why I think this should be limited to things in rdma-core's reach, sounds like a reasonable limit to me. > I think telemetry is better done as some telemetry subsystem, not > integrated all over the place Interesting, but that would still be all over the place as each package would have to report its version to that telemetry driver. And since this currently doesn't exist, should we stay without a solution? Specifically talking about rdma-core version, do you think it could be merged?
On 27/01/2021 19:53, Gal Pressman wrote: > On 27/01/2021 18:57, Jason Gunthorpe wrote: >> On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote: >>> On 21/01/2021 20:35, Jason Gunthorpe wrote: >>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: >>>>> On 05/01/2021 12:43, Gal Pressman wrote: >>>>>> The following two patches add the userspace version to the host >>>>>> information struct reported to the device, used for debugging and >>>>>> troubleshooting purposes. >>>>>> >>>>>> PR was sent: >>>>>> https://github.com/linux-rdma/rdma-core/pull/918 >>>>>> >>>>>> Thanks, >>>>>> Gal >>>>> >>>>> Anything stopping this series from being merged? >>>> >>>> Honestly, I'm not very keen on this >>>> >>>> Why does this have to go through a kernel driver, can't you collect >>>> OS telemetry some other way? >>> >>> Hmm, it has to go through rdma-core somehow, what sort of component can >>> rdma-core interact with to pass such data? The only one I could think of is the >>> RDMA driver :). >>> >>> As I said, I get your concern, I was going on and off about this as well, but >>> the userspace version is a very useful piece of information in the context of a >>> kernel bypass device. It's just as important as the kernel version. >>> I agree that this is not the place to pass things like gcc version, but I don't >>> think that's the case here :). >> >> Well, if we were to do this for mlx5 we'd want to pass UCX and maybe >> other stuff, it seems like it gets quickly out of hand. > > Agree, that's why I think this should be limited to things in rdma-core's reach, > sounds like a reasonable limit to me. > >> I think telemetry is better done as some telemetry subsystem, not >> integrated all over the place > > Interesting, but that would still be all over the place as each package would > have to report its version to that telemetry driver. > > And since this currently doesn't exist, should we stay without a solution? > Specifically talking about rdma-core version, do you think it could be merged? > Jason?
On Sun, Feb 28, 2021 at 11:56:01AM +0200, Gal Pressman wrote: > On 27/01/2021 19:53, Gal Pressman wrote: > > On 27/01/2021 18:57, Jason Gunthorpe wrote: > >> On Thu, Jan 21, 2021 at 09:40:49PM +0200, Gal Pressman wrote: > >>> On 21/01/2021 20:35, Jason Gunthorpe wrote: > >>>> On Tue, Jan 19, 2021 at 09:17:14AM +0200, Gal Pressman wrote: > >>>>> On 05/01/2021 12:43, Gal Pressman wrote: > >>>>>> The following two patches add the userspace version to the host > >>>>>> information struct reported to the device, used for debugging and > >>>>>> troubleshooting purposes. > >>>>>> > >>>>>> PR was sent: > >>>>>> https://github.com/linux-rdma/rdma-core/pull/918 > >>>>>> > >>>>>> Thanks, > >>>>>> Gal > >>>>> > >>>>> Anything stopping this series from being merged? > >>>> > >>>> Honestly, I'm not very keen on this > >>>> > >>>> Why does this have to go through a kernel driver, can't you collect > >>>> OS telemetry some other way? > >>> > >>> Hmm, it has to go through rdma-core somehow, what sort of component can > >>> rdma-core interact with to pass such data? The only one I could think of is the > >>> RDMA driver :). > >>> > >>> As I said, I get your concern, I was going on and off about this as well, but > >>> the userspace version is a very useful piece of information in the context of a > >>> kernel bypass device. It's just as important as the kernel version. > >>> I agree that this is not the place to pass things like gcc version, but I don't > >>> think that's the case here :). > >> > >> Well, if we were to do this for mlx5 we'd want to pass UCX and maybe > >> other stuff, it seems like it gets quickly out of hand. > > > > Agree, that's why I think this should be limited to things in rdma-core's reach, > > sounds like a reasonable limit to me. > > > >> I think telemetry is better done as some telemetry subsystem, not > >> integrated all over the place > > > > Interesting, but that would still be all over the place as each package would > > have to report its version to that telemetry driver. > > > > And since this currently doesn't exist, should we stay without a solution? > > Specifically talking about rdma-core version, do you think it could be merged? > > > > Jason? I'm not keen on it, it doesn't work well for other use-cases, it seems too hacky. Jason