Message ID | 561B7CAE.3040505@dev.mellanox.co.il (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Mon, Oct 12, 2015 at 12:26:06PM +0300, Sagi Grimberg wrote: > First go with this looks OK for mlx4. mlx5 needs the below incremental > patch to be folded in. > > we need dev->ib_dev.max_pkeys set when get_port_caps() is called. Thanks, I've folded your patch and force pushed out the updated tree. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
So this patch seems to work well for me. I've ran some kernel applications (ipoib, iser, srp, nfs) and a user-space application (ibsrpdm) over mlx4/mlx5 and didn't spot any issues. I think this is very useful to have and I'd love having it in 4.4, does anyone have any other comments on this patch? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 3:00 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote: > So this patch seems to work well for me. I've ran some kernel > applications (ipoib, iser, srp, nfs) and a user-space application > (ibsrpdm) over mlx4/mlx5 and didn't spot any issues. > > I think this is very useful to have and I'd love having it in 4.4, > does anyone have any other comments on this patch? Were we ever presented with performance gains achieved using the patch? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> I think this is very useful to have and I'd love having it in 4.4, >> does anyone have any other comments on this patch? > > Were we ever presented with performance gains achieved using the patch? Hi Or, Can you explain what you mean by performance gains? My understanding is that this patch is not performance critical. It just replaces each and every ULP device attributes caching. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 04:08:54PM +0300, Sagi Grimberg wrote: > Can you explain what you mean by performance gains? My understanding is > that this patch is not performance critical. It just replaces each and > every ULP device attributes caching. Exactly. The only 'performance' we care about is that of ULD writers :) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 4:08 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote: >>> I think this is very useful to have and I'd love having it in 4.4, >>> does anyone have any other comments on this patch? >> Were we ever presented with performance gains achieved using the patch? > Can you explain what you mean by performance gains? My understanding is > that this patch is not performance critical. It just replaces each and > every ULP device attributes caching. oops, sorry, I was referring to the patch that deals with re-structuring of struct ib_wc, so... (1) did we even got performance gains achieved using **that** patch? (2) re this one, as I wrote in the past, I am in favor of simple caching of struct ib_device_attr on struct ib_device (best with pointer) and not adding 333 fields to struct ib_device, I don't see the benefit from this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/20/2015 5:08 PM, Or Gerlitz wrote: > On Tue, Oct 20, 2015 at 4:08 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote: >>>> I think this is very useful to have and I'd love having it in 4.4, >>>> does anyone have any other comments on this patch? > >>> Were we ever presented with performance gains achieved using the patch? > >> Can you explain what you mean by performance gains? My understanding is >> that this patch is not performance critical. It just replaces each and >> every ULP device attributes caching. > > oops, sorry, I was referring to the patch that deals with > re-structuring of struct ib_wc, so... > > (1) did we even got performance gains achieved using **that** patch? I don't know if we'd see noticeable performance gains from **that** patch either.. The benefit is mostly stack relief as usually ULPs keep the work request structure on the stack. > > (2) re this one, as I wrote in the past, I am in favor of simple > caching of struct > ib_device_attr on struct ib_device (best with pointer) and not adding > 333 fields > to struct ib_device, I don't see the benefit from this patch. > Christoph commented on that: "I'm strongly against this. As the reviews show the move is highly confusing. The attributes don't change and there is no need to 'cache' or 'query' them. Just merge them into the device, follow years of experience with how every other Linux subsystem does it and be done with it." If the attributes are automatically a part of the ib_device then it does seem a bit redundant keeping the structure around... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 6:00 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote: >> (2) re this one, as I wrote in the past, I am in favor of simple >> caching of struct ib_device_attr on struct ib_device (best with pointer) and not adding >> 333 fields to struct ib_device, I don't see the benefit from this patch. > Christoph commented on that: > > "I'm strongly against this. As the reviews show the move is highly > confusing. The attributes don't change and there is no need to 'cache' > or 'query' them. Just merge them into the device, follow years of > experience with how every other Linux subsystem does it and be done > with it." > > If the attributes are automatically a part of the ib_device then it > does seem a bit redundant keeping the structure around... But this makes struct ib_device much much bigger, and this structure is used in **fast** path, e.g the ULP traverses through a pointer from struct ib_device to post_send/recv, poll_cq and friends. The networking community will let you work for 10y before they add a field to struct net_device exactly b/c of the reason I brought. Why here we can come out of the blue and add tens if not hundreds of fields to our device structure? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> But this makes struct ib_device much much bigger, and this structure > is used in **fast** path, > e.g the ULP traverses through a pointer from struct ib_device to > post_send/recv, poll_cq and friends. Umm, all the caps are appended to the end of the ib_device structure so I don't see how it will affect the fast path - am I missing something? > > The networking community will let you work for 10y before they add a > field to struct net_device exactly b/c > of the reason I brought. Why here we can come out of the blue and add > tens if not hundreds of fields to our device structure? Keeping struct ib_device_attr embedded at the end of struct ib_device is exactly the same isn't it? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 6:13 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote: >> The networking community will let you work for 10y before they add a >> field to struct net_device exactly b/c >> of the reason I brought. Why here we can come out of the blue and add >> tens if not hundreds of fields to our device structure? > Keeping struct ib_device_attr embedded at the end of struct ib_device is > exactly the same isn't it? I said " (best with pointer)" -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 06:07:31PM +0300, Or Gerlitz wrote: > But this makes struct ib_device much much bigger, and this structure > is used in **fast** path, > e.g the ULP traverses through a pointer from struct ib_device to > post_send/recv, poll_cq and friends. But it doesn't get near the end of the structure. Maybe it's time for a little cache line 101 primer? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 20, 2015 at 06:13:35PM +0300, Sagi Grimberg wrote: >> The networking community will let you work for 10y before they add a >> field to struct net_device exactly b/c >> of the reason I brought. Why here we can come out of the blue and add >> tens if not hundreds of fields to our device structure? > > Keeping struct ib_device_attr embedded at the end of struct ib_device is > exactly the same isn't it? In terms of layout it is, in terms of usability it's worse. Just to take the networking example Or seems to like so much there is no struct netdev_attr to which fields of struct netdev are moved out to, which you then need to query and cache anyway in your private device structure. This design simply was a idiotic idea to start with and I don't understand why anyone would han onto it. I do understand Chuck's concern about churn, but I'd rather fix the infiniband subsystem up before we're growing even more users. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/21/2015 9:38 AM, Christoph Hellwig wrote: > On Tue, Oct 20, 2015 at 06:07:31PM +0300, Or Gerlitz wrote: >> >But this makes struct ib_device much much bigger, and this structure >> >is used in **fast** path, >> >e.g the ULP traverses through a pointer from struct ib_device to >> >post_send/recv, poll_cq and friends. > But it doesn't get near the end of the structure. Maybe it's time for a > little cache line 101 primer? Oh, not that I am fully qualified on that front, but I understand the end-of-structure argument. Fact is that for struct net_device you will not add 333 new fields over night in the coming 33 years, for sure. This makes much sense to me, as a guideline. I don't think we should have a device structure with the current 100 fields and another few hundreds without any notable benefit and against the common practice in the networking subsystem. We have a UAPI that requires us to keep the device query verb for ever, and hence the returned device attr struct, it would be a much smaller and noisy patch to cache an device attr pointer on the device structure. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 09:44:41AM +0300, Or Gerlitz wrote: > Fact is that for struct net_device you will not add 333 new fields over > night in the coming 33 years, for sure. That's because they never had this split and added fields to struct netdev as required. One interesting difference in netdev is that it move all the function pointers out to a different structure so it could be kept const. This introduces one more level of pointer chasing but has other advantages. > This makes much sense to me, as a guideline. I don't think we should have a > device structure with the current > 100 fields and another few hundreds without any notable benefit and against > the common practice in the networking > subsystem. Please understand the networking subsystem (or SCSI, or NVMe, or ...) before making such incorrect comments. We're moving towards how all other subsystems work. > We have a UAPI that requires us to keep the device query verb for ever, and > hence the returned device attr struct, > it would be a much smaller and noisy patch to cache an device attr pointer > on the device structure. Take a look at the code. The only time we ever call into ->query_device is for the userspace only timestamping extensions only implemented for mlx4. With all the stuff we have on the plate the kernel API will look substatially different from the arkane 'Verbs' implementation in userspace, and they will use less and less common code. Libuvers and the ABIs it uses are something we can't change unfortunately, but we can make the kernel API much much better by learining lessons from other kernel subsystems. That's work that Jason Sagi and me have been doing for a while. I'd also suggest you update your Linux knowledge before trying to micromanage patch submissions. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/21/2015 9:51 AM, Christoph Hellwig wrote: >> We have a UAPI that requires us to keep the device query verb for ever, and >> hence the returned device attr struct, it would be a much smaller and noisy patch pointer >> to cache an device attr on the device structure. > Take a look at the code. The only time we ever call into ->query_device is > for the userspace only timestamping extensions only implemented for mlx4. We will have many more device query extensions, but the point I tried to make here is a bit different -- we do need to keep the user/kernel device attr struct as part of the UAPI, and don't see any reason to avoid it in the kernel, just because some other subsystems do that, according to your view. As I said, you would have to work real (real) hard to convince the networking ppl to add fields to struct net_device sk_buff and friends... the nature of rdma/offloading is such that new attr come often and I don't think we need to touch our device struct every release. > With all the stuff we have on the plate the kernel API will look substatially > different from the arkane 'Verbs' implementation in userspace, and they will > use less and less common code. Libuvers and the ABIs it uses are something > we can't change unfortunately, but we can make the kernel API much much > better by learining lessons from other kernel subsystems. That's work > that Jason Sagi and me have been doing for a while. > > I'd also suggest you update your Linux knowledge before trying to > micromanage patch submissions. not trying to micro-manage @ all, I went over the archives and haven't found any review or ack to your giant patch that touches the whole subsystem (drivers, core and ULPs) expect from Sagi's -- lets hear more opinions. Re my education -- can you make the argument more precise: what are we making much much better in the kernel API by thins house moving exercise? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 10:11:29AM +0300, Or Gerlitz wrote: > > We will have many more device query extensions, None of which use struct ib_device_attr I hope! > but the point I tried to > make here is a bit different -- > we do need to keep the user/kernel device attr struct as part of the UAPI, It's an entirely separate ib_uverbs_query_device_resp structure. > and don't see any reason to > avoid it in the kernel, just because some other subsystems do that, > according to your view. As I said, you > would have to work real (real) hard to convince the networking ppl to add > fields to struct net_device sk_buff Or, please stop these bullshit strawman arguments. Yes, adding fields to struct sk_buff will be hard, but that's not the right comparism. struct sk_buff is a structured allocated for the data buffers in great quantities and not the net_device structure allocated once per device. I'm going to finish this email, but if you don't even try to get your facts right I'll stop responding ro you because it's futile. > and friends... the nature of rdma/offloading is such that new attr come > often and I don't think we need > to touch our device struct every release. For anything you want to add you need to touch _a_ struct. It's not any difference in efforts if it's ib_device_attr or ib_device. You're not even saving much memory if at all as every driver caches at least some fields of it in their own device structure, and iser, isert, srpt and nfs cache the full structure, so if you use one of those you're already having one of them per device. If you use two of them you have multiple copies plus individual fields caches by other ULDs and core code. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/21/2015 10:33 AM, Christoph Hellwig wrote: > For anything you want to add you need to touch_a_ struct. It's not any > difference in efforts if it's ib_device_attr or ib_device. > > You're not even saving much memory if at all as every driver caches at > least some fields of it in their own device structure, and iser, isert, > srpt and nfs cache the full structure, I know, but a patch that adds caching an attr pointer on the device will remove these local caches, we actually had that/similar patch posted here and it was dropped/forgotten. > so if you use one of those you're > already having one of them per device. If you use two of them you > have multiple copies plus individual fields caches by other ULDs and core > code. again, the method I propose does remove these duplications all together. I am still waiting to hear what is the precise argument for having this change. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 10:41:13AM +0300, Or Gerlitz wrote: > I know, but a patch that adds caching an attr pointer on the device will > remove these local caches, > we actually had that/similar patch posted here and it was dropped/forgotten. > >> so if you use one of those you're >> already having one of them per device. If you use two of them you >> have multiple copies plus individual fields caches by other ULDs and core >> code. > > again, the method I propose does remove these duplications all together. > > I am still waiting to hear what is the precise argument for having this > change. It does everything the pointer does, but in addition saves memory (no rounding up of the slab size), reduces pointer chasing and makes and API that's easier to use and more similar to how all other major Linux subsystems work. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/21/2015 12:11 AM, Or Gerlitz wrote: > haven't found any review or ack to your giant patch that touches the > whole subsystem (drivers, core and ULPs) expect from Sagi's -- lets > hear more opinions. Although I have not yet had the time to review the entire patch, removing ib_query_device() seems a great idea to me and an idea that I welcome very much. The ib_device_attr structure is too large to be allocated on the stack. This means that with Christoph's patch it is no longer needed to call kmalloc() + ib_query_device() + kfree() when a device attribute is needed from kernel code. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 08:48:10AM -0700, Bart Van Assche wrote: > On 10/21/2015 12:11 AM, Or Gerlitz wrote: > >haven't found any review or ack to your giant patch that touches the > > whole subsystem (drivers, core and ULPs) expect from Sagi's -- lets > > hear more opinions. > > Although I have not yet had the time to review the entire patch, removing > ib_query_device() seems a great idea to me and an idea that I welcome very > much. The ib_device_attr structure is too large to be allocated on the > stack. This means that with Christoph's patch it is no longer needed to call > kmalloc() + ib_query_device() + kfree() when a device attribute is needed > from kernel code. I agree, this is absolutely the right way to go. The bikeshedding is not important, nobody has come up with a reason why we need to maintain the attr structure as-is and Christoph already has a patch - I say go with it. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/21/2015 11:43 AM, Jason Gunthorpe wrote: > On Wed, Oct 21, 2015 at 08:48:10AM -0700, Bart Van Assche wrote: >> On 10/21/2015 12:11 AM, Or Gerlitz wrote: >>> haven't found any review or ack to your giant patch that touches the >>> whole subsystem (drivers, core and ULPs) expect from Sagi's -- lets >>> hear more opinions. >> Although I have not yet had the time to review the entire patch, removing >> ib_query_device() seems a great idea to me and an idea that I welcome very >> much. The ib_device_attr structure is too large to be allocated on the >> stack. This means that with Christoph's patch it is no longer needed to call >> kmalloc() + ib_query_device() + kfree() when a device attribute is needed >> from kernel code. > I agree, this is absolutely the right way to go. > > The bikeshedding is not important, nobody has come up with a reason > why we need to maintain the attr structure as-is and Christoph already > has a patch - I say go with it. > > Jason > While I don't really like the uber patch review-wise, I'm all for nuking ib_query_device() and the attr struct. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 6:48 PM, Bart Van Assche <bart.vanassche@sandisk.com> wrote: > On 10/21/2015 12:11 AM, Or Gerlitz wrote: >> >> haven't found any review or ack to your giant patch that touches the > >> whole subsystem (drivers, core and ULPs) expect from Sagi's -- lets >> hear more opinions. > > Although I have not yet had the time to review the entire patch, removing > ib_query_device() seems a great idea to me and an idea that I welcome very > much. The ib_device_attr structure is too large to be allocated on the > stack. This means that with Christoph's patch it is no longer needed to call > kmalloc() + ib_query_device() + kfree() when a device attribute is needed > from kernel code. Bart, Jason, Steve, The alternative approach to address this which was also running here in the form of a patch from Ira following a reviewer comment from me, is the have struct ib_device to contain a struct ib_device_attr member (potentially a pointer) which is filled by the device driver before they register themselves with the IB core. This way there's no need for kmalloc() + ib_query_device() + kfree() when a device attribute is needed and we don't introduce tens/hundreds new fields for struct ib_device. Thoughts? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 09:08:31PM +0300, Or Gerlitz wrote: > The alternative approach to address this which was also running here > in the form of a patch from Ira following a reviewer comment from me, > is the have struct ib_device to contain a struct ib_device_attr member > (potentially a pointer) which is filled by the device driver before > they register themselves with the IB core. No to the pointer idea. > This way there's no need for kmalloc() + ib_query_device() + kfree() > when a device attribute is needed and we don't introduce tens/hundreds > new fields for struct ib_device. I care very little if the name is ibdev->attr.foo or ibdev->foo, that is just bikeshedding. I have a slight preference toward ibdev->foo to match other subsystems. I have no idea why you care so much about something so trivial. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2015 at 9:20 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > No to the pointer idea. can you spare few words why? And if you have a valid argument, as Christoph said they teach in cache 101 course which he's pretty sure I skipped as twenty other courses taken by kernel developers, put the no pointer member @ the end > I have a slight preference toward ibdev->foo to match other subsystems. I have a slight preference to keep the struct ib_device_attr notion to match other portions of the RDMA stack such as the UAPI and the user space API to applications. > I have no idea why you care so much about something so trivial. I don't care so much, I have the small right to speak without getting so much mud on my head. Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 67b979f..5b73322 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1321,6 +1321,10 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) dev->mdev = mdev; + err = mlx5_ib_init_device_flags(&dev->ib_dev); + if (err) + goto err_dealloc; + err = get_port_caps(dev); if (err) goto err_dealloc; @@ -1433,10 +1437,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) if (err) goto err_rsrc; - err = mlx5_ib_init_device_flags(&dev->ib_dev); - if (err) - goto err_rsrc; - err = ib_register_device(&dev->ib_dev, NULL); if (err) goto err_odp;