Message ID | 20110724201228.204265835@systemfabricworks.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Jul 24, 2011 at 9:43 PM, <rpearson@systemfabricworks.com> wrote: > > Bring up to date with the current version of the IBTA spec. > - add new opcodes for RC and RD > - add new groups of opcodes for CN and XRC > > Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com> > > --- > include/rdma/ib_pack.h | 39 ++++++++++++++++++++++++++++++++++++++- > 1 file changed, 38 insertions(+), 1 deletion(-) > > Index: infiniband/include/rdma/ib_pack.h > =================================================================== > --- infiniband.orig/include/rdma/ib_pack.h > +++ infiniband/include/rdma/ib_pack.h > @@ -75,6 +75,8 @@ enum { > IB_OPCODE_UC = 0x20, > IB_OPCODE_RD = 0x40, > IB_OPCODE_UD = 0x60, > + IB_OPCODE_CN = 0x80, > + IB_OPCODE_XRC = 0xA0, A convention when submitting Linux kernel patches is to add only those constants that are actually used in the code submitted in the same patch series. So I'm not sure it's fine to add the XRC constants in ib_pack.h through the rxe patch series. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Originally I had xrc support in rxe but there was nothing in Roland's tree to support it so I stripped it out. I was experimenting with congestion notification to see if there was a way to slow down spewing by fast Ethernet devices but that wasn't working, also all the MAD support was stripped out. In the end the only thing left was the constants and entries in the opcode table in rxe_opcode.c which as you say are not used elsewhere. I think it is reasonable to have a complete set of opcodes somewhere. I want to eventually complete the IB transport parts of the driver although they are not used by RoCE. They would be useful for a soft IB implementation with say an FPGA based MAC/PHY. The long term goal is to have a complete reference implementation of the IB Architecture. > -----Original Message----- > From: bart.vanassche@gmail.com [mailto:bart.vanassche@gmail.com] On > Behalf Of Bart Van Assche > Sent: Monday, August 15, 2011 4:13 AM > To: rpearson@systemfabricworks.com > Cc: linux-rdma@vger.kernel.org > Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h > > On Sun, Jul 24, 2011 at 9:43 PM, <rpearson@systemfabricworks.com> wrote: > > > > Bring up to date with the current version of the IBTA spec. > > - add new opcodes for RC and RD > > - add new groups of opcodes for CN and XRC > > > > Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com> > > > > --- > > include/rdma/ib_pack.h | 39 > ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 38 insertions(+), 1 deletion(-) > > > > Index: infiniband/include/rdma/ib_pack.h > > > ========================================================== > ========= > > --- infiniband.orig/include/rdma/ib_pack.h > > +++ infiniband/include/rdma/ib_pack.h > > @@ -75,6 +75,8 @@ enum { > > IB_OPCODE_UC = 0x20, > > IB_OPCODE_RD = 0x40, > > IB_OPCODE_UD = 0x60, > > + IB_OPCODE_CN = 0x80, > > + IB_OPCODE_XRC = 0xA0, > > A convention when submitting Linux kernel patches is to add only those > constants that are actually used in the code submitted in the same > patch series. So I'm not sure it's fine to add the XRC constants in > ib_pack.h through the rxe patch series. > > Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche@acm.org> wrote: > On Mon, Aug 15, 2011 at 6:15 PM, Bob Pearson > <rpearson@systemfabricworks.com> wrote: > > I want to eventually complete the IB transport parts of the driver although > > they are not used by RoCE. They would be useful for a soft IB implementation > > with say an FPGA based MAC/PHY. The long term goal is to have a complete > > reference implementation of the IB Architecture. > > Having a complete reference implementation of the IB architecture > available would be great. IMHO at least the following should be added > to ib_rxe (not necessarily in the first version) before it can called > a reference implementation: > - Support for secondary GIDs. These make it easy to implement > fail-over in a H.A. cluster. > - Support for multicast. More in detail, my comments with regard to multicast support in ib_rxe are: 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an unfortunate choice to me. That choice will cause multicast GIDs to be mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been reserved by RFC 2464 for IPv6 multicast addresses. If a collision with an IPv6 multicast address occurs and IPv6 MLD snooping has enabled on the switches in the involved network then RoCE multicast won't work properly. IMHO we need a separate Ethernet address range for RoCE multicast purposes, next to the existing ranges for IPv4 and IPv6. 2. We need a way to limit multicast traffic to those switch ports on which receivers are listening. The InfiniBand architecture has such a protocol but it relies on the subnet manager, a component that does not exist in RoCE networks. And since IGMP and MLD are IPv4/IPv6 specific we need a new (IBTA-approved) protocol. 3. RFC 4541, in which IGMP and MLD snooping has been defined, will have to be extended with a paragraph about snooping the protocol defined in (2). 4. Switch vendors will have to implement the protocol item (3) refers to. Shouldn't a RoCE multicast implementation be postponed until the IEEE has assigned an Ethernet address range for mapping RoCE multicast GIDs ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > owner@vger.kernel.org] On Behalf Of Bart Van Assche > Sent: Monday, August 22, 2011 1:08 PM > To: Bob Pearson > Cc: linux-rdma@vger.kernel.org > Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h > > On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche@acm.org> > wrote: > > On Mon, Aug 15, 2011 at 6:15 PM, Bob Pearson > > <rpearson@systemfabricworks.com> wrote: > > > I want to eventually complete the IB transport parts of the driver > although > > > they are not used by RoCE. They would be useful for a soft IB > implementation > > > with say an FPGA based MAC/PHY. The long term goal is to have a > complete > > > reference implementation of the IB Architecture. > > > > Having a complete reference implementation of the IB architecture > > available would be great. IMHO at least the following should be added > > to ib_rxe (not necessarily in the first version) before it can called > > a reference implementation: > > - Support for secondary GIDs. These make it easy to implement > > fail-over in a H.A. cluster. > > - Support for multicast. > > More in detail, my comments with regard to multicast support in ib_rxe are: > 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an > unfortunate choice to me. That choice will cause multicast GIDs to be > mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been > reserved by RFC 2464 for IPv6 multicast addresses. If a collision with > an IPv6 multicast address occurs and IPv6 MLD snooping has enabled on > the switches in the involved network then RoCE multicast won't work > properly. IMHO we need a separate Ethernet address range for RoCE > multicast purposes, next to the existing ranges for IPv4 and IPv6. I thought this was intentional! I.e. in order to get multicast over RoCE to work we were trying to piggyback on the behavior of intelligent switches. Otherwise the only way to get packets forwarded without adding a new group of multicast addresses would be to broadcast. The implementation of IGMP and MLD at the end node is oblivious to the packet's ethertype. And switches (at least the ones we tried) also happily multicast RoCE packets. > 2. We need a way to limit multicast traffic to those switch ports on > which receivers are listening. The InfiniBand architecture has such a > protocol but it relies on the subnet manager, a component that does > not exist in RoCE networks. And since IGMP and MLD are IPv4/IPv6 > specific we need a new (IBTA-approved) protocol. > 3. RFC 4541, in which IGMP and MLD snooping has been defined, will > have to be extended with a paragraph about snooping the protocol > defined in (2). > 4. Switch vendors will have to implement the protocol item (3) refers to. > > Shouldn't a RoCE multicast implementation be postponed until the IEEE > has assigned an Ethernet address range for mapping RoCE multicast GIDs > ? > > Bart. > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
OK. Since for now I don't need it I'll remove this patch but it might be worth considering adding them as a standalone patch. > -----Original Message----- > From: bart.vanassche@gmail.com [mailto:bart.vanassche@gmail.com] On > Behalf Of Bart Van Assche > Sent: Monday, August 15, 2011 4:13 AM > To: rpearson@systemfabricworks.com > Cc: linux-rdma@vger.kernel.org > Subject: Re: [patch v2 02/37] add opcodes to ib_pack.h > > On Sun, Jul 24, 2011 at 9:43 PM, <rpearson@systemfabricworks.com> wrote: > > > > Bring up to date with the current version of the IBTA spec. > > - add new opcodes for RC and RD > > - add new groups of opcodes for CN and XRC > > > > Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com> > > > > --- > > include/rdma/ib_pack.h | 39 > ++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 38 insertions(+), 1 deletion(-) > > > > Index: infiniband/include/rdma/ib_pack.h > > > ========================================================== > ========= > > --- infiniband.orig/include/rdma/ib_pack.h > > +++ infiniband/include/rdma/ib_pack.h > > @@ -75,6 +75,8 @@ enum { > > IB_OPCODE_UC = 0x20, > > IB_OPCODE_RD = 0x40, > > IB_OPCODE_UD = 0x60, > > + IB_OPCODE_CN = 0x80, > > + IB_OPCODE_XRC = 0xA0, > > A convention when submitting Linux kernel patches is to add only those > constants that are actually used in the code submitted in the same > patch series. So I'm not sure it's fine to add the XRC constants in > ib_pack.h through the rxe patch series. > > Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 31, 2011 at 10:15:02PM -0500, Bob Pearson wrote: > > 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an > > unfortunate choice to me. That choice will cause multicast GIDs to be > > mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been > > reserved by RFC 2464 for IPv6 multicast addresses. If a collision with > > an IPv6 multicast address occurs and IPv6 MLD snooping has enabled on > > the switches in the involved network then RoCE multicast won't work > > properly. IMHO we need a separate Ethernet address range for RoCE > > multicast purposes, next to the existing ranges for IPv4 and IPv6. > > I thought this was intentional! I.e. in order to get multicast over RoCE to > work we were trying to piggyback on the behavior of intelligent switches. > Otherwise the only way to get packets forwarded without adding a new group > of multicast addresses would be to broadcast. The implementation of IGMP and > MLD at the end node is oblivious to the packet's ethertype. And switches (at > least the ones we tried) also happily multicast RoCE packets. Nothing about RoCE multicast is standardized or even proposed to be standardized. Hijacking the IPv6 MAC space - and also abusing the net stack to generate MLD stuff is what Mellanox did in their implementation. I don't know if that was included in the main line submission or not? Frankly I'd be inclined to map to the broadcast MAC address to pressure the relevant standards bodies to finish the work on RoCE. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 1, 2011 at 5:15 AM, Bob Pearson <rpearson@systemfabricworks.com> wrote: > > On Sat, Aug 20, 2011 at 4:47 PM, Bart Van Assche <bvanassche@acm.org> > > More in detail, my comments with regard to multicast support in ib_rxe > are: > > 1. Using ipv6_eth_mc_map() for mapping multicast GIDs seems an > > unfortunate choice to me. That choice will cause multicast GIDs to be > > mapped to the 33-33-xx-xx-xx-xx Ethernet address range that has been > > reserved by RFC 2464 for IPv6 multicast addresses. If a collision with > > an IPv6 multicast address occurs and IPv6 MLD snooping has enabled on > > the switches in the involved network then RoCE multicast won't work > > properly. IMHO we need a separate Ethernet address range for RoCE > > multicast purposes, next to the existing ranges for IPv4 and IPv6. > > I thought this was intentional! I.e. in order to get multicast over RoCE to > work we were trying to piggyback on the behavior of intelligent switches. > Otherwise the only way to get packets forwarded without adding a new group > of multicast addresses would be to broadcast. The implementation of IGMP and > MLD at the end node is oblivious to the packet's ethertype. And switches (at > least the ones we tried) also happily multicast RoCE packets. Shouldn't both the RoCE specification and RFC 4541 be updated according to the above ? Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: infiniband/include/rdma/ib_pack.h =================================================================== --- infiniband.orig/include/rdma/ib_pack.h +++ infiniband/include/rdma/ib_pack.h @@ -75,6 +75,8 @@ enum { IB_OPCODE_UC = 0x20, IB_OPCODE_RD = 0x40, IB_OPCODE_UD = 0x60, + IB_OPCODE_CN = 0x80, + IB_OPCODE_XRC = 0xA0, /* operations -- just used to define real constants */ IB_OPCODE_SEND_FIRST = 0x00, @@ -98,6 +100,10 @@ enum { IB_OPCODE_ATOMIC_ACKNOWLEDGE = 0x12, IB_OPCODE_COMPARE_SWAP = 0x13, IB_OPCODE_FETCH_ADD = 0x14, + IB_OPCODE_CNP = 0x00, + IB_OPCODE_RESYNC = 0x15, + IB_OPCODE_SEND_LAST_INV = 0x16, + IB_OPCODE_SEND_ONLY_INV = 0x17, /* real constants follow -- see comment about above IB_OPCODE() macro for more details */ @@ -124,6 +130,8 @@ enum { IB_OPCODE(RC, ATOMIC_ACKNOWLEDGE), IB_OPCODE(RC, COMPARE_SWAP), IB_OPCODE(RC, FETCH_ADD), + IB_OPCODE(RC, SEND_LAST_INV), + IB_OPCODE(RC, SEND_ONLY_INV), /* UC */ IB_OPCODE(UC, SEND_FIRST), @@ -161,10 +169,39 @@ enum { IB_OPCODE(RD, ATOMIC_ACKNOWLEDGE), IB_OPCODE(RD, COMPARE_SWAP), IB_OPCODE(RD, FETCH_ADD), + IB_OPCODE(RD, RESYNC), /* UD */ IB_OPCODE(UD, SEND_ONLY), - IB_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE) + IB_OPCODE(UD, SEND_ONLY_WITH_IMMEDIATE), + + /* CN */ + IB_OPCODE(CN, CNP), + + /* XRC */ + IB_OPCODE(XRC, SEND_FIRST), + IB_OPCODE(XRC, SEND_MIDDLE), + IB_OPCODE(XRC, SEND_LAST), + IB_OPCODE(XRC, SEND_LAST_WITH_IMMEDIATE), + IB_OPCODE(XRC, SEND_ONLY), + IB_OPCODE(XRC, SEND_ONLY_WITH_IMMEDIATE), + IB_OPCODE(XRC, RDMA_WRITE_FIRST), + IB_OPCODE(XRC, RDMA_WRITE_MIDDLE), + IB_OPCODE(XRC, RDMA_WRITE_LAST), + IB_OPCODE(XRC, RDMA_WRITE_LAST_WITH_IMMEDIATE), + IB_OPCODE(XRC, RDMA_WRITE_ONLY), + IB_OPCODE(XRC, RDMA_WRITE_ONLY_WITH_IMMEDIATE), + IB_OPCODE(XRC, RDMA_READ_REQUEST), + IB_OPCODE(XRC, RDMA_READ_RESPONSE_FIRST), + IB_OPCODE(XRC, RDMA_READ_RESPONSE_MIDDLE), + IB_OPCODE(XRC, RDMA_READ_RESPONSE_LAST), + IB_OPCODE(XRC, RDMA_READ_RESPONSE_ONLY), + IB_OPCODE(XRC, ACKNOWLEDGE), + IB_OPCODE(XRC, ATOMIC_ACKNOWLEDGE), + IB_OPCODE(XRC, COMPARE_SWAP), + IB_OPCODE(XRC, FETCH_ADD), + IB_OPCODE(XRC, SEND_LAST_INV), + IB_OPCODE(XRC, SEND_ONLY_INV), }; enum {
Bring up to date with the current version of the IBTA spec. - add new opcodes for RC and RD - add new groups of opcodes for CN and XRC Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com> --- include/rdma/ib_pack.h | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html