Message ID | 1828884A29C6694DAF28B7E6B8A8237302123C@ORSMSX101.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Hefty, Sean wrote:
> In order to support OFED or vendor specific calls, define a
> generic extension mechanism. This allows OFED, an RDMA vendor,
> or another registered 3rd party (for example, the librdmacm)
> to define RDMA extensions.

I'm trying to understand the way the user/kernel way of adding verbs is implemented... I wasn't sure, if/which specific kernel patch out of this series is matching this one?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> I'm trying to understand the way the user/kernel way of adding verbs
> is implemented... I wasn't sure, if/which specific kernel patch out
> of this series is matching this one?

There are no matching kernel patches to this patch. This does not try to provide any direct support for out of tree kernel patches. The closest this comes is reserving the upper 8-bits of any enum or other value, which can be used to indicate vendor specific support. For example, an out of tree kernel patch could define a new QP type and ABI command with one of the higher bits set (rather than the next in series). An upstream patch would remove these bits. This should make it possible for a vendor to continue to support their out of tree changes even after that feature was added to the mainline.

Hopefully this makes sense.

- Sean
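The reserved upper 8 bits described above can be sketched roughly as follows. This is only an illustration: the enum and the two macros are copied from the patch under discussion (with a `u` suffix added to the mask), while the helpers ext_make(), ext_type() and ext_base() are hypothetical names that do not exist in libibverbs.

```c
#include <stdint.h>

/* Definitions as they appear in the proposed verbs.h change. */
enum ibv_extension_type {
	IBV_EXTENSION_COMMON,
	IBV_EXTENSION_VENDOR,
	IBV_EXTENSION_OFA,
	IBV_EXTENSION_RDMA_CM
};
#define IBV_EXTENSION_BASE_SHIFT 24
#define IBV_EXTENSION_MASK       0xFF000000u

/* Hypothetical helper: tag a base value with an extension type
 * by placing the type in the upper 8 bits. */
static inline uint32_t ext_make(enum ibv_extension_type type, uint32_t base)
{
	return ((uint32_t) type << IBV_EXTENSION_BASE_SHIFT) |
	       (base & ~IBV_EXTENSION_MASK);
}

/* Hypothetical helper: recover the extension type from a value. */
static inline enum ibv_extension_type ext_type(uint32_t value)
{
	return (enum ibv_extension_type)
		((value & IBV_EXTENSION_MASK) >> IBV_EXTENSION_BASE_SHIFT);
}

/* Hypothetical helper: strip the upper bits, which is what an
 * upstream version of the same feature would use. */
static inline uint32_t ext_base(uint32_t value)
{
	return value & ~IBV_EXTENSION_MASK;
}
```

With these helpers, a vendor QP type built as ext_make(IBV_EXTENSION_VENDOR, 6) and the eventual upstream value 6 stay distinguishable, so an out of tree driver could keep accepting the tagged value after the feature is merged.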
Hefty, Sean <sean.hefty@intel.com> wrote:
>> I'm trying to understand the way the user/kernel way of adding verbs
>> is implemented... I wasn't sure, if/which specific kernel patch out
>> of this series is matching this one?

> There are no matching kernel patches to this patch. This does not try to provide any
> direct support for out of tree kernel patches.

Sean, maybe I wasn't clear here. I was referring to the XRC kernel patch series you sent yesterday for upstream review/acceptance... doesn't that series + libibverbs/libmlx4/librdmacm compose a complete solution for XRC for (say) new applications (forget about apps written to other XRC APIs)?

I'm trying to understand what framework, if any, you suggest for adding new user space verbs.

Basically, I wasn't referring to such new verbs as vendor extensions, but rather as new verbs we want to add at this and/or future points of time which didn't exist at the time the IB stack and specifically its kernel/user ABIs/APIs were written (couple of years ago...).

I hope my question is spelled out better now. Is the section below still relevant as the answer?

Or.

> The closest this comes is reserving the upper 8-bits of any enum or other value, which can be used to indicate vendor specific support. For example, an out of tree kernel patch could define a new QP type and ABI command with one of the higher bits set (rather than the next in series). An upstream patch would remove these bits. This should make it possible for a vendor to continue to support their out of tree changes even after that feature was added to the mainline. Hopefully this makes sense.
> Basically, I wasn't referring to such new verbs as vendor extensions,
> but rather as new verbs we want to add at this and/or future points of
> time which didn't exist at the time the IB stack and specifically its
> kernel/user ABIs/APIs were written (couple of years ago...).

To be clear, there are 2 sides to ibverbs - the app side, and the provider library.

On the app side, new functionality would be added directly to libibverbs. I would reuse what's there if possible, and provide direct API calls where needed. For example, the xrc patch adds:

ibv_create_xsrq()
ibv_open_xrcd()
ibv_close_xrcd()

as new APIs. On the provider side, the necessary calls are obtained by ibverbs calling get_ext_ops().

I haven't come up with another way of extended verbs that would be as easy for an application to use, given that most of the calls and data structures are reusable.

- Sean
On Thursday 16 June 2011 22:27, Hefty, Sean wrote:
> > Basically, I wasn't referring to such new verbs as vendor extensions,
> > but rather as new verbs we want to add at this and/or future points of
> > time which didn't exist at the time the IB stack and specifically its
> > kernel/user ABIs/APIs were written (couple of years ago...).
>
> To be clear, there are 2 sides to ibverbs - the app side, and the provider library.
>
> On the app side, new functionality would be added directly to libibverbs. I would reuse what's there if possible, and provide direct API calls where needed. For example, the xrc patch adds:
>
> ibv_create_xsrq()
> ibv_open_xrcd()
> ibv_close_xrcd()
>
> as new APIs. On the provider side, the necessary calls are obtained by ibverbs calling get_ext_ops().
>
> I haven't come up with another way of extended verbs that would be as easy for an application to use, given that most of the calls and data structures are reusable.
>
> - Sean

Hi Sean,

I'm not sure about what this mechanism saves us over bumping the ABI numbers.

Actually, I think I do see a problem here, under the following situation:

- All libraries (app, libibverbs, and libmlx4) support extensions.
- A new verb (say extension "ib_new_verb") was added as an extension to the app, to libibverbs, and to libmlx4, and everything was compiled.
- The APP is built on the full configuration, but is run on a configuration which has the verb extension added to libmlx4, but NOT to libibverbs (new libibverbs was installed originally, so that libmlx4 would succeed in the install, then somehow libibverbs was rolled back to before "ib_new_verb" was added).

When the APP tries to run, it calls:

ibv_get_device_ext_ops(struct ibv_device *device, "ib_new_verb");

This call will succeed (in the current implementation).

However, the verb helper function (ibv_cmd_<new_verb>) is not present in libibverbs, so things will crash (if, indeed, libmlx4 can be loaded at all - In fact, I'm not sure if it will fail loading because of unresolved references). Indeed, I am not sure that the app can run at all due to unresolved references.

The problem here is that the new additions need support in libibverbs in order to work (this is not simply a "pass-through" by libibverbs to the lower layer).

I may be wrong about all this -- userspace is not really my expertise. If I am not wrong, what, then, is the advantage of this methodology over simply bumping the ABI numbers?

-Jack
> In fact, I'm not sure if it will fail loading because of unresolved
> references). Indeed, I am not sure that the app can run at all due
> to unresolved references.

The dlopen of mlx4 should fail due to unresolved references, so the net effect will be that no app can open the RDMA device in this situation - so it is an invalid system configuration that is properly detected by the runtime.

> The problem here is that the new additions need support in
> libibverbs in order to work (this is not simply a "pass-through" by
> libibverbs to the lower layer).

The hope is once the infrastructure is in libibverbs there will not be as much need to change libibverbs, just the apps and the drivers.

Jason
On Tuesday 02 August 2011 08:38, Jason Gunthorpe wrote:
> The hope is once the infrastructure is in libibverbs there will not
> be as much need to change libibverbs, just the apps and the drivers.
>

If I understand correctly, the various additions which would normally be made to libibverbs would then be made by third-party libraries which extend libibverbs to support their additions.

These additions would include the new ibv_cmd_xxx functions (the core functions reside in src/cmd.c), and new, additional, enum values of the form IBV_USER_VERBS_CMD_XXXX, of which the core enum is in file include/infiniband/kern_abi.h.

The modified apps would then include the header files of the 3rd party additions after the libibverbs headers when compiling.

Each new third-party package would need such a library. While this will lead to a multiplicity of new libraries (one per addition), the core libibverbs package would remain as is.

Am I correct? If so, shouldn't the current XRC userspace implementation do the same (and take the XRC-specific additions out of libibverbs and put them into a separate library)?

Note that coordination between third parties would still be required to insure that there is no collision of enum values between the various packages.

-Jack
> If I understand correctly, the various additions which would normally be made
> to libibverbs
> would then be made by third-party libraries which extend libibverbs to support
> their additions.

It may help to read about extensions in opengl:

http://www.opengl.org/registry/doc/rules.html

Additions can be made to both. Obviously Roland has the final say on any changes to ibverbs, but what I envision is:

If a feature is based on an industry standard and all necessary kernel changes are upstream, then the feature should be integrated into ibverbs. An application would simply call ibv_new_feature() to make use of the feature (or call an existing function with some new enum value). Internally, ibverbs may need to obtain a new interface from the provider library.

If there is no published specification for a feature (maybe it's still under development), kernel patches are needed, and there are customers who want to use the feature immediately, then a vendor can define an extension. In this case, the application may call vendor_ops = ibv_get_ext_ops(), followed by vendor_ops->new_feature(). Or the app may call ibv_some_existing_function() using a vendor specific enum value.

> These additions would include the new ibv_cmd_xxx functions
> (the core functions reside in src/cmd.c), and new, additional, enum values of
> the form IBV_USER_VERBS_CMD_XXXX, of which the core enum is in file
> include/infiniband/kern_abi.h.

If an app uses an extension, then there are no changes to ibverbs. The provider library either needs to use an existing ibv_cmd_* call or call into the kernel itself. If the provider needs a new command, it could declare it as:

enum {
	MLX4_USER_VERBS_CMD_BASE = IBV_EXTENSION_VENDOR << IBV_EXTENSION_BASE_SHIFT,
	MLX4_USER_VERBS_CMD_NEW_FEATURE
	...
};

This requires a kernel patch to uverbs maintained by the vendor. Note that this means that the vendor can continue to support their version of a feature (with continued kernel patches) even once the feature is merged into ibverbs.

> Each new third-party package would need such a library.
> While this will lead to a multiplicity of new libraries (one per addition),
> the core libibverbs package would remain as is.

I didn't follow this. The new feature could be integrated directly into the provider library (e.g. mlx4).

> Am I correct? If so, shouldn't the current XRC userspace implementation do
> the same (and
> take the XRC-specific additions out of libibverbs and put them into a separate
> library)?

XRC could be added as an extension, but since it's based on a published specification, IMO it makes more sense being integrated directly into ibverbs with the necessary kernel changes pushed upstream. Extensions are more difficult for apps to use than an integrated feature.

> Note that coordination between third parties would still be required to insure
> that there is
> no collision of enum values between the various packages.

The use of enum ibv_extension_type should prevent collisions. It may be that different vendors use the same value for different objects, but that doesn't result in a collision. The scope of a vendor specific value is per device.

- Sean
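The "vendor_ops = ibv_get_ext_ops(), followed by vendor_ops->new_feature()" flow described above might look roughly like the sketch below from the application's point of view. It uses mock types so that it compiles without libibverbs: struct mock_context stands in for the extended struct ibv_context, and every "acme" name (the extension string, the ops table, the feature itself) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Mock stand-in for the extended struct ibv_context in the patch;
 * only the get_ext_ops hook is modelled here. */
struct mock_context {
	void *(*get_ext_ops)(struct mock_context *ctx, const char *ext_name);
};

/* Hypothetical vendor header: extension name and ops table. */
#define ACME_OPS "acme-new-feature"

struct acme_ops {
	int (*new_feature)(struct mock_context *ctx, int arg);
};

/* Hypothetical provider library side. */
static int acme_new_feature(struct mock_context *ctx, int arg)
{
	(void) ctx;
	return arg * 2;
}

static struct acme_ops acme_ops_table = { .new_feature = acme_new_feature };

static void *acme_get_ext_ops(struct mock_context *ctx, const char *ext_name)
{
	(void) ctx;
	/* Extensions are looked up by name; unknown names yield NULL. */
	return strcmp(ext_name, ACME_OPS) == 0 ? &acme_ops_table : NULL;
}

/* Simplified analogue of __ibv_get_ext_ops() from the patch
 * (the per-device extension-support check is omitted). */
static void *mock_get_ext_ops(struct mock_context *ctx, const char *name)
{
	if (!ctx->get_ext_ops)
		return NULL;
	return ctx->get_ext_ops(ctx, name);
}

/* Application side: returns -1 if the extension is unavailable. */
static int app_use_new_feature(struct mock_context *ctx, int arg)
{
	struct acme_ops *ops = mock_get_ext_ops(ctx, ACME_OPS);

	if (!ops || !ops->new_feature)
		return -1;
	return ops->new_feature(ctx, arg);
}
```

The point of the sketch is that the application has to handle a NULL ops table itself, which is part of why an extension is more work for apps than an integrated ibv_* call.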
On Tue, Aug 02, 2011 at 10:53:05AM +0300, Jack Morgenstein wrote:

> These additions would include the new ibv_cmd_xxx functions
> (the core functions reside in src/cmd.c), and new, additional, enum values of
> the form IBV_USER_VERBS_CMD_XXXX, of which the core enum is in file
> include/infiniband/kern_abi.h.

I believe the only case where using an extension would make sense is if it can be implemented entirely within the low level driver. So you can't add new ibv_cmd calls to ibverbs, must duplicate them in your driver, etc.

I'm a little unclear on how the application is going to get the access enums and other structure definitions, though.. I suppose the low level driver can install a .h file as well.

> Each new third-party package would need such a library. While this
> will lead to a multiplicity of new libraries (one per addition), the
> core libibverbs package would remain as is.

I definitely don't want to see this..

Jason
diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..e48abfd 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -57,6 +57,7 @@ typedef struct ibv_device *(*ibv_driver_init_func)(const char *uverbs_sys_path,
 						   int abi_version);
 
 void ibv_register_driver(const char *name, ibv_driver_init_func init_func);
+void ibv_register_driver_ext(const char *name, ibv_driver_init_func init_func);
 int ibv_cmd_get_context(struct ibv_context *context, struct ibv_get_context *cmd,
 			size_t cmd_size, struct ibv_get_context_resp *resp,
 			size_t resp_size);
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 0f1cb2e..b82cd3a 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -55,6 +55,15 @@
 
 BEGIN_C_DECLS
 
+enum ibv_extension_type {
+	IBV_EXTENSION_COMMON,
+	IBV_EXTENSION_VENDOR,
+	IBV_EXTENSION_OFA,
+	IBV_EXTENSION_RDMA_CM
+};
+#define IBV_EXTENSION_BASE_SHIFT 24
+#define IBV_EXTENSION_MASK 0xFF000000
+
 union ibv_gid {
 	uint8_t raw[16];
 	struct {
@@ -92,7 +101,8 @@ enum ibv_device_cap_flags {
 	IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11,
 	IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12,
 	IBV_DEVICE_SRQ_RESIZE = 1 << 13,
-	IBV_DEVICE_N_NOTIFY_CQ = 1 << 14
+	IBV_DEVICE_N_NOTIFY_CQ = 1 << 14,
+	IBV_DEVICE_EXTENSIONS = 1 << (IBV_EXTENSION_BASE_SHIFT - 1)
 };
 
 enum ibv_atomic_cap {
@@ -623,6 +633,13 @@ struct ibv_device {
 	char dev_path[IBV_SYSFS_PATH_MAX];
 	/* Path to infiniband class device in sysfs */
 	char ibdev_path[IBV_SYSFS_PATH_MAX];
+
+	/* Following fields only available if device supports extensions */
+	void *private;
+	int (*have_ext_ops)(struct ibv_device *device,
+			    const char *ext_name);
+	void * (*get_device_ext_ops)(struct ibv_device *device,
+				     const char *ext_name);
 };
 
 struct ibv_context_ops {
@@ -691,6 +708,11 @@ struct ibv_context {
 	int num_comp_vectors;
 	pthread_mutex_t mutex;
 	void *abi_compat;
+
+	/* Following fields only available if device supports extensions */
+	void *private;
+	void * (*get_ext_ops)(struct ibv_context *context,
+			      const char *ext_name);
 };
 
 /**
@@ -724,6 +746,17 @@ const char *ibv_get_device_name(struct ibv_device *device);
 uint64_t ibv_get_device_guid(struct ibv_device *device);
 
 /**
+ * ibv_have_ext_ops - Return true if device supports the requested
+ * extended operations.
+ */
+int ibv_have_ext_ops(struct ibv_device *device, const char *name);
+
+/**
+ * ibv_get_device_ext_ops - Return extended operations.
+ */
+void *ibv_get_device_ext_ops(struct ibv_device *device, const char *name);
+
+/**
  * ibv_open_device - Initialize device for use
  */
 struct ibv_context *ibv_open_device(struct ibv_device *device);
@@ -734,6 +767,11 @@ struct ibv_context *ibv_open_device(struct ibv_device *device);
 int ibv_close_device(struct ibv_context *context);
 
 /**
+ * ibv_get_ext_ops - Return extended operations.
+ */
+void *ibv_get_ext_ops(struct ibv_context *context, const char *name);
+
+/**
  * ibv_get_async_event - Get next async event
  * @event: Pointer to use to return async event
  *
diff --git a/src/device.c b/src/device.c
index 185f4a6..78d9d35 100644
--- a/src/device.c
+++ b/src/device.c
@@ -181,6 +181,24 @@ int __ibv_close_device(struct ibv_context *context)
 }
 default_symver(__ibv_close_device, ibv_close_device);
 
+int __ibv_have_ext_ops(struct ibv_device *device, const char *name)
+{
+	if (!ibv_get_ext_support(device))
+		return ENOSYS;
+
+	return device->have_ext_ops(device, name);
+}
+default_symver(__ibv_have_ext_ops, ibv_have_ext_ops);
+
+void *__ibv_get_device_ext_ops(struct ibv_device *device, const char *name)
+{
+	if (!ibv_get_ext_support(device) || !device->get_device_ext_ops)
+		return NULL;
+
+	return device->get_device_ext_ops(device, name);
+}
+default_symver(__ibv_get_device_ext_ops, ibv_get_device_ext_ops);
+
 int __ibv_get_async_event(struct ibv_context *context,
 			  struct ibv_async_event *event)
 {
diff --git a/src/ibverbs.h b/src/ibverbs.h
index 6a6e3c8..33bdee2 100644
--- a/src/ibverbs.h
+++ b/src/ibverbs.h
@@ -35,6 +35,7 @@
 #define IB_VERBS_H
 
 #include <pthread.h>
+#include <string.h>
 
 #include <infiniband/driver.h>
 
@@ -102,4 +103,21 @@ HIDDEN int ibverbs_init(struct ibv_device ***list);
 		(cmd)->response = (uintptr_t) (out); \
 	} while (0)
 
+/*
+ * Support for extended operations is recorded at the end of
+ * the name character array. This way we don't need to query
+ * for the device capabilities with every call.
+ */
+static inline int ibv_get_ext_support(struct ibv_device *device)
+{
+	return device->name[IBV_SYSFS_NAME_MAX - 1];
+}
+
+static inline void ibv_set_ext_support(struct ibv_device *device,
+				       int ext_supported)
+{
+	if (strlen(device->name) < IBV_SYSFS_NAME_MAX - 1)
+		device->name[IBV_SYSFS_NAME_MAX - 1] = (char) ext_supported;
+}
+
 #endif /* IB_VERBS_H */
diff --git a/src/init.c b/src/init.c
index 4f0130e..419ab31 100644
--- a/src/init.c
+++ b/src/init.c
@@ -71,6 +71,7 @@ struct ibv_driver {
 	const char *name;
 	ibv_driver_init_func init_func;
 	struct ibv_driver *next;
+	int ext_support;
 };
 
 static struct ibv_sysfs_dev *sysfs_dev_list;
@@ -153,7 +154,8 @@ static int find_sysfs_devs(void)
 	return ret;
 }
 
-void ibv_register_driver(const char *name, ibv_driver_init_func init_func)
+static void __ibv_register_driver(const char *name, ibv_driver_init_func init_func,
+				  int ext_support)
 {
 	struct ibv_driver *driver;
 
@@ -166,6 +168,7 @@ void ibv_register_driver(const char *name, ibv_driver_init_func init_func)
 	driver->name = name;
 	driver->init_func = init_func;
 	driver->next = NULL;
+	driver->ext_support = ext_support;
 
 	if (tail_driver)
 		tail_driver->next = driver;
@@ -174,6 +177,16 @@ void ibv_register_driver(const char *name, ibv_driver_init_func init_func)
 		tail_driver = driver;
 }
 
+void ibv_register_driver(const char *name, ibv_driver_init_func init_func)
+{
+	__ibv_register_driver(name, init_func, 0);
+}
+
+void ibv_register_driver_ext(const char *name, ibv_driver_init_func init_func)
+{
+	__ibv_register_driver(name, init_func, 1);
+}
+
 static void load_driver(const char *name)
 {
 	char *so_name;
@@ -368,6 +381,8 @@ static struct ibv_device *try_driver(struct ibv_driver *driver,
 	strcpy(dev->name, sysfs_dev->ibdev_name);
 	strcpy(dev->ibdev_path, sysfs_dev->ibdev_path);
 
+	ibv_set_ext_support(dev, driver->ext_support);
+
 	return dev;
 }
 
diff --git a/src/libibverbs.map b/src/libibverbs.map
index 1827da0..422e07f 100644
--- a/src/libibverbs.map
+++ b/src/libibverbs.map
@@ -96,4 +96,9 @@ IBVERBS_1.1 {
 		ibv_port_state_str;
 		ibv_event_type_str;
 		ibv_wc_status_str;
+
+		ibv_register_driver_ext;
+		ibv_have_ext_ops;
+		ibv_get_device_ext_ops;
+		ibv_get_ext_ops;
 } IBVERBS_1.0;
diff --git a/src/verbs.c b/src/verbs.c
index ba3c0a4..a34a784 100644
--- a/src/verbs.c
+++ b/src/verbs.c
@@ -76,6 +76,15 @@ enum ibv_rate mult_to_ibv_rate(int mult)
 	}
 }
 
+void *__ibv_get_ext_ops(struct ibv_context *context, const char *name)
+{
+	if (!ibv_get_ext_support(context->device) || !context->get_ext_ops)
+		return NULL;
+
+	return context->get_ext_ops(context, name);
+}
+default_symver(__ibv_get_ext_ops, ibv_get_ext_ops);
+
 int __ibv_query_device(struct ibv_context *context,
 		       struct ibv_device_attr *device_attr)
 {
In order to support OFED or vendor specific calls, define a generic extension mechanism. This allows OFED, an RDMA vendor, or another registered 3rd party (for example, the librdmacm) to define RDMA extensions.

Users that make use of extensions are not only aware that they are using an extended call, but are also given information regarding how widely the extension may be supported. Support for extended functions, data structures, and enums is defined.

Extensions are referenced by name. There is an assumption that extension names are prefixed relative to the supporting party. Until an extension has been incorporated into libibverbs, it should be defined in an appropriate external header file.

For example, OFA could provide a header file with their definition for XRC extensions. A partial view of such a header file might look similar to:

	#ifndef OFA_XRC_H
	#define OFA_XRC_H

	#include <infiniband/verbs.h>

	#define OFA_XRC_OPS "ofa-xrc"

	/* Extend IBV_QP_TYPE for XRC */
	#define OFA_QPT_XRC ((enum ibv_qp_type) \
		(IBV_EXTENSION_OFA << IBV_EXTENSION_BASE_SHIFT) + 6)

	struct ofa_xrcd {
		struct ibv_context *context;
	};

	struct ofa_xrc_ops {
		struct ofa_xrcd * (*open_xrcd)(struct ibv_context *context,
					       int fd, int oflags);
		int (*close_xrcd)(struct ofa_xrcd *xrcd);
		/* other functions left as exercise to the reader */
	};

	#endif /* OFA_XRC_H */

Driver libraries that support extensions are given a new registration call, ibv_register_driver_ext(). Use of this call indicates to libibverbs that the library allocates extended versions of struct ibv_device and struct ibv_context.

The following new APIs are added to libibverbs for applications to use to determine whether an extension is supported and to obtain the extended function calls.

ibv_have_ext_ops - returns true if an extension is supported
ibv_get_device_ext_ops - return extended operations for a device
ibv_get_ext_ops - return extended operations for an open context

To maintain backwards compatibility with existing applications, internally, the library uses the last byte of the device name to record if the device was registered with extension support.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
 include/infiniband/driver.h |    1 +
 include/infiniband/verbs.h  |   40 +++++++++++++++++++++++++++++++++++++++-
 src/device.c                |   18 ++++++++++++++++++
 src/ibverbs.h               |   18 ++++++++++++++++++
 src/init.c                  |   17 ++++++++++++++++-
 src/libibverbs.map          |    5 +++++
 src/verbs.c                 |    9 +++++++++
 7 files changed, 106 insertions(+), 2 deletions(-)
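The backwards-compatibility trick of recording extension support in the last byte of the device name can be tried in isolation with the sketch below. It mirrors ibv_get_ext_support() and ibv_set_ext_support() from the patch, but on a mock structure: struct mock_device and mock_register() are hypothetical, and the value 64 for IBV_SYSFS_NAME_MAX is an assumption here rather than something stated in the patch.

```c
#include <string.h>

/* Assumed size of the name array; the real constant lives in verbs.h. */
#define IBV_SYSFS_NAME_MAX 64

/* Mock stand-in for struct ibv_device; only the name field matters. */
struct mock_device {
	char name[IBV_SYSFS_NAME_MAX];
};

/* Same logic as ibv_get_ext_support(): read the flag byte. */
static inline int get_ext_support(struct mock_device *device)
{
	return device->name[IBV_SYSFS_NAME_MAX - 1];
}

/* Same logic as ibv_set_ext_support(): only write the flag when the
 * name leaves the last byte free, so the string itself is never damaged. */
static inline void set_ext_support(struct mock_device *device,
				   int ext_supported)
{
	if (strlen(device->name) < IBV_SYSFS_NAME_MAX - 1)
		device->name[IBV_SYSFS_NAME_MAX - 1] = (char) ext_supported;
}

/* Hypothetical analogue of the try_driver() step: copy the sysfs name,
 * record the flag, and return what a later lookup would see. */
static int mock_register(struct mock_device *device, const char *ibdev_name,
			 int ext_support)
{
	memset(device, 0, sizeof *device);
	strncpy(device->name, ibdev_name, IBV_SYSFS_NAME_MAX - 1);
	set_ext_support(device, ext_support);
	return get_ext_support(device);
}
```

Because the flag sits after the terminating NUL of any name shorter than the array, code that only reads the name as a string keeps working. A name that fills the whole array leaves the flag at 0, so such a device would be treated as not supporting extensions.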