diff mbox series

[1/6] Add ancillary bus support

Message ID 20201001050534.890666-2-david.m.ertman@intel.com (mailing list archive)
State Not Applicable
Headers show
Series [1/6] Add ancillary bus support | expand

Commit Message

Ertman, David M Oct. 1, 2020, 5:05 a.m. UTC
Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
It enables drivers to create an ancillary_device and bind an
ancillary_driver to it.

The bus supports probe/remove shutdown and suspend/resume callbacks.
Each ancillary_device has a unique string based id; driver binds to
an ancillary_device based on this id through the bus.

Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
---
 Documentation/driver-api/ancillary_bus.rst | 230 +++++++++++++++++++++
 Documentation/driver-api/index.rst         |   1 +
 drivers/bus/Kconfig                        |   3 +
 drivers/bus/Makefile                       |   3 +
 drivers/bus/ancillary.c                    | 191 +++++++++++++++++
 include/linux/ancillary_bus.h              |  58 ++++++
 include/linux/mod_devicetable.h            |   8 +
 scripts/mod/devicetable-offsets.c          |   3 +
 scripts/mod/file2alias.c                   |   8 +
 9 files changed, 505 insertions(+)
 create mode 100644 Documentation/driver-api/ancillary_bus.rst
 create mode 100644 drivers/bus/ancillary.c
 create mode 100644 include/linux/ancillary_bus.h

Comments

Leon Romanovsky Oct. 1, 2020, 8:32 a.m. UTC | #1
On Wed, Sep 30, 2020 at 10:05:29PM -0700, Dave Ertman wrote:
> Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> It enables drivers to create an ancillary_device and bind an
> ancillary_driver to it.
>
> The bus supports probe/remove shutdown and suspend/resume callbacks.
> Each ancillary_device has a unique string based id; driver binds to
> an ancillary_device based on this id through the bus.
>
> Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---
>  Documentation/driver-api/ancillary_bus.rst | 230 +++++++++++++++++++++
>  Documentation/driver-api/index.rst         |   1 +
>  drivers/bus/Kconfig                        |   3 +
>  drivers/bus/Makefile                       |   3 +
>  drivers/bus/ancillary.c                    | 191 +++++++++++++++++
>  include/linux/ancillary_bus.h              |  58 ++++++
>  include/linux/mod_devicetable.h            |   8 +
>  scripts/mod/devicetable-offsets.c          |   3 +
>  scripts/mod/file2alias.c                   |   8 +
>  9 files changed, 505 insertions(+)
>  create mode 100644 Documentation/driver-api/ancillary_bus.rst
>  create mode 100644 drivers/bus/ancillary.c
>  create mode 100644 include/linux/ancillary_bus.h
>
> diff --git a/Documentation/driver-api/ancillary_bus.rst b/Documentation/driver-api/ancillary_bus.rst
> new file mode 100644
> index 000000000000..0a11979aa927
> --- /dev/null
> +++ b/Documentation/driver-api/ancillary_bus.rst
> @@ -0,0 +1,230 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +=============
> +Ancillary Bus
> +=============
> +
> +In some subsystems, the functionality of the core device (PCI/ACPI/other) is
> +too complex for a single device to be managed as a monolithic block or a part of
> +the functionality needs to be exposed to a different subsystem.  Splitting the
> +functionality into smaller orthogonal devices would make it easier to manage
> +data, power management and domain-specific interaction with the hardware. A key
> +requirement for such a split is that there is no dependency on a physical bus,
> +device, register accesses or regmap support. These individual devices split from
> +the core cannot live on the platform bus as they are not physical devices that
> +are controlled by DT/ACPI. The same argument applies for not using MFD in this
> +scenario as MFD relies on individual function devices being physical devices
> +that are DT enumerated.
> +
> +An example for this kind of requirement is the audio subsystem where a single
> +IP is handling multiple entities such as HDMI, Soundwire, local devices such as
> +mics/speakers etc. The split for the core's functionality can be arbitrary or
> +be defined by the DSP firmware topology and include hooks for test/debug. This
> +allows for the audio core device to be minimal and focused on hardware-specific
> +control and communication.
> +
> +The ancillary bus is intended to be minimal, generic and avoid domain-specific
> +assumptions. Each ancillary_device represents a part of its parent
> +functionality. The generic behavior can be extended and specialized as needed
> +by encapsulating an ancillary_device within other domain-specific structures and
> +the use of .ops callbacks. Devices on the ancillary bus do not share any
> +structures and the use of a communication channel with the parent is
> +domain-specific.
> +
> +When Should the Ancillary Bus Be Used
> +=====================================
> +
> +The ancillary bus is to be used when a driver and one or more kernel modules,
> +who share a common header file with the driver, need a mechanism to connect and
> +provide access to a shared object allocated by the ancillary_device's
> +registering driver.  The registering driver for the ancillary_device(s) and the
> +kernel module(s) registering ancillary_drivers can be from the same subsystem,
> +or from multiple subsystems.
> +
> +The emphasis here is on a common generic interface that keeps subsystem
> +customization out of the bus infrastructure.
> +
> +One example could be a multi-port PCI network device that is rdma-capable and
> +needs to export this functionality and attach to an rdma driver in another
> +subsystem.  The PCI driver will allocate and register an ancillary_device for
> +each physical function on the NIC.  The rdma driver will register an
> +ancillary_driver that will be matched with and probed for each of these
> +ancillary_devices.  This will give the rdma driver access to the shared data/ops
> +in the PCI drivers shared object to establish a connection with the PCI driver.
> +
> +Another use case is for the a PCI device to be split out into multiple sub
> +functions.  For each sub function an ancillary_device will be created.  A PCI
> +sub function driver will bind to such devices that will create its own one or
> +more class devices.  A PCI sub function ancillary device will likely be
> +contained in a struct with additional attributes such as user defined sub
> +function number and optional attributes such as resources and a link to the
> +parent device.  These attributes could be used by systemd/udev; and hence should
> +be initialized before a driver binds to an ancillary_device.
> +
> +Ancillary Device
> +================
> +
> +An ancillary_device is created and registered to represent a part of its parent
> +device's functionality. It is given a name that, combined with the registering
> +drivers KBUILD_MODNAME, creates a match_name that is used for driver binding,
> +and an id that combined with the match_name provide a unique name to register
> +with the bus subsystem.
> +
> +Registering an ancillary_device is a two-step process.  First you must call
> +ancillary_device_initialize(), which will check several aspects of the
> +ancillary_device struct and perform a device_initialize().  After this step
> +completes, any error state must have a call to put_device() in its resolution
> +path.  The second step in registering an ancillary_device is to perform a call
> +to ancillary_device_add(), which will set the name of the device and add the
> +device to the bus.
> +
> +To unregister an ancillary_device, just a call to ancillary_device_unregister()
> +is used.  This will perform both a device_del() and a put_device().

Why did you chose ancillary_device_initialize() and not
ancillary_device_register() to be paired with ancillary_device_unregister()?

Thanks
Ertman, David M Oct. 1, 2020, 5:20 p.m. UTC | #2
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 1, 2020 1:32 AM
> To: Ertman, David M <david.m.ertman@intel.com>
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 1/6] Add ancillary bus support
> 
> On Wed, Sep 30, 2020 at 10:05:29PM -0700, Dave Ertman wrote:
> > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > It enables drivers to create an ancillary_device and bind an
> > ancillary_driver to it.
> >
> > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > Each ancillary_device has a unique string based id; driver binds to
> > an ancillary_device based on this id through the bus.
> >
> > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > ---
> >  Documentation/driver-api/ancillary_bus.rst | 230
> +++++++++++++++++++++
> >  Documentation/driver-api/index.rst         |   1 +
> >  drivers/bus/Kconfig                        |   3 +
> >  drivers/bus/Makefile                       |   3 +
> >  drivers/bus/ancillary.c                    | 191 +++++++++++++++++
> >  include/linux/ancillary_bus.h              |  58 ++++++
> >  include/linux/mod_devicetable.h            |   8 +
> >  scripts/mod/devicetable-offsets.c          |   3 +
> >  scripts/mod/file2alias.c                   |   8 +
> >  9 files changed, 505 insertions(+)
> >  create mode 100644 Documentation/driver-api/ancillary_bus.rst
> >  create mode 100644 drivers/bus/ancillary.c
> >  create mode 100644 include/linux/ancillary_bus.h
> >
> > diff --git a/Documentation/driver-api/ancillary_bus.rst
> b/Documentation/driver-api/ancillary_bus.rst
> > new file mode 100644
> > index 000000000000..0a11979aa927
> > --- /dev/null
> > +++ b/Documentation/driver-api/ancillary_bus.rst
> > @@ -0,0 +1,230 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +=============
> > +Ancillary Bus
> > +=============
> > +
> > +In some subsystems, the functionality of the core device
> (PCI/ACPI/other) is
> > +too complex for a single device to be managed as a monolithic block or a
> part of
> > +the functionality needs to be exposed to a different subsystem.  Splitting
> the
> > +functionality into smaller orthogonal devices would make it easier to
> manage
> > +data, power management and domain-specific interaction with the
> hardware. A key
> > +requirement for such a split is that there is no dependency on a physical
> bus,
> > +device, register accesses or regmap support. These individual devices split
> from
> > +the core cannot live on the platform bus as they are not physical devices
> that
> > +are controlled by DT/ACPI. The same argument applies for not using MFD
> in this
> > +scenario as MFD relies on individual function devices being physical
> devices
> > +that are DT enumerated.
> > +
> > +An example for this kind of requirement is the audio subsystem where a
> single
> > +IP is handling multiple entities such as HDMI, Soundwire, local devices
> such as
> > +mics/speakers etc. The split for the core's functionality can be arbitrary or
> > +be defined by the DSP firmware topology and include hooks for
> test/debug. This
> > +allows for the audio core device to be minimal and focused on hardware-
> specific
> > +control and communication.
> > +
> > +The ancillary bus is intended to be minimal, generic and avoid domain-
> specific
> > +assumptions. Each ancillary_device represents a part of its parent
> > +functionality. The generic behavior can be extended and specialized as
> needed
> > +by encapsulating an ancillary_device within other domain-specific
> structures and
> > +the use of .ops callbacks. Devices on the ancillary bus do not share any
> > +structures and the use of a communication channel with the parent is
> > +domain-specific.
> > +
> > +When Should the Ancillary Bus Be Used
> > +=====================================
> > +
> > +The ancillary bus is to be used when a driver and one or more kernel
> modules,
> > +who share a common header file with the driver, need a mechanism to
> connect and
> > +provide access to a shared object allocated by the ancillary_device's
> > +registering driver.  The registering driver for the ancillary_device(s) and
> the
> > +kernel module(s) registering ancillary_drivers can be from the same
> subsystem,
> > +or from multiple subsystems.
> > +
> > +The emphasis here is on a common generic interface that keeps
> subsystem
> > +customization out of the bus infrastructure.
> > +
> > +One example could be a multi-port PCI network device that is rdma-
> capable and
> > +needs to export this functionality and attach to an rdma driver in another
> > +subsystem.  The PCI driver will allocate and register an ancillary_device for
> > +each physical function on the NIC.  The rdma driver will register an
> > +ancillary_driver that will be matched with and probed for each of these
> > +ancillary_devices.  This will give the rdma driver access to the shared
> data/ops
> > +in the PCI drivers shared object to establish a connection with the PCI
> driver.
> > +
> > +Another use case is for the a PCI device to be split out into multiple sub
> > +functions.  For each sub function an ancillary_device will be created.  A PCI
> > +sub function driver will bind to such devices that will create its own one or
> > +more class devices.  A PCI sub function ancillary device will likely be
> > +contained in a struct with additional attributes such as user defined sub
> > +function number and optional attributes such as resources and a link to
> the
> > +parent device.  These attributes could be used by systemd/udev; and
> hence should
> > +be initialized before a driver binds to an ancillary_device.
> > +
> > +Ancillary Device
> > +================
> > +
> > +An ancillary_device is created and registered to represent a part of its
> parent
> > +device's functionality. It is given a name that, combined with the
> registering
> > +drivers KBUILD_MODNAME, creates a match_name that is used for driver
> binding,
> > +and an id that combined with the match_name provide a unique name to
> register
> > +with the bus subsystem.
> > +
> > +Registering an ancillary_device is a two-step process.  First you must call
> > +ancillary_device_initialize(), which will check several aspects of the
> > +ancillary_device struct and perform a device_initialize().  After this step
> > +completes, any error state must have a call to put_device() in its
> resolution
> > +path.  The second step in registering an ancillary_device is to perform a
> call
> > +to ancillary_device_add(), which will set the name of the device and add
> the
> > +device to the bus.
> > +
> > +To unregister an ancillary_device, just a call to
> ancillary_device_unregister()
> > +is used.  This will perform both a device_del() and a put_device().
> 
> Why did you chose ancillary_device_initialize() and not
> ancillary_device_register() to be paired with ancillary_device_unregister()?
> 
> Thanks

We originally had a single call to ancillary_device_register() that paired with
unregister, but there was an ask to separate the register into an initialize and
add to make the error condition unwind more compartimentalized.

-DaveE
Leon Romanovsky Oct. 1, 2020, 5:40 p.m. UTC | #3
On Thu, Oct 01, 2020 at 05:20:35PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 1, 2020 1:32 AM
> > To: Ertman, David M <david.m.ertman@intel.com>
> > Cc: linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH 1/6] Add ancillary bus support
> >
> > On Wed, Sep 30, 2020 at 10:05:29PM -0700, Dave Ertman wrote:
> > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > It enables drivers to create an ancillary_device and bind an
> > > ancillary_driver to it.
> > >
> > > The bus supports probe/remove shutdown and suspend/resume callbacks.
> > > Each ancillary_device has a unique string based id; driver binds to
> > > an ancillary_device based on this id through the bus.
> > >
> > > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > > Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > > Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
> > > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > ---
> > >  Documentation/driver-api/ancillary_bus.rst | 230
> > +++++++++++++++++++++
> > >  Documentation/driver-api/index.rst         |   1 +
> > >  drivers/bus/Kconfig                        |   3 +
> > >  drivers/bus/Makefile                       |   3 +
> > >  drivers/bus/ancillary.c                    | 191 +++++++++++++++++
> > >  include/linux/ancillary_bus.h              |  58 ++++++
> > >  include/linux/mod_devicetable.h            |   8 +
> > >  scripts/mod/devicetable-offsets.c          |   3 +
> > >  scripts/mod/file2alias.c                   |   8 +
> > >  9 files changed, 505 insertions(+)
> > >  create mode 100644 Documentation/driver-api/ancillary_bus.rst
> > >  create mode 100644 drivers/bus/ancillary.c
> > >  create mode 100644 include/linux/ancillary_bus.h
> > >
> > > diff --git a/Documentation/driver-api/ancillary_bus.rst
> > b/Documentation/driver-api/ancillary_bus.rst
> > > new file mode 100644
> > > index 000000000000..0a11979aa927
> > > --- /dev/null
> > > +++ b/Documentation/driver-api/ancillary_bus.rst
> > > @@ -0,0 +1,230 @@
> > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +=============
> > > +Ancillary Bus
> > > +=============
> > > +
> > > +In some subsystems, the functionality of the core device
> > (PCI/ACPI/other) is
> > > +too complex for a single device to be managed as a monolithic block or a
> > part of
> > > +the functionality needs to be exposed to a different subsystem.  Splitting
> > the
> > > +functionality into smaller orthogonal devices would make it easier to
> > manage
> > > +data, power management and domain-specific interaction with the
> > hardware. A key
> > > +requirement for such a split is that there is no dependency on a physical
> > bus,
> > > +device, register accesses or regmap support. These individual devices split
> > from
> > > +the core cannot live on the platform bus as they are not physical devices
> > that
> > > +are controlled by DT/ACPI. The same argument applies for not using MFD
> > in this
> > > +scenario as MFD relies on individual function devices being physical
> > devices
> > > +that are DT enumerated.
> > > +
> > > +An example for this kind of requirement is the audio subsystem where a
> > single
> > > +IP is handling multiple entities such as HDMI, Soundwire, local devices
> > such as
> > > +mics/speakers etc. The split for the core's functionality can be arbitrary or
> > > +be defined by the DSP firmware topology and include hooks for
> > test/debug. This
> > > +allows for the audio core device to be minimal and focused on hardware-
> > specific
> > > +control and communication.
> > > +
> > > +The ancillary bus is intended to be minimal, generic and avoid domain-
> > specific
> > > +assumptions. Each ancillary_device represents a part of its parent
> > > +functionality. The generic behavior can be extended and specialized as
> > needed
> > > +by encapsulating an ancillary_device within other domain-specific
> > structures and
> > > +the use of .ops callbacks. Devices on the ancillary bus do not share any
> > > +structures and the use of a communication channel with the parent is
> > > +domain-specific.
> > > +
> > > +When Should the Ancillary Bus Be Used
> > > +=====================================
> > > +
> > > +The ancillary bus is to be used when a driver and one or more kernel
> > modules,
> > > +who share a common header file with the driver, need a mechanism to
> > connect and
> > > +provide access to a shared object allocated by the ancillary_device's
> > > +registering driver.  The registering driver for the ancillary_device(s) and
> > the
> > > +kernel module(s) registering ancillary_drivers can be from the same
> > subsystem,
> > > +or from multiple subsystems.
> > > +
> > > +The emphasis here is on a common generic interface that keeps
> > subsystem
> > > +customization out of the bus infrastructure.
> > > +
> > > +One example could be a multi-port PCI network device that is rdma-
> > capable and
> > > +needs to export this functionality and attach to an rdma driver in another
> > > +subsystem.  The PCI driver will allocate and register an ancillary_device for
> > > +each physical function on the NIC.  The rdma driver will register an
> > > +ancillary_driver that will be matched with and probed for each of these
> > > +ancillary_devices.  This will give the rdma driver access to the shared
> > data/ops
> > > +in the PCI drivers shared object to establish a connection with the PCI
> > driver.
> > > +
> > > +Another use case is for the a PCI device to be split out into multiple sub
> > > +functions.  For each sub function an ancillary_device will be created.  A PCI
> > > +sub function driver will bind to such devices that will create its own one or
> > > +more class devices.  A PCI sub function ancillary device will likely be
> > > +contained in a struct with additional attributes such as user defined sub
> > > +function number and optional attributes such as resources and a link to
> > the
> > > +parent device.  These attributes could be used by systemd/udev; and
> > hence should
> > > +be initialized before a driver binds to an ancillary_device.
> > > +
> > > +Ancillary Device
> > > +================
> > > +
> > > +An ancillary_device is created and registered to represent a part of its
> > parent
> > > +device's functionality. It is given a name that, combined with the
> > registering
> > > +drivers KBUILD_MODNAME, creates a match_name that is used for driver
> > binding,
> > > +and an id that combined with the match_name provide a unique name to
> > register
> > > +with the bus subsystem.
> > > +
> > > +Registering an ancillary_device is a two-step process.  First you must call
> > > +ancillary_device_initialize(), which will check several aspects of the
> > > +ancillary_device struct and perform a device_initialize().  After this step
> > > +completes, any error state must have a call to put_device() in its
> > resolution
> > > +path.  The second step in registering an ancillary_device is to perform a
> > call
> > > +to ancillary_device_add(), which will set the name of the device and add
> > the
> > > +device to the bus.
> > > +
> > > +To unregister an ancillary_device, just a call to
> > ancillary_device_unregister()
> > > +is used.  This will perform both a device_del() and a put_device().
> >
> > Why did you chose ancillary_device_initialize() and not
> > ancillary_device_register() to be paired with ancillary_device_unregister()?
> >
> > Thanks
>
> We originally had a single call to ancillary_device_register() that paired with
> unregister, but there was an ask to separate the register into an initialize and
> add to make the error condition unwind more compartimentalized.

It is correct thing to separate, but I would expect:
ancillary_device_register()
ancillary_device_add()

vs.
ancillary_device_unregister()

It is not a big deal, just curious.

The much more big deal is that I'm required to create 1-to-1 mapping
between device and driver, and I can't connect all my different modules
to one xxx_core.pf.y device in N-to-1 mapping. "N" represents different
protocols (IB, ETH, SCSI) and "1" is one PCI core.

Thanks

>
> -DaveE
Parav Pandit Oct. 1, 2020, 6:29 p.m. UTC | #4
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, October 1, 2020 11:10 PM
> 
> On Thu, Oct 01, 2020 at 05:20:35PM +0000, Ertman, David M wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Thursday, October 1, 2020 1:32 AM
> > > To: Ertman, David M <david.m.ertman@intel.com>
> > > Cc: linux-rdma@vger.kernel.org
> > > Subject: Re: [PATCH 1/6] Add ancillary bus support
> > >
> > > On Wed, Sep 30, 2020 at 10:05:29PM -0700, Dave Ertman wrote:
> > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > It enables drivers to create an ancillary_device and bind an
> > > > ancillary_driver to it.
> > > >
> > > > The bus supports probe/remove shutdown and suspend/resume
> callbacks.
> > > > Each ancillary_device has a unique string based id; driver binds
> > > > to an ancillary_device based on this id through the bus.
> > > >
> > > > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > > > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > > > Co-developed-by: Ranjani Sridharan
> > > > <ranjani.sridharan@linux.intel.com>
> > > > Signed-off-by: Ranjani Sridharan
> > > > <ranjani.sridharan@linux.intel.com>
> > > > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > > > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > > > Reviewed-by: Pierre-Louis Bossart
> > > > <pierre-louis.bossart@linux.intel.com>
> > > > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > > ---
> > > >  Documentation/driver-api/ancillary_bus.rst | 230
> > > +++++++++++++++++++++
> > > >  Documentation/driver-api/index.rst         |   1 +
> > > >  drivers/bus/Kconfig                        |   3 +
> > > >  drivers/bus/Makefile                       |   3 +
> > > >  drivers/bus/ancillary.c                    | 191 +++++++++++++++++
> > > >  include/linux/ancillary_bus.h              |  58 ++++++
> > > >  include/linux/mod_devicetable.h            |   8 +
> > > >  scripts/mod/devicetable-offsets.c          |   3 +
> > > >  scripts/mod/file2alias.c                   |   8 +
> > > >  9 files changed, 505 insertions(+)  create mode 100644
> > > > Documentation/driver-api/ancillary_bus.rst
> > > >  create mode 100644 drivers/bus/ancillary.c  create mode 100644
> > > > include/linux/ancillary_bus.h
> > > >
> > > > diff --git a/Documentation/driver-api/ancillary_bus.rst
> > > b/Documentation/driver-api/ancillary_bus.rst
> > > > new file mode 100644
> > > > index 000000000000..0a11979aa927
> > > > --- /dev/null
> > > > +++ b/Documentation/driver-api/ancillary_bus.rst
> > > > @@ -0,0 +1,230 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > > +
> > > > +=============
> > > > +Ancillary Bus
> > > > +=============
> > > > +
> > > > +In some subsystems, the functionality of the core device
> > > (PCI/ACPI/other) is
> > > > +too complex for a single device to be managed as a monolithic
> > > > +block or a
> > > part of
> > > > +the functionality needs to be exposed to a different subsystem.
> > > > +Splitting
> > > the
> > > > +functionality into smaller orthogonal devices would make it
> > > > +easier to
> > > manage
> > > > +data, power management and domain-specific interaction with the
> > > hardware. A key
> > > > +requirement for such a split is that there is no dependency on a
> > > > +physical
> > > bus,
> > > > +device, register accesses or regmap support. These individual
> > > > +devices split
> > > from
> > > > +the core cannot live on the platform bus as they are not physical
> > > > +devices
> > > that
> > > > +are controlled by DT/ACPI. The same argument applies for not
> > > > +using MFD
> > > in this
> > > > +scenario as MFD relies on individual function devices being
> > > > +physical
> > > devices
> > > > +that are DT enumerated.
> > > > +
> > > > +An example for this kind of requirement is the audio subsystem
> > > > +where a
> > > single
> > > > +IP is handling multiple entities such as HDMI, Soundwire, local
> > > > +devices
> > > such as
> > > > +mics/speakers etc. The split for the core's functionality can be
> > > > +arbitrary or be defined by the DSP firmware topology and include
> > > > +hooks for
> > > test/debug. This
> > > > +allows for the audio core device to be minimal and focused on
> > > > +hardware-
> > > specific
> > > > +control and communication.
> > > > +
> > > > +The ancillary bus is intended to be minimal, generic and avoid
> > > > +domain-
> > > specific
> > > > +assumptions. Each ancillary_device represents a part of its
> > > > +parent functionality. The generic behavior can be extended and
> > > > +specialized as
> > > needed
> > > > +by encapsulating an ancillary_device within other domain-specific
> > > structures and
> > > > +the use of .ops callbacks. Devices on the ancillary bus do not
> > > > +share any structures and the use of a communication channel with
> > > > +the parent is domain-specific.
> > > > +
> > > > +When Should the Ancillary Bus Be Used
> > > > +=====================================
> > > > +
> > > > +The ancillary bus is to be used when a driver and one or more
> > > > +kernel
> > > modules,
> > > > +who share a common header file with the driver, need a mechanism
> > > > +to
> > > connect and
> > > > +provide access to a shared object allocated by the
> > > > +ancillary_device's registering driver.  The registering driver
> > > > +for the ancillary_device(s) and
> > > the
> > > > +kernel module(s) registering ancillary_drivers can be from the
> > > > +same
> > > subsystem,
> > > > +or from multiple subsystems.
> > > > +
> > > > +The emphasis here is on a common generic interface that keeps
> > > subsystem
> > > > +customization out of the bus infrastructure.
> > > > +
> > > > +One example could be a multi-port PCI network device that is
> > > > +rdma-
> > > capable and
> > > > +needs to export this functionality and attach to an rdma driver
> > > > +in another subsystem.  The PCI driver will allocate and register
> > > > +an ancillary_device for each physical function on the NIC.  The
> > > > +rdma driver will register an ancillary_driver that will be
> > > > +matched with and probed for each of these ancillary_devices.
> > > > +This will give the rdma driver access to the shared
> > > data/ops
> > > > +in the PCI drivers shared object to establish a connection with
> > > > +the PCI
> > > driver.
> > > > +
> > > > +Another use case is for the a PCI device to be split out into
> > > > +multiple sub functions.  For each sub function an
> > > > +ancillary_device will be created.  A PCI sub function driver will
> > > > +bind to such devices that will create its own one or more class
> > > > +devices.  A PCI sub function ancillary device will likely be
> > > > +contained in a struct with additional attributes such as user
> > > > +defined sub function number and optional attributes such as
> > > > +resources and a link to
> > > the
> > > > +parent device.  These attributes could be used by systemd/udev;
> > > > +and
> > > hence should
> > > > +be initialized before a driver binds to an ancillary_device.
> > > > +
> > > > +Ancillary Device
> > > > +================
> > > > +
> > > > +An ancillary_device is created and registered to represent a part
> > > > +of its
> > > parent
> > > > +device's functionality. It is given a name that, combined with
> > > > +the
> > > registering
> > > > +drivers KBUILD_MODNAME, creates a match_name that is used for
> > > > +driver
> > > binding,
> > > > +and an id that combined with the match_name provide a unique name
> > > > +to
> > > register
> > > > +with the bus subsystem.
> > > > +
> > > > +Registering an ancillary_device is a two-step process.  First you
> > > > +must call ancillary_device_initialize(), which will check several
> > > > +aspects of the ancillary_device struct and perform a
> > > > +device_initialize().  After this step completes, any error state
> > > > +must have a call to put_device() in its
> > > resolution
> > > > +path.  The second step in registering an ancillary_device is to
> > > > +perform a
> > > call
> > > > +to ancillary_device_add(), which will set the name of the device
> > > > +and add
> > > the
> > > > +device to the bus.
> > > > +
> > > > +To unregister an ancillary_device, just a call to
> > > ancillary_device_unregister()
> > > > +is used.  This will perform both a device_del() and a put_device().
> > >
> > > Why did you chose ancillary_device_initialize() and not
> > > ancillary_device_register() to be paired with
> ancillary_device_unregister()?
> > >
> > > Thanks
> >
> > We originally had a single call to ancillary_device_register() that
> > paired with unregister, but there was an ask to separate the register
> > into an initialize and add to make the error condition unwind more
> compartimentalized.
> 
> It is correct thing to separate, but I would expect:
> ancillary_device_register()
> ancillary_device_add()
> 
device_initialize(), device_add() and device_unregister() is the pattern widely followed in the core.

> vs.
> ancillary_device_unregister()
> 
> It is not a big deal, just curious.
> 
> The much more big deal is that I'm required to create 1-to-1 mapping
> between device and driver, and I can't connect all my different modules to
> one xxx_core.pf.y device in N-to-1 mapping. "N" represents different
> protocols (IB, ETH, SCSI) and "1" is one PCI core.

For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa, en).

So there should be one ida allocate per mlx5 device type.

Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] = {
	"rdma",
	"eth",
	"vdpa"
};
	
Something like for current mlx5_register_device(),
mlx5_register_device()
{
	id = ida_alloc(0, UINT_MAX, GFP_KERNEL);

	for (i = 0; I < MLX5_INTERFACE_PROTOCOL_MAX; i++) {
		adev = kzalloc(sizeof(*adev), GFP_KERNEL);
		adev.name = mlx5_adev_names[i];
		adev.id = ret;
		adev.dev.parent = mlx5_core_dev->device;
		adev->coredev = mlx5_core_dev;
		ret = ancillary_device_initialize(&adev);	
		ret = ancillary_device_register(adev);
}

This will create 3 ancillary devices for each PCI PF/VF/SF.
mlx5_core.rdma.1
mlx5_core.eth.1
mlx5_core.vdpa.1

and mlx5_ib driver will do

ancillary_driver_register()
{
	For ID of mlx5_core.rdma.
}

mlx5_vdpa driver does,

ancillary_driver_register()
{
	For ID of mlx5_core.vdpa
}

This is uniform for pf/vf/sf for one or more all protocols.
Leon Romanovsky Oct. 1, 2020, 7:32 p.m. UTC | #5
On Thu, Oct 01, 2020 at 06:29:22PM +0000, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, October 1, 2020 11:10 PM
> >
> > On Thu, Oct 01, 2020 at 05:20:35PM +0000, Ertman, David M wrote:
> > > > -----Original Message-----
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Thursday, October 1, 2020 1:32 AM
> > > > To: Ertman, David M <david.m.ertman@intel.com>
> > > > Cc: linux-rdma@vger.kernel.org
> > > > Subject: Re: [PATCH 1/6] Add ancillary bus support
> > > >
> > > > On Wed, Sep 30, 2020 at 10:05:29PM -0700, Dave Ertman wrote:
> > > > > Add support for the Ancillary Bus, ancillary_device and ancillary_driver.
> > > > > It enables drivers to create an ancillary_device and bind an
> > > > > ancillary_driver to it.
> > > > >
> > > > > The bus supports probe/remove shutdown and suspend/resume
> > callbacks.
> > > > > Each ancillary_device has a unique string based id; driver binds
> > > > > to an ancillary_device based on this id through the bus.
> > > > >
> > > > > Co-developed-by: Kiran Patil <kiran.patil@intel.com>
> > > > > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > > > > Co-developed-by: Ranjani Sridharan
> > > > > <ranjani.sridharan@linux.intel.com>
> > > > > Signed-off-by: Ranjani Sridharan
> > > > > <ranjani.sridharan@linux.intel.com>
> > > > > Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
> > > > > Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
> > > > > Reviewed-by: Pierre-Louis Bossart
> > > > > <pierre-louis.bossart@linux.intel.com>
> > > > > Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > Reviewed-by: Parav Pandit <parav@mellanox.com>
> > > > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > > > Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> > > > > ---
> > > > >  Documentation/driver-api/ancillary_bus.rst | 230
> > > > +++++++++++++++++++++
> > > > >  Documentation/driver-api/index.rst         |   1 +
> > > > >  drivers/bus/Kconfig                        |   3 +
> > > > >  drivers/bus/Makefile                       |   3 +
> > > > >  drivers/bus/ancillary.c                    | 191 +++++++++++++++++
> > > > >  include/linux/ancillary_bus.h              |  58 ++++++
> > > > >  include/linux/mod_devicetable.h            |   8 +
> > > > >  scripts/mod/devicetable-offsets.c          |   3 +
> > > > >  scripts/mod/file2alias.c                   |   8 +
> > > > >  9 files changed, 505 insertions(+)  create mode 100644
> > > > > Documentation/driver-api/ancillary_bus.rst
> > > > >  create mode 100644 drivers/bus/ancillary.c  create mode 100644
> > > > > include/linux/ancillary_bus.h
> > > > >
> > > > > diff --git a/Documentation/driver-api/ancillary_bus.rst
> > > > b/Documentation/driver-api/ancillary_bus.rst
> > > > > new file mode 100644
> > > > > index 000000000000..0a11979aa927
> > > > > --- /dev/null
> > > > > +++ b/Documentation/driver-api/ancillary_bus.rst
> > > > > @@ -0,0 +1,230 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > > > +
> > > > > +=============
> > > > > +Ancillary Bus
> > > > > +=============
> > > > > +
> > > > > +In some subsystems, the functionality of the core device
> > > > (PCI/ACPI/other) is
> > > > > +too complex for a single device to be managed as a monolithic
> > > > > +block or a
> > > > part of
> > > > > +the functionality needs to be exposed to a different subsystem.
> > > > > +Splitting
> > > > the
> > > > > +functionality into smaller orthogonal devices would make it
> > > > > +easier to
> > > > manage
> > > > > +data, power management and domain-specific interaction with the
> > > > hardware. A key
> > > > > +requirement for such a split is that there is no dependency on a
> > > > > +physical
> > > > bus,
> > > > > +device, register accesses or regmap support. These individual
> > > > > +devices split
> > > > from
> > > > > +the core cannot live on the platform bus as they are not physical
> > > > > +devices
> > > > that
> > > > > +are controlled by DT/ACPI. The same argument applies for not
> > > > > +using MFD
> > > > in this
> > > > > +scenario as MFD relies on individual function devices being
> > > > > +physical
> > > > devices
> > > > > +that are DT enumerated.
> > > > > +
> > > > > +An example for this kind of requirement is the audio subsystem
> > > > > +where a
> > > > single
> > > > > +IP is handling multiple entities such as HDMI, Soundwire, local
> > > > > +devices
> > > > such as
> > > > > +mics/speakers etc. The split for the core's functionality can be
> > > > > +arbitrary or be defined by the DSP firmware topology and include
> > > > > +hooks for
> > > > test/debug. This
> > > > > +allows for the audio core device to be minimal and focused on
> > > > > +hardware-
> > > > specific
> > > > > +control and communication.
> > > > > +
> > > > > +The ancillary bus is intended to be minimal, generic and avoid
> > > > > +domain-
> > > > specific
> > > > > +assumptions. Each ancillary_device represents a part of its
> > > > > +parent functionality. The generic behavior can be extended and
> > > > > +specialized as
> > > > needed
> > > > > +by encapsulating an ancillary_device within other domain-specific
> > > > structures and
> > > > > +the use of .ops callbacks. Devices on the ancillary bus do not
> > > > > +share any structures and the use of a communication channel with
> > > > > +the parent is domain-specific.
> > > > > +
> > > > > +When Should the Ancillary Bus Be Used
> > > > > +=====================================
> > > > > +
> > > > > +The ancillary bus is to be used when a driver and one or more
> > > > > +kernel
> > > > modules,
> > > > > +who share a common header file with the driver, need a mechanism
> > > > > +to
> > > > connect and
> > > > > +provide access to a shared object allocated by the
> > > > > +ancillary_device's registering driver.  The registering driver
> > > > > +for the ancillary_device(s) and
> > > > the
> > > > > +kernel module(s) registering ancillary_drivers can be from the
> > > > > +same
> > > > subsystem,
> > > > > +or from multiple subsystems.
> > > > > +
> > > > > +The emphasis here is on a common generic interface that keeps
> > > > subsystem
> > > > > +customization out of the bus infrastructure.
> > > > > +
> > > > > +One example could be a multi-port PCI network device that is
> > > > > +rdma-
> > > > capable and
> > > > > +needs to export this functionality and attach to an rdma driver
> > > > > +in another subsystem.  The PCI driver will allocate and register
> > > > > +an ancillary_device for each physical function on the NIC.  The
> > > > > +rdma driver will register an ancillary_driver that will be
> > > > > +matched with and probed for each of these ancillary_devices.
> > > > > +This will give the rdma driver access to the shared
> > > > data/ops
> > > > > +in the PCI drivers shared object to establish a connection with
> > > > > +the PCI
> > > > driver.
> > > > > +
> > > > > +Another use case is for the a PCI device to be split out into
> > > > > +multiple sub functions.  For each sub function an
> > > > > +ancillary_device will be created.  A PCI sub function driver will
> > > > > +bind to such devices that will create its own one or more class
> > > > > +devices.  A PCI sub function ancillary device will likely be
> > > > > +contained in a struct with additional attributes such as user
> > > > > +defined sub function number and optional attributes such as
> > > > > +resources and a link to
> > > > the
> > > > > +parent device.  These attributes could be used by systemd/udev;
> > > > > +and
> > > > hence should
> > > > > +be initialized before a driver binds to an ancillary_device.
> > > > > +
> > > > > +Ancillary Device
> > > > > +================
> > > > > +
> > > > > +An ancillary_device is created and registered to represent a part
> > > > > +of its
> > > > parent
> > > > > +device's functionality. It is given a name that, combined with
> > > > > +the
> > > > registering
> > > > > +drivers KBUILD_MODNAME, creates a match_name that is used for
> > > > > +driver
> > > > binding,
> > > > > +and an id that combined with the match_name provide a unique name
> > > > > +to
> > > > register
> > > > > +with the bus subsystem.
> > > > > +
> > > > > +Registering an ancillary_device is a two-step process.  First you
> > > > > +must call ancillary_device_initialize(), which will check several
> > > > > +aspects of the ancillary_device struct and perform a
> > > > > +device_initialize().  After this step completes, any error state
> > > > > +must have a call to put_device() in its
> > > > resolution
> > > > > +path.  The second step in registering an ancillary_device is to
> > > > > +perform a
> > > > call
> > > > > +to ancillary_device_add(), which will set the name of the device
> > > > > +and add
> > > > the
> > > > > +device to the bus.
> > > > > +
> > > > > +To unregister an ancillary_device, just a call to
> > > > ancillary_device_unregister()
> > > > > +is used.  This will perform both a device_del() and a put_device().
> > > >
> > > > Why did you chose ancillary_device_initialize() and not
> > > > ancillary_device_register() to be paired with
> > ancillary_device_unregister()?
> > > >
> > > > Thanks
> > >
> > > We originally had a single call to ancillary_device_register() that
> > > paired with unregister, but there was an ask to separate the register
> > > into an initialize and add to make the error condition unwind more
> > compartimentalized.
> >
> > It is correct thing to separate, but I would expect:
> > ancillary_device_register()
> > ancillary_device_add()
> >
> device_initialize(), device_add() and device_unregister() is the pattern widely followed in the core.

It doesn't mean that I need to agree with that, right?

>
> > vs.
> > ancillary_device_unregister()
> >
> > It is not a big deal, just curious.
> >
> > The much more big deal is that I'm required to create 1-to-1 mapping
> > between device and driver, and I can't connect all my different modules to
> > one xxx_core.pf.y device in N-to-1 mapping. "N" represents different
> > protocols (IB, ETH, SCSI) and "1" is one PCI core.
>
> For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa, en).
>
> So there should be one ida allocate per mlx5 device type.
>
> Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] = {
> 	"rdma",
> 	"eth",
> 	"vdpa"
> };
>
> Something like for current mlx5_register_device(),

I know it and already implemented it, this is why I'm saying that it is
not what I would expect from the implementation.

It is wrong create mlx5_core.rdma.1 device that is equal to mlx5_core.eth.1
just to connect our mlx5_ib.ko to it, while documentation explains about
creating single PCI code device and other drivers connect to it.

Thanks


> mlx5_register_device()
> {
> 	id = ida_alloc(0, UINT_MAX, GFP_KERNEL);
>
> 	for (i = 0; I < MLX5_INTERFACE_PROTOCOL_MAX; i++) {
> 		adev = kzalloc(sizeof(*adev), GFP_KERNEL);
> 		adev.name = mlx5_adev_names[i];
> 		adev.id = ret;
> 		adev.dev.parent = mlx5_core_dev->device;
> 		adev->coredev = mlx5_core_dev;
> 		ret = ancillary_device_initialize(&adev);
> 		ret = ancillary_device_register(adev);
> }
>
> This will create 3 ancillary devices for each PCI PF/VF/SF.
> mlx5_core.rdma.1
> mlx5_core.eth.1
> mlx5_core.vdpa.1
>
> and mlx5_ib driver will do
>
> ancillary_driver_register()
> {
> 	For ID of mlx5_core.rdma.
> }
>
> mlx5_vdpa driver does,
>
> ancillary_driver_register()
> {
> 	For ID of mlx5_core.vdpa
> }
>
> This is uniform for pf/vf/sf for one or more all protocols.
Parav Pandit Oct. 2, 2020, 5:29 a.m. UTC | #6
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, October 2, 2020 1:02 AM

> > > > > > +Registering an ancillary_device is a two-step process.  First
> > > > > > +you must call ancillary_device_initialize(), which will check
> > > > > > +several aspects of the ancillary_device struct and perform a
> > > > > > +device_initialize().  After this step completes, any error
> > > > > > +state must have a call to put_device() in its
> > > > > resolution
> > > > > > +path.  The second step in registering an ancillary_device is
> > > > > > +to perform a
> > > > > call
> > > > > > +to ancillary_device_add(), which will set the name of the
> > > > > > +device and add
> > > > > the
> > > > > > +device to the bus.
> > > > > > +
> > > > > > +To unregister an ancillary_device, just a call to
> > > > > ancillary_device_unregister()
> > > > > > +is used.  This will perform both a device_del() and a put_device().
> > > > >
> > > > > Why did you chose ancillary_device_initialize() and not
> > > > > ancillary_device_register() to be paired with
> > > ancillary_device_unregister()?
> > > > >
> > > > > Thanks
> > > >
> > > > We originally had a single call to ancillary_device_register()
> > > > that paired with unregister, but there was an ask to separate the
> > > > register into an initialize and add to make the error condition
> > > > unwind more
> > > compartimentalized.
> > >
> > > It is correct thing to separate, but I would expect:
> > > ancillary_device_register()
> > > ancillary_device_add()
> > >
> > device_initialize(), device_add() and device_unregister() is the pattern widely
> followed in the core.
> 
> It doesn't mean that I need to agree with that, right?
>
Right. May be I misunderstood your comment where you said "I expect device_register() and device_add()" in response to "device_initialize() and device_add".
I interpreted your comment as to replace initialize() with register().
Because that is odd naming and completely out of sync from the core APIs.

A helper like below that wraps initialize() and add() is buggy, because when register() returns an error, it doesn't know if should do kfree() or put_device().

ancillary_device_register(adev)
{
  ret = ancillary_device_initialize();
  if (ret)
    return ret;

  ret = ancillary_device_add();
  if (ret)
    put_device(dev);
  return ret;
}

> >
> > > vs.
> > > ancillary_device_unregister()
> > >
> > > It is not a big deal, just curious.
> > >
> > > The much more big deal is that I'm required to create 1-to-1 mapping
> > > between device and driver, and I can't connect all my different
> > > modules to one xxx_core.pf.y device in N-to-1 mapping. "N"
> > > represents different protocols (IB, ETH, SCSI) and "1" is one PCI core.
> >
> > For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa, en).
> >
> > So there should be one ida allocate per mlx5 device type.
> >
> > Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] = {
> > 	"rdma",
> > 	"eth",
> > 	"vdpa"
> > };
> >
> > Something like for current mlx5_register_device(),
> 
> I know it and already implemented it, this is why I'm saying that it is not what I
> would expect from the implementation.
> 
> It is wrong create mlx5_core.rdma.1 device that is equal to mlx5_core.eth.1 just
> to connect our mlx5_ib.ko to it, while documentation explains about creating

Ancillary bus's documentation? If so, it should be corrected.
Do you have specific text snippet to point to that should be fixed?

For purpose of matching service functionality, for each different class (ib, eth, vdpa) there is one ancillary device created.
What exactly is wrong here? (and why is it wrong now and not in previous version of the RFC?)

Do you want to create one device and 3 drivers to bind to it? If you think that way, may be pci core should be extended to support that, ancillary bus may not be required.
But that is different solution than what is being proposed here.
Not sure I understand your comment about wrong create mlx5_core.rdma1 equal to core.eth.1", because they are not equal.
To begin with, they have different id for matching service, and in future it should be extended to pass 'only' needed info.
mlx5_core exposing the 'whole' mlx5_core_dev to other drivers doesn't look correct.

> single PCI code device and other drivers connect to it.
> 
> Thanks
> 
> 
> > mlx5_register_device()
> > {
> > 	id = ida_alloc(0, UINT_MAX, GFP_KERNEL);
> >
> > 	for (i = 0; I < MLX5_INTERFACE_PROTOCOL_MAX; i++) {
> > 		adev = kzalloc(sizeof(*adev), GFP_KERNEL);
> > 		adev.name = mlx5_adev_names[i];
> > 		adev.id = ret;
> > 		adev.dev.parent = mlx5_core_dev->device;
> > 		adev->coredev = mlx5_core_dev;
> > 		ret = ancillary_device_initialize(&adev);
> > 		ret = ancillary_device_register(adev); }
> >
> > This will create 3 ancillary devices for each PCI PF/VF/SF.
> > mlx5_core.rdma.1
> > mlx5_core.eth.1
> > mlx5_core.vdpa.1
> >
> > and mlx5_ib driver will do
> >
> > ancillary_driver_register()
> > {
> > 	For ID of mlx5_core.rdma.
> > }
> >
> > mlx5_vdpa driver does,
> >
> > ancillary_driver_register()
> > {
> > 	For ID of mlx5_core.vdpa
> > }
> >
> > This is uniform for pf/vf/sf for one or more all protocols.
Leon Romanovsky Oct. 2, 2020, 6:20 a.m. UTC | #7
On Fri, Oct 02, 2020 at 05:29:26AM +0000, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Friday, October 2, 2020 1:02 AM
>
> > > > > > > +Registering an ancillary_device is a two-step process.  First
> > > > > > > +you must call ancillary_device_initialize(), which will check
> > > > > > > +several aspects of the ancillary_device struct and perform a
> > > > > > > +device_initialize().  After this step completes, any error
> > > > > > > +state must have a call to put_device() in its
> > > > > > resolution
> > > > > > > +path.  The second step in registering an ancillary_device is
> > > > > > > +to perform a
> > > > > > call
> > > > > > > +to ancillary_device_add(), which will set the name of the
> > > > > > > +device and add
> > > > > > the
> > > > > > > +device to the bus.
> > > > > > > +
> > > > > > > +To unregister an ancillary_device, just a call to
> > > > > > ancillary_device_unregister()
> > > > > > > +is used.  This will perform both a device_del() and a put_device().
> > > > > >
> > > > > > Why did you chose ancillary_device_initialize() and not
> > > > > > ancillary_device_register() to be paired with
> > > > ancillary_device_unregister()?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > We originally had a single call to ancillary_device_register()
> > > > > that paired with unregister, but there was an ask to separate the
> > > > > register into an initialize and add to make the error condition
> > > > > unwind more
> > > > compartimentalized.
> > > >
> > > > It is correct thing to separate, but I would expect:
> > > > ancillary_device_register()
> > > > ancillary_device_add()
> > > >
> > > device_initialize(), device_add() and device_unregister() is the pattern widely
> > followed in the core.
> >
> > It doesn't mean that I need to agree with that, right?
> >
> Right. May be I misunderstood your comment where you said "I expect device_register() and device_add()" in response to "device_initialize() and device_add".
> I interpreted your comment as to replace initialize() with register().
> Because that is odd naming and completely out of sync from the core APIs.
>
> A helper like below that wraps initialize() and add() is buggy, because when register() returns an error, it doesn't know if should do kfree() or put_device().

I wrote above (14 lines above this line) that I understand and support
the request to separate init and add parts. There is only one thing that
I didn't like that we have _unregister() but don't have _register().
It is not a big deal.

>
> ancillary_device_register(adev)
> {
>   ret = ancillary_device_initialize();
>   if (ret)
>     return ret;
>
>   ret = ancillary_device_add();
>   if (ret)
>     put_device(dev);
>   return ret;
> }
>
> > >
> > > > vs.
> > > > ancillary_device_unregister()
> > > >
> > > > It is not a big deal, just curious.
> > > >
> > > > The much more big deal is that I'm required to create 1-to-1 mapping
> > > > between device and driver, and I can't connect all my different
> > > > modules to one xxx_core.pf.y device in N-to-1 mapping. "N"
> > > > represents different protocols (IB, ETH, SCSI) and "1" is one PCI core.
> > >
> > > For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa, en).
> > >
> > > So there should be one ida allocate per mlx5 device type.
> > >
> > > Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] = {
> > > 	"rdma",
> > > 	"eth",
> > > 	"vdpa"
> > > };
> > >
> > > Something like for current mlx5_register_device(),
> >
> > I know it and already implemented it, this is why I'm saying that it is not what I
> > would expect from the implementation.
> >
> > It is wrong create mlx5_core.rdma.1 device that is equal to mlx5_core.eth.1 just
> > to connect our mlx5_ib.ko to it, while documentation explains about creating
>
> Ancillary bus's documentation? If so, it should be corrected.
> Do you have specific text snippet to point to that should be fixed?

+One example could be a multi-port PCI network device that is rdma-capable and
+needs to export this functionality and attach to an rdma driver in another
+subsystem.  The PCI driver will allocate and register an ancillary_device for
+each physical function on the NIC.  The rdma driver will register an
+ancillary_driver that will be matched with and probed for each of these
+ancillary_devices.  This will give the rdma driver access to the shared data/ops
+in the PCI drivers shared object to establish a connection with the PCI driver.

>
> For purpose of matching service functionality, for each different class (ib, eth, vdpa) there is one ancillary device created.
> What exactly is wrong here? (and why is it wrong now and not in previous version of the RFC?)

Here the example of two real systems, see how many links we created to
same mlx5_core PCI logic. I imagine that Qlogic and Chelsio drivers will
look even worse, because they spread over more subsystems than mlx5.

This is first time when I see real implementation of real device.

System with 2 IB and 1 RoCE cards:
[leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
 mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.eth.0
 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.eth.1
 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
 mlx5_core.ib.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
 mlx5_core.ib.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
 mlx5_core.ib.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
 mlx5_core.vdpa.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457

System with RoCE SR-IOV card with 4 VFs:
[leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
 mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
 mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
 mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
 mlx5_core.ib.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.ib.0
 mlx5_core.ib.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.ib.1
 mlx5_core.ib.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.ib.2
 mlx5_core.ib.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.ib.3
 mlx5_core.ib.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.ib.4
 mlx5_core.vdpa.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.ib.0
 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.ib.1
 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.ib.2
 mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.ib.3
 mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.ib.4
[leonro@vm ~]$ rdma dev
0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459

>
> Do you want to create one device and 3 drivers to bind to it? If you think that way, may be pci core should be extended to support that, ancillary bus may not be required.
> But that is different solution than what is being proposed here.

This is my expectation, I see this virtual bus as glue logic for the
drivers between different subsystems.

> Not sure I understand your comment about wrong create mlx5_core.rdma1 equal to core.eth.1", because they are not equal.
> To begin with, they have different id for matching service, and in future it should be extended to pass 'only' needed info.
> mlx5_core exposing the 'whole' mlx5_core_dev to other drivers doesn't look correct.

Maybe, what I would like to see is:
[leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
 mlx5_core.pf.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.pf.0
 mlx5_core.vf.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vf.0
 mlx5_core.vf.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vf.1
 mlx5_core.vf.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vf.2
 mlx5_core.vf.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vf.3

The mlx5_ib, mlx5_vdpa and mlx5_en will connect to one of mlx5_core.XX.YY devices.

Thanks
Parav Pandit Oct. 2, 2020, 8:42 a.m. UTC | #8
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, October 2, 2020 11:50 AM
> 
> On Fri, Oct 02, 2020 at 05:29:26AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Friday, October 2, 2020 1:02 AM
> >
> > > > > > > > +Registering an ancillary_device is a two-step process.
> > > > > > > > +First you must call ancillary_device_initialize(), which
> > > > > > > > +will check several aspects of the ancillary_device struct
> > > > > > > > +and perform a device_initialize().  After this step
> > > > > > > > +completes, any error state must have a call to
> > > > > > > > +put_device() in its
> > > > > > > resolution
> > > > > > > > +path.  The second step in registering an ancillary_device
> > > > > > > > +is to perform a
> > > > > > > call
> > > > > > > > +to ancillary_device_add(), which will set the name of the
> > > > > > > > +device and add
> > > > > > > the
> > > > > > > > +device to the bus.
> > > > > > > > +
> > > > > > > > +To unregister an ancillary_device, just a call to
> > > > > > > ancillary_device_unregister()
> > > > > > > > +is used.  This will perform both a device_del() and a
> put_device().
> > > > > > >
> > > > > > > Why did you chose ancillary_device_initialize() and not
> > > > > > > ancillary_device_register() to be paired with
> > > > > ancillary_device_unregister()?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > We originally had a single call to ancillary_device_register()
> > > > > > that paired with unregister, but there was an ask to separate
> > > > > > the register into an initialize and add to make the error
> > > > > > condition unwind more
> > > > > compartimentalized.
> > > > >
> > > > > It is correct thing to separate, but I would expect:
> > > > > ancillary_device_register()
> > > > > ancillary_device_add()
> > > > >
> > > > device_initialize(), device_add() and device_unregister() is the
> > > > pattern widely
> > > followed in the core.
> > >
> > > It doesn't mean that I need to agree with that, right?
> > >
> > Right. May be I misunderstood your comment where you said "I expect
> device_register() and device_add()" in response to "device_initialize() and
> device_add".
> > I interpreted your comment as to replace initialize() with register().
> > Because that is odd naming and completely out of sync from the core APIs.
> >
> > A helper like below that wraps initialize() and add() is buggy, because when
> register() returns an error, it doesn't know if should do kfree() or
> put_device().
> 
> I wrote above (14 lines above this line) that I understand and support the
> request to separate init and add parts. There is only one thing that I didn't
> like that we have _unregister() but don't have _register().
> It is not a big deal.
> 
> >
> > ancillary_device_register(adev)
> > {
> >   ret = ancillary_device_initialize();
> >   if (ret)
> >     return ret;
> >
> >   ret = ancillary_device_add();
> >   if (ret)
> >     put_device(dev);
> >   return ret;
> > }
> >
> > > >
> > > > > vs.
> > > > > ancillary_device_unregister()
> > > > >
> > > > > It is not a big deal, just curious.
> > > > >
> > > > > The much more big deal is that I'm required to create 1-to-1
> > > > > mapping between device and driver, and I can't connect all my
> > > > > different modules to one xxx_core.pf.y device in N-to-1 mapping. "N"
> > > > > represents different protocols (IB, ETH, SCSI) and "1" is one PCI core.
> > > >
> > > > For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa,
> en).
> > > >
> > > > So there should be one ida allocate per mlx5 device type.
> > > >
> > > > Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] =
> {
> > > > 	"rdma",
> > > > 	"eth",
> > > > 	"vdpa"
> > > > };
> > > >
> > > > Something like for current mlx5_register_device(),
> > >
> > > I know it and already implemented it, this is why I'm saying that it
> > > is not what I would expect from the implementation.
> > >
> > > It is wrong create mlx5_core.rdma.1 device that is equal to
> > > mlx5_core.eth.1 just to connect our mlx5_ib.ko to it, while
> > > documentation explains about creating
> >
> > Ancillary bus's documentation? If so, it should be corrected.
> > Do you have specific text snippet to point to that should be fixed?
> 
> +One example could be a multi-port PCI network device that is
> +rdma-capable and needs to export this functionality and attach to an
> +rdma driver in another subsystem.  The PCI driver will allocate and
> +register an ancillary_device for each physical function on the NIC.
> +The rdma driver will register an ancillary_driver that will be matched
> +with and probed for each of these ancillary_devices.  This will give
> +the rdma driver access to the shared data/ops in the PCI drivers shared
> object to establish a connection with the PCI driver.
> 
> >
> > For purpose of matching service functionality, for each different class (ib,
> eth, vdpa) there is one ancillary device created.
> > What exactly is wrong here? (and why is it wrong now and not in
> > previous version of the RFC?)
> 
> Here the example of two real systems, see how many links we created to
> same mlx5_core PCI logic. I imagine that Qlogic and Chelsio drivers will look
> even worse, because they spread over more subsystems than mlx5.
> 
> This is first time when I see real implementation of real device.
> 
> System with 2 IB and 1 RoCE cards:
> [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
>  mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.eth.0
>  mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.eth.1
>  mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
This gives you the ability to not load the netdevice and rdma device of a VF and only load the vdpa device.
These are real use case that users have asked for.
In use case one, they are only interested in rdma device.
In second use case only vdpa device.
How shall one achieve that without spinning of the device for each class?

>  mlx5_core.ib.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
>  mlx5_core.ib.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
>  mlx5_core.ib.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
>  mlx5_core.vdpa.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
>  mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
>  mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
> [leonro@vm ~]$ rdma dev
> 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455
> sys_image_guid 5254:00c0:fe12:3455
> 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456
> sys_image_guid 5254:00c0:fe12:3456
> 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457
> sys_image_guid 5254:00c0:fe12:3457
> 
> System with RoCE SR-IOV card with 4 VFs:
> 
> Maybe, what I would like to see is:
> [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
>  mlx5_core.pf.0 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.pf.0
>  mlx5_core.vf.0 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vf.0
>  mlx5_core.vf.1 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vf.1
>  mlx5_core.vf.2 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vf.2
>  mlx5_core.vf.3 ->
> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vf.3
> 
> The mlx5_ib, mlx5_vdpa and mlx5_en will connect to one of mlx5_core.XX.YY
> devices.
In that case, I really don't see the need of creating ancillary device at all.
A generic wrapper around blocking_notifier_chain_register() with a notion of id, and some well defined structure as library can serve the purpose.
But it will miss out power suspend() resume() hooks, which we get for free now using ancillary device, in addition to the ability to selectively load only few class device.
Each 'struct device's is close to 744 bytes in size in 5.9 kernel.
Is so many 'struct device' of ancillary type a real problem, that aims to address these use cases?

> 
> Thanks
Leon Romanovsky Oct. 2, 2020, 11:13 a.m. UTC | #9
On Fri, Oct 02, 2020 at 08:42:13AM +0000, Parav Pandit wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Friday, October 2, 2020 11:50 AM
> >
> > On Fri, Oct 02, 2020 at 05:29:26AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Leon Romanovsky <leon@kernel.org>
> > > > Sent: Friday, October 2, 2020 1:02 AM
> > >
> > > > > > > > > +Registering an ancillary_device is a two-step process.
> > > > > > > > > +First you must call ancillary_device_initialize(), which
> > > > > > > > > +will check several aspects of the ancillary_device struct
> > > > > > > > > +and perform a device_initialize().  After this step
> > > > > > > > > +completes, any error state must have a call to
> > > > > > > > > +put_device() in its
> > > > > > > > resolution
> > > > > > > > > +path.  The second step in registering an ancillary_device
> > > > > > > > > +is to perform a
> > > > > > > > call
> > > > > > > > > +to ancillary_device_add(), which will set the name of the
> > > > > > > > > +device and add
> > > > > > > > the
> > > > > > > > > +device to the bus.
> > > > > > > > > +
> > > > > > > > > +To unregister an ancillary_device, just a call to
> > > > > > > > ancillary_device_unregister()
> > > > > > > > > +is used.  This will perform both a device_del() and a
> > put_device().
> > > > > > > >
> > > > > > > > Why did you chose ancillary_device_initialize() and not
> > > > > > > > ancillary_device_register() to be paired with
> > > > > > ancillary_device_unregister()?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > We originally had a single call to ancillary_device_register()
> > > > > > > that paired with unregister, but there was an ask to separate
> > > > > > > the register into an initialize and add to make the error
> > > > > > > condition unwind more
> > > > > > compartimentalized.
> > > > > >
> > > > > > It is correct thing to separate, but I would expect:
> > > > > > ancillary_device_register()
> > > > > > ancillary_device_add()
> > > > > >
> > > > > device_initialize(), device_add() and device_unregister() is the
> > > > > pattern widely
> > > > followed in the core.
> > > >
> > > > It doesn't mean that I need to agree with that, right?
> > > >
> > > Right. May be I misunderstood your comment where you said "I expect
> > device_register() and device_add()" in response to "device_initialize() and
> > device_add".
> > > I interpreted your comment as to replace initialize() with register().
> > > Because that is odd naming and completely out of sync from the core APIs.
> > >
> > > A helper like below that wraps initialize() and add() is buggy, because when
> > register() returns an error, it doesn't know if should do kfree() or
> > put_device().
> >
> > I wrote above (14 lines above this line) that I understand and support the
> > request to separate init and add parts. There is only one thing that I didn't
> > like that we have _unregister() but don't have _register().
> > It is not a big deal.
> >
> > >
> > > ancillary_device_register(adev)
> > > {
> > >   ret = ancillary_device_initialize();
> > >   if (ret)
> > >     return ret;
> > >
> > >   ret = ancillary_device_add();
> > >   if (ret)
> > >     put_device(dev);
> > >   return ret;
> > > }
> > >
> > > > >
> > > > > > vs.
> > > > > > ancillary_device_unregister()
> > > > > >
> > > > > > It is not a big deal, just curious.
> > > > > >
> > > > > > The much more big deal is that I'm required to create 1-to-1
> > > > > > mapping between device and driver, and I can't connect all my
> > > > > > different modules to one xxx_core.pf.y device in N-to-1 mapping. "N"
> > > > > > represents different protocols (IB, ETH, SCSI) and "1" is one PCI core.
> > > > >
> > > > > For one mlx5 (pf/vf/sf) device, there are three class erivers (ib, vdpa,
> > en).
> > > > >
> > > > > So there should be one ida allocate per mlx5 device type.
> > > > >
> > > > > Static const mlx5_adev_names[MLX5_INTERFACE_PROTOCOL_MAX] =
> > {
> > > > > 	"rdma",
> > > > > 	"eth",
> > > > > 	"vdpa"
> > > > > };
> > > > >
> > > > > Something like for current mlx5_register_device(),
> > > >
> > > > I know it and already implemented it, this is why I'm saying that it
> > > > is not what I would expect from the implementation.
> > > >
> > > > It is wrong create mlx5_core.rdma.1 device that is equal to
> > > > mlx5_core.eth.1 just to connect our mlx5_ib.ko to it, while
> > > > documentation explains about creating
> > >
> > > Ancillary bus's documentation? If so, it should be corrected.
> > > Do you have specific text snippet to point to that should be fixed?
> >
> > +One example could be a multi-port PCI network device that is
> > +rdma-capable and needs to export this functionality and attach to an
> > +rdma driver in another subsystem.  The PCI driver will allocate and
> > +register an ancillary_device for each physical function on the NIC.
> > +The rdma driver will register an ancillary_driver that will be matched
> > +with and probed for each of these ancillary_devices.  This will give
> > +the rdma driver access to the shared data/ops in the PCI drivers shared
> > object to establish a connection with the PCI driver.
> >
> > >
> > > For purpose of matching service functionality, for each different class (ib,
> > eth, vdpa) there is one ancillary device created.
> > > What exactly is wrong here? (and why is it wrong now and not in
> > > previous version of the RFC?)
> >
> > Here the example of two real systems, see how many links we created to
> > same mlx5_core PCI logic. I imagine that Qlogic and Chelsio drivers will look
> > even worse, because they spread over more subsystems than mlx5.
> >
> > This is first time when I see real implementation of real device.
> >
> > System with 2 IB and 1 RoCE cards:
> > [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
> >  mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.eth.0
> >  mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.eth.1
> >  mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
> This gives you the ability to not load the netdevice and rdma device of a VF and only load the vdpa device.
> These are real use case that users have asked for.
> In use case one, they are only interested in rdma device.
> In second use case only vdpa device.
> How shall one achieve that without spinning of the device for each class?

Why will it be different if ancillary device is small PCI core?
If you want RDMA, you will use specific ancillary driver that connects
to that small PCI logic.

>
> >  mlx5_core.ib.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
> >  mlx5_core.ib.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
> >  mlx5_core.ib.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
> >  mlx5_core.vdpa.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.ib.0
> >  mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.ib.1
> >  mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.ib.2
> > [leonro@vm ~]$ rdma dev
> > 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455
> > sys_image_guid 5254:00c0:fe12:3455
> > 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456
> > sys_image_guid 5254:00c0:fe12:3456
> > 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457
> > sys_image_guid 5254:00c0:fe12:3457
> >
> > System with RoCE SR-IOV card with 4 VFs:
> >
> > Maybe, what I would like to see is:
> > [leonro@vm ~]$ ls -l /sys/bus/ancillary/devices/
> >  mlx5_core.pf.0 ->
> > ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.pf.0
> >  mlx5_core.vf.0 ->
> > ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vf.0
> >  mlx5_core.vf.1 ->
> > ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vf.1
> >  mlx5_core.vf.2 ->
> > ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vf.2
> >  mlx5_core.vf.3 ->
> > ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vf.3
> >
> > The mlx5_ib, mlx5_vdpa and mlx5_en will connect to one of mlx5_core.XX.YY
> > devices.
> In that case, I really don't see the need of creating ancillary device at all.
> A generic wrapper around blocking_notifier_chain_register() with a notion of id, and some well defined structure as library can serve the purpose.

Ancillary bus provides generic implementation instead of existing copy/paste code.

> But it will miss out power suspend() resume() hooks, which we get for free now using ancillary device, in addition to the ability to selectively load only few class device.

While I'm interested to load specific device and limit supported
classes later and not vice-versa as it is now.

> Each 'struct device's is close to 744 bytes in size in 5.9 kernel.
> Is so many 'struct device' of ancillary type a real problem, that aims to address these use cases?

Being nice to the users and provide clean abstraction are important
goals too.

Thanks

>
> >
> > Thanks
Parav Pandit Oct. 2, 2020, 11:27 a.m. UTC | #10
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, October 2, 2020 4:44 PM

[..]
> > > ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
> > This gives you the ability to not load the netdevice and rdma device of a VF
> and only load the vdpa device.
> > These are real use case that users have asked for.
> > In use case one, they are only interested in rdma device.
> > In second use case only vdpa device.
> > How shall one achieve that without spinning of the device for each class?
> 
> Why will it be different if ancillary device is small PCI core?
> If you want RDMA, you will use specific ancillary driver that connects to that
> small PCI logic.

I didn't follow, wwhat is PCI core and PCI logic in this context?

Not sure if you understood the use case.
Let me try again.
Let say there are 4 VFs enabled.
User would not like to create netdev for 3 VFs (0 to 2) ; user only wants rdma device for these VFs 0 to 2.
User wants only vdpa device for 4th VF.
User doesn't want to create rdma device and netdevice for the 4th VF.
How one shall achieve this?
It is easily achievable with current ancillary device instantiation per class proposal.

> Being nice to the users and provide clean abstraction are important goals too.
Which part of this makes not_nice_to_users and what is not abstracted.
I lost you.
Leon Romanovsky Oct. 2, 2020, 11:45 a.m. UTC | #11
On Fri, Oct 02, 2020 at 11:27:43AM +0000, Parav Pandit wrote:
>
>
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Friday, October 2, 2020 4:44 PM
>
> [..]
> > > > ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
> > > This gives you the ability to not load the netdevice and rdma device of a VF
> > and only load the vdpa device.
> > > These are real use case that users have asked for.
> > > In use case one, they are only interested in rdma device.
> > > In second use case only vdpa device.
> > > How shall one achieve that without spinning of the device for each class?
> >
> > Why will it be different if ancillary device is small PCI core?
> > If you want RDMA, you will use specific ancillary driver that connects to that
> > small PCI logic.
>
> I didn't follow, wwhat is PCI core and PCI logic in this context?

mlx5_core is PCI core/logic - ancillary device
mlx5_ib/mlx5_en/mlx5_vdpa - ancillary drivers

>
> Not sure if you understood the use case.
> Let me try again.
> Let say there are 4 VFs enabled.
> User would not like to create netdev for 3 VFs (0 to 2) ; user only wants rdma device for these VFs 0 to 2.
> User wants only vdpa device for 4th VF.
> User doesn't want to create rdma device and netdevice for the 4th VF.
> How one shall achieve this?

It depends on how N-to-1 bus will be implemented. For example, devlink
already allows to disable RoCE on specific function, nothing prohibits
to extend it to support other classes.

> It is easily achievable with current ancillary device instantiation per class proposal.

It is byproduct of 1-to-1 connection and not specific design decision.

>
> > Being nice to the users and provide clean abstraction are important goals too.
> Which part of this makes not_nice_to_users and what is not abstracted.
> I lost you.

Your use case perfectly presented not_nice_to_users thing. Users are
interested to work on functions (VF/PF/SF) and configure them without
need to dig into the sysfs directories to connect ancillary classes
and their indexes to the real functions.

Thanks
Parav Pandit Oct. 2, 2020, 11:56 a.m. UTC | #12
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Friday, October 2, 2020 5:16 PM
> 
> On Fri, Oct 02, 2020 at 11:27:43AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Friday, October 2, 2020 4:44 PM
> >
> > [..]
> > > > > ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
> > > > This gives you the ability to not load the netdevice and rdma
> > > > device of a VF
> > > and only load the vdpa device.
> > > > These are real use case that users have asked for.
> > > > In use case one, they are only interested in rdma device.
> > > > In second use case only vdpa device.
> > > > How shall one achieve that without spinning of the device for each
> class?
> > >
> > > Why will it be different if ancillary device is small PCI core?
> > > If you want RDMA, you will use specific ancillary driver that
> > > connects to that small PCI logic.
> >
> > I didn't follow, wwhat is PCI core and PCI logic in this context?
> 
> mlx5_core is PCI core/logic - ancillary device mlx5_ib/mlx5_en/mlx5_vdpa -
> ancillary drivers
> 
> >
> > Not sure if you understood the use case.
> > Let me try again.
> > Let say there are 4 VFs enabled.
> > User would not like to create netdev for 3 VFs (0 to 2) ; user only wants
> rdma device for these VFs 0 to 2.
> > User wants only vdpa device for 4th VF.
> > User doesn't want to create rdma device and netdevice for the 4th VF.
> > How one shall achieve this?
> 
> It depends on how N-to-1 bus will be implemented. For example, devlink
> already allows to disable RoCE on specific function, nothing prohibits to
> extend it to support other classes.

Sure. Please go through the our internal RFC dated 7/7/20 with subject "devlink device params to disable rdma, netdev" which exactly achieves similar.

And recent 3rd internal RFC "devlink to set driver parameters" discussion with Eli, Jason, Jiri and others that describes to have one ancillary_device_per_class instead of devlink.

Lets discuss it offline as we have multiple proposals floating.

> 
> > It is easily achievable with current ancillary device instantiation per class
> proposal.
> 
> It is byproduct of 1-to-1 connection and not specific design decision.
> 
> >
> > > Being nice to the users and provide clean abstraction are important goals
> too.
> > Which part of this makes not_nice_to_users and what is not abstracted.
> > I lost you.
> 
> Your use case perfectly presented not_nice_to_users thing. Users are
> interested to work on functions (VF/PF/SF) and configure them without
> need to dig into the sysfs directories to connect ancillary classes and their
> indexes to the real functions.
> 
> Thanks
diff mbox series

Patch

diff --git a/Documentation/driver-api/ancillary_bus.rst b/Documentation/driver-api/ancillary_bus.rst
new file mode 100644
index 000000000000..0a11979aa927
--- /dev/null
+++ b/Documentation/driver-api/ancillary_bus.rst
@@ -0,0 +1,230 @@ 
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=============
+Ancillary Bus
+=============
+
+In some subsystems, the functionality of the core device (PCI/ACPI/other) is
+too complex for a single device to be managed as a monolithic block or a part of
+the functionality needs to be exposed to a different subsystem.  Splitting the
+functionality into smaller orthogonal devices would make it easier to manage
+data, power management and domain-specific interaction with the hardware. A key
+requirement for such a split is that there is no dependency on a physical bus,
+device, register accesses or regmap support. These individual devices split from
+the core cannot live on the platform bus as they are not physical devices that
+are controlled by DT/ACPI. The same argument applies for not using MFD in this
+scenario as MFD relies on individual function devices being physical devices
+that are DT enumerated.
+
+An example for this kind of requirement is the audio subsystem where a single
+IP is handling multiple entities such as HDMI, Soundwire, local devices such as
+mics/speakers etc. The split for the core's functionality can be arbitrary or
+be defined by the DSP firmware topology and include hooks for test/debug. This
+allows for the audio core device to be minimal and focused on hardware-specific
+control and communication.
+
+The ancillary bus is intended to be minimal, generic and avoid domain-specific
+assumptions. Each ancillary_device represents a part of its parent
+functionality. The generic behavior can be extended and specialized as needed
+by encapsulating an ancillary_device within other domain-specific structures and
+the use of .ops callbacks. Devices on the ancillary bus do not share any
+structures and the use of a communication channel with the parent is
+domain-specific.
+
+When Should the Ancillary Bus Be Used
+=====================================
+
+The ancillary bus is to be used when a driver and one or more kernel modules,
+who share a common header file with the driver, need a mechanism to connect and
+provide access to a shared object allocated by the ancillary_device's
+registering driver.  The registering driver for the ancillary_device(s) and the
+kernel module(s) registering ancillary_drivers can be from the same subsystem,
+or from multiple subsystems.
+
+The emphasis here is on a common generic interface that keeps subsystem
+customization out of the bus infrastructure.
+
+One example could be a multi-port PCI network device that is rdma-capable and
+needs to export this functionality and attach to an rdma driver in another
+subsystem.  The PCI driver will allocate and register an ancillary_device for
+each physical function on the NIC.  The rdma driver will register an
+ancillary_driver that will be matched with and probed for each of these
+ancillary_devices.  This will give the rdma driver access to the shared data/ops
+in the PCI drivers shared object to establish a connection with the PCI driver.
+
+Another use case is for the a PCI device to be split out into multiple sub
+functions.  For each sub function an ancillary_device will be created.  A PCI
+sub function driver will bind to such devices that will create its own one or
+more class devices.  A PCI sub function ancillary device will likely be
+contained in a struct with additional attributes such as user defined sub
+function number and optional attributes such as resources and a link to the
+parent device.  These attributes could be used by systemd/udev; and hence should
+be initialized before a driver binds to an ancillary_device.
+
+Ancillary Device
+================
+
+An ancillary_device is created and registered to represent a part of its parent
+device's functionality. It is given a name that, combined with the registering
+drivers KBUILD_MODNAME, creates a match_name that is used for driver binding,
+and an id that combined with the match_name provide a unique name to register
+with the bus subsystem.
+
+Registering an ancillary_device is a two-step process.  First you must call
+ancillary_device_initialize(), which will check several aspects of the
+ancillary_device struct and perform a device_initialize().  After this step
+completes, any error state must have a call to put_device() in its resolution
+path.  The second step in registering an ancillary_device is to perform a call
+to ancillary_device_add(), which will set the name of the device and add the
+device to the bus.
+
+To unregister an ancillary_device, just a call to ancillary_device_unregister()
+is used.  This will perform both a device_del() and a put_device().
+
+.. code-block:: c
+
+	struct ancillary_device {
+		struct device dev;
+                const char *name;
+		u32 id;
+	};
+
+If two ancillary_devices both with a match_name "mod.foo" are registered onto
+the bus, they must have unique id values (e.g. "x" and "y") so that the
+registered devices names will be "mod.foo.x" and "mod.foo.y".  If match_name +
+id are not unique, then the device_add will fail and generate an error message.
+
+The ancillary_device.dev.type.release or ancillary_device.dev.release must be
+populated with a non-NULL pointer to successfully register the ancillary_device.
+
+The ancillary_device.dev.parent must also be populated.
+
+Ancillary Device Memory Model and Lifespan
+------------------------------------------
+
+When a kernel driver registers an ancillary_device on the ancillary bus, we will
+use the nomenclature to refer to this kernel driver as a registering driver.  It
+is the entity that will allocate memory for the ancillary_device and register it
+on the ancillary bus.  It is important to note that, as opposed to the platform
+bus, the registering driver is wholly responsible for the management for the
+memory used for the driver object.
+
+A parent object, defined in the shared header file, will contain the
+ancillary_device.  It will also contain a pointer to the shared object(s), which
+will also be defined in the shared header.  Both the parent object and the
+shared object(s) will be allocated by the registering driver.  This layout
+allows the ancillary_driver's registering module to perform a container_of()
+call to go from the pointer to the ancillary_device, that is passed during the
+call to the ancillary_driver's probe function, up to the parent object, and then
+have access to the shared object(s).
+
+The memory for the ancillary_device will be freed only in its release()
+callback flow as defined by its registering driver.
+
+The memory for the shared object(s) must have a lifespan equal to, or greater
+than, the lifespan of the memory for the ancillary_device.  The ancillary_driver
+should only consider that this shared object is valid as long as the
+ancillary_device is still registered on the ancillary bus.  It is up to the
+registering driver to manage (e.g. free or keep available) the memory for the
+shared object beyond the life of the ancillary_device.
+
+Registering driver must unregister all ancillary devices before its registering
+parent device's remove() is completed.
+
+Ancillary Drivers
+=================
+
+Ancillary drivers follow the standard driver model convention, where
+discovery/enumeration is handled by the core, and drivers
+provide probe() and remove() methods. They support power management
+and shutdown notifications using the standard conventions.
+
+.. code-block:: c
+
+	struct ancillary_driver {
+		int (*probe)(struct ancillary_device *,
+                             const struct ancillary_device_id *id);
+		int (*remove)(struct ancillary_device *);
+		void (*shutdown)(struct ancillary_device *);
+		int (*suspend)(struct ancillary_device *, pm_message_t);
+		int (*resume)(struct ancillary_device *);
+		struct device_driver driver;
+		const struct ancillary_device_id *id_table;
+	};
+
+Ancillary drivers register themselves with the bus by calling
+ancillary_driver_register(). The id_table contains the match_names of ancillary
+devices that a driver can bind with.
+
+Example Usage
+=============
+
+Ancillary devices are created and registered by a subsystem-level core device
+that needs to break up its functionality into smaller fragments. One way to
+extend the scope of an ancillary_device would be to encapsulate it within a
+domain-specific structure defined by the parent device. This structure contains
+the ancillary_device and any associated shared data/callbacks needed to
+establish the connection with the parent.
+
+An example would be:
+
+.. code-block:: c
+
+        struct foo {
+		struct ancillary_device ancildev;
+		void (*connect)(struct ancillary_device *ancildev);
+		void (*disconnect)(struct ancillary_device *ancildev);
+		void *data;
+        };
+
+The parent device would then register the ancillary_device by calling
+ancillary_device_initialize(), and then ancillary_device_add(), with the pointer
+to the ancildev member of the above structure. The parent would provide a name
+for the ancillary_device that, combined with the parent's KBUILD_MODNAME, will
+create a match_name that will be used for matching and binding with a driver.
+
+Whenever an ancillary_driver is registered, based on the match_name, the
+ancillary_driver's probe() is invoked for the matching devices.  The
+ancillary_driver can also be encapsulated inside custom drivers that make the
+core device's functionality extensible by adding additional domain-specific ops
+as follows:
+
+.. code-block:: c
+
+	struct my_ops {
+		void (*send)(struct ancillary_device *ancildev);
+		void (*receive)(struct ancillary_device *ancildev);
+	};
+
+
+	struct my_driver {
+		struct ancillary_driver ancillary_drv;
+		const struct my_ops ops;
+	};
+
+An example of this type of usage would be:
+
+.. code-block:: c
+
+	const struct ancillary_device_id my_ancillary_id_table[] = {
+		{ .name = "foo_mod.foo_dev" },
+		{ },
+	};
+
+	const struct my_ops my_custom_ops = {
+		.send = my_tx,
+		.receive = my_rx,
+	};
+
+	const struct my_driver my_drv = {
+		.ancillary_drv = {
+			.driver = {
+				.name = "myancillarydrv",
+			},
+			.id_table = my_ancillary_id_table,
+			.probe = my_probe,
+			.remove = my_remove,
+			.shutdown = my_shutdown,
+		},
+		.ops = my_custom_ops,
+	};
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 5ef2cfe3a16b..9584ac2ed1f5 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -74,6 +74,7 @@  available subsections can be seen below.
    thermal/index
    fpga/index
    acpi/index
+   ancillary_bus
    backlight/lp855x-driver.rst
    connector
    console
diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 0c262c2aeaf2..ba82a045b847 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -5,6 +5,9 @@ 
 
 menu "Bus devices"
 
+config ANCILLARY_BUS
+       tristate
+
 config ARM_CCI
 	bool
 
diff --git a/drivers/bus/Makefile b/drivers/bus/Makefile
index 397e35392bff..1fd238094543 100644
--- a/drivers/bus/Makefile
+++ b/drivers/bus/Makefile
@@ -3,6 +3,9 @@ 
 # Makefile for the bus drivers.
 #
 
+#Ancillary bus driver
+obj-$(CONFIG_ANCILLARY_BUS)	+= ancillary.o
+
 # Interconnect bus drivers for ARM platforms
 obj-$(CONFIG_ARM_CCI)		+= arm-cci.o
 obj-$(CONFIG_ARM_INTEGRATOR_LM)	+= arm-integrator-lm.o
diff --git a/drivers/bus/ancillary.c b/drivers/bus/ancillary.c
new file mode 100644
index 000000000000..2d940fe5717a
--- /dev/null
+++ b/drivers/bus/ancillary.c
@@ -0,0 +1,191 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Software based bus for Ancillary devices
+ *
+ * Copyright (c) 2019-2020 Intel Corporation
+ *
+ * Please see Documentation/driver-api/ancillary_bus.rst for more information.
+ */
+
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
+#include <linux/string.h>
+#include <linux/ancillary_bus.h>
+
+static const struct ancillary_device_id *ancillary_match_id(const struct ancillary_device_id *id,
+							    const struct ancillary_device *ancildev)
+{
+
+	while (id->name[0]) {
+		const char *p = strrchr(dev_name(&ancildev->dev), '.');
+		int match_size;
+
+		if (!p)
+			continue;
+		match_size = p - dev_name(&ancildev->dev);
+
+		/* use dev_name(&ancildev->dev) prefix before last '.' char to match to */
+		if (!strncmp(dev_name(&ancildev->dev), id->name, match_size))
+			return id;
+		id++;
+	}
+	return NULL;
+}
+
+static int ancillary_match(struct device *dev, struct device_driver *drv)
+{
+	struct ancillary_device *ancildev = to_ancillary_dev(dev);
+	struct ancillary_driver *ancildrv = to_ancillary_drv(drv);
+
+	return !!ancillary_match_id(ancildrv->id_table, ancildev);
+}
+
+static int ancillary_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+	const char *name, *p;
+
+	name = dev_name(dev);
+	p = strrchr(name, '.');
+
+	return add_uevent_var(env, "MODALIAS=%s%.*s", ANCILLARY_MODULE_PREFIX, (int)(p - name),
+			      name);
+}
+
+static const struct dev_pm_ops ancillary_dev_pm_ops = {
+	SET_RUNTIME_PM_OPS(pm_generic_runtime_suspend, pm_generic_runtime_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_generic_suspend, pm_generic_resume)
+};
+
+struct bus_type ancillary_bus_type = {
+	.name = "ancillary",
+	.match = ancillary_match,
+	.uevent = ancillary_uevent,
+	.pm = &ancillary_dev_pm_ops,
+};
+
+/**
+ * ancillary_device_initialize - check ancillary_device and initialize
+ * @ancildev: ancillary device struct
+ */
+int ancillary_device_initialize(struct ancillary_device *ancildev)
+{
+	struct device *dev = &ancildev->dev;
+
+	dev->bus = &ancillary_bus_type;
+
+	if (WARN_ON(!dev->parent) || WARN_ON(!ancildev->name) ||
+	    WARN_ON(!(dev->type && dev->type->release) && !dev->release))
+		return -EINVAL;
+
+	device_initialize(&ancildev->dev);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ancillary_device_initialize);
+
+/**
+ * __ancillary_device_add - add an ancillary bus device
+ * @ancildev: ancillary bus device to add to the bus
+ * @modname: name of the parent device's driver module
+ */
+int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname)
+{
+	struct device *dev = &ancildev->dev;
+	int ret;
+
+	if (WARN_ON(!modname))
+		return -EINVAL;
+
+	ret = dev_set_name(dev, "%s.%s.%d", modname, ancildev->name, ancildev->id);
+	if (ret) {
+		dev_err(dev->parent, "dev_set_name failed for device: %d\n", ret);
+		return ret;
+	}
+
+	ret = device_add(dev);
+	if (ret)
+		dev_err(dev, "adding device failed!: %d\n", ret);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__ancillary_device_add);
+
+static int ancillary_probe_driver(struct device *dev)
+{
+	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
+	struct ancillary_device *ancildev = to_ancillary_dev(dev);
+	int ret;
+
+	ret = dev_pm_domain_attach(dev, true);
+	if (ret) {
+		dev_warn(&ancildev->dev, "Failed to attach to PM Domain : %d\n", ret);
+		return ret;
+	}
+
+	ret = ancildrv->probe(ancildev, ancillary_match_id(ancildrv->id_table, ancildev));
+	if (ret)
+		dev_pm_domain_detach(dev, true);
+
+	return ret;
+}
+
+static int ancillary_remove_driver(struct device *dev)
+{
+	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
+	struct ancillary_device *ancildev = to_ancillary_dev(dev);
+	int ret;
+
+	ret = ancildrv->remove(ancildev);
+	dev_pm_domain_detach(dev, true);
+
+	return ret;
+}
+
+static void ancillary_shutdown_driver(struct device *dev)
+{
+	struct ancillary_driver *ancildrv = to_ancillary_drv(dev->driver);
+	struct ancillary_device *ancildev = to_ancillary_dev(dev);
+
+	ancildrv->shutdown(ancildev);
+}
+
+/**
+ * __ancillary_driver_register - register a driver for ancillary bus devices
+ * @ancildrv: ancillary_driver structure
+ * @owner: owning module/driver
+ */
+int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct module *owner)
+{
+	if (WARN_ON(!ancildrv->probe) || WARN_ON(!ancildrv->remove) ||
+	    WARN_ON(!ancildrv->shutdown) || WARN_ON(!ancildrv->id_table))
+		return -EINVAL;
+
+	ancildrv->driver.owner = owner;
+	ancildrv->driver.bus = &ancillary_bus_type;
+	ancildrv->driver.probe = ancillary_probe_driver;
+	ancildrv->driver.remove = ancillary_remove_driver;
+	ancildrv->driver.shutdown = ancillary_shutdown_driver;
+
+	return driver_register(&ancildrv->driver);
+}
+EXPORT_SYMBOL_GPL(__ancillary_driver_register);
+
+static int __init ancillary_bus_init(void)
+{
+	return bus_register(&ancillary_bus_type);
+}
+
+static void __exit ancillary_bus_exit(void)
+{
+	bus_unregister(&ancillary_bus_type);
+}
+
+module_init(ancillary_bus_init);
+module_exit(ancillary_bus_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Ancillary Bus");
+MODULE_AUTHOR("David Ertman <david.m.ertman@intel.com>");
+MODULE_AUTHOR("Kiran Patil <kiran.patil@intel.com>");
diff --git a/include/linux/ancillary_bus.h b/include/linux/ancillary_bus.h
new file mode 100644
index 000000000000..73b13b56403d
--- /dev/null
+++ b/include/linux/ancillary_bus.h
@@ -0,0 +1,58 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2019-2020 Intel Corporation
+ *
+ * Please see Documentation/driver-api/ancillary_bus.rst for more information.
+ */
+
+#ifndef _ANCILLARY_BUS_H_
+#define _ANCILLARY_BUS_H_
+
+#include <linux/device.h>
+#include <linux/mod_devicetable.h>
+#include <linux/slab.h>
+
+struct ancillary_device {
+	struct device dev;
+	const char *name;
+	u32 id;
+};
+
+struct ancillary_driver {
+	int (*probe)(struct ancillary_device *ancildev, const struct ancillary_device_id *id);
+	int (*remove)(struct ancillary_device *ancildev);
+	void (*shutdown)(struct ancillary_device *ancildev);
+	int (*suspend)(struct ancillary_device *ancildev, pm_message_t state);
+	int (*resume)(struct ancillary_device *ancildev);
+	struct device_driver driver;
+	const struct ancillary_device_id *id_table;
+};
+
+static inline struct ancillary_device *to_ancillary_dev(struct device *dev)
+{
+	return container_of(dev, struct ancillary_device, dev);
+}
+
+static inline struct ancillary_driver *to_ancillary_drv(struct device_driver *drv)
+{
+	return container_of(drv, struct ancillary_driver, driver);
+}
+
+int ancillary_device_initialize(struct ancillary_device *ancildev);
+int __ancillary_device_add(struct ancillary_device *ancildev, const char *modname);
+#define ancillary_device_add(ancildev) __ancillary_device_add(ancildev, KBUILD_MODNAME)
+
+static inline void ancillary_device_unregister(struct ancillary_device *ancildev)
+{
+	device_unregister(&ancildev->dev);
+}
+
+int __ancillary_driver_register(struct ancillary_driver *ancildrv, struct module *owner);
+#define ancillary_driver_register(ancildrv) __ancillary_driver_register(ancildrv, THIS_MODULE)
+
+static inline void ancillary_driver_unregister(struct ancillary_driver *ancildrv)
+{
+	driver_unregister(&ancildrv->driver);
+}
+
+#endif /* _ANCILLARY_BUS_H_ */
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 5b08a473cdba..7d596dc30833 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -838,4 +838,12 @@  struct mhi_device_id {
 	kernel_ulong_t driver_data;
 };
 
+#define ANCILLARY_NAME_SIZE 32
+#define ANCILLARY_MODULE_PREFIX "ancillary:"
+
+struct ancillary_device_id {
+	char name[ANCILLARY_NAME_SIZE];
+	kernel_ulong_t driver_data;
+};
+
 #endif /* LINUX_MOD_DEVICETABLE_H */
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index 27007c18e754..79e37c4c25b3 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -243,5 +243,8 @@  int main(void)
 	DEVID(mhi_device_id);
 	DEVID_FIELD(mhi_device_id, chan);
 
+	DEVID(ancillary_device_id);
+	DEVID_FIELD(ancillary_device_id, name);
+
 	return 0;
 }
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 2417dd1dee33..99c4fcd82bf3 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1364,6 +1364,13 @@  static int do_mhi_entry(const char *filename, void *symval, char *alias)
 {
 	DEF_FIELD_ADDR(symval, mhi_device_id, chan);
 	sprintf(alias, MHI_DEVICE_MODALIAS_FMT, *chan);
+	return 1;
+}
+
+static int do_ancillary_entry(const char *filename, void *symval, char *alias)
+{
+	DEF_FIELD_ADDR(symval, ancillary_device_id, name);
+	sprintf(alias, ANCILLARY_MODULE_PREFIX "%s", *name);
 
 	return 1;
 }
@@ -1442,6 +1449,7 @@  static const struct devtable devtable[] = {
 	{"tee", SIZE_tee_client_device_id, do_tee_entry},
 	{"wmi", SIZE_wmi_device_id, do_wmi_entry},
 	{"mhi", SIZE_mhi_device_id, do_mhi_entry},
+	{"ancillary", SIZE_ancillary_device_id, do_ancillary_entry},
 };
 
 /* Create MODULE_ALIAS() statements.