
[RFD,RESEND] dmaengine: add new sleepy alloc descriptor and slave sg APIs

Message ID 1487895239-6348-1-git-send-email-vinod.koul@intel.com (mailing list archive)
State RFC

Commit Message

Vinod Koul Feb. 24, 2017, 12:13 a.m. UTC
There have been discussions on the need to change the dmaengine API to:
a) allow sleepy invocation of the descriptor prepare step so that drivers can
do runtime PM (maybe move it into the core as well)
b) split the prepare operation into new allocate and prepare methods

TBD: Should the new prepare be atomic or not
TBD: Should we move the pm calls to core
TBA: Documentation and wrappers for these calls

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---

resend with CC list, missed first try...

 include/linux/dmaengine.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Dan Williams Feb. 24, 2017, 12:41 a.m. UTC | #1
On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> There have been discussions on need for change of dmaengine API for
> a) allow sleepy invocation of the prepare descriptor so that drivers can
> do the runtime pm (maybe move into core as well)
> b) split the prepare operation to allocate and new prepare methods

I'm wondering if you should go even further and move to a more
idiomatic model where a generic request is prepared outside the driver
and passed in, rather than the current situation of requiring the
driver to wrap a dma_async_tx_descriptor inside its internal command
structure.
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vinod Koul Feb. 24, 2017, 12:53 a.m. UTC | #2
On Thu, 2017-02-23 at 16:41 -0800, Dan Williams wrote:
> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com>
> wrote:
> > 
> > There have been discussions on need for change of dmaengine API for
> > a) allow sleepy invocation of the prepare descriptor so that drivers
> > can
> > do the runtime pm (maybe move into core as well)
> > b) split the prepare operation to allocate and new prepare methods
> 
> I'm wondering if you should you go even further and move to a more
> idiomatic model where a generic request is prepared outside the driver
> and passed in, rather than the current situation of requiring the
> driver to wrap a dma_async_tx_descriptor inside it's internal command
> structure.

Yes, that's certainly a good idea. I will work on that on my way back
home. Good use of a long flight time :)
Vinod Koul Feb. 24, 2017, 12:54 a.m. UTC | #3
On Thu, 2017-02-23 at 16:13 -0800, Vinod Koul wrote:
> There have been discussions on need for change of dmaengine API for
> a) allow sleepy invocation of the prepare descriptor so that drivers
> can
> do the runtime pm (maybe move into core as well)
> b) split the prepare operation to allocate and new prepare methods
> 
> TBD: Should the new prepare be atomic or not
> TBD: Should we move the pm calls to core
> TBA: Documentation and wrappers for these calls

We also need to add a new capability so that users know whether they can
call the new API or not.
Lars-Peter Clausen Feb. 24, 2017, 10:56 a.m. UTC | #4
On 02/24/2017 01:13 AM, Vinod Koul wrote:
> There have been discussions on need for change of dmaengine API for
> a) allow sleepy invocation of the prepare descriptor so that drivers can
> do the runtime pm (maybe move into core as well)
> b) split the prepare operation to allocate and new prepare methods
> 
> TBD: Should the new prepare be atomic or not
> TBD: Should we move the pm calls to core
> TBA: Documentation and wrappers for these calls

I'm not convinced by this approach. You only know how much memory you need
to allocate once you know what the content of the DMA descriptor is. E.g.
depending on the length of the transfer, it might need to be split
into multiple internal segments. So you'd still have to do an alloc operation
in the device_desc_prep_slave_sg() callback.

For PM I'd much rather have explicit PM operations which signal intent to
use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
similar. For audio these could for example be called when the PCM device is
opened/closed.
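The dmaengine_channel_pm_{get,put}() pairing suggested here is hypothetical (it did not exist in mainline at the time of this thread). As a rough sketch of the intended semantics, here is a userspace model in plain C: the channel powers up on the first get() and powers down on the last put(); the real implementation would call the runtime PM helpers where noted.

```c
#include <assert.h>

/* Toy userspace model of the proposed dmaengine_channel_pm_{get,put}().
 * All names here are assumptions taken from the proposal above, not a
 * real kernel API. */
struct toy_chan {
	int pm_refcount;
	int powered;	/* 1 while the channel's device is runtime-active */
};

static void channel_pm_get(struct toy_chan *c)
{
	/* First user powers the channel up; a real driver would call
	 * pm_runtime_get_sync() on the DMA device here. */
	if (c->pm_refcount++ == 0)
		c->powered = 1;
}

static void channel_pm_put(struct toy_chan *c)
{
	assert(c->pm_refcount > 0);
	/* Last user powers the channel down; a real driver would call
	 * pm_runtime_put() here. */
	if (--c->pm_refcount == 0)
		c->powered = 0;
}
```

For the audio case mentioned above, get() would map to PCM open and put() to PCM close, so the channel stays powered for the whole stream lifetime regardless of how many descriptors are prepared in between.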

> 
> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
> ---
> 
> resend with CC list, missed first try...
> 
>  include/linux/dmaengine.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 533680860865..33edcc5bc723 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -658,6 +658,11 @@ struct dma_filter {
>  	const struct dma_slave_map *map;
>  };
>  
> +enum dmaengine_operation {
> +	DMAENGINE_SLAVE_SG = 1,
> +	DMAENGINE_SLAVE_LAST = DMAENGINE_SLAVE_SG,
> +};
> +
>  /**
>   * struct dma_device - info on the entity supplying DMA services
>   * @chancnt: how many DMA channels are supported
> @@ -700,6 +705,10 @@ struct dma_filter {
>   *	be called after period_len bytes have been transferred.
>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
> + * @device_alloc_descriptor: Allocate a dma descriptor for dmaengine operation
> + * 	specified. Can be invoked from sleepy context.
> + * 	Cannot be called from callback context.
> + * @device_desc_prep_slave_sg: Prepare a slave sg txn for a given descriptor
>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>   *	code
>   * @device_pause: Pauses any transfer happening on a channel. Returns
> @@ -792,6 +801,12 @@ struct dma_device {
>  		struct dma_chan *chan, dma_addr_t dst, u64 data,
>  		unsigned long flags);
>  
> +	struct dma_async_tx_descriptor *(*device_alloc_descriptor)(
> +		struct dma_chan *chan, enum dmaengine_operation op);
> +	int (*device_desc_prep_slave_sg)(struct dma_chan *chan,
> +		struct scatterlist *sgl, unsigned int sg_len,
> +		enum dma_transfer_direction direction, unsigned long flags);
> +
>  	int (*device_config)(struct dma_chan *chan,
>  			     struct dma_slave_config *config);
>  	int (*device_pause)(struct dma_chan *chan);
> 

Marek Szyprowski Feb. 24, 2017, 11:37 a.m. UTC | #5
Hi Lars,

On 2017-02-24 11:56, Lars-Peter Clausen wrote:
> On 02/24/2017 01:13 AM, Vinod Koul wrote:
>> There have been discussions on need for change of dmaengine API for
>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>> do the runtime pm (maybe move into core as well)
>> b) split the prepare operation to allocate and new prepare methods
>>
>> TBD: Should the new prepare be atomic or not
>> TBD: Should we move the pm calls to core
>> TBA: Documentation and wrappers for these calls
> I'm not convinced by this approach. You only know how much memory you need
> to allocate once you know what the content of the DMA descriptor is. E.g.
> depending on the length of the transfer the transfer might need to be split
> into multiple internal segments. So you'd still have to do a alloc operation
> in the device_desc_prep_slave_sg() callback.
>
> For PM I'd much rather have explicit PM operations which signal intent to
> use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
> similar. For audio these could for example be called when the PCM device is
> opened/closed.

I also vote for this approach.

The only question is how to handle the case of an old client driver (which
doesn't do the new calls) and a new DMA engine driver (which requires them).
Should we assume that the dmaengine channel is PM-active after
dma_request_chan() and has to be explicitly turned off first if not used by
the client?

The other possibility would be to add yet another function to request a
channel (like dma_request_channel_suspended()) and add
dmaengine_channel_pm_get() to the current dma_request_chan() function in the
dmaengine core.

 > ...

Best regards
Lars-Peter Clausen Feb. 24, 2017, 1:05 p.m. UTC | #6
On 02/24/2017 12:37 PM, Marek Szyprowski wrote:
> Hi Lars,
> 
> On 2017-02-24 11:56, Lars-Peter Clausen wrote:
>> On 02/24/2017 01:13 AM, Vinod Koul wrote:
>>> There have been discussions on need for change of dmaengine API for
>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>>> do the runtime pm (maybe move into core as well)
>>> b) split the prepare operation to allocate and new prepare methods
>>>
>>> TBD: Should the new prepare be atomic or not
>>> TBD: Should we move the pm calls to core
>>> TBA: Documentation and wrappers for these calls
>> I'm not convinced by this approach. You only know how much memory you need
>> to allocate once you know what the content of the DMA descriptor is. E.g.
>> depending on the length of the transfer the transfer might need to be split
>> into multiple internal segments. So you'd still have to do a alloc operation
>> in the device_desc_prep_slave_sg() callback.
>>
>> For PM I'd much rather have explicit PM operations which signal intent to
>> use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
>> similar. For audio these could for example be called when the PCM device is
>> opened/closed.
> 
> I also vote for this approach.
> 
> The only question is how to handle the case of old client driver (which
> doesn't do new calls) and new dma engine driver (which requires new calls).
> Should we assume that the dmaengine channel is pm active after
> dma_request_chan() and has to be explicitly turned off first if not used by
> the client?
> 
> The other possibility would be to add yet another function to request channel
> (like dma_request_channel_suspended()) and add dmaengine_channel_pm_get() to
> the current dma_request_chan() function in the dma engine core.

Yeah, I guess one of the two. Or some kind of opt-in function before the PM
functions can be used (dmaengine_channel_enable_pm()).

Dave Jiang Feb. 24, 2017, 5:19 p.m. UTC | #7
On 02/23/2017 05:41 PM, Dan Williams wrote:
> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
>> There have been discussions on need for change of dmaengine API for
>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>> do the runtime pm (maybe move into core as well)
>> b) split the prepare operation to allocate and new prepare methods
> 
> I'm wondering if you should you go even further and move to a more
> idiomatic model where a generic request is prepared outside the driver
> and passed in, rather than the current situation of requiring the
> driver to wrap a dma_async_tx_descriptor inside it's internal command
> structure.

I wonder if we should support allowing proper batching of requests
before submission in the framework. Certain consumers of the DMA driver
can take advantage of that to improve performance.

Lars-Peter Clausen Feb. 24, 2017, 5:27 p.m. UTC | #8
On 02/24/2017 06:19 PM, Dave Jiang wrote:
> 
> 
> On 02/23/2017 05:41 PM, Dan Williams wrote:
>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
>>> There have been discussions on need for change of dmaengine API for
>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>>> do the runtime pm (maybe move into core as well)
>>> b) split the prepare operation to allocate and new prepare methods
>>
>> I'm wondering if you should you go even further and move to a more
>> idiomatic model where a generic request is prepared outside the driver
>> and passed in, rather than the current situation of requiring the
>> driver to wrap a dma_async_tx_descriptor inside it's internal command
>> structure.
> 
> I wonder if we should support allowing proper batching of requests
> before submission in the framework. Certain consumers of the DMA driver
> can take advantage of that to improve performance.

Some drivers already do this and the API allows this. issue_pending() is the
operation that starts the transfer. So when a descriptor is submitted and
there are already other descriptors pending it is OK to batch those
descriptors together into one hardware-linked descriptor chain.

Dan Williams Feb. 24, 2017, 5:29 p.m. UTC | #9
On Thu, Feb 23, 2017 at 4:53 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> On Thu, 2017-02-23 at 16:41 -0800, Dan Williams wrote:
>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com>
>> wrote:
>> >
>> > There have been discussions on need for change of dmaengine API for
>> > a) allow sleepy invocation of the prepare descriptor so that drivers
>> > can
>> > do the runtime pm (maybe move into core as well)
>> > b) split the prepare operation to allocate and new prepare methods
>>
>> I'm wondering if you should you go even further and move to a more
>> idiomatic model where a generic request is prepared outside the driver
>> and passed in, rather than the current situation of requiring the
>> driver to wrap a dma_async_tx_descriptor inside it's internal command
>> structure.
>
> Yes that's certainly a good idea. I will work on that on my way back
> home. good use of a long flight time :)

The current situation arose from a consideration of minimizing the
overhead of descriptor setup. However, it is now clear that it was a
mistake and led to the explosion of ->prep() methods and awkward,
non-uniform locking across drivers.
Dave Jiang Feb. 24, 2017, 5:38 p.m. UTC | #10
On 02/24/2017 10:27 AM, Lars-Peter Clausen wrote:
> On 02/24/2017 06:19 PM, Dave Jiang wrote:
>>
>>
>> On 02/23/2017 05:41 PM, Dan Williams wrote:
>>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
>>>> There have been discussions on need for change of dmaengine API for
>>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>>>> do the runtime pm (maybe move into core as well)
>>>> b) split the prepare operation to allocate and new prepare methods
>>>
>>> I'm wondering if you should you go even further and move to a more
>>> idiomatic model where a generic request is prepared outside the driver
>>> and passed in, rather than the current situation of requiring the
>>> driver to wrap a dma_async_tx_descriptor inside it's internal command
>>> structure.
>>
>> I wonder if we should support allowing proper batching of requests
>> before submission in the framework. Certain consumers of the DMA driver
>> can take advantage of that to improve performance.
> 
> Some drivers already do this and the API allows this. issue_pending() is the
> operation that starts the transfer. So when a descriptor is submitted and
> there are already other descriptors pending it is OK to batch those
> descriptors together into one hardware-linked descriptor chain.
> 

Right, a lot of the mechanism is in the driver. And the prep call leaves
the submission lock held, does it not? That prevents more prep calls from
being invoked.
Lars-Peter Clausen Feb. 24, 2017, 8:37 p.m. UTC | #11
On 02/24/2017 06:38 PM, Dave Jiang wrote:
> 
> 
> On 02/24/2017 10:27 AM, Lars-Peter Clausen wrote:
>> On 02/24/2017 06:19 PM, Dave Jiang wrote:
>>>
>>>
>>> On 02/23/2017 05:41 PM, Dan Williams wrote:
>>>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
>>>>> There have been discussions on need for change of dmaengine API for
>>>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>>>>> do the runtime pm (maybe move into core as well)
>>>>> b) split the prepare operation to allocate and new prepare methods
>>>>
>>>> I'm wondering if you should you go even further and move to a more
>>>> idiomatic model where a generic request is prepared outside the driver
>>>> and passed in, rather than the current situation of requiring the
>>>> driver to wrap a dma_async_tx_descriptor inside it's internal command
>>>> structure.
>>>
>>> I wonder if we should support allowing proper batching of requests
>>> before submission in the framework. Certain consumers of the DMA driver
>>> can take advantage of that to improve performance.
>>
>> Some drivers already do this and the API allows this. issue_pending() is the
>> operation that starts the transfer. So when a descriptor is submitted and
>> there are already other descriptors pending it is OK to batch those
>> descriptors together into one hardware-linked descriptor chain.
>>
> 
> Right, a lot of the mechanism is in the driver. And the prep call leaves
> the submission lock held does it not? That prevents more prep calls
> being invoked.

That's not an issue, what you do is:

prep()
submit()
prep()
submit()
...
prep()
submit()
issue_pending()

All the descriptors submitted up to the issue_pending() can be batch into
one hardware-linked list of descriptors.

- Lars
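The contract Lars lists can be modeled in a few lines of self-contained C. This is a userspace toy, not the real dmaengine internals: submit() only queues a descriptor, and issue_pending() is the point where everything queued so far may be linked into one hardware-visible chain.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the prep()/submit()/issue_pending() batching described
 * above. Names and structures are illustrative only. */
struct toy_desc {
	struct toy_desc *hw_next;	/* hardware-linked chain pointer */
};

#define MAX_PENDING 8

struct toy_chan {
	struct toy_desc *pending[MAX_PENDING];
	int npending;
	struct toy_desc *hw_head;	/* chain handed to the hardware */
};

static void submit(struct toy_chan *c, struct toy_desc *d)
{
	/* Queue only; no hardware access yet. */
	d->hw_next = NULL;
	c->pending[c->npending++] = d;
}

static void issue_pending(struct toy_chan *c)
{
	int i;

	/* Link all pending descriptors into one chain, as a driver would
	 * build one hardware-linked descriptor list before kicking DMA. */
	for (i = 0; i + 1 < c->npending; i++)
		c->pending[i]->hw_next = c->pending[i + 1];
	if (c->npending)
		c->hw_head = c->pending[0];
	c->npending = 0;
}
```

Nothing reaches the hardware until issue_pending(), which is exactly why the repeated prep()/submit() pairs above can be coalesced.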

Vinod Koul March 2, 2017, 5:03 a.m. UTC | #12
On Fri, Feb 24, 2017 at 11:56:54AM +0100, Lars-Peter Clausen wrote:
> On 02/24/2017 01:13 AM, Vinod Koul wrote:
> > There have been discussions on need for change of dmaengine API for
> > a) allow sleepy invocation of the prepare descriptor so that drivers can
> > do the runtime pm (maybe move into core as well)
> > b) split the prepare operation to allocate and new prepare methods
> > 
> > TBD: Should the new prepare be atomic or not
> > TBD: Should we move the pm calls to core
> > TBA: Documentation and wrappers for these calls
> 
> I'm not convinced by this approach. You only know how much memory you need
> to allocate once you know what the content of the DMA descriptor is. E.g.
> depending on the length of the transfer the transfer might need to be split
> into multiple internal segments. So you'd still have to do a alloc operation
> in the device_desc_prep_slave_sg() callback.

Yes, a good point. One of the things I would like to see here is the split of
descriptors based on the length supported by the device, so the driver gets N
prepare operations.

> For PM I'd much rather have explicit PM operations which signal intent to
> use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
> similar. For audio these could for example be called when the PCM device is
> opened/closed.

The PM is merely a trigger here; the aim is a broader redesign of the
DMAengine API, as we discussed at KS.
Vinod Koul March 2, 2017, 5:10 a.m. UTC | #13
On Fri, Feb 24, 2017 at 09:37:40PM +0100, Lars-Peter Clausen wrote:
> On 02/24/2017 06:38 PM, Dave Jiang wrote:
> > 
> > 
> > On 02/24/2017 10:27 AM, Lars-Peter Clausen wrote:
> >> On 02/24/2017 06:19 PM, Dave Jiang wrote:
> >>>
> >>>
> >>> On 02/23/2017 05:41 PM, Dan Williams wrote:
> >>>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
> >>>>> There have been discussions on need for change of dmaengine API for
> >>>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
> >>>>> do the runtime pm (maybe move into core as well)
> >>>>> b) split the prepare operation to allocate and new prepare methods
> >>>>
> >>>> I'm wondering if you should you go even further and move to a more
> >>>> idiomatic model where a generic request is prepared outside the driver
> >>>> and passed in, rather than the current situation of requiring the
> >>>> driver to wrap a dma_async_tx_descriptor inside it's internal command
> >>>> structure.
> >>>
> >>> I wonder if we should support allowing proper batching of requests
> >>> before submission in the framework. Certain consumers of the DMA driver
> >>> can take advantage of that to improve performance.
> >>
> >> Some drivers already do this and the API allows this. issue_pending() is the
> >> operation that starts the transfer. So when a descriptor is submitted and
> >> there are already other descriptors pending it is OK to batch those
> >> descriptors together into one hardware-linked descriptor chain.
> >>
> > 
> > Right, a lot of the mechanism is in the driver. And the prep call leaves
> > the submission lock held does it not? That prevents more prep calls
> > being invoked.
> 
> That's not an issue, what you do is:
> 
> prep()
> submit()
> prep()
> submit()
> ...
> prep()
> submit()
> issue_pending()
> 
> All the descriptors submitted up to the issue_pending() can be batch into
> one hardware-linked list of descriptors.

Yes, but then should this be handled in drivers or moved to the framework? I
am thinking the latter :)
Laurent Pinchart March 2, 2017, 6:58 p.m. UTC | #14
On Thursday 02 Mar 2017 10:33:21 Vinod Koul wrote:
> On Fri, Feb 24, 2017 at 11:56:54AM +0100, Lars-Peter Clausen wrote:
> > On 02/24/2017 01:13 AM, Vinod Koul wrote:
> > > There have been discussions on need for change of dmaengine API for
> > > a) allow sleepy invocation of the prepare descriptor so that drivers can
> > > do the runtime pm (maybe move into core as well)
> > > b) split the prepare operation to allocate and new prepare methods
> > > 
> > > TBD: Should the new prepare be atomic or not
> > > TBD: Should we move the pm calls to core
> > > TBA: Documentation and wrappers for these calls
> > 
> > I'm not convinced by this approach. You only know how much memory you need
> > to allocate once you know what the content of the DMA descriptor is. E.g.
> > depending on the length of the transfer the transfer might need to be
> > split into multiple internal segments. So you'd still have to do a alloc
> > operation in the device_desc_prep_slave_sg() callback.
> 
> Yes a good point. One of the things I would like to see here is the split of
> descriptors based on length supported by the device, so driver gets N
> prepare operations.
> 
> > For PM I'd much rather have explicit PM operations which signal intent to
> > use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
> > similar. For audio these could for example be called when the PCM device
> > is opened/closed.
> 
> The PM is merely a trigger here, the aim is broader redesign of the
> DMAengine API as we discussed in KS

I was going to mention that. PM support is needed, but on top of that the
ability to reuse descriptors would be good to have by itself.
Vinod Koul March 3, 2017, 7:29 a.m. UTC | #15
On Thu, Mar 02, 2017 at 08:58:40PM +0200, Laurent Pinchart wrote:
> On Thursday 02 Mar 2017 10:33:21 Vinod Koul wrote:
> > On Fri, Feb 24, 2017 at 11:56:54AM +0100, Lars-Peter Clausen wrote:
> > > On 02/24/2017 01:13 AM, Vinod Koul wrote:
> > > > There have been discussions on need for change of dmaengine API for
> > > > a) allow sleepy invocation of the prepare descriptor so that drivers can
> > > > do the runtime pm (maybe move into core as well)
> > > > b) split the prepare operation to allocate and new prepare methods
> > > > 
> > > > TBD: Should the new prepare be atomic or not
> > > > TBD: Should we move the pm calls to core
> > > > TBA: Documentation and wrappers for these calls
> > > 
> > > I'm not convinced by this approach. You only know how much memory you need
> > > to allocate once you know what the content of the DMA descriptor is. E.g.
> > > depending on the length of the transfer the transfer might need to be
> > > split into multiple internal segments. So you'd still have to do a alloc
> > > operation in the device_desc_prep_slave_sg() callback.
> > 
> > Yes a good point. One of the things I would like to see here is the split of
> > descriptors based on length supported by the device, so driver gets N
> > prepare operations.
> > 
> > > For PM I'd much rather have explicit PM operations which signal intent to
> > > use the DMA channel. E.g. dmaengine_channel_pm_{get,put}() or something
> > > similar. For audio these could for example be called when the PCM device
> > > is opened/closed.
> > 
> > The PM is merely a trigger here, the aim is broader redesign of the
> > DMAengine API as we discussed in KS
> 
> I was going to mention that. PM support is needed, but on top of that the 
> ability to reuse descriptors would be a good to have by itself.

Descriptor reuse is already supported in the current API.
Marek Szyprowski March 27, 2017, 7:32 a.m. UTC | #16
Hi Vinod,

On 2017-03-02 06:10, Vinod Koul wrote:
> On Fri, Feb 24, 2017 at 09:37:40PM +0100, Lars-Peter Clausen wrote:
>> On 02/24/2017 06:38 PM, Dave Jiang wrote:
>>> On 02/24/2017 10:27 AM, Lars-Peter Clausen wrote:
>>>> On 02/24/2017 06:19 PM, Dave Jiang wrote:
>>>>> On 02/23/2017 05:41 PM, Dan Williams wrote:
>>>>>> On Thu, Feb 23, 2017 at 4:13 PM, Vinod Koul <vinod.koul@intel.com> wrote:
>>>>>>> There have been discussions on need for change of dmaengine API for
>>>>>>> a) allow sleepy invocation of the prepare descriptor so that drivers can
>>>>>>> do the runtime pm (maybe move into core as well)
>>>>>>> b) split the prepare operation to allocate and new prepare methods
>>>>>> I'm wondering if you should you go even further and move to a more
>>>>>> idiomatic model where a generic request is prepared outside the driver
>>>>>> and passed in, rather than the current situation of requiring the
>>>>>> driver to wrap a dma_async_tx_descriptor inside it's internal command
>>>>>> structure.
>>>>> I wonder if we should support allowing proper batching of requests
>>>>> before submission in the framework. Certain consumers of the DMA driver
>>>>> can take advantage of that to improve performance.
>>>> Some drivers already do this and the API allows this. issue_pending() is the
>>>> operation that starts the transfer. So when a descriptor is submitted and
>>>> there are already other descriptors pending it is OK to batch those
>>>> descriptors together into one hardware-linked descriptor chain.
>>>>
>>> Right, a lot of the mechanism is in the driver. And the prep call leaves
>>> the submission lock held does it not? That prevents more prep calls
>>> being invoked.
>> That's not an issue, what you do is:
>>
>> prep()
>> submit()
>> prep()
>> submit()
>> ...
>> prep()
>> submit()
>> issue_pending()
>>
>> All the descriptors submitted up to the issue_pending() can be batch into
>> one hardware-linked list of descriptors.
> Yes but then should this be handled in drivers or moved to framework, I am
> thinking latter :)

Just a naive question: is there any plan and a volunteer for that work (after
the KS meeting and discussion)?

I hope that this is not just a rough direction, and that if I need the
runtime PM issues solved I won't have to do all that work first...

Best regards
Vinod Koul March 30, 2017, 4:04 a.m. UTC | #17
On Mon, Mar 27, 2017 at 09:32:42AM +0200, Marek Szyprowski wrote:
> 
> Just a naive question: is there any plan and volunteer for that work? (after
> KS meeting and discussion).
> 
> I hope that this is not just a rough direction and if I need to have runtime
> pm issues solved, I need to do all that works first...

That's a function of spare bandwidth :( If you or anyone else volunteers,
please do let me know, I can help with that :)

Patch

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 533680860865..33edcc5bc723 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -658,6 +658,11 @@  struct dma_filter {
 	const struct dma_slave_map *map;
 };
 
+enum dmaengine_operation {
+	DMAENGINE_SLAVE_SG = 1,
+	DMAENGINE_SLAVE_LAST = DMAENGINE_SLAVE_SG,
+};
+
 /**
  * struct dma_device - info on the entity supplying DMA services
  * @chancnt: how many DMA channels are supported
@@ -700,6 +705,10 @@  struct dma_filter {
  *	be called after period_len bytes have been transferred.
  * @device_prep_interleaved_dma: Transfer expression in a generic way.
  * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
+ * @device_alloc_descriptor: Allocate a dma descriptor for dmaengine operation
+ * 	specified. Can be invoked from sleepy context.
+ * 	Cannot be called from callback context.
+ * @device_desc_prep_slave_sg: Prepare a slave sg txn for a given descriptor
  * @device_config: Pushes a new configuration to a channel, return 0 or an error
  *	code
  * @device_pause: Pauses any transfer happening on a channel. Returns
@@ -792,6 +801,12 @@  struct dma_device {
 		struct dma_chan *chan, dma_addr_t dst, u64 data,
 		unsigned long flags);
 
+	struct dma_async_tx_descriptor *(*device_alloc_descriptor)(
+		struct dma_chan *chan, enum dmaengine_operation op);
+	int (*device_desc_prep_slave_sg)(struct dma_chan *chan,
+		struct scatterlist *sgl, unsigned int sg_len,
+		enum dma_transfer_direction direction, unsigned long flags);
+
 	int (*device_config)(struct dma_chan *chan,
 			     struct dma_slave_config *config);
 	int (*device_pause)(struct dma_chan *chan);
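
As a rough illustration of the split this patch proposes, here is a self-contained userspace model (hypothetical names mirroring the patch, not kernel code): allocation, which may sleep and can fail, is separated from a prepare step that only fills in an already-allocated descriptor. Note this sketch does not resolve Lars's objection that the allocation size may depend on the transfer contents.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the proposed two-step flow: a sleepable allocate step
 * followed by a lightweight prepare step. Names are taken from the
 * patch; bodies are illustrative assumptions only. */
enum toy_op { TOY_SLAVE_SG = 1 };

struct toy_desc {
	enum toy_op op;
	int prepared;
};

/* Sleepable step: the safe place for runtime PM calls and GFP_KERNEL
 * allocations in the real proposal. */
static struct toy_desc *alloc_descriptor(enum toy_op op)
{
	struct toy_desc *d = malloc(sizeof(*d));

	if (d) {
		d->op = op;
		d->prepared = 0;
	}
	return d;
}

/* Prepare step: fills the already-allocated descriptor and does no
 * allocation, so it could remain callable from atomic context. */
static int desc_prep_slave_sg(struct toy_desc *d, int sg_len)
{
	if (d->op != TOY_SLAVE_SG || sg_len <= 0)
		return -1;
	d->prepared = 1;
	return 0;
}
```

A client would call alloc_descriptor() once from process context, then run one or more prepare calls against that descriptor before submission.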