Documentation: dmaengine: Add a documentation for the dma controller API

Message ID 1406736193-26685-1-git-send-email-maxime.ripard@free-electrons.com (mailing list archive)
State Superseded

Commit Message

Maxime Ripard July 30, 2014, 4:03 p.m. UTC
The dmaengine is neither trivial nor properly documented at the moment, which
means a lot of trial-and-error development; that is not that good for such a
central piece of the system.

An attempt at making such documentation.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 Documentation/dmaengine-driver.txt | 293 +++++++++++++++++++++++++++++++++++++
 1 file changed, 293 insertions(+)
 create mode 100644 Documentation/dmaengine-driver.txt

Comments

Vinod Koul July 30, 2014, 4:06 p.m. UTC | #1
On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> The dmaengine is neither trivial nor properly documented at the moment, which
> means a lot of trial-and-error development; that is not that good for such a
> central piece of the system.
> 
> An attempt at making such documentation.

Did you miss Documentation/dmaengine.txt? Lots of this is already covered
there. But yes, I would be really glad to know what isn't, so that we can fix
that.

> +  * device_slave_caps
> +    - Isn't that redundant with the cap_mask already?
> +    - Only a few drivers seem to implement it
For audio to know what your channel can do rather than hardcoding it

> +
> +  * dma cookies?
A cookie is the representation of a dma transaction, a monotonically
incrementing number.
Maxime Ripard July 31, 2014, 7:44 a.m. UTC | #2
Hi Vinod,

On Wed, Jul 30, 2014 at 09:36:07PM +0530, Vinod Koul wrote:
> On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> > The dmaengine is neither trivial nor properly documented at the moment, which
> > means a lot of trial-and-error development; that is not that good for such a
> > central piece of the system.
> > 
> > An attempt at making such documentation.
> 
> Did you miss Documentation/dmaengine.txt? Lots of this is already covered
> there. But yes, I would be really glad to know what isn't, so that we can fix
> that.

I didn't miss it. I feel like it describes the slave API quite nicely,
but it doesn't help at all when you're writing a DMAengine driver.

The first lines of the existing document make it quite clear too.

There's still a bit of duplication, but I don't feel it's such a big
deal.

What I'd like to do with the documentation I just sent is basically
have a clear idea whenever you step into dmaengine what you can/cannot
do, and have a reference document explaining what's expected by the
framework, and hopefully have unified drivers that follow this
pattern.

Because, for the moment, we're pretty much left in the dark with
different drivers doing the same thing in completely different ways,
with basically no way to tell if it's either the framework that
requires such behaviour, or if the author was just feeling creative.

There's numerous examples for this at the moment:
  - The GFP flags, with different drivers using either GFP_ATOMIC,
    GFP_NOWAIT or GFP_KERNEL in the same functions
  - Having to set device_slave_caps or not?
  - Some drivers use dma_run_dependencies, some others don't
  - That might just be my experience, but judging from previous
    commits, DMA_PRIVATE is completely obscure, and we just set it
    because it was making it work, without knowing what it was
    supposed to do.
  - etc.

And basically, we have no way to tell at the moment which one is
right and which one needs fixing.

The corollary being that it cripples the whole community's ability to
maintain the framework and make it evolve.

> > +  * device_slave_caps
> > +    - Isn't that redundant with the cap_mask already?
> > +    - Only a few drivers seem to implement it
> For audio to know what your channel can do rather than hardcoding it

Ah, yes, I see it now. It's not related to the caps mask at all.

Just out of curiosity, wouldn't it be better to move this to the
framework, and have this information provided through the struct
dma_device? Or would it have some non-trivial side-effects?

> > +  * dma cookies?
> A cookie is the representation of a dma transaction, a monotonically
> incrementing number.

Ok, and it identifies a unique dma_async_tx_descriptor, right?

Thanks,
Maxime
Vinod Koul July 31, 2014, 11:56 a.m. UTC | #3
On Thu, Jul 31, 2014 at 09:44:40AM +0200, Maxime Ripard wrote:
> Hi Vinod,
> 
> On Wed, Jul 30, 2014 at 09:36:07PM +0530, Vinod Koul wrote:
> > On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> > > The dmaengine is neither trivial nor properly documented at the moment, which
> > > means a lot of trial-and-error development; that is not that good for such a
> > > central piece of the system.
> > > 
> > > An attempt at making such documentation.
> > 
> > Did you miss Documentation/dmaengine.txt? Lots of this is already covered
> > there. But yes, I would be really glad to know what isn't, so that we can fix
> > that.
> 
> I didn't miss it. I feel like it describes the slave API quite nicely,
> but it doesn't help at all when you're writing a DMAengine driver.
> 
> The first lines of the existing document make it quite clear too.
> 
> There's still a bit of duplication, but I don't feel it's such a big
> deal.
And that made me think that you might have missed it.

I am okay with the idea of having this document, but it needs to co-exist
with the existing one. No point in duplicating, as that can create ambiguity
in the future.
> 
> What I'd like to do with the documentation I just sent is basically
> have a clear idea whenever you step into dmaengine what you can/cannot
> do, and have a reference document explaining what's expected by the
> framework, and hopefully have unified drivers that follow this
> pattern.
Sure, can you please modify this to avoid duplication? I would be happy to
apply that :)

> 
> Because, for the moment, we're pretty much left in the dark with
> different drivers doing the same thing in completely different ways,
> with basically no way to tell if it's either the framework that
> requires such behaviour, or if the author was just feeling creative.
> 
> There's numerous examples for this at the moment:
>   - The GFP flags, with different drivers using either GFP_ATOMIC,
>     GFP_NOWAIT or GFP_KERNEL in the same functions
>   - Having to set device_slave_caps or not?
>   - Some drivers use dma_run_dependencies, some others don't
>   - That might just be my experience, but judging from previous
>     commits, DMA_PRIVATE is completely obscure, and we just set it
>     because it was making it work, without knowing what it was
>     supposed to do.
>   - etc.

Thanks for highlighting these; we should definitely add them to the
Documentation.

> 
> And basically, we have no way to tell at the moment which one is
> right and which one needs fixing.
> 
> The corollary being that it cripples the whole community's ability to
> maintain the framework and make it evolve.
> 
> > > +  * device_slave_caps
> > > +    - Isn't that redundant with the cap_mask already?
> > > +    - Only a few drivers seem to implement it
> > For audio to know what your channel can do rather than hardcoding it
> 
> Ah, yes, I see it now. It's not related to the caps mask at all.
> 
> Just out of curiosity, wouldn't it be better to move this to the
> framework, and have this information provided through the struct
> dma_device? Or would it have some non-trivial side-effects?
Well, the problem is the ability to have this queried uniformly from all
drivers across subsystems. If we can do this, that would be nice.
> 
> > > +  * dma cookies?
> > A cookie is the representation of a dma transaction, a monotonically
> > incrementing number.
> 
> Ok, and it identifies a unique dma_async_tx_descriptor, right?
Yup, and this basically represents the transactions you have submitted.
That's why the cookie is allocated at tx_submit.

Thanks
Lars-Peter Clausen July 31, 2014, 12:44 p.m. UTC | #4
On 07/31/2014 09:44 AM, Maxime Ripard wrote:
[...]
>    - Having to set device_slave_caps or not?

Yes. This should in my opinion be mandatory for new drivers. One of the
issues with the DMAengine API is that it is really hard to write generic
drivers that do not already know about the capabilities of the DMA
controller. E.g. if you have a peripheral that is used on SoC A it assumes
that the DMA controller it is connected to has the capabilities of the DMA
controller in SoC A. If the same peripheral is now used on SoC B with a DMA
controller with different capabilities this often ends up with ugly ifdefery
in the peripheral driver. The peripheral driver shouldn't have to know which
specific DMA controller it is connected to (it's a bit as if a GPIO consumer
needed to know which GPIO controller it is connected to). We got away with
the current approach since there is not that much diversity in the mixing of
peripherals and DMA controllers (each vendor pretty much has their own DMA
controller which it uses for their own peripherals). But with more recent
code consolidation we are on a path to generic DMA support within subsystem
frameworks (there is generic DMA support for audio, there is generic DMA
support for SPI, and I also have a (currently) out-of-tree patch for generic
DMA support for IIO). These generic drivers also need to be able to discover
the capabilities of the DMA controller to be able to make the right decisions.
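To make this concrete, a generic consumer can query the channel's
capabilities along the lines of the following minimal sketch (the
dma_slave_caps field names here are from memory, so double-check them
against dmaengine.h):

  #include <linux/dmaengine.h>

  /* Sketch: a peripheral driver discovering what its channel supports
   * through dma_get_slave_caps() instead of hardcoding SoC A's
   * capabilities. */
  static int peripheral_check_dma(struct dma_chan *chan)
  {
          struct dma_slave_caps caps;
          int ret;

          ret = dma_get_slave_caps(chan, &caps);
          if (ret)
                  return ret;     /* no device_slave_caps callback */

          /* e.g. require 32-bit accesses and dev-to-mem transfers */
          if (!(caps.src_addr_widths & BIT(DMA_SLAVE_BUSWIDTH_4_BYTES)))
                  return -EINVAL;
          if (!(caps.directions & BIT(DMA_DEV_TO_MEM)))
                  return -EINVAL;

          return 0;
  }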

- Lars
Russell King - ARM Linux July 31, 2014, 1:22 p.m. UTC | #5
On Thu, Jul 31, 2014 at 09:44:40AM +0200, Maxime Ripard wrote:
> I didn't miss it. I feel like it describes the slave API quite nicely,
> but it doesn't help at all when you're writing a DMAengine driver.

There's actually two documents:

Documentation/crypto/async-tx-api.txt
Documentation/dmaengine.txt

The first is the original DMA engine API for use with offloading things
like memcpy, as well as the crypto API to hardware crypto DMA engines.

The second is documentation for the DMA slave API which modifies some
of the requirements of the first (like, the ability to call the slave
DMA prepare functions from within the callback, which are not permitted
in the async tx API.)

Both documents are relevant for writing a DMA engine driver.

> Because, for the moment, we're pretty much left in the dark with
> different drivers doing the same thing in completely different ways,
> with basically no way to tell if it's either the framework that
> requires such behaviour, or if the author was just feeling creative.

DMA engine has lacked a lot of infrastructure for common patterns in
drivers; some of that is solved by my efforts with the virt_dma.c
support, and also various cleanups to existing drivers, but it's not
easy to fix this problem after drivers have been merged.
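For what it's worth, the core of the virt_dma pattern looks like the
sketch below (struct foo_desc and foo_prep_dma_memcpy are made-up
names; the vchan_* helpers are the ones from drivers/dma/virt-dma.h,
so verify the details against that header):

  /* Sketch: descriptor and prep callback in a virt_dma based driver.
   * The library handles cookie assignment and the descriptor lists. */
  struct foo_desc {
          struct virt_dma_desc vd;        /* library bookkeeping */
          dma_addr_t src, dst;
          size_t len;
  };

  static struct dma_async_tx_descriptor *
  foo_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst,
                      dma_addr_t src, size_t len, unsigned long flags)
  {
          struct virt_dma_chan *vc = to_virt_chan(chan);
          struct foo_desc *d = kzalloc(sizeof(*d), GFP_NOWAIT);

          if (!d)
                  return NULL;

          d->src = src;
          d->dst = dst;
          d->len = len;

          /* queues the descriptor and wires up tx_submit for us */
          return vchan_tx_prep(vc, &d->vd, flags);
  }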

> There's numerous examples for this at the moment:
>   - The GFP flags, with different drivers using either GFP_ATOMIC,
>     GFP_NOWAIT or GFP_KERNEL in the same functions

The slave prepare APIs should be using GFP_ATOMIC to ensure safety.

>   - Having to set device_slave_caps or not?

device_slave_caps is a relatively new invention, so old drivers don't
implement it.  Newer drivers should, and there probably should be some
motivation for older drivers to add it.
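On the provider side it's just a matter of filling in the structure;
something like this sketch (foo_ is a placeholder, and the field names,
including the odd dstn_ spelling, are what I believe the current
struct dma_slave_caps uses, so verify against dmaengine.h):

  static int foo_slave_caps(struct dma_chan *chan,
                            struct dma_slave_caps *caps)
  {
          caps->src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
                                  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
          caps->dstn_addr_widths = caps->src_addr_widths;
          caps->directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
          caps->cmd_pause = true;         /* pause/resume implemented */
          caps->cmd_terminate = true;     /* terminate_all implemented */

          return 0;
  }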

>   - Some drivers use dma_run_dependencies, some others don't

Dependent operations are part of the async-tx API, and aren't really
applicable to the slave DMA API.  A driver implementing just the slave
DMA API need not concern itself with dependent operations, but a
driver implementing the async-tx APIs should do.

>   - That might just be my experience, but judging from previous
>     commits, DMA_PRIVATE is completely obscure, and we just set it
>     because it was making it work, without knowing what it was
>     supposed to do.

DMA_PRIVATE means that the channel is unavailable for async-tx operations
- in other words, it's slave-only.  Setting it before registration results
in the private-use count being incremented, disabling the DMA_PRIVATE
manipulation in the channel request/release APIs (requested channels are
unavailable for async-tx operations, which is why that flag is also set
there.)

To put it another way, if a DMA engine driver only provides slave DMA
support, it must set DMA_PRIVATE to mark the channel unavailable for
async-tx operations.  If a DMA engine drivers channels can also be used
for async-tx operations, then it should leave the flag clear.
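In code that's a couple of lines at probe time, before registration; a
minimal sketch (struct foo_dmadev and foo_probe are placeholders):

  static int foo_probe(struct platform_device *pdev)
  {
          struct foo_dmadev *fd;

          fd = devm_kzalloc(&pdev->dev, sizeof(*fd), GFP_KERNEL);
          if (!fd)
                  return -ENOMEM;

          dma_cap_zero(fd->ddev.cap_mask);
          dma_cap_set(DMA_SLAVE, fd->ddev.cap_mask);
          /* slave-only controller: opt out of async-tx entirely */
          dma_cap_set(DMA_PRIVATE, fd->ddev.cap_mask);

          /* ... channel setup and device_prep_* callbacks go here ... */

          return dma_async_device_register(&fd->ddev);
  }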

> And basically, we have no way to tell at the moment which one is
> right and which one needs fixing.
> 
> The corollary being that it cripples the whole community's ability to
> maintain the framework and make it evolve.

All respect to Vinod (who looks after the slave side of the DMA engine
implementations) and Dan, what it needs is someone who knows the *whole*
DMA engine picture to be in control of the subsystem, and who knows these
details - not just one bit of it.  Someone who has the time to put into
reviewing changes to it, and probably more importantly, has the time to
clean up the existing code.  Finding someone who fits that is a struggle.

Dan knows the whole picture (or used to when he was working on it at one
of his former employers) but I don't think Dan had the available time to
address various issues with it.

> > A cookie is the representation of a dma transaction, a monotonically
> > incrementing number.
> 
> Ok, and it identifies a unique dma_async_tx_descriptor, right?

You really should not be concerned with DMA cookies anymore since one
of my cleanups - I added a number of helper functions which hide the
manipulation of DMA cookies, and take away from driver writers the
semantics of how cookies themselves are operated upon.  virt-dma goes
a little further and provides descriptor lists and methods to look up
a descriptor given a cookie.

The only thing that a driver writer should concern themselves with is
making the appropriate cookie management calls at the relevant point in
the code - in other words, allocating a cookie in the submit callback,
completing a cookie when a descriptor is complete, etc.

Drivers should have no reason whatsoever to be touching the cookie
directly anymore.
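Concretely, with the helpers from drivers/dma/dmaengine.h, the whole
of it boils down to something like this sketch (the foo_ names,
locking and list handling are elided or made up):

  /* tx_submit: just take a cookie rather than computing one */
  static dma_cookie_t foo_tx_submit(struct dma_async_tx_descriptor *tx)
  {
          /* descriptor list insertion and locking omitted */
          return dma_cookie_assign(tx);
  }

  /* completion path, e.g. from the interrupt handler */
  static void foo_complete_desc(struct foo_chan *fc)
  {
          dma_cookie_complete(&fc->cur_desc->txd);
          /* free or recycle the descriptor, start the next one, ... */
  }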
Maxime Ripard July 31, 2014, 4:13 p.m. UTC | #6
Hi Lars-Peter,

On Thu, Jul 31, 2014 at 02:44:56PM +0200, Lars-Peter Clausen wrote:
> On 07/31/2014 09:44 AM, Maxime Ripard wrote:
> [...]
> >   - Having to set device_slave_caps or not?
> 
> Yes. This should in my opinion be mandatory for new drivers.

Ok. 

> One of the issues with the DMAengine API is that it is really hard
> to write generic drivers that do not already know about the
> capabilities of the DMA controller. E.g. if you have a peripheral
> that is used on SoC A it assumes that the DMA controller it is
> connected to has the capabilities of the DMA controller in SoC A. If
> the same peripheral is now used on SoC B with a DMA controller with
> different capabilities this often ends up with ugly ifdefery in the
> peripheral driver. The peripheral driver shouldn't have to know
> which specific DMA controller it is connected to (it's a bit as if a
> GPIO consumer needed to know which GPIO controller it is connected
> to). We got away with the current approach since there is not that
> much diversity in the mixing of peripherals and DMA controllers
> (each vendor pretty much has their own DMA controller which it uses
> for their own peripherals). But with more recent code consolidation
> we are on a path to generic DMA support within subsystem frameworks
> (there is generic DMA support for audio, there is generic DMA
> support for SPI, and I also have a (currently) out-of-tree patch for
> generic DMA support for IIO). These generic drivers also need to be
> able to discover the capabilities of the DMA controller to be able
> to make the right decisions.

Yeah, I've seen the generic infrastructure in both ASoC and SPI, and
it's great that it's coming to IIO as well.

I wasn't aware that it was relying on device_slave_caps though, and
had been misled by the caps name into thinking that it was related to
the caps_mask, which it obviously is not.

From what you're saying, and judging from the drivers that already
implement it, can't it be moved directly to the framework itself ?

The information put there could either be used elsewhere (like
framework-level filtering of invalid directions/bus widths) or be
derived directly from which callbacks are set (in the pause/terminate
case).

I guess that would make the generic layer much easier to write, since
you'll always have this information.

Maxime
Maxime Ripard July 31, 2014, 4:23 p.m. UTC | #7
On Thu, Jul 31, 2014 at 05:26:28PM +0530, Vinod Koul wrote:
> On Thu, Jul 31, 2014 at 09:44:40AM +0200, Maxime Ripard wrote:
> > Hi Vinod,
> > 
> > On Wed, Jul 30, 2014 at 09:36:07PM +0530, Vinod Koul wrote:
> > > On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> > > > The dmaengine is neither trivial nor properly documented at the moment, which
> > > > means a lot of trial-and-error development; that is not that good for such a
> > > > central piece of the system.
> > > > 
> > > > An attempt at making such documentation.
> > > 
> > > Did you miss Documentation/dmaengine.txt? Lots of this is already covered
> > > there. But yes, I would be really glad to know what isn't, so that we can fix
> > > that.
> > 
> > I didn't miss it. I feel like it describes the slave API quite nicely,
> > but it doesn't help at all when you're writing a DMAengine driver.
> > 
> > The first lines of the existing document make it quite clear too.
> > 
> > There's still a bit of duplication, but I don't feel it's such a big
> > deal.
> And that made me think that you might have missed it.
> 
> I am okay with the idea of having this document, but it needs to co-exist
> with the existing one. No point in duplicating, as that can create ambiguity
> in the future.

The only duplication I'm seeing is about the device_prep* operations,
which get described in dmaengine.txt too.

There's also a minor one about struct dma_slave_config, but both are
rather generic and point to dmaengine.h, so I guess we won't be at
risk of any real ambiguity.

Do you see anything else?

> > What I'd like to do with the documentation I just sent is basically
> > have a clear idea whenever you step into dmaengine what you can/cannot
> > do, and have a reference document explaining what's expected by the
> > framework, and hopefully have unified drivers that follow this
> > pattern.
> Sure, can you please modify this to avoid duplication? I would be happy to
> apply that :)

See above :)

Also, feel free to add anything that you feel like you keep saying
during the review. If mistakes keep coming, it's probably worth
documenting what you expect.

> > Because, for the moment, we're pretty much left in the dark with
> > different drivers doing the same thing in completely different ways,
> > with basically no way to tell if it's either the framework that
> > requires such behaviour, or if the author was just feeling creative.
> > 
> > There's numerous examples for this at the moment:
> >   - The GFP flags, with different drivers using either GFP_ATOMIC,
> >     GFP_NOWAIT or GFP_KERNEL in the same functions
> >   - Having to set device_slave_caps or not?
> >   - Some drivers use dma_run_dependencies, some others don't
> >   - That might just be my experience, but judging from previous
> >     commits, DMA_PRIVATE is completely obscure, and we just set it
> >     because it was making it work, without knowing what it was
> >     supposed to do.
> >   - etc.
> 
> Thanks for highlighting these; we should definitely add them to the
> Documentation.

It's quite clear in the case of the GFP flags now, Lars-Peter and you
cleared up device_slave_caps, but I still could use some help with
DMA_PRIVATE.

> > And basically, we have no way to tell at the moment which one is
> > right and which one needs fixing.
> > 
> > The corollary being that it cripples the whole community's ability to
> > maintain the framework and make it evolve.
> > 
> > > > +  * device_slave_caps
> > > > +    - Isn't that redundant with the cap_mask already?
> > > > +    - Only a few drivers seem to implement it
> > > For audio to know what your channel can do rather than hardcoding it
> > 
> > Ah, yes, I see it now. It's not related to the caps mask at all.
> > 
> > Just out of curiosity, wouldn't it be better to move this to the
> > framework, and have this information provided through the struct
> > dma_device? Or would it have some non-trivial side-effects?
> Well, the problem is the ability to have this queried uniformly from all
> drivers across subsystems. If we can do this, that would be nice.

I can do some preliminary work on just that, and see if it
works for you then.

> > > > +  * dma cookies?
> > > A cookie is the representation of a dma transaction, a monotonically
> > > incrementing number.
> > 
> > Ok, and it identifies a unique dma_async_tx_descriptor, right?
> Yup, and this basically represents the transactions you have submitted.
> That's why the cookie is allocated at tx_submit.

Ok, thanks.

Maxime
Maxime Ripard July 31, 2014, 4:41 p.m. UTC | #8
On Thu, Jul 31, 2014 at 02:22:23PM +0100, Russell King - ARM Linux wrote:
> On Thu, Jul 31, 2014 at 09:44:40AM +0200, Maxime Ripard wrote:
> > I didn't miss it. I feel like it describes the slave API quite nicely,
> > but it doesn't help at all when you're writing a DMAengine driver.
> 
> There's actually two documents:
> 
> Documentation/crypto/async-tx-api.txt
> Documentation/dmaengine.txt
> 
> The first is the original DMA engine API for use with offloading things
> like memcpy, as well as the crypto API to hardware crypto DMA engines.
> 
> The second is documentation for the DMA slave API which modifies some
> of the requirements of the first (like, the ability to call the slave
> DMA prepare functions from within the callback, which are not permitted
> in the async tx API.)
> 
> Both documents are relevant for writing a DMA engine driver.

They're both relevant, but clearly not enough.

> > Because, for the moment, we're pretty much left in the dark with
> > different drivers doing the same thing in completely different ways,
> > with basically no way to tell if it's either the framework that
> > requires such behaviour, or if the author was just feeling creative.
> 
> DMA engine has lacked a lot of infrastructure for common patterns in
> drivers; some of that is solved by my efforts with the virt_dma.c
> support, and also various cleanups to existing drivers, but it's not
> easy to fix this problem after drivers have been merged.
> 
> > There's numerous examples for this at the moment:
> >   - The GFP flags, with different drivers using either GFP_ATOMIC,
> >     GFP_NOWAIT or GFP_KERNEL in the same functions
> 
> The slave prepare APIs should be using GFP_ATOMIC to ensure safety.

You prove my point then. Vinod asks for GFP_NOWAIT in his
reviews. Even though it doesn't change anything relative to the
atomicity of the operations, the policy is still not the same.

> >   - Having to set device_slave_caps or not?
> 
> device_slave_caps is a relatively new invention, so old drivers don't
> implement it.  Newer drivers should, and there probably should be some
> motivation for older drivers to add it.

Yep, just discovered that.

> >   - Some drivers use dma_run_dependencies, some others don't
> 
> Dependent operations are part of the async-tx API, and aren't really
> applicable to the slave DMA API.  A driver implementing just the slave
> DMA API need not concern itself with dependent operations, but a
> driver implementing the async-tx APIs should do.

Ok.

> >   - That might just be my experience, but judging from previous
> >     commits, DMA_PRIVATE is completely obscure, and we just set it
> >     because it was making it work, without knowing what it was
> >     supposed to do.
> 
> DMA_PRIVATE means that the channel is unavailable for async-tx operations
> - in other words, it's slave-only.  Setting it before registration results
> in the private-use count being incremented, disabling the DMA_PRIVATE
> manipulation in the channel request/release APIs (requested channels are
> unavailable for async-tx operations, which is why that flag is also set
> there.)
> 
> To put it another way, if a DMA engine driver only provides slave DMA
> support, it must set DMA_PRIVATE to mark the channel unavailable for
> async-tx operations.  If a DMA engine driver's channels can also be used
> for async-tx operations, then it should leave the flag clear.

What about channels that can be both used for slave operations and
async-tx (like memcpy) ?

> > And basically, we have no way to tell at the moment which one is
> > right and which one needs fixing.
> > 
> > The corollary being that it cripples the whole community's ability to
> > maintain the framework and make it evolve.
> 
> All respect to Vinod (who looks after the slave side of the DMA engine
> implementations) and Dan, what it needs is someone who knows the *whole*
> DMA engine picture to be in control of the subsystem, and who knows these
> details - not just one bit of it.  Someone who has the time to put into
> reviewing changes to it, and probably more importantly, has the time to
> clean up the existing code.  Finding someone who fits that is a struggle.
> 
> Dan knows the whole picture (or used to when he was working on it at one
> of his former employers) but I don't think Dan had the available time to
> address various issues with it.

Hence why we should document as much as possible then, to spread the
knowledge and avoid it being lost when someone disappears or isn't
available anymore.

(And hopefully reduce the number of "errors" that might be done by
submitters, hence reducing the number of review iterations, lessening
the maintainers load)

> > > A cookie is the representation of a dma transaction, a monotonically
> > > incrementing number.
> > 
> > Ok, and it identifies a unique dma_async_tx_descriptor, right?
> 
> You really should not be concerned with DMA cookies anymore since one
> of my cleanups - I added a number of helper functions which hide the
> manipulation of DMA cookies, and take away from driver writers the
> semantics of how cookies themselves are operated upon.  virt-dma goes
> a little further and provides descriptor lists and methods to look up
> a descriptor given a cookie.
> 
> The only thing that a driver writer should concern themselves with is
> making the appropriate cookie management calls at the relevant point in
> the code - in other words, allocating a cookie in the submit callback,
> completing a cookie when a descriptor is complete, etc.
> 
> Drivers should have no reason whatsoever to be touching the cookie
> directly anymore.

Ok, thanks a lot for this!

Maxime
Lars-Peter Clausen July 31, 2014, 4:54 p.m. UTC | #9
On 07/31/2014 06:13 PM, Maxime Ripard wrote:
[...]
>  From what you're saying, and judging from the drivers that already
> implement it, can't it be moved directly to the framework itself ?
>

What exactly do you mean by moving it directly to the framework? The 
slave_caps API is part of the DMAengine framework.

- Lars

Maxime Ripard July 31, 2014, 5:37 p.m. UTC | #10
On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
> On 07/31/2014 06:13 PM, Maxime Ripard wrote:
> [...]
> > From what you're saying, and judging from the drivers that already
> >implement it, can't it be moved directly to the framework itself ?
> >
> 
> What exactly do you mean by moving it directly to the framework? The
> slave_caps API is part of the DMAengine framework.

Not its implementation, which is defined by each and every driver,
while the behaviour of device_slave_caps is rather generic.
Lars-Peter Clausen Aug. 1, 2014, 8 a.m. UTC | #11
On 07/31/2014 07:37 PM, Maxime Ripard wrote:
> On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
>> On 07/31/2014 06:13 PM, Maxime Ripard wrote:
>> [...]
>>>  From what you're saying, and judging from the drivers that already
>>> implement it, can't it be moved directly to the framework itself ?
>>>
>>
>> What exactly do you mean by moving it directly to the framework? The
>> slave_caps API is part of the DMAengine framework.
>
> Not its implementation, which is defined by each and every driver,
> while the behaviour of device_slave_caps is rather generic.
>

Do you mean something like adding a dma_slave_caps struct field to the DMA 
channel that gets initialized when the channel is created and then removing
the callback? That makes some sense.

I think the main reason why we use a callback right now is that in earlier 
revisions of the API it was possible to pass a slave_config and the result 
would be the capabilities of the DMA channel for this particular config. But 
this was dropped before the final version was merged upstream.

- Lars
Maxime Ripard Aug. 1, 2014, 8:57 a.m. UTC | #12
On Fri, Aug 01, 2014 at 10:00:10AM +0200, Lars-Peter Clausen wrote:
> On 07/31/2014 07:37 PM, Maxime Ripard wrote:
> >On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
> >>On 07/31/2014 06:13 PM, Maxime Ripard wrote:
> >>[...]
> >>> From what you're saying, and judging from the drivers that already
> >>>implement it, can't it be moved directly to the framework itself ?
> >>>
> >>
> >>What exactly do you mean by moving it directly to the framework? The
> >>slave_caps API is part of the DMAengine framework.
> >
> >Not its implementation, which is defined by each and every driver,
> >while the behaviour of device_slave_caps is rather generic.
> >
> 
> Do you mean something like adding a dma_slave_caps struct field to
> the DMA channel that gets initialized when the channel is created
> and then removing the callback? That makes some sense.

I was rather thinking of something like:
  - Splitting device_control into independent functions
  - Then, knowing if you support pause/resume/terminate is trivial:
    either you implement the callback, or you don't
  - Putting the supported width and direction into fields of struct
    dma_device, which can eventually be used by the framework to
    filter out invalid configurations before calling the relevant
    callbacks
  - That would then be trivial to get from the framework, without
    calling any callback
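
A very rough sketch of what I have in mind, with made-up names since
none of this exists yet:

  struct dma_device {
          /* ... existing fields ... */

          /* device_control split up; a missing callback simply means
           * the operation isn't supported, no query call needed */
          int (*device_config)(struct dma_chan *chan,
                               struct dma_slave_config *config);
          int (*device_pause)(struct dma_chan *chan);
          int (*device_resume)(struct dma_chan *chan);
          int (*device_terminate_all)(struct dma_chan *chan);

          /* capabilities as plain fields the framework can read */
          u32 src_addr_widths;    /* bitmask of dma_slave_buswidth */
          u32 dst_addr_widths;
          u32 directions;         /* bitmask of dma_transfer_direction */
  };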

> I think the main reason why we use a callback right now is that in
> earlier revisions of the API it was possible to pass a slave_config
> and the result would be the capabilities of the DMA channel for this
> particular config. But this was dropped before the final version was
> merged upstream.

Ah, that would explain it, yes.

Maxime
Russell King - ARM Linux Aug. 1, 2014, 2:53 p.m. UTC | #13
On Thu, Jul 31, 2014 at 06:41:52PM +0200, Maxime Ripard wrote:
> You prove my point then. Vinod asks for GFP_NOWAIT in his
> reviews. Even though it doesn't change anything relative to the
> atomicity of the operations, the policy is still not the same.

The difference between GFP_NOWAIT and GFP_ATOMIC is whether the emergency
pool can be used.  If you want DMA prepare functions to fail when there
is memory pressure, and you can guarantee that all drivers using the DMA
engine API will properly fall back to some other method, then I suppose
using GFP_NOWAIT is reasonable.  GFP_ATOMIC just provides additional
protection.
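
Either way the shape of a prepare callback is the same; a sketch under
the current API (foo_desc and its txd field are made up):

  static struct dma_async_tx_descriptor *
  foo_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
                    unsigned int sg_len, enum dma_transfer_direction dir,
                    unsigned long flags, void *context)
  {
          /* may be called from non-sleeping context: no GFP_KERNEL.
           * GFP_NOWAIT fails early under pressure, GFP_ATOMIC may
           * additionally dip into the emergency pools. */
          struct foo_desc *d = kzalloc(sizeof(*d), GFP_NOWAIT);

          if (!d)
                  return NULL;    /* callers must cope with failure */

          /* ... translate sgl into hardware descriptors ... */

          return &d->txd;
  }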

> > >   - That might just be my experience, but judging from previous
> > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > >     because it was making it work, without knowing what it was
> > >     supposed to do.
> > 
> > DMA_PRIVATE means that the channel is unavailable for async-tx operations
> > - in other words, it's slave-only.  Setting it before registration results
> > in the private-use count being incremented, disabling the DMA_PRIVATE
> > manipulation in the channel request/release APIs (requested channels are
> > unavailable for async-tx operations, which is why that flag is also set
> > there.)
> > 
> > To put it another way, if a DMA engine driver only provides slave DMA
> > support, it must set DMA_PRIVATE to mark the channel unavailable for
> > async-tx operations.  If a DMA engine driver's channels can also be used
> > for async-tx operations, then it should leave the flag clear.
> 
> What about channels that can be both used for slave operations and
> async-tx (like memcpy) ?

Then you don't set DMA_PRIVATE when the DMA engine driver registers with
the core.  That then allows the DMA engine core to manage the flag,
setting it when the channel is allocated for slave usage.

One important point here is that async_tx does not have the idea of
channel allocation - a channel which is registered into the DMA engine
core without DMA_PRIVATE set is a candidate for use by the async_tx
API.

> Hence why we should document as much as possible then, to spread the
> knowledge and avoid it being lost when someone disappears or isn't
> available anymore.
> 
> (And hopefully reduce the number of "errors" that might be done by
> submitters, hence reducing the number of review iterations, lessening
> the maintainers load)

The bigger issue in the longer run is being able to maintain and improve
the DMA engine implementations and API.  One of the blockers to that is
properly understanding all the async_tx API stuff, something which even
I have been unable to do despite knowing a fair amount about DMA engine
itself.

Where I fall down is the exact meaning of the ACK stuff, how the chaining
is supposed to work, and how the dependency between two DMA channels is
handled.

This is why I was unable to resolve DMA engine's multiple mapping of the
same DMA buffers, which has caused problems for ARM.

I've discussed this point in the past with Dan, making the point that
each DMA engine driver should not be that concerned about things like
descriptor list management, cookies, dependent transfers, but we seem
to be ultimately doomed to re-implementing much of that infrastructure
in every single DMA engine driver that comes along.

Each DMA engine driver which gets accepted into mainline makes this
problem worse - it's another driver which does something slightly
differently and that would need to be fixed if we were ever to clean
this stuff up.  This remains my biggest concern with the DMA engine
code today, and I'm one of the few people who have put effort into
trying to sort that out.

> > Drivers should have no reason whatsoever to be touching the cookie
> > directly anymore.
> 
> Ok, thanks a lot for this!

I notice that drivers/dma/sirf-dma.c directly touches it and the
completed cookie as well:

drivers/dma/sirf-dma.c: dma_cookie_t last_cookie = 0;
drivers/dma/sirf-dma.c:                         last_cookie = desc->cookie;
drivers/dma/sirf-dma.c:                 schan->chan.completed_cookie = last_cookie;

That's something which should be fixed.  I haven't looked too hard
because it's not something I want to get involved with again right
now.
Vinod Koul Aug. 1, 2014, 5:13 p.m. UTC | #14
On Thu, Jul 31, 2014 at 06:23:30PM +0200, Maxime Ripard wrote:
> On Thu, Jul 31, 2014 at 05:26:28PM +0530, Vinod Koul wrote:
> 
> Also, feel free to add anything that you feel like you keep saying
> during the review. If mistakes keep coming, it's probably worth
> documenting what you expect.
I think the common issues seen would be:
- prepare calls in atomic context and usage of GFP_NOWAIT for memory
  allocations
- residue calculation; though the situation is much better now, lots of
  drivers still do it wrong
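
For the residue point, the expected shape is roughly the sketch below;
dma_cookie_status() and dma_set_residue() are the real helpers, while
foo_residue() stands in for reading the remaining byte count from the
hardware:

  static enum dma_status foo_tx_status(struct dma_chan *chan,
                                       dma_cookie_t cookie,
                                       struct dma_tx_state *state)
  {
          enum dma_status status = dma_cookie_status(chan, cookie, state);

          if (status == DMA_COMPLETE || !state)
                  return status;

          /* residue is the number of bytes left, not descriptors */
          dma_set_residue(state, foo_residue(chan, cookie));

          return status;
  }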

> 
> > > Because, for the moment, we're pretty much left in the dark with
> > > different drivers doing the same thing in completely different ways,
> > > with basically no way to tell if it's either the framework that
> > > requires such behaviour, or if the author was just feeling creative.
> > > 
> > > There's numerous examples for this at the moment:
> > >   - The GFP flags, with different drivers using either GFP_ATOMIC,
> > >     GFP_NOWAIT or GFP_KERNEL in the same functions
> > >   - Having to set device_slave_caps or not?
> > >   - Some drivers use dma_run_dependencies, some others don't
> > >   - That might just be my experience, but judging from previous
> > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > >     because it was making it work, without knowing what it was
> > >     supposed to do.
> > >   - etc.
> > 
> > Thanks for highlighting these; we should definitely add them to the
> > Documentation.
> 
> It's quite clear in the case of the GFP flags now, Lars-Peter and you
> cleared up device_slave_caps, but I still could use some help with
> DMA_PRIVATE.
> 
> > > And basically, we have no way to tell at the moment which one is
> > > right and which one needs fixing.
> > > 
> > > The corollary being that it cripples the whole community's ability to
> > > maintain the framework and make it evolve.
> > > 
> > > > > +  * device_slave_caps
> > > > > +    - Isn't that redundant with the cap_mask already?
> > > > > +    - Only a few drivers seem to implement it
> > > > For audio to know what your channel can do rather than hardcoding it
> > > 
> > > Ah, yes, I see it now. It's not related to the caps mask at all.
> > > 
> > > Just out of curiosity, wouldn't it be better to move this to the
> > > framework, and have this information provided through the struct
> > > dma_device? Or would it have some non-trivial side-effects?
> > Well, the problem is the ability to have this queried uniformly from all
> > drivers across subsystems. If we can do this, that would be nice.
> 
> I can do some preliminary work on just that, and see if it
> works for you then.
Sure, sounds excellent to me.
Vinod Koul Aug. 1, 2014, 5:15 p.m. UTC | #15
On Fri, Aug 01, 2014 at 10:57:07AM +0200, Maxime Ripard wrote:
> On Fri, Aug 01, 2014 at 10:00:10AM +0200, Lars-Peter Clausen wrote:
> > On 07/31/2014 07:37 PM, Maxime Ripard wrote:
> > >On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
> > >>On 07/31/2014 06:13 PM, Maxime Ripard wrote:
> > >>[...]
> > >>> From what you're saying, and judging from the drivers that already
> > >>>implement it, can't it be moved directly to the framework itself ?
> > >>>
> > >>
> > >>What exactly do you mean by moving it directly to the framework? The
> > >>slave_caps API is part of the DMAengine framework.
> > >
> > >Not its implementation, which is defined by each and every driver,
> > >while the behaviour of device_slave_caps is rather generic.
> > >
> > 
> > Do you mean something like adding a dma_slave_caps struct field to
> > the DMA channel that gets initialized when the channel is created
> > and then removing the callback? That makes some sense.
> 
> I was rather thinking of something like:
>   - Splitting device_control into independent functions
I like this part :)

>   - Then, knowing if you support pause/resume/terminate is trivial:
>     either you implement the callback, or you don't
>   - Putting the supported width and direction into fields of struct
>     dma_device, which can eventually be used by the framework to
>     filter out invalid configurations before calling the relevant
>     callbacks
that is a good idea
>   - That would then be trivial to get from the framework, without
>     calling any callback
Yes please
Vinod Koul Aug. 1, 2014, 5:22 p.m. UTC | #16
On Thu, Jul 31, 2014 at 02:22:23PM +0100, Russell King - ARM Linux wrote:
> DMA engine has lacked a lot of infrastructure for common patterns in
> drivers; some of that is solved by my efforts with the virt_dma.c
> support, and also various cleanups to existing drivers, but it's not
> easy to fix this problem after drivers have been merged.
I think list management is something that needs fixing. I see lots of time
and energy being spent by drivers on this.
I plan to move descriptor and list management to the framework in the near
future. That should help drivers get cleaned up a bit more.
Lars-Peter Clausen Aug. 1, 2014, 6:09 p.m. UTC | #17
On 08/01/2014 07:15 PM, Vinod Koul wrote:
> On Fri, Aug 01, 2014 at 10:57:07AM +0200, Maxime Ripard wrote:
>> On Fri, Aug 01, 2014 at 10:00:10AM +0200, Lars-Peter Clausen wrote:
>>> On 07/31/2014 07:37 PM, Maxime Ripard wrote:
>>>> On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
>>>>> On 07/31/2014 06:13 PM, Maxime Ripard wrote:
>>>>> [...]
>>>>>>  From what you're saying, and judging from the drivers that already
>>>>>> implement it, can't it be moved directly to the framework itself ?
>>>>>>
>>>>>
>>>>> What exactly do you mean by moving it directly to the framework? The
>>>>> slave_caps API is part of the DMAengine framework.
>>>>
>>>> Not its implementation, which is defined by each and every driver,
>>>> while the behaviour of device_slave_caps is rather generic.
>>>>
>>>
>>> Do you mean something like adding a dma_slave_caps struct field to
>>> the DMA channel that gets initialized when the channel is created
>>> and then removing the callback? That makes some sense.
>>
>> I was rather thinking of something like:
>>    - Splitting device_control into independent functions
> I like this part :)

I started working on this a while ago by splitting out the slave_config 
functionality into its own callback. Haven't managed to finalize it since it 
wasn't really top priority.

- Lars

Maxime Ripard Aug. 2, 2014, 2:49 p.m. UTC | #18
On Fri, Aug 01, 2014 at 10:43:06PM +0530, Vinod Koul wrote:
> On Thu, Jul 31, 2014 at 06:23:30PM +0200, Maxime Ripard wrote:
> > On Thu, Jul 31, 2014 at 05:26:28PM +0530, Vinod Koul wrote:
> > 
> > Also, feel free to add anything that you feel like you keep saying
> > during the review. If mistakes keep coming, it's probably worth
> > documenting what you expect.
> I think the common issues seen would be:
> - prepare calls in atomic context and usage of GFP_NOWAIT for memory
>   allocations

I think we have that part covered already.

> - residue calculation; though the situation is much better now, lots of
>   drivers still do it wrong

What mistake is often made regarding the residue calculation?

> > > > Because, for the moment, we're pretty much left in the dark with
> > > > different drivers doing the same thing in completely different ways,
> > > > with basically no way to tell if it's either the framework that
> > > > requires such behaviour, or if the author was just feeling creative.
> > > > 
> > > > There's numerous examples for this at the moment:
> > > >   - The GFP flags, with different drivers using either GFP_ATOMIC,
> > > >     GFP_NOWAIT or GFP_KERNEL in the same functions
> > > >   - Having to set device_slave_caps or not?
> > > >   - Some drivers use dma_run_dependencies, some others don't
> > > >   - That might just be my experience, but judging from previous
> > > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > > >     because it was making it work, without knowing what it was
> > > >     supposed to do.
> > > >   - etc.
> > > 
> > > Thanks for highlighting these; we should definitely add them to the
> > > Documentation.
> > 
> > It's quite clear in the case of the GFP flags now, Lars-Peter and you
> > cleared up device_slave_caps, but I still could use some help with
> > DMA_PRIVATE.
> > 
> > > > And basically, we have no way to tell at the moment which one is
> > > > right and which one needs fixing.
> > > > 
> > > > The corollary being that it cripples the whole community's ability to
> > > > maintain the framework and make it evolve.
> > > > 
> > > > > > +  * device_slave_caps
> > > > > > +    - Isn't that redundant with the cap_mask already?
> > > > > > +    - Only a few drivers seem to implement it
> > > > > For audio to know what your channel can do rather than hardcoding it
> > > > 
> > > > Ah, yes, I see it now. It's not related to the caps mask at all.
> > > > 
> > > > Just out of curiosity, wouldn't it be better to move this to the
> > > > framework, and have this information provided through the struct
> > > > dma_device? Or would it have some non-trivial side-effects?
> > > Well, the problem is the ability to have this queried uniformly from all
> > > drivers across subsystems. If we can do this, that would be nice.
> > 
> > I can do some preliminary work on just that, and see if it
> > works for you then.
> Sure, sounds excellent to me.

Another question arose while starting this.

In the case of the call to device_control, especially in the
DMA_SLAVE_CONFIG case, but that also applies to pause/resume, are the
changes supposed to be immediate or can they happen later?

I actually have in mind the case where we would use a vchan, which
might or might not be actually mapped to a physical channel at the
moment where the DMA_SLAVE_CONFIG call is made. In the case where it's
not mapped and not transferring anything, it's pretty trivial to
handle, but in the case where it's actually mapped to a physical
channel, should we push the new configuration to the physical channel
right away, or can it wait until the transfer ends ?

Maxime
Maxime Ripard Aug. 2, 2014, 3:11 p.m. UTC | #19
On Fri, Aug 01, 2014 at 03:53:26PM +0100, Russell King - ARM Linux wrote:
> > > >   - That might just be my experience, but judging from previous
> > > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > > >     because it was making it work, without knowing what it was
> > > >     supposed to do.
> > > 
> > > DMA_PRIVATE means that the channel is unavailable for async-tx operations
> > > - in other words, it's slave-only.  Setting it before registration results
> > > in the private-use count being incremented, disabling the DMA_PRIVATE
> > > manipulation in the channel request/release APIs (requested channels are
> > > unavailable for async-tx operations, which is why that flag is also set
> > > there.)
> > > 
> > > To put it another way, if a DMA engine driver only provides slave DMA
> > > support, it must set DMA_PRIVATE to mark the channel unavailable for
> > > async-tx operations.  If a DMA engine driver's channels can also be used
> > > for async-tx operations, then it should leave the flag clear.
> > 
> > What about channels that can be both used for slave operations and
> > async-tx (like memcpy) ?
> 
> Then you don't set DMA_PRIVATE when the DMA engine driver registers with
> the core.  That then allows the DMA engine core to manage the flag,
> setting it when the channel is allocated for slave usage.

Then I guess we currently have a bug related to this.

During the development of the A31 driver that recently got merged,
during two subsequent channel allocations, the second would fail if
DMA_PRIVATE wasn't set.

I think it was in dmaengine_get, but I'll have to get back to you
on that when I'm back from my holidays.

> One important point here is that async_tx does not have the idea of
> channel allocation - a channel which is registered into the DMA engine
> core without DMA_PRIVATE set is a candidate for use by the async_tx
> API.
> 
> > Hence why we should document as much as possible then, to spread the
> > knowledge and avoid it being lost when someone disappears or isn't
> > available anymore.
> > 
> > (And hopefully reduce the number of "errors" that might be done by
> > submitters, hence reducing the number of review iterations, lessening
> > the maintainers load)
> 
> The bigger issue in the longer run is being able to maintain and improve
> the DMA engine implementations and API.  One of the blockers to that is
> properly understanding all the async_tx API stuff, something which even
> I have been unable to do despite knowing a fair amount about DMA engine
> itself.
> 
> Where I fall down is the exact meaning of the ACK stuff, how the chaining
> is supposed to work, and how the dependency between two DMA channels is
> handled.
> 
> This is why I was unable to resolve DMA engine's multiple mapping of the
> same DMA buffers, which has caused problems for ARM.
> 
> I've discussed this point in the past with Dan, making the point that
> each DMA engine driver should not be that concerned about things like
> descriptor list management, cookies, dependent transfers, but we seem
> to be ultimately doomed to re-implementing much of that infrastructure
> in every single DMA engine driver that comes along.

You can also add vchan scheduling or descriptor allocations. The atmel
guys seem to have hit a performance issue with dma_pool_alloc, so they
re-developed a descriptor allocation mechanism in their driver. I
don't have many more details on this, but if that turns out to be true, that
would definitely be something that should be in the framework.

vchan mapping to physical channels also seem to be quite common in the
drivers. That would be a nice addition too.

> Each DMA engine driver which gets accepted into mainline makes this
> problem worse - it's another driver which does something slightly
> differently and that would need to be fixed if we were ever to clean
> this stuff up.  This remains my biggest concern with the DMA engine
> code today, and I'm one of the few people who have put effort into
> trying to sort that out.

Yep. It's one of mine too, I'm willing to help if possible.

> > > Drivers should have no reason whatsoever to be touching the cookie
> > > directly anymore.
> > 
> > Ok, thanks a lot for this!
> 
> I notice that drivers/dma/sirf-dma.c directly touches it and the
> completed cookie as well:
> 
> drivers/dma/sirf-dma.c: dma_cookie_t last_cookie = 0;
> drivers/dma/sirf-dma.c:                         last_cookie = desc->cookie;
> drivers/dma/sirf-dma.c:                 schan->chan.completed_cookie = last_cookie;
> 
> That's something which should be fixed.  I haven't looked too hard
> because it's not something I want to get involved with again right
> now.

By quickly looking at the code, it looks like dead code. I guess it
could just be removed.

Maxime
Maxime Ripard Aug. 2, 2014, 3:13 p.m. UTC | #20
On Fri, Aug 01, 2014 at 08:09:17PM +0200, Lars-Peter Clausen wrote:
> On 08/01/2014 07:15 PM, Vinod Koul wrote:
> >On Fri, Aug 01, 2014 at 10:57:07AM +0200, Maxime Ripard wrote:
> >>On Fri, Aug 01, 2014 at 10:00:10AM +0200, Lars-Peter Clausen wrote:
> >>>On 07/31/2014 07:37 PM, Maxime Ripard wrote:
> >>>>On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
> >>>>>On 07/31/2014 06:13 PM, Maxime Ripard wrote:
> >>>>>[...]
> >>>>>> From what you're saying, and judging from the drivers that already
> >>>>>>implement it, can't it be moved directly to the framework itself ?
> >>>>>>
> >>>>>
> >>>>>What exactly do you mean by moving it directly to the framework? The
> >>>>>slave_caps API is part of the DMAengine framework.
> >>>>
> >>>>Not its implementation, which is defined by each and every driver,
> >>>>while the behaviour of device_slave_caps is rather generic.
> >>>>
> >>>
> >>>Do you mean something like adding a dma_slave_caps struct field to
> >>>the DMA channel that gets initialized when the channel is created
> >>>and then removing the callback? That makes some sense.
> >>
> >>I was rather thinking of something like:
> >>   - Splitting device_control into independent functions
> >I like this part :)
> 
> I started working on this a while ago by splitting out the
> slave_config functionality into its own callback. Haven't managed to
> finalize it since it wasn't really top priority.

Heh, I've done the same yesterday... Do you plan on submitting it soon,
or should I keep going?

Maxime
Russell King - ARM Linux Aug. 2, 2014, 3:17 p.m. UTC | #21
On Sat, Aug 02, 2014 at 04:49:25PM +0200, Maxime Ripard wrote:
> In the case of the call to device_control, especially in the
> DMA_SLAVE_CONFIG case, but that also applies to pause/resume, are the
> changes supposed to be immediate or can they happen later?

pause/resume are expected to operate synchronously on the channel.
When asking for the channel to pause, the channel must be stopped
before the call returns.

In the case of channel configuration, that should have no effect on
any previously queued transfer - in other words, changing the channel
configuration only has an effect when subsequent transactions are
prepared.
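
In other words, the driver can simply cache the configuration in its
channel structure and only consume it at prepare time; a sketch, with
foo placeholders:

  static int foo_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
                         unsigned long arg)
  {
          struct foo_chan *fc = to_foo_chan(chan);

          switch (cmd) {
          case DMA_SLAVE_CONFIG:
                  /* snapshot only; used by the next prep_slave_sg(),
                   * never applied to already-queued transfers */
                  fc->cfg = *(struct dma_slave_config *)arg;
                  return 0;
          default:
                  return -ENOSYS;
          }
  }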
Russell King - ARM Linux Aug. 2, 2014, 3:29 p.m. UTC | #22
On Sat, Aug 02, 2014 at 05:11:44PM +0200, Maxime Ripard wrote:
> On Fri, Aug 01, 2014 at 03:53:26PM +0100, Russell King - ARM Linux wrote:
> > > > >   - That might just be my experience, but judging from previous
> > > > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > > > >     because it was making it work, without knowing what it was
> > > > >     supposed to do.
> > > > 
> > > > DMA_PRIVATE means that the channel is unavailable for async-tx operations
> > > > - in other words, it's slave-only.  Setting it before registration results
> > > > in the private-use count being incremented, disabling the DMA_PRIVATE
> > > > manipulation in the channel request/release APIs (requested channels are
> > > > unavailable for async-tx operations, which is why that flag is also set
> > > > there.)
> > > > 
> > > > To put it another way, if a DMA engine driver only provides slave DMA
> > > > support, it must set DMA_PRIVATE to mark the channel unavailable for
> > > > async-tx operations.  If a DMA engine driver's channels can also be used
> > > > for async-tx operations, then it should leave the flag clear.
> > > 
> > > What about channels that can be both used for slave operations and
> > > async-tx (like memcpy) ?
> > 
> > Then you don't set DMA_PRIVATE when the DMA engine driver registers with
> > the core.  That then allows the DMA engine core to manage the flag,
> > setting it when the channel is allocated for slave usage.
> 
> Then I guess we currently have a bug related to this.
> 
> During the development of the A31 driver that recently got merged,
> > during two subsequent channel allocations, the second would fail if
> DMA_PRIVATE wasn't set.
> 
> > I think it was in dmaengine_get, but I'll have to get back to you
> > on that when I'm back from my holidays.

I think you mean dma_chan_get().  dmaengine_get() is used to obtain
references to the async_tx memcpy/crypto channels, and should not be used
by a driver making use of the slave capabilities.

The two systems are slightly competitive with each other.  Slave
stuff generally want to get hold of specific channels, whereas the
async_tx stuff can generally do with any channel which provides the
appropriate capabilities, and can switch between channels if it needs
to perform a series of operations only supported by separate channels.

> You can also add vchan scheduling or descriptor allocations. The atmel
> guys seem to have hit a performance issue with dma_pool_alloc, so they
> > re-developed a descriptor allocation mechanism in their driver. I
> > don't have many more details on this, but if that turns out to be true, that
> would definitely be something that should be in the framework.

It is entirely legal for a dmaengine driver to maintain its own list of
"free" transaction descriptors (that's actually what many async-tx
drivers do) and is partly why there's the alloc_chan_resources and
free_chan_resources callbacks.

However, they tend to be fairly large structures, and don't always
match what the hardware wants, so using them to describe the individual
scatterlist elements of a transaction is not always a good idea.

> vchan mapping to physical channels also seem to be quite common in the
> drivers. That would be a nice addition too.

It's intentionally not in the vchan interface because that mapping is
device specific.

For example, there are some DMA engine implementations where certain
physical DMA channels should not be used because they have slightly
different properties (for example, they can lock the CPU off the bus
if they're permanently transferring data, such as in a memcpy or
device-to-memory transfer where the device always has data available.)

In any case, I'm not interested in seeing vchan turning into another
"layer" - it must remain a library.  Layers are bad news. :)
Maxime Ripard Aug. 2, 2014, 7:05 p.m. UTC | #23
On Sat, Aug 02, 2014 at 04:29:21PM +0100, Russell King - ARM Linux wrote:
> On Sat, Aug 02, 2014 at 05:11:44PM +0200, Maxime Ripard wrote:
> > On Fri, Aug 01, 2014 at 03:53:26PM +0100, Russell King - ARM Linux wrote:
> > > > > >   - That might just be my experience, but judging from previous
> > > > > >     commits, DMA_PRIVATE is completely obscure, and we just set it
> > > > > >     because it was making it work, without knowing what it was
> > > > > >     supposed to do.
> > > > > 
> > > > > DMA_PRIVATE means that the channel is unavailable for async-tx operations
> > > > > - in other words, it's slave-only.  Setting it before registration results
> > > > > in the private-use count being incremented, disabling the DMA_PRIVATE
> > > > > manipulation in the channel request/release APIs (requested channels are
> > > > > unavailable for async-tx operations, which is why that flag is also set
> > > > > there.)
> > > > > 
> > > > > To put it another way, if a DMA engine driver only provides slave DMA
> > > > > support, it must set DMA_PRIVATE to mark the channel unavailable for
> > > > > async-tx operations.  If a DMA engine driver's channels can also be used
> > > > > for async-tx operations, then it should leave the flag clear.
> > > > 
> > > > What about channels that can be both used for slave operations and
> > > > async-tx (like memcpy)?
> > > 
> > > Then you don't set DMA_PRIVATE when the DMA engine driver registers with
> > > the core.  That then allows the DMA engine core to manage the flag,
> > > setting it when the channel is allocated for slave usage.
> > 
> > Then I guess we currently have a bug related to this.
> > 
> > During the development of the A31 driver that recently got merged,
> > during two subsequent channel allocations, the second would fail if
> > DMA_PRIVATE wasn't set.
> > 
> > I think it was in dmaengine_get, but I'll have to get back to you
> > on that whenever I'm back from my holidays.
> 
> I think you mean dma_chan_get().  dmaengine_get() is used to obtain
> references to the async_tx memcpy/crypto channels, and should not be used
> by a driver making use of the slave capabilities.

Like I said, I don't remember where the actual issue was lying, but
ignoring DMA_PRIVATE prevented allocating two channels at the same
time in my case.

When I have access to my board again, I'll try to dig into it some more.

> These two systems are slightly in competition with each other.  Slave
> stuff generally wants to get hold of specific channels, whereas the
> async_tx stuff can generally do with any channel which provides the
> appropriate capabilities, and can switch between channels if it needs
> to perform a series of operations only supported by separate channels.
> 
> > You can also add vchan scheduling or descriptor allocations. The atmel
> > guys seem to have hit a performance issue with dma_pool_alloc, so they
> > re-developed a descriptor allocation mechanism in their driver. I
> > don't have many more details on this, but if that turns out to be true,
> > that would definitely be something that should be in the framework.
> 
> It is entirely legal for a dmaengine driver to maintain its own list of
> "free" transaction descriptors (that's actually what many async-tx
> drivers do) and is partly why there's the alloc_chan_resources and
> free_chan_resources callbacks.
> 
> However, they tend to be fairly large structures, and don't always
> match what the hardware wants, so using them to describe the individual
> scatterlist elements of a transaction is not always a good idea.

I was not really claiming it was illegal, but rather that it could be
useful for other drivers too.

> > vchan mapping to physical channels also seems to be quite common in the
> > drivers. That would be a nice addition too.
> 
> It's intentionally not in the vchan interface because that mapping is
> device specific.
> 
> For example, there are some DMA engine implementations where certain
> physical DMA channels should not be used because they have slightly
> different properties (for example, they can lock the CPU off the bus
> if they're permanently transferring data, such as in a memcpy or
> device-to-memory transfer where the device always has data available.)
> 
> In any case, I'm not interested in seeing vchan turning into another
> "layer" - it must remain a library.  Layers are bad news. :)

Yeah, I was assuming there would be such constraints. Still, handling
the trivial case where you can map any vchan to any free physical
channel is very painful, while at least a couple of drivers implement
the same dumb logic.
Maxime Ripard Aug. 2, 2014, 7:06 p.m. UTC | #24
On Sat, Aug 02, 2014 at 04:17:14PM +0100, Russell King - ARM Linux wrote:
> On Sat, Aug 02, 2014 at 04:49:25PM +0200, Maxime Ripard wrote:
> > In the case of the call to device_control, especially in the
> > DMA_SLAVE_CONFIG case, but that also applies to pause/resume, are the
> > changes supposed to be immediate, or can they happen later?
> 
> pause/resume are expected to operate synchronously on the channel.
> When asking for the channel to pause, the channel must be stopped
> before the call returns.
> 
> In the case of channel configuration, that should have no effect on
> any previously queued transfer - in other words, changing the channel
> configuration only has an effect when subsequent transactions are
> prepared.

Ok, thanks!
Lars-Peter Clausen Aug. 4, 2014, 7:16 a.m. UTC | #25
On 08/02/2014 05:13 PM, Maxime Ripard wrote:
> On Fri, Aug 01, 2014 at 08:09:17PM +0200, Lars-Peter Clausen wrote:
>> On 08/01/2014 07:15 PM, Vinod Koul wrote:
>>> On Fri, Aug 01, 2014 at 10:57:07AM +0200, Maxime Ripard wrote:
>>>> On Fri, Aug 01, 2014 at 10:00:10AM +0200, Lars-Peter Clausen wrote:
>>>>> On 07/31/2014 07:37 PM, Maxime Ripard wrote:
>>>>>> On Thu, Jul 31, 2014 at 06:54:11PM +0200, Lars-Peter Clausen wrote:
>>>>>>> On 07/31/2014 06:13 PM, Maxime Ripard wrote:
>>>>>>> [...]
>>>>>>>>  From what you're saying, and judging from the drivers that already
>>>>>>>> implement it, can't it be moved directly to the framework itself?
>>>>>>>>
>>>>>>>
>>>>>>> What exactly do you mean by moving it directly to the framework? The
>>>>>>> slave_caps API is part of the DMAengine framework.
>>>>>>
>>>>>> Not its implementation, which is defined by each and every driver,
>>>>>> while the behaviour of device_slave_caps is rather generic.
>>>>>>
>>>>>
>>>>> Do you mean something like adding a dma_slave_caps struct field to
>>>>> the DMA channel that gets initialized when the channel is created
>>>>> and then remove the callback? That makes some sense.
>>>>
>>>> I was rather thinking of something like:
>>>>    - Splitting device_control into independent functions
>>> I like this part :)
>>
>> I started working on this a while ago by splitting out the
>> slave_config functionality into its own callback. Haven't managed to
>> finalize it since it wasn't really top priority.
>
> Heh, I did the same yesterday... Do you plan on submitting it soon,
> or should I keep going?

The plan was to submit it eventually. But if you already did it as well, go 
for it.

- Lars

Geert Uytterhoeven Aug. 5, 2014, 8:16 a.m. UTC | #26
Hi Maxime,

On Wed, Jul 30, 2014 at 6:03 PM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> The dmaengine is neither trivial nor properly documented at the moment, which
> means a lot of trial and error development, which is not that good for such a
> central piece of the system.
>
> Attempt at making such a documentation.

Great, thanks!

> +Our dma_device structure has a field called caps_mask that holds the
> +various type of transaction supported, and you need to modify this

types of transactions

> +mask using the dma_cap_set function, with various flags depending on
> +transaction types you support as an argument.

> +  * DMA_SLAVE
> +    - The device can handle device to memory transfers, including
> +      scatter-gather transfers.
> +    - While in the mem2mem case we were having two distinct type to

types

> +      deal with a single chunk to copy or a collection of them, here,
> +      we just have a single transaction type that is supposed to
> +      handle both.

> +  * DMA_INTERLEAVE
> +    - The device support interleaved transfer. Those transfers usually

supports

> +      involve an interleaved set of data, with chunks a few bytes
> +      wide, where a scatter-gather transfer would be quite
> +      inefficient.

> +The functions that we have to fill in there, and hence have to
> +implement, obviously depend on the transaction type you reported as

types

> +supported.

> +   * device_issue_pending
> +     - Takes the first descriptor in the pending queue, and start the

starts

> +       transfer. Whenever that transfer is done, it should move to the
> +       next transaction in the list.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Vinod Koul Aug. 5, 2014, 4:25 p.m. UTC | #27
On Sat, Aug 02, 2014 at 04:49:25PM +0200, Maxime Ripard wrote:
> > - residue calculation, though the situation is much better now, but
> >   still lots of drivers do it wrong and folks do get it wrong
> 
> What mistake is often made regarding the residue calculation?
- the residue is for the descriptor that was asked about
- not necessarily for the currently active one
- we can store the size locally and return that when the descriptor is
  not the one being processed
- the txstate argument can be NULL
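
For illustration, a minimal tx_status callback following these rules
could look like this (all foo_* names are made up, and locking is
omitted):

	static enum dma_status foo_tx_status(struct dma_chan *chan,
					     dma_cookie_t cookie,
					     struct dma_tx_state *txstate)
	{
		struct foo_chan *fc = to_foo_chan(chan);
		enum dma_status ret;
		size_t residue;

		ret = dma_cookie_status(chan, cookie, txstate);
		if (ret == DMA_COMPLETE || !txstate)
			return ret;

		if (foo_cookie_is_active(fc, cookie))
			/* read the hardware counters of the transfer
			 * being processed */
			residue = foo_read_hw_residue(fc);
		else
			/* not started yet: the size stored at prepare
			 * time is what is left */
			residue = foo_stored_desc_size(fc, cookie);

		dma_set_residue(txstate, residue);
		return ret;
	}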

> Another extra question arose while starting this.
> 
> In the case of the call to device_control, especially in the
> DMA_SLAVE_CONFIG case, but that also applies to pause/resume, are the
> changes supposed to be immediate, or can they happen later?
> 
> I actually have in mind the case where we would use a vchan, that
> might or might not be actually mapped to a physical channel at the
> moment where the DMA_SLAVE_CONFIG call is made. In the case where it's
> not mapped and not transferring anything, it's pretty trivial to
> handle, but in the case where it's actually mapped to a physical
> channel, should we push the new configuration to the physical channel
> right away, or can it wait until the transfer ends ?
Ah no. The model doesn't assume the config will be applied at runtime.
So descriptors which are already prepared don't change with a
DMA_SLAVE_CONFIG call. Only new descriptors will take those values.

But yes, pause/resume would be synchronous.

HTH
Ludovic Desroches Aug. 14, 2014, 8:53 a.m. UTC | #28
On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> The dmaengine is neither trivial nor properly documented at the moment, which
> means a lot of trial and error development, which is not that good for such a
> central piece of the system.
> 
> Attempt at making such a documentation.

Good idea, many questions are asked when writing a new dmaengine driver.

For instance, I didn't find how to use the DMA_CTRL_ACK flag.

- How does this flag have to be managed? For instance, async_tx_ack is
used in dmaengine drivers but also in some client devices.
- Is it mandatory to deal with this flag? It seems some dmaengine
drivers don't care about it.

> 
> [... full quote of the patch snipped; the complete patch is
> reproduced at the bottom of this page ...]
Russell King - ARM Linux Aug. 14, 2014, 8:57 a.m. UTC | #29
On Thu, Aug 14, 2014 at 10:53:01AM +0200, Ludovic Desroches wrote:
> On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> > The dmaengine is neither trivial nor properly documented at the moment, which
> > means a lot of trial and error development, which is not that good for such a
> > central piece of the system.
> > 
> > Attempt at making such a documentation.
> 
> Good idea, many questions are asked when writing a new dmaengine driver.
> 
> For instance, I didn't find how to use the DMA_CTRL_ACK flag.
> 
> - How does this flag have to be managed? For instance, async_tx_ack is
> used in dmaengine drivers but also in some client devices.
> - Is it mandatory to deal with this flag? It seems some dmaengine
> drivers don't care about it.

This is one part of the DMA engine API that even I don't understand.

It's got something to do with the async-tx API: it seems to govern
whether a descriptor can have other transactions added to it, and
whether a descriptor can be re-used (async-tx engines typically
allocate a fixed set of descriptors and recycle them.)

I was never able to get to the bottom of what that flag meant or how
it should be used with the DMA slave API (and I suspect everyone else
just gave up trying to understand it as well.)
Vinod Koul Aug. 19, 2014, 1:45 p.m. UTC | #30
On Thu, Aug 14, 2014 at 09:57:53AM +0100, Russell King - ARM Linux wrote:
> On Thu, Aug 14, 2014 at 10:53:01AM +0200, Ludovic Desroches wrote:
> > On Wed, Jul 30, 2014 at 06:03:13PM +0200, Maxime Ripard wrote:
> > > The dmaengine is neither trivial nor properly documented at the moment, which
> > > means a lot of trial and error development, which is not that good for such a
> > > central piece of the system.
> > > 
> > > Attempt at making such a documentation.
> > 
> > Good idea, many questions are asked when writing a new dmaengine driver.
> > 
> > For instance, I didn't find how to use the DMA_CTRL_ACK flag.
> > 
> > - How does this flag have to be managed? For instance, async_tx_ack is
> > used in dmaengine drivers but also in some client devices.
> > - Is it mandatory to deal with this flag? It seems some dmaengine
> > drivers don't care about it.
> This is one part of the DMA engine API that even I don't understand.
> 
> It's got something to do with the async-tx API: it seems to govern
> whether a descriptor can have other transactions added to it, and
> whether a descriptor can be re-used (async-tx engines typically
> allocate a fixed set of descriptors and recycle them.)
Yes, this is my understanding too. The async API, IIUC, can reuse
descriptors, and if the engine doesn't support that, this flag is a way
to tell it so.

For the slave dmaengine API we need to ignore it.
> 
> I was never able to get to the bottom of what that flag meant or how
> it should be used with the DMA slave API (and I suspect everyone else
> just gave up trying to understand it as well.)
Russell King - ARM Linux Aug. 19, 2014, 2:44 p.m. UTC | #31
On Tue, Aug 19, 2014 at 07:15:07PM +0530, Vinod Koul wrote:
> On Thu, Aug 14, 2014 at 09:57:53AM +0100, Russell King - ARM Linux wrote:
> > It's got something to do with the async-tx API: it seems to govern
> > whether a descriptor can have other transactions added to it, and
> > whether a descriptor can be re-used (async-tx engines typically
> > allocate a fixed set of descriptors and recycle them.)
> Yes, this is my understanding too. The async API, IIUC, can reuse
> descriptors, and if the engine doesn't support that, this flag is a
> way to tell it so.
> 
> For the slave dmaengine API we need to ignore it.

We shouldn't ignore it - ignoring it makes it harder to implement a DMA
engine driver which supports both the slave and async APIs, because
we then need to know the reason for the channel being requested.

Ignoring it until it can be understood and documented is an approach I
would agree with though.
Vinod Koul Aug. 19, 2014, 2:57 p.m. UTC | #32
On Tue, Aug 19, 2014 at 03:44:23PM +0100, Russell King - ARM Linux wrote:
> On Tue, Aug 19, 2014 at 07:15:07PM +0530, Vinod Koul wrote:
> > On Thu, Aug 14, 2014 at 09:57:53AM +0100, Russell King - ARM Linux wrote:
> > > It's got something to do with the async-tx API: it seems to govern
> > > whether a descriptor can have other transactions added to it, and
> > > whether a descriptor can be re-used (async-tx engines typically
> > > allocate a fixed set of descriptors and recycle them.)
> > Yes, this is my understanding too. The async API, IIUC, can reuse
> > descriptors, and if the engine doesn't support that, this flag is a
> > way to tell it so.
> > 
> > For the slave dmaengine API we need to ignore it.
> 
> We shouldn't ignore it - ignoring it makes it harder to implement a DMA
> engine driver which supports both the slave and async APIs, because
> we then need to know the reason for the channel being requested.
Do you have such usage coming up in the near future? :)
> 
> Ignoring it until it can be understood and documented is an approach I
> would agree with though.
Certainly. I am asking Dan to add more clarity on these bits to help
improve things.

Patch

diff --git a/Documentation/dmaengine-driver.txt b/Documentation/dmaengine-driver.txt
new file mode 100644
index 000000000000..4828b50038c0
--- /dev/null
+++ b/Documentation/dmaengine-driver.txt
@@ -0,0 +1,293 @@ 
+DMAengine controller documentation
+==================================
+
+Hardware Introduction
++++++++++++++++++++++
+
+Most of the Slave DMA controllers have the same general principles of
+operation.
+
+They have a given number of channels to use for the DMA transfers, and
+a given number of request lines.
+
+Requests and channels are pretty much orthogonal. Channels can be
+used to serve any of several requests. To simplify, channels are the
+entities that will be doing the copy, and requests what endpoints are
+involved.
+
+The request lines actually correspond to physical lines going from
+the DMA-eligible devices to the controller itself. Whenever the
+device wants to start a transfer, it will assert a DMA request (DRQ)
+by asserting that request line.
+
+A very simple DMA controller would only take into account a single
+parameter: the transfer size. At each clock cycle, it would transfer a
+byte of data from one buffer to another, until the transfer size has
+been reached.
+
+That wouldn't work well in the real world, since slave devices might
+require various numbers of bits to be transferred at a time. For
+example, we probably want to transfer 32 bits at a time when doing a
+simple memory copy operation, but our audio device will require 16 or
+24 bits to be written to its FIFO. This is why most if not all of the
+DMA controllers can adjust this, using a parameter called the width.
+
+Moreover, some DMA controllers, whenever RAM is involved, can group
+the reads or writes in memory into a buffer, so instead of having a
+lot of small memory accesses, which is not really efficient, you'll
+get several bigger transfers. This is done using a parameter called
+the burst size, which defines how many single reads/writes it's
+allowed to do in a single clock cycle.
+
+Our theoretical DMA controller would then only be able to do
+transfers that involve a single contiguous block of data. However,
+some of the transfers we usually need are not contiguous, and want to
+copy data from non-contiguous buffers to a contiguous buffer, which
+is called scatter-gather.
+
+DMAEngine, at least for mem2dev transfers, requires support for
+scatter-gather. So we're left with two cases here: either we have a
+quite simple DMA controller that doesn't support it, and we'll have
+to implement it in software, or we have a more advanced DMA
+controller that implements scatter-gather in hardware.
+
+The latter are usually programmed using a collection of chunks to
+transfer, and whenever the transfer is started, the controller will go
+over that collection, doing whatever we programmed there.
+
+This collection is usually either a table or a linked list. You will
+then push either the address of the table and its number of elements,
+or the first item of the list, to one channel of the DMA controller,
+and whenever a DRQ is asserted, it will go through the collection to
+know where to fetch the data from.
+
+Either way, the format of this collection is completely dependent on
+your hardware. Each DMA controller will require a different
+structure, but all of them will require, for every chunk, at least
+the source and destination addresses, whether it should increment
+these addresses or not, and the three parameters we saw earlier: the
+burst size, the bus width and the transfer size.
+
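+As a purely made-up illustration, one item of such a linked list
+could look like this (everything here is hypothetical, a real
+controller will define its own layout):
+
+	struct foo_hw_lli {
+		u32 src;	/* source address of the chunk */
+		u32 dst;	/* destination address of the chunk */
+		u32 len;	/* transfer size of the chunk */
+		u32 cfg;	/* bus width, burst size and address
+				 * increment settings */
+		u32 next;	/* bus address of the next item, or an
+				 * end-of-list marker */
+	};
+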
+One last thing: usually, slave devices won't assert DRQs by default,
+and you have to enable this in your slave device driver first
+whenever you want to use DMA.
+
+These were just the general memory-to-memory (also called mem2mem)
+or memory-to-device (mem2dev) transfers. Other kinds of transfers
+might be offered by your DMA controller, and are probably already
+supported by dmaengine.
+
+DMAEngine Registration
+++++++++++++++++++++++
+
+struct dma_device Initialization
+--------------------------------
+
+Just like any other kernel framework, the whole DMAEngine registration
+relies on the driver filling a structure and registering against the
+framework. In our case, that structure is dma_device.
+
+The first thing you need to do in your driver is to allocate this
+structure. Any of the usual memory allocators will do, but you'll
+also need to initialize a few fields in there (a minimal sketch
+follows this list):
+
+  * chancnt:	should be the number of channels your driver is exposing
+		to the system.
+		This doesn't have to be the number of physical
+		channels: some DMA controllers also expose virtual
+		channels to the system to overcome the case where you
+		have more consumers than physical channels available.
+
+  * channels:	should be initialized as a list using the
+		INIT_LIST_HEAD macro for example
+
+  * dev: 	should hold the pointer to the struct device associated
+		to your current driver instance.
+
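+Such a minimal initialization, assuming the usual platform driver
+probe context, with the error handling trimmed and NR_CHANNELS
+standing for whatever your driver exposes, could look like:
+
+	struct dma_device *dd;
+
+	dd = devm_kzalloc(&pdev->dev, sizeof(*dd), GFP_KERNEL);
+	if (!dd)
+		return -ENOMEM;
+
+	dd->dev = &pdev->dev;
+	dd->chancnt = NR_CHANNELS;	/* illustrative constant */
+	INIT_LIST_HEAD(&dd->channels);
+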
+Supported transaction types
+---------------------------
+The next thing you need is to actually set which transaction types
+your device (and driver) supports.
+
+Our dma_device structure has a field called cap_mask that holds the
+various types of transactions supported, and you need to modify this
+mask using the dma_cap_set function, with various flags depending on
+the transaction types you support as arguments.
+
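+For example, a controller whose channels can do both slave and cyclic
+transfers would do something like this (dd being our dma_device):
+
+	dma_cap_set(DMA_SLAVE, dd->cap_mask);
+	dma_cap_set(DMA_CYCLIC, dd->cap_mask);
+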
+All those capabilities are defined in the dma_transaction_type enum,
+in include/linux/dmaengine.h
+
+Currently, the types available are:
+  * DMA_MEMCPY
+    - The device is able to do memory to memory copies
+
+  * DMA_XOR
+    - The device is able to perform XOR operations on memory areas
+    - Particularly useful to accelerate XOR intensive tasks, such as
+      RAID5
+
+  * DMA_XOR_VAL
+    - The device is able to perform parity check using the XOR
+      algorithm against a memory buffer.
+
+  * DMA_PQ
+    - The device is able to perform RAID6 P+Q computations, P being a
+      simple XOR, and Q being a Reed-Solomon algorithm.
+
+  * DMA_PQ_VAL
+    - The device is able to perform parity check using RAID6 P+Q
+      algorithm against a memory buffer.
+
+  * DMA_INTERRUPT
+    /* TODO: Is it that the device has one interrupt per channel? */
+
+  * DMA_SG
+    - The device supports memory to memory scatter-gather
+      transfers.
+    - Even though a plain memcpy can look like a particular case of a
+      scatter-gather transfer, with a single chunk to transfer, it's a
+      distinct transaction type in the mem2mem transfers case
+
+  * DMA_PRIVATE
+    - The device's channels are unavailable for async-tx operations;
+      in other words, they can only be used for slave transfers (see
+      the example after this list).
+
+  * DMA_ASYNC_TX
+    - Must not be set by the device, and will be set by the framework
+      if needed
+    - /* TODO: What is it about? */
+
+  * DMA_SLAVE
+    - The device can handle device to memory transfers, including
+      scatter-gather transfers.
+    - While in the mem2mem case we had two distinct types to deal
+      with a single chunk to copy or a collection of them, here, we
+      just have a single transaction type that is supposed to handle
+      both.
+
+  * DMA_CYCLIC
+    - The device can handle cyclic transfers.
+    - A cyclic transfer is a transfer where the chunk collection will
+      loop over itself, with the last item pointing to the first. It's
+      usually used for audio transfers, where you want to operate on a
+      single big buffer that you will fill with your audio data.
+
+  * DMA_INTERLEAVE
+    - The device supports interleaved transfers. Those transfers
+      usually involve an interleaved set of data, with chunks a few
+      bytes wide, where a scatter-gather transfer would be quite
+      inefficient.
+
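+For example, a slave-only controller, whose channels must stay
+unavailable to async-tx users, would also set, before registering:
+
+	dma_cap_set(DMA_PRIVATE, dd->cap_mask);
+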
+These various types will also affect how the source and destination
+addresses change over time: DMA_SLAVE transfers will usually have one
+of the addresses incrementing while the other stays fixed, DMA_CYCLIC
+will have one address looping while the other doesn't change, and so
+on.
+
+Device operations
+-----------------
+
+Our dma_device structure also requires a few function pointers in
+order to implement the actual logic, now that we have described the
+operations we are able to perform.
+
+The functions that we have to fill in there, and hence have to
+implement, obviously depend on the transaction types you reported as
+supported.
+
+   * device_alloc_chan_resources
+   * device_free_chan_resources
+     - These functions will be called whenever a driver calls
+       dma_request_channel or dma_release_channel for the first/last
+       time on the channel associated with that driver.
+     - They are in charge of allocating/freeing all the needed
+       resources in order for that channel to be useful for your
+       driver.
+     - These functions can sleep.
+
+   * device_prep_dma_*
+     - These functions match the capabilities you registered
+       previously.
+     - These functions all take the buffer or the scatterlist relevant
+       for the transfer being prepared, and should create a hardware
+       descriptor or a list of descriptors from it (a rough sketch of
+       such a callback follows this list)
+     - These functions can be called from an interrupt context
+     - Any allocation you might do should be using the GFP_NOWAIT
+       flag, in order not to potentially sleep, but without depleting
+       the emergency pool either.
+
+     - It should return a unique instance of the
+       dma_async_tx_descriptor structure, which further represents
+       this particular transfer.
+
+     - This structure can be allocated using the function
+       dma_async_tx_descriptor_init.
+     - You'll also need to set two fields in this structure:
+       + flags:
+		TODO: Can it be modified by the driver itself, or
+		should it be always the flags passed in the arguments
+
+       + tx_submit:	A pointer to a function you have to implement,
+			that is supposed to push the current descriptor
+			to a pending queue, waiting for issue_pending to
+			be called.
+
+   * device_issue_pending
+     - Takes the first descriptor in the pending queue, and starts
+       the transfer. Whenever that transfer is done, it should move
+       to the next transaction in the list.
+     - It should call the registered callback, if any, each time a
+       transaction is done.
+     - This function can be called in an interrupt context
+
+   * device_tx_status
+     - Should report the number of bytes left in the descriptor that
+       is asked about (which is not necessarily the currently active
+       one) for the given channel
+     - Should use dma_set_residue to report it
+     - In the case of a cyclic transfer, it should only take into
+       account the current period.
+     - This function can be called in an interrupt context.
+
+   * device_control
+     - Used by client drivers to control and configure the channel it
+       has a handle on.
+     - Called with a command and an argument
+       + The command is one of the values listed by the enum
+         dma_ctrl_cmd. To this date, the valid commands are:
+         + DMA_RESUME
+           + Restarts a transfer on the channel
+         + DMA_PAUSE
+           + Pauses a transfer on the channel
+         + DMA_TERMINATE_ALL
+           + Aborts all the pending and ongoing transfers on the
+             channel
+         + DMA_SLAVE_CONFIG
+           + Reconfigures the channel with passed configuration
+         + FSLDMA_EXTERNAL_START
+           + TODO: Why does that even exist?
+       + The argument is an opaque unsigned long. It actually is a
+         pointer to a struct dma_slave_config, and should be used
+         only in the DMA_SLAVE_CONFIG case (see the sketch after
+         this list).
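+
+A very rough sketch of a slave prep callback, putting the previous
+points together (struct foo_desc and the foo_* names are made up, and
+the translation into hardware descriptors is elided):
+
+	struct foo_desc {
+		struct dma_async_tx_descriptor tx;
+		/* ... hardware descriptor chain ... */
+	};
+
+	static dma_cookie_t foo_tx_submit(struct dma_async_tx_descriptor *tx)
+	{
+		/* a real driver would also move the descriptor to the
+		 * channel pending list, under the channel lock */
+		return dma_cookie_assign(tx);
+	}
+
+	static struct dma_async_tx_descriptor *
+	foo_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
+			  unsigned int sg_len,
+			  enum dma_transfer_direction dir,
+			  unsigned long flags, void *context)
+	{
+		struct foo_desc *d;
+
+		/* we may be called from an interrupt context, so no
+		 * GFP_KERNEL here */
+		d = kzalloc(sizeof(*d), GFP_NOWAIT);
+		if (!d)
+			return NULL;
+
+		/* ... build the hardware descriptors from sgl ... */
+
+		dma_async_tx_descriptor_init(&d->tx, chan);
+		d->tx.flags = flags;
+		d->tx.tx_submit = foo_tx_submit;
+
+		return &d->tx;
+	}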
+
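+A skeleton of a device_control callback handling the most common
+commands could then be (foo_apply_config being a hypothetical helper
+that stores the configuration for later prepare calls):
+
+	static int foo_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
+			       unsigned long arg)
+	{
+		switch (cmd) {
+		case DMA_SLAVE_CONFIG:
+			/* only applies to the descriptors prepared
+			 * from now on */
+			foo_apply_config(chan,
+					 (struct dma_slave_config *)arg);
+			return 0;
+		case DMA_TERMINATE_ALL:
+			/* stop the hardware and discard all pending
+			 * and ongoing transfers */
+			return 0;
+		default:
+			return -ENOSYS;
+		}
+	}
+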
+Misc notes (stuff that should be documented, but we don't really know
+what to say about it)
+------------------------------------------------------------------
+  * dma_run_dependencies
+    - What is it supposed to do/when should it be called?
+    - Some drivers seem to implement it at the end of a transfer, but
+      not all of them do, so it seems we can get away without it
+
+  * device_slave_caps
+    - Isn't that redundant with the cap_mask already?
+    - Only a few drivers seem to implement it
+
+  * dma cookies?
+
+Glossary
+--------
+
+Burst:		Usually a few contiguous bytes that will be transferred
+		at once by the DMA controller
+Chunk:		A contiguous collection of bursts
+Transfer:	A collection of chunks (be it contiguous or not)