
[3/4] dma-buf: add support for mapping with dma mapping attributes

Message ID 1547836667-13695-4-git-send-email-lmark@codeaurora.org (mailing list archive)
State New, archived
Series: ION stability and perf changes

Commit Message

Liam Mark Jan. 18, 2019, 6:37 p.m. UTC
Add support for configuring dma mapping attributes when mapping
and unmapping memory through dma_buf_map_attachment and
dma_buf_unmap_attachment.

Signed-off-by: Liam Mark <lmark@codeaurora.org>
---
 include/linux/dma-buf.h | 3 +++
 1 file changed, 3 insertions(+)

Comments

Liam Mark Jan. 18, 2019, 9:32 p.m. UTC | #1
On Fri, 18 Jan 2019, Laura Abbott wrote:

> On 1/18/19 10:37 AM, Liam Mark wrote:
> > Add support for configuring dma mapping attributes when mapping
> > and unmapping memory through dma_buf_map_attachment and
> > dma_buf_unmap_attachment.
> > 
> > Signed-off-by: Liam Mark <lmark@codeaurora.org>
> > ---
> >   include/linux/dma-buf.h | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > index 58725f890b5b..59bf33e09e2d 100644
> > --- a/include/linux/dma-buf.h
> > +++ b/include/linux/dma-buf.h
> > @@ -308,6 +308,8 @@ struct dma_buf {
> >    * @dev: device attached to the buffer.
> >    * @node: list of dma_buf_attachment.
> >    * @priv: exporter specific attachment data.
> > + * @dma_map_attrs: DMA mapping attributes to be used in
> > + *		   dma_buf_map_attachment() and dma_buf_unmap_attachment().
> >    *
> >    * This structure holds the attachment information between the dma_buf
> > buffer
> >    * and its user device(s). The list contains one attachment struct per
> > device
> > @@ -323,6 +325,7 @@ struct dma_buf_attachment {
> >   	struct device *dev;
> >   	struct list_head node;
> >   	void *priv;
> > +	unsigned long dma_map_attrs;
> >   };
> >     /**
> > 
> 
> Did you miss part of this patch? This only adds it to the structure but
> doesn't add it to any API. The same comment applies to the follow-up
> patch; I don't quite see how it's being used.
> 

Were you asking for a cleaner dma-buf API to set this field, or were you 
asking for a change to an upstream client to make use of this field?

I have clients set the dma_map_attrs field directly on their
dma_buf_attachment struct before calling dma_buf_map_attachment (if they
need this functionality).
Of course this is all being used in Android for out-of-tree drivers, but
I assume it is just as useful to everyone else who has cached ION buffers
which aren't always accessed by the CPU.

My understanding is that AOSP Android on HiKey 960 is also currently
suffering from too many CMOs due to dma_buf_map_attachment always applying
CMOs, so this support should help them avoid it.

> Thanks,
> Laura
> 

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Liam Mark Jan. 21, 2019, 7:44 p.m. UTC | #2
On Mon, 21 Jan 2019, Christoph Hellwig wrote:

> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> > > And who is going to decide which ones to pass?  And who documents
> > > which ones are safe?
> > > 
> > > I'd much rather have explicit, well documented dma-buf flags that
> > > might get translated to the DMA API flags, which are not error checked,
> > > not very well documented and way to easy to get wrong.
> > > 
> > 
> > I'm not sure having flags in dma-buf really solves anything
> > given drivers can use the attributes directly with dma_map
> > anyway, which is what we're looking to do. The intention
> > is for the driver creating the dma_buf attachment to have
> > the knowledge of which flags to use.
> 
> Well, there are very few flags that you can simply use for all calls of
> dma_map*.  And given how badly these flags are defined I just don't want
> people to add more places where they indirectly use these flags, as
> it will be more than enough work to clean up the current mess.
> 
> What flag(s) do you want to pass this way, btw?  Maybe that is where
> the problem is.
> 

The main use case is allowing clients to pass in DMA_ATTR_SKIP_CPU_SYNC
in order to skip the default cache maintenance which happens in
dma_buf_map_attachment and dma_buf_unmap_attachment. In ION the buffers
usually aren't accessed from the CPU, so this allows clients to avoid
unnecessary cache maintenance in many cases.


Andrew Davis Jan. 21, 2019, 7:49 p.m. UTC | #3
On 1/21/19 1:44 PM, Liam Mark wrote:
> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> 
>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
>>>> And who is going to decide which ones to pass?  And who documents
>>>> which ones are safe?
>>>>
>>>> I'd much rather have explicit, well documented dma-buf flags that
>>>> might get translated to the DMA API flags, which are not error checked,
>>>> not very well documented and way to easy to get wrong.
>>>>
>>>
>>> I'm not sure having flags in dma-buf really solves anything
>>> given drivers can use the attributes directly with dma_map
>>> anyway, which is what we're looking to do. The intention
>>> is for the driver creating the dma_buf attachment to have
>>> the knowledge of which flags to use.
>>
>> Well, there are very few flags that you can simply use for all calls of
>> dma_map*.  And given how badly these flags are defined I just don't want
>> people to add more places where they indirectly use these flags, as
>> it will be more than enough work to clean up the current mess.
>>
>> What flag(s) do you want to pass this way, btw?  Maybe that is where
>> the problem is.
>>
> 
> The main use case is for allowing clients to pass in 
> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> ION the buffers aren't usually accessed from the CPU so this allows 
> clients to often avoid doing unnecessary cache maintenance.
> 

How can a client know that no CPU access has occurred that needs to be
flushed out?

Liam Mark Jan. 21, 2019, 8:20 p.m. UTC | #4
On Mon, 21 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 1:44 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> > 
> >> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>> And who is going to decide which ones to pass?  And who documents
> >>>> which ones are safe?
> >>>>
> >>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>> might get translated to the DMA API flags, which are not error checked,
> >>>> not very well documented and way to easy to get wrong.
> >>>>
> >>>
> >>> I'm not sure having flags in dma-buf really solves anything
> >>> given drivers can use the attributes directly with dma_map
> >>> anyway, which is what we're looking to do. The intention
> >>> is for the driver creating the dma_buf attachment to have
> >>> the knowledge of which flags to use.
> >>
> >> Well, there are very few flags that you can simply use for all calls of
> >> dma_map*.  And given how badly these flags are defined I just don't want
> >> people to add more places where they indirectly use these flags, as
> >> it will be more than enough work to clean up the current mess.
> >>
> >> What flag(s) do you want to pass this way, btw?  Maybe that is where
> >> the problem is.
> >>
> > 
> > The main use case is for allowing clients to pass in 
> > DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> > which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> > ION the buffers aren't usually accessed from the CPU so this allows 
> > clients to often avoid doing unnecessary cache maintenance.
> > 
> 
> How can a client know that no CPU access has occurred that needs to be
> flushed out?
> 

I have left this to clients, but if they own the buffer they can know
whether CPU access is needed in that use case (for example,
post-processing).

For example, with the previous version of ION we left all decisions about
whether cache maintenance was required up to the client; they would use
the ION cache maintenance IOCTL to force cache maintenance only when it
was required.
In those cases almost all of the access was being done by the device, and
in the rare cases where CPU access was required, clients would initiate the
required cache maintenance before and after the CPU access.

Andrew Davis Jan. 21, 2019, 8:24 p.m. UTC | #5
On 1/21/19 2:20 PM, Liam Mark wrote:
> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/21/19 1:44 PM, Liam Mark wrote:
>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
>>>
>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
>>>>>> And who is going to decide which ones to pass?  And who documents
>>>>>> which ones are safe?
>>>>>>
>>>>>> I'd much rather have explicit, well documented dma-buf flags that
>>>>>> might get translated to the DMA API flags, which are not error checked,
>>>>>> not very well documented and way to easy to get wrong.
>>>>>>
>>>>>
>>>>> I'm not sure having flags in dma-buf really solves anything
>>>>> given drivers can use the attributes directly with dma_map
>>>>> anyway, which is what we're looking to do. The intention
>>>>> is for the driver creating the dma_buf attachment to have
>>>>> the knowledge of which flags to use.
>>>>
>>>> Well, there are very few flags that you can simply use for all calls of
>>>> dma_map*.  And given how badly these flags are defined I just don't want
>>>> people to add more places where they indirectly use these flags, as
>>>> it will be more than enough work to clean up the current mess.
>>>>
>>>> What flag(s) do you want to pass this way, btw?  Maybe that is where
>>>> the problem is.
>>>>
>>>
>>> The main use case is for allowing clients to pass in 
>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
>>> ION the buffers aren't usually accessed from the CPU so this allows 
>>> clients to often avoid doing unnecessary cache maintenance.
>>>
>>
>> How can a client know that no CPU access has occurred that needs to be
>> flushed out?
>>
> 
> I have left this to clients, but if they own the buffer they can have the 
> knowledge as to whether CPU access is needed in that use case (example for 
> post-processing).
> 
> For example with the previous version of ION we left all decisions of 
> whether cache maintenance was required up to the client, they would use 
> the ION cache maintenance IOCTL to force cache maintenance only when it 
> was required.
> In these cases almost all of the access was being done by the device and 
> in the rare cases CPU access was required clients would initiate the 
> required cache maintenance before and after the CPU access.
> 

I think we have different definitions of "client"; I'm talking about the
DMA-BUF client (the importer), which is who can set this flag. It seems
you mean the userspace application, which has no control over this flag.

Liam Mark Jan. 21, 2019, 10:12 p.m. UTC | #6
On Mon, 21 Jan 2019, Christoph Hellwig wrote:

> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> > The main use case is for allowing clients to pass in 
> > DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> > which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> > ION the buffers aren't usually accessed from the CPU so this allows 
> > clients to often avoid doing unnecessary cache maintenance.
> 
> This can't work.  The cpu can still easily speculate into this area.

Can you provide more detail on your concern here?
The use case I am thinking about here is a cached buffer which is accessed 
by a non-IO-coherent device (quite a common use case for ION).

Guessing at your concern: the speculative access can be an issue if you
are going to access the buffer from the CPU after the device has written
to it; however, if you know you aren't going to do any CPU access before
the buffer is again returned to the device, then I don't think the
speculative access is a concern.

> Moreover in general these operations should be cheap if the addresses
> aren't cached.
> 

I am thinking of use cases with cached buffers here, so CMOs aren't cheap.


Liam Mark Jan. 21, 2019, 10:14 p.m. UTC | #7
On Mon, 21 Jan 2019, Christoph Hellwig wrote:

> On Mon, Jan 21, 2019 at 12:20:42PM -0800, Liam Mark wrote:
> > I have left this to clients, but if they own the buffer they can have the 
> > knowledge as to whether CPU access is needed in that use case (example for 
> > post-processing).
> 
> That is an API design which the user is more likely to get wrong than
> right and thus does not pass the smell test.
> 

With the previous version of ION, Android ION clients were successfully
managing all their cache maintenance.



Liam Mark Jan. 21, 2019, 10:18 p.m. UTC | #8
On Mon, 21 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 2:20 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>
> >>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>> And who is going to decide which ones to pass?  And who documents
> >>>>>> which ones are safe?
> >>>>>>
> >>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>> not very well documented and way to easy to get wrong.
> >>>>>>
> >>>>>
> >>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>> given drivers can use the attributes directly with dma_map
> >>>>> anyway, which is what we're looking to do. The intention
> >>>>> is for the driver creating the dma_buf attachment to have
> >>>>> the knowledge of which flags to use.
> >>>>
> >>>> Well, there are very few flags that you can simply use for all calls of
> >>>> dma_map*.  And given how badly these flags are defined I just don't want
> >>>> people to add more places where they indirectly use these flags, as
> >>>> it will be more than enough work to clean up the current mess.
> >>>>
> >>>> What flag(s) do you want to pass this way, btw?  Maybe that is where
> >>>> the problem is.
> >>>>
> >>>
> >>> The main use case is for allowing clients to pass in 
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> >>> ION the buffers aren't usually accessed from the CPU so this allows 
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>>
> >>
> >> How can a client know that no CPU access has occurred that needs to be
> >> flushed out?
> >>
> > 
> > I have left this to clients, but if they own the buffer they can have the 
> > knowledge as to whether CPU access is needed in that use case (example for 
> > post-processing).
> > 
> > For example with the previous version of ION we left all decisions of 
> > whether cache maintenance was required up to the client, they would use 
> > the ION cache maintenance IOCTL to force cache maintenance only when it 
> > was required.
> > In these cases almost all of the access was being done by the device and 
> > in the rare cases CPU access was required clients would initiate the 
> > required cache maintenance before and after the CPU access.
> > 
> 
> I think we have different definitions of "client", I'm talking about the
> DMA-BUF client (the importer), that is who can set this flag. It seems
> you mean the userspace application, which has no control over this flag.
> 

I am also talking about dma-buf clients; I am referring to both the
userspace and kernel components of the client. For example, our camera
ION client has both a userspace and a kernel component, and they have ION
buffers, whose access they control, which may or may not be accessed by
the CPU in certain use cases.

Andrew Davis Jan. 22, 2019, 3:42 p.m. UTC | #9
On 1/21/19 4:18 PM, Liam Mark wrote:
> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> 
>> On 1/21/19 2:20 PM, Liam Mark wrote:
>>> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
>>>
>>>> On 1/21/19 1:44 PM, Liam Mark wrote:
>>>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
>>>>>
>>>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
>>>>>>>> And who is going to decide which ones to pass?  And who documents
>>>>>>>> which ones are safe?
>>>>>>>>
>>>>>>>> I'd much rather have explicit, well documented dma-buf flags that
>>>>>>>> might get translated to the DMA API flags, which are not error checked,
>>>>>>>> not very well documented and way to easy to get wrong.
>>>>>>>>
>>>>>>>
>>>>>>> I'm not sure having flags in dma-buf really solves anything
>>>>>>> given drivers can use the attributes directly with dma_map
>>>>>>> anyway, which is what we're looking to do. The intention
>>>>>>> is for the driver creating the dma_buf attachment to have
>>>>>>> the knowledge of which flags to use.
>>>>>>
>>>>>> Well, there are very few flags that you can simply use for all calls of
>>>>>> dma_map*.  And given how badly these flags are defined I just don't want
>>>>>> people to add more places where they indirectly use these flags, as
>>>>>> it will be more than enough work to clean up the current mess.
>>>>>>
>>>>>> What flag(s) do you want to pass this way, btw?  Maybe that is where
>>>>>> the problem is.
>>>>>>
>>>>>
>>>>> The main use case is for allowing clients to pass in 
>>>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
>>>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
>>>>> ION the buffers aren't usually accessed from the CPU so this allows 
>>>>> clients to often avoid doing unnecessary cache maintenance.
>>>>>
>>>>
>>>> How can a client know that no CPU access has occurred that needs to be
>>>> flushed out?
>>>>
>>>
>>> I have left this to clients, but if they own the buffer they can have the 
>>> knowledge as to whether CPU access is needed in that use case (example for 
>>> post-processing).
>>>
>>> For example with the previous version of ION we left all decisions of 
>>> whether cache maintenance was required up to the client, they would use 
>>> the ION cache maintenance IOCTL to force cache maintenance only when it 
>>> was required.
>>> In these cases almost all of the access was being done by the device and 
>>> in the rare cases CPU access was required clients would initiate the 
>>> required cache maintenance before and after the CPU access.
>>>
>>
>> I think we have different definitions of "client", I'm talking about the
>> DMA-BUF client (the importer), that is who can set this flag. It seems
>> you mean the userspace application, which has no control over this flag.
>>
> 
> I am also talking about dma-buf clients, I am referring to both the 
> userspace and kernel component of the client. For example our Camera ION 
> client has both a usersapce and kernel component and they have ION 
> buffers, which they control the access to, which may or may not be 
> accessed by the CPU in certain uses cases.
> 

I know they often work together, but for this discussion it would be
good to keep kernel clients and userspace clients separate. There are
three types of actors at play here: userspace clients, kernel clients,
and exporters.

DMA-BUF only provides the basic sync primitive + mmap directly to
userspace, both operations are fulfilled by the exporter. This patch is
about adding more control to the kernel side clients. The kernel side
clients cannot know what userspace or other kernel side clients have
done with the buffer, *only* the exporter has the whole picture.

Therefore neither type of client should be deciding whether the CPU needs
flushing; only the exporter, based on the type of buffer, the current set
of attachments, and previous actions (is this the first attachment, did
the CPU get access in between, etc.), can make this decision.

Your goal seems to be to avoid unneeded CPU-side CMOs when a device
detaches and another attaches with no CPU access in between, right?
That's reasonable to me, but it must be the exporter who keeps track and
skips the CMO. This patch allows the client to tell the exporter the CMO
is not needed, and that is not safe.

Andrew

Andrew Davis Jan. 22, 2019, 4:06 p.m. UTC | #10
On 1/21/19 4:12 PM, Liam Mark wrote:
> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> 
>> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
>>> The main use case is for allowing clients to pass in 
>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
>>> ION the buffers aren't usually accessed from the CPU so this allows 
>>> clients to often avoid doing unnecessary cache maintenance.
>>
>> This can't work.  The cpu can still easily speculate into this area.
> 
> Can you provide more detail on your concern here.
> The use case I am thinking about here is a cached buffer which is accessed 
> by a non IO-coherent device (quite a common use case for ION).
> 
> Guessing on your concern:
> The speculative access can be an issue if you are going to access the 
> buffer from the CPU after the device has written to it, however if you 
> know you aren't going to do any CPU access before the buffer is again 
> returned to the device then I don't think the speculative access is a 
> concern.
> 
>> Moreover in general these operations should be cheap if the addresses
>> aren't cached.
>>
> 
> I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> 

These buffers are cacheable, not cached; if you haven't written anything,
the data won't actually be in the cache. And in the case of speculative
cache filling, the lines are marked clean. In either case the only cost is
the little 7-instruction loop calling the clean/invalidate instruction (dc
civac for ARMv8) for the cache lines. Unless that is the cost you are
trying to avoid?

In that case, if you are mapping and unmapping so much that the little
CMO here is hurting performance, then I would argue your usage is broken
and needs to be re-worked a bit.

Andrew

Liam Mark Jan. 22, 2019, 10:47 p.m. UTC | #11
On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:18 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> > 
> >> On 1/21/19 2:20 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>>>> And who is going to decide which ones to pass?  And who documents
> >>>>>>>> which ones are safe?
> >>>>>>>>
> >>>>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>>>> not very well documented and way to easy to get wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>>>> given drivers can use the attributes directly with dma_map
> >>>>>>> anyway, which is what we're looking to do. The intention
> >>>>>>> is for the driver creating the dma_buf attachment to have
> >>>>>>> the knowledge of which flags to use.
> >>>>>>
> >>>>>> Well, there are very few flags that you can simply use for all calls of
> >>>>>> dma_map*.  And given how badly these flags are defined I just don't want
> >>>>>> people to add more places where they indirectly use these flags, as
> >>>>>> it will be more than enough work to clean up the current mess.
> >>>>>>
> >>>>>> What flag(s) do you want to pass this way, btw?  Maybe that is where
> >>>>>> the problem is.
> >>>>>>
> >>>>>
> >>>>> The main use case is for allowing clients to pass in 
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> >>>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> >>>>> ION the buffers aren't usually accessed from the CPU so this allows 
> >>>>> clients to often avoid doing unnecessary cache maintenance.
> >>>>>
> >>>>
> >>>> How can a client know that no CPU access has occurred that needs to be
> >>>> flushed out?
> >>>>
> >>>
> >>> I have left this to clients, but if they own the buffer they can have the 
> >>> knowledge as to whether CPU access is needed in that use case (example for 
> >>> post-processing).
> >>>
> >>> For example with the previous version of ION we left all decisions of 
> >>> whether cache maintenance was required up to the client, they would use 
> >>> the ION cache maintenance IOCTL to force cache maintenance only when it 
> >>> was required.
> >>> In these cases almost all of the access was being done by the device and 
> >>> in the rare cases CPU access was required clients would initiate the 
> >>> required cache maintenance before and after the CPU access.
> >>>
> >>
> >> I think we have different definitions of "client", I'm talking about the
> >> DMA-BUF client (the importer), that is who can set this flag. It seems
> >> you mean the userspace application, which has no control over this flag.
> >>
> > 
> > I am also talking about dma-buf clients, I am referring to both the 
> > userspace and kernel component of the client. For example our Camera ION 
> > client has both a usersapce and kernel component and they have ION 
> > buffers, which they control the access to, which may or may not be 
> > accessed by the CPU in certain uses cases.
> > 
> 
> I know they often work together, but for this discussion it would be
> good to keep kernel clients and usperspace clients separate. There are
> three types of actors at play here, userspace clients, kernel clients,
> and exporters.
> 
> DMA-BUF only provides the basic sync primitive + mmap directly to
> userspace, 

Well, dma-buf does provide dma_buf_kmap/dma_buf_begin_cpu_access, which 
allow the same functionality in the kernel, but I don't think that changes
your argument.

> both operations are fulfilled by the exporter. This patch is
> about adding more control to the kernel side clients. The kernel side
> clients cannot know what userspace or other kernel side clients have
> done with the buffer, *only* the exporter has the whole picture.
> 
> Therefor neither type of client should be deciding if the CPU needs
> flushed or not, only the exporter, based on the type of buffer, the
> current set attachments, and previous actions (is this first attachment,
> CPU get access in-between, etc...) can make this decision.
> 
> You goal seems to be to avoid unneeded CPU side CMOs when a device
> detaches and another attaches with no CPU access in-between, right?
> That's reasonable to me, but it must be the exporter who keeps track and
> skips the CMO. This patch allows the client to tell the exporter the CMO
> is not needed and that is not safe.
> 

I agree it would be better to have this logic in the exporter, but I just 
haven't heard an upstreamable way to make that work.
But maybe we can explore that a bit more.

If we consider having CPU access with no devices attached a legitimate use 
case:

The pipelining use case I am thinking of is 
 1) dev 1 attach, map, access, unmap
 2) dev 1 detach
 3) (maybe) CPU access
 4) dev 2 attach
 5) dev 2 map, access
 6) ...

It would be unfortunate not to consider this something legitimate for 
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me, as there 
isn't necessarily a nice place to tell them when to detach.

If we considered the above a supported use case, I think we could support 
it in dma-buf (based on past discussions) if we had two things:

#1: we tracked the state of the buffer (for example, whether it has had a
previous cached/uncached write with no following CMO). Then when either
the CPU or a device was going to access the buffer, it could decide, based
on the previous access, whether any CMO needs to be applied first.

#2: we had a non-architecture-specific way to apply cache maintenance
without a device, so that in step 3 above the begin_cpu_access call could
successfully invalidate the buffer.

I think #1 is doable since we can tell if devices are IO-coherent or not,
and we know the direction of accesses in dma map and begin_cpu_access.

I think we would probably agree that #2 is a problem, though; getting the 
kernel to expose that API seems like a hard argument.

Liam

Liam Mark Jan. 22, 2019, 10:50 p.m. UTC | #12
On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> > 
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in 
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
> >>> ION the buffers aren't usually accessed from the CPU so this allows 
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work.  The cpu can still easily speculate into this area.
> > 
> > Can you provide more detail on your concern here.
> > The use case I am thinking about here is a cached buffer which is accessed 
> > by a non IO-coherent device (quite a common use case for ION).
> > 
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the 
> > buffer from the CPU after the device has written to it, however if you 
> > know you aren't going to do any CPU access before the buffer is again 
> > returned to the device then I don't think the speculative access is a 
> > concern.
> > 
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> > 
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> > 
> 
> These buffers are cacheable, not cached, if you haven't written anything
> the data wont actually be in cache. 

That's true

> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
> 

This is the cost I am trying to avoid, and this comes back to our previous 
discussion. We have a coherent system cache, so if you are doing this for 
every cache line on a large buffer, this work and the trips out to the bus 
add up.
For example, I believe 1080p buffers are 8MB, and 4K buffers are even 
larger.

I also still think you would want to solve this properly, such that 
invalidates aren't being done unnecessarily.

> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
> 

I am not sure I would say it is broken; the large buffers (for example, 
1080p buffers) are mapped and unmapped on every frame. I don't think there 
is any clean way to avoid that in a pipelining framework; you could ask 
clients to keep the buffers dma-mapped, but there isn't necessarily a good 
time to tell them to unmap.

It would be unfortunate not to consider this something legitimate for 
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me, as there 
isn't necessarily a nice place to tell them when to detach.



Patch

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 58725f890b5b..59bf33e09e2d 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -308,6 +308,8 @@ struct dma_buf {
  * @dev: device attached to the buffer.
  * @node: list of dma_buf_attachment.
  * @priv: exporter specific attachment data.
+ * @dma_map_attrs: DMA mapping attributes to be used in
+ *		   dma_buf_map_attachment() and dma_buf_unmap_attachment().
  *
  * This structure holds the attachment information between the dma_buf buffer
  * and its user device(s). The list contains one attachment struct per device
@@ -323,6 +325,7 @@ struct dma_buf_attachment {
 	struct device *dev;
 	struct list_head node;
 	void *priv;
+	unsigned long dma_map_attrs;
 };
 
 /**