Message ID: 1404147116-4598-3-git-send-email-ohaugan@codeaurora.org (mailing list archive)
State: New, archived
On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
[...]
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;

Should we perhaps make this mandatory? For drivers that don't provide it
we could implement a generic helper that wraps iommu_{map,unmap}().

> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);
> +
> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Could these be renamed iommu_{map,unmap}_sg() instead to make it more
obvious what exactly they map? And perhaps this could take an sg_table
instead, which already provides a count and is a very common structure
used in drivers (and the DMA mapping API).

Thierry
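The generic fallback Thierry suggests could look roughly like the following userspace sketch: walk the scatterlist and call the driver's single-chunk map() for each physically contiguous segment. All names here (sg_entry, fake_map, generic_map_sg) are illustrative stand-ins, not the kernel API, and a real helper would also unwind a partially created mapping on failure.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Stand-in for one scatterlist entry: a physically contiguous chunk. */
struct sg_entry {
	unsigned long phys;	/* physical start of the chunk */
	unsigned long len;	/* length, multiple of PAGE_SIZE */
};

/* Pretend per-chunk driver callback; returns 0 on success. */
static int fake_map(unsigned long iova, unsigned long phys, unsigned long len)
{
	(void)iova; (void)phys; (void)len;
	return 0;
}

/* Generic fallback for drivers without map_range(): map each chunk at
 * the next iova. Returns total bytes mapped, or 0 on failure. */
unsigned long generic_map_sg(unsigned long iova, const struct sg_entry *sg,
			     size_t nents)
{
	unsigned long mapped = 0;
	size_t i;

	for (i = 0; i < nents; i++) {
		if (sg[i].len & (PAGE_SIZE - 1))
			return 0;	/* reject misaligned chunks */
		if (fake_map(iova + mapped, sg[i].phys, sg[i].len))
			return 0;
		mapped += sg[i].len;
	}
	return mapped;
}
```

This keeps the per-chunk iteration (and its per-chunk TLB cost, in a real driver) in core code, which is exactly the overhead the thread's map_range proposal is trying to move into the driver.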
Hi Olav,

On Mon, Jun 30, 2014 at 05:51:51PM +0100, Olav Haugan wrote:
> Mapping and unmapping are more often than not in the critical path.
> map_range and unmap_range allows SMMU driver implementations to optimize
> the process of mapping and unmapping buffers into the SMMU page tables.
> Instead of mapping one physical address, do TLB operation (expensive),
> mapping, do TLB operation, mapping, do TLB operation the driver can map
> a scatter-gatherlist of physically contiguous pages into one virtual
> address space and then at the end do one TLB operation.
>
> Additionally, the mapping operation would be faster in general since
> clients does not have to keep calling map API over and over again for
> each physically contiguous chunk of memory that needs to be mapped to a
> virtually contiguous region.

I like the idea of this, although it does mean that drivers implementing
the range mapping functions need more featureful page-table manipulation
code than currently required.

For example, iommu_map uses iommu_pgsize to guarantee that mappings are
created in blocks of the largest supported page size. This can be used to
simplify iterating in the SMMU driver (although the ARM SMMU driver
doesn't yet make use of this, I think Varun would add this when he adds
support for sections).

Given that we're really trying to kill the TLBI here, why not implement
something like iommu_unmap_nosync (unmap without DSB; TLBI) and
iommu_sync (DSB; TLBI) instead? If we guarantee that ranges must be
unmapped before being remapped, then there shouldn't be a TLBI on the
map path anyway.

Will

--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> bounces@lists.linux-foundation.org] On Behalf Of Will Deacon
> Sent: Tuesday, July 01, 2014 3:04 PM
> To: Olav Haugan
> Cc: linux-arm-msm@vger.kernel.org; iommu@lists.linux-foundation.org;
> thierry.reding@gmail.com; vgandhi@codeaurora.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [RFC/PATCH 2/7] iommu-api: Add map_range/unmap_range
> functions
>
> Hi Olav,
>
> On Mon, Jun 30, 2014 at 05:51:51PM +0100, Olav Haugan wrote:
> [...]
>
> I like the idea of this, although it does mean that drivers implementing
> the range mapping functions need more featureful page-table manipulation
> code than currently required.
>
> For example, iommu_map uses iommu_pgsize to guarantee that mappings are
> created in blocks of the largest supported page size. This can be used to
> simplify iterating in the SMMU driver (although the ARM SMMU driver
> doesn't yet make use of this, I think Varun would add this when he adds
> support for sections).

Yes, this would be supported.
-Varun
Hi Olav,

Olav Haugan <ohaugan@codeaurora.org> writes:

> Mapping and unmapping are more often than not in the critical path.
> map_range and unmap_range allows SMMU driver implementations to optimize
> the process of mapping and unmapping buffers into the SMMU page tables.
[...]
>
> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
> ---
>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index e5555fc..f2a6b80 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>
>
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);

We have a similar one internally, which is named "iommu_map_sg()",
called from the DMA API.

> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Can the existing iommu_unmap() do the same?
Hi Hiroshi,

On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
> Hi Olav,
>
> Olav Haugan <ohaugan@codeaurora.org> writes:
> [...]
>> +	return domain->ops->map_range(domain, iova, sg, len, prot);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>
> We have a similar one internally, which is named "iommu_map_sg()",
> called from the DMA API.

Great, so this new API will be useful to more people!

>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>> +		       unsigned int len)
>> +{
>> +	if (unlikely(domain->ops->unmap_range == NULL))
>> +		return -ENODEV;
>> +
>> +	BUG_ON(iova & (~PAGE_MASK));
>> +
>> +	return domain->ops->unmap_range(domain, iova, len);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>
> Can the existing iommu_unmap() do the same?

I believe iommu_unmap() behaves a bit differently because it will keep
on calling domain->ops->unmap() until everything is unmapped, instead of
letting the iommu implementation take care of unmapping everything in
one call.

I am abandoning the patch series since our driver was not accepted.
However, if there are no objections I will resubmit this patch (PATCH
2/7) as an independent patch to add the new map_range API.

Thanks,

Olav Haugan
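Olav's distinction can be made concrete. The generic iommu_unmap() path carves the range into chunks, each the largest driver-supported page size (the iommu_pgsize idea Will mentioned), and calls ops->unmap() once per chunk, while unmap_range hands the whole range to the driver in one call. Below is a userspace sketch of that chunking loop; only the pgsize_bitmap concept is borrowed from the kernel, and all function names are illustrative, not the actual implementation.

```c
#include <assert.h>

/* Pick the largest page size from pgsize_bitmap (bit n set means 2^n
 * bytes is supported) that both fits in the remaining size and keeps
 * the iova aligned. */
unsigned long pick_pgsize(unsigned long pgsize_bitmap, unsigned long iova,
			  unsigned long size)
{
	unsigned long best = 0;
	unsigned long pg;

	for (pg = pgsize_bitmap; pg; pg &= pg - 1) {
		unsigned long sz = pg & -pg;	/* lowest set bit */

		if (sz <= size && !(iova & (sz - 1)))
			best = sz;		/* keep the largest fit */
	}
	return best;
}

/* Count how many per-chunk ops->unmap() calls the generic loop would
 * make for a given range. Returns -1 if no supported size fits. */
int count_unmap_calls(unsigned long pgsize_bitmap, unsigned long iova,
		      unsigned long size)
{
	int calls = 0;

	while (size) {
		unsigned long sz = pick_pgsize(pgsize_bitmap, iova, size);

		if (!sz)
			return -1;	/* misaligned or too small */
		iova += sz;
		size -= sz;
		calls++;		/* one ops->unmap() per chunk */
	}
	return calls;
}
```

With a 4K-only bitmap, unmapping a large buffer means one call (and, in a real driver, potentially one TLB operation) per 4K page, which is the overhead unmap_range lets the driver collapse into a single invocation.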
On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> Hi Hiroshi,
>
> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
> [...]
>> Can the existing iommu_unmap() do the same?
>
> I believe iommu_unmap() behaves a bit differently because it will keep
> on calling domain->ops->unmap() until everything is unmapped instead of
> letting the iommu implementation take care of unmapping everything in
> one call.
>
> I am abandoning the patch series since our driver was not accepted.
> However, if there are no objections I will resubmit this patch (PATCH
> 2/7) as an independent patch to add this new map_range API.

+1 for map_range().. I've seen for gpu workloads, at least, that the
downstream map_range() API is quite beneficial. It was worth at least a
few fps in xonotic.

And, possibly getting off the subject a bit, but I was wondering about
the possibility of going one step further and batching up mapping
and/or unmapping multiple buffers (ranges) at once. I have a pretty
convenient sync point in drm/msm to flush out multiple mappings before
kicking the gpu.

BR,
-R
On 7/8/2014 4:49 PM, Rob Clark wrote:
> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> [...]
>
> +1 for map_range().. I've seen for gpu workloads, at least, that the
> downstream map_range() API is quite beneficial. It was worth at least
> a few fps in xonotic.
>
> And, possibly getting off the subject a bit, but I was wondering about
> the possibility of going one step further and batching up mapping
> and/or unmapping multiple buffers (ranges) at once. I have a pretty
> convenient sync point in drm/msm to flush out multiple mappings before
> kicking the gpu.

I think you should be able to do that with this API already - at least
the mapping part, since we are passing in an sg list (this could be a
chained sglist).

Thanks,

Olav
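Scatterlist chaining is what makes Olav's "one call, many buffers" point work on the map side: each array of entries can end in a link to the next array, so a single map_range() pass can cover several buffers' lists. A userspace sketch follows, with an explicit chain pointer standing in for the kernel's trick of encoding the link inside the entry itself; all names are illustrative, not the kernel scatterlist API.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a toy scatterlist. A "link" entry has len == 0 and a
 * non-NULL chain pointer; a len == 0 entry with chain == NULL ends the
 * whole list. */
struct sg_ent {
	unsigned long phys;
	unsigned long len;
	const struct sg_ent *chain;
};

/* Walk a possibly-chained list, returning the total bytes that a single
 * map_range()-style call would cover. */
unsigned long sg_total_len(const struct sg_ent *sg)
{
	unsigned long total = 0;

	while (sg) {
		if (sg->len == 0 && sg->chain) {
			sg = sg->chain;		/* hop to the next buffer's list */
			continue;
		}
		total += sg->len;
		sg = sg->len ? sg + 1 : NULL;	/* len == 0, no chain: done */
	}
	return total;
}
```

So, as Olav notes, batching the mapping of multiple buffers needs no new API as long as their segments land in one virtually contiguous region; what chaining alone cannot express is multiple buffers at *different* iovas, which is the case Rob raises next.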
On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> On 7/8/2014 4:49 PM, Rob Clark wrote:
> [...]
>> And, possibly getting off the subject a bit, but I was wondering about
>> the possibility of going one step further and batching up mapping
>> and/or unmapping multiple buffers (ranges) at once. I have a pretty
>> convenient sync point in drm/msm to flush out multiple mappings before
>> kicking the gpu.
>
> I think you should be able to do that with this API already - at least
> the mapping part, since we are passing in an sg list (this could be a
> chained sglist).

What I mean by batching up is mapping and unmapping multiple sglists,
each at a different iova, with minimal cpu cache and iommu tlb flushes..

Ideally we'd let the IOMMU driver be clever and build out all 2nd level
tables before inserting into first level tables (to minimize cpu cache
flushing).. also, there is probably a reasonable chance that we'd be
mapping a new buffer into an existing location, so there might be some
potential to reuse existing 2nd level tables (and save a tiny bit of
free/alloc). I've not thought too much about how that would look in
code.. might be kinda, umm, fun..

But at an API level, we should be able to do a bunch of
map/unmap_range's with one flush.

Maybe it could look like a sequence of iommu_{map,unmap}_range()
followed by iommu_flush()?

BR,
-R
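Rob's proposed sequence can be modeled minimally as a domain that defers TLB maintenance until a sync point: each range operation only updates page tables and marks the domain dirty, and a single flush at the end covers the whole batch. iommu_flush() and every other name below are hypothetical (this is Rob's sketch, not an existing kernel API), and the model only counts flushes rather than touching real page tables.

```c
#include <assert.h>

/* Toy domain: tracks whether TLB maintenance is owed and how many
 * flushes were actually issued. */
struct toy_domain {
	int dirty;		/* deferred TLBI pending? */
	int tlb_flushes;	/* TLBIs actually issued */
	int mapped_ranges;	/* ranges currently mapped */
};

void toy_map_range(struct toy_domain *d)
{
	d->mapped_ranges++;	/* update page tables only... */
	d->dirty = 1;		/* ...and remember a TLBI is still owed */
}

void toy_unmap_range(struct toy_domain *d)
{
	d->mapped_ranges--;
	d->dirty = 1;
}

/* The sync point (e.g. drm/msm just before kicking the gpu): one TLBI
 * makes the whole batch of range ops visible. */
void toy_iommu_flush(struct toy_domain *d)
{
	if (d->dirty) {
		d->tlb_flushes++;
		d->dirty = 0;
	}
}
```

In this simple variant the driver keeps no per-mapping list at all, just the dirty bit, which is the "not too fancy" answer Rob gives later to Thierry's tracking concern; minimizing cpu cache flushes as well would need real transaction tracking on top.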
On Wed, Jul 09, 2014 at 08:40:21PM -0400, Rob Clark wrote:
> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> [...]
>
> What I mean by batching up is mapping and unmapping multiple sglists,
> each at a different iova, with minimal cpu cache and iommu tlb flushes..
>
> Ideally we'd let the IOMMU driver be clever and build out all 2nd level
> tables before inserting into first level tables (to minimize cpu cache
> flushing).. also, there is probably a reasonable chance that we'd be
> mapping a new buffer into an existing location, so there might be some
> potential to reuse existing 2nd level tables (and save a tiny bit of
> free/alloc). I've not thought too much about how that would look in
> code.. might be kinda, umm, fun..
>
> But at an API level, we should be able to do a bunch of
> map/unmap_range's with one flush.
>
> Maybe it could look like a sequence of iommu_{map,unmap}_range()
> followed by iommu_flush()?

Doesn't that mean that the IOMMU driver would have to keep track of all
mappings until it sees an iommu_flush()? That sounds like it could be a
lot of work and complicated code.

Thierry
On Thu, Jul 10, 2014 at 3:10 AM, Thierry Reding <thierry.reding@gmail.com> wrote:
> On Wed, Jul 09, 2014 at 08:40:21PM -0400, Rob Clark wrote:
> [...]
>>
>> But at an API level, we should be able to do a bunch of
>> map/unmap_range's with one flush.
>>
>> Maybe it could look like a sequence of iommu_{map,unmap}_range()
>> followed by iommu_flush()?
>
> Doesn't that mean that the IOMMU driver would have to keep track of all
> mappings until it sees an iommu_flush()? That sounds like it could be a
> lot of work and complicated code.

Well, depends on how elaborate you want to get.

If you don't want to be too fancy, it may just be a matter of not doing
a TLB flush until iommu_flush(). If you want to get fancy and minimize
cpu flushes too, then the iommu driver would have to do some more
tracking to build up a transaction internally. I'm not really sure how
you avoid that.

I'm not quite sure how frequently separate buffers would touch the same
2nd level table, so it might be sufficient to treat it like N map_range
and unmap_range's followed by one TLB flush. I would, I think, need to
implement a prototype or at least instrument the iommu driver somehow
to generate some statistics.

I've nearly got qcom-iommu-v0 working here on top of upstream + a small
set of patches.. but once that is a bit more complete, experimenting
with some of this will be on my TODO list to see what amount of
crazy/complicated brings worthwhile performance benefits.

BR,
-R
On 7/9/2014 5:40 PM, Rob Clark wrote:
> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>> On 7/8/2014 4:49 PM, Rob Clark wrote:
>>> [...]
>>>
>>> +1 for map_range().. I've seen for gpu workloads, at least, that the
>>> downstream map_range() API is quite beneficial. It was worth at
>>> least a few fps in xonotic.
>>>
>>> And, possibly getting off the subject a bit, but I was wondering about
>>> the possibility of going one step further and batching up mapping
>>> and/or unmapping multiple buffers (ranges) at once. I have a pretty
>>> convenient sync point in drm/msm to flush out multiple mappings before
>>> kicking the gpu.
>>
>> I think you should be able to do that with this API already - at least
>> the mapping part, since we are passing in an sg list (this could be a
>> chained sglist).
>
> What I mean by batching up is mapping and unmapping multiple sglists,
> each at a different iova, with minimal cpu cache and iommu tlb flushes..
>
> Ideally we'd let the IOMMU driver be clever and build out all 2nd
> level tables before inserting into first level tables (to minimize cpu
> cache flushing).. also, there is probably a reasonable chance that
> we'd be mapping a new buffer into an existing location, so there might be
> some potential to reuse existing 2nd level tables (and save a tiny bit
> of free/alloc). I've not thought too much about how that would look
> in code.. might be kinda, umm, fun..
>
> But at an API level, we should be able to do a bunch of
> map/unmap_range's with one flush.
>
> Maybe it could look like a sequence of iommu_{map,unmap}_range()
> followed by iommu_flush()?

So we could add another argument ("options") in the range api that
allows you to indicate whether you want to invalidate the TLB or not.

Thanks,

Olav
On Thu, Jul 10, 2014 at 6:43 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> On 7/9/2014 5:40 PM, Rob Clark wrote:
>> [...]
>>
>> But at an API level, we should be able to do a bunch of
>> map/unmap_range's with one flush.
>>
>> Maybe it could look like a sequence of iommu_{map,unmap}_range()
>> followed by iommu_flush()?
>
> So we could add another argument ("options") in the range api that
> allows you to indicate whether you want to invalidate the TLB or not.

sounds reasonable.. I'm pretty sure we want explicit-flush to be an
opt-in behaviour.

BR,
-R

> Thanks,
>
> Olav
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);
> +
> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Before introducing these new API functions there should be a fall-back
for IOMMU drivers that do (not yet) implement the map_range and
unmap_range call-backs.

The last thing we want is this kind of functional partitioning between
different IOMMU drivers.


	Joerg
On 7/11/2014 3:20 AM, Joerg Roedel wrote:
> On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
>> [...]
>
> Before introducing these new API functions there should be a fall-back
> for IOMMU drivers that do (not yet) implement the map_range and
> unmap_range call-backs.
>
> The last thing we want is this kind of functional partitioning between
> different IOMMU drivers.

Yes, I can definitely add a fallback instead of returning -ENODEV.

Thanks,

Olav
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e5555fc..f2a6b80 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
 
+int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot)
+{
+	if (unlikely(domain->ops->map_range == NULL))
+		return -ENODEV;
+
+	BUG_ON(iova & (~PAGE_MASK));
+
+	return domain->ops->map_range(domain, iova, sg, len, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_map_range);
+
+int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
+		      unsigned int len)
+{
+	if (unlikely(domain->ops->unmap_range == NULL))
+		return -ENODEV;
+
+	BUG_ON(iova & (~PAGE_MASK));
+
+	return domain->ops->unmap_range(domain, iova, len);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_range);
+
 int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
 			       phys_addr_t paddr, u64 size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b96a5b2..63dca6d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -22,6 +22,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/types.h>
+#include <linux/scatterlist.h>
 #include <trace/events/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
@@ -93,6 +94,8 @@ enum iommu_attr {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @map_range: map a scatter-gather list of physically contiguous memory chunks to an iommu domain
+ * @unmap_range: unmap a scatter-gather list of physically contiguous memory chunks from an iommu domain
  * @iova_to_phys: translate iova to physical address
  * @domain_has_cap: domain capabilities query
  * @add_device: add device to iommu grouping
@@ -110,6 +113,10 @@ struct iommu_ops {
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size);
+	int (*map_range)(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot);
+	int (*unmap_range)(struct iommu_domain *domain, unsigned int iova,
+		    unsigned int len);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
@@ -153,6 +160,10 @@ extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 		       size_t size);
+extern int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot);
+extern int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
+		    unsigned int len);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
 				unsigned long cap);
@@ -280,6 +291,19 @@ static inline int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return -ENODEV;
 }
 
+static inline int iommu_map_range(struct iommu_domain *domain,
+				  unsigned int iova, struct scatterlist *sg,
+				  unsigned int len, int prot)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unmap_range(struct iommu_domain *domain,
+				    unsigned int iova, unsigned int len)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_domain_window_enable(struct iommu_domain *domain,
 					     u32 wnd_nr, phys_addr_t paddr,
 					     u64 size, int prot)
Mapping and unmapping are more often than not in the critical path.
map_range and unmap_range allow SMMU driver implementations to optimize
the process of mapping and unmapping buffers into the SMMU page tables.
Instead of mapping one physical address, doing a TLB operation
(expensive), mapping, doing a TLB operation, mapping, doing a TLB
operation, the driver can map a scatter-gather list of physically
contiguous pages into one virtual address range and then do one TLB
operation at the end.

Additionally, the mapping operation would be faster in general since
clients do not have to keep calling the map API over and over again for
each physically contiguous chunk of memory that needs to be mapped to a
virtually contiguous region.

Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
---
 drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
 include/linux/iommu.h | 24 ++++++++++++++++++++++++
 2 files changed, 48 insertions(+)