diff mbox

[1/2] scatterlist: use sg_phys()

Message ID 20150609162710.21910.57295.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams June 9, 2015, 4:27 p.m. UTC
Coccinelle cleanup to replace open coded sg to physical address
translations.  This is in preparation for introducing scatterlists that
reference __pfn_t.

// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch

virtual patch
virtual report
virtual org

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg))
+ sg_phys(sg) - sg->offset

Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/arm/mm/dma-mapping.c                    |    2 +-
 arch/microblaze/kernel/dma.c                 |    2 +-
 drivers/iommu/intel-iommu.c                  |    4 ++--
 drivers/iommu/iommu.c                        |    2 +-
 drivers/staging/android/ion/ion_chunk_heap.c |    4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

Comments

Elliott, Robert (Server Storage) June 10, 2015, 12:34 a.m. UTC | #1
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Dan Williams
> Sent: Tuesday, June 09, 2015 10:27 AM
> Subject: [PATCH 1/2] scatterlist: use sg_phys()
> 
...
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct
> scatterlist *sgl,
>  	/* FIXME this part of code is untested */
>  	for_each_sg(sgl, sg, nents, i) {
>  		sg->dma_address = sg_phys(sg);
> -		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> +		__dma_sync(sg_phys(sg),
>  							sg->length, direction);
>  	}

That one ends up with weird indentation.
Joerg Roedel June 10, 2015, 9:32 a.m. UTC | #2
On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7e7583ddd607..9f6ff6671f01 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>  		return -ENOMEM;
>  
>  	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> -		phys_addr_t phys = page_to_phys(sg_page(s));
> +		phys_addr_t phys = sg_phys(s) - s->offset;

So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
which makes the above statement to:

	page_to_phys(sg_page(s)) + s->offset - s->offset;

The compiler will probably optimize that away, but it still doesn't look
like an improvement.

>  		unsigned int len = PAGE_ALIGN(s->offset + s->length);
>  
>  		if (!is_coherent &&
> diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
> index ed7ba8a11822..dcb3c594d626 100644
> --- a/arch/microblaze/kernel/dma.c
> +++ b/arch/microblaze/kernel/dma.c
> @@ -61,7 +61,7 @@ static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
>  	/* FIXME this part of code is untested */
>  	for_each_sg(sgl, sg, nents, i) {
>  		sg->dma_address = sg_phys(sg);
> -		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
> +		__dma_sync(sg_phys(sg),
>  							sg->length, direction);

Here the replacement makes sense, but weird indentation. Could all be
moved to one line, I guess.

>  	}
>  
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 68d43beccb7e..9b9ada71e0d3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1998,7 +1998,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>  			sg_res = aligned_nrpages(sg->offset, sg->length);
>  			sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
>  			sg->dma_length = sg->length;
> -			pteval = page_to_phys(sg_page(sg)) | prot;
> +			pteval = (sg_phys(sg) - sg->offset) | prot;

Here it doesn't make sense too. In general, please remove the cases
where you have to subtract sg->offset after the conversion.


	Joerg
Dan Williams June 10, 2015, 4 p.m. UTC | #3
On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index 7e7583ddd607..9f6ff6671f01 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>>               return -ENOMEM;
>>
>>       for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> -             phys_addr_t phys = page_to_phys(sg_page(s));
>> +             phys_addr_t phys = sg_phys(s) - s->offset;
>
> So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> which makes the above statement to:
>
>         page_to_phys(sg_page(s)) + s->offset - s->offset;
>
> The compiler will probably optimize that away, but it still doesn't look
> like an improvement.

The goal is to eventually stop leaking struct page deep into the i/o
stack.  Anything that relies on being able to retrieve a struct page
out of an sg entry needs to be converted.  I think we need a new
helper for this case "sg_phys_aligned()?".
Russell King - ARM Linux June 10, 2015, 4:31 p.m. UTC | #4
On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> >> index 7e7583ddd607..9f6ff6671f01 100644
> >> --- a/arch/arm/mm/dma-mapping.c
> >> +++ b/arch/arm/mm/dma-mapping.c
> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
> >>               return -ENOMEM;
> >>
> >>       for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> >> -             phys_addr_t phys = page_to_phys(sg_page(s));
> >> +             phys_addr_t phys = sg_phys(s) - s->offset;
> >
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which makes the above statement to:
> >
> >         page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't look
> > like an improvement.
> 
> The goal is to eventually stop leaking struct page deep into the i/o
> stack.  Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted.  I think we need a new
> helper for this case "sg_phys_aligned()?".

Why?  The aim of the code is not to detect whether the address is aligned
to a page (if it were, it'd be testing for a zero s->offset, or it would
be testing for an s->offset being a multiple of the page size.)

Let's first understand the code that's being modified (which seems to be
something which isn't done very much today - people seem to just like
changing code for the hell of it.)

        for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
                phys_addr_t phys = page_to_phys(sg_page(s));
                unsigned int len = PAGE_ALIGN(s->offset + s->length);

                if (!is_coherent &&
                        !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
                        __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
dir);

                prot = __dma_direction_to_prot(dir);

                ret = iommu_map(mapping->domain, iova, phys, len, prot);
                if (ret < 0)
                        goto fail;
                count += len >> PAGE_SHIFT;
                iova += len;
        }

What it's doing is trying to map each scatterlist entry via an IOMMU.
Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
page.

However, what says that the IOMMU page size is the same as the host CPU
page size - it may not be... so the above code is wrong for a completely
different reason.

What we _should_ be doing is finding out what the IOMMU minimum page
size is, and using that in conjunction with the sg_phys() of the
scatterlist entry to determine the start and length of the mapping
such that the IOMMU mapping covers the range described by the scatterlist
entry.  (iommu_map() takes arguments which must be aligned to the IOMMU
minimum page size.)

However, what we can also see from the above is that we have other code
here using sg_page() - which is a necessity to be able to perform the
required DMA cache maintenance to ensure that the data is visible to the
DMA device.  We need to kmap_atomic() these in order to flush them, and
there's no other way other than struct page to represent a highmem page.

So, I think your intent to want to remove struct page from the I/O
subsystem is a false hope, unless you want to end up rewriting lots of
architecture code, and you can come up with an alternative method to
represent highmem pages.
Dan Williams June 10, 2015, 4:57 p.m. UTC | #5
On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 2:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> > On Tue, Jun 09, 2015 at 12:27:10PM -0400, Dan Williams wrote:
>> >> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> >> index 7e7583ddd607..9f6ff6671f01 100644
>> >> --- a/arch/arm/mm/dma-mapping.c
>> >> +++ b/arch/arm/mm/dma-mapping.c
>> >> @@ -1502,7 +1502,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
>> >>               return -ENOMEM;
>> >>
>> >>       for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> >> -             phys_addr_t phys = page_to_phys(sg_page(s));
>> >> +             phys_addr_t phys = sg_phys(s) - s->offset;
>> >
>> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
>> > which makes the above statement to:
>> >
>> >         page_to_phys(sg_page(s)) + s->offset - s->offset;
>> >
>> > The compiler will probably optimize that away, but it still doesn't look
>> > like an improvement.
>>
>> The goal is to eventually stop leaking struct page deep into the i/o
>> stack.  Anything that relies on being able to retrieve a struct page
>> out of an sg entry needs to be converted.  I think we need a new
>> helper for this case "sg_phys_aligned()?".
>
> Why?  The aim of the code is not to detect whether the address is aligned
> to a page (if it were, it'd be testing for a zero s->offset, or it would
> be testing for an s->offset being a multiple of the page size.)
>
> Let's first understand the code that's being modified (which seems to be
> something which isn't done very much today - people seem to just like
> changing code for the hell of it.)
>
>         for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>                 phys_addr_t phys = page_to_phys(sg_page(s));
>                 unsigned int len = PAGE_ALIGN(s->offset + s->length);
>
>                 if (!is_coherent &&
>                         !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
>                         __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
> dir);
>
>                 prot = __dma_direction_to_prot(dir);
>
>                 ret = iommu_map(mapping->domain, iova, phys, len, prot);
>                 if (ret < 0)
>                         goto fail;
>                 count += len >> PAGE_SHIFT;
>                 iova += len;
>         }
>
> What it's doing is trying to map each scatterlist entry via an IOMMU.
> Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> page.
>
> However, what says that the IOMMU page size is the same as the host CPU
> page size - it may not be... so the above code is wrong for a completely
> different reason.
>
> What we _should_ be doing is finding out what the IOMMU minimum page
> size is, and using that in conjunction with the sg_phys() of the
> scatterlist entry to determine the start and length of the mapping
> such that the IOMMU mapping covers the range described by the scatterlist
> entry.  (iommu_map() takes arguments which must be aligned to the IOMMU
> minimum page size.)
>
> However, what we can also see from the above is that we have other code
> here using sg_page() - which is a necessity to be able to perform the
> required DMA cache maintenance to ensure that the data is visible to the
> DMA device.  We need to kmap_atomic() these in order to flush them, and
> there's no other way other than struct page to represent a highmem page.
>
> So, I think your intent to want to remove struct page from the I/O
> subsystem is a false hope, unless you want to end up rewriting lots of
> architecture code, and you can come up with an alternative method to
> represent highmem pages.

I think there will always be cases that need to do pfn_to_page() for
buffer management.  Those configurations will be blocked from seeing
cases where a pfn is not struct page backed.  So, you can have highmem
or dma to pmem, but not both.  Christoph, this is why I have Kconfig
symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
i/o.
Russell King - ARM Linux June 10, 2015, 5:13 p.m. UTC | #6
On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > Why?  The aim of the code is not to detect whether the address is aligned
> > to a page (if it were, it'd be testing for a zero s->offset, or it would
> > be testing for an s->offset being a multiple of the page size.)
> >
> > Let's first understand the code that's being modified (which seems to be
> > something which isn't done very much today - people seem to just like
> > changing code for the hell of it.)
> >
> >         for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
> >                 phys_addr_t phys = page_to_phys(sg_page(s));
> >                 unsigned int len = PAGE_ALIGN(s->offset + s->length);
> >
> >                 if (!is_coherent &&
> >                         !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
> >                         __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
> > dir);
> >
> >                 prot = __dma_direction_to_prot(dir);
> >
> >                 ret = iommu_map(mapping->domain, iova, phys, len, prot);
> >                 if (ret < 0)
> >                         goto fail;
> >                 count += len >> PAGE_SHIFT;
> >                 iova += len;
> >         }
> >
> > What it's doing is trying to map each scatterlist entry via an IOMMU.
> > Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
> > page.
> >
> > However, what says that the IOMMU page size is the same as the host CPU
> > page size - it may not be... so the above code is wrong for a completely
> > different reason.
> >
> > What we _should_ be doing is finding out what the IOMMU minimum page
> > size is, and using that in conjunction with the sg_phys() of the
> > scatterlist entry to determine the start and length of the mapping
> > such that the IOMMU mapping covers the range described by the scatterlist
> > entry.  (iommu_map() takes arguments which must be aligned to the IOMMU
> > minimum page size.)
> >
> > However, what we can also see from the above is that we have other code
> > here using sg_page() - which is a necessity to be able to perform the
> > required DMA cache maintenance to ensure that the data is visible to the
> > DMA device.  We need to kmap_atomic() these in order to flush them, and
> > there's no other way other than struct page to represent a highmem page.
> >
> > So, I think your intent to want to remove struct page from the I/O
> > subsystem is a false hope, unless you want to end up rewriting lots of
> > architecture code, and you can come up with an alternative method to
> > represent highmem pages.
> 
> I think there will always be cases that need to do pfn_to_page() for
> buffer management.  Those configurations will be blocked from seeing
> cases where a pfn is not struct page backed.  So, you can have highmem
> or dma to pmem, but not both.  Christoph, this is why I have Kconfig
> symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
> i/o.

Hmm, pmem... yea, in the SolidRun community, we've basically decided to
stick with my updated Marvell BMM layer rather than switch to pmem.  I
forget the reasons why, but the decision was made after looking at what
pmem was doing...

In any case, let's not get bogged down in a peripheral issue.

What I'm objecting to is that the patches I've seen seem to be just
churn without any net benefit.

You can't simply make sg_page() return NULL after this change, because
you've done nothing with the remaining sg_page() callers to allow them
to gracefully handle that case.

What I'd like to see is a much fuller series of patches which show the
whole progression towards your end goal rather than a piecemeal
approach.  Right now, it's not clear that there is any benefit to
this round of changes.
Dan Williams June 10, 2015, 5:25 p.m. UTC | #7
On Wed, Jun 10, 2015 at 10:13 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jun 10, 2015 at 09:57:06AM -0700, Dan Williams wrote:
>> On Wed, Jun 10, 2015 at 9:31 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > Why?  The aim of the code is not to detect whether the address is aligned
>> > to a page (if it were, it'd be testing for a zero s->offset, or it would
>> > be testing for an s->offset being a multiple of the page size.)
>> >
>> > Let's first understand the code that's being modified (which seems to be
>> > something which isn't done very much today - people seem to just like
>> > changing code for the hell of it.)
>> >
>> >         for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
>> >                 phys_addr_t phys = page_to_phys(sg_page(s));
>> >                 unsigned int len = PAGE_ALIGN(s->offset + s->length);
>> >
>> >                 if (!is_coherent &&
>> >                         !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
>> >                         __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length,
>> > dir);
>> >
>> >                 prot = __dma_direction_to_prot(dir);
>> >
>> >                 ret = iommu_map(mapping->domain, iova, phys, len, prot);
>> >                 if (ret < 0)
>> >                         goto fail;
>> >                 count += len >> PAGE_SHIFT;
>> >                 iova += len;
>> >         }
>> >
>> > What it's doing is trying to map each scatterlist entry via an IOMMU.
>> > Unsurprisingly, IOMMUs are page based - you can't map a partial IOMMU
>> > page.
>> >
>> > However, what says that the IOMMU page size is the same as the host CPU
>> > page size - it may not be... so the above code is wrong for a completely
>> > different reason.
>> >
>> > What we _should_ be doing is finding out what the IOMMU minimum page
>> > size is, and using that in conjunction with the sg_phys() of the
>> > scatterlist entry to determine the start and length of the mapping
>> > such that the IOMMU mapping covers the range described by the scatterlist
>> > entry.  (iommu_map() takes arguments which must be aligned to the IOMMU
>> > minimum page size.)
>> >
>> > However, what we can also see from the above is that we have other code
>> > here using sg_page() - which is a necessity to be able to perform the
>> > required DMA cache maintenance to ensure that the data is visible to the
>> > DMA device.  We need to kmap_atomic() these in order to flush them, and
>> > there's no other way other than struct page to represent a highmem page.
>> >
>> > So, I think your intent to want to remove struct page from the I/O
>> > subsystem is a false hope, unless you want to end up rewriting lots of
>> > architecture code, and you can come up with an alternative method to
>> > represent highmem pages.
>>
>> I think there will always be cases that need to do pfn_to_page() for
>> buffer management.  Those configurations will be blocked from seeing
>> cases where a pfn is not struct page backed.  So, you can have highmem
>> or dma to pmem, but not both.  Christoph, this is why I have Kconfig
>> symbols (DEV_PFN etc) to gate whether an arch/config supports pfn-only
>> i/o.
>
> Hmm, pmem... yea, in the SolidRun community, we've basically decided to
> stick with my updated Marvell BMM layer rather than switch to pmem.  I
> forget the reasons why, but the decision was made after looking at what
> pmem was doing...

I'd of course be open to exploring if drivers/nvdimm/ could be made
more generally useful.

> In any case, let's not get bogged down in a peripheral issue.
>
> What I'm objecting to is that the patches I've seen seem to be just
> churn without any net benefit.
>
> You can't simply make sg_page() return NULL after this change, because
> you've done nothing with the remaining sg_page() callers to allow them
> to gracefully handle that case.
>
> What I'd like to see is a much fuller series of patches which show the
> whole progression towards your end goal rather than a piecemeal
> approach.  Right now, it's not clear that there is any benefit to
> this round of changes.
>

Fair enough.  I had them as part of a larger series [1].  Christoph
suggested that I break out the pure cleanups separately.  I'll add you
to the next rev of that series.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-June/001094.html
Joerg Roedel June 11, 2015, 6:50 a.m. UTC | #8
On Wed, Jun 10, 2015 at 09:00:31AM -0700, Dan Williams wrote:
> > So sg_phys() turns out to be 'page_to_phys(sg_page(s)) + s->offset',
> > which makes the above statement to:
> >
> >         page_to_phys(sg_page(s)) + s->offset - s->offset;
> >
> > The compiler will probably optimize that away, but it still doesn't look
> > like an improvement.
> 
> The goal is to eventually stop leaking struct page deep into the i/o
> stack.  Anything that relies on being able to retrieve a struct page
> out of an sg entry needs to be converted.  I think we need a new
> helper for this case "sg_phys_aligned()?".

You still have a reference to a struct page, because sg_phys() calls
sg_page() too. If you want to get rid of sg_page() something like
sg_pfn() might be a more workable solution than sg_phys_(page_)aligned.

But maybe I am just missing the bigger scope of this, so I agree with
Russell that it is better so see a patch series which shows the
direction you want to go with this.


	Joerg
diff mbox

Patch

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7e7583ddd607..9f6ff6671f01 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1502,7 +1502,7 @@  static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
 		return -ENOMEM;
 
 	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
-		phys_addr_t phys = page_to_phys(sg_page(s));
+		phys_addr_t phys = sg_phys(s) - s->offset;
 		unsigned int len = PAGE_ALIGN(s->offset + s->length);
 
 		if (!is_coherent &&
diff --git a/arch/microblaze/kernel/dma.c b/arch/microblaze/kernel/dma.c
index ed7ba8a11822..dcb3c594d626 100644
--- a/arch/microblaze/kernel/dma.c
+++ b/arch/microblaze/kernel/dma.c
@@ -61,7 +61,7 @@  static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
 	/* FIXME this part of code is untested */
 	for_each_sg(sgl, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
-		__dma_sync(page_to_phys(sg_page(sg)) + sg->offset,
+		__dma_sync(sg_phys(sg),
 							sg->length, direction);
 	}
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43beccb7e..9b9ada71e0d3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1998,7 +1998,7 @@  static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			sg_res = aligned_nrpages(sg->offset, sg->length);
 			sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset;
 			sg->dma_length = sg->length;
-			pteval = page_to_phys(sg_page(sg)) | prot;
+			pteval = (sg_phys(sg) - sg->offset) | prot;
 			phys_pfn = pteval >> VTD_PAGE_SHIFT;
 		}
 
@@ -3302,7 +3302,7 @@  static int intel_nontranslate_map_sg(struct device *hddev,
 
 	for_each_sg(sglist, sg, nelems, i) {
 		BUG_ON(!sg_page(sg));
-		sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset;
+		sg->dma_address = sg_phys(sg);
 		sg->dma_length = sg->length;
 	}
 	return nelems;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e56679..59808fc9110d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1147,7 +1147,7 @@  size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 	min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
 
 	for_each_sg(sg, s, nents, i) {
-		phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+		phys_addr_t phys = sg_phys(s);
 
 		/*
 		 * We are mapping on IOMMU page boundaries, so offset within
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 3e6ec2ee6802..b7da5d142aa9 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -81,7 +81,7 @@  static int ion_chunk_heap_allocate(struct ion_heap *heap,
 err:
 	sg = table->sgl;
 	for (i -= 1; i >= 0; i--) {
-		gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+		gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
 			      sg->length);
 		sg = sg_next(sg);
 	}
@@ -109,7 +109,7 @@  static void ion_chunk_heap_free(struct ion_buffer *buffer)
 							DMA_BIDIRECTIONAL);
 
 	for_each_sg(table->sgl, sg, table->nents, i) {
-		gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
+		gen_pool_free(chunk_heap->pool, sg_phys(sg) - sg->offset,
 			      sg->length);
 	}
 	chunk_heap->allocated -= allocated_size;