
[2/4] scatterlist: add sg_alloc_table_from_buf() helper

Message ID 1459427384-21374-3-git-send-email-boris.brezillon@free-electrons.com (mailing list archive)
State New, archived

Commit Message

Boris BREZILLON March 31, 2016, 12:29 p.m. UTC
sg_alloc_table_from_buf() provides an easy solution to create an sg_table
from a virtual address pointer. This function takes care of dealing with
vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
DMA transfer size).

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 include/linux/scatterlist.h |  24 ++++++
 lib/scatterlist.c           | 183 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 207 insertions(+)
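
As a rough usage sketch, a driver would fill a struct sg_constraints describing
its DMA engine and hand the helper a lowmem (e.g. kmalloc'ed) buffer.
my_driver_xfer(), dev and the constraint values below are made up for
illustration only:

#include <linux/device.h>
#include <linux/scatterlist.h>
#include <linux/sizes.h>

static int my_driver_xfer(struct device *dev, void *buf, size_t len)
{
	struct sg_constraints cons = {
		.max_segment_size = SZ_64K,	/* hypothetical engine limit */
		.required_alignment = 4,	/* engine rejects unaligned chunks */
		.preferred_alignment = 64,	/* best throughput on 64-byte chunks */
	};
	struct sg_table sgt;
	int ret;

	ret = sg_alloc_table_from_buf(&sgt, buf, len, &cons, GFP_KERNEL);
	if (ret)
		return ret;

	/* dma_map_sg()/run the transfer/dma_unmap_sg() on sgt.sgl here */

	sg_free_table(&sgt);
	return 0;
}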

Comments

Russell King - ARM Linux March 31, 2016, 2:14 p.m. UTC | #1
On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote:
> sg_alloc_table_from_buf() provides an easy solution to create an sg_table
> from a virtual address pointer. This function takes care of dealing with
> vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
> DMA transfer size).

Please note that the DMA API does not take account of coherency of memory
regions other than non-high/lowmem - there are specific extensions to
deal with this.

What this means is that having an API that takes any virtual address
pointer, converts it to a scatterlist which is then DMA mapped, is
unsafe.

It'll be okay for PIPT and non-aliasing VIPT cache architectures, but
for other cache architectures this will hide this problem and make
review harder.
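
The pattern being flagged is roughly the following fragment (a sketch only:
dev and len are placeholders, error handling is omitted, and the helper call
uses the signature proposed in this patch):

	void *buf = vmalloc(len);		/* not a lowmem address */
	struct sg_table sgt;

	sg_alloc_table_from_buf(&sgt, buf, len, NULL, GFP_KERNEL);
	dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_TO_DEVICE);
	/*
	 * The DMA API only maintains the lowmem alias of these pages; on an
	 * aliasing VIPT or VIVT cache, data written through the vmalloc
	 * mapping can still sit in cache lines the API never cleans.
	 */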
Boris BREZILLON March 31, 2016, 2:45 p.m. UTC | #2
Hi Russell,

On Thu, 31 Mar 2016 15:14:13 +0100
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote:
> > sg_alloc_table_from_buf() provides an easy solution to create an sg_table
> > from a virtual address pointer. This function takes care of dealing with
> > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
> > DMA transfer size).
> 
> Please note that the DMA API does not take account of coherency of memory
> regions other than non-high/lowmem - there are specific extensions to
> deal with this.

Ok, you said 'non-high/lowmem'; this means vmalloced and kmapped buffers
already fall into this case, right?

Could you tell me more about those specific extensions?

> 
> What this means is that having an API that takes any virtual address
> pointer, converts it to a scatterlist which is then DMA mapped, is
> unsafe.

Which means some implementations already get this wrong (see
spi_map_buf(), and I'm pretty sure it's not the only one).

> 
> It'll be okay for PIPT and non-aliasing VIPT cache architectures, but
> for other cache architectures this will hide this problem and make
> review harder.
> 

Ok, you lost me. I'll have to do my homework and try to understand what
this means :).

Thanks for your valuable inputs.

Best Regards,

Boris
Russell King - ARM Linux March 31, 2016, 3:09 p.m. UTC | #3
On Thu, Mar 31, 2016 at 04:45:57PM +0200, Boris Brezillon wrote:
> Hi Russell,
> 
> On Thu, 31 Mar 2016 15:14:13 +0100
> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
> > On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote:
> > > sg_alloc_table_from_buf() provides an easy solution to create an sg_table
> > > from a virtual address pointer. This function takes care of dealing with
> > > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
> > > DMA transfer size).
> > 
> > Please note that the DMA API does not take account of coherency of memory
> > regions other than non-high/lowmem - there are specific extensions to
> > deal with this.
> 
> Ok, you said 'non-high/lowmem'; this means vmalloced and kmapped buffers
> already fall into this case, right?
> 
> Could you tell me more about those specific extensions?

I was slightly confused - the extensions I was thinking of are those
listed at the bottom of Documentation/cachetlb.txt, which have nothing
to do with DMA.

However, it's probably worth reading Documentation/DMA-API-HOWTO.txt
to read up on what kinds of memory are considered to be DMA-able in
the kernel.

> > What this means is that having an API that takes any virtual address
> > pointer, converts it to a scatterlist which is then DMA mapped, is
> > unsafe.
> 
> Which means some implementations already get this wrong (see
> spi_map_buf(), and I'm pretty sure it's not the only one).

Quite possible, but that is driver stuff, and driver stuff gets things
wrong all the time. :)

> > It'll be okay for PIPT and non-aliasing VIPT cache architectures, but
> > for other cache architectures this will hide this problem and make
> > review harder.
> > 
> 
> Ok, you lost me. I'll have to do my homework and try to understand what
> this means :).

P = physical address
V = virtual address
I = indexed
T = tag

The tag is held in each cache line.  When a location is looked up in the
cache, an index is used to locate a set of cache lines and the tag is
compared to check which cache line in the set is the correct one (or
whether the address even exists in the cache.)

How the index and tag are derived varies between cache architectures.

PIPT = indexed by physical address, tagged with physical address.  Never
aliases with itself in the presence of multiple virtual mappings.

VIPT = indexed by virtual address, tagged with physical address.  If the
bits from the virtual address do not overlap the MMU page size, it is
also alias free, otherwise aliases can exist, but can be eliminated by
"cache colouring" - ensuring that a physical address is always mapped
with the same overlapping bits.

VIVT = indexed by virtual address, tagged with virtual address.  The
worst kind of cache, since every different mapping of the same physical
address is guaranteed by design to alias with other mappings.

There is little cache colouring between different kernel mappings (eg,
between lowmem and vmalloc space.)

What this means is that, while the DMA API takes care of DMA aliases
in the lowmem mappings, an alias-prone VIPT cache will remain incoherent
with DMA if it is remapped into vmalloc space, and the mapping happens
to have a different cache colour.  In other words, this is a data
corruption issue.

Hence, taking a range of vmalloc() addresses, converting them into a
scatterlist, then using the DMA API on the scatterlist _only_ guarantees
that the lowmem (and kmap'd highmem mappings) are coherent with DMA.
There is no way for the DMA API to know that other mappings exist, and
obviously flushing every possible cache line just because a mapping might
exist multiplies the expense of the DMA API: not only in terms of time
spent running through all the possibilities, which doubles for every
aliasing bit of VIPT, but also TLB pressure since you'd have to create
a mapping for each alias and tear it back down.

VIVT is even worse: no other virtual mapping is coherent, so every mapping
of the buffer would need to be known and individually flushed.
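
To make the aliasing problem concrete, take a hypothetical VIPT data cache
with 8 KiB per way and 32-byte lines on a system with 4 KiB pages. The set
index then covers virtual address bits [12:5]; bit 12 lies above the page
offset, so the same physical page can land in different sets ("colours")
depending on which virtual mapping is used. A small user-space toy (geometry
and addresses made up) shows the effect:

#include <stdio.h>

/* Hypothetical VIPT geometry: 8 KiB per way, 32-byte lines, 4 KiB pages. */
#define WAY_SIZE	0x2000UL
#define LINE_SIZE	0x20UL

static unsigned long cache_set(unsigned long va)
{
	return (va & (WAY_SIZE - 1)) / LINE_SIZE;
}

int main(void)
{
	/* Two made-up virtual aliases of the same physical page,
	 * differing in bit 12. */
	unsigned long lowmem_va  = 0xc0001000UL;
	unsigned long vmalloc_va = 0xf0002000UL;

	/* Prints "128 0": different colours, so cache maintenance done
	 * through one mapping never touches the lines allocated through
	 * the other. */
	printf("%lu %lu\n", cache_set(lowmem_va), cache_set(vmalloc_va));
	return 0;
}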

Patch

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..18d1091 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -41,6 +41,27 @@  struct sg_table {
 	unsigned int orig_nents;	/* original size of list */
 };
 
+/**
+ * struct sg_constraints - SG constraints structure
+ *
+ * @max_segment_size: maximum segment length. No SG entry may exceed this
+ *		      value. Zero means no constraint.
+ * @required_alignment: minimum alignment, applied to both the buffer pointer
+ *			and the buffer length. If this constraint is not met,
+ *			sg_alloc_table_from_buf() returns -EINVAL.
+ * @preferred_alignment: preferred alignment, mainly used to optimize
+ *			 throughput on DMA engines that perform better when
+ *			 doing aligned accesses.
+ *
+ * This structure is here to help sg_alloc_table_from_buf() create the optimal
+ * SG list based on DMA engine constraints.
+ */
+struct sg_constraints {
+	size_t max_segment_size;
+	size_t required_alignment;
+	size_t preferred_alignment;
+};
+
 /*
  * Notes on SG table design.
  *
@@ -265,6 +286,9 @@  int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
 	unsigned long offset, unsigned long size,
 	gfp_t gfp_mask);
+int sg_alloc_table_from_buf(struct sg_table *sgt, const void *buf, size_t len,
+			    const struct sg_constraints *constraints,
+			    gfp_t gfp_mask);
 
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 		      size_t buflen, off_t skip, bool to_buffer);
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..9c9746e 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -433,6 +433,189 @@  int sg_alloc_table_from_pages(struct sg_table *sgt,
 }
 EXPORT_SYMBOL(sg_alloc_table_from_pages);
 
+static size_t sg_buf_chunk_len(const void *buf, size_t len,
+			       const struct sg_constraints *cons)
+{
+	size_t chunk_len = len;
+
+	if (cons->max_segment_size)
+		chunk_len = min_t(size_t, chunk_len, cons->max_segment_size);
+
+	if (is_vmalloc_addr(buf)) {
+		unsigned long offset = offset_in_page(buf);
+		size_t contig_len = PAGE_SIZE - offset;
+		unsigned long pfn = vmalloc_to_pfn(buf);
+		const void *contig_ptr = buf + contig_len;
+
+		/*
+		 * A vmalloced buffer might be composed of several physically
+		 * contiguous pages. Avoid extra scatter-gather entries in
+		 * this case.
+		 */
+		while (contig_len < chunk_len) {
+			if (pfn + 1 != vmalloc_to_pfn(contig_ptr))
+				break;
+
+			contig_len += PAGE_SIZE;
+			contig_ptr += PAGE_SIZE;
+			pfn++;
+		}
+
+		chunk_len = min_t(size_t, chunk_len, contig_len);
+	}
+
+	if (!IS_ALIGNED((unsigned long)buf, cons->preferred_alignment)) {
+		const void *aligned_buf = PTR_ALIGN(buf,
+						    cons->preferred_alignment);
+		size_t unaligned_len = (unsigned long)(aligned_buf - buf);
+
+		chunk_len = min_t(size_t, chunk_len, unaligned_len);
+	} else if (chunk_len > cons->preferred_alignment) {
+		chunk_len &= ~(cons->preferred_alignment - 1);
+	}
+
+	return chunk_len;
+}
+
+#define sg_for_each_chunk_in_buf(buf, len, chunk_len, constraints)	\
+	for (chunk_len = sg_buf_chunk_len(buf, len, constraints);	\
+	     len;							\
+	     len -= chunk_len, buf += chunk_len,			\
+	     chunk_len = sg_buf_chunk_len(buf, len, constraints))
+
+static int sg_check_constraints(struct sg_constraints *cons,
+				const void *buf, size_t len)
+{
+	/*
+	 * We only accept buffers coming from the lowmem, vmalloc and
+	 * highmem regions.
+	 */
+	if (!virt_addr_valid(buf) && !is_vmalloc_addr(buf) &&
+	    !is_highmem_addr(buf))
+		return -EINVAL;
+
+	if (!cons->required_alignment)
+		cons->required_alignment = 1;
+
+	if (!cons->preferred_alignment)
+		cons->preferred_alignment = cons->required_alignment;
+
+	/* Test if buf and len are properly aligned. */
+	if (!IS_ALIGNED((unsigned long)buf, cons->required_alignment) ||
+	    !IS_ALIGNED(len, cons->required_alignment))
+		return -EINVAL;
+
+	/*
+	 * If the buffer has been vmalloced or kmapped and required_alignment
+	 * is larger than PAGE_SIZE, we cannot guarantee the alignment.
+	 */
+	if (!virt_addr_valid(buf) && cons->required_alignment > PAGE_SIZE)
+		return -EINVAL;
+
+	/*
+	 * max_segment_size has to be aligned to required_alignment to
+	 * guarantee that all buffer chunks are aligned correctly.
+	 */
+	if (!IS_ALIGNED(cons->max_segment_size, cons->required_alignment))
+		return -EINVAL;
+
+	/*
+	 * preferred_alignment has to be aligned to required_alignment
+	 * to avoid misalignment of buffer chunks.
+	 */
+	if (!IS_ALIGNED(cons->preferred_alignment, cons->required_alignment))
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * sg_alloc_table_from_buf - create an SG table from a buffer
+ *
+ * @sgt: SG table
+ * @buf: buffer you want to create this SG table from
+ * @len: length of buf
+ * @constraints: optional constraints to take into account when creating
+ *		 the SG table. Can be NULL if no specific constraints are
+ *		 required.
+ * @gfp_mask: type of allocation to use when creating the table
+ *
+ * This function creates an SG table from a buffer, its length and some
+ * SG constraints.
+ *
+ * Note: This function supports buffers coming from the lowmem, vmalloc or
+ * highmem regions.
+ */
+int sg_alloc_table_from_buf(struct sg_table *sgt, const void *buf, size_t len,
+			    const struct sg_constraints *constraints,
+			    gfp_t gfp_mask)
+{
+	struct sg_constraints cons = { };
+	size_t remaining, chunk_len;
+	const void *sg_buf;
+	int i, ret;
+
+	if (constraints)
+		cons = *constraints;
+
+	ret = sg_check_constraints(&cons, buf, len);
+	if (ret)
+		return ret;
+
+	sg_buf = buf;
+	remaining = len;
+	i = 0;
+	sg_for_each_chunk_in_buf(sg_buf, remaining, chunk_len, &cons)
+		i++;
+
+	ret = sg_alloc_table(sgt, i, gfp_mask);
+	if (ret)
+		return ret;
+
+	sg_buf = buf;
+	remaining = len;
+	i = 0;
+	sg_for_each_chunk_in_buf(sg_buf, remaining, chunk_len, &cons) {
+		if (virt_addr_valid(buf)) {
+			/*
+			 * Buffer is in lowmem, we can safely call
+			 * sg_set_buf().
+			 */
+			sg_set_buf(&sgt->sgl[i], sg_buf, chunk_len);
+		} else {
+			struct page *vm_page;
+
+			/*
+			 * Buffer has been obtained with vmalloc() or kmap().
+			 * In this case we have to extract the page information
+			 * and use sg_set_page().
+			 */
+			if (is_vmalloc_addr(sg_buf))
+				vm_page = vmalloc_to_page(sg_buf);
+			else
+				vm_page = kmap_to_page((void *)sg_buf);
+
+			if (!vm_page) {
+				ret = -ENOMEM;
+				goto err_free_table;
+			}
+
+			sg_set_page(&sgt->sgl[i], vm_page, chunk_len,
+				    offset_in_page(sg_buf));
+		}
+
+		i++;
+	}
+
+	return 0;
+
+err_free_table:
+	sg_free_table(sgt);
+
+	return ret;
+}
+EXPORT_SYMBOL(sg_alloc_table_from_buf);
+
 void __sg_page_iter_start(struct sg_page_iter *piter,
 			  struct scatterlist *sglist, unsigned int nents,
 			  unsigned long pgoffset)
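
For completeness, here is what sg_check_constraints() would accept and reject
for two hypothetical constraint sets (the values are illustrative only):

	/* Accepted: max_segment_size and preferred_alignment are both
	 * multiples of required_alignment. */
	struct sg_constraints ok = {
		.max_segment_size = SZ_64K,
		.required_alignment = 8,
		.preferred_alignment = 64,
	};

	/* Rejected with -EINVAL: max_segment_size is not a multiple of
	 * required_alignment, so a full-size chunk could leave the next
	 * chunk misaligned. */
	struct sg_constraints bad = {
		.max_segment_size = SZ_64K + 4,
		.required_alignment = 8,
	};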