Message ID | 5d691ab0c4f447c9f324213d8d740ac61d1739a1.1311936524.git.viresh.kumar@st.com (mailing list archive) |
---|---|
State | New, archived |
2011/7/29 Viresh Kumar <viresh.kumar@st.com>:
> Until now, sg_len greater than one is not supported. This patch adds
> support to do that.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@st.com>

Thanks, this was on my TODO once.

Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
On Fri, Jul 29, 2011 at 04:19:26PM +0530, Viresh Kumar wrote:
> Until now, sg_len greater than one is not supported. This patch adds
> support to do that.

I'm not sure that this is the correct approach. memcpy()s can only be
used with one single buffer, so the sg stuff for that (and the unmapping
support) doesn't make sense at all.

The only place where this makes sense is the slave sg stuff. I wonder
whether we can better deal with that by having the LLI setup code deal
with one SG list entry at a time, and chain each together.

Something I've also been pondering which is related to this is linking
together DMA operations using the LLI chaining when interrupts and
callbacks aren't required. It's a very similar problem. Other DMA
engine drivers do this.

Finally, note that some of the PL08x code assumes that for any TXD, the
LLI list is a contiguous array. See the first part of
pl08x_getbytes_chan(). That needs fixing as soon as we start going to
more than one SG list entry.
On Sun, Aug 14, 2011 at 10:36 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Jul 29, 2011 at 04:19:26PM +0530, Viresh Kumar wrote:
>> Until now, sg_len greater than one is not supported. This patch adds
>> support to do that.
>
> I'm not sure that this is the correct approach. memcpy()s can only be
> used with one single buffer, so the sg stuff for that (and the
> unmapping support) doesn't make sense at all.

I agree, that part is fishy.

> The only place where this makes sense is the slave sg stuff. I wonder
> whether we can better deal with that by having the LLI setup code deal
> with one SG list entry at a time, and chain each together.
>
> Something I've also been pondering which is related to this is linking
> together DMA operations using the LLI chaining when interrupts and
> callbacks aren't required. It's a very similar problem. Other DMA
> engine drivers do this.

Yeah. Viresh, can you hold this patch off for the moment, move it to the
end of the series and see if we can find some more optimal approach for
this?

> Finally, note that some of the PL08x code assumes that for any TXD,
> the LLI list is a contiguous array. See the first part of
> pl08x_getbytes_chan(). That needs fixing as soon as we start going to
> more than one SG list entry.

Actually, after the removal of the phantom boundary in the PL08x driver,
the LLI handling code starts to resemble the stuff inside the
coh901318.c driver an *awful lot*, so I'm now ever more convinced that
that thing is a PL08x derivative. (I have asked the hardware designers,
but the people who made it are unreachable.) And the coh901318 code
handles this by properly walking the LLIs in memory IIRC, so it could be
a good inspiration.

I am aware that we need to consolidate that LLI handling so that
coh901318 and PL08x share the same LLI code at least, or I may even be
able to merge it all into pl08x.c.

I have a mental note to get to that once we've got PL08x into shape;
until then it can at least be used for inspiration.

Thanks,
Linus Walleij
Hi Russell & Linus,

Sorry for being late to reply.

On 8/14/2011 2:06 PM, Russell King - ARM Linux wrote:
> On Fri, Jul 29, 2011 at 04:19:26PM +0530, Viresh Kumar wrote:
>> Until now, sg_len greater than one is not supported. This patch adds
>> support to do that.
>
> I'm not sure that this is the correct approach. memcpy()s can only be
> used with one single buffer, so the sg stuff for that (and the
> unmapping support) doesn't make sense at all.

I am not sure I get this completely. In memcpy, we still don't support
more than one sg. We have created a new member in txd, which keeps track
of the data (addresses, len). While unmapping, we are unmapping only a
single dsg. I think I don't need to use list_for_each_entry() there, as
there is only one element present in the list, but can still keep it.

> The only place where this makes sense is the slave sg stuff. I wonder
> whether we can better deal with that by having the LLI setup code deal
> with one SG list entry at a time, and chain each together.
>
> Something I've also been pondering which is related to this is linking
> together DMA operations using the LLI chaining when interrupts and
> callbacks aren't required. It's a very similar problem. Other DMA
> engine drivers do this.

Are you talking about linking all sg's together, or linking multiple
calls to prep_dma_memcpy? I am handling the first.

> Finally, note that some of the PL08x code assumes that for any TXD,
> the LLI list is a contiguous array. See the first part of
> pl08x_getbytes_chan(). That needs fixing as soon as we start going to
> more than one SG list entry.

We still have one contiguous array for the LLI list. In
pl08x_fill_llis_for_desc() I am using the same txd->llis_va for all
sg's. Sorry if I misunderstood your concerns.
On Thu, Aug 18, 2011 at 02:08:19PM +0530, Viresh Kumar wrote:
> On 8/14/2011 2:06 PM, Russell King - ARM Linux wrote:
> > I'm not sure that this is the correct approach. memcpy()s can only
> > be used with one single buffer, so the sg stuff for that (and the
> > unmapping support) doesn't make sense at all.
>
> I am not sure I get this completely. In memcpy, we still don't support
> more than one sg. We have created a new member in txd, which keeps
> track of the data (addresses, len).

Yes, but we shouldn't need to translate it into any kind of scatterlist.

> While unmapping, we are unmapping only a single dsg. I think I don't
> need to use list_for_each_entry() there, as there is only one element
> present in the list, but can still keep it.

Correct.

> Are you talking about linking all sg's together, or linking multiple
> calls to prep_dma_memcpy? I am handling the first.

I'm talking about letting the hardware process as many txds as possible
without calling back into the driver to have it set up the next txd.

> We still have one contiguous array for the LLI list. In
> pl08x_fill_llis_for_desc() I am using the same txd->llis_va for all
> sg's.

That's ok then.
On 8/21/2011 2:03 PM, Russell King - ARM Linux wrote:
>> I am not sure I get this completely. In memcpy, we still don't
>> support more than one sg. We have created a new member in txd, which
>> keeps track of the data (addresses, len).
>
> Yes, but we shouldn't need to translate it into any kind of
> scatterlist.

Ok. One way out is keeping two separate variables in txd: a list for
slave transfers, and a pointer to a single element for memcpy. And that
looks even worse to me: why waste memory on a second variable? Or
create a union of both.

One more thing: we could actually have scatter-gather in memcpy too in
the future. This would be helpful then as well, but surely that's
something not implemented currently.
On Tue, 2011-08-23 at 09:52 +0530, Viresh Kumar wrote:
> Ok. One way out is keeping two separate variables in txd: a list for
> slave transfers, and a pointer to a single element for memcpy. And
> that looks even worse to me: why waste memory on a second variable?
> Or create a union of both.
>
> One more thing: we could actually have scatter-gather in memcpy too in
> the future. This would be helpful then as well, but surely that's
> something not implemented currently.

There are a few points for consideration:

1) Clients will call tx_submit() and submit the descriptors; this
   should push your descriptor to the queued list.
2) On issue_pending() you need to push the queued descriptors to the
   DMAC. If your hardware can transfer multiple descriptors, then you
   should use this and push all available descriptors to hardware.

This way you ensure optimal performance and give LLI for memcpy even
when you think the API is not available. If there are some use cases
where we need scatterlist support for memcpy, and if the above doesn't
satisfy them, it should be okay to add it.
On 8/23/2011 9:52 AM, Viresh Kumar wrote:
> Ok. One way out is keeping two separate variables in txd: a list for
> slave transfers, and a pointer to a single element for memcpy. [...]
>
> One more thing: we could actually have scatter-gather in memcpy too in
> the future. This would be helpful then as well, but surely that's
> something not implemented currently.

Russell/Linus,

Probably this is the only issue left in this patch (sorry if I missed
some other points). Please suggest what I should modify here, so that
Vinod can go ahead and push the pending patches too.
On 8/26/2011 1:33 PM, Linus Walleij wrote:
> Patches are individual for that very reason - so they can be applied
> one by one.
>
> I already suggested to Vinod to merge the other patches as far as
> possible; certainly 1 thru 15 could be merged right off with some
> minor manual fixup. Then we can fix this one patch.

Actually, Vinod has already applied the earlier 17 patches. Now only 3
are left: this one and the two after it. I just wanted to know whether
I need to do any other modification in this patch or not.
On 8/26/2011 2:21 PM, Viresh Kumar wrote:
> I just wanted to know whether I need to do any other modification in
> this patch or not.

Hello,

Do I need to update anything in this patch? Or can this be pushed as it
is?
On Thu, 2011-09-01 at 15:37 +0530, Viresh Kumar wrote:
> On 8/26/2011 2:21 PM, Viresh Kumar wrote:
> > I just wanted to know whether I need to do any other modification in
> > this patch or not.
>
> Hello,
>
> Do I need to update anything in this patch? Or can this be pushed as
> it is?

I thought Linus W. and Russell had some comments on this.

--
~Vinod
2011/9/7 Koul, Vinod <vinod.koul@intel.com>:
> On Thu, 2011-09-01 at 15:37 +0530, Viresh Kumar wrote:
> > Do I need to update anything in this patch? Or can this be pushed as
> > it is?
>
> I thought Linus W. and Russell had some comments on this.

I think the patch brings valuable functionality we don't want to lose
when there is a solution. Basically the dmaengine has a contract to
handle sglists of any length, and it's a pity that we don't. I suspect
Viresh cannot use the driver for MMC unless something like this is
added, so

Acked-by: Linus Walleij <linus.walleij@linaro.org>

BUT I think it is possible to rewrite it a bit later so as to get
better handling of this. Isn't Russell's initial remark true, that the
LLIs can simply just take in the entire sglist at once? Check for
example drivers/dma/coh901318.c, which evidently has an LLI format
similar or identical to what PL08x uses (which is another reason to
refactor this later): it will just fill in LLIs for all the elements of
the sglist, and off you go. Like this:

	/*
	 * The dma only supports transmitting packages up to
	 * MAX_DMA_PACKET_SIZE. Calculate the total number of
	 * dma elements required to send the entire sg list.
	 */
	for_each_sg(sgl, sg, sg_len, i) {
		unsigned int factor;
		size = sg_dma_len(sg);

		if (size <= MAX_DMA_PACKET_SIZE) {
			len++;
			continue;
		}

		factor = size >> MAX_DMA_PACKET_SIZE_SHIFT;
		if ((factor << MAX_DMA_PACKET_SIZE_SHIFT) < size)
			factor++;

		len += factor;
	}

	pr_debug("Allocate %d lli:s for this transfer\n", len);
	lli = coh901318_lli_alloc(&cohc->base->pool, len);
	if (lli == NULL)
		goto err_dma_alloc;

	/* initiate allocated lli list */
	ret = coh901318_lli_fill_sg(&cohc->base->pool, lli, sgl, sg_len,
				    cohc_dev_addr(cohc),
				    ctrl_chained, ctrl, ctrl_last,
				    direction,
				    COH901318_CX_CTRL_TC_IRQ_ENABLE);
	if (ret)
		goto err_lli_fill;

The rest of the interesting code is in coh901318_lli_fill_sg(), as you
can see. Evidently this needs to be fixed as part of factoring out the
LLI handling from PL08x and coh901318 into a single one, if we ever get
to it.

Yours,
Linus Walleij
On 9/8/2011 4:31 AM, Linus Walleij wrote:
> I think the patch brings valuable functionality we don't want to lose
> when there is a solution. Basically the dmaengine has a contract to
> handle sglists of any length, and it's a pity that we don't. I suspect
> Viresh cannot use the driver for MMC unless something like this is
> added, so
> Acked-by: Linus Walleij <linus.walleij@linaro.org>

Thanks again.

> BUT I think it is possible to rewrite it a bit later so as to get
> better handling of this. Isn't Russell's initial remark true, that the
> LLIs can simply just take in the entire sglist at once?

If I am getting this clearly, the concern is: "why queue separate
transfers for individual sg's? Better would be to prepare the complete
list at once and start the transfer, so that the DMA stops only after
finishing all sg's passed from the user." Is this what you are pointing
at?

If yes, then the same is done in this patch too. An array for llis is
allocated at the start; then for each sg I prepare the lli list from
this array. The last lli from one sg is followed by the first lli from
the next sg, and so I get a continuous chain of llis.
On Thu, Sep 8, 2011 at 5:50 AM, Viresh Kumar <viresh.kumar@st.com> wrote:
> If I am getting this clearly, the concern is: "why queue separate
> transfers for individual sg's? Better would be to prepare the complete
> list at once and start the transfer, so that the DMA stops only after
> finishing all sg's passed from the user." Is this what you are
> pointing at?

Yes.

> If yes, then the same is done in this patch too. An array for llis is
> allocated at the start; then for each sg I prepare the lli list from
> this array. The last lli from one sg is followed by the first lli from
> the next sg, and so I get a continuous chain of llis.

OK, so I guess I was lost in the code ... So this is mainly cached as
txd->dsg_list so you can quickly retrieve the number of bytes pending
in the LLI by traversing that sglist. This is better than what the
coh901318 does, because that driver resorts to going into the physical
LLIs themselves to retrieve this info.

It also seems like this will play nice with Per Forlin's MMC speed-up
patches, so that will become effective for your MMC use cases.

Now I really like this patch. Sorry for being such a slow learner!

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
diff --git a/drivers/dma/amba-pl08x.c b/drivers/dma/amba-pl08x.c index 9eea0d1..ddc06bd 100644 --- a/drivers/dma/amba-pl08x.c +++ b/drivers/dma/amba-pl08x.c @@ -354,7 +354,9 @@ static u32 pl08x_getbytes_chan(struct pl08x_dma_chan *plchan) if (!list_empty(&plchan->pend_list)) { struct pl08x_txd *txdi; list_for_each_entry(txdi, &plchan->pend_list, node) { - bytes += txdi->len; + struct pl08x_sg *dsg; + list_for_each_entry(dsg, &txd->dsg_list, node) + bytes += dsg->len; } } @@ -564,6 +566,7 @@ static int pl08x_fill_llis_for_desc(struct pl08x_driver_data *pl08x, u32 cctl, early_bytes = 0; size_t max_bytes_per_lli, total_bytes = 0; struct pl08x_lli *llis_va; + struct pl08x_sg *dsg; txd->llis_va = dma_pool_alloc(pl08x->pool, GFP_NOWAIT, &txd->llis_bus); if (!txd->llis_va) { @@ -573,13 +576,9 @@ static int pl08x_fill_llis_for_desc(struct pl08x_driver_data *pl08x, pl08x->pool_ctr++; - /* Get the default CCTL */ - cctl = txd->cctl; - bd.txd = txd; - bd.srcbus.addr = txd->src_addr; - bd.dstbus.addr = txd->dst_addr; bd.lli_bus = (pl08x->lli_buses & PL08X_AHB2) ? 
PL080_LLI_LM_AHB2 : 0; + cctl = txd->cctl; /* Find maximum width of the source bus */ bd.srcbus.maxwidth = @@ -591,46 +590,27 @@ static int pl08x_fill_llis_for_desc(struct pl08x_driver_data *pl08x, pl08x_get_bytes_for_cctl((cctl & PL080_CONTROL_DWIDTH_MASK) >> PL080_CONTROL_DWIDTH_SHIFT); - /* Set up the bus widths to the maximum */ - bd.srcbus.buswidth = bd.srcbus.maxwidth; - bd.dstbus.buswidth = bd.dstbus.maxwidth; + list_for_each_entry(dsg, &txd->dsg_list, node) { + cctl = txd->cctl; - /* We need to count this down to zero */ - bd.remainder = txd->len; + bd.srcbus.addr = dsg->src_addr; + bd.dstbus.addr = dsg->dst_addr; + bd.remainder = dsg->len; + bd.srcbus.buswidth = bd.srcbus.maxwidth; + bd.dstbus.buswidth = bd.dstbus.maxwidth; - pl08x_choose_master_bus(&bd, &mbus, &sbus, cctl); + pl08x_choose_master_bus(&bd, &mbus, &sbus, cctl); - dev_vdbg(&pl08x->adev->dev, "src=0x%08x%s/%u dst=0x%08x%s/%u len=%zu\n", - bd.srcbus.addr, cctl & PL080_CONTROL_SRC_INCR ? "+" : "", - bd.srcbus.buswidth, - bd.dstbus.addr, cctl & PL080_CONTROL_DST_INCR ? "+" : "", - bd.dstbus.buswidth, - bd.remainder); - dev_vdbg(&pl08x->adev->dev, "mbus=%s sbus=%s\n", - mbus == &bd.srcbus ? "src" : "dst", - sbus == &bd.srcbus ? "src" : "dst"); - - /* - * Send byte by byte for following cases - * - Less than a bus width available - * - until master bus is aligned - */ - if (bd.remainder < mbus->buswidth) - early_bytes = bd.remainder; - else if ((mbus->addr) % (mbus->buswidth)) { - early_bytes = mbus->buswidth - (mbus->addr) % (mbus->buswidth); - if ((bd.remainder - early_bytes) < mbus->buswidth) - early_bytes = bd.remainder; - } + dev_vdbg(&pl08x->adev->dev, "src=0x%08x%s/%u dst=0x%08x%s/%u len=%zu\n", + bd.srcbus.addr, cctl & PL080_CONTROL_SRC_INCR ? "+" : "", + bd.srcbus.buswidth, + bd.dstbus.addr, cctl & PL080_CONTROL_DST_INCR ? "+" : "", + bd.dstbus.buswidth, + bd.remainder); + dev_vdbg(&pl08x->adev->dev, "mbus=%s sbus=%s\n", + mbus == &bd.srcbus ? "src" : "dst", + sbus == &bd.srcbus ? 
"src" : "dst"); - if (early_bytes) { - dev_vdbg(&pl08x->adev->dev, "%s byte width LLIs " - "(remain 0x%08x)\n", __func__, bd.remainder); - prep_byte_width_lli(&bd, &cctl, early_bytes, num_llis++, - &total_bytes); - } - - if (bd.remainder) { /* * Zero length is only allowed if all these requirements are * met: @@ -664,79 +644,111 @@ static int pl08x_fill_llis_for_desc(struct pl08x_driver_data *pl08x, } /* - * Master now aligned - * - if slave is not then we must set its width down + * Send byte by byte for following cases + * - Less than a bus width available + * - until master bus is aligned */ - if (sbus->addr % sbus->buswidth) { - dev_dbg(&pl08x->adev->dev, - "%s set down bus width to one byte\n", - __func__); + if (bd.remainder < mbus->buswidth) + early_bytes = bd.remainder; + else if ((mbus->addr) % (mbus->buswidth)) { + early_bytes = mbus->buswidth - (mbus->addr) % + (mbus->buswidth); + if ((bd.remainder - early_bytes) < mbus->buswidth) + early_bytes = bd.remainder; + } - sbus->buswidth = 1; + if (early_bytes) { + dev_vdbg(&pl08x->adev->dev, "%s byte width LLIs " + "(remain 0x%08x)\n", __func__, bd.remainder); + prep_byte_width_lli(&bd, &cctl, early_bytes, num_llis++, + &total_bytes); } - /* Bytes transferred = tsize * src width, not MIN(buswidths) */ - max_bytes_per_lli = bd.srcbus.buswidth * - PL080_CONTROL_TRANSFER_SIZE_MASK; + if (bd.remainder) { + /* + * Master now aligned + * - if slave is not then we must set its width down + */ + if (sbus->addr % sbus->buswidth) { + dev_dbg(&pl08x->adev->dev, + "%s set down bus width to one byte\n", + __func__); - /* - * Make largest possible LLIs until less than one bus - * width left - */ - while (bd.remainder > (mbus->buswidth - 1)) { - size_t lli_len, tsize, width; + sbus->buswidth = 1; + } /* - * If enough left try to send max possible, - * otherwise try to send the remainder + * Bytes transferred = tsize * src width, not + * MIN(buswidths) */ - lli_len = min(bd.remainder, max_bytes_per_lli); + max_bytes_per_lli = 
bd.srcbus.buswidth * + PL080_CONTROL_TRANSFER_SIZE_MASK; + dev_vdbg(&pl08x->adev->dev, "%s max bytes per lli = " + "%zu\n", __func__, max_bytes_per_lli); /* - * Check against maximum bus alignment: Calculate actual - * transfer size in relation to bus width and get a - * maximum remainder of the highest bus width - 1 + * Make largest possible LLIs until less than one bus + * width left */ - width = max(mbus->buswidth, sbus->buswidth); - lli_len = (lli_len / width) * width; - tsize = lli_len / bd.srcbus.buswidth; + while (bd.remainder > (mbus->buswidth - 1)) { + size_t lli_len, tsize, width; - dev_vdbg(&pl08x->adev->dev, + /* + * If enough left try to send max possible, + * otherwise try to send the remainder + */ + lli_len = min(bd.remainder, max_bytes_per_lli); + + /* + * Check against maximum bus alignment: + * Calculate actual transfer size in relation to + * bus width an get a maximum remainder of the + * highest bus width - 1 + */ + width = max(mbus->buswidth, sbus->buswidth); + lli_len = (lli_len / width) * width; + tsize = lli_len / bd.srcbus.buswidth; + + dev_vdbg(&pl08x->adev->dev, "%s fill lli with single lli chunk of " "size 0x%08zx (remainder 0x%08zx)\n", __func__, lli_len, bd.remainder); - cctl = pl08x_cctl_bits(cctl, bd.srcbus.buswidth, + cctl = pl08x_cctl_bits(cctl, bd.srcbus.buswidth, bd.dstbus.buswidth, tsize); - pl08x_fill_lli_for_desc(&bd, num_llis++, lli_len, cctl); - total_bytes += lli_len; - } + pl08x_fill_lli_for_desc(&bd, num_llis++, + lli_len, cctl); + total_bytes += lli_len; + } - /* - * Send any odd bytes - */ - if (bd.remainder) { - dev_vdbg(&pl08x->adev->dev, - "%s align with boundary, send odd bytes " - "(remain %zu)\n", __func__, bd.remainder); - prep_byte_width_lli(&bd, &cctl, bd.remainder, - num_llis++, &total_bytes); + /* + * Send any odd bytes + */ + if (bd.remainder) { + dev_vdbg(&pl08x->adev->dev, + "%s align with boundary, send odd bytes" + " (remain %zu)\n", __func__, + bd.remainder); + prep_byte_width_lli(&bd, &cctl, 
bd.remainder, + num_llis++, &total_bytes); + } } - } - if (total_bytes != txd->len) { - dev_err(&pl08x->adev->dev, - "%s size of encoded lli:s don't match total txd, " - "transferred 0x%08zx from size 0x%08zx\n", __func__, - total_bytes, txd->len); - return 0; - } + if (total_bytes != dsg->len) { + dev_err(&pl08x->adev->dev, + "%s size of encoded lli:s don't match total txd" + ", transferred 0x%08zx from size 0x%08zx\n", + __func__, + total_bytes, dsg->len); + return 0; + } - if (num_llis >= MAX_NUM_TSFR_LLIS) { - dev_err(&pl08x->adev->dev, - "%s need to increase MAX_NUM_TSFR_LLIS from 0x%08x\n", - __func__, (u32) MAX_NUM_TSFR_LLIS); - return 0; + if (num_llis >= MAX_NUM_TSFR_LLIS) { + dev_err(&pl08x->adev->dev, + "%s need to increase MAX_NUM_TSFR_LLIS from " + "0x%08x\n", __func__, (u32) MAX_NUM_TSFR_LLIS); + return 0; + } } llis_va = txd->llis_va; @@ -769,11 +781,18 @@ static int pl08x_fill_llis_for_desc(struct pl08x_driver_data *pl08x, static void pl08x_free_txd(struct pl08x_driver_data *pl08x, struct pl08x_txd *txd) { + struct pl08x_sg *dsg, *_dsg; + /* Free the LLI */ dma_pool_free(pl08x->pool, txd->llis_va, txd->llis_bus); pl08x->pool_ctr--; + list_for_each_entry_safe(dsg, _dsg, &txd->dsg_list, node) { + list_del(&dsg->node); + kfree(dsg); + } + kfree(txd); } @@ -1228,6 +1247,7 @@ static struct pl08x_txd *pl08x_get_txd(struct pl08x_dma_chan *plchan, txd->tx.flags = flags; txd->tx.tx_submit = pl08x_tx_submit; INIT_LIST_HEAD(&txd->node); + INIT_LIST_HEAD(&txd->dsg_list); /* Always enable error and terminal interrupts */ txd->ccfg = PL080_CONFIG_ERR_IRQ_MASK | @@ -1246,6 +1266,7 @@ static struct dma_async_tx_descriptor *pl08x_prep_dma_memcpy( struct pl08x_dma_chan *plchan = to_pl08x_chan(chan); struct pl08x_driver_data *pl08x = plchan->host; struct pl08x_txd *txd; + struct pl08x_sg *dsg; int ret; txd = pl08x_get_txd(plchan, flags); @@ -1255,10 +1276,19 @@ static struct dma_async_tx_descriptor *pl08x_prep_dma_memcpy( return NULL; } + dsg = kzalloc(sizeof(struct 
pl08x_sg), GFP_NOWAIT); + if (!dsg) { + pl08x_free_txd(pl08x, txd); + dev_err(&pl08x->adev->dev, "%s no memory for pl080 sg\n", + __func__); + return NULL; + } + list_add_tail(&dsg->node, &txd->dsg_list); + txd->direction = DMA_NONE; - txd->src_addr = src; - txd->dst_addr = dest; - txd->len = len; + dsg->src_addr = src; + dsg->dst_addr = dest; + dsg->len = len; /* Set platform data for m2m */ txd->ccfg |= PL080_FLOW_MEM2MEM << PL080_CONFIG_FLOW_CONTROL_SHIFT; @@ -1287,17 +1317,11 @@ static struct dma_async_tx_descriptor *pl08x_prep_slave_sg( struct pl08x_dma_chan *plchan = to_pl08x_chan(chan); struct pl08x_driver_data *pl08x = plchan->host; struct pl08x_txd *txd; + struct pl08x_sg *dsg; + struct scatterlist *sg; + dma_addr_t slave_addr; int ret, tmp; - /* - * Current implementation ASSUMES only one sg - */ - if (sg_len != 1) { - dev_err(&pl08x->adev->dev, "%s prepared too long sglist\n", - __func__); - BUG(); - } - dev_dbg(&pl08x->adev->dev, "%s prepare transaction of %d bytes from" " %s\n", __func__, sgl->length, plchan->name); @@ -1318,17 +1342,15 @@ static struct dma_async_tx_descriptor *pl08x_prep_slave_sg( * channel target address dynamically at runtime. 
*/ txd->direction = direction; - txd->len = sgl->length; if (direction == DMA_TO_DEVICE) { txd->cctl = plchan->dst_cctl; - txd->src_addr = sgl->dma_address; - txd->dst_addr = plchan->dst_addr; + slave_addr = plchan->dst_addr; } else if (direction == DMA_FROM_DEVICE) { txd->cctl = plchan->src_cctl; - txd->src_addr = plchan->src_addr; - txd->dst_addr = sgl->dma_address; + slave_addr = plchan->src_addr; } else { + pl08x_free_txd(pl08x, txd); dev_err(&pl08x->adev->dev, "%s direction unsupported\n", __func__); return NULL; @@ -1342,6 +1364,26 @@ static struct dma_async_tx_descriptor *pl08x_prep_slave_sg( txd->ccfg |= tmp << PL080_CONFIG_FLOW_CONTROL_SHIFT; + for_each_sg(sgl, sg, sg_len, tmp) { + dsg = kzalloc(sizeof(struct pl08x_sg), GFP_NOWAIT); + if (!dsg) { + pl08x_free_txd(pl08x, txd); + dev_err(&pl08x->adev->dev, "%s no mem for pl080 sg\n", + __func__); + return NULL; + } + list_add_tail(&dsg->node, &txd->dsg_list); + + dsg->len = sg_dma_len(sg); + if (direction == DMA_TO_DEVICE) { + dsg->src_addr = sg_phys(sg); + dsg->dst_addr = slave_addr; + } else { + dsg->src_addr = slave_addr; + dsg->dst_addr = sg_phys(sg); + } + } + ret = pl08x_prep_channel_resources(plchan, txd); if (ret) return NULL; @@ -1439,22 +1481,28 @@ static void pl08x_ensure_on(struct pl08x_driver_data *pl08x) static void pl08x_unmap_buffers(struct pl08x_txd *txd) { struct device *dev = txd->tx.chan->device->dev; + struct pl08x_sg *dsg; if (!(txd->tx.flags & DMA_COMPL_SKIP_SRC_UNMAP)) { if (txd->tx.flags & DMA_COMPL_SRC_UNMAP_SINGLE) - dma_unmap_single(dev, txd->src_addr, txd->len, - DMA_TO_DEVICE); - else - dma_unmap_page(dev, txd->src_addr, txd->len, - DMA_TO_DEVICE); + list_for_each_entry(dsg, &txd->dsg_list, node) + dma_unmap_single(dev, dsg->src_addr, dsg->len, + DMA_TO_DEVICE); + else { + list_for_each_entry(dsg, &txd->dsg_list, node) + dma_unmap_page(dev, dsg->src_addr, dsg->len, + DMA_TO_DEVICE); + } } if (!(txd->tx.flags & DMA_COMPL_SKIP_DEST_UNMAP)) { if (txd->tx.flags & 
DMA_COMPL_DEST_UNMAP_SINGLE) - dma_unmap_single(dev, txd->dst_addr, txd->len, - DMA_FROM_DEVICE); + list_for_each_entry(dsg, &txd->dsg_list, node) + dma_unmap_single(dev, dsg->dst_addr, dsg->len, + DMA_FROM_DEVICE); else - dma_unmap_page(dev, txd->dst_addr, txd->len, - DMA_FROM_DEVICE); + list_for_each_entry(dsg, &txd->dsg_list, node) + dma_unmap_page(dev, dsg->dst_addr, dsg->len, + DMA_FROM_DEVICE); } } diff --git a/include/linux/amba/pl08x.h b/include/linux/amba/pl08x.h index febb0da..9090342 100644 --- a/include/linux/amba/pl08x.h +++ b/include/linux/amba/pl08x.h @@ -107,12 +107,24 @@ struct pl08x_phy_chan { }; /** + * struct pl08x_sg - structure containing data per sg + * @src_addr: src address of sg + * @dst_addr: dst address of sg + * @len: transfer len in bytes + * @node: node for txd's dsg_list + */ +struct pl08x_sg { + dma_addr_t src_addr; + dma_addr_t dst_addr; + size_t len; + struct list_head node; +}; + +/** * struct pl08x_txd - wrapper for struct dma_async_tx_descriptor * @tx: async tx descriptor * @node: node for txd list for channels - * @src_addr: src address of txd - * @dst_addr: dst address of txd - * @len: transfer len in bytes + * @dsg_list: list of children sg's * @direction: direction of transfer * @llis_bus: DMA memory address (physical) start for the LLIs * @llis_va: virtual memory address start for the LLIs @@ -122,10 +134,8 @@ struct pl08x_phy_chan { struct pl08x_txd { struct dma_async_tx_descriptor tx; struct list_head node; + struct list_head dsg_list; enum dma_data_direction direction; - dma_addr_t src_addr; - dma_addr_t dst_addr; - size_t len; dma_addr_t llis_bus; struct pl08x_lli *llis_va; /* Default cctl value for LLIs */
Until now, sg_len greater than one is not supported. This patch adds
support to do that.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
---
 drivers/dma/amba-pl08x.c   | 288 ++++++++++++++++++++++++++-------------------
 include/linux/amba/pl08x.h |  22 +++-
 2 files changed, 184 insertions(+), 126 deletions(-)