diff mbox

[v1,1/3] dma: tegra: finer granularity residual for tx_status

Message ID 1399411343-12222-2-git-send-email-cfreeman@nvidia.com (mailing list archive)
State Rejected
Delegated to: Vinod Koul
Headers show

Commit Message

Christopher Freeman May 6, 2014, 9:22 p.m. UTC
Get word-level granularity from hardware for calculating
the transfer count remaining.

Signed-off-by: Christopher Freeman <cfreeman@nvidia.com>
---
 drivers/dma/tegra20-apb-dma.c |   52 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

Comments

Stephen Warren May 7, 2014, 4:37 p.m. UTC | #1
On 05/06/2014 03:22 PM, Christopher Freeman wrote:
> Get word-level granularity from hardware for calculating
> the transfer count remaining.

> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c

> +static int tegra_dma_wcount_in_bytes(struct dma_chan *dc)

A lot of the code in this function is identical to the code in
tegra_dma_terminate_all() which does the same thing. Can this be pulled
out into a shared utility function?

> +	tegra_dma_pause(tdc, true);

Is this continual pausing/resuming of the DMA operation going to
negatively affect performance?

> +	/* in case of interrupt, handle it and don't read wcount reg */
> +	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +	if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +		tdc_write(tdc, TEGRA_APBDMA_CHAN_STATUS, status);
> +		dev_info(tdc2dev(tdc), "%s():handling isr\n", __func__);

If you swap the order of patches 1 and 2, then you can just add that
line as dev_dbg() from the start, and you won't need to change it in the
next patch.

> +		tdc->isr_handler(tdc, false);
> +		tegra_dma_resume(tdc);
> +		return 0;

Why resume and return here? Shouldn't those last 2 lines be removed, so
the code can simply continue through the balance of the function and
return the actual status. tegra_dma_terminate_all() does that.

> @@ -812,9 +851,22 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  	list_for_each_entry(sg_req, &tdc->pending_sg_req, node) {
>  		dma_desc = sg_req->dma_desc;
>  		if (dma_desc->txd.cookie == cookie) {
> +			hw_byte_count = tegra_dma_wcount_in_bytes(dc);
> +
> +			if (!list_empty(&tdc->pending_sg_req))

Since this code is inside a loop that iterates over tha list, I don't
think the list can ever be empty.

> +				first_entry =
> +					list_first_entry(&tdc->pending_sg_req,
> +						typeof(*first_entry), node);
> +
>  			residual =  dma_desc->bytes_requested -
>  					(dma_desc->bytes_transferred %
>  						dma_desc->bytes_requested);
> +
> +			/* hw byte count only applies to current transaction */
> +			if (first_entry &&
> +				first_entry->dma_desc->txd.cookie == cookie)
> +				residual -= hw_byte_count;
> +
>  			dma_set_residue(txstate, residual);

Why not re-order the added code so that all the new code is added in one
place, and the hw_byte_count value is only calculated if it's used, i.e.:

residual = ...;
first_entry = ...;
if (sg_reg == first_entry) {
    hw_byte_count = ...;
    residual -= hw_byte_count;
}

--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Freeman May 7, 2014, 10:37 p.m. UTC | #2
On Wed, May 07, 2014 at 09:37:25AM -0700, Stephen Warren wrote:
> On 05/06/2014 03:22 PM, Christopher Freeman wrote:
> > Get word-level granularity from hardware for calculating
> > the transfer count remaining.
> 
> > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> 
> > +static int tegra_dma_wcount_in_bytes(struct dma_chan *dc)
> 
> A lot of the code in this function is identical to the code in
> tegra_dma_terminate_all() which does the same thing. Can this be pulled
> out into a shared utility function?
> 
I'll look at making utility functions for ISR handling and the calculations for the byte counts.

> > +	tegra_dma_pause(tdc, true);
> 
> Is this continual pausing/resuming of the DMA operation going to
> negatively affect performance?
> 
I tried testing the performance impact and each call took about 20 uS.  And of course, the client would have to be calling this constantly.
> > +	/* in case of interrupt, handle it and don't read wcount reg */
> > +	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> > +	if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> > +		tdc_write(tdc, TEGRA_APBDMA_CHAN_STATUS, status);
> > +		dev_info(tdc2dev(tdc), "%s():handling isr\n", __func__);
> 
> If you swap the order of patches 1 and 2, then you can just add that
> line as dev_dbg() from the start, and you won't need to change it in the
> next patch.
> 
Will do.
> > +		tdc->isr_handler(tdc, false);
> > +		tegra_dma_resume(tdc);
> > +		return 0;
> 
> Why resume and return here? Shouldn't those last 2 lines be removed, so
> the code can simply continue through the balance of the function and
> return the actual status. tegra_dma_terminate_all() does that.
> 
Handling the interrupt will increment the transfer count when that segment is completed.  There's no need to read the hardware and in fact, we don't want to run the risk of reading a hardware register that is stale.  For example, the transfer is complete, but the transfer count register has not been zeroed.

> > @@ -812,9 +851,22 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
> >  	list_for_each_entry(sg_req, &tdc->pending_sg_req, node) {
> >  		dma_desc = sg_req->dma_desc;
> >  		if (dma_desc->txd.cookie == cookie) {
> > +			hw_byte_count = tegra_dma_wcount_in_bytes(dc);
> > +
> > +			if (!list_empty(&tdc->pending_sg_req))
> 
> Since this code is inside a loop that iterates over tha list, I don't
> think the list can ever be empty.
> $
tegra_dma_wcount_in_bytes may modify the pending_sg_req since it can invoke the ISR.  So the list may become empty.  Explaining that just now made me cringe, shall I rewrite it so we can't modify the list we're iterating over?  Granted, once this code is invoked, we're done iterating.

> > +				first_entry =
> > +					list_first_entry(&tdc->pending_sg_req,
> > +						typeof(*first_entry), node);
> > +
> >  			residual =  dma_desc->bytes_requested -
> >  					(dma_desc->bytes_transferred %
> >  						dma_desc->bytes_requested);
> > +
> > +			/* hw byte count only applies to current transaction */
> > +			if (first_entry &&
> > +				first_entry->dma_desc->txd.cookie == cookie)
> > +				residual -= hw_byte_count;
> > +
> >  			dma_set_residue(txstate, residual);
> 
> Why not re-order the added code so that all the new code is added in one
> place, and the hw_byte_count value is only calculated if it's used, i.e.:
> 
> residual = ...;
> first_entry = ...;
> if (sg_reg == first_entry) {
>     hw_byte_count = ...;
>     residual -= hw_byte_count;
> }
> 
My comment above may shed some light on the ordering reason.  The "first entry" may change when we handle the ISR.
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren May 12, 2014, 5:27 p.m. UTC | #3
On 05/07/2014 04:37 PM, Christopher Freeman wrote:
> On Wed, May 07, 2014 at 09:37:25AM -0700, Stephen Warren wrote:
>> On 05/06/2014 03:22 PM, Christopher Freeman wrote:
>>> Get word-level granularity from hardware for calculating
>>> the transfer count remaining.
>>
>>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
>>
>>> +static int tegra_dma_wcount_in_bytes(struct dma_chan *dc)

>>> +	/* in case of interrupt, handle it and don't read wcount reg */
>>> +	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>>> +	if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>>> +		tdc_write(tdc, TEGRA_APBDMA_CHAN_STATUS, status);
>>> +		dev_info(tdc2dev(tdc), "%s():handling isr\n", __func__);
>>> +		tdc->isr_handler(tdc, false);
>>> +		tegra_dma_resume(tdc);
>>> +		return 0;
>>
>> Why resume and return here? Shouldn't those last 2 lines be removed, so
>> the code can simply continue through the balance of the function and
>> return the actual status. tegra_dma_terminate_all() does that.
>>
> Handling the interrupt will increment the transfer count when that segment is completed.  There's no need to read the hardware and in fact, we don't want to run the risk of reading a hardware register that is stale.  For example, the transfer is complete, but the transfer count register has not been zeroed.

(could you line-wrap your email; long lines are hard to read)

OK, I think that makes sense. I suppose "return 0" rather than "return
current_transfer->final_byte_count" is correct since this function
returns the byte count modulo the transfer size, so a complete
transfer's byte count is always 0? But what if another transfer was
already queued into the HW; shouldn't this function return the byte
count of the new current transfer? Perhaps that can't happen since the
channel is paused so a new transfer can't start, so it can also have
only transferred 0 bytes. A comment that mentions these points would be
useful.

>>> @@ -812,9 +851,22 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>>>  	list_for_each_entry(sg_req, &tdc->pending_sg_req, node) {
>>>  		dma_desc = sg_req->dma_desc;
>>>  		if (dma_desc->txd.cookie == cookie) {
>>> +			hw_byte_count = tegra_dma_wcount_in_bytes(dc);
>>> +
>>> +			if (!list_empty(&tdc->pending_sg_req))
>>
>> Since this code is inside a loop that iterates over that list, I don't
>> think the list can ever be empty.
>
> tegra_dma_wcount_in_bytes may modify the pending_sg_req since it can invoke the ISR.  So the list may become empty.  Explaining that just now made me cringe, shall I rewrite it so we can't modify the list we're iterating over?  Granted, once this code is invoked, we're done iterating.

If it's possible to avoid the list being modified while iterating over
it, then yes, we should.

If it's not, shouldn't the code iterate with list_for_each_entry_safe()
rather than list_for_each_entry()?

>>> +				first_entry =
>>> +					list_first_entry(&tdc->pending_sg_req,
>>> +						typeof(*first_entry), node);
>>> +
>>>  			residual =  dma_desc->bytes_requested -
>>>  					(dma_desc->bytes_transferred %
>>>  						dma_desc->bytes_requested);
>>> +
>>> +			/* hw byte count only applies to current transaction */
>>> +			if (first_entry &&
>>> +				first_entry->dma_desc->txd.cookie == cookie)
>>> +				residual -= hw_byte_count;
>>> +
>>>  			dma_set_residue(txstate, residual);
>>
>> Why not re-order the added code so that all the new code is added in one
>> place, and the hw_byte_count value is only calculated if it's used, i.e.:
>>
>> residual = ...;
>> first_entry = ...;
>> if (sg_reg == first_entry) {
>>     hw_byte_count = ...;
>>     residual -= hw_byte_count;
>> }
>>
> My comment above may shed some light on the ordering reason.  The "first entry" may change when we handle the ISR.

I still suspect the code can be re-ordered, it's just that the if
condition needs to say "if sg_req == current active transfer" in a way
that doesn't rely on "list_first_entry()". Doesn't each req have a
status to say that it's the currently active transfer, and that gets set
when queued and updated when the ISR handles completion?
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vinod Koul May 21, 2014, 6 a.m. UTC | #4
On Wed, May 07, 2014 at 03:37:45PM -0700, Christopher Freeman wrote:
> On Wed, May 07, 2014 at 09:37:25AM -0700, Stephen Warren wrote:
> > On 05/06/2014 03:22 PM, Christopher Freeman wrote:
> > > Get word-level granularity from hardware for calculating
> > > the transfer count remaining.
> > 
> > > diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> > 
> > > +static int tegra_dma_wcount_in_bytes(struct dma_chan *dc)
> > 
> > A lot of the code in this function is identical to the code in
> > tegra_dma_terminate_all() which does the same thing. Can this be pulled
> > out into a shared utility function?
> > 
> I'll look at making utility functions for ISR handling and the calculations for the byte counts.
> 
> > > +	tegra_dma_pause(tdc, true);
> > 
> > Is this continual pausing/resuming of the DMA operation going to
> > negatively affect performance?
> > 
> I tried testing the performance impact and each call took about 20 uS.  And of
> course, the client would have to be calling this constantly.
But why do we need to pause, cant we read the status form HW and report..?
diff mbox

Patch

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index 03ad64e..cc6b2fd 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -779,15 +779,54 @@  skip_dma_stop:
 	spin_unlock_irqrestore(&tdc->lock, flags);
 }
 
+static int tegra_dma_wcount_in_bytes(struct dma_chan *dc)
+{
+	struct tegra_dma_channel *tdc = to_tegra_dma_chan(dc);
+	struct tegra_dma_sg_req *sgreq;
+	unsigned long wcount = 0;
+	unsigned long status = 0;
+	int bytes = 0;
+
+	if (list_empty(&tdc->pending_sg_req) || !tdc->busy)
+		return 0;
+
+	tegra_dma_pause(tdc, true);
+
+	/* in case of interrupt, handle it and don't read wcount reg */
+	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
+	if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
+		tdc_write(tdc, TEGRA_APBDMA_CHAN_STATUS, status);
+		dev_info(tdc2dev(tdc), "%s():handling isr\n", __func__);
+		tdc->isr_handler(tdc, false);
+		tegra_dma_resume(tdc);
+		return 0;
+	}
+
+	if (tdc->tdma->chip_data->support_separate_wcount_reg)
+		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
+	else
+		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
+
+	sgreq = list_first_entry(&tdc->pending_sg_req,
+				typeof(*sgreq), node);
+	bytes = get_current_xferred_count(tdc, sgreq, wcount);
+
+	tegra_dma_resume(tdc);
+
+	return bytes;
+}
+
 static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
 	dma_cookie_t cookie, struct dma_tx_state *txstate)
 {
 	struct tegra_dma_channel *tdc = to_tegra_dma_chan(dc);
 	struct tegra_dma_desc *dma_desc;
 	struct tegra_dma_sg_req *sg_req;
+	struct tegra_dma_sg_req *first_entry = NULL;
 	enum dma_status ret;
 	unsigned long flags;
 	unsigned int residual;
+	unsigned int hw_byte_count = 0;
 
 	ret = dma_cookie_status(dc, cookie, txstate);
 	if (ret == DMA_COMPLETE)
@@ -812,9 +851,22 @@  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
 	list_for_each_entry(sg_req, &tdc->pending_sg_req, node) {
 		dma_desc = sg_req->dma_desc;
 		if (dma_desc->txd.cookie == cookie) {
+			hw_byte_count = tegra_dma_wcount_in_bytes(dc);
+
+			if (!list_empty(&tdc->pending_sg_req))
+				first_entry =
+					list_first_entry(&tdc->pending_sg_req,
+						typeof(*first_entry), node);
+
 			residual =  dma_desc->bytes_requested -
 					(dma_desc->bytes_transferred %
 						dma_desc->bytes_requested);
+
+			/* hw byte count only applies to current transaction */
+			if (first_entry &&
+				first_entry->dma_desc->txd.cookie == cookie)
+				residual -= hw_byte_count;
+
 			dma_set_residue(txstate, residual);
 			ret = dma_desc->dma_status;
 			spin_unlock_irqrestore(&tdc->lock, flags);