diff mbox

Mailbox: Complete wait event only if Tx was successful

Message ID 1418242572-20998-1-git-send-email-ashwin.chaugule@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Ashwin Chaugule Dec. 10, 2014, 8:16 p.m. UTC
If a wait_for_completion_timeout() call returns due to a timeout,
the mbox code can still call complete() after returning from the wait.
This can cause subsequent transmissions on a channel to fail, since
the wait_for_completion_timeout() sees the completion variable
is !=0, caused by the erroneous complete() call, and immediately
returns without waiting for the time as expected by the client.

Fix this by calling complete() only if the TX was successful.

Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/mailbox/mailbox.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Sudeep Holla Dec. 12, 2014, 8:43 a.m. UTC | #1
Hi Ashwin,

On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
> If a wait_for_completion_timeout() call returns due to a timeout,
> the mbox code can still call complete() after returning from the wait.
> This can cause subsequent transmissions on a channel to fail, since
> the wait_for_completion_timeout() sees the completion variable
> is !=0, caused by the erroneous complete() call, and immediately
> returns without waiting for the time as expected by the client.
>
> Fix this by calling complete() only if the TX was successful.
>
> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> ---
>   drivers/mailbox/mailbox.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 17e9e4a..4acaddb 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>   	if (mssg && chan->cl->tx_done)
>   		chan->cl->tx_done(chan->cl, mssg, r);
>
> -	if (chan->cl->tx_block)
> +	if ((!r) && chan->cl->tx_block)
>   		complete(&chan->tx_complete);

Just curious to check if there's another possible race which is
a different issue.

Suppose the timer fired and indicated that the Tx is complete, then
it tries to execute complete while the wait_for_completion_timeout timed
out. Does that make sense ?

So if yes, how about adding !completion_done(..) to the check while you
are at this ?

Regards,
Sudeep
Jassi Brar Dec. 12, 2014, 10:21 a.m. UTC | #2
On 11 December 2014 at 01:46, Ashwin Chaugule
<ashwin.chaugule@linaro.org> wrote:
> If a wait_for_completion_timeout() call returns due to a timeout,
> the mbox code can still call complete() after returning from the wait.
> This can cause subsequent transmissions on a channel to fail, since
> the wait_for_completion_timeout() sees the completion variable
> is !=0, caused by the erroneous complete() call, and immediately
> returns without waiting for the time as expected by the client.
>
> Fix this by calling complete() only if the TX was successful.
>
> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> ---
>  drivers/mailbox/mailbox.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 17e9e4a..4acaddb 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>         if (mssg && chan->cl->tx_done)
>                 chan->cl->tx_done(chan->cl, mssg, r);
>
> -       if (chan->cl->tx_block)
> +       if ((!r) && chan->cl->tx_block)
>                 complete(&chan->tx_complete);
>
Thanks for finding the bug.
However the fix is flawed. complete() could also be done from
mbox_chan_txdone() calling tx_tick(). And if the controller returned
error, we could never pass on that error code to the user (timeout
fires and then we will move on with -EIO).
Since we could never prevent the controller from returning -EIO as the
error, I think we have to explicitly tell tx_tick() if it needs to
complete() or not.

-Jassi
Ashwin Chaugule Dec. 12, 2014, 5:47 p.m. UTC | #3
On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote:
> Hi Ashwin,

Hi,

> On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
>>
>> If a wait_for_completion_timeout() call returns due to a timeout,
>> the mbox code can still call complete() after returning from the wait.
>> This can cause subsequent transmissions on a channel to fail, since
>> the wait_for_completion_timeout() sees the completion variable
>> is !=0, caused by the erroneous complete() call, and immediately
>> returns without waiting for the time as expected by the client.
>>
>> Fix this by calling complete() only if the TX was successful.
>>
>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>> ---
>>   drivers/mailbox/mailbox.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 17e9e4a..4acaddb 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>>         if (mssg && chan->cl->tx_done)
>>                 chan->cl->tx_done(chan->cl, mssg, r);
>>
>> -       if (chan->cl->tx_block)
>> +       if ((!r) && chan->cl->tx_block)
>>                 complete(&chan->tx_complete);
>
>
> Just curious to check if there's another possible race which is
> a different issue.
>
> Suppose the timer fired and indicated that the Tx is complete, then
> it tries to execute complete while the wait_for_completion_timeout timed
> out. Does that make sense ?
>
> So if yes, how about adding !completion_done(..) to the check while you
> are at this ?

Yea. Seems like another race condition. I'll add it along with this..

Thanks,
Ashwin
Ashwin Chaugule Dec. 12, 2014, 5:58 p.m. UTC | #4
On 12 December 2014 at 05:21, Jassi Brar <jaswinder.singh@linaro.org> wrote:
> On 11 December 2014 at 01:46, Ashwin Chaugule
> <ashwin.chaugule@linaro.org> wrote:
>> If a wait_for_completion_timeout() call returns due to a timeout,
>> the mbox code can still call complete() after returning from the wait.
>> This can cause subsequent transmissions on a channel to fail, since
>> the wait_for_completion_timeout() sees the completion variable
>> is !=0, caused by the erroneous complete() call, and immediately
>> returns without waiting for the time as expected by the client.
>>
>> Fix this by calling complete() only if the TX was successful.
>>
>> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>> ---
>>  drivers/mailbox/mailbox.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 17e9e4a..4acaddb 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>>         if (mssg && chan->cl->tx_done)
>>                 chan->cl->tx_done(chan->cl, mssg, r);
>>
>> -       if (chan->cl->tx_block)
>> +       if ((!r) && chan->cl->tx_block)
>>                 complete(&chan->tx_complete);
>>
> Thanks for finding the bug.
> However the fix is flawed. complete() could also be done from
> mbox_chan_txdone() calling tx_tick(). And if the controller returned
> error, we could never pass on that error code to the user (timeout
> fires and then we will move on with -EIO).

If the controller returned error, wouldnt it be passed to the user via
chan->cl->tx_done()?
However, I agree that with this fix, it won't call complete(), when
the controller finished the Tx with an Err.

> Since we could never prevent the controller from returning -EIO as the
> error, I think we have to explicitly tell tx_tick() if it needs to
> complete() or not.

Hm. How about checking if the wait_for_completion_timeout() returned
the timeout value? If so, then return with -EIO and don't complete().

>
> -Jassi
Sudeep Holla Dec. 16, 2014, 11:36 a.m. UTC | #5
On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote:
> On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > Hi Ashwin,
> 
> Hi,
>

Hi Ashwin,

> > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
> >>
> >> If a wait_for_completion_timeout() call returns due to a timeout,
> >> the mbox code can still call complete() after returning from the wait.
> >> This can cause subsequent transmissions on a channel to fail, since
> >> the wait_for_completion_timeout() sees the completion variable
> >> is !=0, caused by the erroneous complete() call, and immediately
> >> returns without waiting for the time as expected by the client.
> >>
> >> Fix this by calling complete() only if the TX was successful.
> >>
> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
> >> ---
> >>   drivers/mailbox/mailbox.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> index 17e9e4a..4acaddb 100644
> >> --- a/drivers/mailbox/mailbox.c
> >> +++ b/drivers/mailbox/mailbox.c
> >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> >>         if (mssg && chan->cl->tx_done)
> >>                 chan->cl->tx_done(chan->cl, mssg, r);
> >>
> >> -       if (chan->cl->tx_block)
> >> +       if ((!r) && chan->cl->tx_block)
> >>                 complete(&chan->tx_complete);
> >
> >
> > Just curious to check if there's another possible race which is
> > a different issue.
> >
> > Suppose the timer fired and indicated that the Tx is complete, then
> > it tries to execute complete while the wait_for_completion_timeout timed
> > out. Does that make sense ?
> >
> > So if yes, how about adding !completion_done(..) to the check while you
> > are at this ?
> 
> Yea. Seems like another race condition. I'll add it along with this..
> 

Thanks !

Regards,
Sudeep
Ashwin Chaugule Dec. 16, 2014, 1 p.m. UTC | #6
On 16 December 2014 at 06:36, Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote:
>> On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote:
>> >>
>> >> If a wait_for_completion_timeout() call returns due to a timeout,
>> >> the mbox code can still call complete() after returning from the wait.
>> >> This can cause subsequent transmissions on a channel to fail, since
>> >> the wait_for_completion_timeout() sees the completion variable
>> >> is !=0, caused by the erroneous complete() call, and immediately
>> >> returns without waiting for the time as expected by the client.
>> >>
>> >> Fix this by calling complete() only if the TX was successful.
>> >>
>> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
>> >> ---
>> >>   drivers/mailbox/mailbox.c | 2 +-
>> >>   1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> >> index 17e9e4a..4acaddb 100644
>> >> --- a/drivers/mailbox/mailbox.c
>> >> +++ b/drivers/mailbox/mailbox.c
>> >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>> >>         if (mssg && chan->cl->tx_done)
>> >>                 chan->cl->tx_done(chan->cl, mssg, r);
>> >>
>> >> -       if (chan->cl->tx_block)
>> >> +       if ((!r) && chan->cl->tx_block)
>> >>                 complete(&chan->tx_complete);
>> >
>> >
>> > Just curious to check if there's another possible race which is
>> > a different issue.
>> >
>> > Suppose the timer fired and indicated that the Tx is complete, then
>> > it tries to execute complete while the wait_for_completion_timeout timed
>> > out. Does that make sense ?
>> >
>> > So if yes, how about adding !completion_done(..) to the check while you
>> > are at this ?
>>
>> Yea. Seems like another race condition. I'll add it along with this..
>>
>
> Thanks !

IIUC, it looks like adding the !completion_done() will not really fix
this race. Once the lock inside wait_for_completion.. is released,
completion_done() will return 0, and we'll call complete(), which is
not what we want, since the "wait" is already over (after a timeout).
I think the only right thing here is to increase the timeout in
wait_for_completion_timeout(). Thoughts?

Cheers,
Ashwin
diff mbox

Patch

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 17e9e4a..4acaddb 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -101,7 +101,7 @@  static void tx_tick(struct mbox_chan *chan, int r)
 	if (mssg && chan->cl->tx_done)
 		chan->cl->tx_done(chan->cl, mssg, r);
 
-	if (chan->cl->tx_block)
+	if ((!r) && chan->cl->tx_block)
 		complete(&chan->tx_complete);
 }