Message ID | 1418242572-20998-1-git-send-email-ashwin.chaugule@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Ashwin, On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote: > If a wait_for_completion_timeout() call returns due to a timeout, > the mbox code can still call complete() after returning from the wait. > This can cause subsequent transmissions on a channel to fail, since > the wait_for_completion_timeout() sees the completion variable > is !=0, caused by the erroneous complete() call, and immediately > returns without waiting for the time as expected by the client. > > Fix this by calling complete() only if the TX was successful. > > Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> > --- > drivers/mailbox/mailbox.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > index 17e9e4a..4acaddb 100644 > --- a/drivers/mailbox/mailbox.c > +++ b/drivers/mailbox/mailbox.c > @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) > if (mssg && chan->cl->tx_done) > chan->cl->tx_done(chan->cl, mssg, r); > > - if (chan->cl->tx_block) > + if ((!r) && chan->cl->tx_block) > complete(&chan->tx_complete); Just curious to check if there's another possible race which is a different issue. Suppose the timer fired and indicated that the Tx is complete, then it tries to execute complete while the wait_for_completion_timeout timed out. Does that make sense ? So if yes, how about adding !completion_done(..) to the check while you are at this ? Regards, Sudeep
On 11 December 2014 at 01:46, Ashwin Chaugule <ashwin.chaugule@linaro.org> wrote: > If a wait_for_completion_timeout() call returns due to a timeout, > the mbox code can still call complete() after returning from the wait. > This can cause subsequent transmissions on a channel to fail, since > the wait_for_completion_timeout() sees the completion variable > is !=0, caused by the erroneous complete() call, and immediately > returns without waiting for the time as expected by the client. > > Fix this by calling complete() only if the TX was successful. > > Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> > --- > drivers/mailbox/mailbox.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > index 17e9e4a..4acaddb 100644 > --- a/drivers/mailbox/mailbox.c > +++ b/drivers/mailbox/mailbox.c > @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) > if (mssg && chan->cl->tx_done) > chan->cl->tx_done(chan->cl, mssg, r); > > - if (chan->cl->tx_block) > + if ((!r) && chan->cl->tx_block) > complete(&chan->tx_complete); > Thanks for finding the bug. However the fix is flawed. complete() could also be done from mbox_chan_txdone() calling tx_tick(). And if the controller returned error, we could never pass on that error code to the user (timeout fires and then we will move on with -EIO). Since we could never prevent the controller from returning -EIO as the error, I think we have to explicitly tell tx_tick() if it needs to complete() or not. -Jassi
On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote: > Hi Ashwin, Hi, > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote: >> >> If a wait_for_completion_timeout() call returns due to a timeout, >> the mbox code can still call complete() after returning from the wait. >> This can cause subsequent transmissions on a channel to fail, since >> the wait_for_completion_timeout() sees the completion variable >> is !=0, caused by the erroneous complete() call, and immediately >> returns without waiting for the time as expected by the client. >> >> Fix this by calling complete() only if the TX was successful. >> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> >> --- >> drivers/mailbox/mailbox.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c >> index 17e9e4a..4acaddb 100644 >> --- a/drivers/mailbox/mailbox.c >> +++ b/drivers/mailbox/mailbox.c >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) >> if (mssg && chan->cl->tx_done) >> chan->cl->tx_done(chan->cl, mssg, r); >> >> - if (chan->cl->tx_block) >> + if ((!r) && chan->cl->tx_block) >> complete(&chan->tx_complete); > > > Just curious to check if there's another possible race which is > a different issue. > > Suppose the timer fired and indicated that the Tx is complete, then > it tries to execute complete while the wait_for_completion_timeout timed > out. Does that make sense ? > > So if yes, how about adding !completion_done(..) to the check while you > are at this ? Yea. Seems like another race condition. I'll add it along with this.. Thanks, Ashwin
On 12 December 2014 at 05:21, Jassi Brar <jaswinder.singh@linaro.org> wrote: > On 11 December 2014 at 01:46, Ashwin Chaugule > <ashwin.chaugule@linaro.org> wrote: >> If a wait_for_completion_timeout() call returns due to a timeout, >> the mbox code can still call complete() after returning from the wait. >> This can cause subsequent transmissions on a channel to fail, since >> the wait_for_completion_timeout() sees the completion variable >> is !=0, caused by the erroneous complete() call, and immediately >> returns without waiting for the time as expected by the client. >> >> Fix this by calling complete() only if the TX was successful. >> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> >> --- >> drivers/mailbox/mailbox.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c >> index 17e9e4a..4acaddb 100644 >> --- a/drivers/mailbox/mailbox.c >> +++ b/drivers/mailbox/mailbox.c >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) >> if (mssg && chan->cl->tx_done) >> chan->cl->tx_done(chan->cl, mssg, r); >> >> - if (chan->cl->tx_block) >> + if ((!r) && chan->cl->tx_block) >> complete(&chan->tx_complete); >> > Thanks for finding the bug. > However the fix is flawed. complete() could also be done from > mbox_chan_txdone() calling tx_tick(). And if the controller returned > error, we could never pass on that error code to the user (timeout > fires and then we will move on with -EIO). If the controller returned error, wouldnt it be passed to the user via chan->cl->tx_done()? However, I agree that with this fix, it won't call complete(), when the controller finished the Tx with an Err. > Since we could never prevent the controller from returning -EIO as the > error, I think we have to explicitly tell tx_tick() if it needs to > complete() or not. Hm. How about checking if the wait_for_completion_timeout() returned the timeout value? If so, then return with -EIO and don't complete(). > > -Jassi
On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote: > On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote: > > Hi Ashwin, > > Hi, > Hi Ashwin, > > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote: > >> > >> If a wait_for_completion_timeout() call returns due to a timeout, > >> the mbox code can still call complete() after returning from the wait. > >> This can cause subsequent transmissions on a channel to fail, since > >> the wait_for_completion_timeout() sees the completion variable > >> is !=0, caused by the erroneous complete() call, and immediately > >> returns without waiting for the time as expected by the client. > >> > >> Fix this by calling complete() only if the TX was successful. > >> > >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> > >> --- > >> drivers/mailbox/mailbox.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > >> index 17e9e4a..4acaddb 100644 > >> --- a/drivers/mailbox/mailbox.c > >> +++ b/drivers/mailbox/mailbox.c > >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) > >> if (mssg && chan->cl->tx_done) > >> chan->cl->tx_done(chan->cl, mssg, r); > >> > >> - if (chan->cl->tx_block) > >> + if ((!r) && chan->cl->tx_block) > >> complete(&chan->tx_complete); > > > > > > Just curious to check if there's another possible race which is > > a different issue. > > > > Suppose the timer fired and indicated that the Tx is complete, then > > it tries to execute complete while the wait_for_completion_timeout timed > > out. Does that make sense ? > > > > So if yes, how about adding !completion_done(..) to the check while you > > are at this ? > > Yea. Seems like another race condition. I'll add it along with this.. > Thanks ! Regards, Sudeep
On 16 December 2014 at 06:36, Sudeep Holla <sudeep.holla@arm.com> wrote: > On Fri, Dec 12, 2014 at 05:47:26PM +0000, Ashwin Chaugule wrote: >> On 12 December 2014 at 03:43, Sudeep Holla <sudeep.holla@arm.com> wrote: >> > On Thursday 11 December 2014 01:46 AM, Ashwin Chaugule wrote: >> >> >> >> If a wait_for_completion_timeout() call returns due to a timeout, >> >> the mbox code can still call complete() after returning from the wait. >> >> This can cause subsequent transmissions on a channel to fail, since >> >> the wait_for_completion_timeout() sees the completion variable >> >> is !=0, caused by the erroneous complete() call, and immediately >> >> returns without waiting for the time as expected by the client. >> >> >> >> Fix this by calling complete() only if the TX was successful. >> >> >> >> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> >> >> --- >> >> drivers/mailbox/mailbox.c | 2 +- >> >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> >> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c >> >> index 17e9e4a..4acaddb 100644 >> >> --- a/drivers/mailbox/mailbox.c >> >> +++ b/drivers/mailbox/mailbox.c >> >> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) >> >> if (mssg && chan->cl->tx_done) >> >> chan->cl->tx_done(chan->cl, mssg, r); >> >> >> >> - if (chan->cl->tx_block) >> >> + if ((!r) && chan->cl->tx_block) >> >> complete(&chan->tx_complete); >> > >> > >> > Just curious to check if there's another possible race which is >> > a different issue. >> > >> > Suppose the timer fired and indicated that the Tx is complete, then >> > it tries to execute complete while the wait_for_completion_timeout timed >> > out. Does that make sense ? >> > >> > So if yes, how about adding !completion_done(..) to the check while you >> > are at this ? >> >> Yea. Seems like another race condition. I'll add it along with this.. >> > > Thanks ! IIUC, it looks like adding the !completion_done() will not really fix this race. Once the lock inside wait_for_completion.. is released, completion_done() will return 0, and we'll call complete(), which is not what we want, since the "wait" is already over (after a timeout). I think the only right thing here is to increase the timeout in wait_for_completion_timeout(). Thoughts? Cheers, Ashwin
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 17e9e4a..4acaddb 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r) if (mssg && chan->cl->tx_done) chan->cl->tx_done(chan->cl, mssg, r); - if (chan->cl->tx_block) + if ((!r) && chan->cl->tx_block) complete(&chan->tx_complete); }
If a wait_for_completion_timeout() call returns due to a timeout, the mbox code can still call complete() after returning from the wait. This can cause subsequent transmissions on a channel to fail, since the wait_for_completion_timeout() sees the completion variable is !=0, caused by the erroneous complete() call, and immediately returns without waiting for the time as expected by the client. Fix this by calling complete() only if the TX was successful. Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> --- drivers/mailbox/mailbox.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)