diff mbox series

mailbox: avoid timer start from callback

Message ID 20201016173020.12686-1-jassisinghbrar@gmail.com (mailing list archive)
State New, archived

Commit Message

Jassi Brar Oct. 16, 2020, 5:30 p.m. UTC
From: Jassi Brar <jaswinder.singh@linaro.org>

If the txdone is done by polling, it is possible for msg_submit() to start
the timer while the txdone_hrtimer() callback is running. If the timer needs
rescheduling, it could already be enqueued by the time hrtimer_forward_now()
is called, leading hrtimer to loudly complain.

WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
Hardware name: Libre Computer AML-S805X-AC (DT)
Workqueue: events_freezable_power_ thermal_zone_device_check
pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
pc : hrtimer_forward+0xc4/0x110
lr : txdone_hrtimer+0xf8/0x118
[...]

This can be fixed by not starting the timer from the callback path, which
requires reloading the timer as long as any message is queued on the
channel, and not just when the current tx is not done yet.

Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
---
 drivers/mailbox/mailbox.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
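
For readers following the race described above, here is a minimal userspace
sketch of the interleaving. It is not kernel code. The names model_timer,
model_start(), model_forward(), submit_old(), submit_new() and replay_race()
are hypothetical stand-ins, and the model only assumes two things about
hrtimer: that forwarding an already enqueued timer triggers the warning, and
that a timer counts as active while it is enqueued or while its callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hrtimer: only the state that matters here. */
struct model_timer {
	bool enqueued;     /* timer is sitting in the timer queue */
	bool in_callback;  /* its callback is currently running */
	int warnings;      /* times "forward" was called on an enqueued timer */
};

/* Models hrtimer_active(): enqueued or callback running. */
static bool model_active(const struct model_timer *t)
{
	return t->enqueued || t->in_callback;
}

/* Models hrtimer_start(): puts the timer on the queue. */
static void model_start(struct model_timer *t)
{
	t->enqueued = true;
}

/* Models hrtimer_forward_now(): complains if the timer is already enqueued. */
static void model_forward(struct model_timer *t)
{
	if (t->enqueued)
		t->warnings++;
}

/* Timer expiry: the timer leaves the queue and the callback begins. */
static void model_callback_enter(struct model_timer *t)
{
	t->enqueued = false;
	t->in_callback = true;
}

/* Callback end: on resched, forward the timer and let it be requeued. */
static void model_callback_exit(struct model_timer *t, bool resched)
{
	if (resched) {
		model_forward(t);
		t->enqueued = true;
	}
	t->in_callback = false;
}

/* Tail of msg_submit() before the patch: always start the timer. */
static void submit_old(struct model_timer *t)
{
	model_start(t);
}

/* Tail of msg_submit() after the patch: start only if not already active. */
static void submit_new(struct model_timer *t)
{
	if (!model_active(t))
		model_start(t);
}

/* Replays: callback running, a submit arrives, callback then reschedules. */
static int replay_race(void (*submit)(struct model_timer *))
{
	struct model_timer t = { false, false, 0 };

	assert(submit != NULL);
	model_start(&t);
	model_callback_enter(&t);
	submit(&t);                     /* the racing msg_submit() */
	model_callback_exit(&t, true);  /* callback wants to re-arm */
	return t.warnings;
}
```

Calling replay_race(submit_old) counts one warning in this model, while
replay_race(submit_new) counts none, because the guarded submit sees the
running callback and leaves the timer alone.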

Comments

Sudeep Holla Oct. 16, 2020, 5:50 p.m. UTC | #1
On Fri, Oct 16, 2020 at 12:30:20PM -0500, jassisinghbrar@gmail.com wrote:
> From: Jassi Brar <jaswinder.singh@linaro.org>
> 
> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while the txdone_hrtimer() callback is running. If the timer needs
> rescheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
> 
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
> 
> This can be fixed by not starting the timer from the callback path, which
> requires reloading the timer as long as any message is queued on the
> channel, and not just when the current tx is not done yet.
>

I came to a similar conclusion and was testing something similar. You beat
me to it. Since we have a single timer and multiple channels, each time a
message is enqueued on any channel the timer gets added, which is wrong.

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>

I tested this patch too by reverting the offending commit in -next, so

Tested-by: Sudeep Holla <sudeep.holla@arm.com>

You seem to have dropped the Fixes tags. Is that intentional? If so, any
particular reason? I think it is stable material, and it is better to have
a Fixes tag so that it gets added to the stable trees.

--
Regards,
Sudeep
Jassi Brar Oct. 16, 2020, 6:24 p.m. UTC | #2
On Fri, Oct 16, 2020 at 12:50 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Fri, Oct 16, 2020 at 12:30:20PM -0500, jassisinghbrar@gmail.com wrote:
> > From: Jassi Brar <jaswinder.singh@linaro.org>
> >
> > If the txdone is done by polling, it is possible for msg_submit() to start
> > the timer while the txdone_hrtimer() callback is running. If the timer needs
> > rescheduling, it could already be enqueued by the time hrtimer_forward_now()
> > is called, leading hrtimer to loudly complain.
> >
> > WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> > CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> > Hardware name: Libre Computer AML-S805X-AC (DT)
> > Workqueue: events_freezable_power_ thermal_zone_device_check
> > pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> > pc : hrtimer_forward+0xc4/0x110
> > lr : txdone_hrtimer+0xf8/0x118
> > [...]
> >
> > This can be fixed by not starting the timer from the callback path, which
> > requires reloading the timer as long as any message is queued on the
> > channel, and not just when the current tx is not done yet.
> >
>
> I came to a similar conclusion and was testing something similar. You beat
> me to it. Since we have a single timer and multiple channels, each time a
> message is enqueued on any channel the timer gets added, which is wrong.
>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
>
> I tested this patch too by reverting the offending commit in -next, so
>
> Tested-by: Sudeep Holla <sudeep.holla@arm.com>
>
> You seem to have dropped the Fixes tags. Is that intentional? If so, any
> particular reason? I think it is stable material, and it is better to have
> a Fixes tag so that it gets added to the stable trees.
>
Thanks for testing. I will decorate it appropriately once I have
Jerome's tested-by too.

-jassi
Jerome Brunet Oct. 16, 2020, 6:38 p.m. UTC | #3
On Fri 16 Oct 2020 at 19:30, jassisinghbrar@gmail.com wrote:

> From: Jassi Brar <jaswinder.singh@linaro.org>
>
> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while the txdone_hrtimer() callback is running. If the timer needs
> rescheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
>
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
>
> This can be fixed by not starting the timer from the callback path, which
> requires reloading the timer as long as any message is queued on the
> channel, and not just when the current tx is not done yet.
>
> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
> ---
>  drivers/mailbox/mailbox.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 0b821a5b2db8..a093a6ecaa66 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -82,9 +82,12 @@ static void msg_submit(struct mbox_chan *chan)
>  exit:
>  	spin_unlock_irqrestore(&chan->lock, flags);
>  
> -	if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> -		/* kick start the timer immediately to avoid delays */
> -		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> +	/* kick start the timer immediately to avoid delays */
> +	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> +		/* but only if not already active */

It would solve the problem I reported as well, but instead of running the
check immediately (timer with value 0), we will have to wait for the
next expiry of the timer if it is already started. IOW, there might be a
delay now. I don't know if this is important for the mailbox; the existing
comments in the code suggest it is.

> +		if (!hrtimer_active(&chan->mbox->poll_hrt))
> +			hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> +	}
>  }
>  
>  static void tx_tick(struct mbox_chan *chan, int r)
> @@ -122,11 +125,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>  		struct mbox_chan *chan = &mbox->chans[i];
>  
>  		if (chan->active_req && chan->cl) {
> +			resched = true;
>  			txdone = chan->mbox->ops->last_tx_done(chan);
>  			if (txdone)
>  				tx_tick(chan, 0);
> -			else
> -				resched = true;
>  		}
>  	}
Jassi Brar Oct. 16, 2020, 6:45 p.m. UTC | #4
On Fri, Oct 16, 2020 at 1:38 PM Jerome Brunet <jbrunet@baylibre.com> wrote:
>
>
> On Fri 16 Oct 2020 at 19:30, jassisinghbrar@gmail.com wrote:
>
> > From: Jassi Brar <jaswinder.singh@linaro.org>
> >
> > If the txdone is done by polling, it is possible for msg_submit() to start
> > the timer while the txdone_hrtimer() callback is running. If the timer needs
> > rescheduling, it could already be enqueued by the time hrtimer_forward_now()
> > is called, leading hrtimer to loudly complain.
> >
> > WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> > CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> > Hardware name: Libre Computer AML-S805X-AC (DT)
> > Workqueue: events_freezable_power_ thermal_zone_device_check
> > pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> > pc : hrtimer_forward+0xc4/0x110
> > lr : txdone_hrtimer+0xf8/0x118
> > [...]
> >
> > This can be fixed by not starting the timer from the callback path, which
> > requires reloading the timer as long as any message is queued on the
> > channel, and not just when the current tx is not done yet.
> >
> > Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
> > ---
> >  drivers/mailbox/mailbox.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> > index 0b821a5b2db8..a093a6ecaa66 100644
> > --- a/drivers/mailbox/mailbox.c
> > +++ b/drivers/mailbox/mailbox.c
> > @@ -82,9 +82,12 @@ static void msg_submit(struct mbox_chan *chan)
> >  exit:
> >       spin_unlock_irqrestore(&chan->lock, flags);
> >
> > -     if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> > -             /* kick start the timer immediately to avoid delays */
> > -             hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> > +     /* kick start the timer immediately to avoid delays */
> > +     if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> > +             /* but only if not already active */
>
> It would solve the problem I reported as well, but instead of running the
> check immediately (timer with value 0), we will have to wait for the
> next expiry of the timer if it is already started. IOW, there might be a
> delay now. I don't know if this is important for the mailbox; the existing
> comments in the code suggest it is.
>
That comment is for when the first message is queued on the channel,
which remains unaffected.
So, do I have your Tested-by/Acked-by?

thnx,
Jerome Brunet Oct. 16, 2020, 7:32 p.m. UTC | #5
On Fri 16 Oct 2020 at 20:45, Jassi Brar <jassisinghbrar@gmail.com> wrote:

> On Fri, Oct 16, 2020 at 1:38 PM Jerome Brunet <jbrunet@baylibre.com> wrote:
>>
>>
>> On Fri 16 Oct 2020 at 19:30, jassisinghbrar@gmail.com wrote:
>>
>> > From: Jassi Brar <jaswinder.singh@linaro.org>
>> >
>> > If the txdone is done by polling, it is possible for msg_submit() to start
>> > the timer while the txdone_hrtimer() callback is running. If the timer needs
>> > rescheduling, it could already be enqueued by the time hrtimer_forward_now()
>> > is called, leading hrtimer to loudly complain.
>> >
>> > WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
>> > CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
>> > Hardware name: Libre Computer AML-S805X-AC (DT)
>> > Workqueue: events_freezable_power_ thermal_zone_device_check
>> > pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
>> > pc : hrtimer_forward+0xc4/0x110
>> > lr : txdone_hrtimer+0xf8/0x118
>> > [...]
>> >
>> > This can be fixed by not starting the timer from the callback path, which
>> > requires reloading the timer as long as any message is queued on the
>> > channel, and not just when the current tx is not done yet.
>> >
>> > Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
>> > ---
>> >  drivers/mailbox/mailbox.c | 12 +++++++-----
>> >  1 file changed, 7 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> > index 0b821a5b2db8..a093a6ecaa66 100644
>> > --- a/drivers/mailbox/mailbox.c
>> > +++ b/drivers/mailbox/mailbox.c
>> > @@ -82,9 +82,12 @@ static void msg_submit(struct mbox_chan *chan)
>> >  exit:
>> >       spin_unlock_irqrestore(&chan->lock, flags);
>> >
>> > -     if (!err && (chan->txdone_method & TXDONE_BY_POLL))
>> > -             /* kick start the timer immediately to avoid delays */
>> > -             hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
>> > +     /* kick start the timer immediately to avoid delays */
>> > +     if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>> > +             /* but only if not already active */
>>
>> It would solve the problem I reported as well, but instead of running the
>> check immediately (timer with value 0), we will have to wait for the
>> next expiry of the timer if it is already started. IOW, there might be a
>> delay now. I don't know if this is important for the mailbox; the existing
>> comments in the code suggest it is.
>>
> That comment is for when the first message is queued on the channel,
> which remains unaffected.
> So, do I have your Tested-by/Acked-by?

Sure go ahead

Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>

>
> thnx,

Patch

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 0b821a5b2db8..a093a6ecaa66 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -82,9 +82,12 @@  static void msg_submit(struct mbox_chan *chan)
 exit:
 	spin_unlock_irqrestore(&chan->lock, flags);
 
-	if (!err && (chan->txdone_method & TXDONE_BY_POLL))
-		/* kick start the timer immediately to avoid delays */
-		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
+	/* kick start the timer immediately to avoid delays */
+	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
+		/* but only if not already active */
+		if (!hrtimer_active(&chan->mbox->poll_hrt))
+			hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
+	}
 }
 
 static void tx_tick(struct mbox_chan *chan, int r)
@@ -122,11 +125,10 @@  static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
 		struct mbox_chan *chan = &mbox->chans[i];
 
 		if (chan->active_req && chan->cl) {
+			resched = true;
 			txdone = chan->mbox->ops->last_tx_done(chan);
 			if (txdone)
 				tx_tick(chan, 0);
-			else
-				resched = true;
 		}
 	}
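
The second hunk may be easier to follow with a small userspace model of the
reschedule decision. This is a sketch only, not the kernel code: model_chan,
poll_pass() and any_in_flight() are hypothetical names, the tx_done field
stands in for what last_tx_done() would report, and the chan->cl check is left
out. It assumes msg_submit() no longer restarts the timer while the callback
is running, as in the first hunk, so the callback itself has to re-arm
whenever a channel had a request in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a mailbox channel. */
struct model_chan {
	bool active_req;  /* a message is in flight */
	bool tx_done;     /* what last_tx_done() would report for it */
	int queued;       /* messages waiting behind the in-flight one */
};

/*
 * One pass of the polling callback over all channels.
 * patched == false: re-arm only when a transfer is not done yet.
 * patched == true:  re-arm whenever a channel had a request in flight.
 * Returns true if the timer should be rescheduled.
 */
static bool poll_pass(struct model_chan *chans, size_t n, bool patched)
{
	bool resched = false;
	size_t i;

	assert(chans != NULL);
	for (i = 0; i < n; i++) {
		struct model_chan *c = &chans[i];

		if (!c->active_req)
			continue;
		if (patched)
			resched = true;
		if (c->tx_done) {
			/* Models tx_tick(): complete, then submit the next message. */
			c->active_req = false;
			if (c->queued > 0) {
				c->queued--;
				c->active_req = true;
				c->tx_done = false;
			}
		} else if (!patched) {
			resched = true;
		}
	}
	return resched;
}

/* True if any channel still has a message in flight. */
static bool any_in_flight(const struct model_chan *chans, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (chans[i].active_req)
			return true;
	return false;
}
```

With one channel whose transfer has completed and one more message queued, the
unpatched rule returns false while a new message is in flight, so polling
would stop in this model. The patched rule returns true. The cost is one extra
pass when nothing is left to send, after which the patched rule also returns
false.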