Message ID | 1304952352-27837-1-git-send-email-r.sricharan@ti.com (mailing list archive) |
---|---|
State | New, archived |
Hi

On Mon, 9 May 2011, sricharan wrote:

> Paul Walmsley reported a kernel hang issue with beagle board during
> boot. This is an intermittent bug and the execution was found to be
> stuck at the l3 interrupt handler.
>
> This was due to a dss initiator agent timeout occurring during
> the boot even when there is no actual interconnect access made by the
> dss. Since the reason for the dss timeout is not root caused yet,
> the timeout feature is disabled at the interconnect level.
> Note that this is a temporary fix that should be removed once
> the dss interconnect agent timeout issue is resolved.

So it's been two months since this bug was reported. Any progress on
root-causing it?

I don't see how I can upstream this temporary patch with a straight face.

First, it tries to unconditionally reset the L3 DSS interconnect agent,
even if there's no problem on the L3 DSS IA that requires a reset. It
should only try to reset an IA if it's in a bad state.

Second, are you sure that reset sequence is correct? Writing a 1 and then
a 0 to that reset bit, without any barrier or delay in between? Could you
please confirm that this is a correct reset sequence with the L3 IA
designers and cc me on the E-mails, or send me an extract from the
relevant documentation?

Third, the patch disables L3 timeout reporting. This effectively reacts
to an error by pretending that the error did not exist. This isn't right.
If there's an L3 timeout, it needs to be reported, if at all possible. It
should never happen and it indicates something is wrong with the software
or the hardware.

- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
+ Tomi,

On 7/9/2011 3:25 PM, Paul Walmsley wrote:
> Hi
>
> On Mon, 9 May 2011, sricharan wrote:
>
>> Paul Walmsley reported a kernel hang issue with beagle board
>> during boot. This is an intermittent bug and the execution was
>> found to be stuck at the l3 interrupt handler.
>>
>> This was due to a dss initiator agent timeout occurring during the
>> boot even when there is no actual interconnect access made by the
>> dss. Since the reason for the dss timeout is not root caused yet,
>> the timeout feature is disabled at the interconnect level. Note
>> that this is a temporary fix that should be removed once the dss
>> interconnect agent timeout issue is resolved.
>
> So it's been two months since this bug was reported. Any progress
> on root-causing it?
>
> I don't see how I can upstream this temporary patch with a straight
> face.
>
Sorry for not closing the loop on this thread, but I thought Tomi had
root-caused the DSS timeout issue to an incorrect reset sequence of the
DSS IP. With that fixed, I thought we shouldn't see the issue.

> First, it tries to unconditionally reset the L3 DSS interconnect
> agent, even if there's no problem on the L3 DSS IA that requires a
> reset. It should only try to reset an IA if it's in a bad state.
>
This was to clear the error in case it had already occurred during the
boot-loader DSS reset sequence. But I agree with your comments.

> Second, are you sure that reset sequence is correct? Writing a 1 and
> then a 0 to that reset bit, without any barrier or delay in between?
> Could you please confirm that this is a correct reset sequence with
> the L3 IA designers and cc me on the E-mails, or send me an extract
> from the relevant documentation?
>
> Third, the patch disables L3 timeout reporting. This effectively
> reacts to an error by pretending that the error did not exist. This
> isn't right. If there's an L3 timeout, it needs to be reported, if at
> all possible. It should never happen and it indicates something is
> wrong with the software or the hardware.
>
Will come back to you on the above queries.

Regards
Santosh
Hi Santosh,

On Sat, 9 Jul 2011, Santosh Shilimkar wrote:

> Sorry for not closing the loop on this thread, but I thought Tomi had
> root-caused the DSS timeout issue to an incorrect reset sequence of the
> DSS IP. With that fixed, I thought we shouldn't see the issue.

OK great, happy to hear that it was tracked down!

Tomi, do you have patches to fix the reset bug?

> > First, it tries to unconditionally reset the L3 DSS interconnect
> > agent, even if there's no problem on the L3 DSS IA that requires a
> > reset. It should only try to reset an IA if it's in a bad state.
> >
> This was to clear the error in case it had already occurred during the
> boot-loader DSS reset sequence. But I agree with your comments.

That's a good idea, but the patch should only do that if the L3 DSS IA is
reporting a timeout error.

> > Second, are you sure that reset sequence is correct? Writing a 1 and
> > then a 0 to that reset bit, without any barrier or delay in between?
> > Could you please confirm that this is a correct reset sequence with
> > the L3 IA designers and cc me on the E-mails, or send me an extract
> > from the relevant documentation?
> >
> > Third, the patch disables L3 timeout reporting. This effectively
> > reacts to an error by pretending that the error did not exist. This
> > isn't right. If there's an L3 timeout, it needs to be reported, if at
> > all possible. It should never happen and it indicates something is
> > wrong with the software or the hardware.
> >
> Will come back to you on the above queries.

regards,

- Paul
On Sat, 2011-07-09 at 18:30 -0600, Paul Walmsley wrote:
> Hi Santosh,
>
> On Sat, 9 Jul 2011, Santosh Shilimkar wrote:
>
> > Sorry for not closing the loop on this thread, but I thought Tomi had
> > root-caused the DSS timeout issue to an incorrect reset sequence of the
> > DSS IP. With that fixed, I thought we shouldn't see the issue.
>
> OK great, happy to hear that it was tracked down!
>
> Tomi, do you have patches to fix the reset bug?

I have to say I'm not sure what this is about... I haven't seen any
hangs.

There were (or perhaps still are) problems with the hwmod code resetting
DSS. This was because the hwmod code didn't enable all the DSS clocks
before doing the reset. However, this shouldn't cause any problems in
the current mainline kernel, as the DSS driver there does a reset
itself. This will change when the DSS starts using pmruntime.

Tomi
Tomi,

On 8/1/2011 11:31 AM, Tomi Valkeinen wrote:
> On Sat, 2011-07-09 at 18:30 -0600, Paul Walmsley wrote:
>> Hi Santosh,
>>
>> On Sat, 9 Jul 2011, Santosh Shilimkar wrote:
>>
>>> Sorry for not closing the loop on this thread, but I thought Tomi had
>>> root-caused the DSS timeout issue to an incorrect reset sequence of the
>>> DSS IP. With that fixed, I thought we shouldn't see the issue.
>>
>> OK great, happy to hear that it was tracked down!
>>
>> Tomi, do you have patches to fix the reset bug?
>
> I have to say I'm not sure what this is about... I haven't seen any
> hangs.
>
> There were (or perhaps still are) problems with the hwmod code resetting
> DSS. This was because the hwmod code didn't enable all the DSS clocks
> before doing the reset. However, this shouldn't cause any problems in
> the current mainline kernel, as the DSS driver there does a reset
> itself. This will change when the DSS starts using pmruntime.
>
During your vacation, Archit and Sricharan looked at the issue further.
The issue is indeed related to DSS reset. Archit has posted the patch
for internal review. Please have a look at it.

Regards
Santosh
diff --git a/arch/arm/mach-omap2/omap_l3_smx.c b/arch/arm/mach-omap2/omap_l3_smx.c
index 4321e79..4ea7dcd 100644
--- a/arch/arm/mach-omap2/omap_l3_smx.c
+++ b/arch/arm/mach-omap2/omap_l3_smx.c
@@ -248,6 +248,17 @@ static int __init omap3_l3_probe(struct platform_device *pdev)
 		goto err2;
 	}
 
+	/*
+	 * FIX ME: dss interconnect timeout error.
+	 * Disable the l3 timeout reporting feature for all modules.
+	 * Also reset the dss initiator agent with which the error is seen
+	 * to clear the interrupt. This is a temporary fix and should be
+	 * removed after root causing the issue.
+	 */
+	omap3_l3_writell(l3->rt, L3_RT_NETWORK_CONTROL, 0x0);
+	omap3_l3_writell(l3->rt + L3_DSS_IA_CONTROL, L3_AGENT_CONTROL, 0x1);
+	omap3_l3_writell(l3->rt + L3_DSS_IA_CONTROL, L3_AGENT_CONTROL, 0x0);
+
 	l3->debug_irq = platform_get_irq(pdev, 0);
 	ret = request_irq(l3->debug_irq, omap3_l3_app_irq,
 		IRQF_DISABLED | IRQF_TRIGGER_RISING,
diff --git a/arch/arm/mach-omap2/omap_l3_smx.h b/arch/arm/mach-omap2/omap_l3_smx.h
index ba2ed9a..96fff9d 100644
--- a/arch/arm/mach-omap2/omap_l3_smx.h
+++ b/arch/arm/mach-omap2/omap_l3_smx.h
@@ -35,6 +35,8 @@
 #define L3_ERROR_LOG_SECONDARY		(1 << 30)
 #define L3_ERROR_LOG_ADDR		0x060
 
+#define L3_RT_NETWORK_CONTROL		0x078
+#define L3_DSS_IA_CONTROL		0x5400
 
 /* Register definitions for Sideband Interconnect */
 #define L3_SI_CONTROL			0x020
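
The review comments in this thread ask that an initiator agent be reset only when it is actually reporting a timeout, that the timeout be reported rather than masked, and that the assert/deassert pair not be issued back to back without confirming the sequence. The following is a minimal user-space sketch of that control flow only, not kernel code and not a verified OMAP3 reset sequence. Everything named `fake_*` or `FAKE_*`, including the status and reset bit positions, is hypothetical and stands in for registers whose real layout would have to come from the hardware documentation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register file for one initiator agent (illustration only). */
#define FAKE_AGENT_STATUS    0          /* index of a made-up status register  */
#define FAKE_AGENT_CONTROL   1          /* index of a made-up control register */
#define FAKE_STATUS_TIMEOUT  (1u << 0)  /* assumed "timeout seen" bit          */
#define FAKE_CONTROL_RESET   (1u << 0)  /* assumed soft-reset bit              */

struct fake_ia {
	volatile uint32_t regs[2];
	unsigned int reset_count;       /* how many resets the model has seen */
};

static uint32_t fake_readl(struct fake_ia *ia, unsigned int idx)
{
	return ia->regs[idx];
}

static void fake_writel(struct fake_ia *ia, unsigned int idx, uint32_t val)
{
	/* Model: a 0 -> 1 transition of the reset bit clears the timeout flag. */
	if (idx == FAKE_AGENT_CONTROL &&
	    (val & FAKE_CONTROL_RESET) &&
	    !(ia->regs[idx] & FAKE_CONTROL_RESET)) {
		ia->regs[FAKE_AGENT_STATUS] &= ~FAKE_STATUS_TIMEOUT;
		ia->reset_count++;
	}
	ia->regs[idx] = val;
}

/*
 * Reset the agent only if it reports a timeout, and log the event first.
 * Returns true if a reset was performed.
 */
static bool ia_reset_if_timed_out(struct fake_ia *ia)
{
	if (!(fake_readl(ia, FAKE_AGENT_STATUS) & FAKE_STATUS_TIMEOUT))
		return false;           /* healthy agent: leave it alone */

	fprintf(stderr, "IA timeout detected, resetting agent\n");

	fake_writel(ia, FAKE_AGENT_CONTROL, FAKE_CONTROL_RESET);
	/*
	 * Read back before deasserting. Whether real hardware needs a
	 * barrier, a delay, or a status poll here is exactly the open
	 * question raised in the review; this read is only a placeholder.
	 */
	(void)fake_readl(ia, FAKE_AGENT_CONTROL);
	fake_writel(ia, FAKE_AGENT_CONTROL, 0);

	return true;
}
```

Calling `ia_reset_if_timed_out()` on an agent whose status register is clear leaves it untouched, while setting the hypothetical timeout bit first causes one logged reset. Timeout reporting itself stays enabled throughout, which is the behaviour the third review comment asks for.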
Paul Walmsley reported a kernel hang issue with beagle board during
boot. This is an intermittent bug and the execution was found to be
stuck at the l3 interrupt handler.

This was due to a dss initiator agent timeout occurring during
the boot even when there is no actual interconnect access made by the
dss. Since the reason for the dss timeout is not root caused yet,
the timeout feature is disabled at the interconnect level.
Note that this is a temporary fix that should be removed once
the dss interconnect agent timeout issue is resolved.

Thanks to Paul Walmsley for reporting and helping in reproducing
this issue.

Signed-off-by: sricharan <r.sricharan@ti.com>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/mach-omap2/omap_l3_smx.c |   11 +++++++++++
 arch/arm/mach-omap2/omap_l3_smx.h |    2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)