Message ID | 20210721204703.1424034-1-l.stach@pengutronix.de (mailing list archive) |
---|---|
Series | i.MX8MM GPC improvements and BLK_CTRL driver |
> Subject: [PATCH v2 00/18] i.MX8MM GPC improvements and BLK_CTRL driver
>
> Hi all,
>
> second revision of the GPC improvements and BLK_CTRL driver to make use
> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> blurb from the v1 cover letter here, but if you are not familiar with i.MX8MM
> power domains, it may be worth a read.
>
> This 2nd revision fixes the DT bindings to be valid yaml, some small failure
> path issues and most importantly the interaction with system
> suspend/resume. With the previous version some of the power domains
> would not come up correctly after a suspend/resume cycle.

Thanks for the work. I gave a test, boot and suspend/resume work with display.

Tested-by: Peng Fan <peng.fan@nxp.com>

>
> Updated testing git trees here, disclaimer still applies:
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>
> Regards,
> Lucas
>
> Frieder Schrempf (1):
>   arm64: dts: imx8mm: Add GPU nodes for 2D and 3D core
>
> Lucas Stach (15):
>   Revert "soc: imx: gpcv2: move reset assert after requesting domain
>     power up"
>   soc: imx: gpcv2: add lockdep annotation
>   soc: imx: gpcv2: add domain option to keep domain clocks enabled
>   soc: imx: gpcv2: keep i.MX8M* bus clocks enabled
>   soc: imx: gpcv2: support system suspend/resume
>   dt-bindings: soc: add binding for i.MX8MM VPU blk-ctrl
>   dt-bindings: power: imx8mm: add defines for VPU blk-ctrl domains
>   soc: imx: add i.MX8M blk-ctrl driver
>   dt-bindings: soc: add binding for i.MX8MM DISP blk-ctrl
>   dt-bindings: power: imx8mm: add defines for DISP blk-ctrl domains
>   soc: imx: imx8m-blk-ctrl: add DISP blk-ctrl
>   arm64: dts: imx8mm: add GPC node
>   arm64: dts: imx8mm: put USB controllers into power-domains
>   arm64: dts: imx8mm: add VPU blk-ctrl
>   arm64: dts: imx8mm: add DISP blk-ctrl
>
> Marek Vasut (2):
>   soc: imx: gpcv2: Turn domain->pgc into bitfield
>   soc: imx: gpcv2: Set both GPC_PGC_nCTRL(GPU_2D|GPU_3D) for MX8MM GPU
>     domain
>
>  .../soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml     |  94 ++++
>  .../soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml      |  76 +++
>  arch/arm64/boot/dts/freescale/imx8mm.dtsi     | 180 ++++++
>  drivers/soc/imx/Makefile                      |   1 +
>  drivers/soc/imx/gpcv2.c                       | 130 +++--
>  drivers/soc/imx/imx8m-blk-ctrl.c              | 525 ++++++++++++++++++
>  include/dt-bindings/power/imx8mm-power.h      |   9 +
>  7 files changed, 974 insertions(+), 41 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-disp-blk-ctrl.yaml
>  create mode 100644 Documentation/devicetree/bindings/soc/imx/fsl,imx8mm-vpu-blk-ctrl.yaml
>  create mode 100644 drivers/soc/imx/imx8m-blk-ctrl.c
>
> --
> 2.30.2
On 21.07.21 22:46, Lucas Stach wrote:
> Hi all,
>
> second revision of the GPC improvements and BLK_CTRL driver to make use
> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> blurb from the v1 cover letter here, but if you are not familiar with
> i.MX8MM power domains, it may be worth a read.
>
> This 2nd revision fixes the DT bindings to be valid yaml, some small
> failure path issues and most importantly the interaction with system
> suspend/resume. With the previous version some of the power domains
> would not come up correctly after a suspend/resume cycle.
>
> Updated testing git trees here, disclaimer still applies:
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing

I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!

I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
On 05.08.21 12:18, Frieder Schrempf wrote:
> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>
> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.

Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.

Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.

If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.

And I would appreciate if someone else could try to reproduce this problem on his/her side.
I use this simple script for testing:

#!/bin/sh

glmark2-es2-drm &

while true;
do
  echo +10 > /sys/class/rtc/rtc0/wakealarm
  echo mem > /sys/power/state
  sleep 5
done;
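A bounded variant of this loop may make a long run easier to read back, because every cycle is numbered on the console and a failed write to the power state file stops the run. This is only a sketch under stated assumptions: the function names `suspend_once` and `run_cycles` and the `WAKEALARM` and `POWER_STATE` variables are invented for illustration and are not part of the script above; the two sysfs paths are the ones the script already uses.

```shell
#!/bin/sh
# Hypothetical helper names; only the two sysfs paths come from the thread.
WAKEALARM=${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}
POWER_STATE=${POWER_STATE:-/sys/power/state}

suspend_once() {
    # Arm the RTC to fire in $1 seconds, then request suspend-to-RAM.
    echo "+$1" > "$WAKEALARM" || return 1
    echo mem > "$POWER_STATE"
}

run_cycles() {
    # $1 = number of cycles, $2 = wake delay in seconds, $3 = pause between cycles
    i=1
    while [ "$i" -le "$1" ]; do
        echo "cycle $i: suspending"
        if ! suspend_once "$2"; then
            echo "cycle $i: suspend failed"
            return 1
        fi
        echo "cycle $i: resumed"
        sleep "$3"
        i=$((i + 1))
    done
}
```

On the target this could be started as root with something like `run_cycles 1000 10 5`, with `glmark2-es2-drm &` launched beforehand as in the original script.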
Hi Frieder,

On Thursday, 05.08.2021 at 20:56 +0200, Frieder Schrempf wrote:
> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>
> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>
> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>
> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>
> #!/bin/sh
>
> glmark2-es2-drm &
>
> while true;
> do
>   echo +10 > /sys/class/rtc/rtc0/wakealarm
>   echo mem > /sys/power/state
>   sleep 5
> done;

Hm, that's unfortunate.

I'm back from a two week vacation, but it looks like I won't have much
time available to look into this issue soon. It would be very helpful
if you could try to pinpoint the hang a bit more. If you can reproduce
the hang with no_console_suspend you might be able to extract a bit
more info in which stage the hang happens (suspend, resume, TF-A, etc.)
If the hang is in the kernel you might be able to add some prints to
the suspend/resume paths to be able to track down the exact point of
the hang.

I'm happy to look into the issue once it's better known where to look,
but I fear that I won't have time to do the above investigation myself
short term. Frieder, is this something you could help with over the
next few days?

Regards,
Lucas
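Before starting a long run it may be worth checking that the board really booted with `no_console_suspend`, and switching on the kernel's own suspend/resume messages where they are available. The sketch below is not from the thread: the helper names and the `CMDLINE_FILE` and `PM_DIR` variables are made up, and `/sys/power/pm_debug_messages` and `/sys/power/pm_print_times` are only present on kernels built with `CONFIG_PM_SLEEP_DEBUG`, which is why the code tests for them first.

```shell
#!/bin/sh
# Hypothetical helpers; the paths default to the real procfs/sysfs locations.
CMDLINE_FILE=${CMDLINE_FILE:-/proc/cmdline}
PM_DIR=${PM_DIR:-/sys/power}

has_no_console_suspend() {
    # Split the command line into one word per line and match the flag exactly.
    tr ' ' '\n' < "$CMDLINE_FILE" | grep -qx 'no_console_suspend'
}

enable_pm_debug() {
    # Turn on extra PM logging, skipping knobs this kernel does not expose.
    for knob in pm_debug_messages pm_print_times; do
        if [ -w "$PM_DIR/$knob" ]; then
            echo 1 > "$PM_DIR/$knob"
        fi
    done
}
```

A test script could call `has_no_console_suspend || echo "warning: console will be suspended"` followed by `enable_pm_debug` before entering the suspend loop.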
On 09.08.21 13:01, Lucas Stach wrote:
> Hi Frieder,
>
> Hm, that's unfortunate.
>
> I'm back from a two week vacation, but it looks like I won't have much
> time available to look into this issue soon. It would be very helpful
> if you could try to pinpoint the hang a bit more. If you can reproduce
> the hang with no_console_suspend you might be able to extract a bit
> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> If the hang is in the kernel you might be able to add some prints to
> the suspend/resume paths to be able to track down the exact point of
> the hang.
>
> I'm happy to look into the issue once it's better known where to look,
> but I fear that I won't have time to do the above investigation myself
> short term. Frieder, is this something you could help with over the
> next few days?

I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.

@Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?

On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
On Mon, Aug 9, 2021 at 6:50 AM Frieder Schrempf <frieder.schrempf@kontron.de> wrote:
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>
> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?

Right now I am on medical leave due to a broken wrist, and I won't be able to help until it heals. Sorry.

adam

> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
On Mon, Aug 9, 2021 at 4:01 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> Hi Frieder,
>
> Hm, that's unfortunate.
>
> I'm back from a two week vacation, but it looks like I won't have much
> time available to look into this issue soon. It would be very helpful
> if you could try to pinpoint the hang a bit more. If you can reproduce
> the hang with no_console_suspend you might be able to extract a bit
> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> If the hang is in the kernel you might be able to add some prints to
> the suspend/resume paths to be able to track down the exact point of
> the hang.
>
> I'm happy to look into the issue once it's better known where to look,
> but I fear that I won't have time to do the above investigation myself
> short term. Frieder, is this something you could help with over the
> next few days?

Lucas / Frieder,

Can you update us on where you are at with this patch series? I fear
we are going to go through another kernel release without IMX8MM
blk-ctl support and all the things that depend on it such as
USB/PCI/DSI/CSI/GPU/VPU. If there is some specific testing you need
please let me know what I can do to help. I have a variety of IMX8MM
hardware but not a lot of time or knowledge with regards to
troubleshooting suspend/resume issues.

Are the issues found a regression?

Best regards,

Tim
Hi Lucas,

On 09.08.21 13:50, Frieder Schrempf wrote:
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>
> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?
>
> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.

I ran a few more suspend/resume cycles and watched the log. For the first 2.5 hours nothing noteworthy happened, except that glmark2 crashed again at some point. Then suddenly the following lines were printed while suspending:

imx-pgc imx-pgc-domain.6: failed to command PGC
PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
PM: Some devices failed to suspend, or early wake event detected

After that, suspending continues to fail with the following on each try:

PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
PM: Some devices failed to suspend, or early wake event detected

So far I didn't run into a lockup again with this test, but I will continue trying to reproduce it and retrieve more information.

Best regards
Frieder
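For reading such logs: in Linux's generic errno numbering, -110 is ETIMEDOUT and -22 is EINVAL, so the first failure is a timeout and the later ones report an invalid argument. The following sketch pulls the failing device and error code out of a captured console log. The function names are hypothetical, the pattern only matches the "failed to suspend" line format shown in this message, and only the two codes seen here are decoded.

```shell
#!/bin/sh
# Hypothetical helpers for summarizing "failed to suspend" lines from a log.

errno_name() {
    # Map the two error codes seen in this thread to their errno names.
    case "$1" in
        -110) echo ETIMEDOUT ;;
        -22)  echo EINVAL ;;
        *)    echo unknown ;;
    esac
}

summarize_suspend_errors() {
    # Read a kernel log on stdin; print "<device> <code> <name>" per failure.
    sed -n 's/^.* \([^ ]*\): PM: failed to suspend: error \(-[0-9]*\).*$/\1 \2/p' |
    while read -r dev code; do
        echo "$dev $code $(errno_name "$code")"
    done
}
```

It could be fed from a saved serial capture, for example `summarize_suspend_errors < console.log | sort | uniq -c`, where `console.log` is a placeholder file name.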
Hi Tim,

On 31.08.21 00:06, Tim Harvey wrote:
> On Mon, Aug 9, 2021 at 4:01 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>>
>> Hi Frieder,
>>
>> On Thursday, 05.08.2021 at 20:56 +0200, Frieder Schrempf wrote:
>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>> Hi all,
>>>>>
>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>
>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>> failure path issues and most importantly the interaction with system
>>>>> suspend/resume. With the previous version some of the power domains
>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>
>>>>> Updated testing git trees here, disclaimer still applies:
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>
>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>
>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>
>>>
>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>
>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>
>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>
>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>
>>> #!/bin/sh
>>>
>>> glmark2-es2-drm &
>>>
>>> while true;
>>> do
>>>   echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>   echo mem > /sys/power/state
>>>   sleep 5
>>> done;
>>
>> Hm, that's unfortunate.
>>
>> I'm back from a two week vacation, but it looks like I won't have much
>> time available to look into this issue soon. It would be very helpful
>> if you could try to pinpoint the hang a bit more. If you can reproduce
>> the hang with no_console_suspend you might be able to extract a bit
>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>> If the hang is in the kernel you might be able to add some prints to
>> the suspend/resume paths to be able to track down the exact point of
>> the hang.
>>
>> I'm happy to look into the issue once it's better known where to look,
>> but I fear that I won't have time to do the above investigation myself
>> short term. Frieder, is this something you could help with over the
>> next few days?
>>
>
> Lucas / Frieder,
>
> Can you update us on where you are at with this patch series? I fear
> we are going to go through another kernel release without IMX8MM
> blk-ctl support and all the things that depend on it such as
> USB/PCI/DSI/CSI/GPU/VPU. If there is some specific testing you need
> please let me know what I can do to help. I have a variety of IMX8MM
> hardware but not a lot of time or knowledge with regards to
> troubleshooting suspend/resume issues.

I try to help as best I can, but unfortunately my time is very limited and I haven't made much progress in investigating the issue(s) so far.

If you could do some testing on your side, that would be much appreciated. It would be good if you could set up a recent kernel with Lucas' patchset applied and do some suspend/resume cycle testing as described above. Use 'no_console_suspend' in the cmdline and look for any error messages in the log or lockups of the device. You probably also need some users of the power domains or BLK-CTRL, like GPU, DSI, USB, etc. (that's what I currently have enabled).

You can find the tree I'm currently using here:
https://github.com/fschrempf/linux/tree/next-ktn-pd-blk-ctl-lucas

> Are the issues found a regression?

Regression compared to what? To the v1 patches? I don't think so. We haven't had any stable solution for BLK-CTRL support so far, and what we have has probably not been tested extensively yet. So I guess it's not really unexpected that there are still issues, but it's very frustrating that, after all the effort, there may still be something in the HW that doesn't behave as expected.

Best regards,
Frieder
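The two checks suggested above (confirming that 'no_console_suspend' is on the command line and looking for suspend errors in the log) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and the patterns are simply the failure messages quoted elsewhere in this thread, so other failure modes will not be caught.

```shell
#!/bin/sh
# Sketch: helpers for a suspend/resume test run.
# Function names are hypothetical; the grep patterns are taken from the
# error messages reported in this thread and are not exhaustive.

# Succeeds if the kernel command line passed as $1 contains the
# no_console_suspend parameter as a separate word.
has_no_console_suspend() {
    case " $1 " in
        *" no_console_suspend "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads a kernel log on stdin and prints only the lines that point to a
# failed suspend attempt. Exit status is non-zero if nothing matched.
scan_pm_errors() {
    grep -E 'failed to command PGC|PM: failed to suspend|Some devices failed to suspend'
}

# Usage on the target:
#   has_no_console_suspend "$(cat /proc/cmdline)" || echo "no_console_suspend is missing"
#   dmesg | scan_pm_errors
```

Keeping the helpers free of side effects means they can be sourced into whatever loop drives the suspend cycles.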
On 01.09.21 12:03, Frieder Schrempf wrote:
> Hi Lucas,
>
> On 09.08.21 13:50, Frieder Schrempf wrote:
>> On 09.08.21 13:01, Lucas Stach wrote:
>>> Hi Frieder,
>>>
>>> On Thursday, 05.08.2021 at 20:56 +0200, Frieder Schrempf wrote:
>>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>>
>>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>>> failure path issues and most importantly the interaction with system
>>>>>> suspend/resume. With the previous version some of the power domains
>>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>>
>>>>>> Updated testing git trees here, disclaimer still applies:
>>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>>
>>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>>
>>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>>
>>>>
>>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>>
>>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>>
>>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>>
>>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>>
>>>> #!/bin/sh
>>>>
>>>> glmark2-es2-drm &
>>>>
>>>> while true;
>>>> do
>>>>   echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>>   echo mem > /sys/power/state
>>>>   sleep 5
>>>> done;
>>>
>>> Hm, that's unfortunate.
>>>
>>> I'm back from a two week vacation, but it looks like I won't have much
>>> time available to look into this issue soon. It would be very helpful
>>> if you could try to pinpoint the hang a bit more. If you can reproduce
>>> the hang with no_console_suspend you might be able to extract a bit
>>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>>> If the hang is in the kernel you might be able to add some prints to
>>> the suspend/resume paths to be able to track down the exact point of
>>> the hang.
>>>
>>> I'm happy to look into the issue once it's better known where to look,
>>> but I fear that I won't have time to do the above investigation myself
>>> short term. Frieder, is this something you could help with over the
>>> next few days?
>>
>> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>>
>> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?
>>
>> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
>
> I ran a few more suspend/resume cycles and watched the log. The first
> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
> at some point.

Facepalm! Of course glmark2 didn't crash; it just doesn't loop endlessly as I expected, which totally makes sense for a benchmark. Using --run-forever should do the trick.
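With that change, the test loop from earlier in the thread might look like the sketch below. It assumes the same setup as the original script (rtc0 can wake the system, the script runs as root). The RUN_SUSPEND_LOOP guard and the optional sysfs-root argument are hypothetical additions that only make it possible to source or dry-run the script without suspending the machine.

```shell
#!/bin/sh
# Sketch of the suspend/resume loop with glmark2 kept running across cycles.
# Assumptions: rtc0 is a working wakeup source and the script runs as root.
# RUN_SUSPEND_LOOP and the sysfs-root argument are hypothetical additions.

# Arm the RTC wake alarm $1 seconds from now, then suspend to RAM.
# $2 optionally overrides the sysfs root (defaults to /sys).
suspend_cycle() {
    delay=$1
    sysfs=${2:-/sys}
    echo "+$delay" > "$sysfs/class/rtc/rtc0/wakealarm" || return 1
    echo mem > "$sysfs/power/state" || return 1
}

if [ "${RUN_SUSPEND_LOOP:-0}" = 1 ]; then
    # --run-forever makes glmark2 restart its benchmark list instead of exiting.
    glmark2-es2-drm --run-forever &

    while true; do
        # Report a failed attempt instead of silently continuing.
        suspend_cycle 10 || echo "suspend attempt failed" >&2
        sleep 5
    done
fi
```

Started as `RUN_SUSPEND_LOOP=1 ./suspend-loop.sh` (the file name is arbitrary), it behaves like the original loop, except that failed suspend attempts are reported on stderr.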
Hi Frieder,

On Wednesday, 01.09.2021 at 12:03 +0200, Frieder Schrempf wrote:
[...]
> > >
> > > >
> > > > And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> > > >
> > > > #!/bin/sh
> > > >
> > > > glmark2-es2-drm &
> > > >
> > > > while true;
> > > > do
> > > >   echo +10 > /sys/class/rtc/rtc0/wakealarm
> > > >   echo mem > /sys/power/state
> > > >   sleep 5
> > > > done;
> > >
> > > Hm, that's unfortunate.
> > >
> > > I'm back from a two week vacation, but it looks like I won't have much
> > > time available to look into this issue soon. It would be very helpful
> > > if you could try to pinpoint the hang a bit more. If you can reproduce
> > > the hang with no_console_suspend you might be able to extract a bit
> > > more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> > > If the hang is in the kernel you might be able to add some prints to
> > > the suspend/resume paths to be able to track down the exact point of
> > > the hang.
> > >
> > > I'm happy to look into the issue once it's better known where to look,
> > > but I fear that I won't have time to do the above investigation myself
> > > short term. Frieder, is this something you could help with over the
> > > next few days?
> >
> > I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
> >
> > @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?
> >
> > On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
>
> I ran a few more suspend/resume cycles and watched the log. The first
> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
> at some point.
>
> Then suddenly the following lines were printed while suspending:
>
> imx-pgc imx-pgc-domain.6: failed to command PGC
> PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
> imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
> PM: Some devices failed to suspend, or early wake event detected
>
> After that, the suspending continues to fail with the following on each try:
>
> PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
> imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
> PM: Some devices failed to suspend, or early wake event detected
>
> So far I didn't run into a lockup again with this test, but I will
> continue trying to reproduce it and retrieve more information.

If you run into this "failed to command PGC" state again, I would be very interested in the GPC state there. You should be able to dump the full register state from the GPC regmap in debugfs.

Regards,
Lucas
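A register dump of that kind could be captured with something like the sketch below. It assumes debugfs is mounted at /sys/kernel/debug and that the GPC regmap directory has "gpc" in its name. The exact directory name depends on the device name on the board and has not been checked against this patch series, which is why the helper (a hypothetical name) matches by pattern.

```shell
#!/bin/sh
# Sketch: print the registers file of every regmap whose debugfs directory
# name contains "gpc". Assumes debugfs is mounted at /sys/kernel/debug;
# the "*gpc*" match is a guess at the directory name, not a verified path.

# $1 optionally overrides the regmap debugfs directory.
# Returns non-zero if no matching, readable registers file was found.
dump_gpc_regmap() {
    base=${1:-/sys/kernel/debug/regmap}
    found=1
    for dir in "$base"/*gpc*; do
        [ -r "$dir/registers" ] || continue
        echo "== $dir"
        cat "$dir/registers"
        found=0
    done
    return "$found"
}

# Usage on the target, right after "failed to command PGC" shows up:
#   dump_gpc_regmap > gpc-state.txt || echo "no GPC regmap found in debugfs"
```

Saving the output right after the first failure, before retrying the suspend, should preserve the state that led to the -110 error.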
On 02.09.21 12:25, Lucas Stach wrote:
> Hi Frieder,
>
> On Wednesday, 01.09.2021 at 12:03 +0200, Frieder Schrempf wrote:
> [...]
>>>>
>>>>>
>>>>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>>>>
>>>>> #!/bin/sh
>>>>>
>>>>> glmark2-es2-drm &
>>>>>
>>>>> while true;
>>>>> do
>>>>>   echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>>>   echo mem > /sys/power/state
>>>>>   sleep 5
>>>>> done;
>>>>
>>>> Hm, that's unfortunate.
>>>>
>>>> I'm back from a two week vacation, but it looks like I won't have much
>>>> time available to look into this issue soon. It would be very helpful
>>>> if you could try to pinpoint the hang a bit more. If you can reproduce
>>>> the hang with no_console_suspend you might be able to extract a bit
>>>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>>>> If the hang is in the kernel you might be able to add some prints to
>>>> the suspend/resume paths to be able to track down the exact point of
>>>> the hang.
>>>>
>>>> I'm happy to look into the issue once it's better known where to look,
>>>> but I fear that I won't have time to do the above investigation myself
>>>> short term. Frieder, is this something you could help with over the
>>>> next few days?
>>>
>>> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>>>
>>> @Peng, @Adam and everyone else: Any chance you could setup a similar test and try to reproduce this?
>>>
>>> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
>>
>> I ran a few more suspend/resume cycles and watched the log. The first
>> 2.5 hours nothing noteworthy happened, except that glmark2 crashed again
>> at some point.
>>
>> Then suddenly the following lines were printed while suspending:
>>
>> imx-pgc imx-pgc-domain.6: failed to command PGC
>> PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
>> imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
>> PM: Some devices failed to suspend, or early wake event detected
>>
>> After that, the suspending continues to fail with the following on each try:
>>
>> PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
>> imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
>> PM: Some devices failed to suspend, or early wake event detected
>>
>> So far I didn't run into a lockup again with this test, but I will
>> continue trying to reproduce it and retrieve more information.
>
> If you run into this "failed to command PGC" state again, I would be
> very interested in the GPC state there. You should be able to dump the
> full register state from the GPC regmap in debugfs.

I have been trying to reproduce this with the same setup for several days now, but I haven't run into this error again so far. It seems to be something that occurs only very rarely. I also got only a single lockup with this board in roughly 40 h of testing in total.

On the other hand, I have a different board (same design) that shows the lockups much more often. I hope I can provide more data soon, but I can't promise anything.