Message ID | 20180227204711.29357-1-dianders@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Doug, On 27/02/18 21:47, Douglas Anderson wrote: > Back in the early days when gru devices were still under development > we found an issue where the WiFi reset line needed to be configured as > early as possible during the boot process to avoid the WiFi module > being in a bad state. > > We found that the way to get the kernel to do this in the earliest > possible place was to configure this line in the pinctrl hogs, so > that's what we did. For some history here you can see > <http://crosreview.com/368770>. After the time that change landed in > the kernel, we landed a firmware change to configure this line even > earlier. See <http://crosreview.com/399919>. However, even after the > firmware change landed we kept the kernel change to deal with the fact > that some people working on devices might take a little while to > update their firmware. > > At this there are definitely zero devices out in the wild that have > firmware without the fix in it. Specifically looking in the firmware > branch several critically important fixes for memory stability landed > after the patch in coreboot and I know we didn't ship without those. > Thus, by now, everyone should have the new firmware and it's safe to > not have the kernel set this up in a pinctrl hog. > > Historically, even though it wasn't needed to have this in a pinctrl > hog, we still kept it since it didn't hurt. Pinctrl would apply the > default hog at bootup and then would never touch things again. That > all changed with commit 981ed1bfbc6c ("pinctrl: Really force states > during suspend/resume"). After that commit then we'll re-apply the > default hog at resume time and that can screw up the reset state of > WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way > then the whole system can go haywire. That's what was happening. > Specifically you'd resume a rk3399-gru-* device and it would mostly > resume, then would crash with some crazy weird crash. > > One could say, perhaps, that the recent pinctrl change was at fault > (and should be fixed) since it changed behavior. ...but that's not > really true. The device tree for rk3399-gru is really to blame. > Specifically since the pinctrl is defined in the hog and not in the > "wlan-pd-n" node then the actual user of this pin doesn't have a > pinctrl entry for it. That's bad. > > Let's fix our problems by just moving the control of > "wlan_module_reset_l pinctrl" out of the hog and put them in the > proper place. > > NOTE: in theory, I think it should actually be possible to have a pin > controlled _both_ by the hog and by an actual device. Once the device > claims the pin I think the hog is supposed to let go. I'm not 100% > sure that this works and in any case this solution would be more > complex than is necessary. > > Reported-by: Marc Zyngier <marc.zyngier@arm.com> > Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") > Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 16 +++------------- > 1 file changed, 3 insertions(+), 13 deletions(-) > > diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi > index 6e50768a34ce..9ad54751d0d8 100644 > --- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi > +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi > @@ -406,8 +406,9 @@ > wlan_pd_n: wlan-pd-n { > compatible = "regulator-fixed"; > regulator-name = "wlan_pd_n"; > + pinctrl-names = "default"; > + pinctrl-0 = <&wlan_module_reset_l>; > > - /* Note the wlan_module_reset_l pinctrl */ > enable-active-high; > gpio = <&gpio1 11 GPIO_ACTIVE_HIGH>; > > @@ -988,12 +989,6 @@ ap_i2c_audio: &i2c8 { > pinctrl-0 = < > &ap_pwroff /* AP will auto-assert this when in S3 */ > &clk_32k /* This pin is always 32k on gru boards */ > - > - /* > - * We want this driven low ASAP; firmware should help us, but > - * we can help ourselves too. > - */ > - &wlan_module_reset_l > >; > > pcfg_output_low: pcfg-output-low { > @@ -1173,12 +1168,7 @@ ap_i2c_audio: &i2c8 { > }; > > wlan_module_reset_l: wlan-module-reset-l { > - /* > - * We want this driven low ASAP (As {Soon,Strongly} As > - * Possible), to avoid leakage through the powered-down > - * WiFi. > - */ > - rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_output_low>; > + rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_pull_none>; > }; > > bt_host_wake_l: bt-host-wake-l { > In linux-next there are still different issues with s2r, but definitely, this patch has allowed me to enable again the PCIE on my Samsung Chromebook Plus, s2r goes now a bit further :) So many thanks for the patch. Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Regards, Enric
On 27/02/18 20:47, Douglas Anderson wrote: > Back in the early days when gru devices were still under development > we found an issue where the WiFi reset line needed to be configured as > early as possible during the boot process to avoid the WiFi module > being in a bad state. > > We found that the way to get the kernel to do this in the earliest > possible place was to configure this line in the pinctrl hogs, so > that's what we did. For some history here you can see > <http://crosreview.com/368770>. After the time that change landed in > the kernel, we landed a firmware change to configure this line even > earlier. See <http://crosreview.com/399919>. However, even after the > firmware change landed we kept the kernel change to deal with the fact > that some people working on devices might take a little while to > update their firmware. > > At this there are definitely zero devices out in the wild that have > firmware without the fix in it. Specifically looking in the firmware > branch several critically important fixes for memory stability landed > after the patch in coreboot and I know we didn't ship without those. > Thus, by now, everyone should have the new firmware and it's safe to > not have the kernel set this up in a pinctrl hog. > > Historically, even though it wasn't needed to have this in a pinctrl > hog, we still kept it since it didn't hurt. Pinctrl would apply the > default hog at bootup and then would never touch things again. That > all changed with commit 981ed1bfbc6c ("pinctrl: Really force states > during suspend/resume"). After that commit then we'll re-apply the > default hog at resume time and that can screw up the reset state of > WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way > then the whole system can go haywire. That's what was happening. > Specifically you'd resume a rk3399-gru-* device and it would mostly > resume, then would crash with some crazy weird crash. > > One could say, perhaps, that the recent pinctrl change was at fault > (and should be fixed) since it changed behavior. ...but that's not > really true. The device tree for rk3399-gru is really to blame. > Specifically since the pinctrl is defined in the hog and not in the > "wlan-pd-n" node then the actual user of this pin doesn't have a > pinctrl entry for it. That's bad. > > Let's fix our problems by just moving the control of > "wlan_module_reset_l pinctrl" out of the hog and put them in the > proper place. > > NOTE: in theory, I think it should actually be possible to have a pin > controlled _both_ by the hog and by an actual device. Once the device > claims the pin I think the hog is supposed to let go. I'm not 100% > sure that this works and in any case this solution would be more > complex than is necessary. > > Reported-by: Marc Zyngier <marc.zyngier@arm.com> > Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") > Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") > Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Thanks, M.
On 28/02/18 12:19, Enric Balletbo i Serra wrote: > Hi Doug, > > On 27/02/18 21:47, Douglas Anderson wrote: >> Back in the early days when gru devices were still under development >> we found an issue where the WiFi reset line needed to be configured as >> early as possible during the boot process to avoid the WiFi module >> being in a bad state. >> >> We found that the way to get the kernel to do this in the earliest >> possible place was to configure this line in the pinctrl hogs, so >> that's what we did. For some history here you can see >> <http://crosreview.com/368770>. After the time that change landed in >> the kernel, we landed a firmware change to configure this line even >> earlier. See <http://crosreview.com/399919>. However, even after the >> firmware change landed we kept the kernel change to deal with the fact >> that some people working on devices might take a little while to >> update their firmware. >> >> At this there are definitely zero devices out in the wild that have >> firmware without the fix in it. Specifically looking in the firmware >> branch several critically important fixes for memory stability landed >> after the patch in coreboot and I know we didn't ship without those. >> Thus, by now, everyone should have the new firmware and it's safe to >> not have the kernel set this up in a pinctrl hog. >> >> Historically, even though it wasn't needed to have this in a pinctrl >> hog, we still kept it since it didn't hurt. Pinctrl would apply the >> default hog at bootup and then would never touch things again. That >> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >> during suspend/resume"). After that commit then we'll re-apply the >> default hog at resume time and that can screw up the reset state of >> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >> then the whole system can go haywire. That's what was happening. >> Specifically you'd resume a rk3399-gru-* device and it would mostly >> resume, then would crash with some crazy weird crash. >> >> One could say, perhaps, that the recent pinctrl change was at fault >> (and should be fixed) since it changed behavior. ...but that's not >> really true. The device tree for rk3399-gru is really to blame. >> Specifically since the pinctrl is defined in the hog and not in the >> "wlan-pd-n" node then the actual user of this pin doesn't have a >> pinctrl entry for it. That's bad. >> >> Let's fix our problems by just moving the control of >> "wlan_module_reset_l pinctrl" out of the hog and put them in the >> proper place. >> >> NOTE: in theory, I think it should actually be possible to have a pin >> controlled _both_ by the hog and by an actual device. Once the device >> claims the pin I think the hog is supposed to let go. I'm not 100% >> sure that this works and in any case this solution would be more >> complex than is necessary. >> >> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") >> Signed-off-by: Douglas Anderson <dianders@chromium.org> >> --- >> >> arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 16 +++------------- >> 1 file changed, 3 insertions(+), 13 deletions(-) >> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi >> index 6e50768a34ce..9ad54751d0d8 100644 >> --- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi >> +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi >> @@ -406,8 +406,9 @@ >> wlan_pd_n: wlan-pd-n { >> compatible = "regulator-fixed"; >> regulator-name = "wlan_pd_n"; >> + pinctrl-names = "default"; >> + pinctrl-0 = <&wlan_module_reset_l>; >> >> - /* Note the wlan_module_reset_l pinctrl */ >> enable-active-high; >> gpio = <&gpio1 11 GPIO_ACTIVE_HIGH>; >> >> @@ -988,12 +989,6 @@ ap_i2c_audio: &i2c8 { >> pinctrl-0 = < >> &ap_pwroff /* AP will auto-assert this when in S3 */ >> &clk_32k /* This pin is always 32k on gru boards */ >> - >> - /* >> - * We want this driven low ASAP; firmware should help us, but >> - * we can help ourselves too. >> - */ >> - &wlan_module_reset_l >> >; >> >> pcfg_output_low: pcfg-output-low { >> @@ -1173,12 +1168,7 @@ ap_i2c_audio: &i2c8 { >> }; >> >> wlan_module_reset_l: wlan-module-reset-l { >> - /* >> - * We want this driven low ASAP (As {Soon,Strongly} As >> - * Possible), to avoid leakage through the powered-down >> - * WiFi. >> - */ >> - rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_output_low>; >> + rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_pull_none>; >> }; >> >> bt_host_wake_l: bt-host-wake-l { >> > > In linux-next there are still different issues with s2r, but definitely, this > patch has allowed me to enable again the PCIE on my Samsung Chromebook Plus, s2r > goes now a bit further :) So many thanks for the patch. > > Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> For the record, I maintain a small branch[1] based on the latest mainline, containing fixes only, and not dragging the whole of -next into an already fairly unstable kernel. Most things do work (including s2r), apart from USB3, external display, and some of the sound stuff. Crucially, kexec is now functional, allowing the kernel to be used as a bootloader... M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/kevin-4.16
Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: > Back in the early days when gru devices were still under development > we found an issue where the WiFi reset line needed to be configured as > early as possible during the boot process to avoid the WiFi module > being in a bad state. > > We found that the way to get the kernel to do this in the earliest > possible place was to configure this line in the pinctrl hogs, so > that's what we did. For some history here you can see > <http://crosreview.com/368770>. After the time that change landed in > the kernel, we landed a firmware change to configure this line even > earlier. See <http://crosreview.com/399919>. However, even after the > firmware change landed we kept the kernel change to deal with the fact > that some people working on devices might take a little while to > update their firmware. > > At this there are definitely zero devices out in the wild that have > firmware without the fix in it. Specifically looking in the firmware > branch several critically important fixes for memory stability landed > after the patch in coreboot and I know we didn't ship without those. > Thus, by now, everyone should have the new firmware and it's safe to > not have the kernel set this up in a pinctrl hog. > > Historically, even though it wasn't needed to have this in a pinctrl > hog, we still kept it since it didn't hurt. Pinctrl would apply the > default hog at bootup and then would never touch things again. That > all changed with commit 981ed1bfbc6c ("pinctrl: Really force states > during suspend/resume"). After that commit then we'll re-apply the > default hog at resume time and that can screw up the reset state of > WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way > then the whole system can go haywire. That's what was happening. > Specifically you'd resume a rk3399-gru-* device and it would mostly > resume, then would crash with some crazy weird crash. > > One could say, perhaps, that the recent pinctrl change was at fault > (and should be fixed) since it changed behavior. ...but that's not > really true. The device tree for rk3399-gru is really to blame. > Specifically since the pinctrl is defined in the hog and not in the > "wlan-pd-n" node then the actual user of this pin doesn't have a > pinctrl entry for it. That's bad. > > Let's fix our problems by just moving the control of > "wlan_module_reset_l pinctrl" out of the hog and put them in the > proper place. > > NOTE: in theory, I think it should actually be possible to have a pin > controlled _both_ by the hog and by an actual device. Once the device > claims the pin I think the hog is supposed to let go. I'm not 100% > sure that this works and in any case this solution would be more > complex than is necessary. > > Reported-by: Marc Zyngier <marc.zyngier@arm.com> > Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") > Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") > Signed-off-by: Douglas Anderson <dianders@chromium.org> applied as fix for 4.16 with the 2 Tested-tags Thanks Heiko
Hi all, On 01/03/18 08:43, Heiko Stübner wrote: > Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: >> Back in the early days when gru devices were still under development >> we found an issue where the WiFi reset line needed to be configured as >> early as possible during the boot process to avoid the WiFi module >> being in a bad state. >> >> We found that the way to get the kernel to do this in the earliest >> possible place was to configure this line in the pinctrl hogs, so >> that's what we did. For some history here you can see >> <http://crosreview.com/368770>. After the time that change landed in >> the kernel, we landed a firmware change to configure this line even >> earlier. See <http://crosreview.com/399919>. However, even after the >> firmware change landed we kept the kernel change to deal with the fact >> that some people working on devices might take a little while to >> update their firmware. >> >> At this there are definitely zero devices out in the wild that have >> firmware without the fix in it. Specifically looking in the firmware >> branch several critically important fixes for memory stability landed >> after the patch in coreboot and I know we didn't ship without those. >> Thus, by now, everyone should have the new firmware and it's safe to >> not have the kernel set this up in a pinctrl hog. >> >> Historically, even though it wasn't needed to have this in a pinctrl >> hog, we still kept it since it didn't hurt. Pinctrl would apply the >> default hog at bootup and then would never touch things again. That >> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >> during suspend/resume"). After that commit then we'll re-apply the >> default hog at resume time and that can screw up the reset state of >> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >> then the whole system can go haywire. That's what was happening. >> Specifically you'd resume a rk3399-gru-* device and it would mostly >> resume, then would crash with some crazy weird crash. >> >> One could say, perhaps, that the recent pinctrl change was at fault >> (and should be fixed) since it changed behavior. ...but that's not >> really true. The device tree for rk3399-gru is really to blame. >> Specifically since the pinctrl is defined in the hog and not in the >> "wlan-pd-n" node then the actual user of this pin doesn't have a >> pinctrl entry for it. That's bad. >> >> Let's fix our problems by just moving the control of >> "wlan_module_reset_l pinctrl" out of the hog and put them in the >> proper place. >> >> NOTE: in theory, I think it should actually be possible to have a pin >> controlled _both_ by the hog and by an actual device. Once the device >> claims the pin I think the hog is supposed to let go. I'm not 100% >> sure that this works and in any case this solution would be more >> complex than is necessary. >> >> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") >> Signed-off-by: Douglas Anderson <dianders@chromium.org> > > applied as fix for 4.16 with the 2 Tested-tags Sorry to rain on everyone's parade, but further testing shows that this patch may not be enough to restore a reliable s2r. My initial testing did show that we were resuming without the VOP errors, but there seem to be further issues (I'm loosing the keyboard and the trackpad after resume on Kevin). Applying my initial hack makes it work again. I suspect that there are more hog pins that need tweaking, but I'm a bit out of my depth here. Thanks, M.
Hi, On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: > Hi all, > > On 01/03/18 08:43, Heiko Stübner wrote: >> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: >>> Back in the early days when gru devices were still under development >>> we found an issue where the WiFi reset line needed to be configured as >>> early as possible during the boot process to avoid the WiFi module >>> being in a bad state. >>> >>> We found that the way to get the kernel to do this in the earliest >>> possible place was to configure this line in the pinctrl hogs, so >>> that's what we did. For some history here you can see >>> <http://crosreview.com/368770>. After the time that change landed in >>> the kernel, we landed a firmware change to configure this line even >>> earlier. See <http://crosreview.com/399919>. However, even after the >>> firmware change landed we kept the kernel change to deal with the fact >>> that some people working on devices might take a little while to >>> update their firmware. >>> >>> At this there are definitely zero devices out in the wild that have >>> firmware without the fix in it. Specifically looking in the firmware >>> branch several critically important fixes for memory stability landed >>> after the patch in coreboot and I know we didn't ship without those. >>> Thus, by now, everyone should have the new firmware and it's safe to >>> not have the kernel set this up in a pinctrl hog. >>> >>> Historically, even though it wasn't needed to have this in a pinctrl >>> hog, we still kept it since it didn't hurt. Pinctrl would apply the >>> default hog at bootup and then would never touch things again. That >>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >>> during suspend/resume"). After that commit then we'll re-apply the >>> default hog at resume time and that can screw up the reset state of >>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >>> then the whole system can go haywire. That's what was happening. >>> Specifically you'd resume a rk3399-gru-* device and it would mostly >>> resume, then would crash with some crazy weird crash. >>> >>> One could say, perhaps, that the recent pinctrl change was at fault >>> (and should be fixed) since it changed behavior. ...but that's not >>> really true. The device tree for rk3399-gru is really to blame. >>> Specifically since the pinctrl is defined in the hog and not in the >>> "wlan-pd-n" node then the actual user of this pin doesn't have a >>> pinctrl entry for it. That's bad. >>> >>> Let's fix our problems by just moving the control of >>> "wlan_module_reset_l pinctrl" out of the hog and put them in the >>> proper place. >>> >>> NOTE: in theory, I think it should actually be possible to have a pin >>> controlled _both_ by the hog and by an actual device. Once the device >>> claims the pin I think the hog is supposed to let go. I'm not 100% >>> sure that this works and in any case this solution would be more >>> complex than is necessary. >>> >>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") >>> Signed-off-by: Douglas Anderson <dianders@chromium.org> >> >> applied as fix for 4.16 with the 2 Tested-tags > Sorry to rain on everyone's parade, but further testing shows that this > patch may not be enough to restore a reliable s2r. My initial testing > did show that we were resuming without the VOP errors, but there seem to > be further issues (I'm loosing the keyboard and the trackpad after > resume on Kevin). > > Applying my initial hack makes it work again. I suspect that there are > more hog pins that need tweaking, but I'm a bit out of my depth here. Are you positive you weren't just wearing your lucky hat when you tested your patch and then took it off when you tested mine? As far as I can see the only hogs left on kevin are: &ap_pwroff /* AP will auto-assert this when in S3 */ &clk_32k /* This pin is always 32k on gru boards */ Those map to: ap_pwroff: ap-pwroff { rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; }; clk_32k: clk-32k { rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; }; So I added some printouts at suspend/resume time. Specifically I set a boolean to "true" for the duration rockchip_pinctrl_suspend() and rockchip_pinctrl_resume() and this turned on a printout in rockchip_set_mux(). My printout looked like this (yeah, I know it's a whitespace-damaged patch just to show what I'm doing): + regmap_read(regmap, reg, &before); data = (mask << (bit + 16)); rmask = data | (data >> 16); data |= (mux & mask) << bit; ret = regmap_update_bits(regmap, reg, rmask, data); + regmap_read(regmap, reg, &after); + + if (DOUG) { + dev_info(info->dev, + "setting mux of GPIO%d-%d to %d; %#010x=>%#010x\n", + bank->bank_num, pin, mux, reg, before, after); + } ...and a similar one in rockchip_set_pull(). That showed this at resume time: [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; 0x00009400=>0x00009400 [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; 0x000041aa=>0x000041aa [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; 0x00005002=>0x00005002 [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; 0x00000ddc=>0x00000ddc [ Said another way: pinmux and pull isn't actually changing due to the hogs. We can see if something else could be changing, but I'd really want to be sure you're certain that the hogs are causing you problems... -Doug
Hi Doug, On 06/03/18 18:00, Doug Anderson wrote: > Hi, > > On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: >> Hi all, >> >> On 01/03/18 08:43, Heiko Stübner wrote: >>> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: >>>> Back in the early days when gru devices were still under development >>>> we found an issue where the WiFi reset line needed to be configured as >>>> early as possible during the boot process to avoid the WiFi module >>>> being in a bad state. >>>> >>>> We found that the way to get the kernel to do this in the earliest >>>> possible place was to configure this line in the pinctrl hogs, so >>>> that's what we did. For some history here you can see >>>> <http://crosreview.com/368770>. After the time that change landed in >>>> the kernel, we landed a firmware change to configure this line even >>>> earlier. See <http://crosreview.com/399919>. However, even after the >>>> firmware change landed we kept the kernel change to deal with the fact >>>> that some people working on devices might take a little while to >>>> update their firmware. >>>> >>>> At this there are definitely zero devices out in the wild that have >>>> firmware without the fix in it. Specifically looking in the firmware >>>> branch several critically important fixes for memory stability landed >>>> after the patch in coreboot and I know we didn't ship without those. >>>> Thus, by now, everyone should have the new firmware and it's safe to >>>> not have the kernel set this up in a pinctrl hog. >>>> >>>> Historically, even though it wasn't needed to have this in a pinctrl >>>> hog, we still kept it since it didn't hurt. Pinctrl would apply the >>>> default hog at bootup and then would never touch things again. That >>>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >>>> during suspend/resume"). After that commit then we'll re-apply the >>>> default hog at resume time and that can screw up the reset state of >>>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >>>> then the whole system can go haywire. That's what was happening. >>>> Specifically you'd resume a rk3399-gru-* device and it would mostly >>>> resume, then would crash with some crazy weird crash. >>>> >>>> One could say, perhaps, that the recent pinctrl change was at fault >>>> (and should be fixed) since it changed behavior. ...but that's not >>>> really true. The device tree for rk3399-gru is really to blame. >>>> Specifically since the pinctrl is defined in the hog and not in the >>>> "wlan-pd-n" node then the actual user of this pin doesn't have a >>>> pinctrl entry for it. That's bad. >>>> >>>> Let's fix our problems by just moving the control of >>>> "wlan_module_reset_l pinctrl" out of the hog and put them in the >>>> proper place. >>>> >>>> NOTE: in theory, I think it should actually be possible to have a pin >>>> controlled _both_ by the hog and by an actual device. Once the device >>>> claims the pin I think the hog is supposed to let go. I'm not 100% >>>> sure that this works and in any case this solution would be more >>>> complex than is necessary. >>>> >>>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >>>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >>>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") >>>> Signed-off-by: Douglas Anderson <dianders@chromium.org> >>> >>> applied as fix for 4.16 with the 2 Tested-tags >> Sorry to rain on everyone's parade, but further testing shows that this >> patch may not be enough to restore a reliable s2r. My initial testing >> did show that we were resuming without the VOP errors, but there seem to >> be further issues (I'm loosing the keyboard and the trackpad after >> resume on Kevin). >> >> Applying my initial hack makes it work again. I suspect that there are >> more hog pins that need tweaking, but I'm a bit out of my depth here. > > Are you positive you weren't just wearing your lucky hat when you > tested your patch and then took it off when you tested mine? As far So far, I seem to have a 100% success rate in resuming with my silly hack, whist your DT patch alone only gives me a 50% rate at best. > as I can see the only hogs left on kevin are: > > &ap_pwroff /* AP will auto-assert this when in S3 */ > &clk_32k /* This pin is always 32k on gru boards */ > > Those map to: > > ap_pwroff: ap-pwroff { > rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; > }; > > clk_32k: clk-32k { > rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; > }; > > So I added some printouts at suspend/resume time. Specifically I set > a boolean to "true" for the duration rockchip_pinctrl_suspend() and > rockchip_pinctrl_resume() and this turned on a printout in > rockchip_set_mux(). My printout looked like this (yeah, I know it's a > whitespace-damaged patch just to show what I'm doing): > > + regmap_read(regmap, reg, &before); > data = (mask << (bit + 16)); > rmask = data | (data >> 16); > data |= (mux & mask) << bit; > ret = regmap_update_bits(regmap, reg, rmask, data); > > + regmap_read(regmap, reg, &after); > + > + if (DOUG) { > + dev_info(info->dev, > + "setting mux of GPIO%d-%d to %d; %#010x=>%#010x\n", > + bank->bank_num, pin, mux, reg, before, after); > + } > > ...and a similar one in rockchip_set_pull(). That showed this at resume time: > > [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; > 0x00009400=>0x00009400 > [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; > 0x000041aa=>0x000041aa > [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; > 0x00005002=>0x00005002 > [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; > 0x00000ddc=>0x00000ddc > [ > > Said another way: pinmux and pull isn't actually changing due to the > hogs. We can see if something else could be changing, but I'd really > want to be sure you're certain that the hogs are causing you > problems... I cannot say for sure that the hogs are the issue. But I thought that they were the only pins affected by 981ed1bfbc6c... If this patch can affect other pins, then I'm probably barking up the wrong tree. Thanks, M.
Am Dienstag, 6. März 2018, 19:15:18 CET schrieb Marc Zyngier: > Hi Doug, > > On 06/03/18 18:00, Doug Anderson wrote: > > Hi, > > > > On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: > >> Hi all, > >> > >> On 01/03/18 08:43, Heiko Stübner wrote: > >>> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: > >>>> Back in the early days when gru devices were still under development > >>>> we found an issue where the WiFi reset line needed to be configured as > >>>> early as possible during the boot process to avoid the WiFi module > >>>> being in a bad state. > >>>> > >>>> We found that the way to get the kernel to do this in the earliest > >>>> possible place was to configure this line in the pinctrl hogs, so > >>>> that's what we did. For some history here you can see > >>>> <http://crosreview.com/368770>. After the time that change landed in > >>>> the kernel, we landed a firmware change to configure this line even > >>>> earlier. See <http://crosreview.com/399919>. However, even after the > >>>> firmware change landed we kept the kernel change to deal with the fact > >>>> that some people working on devices might take a little while to > >>>> update their firmware. > >>>> > >>>> At this there are definitely zero devices out in the wild that have > >>>> firmware without the fix in it. Specifically looking in the firmware > >>>> branch several critically important fixes for memory stability landed > >>>> after the patch in coreboot and I know we didn't ship without those. > >>>> Thus, by now, everyone should have the new firmware and it's safe to > >>>> not have the kernel set this up in a pinctrl hog. > >>>> > >>>> Historically, even though it wasn't needed to have this in a pinctrl > >>>> hog, we still kept it since it didn't hurt. Pinctrl would apply the > >>>> default hog at bootup and then would never touch things again. That > >>>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states > >>>> during suspend/resume"). After that commit then we'll re-apply the > >>>> default hog at resume time and that can screw up the reset state of > >>>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way > >>>> then the whole system can go haywire. That's what was happening. > >>>> Specifically you'd resume a rk3399-gru-* device and it would mostly > >>>> resume, then would crash with some crazy weird crash. > >>>> > >>>> One could say, perhaps, that the recent pinctrl change was at fault > >>>> (and should be fixed) since it changed behavior. ...but that's not > >>>> really true. The device tree for rk3399-gru is really to blame. > >>>> Specifically since the pinctrl is defined in the hog and not in the > >>>> "wlan-pd-n" node then the actual user of this pin doesn't have a > >>>> pinctrl entry for it. That's bad. > >>>> > >>>> Let's fix our problems by just moving the control of > >>>> "wlan_module_reset_l pinctrl" out of the hog and put them in the > >>>> proper place. > >>>> > >>>> NOTE: in theory, I think it should actually be possible to have a pin > >>>> controlled _both_ by the hog and by an actual device. Once the device > >>>> claims the pin I think the hog is supposed to let go. I'm not 100% > >>>> sure that this works and in any case this solution would be more > >>>> complex than is necessary. > >>>> > >>>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> > >>>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") > >>>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during > >>>> suspend/resume") > >>>> Signed-off-by: Douglas Anderson <dianders@chromium.org> > >>> > >>> applied as fix for 4.16 with the 2 Tested-tags > >> > >> Sorry to rain on everyone's parade, but further testing shows that this > >> patch may not be enough to restore a reliable s2r. My initial testing > >> did show that we were resuming without the VOP errors, but there seem to > >> be further issues (I'm loosing the keyboard and the trackpad after > >> resume on Kevin). > >> > >> Applying my initial hack makes it work again. I suspect that there are > >> more hog pins that need tweaking, but I'm a bit out of my depth here. > > > > Are you positive you weren't just wearing your lucky hat when you > > tested your patch and then took it off when you tested mine? As far > > So far, I seem to have a 100% success rate in resuming with my silly > hack, whist your DT patch alone only gives me a 50% rate at best. > > > as I can see the only hogs left on kevin are: > > &ap_pwroff /* AP will auto-assert this when in S3 */ > > &clk_32k /* This pin is always 32k on gru boards */ > > > > Those map to: > > ap_pwroff: ap-pwroff { > > > > rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; > > > > }; > > > > clk_32k: clk-32k { > > > > rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; > > > > }; > > > > So I added some printouts at suspend/resume time. Specifically I set > > a boolean to "true" for the duration rockchip_pinctrl_suspend() and > > rockchip_pinctrl_resume() and this turned on a printout in > > rockchip_set_mux(). My printout looked like this (yeah, I know it's a > > whitespace-damaged patch just to show what I'm doing): > > > > + regmap_read(regmap, reg, &before); > > > > data = (mask << (bit + 16)); > > rmask = data | (data >> 16); > > data |= (mux & mask) << bit; > > ret = regmap_update_bits(regmap, reg, rmask, data); > > > > + regmap_read(regmap, reg, &after); > > + > > + if (DOUG) { > > + dev_info(info->dev, > > + "setting mux of GPIO%d-%d to %d; > > %#010x=>%#010x\n", + bank->bank_num, pin, mux, > > reg, before, after); + } > > > > ...and a similar one in rockchip_set_pull(). That showed this at resume > > time: > > > > [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; > > 0x00009400=>0x00009400 > > [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; > > 0x000041aa=>0x000041aa > > [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; > > 0x00005002=>0x00005002 > > [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; > > 0x00000ddc=>0x00000ddc > > [ > > > > Said another way: pinmux and pull isn't actually changing due to the > > hogs. We can see if something else could be changing, but I'd really > > want to be sure you're certain that the hogs are causing you > > problems... > > I cannot say for sure that the hogs are the issue. But I thought that > they were the only pins affected by 981ed1bfbc6c... If this patch can > affect other pins, then I'm probably barking up the wrong tree. On Kevin I see something like [ 60.764129] cros-ec-spi spi2.0: spi transfer failed: -108 [ 60.764132] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 [ 60.764136] cros-ec-spi spi2.0: Command xfer error (err:-108) [ 60.764365] cros-ec-spi spi2.0: spi transfer failed: -108 [ 60.764368] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 [ 60.764371] cros-ec-spi spi2.0: Command xfer error (err:-108) on resume with my current for-next. So maybe your hack just happened to change some timing during resume? Suspend/Resume also disconnects my usb-ethernet, making me lose my nfsroot, so I can test this once every boot only.
On 06/03/18 18:49, Heiko Stübner wrote: > Am Dienstag, 6. März 2018, 19:15:18 CET schrieb Marc Zyngier: >> Hi Doug, >> >> On 06/03/18 18:00, Doug Anderson wrote: >>> Hi, >>> >>> On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: >>>> Hi all, >>>> >>>> On 01/03/18 08:43, Heiko Stübner wrote: >>>>> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: >>>>>> Back in the early days when gru devices were still under development >>>>>> we found an issue where the WiFi reset line needed to be configured as >>>>>> early as possible during the boot process to avoid the WiFi module >>>>>> being in a bad state. >>>>>> >>>>>> We found that the way to get the kernel to do this in the earliest >>>>>> possible place was to configure this line in the pinctrl hogs, so >>>>>> that's what we did. For some history here you can see >>>>>> <http://crosreview.com/368770>. After the time that change landed in >>>>>> the kernel, we landed a firmware change to configure this line even >>>>>> earlier. See <http://crosreview.com/399919>. However, even after the >>>>>> firmware change landed we kept the kernel change to deal with the fact >>>>>> that some people working on devices might take a little while to >>>>>> update their firmware. >>>>>> >>>>>> At this there are definitely zero devices out in the wild that have >>>>>> firmware without the fix in it. Specifically looking in the firmware >>>>>> branch several critically important fixes for memory stability landed >>>>>> after the patch in coreboot and I know we didn't ship without those. >>>>>> Thus, by now, everyone should have the new firmware and it's safe to >>>>>> not have the kernel set this up in a pinctrl hog. >>>>>> >>>>>> Historically, even though it wasn't needed to have this in a pinctrl >>>>>> hog, we still kept it since it didn't hurt. Pinctrl would apply the >>>>>> default hog at bootup and then would never touch things again. That >>>>>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >>>>>> during suspend/resume"). After that commit then we'll re-apply the >>>>>> default hog at resume time and that can screw up the reset state of >>>>>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >>>>>> then the whole system can go haywire. That's what was happening. >>>>>> Specifically you'd resume a rk3399-gru-* device and it would mostly >>>>>> resume, then would crash with some crazy weird crash. >>>>>> >>>>>> One could say, perhaps, that the recent pinctrl change was at fault >>>>>> (and should be fixed) since it changed behavior. ...but that's not >>>>>> really true. The device tree for rk3399-gru is really to blame. >>>>>> Specifically since the pinctrl is defined in the hog and not in the >>>>>> "wlan-pd-n" node then the actual user of this pin doesn't have a >>>>>> pinctrl entry for it. That's bad. >>>>>> >>>>>> Let's fix our problems by just moving the control of >>>>>> "wlan_module_reset_l pinctrl" out of the hog and put them in the >>>>>> proper place. >>>>>> >>>>>> NOTE: in theory, I think it should actually be possible to have a pin >>>>>> controlled _both_ by the hog and by an actual device. Once the device >>>>>> claims the pin I think the hog is supposed to let go. I'm not 100% >>>>>> sure that this works and in any case this solution would be more >>>>>> complex than is necessary. >>>>>> >>>>>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >>>>>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >>>>>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during >>>>>> suspend/resume") >>>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org> >>>>> >>>>> applied as fix for 4.16 with the 2 Tested-tags >>>> >>>> Sorry to rain on everyone's parade, but further testing shows that this >>>> patch may not be enough to restore a reliable s2r. My initial testing >>>> did show that we were resuming without the VOP errors, but there seem to >>>> be further issues (I'm loosing the keyboard and the trackpad after >>>> resume on Kevin). >>>> >>>> Applying my initial hack makes it work again. I suspect that there are >>>> more hog pins that need tweaking, but I'm a bit out of my depth here. >>> >>> Are you positive you weren't just wearing your lucky hat when you >>> tested your patch and then took it off when you tested mine? As far >> >> So far, I seem to have a 100% success rate in resuming with my silly >> hack, whist your DT patch alone only gives me a 50% rate at best. >> >>> as I can see the only hogs left on kevin are: >>> &ap_pwroff /* AP will auto-assert this when in S3 */ >>> &clk_32k /* This pin is always 32k on gru boards */ >>> >>> Those map to: >>> ap_pwroff: ap-pwroff { >>> >>> rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; >>> >>> }; >>> >>> clk_32k: clk-32k { >>> >>> rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; >>> >>> }; >>> >>> So I added some printouts at suspend/resume time. Specifically I set >>> a boolean to "true" for the duration rockchip_pinctrl_suspend() and >>> rockchip_pinctrl_resume() and this turned on a printout in >>> rockchip_set_mux(). My printout looked like this (yeah, I know it's a >>> whitespace-damaged patch just to show what I'm doing): >>> >>> + regmap_read(regmap, reg, &before); >>> >>> data = (mask << (bit + 16)); >>> rmask = data | (data >> 16); >>> data |= (mux & mask) << bit; >>> ret = regmap_update_bits(regmap, reg, rmask, data); >>> >>> + regmap_read(regmap, reg, &after); >>> + >>> + if (DOUG) { >>> + dev_info(info->dev, >>> + "setting mux of GPIO%d-%d to %d; >>> %#010x=>%#010x\n", + bank->bank_num, pin, mux, >>> reg, before, after); + } >>> >>> ...and a similar one in rockchip_set_pull(). That showed this at resume >>> time: >>> >>> [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; >>> 0x00009400=>0x00009400 >>> [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; >>> 0x000041aa=>0x000041aa >>> [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; >>> 0x00005002=>0x00005002 >>> [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; >>> 0x00000ddc=>0x00000ddc >>> [ >>> >>> Said another way: pinmux and pull isn't actually changing due to the >>> hogs. We can see if something else could be changing, but I'd really >>> want to be sure you're certain that the hogs are causing you >>> problems... >> >> I cannot say for sure that the hogs are the issue. But I thought that >> they were the only pins affected by 981ed1bfbc6c... If this patch can >> affect other pins, then I'm probably barking up the wrong tree. > > On Kevin I see something like > > [ 60.764129] cros-ec-spi spi2.0: spi transfer failed: -108 > [ 60.764132] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 > [ 60.764136] cros-ec-spi spi2.0: Command xfer error (err:-108) > [ 60.764365] cros-ec-spi spi2.0: spi transfer failed: -108 > [ 60.764368] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 > [ 60.764371] cros-ec-spi spi2.0: Command xfer error (err:-108) > > on resume with my current for-next. So maybe your hack just > happened to change some timing during resume? No, I carry yet another patch to make that one work[1]. > Suspend/Resume also disconnects my usb-ethernet, making me lose my > nfsroot, so I can test this once every boot only. I use my kevin as a "real" laptop, which means it gets suspended/resumed at least 20 times a day, no reboots involved (unless I'm actually testing arm64 code on it). M. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=hack/kevin-4.16&id=4e95dc697ec6b66579f8747c917dfe675bb771b2
Hi, On Tue, Mar 6, 2018 at 10:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: > On 06/03/18 18:49, Heiko Stübner wrote: >> Am Dienstag, 6. März 2018, 19:15:18 CET schrieb Marc Zyngier: >>> Hi Doug, >>> >>> On 06/03/18 18:00, Doug Anderson wrote: >>>> Hi, >>>> >>>> On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: >>>>> Hi all, >>>>> >>>>> On 01/03/18 08:43, Heiko Stübner wrote: >>>>>> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: >>>>>>> Back in the early days when gru devices were still under development >>>>>>> we found an issue where the WiFi reset line needed to be configured as >>>>>>> early as possible during the boot process to avoid the WiFi module >>>>>>> being in a bad state. >>>>>>> >>>>>>> We found that the way to get the kernel to do this in the earliest >>>>>>> possible place was to configure this line in the pinctrl hogs, so >>>>>>> that's what we did. For some history here you can see >>>>>>> <http://crosreview.com/368770>. After the time that change landed in >>>>>>> the kernel, we landed a firmware change to configure this line even >>>>>>> earlier. See <http://crosreview.com/399919>. However, even after the >>>>>>> firmware change landed we kept the kernel change to deal with the fact >>>>>>> that some people working on devices might take a little while to >>>>>>> update their firmware. >>>>>>> >>>>>>> At this there are definitely zero devices out in the wild that have >>>>>>> firmware without the fix in it. Specifically looking in the firmware >>>>>>> branch several critically important fixes for memory stability landed >>>>>>> after the patch in coreboot and I know we didn't ship without those. >>>>>>> Thus, by now, everyone should have the new firmware and it's safe to >>>>>>> not have the kernel set this up in a pinctrl hog. >>>>>>> >>>>>>> Historically, even though it wasn't needed to have this in a pinctrl >>>>>>> hog, we still kept it since it didn't hurt. Pinctrl would apply the >>>>>>> default hog at bootup and then would never touch things again. That >>>>>>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states >>>>>>> during suspend/resume"). After that commit then we'll re-apply the >>>>>>> default hog at resume time and that can screw up the reset state of >>>>>>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way >>>>>>> then the whole system can go haywire. That's what was happening. >>>>>>> Specifically you'd resume a rk3399-gru-* device and it would mostly >>>>>>> resume, then would crash with some crazy weird crash. >>>>>>> >>>>>>> One could say, perhaps, that the recent pinctrl change was at fault >>>>>>> (and should be fixed) since it changed behavior. ...but that's not >>>>>>> really true. The device tree for rk3399-gru is really to blame. >>>>>>> Specifically since the pinctrl is defined in the hog and not in the >>>>>>> "wlan-pd-n" node then the actual user of this pin doesn't have a >>>>>>> pinctrl entry for it. That's bad. >>>>>>> >>>>>>> Let's fix our problems by just moving the control of >>>>>>> "wlan_module_reset_l pinctrl" out of the hog and put them in the >>>>>>> proper place. >>>>>>> >>>>>>> NOTE: in theory, I think it should actually be possible to have a pin >>>>>>> controlled _both_ by the hog and by an actual device. Once the device >>>>>>> claims the pin I think the hog is supposed to let go. I'm not 100% >>>>>>> sure that this works and in any case this solution would be more >>>>>>> complex than is necessary. >>>>>>> >>>>>>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> >>>>>>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") >>>>>>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during >>>>>>> suspend/resume") >>>>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org> >>>>>> >>>>>> applied as fix for 4.16 with the 2 Tested-tags >>>>> >>>>> Sorry to rain on everyone's parade, but further testing shows that this >>>>> patch may not be enough to restore a reliable s2r. My initial testing >>>>> did show that we were resuming without the VOP errors, but there seem to >>>>> be further issues (I'm loosing the keyboard and the trackpad after >>>>> resume on Kevin). >>>>> >>>>> Applying my initial hack makes it work again. I suspect that there are >>>>> more hog pins that need tweaking, but I'm a bit out of my depth here. >>>> >>>> Are you positive you weren't just wearing your lucky hat when you >>>> tested your patch and then took it off when you tested mine? As far >>> >>> So far, I seem to have a 100% success rate in resuming with my silly >>> hack, whist your DT patch alone only gives me a 50% rate at best. >>> >>>> as I can see the only hogs left on kevin are: >>>> &ap_pwroff /* AP will auto-assert this when in S3 */ >>>> &clk_32k /* This pin is always 32k on gru boards */ >>>> >>>> Those map to: >>>> ap_pwroff: ap-pwroff { >>>> >>>> rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; >>>> >>>> }; >>>> >>>> clk_32k: clk-32k { >>>> >>>> rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; >>>> >>>> }; >>>> >>>> So I added some printouts at suspend/resume time. Specifically I set >>>> a boolean to "true" for the duration rockchip_pinctrl_suspend() and >>>> rockchip_pinctrl_resume() and this turned on a printout in >>>> rockchip_set_mux(). My printout looked like this (yeah, I know it's a >>>> whitespace-damaged patch just to show what I'm doing): >>>> >>>> + regmap_read(regmap, reg, &before); >>>> >>>> data = (mask << (bit + 16)); >>>> rmask = data | (data >> 16); >>>> data |= (mux & mask) << bit; >>>> ret = regmap_update_bits(regmap, reg, rmask, data); >>>> >>>> + regmap_read(regmap, reg, &after); >>>> + >>>> + if (DOUG) { >>>> + dev_info(info->dev, >>>> + "setting mux of GPIO%d-%d to %d; >>>> %#010x=>%#010x\n", + bank->bank_num, pin, mux, >>>> reg, before, after); + } >>>> >>>> ...and a similar one in rockchip_set_pull(). That showed this at resume >>>> time: >>>> >>>> [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; >>>> 0x00009400=>0x00009400 >>>> [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; >>>> 0x000041aa=>0x000041aa >>>> [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; >>>> 0x00005002=>0x00005002 >>>> [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; >>>> 0x00000ddc=>0x00000ddc >>>> [ >>>> >>>> Said another way: pinmux and pull isn't actually changing due to the >>>> hogs. We can see if something else could be changing, but I'd really >>>> want to be sure you're certain that the hogs are causing you >>>> problems... >>> >>> I cannot say for sure that the hogs are the issue. But I thought that >>> they were the only pins affected by 981ed1bfbc6c... If this patch can >>> affect other pins, then I'm probably barking up the wrong tree. >> >> On Kevin I see something like >> >> [ 60.764129] cros-ec-spi spi2.0: spi transfer failed: -108 >> [ 60.764132] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 >> [ 60.764136] cros-ec-spi spi2.0: Command xfer error (err:-108) >> [ 60.764365] cros-ec-spi spi2.0: spi transfer failed: -108 >> [ 60.764368] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 >> [ 60.764371] cros-ec-spi spi2.0: Command xfer error (err:-108) >> >> on resume with my current for-next. So maybe your hack just >> happened to change some timing during resume? > > No, I carry yet another patch to make that one work[1]. > >> Suspend/Resume also disconnects my usb-ethernet, making me lose my >> nfsroot, so I can test this once every boot only. Yeah, it kills usb-ethernet for me too. ...so I can suspend/resume once and then the next suspend fails with a bunch of usb errors. This is based on your "hack/kevin-4.16", which looks like: e7934b797f4b (HEAD, linux_arm-platforms/hack/kevin-4.16) arm64: dts: rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset) 9b9988414f44 irqchip/gic-v3: Allow LPIs to be disabled from the command line d5e06c686858 iommu/rockchip: Perform a reset on shutdown 55b36a99626f drm/rockchip: Don't use spin_lock_irqsave in interrupt context 7d72d7e57c2c drm/rockchip: Do not use memcpy for MMIO addresses 236afcd0425c drm/rockchip: Clear all interrupts before requesting the IRQ 31608ae0d3fc arm64: Enable dynamic sched_domain flag setting 1b255643cdc3 drivers/base/arch_topology: Dynamic sched_domain flag detection 2caca1b31f89 arm64: rk3399: Add capacity-dmips-mhz attributes 36ced612e4d3 mfd: cros_ec: add RTC as mfd subdevice 4e95dc697ec6 spi: rockchip: Convert to late and early system PM callbacks d27d6da92086 drm/rockchip: analogix_dp: Ensure that the bridge is powered before poking it bcce86412ec1 arm64: DT: rk3399: Add missing EDP clock c04436bd57b8 bootloader cmdline bb8a4d168d58 build hacks 26e04c84de6c kevin: build stuff 6915015e30db kevin: defconfig 4a3928c6f8a5 (tag: v4.16-rc3) Linux 4.16-rc3 Actually, I also see some errors reading thermal channels, but I wonder if perhaps USB is causing some sort of interrupt storm and that happens to randomly take out whatever device was trying to talk at the same time? [ 83.718999] read channel() error: -110 [ 83.723275] thermal thermal_zone3: failed to read out thermal zone (-110) [ 83.822810] read channel() error: -110 [ 83.827048] thermal thermal_zone2: failed to read out thermal zone (-110) [ 84.862810] read channel() error: -110 [ 84.867051] thermal thermal_zone3: failed to read out thermal zone (-110) [ 84.990800] read channel() error: -110 [ 84.995034] thermal thermal_zone2: failed to read out thermal zone (-110) [ 86.110817] read channel() error: -110 [ 86.115259] thermal thermal_zone3: failed to read out thermal zone (-110) [ 86.214990] read channel() error: -110 [ 86.219440] thermal thermal_zone2: failed to read out thermal zone (-110) [ 87.295049] read channel() error: -110 [ 87.299316] thermal thermal_zone3: failed to read out thermal zone (-110) [ 87.398805] read channel() error: -110 [ 87.403040] thermal thermal_zone2: failed to read out thermal zone (-110) [ 88.510808] read channel() error: -110 [ 88.515053] thermal thermal_zone2: failed to read out thermal zone (-110) [ 88.646810] read channel() error: -110 [ 88.651250] thermal thermal_zone3: failed to read out thermal zone (-110) [ 89.874606] xhci-hcd xhci-hcd.2.auto: Abort failed to stop command ring: -110 [ 89.896134] xhci-hcd xhci-hcd.2.auto: xHCI host controller not responding, assume dead [ 89.905130] xhci-hcd xhci-hcd.2.auto: HC died; cleaning up [ 89.911491] xhci-hcd xhci-hcd.2.auto: Timeout while waiting for configure endpoint command > I use my kevin as a "real" laptop, which means it gets suspended/resumed > at least 20 times a day, no reboots involved (unless I'm actually testing > arm64 code on it). Enric: I know you've been working with Kevin stuff a lot. Any chance you reproduce Marc's failures and also see that it's fixed with his hack patch? Marc: have you posted actual logs for the failing case (after picking my dts fix) somewhere? Unfortunately, I don't think I'll be able to devote a ton more time to debugging this right now. :( -Doug
Am Mittwoch, 7. März 2018, 05:54:32 CET schrieb Doug Anderson: > Hi, > > On Tue, Mar 6, 2018 at 10:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: > > On 06/03/18 18:49, Heiko Stübner wrote: > >> Am Dienstag, 6. März 2018, 19:15:18 CET schrieb Marc Zyngier: > >>> Hi Doug, > >>> > >>> On 06/03/18 18:00, Doug Anderson wrote: > >>>> Hi, > >>>> > >>>> On Tue, Mar 6, 2018 at 3:58 AM, Marc Zyngier <marc.zyngier@arm.com> wrote: > >>>>> Hi all, > >>>>> > >>>>> On 01/03/18 08:43, Heiko Stübner wrote: > >>>>>> Am Dienstag, 27. Februar 2018, 21:47:11 CET schrieb Douglas Anderson: > >>>>>>> Back in the early days when gru devices were still under development > >>>>>>> we found an issue where the WiFi reset line needed to be configured > >>>>>>> as > >>>>>>> early as possible during the boot process to avoid the WiFi module > >>>>>>> being in a bad state. > >>>>>>> > >>>>>>> We found that the way to get the kernel to do this in the earliest > >>>>>>> possible place was to configure this line in the pinctrl hogs, so > >>>>>>> that's what we did. For some history here you can see > >>>>>>> <http://crosreview.com/368770>. After the time that change landed > >>>>>>> in > >>>>>>> the kernel, we landed a firmware change to configure this line even > >>>>>>> earlier. See <http://crosreview.com/399919>. However, even after > >>>>>>> the > >>>>>>> firmware change landed we kept the kernel change to deal with the > >>>>>>> fact > >>>>>>> that some people working on devices might take a little while to > >>>>>>> update their firmware. > >>>>>>> > >>>>>>> At this there are definitely zero devices out in the wild that have > >>>>>>> firmware without the fix in it. Specifically looking in the > >>>>>>> firmware > >>>>>>> branch several critically important fixes for memory stability > >>>>>>> landed > >>>>>>> after the patch in coreboot and I know we didn't ship without those. > >>>>>>> Thus, by now, everyone should have the new firmware and it's safe to > >>>>>>> not have the kernel set this up in a pinctrl hog. > >>>>>>> > >>>>>>> Historically, even though it wasn't needed to have this in a pinctrl > >>>>>>> hog, we still kept it since it didn't hurt. Pinctrl would apply the > >>>>>>> default hog at bootup and then would never touch things again. That > >>>>>>> all changed with commit 981ed1bfbc6c ("pinctrl: Really force states > >>>>>>> during suspend/resume"). After that commit then we'll re-apply the > >>>>>>> default hog at resume time and that can screw up the reset state of > >>>>>>> WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong > >>>>>>> way > >>>>>>> then the whole system can go haywire. That's what was happening. > >>>>>>> Specifically you'd resume a rk3399-gru-* device and it would mostly > >>>>>>> resume, then would crash with some crazy weird crash. > >>>>>>> > >>>>>>> One could say, perhaps, that the recent pinctrl change was at fault > >>>>>>> (and should be fixed) since it changed behavior. ...but that's not > >>>>>>> really true. The device tree for rk3399-gru is really to blame. > >>>>>>> Specifically since the pinctrl is defined in the hog and not in the > >>>>>>> "wlan-pd-n" node then the actual user of this pin doesn't have a > >>>>>>> pinctrl entry for it. That's bad. > >>>>>>> > >>>>>>> Let's fix our problems by just moving the control of > >>>>>>> "wlan_module_reset_l pinctrl" out of the hog and put them in the > >>>>>>> proper place. > >>>>>>> > >>>>>>> NOTE: in theory, I think it should actually be possible to have a > >>>>>>> pin > >>>>>>> controlled _both_ by the hog and by an actual device. Once the > >>>>>>> device > >>>>>>> claims the pin I think the hog is supposed to let go. I'm not 100% > >>>>>>> sure that this works and in any case this solution would be more > >>>>>>> complex than is necessary. > >>>>>>> > >>>>>>> Reported-by: Marc Zyngier <marc.zyngier@arm.com> > >>>>>>> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") > >>>>>>> Fixes: 981ed1bfbc6c ("pinctrl: Really force states during > >>>>>>> suspend/resume") > >>>>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org> > >>>>>> > >>>>>> applied as fix for 4.16 with the 2 Tested-tags > >>>>> > >>>>> Sorry to rain on everyone's parade, but further testing shows that > >>>>> this > >>>>> patch may not be enough to restore a reliable s2r. My initial testing > >>>>> did show that we were resuming without the VOP errors, but there seem > >>>>> to > >>>>> be further issues (I'm loosing the keyboard and the trackpad after > >>>>> resume on Kevin). > >>>>> > >>>>> Applying my initial hack makes it work again. I suspect that there are > >>>>> more hog pins that need tweaking, but I'm a bit out of my depth here. > >>>> > >>>> Are you positive you weren't just wearing your lucky hat when you > >>>> tested your patch and then took it off when you tested mine? As far > >>> > >>> So far, I seem to have a 100% success rate in resuming with my silly > >>> hack, whist your DT patch alone only gives me a 50% rate at best. > >>> > >>>> as I can see the only hogs left on kevin are: > >>>> &ap_pwroff /* AP will auto-assert this when in S3 */ > >>>> &clk_32k /* This pin is always 32k on gru boards */ > >>>> > >>>> Those map to: > >>>> ap_pwroff: ap-pwroff { > >>>> > >>>> rockchip,pins = <1 5 RK_FUNC_1 &pcfg_pull_none>; > >>>> > >>>> }; > >>>> > >>>> clk_32k: clk-32k { > >>>> > >>>> rockchip,pins = <0 0 RK_FUNC_2 &pcfg_pull_none>; > >>>> > >>>> }; > >>>> > >>>> So I added some printouts at suspend/resume time. Specifically I set > >>>> a boolean to "true" for the duration rockchip_pinctrl_suspend() and > >>>> rockchip_pinctrl_resume() and this turned on a printout in > >>>> rockchip_set_mux(). My printout looked like this (yeah, I know it's a > >>>> whitespace-damaged patch just to show what I'm doing): > >>>> > >>>> + regmap_read(regmap, reg, &before); > >>>> > >>>> data = (mask << (bit + 16)); > >>>> rmask = data | (data >> 16); > >>>> data |= (mux & mask) << bit; > >>>> ret = regmap_update_bits(regmap, reg, rmask, data); > >>>> > >>>> + regmap_read(regmap, reg, &after); > >>>> + > >>>> + if (DOUG) { > >>>> + dev_info(info->dev, > >>>> + "setting mux of GPIO%d-%d to %d; > >>>> %#010x=>%#010x\n", + bank->bank_num, pin, mux, > >>>> reg, before, after); + } > >>>> > >>>> ...and a similar one in rockchip_set_pull(). That showed this at > >>>> resume > >>>> time: > >>>> > >>>> [ 62.284427] rockchip-pinctrl pinctrl: setting mux of GPIO1-5 to 1; > >>>> 0x00009400=>0x00009400 > >>>> [ 62.294423] rockchip-pinctrl pinctrl: setting pull of GPIO1-5; > >>>> 0x000041aa=>0x000041aa > >>>> [ 62.303343] rockchip-pinctrl pinctrl: setting mux of GPIO0-0 to 2; > >>>> 0x00005002=>0x00005002 > >>>> [ 62.313240] rockchip-pinctrl pinctrl: setting pull of GPIO0-0; > >>>> 0x00000ddc=>0x00000ddc > >>>> [ > >>>> > >>>> Said another way: pinmux and pull isn't actually changing due to the > >>>> hogs. We can see if something else could be changing, but I'd really > >>>> want to be sure you're certain that the hogs are causing you > >>>> problems... > >>> > >>> I cannot say for sure that the hogs are the issue. But I thought that > >>> they were the only pins affected by 981ed1bfbc6c... If this patch can > >>> affect other pins, then I'm probably barking up the wrong tree. > >> > >> On Kevin I see something like > >> > >> [ 60.764129] cros-ec-spi spi2.0: spi transfer failed: -108 > >> [ 60.764132] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 > >> [ 60.764136] cros-ec-spi spi2.0: Command xfer error (err:-108) > >> [ 60.764365] cros-ec-spi spi2.0: spi transfer failed: -108 > >> [ 60.764368] cros-ec-spi spi2.0: cs-deassert spi transfer failed: -108 > >> [ 60.764371] cros-ec-spi spi2.0: Command xfer error (err:-108) > >> > >> on resume with my current for-next. So maybe your hack just > >> happened to change some timing during resume? > > > > No, I carry yet another patch to make that one work[1]. > > > >> Suspend/Resume also disconnects my usb-ethernet, making me lose my > >> nfsroot, so I can test this once every boot only. > > Yeah, it kills usb-ethernet for me too. ...so I can suspend/resume > once and then the next suspend fails with a bunch of usb errors. This > is based on your "hack/kevin-4.16", which looks like: > > e7934b797f4b (HEAD, linux_arm-platforms/hack/kevin-4.16) arm64: dts: > rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset) > 9b9988414f44 irqchip/gic-v3: Allow LPIs to be disabled from the command line > d5e06c686858 iommu/rockchip: Perform a reset on shutdown > 55b36a99626f drm/rockchip: Don't use spin_lock_irqsave in interrupt context > 7d72d7e57c2c drm/rockchip: Do not use memcpy for MMIO addresses > 236afcd0425c drm/rockchip: Clear all interrupts before requesting the IRQ > 31608ae0d3fc arm64: Enable dynamic sched_domain flag setting > 1b255643cdc3 drivers/base/arch_topology: Dynamic sched_domain flag detection > 2caca1b31f89 arm64: rk3399: Add capacity-dmips-mhz attributes > 36ced612e4d3 mfd: cros_ec: add RTC as mfd subdevice > 4e95dc697ec6 spi: rockchip: Convert to late and early system PM callbacks > d27d6da92086 drm/rockchip: analogix_dp: Ensure that the bridge is > powered before poking it > bcce86412ec1 arm64: DT: rk3399: Add missing EDP clock > c04436bd57b8 bootloader cmdline > bb8a4d168d58 build hacks > 26e04c84de6c kevin: build stuff > 6915015e30db kevin: defconfig > 4a3928c6f8a5 (tag: v4.16-rc3) Linux 4.16-rc3 > > > Actually, I also see some errors reading thermal channels, but I > wonder if perhaps USB is causing some sort of interrupt storm and that > happens to randomly take out whatever device was trying to talk at the > same time? > > [ 83.718999] read channel() error: -110 > [ 83.723275] thermal thermal_zone3: failed to read out thermal zone (-110) > [ 83.822810] read channel() error: -110 > [ 83.827048] thermal thermal_zone2: failed to read out thermal zone (-110) > [ 84.862810] read channel() error: -110 > [ 84.867051] thermal thermal_zone3: failed to read out thermal zone (-110) > [ 84.990800] read channel() error: -110 > [ 84.995034] thermal thermal_zone2: failed to read out thermal zone (-110) > [ 86.110817] read channel() error: -110 > [ 86.115259] thermal thermal_zone3: failed to read out thermal zone (-110) > [ 86.214990] read channel() error: -110 > [ 86.219440] thermal thermal_zone2: failed to read out thermal zone (-110) > [ 87.295049] read channel() error: -110 > [ 87.299316] thermal thermal_zone3: failed to read out thermal zone (-110) > [ 87.398805] read channel() error: -110 > [ 87.403040] thermal thermal_zone2: failed to read out thermal zone (-110) > [ 88.510808] read channel() error: -110 > [ 88.515053] thermal thermal_zone2: failed to read out thermal zone (-110) > [ 88.646810] read channel() error: -110 > [ 88.651250] thermal thermal_zone3: failed to read out thermal zone (-110) > [ 89.874606] xhci-hcd xhci-hcd.2.auto: Abort failed to stop command ring: > -110 [ 89.896134] xhci-hcd xhci-hcd.2.auto: xHCI host controller not > responding, assume dead > [ 89.905130] xhci-hcd xhci-hcd.2.auto: HC died; cleaning up > [ 89.911491] xhci-hcd xhci-hcd.2.auto: Timeout while waiting for > configure endpoint command > > > I use my kevin as a "real" laptop, which means it gets suspended/resumed > > at least 20 times a day, no reboots involved (unless I'm actually testing > > arm64 code on it). > > Enric: I know you've been working with Kevin stuff a lot. Any chance > you reproduce Marc's failures and also see that it's fixed with his > hack patch? > > Marc: have you posted actual logs for the failing case (after picking > my dts fix) somewhere? just to interject, I'd guess that this is more an issue on Marc's branch? I.e. I was testing the big hunk of analogix patches just now and did try suspending as well and apart from the obvious fixes for spi / clk the system suspends + resumes and I also get the keyboard back in working condition after resume. Of course my nfsroot still dies so I cannot test multiple suspend cycles, but at least once per boot it works. Test-branch in question is at https://github.com/mmind/linux-rockchip/tree/tmp/testing_20180312 Base is my devel/upstream, which is master + my for-next + drmmisc + stuff that already went into some maintainer tree and looked necessary. Heiko
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi index 6e50768a34ce..9ad54751d0d8 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi @@ -406,8 +406,9 @@ wlan_pd_n: wlan-pd-n { compatible = "regulator-fixed"; regulator-name = "wlan_pd_n"; + pinctrl-names = "default"; + pinctrl-0 = <&wlan_module_reset_l>; - /* Note the wlan_module_reset_l pinctrl */ enable-active-high; gpio = <&gpio1 11 GPIO_ACTIVE_HIGH>; @@ -988,12 +989,6 @@ ap_i2c_audio: &i2c8 { pinctrl-0 = < &ap_pwroff /* AP will auto-assert this when in S3 */ &clk_32k /* This pin is always 32k on gru boards */ - - /* - * We want this driven low ASAP; firmware should help us, but - * we can help ourselves too. - */ - &wlan_module_reset_l >; pcfg_output_low: pcfg-output-low { @@ -1173,12 +1168,7 @@ ap_i2c_audio: &i2c8 { }; wlan_module_reset_l: wlan-module-reset-l { - /* - * We want this driven low ASAP (As {Soon,Strongly} As - * Possible), to avoid leakage through the powered-down - * WiFi. - */ - rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_output_low>; + rockchip,pins = <1 11 RK_FUNC_GPIO &pcfg_pull_none>; }; bt_host_wake_l: bt-host-wake-l {
Back in the early days when gru devices were still under development we found an issue where the WiFi reset line needed to be configured as early as possible during the boot process to avoid the WiFi module being in a bad state. We found that the way to get the kernel to do this in the earliest possible place was to configure this line in the pinctrl hogs, so that's what we did. For some history here you can see <http://crosreview.com/368770>. After the time that change landed in the kernel, we landed a firmware change to configure this line even earlier. See <http://crosreview.com/399919>. However, even after the firmware change landed we kept the kernel change to deal with the fact that some people working on devices might take a little while to update their firmware. At this there are definitely zero devices out in the wild that have firmware without the fix in it. Specifically looking in the firmware branch several critically important fixes for memory stability landed after the patch in coreboot and I know we didn't ship without those. Thus, by now, everyone should have the new firmware and it's safe to not have the kernel set this up in a pinctrl hog. Historically, even though it wasn't needed to have this in a pinctrl hog, we still kept it since it didn't hurt. Pinctrl would apply the default hog at bootup and then would never touch things again. That all changed with commit 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume"). After that commit then we'll re-apply the default hog at resume time and that can screw up the reset state of WiFi. ...and on rk3399 if you touch a device on PCIe in the wrong way then the whole system can go haywire. That's what was happening. Specifically you'd resume a rk3399-gru-* device and it would mostly resume, then would crash with some crazy weird crash. One could say, perhaps, that the recent pinctrl change was at fault (and should be fixed) since it changed behavior. ...but that's not really true. The device tree for rk3399-gru is really to blame. Specifically since the pinctrl is defined in the hog and not in the "wlan-pd-n" node then the actual user of this pin doesn't have a pinctrl entry for it. That's bad. Let's fix our problems by just moving the control of "wlan_module_reset_l pinctrl" out of the hog and put them in the proper place. NOTE: in theory, I think it should actually be possible to have a pin controlled _both_ by the hog and by an actual device. Once the device claims the pin I think the hog is supposed to let go. I'm not 100% sure that this works and in any case this solution would be more complex than is necessary. Reported-by: Marc Zyngier <marc.zyngier@arm.com> Fixes: 48f4d9796d99 ("arm64: dts: rockchip: add Gru/Kevin DTS") Fixes: 981ed1bfbc6c ("pinctrl: Really force states during suspend/resume") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)