Message ID | 20230428223500.23337-1-jim2101024@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | PCI: brcmstb: Configure appropriate HW CLKREQ# mode | expand |
Hi, Jim Quinlan <jim2101024@gmail.com> (2023-04-28): > Note: (a) With this series, all downstream devices should work w/o DT changes. > Only if the user desires L1SS savings and has an L1SS-capable > device is a DT change required (brcm,enable-l1ss). I'm still seeing some problems, but tweaking two things can lead to massive improvements: - setting brcm,enable-l1ss; - upgrading the CM4's EEPROM. Seeing how patch #4 was about the bootloader, I've prepared an updated test image the following way: - Kernel: 76f598ba7d8e2bfb4855b5298caedd5af0c374a8 + this series. - Userland: Debian testing as of 2023-05-01. - Serial console set as previously. - Bootloader: latest release upstream, 1.20230405 (That is: bootcode.bin, *.dat, *.elf) Then, seeing how setting brcm,enable-l1ss was helping some test cases, I've extended testing to be quite systematic, using those components: - CM4 IO Board (always the same). - 1 CM4 among: - CM4 Lite Rev 1.0 (extra test, storage much quicker to deploy) - CM4 8/32 Rev 1.0 (as before) - CM4 4/32 Rev 1.1 (as before) - 1 PCIe card among: - 006 = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006 → based on Renesas UPD720201/UPD720202 → CONFIG_USB_XHCI_PCI_RENESAS=m → /lib/firmware/renesas_usb_fw.mem - 006S = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006S → based on Renesas UPD720201/UPD720202 → CONFIG_USB_XHCI_PCI_RENESAS=m → /lib/firmware/renesas_usb_fw.mem - VIA = Waveshare PCIe-to-multiple-USB adapter (no obvious reference) → based on VIA VL805/806 - 1 Kingston DataTraveler G4 32G (always the same), plugged on one of the USB port of the PCIe card being tested. I've tested a cold boot with each combination, first without touching the DTB at all (pristine), then after enabling L1SS. The results are as follows, legend is below. +----------+----------+----------+ | 006 | 006S | VIA | +------------------------+----------+----------+----------+ | 1. CM4 Lite Rev 1.0 | KP* | KP* | OK, 72 | | pristine | | | | +------------------------+----------+----------+----------+ | 2. CM4 Lite Rev 1.0 | boot + | OK, 72 | OK, 72 | | + brcm,enable-l1ss | timeouts | | | +------------------------+----------+----------+----------+ | 3. CM4 8/32 Rev 1.0 | KP | KP | KP | | pristine | | | | +------------------------+----------+----------+----------+ | 4. CM4 8/32 Rev 1.0 | OK, 69 | OK, 69 | OK, 69 | | + brcm,enable-l1ss | | | | +------------------------+----------+----------+----------+ | 5. CM4 4/32 Rev 1.1 | boot + | OK, 69 | OK, 69 | | pristine | timeouts | | | +------------------------+----------+----------+----------+ | 6. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | | + brcm,enable-l1ss | | | | +------------------------+----------+----------+----------+ Legend: - OK, XXX = boots fine, memory stick visible, and reading from it using `dd if=/dev/sda of=/dev/null bs=8M status=progress` for a few seconds gives an XXX MB/s transfer rate. - KP = kernel panic involving brcm_pcie_probe(). - KP* = probably the same kernel panic; unfortunately, the serial console hardly works, but booting for 1 minute, shutting down for 10 seconds, in a loop… ends up showing excerpts from a trace, one word or sometimes several lines at a time. Since brcm_pcie_driver_init() or SError appeared, getting the same trace looks probable to me. [See also the very end of the mail.] - boot + timeouts = the system boots, the memory stick is not visible though, as XHCI timeouts show up, e.g.: [ 34.144748] xhci_hcd 0000:01:00.0: Timeout while waiting for setup device command [ 34.357273] usb 3-1.4: Device not responding to setup address. [ 34.568429] usb 3-1.4: device not accepting address 6, error -71 [ 34.575730] usb 3-1-port4: unable to enumerate USB device So it looks like *for these combinations* setting brcm,enable-l1ss is only helping, even if one particular thing remains not fully fonctional (but at least boots now): CM4 Lite Rev 1.0 + 006 card. And since you mentioned the EEPROM topic off-list, I've investigated that part as well. It turns out that what *seemed* (at least to my non-expert eyes) sort of related to the hardware revisions… could have actually be directly linked to the EEPROM version shipped with each Compute Module. After deploying the relevant tooling, and based on the reported timestamps, here are the relevant EEPROM filenames in the rpi-eeprom repository (https://github.com/raspberrypi/rpi-eeprom): - CM4 Lite Rev 1.0 [lines 1-2] → firmware/stable/pieeprom-2021-02-16.bin - CM4 8/32 Rev 1.0 [lines 3-4] → firmware/stable/pieeprom-2021-02-16.bin - CM4 4/32 Rev 1.1 [lines 5-6] → firmware/stable/pieeprom-2021-12-02.bin Try to upgrade a first CM4 Lite to the latest version (2023-01-11) gave solid results (which I'm not including in this report as I was only in exploratory mode, with a slightly different Kingston DataTraveler anyway; for reference its EEPROM dated back to 2020, and it seemed to have ever been in some beta state…), so I decided to replicate all the tests above with the very same 3 CM4, upgraded to 2023-01-11. In passing: That might explain why it always felt like later revisions were working “better” than the old ones: being designed + manufactured later, they just ended up being shipped with a newer/better EEPROM? Upgrade: via usbboot (https://github.com/raspberrypi/usbboot) and the recovery procedure (which by default deploys the latest stable version). Results with everyone at 2023-01-11. +----------+----------+----------+ | 006 | 006S | VIA | +------------------------+----------+----------+----------+ | 1. CM4 Lite Rev 1.0 | OK, 83 | OK, 72 | OK, 72 | | pristine | | | | +------------------------+----------+----------+----------+ | 2. CM4 Lite Rev 1.0 | OK, 82 | OK, 72 | OK, 72 | | + brcm,enable-l1ss | | | | +------------------------+----------+----------+----------+ | 3. CM4 8/32 Rev 1.0 | OK, 82 | OK, 69 | OK, 69 | | pristine | | | | +------------------------+----------+----------+----------+ | 4. CM4 8/32 Rev 1.0 | OK, 82 | OK, 69 | OK, 69 | | + brcm,enable-l1ss | | | | +------------------------+----------+----------+----------+ | 5. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | | pristine | | | | +------------------------+----------+----------+----------+ | 6. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | | + brcm,enable-l1ss | | | | +------------------------+----------+----------+----------+ Takeaways: - Upgrading the EEPROM solved all problems; - brcm,enable-l1ss (which used to help) is not needed, as mentioned in your cover letter. Now that I'm a little more familiar with the EEPROM tooling: - It looks like I'm able to downgrade the EEPROM to an earlier version. But I cannot guarantee I can recover exactly the previous state as there are two different things at least: the EEPROM itself and what's called “bootloader config” in vcgencmd). I've seen at least the LED change behaviour (via POWER_OFF_ON_HALT). - Upon downgrading, without brcm,enable-l1ss, the CM4 Lite is indeed showing me a black screen/no logs in the serial console again with either one of the 006/006S cards. - It's possible to specify a boot config file when deploying the EEPROM, and I've tried enabling BOOT_UART on the CM4 Lite. Now I'm getting the kernel panic on the console! Where should I go from here? - Does it make sense to gather a trace for the kernel panic on say two combinations, without brcm,enable-l1ss set: + CM4 Lite Rev 1.0 (old EEPROM) + 006 [first KP* in first table] + CM4 8/32 Rev 1.0 (old EEPROM) + 006 [first KP in first table] then get a trace without your patches, and attach all four resulting files? - Or should one just consider that the very first thing that each and every CM4 user is supposed to do is upgrade their EEPROM? On a personal side, I'm very fine with being told to just upgrade the EEPROM already (and that seems to cover any use case I could think of, and test). But if getting and comparing traces before/after your patches is helpful to you and the wider community, I'm happy to spend some more time testing and gathering details. Cheers,
On Tue, May 2, 2023 at 7:16 PM Cyril Brulebois <kibi@debian.org> wrote: > > Hi, > > Jim Quinlan <jim2101024@gmail.com> (2023-04-28): > > Note: (a) With this series, all downstream devices should work w/o DT changes. > > Only if the user desires L1SS savings and has an L1SS-capable > > device is a DT change required (brcm,enable-l1ss). > > I'm still seeing some problems, but tweaking two things can lead to > massive improvements: > - setting brcm,enable-l1ss; > - upgrading the CM4's EEPROM. > > Seeing how patch #4 was about the bootloader, I've prepared an updated > test image the following way: > - Kernel: 76f598ba7d8e2bfb4855b5298caedd5af0c374a8 + this series. > - Userland: Debian testing as of 2023-05-01. > - Serial console set as previously. > - Bootloader: latest release upstream, 1.20230405 > (That is: bootcode.bin, *.dat, *.elf) > > Then, seeing how setting brcm,enable-l1ss was helping some test cases, > I've extended testing to be quite systematic, using those components: > - CM4 IO Board (always the same). > - 1 CM4 among: > - CM4 Lite Rev 1.0 (extra test, storage much quicker to deploy) > - CM4 8/32 Rev 1.0 (as before) > - CM4 4/32 Rev 1.1 (as before) > - 1 PCIe card among: > - 006 = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006 > → based on Renesas UPD720201/UPD720202 > → CONFIG_USB_XHCI_PCI_RENESAS=m > → /lib/firmware/renesas_usb_fw.mem > - 006S = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006S > → based on Renesas UPD720201/UPD720202 > → CONFIG_USB_XHCI_PCI_RENESAS=m > → /lib/firmware/renesas_usb_fw.mem > - VIA = Waveshare PCIe-to-multiple-USB adapter (no obvious reference) > → based on VIA VL805/806 > - 1 Kingston DataTraveler G4 32G (always the same), plugged on one of the > USB port of the PCIe card being tested. > > I've tested a cold boot with each combination, first without touching the > DTB at all (pristine), then after enabling L1SS. The results are as > follows, legend is below. > > +----------+----------+----------+ > | 006 | 006S | VIA | > +------------------------+----------+----------+----------+ > | 1. CM4 Lite Rev 1.0 | KP* | KP* | OK, 72 | > | pristine | | | | > +------------------------+----------+----------+----------+ > | 2. CM4 Lite Rev 1.0 | boot + | OK, 72 | OK, 72 | > | + brcm,enable-l1ss | timeouts | | | > +------------------------+----------+----------+----------+ > | 3. CM4 8/32 Rev 1.0 | KP | KP | KP | > | pristine | | | | > +------------------------+----------+----------+----------+ > | 4. CM4 8/32 Rev 1.0 | OK, 69 | OK, 69 | OK, 69 | > | + brcm,enable-l1ss | | | | > +------------------------+----------+----------+----------+ > | 5. CM4 4/32 Rev 1.1 | boot + | OK, 69 | OK, 69 | > | pristine | timeouts | | | > +------------------------+----------+----------+----------+ > | 6. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | > | + brcm,enable-l1ss | | | | > +------------------------+----------+----------+----------+ Hello Cyril, I'm confused by your result table above which has a number of failures. Further in your message you say: Takeaways: - Upgrading the EEPROM solved all problems; - brcm,enable-l1ss (which used to help) is not needed [...] May I conclude that if one uses a modern CM4 eeprom that these failures go away? You mentioned in a personal email that at least one of your "CM4" was running a Beta eeprom image. I'm much less concerned about folks having problems with old or pre-release versions of the CM4 eeprom because (a) most of these folks are using Raspian Linux anyway and (b) they can just upgrade their eeprom. Further, the Rpi eeprom is closed-source and my questions on the Rpi forum and Rpi Github have not yet led to any answers about why a different eeprom image is changing the behavior of a clkreq signal. Regards, Jim Quinlan Broadcom STB > > Legend: > - OK, XXX = boots fine, memory stick visible, and reading from it using > `dd if=/dev/sda of=/dev/null bs=8M status=progress` for a few seconds > gives an XXX MB/s transfer rate. > - KP = kernel panic involving brcm_pcie_probe(). > - KP* = probably the same kernel panic; unfortunately, the serial console > hardly works, but booting for 1 minute, shutting down for 10 seconds, > in a loop… ends up showing excerpts from a trace, one word or sometimes > several lines at a time. Since brcm_pcie_driver_init() or SError > appeared, getting the same trace looks probable to me. [See also the > very end of the mail.] > - boot + timeouts = the system boots, the memory stick is not visible > though, as XHCI timeouts show up, e.g.: > > [ 34.144748] xhci_hcd 0000:01:00.0: Timeout while waiting for setup device command > [ 34.357273] usb 3-1.4: Device not responding to setup address. > [ 34.568429] usb 3-1.4: device not accepting address 6, error -71 > [ 34.575730] usb 3-1-port4: unable to enumerate USB device > > So it looks like *for these combinations* setting brcm,enable-l1ss is only > helping, even if one particular thing remains not fully fonctional (but > at least boots now): CM4 Lite Rev 1.0 + 006 card. > > > And since you mentioned the EEPROM topic off-list, I've investigated that > part as well. It turns out that what *seemed* (at least to my non-expert > eyes) sort of related to the hardware revisions… could have actually be > directly linked to the EEPROM version shipped with each Compute Module. > > After deploying the relevant tooling, and based on the reported > timestamps, here are the relevant EEPROM filenames in the rpi-eeprom > repository (https://github.com/raspberrypi/rpi-eeprom): > - CM4 Lite Rev 1.0 [lines 1-2] > → firmware/stable/pieeprom-2021-02-16.bin > - CM4 8/32 Rev 1.0 [lines 3-4] > → firmware/stable/pieeprom-2021-02-16.bin > - CM4 4/32 Rev 1.1 [lines 5-6] > → firmware/stable/pieeprom-2021-12-02.bin > > Try to upgrade a first CM4 Lite to the latest version (2023-01-11) gave > solid results (which I'm not including in this report as I was only in > exploratory mode, with a slightly different Kingston DataTraveler anyway; > for reference its EEPROM dated back to 2020, and it seemed to have ever > been in some beta state…), so I decided to replicate all the tests above with > the very same 3 CM4, upgraded to 2023-01-11. > > In passing: That might explain why it always felt like later revisions > were working “better” than the old ones: being designed + manufactured > later, they just ended up being shipped with a newer/better EEPROM? > > Upgrade: via usbboot (https://github.com/raspberrypi/usbboot) and the > recovery procedure (which by default deploys the latest stable version). > > Results with everyone at 2023-01-11. > > +----------+----------+----------+ > | 006 | 006S | VIA | > +------------------------+----------+----------+----------+ > | 1. CM4 Lite Rev 1.0 | OK, 83 | OK, 72 | OK, 72 | > | pristine | | | | > +------------------------+----------+----------+----------+ > | 2. CM4 Lite Rev 1.0 | OK, 82 | OK, 72 | OK, 72 | > | + brcm,enable-l1ss | | | | > +------------------------+----------+----------+----------+ > | 3. CM4 8/32 Rev 1.0 | OK, 82 | OK, 69 | OK, 69 | > | pristine | | | | > +------------------------+----------+----------+----------+ > | 4. CM4 8/32 Rev 1.0 | OK, 82 | OK, 69 | OK, 69 | > | + brcm,enable-l1ss | | | | > +------------------------+----------+----------+----------+ > | 5. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | > | pristine | | | | > +------------------------+----------+----------+----------+ > | 6. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | > | + brcm,enable-l1ss | | | | > +------------------------+----------+----------+----------+ > > Takeaways: > - Upgrading the EEPROM solved all problems; > - brcm,enable-l1ss (which used to help) is not needed, as mentioned in > your cover letter. > > Now that I'm a little more familiar with the EEPROM tooling: > - It looks like I'm able to downgrade the EEPROM to an earlier version. > But I cannot guarantee I can recover exactly the previous state as > there are two different things at least: the EEPROM itself and what's > called “bootloader config” in vcgencmd). I've seen at least the LED > change behaviour (via POWER_OFF_ON_HALT). > - Upon downgrading, without brcm,enable-l1ss, the CM4 Lite is indeed > showing me a black screen/no logs in the serial console again with > either one of the 006/006S cards. > - It's possible to specify a boot config file when deploying the EEPROM, > and I've tried enabling BOOT_UART on the CM4 Lite. Now I'm getting the > kernel panic on the console! > > Where should I go from here? > - Does it make sense to gather a trace for the kernel panic on say two > combinations, without brcm,enable-l1ss set: > + CM4 Lite Rev 1.0 (old EEPROM) + 006 [first KP* in first table] > + CM4 8/32 Rev 1.0 (old EEPROM) + 006 [first KP in first table] > then get a trace without your patches, and attach all four resulting > files? > - Or should one just consider that the very first thing that each and > every CM4 user is supposed to do is upgrade their EEPROM? > > On a personal side, I'm very fine with being told to just upgrade the > EEPROM already (and that seems to cover any use case I could think of, > and test). But if getting and comparing traces before/after your patches > is helpful to you and the wider community, I'm happy to spend some more > time testing and gathering details. > > > Cheers, > -- > Cyril Brulebois (kibi@debian.org) <https://debamax.com/> > D-I release manager -- Release team member -- Freelance Consultant
Hi Jim, Jim Quinlan <james.quinlan@broadcom.com> (2023-05-03): > > +----------+----------+----------+ > > | 006 | 006S | VIA | > > +------------------------+----------+----------+----------+ > > | 1. CM4 Lite Rev 1.0 | KP* | KP* | OK, 72 | > > | pristine | | | | > > +------------------------+----------+----------+----------+ > > | 2. CM4 Lite Rev 1.0 | boot + | OK, 72 | OK, 72 | > > | + brcm,enable-l1ss | timeouts | | | > > +------------------------+----------+----------+----------+ > > | 3. CM4 8/32 Rev 1.0 | KP | KP | KP | > > | pristine | | | | > > +------------------------+----------+----------+----------+ > > | 4. CM4 8/32 Rev 1.0 | OK, 69 | OK, 69 | OK, 69 | > > | + brcm,enable-l1ss | | | | > > +------------------------+----------+----------+----------+ > > | 5. CM4 4/32 Rev 1.1 | boot + | OK, 69 | OK, 69 | > > | pristine | timeouts | | | > > +------------------------+----------+----------+----------+ > > | 6. CM4 4/32 Rev 1.1 | OK, 82 | OK, 69 | OK, 69 | > > | + brcm,enable-l1ss | | | | > > +------------------------+----------+----------+----------+ > > Hello Cyril, > > I'm confused by your result table above which has a number of > failures. Further in your message you say: > > Takeaways: > - Upgrading the EEPROM solved all problems; > - brcm,enable-l1ss (which used to help) is not needed [...] > > May I conclude that if one uses a modern CM4 eeprom that these > failures go away? Sorry that wasn't clear enough. The table with failures, quoted above, was with 3 compute modules in their stock configuration: - CM4 Lite Rev 1.0 (lines 1-2) had an 2021-02-16 EEPROM; - CM4 8/32 Rev 1.0 (lines 3-4) had an 2021-02-16 EEPROM; - CM4 4/32 Rev 1.1 (lines 5-6) had an 2021-12-02 EEPROM. Upgrading them all to current 2023-01-11 led to the second table when I tested again, where everything worked fine. The 2 versions (2021-02-16 and 2021-12-02) are marked as stable in the rpi-eeprom.git repository. > You mentioned in a personal email that at least one of your "CM4" was > running a Beta eeprom image. That one was another CM4 Lite Rev 1.0, and had a 2020-10-02 EEPROM. That one is marked as an old beta in the rpi-eeprom.git. (That CM4 Lite also works very fine once the current 2023-01-11 is deployed on it.) [Regarding EEPROM variety in the field: I've mentioned this topic on the #debian-raspberrypi IRC channel, warning others about troubles that might be linked to the EEPROM version. I've seen at least one CM4 user report the 2020-10-02 beta EEPROM, and another one report a different 2022-04-26 stable EEPROM.] > I'm much less concerned about folks having problems with old or > pre-release versions of the CM4 eeprom because (a) most of these folks > are using Raspian Linux anyway and (b) they can just upgrade their > eeprom. That looks totally fair to me. So I can stop here, wait for the next iteration of your patch series if there's one (rechecking everything still works fine), and only the latest EEPROM matters? Sounds good. > Further, the Rpi eeprom is closed-source and my questions on the Rpi > forum and Rpi Github have not yet led to any answers about why a > different eeprom image is changing the behavior of a clkreq signal. The following doesn't shed much light but seems consistent with results getting better with newer EEPROM versions (a number of “PCIe” hits, some about probing, some about resets): https://github.com/raspberrypi/rpi-eeprom/blob/master/firmware/release-notes.md [If I had known how much of a difference an upgraded EEPROM would make, and how easy it is to upgrade, I would have probably bothered you much less with all those weird results… Sorry about that.] The whole series is: Tested-By: Cyril Brulebois <cyril@debamax.com> Cheers,