Message ID | 20210708122754.555846-1-i.mikhaylov@yadro.com (mailing list archive) |
---|---|
Headers | show |
Series | net/ncsi: Add NCSI Intel OEM command to keep PHY link up | expand |
Hello: This series was applied to netdev/net.git (refs/heads/master): On Thu, 8 Jul 2021 15:27:51 +0300 you wrote: > Add NCSI Intel OEM command to keep PHY link up and prevents any channel > resets during the host load on i210. Also includes dummy response handler for > Intel manufacturer id. > > Changes from v1: > 1. sparse fixes about casts > 2. put it after ncsi_dev_state_probe_cis instead of > ncsi_dev_state_probe_channel because sometimes channel is not ready > after it > 3. inl -> intel > > [...] Here is the summary with links: - [v2,1/3] net/ncsi: fix restricted cast warning of sparse https://git.kernel.org/netdev/net/c/27fa107d3b8d - [v2,2/3] net/ncsi: add NCSI Intel OEM command to keep PHY up https://git.kernel.org/netdev/net/c/abd2fddc94a6 - [v2,3/3] net/ncsi: add dummy response handler for Intel boards https://git.kernel.org/netdev/net/c/163f5de509a8 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Hello, On Thu, Jul 08, 2021 at 03:27:51PM +0300, Ivan Mikhaylov wrote: > Add NCSI Intel OEM command to keep PHY link up and prevents any channel > resets during the host load on i210. There're multiple things to consider here and I have hesitations about the way you propose to solve the issue. While the host is booted up and fully functional it assumes it has full proper control of network cards, and sometimes it really needs to reset them to e.g. recover from crashed firmware. The PHY resets might also make sense in certain cases, and so in general having this "link up" bit set all the time might be breaking assumptions. As far as I can tell the Intel developers assumed you would enable this bit just prior to powering on the host and turn off after all the POST codes are transferred and we can assume the host system is done with the UEFI stage and the real OS took over. OpenBMC seems to have all the necessary hooks to do it that way, and you have a netlink command to send whatever you need for that from the userspace, e.g. with the "C version" ncsi-netlink command to set this bit just run: ncsi-netlink -l 3 -c 0 -p 0 -o 0x50 0x00 0x00 0x01 0x57 0x20 0x00 0x01 https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-networkd/+/36592 would provide an OpenBMC-specific way too. There's another related thing to consider here: by default I210 has power-saving modes enabled and so when BMC is booting the link is established only in 100BASE-T mode. With this configuration and this bit always set you'd be always stuck to that, never getting Gigabit speeds. For server motherboards I propose to configure I210 with this: ./eeupdate64e /all /ww 0x13 0x0081 # disable Low Power Link Up ./eeupdate64e /all /ww 0x20 0x2004 # enable 1000 in non-D0a (it's a one-time operation that's best to be performed along with the initial I210 flashing) Ivan, so far I have an impression that the user-space solution would be much easier, flexible and manageable and that there's no need for this command to be in Linux at all.
On Tue, 2021-07-20 at 12:53 +0300, Paul Fertser wrote: > Hello, > > On Thu, Jul 08, 2021 at 03:27:51PM +0300, Ivan Mikhaylov wrote: > > Add NCSI Intel OEM command to keep PHY link up and prevents any channel > > resets during the host load on i210. > > There're multiple things to consider here and I have hesitations about > the way you propose to solve the issue. > > While the host is booted up and fully functional it assumes it has > full proper control of network cards, and sometimes it really needs to > reset them to e.g. recover from crashed firmware. The PHY resets might > also make sense in certain cases, and so in general having this "link > up" bit set all the time might be breaking assumptions. Paul, what kind of assumption it would break? You know that you set that option in your kernel, anyways you can look at /proc/config.gz if you have hesitations. In other ways, if you're saying about possible runtime control, there is ncsi- netlink control and solution from phosphor-networkd which is on review stage. Joel proposed it as DTS option which may help at runtime. Some of those commands should be applied after channel probe as I think including phy reset control. > As far as I can tell the Intel developers assumed you would enable > this bit just prior to powering on the host and turn off after all the > POST codes are transferred and we can assume the host system is done > with the UEFI stage and the real OS took over. > > OpenBMC seems to have all the necessary hooks to do it that way, and > you have a netlink command to send whatever you need for that from the > userspace, e.g. with the "C version" ncsi-netlink command to set this > bit just run: > > ncsi-netlink -l 3 -c 0 -p 0 -o 0x50 0x00 0x00 0x01 0x57 0x20 0x00 0x01 > > https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-networkd/+/36592 > would provide an OpenBMC-specific way too. I know about it, Eddie posted that link before. > There's another related thing to consider here: by default I210 has > power-saving modes enabled and so when BMC is booting the link is > established only in 100BASE-T mode. With this configuration and this > bit always set you'd be always stuck to that, never getting Gigabit > speeds. > > For server motherboards I propose to configure I210 with this: > ./eeupdate64e /all /ww 0x13 0x0081 # disable Low Power Link Up > ./eeupdate64e /all /ww 0x20 0x2004 # enable 1000 in non-D0a > (it's a one-time operation that's best to be performed along with the > initial I210 flashing) Good to know, thanks. > Ivan, so far I have an impression that the user-space solution would > be much easier, flexible and manageable and that there's no need for > this command to be in Linux at all. You may not have such things on your image with suitable env which you can rely on. There is smaf for mellanox which is done in the same way for example. Thanks.
On Tue, Jul 20, 2021 at 05:00:40PM +0300, Ivan Mikhaylov wrote: > > While the host is booted up and fully functional it assumes it has > > full proper control of network cards, and sometimes it really needs to > > reset them to e.g. recover from crashed firmware. The PHY resets might > > also make sense in certain cases, and so in general having this "link > > up" bit set all the time might be breaking assumptions. > > Paul, what kind of assumption it would break? The host OS drivers assume they can fully control PCIe network cards. Doing anything (including inhibiting PHY resets) behind its back might break assumptions the driver authors had. This bit in question certainly makes the card behave in an unusual way, so no wonder Intel didn't enable it by default. I do not claim I know for a fact it's problematic but it doesn't feel like "the right thing" so some edge cases might expose issues. > Joel proposed it as DTS option which may help at runtime. Sorry, I'm not following. If BMC is fully booted it's able to configure NC-SI appropriately by a userspace action coordinated with other BMC tasks. If BMC is not yet ready then we can't communicate with it via Ethernet anyway. So I can't see when exactly is it going to be helpful. > Some of those commands should be applied after channel probe as I > think including phy reset control. Do you have any other commands in mind? So far I assumed we're discussing just the one to mask PHY resets. > > Ivan, so far I have an impression that the user-space solution would > > be much easier, flexible and manageable and that there's no need for > > this command to be in Linux at all. > > You may not have such things on your image with suitable env which you can rely > on. There is smaf for mellanox which is done in the same way for example. I can hardly imagine why an OS running on BMC would be using this code in question and appropriate DT configuration but not having the right means in userspace to control it. What would be the usecase? If the network subsystem maintainers think this is a good idea, all things considered, I'm fine with it. I210 losing link exactly at the time when you need it (to enter the UEFI interactive menu) is super-annoying, so probably any fix is better than none :) Thank you for discussion.