Message ID | 1635330675-25592-2-git-send-email-sbhatta@marvell.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Add a devlink param and documentation | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Series has a cover letter |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: jerinj@marvell.com lcherian@marvell.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | No Fixes tag |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 145 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 19 this patch: 19 |
netdev/header_inline | success | No static functions without inline keyword in header files |
On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > From: Rakesh Babu <rsaladi2@marvell.com> > > The physical/SerDes link of an netdev interface is not > toggled on interface bring up and bring down. This is > because the same link is shared between PFs and its VFs. > This patch adds devlink param to toggle physical link so > that it is useful in cases where a physical link needs to > be re-initialized. So it's a reset? Or are there cases where user wants the link to stay down?
Hi Jakub, > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, October 27, 2021 9:08 PM > To: Subbaraya Sundeep Bhatta > Cc: davem@davemloft.net; netdev@vger.kernel.org; Hariprasad Kelam; Geethasowjanya Akula; Sunil Kovvuri Goutham; Rakesh Babu Saladi; Vamsi Krishna Attunuru > Subject: Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > > From: Rakesh Babu <rsaladi2@marvell.com> > > > > The physical/SerDes link of an netdev interface is not > > toggled on interface bring up and bring down. This is > > because the same link is shared between PFs and its VFs. > > This patch adds devlink param to toggle physical link so > > that it is useful in cases where a physical link needs to > > be re-initialized. > > So it's a reset? Or are there cases where user wants the link > to stay down? There are cases where the user wants the link to stay down and debug. We are adding this to help customers to debug issues wrt physical links. Thanks, Sundeep
On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote: > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > > > From: Rakesh Babu <rsaladi2@marvell.com> > > > > > > The physical/SerDes link of an netdev interface is not > > > toggled on interface bring up and bring down. This is > > > because the same link is shared between PFs and its VFs. > > > This patch adds devlink param to toggle physical link so > > > that it is useful in cases where a physical link needs to > > > be re-initialized. > > > > So it's a reset? Or are there cases where user wants the link > > to stay down? > > There are cases where the user wants the link to stay down and debug. > We are adding this to help customers to debug issues wrt physical links. Intel has a similar thing, they keep adding a ethtool priv flag called "link-down-on-close" to all their drivers. Maybe others do this, too. It's time we added a standard API for this.
On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote: > On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote: > > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > > > > From: Rakesh Babu <rsaladi2@marvell.com> > > > > > > > > The physical/SerDes link of an netdev interface is not > > > > toggled on interface bring up and bring down. This is > > > > because the same link is shared between PFs and its VFs. > > > > This patch adds devlink param to toggle physical link so > > > > that it is useful in cases where a physical link needs to > > > > be re-initialized. > > > > > > So it's a reset? Or are there cases where user wants the link > > > to stay down? > > > > There are cases where the user wants the link to stay down and debug. > > We are adding this to help customers to debug issues wrt physical links. > > Intel has a similar thing, they keep adding a ethtool priv flag called > "link-down-on-close" to all their drivers. This is the list I compiled the previous time we discussed it: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c3880bd159d431d06b687b0b5ab22e24e6ef0070 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5ec9e2ce41ac198de2ee18e0e529b7ebbc67408 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1 It seems that various drivers default to not shutting down the link upon ndo_stop(), but some users want to override it. I hit that too as it breaks ECMP (switch thinks the link is up). > Maybe others do this, too. It's time we added a standard API for > this. The new parameter sounds like a reset, but it can also be achieved by: # ethtool --set-priv-flags eth0 link-down-on-close on # ip link set dev eth0 down # ip link set dev eth0 up Where the first command is replaced by a more standard ethtool API.
On Wed, Oct 27, 2021 at 06:43:00PM +0000, Sunil Kovvuri Goutham wrote: > > > ________________________________ > > From: Ido Schimmel <idosch@idosch.org> > > Sent: Wednesday, October 27, 2021 11:41 PM > > To: Jakub Kicinski <kuba@kernel.org> > > Cc: sundeep subbaraya <sundeep.lkml@gmail.com>; David Miller <davem@davemloft.net>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Hariprasad Kelam <hkelam@marvell.com>; Geethasowjanya Akula <gakula@marvell.com>; Sunil Kovvuri Goutham <sgoutham@marvell.com>; Subbaraya Sundeep Bhatta <sbhatta@marvell.com>; Rakesh Babu Saladi <rsaladi2@marvell.com>; Saeed Mahameed <saeed@kernel.org>; anthony.l.nguyen@intel.com <anthony.l.nguyen@intel.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; Andrew Lunn <andrew@lunn.ch> > > Subject: Re: [EXT] Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes > > > > On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote: > > > On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote: > > > > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > > > > > > From: Rakesh Babu <rsaladi2@marvell.com> > > > > > > > > > > > > The physical/SerDes link of an netdev interface is not > > > > > > toggled on interface bring up and bring down. This is > > > > > > because the same link is shared between PFs and its VFs. > > > > > > This patch adds devlink param to toggle physical link so > > > > > > that it is useful in cases where a physical link needs to > > > > > > be re-initialized. > > > > > > > > > > So it's a reset? Or are there cases where user wants the link > > > > > to stay down? > > > > > > > > There are cases where the user wants the link to stay down and debug. > > > > We are adding this to help customers to debug issues wrt physical links. > > > > > > Intel has a similar thing, they keep adding a ethtool priv flag called > > > "link-down-on-close" to all their drivers. > > > > This is the list I compiled the previous time we discussed it: > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dc3880bd159d431d06b687b0b5ab22e24e6ef0070&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=fvl1aLwL55CWIsG2NT5i3QsP4o_GTEsGhA6Epjz7ZAk&e= > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dd5ec9e2ce41ac198de2ee18e0e529b7ebbc67408&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=kH50Qq3h75xREveyWvCUn35wXagtt4uv1QRK0wMBEdk&e= > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=Uc3yY-5HjS7TgRBl4DPLsJ19XiHDD_PvF8hA38K4XwI&e= > > > > It seems that various drivers default to not shutting down the link upon > > ndo_stop(), but some users want to override it. I hit that too as it > > breaks ECMP (switch thinks the link is up). > > > > > Maybe others do this, too. It's time we added a standard API for > > > this. > > > > The new parameter sounds like a reset, but it can also be achieved by: > > > > # ethtool --set-priv-flags eth0 link-down-on-close on > > # ip link set dev eth0 down > > # ip link set dev eth0 up > > > > Where the first command is replaced by a more standard ethtool API. > > The intention here is provide an option to the user to toggle the serdes configuration > as and when he wants to. But why? What is the motivation? The commit message basically says that you are adding a param to toggle the physical link because it is useful to toggle the physical link. > There is no dependency with logical interface's status. But there is and the commit message explains why you are not doing it as part of ndo_{stop,open}(): "because the same link is shared between PFs and its VFs" Such constraints also apply to other drivers and you can see that in the "link-down-on-close" private flag. I'm also aware of propriety tools to toggle device bits which prevent the physical link from going down upon ndo_stop(). > Having a standard API to select bringing down physical interface upon logical interface's close call > is a good idea. But this patch is not for that. IIUC, your default behavior is not to take the physical link down upon ndo_stop() and now you want to toggle the link. If you have a standard API to change the default behavior, then the commands I showed will toggle the link, no?
Hi Ido, On Thu, Oct 28, 2021 at 3:55 AM Ido Schimmel <idosch@idosch.org> wrote: > > On Wed, Oct 27, 2021 at 06:43:00PM +0000, Sunil Kovvuri Goutham wrote: > > > > > ________________________________ > > > From: Ido Schimmel <idosch@idosch.org> > > > Sent: Wednesday, October 27, 2021 11:41 PM > > > To: Jakub Kicinski <kuba@kernel.org> > > > Cc: sundeep subbaraya <sundeep.lkml@gmail.com>; David Miller <davem@davemloft.net>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Hariprasad Kelam <hkelam@marvell.com>; Geethasowjanya Akula <gakula@marvell.com>; Sunil Kovvuri Goutham <sgoutham@marvell.com>; Subbaraya Sundeep Bhatta <sbhatta@marvell.com>; Rakesh Babu Saladi <rsaladi2@marvell.com>; Saeed Mahameed <saeed@kernel.org>; anthony.l.nguyen@intel.com <anthony.l.nguyen@intel.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; Andrew Lunn <andrew@lunn.ch> > > > Subject: Re: [EXT] Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes > > > > > > On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote: > > > > On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote: > > > > > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote: > > > > > > > From: Rakesh Babu <rsaladi2@marvell.com> > > > > > > > > > > > > > > The physical/SerDes link of an netdev interface is not > > > > > > > toggled on interface bring up and bring down. This is > > > > > > > because the same link is shared between PFs and its VFs. > > > > > > > This patch adds devlink param to toggle physical link so > > > > > > > that it is useful in cases where a physical link needs to > > > > > > > be re-initialized. > > > > > > > > > > > > So it's a reset? Or are there cases where user wants the link > > > > > > to stay down? > > > > > > > > > > There are cases where the user wants the link to stay down and debug. > > > > > We are adding this to help customers to debug issues wrt physical links. > > > > > > > > Intel has a similar thing, they keep adding a ethtool priv flag called > > > > "link-down-on-close" to all their drivers. > > > > > > This is the list I compiled the previous time we discussed it: > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dc3880bd159d431d06b687b0b5ab22e24e6ef0070&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=fvl1aLwL55CWIsG2NT5i3QsP4o_GTEsGhA6Epjz7ZAk&e= > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dd5ec9e2ce41ac198de2ee18e0e529b7ebbc67408&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=kH50Qq3h75xREveyWvCUn35wXagtt4uv1QRK0wMBEdk&e= > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=Uc3yY-5HjS7TgRBl4DPLsJ19XiHDD_PvF8hA38K4XwI&e= > > > > > > It seems that various drivers default to not shutting down the link upon > > > ndo_stop(), but some users want to override it. I hit that too as it > > > breaks ECMP (switch thinks the link is up). > > > > > > > Maybe others do this, too. It's time we added a standard API for > > > > this. > > > > > > The new parameter sounds like a reset, but it can also be achieved by: > > > > > > # ethtool --set-priv-flags eth0 link-down-on-close on > > > # ip link set dev eth0 down > > > # ip link set dev eth0 up > > > > > > Where the first command is replaced by a more standard ethtool API. > > > > The intention here is provide an option to the user to toggle the serdes configuration > > as and when he wants to. > > But why? What is the motivation? The commit message basically says that > you are adding a param to toggle the physical link because it is useful > to toggle the physical link. > > > There is no dependency with logical interface's status. > > But there is and the commit message explains why you are not doing it as > part of ndo_{stop,open}(): "because the same link is shared between PFs > and its VFs" > > Such constraints also apply to other drivers and you can see that in the > "link-down-on-close" private flag. I'm also aware of propriety tools to > toggle device bits which prevent the physical link from going down upon > ndo_stop(). > > > Having a standard API to select bringing down physical interface upon logical interface's close call > > is a good idea. But this patch is not for that. > > IIUC, your default behavior is not to take the physical link down upon > ndo_stop() and now you want to toggle the link. If you have a standard > API to change the default behavior, then the commands I showed will > toggle the link, no? Actually we also need a case where debugging is required when the logical link is up (so that packets flow from kernel to SerDes continuously) but the physical link is down. We will change the commit description since it is giving the wrong impression. A command to change physical link up/down with no relation to ifconfig is needed. Thanks, Sundeep
On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote: > Actually we also need a case where debugging is required when the > logical link is > up (so that packets flow from kernel to SerDes continuously) but the > physical link > is down. Can you explain the motivation for that? In the past we discussed use cases for forcing the operational state to down while the administrative state is up and couldn't find any. > We will change the commit description since it is giving the > wrong impression. > A command to change physical link up/down with no relation to ifconfig > is needed. So it is obvious that some drivers default to not shutting down the physical link upon admin down, but that some users want to change that. In addition, we have your use case to control the physical link without relation to the logical link. I wonder if it can all be solved with a new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the physical link policy and has the following values: * auto: Physical link state is derived from logical link state * up: Physical link state is always up * down: Physical link state is always down IIUC, it should solve your problem and that of the "link-down-on-close" private flag. It also has the added benefit of allowing user space to query the default policy. The expectation is that it would be "auto", but in some scenarios it is "up".
Hi Ido, On Thu, Oct 28, 2021 at 7:21 PM Ido Schimmel <idosch@idosch.org> wrote: > > On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote: > > Actually we also need a case where debugging is required when the > > logical link is > > up (so that packets flow from kernel to SerDes continuously) but the > > physical link > > is down. > > Can you explain the motivation for that? In the past we discussed use > cases for forcing the operational state to down while the administrative > state is up and couldn't find any. > To be honest we got this request from a customer to provide a command to modify physical link without tying it to a logical link. I have asked for more details on how they use it. > > We will change the commit description since it is giving the > > wrong impression. > > A command to change physical link up/down with no relation to ifconfig > > is needed. > > So it is obvious that some drivers default to not shutting down the > physical link upon admin down, but that some users want to change that. > In addition, we have your use case to control the physical link without > relation to the logical link. I wonder if it can all be solved with a > new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the > physical link policy and has the following values: > > * auto: Physical link state is derived from logical link state > * up: Physical link state is always up > * down: Physical link state is always down > > IIUC, it should solve your problem and that of the "link-down-on-close" > private flag. It also has the added benefit of allowing user space to > query the default policy. The expectation is that it would be "auto", > but in some scenarios it is "up". This looks good. Please note that we need the behavior such that after changing the flag a subsequent ifconfig command is not needed by the user. auto : in ndo_open, ndo_close check the physical link flag is auto and send command to firmware for bringing physical link up/down. up: send command to firmware instantaneously for physical link UP down: send command to firmware instantaneously for physical link DOWN Thanks, Sundeep
On Sat, Oct 30, 2021 at 12:55:47PM +0530, sundeep subbaraya wrote: > Hi Ido, > > On Thu, Oct 28, 2021 at 7:21 PM Ido Schimmel <idosch@idosch.org> wrote: > > > > On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote: > > > Actually we also need a case where debugging is required when the > > > logical link is > > > up (so that packets flow from kernel to SerDes continuously) but the > > > physical link > > > is down. > > > > Can you explain the motivation for that? In the past we discussed use > > cases for forcing the operational state to down while the administrative > > state is up and couldn't find any. > > > To be honest we got this request from a customer to provide a command to modify > physical link without tying it to a logical link. I have asked for > more details on how > they use it. Thanks > > > > We will change the commit description since it is giving the > > > wrong impression. > > > A command to change physical link up/down with no relation to ifconfig > > > is needed. > > > > So it is obvious that some drivers default to not shutting down the > > physical link upon admin down, but that some users want to change that. > > In addition, we have your use case to control the physical link without > > relation to the logical link. I wonder if it can all be solved with a > > new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the > > physical link policy and has the following values: > > > > * auto: Physical link state is derived from logical link state > > * up: Physical link state is always up > > * down: Physical link state is always down > > > > IIUC, it should solve your problem and that of the "link-down-on-close" > > private flag. It also has the added benefit of allowing user space to > > query the default policy. The expectation is that it would be "auto", > > but in some scenarios it is "up". > > This looks good. Please note that we need the behavior such that after changing > the flag a subsequent ifconfig command is not needed by the user. > > auto : in ndo_open, ndo_close check the physical link flag is auto and > send command > to firmware for bringing physical link up/down. > up: send command to firmware instantaneously for physical link UP > down: send command to firmware instantaneously for physical link DOWN TBH, I'm not that happy with my ethtool suggestion. It is not very clear which hardware entities the attribute controls. Maybe it's better to implement it as a rtnetlink attribute that controls the carrier (e.g., "carrier_policy")? Note that we already have ndo_change_carrier(), but the kdoc comment explicitly mentions that it shouldn't be used by physical devices: * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); * Called to change device carrier. Soft-devices (like dummy, team, etc) * which do not represent real hardware may define this to allow their * userspace components to manage their virtual carrier state. Devices * that determine carrier state from physical hardware properties (eg * network cables) or protocol-dependent mechanisms (eg * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote: > > This looks good. Please note that we need the behavior such that after changing > > the flag a subsequent ifconfig command is not needed by the user. > > > > auto : in ndo_open, ndo_close check the physical link flag is auto and > > send command > > to firmware for bringing physical link up/down. > > up: send command to firmware instantaneously for physical link UP > > down: send command to firmware instantaneously for physical link DOWN > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear > which hardware entities the attribute controls. Last week I heard a request to also be able to model NC-SI disruption. Control if the NIC should be reset and newly flashed FW activated when host is rebooted (vs full server power cycle). That adds another dimension to the problem, even though that particular use case may be better answered thru the devlink flashing/reset APIs. Trying to organize the requirements we have 3 entities which may hold the link up: - SFP power policy - NC-SI / BMC - SR-IOV (legacy) I'd think auto/up as possible options still make sense, although in case of NC-SI many NICs may not allow overriding the "up". And the policy may change without notification if BMC selects / activates a port - it may go from auto to up with no notification. Presumably we want to track "who's holding the link up" per consumer. Just a bitset with 1s for every consumer holding "up"? Or do we expect there will be "more to it" and should create bespoke nests? > Maybe it's better to > implement it as a rtnetlink attribute that controls the carrier (e.g., > "carrier_policy")? Note that we already have ndo_change_carrier(), but > the kdoc comment explicitly mentions that it shouldn't be used by > physical devices: > > * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); > * Called to change device carrier. Soft-devices (like dummy, team, etc) > * which do not represent real hardware may define this to allow their > * userspace components to manage their virtual carrier state. Devices > * that determine carrier state from physical hardware properties (eg > * network cables) or protocol-dependent mechanisms (eg > * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function. New NDO seems reasonable. What are your thoughts on the SFP policy? We can still reshuffle it.
On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote: > On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote: > > > This looks good. Please note that we need the behavior such that after changing > > > the flag a subsequent ifconfig command is not needed by the user. > > > > > > auto : in ndo_open, ndo_close check the physical link flag is auto and > > > send command > > > to firmware for bringing physical link up/down. > > > up: send command to firmware instantaneously for physical link UP > > > down: send command to firmware instantaneously for physical link DOWN > > > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear > > which hardware entities the attribute controls. > > Last week I heard a request to also be able to model NC-SI disruption. > Control if the NIC should be reset and newly flashed FW activated when > host is rebooted (vs full server power cycle). > > That adds another dimension to the problem, even though that particular > use case may be better answered thru the devlink flashing/reset APIs. > > Trying to organize the requirements we have 3 entities which may hold > the link up: > - SFP power policy The SFP power policy does not keep the link up. In fact, we specifically removed the "low" policy to make sure that whatever policy you configure ("auto"/"high") does not affect your carrier. > - NC-SI / BMC > - SR-IOV (legacy) > > I'd think auto/up as possible options still make sense, although in > case of NC-SI many NICs may not allow overriding the "up". And the > policy may change without notification if BMC selects / activates > a port - it may go from auto to up with no notification. > > Presumably we want to track "who's holding the link up" per consumer. > Just a bitset with 1s for every consumer holding "up"? > > Or do we expect there will be "more to it" and should create bespoke > nests? > > > Maybe it's better to > > implement it as a rtnetlink attribute that controls the carrier (e.g., > > "carrier_policy")? Note that we already have ndo_change_carrier(), but > > the kdoc comment explicitly mentions that it shouldn't be used by > > physical devices: > > > > * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); > > * Called to change device carrier. Soft-devices (like dummy, team, etc) > > * which do not represent real hardware may define this to allow their > > * userspace components to manage their virtual carrier state. Devices > > * that determine carrier state from physical hardware properties (eg > > * network cables) or protocol-dependent mechanisms (eg > > * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function. > > New NDO seems reasonable. Spent a bit more time on that and I'm not sure a new ndo is needed. See: * void (*ndo_change_proto_down)(struct net_device *dev, * bool proto_down); * This function is used to pass protocol port error state information * to the switch driver. The switch driver can react to the proto_down * by doing a phys down on the associated switch port. So what this patch is trying to achieve can be achieved by implementing support for this ndo: $ ip link show dev macvlan10 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff # ip link set dev macvlan10 protodown on $ ip link show dev macvlan10 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on Currently, user space has no visibility into the fact that by default the carrier is on, but I imagine this can be resolved by adding "protoup" and defaulting the driver to report "on". The "who's holding the link up" issue can be resolved via "protoup_reason" (same as "protodown_reason").
On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote: > On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote: > > On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote: > > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear > > > which hardware entities the attribute controls. > > > > Last week I heard a request to also be able to model NC-SI disruption. > > Control if the NIC should be reset and newly flashed FW activated when > > host is rebooted (vs full server power cycle). > > > > That adds another dimension to the problem, even though that particular > > use case may be better answered thru the devlink flashing/reset APIs. > > > > Trying to organize the requirements we have 3 entities which may hold > > the link up: > > - SFP power policy > > The SFP power policy does not keep the link up. In fact, we specifically > removed the "low" policy to make sure that whatever policy you configure > ("auto"/"high") does not affect your carrier. Hm. How do we come up with the appropriate wording here... I meant keeping the "PHY level link" up? I think we agree that all the cases should behave like SFP power behaves today? The API is to control or query what is forcing the PHY link to stay up after the netdev was set down. IOW why does the switch still see link up if the link is down on Linux. I don't think we should report carrier up when netdev is down? > > - NC-SI / BMC > > - SR-IOV (legacy) - NPAR / Mutli-Host so 4 known reasons. > > I'd think auto/up as possible options still make sense, although in > > case of NC-SI many NICs may not allow overriding the "up". And the > > policy may change without notification if BMC selects / activates > > a port - it may go from auto to up with no notification. > > > > Presumably we want to track "who's holding the link up" per consumer. > > Just a bitset with 1s for every consumer holding "up"? > > > > Or do we expect there will be "more to it" and should create bespoke > > nests? > > > > > Maybe it's better to > > > implement it as a rtnetlink attribute that controls the carrier (e.g., > > > "carrier_policy")? Note that we already have ndo_change_carrier(), but > > > the kdoc comment explicitly mentions that it shouldn't be used by > > > physical devices: > > > > > > * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); > > > * Called to change device carrier. Soft-devices (like dummy, team, etc) > > > * which do not represent real hardware may define this to allow their > > > * userspace components to manage their virtual carrier state. Devices > > > * that determine carrier state from physical hardware properties (eg > > > * network cables) or protocol-dependent mechanisms (eg > > > * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function. > > > > New NDO seems reasonable. > > Spent a bit more time on that and I'm not sure a new ndo is needed. See: > > * void (*ndo_change_proto_down)(struct net_device *dev, > * bool proto_down); > * This function is used to pass protocol port error state information > * to the switch driver. The switch driver can react to the proto_down > * by doing a phys down on the associated switch port. > > So what this patch is trying to achieve can be achieved by implementing > support for this ndo: > > $ ip link show dev macvlan10 > 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 > link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff > > # ip link set dev macvlan10 protodown on > > $ ip link show dev macvlan10 > 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 > link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on Let's wait to hear a strong use case, tho. > Currently, user space has no visibility into the fact that by default > the carrier is on, but I imagine this can be resolved by adding > "protoup" and defaulting the driver to report "on". The "who's holding > the link up" issue can be resolved via "protoup_reason" (same as > "protodown_reason"). "proto" in "protodown" refers to STP, right? Not sure what "proto" in "protoup" would be.
On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote: > On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote: > > On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote: > > > On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote: > > > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear > > > > which hardware entities the attribute controls. > > > > > > Last week I heard a request to also be able to model NC-SI disruption. > > > Control if the NIC should be reset and newly flashed FW activated when > > > host is rebooted (vs full server power cycle). > > > > > > That adds another dimension to the problem, even though that particular > > > use case may be better answered thru the devlink flashing/reset APIs. > > > > > > Trying to organize the requirements we have 3 entities which may hold > > > the link up: > > > - SFP power policy > > > > The SFP power policy does not keep the link up. In fact, we specifically > > removed the "low" policy to make sure that whatever policy you configure > > ("auto"/"high") does not affect your carrier. > > Hm. How do we come up with the appropriate wording here... > > I meant keeping the "PHY level link" up? I think we agree that all the > cases should behave like SFP power behaves today? > > The API is to control or query what is forcing the PHY link to stay up > after the netdev was set down. IOW why does the switch still see link > up if the link is down on Linux. The SFP power policy doesn't affect that. In our systems (and I believe many others), by default, the transceivers are transitioned to high power mode upon plug-in, but the link is still down when the netdev is down because the MAC/PHY are not operational. With SRIOV/Multi-Host, the MAC/PHY are always operational which is why your link partner has a carrier even when the netdev is down. > I don't think we should report carrier up when netdev is down? This is what happens today, but it's misleading because the carrier is always up with these systems. When I take the netdev down, I expect my link partner to lose carrier. If this doesn't happen, then I believe the netdev should always report IFF_UP. Alternatively, to avoid user space breakage, this can be reported via a new attribute such as "protoup". > > > > - NC-SI / BMC > > > - SR-IOV (legacy) > > - NPAR / Mutli-Host > > so 4 known reasons. > > > > I'd think auto/up as possible options still make sense, although in > > > case of NC-SI many NICs may not allow overriding the "up". And the > > > policy may change without notification if BMC selects / activates > > > a port - it may go from auto to up with no notification. > > > > > > Presumably we want to track "who's holding the link up" per consumer. > > > Just a bitset with 1s for every consumer holding "up"? > > > > > > Or do we expect there will be "more to it" and should create bespoke > > > nests? > > > > > > > Maybe it's better to > > > > implement it as a rtnetlink attribute that controls the carrier (e.g., > > > > "carrier_policy")? Note that we already have ndo_change_carrier(), but > > > > the kdoc comment explicitly mentions that it shouldn't be used by > > > > physical devices: > > > > > > > > * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); > > > > * Called to change device carrier. Soft-devices (like dummy, team, etc) > > > > * which do not represent real hardware may define this to allow their > > > > * userspace components to manage their virtual carrier state. Devices > > > > * that determine carrier state from physical hardware properties (eg > > > > * network cables) or protocol-dependent mechanisms (eg > > > > * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function. > > > > > > New NDO seems reasonable. > > > > Spent a bit more time on that and I'm not sure a new ndo is needed. See: > > > > * void (*ndo_change_proto_down)(struct net_device *dev, > > * bool proto_down); > > * This function is used to pass protocol port error state information > > * to the switch driver. The switch driver can react to the proto_down > > * by doing a phys down on the associated switch port. > > > > So what this patch is trying to achieve can be achieved by implementing > > support for this ndo: > > > > $ ip link show dev macvlan10 > > 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 > > link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff > > > > # ip link set dev macvlan10 protodown on > > > > $ ip link show dev macvlan10 > > 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 > > link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on > > Let's wait to hear a strong use case, tho. Agree > > > Currently, user space has no visibility into the fact that by default > > the carrier is on, but I imagine this can be resolved by adding > > "protoup" and defaulting the driver to report "on". The "who's holding > > the link up" issue can be resolved via "protoup_reason" (same as > > "protodown_reason"). > > "proto" in "protodown" refers to STP, right? Not really. I believe the main use case was vrrp / mlag. The "protdown_reason" is just a bitmap of user enumerated reasons to keep the interface down. See commit 829eb208e80d ("rtnetlink: add support for protodown reason") for details. > Not sure what "proto" in "protoup" would be. sriov/multi-host/etc ?
On 11/14/21 12:38 AM, Ido Schimmel wrote: > On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote: >> On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote: >>> On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote: >>>> On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote: >>>>> TBH, I'm not that happy with my ethtool suggestion. It is not very clear >>>>> which hardware entities the attribute controls. >>>> Last week I heard a request to also be able to model NC-SI disruption. >>>> Control if the NIC should be reset and newly flashed FW activated when >>>> host is rebooted (vs full server power cycle). >>>> >>>> That adds another dimension to the problem, even though that particular >>>> use case may be better answered thru the devlink flashing/reset APIs. >>>> >>>> Trying to organize the requirements we have 3 entities which may hold >>>> the link up: >>>> - SFP power policy >>> The SFP power policy does not keep the link up. In fact, we specifically >>> removed the "low" policy to make sure that whatever policy you configure >>> ("auto"/"high") does not affect your carrier. >> Hm. How do we come up with the appropriate wording here... >> >> I meant keeping the "PHY level link" up? I think we agree that all the >> cases should behave like SFP power behaves today? >> >> The API is to control or query what is forcing the PHY link to stay up >> after the netdev was set down. IOW why does the switch still see link >> up if the link is down on Linux. > The SFP power policy doesn't affect that. In our systems (and I believe > many others), by default, the transceivers are transitioned to high > power mode upon plug-in, but the link is still down when the netdev is > down because the MAC/PHY are not operational. > > With SRIOV/Multi-Host, the MAC/PHY are always operational which is why > your link partner has a carrier even when the netdev is down. > >> I don't think we should report carrier up when netdev is down? > This is what happens today, but it's misleading because the carrier is > always up with these systems. When I take the netdev down, I expect my > link partner to lose carrier. If this doesn't happen, then I believe the > netdev should always report IFF_UP. Alternatively, to avoid user space > breakage, this can be reported via a new attribute such as "protoup". > >>>> - NC-SI / BMC >>>> - SR-IOV (legacy) >> - NPAR / Mutli-Host >> >> so 4 known reasons. >> >>>> I'd think auto/up as possible options still make sense, although in >>>> case of NC-SI many NICs may not allow overriding the "up". And the >>>> policy may change without notification if BMC selects / activates >>>> a port - it may go from auto to up with no notification. >>>> >>>> Presumably we want to track "who's holding the link up" per consumer. >>>> Just a bitset with 1s for every consumer holding "up"? >>>> >>>> Or do we expect there will be "more to it" and should create bespoke >>>> nests? >>>> >>>>> Maybe it's better to >>>>> implement it as a rtnetlink attribute that controls the carrier (e.g., >>>>> "carrier_policy")? Note that we already have ndo_change_carrier(), but >>>>> the kdoc comment explicitly mentions that it shouldn't be used by >>>>> physical devices: >>>>> >>>>> * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier); >>>>> * Called to change device carrier. Soft-devices (like dummy, team, etc) >>>>> * which do not represent real hardware may define this to allow their >>>>> * userspace components to manage their virtual carrier state. Devices >>>>> * that determine carrier state from physical hardware properties (eg >>>>> * network cables) or protocol-dependent mechanisms (eg >>>>> * USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function. >>>> New NDO seems reasonable. >>> Spent a bit more time on that and I'm not sure a new ndo is needed. See: >>> >>> * void (*ndo_change_proto_down)(struct net_device *dev, >>> * bool proto_down); >>> * This function is used to pass protocol port error state information >>> * to the switch driver. The switch driver can react to the proto_down >>> * by doing a phys down on the associated switch port. >>> >>> So what this patch is trying to achieve can be achieved by implementing >>> support for this ndo: >>> >>> $ ip link show dev macvlan10 >>> 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 >>> link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff >>> >>> # ip link set dev macvlan10 protodown on >>> >>> $ ip link show dev macvlan10 >>> 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 >>> link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on >> Let's wait to hear a strong use case, tho. > Agree > >>> Currently, user space has no visibility into the fact that by default >>> the carrier is on, but I imagine this can be resolved by adding >>> "protoup" and defaulting the driver to report "on". The "who's holding >>> the link up" issue can be resolved via "protoup_reason" (same as >>> "protodown_reason"). >> "proto" in "protodown" refers to STP, right? > Not really. I believe the main use case was vrrp / mlag. The > "protdown_reason" is just a bitmap of user enumerated reasons to keep > the interface down. See commit 829eb208e80d ("rtnetlink: add support for > protodown reason") for details. correct. Its equivalent to errDisable found on most commercial switch OS'es. Can be used for any control-plane/mgmt-plane/protocol wanting to hold the link down. Other use-cases where this can be used (as also quoted by other vendors): mismatch of link properties Link Flapping detection and disable link Port Security Violation Broadcast Storms etc > >> Not sure what "proto" in "protoup" would be. > sriov/multi-host/etc ? agree. Would be nice to re-use protodown ndo and state/reason here
On Sun, 14 Nov 2021 20:19:59 -0800 Roopa Prabhu wrote: > On 11/14/21 12:38 AM, Ido Schimmel wrote: > > On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote: > >> Hm. How do we come up with the appropriate wording here... > >> > >> I meant keeping the "PHY level link" up? I think we agree that all the > >> cases should behave like SFP power behaves today? > >> > >> The API is to control or query what is forcing the PHY link to stay up > >> after the netdev was set down. IOW why does the switch still see link > >> up if the link is down on Linux. > > The SFP power policy doesn't affect that. In our systems (and I believe > > many others), by default, the transceivers are transitioned to high > > power mode upon plug-in, but the link is still down when the netdev is > > down because the MAC/PHY are not operational. Ah, GTK! > > With SRIOV/Multi-Host, the MAC/PHY are always operational which is why > > your link partner has a carrier even when the netdev is down. I see, I think you're talking about something like IFLA_VF_LINK_STATE_* but for the PF. That could make sense, although I don't think it was ever requested. > >> I don't think we should report carrier up when netdev is down? > > This is what happens today, but it's misleading because the carrier is > > always up with these systems. When I take the netdev down, I expect my > > link partner to lose carrier. If this doesn't happen, then I believe the > > netdev should always report IFF_UP. Alternatively, to avoid user space > > breakage, this can be reported via a new attribute such as "protoup". Sounds sensible. > >> "proto" in "protodown" refers to STP, right? > > Not really. I believe the main use case was vrrp / mlag. VRRP is a proto, mlag maybe a little less clear-cut. > > The "protdown_reason" is just a bitmap of user enumerated reasons to keep > > the interface down. See commit 829eb208e80d ("rtnetlink: add support for > > protodown reason") for details. > > correct. Its equivalent to errDisable found on most commercial switch OS'es. > > Can be used for any control-plane/mgmt-plane/protocol wanting to hold > the link down. > > Other use-cases where this can be used (as also quoted by other vendors): > > mismatch of link properties What link properties? > Link Flapping detection and disable link > Port Security Violation Port security as established by a .. protocol like 802.1X ? > Broadcast Storms > etc Why not take the entire interface down for bcast storm? > >> Not sure what "proto" in "protoup" would be. > > sriov/multi-host/etc ? > > agree. Would be nice to re-use protodown ndo and state/reason here You are the experts so correct me please but the point of protodown is that the the link is held down for general traffic but you can still exchange protocol messages on it. STP, VRRP, LAG, 802.1X etc. For anything that does not require special message exchange the link can be just brought down completely. In my head link held up is a completely different beast, the local host does not participate or otherwise pay attention to any communication on the link. It's about what other entities do with the link. But if you prefer "protoup" strongly that's fine, I guess.
Hi, On Mon, Nov 15, 2021 at 8:41 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sun, 14 Nov 2021 20:19:59 -0800 Roopa Prabhu wrote: > > On 11/14/21 12:38 AM, Ido Schimmel wrote: > > > On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote: > > >> Hm. How do we come up with the appropriate wording here... > > >> > > >> I meant keeping the "PHY level link" up? I think we agree that all the > > >> cases should behave like SFP power behaves today? > > >> > > >> The API is to control or query what is forcing the PHY link to stay up > > >> after the netdev was set down. IOW why does the switch still see link > > >> up if the link is down on Linux. > > > The SFP power policy doesn't affect that. In our systems (and I believe > > > many others), by default, the transceivers are transitioned to high > > > power mode upon plug-in, but the link is still down when the netdev is > > > down because the MAC/PHY are not operational. > > Ah, GTK! > > > > With SRIOV/Multi-Host, the MAC/PHY are always operational which is why > > > your link partner has a carrier even when the netdev is down. > > I see, I think you're talking about something like IFLA_VF_LINK_STATE_* > but for the PF. That could make sense, although I don't think it was > ever requested. > > > >> I don't think we should report carrier up when netdev is down? > > > This is what happens today, but it's misleading because the carrier is > > > always up with these systems. When I take the netdev down, I expect my > > > link partner to lose carrier. If this doesn't happen, then I believe the > > > netdev should always report IFF_UP. Alternatively, to avoid user space > > > breakage, this can be reported via a new attribute such as "protoup". > > Sounds sensible. > > > >> "proto" in "protodown" refers to STP, right? > > > Not really. I believe the main use case was vrrp / mlag. > > VRRP is a proto, mlag maybe a little less clear-cut. > > > > The "protdown_reason" is just a bitmap of user enumerated reasons to keep > > > the interface down. See commit 829eb208e80d ("rtnetlink: add support for > > > protodown reason") for details. > > > > correct. Its equivalent to errDisable found on most commercial switch OS'es. > > > > Can be used for any control-plane/mgmt-plane/protocol wanting to hold > > the link down. > > > > Other use-cases where this can be used (as also quoted by other vendors): > > > > mismatch of link properties > > What link properties? > > > Link Flapping detection and disable link > > Port Security Violation > > Port security as established by a .. protocol like 802.1X ? > > > Broadcast Storms > > etc > > Why not take the entire interface down for bcast storm? > > > >> Not sure what "proto" in "protoup" would be. > > > sriov/multi-host/etc ? > > > > agree. Would be nice to re-use protodown ndo and state/reason here > > You are the experts so correct me please but the point of protodown > is that the the link is held down for general traffic but you can > still exchange protocol messages on it. STP, VRRP, LAG, 802.1X etc. > > For anything that does not require special message exchange the link > can be just brought down completely. > > In my head link held up is a completely different beast, the local host > does not participate or otherwise pay attention to any communication on > the link. It's about what other entities do with the link. > > But if you prefer "protoup" strongly that's fine, I guess. As said by Ido, ndo_change_proto_down with proto_down as on and off is sufficient for our requirement right now. We will use ndo_change_proto_down instead of devlink. Thanks everyone for pitching in. Sundeep
On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote: > As said by Ido, ndo_change_proto_down with proto_down as > on and off is sufficient for our requirement right now. We will use > ndo_change_proto_down instead of devlink. Thanks everyone for > pitching in. ndo_change_proto_down is for software devices. Make sure you explain your use case well, otherwise it's going to be a nack.
On Fri, Nov 19, 2021 at 7:40 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote: > > As said by Ido, ndo_change_proto_down with proto_down as > > on and off is sufficient for our requirement right now. We will use > > ndo_change_proto_down instead of devlink. Thanks everyone for > > pitching in. > > ndo_change_proto_down is for software devices. Make sure you explain > your use case well, otherwise it's going to be a nack. Sorry new to networking stuff here. Where does the below imply it is for software devices? * void (*ndo_change_proto_down)(struct net_device *dev, * bool proto_down); * This function is used to pass protocol port error state information * to the switch driver. The switch driver can react to the proto_down * by doing a phys down on the associated switch port. I will find out the use case (pinged customer again) Thanks, Sundeep
On Fri, 19 Nov 2021 19:56:51 +0530 sundeep subbaraya wrote: > On Fri, Nov 19, 2021 at 7:40 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote: > > > As said by Ido, ndo_change_proto_down with proto_down as > > > on and off is sufficient for our requirement right now. We will use > > > ndo_change_proto_down instead of devlink. Thanks everyone for > > > pitching in. > > > > ndo_change_proto_down is for software devices. Make sure you explain > > your use case well, otherwise it's going to be a nack. > > Sorry new to networking stuff here. Where does the below imply it is > for software devices? > * void (*ndo_change_proto_down)(struct net_device *dev, > * bool proto_down); > * This function is used to pass protocol port error state information > * to the switch driver. The switch driver can react to the proto_down > * by doing a phys down on the associated switch port. > I will find out the use case (pinged customer again) Don't trust comments or documentation when working on Linux. Code and git history are the sources of truth. But you're right in a sense, the software devices which use this callback today look like pretty fake users to allow out-of-tree code to do things. Will anyone who does not work on Cumulus Linus scream if we do this? ----->8--------------- From: Jakub Kicinski <kuba@kernel.org> Date: Fri, 19 Nov 2021 06:43:58 -0800 Subject: [PATCH] net: remove .ndo_change_proto_down .ndo_change_proto_down was added seemingly to enable out-of-tree implementations. Over 2.5yrs later we still have no real users upstream. Stub this out for now, we can revert once real users materialize. (rocker is a test vehicle, not a user.) We need to drop the optimization on the sysfs side, because unlike ndos priv_flags will be changed at runtime, so we'd need READ_ONCE/WRITE_ONCE everywhere.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- drivers/net/ethernet/rocker/rocker_main.c | 12 ----------- drivers/net/macvlan.c | 3 +-- drivers/net/vxlan.c | 3 +-- include/linux/netdevice.h | 11 ++-------- net/core/dev.c | 26 ++++------------------- net/core/net-sysfs.c | 6 ------ net/core/rtnetlink.c | 3 +-- 7 files changed, 9 insertions(+), 55 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c index ba4062881eed..b620470c7905 100644 --- a/drivers/net/ethernet/rocker/rocker_main.c +++ b/drivers/net/ethernet/rocker/rocker_main.c @@ -1995,17 +1995,6 @@ static int rocker_port_get_phys_port_name(struct net_device *dev, return err ? -EOPNOTSUPP : 0; } -static int rocker_port_change_proto_down(struct net_device *dev, - bool proto_down) -{ - struct rocker_port *rocker_port = netdev_priv(dev); - - if (rocker_port->dev->flags & IFF_UP) - rocker_port_set_enable(rocker_port, !proto_down); - rocker_port->dev->proto_down = proto_down; - return 0; -} - static void rocker_port_neigh_destroy(struct net_device *dev, struct neighbour *n) { @@ -2037,7 +2026,6 @@ static const struct net_device_ops rocker_port_netdev_ops = { .ndo_set_mac_address = rocker_port_set_mac_address, .ndo_change_mtu = rocker_port_change_mtu, .ndo_get_phys_port_name = rocker_port_get_phys_port_name, - .ndo_change_proto_down = rocker_port_change_proto_down, .ndo_neigh_destroy = rocker_port_neigh_destroy, .ndo_get_port_parent_id = rocker_port_get_port_parent_id, }; diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index d2f830ec2969..71ad30c10b6e 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1171,7 +1171,6 @@ static const struct net_device_ops macvlan_netdev_ops = { #endif .ndo_get_iflink = macvlan_dev_get_iflink, .ndo_features_check = passthru_features_check, - .ndo_change_proto_down = dev_change_proto_down_generic, }; void macvlan_common_setup(struct net_device *dev) @@ -1182,7 +1181,7 @@ void macvlan_common_setup(struct net_device *dev) dev->max_mtu = ETH_MAX_MTU; dev->priv_flags &= ~IFF_TX_SKB_SHARING; netif_keep_dst(dev); - dev->priv_flags |= IFF_UNICAST_FLT; + dev->priv_flags |= IFF_UNICAST_FLT | IFF_CHANGE_PROTO_DOWN; dev->netdev_ops = &macvlan_netdev_ops; dev->needs_free_netdev = true; dev->header_ops = &macvlan_hard_header_ops; diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 563f86de0e0d..74a28b11260e 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -3234,7 +3234,6 @@ static const struct net_device_ops vxlan_netdev_ether_ops = { .ndo_fdb_dump = vxlan_fdb_dump, .ndo_fdb_get = vxlan_fdb_get, .ndo_fill_metadata_dst = vxlan_fill_metadata_dst, - .ndo_change_proto_down = dev_change_proto_down_generic, }; static const struct net_device_ops vxlan_netdev_raw_ops = { @@ -3305,7 +3304,7 @@ static void vxlan_setup(struct net_device *dev) dev->hw_features |= NETIF_F_RXCSUM; dev->hw_features |= NETIF_F_GSO_SOFTWARE; netif_keep_dst(dev); - dev->priv_flags |= IFF_NO_QUEUE; + dev->priv_flags |= IFF_NO_QUEUE | IFF_CHANGE_PROTO_DOWN; /* MTU range: 68 - 65535 */ dev->min_mtu = ETH_MIN_MTU; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cb7f2661d187..647d0960c510 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1297,11 +1297,6 @@ struct netdev_net_notifier { * TX queue. * int (*ndo_get_iflink)(const struct net_device *dev); * Called to get the iflink value of this device. - * void (*ndo_change_proto_down)(struct net_device *dev, - * bool proto_down); - * This function is used to pass protocol port error state information - * to the switch driver. The switch driver can react to the proto_down - * by doing a phys down on the associated switch port. * int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb); * This function is used to get egress tunnel information for given skb. * This is useful for retrieving outer tunnel header parameters while @@ -1542,8 +1537,6 @@ struct net_device_ops { int queue_index, u32 maxrate); int (*ndo_get_iflink)(const struct net_device *dev); - int (*ndo_change_proto_down)(struct net_device *dev, - bool proto_down); int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb); void (*ndo_set_rx_headroom)(struct net_device *dev, @@ -1646,6 +1639,7 @@ enum netdev_priv_flags { IFF_L3MDEV_RX_HANDLER = 1<<29, IFF_LIVE_RENAME_OK = 1<<30, IFF_TX_SKB_NO_LINEAR = 1<<31, + IFF_CHANGE_PROTO_DOWN = 1<<32, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1982,7 +1976,7 @@ struct net_device { /* Read-mostly cache-line for fast-path access */ unsigned int flags; - unsigned int priv_flags; + unsigned long long priv_flags; const struct net_device_ops *netdev_ops; int ifindex; unsigned short gflags; @@ -3735,7 +3729,6 @@ int dev_get_port_parent_id(struct net_device *dev, struct netdev_phys_item_id *ppid, bool recurse); bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b); int dev_change_proto_down(struct net_device *dev, bool proto_down); -int dev_change_proto_down_generic(struct net_device *dev, bool proto_down); void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask, u32 value); struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again); diff --git a/net/core/dev.c b/net/core/dev.c index 9219e319e901..a2ed933df7fc 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8558,35 +8558,17 @@ bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b) EXPORT_SYMBOL(netdev_port_same_parent_id); /** - * dev_change_proto_down - update protocol port state information + * dev_change_proto_down - set carrier according to proto_down. + * * @dev: device * @proto_down: new value - * - * This info can be used by switch drivers to set the phys state of the - * port. */ int dev_change_proto_down(struct net_device *dev, bool proto_down) { - const struct net_device_ops *ops = dev->netdev_ops; - - if (!ops->ndo_change_proto_down) + if (!(dev->priv_flags & IFF_CHANGE_PROTO_DOWN)) return -EOPNOTSUPP; if (!netif_device_present(dev)) return -ENODEV; - return ops->ndo_change_proto_down(dev, proto_down); -} -EXPORT_SYMBOL(dev_change_proto_down); - -/** - * dev_change_proto_down_generic - generic implementation for - * ndo_change_proto_down that sets carrier according to - * proto_down. - * - * @dev: device - * @proto_down: new value - */ -int dev_change_proto_down_generic(struct net_device *dev, bool proto_down) -{ if (proto_down) netif_carrier_off(dev); else @@ -8594,7 +8576,7 @@ int dev_change_proto_down_generic(struct net_device *dev, bool proto_down) dev->proto_down = proto_down; return 0; } -EXPORT_SYMBOL(dev_change_proto_down_generic); +EXPORT_SYMBOL(dev_change_proto_down); /** * dev_change_proto_down_reason - proto down reason diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index addbef5419fb..a1ec86bcc9d9 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -490,12 +490,6 @@ static ssize_t proto_down_store(struct device *dev, { struct net_device *netdev = to_net_dev(dev); - /* The check is also done in change_proto_down; this helps returning - * early without hitting the trylock/restart in netdev_store. - */ - if (!netdev->netdev_ops->ndo_change_proto_down) - return -EOPNOTSUPP; - return netdev_store(dev, attr, buf, len, change_proto_down); } NETDEVICE_SHOW_RW(proto_down, fmt_dec); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2af8aeeadadf..2ff3369892ff 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2539,13 +2539,12 @@ static int do_set_proto_down(struct net_device *dev, struct netlink_ext_ack *extack) { struct nlattr *pdreason[IFLA_PROTO_DOWN_REASON_MAX + 1]; - const struct net_device_ops *ops = dev->netdev_ops; unsigned long mask = 0; u32 value; bool proto_down; int err; - if (!ops->ndo_change_proto_down) { + if (!(dev->priv_flags & IFF_CHANGE_PROTO_DOWN)) { NL_SET_ERR_MSG(extack, "Protodown not supported by device"); return -EOPNOTSUPP; }
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c index 186d00a9..26c8763 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c @@ -1376,6 +1376,17 @@ static void cgx_lmac_linkup_work(struct work_struct *work) } } +int cgx_set_link_state(void *cgxd, int lmac_id, bool enable) +{ + struct cgx *cgx = cgxd; + + if (!cgx) + return -ENODEV; + + return cgx_fwi_link_change(cgx, lmac_id, enable); +} +EXPORT_SYMBOL(cgx_set_link_state); + int cgx_lmac_linkup_start(void *cgxd) { struct cgx *cgx = cgxd; diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h index ab1e4ab..4d7c320 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h @@ -172,4 +172,5 @@ u64 cgx_lmac_read(int cgx_id, int lmac_id, u64 offset); int cgx_lmac_addr_update(u8 cgx_id, u8 lmac_id, u8 *mac_addr, u8 index); u64 cgx_read_dmac_ctrl(void *cgxd, int lmac_id); u64 cgx_read_dmac_entry(void *cgxd, int index); +int cgx_set_link_state(void *cgxd, int lmac_id, bool enable); #endif /* CGX_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h index 4e79e91..25c2497 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h @@ -162,6 +162,8 @@ M(CGX_MAC_ADDR_DEL, 0x212, cgx_mac_addr_del, cgx_mac_addr_del_req, \ msg_rsp) \ M(CGX_MAC_MAX_ENTRIES_GET, 0x213, cgx_mac_max_entries_get, msg_req, \ cgx_max_dmac_entries_get_rsp) \ +M(CGX_SET_LINK_STATE, 0x214, cgx_set_link_state, \ + cgx_set_link_state_msg, msg_rsp) \ M(CGX_FEC_STATS, 0x217, cgx_fec_stats, msg_req, cgx_fec_stats_rsp) \ M(CGX_SET_LINK_MODE, 0x218, cgx_set_link_mode, cgx_set_link_mode_req,\ cgx_set_link_mode_rsp) \ @@ -581,6 +583,11 @@ struct cgx_set_link_mode_rsp { int status; }; +struct cgx_set_link_state_msg { + struct mbox_msghdr hdr; + u8 enable; /* '1' for link up, '0' for link down */ +}; + struct cgx_mac_addr_update_req { struct mbox_msghdr hdr; u8 mac_addr[ETH_ALEN]; diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c index 2ca182a..7781be1 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c @@ -1069,3 +1069,27 @@ int rvu_mbox_handler_cgx_mac_addr_update(struct rvu *rvu, rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); return cgx_lmac_addr_update(cgx_id, lmac_id, req->mac_addr, req->index); } + +int rvu_mbox_handler_cgx_set_link_state(struct rvu *rvu, + struct cgx_set_link_state_msg *req, + struct msg_rsp *rsp) +{ + u16 pcifunc = req->hdr.pcifunc; + u8 cgx_id, lmac_id; + int pf, err; + + pf = rvu_get_pf(pcifunc); + + if (!is_cgx_config_permitted(rvu, pcifunc)) + return LMAC_AF_ERR_PERM_DENIED; + + rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); + + err = cgx_set_link_state(rvu_cgx_pdata(cgx_id, rvu), lmac_id, + !!req->enable); + if (err) + dev_warn(rvu->dev, "Cannot set link state to %s, err %d\n", + (req->enable) ? "enable" : "disable", err); + + return err; +} diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c index 66da31f..446378a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c @@ -263,6 +263,26 @@ int otx2_config_pause_frm(struct otx2_nic *pfvf) return err; } +int otx2_config_serdes_link_state(struct otx2_nic *pfvf, bool en) +{ + struct cgx_set_link_state_msg *req; + int err; + + mutex_lock(&pfvf->mbox.lock); + req = otx2_mbox_alloc_msg_cgx_set_link_state(&pfvf->mbox); + if (!req) { + err = -ENOMEM; + goto unlock; + } + + req->enable = !!en; + err = otx2_sync_mbox_msg(&pfvf->mbox); +unlock: + mutex_unlock(&pfvf->mbox.lock); + return err; +} +EXPORT_SYMBOL(otx2_config_serdes_link_state); + int otx2_set_flowkey_cfg(struct otx2_nic *pfvf) { struct otx2_rss_info *rss = &pfvf->hw.rss_info; diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h index 61e5281..8e60e3c 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h @@ -771,6 +771,7 @@ void otx2_get_mac_from_af(struct net_device *netdev); void otx2_config_irq_coalescing(struct otx2_nic *pfvf, int qidx); int otx2_config_pause_frm(struct otx2_nic *pfvf); void otx2_setup_segmentation(struct otx2_nic *pfvf); +int otx2_config_serdes_link_state(struct otx2_nic *pfvf, bool en); /* RVU block related APIs */ int otx2_attach_npa_nix(struct otx2_nic *pfvf); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c index 777a270..471097b 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c @@ -64,9 +64,33 @@ static int otx2_dl_mcam_count_get(struct devlink *devlink, u32 id, return 0; } +static int otx2_dl_serdes_link_set(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct otx2_devlink *otx2_dl = devlink_priv(devlink); + struct otx2_nic *pfvf = otx2_dl->pfvf; + + if (!is_otx2_vf(pfvf->pcifunc)) + return otx2_config_serdes_link_state(pfvf, ctx->val.vbool); + + return -EOPNOTSUPP; +} + +static int otx2_dl_serdes_link_get(struct devlink *devlink, u32 id, + struct devlink_param_gset_ctx *ctx) +{ + struct otx2_devlink *otx2_dl = devlink_priv(devlink); + struct otx2_nic *pfvf = otx2_dl->pfvf; + + ctx->val.vbool = (pfvf->linfo.link_up) ? true : false; + + return 0; +} + enum otx2_dl_param_id { OTX2_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX, OTX2_DEVLINK_PARAM_ID_MCAM_COUNT, + OTX2_DEVLINK_PARAM_ID_SERDES_LINK, }; static const struct devlink_param otx2_dl_params[] = { @@ -75,6 +99,11 @@ static const struct devlink_param otx2_dl_params[] = { BIT(DEVLINK_PARAM_CMODE_RUNTIME), otx2_dl_mcam_count_get, otx2_dl_mcam_count_set, otx2_dl_mcam_count_validate), + DEVLINK_PARAM_DRIVER(OTX2_DEVLINK_PARAM_ID_SERDES_LINK, + "serdes_link", DEVLINK_PARAM_TYPE_BOOL, + BIT(DEVLINK_PARAM_CMODE_RUNTIME), + otx2_dl_serdes_link_get, otx2_dl_serdes_link_set, + NULL), }; /* Devlink OPs */