diff mbox series

[net-next,1/2] octeontx2-pf: Add devlink param to init and de-init serdes

Message ID 1635330675-25592-2-git-send-email-sbhatta@marvell.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series Add a devlink param and documentation | expand

Checks

Context Check Description
netdev/cover_letter success Series has a cover letter
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: jerinj@marvell.com lcherian@marvell.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 145 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 19 this patch: 19
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Subbaraya Sundeep Oct. 27, 2021, 10:31 a.m. UTC
From: Rakesh Babu <rsaladi2@marvell.com>

The physical/SerDes link of an netdev interface is not
toggled on interface bring up and bring down. This is
because the same link is shared between PFs and its VFs.
This patch adds devlink param to toggle physical link so
that it is useful in cases where a physical link needs to
be re-initialized. A command is sent to firmware for
link change.

commands:
    devlink dev param show
        pci/0002:02:00.0:
          name serdes_link type driver-specific
     values:
          cmode runtime value true

    devlink dev param set pci/0002:02:00.0 name serdes_link value true
    cmode runtime

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/cgx.c    | 11 ++++++++
 drivers/net/ethernet/marvell/octeontx2/af/cgx.h    |  1 +
 drivers/net/ethernet/marvell/octeontx2/af/mbox.h   |  7 ++++++
 .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c    | 24 ++++++++++++++++++
 .../ethernet/marvell/octeontx2/nic/otx2_common.c   | 20 +++++++++++++++
 .../ethernet/marvell/octeontx2/nic/otx2_common.h   |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_devlink.c  | 29 ++++++++++++++++++++++
 7 files changed, 93 insertions(+)

Comments

Jakub Kicinski Oct. 27, 2021, 3:38 p.m. UTC | #1
On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:
> From: Rakesh Babu <rsaladi2@marvell.com>
> 
> The physical/SerDes link of an netdev interface is not
> toggled on interface bring up and bring down. This is
> because the same link is shared between PFs and its VFs.
> This patch adds devlink param to toggle physical link so
> that it is useful in cases where a physical link needs to
> be re-initialized.

So it's a reset? Or are there cases where user wants the link 
to stay down?
sundeep subbaraya Oct. 27, 2021, 4:43 p.m. UTC | #2
Hi  Jakub,

> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, October 27, 2021 9:08 PM
> To: Subbaraya Sundeep Bhatta
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Hariprasad Kelam; Geethasowjanya Akula; Sunil Kovvuri Goutham; Rakesh Babu Saladi; Vamsi Krishna Attunuru
> Subject: Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes
>
> On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:
> > From: Rakesh Babu <rsaladi2@marvell.com>
> >
> > The physical/SerDes link of an netdev interface is not
> > toggled on interface bring up and bring down. This is
> > because the same link is shared between PFs and its VFs.
> > This patch adds devlink param to toggle physical link so
> > that it is useful in cases where a physical link needs to
> > be re-initialized.
>
> So it's a reset? Or are there cases where user wants the link
> to stay down?

There are cases where the user wants the link to stay down and debug.
We are adding this to help customers to debug issues wrt physical links.

Thanks,
Sundeep
Jakub Kicinski Oct. 27, 2021, 5:08 p.m. UTC | #3
On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote:
> > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:  
> > > From: Rakesh Babu <rsaladi2@marvell.com>
> > >
> > > The physical/SerDes link of an netdev interface is not
> > > toggled on interface bring up and bring down. This is
> > > because the same link is shared between PFs and its VFs.
> > > This patch adds devlink param to toggle physical link so
> > > that it is useful in cases where a physical link needs to
> > > be re-initialized.  
> >
> > So it's a reset? Or are there cases where user wants the link
> > to stay down?  
> 
> There are cases where the user wants the link to stay down and debug.
> We are adding this to help customers to debug issues wrt physical links.

Intel has a similar thing, they keep adding a ethtool priv flag called 
"link-down-on-close" to all their drivers. Maybe others do this, too. 
It's time we added a standard API for this.
Ido Schimmel Oct. 27, 2021, 6:11 p.m. UTC | #4
On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote:
> > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:  
> > > > From: Rakesh Babu <rsaladi2@marvell.com>
> > > >
> > > > The physical/SerDes link of an netdev interface is not
> > > > toggled on interface bring up and bring down. This is
> > > > because the same link is shared between PFs and its VFs.
> > > > This patch adds devlink param to toggle physical link so
> > > > that it is useful in cases where a physical link needs to
> > > > be re-initialized.  
> > >
> > > So it's a reset? Or are there cases where user wants the link
> > > to stay down?  
> > 
> > There are cases where the user wants the link to stay down and debug.
> > We are adding this to help customers to debug issues wrt physical links.
> 
> Intel has a similar thing, they keep adding a ethtool priv flag called 
> "link-down-on-close" to all their drivers.

This is the list I compiled the previous time we discussed it:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c3880bd159d431d06b687b0b5ab22e24e6ef0070
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5ec9e2ce41ac198de2ee18e0e529b7ebbc67408
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1

It seems that various drivers default to not shutting down the link upon
ndo_stop(), but some users want to override it. I hit that too as it
breaks ECMP (switch thinks the link is up).

> Maybe others do this, too.  It's time we added a standard API for
> this.

The new parameter sounds like a reset, but it can also be achieved by:

# ethtool --set-priv-flags eth0 link-down-on-close on
# ip link set dev eth0 down
# ip link set dev eth0 up

Where the first command is replaced by a more standard ethtool API.
Ido Schimmel Oct. 27, 2021, 10:24 p.m. UTC | #5
On Wed, Oct 27, 2021 at 06:43:00PM +0000, Sunil Kovvuri Goutham wrote:
> 
> > ________________________________
> > From: Ido Schimmel <idosch@idosch.org>
> > Sent: Wednesday, October 27, 2021 11:41 PM
> > To: Jakub Kicinski <kuba@kernel.org>
> > Cc: sundeep subbaraya <sundeep.lkml@gmail.com>; David Miller <davem@davemloft.net>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Hariprasad Kelam <hkelam@marvell.com>; Geethasowjanya Akula <gakula@marvell.com>; Sunil Kovvuri Goutham <sgoutham@marvell.com>; Subbaraya Sundeep Bhatta <sbhatta@marvell.com>; Rakesh Babu Saladi <rsaladi2@marvell.com>; Saeed Mahameed <saeed@kernel.org>; anthony.l.nguyen@intel.com <anthony.l.nguyen@intel.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; Andrew Lunn <andrew@lunn.ch>
> > Subject: Re: [EXT] Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes
> >
> > On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote:
> > > On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote:
> > > > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:
> > > > > > From: Rakesh Babu <rsaladi2@marvell.com>
> > > > > >
> > > > > > The physical/SerDes link of an netdev interface is not
> > > > > > toggled on interface bring up and bring down. This is
> > > > > > because the same link is shared between PFs and its VFs.
> > > > > > This patch adds devlink param to toggle physical link so
> > > > > > that it is useful in cases where a physical link needs to
> > > > > > be re-initialized.
> > > > >
> > > > > So it's a reset? Or are there cases where user wants the link
> > > > > to stay down?
> > > >
> > > > There are cases where the user wants the link to stay down and debug.
> > > > We are adding this to help customers to debug issues wrt physical links.
> > >
> > > Intel has a similar thing, they keep adding a ethtool priv flag called
> > > "link-down-on-close" to all their drivers.
> >
> > This is the list I compiled the previous time we discussed it:
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dc3880bd159d431d06b687b0b5ab22e24e6ef0070&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=fvl1aLwL55CWIsG2NT5i3QsP4o_GTEsGhA6Epjz7ZAk&e=
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dd5ec9e2ce41ac198de2ee18e0e529b7ebbc67408&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=kH50Qq3h75xREveyWvCUn35wXagtt4uv1QRK0wMBEdk&e=
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=Uc3yY-5HjS7TgRBl4DPLsJ19XiHDD_PvF8hA38K4XwI&e=
> >
> > It seems that various drivers default to not shutting down the link upon
> > ndo_stop(), but some users want to override it. I hit that too as it
> > breaks ECMP (switch thinks the link is up).
> >
> > > Maybe others do this, too.  It's time we added a standard API for
> > > this.
> >
> > The new parameter sounds like a reset, but it can also be achieved by:
> >
> > # ethtool --set-priv-flags eth0 link-down-on-close on
> > # ip link set dev eth0 down
> > # ip link set dev eth0 up
> >
> > Where the first command is replaced by a more standard ethtool API.
> 
> The intention here is provide an option to the user to toggle the serdes configuration
> as and when he wants to.

But why? What is the motivation? The commit message basically says that
you are adding a param to toggle the physical link because it is useful
to toggle the physical link.

> There is no dependency with logical interface's status.

But there is and the commit message explains why you are not doing it as
part of ndo_{stop,open}(): "because the same link is shared between PFs
and its VFs"

Such constraints also apply to other drivers and you can see that in the
"link-down-on-close" private flag. I'm also aware of propriety tools to
toggle device bits which prevent the physical link from going down upon
ndo_stop().

> Having a standard API to select bringing down physical interface upon logical interface's close call
> is a good idea. But this patch is not for that.

IIUC, your default behavior is not to take the physical link down upon
ndo_stop() and now you want to toggle the link. If you have a standard
API to change the default behavior, then the commands I showed will
toggle the link, no?
sundeep subbaraya Oct. 28, 2021, 12:18 p.m. UTC | #6
Hi Ido,

On Thu, Oct 28, 2021 at 3:55 AM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Wed, Oct 27, 2021 at 06:43:00PM +0000, Sunil Kovvuri Goutham wrote:
> >
> > > ________________________________
> > > From: Ido Schimmel <idosch@idosch.org>
> > > Sent: Wednesday, October 27, 2021 11:41 PM
> > > To: Jakub Kicinski <kuba@kernel.org>
> > > Cc: sundeep subbaraya <sundeep.lkml@gmail.com>; David Miller <davem@davemloft.net>; netdev@vger.kernel.org <netdev@vger.kernel.org>; Hariprasad Kelam <hkelam@marvell.com>; Geethasowjanya Akula <gakula@marvell.com>; Sunil Kovvuri Goutham <sgoutham@marvell.com>; Subbaraya Sundeep Bhatta <sbhatta@marvell.com>; Rakesh Babu Saladi <rsaladi2@marvell.com>; Saeed Mahameed <saeed@kernel.org>; anthony.l.nguyen@intel.com <anthony.l.nguyen@intel.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; Andrew Lunn <andrew@lunn.ch>
> > > Subject: Re: [EXT] Re: [net-next PATCH 1/2] octeontx2-pf: Add devlink param to init and de-init serdes
> > >
> > > On Wed, Oct 27, 2021 at 10:08:57AM -0700, Jakub Kicinski wrote:
> > > > On Wed, 27 Oct 2021 22:13:32 +0530 sundeep subbaraya wrote:
> > > > > > On Wed, 27 Oct 2021 16:01:14 +0530 Subbaraya Sundeep wrote:
> > > > > > > From: Rakesh Babu <rsaladi2@marvell.com>
> > > > > > >
> > > > > > > The physical/SerDes link of an netdev interface is not
> > > > > > > toggled on interface bring up and bring down. This is
> > > > > > > because the same link is shared between PFs and its VFs.
> > > > > > > This patch adds devlink param to toggle physical link so
> > > > > > > that it is useful in cases where a physical link needs to
> > > > > > > be re-initialized.
> > > > > >
> > > > > > So it's a reset? Or are there cases where user wants the link
> > > > > > to stay down?
> > > > >
> > > > > There are cases where the user wants the link to stay down and debug.
> > > > > We are adding this to help customers to debug issues wrt physical links.
> > > >
> > > > Intel has a similar thing, they keep adding a ethtool priv flag called
> > > > "link-down-on-close" to all their drivers.
> > >
> > > This is the list I compiled the previous time we discussed it:
> > >
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dc3880bd159d431d06b687b0b5ab22e24e6ef0070&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=fvl1aLwL55CWIsG2NT5i3QsP4o_GTEsGhA6Epjz7ZAk&e=
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dd5ec9e2ce41ac198de2ee18e0e529b7ebbc67408&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=kH50Qq3h75xREveyWvCUn35wXagtt4uv1QRK0wMBEdk&e=
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_commit_-3Fid-3Dab4ab73fc1ec6dec548fa36c5e383ef5faa7b4c1&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=q3VKxXQKiboRw_F01ggTzHuhwawxR1P9_tMCN2FODU4&m=4xGR8HuIRKUriC93QV4GmQBJ6KVwRgGZ05Syzpq2CAM&s=Uc3yY-5HjS7TgRBl4DPLsJ19XiHDD_PvF8hA38K4XwI&e=
> > >
> > > It seems that various drivers default to not shutting down the link upon
> > > ndo_stop(), but some users want to override it. I hit that too as it
> > > breaks ECMP (switch thinks the link is up).
> > >
> > > > Maybe others do this, too.  It's time we added a standard API for
> > > > this.
> > >
> > > The new parameter sounds like a reset, but it can also be achieved by:
> > >
> > > # ethtool --set-priv-flags eth0 link-down-on-close on
> > > # ip link set dev eth0 down
> > > # ip link set dev eth0 up
> > >
> > > Where the first command is replaced by a more standard ethtool API.
> >
> > The intention here is provide an option to the user to toggle the serdes configuration
> > as and when he wants to.
>
> But why? What is the motivation? The commit message basically says that
> you are adding a param to toggle the physical link because it is useful
> to toggle the physical link.
>
> > There is no dependency with logical interface's status.
>
> But there is and the commit message explains why you are not doing it as
> part of ndo_{stop,open}(): "because the same link is shared between PFs
> and its VFs"
>
> Such constraints also apply to other drivers and you can see that in the
> "link-down-on-close" private flag. I'm also aware of propriety tools to
> toggle device bits which prevent the physical link from going down upon
> ndo_stop().
>
> > Having a standard API to select bringing down physical interface upon logical interface's close call
> > is a good idea. But this patch is not for that.
>
> IIUC, your default behavior is not to take the physical link down upon
> ndo_stop() and now you want to toggle the link. If you have a standard
> API to change the default behavior, then the commands I showed will
> toggle the link, no?

Actually we also need a case where debugging is required when the
logical link is
up (so that packets flow from kernel to SerDes continuously) but the
physical link
is down. We will change the commit description since it is giving the
wrong impression.
A command to change physical link up/down with no relation to ifconfig
is needed.

Thanks,
Sundeep
Ido Schimmel Oct. 28, 2021, 1:51 p.m. UTC | #7
On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote:
> Actually we also need a case where debugging is required when the
> logical link is
> up (so that packets flow from kernel to SerDes continuously) but the
> physical link
> is down. 

Can you explain the motivation for that? In the past we discussed use
cases for forcing the operational state to down while the administrative
state is up and couldn't find any.

> We will change the commit description since it is giving the
> wrong impression.
> A command to change physical link up/down with no relation to ifconfig
> is needed.

So it is obvious that some drivers default to not shutting down the
physical link upon admin down, but that some users want to change that.
In addition, we have your use case to control the physical link without
relation to the logical link. I wonder if it can all be solved with a
new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the
physical link policy and has the following values:

* auto: Physical link state is derived from logical link state
* up: Physical link state is always up
* down: Physical link state is always down

IIUC, it should solve your problem and that of the "link-down-on-close"
private flag. It also has the added benefit of allowing user space to
query the default policy. The expectation is that it would be "auto",
but in some scenarios it is "up".
sundeep subbaraya Oct. 30, 2021, 7:25 a.m. UTC | #8
Hi Ido,

On Thu, Oct 28, 2021 at 7:21 PM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote:
> > Actually we also need a case where debugging is required when the
> > logical link is
> > up (so that packets flow from kernel to SerDes continuously) but the
> > physical link
> > is down.
>
> Can you explain the motivation for that? In the past we discussed use
> cases for forcing the operational state to down while the administrative
> state is up and couldn't find any.
>
To be honest we got this request from a customer to provide a command to modify
physical link without tying it to a logical link. I have asked for
more details on how
they use it.

> > We will change the commit description since it is giving the
> > wrong impression.
> > A command to change physical link up/down with no relation to ifconfig
> > is needed.
>
> So it is obvious that some drivers default to not shutting down the
> physical link upon admin down, but that some users want to change that.
> In addition, we have your use case to control the physical link without
> relation to the logical link. I wonder if it can all be solved with a
> new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the
> physical link policy and has the following values:
>
> * auto: Physical link state is derived from logical link state
> * up: Physical link state is always up
> * down: Physical link state is always down
>
> IIUC, it should solve your problem and that of the "link-down-on-close"
> private flag. It also has the added benefit of allowing user space to
> query the default policy. The expectation is that it would be "auto",
> but in some scenarios it is "up".

This looks good. Please note that we need the behavior such that after changing
the flag a subsequent ifconfig command is not needed by the user.

auto : in ndo_open, ndo_close check the physical link flag is auto and
send command
          to firmware for bringing physical link up/down.
up: send command to firmware instantaneously for physical link UP
down: send command to firmware instantaneously for physical link DOWN

Thanks,
Sundeep
Ido Schimmel Nov. 7, 2021, 9:21 a.m. UTC | #9
On Sat, Oct 30, 2021 at 12:55:47PM +0530, sundeep subbaraya wrote:
> Hi Ido,
> 
> On Thu, Oct 28, 2021 at 7:21 PM Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Thu, Oct 28, 2021 at 05:48:02PM +0530, sundeep subbaraya wrote:
> > > Actually we also need a case where debugging is required when the
> > > logical link is
> > > up (so that packets flow from kernel to SerDes continuously) but the
> > > physical link
> > > is down.
> >
> > Can you explain the motivation for that? In the past we discussed use
> > cases for forcing the operational state to down while the administrative
> > state is up and couldn't find any.
> >
> To be honest we got this request from a customer to provide a command to modify
> physical link without tying it to a logical link. I have asked for
> more details on how
> they use it.

Thanks

> 
> > > We will change the commit description since it is giving the
> > > wrong impression.
> > > A command to change physical link up/down with no relation to ifconfig
> > > is needed.
> >
> > So it is obvious that some drivers default to not shutting down the
> > physical link upon admin down, but that some users want to change that.
> > In addition, we have your use case to control the physical link without
> > relation to the logical link. I wonder if it can all be solved with a
> > new ethtool attribute (part of LINKINFO_{SET,GET} ?) that describes the
> > physical link policy and has the following values:
> >
> > * auto: Physical link state is derived from logical link state
> > * up: Physical link state is always up
> > * down: Physical link state is always down
> >
> > IIUC, it should solve your problem and that of the "link-down-on-close"
> > private flag. It also has the added benefit of allowing user space to
> > query the default policy. The expectation is that it would be "auto",
> > but in some scenarios it is "up".
> 
> This looks good. Please note that we need the behavior such that after changing
> the flag a subsequent ifconfig command is not needed by the user.
> 
> auto : in ndo_open, ndo_close check the physical link flag is auto and
> send command
>           to firmware for bringing physical link up/down.
> up: send command to firmware instantaneously for physical link UP
> down: send command to firmware instantaneously for physical link DOWN

TBH, I'm not that happy with my ethtool suggestion. It is not very clear
which hardware entities the attribute controls. Maybe it's better to
implement it as a rtnetlink attribute that controls the carrier (e.g.,
"carrier_policy")? Note that we already have ndo_change_carrier(), but
the kdoc comment explicitly mentions that it shouldn't be used by
physical devices:

 * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
 *	Called to change device carrier. Soft-devices (like dummy, team, etc)
 *	which do not represent real hardware may define this to allow their
 *	userspace components to manage their virtual carrier state. Devices
 *	that determine carrier state from physical hardware properties (eg
 *	network cables) or protocol-dependent mechanisms (eg
 *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
Jakub Kicinski Nov. 8, 2021, 3:54 p.m. UTC | #10
On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote:
> > This looks good. Please note that we need the behavior such that after changing
> > the flag a subsequent ifconfig command is not needed by the user.
> > 
> > auto : in ndo_open, ndo_close check the physical link flag is auto and
> > send command
> >           to firmware for bringing physical link up/down.
> > up: send command to firmware instantaneously for physical link UP
> > down: send command to firmware instantaneously for physical link DOWN  
> 
> TBH, I'm not that happy with my ethtool suggestion. It is not very clear
> which hardware entities the attribute controls.

Last week I heard a request to also be able to model NC-SI disruption.
Control if the NIC should be reset and newly flashed FW activated when
host is rebooted (vs full server power cycle).

That adds another dimension to the problem, even though that particular
use case may be better answered thru the devlink flashing/reset APIs.

Trying to organize the requirements we have 3 entities which may hold
the link up:
 - SFP power policy
 - NC-SI / BMC
 - SR-IOV (legacy)

I'd think auto/up as possible options still make sense, although in
case of NC-SI many NICs may not allow overriding the "up". And the
policy may change without notification if BMC selects / activates 
a port - it may go from auto to up with no notification.

Presumably we want to track "who's holding the link up" per consumer.
Just a bitset with 1s for every consumer holding "up"? 

Or do we expect there will be "more to it" and should create bespoke
nests?

> Maybe it's better to
> implement it as a rtnetlink attribute that controls the carrier (e.g.,
> "carrier_policy")? Note that we already have ndo_change_carrier(), but
> the kdoc comment explicitly mentions that it shouldn't be used by
> physical devices:
>
>  * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
>  *	Called to change device carrier. Soft-devices (like dummy, team, etc)
>  *	which do not represent real hardware may define this to allow their
>  *	userspace components to manage their virtual carrier state. Devices
>  *	that determine carrier state from physical hardware properties (eg
>  *	network cables) or protocol-dependent mechanisms (eg
>  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.

New NDO seems reasonable. 

What are your thoughts on the SFP policy? We can still reshuffle it.
Ido Schimmel Nov. 11, 2021, 2:51 p.m. UTC | #11
On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote:
> On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote:
> > > This looks good. Please note that we need the behavior such that after changing
> > > the flag a subsequent ifconfig command is not needed by the user.
> > > 
> > > auto : in ndo_open, ndo_close check the physical link flag is auto and
> > > send command
> > >           to firmware for bringing physical link up/down.
> > > up: send command to firmware instantaneously for physical link UP
> > > down: send command to firmware instantaneously for physical link DOWN  
> > 
> > TBH, I'm not that happy with my ethtool suggestion. It is not very clear
> > which hardware entities the attribute controls.
> 
> Last week I heard a request to also be able to model NC-SI disruption.
> Control if the NIC should be reset and newly flashed FW activated when
> host is rebooted (vs full server power cycle).
> 
> That adds another dimension to the problem, even though that particular
> use case may be better answered thru the devlink flashing/reset APIs.
> 
> Trying to organize the requirements we have 3 entities which may hold
> the link up:
>  - SFP power policy

The SFP power policy does not keep the link up. In fact, we specifically
removed the "low" policy to make sure that whatever policy you configure
("auto"/"high") does not affect your carrier.

>  - NC-SI / BMC
>  - SR-IOV (legacy)
> 
> I'd think auto/up as possible options still make sense, although in
> case of NC-SI many NICs may not allow overriding the "up". And the
> policy may change without notification if BMC selects / activates 
> a port - it may go from auto to up with no notification.
> 
> Presumably we want to track "who's holding the link up" per consumer.
> Just a bitset with 1s for every consumer holding "up"? 
> 
> Or do we expect there will be "more to it" and should create bespoke
> nests?
> 
> > Maybe it's better to
> > implement it as a rtnetlink attribute that controls the carrier (e.g.,
> > "carrier_policy")? Note that we already have ndo_change_carrier(), but
> > the kdoc comment explicitly mentions that it shouldn't be used by
> > physical devices:
> >
> >  * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
> >  *	Called to change device carrier. Soft-devices (like dummy, team, etc)
> >  *	which do not represent real hardware may define this to allow their
> >  *	userspace components to manage their virtual carrier state. Devices
> >  *	that determine carrier state from physical hardware properties (eg
> >  *	network cables) or protocol-dependent mechanisms (eg
> >  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
> 
> New NDO seems reasonable. 

Spent a bit more time on that and I'm not sure a new ndo is needed. See:

 * void (*ndo_change_proto_down)(struct net_device *dev,
 *				 bool proto_down);
 *	This function is used to pass protocol port error state information
 *	to the switch driver. The switch driver can react to the proto_down
 *      by doing a phys down on the associated switch port.

So what this patch is trying to achieve can be achieved by implementing
support for this ndo:

$ ip link show dev macvlan10
20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff

# ip link set dev macvlan10 protodown on

$ ip link show dev macvlan10
20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on

Currently, user space has no visibility into the fact that by default
the carrier is on, but I imagine this can be resolved by adding
"protoup" and defaulting the driver to report "on". The "who's holding
the link up" issue can be resolved via "protoup_reason" (same as
"protodown_reason").
Jakub Kicinski Nov. 11, 2021, 4:47 p.m. UTC | #12
On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote:
> On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote:
> > On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote:  
> > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear
> > > which hardware entities the attribute controls.  
> > 
> > Last week I heard a request to also be able to model NC-SI disruption.
> > Control if the NIC should be reset and newly flashed FW activated when
> > host is rebooted (vs full server power cycle).
> > 
> > That adds another dimension to the problem, even though that particular
> > use case may be better answered thru the devlink flashing/reset APIs.
> > 
> > Trying to organize the requirements we have 3 entities which may hold
> > the link up:
> >  - SFP power policy  
> 
> The SFP power policy does not keep the link up. In fact, we specifically
> removed the "low" policy to make sure that whatever policy you configure
> ("auto"/"high") does not affect your carrier.

Hm. How do we come up with the appropriate wording here...

I meant keeping the "PHY level link" up? I think we agree that all the
cases should behave like SFP power behaves today?

The API is to control or query what is forcing the PHY link to stay up
after the netdev was set down. IOW why does the switch still see link
up if the link is down on Linux. I don't think we should report carrier
up when netdev is down?

> >  - NC-SI / BMC
> >  - SR-IOV (legacy)

 - NPAR / Mutli-Host

so 4 known reasons.

> > I'd think auto/up as possible options still make sense, although in
> > case of NC-SI many NICs may not allow overriding the "up". And the
> > policy may change without notification if BMC selects / activates 
> > a port - it may go from auto to up with no notification.
> > 
> > Presumably we want to track "who's holding the link up" per consumer.
> > Just a bitset with 1s for every consumer holding "up"? 
> > 
> > Or do we expect there will be "more to it" and should create bespoke
> > nests?
> >   
> > > Maybe it's better to
> > > implement it as a rtnetlink attribute that controls the carrier (e.g.,
> > > "carrier_policy")? Note that we already have ndo_change_carrier(), but
> > > the kdoc comment explicitly mentions that it shouldn't be used by
> > > physical devices:
> > >
> > >  * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
> > >  *	Called to change device carrier. Soft-devices (like dummy, team, etc)
> > >  *	which do not represent real hardware may define this to allow their
> > >  *	userspace components to manage their virtual carrier state. Devices
> > >  *	that determine carrier state from physical hardware properties (eg
> > >  *	network cables) or protocol-dependent mechanisms (eg
> > >  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.  
> > 
> > New NDO seems reasonable.   
> 
> Spent a bit more time on that and I'm not sure a new ndo is needed. See:
> 
>  * void (*ndo_change_proto_down)(struct net_device *dev,
>  *				 bool proto_down);
>  *	This function is used to pass protocol port error state information
>  *	to the switch driver. The switch driver can react to the proto_down
>  *      by doing a phys down on the associated switch port.
> 
> So what this patch is trying to achieve can be achieved by implementing
> support for this ndo:
> 
> $ ip link show dev macvlan10
> 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff
> 
> # ip link set dev macvlan10 protodown on
> 
> $ ip link show dev macvlan10
> 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
>     link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on

Let's wait to hear a strong use case, tho.

> Currently, user space has no visibility into the fact that by default
> the carrier is on, but I imagine this can be resolved by adding
> "protoup" and defaulting the driver to report "on". The "who's holding
> the link up" issue can be resolved via "protoup_reason" (same as
> "protodown_reason").

"proto" in "protodown" refers to STP, right? Not sure what "proto" in
"protoup" would be.
Ido Schimmel Nov. 14, 2021, 8:38 a.m. UTC | #13
On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote:
> On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote:
> > On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote:
> > > On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote:  
> > > > TBH, I'm not that happy with my ethtool suggestion. It is not very clear
> > > > which hardware entities the attribute controls.  
> > > 
> > > Last week I heard a request to also be able to model NC-SI disruption.
> > > Control if the NIC should be reset and newly flashed FW activated when
> > > host is rebooted (vs full server power cycle).
> > > 
> > > That adds another dimension to the problem, even though that particular
> > > use case may be better answered thru the devlink flashing/reset APIs.
> > > 
> > > Trying to organize the requirements we have 3 entities which may hold
> > > the link up:
> > >  - SFP power policy  
> > 
> > The SFP power policy does not keep the link up. In fact, we specifically
> > removed the "low" policy to make sure that whatever policy you configure
> > ("auto"/"high") does not affect your carrier.
> 
> Hm. How do we come up with the appropriate wording here...
> 
> I meant keeping the "PHY level link" up? I think we agree that all the
> cases should behave like SFP power behaves today?
> 
> The API is to control or query what is forcing the PHY link to stay up
> after the netdev was set down. IOW why does the switch still see link
> up if the link is down on Linux. 

The SFP power policy doesn't affect that. In our systems (and I believe
many others), by default, the transceivers are transitioned to high
power mode upon plug-in, but the link is still down when the netdev is
down because the MAC/PHY are not operational.

With SRIOV/Multi-Host, the MAC/PHY are always operational which is why
your link partner has a carrier even when the netdev is down.

> I don't think we should report carrier up when netdev is down?

This is what happens today, but it's misleading because the carrier is
always up with these systems. When I take the netdev down, I expect my
link partner to lose carrier. If this doesn't happen, then I believe the
netdev should always report IFF_UP. Alternatively, to avoid user space
breakage, this can be reported via a new attribute such as "protoup".

> 
> > >  - NC-SI / BMC
> > >  - SR-IOV (legacy)
> 
>  - NPAR / Mutli-Host
> 
> so 4 known reasons.
> 
> > > I'd think auto/up as possible options still make sense, although in
> > > case of NC-SI many NICs may not allow overriding the "up". And the
> > > policy may change without notification if BMC selects / activates 
> > > a port - it may go from auto to up with no notification.
> > > 
> > > Presumably we want to track "who's holding the link up" per consumer.
> > > Just a bitset with 1s for every consumer holding "up"? 
> > > 
> > > Or do we expect there will be "more to it" and should create bespoke
> > > nests?
> > >   
> > > > Maybe it's better to
> > > > implement it as a rtnetlink attribute that controls the carrier (e.g.,
> > > > "carrier_policy")? Note that we already have ndo_change_carrier(), but
> > > > the kdoc comment explicitly mentions that it shouldn't be used by
> > > > physical devices:
> > > >
> > > >  * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
> > > >  *	Called to change device carrier. Soft-devices (like dummy, team, etc)
> > > >  *	which do not represent real hardware may define this to allow their
> > > >  *	userspace components to manage their virtual carrier state. Devices
> > > >  *	that determine carrier state from physical hardware properties (eg
> > > >  *	network cables) or protocol-dependent mechanisms (eg
> > > >  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.  
> > > 
> > > New NDO seems reasonable.   
> > 
> > Spent a bit more time on that and I'm not sure a new ndo is needed. See:
> > 
> >  * void (*ndo_change_proto_down)(struct net_device *dev,
> >  *				 bool proto_down);
> >  *	This function is used to pass protocol port error state information
> >  *	to the switch driver. The switch driver can react to the proto_down
> >  *      by doing a phys down on the associated switch port.
> > 
> > So what this patch is trying to achieve can be achieved by implementing
> > support for this ndo:
> > 
> > $ ip link show dev macvlan10
> > 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >     link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff
> > 
> > # ip link set dev macvlan10 protodown on
> > 
> > $ ip link show dev macvlan10
> > 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
> >     link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on
> 
> Let's wait to hear a strong use case, tho.

Agree

> 
> > Currently, user space has no visibility into the fact that by default
> > the carrier is on, but I imagine this can be resolved by adding
> > "protoup" and defaulting the driver to report "on". The "who's holding
> > the link up" issue can be resolved via "protoup_reason" (same as
> > "protodown_reason").
> 
> "proto" in "protodown" refers to STP, right?

Not really. I believe the main use case was vrrp / mlag. The
"protdown_reason" is just a bitmap of user enumerated reasons to keep
the interface down. See commit 829eb208e80d ("rtnetlink: add support for
protodown reason") for details.

> Not sure what "proto" in "protoup" would be.

sriov/multi-host/etc ?
Roopa Prabhu Nov. 15, 2021, 4:19 a.m. UTC | #14
On 11/14/21 12:38 AM, Ido Schimmel wrote:
> On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote:
>> On Thu, 11 Nov 2021 16:51:51 +0200 Ido Schimmel wrote:
>>> On Mon, Nov 08, 2021 at 07:54:50AM -0800, Jakub Kicinski wrote:
>>>> On Sun, 7 Nov 2021 11:21:17 +0200 Ido Schimmel wrote:
>>>>> TBH, I'm not that happy with my ethtool suggestion. It is not very clear
>>>>> which hardware entities the attribute controls.
>>>> Last week I heard a request to also be able to model NC-SI disruption.
>>>> Control if the NIC should be reset and newly flashed FW activated when
>>>> host is rebooted (vs full server power cycle).
>>>>
>>>> That adds another dimension to the problem, even though that particular
>>>> use case may be better answered thru the devlink flashing/reset APIs.
>>>>
>>>> Trying to organize the requirements we have 3 entities which may hold
>>>> the link up:
>>>>   - SFP power policy
>>> The SFP power policy does not keep the link up. In fact, we specifically
>>> removed the "low" policy to make sure that whatever policy you configure
>>> ("auto"/"high") does not affect your carrier.
>> Hm. How do we come up with the appropriate wording here...
>>
>> I meant keeping the "PHY level link" up? I think we agree that all the
>> cases should behave like SFP power behaves today?
>>
>> The API is to control or query what is forcing the PHY link to stay up
>> after the netdev was set down. IOW why does the switch still see link
>> up if the link is down on Linux.
> The SFP power policy doesn't affect that. In our systems (and I believe
> many others), by default, the transceivers are transitioned to high
> power mode upon plug-in, but the link is still down when the netdev is
> down because the MAC/PHY are not operational.
>
> With SRIOV/Multi-Host, the MAC/PHY are always operational which is why
> your link partner has a carrier even when the netdev is down.
>
>> I don't think we should report carrier up when netdev is down?
> This is what happens today, but it's misleading because the carrier is
> always up with these systems. When I take the netdev down, I expect my
> link partner to lose carrier. If this doesn't happen, then I believe the
> netdev should always report IFF_UP. Alternatively, to avoid user space
> breakage, this can be reported via a new attribute such as "protoup".
>
>>>>   - NC-SI / BMC
>>>>   - SR-IOV (legacy)
>>   - NPAR / Mutli-Host
>>
>> so 4 known reasons.
>>
>>>> I'd think auto/up as possible options still make sense, although in
>>>> case of NC-SI many NICs may not allow overriding the "up". And the
>>>> policy may change without notification if BMC selects / activates
>>>> a port - it may go from auto to up with no notification.
>>>>
>>>> Presumably we want to track "who's holding the link up" per consumer.
>>>> Just a bitset with 1s for every consumer holding "up"?
>>>>
>>>> Or do we expect there will be "more to it" and should create bespoke
>>>> nests?
>>>>    
>>>>> Maybe it's better to
>>>>> implement it as a rtnetlink attribute that controls the carrier (e.g.,
>>>>> "carrier_policy")? Note that we already have ndo_change_carrier(), but
>>>>> the kdoc comment explicitly mentions that it shouldn't be used by
>>>>> physical devices:
>>>>>
>>>>>   * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
>>>>>   *	Called to change device carrier. Soft-devices (like dummy, team, etc)
>>>>>   *	which do not represent real hardware may define this to allow their
>>>>>   *	userspace components to manage their virtual carrier state. Devices
>>>>>   *	that determine carrier state from physical hardware properties (eg
>>>>>   *	network cables) or protocol-dependent mechanisms (eg
>>>>>   *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
>>>> New NDO seems reasonable.
>>> Spent a bit more time on that and I'm not sure a new ndo is needed. See:
>>>
>>>   * void (*ndo_change_proto_down)(struct net_device *dev,
>>>   *				 bool proto_down);
>>>   *	This function is used to pass protocol port error state information
>>>   *	to the switch driver. The switch driver can react to the proto_down
>>>   *      by doing a phys down on the associated switch port.
>>>
>>> So what this patch is trying to achieve can be achieved by implementing
>>> support for this ndo:
>>>
>>> $ ip link show dev macvlan10
>>> 20: macvlan10@dummy10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>>      link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff
>>>
>>> # ip link set dev macvlan10 protodown on
>>>
>>> $ ip link show dev macvlan10
>>> 20: macvlan10@dummy10: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
>>>      link/ether 3e:d6:1a:97:ba:5e brd ff:ff:ff:ff:ff:ff protodown on
>> Let's wait to hear a strong use case, tho.
> Agree
>
>>> Currently, user space has no visibility into the fact that by default
>>> the carrier is on, but I imagine this can be resolved by adding
>>> "protoup" and defaulting the driver to report "on". The "who's holding
>>> the link up" issue can be resolved via "protoup_reason" (same as
>>> "protodown_reason").
>> "proto" in "protodown" refers to STP, right?
> Not really. I believe the main use case was vrrp / mlag. The
> "protdown_reason" is just a bitmap of user enumerated reasons to keep
> the interface down. See commit 829eb208e80d ("rtnetlink: add support for
> protodown reason") for details.

correct. Its equivalent to errDisable found on most commercial switch OS'es.

Can be used for any control-plane/mgmt-plane/protocol wanting to hold 
the link down.

Other use-cases where this can be used (as also quoted by other vendors):

mismatch of link properties
Link Flapping detection and disable link
Port Security Violation
Broadcast Storms
etc


>
>> Not sure what "proto" in "protoup" would be.
> sriov/multi-host/etc ?

agree. Would be nice to re-use protodown ndo and state/reason here
Jakub Kicinski Nov. 15, 2021, 3:11 p.m. UTC | #15
On Sun, 14 Nov 2021 20:19:59 -0800 Roopa Prabhu wrote:
> On 11/14/21 12:38 AM, Ido Schimmel wrote:
> > On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote:  
> >> Hm. How do we come up with the appropriate wording here...
> >>
> >> I meant keeping the "PHY level link" up? I think we agree that all the
> >> cases should behave like SFP power behaves today?
> >>
> >> The API is to control or query what is forcing the PHY link to stay up
> >> after the netdev was set down. IOW why does the switch still see link
> >> up if the link is down on Linux.  
> > The SFP power policy doesn't affect that. In our systems (and I believe
> > many others), by default, the transceivers are transitioned to high
> > power mode upon plug-in, but the link is still down when the netdev is
> > down because the MAC/PHY are not operational.

Ah, GTK!

> > With SRIOV/Multi-Host, the MAC/PHY are always operational which is why
> > your link partner has a carrier even when the netdev is down.

I see, I think you're talking about something like IFLA_VF_LINK_STATE_*
but for the PF. That could make sense, although I don't think it was
ever requested.

> >> I don't think we should report carrier up when netdev is down?  
> > This is what happens today, but it's misleading because the carrier is
> > always up with these systems. When I take the netdev down, I expect my
> > link partner to lose carrier. If this doesn't happen, then I believe the
> > netdev should always report IFF_UP. Alternatively, to avoid user space
> > breakage, this can be reported via a new attribute such as "protoup".

Sounds sensible.

> >> "proto" in "protodown" refers to STP, right?  
> > Not really. I believe the main use case was vrrp / mlag. 

VRRP is a proto, mlag maybe a little less clear-cut.

> > The "protdown_reason" is just a bitmap of user enumerated reasons to keep
> > the interface down. See commit 829eb208e80d ("rtnetlink: add support for
> > protodown reason") for details.  
> 
> correct. Its equivalent to errDisable found on most commercial switch OS'es.
> 
> Can be used for any control-plane/mgmt-plane/protocol wanting to hold 
> the link down.
> 
> Other use-cases where this can be used (as also quoted by other vendors):
> 
> mismatch of link properties

What link properties?

> Link Flapping detection and disable link
> Port Security Violation

Port security as established by a .. protocol like 802.1X ?

> Broadcast Storms
> etc

Why not take the entire interface down for bcast storm?

> >> Not sure what "proto" in "protoup" would be.  
> > sriov/multi-host/etc ?  
> 
> agree. Would be nice to re-use protodown ndo and state/reason here

You are the experts so correct me please but the point of protodown 
is that the the link is held down for general traffic but you can 
still exchange protocol messages on it. STP, VRRP, LAG, 802.1X etc.

For anything that does not require special message exchange the link
can be just brought down completely.

In my head link held up is a completely different beast, the local host
does not participate or otherwise pay attention to any communication on
the link. It's about what other entities do with the link.

But if you prefer "protoup" strongly that's fine, I guess.
sundeep subbaraya Nov. 19, 2021, 10:47 a.m. UTC | #16
Hi,

On Mon, Nov 15, 2021 at 8:41 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 14 Nov 2021 20:19:59 -0800 Roopa Prabhu wrote:
> > On 11/14/21 12:38 AM, Ido Schimmel wrote:
> > > On Thu, Nov 11, 2021 at 08:47:19AM -0800, Jakub Kicinski wrote:
> > >> Hm. How do we come up with the appropriate wording here...
> > >>
> > >> I meant keeping the "PHY level link" up? I think we agree that all the
> > >> cases should behave like SFP power behaves today?
> > >>
> > >> The API is to control or query what is forcing the PHY link to stay up
> > >> after the netdev was set down. IOW why does the switch still see link
> > >> up if the link is down on Linux.
> > > The SFP power policy doesn't affect that. In our systems (and I believe
> > > many others), by default, the transceivers are transitioned to high
> > > power mode upon plug-in, but the link is still down when the netdev is
> > > down because the MAC/PHY are not operational.
>
> Ah, GTK!
>
> > > With SRIOV/Multi-Host, the MAC/PHY are always operational which is why
> > > your link partner has a carrier even when the netdev is down.
>
> I see, I think you're talking about something like IFLA_VF_LINK_STATE_*
> but for the PF. That could make sense, although I don't think it was
> ever requested.
>
> > >> I don't think we should report carrier up when netdev is down?
> > > This is what happens today, but it's misleading because the carrier is
> > > always up with these systems. When I take the netdev down, I expect my
> > > link partner to lose carrier. If this doesn't happen, then I believe the
> > > netdev should always report IFF_UP. Alternatively, to avoid user space
> > > breakage, this can be reported via a new attribute such as "protoup".
>
> Sounds sensible.
>
> > >> "proto" in "protodown" refers to STP, right?
> > > Not really. I believe the main use case was vrrp / mlag.
>
> VRRP is a proto, mlag maybe a little less clear-cut.
>
> > > The "protdown_reason" is just a bitmap of user enumerated reasons to keep
> > > the interface down. See commit 829eb208e80d ("rtnetlink: add support for
> > > protodown reason") for details.
> >
> > correct. Its equivalent to errDisable found on most commercial switch OS'es.
> >
> > Can be used for any control-plane/mgmt-plane/protocol wanting to hold
> > the link down.
> >
> > Other use-cases where this can be used (as also quoted by other vendors):
> >
> > mismatch of link properties
>
> What link properties?
>
> > Link Flapping detection and disable link
> > Port Security Violation
>
> Port security as established by a .. protocol like 802.1X ?
>
> > Broadcast Storms
> > etc
>
> Why not take the entire interface down for bcast storm?
>
> > >> Not sure what "proto" in "protoup" would be.
> > > sriov/multi-host/etc ?
> >
> > agree. Would be nice to re-use protodown ndo and state/reason here
>
> You are the experts so correct me please but the point of protodown
> is that the the link is held down for general traffic but you can
> still exchange protocol messages on it. STP, VRRP, LAG, 802.1X etc.
>
> For anything that does not require special message exchange the link
> can be just brought down completely.
>
> In my head link held up is a completely different beast, the local host
> does not participate or otherwise pay attention to any communication on
> the link. It's about what other entities do with the link.
>
> But if you prefer "protoup" strongly that's fine, I guess.

As said by Ido, ndo_change_proto_down with proto_down as
on and off is sufficient for our requirement right now. We will use
ndo_change_proto_down
instead of devlink. Thanks everyone for pitching in.

Sundeep
Jakub Kicinski Nov. 19, 2021, 2:09 p.m. UTC | #17
On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote:
> As said by Ido, ndo_change_proto_down with proto_down as
> on and off is sufficient for our requirement right now. We will use
> ndo_change_proto_down instead of devlink. Thanks everyone for
> pitching in.

ndo_change_proto_down is for software devices. Make sure you explain
your use case well, otherwise it's going to be a nack.
sundeep subbaraya Nov. 19, 2021, 2:26 p.m. UTC | #18
On Fri, Nov 19, 2021 at 7:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote:
> > As said by Ido, ndo_change_proto_down with proto_down as
> > on and off is sufficient for our requirement right now. We will use
> > ndo_change_proto_down instead of devlink. Thanks everyone for
> > pitching in.
>
> ndo_change_proto_down is for software devices. Make sure you explain
> your use case well, otherwise it's going to be a nack.

Sorry new to networking stuff here. Where does the below imply it is
for software devices?
* void (*ndo_change_proto_down)(struct net_device *dev,
 *                               bool proto_down);
 *      This function is used to pass protocol port error state information
 *      to the switch driver. The switch driver can react to the proto_down
 *      by doing a phys down on the associated switch port.
I will find out the use case (pinged customer again)

Thanks,
Sundeep
Jakub Kicinski Nov. 19, 2021, 2:49 p.m. UTC | #19
On Fri, 19 Nov 2021 19:56:51 +0530 sundeep subbaraya wrote:
> On Fri, Nov 19, 2021 at 7:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri, 19 Nov 2021 16:17:53 +0530 sundeep subbaraya wrote:  
> > > As said by Ido, ndo_change_proto_down with proto_down as
> > > on and off is sufficient for our requirement right now. We will use
> > > ndo_change_proto_down instead of devlink. Thanks everyone for
> > > pitching in.  
> >
> > ndo_change_proto_down is for software devices. Make sure you explain
> > your use case well, otherwise it's going to be a nack.  
> 
> Sorry new to networking stuff here. Where does the below imply it is
> for software devices?
> * void (*ndo_change_proto_down)(struct net_device *dev,
>  *                               bool proto_down);
>  *      This function is used to pass protocol port error state information
>  *      to the switch driver. The switch driver can react to the proto_down
>  *      by doing a phys down on the associated switch port.
> I will find out the use case (pinged customer again)

Don't trust comments or documentation when working on Linux.
Code and git history are the sources of truth.

But you're right in a sense, the software devices which use this
callback today look like pretty fake users to allow out-of-tree
code to do things.

Will anyone who does not work on Cumulus Linus scream if we do this?

----->8---------------

From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 19 Nov 2021 06:43:58 -0800
Subject: [PATCH] net: remove .ndo_change_proto_down

.ndo_change_proto_down was added seemingly to enable out-of-tree
implementations. Over 2.5yrs later we still have no real users
upstream. Stub this out for now, we can revert once real users
materialize. (rocker is a test vehicle, not a user.)

We need to drop the optimization on the sysfs side, because
unlike ndos priv_flags will be changed at runtime, so we'd
need READ_ONCE/WRITE_ONCE everywhere..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/rocker/rocker_main.c | 12 -----------
 drivers/net/macvlan.c                     |  3 +--
 drivers/net/vxlan.c                       |  3 +--
 include/linux/netdevice.h                 | 11 ++--------
 net/core/dev.c                            | 26 ++++-------------------
 net/core/net-sysfs.c                      |  6 ------
 net/core/rtnetlink.c                      |  3 +--
 7 files changed, 9 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index ba4062881eed..b620470c7905 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1995,17 +1995,6 @@ static int rocker_port_get_phys_port_name(struct net_device *dev,
 	return err ? -EOPNOTSUPP : 0;
 }
 
-static int rocker_port_change_proto_down(struct net_device *dev,
-					 bool proto_down)
-{
-	struct rocker_port *rocker_port = netdev_priv(dev);
-
-	if (rocker_port->dev->flags & IFF_UP)
-		rocker_port_set_enable(rocker_port, !proto_down);
-	rocker_port->dev->proto_down = proto_down;
-	return 0;
-}
-
 static void rocker_port_neigh_destroy(struct net_device *dev,
 				      struct neighbour *n)
 {
@@ -2037,7 +2026,6 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_set_mac_address		= rocker_port_set_mac_address,
 	.ndo_change_mtu			= rocker_port_change_mtu,
 	.ndo_get_phys_port_name		= rocker_port_get_phys_port_name,
-	.ndo_change_proto_down		= rocker_port_change_proto_down,
 	.ndo_neigh_destroy		= rocker_port_neigh_destroy,
 	.ndo_get_port_parent_id		= rocker_port_get_port_parent_id,
 };
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d2f830ec2969..71ad30c10b6e 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1171,7 +1171,6 @@ static const struct net_device_ops macvlan_netdev_ops = {
 #endif
 	.ndo_get_iflink		= macvlan_dev_get_iflink,
 	.ndo_features_check	= passthru_features_check,
-	.ndo_change_proto_down  = dev_change_proto_down_generic,
 };
 
 void macvlan_common_setup(struct net_device *dev)
@@ -1182,7 +1181,7 @@ void macvlan_common_setup(struct net_device *dev)
 	dev->max_mtu		= ETH_MAX_MTU;
 	dev->priv_flags	       &= ~IFF_TX_SKB_SHARING;
 	netif_keep_dst(dev);
-	dev->priv_flags	       |= IFF_UNICAST_FLT;
+	dev->priv_flags	       |= IFF_UNICAST_FLT | IFF_CHANGE_PROTO_DOWN;
 	dev->netdev_ops		= &macvlan_netdev_ops;
 	dev->needs_free_netdev	= true;
 	dev->header_ops		= &macvlan_hard_header_ops;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 563f86de0e0d..74a28b11260e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3234,7 +3234,6 @@ static const struct net_device_ops vxlan_netdev_ether_ops = {
 	.ndo_fdb_dump		= vxlan_fdb_dump,
 	.ndo_fdb_get		= vxlan_fdb_get,
 	.ndo_fill_metadata_dst	= vxlan_fill_metadata_dst,
-	.ndo_change_proto_down  = dev_change_proto_down_generic,
 };
 
 static const struct net_device_ops vxlan_netdev_raw_ops = {
@@ -3305,7 +3304,7 @@ static void vxlan_setup(struct net_device *dev)
 	dev->hw_features |= NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
 	netif_keep_dst(dev);
-	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->priv_flags |= IFF_NO_QUEUE | IFF_CHANGE_PROTO_DOWN;
 
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = ETH_MIN_MTU;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb7f2661d187..647d0960c510 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1297,11 +1297,6 @@ struct netdev_net_notifier {
  *	TX queue.
  * int (*ndo_get_iflink)(const struct net_device *dev);
  *	Called to get the iflink value of this device.
- * void (*ndo_change_proto_down)(struct net_device *dev,
- *				 bool proto_down);
- *	This function is used to pass protocol port error state information
- *	to the switch driver. The switch driver can react to the proto_down
- *      by doing a phys down on the associated switch port.
  * int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb);
  *	This function is used to get egress tunnel information for given skb.
  *	This is useful for retrieving outer tunnel header parameters while
@@ -1542,8 +1537,6 @@ struct net_device_ops {
 						      int queue_index,
 						      u32 maxrate);
 	int			(*ndo_get_iflink)(const struct net_device *dev);
-	int			(*ndo_change_proto_down)(struct net_device *dev,
-							 bool proto_down);
 	int			(*ndo_fill_metadata_dst)(struct net_device *dev,
 						       struct sk_buff *skb);
 	void			(*ndo_set_rx_headroom)(struct net_device *dev,
@@ -1646,6 +1639,7 @@ enum netdev_priv_flags {
 	IFF_L3MDEV_RX_HANDLER		= 1<<29,
 	IFF_LIVE_RENAME_OK		= 1<<30,
 	IFF_TX_SKB_NO_LINEAR		= 1<<31,
+	IFF_CHANGE_PROTO_DOWN		= 1<<32,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1982,7 +1976,7 @@ struct net_device {
 
 	/* Read-mostly cache-line for fast-path access */
 	unsigned int		flags;
-	unsigned int		priv_flags;
+	unsigned long long	priv_flags;
 	const struct net_device_ops *netdev_ops;
 	int			ifindex;
 	unsigned short		gflags;
@@ -3735,7 +3729,6 @@ int dev_get_port_parent_id(struct net_device *dev,
 			   struct netdev_phys_item_id *ppid, bool recurse);
 bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
-int dev_change_proto_down_generic(struct net_device *dev, bool proto_down);
 void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask,
 				  u32 value);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9219e319e901..a2ed933df7fc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8558,35 +8558,17 @@ bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b)
 EXPORT_SYMBOL(netdev_port_same_parent_id);
 
 /**
- *	dev_change_proto_down - update protocol port state information
+ *	dev_change_proto_down - set carrier according to proto_down.
+ *
  *	@dev: device
  *	@proto_down: new value
- *
- *	This info can be used by switch drivers to set the phys state of the
- *	port.
  */
 int dev_change_proto_down(struct net_device *dev, bool proto_down)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-
-	if (!ops->ndo_change_proto_down)
+	if (!(dev->priv_flags & IFF_CHANGE_PROTO_DOWN))
 		return -EOPNOTSUPP;
 	if (!netif_device_present(dev))
 		return -ENODEV;
-	return ops->ndo_change_proto_down(dev, proto_down);
-}
-EXPORT_SYMBOL(dev_change_proto_down);
-
-/**
- *	dev_change_proto_down_generic - generic implementation for
- * 	ndo_change_proto_down that sets carrier according to
- * 	proto_down.
- *
- *	@dev: device
- *	@proto_down: new value
- */
-int dev_change_proto_down_generic(struct net_device *dev, bool proto_down)
-{
 	if (proto_down)
 		netif_carrier_off(dev);
 	else
@@ -8594,7 +8576,7 @@ int dev_change_proto_down_generic(struct net_device *dev, bool proto_down)
 	dev->proto_down = proto_down;
 	return 0;
 }
-EXPORT_SYMBOL(dev_change_proto_down_generic);
+EXPORT_SYMBOL(dev_change_proto_down);
 
 /**
  *	dev_change_proto_down_reason - proto down reason
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index addbef5419fb..a1ec86bcc9d9 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -490,12 +490,6 @@ static ssize_t proto_down_store(struct device *dev,
 {
 	struct net_device *netdev = to_net_dev(dev);
 
-	/* The check is also done in change_proto_down; this helps returning
-	 * early without hitting the trylock/restart in netdev_store.
-	 */
-	if (!netdev->netdev_ops->ndo_change_proto_down)
-		return -EOPNOTSUPP;
-
 	return netdev_store(dev, attr, buf, len, change_proto_down);
 }
 NETDEVICE_SHOW_RW(proto_down, fmt_dec);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2af8aeeadadf..2ff3369892ff 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2539,13 +2539,12 @@ static int do_set_proto_down(struct net_device *dev,
 			     struct netlink_ext_ack *extack)
 {
 	struct nlattr *pdreason[IFLA_PROTO_DOWN_REASON_MAX + 1];
-	const struct net_device_ops *ops = dev->netdev_ops;
 	unsigned long mask = 0;
 	u32 value;
 	bool proto_down;
 	int err;
 
-	if (!ops->ndo_change_proto_down) {
+	if (!(dev->priv_flags & IFF_CHANGE_PROTO_DOWN)) {
 		NL_SET_ERR_MSG(extack,  "Protodown not supported by device");
 		return -EOPNOTSUPP;
 	}
diff mbox series

Patch

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
index 186d00a9..26c8763 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -1376,6 +1376,17 @@  static void cgx_lmac_linkup_work(struct work_struct *work)
 	}
 }
 
+int cgx_set_link_state(void *cgxd, int lmac_id, bool enable)
+{
+	struct cgx *cgx = cgxd;
+
+	if (!cgx)
+		return -ENODEV;
+
+	return cgx_fwi_link_change(cgx, lmac_id, enable);
+}
+EXPORT_SYMBOL(cgx_set_link_state);
+
 int cgx_lmac_linkup_start(void *cgxd)
 {
 	struct cgx *cgx = cgxd;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
index ab1e4ab..4d7c320 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.h
@@ -172,4 +172,5 @@  u64 cgx_lmac_read(int cgx_id, int lmac_id, u64 offset);
 int cgx_lmac_addr_update(u8 cgx_id, u8 lmac_id, u8 *mac_addr, u8 index);
 u64 cgx_read_dmac_ctrl(void *cgxd, int lmac_id);
 u64 cgx_read_dmac_entry(void *cgxd, int index);
+int cgx_set_link_state(void *cgxd, int lmac_id, bool enable);
 #endif /* CGX_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 4e79e91..25c2497 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -162,6 +162,8 @@  M(CGX_MAC_ADDR_DEL,	0x212, cgx_mac_addr_del, cgx_mac_addr_del_req,    \
 			       msg_rsp)		\
 M(CGX_MAC_MAX_ENTRIES_GET, 0x213, cgx_mac_max_entries_get, msg_req,    \
 				  cgx_max_dmac_entries_get_rsp)		\
+M(CGX_SET_LINK_STATE,	0x214, cgx_set_link_state,			\
+			       cgx_set_link_state_msg, msg_rsp)		\
 M(CGX_FEC_STATS,	0x217, cgx_fec_stats, msg_req, cgx_fec_stats_rsp) \
 M(CGX_SET_LINK_MODE,	0x218, cgx_set_link_mode, cgx_set_link_mode_req,\
 			       cgx_set_link_mode_rsp)	\
@@ -581,6 +583,11 @@  struct cgx_set_link_mode_rsp {
 	int status;
 };
 
+struct cgx_set_link_state_msg {
+	struct mbox_msghdr hdr;
+	u8 enable; /* '1' for link up, '0' for link down */
+};
+
 struct cgx_mac_addr_update_req {
 	struct mbox_msghdr hdr;
 	u8 mac_addr[ETH_ALEN];
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
index 2ca182a..7781be1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
@@ -1069,3 +1069,27 @@  int rvu_mbox_handler_cgx_mac_addr_update(struct rvu *rvu,
 	rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
 	return cgx_lmac_addr_update(cgx_id, lmac_id, req->mac_addr, req->index);
 }
+
+int rvu_mbox_handler_cgx_set_link_state(struct rvu *rvu,
+					struct cgx_set_link_state_msg *req,
+					struct msg_rsp *rsp)
+{
+	u16 pcifunc = req->hdr.pcifunc;
+	u8 cgx_id, lmac_id;
+	int pf, err;
+
+	pf = rvu_get_pf(pcifunc);
+
+	if (!is_cgx_config_permitted(rvu, pcifunc))
+		return LMAC_AF_ERR_PERM_DENIED;
+
+	rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
+
+	err = cgx_set_link_state(rvu_cgx_pdata(cgx_id, rvu), lmac_id,
+				 !!req->enable);
+	if (err)
+		dev_warn(rvu->dev, "Cannot set link state to %s, err %d\n",
+			 (req->enable) ? "enable" : "disable", err);
+
+	return err;
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 66da31f..446378a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -263,6 +263,26 @@  int otx2_config_pause_frm(struct otx2_nic *pfvf)
 	return err;
 }
 
+int otx2_config_serdes_link_state(struct otx2_nic *pfvf, bool en)
+{
+	struct cgx_set_link_state_msg *req;
+	int err;
+
+	mutex_lock(&pfvf->mbox.lock);
+	req = otx2_mbox_alloc_msg_cgx_set_link_state(&pfvf->mbox);
+	if (!req) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	req->enable = !!en;
+	err = otx2_sync_mbox_msg(&pfvf->mbox);
+unlock:
+	mutex_unlock(&pfvf->mbox.lock);
+	return err;
+}
+EXPORT_SYMBOL(otx2_config_serdes_link_state);
+
 int otx2_set_flowkey_cfg(struct otx2_nic *pfvf)
 {
 	struct otx2_rss_info *rss = &pfvf->hw.rss_info;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index 61e5281..8e60e3c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -771,6 +771,7 @@  void otx2_get_mac_from_af(struct net_device *netdev);
 void otx2_config_irq_coalescing(struct otx2_nic *pfvf, int qidx);
 int otx2_config_pause_frm(struct otx2_nic *pfvf);
 void otx2_setup_segmentation(struct otx2_nic *pfvf);
+int otx2_config_serdes_link_state(struct otx2_nic *pfvf, bool en);
 
 /* RVU block related APIs */
 int otx2_attach_npa_nix(struct otx2_nic *pfvf);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
index 777a270..471097b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
@@ -64,9 +64,33 @@  static int otx2_dl_mcam_count_get(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int otx2_dl_serdes_link_set(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+
+	if (!is_otx2_vf(pfvf->pcifunc))
+		return otx2_config_serdes_link_state(pfvf, ctx->val.vbool);
+
+	return -EOPNOTSUPP;
+}
+
+static int otx2_dl_serdes_link_get(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+
+	ctx->val.vbool = (pfvf->linfo.link_up) ? true : false;
+
+	return 0;
+}
+
 enum otx2_dl_param_id {
 	OTX2_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	OTX2_DEVLINK_PARAM_ID_MCAM_COUNT,
+	OTX2_DEVLINK_PARAM_ID_SERDES_LINK,
 };
 
 static const struct devlink_param otx2_dl_params[] = {
@@ -75,6 +99,11 @@  static const struct devlink_param otx2_dl_params[] = {
 			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			     otx2_dl_mcam_count_get, otx2_dl_mcam_count_set,
 			     otx2_dl_mcam_count_validate),
+	DEVLINK_PARAM_DRIVER(OTX2_DEVLINK_PARAM_ID_SERDES_LINK,
+			     "serdes_link", DEVLINK_PARAM_TYPE_BOOL,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     otx2_dl_serdes_link_get, otx2_dl_serdes_link_set,
+			     NULL),
 };
 
 /* Devlink OPs */