diff mbox series

[rdma-core] Add Red Hat's ibdev2netdev helper

Message ID 20181211194639.47905-1-jarod@redhat.com (mailing list archive)
State Rejected
Headers show
Series [rdma-core] Add Red Hat's ibdev2netdev helper | expand

Commit Message

Jarod Wilson Dec. 11, 2018, 7:46 p.m. UTC
This is a helper script that Red Hat had previously been including in it's
ibutils package, which is no more, with the retirement of ibutils and
ibutils2 being proprietary to Mellanox now. This script still has use, and
needs a better home, so we'd like to add it to rdma-core's redhat/
directory -- if not somewhere more generic, should other distros wish to
make use of it as well.

CC: Doug Ledford <dledford@redhat.com>
CC: Honggang Li <honli@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
(Note: this version is somewhat reorganized and reformatted vs. what Red Hat
has been shipping in it's old ibutils package, due to a strong distaste I
have for 8-space-wide tab indents for shell scripts with several levels of
indenting required.)

 redhat/ibdev2netdev | 197 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 redhat/ibdev2netdev

Comments

Mark Bloch Dec. 11, 2018, 7:57 p.m. UTC | #1
On 12/11/18 11:46 AM, Jarod Wilson wrote:
> This is a helper script that Red Hat had previously been including in it's
> ibutils package, which is no more, with the retirement of ibutils and
> ibutils2 being proprietary to Mellanox now. This script still has use, and
> needs a better home, so we'd like to add it to rdma-core's redhat/
> directory -- if not somewhere more generic, should other distros wish to
> make use of it as well.

Why not tell users to use rdma tool (iproute2)?

$ rdma link
1/1: mlx5_0/1: state ACTIVE physical_state LINK_UP netdev ens1f0 
2/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev ens1f1 
4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens4f1 
5/1: mlx5_4/1: state DOWN physical_state DISABLED netdev ens5f0 
6/1: mlx5_5/1: state DOWN physical_state DISABLED netdev ens5f1 
11/1: mlx5_2/1: state DOWN physical_state DISABLED netdev eth0_65534 
12/1: mlx5_6/1: state DOWN physical_state DISABLED netdev eth1 
13/1: mlx5_7/1: state DOWN physical_state DISABLED netdev ens4f2 

Mark
> 
> CC: Doug Ledford <dledford@redhat.com>
> CC: Honggang Li <honli@redhat.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
> (Note: this version is somewhat reorganized and reformatted vs. what Red Hat
> has been shipping in it's old ibutils package, due to a strong distaste I
> have for 8-space-wide tab indents for shell scripts with several levels of
> indenting required.)
> 
>  redhat/ibdev2netdev | 197 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 197 insertions(+)
>  create mode 100644 redhat/ibdev2netdev
> 
> diff --git a/redhat/ibdev2netdev b/redhat/ibdev2netdev
> new file mode 100644
> index 00000000..3fbc04cb
> --- /dev/null
> +++ b/redhat/ibdev2netdev
> @@ -0,0 +1,197 @@
> +#! /bin/bash
> +
> +usage()
> +{
> +  echo "$(basename $0) <options>"
> +  echo "-h, --help                print help message"
> +  echo "-v, --verbose             print more info"
> +}
> +
> +case $1 in
> +  "-h" | "--help")
> +    usage
> +    exit 0
> +    ;;
> +esac
> +
> +if (( $# > 1 )); then
> +  usage
> +  exit -1
> +fi
> +
> +if (( $# == 1 )) && [ "$1" != "-v" ]; then
> +  usage
> +  exit -1
> +fi
> +
> +ibdevs=$(ls /sys/class/infiniband/)
> +
> +devs=$(ls /sys/class/net/ | grep ^[ei])
> +if [ "x$devs" == "x" ]; then
> +  # no relevant devices - quit immediately
> +  exit
> +fi
> +
> +for d in $devs; do
> +  if [ -f /sys/class/net/$d/dev_id ]; then
> +    oldstyle=n
> +    break
> +  fi
> +done
> +
> +function print_line()
> +{
> +  echo "$1 port $2 <===> $3"
> +}
> +
> +function find_guid()
> +{
> +  ibdevs=$(ls /sys/class/infiniband/)
> +  for ibdev in $ibdevs; do
> +    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
> +    for port in $ports; do
> +      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
> +      for gid in $gids; do
> +        pguid=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g')
> +        if [ x$pguid == x$1 ]; then
> +          print_line $ibdev $port $2
> +        fi
> +      done
> +    done
> +  done
> +}
> +
> +function find_mac()
> +{
> +  ibdevs=$(ls /sys/class/infiniband/)
> +  for ibdev in $ibdevs; do
> +    type=$(cat /sys/class/infiniband/$ibdev/node_type|cut -d ' ' -f 2)
> +    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
> +    for port in $ports; do
> +      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
> +      for gid in $gids; do
> +        if [[ "$type" = "RNIC" ]]; then
> +          pmac=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 1-14|sed -e 's/://g')
> +        else
> +          first=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21-22)
> +          first=$(( first ^ 2 ))
> +          first=$(printf "%02x" $first)
> +          second=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g' | cut -b 3-6)
> +          third=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g' | cut -b 11-)
> +          pmac=$first$second$third
> +        fi
> +        if [ x$pmac == x$1 ]; then
> +          print_line $ibdev $port $2
> +        fi
> +      done
> +    done
> +  done
> +}
> +
> +if [ "x$oldstyle" == "xn" ]; then
> +  for d in $ibdevs; do
> +    ibrsc=$(cat /sys/class/infiniband/$d/device/resource)
> +    eths=$(ls /sys/class/net/)
> +    for eth in $eths; do
> +      filepath_resource=/sys/class/net/$eth/device/resource
> +
> +      if [ -f $filepath_resource ]; then
> +        ethrsc=$(cat $filepath_resource)
> +        if [ "x$ethrsc" == "x$ibrsc" ]; then
> +          filepath_devid=/sys/class/net/$eth/dev_id
> +          filepath_devport=/sys/class/net/$eth/dev_port
> +          if [ -f $filepath_devid ]; then
> +            port1=0
> +            if [ -f $filepath_devport ]; then
> +              port1=$(cat $filepath_devport)
> +              port1=$(printf "%d" $port1)
> +            fi
> +
> +            port=$(cat $filepath_devid)
> +            port=$(printf "%d" $port)
> +            if [ $port1 -gt $port ]; then
> +              port=$port1
> +            fi
> +
> +            port=$(( port + 1 ))
> +
> +            filepath_carrier=/sys/class/net/$eth/carrier
> +
> +            if [ -f $filepath_carrier ]; then
> +              link_state=$(cat $filepath_carrier 2> /dev/null)
> +              if (( link_state == 1 )); then
> +                link_state="Up"
> +              else
> +                link_state="Down"
> +              fi
> +            else
> +              link_state="NA"
> +            fi
> +
> +            if [ "$1" == "-v" ]; then
> +              filepath_portstate=/sys/class/infiniband/$d/ports/$port/state
> +              filepath_deviceid=/sys/class/infiniband/$d/device/device
> +              filepath_fwver=/sys/class/infiniband/$d/fw_ver
> +              filepath_vpd=/sys/class/infiniband/$d/device/vpd
> +
> +              # read port state
> +              if [ -f $filepath_portstate ]; then
> +                ibstate=$(printf "%-6s" $(cat $filepath_portstate | gawk '{print $2}'))
> +              else
> +                ibstate="na"
> +              fi
> +
> +              # read device
> +              if [ -f $filepath_deviceid ]; then
> +                devid=$(printf "mt%d" $(cat $filepath_deviceid))
> +              else
> +                devid="na"
> +              fi
> +
> +              # read fw version
> +              if [ -f $filepath_fwver ]; then
> +                fwver=$(cat $filepath_fwver)
> +              else
> +                fwver="na"
> +              fi
> +
> +              # read device description and part id from the vpd
> +              if [ -f $filepath_vpd ]; then
> +                tmp=$ifs
> +                ifs=":"
> +                vpd_content=`cat $filepath_vpd`
> +                devdesc=$(printf "%-15s" $(echo $vpd_content | strings | head -1))
> +                partid=$(printf "%-11s" $(echo $vpd_content | strings | head -4 | tail -1 | gawk '{print $1}'))
> +                ifs=$tmp
> +              else
> +                devdesc=""
> +                partid="na"
> +              fi
> +
> +              echo "$d ($devid - $partid) $devdesc fw $fwver port $port ($ibstate) ==> $eth ($link_state)"
> +            else
> +              echo "$d port $port ==> $eth ($link_state)"
> +            fi
> +          fi
> +        fi
> +      fi
> +    done
> +  done
> +else
> +##########################
> +### old style
> +##########################
> +  ifcs=$(ifconfig -a | egrep '^eth|^ib' | gawk '{print $1}')
> +
> +  for ifc in $ifcs; do
> +    len=$(cat /sys/class/net/$ifc/addr_len)
> +    if (( len == 20 )); then
> +      guid=$(cat /sys/class/net/$ifc/address | cut -b 37- | sed -e 's/://g')
> +      find_guid $guid $ifc
> +    elif (( len == 6)); then
> +      mac=$(cat /sys/class/net/$ifc/address | sed -e 's/://g')
> +      find_mac $mac $ifc
> +    fi
> +  done
> +fi
> +
>
Parav Pandit Dec. 11, 2018, 9:17 p.m. UTC | #2
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Mark Bloch
> Sent: Tuesday, December 11, 2018 1:57 PM
> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> 
> 
> 
> On 12/11/18 11:46 AM, Jarod Wilson wrote:
> > This is a helper script that Red Hat had previously been including in
> > it's ibutils package, which is no more, with the retirement of ibutils
> > and
> > ibutils2 being proprietary to Mellanox now. This script still has use,
> > and needs a better home, so we'd like to add it to rdma-core's redhat/
> > directory -- if not somewhere more generic, should other distros wish
> > to make use of it as well.
> 
> Why not tell users to use rdma tool (iproute2)?
> 
Because rdma tool currently doesn’t show corresponding IPoIB upper netdevice(s) of for the IB device.
For a given rdma device there can be multiple netdevices exist for RoCE ports.
Rdma tool (and ibdev2netdev) doesn't show them.
The scope of ibdev2netdev currently is wider than what rdma tool shows.
So yes, its good recommendation to RoCE and representor users.

> $ rdma link
> 1/1: mlx5_0/1: state ACTIVE physical_state LINK_UP netdev ens1f0
> 2/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev ens1f1
> 4/1: mlx5_3/1: state ACTIVE physical_state LINK_UP netdev ens4f1
> 5/1: mlx5_4/1: state DOWN physical_state DISABLED netdev ens5f0
> 6/1: mlx5_5/1: state DOWN physical_state DISABLED netdev ens5f1
> 11/1: mlx5_2/1: state DOWN physical_state DISABLED netdev eth0_65534
> 12/1: mlx5_6/1: state DOWN physical_state DISABLED netdev eth1
> 13/1: mlx5_7/1: state DOWN physical_state DISABLED netdev ens4f2
> 
> Mark
> >
> > CC: Doug Ledford <dledford@redhat.com>
> > CC: Honggang Li <honli@redhat.com>
> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> > ---
> > (Note: this version is somewhat reorganized and reformatted vs. what
> > Red Hat has been shipping in it's old ibutils package, due to a strong
> > distaste I have for 8-space-wide tab indents for shell scripts with
> > several levels of indenting required.)
> >
> >  redhat/ibdev2netdev | 197
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 197 insertions(+)
> >  create mode 100644 redhat/ibdev2netdev
> >
> > diff --git a/redhat/ibdev2netdev b/redhat/ibdev2netdev new file mode
> > 100644 index 00000000..3fbc04cb
> > --- /dev/null
> > +++ b/redhat/ibdev2netdev
> > @@ -0,0 +1,197 @@
> > +#! /bin/bash
> > +
> > +usage()
> > +{
> > +  echo "$(basename $0) <options>"
> > +  echo "-h, --help                print help message"
> > +  echo "-v, --verbose             print more info"
> > +}
> > +
> > +case $1 in
> > +  "-h" | "--help")
> > +    usage
> > +    exit 0
> > +    ;;
> > +esac
> > +
> > +if (( $# > 1 )); then
> > +  usage
> > +  exit -1
> > +fi
> > +
> > +if (( $# == 1 )) && [ "$1" != "-v" ]; then
> > +  usage
> > +  exit -1
> > +fi
> > +
> > +ibdevs=$(ls /sys/class/infiniband/)
> > +
> > +devs=$(ls /sys/class/net/ | grep ^[ei]) if [ "x$devs" == "x" ]; then
> > +  # no relevant devices - quit immediately
> > +  exit
> > +fi
> > +
> > +for d in $devs; do
> > +  if [ -f /sys/class/net/$d/dev_id ]; then
> > +    oldstyle=n
> > +    break
> > +  fi
> > +done
> > +
> > +function print_line()
> > +{
> > +  echo "$1 port $2 <===> $3"
> > +}
> > +
> > +function find_guid()
> > +{
> > +  ibdevs=$(ls /sys/class/infiniband/)
> > +  for ibdev in $ibdevs; do
> > +    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
> > +    for port in $ports; do
> > +      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
> > +      for gid in $gids; do
> > +        pguid=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut
> -b 21- | sed -e 's/://g')
> > +        if [ x$pguid == x$1 ]; then
> > +          print_line $ibdev $port $2
> > +        fi
> > +      done
> > +    done
> > +  done
> > +}
> > +
> > +function find_mac()
> > +{
> > +  ibdevs=$(ls /sys/class/infiniband/)
> > +  for ibdev in $ibdevs; do
> > +    type=$(cat /sys/class/infiniband/$ibdev/node_type|cut -d ' ' -f 2)
> > +    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
> > +    for port in $ports; do
> > +      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
> > +      for gid in $gids; do
> > +        if [[ "$type" = "RNIC" ]]; then
> > +          pmac=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid |
> cut -b 1-14|sed -e 's/://g')
> > +        else
> > +          first=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut
> -b 21-22)
> > +          first=$(( first ^ 2 ))
> > +          first=$(printf "%02x" $first)
> > +          second=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid |
> cut -b 21- | sed -e 's/://g' | cut -b 3-6)
> > +          third=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut
> -b 21- | sed -e 's/://g' | cut -b 11-)
> > +          pmac=$first$second$third
> > +        fi
> > +        if [ x$pmac == x$1 ]; then
> > +          print_line $ibdev $port $2
> > +        fi
> > +      done
> > +    done
> > +  done
> > +}
> > +
> > +if [ "x$oldstyle" == "xn" ]; then
> > +  for d in $ibdevs; do
> > +    ibrsc=$(cat /sys/class/infiniband/$d/device/resource)
> > +    eths=$(ls /sys/class/net/)
> > +    for eth in $eths; do
> > +      filepath_resource=/sys/class/net/$eth/device/resource
> > +
> > +      if [ -f $filepath_resource ]; then
> > +        ethrsc=$(cat $filepath_resource)
> > +        if [ "x$ethrsc" == "x$ibrsc" ]; then
> > +          filepath_devid=/sys/class/net/$eth/dev_id
> > +          filepath_devport=/sys/class/net/$eth/dev_port
> > +          if [ -f $filepath_devid ]; then
> > +            port1=0
> > +            if [ -f $filepath_devport ]; then
> > +              port1=$(cat $filepath_devport)
> > +              port1=$(printf "%d" $port1)
> > +            fi
> > +
> > +            port=$(cat $filepath_devid)
> > +            port=$(printf "%d" $port)
> > +            if [ $port1 -gt $port ]; then
> > +              port=$port1
> > +            fi
> > +
> > +            port=$(( port + 1 ))
> > +
> > +            filepath_carrier=/sys/class/net/$eth/carrier
> > +
> > +            if [ -f $filepath_carrier ]; then
> > +              link_state=$(cat $filepath_carrier 2> /dev/null)
> > +              if (( link_state == 1 )); then
> > +                link_state="Up"
> > +              else
> > +                link_state="Down"
> > +              fi
> > +            else
> > +              link_state="NA"
> > +            fi
> > +
> > +            if [ "$1" == "-v" ]; then
> > +              filepath_portstate=/sys/class/infiniband/$d/ports/$port/state
> > +              filepath_deviceid=/sys/class/infiniband/$d/device/device
> > +              filepath_fwver=/sys/class/infiniband/$d/fw_ver
> > +              filepath_vpd=/sys/class/infiniband/$d/device/vpd
> > +
> > +              # read port state
> > +              if [ -f $filepath_portstate ]; then
> > +                ibstate=$(printf "%-6s" $(cat $filepath_portstate | gawk '{print
> $2}'))
> > +              else
> > +                ibstate="na"
> > +              fi
> > +
> > +              # read device
> > +              if [ -f $filepath_deviceid ]; then
> > +                devid=$(printf "mt%d" $(cat $filepath_deviceid))
> > +              else
> > +                devid="na"
> > +              fi
> > +
> > +              # read fw version
> > +              if [ -f $filepath_fwver ]; then
> > +                fwver=$(cat $filepath_fwver)
> > +              else
> > +                fwver="na"
> > +              fi
> > +
> > +              # read device description and part id from the vpd
> > +              if [ -f $filepath_vpd ]; then
> > +                tmp=$ifs
> > +                ifs=":"
> > +                vpd_content=`cat $filepath_vpd`
> > +                devdesc=$(printf "%-15s" $(echo $vpd_content | strings | head -
> 1))
> > +                partid=$(printf "%-11s" $(echo $vpd_content | strings | head -4
> | tail -1 | gawk '{print $1}'))
> > +                ifs=$tmp
> > +              else
> > +                devdesc=""
> > +                partid="na"
> > +              fi
> > +
> > +              echo "$d ($devid - $partid) $devdesc fw $fwver port $port
> ($ibstate) ==> $eth ($link_state)"
> > +            else
> > +              echo "$d port $port ==> $eth ($link_state)"
> > +            fi
> > +          fi
> > +        fi
> > +      fi
> > +    done
> > +  done
> > +else
> > +##########################
> > +### old style
> > +##########################
> > +  ifcs=$(ifconfig -a | egrep '^eth|^ib' | gawk '{print $1}')
> > +
> > +  for ifc in $ifcs; do
> > +    len=$(cat /sys/class/net/$ifc/addr_len)
> > +    if (( len == 20 )); then
> > +      guid=$(cat /sys/class/net/$ifc/address | cut -b 37- | sed -e 's/://g')
> > +      find_guid $guid $ifc
> > +    elif (( len == 6)); then
> > +      mac=$(cat /sys/class/net/$ifc/address | sed -e 's/://g')
> > +      find_mac $mac $ifc
> > +    fi
> > +  done
> > +fi
> > +
> >
Jarod Wilson Dec. 11, 2018, 9:53 p.m. UTC | #3
On 2018-12-11 4:17 PM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>> owner@vger.kernel.org> On Behalf Of Mark Bloch
>> Sent: Tuesday, December 11, 2018 1:57 PM
>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>
>>
>>
>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
>>> This is a helper script that Red Hat had previously been including in
>>> it's ibutils package, which is no more, with the retirement of ibutils
>>> and
>>> ibutils2 being proprietary to Mellanox now. This script still has use,
>>> and needs a better home, so we'd like to add it to rdma-core's redhat/
>>> directory -- if not somewhere more generic, should other distros wish
>>> to make use of it as well.
>>
>> Why not tell users to use rdma tool (iproute2)?
>>
> Because rdma tool currently doesn’t show corresponding IPoIB upper netdevice(s) of for the IB device.
> For a given rdma device there can be multiple netdevices exist for RoCE ports.
> Rdma tool (and ibdev2netdev) doesn't show them.
> The scope of ibdev2netdev currently is wider than what rdma tool shows.
> So yes, its good recommendation to RoCE and representor users.

We've actually had some folks notice it was missing from RHEL-8.0 beta, 
and explicitly request it, because they have various infrastructure that 
was expecting it to be there. It's just a shell script, and could be 
updated along the way to use the rdma tool from iproute2, or become 
completely obsolete at some point. But we're definitely some demand to 
keep it around for the moment.
Leon Romanovsky Dec. 12, 2018, 5:27 a.m. UTC | #4
On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
> On 2018-12-11 4:17 PM, Parav Pandit wrote:
> >
> >
> > > -----Original Message-----
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > > owner@vger.kernel.org> On Behalf Of Mark Bloch
> > > Sent: Tuesday, December 11, 2018 1:57 PM
> > > To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
> > > Cc: Doug Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> > > Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> > >
> > >
> > >
> > > On 12/11/18 11:46 AM, Jarod Wilson wrote:
> > > > This is a helper script that Red Hat had previously been including in
> > > > it's ibutils package, which is no more, with the retirement of ibutils
> > > > and
> > > > ibutils2 being proprietary to Mellanox now. This script still has use,
> > > > and needs a better home, so we'd like to add it to rdma-core's redhat/
> > > > directory -- if not somewhere more generic, should other distros wish
> > > > to make use of it as well.
> > >
> > > Why not tell users to use rdma tool (iproute2)?
> > >
> > Because rdma tool currently doesn’t show corresponding IPoIB upper netdevice(s) of for the IB device.
> > For a given rdma device there can be multiple netdevices exist for RoCE ports.
> > Rdma tool (and ibdev2netdev) doesn't show them.
> > The scope of ibdev2netdev currently is wider than what rdma tool shows.
> > So yes, its good recommendation to RoCE and representor users.
>
> We've actually had some folks notice it was missing from RHEL-8.0 beta, and
> explicitly request it, because they have various infrastructure that was
> expecting it to be there. It's just a shell script, and could be updated
> along the way to use the rdma tool from iproute2, or become completely
> obsolete at some point. But we're definitely some demand to keep it around
> for the moment.

It is already obsolete and rdmatool replaces it. I see no value in
keeping abandoned tool in upstream repository.

Regarding Parav's point, lack of IPoIB translations, iroute2 is an open-source
project and people are encouraged to extend it. If I judge by number of
customer requests to me (zero) to implement it, there is no demand in such
functionality.

Thanks

>
> --
> Jarod Wilson
> jarod@redhat.com
Leon Romanovsky Dec. 12, 2018, 5:28 a.m. UTC | #5
On Tue, Dec 11, 2018 at 02:46:39PM -0500, Jarod Wilson wrote:
> This is a helper script that Red Hat had previously been including in it's
> ibutils package, which is no more, with the retirement of ibutils and
> ibutils2 being proprietary to Mellanox now. This script still has use, and
> needs a better home, so we'd like to add it to rdma-core's redhat/
> directory -- if not somewhere more generic, should other distros wish to
> make use of it as well.
>
> CC: Doug Ledford <dledford@redhat.com>
> CC: Honggang Li <honli@redhat.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
> (Note: this version is somewhat reorganized and reformatted vs. what Red Hat
> has been shipping in it's old ibutils package, due to a strong distaste I
> have for 8-space-wide tab indents for shell scripts with several levels of
> indenting required.)
>
>  redhat/ibdev2netdev | 197 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 197 insertions(+)
>  create mode 100644 redhat/ibdev2netdev
>

As Mark said in his rely, we have rdmatool for that.
Please use it or extend and use it if something missing.

Thanks
Mark Bloch Dec. 12, 2018, 6:38 a.m. UTC | #6
On 12/11/2018 21:27, Leon Romanovsky wrote:
> On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
>> On 2018-12-11 4:17 PM, Parav Pandit wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
>>>> Sent: Tuesday, December 11, 2018 1:57 PM
>>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
>>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
>>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>>>
>>>>
>>>>
>>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
>>>>> This is a helper script that Red Hat had previously been including in
>>>>> it's ibutils package, which is no more, with the retirement of ibutils
>>>>> and
>>>>> ibutils2 being proprietary to Mellanox now. This script still has use,
>>>>> and needs a better home, so we'd like to add it to rdma-core's redhat/
>>>>> directory -- if not somewhere more generic, should other distros wish
>>>>> to make use of it as well.
>>>>
>>>> Why not tell users to use rdma tool (iproute2)?
>>>>
>>> Because rdma tool currently doesn’t show corresponding IPoIB upper netdevice(s) of for the IB device.
>>> For a given rdma device there can be multiple netdevices exist for RoCE ports.
>>> Rdma tool (and ibdev2netdev) doesn't show them.
>>> The scope of ibdev2netdev currently is wider than what rdma tool shows.
>>> So yes, its good recommendation to RoCE and representor users.
>>
>> We've actually had some folks notice it was missing from RHEL-8.0 beta, and
>> explicitly request it, because they have various infrastructure that was
>> expecting it to be there. It's just a shell script, and could be updated
>> along the way to use the rdma tool from iproute2, or become completely
>> obsolete at some point. But we're definitely some demand to keep it around
>> for the moment.
> 
> It is already obsolete and rdmatool replaces it. I see no value in
> keeping abandoned tool in upstream repository.
> 
> Regarding Parav's point, lack of IPoIB translations, iroute2 is an open-source
> project and people are encouraged to extend it. If I judge by number of
> customer requests to me (zero) to implement it, there is no demand in such
> functionality.

Well, a quick prototype follows below, the question is should I work on it
or not, does anyone core?

output can look something like this:

$ ./rdma/rdma link -d show mlx5_3
4/1: mlx5_3/1: subnet_prefix fe80:0000:0000:0000 lid 65535 sm_lid 0 lmc 0 state DOWN physical_state DISABLED
    0: netdev ib0 netdev_index 35
    1: netdev ib0.3333 netdev_index 36
    2: netdev ib0.3332 netdev_index 37
    caps: <TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS, CAP_MASK2, CM, VENDOR_CLASS, CAP_MASK_NOTICE, CLIENT_REG, OTHER_LOCAL_CHANGES, MULT_PKER_TRAP>

Code (kernel side only, quick hack):

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index cc7535c5e192..cfdbe6ede841 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -116,6 +116,10 @@ int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);

+int ib_dump_net_devs(struct ib_device *dev, u32 port,
+                    int (*func)(struct net_device *, void *),
+                    void *ctx);
+
 #ifdef CONFIG_CGROUP_RDMA
 int ib_device_register_rdmacg(struct ib_device *device);
 void ib_device_unregister_rdmacg(struct ib_device *device);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 348a7fb1f945..ee5eacb49821 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1174,6 +1174,38 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);

+int ib_dump_net_devs(struct ib_device *dev, u32 port,
+                    int (*func)(struct net_device *, void *),
+                    void *ctx)
+{
+       struct ib_client_data *context;
+       int ret = 0;
+
+       if (!rdma_protocol_ib(dev, port))
+               return 0;
+
+       down_read(&lists_rwsem);
+
+       list_for_each_entry(context, &dev->client_data_list, list) {
+               struct ib_client *client = context->client;
+
+               if (context->going_down)
+                       continue;
+
+               if (client->dump_net_devs) {
+                       ret = client->dump_net_devs(dev, port, context->data,
+                                                   func, ctx);
+                       if (ret)
+                               goto out;
+               }
+       }
+
+out:
+       up_read(&lists_rwsem);
+
+       return ret;
+}
+
 /**
  * ib_get_net_dev_by_params() - Return the appropriate net_dev
  * for a received CM request
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 9abbadb9e366..367205544930 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -107,6 +107,8 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
        [RDMA_NLDEV_ATTR_DRIVER_U32]            = { .type = NLA_U32 },
        [RDMA_NLDEV_ATTR_DRIVER_S64]            = { .type = NLA_S64 },
        [RDMA_NLDEV_ATTR_DRIVER_U64]            = { .type = NLA_U64 },
+       [RDMA_NLDEV_ATTR_NDEV_EXT]              = { .type = NLA_NESTED},
+       [RDMA_NLDEV_ATTR_NDEV_EXT_ENTRY]        = { .type = NLA_NESTED},
 };

 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -220,11 +222,43 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
        return 0;
 }

+struct dump_netdevs {
+       const struct net *net;
+       struct sk_buff *msg;
+};
+
+static int fill_port_net_dev_ext(struct net_device *netdev,
+                                void *ptr)
+{
+       struct dump_netdevs *ctx = ptr;
+       struct nlattr *entry_attr;
+       int ret;
+
+       if (netdev && net_eq(dev_net(netdev), ctx->net)) {
+               entry_attr = nla_nest_start(ctx->msg,
+                                           RDMA_NLDEV_ATTR_NDEV_EXT_ENTRY);
+               ret = nla_put_string(ctx->msg, RDMA_NLDEV_ATTR_NDEV_NAME,
+                                    netdev->name);
+               if (ret)
+                       return -ENOMEM;
+
+               ret = nla_put_u32(ctx->msg,
+                                 RDMA_NLDEV_ATTR_NDEV_INDEX, netdev->ifindex);
+               if (ret)
+                       return -ENOMEM;
+               nla_nest_end(ctx->msg, entry_attr);
+       }
+
+       return 0;
+}
+
 static int fill_port_info(struct sk_buff *msg,
                          struct ib_device *device, u32 port,
                          const struct net *net)
 {
+       struct dump_netdevs ctx = {.net = net, .msg = msg};
        struct net_device *netdev = NULL;
+       struct nlattr *entry_attr;
        struct ib_port_attr attr;
        int ret;

@@ -271,6 +305,12 @@ static int fill_port_info(struct sk_buff *msg,
                                     RDMA_NLDEV_ATTR_NDEV_NAME, netdev->name);
        }

+       entry_attr = nla_nest_start(msg, RDMA_NLDEV_ATTR_NDEV_EXT);
+       if (!entry_attr)
+               return -EMSGSIZE;
+
+       ret = ib_dump_net_devs(device, port, fill_port_net_dev_ext, &ctx);
+       nla_nest_end(msg, entry_attr);
 out:
        if (netdev)
                dev_put(netdev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8710214594d8..cf39ec8e7e75 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -97,6 +97,10 @@ static struct net_device *ipoib_get_net_dev_by_params(
                struct ib_device *dev, u8 port, u16 pkey,
                const union ib_gid *gid, const struct sockaddr *addr,
                void *client_data);
+static int ipoib_dump_net_devs(struct ib_device *dev, u8 port,
+                              void *client_data,
+                              int (*func)(struct net_device *, void *),
+                              void *ctx);
 static int ipoib_set_mac(struct net_device *dev, void *addr);
 static int ipoib_ioctl(struct net_device *dev, struct ifreq *ifr,
                       int cmd);
@@ -106,6 +110,7 @@ static struct ib_client ipoib_client = {
        .add    = ipoib_add_one,
        .remove = ipoib_remove_one,
        .get_net_dev_by_params = ipoib_get_net_dev_by_params,
+       .dump_net_devs = ipoib_dump_net_devs,
 };

 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
@@ -465,6 +470,63 @@ static int __ipoib_get_net_dev_by_params(struct list_head *dev_list, u8 port,
        return matches;
 }

+static int __ipoib_dump_net_devs(struct ipoib_dev_priv *priv,
+                                int nesting,
+                                int (*func)(struct net_device *, void *),
+                                void *ctx)
+{
+       struct ipoib_dev_priv *child_priv;
+       struct net_device *net_dev = NULL;
+       int ret = 0;
+
+       net_dev = ipoib_get_master_net_dev(priv->dev);
+
+       if (net_dev) {
+               ret = func(net_dev, ctx);
+               dev_put(net_dev);
+       }
+
+       if (ret)
+               return ret;
+
+       down_read_nested(&priv->vlan_rwsem, nesting);
+       list_for_each_entry(child_priv, &priv->child_intfs, list) {
+               ret = __ipoib_dump_net_devs(child_priv, nesting + 1,
+                                           func, ctx);
+               if (ret)
+                       return ret;
+       }
+       up_read(&priv->vlan_rwsem);
+
+       return ret;
+}
+
+static int ipoib_dump_net_devs(struct ib_device *dev, u8 port,
+                              void *client_data,
+                              int (*func)(struct net_device *, void *),
+                              void *ctx)
+{
+       struct list_head *dev_list = client_data;
+       struct ipoib_dev_priv *priv;
+       int ret = 0;
+
+       if (!rdma_protocol_ib(dev, port))
+               return -EINVAL;
+
+       if (!dev_list)
+               return -EINVAL;
+
+       list_for_each_entry(priv, dev_list, list) {
+               if (priv->port != port)
+                       continue;
+               ret = __ipoib_dump_net_devs(priv, 0, func, ctx);
+               if (ret)
+                       return ret;
+       }
+
+       return ret;
+}
+
 static struct net_device *ipoib_get_net_dev_by_params(
                struct ib_device *dev, u8 port, u16 pkey,
                const union ib_gid *gid, const struct sockaddr *addr,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 85021451eee0..4481d7d627ee 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2618,6 +2618,12 @@ struct ib_client {
        char  *name;
        void (*add)   (struct ib_device *);
        void (*remove)(struct ib_device *, void *client_data);
+       int (*dump_net_devs)(struct ib_device *dev,
+                            u8 port,
+                            void *client_data,
+                            int (*func)(struct net_device *,
+                                        void *),
+                            void *ctx);

        /* Returns the net_dev belonging to this ib_client and matching the
         * given parameters.
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index f9c41bf59efc..52c92c0baa60 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -427,6 +427,13 @@ enum rdma_nldev_attr {
        RDMA_NLDEV_ATTR_DRIVER_S64,             /* s64 */
        RDMA_NLDEV_ATTR_DRIVER_U64,             /* u64 */

+       /*
+        * Provide extanded netdevices that depend on the rdma device
+        * for example IPoIB
+        */
+       RDMA_NLDEV_ATTR_NDEV_EXT,               /* nested_table */
+       RDMA_NLDEV_ATTR_NDEV_EXT_ENTRY,         /* nested_table */
+
        /*
         * Always the end
         */


> 
> Thanks
> 
>>
>> --
>> Jarod Wilson
>> jarod@redhat.com
Leon Romanovsky Dec. 12, 2018, 7:10 a.m. UTC | #7
On Wed, Dec 12, 2018 at 06:38:39AM +0000, Mark Bloch wrote:
>
>
> On 12/11/2018 21:27, Leon Romanovsky wrote:
> > On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
> >> On 2018-12-11 4:17 PM, Parav Pandit wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
> >>>> Sent: Tuesday, December 11, 2018 1:57 PM
> >>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
> >>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> >>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> >>>>
> >>>>
> >>>>
> >>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
> >>>>> This is a helper script that Red Hat had previously been including in
> >>>>> it's ibutils package, which is no more, with the retirement of ibutils
> >>>>> and
> >>>>> ibutils2 being proprietary to Mellanox now. This script still has use,
> >>>>> and needs a better home, so we'd like to add it to rdma-core's redhat/
> >>>>> directory -- if not somewhere more generic, should other distros wish
> >>>>> to make use of it as well.
> >>>>
> >>>> Why not tell users to use rdma tool (iproute2)?
> >>>>
> >>> Because rdma tool currently doesn’t show corresponding IPoIB upper netdevice(s) of for the IB device.
> >>> For a given rdma device there can be multiple netdevices exist for RoCE ports.
> >>> Rdma tool (and ibdev2netdev) doesn't show them.
> >>> The scope of ibdev2netdev currently is wider than what rdma tool shows.
> >>> So yes, its good recommendation to RoCE and representor users.
> >>
> >> We've actually had some folks notice it was missing from RHEL-8.0 beta, and
> >> explicitly request it, because they have various infrastructure that was
> >> expecting it to be there. It's just a shell script, and could be updated
> >> along the way to use the rdma tool from iproute2, or become completely
> >> obsolete at some point. But we're definitely some demand to keep it around
> >> for the moment.
> >
> > It is already obsolete and rdmatool replaces it. I see no value in
> > keeping abandoned tool in upstream repository.
> >
> > Regarding Parav's point, lack of IPoIB translations, iroute2 is an open-source
> > project and people are encouraged to extend it. If I judge by number of
> > customer requests to me (zero) to implement it, there is no demand in such
> > functionality.
>
> Well, a quick prototype follows below, the question is should I work on it
> or not, does anyone core?

Parav and I are care, I have no clue about the rest of the world.

It will be better to start from mockup of rdmatool output.

Thanks
Parav Pandit Dec. 13, 2018, 4:43 p.m. UTC | #8
Hi Mark,

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Wednesday, December 12, 2018 1:10 AM
> To: Mark Bloch <markb@mellanox.com>
> Cc: Jarod Wilson <jarod@redhat.com>; Parav Pandit
> <parav@mellanox.com>; linux-rdma@vger.kernel.org; Doug Ledford
> <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> 
> On Wed, Dec 12, 2018 at 06:38:39AM +0000, Mark Bloch wrote:
> >
> >
> > On 12/11/2018 21:27, Leon Romanovsky wrote:
> > > On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
> > >> On 2018-12-11 4:17 PM, Parav Pandit wrote:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > >>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
> > >>>> Sent: Tuesday, December 11, 2018 1:57 PM
> > >>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
> > >>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li
> > >>>> <honli@redhat.com>
> > >>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
> > >>>>> This is a helper script that Red Hat had previously been
> > >>>>> including in it's ibutils package, which is no more, with the
> > >>>>> retirement of ibutils and
> > >>>>> ibutils2 being proprietary to Mellanox now. This script still
> > >>>>> has use, and needs a better home, so we'd like to add it to
> > >>>>> rdma-core's redhat/ directory -- if not somewhere more generic,
> > >>>>> should other distros wish to make use of it as well.
> > >>>>
> > >>>> Why not tell users to use rdma tool (iproute2)?
> > >>>>
> > >>> Because rdma tool currently doesn’t show corresponding IPoIB upper
> netdevice(s) of for the IB device.
> > >>> For a given rdma device there can be multiple netdevices exist for
> RoCE ports.
> > >>> Rdma tool (and ibdev2netdev) doesn't show them.
> > >>> The scope of ibdev2netdev currently is wider than what rdma tool
> shows.
> > >>> So yes, its good recommendation to RoCE and representor users.
> > >>
> > >> We've actually had some folks notice it was missing from RHEL-8.0
> > >> beta, and explicitly request it, because they have various
> > >> infrastructure that was expecting it to be there. It's just a shell
> > >> script, and could be updated along the way to use the rdma tool
> > >> from iproute2, or become completely obsolete at some point. But
> > >> we're definitely some demand to keep it around for the moment.
> > >
> > > It is already obsolete and rdmatool replaces it. I see no value in
> > > keeping abandoned tool in upstream repository.
> > >
> > > Regarding Parav's point, lack of IPoIB translations, iroute2 is an
> > > open-source project and people are encouraged to extend it. If I
> > > judge by number of customer requests to me (zero) to implement it,
> > > there is no demand in such functionality.
> >
> > Well, a quick prototype follows below, the question is should I work
> > on it or not, does anyone core?
> 
> Parav and I are care, I have no clue about the rest of the world.
> 
> It will be better to start from mockup of rdmatool output.
> 
> Thanks
Your work is certainly useful.
This allows users to start using rdma tool regardless of underlying link layer to get ib device to net device(s) mapping.
Mark Bloch Dec. 13, 2018, 6:07 p.m. UTC | #9
On 12/13/18 8:43 AM, Parav Pandit wrote:
> Hi Mark,
> 
>> -----Original Message-----
>> From: Leon Romanovsky <leon@kernel.org>
>> Sent: Wednesday, December 12, 2018 1:10 AM
>> To: Mark Bloch <markb@mellanox.com>
>> Cc: Jarod Wilson <jarod@redhat.com>; Parav Pandit
>> <parav@mellanox.com>; linux-rdma@vger.kernel.org; Doug Ledford
>> <dledford@redhat.com>; Honggang Li <honli@redhat.com>
>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>
>> On Wed, Dec 12, 2018 at 06:38:39AM +0000, Mark Bloch wrote:
>>>
>>>
>>> On 12/11/2018 21:27, Leon Romanovsky wrote:
>>>> On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
>>>>> On 2018-12-11 4:17 PM, Parav Pandit wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>>>>>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
>>>>>>> Sent: Tuesday, December 11, 2018 1:57 PM
>>>>>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
>>>>>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li
>>>>>>> <honli@redhat.com>
>>>>>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
>>>>>>>> This is a helper script that Red Hat had previously been
>>>>>>>> including in it's ibutils package, which is no more, with the
>>>>>>>> retirement of ibutils and
>>>>>>>> ibutils2 being proprietary to Mellanox now. This script still
>>>>>>>> has use, and needs a better home, so we'd like to add it to
>>>>>>>> rdma-core's redhat/ directory -- if not somewhere more generic,
>>>>>>>> should other distros wish to make use of it as well.
>>>>>>>
>>>>>>> Why not tell users to use rdma tool (iproute2)?
>>>>>>>
>>>>>> Because rdma tool currently doesn’t show corresponding IPoIB upper
>> netdevice(s) of for the IB device.
>>>>>> For a given rdma device there can be multiple netdevices exist for
>> RoCE ports.
>>>>>> Rdma tool (and ibdev2netdev) doesn't show them.
>>>>>> The scope of ibdev2netdev currently is wider than what rdma tool
>> shows.
>>>>>> So yes, its good recommendation to RoCE and representor users.
>>>>>
>>>>> We've actually had some folks notice it was missing from RHEL-8.0
>>>>> beta, and explicitly request it, because they have various
>>>>> infrastructure that was expecting it to be there. It's just a shell
>>>>> script, and could be updated along the way to use the rdma tool
>>>>> from iproute2, or become completely obsolete at some point. But
>>>>> we're definitely some demand to keep it around for the moment.
>>>>
>>>> It is already obsolete and rdmatool replaces it. I see no value in
>>>> keeping abandoned tool in upstream repository.
>>>>
>>>> Regarding Parav's point, lack of IPoIB translations, iroute2 is an
>>>> open-source project and people are encouraged to extend it. If I
>>>> judge by number of customer requests to me (zero) to implement it,
>>>> there is no demand in such functionality.
>>>
>>> Well, a quick prototype follows below, the question is should I work
>>> on it or not, does anyone core?
>>
>> Parav and I are care, I have no clue about the rest of the world.
>>
>> It will be better to start from mockup of rdmatool output.
>>
>> Thanks
> Your work is certainly useful.
> This allows users to start using rdma tool regardless of underlying link layer to get ib device to net device(s) mapping.
> 
> 

Ok, let's build on that, as you ( and Leon :) ) care about this,
is there a format you prefer? 

What about something like this:
rdma link -d show mlx5_3
/1: mlx5_3/1: subnet_prefix fe80:0000:0000:0000 lid 65535 sm_lid 0 lmc 0 state DOWN physical_state DISABLED 
    ipoib: netdev ib0 netdev_index 42 
    ipoib: netdev ib0.3332 netdev_index 43 
    ipoib: netdev ib0.3333 netdev_index 44 
    caps: <TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS, CAP_MASK2, CM, VENDOR_CLASS, CAP_MASK_NOTICE, CLIENT_REG, OTHER_LOCAL_CHANGES, MULT_PKER_TRAP>

I assume you would like to add gid tables entries to rdma tool at some point, have you thought how would you
like to do that? use rdma link / rdma gids something else?

Mark

Are you going to dump it
Parav Pandit Dec. 13, 2018, 6:17 p.m. UTC | #10
> -----Original Message-----
> From: Mark Bloch
> Sent: Thursday, December 13, 2018 12:08 PM
> To: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
> <leon@kernel.org>
> Cc: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org; Doug
> Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> 
> 
> 
> On 12/13/18 8:43 AM, Parav Pandit wrote:
> > Hi Mark,
> >
> >> -----Original Message-----
> >> From: Leon Romanovsky <leon@kernel.org>
> >> Sent: Wednesday, December 12, 2018 1:10 AM
> >> To: Mark Bloch <markb@mellanox.com>
> >> Cc: Jarod Wilson <jarod@redhat.com>; Parav Pandit
> >> <parav@mellanox.com>; linux-rdma@vger.kernel.org; Doug Ledford
> >> <dledford@redhat.com>; Honggang Li <honli@redhat.com>
> >> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> >>
> >> On Wed, Dec 12, 2018 at 06:38:39AM +0000, Mark Bloch wrote:
> >>>
> >>>
> >>> On 12/11/2018 21:27, Leon Romanovsky wrote:
> >>>> On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
> >>>>> On 2018-12-11 4:17 PM, Parav Pandit wrote:
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> >>>>>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
> >>>>>>> Sent: Tuesday, December 11, 2018 1:57 PM
> >>>>>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
> >>>>>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li
> >>>>>>> <honli@redhat.com>
> >>>>>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
> >>>>>>>> This is a helper script that Red Hat had previously been
> >>>>>>>> including in it's ibutils package, which is no more, with the
> >>>>>>>> retirement of ibutils and
> >>>>>>>> ibutils2 being proprietary to Mellanox now. This script still
> >>>>>>>> has use, and needs a better home, so we'd like to add it to
> >>>>>>>> rdma-core's redhat/ directory -- if not somewhere more generic,
> >>>>>>>> should other distros wish to make use of it as well.
> >>>>>>>
> >>>>>>> Why not tell users to use rdma tool (iproute2)?
> >>>>>>>
> >>>>>> Because rdma tool currently doesn’t show corresponding IPoIB
> >>>>>> upper
> >> netdevice(s) of for the IB device.
> >>>>>> For a given rdma device there can be multiple netdevices exist
> >>>>>> for
> >> RoCE ports.
> >>>>>> Rdma tool (and ibdev2netdev) doesn't show them.
> >>>>>> The scope of ibdev2netdev currently is wider than what rdma tool
> >> shows.
> >>>>>> So yes, its good recommendation to RoCE and representor users.
> >>>>>
> >>>>> We've actually had some folks notice it was missing from RHEL-8.0
> >>>>> beta, and explicitly request it, because they have various
> >>>>> infrastructure that was expecting it to be there. It's just a
> >>>>> shell script, and could be updated along the way to use the rdma
> >>>>> tool from iproute2, or become completely obsolete at some point.
> >>>>> But we're definitely some demand to keep it around for the moment.
> >>>>
> >>>> It is already obsolete and rdmatool replaces it. I see no value in
> >>>> keeping abandoned tool in upstream repository.
> >>>>
> >>>> Regarding Parav's point, lack of IPoIB translations, iroute2 is an
> >>>> open-source project and people are encouraged to extend it. If I
> >>>> judge by number of customer requests to me (zero) to implement it,
> >>>> there is no demand in such functionality.
> >>>
> >>> Well, a quick prototype follows below, the question is should I work
> >>> on it or not, does anyone core?
> >>
> >> Parav and I are care, I have no clue about the rest of the world.
> >>
> >> It will be better to start from mockup of rdmatool output.
> >>
> >> Thanks
> > Your work is certainly useful.
> > This allows users to start using rdma tool regardless of underlying link
> layer to get ib device to net device(s) mapping.
> >
> >
> 
> Ok, let's build on that, as you ( and Leon :) ) care about this, is there a format
> you prefer?
> 
> What about something like this:
> rdma link -d show mlx5_3
> /1: mlx5_3/1: subnet_prefix fe80:0000:0000:0000 lid 65535 sm_lid 0 lmc 0
> state DOWN physical_state DISABLED
>     ipoib: netdev ib0 netdev_index 42
>     ipoib: netdev ib0.3332 netdev_index 43
>     ipoib: netdev ib0.3333 netdev_index 44
>     caps: <TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS,
> CAP_MASK2, CM, VENDOR_CLASS, CAP_MASK_NOTICE, CLIENT_REG,
> OTHER_LOCAL_CHANGES, MULT_PKER_TRAP>
> 
> I assume you would like to add gid tables entries to rdma tool at some
> point, have you thought how would you like to do that? use rdma link /
> rdma gids something else?
> 

I prefer rdma addr that shows gids based on ip addresses and other ways along with their netdev.
So basically gid, and gid_attr (index, netdev, type).
Reason for addr instead of gids or link is, it aligns with ip addr.

> Mark
> 
> Are you going to dump it
Mark Bloch Dec. 13, 2018, 6:27 p.m. UTC | #11
On 12/13/18 10:17 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Mark Bloch
>> Sent: Thursday, December 13, 2018 12:08 PM
>> To: Parav Pandit <parav@mellanox.com>; Leon Romanovsky
>> <leon@kernel.org>
>> Cc: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org; Doug
>> Ledford <dledford@redhat.com>; Honggang Li <honli@redhat.com>
>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>
>>
>>
>> On 12/13/18 8:43 AM, Parav Pandit wrote:
>>> Hi Mark,
>>>
>>>> -----Original Message-----
>>>> From: Leon Romanovsky <leon@kernel.org>
>>>> Sent: Wednesday, December 12, 2018 1:10 AM
>>>> To: Mark Bloch <markb@mellanox.com>
>>>> Cc: Jarod Wilson <jarod@redhat.com>; Parav Pandit
>>>> <parav@mellanox.com>; linux-rdma@vger.kernel.org; Doug Ledford
>>>> <dledford@redhat.com>; Honggang Li <honli@redhat.com>
>>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>>>
>>>> On Wed, Dec 12, 2018 at 06:38:39AM +0000, Mark Bloch wrote:
>>>>>
>>>>>
>>>>> On 12/11/2018 21:27, Leon Romanovsky wrote:
>>>>>> On Tue, Dec 11, 2018 at 04:53:49PM -0500, Jarod Wilson wrote:
>>>>>>> On 2018-12-11 4:17 PM, Parav Pandit wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
>>>>>>>>> owner@vger.kernel.org> On Behalf Of Mark Bloch
>>>>>>>>> Sent: Tuesday, December 11, 2018 1:57 PM
>>>>>>>>> To: Jarod Wilson <jarod@redhat.com>; linux-rdma@vger.kernel.org
>>>>>>>>> Cc: Doug Ledford <dledford@redhat.com>; Honggang Li
>>>>>>>>> <honli@redhat.com>
>>>>>>>>> Subject: Re: [PATCH rdma-core] Add Red Hat's ibdev2netdev helper
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/11/18 11:46 AM, Jarod Wilson wrote:
>>>>>>>>>> This is a helper script that Red Hat had previously been
>>>>>>>>>> including in it's ibutils package, which is no more, with the
>>>>>>>>>> retirement of ibutils and
>>>>>>>>>> ibutils2 being proprietary to Mellanox now. This script still
>>>>>>>>>> has use, and needs a better home, so we'd like to add it to
>>>>>>>>>> rdma-core's redhat/ directory -- if not somewhere more generic,
>>>>>>>>>> should other distros wish to make use of it as well.
>>>>>>>>>
>>>>>>>>> Why not tell users to use rdma tool (iproute2)?
>>>>>>>>>
>>>>>>>> Because rdma tool currently doesn’t show corresponding IPoIB
>>>>>>>> upper
>>>> netdevice(s) of for the IB device.
>>>>>>>> For a given rdma device there can be multiple netdevices exist
>>>>>>>> for
>>>> RoCE ports.
>>>>>>>> Rdma tool (and ibdev2netdev) doesn't show them.
>>>>>>>> The scope of ibdev2netdev currently is wider than what rdma tool
>>>> shows.
>>>>>>>> So yes, its good recommendation to RoCE and representor users.
>>>>>>>
>>>>>>> We've actually had some folks notice it was missing from RHEL-8.0
>>>>>>> beta, and explicitly request it, because they have various
>>>>>>> infrastructure that was expecting it to be there. It's just a
>>>>>>> shell script, and could be updated along the way to use the rdma
>>>>>>> tool from iproute2, or become completely obsolete at some point.
>>>>>>> But we're definitely some demand to keep it around for the moment.
>>>>>>
>>>>>> It is already obsolete and rdmatool replaces it. I see no value in
>>>>>> keeping abandoned tool in upstream repository.
>>>>>>
>>>>>> Regarding Parav's point, lack of IPoIB translations, iroute2 is an
>>>>>> open-source project and people are encouraged to extend it. If I
>>>>>> judge by number of customer requests to me (zero) to implement it,
>>>>>> there is no demand in such functionality.
>>>>>
>>>>> Well, a quick prototype follows below, the question is should I work
>>>>> on it or not, does anyone core?
>>>>
>>>> Parav and I are care, I have no clue about the rest of the world.
>>>>
>>>> It will be better to start from mockup of rdmatool output.
>>>>
>>>> Thanks
>>> Your work is certainly useful.
>>> This allows users to start using rdma tool regardless of underlying link
>> layer to get ib device to net device(s) mapping.
>>>
>>>
>>
>> Ok, let's build on that, as you ( and Leon :) ) care about this, is there a format
>> you prefer?
>>
>> What about something like this:
>> rdma link -d show mlx5_3
>> /1: mlx5_3/1: subnet_prefix fe80:0000:0000:0000 lid 65535 sm_lid 0 lmc 0
>> state DOWN physical_state DISABLED
>>     ipoib: netdev ib0 netdev_index 42
>>     ipoib: netdev ib0.3332 netdev_index 43
>>     ipoib: netdev ib0.3333 netdev_index 44
>>     caps: <TRAP, SL_MAP, SYS_IMAGE_GUID, CABLE_INFO, EXTENDED_SPEEDS,
>> CAP_MASK2, CM, VENDOR_CLASS, CAP_MASK_NOTICE, CLIENT_REG,
>> OTHER_LOCAL_CHANGES, MULT_PKER_TRAP>
>>
>> I assume you would like to add gid tables entries to rdma tool at some
>> point, have you thought how would you like to do that? use rdma link /
>> rdma gids something else?
>>
> 
> I prefer rdma addr that shows gids based on ip addresses and other ways along with their netdev.
> So basically gid, and gid_attr (index, netdev, type).
> Reason for addr instead of gids or link is, it aligns with ip addr.
> 

OK cool, so no need to worry about that..
What about the output I've provided? are you okay with that?
Do you want something different?

I was thinking of providing the following netlink attributes: 
<owner:string> example: "ipoib"
<ndev:string> example: "ib0"
<index:u32>  example: 42

Leon, as this is your baby, any thoughts?

Mark
diff mbox series

Patch

diff --git a/redhat/ibdev2netdev b/redhat/ibdev2netdev
new file mode 100644
index 00000000..3fbc04cb
--- /dev/null
+++ b/redhat/ibdev2netdev
@@ -0,0 +1,197 @@ 
+#! /bin/bash
+
+usage()
+{
+  echo "$(basename $0) <options>"
+  echo "-h, --help                print help message"
+  echo "-v, --verbose             print more info"
+}
+
+case $1 in
+  "-h" | "--help")
+    usage
+    exit 0
+    ;;
+esac
+
+if (( $# > 1 )); then
+  usage
+  exit -1
+fi
+
+if (( $# == 1 )) && [ "$1" != "-v" ]; then
+  usage
+  exit -1
+fi
+
+ibdevs=$(ls /sys/class/infiniband/)
+
+devs=$(ls /sys/class/net/ | grep ^[ei])
+if [ "x$devs" == "x" ]; then
+  # no relevant devices - quit immediately
+  exit
+fi
+
+for d in $devs; do
+  if [ -f /sys/class/net/$d/dev_id ]; then
+    oldstyle=n
+    break
+  fi
+done
+
+function print_line()
+{
+  echo "$1 port $2 <===> $3"
+}
+
+function find_guid()
+{
+  ibdevs=$(ls /sys/class/infiniband/)
+  for ibdev in $ibdevs; do
+    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
+    for port in $ports; do
+      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
+      for gid in $gids; do
+        pguid=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g')
+        if [ x$pguid == x$1 ]; then
+          print_line $ibdev $port $2
+        fi
+      done
+    done
+  done
+}
+
+function find_mac()
+{
+  ibdevs=$(ls /sys/class/infiniband/)
+  for ibdev in $ibdevs; do
+    type=$(cat /sys/class/infiniband/$ibdev/node_type|cut -d ' ' -f 2)
+    ports=$(ls /sys/class/infiniband/$ibdev/ports/)
+    for port in $ports; do
+      gids=$(ls /sys/class/infiniband/$ibdev/ports/$port/gids)
+      for gid in $gids; do
+        if [[ "$type" = "RNIC" ]]; then
+          pmac=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 1-14|sed -e 's/://g')
+        else
+          first=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21-22)
+          first=$(( first ^ 2 ))
+          first=$(printf "%02x" $first)
+          second=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g' | cut -b 3-6)
+          third=$(cat /sys/class/infiniband/$ibdev/ports/$port/gids/$gid | cut -b 21- | sed -e 's/://g' | cut -b 11-)
+          pmac=$first$second$third
+        fi
+        if [ x$pmac == x$1 ]; then
+          print_line $ibdev $port $2
+        fi
+      done
+    done
+  done
+}
+
+if [ "x$oldstyle" == "xn" ]; then
+  for d in $ibdevs; do
+    ibrsc=$(cat /sys/class/infiniband/$d/device/resource)
+    eths=$(ls /sys/class/net/)
+    for eth in $eths; do
+      filepath_resource=/sys/class/net/$eth/device/resource
+
+      if [ -f $filepath_resource ]; then
+        ethrsc=$(cat $filepath_resource)
+        if [ "x$ethrsc" == "x$ibrsc" ]; then
+          filepath_devid=/sys/class/net/$eth/dev_id
+          filepath_devport=/sys/class/net/$eth/dev_port
+          if [ -f $filepath_devid ]; then
+            port1=0
+            if [ -f $filepath_devport ]; then
+              port1=$(cat $filepath_devport)
+              port1=$(printf "%d" $port1)
+            fi
+
+            port=$(cat $filepath_devid)
+            port=$(printf "%d" $port)
+            if [ $port1 -gt $port ]; then
+              port=$port1
+            fi
+
+            port=$(( port + 1 ))
+
+            filepath_carrier=/sys/class/net/$eth/carrier
+
+            if [ -f $filepath_carrier ]; then
+              link_state=$(cat $filepath_carrier 2> /dev/null)
+              if (( link_state == 1 )); then
+                link_state="Up"
+              else
+                link_state="Down"
+              fi
+            else
+              link_state="NA"
+            fi
+
+            if [ "$1" == "-v" ]; then
+              filepath_portstate=/sys/class/infiniband/$d/ports/$port/state
+              filepath_deviceid=/sys/class/infiniband/$d/device/device
+              filepath_fwver=/sys/class/infiniband/$d/fw_ver
+              filepath_vpd=/sys/class/infiniband/$d/device/vpd
+
+              # read port state
+              if [ -f $filepath_portstate ]; then
+                ibstate=$(printf "%-6s" $(cat $filepath_portstate | gawk '{print $2}'))
+              else
+                ibstate="na"
+              fi
+
+              # read device
+              if [ -f $filepath_deviceid ]; then
+                devid=$(printf "mt%d" $(cat $filepath_deviceid))
+              else
+                devid="na"
+              fi
+
+              # read fw version
+              if [ -f $filepath_fwver ]; then
+                fwver=$(cat $filepath_fwver)
+              else
+                fwver="na"
+              fi
+
+              # read device description and part id from the vpd
+              if [ -f $filepath_vpd ]; then
+                tmp=$ifs
+                ifs=":"
+                vpd_content=`cat $filepath_vpd`
+                devdesc=$(printf "%-15s" $(echo $vpd_content | strings | head -1))
+                partid=$(printf "%-11s" $(echo $vpd_content | strings | head -4 | tail -1 | gawk '{print $1}'))
+                ifs=$tmp
+              else
+                devdesc=""
+                partid="na"
+              fi
+
+              echo "$d ($devid - $partid) $devdesc fw $fwver port $port ($ibstate) ==> $eth ($link_state)"
+            else
+              echo "$d port $port ==> $eth ($link_state)"
+            fi
+          fi
+        fi
+      fi
+    done
+  done
+else
+##########################
+### old style
+##########################
+  ifcs=$(ifconfig -a | egrep '^eth|^ib' | gawk '{print $1}')
+
+  for ifc in $ifcs; do
+    len=$(cat /sys/class/net/$ifc/addr_len)
+    if (( len == 20 )); then
+      guid=$(cat /sys/class/net/$ifc/address | cut -b 37- | sed -e 's/://g')
+      find_guid $guid $ifc
+    elif (( len == 6)); then
+      mac=$(cat /sys/class/net/$ifc/address | sed -e 's/://g')
+      find_mac $mac $ifc
+    fi
+  done
+fi
+