[RFC,net-next,0/3] selftests: Add AF_XDP functionality test

Message ID cover.1718138187.git.zhuyifei@google.com (mailing list archive)

Message

YiFei Zhu June 11, 2024, 8:42 p.m. UTC
We have observed that hardware NIC drivers may have faulty AF_XDP
implementations, and there seems to be a lack of tests for the various
modes in which AF_XDP could run. This series adds a test to verify that
NIC drivers implement many AF_XDP features by performing a send /
receive of a single UDP packet.

I put the C code of the test under selftests/bpf because I'm not really
sure how I'd build the BPF-related code without the selftests/bpf
build infrastructure.

Tested on Google Cloud, with GVE:

  $ sudo NETIF=ens4 REMOTE_TYPE=ssh \
    REMOTE_ARGS="root@10.138.15.235" \
    LOCAL_V4="10.138.15.234" \
    REMOTE_V4="10.138.15.235" \
    LOCAL_NEXTHOP_MAC="42:01:0a:8a:00:01" \
    REMOTE_NEXTHOP_MAC="42:01:0a:8a:00:01" \
    python3 xsk_hw.py

  KTAP version 1
  1..22
  ok 1 xsk_hw.ipv4_basic
  ok 2 xsk_hw.ipv4_tx_skb_copy
  ok 3 xsk_hw.ipv4_tx_skb_copy_force_attach
  ok 4 xsk_hw.ipv4_rx_skb_copy
  ok 5 xsk_hw.ipv4_tx_drv_copy
  ok 6 xsk_hw.ipv4_tx_drv_copy_force_attach
  ok 7 xsk_hw.ipv4_rx_drv_copy
  [...]
  # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: recv_pfpacket: Timeout\n'
  not ok 8 xsk_hw.ipv4_tx_drv_zerocopy
  ok 9 xsk_hw.ipv4_tx_drv_zerocopy_force_attach
  ok 10 xsk_hw.ipv4_rx_drv_zerocopy
  [...]
  # Exception| STDERR: b'/tmp/zzfhcqkg/pbgodkgjxsk_hw: connect sync client: max_retries\n'
  [...]
  # Exception| STDERR: b'/linux/tools/testing/selftests/bpf/xsk_hw: open_xsk: Device or resource busy\n'
  not ok 11 xsk_hw.ipv4_rx_drv_zerocopy_fill_after_bind
  ok 12 xsk_hw.ipv6_basic # SKIP Test requires IPv6 connectivity
  [...]
  ok 22 xsk_hw.ipv6_rx_drv_zerocopy_fill_after_bind # SKIP Test requires IPv6 connectivity
  # Totals: pass:9 fail:2 xfail:0 xpass:0 skip:11 error:0
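
For readers who want to check a run like the one above mechanically,
here is a minimal sketch of tallying KTAP result lines. It is not part
of this series; the function name tally_ktap and the sample lines are
illustrative assumptions, and the regular expression only handles the
"ok" / "not ok" / "# SKIP" forms that appear in the output above.

```python
import re

# Matches KTAP result lines such as "ok 12 name # SKIP reason".
RESULT_RE = re.compile(r"^(ok|not ok) (\d+) (\S+)(?: # (SKIP)\b.*)?$")


def tally_ktap(lines):
    """Count pass / fail / skip results in an iterable of KTAP lines."""
    totals = {"pass": 0, "fail": 0, "skip": 0}
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # version line, plan line, "#" comments, elisions
        status, _number, _name, skip = match.groups()
        if skip:
            totals["skip"] += 1
        elif status == "ok":
            totals["pass"] += 1
        else:
            totals["fail"] += 1
    return totals


# A short excerpt in the same shape as the run above (not the full run).
sample = [
    "KTAP version 1",
    "1..22",
    "ok 1 xsk_hw.ipv4_basic",
    "# Exception| STDERR: b'...'",
    "not ok 8 xsk_hw.ipv4_tx_drv_zerocopy",
    "ok 12 xsk_hw.ipv6_basic # SKIP Test requires IPv6 connectivity",
]
print(tally_ktap(sample))
```

A skipped test is reported as "ok" with a SKIP directive, so the sketch
checks for the directive before counting a pass.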

YiFei Zhu (3):
  selftests/bpf: Move rxq_num helper from xdp_hw_metadata to
    network_helpers
  selftests/bpf: Add xsk_hw AF_XDP functionality test
  selftests: drv-net: Add xsk_hw AF_XDP functionality test

 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   7 +-
 tools/testing/selftests/bpf/network_helpers.c |  27 +
 tools/testing/selftests/bpf/network_helpers.h |  16 +
 tools/testing/selftests/bpf/progs/xsk_hw.c    |  72 ++
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  27 +-
 tools/testing/selftests/bpf/xsk_hw.c          | 844 ++++++++++++++++++
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../selftests/drivers/net/hw/xsk_hw.py        | 133 +++
 9 files changed, 1102 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/xsk_hw.c
 create mode 100644 tools/testing/selftests/bpf/xsk_hw.c
 create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_hw.py

Comments

Magnus Karlsson June 12, 2024, 11:47 a.m. UTC | #1
On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
>
> We have observed that hardware NIC drivers may have faulty AF_XDP
> implementations, and there seems to be a lack of tests for the various
> modes in which AF_XDP could run. This series adds a test to verify that
> NIC drivers implement many AF_XDP features by performing a send /
> receive of a single UDP packet.
>
> I put the C code of the test under selftests/bpf because I'm not really
> sure how I'd build the BPF-related code without the selftests/bpf
> build infrastructure.

Happy to see that you are contributing a number of new tests. Would it
be possible for you to integrate this into the xskxceiver framework?
You can find that in selftests/bpf too. By default, it will run its
tests using veth, but if you provide an interface name after the -i
option, it will run the tests over a real interface. I put the NIC in
loopback mode to use this feature, but feel free to add a new mode if
necessary. A lot of the setup and data plane code that you add already
exists in xskxceiver, so I would prefer if you could reuse it. Your
tests are new though and they would be valuable to have.

You could make the default packet that is sent in xskxceiver be the
UDP packet that you want and then add all the other logic that you
have to a number of new tests that you introduce.

Fijalkowski, Maciej June 12, 2024, 12:49 p.m. UTC | #2
On Wed, Jun 12, 2024 at 01:47:06PM +0200, Magnus Karlsson wrote:
> On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > We have observed that hardware NIC drivers may have faulty AF_XDP
> > implementations, and there seems to be a lack of tests for the various
> > modes in which AF_XDP could run. This series adds a test to verify that
> > NIC drivers implement many AF_XDP features by performing a send /
> > receive of a single UDP packet.
> >
> > I put the C code of the test under selftests/bpf because I'm not really
> > sure how I'd build the BPF-related code without the selftests/bpf
> > build infrastructure.
> 
> Happy to see that you are contributing a number of new tests. Would it
> be possible for you to integrate this into the xskxceiver framework?
> You can find that in selftests/bpf too. By default, it will run its
> tests using veth, but if you provide an interface name after the -i
> option, it will run the tests over a real interface. I put the NIC in
> loopback mode to use this feature, but feel free to add a new mode if
> necessary. A lot of the setup and data plane code that you add already
> exists in xskxceiver, so I would prefer if you could reuse it. Your
> tests are new though and they would be valuable to have.

+1

I just don't believe that you guys were not aware that xskxceiver exists.
Please provide us with a proper explanation/justification of why it did
not fulfill your needs and why you decided to go with another test suite.

Willem de Bruijn June 12, 2024, 1:57 p.m. UTC | #3
Magnus Karlsson wrote:
> On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> >
> > We have observed that hardware NIC drivers may have faulty AF_XDP
> > implementations, and there seems to be a lack of tests for the various
> > modes in which AF_XDP could run. This series adds a test to verify that
> > NIC drivers implement many AF_XDP features by performing a send /
> > receive of a single UDP packet.
> >
> > I put the C code of the test under selftests/bpf because I'm not really
> > sure how I'd build the BPF-related code without the selftests/bpf
> > build infrastructure.
> 
> Happy to see that you are contributing a number of new tests. Would it
> be possible for you to integrate this into the xskxceiver framework?

Makes sense, we'll need to take a look.

This is an internal test that we have been using for a long time in
our test framework.

My mistake for not keeping up at all with the changes to xskxceiver.c
in the meantime.

We want to test each case independently, including a few non-obvious
cases that we discovered from real use, notably:

- Using XSK only for Tx, without installing an Rx program
- Using XSK with an empty fill queue, filling it after bind
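
To make "each case independently" concrete, here is a rough sketch of
how the case list could be enumerated. The IPv4 names are the ones in
the KTAP output of the cover letter; the function name case_names and
the MODES tuple are hypothetical and are not taken from xsk_hw.py, and
the IPv6 names other than the first and last are an assumption that the
IPv6 cases mirror the IPv4 ones (the output elides them).

```python
# Copy modes as they appear in the test names of the cover letter.
MODES = ("skb_copy", "drv_copy", "drv_zerocopy")


def case_names(ip_version):
    """Return the per-IP-version case names, in the order of the KTAP run."""
    prefix = f"ipv{ip_version}"
    names = [f"{prefix}_basic"]
    for mode in MODES:
        # Each mode gets two Tx-side cases and one Rx-side case.
        names.append(f"{prefix}_tx_{mode}")
        names.append(f"{prefix}_tx_{mode}_force_attach")
        names.append(f"{prefix}_rx_{mode}")
    # The fill-after-bind case only shows up for zerocopy Rx in the output.
    names.append(f"{prefix}_rx_drv_zerocopy_fill_after_bind")
    return names


all_cases = case_names(4) + case_names(6)
print(len(all_cases))  # 22, matching the "1..22" plan line
```

Keeping every combination as its own named case means a failure such as
"not ok 8 xsk_hw.ipv4_tx_drv_zerocopy" points at one mode and one
direction rather than at the whole suite.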

> You can find that in selftests/bpf too. By default, it will run its
> tests using veth, but if you provide an interface name after the -i
> option, it will run the tests over a real interface. I put the NIC in
> loopback mode to use this feature, but feel free to add a new mode if
> necessary.

We really do want two-machine tests, not loopback mode. We also want
to integrate into the drv-net infrastructure.

Another non-obvious feature is to run AF_XDP on one side and
PF_PACKET on the other, so that a test can isolate and exercise only
the Tx or Rx path.
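
As an illustration of what the PF_PACKET side has to produce, here is a
minimal sketch that builds a single Ethernet + IPv4 + UDP frame by
hand. This is not the code from xsk_hw.c; the helper names, the source
MAC, the ports and the payload are made up for the example, while the
IP addresses and the next-hop MAC are the ones from the cover letter.

```python
import socket
import struct


def checksum16(data: bytes) -> int:
    """Internet checksum: one's-complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_udp_frame(src_mac, dst_mac, src_ip, dst_ip, sport, dport, payload):
    """Return the bytes of one Ethernet + IPv4 + UDP frame."""
    saddr, daddr = socket.inet_aton(src_ip), socket.inet_aton(dst_ip)
    udp_len = 8 + len(payload)
    # The UDP checksum covers a pseudo-header, the UDP header and the payload.
    pseudo = saddr + daddr + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len)
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    # A computed checksum of zero is transmitted as 0xFFFF.
    udp_csum = checksum16(pseudo + udp) or 0xFFFF
    udp = struct.pack("!HHHH", sport, dport, udp_len, udp_csum) + payload
    # IPv4 header: version 4, IHL 5, TTL 64, no options, no fragmentation.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 64,
                     socket.IPPROTO_UDP, 0, saddr, daddr)
    ip = ip[:10] + struct.pack("!H", checksum16(ip)) + ip[12:]
    eth = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", 0x0800)
    return eth + ip + udp


frame = build_udp_frame(
    "02:00:00:00:00:01",   # hypothetical local MAC
    "42:01:0a:8a:00:01",   # next-hop MAC from the cover letter
    "10.138.15.234", "10.138.15.235",
    12345, 54321,          # hypothetical ports
    b"hello",
)
print(len(frame))
```

On Linux, a frame like this could then be handed to a raw
socket.AF_PACKET socket bound to the interface, which needs
CAP_NET_RAW. The frame layout is the same whichever side sends it.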

YiFei Zhu June 12, 2024, 4:44 p.m. UTC | #4
On Wed, Jun 12, 2024 at 5:50 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Jun 12, 2024 at 01:47:06PM +0200, Magnus Karlsson wrote:
> > On Tue, 11 Jun 2024 at 22:43, YiFei Zhu <zhuyifei@google.com> wrote:
> > >
> > > We have observed that hardware NIC drivers may have faulty AF_XDP
> > > implementations, and there seems to be a lack of tests for the various
> > > modes in which AF_XDP could run. This series adds a test to verify that
> > > NIC drivers implement many AF_XDP features by performing a send /
> > > receive of a single UDP packet.
> > >
> > > I put the C code of the test under selftests/bpf because I'm not really
> > > sure how I'd build the BPF-related code without the selftests/bpf
> > > build infrastructure.
> >
> > Happy to see that you are contributing a number of new tests. Would it
> > be possible for you to integrate this into the xskxceiver framework?
> > You can find that in selftests/bpf too. By default, it will run its
> > tests using veth, but if you provide an interface name after the -i
> > option, it will run the tests over a real interface. I put the NIC in
> > loopback mode to use this feature, but feel free to add a new mode if
> > necessary. A lot of the setup and data plane code that you add already
> > exists in xskxceiver, so I would prefer if you could reuse it. Your
> > tests are new though and they would be valuable to have.
>
> +1
>
> I just don't believe that you guys were not aware that xskxceiver exists.
> Please provide us with a proper explanation/justification of why it did
> not fulfill your needs and why you decided to go with another test suite.

To answer this question, I can't speak for others, but I personally
was not fully aware.

Over a year ago, when we were testing AF_XDP latency on internal NIC
drivers, we extended our internal latency test tool to support AF_XDP.
That was when we observed that the NICs we were testing had faulty
implementations: panics, packet corruption, random drops. We decided
to simplify the latency suite into a simple pass/fail test for our
testing infrastructure, and we named it xsk_hw. The test was
specifically designed to test hardware NICs (rather than veth), and
there was a bunch of code around the test to reserve and set up
machines, and to obtain information such as the IP addresses and the
host and next-hop MAC addresses. At the time, the code was deemed too
dependent on our internal multi-machine testing infrastructure to
upstream, but it has been running as part of our test suite since.

This brings us to the present. I was informed that upstream now has
drv-net, and now that upstream also has multi-machine testing, it's
time to upstream the test. Hence this patch series, which I made after
adapting the code to use drv-net and network_helpers.

As for xskxceiver, I personally discarded the idea after reading its
initial block comment, which says it spawns two threads in a veth pair
to test AF_XDP. In my mind that read as "okay, this doesn't test
hardware NICs, and extending that test to hardware is probably a major
rewrite that is probably not worth it", so I did not look too deeply
into its code. I was unaware that it can test a real interface, and
that's partially my fault.

I'll take a look at xskxceiver and see how feasible it is to integrate
this into xskxceiver.

Magnus Karlsson June 13, 2024, 6:42 a.m. UTC | #5
On Wed, 12 Jun 2024 at 18:44, YiFei Zhu <zhuyifei@google.com> wrote:
> This brings us to the present. I was informed that upstream now has
> drv-net, and now that upstream also has multi-machine testing, it's
> time to upstream the test. Hence this patch series, which I made after
> adapting the code to use drv-net and network_helpers.

I was not aware of drv-net. I think it would be a really good idea to
just hook up xskxceiver to this even without adding any new tests. If
this is something that is run automatically for drivers, perfect, we
should make use of it. Any idea what it would take to make xskxceiver
use drv-net?

> As for xskxceiver, I personally discarded the idea after reading its
> initial block comment, which says it spawns two threads in a veth pair
> to test AF_XDP. In my mind that read as "okay, this doesn't test
> hardware NICs, and extending that test to hardware is probably a major
> rewrite that is probably not worth it", so I did not look too deeply
> into its code. I was unaware that it can test a real interface, and
> that's partially my fault.

Or mine for not updating the initial block comment. In any case, no worries!

> I'll take a look at xskxceiver and see how feasible it is to integrate
> this into xskxceiver.

Thanks! Please keep the drv-net integration in mind. Hopefully it is
not that much work to tweak xskxceiver to fit into that.
