diff mbox series

[net-next,v2,1/3] net/mlx4: Track RX allocation failures in a stat

Message ID 20240426183355.500364-2-jdamato@fastly.com (mailing list archive)
State Superseded
Headers show
Series mlx4: Add support for netdev-genl API | expand

Commit Message

Joe Damato April 26, 2024, 6:33 p.m. UTC
mlx4_en_alloc_frags currently returns -ENOMEM when mlx4_alloc_page
fails but does not increment a stat field when this occurs.

struct mlx4_en_rx_ring has a dropped field which is tabulated in
mlx4_en_DUMP_ETH_STATS, but never incremented by the driver.

This change modifies mlx4_en_alloc_frags to increment mlx4_en_rx_ring's
dropped field for the -ENOMEM case.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
---
 drivers/net/ethernet/mellanox/mlx4/en_port.c | 5 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Jakub Kicinski April 26, 2024, 8 p.m. UTC | #1
On Fri, 26 Apr 2024 18:33:53 +0000 Joe Damato wrote:
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -151,7 +151,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
>  {
>  	struct mlx4_en_priv *priv = netdev_priv(dev);
>  	struct mlx4_en_dev *mdev = priv->mdev;
> -	unsigned long packets, bytes;
> +	unsigned long packets, bytes, dropped;
>  	int i;
>  
>  	if (!priv->port_up || mlx4_is_master(mdev->dev))
> @@ -159,14 +159,17 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
>  
>  	packets = 0;
>  	bytes = 0;
> +	dropped = 0;
>  	for (i = 0; i < priv->rx_ring_num; i++) {
>  		const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];
>  
>  		packets += READ_ONCE(ring->packets);
>  		bytes   += READ_ONCE(ring->bytes);
> +		dropped += READ_ONCE(ring->dropped);
>  	}
>  	dev->stats.rx_packets = packets;
>  	dev->stats.rx_bytes = bytes;
> +	dev->stats.rx_missed_errors = dropped;

I'd drop this chunk, there's a slight but meaningful difference in
definition of rx_missed vs alloc-fail:

 * @rx_missed_errors: Count of packets missed by the host.
 *   Folded into the "drop" counter in `/proc/net/dev`.
 *
 *   Counts number of packets dropped by the device due to lack
 *   of buffer space. This usually indicates that the host interface
 *   is slower than the network interface, or host is not keeping up
 *   with the receive packet rate.
---
        name: rx-alloc-fail
        doc: |
          Number of times skb or buffer allocation failed on the Rx datapath.
          Allocation failure may, or may not result in a packet drop, depending
          on driver implementation and whether system recovers quickly.

tl;dr "packets dropped" vs "may, or may not result in a packet drop"

In case of mlx4 looks like the buffer refill is "async", the driver
tries to refill the buffers to max, but if it fails the next NAPI poll
will try again. Allocation failures are not directly tied to packet
drops. In case of bnxt if "replacement" buffer can't be allocated -
packet is dropped and old buffer gets returned to the ring (although 
if I'm 100% honest bnxt may be off by a couple, too, as the OOM stat
gets incremented on ifup pre-fill failures).
Joe Damato April 26, 2024, 11:43 p.m. UTC | #2
On Fri, Apr 26, 2024 at 01:00:17PM -0700, Jakub Kicinski wrote:
> On Fri, 26 Apr 2024 18:33:53 +0000 Joe Damato wrote:
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> > @@ -151,7 +151,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
> >  {
> >  	struct mlx4_en_priv *priv = netdev_priv(dev);
> >  	struct mlx4_en_dev *mdev = priv->mdev;
> > -	unsigned long packets, bytes;
> > +	unsigned long packets, bytes, dropped;
> >  	int i;
> >  
> >  	if (!priv->port_up || mlx4_is_master(mdev->dev))
> > @@ -159,14 +159,17 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
> >  
> >  	packets = 0;
> >  	bytes = 0;
> > +	dropped = 0;
> >  	for (i = 0; i < priv->rx_ring_num; i++) {
> >  		const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];
> >  
> >  		packets += READ_ONCE(ring->packets);
> >  		bytes   += READ_ONCE(ring->bytes);
> > +		dropped += READ_ONCE(ring->dropped);
> >  	}
> >  	dev->stats.rx_packets = packets;
> >  	dev->stats.rx_bytes = bytes;
> > +	dev->stats.rx_missed_errors = dropped;
> 
> I'd drop this chunk, there's a slight but meaningful difference in
> definition of rx_missed vs alloc-fail:
> 
>  * @rx_missed_errors: Count of packets missed by the host.
>  *   Folded into the "drop" counter in `/proc/net/dev`.
>  *
>  *   Counts number of packets dropped by the device due to lack
>  *   of buffer space. This usually indicates that the host interface
>  *   is slower than the network interface, or host is not keeping up
>  *   with the receive packet rate.
> ---
>         name: rx-alloc-fail
>         doc: |
>           Number of times skb or buffer allocation failed on the Rx datapath.
>           Allocation failure may, or may not result in a packet drop, depending
>           on driver implementation and whether system recovers quickly.
> 
> tl;dr "packets dropped" vs "may, or may not result in a packet drop"
> 
> In case of mlx4 looks like the buffer refill is "async", the driver
> tries to refill the buffers to max, but if it fails the next NAPI poll
> will try again. Allocation failures are not directly tied to packet
> drops. In case of bnxt if "replacement" buffer can't be allocated -
> packet is dropped and old buffer gets returned to the ring (although 
> if I'm 100% honest bnxt may be off by a couple, too, as the OOM stat
> gets incremented on ifup pre-fill failures).

Yes, I see that now. I'll drop this patch entirely from v3 and just leave
the other two and remove alloc_fail from the queue stats patch.

Thanks for the careful review.
Jakub Kicinski April 26, 2024, 11:52 p.m. UTC | #3
On Fri, 26 Apr 2024 16:43:53 -0700 Joe Damato wrote:
> > In case of mlx4 looks like the buffer refill is "async", the driver
> > tries to refill the buffers to max, but if it fails the next NAPI poll
> > will try again. Allocation failures are not directly tied to packet
> > drops. In case of bnxt if "replacement" buffer can't be allocated -
> > packet is dropped and old buffer gets returned to the ring (although 
> > if I'm 100% honest bnxt may be off by a couple, too, as the OOM stat
> > gets incremented on ifup pre-fill failures).  
> 
> Yes, I see that now. I'll drop this patch entirely from v3 and just leave
> the other two and remove alloc_fail from the queue stats patch.

Up to you, but I'd keep alloc_fail itself.
If mlx4 gets page pool support one day it will be useful to run this:
https://lore.kernel.org/all/20240426232400.624864-1-kuba@kernel.org/

And I think it's useful to be able to check in case there are Rx
discards whether the system was also under transient memory pressure 
or not.
Joe Damato April 27, 2024, 12:28 a.m. UTC | #4
On Fri, Apr 26, 2024 at 04:52:13PM -0700, Jakub Kicinski wrote:
> On Fri, 26 Apr 2024 16:43:53 -0700 Joe Damato wrote:
> > > In case of mlx4 looks like the buffer refill is "async", the driver
> > > tries to refill the buffers to max, but if it fails the next NAPI poll
> > > will try again. Allocation failures are not directly tied to packet
> > > drops. In case of bnxt if "replacement" buffer can't be allocated -
> > > packet is dropped and old buffer gets returned to the ring (although 
> > > if I'm 100% honest bnxt may be off by a couple, too, as the OOM stat
> > > gets incremented on ifup pre-fill failures).  
> > 
> > Yes, I see that now. I'll drop this patch entirely from v3 and just leave
> > the other two and remove alloc_fail from the queue stats patch.
> 
> Up to you, but I'd keep alloc_fail itself.
> If mlx4 gets page pool support one day it will be useful to run this:
> https://lore.kernel.org/all/20240426232400.624864-1-kuba@kernel.org/
> 
> And I think it's useful to be able to check in case there are Rx
> discards whether the system was also under transient memory pressure 
> or not.

Ah, maybe I read what you wrote incorrectly in your previous message.

I think you were saying that I should drop just the

  dev->stats.rx_missed_errors = dropped;

due to the definition of rx_missed_errors, but that by the definition of
rx-alloc-fail:

  alloc_fail = ring->dropped;

is still valid and can stay.

Is that right or am I just totally off?
Jakub Kicinski April 27, 2024, 12:34 a.m. UTC | #5
On Fri, 26 Apr 2024 17:28:03 -0700 Joe Damato wrote:
> Ah, maybe I read what you wrote incorrectly in your previous message.
> 
> I think you were saying that I should drop just the
> 
>   dev->stats.rx_missed_errors = dropped;
> 
> due to the definition of rx_missed_errors, but that by the definition of
> rx-alloc-fail:
> 
>   alloc_fail = ring->dropped;
> 
> is still valid and can stay.

That's right, yes.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 532997eba698..47541f7666f2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -151,7 +151,7 @@  void mlx4_en_fold_software_stats(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
-	unsigned long packets, bytes;
+	unsigned long packets, bytes, dropped;
 	int i;
 
 	if (!priv->port_up || mlx4_is_master(mdev->dev))
@@ -159,14 +159,17 @@  void mlx4_en_fold_software_stats(struct net_device *dev)
 
 	packets = 0;
 	bytes = 0;
+	dropped = 0;
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		const struct mlx4_en_rx_ring *ring = priv->rx_ring[i];
 
 		packets += READ_ONCE(ring->packets);
 		bytes   += READ_ONCE(ring->bytes);
+		dropped += READ_ONCE(ring->dropped);
 	}
 	dev->stats.rx_packets = packets;
 	dev->stats.rx_bytes = bytes;
+	dev->stats.rx_missed_errors = dropped;
 
 	packets = 0;
 	bytes = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8328df8645d5..573ae10300c7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -82,8 +82,10 @@  static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
 
 	for (i = 0; i < priv->num_frags; i++, frags++) {
 		if (!frags->page) {
-			if (mlx4_alloc_page(priv, frags, gfp))
+			if (mlx4_alloc_page(priv, frags, gfp)) {
+				ring->dropped++;
 				return -ENOMEM;
+			}
 			ring->rx_alloc_pages++;
 		}
 		rx_desc->data[i].addr = cpu_to_be64(frags->dma +