[3/3] net: stmmac: use pcpu statistics where necessary

Message ID 20230614161847.4071-4-jszhang@kernel.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net: stmmac: fix & improve driver statistics

Checks

Context: netdev/tree_selection
Check: success
Description: Guessing tree name failed - patch did not apply

Commit Message

Jisheng Zhang June 14, 2023, 4:18 p.m. UTC
If the HW supports multiple queues, there is frequent cacheline
ping-pong on some driver statistics variables, for example
normal_irq_n, tx_pkt_n and so on. What's more, the cacheline
ping-pong on normal_irq_n happens in the ISR, which makes the
situation worse.

Use per-cpu statistics where necessary to remove the cacheline
ping-pong as much as possible and make multi-queue operations
faster. Statistics variables that are not frequently updated are
kept as is.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |  45 ++---
 .../net/ethernet/stmicro/stmmac/dwmac-sun8i.c |   9 +-
 .../net/ethernet/stmicro/stmmac/dwmac4_lib.c  |  19 +-
 .../net/ethernet/stmicro/stmmac/dwmac_lib.c   |  11 +-
 .../ethernet/stmicro/stmmac/dwxgmac2_dma.c    |  11 +-
 .../ethernet/stmicro/stmmac/stmmac_ethtool.c  |  87 +++++----
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 166 ++++++++++--------
 7 files changed, 193 insertions(+), 155 deletions(-)
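
To illustrate the conversion described above, here is a minimal, self-contained
sketch of the per-cpu statistics pattern the patch adopts: each CPU gets its own
counter block (allocated in the patch with devm_netdev_alloc_pcpu_stats()),
writers only touch their local copy under a u64_stats_sync, and readers sum a
consistent snapshot from every possible CPU. The demo_* names and the reduced
field set are illustrative only and are not part of the driver; the real
counters live in struct stmmac_pcpu_stats as shown in the diff.

#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/u64_stats_sync.h>

/* Reduced per-cpu counter block; the real struct stmmac_pcpu_stats in the
 * diff below carries the full set of counters and the per-queue arrays.
 */
struct demo_pcpu_stats {
        struct u64_stats_sync syncp;    /* keeps the u64 reads consistent on 32-bit SMP */
        u64 rx_normal_irq_n;
        u64 tx_normal_irq_n;
};

/* Writer side (e.g. the DMA ISR): only this CPU's copy is touched, so no
 * cacheline bounces between cores servicing different queues.
 */
static void demo_count_irq(struct demo_pcpu_stats __percpu *pstats, bool is_tx)
{
        struct demo_pcpu_stats *stats = this_cpu_ptr(pstats);

        u64_stats_update_begin(&stats->syncp);
        if (is_tx)
                stats->tx_normal_irq_n++;
        else
                stats->rx_normal_irq_n++;
        u64_stats_update_end(&stats->syncp);
}

/* Reader side (ethtool / ndo_get_stats64): take a consistent snapshot of each
 * CPU's counters under the fetch_begin/fetch_retry loop and sum the results.
 */
static u64 demo_total_normal_irqs(struct demo_pcpu_stats __percpu *pstats)
{
        u64 total = 0;
        int cpu;

        for_each_possible_cpu(cpu) {
                struct demo_pcpu_stats *stats = per_cpu_ptr(pstats, cpu);
                unsigned int start;
                u64 rx, tx;

                do {
                        start = u64_stats_fetch_begin(&stats->syncp);
                        rx = stats->rx_normal_irq_n;
                        tx = stats->tx_normal_irq_n;
                } while (u64_stats_fetch_retry(&stats->syncp, start));

                total += rx + tx;
        }

        return total;
}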

Comments

kernel test robot June 14, 2023, 6:38 p.m. UTC | #1
Hi Jisheng,

kernel test robot noticed the following build warnings:

[auto build test WARNING on sunxi/sunxi/for-next]
[also build test WARNING on linus/master v6.4-rc6]
[cannot apply to next-20230614]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jisheng-Zhang/net-stmmac-don-t-clear-network-statistics-in-ndo_open/20230615-003137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git sunxi/for-next
patch link:    https://lore.kernel.org/r/20230614161847.4071-4-jszhang%40kernel.org
patch subject: [PATCH 3/3] net: stmmac: use pcpu statistics where necessary
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230615/202306150255.k4BaJTXY-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 12.3.0
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git remote add sunxi https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git
        git fetch sunxi sunxi/for-next
        git checkout sunxi/sunxi/for-next
        b4 shazam https://lore.kernel.org/r/20230614161847.4071-4-jszhang@kernel.org
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=m68k olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.3.0 ~/bin/make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash drivers/net/

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306150255.k4BaJTXY-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c: In function 'stmmac_get_per_qstats':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:564:26: warning: 'start' is used uninitialized [-Wuninitialized]
     564 |                 } while (u64_stats_fetch_retry(&stats->syncp, start));
         |                          ^~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:551:22: note: 'start' was declared here
     551 |         unsigned int start;
         |                      ^~~~~


vim +/start +564 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c

   546	
   547	static void stmmac_get_per_qstats(struct stmmac_priv *priv, u64 *data)
   548	{
   549		u32 tx_cnt = priv->plat->tx_queues_to_use;
   550		u32 rx_cnt = priv->plat->rx_queues_to_use;
   551		unsigned int start;
   552		int q, stat, cpu;
   553		char *p;
   554		u64 *pos;
   555	
   556		pos = data;
   557		for_each_possible_cpu(cpu) {
   558			struct stmmac_pcpu_stats *stats, snapshot;
   559	
   560			data = pos;
   561			stats = per_cpu_ptr(priv->xstats.pstats, cpu);
   562			do {
   563				snapshot = *stats;
 > 564			} while (u64_stats_fetch_retry(&stats->syncp, start));
   565	
   566			for (q = 0; q < tx_cnt; q++) {
   567				p = (char *)&snapshot + offsetof(struct stmmac_pcpu_stats,
   568							    txq_stats[q].tx_pkt_n);
   569				for (stat = 0; stat < STMMAC_TXQ_STATS; stat++) {
   570					*data++ = (*(u64 *)p);
   571					p += sizeof(u64);
   572				}
   573			}
   574			for (q = 0; q < rx_cnt; q++) {
   575				p = (char *)&snapshot + offsetof(struct stmmac_pcpu_stats,
   576							    rxq_stats[q].rx_pkt_n);
   577				for (stat = 0; stat < STMMAC_RXQ_STATS; stat++) {
   578					*data++ = (*(u64 *)p);
   579					p += sizeof(u64);
   580				}
   581			}
   582		}
   583	}
   584
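
The warning above comes from the snapshot loop calling u64_stats_fetch_retry()
with a 'start' value that was never initialized. As a sketch of one possible
fix (not taken from this posting), the read side would pair the retry with a
u64_stats_fetch_begin(), as the stmmac_get_ethtool_stats() hunk later in the
patch already does:

        do {
                start = u64_stats_fetch_begin(&stats->syncp);
                snapshot = *stats;
        } while (u64_stats_fetch_retry(&stats->syncp, start));
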
kernel test robot June 14, 2023, 10:16 p.m. UTC | #2
Hi Jisheng,

kernel test robot noticed the following build errors:

[auto build test ERROR on sunxi/sunxi/for-next]
[also build test ERROR on linus/master v6.4-rc6]
[cannot apply to next-20230614]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jisheng-Zhang/net-stmmac-don-t-clear-network-statistics-in-ndo_open/20230615-003137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git sunxi/for-next
patch link:    https://lore.kernel.org/r/20230614161847.4071-4-jszhang%40kernel.org
patch subject: [PATCH 3/3] net: stmmac: use pcpu statistics where necessary
config: riscv-randconfig-r006-20230612 (https://download.01.org/0day-ci/archive/20230615/202306150658.XLO1cHJU-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build):
        mkdir -p ~/bin
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        git remote add sunxi https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git
        git fetch sunxi sunxi/for-next
        git checkout sunxi/sunxi/for-next
        b4 shazam https://lore.kernel.org/r/20230614161847.4071-4-jszhang@kernel.org
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang ~/bin/make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash drivers/net/ethernet/stmicro/stmmac/

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306150658.XLO1cHJU-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:564:49: warning: variable 'start' is uninitialized when used here [-Wuninitialized]
     564 |                 } while (u64_stats_fetch_retry(&stats->syncp, start));
         |                                                               ^~~~~
   drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:551:20: note: initialize the variable 'start' to silence this warning
     551 |         unsigned int start;
         |                           ^
         |                            = 0
   1 warning generated.
--
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:7243:13: error: no member named 'xstas' in 'struct stmmac_priv'; did you mean 'xstats'?
    7243 |         if (!priv->xstas.pstats)
         |                    ^~~~~
         |                    xstats
   drivers/net/ethernet/stmicro/stmmac/stmmac.h:247:28: note: 'xstats' declared here
     247 |         struct stmmac_extra_stats xstats ____cacheline_aligned_in_smp;
         |                                   ^
   1 error generated.


vim +7243 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

  7211	
  7212	/**
  7213	 * stmmac_dvr_probe
  7214	 * @device: device pointer
  7215	 * @plat_dat: platform data pointer
  7216	 * @res: stmmac resource pointer
  7217	 * Description: this is the main probe function used to
  7218	 * call the alloc_etherdev, allocate the priv structure.
  7219	 * Return:
  7220	 * returns 0 on success, otherwise errno.
  7221	 */
  7222	int stmmac_dvr_probe(struct device *device,
  7223			     struct plat_stmmacenet_data *plat_dat,
  7224			     struct stmmac_resources *res)
  7225	{
  7226		struct net_device *ndev = NULL;
  7227		struct stmmac_priv *priv;
  7228		u32 rxq;
  7229		int i, ret = 0;
  7230	
  7231		ndev = devm_alloc_etherdev_mqs(device, sizeof(struct stmmac_priv),
  7232					       MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES);
  7233		if (!ndev)
  7234			return -ENOMEM;
  7235	
  7236		SET_NETDEV_DEV(ndev, device);
  7237	
  7238		priv = netdev_priv(ndev);
  7239		priv->device = device;
  7240		priv->dev = ndev;
  7241	
  7242		priv->xstats.pstats = devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
> 7243		if (!priv->xstas.pstats)
  7244			return -ENOMEM;
  7245	
  7246		stmmac_set_ethtool_ops(ndev);
  7247		priv->pause = pause;
  7248		priv->plat = plat_dat;
  7249		priv->ioaddr = res->addr;
  7250		priv->dev->base_addr = (unsigned long)res->addr;
  7251		priv->plat->dma_cfg->multi_msi_en = priv->plat->multi_msi_en;
  7252	
  7253		priv->dev->irq = res->irq;
  7254		priv->wol_irq = res->wol_irq;
  7255		priv->lpi_irq = res->lpi_irq;
  7256		priv->sfty_ce_irq = res->sfty_ce_irq;
  7257		priv->sfty_ue_irq = res->sfty_ue_irq;
  7258		for (i = 0; i < MTL_MAX_RX_QUEUES; i++)
  7259			priv->rx_irq[i] = res->rx_irq[i];
  7260		for (i = 0; i < MTL_MAX_TX_QUEUES; i++)
  7261			priv->tx_irq[i] = res->tx_irq[i];
  7262	
  7263		if (!is_zero_ether_addr(res->mac))
  7264			eth_hw_addr_set(priv->dev, res->mac);
  7265	
  7266		dev_set_drvdata(device, priv->dev);
  7267	
  7268		/* Verify driver arguments */
  7269		stmmac_verify_args();
  7270	
  7271		priv->af_xdp_zc_qps = bitmap_zalloc(MTL_MAX_TX_QUEUES, GFP_KERNEL);
  7272		if (!priv->af_xdp_zc_qps)
  7273			return -ENOMEM;
  7274	
  7275		/* Allocate workqueue */
  7276		priv->wq = create_singlethread_workqueue("stmmac_wq");
  7277		if (!priv->wq) {
  7278			dev_err(priv->device, "failed to create workqueue\n");
  7279			ret = -ENOMEM;
  7280			goto error_wq_init;
  7281		}
  7282	
  7283		INIT_WORK(&priv->service_task, stmmac_service_task);
  7284	
  7285		/* Initialize Link Partner FPE workqueue */
  7286		INIT_WORK(&priv->fpe_task, stmmac_fpe_lp_task);
  7287	
  7288		/* Override with kernel parameters if supplied XXX CRS XXX
  7289		 * this needs to have multiple instances
  7290		 */
  7291		if ((phyaddr >= 0) && (phyaddr <= 31))
  7292			priv->plat->phy_addr = phyaddr;
  7293	
  7294		if (priv->plat->stmmac_rst) {
  7295			ret = reset_control_assert(priv->plat->stmmac_rst);
  7296			reset_control_deassert(priv->plat->stmmac_rst);
  7297			/* Some reset controllers have only reset callback instead of
  7298			 * assert + deassert callbacks pair.
  7299			 */
  7300			if (ret == -ENOTSUPP)
  7301				reset_control_reset(priv->plat->stmmac_rst);
  7302		}
  7303	
  7304		ret = reset_control_deassert(priv->plat->stmmac_ahb_rst);
  7305		if (ret == -ENOTSUPP)
  7306			dev_err(priv->device, "unable to bring out of ahb reset: %pe\n",
  7307				ERR_PTR(ret));
  7308	
  7309		/* Init MAC and get the capabilities */
  7310		ret = stmmac_hw_init(priv);
  7311		if (ret)
  7312			goto error_hw_init;
  7313	
  7314		/* Only DWMAC core version 5.20 onwards supports HW descriptor prefetch.
  7315		 */
  7316		if (priv->synopsys_id < DWMAC_CORE_5_20)
  7317			priv->plat->dma_cfg->dche = false;
  7318	
  7319		stmmac_check_ether_addr(priv);
  7320	
  7321		ndev->netdev_ops = &stmmac_netdev_ops;
  7322	
  7323		ndev->xdp_metadata_ops = &stmmac_xdp_metadata_ops;
  7324	
  7325		ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
  7326				    NETIF_F_RXCSUM;
  7327		ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
  7328				     NETDEV_XDP_ACT_XSK_ZEROCOPY |
  7329				     NETDEV_XDP_ACT_NDO_XMIT;
  7330	
  7331		ret = stmmac_tc_init(priv, priv);
  7332		if (!ret) {
  7333			ndev->hw_features |= NETIF_F_HW_TC;
  7334		}
  7335	
  7336		if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
  7337			ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
  7338			if (priv->plat->has_gmac4)
  7339				ndev->hw_features |= NETIF_F_GSO_UDP_L4;
  7340			priv->tso = true;
  7341			dev_info(priv->device, "TSO feature enabled\n");
  7342		}
  7343	
  7344		if (priv->dma_cap.sphen && !priv->plat->sph_disable) {
  7345			ndev->hw_features |= NETIF_F_GRO;
  7346			priv->sph_cap = true;
  7347			priv->sph = priv->sph_cap;
  7348			dev_info(priv->device, "SPH feature enabled\n");
  7349		}
  7350	
  7351		/* Ideally our host DMA address width is the same as for the
  7352		 * device. However, it may differ and then we have to use our
  7353		 * host DMA width for allocation and the device DMA width for
  7354		 * register handling.
  7355		 */
  7356		if (priv->plat->host_dma_width)
  7357			priv->dma_cap.host_dma_width = priv->plat->host_dma_width;
  7358		else
  7359			priv->dma_cap.host_dma_width = priv->dma_cap.addr64;
  7360	
  7361		if (priv->dma_cap.host_dma_width) {
  7362			ret = dma_set_mask_and_coherent(device,
  7363					DMA_BIT_MASK(priv->dma_cap.host_dma_width));
  7364			if (!ret) {
  7365				dev_info(priv->device, "Using %d/%d bits DMA host/device width\n",
  7366					 priv->dma_cap.host_dma_width, priv->dma_cap.addr64);
  7367	
  7368				/*
  7369				 * If more than 32 bits can be addressed, make sure to
  7370				 * enable enhanced addressing mode.
  7371				 */
  7372				if (IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT))
  7373					priv->plat->dma_cfg->eame = true;
  7374			} else {
  7375				ret = dma_set_mask_and_coherent(device, DMA_BIT_MASK(32));
  7376				if (ret) {
  7377					dev_err(priv->device, "Failed to set DMA Mask\n");
  7378					goto error_hw_init;
  7379				}
  7380	
  7381				priv->dma_cap.host_dma_width = 32;
  7382			}
  7383		}
  7384	
  7385		ndev->features |= ndev->hw_features | NETIF_F_HIGHDMA;
  7386		ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
  7387	#ifdef STMMAC_VLAN_TAG_USED
  7388		/* Both mac100 and gmac support receive VLAN tag detection */
  7389		ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
  7390		if (priv->dma_cap.vlhash) {
  7391			ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
  7392			ndev->features |= NETIF_F_HW_VLAN_STAG_FILTER;
  7393		}
  7394		if (priv->dma_cap.vlins) {
  7395			ndev->features |= NETIF_F_HW_VLAN_CTAG_TX;
  7396			if (priv->dma_cap.dvlan)
  7397				ndev->features |= NETIF_F_HW_VLAN_STAG_TX;
  7398		}
  7399	#endif
  7400		priv->msg_enable = netif_msg_init(debug, default_msg_level);
  7401	
  7402		priv->xstats.threshold = tc;
  7403	
  7404		/* Initialize RSS */
  7405		rxq = priv->plat->rx_queues_to_use;
  7406		netdev_rss_key_fill(priv->rss.key, sizeof(priv->rss.key));
  7407		for (i = 0; i < ARRAY_SIZE(priv->rss.table); i++)
  7408			priv->rss.table[i] = ethtool_rxfh_indir_default(i, rxq);
  7409	
  7410		if (priv->dma_cap.rssen && priv->plat->rss_en)
  7411			ndev->features |= NETIF_F_RXHASH;
  7412	
  7413		ndev->vlan_features |= ndev->features;
  7414		/* TSO doesn't work on VLANs yet */
  7415		ndev->vlan_features &= ~NETIF_F_TSO;
  7416	
  7417		/* MTU range: 46 - hw-specific max */
  7418		ndev->min_mtu = ETH_ZLEN - ETH_HLEN;
  7419		if (priv->plat->has_xgmac)
  7420			ndev->max_mtu = XGMAC_JUMBO_LEN;
  7421		else if ((priv->plat->enh_desc) || (priv->synopsys_id >= DWMAC_CORE_4_00))
  7422			ndev->max_mtu = JUMBO_LEN;
  7423		else
  7424			ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
  7425		/* Will not overwrite ndev->max_mtu if plat->maxmtu > ndev->max_mtu
  7426		 * as well as plat->maxmtu < ndev->min_mtu which is a invalid range.
  7427		 */
  7428		if ((priv->plat->maxmtu < ndev->max_mtu) &&
  7429		    (priv->plat->maxmtu >= ndev->min_mtu))
  7430			ndev->max_mtu = priv->plat->maxmtu;
  7431		else if (priv->plat->maxmtu < ndev->min_mtu)
  7432			dev_warn(priv->device,
  7433				 "%s: warning: maxmtu having invalid value (%d)\n",
  7434				 __func__, priv->plat->maxmtu);
  7435	
  7436		if (flow_ctrl)
  7437			priv->flow_ctrl = FLOW_AUTO;	/* RX/TX pause on */
  7438	
  7439		ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
  7440	
  7441		/* Setup channels NAPI */
  7442		stmmac_napi_add(ndev);
  7443	
  7444		mutex_init(&priv->lock);
  7445	
  7446		/* If a specific clk_csr value is passed from the platform
  7447		 * this means that the CSR Clock Range selection cannot be
  7448		 * changed at run-time and it is fixed. Viceversa the driver'll try to
  7449		 * set the MDC clock dynamically according to the csr actual
  7450		 * clock input.
  7451		 */
  7452		if (priv->plat->clk_csr >= 0)
  7453			priv->clk_csr = priv->plat->clk_csr;
  7454		else
  7455			stmmac_clk_csr_set(priv);
  7456	
  7457		stmmac_check_pcs_mode(priv);
  7458	
  7459		pm_runtime_get_noresume(device);
  7460		pm_runtime_set_active(device);
  7461		if (!pm_runtime_enabled(device))
  7462			pm_runtime_enable(device);
  7463	
  7464		if (priv->hw->pcs != STMMAC_PCS_TBI &&
  7465		    priv->hw->pcs != STMMAC_PCS_RTBI) {
  7466			/* MDIO bus Registration */
  7467			ret = stmmac_mdio_register(ndev);
  7468			if (ret < 0) {
  7469				dev_err_probe(priv->device, ret,
  7470					      "%s: MDIO bus (id: %d) registration failed\n",
  7471					      __func__, priv->plat->bus_id);
  7472				goto error_mdio_register;
  7473			}
  7474		}
  7475	
  7476		if (priv->plat->speed_mode_2500)
  7477			priv->plat->speed_mode_2500(ndev, priv->plat->bsp_priv);
  7478	
  7479		if (priv->plat->mdio_bus_data && priv->plat->mdio_bus_data->has_xpcs) {
  7480			ret = stmmac_xpcs_setup(priv->mii);
  7481			if (ret)
  7482				goto error_xpcs_setup;
  7483		}
  7484	
  7485		ret = stmmac_phy_setup(priv);
  7486		if (ret) {
  7487			netdev_err(ndev, "failed to setup phy (%d)\n", ret);
  7488			goto error_phy_setup;
  7489		}
  7490	
  7491		ret = register_netdev(ndev);
  7492		if (ret) {
  7493			dev_err(priv->device, "%s: ERROR %i registering the device\n",
  7494				__func__, ret);
  7495			goto error_netdev_register;
  7496		}
  7497
kernel test robot June 15, 2023, 4:07 a.m. UTC | #3
Hi Jisheng,

kernel test robot noticed the following build errors:

[auto build test ERROR on sunxi/sunxi/for-next]
[also build test ERROR on linus/master v6.4-rc6]
[cannot apply to next-20230614]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jisheng-Zhang/net-stmmac-don-t-clear-network-statistics-in-ndo_open/20230615-003137
base:   https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git sunxi/for-next
patch link:    https://lore.kernel.org/r/20230614161847.4071-4-jszhang%40kernel.org
patch subject: [PATCH 3/3] net: stmmac: use pcpu statistics where necessary
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230615/202306151110.z8I0lY3U-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build):
        git remote add sunxi https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git
        git fetch sunxi sunxi/for-next
        git checkout sunxi/sunxi/for-next
        b4 shazam https://lore.kernel.org/r/20230614161847.4071-4-jszhang@kernel.org
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306151110.z8I0lY3U-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:7243:20: error: 'struct stmmac_priv' has no member named 'xstas'; did you mean 'xstats'?
    7243 |         if (!priv->xstas.pstats)
         |                    ^~~~~
         |                    xstats


vim +7243 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

  7211	
  7212	/**
  7213	 * stmmac_dvr_probe
  7214	 * @device: device pointer
  7215	 * @plat_dat: platform data pointer
  7216	 * @res: stmmac resource pointer
  7217	 * Description: this is the main probe function used to
  7218	 * call the alloc_etherdev, allocate the priv structure.
  7219	 * Return:
  7220	 * returns 0 on success, otherwise errno.
  7221	 */
  7222	int stmmac_dvr_probe(struct device *device,
  7223			     struct plat_stmmacenet_data *plat_dat,
  7224			     struct stmmac_resources *res)
  7225	{
  7226		struct net_device *ndev = NULL;
  7227		struct stmmac_priv *priv;
  7228		u32 rxq;
  7229		int i, ret = 0;
  7230	
  7231		ndev = devm_alloc_etherdev_mqs(device, sizeof(struct stmmac_priv),
  7232					       MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES);
  7233		if (!ndev)
  7234			return -ENOMEM;
  7235	
  7236		SET_NETDEV_DEV(ndev, device);
  7237	
  7238		priv = netdev_priv(ndev);
  7239		priv->device = device;
  7240		priv->dev = ndev;
  7241	
  7242		priv->xstats.pstats = devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
> 7243		if (!priv->xstas.pstats)
  7244			return -ENOMEM;
  7245	
  7246		stmmac_set_ethtool_ops(ndev);
  7247		priv->pause = pause;
  7248		priv->plat = plat_dat;
  7249		priv->ioaddr = res->addr;
  7250		priv->dev->base_addr = (unsigned long)res->addr;
  7251		priv->plat->dma_cfg->multi_msi_en = priv->plat->multi_msi_en;
  7252	
  7253		priv->dev->irq = res->irq;
  7254		priv->wol_irq = res->wol_irq;
  7255		priv->lpi_irq = res->lpi_irq;
  7256		priv->sfty_ce_irq = res->sfty_ce_irq;
  7257		priv->sfty_ue_irq = res->sfty_ue_irq;
  7258		for (i = 0; i < MTL_MAX_RX_QUEUES; i++)
  7259			priv->rx_irq[i] = res->rx_irq[i];
  7260		for (i = 0; i < MTL_MAX_TX_QUEUES; i++)
  7261			priv->tx_irq[i] = res->tx_irq[i];
  7262	
  7263		if (!is_zero_ether_addr(res->mac))
  7264			eth_hw_addr_set(priv->dev, res->mac);
  7265	
  7266		dev_set_drvdata(device, priv->dev);
  7267	
  7268		/* Verify driver arguments */
  7269		stmmac_verify_args();
  7270	
  7271		priv->af_xdp_zc_qps = bitmap_zalloc(MTL_MAX_TX_QUEUES, GFP_KERNEL);
  7272		if (!priv->af_xdp_zc_qps)
  7273			return -ENOMEM;
  7274	
  7275		/* Allocate workqueue */
  7276		priv->wq = create_singlethread_workqueue("stmmac_wq");
  7277		if (!priv->wq) {
  7278			dev_err(priv->device, "failed to create workqueue\n");
  7279			ret = -ENOMEM;
  7280			goto error_wq_init;
  7281		}
  7282	
  7283		INIT_WORK(&priv->service_task, stmmac_service_task);
  7284	
  7285		/* Initialize Link Partner FPE workqueue */
  7286		INIT_WORK(&priv->fpe_task, stmmac_fpe_lp_task);
  7287	
  7288		/* Override with kernel parameters if supplied XXX CRS XXX
  7289		 * this needs to have multiple instances
  7290		 */
  7291		if ((phyaddr >= 0) && (phyaddr <= 31))
  7292			priv->plat->phy_addr = phyaddr;
  7293	
  7294		if (priv->plat->stmmac_rst) {
  7295			ret = reset_control_assert(priv->plat->stmmac_rst);
  7296			reset_control_deassert(priv->plat->stmmac_rst);
  7297			/* Some reset controllers have only reset callback instead of
  7298			 * assert + deassert callbacks pair.
  7299			 */
  7300			if (ret == -ENOTSUPP)
  7301				reset_control_reset(priv->plat->stmmac_rst);
  7302		}
  7303	
  7304		ret = reset_control_deassert(priv->plat->stmmac_ahb_rst);
  7305		if (ret == -ENOTSUPP)
  7306			dev_err(priv->device, "unable to bring out of ahb reset: %pe\n",
  7307				ERR_PTR(ret));
  7308	
  7309		/* Init MAC and get the capabilities */
  7310		ret = stmmac_hw_init(priv);
  7311		if (ret)
  7312			goto error_hw_init;
  7313	
  7314		/* Only DWMAC core version 5.20 onwards supports HW descriptor prefetch.
  7315		 */
  7316		if (priv->synopsys_id < DWMAC_CORE_5_20)
  7317			priv->plat->dma_cfg->dche = false;
  7318	
  7319		stmmac_check_ether_addr(priv);
  7320	
  7321		ndev->netdev_ops = &stmmac_netdev_ops;
  7322	
  7323		ndev->xdp_metadata_ops = &stmmac_xdp_metadata_ops;
  7324	
  7325		ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
  7326				    NETIF_F_RXCSUM;
  7327		ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
  7328				     NETDEV_XDP_ACT_XSK_ZEROCOPY |
  7329				     NETDEV_XDP_ACT_NDO_XMIT;
  7330	
  7331		ret = stmmac_tc_init(priv, priv);
  7332		if (!ret) {
  7333			ndev->hw_features |= NETIF_F_HW_TC;
  7334		}
  7335	
  7336		if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
  7337			ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
  7338			if (priv->plat->has_gmac4)
  7339				ndev->hw_features |= NETIF_F_GSO_UDP_L4;
  7340			priv->tso = true;
  7341			dev_info(priv->device, "TSO feature enabled\n");
  7342		}
  7343	
  7344		if (priv->dma_cap.sphen && !priv->plat->sph_disable) {
  7345			ndev->hw_features |= NETIF_F_GRO;
  7346			priv->sph_cap = true;
  7347			priv->sph = priv->sph_cap;
  7348			dev_info(priv->device, "SPH feature enabled\n");
  7349		}
  7350	
  7351		/* Ideally our host DMA address width is the same as for the
  7352		 * device. However, it may differ and then we have to use our
  7353		 * host DMA width for allocation and the device DMA width for
  7354		 * register handling.
  7355		 */
  7356		if (priv->plat->host_dma_width)
  7357			priv->dma_cap.host_dma_width = priv->plat->host_dma_width;
  7358		else
  7359			priv->dma_cap.host_dma_width = priv->dma_cap.addr64;
  7360	
  7361		if (priv->dma_cap.host_dma_width) {
  7362			ret = dma_set_mask_and_coherent(device,
  7363					DMA_BIT_MASK(priv->dma_cap.host_dma_width));
  7364			if (!ret) {
  7365				dev_info(priv->device, "Using %d/%d bits DMA host/device width\n",
  7366					 priv->dma_cap.host_dma_width, priv->dma_cap.addr64);
  7367	
  7368				/*
  7369				 * If more than 32 bits can be addressed, make sure to
  7370				 * enable enhanced addressing mode.
  7371				 */
  7372				if (IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT))
  7373					priv->plat->dma_cfg->eame = true;
  7374			} else {
  7375				ret = dma_set_mask_and_coherent(device, DMA_BIT_MASK(32));
  7376				if (ret) {
  7377					dev_err(priv->device, "Failed to set DMA Mask\n");
  7378					goto error_hw_init;
  7379				}
  7380	
  7381				priv->dma_cap.host_dma_width = 32;
  7382			}
  7383		}
  7384	
  7385		ndev->features |= ndev->hw_features | NETIF_F_HIGHDMA;
  7386		ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
  7387	#ifdef STMMAC_VLAN_TAG_USED
  7388		/* Both mac100 and gmac support receive VLAN tag detection */
  7389		ndev->features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_STAG_RX;
  7390		if (priv->dma_cap.vlhash) {
  7391			ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
  7392			ndev->features |= NETIF_F_HW_VLAN_STAG_FILTER;
  7393		}
  7394		if (priv->dma_cap.vlins) {
  7395			ndev->features |= NETIF_F_HW_VLAN_CTAG_TX;
  7396			if (priv->dma_cap.dvlan)
  7397				ndev->features |= NETIF_F_HW_VLAN_STAG_TX;
  7398		}
  7399	#endif
  7400		priv->msg_enable = netif_msg_init(debug, default_msg_level);
  7401	
  7402		priv->xstats.threshold = tc;
  7403	
  7404		/* Initialize RSS */
  7405		rxq = priv->plat->rx_queues_to_use;
  7406		netdev_rss_key_fill(priv->rss.key, sizeof(priv->rss.key));
  7407		for (i = 0; i < ARRAY_SIZE(priv->rss.table); i++)
  7408			priv->rss.table[i] = ethtool_rxfh_indir_default(i, rxq);
  7409	
  7410		if (priv->dma_cap.rssen && priv->plat->rss_en)
  7411			ndev->features |= NETIF_F_RXHASH;
  7412	
  7413		ndev->vlan_features |= ndev->features;
  7414		/* TSO doesn't work on VLANs yet */
  7415		ndev->vlan_features &= ~NETIF_F_TSO;
  7416	
  7417		/* MTU range: 46 - hw-specific max */
  7418		ndev->min_mtu = ETH_ZLEN - ETH_HLEN;
  7419		if (priv->plat->has_xgmac)
  7420			ndev->max_mtu = XGMAC_JUMBO_LEN;
  7421		else if ((priv->plat->enh_desc) || (priv->synopsys_id >= DWMAC_CORE_4_00))
  7422			ndev->max_mtu = JUMBO_LEN;
  7423		else
  7424			ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
  7425		/* Will not overwrite ndev->max_mtu if plat->maxmtu > ndev->max_mtu
  7426		 * as well as plat->maxmtu < ndev->min_mtu which is a invalid range.
  7427		 */
  7428		if ((priv->plat->maxmtu < ndev->max_mtu) &&
  7429		    (priv->plat->maxmtu >= ndev->min_mtu))
  7430			ndev->max_mtu = priv->plat->maxmtu;
  7431		else if (priv->plat->maxmtu < ndev->min_mtu)
  7432			dev_warn(priv->device,
  7433				 "%s: warning: maxmtu having invalid value (%d)\n",
  7434				 __func__, priv->plat->maxmtu);
  7435	
  7436		if (flow_ctrl)
  7437			priv->flow_ctrl = FLOW_AUTO;	/* RX/TX pause on */
  7438	
  7439		ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
  7440	
  7441		/* Setup channels NAPI */
  7442		stmmac_napi_add(ndev);
  7443	
  7444		mutex_init(&priv->lock);
  7445	
  7446		/* If a specific clk_csr value is passed from the platform
  7447		 * this means that the CSR Clock Range selection cannot be
  7448		 * changed at run-time and it is fixed. Viceversa the driver'll try to
  7449		 * set the MDC clock dynamically according to the csr actual
  7450		 * clock input.
  7451		 */
  7452		if (priv->plat->clk_csr >= 0)
  7453			priv->clk_csr = priv->plat->clk_csr;
  7454		else
  7455			stmmac_clk_csr_set(priv);
  7456	
  7457		stmmac_check_pcs_mode(priv);
  7458	
  7459		pm_runtime_get_noresume(device);
  7460		pm_runtime_set_active(device);
  7461		if (!pm_runtime_enabled(device))
  7462			pm_runtime_enable(device);
  7463	
  7464		if (priv->hw->pcs != STMMAC_PCS_TBI &&
  7465		    priv->hw->pcs != STMMAC_PCS_RTBI) {
  7466			/* MDIO bus Registration */
  7467			ret = stmmac_mdio_register(ndev);
  7468			if (ret < 0) {
  7469				dev_err_probe(priv->device, ret,
  7470					      "%s: MDIO bus (id: %d) registration failed\n",
  7471					      __func__, priv->plat->bus_id);
  7472				goto error_mdio_register;
  7473			}
  7474		}
  7475	
  7476		if (priv->plat->speed_mode_2500)
  7477			priv->plat->speed_mode_2500(ndev, priv->plat->bsp_priv);
  7478	
  7479		if (priv->plat->mdio_bus_data && priv->plat->mdio_bus_data->has_xpcs) {
  7480			ret = stmmac_xpcs_setup(priv->mii);
  7481			if (ret)
  7482				goto error_xpcs_setup;
  7483		}
  7484	
  7485		ret = stmmac_phy_setup(priv);
  7486		if (ret) {
  7487			netdev_err(ndev, "failed to setup phy (%d)\n", ret);
  7488			goto error_phy_setup;
  7489		}
  7490	
  7491		ret = register_netdev(ndev);
  7492		if (ret) {
  7493			dev_err(priv->device, "%s: ERROR %i registering the device\n",
  7494				__func__, ret);
  7495			goto error_netdev_register;
  7496		}
  7497
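
Both robot reports point at the same one-character typo: the NULL check reads
'xstas' while the assignment on the line above writes 'xstats'. A sketch of the
presumable correction for a follow-up version (matching the compiler's own
fix-it hint, not code from this posting) would test the field that was actually
assigned:

        priv->xstats.pstats = devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
        if (!priv->xstats.pstats)
                return -ENOMEM;
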

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1cb8be45330d..ec212528b9df 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -59,20 +59,42 @@ 
 /* #define FRAME_FILTER_DEBUG */
 
 struct stmmac_txq_stats {
-	struct u64_stats_sync syncp;
 	u64 tx_pkt_n;
 	u64 tx_normal_irq_n;
 };
 
 struct stmmac_rxq_stats {
+	u64 rx_pkt_n;
+	u64 rx_normal_irq_n;
+};
+
+struct stmmac_pcpu_stats {
 	struct u64_stats_sync syncp;
+	/* per queue statistics */
+	struct stmmac_txq_stats txq_stats[MTL_MAX_TX_QUEUES];
+	struct stmmac_rxq_stats rxq_stats[MTL_MAX_RX_QUEUES];
+	/* device stats */
+	u64 rx_packets;
+	u64 rx_bytes;
+	u64 tx_packets;
+	u64 tx_bytes;
+	/* Tx/Rx IRQ Events */
+	u64 tx_pkt_n;
 	u64 rx_pkt_n;
+	u64 normal_irq_n;
 	u64 rx_normal_irq_n;
+	u64 napi_poll;
+	u64 tx_normal_irq_n;
+	u64 tx_clean;
+	u64 tx_set_ic_bit;
+	/* TSO */
+	u64 tx_tso_frames;
+	u64 tx_tso_nfrags;
 };
 
 /* Extra statistic and debug information exposed by ethtool */
 struct stmmac_extra_stats {
-	struct u64_stats_sync syncp ____cacheline_aligned;
+	struct stmmac_pcpu_stats __percpu *pstats;
 	/* Transmit errors */
 	unsigned long tx_underflow;
 	unsigned long tx_carrier;
@@ -117,14 +139,6 @@  struct stmmac_extra_stats {
 	/* Tx/Rx IRQ Events */
 	unsigned long rx_early_irq;
 	unsigned long threshold;
-	u64 tx_pkt_n;
-	u64 rx_pkt_n;
-	u64 normal_irq_n;
-	u64 rx_normal_irq_n;
-	u64 napi_poll;
-	u64 tx_normal_irq_n;
-	u64 tx_clean;
-	u64 tx_set_ic_bit;
 	unsigned long irq_receive_pmt_irq_n;
 	/* MMC info */
 	unsigned long mmc_tx_irq_n;
@@ -194,23 +208,12 @@  struct stmmac_extra_stats {
 	unsigned long mtl_rx_fifo_ctrl_active;
 	unsigned long mac_rx_frame_ctrl_fifo;
 	unsigned long mac_gmii_rx_proto_engine;
-	/* TSO */
-	u64 tx_tso_frames;
-	u64 tx_tso_nfrags;
 	/* EST */
 	unsigned long mtl_est_cgce;
 	unsigned long mtl_est_hlbs;
 	unsigned long mtl_est_hlbf;
 	unsigned long mtl_est_btre;
 	unsigned long mtl_est_btrlm;
-	/* per queue statistics */
-	struct stmmac_txq_stats txq_stats[MTL_MAX_TX_QUEUES];
-	struct stmmac_rxq_stats rxq_stats[MTL_MAX_RX_QUEUES];
-	/* device stats */
-	u64 rx_packets;
-	u64 rx_bytes;
-	u64 tx_packets;
-	u64 tx_bytes;
 	unsigned long rx_dropped;
 	unsigned long rx_errors;
 	unsigned long tx_dropped;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 1571ca0c6616..c0a689529883 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -440,6 +440,7 @@  static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
 				     struct stmmac_extra_stats *x, u32 chan,
 				     u32 dir)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	u32 v;
 	int ret = 0;
 
@@ -450,17 +451,17 @@  static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
 	else if (dir == DMA_DIR_TX)
 		v &= EMAC_INT_MSK_TX;
 
-	u64_stats_update_begin(&priv->xstats.syncp);
+	u64_stats_update_begin(&stats->syncp);
 	if (v & EMAC_TX_INT) {
 		ret |= handle_tx;
-		x->tx_normal_irq_n++;
+		stats->tx_normal_irq_n++;
 	}
 
 	if (v & EMAC_RX_INT) {
 		ret |= handle_rx;
-		x->rx_normal_irq_n++;
+		stats->rx_normal_irq_n++;
 	}
-	u64_stats_update_end(&priv->xstats.syncp);
+	u64_stats_update_end(&stats->syncp);
 
 	if (v & EMAC_TX_DMA_STOP_INT)
 		x->tx_process_stopped_irq++;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
index eda4859fa468..bd5fecb101af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
@@ -168,6 +168,7 @@  void dwmac410_disable_dma_irq(struct stmmac_priv *priv, void __iomem *ioaddr,
 int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
 			 struct stmmac_extra_stats *x, u32 chan, u32 dir)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	const struct dwmac4_addrs *dwmac4_addrs = priv->plat->dwmac4_addrs;
 	u32 intr_status = readl(ioaddr + DMA_CHAN_STATUS(dwmac4_addrs, chan));
 	u32 intr_en = readl(ioaddr + DMA_CHAN_INTR_ENA(dwmac4_addrs, chan));
@@ -198,27 +199,23 @@  int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
 		}
 	}
 	/* TX/RX NORMAL interrupts */
-	u64_stats_update_begin(&priv->xstats.syncp);
+	u64_stats_update_begin(&stats->syncp);
 	if (likely(intr_status & DMA_CHAN_STATUS_NIS))
-		x->normal_irq_n++;
+		stats->normal_irq_n++;
 	if (likely(intr_status & DMA_CHAN_STATUS_RI))
-		x->rx_normal_irq_n++;
+		stats->rx_normal_irq_n++;
 	if (likely(intr_status & DMA_CHAN_STATUS_TI))
-		x->tx_normal_irq_n++;
-	u64_stats_update_end(&priv->xstats.syncp);
+		stats->tx_normal_irq_n++;
 
 	if (likely(intr_status & DMA_CHAN_STATUS_RI)) {
-		u64_stats_update_begin(&priv->xstats.rxq_stats[chan].syncp);
-		x->rxq_stats[chan].rx_normal_irq_n++;
-		u64_stats_update_end(&priv->xstats.rxq_stats[chan].syncp);
+		stats->rxq_stats[chan].rx_normal_irq_n++;
 		ret |= handle_rx;
 	}
 	if (likely(intr_status & DMA_CHAN_STATUS_TI)) {
-		u64_stats_update_begin(&priv->xstats.txq_stats[chan].syncp);
-		x->txq_stats[chan].tx_normal_irq_n++;
-		u64_stats_update_end(&priv->xstats.txq_stats[chan].syncp);
+		stats->txq_stats[chan].tx_normal_irq_n++;
 		ret |= handle_tx;
 	}
+	u64_stats_update_end(&stats->syncp);
 
 	if (unlikely(intr_status & DMA_CHAN_STATUS_TBU))
 		ret |= handle_tx;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
index 4cef67571d5a..bb938b334313 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
@@ -162,6 +162,7 @@  static void show_rx_process_state(unsigned int status)
 int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
 			struct stmmac_extra_stats *x, u32 chan, u32 dir)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	int ret = 0;
 	/* read the status register (CSR5) */
 	u32 intr_status = readl(ioaddr + DMA_STATUS);
@@ -209,21 +210,21 @@  int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
 	}
 	/* TX/RX NORMAL interrupts */
 	if (likely(intr_status & DMA_STATUS_NIS)) {
-		u64_stats_update_begin(&priv->xstats.syncp);
-		x->normal_irq_n++;
+		u64_stats_update_begin(&stats->syncp);
+		stats->normal_irq_n++;
 		if (likely(intr_status & DMA_STATUS_RI)) {
 			u32 value = readl(ioaddr + DMA_INTR_ENA);
 			/* to schedule NAPI on real RIE event. */
 			if (likely(value & DMA_INTR_ENA_RIE)) {
-				x->rx_normal_irq_n++;
+				stats->rx_normal_irq_n++;
 				ret |= handle_rx;
 			}
 		}
 		if (likely(intr_status & DMA_STATUS_TI)) {
-			x->tx_normal_irq_n++;
+			stats->tx_normal_irq_n++;
 			ret |= handle_tx;
 		}
-		u64_stats_update_end(&priv->xstats.syncp);
+		u64_stats_update_end(&stats->syncp);
 		if (unlikely(intr_status & DMA_STATUS_ERI))
 			x->rx_early_irq++;
 	}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
index 5997aa0c9b55..052852aeb12d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
@@ -337,6 +337,7 @@  static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
 				  struct stmmac_extra_stats *x, u32 chan,
 				  u32 dir)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	u32 intr_status = readl(ioaddr + XGMAC_DMA_CH_STATUS(chan));
 	u32 intr_en = readl(ioaddr + XGMAC_DMA_CH_INT_EN(chan));
 	int ret = 0;
@@ -364,18 +365,18 @@  static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
 
 	/* TX/RX NORMAL interrupts */
 	if (likely(intr_status & XGMAC_NIS)) {
-		u64_stats_update_begin(&priv->xstats.syncp);
-		x->normal_irq_n++;
+		u64_stats_update_begin(&stats->syncp);
+		stats->normal_irq_n++;
 
 		if (likely(intr_status & XGMAC_RI)) {
-			x->rx_normal_irq_n++;
+			stats->rx_normal_irq_n++;
 			ret |= handle_rx;
 		}
 		if (likely(intr_status & (XGMAC_TI | XGMAC_TBU))) {
-			x->tx_normal_irq_n++;
+			stats->tx_normal_irq_n++;
 			ret |= handle_tx;
 		}
-		u64_stats_update_end(&priv->xstats.syncp);
+		u64_stats_update_end(&stats->syncp);
 	}
 
 	/* Clear interrupts */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index f9cca2562d60..2f56d0ab3d27 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -164,21 +164,29 @@  static const struct stmmac_stats stmmac_gstrings_stats[] = {
 };
 #define STMMAC_STATS_LEN ARRAY_SIZE(stmmac_gstrings_stats)
 
-static const struct stmmac_stats stmmac_gstrings_stats64[] = {
+struct stmmac_ethtool_pcpu_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int stat_offset;
+};
+
+#define STMMAC_ETHTOOL_PCPU_STAT(m)	\
+	{ #m, offsetof(struct stmmac_pcpu_stats, m) }
+
+static const struct stmmac_ethtool_pcpu_stats stmmac_gstrings_pcpu_stats[] = {
 	/* Tx/Rx IRQ Events */
-	STMMAC_STAT(tx_pkt_n),
-	STMMAC_STAT(rx_pkt_n),
-	STMMAC_STAT(normal_irq_n),
-	STMMAC_STAT(rx_normal_irq_n),
-	STMMAC_STAT(napi_poll),
-	STMMAC_STAT(tx_normal_irq_n),
-	STMMAC_STAT(tx_clean),
-	STMMAC_STAT(tx_set_ic_bit),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_pkt_n),
+	STMMAC_ETHTOOL_PCPU_STAT(rx_pkt_n),
+	STMMAC_ETHTOOL_PCPU_STAT(normal_irq_n),
+	STMMAC_ETHTOOL_PCPU_STAT(rx_normal_irq_n),
+	STMMAC_ETHTOOL_PCPU_STAT(napi_poll),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_normal_irq_n),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_clean),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_set_ic_bit),
 	/* TSO */
-	STMMAC_STAT(tx_tso_frames),
-	STMMAC_STAT(tx_tso_nfrags),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_tso_frames),
+	STMMAC_ETHTOOL_PCPU_STAT(tx_tso_nfrags),
 };
-#define STMMAC_STATS64_LEN ARRAY_SIZE(stmmac_gstrings_stats64)
+#define STMMAC_PCPU_STATS_LEN ARRAY_SIZE(stmmac_gstrings_pcpu_stats)
 
 /* HW MAC Management counters (if supported) */
 #define STMMAC_MMC_STAT(m)	\
@@ -541,30 +549,36 @@  static void stmmac_get_per_qstats(struct stmmac_priv *priv, u64 *data)
 	u32 tx_cnt = priv->plat->tx_queues_to_use;
 	u32 rx_cnt = priv->plat->rx_queues_to_use;
 	unsigned int start;
-	int q, stat;
+	int q, stat, cpu;
 	char *p;
+	u64 *pos;
 
-	for (q = 0; q < tx_cnt; q++) {
+	pos = data;
+	for_each_possible_cpu(cpu) {
+		struct stmmac_pcpu_stats *stats, snapshot;
+
+		data = pos;
+		stats = per_cpu_ptr(priv->xstats.pstats, cpu);
 		do {
-			start = u64_stats_fetch_begin(&priv->xstats.txq_stats[q].syncp);
-			p = (char *)priv + offsetof(struct stmmac_priv,
-						    xstats.txq_stats[q].tx_pkt_n);
+			snapshot = *stats;
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+
+		for (q = 0; q < tx_cnt; q++) {
+			p = (char *)&snapshot + offsetof(struct stmmac_pcpu_stats,
+						    txq_stats[q].tx_pkt_n);
 			for (stat = 0; stat < STMMAC_TXQ_STATS; stat++) {
 				*data++ = (*(u64 *)p);
 				p += sizeof(u64);
 			}
-		} while (u64_stats_fetch_retry(&priv->xstats.txq_stats[q].syncp, start));
-	}
-	for (q = 0; q < rx_cnt; q++) {
-		do {
-			start = u64_stats_fetch_begin(&priv->xstats.rxq_stats[q].syncp);
-			p = (char *)priv + offsetof(struct stmmac_priv,
-						    xstats.rxq_stats[q].rx_pkt_n);
+		}
+		for (q = 0; q < rx_cnt; q++) {
+			p = (char *)&snapshot + offsetof(struct stmmac_pcpu_stats,
+						    rxq_stats[q].rx_pkt_n);
 			for (stat = 0; stat < STMMAC_RXQ_STATS; stat++) {
 				*data++ = (*(u64 *)p);
 				p += sizeof(u64);
 			}
-		} while (u64_stats_fetch_retry(&priv->xstats.rxq_stats[q].syncp, start));
+		}
 	}
 }
 
@@ -576,7 +590,7 @@  static void stmmac_get_ethtool_stats(struct net_device *dev,
 	u32 tx_queues_count = priv->plat->tx_queues_to_use;
 	unsigned long count;
 	unsigned int start;
-	int i, j = 0, ret;
+	int i, j = 0, pos, ret, cpu;
 
 	if (priv->dma_cap.asp) {
 		for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
@@ -618,13 +632,22 @@  static void stmmac_get_ethtool_stats(struct net_device *dev,
 		data[j++] = (stmmac_gstrings_stats[i].sizeof_stat ==
 			     sizeof(u64)) ? (*(u64 *)p) : (*(u32 *)p);
 	}
-	do {
-		start = u64_stats_fetch_begin(&priv->xstats.syncp);
-		for (i = 0; i < STMMAC_STATS64_LEN; i++) {
-			char *p = (char *)priv + stmmac_gstrings_stats64[i].stat_offset;
-			data[j++] = *(u64 *)p;
+	pos = j;
+	for_each_possible_cpu(cpu) {
+		struct stmmac_pcpu_stats *stats, snapshot;
+
+		stats = per_cpu_ptr(priv->xstats.pstats, cpu);
+		j = pos;
+		do {
+			start = u64_stats_fetch_begin(&stats->syncp);
+			snapshot = *stats;
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+
+		for (i = 0; i < STMMAC_PCPU_STATS_LEN; i++) {
+			char *p = (char *)&snapshot + stmmac_gstrings_pcpu_stats[i].stat_offset;
+			data[j++] += *(u64 *)p;
 		}
-	} while (u64_stats_fetch_retry(&priv->xstats.syncp, start));
+	}
 	stmmac_get_per_qstats(priv, &data[j]);
 }
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 69cb2835fa82..4056ea859963 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2422,6 +2422,7 @@  static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	struct netdev_queue *nq = netdev_get_tx_queue(priv->dev, queue);
 	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
 	struct xsk_buff_pool *pool = tx_q->xsk_pool;
@@ -2502,9 +2503,9 @@  static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 		tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size);
 		entry = tx_q->cur_tx;
 	}
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.tx_set_ic_bit += tx_set_ic_bit;
-	u64_stats_update_end(&priv->xstats.syncp);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_set_ic_bit += tx_set_ic_bit;
+	u64_stats_update_end(&stats->syncp);
 
 	if (tx_desc) {
 		stmmac_flush_tx_descriptors(priv, queue);
@@ -2543,6 +2544,7 @@  static void stmmac_bump_dma_threshold(struct stmmac_priv *priv, u32 chan)
  */
 static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
 	unsigned int bytes_compl = 0, pkts_compl = 0;
 	unsigned int entry, xmits = 0, count = 0;
@@ -2704,15 +2706,12 @@  static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
 			      STMMAC_COAL_TIMER(priv->tx_coal_timer[queue]),
 			      HRTIMER_MODE_REL);
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.tx_packets += tx_packets;
-	priv->xstats.tx_pkt_n += tx_packets;
-	priv->xstats.tx_clean++;
-	u64_stats_update_end(&priv->xstats.syncp);
-
-	u64_stats_update_begin(&priv->xstats.txq_stats[queue].syncp);
-	priv->xstats.txq_stats[queue].tx_pkt_n += tx_packets;
-	u64_stats_update_end(&priv->xstats.txq_stats[queue].syncp);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_packets += tx_packets;
+	stats->tx_pkt_n += tx_packets;
+	stats->tx_clean++;
+	stats->txq_stats[queue].tx_pkt_n += tx_packets;
+	u64_stats_update_end(&stats->syncp);
 
 	priv->xstats.tx_errors += tx_errors;
 
@@ -4108,6 +4107,7 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	int nfrags = skb_shinfo(skb)->nr_frags;
 	u32 queue = skb_get_queue_mapping(skb);
 	unsigned int first_entry, tx_packets;
+	struct stmmac_pcpu_stats *stats;
 	int tmp_pay_len = 0, first_tx;
 	struct stmmac_tx_queue *tx_q;
 	bool has_vlan, set_ic;
@@ -4275,13 +4275,14 @@  static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
 	}
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.tx_bytes += skb->len;
-	priv->xstats.tx_tso_frames++;
-	priv->xstats.tx_tso_nfrags += nfrags;
+	stats = this_cpu_ptr(priv->xstats.pstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_bytes += skb->len;
+	stats->tx_tso_frames++;
+	stats->tx_tso_nfrags += nfrags;
 	if (set_ic)
-		priv->xstats.tx_set_ic_bit++;
-	u64_stats_update_end(&priv->xstats.syncp);
+		stats->tx_set_ic_bit++;
+	u64_stats_update_end(&stats->syncp);
 
 	if (priv->sarc_type)
 		stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4353,6 +4354,7 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	int nfrags = skb_shinfo(skb)->nr_frags;
 	int gso = skb_shinfo(skb)->gso_type;
 	struct dma_edesc *tbs_desc = NULL;
+	struct stmmac_pcpu_stats *stats;
 	struct dma_desc *desc, *first;
 	struct stmmac_tx_queue *tx_q;
 	bool has_vlan, set_ic;
@@ -4511,11 +4513,12 @@  static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
 	}
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.tx_bytes += skb->len;
+	stats = this_cpu_ptr(priv->xstats.pstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->tx_bytes += skb->len;
 	if (set_ic)
-		priv->xstats.tx_set_ic_bit++;
-	u64_stats_update_end(&priv->xstats.syncp);
+		stats->tx_set_ic_bit++;
+	u64_stats_update_end(&stats->syncp);
 
 	if (priv->sarc_type)
 		stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4722,6 +4725,7 @@  static unsigned int stmmac_rx_buf2_len(struct stmmac_priv *priv,
 static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,
 				struct xdp_frame *xdpf, bool dma_map)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
 	unsigned int entry = tx_q->cur_tx;
 	struct dma_desc *tx_desc;
@@ -4780,9 +4784,9 @@  static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,
 	if (set_ic) {
 		tx_q->tx_count_frames = 0;
 		stmmac_set_tx_ic(priv, tx_desc);
-		u64_stats_update_begin(&priv->xstats.syncp);
-		priv->xstats.tx_set_ic_bit++;
-		u64_stats_update_end(&priv->xstats.syncp);
+		u64_stats_update_begin(&stats->syncp);
+		stats->tx_set_ic_bit++;
+		u64_stats_update_end(&stats->syncp);
 	}
 
 	stmmac_enable_dma_transmission(priv, priv->ioaddr);
@@ -4927,6 +4931,7 @@  static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
 				   struct dma_desc *p, struct dma_desc *np,
 				   struct xdp_buff *xdp)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	struct stmmac_channel *ch = &priv->channel[queue];
 	unsigned int len = xdp->data_end - xdp->data;
 	enum pkt_hash_types hash_type;
@@ -4955,10 +4960,10 @@  static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
 	skb_record_rx_queue(skb, queue);
 	napi_gro_receive(&ch->rxtx_napi, skb);
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.rx_packets++;
-	priv->xstats.rx_bytes += len;
-	u64_stats_update_end(&priv->xstats.syncp);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += len;
+	u64_stats_update_end(&stats->syncp);
 }
 
 static bool stmmac_rx_refill_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -5031,6 +5036,7 @@  static struct stmmac_xdp_buff *xsk_buff_to_stmmac_ctx(struct xdp_buff *xdp)
 
 static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	struct stmmac_rx_queue *rx_q = &priv->dma_conf.rx_queue[queue];
 	unsigned int count = 0, error = 0, len = 0;
 	u32 rx_errors = 0, rx_dropped = 0;
@@ -5193,13 +5199,10 @@  static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
 
 	stmmac_finalize_xdp_rx(priv, xdp_status);
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.rx_pkt_n += count;
-	u64_stats_update_end(&priv->xstats.syncp);
-
-	u64_stats_update_begin(&priv->xstats.rxq_stats[queue].syncp);
-	priv->xstats.rxq_stats[queue].rx_pkt_n += count;
-	u64_stats_update_end(&priv->xstats.rxq_stats[queue].syncp);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_pkt_n += count;
+	stats->rxq_stats[queue].rx_pkt_n += count;
+	u64_stats_update_end(&stats->syncp);
 
 	priv->xstats.rx_dropped += rx_dropped;
 	priv->xstats.rx_errors += rx_errors;
@@ -5226,6 +5229,7 @@  static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
  */
 static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 {
+	struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pstats);
 	u32 rx_errors = 0, rx_dropped = 0, rx_bytes = 0, rx_packets = 0;
 	struct stmmac_rx_queue *rx_q = &priv->dma_conf.rx_queue[queue];
 	struct stmmac_channel *ch = &priv->channel[queue];
@@ -5487,15 +5491,12 @@  static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 
 	stmmac_rx_refill(priv, queue);
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.rx_packets += rx_packets;
-	priv->xstats.rx_bytes += rx_bytes;
-	priv->xstats.rx_pkt_n += count;
-	u64_stats_update_end(&priv->xstats.syncp);
-
-	u64_stats_update_begin(&priv->xstats.rxq_stats[queue].syncp);
-	priv->xstats.rxq_stats[queue].rx_pkt_n += count;
-	u64_stats_update_end(&priv->xstats.rxq_stats[queue].syncp);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets += rx_packets;
+	stats->rx_bytes += rx_bytes;
+	stats->rx_pkt_n += count;
+	stats->rxq_stats[queue].rx_pkt_n += count;
+	u64_stats_update_end(&stats->syncp);
 
 	priv->xstats.rx_dropped += rx_dropped;
 	priv->xstats.rx_errors += rx_errors;
@@ -5508,12 +5509,14 @@  static int stmmac_napi_poll_rx(struct napi_struct *napi, int budget)
 	struct stmmac_channel *ch =
 		container_of(napi, struct stmmac_channel, rx_napi);
 	struct stmmac_priv *priv = ch->priv_data;
+	struct stmmac_pcpu_stats *stats;
 	u32 chan = ch->index;
 	int work_done;
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.napi_poll++;
-	u64_stats_update_end(&priv->xstats.syncp);
+	stats = this_cpu_ptr(priv->xstats.pstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->napi_poll++;
+	u64_stats_update_end(&stats->syncp);
 
 	work_done = stmmac_rx(priv, budget, chan);
 	if (work_done < budget && napi_complete_done(napi, work_done)) {
@@ -5532,12 +5535,14 @@  static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
 	struct stmmac_channel *ch =
 		container_of(napi, struct stmmac_channel, tx_napi);
 	struct stmmac_priv *priv = ch->priv_data;
+	struct stmmac_pcpu_stats *stats;
 	u32 chan = ch->index;
 	int work_done;
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.napi_poll++;
-	u64_stats_update_end(&priv->xstats.syncp);
+	stats = this_cpu_ptr(priv->xstats.pstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->napi_poll++;
+	u64_stats_update_end(&stats->syncp);
 
 	work_done = stmmac_tx_clean(priv, budget, chan);
 	work_done = min(work_done, budget);
@@ -5558,12 +5563,14 @@  static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
 	struct stmmac_channel *ch =
 		container_of(napi, struct stmmac_channel, rxtx_napi);
 	struct stmmac_priv *priv = ch->priv_data;
+	struct stmmac_pcpu_stats *stats;
 	int rx_done, tx_done, rxtx_done;
 	u32 chan = ch->index;
 
-	u64_stats_update_begin(&priv->xstats.syncp);
-	priv->xstats.napi_poll++;
-	u64_stats_update_end(&priv->xstats.syncp);
+	stats = this_cpu_ptr(priv->xstats.pstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->napi_poll++;
+	u64_stats_update_end(&stats->syncp);
 
 	tx_done = stmmac_tx_clean(priv, budget, chan);
 	tx_done = min(tx_done, budget);
@@ -6823,23 +6830,30 @@  static void stmmac_get_stats64(struct net_device *dev, struct rtnl_link_stats64
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 	unsigned int start;
-	u64 rx_packets;
-	u64 rx_bytes;
-	u64 tx_packets;
-	u64 tx_bytes;
-
-	do {
-		start = u64_stats_fetch_begin(&priv->xstats.syncp);
-		rx_packets = priv->xstats.rx_packets;
-		rx_bytes   = priv->xstats.rx_bytes;
-		tx_packets = priv->xstats.tx_packets;
-		tx_bytes   = priv->xstats.tx_bytes;
-	} while (u64_stats_fetch_retry(&priv->xstats.syncp, start));
-
-	stats->rx_packets = rx_packets;
-	stats->rx_bytes = rx_bytes;
-	stats->tx_packets = tx_packets;
-	stats->tx_bytes = tx_bytes;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct stmmac_pcpu_stats *stats;
+		u64 rx_packets;
+		u64 rx_bytes;
+		u64 tx_packets;
+		u64 tx_bytes;
+
+		stats = per_cpu_ptr(priv->xstats.pstats, cpu);
+		do {
+			start = u64_stats_fetch_begin(&stats->syncp);
+			rx_packets = stats->rx_packets;
+			rx_bytes   = stats->rx_bytes;
+			tx_packets = stats->tx_packets;
+			tx_bytes   = stats->tx_bytes;
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+
+		stats->rx_packets += rx_packets;
+		stats->rx_bytes += rx_bytes;
+		stats->tx_packets += tx_packets;
+		stats->tx_bytes += tx_bytes;
+	}
+
 	stats->rx_dropped = priv->xstats.rx_dropped;
 	stats->rx_errors = priv->xstats.rx_errors;
 	stats->tx_dropped = priv->xstats.tx_dropped;
@@ -7225,6 +7239,10 @@  int stmmac_dvr_probe(struct device *device,
 	priv->device = device;
 	priv->dev = ndev;
 
+	priv->xstats.pstats = devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
+	if (!priv->xstas.pstats)
+		return -ENOMEM;
+
 	stmmac_set_ethtool_ops(ndev);
 	priv->pause = pause;
 	priv->plat = plat_dat;
@@ -7383,12 +7401,6 @@  int stmmac_dvr_probe(struct device *device,
 
 	priv->xstats.threshold = tc;
 
-	u64_stats_init(&priv->xstats.syncp);
-	for (i = 0; i < priv->plat->rx_queues_to_use; i++)
-		u64_stats_init(&priv->xstats.rxq_stats[i].syncp);
-	for (i = 0; i < priv->plat->tx_queues_to_use; i++)
-		u64_stats_init(&priv->xstats.txq_stats[i].syncp);
-
 	/* Initialize RSS */
 	rxq = priv->plat->rx_queues_to_use;
 	netdev_rss_key_fill(priv->rss.key, sizeof(priv->rss.key));