| Message ID | 20250106030225.3901305-1-0x1207@gmail.com (mailing list archive) |
|---|---|
| State | New |
| Delegated to: | Netdev Maintainers |
| Series | [net-next,v3] page_pool: check for dma_sync_size earlier |
On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
>
> Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
> already do this.
> We can save a couple of function calls if we check dma_sync_size earlier.
>
> This is a micro optimization: about a 0.6% PPS performance improvement
> has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
> traffic test.
>
> Before this patch:
> The average of packets per second is 234026 in one minute.
>
> After this patch:
> The average of packets per second is 235537 in one minute.

Sorry, I remain skeptical that this small improvement can be statistically
observed. What exact tool or benchmark are you using, I wonder?

Thanks,
Jason
On Mon, 6 Jan 2025 11:15:45 +0800, Jason Xing <kerneljasonxing@gmail.com> wrote:
> On Mon, Jan 6, 2025 at 11:02 AM Furong Xu <0x1207@gmail.com> wrote:
> >
> > Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
> > already do this.
> > We can save a couple of function calls if we check dma_sync_size earlier.
> >
> > This is a micro optimization: about a 0.6% PPS performance improvement
> > has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
> > traffic test.
> >
> > Before this patch:
> > The average of packets per second is 234026 in one minute.
> >
> > After this patch:
> > The average of packets per second is 235537 in one minute.
>
> Sorry, I remain skeptical that this small improvement can be statistically
> observed. What exact tool or benchmark are you using, I wonder?

An x86 PC sends out UDP packets, and the sar command from the Sysstat
package reports the PPS on the RX side:

sar -n DEV 60 1
```diff
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9733206d6406..9bb2d2300d0b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -458,7 +458,7 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
 				      netmem_ref netmem,
 				      u32 dma_sync_size)
 {
-	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
+	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev) && dma_sync_size)
 		__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
 }
```
Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
already do this. We can save a couple of function calls if we check
dma_sync_size earlier.

This is a micro optimization: about a 0.6% PPS performance improvement
has been observed on a single Cortex-A53 CPU core with a 64-byte UDP RX
traffic test.

Before this patch:
The average of packets per second is 234026 in one minute.

After this patch:
The average of packets per second is 235537 in one minute.

Signed-off-by: Furong Xu <0x1207@gmail.com>
---
V2 -> V3: Add more details about the measurement in the commit message
V2: https://lore.kernel.org/r/20250103082814.3850096-1-0x1207@gmail.com

V1 -> V2: Add measurement data about the performance improvement in the commit message
V1: https://lore.kernel.org/r/20241010114019.1734573-1-0x1207@gmail.com
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)