[net-next,0/2] page_pool: Convert stats to u64_stats_t.

Message ID 20250221115221.291006-1-bigeasy@linutronix.de (mailing list archive)

Message

Sebastian Andrzej Siewior Feb. 21, 2025, 11:52 a.m. UTC
This is a follow-up on
	https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/

to convert the page_pool statistics to u64_stats_t to avoid u64 related
problems on 32bit architectures.
While looking over it, the comment for recycle_stat_inc() says that it
is safe to use in preemptible context. On 32bit, the 64bit update is
split into two 32bit writes; if we get preempted in the middle and
another updater makes an update, the value becomes inconsistent and the
earlier update can overwrite the later one. (Rare but still).
I don't know if it is ensured that only *one* update can happen because
the stats are per-CPU and per NAPI device. But there will now be a
warning on 32bit if this is really attempted in preemptible context.
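The interleaving described above can be replayed deterministically in
userspace. This is only a sketch: struct split_u64, split_inc() and
replay_preempted_inc() are hypothetical names, not kernel code, and the
model assumes the increment is lowered to "load both halves, add, store
low half, store high half". Whether the kernel's per-CPU increment really
compiles to such a sequence on a given 32bit architecture is not shown
here.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU stores it: two separate words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t split_read(const struct split_u64 *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/* Uninterrupted increment: load both halves, add, store both halves. */
static void split_inc(struct split_u64 *c)
{
	uint64_t v = split_read(c) + 1;

	c->lo = (uint32_t)v;
	c->hi = (uint32_t)(v >> 32);
}

/*
 * Replay of the problematic schedule: updater A is preempted between
 * its two 32-bit stores, and updater B performs a complete increment
 * in that window.
 */
static uint64_t replay_preempted_inc(uint64_t start)
{
	struct split_u64 c = {
		.lo = (uint32_t)start,
		.hi = (uint32_t)(start >> 32),
	};
	uint64_t a_new = split_read(&c) + 1;	/* A: load and add */

	c.lo = (uint32_t)a_new;			/* A: first store */
	split_inc(&c);				/* B: runs while A is preempted */
	c.hi = (uint32_t)(a_new >> 32);		/* A: resumes, second store */

	return split_read(&c);
}
```

Starting from 0xFFFFFFFE, two increments should yield 0x100000000, but
the replay ends at 0 because A's stale high word overwrites B's carry.
Starting from 5 the result is 7, which is correct: the window only
matters when the low word wraps, hence "rare".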

Sebastian Andrzej Siewior (2):
  page_pool: Convert page_pool_recycle_stats to u64_stats_t.
  page_pool: Convert page_pool_alloc_stats to u64_stats_t.

 Documentation/networking/page_pool.rst        |  4 +-
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 24 ++---
 include/linux/u64_stats_sync.h                |  5 +
 include/net/page_pool/types.h                 | 27 +++---
 net/core/page_pool.c                          | 95 +++++++++++++------
 net/core/page_pool_user.c                     | 22 ++---
 6 files changed, 113 insertions(+), 64 deletions(-)

Comments

Joe Damato Feb. 21, 2025, 5:10 p.m. UTC | #1
On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> This is a follow-up on
> 	https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
> 
> to convert the page_pool statistics to u64_stats_t to avoid u64 related
> problems on 32bit architectures.
> While looking over it, the comment for recycle_stat_inc() says that it
> is safe to use in preemptible context.

I wrote that comment because it's an increment of a per-cpu counter.

The documentation in Documentation/core-api/this_cpu_ops.rst
explains in more depth, but this_cpu_inc is safe to use without
worrying about pre-emption and interrupts.

> On 32bit, the 64bit update is split into two 32bit writes; if we
> get preempted in the middle and another updater makes an update,
> the value becomes inconsistent and the earlier update can overwrite
> the later one. (Rare but still).

Have you seen this? Can you show the generated assembly which
suggests that this occurs? It would be helpful if you could show the
before and after 32-bit assembly code.

I am asking because in arch/x86/include/asm/percpu.h a lot of care
is taken to generate the correct assembly for various sizes and I
am skeptical that this_cpu_inc behaves correctly on 64bit but
incorrectly on 32bit x86. It's certainly possible, but IMHO, we
should be sure that this is the case.

If you could show that the generated assembly on 32bit was not
preempt/irq safe then probably we'd also want to update the
this_cpu_ops documentation?

> I don't know if it is ensured that only *one* update can happen because
> the stats are per-CPU and per NAPI device. But there will now be a
> warning on 32bit if this is really attempted in preemptible context.

Please see Documentation/core-api/this_cpu_ops.rst for a more
detailed explanation.

At a high level, only one per-cpu counter is incremented. The
individual per-cpu counters don't mean anything on their own
(because the increment could happen on any CPU); the sum of the
values is what has meaning.
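
As a rough userspace illustration of that last point (not kernel code;
struct pcpu_stat_model, NR_CPUS_MODEL and the helpers are hypothetical
names), a per-CPU statistic can be modelled as one slot per CPU, where
an increment touches whichever slot the caller happens to run on and a
reader only cares about the total:

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for the model */

/* One counter slot per CPU. */
struct pcpu_stat_model {
	uint64_t count[NR_CPUS_MODEL];
};

/* Increment the slot of the CPU the caller is (pretending to be) on. */
static void model_inc(struct pcpu_stat_model *s, unsigned int cpu)
{
	s->count[cpu % NR_CPUS_MODEL]++;
}

/* The reported statistic is the sum over all slots. */
static uint64_t model_sum(const struct pcpu_stat_model *s)
{
	uint64_t total = 0;

	for (size_t i = 0; i < NR_CPUS_MODEL; i++)
		total += s->count[i];

	return total;
}
```

If three events are counted, one on CPU 0 and two on CPU 3, the slots
read 1, 0, 0, 2 and the sum is 3. The same three events spread over
other CPUs would give different slots but the same sum.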