diff mbox series

[RFC,net-next/mm,V1,2/3] page_pool: Use static_key for shutdown phase

Message ID 168244294384.1741095.6037010854411310099.stgit@firesoul (mailing list archive)
State New
Headers show
Series page_pool: new approach for leak detection and shutdown phase | expand

Commit Message

Jesper Dangaard Brouer April 25, 2023, 5:15 p.m. UTC
Performance is very important for page pool (PP). This add the use of
static_key APIs for regaining a single instruction, which makes the
new PP shutdown scheme zero impact.

We are uncertain if this is 100% correct, because static_key APIs uses
a mutex lock and it is uncertain if all contexts that can return pages
can support this. We could spawn a workqueue (like we just removed) to
workaround this issue.

Seeking input if this is worth the complexity.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/page_pool.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Comments

Alexei Starovoitov April 25, 2023, 11:30 p.m. UTC | #1
On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
> Performance is very important for page pool (PP). This add the use of
> static_key APIs for regaining a single instruction, which makes the
> new PP shutdown scheme zero impact.
> 
> We are uncertain if this is 100% correct, because static_key APIs uses
> a mutex lock and it is uncertain if all contexts that can return pages
> can support this. We could spawn a workqueue (like we just removed) to
> workaround this issue.

With debug atomic sleep the issue should be trivial to see.
iirc the callers of xdp_flush_frame_bulk() need to do it under rcu_read_lock equivalent,
which is not sleepable and mutex-es should warn.
Jesper Dangaard Brouer April 26, 2023, 3:15 p.m. UTC | #2
On 26/04/2023 01.30, Alexei Starovoitov wrote:
> On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
>> Performance is very important for page pool (PP). This add the use of
>> static_key APIs for regaining a single instruction, which makes the
>> new PP shutdown scheme zero impact.
>>
>> We are uncertain if this is 100% correct, because static_key APIs uses
>> a mutex lock and it is uncertain if all contexts that can return pages
>> can support this. We could spawn a workqueue (like we just removed) to
>> workaround this issue.
> 
> With debug atomic sleep the issue should be trivial to see.
> iirc the callers of xdp_flush_frame_bulk() need to do it under rcu_read_lock equivalent,
> which is not sleepable and mutex-es should warn.
> 

Sure, adding CONFIG_DEBUG_ATOMIC_SLEEP and (keeping) CONFIG_DEBUG_MUTEXES.

Testing this with veth changes that added page_pool to veth
commit 0ebab78cbcbf ("net: veth: add page_pool for page recycling").
which makes it easier to create/trigger inflight packets
via softnet_data->defer_list.

It confirms my suspicion, this patch is not correct.

The dmesg warning looks like below.
(The XXX printk is just from my own debugging patch)

The call site net_rx_action+0xe7 is skb_defer_free_flush
  $ ./scripts/faddr2line vmlinux net_rx_action+0xe7
  net_rx_action+0xe7/0x360:
  skb_defer_free_flush at net/core/dev.c:6615
  (inlined by) skb_defer_free_flush at net/core/dev.c:6601
  (inlined by) net_rx_action at net/core/dev.c:6677


[ 1540.073256] XXX page_pool_shutdown_attempt() inflight=24
[ 1540.078654] XXX page_pool_destroy() Enter into shutdown phase - 
inflight=24
[ 1540.094234] XXX page_pool_shutdown_attempt() inflight=23
[ 1540.099626] XXX page_pool_shutdown_attempt() inflight=22
[ 1540.105012] XXX page_pool_shutdown_attempt() inflight=21
[ 1540.110402] XXX page_pool_shutdown_attempt() inflight=20
[ 1540.115793] XXX page_pool_shutdown_attempt() inflight=19
[ 1540.121183] XXX page_pool_shutdown_attempt() inflight=18
[ 1540.126564] XXX page_pool_shutdown_attempt() inflight=17
[ 1540.131945] XXX page_pool_shutdown_attempt() inflight=16
[ 1540.137329] XXX page_pool_shutdown_attempt() inflight=15
[ 1540.142710] XXX page_pool_shutdown_attempt() inflight=14
[ 1540.148110] XXX page_pool_shutdown_attempt() inflight=13
[ 1540.153512] XXX page_pool_shutdown_attempt() inflight=12
[ 1540.312284] XXX page_pool_shutdown_attempt() inflight=11
[ 1540.317712] XXX page_pool_shutdown_attempt() inflight=10
[ 1540.323128] XXX page_pool_shutdown_attempt() inflight=9
[ 1540.328459] XXX page_pool_shutdown_attempt() inflight=8
[ 1540.333788] XXX page_pool_shutdown_attempt() inflight=7
[ 1540.339111] XXX page_pool_shutdown_attempt() inflight=6
[ 1540.344439] XXX page_pool_shutdown_attempt() inflight=5
[ 1540.349768] XXX page_pool_shutdown_attempt() inflight=4
[ 1540.355091] XXX page_pool_shutdown_attempt() inflight=3
[ 1540.360420] XXX page_pool_shutdown_attempt() inflight=2
[ 1540.365741] XXX page_pool_shutdown_attempt() inflight=1
[ 1540.371064] BUG: sleeping function called from invalid context at 
include/linux/percpu-rwsem.h:49
[ 1540.380052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 
26501, name: netperf
[ 1540.388171] preempt_count: 101, expected: 0
[ 1540.392452] RCU nest depth: 2, expected: 0
[ 1540.396640] CPU: 2 PID: 26501 Comm: netperf Not tainted 
6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.406147] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 
2.0a 08/01/2016
[ 1540.413749] Call Trace:
[ 1540.416289]  <IRQ>
[ 1540.418395]  dump_stack_lvl+0x32/0x50
[ 1540.422160]  __might_resched+0x11c/0x160
[ 1540.426182]  cpus_read_lock+0x16/0x60
[ 1540.429942]  static_key_slow_dec+0x17/0x50
[ 1540.434137]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.438940]  page_pool_return_skb_page+0x68/0xe0
[ 1540.443654]  skb_free_head+0x4f/0x90
[ 1540.447326]  skb_release_data+0x142/0x1a0
[ 1540.451435]  napi_consume_skb+0x6b/0x180
[ 1540.455448]  net_rx_action+0xe7/0x360
[ 1540.459209]  __do_softirq+0xcb/0x2b1
[ 1540.462885]  do_softirq+0x63/0x90
[ 1540.466297]  </IRQ>
[ 1540.468487]  <TASK>
[ 1540.470671]  __local_bh_enable_ip+0x64/0x70
[ 1540.474948]  __dev_queue_xmit+0x257/0x8a0
[ 1540.479049]  ? __ip_local_out+0x48/0x160
[ 1540.483070]  ip_finish_output2+0x25c/0x540
[ 1540.487255]  __ip_queue_xmit+0x171/0x420
[ 1540.491266]  ? ___slab_alloc+0xd1/0x670
[ 1540.495202]  __tcp_transmit_skb+0x83c/0x950
[ 1540.499486]  tcp_write_xmit+0x373/0xa60
[ 1540.503419]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.508125]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.512408]  tcp_sendmsg+0x27/0x40
[ 1540.515907]  sock_sendmsg+0x8b/0xa0
[ 1540.519494]  ? sockfd_lookup_light+0x12/0x70
[ 1540.523856]  __sys_sendto+0xeb/0x130
[ 1540.527531]  ? __switch_to_asm+0x3a/0x80
[ 1540.531549]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.535736]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.539751]  __x64_sys_sendto+0x20/0x30
[ 1540.543686]  do_syscall_64+0x3a/0x90
[ 1540.547357]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.552502] RIP: 0033:0x7f389f713680
[ 1540.556174] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.575075] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[ 1540.582755] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 
00007f389f713680
[ 1540.589983] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 
0000000000000004
[ 1540.597208] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 
0000000000000000
[ 1540.604438] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000063fcf8
[ 1540.611664] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 
00007f389fa87000
[ 1540.618896]  </TASK>
[ 1540.621199] ------------[ cut here ]------------
[ 1540.625911] WARNING: CPU: 2 PID: 26501 at kernel/smp.c:912 
smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.635001] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4 
ib_umad rdma_ucm ib_ipoib rdma_cm iw_cm irdma ib_cm ice 
intel_uncore_frequency intel_uncore_frequency_common coretemp kvm_intel 
mlx5_ib kvm ib_uverbs irqbypass rapl intel_cstate ib_core i2c_i801 
intel_uncore pcspkr i2c_smbus bfq acpi_ipmi wmi ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad sch_fq_codel drm fuse zram zsmalloc mlx5_core 
ixgbe i40e igb igc sd_mod t10_pi mlxfw psample hid_generic 
crc64_rocksoft_generic i2c_algo_bit crc64_rocksoft i2c_core mdio ptp 
pps_core [last unloaded: bpfilter]
[ 1540.684653] CPU: 2 PID: 26501 Comm: netperf Tainted: G        W 
    6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.695642] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 
2.0a 08/01/2016
[ 1540.703239] RIP: 0010:smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.709090] Code: fe ff ff 8b 44 24 14 48 c7 c6 a0 04 fc 82 8b 15 85 
f4 e6 01 48 8b 3c 24 8d 48 01 48 63 c9 e8 36 c5 4b 00 89 c2 e9 c5 fd ff 
ff <0f> 0b e9 86 fd ff ff 8b 7c 24 28 e8 4f b8 f7 ff eb af 66 66 2e 0f
[ 1540.727999] RSP: 0018:ffffc9000003cda8 EFLAGS: 00010206
[ 1540.733322] RAX: 0000000000000102 RBX: ffffffff8383b2ea RCX: 
0000000000000003
[ 1540.740552] RDX: 0000000000000000 RSI: ffffffff810271a0 RDI: 
ffffffff82fc04a0
[ 1540.747782] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000001
[ 1540.755008] R10: 000000087c9e9000 R11: ffffffff82c4f620 R12: 
0000000000000000
[ 1540.762241] R13: ffffffff8383b300 R14: 0000000000000002 R15: 
0000000000000001
[ 1540.769474] FS:  00007f389f954740(0000) GS:ffff88887fc80000(0000) 
knlGS:0000000000000000
[ 1540.777685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1540.783528] CR2: 00007f1e27582000 CR3: 000000015775c004 CR4: 
00000000003706e0
[ 1540.790761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 1540.797987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[ 1540.805220] Call Trace:
[ 1540.807763]  <IRQ>
[ 1540.809868]  ? __pfx_do_sync_core+0x10/0x10
[ 1540.814146]  on_each_cpu_cond_mask+0x20/0x40
[ 1540.818514]  text_poke_bp_batch+0x9e/0x250
[ 1540.822708]  text_poke_finish+0x1b/0x30
[ 1540.826644]  arch_jump_label_transform_apply+0x16/0x30
[ 1540.831880]  __static_key_slow_dec_cpuslocked.part.0+0x2f/0x40
[ 1540.837809]  static_key_slow_dec+0x30/0x50
[ 1540.841999]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.846803]  page_pool_return_skb_page+0x68/0xe0
[ 1540.851517]  skb_free_head+0x4f/0x90
[ 1540.855194]  skb_release_data+0x142/0x1a0
[ 1540.859300]  napi_consume_skb+0x6b/0x180
[ 1540.863320]  net_rx_action+0xe7/0x360
[ 1540.867082]  __do_softirq+0xcb/0x2b1
[ 1540.870756]  do_softirq+0x63/0x90
[ 1540.874171]  </IRQ>
[ 1540.876359]  <TASK>
[ 1540.878553]  __local_bh_enable_ip+0x64/0x70
[ 1540.882829]  __dev_queue_xmit+0x257/0x8a0
[ 1540.886938]  ? __ip_local_out+0x48/0x160
[ 1540.890959]  ip_finish_output2+0x25c/0x540
[ 1540.895155]  __ip_queue_xmit+0x171/0x420
[ 1540.899175]  ? ___slab_alloc+0xd1/0x670
[ 1540.903108]  __tcp_transmit_skb+0x83c/0x950
[ 1540.907393]  tcp_write_xmit+0x373/0xa60
[ 1540.911327]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.916041]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.920324]  tcp_sendmsg+0x27/0x40
[ 1540.923821]  sock_sendmsg+0x8b/0xa0
[ 1540.927411]  ? sockfd_lookup_light+0x12/0x70
[ 1540.931780]  __sys_sendto+0xeb/0x130
[ 1540.935456]  ? __switch_to_asm+0x3a/0x80
[ 1540.939477]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.943670]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.947693]  __x64_sys_sendto+0x20/0x30
[ 1540.951626]  do_syscall_64+0x3a/0x90
[ 1540.955300]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.960451] RIP: 0033:0x7f389f713680
[ 1540.964124] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.983035] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[ 1540.990724] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 
00007f389f713680
[ 1540.997957] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 
0000000000000004
[ 1541.005187] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 
0000000000000000
[ 1541.012422] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000063fcf8
[ 1541.019649] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 
00007f389fa87000
[ 1541.026881]  </TASK>
[ 1541.029166] ---[ end trace 0000000000000000 ]---
[ 1541.033925] XXX page_pool_shutdown_attempt() inflight=0
diff mbox series

Patch

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ce7e8dda6403..3821d8874b15 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -25,6 +25,8 @@ 
 
 #define BIAS_MAX	LONG_MAX
 
+DEFINE_STATIC_KEY_FALSE(pp_shutdown_phase);
+
 #ifdef CONFIG_PAGE_POOL_STATS
 /* alloc_stat_inc is intended to be used in softirq context */
 #define alloc_stat_inc(pool, __stat)	(pool->alloc_stats.__stat++)
@@ -378,7 +380,8 @@  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	int i, nr_pages;
 
 	/* API usage BUG: PP in shutdown phase, cannot alloc new pages */
-	if (WARN_ON(pool->p.flags & PP_FLAG_SHUTDOWN))
+	if (static_key_enabled(&pp_shutdown_phase) &&
+	    WARN_ON(pool->p.flags & PP_FLAG_SHUTDOWN))
 		return NULL;
 
 	/* Don't support bulk alloc for high-order pages */
@@ -609,7 +612,7 @@  void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
 		recycle_stat_inc(pool, ring_full);
 		page_pool_return_page(pool, page);
 	}
-	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+	if (static_branch_unlikely(&pp_shutdown_phase))
 		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_defragged_page);
@@ -659,7 +662,7 @@  void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 		page_pool_return_page(pool, data[i]);
 
 out:
-	if (pool->p.flags & PP_FLAG_SHUTDOWN)
+	if (static_branch_unlikely(&pp_shutdown_phase))
 		page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);
@@ -817,7 +820,15 @@  static int page_pool_release(struct page_pool *pool)
 noinline
 static void page_pool_shutdown_attempt(struct page_pool *pool)
 {
-	page_pool_release(pool);
+	int inflight;
+
+	if (!(pool->p.flags & PP_FLAG_SHUTDOWN))
+		return;
+
+	inflight = page_pool_release(pool);
+
+	if (static_key_enabled(&pp_shutdown_phase) && !inflight)
+		static_branch_dec(&pp_shutdown_phase);
 }
 
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
@@ -861,6 +872,7 @@  void page_pool_destroy(struct page_pool *pool)
 	 * Enter into shutdown phase, and retry release to handle races.
 	 */
 	pool->p.flags |= PP_FLAG_SHUTDOWN;
+	static_branch_inc(&pp_shutdown_phase);
 	page_pool_shutdown_attempt(pool);
 }
 EXPORT_SYMBOL(page_pool_destroy);