Message ID:   20230602081135.75424-4-wuyun.abel@bytedance.com
State:        Changes Requested
Delegated to: Netdev Maintainers
Series:       sock: Improve condition on sockmem pressure
On Fri, Jun 02, 2023 at 04:11:35PM +0800, Abel Wu wrote:
> The status of global socket memory pressure is updated when:
>
> a) __sk_mem_raise_allocated():
>
>	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
>	leave: sk_memory_allocated(sk) <= sysctl_mem[0]
>
> b) __sk_mem_reduce_allocated():
>
>	leave: sk_under_memory_pressure(sk) &&
>	       sk_memory_allocated(sk) < sysctl_mem[0]

There is also sk_page_frag_refill() where we can enter the global
protocol memory pressure on actual global memory pressure i.e. page
allocation failed. However this might be irrelevant from this patch's
perspective as the focus is on the leaving part.

> So the conditions of leaving global pressure are inconstant, which

*inconsistent

> may lead to the situation that one pressured net-memcg prevents the
> global pressure from being cleared when there is indeed no global
> pressure, thus the global constrains are still in effect unexpectedly
> on the other sockets.
>
> This patch fixes this by ignoring the net-memcg's pressure when
> deciding whether should leave global memory pressure.
>
> Fixes: e1aab161e013 ("socket: initial cgroup code.")
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>

This patch looks good.
On 6/3/23 4:53 AM, Shakeel Butt wrote:
> On Fri, Jun 02, 2023 at 04:11:35PM +0800, Abel Wu wrote:
>> The status of global socket memory pressure is updated when:
>>
>> a) __sk_mem_raise_allocated():
>>
>>	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
>>	leave: sk_memory_allocated(sk) <= sysctl_mem[0]
>>
>> b) __sk_mem_reduce_allocated():
>>
>>	leave: sk_under_memory_pressure(sk) &&
>>	       sk_memory_allocated(sk) < sysctl_mem[0]
>
> There is also sk_page_frag_refill() where we can enter the global
> protocol memory pressure on actual global memory pressure i.e. page
> allocation failed. However this might be irrelevant from this patch's
> perspective as the focus is on the leaving part.

Whether to leave protocol pressure under actual global vm pressure is
similar to what you were concerned about last time (prot & memcg are
now intermingled), as this would mix prot & global state together. To
decouple the global info from prot-level pressure, a new variable might
be needed. But I doubt the necessity, as this seems to be a rare case,
while checking the global status would add a constant overhead on the
net core path (although it could be relieved by a static key). And on
second thought, failing in skb_page_frag_refill() doesn't necessarily
mean there is global memory pressure, since it can be due to the
mpol/memset of the current task.

>
>> So the conditions of leaving global pressure are inconstant, which
>
> *inconsistent

Will fix in next version.

>
>> may lead to the situation that one pressured net-memcg prevents the
>> global pressure from being cleared when there is indeed no global
>> pressure, thus the global constrains are still in effect unexpectedly
>> on the other sockets.
>>
>> This patch fixes this by ignoring the net-memcg's pressure when
>> deciding whether should leave global memory pressure.
>>
>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>
> This patch looks good.

Thanks!
	Abel
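The enter/leave asymmetry quoted above can be sketched as a toy
userspace model (plain C, not kernel code; the state variables are
stand-ins for the protocol-wide pressure flag and sysctl_mem, and the
function names only mirror the kernel helpers they imitate):

```c
#include <stdbool.h>

/* Protocol-wide pressure flag, i.e. *sk->sk_prot->memory_pressure. */
static bool global_pressure;

/* a) __sk_mem_raise_allocated(): leaves purely on the global
 * threshold, allocated <= sysctl_mem[0] (`min` here).
 */
static void raise_allocated(long allocated, long min)
{
	if (allocated <= min)
		global_pressure = false;	/* sk_leave_memory_pressure() */
}

/* b) __sk_mem_reduce_allocated(): the leave is additionally gated on
 * the combined sk_under_memory_pressure() check (memcg OR global), and
 * uses a strict `<` comparison — so the two paths disagree both on the
 * threshold and on whether memcg state participates in a decision
 * about the global flag.
 */
static void reduce_allocated(long allocated, long min, bool memcg_pressure)
{
	bool under_pressure = memcg_pressure || global_pressure;

	if (under_pressure && allocated < min)
		global_pressure = false;	/* sk_leave_memory_pressure() */
}
```

With allocated exactly at the limit, path a) clears the flag while path
b) does not, which is the inconsistency the patch series targets.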
Hello,

kernel test robot noticed a 2.8% improvement of netperf.Throughput_Mbps on:

commit: c89fa56a8776f98d8e4ed9310f5b178288005916 ("[PATCH net-next v5 3/3] sock: Fix misuse of sk_under_memory_pressure()")
url: https://github.com/intel-lab-lkp/linux/commits/Abel-Wu/net-memcg-Fold-dependency-into-memcg-pressure-cond/20230602-161424
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git a395b8d1c7c3a074bfa83b9759a4a11901a295c5
patch link: https://lore.kernel.org/all/20230602081135.75424-4-wuyun.abel@bytedance.com/
patch subject: [PATCH net-next v5 3/3] sock: Fix misuse of sk_under_memory_pressure()

testcase: netperf
test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 50%
	cluster: cs-localhost
	test: TCP_STREAM
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml           # job file is attached in this email
	bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
	sudo bin/lkp run generated-yaml-file

	# if come across any failure that blocks the test,
	# please remove ~/.lkp and /lkp dir to run from a clean state.
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/lkp-icl-2sp2/TCP_STREAM/netperf

commit:
  5ec3359188 ("sock: Always take memcg pressure into consideration")
  c89fa56a87 ("sock: Fix misuse of sk_under_memory_pressure()")

5ec335918819adbd c89fa56a8776f98d8e4ed9310f5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     15.76            -0.4       15.31        turbostat.C1%
     34.82            +1.4%      35.31        turbostat.RAMWatt
   5050526            +1.3%    5115222        vmstat.system.cs
   2620207            +1.1%    2648074        vmstat.system.in
 1.676e+09            +2.8%  1.723e+09        proc-vmstat.numa_hit
 1.676e+09            +2.8%  1.722e+09        proc-vmstat.numa_local
 1.338e+10            +2.8%  1.375e+10        proc-vmstat.pgalloc_normal
 1.338e+10            +2.8%  1.375e+10        proc-vmstat.pgfree
      0.00 ± 22%     -25.8%       0.00 ± 5%   sched_debug.cpu.next_balance.stddev
      0.00 ± 24%     -29.2%       0.00 ± 46%  sched_debug.rt_rq:.rt_time.avg
      0.02 ± 24%     -29.2%       0.01 ± 46%  sched_debug.rt_rq:.rt_time.max
      0.00 ± 24%     -29.2%       0.00 ± 46%  sched_debug.rt_rq:.rt_time.stddev
     22841            +2.8%      23483        netperf.Throughput_Mbps
   1461840            +2.8%    1502930        netperf.Throughput_total_Mbps
    425.78            +3.6%     441.16        netperf.time.user_time
    101776 ± 6%      +19.6%     121747 ± 7%   netperf.time.voluntary_context_switches
 3.346e+09            +2.8%   3.44e+09        netperf.workload
 2.182e+10            +2.5%  2.236e+10        perf-stat.i.branch-instructions
      0.88            -0.0        0.87        perf-stat.i.branch-miss-rate%
      2.55            -0.1        2.46 ± 2%   perf-stat.i.cache-miss-rate%
 6.286e+09            +2.2%  6.425e+09        perf-stat.i.cache-references
   5099243            +1.3%    5164362        perf-stat.i.context-switches
      2.59            -2.8%       2.52        perf-stat.i.cpi
     12909 ± 3%       -8.8%      11772 ± 3%   perf-stat.i.cpu-migrations
 3.253e+10            +2.6%  3.337e+10        perf-stat.i.dTLB-loads
 1.888e+10            +2.7%  1.939e+10        perf-stat.i.dTLB-stores
 1.142e+11            +2.6%  1.172e+11        perf-stat.i.instructions
      0.39            +2.8%       0.40        perf-stat.i.ipc
    621.17            +2.6%     637.06        perf-stat.i.metric.M/sec
     50.27 ± 4%       +6.5       56.79 ± 3%   perf-stat.i.node-store-miss-rate%
  26917446 ± 3%       +7.6%   28967728 ± 2%   perf-stat.i.node-store-misses
  27529051 ± 6%      -17.5%   22707159 ± 6%   perf-stat.i.node-stores
      0.87            -0.0        0.86        perf-stat.overall.branch-miss-rate%
      2.52            -0.1        2.43 ± 2%   perf-stat.overall.cache-miss-rate%
      2.59            -2.8%       2.52        perf-stat.overall.cpi
      0.39            +2.9%       0.40        perf-stat.overall.ipc
     49.47 ± 4%       +6.6       56.09 ± 3%   perf-stat.overall.node-store-miss-rate%
 2.174e+10            +2.5%  2.228e+10        perf-stat.ps.branch-instructions
 6.267e+09            +2.2%  6.405e+09        perf-stat.ps.cache-references
   5081226            +1.3%    5146095        perf-stat.ps.context-switches
     12877 ± 3%       -8.8%      11745 ± 3%   perf-stat.ps.cpu-migrations
 3.241e+10            +2.6%  3.325e+10        perf-stat.ps.dTLB-loads
 1.881e+10            +2.7%  1.932e+10        perf-stat.ps.dTLB-stores
 1.138e+11            +2.6%  1.168e+11        perf-stat.ps.instructions
  26872617 ± 3%       +7.6%   28926338 ± 2%   perf-stat.ps.node-store-misses
  27479983 ± 6%      -17.5%   22669639 ± 6%   perf-stat.ps.node-stores
 3.431e+13            +2.6%   3.52e+13        perf-stat.total.instructions
     86.42           -14.7       71.74 ± 20%  perf-profile.calltrace.cycles-pp.main.__libc_start_main
     86.46           -14.7       71.79 ± 20%  perf-profile.calltrace.cycles-pp.__libc_start_main
     51.74            -0.9       50.86        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__send.send_omni_inner.send_tcp_stream.main
      1.08 ± 30%      -0.6        0.44 ± 44%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established
     51.54            -0.6       50.95        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__send.send_omni_inner.send_tcp_stream
      2.00            -0.5        1.46 ± 8%   perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
     49.60            -0.5       49.06        perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
     50.04            -0.5       49.56        perf-profile.calltrace.cycles-pp.sock_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.20            -0.5       50.74        perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.__send.send_omni_inner
     51.10            -0.5       50.64        perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.__send
     13.57            -0.4       13.20        perf-profile.calltrace.cycles-pp.release_sock.tcp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
     13.07            -0.3       12.72        perf-profile.calltrace.cycles-pp.__release_sock.release_sock.tcp_sendmsg.sock_sendmsg.__sys_sendto
      2.48 ± 2%       -0.3        2.17 ± 2%   perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tcp_sendmsg_locked
      1.36            -0.3        1.05 ± 2%   perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv
      3.23            -0.3        2.96        perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tcp_sendmsg_locked.tcp_sendmsg
      3.28            -0.3        3.00        perf-profile.calltrace.cycles-pp.__sk_mem_schedule.tcp_wmem_schedule.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
      3.33            -0.3        3.06        perf-profile.calltrace.cycles-pp.tcp_wmem_schedule.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.__sys_sendto
      6.08            -0.3        5.82        perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
      6.21            -0.3        5.96        perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
      3.32            -0.2        3.07        perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock
      1.40 ± 3%       -0.2        1.17 ± 7%   perf-profile.calltrace.cycles-pp.refill_stock.__sk_mem_reduce_allocated.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
      2.90 ± 3%       -0.2        2.67 ± 2%   perf-profile.calltrace.cycles-pp.try_charge_memcg.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv
      1.26 ± 3%       -0.2        1.03 ± 7%   perf-profile.calltrace.cycles-pp.page_counter_uncharge.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_recvmsg_locked
      3.69            -0.2        3.46        perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
      1.27 ± 3%       -0.2        1.05 ± 7%   perf-profile.calltrace.cycles-pp.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_recvmsg_locked.tcp_recvmsg
      4.66            -0.2        4.46        perf-profile.calltrace.cycles-pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu
      2.15 ± 3%       -0.2        1.98 ± 2%   perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
      0.64 ± 2%       -0.2        0.47 ± 45%  perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recvmsg
      1.52            -0.2        1.37 ± 7%   perf-profile.calltrace.cycles-pp.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      1.70 ± 3%       -0.1        1.55 ± 2%   perf-profile.calltrace.cycles-pp.try_charge_memcg.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule
      1.42            -0.1        1.27 ± 7%   perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      1.48            -0.1        1.33 ± 7%   perf-profile.calltrace.cycles-pp.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
      1.29            -0.1        1.16 ± 7%   perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked
      1.08            -0.1        0.96 ± 7%   perf-profile.calltrace.cycles-pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.__sk_flush_backlog
      0.96 ± 2%       -0.1        0.85 ± 7%   perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.__release_sock
      1.01 ± 3%       -0.1        0.90 ± 2%   perf-profile.calltrace.cycles-pp.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established
      0.89            -0.1        0.78 ± 7%   perf-profile.calltrace.cycles-pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
      0.92 ± 3%       -0.1        0.82 ± 2%   perf-profile.calltrace.cycles-pp.page_counter_uncharge.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue
      0.93 ± 3%       -0.1        0.83 ± 2%   perf-profile.calltrace.cycles-pp.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack
      0.75            -0.1        0.65 ± 7%   perf-profile.calltrace.cycles-pp.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      0.71            -0.1        0.62 ± 7%   perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      0.85 ± 4%       -0.1        0.76 ± 2%   perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule
      0.96            -0.0        0.92        perf-profile.calltrace.cycles-pp.skb_release_data.napi_consume_skb.net_rx_action.__do_softirq.do_softirq
      1.04            -0.0        1.00        perf-profile.calltrace.cycles-pp.napi_consume_skb.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
      0.54            +0.0        0.56        perf-profile.calltrace.cycles-pp.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
      1.04            +0.0        1.08 ± 2%   perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      1.10            +0.0        1.15 ± 2%   perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.68            +0.1        2.73        perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      1.14            +0.1        1.20        perf-profile.calltrace.cycles-pp.skb_release_data.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established
      1.16            +0.1        1.22        perf-profile.calltrace.cycles-pp.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv
     31.36            +0.2       31.59        perf-profile.calltrace.cycles-pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.75            +0.2       31.99        perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv
     31.13            +0.2       31.37        perf-profile.calltrace.cycles-pp.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
     31.20            +0.2       31.45        perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
     31.80            +0.2       32.05        perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.recv_omni
     29.91            +0.3       30.16        perf-profile.calltrace.cycles-pp.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
     32.02            +0.3       32.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.recv_omni.process_requests
     32.11            +0.3       32.37        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recv.recv_omni.process_requests.spawn_child
     32.72            +0.3       32.99        perf-profile.calltrace.cycles-pp.recv.recv_omni.process_requests.spawn_child.accept_connection
     33.18            +0.3       33.45        perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
     15.32            +0.4       15.71        perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked
     15.54            +0.4       15.92        perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg
     17.05            +0.4       17.47        perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
     17.02            +0.4       17.44        perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
     86.46           -14.7       71.79 ± 20%  perf-profile.children.cycles-pp.__libc_start_main
     86.58           -14.7       71.92 ± 20%  perf-profile.children.cycles-pp.main
      3.52            -0.7        2.78        perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
      6.21 ± 2%       -0.5        5.67        perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
     49.64            -0.5       49.10        perf-profile.children.cycles-pp.tcp_sendmsg
     50.06            -0.5       49.58        perf-profile.children.cycles-pp.sock_sendmsg
     51.22            -0.5       50.76        perf-profile.children.cycles-pp.__x64_sys_sendto
     51.13            -0.5       50.67        perf-profile.children.cycles-pp.__sys_sendto
      5.14 ± 3%       -0.3        4.82 ± 2%   perf-profile.children.cycles-pp.try_charge_memcg
     18.19            -0.3       17.87        perf-profile.children.cycles-pp.tcp_rcv_established
     18.52            -0.3       18.20        perf-profile.children.cycles-pp.tcp_v4_do_rcv
     15.38            -0.3       15.08        perf-profile.children.cycles-pp.__release_sock
     14.62            -0.3       14.34        perf-profile.children.cycles-pp.release_sock
     19.19            -0.3       18.92        perf-profile.children.cycles-pp._copy_from_iter
     18.90            -0.3       18.62        perf-profile.children.cycles-pp.copyin
      3.45            -0.3        3.18        perf-profile.children.cycles-pp.__sk_mem_raise_allocated
      3.50            -0.3        3.23        perf-profile.children.cycles-pp.__sk_mem_schedule
      6.37            -0.2        6.12        perf-profile.children.cycles-pp.tcp_data_queue
     19.75            -0.2       19.51        perf-profile.children.cycles-pp.skb_do_copy_data_nocache
      2.58 ± 3%       -0.2        2.34 ± 2%   perf-profile.children.cycles-pp.refill_stock
      2.30 ± 3%       -0.2        2.06 ± 2%   perf-profile.children.cycles-pp.page_counter_uncharge
      2.32 ± 3%       -0.2        2.08 ± 2%   perf-profile.children.cycles-pp.drain_stock
      3.35            -0.2        3.11        perf-profile.children.cycles-pp.tcp_wmem_schedule
      3.65            -0.2        3.41        perf-profile.children.cycles-pp.tcp_clean_rtx_queue
      2.70 ± 3%       -0.2        2.49        perf-profile.children.cycles-pp.page_counter_try_charge
      4.38            -0.2        4.17        perf-profile.children.cycles-pp.tcp_ack
      0.47 ± 3%       -0.1        0.37 ± 3%   perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
      0.52 ± 2%       -0.0        0.50        perf-profile.children.cycles-pp._raw_spin_trylock
      0.39            -0.0        0.37        perf-profile.children.cycles-pp.select_task_rq_fair
      0.10 ± 3%       -0.0        0.09 ± 4%   perf-profile.children.cycles-pp.security_socket_recvmsg
      0.06            +0.0        0.07        perf-profile.children.cycles-pp._raw_spin_unlock_bh
      0.18 ± 2%       +0.0        0.20 ± 2%   perf-profile.children.cycles-pp.ip_send_check
      0.06 ± 6%       +0.0        0.08 ± 6%   perf-profile.children.cycles-pp.rb_next
      0.17 ± 2%       +0.0        0.19 ± 2%   perf-profile.children.cycles-pp.tick_irq_enter
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.__fget_light
      0.27            +0.0        0.29        perf-profile.children.cycles-pp.tcp_tso_segs
      1.08            +0.0        1.10        perf-profile.children.cycles-pp.skb_page_frag_refill
      0.65            +0.0        0.67        perf-profile.children.cycles-pp.tcp_stream_alloc_skb
      1.11            +0.0        1.14        perf-profile.children.cycles-pp.sk_page_frag_refill
      0.41            +0.0        0.43 ± 2%   perf-profile.children.cycles-pp.native_sched_clock
      0.64            +0.0        0.67        perf-profile.children.cycles-pp.sockfd_lookup_light
      0.56            +0.0        0.58        perf-profile.children.cycles-pp.kmem_cache_alloc_node
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.tcp_schedule_loss_probe
      0.45            +0.0        0.48        perf-profile.children.cycles-pp.sched_clock_cpu
      0.29 ± 2%       +0.0        0.32        perf-profile.children.cycles-pp.aa_sk_perm
      0.76            +0.0        0.79        perf-profile.children.cycles-pp.ktime_get
      0.72            +0.0        0.75        perf-profile.children.cycles-pp.dequeue_entity
      0.22 ± 3%       +0.0        0.25 ± 3%   perf-profile.children.cycles-pp.set_next_entity
      1.03            +0.0        1.06        perf-profile.children.cycles-pp.simple_copy_to_iter
      0.86            +0.0        0.89        perf-profile.children.cycles-pp.__alloc_skb
      0.48            +0.0        0.51        perf-profile.children.cycles-pp.irqtime_account_irq
      0.81            +0.0        0.85        perf-profile.children.cycles-pp.dequeue_task_fair
      0.53 ± 2%       +0.0        0.57 ± 2%   perf-profile.children.cycles-pp.tcp_event_new_data_sent
      0.62            +0.0        0.66        perf-profile.children.cycles-pp.__mod_timer
      0.66            +0.0        0.70        perf-profile.children.cycles-pp.sk_reset_timer
      1.19            +0.0        1.23        perf-profile.children.cycles-pp.check_heap_object
      0.33 ± 2%       +0.0        0.37 ± 4%   perf-profile.children.cycles-pp.propagate_protected_usage
      1.47            +0.0        1.51        perf-profile.children.cycles-pp.__check_object_size
      1.11            +0.0        1.16 ± 2%   perf-profile.children.cycles-pp.schedule_idle
      0.27 ± 3%       +0.0        0.32        perf-profile.children.cycles-pp.security_socket_sendmsg
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.pick_next_entity
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.detach_if_pending
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.rb_erase
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.get_nohz_timer_target
      2.72            +0.1        2.78        perf-profile.children.cycles-pp.sysvec_call_function_single
      2.80            +0.1        2.85        perf-profile.children.cycles-pp.skb_release_data
      0.23 ± 2%       +0.1        0.29        perf-profile.children.cycles-pp.skb_attempt_defer_free
      1.54            +0.1        1.62        perf-profile.children.cycles-pp.__kfree_skb
     16.08            +0.1       16.20        perf-profile.children.cycles-pp.tcp_write_xmit
     18.25            +0.1       18.39        perf-profile.children.cycles-pp.__tcp_transmit_skb
     31.37            +0.2       31.60        perf-profile.children.cycles-pp.sock_recvmsg
     31.16            +0.2       31.39        perf-profile.children.cycles-pp.tcp_recvmsg
     31.82            +0.2       32.06        perf-profile.children.cycles-pp.__x64_sys_recvfrom
     31.76            +0.2       32.00        perf-profile.children.cycles-pp.__sys_recvfrom
     31.21            +0.2       31.45        perf-profile.children.cycles-pp.inet_recvmsg
     29.93            +0.3       30.19        perf-profile.children.cycles-pp.tcp_recvmsg_locked
     32.82            +0.3       33.08        perf-profile.children.cycles-pp.recv
     33.18            +0.3       33.46        perf-profile.children.cycles-pp.recv_omni
     33.18            +0.3       33.46        perf-profile.children.cycles-pp.accept_connections
     33.18            +0.3       33.46        perf-profile.children.cycles-pp.accept_connection
     33.18            +0.3       33.46        perf-profile.children.cycles-pp.spawn_child
     33.18            +0.3       33.46        perf-profile.children.cycles-pp.process_requests
     15.42            +0.4       15.80        perf-profile.children.cycles-pp.copyout
     15.54            +0.4       15.93        perf-profile.children.cycles-pp._copy_to_iter
     17.03            +0.4       17.45        perf-profile.children.cycles-pp.__skb_datagram_iter
     17.05            +0.4       17.48        perf-profile.children.cycles-pp.skb_copy_datagram_iter
      0.50 ± 3%       -0.4        0.11 ± 4%   perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
     18.79            -0.3       18.52        perf-profile.self.cycles-pp.copyin
      2.12 ± 3%       -0.3        1.85 ± 2%   perf-profile.self.cycles-pp.page_counter_uncharge
      2.54 ± 3%       -0.2        2.30        perf-profile.self.cycles-pp.page_counter_try_charge
      0.96 ± 3%       -0.2        0.73 ± 3%   perf-profile.self.cycles-pp.mem_cgroup_charge_skmem
      2.40 ± 3%       -0.1        2.28 ± 2%   perf-profile.self.cycles-pp.try_charge_memcg
      0.42 ± 3%       -0.1        0.31 ± 3%   perf-profile.self.cycles-pp.mem_cgroup_uncharge_skmem
      0.12 ± 3%       -0.0        0.10        perf-profile.self.cycles-pp.select_task_rq_fair
      0.15 ± 3%       -0.0        0.13 ± 2%   perf-profile.self.cycles-pp.__entry_text_start
      0.20 ± 2%       -0.0        0.19        perf-profile.self.cycles-pp.loopback_xmit
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.set_next_entity
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.__xfrm_policy_check2
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.19            +0.0        0.20        perf-profile.self.cycles-pp.sock_put
      0.12 ± 3%       +0.0        0.14 ± 3%   perf-profile.self.cycles-pp.tcp_event_new_data_sent
      0.12            +0.0        0.13 ± 2%   perf-profile.self.cycles-pp.tcp_v4_do_rcv
      0.19 ± 3%       +0.0        0.20 ± 2%   perf-profile.self.cycles-pp.update_curr
      0.51            +0.0        0.53        perf-profile.self.cycles-pp.__fget_light
      0.26            +0.0        0.28 ± 2%   perf-profile.self.cycles-pp.ktime_get
      0.16 ± 2%       +0.0        0.18 ± 4%   perf-profile.self.cycles-pp.recv_data
      0.21 ± 2%       +0.0        0.23        perf-profile.self.cycles-pp.__do_softirq
      0.16 ± 5%       +0.0        0.18 ± 6%   perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.39            +0.0        0.42        perf-profile.self.cycles-pp.native_sched_clock
      0.89            +0.0        0.91        perf-profile.self.cycles-pp.__inet_lookup_established
      0.96            +0.0        0.99        perf-profile.self.cycles-pp.__tcp_transmit_skb
      0.82            +0.0        0.85        perf-profile.self.cycles-pp.check_heap_object
      0.46            +0.0        0.48        perf-profile.self.cycles-pp.net_rx_action
      0.23 ± 3%       +0.0        0.26        perf-profile.self.cycles-pp.aa_sk_perm
      0.54            +0.0        0.57        perf-profile.self.cycles-pp.tcp_recvmsg_locked
      0.53            +0.0        0.56        perf-profile.self.cycles-pp.tcp_v4_rcv
      0.32 ± 3%       +0.0        0.37 ± 5%   perf-profile.self.cycles-pp.propagate_protected_usage
      0.10 ± 4%       +0.1        0.17 ± 2%   perf-profile.self.cycles-pp.skb_attempt_defer_free
      1.14            +0.1        1.21        perf-profile.self.cycles-pp.skb_release_data
     15.32            +0.4       15.71        perf-profile.self.cycles-pp.copyout

Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
diff --git a/include/net/sock.h b/include/net/sock.h
index ad1895ffbc4a..22695f776e76 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,13 +1409,18 @@ static inline bool sk_has_memory_pressure(const struct sock *sk)
 	return sk->sk_prot->memory_pressure != NULL;
 }
 
+static inline bool sk_under_global_memory_pressure(const struct sock *sk)
+{
+	return sk->sk_prot->memory_pressure &&
+	       *sk->sk_prot->memory_pressure;
+}
+
 static inline bool sk_under_memory_pressure(const struct sock *sk)
 {
 	if (mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
-	return sk->sk_prot->memory_pressure &&
-	       *sk->sk_prot->memory_pressure;
+	return sk_under_global_memory_pressure(sk);
 }
 
 static inline long
diff --git a/net/core/sock.c b/net/core/sock.c
index 5440e67bcfe3..801df091e37a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3095,7 +3095,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount)
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
 
-	if (sk_under_memory_pressure(sk) &&
+	if (sk_under_global_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
 		sk_leave_memory_pressure(sk);
 }
The status of global socket memory pressure is updated when:

a) __sk_mem_raise_allocated():

	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
	leave: sk_memory_allocated(sk) <= sysctl_mem[0]

b) __sk_mem_reduce_allocated():

	leave: sk_under_memory_pressure(sk) &&
	       sk_memory_allocated(sk) < sysctl_mem[0]

So the conditions of leaving global pressure are inconsistent, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constraints are still in effect unexpectedly
on the other sockets.

This patch fixes this by ignoring the net-memcg's pressure when
deciding whether we should leave global memory pressure.

Fixes: e1aab161e013 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 include/net/sock.h | 9 +++++++--
 net/core/sock.c    | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)
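The predicate split the patch introduces can be illustrated with a
small userspace model (not kernel code — `struct toy_sock` and the
boolean memcg flag are stand-ins for the real `struct sock` and
mem_cgroup_under_socket_pressure(); only the two helpers' logic mirrors
the diff):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct sock: a pointer to the protocol-wide pressure
 * flag plus this socket's own net-memcg pressure state.
 */
struct toy_sock {
	bool *proto_memory_pressure;	/* sk->sk_prot->memory_pressure */
	bool  memcg_under_pressure;	/* mem_cgroup_under_socket_pressure() */
};

/* Global-only view: what __sk_mem_reduce_allocated() consults after
 * the patch, before clearing the protocol-wide flag.
 */
static bool sk_under_global_memory_pressure(const struct toy_sock *sk)
{
	return sk->proto_memory_pressure && *sk->proto_memory_pressure;
}

/* Combined view: still used for per-socket throttling decisions, so a
 * pressured memcg keeps constraining its own sockets but no longer
 * influences the global flag's lifetime.
 */
static bool sk_under_memory_pressure(const struct toy_sock *sk)
{
	if (sk->memcg_under_pressure)
		return true;

	return sk_under_global_memory_pressure(sk);
}
```

A socket whose memcg is pressured while the protocol flag is clear now
reports pressure for its own throttling, yet the global-only helper
correctly reports none, which is exactly the distinction the leave path
needs.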