From patchwork Sat Oct 10 10:38:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 11830115
From: Muchun Song
To: gregkh@linuxfoundation.org, rafael@kernel.org, mst@redhat.com,
	jasowang@redhat.com, davem@davemloft.net, kuba@kernel.org,
	adobriyan@gmail.com, akpm@linux-foundation.org, edumazet@google.com,
	kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	steffen.klassert@secunet.com, herbert@gondor.apana.org.au,
	shakeelb@google.com, will@kernel.org, mhocko@suse.com, guro@fb.com,
	neilb@suse.de, rppt@kernel.org, songmuchun@bytedance.com,
	samitolvanen@google.com, kirill.shutemov@linux.intel.com,
	feng.tang@intel.com, pabeni@redhat.com, willemb@google.com,
	rdunlap@infradead.org, fw@strlen.de, gustavoars@kernel.org,
	pablo@netfilter.org, decui@microsoft.com, jakub@cloudflare.com,
	peterz@infradead.org,
	christian.brauner@ubuntu.com, ebiederm@xmission.com,
	tglx@linutronix.de, dave@stgolabs.net, walken@google.com,
	jannh@google.com, chenqiwu@xiaomi.com, christophe.leroy@c-s.fr,
	minchan@kernel.org, kafai@fb.com, ast@kernel.org,
	daniel@iogearbox.net, linmiaohe@huawei.com, keescook@chromium.org
Cc: linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] mm: proc: add Sock to /proc/meminfo
Date: Sat, 10 Oct 2020 18:38:54 +0800
Message-Id: <20201010103854.66746-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)

The amount of memory allocated to socket buffers can become significant.
However, /proc/meminfo does not show how much memory socket buffers
consume, which makes it very difficult to tell where the kernel is
using memory.

On our server with 500GB RAM, we sometimes see 25GB that /proc/meminfo
does not account for. With page_owner enabled, our analysis found the
following allocation path consuming that memory.

  849698 times:
  Page allocated via order 3, mask 0x4052c0(GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP)
   __alloc_pages_nodemask+0x11d/0x290
   skb_page_frag_refill+0x68/0xf0
   sk_page_frag_refill+0x19/0x70
   tcp_sendmsg_locked+0x2f4/0xd10
   tcp_sendmsg+0x29/0xa0
   sock_sendmsg+0x30/0x40
   sock_write_iter+0x8f/0x100
   __vfs_write+0x10b/0x190
   vfs_write+0xb0/0x190
   ksys_write+0x5a/0xd0
   do_syscall_64+0x5d/0x110
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Muchun Song
Signed-off-by: Mike Rapoport
---
 drivers/base/node.c      |  2 ++
 drivers/net/virtio_net.c |  3 +--
 fs/proc/meminfo.c        |  1 +
 include/linux/mmzone.h   |  1 +
 include/linux/skbuff.h   | 43 ++++++++++++++++++++++++++++++++++++++--
 kernel/exit.c            |  3 +--
 mm/page_alloc.c          |  7 +++++--
 mm/vmstat.c              |  1 +
 net/core/sock.c          |  8 ++++----
 net/ipv4/tcp.c           |  3 +--
 net/xfrm/xfrm_state.c    |  3 +--
 11 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 508b80f6329b..6f92775da85c 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -418,6 +418,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_SHADOW_CALL_STACK
 		       "Node %d ShadowCallStack:%8lu kB\n"
 #endif
+		       "Node %d Sock: %8lu kB\n"
 		       "Node %d PageTables: %8lu kB\n"
 		       "Node %d NFS_Unstable: %8lu kB\n"
 		       "Node %d Bounce: %8lu kB\n"
@@ -441,6 +442,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
 		       nid, K(i.sharedram),
 		       nid, node_page_state(pgdat, NR_KERNEL_STACK_KB),
+		       nid, K(node_page_state(pgdat, NR_SOCK)),
 #ifdef CONFIG_SHADOW_CALL_STACK
 		       nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 263b005981bd..e7183f67ae4a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2646,8 +2646,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 {
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
-		if (vi->rq[i].alloc_frag.page)
-			put_page(vi->rq[i].alloc_frag.page);
+		put_page_frag(&vi->rq[i].alloc_frag);
 }
 
 static void free_unused_bufs(struct virtnet_info *vi)
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 887a5532e449..1dcf3120d831 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -106,6 +106,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	seq_printf(m, "ShadowCallStack:%8lu kB\n",
 		   global_node_page_state(NR_KERNEL_SCS_KB));
 #endif
+	show_val_kb(m, "Sock: ", global_node_page_state(NR_SOCK));
 	show_val_kb(m, "PageTables: ",
 		    global_zone_page_state(NR_PAGETABLE));
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 31712bb61f7f..1996713d2c6b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -207,6 +207,7 @@ enum node_stat_item {
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
+	NR_SOCK,		/* Count of socket buffer pages */
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fcd53f97c186..7e5108da4d84 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -19,7 +19,8 @@
 #include
 #include
 #include
-
+#include
+#include
 #include
 #include
 #include
@@ -3003,6 +3004,25 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
 	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
 }
 
+static inline void inc_sock_node_page_state(struct page *page)
+{
+	mod_node_page_state(page_pgdat(page), NR_SOCK, compound_nr(page));
+	/*
+	 * Indicate that we need to decrease the Sock page state when
+	 * the page freed.
+	 */
+	SetPagePrivate(page);
+}
+
+static inline void dec_sock_node_page_state(struct page *page)
+{
+	if (PagePrivate(page)) {
+		ClearPagePrivate(page);
+		mod_node_page_state(page_pgdat(page), NR_SOCK,
+				    -compound_nr(page));
+	}
+}
+
 /**
  * __skb_frag_unref - release a reference on a paged fragment.
  * @frag: the paged fragment
@@ -3011,7 +3031,12 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
  */
 static inline void __skb_frag_unref(skb_frag_t *frag)
 {
-	put_page(skb_frag_page(frag));
+	struct page *page = skb_frag_page(frag);
+
+	if (put_page_testzero(page)) {
+		dec_sock_node_page_state(page);
+		__put_page(page);
+	}
 }
 
 /**
@@ -3091,6 +3116,20 @@ static inline void skb_frag_set_page(struct sk_buff *skb, int f,
 	__skb_frag_set_page(&skb_shinfo(skb)->frags[f], page);
 }
 
+static inline bool put_page_frag(struct page_frag *pfrag)
+{
+	struct page *page = pfrag->page;
+
+	if (page) {
+		if (put_page_testzero(page)) {
+			dec_sock_node_page_state(page);
+			__put_page(page);
+		}
+		return true;
+	}
+	return false;
+}
+
 bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio);
 
 /**
diff --git a/kernel/exit.c b/kernel/exit.c
index 62912406d74a..58d373767d16 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -841,8 +841,7 @@ void __noreturn do_exit(long code)
 	if (tsk->splice_pipe)
 		free_pipe_info(tsk->splice_pipe);
 
-	if (tsk->task_frag.page)
-		put_page(tsk->task_frag.page);
+	put_page_frag(&tsk->task_frag);
 
 	validate_creds_for_do_exit(tsk);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cefbef32bf4a..6c543158aa06 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5379,7 +5379,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		" unevictable:%lu dirty:%lu writeback:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" free:%lu free_pcp:%lu free_cma:%lu sock:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -5397,7 +5397,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_zone_page_state(NR_BOUNCE),
 		global_zone_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_zone_page_state(NR_FREE_CMA_PAGES));
+		global_zone_page_state(NR_FREE_CMA_PAGES),
+		global_node_page_state(NR_SOCK));
 
 	for_each_online_pgdat(pgdat) {
 		if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
@@ -5425,6 +5426,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 #ifdef CONFIG_SHADOW_CALL_STACK
 			" shadow_call_stack:%lukB"
 #endif
+			" sock:%lukB"
 			" all_unreclaimable? %s"
 			"\n",
 			pgdat->node_id,
@@ -5450,6 +5452,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 #ifdef CONFIG_SHADOW_CALL_STACK
 			node_page_state(pgdat, NR_KERNEL_SCS_KB),
 #endif
+			K(node_page_state(pgdat, NR_SOCK)),
 			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
 				"yes" : "no");
 	}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b05dec387557..ceaf6f85c155 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1220,6 +1220,7 @@ const char * const vmstat_text[] = {
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	"nr_shadow_call_stack",
 #endif
+	"nr_sock",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
diff --git a/net/core/sock.c b/net/core/sock.c
index 5972d26f03ae..1661b423802b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1780,10 +1780,8 @@ static void __sk_destruct(struct rcu_head *head)
 		pr_debug("%s: optmem leakage (%d bytes) detected\n",
 			 __func__, atomic_read(&sk->sk_omem_alloc));
 
-	if (sk->sk_frag.page) {
-		put_page(sk->sk_frag.page);
+	if (put_page_frag(&sk->sk_frag))
 		sk->sk_frag.page = NULL;
-	}
 
 	if (sk->sk_peer_cred)
 		put_cred(sk->sk_peer_cred);
@@ -2456,7 +2454,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
 		}
 		if (pfrag->offset + sz <= pfrag->size)
 			return true;
-		put_page(pfrag->page);
+		put_page_frag(pfrag);
 	}
 
 	pfrag->offset = 0;
@@ -2469,12 +2467,14 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
 					  SKB_FRAG_PAGE_ORDER);
 		if (likely(pfrag->page)) {
 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
+			inc_sock_node_page_state(pfrag->page);
 			return true;
 		}
 	}
 	pfrag->page = alloc_page(gfp);
 	if (likely(pfrag->page)) {
 		pfrag->size = PAGE_SIZE;
+		inc_sock_node_page_state(pfrag->page);
 		return true;
 	}
 	return false;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 57a568875539..583761844b4f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2751,8 +2751,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	WARN_ON(inet->inet_num && !icsk->icsk_bind_hash);
 
-	if (sk->sk_frag.page) {
-		put_page(sk->sk_frag.page);
+	if (put_page_frag(&sk->sk_frag)) {
 		sk->sk_frag.page = NULL;
 		sk->sk_frag.offset = 0;
 	}
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 69520ad3d83b..0f7c16679e49 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -495,8 +495,7 @@ static void ___xfrm_state_destroy(struct xfrm_state *x)
 		x->type->destructor(x);
 		xfrm_put_type(x->type);
 	}
-	if (x->xfrag.page)
-		put_page(x->xfrag.page);
+	put_page_frag(&x->xfrag);
 	xfrm_dev_state_free(x);
 	security_xfrm_state_free(x);
 	xfrm_state_free(x);
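
For readers following the accounting logic in the patch above, here is a
minimal userspace sketch of the charge/uncharge pattern it uses: charge
the counter when a fragment page is allocated, mark the page, and
uncharge only when the last reference is dropped. This is not kernel
code. The type model_page, the counter nr_sock and the model_* helpers
are hypothetical stand-ins for struct page, NR_SOCK,
inc_sock_node_page_state(), dec_sock_node_page_state() and
put_page_frag(), and the model ignores concurrency and per-node
counters.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page (illustration only). */
struct model_page {
	int refcount;       /* models the page reference count */
	bool private_flag;  /* models PG_private as the patch uses it */
	long nr_pages;      /* models compound_nr(page) */
};

/* Models the NR_SOCK counter, in pages (single node, no locking). */
static long nr_sock;

/* Models inc_sock_node_page_state(): charge the page and mark it. */
static void model_inc_sock(struct model_page *page)
{
	nr_sock += page->nr_pages;
	page->private_flag = true;
}

/* Models dec_sock_node_page_state(): uncharge only a marked page. */
static void model_dec_sock(struct model_page *page)
{
	if (page->private_flag) {
		page->private_flag = false;
		nr_sock -= page->nr_pages;
	}
}

/*
 * Models the put path: drop one reference and uncharge only when the
 * count reaches zero. Returns true if this was the final reference.
 */
static bool model_put(struct model_page *page)
{
	if (--page->refcount == 0) {
		model_dec_sock(page);
		return true;
	}
	return false;
}
```

In this model, an order-3 allocation (8 pages, as in the page_owner
trace) stays charged while a second holder still has a reference, and
the counter returns to zero only after the final put.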