From patchwork Thu Jan 12 15:53:15 2023
From: Yafang Shao <laoar.shao@gmail.com>
To: 42.hyeyoo@gmail.com, vbabka@suse.cz, ast@kernel.org, daniel@iogearbox.net,
    andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
    john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
    haoluo@google.com, jolsa@kernel.org, tj@kernel.org, dennis@kernel.org,
    cl@linux.com, akpm@linux-foundation.org, penberg@kernel.org,
    rientjes@google.com, iamjoonsoo.kim@lge.com, roman.gushchin@linux.dev
Cc: linux-mm@kvack.org, bpf@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
Date: Thu, 12 Jan 2023 15:53:15 +0000
Message-Id: <20230112155326.26902-1-laoar.shao@gmail.com>
Currently there is no way to get the BPF memory usage; we can only
estimate it via bpftool or memcg, and neither is reliable.

- bpftool
  `bpftool {map,prog} show` can show us the memlock of each map and
  prog, but the memlock can differ significantly from the real memory
  size.  The memlock of a bpf object is approximately
  `round_up(key_size + value_size, 8) * max_entries`, so:
  1) it does not apply to non-preallocated bpf maps, whose real memory
     size may grow or shrink dynamically;
  2) the element size of some bpf maps is not `key_size + value_size`;
     for example, the element size of an htab is
     `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`.
  So the difference between these two values can be very large when
  key_size and value_size are small.  For example, in my verification
  the memlock and the real memory size of a preallocated hash map are:

    $ grep BPF /proc/meminfo
    BPF:                 350 kB   <<< the size of the preallocated memalloc pool

    (create hash map)

    $ bpftool map show
    41549: hash  name count_map  flags 0x0
            key 4B  value 4B  max_entries 1048576  memlock 8388608B

    $ grep BPF /proc/meminfo
    BPF:               82284 kB

  So the real memory size is $((82284 - 350)) = 81934 kB, while the
  memlock is only 8192 kB.  (A small worked example of this gap follows
  this list.)

- memcg
  With memcg we only know that the BPF memory usage is less than
  memory.kmem.usage_in_bytes (or memory.current in v2).  Furthermore,
  we only know that the BPF memory usage is less than $MemTotal if the
  BPF object is charged into the root memcg :)
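For the hash map above (key 4B, value 4B, 1048576 max_entries), the gap can
be reproduced with the two formulas.  The small program below is only an
illustration and is not part of this series; the 48-byte value stands in for
sizeof(struct htab_elem) (an assumption for x86_64), and the bucket array and
other per-map overhead are ignored, which is why the measured 81934 kB is
larger still.

    #include <stdio.h>

    #define ROUND_UP(x, a)  (((x) + (a) - 1) / (a) * (a))

    int main(void)
    {
            /* the map from the example above */
            unsigned long key_size = 4, value_size = 4, max_entries = 1048576;
            /* assumed sizeof(struct htab_elem); not a value taken from this series */
            unsigned long elem_overhead = 48;

            /* what `bpftool map show` reports as memlock */
            unsigned long memlock = ROUND_UP(key_size + value_size, 8) * max_entries;
            /* per-element size actually allocated by a preallocated htab */
            unsigned long real = (elem_overhead + ROUND_UP(key_size, 8) +
                                  ROUND_UP(value_size, 8)) * max_entries;

            printf("memlock estimate: %lu kB\n", memlock / 1024);  /* 8192 kB */
            printf("element estimate: %lu kB\n", real / 1024);     /* 65536 kB */
            return 0;
    }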
So we need a way to get the BPF memory usage, especially as more and more
bpf programs are running in production environments.  The memory usage of
BPF is not trivial, and it deserves a new item in /proc/meminfo.

There are several ways to calculate the BPF memory usage.  They all have
pros and cons.

- Option 1: Annotate BPF memory allocation only
  This is how I implemented it in RFC v1; see the details and the
  discussion via the link below [1].
  - pros
    We only need to annotate the BPF memory allocation, and the
    allocated memory is then found automatically in the free path.  So
    it is very easy to use, and we don't need to worry about stat
    leaks.
  - cons
    We must store information about the allocated memory, in
    particular the allocated slab objects, which takes extra memory.
    Introducing a new member into struct page, or adding one to
    page_ext, costs at least 0.2% of total memory on a 64-bit system,
    which is not acceptable.  One way to reduce this overhead is a
    dynamic page extension, but that would take significant effort and
    may not be worth it.

- Option 2: Annotate both allocation and free
  This is similar to what I implemented in an earlier version [2].
  - pros
    Almost no memory overhead.
  - cons
    All BPF memory allocations and frees must use the BPF helpers and
    cannot use the generic helpers such as kfree/vfree/percpu_free, so
    if the user forgets to use the introduced helpers to allocate or
    free BPF memory, the stats leak.  It is also not easy to annotate
    some deferred frees, in particular kfree_rcu(), so the user has to
    use call_rcu() instead of kfree_rcu().  Another risk is that if
    other deferred-free helpers are introduced in the future, this BPF
    statistic may break easily.

- Option 3: Calculate the memory size via the pointer
  This is how I implement it in this patchset.  After allocating some
  BPF memory, we get the full size from the pointer and add it; before
  freeing the BPF memory, we get the full size from the pointer and
  subtract it.  (A rough sketch of this pairing is appended after the
  diffstat.)
  - pros
    No memory overhead.  No code churn in the MM core allocation and
    free paths.  The implementation is quite clear and easy to
    maintain.
  - cons
    The calculation is not embedded in the MM allocation/free path, so
    some extra code runs to get the size from the pointer.  BPF memory
    allocation and free must use the helpers we introduce, otherwise
    the stats leak.

I prefer option 3.  Its cons can be justified:
- bpf_map_free should be paired with bpf_map_alloc, which is reasonable;
- regarding the possible extra CPU cycles, the user should not allocate
  and free memory in the critical path if it is latency sensitive.

[1]. https://lwn.net/Articles/917647/
[2]. https://lore.kernel.org/linux-mm/20220921170002.29557-1-laoar.shao@gmail.com/

v1->v2: don't use page_ext (Vlastimil, Hyeonggon)

Yafang Shao (11):
  mm: percpu: count memcg relevant memory only when kmemcg is enabled
  mm: percpu: introduce percpu_size()
  mm: slab: rename obj_full_size()
  mm: slab: introduce ksize_full()
  mm: vmalloc: introduce vsize()
  mm: util: introduce kvsize()
  bpf: introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: use bpf_map_kzalloc in arraymap
  bpf: use bpf_map_kvcalloc in bpf_local_storage
  bpf: add and use bpf map free helpers
  bpf: introduce bpf memory statistics

 fs/proc/meminfo.c              |   4 ++
 include/linux/bpf.h            | 115 +++++++++++++++++++++++++++++++++++++++--
 include/linux/percpu.h         |   1 +
 include/linux/slab.h           |  10 ++++
 include/linux/vmalloc.h        |  15 ++++++
 kernel/bpf/arraymap.c          |  20 +++----
 kernel/bpf/bpf_cgrp_storage.c  |   2 +-
 kernel/bpf/bpf_inode_storage.c |   2 +-
 kernel/bpf/bpf_local_storage.c |  24 ++++-----
 kernel/bpf/bpf_task_storage.c  |   2 +-
 kernel/bpf/cpumap.c            |  13 +++--
 kernel/bpf/devmap.c            |  10 ++--
 kernel/bpf/hashtab.c           |   8 +--
 kernel/bpf/helpers.c           |   2 +-
 kernel/bpf/local_storage.c     |  12 ++---
 kernel/bpf/lpm_trie.c          |  14 ++---
 kernel/bpf/memalloc.c          |  19 ++++++-
 kernel/bpf/ringbuf.c           |  75 ++++++++++++++++++---------
 kernel/bpf/syscall.c           |  54 ++++++++++++++++++-
 mm/percpu-internal.h           |   4 +-
 mm/percpu.c                    |  35 +++++++++++++
 mm/slab.h                      |  19 ++++---
 mm/slab_common.c               |  52 +++++++++++++------
 mm/slob.c                      |   2 +-
 mm/util.c                      |  15 ++++++
 net/core/bpf_sk_storage.c      |   4 +-
 net/core/sock_map.c            |   2 +-
 net/xdp/xskmap.c               |   2 +-
 28 files changed, 422 insertions(+), 115 deletions(-)
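For reference, here is a rough sketch of the option-3 pairing described
above.  The counter name and the wrapper names below are made up for
illustration; the series itself uses the bpf_map_* alloc/free wrappers and
the ksize_full()/kvsize()/vsize()/percpu_size() helpers listed in the
patches, not this exact code.

    #include <linux/atomic.h>
    #include <linux/slab.h>

    /* hypothetical global counter, to be reported as "BPF:" in /proc/meminfo */
    static atomic64_t bpf_mem_used;

    static inline void *bpf_stat_kmalloc(size_t size, gfp_t flags)
    {
            void *p = kmalloc(size, flags);

            /* after allocating, get the full size from the pointer and add it */
            if (p)
                    atomic64_add(ksize(p), &bpf_mem_used);
            return p;
    }

    static inline void bpf_stat_kfree(void *p)
    {
            /* before freeing, get the full size from the pointer and subtract it */
            if (p)
                    atomic64_sub(ksize(p), &bpf_mem_used);
            kfree(p);
    }

Every allocation has to go through such a wrapper and be freed through its
counterpart; anything freed with a plain kfree() would leak from the
statistic, which is exactly the con noted for option 3 above.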