From patchwork Mon Dec 12 00:37:02 2022
From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com,
 songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org,
 tj@kernel.org, dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org,
 penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
 vbabka@suse.cz, roman.gushchin@linux.dev, 42.hyeyoo@gmail.com
Cc: linux-mm@kvack.org, bpf@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next 0/9] mm, bpf: Add BPF into /proc/meminfo
Date: Mon, 12 Dec 2022 00:37:02 +0000
Message-Id: <20221212003711.24977-1-laoar.shao@gmail.com>
Currently there's no way to get BPF memory usage; we can only estimate
it via bpftool or memcg, neither of which is reliable.

- bpftool
  `bpftool {map,prog} show` can show us the memlock of each map and
  prog, but the memlock value can differ greatly from the real memory
  size. The memlock of a bpf object is approximately
  `round_up(key_size + value_size, 8) * max_entries`, so
  1) it can't apply to non-preallocated bpf maps, whose real memory
     size may increase or decrease dynamically;
  2) the element size of some bpf maps is not `key_size + value_size`,
     for example the element size of an htab is
     `sizeof(struct htab_elem) + round_up(key_size, 8) +
     round_up(value_size, 8)`.
  That said, the difference between these two values may be very large
  if key_size and value_size are small. For example, in my
  verification the memlock and the real memory size of a preallocated
  hash map are:

  $ grep BPF /proc/meminfo
  BPF:          1026048 B   <<< the size of the preallocated memalloc pool

  (create hash map)

  $ bpftool map show
  3: hash  name count_map  flags 0x0
          key 4B  value 4B  max_entries 1048576  memlock 8388608B

  $ grep BPF /proc/meminfo
  BPF:         84919344 B

  So the real memory size is $((84919344 - 1026048)), i.e. 83893296
  bytes, while the memlock is only 8388608 bytes.

- memcg
  With memcg we only know that the BPF memory usage is less than
  memory.usage_in_bytes (or memory.current in v2). Furthermore, we only
  know that the BPF memory usage is less than $MemTotal if the BPF
  object is charged into the root memcg :)

So we need a way to get the BPF memory usage, especially as more and
more bpf programs are running in production environments. BPF memory
usage is not trivial, and it deserves a new item in /proc/meminfo.

This patchset introduces a solution to calculate the BPF memory usage.
The solution is similar to how memory is charged into memcg, so it is
easy to understand. It counts three types of memory usage:

- page
  Allocated via kmalloc, vmalloc, kmem_cache_alloc, direct page
  allocation and their families. When a page is allocated, we count its
  size and mark the head page, then check the head page when the page
  is freed.
- slab
  Allocated via kmalloc, kmem_cache_alloc and their families. When a
  slab object is allocated, we mark this object in its slab and check
  it when the slab object is freed. That said, we need extra memory to
  store the information of each object in a slab.
- percpu
  Allocated via alloc_percpu and its family. When a percpu area is
  allocated, we mark this area in its percpu chunk and check it when
  the percpu area is freed. That said, we need extra memory to store
  the information of each area in a percpu chunk.

So we only need to annotate the allocations to add to the BPF memory
size; the subtraction of the BPF memory size is handled automatically
when the memory is freed. The annotation can be done in irq, softirq or
process context. To avoid counting nested allocations, for example in
the percpu backing allocator, we reuse __GFP_ACCOUNT to filter them
out; __GFP_ACCOUNT also makes the count consistent with memcg
accounting. (A rough sketch of this annotation scheme follows below.)

To store the information of a slab or a page, we would need a new
member in struct page, but we can do it in a page extension instead,
which avoids changing the size of struct page. So a new page extension,
active_vm, is introduced. Each page and each slab allocated as BPF
memory will have a struct active_vm associated with it. It is named
active_vm because we can easily extend it to other areas; for example,
in the future we may use it to count other kinds of memory usage.
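To make the annotation scheme concrete, here is a rough sketch in the
spirit of the description above; the helper names active_vm_item_set()
/ active_vm_item_clear() and the item ACTIVE_VM_BPF are placeholders
for illustration, not necessarily the API the patches introduce:

	/* Mark the current context so that every allocation carrying
	 * __GFP_ACCOUNT issued inside it is counted as BPF memory.
	 * The matching subtraction happens automatically at free time,
	 * because the page/slab/percpu area was marked at allocation.
	 */
	static void *bpf_map_area_alloc_counted(size_t size)
	{
		void *ptr;

		active_vm_item_set(ACTIVE_VM_BPF);	/* placeholder name */
		ptr = kvzalloc(size, GFP_KERNEL_ACCOUNT);
		active_vm_item_clear();			/* placeholder name */
		return ptr;
	}

GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT, i.e. exactly the
filter used above to skip nested allocations such as the percpu
backing allocator.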
The new page extension active_vm can be disabled via CONFIG_ACTIVE_VM
at compile time or via the kernel parameter `active_vm=` at runtime
(see the usage sketch after the diffstat below).

Below is the result of this patchset:

$ grep BPF /proc/meminfo
BPF:                1002 kB

Currently only bpf map is supported, and only slub is supported.

Future works:
- support bpf prog
- not sure if it needs to support slab (it seems slab will be deprecated)
- support per-map memory usage
- support per-memcg memory usage

Yafang Shao (9):
  mm: Introduce active vm item
  mm: Allow using active vm in all contexts
  mm: percpu: Account active vm for percpu
  mm: slab: Account active vm for slab
  mm: Account active vm for page
  bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  bpf: Use active vm to account bpf map memory usage

 fs/proc/meminfo.c              |   3 +
 include/linux/active_vm.h      |  73 ++++++++++++
 include/linux/bpf.h            |   8 ++
 include/linux/page_ext.h       |   1 +
 include/linux/sched.h          |   5 +
 kernel/bpf/arraymap.c          |  16 +--
 kernel/bpf/bpf_local_storage.c |   4 +-
 kernel/bpf/memalloc.c          |   5 +
 kernel/bpf/ringbuf.c           |  75 ++++++++----
 kernel/bpf/syscall.c           |  40 ++++++-
 kernel/fork.c                  |   4 +
 mm/Kconfig                     |   8 ++
 mm/Makefile                    |   1 +
 mm/active_vm.c                 | 203 +++++++++++++++++++++++++++++++++
 mm/active_vm.h                 |  74 ++++++++++++
 mm/page_alloc.c                |  14 +++
 mm/page_ext.c                  |   4 +
 mm/percpu-internal.h           |   3 +
 mm/percpu.c                    |  43 +++++++
 mm/slab.h                      |   7 ++
 mm/slub.c                      |   2 +
 21 files changed, 557 insertions(+), 36 deletions(-)
 create mode 100644 include/linux/active_vm.h
 create mode 100644 mm/active_vm.c
 create mode 100644 mm/active_vm.h
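As for the compile-time and runtime switches mentioned above, a usage
sketch, assuming the boot parameter takes the usual on/off values of
other page extensions (the exact syntax is defined by the mm/Kconfig
and mm/page_ext.c changes in this series and may differ):

	# compile time: build the page extension in
	CONFIG_ACTIVE_VM=y

	# boot time: disable the accounting without rebuilding, by
	# appending this to the kernel command line
	active_vm=off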