From: Waiman Long
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
    Rasmus Villemoes
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, Ira Weiny, Mike Rapoport, David Rientjes,
    Roman Gushchin, Rafael Aquini, Waiman Long
Subject: [PATCH v5 0/4] mm/page_owner: Extend page_owner to show memcg information
Date: Mon, 7 Feb 2022 19:05:28 -0500
Message-Id: <20220208000532.1054311-1-longman@redhat.com>
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 12737953

v5:
 - Apply the following changes to patch 3
   1) Make
      cgroup_name() write directly into kbuf without using an
      intermediate buffer.
   2) Change the terminology from "offline memcg" to "dying memcg" to
      align better with similar terms used elsewhere in the kernel.

v4:
 - Take rcu_read_lock() when memcg is being accessed as suggested by
   Michal.
 - Make print_page_owner_memcg() return the new offset into the buffer
   and put the CONFIG_MEMCG block inside as suggested by Mike.
 - Directly use TASK_COMM_LEN as the length of the name buffer as
   suggested by Roman.

v3:
 - Add unlikely() to patch 1 and clarify that -1 will not be returned.
 - Use a helper function to print out memcg information in patch 3.
 - Add a new patch 4 to store the task command name in the page_owner
   structure.

While debugging the constant increase in percpu memory consumption on
a system that spawned a large number of containers, it was found that
a lot of dying mem_cgroup structures remained in place without being
freed. Further investigation indicated that those mem_cgroup structures
were pinned by some pages.

In order to find out what those pages are, the existing page_owner
debugging tool is extended to show memory cgroup information and whether
those memcgs are dying or not.
With the enhanced page_owner tool, the following is a typical page that
pinned the mem_cgroup structure in my test case:

  Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 70984 (podman), ts 5421278969115 ns, free_ts 5420935666638 ns
  PFN 3205061 type Movable Block 6259 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
   prep_new_page+0x8e/0xb0
   get_page_from_freelist+0xc4d/0xe50
   __alloc_pages+0x172/0x320
   alloc_pages_vma+0x84/0x230
   shmem_alloc_page+0x3f/0x90
   shmem_alloc_and_acct_page+0x76/0x1c0
   shmem_getpage_gfp+0x48d/0x890
   shmem_write_begin+0x36/0xc0
   generic_perform_write+0xed/0x1d0
   __generic_file_write_iter+0xdc/0x1b0
   generic_file_write_iter+0x5d/0xb0
   new_sync_write+0x11f/0x1b0
   vfs_write+0x1ba/0x2a0
   ksys_write+0x59/0xd0
   do_syscall_64+0x37/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Charged to dying memcg libpod-conmon-fbc62060b5377479a7371cc16c5c596002945f2aa00d3d6d73a0cd0d148b6637.scope

So the page was not freed because it was part of a shmem segment. That
is useful information that can help users diagnose similar problems.

With cgroup v1, /proc/cgroups can be read to find out the total number
of memory cgroups (online + dying). With cgroup v2, the cgroup.stat of
the root cgroup can be read to find the number of dying cgroups (most
likely pinned by dying memcgs).

The page_owner feature is not supposed to be enabled on production
systems due to its memory overhead. However, if it is suspected that
dying memcgs are increasing over time, a test environment with
page_owner enabled can be set up with an appropriate workload for
further analysis of what may be causing the increasing number of dying
memcgs.
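As a rough illustration of the cgroup v2 check described above, the
following userspace sketch pulls the nr_dying_descendants value out of
a cgroup.stat file. It is not part of this series. The helper names are
hypothetical, and the path a caller would pass (for example
/sys/fs/cgroup/cgroup.stat) assumes cgroup v2 is mounted at its usual
location.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the "nr_dying_descendants <N>" line in the
 * text of a cgroup.stat file. Returns 0 and stores N in *out on
 * success, -1 if the key is missing or malformed.
 */
int parse_nr_dying_descendants(const char *text, long *out)
{
	const char *key = "nr_dying_descendants ";
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, strlen(key)) == 0)
			return sscanf(line + strlen(key), "%ld", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}

/*
 * Hypothetical helper: read the cgroup.stat file at `path` (the mount
 * point is an assumption left to the caller) and extract the count.
 */
int read_nr_dying_descendants(const char *path, long *out)
{
	char buf[512];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_nr_dying_descendants(buf, out);
}
```

Sampling this value periodically on a test system is one way to tell
whether the number of dying cgroups keeps growing before going to the
trouble of enabling page_owner.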
Waiman Long (4):
  lib/vsprintf: Avoid redundant work with 0 size
  mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
  mm/page_owner: Print memcg information
  mm/page_owner: Record task command name

 lib/vsprintf.c  |  8 +++---
 mm/page_owner.c | 72 ++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 62 insertions(+), 18 deletions(-)
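For readers unfamiliar with why scnprintf() removes the need for
repeated overrun checks, here is a small userspace sketch. It is not
the code from patch 2. sketch_scnprintf() is a hypothetical stand-in
that mimics the kernel helper's return convention (the number of
characters actually stored, excluding the trailing NUL), and
format_owner() is an invented example caller.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf():
 * returns the number of characters actually written to buf, not
 * counting the trailing NUL, and 0 when size is 0.
 */
int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (i < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp it. */
	return (size_t)i < size ? i : (int)(size - 1);
}

/*
 * Invented example caller: append two fields to kbuf. Because each
 * call returns what was really stored, ret never exceeds count - 1,
 * so no overflow check is needed between the calls.
 */
int format_owner(char *kbuf, size_t count, int pid, const char *comm)
{
	int ret = 0;

	ret += sketch_scnprintf(kbuf + ret, count - ret, "pid %d ", pid);
	ret += sketch_scnprintf(kbuf + ret, count - ret, "(%s)\n", comm);
	return ret;
}
```

With plain snprintf(), the accumulated return value can run past the
end of the buffer after a truncated write, which is why each call would
otherwise need its own bounds check.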