From patchwork Wed Feb 1 19:50:14 2023
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13124914
Message-ID: <20230201195104.411744803@redhat.com>
User-Agent: quilt/0.67
Date: Wed, 01 Feb 2023 16:50:14 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH 1/5] mm/vmstat: remove remote node draining
References: <20230201195013.881721887@redhat.com>

Draining of pages from the local pcp for a remote zone was necessary
since:

"Note that remote node draining is a somewhat esoteric feature that is
required on large NUMA systems because otherwise significant portions
of system memory can become trapped in pcp queues. The number of pcp is
determined by the number of processors and nodes in a system. A system
with 4 processors and 2 nodes has 8 pcps which is okay. But a system
with 1024 processors and 512 nodes has 512k pcps with a high potential
for large amount of memory being caught in them."

Since commit 443c2accd1b6679a1320167f8f56eed6536b806e
("mm/page_alloc: remotely drain per-cpu lists"), drain_all_pages() is
able to remotely free those pages when necessary.
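
The arithmetic in the quoted note can be checked with a short sketch. It
assumes, as the note does, one pcp per (processor, node) pair; the helper
names pcp_count() and print_pcp_examples() are hypothetical and are not
kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): number of per-cpu pagesets
 * under the simplifying assumption of the quoted note, i.e. one pcp
 * per (processor, node) pair.
 */
static unsigned long pcp_count(unsigned long nr_cpus, unsigned long nr_nodes)
{
	assert(nr_cpus > 0 && nr_nodes > 0);
	return nr_cpus * nr_nodes;
}

/* Print the pcp count for the two configurations named in the note. */
static void print_pcp_examples(void)
{
	printf("4 cpus, 2 nodes: %lu pcps\n", pcp_count(4, 2));
	printf("1024 cpus, 512 nodes: %lu pcps\n", pcp_count(1024, 512));
}
```

The count grows with the product of processors and nodes, which is why
memory held in remote pcps only becomes a concern on large NUMA systems.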

Signed-off-by: Marcelo Tosatti

Index: linux-vmstat-remote/include/linux/mmzone.h
===================================================================
--- linux-vmstat-remote.orig/include/linux/mmzone.h
+++ linux-vmstat-remote/include/linux/mmzone.h
@@ -577,9 +577,6 @@ struct per_cpu_pages {
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	short free_factor;	/* batch scaling factor during free */
-#ifdef CONFIG_NUMA
-	short expire;		/* When 0, remote pagesets are drained */
-#endif
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
 	struct list_head lists[NR_PCP_LISTS];
Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -803,7 +803,7 @@ static int fold_diff(int *zone_diff, int
  *
  * The function returns the number of global counters updated.
  */
-static int refresh_cpu_vm_stats(bool do_pagesets)
+static int refresh_cpu_vm_stats(void)
 {
 	struct pglist_data *pgdat;
 	struct zone *zone;
@@ -814,9 +814,6 @@ static int refresh_cpu_vm_stats(bool do_
 
 	for_each_populated_zone(zone) {
 		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
-#ifdef CONFIG_NUMA
-		struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
-#endif
 
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
 			int v;
@@ -826,44 +823,8 @@ static int refresh_cpu_vm_stats(bool do_
 
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
-#ifdef CONFIG_NUMA
-				/* 3 seconds idle till flush */
-				__this_cpu_write(pcp->expire, 3);
-#endif
 			}
 		}
-#ifdef CONFIG_NUMA
-
-		if (do_pagesets) {
-			cond_resched();
-			/*
-			 * Deal with draining the remote pageset of this
-			 * processor
-			 *
-			 * Check if there are pages remaining in this pageset
-			 * if not then there is nothing to expire.
-			 */
-			if (!__this_cpu_read(pcp->expire) ||
-			       !__this_cpu_read(pcp->count))
-				continue;
-
-			/*
-			 * We never drain zones local to this processor.
-			 */
-			if (zone_to_nid(zone) == numa_node_id()) {
-				__this_cpu_write(pcp->expire, 0);
-				continue;
-			}
-
-			if (__this_cpu_dec_return(pcp->expire))
-				continue;
-
-			if (__this_cpu_read(pcp->count)) {
-				drain_zone_pages(zone, this_cpu_ptr(pcp));
-				changes++;
-			}
-		}
-#endif
 	}
 
 	for_each_online_pgdat(pgdat) {
@@ -1864,7 +1825,7 @@ int sysctl_stat_interval __read_mostly =
 #ifdef CONFIG_PROC_FS
 static void refresh_vm_stats(struct work_struct *work)
 {
-	refresh_cpu_vm_stats(true);
+	refresh_cpu_vm_stats();
 }
 
 int vmstat_refresh(struct ctl_table *table, int write,
@@ -1928,7 +1889,7 @@ int vmstat_refresh(struct ctl_table *tab
 
 static void vmstat_update(struct work_struct *w)
 {
-	if (refresh_cpu_vm_stats(true)) {
+	if (refresh_cpu_vm_stats()) {
 		/*
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the
@@ -1991,7 +1952,7 @@ void quiet_vmstat(void)
 	 * it would be too expensive from this path.
 	 * vmstat_shepherd will take care about that for us.
 	 */
-	refresh_cpu_vm_stats(false);
+	refresh_cpu_vm_stats();
 }
 
 /*

From patchwork Wed Feb 1 19:50:15 2023
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13124915
Message-ID: <20230201195104.436627422@redhat.com>
User-Agent: quilt/0.67
Date: Wed, 01 Feb 2023 16:50:15 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH 2/5] mm/vmstat: switch counter modification to cmpxchg
References: <20230201195013.881721887@redhat.com>

In preparation to switch vmstat shepherd to flush
per-CPU counters remotely, switch all functions that
modify the counters to use cmpxchg.

To test the performance difference, a page allocator microbenchmark:
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench01.c
with loops=1000000 was used, on Intel Core i7-11850H @ 2.50GHz.

For the single_page_alloc_free test, which does

	/** Loop to measure **/
	for (i = 0; i < rec->loops; i++) {
		my_page = alloc_page(gfp_mask);
		if (unlikely(my_page == NULL))
			return 0;
		__free_page(my_page);
	}

Unit is cycles.

Vanilla		Patched		Diff
159		156		-1.9%

Signed-off-by: Marcelo Tosatti

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -334,6 +334,188 @@ void set_pgdat_percpu_threshold(pg_data_
 	}
 }
 
+#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
+/*
+ * If we have cmpxchg_local support then we do not need to incur the overhead
+ * that comes with local_irq_save/restore if we use this_cpu_cmpxchg.
+ *
+ * mod_state() modifies the zone counter state through atomic per cpu
+ * operations.
+ *
+ * Overstep mode specifies how overstep should handled:
+ *     0       No overstepping
+ *     1       Overstepping half of threshold
+ *     -1      Overstepping minus half of threshold
+ */
+static inline void mod_zone_state(struct zone *zone, enum zone_stat_item item,
+				  long delta, int overstep_mode)
+{
+	struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats;
+	s8 __percpu *p = pcp->vm_stat_diff + item;
+	long o, n, t, z;
+
+	do {
+		z = 0;  /* overflow to zone counters */
+
+		/*
+		 * The fetching of the stat_threshold is racy. We may apply
+		 * a counter threshold to the wrong the cpu if we get
+		 * rescheduled while executing here. However, the next
+		 * counter update will apply the threshold again and
+		 * therefore bring the counter under the threshold again.
+		 *
+		 * Most of the time the thresholds are the same anyways
+		 * for all cpus in a zone.
+		 */
+		t = this_cpu_read(pcp->stat_threshold);
+
+		o = this_cpu_read(*p);
+		n = delta + o;
+
+		if (abs(n) > t) {
+			int os = overstep_mode * (t >> 1);
+
+			/* Overflow must be added to zone counters */
+			z = n + os;
+			n = -os;
+		}
+	} while (this_cpu_cmpxchg(*p, o, n) != o);
+
+	if (z)
+		zone_page_state_add(z, zone, item);
+}
+
+void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
+			 long delta)
+{
+	mod_zone_state(zone, item, delta, 0);
+}
+EXPORT_SYMBOL(mod_zone_page_state);
+
+void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
+			   long delta)
+{
+	mod_zone_state(zone, item, delta, 0);
+}
+EXPORT_SYMBOL(__mod_zone_page_state);
+
+void inc_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+	mod_zone_state(page_zone(page), item, 1, 1);
+}
+EXPORT_SYMBOL(inc_zone_page_state);
+
+void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+	mod_zone_state(page_zone(page), item, 1, 1);
+}
+EXPORT_SYMBOL(__inc_zone_page_state);
+
+void dec_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+	mod_zone_state(page_zone(page), item, -1, -1);
+}
+EXPORT_SYMBOL(dec_zone_page_state);
+
+void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+	mod_zone_state(page_zone(page), item, -1, -1);
+}
+EXPORT_SYMBOL(__dec_zone_page_state);
+
+static inline void mod_node_state(struct pglist_data *pgdat,
+				  enum node_stat_item item,
+				  int delta, int overstep_mode)
+{
+	struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
+	s8 __percpu *p = pcp->vm_node_stat_diff + item;
+	long o, n, t, z;
+
+	if (vmstat_item_in_bytes(item)) {
+		/*
+		 * Only cgroups use subpage accounting right now; at
+		 * the global level, these items still change in
+		 * multiples of whole pages. Store them as pages
+		 * internally to keep the per-cpu counters compact.
+		 */
+		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
+		delta >>= PAGE_SHIFT;
+	}
+
+	do {
+		z = 0;  /* overflow to node counters */
+
+		/*
+		 * The fetching of the stat_threshold is racy. We may apply
+		 * a counter threshold to the wrong the cpu if we get
+		 * rescheduled while executing here. However, the next
+		 * counter update will apply the threshold again and
+		 * therefore bring the counter under the threshold again.
+		 *
+		 * Most of the time the thresholds are the same anyways
+		 * for all cpus in a node.
+		 */
+		t = this_cpu_read(pcp->stat_threshold);
+
+		o = this_cpu_read(*p);
+		n = delta + o;
+
+		if (abs(n) > t) {
+			int os = overstep_mode * (t >> 1);
+
+			/* Overflow must be added to node counters */
+			z = n + os;
+			n = -os;
+		}
+	} while (this_cpu_cmpxchg(*p, o, n) != o);
+
+	if (z)
+		node_page_state_add(z, pgdat, item);
+}
+
+void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
+			 long delta)
+{
+	mod_node_state(pgdat, item, delta, 0);
+}
+EXPORT_SYMBOL(mod_node_page_state);
+
+void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
+			   long delta)
+{
+	mod_node_state(pgdat, item, delta, 0);
+}
+EXPORT_SYMBOL(__mod_node_page_state);
+
+void inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
+{
+	mod_node_state(pgdat, item, 1, 1);
+}
+
+void inc_node_page_state(struct page *page, enum node_stat_item item)
+{
+	mod_node_state(page_pgdat(page), item, 1, 1);
+}
+EXPORT_SYMBOL(inc_node_page_state);
+
+void __inc_node_page_state(struct page *page, enum node_stat_item item)
+{
+	mod_node_state(page_pgdat(page), item, 1, 1);
+}
+EXPORT_SYMBOL(__inc_node_page_state);
+
+void dec_node_page_state(struct page *page, enum node_stat_item item)
+{
+	mod_node_state(page_pgdat(page), item, -1, -1);
+}
+EXPORT_SYMBOL(dec_node_page_state);
+
+void __dec_node_page_state(struct page *page, enum node_stat_item item)
+{
+	mod_node_state(page_pgdat(page), item, -1, -1);
+}
+EXPORT_SYMBOL(__dec_node_page_state);
+#else
 /*
  * For use when we know that interrupts are disabled,
  * or when we know that preemption is disabled and that
@@ -541,149 +723,6 @@ void __dec_node_page_state(struct page *
 }
 EXPORT_SYMBOL(__dec_node_page_state);
 
-#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
-/*
- * If we have cmpxchg_local support then we do not need to incur the overhead
- * that comes with local_irq_save/restore if we use this_cpu_cmpxchg.
- *
- * mod_state() modifies the zone counter state through atomic per cpu
- * operations.
- *
- * Overstep mode specifies how overstep should handled:
- *     0       No overstepping
- *     1       Overstepping half of threshold
- *     -1      Overstepping minus half of threshold
-*/
-static inline void mod_zone_state(struct zone *zone,
-       enum zone_stat_item item, long delta, int overstep_mode)
-{
-	struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats;
-	s8 __percpu *p = pcp->vm_stat_diff + item;
-	long o, n, t, z;
-
-	do {
-		z = 0;  /* overflow to zone counters */
-
-		/*
-		 * The fetching of the stat_threshold is racy. We may apply
-		 * a counter threshold to the wrong the cpu if we get
-		 * rescheduled while executing here. However, the next
-		 * counter update will apply the threshold again and
-		 * therefore bring the counter under the threshold again.
-		 *
-		 * Most of the time the thresholds are the same anyways
-		 * for all cpus in a zone.
-		 */
-		t = this_cpu_read(pcp->stat_threshold);
-
-		o = this_cpu_read(*p);
-		n = delta + o;
-
-		if (abs(n) > t) {
-			int os = overstep_mode * (t >> 1) ;
-
-			/* Overflow must be added to zone counters */
-			z = n + os;
-			n = -os;
-		}
-	} while (this_cpu_cmpxchg(*p, o, n) != o);
-
-	if (z)
-		zone_page_state_add(z, zone, item);
-}
-
-void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
-			 long delta)
-{
-	mod_zone_state(zone, item, delta, 0);
-}
-EXPORT_SYMBOL(mod_zone_page_state);
-
-void inc_zone_page_state(struct page *page, enum zone_stat_item item)
-{
-	mod_zone_state(page_zone(page), item, 1, 1);
-}
-EXPORT_SYMBOL(inc_zone_page_state);
-
-void dec_zone_page_state(struct page *page, enum zone_stat_item item)
-{
-	mod_zone_state(page_zone(page), item, -1, -1);
-}
-EXPORT_SYMBOL(dec_zone_page_state);
-
-static inline void mod_node_state(struct pglist_data *pgdat,
-       enum node_stat_item item, int delta, int overstep_mode)
-{
-	struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
-	s8 __percpu *p = pcp->vm_node_stat_diff + item;
-	long o, n, t, z;
-
-	if (vmstat_item_in_bytes(item)) {
-		/*
-		 * Only cgroups use subpage accounting right now; at
-		 * the global level, these items still change in
-		 * multiples of whole pages. Store them as pages
-		 * internally to keep the per-cpu counters compact.
-		 */
-		VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1));
-		delta >>= PAGE_SHIFT;
-	}
-
-	do {
-		z = 0;  /* overflow to node counters */
-
-		/*
-		 * The fetching of the stat_threshold is racy. We may apply
-		 * a counter threshold to the wrong the cpu if we get
-		 * rescheduled while executing here. However, the next
-		 * counter update will apply the threshold again and
-		 * therefore bring the counter under the threshold again.
-		 *
-		 * Most of the time the thresholds are the same anyways
-		 * for all cpus in a node.
-		 */
-		t = this_cpu_read(pcp->stat_threshold);
-
-		o = this_cpu_read(*p);
-		n = delta + o;
-
-		if (abs(n) > t) {
-			int os = overstep_mode * (t >> 1) ;
-
-			/* Overflow must be added to node counters */
-			z = n + os;
-			n = -os;
-		}
-	} while (this_cpu_cmpxchg(*p, o, n) != o);
-
-	if (z)
-		node_page_state_add(z, pgdat, item);
-}
-
-void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
-					long delta)
-{
-	mod_node_state(pgdat, item, delta, 0);
-}
-EXPORT_SYMBOL(mod_node_page_state);
-
-void inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
-{
-	mod_node_state(pgdat, item, 1, 1);
-}
-
-void inc_node_page_state(struct page *page, enum node_stat_item item)
-{
-	mod_node_state(page_pgdat(page), item, 1, 1);
-}
-EXPORT_SYMBOL(inc_node_page_state);
-
-void dec_node_page_state(struct page *page, enum node_stat_item item)
-{
-	mod_node_state(page_pgdat(page), item, -1, -1);
-}
-EXPORT_SYMBOL(dec_node_page_state);
-#else
 /*
  * Use interrupt disable to serialize counter updates
  */

From patchwork Wed Feb 1 19:50:16 2023
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13124917
Message-ID: <20230201195104.460373427@redhat.com>
User-Agent: quilt/0.67
Date: Wed, 01 Feb 2023 16:50:16 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH 3/5] mm/vmstat: use cmpxchg loop in cpu_vm_stats_fold
References: <20230201195013.881721887@redhat.com>

In preparation to switch vmstat shepherd to flush
per-CPU counters remotely, use a cmpxchg loop
instead of a pair of read/write instructions.

Signed-off-by: Marcelo Tosatti

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -885,7 +885,7 @@ static int refresh_cpu_vm_stats(void)
 }
 
 /*
- * Fold the data for an offline cpu into the global array.
+ * Fold the data for a cpu into the global array.
  * There cannot be any access by the offline cpu and therefore
  * synchronization is simplified.
  */
@@ -906,8 +906,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				do {
+					v = pzstats->vm_stat_diff[i];
+				} while (cmpxchg(&pzstats->vm_stat_diff[i], v, 0) != v);
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -917,8 +918,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_event[i]) {
 				unsigned long v;
 
-				v = pzstats->vm_numa_event[i];
-				pzstats->vm_numa_event[i] = 0;
+				do {
+					v = pzstats->vm_numa_event[i];
+				} while (cmpxchg(&pzstats->vm_numa_event[i], v, 0) != v);
 				zone_numa_event_add(v, zone, i);
 			}
 		}
@@ -934,8 +936,9 @@ void cpu_vm_stats_fold(int cpu)
 			if (p->vm_node_stat_diff[i]) {
 				int v;
 
-				v = p->vm_node_stat_diff[i];
-				p->vm_node_stat_diff[i] = 0;
+				do {
+					v = p->vm_node_stat_diff[i];
+				} while (cmpxchg(&p->vm_node_stat_diff[i], v, 0) != v);
 				atomic_long_add(v, &pgdat->vm_stat[i]);
 				global_node_diff[i] += v;
 			}

From patchwork Wed Feb 1 19:50:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13124916
Message-ID: <20230201195104.484635830@redhat.com>
User-Agent: quilt/0.67
Date: Wed, 01 Feb 2023 16:50:17 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH 4/5] mm/vmstat: switch vmstat shepherd to flush per-CPU counters remotely
References: <20230201195013.881721887@redhat.com>

Now that the counters are modified via cmpxchg both CPU-locally (via the
accounting functions) and remotely (via cpu_vm_stats_fold), it's possible
to switch vmstat_shepherd to perform the per-CPU vmstats folding remotely.

This fixes the following two problems:

1. A customer provided evidence indicating that the idle tick was
   stopped while CPU-specific vmstat counters still remained populated.
   Thus one can only assume quiet_vmstat() was not invoked on return to
   the idle loop. If I understand correctly, this divergence might
   erroneously prevent a reclaim attempt by kswapd.
   If the number of zone-specific free pages is below the per-cpu drift
   value, then zone_page_state_snapshot() is used to compute a more
   accurate view of that statistic. Thus any task blocked on the NUMA
   node specific pfmemalloc_wait queue will be unable to make significant
   progress via direct reclaim unless it is killed after being woken up
   by kswapd (see throttle_direct_reclaim()).

2. With a SCHED_FIFO task that busy loops on a given CPU, and the kworker
   for that CPU at SCHED_OTHER priority, queuing work to sync the per-CPU
   vmstats will either cause that work to never execute, or cause stalld
   (i.e. the stall daemon) to boost the kworker priority, which causes a
   latency violation.

Signed-off-by: Marcelo Tosatti

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -2007,6 +2007,23 @@ static void vmstat_shepherd(struct work_
 
 static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
 
+#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
+/* Flush counters remotely if CPU uses cmpxchg to update its per-CPU counters */
+static void vmstat_shepherd(struct work_struct *w)
+{
+	int cpu;
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		cpu_vm_stats_fold(cpu);
+		cond_resched();
+	}
+	cpus_read_unlock();
+
+	schedule_delayed_work(&shepherd,
+		round_jiffies_relative(sysctl_stat_interval));
+}
+#else
 static void vmstat_shepherd(struct work_struct *w)
 {
 	int cpu;
@@ -2026,6 +2043,7 @@ static void vmstat_shepherd(struct work_
 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
 }
+#endif
 
 static void __init start_shepherd_timer(void)
 {

From patchwork Wed Feb 1 19:50:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 13124919
Message-ID: <20230201195104.507817318@redhat.com>
User-Agent: quilt/0.67
Date: Wed, 01 Feb 2023 16:50:18 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH 5/5] mm/vmstat: refresh stats remotely instead of via work item
References: <20230201195013.881721887@redhat.com>
MIME-Version: 1.0

Refresh per-CPU stats remotely, instead of queueing work items, for the
stat_refresh procfs method.

This fixes a sosreport hang (sosreport uses vmstat_refresh) when a
SCHED_FIFO process is spinning on a CPU.

Signed-off-by: Marcelo Tosatti

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -1865,11 +1865,21 @@ static DEFINE_PER_CPU(struct delayed_wor
 int sysctl_stat_interval __read_mostly = HZ;
 
 #ifdef CONFIG_PROC_FS
+
+#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
+static int refresh_all_vm_stats(void);
+#else
 static void refresh_vm_stats(struct work_struct *work)
 {
 	refresh_cpu_vm_stats();
 }
 
+static int refresh_all_vm_stats(void)
+{
+	return schedule_on_each_cpu(refresh_vm_stats);
+}
+#endif
+
 int vmstat_refresh(struct ctl_table *table, int write,
 		   void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -1889,7 +1899,7 @@ int vmstat_refresh(struct ctl_table *tab
 	 * transiently negative values, report an error here if any of
 	 * the stats is negative, so we know to go looking for imbalance.
 	 */
-	err = schedule_on_each_cpu(refresh_vm_stats);
+	err = refresh_all_vm_stats();
 	if (err)
 		return err;
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
@@ -2009,7 +2019,7 @@ static DECLARE_DEFERRABLE_WORK(shepherd,
 
 #ifdef CONFIG_HAVE_CMPXCHG_LOCAL
 /* Flush counters remotely if CPU uses cmpxchg to update its per-CPU counters */
-static void vmstat_shepherd(struct work_struct *w)
+static int refresh_all_vm_stats(void)
 {
 	int cpu;
 
@@ -2019,7 +2029,12 @@ static void vmstat_shepherd(struct work_
 		cond_resched();
 	}
 	cpus_read_unlock();
+	return 0;
+}
 
+static void vmstat_shepherd(struct work_struct *w)
+{
+	refresh_all_vm_stats();
 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
 }
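
For anyone who wants to experiment with the fetch-and-clear pattern used
in this series outside the kernel, a minimal userspace sketch in C11
follows. It is an illustration under assumptions, not kernel code:
pcp_diff, global_stat, local_mod, fold_one, fold_all and the *_DEMO sizes
are hypothetical names, and C11 atomic_compare_exchange_weak() stands in
for the kernel's cmpxchg(). The point it tries to show is that swapping
in 0 only while the counter still holds the value just read lets a remote
fold run next to the owner's updates without losing or double counting a
delta.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_DEMO  2	/* hypothetical CPU count for this sketch */
#define NR_ITEMS_DEMO 3	/* hypothetical number of stat items */

/* Stand-in for the per-CPU diff arrays (vm_stat_diff[] and friends). */
static _Atomic int pcp_diff[NR_CPUS_DEMO][NR_ITEMS_DEMO];
/* Stand-in for the global counters (zone->vm_stat[] and friends). */
static atomic_long global_stat[NR_ITEMS_DEMO];

/* Owner-CPU update path: apply a delta with a compare-and-swap loop. */
static void local_mod(int cpu, int item, int delta)
{
	int old;

	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	assert(item >= 0 && item < NR_ITEMS_DEMO);

	old = atomic_load(&pcp_diff[cpu][item]);
	/* On failure 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak(&pcp_diff[cpu][item], &old,
					     old + delta))
		;
}

/*
 * Remote fold of one counter: read the diff, then swap in 0 only if the
 * diff is still the value that was read. A concurrent local_mod() makes
 * the swap fail and the loop retry with the fresh value.
 */
static int fold_one(int cpu, int item)
{
	int v = atomic_load(&pcp_diff[cpu][item]);

	while (!atomic_compare_exchange_weak(&pcp_diff[cpu][item], &v, 0))
		;
	atomic_fetch_add(&global_stat[item], v);
	return v;
}

/* Shepherd-style pass: fold every counter of every CPU from one thread. */
static long fold_all(void)
{
	long folded = 0;

	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		for (int item = 0; item < NR_ITEMS_DEMO; item++)
			folded += fold_one(cpu, item);
	return folded;
}
```

As a usage example, calling local_mod(0, 1, 5) and local_mod(0, 1, -2)
followed by fold_one(0, 1) moves a net delta of 3 into global_stat[1] and
leaves the per-CPU slot at 0. With a plain read followed by a store of 0,
an update landing between the two steps would be silently dropped, which
is the window the cmpxchg loop is meant to close.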