From patchwork Wed Mar 24 19:06:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12162053
From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo , Christoph Lameter , Andrew Morton , , , Roman Gushchin
Subject: [PATCH rfc 1/4] percpu: implement partial chunk depopulation
Date: Wed, 24 Mar 2021 12:06:23 -0700
Message-ID: <20210324190626.564297-2-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210324190626.564297-1-guro@fb.com>
References: <20210324190626.564297-1-guro@fb.com>
Sender: owner-linux-mm@kvack.org

This patch implements partial depopulation of percpu chunks.

Currently, a chunk can be depopulated only as part of its final
destruction, when there are no more outstanding allocations. However,
to minimize memory waste, it might be useful to depopulate a
partially filled chunk if a small number of outstanding allocations
prevents the chunk from being reclaimed.

This patch implements the following depopulation process: it scans
over the chunk pages, looks for a range of empty and populated pages
and performs the depopulation. To avoid races with new allocations,
the chunk is isolated first. After the depopulation the chunk is
returned to the original slot (but is appended to the tail of the list
to minimize the chances of population).

Because the pcpu_lock is dropped while calling pcpu_depopulate_chunk(),
the chunk can be concurrently moved to a different slot. So we need
to isolate it again on each step. pcpu_alloc_mutex is held, so the
chunk can't be populated/depopulated asynchronously.

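
The page scan described above can be sketched in standalone userspace C.
This is only an illustrative model under stated assumptions, not the
kernel code: struct scan_chunk, struct page_range and
find_depop_ranges() are hypothetical names, the two bool arrays stand
in for the chunk's populated bitmap and its allocation map, and locking
and chunk isolation are left out entirely. As a choice made for this
sketch, the loop runs one index past the last page so that a run ending
at the chunk boundary is reported too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-chunk state. */
struct scan_chunk {
	int nr_pages;
	const bool *populated;	/* page is backed by memory */
	const bool *in_use;	/* page holds at least one allocation */
};

/* Half-open page range [start, end). */
struct page_range {
	int start;
	int end;
};

/*
 * Collect maximal runs of pages that are populated but hold no
 * allocations. At most max_ranges runs are stored in out; further
 * runs are ignored. Returns the number of runs stored.
 */
static size_t find_depop_ranges(const struct scan_chunk *c,
				struct page_range *out, size_t max_ranges)
{
	size_t n = 0;
	int start = -1;
	int i;

	assert(c && out);

	/* i == nr_pages acts as a sentinel that closes a trailing run. */
	for (i = 0; i <= c->nr_pages; i++) {
		bool empty_pop = i < c->nr_pages &&
				 c->populated[i] && !c->in_use[i];

		/* Empty and populated: start or extend [start, i). */
		if (empty_pop) {
			if (start == -1)
				start = i;
			continue;
		}

		/* No active run to close. */
		if (start == -1)
			continue;

		/* Close the active run and reset. */
		if (n < max_ranges) {
			out[n].start = start;
			out[n].end = i;
			n++;
		}
		start = -1;
	}

	return n;
}
```

In the real allocator each reported range would be handed to the
depopulation step while the chunk is isolated; here the ranges are
simply returned to the caller.
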
Signed-off-by: Roman Gushchin
---
 mm/percpu.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index 6596a0a4286e..78c55c73fa28 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2055,6 +2055,96 @@ static void __pcpu_balance_workfn(enum pcpu_chunk_type type)
 	mutex_unlock(&pcpu_alloc_mutex);
 }
 
+/**
+ * pcpu_shrink_populated - scan chunks and release unused pages to the system
+ * @type: chunk type
+ *
+ * Scan over all chunks, find those marked with the depopulate flag and
+ * try to release unused pages to the system. On every attempt clear the
+ * chunk's depopulate flag to avoid wasting CPU by scanning the same
+ * chunk again and again.
+ */
+static void pcpu_shrink_populated(enum pcpu_chunk_type type)
+{
+	struct list_head *pcpu_slot = pcpu_chunk_list(type);
+	struct pcpu_chunk *chunk;
+	int slot, i, off, start;
+
+	spin_lock_irq(&pcpu_lock);
+	for (slot = pcpu_nr_slots - 1; slot >= 0; slot--) {
+restart:
+		list_for_each_entry(chunk, &pcpu_slot[slot], list) {
+			bool isolated = false;
+
+			if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_HIGH)
+				break;
+
+			for (i = 0, start = -1; i < chunk->nr_pages; i++) {
+				if (!chunk->nr_empty_pop_pages)
+					break;
+
+				/*
+				 * If the page is empty and populated, start or
+				 * extend the [start, i) range.
+				 */
+				if (test_bit(i, chunk->populated)) {
+					off = find_first_bit(
+						pcpu_index_alloc_map(chunk, i),
+						PCPU_BITMAP_BLOCK_BITS);
+					if (off >= PCPU_BITMAP_BLOCK_BITS) {
+						if (start == -1)
+							start = i;
+						continue;
+					}
+				}
+
+				/*
+				 * Otherwise check if there is an active range,
+				 * and if yes, depopulate it.
+				 */
+				if (start == -1)
+					continue;
+
+				/*
+				 * Isolate the chunk, so new allocations
+				 * wouldn't be served using this chunk.
+				 * Async releases can still happen.
+				 */
+				if (!list_empty(&chunk->list)) {
+					list_del_init(&chunk->list);
+					isolated = true;
+				}
+
+				spin_unlock_irq(&pcpu_lock);
+				pcpu_depopulate_chunk(chunk, start, i);
+				cond_resched();
+				spin_lock_irq(&pcpu_lock);
+
+				pcpu_chunk_depopulated(chunk, start, i);
+
+				/*
+				 * Reset the range and continue.
+				 */
+				start = -1;
+			}
+
+			if (isolated) {
+				/*
+				 * The chunk could have been moved while
+				 * pcpu_lock wasn't held. Make sure we put
+				 * the chunk back into the slot and restart
+				 * the scanning.
+				 */
+				if (list_empty(&chunk->list))
+					list_add_tail(&chunk->list,
+						      &pcpu_slot[slot]);
+				goto restart;
+			}
+		}
+	}
+	spin_unlock_irq(&pcpu_lock);
+}
+
 /**
  * pcpu_balance_workfn - manage the amount of free chunks and populated pages
  * @work: unused

From patchwork Wed Mar 24 19:06:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12162059
From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo , Christoph Lameter , Andrew Morton , , , Roman Gushchin
Subject: [PATCH rfc 2/4] percpu: split __pcpu_balance_workfn()
Date: Wed, 24 Mar 2021 12:06:24 -0700
Message-ID: <20210324190626.564297-3-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210324190626.564297-1-guro@fb.com>
References: <20210324190626.564297-1-guro@fb.com>
Sender: owner-linux-mm@kvack.org

__pcpu_balance_workfn() became fairly big and hard to follow, but in
fact it consists of two fully independent parts, responsible for the
destruction of excessive free chunks and for the population of the
necessary number of free pages.

In order to simplify the code and prepare for adding new
functionality, split it into two functions:

  1) pcpu_balance_free,
  2) pcpu_balance_populated.

Move the taking/releasing of the pcpu_alloc_mutex to an upper level
to keep the current synchronization in place.

Signed-off-by: Roman Gushchin
Reviewed-by: Dennis Zhou
---
 mm/percpu.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 78c55c73fa28..015d076893f5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1930,31 +1930,22 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
 }
 
 /**
- * __pcpu_balance_workfn - manage the amount of free chunks and populated pages
+ * pcpu_balance_free - manage the amount of free chunks
  * @type: chunk type
  *
- * Reclaim all fully free chunks except for the first one. This is also
- * responsible for maintaining the pool of empty populated pages. However,
- * it is possible that this is called when physical memory is scarce causing
- * OOM killer to be triggered. We should avoid doing so until an actual
- * allocation causes the failure as it is possible that requests can be
- * serviced from already backed regions.
+ * Reclaim all fully free chunks except for the first one.
  */
-static void __pcpu_balance_workfn(enum pcpu_chunk_type type)
+static void pcpu_balance_free(enum pcpu_chunk_type type)
 {
-	/* gfp flags passed to underlying allocators */
-	const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
 	LIST_HEAD(to_free);
 	struct list_head *pcpu_slot = pcpu_chunk_list(type);
 	struct list_head *free_head = &pcpu_slot[pcpu_nr_slots - 1];
 	struct pcpu_chunk *chunk, *next;
-	int slot, nr_to_pop, ret;
 
 	/*
 	 * There's no reason to keep around multiple unused chunks and VM
 	 * areas can be scarce. Destroy all free chunks except for one.
 	 */
-	mutex_lock(&pcpu_alloc_mutex);
 	spin_lock_irq(&pcpu_lock);
 
 	list_for_each_entry_safe(chunk, next, free_head, list) {
@@ -1982,6 +1973,25 @@ static void __pcpu_balance_workfn(enum pcpu_chunk_type type)
 		pcpu_destroy_chunk(chunk);
 		cond_resched();
 	}
+}
+
+/**
+ * pcpu_balance_populated - manage the amount of populated pages
+ * @type: chunk type
+ *
+ * Maintain a certain amount of populated pages to satisfy atomic allocations.
+ * It is possible that this is called when physical memory is scarce causing
+ * OOM killer to be triggered. We should avoid doing so until an actual
+ * allocation causes the failure as it is possible that requests can be
+ * serviced from already backed regions.
+ */
+static void pcpu_balance_populated(enum pcpu_chunk_type type)
+{
+	/* gfp flags passed to underlying allocators */
+	const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+	struct list_head *pcpu_slot = pcpu_chunk_list(type);
+	struct pcpu_chunk *chunk;
+	int slot, nr_to_pop, ret;
 
 	/*
 	 * Ensure there are certain number of free populated pages for
@@ -2051,8 +2061,6 @@ static void __pcpu_balance_workfn(enum pcpu_chunk_type type)
 			goto retry_pop;
 		}
 	}
-
-	mutex_unlock(&pcpu_alloc_mutex);
 }
 
 /**
@@ -2149,14 +2157,18 @@ static void pcpu_shrink_populated(enum pcpu_chunk_type type)
  * pcpu_balance_workfn - manage the amount of free chunks and populated pages
  * @work: unused
  *
- * Call __pcpu_balance_workfn() for each chunk type.
+ * Call pcpu_balance_free() and pcpu_balance_populated() for each chunk type.
  */
 static void pcpu_balance_workfn(struct work_struct *work)
 {
 	enum pcpu_chunk_type type;
 
-	for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++)
-		__pcpu_balance_workfn(type);
+	for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) {
+		mutex_lock(&pcpu_alloc_mutex);
+		pcpu_balance_free(type);
+		pcpu_balance_populated(type);
+		mutex_unlock(&pcpu_alloc_mutex);
+	}
 }
 
 /**

From patchwork Wed Mar 24 19:06:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12162061
From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo , Christoph Lameter , Andrew Morton , , , Roman Gushchin
Subject: [PATCH rfc 3/4] percpu: on demand chunk depopulation
Date: Wed, 24 Mar 2021 12:06:25 -0700
Message-ID: <20210324190626.564297-4-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210324190626.564297-1-guro@fb.com>
References: <20210324190626.564297-1-guro@fb.com>
Sender: owner-linux-mm@kvack.org

To return unused memory to the system, schedule an async depopulation
of percpu chunks.

To balance between scanning too much (and creating overhead because of
pcpu_lock contention) and not scanning enough, let's track the number
of chunks to scan and mark chunks which are potentially a good target
for depopulation with a new boolean flag. The async depopulation work
will clear the flag after trying to depopulate a chunk (successfully
or not).

This commit suggests the following logic: if a chunk
  1) has more than 1/4 of total pages free and populated
  2) isn't a reserved chunk
  3) isn't entirely free
  4) isn't alone in the corresponding slot
it's a good target for depopulation.

If there are two or more such chunks, an async depopulation is
scheduled.

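
The four conditions above can be modelled in a few lines of standalone
C. This is a sketch under stated assumptions rather than the code from
the patch: struct hint_chunk, its fields reserved, fully_free and
chunks_in_slot, and both helper functions are hypothetical names chosen
for illustration. Only the 1/4 threshold, the per-chunk flag and the
global counter come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the state the heuristic looks at. */
struct hint_chunk {
	int nr_pages;		/* total pages in the chunk */
	int nr_empty_pop_pages;	/* pages that are populated but unused */
	bool reserved;		/* chunk is the reserved chunk */
	bool fully_free;	/* chunk has no allocations at all */
	int chunks_in_slot;	/* chunks sharing this chunk's slot */
	bool depopulate;	/* depopulation hint, set at most once */
};

/* Models the global counter of chunks marked for depopulation. */
static int nr_chunks_to_depopulate;

/* True if the chunk satisfies all four conditions listed above. */
static bool is_depop_candidate(const struct hint_chunk *c)
{
	return c->nr_empty_pop_pages > c->nr_pages / 4 &&
	       !c->reserved &&
	       !c->fully_free &&
	       c->chunks_in_slot > 1;
}

/*
 * Mark the chunk if it qualifies and is not marked yet. Returns true
 * when at least two chunks are marked, i.e. when the async
 * depopulation work should be scheduled.
 */
static bool mark_for_depopulation(struct hint_chunk *c)
{
	if (is_depop_candidate(c) && !c->depopulate) {
		c->depopulate = true;
		nr_chunks_to_depopulate++;
	}

	return nr_chunks_to_depopulate >= 2;
}
```

With nr_pages = 16 a chunk qualifies once 5 or more of its pages are
empty and populated, because 16 / 4 is 4 and the comparison is strict.
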
Because chunk population and depopulation are opposite processes which
make little sense together, split out the populating part of
pcpu_balance_populated() into pcpu_grow_populated() and make
pcpu_balance_populated() call into pcpu_grow_populated() or
pcpu_shrink_populated() conditionally.

Signed-off-by: Roman Gushchin
Reported-by: kernel test robot
---
 mm/percpu-internal.h |   1 +
 mm/percpu.c          | 111 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 18b768ac7dca..1c5b92af02eb 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -67,6 +67,7 @@ struct pcpu_chunk {
 
 	void			*data;		/* chunk data */
 	bool			immutable;	/* no [de]population allowed */
+	bool			depopulate;	/* depopulation hint */
 	int			start_offset;	/* the overlap with the previous
 						   region to have a page aligned
 						   base_addr */
diff --git a/mm/percpu.c b/mm/percpu.c
index 015d076893f5..148137f0fc0b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -178,6 +178,12 @@ static LIST_HEAD(pcpu_map_extend_chunks);
  */
 int pcpu_nr_empty_pop_pages;
 
+/*
+ * Track the number of chunks with a lot of free memory.
+ * It's used to release unused pages to the system.
+ */
+static int pcpu_nr_chunks_to_depopulate;
+
 /*
  * The number of populated pages in use by the allocator, protected by
  * pcpu_lock. This number is kept per a unit per chunk (i.e. when a page gets
@@ -1955,6 +1961,11 @@ static void pcpu_balance_free(enum pcpu_chunk_type type)
 		if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
 			continue;
 
+		if (chunk->depopulate) {
+			chunk->depopulate = false;
+			pcpu_nr_chunks_to_depopulate--;
+		}
+
 		list_move(&chunk->list, &to_free);
 	}
 
@@ -1976,7 +1987,7 @@ static void pcpu_balance_free(enum pcpu_chunk_type type)
 }
 
 /**
- * pcpu_balance_populated - manage the amount of populated pages
+ * pcpu_grow_populated - populate chunk(s) to satisfy atomic allocations
  * @type: chunk type
  *
  * Maintain a certain amount of populated pages to satisfy atomic allocations.
@@ -1985,35 +1996,15 @@ static void pcpu_balance_free(enum pcpu_chunk_type type)
  * allocation causes the failure as it is possible that requests can be
  * serviced from already backed regions.
  */
-static void pcpu_balance_populated(enum pcpu_chunk_type type)
+static void pcpu_grow_populated(enum pcpu_chunk_type type, int nr_to_pop)
 {
 	/* gfp flags passed to underlying allocators */
 	const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
 	struct list_head *pcpu_slot = pcpu_chunk_list(type);
 	struct pcpu_chunk *chunk;
-	int slot, nr_to_pop, ret;
+	int slot, ret;
 
-	/*
-	 * Ensure there are certain number of free populated pages for
-	 * atomic allocs. Fill up from the most packed so that atomic
-	 * allocs don't increase fragmentation. If atomic allocation
-	 * failed previously, always populate the maximum amount. This
-	 * should prevent atomic allocs larger than PAGE_SIZE from keeping
-	 * failing indefinitely; however, large atomic allocs are not
-	 * something we support properly and can be highly unreliable and
-	 * inefficient.
-	 */
 retry_pop:
-	if (pcpu_atomic_alloc_failed) {
-		nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH;
-		/* best effort anyway, don't worry about synchronization */
-		pcpu_atomic_alloc_failed = false;
-	} else {
-		nr_to_pop = clamp(PCPU_EMPTY_POP_PAGES_HIGH -
-				  pcpu_nr_empty_pop_pages,
-				  0, PCPU_EMPTY_POP_PAGES_HIGH);
-	}
-
 	for (slot = pcpu_size_to_slot(PAGE_SIZE); slot < pcpu_nr_slots; slot++) {
 		unsigned int nr_unpop = 0, rs, re;
 
@@ -2084,9 +2075,18 @@ static void pcpu_shrink_populated(enum pcpu_chunk_type type)
 		list_for_each_entry(chunk, &pcpu_slot[slot], list) {
 			bool isolated = false;
 
-			if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_HIGH)
+			if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_HIGH ||
+			    pcpu_nr_chunks_to_depopulate < 1)
 				break;
 
+			/*
+			 * Don't try to depopulate a chunk again and again.
+			 */
+			if (!chunk->depopulate)
+				continue;
+			chunk->depopulate = false;
+			pcpu_nr_chunks_to_depopulate--;
+
 			for (i = 0, start = -1; i < chunk->nr_pages; i++) {
 				if (!chunk->nr_empty_pop_pages)
 					break;
@@ -2153,6 +2153,41 @@ static void pcpu_shrink_populated(enum pcpu_chunk_type type)
 	spin_unlock_irq(&pcpu_lock);
 }
 
+/**
+ * pcpu_balance_populated - manage the amount of populated pages
+ * @type: chunk type
+ *
+ * Populate or depopulate chunks to maintain a certain amount
+ * of free pages to satisfy atomic allocations, but not waste
+ * large amounts of memory.
+ */
+static void pcpu_balance_populated(enum pcpu_chunk_type type)
+{
+	int nr_to_pop;
+
+	/*
+	 * Ensure there are certain number of free populated pages for
+	 * atomic allocs. Fill up from the most packed so that atomic
+	 * allocs don't increase fragmentation. If atomic allocation
+	 * failed previously, always populate the maximum amount. This
+	 * should prevent atomic allocs larger than PAGE_SIZE from keeping
+	 * failing indefinitely; however, large atomic allocs are not
+	 * something we support properly and can be highly unreliable and
+	 * inefficient.
+	 */
+	if (pcpu_atomic_alloc_failed) {
+		nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH;
+		/* best effort anyway, don't worry about synchronization */
+		pcpu_atomic_alloc_failed = false;
+		pcpu_grow_populated(type, nr_to_pop);
+	} else if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_HIGH) {
+		nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH - pcpu_nr_empty_pop_pages;
+		pcpu_grow_populated(type, nr_to_pop);
+	} else if (pcpu_nr_chunks_to_depopulate > 0) {
+		pcpu_shrink_populated(type);
+	}
+}
+
 /**
  * pcpu_balance_workfn - manage the amount of free chunks and populated pages
  * @work: unused
@@ -2188,6 +2223,7 @@ void free_percpu(void __percpu *ptr)
 	int size, off;
 	bool need_balance = false;
 	struct list_head *pcpu_slot;
+	struct pcpu_chunk *pos;
 
 	if (!ptr)
 		return;
@@ -2207,15 +2243,36 @@ void free_percpu(void __percpu *ptr)
 
 	pcpu_memcg_free_hook(chunk, off, size);
 
-	/* if there are more than one fully free chunks, wake up grim reaper */
 	if (chunk->free_bytes == pcpu_unit_size) {
-		struct pcpu_chunk *pos;
-
+		/*
+		 * If there are more than one fully free chunks,
+		 * wake up grim reaper.
+		 */
 		list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
 			if (pos != chunk) {
 				need_balance = true;
 				break;
 			}
+
+	} else if (chunk->nr_empty_pop_pages > chunk->nr_pages / 4) {
+		/*
+		 * If there is more than one chunk in the slot and
+		 * at least 1/4 of its pages are empty, mark the chunk
+		 * as a target for the depopulation. If there is more
+		 * than one chunk like this, schedule an async balancing.
+		 */
+		int nslot = pcpu_chunk_slot(chunk);
+
+		list_for_each_entry(pos, &pcpu_slot[nslot], list)
+			if (pos != chunk && !chunk->depopulate &&
+			    !chunk->immutable) {
+				chunk->depopulate = true;
+				pcpu_nr_chunks_to_depopulate++;
+				break;
+			}
+
+		if (pcpu_nr_chunks_to_depopulate > 1)
+			need_balance = true;
 	}
 
 	trace_percpu_free_percpu(chunk->base_addr, off, ptr);

From patchwork Wed Mar 24 19:06:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12162057
From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo , Christoph Lameter , Andrew Morton , , , Roman Gushchin
Subject: [PATCH rfc 4/4] percpu: fix a comment about the chunks ordering
Date: Wed, 24 Mar 2021 12:06:26 -0700
Message-ID: <20210324190626.564297-5-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210324190626.564297-1-guro@fb.com>
References: <20210324190626.564297-1-guro@fb.com>
Sender: owner-linux-mm@kvack.org

Since the commit 3e54097beb22 ("percpu: manage chunks based on
contig_bits instead of free_bytes") chunks are sorted based on the
size of the biggest continuous free area instead of the total number
of free bytes. Update the corresponding comment to reflect this.

Signed-off-by: Roman Gushchin
---
 mm/percpu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 148137f0fc0b..08fb6e5d3232 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -99,7 +99,10 @@
 
 #include "percpu-internal.h"
 
-/* the slots are sorted by free bytes left, 1-31 bytes share the same slot */
+/*
+ * The slots are sorted by the size of the biggest continuous free area.
+ * 1-31 bytes share the same slot.
+ */
 #define PCPU_SLOT_BASE_SHIFT		5
 /* chunks in slots below this are subject to being sidelined on failed alloc */
 #define PCPU_SLOT_FAIL_THRESHOLD	3