From patchwork Mon Apr 19 22:50:44 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 12212777
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 1/4] percpu: factor out pcpu_check_block_hint()
Date: Mon, 19 Apr 2021 22:50:44 +0000
Message-Id: <20210419225047.3415425-2-dennis@kernel.org>
X-Mailer: git-send-email 2.31.1.368.gbe11c130af-goog
In-Reply-To: <20210419225047.3415425-1-dennis@kernel.org>
References: <20210419225047.3415425-1-dennis@kernel.org>

From: Roman Gushchin

Factor out the pcpu_check_block_hint() helper, which will be useful in
the future. The new function checks if the allocation can likely fit
within the contig hint.

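To illustrate the check outside the kernel, here is a minimal user-space
sketch. It assumes a stripped-down struct block_hint in place of
struct pcpu_block_md and a local align_up() in place of the kernel's
ALIGN() macro; both names are made up for this example. The arithmetic
is meant to follow the helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pcpu_block_md: only the hint fields. */
struct block_hint {
	int contig_hint;       /* length of the largest known free run, in bits */
	int contig_hint_start; /* bit offset at which that run begins */
};

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Same arithmetic as pcpu_check_block_hint(): bit_off is the number of
 * bits lost at the front of the free run to satisfy the alignment, and
 * the request fits if the remainder of the run is large enough.
 */
static bool check_block_hint(const struct block_hint *block, int bits,
			     size_t align)
{
	int bit_off = (int)(align_up((size_t)block->contig_hint_start, align) -
			    (size_t)block->contig_hint_start);

	return bit_off + bits <= block->contig_hint;
}
```

For a hint of 10 free bits starting at bit 3, an 8-bit request aligned
to 4 loses 1 bit to alignment and still fits (1 + 8 <= 10), while a
10-bit request with the same alignment does not. The hint is only a
fast filter, which is why the message says "likely".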
Signed-off-by: Roman Gushchin
Signed-off-by: Dennis Zhou
---
 mm/percpu.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 61339b3d9337..5edc7bd88133 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -306,6 +306,25 @@ static unsigned long pcpu_block_off_to_off(int index, int off)
 	return index * PCPU_BITMAP_BLOCK_BITS + off;
 }
 
+/**
+ * pcpu_check_block_hint - check against the contig hint
+ * @block: block of interest
+ * @bits: size of allocation
+ * @align: alignment of area (max PAGE_SIZE)
+ *
+ * Check to see if the allocation can fit in the block's contig hint.
+ * Note, a chunk uses the same hints as a block so this can also check against
+ * the chunk's contig hint.
+ */
+static bool pcpu_check_block_hint(struct pcpu_block_md *block, int bits,
+				  size_t align)
+{
+	int bit_off = ALIGN(block->contig_hint_start, align) -
+		block->contig_hint_start;
+
+	return bit_off + bits <= block->contig_hint;
+}
+
 /*
  * pcpu_next_hint - determine which hint to use
  * @block: block of interest
@@ -1066,14 +1085,11 @@ static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits,
 	int bit_off, bits, next_off;
 
 	/*
-	 * Check to see if the allocation can fit in the chunk's contig hint.
-	 * This is an optimization to prevent scanning by assuming if it
-	 * cannot fit in the global hint, there is memory pressure and creating
-	 * a new chunk would happen soon.
+	 * This is an optimization to prevent scanning by assuming if the
+	 * allocation cannot fit in the global hint, there is memory pressure
+	 * and creating a new chunk would happen soon.
 	 */
-	bit_off = ALIGN(chunk_md->contig_hint_start, align) -
-		  chunk_md->contig_hint_start;
-	if (bit_off + alloc_bits > chunk_md->contig_hint)
+	if (!pcpu_check_block_hint(chunk_md, alloc_bits, align))
 		return -1;
 
 	bit_off = pcpu_next_hint(chunk_md, alloc_bits);

From patchwork Mon Apr 19 22:50:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 12212779
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 2/4] percpu: use pcpu_free_slot instead of pcpu_nr_slots - 1
Date: Mon, 19 Apr 2021 22:50:45 +0000
Message-Id: <20210419225047.3415425-3-dennis@kernel.org>
X-Mailer: git-send-email 2.31.1.368.gbe11c130af-goog
In-Reply-To: <20210419225047.3415425-1-dennis@kernel.org>
References: <20210419225047.3415425-1-dennis@kernel.org>

This prepares for adding a to_depopulate list and sidelined list after
the free slot in the set of lists in pcpu_slot.

Signed-off-by: Dennis Zhou
Acked-by: Roman Gushchin
---
 mm/percpu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 5edc7bd88133..d462222f4adc 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -135,6 +135,7 @@ static int pcpu_unit_size __ro_after_init;
 static int pcpu_nr_units __ro_after_init;
 static int pcpu_atom_size __ro_after_init;
 int pcpu_nr_slots __ro_after_init;
+int pcpu_free_slot __ro_after_init;
 static size_t pcpu_chunk_struct_size __ro_after_init;
 
 /* cpus with the lowest and highest unit addresses */
@@ -237,7 +238,7 @@ static int __pcpu_size_to_slot(int size)
 static int pcpu_size_to_slot(int size)
 {
 	if (size == pcpu_unit_size)
-		return pcpu_nr_slots - 1;
+		return pcpu_free_slot;
 	return __pcpu_size_to_slot(size);
 }
 
@@ -1806,7 +1807,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 		goto fail;
 	}
 
-	if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
+	if (list_empty(&pcpu_slot[pcpu_free_slot])) {
 		chunk = pcpu_create_chunk(type, pcpu_gfp);
 		if (!chunk) {
 			err = "failed to allocate new chunk";
@@ -1958,7 +1959,7 @@ static void pcpu_balance_free(enum pcpu_chunk_type type)
 {
 	LIST_HEAD(to_free);
 	struct list_head *pcpu_slot = pcpu_chunk_list(type);
-	struct list_head *free_head = &pcpu_slot[pcpu_nr_slots - 1];
+	struct list_head *free_head = &pcpu_slot[pcpu_free_slot];
 	struct pcpu_chunk *chunk, *next;
 
 	/*
@@ -2033,7 +2034,7 @@ static void pcpu_balance_populated(enum pcpu_chunk_type type)
 				  0, PCPU_EMPTY_POP_PAGES_HIGH);
 	}
 
-	for (slot = pcpu_size_to_slot(PAGE_SIZE); slot < pcpu_nr_slots; slot++) {
+	for (slot = pcpu_size_to_slot(PAGE_SIZE); slot <= pcpu_free_slot; slot++) {
 		unsigned int nr_unpop = 0, rs, re;
 
 		if (!nr_to_pop)
@@ -2140,7 +2141,7 @@ void free_percpu(void __percpu *ptr)
 	if (chunk->free_bytes == pcpu_unit_size) {
 		struct pcpu_chunk *pos;
 
-		list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
+		list_for_each_entry(pos, &pcpu_slot[pcpu_free_slot], list)
 			if (pos != chunk) {
 				need_balance = true;
 				break;
@@ -2562,7 +2563,8 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 	 * Allocate chunk slots. The additional last slot is for
 	 * empty chunks.
 	 */
-	pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;
+	pcpu_free_slot = __pcpu_size_to_slot(pcpu_unit_size) + 1;
+	pcpu_nr_slots = pcpu_free_slot + 1;
 	pcpu_chunk_lists = memblock_alloc(pcpu_nr_slots *
 					  sizeof(pcpu_chunk_lists[0]) *
 					  PCPU_NR_CHUNK_TYPES,

From patchwork Mon Apr 19 22:50:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 12212781
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 3/4] percpu: implement partial chunk depopulation
Date: Mon, 19 Apr 2021 22:50:46 +0000
Message-Id: <20210419225047.3415425-4-dennis@kernel.org>
X-Mailer: git-send-email 2.31.1.368.gbe11c130af-goog
In-Reply-To: <20210419225047.3415425-1-dennis@kernel.org>
References: <20210419225047.3415425-1-dennis@kernel.org>

From: Roman Gushchin

This patch implements partial depopulation of percpu chunks.

As of now, a chunk can be depopulated only as a part of the final
destruction, if there are no more outstanding allocations. However, to
minimize memory waste it might be useful to depopulate a partially
filled chunk, if a small number of outstanding allocations prevents the
chunk from being fully reclaimed.

This patch implements the following depopulation process: it scans
over the chunk pages, looks for a range of empty and populated pages
and performs the depopulation. To avoid races with new allocations,
the chunk is isolated beforehand. After the depopulation the chunk is
sidelined to a special list or freed. New allocations prefer using
active chunks to sidelined chunks. If a sidelined chunk is used, it is
reintegrated to the active lists.

The depopulation is scheduled on the free path if the chunk meets all
of the following conditions:
  1) it has at least 1/4 of its total pages free and populated
  2) the system has enough free percpu pages aside from this chunk
  3) it isn't the reserved chunk
  4) it isn't the first chunk
If it's already depopulated but got free populated pages, it's a good
target too. The chunk is moved to a special slot,
pcpu_to_depopulate_slot, chunk->isolated is set, and the balance work
item is scheduled. On isolation, these pages are removed from
pcpu_nr_empty_pop_pages. The chunk is placed back on the
to_depopulate_slot whenever it meets these qualifications.

pcpu_reclaim_populated() iterates over the to_depopulate_slot until it
becomes empty. The depopulation is performed in the reverse direction to
keep populated pages close to the beginning. Depopulated chunks are
sidelined to preferentially avoid them for new allocations. When no
active chunk can satisfy a new allocation, sidelined chunks are first
checked before creating a new chunk.

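The eligibility rules above can be sketched as a small user-space
predicate. This is only an illustration under stated assumptions:
struct chunk_model, its fields and EMPTY_POP_PAGES_HIGH are invented
for the example (the patch references PCPU_EMPTY_POP_PAGES_HIGH without
showing its value, so 4 here is a placeholder), and the comparison uses
">=" for the 1/4 rule to match pcpu_should_reclaim_chunk() in this
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value; the real constant is not shown in this series. */
#define EMPTY_POP_PAGES_HIGH 4

/* Hypothetical, reduced view of a percpu chunk for this sketch. */
struct chunk_model {
	bool isolated;          /* already taken off the active slots */
	bool is_first;          /* models chunk == pcpu_first_chunk */
	bool is_reserved;       /* models chunk == pcpu_reserved_chunk */
	int nr_pages;           /* total pages in the chunk */
	int nr_empty_pop_pages; /* pages that are populated but unused */
};

/*
 * global_empty_pop_pages models pcpu_nr_empty_pop_pages[type]: the
 * count of empty populated pages across the active chunks of this type.
 */
static bool should_reclaim_chunk(const struct chunk_model *c,
				 int global_empty_pop_pages)
{
	assert(c->nr_pages > 0 && c->nr_empty_pop_pages >= 0);

	/* rules 3 and 4: never reclaim the reserved or the first chunk */
	if (c->is_first || c->is_reserved)
		return false;

	/* an isolated chunk that regained empty populated pages qualifies */
	if (c->isolated && c->nr_empty_pop_pages > 0)
		return true;

	/* rule 2: enough spare pages elsewhere; rule 1: at least 1/4 empty */
	return global_empty_pop_pages >
		       EMPTY_POP_PAGES_HIGH + c->nr_empty_pop_pages &&
	       c->nr_empty_pop_pages >= c->nr_pages / 4;
}
```

With a 16-page chunk holding 4 empty populated pages, the sketch
accepts the chunk once the global count exceeds 8 (the placeholder
threshold plus the chunk's own 4 pages) and rejects it at exactly 8.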
Signed-off-by: Roman Gushchin
Co-developed-by: Dennis Zhou
Signed-off-by: Dennis Zhou
---
 mm/percpu-internal.h |   4 +
 mm/percpu-km.c       |   5 ++
 mm/percpu-stats.c    |  12 +--
 mm/percpu-vm.c       |  30 ++++++++
 mm/percpu.c          | 180 +++++++++++++++++++++++++++++++++++++++----
 5 files changed, 211 insertions(+), 20 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 095d7eaa0db4..10604dce806f 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -67,6 +67,8 @@ struct pcpu_chunk {
 
 	void			*data;		/* chunk data */
 	bool			immutable;	/* no [de]population allowed */
+	bool			isolated;	/* isolated from active chunk
+						   slots */
 	int			start_offset;	/* the overlap with the previous
 						   region to have a page aligned
 						   base_addr */
@@ -87,6 +89,8 @@ extern spinlock_t pcpu_lock;
 
 extern struct list_head *pcpu_chunk_lists;
 extern int pcpu_nr_slots;
+extern int pcpu_sidelined_slot;
+extern int pcpu_to_depopulate_slot;
 extern int pcpu_nr_empty_pop_pages[];
 
 extern struct pcpu_chunk *pcpu_first_chunk;
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index 35c9941077ee..c84a9f781a6c 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -118,3 +118,8 @@ static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai)
 
 	return 0;
 }
+
+static bool pcpu_should_reclaim_chunk(struct pcpu_chunk *chunk)
+{
+	return false;
+}
diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c
index f6026dbcdf6b..2125981acfb9 100644
--- a/mm/percpu-stats.c
+++ b/mm/percpu-stats.c
@@ -219,13 +219,15 @@ static int percpu_stats_show(struct seq_file *m, void *v)
 		for (slot = 0; slot < pcpu_nr_slots; slot++) {
 			list_for_each_entry(chunk, &pcpu_chunk_list(type)[slot],
 					    list) {
-				if (chunk == pcpu_first_chunk) {
+				if (chunk == pcpu_first_chunk)
 					seq_puts(m, "Chunk: <- First Chunk\n");
-					chunk_map_stats(m, chunk, buffer);
-				} else {
+				else if (slot == pcpu_to_depopulate_slot)
+					seq_puts(m, "Chunk (to_depopulate)\n");
+				else if (slot == pcpu_sidelined_slot)
+					seq_puts(m, "Chunk (sidelined):\n");
+				else
 					seq_puts(m, "Chunk:\n");
-					chunk_map_stats(m, chunk, buffer);
-				}
+				chunk_map_stats(m, chunk, buffer);
 			}
 		}
 	}
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index e46f7a6917f9..c75f6f24f2d5 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -377,3 +377,33 @@ static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai)
 	/* no extra restriction */
 	return 0;
 }
+
+/**
+ * pcpu_should_reclaim_chunk - determine if a chunk should go into reclaim
+ * @chunk: chunk of interest
+ *
+ * This is the entry point for percpu reclaim. If a chunk qualifies, it is then
+ * isolated and managed in separate lists at the back of pcpu_slot: sidelined
+ * and to_depopulate respectively. The to_depopulate list holds chunks slated
+ * for depopulation. They no longer contribute to pcpu_nr_empty_pop_pages once
+ * they are on this list. Once depopulated, they are moved onto the sidelined
+ * list which enables them to be pulled back in for allocation if no other chunk
+ * can suffice the allocation.
+ */
+static bool pcpu_should_reclaim_chunk(struct pcpu_chunk *chunk)
+{
+	/* do not reclaim either the first chunk or reserved chunk */
+	if (chunk == pcpu_first_chunk || chunk == pcpu_reserved_chunk)
+		return false;
+
+	/*
+	 * If it is isolated, it may be on the sidelined list so move it back to
+	 * the to_depopulate list. If we hit at least 1/4 pages empty pages AND
+	 * there is no system-wide shortage of empty pages aside from this
+	 * chunk, move it to the to_depopulate list.
+	 */
+	return ((chunk->isolated && chunk->nr_empty_pop_pages) ||
+		(pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
+		 PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages &&
+		 chunk->nr_empty_pop_pages >= chunk->nr_pages / 4));
+}
diff --git a/mm/percpu.c b/mm/percpu.c
index d462222f4adc..79eebc80860d 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -136,6 +136,8 @@ static int pcpu_nr_units __ro_after_init;
 static int pcpu_atom_size __ro_after_init;
 int pcpu_nr_slots __ro_after_init;
 int pcpu_free_slot __ro_after_init;
+int pcpu_sidelined_slot __ro_after_init;
+int pcpu_to_depopulate_slot __ro_after_init;
 static size_t pcpu_chunk_struct_size __ro_after_init;
 
 /* cpus with the lowest and highest unit addresses */
@@ -562,10 +564,41 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
 {
 	int nslot = pcpu_chunk_slot(chunk);
 
+	/* leave isolated chunks in-place */
+	if (chunk->isolated)
+		return;
+
 	if (oslot != nslot)
 		__pcpu_chunk_move(chunk, nslot, oslot < nslot);
 }
 
+static void pcpu_isolate_chunk(struct pcpu_chunk *chunk)
+{
+	enum pcpu_chunk_type type = pcpu_chunk_type(chunk);
+	struct list_head *pcpu_slot = pcpu_chunk_list(type);
+
+	lockdep_assert_held(&pcpu_lock);
+
+	if (!chunk->isolated) {
+		chunk->isolated = true;
+		pcpu_nr_empty_pop_pages[type] -= chunk->nr_empty_pop_pages;
+	}
+	list_move(&chunk->list, &pcpu_slot[pcpu_to_depopulate_slot]);
+}
+
+static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
+{
+	enum pcpu_chunk_type type = pcpu_chunk_type(chunk);
+
+	lockdep_assert_held(&pcpu_lock);
+
+	if (chunk->isolated) {
+		chunk->isolated = false;
+		pcpu_nr_empty_pop_pages[type] += chunk->nr_empty_pop_pages;
+		pcpu_chunk_relocate(chunk, -1);
+	}
+}
+
 /*
  * pcpu_update_empty_pages - update empty page counters
  * @chunk: chunk of interest
@@ -578,7 +611,7 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
 static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
 {
 	chunk->nr_empty_pop_pages += nr;
-	if (chunk != pcpu_reserved_chunk)
+	if (chunk != pcpu_reserved_chunk && !chunk->isolated)
 		pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] += nr;
 }
 
@@ -1778,7 +1811,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 restart:
 	/* search through normal chunks */
-	for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) {
+	for (slot = pcpu_size_to_slot(size); slot <= pcpu_free_slot; slot++) {
 		list_for_each_entry_safe(chunk, next, &pcpu_slot[slot], list) {
 			off = pcpu_find_block_fit(chunk, bits, bit_align,
 						  is_atomic);
@@ -1789,9 +1822,10 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 			}
 
 			off = pcpu_alloc_area(chunk, bits, bit_align, off);
-			if (off >= 0)
+			if (off >= 0) {
+				pcpu_reintegrate_chunk(chunk);
 				goto area_found;
-
+			}
 		}
 	}
 
@@ -1952,10 +1986,13 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
 /**
  * pcpu_balance_free - manage the amount of free chunks
  * @type: chunk type
+ * @empty_only: free chunks only if there are no populated pages
  *
- * Reclaim all fully free chunks except for the first one.
+ * If empty_only is %false, reclaim all fully free chunks regardless of the
+ * number of populated pages. Otherwise, only reclaim chunks that have no
+ * populated pages.
  */
-static void pcpu_balance_free(enum pcpu_chunk_type type)
+static void pcpu_balance_free(enum pcpu_chunk_type type, bool empty_only)
 {
 	LIST_HEAD(to_free);
 	struct list_head *pcpu_slot = pcpu_chunk_list(type);
@@ -1975,7 +2012,8 @@ static void pcpu_balance_free(enum pcpu_chunk_type type)
 		if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
 			continue;
 
-		list_move(&chunk->list, &to_free);
+		if (!empty_only || chunk->nr_empty_pop_pages == 0)
+			list_move(&chunk->list, &to_free);
 	}
 
 	spin_unlock_irq(&pcpu_lock);
@@ -2083,20 +2121,121 @@ static void pcpu_balance_populated(enum pcpu_chunk_type type)
 	}
 }
 
+/**
+ * pcpu_reclaim_populated - scan over to_depopulate chunks and free empty pages
+ * @type: chunk type
+ *
+ * Scan over chunks in the depopulate list and try to release unused populated
+ * pages back to the system. Depopulated chunks are sidelined to prevent
+ * repopulating these pages unless required. Fully free chunks are reintegrated
+ * and freed accordingly (1 is kept around). If we drop below the empty
+ * populated pages threshold, reintegrate the chunk if it has empty free pages.
+ * Each chunk is scanned in the reverse order to keep populated pages close to
+ * the beginning of the chunk.
+ */
+static void pcpu_reclaim_populated(enum pcpu_chunk_type type)
+{
+	struct list_head *pcpu_slot = pcpu_chunk_list(type);
+	struct pcpu_chunk *chunk;
+	struct pcpu_block_md *block;
+	int i, end;
+
+	spin_lock_irq(&pcpu_lock);
+
+restart:
+	/*
+	 * Once a chunk is isolated to the to_depopulate list, the chunk is no
+	 * longer discoverable to allocations whom may populate pages. The only
+	 * other accessor is the free path which only returns area back to the
+	 * allocator not touching the populated bitmap.
+	 */
+	while (!list_empty(&pcpu_slot[pcpu_to_depopulate_slot])) {
+		chunk = list_first_entry(&pcpu_slot[pcpu_to_depopulate_slot],
+					 struct pcpu_chunk, list);
+		WARN_ON(chunk->immutable);
+
+		/*
+		 * Scan chunk's pages in the reverse order to keep populated
+		 * pages close to the beginning of the chunk.
+		 */
+		for (i = chunk->nr_pages - 1, end = -1; i >= 0; i--) {
+			/* no more work to do */
+			if (chunk->nr_empty_pop_pages == 0)
+				break;
+
+			/* reintegrate chunk to prevent atomic alloc failures */
+			if (pcpu_nr_empty_pop_pages[type] <
+			    PCPU_EMPTY_POP_PAGES_HIGH) {
+				pcpu_reintegrate_chunk(chunk);
+				goto restart;
+			}
+
+			/*
+			 * If the page is empty and populated, start or
+			 * extend the (i, end) range. If i == 0, decrease
+			 * i and perform the depopulation to cover the last
+			 * (first) page in the chunk.
+			 */
+			block = chunk->md_blocks + i;
+			if (block->contig_hint == PCPU_BITMAP_BLOCK_BITS &&
+			    test_bit(i, chunk->populated)) {
+				if (end == -1)
+					end = i;
+				if (i > 0)
+					continue;
+				i--;
+			}
+
+			/* depopulate if there is an active range */
+			if (end == -1)
+				continue;
+
+			spin_unlock_irq(&pcpu_lock);
+			pcpu_depopulate_chunk(chunk, i + 1, end + 1);
+			cond_resched();
+			spin_lock_irq(&pcpu_lock);
+
+			pcpu_chunk_depopulated(chunk, i + 1, end + 1);
+
+			/* reset the range and continue */
+			end = -1;
+		}
+
+		if (chunk->free_bytes == pcpu_unit_size)
+			pcpu_reintegrate_chunk(chunk);
+		else
+			list_move(&chunk->list,
+				  &pcpu_slot[pcpu_sidelined_slot]);
+	}
+
+	spin_unlock_irq(&pcpu_lock);
+}
+
 /**
  * pcpu_balance_workfn - manage the amount of free chunks and populated pages
  * @work: unused
  *
- * Call pcpu_balance_free() and pcpu_balance_populated() for each chunk type.
+ * For each chunk type, manage the number of fully free chunks and the number of
+ * populated pages. An important thing to consider is when pages are freed and
+ * how they contribute to the global counts.
  */
 static void pcpu_balance_workfn(struct work_struct *work)
 {
 	enum pcpu_chunk_type type;
 
+	/*
+	 * pcpu_balance_free() is called twice because the first time we may
+	 * trim pages in the active pcpu_nr_empty_pop_pages which may cause us
+	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
+	 * to move fully free chunks to the active list to be freed if
+	 * appropriate.
+	 */
 	for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) {
 		mutex_lock(&pcpu_alloc_mutex);
-		pcpu_balance_free(type);
+		pcpu_balance_free(type, false);
+		pcpu_reclaim_populated(type);
 		pcpu_balance_populated(type);
+		pcpu_balance_free(type, true);
 		mutex_unlock(&pcpu_alloc_mutex);
 	}
 }
@@ -2137,8 +2276,12 @@ void free_percpu(void __percpu *ptr)
 
 	pcpu_memcg_free_hook(chunk, off, size);
 
-	/* if there are more than one fully free chunks, wake up grim reaper */
-	if (chunk->free_bytes == pcpu_unit_size) {
+	/*
+	 * If there are more than one fully free chunks, wake up grim reaper.
+	 * If the chunk is isolated, it may be in the process of being
+	 * reclaimed. Let reclaim manage cleaning up of that chunk.
+	 */
+	if (!chunk->isolated && chunk->free_bytes == pcpu_unit_size) {
 		struct pcpu_chunk *pos;
 
 		list_for_each_entry(pos, &pcpu_slot[pcpu_free_slot], list)
@@ -2146,6 +2289,9 @@ void free_percpu(void __percpu *ptr)
 				need_balance = true;
 				break;
 			}
+	} else if (pcpu_should_reclaim_chunk(chunk)) {
+		pcpu_isolate_chunk(chunk);
+		need_balance = true;
 	}
 
 	trace_percpu_free_percpu(chunk->base_addr, off, ptr);
@@ -2560,11 +2706,15 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 	pcpu_stats_save_ai(ai);
 
 	/*
-	 * Allocate chunk slots. The additional last slot is for
-	 * empty chunks.
+	 * Allocate chunk slots. The slots after the active slots are:
+	 *   sidelined_slot - isolated, depopulated chunks
+	 *   free_slot - fully free chunks
+	 *   to_depopulate_slot - isolated, chunks to depopulate
 	 */
-	pcpu_free_slot = __pcpu_size_to_slot(pcpu_unit_size) + 1;
-	pcpu_nr_slots = pcpu_free_slot + 1;
+	pcpu_sidelined_slot = __pcpu_size_to_slot(pcpu_unit_size) + 1;
+	pcpu_free_slot = pcpu_sidelined_slot + 1;
+	pcpu_to_depopulate_slot = pcpu_free_slot + 1;
+	pcpu_nr_slots = pcpu_to_depopulate_slot + 1;
 	pcpu_chunk_lists = memblock_alloc(pcpu_nr_slots *
 					  sizeof(pcpu_chunk_lists[0]) *
 					  PCPU_NR_CHUNK_TYPES,

From patchwork Mon Apr 19 22:50:47 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 12212783
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Roman Gushchin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 4/4] percpu: use reclaim threshold instead of running for every page
Date: Mon, 19 Apr 2021 22:50:47 +0000
Message-Id: <20210419225047.3415425-5-dennis@kernel.org>
X-Mailer: git-send-email 2.31.1.368.gbe11c130af-goog
In-Reply-To: <20210419225047.3415425-1-dennis@kernel.org>
References: <20210419225047.3415425-1-dennis@kernel.org>

The last patch implements reclaim by adding 2 additional lists where a
chunk's lifecycle is:
  active_slot -> to_depopulate_slot -> sidelined_slot

This worked great because we're able to nicely converge paths into
isolation. However, it's a bit aggressive to run for every free page.
Let's accumulate a few free pages before we do this. To do this, the new
lifecycle is:
  active_slot -> sidelined_slot -> to_depopulate_slot -> sidelined_slot

The transition from sidelined_slot -> to_depopulate_slot now occurs on a
threshold, whereas before the chunk went directly to the
to_depopulate_slot. pcpu_nr_isolated_empty_pop_pages[] is introduced to
aid with this.
Suggested-by: Roman Gushchin
Signed-off-by: Dennis Zhou
Acked-by: Roman Gushchin
---
 mm/percpu-internal.h |  1 +
 mm/percpu-stats.c    |  8 ++++++--
 mm/percpu.c          | 44 +++++++++++++++++++++++++++++++++++++-------
 3 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 10604dce806f..b3e43b016276 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -92,6 +92,7 @@ extern int pcpu_nr_slots;
 extern int pcpu_sidelined_slot;
 extern int pcpu_to_depopulate_slot;
 extern int pcpu_nr_empty_pop_pages[];
+extern int pcpu_nr_isolated_empty_pop_pages[];
 
 extern struct pcpu_chunk *pcpu_first_chunk;
 extern struct pcpu_chunk *pcpu_reserved_chunk;
diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c
index 2125981acfb9..facc804eb86c 100644
--- a/mm/percpu-stats.c
+++ b/mm/percpu-stats.c
@@ -145,7 +145,7 @@ static int percpu_stats_show(struct seq_file *m, void *v)
 	int slot, max_nr_alloc;
 	int *buffer;
 	enum pcpu_chunk_type type;
-	int nr_empty_pop_pages;
+	int nr_empty_pop_pages, nr_isolated_empty_pop_pages;
 
 alloc_buffer:
 	spin_lock_irq(&pcpu_lock);
@@ -167,8 +167,11 @@ static int percpu_stats_show(struct seq_file *m, void *v)
 	}
 
 	nr_empty_pop_pages = 0;
-	for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++)
+	nr_isolated_empty_pop_pages = 0;
+	for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) {
 		nr_empty_pop_pages += pcpu_nr_empty_pop_pages[type];
+		nr_isolated_empty_pop_pages += pcpu_nr_isolated_empty_pop_pages[type];
+	}
 
 #define PL(X)								\
 	seq_printf(m, " %-20s: %12lld\n", #X, (long long int)pcpu_stats_ai.X)
@@ -202,6 +205,7 @@ static int percpu_stats_show(struct seq_file *m, void *v)
 	PU(min_alloc_size);
 	PU(max_alloc_size);
 	P("empty_pop_pages", nr_empty_pop_pages);
+	P("iso_empty_pop_pages", nr_isolated_empty_pop_pages);
 	seq_putc(m, '\n');
 
 #undef PU
diff --git a/mm/percpu.c b/mm/percpu.c
index 79eebc80860d..ba13e683d022 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -110,6 +110,9 @@
 #define PCPU_EMPTY_POP_PAGES_LOW	2
 #define PCPU_EMPTY_POP_PAGES_HIGH	4
 
+/* only schedule reclaim if there are at least N empty pop pages sidelined */
+#define PCPU_EMPTY_POP_RECLAIM_THRESHOLD		4
+
 #ifdef CONFIG_SMP
 /* default addr <-> pcpu_ptr mapping, override in asm/percpu.h if necessary */
 #ifndef __addr_to_pcpu_ptr
@@ -183,6 +186,7 @@ static LIST_HEAD(pcpu_map_extend_chunks);
  * The reserved chunk doesn't contribute to the count.
  */
 int pcpu_nr_empty_pop_pages[PCPU_NR_CHUNK_TYPES];
+int pcpu_nr_isolated_empty_pop_pages[PCPU_NR_CHUNK_TYPES];
 
 /*
  * The number of populated pages in use by the allocator, protected by
@@ -582,8 +586,10 @@ static void pcpu_isolate_chunk(struct pcpu_chunk *chunk)
 	if (!chunk->isolated) {
 		chunk->isolated = true;
 		pcpu_nr_empty_pop_pages[type] -= chunk->nr_empty_pop_pages;
+		pcpu_nr_isolated_empty_pop_pages[type] +=
+			chunk->nr_empty_pop_pages;
+		list_move(&chunk->list, &pcpu_slot[pcpu_sidelined_slot]);
 	}
-	list_move(&chunk->list, &pcpu_slot[pcpu_to_depopulate_slot]);
 }
 
 static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
@@ -595,6 +601,8 @@ static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
 	if (chunk->isolated) {
 		chunk->isolated = false;
 		pcpu_nr_empty_pop_pages[type] += chunk->nr_empty_pop_pages;
+		pcpu_nr_isolated_empty_pop_pages[type] -=
+			chunk->nr_empty_pop_pages;
 		pcpu_chunk_relocate(chunk, -1);
 	}
 }
@@ -610,9 +618,15 @@ static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk)
  */
 static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
 {
+	enum pcpu_chunk_type type = pcpu_chunk_type(chunk);
+
 	chunk->nr_empty_pop_pages += nr;
-	if (chunk != pcpu_reserved_chunk && !chunk->isolated)
-		pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] += nr;
+	if (chunk != pcpu_reserved_chunk) {
+		if (chunk->isolated)
+			pcpu_nr_isolated_empty_pop_pages[type] += nr;
+		else
+			pcpu_nr_empty_pop_pages[type] += nr;
+	}
 }
 
 /*
@@ -2138,10 +2152,13 @@ static void pcpu_reclaim_populated(enum pcpu_chunk_type type)
 	struct list_head *pcpu_slot = pcpu_chunk_list(type);
 	struct pcpu_chunk *chunk;
 	struct pcpu_block_md *block;
+	LIST_HEAD(to_depopulate);
 	int i, end;
 
 	spin_lock_irq(&pcpu_lock);
 
+	list_splice_init(&pcpu_slot[pcpu_to_depopulate_slot], &to_depopulate);
+
 restart:
 	/*
 	 * Once a chunk is isolated to the to_depopulate list, the chunk is no
@@ -2149,9 +2166,9 @@ static void pcpu_reclaim_populated(enum pcpu_chunk_type type)
 	 * other accessor is the free path which only returns area back to the
 	 * allocator not touching the populated bitmap.
 	 */
-	while (!list_empty(&pcpu_slot[pcpu_to_depopulate_slot])) {
-		chunk = list_first_entry(&pcpu_slot[pcpu_to_depopulate_slot],
-					 struct pcpu_chunk, list);
+	while (!list_empty(&to_depopulate)) {
+		chunk = list_first_entry(&to_depopulate, struct pcpu_chunk,
+					 list);
 		WARN_ON(chunk->immutable);
 
 		/*
@@ -2208,6 +2225,13 @@ static void pcpu_reclaim_populated(enum pcpu_chunk_type type)
 				  &pcpu_slot[pcpu_sidelined_slot]);
 	}
 
+	if (pcpu_nr_isolated_empty_pop_pages[type] >=
+	    PCPU_EMPTY_POP_RECLAIM_THRESHOLD) {
+		list_splice_tail_init(&pcpu_slot[pcpu_sidelined_slot],
+				      &pcpu_slot[pcpu_to_depopulate_slot]);
+		pcpu_schedule_balance_work();
+	}
+
 	spin_unlock_irq(&pcpu_lock);
 }
 
@@ -2291,7 +2315,13 @@ void free_percpu(void __percpu *ptr)
 			}
 	} else if (pcpu_should_reclaim_chunk(chunk)) {
 		pcpu_isolate_chunk(chunk);
-		need_balance = true;
+		if (chunk->free_bytes == pcpu_unit_size ||
+		    pcpu_nr_isolated_empty_pop_pages[pcpu_chunk_type(chunk)] >=
+		    PCPU_EMPTY_POP_RECLAIM_THRESHOLD) {
+			list_splice_tail_init(&pcpu_slot[pcpu_sidelined_slot],
+					      &pcpu_slot[pcpu_to_depopulate_slot]);
+			need_balance = true;
+		}
 	}
 
 	trace_percpu_free_percpu(chunk->base_addr, off, ptr);
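
Note for readers following the accounting in [PATCH 4/4]: below is a minimal
userspace sketch of the counter bookkeeping, not kernel code. Every model_*
name and MODEL_RECLAIM_THRESHOLD is a hypothetical stand-in (the threshold
mirrors PCPU_EMPTY_POP_RECLAIM_THRESHOLD). It assumes a single chunk type and
leaves out pcpu_lock, the slot lists, the reserved chunk special case, and the
"chunk is fully free" shortcut that free_percpu() also checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PCPU_EMPTY_POP_RECLAIM_THRESHOLD. */
#define MODEL_RECLAIM_THRESHOLD 4

/* Hypothetical, reduced view of a chunk: only what the counters need. */
struct model_chunk {
	bool isolated;
	int nr_empty_pop_pages;
};

/* Hypothetical global state for one chunk type. */
struct model_state {
	int nr_empty_pop_pages;          /* empty pages in active chunks */
	int nr_isolated_empty_pop_pages; /* empty pages in isolated chunks */
};

/* Move a chunk's empty pages from the active to the isolated counter. */
static inline void model_isolate(struct model_state *s, struct model_chunk *c)
{
	if (c->isolated)
		return;
	c->isolated = true;
	s->nr_empty_pop_pages -= c->nr_empty_pop_pages;
	s->nr_isolated_empty_pop_pages += c->nr_empty_pop_pages;
	assert(s->nr_empty_pop_pages >= 0);
}

/* Undo model_isolate(): hand the chunk's empty pages back to the active counter. */
static inline void model_reintegrate(struct model_state *s,
				     struct model_chunk *c)
{
	if (!c->isolated)
		return;
	c->isolated = false;
	s->nr_empty_pop_pages += c->nr_empty_pop_pages;
	s->nr_isolated_empty_pop_pages -= c->nr_empty_pop_pages;
}

/* Account a change of nr empty populated pages against the right counter. */
static inline void model_update_empty_pages(struct model_state *s,
					    struct model_chunk *c, int nr)
{
	c->nr_empty_pop_pages += nr;
	if (c->isolated)
		s->nr_isolated_empty_pop_pages += nr;
	else
		s->nr_empty_pop_pages += nr;
}

/* True once enough isolated empty pages have accumulated to justify reclaim. */
static inline bool model_should_schedule_reclaim(const struct model_state *s)
{
	return s->nr_isolated_empty_pop_pages >= MODEL_RECLAIM_THRESHOLD;
}
```

In this model, isolating a chunk with 3 empty pages leaves the threshold unmet;
isolating a second chunk with 2 more crosses it, which corresponds to the point
where the patch splices sidelined_slot onto to_depopulate_slot and schedules
the balance work.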