From patchwork Thu May 6 15:00:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 12242267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFE1DC433ED for ; Thu, 6 May 2021 15:01:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3AB6060725 for ; Thu, 6 May 2021 15:01:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3AB6060725 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BEEE26B006C; Thu, 6 May 2021 11:01:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B8ABC6B006E; Thu, 6 May 2021 11:01:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9DD8B6B0072; Thu, 6 May 2021 11:01:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0091.hostedemail.com [216.40.44.91]) by kanga.kvack.org (Postfix) with ESMTP id 726C76B006C for ; Thu, 6 May 2021 11:01:47 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id F1BD67597 for ; Thu, 6 May 2021 15:01:42 +0000 (UTC) X-FDA: 78111120486.17.4E174DD Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf03.hostedemail.com (Postfix) with ESMTP id 03543C00774D for ; Thu, 6 May 2021 15:00:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1620313238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=lnZOGW2WXcr3CD7ZWYCiOFjkYlS6OpFlgmFtj2K0Q2g=; b=cvX87jK9LaGN7BtqIrYIF2fLlDo5m51RsApY0Aj9WG3e04JGPH0aRfSX0Jw8XSq8b8mQxy jtOBaTLUxJwGw7Cprkp1zQRySTMAgzn+q1Ovm4PE/cnMj0HOAoRAp+PNlfFMhHv1dw4cCk jsp3HxtI1oPZc9fuRyOjXupsJITr/9o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-604-N4ksTzjNOdyrNnbY4oBlPA-1; Thu, 06 May 2021 11:00:35 -0400 X-MC-Unique: N4ksTzjNOdyrNnbY4oBlPA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2EA221009E20; Thu, 6 May 2021 15:00:32 +0000 (UTC) Received: from llong.com (ovpn-114-62.rdu2.redhat.com [10.10.114.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 03E6819100; Thu, 6 May 2021 15:00:29 +0000 (UTC) From: Waiman Long To: Johannes Weiner , Michal Hocko , Vladimir Davydov , Andrew Morton , Tejun Heo , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Vlastimil Babka , Roman Gushchin Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt , Muchun Song , Alex Shi , Chris Down , Yafang Shao , Wei Yang , Masayoshi Mizuma , Xing Zhengjun , Matthew Wilcox , Waiman Long Subject: [PATCH v6 3/4] mm/memcg: Improve refill_obj_stock() performance Date: Thu, 6 May 2021 11:00:06 -0400 Message-Id: <20210506150007.16288-4-longman@redhat.com> In-Reply-To: <20210506150007.16288-1-longman@redhat.com> References: <20210506150007.16288-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=cvX87jK9; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf03.hostedemail.com: domain of longman@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=longman@redhat.com X-Stat-Signature: 8c5tte4wi3hrbcw6jt85jq65rc5owcha X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 03543C00774D Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf03; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=170.10.133.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1620313232-337607 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: There are two issues with the current refill_obj_stock() code. First of all, when nr_bytes reaches over PAGE_SIZE, it calls drain_obj_stock() to atomically flush out remaining bytes to obj_cgroup, clear cached_objcg and do a obj_cgroup_put(). It is likely that the same obj_cgroup will be used again which leads to another call to drain_obj_stock() and obj_cgroup_get() as well as atomically retrieve the available byte from obj_cgroup. That is costly. Instead, we should just uncharge the excess pages, reduce the stock bytes and be done with it. The drain_obj_stock() function should only be called when obj_cgroup changes. Secondly, when charging an object of size not less than a page in obj_cgroup_charge(), it is possible that the remaining bytes to be refilled to the stock will overflow a page and cause refill_obj_stock() to uncharge 1 page. To avoid the additional uncharge in this case, a new allow_uncharge flag is added to refill_obj_stock() which will be set to false when called from obj_cgroup_charge() so that an uncharge_pages() call won't be issued right after a charge_pages() call unless the objcg changes. A multithreaded kmalloc+kfree microbenchmark on a 2-socket 48-core 96-thread x86-64 system with 96 testing threads were run. Before this patch, the total number of kilo kmalloc+kfree operations done for a 4k large object by all the testing threads per second were 4,304 kops/s (cgroup v1) and 8,478 kops/s (cgroup v2). After applying this patch, the number were 4,731 (cgroup v1) and 418,142 (cgroup v2) respectively. This represents a performance improvement of 1.10X (cgroup v1) and 49.3X (cgroup v2). Signed-off-by: Waiman Long Reviewed-by: Shakeel Butt --- mm/memcontrol.c | 48 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 35 insertions(+), 13 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index fbedfc55a248..513f3d56e89a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3281,10 +3281,12 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, return false; } -static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) +static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes, + bool allow_uncharge) { struct memcg_stock_pcp *stock; unsigned long flags; + unsigned int nr_pages = 0; local_irq_save(flags); @@ -3293,14 +3295,21 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) drain_obj_stock(stock); obj_cgroup_get(objcg); stock->cached_objcg = objcg; - stock->nr_bytes = atomic_xchg(&objcg->nr_charged_bytes, 0); + stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes) + ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0; + allow_uncharge = true; /* Allow uncharge when objcg changes */ } stock->nr_bytes += nr_bytes; - if (stock->nr_bytes > PAGE_SIZE) - drain_obj_stock(stock); + if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) { + nr_pages = stock->nr_bytes >> PAGE_SHIFT; + stock->nr_bytes &= (PAGE_SIZE - 1); + } local_irq_restore(flags); + + if (nr_pages) + obj_cgroup_uncharge_pages(objcg, nr_pages); } int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size) @@ -3312,14 +3321,27 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size) return 0; /* - * In theory, memcg->nr_charged_bytes can have enough + * In theory, objcg->nr_charged_bytes can have enough * pre-charged bytes to satisfy the allocation. However, - * flushing memcg->nr_charged_bytes requires two atomic - * operations, and memcg->nr_charged_bytes can't be big, - * so it's better to ignore it and try grab some new pages. - * memcg->nr_charged_bytes will be flushed in - * refill_obj_stock(), called from this function or - * independently later. + * flushing objcg->nr_charged_bytes requires two atomic + * operations, and objcg->nr_charged_bytes can't be big. + * The shared objcg->nr_charged_bytes can also become a + * performance bottleneck if all tasks of the same memcg are + * trying to update it. So it's better to ignore it and try + * grab some new pages. The stock's nr_bytes will be flushed to + * objcg->nr_charged_bytes later on when objcg changes. + * + * The stock's nr_bytes may contain enough pre-charged bytes + * to allow one less page from being charged, but we can't rely + * on the pre-charged bytes not being changed outside of + * consume_obj_stock() or refill_obj_stock(). So ignore those + * pre-charged bytes as well when charging pages. To avoid a + * page uncharge right after a page charge, we set the + * allow_uncharge flag to false when calling refill_obj_stock() + * to temporarily allow the pre-charged bytes to exceed the page + * size limit. The maximum reachable value of the pre-charged + * bytes is (sizeof(object) + PAGE_SIZE - 2) if there is no data + * race. */ nr_pages = size >> PAGE_SHIFT; nr_bytes = size & (PAGE_SIZE - 1); @@ -3329,14 +3351,14 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size) ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages); if (!ret && nr_bytes) - refill_obj_stock(objcg, PAGE_SIZE - nr_bytes); + refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false); return ret; } void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) { - refill_obj_stock(objcg, size); + refill_obj_stock(objcg, size, true); } #endif /* CONFIG_MEMCG_KMEM */