From patchwork Tue Feb 14 19:02:16 2023
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 13140739
From: Yang Shi
To: mgorman@techsingularity.net, agk@redhat.com, snitzer@kernel.org,
    dm-devel@redhat.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [v2 PATCH 0/5] Introduce mempool pages bulk allocator and use it in dm-crypt
Date: Tue, 14 Feb 2023 11:02:16 -0800
Message-Id: <20230214190221.1156876-1-shy828301@gmail.com>
Changelog:
RFC -> v2:
  * Added a callback variant for the page bulk allocator and the
    mempool bulk allocator, per Mel Gorman.
  * Used the callback version in the dm-crypt driver.
  * Some code cleanup and refactoring to reduce duplicate code.

RFC: https://lore.kernel.org/linux-mm/20221005180341.1738796-1-shy828301@gmail.com/

We have full disk encryption enabled, and profiling shows that page
allocation may incur a noticeable overhead when writing. For writes,
dm-crypt creates an "out" bio and fills it with the same number of
pages as the "in" bio, but the driver allocates one page at a time in
a loop. For a 1M bio this means the driver has to call the page
allocator 256 times, which is not very efficient. Since v5.13 the
kernel has had a page bulk allocator, so dm-crypt could use it to
allocate pages more efficiently.

I could just call the page bulk allocator in the dm-crypt driver
before falling back to the mempool allocator, but that seems ad hoc,
and a quick search shows some other users do a similar thing, for
example, f2fs compression, block bounce, gfs2, ufs, etc. So it seems
neater to implement a general bulk allocation API for mempool.

Currently the bulk allocator only supports a list or an array to
receive the pages, but neither is the best fit for the dm-crypt use
case. So a new bulk allocator API, callback, is introduced per the
suggestion from Mel Gorman: it consumes the pages by invoking a
callback, with a caller-supplied parameter, for each allocated page.

So this series introduces the mempool page bulk allocator. The below
APIs are introduced:

  - mempool_init_pages_bulk()
  - mempool_create_pages_bulk()
    They initialize a mempool for the page bulk allocator. The pool
    is filled by alloc_page() in a loop.

  - mempool_alloc_pages_bulk_cb()
  - mempool_alloc_pages_bulk_array()
    They do bulk allocation from the mempool. The list version is not
    implemented since there is no user for the list version of the
    bulk allocator so far, and it may be gone soon.

Conceptually they do the following (a usage sketch follows further
below):

  1. Call the bulk page allocator.
  2. If the allocation is fulfilled, return; otherwise try to
     allocate the remaining pages from the mempool.
  3. If that fulfills the request, return; otherwise retry from #1
     with a sleepable gfp mask.
  4. If it still fails, sleep for a while to wait for the mempool to
     be refilled, then retry from #1.

The populated pages stay in the array until the caller consumes or
frees them, or they are consumed by the callback. Since the mempool
allocator is guaranteed to succeed in a sleepable context, the two
APIs return true for success and false for failure. It is the
caller's responsibility to handle the failure case (partial
allocation), just like with the page bulk allocator.

A mempool is typically an object-agnostic allocator, but bulk
allocation is only supported for pages, so the mempool bulk allocator
is for page allocation only as well.

With the mempool bulk allocator the IOPS of dm-crypt with 1M I/O
improves by approximately 6%. The test was done on a machine with 80
CPUs and 128GB of memory, with an encrypted ram device (so the impact
from the storage hardware is minimized and the dm-crypt layer can be
benchmarked more accurately).
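For illustration, here is a rough sketch of how the callback variant
might be used by a driver like dm-crypt. The argument lists of
mempool_create_pages_bulk() and mempool_alloc_pages_bulk_cb() below
are assumptions made for readability; the real signatures are defined
in the patches:

	/* Hypothetical usage sketch only; see the patches for the real API. */
	static void crypt_page_cb(struct page *page, void *data)
	{
		struct bio *bio = data;

		/* Consume each page as it is allocated: add it to the "out" bio. */
		__bio_add_page(bio, page, PAGE_SIZE, 0);
	}

	...

	/* Create a pool of order-0 pages, pre-filled by alloc_page() in a loop. */
	pool = mempool_create_pages_bulk(min_nr, /* order */ 0);

	/*
	 * Allocate nr_pages in one call instead of calling mempool_alloc()
	 * nr_pages times. Returns true only when the request is fully
	 * fulfilled (steps 1-4 above); on false the caller handles the
	 * partial allocation, e.g. frees the pages added so far.
	 */
	if (!mempool_alloc_pages_bulk_cb(pool, GFP_NOIO, nr_pages,
					 crypt_page_cb, bio))
		/* handle partial allocation */;

The point of the callback form is that each page is consumed as it is
allocated, without the caller having to supply an array or list to
hold the pages first.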
Before the patch:

Jobs: 1 (f=1): [w(1)][100.0%][w=1301MiB/s][w=1301 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=48512: Wed Feb 1 18:11:30 2023
  write: IOPS=1300, BW=1301MiB/s (1364MB/s)(76.2GiB/60001msec); 0 zone resets
    slat (usec): min=724, max=867, avg=765.71, stdev=19.27
    clat (usec): min=4, max=196297, avg=195688.86, stdev=6450.50
     lat (usec): min=801, max=197064, avg=196454.90, stdev=6450.35
    clat percentiles (msec):
     |  1.00th=[  197],  5.00th=[  197], 10.00th=[  197], 20.00th=[  197],
     | 30.00th=[  197], 40.00th=[  197], 50.00th=[  197], 60.00th=[  197],
     | 70.00th=[  197], 80.00th=[  197], 90.00th=[  197], 95.00th=[  197],
     | 99.00th=[  197], 99.50th=[  197], 99.90th=[  197], 99.95th=[  197],
     | 99.99th=[  197]
   bw (  MiB/s): min=  800, max= 1308, per=99.69%, avg=1296.94, stdev=46.02, samples=119
   iops        : min=  800, max= 1308, avg=1296.94, stdev=46.02, samples=119
  lat (usec)   : 10=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%
  lat (msec)   : 100=0.08%, 250=99.83%
  cpu          : usr=3.88%, sys=96.02%, ctx=69, majf=1, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,78060,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=1301MiB/s (1364MB/s), 1301MiB/s-1301MiB/s (1364MB/s-1364MB/s), io=76.2GiB (81.9GB), run=60001-60001msec

After the patch:

Jobs: 1 (f=1): [w(1)][100.0%][w=1401MiB/s][w=1401 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=2171: Wed Feb 1 21:08:16 2023
  write: IOPS=1401, BW=1402MiB/s (1470MB/s)(82.1GiB/60001msec); 0 zone resets
    slat (usec): min=685, max=815, avg=710.77, stdev=13.24
    clat (usec): min=4, max=182206, avg=181658.31, stdev=5810.58
     lat (usec): min=709, max=182913, avg=182369.36, stdev=5810.67
    clat percentiles (msec):
     |  1.00th=[  182],  5.00th=[  182], 10.00th=[  182], 20.00th=[  182],
     | 30.00th=[  182], 40.00th=[  182], 50.00th=[  182], 60.00th=[  182],
     | 70.00th=[  182], 80.00th=[  182], 90.00th=[  182], 95.00th=[  182],
     | 99.00th=[  182], 99.50th=[  182], 99.90th=[  182], 99.95th=[  182],
     | 99.99th=[  182]
   bw (  MiB/s): min=  900, max= 1408, per=99.71%, avg=1397.60, stdev=46.04, samples=119
   iops        : min=  900, max= 1408, avg=1397.60, stdev=46.04, samples=119
  lat (usec)   : 10=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%
  lat (msec)   : 100=0.08%, 250=99.83%
  cpu          : usr=3.66%, sys=96.23%, ctx=76, majf=1, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,84098,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=1402MiB/s (1470MB/s), 1402MiB/s-1402MiB/s (1470MB/s-1470MB/s), io=82.1GiB (88.2GB), run=60001-60001msec

And the benchmark with 4K I/O size doesn't show a measurable
regression.
Yang Shi (5):
  mm: page_alloc: add API for bulk allocator with callback
  mm: mempool: extract the common initialization and alloc code
  mm: mempool: introduce page bulk allocator
  md: dm-crypt: move crypt_free_buffer_pages ahead
  md: dm-crypt: use mempool page bulk allocator

 drivers/md/dm-crypt.c   |  95 ++++++++++++++++++++++++++++++---------------------
 include/linux/gfp.h     |  21 +++++++++---
 include/linux/mempool.h |  21 ++++++++++++
 mm/mempolicy.c          |  12 ++++---
 mm/mempool.c            | 248 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------
 mm/page_alloc.c         |  21 ++++++++----
 6 files changed, 323 insertions(+), 95 deletions(-)