From patchwork Wed Dec 4 11:34:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: SeongJae Park X-Patchwork-Id: 11272775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 50893109A for ; Wed, 4 Dec 2019 11:35:58 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2AD5620637 for ; Wed, 4 Dec 2019 11:35:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="cKDhii0K" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2AD5620637 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1icSvi-0006Ny-PS; Wed, 04 Dec 2019 11:34:50 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1icSvh-0006Nt-UO for xen-devel@lists.xenproject.org; Wed, 04 Dec 2019 11:34:49 +0000 X-Inumbo-ID: 1491cada-168a-11ea-8202-12813bfff9fa Received: from smtp-fw-4101.amazon.com (unknown [72.21.198.25]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 1491cada-168a-11ea-8202-12813bfff9fa; Wed, 04 Dec 2019 11:34:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1575459290; x=1606995290; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=l9KPO+RfCo20qWXXb2Nj9NKT03YEvDC09jO2n9eHTpU=; b=cKDhii0KnRKwT3QLdnvKtqiWzVTzH59dzuWOJHMSG3UXoYcrcPoR6VL7 jnBY+6qJcTcM/n2sz9e56z8oNBlMRV4dE4MrK/o8b0JXykXMBNapkEjKu gsqsDrBdeYRyH0P5CpGLJDP7UlSeyvtrLPYzAMr3NhXJOb7rRnMAsqr6D M=; IronPort-SDR: viKll6ecnooPrmUshSbgKArxk+zTotTLwmPFV/Wv/PkUGPV6L9xCUelo4wLNJXlMTz/omLid4V /zTdHD3WM+lw== X-IronPort-AV: E=Sophos;i="5.69,277,1571702400"; d="scan'208";a="7009075" Received: from iad6-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1d-74cf8b49.us-east-1.amazon.com) ([10.124.125.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 04 Dec 2019 11:34:45 +0000 Received: from EX13MTAUEA001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1d-74cf8b49.us-east-1.amazon.com (Postfix) with ESMTPS id 18111C0C8E; Wed, 4 Dec 2019 11:34:42 +0000 (UTC) Received: from EX13D31EUA004.ant.amazon.com (10.43.165.161) by EX13MTAUEA001.ant.amazon.com (10.43.61.243) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Wed, 4 Dec 2019 11:34:42 +0000 Received: from u886c93fd17d25d.ant.amazon.com (10.43.162.249) by EX13D31EUA004.ant.amazon.com (10.43.165.161) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Wed, 4 Dec 2019 11:34:38 +0000 From: SeongJae Park To: , , Date: Wed, 4 Dec 2019 12:34:18 +0100 Message-ID: <20191204113419.2298-2-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191204113419.2298-1-sjpark@amazon.com> References: <20191204113419.2298-1-sjpark@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.162.249] X-ClientProxiedBy: EX13D21UWB003.ant.amazon.com (10.43.161.212) To EX13D31EUA004.ant.amazon.com (10.43.165.161) Precedence: Bulk Subject: [Xen-devel] [PATCH 1/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: sj38.park@gmail.com, xen-devel@lists.xenproject.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, SeongJae Park Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" From: SeongJae Park Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and be increased on demand while processing the I/O requests. If current I/O requests handling is finished or 100 milliseconds has passed since last I/O requests handling, it checks and shrinks the pool to not exceed the size limit, `max_buffer_pages`. Therefore, `blkfront` running guests can cause a memory pressure in the `blkback` running guest by attaching arbitrarily large number of block devices and inducing I/O. This commit avoids such problematic situations by shrinking the pools aggressively (further the limit) for a while (one millisecond) if a memory pressure is detected. Discussions =========== The shrinking mechanism returns only pages in the pool which are not currently be used by blkback. In other words, the pages that will be shrunk are not mapped with foreign pages. Because this commit is changing only the shrink limit but uses the shrinking mechanism as is, this commit does not introduce security issues such as improper unmappings. This commit keeps the aggressive shrinking limit for one milisecond from last memory pressure detected time. The duration should be neither too short nor too long. If it is too long, free pages pool shrinking overhead can reduce the I/O performance. If it is too short, blkback will not free enough pages to reduce the memory pressure. I believe that one millisecond is a short duration in terms of I/O while it is a long duration in terms of memory operations. Also, as the original shrinking mechanism works for every 100 milliseconds, this 1 millisecond could be a somewhat reasonable choice. Also, this duration worked well for our testing environment simulating the memory pressure situation (will be described in detail below). Memory Pressure Test ==================== To show whether this commit fixes the above mentioned memory pressure situation well, I configured a test environment. On the `blkfront` running guest instances of a virtualized environment, I attach arbitrarily large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages that swapped in and out on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. Roughly speaking, this commit has reduced those numbers 130x (pswpin) and 34x (pswpout) as below: pswpin pswpout before 76,672 185,799 after 587 5,402 Performance Overhead Test ========================= This commit could incur I/O performance degradation under memory pressure because the aggressive shrinking will require more page allocations. To show the overhead, I artificially made an aggressive pages pool shrinking situation and measured the I/O performance of a `blkfront` running guest. For the artificial shrinking, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set the value to `1024` and `0`. The `1024` is the default value. Setting the value as `0` incurs the worst-case aggressive shrinking stress. For the I/O performance measurement, I use a simple `dd` command. Default Performance ------------------- [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s Worst-case Performance ---------------------- [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s In short, even worst case aggressive pools shrinking makes no visible performance degradation. I think this is due to the slow speed of the I/O. In other words, the additional page allocation overhead is hidden under the much slower I/O time. Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 31 +++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 3666afa639d1..aa1a127093e5 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -135,6 +135,27 @@ module_param(log_stats, int, 0644); /* Number of free pages to remove on each call to gnttab_free_pages */ #define NUM_BATCH_FREE_PAGES 10 +/* + * Once a memory pressure is detected, keep aggressive shrinking of the free + * page pools for this time (msec) + */ +#define AGGRESSIVE_SHRINKING_DURATION 1 + +static unsigned long xen_blk_mem_pressure_end; + +static unsigned long blkif_shrink_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + xen_blk_mem_pressure_end = jiffies + + msecs_to_jiffies(AGGRESSIVE_SHRINKING_DURATION); + return 0; +} + +static struct shrinker blkif_shrinker = { + .count_objects = blkif_shrink_count, + .seeks = DEFAULT_SEEKS, +}; + static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { return xen_blkif_pgrant_timeout && @@ -656,8 +677,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. */ + if (time_before(jiffies, xen_blk_mem_pressure_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -1500,6 +1524,9 @@ static int __init xen_blkif_init(void) if (rc) goto failed_init; + if (register_shrinker(&blkif_shrinker)) + pr_warn("shrinker registration failed\n"); + failed_init: return rc; }