Message ID | 20200429124540.32232-1-sjpark@amazon.com (mailing list archive) |
---|---|
Headers | From: SeongJae Park <sjpark@amazon.com> To: <akpm@linux-foundation.org> CC: SeongJae Park <sjpark@amazon.de>, <Jonathan.Cameron@Huawei.com>, <aarcange@redhat.com>, <acme@kernel.org>, <alexander.shishkin@linux.intel.com>, <amit@kernel.org>, <benh@kernel.crashing.org>, <brendan.d.gregg@gmail.com>, <brendanhiggins@google.com>, <cai@lca.pw>, <colin.king@canonical.com>, <corbet@lwn.net>, <dwmw@amazon.com>, <irogers@google.com>, <jolsa@redhat.com>, <kirill@shutemov.name>, <mark.rutland@arm.com>, <mgorman@suse.de>, <minchan@kernel.org>, <mingo@redhat.com>, <namhyung@kernel.org>, <peterz@infradead.org>, <rdunlap@infradead.org>, <riel@surriel.com>, <rientjes@google.com>, <rostedt@goodmis.org>, <sblbir@amazon.com>, <shakeelb@google.com>, <shuah@kernel.org>, <sj38.park@gmail.com>, <snu@amazon.de>, <vbabka@suse.cz>, <vdavydov.dev@gmail.com>, <yang.shi@linux.alibaba.com>, <ying.huang@intel.com>, <linux-damon@amazon.com>, <linux-mm@kvack.org>, <linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org> Subject: [RFC v7 0/7] Implement Data Access Monitoring-based Memory Operation Schemes Date: Wed, 29 Apr 2020 14:45:33 +0200 Message-ID: <20200429124540.32232-1-sjpark@amazon.com> |
Series | Implement Data Access Monitoring-based Memory Operation Schemes |

From: SeongJae Park <sjpark@amazon.de>

DAMON[1] can be used as a primitive for data access-aware memory management
optimizations. That said, users who want such optimizations have to run
DAMON, read the monitoring results, analyze them, plan a new memory management
scheme, and apply the new scheme by themselves. Such efforts will be
inevitable for some complicated optimizations.

However, in many other cases, users would simply want the system to apply a
memory management action to a memory region of a specific size having a
specific access frequency for a specific time. For example, "page out a
memory region larger than 100 MiB that has been only rarely accessed for more
than 2 minutes", or "do not use THP for a memory region larger than 2 MiB that
has been rarely accessed for more than 1 second".

This RFC patchset makes DAMON handle such data access monitoring-based
operation schemes. With this change, users can do data access-aware
optimizations by simply specifying their schemes to DAMON.

[1] https://lore.kernel.org/linux-mm/20200406130938.14066-1-sjpark@amazon.com/


Evaluations
===========

Setup
-----

On my personal QEMU/KVM based virtual machine on an Intel i7 host machine
running Ubuntu 18.04, I measure runtime and consumed system memory while
running various realistic workloads with several configurations. I use 13 and
12 workloads from the PARSEC3[3] and SPLASH-2X[4] benchmark suites,
respectively. I use my own wrapper scripts[5] to set up and run the
workloads. On top of this patchset, I also applied the DAMON-based operation
schemes patchset[6] for this evaluation.

Measurement
~~~~~~~~~~~

To measure the amount of consumed memory in system-global scope, I drop
caches before starting each of the workloads and monitor 'MemFree' in the
'/proc/meminfo' file. To make the results more stable, I repeat the runs 5
times and average the results. You can get the stdev, min, and max of the
numbers among the repeated runs in the appendix below.
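
The measurement step described above can be sketched roughly as follows. This
is only an illustration, not the wrapper used for this evaluation: the
function names (parse_meminfo_kib, read_memfree_kib, summarize_runs) are
hypothetical, and how the 'MemFree' samples are turned into the reported
memory usage is not specified here, so the sketch stops at reading the field
and summarizing repeated runs.

```python
import statistics


def parse_meminfo_kib(text, field="MemFree"):
    """Return the value of `field` (in KiB) from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # Lines look like "MemFree:         1816303 kB".
            return int(rest.split()[0])
    raise KeyError(field)


def read_memfree_kib(path="/proc/meminfo"):
    """Read the current MemFree value from the running system (Linux only)."""
    with open(path) as f:
        return parse_meminfo_kib(f.read())


def summarize_runs(samples):
    """Summarize one number per repeated run as avg, stdev, min, and max."""
    return {
        "avg": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


# Made-up sample input standing in for the contents of /proc/meminfo.
SAMPLE = "MemTotal:        8000000 kB\nMemFree:         1816303 kB\n"
memfree = parse_meminfo_kib(SAMPLE)
summary = summarize_runs([10, 20, 30])
```

In a real run, read_memfree_kib() would be sampled while each workload
executes, and summarize_runs() would be applied to the per-run numbers of the
5 repetitions.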

Configurations
~~~~~~~~~~~~~~

The configurations I use are as below.

orig: Linux v5.6 with 'madvise' THP policy
rec: 'orig' plus DAMON running with record feature
thp: same as 'orig', but with 'always' THP policy
ethp: 'orig' plus a DAMON operation scheme[6], 'efficient THP'
prcl: 'orig' plus a DAMON operation scheme, 'proactive reclaim[7]'

I use 'rec' to measure DAMON's overhead on the target workloads and system
memory. The remaining configs, 'thp', 'ethp', and 'prcl', are for measuring
DAMON's monitoring accuracy.

'ethp' and 'prcl' are simple DAMON-based operation schemes developed as a
proof of concept of DAMON. 'ethp' reduces the memory space waste of THP by
using DAMON to decide on promotion and demotion of huge pages, while 'prcl' is
similar to the original work. They are implemented as below:

    # format: <min/max size> <min/max frequency (0-100)> <min/max age> <action>
    # ethp: Use huge pages if a region >2MB shows >5% access rate, use regular
    # pages if a region >2MB shows <5% access rate for >1 second
    2M null    5 null    null null    hugepage
    2M null    null 5    1s null      nohugepage

    # prcl: If a region >4KB shows <5% access rate for >5 seconds, page out.
    4K null    null 5    500ms null   pageout

Note that both 'ethp' and 'prcl' are designed only with my straightforward
intuition, because they are only a proof of concept and a check of DAMON's
monitoring accuracy. In other words, they are not for production. For
production use, they should be tuned more.
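
To make the scheme format above concrete, here is a minimal sketch of how one
scheme line could be read into numeric bounds. It is not the converter shipped
in tools/damon; the function names and the returned dictionary layout are
hypothetical, 'null' is assumed to mean "no bound", and only the unit suffixes
that appear in the examples (K, M for sizes; ms, s for ages) plus G are
handled.

```python
SIZE_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # G is an assumption
TIME_UNITS_MS = {"ms": 1, "s": 1000}


def parse_size(token):
    """'2M' -> 2097152 bytes, 'null' -> None (no bound)."""
    if token == "null":
        return None
    if token[-1] in SIZE_UNITS:
        return int(token[:-1]) * SIZE_UNITS[token[-1]]
    return int(token)


def parse_freq(token):
    """Access frequency in percent (0-100), 'null' -> None."""
    return None if token == "null" else int(token)


def parse_age_ms(token):
    """'1s' -> 1000, '500ms' -> 500, 'null' -> None."""
    if token == "null":
        return None
    # Check "ms" before "s" so that "500ms" is not read as seconds.
    for suffix in ("ms", "s"):
        if token.endswith(suffix):
            return int(token[: -len(suffix)]) * TIME_UNITS_MS[suffix]
    raise ValueError("unknown time unit in %r" % token)


def parse_scheme(line):
    """Parse '<min/max size> <min/max frequency> <min/max age> <action>'."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    min_sz, max_sz, min_freq, max_freq, min_age, max_age, action = fields
    return {
        "min_size": parse_size(min_sz),
        "max_size": parse_size(max_sz),
        "min_freq": parse_freq(min_freq),
        "max_freq": parse_freq(max_freq),
        "min_age_ms": parse_age_ms(min_age),
        "max_age_ms": parse_age_ms(max_age),
        "action": action,
    }


def parse_schemes(text):
    """Parse several lines, skipping blank lines and '#' comments."""
    lines = (line.strip() for line in text.splitlines())
    return [parse_scheme(l) for l in lines if l and not l.startswith("#")]


ETHP = """
# ethp
2M null    5 null    null null    hugepage
2M null    null 5    1s null      nohugepage
"""
ethp_schemes = parse_schemes(ETHP)
```

With this reading, the first 'ethp' line becomes "regions of at least 2 MiB
with an access frequency of at least 5%, of any age, get the hugepage action".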

[1] "Redis latency problems troubleshooting",
    https://redis.io/topics/latency
[2] "Disable Transparent Huge Pages (THP)",
    https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
[3] "The PARSEC Benchmark Suite",
    https://parsec.cs.princeton.edu/index.htm
[4] "SPLASH-2x",
    https://parsec.cs.princeton.edu/parsec3-doc.htm#splash2x
[5] "parsec3_on_ubuntu",
    https://github.com/sjp38/parsec3_on_ubuntu
[6] "[RFC v4 0/7] Implement Data Access Monitoring-based Memory Operation Schemes",
    https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@amazon.com/
[7] "Proactively reclaiming idle memory",
    https://lwn.net/Articles/787611/

Results
-------

The two tables below show the measurement results. The runtimes are in
seconds while the memory usages are in KiB. Each configuration except 'orig'
shows its overhead relative to 'orig' in percent, in parentheses.

runtime                 orig      rec       (overhead) thp       (overhead) ethp      (overhead) prcl      (overhead)
parsec3/blackscholes    107.755   106.693   (-0.99)    106.408   (-1.25)    107.848   (0.09)     112.142   (4.07)
parsec3/bodytrack       79.603    79.110    (-0.62)    78.862    (-0.93)    79.577    (-0.03)    80.579    (1.23)
parsec3/canneal         139.588   139.148   (-0.31)    125.747   (-9.92)    130.833   (-6.27)    157.601   (12.90)
parsec3/dedup           11.923    11.860    (-0.53)    11.739    (-1.55)    11.931    (0.06)     13.090    (9.78)
parsec3/facesim         208.270   208.401   (0.06)     205.557   (-1.30)    206.114   (-1.04)    216.352   (3.88)
parsec3/ferret          190.247   190.540   (0.15)     191.056   (0.43)     190.492   (0.13)     193.026   (1.46)
parsec3/fluidanimate    210.495   212.142   (0.78)     210.075   (-0.20)    211.365   (0.41)     220.724   (4.86)
parsec3/freqmine        287.887   292.770   (1.70)     287.576   (-0.11)    289.190   (0.45)     296.266   (2.91)
parsec3/raytrace        117.887   119.385   (1.27)     118.781   (0.76)     118.572   (0.58)     129.831   (10.13)
parsec3/streamcluster   321.637   327.692   (1.88)     283.875   (-11.74)   291.699   (-9.31)    329.212   (2.36)
parsec3/swaptions       154.148   155.623   (0.96)     155.070   (0.60)     154.952   (0.52)     155.241   (0.71)
parsec3/vips            58.851    58.527    (-0.55)    58.396    (-0.77)    58.979    (0.22)     59.970    (1.90)
parsec3/x264            70.559    68.624    (-2.74)    66.662    (-5.52)    67.817    (-3.89)    71.065    (0.72)
splash2x/barnes         80.678    80.491    (-0.23)    74.135    (-8.11)    79.493    (-1.47)    98.688    (22.32)
splash2x/fft            33.565    33.434    (-0.39)    23.153    (-31.02)   31.181    (-7.10)    45.662    (36.04)
splash2x/lu_cb          85.536    85.391    (-0.17)    84.396    (-1.33)    86.323    (0.92)     89.000    (4.05)
splash2x/lu_ncb         92.899    92.830    (-0.07)    90.075    (-3.04)    93.566    (0.72)     95.603    (2.91)
splash2x/ocean_cp       44.529    44.741    (0.47)     43.049    (-3.32)    44.117    (-0.93)    57.652    (29.47)
splash2x/ocean_ncp      81.271    81.538    (0.33)     51.337    (-36.83)   62.990    (-22.49)   137.621   (69.34)
splash2x/radiosity      91.411    91.329    (-0.09)    90.889    (-0.57)    91.944    (0.58)     102.682   (12.33)
splash2x/radix          31.194    31.202    (0.03)     25.258    (-19.03)   28.667    (-8.10)    43.684    (40.04)
splash2x/raytrace       83.930    84.754    (0.98)     83.734    (-0.23)    83.394    (-0.64)    84.932    (1.19)
splash2x/volrend        86.163    87.052    (1.03)     86.918    (0.88)     86.621    (0.53)     87.520    (1.57)
splash2x/water_nsquared 231.335   234.050   (1.17)     222.722   (-3.72)    224.502   (-2.95)    236.589   (2.27)
splash2x/water_spatial  88.753    89.167    (0.47)     89.542    (0.89)     89.510    (0.85)     97.960    (10.37)
total                   2990.130  3006.480  (0.55)     2865.010  (-4.18)    2921.670  (-2.29)    3212.680  (7.44)

memused.avg             orig          rec           (overhead) thp           (overhead) ethp          (overhead) prcl          (overhead)
parsec3/blackscholes    1816303.000   1835404.800   (1.05)     1825285.800   (0.49)     1827203.000   (0.60)     1641411.600   (-9.63)
parsec3/bodytrack       1413888.000   1435353.800   (1.52)     1418535.200   (0.33)     1423560.600   (0.68)     1449993.600   (2.55)
parsec3/canneal         1042149.000   1053590.600   (1.10)     1038469.400   (-0.35)    1051556.600   (0.90)     1044271.200   (0.20)
parsec3/dedup           2364713.400   2448044.200   (3.52)     2397824.600   (1.40)     2427849.200   (2.67)     2402863.000   (1.61)
parsec3/facesim         540004.800    554035.000    (2.60)     543449.800    (0.64)     553955.400    (2.58)     483559.400    (-10.45)
parsec3/ferret          319349.600    331756.400    (3.89)     319751.600    (0.13)     333884.000    (4.55)     329600.400    (3.21)
parsec3/fluidanimate    576741.400    587662.400    (1.89)     576208.000    (-0.09)    586089.800    (1.62)     489655.000    (-15.10)
parsec3/freqmine        986222.400    999265.800    (1.32)     987716.200    (0.15)     1001756.400   (1.58)     766269.800    (-22.30)
parsec3/raytrace        1748338.200   1750036.000   (0.10)     1742218.400   (-0.35)    1755005.000   (0.38)     1584009.400   (-9.40)
parsec3/streamcluster   134980.800    136257.600    (0.95)     119580.000    (-11.41)   135188.600    (0.15)     132589.600    (-1.77)
parsec3/swaptions       13893.800     28265.000     (103.44)   16206.000     (16.64)    27826.800     (100.28)   26332.800     (89.53)
parsec3/vips            2954105.600   2972710.000   (0.63)     2955940.200   (0.06)     2971989.600   (0.61)     2968768.600   (0.50)
parsec3/x264            3169214.400   3206571.400   (1.18)     3185179.200   (0.50)     3170560.000   (0.04)     3209772.400   (1.28)
splash2x/barnes         1213585.000   1211837.400   (-0.14)    1220890.600   (0.60)     1215453.600   (0.15)     974635.600    (-19.69)
splash2x/fft            9371991.000   9201587.200   (-1.82)    9292089.200   (-0.85)    9108707.400   (-2.81)    9625476.600   (2.70)
splash2x/lu_cb          515113.800    523791.000    (1.68)     520880.200    (1.12)     523066.800    (1.54)     362113.400    (-29.70)
splash2x/lu_ncb         514847.800    524934.000    (1.96)     521362.400    (1.27)     521515.600    (1.30)     445374.200    (-13.49)
splash2x/ocean_cp       3341933.600   3322040.400   (-0.60)    3381251.000   (1.18)     3292229.400   (-1.49)    3181383.000   (-4.80)
splash2x/ocean_ncp      3899426.800   3870830.800   (-0.73)    7065641.200   (81.20)    5099403.200   (30.77)    3557460.000   (-8.77)
splash2x/radiosity      1465960.800   1470778.600   (0.33)     1482777.600   (1.15)     1500133.400   (2.33)     498807.200    (-65.97)
splash2x/radix          1711100.800   1672141.400   (-2.28)    1387826.200   (-18.89)   1516728.600   (-11.36)   2043053.600   (19.40)
splash2x/raytrace       47586.400     58698.000     (23.35)    51308.400     (7.82)     61274.800     (28.77)    54446.200     (14.42)
splash2x/volrend        150480.400    164633.800    (9.41)     150819.600    (0.23)     163517.400    (8.66)     161828.200    (7.54)
splash2x/water_nsquared 47147.600     62403.400     (32.36)    47689.600     (1.15)     60030.800     (27.33)    59736.600     (26.70)
splash2x/water_spatial  666544.600    674447.800    (1.19)     665904.600    (-0.10)    673677.600    (1.07)     559765.200    (-16.02)
total                   40025500.000  40096900.000  (0.18)     42914900.000  (7.22)     41002100.000  (2.44)     38053200.000  (-4.93)

DAMON Overheads
~~~~~~~~~~~~~~~

In total, the DAMON recording feature incurs 0.55% runtime overhead (up to
1.88% in the worst case, with 'parsec3/streamcluster') and 0.18% memory space
overhead.

To run the 'rec' tests conveniently, I use a Python wrapper. The wrapper
constantly consumes about 10-15 MB of memory. This becomes a high relative
memory overhead if the target workload has a small memory footprint. In
detail, parsec3/swaptions (13 MiB), splash2x/raytrace (47 MiB),
splash2x/volrend (150 MiB), and splash2x/water_nsquared (46 MiB) show 103.44%,
23%, 9%, and 32% overheads, respectively. Nonetheless, these overheads come
not from DAMON but from the wrapper, and thus should be ignored. This fake
memory overhead also appears in 'ethp' and 'prcl', as those configurations
also use the Python wrapper.

Efficient THP
~~~~~~~~~~~~~

The THP 'always' policy achieves a 4.18% speedup but incurs 7.22% memory
overhead. It achieves a 36.83% speedup in the best case, but 81.20% memory
overhead in the worst case. Interestingly, both the best and the worst cases
are with 'splash2x/ocean_ncp'.

The 2-line implementation of the data access monitoring-based THP version
('ethp') shows a 2.29% speedup and 2.44% memory overhead. In other words,
'ethp' removes 66.2% of the THP memory waste while preserving 54.78% of the
THP speedup in total (that is, (7.22 - 2.44) / 7.22 and 2.29 / 4.18,
respectively). In the case of 'splash2x/ocean_ncp', 'ethp' removes 62.10% of
the THP memory waste while preserving 61% of the THP speedup.

Proactive Reclamation
~~~~~~~~~~~~~~~~~~~~~

As in the original work, I use a 'zram' swap device for this configuration.

In total, our 1-line implementation of Proactive Reclamation, 'prcl', incurs
7.44% runtime overhead while achieving a 4.93% reduction in system memory
usage.

Nonetheless, as the memory usage is calculated with 'MemFree' in
'/proc/meminfo', it contains the SwapCached pages.
As the SwapCached pages can be easily evicted, I also measured the resident
set size of the workloads:

rss.avg                 orig          rec           (overhead) thp           (overhead) ethp          (overhead) prcl          (overhead)
parsec3/blackscholes    591461.000    590761.000    (-0.12)    592669.200    (0.20)     592442.600    (0.17)     308627.200    (-47.82)
parsec3/bodytrack       32201.400     32242.800     (0.13)     32299.000     (0.30)     32327.600     (0.39)     27411.000     (-14.88)
parsec3/canneal         841593.600    839721.400    (-0.22)    837427.600    (-0.50)    838363.400    (-0.38)    822220.600    (-2.30)
parsec3/dedup           1210000.600   1235153.600   (2.08)     1205207.200   (-0.40)    1229808.800   (1.64)     827881.400    (-31.58)
parsec3/facesim         311630.400    311273.200    (-0.11)    314747.400    (1.00)     312449.400    (0.26)     184104.600    (-40.92)
parsec3/ferret          99714.800     99558.400     (-0.16)    100996.800    (1.29)     99769.600     (0.05)     88979.200     (-10.77)
parsec3/fluidanimate    531429.600    531855.200    (0.08)     531744.800    (0.06)     532158.600    (0.14)     428154.000    (-19.43)
parsec3/freqmine        553063.600    552561.000    (-0.09)    556588.600    (0.64)     553518.000    (0.08)     65516.800     (-88.15)
parsec3/raytrace        894129.800    894332.400    (0.02)     889421.800    (-0.53)    892801.000    (-0.15)    363634.000    (-59.33)
parsec3/streamcluster   110887.200    110949.400    (0.06)     111508.400    (0.56)     111645.000    (0.68)     109921.200    (-0.87)
parsec3/swaptions       5688.600      5660.800      (-0.49)    5656.400      (-0.57)    5709.200      (0.36)     4201.000      (-26.15)
parsec3/vips            31774.800     31992.000     (0.68)     32134.800     (1.13)     32212.400     (1.38)     29026.000     (-8.65)
parsec3/x264            81897.400     81842.200     (-0.07)    83073.800     (1.44)     82435.200     (0.66)     80929.400     (-1.18)
splash2x/barnes         1216429.200   1212158.000   (-0.35)    1223021.400   (0.54)     1218261.200   (0.15)     710678.800    (-41.58)
splash2x/fft            9582824.800   9732597.400   (1.56)     9695113.400   (1.17)     9665607.200   (0.86)     7959449.000   (-16.94)
splash2x/lu_cb          509782.600    509423.400    (-0.07)    514467.000    (0.92)     510521.000    (0.14)     346267.200    (-32.08)
splash2x/lu_ncb         509735.200    510578.000    (0.17)     513892.200    (0.82)     509864.800    (0.03)     429509.800    (-15.74)
splash2x/ocean_cp       3402516.400   3405858.200   (0.10)     3442579.400   (1.18)     3411920.400   (0.28)     2782917.800   (-18.21)
splash2x/ocean_ncp      3924875.800   3921542.800   (-0.08)    7179644.000   (82.93)    5243201.400   (33.59)    2760506.600   (-29.67)
splash2x/radiosity      1472925.800   1475449.200   (0.17)     1485645.800   (0.86)     1473646.000   (0.05)     248785.000    (-83.11)
splash2x/radix          1748452.000   1750998.000   (0.15)     1434846.600   (-17.94)   1606307.800   (-8.13)    1713493.600   (-2.00)
splash2x/raytrace       23265.600     23278.400     (0.06)     29232.800     (25.65)    27050.400     (16.27)    16464.600     (-29.23)
splash2x/volrend        44020.600     44048.400     (0.06)     44148.400     (0.29)     44125.400     (0.24)     28101.800     (-36.16)
splash2x/water_nsquared 29420.800     29409.600     (-0.04)    29808.400     (1.32)     29984.800     (1.92)     25234.000     (-14.23)
splash2x/water_spatial  656716.000    656514.200    (-0.03)    656023.000    (-0.11)    656411.600    (-0.05)    498736.400    (-24.06)
total                   28416316.000  28589600.000  (0.61)     31541823.000  (11.00)    29712600.000  (4.56)     20860800.000  (-26.59)

In total, the resident sets were reduced by 26.59%. With parsec3/freqmine,
'prcl' reduced the resident set by 88.15% and the system memory footprint by
22.30% while incurring only 2.91% runtime overhead.


Baseline and Complete Git Tree
==============================

The patches are based on v5.6 plus the v9 DAMON patchset[1] and Minchan's
``do_madvise()`` patch[2]. Minchan's patch was necessary to reuse the
``madvise()`` code in DAMON. You can also clone the complete git tree:

    $ git clone git://github.com/sjp38/linux -b damos/rfc/v7

A web view is also available:
https://github.com/sjp38/linux/releases/tag/damos/rfc/v7

The latest DAMON development tree is also available at:
https://github.com/sjp38/linux/tree/damon/master

[1] https://lore.kernel.org/linux-mm/20200406130938.14066-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20200302193630.68771-2-minchan@kernel.org/


Sequence Of Patches
===================

The first patch allows DAMON to reuse the ``madvise()`` code for the actions.
The second patch accounts the age of each region.
The third patch implements the handling of the schemes in DAMON and exports a
kernel space programming interface for it. The fourth patch implements a
debugfs interface for privileged users and programs. The fifth and sixth
patches add kunit tests and selftests for these changes, respectively, and
finally the seventh patch adds support for human-friendly schemes to the user
space tool for DAMON.


Patch History
=============

Changes from RFC v6
(https://lore.kernel.org/linux-mm/20200407100007.3894-1-sjpark@amazon.com/)
 - Rebase on DAMON v9 patchset
 - Cleanup code and fix typos (Stefan Nuernberger)

Changes from RFC v5
(https://lore.kernel.org/linux-mm/20200330115042.17431-1-sjpark@amazon.com/)
 - Rebase on DAMON v8 patchset
 - Update test results
 - Fix DAMON userspace tool crash on signal handling
 - Fix checkpatch warnings

Changes from RFC v4
(https://lore.kernel.org/linux-mm/20200303121406.20954-1-sjpark@amazon.com/)
 - Handle CONFIG_ADVISE_SYSCALL
 - Clean up code (Jonathan Cameron)
 - Update test results
 - Rebase on v5.6 + DAMON v7

Changes from RFC v3
(https://lore.kernel.org/linux-mm/20200225102300.23895-1-sjpark@amazon.com/)
 - Add Reviewed-by from Brendan Higgins
 - Code cleanup: Modularize madvise() call
 - Fix a trivial bug in the wrapper python script
 - Add more stable and detailed evaluation results with updated ETHP scheme

Changes from RFC v2
(https://lore.kernel.org/linux-mm/20200218085309.18346-1-sjpark@amazon.com/)
 - Fix aging mechanism for better 'old region' selection
 - Add more kunittests and kselftests for this patchset
 - Support more human-friendly description and application of 'schemes'

Changes from RFC v1
(https://lore.kernel.org/linux-mm/20200210150921.32482-1-sjpark@amazon.com/)
 - Properly adjust age accounting related properties after splitting, merging,
   and action applying

SeongJae Park (7):
  mm/madvise: Export do_madvise() to external GPL modules
  mm/damon: Account age of target regions
  mm/damon: Implement data access monitoring-based operation schemes
  mm/damon/schemes: Implement a debugfs interface
  mm/damon-test: Add kunit test case for regions age accounting
  mm/damon/selftests: Add 'schemes' debugfs tests
  damon/tools: Support more human friendly 'schemes' control

 include/linux/damon.h                         |  29 ++
 mm/damon-test.h                               |   5 +
 mm/damon.c                                    | 429 +++++++++++++++++-
 mm/madvise.c                                  |   1 +
 tools/damon/_convert_damos.py                 | 126 +++++
 tools/damon/_damon.py                         | 143 ++++++
 tools/damon/damo                              |   7 +
 tools/damon/record.py                         | 135 +-----
 tools/damon/schemes.py                        | 105 +++++
 .../testing/selftests/damon/debugfs_attrs.sh  |  29 ++
 10 files changed, 880 insertions(+), 129 deletions(-)
 create mode 100755 tools/damon/_convert_damos.py
 create mode 100644 tools/damon/_damon.py
 create mode 100644 tools/damon/schemes.py