From: SeongJae Park
Subject: [PATCH 7/8] Documentation/admin-guide/mm: Add a document for DAMON
Date: Mon, 20 Jan 2020 17:30:23 +0100
Message-ID: <20200120163024.647-3-sjpark@amazon.com>
In-Reply-To: <20200120163024.647-1-sjpark@amazon.com>
References: <20200120162757.32375-1-sjpark@amazon.com> <20200120163024.647-1-sjpark@amazon.com>

From: SeongJae Park

This commit adds a simple document for DAMON under
`Documentation/admin-guide/mm`.

Signed-off-by: SeongJae Park
---
 .../admin-guide/mm/data_access_monitor.rst    | 240 ++++++++++++++++++
 Documentation/admin-guide/mm/index.rst        |   1 +
 MAINTAINERS                                   |   1 +
 3 files changed, 242 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/data_access_monitor.rst

diff --git a/Documentation/admin-guide/mm/data_access_monitor.rst b/Documentation/admin-guide/mm/data_access_monitor.rst
new file mode 100644
index 000000000000..7a4d7ce88c20
--- /dev/null
+++ b/Documentation/admin-guide/mm/data_access_monitor.rst
@@ -0,0 +1,240 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+DAMON: Data Access MONitor
+==========================
+
+
+Too Long; Didn't Read
+=====================
+
+DAMON is a kernel module that allows users to monitor the actual memory access
+pattern of specific user-space processes.
+It aims to be 1) accurate enough to be useful for performance-centric domains,
+and 2) sufficiently lightweight that it can be applied online.
+
+To achieve these goals, DAMON uses two core mechanisms, called region-based
+sampling and adaptive regions adjustment.  Region-based sampling lets users
+make their own trade-off between the quality and the overhead of the
+monitoring, and set an upper bound on the monitoring overhead.  In addition,
+the adaptive regions adjustment mechanism makes DAMON maximize the quality and
+minimize the overhead on a best-effort basis while preserving the
+user-configured trade-off.
+
+Please note that the term 'memory' in this document means 'main memory'.  The
+document also assumes that main memory is usually built from mid-speed memory
+devices such as DRAM or NVRAM.  CPU caches and storage devices are out of
+scope, as those are too fast or too slow, respectively, for DAMON.
+
+
+Background
+==========
+
+For performance-centric analysis and optimization of memory management schemes
+(in either kernel space or user space), the actual data access pattern of the
+workloads is highly useful.  The information needs to be only reasonably
+accurate rather than strictly correct, because some level of inaccuracy can be
+tolerated in many performance-centric domains.  It also needs to be collected
+within a reasonably short time and with only lightweight overhead.
+
+Manually extracting such data is difficult and time consuming if the target
+workload is huge and complex, even for the developers of the programs.  There
+is a range of tools and techniques developed for general memory access
+investigations, and some of those could be partially used for this purpose.
+However, most of those are impractical or do not scale, mainly because they
+are designed with no consideration of the trade-off between the accuracy of
+the output and the overhead.
+
+Memory access instrumentation techniques, which are used by many tools such as
+Intel PIN, are essential for cases that require correctness, such as detection
+of invalid memory access bugs.  However, they usually incur high overhead,
+which is unacceptable for many performance-centric domains.  Periodic access
+checks based on H/W or S/W access counting features (e.g., the Accessed bits
+of PTEs or the PG_idle flags of pages) can dramatically decrease the overhead
+by sacrificing some of the quality, compared to the instrumentation-based
+techniques.  The reduced quality is still reasonable for many of the domains,
+but the overhead can increase arbitrarily as the size of the target workload
+grows.  Miniature-like static region-based sampling can set an upper bound on
+the overhead, but the quality of its output decreases as the size of the
+workload grows.
+
+
+Expected Use-cases
+==================
+
+A straightforward use case of DAMON is program behavior analysis.  With the
+DAMON output, users can confirm whether the program is running as intended.
+This is useful for debugging and for testing design points.
+
+The monitored results can also be used to estimate the dynamic working set
+size of workloads.  This is useful for administering memory-overcommitted
+systems or for selecting the environment (e.g., containers providing different
+amounts of memory) for your workloads.
+
+If you are a programmer, you can optimize your program by managing the memory
+based on the actual data access pattern.  For example, you can identify the
+dynamic hotness of your data using DAMON and call ``mlock()`` to keep your hot
+data in DRAM, or call ``madvise()`` with ``MADV_PAGEOUT`` to proactively
+reclaim cold data.  Even if your program is guaranteed not to encounter memory
+pressure, you can still improve the performance by using the DAMON output to
+decide where to apply ``MADV_HUGEPAGE`` and ``MADV_NOHUGEPAGE``.
+More creative optimizations are possible.  Our evaluation of DAMON includes a
+straightforward optimization using ``mlock()``.  Please refer to the
+Evaluation section for more detail.
+
+As DAMON incurs very low overhead, such optimizations can be applied not only
+offline, but also online.  Also, there is no reason to limit such
+optimizations to user space.  Several parts of the kernel's memory management
+mechanisms could also be optimized using DAMON.  Reclamation, THP
+(de)promotion decisions, and compaction are such candidates.
+
+
+Mechanisms of DAMON
+===================
+
+
+Basic Access Check
+------------------
+
+DAMON basically reports which pages are accessed and how frequently.  The
+report is passed to users in binary format via a ``result file`` whose path
+users can set.  Note that the frequency is not an absolute number of accesses,
+but a relative frequency among the pages of the target workloads.
+
+Users can also control the resolution of the reports by setting two time
+intervals, ``sampling interval`` and ``aggregation interval``.  In detail,
+DAMON checks access to each page every ``sampling interval``, aggregates the
+results (counts the number of accesses to each page), and reports the
+aggregated results every ``aggregation interval``.  For the access check of
+each page, DAMON uses the Accessed bits of PTEs.
+
+This is thus similar to the previously mentioned mechanisms based on periodic
+access checks, whose overhead increases as the size of the target process
+grows.
+
+
+Region Based Sampling
+---------------------
+
+To avoid the unbounded increase of the overhead, DAMON groups a number of
+adjacent pages that are assumed to have the same access frequency into a
+region.  As long as the assumption (pages in a region have the same access
+frequency) holds, only one page in the region needs to be checked.
+Thus, for each ``sampling interval``, DAMON randomly picks one page in each
+region and clears its Accessed bit.  After one more ``sampling interval``,
+DAMON reads the Accessed bit of the page and increases the access frequency of
+the region if the bit has been set in the meantime.  Therefore, the monitoring
+overhead is controllable by setting the number of regions.  DAMON allows users
+to set the minimum and maximum number of regions for the trade-off.
+
+Apart from the assumption, this is almost the same as the above-mentioned
+miniature-like static region-based sampling.  In other words, this scheme
+cannot preserve the quality of the output if the assumption does not hold.
+
+
+Adaptive Regions Adjustment
+---------------------------
+
+At the beginning of the monitoring, DAMON constructs the initial regions by
+evenly splitting the memory-mapped address space of the process into the
+user-specified minimum number of regions.  In this initial state, the
+assumption normally does not hold and thus the quality could be low.  To keep
+the assumption valid as much as possible, DAMON adaptively merges and splits
+each region.  For each ``aggregation interval``, it compares the access
+frequencies of adjacent regions and merges them if the frequency difference is
+small.  Then, after it reports and clears the aggregated access frequency of
+each region, it splits each region into two regions if the total number of
+regions is smaller than half of the user-specified maximum number of regions.
+
+In this way, DAMON provides its best-effort quality and minimal overhead while
+keeping the bounds users set for their trade-off.
+
+
+Applying Dynamic Memory Mappings
+--------------------------------
+
+Only a few small parts of the huge virtual address space of a process are
+mapped to physical memory and accessed.  Thus, tracking the unmapped address
+regions is wasteful.  However, tracking every memory mapping change might
+itself incur overhead.
+For this reason, DAMON applies the dynamic memory mapping changes to the
+tracking regions only once per user-specified time interval
+(``regions update interval``).
+
+
+User Interface
+==============
+
+DAMON exports three files, ``attrs``, ``pids``, and ``monitor_on``, under its
+debugfs directory, ``/damon/``.
+
+
+Attributes
+----------
+
+Users can read and write the ``sampling interval``, ``aggregation interval``,
+``regions update interval``, min/max number of regions, and the path to the
+``result file`` by reading from and writing to the ``attrs`` file.  As the
+example shows, the intervals are given in microseconds.  The commands below
+set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000, and ``/damon.data``, and
+then check them again::
+
+    # cd /damon
+    # echo 5000 100000 1000000 10 1000 /damon.data > attrs
+    # cat attrs
+    5000 100000 1000000 10 1000 /damon.data
+
+
+Target PIDs
+-----------
+
+Users can read and write the pids of the current monitoring target processes
+by reading from and writing to the ``pids`` file.  For example, the commands
+below set the processes having pids 42 and 4242 as the processes to be
+monitored, and then check them again::
+
+    # cd /damon
+    # echo 42 4242 > pids
+    # cat pids
+    42 4242
+
+Note that setting the pids doesn't start the monitoring.
+
+
+Turning On/Off
+--------------
+
+You can check the current status, and start and stop the monitoring, by
+reading from and writing to the ``monitor_on`` file.  Writing ``on`` to the
+file makes DAMON start monitoring the target processes with the attributes.
+Writing ``off`` to the file stops DAMON.  DAMON also stops if every target
+process is terminated.  The example commands below turn DAMON on and off, and
+then check its status::
+
+    # cd /damon
+    # echo on > monitor_on
+    # echo off > monitor_on
+    # cat monitor_on
+    off
+
+Please note that you cannot write to the ``attrs`` and ``pids`` files while
+the monitoring is turned on.  If you write to the files while DAMON is
+running, ``-EINVAL`` will be returned.
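The order of operations described above (write ``attrs`` and ``pids`` first, then write ``on`` to ``monitor_on``) could be scripted roughly as follows.  This is only a minimal sketch, not the ``damo`` script: every function name here is hypothetical, and ``damon_dir`` is an assumed parameter standing for wherever the three DAMON files live on your system.  The ``dry_run()`` helper deliberately uses a temporary directory instead of debugfs, so it exercises only the formatting and the ordering, not the kernel.

```python
import os
import tempfile


def format_attrs(sample_us, aggr_us, update_us, min_regions, max_regions,
                 result_path):
    """Build the line written to ``attrs``; intervals are in microseconds."""
    return "%d %d %d %d %d %s" % (sample_us, aggr_us, update_us,
                                  min_regions, max_regions, result_path)


def write_file(damon_dir, name, content):
    """Write one line to a DAMON file, as ``echo content > name`` would."""
    with open(os.path.join(damon_dir, name), "w") as f:
        f.write(content + "\n")


def read_file(damon_dir, name):
    """Read a DAMON file, as ``cat name`` would."""
    with open(os.path.join(damon_dir, name)) as f:
        return f.read().strip()


def start_monitoring(damon_dir, pids, attrs):
    """Configure DAMON, then turn it on."""
    # ``attrs`` and ``pids`` are written first because, per the document,
    # writing to them while monitoring is on fails with -EINVAL.
    write_file(damon_dir, "attrs", attrs)
    write_file(damon_dir, "pids", " ".join(str(pid) for pid in pids))
    write_file(damon_dir, "monitor_on", "on")


def stop_monitoring(damon_dir):
    """Turn DAMON off."""
    write_file(damon_dir, "monitor_on", "off")


def dry_run():
    """Exercise the helpers on a temporary directory instead of debugfs."""
    with tempfile.TemporaryDirectory() as fake_dir:
        attrs = format_attrs(5000, 100000, 1000000, 10, 1000, "/damon.data")
        start_monitoring(fake_dir, [42, 4242], attrs)
        state = {name: read_file(fake_dir, name)
                 for name in ("attrs", "pids", "monitor_on")}
        stop_monitoring(fake_dir)
        state["after_stop"] = read_file(fake_dir, "monitor_on")
        return state
```

On a real system, ``start_monitoring()`` would be called as root with the actual DAMON debugfs directory, and a failed write while monitoring is on would surface as an ``OSError``.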
+
+
+User Space Wrapper
+------------------
+
+DAMON has a shallow wrapper Python script, ``/tools/damon/damo``, that
+provides a more convenient interface.  Note that it is only intended to serve
+as a minimal reference for DAMON's raw interfaces and for debugging DAMON
+itself.  Based on the debugfs interface, you can create other, more convenient
+user space tools.
+
+
+Quick Tutorial
+--------------
+
+To test DAMON on your system:
+
+1. Ensure your kernel is built with CONFIG_DAMON turned on, and debugfs is
+   mounted at ``/sys/kernel/debug/``.
+2. Run ``/tools/damon/damo -h``.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 11db46448354..d3d0ba373eb6 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -27,6 +27,7 @@ the Linux memory management.
 
    concepts
    cma_debugfs
+   data_access_monitor
    hugetlbpage
    idle_page_tracking
    ksm
diff --git a/MAINTAINERS b/MAINTAINERS
index fb41236f3aed..71116be4701b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4589,6 +4589,7 @@ L: linux-mm@kvack.org
 S: Maintained
 F: mm/damon.c
 F: tools/damon/*
+F: Documentation/admin-guide/mm/data_access_monitor.rst
 
 DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER
 L: netdev@vger.kernel.org
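
As a recap of the "Mechanisms of DAMON" section of the document added by this patch, the region-based sampling and adaptive regions adjustment steps can be modeled in user space roughly as below.  This is a toy sketch under stated assumptions, not the code in ``mm/damon.c``: all function names are hypothetical, the ``is_accessed`` callback stands in for clearing and re-reading the Accessed bit of a PTE, and the merge threshold, the size-weighted average used for merged counts, and the midpoint split are illustrative choices the document does not specify.

```python
import random


def init_regions(start, end, min_regions):
    """Evenly split [start, end) into regions of [start, end, nr_accesses]."""
    size = (end - start) // min_regions
    regions = [[start + i * size, start + (i + 1) * size, 0]
               for i in range(min_regions)]
    regions[-1][1] = end  # the last region absorbs any remainder
    return regions


def sample(regions, is_accessed, rng):
    """One sampling interval: check a single random page per region."""
    for region in regions:
        page = rng.randrange(region[0], region[1])
        if is_accessed(page):  # stand-in for the Accessed bit check
            region[2] += 1


def merge(regions, threshold):
    """Merge adjacent regions whose access counts differ by <= threshold."""
    merged = [list(regions[0])]
    for start, end, count in regions[1:]:
        last = merged[-1]
        if abs(last[2] - count) <= threshold:
            # Assumption: a merged region gets the size-weighted average.
            total = last[2] * (last[1] - last[0]) + count * (end - start)
            last[1] = end
            last[2] = total // (last[1] - last[0])
        else:
            merged.append([start, end, count])
    return merged


def split(regions, max_regions):
    """Clear the counts; halve every region if fewer than max_regions / 2."""
    if len(regions) >= max_regions / 2:
        return [[start, end, 0] for start, end, _ in regions]
    result = []
    for start, end, _ in regions:
        if end - start < 2:
            result.append([start, end, 0])
            continue
        mid = (start + end) // 2  # assumption: split at the midpoint
        result.append([start, mid, 0])
        result.append([mid, end, 0])
    return result


def aggregate(regions, threshold, max_regions):
    """End of one aggregation interval: merge, report, then clear and split."""
    merged = merge(regions, threshold)
    report = [tuple(region) for region in merged]
    return report, split(merged, max_regions)
```

For example, with pages 0 to 999 of which only pages 0 to 99 are hot, starting from ``init_regions(0, 1000, 10)``, running ``sample()`` 20 times with ``lambda page: page < 100``, and then calling ``aggregate(regions, 2, 100)`` reports one hot region and one merged cold region, and leaves four regions for the next interval.  The number of pages checked per sampling interval always equals the number of regions, which is what bounds the overhead.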