From patchwork Wed Aug 1 15:13:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10552425
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon, Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 1/9] mm: workingset: don't drop refault information prematurely
Date: Wed, 1 Aug 2018 11:13:00 -0400
Message-Id: <20180801151308.32234-2-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>
References: <20180801151308.32234-1-hannes@cmpxchg.org>

From: Johannes Weiner

If we keep just enough refault information to match the CURRENT page
cache during reclaim time, we could lose a lot of events when there is
only a temporary spike in non-cache memory consumption that pushes out
all the cache. Once cache comes back, we won't see those refaults.
They might not be actionable for LRU aging, but we want to know about
them for measuring memory pressure.
Signed-off-by: Johannes Weiner
---
 mm/workingset.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 40ee02c83978..53759a3cf99a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -364,7 +364,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 {
 	unsigned long max_nodes;
 	unsigned long nodes;
-	unsigned long cache;
+	unsigned long pages;
 
 	/* list_lru lock nests inside the IRQ-safe i_pages lock */
 	local_irq_disable();
@@ -393,14 +393,14 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 	 *
 	 * PAGE_SIZE / radix_tree_nodes / node_entries * 8 / PAGE_SIZE
 	 */
-	if (sc->memcg) {
-		cache = mem_cgroup_node_nr_lru_pages(sc->memcg, sc->nid,
-						     LRU_ALL_FILE);
-	} else {
-		cache = node_page_state(NODE_DATA(sc->nid), NR_ACTIVE_FILE) +
-			node_page_state(NODE_DATA(sc->nid), NR_INACTIVE_FILE);
-	}
-	max_nodes = cache >> (RADIX_TREE_MAP_SHIFT - 3);
+#ifdef CONFIG_MEMCG
+	if (sc->memcg)
+		pages = page_counter_read(&sc->memcg->memory);
+	else
+#endif
+		pages = node_present_pages(sc->nid);
+
+	max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
 
 	if (nodes <= max_nodes)
 		return 0;

From patchwork Wed Aug 1 15:13:01 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10552429
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon, Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 2/9] mm: workingset: tell cache transitions from workingset thrashing
Date: Wed, 1 Aug 2018 11:13:01 -0400
Message-Id: <20180801151308.32234-3-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>
References: <20180801151308.32234-1-hannes@cmpxchg.org>

Refaults happen during transitions between workingsets as well as
during in-place thrashing. Knowing the difference between the two has
a range of applications. These include measuring the impact of memory
shortage on system performance and balancing pressure more
intelligently between the filesystem cache and the swap-backed
workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache. When that active cache isn't stale, however,
and also ends up refaulting, that's bona fide thrashing.

Introduce a new page flag that records, at eviction time, whether the
page has been active at any point in its lifetime. This bit is then
stored in the shadow entry to classify refaults as transitioning or
thrashing.
How many page->flags does this leave us with on 32-bit?

	20 bits are always page flags
	21 if you have an MMU
	23 with the zone bits for DMA, Normal, HighMem, Movable
	29 with the sparsemem section bits
	30 if PAE is enabled
	31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.
If that's not enough, the system can switch to discontigmem and regain
the 6 or 7 sparsemem section bits.

Signed-off-by: Johannes Weiner
---
 include/linux/mmzone.h         |  1 +
 include/linux/page-flags.h     |  5 +-
 include/linux/swap.h           |  2 +-
 include/trace/events/mmflags.h |  1 +
 mm/filemap.c                   |  9 ++--
 mm/huge_memory.c               |  1 +
 mm/memcontrol.c                |  2 +
 mm/migrate.c                   |  2 +
 mm/swap_state.c                |  1 +
 mm/vmscan.c                    |  1 +
 mm/vmstat.c                    |  1 +
 mm/workingset.c                | 95 ++++++++++++++++++++++------------
 12 files changed, 79 insertions(+), 42 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..6af87946d241 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -163,6 +163,7 @@ enum node_stat_item {
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
 	WORKINGSET_REFAULT,
 	WORKINGSET_ACTIVATE,
+	WORKINGSET_RESTORE,
 	WORKINGSET_NODERECLAIM,
 	NR_ANON_MAPPED,	/* Mapped anonymous pages */
 	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e34a27727b9a..7af1c3c15d8e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -69,13 +69,14 @@
  */
 enum pageflags {
 	PG_locked,		/* Page is locked. Don't touch. */
-	PG_error,
 	PG_referenced,
 	PG_uptodate,
 	PG_dirty,
 	PG_lru,
 	PG_active,
+	PG_workingset,
 	PG_waiters,		/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
+	PG_error,
 	PG_slab,
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
 	PG_arch_1,
@@ -280,6 +281,8 @@ PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
+PAGEFLAG(Workingset, workingset, PF_HEAD)
+	TESTCLEARFLAG(Workingset, workingset, PF_HEAD)
 __PAGEFLAG(Slab, slab, PF_NO_TAIL)
 __PAGEFLAG(SlobFree, slob_free, PF_NO_TAIL)
 PAGEFLAG(Checked, checked, PF_NO_COMPOUND)	   /* Used by some filesystems */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2417d288e016..d8c47dcdec6f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -296,7 +296,7 @@ struct vma_swap_readahead {
 
 /* linux/mm/workingset.c */
 void *workingset_eviction(struct address_space *mapping, struct page *page);
-bool workingset_refault(void *shadow);
+void workingset_refault(struct page *page, void *shadow);
 void workingset_activation(struct page *page);
 
 /* Do not use directly, use workingset_lookup_update */
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a81cffb76d89..a1675d43777e 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -88,6 +88,7 @@
 	{1UL << PG_dirty,		"dirty"		},		\
 	{1UL << PG_lru,			"lru"		},		\
 	{1UL << PG_active,		"active"	},		\
+	{1UL << PG_workingset,		"workingset"	},		\
 	{1UL << PG_slab,		"slab"		},		\
 	{1UL << PG_owner_priv_1,	"owner_priv_1"	},		\
 	{1UL << PG_arch_1,		"arch_1"	},		\
diff --git a/mm/filemap.c b/mm/filemap.c
index 0604cb02e6f3..bd36b7226cf4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -915,12 +915,9 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		 * data from the working set, only to cache data that will
 		 * get overwritten with something else, is a waste of memory.
 		 */
-		if (!(gfp_mask & __GFP_WRITE) &&
-		    shadow && workingset_refault(shadow)) {
-			SetPageActive(page);
-			workingset_activation(page);
-		} else
-			ClearPageActive(page);
+		WARN_ON_ONCE(PageActive(page));
+		if (!(gfp_mask & __GFP_WRITE) && shadow)
+			workingset_refault(page, shadow);
 		lru_cache_add(page);
 	}
 	return ret;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b9f3dbd885bd..c67ecf77ea8b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2370,6 +2370,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_mlocked) |
 			 (1L << PG_uptodate) |
 			 (1L << PG_active) |
+			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2bd3df3d101a..c59519d600ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5283,6 +5283,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
 		   stat[WORKINGSET_REFAULT]);
 	seq_printf(m, "workingset_activate %lu\n",
 		   stat[WORKINGSET_ACTIVATE]);
+	seq_printf(m, "workingset_restore %lu\n",
+		   stat[WORKINGSET_RESTORE]);
 	seq_printf(m, "workingset_nodereclaim %lu\n",
 		   stat[WORKINGSET_NODERECLAIM]);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 8c0af0f7cab1..a6a9114e62dc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -682,6 +682,8 @@ void migrate_page_states(struct page *newpage, struct page *page)
 		SetPageActive(newpage);
 	} else if (TestClearPageUnevictable(page))
 		SetPageUnevictable(newpage);
+	if (PageWorkingset(page))
+		SetPageWorkingset(newpage);
 	if (PageChecked(page))
 		SetPageChecked(newpage);
 	if (PageMappedToDisk(page))
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 07f9aa2340c3..2721ef8862d1 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -451,6 +451,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/*
 		 * Initiate read into locked page and return.
 		 */
+		SetPageWorkingset(new_page);
 		lru_cache_add_anon(new_page);
 		*new_page_allocated = true;
 		return new_page;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9270a4370d54..8d1ad48ffbcd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1976,6 +1976,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		ClearPageActive(page);	/* we are de-activating */
+		SetPageWorkingset(page);
 		list_add(&page->lru, &l_inactive);
 	}
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a2b9518980ce..507dc9c01b88 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1145,6 +1145,7 @@ const char * const vmstat_text[] = {
 	"nr_isolated_file",
 	"workingset_refault",
 	"workingset_activate",
+	"workingset_restore",
 	"workingset_nodereclaim",
 	"nr_anon_pages",
 	"nr_mapped",
diff --git a/mm/workingset.c b/mm/workingset.c
index 53759a3cf99a..ef6be3d92116 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -121,7 +121,7 @@
  * the only thing eating into inactive list space is active pages.
  *
  *
- *		Activating refaulting pages
+ *		Refaulting inactive pages
  *
  * All that is known about the active list is that the pages have been
  * accessed more than once in the past.  This means that at any given
@@ -134,6 +134,10 @@
  * used less frequently than the refaulting page - or even not used at
  * all anymore.
  *
+ * That means if inactive cache is refaulting with a suitable refault
+ * distance, we assume the cache workingset is transitioning and put
+ * pressure on the current active list.
+ *
  * If this is wrong and demotion kicks in, the pages which are truly
  * used more frequently will be reactivated while the less frequently
  * used once will be evicted from memory.
@@ -141,6 +145,14 @@
  * But if this is right, the stale pages will be pushed out of memory
  * and the used pages get to stay in cache.
  *
+ *		Refaulting active pages
+ *
+ * If on the other hand the refaulting pages have recently been
+ * deactivated, it means that the active list is no longer protecting
+ * actively used cache from reclaim. The cache is NOT transitioning to
+ * a different workingset; the existing workingset is thrashing in the
+ * space allocated to the page cache.
+ *
  *
  *		Implementation
  *
@@ -156,8 +168,7 @@
  */
 
 #define EVICTION_SHIFT	(RADIX_TREE_EXCEPTIONAL_ENTRY + \
-			 NODES_SHIFT +	\
-			 MEM_CGROUP_ID_SHIFT)
+			 1 + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
 #define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)
 
 /*
@@ -170,23 +181,28 @@
  */
 static unsigned int bucket_order __read_mostly;
 
-static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction)
+static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
+			 bool workingset)
 {
 	eviction >>= bucket_order;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
+	eviction = (eviction << 1) | workingset;
 	eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
 
 	return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
-			  unsigned long *evictionp)
+			  unsigned long *evictionp, bool *workingsetp)
 {
 	unsigned long entry = (unsigned long)shadow;
 	int memcgid, nid;
+	bool workingset;
 
 	entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
+	workingset = entry & 1;
+	entry >>= 1;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
@@ -195,6 +211,7 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry << bucket_order;
+	*workingsetp = workingset;
 }
 
 /**
@@ -207,8 +224,8 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
  */
 void *workingset_eviction(struct address_space *mapping, struct page *page)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
 	struct pglist_data *pgdat = page_pgdat(page);
+	struct mem_cgroup *memcg = page_memcg(page);
 	int memcgid = mem_cgroup_id(memcg);
 	unsigned long eviction;
 	struct lruvec *lruvec;
@@ -220,30 +237,30 @@ void *workingset_eviction(struct address_space *mapping, struct page *page)
 
 	lruvec = mem_cgroup_lruvec(pgdat, memcg);
 	eviction = atomic_long_inc_return(&lruvec->inactive_age);
-	return pack_shadow(memcgid, pgdat, eviction);
+	return pack_shadow(memcgid, pgdat, eviction, PageWorkingset(page));
 }
 
 /**
  * workingset_refault - evaluate the refault of a previously evicted page
+ * @page: the freshly allocated replacement page
  * @shadow: shadow entry of the evicted page
  *
  * Calculates and evaluates the refault distance of the previously
  * evicted page in the context of the node it was allocated in.
- *
- * Returns %true if the page should be activated, %false otherwise.
  */
-bool workingset_refault(void *shadow)
+void workingset_refault(struct page *page, void *shadow)
 {
 	unsigned long refault_distance;
+	struct pglist_data *pgdat;
 	unsigned long active_file;
 	struct mem_cgroup *memcg;
 	unsigned long eviction;
 	struct lruvec *lruvec;
 	unsigned long refault;
-	struct pglist_data *pgdat;
+	bool workingset;
 	int memcgid;
 
-	unpack_shadow(shadow, &memcgid, &pgdat, &eviction);
+	unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset);
 
 	rcu_read_lock();
 	/*
@@ -263,41 +280,51 @@ bool workingset_refault(void *shadow)
 	 * configurations instead.
 	 */
 	memcg = mem_cgroup_from_id(memcgid);
-	if (!mem_cgroup_disabled() && !memcg) {
-		rcu_read_unlock();
-		return false;
-	}
+	if (!mem_cgroup_disabled() && !memcg)
+		goto out;
 	lruvec = mem_cgroup_lruvec(pgdat, memcg);
 	refault = atomic_long_read(&lruvec->inactive_age);
 	active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
 
 	/*
-	 * The unsigned subtraction here gives an accurate distance
-	 * across inactive_age overflows in most cases.
+	 * Calculate the refault distance
 	 *
-	 * There is a special case: usually, shadow entries have a
-	 * short lifetime and are either refaulted or reclaimed along
-	 * with the inode before they get too old.  But it is not
-	 * impossible for the inactive_age to lap a shadow entry in
-	 * the field, which can then can result in a false small
-	 * refault distance, leading to a false activation should this
-	 * old entry actually refault again.  However, earlier kernels
-	 * used to deactivate unconditionally with *every* reclaim
-	 * invocation for the longest time, so the occasional
-	 * inappropriate activation leading to pressure on the active
-	 * list is not a problem.
+	 * The unsigned subtraction here gives an accurate distance
+	 * across inactive_age overflows in most cases. There is a
+	 * special case: usually, shadow entries have a short lifetime
+	 * and are either refaulted or reclaimed along with the inode
+	 * before they get too old.  But it is not impossible for the
+	 * inactive_age to lap a shadow entry in the field, which can
+	 * then can result in a false small refault distance, leading
+	 * to a false activation should this old entry actually
+	 * refault again.  However, earlier kernels used to deactivate
+	 * unconditionally with *every* reclaim invocation for the
+	 * longest time, so the occasional inappropriate activation
+	 * leading to pressure on the active list is not a problem.
 	 */
 	refault_distance = (refault - eviction) & EVICTION_MASK;
 
 	inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
 
-	if (refault_distance <= active_file) {
-		inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
-		rcu_read_unlock();
-		return true;
+	/*
+	 * Compare the distance to the existing workingset size. We
+	 * don't act on pages that couldn't stay resident even if all
+	 * the memory was available to the page cache.
+	 */
+	if (refault_distance > active_file)
+		goto out;
+
+	SetPageActive(page);
+	atomic_long_inc(&lruvec->inactive_age);
+	inc_lruvec_state(lruvec, WORKINGSET_ACTIVATE);
+
+	/* Page was active prior to eviction */
+	if (workingset) {
+		SetPageWorkingset(page);
+		inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
 	}
+out:
 	rcu_read_unlock();
-	return false;
 }
 
 /**

From patchwork Wed Aug 1 15:13:02 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10552427
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
 Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 3/9] delayacct: track delays from thrashing cache pages
Date: Wed, 1 Aug 2018 11:13:02 -0400
Message-Id: <20180801151308.32234-4-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>

Delay accounting already measures the time a task spends in direct
reclaim and waiting for swapin, but in low memory situations tasks
can spend a significant amount of their time waiting on thrashing
page cache. This isn't tracked right now.

To know the full impact of memory contention on an individual task,
measure the delay when waiting for a recently evicted active cache
page to be read back into memory.
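The measurement follows the usual delay-accounting pattern: take a timestamp when the wait starts, and on wakeup add the elapsed time to a running total and bump a counter; reporting tools then divide total by count. Below is a minimal userspace sketch under stated assumptions: `struct delay_stat`, `delay_start()`, `delay_end()` and `avg_delay_ms()` are hypothetical names, `clock_gettime(CLOCK_MONOTONIC)` stands in for the kernel's `ktime_get_ns()`, timestamps are passed in explicitly so the arithmetic is easy to check, and the per-task spinlock the kernel takes is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-task counters, loosely modeled on the
 * thrashing_start/thrashing_delay/thrashing_count fields. */
struct delay_stat {
	uint64_t start_ns;	/* timestamp taken when the wait began */
	uint64_t delay_ns;	/* total time spent waiting so far */
	uint32_t count;		/* number of completed waits */
};

/* Userspace stand-in for ktime_get_ns(): monotonic time in ns. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Open a wait interval (the "_start" half). */
static void delay_start(struct delay_stat *d, uint64_t now)
{
	d->start_ns = now;
}

/* Close a wait interval (the "_end" half): add the elapsed time
 * and count one more wait. The kernel also holds a per-task
 * spinlock here; this single-threaded sketch does not. */
static void delay_end(struct delay_stat *d, uint64_t now)
{
	assert(now >= d->start_ns);	/* a monotonic clock never goes back */
	d->delay_ns += now - d->start_ns;
	d->count++;
}

/* Average delay in whole milliseconds. Integer division truncates,
 * and a zero count is treated as one to avoid dividing by zero. */
static uint64_t avg_delay_ms(uint64_t total_ns, uint64_t count)
{
	return total_ns / 1000000ULL / (count ? count : 1);
}
```

A caller would bracket the sleep on the page lock with `delay_start(&d, now_ns())` and `delay_end(&d, now_ns())`. With this truncating arithmetic, 19 waits totalling 12621439 ns average to 0 ms (roughly 0.66 ms each before truncation), which is consistent with the `0ms` THRASHING average in the getdelays sample below; whether getdelays rounds exactly this way is an assumption of the sketch.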
Also update tools/accounting/getdelays.c:

    [hannes@computer accounting]$ sudo ./getdelays -d -p 1
    print delayacct stats ON
    PID	1

    CPU             count     real total  virtual total    delay total  delay average
                    50318      745000000      847346785      400533713        0.008ms
    IO              count    delay total  delay average
                      435      122601218            0ms
    SWAP            count    delay total  delay average
                        0              0            0ms
    RECLAIM         count    delay total  delay average
                        0              0            0ms
    THRASHING       count    delay total  delay average
                       19       12621439            0ms

Signed-off-by: Johannes Weiner
---
 include/linux/delayacct.h      | 23 +++++++++++++++++++++++
 include/uapi/linux/taskstats.h |  6 +++++-
 kernel/delayacct.c             | 15 +++++++++++++++
 mm/filemap.c                   | 11 +++++++++++
 tools/accounting/getdelays.c   |  8 +++++++-
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h
index 5e335b6203f4..d3e75b3ba487 100644
--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -57,7 +57,12 @@ struct task_delay_info {
 
 	u64 freepages_start;
 	u64 freepages_delay;	/* wait for memory reclaim */
+
+	u64 thrashing_start;
+	u64 thrashing_delay;	/* wait for thrashing page */
+
 	u32 freepages_count;	/* total count of memory reclaim */
+	u32 thrashing_count;	/* total count of thrash waits */
 };
 #endif
 
@@ -76,6 +81,8 @@ extern int __delayacct_add_tsk(struct taskstats *, struct task_struct *);
 extern __u64 __delayacct_blkio_ticks(struct task_struct *);
 extern void __delayacct_freepages_start(void);
 extern void __delayacct_freepages_end(void);
+extern void __delayacct_thrashing_start(void);
+extern void __delayacct_thrashing_end(void);
 
 static inline int delayacct_is_task_waiting_on_io(struct task_struct *p)
 {
@@ -156,6 +163,18 @@ static inline void delayacct_freepages_end(void)
 		__delayacct_freepages_end();
 }
 
+static inline void delayacct_thrashing_start(void)
+{
+	if (current->delays)
+		__delayacct_thrashing_start();
+}
+
+static inline void delayacct_thrashing_end(void)
+{
+	if (current->delays)
+		__delayacct_thrashing_end();
+}
+
 #else
 static inline void delayacct_set_flag(int flag)
 {}
@@ -182,6 +201,10 @@ static inline void delayacct_freepages_start(void)
 {}
 static inline void delayacct_freepages_end(void)
 {}
+static inline void delayacct_thrashing_start(void)
+{}
+static inline void delayacct_thrashing_end(void)
+{}
 
 #endif /* CONFIG_TASK_DELAY_ACCT */
 
diff --git a/include/uapi/linux/taskstats.h b/include/uapi/linux/taskstats.h
index b7aa7bb2349f..5e8ca16a9079 100644
--- a/include/uapi/linux/taskstats.h
+++ b/include/uapi/linux/taskstats.h
@@ -34,7 +34,7 @@
  */
 
 
-#define TASKSTATS_VERSION	8
+#define TASKSTATS_VERSION	9
 #define TS_COMM_LEN		32	/* should be >= TASK_COMM_LEN
 					 * in linux/sched.h */
 
@@ -164,6 +164,10 @@ struct taskstats {
 	/* Delay waiting for memory reclaim */
 	__u64	freepages_count;
 	__u64	freepages_delay_total;
+
+	/* Delay waiting for thrashing page */
+	__u64	thrashing_count;
+	__u64	thrashing_delay_total;
 };
 
 
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index e2764d767f18..02ba745c448d 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -134,9 +134,12 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	d->swapin_delay_total = (tmp < d->swapin_delay_total) ? 0 : tmp;
 	tmp = d->freepages_delay_total + tsk->delays->freepages_delay;
 	d->freepages_delay_total = (tmp < d->freepages_delay_total) ? 0 : tmp;
+	tmp = d->thrashing_delay_total + tsk->delays->thrashing_delay;
+	d->thrashing_delay_total = (tmp < d->thrashing_delay_total) ? 0 : tmp;
 	d->blkio_count += tsk->delays->blkio_count;
 	d->swapin_count += tsk->delays->swapin_count;
 	d->freepages_count += tsk->delays->freepages_count;
+	d->thrashing_count += tsk->delays->thrashing_count;
 	spin_unlock_irqrestore(&tsk->delays->lock, flags);
 
 	return 0;
@@ -168,3 +171,15 @@ void __delayacct_freepages_end(void)
 		&current->delays->freepages_count);
 }
 
+void __delayacct_thrashing_start(void)
+{
+	current->delays->thrashing_start = ktime_get_ns();
+}
+
+void __delayacct_thrashing_end(void)
+{
+	delayacct_end(&current->delays->lock,
+		      &current->delays->thrashing_start,
+		      &current->delays->thrashing_delay,
+		      &current->delays->thrashing_count);
+}
diff --git a/mm/filemap.c b/mm/filemap.c
index bd36b7226cf4..e49961e13dd9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -1073,8 +1074,15 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 {
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
 	int ret = 0;
 
+	if (bit_nr == PG_locked && !PageSwapBacked(page) &&
+	    !PageUptodate(page) && PageWorkingset(page)) {
+		delayacct_thrashing_start();
+		thrashing = true;
+	}
+
 	init_wait(wait);
 	wait->flags = lock ? WQ_FLAG_EXCLUSIVE : 0;
 	wait->func = wake_page_function;
@@ -1113,6 +1121,9 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	finish_wait(q, wait);
 
+	if (thrashing)
+		delayacct_thrashing_end();
+
 	/*
 	 * A signal could leave PageWaiters set. Clearing it here if
 	 * !waitqueue_active would be possible (by open-coding finish_wait),
diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 9f420d98b5fb..8cb504d30384 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -203,6 +203,8 @@ static void print_delayacct(struct taskstats *t)
 	       "SWAP %15s%15s%15s\n"
 	       " %15llu%15llu%15llums\n"
 	       "RECLAIM %12s%15s%15s\n"
+	       " %15llu%15llu%15llums\n"
+	       "THRASHING%12s%15s%15s\n"
 	       " %15llu%15llu%15llums\n",
 	       "count", "real total", "virtual total",
 	       "delay total", "delay average",
@@ -222,7 +224,11 @@ static void print_delayacct(struct taskstats *t)
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->freepages_count,
 	       (unsigned long long)t->freepages_delay_total,
-	       average_ms(t->freepages_delay_total, t->freepages_count));
+	       average_ms(t->freepages_delay_total, t->freepages_count),
+	       "count", "delay total", "delay average",
+	       (unsigned long long)t->thrashing_count,
+	       (unsigned long long)t->thrashing_delay_total,
+	       average_ms(t->thrashing_delay_total, t->thrashing_count));
 }
 
 static void task_context_switch_counts(struct taskstats *t)

From patchwork Wed Aug 1 15:13:03 2018
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
 Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 4/9] sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
Date: Wed, 1 Aug 2018 11:13:03 -0400
Message-Id: <20180801151308.32234-5-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>

There are several private definitions of LOAD_INT, LOAD_FRAC and
CALC_LOAD in code that deals with fixed-point load averages. Provide
one official version in include/linux/sched/loadavg.h, with the
CALC_LOAD macro replaced by the existing calc_load() function.
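For reference, the consolidated helpers implement an exponentially weighted moving average in 11-bit fixed point, where FIXED_1 represents 1.0. The standalone sketch below closely follows calc_load(), LOAD_INT() and LOAD_FRAC() as they appear in the loadavg.h hunk of this patch; the values FSHIFT = 11 and EXP_1 = 1884 are taken from the kernel's loadavg.h (the diff only shows EXP_5 and EXP_15), and `run_updates()` is a hypothetical driver, not a kernel function.

```c
#include <assert.h>

#define FSHIFT		11		/* bits of fractional precision */
#define FIXED_1		(1UL << FSHIFT)	/* 1.0 in fixed point */
#define EXP_1		1884		/* 1/exp(5sec/1min) in fixed point */

/* a1 = a0 * e + a * (1 - e), rounding up while the average rises. */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1 - 1;

	return newload / FIXED_1;
}

/* Integer part and two decimal digits of a fixed-point load value. */
#define LOAD_INT(x)	((x) >> FSHIFT)
#define LOAD_FRAC(x)	LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical driver: apply n 5-second updates with a constant
 * number of runnable tasks, converted to fixed point. */
static unsigned long run_updates(unsigned long load, unsigned long exp,
				 unsigned long nr_running, unsigned int n)
{
	assert(exp < FIXED_1);	/* the decay factor must be below 1.0 */
	while (n--)
		load = calc_load(load, exp, nr_running * FIXED_1);
	return load;
}
```

Starting from an idle average, one update with a single runnable task yields 164, which LOAD_INT()/LOAD_FRAC() format as 0.08, close to 1 - e^(-5/60) ≈ 0.0799. Enough further updates converge to FIXED_1, printed as 1.00.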
Signed-off-by: Johannes Weiner
---
 .../platforms/cell/cpufreq_spudemand.c    |  2 +-
 arch/powerpc/platforms/cell/spufs/sched.c |  9 +++-----
 arch/s390/appldata/appldata_os.c          |  4 ----
 drivers/cpuidle/governors/menu.c          |  4 ----
 fs/proc/loadavg.c                         |  3 ---
 include/linux/sched/loadavg.h             | 21 +++++++++++++++----
 kernel/debug/kdb/kdb_main.c               |  7 +------
 kernel/sched/loadavg.c                    | 15 -------------
 8 files changed, 22 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/platforms/cell/cpufreq_spudemand.c b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
index 882944c36ef5..5d8e8b6bb1cc 100644
--- a/arch/powerpc/platforms/cell/cpufreq_spudemand.c
+++ b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
@@ -49,7 +49,7 @@ static int calc_freq(struct spu_gov_info_struct *info)
 	cpu = info->policy->cpu;
 	busy_spus = atomic_read(&cbe_spu_info[cpu_to_node(cpu)].busy_spus);
 
-	CALC_LOAD(info->busy_spus, EXP, busy_spus * FIXED_1);
+	info->busy_spus = calc_load(info->busy_spus, EXP, busy_spus * FIXED_1);
 	pr_debug("cpu %d: busy_spus=%d, info->busy_spus=%ld\n",
 			cpu, busy_spus, info->busy_spus);
 
diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c
index ccc421503363..70101510b19d 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -987,9 +987,9 @@ static void spu_calc_load(void)
 	unsigned long active_tasks; /* fixed-point */
 
 	active_tasks = count_active_contexts() * FIXED_1;
-	CALC_LOAD(spu_avenrun[0], EXP_1, active_tasks);
-	CALC_LOAD(spu_avenrun[1], EXP_5, active_tasks);
-	CALC_LOAD(spu_avenrun[2], EXP_15, active_tasks);
+	spu_avenrun[0] = calc_load(spu_avenrun[0], EXP_1, active_tasks);
+	spu_avenrun[1] = calc_load(spu_avenrun[1], EXP_5, active_tasks);
+	spu_avenrun[2] = calc_load(spu_avenrun[2], EXP_15, active_tasks);
 }
 
 static void spusched_wake(struct timer_list *unused)
@@ -1071,9 +1071,6 @@ void spuctx_switch_state(struct spu_context *ctx,
 	}
 }
 
-#define LOAD_INT(x) ((x) >> FSHIFT)
-#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
-
 static int show_spu_loadavg(struct seq_file *s, void *private)
 {
 	int a, b, c;
diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
index 433a994b1a89..54f375627532 100644
--- a/arch/s390/appldata/appldata_os.c
+++ b/arch/s390/appldata/appldata_os.c
@@ -25,10 +25,6 @@
 
 #include "appldata.h"
 
-
-#define LOAD_INT(x) ((x) >> FSHIFT)
-#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
-
 /*
  * OS data
  *
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 1bfe03ceb236..3738b670df7a 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -133,10 +133,6 @@ struct menu_device {
 	int		interval_ptr;
 };
 
-
-#define LOAD_INT(x) ((x) >> FSHIFT)
-#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
-
 static inline int get_loadavg(unsigned long load)
 {
 	return LOAD_INT(load) * 10 + LOAD_FRAC(load) / 10;
diff --git a/fs/proc/loadavg.c b/fs/proc/loadavg.c
index b572cc865b92..8bee50a97c0f 100644
--- a/fs/proc/loadavg.c
+++ b/fs/proc/loadavg.c
@@ -10,9 +10,6 @@
 #include
 #include
 
-#define LOAD_INT(x) ((x) >> FSHIFT)
-#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
-
 static int loadavg_proc_show(struct seq_file *m, void *v)
 {
 	unsigned long avnrun[3];
diff --git a/include/linux/sched/loadavg.h b/include/linux/sched/loadavg.h
index 80bc84ba5d2a..cc9cc62bb1f8 100644
--- a/include/linux/sched/loadavg.h
+++ b/include/linux/sched/loadavg.h
@@ -22,10 +22,23 @@ extern void get_avenrun(unsigned long *loads, unsigned long offset, int shift);
 #define EXP_5		2014		/* 1/exp(5sec/5min) */
 #define EXP_15		2037		/* 1/exp(5sec/15min) */
 
-#define CALC_LOAD(load,exp,n) \
-	load *= exp; \
-	load += n*(FIXED_1-exp); \
-	load >>= FSHIFT;
+/*
+ * a1 = a0 * e + a * (1 - e)
+ */
+static inline unsigned long
+calc_load(unsigned long load, unsigned long exp, unsigned long active)
+{
+	unsigned long newload;
+
+	newload = load * exp + active * (FIXED_1 - exp);
+	if (active >= load)
+		newload += FIXED_1-1;
+
+	return newload / FIXED_1;
+}
+
+#define LOAD_INT(x) ((x) >> FSHIFT)
+#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 
 extern void calc_global_load(unsigned long ticks);
 
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index e405677ee08d..a8f5aca5eb5e 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2556,16 +2556,11 @@ static int kdb_summary(int argc, const char **argv)
 	}
 	kdb_printf("%02ld:%02ld\n", val.uptime/(60*60), (val.uptime/60)%60);
 
-	/* lifted from fs/proc/proc_misc.c::loadavg_read_proc() */
-
-#define LOAD_INT(x) ((x) >> FSHIFT)
-#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 	kdb_printf("load avg %ld.%02ld %ld.%02ld %ld.%02ld\n",
 		LOAD_INT(val.loads[0]), LOAD_FRAC(val.loads[0]),
 		LOAD_INT(val.loads[1]), LOAD_FRAC(val.loads[1]),
 		LOAD_INT(val.loads[2]), LOAD_FRAC(val.loads[2]));
-#undef LOAD_INT
-#undef LOAD_FRAC
+
 	/* Display in kilobytes */
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 	kdb_printf("\nMemTotal: %8lu kB\nMemFree: %8lu kB\n"
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index a171c1258109..54fbdfb2d86c 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -91,21 +91,6 @@ long calc_load_fold_active(struct rq *this_rq, long adjust)
 	return delta;
 }
 
-/*
- * a1 = a0 * e + a * (1 - e)
- */
-static unsigned long
-calc_load(unsigned long load, unsigned long exp, unsigned long active)
-{
-	unsigned long newload;
-
-	newload = load * exp + active * (FIXED_1 - exp);
-	if (active >= load)
-		newload += FIXED_1-1;
-
-	return newload / FIXED_1;
-}
-
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * Handle NO_HZ for the global load-average.
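One arithmetic detail of this conversion: the removed CALC_LOAD macro always truncated (`load >>= FSHIFT`), while calc_load() adds `FIXED_1-1` before dividing whenever the average is rising, so the spufs and cpufreq_spudemand callers now round up in that case. The sketch below compares the two under a constant load of one task. `calc_load_trunc()` is a hypothetical function that reproduces the macro's arithmetic, `settle()` is a hypothetical driver, and EXP_1 = 1884 is assumed from the kernel header.

```c
#include <assert.h>

#define FSHIFT	11
#define FIXED_1	(1UL << FSHIFT)
#define EXP_1	1884

/* Same arithmetic as the removed CALC_LOAD() macro: always truncates. */
static unsigned long
calc_load_trunc(unsigned long load, unsigned long exp, unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	load >>= FSHIFT;
	return load;
}

/* calc_load() as in the loadavg.h hunk: rounds up while rising. */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1 - 1;

	return newload / FIXED_1;
}

/* Hypothetical driver: start idle, apply `steps` updates with one busy task. */
static unsigned long settle(unsigned long (*update)(unsigned long,
						     unsigned long,
						     unsigned long),
			    unsigned int steps)
{
	unsigned long load = 0;

	assert(steps > 0);
	while (steps--)
		load = update(load, EXP_1, FIXED_1);
	return load;
}
```

Each step moves the average by 164 * (FIXED_1 - load) / FIXED_1. With truncation, that increment becomes zero once the gap is 12 or less, so the average stalls at 2036 (shown as 0.99). Rounding up keeps the increment at one or more until the average reaches FIXED_1 exactly.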

From patchwork Wed Aug 1 15:13:04 2018
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
 Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/9] sched: loadavg: make calc_load_n() public
Date: Wed, 1 Aug 2018 11:13:04 -0400
Message-Id: <20180801151308.32234-6-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>

calc_load_n() is going to be used in a later patch. Make it public
here to keep the churn of moving it separate.
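calc_load_n() folds n missed updates into one by raising the decay factor to the n-th power. Per the derivation in the comment below, a_n = a0 * e^n + a * (1 - e^n), so a single calc_load() call with e^n in place of e is enough. The sketch copies fixed_power_int() and calc_load() from this series so the closed form can be checked against n explicit steps. FSHIFT = 11 and EXP_1 = 1884 are assumed from the kernel header, and `calc_load_iter()` is a hypothetical reference helper. Each fixed-point step rounds, so the two results agree only approximately.

```c
#include <assert.h>

#define FSHIFT	11
#define FIXED_1	(1UL << FSHIFT)
#define EXP_1	1884

static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1 - 1;

	return newload / FIXED_1;
}

/* x^n for fixed-point x with frac_bits fractional bits, by
 * square-and-multiply; each product is rounded to nearest. */
static unsigned long
fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
{
	unsigned long result = 1UL << frac_bits;

	assert(frac_bits > 0);	/* the rounding term shifts by frac_bits - 1 */
	if (n) {
		for (;;) {
			if (n & 1) {
				result *= x;
				result += 1UL << (frac_bits - 1);
				result >>= frac_bits;
			}
			n >>= 1;
			if (!n)
				break;
			x *= x;
			x += 1UL << (frac_bits - 1);
			x >>= frac_bits;
		}
	}
	return result;
}

/* Closed form: a_n = a0 * e^n + a * (1 - e^n). */
static unsigned long
calc_load_n(unsigned long load, unsigned long exp,
	    unsigned long active, unsigned int n)
{
	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}

/* Hypothetical reference: n separate calc_load() steps. */
static unsigned long
calc_load_iter(unsigned long load, unsigned long exp,
	       unsigned long active, unsigned int n)
{
	while (n--)
		load = calc_load(load, exp, active);
	return load;
}
```

For 12 missed updates (one minute at EXP_1) with one busy task, both results land at roughly 0.63 of FIXED_1 (about 1 - 1/e). The closed form needs O(log n) multiplications instead of n calc_load() calls.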
Signed-off-by: Johannes Weiner
---
 include/linux/sched/loadavg.h |   3 +
 kernel/sched/loadavg.c        | 138 +++++++++++++++++-----------------
 2 files changed, 72 insertions(+), 69 deletions(-)

diff --git a/include/linux/sched/loadavg.h b/include/linux/sched/loadavg.h
index cc9cc62bb1f8..4859bea47a7b 100644
--- a/include/linux/sched/loadavg.h
+++ b/include/linux/sched/loadavg.h
@@ -37,6 +37,9 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
 	return newload / FIXED_1;
 }
 
+extern unsigned long calc_load_n(unsigned long load, unsigned long exp,
+				 unsigned long active, unsigned int n);
+
 #define LOAD_INT(x) ((x) >> FSHIFT)
 #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 54fbdfb2d86c..28a516575c18 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -91,6 +91,75 @@ long calc_load_fold_active(struct rq *this_rq, long adjust)
 	return delta;
 }
 
+/**
+ * fixed_power_int - compute: x^n, in O(log n) time
+ *
+ * @x:         base of the power
+ * @frac_bits: fractional bits of @x
+ * @n:         power to raise @x to.
+ *
+ * By exploiting the relation between the definition of the natural power
+ * function: x^n := x*x*...*x (x multiplied by itself for n times), and
+ * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
+ * (where: n_i \elem {0, 1}, the binary vector representing n),
+ * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
+ * of course trivially computable in O(log_2 n), the length of our binary
+ * vector.
+ */
+static unsigned long
+fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
+{
+	unsigned long result = 1UL << frac_bits;
+
+	if (n) {
+		for (;;) {
+			if (n & 1) {
+				result *= x;
+				result += 1UL << (frac_bits - 1);
+				result >>= frac_bits;
+			}
+			n >>= 1;
+			if (!n)
+				break;
+			x *= x;
+			x += 1UL << (frac_bits - 1);
+			x >>= frac_bits;
+		}
+	}
+
+	return result;
+}
+
+/*
+ * a1 = a0 * e + a * (1 - e)
+ *
+ * a2 = a1 * e + a * (1 - e)
+ *    = (a0 * e + a * (1 - e)) * e + a * (1 - e)
+ *    = a0 * e^2 + a * (1 - e) * (1 + e)
+ *
+ * a3 = a2 * e + a * (1 - e)
+ *    = (a0 * e^2 + a * (1 - e) * (1 + e)) * e + a * (1 - e)
+ *    = a0 * e^3 + a * (1 - e) * (1 + e + e^2)
+ *
+ *  ...
+ *
+ * an = a0 * e^n + a * (1 - e) * (1 + e + ... + e^n-1) [1]
+ *    = a0 * e^n + a * (1 - e) * (1 - e^n)/(1 - e)
+ *    = a0 * e^n + a * (1 - e^n)
+ *
+ * [1] application of the geometric series:
+ *
+ *              n         1 - x^(n+1)
+ *     S_n := \Sum x^i = -------------
+ *             i=0          1 - x
+ */
+unsigned long
+calc_load_n(unsigned long load, unsigned long exp,
+	    unsigned long active, unsigned int n)
+{
+	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
+}
+
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * Handle NO_HZ for the global load-average.
@@ -210,75 +279,6 @@ static long calc_load_nohz_fold(void)
 	return delta;
 }
 
-/**
- * fixed_power_int - compute: x^n, in O(log n) time
- *
- * @x:         base of the power
- * @frac_bits: fractional bits of @x
- * @n:         power to raise @x to.
- *
- * By exploiting the relation between the definition of the natural power
- * function: x^n := x*x*...*x (x multiplied by itself for n times), and
- * the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
- * (where: n_i \elem {0, 1}, the binary vector representing n),
- * we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
- * of course trivially computable in O(log_2 n), the length of our binary
- * vector.
- */
-static unsigned long
-fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
-{
-	unsigned long result = 1UL << frac_bits;
-
-	if (n) {
-		for (;;) {
-			if (n & 1) {
-				result *= x;
-				result += 1UL << (frac_bits - 1);
-				result >>= frac_bits;
-			}
-			n >>= 1;
-			if (!n)
-				break;
-			x *= x;
-			x += 1UL << (frac_bits - 1);
-			x >>= frac_bits;
-		}
-	}
-
-	return result;
-}
-
-/*
- * a1 = a0 * e + a * (1 - e)
- *
- * a2 = a1 * e + a * (1 - e)
- *    = (a0 * e + a * (1 - e)) * e + a * (1 - e)
- *    = a0 * e^2 + a * (1 - e) * (1 + e)
- *
- * a3 = a2 * e + a * (1 - e)
- *    = (a0 * e^2 + a * (1 - e) * (1 + e)) * e + a * (1 - e)
- *    = a0 * e^3 + a * (1 - e) * (1 + e + e^2)
- *
- *  ...
- *
- * an = a0 * e^n + a * (1 - e) * (1 + e + ... + e^n-1) [1]
- *    = a0 * e^n + a * (1 - e) * (1 - e^n)/(1 - e)
- *    = a0 * e^n + a * (1 - e^n)
- *
- * [1] application of the geometric series:
- *
- *              n         1 - x^(n+1)
- *     S_n := \Sum x^i = -------------
- *             i=0          1 - x
- */
-static unsigned long
-calc_load_n(unsigned long load, unsigned long exp,
-	    unsigned long active, unsigned int n)
-{
-	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
-}
-
 /*
  * NO_HZ can leave us missing all per-CPU ticks calling
  * calc_load_fold_active(), but since a NO_HZ CPU folds its delta into
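Patch 5 only moves fixed_power_int() and calc_load_n(), but what they compute may be easier to see in a standalone form. The following is a userspace restatement under stated assumptions, not the kernel code itself: FSHIFT, FIXED_1, EXP_1 and calc_load() are reproduced from our reading of include/linux/sched/loadavg.h at the time, the loop in fixed_power_int() is restructured as a while loop with the same behavior, and calc_load_loop() is a hypothetical O(n) reference added only for comparison.

```c
#include <assert.h>

/* Assumed to match include/linux/sched/loadavg.h of this era. */
#define FSHIFT  11              /* nr of bits of precision */
#define FIXED_1 (1UL << FSHIFT) /* 1.0 as fixed-point */
#define EXP_1   1884UL          /* 1/exp(5sec/1min) as fixed-point */

/* One averaging step, a1 = a0 * e + a * (1 - e), rounding up while rising. */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;
	return newload / FIXED_1;
}

/* x^n in fixed point by square-and-multiply: one step per bit of n. */
static unsigned long
fixed_power_int(unsigned long x, unsigned int frac_bits, unsigned int n)
{
	unsigned long result = 1UL << frac_bits; /* 1.0 */
	unsigned long half;

	assert(frac_bits > 0);             /* rounding term needs frac_bits - 1 */
	half = 1UL << (frac_bits - 1);     /* 0.5, for round-to-nearest */

	while (n) {
		if (n & 1)         /* bit i of n set: multiply in x^(2^i) */
			result = (result * x + half) >> frac_bits;
		n >>= 1;
		if (n)             /* bits left: square x to get x^(2^(i+1)) */
			x = (x * x + half) >> frac_bits;
	}
	return result;
}

/* n missed updates in one step: a_n = a_0 * e^n + a * (1 - e^n). */
static unsigned long
calc_load_n(unsigned long load, unsigned long exp,
	    unsigned long active, unsigned int n)
{
	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}

/* Hypothetical reference: apply calc_load() n times, O(n). */
static unsigned long
calc_load_loop(unsigned long load, unsigned long exp,
	       unsigned long active, unsigned int n)
{
	while (n--)
		load = calc_load(load, exp, active);
	return load;
}
```

calc_load_n(0, EXP_1, 2 * FIXED_1, 5) and calc_load_loop() with the same arguments should agree up to small fixed-point rounding differences. The closed form derived in the comment is what lets a single calc_load() call with e^n stand in for the n-step loop, at O(log n) cost instead of O(n).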
From: Johannes Weiner
Subject: [PATCH 6/9] sched: sched.h: make rq locking and clock functions available in stats.h
Date: Wed, 1 Aug 2018 11:13:05 -0400
Message-Id: <20180801151308.32234-7-hannes@cmpxchg.org>

kernel/sched/sched.h includes "stats.h" half-way through the file. The
next patch introduces users of sched.h's rq locking functions and
update_rq_clock() in kernel/sched/stats.h. Move those definitions up in
the file so they are available in stats.h.
Signed-off-by: Johannes Weiner
---
 kernel/sched/sched.h | 164 +++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 82 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cb467c221b15..b8f038497240 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -919,6 +919,8 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+extern void update_rq_clock(struct rq *rq);
+
 static inline u64 __rq_clock_broken(struct rq *rq)
 {
 	return READ_ONCE(rq->clock);
@@ -1037,6 +1039,86 @@ static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 #endif
 }
 
+struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
+	__acquires(rq->lock);
+
+struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
+	__acquires(p->pi_lock)
+	__acquires(rq->lock);
+
+static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
+	__releases(rq->lock)
+{
+	rq_unpin_lock(rq, rf);
+	raw_spin_unlock(&rq->lock);
+}
+
+static inline void
+task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
+	__releases(rq->lock)
+	__releases(p->pi_lock)
+{
+	rq_unpin_lock(rq, rf);
+	raw_spin_unlock(&rq->lock);
+	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
+}
+
+static inline void
+rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
+	__acquires(rq->lock)
+{
+	raw_spin_lock_irqsave(&rq->lock, rf->flags);
+	rq_pin_lock(rq, rf);
+}
+
+static inline void
+rq_lock_irq(struct rq *rq, struct rq_flags *rf)
+	__acquires(rq->lock)
+{
+	raw_spin_lock_irq(&rq->lock);
+	rq_pin_lock(rq, rf);
+}
+
+static inline void
+rq_lock(struct rq *rq, struct rq_flags *rf)
+	__acquires(rq->lock)
+{
+	raw_spin_lock(&rq->lock);
+	rq_pin_lock(rq, rf);
+}
+
+static inline void
+rq_relock(struct rq *rq, struct rq_flags *rf)
+	__acquires(rq->lock)
+{
+	raw_spin_lock(&rq->lock);
+	rq_repin_lock(rq, rf);
+}
+
+static inline void
+rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
+	__releases(rq->lock)
+{
+	rq_unpin_lock(rq, rf);
+	raw_spin_unlock_irqrestore(&rq->lock, rf->flags);
+}
+
+static inline void
+rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
+	__releases(rq->lock)
+{
+	rq_unpin_lock(rq, rf);
+	raw_spin_unlock_irq(&rq->lock);
+}
+
+static inline void
+rq_unlock(struct rq *rq, struct rq_flags *rf)
+	__releases(rq->lock)
+{
+	rq_unpin_lock(rq, rf);
+	raw_spin_unlock(&rq->lock);
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
@@ -1670,8 +1752,6 @@ static inline void sub_nr_running(struct rq *rq, unsigned count)
 	sched_update_tick_dependency(rq);
 }
 
-extern void update_rq_clock(struct rq *rq);
-
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
@@ -1752,86 +1832,6 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
-struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
-	__acquires(rq->lock);
-
-struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
-	__acquires(p->pi_lock)
-	__acquires(rq->lock);
-
-static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
-	__releases(rq->lock)
-{
-	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
-}
-
-static inline void
-task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
-	__releases(rq->lock)
-	__releases(p->pi_lock)
-{
-	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
-	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
-}
-
-static inline void
-rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
-	__acquires(rq->lock)
-{
-	raw_spin_lock_irqsave(&rq->lock, rf->flags);
-	rq_pin_lock(rq, rf);
-}
-
-static inline void
-rq_lock_irq(struct rq *rq, struct rq_flags *rf)
-	__acquires(rq->lock)
-{
-	raw_spin_lock_irq(&rq->lock);
-	rq_pin_lock(rq, rf);
-}
-
-static inline void
-rq_lock(struct rq *rq, struct rq_flags *rf)
-	__acquires(rq->lock)
-{
-	raw_spin_lock(&rq->lock);
-	rq_pin_lock(rq, rf);
-}
-
-static inline void
-rq_relock(struct rq *rq, struct rq_flags *rf)
-	__acquires(rq->lock)
-{
-	raw_spin_lock(&rq->lock);
-	rq_repin_lock(rq, rf);
-}
-
-static inline void
-rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
-	__releases(rq->lock)
-{
-	rq_unpin_lock(rq, rf);
-	raw_spin_unlock_irqrestore(&rq->lock, rf->flags);
-}
-
-static inline void
-rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
-	__releases(rq->lock)
-{
-	rq_unpin_lock(rq, rf);
-	raw_spin_unlock_irq(&rq->lock);
-}
-
-static inline void
-rq_unlock(struct rq *rq, struct rq_flags *rf)
-	__releases(rq->lock)
-{
-	rq_unpin_lock(rq, rf);
-	raw_spin_unlock(&rq->lock);
-}
-
 #ifdef CONFIG_SMP
 
 #ifdef CONFIG_PREEMPT
From: Johannes Weiner
Subject: [PATCH 7/9] sched: introduce this_rq_lock_irq()
Date: Wed, 1 Aug 2018 11:13:06 -0400
Message-Id: <20180801151308.32234-8-hannes@cmpxchg.org>

do_sched_yield() disables IRQs, looks up this_rq() and locks it. The next
patch is adding another site with the same pattern, so provide a
convenience function for it.
Signed-off-by: Johannes Weiner
---
 kernel/sched/core.c  |  4 +---
 kernel/sched/sched.h | 12 ++++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 211890edf37e..9586a8141f16 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4960,9 +4960,7 @@ static void do_sched_yield(void)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	local_irq_disable();
-	rq = this_rq();
-	rq_lock(rq, &rf);
+	rq = this_rq_lock_irq(&rf);
 
 	schedstat_inc(rq->yld_count);
 	current->sched_class->yield_task(rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b8f038497240..bc798c7cb4d4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1119,6 +1119,18 @@ rq_unlock(struct rq *rq, struct rq_flags *rf)
 	raw_spin_unlock(&rq->lock);
 }
 
+static inline struct rq *
+this_rq_lock_irq(struct rq_flags *rf)
+	__acquires(rq->lock)
+{
+	struct rq *rq;
+
+	local_irq_disable();
+	rq = this_rq();
+	rq_lock(rq, rf);
+	return rq;
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
From: Johannes Weiner
Subject: [PATCH 8/9] psi: pressure stall information for CPU, memory, and IO
Date: Wed, 1 Aug 2018 11:13:07 -0400
Message-Id: <20180801151308.32234-9-hannes@cmpxchg.org>

When systems are overcommitted and resources become contended, it's hard
to tell exactly what impact this has on workload productivity, or how
close the system is to lockups and OOM kills. In particular, when
machines run multiple jobs concurrently, the impact of overcommit in
terms of latency and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risking complete machine lockups, this patch implements a
way to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or
IO, respectively.
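The next paragraphs of this commit message describe how per-CPU stall samples become the /proc/pressure/ percentages: every 2 seconds they are averaged across CPUs, weighted by each CPU's non-idle time, and folded into 10s, 1m and 5m running averages. A hypothetical userspace sketch of that arithmetic follows. It is not the psi implementation from this patch: struct cpu_sample, stall_fraction() and update_averages() are illustrative names, microsecond units are an arbitrary choice, and EXP_10s/EXP_60s/EXP_300s are our own approximations of FIXED_1 * e^(-2s/window).

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-point format borrowed from include/linux/sched/loadavg.h. */
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)        /* 1.0, i.e. 100% */

/* Assumed decay factors for a 2s period: FIXED_1 * e^(-2s/window), rounded. */
#define EXP_10s  1677
#define EXP_60s  1981
#define EXP_300s 2034

/* Hypothetical per-CPU sample covering one sampling period. */
struct cpu_sample {
	unsigned long long stall_us;    /* time spent in a stall state */
	unsigned long long nonidle_us;  /* time the CPU had non-idle tasks */
};

/*
 * Weight each CPU's stall time by its non-idle time so that idle CPUs
 * do not dilute the result; return the stall share of the period as a
 * fixed-point fraction.
 */
static unsigned long
stall_fraction(const struct cpu_sample *cpus, size_t nr_cpus,
	       unsigned long long period_us)
{
	unsigned long long weighted = 0, nonidle_total = 0, stall;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		weighted += cpus[i].stall_us * cpus[i].nonidle_us;
		nonidle_total += cpus[i].nonidle_us;
	}
	if (nonidle_total == 0 || period_us == 0)
		return 0;

	stall = weighted / nonidle_total;
	if (stall > period_us)          /* clamp to 100% of the period */
		stall = period_us;
	return (unsigned long)(stall * FIXED_1 / period_us);
}

/* Same exponential decay step as calc_load(). */
static unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;
	return newload / FIXED_1;
}

/* Fold one period's stall fraction into the 10s, 1m and 5m averages. */
static void
update_averages(unsigned long avg[3], unsigned long fraction)
{
	assert(fraction <= FIXED_1);    /* at most 100% of walltime */
	avg[0] = calc_load(avg[0], EXP_10s, fraction);
	avg[1] = calc_load(avg[1], EXP_60s, fraction);
	avg[2] = calc_load(avg[2], EXP_300s, fraction);
}
```

With one CPU stalled for half of a 2s period and a second CPU entirely idle, the weighted average reports 50%, where an unweighted per-CPU mean would report 25%. That gap is the kind of artifact from unused CPUs the weighting is meant to remove; the shorter windows react faster, the longer ones smooth more.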
Stall states are aggregate versions of the per-task delay accounting
delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure
percentages, and they give a general sense of system health and
productivity loss incurred by resource overcommit. They can also
indicate when the system is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each
CPU and samples the time they spend in stall states. Every 2 seconds,
the samples are averaged across CPUs - weighted by the CPUs' non-idle
time to eliminate artifacts from unused CPUs - and translated into
percentages of walltime. A running average of those percentages is
maintained over 10s, 1m, and 5m periods (similar to the load average).

v2:
- stable clock tick, as per Peter
- data structure layout optimization, as per Peter
- fix u64 divisions on 32 bit, as per Peter
- outermost psi_disabled checks, as per Peter
- coding style fixes, as per Peter
- just-in-time stats aggregation, as per Suren
- fix task state corruption with CONFIG_PREEMPT, as per Suren
- CONFIG_PSI=n build error
- avoid writing p->sched_psi_wake_requeue unnecessarily
- documentation & comment updates

v3:
- pack scheduler hotpath data into one cacheline, as per Peter and Linus
- drop unnecessary SCHED_INFO dependency, as per Peter
- lockless live-state aggregation, as per Peter
- do_div -> div64_ul and some other cleanups, as per Peter
- realtime sampling period and slipped sample handling, as per Tejun

Signed-off-by: Johannes Weiner
---
 Documentation/accounting/psi.txt |  64 +++
 include/linux/psi.h              |  27 ++
 include/linux/psi_types.h        |  87 +++++
 include/linux/sched.h            |  10 +
 init/Kconfig                     |  15 +
 kernel/fork.c                    |   4 +
 kernel/sched/Makefile            |   1 +
 kernel/sched/core.c              |  11 +-
 kernel/sched/psi.c               | 643 +++++++++++++++++++++++++++++++
 kernel/sched/sched.h             |   2 +
 kernel/sched/stats.h             |  80 ++++
 mm/compaction.c                  |   5 +
 mm/filemap.c                     |  15 +-
 mm/page_alloc.c                  |  10 +
 mm/vmscan.c                      |  13 +
 15 files changed, 981 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/accounting/psi.txt
 create mode 100644 include/linux/psi.h
 create mode 100644 include/linux/psi_types.h
 create mode 100644 kernel/sched/psi.c

diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
new file mode 100644
index 000000000000..51e7ef14142e
--- /dev/null
+++ b/Documentation/accounting/psi.txt
@@ -0,0 +1,64 @@
+================================
+PSI - Pressure Stall Information
+================================
+
+:Date: April, 2018
+:Author: Johannes Weiner
+
+When CPU, memory or IO devices are contended, workloads experience
+latency spikes, throughput losses, and run the risk of OOM kills.
+
+Without an accurate measure of such contention, users are forced to
+either play it safe and under-utilize their hardware resources, or
+roll the dice and frequently suffer the disruptions resulting from
+excessive overcommit.
+
+The psi feature identifies and quantifies the disruptions caused by
+such resource crunches and the time impact it has on complex workloads
+or even entire systems.
+
+Having an accurate measure of productivity losses caused by resource
+scarcity aids users in sizing workloads to hardware--or provisioning
+hardware according to workload demand.
+
+As psi aggregates this information in realtime, systems can be managed
+dynamically using techniques such as load shedding, migrating jobs to
+other systems or data centers, or strategically pausing or killing low
+priority or restartable batch jobs.
+
+This allows maximizing hardware utilization without sacrificing
+workload health or risking major disruptions such as OOM kills.
+
+Pressure interface
+==================
+
+Pressure information for each resource is exported through the
+respective file in /proc/pressure/ -- cpu, memory, and io.
+
+The format for CPU is as such:
+
+some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+and for memory and IO:
+
+some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+full avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+The "some" line indicates the share of time in which at least some
+tasks are stalled on a given resource.
+
+The "full" line indicates the share of time in which all non-idle
+tasks are stalled on a given resource simultaneously. In this state
+actual CPU cycles are going to waste, and a workload that spends
+extended time in this state is considered to be thrashing. This has
+severe impact on performance, and it's useful to distinguish this
+situation from a state where some tasks are stalled but the CPU is
+still doing productive work. As such, time spent in this subset of the
+stall state is tracked separately and exported in the "full" averages.
+
+The ratios are tracked as recent trends over ten, sixty, and three
+hundred second windows, which gives insight into short term events as
+well as medium and long term trends. The total absolute stall time is
+tracked and exported as well, to allow detection of latency spikes
+which wouldn't necessarily make a dent in the time averages, or to
+average trends over custom time frames.
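Because total= is cumulative (psi_show() below reports it in
microseconds, converted from the nanosecond totals), a consumer can
derive pressure over any window, or spot short stall bursts, from two
readings. A minimal userspace sketch, assuming the two total= values and
the wall-clock time between them have already been obtained;
psi_window_pct() and psi_stall_spike() are hypothetical helpers, not
kernel API.

```c
#include <assert.h>

/*
 * Hypothetical helper: average pressure in percent over a custom window,
 * from two readings of the cumulative total= field (microseconds of
 * stall time) taken wall_us microseconds of wall-clock time apart.
 */
static double psi_window_pct(unsigned long long total_start_us,
			     unsigned long long total_end_us,
			     unsigned long long wall_us)
{
	assert(total_end_us >= total_start_us);   /* total= only grows */
	if (wall_us == 0)
		return 0.0;
	return 100.0 * (double)(total_end_us - total_start_us) / (double)wall_us;
}

/*
 * Hypothetical helper: flag a latency spike when the stall time that
 * accumulated between two readings reaches threshold_us, even if it is
 * too short to move the avg10/avg60/avg300 figures noticeably.
 */
static int psi_stall_spike(unsigned long long total_start_us,
			   unsigned long long total_end_us,
			   unsigned long long threshold_us)
{
	assert(total_end_us >= total_start_us);
	return total_end_us - total_start_us >= threshold_us;
}
```

The readings could be timestamped with clock_gettime(CLOCK_MONOTONIC,
...); averaging over, say, a 30 second window then needs nothing from
the kernel beyond the total= field.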
diff --git a/include/linux/psi.h b/include/linux/psi.h
new file mode 100644
index 000000000000..371af1479699
--- /dev/null
+++ b/include/linux/psi.h
@@ -0,0 +1,27 @@
+#ifndef _LINUX_PSI_H
+#define _LINUX_PSI_H
+
+#include
+#include
+
+#ifdef CONFIG_PSI
+
+extern bool psi_disabled;
+
+void psi_init(void);
+
+void psi_task_change(struct task_struct *task, u64 now, int clear, int set);
+
+void psi_memstall_enter(unsigned long *flags);
+void psi_memstall_leave(unsigned long *flags);
+
+#else /* CONFIG_PSI */
+
+static inline void psi_init(void) {}
+
+static inline void psi_memstall_enter(unsigned long *flags) {}
+static inline void psi_memstall_leave(unsigned long *flags) {}
+
+#endif /* CONFIG_PSI */
+
+#endif /* _LINUX_PSI_H */
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
new file mode 100644
index 000000000000..b6ff46362eb3
--- /dev/null
+++ b/include/linux/psi_types.h
@@ -0,0 +1,87 @@
+#ifndef _LINUX_PSI_TYPES_H
+#define _LINUX_PSI_TYPES_H
+
+#include
+
+#ifdef CONFIG_PSI
+
+/* Tracked task states */
+enum psi_task_count {
+	NR_IOWAIT,
+	NR_MEMSTALL,
+	NR_RUNNING,
+	NR_PSI_TASK_COUNTS,
+};
+
+/* Task state bitmasks */
+#define TSK_IOWAIT	(1 << NR_IOWAIT)
+#define TSK_MEMSTALL	(1 << NR_MEMSTALL)
+#define TSK_RUNNING	(1 << NR_RUNNING)
+
+/* Resources that workloads could be stalled on */
+enum psi_res {
+	PSI_IO,
+	PSI_MEM,
+	PSI_CPU,
+	NR_PSI_RESOURCES,
+};
+
+/*
+ * Pressure states for each resource:
+ *
+ * SOME: Stalled tasks & working tasks
+ * FULL: Stalled tasks & no working tasks
+ */
+enum psi_states {
+	PSI_IO_SOME,
+	PSI_IO_FULL,
+	PSI_MEM_SOME,
+	PSI_MEM_FULL,
+	PSI_CPU_SOME,
+	PSI_NONIDLE,
+	NR_PSI_STATES,
+};
+
+struct psi_group_cpu {
+	/* 1st cacheline updated by the scheduler */
+
+	/* States of the tasks belonging to this group */
+	unsigned int tasks[NR_PSI_TASK_COUNTS] ____cacheline_aligned_in_smp;
+
+	/* Period time sampling buckets for each state of interest (ns) */
+	u32 times[NR_PSI_STATES];
+
+	/* Time of last task change in this group (rq_clock) */
+	u64 state_start;
+
+	/* 2nd cacheline updated by the aggregator */
+
+	/* Delta detection against the sampling buckets */
+	u32 times_prev[NR_PSI_STATES] ____cacheline_aligned_in_smp;
+};
+
+struct psi_group {
+	/* Protects data updated during an aggregation */
+	struct mutex stat_lock;
+
+	/* Per-cpu task state & time tracking */
+	struct psi_group_cpu __percpu *pcpu;
+
+	/* Periodic aggregation state */
+	u64 total_prev[NR_PSI_STATES - 1];
+	u64 last_update;
+	u64 next_update;
+	struct delayed_work clock_work;
+
+	/* Total stall times and sampled pressure averages */
+	u64 total[NR_PSI_STATES - 1];
+	unsigned long avg[NR_PSI_STATES - 1][3];
+};
+
+#else /* CONFIG_PSI */
+
+struct psi_group { };
+
+#endif /* CONFIG_PSI */
+
+#endif /* _LINUX_PSI_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca3f3eae8980..d5e4ee234114 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -709,6 +710,10 @@ struct task_struct {
 	unsigned			sched_contributes_to_load:1;
 	unsigned			sched_migrated:1;
 	unsigned			sched_remote_wakeup:1;
+#ifdef CONFIG_PSI
+	unsigned			sched_psi_wake_requeue:1;
+#endif
+
 	/* Force alignment to the next boundary: */
 	unsigned			:0;
 
@@ -956,6 +961,10 @@ struct task_struct {
 	siginfo_t			*last_siginfo;
 
 	struct task_io_accounting	ioac;
+#ifdef CONFIG_PSI
+	/* Pressure stall state */
+	unsigned int			psi_flags;
+#endif
 #ifdef CONFIG_TASK_XACCT
 	/* Accumulated RSS usage: */
 	u64				acct_rss_mem1;
@@ -1385,6 +1394,7 @@ extern struct pid *cad_pid;
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
 #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
+#define PF_MEMSTALL		0x01000000	/* Stalled due to lack of memory */
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_allowed */
 #define PF_MCE_EARLY		0x08000000	/* Early kill for mce process policy */
 #define PF_MUTEX_TESTER		0x20000000	/* Thread belongs to the rt mutex tester */
diff --git a/init/Kconfig b/init/Kconfig
index 18b151f0ddc1..ad61ddb5d68e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -457,6 +457,21 @@ config TASK_IO_ACCOUNTING
 
 	  Say N if unsure.
 
+config PSI
+	bool "Pressure stall information tracking"
+	help
+	  Collect metrics that indicate how overcommitted the CPU, memory,
+	  and IO capacity are in the system.
+
+	  If you say Y here, the kernel will create /proc/pressure/ with the
+	  pressure statistics files cpu, memory, and io. These will indicate
+	  the share of walltime in which some or all tasks in the system are
+	  delayed due to contention of the respective resource.
+
+	  For more details see Documentation/accounting/psi.txt.
+
+	  Say N if unsure.
+
 endmenu # "CPU/Task time and stats accounting"
 
 config CPU_ISOLATION
diff --git a/kernel/fork.c b/kernel/fork.c
index a5d21c42acfc..067aa5c28526 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1704,6 +1704,10 @@ static __latent_entropy struct task_struct *copy_process(
 
 	p->default_timer_slack_ns = current->timer_slack_ns;
 
+#ifdef CONFIG_PSI
+	p->psi_flags = 0;
+#endif
+
 	task_io_accounting_init(&p->ioac);
 	acct_clear_integrals(p);
 
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index d9a02b318108..b29bc18f2704 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq.o
 obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
 obj-$(CONFIG_MEMBARRIER) += membarrier.o
 obj-$(CONFIG_CPU_ISOLATION) += isolation.o
+obj-$(CONFIG_PSI) += psi.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9586a8141f16..e53137df405b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -743,8 +743,10 @@ static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 	if (!(flags & ENQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
-	if (!(flags & ENQUEUE_RESTORE))
+	if (!(flags & ENQUEUE_RESTORE)) {
 		sched_info_queued(rq, p);
+		psi_enqueue(rq, p, flags & ENQUEUE_WAKEUP);
+	}
 
 	p->sched_class->enqueue_task(rq, p, flags);
 }
@@ -754,8 +756,10 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 	if (!(flags & DEQUEUE_NOCLOCK))
 		update_rq_clock(rq);
 
-	if (!(flags & DEQUEUE_SAVE))
+	if (!(flags & DEQUEUE_SAVE)) {
 		sched_info_dequeued(rq, p);
+		psi_dequeue(rq, p, flags & DEQUEUE_SLEEP);
+	}
 
 	p->sched_class->dequeue_task(rq, p, flags);
 }
@@ -2058,6 +2062,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
 	if (task_cpu(p) != cpu) {
 		wake_flags |= WF_MIGRATED;
+		psi_ttwu_dequeue(p);
 		set_task_cpu(p, cpu);
 	}
 
@@ -6124,6 +6129,8 @@ void __init sched_init(void)
 
 	init_schedstats();
 
+	psi_init();
+
 	scheduler_running = 1;
 }
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
new file mode 100644
index 000000000000..57ec86592b5a
--- /dev/null
+++ b/kernel/sched/psi.c
@@ -0,0 +1,643 @@
+/*
+ * Pressure stall information for CPU, memory and IO
+ *
+ * Copyright (c) 2018 Facebook, Inc.
+ * Author: Johannes Weiner
+ *
+ * When CPU, memory and IO are contended, tasks experience delays that
+ * reduce throughput and introduce latencies into the workload. Memory
+ * and IO contention, in addition, can cause a full loss of forward
+ * progress in which the CPU goes idle.
+ *
+ * This code aggregates individual task delays into resource pressure
+ * metrics that indicate problems with both workload health and
+ * resource utilization.
+ *
+ *			Model
+ *
+ * The time in which a task can execute on a CPU is our baseline for
+ * productivity. Pressure expresses the amount of time in which this
+ * potential cannot be realized due to resource contention.
+ *
+ * This concept of productivity has two components: the workload and
+ * the CPU. To measure the impact of pressure on both, we define two
+ * contention states for a resource: SOME and FULL.
+ *
+ * In the SOME state of a given resource, one or more tasks are
+ * delayed on that resource. This affects the workload's ability to
+ * perform work, but the CPU may still be executing other tasks.
+ *
+ * In the FULL state of a given resource, all non-idle tasks are
+ * delayed on that resource such that nobody is advancing and the CPU
+ * goes idle. This leaves both workload and CPU unproductive.
+ *
+ * (Naturally, the FULL state doesn't exist for the CPU resource.)
+ *
+ *	SOME = nr_delayed_tasks != 0
+ *	FULL = nr_delayed_tasks != 0 && nr_running_tasks == 0
+ *
+ * The percentage of wallclock time spent in those compound stall
+ * states gives pressure numbers between 0 and 100 for each resource,
+ * where the SOME percentage indicates workload slowdowns and the FULL
+ * percentage indicates reduced CPU utilization:
+ *
+ *	%SOME = time(SOME) / period
+ *	%FULL = time(FULL) / period
+ *
+ *			Multiple CPUs
+ *
+ * The more tasks and available CPUs there are, the more work can be
+ * performed concurrently. This means that the potential that can go
+ * unrealized due to resource contention *also* scales with non-idle
+ * tasks and CPUs.
+ *
+ * Consider a scenario where 257 number crunching tasks are trying to
+ * run concurrently on 256 CPUs. If we simply aggregated the task
+ * states, we would have to conclude a CPU SOME pressure number of
+ * 100%, since *somebody* is waiting on a runqueue at all
+ * times. However, that is clearly not the amount of contention the
+ * workload is experiencing: only one out of 256 possible execution
+ * threads will be contended at any given time, or about 0.4%.
+ *
+ * Conversely, consider a scenario of 4 tasks and 4 CPUs where at any
+ * given time *one* of the tasks is delayed due to a lack of memory.
+ * Again, looking purely at the task state would yield a memory FULL
+ * pressure number of 0%, since *somebody* is always making forward
+ * progress. But again this wouldn't capture the amount of execution
+ * potential lost, which is 1 out of 4 CPUs, or 25%.
+ *
+ * To calculate wasted potential (pressure) with multiple processors,
+ * we have to base our calculation on the number of non-idle tasks in
+ * conjunction with the number of available CPUs, which is the number
+ * of potential execution threads. SOME then becomes the proportion of
+ * delayed tasks to possible threads, and FULL is the share of possible
+ * threads that are unproductive due to delays:
+ *
+ *	threads = min(nr_nonidle_tasks, nr_cpus)
+ *	   SOME = min(nr_delayed_tasks / threads, 1)
+ *	   FULL = (threads - min(nr_running_tasks, threads)) / threads
+ *
+ * For the 257 number crunchers on 256 CPUs, this yields:
+ *
+ *	threads = min(257, 256)
+ *	   SOME = min(1 / 256, 1)             = 0.4%
+ *	   FULL = (256 - min(257, 256)) / 256 = 0%
+ *
+ * For the 1 out of 4 memory-delayed tasks, this yields:
+ *
+ *	threads = min(4, 4)
+ *	   SOME = min(1 / 4, 1)               = 25%
+ *	   FULL = (4 - min(3, 4)) / 4         = 25%
+ *
+ * [ Substitute nr_cpus with 1, and you can see that it's a natural
+ *   extension of the single-CPU model. ]
+ *
+ *			Implementation
+ *
+ * To assess the precise time spent in each such state, we would have
+ * to freeze the system on task changes and start/stop the state
+ * clocks accordingly. Obviously that doesn't scale in practice.
+ *
+ * Because the scheduler aims to distribute the compute load evenly
+ * among the available CPUs, we can track task state locally to each
+ * CPU and, at much lower frequency, extrapolate the global state for
+ * the cumulative stall times and the running averages.
+ *
+ * For each runqueue, we track:
+ *
+ *	   tSOME[cpu] = time(nr_delayed_tasks[cpu] != 0)
+ *	   tFULL[cpu] = time(nr_delayed_tasks[cpu] && !nr_running_tasks[cpu])
+ *	tNONIDLE[cpu] = time(nr_nonidle_tasks[cpu] != 0)
+ *
+ * and then periodically aggregate:
+ *
+ *	tNONIDLE = sum(tNONIDLE[i])
+ *
+ *	   tSOME = sum(tSOME[i] * tNONIDLE[i]) / tNONIDLE
+ *	   tFULL = sum(tFULL[i] * tNONIDLE[i]) / tNONIDLE
+ *
+ *	   %SOME = tSOME / period
+ *	   %FULL = tFULL / period
+ *
+ * This gives us an approximation of pressure that is practical
+ * cost-wise, yet way more sensitive and accurate than periodic
+ * sampling of the aggregate task states would be.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "sched.h"
+
+static int psi_bug __read_mostly;
+
+bool psi_disabled __read_mostly;
+core_param(psi_disabled, psi_disabled, bool, 0644);
+
+/* Running averages - we need to be higher-res than loadavg */
+#define PSI_FREQ	(2*HZ+1)	/* 2 sec intervals */
+#define EXP_10s		1677		/* 1/exp(2s/10s) as fixed-point */
+#define EXP_60s		1981		/* 1/exp(2s/60s) */
+#define EXP_300s	2034		/* 1/exp(2s/300s) */
+
+/* Sampling frequency in nanoseconds */
+static u64 psi_period __read_mostly;
+
+/* System-level pressure and stall tracking */
+static DEFINE_PER_CPU(struct psi_group_cpu, system_group_pcpu);
+static struct psi_group psi_system = {
+	.pcpu = &system_group_pcpu,
+};
+
+static void psi_clock(struct work_struct *work);
+
+static void psi_group_init(struct psi_group *group)
+{
+	group->next_update = sched_clock() + psi_period;
+	INIT_DELAYED_WORK(&group->clock_work, psi_clock);
+	mutex_init(&group->stat_lock);
+}
+
+void __init psi_init(void)
+{
+	if (psi_disabled)
+		return;
+
+	psi_period = jiffies_to_nsecs(PSI_FREQ);
+	psi_group_init(&psi_system);
+}
+
+static void calc_avgs(unsigned long avg[3], int missed_periods,
+		      u64 time, u64 period)
+{
+	unsigned long pct;
+
+	/* Fill in zeroes for periods of no activity */
+	if (missed_periods) {
+		avg[0] = calc_load_n(avg[0], EXP_10s, 0, missed_periods);
+		avg[1] = calc_load_n(avg[1], EXP_60s, 0, missed_periods);
+		avg[2] = calc_load_n(avg[2], EXP_300s, 0, missed_periods);
+	}
+
+	/* Sample the most recent active period */
+	pct = div_u64(time * 100, period);
+	pct *= FIXED_1;
+	avg[0] = calc_load(avg[0], EXP_10s, pct);
+	avg[1] = calc_load(avg[1], EXP_60s, pct);
+	avg[2] = calc_load(avg[2], EXP_300s, pct);
+}
+
+static bool test_state(unsigned int *tasks, int cpu, enum psi_states state)
+{
+	switch (state) {
+	case PSI_IO_SOME:
+		return tasks[NR_IOWAIT];
+	case PSI_IO_FULL:
+		return tasks[NR_IOWAIT] && !tasks[NR_RUNNING];
+	case PSI_MEM_SOME:
+		return tasks[NR_MEMSTALL];
+	case PSI_MEM_FULL:
+		/*
+		 * Since we care about lost potential, things are
+		 * fully blocked on memory when there are no other
+		 * working tasks, but also when the CPU is actively
+		 * being used by a reclaimer and nothing productive
+		 * could run even if it were runnable.
+		 */
+		return tasks[NR_MEMSTALL] &&
+			(!tasks[NR_RUNNING] ||
+			 cpu_curr(cpu)->flags & PF_MEMSTALL);
+	case PSI_CPU_SOME:
+		return tasks[NR_RUNNING] > 1;
+	case PSI_NONIDLE:
+		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
+			tasks[NR_RUNNING];
+	default:
+		return false;
+	}
+}
+
+static bool psi_update_stats(struct psi_group *group)
+{
+	u64 deltas[NR_PSI_STATES - 1] = { 0, };
+	unsigned long missed_periods = 0;
+	unsigned long nonidle_total = 0;
+	u64 now, expires, period;
+	int cpu;
+	int s;
+
+	mutex_lock(&group->stat_lock);
+
+	/*
+	 * Collect the per-cpu time buckets and average them into a
+	 * single time sample that is normalized to wallclock time.
+	 *
+	 * For averaging, each CPU is weighted by its non-idle time in
+	 * the sampling period. This eliminates artifacts from uneven
+	 * loading, or even entirely idle CPUs.
+	 *
+	 * We don't need to synchronize against CPU hotplugging. If we
+	 * see a CPU that's online and has samples, we incorporate it.
+	 */
+	for_each_online_cpu(cpu) {
+		struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
+		u32 uninitialized_var(nonidle);
+
+		BUILD_BUG_ON(PSI_NONIDLE != NR_PSI_STATES - 1);
+
+		for (s = PSI_NONIDLE; s >= 0; s--) {
+			u32 time, delta;
+
+			time = READ_ONCE(groupc->times[s]);
+			/*
+			 * In addition to already concluded states, we
+			 * also incorporate currently active states on
+			 * the CPU, since states may last for many
+			 * sampling periods.
+			 *
+			 * This way we keep our delta sampling buckets
+			 * small (u32) and our reported pressure close
+			 * to what's actually happening.
+			 */
+			if (test_state(groupc->tasks, cpu, s)) {
+				/*
+				 * We can race with a state change and
+				 * need to make sure the state_start
+				 * update is ordered against the
+				 * updates to the live state and the
+				 * time buckets (groupc->times).
+				 *
+				 * 1. If we observe task state that
+				 * needs to be recorded, make sure we
+				 * see state_start from when that
+				 * state went into effect or we'll
+				 * count time from the previous state.
+				 *
+				 * 2. If the time delta has already
+				 * been added to the bucket, make sure
+				 * we don't see it in state_start or
+				 * we'll count it twice.
+				 *
+				 * If the time delta is out of
+				 * state_start but not in the time
+				 * bucket yet, we'll miss it entirely
+				 * and handle it in the next period.
+				 */
+				smp_rmb();
+				time += cpu_clock(cpu) - groupc->state_start;
+			}
+			delta = time - groupc->times_prev[s];
+			groupc->times_prev[s] = time;
+
+			if (s == PSI_NONIDLE) {
+				nonidle = nsecs_to_jiffies(delta);
+				nonidle_total += nonidle;
+			} else {
+				deltas[s] += (u64)delta * nonidle;
+			}
+		}
+	}
+
+	/*
+	 * Integrate the sample into the running statistics that are
+	 * reported to userspace: the cumulative stall times and the
+	 * decaying averages.
+	 *
+	 * Pressure percentages are sampled at PSI_FREQ. We might be
+	 * called more often when the user polls more frequently than
+	 * that; we might be called less often when there is no task
+	 * activity, thus no data, and clock ticks are sporadic. The
+	 * below handles both.
+	 */
+
+	/* total= */
+	for (s = 0; s < NR_PSI_STATES - 1; s++)
+		group->total[s] += div_u64(deltas[s], max(nonidle_total, 1UL));
+
+	/* avgX= */
+	now = sched_clock();
+	expires = group->next_update;
+	if (now < expires)
+		goto out;
+	if (now - expires > psi_period)
+		missed_periods = div_u64(now - expires, psi_period);
+
+	/*
+	 * The periodic clock tick can get delayed for various
+	 * reasons, especially on loaded systems. To avoid clock
+	 * drift, we schedule the clock in fixed psi_period intervals.
+	 * But the deltas we sample out of the per-cpu buckets above
+	 * are based on the actual time elapsing between clock ticks.
+	 */
+	group->next_update = expires + ((1 + missed_periods) * psi_period);
+	period = now - (group->last_update + (missed_periods * psi_period));
+	group->last_update = now;
+
+	for (s = 0; s < NR_PSI_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total[s] - group->total_prev[s];
+		/*
+		 * Due to the lockless sampling of the time buckets,
+		 * recorded time deltas can slip into the next period,
+		 * which under full pressure can result in samples in
+		 * excess of the period length.
+		 *
+		 * We don't want to report non-sensical pressures in
+		 * excess of 100%, nor do we want to drop such events
+		 * on the floor. Instead we punt any overage into the
+		 * future until pressure subsides. By doing this we
+		 * don't underreport the occurring pressure curve, we
+		 * just report it delayed by one period length.
+		 *
+		 * The error isn't cumulative. As soon as another
+		 * delta slips from a period P to P+1, by definition
+		 * it frees up its time T in P.
+		 */
+		if (sample > period)
+			sample = period;
+		group->total_prev[s] += sample;
+		calc_avgs(group->avg[s], missed_periods, sample, period);
+	}
+out:
+	mutex_unlock(&group->stat_lock);
+	return nonidle_total;
+}
+
+static void psi_clock(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct psi_group *group;
+	bool nonidle;
+
+	dwork = to_delayed_work(work);
+	group = container_of(dwork, struct psi_group, clock_work);
+
+	/*
+	 * If there is task activity, periodically fold the per-cpu
+	 * times and feed samples into the running averages. If things
+	 * are idle and there is no data to process, stop the clock.
+	 * Once restarted, we'll catch up the running averages in one
+	 * go - see calc_avgs() and missed_periods.
+	 */
+
+	nonidle = psi_update_stats(group);
+
+	if (nonidle) {
+		unsigned long delay = 0;
+		u64 now;
+
+		now = sched_clock();
+		if (group->next_update > now)
+			delay = nsecs_to_jiffies(group->next_update - now) + 1;
+		schedule_delayed_work(dwork, delay);
+	}
+}
+
+static void psi_group_change(struct psi_group *group, int cpu, u64 now,
+			     unsigned int clear, unsigned int set)
+{
+	struct psi_group_cpu *groupc;
+	unsigned int t, m;
+	u32 delta;
+
+	groupc = per_cpu_ptr(group->pcpu, cpu);
+
+	/*
+	 * First we assess the aggregate resource states these CPU's
+	 * tasks have been in since the last change, and account any
+	 * SOME and FULL time that may have resulted in.
+	 *
+	 * Then we update the task counts according to the state
+	 * change requested through the @clear and @set bits.
+	 */
+
+	delta = now - groupc->state_start;
+	groupc->state_start = now;
+
+	/*
+	 * Update state_start before recording time in the sampling
+	 * buckets and changing task counts, to prevent a racing
+	 * aggregation from counting the delta twice or attributing it
+	 * to an old state.
+	 */
+	smp_wmb();
+
+	if (test_state(groupc->tasks, cpu, PSI_IO_SOME)) {
+		groupc->times[PSI_IO_SOME] += delta;
+		if (test_state(groupc->tasks, cpu, PSI_IO_FULL))
+			groupc->times[PSI_IO_FULL] += delta;
+	}
+	if (test_state(groupc->tasks, cpu, PSI_MEM_SOME)) {
+		groupc->times[PSI_MEM_SOME] += delta;
+		if (test_state(groupc->tasks, cpu, PSI_MEM_FULL))
+			groupc->times[PSI_MEM_FULL] += delta;
+	}
+	if (test_state(groupc->tasks, cpu, PSI_CPU_SOME))
+		groupc->times[PSI_CPU_SOME] += delta;
+	if (test_state(groupc->tasks, cpu, PSI_NONIDLE))
+		groupc->times[PSI_NONIDLE] += delta;
+
+	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
+		if (!(m & (1 << t)))
+			continue;
+		if (groupc->tasks[t] == 0 && !psi_bug) {
+			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u] clear=%x set=%x\n",
+					cpu, t, groupc->tasks[0],
+					groupc->tasks[1], groupc->tasks[2],
+					clear, set);
+			psi_bug = 1;
+		}
+		groupc->tasks[t]--;
+	}
+	for (t = 0; set; set &= ~(1 << t), t++)
+		if (set & (1 << t))
+			groupc->tasks[t]++;
+
+	if (!delayed_work_pending(&group->clock_work))
+		schedule_delayed_work(&group->clock_work, PSI_FREQ);
+}
+
+void psi_task_change(struct task_struct *task, u64 now, int clear, int set)
+{
+	int cpu = task_cpu(task);
+
+	if (psi_disabled)
+		return;
+
+	if (!task->pid)
+		return;
+
+	if (((task->psi_flags & set) ||
+	     (task->psi_flags & clear) != clear) &&
+	    !psi_bug) {
+		printk_deferred(KERN_ERR "psi: inconsistent task state! task=%d:%s cpu=%d psi_flags=%x clear=%x set=%x\n",
+				task->pid, task->comm, cpu,
+				task->psi_flags, clear, set);
+		psi_bug = 1;
+	}
+
+	task->psi_flags &= ~clear;
+	task->psi_flags |= set;
+
+	psi_group_change(&psi_system, cpu, now, clear, set);
+}
+
+/**
+ * psi_memstall_enter - mark the beginning of a memory stall section
+ * @flags: flags to handle nested sections
+ *
+ * Marks the calling task as being stalled due to a lack of memory,
+ * such as waiting for a refault or performing reclaim.
+ */
+void psi_memstall_enter(unsigned long *flags)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	if (psi_disabled)
+		return;
+
+	*flags = current->flags & PF_MEMSTALL;
+	if (*flags)
+		return;
+	/*
+	 * PF_MEMSTALL setting & accounting needs to be atomic wrt
+	 * changes to the task's scheduling state, otherwise we can
+	 * race with CPU migration.
+	 */
+	rq = this_rq_lock_irq(&rf);
+
+	update_rq_clock(rq);
+
+	current->flags |= PF_MEMSTALL;
+	psi_task_change(current, rq_clock(rq), 0, TSK_MEMSTALL);
+
+	rq_unlock_irq(rq, &rf);
+}
+
+/**
+ * psi_memstall_leave - mark the end of a memory stall section
+ * @flags: flags to handle nested memdelay sections
+ *
+ * Marks the calling task as no longer stalled due to lack of memory.
+ */
+void psi_memstall_leave(unsigned long *flags)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	if (psi_disabled)
+		return;
+
+	if (*flags)
+		return;
+	/*
+	 * PF_MEMSTALL clearing & accounting needs to be atomic wrt
+	 * changes to the task's scheduling state, otherwise we could
+	 * race with CPU migration.
+	 */
+	rq = this_rq_lock_irq(&rf);
+
+	update_rq_clock(rq);
+
+	current->flags &= ~PF_MEMSTALL;
+	psi_task_change(current, rq_clock(rq), TSK_MEMSTALL, 0);
+
+	rq_unlock_irq(rq, &rf);
+}
+
+static int psi_show(struct seq_file *m, struct psi_group *group,
+		    enum psi_res res)
+{
+	int full;
+
+	if (psi_disabled)
+		return -EOPNOTSUPP;
+
+	psi_update_stats(group);
+
+	for (full = 0; full < 2 - (res == PSI_CPU); full++) {
+		unsigned long avg[3];
+		u64 total;
+		int w;
+
+		for (w = 0; w < 3; w++)
+			avg[w] = group->avg[res * 2 + full][w];
+		total = div_u64(group->total[res * 2 + full], NSEC_PER_USEC);
+
+		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
+			   full ? "full" : "some",
+			   LOAD_INT(avg[0]), LOAD_FRAC(avg[0]),
+			   LOAD_INT(avg[1]), LOAD_FRAC(avg[1]),
+			   LOAD_INT(avg[2]), LOAD_FRAC(avg[2]),
+			   total);
+	}
+
+	return 0;
+}
+
+static int psi_io_show(struct seq_file *m, void *v)
+{
+	return psi_show(m, &psi_system, PSI_IO);
+}
+
+static int psi_memory_show(struct seq_file *m, void *v)
+{
+	return psi_show(m, &psi_system, PSI_MEM);
+}
+
+static int psi_cpu_show(struct seq_file *m, void *v)
+{
+	return psi_show(m, &psi_system, PSI_CPU);
+}
+
+static int psi_io_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, psi_io_show, NULL);
+}
+
+static int psi_memory_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, psi_memory_show, NULL);
+}
+
+static int psi_cpu_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, psi_cpu_show, NULL);
+}
+
+static const struct file_operations psi_io_fops = {
+	.open           = psi_io_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static const struct file_operations psi_memory_fops = {
+	.open           = psi_memory_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static const struct file_operations psi_cpu_fops = {
+	.open           = psi_cpu_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static int __init psi_proc_init(void)
+{
+	proc_mkdir("pressure", NULL);
+	proc_create("pressure/io", 0, NULL, &psi_io_fops);
+	proc_create("pressure/memory", 0, NULL, &psi_memory_fops);
+	proc_create("pressure/cpu", 0, NULL, &psi_cpu_fops);
+	return 0;
+}
+module_init(psi_proc_init);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bc798c7cb4d4..e798491ff329 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -320,6 +321,7 @@ extern bool dl_cpu_busy(unsigned int cpu);
 #ifdef CONFIG_CGROUP_SCHED
 
 #include
+#include
 
 struct cfs_rq;
 struct rt_rq;
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 8aea199a39b4..f3e0267eb47d 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -55,6 +55,86 @@ static inline void rq_sched_info_depart (struct rq *rq, unsigned long long delt
 # define   schedstat_val_or_zero(var)	0
 #endif /* CONFIG_SCHEDSTATS */
 
+#ifdef CONFIG_PSI
+/*
+ * PSI tracks state that persists across sleeps, such as iowaits and
+ * memory stalls. As a result, it has to distinguish between sleeps,
+ * where a task's runnable state changes, and requeues, where a task
+ * and its state are being moved between CPUs and runqueues.
+ */
+static inline void psi_enqueue(struct rq *rq, struct task_struct *p,
+			       bool wakeup)
+{
+	int clear = 0, set = TSK_RUNNING;
+
+	if (psi_disabled)
+		return;
+
+	if (!wakeup || p->sched_psi_wake_requeue) {
+		if (p->flags & PF_MEMSTALL)
+			set |= TSK_MEMSTALL;
+		if (p->sched_psi_wake_requeue)
+			p->sched_psi_wake_requeue = 0;
+	} else {
+		if (p->in_iowait)
+			clear |= TSK_IOWAIT;
+	}
+
+	psi_task_change(p, rq_clock(rq), clear, set);
+}
+
+static inline void psi_dequeue(struct rq *rq, struct task_struct *p, bool sleep)
+{
+	int clear = TSK_RUNNING, set = 0;
+
+	if (psi_disabled)
+		return;
+
+	if (!sleep) {
+		if (p->flags & PF_MEMSTALL)
+			clear |= TSK_MEMSTALL;
+	} else {
+		if (p->in_iowait)
+			set |= TSK_IOWAIT;
+	}
+
+	psi_task_change(p, rq_clock(rq), clear, set);
+}
+
+static inline void psi_ttwu_dequeue(struct task_struct *p)
+{
+	if (psi_disabled)
+		return;
+	/*
+	 * Is the task being migrated during a wakeup? Make sure to
+	 * deregister its sleep-persistent psi states from the old
+	 * queue, and let psi_enqueue() know it has to requeue.
+	 */
+	if (unlikely(p->in_iowait || (p->flags & PF_MEMSTALL))) {
+		struct rq_flags rf;
+		struct rq *rq;
+		int clear = 0;
+
+		if (p->in_iowait)
+			clear |= TSK_IOWAIT;
+		if (p->flags & PF_MEMSTALL)
+			clear |= TSK_MEMSTALL;
+
+		rq = __task_rq_lock(p, &rf);
+		update_rq_clock(rq);
+		psi_task_change(p, rq_clock(rq), clear, 0);
+		p->sched_psi_wake_requeue = 1;
+		__task_rq_unlock(rq, &rf);
+	}
+}
+#else /* CONFIG_PSI */
+static inline void psi_enqueue(struct rq *rq, struct task_struct *p,
+			       bool wakeup) {}
+static inline void psi_dequeue(struct rq *rq, struct task_struct *p,
+			       bool sleep) {}
+static inline void psi_ttwu_dequeue(struct task_struct *p) {}
+#endif /* CONFIG_PSI */
+
 #ifdef CONFIG_SCHED_INFO
 static inline void sched_info_reset_dequeued(struct task_struct *t)
 {
diff --git a/mm/compaction.c b/mm/compaction.c
index 29bd1df18b98..8f9566745902 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #ifdef CONFIG_COMPACTION
@@ -2068,11 +2069,15 @@ static int kcompactd(void *p)
 	pgdat->kcompactd_classzone_idx = pgdat->nr_zones - 1;
 
 	while (!kthread_should_stop()) {
+		unsigned long pflags;
+
 		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
 		wait_event_freezable(pgdat->kcompactd_wait,
 				kcompactd_work_requested(pgdat));
 
+		psi_memstall_enter(&pflags);
 		kcompactd_do_work(pgdat);
+		psi_memstall_leave(&pflags);
 	}
 
 	return 0;
diff --git a/mm/filemap.c b/mm/filemap.c
index e49961e13dd9..eee06145b997 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -1075,11 +1076,14 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	struct wait_page_queue wait_page;
 	wait_queue_entry_t *wait = &wait_page.wait;
 	bool thrashing = false;
+	unsigned long pflags;
 	int ret = 0;
 
-	if (bit_nr == PG_locked && !PageSwapBacked(page) &&
+	if (bit_nr == PG_locked && !PageUptodate(page) &&
 	    PageWorkingset(page)) {
-		delayacct_thrashing_start();
+		if (!PageSwapBacked(page))
+			delayacct_thrashing_start();
+		psi_memstall_enter(&pflags);
 		thrashing = true;
 	}
 
@@ -1121,8 +1125,11 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 
 	finish_wait(q, wait);
 
-	if (thrashing)
-		delayacct_thrashing_end();
+	if (thrashing) {
+		if (!PageSwapBacked(page))
+			delayacct_thrashing_end();
+		psi_memstall_leave(&pflags);
+	}
 
 	/*
 	 * A signal could leave PageWaiters set. Clearing it here if
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 22320ea27489..8469f34e6731 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3552,15 +3553,20 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		enum compact_priority prio, enum compact_result *compact_result)
 {
 	struct page *page;
+	unsigned long pflags;
 	unsigned int noreclaim_flag;
 
 	if (!order)
 		return NULL;
 
+	psi_memstall_enter(&pflags);
 	noreclaim_flag = memalloc_noreclaim_save();
+
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
 								prio);
+
 	memalloc_noreclaim_restore(noreclaim_flag);
+	psi_memstall_leave(&pflags);
 
 	if (*compact_result <= COMPACT_INACTIVE)
 		return NULL;
@@ -3749,11 +3755,14 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	struct reclaim_state reclaim_state;
 	int progress;
 	unsigned int noreclaim_flag;
+	unsigned long pflags;
 
 	cond_resched();
 
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
+
+	psi_memstall_enter(&pflags);
 	noreclaim_flag = memalloc_noreclaim_save();
 	fs_reclaim_acquire(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
@@ -3765,6 +3774,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 	current->reclaim_state = NULL;
 	fs_reclaim_release(gfp_mask);
 	memalloc_noreclaim_restore(noreclaim_flag);
+	psi_memstall_leave(&pflags);
 
 	cond_resched();
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8d1ad48ffbcd..ee91e8cbeb5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3115,6 +3116,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 {
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
+	unsigned long pflags;
 	int nid;
 	unsigned int noreclaim_flag;
 	struct scan_control sc = {
@@ -3143,9 +3145,13 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					    sc.gfp_mask,
 					    sc.reclaim_idx);
 
+	psi_memstall_enter(&pflags);
 	noreclaim_flag = memalloc_noreclaim_save();
+
 	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+
 	memalloc_noreclaim_restore(noreclaim_flag);
+	psi_memstall_leave(&pflags);
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
@@ -3565,6 +3571,7 @@ static int kswapd(void *p)
 	pgdat->kswapd_order = 0;
 	pgdat->kswapd_classzone_idx = MAX_NR_ZONES;
 	for ( ; ; ) {
+		unsigned long pflags;
 		bool ret;
 
 		alloc_order = reclaim_order = pgdat->kswapd_order;
@@ -3601,9 +3608,15 @@ static int kswapd(void *p)
 		 */
 		trace_mm_vmscan_kswapd_wake(pgdat->node_id, classzone_idx,
 						alloc_order);
+
+		psi_memstall_enter(&pflags);
 		fs_reclaim_acquire(GFP_KERNEL);
+
 		reclaim_order = balance_pgdat(pgdat, alloc_order, classzone_idx);
+
 		fs_reclaim_release(GFP_KERNEL);
+		psi_memstall_leave(&pflags);
+
 		if (reclaim_order < alloc_order)
 			goto kswapd_try_sleep;
 	}

From patchwork Wed Aug 1 15:13:08 2018
X-Patchwork-Submitter: Johannes Weiner
X-Patchwork-Id: 10552439
From: Johannes Weiner
To: Ingo Molnar, Peter Zijlstra, Andrew Morton, Linus Torvalds
Cc: Tejun Heo, Suren Baghdasaryan, Daniel Drake, Vinayak Menon,
 Christopher Lameter, Mike Galbraith, Shakeel Butt, Peter Enderborg,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 9/9] psi: cgroup support
Date: Wed, 1 Aug 2018 11:13:08 -0400
Message-Id: <20180801151308.32234-10-hannes@cmpxchg.org>
In-Reply-To: <20180801151308.32234-1-hannes@cmpxchg.org>
References: <20180801151308.32234-1-hannes@cmpxchg.org>

On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system,
but also that of individual jobs, so that we can ensure individual job
health, fairness between jobs, or prioritize some jobs over others.

This patch implements pressure stall tracking for cgroups. In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure,
memory.pressure, and io.pressure files that track aggregate pressure
stall times for only the tasks inside the cgroup.
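
The psi_task_change() hunk in kernel/sched/psi.c applies every task state change to psi_system and then to each cgroup on the path from the task's cgroup towards the root, stopping before the root itself. The following is a minimal user-space model of that walk, offered as a sketch under simplifying assumptions: toy_cgroup, toy_group, toy_task_change() and the TOY_* flags are hypothetical names, not kernel APIs, and the model keeps only per-state task counts rather than the per-CPU stall timing that psi_group_change() actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: TOY_RUNNING/TOY_IOWAIT play the role of
 * TSK_RUNNING/TSK_IOWAIT, and struct toy_group that of struct psi_group. */
enum { TOY_RUNNING = 1 << 0, TOY_IOWAIT = 1 << 1 };

struct toy_group {
	unsigned int nr_running;
	unsigned int nr_iowait;
};

struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root cgroup */
	struct toy_group psi;
};

static struct toy_group toy_system;	/* stands in for psi_system */

static void toy_group_change(struct toy_group *g, int clear, int set)
{
	/* In this toy, clearing a state the group does not hold is a bug. */
	assert(!(clear & TOY_RUNNING) || g->nr_running > 0);
	assert(!(clear & TOY_IOWAIT) || g->nr_iowait > 0);

	if (clear & TOY_RUNNING)
		g->nr_running--;
	if (clear & TOY_IOWAIT)
		g->nr_iowait--;
	if (set & TOY_RUNNING)
		g->nr_running++;
	if (set & TOY_IOWAIT)
		g->nr_iowait++;
}

/* Same shape as the loop added to psi_task_change(): update the
 * system-wide group, then each cgroup from the task's own up to, but
 * not including, the root, which the system-wide group represents. */
static void toy_task_change(struct toy_cgroup *cgrp, int clear, int set)
{
	struct toy_cgroup *parent;

	toy_group_change(&toy_system, clear, set);
	while (cgrp && (parent = cgrp->parent)) {
		toy_group_change(&cgrp->psi, clear, set);
		cgrp = parent;
	}
}
```

Skipping the root keeps it from duplicating psi_system, which is presumably also why the pressure files below are created with CFTYPE_NOT_ON_ROOT. The same counting view suggests why cgroup_move_task() clears the task's states before switching task->cgroups and sets them again afterwards: the old ancestors lose the task and the new ones gain it.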

v3:
- fix copy-paste indentation screwups

Acked-by: Tejun Heo
Signed-off-by: Johannes Weiner
---
 Documentation/accounting/psi.txt |  9 ++++
 Documentation/cgroup-v2.txt      | 18 +++++++
 include/linux/cgroup-defs.h      |  4 ++
 include/linux/cgroup.h           | 15 ++++++
 include/linux/psi.h              | 25 ++++++++++
 init/Kconfig                     |  4 ++
 kernel/cgroup/cgroup.c           | 45 +++++++++++++++++-
 kernel/sched/psi.c               | 81 +++++++++++++++++++++++++++++++-
 8 files changed, 197 insertions(+), 4 deletions(-)

diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
index 51e7ef14142e..e051810d5127 100644
--- a/Documentation/accounting/psi.txt
+++ b/Documentation/accounting/psi.txt
@@ -62,3 +62,12 @@ well as medium and long term trends. The total absolute stall time is
 tracked and exported as well, to allow detection of latency spikes
 which wouldn't necessarily make a dent in the time averages, or to
 average trends over custom time frames.
+
+Cgroup2 interface
+=================
+
+In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
+mounted, pressure stall information is also tracked for tasks grouped
+into cgroups. Each subdirectory in the cgroupfs mountpoint contains
+cpu.pressure, memory.pressure, and io.pressure files; the format is
+the same as the /proc/pressure/ files.
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeaed9f7a..a22879dba019 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -963,6 +963,12 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.pressure
+	A read-only nested-key file which exists on non-root cgroups.
+
+	Shows pressure stall information for CPU. See
+	Documentation/accounting/psi.txt for details.
+
 Memory
 ------
 
@@ -1199,6 +1205,12 @@ PAGE_SIZE multiple when read back.
 	Swap usage hard limit.  If a cgroup's swap usage reaches this
 	limit, anonymous memory of the cgroup will not be swapped out.
 
+  memory.pressure
+	A read-only nested-key file which exists on non-root cgroups.
+
+	Shows pressure stall information for memory. See
+	Documentation/accounting/psi.txt for details.
+
 Usage Guidelines
 ~~~~~~~~~~~~~~~~
 
@@ -1334,6 +1346,12 @@ IO Interface Files
 
 	  8:16 rbps=2097152 wbps=max riops=max wiops=max
 
+  io.pressure
+	A read-only nested-key file which exists on non-root cgroups.
+
+	Shows pressure stall information for IO. See
+	Documentation/accounting/psi.txt for details.
+
 Writeback
 ~~~~~~~~~
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index dc5b70449dc6..280f18da956a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_CGROUPS
 
@@ -424,6 +425,9 @@ struct cgroup {
 	/* used to schedule release agent */
 	struct work_struct release_agent_work;
 
+	/* used to track pressure stalls */
+	struct psi_group psi;
+
 	/* used to store eBPF programs */
 	struct cgroup_bpf bpf;
 
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 473e0c0abb86..fd94c294c207 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -627,6 +627,11 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
 	pr_cont_kernfs_path(cgrp->kn);
 }
 
+static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
+{
+	return &cgrp->psi;
+}
+
 static inline void cgroup_init_kthreadd(void)
 {
 	/*
@@ -680,6 +685,16 @@ static inline union kernfs_node_id *cgroup_get_kernfs_id(struct cgroup *cgrp)
 	return NULL;
 }
 
+static inline struct cgroup *cgroup_parent(struct cgroup *cgrp)
+{
+	return NULL;
+}
+
+static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
+{
+	return NULL;
+}
+
 static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
 					       struct cgroup *ancestor)
 {
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 371af1479699..05c3dae3e9c5 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -4,6 +4,9 @@
 #include
 #include
 
+struct seq_file;
+struct css_set;
+
 #ifdef CONFIG_PSI
 
 extern bool psi_disabled;
@@ -15,6 +18,14 @@ void psi_task_change(struct task_struct *task, u64 now, int clear, int set);
 void psi_memstall_enter(unsigned long *flags);
 void psi_memstall_leave(unsigned long *flags);
 
+int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
+
+#ifdef CONFIG_CGROUPS
+int psi_cgroup_alloc(struct cgroup *cgrp);
+void psi_cgroup_free(struct cgroup *cgrp);
+void cgroup_move_task(struct task_struct *p, struct css_set *to);
+#endif
+
 #else /* CONFIG_PSI */
 
 static inline void psi_init(void) {}
@@ -22,6 +33,20 @@ static inline void psi_init(void) {}
 static inline void psi_memstall_enter(unsigned long *flags) {}
 static inline void psi_memstall_leave(unsigned long *flags) {}
 
+#ifdef CONFIG_CGROUPS
+static inline int psi_cgroup_alloc(struct cgroup *cgrp)
+{
+	return 0;
+}
+static inline void psi_cgroup_free(struct cgroup *cgrp)
+{
+}
+static inline void cgroup_move_task(struct task_struct *p, struct css_set *to)
+{
+	rcu_assign_pointer(p->cgroups, to);
+}
+#endif
+
 #endif /* CONFIG_PSI */
 
 #endif /* _LINUX_PSI_H */
diff --git a/init/Kconfig b/init/Kconfig
index ad61ddb5d68e..5c029f8d69f1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -468,6 +468,10 @@ config PSI
 	  the share of walltime in which some or all tasks in the system are
 	  delayed due to contention of the respective resource.
 
+	  In kernels with cgroup support (cgroup2 only), cgroups will
+	  have cpu.pressure, memory.pressure, and io.pressure files,
+	  which aggregate pressure stalls for the grouped tasks only.
+
 	  For more details see Documentation/accounting/psi.txt.
 
 	  Say N if unsure.
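
The help text above says that cgroup2 groups gain cpu.pressure, memory.pressure and io.pressure files, and the documentation hunk says their format matches the /proc/pressure/ files. A small user-space reader might look like the sketch below. It assumes each line has the form "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" (or starts with "full"), which is what the seq_printf() arguments in psi_show() suggest: a "some"/"full" prefix, three averages printed as an integer and a two-digit fraction via LOAD_INT()/LOAD_FRAC(), and a total. The struct and function names and the example cgroup path are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line; the layout is an assumption based on psi_show(). */
struct pressure_line {
	char kind[5];			/* "some" or "full" */
	double avg10, avg60, avg300;
	unsigned long long total;
};

/* Parse a line such as "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
 * Returns 0 on success, -1 if the line does not match that format. */
static int parse_pressure_line(const char *line, struct pressure_line *out)
{
	int n;

	assert(line != NULL && out != NULL);
	n = sscanf(line, "%4s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		   out->kind, &out->avg10, &out->avg60, &out->avg300,
		   &out->total);
	if (n != 5)
		return -1;
	if (strcmp(out->kind, "some") != 0 && strcmp(out->kind, "full") != 0)
		return -1;
	return 0;
}

/* Read up to max lines from a pressure file, e.g. /proc/pressure/memory
 * or a hypothetical cgroup path such as /sys/fs/cgroup/job1/memory.pressure.
 * Returns the number of lines parsed, or -1 if the file cannot be opened. */
static int read_pressure_file(const char *path, struct pressure_line *lines,
			      int max)
{
	char buf[256];
	int n = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (n < max && fgets(buf, sizeof(buf), f)) {
		if (parse_pressure_line(buf, &lines[n]) == 0)
			n++;
	}
	fclose(f);
	return n;
}
```

Assuming a file holds at most one "some" and one "full" line, a two-element array is enough for read_pressure_file(). Lines that do not match the assumed format are skipped rather than treated as fatal.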
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a662bfcbea0e..bbb00b3ab752 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #define CREATE_TRACE_POINTS
@@ -826,7 +827,7 @@ static void css_set_move_task(struct task_struct *task,
 		 */
 		WARN_ON_ONCE(task->flags & PF_EXITING);
 
-		rcu_assign_pointer(task->cgroups, to_cset);
+		cgroup_move_task(task, to_cset);
 		list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks :
 							     &to_cset->tasks);
 	}
@@ -3388,6 +3389,21 @@ static int cpu_stat_show(struct seq_file *seq, void *v)
 	return ret;
 }
 
+#ifdef CONFIG_PSI
+static int cgroup_io_pressure_show(struct seq_file *seq, void *v)
+{
+	return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_IO);
+}
+static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
+{
+	return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_MEM);
+}
+static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
+{
+	return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_CPU);
+}
+#endif
+
 static int cgroup_file_open(struct kernfs_open_file *of)
 {
 	struct cftype *cft = of->kn->priv;
@@ -4499,6 +4515,23 @@ static struct cftype cgroup_base_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = cpu_stat_show,
 	},
+#ifdef CONFIG_PSI
+	{
+		.name = "io.pressure",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cgroup_io_pressure_show,
+	},
+	{
+		.name = "memory.pressure",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cgroup_memory_pressure_show,
+	},
+	{
+		.name = "cpu.pressure",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cgroup_cpu_pressure_show,
+	},
+#endif
 	{ }	/* terminate */
 };
 
@@ -4559,6 +4592,7 @@ static void css_free_rwork_fn(struct work_struct *work)
 		 */
 		cgroup_put(cgroup_parent(cgrp));
 		kernfs_put(cgrp->kn);
+		psi_cgroup_free(cgrp);
 		if (cgroup_on_dfl(cgrp))
 			cgroup_stat_exit(cgrp);
 		kfree(cgrp);
@@ -4805,10 +4839,15 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	cgrp->self.parent = &parent->self;
 	cgrp->root = root;
 	cgrp->level = level;
-	ret = cgroup_bpf_inherit(cgrp);
+
+	ret = psi_cgroup_alloc(cgrp);
 	if (ret)
 		goto out_idr_free;
 
+	ret = cgroup_bpf_inherit(cgrp);
+	if (ret)
+		goto out_psi_free;
+
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
@@ -4846,6 +4885,8 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 
 	return cgrp;
 
+out_psi_free:
+	psi_cgroup_free(cgrp);
 out_idr_free:
 	cgroup_idr_remove(&root->cgroup_idr, cgrp->id);
 out_stat_exit:
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 57ec86592b5a..a20f885da66f 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -464,6 +464,9 @@ static void psi_group_change(struct psi_group *group, int cpu, u64 now,
 
 void psi_task_change(struct task_struct *task, u64 now, int clear, int set)
 {
+#ifdef CONFIG_CGROUPS
+	struct cgroup *cgroup, *parent;
+#endif
 	int cpu = task_cpu(task);
 
 	if (psi_disabled)
@@ -485,6 +488,18 @@ void psi_task_change(struct task_struct *task, u64 now, int clear, int set)
 	task->psi_flags |= set;
 
 	psi_group_change(&psi_system, cpu, now, clear, set);
+
+#ifdef CONFIG_CGROUPS
+	cgroup = task->cgroups->dfl_cgrp;
+	while (cgroup && (parent = cgroup_parent(cgroup))) {
+		struct psi_group *group;
+
+		group = cgroup_psi(cgroup);
+		psi_group_change(group, cpu, now, clear, set);
+
+		cgroup = parent;
+	}
+#endif
 }
 
 /**
@@ -551,8 +566,70 @@ void psi_memstall_leave(unsigned long *flags)
 	rq_unlock_irq(rq, &rf);
 }
 
-static int psi_show(struct seq_file *m, struct psi_group *group,
-		    enum psi_res res)
+#ifdef CONFIG_CGROUPS
+int psi_cgroup_alloc(struct cgroup *cgroup)
+{
+	cgroup->psi.pcpu = alloc_percpu(struct psi_group_cpu);
+	if (!cgroup->psi.pcpu)
+		return -ENOMEM;
+	psi_group_init(&cgroup->psi);
+	return 0;
+}
+
+void psi_cgroup_free(struct cgroup *cgroup)
+{
+	cancel_delayed_work_sync(&cgroup->psi.clock_work);
+	free_percpu(cgroup->psi.pcpu);
+}
+
+/**
+ * cgroup_move_task - move task to a different cgroup
+ * @task: the task
+ * @to: the target css_set
+ *
+ * Move task to a new cgroup and safely migrate its associated stall
+ * state between the different groups.
+ *
+ * This function acquires the task's rq lock to lock out concurrent
+ * changes to the task's scheduling state and - in case the task is
+ * running - concurrent changes to its stall state.
+ */
+void cgroup_move_task(struct task_struct *task, struct css_set *to)
+{
+	unsigned int task_flags = 0;
+	struct rq_flags rf;
+	struct rq *rq;
+	u64 now;
+
+	rq = task_rq_lock(task, &rf);
+
+	if (task_on_rq_queued(task))
+		task_flags = TSK_RUNNING;
+	else if (task->in_iowait)
+		task_flags = TSK_IOWAIT;
+	if (task->flags & PF_MEMSTALL)
+		task_flags |= TSK_MEMSTALL;
+
+	if (task_flags) {
+		update_rq_clock(rq);
+		now = rq_clock(rq);
+		psi_task_change(task, now, task_flags, 0);
+	}
+
+	/*
+	 * Lame to do this here, but the scheduler cannot be locked
+	 * from the outside, so we move cgroups from inside sched/.
+	 */
+	rcu_assign_pointer(task->cgroups, to);
+
+	if (task_flags)
+		psi_task_change(task, now, 0, task_flags);
+
+	task_rq_unlock(rq, task, &rf);
+}
+#endif /* CONFIG_CGROUPS */
+
+int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 {
 	int full;