From patchwork Tue Sep 25 15:30:03 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10614239
From: Josef Bacik <josef@toxicpanda.com>
To: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 kernel-team@fb.com, linux-btrfs@vger.kernel.org, riel@redhat.com,
 hannes@cmpxchg.org, tj@kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org
Subject: [RFC][PATCH 0/8] drop the mmap_sem when doing IO in the fault path
Date: Tue, 25 Sep 2018 11:30:03 -0400
Message-Id: <20180925153011.15311-1-josef@toxicpanda.com>

Now that we have proper isolation in place with cgroups2 we have started
going through and fixing the various priority inversions. Most are gone
now, but this one is sort of weird since it's not necessarily a priority
inversion that happens within the kernel, but rather one caused by
something userspace does.

We have giant applications that we want to protect, and parts of these
giant applications do things like watch the system state to determine
how healthy the box is for load balancing and such. This involves
running 'ps' or other such utilities. These utilities will often walk
/proc//whatever, and these files can sometimes need to
down_read(&task->mmap_sem). Not usually a big deal, but we noticed while
stress testing that our protected application sometimes has latency
spikes trying to get the mmap_sem for tasks that are in lower priority
cgroups.

This is because any down_write() on a semaphore essentially turns it
into a mutex: even if it is currently held for reading, no new readers
are allowed in, to keep from starving the writer.
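
To make the admission rule above concrete, here is a minimal userspace
sketch. It is not the kernel's rw_semaphore; the toy_* names and the
struct are made up for illustration, and it only tracks counters to
show why a queued writer keeps later readers out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a writer-fair reader/writer semaphore (hypothetical,
 * NOT the kernel implementation). It never sleeps; it only reports
 * whether a caller would be admitted or would have to wait.
 */
struct toy_rwsem {
	int active_readers;	/* readers currently holding the lock */
	bool writer_active;	/* a writer currently holds the lock */
	int waiting_writers;	/* writers queued behind the holders */
};

/* A reader is admitted only if no writer holds the lock or is queued. */
static inline bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->waiting_writers > 0)
		return false;	/* would block behind the writer */
	sem->active_readers++;
	return true;
}

/* A writer that cannot get in right away is recorded as waiting. */
static inline bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->active_readers > 0) {
		sem->waiting_writers++;
		return false;	/* queued behind the current holders */
	}
	sem->writer_active = true;
	return true;
}

/* Release one read hold. */
static inline void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->active_readers > 0);
	sem->active_readers--;
}
```

In the scenario from this mail, the first toy_down_read() is the
throttled low priority task in the fault path, the toy_down_write() is
something like an mmap() in the same process, and the second
toy_down_read() is 'ps' running on behalf of the protected application.
That second reader is refused until the first one finishes its IO.
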
This is fine, except a lower priority task could be stuck doing IO
because it has been throttled to the point that its IO is taking much
longer than normal. But because a higher priority group depends on this
completing, it is now stuck behind lower priority work.

To avoid this particular priority inversion we want to use the existing
retry mechanism so that we do not hold the mmap_sem at all if we are
going to do IO. This already sort of exists in the read case, but needed
to be extended to cover more than just grabbing the page lock. With
io.latency we throttle at submit_bio() time, so the readahead stuff can
block and even page_cache_read can block, so all these paths need to
have the mmap_sem dropped.

The other big thing is ->page_mkwrite. btrfs is particularly shitty
here because we have to reserve space for the dirty page, which can be a
very expensive operation. We use the same retry method as the read path,
and simply cache the page and verify the page is still set up properly
on the next pass through ->page_mkwrite().

I've tested these patches with xfstests and there are no regressions.
Let me know what you think. Thanks,

Josef
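
For readers unfamiliar with the retry flow, the following is a rough
userspace model of it, not code from this series. In the kernel the
mechanism is built around FAULT_FLAG_ALLOW_RETRY and VM_FAULT_RETRY;
every toy_* name below is hypothetical, and the "IO" is reduced to
flipping a flag. The point is only the ordering: drop the lock, do the
IO, then retake the lock and redo the fault.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fault_result { TOY_FAULT_DONE, TOY_FAULT_RETRY };

/* Hypothetical stand-in for the state a fault touches. */
struct toy_mm {
	bool mmap_sem_held;	/* models down_read(&mm->mmap_sem) */
	bool page_uptodate;	/* page is in the cache and valid */
	int io_under_lock;	/* times IO was issued with the lock held */
};

/* One pass of the fault handler; always entered with the lock held. */
static inline enum toy_fault_result
toy_handle_fault(struct toy_mm *mm, bool allow_retry)
{
	assert(mm->mmap_sem_held);

	if (mm->page_uptodate)
		return TOY_FAULT_DONE;

	if (allow_retry) {
		/* Drop the lock before the (possibly throttled) IO. */
		mm->mmap_sem_held = false;
		mm->page_uptodate = true;	/* IO done, lock not held */
		return TOY_FAULT_RETRY;
	}

	/* Retry not allowed: the IO has to happen under the lock. */
	mm->io_under_lock++;
	mm->page_uptodate = true;
	return TOY_FAULT_DONE;
}

/* Caller: retake the lock and retry once; returns the number of passes. */
static inline int toy_fault(struct toy_mm *mm)
{
	bool allow_retry = true;
	int passes = 0;

	for (;;) {
		mm->mmap_sem_held = true;
		passes++;
		if (toy_handle_fault(mm, allow_retry) == TOY_FAULT_DONE)
			break;
		allow_retry = false;	/* the second pass must finish */
	}
	mm->mmap_sem_held = false;
	return passes;
}
```

A cold page takes two passes and never does IO under the lock, while a
page that is already up to date completes in a single pass. The second
pass re-checks the page state rather than trusting the first pass,
which mirrors the "verify the page is still set up properly" step
described for ->page_mkwrite().
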