From patchwork Tue May 21 04:52:29 2019
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 10952881
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org,
 mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com,
 dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 01/14] interval-tree: build unconditionally
Date: Mon, 20 May 2019 21:52:29 -0700
Message-Id: <20190521045242.24378-2-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

In preparation for range locking, this patch gets rid of the
CONFIG_INTERVAL_TREE option, as the interval tree code will now be built
unconditionally.

Signed-off-by: Davidlohr Bueso
---
 drivers/gpu/drm/Kconfig      |  2 --
 drivers/gpu/drm/i915/Kconfig |  1 -
 drivers/iommu/Kconfig        |  1 -
 lib/Kconfig                  | 14 --------------
 lib/Kconfig.debug            |  1 -
 lib/Makefile                 |  3 +--
 6 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e360a4a131e1..3405336175ed 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -200,7 +200,6 @@ config DRM_RADEON
 	select POWER_SUPPLY
 	select HWMON
 	select BACKLIGHT_CLASS_DEVICE
-	select INTERVAL_TREE
 	help
 	  Choose this option if you have an ATI Radeon graphics card. There
 	  are both PCI and AGP versions. You don't need to choose this to
@@ -220,7 +219,6 @@ config DRM_AMDGPU
 	select POWER_SUPPLY
 	select HWMON
 	select BACKLIGHT_CLASS_DEVICE
-	select INTERVAL_TREE
 	select CHASH
 	help
 	  Choose this option if you have a recent AMD Radeon graphics card.
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 3d5f1cb6a76c..54d4bc8d141f 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -3,7 +3,6 @@ config DRM_I915
 	depends on DRM
 	depends on X86 && PCI
 	select INTEL_GTT
-	select INTERVAL_TREE
 	# we need shmfs for the swappable backing store, and in particular
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a2ed2b51a0f7..d21e6dc2adae 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -477,7 +477,6 @@ config VIRTIO_IOMMU
 	depends on VIRTIO=y
 	depends on ARM64
 	select IOMMU_API
-	select INTERVAL_TREE
 	help
 	  Para-virtualised IOMMU driver with virtio.

diff --git a/lib/Kconfig b/lib/Kconfig
index 8d9239a4156c..e089ac40c062 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -409,20 +409,6 @@ config TEXTSEARCH_FSM
 config BTREE
 	bool

-config INTERVAL_TREE
-	bool
-	help
-	  Simple, embeddable, interval-tree. Can find the start of an
-	  overlapping range in log(n) time and then iterate over all
-	  overlapping nodes. The algorithm is implemented as an
-	  augmented rbtree.
-
-	  See:
-
-		Documentation/rbtree.txt
-
-	  for more information.
-
 config XARRAY_MULTI
 	bool
 	help
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4c35e52c5a2e..54bafed8ba70 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1759,7 +1759,6 @@ config RBTREE_TEST

 config INTERVAL_TREE_TEST
 	tristate "Interval tree test"
 	depends on DEBUG_KERNEL
-	select INTERVAL_TREE
 	help
 	  A benchmark measuring the performance of the interval tree library
diff --git a/lib/Makefile b/lib/Makefile
index fb7697031a79..39fd34156692 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,7 +50,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o \
 	 once.o refcount.o usercopy.o errseq.o bucket_locks.o \
-	 generic-radix-tree.o
+	 generic-radix-tree.o interval_tree.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
@@ -115,7 +115,6 @@ obj-y += logic_pio.o

 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
 obj-$(CONFIG_BTREE) += btree.o
-obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o

From patchwork Tue May 21 04:52:30 2019
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 10952887
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org,
 mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com,
 dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 02/14] Introduce range reader/writer lock
Date: Mon, 20 May 2019 21:52:30 -0700
Message-Id: <20190521045242.24378-3-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

This implements a sleepable range rwlock, based on an interval tree, that
serializes conflicting/intersecting/overlapping ranges within the tree.
The largest range is given by [0, ~0] (inclusive). Unlike traditional
locks, range locking involves dealing with the tree itself plus the range
to be locked; the range is normally stack allocated and always explicitly
prepared/initialized by the user in a sorted [a0, a1] manner (a0 <= a1)
before actually taking the lock.

Interval-tree based range locking is about controlling tasks' forward
progress when adding an arbitrary interval (node) to the tree, depending
on any overlapping ranges. A task can only continue (wake up) if there
are no intersecting ranges, thus achieving mutual exclusion. To this end,
a reference counter is kept for each intersecting range in the tree
(taken _before_ the task adds itself to it). To enable shared locking
semantics, a reader about to be locked will not take a reference on an
intersecting node that is also a reader, ignoring that node altogether.

Fairness and freedom from starvation are guaranteed by the lack of lock
stealing, so range locks depend directly on interval tree semantics. This
matters particularly for iterations, where the key for the rbtree is
given by the interval's low endpoint, and duplicates are walked as in an
inorder traversal of the tree.

How much does it cost:
----------------------

The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where
R_all is the total number of ranges and R_int is the number of ranges
intersecting the new range to be added.
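To make the intended calling convention concrete, here is a minimal usage
sketch built only from the API introduced further down in this patch; the
tree name, range names and callers are made up purely for illustration:

	/* Illustrative only: one shared tree, stack-allocated ranges. */
	static DEFINE_RANGE_LOCK_TREE(foo_tree);

	void foo_write_region(unsigned long start, unsigned long last)
	{
		struct range_lock range;

		/* the range must be sorted: start <= last */
		range_lock_init(&range, start, last);

		range_write_lock(&foo_tree, &range);	/* waits for any overlapping range */
		/* ... exclusive access to [start, last] ... */
		range_write_unlock(&foo_tree, &range);
	}

	void foo_read_everything(void)
	{
		DEFINE_RANGE_LOCK_FULL(range);		/* [0, RANGE_LOCK_FULL] */

		range_read_lock(&foo_tree, &range);	/* shared with other readers */
		/* ... */
		range_read_unlock(&foo_tree, &range);
	}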
Due to its sharable nature, a full range lock can be compared with an
rw-semaphore, which nowadays also largely serves as a mutex for
writer-only situations. The first difference is the memory footprint:
tree locks are smaller than rwsems (32 vs 40 bytes), but require an
additional 72 bytes of stack for the range structure. Secondly, because
every range call is serialized by the tree->lock, any lock() fastpath
incurs at least an interval_tree_insert() plus a spinlock lock+unlock,
compared to a single atomic instruction in the case of rwsems. The same
obviously applies to the unlock() case.

The torture module was used to measure 1-1 differences in lock
acquisition with increasing core counts over a period of 10 minutes.
Readers and writers are interleaved, with a slight advantage to writers
as it is the first kthread that is created. The following shows the avg
ops/minute with various thread setups on boxes with small and large core
counts.

** 4-core AMD Opteron **

(write-only)
rwsem-2thr: 4198.5, stddev: 7.77
range-2thr: 4199.1, stddev: 0.73

rwsem-4thr: 6036.8, stddev: 50.91
range-4thr: 6004.9, stddev: 126.57

rwsem-8thr: 6245.6, stddev: 59.39
range-8thr: 6229.3, stddev: 10.60

(read-only)
rwsem-2thr: 5930.7, stddev: 21.92
range-2thr: 5917.3, stddev: 25.45

rwsem-4thr: 9881.6, stddev: 0.70
range-4thr: 9540.2, stddev: 98.28

rwsem-8thr: 11633.2, stddev: 7.72
range-8thr: 11314.7, stddev: 62.22

For the read-only and write-only cases, there is very little difference
between the range lock and rwsems, with at most a 3% hit, which could
very well be considered within the noise range.

(read-write)
rwsem-write-1thr: 1744.8, stddev: 11.59
rwsem-read-1thr:  1043.1, stddev: 3.97
range-write-1thr: 1740.2, stddev: 5.99
range-read-1thr:  1022.5, stddev: 6.41

rwsem-write-2thr: 1662.5, stddev: 0.70
rwsem-read-2thr:  1278.0, stddev: 25.45
range-write-2thr: 1321.5, stddev: 51.61
range-read-2thr:  1243.5, stddev: 30.40

rwsem-write-4thr: 1761.0, stddev: 11.31
rwsem-read-4thr:  1426.0, stddev: 7.07
range-write-4thr: 1417.0, stddev: 29.69
range-read-4thr:  1398.0, stddev: 56.56

While a single reader and a single writer thread show little difference,
increasing core counts shows that in reader/writer workloads the writer
threads can take a hit in raw performance of up to ~20%, while reader
throughput is quite similar between both locks.

** 240-core (ht) IvyBridge **

(write-only)
rwsem-120thr: 6844.5, stddev: 82.73
range-120thr: 6070.5, stddev: 85.55

rwsem-240thr: 6292.5, stddev: 146.3
range-240thr: 6099.0, stddev: 15.55

rwsem-480thr: 6164.8, stddev: 33.94
range-480thr: 6062.3, stddev: 19.79

(read-only)
rwsem-120thr: 136860.4, stddev: 2539.92
range-120thr: 138052.2, stddev: 327.39

rwsem-240thr: 235297.5, stddev: 2220.50
range-240thr: 232099.1, stddev: 3614.72

rwsem-480thr: 272683.0, stddev: 3924.32
range-480thr: 256539.2, stddev: 9541.69

Similar to the small box, larger machines show that range locks take only
a minor hit (up to ~6% for 480 threads) even in completely exclusive or
completely shared scenarios.
(read-write) rwsem-write-60thr: 4658.1, stddev: 1303.19 rwsem-read-60thr: 1108.7, stddev: 718.42 range-write-60thr: 3203.6, stddev: 139.30 range-read-60thr: 1852.8, stddev: 147.5 rwsem-write-120thr: 3971.3, stddev: 1413.0 rwsem-read-120thr: 1038.8, stddev: 353.51 range-write-120thr: 2282.1, stddev: 207.18 range-read-120thr: 1856.5, stddev: 198.69 rwsem-write-240thr: 4112.7, stddev: 2448.1 rwsem-read-240thr: 1277.4, stddev: 430.30 range-write-240thr: 2353.1, stddev: 502.04 range-read-240thr: 1551.5, stddev: 361.33 When mixing readers and writers, writer throughput can take a hit of up to ~40%, similar to the 4 core machine, however, reader threads can increase the number of acquisitions in up to ~80%. In any case, the overall writer+reader throughput will always be higher for rwsems. A huge factor in this behavior is that range locks do not have writer spin-on-owner feature. On both machines when actually testing threads acquiring different ranges, the amount of throughput will always outperform the rwsem, due to the increased parallelism; which is no surprise either. As such microbenchmarks that merely pounds on a lock will pretty much always suffer upon direct lock conversions, but not enough to matter in the overall picture. Signed-off-by: Davidlohr Bueso Reviewed-by: Jan Kara --- include/linux/lockdep.h | 33 +++ include/linux/range_lock.h | 189 +++++++++++++ kernel/locking/Makefile | 2 +- kernel/locking/range_lock.c | 667 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 890 insertions(+), 1 deletion(-) create mode 100644 include/linux/range_lock.h create mode 100644 kernel/locking/range_lock.c diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 6e2377e6c1d6..cba5763f9da0 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -514,6 +514,16 @@ do { \ lock_acquired(&(_lock)->dep_map, _RET_IP_); \ } while (0) +#define RANGE_LOCK_CONTENDED(tree, _lock, try, lock) \ +do { \ + if (!try(tree, _lock)) { \ + lock_contended(&(tree)->dep_map, _RET_IP_); \ + lock(tree, _lock); \ + } \ + lock_acquired(&(tree)->dep_map, _RET_IP_); \ +} while (0) + + #define LOCK_CONTENDED_RETURN(_lock, try, lock) \ ({ \ int ____err = 0; \ @@ -526,6 +536,18 @@ do { \ ____err; \ }) +#define RANGE_LOCK_CONTENDED_RETURN(tree, _lock, try, lock) \ +({ \ + int ____err = 0; \ + if (!try(tree, _lock)) { \ + lock_contended(&(tree)->dep_map, _RET_IP_); \ + ____err = lock(tree, _lock); \ + } \ + if (!____err) \ + lock_acquired(&(tree)->dep_map, _RET_IP_); \ + ____err; \ +}) + #else /* CONFIG_LOCK_STAT */ #define lock_contended(lockdep_map, ip) do {} while (0) @@ -534,9 +556,15 @@ do { \ #define LOCK_CONTENDED(_lock, try, lock) \ lock(_lock) +#define RANGE_LOCK_CONTENDED(tree, _lock, try, lock) \ + lock(tree, _lock) + #define LOCK_CONTENDED_RETURN(_lock, try, lock) \ lock(_lock) +#define RANGE_LOCK_CONTENDED_RETURN(tree, _lock, try, lock) \ + lock(tree, _lock) + #endif /* CONFIG_LOCK_STAT */ #ifdef CONFIG_LOCKDEP @@ -601,6 +629,11 @@ static inline void print_irqtrace_events(struct task_struct *curr) #define rwsem_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i) #define rwsem_release(l, n, i) lock_release(l, n, i) +#define range_lock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i) +#define range_lock_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i) +#define range_lock_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i) +#define range_lock_release(l, n, i) lock_release(l, n, i) + #define lock_map_acquire(l) lock_acquire_exclusive(l, 
0, 0, NULL, _THIS_IP_) #define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_) #define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_) diff --git a/include/linux/range_lock.h b/include/linux/range_lock.h new file mode 100644 index 000000000000..51448addb2fa --- /dev/null +++ b/include/linux/range_lock.h @@ -0,0 +1,189 @@ +/* + * Range/interval rw-locking + * ------------------------- + * + * Interval-tree based range locking is about controlling tasks' forward + * progress when adding an arbitrary interval (node) to the tree, depending + * on any overlapping ranges. A task can only continue (or wakeup) if there + * are no intersecting ranges, thus achieving mutual exclusion. To this end, + * a reference counter is kept for each intersecting range in the tree + * (_before_ adding itself to it). To enable shared locking semantics, + * the reader to-be-locked will not take reference if an intersecting node + * is also a reader, therefore ignoring the node altogether. + * + * Given the above, range lock order and fairness has fifo semantics among + * contended ranges. Among uncontended ranges, order is given by the inorder + * tree traversal which is performed. + * + * Example: Tasks A, B, C. Tree is empty. + * + * t0: A grabs the (free) lock [a,n]; thus ref[a,n] = 0. + * t1: B tries to grab the lock [g,z]; thus ref[g,z] = 1. + * t2: C tries to grab the lock [b,m]; thus ref[b,m] = 2. + * + * t3: A releases the lock [a,n]; thus ref[g,z] = 0, ref[b,m] = 1. + * t4: B grabs the lock [g.z]. + * + * In addition, freedom of starvation is guaranteed by the fact that there + * is no lock stealing going on, everything being serialized by the tree->lock. + * + * The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where + * R_all is total number of ranges and R_int is the number of ranges + * intersecting the operated range. + */ +#ifndef _LINUX_RANGE_LOCK_H +#define _LINUX_RANGE_LOCK_H + +#include +#include +#include +#include + +/* + * The largest range will span [0,RANGE_LOCK_FULL]. 
+ */ +#define RANGE_LOCK_FULL ~0UL + +struct range_lock { + struct interval_tree_node node; + struct task_struct *tsk; + /* Number of ranges which are blocking acquisition of the lock */ + unsigned int blocking_ranges; + u64 seqnum; +}; + +struct range_lock_tree { + struct rb_root_cached root; + spinlock_t lock; + u64 seqnum; /* track order of incoming ranges, avoid overflows */ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +}; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define __RANGE_LOCK_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname } +#else +# define __RANGE_LOCK_DEP_MAP_INIT(lockname) +#endif + +#define __RANGE_LOCK_TREE_INITIALIZER(name) \ + { .root = RB_ROOT_CACHED \ + , .seqnum = 0 \ + , .lock = __SPIN_LOCK_UNLOCKED(name.lock) \ + __RANGE_LOCK_DEP_MAP_INIT(name) } \ + +#define DEFINE_RANGE_LOCK_TREE(name) \ + struct range_lock_tree name = __RANGE_LOCK_TREE_INITIALIZER(name) + +#define __RANGE_LOCK_INITIALIZER(__start, __last) { \ + .node = { \ + .start = (__start) \ + ,.last = (__last) \ + } \ + , .tsk = NULL \ + , .blocking_ranges = 0 \ + , .seqnum = 0 \ + } + +#define DEFINE_RANGE_LOCK(name, start, last) \ + struct range_lock name = __RANGE_LOCK_INITIALIZER((start), (last)) + +#define DEFINE_RANGE_LOCK_FULL(name) \ + struct range_lock name = __RANGE_LOCK_INITIALIZER(0, RANGE_LOCK_FULL) + +static inline void +__range_lock_tree_init(struct range_lock_tree *tree, + const char *name, struct lock_class_key *key) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + /* + * Make sure we are not reinitializing a held lock: + */ + debug_check_no_locks_freed((void *)tree, sizeof(*tree)); + lockdep_init_map(&tree->dep_map, name, key, 0); +#endif + tree->root = RB_ROOT_CACHED; + spin_lock_init(&tree->lock); + tree->seqnum = 0; +} + +#define range_lock_tree_init(tree) \ +do { \ + static struct lock_class_key __key; \ + \ + __range_lock_tree_init((tree), #tree, &__key); \ +} while (0) + +void range_lock_init(struct range_lock *lock, + unsigned long start, unsigned long last); +void range_lock_init_full(struct range_lock *lock); + +/* + * lock for reading + */ +void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock); +int range_read_lock_interruptible(struct range_lock_tree *tree, + struct range_lock *lock); +int range_read_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock); +int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock); +void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock); + +/* + * lock for writing + */ +void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock); +int range_write_lock_interruptible(struct range_lock_tree *tree, + struct range_lock *lock); +int range_write_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock); +int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock); +void range_write_unlock(struct range_lock_tree *tree, struct range_lock *lock); + +void range_downgrade_write(struct range_lock_tree *tree, + struct range_lock *lock); + +int range_is_locked(struct range_lock_tree *tree, struct range_lock *lock); + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +/* + * nested locking. NOTE: range locks are not allowed to recurse + * (which occurs if the same task tries to acquire the same + * lock instance multiple times), but multiple locks of the + * same lock class might be taken, if the order of the locks + * is always the same. 
This ordering rule can be expressed + * to lockdep via the _nested() APIs, but enumerating the + * subclasses that are used. (If the nesting relationship is + * static then another method for expressing nested locking is + * the explicit definition of lock class keys and the use of + * lockdep_set_class() at lock initialization time. + * See Documentation/locking/lockdep-design.txt for more details.) + */ +extern void range_read_lock_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass); +extern void range_write_lock_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass); +extern int range_write_lock_killable_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass); +extern void _range_write_lock_nest_lock(struct range_lock_tree *tree, + struct range_lock *lock, struct lockdep_map *nest_lock); + +# define range_write_lock_nest_lock(tree, lock, nest_lock) \ +do { \ + typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \ + _range_write_lock_nest_lock(tree, lock, &(nest_lock)->dep_map); \ +} while (0); + +#else +# define range_read_lock_nested(tree, lock, subclass) \ + range_read_lock(tree, lock) +# define range_write_lock_nest_lock(tree, lock, nest_lock) \ + range_write_lock(tree, lock) +# define range_write_lock_nested(tree, lock, subclass) \ + range_write_lock(tree, lock) +# define range_write_lock_killable_nested(tree, lock, subclass) \ + range_write_lock_killable(tree, lock) +#endif + +#endif diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 6fe2f333aecb..8fba2abf4851 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -3,7 +3,7 @@ # and is generally not a function of system call inputs. KCOV_INSTRUMENT := n -obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o rwsem-xadd.o +obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o rwsem-xadd.o range_lock.o ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE) diff --git a/kernel/locking/range_lock.c b/kernel/locking/range_lock.c new file mode 100644 index 000000000000..ccb407a6b9d4 --- /dev/null +++ b/kernel/locking/range_lock.c @@ -0,0 +1,667 @@ +/* + * Copyright (C) 2017 Jan Kara, Davidlohr Bueso. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define range_interval_tree_foreach(node, root, start, last) \ + for (node = interval_tree_iter_first(root, start, last); \ + node; node = interval_tree_iter_next(node, start, last)) + +#define to_range_lock(ptr) container_of(ptr, struct range_lock, node) +#define to_interval_tree_node(ptr) \ + container_of(ptr, struct interval_tree_node, rb) + +static inline void +__range_tree_insert(struct range_lock_tree *tree, struct range_lock *lock) +{ + lock->seqnum = tree->seqnum++; + interval_tree_insert(&lock->node, &tree->root); +} + +static inline void +__range_tree_remove(struct range_lock_tree *tree, struct range_lock *lock) +{ + interval_tree_remove(&lock->node, &tree->root); +} + +/* + * lock->tsk reader tracking. 
+ */ +#define RANGE_FLAG_READER 1UL + +static inline struct task_struct *range_lock_waiter(struct range_lock *lock) +{ + return (struct task_struct *) + ((unsigned long) lock->tsk & ~RANGE_FLAG_READER); +} + +static inline void range_lock_set_reader(struct range_lock *lock) +{ + lock->tsk = (struct task_struct *) + ((unsigned long)lock->tsk | RANGE_FLAG_READER); +} + +static inline void range_lock_clear_reader(struct range_lock *lock) +{ + lock->tsk = (struct task_struct *) + ((unsigned long)lock->tsk & ~RANGE_FLAG_READER); +} + +static inline bool range_lock_is_reader(struct range_lock *lock) +{ + return (unsigned long) lock->tsk & RANGE_FLAG_READER; +} + +static inline void +__range_lock_init(struct range_lock *lock, + unsigned long start, unsigned long last) +{ + WARN_ON(start > last); + + lock->node.start = start; + lock->node.last = last; + RB_CLEAR_NODE(&lock->node.rb); + lock->blocking_ranges = 0; + lock->tsk = NULL; + lock->seqnum = 0; +} + +/** + * range_lock_init - Initialize a range lock + * @lock: the range lock to be initialized + * @start: start of the interval (inclusive) + * @last: last location in the interval (inclusive) + * + * Initialize the range's [start, last] such that it can + * later be locked. User is expected to enter a sorted + * range, such that @start <= @last. + * + * It is not allowed to initialize an already locked range. + */ +void range_lock_init(struct range_lock *lock, + unsigned long start, unsigned long last) +{ + __range_lock_init(lock, start, last); +} +EXPORT_SYMBOL_GPL(range_lock_init); + +/** + * range_lock_init_full - Initialize a full range lock + * @lock: the range lock to be initialized + * + * Initialize the full range. + * + * It is not allowed to initialize an already locked range. + */ +void range_lock_init_full(struct range_lock *lock) +{ + __range_lock_init(lock, 0, RANGE_LOCK_FULL); +} +EXPORT_SYMBOL_GPL(range_lock_init_full); + +static inline void +range_lock_put(struct range_lock *lock, struct wake_q_head *wake_q) +{ + if (!--lock->blocking_ranges) + wake_q_add(wake_q, range_lock_waiter(lock)); +} + +static inline int wait_for_ranges(struct range_lock_tree *tree, + struct range_lock *lock, long state) +{ + int ret = 0; + + while (true) { + set_current_state(state); + + /* do we need to go to sleep? */ + if (!lock->blocking_ranges) + break; + + if (unlikely(signal_pending_state(state, current))) { + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + ret = -EINTR; + /* + * We're not taking the lock after all, cleanup + * after ourselves. + */ + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, + lock->node.last) { + struct range_lock *blked; + blked = to_range_lock(node); + + if (range_lock_is_reader(lock) && + range_lock_is_reader(blked)) + continue; + + /* unaccount for threads _we_ are blocking */ + if (lock->seqnum < blked->seqnum) + range_lock_put(blked, &wake_q); + } + + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); + break; + } + + schedule(); + } + + __set_current_state(TASK_RUNNING); + return ret; +} + +/** + * range_read_trylock - Trylock for reading + * @tree: interval tree + * @lock: the range lock to be trylocked + * + * The trylock is against the range itself, not the @tree->lock. + * + * Returns 1 if successful, 0 if contention (must block to acquire). 
+ */ +static inline int __range_read_trylock(struct range_lock_tree *tree, + struct range_lock *lock) +{ + int ret = true; + unsigned long flags; + struct interval_tree_node *node; + + spin_lock_irqsave(&tree->lock, flags); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_lock *blocked_lock; + blocked_lock = to_range_lock(node); + + if (!range_lock_is_reader(blocked_lock)) { + ret = false; + goto unlock; + } + } + + range_lock_set_reader(lock); + __range_tree_insert(tree, lock); +unlock: + spin_unlock_irqrestore(&tree->lock, flags); + + return ret; +} + +int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock) +{ + int ret = __range_read_trylock(tree, lock); + + if (ret) + range_lock_acquire_read(&tree->dep_map, 0, 1, _RET_IP_); + + return ret; +} + +EXPORT_SYMBOL_GPL(range_read_trylock); + +static __always_inline int __sched +__range_read_lock_common(struct range_lock_tree *tree, + struct range_lock *lock, long state) +{ + struct interval_tree_node *node; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_lock *blocked_lock; + blocked_lock = to_range_lock(node); + + if (!range_lock_is_reader(blocked_lock)) + lock->blocking_ranges++; + } + + __range_tree_insert(tree, lock); + + lock->tsk = current; + range_lock_set_reader(lock); + spin_unlock_irqrestore(&tree->lock, flags); + + return wait_for_ranges(tree, lock, state); +} + +static __always_inline int +__range_read_lock(struct range_lock_tree *tree, struct range_lock *lock) +{ + return __range_read_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); +} + +/** + * range_read_lock - Lock for reading + * @tree: interval tree + * @lock: the range lock to be locked + * + * Returns when the lock has been acquired or sleep until + * until there are no overlapping ranges. + */ +void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock) +{ + might_sleep(); + range_lock_acquire_read(&tree->dep_map, 0, 0, _RET_IP_); + + RANGE_LOCK_CONTENDED(tree, lock, + __range_read_trylock, __range_read_lock); +} +EXPORT_SYMBOL_GPL(range_read_lock); + +/** + * range_read_lock_interruptible - Lock for reading (interruptible) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_read_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +int range_read_lock_interruptible(struct range_lock_tree *tree, + struct range_lock *lock) +{ + might_sleep(); + return __range_read_lock_common(tree, lock, TASK_INTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_read_lock_interruptible); + +/** + * range_read_lock_killable - Lock for reading (killable) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_read_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. 
+ */ +static __always_inline int +__range_read_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock) +{ + return __range_read_lock_common(tree, lock, TASK_KILLABLE); +} + +int range_read_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock) +{ + might_sleep(); + range_lock_acquire_read(&tree->dep_map, 0, 0, _RET_IP_); + + if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_read_trylock, + __range_read_lock_killable)) { + range_lock_release(&tree->dep_map, 1, _RET_IP_); + return -EINTR; + } + + return 0; +} +EXPORT_SYMBOL_GPL(range_read_lock_killable); + +/** + * range_read_unlock - Unlock for reading + * @tree: interval tree + * @lock: the range lock to be unlocked + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to unlock an unacquired read lock. + */ +void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock) +{ + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + range_lock_release(&tree->dep_map, 1, _RET_IP_); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_lock *blocked_lock; + blocked_lock = to_range_lock(node); + + if (!range_lock_is_reader(blocked_lock)) + range_lock_put(blocked_lock, &wake_q); + } + + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_read_unlock); + +/* + * Check for overlaps for fast write_trylock(), which is the same + * optimization that interval_tree_iter_first() does. + */ +static inline bool __range_overlaps_intree(struct range_lock_tree *tree, + struct range_lock *lock) +{ + struct interval_tree_node *root; + struct range_lock *left; + + if (unlikely(RB_EMPTY_ROOT(&tree->root.rb_root))) + return false; + + root = to_interval_tree_node(tree->root.rb_root.rb_node); + left = to_range_lock(to_interval_tree_node(rb_first_cached(&tree->root))); + + return lock->node.start <= root->__subtree_last && + left->node.start <= lock->node.last; +} + +/** + * range_write_trylock - Trylock for writing + * @tree: interval tree + * @lock: the range lock to be trylocked + * + * The trylock is against the range itself, not the @tree->lock. + * + * Returns 1 if successful, 0 if contention (must block to acquire). + */ +static inline int __range_write_trylock(struct range_lock_tree *tree, + struct range_lock *lock) +{ + int overlaps; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + overlaps = __range_overlaps_intree(tree, lock); + + if (!overlaps) { + range_lock_clear_reader(lock); + __range_tree_insert(tree, lock); + } + + spin_unlock_irqrestore(&tree->lock, flags); + + return !overlaps; +} + +int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock) +{ + int ret = __range_write_trylock(tree, lock); + + if (ret) + range_lock_acquire(&tree->dep_map, 0, 1, _RET_IP_); + + return ret; +} +EXPORT_SYMBOL_GPL(range_write_trylock); + +static __always_inline int __sched +__range_write_lock_common(struct range_lock_tree *tree, + struct range_lock *lock, long state) +{ + struct interval_tree_node *node; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + /* + * As a writer, we always consider an existing node. 
We + * need to wait; either the intersecting node is another + * writer or we have a reader that needs to finish. + */ + lock->blocking_ranges++; + } + + __range_tree_insert(tree, lock); + + lock->tsk = current; + spin_unlock_irqrestore(&tree->lock, flags); + + return wait_for_ranges(tree, lock, state); +} + +static __always_inline int +__range_write_lock(struct range_lock_tree *tree, struct range_lock *lock) +{ + return __range_write_lock_common(tree, lock, TASK_UNINTERRUPTIBLE); +} + +/** + * range_write_lock - Lock for writing + * @tree: interval tree + * @lock: the range lock to be locked + * + * Returns when the lock has been acquired or sleep until + * until there are no overlapping ranges. + */ +void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock) +{ + might_sleep(); + range_lock_acquire(&tree->dep_map, 0, 0, _RET_IP_); + + RANGE_LOCK_CONTENDED(tree, lock, + __range_write_trylock, __range_write_lock); +} +EXPORT_SYMBOL_GPL(range_write_lock); + +/** + * range_write_lock_interruptible - Lock for writing (interruptible) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_write_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +int range_write_lock_interruptible(struct range_lock_tree *tree, + struct range_lock *lock) +{ + might_sleep(); + return __range_write_lock_common(tree, lock, TASK_INTERRUPTIBLE); +} +EXPORT_SYMBOL_GPL(range_write_lock_interruptible); + +/** + * range_write_lock_killable - Lock for writing (killable) + * @tree: interval tree + * @lock: the range lock to be locked + * + * Lock the range like range_write_lock(), and return 0 if the + * lock has been acquired or sleep until until there are no + * overlapping ranges. If a signal arrives while waiting for the + * lock then this function returns -EINTR. + */ +static __always_inline int +__range_write_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock) +{ + return __range_write_lock_common(tree, lock, TASK_KILLABLE); +} + +int range_write_lock_killable(struct range_lock_tree *tree, + struct range_lock *lock) +{ + might_sleep(); + range_lock_acquire(&tree->dep_map, 0, 0, _RET_IP_); + + if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_write_trylock, + __range_write_lock_killable)) { + range_lock_release(&tree->dep_map, 1, _RET_IP_); + return -EINTR; + } + + return 0; +} +EXPORT_SYMBOL_GPL(range_write_lock_killable); + +/** + * range_write_unlock - Unlock for writing + * @tree: interval tree + * @lock: the range lock to be unlocked + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to unlock an unacquired write lock. 
+ */ +void range_write_unlock(struct range_lock_tree *tree, struct range_lock *lock) +{ + struct interval_tree_node *node; + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + + spin_lock_irqsave(&tree->lock, flags); + + range_lock_clear_reader(lock); + __range_tree_remove(tree, lock); + + range_lock_release(&tree->dep_map, 1, _RET_IP_); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_lock *blocked_lock; + blocked_lock = to_range_lock(node); + + range_lock_put(blocked_lock, &wake_q); + } + + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_write_unlock); + +/** + * range_downgrade_write - Downgrade write range lock to read lock + * @tree: interval tree + * @lock: the range lock to be downgraded + * + * Wakes any blocked readers, when @lock is the only conflicting range. + * + * It is not allowed to downgrade an unacquired write lock. + */ +void range_downgrade_write(struct range_lock_tree *tree, + struct range_lock *lock) +{ + unsigned long flags; + struct interval_tree_node *node; + DEFINE_WAKE_Q(wake_q); + + lock_downgrade(&tree->dep_map, _RET_IP_); + + spin_lock_irqsave(&tree->lock, flags); + + WARN_ON(range_lock_is_reader(lock)); + + range_interval_tree_foreach(node, &tree->root, + lock->node.start, lock->node.last) { + struct range_lock *blocked_lock; + blocked_lock = to_range_lock(node); + + /* + * Unaccount for any blocked reader lock. Wakeup if possible. + */ + if (range_lock_is_reader(blocked_lock)) + range_lock_put(blocked_lock, &wake_q); + } + + range_lock_set_reader(lock); + spin_unlock_irqrestore(&tree->lock, flags); + wake_up_q(&wake_q); +} +EXPORT_SYMBOL_GPL(range_downgrade_write); + +/** + * range_is_locked - Returns 1 if the given range is already either reader or + * writer owned. Otherwise 0. + * @tree: interval tree + * @lock: the range lock to be checked + * + * Similar to trylocks, this is against the range itself, not the @tree->lock. 
+ */ +int range_is_locked(struct range_lock_tree *tree, struct range_lock *lock) +{ + int overlaps; + unsigned long flags; + + spin_lock_irqsave(&tree->lock, flags); + overlaps = __range_overlaps_intree(tree, lock); + spin_unlock_irqrestore(&tree->lock, flags); + + return overlaps; +} +EXPORT_SYMBOL_GPL(range_is_locked); + +#ifdef CONFIG_DEBUG_LOCK_ALLOC + +void range_read_lock_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass) +{ + might_sleep(); + range_lock_acquire_read(&tree->dep_map, subclass, 0, _RET_IP_); + + RANGE_LOCK_CONTENDED(tree, lock, __range_read_trylock, __range_read_lock); +} +EXPORT_SYMBOL_GPL(range_read_lock_nested); + +void _range_write_lock_nest_lock(struct range_lock_tree *tree, + struct range_lock *lock, + struct lockdep_map *nest) +{ + might_sleep(); + range_lock_acquire_nest(&tree->dep_map, 0, 0, nest, _RET_IP_); + + RANGE_LOCK_CONTENDED(tree, lock, + __range_write_trylock, __range_write_lock); +} +EXPORT_SYMBOL_GPL(_range_write_lock_nest_lock); + +void range_write_lock_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass) +{ + might_sleep(); + range_lock_acquire(&tree->dep_map, subclass, 0, _RET_IP_); + + RANGE_LOCK_CONTENDED(tree, lock, + __range_write_trylock, __range_write_lock); +} +EXPORT_SYMBOL_GPL(range_write_lock_nested); + + +int range_write_lock_killable_nested(struct range_lock_tree *tree, + struct range_lock *lock, int subclass) +{ + might_sleep(); + range_lock_acquire(&tree->dep_map, subclass, 0, _RET_IP_); + + if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_write_trylock, + __range_write_lock_killable)) { + range_lock_release(&tree->dep_map, 1, _RET_IP_); + return -EINTR; + } + + return 0; +} +EXPORT_SYMBOL_GPL(range_write_lock_killable_nested); +#endif From patchwork Tue May 21 04:52:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952885 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 16CBD13AD for ; Tue, 21 May 2019 04:53:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 07D5428866 for ; Tue, 21 May 2019 04:53:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F01D028968; Tue, 21 May 2019 04:53:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6672E28866 for ; Tue, 21 May 2019 04:53:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 985316B0008; Tue, 21 May 2019 00:53:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 934096B000C; Tue, 21 May 2019 00:53:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 824E16B000D; Tue, 21 May 2019 00:53:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f71.google.com (mail-ed1-f71.google.com [209.85.208.71]) by kanga.kvack.org (Postfix) with ESMTP id 3165E6B0008 for 
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org,
 mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com,
 dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 03/14] mm: introduce mm locking wrappers
Date: Mon, 20 May 2019 21:52:31 -0700
Message-Id: <20190521045242.24378-4-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

This patch adds the wrappers needed to encapsulate mmap_sem locking, so
that any future changes to that locking are confined to a single place.
Users will be converted incrementally in the following patches. The
naming used is mm_[read/write]_[un]lock().

Signed-off-by: Davidlohr Bueso
---
 include/linux/mm.h | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..780b6097ee47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2880,5 +2881,80 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif

+/*
+ * Address space locking wrappers.
+ */ +static inline bool mm_is_locked(struct mm_struct *mm, + struct range_lock *mmrange) +{ + return rwsem_is_locked(&mm->mmap_sem); +} + +/* Reader wrappers */ +static inline int mm_read_trylock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + return down_read_trylock(&mm->mmap_sem); +} + +static inline void mm_read_lock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + down_read(&mm->mmap_sem); +} + +static inline void mm_read_lock_nested(struct mm_struct *mm, + struct range_lock *mmrange, int subclass) +{ + down_read_nested(&mm->mmap_sem, subclass); +} + +static inline void mm_read_unlock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + up_read(&mm->mmap_sem); +} + +/* Writer wrappers */ +static inline int mm_write_trylock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + return down_write_trylock(&mm->mmap_sem); +} + +static inline void mm_write_lock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + down_write(&mm->mmap_sem); +} + +static inline int mm_write_lock_killable(struct mm_struct *mm, + struct range_lock *mmrange) +{ + return down_write_killable(&mm->mmap_sem); +} + +static inline void mm_downgrade_write(struct mm_struct *mm, + struct range_lock *mmrange) +{ + downgrade_write(&mm->mmap_sem); +} + +static inline void mm_write_unlock(struct mm_struct *mm, + struct range_lock *mmrange) +{ + up_write(&mm->mmap_sem); +} + +static inline void mm_write_lock_nested(struct mm_struct *mm, + struct range_lock *mmrange, + int subclass) +{ + down_write_nested(&mm->mmap_sem, subclass); +} + +#define mm_write_nest_lock(mm, range, nest_lock) \ + down_write_nest_lock(&(mm)->mmap_sem, nest_lock) + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ From patchwork Tue May 21 04:52:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952891 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B1F413AD for ; Tue, 21 May 2019 04:53:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 77EDC2893D for ; Tue, 21 May 2019 04:53:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6BE1128968; Tue, 21 May 2019 04:53:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D17452893D for ; Tue, 21 May 2019 04:53:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D556C6B0010; Tue, 21 May 2019 00:53:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D056B6B0266; Tue, 21 May 2019 00:53:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B7DC16B0269; Tue, 21 May 2019 00:53:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by kanga.kvack.org (Postfix) with ESMTP id 4686F6B0010 for ; Tue, 21 May 2019 00:53:39 -0400 (EDT) Received: by 
mail-ed1-f70.google.com with SMTP id n52so28826863edd.2 for ; Mon, 20 May 2019 21:53:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=oQsC0aliXQsd7Ru4rFSVdN6bTIqeyTPYbrt7w+a23dc=; b=jrrD16YC9yngTbT9vbndo1AKhVKMUrvchBrmFdxBVIpw6+MsqvFdyazAufnoHM4bfl fCwBhxESCLkwgtEskod6pEPvoB2J8jX3m09LEWvZy55qWb56XvEShBNir0JlNesLY6MW +uEZQK0FgjZyP9hxf1m33b9y5Ei5W0cSbp3jkxCdQRVjLn07KGHMx1Gx+bpMH+ypzMH/ Z6fqkGeRhmBhmvn5Qv1kfk5wwNxZYC5iCtFaEGQw/oqBGrhq53Tc4ymLzp/OLuEzzGCn UiUj6iomhxJIZRcsTyk/NPnoMptnDrbteKewTtDURwROfaP8ngyKLiF0QzwH8/j9N4vp P15w== X-Original-Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net X-Gm-Message-State: APjAAAXaXIn56BghNM25aGBBRlWaY3LW1BobPUQcLSbsb+wK3v3V/Ul6 eI+gAEWHMft06GLKlcAVlMZ0HmreLOIBdvoQSDcctMpbbAIkX5cjj2DFtJ27FcaEbOhIFGQfTcY RyyxcbpxLp0iOTzI8u/5ZfUvyXBRIn7h+6ML9glzVeDNgkMaHnYo2WyXHURB0/yo= X-Received: by 2002:a50:9705:: with SMTP id c5mr81192101edb.258.1558414418689; Mon, 20 May 2019 21:53:38 -0700 (PDT) X-Google-Smtp-Source: APXvYqxbW25e4PcNbK06rYiIzC6FJe22IHAdOf9pIyq5cAgno2UpatDT09RKSSUSHAVcpn4bmu5V X-Received: by 2002:a50:9705:: with SMTP id c5mr81191939edb.258.1558414415606; Mon, 20 May 2019 21:53:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558414415; cv=none; d=google.com; s=arc-20160816; b=LjKvdOJABK0datwRSnZDA5JPJGGk5vL58uCPggS2RtZDFX35g3YG8ScoM1euto0juB iERQnwWyo859S+Kzs+QYThQmfo2uNcJQepHmmRO4lllZucKpo15WhDi/jrS/riroUA1X 8bIcbTXvLurZzsOmzN9qDuv+pvc7MA6OP/vKEf6/7jP5WrDsu10Twr6GvtHlbPH2ywwZ aGXYo3nPtXgtoo47WMzKrfKbsbHVQrAdfqQHwANxasqCcwWn8HGZ5dyqbQdPHmx3P+4A amSjeiIcy0kpGTWVzqy4GHyasxSOAbLu0Xg0QQ4Xqdul7VnzU58k1dGOaz7cxg3/+tT9 LGuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=oQsC0aliXQsd7Ru4rFSVdN6bTIqeyTPYbrt7w+a23dc=; b=TwQ3eacRLHrnheuTaGh1NEynH3l9uJK1yi/2kke9pFevy3f7uDtEzW3e0KGlGfpsHj VDMELByKRWLU8JPalknN8zO7f49ewnsdMYNDfFOfeLHlHqbYq0AAdy/p2eyjMoir/kXB haqP+ecMD1QGvkBRoMrVdDuc1Z+m/La9Mtt5nDRKo2LcK+9SgxwvDe7S3vvdFxMqmdeT EGCDc1J54h9rdYb8RYSWOke0JPbmkqfvq0XWvH0oStf8RijVn6hGS6ez/EnzI2+r5isU p7gkxAkcl848c+wOFlPuNdjHHlX6jfWfLdVsWZgUNwTLu3oyuviZ9uFyw+KtvjcWpb6E RC4w== ARC-Authentication-Results: i=1; mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from smtp.nue.novell.com (smtp.nue.novell.com. 
[195.135.221.5]) by mx.google.com with ESMTPS id z54si863576edc.429.2019.05.20.21.53.35 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 May 2019 21:53:35 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) client-ip=195.135.221.5; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from emea4-mta.ukb.novell.com ([10.120.13.87]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 06:53:34 +0200 Received: from linux-r8p5.suse.de (nwb-a10-snat.microfocus.com [10.120.13.201]) by emea4-mta.ukb.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 05:53:04 +0100 From: Davidlohr Bueso To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso Subject: [PATCH 04/14] mm: teach pagefault paths about range locking Date: Mon, 20 May 2019 21:52:32 -0700 Message-Id: <20190521045242.24378-5-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net> References: <20190521045242.24378-1-dave@stgolabs.net> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP
When handling a page fault, the mmap_sem may be released during the processing. As moving to a range lock requires remembering the range parameter to do the lock/unlock, this patch adds such a pointer to struct vm_fault. We work outwards from the places that arm the vmf: handle_mm_fault(), __collapse_huge_page_swapin() and hugetlb_no_page().

The idea is to use a local, stack-allocated variable (no concurrency) whenever the mmap_sem is originally taken and we end up in page-fault paths that retake the lock. I.e.:

	DEFINE_RANGE_LOCK_FULL(mmrange);

	down_write(&mm->mmap_sem);
	some_fn(a, b, c, &mmrange);
	    ....
	    ....
	    handle_mm_fault(vma, addr, flags, mmrange);
	    ...
	up_write(&mm->mmap_sem);

Consequently we also end up updating lock_page_or_retry(), which can drop the mmap_sem. For the gup family, we pass NULL for scenarios where the semaphore will remain untouched.

Semantically nothing changes at all, and the 'mmrange' ends up being unused for now. Later patches will use the variable when the mmap_sem wrappers replace the straightforward down/up calls.

*** For simplicity, this patch breaks when used in ksm and hmm.
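To make the calling convention concrete, here is a minimal sketch of a hypothetical caller (fault_one_address() is illustrative only and not taken from this series) using the new handle_mm_fault() signature:

	static vm_fault_t fault_one_address(struct mm_struct *mm,
					    struct vm_area_struct *vma,
					    unsigned long addr)
	{
		vm_fault_t ret;
		/* full-range lock lives on the stack; no concurrency on it */
		DEFINE_RANGE_LOCK_FULL(mmrange);

		down_read(&mm->mmap_sem);
		/*
		 * The same mmrange that was armed when mmap_sem was taken is
		 * passed down, so any path that drops and retakes the lock
		 * (e.g. on VM_FAULT_RETRY) knows which interval was held.
		 */
		ret = handle_mm_fault(vma, addr, FAULT_FLAG_ALLOW_RETRY, &mmrange);
		up_read(&mm->mmap_sem);
		return ret;
	}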
*** Signed-off-by: Davidlohr Bueso --- arch/x86/mm/fault.c | 27 ++++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +- drivers/infiniband/core/umem_odp.c | 2 +- drivers/iommu/amd_iommu_v2.c | 3 +- drivers/iommu/intel-svm.c | 3 +- drivers/vfio/vfio_iommu_type1.c | 2 +- fs/exec.c | 2 +- include/linux/hugetlb.h | 9 +++-- include/linux/mm.h | 24 ++++++++---- include/linux/pagemap.h | 6 +-- kernel/events/uprobes.c | 7 ++-- kernel/futex.c | 2 +- mm/filemap.c | 2 +- mm/frame_vector.c | 6 ++- mm/gup.c | 65 ++++++++++++++++++++------------- mm/hmm.c | 4 +- mm/hugetlb.c | 14 ++++--- mm/internal.h | 3 +- mm/khugepaged.c | 24 +++++++----- mm/ksm.c | 3 +- mm/memory.c | 14 ++++--- mm/mempolicy.c | 9 +++-- mm/mmap.c | 4 +- mm/mprotect.c | 2 +- mm/process_vm_access.c | 4 +- security/tomoyo/domain.c | 2 +- virt/kvm/async_pf.c | 3 +- virt/kvm/kvm_main.c | 9 +++-- 29 files changed, 159 insertions(+), 100 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 46df4c6aae46..fb869c292b91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -938,7 +938,8 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, static void __bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, u32 pkey, int si_code) + unsigned long address, u32 pkey, int si_code, + struct range_lock *mmrange) { struct mm_struct *mm = current->mm; /* @@ -951,9 +952,10 @@ __bad_area(struct pt_regs *regs, unsigned long error_code, } static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) +bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, + struct range_lock *mmrange) { - __bad_area(regs, error_code, address, 0, SEGV_MAPERR); + __bad_area(regs, error_code, address, 0, SEGV_MAPERR, mmrange); } static inline bool bad_area_access_from_pkeys(unsigned long error_code, @@ -975,7 +977,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code, static noinline void bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address, struct vm_area_struct *vma) + unsigned long address, struct vm_area_struct *vma, + struct range_lock *mmrange) { /* * This OSPKE check is not strictly necessary at runtime. 
@@ -1005,9 +1008,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code, */ u32 pkey = vma_pkey(vma); - __bad_area(regs, error_code, address, pkey, SEGV_PKUERR); + __bad_area(regs, error_code, address, pkey, SEGV_PKUERR, mmrange); } else { - __bad_area(regs, error_code, address, 0, SEGV_ACCERR); + __bad_area(regs, error_code, address, 0, SEGV_ACCERR, mmrange); } } @@ -1306,6 +1309,7 @@ void do_user_addr_fault(struct pt_regs *regs, struct mm_struct *mm; vm_fault_t fault, major = 0; unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + DEFINE_RANGE_LOCK_FULL(mmrange); tsk = current; mm = tsk->mm; @@ -1417,17 +1421,17 @@ void do_user_addr_fault(struct pt_regs *regs, vma = find_vma(mm, address); if (unlikely(!vma)) { - bad_area(regs, hw_error_code, address); + bad_area(regs, hw_error_code, address, &mmrange); return; } if (likely(vma->vm_start <= address)) goto good_area; if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { - bad_area(regs, hw_error_code, address); + bad_area(regs, hw_error_code, address, &mmrange); return; } if (unlikely(expand_stack(vma, address))) { - bad_area(regs, hw_error_code, address); + bad_area(regs, hw_error_code, address, &mmrange); return; } @@ -1437,7 +1441,8 @@ void do_user_addr_fault(struct pt_regs *regs, */ good_area: if (unlikely(access_error(hw_error_code, vma))) { - bad_area_access_error(regs, hw_error_code, address, vma); + bad_area_access_error(regs, hw_error_code, address, vma, + &mmrange); return; } @@ -1454,7 +1459,7 @@ void do_user_addr_fault(struct pt_regs *regs, * userland). The return to userland is identified whenever * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags. */ - fault = handle_mm_fault(vma, address, flags); + fault = handle_mm_fault(vma, address, flags, &mmrange); major |= fault & VM_FAULT_MAJOR; /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index af1e218c6a74..d81101ac57eb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -776,7 +776,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) else r = get_user_pages_remote(gtt->usertask, mm, userptr, num_pages, - flags, p, NULL, NULL); + flags, p, NULL, NULL, NULL); spin_lock(>t->guptasklock); list_del(&guptask.list); diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index 8079ea3af103..67f718015e42 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -511,7 +511,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) obj->userptr.ptr + pinned * PAGE_SIZE, npages - pinned, flags, - pvec + pinned, NULL, NULL); + pvec + pinned, NULL, NULL, NULL); if (ret < 0) break; diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index f962b5bbfa40..62b5de027dd1 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -639,7 +639,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, */ npages = get_user_pages_remote(owning_process, owning_mm, user_virt, gup_num_pages, - flags, local_page_list, NULL, NULL); + flags, local_page_list, NULL, NULL, NULL); up_read(&owning_mm->mmap_sem); if (npages < 0) { diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c index 5d7ef750e4a0..67c609b26249 100644 --- a/drivers/iommu/amd_iommu_v2.c +++ b/drivers/iommu/amd_iommu_v2.c @@ -489,6 +489,7 @@ static void do_fault(struct work_struct *work) 
unsigned int flags = 0; struct mm_struct *mm; u64 address; + DEFINE_RANGE_LOCK_FULL(mmrange); mm = fault->state->mm; address = fault->address; @@ -509,7 +510,7 @@ static void do_fault(struct work_struct *work) if (access_error(vma, fault)) goto out; - ret = handle_mm_fault(vma, address, flags); + ret = handle_mm_fault(vma, address, flags, &mmrange); out: up_read(&mm->mmap_sem); diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 8f87304f915c..74d535ea6a03 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -551,6 +551,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) int result; vm_fault_t ret; u64 address; + DEFINE_RANGE_LOCK_FULL(mmrange); handled = 1; @@ -603,7 +604,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) goto invalid; ret = handle_mm_fault(vma, address, - req->wr_req ? FAULT_FLAG_WRITE : 0); + req->wr_req ? FAULT_FLAG_WRITE : 0, &mmrange); if (ret & VM_FAULT_ERROR) goto invalid; diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 0237ace12998..b5f911222ae6 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -354,7 +354,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, vmas); } else { ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - vmas, NULL); + vmas, NULL, NULL); /* * The lifetime of a vaddr_get_pfn() page pin is * userspace-controlled. In the fs-dax case this could diff --git a/fs/exec.c b/fs/exec.c index d88584ebf07f..e96fd5328739 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -214,7 +214,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos, * doing the exec and bprm->mm is the new process's mm. */ ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags, - &page, NULL, NULL); + &page, NULL, NULL, NULL); if (ret <= 0) return NULL; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index edf476c8cfb9..67aba05ff78b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -91,7 +91,7 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_ar long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, unsigned long *, long, unsigned int, - int *); + int *, struct range_lock *); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long, struct page *); void __unmap_hugepage_range_final(struct mmu_gather *tlb, @@ -106,7 +106,8 @@ int hugetlb_report_node_meminfo(int, char *); void hugetlb_show_meminfo(void); unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, unsigned int flags); + unsigned long address, unsigned int flags, + struct range_lock *mmrange); int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, unsigned long dst_addr, @@ -182,7 +183,7 @@ static inline void adjust_range_if_pmd_sharing_possible( { } -#define follow_hugetlb_page(m,v,p,vs,a,b,i,w,n) ({ BUG(); 0; }) +#define follow_hugetlb_page(m,v,p,vs,a,b,i,w,n,r) ({ BUG(); 0; }) #define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL) #define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; }) static inline void hugetlb_report_meminfo(struct seq_file *m) @@ -233,7 +234,7 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb, } static inline vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long 
address, - unsigned int flags) + unsigned int flags, struct range_lock *mmrange) { BUG(); return 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index 780b6097ee47..044e428b1905 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -440,6 +440,10 @@ struct vm_fault { * page table to avoid allocation from * atomic context. */ + struct range_lock *lockrange; /* Range lock interval in use for when + * the mm lock is manipulated throughout + * its lifespan. + */ }; /* page entry size for vm->huge_fault() */ @@ -1507,25 +1511,29 @@ int invalidate_inode_page(struct page *page); #ifdef CONFIG_MMU extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags); + unsigned long address, unsigned int flags, + struct range_lock *mmrange); extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, unsigned long address, unsigned int fault_flags, - bool *unlocked); + bool *unlocked, struct range_lock *mmrange); void unmap_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t nr, bool even_cows); void unmap_mapping_range(struct address_space *mapping, loff_t const holebegin, loff_t const holelen, int even_cows); #else static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags) + unsigned long address, + unsigned int flags, + struct range_lock *mmrange) { /* should never happen if there's no MMU */ BUG(); return VM_FAULT_SIGBUS; } static inline int fixup_user_fault(struct task_struct *tsk, - struct mm_struct *mm, unsigned long address, - unsigned int fault_flags, bool *unlocked) + struct mm_struct *mm, unsigned long address, + unsigned int fault_flags, bool *unlocked, + struct range_lock *mmrange) { /* should never happen if there's no MMU */ BUG(); @@ -1553,12 +1561,14 @@ extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas, int *locked); + struct vm_area_struct **vmas, int *locked, + struct range_lock *mmrange); long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, int *locked); + unsigned int gup_flags, struct page **pages, int *locked, + struct range_lock *mmrange); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 9ec3544baee2..15eb4765827f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -462,7 +462,7 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma, extern void __lock_page(struct page *page); extern int __lock_page_killable(struct page *page); extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, - unsigned int flags); + unsigned int flags, struct range_lock *mmrange); extern void unlock_page(struct page *page); static inline int trylock_page(struct page *page) @@ -502,10 +502,10 @@ static inline int lock_page_killable(struct page *page) * __lock_page_or_retry(). 
*/ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm, - unsigned int flags) + unsigned int flags, struct range_lock *mmrange) { might_sleep(); - return trylock_page(page) || __lock_page_or_retry(page, mm, flags); + return trylock_page(page) || __lock_page_or_retry(page, mm, flags, mmrange); } /* diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 78f61bfc6b79..3689eceb8d0c 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -374,7 +374,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d) return -EINVAL; ret = get_user_pages_remote(NULL, mm, vaddr, 1, - FOLL_WRITE, &page, &vma, NULL); + FOLL_WRITE, &page, &vma, NULL, NULL); if (unlikely(ret <= 0)) { /* * We are asking for 1 page. If get_user_pages_remote() fails, @@ -471,7 +471,8 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, retry: /* Read the page with vaddr into memory */ ret = get_user_pages_remote(NULL, mm, vaddr, 1, - FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL); + FOLL_FORCE | FOLL_SPLIT, &old_page, + &vma, NULL, NULL); if (ret <= 0) return ret; @@ -1976,7 +1977,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr) * essentially a kernel access to the memory. */ result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page, - NULL, NULL); + NULL, NULL, NULL); if (result < 0) return result; diff --git a/kernel/futex.c b/kernel/futex.c index 2268b97d5439..4615f9371a6f 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -733,7 +733,7 @@ static int fault_in_user_writeable(u32 __user *uaddr) down_read(&mm->mmap_sem); ret = fixup_user_fault(current, mm, (unsigned long)uaddr, - FAULT_FLAG_WRITE, NULL); + FAULT_FLAG_WRITE, NULL, NULL); up_read(&mm->mmap_sem); return ret < 0 ? ret : 0; diff --git a/mm/filemap.c b/mm/filemap.c index c5af80c43d36..959022841bab 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1378,7 +1378,7 @@ EXPORT_SYMBOL_GPL(__lock_page_killable); * with the page locked and the mmap_sem unperturbed. */ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, - unsigned int flags) + unsigned int flags, struct range_lock *mmrange) { if (flags & FAULT_FLAG_ALLOW_RETRY) { /* diff --git a/mm/frame_vector.c b/mm/frame_vector.c index c64dca6e27c2..4e1a577cbb79 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -39,6 +39,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, int ret = 0; int err; int locked; + DEFINE_RANGE_LOCK_FULL(mmrange); if (nr_frames == 0) return 0; @@ -70,8 +71,9 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { vec->got_ref = true; vec->is_pfns = false; - ret = get_user_pages_locked(start, nr_frames, - gup_flags, (struct page **)(vec->ptrs), &locked); + ret = get_user_pages_locked(start, nr_frames, gup_flags, + (struct page **)(vec->ptrs), + &locked, &mmrange); goto out; } diff --git a/mm/gup.c b/mm/gup.c index 2c08248d4fa2..cf8fa037ce27 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -629,7 +629,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, * If it is, *@nonblocking will be set to 0 and -EBUSY returned. 
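* The new @mmrange argument is, at this stage, simply forwarded to
* handle_mm_fault(); it carries the range-lock interval for later patches and
* does not affect behaviour yet.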
*/ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, - unsigned long address, unsigned int *flags, int *nonblocking) + unsigned long address, unsigned int *flags, + int *nonblocking, struct range_lock *mmrange) { unsigned int fault_flags = 0; vm_fault_t ret; @@ -650,7 +651,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_TRIED; } - ret = handle_mm_fault(vma, address, fault_flags); + ret = handle_mm_fault(vma, address, fault_flags, mmrange); if (ret & VM_FAULT_ERROR) { int err = vm_fault_to_errno(ret, *flags); @@ -746,6 +747,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * @vmas: array of pointers to vmas corresponding to each page. * Or NULL if the caller does not require them. * @nonblocking: whether waiting for disk IO or mmap_sem contention + * @mmrange: mm address space range locking * * Returns number of pages pinned. This may be fewer than the number * requested. If nr_pages is 0 or negative, returns 0. If no pages @@ -792,7 +794,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas, int *nonblocking) + struct vm_area_struct **vmas, int *nonblocking, + struct range_lock *mmrange) { long ret = 0, i = 0; struct vm_area_struct *vma = NULL; @@ -835,8 +838,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, } if (is_vm_hugetlb_page(vma)) { i = follow_hugetlb_page(mm, vma, pages, vmas, - &start, &nr_pages, i, - gup_flags, nonblocking); + &start, &nr_pages, i, + gup_flags, + nonblocking, mmrange); continue; } } @@ -854,7 +858,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, page = follow_page_mask(vma, start, foll_flags, &ctx); if (!page) { ret = faultin_page(tsk, vma, start, &foll_flags, - nonblocking); + nonblocking, mmrange); switch (ret) { case 0: goto retry; @@ -935,6 +939,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma, * @fault_flags:flags to pass down to handle_mm_fault() * @unlocked: did we unlock the mmap_sem while retrying, maybe NULL if caller * does not allow retry + * @mmrange: mm address space range locking * * This is meant to be called in the specific scenario where for locking reasons * we try to access user memory in atomic context (within a pagefault_disable() @@ -958,7 +963,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma, */ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, unsigned long address, unsigned int fault_flags, - bool *unlocked) + bool *unlocked, struct range_lock *mmrange) { struct vm_area_struct *vma; vm_fault_t ret, major = 0; @@ -974,7 +979,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, if (!vma_permits_fault(vma, fault_flags)) return -EFAULT; - ret = handle_mm_fault(vma, address, fault_flags); + ret = handle_mm_fault(vma, address, fault_flags, mmrange); major |= ret & VM_FAULT_MAJOR; if (ret & VM_FAULT_ERROR) { int err = vm_fault_to_errno(ret, 0); @@ -1011,7 +1016,8 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, struct page **pages, struct vm_area_struct **vmas, int *locked, - unsigned int flags) + unsigned int flags, + struct range_lock *mmrange) { long ret, pages_done; bool lock_dropped; @@ -1030,7 +1036,7 @@ static __always_inline long 
__get_user_pages_locked(struct task_struct *tsk, lock_dropped = false; for (;;) { ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages, - vmas, locked); + vmas, locked, mmrange); if (!locked) /* VM_FAULT_RETRY couldn't trigger, bypass */ return ret; @@ -1073,7 +1079,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, lock_dropped = true; down_read(&mm->mmap_sem); ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, - pages, NULL, NULL); + pages, NULL, NULL, NULL); if (ret != 1) { BUG_ON(ret > 1); if (!pages_done) @@ -1121,7 +1127,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, */ long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - int *locked) + int *locked, struct range_lock *mmrange) { /* * FIXME: Current FOLL_LONGTERM behavior is incompatible with @@ -1134,7 +1140,7 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages, return __get_user_pages_locked(current, current->mm, start, nr_pages, pages, NULL, locked, - gup_flags | FOLL_TOUCH); + gup_flags | FOLL_TOUCH, mmrange); } EXPORT_SYMBOL(get_user_pages_locked); @@ -1159,6 +1165,7 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct mm_struct *mm = current->mm; int locked = 1; long ret; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * FIXME: Current FOLL_LONGTERM behavior is incompatible with @@ -1171,7 +1178,7 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, down_read(&mm->mmap_sem); ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL, - &locked, gup_flags | FOLL_TOUCH); + &locked, gup_flags | FOLL_TOUCH, &mmrange); if (locked) up_read(&mm->mmap_sem); return ret; @@ -1194,6 +1201,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked); * @locked: pointer to lock flag indicating whether lock is held and * subsequently whether VM_FAULT_RETRY functionality can be * utilised. Lock must initially be held. + * @mmrange: mm address space range locking * * Returns number of pages pinned. This may be fewer than the number * requested. If nr_pages is 0 or negative, returns 0. 
If no pages @@ -1237,7 +1245,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked); long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas, int *locked) + struct vm_area_struct **vmas, int *locked, + struct range_lock *mmrange) { /* * FIXME: Current FOLL_LONGTERM behavior is incompatible with @@ -1250,7 +1259,8 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, locked, - gup_flags | FOLL_TOUCH | FOLL_REMOTE); + gup_flags | FOLL_TOUCH | FOLL_REMOTE, + mmrange); } EXPORT_SYMBOL(get_user_pages_remote); @@ -1394,7 +1404,7 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, */ nr_pages = __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, NULL, - gup_flags); + gup_flags, NULL); if ((nr_pages > 0) && migrate_allow) { drain_allow = true; @@ -1448,7 +1458,7 @@ static long __gup_longterm_locked(struct task_struct *tsk, } rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages, - vmas_tmp, NULL, gup_flags); + vmas_tmp, NULL, gup_flags, NULL); if (gup_flags & FOLL_LONGTERM) { memalloc_nocma_restore(flags); @@ -1481,7 +1491,7 @@ static __always_inline long __gup_longterm_locked(struct task_struct *tsk, unsigned int flags) { return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, - NULL, flags); + NULL, flags, NULL); } #endif /* CONFIG_FS_DAX || CONFIG_CMA */ @@ -1506,7 +1516,8 @@ EXPORT_SYMBOL(get_user_pages); * @vma: target vma * @start: start address * @end: end address - * @nonblocking: + * @nonblocking: whether waiting for disk IO or mmap_sem contention + * @mmrange: mm address space range locking * * This takes care of mlocking the pages too if VM_LOCKED is set. * @@ -1515,14 +1526,15 @@ EXPORT_SYMBOL(get_user_pages); * vma->vm_mm->mmap_sem must be held. * * If @nonblocking is NULL, it may be held for read or write and will - * be unperturbed. + * be unperturbed, and hence @mmrange will be unnecessary. * * If @nonblocking is non-NULL, it must held for read only and may be * released. If it's released, *@nonblocking will be set to 0. */ long populate_vma_page_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end, int *nonblocking) -{ + unsigned long start, unsigned long end, int *nonblocking, + struct range_lock *mmrange) + { struct mm_struct *mm = vma->vm_mm; unsigned long nr_pages = (end - start) / PAGE_SIZE; int gup_flags; @@ -1556,7 +1568,7 @@ long populate_vma_page_range(struct vm_area_struct *vma, * not result in a stack expansion that recurses back here. */ return __get_user_pages(current, mm, start, nr_pages, gup_flags, - NULL, NULL, nonblocking); + NULL, NULL, nonblocking, mmrange); } /* @@ -1573,6 +1585,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors) struct vm_area_struct *vma = NULL; int locked = 0; long ret = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); end = start + len; @@ -1603,7 +1616,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors) * double checks the vma flags, so that it won't mlock pages * if the vma was already munlocked. 
*/ - ret = populate_vma_page_range(vma, nstart, nend, &locked); + ret = populate_vma_page_range(vma, nstart, nend, &locked, &mmrange); if (ret < 0) { if (ignore_errors) { ret = 0; @@ -1641,7 +1654,7 @@ struct page *get_dump_page(unsigned long addr) if (__get_user_pages(current, current->mm, addr, 1, FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma, - NULL) < 1) + NULL, NULL) < 1) return NULL; flush_cache_page(vma, addr, page_to_pfn(page)); return page; diff --git a/mm/hmm.c b/mm/hmm.c index 0db8491090b8..723109ac6bdc 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -347,7 +347,9 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr, flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY; flags |= write_fault ? FAULT_FLAG_WRITE : 0; - ret = handle_mm_fault(vma, addr, flags); + + /*** BROKEN mmrange, we don't care about hmm (for now) */ + ret = handle_mm_fault(vma, addr, flags, NULL); if (ret & VM_FAULT_RETRY) return -EAGAIN; if (ret & VM_FAULT_ERROR) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 81718c56b8f5..b56f69636ee2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3778,7 +3778,8 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping, static vm_fault_t hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, - unsigned long address, pte_t *ptep, unsigned int flags) + unsigned long address, pte_t *ptep, unsigned int flags, + struct range_lock *mmrange) { struct hstate *h = hstate_vma(vma); vm_fault_t ret = VM_FAULT_SIGBUS; @@ -3821,6 +3822,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, .vma = vma, .address = haddr, .flags = flags, + .lockrange = mmrange, /* * Hard to debug if it ends up being * used by a callee that assumes @@ -3969,7 +3971,8 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct address_space *mapping, #endif vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, unsigned int flags) + unsigned long address, unsigned int flags, + struct range_lock *mmrange) { pte_t *ptep, entry; spinlock_t *ptl; @@ -4011,7 +4014,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, entry = huge_ptep_get(ptep); if (huge_pte_none(entry)) { - ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags); + ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags, mmrange); goto out_mutex; } @@ -4239,7 +4242,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, struct page **pages, struct vm_area_struct **vmas, unsigned long *position, unsigned long *nr_pages, - long i, unsigned int flags, int *nonblocking) + long i, unsigned int flags, int *nonblocking, + struct range_lock *mmrange) { unsigned long pfn_offset; unsigned long vaddr = *position; @@ -4320,7 +4324,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, FAULT_FLAG_ALLOW_RETRY); fault_flags |= FAULT_FLAG_TRIED; } - ret = hugetlb_fault(mm, vma, vaddr, fault_flags); + ret = hugetlb_fault(mm, vma, vaddr, fault_flags, mmrange); if (ret & VM_FAULT_ERROR) { err = vm_fault_to_errno(ret, flags); remainder = 0; diff --git a/mm/internal.h b/mm/internal.h index 9eeaf2b95166..f38f7b9b01d8 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -298,7 +298,8 @@ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma, #ifdef CONFIG_MMU extern long populate_vma_page_range(struct vm_area_struct *vma, - unsigned long start, unsigned long end, int 
*nonblocking); + unsigned long start, unsigned long end, int *nonblocking, + struct range_lock *mmrange); extern void munlock_vma_pages_range(struct vm_area_struct *vma, unsigned long start, unsigned long end); static inline void munlock_vma_pages_all(struct vm_area_struct *vma) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index a335f7c1fac4..3eefcb8f797d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -878,7 +878,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, static bool __collapse_huge_page_swapin(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, - int referenced) + int referenced, + struct range_lock *mmrange) { int swapped_in = 0; vm_fault_t ret = 0; @@ -888,6 +889,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, .flags = FAULT_FLAG_ALLOW_RETRY, .pmd = pmd, .pgoff = linear_page_index(vma, address), + .lockrange = mmrange, }; /* we only decide to swapin, if there is enough young ptes */ @@ -932,9 +934,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, } static void collapse_huge_page(struct mm_struct *mm, - unsigned long address, - struct page **hpage, - int node, int referenced) + unsigned long address, + struct page **hpage, + int node, int referenced, + struct range_lock *mmrange) { pmd_t *pmd, _pmd; pte_t *pte; @@ -991,7 +994,8 @@ static void collapse_huge_page(struct mm_struct *mm, * If it fails, we release mmap_sem and jump out_nolock. * Continuing to collapse causes inconsistency. */ - if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) { + if (!__collapse_huge_page_swapin(mm, vma, address, pmd, + referenced, mmrange)) { mem_cgroup_cancel_charge(new_page, memcg, true); up_read(&mm->mmap_sem); goto out_nolock; @@ -1099,7 +1103,8 @@ static void collapse_huge_page(struct mm_struct *mm, static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, - struct page **hpage) + struct page **hpage, + struct range_lock *mmrange) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1213,7 +1218,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, if (ret) { node = khugepaged_find_target_node(); /* collapse_huge_page will return with the mmap_sem released */ - collapse_huge_page(mm, address, hpage, node, referenced); + collapse_huge_page(mm, address, hpage, node, referenced, mmrange); } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, @@ -1652,6 +1657,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, struct mm_struct *mm; struct vm_area_struct *vma; int progress = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); VM_BUG_ON(!pages); lockdep_assert_held(&khugepaged_mm_lock); @@ -1724,8 +1730,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, fput(file); } else { ret = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - hpage); + khugepaged_scan.address, + hpage, &mmrange); } /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; diff --git a/mm/ksm.c b/mm/ksm.c index 81c20ed57bf6..ccc9737311eb 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -480,8 +480,9 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr) if (IS_ERR_OR_NULL(page)) break; if (PageKsm(page)) + /*** BROKEN mmrange, we don't care about ksm (for now) */ ret = handle_mm_fault(vma, addr, - FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE); + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE, NULL); else ret = VM_FAULT_WRITE; put_page(page); diff --git a/mm/memory.c b/mm/memory.c index 0d0711a912de..9516c95108a1 100644 --- 
a/mm/memory.c +++ b/mm/memory.c @@ -2850,7 +2850,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out_release; } - locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags); + locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags, vmf->lockrange); delayacct_clear_flag(DELAYACCT_PF_SWAPIN); if (!locked) { @@ -3938,7 +3938,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) * return value. See filemap_fault() and __lock_page_or_retry(). */ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, - unsigned long address, unsigned int flags) + unsigned long address, unsigned int flags, + struct range_lock *mmrange) { struct vm_fault vmf = { .vma = vma, @@ -3946,6 +3947,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, .flags = flags, .pgoff = linear_page_index(vma, address), .gfp_mask = __get_fault_gfp_mask(vma), + .lockrange = mmrange, }; unsigned int dirty = flags & FAULT_FLAG_WRITE; struct mm_struct *mm = vma->vm_mm; @@ -4027,7 +4029,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, * return value. See filemap_fault() and __lock_page_or_retry(). */ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, - unsigned int flags) + unsigned int flags, struct range_lock *mmrange) { vm_fault_t ret; @@ -4052,9 +4054,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mem_cgroup_enter_user_fault(); if (unlikely(is_vm_hugetlb_page(vma))) - ret = hugetlb_fault(vma->vm_mm, vma, address, flags); + ret = hugetlb_fault(vma->vm_mm, vma, address, flags, mmrange); else - ret = __handle_mm_fault(vma, address, flags); + ret = __handle_mm_fault(vma, address, flags, mmrange); if (flags & FAULT_FLAG_USER) { mem_cgroup_exit_user_fault(); @@ -4356,7 +4358,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, struct page *page = NULL; ret = get_user_pages_remote(tsk, mm, addr, 1, - gup_flags, &page, &vma, NULL); + gup_flags, &page, &vma, NULL, NULL); if (ret <= 0) { #ifndef CONFIG_HAVE_IOREMAP_PROT break; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 2219e747df49..975793cc1d71 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -823,13 +823,15 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) } } -static int lookup_node(struct mm_struct *mm, unsigned long addr) +static int lookup_node(struct mm_struct *mm, unsigned long addr, + struct range_lock *mmrange) { struct page *p; int err; int locked = 1; - err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked); + err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, + &locked, mmrange); if (err >= 0) { err = page_to_nid(p); put_page(p); @@ -847,6 +849,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; struct mempolicy *pol = current->mempolicy, *pol_refcount = NULL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED)) @@ -895,7 +898,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, pol_refcount = pol; vma = NULL; mpol_get(pol); - err = lookup_node(mm, addr); + err = lookup_node(mm, addr, &mmrange); if (err < 0) goto out; *policy = err; diff --git a/mm/mmap.c b/mm/mmap.c index 57803a0a3a5c..af228ae3508d 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2530,7 +2530,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr) if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr)) return NULL; if (prev->vm_flags & VM_LOCKED) - 
populate_vma_page_range(prev, addr, prev->vm_end, NULL); + populate_vma_page_range(prev, addr, prev->vm_end, NULL, NULL); return prev; } #else @@ -2560,7 +2560,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr) if (expand_stack(vma, addr)) return NULL; if (vma->vm_flags & VM_LOCKED) - populate_vma_page_range(vma, addr, start, NULL); + populate_vma_page_range(vma, addr, start, NULL, NULL); return vma; } #endif diff --git a/mm/mprotect.c b/mm/mprotect.c index bf38dfbbb4b4..36c517c6a5b1 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -439,7 +439,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, */ if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED && (newflags & VM_WRITE)) { - populate_vma_page_range(vma, start, end, NULL); + populate_vma_page_range(vma, start, end, NULL, NULL); } vm_stat_account(mm, oldflags, -nrpages); diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index a447092d4635..ff6772b86195 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -90,6 +90,7 @@ static int process_vm_rw_single_vec(unsigned long addr, unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES / sizeof(struct pages *); unsigned int flags = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); /* Work out address and page range required */ if (len == 0) @@ -111,7 +112,8 @@ static int process_vm_rw_single_vec(unsigned long addr, */ down_read(&mm->mmap_sem); pages = get_user_pages_remote(task, mm, pa, pages, flags, - process_pages, NULL, &locked); + process_pages, NULL, &locked, + &mmrange); if (locked) up_read(&mm->mmap_sem); if (pages <= 0) diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c index 8526a0a74023..6f577b633413 100644 --- a/security/tomoyo/domain.c +++ b/security/tomoyo/domain.c @@ -910,7 +910,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos, * the execve(). */ if (get_user_pages_remote(current, bprm->mm, pos, 1, - FOLL_FORCE, &page, NULL, NULL) <= 0) + FOLL_FORCE, &page, NULL, NULL, NULL) <= 0) return false; #else page = bprm->page[pos / PAGE_SIZE]; diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index 110cbe3f74f8..e93cd8515134 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -78,6 +78,7 @@ static void async_pf_execute(struct work_struct *work) unsigned long addr = apf->addr; gva_t gva = apf->gva; int locked = 1; + DEFINE_RANGE_LOCK_FULL(mmrange); might_sleep(); @@ -88,7 +89,7 @@ static void async_pf_execute(struct work_struct *work) */ down_read(&mm->mmap_sem); get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL, - &locked); + &locked, &mmrange); if (locked) up_read(&mm->mmap_sem); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f0d13d9d125d..e1484150a3dd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1522,7 +1522,8 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault) static int hva_to_pfn_remapped(struct vm_area_struct *vma, unsigned long addr, bool *async, bool write_fault, bool *writable, - kvm_pfn_t *p_pfn) + kvm_pfn_t *p_pfn, + struct range_lock *mmrange) { unsigned long pfn; int r; @@ -1536,7 +1537,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, bool unlocked = false; r = fixup_user_fault(current, current->mm, addr, (write_fault ? 
FAULT_FLAG_WRITE : 0), - &unlocked); + &unlocked, mmrange); if (unlocked) return -EAGAIN; if (r) @@ -1588,6 +1589,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, struct vm_area_struct *vma; kvm_pfn_t pfn = 0; int npages, r; + DEFINE_RANGE_LOCK_FULL(mmrange); /* we can do it either atomically or asynchronously, not both */ BUG_ON(atomic && async); @@ -1615,7 +1617,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (vma == NULL) pfn = KVM_PFN_ERR_FAULT; else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) { - r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn); + r = hva_to_pfn_remapped(vma, addr, async, write_fault, + writable, &pfn, &mmrange); if (r == -EAGAIN) goto retry; if (r < 0) From patchwork Tue May 21 04:52:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952889 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A1EE013AD for ; Tue, 21 May 2019 04:53:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 93B3628848 for ; Tue, 21 May 2019 04:53:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 87FC428968; Tue, 21 May 2019 04:53:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EECD028866 for ; Tue, 21 May 2019 04:53:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 876EB6B000E; Tue, 21 May 2019 00:53:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8025A6B0266; Tue, 21 May 2019 00:53:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5DB9F6B0269; Tue, 21 May 2019 00:53:39 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f71.google.com (mail-ed1-f71.google.com [209.85.208.71]) by kanga.kvack.org (Postfix) with ESMTP id 0236E6B000E for ; Tue, 21 May 2019 00:53:39 -0400 (EDT) Received: by mail-ed1-f71.google.com with SMTP id x16so28733565edm.16 for ; Mon, 20 May 2019 21:53:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=RhKrHx4coH9CCJ8us/PPaV6G9Z+XqAHBa9IVwCw9ZVA=; b=DzqftKWGwPdICRp+h+1Dq+agyh4PcgR5SfrtsOdUdV8geholDaMbFQGn4TB2OspzUE PRMo7owkd54vSFA8YivEpceeaykpG9VqECXsUQsjJ6IWwiRXwoLvJPj/4DhUWjR+jVcj mA3elGFMsBuPYnRMkjE2j+2OfxNpftV9G7WXttfJYHfgEek+shqQwNfYZaxuX5vgdp1t BPfhnx7k7IrLu+hJg4dJp/PDMQqPiYr3SmYMQJsdYVqshHmayp6xxXpxsEXQDwTgL0WP 3lVq6ILH64kV/bunP188whB4XtFXExviQFb9X6WReCsj3RWhNSSEPWb9LxrYD6csA3Ie ncsw== X-Original-Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net X-Gm-Message-State: 
APjAAAXzwnBeC7pOI6eNH9w5Q5O+T+Bn2giOkjqTTspScQPJtkV1hOMq 9UK/ixVEnAdmqsp5uPdBx4qfdXqkrRN8jALu1aidO0bnVbBZDTlcN6zT1WZ2h4PS+W+0BV880Pp 6bTiZNUTFrZztFVb+ekjHptF0JhYQAbbnECR24tWXZ4f29t9lxB7yWAUmAJKynYw= X-Received: by 2002:a17:906:3b8f:: with SMTP id u15mr47866972ejf.6.1558414418476; Mon, 20 May 2019 21:53:38 -0700 (PDT) X-Google-Smtp-Source: APXvYqwTm8o0iPoHidyqHRhulSpaxdZ4WEa1oAZ4u6uQ0EpQDBYF8LzTRtD7W3ozZClRlrd5MLAH X-Received: by 2002:a17:906:3b8f:: with SMTP id u15mr47866905ejf.6.1558414416963; Mon, 20 May 2019 21:53:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558414416; cv=none; d=google.com; s=arc-20160816; b=uAzqdoHPfwyeKy0fgMXhcHqzM0YPgQdWZT8yIY1PC3NL7DDOoYp7sufU1uTNZVNR9w 8Y6MStTQoTaOR2Dw8ssJWvjLiynaufueXPgktd76nIHMe6QYJgJTUP48GjoxOH00/KlO yKUsFr+a1q1RLhLrPqaDpqRiQ6tPzC8YHOtsx1z9JQUtPUsnyoKcKhlUm/hMs7uCtLex cJWqvS0l+CWUFNYCXSsdpuCZNQIrBPjdXbd4spmxRsAay8IOC7TuWbwAtWMbYRcJmIhj BX+Oh6hiHVbibrPgadSLDXkauNpohBMcqL8Beh8ST34Ku1IkTDLTsTdo3xp1KDl8SqrF b8wQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=RhKrHx4coH9CCJ8us/PPaV6G9Z+XqAHBa9IVwCw9ZVA=; b=uiFKeRj9TjrYCno8eBUSo0B2ny5uGNsC2bOdlYTRSDUxzggD5MN4EcMS+s5lX0+cgl q8Y3BoW49juKd3tIRDrGQxDBIn14dYerdmAJD8u2Y4SNvZK507zrgOCe9B11tlRcpQga JkBuPzk1Mio+UIjps0HAsoVmYIpe04OIf5j5EEukfa4O5NfciQWQGDgXnDUJod3FqLH8 PKJVS1yiVjte7U0MM1kkBZWzck8jSMxuGHGMH+lv6jKYYreok4sr2LYoh1+tPeeHYVYx 3kej7Csg2O1OZtBjZZ1JtjOwg949CK+bedruYuLElr76cw08JWDszdYRlZz90O+lbMeN w/7Q== ARC-Authentication-Results: i=1; mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from smtp.nue.novell.com (smtp.nue.novell.com. 
[195.135.221.5]) by mx.google.com with ESMTPS id r16si2715773ejj.69.2019.05.20.21.53.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 May 2019 21:53:36 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) client-ip=195.135.221.5; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from emea4-mta.ukb.novell.com ([10.120.13.87]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 06:53:36 +0200 Received: from linux-r8p5.suse.de (nwb-a10-snat.microfocus.com [10.120.13.201]) by emea4-mta.ukb.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 05:53:06 +0100 From: Davidlohr Bueso To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso Subject: [PATCH 05/14] mm: remove some BUG checks wrt mmap_sem Date: Mon, 20 May 2019 21:52:33 -0700 Message-Id: <20190521045242.24378-6-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net> References: <20190521045242.24378-1-dave@stgolabs.net> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP
This patch is a collection of hacks that shamelessly remove mmap_sem state checks in order to avoid having to teach file_operations about range locking.

For thp and huge pagecache: by dropping the rwsem_is_locked checks in zap_pmd_range() and zap_pud_range() we avoid having to teach file_operations about mmrange. For example in xfs, iomap_dio_rw() is called by .read_iter file callbacks.

We also avoid the mmap_sem trylock in vm_insert_page(). The rules for this function state that mmap_sem must be acquired by the caller:

 - for write if used in f_op->mmap() (by far the most common case)
 - for read if used from vma_op->fault() (with VM_MIXEDMAP)

The only exception is:

	mmap_vmcore()
	  remap_vmalloc_range_partial()
	    mmap_vmcore()

But there is no concurrency here, thus mmap_sem is not held. After auditing the kernel, the following drivers use the fault path and correctly set VM_MIXEDMAP:

	.fault = etnaviv_gem_fault
	.fault = udl_gem_fault
	tegra_bo_fault()

As such, drop the reader trylock BUG_ON() for the common case. This avoids having file_operations know about mmranges, as mmap_sem is held during mmap(), for example.
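To illustrate the locking rule being relied upon, here is a minimal sketch of a hypothetical driver ->mmap handler (demo_mmap() and demo_dev are made up for illustration). By the time f_op->mmap() runs, the mmap() syscall path already holds mmap_sem for write, which is why vm_insert_page() never had to take it and the trylock BUG_ON() was only a sanity check:

	/* hypothetical f_op->mmap(); mmap_sem is already held for write here */
	static int demo_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct demo_dev *dev = file->private_data;

		/*
		 * vm_insert_page() relies on the caller holding mmap_sem;
		 * asserting that with down_read_trylock() would require
		 * knowing the mmrange once mmap_sem becomes a range lock,
		 * hence the check is dropped rather than converted.
		 */
		return vm_insert_page(vma, vma->vm_start, dev->page);
	}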
Signed-off-by: Davidlohr Bueso --- include/linux/huge_mm.h | 2 -- mm/memory.c | 2 -- mm/mmap.c | 4 ++-- mm/pagewalk.c | 3 --- 4 files changed, 2 insertions(+), 9 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 7cd5c150c21d..a4a9cfa78d8f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -194,7 +194,6 @@ static inline int is_swap_pmd(pmd_t pmd) static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) { - VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma); if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) return __pmd_trans_huge_lock(pmd, vma); else @@ -203,7 +202,6 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { - VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma); if (pud_trans_huge(*pud) || pud_devmap(*pud)) return __pud_trans_huge_lock(pud, vma); else diff --git a/mm/memory.c b/mm/memory.c index 9516c95108a1..73971f859035 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1212,7 +1212,6 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb, next = pud_addr_end(addr, end); if (pud_trans_huge(*pud) || pud_devmap(*pud)) { if (next - addr != HPAGE_PUD_SIZE) { - VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma); split_huge_pud(vma, pud, addr); } else if (zap_huge_pud(tlb, vma, pud, addr)) goto next; @@ -1519,7 +1518,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr, if (!page_count(page)) return -EINVAL; if (!(vma->vm_flags & VM_MIXEDMAP)) { - BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem)); BUG_ON(vma->vm_flags & VM_PFNMAP); vma->vm_flags |= VM_MIXEDMAP; } diff --git a/mm/mmap.c b/mm/mmap.c index af228ae3508d..a03ded49f9eb 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3466,7 +3466,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma) * The LSB of head.next can't change from under us * because we hold the mm_all_locks_mutex. */ - down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem); + down_write(&mm->mmap_sem); /* * We can safely modify head.next after taking the * anon_vma->root->rwsem. 
If some other vma in this mm shares @@ -3496,7 +3496,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) */ if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) BUG(); - down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem); + down_write(&mm->mmap_sem); } } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index c3084ff2569d..6246acf17054 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -303,8 +303,6 @@ int walk_page_range(unsigned long start, unsigned long end, if (!walk->mm) return -EINVAL; - VM_BUG_ON_MM(!rwsem_is_locked(&walk->mm->mmap_sem), walk->mm); - vma = find_vma(walk->mm, start); do { if (!vma) { /* after the last vma */ @@ -346,7 +344,6 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk) if (!walk->mm) return -EINVAL; - VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem)); VM_BUG_ON(!vma); walk->vma = vma; err = walk_page_test(vma->vm_start, vma->vm_end, walk); From patchwork Tue May 21 04:52:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952895 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6C4A76 for ; Tue, 21 May 2019 04:53:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C27D328848 for ; Tue, 21 May 2019 04:53:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B6BD82893D; Tue, 21 May 2019 04:53:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3201628848 for ; Tue, 21 May 2019 04:53:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 31B4B6B0269; Tue, 21 May 2019 00:53:43 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2A7096B026B; Tue, 21 May 2019 00:53:43 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 083366B026C; Tue, 21 May 2019 00:53:42 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by kanga.kvack.org (Postfix) with ESMTP id 78A476B0269 for ; Tue, 21 May 2019 00:53:42 -0400 (EDT) Received: by mail-ed1-f69.google.com with SMTP id c24so28705251edb.6 for ; Mon, 20 May 2019 21:53:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=go5Dz8Po2Ksq+kuxQmQq67VAxcYaEzBrmdzQ+AYtOfk=; b=ha/ZeHvSkfrw0rbo4jA0rwl8j1QGrgFhAuk3EHgI6/v9OPsh4CRRKuwr4cZT03fDU7 ka02ub2uCVX0oPolvFdaW6fznQKQS60+/c4ETGqMUG5O1epy6yZ11zS0CoDVmQUcjeXz EM14XFp7DiCfRHx5fppjhy9/QDXKl22xOHdqi8TrA/VaESPF8M+6V1SA0h9sNEy2O/O7 Hpnxl+QiWc6CiQuf17E+4Ct0Z6wQPmQx5VxS2UqHpoy5gRCFU29+9W3ePK05wbBe/8Kn HAhepqu2C0erDWe6EKk4+CZovAgRZDPFV3tmMwVCcd/ZpQa3KavTIBVjx/UxKU5gIE/V qKpw== X-Original-Authentication-Results: mx.google.com; spf=softfail (google.com: domain 
From patchwork Tue May 21 04:52:34 2019
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 10952895
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 06/14] mm: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:34 -0700
Message-Id: <20190521045242.24378-7-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time, and we already have vmf updated. No changes in semantics.

Signed-off-by: Davidlohr Bueso
---
 include/linux/mm.h | 8 +++--- mm/filemap.c | 8 +++--- mm/frame_vector.c | 4 +-- mm/gup.c | 21 +++++++-------- mm/hmm.c | 3 ++- mm/khugepaged.c | 54 +++++++++++++++++++++------------------ mm/ksm.c | 42 +++++++++++++++++------------- mm/madvise.c | 36 ++++++++++++++------------ mm/memcontrol.c | 10 +++++--- mm/memory.c | 10 +++++--- mm/mempolicy.c | 25 ++++++++++-------- mm/migrate.c | 10 +++++--- mm/mincore.c | 6 +++-- mm/mlock.c | 20 +++++++++------ mm/mmap.c | 69 ++++++++++++++++++++++++++++---------------------- mm/mmu_notifier.c | 9 ++++--- mm/mprotect.c | 15 ++++++----- mm/mremap.c | 9 ++++--- mm/msync.c | 9 ++++--- mm/nommu.c | 25 ++++++++++-------- mm/oom_kill.c | 5 ++-- mm/process_vm_access.c | 4 +-- mm/shmem.c | 2 +- mm/swapfile.c | 5 ++-- mm/userfaultfd.c | 21 ++++++++------- mm/util.c | 10 +++++--- 26 files changed, 252 insertions(+), 188 deletions(-)
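[Editor's aside, not part of the patch.] Before the per-file hunks, it may help to see the shape of the conversion in one place. The sketch below only restates the recurring pattern visible in the hunks: a stack-allocated full-range descriptor replaces the bare mmap_sem read lock, using the DEFINE_RANGE_LOCK_FULL()/mm_read_lock()/mm_read_unlock() interfaces introduced earlier in this series. The surrounding function is made up for illustration.

    /*
     * Illustrative sketch of the read-side conversion pattern used throughout
     * this patch. Assumes the range-lock API from earlier in the series; the
     * function itself is hypothetical and not part of the kernel.
     */
    #include <linux/mm.h>
    #include <linux/mm_types.h>

    static int count_vmas_example(struct mm_struct *mm)
    {
            DEFINE_RANGE_LOCK_FULL(mmrange);  /* covers the whole address space */
            struct vm_area_struct *vma;
            int nr = 0;

            /* before: down_read(&mm->mmap_sem); */
            mm_read_lock(mm, &mmrange);
            for (vma = mm->mmap; vma; vma = vma->vm_next)
                    nr++;
            /* before: up_read(&mm->mmap_sem); */
            mm_read_unlock(mm, &mmrange);

            return nr;
    }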
diff --git a/include/linux/mm.h b/include/linux/mm.h index 044e428b1905..8bf3e2542047 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1459,6 +1459,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, * right now." 1 means "skip the current vma." * @mm: mm_struct representing the target process of page table walk * @vma: vma currently walked (NULL if walking outside vmas) + * @mmrange: mm address space range locking * @private: private data for callbacks' usage * * (see the comment on walk_page_range() for more details) @@ -2358,8 +2359,8 @@ static inline int check_data_rlimit(unsigned long rlim, return 0; } -extern int mm_take_all_locks(struct mm_struct *mm); -extern void mm_drop_all_locks(struct mm_struct *mm); +extern int mm_take_all_locks(struct mm_struct *mm, struct range_lock *mmrange); +extern void mm_drop_all_locks(struct mm_struct *mm, struct range_lock *mmrange); extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file); extern struct file *get_mm_exe_file(struct mm_struct *mm); @@ -2389,7 +2390,8 @@ extern unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, - struct list_head *uf, bool downgrade); + struct list_head *uf, bool downgrade, + struct range_lock *); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf);
diff --git a/mm/filemap.c b/mm/filemap.c index 959022841bab..71f0d8a18f40 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1388,7 +1388,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, if (flags & FAULT_FLAG_RETRY_NOWAIT) return 0; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); if (flags & FAULT_FLAG_KILLABLE) wait_on_page_locked_killable(page); else @@ -1400,7 +1400,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, ret = __lock_page_killable(page); if (ret) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); return 0; } } else @@ -2317,7 +2317,7 @@ static struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf, if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == FAULT_FLAG_ALLOW_RETRY) { fpin = get_file(vmf->vma->vm_file); - up_read(&vmf->vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); } return fpin; } @@ -2357,7 +2357,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page, * mmap_sem here and return 0 if we don't have a fpin.
*/ if (*fpin == NULL) - up_read(&vmf->vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); return 0; } } else diff --git a/mm/frame_vector.c b/mm/frame_vector.c index 4e1a577cbb79..ef33d21b3f39 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, if (WARN_ON_ONCE(nr_frames > vec->nr_allocated)) nr_frames = vec->nr_allocated; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); locked = 1; vma = find_vma_intersection(mm, start, start + 1); if (!vma) { @@ -102,7 +102,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)); out: if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (!ret) ret = -EFAULT; if (ret > 0) diff --git a/mm/gup.c b/mm/gup.c index cf8fa037ce27..70b546a01682 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -990,7 +990,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, } if (ret & VM_FAULT_RETRY) { - down_read(&mm->mmap_sem); + mm_read_lock(mm, mmrange); if (!(fault_flags & FAULT_FLAG_TRIED)) { *unlocked = true; fault_flags &= ~FAULT_FLAG_ALLOW_RETRY; @@ -1077,7 +1077,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, */ *locked = 1; lock_dropped = true; - down_read(&mm->mmap_sem); + mm_read_lock(mm, mmrange); ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, pages, NULL, NULL, NULL); if (ret != 1) { @@ -1098,7 +1098,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, * We must let the caller know we temporarily dropped the lock * and so the critical section protected by it was lost. */ - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); *locked = 0; } return pages_done; @@ -1176,11 +1176,11 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) return -EINVAL; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL, &locked, gup_flags | FOLL_TOUCH, &mmrange); if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; } EXPORT_SYMBOL(get_user_pages_unlocked); @@ -1543,7 +1543,7 @@ long populate_vma_page_range(struct vm_area_struct *vma, VM_BUG_ON(end & ~PAGE_MASK); VM_BUG_ON_VMA(start < vma->vm_start, vma); VM_BUG_ON_VMA(end > vma->vm_end, vma); - VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm); + VM_BUG_ON_MM(!mm_is_locked(mm, mmrange), mm); gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK; if (vma->vm_flags & VM_LOCKONFAULT) @@ -1596,7 +1596,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors) */ if (!locked) { locked = 1; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, nstart); } else if (nstart >= vma->vm_end) vma = vma->vm_next; @@ -1628,7 +1628,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors) ret = 0; } if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; /* 0 or negative error code */ } @@ -2189,17 +2189,18 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages) { int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * FIXME: FOLL_LONGTERM does not work with * get_user_pages_unlocked() (see comments in that function) */ if (gup_flags & FOLL_LONGTERM) { - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); ret = 
__gup_longterm_locked(current, current->mm, start, nr_pages, pages, NULL, gup_flags); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } else { ret = get_user_pages_unlocked(start, nr_pages, pages, gup_flags); diff --git a/mm/hmm.c b/mm/hmm.c index 723109ac6bdc..a79a07f7ccc1 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -1118,7 +1118,8 @@ long hmm_range_fault(struct hmm_range *range, bool block) do { /* If range is no longer valid force retry. */ if (!range->valid) { - up_read(&hmm->mm->mmap_sem); + /*** BROKEN mmrange, we don't care about hmm (for now) */ + mm_read_unlock(hmm->mm, NULL); return -EAGAIN; } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 3eefcb8f797d..13d8e29f4674 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -488,6 +488,8 @@ void __khugepaged_exit(struct mm_struct *mm) free_mm_slot(mm_slot); mmdrop(mm); } else if (mm_slot) { + DEFINE_RANGE_LOCK_FULL(mmrange); + /* * This is required to serialize against * khugepaged_test_exit() (which is guaranteed to run @@ -496,8 +498,8 @@ void __khugepaged_exit(struct mm_struct *mm) * khugepaged has finished working on the pagetables * under the mmap_sem. */ - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); + mm_write_unlock(mm, &mmrange); } } @@ -908,7 +910,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, /* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */ if (ret & VM_FAULT_RETRY) { - down_read(&mm->mmap_sem); + mm_read_lock(mm, mmrange); if (hugepage_vma_revalidate(mm, address, &vmf.vma)) { /* vma is no longer available, don't continue to swapin */ trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); @@ -961,7 +963,7 @@ static void collapse_huge_page(struct mm_struct *mm, * sync compaction, and we do not need to hold the mmap_sem during * that. We will recheck the vma after taking it again in write mode. */ - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); new_page = khugepaged_alloc_page(hpage, gfp, node); if (!new_page) { result = SCAN_ALLOC_HUGE_PAGE_FAIL; @@ -973,11 +975,11 @@ static void collapse_huge_page(struct mm_struct *mm, goto out_nolock; } - down_read(&mm->mmap_sem); + mm_read_lock(mm, mmrange); result = hugepage_vma_revalidate(mm, address, &vma); if (result) { mem_cgroup_cancel_charge(new_page, memcg, true); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); goto out_nolock; } @@ -985,7 +987,7 @@ static void collapse_huge_page(struct mm_struct *mm, if (!pmd) { result = SCAN_PMD_NULL; mem_cgroup_cancel_charge(new_page, memcg, true); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); goto out_nolock; } @@ -997,17 +999,17 @@ static void collapse_huge_page(struct mm_struct *mm, if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced, mmrange)) { mem_cgroup_cancel_charge(new_page, memcg, true); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); goto out_nolock; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); /* * Prevent all access to pagetables with the exception of * gup_fast later handled by the ptep_clear_flush and the VM * handled by the anon_vma lock + PG_lock. 
*/ - down_write(&mm->mmap_sem); + mm_write_lock(mm, mmrange); result = hugepage_vma_revalidate(mm, address, &vma); if (result) goto out; @@ -1091,7 +1093,7 @@ static void collapse_huge_page(struct mm_struct *mm, khugepaged_pages_collapsed++; result = SCAN_SUCCEED; out_up_write: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, mmrange); out_nolock: trace_mm_collapse_huge_page(mm, isolated, result); return; @@ -1250,7 +1252,8 @@ static void collect_mm_slot(struct mm_slot *mm_slot) } #if defined(CONFIG_SHMEM) && defined(CONFIG_TRANSPARENT_HUGE_PAGECACHE) -static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) +static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff, + struct range_lock *mmrange) { struct vm_area_struct *vma; unsigned long addr; @@ -1275,12 +1278,12 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * re-fault. Not ideal, but it's more important to not disturb * the system too much. */ - if (down_write_trylock(&vma->vm_mm->mmap_sem)) { + if (mm_write_trylock(vma->vm_mm, mmrange)) { spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd); /* assume page table is clear */ _pmd = pmdp_collapse_flush(vma, addr, pmd); spin_unlock(ptl); - up_write(&vma->vm_mm->mmap_sem); + mm_write_unlock(vma->vm_mm, mmrange); mm_dec_nr_ptes(vma->vm_mm); pte_free(vma->vm_mm, pmd_pgtable(_pmd)); } @@ -1307,8 +1310,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + unlock and free huge page; */ static void collapse_shmem(struct mm_struct *mm, - struct address_space *mapping, pgoff_t start, - struct page **hpage, int node) + struct address_space *mapping, pgoff_t start, + struct page **hpage, int node, + struct range_lock *mmrange) { gfp_t gfp; struct page *new_page; @@ -1515,7 +1519,7 @@ static void collapse_shmem(struct mm_struct *mm, /* * Remove pte page tables, so we can re-fault the page as huge. */ - retract_page_tables(mapping, start); + retract_page_tables(mapping, start, mmrange); *hpage = NULL; khugepaged_pages_collapsed++; @@ -1566,8 +1570,9 @@ static void collapse_shmem(struct mm_struct *mm, } static void khugepaged_scan_shmem(struct mm_struct *mm, - struct address_space *mapping, - pgoff_t start, struct page **hpage) + struct address_space *mapping, + pgoff_t start, struct page **hpage, + struct range_lock *mmrange) { struct page *page = NULL; XA_STATE(xas, &mapping->i_pages, start); @@ -1633,7 +1638,8 @@ static void khugepaged_scan_shmem(struct mm_struct *mm, result = SCAN_EXCEED_NONE_PTE; } else { node = khugepaged_find_target_node(); - collapse_shmem(mm, mapping, start, hpage, node); + collapse_shmem(mm, mapping, start, hpage, + node, mmrange); } } @@ -1678,7 +1684,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, * the next mm on the list. 
*/ vma = NULL; - if (unlikely(!down_read_trylock(&mm->mmap_sem))) + if (unlikely(!mm_read_trylock(mm, &mmrange))) goto breakouterloop_mmap_sem; if (likely(!khugepaged_test_exit(mm))) vma = find_vma(mm, khugepaged_scan.address); @@ -1723,10 +1729,10 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, if (!shmem_huge_enabled(vma)) goto skip; file = get_file(vma->vm_file); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); ret = 1; khugepaged_scan_shmem(mm, file->f_mapping, - pgoff, hpage); + pgoff, hpage, &mmrange); fput(file); } else { ret = khugepaged_scan_pmd(mm, vma, @@ -1744,7 +1750,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, } } breakouterloop: - up_read(&mm->mmap_sem); /* exit_mmap will destroy ptes after this */ + mm_read_unlock(mm, &mmrange); /* exit_mmap will destroy ptes after this */ breakouterloop_mmap_sem: spin_lock(&khugepaged_mm_lock); diff --git a/mm/ksm.c b/mm/ksm.c index ccc9737311eb..7f9826ea7dba 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -537,6 +537,7 @@ static void break_cow(struct rmap_item *rmap_item) struct mm_struct *mm = rmap_item->mm; unsigned long addr = rmap_item->address; struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * It is not an accident that whenever we want to break COW @@ -544,11 +545,11 @@ static void break_cow(struct rmap_item *rmap_item) */ put_anon_vma(rmap_item->anon_vma); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_mergeable_vma(mm, addr); if (vma) break_ksm(vma, addr); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } static struct page *get_mergeable_page(struct rmap_item *rmap_item) @@ -557,8 +558,9 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item) unsigned long addr = rmap_item->address; struct vm_area_struct *vma; struct page *page; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_mergeable_vma(mm, addr); if (!vma) goto out; @@ -574,7 +576,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item) out: page = NULL; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return page; } @@ -969,6 +971,7 @@ static int unmerge_and_remove_all_rmap_items(void) struct mm_struct *mm; struct vm_area_struct *vma; int err = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); spin_lock(&ksm_mmlist_lock); ksm_scan.mm_slot = list_entry(ksm_mm_head.mm_list.next, @@ -978,7 +981,7 @@ static int unmerge_and_remove_all_rmap_items(void) for (mm_slot = ksm_scan.mm_slot; mm_slot != &ksm_mm_head; mm_slot = ksm_scan.mm_slot) { mm = mm_slot->mm; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) { if (ksm_test_exit(mm)) break; @@ -991,7 +994,7 @@ static int unmerge_and_remove_all_rmap_items(void) } remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); spin_lock(&ksm_mmlist_lock); ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next, @@ -1014,7 +1017,7 @@ static int unmerge_and_remove_all_rmap_items(void) return 0; error: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); spin_lock(&ksm_mmlist_lock); ksm_scan.mm_slot = &ksm_mm_head; spin_unlock(&ksm_mmlist_lock); @@ -1299,8 +1302,9 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item, struct mm_struct *mm = rmap_item->mm; struct vm_area_struct *vma; int err = -EFAULT; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_mergeable_vma(mm, rmap_item->address); 
if (!vma) goto out; @@ -1316,7 +1320,7 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item, rmap_item->anon_vma = vma->anon_vma; get_anon_vma(vma->anon_vma); out: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return err; } @@ -2129,12 +2133,13 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item) */ if (ksm_use_zero_pages && (checksum == zero_checksum)) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_mergeable_vma(mm, rmap_item->address); err = try_to_merge_one_page(vma, page, ZERO_PAGE(rmap_item->address)); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); /* * In case of failure, the page was not really empty, so we * need to continue. Otherwise we're done. @@ -2240,6 +2245,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) struct vm_area_struct *vma; struct rmap_item *rmap_item; int nid; + DEFINE_RANGE_LOCK_FULL(mmrange); if (list_empty(&ksm_mm_head.mm_list)) return NULL; @@ -2297,7 +2303,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) } mm = slot->mm; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); if (ksm_test_exit(mm)) vma = NULL; else @@ -2331,7 +2337,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) ksm_scan.address += PAGE_SIZE; } else put_page(*page); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return rmap_item; } put_page(*page); @@ -2369,10 +2375,10 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page) free_mm_slot(slot); clear_bit(MMF_VM_MERGEABLE, &mm->flags); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmdrop(mm); } else { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); /* * up_read(&mm->mmap_sem) first because after * spin_unlock(&ksm_mmlist_lock) run, the "mm" may @@ -2571,8 +2577,10 @@ void __ksm_exit(struct mm_struct *mm) clear_bit(MMF_VM_MERGEABLE, &mm->flags); mmdrop(mm); } else if (mm_slot) { - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + + mm_write_lock(mm, &mmrange); + mm_write_unlock(mm, &mmrange); } } diff --git a/mm/madvise.c b/mm/madvise.c index 628022e674a7..78a3f86d9c52 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -516,16 +516,16 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma, static long madvise_dontneed_free(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end, - int behavior) + int behavior, struct range_lock *mmrange) { *prev = vma; if (!can_madv_dontneed_vma(vma)) return -EINVAL; - if (!userfaultfd_remove(vma, start, end)) { + if (!userfaultfd_remove(vma, start, end, mmrange)) { *prev = NULL; /* mmap_sem has been dropped, prev is stale */ - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, mmrange); vma = find_vma(current->mm, start); if (!vma) return -ENOMEM; @@ -574,8 +574,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma, * This is effectively punching a hole into the middle of a file. */ static long madvise_remove(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end) + struct vm_area_struct **prev, + unsigned long start, unsigned long end, + struct range_lock *mmrange) { loff_t offset; int error; @@ -605,15 +606,15 @@ static long madvise_remove(struct vm_area_struct *vma, * mmap_sem. 
*/ get_file(f); - if (userfaultfd_remove(vma, start, end)) { + if (userfaultfd_remove(vma, start, end, mmrange)) { /* mmap_sem was not released by userfaultfd_remove() */ - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, mmrange); } error = vfs_fallocate(f, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, end - start); fput(f); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, mmrange); return error; } @@ -688,16 +689,18 @@ static int madvise_inject_error(int behavior, static long madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end, int behavior) + unsigned long start, unsigned long end, int behavior, + struct range_lock *mmrange) { switch (behavior) { case MADV_REMOVE: - return madvise_remove(vma, prev, start, end); + return madvise_remove(vma, prev, start, end, mmrange); case MADV_WILLNEED: return madvise_willneed(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: - return madvise_dontneed_free(vma, prev, start, end, behavior); + return madvise_dontneed_free(vma, prev, start, end, + behavior, mmrange); default: return madvise_behavior(vma, prev, start, end, behavior); } @@ -809,6 +812,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) int write; size_t len; struct blk_plug plug; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!madvise_behavior_valid(behavior)) return error; @@ -836,10 +840,10 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) write = madvise_need_mmap_write(behavior); if (write) { - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; } else { - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); } /* @@ -872,7 +876,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) tmp = end; /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */ - error = madvise_vma(vma, &prev, start, tmp, behavior); + error = madvise_vma(vma, &prev, start, tmp, behavior, &mmrange); if (error) goto out; start = tmp; @@ -889,9 +893,9 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) out: blk_finish_plug(&plug); if (write) - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); else - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return error; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2535e54e7989..c822cea99570 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5139,10 +5139,11 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) .pmd_entry = mem_cgroup_count_precharge_pte_range, .mm = mm, }; - down_read(&mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + mm_read_lock(mm, &mmrange); walk_page_range(0, mm->highest_vm_end, &mem_cgroup_count_precharge_walk); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); precharge = mc.precharge; mc.precharge = 0; @@ -5412,6 +5413,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, static void mem_cgroup_move_charge(void) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_walk mem_cgroup_move_charge_walk = { .pmd_entry = mem_cgroup_move_charge_pte_range, .mm = mc.mm, @@ -5426,7 +5428,7 @@ static void mem_cgroup_move_charge(void) atomic_inc(&mc.from->moving_account); synchronize_rcu(); retry: - if (unlikely(!down_read_trylock(&mc.mm->mmap_sem))) { + if (unlikely(!mm_read_trylock(mc.mm, &mmrange))) { /* * Someone who are holding the mmap_sem might be waiting in * waitq. 
So we cancel all extra charges, wake up all waiters, @@ -5444,7 +5446,7 @@ static void mem_cgroup_move_charge(void) */ walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk); - up_read(&mc.mm->mmap_sem); + mm_read_unlock(mc.mm, &mmrange); atomic_dec(&mc.from->moving_account); } diff --git a/mm/memory.c b/mm/memory.c index 73971f859035..8a5f52978893 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4347,8 +4347,9 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct *vma; void *old_buf = buf; int write = gup_flags & FOLL_WRITE; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); /* ignore errors, just check how much was successfully transferred */ while (len) { int bytes, ret, offset; @@ -4397,7 +4398,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, buf += bytes; addr += bytes; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return buf - old_buf; } @@ -4450,11 +4451,12 @@ void print_vma_addr(char *prefix, unsigned long ip) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * we might be running from an atomic context so we cannot sleep */ - if (!down_read_trylock(&mm->mmap_sem)) + if (!mm_read_trylock(mm, &mmrange)) return; vma = find_vma(mm, ip); @@ -4473,7 +4475,7 @@ void print_vma_addr(char *prefix, unsigned long ip) free_page((unsigned long)buf); } } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 975793cc1d71..8bf8861e0c73 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -378,11 +378,12 @@ void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new) void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) mpol_rebind_policy(vma->vm_policy, new); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); } static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { @@ -837,7 +838,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr, put_page(p); } if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); return err; } @@ -871,10 +872,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, * vma/shared policy at addr is NULL. We * want to return MPOL_DEFAULT in this case. 
*/ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma_intersection(mm, addr, addr+1); if (!vma) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return -EFAULT; } if (vma->vm_ops && vma->vm_ops->get_policy) @@ -933,7 +934,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, out: mpol_cond_put(pol); if (vma) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (pol_refcount) mpol_put(pol_refcount); return err; @@ -1026,12 +1027,13 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, int busy = 0; int err; nodemask_t tmp; + DEFINE_RANGE_LOCK_FULL(mmrange); err = migrate_prep(); if (err) return err; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); /* * Find a 'source' bit set in 'tmp' whose corresponding 'dest' @@ -1112,7 +1114,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, if (err < 0) break; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (err < 0) return err; return busy; @@ -1186,6 +1188,7 @@ static long do_mbind(unsigned long start, unsigned long len, unsigned long end; int err; LIST_HEAD(pagelist); + DEFINE_RANGE_LOCK_FULL(mmrange); if (flags & ~(unsigned long)MPOL_MF_VALID) return -EINVAL; @@ -1233,12 +1236,12 @@ static long do_mbind(unsigned long start, unsigned long len, { NODEMASK_SCRATCH(scratch); if (scratch) { - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); task_lock(current); err = mpol_set_nodemask(new, nmask, scratch); task_unlock(current); if (err) - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); } else err = -ENOMEM; NODEMASK_SCRATCH_FREE(scratch); @@ -1267,7 +1270,7 @@ static long do_mbind(unsigned long start, unsigned long len, } else putback_movable_pages(&pagelist); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mpol_out: mpol_put(new); return err; diff --git a/mm/migrate.c b/mm/migrate.c index f2ecc2855a12..3a268b316e4e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1531,8 +1531,9 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, struct page *page; unsigned int follflags; int err; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); err = -EFAULT; vma = find_vma(mm, addr); if (!vma || addr < vma->vm_start || !vma_migratable(vma)) @@ -1585,7 +1586,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, */ put_page(page); out: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return err; } @@ -1686,8 +1687,9 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, const void __user **pages, int *status) { unsigned long i; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (i = 0; i < nr_pages; i++) { unsigned long addr = (unsigned long)(*pages); @@ -1714,7 +1716,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, status++; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } /* diff --git a/mm/mincore.c b/mm/mincore.c index c3f058bd0faf..c1d3a9cd2ba3 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -270,13 +270,15 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len, retval = 0; while (pages) { + DEFINE_RANGE_LOCK_FULL(mmrange); + /* * Do at most PAGE_SIZE entries per iteration, due to * the temporary buffer size. 
*/ - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); retval = do_mincore(start, min(pages, PAGE_SIZE), tmp); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (retval <= 0) break; diff --git a/mm/mlock.c b/mm/mlock.c index e492a155c51a..c5b5dbd92a3a 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -670,6 +670,7 @@ static int count_mm_mlocked_page_nr(struct mm_struct *mm, static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long locked; unsigned long lock_limit; int error = -ENOMEM; @@ -684,7 +685,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla lock_limit >>= PAGE_SHIFT; locked = len >> PAGE_SHIFT; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; locked += atomic64_read(¤t->mm->locked_vm); @@ -703,7 +704,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla if ((locked <= lock_limit) || capable(CAP_IPC_LOCK)) error = apply_vma_lock_flags(start, len, flags); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); if (error) return error; @@ -733,15 +734,16 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags) SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len) { + DEFINE_RANGE_LOCK_FULL(mmrange); int ret; len = PAGE_ALIGN(len + (offset_in_page(start))); start &= PAGE_MASK; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; ret = apply_vma_lock_flags(start, len, 0); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return ret; } @@ -794,6 +796,7 @@ static int apply_mlockall_flags(int flags) SYSCALL_DEFINE1(mlockall, int, flags) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long lock_limit; int ret; @@ -806,14 +809,14 @@ SYSCALL_DEFINE1(mlockall, int, flags) lock_limit = rlimit(RLIMIT_MEMLOCK); lock_limit >>= PAGE_SHIFT; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; ret = -ENOMEM; if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) || capable(CAP_IPC_LOCK)) ret = apply_mlockall_flags(flags); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); if (!ret && (flags & MCL_CURRENT)) mm_populate(0, TASK_SIZE); @@ -822,12 +825,13 @@ SYSCALL_DEFINE1(mlockall, int, flags) SYSCALL_DEFINE0(munlockall) { + DEFINE_RANGE_LOCK_FULL(mmrange); int ret; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; ret = apply_mlockall_flags(0); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return ret; } diff --git a/mm/mmap.c b/mm/mmap.c index a03ded49f9eb..2eecdeb5fcd6 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -198,9 +198,10 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) unsigned long min_brk; bool populate; bool downgraded = false; + DEFINE_RANGE_LOCK_FULL(mmrange); LIST_HEAD(uf); - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; origbrk = mm->brk; @@ -251,7 +252,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) * mm->brk will be restored from origbrk. 
*/ mm->brk = brk; - ret = __do_munmap(mm, newbrk, oldbrk-newbrk, &uf, true); + ret = __do_munmap(mm, newbrk, oldbrk-newbrk, &uf, true, &mmrange); if (ret < 0) { mm->brk = origbrk; goto out; @@ -274,9 +275,9 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) success: populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0; if (downgraded) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); else - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(oldbrk, newbrk - oldbrk); @@ -284,7 +285,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) out: retval = origbrk; - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return retval; } @@ -2726,7 +2727,8 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma, * Jeremy Fitzhardinge */ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len, - struct list_head *uf, bool downgrade) + struct list_head *uf, bool downgrade, + struct range_lock *mmrange) { unsigned long end; struct vm_area_struct *vma, *prev, *last; @@ -2824,7 +2826,7 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len, detach_vmas_to_be_unmapped(mm, vma, prev, end); if (downgrade) - downgrade_write(&mm->mmap_sem); + mm_downgrade_write(mm, mmrange); unmap_region(mm, vma, prev, start, end); @@ -2837,7 +2839,7 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len, int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf) { - return __do_munmap(mm, start, len, uf, false); + return __do_munmap(mm, start, len, uf, false, NULL); } static int __vm_munmap(unsigned long start, size_t len, bool downgrade) @@ -2845,21 +2847,22 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade) int ret; struct mm_struct *mm = current->mm; LIST_HEAD(uf); + DEFINE_RANGE_LOCK_FULL(mmrange); - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; - ret = __do_munmap(mm, start, len, &uf, downgrade); + ret = __do_munmap(mm, start, len, &uf, downgrade, &mmrange); /* * Returning 1 indicates mmap_sem is downgraded. * But 1 is not legal return value of vm_munmap() and munmap(), reset * it to 0 before return. 
*/ if (ret == 1) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); ret = 0; } else - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); userfaultfd_unmap_complete(mm, &uf); return ret; @@ -2884,6 +2887,7 @@ SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len) SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, unsigned long, prot, unsigned long, pgoff, unsigned long, flags) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; @@ -2906,7 +2910,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (pgoff + (size >> PAGE_SHIFT) < pgoff) return ret; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; vma = find_vma(mm, start); @@ -2969,7 +2973,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, prot, flags, pgoff, &populate, NULL); fput(file); out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (populate) mm_populate(ret, populate); if (!IS_ERR_VALUE(ret)) @@ -3056,6 +3060,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; unsigned long len; int ret; @@ -3068,12 +3073,12 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags) if (!len) return 0; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; ret = do_brk_flags(addr, len, flags, &uf); populate = ((mm->def_flags & VM_LOCKED) != 0); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); userfaultfd_unmap_complete(mm, &uf); if (populate && !ret) mm_populate(addr, len); @@ -3098,6 +3103,8 @@ void exit_mmap(struct mm_struct *mm) mmu_notifier_release(mm); if (unlikely(mm_is_oom_victim(mm))) { + DEFINE_RANGE_LOCK_FULL(mmrange); + /* * Manually reap the mm to free as much memory as possible. * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard @@ -3117,8 +3124,8 @@ void exit_mmap(struct mm_struct *mm) (void)__oom_reap_task_mm(mm); set_bit(MMF_OOM_SKIP, &mm->flags); - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); + mm_write_unlock(mm, &mmrange); } if (atomic64_read(&mm->locked_vm)) { @@ -3459,14 +3466,15 @@ int install_special_mapping(struct mm_struct *mm, static DEFINE_MUTEX(mm_all_locks_mutex); -static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma) +static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma, + struct range_lock *mmrange) { if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) { /* * The LSB of head.next can't change from under us * because we hold the mm_all_locks_mutex. */ - down_write(&mm->mmap_sem); + mm_write_lock(mm, mmrange); /* * We can safely modify head.next after taking the * anon_vma->root->rwsem. 
If some other vma in this mm shares @@ -3482,7 +3490,8 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma) } } -static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) +static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping, + struct range_lock *mmrange) { if (!test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) { /* @@ -3496,7 +3505,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) */ if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags)) BUG(); - down_write(&mm->mmap_sem); + mm_write_lock(mm, mmrange); } } @@ -3537,12 +3546,12 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping) * * mm_take_all_locks() can fail if it's interrupted by signals. */ -int mm_take_all_locks(struct mm_struct *mm) +int mm_take_all_locks(struct mm_struct *mm, struct range_lock *mmrange) { struct vm_area_struct *vma; struct anon_vma_chain *avc; - BUG_ON(down_read_trylock(&mm->mmap_sem)); + BUG_ON(mm_read_trylock(mm, mmrange)); mutex_lock(&mm_all_locks_mutex); @@ -3551,7 +3560,7 @@ int mm_take_all_locks(struct mm_struct *mm) goto out_unlock; if (vma->vm_file && vma->vm_file->f_mapping && is_vm_hugetlb_page(vma)) - vm_lock_mapping(mm, vma->vm_file->f_mapping); + vm_lock_mapping(mm, vma->vm_file->f_mapping, mmrange); } for (vma = mm->mmap; vma; vma = vma->vm_next) { @@ -3559,7 +3568,7 @@ int mm_take_all_locks(struct mm_struct *mm) goto out_unlock; if (vma->vm_file && vma->vm_file->f_mapping && !is_vm_hugetlb_page(vma)) - vm_lock_mapping(mm, vma->vm_file->f_mapping); + vm_lock_mapping(mm, vma->vm_file->f_mapping, mmrange); } for (vma = mm->mmap; vma; vma = vma->vm_next) { @@ -3567,13 +3576,13 @@ int mm_take_all_locks(struct mm_struct *mm) goto out_unlock; if (vma->anon_vma) list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) - vm_lock_anon_vma(mm, avc->anon_vma); + vm_lock_anon_vma(mm, avc->anon_vma, mmrange); } return 0; out_unlock: - mm_drop_all_locks(mm); + mm_drop_all_locks(mm, mmrange); return -EINTR; } @@ -3617,12 +3626,12 @@ static void vm_unlock_mapping(struct address_space *mapping) * The mmap_sem cannot be released by the caller until * mm_drop_all_locks() returns. 
*/ -void mm_drop_all_locks(struct mm_struct *mm) +void mm_drop_all_locks(struct mm_struct *mm, struct range_lock *mmrange) { struct vm_area_struct *vma; struct anon_vma_chain *avc; - BUG_ON(down_read_trylock(&mm->mmap_sem)); + BUG_ON(mm_read_trylock(mm, mmrange)); BUG_ON(!mutex_is_locked(&mm_all_locks_mutex)); for (vma = mm->mmap; vma; vma = vma->vm_next) { diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index ee36068077b6..028eaed031e1 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -244,6 +244,7 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn, { struct mmu_notifier_mm *mmu_notifier_mm; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); BUG_ON(atomic_read(&mm->mm_users) <= 0); @@ -253,8 +254,8 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn, goto out; if (take_mmap_sem) - down_write(&mm->mmap_sem); - ret = mm_take_all_locks(mm); + mm_write_lock(mm, &mmrange); + ret = mm_take_all_locks(mm, &mmrange); if (unlikely(ret)) goto out_clean; @@ -279,10 +280,10 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn, hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list); spin_unlock(&mm->mmu_notifier_mm->lock); - mm_drop_all_locks(mm); + mm_drop_all_locks(mm, &mmrange); out_clean: if (take_mmap_sem) - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); kfree(mmu_notifier_mm); out: BUG_ON(atomic_read(&mm->mm_users) <= 0); diff --git a/mm/mprotect.c b/mm/mprotect.c index 36c517c6a5b1..443b033f240c 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -458,6 +458,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, static int do_mprotect_pkey(unsigned long start, size_t len, unsigned long prot, int pkey) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long nstart, end, tmp, reqprot; struct vm_area_struct *vma, *prev; int error = -EINVAL; @@ -482,7 +483,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, reqprot = prot; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; /* @@ -572,7 +573,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len, prot = reqprot; } out: - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return error; } @@ -594,6 +595,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val) { int pkey; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); /* No flags supported yet. 
*/ if (flags) @@ -602,7 +604,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val) if (init_val & ~PKEY_ACCESS_MASK) return -EINVAL; - down_write(¤t->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); pkey = mm_pkey_alloc(current->mm); ret = -ENOSPC; @@ -616,17 +618,18 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val) } ret = pkey; out: - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return ret; } SYSCALL_DEFINE1(pkey_free, int, pkey) { int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(¤t->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); ret = mm_pkey_free(current->mm, pkey); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); /* * We could provie warnings or errors if any VMA still diff --git a/mm/mremap.c b/mm/mremap.c index 37b5b2ad91be..9009210aea97 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -603,6 +603,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, bool locked = false; bool downgraded = false; struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX; + DEFINE_RANGE_LOCK_FULL(mmrange); LIST_HEAD(uf_unmap_early); LIST_HEAD(uf_unmap); @@ -626,7 +627,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, if (!new_len) return ret; - if (down_write_killable(¤t->mm->mmap_sem)) + if (mm_write_lock_killable(current->mm, &mmrange)) return -EINTR; if (flags & MREMAP_FIXED) { @@ -645,7 +646,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, int retval; retval = __do_munmap(mm, addr+new_len, old_len - new_len, - &uf_unmap, true); + &uf_unmap, true, &mmrange); if (retval < 0 && old_len != new_len) { ret = retval; goto out; @@ -717,9 +718,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, locked = 0; } if (downgraded) - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); else - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); if (locked && new_len > old_len) mm_populate(new_addr + old_len, new_len - old_len); userfaultfd_unmap_complete(mm, &uf_unmap_early); diff --git a/mm/msync.c b/mm/msync.c index ef30a429623a..2524b4708e78 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -36,6 +36,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) struct vm_area_struct *vma; int unmapped_error = 0; int error = -EINVAL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) goto out; @@ -55,7 +56,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) * If the interval [start,end) covers some unmapped address ranges, * just ignore them, but return -ENOMEM at the end. */ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, start); for (;;) { struct file *file; @@ -86,12 +87,12 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { get_file(file); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); error = vfs_fsync_range(file, fstart, fend, 1); fput(file); if (error || start >= end) goto out; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, start); } else { if (start >= end) { @@ -102,7 +103,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags) } } out_unlock: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); out: return error ? 
: unmapped_error; } diff --git a/mm/nommu.c b/mm/nommu.c index b492fd1fcf9f..b454b0004fd2 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -183,10 +183,11 @@ static long __get_user_pages_unlocked(struct task_struct *tsk, unsigned int gup_flags) { long ret; - down_read(&mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + mm_read_lock(mm, &mmrange); ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages, NULL, NULL); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; } @@ -249,12 +250,13 @@ void *vmalloc_user(unsigned long size) ret = __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL); if (ret) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(¤t->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); vma = find_vma(current->mm, (unsigned long)ret); if (vma) vma->vm_flags |= VM_USERMAP; - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); } return ret; @@ -1627,10 +1629,11 @@ int vm_munmap(unsigned long addr, size_t len) { struct mm_struct *mm = current->mm; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); ret = do_munmap(mm, addr, len, NULL); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return ret; } EXPORT_SYMBOL(vm_munmap); @@ -1716,10 +1719,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, unsigned long, new_addr) { unsigned long ret; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(¤t->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); ret = do_mremap(addr, old_len, new_len, flags, new_addr); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return ret; } @@ -1790,8 +1794,9 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, { struct vm_area_struct *vma; int write = gup_flags & FOLL_WRITE; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); /* the access must start within one of the target process's mappings */ vma = find_vma(mm, addr); @@ -1813,7 +1818,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, len = 0; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return len; } diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 539c91d0b26a..a8e3e6279718 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -558,8 +558,9 @@ bool __oom_reap_task_mm(struct mm_struct *mm) static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) { bool ret = true; + DEFINE_RANGE_LOCK_FULL(mmrange); - if (!down_read_trylock(&mm->mmap_sem)) { + if (!mm_read_trylock(mm, &mmrange)) { trace_skip_task_reaping(tsk->pid); return false; } @@ -590,7 +591,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) out_finish: trace_finish_task_reaping(tsk->pid); out_unlock: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; } diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c index ff6772b86195..aaccb8972f83 100644 --- a/mm/process_vm_access.c +++ b/mm/process_vm_access.c @@ -110,12 +110,12 @@ static int process_vm_rw_single_vec(unsigned long addr, * access remotely because task/mm might not * current/current->mm */ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); pages = get_user_pages_remote(task, mm, pa, pages, flags, process_pages, NULL, &locked, &mmrange); if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (pages <= 0) return -EFAULT; diff --git a/mm/shmem.c b/mm/shmem.c index 1bb3b8dc8bb2..bae06efb293d 100644 --- a/mm/shmem.c +++ 
b/mm/shmem.c @@ -2012,7 +2012,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { /* It's polite to up mmap_sem if we can */ - up_read(&vma->vm_mm->mmap_sem); + mm_read_unlock(vma->vm_mm, vmf->lockrange); ret = VM_FAULT_RETRY; } diff --git a/mm/swapfile.c b/mm/swapfile.c index be36f6fe2f8c..dabe7d5391d1 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1972,8 +1972,9 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type, { struct vm_area_struct *vma; int ret = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) { if (vma->anon_vma) { ret = unuse_vma(vma, type, frontswap, @@ -1983,7 +1984,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type, } cond_resched(); } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; } diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 9932d5755e4c..06daedcd06e6 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -177,7 +177,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool zeropage) + bool zeropage, + struct range_lock *mmrange) { int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED; int vm_shared = dst_vma->vm_flags & VM_SHARED; @@ -199,7 +200,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, * feature is not supported. */ if (zeropage) { - up_read(&dst_mm->mmap_sem); + mm_read_unlock(dst_mm, mmrange); return -EINVAL; } @@ -297,7 +298,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, cond_resched(); if (unlikely(err == -ENOENT)) { - up_read(&dst_mm->mmap_sem); + mm_read_unlock(dst_mm, mmrange); BUG_ON(!page); err = copy_huge_page_from_user(page, @@ -307,7 +308,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, err = -EFAULT; goto out; } - down_read(&dst_mm->mmap_sem); + mm_read_lock(dst_mm, mmrange); dst_vma = NULL; goto retry; @@ -327,7 +328,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } out_unlock: - up_read(&dst_mm->mmap_sem); + mm_read_unlock(dst_mm, mmrange); out: if (page) { /* @@ -445,6 +446,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, unsigned long src_addr, dst_addr; long copied; struct page *page; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * Sanitize the command parameters: @@ -461,7 +463,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, copied = 0; page = NULL; retry: - down_read(&dst_mm->mmap_sem); + mm_read_lock(dst_mm, &mmrange); /* * If memory mappings are changing because of non-cooperative @@ -506,7 +508,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, zeropage); + src_start, len, zeropage, + &mmrange); if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock; @@ -562,7 +565,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, if (unlikely(err == -ENOENT)) { void *page_kaddr; - up_read(&dst_mm->mmap_sem); + mm_read_unlock(dst_mm, &mmrange); BUG_ON(!page); page_kaddr = kmap(page); @@ -591,7 +594,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, } out_unlock: - up_read(&dst_mm->mmap_sem); + mm_read_unlock(dst_mm, 
&mmrange); out: if (page) put_page(page);
diff --git a/mm/util.c b/mm/util.c index e2e4f8c3fa12..c410c17ddea7 100644 --- a/mm/util.c +++ b/mm/util.c @@ -350,6 +350,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long pgoff) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long ret; struct mm_struct *mm = current->mm; unsigned long populate; @@ -357,11 +358,11 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, ret = security_mmap_file(file, prot, flag); if (!ret) { - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, &populate, &uf); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); userfaultfd_unmap_complete(mm, &uf); if (populate) mm_populate(ret, populate); @@ -711,18 +712,19 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen) int res = 0; unsigned int len; struct mm_struct *mm = get_task_mm(task); + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long arg_start, arg_end, env_start, env_end; if (!mm) goto out; if (!mm->arg_end) goto out_mm; /* Shh! No looking before we're done */ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); arg_start = mm->arg_start; arg_end = mm->arg_end; env_start = mm->env_start; env_end = mm->env_end; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); len = arg_end - arg_start;
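[Editor's aside, not part of the patch.] The write-side call sites in this patch follow the analogous pattern, including the killable acquisition used on the brk()/munmap()/mmap() paths and the write-to-read downgrade in __do_munmap(). The sketch below only restates the mm_write_lock_killable()/mm_downgrade_write()/mm_write_unlock() usage visible in the hunks above; the function itself is made up for illustration and assumes the range-lock API from earlier in the series.

    /*
     * Illustrative sketch of the write-side conversion pattern. Assumes the
     * range-lock API introduced earlier in the series; not part of the patch.
     */
    #include <linux/mm.h>
    #include <linux/mm_types.h>

    static int modify_mm_example(struct mm_struct *mm, bool downgrade)
    {
            DEFINE_RANGE_LOCK_FULL(mmrange);

            /* before: if (down_write_killable(&mm->mmap_sem)) return -EINTR; */
            if (mm_write_lock_killable(mm, &mmrange))
                    return -EINTR;

            /* ... update the address space under the write lock ... */

            if (downgrade) {
                    /* before: downgrade_write(&mm->mmap_sem); ... up_read(); */
                    mm_downgrade_write(mm, &mmrange);
                    /* ... work that only needs the lock in read mode ... */
                    mm_read_unlock(mm, &mmrange);
            } else {
                    /* before: up_write(&mm->mmap_sem); */
                    mm_write_unlock(mm, &mmrange);
            }
            return 0;
    }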
[195.135.221.5]) by mx.google.com with ESMTPS id w51si844529edc.15.2019.05.20.21.53.41 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 May 2019 21:53:41 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) client-ip=195.135.221.5; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from emea4-mta.ukb.novell.com ([10.120.13.87]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 06:53:41 +0200 Received: from linux-r8p5.suse.de (nwb-a10-snat.microfocus.com [10.120.13.201]) by emea4-mta.ukb.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 05:53:10 +0100 From: Davidlohr Bueso To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso Subject: [PATCH 07/14] fs: teach the mm about range locking Date: Mon, 20 May 2019 21:52:35 -0700 Message-Id: <20190521045242.24378-8-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net> References: <20190521045242.24378-1-dave@stgolabs.net> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Conversion is straightforward, mmap_sem is used within the the same function context most of the time. No change in semantics. Signed-off-by: Davidlohr Bueso --- fs/aio.c | 5 +++-- fs/coredump.c | 5 +++-- fs/exec.c | 19 +++++++++------- fs/io_uring.c | 5 +++-- fs/proc/base.c | 23 ++++++++++++-------- fs/proc/internal.h | 2 ++ fs/proc/task_mmu.c | 32 +++++++++++++++------------ fs/proc/task_nommu.c | 22 +++++++++++-------- fs/userfaultfd.c | 50 ++++++++++++++++++++++++++----------------- include/linux/userfaultfd_k.h | 5 +++-- 10 files changed, 100 insertions(+), 68 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..215d19dbbefa 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -461,6 +461,7 @@ static const struct address_space_operations aio_ctx_aops = { static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct aio_ring *ring; struct mm_struct *mm = current->mm; unsigned long size, unused; @@ -521,7 +522,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_size = nr_pages * PAGE_SIZE; pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size); - if (down_write_killable(&mm->mmap_sem)) { + if (mm_write_lock_killable(mm, &mmrange)) { ctx->mmap_size = 0; aio_free_ring(ctx); return -EINTR; @@ -530,7 +531,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, 0, &unused, NULL); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; aio_free_ring(ctx); diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e55bfd..433713b63187 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -409,6 +409,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, static int coredump_wait(int exit_code, struct core_state *core_state) { + 
DEFINE_RANGE_LOCK_FULL(mmrange); struct task_struct *tsk = current; struct mm_struct *mm = tsk->mm; int core_waiters = -EBUSY; @@ -417,12 +418,12 @@ static int coredump_wait(int exit_code, struct core_state *core_state) core_state->dumper.task = tsk; core_state->dumper.next = NULL; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; if (!mm->core_state) core_waiters = zap_threads(tsk, mm, core_state, exit_code); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (core_waiters > 0) { struct core_thread *ptr; diff --git a/fs/exec.c b/fs/exec.c index e96fd5328739..fbcb36bc4fd1 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -241,6 +241,7 @@ static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos, static int __bprm_mm_init(struct linux_binprm *bprm) { + DEFINE_RANGE_LOCK_FULL(mmrange); int err; struct vm_area_struct *vma = NULL; struct mm_struct *mm = bprm->mm; @@ -250,7 +251,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm) return -ENOMEM; vma_set_anonymous(vma); - if (down_write_killable(&mm->mmap_sem)) { + if (mm_write_lock_killable(mm, &mmrange)) { err = -EINTR; goto err_free; } @@ -273,11 +274,11 @@ static int __bprm_mm_init(struct linux_binprm *bprm) mm->stack_vm = mm->total_vm = 1; arch_bprm_mm_init(mm, vma); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); bprm->p = vma->vm_end - sizeof(void *); return 0; err: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); err_free: bprm->vma = NULL; vm_area_free(vma); @@ -691,6 +692,7 @@ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long ret; unsigned long stack_shift; struct mm_struct *mm = current->mm; @@ -738,7 +740,7 @@ int setup_arg_pages(struct linux_binprm *bprm, bprm->loader -= stack_shift; bprm->exec -= stack_shift; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; vm_flags = VM_STACK_FLAGS; @@ -795,7 +797,7 @@ int setup_arg_pages(struct linux_binprm *bprm, ret = -EFAULT; out_unlock: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return ret; } EXPORT_SYMBOL(setup_arg_pages); @@ -1010,6 +1012,7 @@ static int exec_mmap(struct mm_struct *mm) { struct task_struct *tsk; struct mm_struct *old_mm, *active_mm; + DEFINE_RANGE_LOCK_FULL(mmrange); /* Notify parent that we're no longer interested in the old VM */ tsk = current; @@ -1024,9 +1027,9 @@ static int exec_mmap(struct mm_struct *mm) * through with the exec. We must hold mmap_sem around * checking core_state and changing tsk->mm. 
*/ - down_read(&old_mm->mmap_sem); + mm_read_lock(old_mm, &mmrange); if (unlikely(old_mm->core_state)) { - up_read(&old_mm->mmap_sem); + mm_read_unlock(old_mm, &mmrange); return -EINTR; } } @@ -1039,7 +1042,7 @@ static int exec_mmap(struct mm_struct *mm) vmacache_flush(tsk); task_unlock(tsk); if (old_mm) { - up_read(&old_mm->mmap_sem); + mm_read_unlock(old_mm, &mmrange); BUG_ON(active_mm != old_mm); setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm); mm_update_next_owner(old_mm); diff --git a/fs/io_uring.c b/fs/io_uring.c index e11d77181398..16c06811193b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2597,6 +2597,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, struct page **pages = NULL; int i, j, got_pages = 0; int ret = -EINVAL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (ctx->user_bufs) return -EBUSY; @@ -2671,7 +2672,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, } ret = 0; - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); pret = get_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM, pages, vmas); @@ -2689,7 +2690,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, } else { ret = pret < 0 ? pret : -EFAULT; } - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (ret) { /* * if we did partial map, or found file backed vmas, diff --git a/fs/proc/base.c b/fs/proc/base.c index 9c8ca6cd3ce4..63d0fea104af 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -1962,9 +1962,11 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags) goto out; if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) { - down_read(&mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + + mm_read_lock(mm, &mmrange); exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } mmput(mm); @@ -1995,6 +1997,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path) struct task_struct *task; struct mm_struct *mm; int rc; + DEFINE_RANGE_LOCK_FULL(mmrange); rc = -ENOENT; task = get_proc_task(d_inode(dentry)); @@ -2011,14 +2014,14 @@ static int map_files_get_link(struct dentry *dentry, struct path *path) goto out_mmput; rc = -ENOENT; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_exact_vma(mm, vm_start, vm_end); if (vma && vma->vm_file) { *path = vma->vm_file->f_path; path_get(path); rc = 0; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); out_mmput: mmput(mm); @@ -2089,6 +2092,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir, struct task_struct *task; struct dentry *result; struct mm_struct *mm; + DEFINE_RANGE_LOCK_FULL(mmrange); result = ERR_PTR(-ENOENT); task = get_proc_task(dir); @@ -2107,7 +2111,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir, if (!mm) goto out_put_task; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_exact_vma(mm, vm_start, vm_end); if (!vma) goto out_no_vma; @@ -2117,7 +2121,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir, (void *)(unsigned long)vma->vm_file->f_mode); out_no_vma: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); out_put_task: put_task_struct(task); @@ -2141,6 +2145,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx) GENRADIX(struct map_files_info) fa; struct map_files_info *p; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); genradix_init(&fa); @@ -2160,7 +2165,7 @@ proc_map_files_readdir(struct file *file, 
struct dir_context *ctx) mm = get_task_mm(task); if (!mm) goto out_put_task; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); nr_files = 0; @@ -2183,7 +2188,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx) p = genradix_ptr_alloc(&fa, nr_files++, GFP_KERNEL); if (!p) { ret = -ENOMEM; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); goto out_put_task; } @@ -2192,7 +2197,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx) p->end = vma->vm_end; p->mode = vma->vm_file->f_mode; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); for (i = 0; i < nr_files; i++) { diff --git a/fs/proc/internal.h b/fs/proc/internal.h index d1671e97f7fe..df6f0ec84a8f 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -287,6 +288,7 @@ struct proc_maps_private { #ifdef CONFIG_NUMA struct mempolicy *task_mempolicy; #endif + struct range_lock mmrange; } __randomize_layout; struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode); diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index a1c2ad9f960a..7ab5c6f5b8aa 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -128,7 +128,7 @@ static void vma_stop(struct proc_maps_private *priv) struct mm_struct *mm = priv->mm; release_task_mempolicy(priv); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &priv->mmrange); mmput(mm); } @@ -166,7 +166,9 @@ static void *m_start(struct seq_file *m, loff_t *ppos) if (!mm || !mmget_not_zero(mm)) return NULL; - down_read(&mm->mmap_sem); + range_lock_init_full(&priv->mmrange); + + mm_read_lock(mm, &priv->mmrange); hold_task_mempolicy(priv); priv->tail_vma = get_gate_vma(mm); @@ -828,7 +830,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v) memset(&mss, 0, sizeof(mss)); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &priv->mmrange); hold_task_mempolicy(priv); for (vma = priv->mm->mmap; vma; vma = vma->vm_next) { @@ -844,7 +846,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v) __show_smap(m, &mss); release_task_mempolicy(priv); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &priv->mmrange); mmput(mm); out_put_task: @@ -1080,6 +1082,7 @@ static int clear_refs_test_walk(unsigned long start, unsigned long end, static ssize_t clear_refs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct task_struct *task; char buffer[PROC_NUMBUF]; struct mm_struct *mm; @@ -1118,7 +1121,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, }; if (type == CLEAR_REFS_MM_HIWATER_RSS) { - if (down_write_killable(&mm->mmap_sem)) { + if (mm_write_lock_killable(mm, &mmrange)) { count = -EINTR; goto out_mm; } @@ -1128,18 +1131,18 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, * resident set size to this mm's current rss value. 
*/ reset_mm_hiwater_rss(mm); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); goto out_mm; } - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); tlb_gather_mmu(&tlb, mm, 0, -1); if (type == CLEAR_REFS_SOFT_DIRTY) { for (vma = mm->mmap; vma; vma = vma->vm_next) { if (!(vma->vm_flags & VM_SOFTDIRTY)) continue; - up_read(&mm->mmap_sem); - if (down_write_killable(&mm->mmap_sem)) { + mm_read_unlock(mm, &mmrange); + if (mm_write_lock_killable(mm, &mmrange)) { count = -EINTR; goto out_mm; } @@ -1158,14 +1161,14 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, * failed like if * get_proc_task() fails? */ - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); goto out_mm; } for (vma = mm->mmap; vma; vma = vma->vm_next) { vma->vm_flags &= ~VM_SOFTDIRTY; vma_set_page_prot(vma); } - downgrade_write(&mm->mmap_sem); + mm_downgrade_write(mm, &mmrange); break; } @@ -1177,7 +1180,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, if (type == CLEAR_REFS_SOFT_DIRTY) mmu_notifier_invalidate_range_end(&range); tlb_finish_mmu(&tlb, 0, -1); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); out_mm: mmput(mm); } @@ -1484,6 +1487,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, unsigned long start_vaddr; unsigned long end_vaddr; int ret = 0, copied = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!mm || !mmget_not_zero(mm)) goto out; @@ -1539,9 +1543,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, /* overflow ? */ if (end < start_vaddr || end > end_vaddr) end = end_vaddr; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); ret = walk_page_range(start_vaddr, end, &pagemap_walk); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); start_vaddr = end; len = min(count, PM_ENTRY_BYTES * pm.pos); diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 36bf0f2e102e..32bf2860eff3 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -23,9 +23,10 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) struct vm_area_struct *vma; struct vm_region *region; struct rb_node *p; + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long bytes = 0, sbytes = 0, slack = 0, size; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { vma = rb_entry(p, struct vm_area_struct, vm_rb); @@ -77,7 +78,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm) "Shared:\t%8lu bytes\n", bytes, slack, sbytes); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } unsigned long task_vsize(struct mm_struct *mm) @@ -85,13 +86,14 @@ unsigned long task_vsize(struct mm_struct *mm) struct vm_area_struct *vma; struct rb_node *p; unsigned long vsize = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { vma = rb_entry(p, struct vm_area_struct, vm_rb); vsize += vma->vm_end - vma->vm_start; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return vsize; } @@ -103,8 +105,9 @@ unsigned long task_statm(struct mm_struct *mm, struct vm_region *region; struct rb_node *p; unsigned long size = kobjsize(mm); + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) { vma = rb_entry(p, struct vm_area_struct, vm_rb); size += kobjsize(vma); @@ -119,7 +122,7 @@ unsigned long task_statm(struct mm_struct *mm, >> PAGE_SHIFT; *data = (PAGE_ALIGN(mm->start_stack) - (mm->start_data 
& PAGE_MASK)) >> PAGE_SHIFT; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); size >>= PAGE_SHIFT; size += *text + *data; *resident = size; @@ -201,6 +204,7 @@ static void *m_start(struct seq_file *m, loff_t *pos) struct mm_struct *mm; struct rb_node *p; loff_t n = *pos; + DEFINE_RANGE_LOCK_FULL(mmrange); /* pin the task and mm whilst we play with them */ priv->task = get_proc_task(priv->inode); @@ -211,13 +215,13 @@ static void *m_start(struct seq_file *m, loff_t *pos) if (!mm || !mmget_not_zero(mm)) return NULL; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); /* start from the Nth VMA */ for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) if (n-- == 0) return p; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); return NULL; } @@ -227,7 +231,7 @@ static void m_stop(struct seq_file *m, void *_vml) struct proc_maps_private *priv = m->private; if (!IS_ERR_OR_NULL(_vml)) { - up_read(&priv->mm->mmap_sem); + mm_read_unlock(priv->mm, &mmrange); mmput(priv->mm); } if (priv->task) { diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3b30301c90ec..3592f6d71778 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -220,13 +220,14 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, struct vm_area_struct *vma, unsigned long address, unsigned long flags, - unsigned long reason) + unsigned long reason, + struct range_lock *mmrange) { struct mm_struct *mm = ctx->mm; pte_t *ptep, pte; bool ret = true; - VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); + VM_BUG_ON(!mm_is_locked(mm, mmrange)); ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); @@ -252,7 +253,9 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, struct vm_area_struct *vma, unsigned long address, unsigned long flags, - unsigned long reason) + unsigned long reason, + struct range_lock *mmrange) + { return false; /* should never get here */ } @@ -268,7 +271,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, unsigned long address, unsigned long flags, - unsigned long reason) + unsigned long reason, + struct range_lock *mmrange) { struct mm_struct *mm = ctx->mm; pgd_t *pgd; @@ -278,7 +282,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, pte_t *pte; bool ret = true; - VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); + VM_BUG_ON(!mm_is_locked(mm, mmrange)); pgd = pgd_offset(mm, address); if (!pgd_present(*pgd)) @@ -368,7 +372,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) * Coredumping runs without mmap_sem so we can only check that * the mmap_sem is held, if PF_DUMPCORE was not set. */ - WARN_ON_ONCE(!rwsem_is_locked(&mm->mmap_sem)); + WARN_ON_ONCE(!mm_is_locked(mm, vmf->lockrange)); ctx = vmf->vma->vm_userfaultfd_ctx.ctx; if (!ctx) @@ -476,12 +480,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) if (!is_vm_hugetlb_page(vmf->vma)) must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, - reason); + reason, vmf->lockrange); else must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, vmf->address, - vmf->flags, reason); - up_read(&mm->mmap_sem); + vmf->flags, reason, + vmf->lockrange); + mm_read_unlock(mm, vmf->lockrange); if (likely(must_wait && !READ_ONCE(ctx->released) && (return_to_userland ? 
!signal_pending(current) : @@ -535,7 +540,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) * and there's no need to retake the mmap_sem * in such case. */ - down_read(&mm->mmap_sem); + mm_read_lock(mm, vmf->lockrange); ret = VM_FAULT_NOPAGE; } } @@ -628,9 +633,10 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, if (release_new_ctx) { struct vm_area_struct *vma; struct mm_struct *mm = release_new_ctx->mm; + DEFINE_RANGE_LOCK_FULL(mmrange); /* the various vma->vm_userfaultfd_ctx still points to it */ - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); /* no task can run (and in turn coredump) yet */ VM_WARN_ON(!mmget_still_valid(mm)); for (vma = mm->mmap; vma; vma = vma->vm_next) @@ -638,7 +644,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING); } - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); userfaultfd_ctx_put(release_new_ctx); } @@ -780,7 +786,8 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx, } bool userfaultfd_remove(struct vm_area_struct *vma, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + struct range_lock *mmrange) { struct mm_struct *mm = vma->vm_mm; struct userfaultfd_ctx *ctx; @@ -792,7 +799,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma, userfaultfd_ctx_get(ctx); WRITE_ONCE(ctx->mmap_changing, true); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); msg_init(&ewq.msg); @@ -872,6 +879,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file) /* len == 0 means wake all */ struct userfaultfd_wake_range range = { .len = 0, }; unsigned long new_flags; + DEFINE_RANGE_LOCK_FULL(mmrange); WRITE_ONCE(ctx->released, true); @@ -886,7 +894,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file) * it's critical that released is set to true (above), before * taking the mmap_sem for writing. 
*/ - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); if (!mmget_still_valid(mm)) goto skip_mm; prev = NULL; @@ -912,7 +920,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file) vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } skip_mm: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mmput(mm); wakeup: /* @@ -1299,6 +1307,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, unsigned long vm_flags, new_flags; bool found; bool basic_ioctls; + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long start, end, vma_end; user_uffdio_register = (struct uffdio_register __user *) arg; @@ -1339,7 +1348,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, if (!mmget_not_zero(mm)) goto out; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); if (!mmget_still_valid(mm)) goto out_unlock; vma = find_vma_prev(mm, start, &prev); @@ -1483,7 +1492,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vma = vma->vm_next; } while (vma && vma->vm_start < end); out_unlock: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mmput(mm); if (!ret) { /* @@ -1511,6 +1520,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, bool found; unsigned long start, end, vma_end; const void __user *buf = (void __user *)arg; + DEFINE_RANGE_LOCK_FULL(mmrange); ret = -EFAULT; if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister))) @@ -1528,7 +1538,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, if (!mmget_not_zero(mm)) goto out; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); if (!mmget_still_valid(mm)) goto out_unlock; vma = find_vma_prev(mm, start, &prev); @@ -1645,7 +1655,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, vma = vma->vm_next; } while (vma && vma->vm_start < end); out_unlock: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mmput(mm); out: return ret; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index ac9d71e24b81..c8d3c102ce5e 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -68,7 +68,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *, extern bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, - unsigned long end); + unsigned long end, struct range_lock *mmrange); extern int userfaultfd_unmap_prep(struct vm_area_struct *vma, unsigned long start, unsigned long end, @@ -125,7 +125,8 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx, static inline bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, + struct range_lock *mmrange) { return true; } From patchwork Tue May 21 04:52:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952893 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DBC8F76 for ; Tue, 21 May 2019 04:53:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CC6C3205A4 for ; Tue, 21 May 2019 04:53:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C038E28968; Tue, 21 May 2019 04:53:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org 
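
One place in the fs/ conversion above where the lock and unlock do not sit in the same function is the /proc/<pid>/maps seq_file iterator: m_start() takes the lock and vma_stop() drops it, so the range request has to live in proc_maps_private rather than on the stack. The sketch below is distilled from the fs/proc/task_mmu.c hunks earlier in that patch; it is illustrative only, with details elided, and is not a standalone compilable unit.

/* Sketch: the lock request lives in the iterator's private state. */
struct proc_maps_private {
        struct mm_struct *mm;
        struct range_lock mmrange;      /* shared by m_start() and vma_stop() */
        /* ... */
};

static void *m_start(struct seq_file *m, loff_t *ppos)
{
        struct proc_maps_private *priv = m->private;
        struct mm_struct *mm = priv->mm;

        range_lock_init_full(&priv->mmrange);   /* re-arm the full-range request */
        mm_read_lock(mm, &priv->mmrange);
        /* ... locate the VMA for *ppos ... */
        return NULL;                            /* simplified */
}

static void vma_stop(struct proc_maps_private *priv)
{
        struct mm_struct *mm = priv->mm;

        release_task_mempolicy(priv);
        mm_read_unlock(mm, &priv->mmrange);     /* same request object */
        mmput(mm);
}
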
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 08/14] arch/x86: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:36 -0700
Message-Id: <20190521045242.24378-9-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time. No change in semantics.
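
The one spot that is not a purely mechanical substitution is the page fault path, which keeps its trylock fast path before falling back to a sleeping acquisition. Roughly, the converted flow looks like the sketch below (a fragment with details elided, using only the wrapper names that appear in the hunks; the full do_user_addr_fault() hunk follows further down):

        /* inside do_user_addr_fault(regs, hw_error_code, address): */
        struct mm_struct *mm = current->mm;
        DEFINE_RANGE_LOCK_FULL(mmrange);        /* full range: same coverage as mmap_sem */

        if (unlikely(!mm_read_trylock(mm, &mmrange))) {
                if (!user_mode(regs) && !search_exception_tables(regs->ip))
                        return;                 /* no fixup entry: bail without the lock */
                mm_read_lock(mm, &mmrange);     /* slow path: sleep for the lock */
        }

        /* ... find_vma(), access checks, handle_mm_fault() ... */

        mm_read_unlock(mm, &mmrange);
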
Signed-off-by: Davidlohr Bueso --- arch/x86/entry/vdso/vma.c | 12 +++++++----- arch/x86/kernel/vm86_32.c | 5 +++-- arch/x86/kvm/paging_tmpl.h | 9 +++++---- arch/x86/mm/debug_pagetables.c | 8 ++++---- arch/x86/mm/fault.c | 8 ++++---- arch/x86/mm/mpx.c | 15 +++++++++------ arch/x86/um/vdso/vma.c | 5 +++-- 7 files changed, 35 insertions(+), 27 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index babc4e7a519c..f6d8950f37b8 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -145,12 +145,13 @@ static const struct vm_special_mapping vvar_mapping = { */ static int map_vdso(const struct vdso_image *image, unsigned long addr) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; unsigned long text_start; int ret = 0; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; addr = get_unmapped_area(NULL, addr, @@ -193,7 +194,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr) } up_fail: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return ret; } @@ -254,8 +255,9 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); /* * Check if we have already mapped vdso blob - fail to prevent * abusing from userspace install_speciall_mapping, which may @@ -266,11 +268,11 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr) for (vma = mm->mmap; vma; vma = vma->vm_next) { if (vma_is_special_mapping(vma, &vdso_mapping) || vma_is_special_mapping(vma, &vvar_mapping)) { - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return -EEXIST; } } - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return map_vdso(image, addr); } diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index 6a38717d179c..39eecee07dcd 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -171,8 +171,9 @@ static void mark_screen_rdonly(struct mm_struct *mm) pmd_t *pmd; pte_t *pte; int i; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); pgd = pgd_offset(mm, 0xA0000); if (pgd_none_or_clear_bad(pgd)) goto out; @@ -198,7 +199,7 @@ static void mark_screen_rdonly(struct mm_struct *mm) } pte_unmap_unlock(pte, ptl); out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, PAGE_SHIFT, false); } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 367a47df4ba0..347d3ba41974 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -152,23 +152,24 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK; unsigned long pfn; unsigned long paddr; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE); if (!vma || !(vma->vm_flags & VM_PFNMAP)) { - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return -EFAULT; } pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; paddr = pfn << PAGE_SHIFT; table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB); if (!table) { - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return -EFAULT; } ret = 
CMPXCHG(&table[index], orig_pte, new_pte); memunmap(table); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } return (ret != orig_pte); diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c index cd84f067e41d..0d131edc6a75 100644 --- a/arch/x86/mm/debug_pagetables.c +++ b/arch/x86/mm/debug_pagetables.c @@ -15,9 +15,9 @@ DEFINE_SHOW_ATTRIBUTE(ptdump); static int ptdump_curknl_show(struct seq_file *m, void *v) { if (current->mm->pgd) { - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } return 0; } @@ -30,9 +30,9 @@ static struct dentry *pe_curusr; static int ptdump_curusr_show(struct seq_file *m, void *v) { if (current->mm->pgd) { - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } return 0; } diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fb869c292b91..fbb060c89e7d 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -946,7 +946,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code, * Something tried to access memory that isn't in our memory map.. * Fix it, but check if it's kernel or user first.. */ - up_read(&mm->mmap_sem); + mm_read_unlock(mm, mmrange); __bad_area_nosemaphore(regs, error_code, address, pkey, si_code); } @@ -1399,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs, * 1. Failed to acquire mmap_sem, and * 2. The access did not originate in userspace. */ - if (unlikely(!down_read_trylock(&mm->mmap_sem))) { + if (unlikely(!mm_read_trylock(mm, &mmrange))) { if (!user_mode(regs) && !search_exception_tables(regs->ip)) { /* * Fault from code in kernel from @@ -1409,7 +1409,7 @@ void do_user_addr_fault(struct pt_regs *regs, return; } retry: - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); } else { /* * The above down_read_trylock() might have succeeded in @@ -1485,7 +1485,7 @@ void do_user_addr_fault(struct pt_regs *regs, return; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (unlikely(fault & VM_FAULT_ERROR)) { mm_fault_error(regs, hw_error_code, address, fault); return; diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c index 0d1c47cbbdd6..5f0a4af29920 100644 --- a/arch/x86/mm/mpx.c +++ b/arch/x86/mm/mpx.c @@ -46,16 +46,17 @@ static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm) static unsigned long mpx_mmap(unsigned long len) { struct mm_struct *mm = current->mm; + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long addr, populate; /* Only bounds table can be allocated here */ if (len != mpx_bt_size_bytes(mm)) return -EINVAL; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (populate) mm_populate(addr, populate); @@ -214,6 +215,7 @@ int mpx_enable_management(void) void __user *bd_base = MPX_INVALID_BOUNDS_DIR; struct mm_struct *mm = current->mm; int ret = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * runtime in the userspace will be responsible for allocation of @@ -227,7 +229,7 @@ int mpx_enable_management(void) * unmap path; we can just use mm->context.bd_addr instead. 
*/ bd_base = mpx_get_bounds_dir(); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); /* MPX doesn't support addresses above 47 bits yet. */ if (find_vma(mm, DEFAULT_MAP_WINDOW)) { @@ -241,20 +243,21 @@ int mpx_enable_management(void) if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR) ret = -ENXIO; out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return ret; } int mpx_disable_management(void) { struct mm_struct *mm = current->mm; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!cpu_feature_enabled(X86_FEATURE_MPX)) return -ENXIO; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR; - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return 0; } diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c index 6be22f991b59..d65d82b967c7 100644 --- a/arch/x86/um/vdso/vma.c +++ b/arch/x86/um/vdso/vma.c @@ -55,13 +55,14 @@ subsys_initcall(init_vdso); int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) { + DEFINE_RANGE_LOCK_FULL(mmrange); int err; struct mm_struct *mm = current->mm; if (!vdso_enabled) return 0; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; err = install_special_mapping(mm, um_vdso_addr, PAGE_SIZE, @@ -69,7 +70,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, vdsop); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return err; } From patchwork Tue May 21 04:52:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952899 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8552C13AD for ; Tue, 21 May 2019 04:53:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7535C28848 for ; Tue, 21 May 2019 04:53:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6949828928; Tue, 21 May 2019 04:53:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C1F592893D for ; Tue, 21 May 2019 04:53:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0390A6B026D; Tue, 21 May 2019 00:53:46 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E8DCC6B026E; Tue, 21 May 2019 00:53:45 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D556E6B026F; Tue, 21 May 2019 00:53:45 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by kanga.kvack.org (Postfix) with ESMTP id 816136B026D for ; Tue, 21 May 2019 00:53:45 -0400 (EDT) Received: by mail-ed1-f72.google.com with SMTP id c26so28668891eda.15 for ; Mon, 20 May 2019 21:53:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
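
The MPX and vDSO hunks above follow the other common shape in this patch: take the write lock (killable where the caller can back out), do the mapping work, drop the lock, and only then populate. The sketch below is distilled from the mpx_mmap() hunk above, with the length check and error handling elided.

static unsigned long mpx_mmap_sketch(unsigned long len)
{
        struct mm_struct *mm = current->mm;
        DEFINE_RANGE_LOCK_FULL(mmrange);
        unsigned long addr, populate;

        mm_write_lock(mm, &mmrange);
        addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL);
        mm_write_unlock(mm, &mmrange);          /* drop before populating */

        if (populate)
                mm_populate(addr, populate);

        return addr;
}
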
[195.135.221.5]) by mx.google.com with ESMTPS id k35si4062594edd.39.2019.05.20.21.53.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 May 2019 21:53:43 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) client-ip=195.135.221.5; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning dave@stgolabs.net does not designate 195.135.221.5 as permitted sender) smtp.mailfrom=dave@stgolabs.net Received: from emea4-mta.ukb.novell.com ([10.120.13.87]) by smtp.nue.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 06:53:42 +0200 Received: from linux-r8p5.suse.de (nwb-a10-snat.microfocus.com [10.120.13.201]) by emea4-mta.ukb.novell.com with ESMTP (TLS encrypted); Tue, 21 May 2019 05:53:15 +0100 From: Davidlohr Bueso To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso Subject: [PATCH 09/14] virt: teach the mm about range locking Date: Mon, 20 May 2019 21:52:37 -0700 Message-Id: <20190521045242.24378-10-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net> References: <20190521045242.24378-1-dave@stgolabs.net> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Conversion is straightforward, mmap_sem is used within the the same function context most of the time. No change in semantics. Signed-off-by: Davidlohr Bueso --- virt/kvm/arm/mmu.c | 17 ++++++++++------- virt/kvm/async_pf.c | 4 ++-- virt/kvm/kvm_main.c | 11 ++++++----- 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 74b6582eaa3c..85f8b9ccfabe 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -980,9 +980,10 @@ void stage2_unmap_vm(struct kvm *kvm) struct kvm_memslots *slots; struct kvm_memory_slot *memslot; int idx; + DEFINE_RANGE_LOCK_FULL(mmrange); idx = srcu_read_lock(&kvm->srcu); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); spin_lock(&kvm->mmu_lock); slots = kvm_memslots(kvm); @@ -990,7 +991,7 @@ void stage2_unmap_vm(struct kvm *kvm) stage2_unmap_memslot(kvm, memslot); spin_unlock(&kvm->mmu_lock); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); srcu_read_unlock(&kvm->srcu, idx); } @@ -1688,6 +1689,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); @@ -1700,11 +1702,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } /* Let's check if we will get back a huge page backed by hugetlbfs */ - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); vma = find_vma_intersection(current->mm, hva, hva + 1); if (unlikely(!vma)) { kvm_err("Failed to find VMA for hva 0x%lx\n", hva); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return -EFAULT; } @@ -1725,7 +1727,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (vma_pagesize == PMD_SIZE || (vma_pagesize == PUD_SIZE && 
kvm_stage2_has_pmd(kvm))) gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); /* We need minimum second+third level pages */ ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm), @@ -2280,6 +2282,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, hva_t reg_end = hva + mem->memory_size; bool writable = !(mem->flags & KVM_MEM_READONLY); int ret = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); if (change != KVM_MR_CREATE && change != KVM_MR_MOVE && change != KVM_MR_FLAGS_ONLY) @@ -2293,7 +2296,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, (kvm_phys_size(kvm) >> PAGE_SHIFT)) return -EFAULT; - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); /* * A memory region could potentially cover multiple VMAs, and any holes * between them, so iterate over all of them to find out if we can map @@ -2361,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, stage2_flush_memslot(kvm, memslot); spin_unlock(&kvm->mmu_lock); out: - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return ret; } diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index e93cd8515134..03d9f9bc5270 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -87,11 +87,11 @@ static void async_pf_execute(struct work_struct *work) * mm and might be done in another context, so we must * access remotely. */ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL, &locked, &mmrange); if (locked) - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); kvm_async_page_present_sync(vcpu, apf); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e1484150a3dd..421652e66a03 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1331,6 +1331,7 @@ EXPORT_SYMBOL_GPL(kvm_is_visible_gfn); unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long addr, size; size = PAGE_SIZE; @@ -1339,7 +1340,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) if (kvm_is_error_hva(addr)) return PAGE_SIZE; - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); vma = find_vma(current->mm, addr); if (!vma) goto out; @@ -1347,7 +1348,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn) size = vma_kernel_pagesize(vma); out: - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return size; } @@ -1588,8 +1589,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, { struct vm_area_struct *vma; kvm_pfn_t pfn = 0; - int npages, r; DEFINE_RANGE_LOCK_FULL(mmrange); + int npages, r; /* we can do it either atomically or asynchronously, not both */ BUG_ON(atomic && async); @@ -1604,7 +1605,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (npages == 1) return pfn; - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); if (npages == -EHWPOISON || (!async && check_user_page_hwpoison(addr))) { pfn = KVM_PFN_ERR_HWPOISON; @@ -1629,7 +1630,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, pfn = KVM_PFN_ERR_FAULT; } exit: - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return pfn; } From patchwork Tue May 21 04:52:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952903 Return-Path: 
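
The KVM conversions above are all short read-side critical sections: take the lock, look up the VMA for a host virtual address, read what is needed, drop the lock. The sketch below is distilled from the kvm_host_page_size() hunk above (illustrative only, error handling elided).

static unsigned long host_page_size_sketch(unsigned long addr)
{
        struct vm_area_struct *vma;
        DEFINE_RANGE_LOCK_FULL(mmrange);
        unsigned long size = PAGE_SIZE;

        mm_read_lock(current->mm, &mmrange);
        vma = find_vma(current->mm, addr);
        if (vma)
                size = vma_kernel_pagesize(vma);
        mm_read_unlock(current->mm, &mmrange);

        return size;
}
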
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 10/14] net: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:38 -0700
Message-Id: <20190521045242.24378-11-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time. No change in semantics.
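
Both converted sites hold the lock only around the page pinning or VMA walk itself; get_user_pages() is unchanged. The sketch below is distilled from the xdp_umem_pin_pages() hunk that follows, with the unpin-on-partial-failure handling elided.

static int pin_pages_sketch(struct xdp_umem *umem)
{
        DEFINE_RANGE_LOCK_FULL(mmrange);
        long npgs;

        umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
                            GFP_KERNEL | __GFP_NOWARN);
        if (!umem->pgs)
                return -ENOMEM;

        mm_read_lock(current->mm, &mmrange);
        npgs = get_user_pages(umem->address, umem->npgs,
                              FOLL_WRITE | FOLL_LONGTERM, &umem->pgs[0], NULL);
        mm_read_unlock(current->mm, &mmrange);

        return npgs == umem->npgs ? 0 : -EFAULT;
}
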
Signed-off-by: Davidlohr Bueso --- net/ipv4/tcp.c | 5 +++-- net/xdp/xdp_umem.c | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 53d61ca3ac4b..2be929dcafa8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1731,6 +1731,7 @@ static int tcp_zerocopy_receive(struct sock *sk, struct tcp_sock *tp; int inq; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); if (address & (PAGE_SIZE - 1) || address != zc->address) return -EINVAL; @@ -1740,7 +1741,7 @@ static int tcp_zerocopy_receive(struct sock *sk, sock_rps_record_flow(sk); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); ret = -EINVAL; vma = find_vma(current->mm, address); @@ -1802,7 +1803,7 @@ static int tcp_zerocopy_receive(struct sock *sk, frags++; } out: - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (length) { tp->copied_seq = seq; tcp_rcv_space_adjust(sk); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 2b18223e7eb8..2bf444fb998d 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -246,16 +246,17 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem) unsigned int gup_flags = FOLL_WRITE; long npgs; int err; + DEFINE_RANGE_LOCK_FULL(mmrange); umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL | __GFP_NOWARN); if (!umem->pgs) return -ENOMEM; - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); npgs = get_user_pages(umem->address, umem->npgs, gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (npgs != umem->npgs) { if (npgs >= 0) { From patchwork Tue May 21 04:52:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952905 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB9FC14C0 for ; Tue, 21 May 2019 04:54:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9C98428852 for ; Tue, 21 May 2019 04:54:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8FEB12896F; Tue, 21 May 2019 04:54:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2792C2893D for ; Tue, 21 May 2019 04:54:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F2B3F6B0271; Tue, 21 May 2019 00:53:49 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EDA446B0273; Tue, 21 May 2019 00:53:49 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC88E6B0274; Tue, 21 May 2019 00:53:49 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by kanga.kvack.org (Postfix) with ESMTP id 89A706B0271 for ; Tue, 21 May 2019 00:53:49 -0400 (EDT) Received: by mail-ed1-f69.google.com with SMTP id f41so28736700ede.1 for ; Mon, 20 May 2019 21:53:49 -0700 (PDT) 
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 11/14] ipc: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:39 -0700
Message-Id: <20190521045242.24378-12-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso --- ipc/shm.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/ipc/shm.c b/ipc/shm.c index ce1ca9f7c6e9..3666fa71bfc2 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -1418,6 +1418,7 @@ COMPAT_SYSCALL_DEFINE3(old_shmctl, int, shmid, int, cmd, void __user *, uptr) long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr, unsigned long shmlba) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct shmid_kernel *shp; unsigned long addr = (unsigned long)shmaddr; unsigned long size; @@ -1544,7 +1545,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, if (err) goto out_fput; - if (down_write_killable(&current->mm->mmap_sem)) { + if (mm_write_lock_killable(current->mm, &mmrange)) { err = -EINTR; goto out_fput; } @@ -1564,7 +1565,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, if (IS_ERR_VALUE(addr)) err = (long)addr; invalid: - up_write(&current->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); if (populate) mm_populate(addr, populate); @@ -1625,6 +1626,7 @@ COMPAT_SYSCALL_DEFINE3(shmat, int, shmid, compat_uptr_t, shmaddr, int, shmflg) */ long ksys_shmdt(char __user *shmaddr) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; struct vm_area_struct *vma; unsigned long addr = (unsigned long)shmaddr; @@ -1638,7 +1640,7 @@ long ksys_shmdt(char __user *shmaddr) if (addr & ~PAGE_MASK) return retval; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; /* @@ -1726,7 +1728,7 @@ long ksys_shmdt(char __user *shmaddr) #endif - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return retval; }
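A condensed view of the write-side pattern applied above in do_shmat() and ksys_shmdt(), assuming the mm_write_lock_killable() and mm_write_unlock() wrappers from earlier in the series; shm_attach_sketch() is a made-up illustration, not code from this patch:

    static long shm_attach_sketch(struct mm_struct *mm)
    {
            DEFINE_RANGE_LOCK_FULL(mmrange);        /* full address space */
            long err = 0;

            /* Killable write lock, exactly like down_write_killable() before. */
            if (mm_write_lock_killable(mm, &mmrange))
                    return -EINTR;

            /* ... attach or detach the segment, do_mmap()/do_munmap() style ... */

            mm_write_unlock(mm, &mmrange);
            return err;
    }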
From patchwork Tue May 21 04:52:40 2019
X-Patchwork-Submitter: Davidlohr Bueso
X-Patchwork-Id: 10952901
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 12/14] kernel: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:40 -0700
Message-Id: <20190521045242.24378-13-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time. No change in semantics.
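Besides plain read and write locks, this patch also converts the rarer flavors. A minimal sketch of the trylock case, assuming the wrappers from earlier in the series; scan_mm_sketch() is a made-up illustration, not kernel code:

    static void scan_mm_sketch(struct mm_struct *mm)
    {
            DEFINE_RANGE_LOCK_FULL(mmrange);

            /* Opportunistic read lock, as in task_numa_work() and the BPF
             * stackmap build-id path: back off instead of sleeping.
             */
            if (!mm_read_trylock(mm, &mmrange))
                    return;

            /* ... walk mm->mmap ... */

            mm_read_unlock(mm, &mmrange);
    }

dup_mmap() in kernel/fork.c likewise mirrors the old down_write_nested() call with mm_write_lock_nested(mm, &mmrange, SINGLE_DEPTH_NESTING), since the new child mm is not yet visible to other threads.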
Signed-off-by: Davidlohr Bueso --- kernel/acct.c | 5 +++-- kernel/bpf/stackmap.c | 7 +++++-- kernel/events/core.c | 5 +++-- kernel/events/uprobes.c | 20 ++++++++++++-------- kernel/exit.c | 9 +++++---- kernel/fork.c | 16 ++++++++++------ kernel/futex.c | 5 +++-- kernel/sched/fair.c | 5 +++-- kernel/sys.c | 22 +++++++++++++--------- kernel/trace/trace_output.c | 5 +++-- 10 files changed, 60 insertions(+), 39 deletions(-) diff --git a/kernel/acct.c b/kernel/acct.c index 81f9831a7859..2bbcecbd78ef 100644 --- a/kernel/acct.c +++ b/kernel/acct.c @@ -538,14 +538,15 @@ void acct_collect(long exitcode, int group_dead) if (group_dead && current->mm) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); vma = current->mm->mmap; while (vma) { vsize += vma->vm_end - vma->vm_start; vma = vma->vm_next; } - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } spin_lock_irq(¤t->sighand->siglock); diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 950ab2f28922..fdb352bea7e8 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -37,6 +37,7 @@ struct bpf_stack_map { struct stack_map_irq_work { struct irq_work irq_work; struct rw_semaphore *sem; + struct range_lock *mmrange; }; static void do_up_read(struct irq_work *entry) @@ -291,6 +292,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, struct vm_area_struct *vma; bool irq_work_busy = false; struct stack_map_irq_work *work = NULL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (in_nmi()) { work = this_cpu_ptr(&up_read_work); @@ -309,7 +311,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, * with build_id. */ if (!user || !current || !current->mm || irq_work_busy || - down_read_trylock(¤t->mm->mmap_sem) == 0) { + mm_read_trylock(current->mm, &mmrange) == 0) { /* cannot access current->mm, fall back to ips */ for (i = 0; i < trace_nr; i++) { id_offs[i].status = BPF_STACK_BUILD_ID_IP; @@ -334,9 +336,10 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, } if (!work) { - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } else { work->sem = ¤t->mm->mmap_sem; + work->mmrange = &mmrange; irq_work_queue(&work->irq_work); /* * The irq_work will release the mmap_sem with diff --git a/kernel/events/core.c b/kernel/events/core.c index abbd4b3b96c2..3b43cfe63b54 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9079,6 +9079,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event) struct mm_struct *mm = NULL; unsigned int count = 0; unsigned long flags; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * We may observe TASK_TOMBSTONE, which means that the event tear-down @@ -9092,7 +9093,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event) if (!mm) goto restart; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); } raw_spin_lock_irqsave(&ifh->lock, flags); @@ -9118,7 +9119,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event) raw_spin_unlock_irqrestore(&ifh->lock, flags); if (ifh->nr_file_filters) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 3689eceb8d0c..6779c237799a 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -997,6 +997,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) bool is_register = !!new; struct map_info *info; int err = 0; + 
DEFINE_RANGE_LOCK_FULL(mmrange); percpu_down_write(&dup_mmap_sem); info = build_map_info(uprobe->inode->i_mapping, @@ -1013,7 +1014,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) if (err && is_register) goto free; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); vma = find_vma(mm, info->vaddr); if (!vma || !valid_vma(vma, is_register) || file_inode(vma->vm_file) != uprobe->inode) @@ -1035,7 +1036,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) } unlock: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); free: mmput(mm); info = free_map_info(info); @@ -1189,8 +1190,9 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; int err = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) { unsigned long vaddr; loff_t offset; @@ -1207,7 +1209,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) vaddr = offset_to_vaddr(vma, uprobe->offset); err |= remove_breakpoint(uprobe, mm, vaddr); } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return err; } @@ -1391,10 +1393,11 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon /* Slot allocation for XOL */ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct vm_area_struct *vma; int ret; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return -EINTR; if (mm->uprobes_state.xol_area) { @@ -1424,7 +1427,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) /* pairs with get_xol_area() */ smp_store_release(&mm->uprobes_state.xol_area, area); /* ^^^ */ fail: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return ret; } @@ -1993,8 +1996,9 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp) struct mm_struct *mm = current->mm; struct uprobe *uprobe = NULL; struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, bp_vaddr); if (vma && vma->vm_start <= bp_vaddr) { if (valid_vma(vma, false)) { @@ -2012,7 +2016,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp) if (!uprobe && test_and_clear_bit(MMF_RECALC_UPROBES, &mm->flags)) mmf_recalc_uprobes(mm); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return uprobe; } diff --git a/kernel/exit.c b/kernel/exit.c index 8361a560cd1d..79bc5ec20694 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -497,6 +497,7 @@ static void exit_mm(void) { struct mm_struct *mm = current->mm; struct core_state *core_state; + DEFINE_RANGE_LOCK_FULL(mmrange); mm_release(current, mm); if (!mm) @@ -509,12 +510,12 @@ static void exit_mm(void) * will increment ->nr_threads for each thread in the * group with ->mm != NULL. 
*/ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); core_state = mm->core_state; if (core_state) { struct core_thread self; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); self.task = current; self.next = xchg(&core_state->dumper.next, &self); @@ -532,14 +533,14 @@ static void exit_mm(void) freezable_schedule(); } __set_current_state(TASK_RUNNING); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); } mmgrab(mm); BUG_ON(mm != current->active_mm); /* more a memory barrier than a real lock */ task_lock(current); current->mm = NULL; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); enter_lazy_tlb(mm, current); task_unlock(current); mm_update_next_owner(mm); diff --git a/kernel/fork.c b/kernel/fork.c index 45fde571c5dd..cc24e3690532 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -468,10 +468,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, struct rb_node **rb_link, *rb_parent; int retval; unsigned long charge; + DEFINE_RANGE_LOCK_FULL(old_mmrange); + DEFINE_RANGE_LOCK_FULL(mmrange); LIST_HEAD(uf); uprobe_start_dup_mmap(); - if (down_write_killable(&oldmm->mmap_sem)) { + if (mm_write_lock_killable(oldmm, &old_mmrange)) { retval = -EINTR; goto fail_uprobe_end; } @@ -480,7 +482,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, /* * Not linked in yet - no deadlock potential: */ - down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING); + mm_write_lock_nested(mm, &mmrange, SINGLE_DEPTH_NESTING); /* No ordering required: file already has been exposed. */ RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); @@ -595,9 +597,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, /* a new mm has just been created */ retval = arch_dup_mmap(oldmm, mm); out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); flush_tlb_mm(oldmm); - up_write(&oldmm->mmap_sem); + mm_write_unlock(oldmm, &old_mmrange); dup_userfaultfd_complete(&uf); fail_uprobe_end: uprobe_end_dup_mmap(); @@ -627,9 +629,11 @@ static inline void mm_free_pgd(struct mm_struct *mm) #else static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) { - down_write(&oldmm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + + mm_write_lock(oldmm, &mmrange); RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm)); - up_write(&oldmm->mmap_sem); + mm_write_unlock(oldmm, &mmrange); return 0; } #define mm_alloc_pgd(mm) (0) diff --git a/kernel/futex.c b/kernel/futex.c index 4615f9371a6f..53829040791b 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -730,11 +730,12 @@ static int fault_in_user_writeable(u32 __user *uaddr) { struct mm_struct *mm = current->mm; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); ret = fixup_user_fault(current, mm, (unsigned long)uaddr, FAULT_FLAG_WRITE, NULL, NULL); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret < 0 ? 
ret : 0; } diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f35930f5e528..222b554bf928 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2461,6 +2461,7 @@ void task_numa_work(struct callback_head *work) struct vm_area_struct *vma; unsigned long start, end; unsigned long nr_pte_updates = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); long pages, virtpages; SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work)); @@ -2512,7 +2513,7 @@ void task_numa_work(struct callback_head *work) return; - if (!down_read_trylock(&mm->mmap_sem)) + if (!mm_read_trylock(mm, &mmrange)) return; vma = find_vma(mm, start); if (!vma) { @@ -2580,7 +2581,7 @@ void task_numa_work(struct callback_head *work) mm->numa_scan_offset = start; else reset_ptenuma_scan(p); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); /* * Make sure tasks use at least 32x as much time to run other code diff --git a/kernel/sys.c b/kernel/sys.c index bdbfe8d37418..c769293f8a79 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1825,6 +1825,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) struct file *old_exe, *exe_file; struct inode *inode; int err; + DEFINE_RANGE_LOCK_FULL(mmrange); exe = fdget(fd); if (!exe.file) @@ -1853,7 +1854,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) if (exe_file) { struct vm_area_struct *vma; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) { if (!vma->vm_file) continue; @@ -1862,7 +1863,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) goto exit_err; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); fput(exe_file); } @@ -1876,7 +1877,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) fdput(exe); return err; exit_err: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); fput(exe_file); goto exit; } @@ -1979,6 +1980,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data unsigned long user_auxv[AT_VECTOR_SIZE]; struct mm_struct *mm = current->mm; int error; + DEFINE_RANGE_LOCK_FULL(mmrange); BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); BUILD_BUG_ON(sizeof(struct prctl_mm_map) > 256); @@ -2019,7 +2021,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data * arg_lock protects concurent updates but we still need mmap_sem for * read to exclude races with sys_brk. 
*/ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); /* * We don't validate if these members are pointing to @@ -2058,7 +2060,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data if (prctl_map.auxv_size) memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return 0; } #endif /* CONFIG_CHECKPOINT_RESTORE */ @@ -2100,6 +2102,7 @@ static int prctl_set_mm(int opt, unsigned long addr, struct prctl_mm_map prctl_map; struct vm_area_struct *vma; int error; + DEFINE_RANGE_LOCK_FULL(mmrange); if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && opt != PR_SET_MM_MAP && @@ -2125,7 +2128,7 @@ static int prctl_set_mm(int opt, unsigned long addr, error = -EINVAL; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); vma = find_vma(mm, addr); prctl_map.start_code = mm->start_code; @@ -2218,7 +2221,7 @@ static int prctl_set_mm(int opt, unsigned long addr, error = 0; out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return error; } @@ -2266,6 +2269,7 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct task_struct *me = current; unsigned char comm[sizeof(me->comm)]; long error; @@ -2441,13 +2445,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_SET_THP_DISABLE: if (arg3 || arg4 || arg5) return -EINVAL; - if (down_write_killable(&me->mm->mmap_sem)) + if (mm_write_lock_killable(me->mm, &mmrange)) return -EINTR; if (arg2) set_bit(MMF_DISABLE_THP, &me->mm->flags); else clear_bit(MMF_DISABLE_THP, &me->mm->flags); - up_write(&me->mm->mmap_sem); + mm_write_unlock(me->mm, &mmrange); break; case PR_MPX_ENABLE_MANAGEMENT: if (arg2 || arg3 || arg4 || arg5) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 54373d93e251..0dbdab621f17 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -377,8 +377,9 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm, if (mm) { const struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, ip); if (vma) { file = vma->vm_file; @@ -390,7 +391,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm, trace_seq_printf(s, "[+0x%lx]", ip - vmstart); } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file)) trace_seq_printf(s, " <" IP_FMT ">", ip); From patchwork Tue May 21 04:52:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952907 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9D0F113AD for ; Tue, 21 May 2019 04:54:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 897DC2881A for ; Tue, 21 May 2019 04:54:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7D8A22896D; Tue, 21 May 2019 04:54:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 
From: Davidlohr Bueso
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso
Subject: [PATCH 13/14] drivers: teach the mm about range locking
Date: Mon, 20 May 2019 21:52:41 -0700
Message-Id: <20190521045242.24378-14-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>
References: <20190521045242.24378-1-dave@stgolabs.net>

Conversion is straightforward: mmap_sem is used within the same function context most of the time. No change in semantics.
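The dominant pattern in the driver conversions below is pinning user pages under the read lock. A minimal sketch, with pin_user_buffer_sketch() as a made-up illustration (the range-lock wrappers are assumed from earlier in the series):

    static long pin_user_buffer_sketch(unsigned long start, unsigned long npages,
                                       struct page **pages)
    {
            long pinned;
            DEFINE_RANGE_LOCK_FULL(mmrange);        /* full range, mmap_sem-equivalent */

            mm_read_lock(current->mm, &mmrange);
            pinned = get_user_pages(start, npages, FOLL_WRITE, pages, NULL);
            mm_read_unlock(current->mm, &mmrange);

            return pinned;
    }

A few fault-path helpers (ttm_bo_vm.c here) instead drop a lock that was taken earlier by the page-fault code, using the range stashed in the vm_fault (vmf->lockrange in this series) rather than a local one.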
Signed-off-by: Davidlohr Bueso --- drivers/android/binder_alloc.c | 7 ++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 ++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 +++++---- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +++-- drivers/gpu/drm/i915/i915_gem.c | 5 +++-- drivers/gpu/drm/i915/i915_gem_userptr.c | 11 +++++++---- drivers/gpu/drm/nouveau/nouveau_svm.c | 23 ++++++++++++++--------- drivers/gpu/drm/radeon/radeon_cs.c | 5 +++-- drivers/gpu/drm/radeon/radeon_gem.c | 8 +++++--- drivers/gpu/drm/radeon/radeon_mn.c | 7 ++++--- drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 ++-- drivers/infiniband/core/umem.c | 7 ++++--- drivers/infiniband/core/umem_odp.c | 12 +++++++----- drivers/infiniband/core/uverbs_main.c | 5 +++-- drivers/infiniband/hw/mlx4/mr.c | 5 +++-- drivers/infiniband/hw/qib/qib_user_pages.c | 7 ++++--- drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +++-- drivers/iommu/amd_iommu_v2.c | 4 ++-- drivers/iommu/intel-svm.c | 4 ++-- drivers/media/v4l2-core/videobuf-core.c | 5 +++-- drivers/media/v4l2-core/videobuf-dma-contig.c | 5 +++-- drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++-- drivers/misc/cxl/cxllib.c | 5 +++-- drivers/misc/cxl/fault.c | 5 +++-- drivers/misc/sgi-gru/grufault.c | 20 ++++++++++++-------- drivers/misc/sgi-gru/grufile.c | 5 +++-- drivers/misc/sgi-gru/grukservices.c | 4 +++- drivers/misc/sgi-gru/grumain.c | 6 ++++-- drivers/misc/sgi-gru/grutables.h | 5 ++++- drivers/oprofile/buffer_sync.c | 12 +++++++----- drivers/staging/kpc2000/kpc_dma/fileops.c | 5 +++-- drivers/tee/optee/call.c | 5 +++-- drivers/vfio/vfio_iommu_type1.c | 9 +++++---- drivers/xen/gntdev.c | 5 +++-- drivers/xen/privcmd.c | 17 ++++++++++------- include/linux/hmm.h | 7 ++++--- 37 files changed, 160 insertions(+), 109 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index bb929eb87116..0b9cd9becd76 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -195,6 +195,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, struct vm_area_struct *vma = NULL; struct mm_struct *mm = NULL; bool need_mm = false; + DEFINE_RANGE_LOCK_FULL(mmrange); binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: %s pages %pK-%pK\n", alloc->pid, @@ -220,7 +221,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, mm = alloc->vma_vm_mm; if (mm) { - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = alloc->vma; } @@ -279,7 +280,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, /* vm_insert_page does not seem to increment the refcount */ } if (mm) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } return 0; @@ -310,7 +311,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, } err_no_vma: if (mm) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } return vma ? 
-ENOMEM : -ESRCH; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 123eb0d7e2e9..28ddd42b27be 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1348,9 +1348,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( * concurrently and the queues are actually stopped */ if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - down_write(¤t->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); is_invalid_userptr = atomic_read(&mem->invalid); - up_write(¤t->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); } mutex_lock(&mem->lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 58ed401c5996..d002df91c7b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -376,13 +376,14 @@ static const struct mmu_notifier_ops amdgpu_mn_ops[] = { struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev, enum amdgpu_mn_type type) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; struct amdgpu_mn *amn; unsigned long key = AMDGPU_MN_KEY(mm, type); int r; mutex_lock(&adev->mn_lock); - if (down_write_killable(&mm->mmap_sem)) { + if (mm_write_lock_killable(mm, &mmrange)) { mutex_unlock(&adev->mn_lock); return ERR_PTR(-EINTR); } @@ -413,13 +414,13 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev, hash_add(adev->mn_hash, &amn->node, AMDGPU_MN_KEY(mm, type)); release_locks: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mutex_unlock(&adev->mn_lock); return amn; free_amn: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); mutex_unlock(&adev->mn_lock); kfree(amn); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d81101ac57eb..86e5a7549031 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -735,6 +735,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) unsigned int flags = 0; unsigned pinned = 0; int r; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!mm) /* Happens during process shutdown */ return -ESRCH; @@ -742,7 +743,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) if (!(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY)) flags |= FOLL_WRITE; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); if (gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) { /* @@ -754,7 +755,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) vma = find_vma(mm, gtt->userptr); if (!vma || vma->vm_file || vma->vm_end < end) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return -EPERM; } } @@ -789,12 +790,12 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages) } while (pinned < ttm->num_pages); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return 0; release_pages: release_pages(pages, pinned); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return r; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index d674d4b3340f..41eedbb2e120 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -887,6 +887,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid, */ struct kfd_process *p = kfd_lookup_process_by_pasid(pasid); struct mm_struct *mm; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!p) return; /* Presumably process exited. 
*/ @@ -902,7 +903,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid, memset(&memory_exception_data, 0, sizeof(memory_exception_data)); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, address); memory_exception_data.gpu_id = dev->id; @@ -925,7 +926,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid, memory_exception_data.failure.NoExecute = 0; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); pr_debug("notpresent %d, noexecute %d, readonly %d\n", diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ad01c92aaf74..320516346bbf 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1644,6 +1644,7 @@ int i915_gem_mmap_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct drm_i915_gem_mmap *args = data; struct drm_i915_gem_object *obj; unsigned long addr; @@ -1681,7 +1682,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data, struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - if (down_write_killable(&mm->mmap_sem)) { + if (mm_write_lock_killable(mm, &mmrange)) { addr = -EINTR; goto err; } @@ -1691,7 +1692,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data, pgprot_writecombine(vm_get_page_prot(vma->vm_flags)); else addr = -ENOMEM; - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (IS_ERR_VALUE(addr)) goto err; diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index 67f718015e42..0bba318098bb 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -231,6 +231,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm) { struct i915_mmu_notifier *mn; int err = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); mn = mm->mn; if (mn) @@ -240,7 +241,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm) if (IS_ERR(mn)) err = PTR_ERR(mn); - down_write(&mm->mm->mmap_sem); + mm_write_lock(mm->mm, &mmrange); mutex_lock(&mm->i915->mm_lock); if (mm->mn == NULL && !err) { /* Protected by mmap_sem (write-lock) */ @@ -257,7 +258,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm) err = 0; } mutex_unlock(&mm->i915->mm_lock); - up_write(&mm->mm->mmap_sem); + mm_write_unlock(mm->mm, &mmrange); if (mn && !IS_ERR(mn)) kfree(mn); @@ -504,7 +505,9 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) ret = -EFAULT; if (mmget_not_zero(mm)) { - down_read(&mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + + mm_read_lock(mm, &mmrange); while (pinned < npages) { ret = get_user_pages_remote (work->task, mm, @@ -517,7 +520,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) pinned += ret; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } } diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 93ed43c413f0..1df4227c0967 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -171,7 +171,7 @@ nouveau_svmm_bind(struct drm_device *dev, void *data, */ mm = get_task_mm(current); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (addr = args->va_start, end = args->va_start + size; addr < end;) { struct vm_area_struct *vma; @@ -194,7 +194,7 @@ nouveau_svmm_bind(struct drm_device *dev, void *data, */ args->result = 0; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); return 0; @@ -307,6 +307,7 @@ nouveau_svmm_init(struct drm_device *dev, 
void *data, struct nouveau_svmm *svmm; struct drm_nouveau_svm_init *args = data; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); /* Allocate tracking for SVM-enabled VMM. */ if (!(svmm = kzalloc(sizeof(*svmm), GFP_KERNEL))) @@ -339,14 +340,14 @@ nouveau_svmm_init(struct drm_device *dev, void *data, /* Enable HMM mirroring of CPU address-space to VMM. */ svmm->mm = get_task_mm(current); - down_write(&svmm->mm->mmap_sem); + mm_write_lock(svmm->mm, &mmrange); svmm->mirror.ops = &nouveau_svmm; ret = hmm_mirror_register(&svmm->mirror, svmm->mm); if (ret == 0) { cli->svm.svmm = svmm; cli->svm.cli = cli; } - up_write(&svmm->mm->mmap_sem); + mm_write_unlock(svmm->mm, &mmrange); mmput(svmm->mm); done: @@ -548,6 +549,8 @@ nouveau_svm_fault(struct nvif_notify *notify) args.i.p.version = 0; for (fi = 0; fn = fi + 1, fi < buffer->fault_nr; fi = fn) { + DEFINE_RANGE_LOCK_FULL(mmrange); + /* Cancel any faults from non-SVM channels. */ if (!(svmm = buffer->fault[fi]->svmm)) { nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]); @@ -570,11 +573,11 @@ nouveau_svm_fault(struct nvif_notify *notify) /* Intersect fault window with the CPU VMA, cancelling * the fault if the address is invalid. */ - down_read(&svmm->mm->mmap_sem); + mm_read_lock(svmm->mm, &mmrange); vma = find_vma_intersection(svmm->mm, start, limit); if (!vma) { SVMM_ERR(svmm, "wndw %016llx-%016llx", start, limit); - up_read(&svmm->mm->mmap_sem); + mm_read_unlock(svmm->mm, &mmrange); nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]); continue; } @@ -584,7 +587,7 @@ nouveau_svm_fault(struct nvif_notify *notify) if (buffer->fault[fi]->addr != start) { SVMM_ERR(svmm, "addr %016llx", buffer->fault[fi]->addr); - up_read(&svmm->mm->mmap_sem); + mm_read_unlock(svmm->mm, &mmrange); nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]); continue; } @@ -596,6 +599,8 @@ nouveau_svm_fault(struct nvif_notify *notify) args.i.p.page = PAGE_SHIFT; args.i.p.addr = start; for (fn = fi, pi = 0;;) { + DEFINE_RANGE_LOCK_FULL(mmrange); + /* Determine required permissions based on GPU fault * access flags. *XXX: atomic? @@ -649,7 +654,7 @@ nouveau_svm_fault(struct nvif_notify *notify) range.values = nouveau_svm_pfn_values; range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT; again: - ret = hmm_vma_fault(&range, true); + ret = hmm_vma_fault(&range, true, &mmrange); if (ret == 0) { mutex_lock(&svmm->mutex); if (!hmm_vma_range_done(&range)) { @@ -667,7 +672,7 @@ nouveau_svm_fault(struct nvif_notify *notify) svmm->vmm->vmm.object.client->super = false; mutex_unlock(&svmm->mutex); } - up_read(&svmm->mm->mmap_sem); + mm_read_unlock(svmm->mm, &mmrange); /* Cancel any faults in the window whose pages didn't manage * to keep their valid bit, or stay writeable when required. 
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index f43305329939..8015a1b7f6ef 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -79,6 +79,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) unsigned i; bool need_mmap_lock = false; int r; + DEFINE_RANGE_LOCK_FULL(mmrange); if (p->chunk_relocs == NULL) { return 0; @@ -190,12 +191,12 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p) p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm, &p->validated); if (need_mmap_lock) - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring); if (need_mmap_lock) - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 44617dec8183..fa6ba354f59d 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -334,17 +334,19 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data, } if (args->flags & RADEON_GEM_USERPTR_VALIDATE) { - down_read(¤t->mm->mmap_sem); + DEFINE_RANGE_LOCK_FULL(mmrange); + + mm_read_lock(current->mm, &mmrange); r = radeon_bo_reserve(bo, true); if (r) { - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); goto release_object; } radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT); r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); radeon_bo_unreserve(bo); - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (r) goto release_object; } diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c index c9bd1278f573..a4fc3fadb8d5 100644 --- a/drivers/gpu/drm/radeon/radeon_mn.c +++ b/drivers/gpu/drm/radeon/radeon_mn.c @@ -197,11 +197,12 @@ static const struct mmu_notifier_ops radeon_mn_ops = { */ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct mm_struct *mm = current->mm; struct radeon_mn *rmn; int r; - if (down_write_killable(&mm->mmap_sem)) + if (mm_write_lock_killable(mm, &mmrange)) return ERR_PTR(-EINTR); mutex_lock(&rdev->mn_lock); @@ -230,13 +231,13 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev) release_locks: mutex_unlock(&rdev->mn_lock); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return rmn; free_rmn: mutex_unlock(&rdev->mn_lock); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); kfree(rmn); return ERR_PTR(r); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 6dacff49c1cc..ba3eda092010 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -69,7 +69,7 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, goto out_unlock; ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); (void) dma_fence_wait(bo->moving, true); reservation_object_unlock(bo->resv); ttm_bo_put(bo); @@ -135,7 +135,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); (void) ttm_bo_wait_unreserved(bo); ttm_bo_put(bo); } diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index e7ea819fcb11..7356911bcf9e 100644 --- 
a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -207,6 +207,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, unsigned long dma_attrs = 0; struct scatterlist *sg; unsigned int gup_flags = FOLL_WRITE; + DEFINE_RANGE_LOCK_FULL(mmrange); if (!udata) return ERR_PTR(-EIO); @@ -294,14 +295,14 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, sg = umem->sg_head.sgl; while (npages) { - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); ret = get_user_pages(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof (struct page *)), gup_flags | FOLL_LONGTERM, page_list, NULL); if (ret < 0) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); goto umem_release; } @@ -312,7 +313,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, dma_get_max_seg_size(context->device->dma_device), &umem->sg_nents); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } sg_mark_end(sg); diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 62b5de027dd1..a21e575e90d0 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -408,16 +408,17 @@ int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access) if (access & IB_ACCESS_HUGETLB) { struct vm_area_struct *vma; struct hstate *h; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, ib_umem_start(umem)); if (!vma || !is_vm_hugetlb_page(vma)) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return -EINVAL; } h = hstate_vma(vma); umem->page_shift = huge_page_shift(h); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); } mutex_init(&umem_odp->umem_mutex); @@ -589,6 +590,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, int j, k, ret = 0, start_idx, npages = 0, page_shift; unsigned int flags = 0; phys_addr_t p = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); if (access_mask == 0) return -EINVAL; @@ -629,7 +631,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, (bcnt + BIT(page_shift) - 1) >> page_shift, PAGE_SIZE / sizeof(struct page *)); - down_read(&owning_mm->mmap_sem); + mm_read_lock(owning_mm, &mmrange); /* * Note: this might result in redundent page getting. We can * avoid this by checking dma_list to be 0 before calling @@ -640,7 +642,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, npages = get_user_pages_remote(owning_process, owning_mm, user_virt, gup_num_pages, flags, local_page_list, NULL, NULL, NULL); - up_read(&owning_mm->mmap_sem); + mm_read_unlock(owning_mm, &mmrange); if (npages < 0) { if (npages != -EAGAIN) diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 84a5e9a6d483..dcc94e5d617e 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -967,6 +967,7 @@ EXPORT_SYMBOL(rdma_user_mmap_io); void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile) { struct rdma_umap_priv *priv, *next_priv; + DEFINE_RANGE_LOCK_FULL(mmrange); lockdep_assert_held(&ufile->hw_destroy_rwsem); @@ -999,7 +1000,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile) * at a time to get the lock ordering right. Typically there * will only be one mm, so no big deal. 
*/ - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); if (!mmget_still_valid(mm)) goto skip_mm; mutex_lock(&ufile->umap_lock); @@ -1016,7 +1017,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile) } mutex_unlock(&ufile->umap_lock); skip_mm: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } } diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 355205a28544..b67ada7e86c2 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -379,8 +379,9 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, */ if (!ib_access_writable(access_flags)) { struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); /* * FIXME: Ideally this would iterate over all the vmas that * cover the memory, but for now it requires a single vma to @@ -395,7 +396,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, access_flags |= IB_ACCESS_LOCAL_WRITE; } - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); } return ib_umem_get(udata, start, length, access_flags, 0); diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index f712fb7fa82f..0fd47aa11b28 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -103,6 +103,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages, unsigned long locked, lock_limit; size_t got; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; locked = atomic64_add_return(num_pages, ¤t->mm->pinned_vm); @@ -112,18 +113,18 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages, goto bail; } - down_read(¤t->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); for (got = 0; got < num_pages; got += ret) { ret = get_user_pages(start_page + got * PAGE_SIZE, num_pages - got, FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE, p + got, NULL); if (ret < 0) { - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); goto bail_release; } } - up_read(¤t->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return 0; bail_release: diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index e312f522a66d..851aec8ecf41 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -102,6 +102,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, dma_addr_t pa; unsigned int gup_flags; struct mm_struct *mm; + DEFINE_RANGE_LOCK_FULL(mmrange); /* * If the combination of the addr and size requested for this memory @@ -125,7 +126,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, npages = PAGE_ALIGN(size + (addr & ~PAGE_MASK)) >> PAGE_SHIFT; uiomr->owning_mm = mm = current->mm; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); locked = atomic64_add_return(npages, ¤t->mm->pinned_vm); lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; @@ -189,7 +190,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, } else mmgrab(uiomr->owning_mm); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); free_page((unsigned long) page_list); return ret; } diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c index 67c609b26249..7073c2cd6915 100644 --- a/drivers/iommu/amd_iommu_v2.c +++ b/drivers/iommu/amd_iommu_v2.c @@ 
-500,7 +500,7 @@ static void do_fault(struct work_struct *work) flags |= FAULT_FLAG_WRITE; flags |= FAULT_FLAG_REMOTE; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_extend_vma(mm, address); if (!vma || address < vma->vm_start) /* failed to get a vma in the right range */ @@ -512,7 +512,7 @@ static void do_fault(struct work_struct *work) ret = handle_mm_fault(vma, address, flags, &mmrange); out: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (ret & VM_FAULT_ERROR) /* failed to service fault */ diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c index 74d535ea6a03..192a2f8f824c 100644 --- a/drivers/iommu/intel-svm.c +++ b/drivers/iommu/intel-svm.c @@ -595,7 +595,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) if (!is_canonical_address(address)) goto bad_req; - down_read(&svm->mm->mmap_sem); + mm_read_lock(svm->mm, &mmrange); vma = find_extend_vma(svm->mm, address); if (!vma || address < vma->vm_start) goto invalid; @@ -610,7 +610,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) result = QI_RESP_SUCCESS; invalid: - up_read(&svm->mm->mmap_sem); + mm_read_unlock(svm->mm, &mmrange); mmput(svm->mm); bad_req: /* Accounting for major/minor faults? */ diff --git a/drivers/media/v4l2-core/videobuf-core.c b/drivers/media/v4l2-core/videobuf-core.c index bf7dfb2a34af..a6b7d890d2cb 100644 --- a/drivers/media/v4l2-core/videobuf-core.c +++ b/drivers/media/v4l2-core/videobuf-core.c @@ -533,11 +533,12 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b) enum v4l2_field field; unsigned long flags = 0; int retval; + DEFINE_RANGE_LOCK_FULL(mmrange); MAGIC_CHECK(q->int_ops->magic, MAGIC_QTYPE_OPS); if (b->memory == V4L2_MEMORY_MMAP) - down_read(&current->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); videobuf_queue_lock(q); retval = -EBUSY; @@ -624,7 +625,7 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b) videobuf_queue_unlock(q); if (b->memory == V4L2_MEMORY_MMAP) - up_read(&current->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return retval; } diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c index e1bf50df4c70..04ff0c7c7ebc 100644 --- a/drivers/media/v4l2-core/videobuf-dma-contig.c +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c @@ -166,12 +166,13 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem, unsigned long pages_done, user_address; unsigned int offset; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); offset = vb->baddr & ~PAGE_MASK; mem->size = PAGE_ALIGN(vb->size + offset); ret = -EINVAL; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, vb->baddr); if (!vma) @@ -203,7 +204,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem, } out_up: - up_read(&current->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return ret; } diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 870a2a526e0b..488d484acf6c 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -202,10 +202,11 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction, unsigned long data, unsigned long size) { int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&current->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); ret = videobuf_dma_init_user_locked(dma, direction, data, size); - up_read(&current->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); return ret;
} diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c index 5a3f91255258..c287f47d5e2c 100644 --- a/drivers/misc/cxl/cxllib.c +++ b/drivers/misc/cxl/cxllib.c @@ -210,8 +210,9 @@ static int get_vma_info(struct mm_struct *mm, u64 addr, { struct vm_area_struct *vma = NULL; int rc = 0; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma(mm, addr); if (!vma) { @@ -222,7 +223,7 @@ static int get_vma_info(struct mm_struct *mm, u64 addr, *vma_start = vma->vm_start; *vma_end = vma->vm_end; out: - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return rc; } diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c index a4d17a5a9763..b97950440ee8 100644 --- a/drivers/misc/cxl/fault.c +++ b/drivers/misc/cxl/fault.c @@ -317,6 +317,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx) struct vm_area_struct *vma; int rc; struct mm_struct *mm; + DEFINE_RANGE_LOCK_FULL(mmrange); mm = get_mem_context(ctx); if (mm == NULL) { @@ -325,7 +326,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx) return; } - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = mm->mmap; vma; vma = vma->vm_next) { for (ea = vma->vm_start; ea < vma->vm_end; ea = next_segment(ea, slb.vsid)) { @@ -340,7 +341,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx) last_esid = slb.esid; } } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); mmput(mm); } diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c index 2ec5808ba464..a89d541c236e 100644 --- a/drivers/misc/sgi-gru/grufault.c +++ b/drivers/misc/sgi-gru/grufault.c @@ -81,15 +81,16 @@ static struct gru_thread_state *gru_find_lock_gts(unsigned long vaddr) struct mm_struct *mm = current->mm; struct vm_area_struct *vma; struct gru_thread_state *gts = NULL; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = gru_find_vma(vaddr); if (vma) gts = gru_find_thread_state(vma, TSID(vaddr, vma)); if (gts) mutex_lock(&gts->ts_ctxlock); else - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return gts; } @@ -98,8 +99,9 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr) struct mm_struct *mm = current->mm; struct vm_area_struct *vma; struct gru_thread_state *gts = ERR_PTR(-EINVAL); + DEFINE_RANGE_LOCK_FULL(mmrange); - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); vma = gru_find_vma(vaddr); if (!vma) goto err; @@ -108,11 +110,11 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr) if (IS_ERR(gts)) goto err; mutex_lock(&gts->ts_ctxlock); - downgrade_write(&mm->mmap_sem); + mm_downgrade_write(mm, &mmrange); return gts; err: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); return gts; } @@ -122,7 +124,7 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr) static void gru_unlock_gts(struct gru_thread_state *gts) { mutex_unlock(&gts->ts_ctxlock); - up_read(&current->mm->mmap_sem); + mm_read_unlock(current->mm, gts->mmrange); } /* @@ -563,6 +565,8 @@ static irqreturn_t gru_intr(int chiplet, int blade) } for_each_cbr_in_tfm(cbrnum, imap.fault_bits) { + DEFINE_RANGE_LOCK_FULL(mmrange); + STAT(intr_tfh); tfh = get_tfh_by_index(gru, cbrnum); prefetchw(tfh); /* Helps on hdw, required for emulator */ @@ -588,9 +592,9 @@ static irqreturn_t gru_intr(int chiplet, int blade) */ gts->ustats.fmm_tlbmiss++; if (!gts->ts_force_cch_reload && - down_read_trylock(&gts->ts_mm->mmap_sem)) { + mm_read_trylock(gts->ts_mm, &mmrange)) {
gru_try_dropin(gru, gts, tfh, NULL); - up_read(&gts->ts_mm->mmap_sem); + mm_read_unlock(gts->ts_mm, &mmrange); } else { tfh_user_polling_mode(tfh); STAT(intr_mm_lock_failed); diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c index 104a05f6b738..1403a4f73cbd 100644 --- a/drivers/misc/sgi-gru/grufile.c +++ b/drivers/misc/sgi-gru/grufile.c @@ -136,6 +136,7 @@ static int gru_create_new_context(unsigned long arg) struct vm_area_struct *vma; struct gru_vma_data *vdata; int ret = -EINVAL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (copy_from_user(&req, (void __user *)arg, sizeof(req))) return -EFAULT; @@ -148,7 +149,7 @@ static int gru_create_new_context(unsigned long arg) if (!(req.options & GRU_OPT_MISS_MASK)) req.options |= GRU_OPT_MISS_FMM_INTR; - down_write(&current->mm->mmap_sem); + mm_write_lock(current->mm, &mmrange); vma = gru_find_vma(req.gseg); if (vma) { vdata = vma->vm_private_data; @@ -159,7 +160,7 @@ static int gru_create_new_context(unsigned long arg) vdata->vd_tlb_preload_count = req.tlb_preload_count; ret = 0; } - up_write(&current->mm->mmap_sem); + mm_write_unlock(current->mm, &mmrange); return ret; } diff --git a/drivers/misc/sgi-gru/grukservices.c b/drivers/misc/sgi-gru/grukservices.c index 4b23d586fc3f..ceed48ecbd15 100644 --- a/drivers/misc/sgi-gru/grukservices.c +++ b/drivers/misc/sgi-gru/grukservices.c @@ -178,7 +178,9 @@ static void gru_load_kernel_context(struct gru_blade_state *bs, int blade_id) kgts->ts_dsr_au_count = GRU_DS_BYTES_TO_AU( GRU_NUM_KERNEL_DSR_BYTES * ncpus + bs->bs_async_dsr_bytes); - while (!gru_assign_gru_context(kgts)) { + + /*** BROKEN mmrange, we don't care about gru (for now) */ + while (!gru_assign_gru_context(kgts, NULL)) { msleep(1); gru_steal_context(kgts); } diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c index ab174f28e3be..d33d94cc35e0 100644 --- a/drivers/misc/sgi-gru/grumain.c +++ b/drivers/misc/sgi-gru/grumain.c @@ -866,7 +866,8 @@ static int gru_assign_context_number(struct gru_state *gru) /* * Scan the GRUs on the local blade & assign a GRU context.
*/ -struct gru_state *gru_assign_gru_context(struct gru_thread_state *gts) +struct gru_state *gru_assign_gru_context(struct gru_thread_state *gts, + struct range_lock *mmrange) { struct gru_state *gru, *grux; int i, max_active_contexts; @@ -902,6 +903,7 @@ struct gru_state *gru_assign_gru_context(struct gru_thread_state *gts) gts->ts_blade = gru->gs_blade_id; gts->ts_ctxnum = gru_assign_context_number(gru); atomic_inc(&gts->ts_refcnt); + gts->mmrange = mmrange; gru->gs_gts[gts->ts_ctxnum] = gts; spin_unlock(&gru->gs_lock); @@ -951,7 +953,7 @@ vm_fault_t gru_fault(struct vm_fault *vmf) if (!gts->ts_gru) { STAT(load_user_context); - if (!gru_assign_gru_context(gts)) { + if (!gru_assign_gru_context(gts, vmf->lockrange)) { preempt_enable(); mutex_unlock(&gts->ts_ctxlock); set_current_state(TASK_INTERRUPTIBLE); diff --git a/drivers/misc/sgi-gru/grutables.h b/drivers/misc/sgi-gru/grutables.h index 3e041b6f7a68..a4c75178ad46 100644 --- a/drivers/misc/sgi-gru/grutables.h +++ b/drivers/misc/sgi-gru/grutables.h @@ -389,6 +389,8 @@ struct gru_thread_state { struct gru_gseg_statistics ustats; /* User statistics */ unsigned long ts_gdata[0]; /* save area for GRU data (CB, DS, CBE) */ + struct range_lock *mmrange; /* for faulting */ + }; /* @@ -633,7 +635,8 @@ extern struct gru_thread_state *gru_find_thread_state(struct vm_area_struct *vma, int tsid); extern struct gru_thread_state *gru_alloc_thread_state(struct vm_area_struct *vma, int tsid); -extern struct gru_state *gru_assign_gru_context(struct gru_thread_state *gts); +extern struct gru_state *gru_assign_gru_context(struct gru_thread_state *gts, + struct range_lock *mmrange); extern void gru_load_context(struct gru_thread_state *gts); extern void gru_steal_context(struct gru_thread_state *gts); extern void gru_unload_context(struct gru_thread_state *gts, int savestate); diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c index ac27f3d3fbb4..33a36b97f8a5 100644 --- a/drivers/oprofile/buffer_sync.c +++ b/drivers/oprofile/buffer_sync.c @@ -90,12 +90,13 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data) unsigned long addr = (unsigned long)data; struct mm_struct *mm = current->mm; struct vm_area_struct *mpnt; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); mpnt = find_vma(mm, addr); if (mpnt && mpnt->vm_file && (mpnt->vm_flags & VM_EXEC)) { - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); /* To avoid latency problems, we only process the current CPU, * hoping that most samples for the task are on this CPU */ @@ -103,7 +104,7 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data) return 0; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return 0; } @@ -255,8 +256,9 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset) { unsigned long cookie = NO_COOKIE; struct vm_area_struct *vma; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) { if (addr < vma->vm_start || addr >= vma->vm_end) @@ -276,7 +278,7 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset) if (!vma) cookie = INVALID_COOKIE; - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return cookie; } diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c index 5741d2b49a7d..9b1523a0e7bd 100644 --- a/drivers/staging/kpc2000/kpc_dma/fileops.c +++
b/drivers/staging/kpc2000/kpc_dma/fileops.c @@ -50,6 +50,7 @@ int kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned u64 card_addr; u64 dma_addr; u64 user_ctl; + DEFINE_RANGE_LOCK_FULL(mmrange); BUG_ON(priv == NULL); ldev = priv->ldev; @@ -81,9 +82,9 @@ int kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned } // Lock the user buffer pages in memory, and hold on to the page pointers (for the sglist) - down_read(&current->mm->mmap_sem); /* get memory map semaphore */ + mm_read_lock(current->mm, &mmrange); /* get memory map semaphore */ rv = get_user_pages(iov_base, acd->page_count, FOLL_TOUCH | FOLL_WRITE | FOLL_GET, acd->user_pages, NULL); - up_read(&current->mm->mmap_sem); /* release the semaphore */ + mm_read_unlock(current->mm, &mmrange); /* release the semaphore */ if (rv != acd->page_count){ dev_err(&priv->ldev->pldev->dev, "Couldn't get_user_pages (%ld)\n", rv); goto err_get_user_pages; diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c index a5afbe6dee68..488a08e17a93 100644 --- a/drivers/tee/optee/call.c +++ b/drivers/tee/optee/call.c @@ -561,11 +561,12 @@ static int check_mem_type(unsigned long start, size_t num_pages) { struct mm_struct *mm = current->mm; int rc; + DEFINE_RANGE_LOCK_FULL(mmrange); - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); rc = __check_mem_type(find_vma(mm, start), start + num_pages * PAGE_SIZE); - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return rc; } diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index b5f911222ae6..c83cd7d1c25b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -344,11 +344,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, struct vm_area_struct *vmas[1]; unsigned int flags = 0; int ret; + DEFINE_RANGE_LOCK_FULL(mmrange); if (prot & IOMMU_WRITE) flags |= FOLL_WRITE; - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); if (mm == current->mm) { ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, vmas); @@ -367,14 +368,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, put_page(page[0]); } } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); if (ret == 1) { *pfn = page_to_pfn(page[0]); return 0; } - down_read(&mm->mmap_sem); + mm_read_lock(mm, &mmrange); vma = find_vma_intersection(mm, vaddr, vaddr + 1); @@ -384,7 +385,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, ret = 0; } - up_read(&mm->mmap_sem); + mm_read_unlock(mm, &mmrange); return ret; } diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c index 469dfbd6cf90..ab154712642b 100644 --- a/drivers/xen/gntdev.c +++ b/drivers/xen/gntdev.c @@ -742,12 +742,13 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, struct vm_area_struct *vma; struct gntdev_grant_map *map; int rv = -EINVAL; + DEFINE_RANGE_LOCK_FULL(mmrange); if (copy_from_user(&op, u, sizeof(op)) != 0) return -EFAULT; pr_debug("priv %p, offset for vaddr %lx\n", priv, (unsigned long)op.vaddr); - down_read(&current->mm->mmap_sem); + mm_read_lock(current->mm, &mmrange); vma = find_vma(current->mm, op.vaddr); if (!vma || vma->vm_ops != &gntdev_vmops) goto out_unlock; @@ -761,7 +762,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv, rv = 0; out_unlock: - up_read(&current->mm->mmap_sem); + mm_read_unlock(current->mm, &mmrange); if (rv == 0 && copy_to_user(u, &op, sizeof(op)) != 0) return -EFAULT; diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index
b24ddac1604b..dca0ad37e1b2 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -258,6 +258,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata) int rc; LIST_HEAD(pagelist); struct mmap_gfn_state state; + DEFINE_RANGE_LOCK_FULL(mmrange); /* We only support privcmd_ioctl_mmap_batch for auto translated. */ if (xen_feature(XENFEAT_auto_translated_physmap)) @@ -277,7 +278,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata) if (rc || list_empty(&pagelist)) goto out; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); { struct page *page = list_first_entry(&pagelist, @@ -302,7 +303,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata) out_up: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); out: free_page_list(&pagelist); @@ -452,6 +453,7 @@ static long privcmd_ioctl_mmap_batch( unsigned long nr_pages; LIST_HEAD(pagelist); struct mmap_batch_state state; + DEFINE_RANGE_LOCK_FULL(mmrange); switch (version) { case 1: @@ -498,7 +500,7 @@ static long privcmd_ioctl_mmap_batch( } } - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); vma = find_vma(mm, m.addr); if (!vma || @@ -554,7 +556,7 @@ static long privcmd_ioctl_mmap_batch( BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t), &pagelist, mmap_batch_fn, &state)); - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); if (state.global_error) { /* Write back errors in second pass. */ @@ -575,7 +577,7 @@ static long privcmd_ioctl_mmap_batch( return ret; out_unlock: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); goto out; } @@ -752,6 +754,7 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata) xen_pfn_t *pfns = NULL; struct xen_mem_acquire_resource xdata; int rc; + DEFINE_RANGE_LOCK_FULL(mmrange); if (copy_from_user(&kdata, udata, sizeof(kdata))) return -EFAULT; @@ -760,7 +763,7 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata) if (data->domid != DOMID_INVALID && data->domid != kdata.dom) return -EPERM; - down_write(&mm->mmap_sem); + mm_write_lock(mm, &mmrange); vma = find_vma(mm, kdata.addr); if (!vma || vma->vm_ops != &privcmd_vm_ops) { @@ -845,7 +848,7 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata) } out: - up_write(&mm->mmap_sem); + mm_write_unlock(mm, &mmrange); kfree(pfns); return rc; diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 51ec27a84668..a77d42ece14f 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -538,7 +538,8 @@ static inline bool hmm_vma_range_done(struct hmm_range *range) } /* This is a temporary helper to avoid merge conflict between trees. */ -static inline int hmm_vma_fault(struct hmm_range *range, bool block) +static inline int hmm_vma_fault(struct hmm_range *range, bool block, + struct range_lock *mmrange) { long ret; @@ -563,7 +564,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) * returns -EAGAIN which correspond to mmap_sem have been * drop in the old API. */ - up_read(&range->vma->vm_mm->mmap_sem); + mm_read_unlock(range->vma->vm_mm, mmrange); return -EAGAIN; } @@ -571,7 +572,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block) if (ret <= 0) { if (ret == -EBUSY || !ret) { /* Same as above drop mmap_sem to match old API. 
*/ - up_read(&range->vma->vm_mm->mmap_sem); + mm_read_unlock(range->vma->vm_mm, mmrange); ret = -EBUSY; } else if (ret == -EAGAIN) ret = -EBUSY; From patchwork Tue May 21 04:52:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Davidlohr Bueso X-Patchwork-Id: 10952909
From: Davidlohr Bueso To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: akpm@linux-foundation.org, willy@infradead.org, mhocko@kernel.org, mgorman@techsingularity.net, jglisse@redhat.com, ldufour@linux.vnet.ibm.com, dave@stgolabs.net, Davidlohr Bueso Subject: [PATCH 14/14] mm: convert mmap_sem to range mmap_lock Date: Mon, 20 May 2019 21:52:42 -0700 Message-Id: <20190521045242.24378-15-dave@stgolabs.net> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net> References: <20190521045242.24378-1-dave@stgolabs.net> With mmrange now in place and everyone using the mm locking wrappers, we can convert the rwsem to the range locking scheme. Every single user of mmap_sem will use a full range, which means that there is no more parallelism than what we already had. This is the worst case scenario. Prefetching and some lockdep stuff have been blindly converted (for now). This lays out the foundations for later mm address space locking scalability.
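As a rough sketch of the caller-side pattern this conversion relies on (illustrative only; example_count_vmas() is a made-up helper, while DEFINE_RANGE_LOCK_FULL() and the mm_read_lock()/mm_read_unlock() wrappers are the ones introduced earlier in this series and exported from linux/mm.h by this patch):

	#include <linux/mm.h>	/* mm locking wrappers + struct mm_struct */

	/* Hypothetical helper: count the vmas of @mm under a full-range read lock. */
	static int example_count_vmas(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;
		int nr = 0;
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* full range == old mmap_sem semantics */

		mm_read_lock(mm, &mmrange);		/* was: down_read(&mm->mmap_sem) */
		for (vma = mm->mmap; vma; vma = vma->vm_next)
			nr++;
		mm_read_unlock(mm, &mmrange);		/* was: up_read(&mm->mmap_sem) */

		return nr;
	}

Until callers start passing narrower ranges, every acquisition spans the whole address space, which is why this patch by itself is not expected to change parallelism.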
Signed-off-by: Davidlohr Bueso --- arch/x86/events/core.c | 2 +- arch/x86/kernel/tboot.c | 2 +- arch/x86/mm/fault.c | 2 +- drivers/firmware/efi/efi.c | 2 +- include/linux/mm.h | 26 +++++++++++++------------- include/linux/mm_types.h | 4 ++-- kernel/bpf/stackmap.c | 9 +++++---- kernel/fork.c | 2 +- mm/init-mm.c | 2 +- mm/memory.c | 2 +- 10 files changed, 27 insertions(+), 26 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index f315425d8468..45ecca077255 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2179,7 +2179,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm) * For now, this can't happen because all callers hold mmap_sem * for write. If this changes, we'll need a different solution. */ - lockdep_assert_held_exclusive(&mm->mmap_sem); + lockdep_assert_held_exclusive(&mm->mmap_lock); if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1) on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1); diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 6e5ef8fb8a02..e5423e2451d3 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -104,7 +104,7 @@ static struct mm_struct tboot_mm = { .pgd = swapper_pg_dir, .mm_users = ATOMIC_INIT(2), .mm_count = ATOMIC_INIT(1), - .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem), + .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock), .page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fbb060c89e7d..9f285ba76f1e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1516,7 +1516,7 @@ static noinline void __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { - prefetchw(&current->mm->mmap_sem); + prefetchw(&current->mm->mmap_lock); if (unlikely(kmmio_fault(regs, address))) return; diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 55b77c576c42..01e4937f3cea 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -80,7 +80,7 @@ struct mm_struct efi_mm = { .mm_rb = RB_ROOT, .mm_users = ATOMIC_INIT(2), .mm_count = ATOMIC_INIT(1), - .mmap_sem = __RWSEM_INITIALIZER(efi_mm.mmap_sem), + .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(efi_mm.mmap_lock), .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock), .mmlist = LIST_HEAD_INIT(efi_mm.mmlist), .cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0}, diff --git a/include/linux/mm.h b/include/linux/mm.h index 8bf3e2542047..5ac33c46679f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2899,74 +2899,74 @@ static inline void setup_nr_node_ids(void) {} static inline bool mm_is_locked(struct mm_struct *mm, struct range_lock *mmrange) { - return rwsem_is_locked(&mm->mmap_sem); + return range_is_locked(&mm->mmap_lock, mmrange); } /* Reader wrappers */ static inline int mm_read_trylock(struct mm_struct *mm, struct range_lock *mmrange) { - return down_read_trylock(&mm->mmap_sem); + return range_read_trylock(&mm->mmap_lock, mmrange); } static inline void mm_read_lock(struct mm_struct *mm, struct range_lock *mmrange) { - down_read(&mm->mmap_sem); + range_read_lock(&mm->mmap_lock, mmrange); } static inline void mm_read_lock_nested(struct mm_struct *mm, struct range_lock *mmrange, int subclass) { - down_read_nested(&mm->mmap_sem, subclass); + range_read_lock_nested(&mm->mmap_lock, mmrange, subclass); } static inline void mm_read_unlock(struct mm_struct *mm, struct range_lock *mmrange) { -
up_read(&mm->mmap_sem); + range_read_unlock(&mm->mmap_lock, mmrange); } /* Writer wrappers */ static inline int mm_write_trylock(struct mm_struct *mm, struct range_lock *mmrange) { - return down_write_trylock(&mm->mmap_sem); + return range_write_trylock(&mm->mmap_lock, mmrange); } static inline void mm_write_lock(struct mm_struct *mm, struct range_lock *mmrange) { - down_write(&mm->mmap_sem); + range_write_lock(&mm->mmap_lock, mmrange); } static inline int mm_write_lock_killable(struct mm_struct *mm, struct range_lock *mmrange) { - return down_write_killable(&mm->mmap_sem); + return range_write_lock_killable(&mm->mmap_lock, mmrange); } static inline void mm_downgrade_write(struct mm_struct *mm, struct range_lock *mmrange) { - downgrade_write(&mm->mmap_sem); + range_downgrade_write(&mm->mmap_lock, mmrange); } static inline void mm_write_unlock(struct mm_struct *mm, struct range_lock *mmrange) { - up_write(&mm->mmap_sem); + range_write_unlock(&mm->mmap_lock, mmrange); } static inline void mm_write_lock_nested(struct mm_struct *mm, struct range_lock *mmrange, int subclass) { - down_write_nested(&mm->mmap_sem, subclass); + range_write_lock_nest_lock(&(mm)->mmap_lock, mmrange, nest_lock); } -#define mm_write_nest_lock(mm, range, nest_lock) \ - down_write_nest_lock(&(mm)->mmap_sem, nest_lock) +#define mm_write_nest_lock(mm, range, nest_lock) \ + range_write_lock_nest_lock(&(mm)->mmap_lock, range, nest_lock) #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 1815fbc40926..d82612183a30 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -8,7 +8,7 @@ #include #include #include -#include +#include #include #include #include @@ -400,7 +400,7 @@ struct mm_struct { spinlock_t page_table_lock; /* Protects page tables and some * counters */ - struct rw_semaphore mmap_sem; + struct range_lock_tree mmap_lock; struct list_head mmlist; /* List of maybe swapped mm's. 
These * are globally strung together off diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index fdb352bea7e8..44aa74748885 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -36,7 +36,7 @@ struct bpf_stack_map { /* irq_work to run up_read() for build_id lookup in nmi context */ struct stack_map_irq_work { struct irq_work irq_work; - struct rw_semaphore *sem; + struct range_lock_tree *lock; struct range_lock *mmrange; }; @@ -45,8 +45,9 @@ static void do_up_read(struct irq_work *entry) struct stack_map_irq_work *work; work = container_of(entry, struct stack_map_irq_work, irq_work); - up_read_non_owner(work->sem); - work->sem = NULL; + /* XXX we might have to add a non_owner to range lock/unlock */ + range_read_unlock(work->lock, work->mmrange); + work->lock = NULL; } static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work); @@ -338,7 +339,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, if (!work) { mm_read_unlock(current->mm, &mmrange); } else { - work->sem = &current->mm->mmap_sem; + work->lock = &current->mm->mmap_lock; work->mmrange = &mmrange; irq_work_queue(&work->irq_work); /* diff --git a/kernel/fork.c b/kernel/fork.c index cc24e3690532..a063e8703498 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -991,7 +991,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->vmacache_seqnum = 0; atomic_set(&mm->mm_users, 1); atomic_set(&mm->mm_count, 1); - init_rwsem(&mm->mmap_sem); + range_lock_tree_init(&mm->mmap_lock); INIT_LIST_HEAD(&mm->mmlist); mm->core_state = NULL; mm_pgtables_bytes_init(mm); diff --git a/mm/init-mm.c b/mm/init-mm.c index a787a319211e..35a4be1336c6 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -30,7 +30,7 @@ struct mm_struct init_mm = { .pgd = swapper_pg_dir, .mm_users = ATOMIC_INIT(2), .mm_count = ATOMIC_INIT(1), - .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem), + .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock), .page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock), .arg_lock = __SPIN_LOCK_UNLOCKED(init_mm.arg_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), diff --git a/mm/memory.c b/mm/memory.c index 8a5f52978893..65f4d5384bef 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4494,7 +4494,7 @@ void __might_fault(const char *file, int line) __might_sleep(file, line, 0); #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) if (current->mm) - might_lock_read(&current->mm->mmap_sem); + might_lock_read(&current->mm->mmap_lock); #endif } EXPORT_SYMBOL(__might_fault);