From patchwork Wed Jan 30 12:44:18 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 10788515
From: Vlastimil Babka
To: Andrew Morton, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-api@vger.kernel.org,
 Peter Zijlstra, Greg KH, Jann Horn, Jiri Kosina, Dominique Martinet,
 Andy Lutomirski, Dave Chinner, Kevin Easton, Matthew Wilcox, Cyril Hrubis,
 Tejun Heo, "Kirill A. Shutemov", Daniel Gruss, Vlastimil Babka, Jiri Kosina
Subject: [PATCH 1/3] mm/mincore: make mincore() more conservative
Date: Wed, 30 Jan 2019 13:44:18 +0100
Message-Id: <20190130124420.1834-2-vbabka@suse.cz>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190130124420.1834-1-vbabka@suse.cz>
References: <20190130124420.1834-1-vbabka@suse.cz>

From: Jiri Kosina

The semantics of what mincore() considers to be resident are not completely
clear, but Linux has always (since 2.3.52, when mincore() was initially
implemented) treated it as "page is available in page cache". That is
potentially a problem, as it [in]directly exposes meta-information about the
page cache and memory mapping state, even for memory not strictly belonging
to the process executing the syscall, opening possibilities for side-channel
attacks.

Change the semantics of mincore() so that it only reveals page cache
information for non-anonymous mappings that belong to files that the calling
process could (if it tried to) successfully open for writing.

Originally-by: Linus Torvalds
Originally-by: Dominique Martinet
Cc: Dominique Martinet
Cc: Andy Lutomirski
Cc: Dave Chinner
Cc: Kevin Easton
Cc: Matthew Wilcox
Cc: Cyril Hrubis
Cc: Tejun Heo
Cc: Kirill A. Shutemov
Cc: Daniel Gruss
Signed-off-by: Jiri Kosina
Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
Acked-by: Josh Snyder
---
 mm/mincore.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..747a4907a3ac 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,14 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	return vma_is_anonymous(vma) ||
+		(vma->vm_file &&
+		 (inode_owner_or_capable(file_inode(vma->vm_file))
+		  || inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0));
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +197,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = (end - addr) >> PAGE_SHIFT;
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;

From patchwork Tue Mar 12 14:17:08 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 10849305
From: Vlastimil Babka
To: Andrew Morton
Cc: Linus Torvalds, Jann Horn, Michal Hocko, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, Vlastimil Babka,
 Jiri Kosina, Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
 Matthew Wilcox, Cyril Hrubis, Tejun Heo, "Kirill A. Shutemov", Daniel Gruss,
 Jiri Kosina, Josh Snyder, Michal Hocko
Subject: [PATCH v2 2/2] mm/mincore: provide mapped status when cached status is not allowed
Date: Tue, 12 Mar 2019 15:17:08 +0100
Message-Id: <20190312141708.6652-3-vbabka@suse.cz>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190312141708.6652-1-vbabka@suse.cz>
References: <20190130124420.1834-1-vbabka@suse.cz> <20190312141708.6652-1-vbabka@suse.cz>

After "mm/mincore: make mincore() more conservative" we sometimes restrict
the information about page cache residency, which needs to be done without
breaking existing userspace as much as possible. Instead of returning an
error, we thus fake the results.

For that we return residency values as 1, which should be safer than faking
them as 0, as there might theoretically exist code that would try to fault in
the page(s) in a loop until mincore() returns 1 for them. Faking 1, however,
means that such code would not fault in a page even if it was not truly in
the page cache, with possibly unwanted performance implications.

We can improve the situation by revisiting the approach of 574823bfab82
("Change mincore() to count "mapped" pages rather than "cached" pages"),
later reverted by 30bac164aca7 and replaced by restricting/faking the
results. In this patch we apply that approach only to cases where the page
cache residency check is restricted. Thus mincore() will return 0 for an
unmapped page (which may or may not be resident in the page cache), and 1
after the process faults it in.

One potential downside is that mincore() users will now be able to recognize
when a previously mapped page was reclaimed. While that might be useful for
some attack scenarios, it is not as crucial as recognizing that somebody else
faulted the page in, which is the main reason we are making mincore() more
conservative. There are also other existing ways of detecting that pages are
being reclaimed anyway.

Cc: Jiri Kosina
Cc: Dominique Martinet
Cc: Andy Lutomirski
Cc: Dave Chinner
Cc: Kevin Easton
Cc: Matthew Wilcox
Cc: Cyril Hrubis
Cc: Tejun Heo
Cc: Kirill A. Shutemov
Cc: Daniel Gruss
Signed-off-by: Vlastimil Babka
---
 mm/mincore.c | 67 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index c3f058bd0faf..c9a265abc631 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -21,12 +21,23 @@
 #include <linux/uaccess.h>
 #include <asm/pgtable.h>
 
+/*
+ * mincore() page walk's private structure. Contains pointer to the array
+ * of return values to be set, and whether the current vma passed the
+ * can_do_mincore() check.
+ */
+struct mincore_walk_private {
+	unsigned char *vec;
+	bool can_check_pagecache;
+};
+
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 
 	/*
 	 * Hugepages under user process are always in RAM and never
@@ -35,7 +46,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 	present = pte && !huge_pte_none(huge_ptep_get(pte));
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
-	walk->private = vec;
+	walk_private->vec = vec;
 #else
 	BUG();
 #endif
@@ -85,7 +96,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 }
 
 static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
-				struct vm_area_struct *vma, unsigned char *vec)
+				struct vm_area_struct *vma, unsigned char *vec,
+				bool can_check_pagecache)
 {
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
@@ -95,7 +107,14 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			/*
+			 * Return page cache residency state if we are allowed
+			 * to, otherwise return mapping state, which is 0 for
+			 * an unmapped range.
+			 */
+			vec[i] = can_check_pagecache ?
+				mincore_page(vma->vm_file->f_mapping, pgoff)
+				: 0;
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -106,8 +125,11 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 static int mincore_unmapped_range(unsigned long addr, unsigned long end,
 				   struct mm_walk *walk)
 {
-	walk->private += __mincore_unmapped_range(addr, end,
-						  walk->vma, walk->private);
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
+
+	walk_private->vec += __mincore_unmapped_range(addr, end, walk->vma,
+				vec, walk_private->can_check_pagecache);
 	return 0;
 }
 
@@ -117,7 +139,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	spinlock_t *ptl;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *ptep;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 	int nr = (end - addr) >> PAGE_SHIFT;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
@@ -128,7 +151,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	}
 
 	if (pmd_trans_unstable(pmd)) {
-		__mincore_unmapped_range(addr, end, vma, vec);
+		__mincore_unmapped_range(addr, end, vma, vec,
+				walk_private->can_check_pagecache);
 		goto out;
 	}
 
@@ -138,7 +162,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 		if (pte_none(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
-						 vma, vec);
+				 vma, vec, walk_private->can_check_pagecache);
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
@@ -152,8 +176,20 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 				*vec = 1;
 			} else {
 #ifdef CONFIG_SWAP
-				*vec = mincore_page(swap_address_space(entry),
+				/*
+				 * If tmpfs pages are being swapped out, treat
+				 * it with same restrictions on mincore() as
+				 * the page cache so we don't expose that
+				 * somebody else brought them back from swap.
+				 * In the restricted case return 0 as swap
+				 * entry means the page is not mapped.
+				 */
+				if (walk_private->can_check_pagecache)
+					*vec = mincore_page(
+						swap_address_space(entry),
 						swp_offset(entry));
+				else
+					*vec = 0;
 #else
 				WARN_ON(1);
 				*vec = 1;
@@ -195,22 +231,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	struct vm_area_struct *vma;
 	unsigned long end;
 	int err;
+	struct mincore_walk_private walk_private = {
+		.vec = vec
+	};
 	struct mm_walk mincore_walk = {
 		.pmd_entry = mincore_pte_range,
 		.pte_hole = mincore_unmapped_range,
 		.hugetlb_entry = mincore_hugetlb,
-		.private = vec,
+		.private = &walk_private
 	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-	if (!can_do_mincore(vma)) {
-		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
-		memset(vec, 1, pages);
-		return pages;
-	}
+	walk_private.can_check_pagecache = can_do_mincore(vma);
 	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)

From patchwork Wed Jan 30 12:44:20 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 10788519
From: Vlastimil Babka
To: Andrew Morton, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-api@vger.kernel.org,
 Peter Zijlstra, Greg KH, Jann Horn, Vlastimil Babka, Jiri Kosina,
 Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
 Matthew Wilcox, Cyril Hrubis, Tejun Heo, "Kirill A. Shutemov", Daniel Gruss,
 Jiri Kosina
Subject: [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed
Date: Wed, 30 Jan 2019 13:44:20 +0100
Message-Id: <20190130124420.1834-4-vbabka@suse.cz>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190130124420.1834-1-vbabka@suse.cz>
References: <20190130124420.1834-1-vbabka@suse.cz>

After "mm/mincore: make mincore() more conservative" we sometimes restrict
the information about page cache residency, which we have to do without
breaking existing userspace, if possible. We thus fake the resulting values
as 1, which should be safer than faking them as 0, as there might
theoretically exist code that would try to fault in the page(s) until
mincore() returns 1. Faking 1, however, means that such code would not fault
in a page even if it was not in the page cache, with unwanted performance
implications.

We can improve the situation by revisiting the approach of 574823bfab82
("Change mincore() to count "mapped" pages rather than "cached" pages") but
only applying it to cases where the page cache residency check is
restricted. Thus mincore() will return 0 for an unmapped page (which may or
may not be resident in the page cache), and 1 after the process faults it in.

One potential downside is that mincore() will again be able to recognize
when a previously mapped page was reclaimed. While that might be useful for
some attack scenarios, it's not as crucial as recognizing that somebody else
faulted the page in, and there are also other ways to recognize reclaimed
pages anyway.

Cc: Jiri Kosina
Cc: Dominique Martinet
Cc: Andy Lutomirski
Cc: Dave Chinner
Cc: Kevin Easton
Cc: Matthew Wilcox
Cc: Cyril Hrubis
Cc: Tejun Heo
Cc: Kirill A. Shutemov
Cc: Daniel Gruss
Signed-off-by: Vlastimil Babka
---
 mm/mincore.c | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 747a4907a3ac..d6784a803ae7 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -21,12 +21,18 @@
 #include <linux/uaccess.h>
 #include <asm/pgtable.h>
 
+struct mincore_walk_private {
+	unsigned char *vec;
+	bool can_check_pagecache;
+};
+
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 
 	/*
 	 * Hugepages under user process are always in RAM and never
@@ -35,7 +41,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 	present = pte && !huge_pte_none(huge_ptep_get(pte));
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
-	walk->private = vec;
+	walk_private->vec = vec;
 #else
 	BUG();
 #endif
@@ -85,7 +91,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 }
 
 static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
-				struct vm_area_struct *vma, unsigned char *vec)
+				struct vm_area_struct *vma, unsigned char *vec,
+				bool can_check_pagecache)
 {
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
@@ -95,7 +102,9 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			vec[i] = can_check_pagecache ?
+				mincore_page(vma->vm_file->f_mapping, pgoff)
+				: 0;
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -106,8 +115,11 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 static int mincore_unmapped_range(unsigned long addr, unsigned long end,
 				   struct mm_walk *walk)
 {
-	walk->private += __mincore_unmapped_range(addr, end,
-						  walk->vma, walk->private);
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
+
+	walk_private->vec += __mincore_unmapped_range(addr, end, walk->vma,
+				vec, walk_private->can_check_pagecache);
 	return 0;
 }
 
@@ -117,7 +129,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	spinlock_t *ptl;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *ptep;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 	int nr = (end - addr) >> PAGE_SHIFT;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
@@ -128,7 +141,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	}
 
 	if (pmd_trans_unstable(pmd)) {
-		__mincore_unmapped_range(addr, end, vma, vec);
+		__mincore_unmapped_range(addr, end, vma, vec,
+				walk_private->can_check_pagecache);
 		goto out;
 	}
 
@@ -138,7 +152,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 		if (pte_none(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
-						 vma, vec);
+				 vma, vec, walk_private->can_check_pagecache);
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
@@ -152,8 +166,12 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 				*vec = 1;
 			} else {
 #ifdef CONFIG_SWAP
-				*vec = mincore_page(swap_address_space(entry),
+				if (walk_private->can_check_pagecache)
+					*vec = mincore_page(
+						swap_address_space(entry),
 						swp_offset(entry));
+				else
+					*vec = 0;
 #else
 				WARN_ON(1);
 				*vec = 1;
@@ -187,22 +205,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	struct vm_area_struct *vma;
 	unsigned long end;
 	int err;
+	struct mincore_walk_private walk_private = {
+		.vec = vec
+	};
 	struct mm_walk mincore_walk = {
 		.pmd_entry = mincore_pte_range,
 		.pte_hole = mincore_unmapped_range,
 		.hugetlb_entry = mincore_hugetlb,
-		.private = vec,
+		.private = &walk_private
 	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-	if (!can_do_mincore(vma)) {
-		unsigned long pages = (end - addr) >> PAGE_SHIFT;
-		memset(vec, 1, pages);
-		return pages;
-	}
+	walk_private.can_check_pagecache = can_do_mincore(vma);
 	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)