From patchwork Fri Jul 21 03:46:43 2023
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13321302
From: Jann Horn <jannh@google.com>
To: Andrew Morton
Cc: Suren Baghdasaryan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: Lock VMA in dup_anon_vma() before setting ->anon_vma
Date: Fri, 21 Jul 2023 05:46:43 +0200
Message-ID: <20230721034643.616851-1-jannh@google.com>

When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by another
VMA. This currently happens while `dst` is not write-locked.

This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock.
This is already icky in itself, since such page faults can now install
pages into `dst` that are attached to an `anon_vma` that is not yet tied
back to the `anon_vma` with an `anon_vma_chain`.
But if `anon_vma_clone()` fails due to an out-of-memory error, things get
much worse: `anon_vma_clone()` then reverts `dst->anon_vma` back to NULL,
and `dst` remains completely unconnected to the `anon_vma`, even though we
can have pages in the area covered by `dst` that point to the `anon_vma`.

This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.

This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.

I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.

For a kernel-assisted reproducer, see the notes section of the patch mail.

Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Cc: stable@vger.kernel.org
Cc: Suren Baghdasaryan
Signed-off-by: Jann Horn
Reviewed-by: Suren Baghdasaryan
---
To reproduce, patch mm/rmap.c by adding "#include <linux/delay.h>" and
changing anon_vma_chain_alloc() like this:

 static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
 {
+       if (strcmp(current->comm, "FAILME") == 0) {
+               // inject delay and error
+               mdelay(2000);
+               return NULL;
+       }
        return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
 }

Then build with KASAN and run this reproducer:

#define _GNU_SOURCE
#include <pthread.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1L) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

static char *area;
static volatile int fault_thread_done;
static volatile int spin_launch;

static void *fault_thread(void *dummy) {
  while (!spin_launch) /*spin*/;
  sleep(1);
  area[0] = 1;
  fault_thread_done = 1;
  return NULL;
}

int main(void) {
  fault_thread_done = 0;
  pthread_t thread;
  if (pthread_create(&thread, NULL, fault_thread, NULL))
    errx(1, "pthread_create");

  // allocator spam
  int fd = SYSCHK(open("/etc/hostname", O_RDONLY));
  char *vmas[10000];
  for (int i=0; i<5000; i++) {
    vmas[i] = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0));
    *vmas[i] = 1;
  }

  // create a 3-page area, no anon_vma at this point, with guard vma behind it
  // to prevent merging with neighboring anon_vmas
  area = SYSCHK(mmap((void*)0x10000, 0x4000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
  SYSCHK(mmap(area+0x3000, 0x1000, PROT_READ, MAP_SHARED|MAP_FIXED, fd, 0));

  // turn it into 3 VMAs
  SYSCHK(mprotect(area+0x1000, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC));

  // create an anon_vma for the tail VMA
  area[0x2000] = 1;

  // more allocator spam
  for (int i=5000; i<10000; i++) {
    vmas[i] = SYSCHK(mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0));
    *vmas[i] = 1;
  }

  printf("with anon_vma on tail VMA:\n\n");
  system("cat /proc/$PPID/smaps | head -n55");
  printf("\n\n");

  spin_launch=1;
  // mprotect() will try to merge the VMAs but bail out due to the injected
  // allocator failure
  SYSCHK(prctl(PR_SET_NAME, "FAILME"));
  SYSCHK(mprotect(area+0x1000, 0x1000, PROT_READ|PROT_WRITE));
  SYSCHK(prctl(PR_SET_NAME, "normal"));

  printf("after merge from mprotect:\n\n");
  if (!fault_thread_done)
    errx(1, "fault thread not done yet???");
  system("cat /proc/$PPID/smaps | head -n55");
  printf("\n\n");

  // release the anon_vma
  SYSCHK(munmap(area+0x1000, 0x2000));

  // release spam
  for (int i=0; i<10000; i++)
    SYSCHK(munmap(vmas[i], 0x1000));

  // wait for RCU
  sleep(2);

  // trigger UAF?
  printf("trying to trigger uaf...\n");
  SYSCHK(madvise(area, 0x1000, 21/*MADV_PAGEOUT*/));
}

You should get a KASAN splat like:

BUG: KASAN: use-after-free in folio_lock_anon_vma_read+0x9d/0x2f0
Read of size 8 at addr ffff8880053a2660 by task normal/549

CPU: 1 PID: 549 Comm: normal Not tainted 6.5.0-rc2-00073-ge599e16c16a1-dirty #292
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 dump_stack_lvl+0x36/0x50
 print_report+0xcf/0x660
 [...]
 kasan_report+0xc7/0x100
 [...]
 folio_lock_anon_vma_read+0x9d/0x2f0
 rmap_walk_anon+0x282/0x350
 [...]
 folio_referenced+0x277/0x2a0
 [...]
 shrink_folio_list+0xc9f/0x15c0
 [...]
 reclaim_folio_list+0xdc/0x1f0
 [...]
 reclaim_pages+0x211/0x280
 [...]
 madvise_cold_or_pageout_pte_range+0x2ea/0x6a0
 [...]
 walk_pgd_range+0x6c5/0xb90
 [...]
 __walk_page_range+0x27f/0x290
 [...]
 walk_page_range+0x1fd/0x230
 [...]
 madvise_pageout+0x1cd/0x2d0
 [...]
 do_madvise+0xb58/0x1280
 [...]
 __x64_sys_madvise+0x62/0x70
 do_syscall_64+0x3b/0x90
 [...]

 mm/mmap.c | 1 +
 1 file changed, 1 insertion(+)

base-commit: e599e16c16a16be9907fb00608212df56d08d57b

diff --git a/mm/mmap.c b/mm/mmap.c
index 3eda23c9ebe7..3937479d0e07 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm_area_struct *dst,
 	 * anon pages imported.
 	 */
 	if (src->anon_vma && !dst->anon_vma) {
+		vma_start_write(dst);
 		dst->anon_vma = src->anon_vma;
 		return anon_vma_clone(dst, src);
 	}
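
As a rough illustration of the window this patch closes, here is a
single-threaded userspace sketch of the failure path. It is a toy model
under stated assumptions, not kernel code: `toy_vma`, `toy_page`,
`toy_fault()` and `toy_merge_with_failed_clone()` are hypothetical names
made up for this example, the concurrent fault is replayed at a fixed
point instead of racing, and the per-VMA lock is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real kernel types. */
struct toy_anon_vma { int id; };

struct toy_vma {
	struct toy_anon_vma *anon_vma;
	bool write_locked;	/* models vma_start_write() having been called */
};

struct toy_page {
	struct toy_anon_vma *anon_vma;	/* anon_vma the page's rmap points at */
};

/*
 * Models a page fault that tries to run under the per-VMA lock.
 * Assumption: a write-locked VMA makes this fast path back off, so the
 * fault cannot install a page until the merge has finished.
 * Faults on a VMA without an anon_vma are not modelled.
 * Returns true if a page was installed.
 */
bool toy_fault(struct toy_vma *vma, struct toy_page *page)
{
	assert(vma != NULL && page != NULL);
	if (vma->write_locked || vma->anon_vma == NULL)
		return false;
	page->anon_vma = vma->anon_vma;
	return true;
}

/*
 * Models the `src->anon_vma && !dst->anon_vma` branch of dup_anon_vma()
 * with a failing anon_vma_clone(), as forced by the injected allocation
 * failure in the reproducer.
 * Returns true if a page ends up pointing at an anon_vma that dst is no
 * longer connected to (the precondition for the use-after-free).
 */
bool toy_merge_with_failed_clone(bool take_write_lock)
{
	struct toy_anon_vma av = { .id = 1 };
	struct toy_vma src = { .anon_vma = &av, .write_locked = false };
	struct toy_vma dst = { .anon_vma = NULL, .write_locked = false };
	struct toy_page page = { .anon_vma = NULL };

	if (take_write_lock)
		dst.write_locked = true;	/* the one-line fix */
	dst.anon_vma = src.anon_vma;

	/* Window: a fault on dst arrives before the clone has completed. */
	toy_fault(&dst, &page);

	/* anon_vma_clone() fails and reverts dst->anon_vma to NULL. */
	dst.anon_vma = NULL;

	return page.anon_vma != NULL && dst.anon_vma == NULL;
}
```

Calling `toy_merge_with_failed_clone(false)` models the unpatched ordering
and reports a dangling page; passing `true` models the patched ordering, in
which the fault cannot install a page during the window.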