From patchwork Tue Aug 15 19:36:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13354213
From: Jann Horn
Date: Tue, 15 Aug 2023 21:36:51 +0200
Subject: maple tree change made it possible for VMA iteration to see same VMA twice due to late vma_merge() failure
To: Liam Howlett, Andrew Morton
Cc: kernel list, Linux-MM
Sender: owner-linux-mm@kvack.org
Precedence: bulk

commit 18b098af2890 ("vma_merge: set vma iterator to correct position.") added a vma_prev(vmi) call to vma_merge() at a point where it's still possible to bail out. My understanding is that this moves the VMA iterator back by one VMA.
If you patch some extra logging into the kernel and inject a fake out-of-memory error at the vma_iter_prealloc() call in vma_merge(), using the diff at the end of this mail (a real out-of-memory error there is very unlikely to happen in practice, I think: my understanding is that the kernel will basically kill every process on the system except for init before it starts failing GFP_KERNEL allocations that fit within a single slab, unless the allocation uses __GFP_ACCOUNT or stuff like that, which the maple tree doesn't), and then you run this reproducer:

```
#define _GNU_SOURCE
#include <err.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <linux/userfaultfd.h>

#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

int main(void) {
  int uffd = SYSCHK(syscall(__NR_userfaultfd, UFFD_USER_MODE_ONLY));
  struct uffdio_api api = { .api = UFFD_API, .features = 0 };
  SYSCHK(ioctl(uffd, UFFDIO_API, &api));

  /* create vma1 */
  SYSCHK(mmap((void*)0x100000UL, 0x1000, PROT_READ|PROT_WRITE,
              MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0));

  /* set uffd on vma1 */
  struct uffdio_register reg1 = {
    .range = { .start = 0x100000, .len = 0x1000 },
    .mode = UFFDIO_REGISTER_MODE_MISSING
  };
  SYSCHK(ioctl(uffd, UFFDIO_REGISTER, &reg1));

  /* create vma2 */
  SYSCHK(mmap((void*)0x101000UL, 0x1000, PROT_READ|PROT_WRITE,
              MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0));

  /* tries to merge vma2 into vma1, with injected allocation failure
     causing merge failure */
  SYSCHK(prctl(PR_SET_NAME, "FAILME"));
  struct uffdio_register reg2 = {
    .range = { .start = 0x101000, .len = 0x1000 },
    .mode = UFFDIO_REGISTER_MODE_MISSING
  };
  SYSCHK(ioctl(uffd, UFFDIO_REGISTER, &reg2));
  SYSCHK(prctl(PR_SET_NAME, "normal"));
}
```

then you'll get this fun log output, showing that the same VMA (ffff88810c0b5e00) was visited by two iterations of the VMA iteration loop, and on the second iteration, prev==vma:

[ 326.765586] userfaultfd_register: begin vma iteration
[ 326.766985] userfaultfd_register: prev=ffff88810c0b5ef0, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)
[ 326.768786] userfaultfd_register: vma_merge returned 0000000000000000
[ 326.769898] userfaultfd_register: prev=ffff88810c0b5e00, vma=ffff88810c0b5e00 (0000000000101000-0000000000102000)

I don't know if this can lead to anything bad but it seems pretty clearly unintended?

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7cecd49e078b..a7be4d6a5db6 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1454,9 +1454,16 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		prev = vma;
 
 	ret = 0;
+	if (strcmp(current->comm, "FAILME") == 0)
+		pr_warn("%s: begin vma iteration\n", __func__);
 	for_each_vma_range(vmi, vma, end) {
 		cond_resched();
 
+		if (strcmp(current->comm, "FAILME") == 0) {
+			pr_warn("%s: prev=%px, vma=%px (%016lx-%016lx)\n",
+				__func__, prev, vma, vma->vm_start, vma->vm_end);
+		}
+
 		BUG_ON(!vma_can_userfault(vma, vm_flags));
 		BUG_ON(vma->vm_userfaultfd_ctx.ctx &&
 		       vma->vm_userfaultfd_ctx.ctx != ctx);
@@ -1481,6 +1488,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				 vma_policy(vma),
 				 ((struct vm_userfaultfd_ctx){ ctx }),
 				 anon_vma_name(vma));
+		if (strcmp(current->comm, "FAILME") == 0)
+			pr_warn("%s: vma_merge returned %px\n", __func__, prev);
 		if (prev) {
 			/* vma_merge() invalidated the mas */
 			vma = prev;
diff --git a/mm/mmap.c b/mm/mmap.c
index 3937479d0e07..fd435c40c43d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -990,7 +990,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
 	if (err)
 		return NULL;
 
-	if (vma_iter_prealloc(vmi))
+	if (strcmp(current->comm, "FAILME")==0 || vma_iter_prealloc(vmi))
 		return NULL;
 
 	init_multi_vma_prep(&vp, vma, adjust, remove, remove2);