From patchwork Fri Mar 3 21:32:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13159473
Date: Fri, 3 Mar 2023 13:32:50 -0800
X-Mailer: git-send-email 2.40.0.rc0.216.gc4246ad0f0-goog
Message-ID: <20230303213250.3555716-1-surenb@google.com>
Subject: [PATCH 1/1] mm/khugepaged: fix vm_lock/i_mmap_rwsem inversion in retract_page_tables
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com,
 vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net,
 dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
 songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
 dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
 punit.agrawal@bytedance.com, lstoakes@gmail.com, rppt@kernel.org,
 jannh@google.com, shakeelb@google.com, gthelen@google.com,
 gurua@google.com, arjunroy@google.com, posk@google.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, Suren Baghdasaryan,
 syzbot+8955a9646d1a48b8be92@syzkaller.appspotmail.com

Internal syzkaller on linux-next reported a lock inversion caused by
vm_lock being taken after i_mmap_rwsem:

======================================================
WARNING: possible circular locking dependency detected
6.2.0-next-20230301-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor115/5084 is trying to acquire lock:
ffff888078307a90 (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write include/linux/mm.h:678 [inline]
ffff888078307a90 (&vma->vm_lock->lock){++++}-{3:3}, at: retract_page_tables mm/khugepaged.c:1826 [inline]
ffff888078307a90 (&vma->vm_lock->lock){++++}-{3:3}, at: collapse_file+0x4fa5/0x5980 mm/khugepaged.c:2204

but task is already holding lock:
ffff88801f93efa8 (&mapping->i_mmap_rwsem){++++}-{3:3}, at: i_mmap_lock_write include/linux/fs.h:468 [inline]
ffff88801f93efa8 (&mapping->i_mmap_rwsem){++++}-{3:3}, at: retract_page_tables mm/khugepaged.c:1745 [inline]
ffff88801f93efa8 (&mapping->i_mmap_rwsem){++++}-{3:3}, at: collapse_file+0x3da6/0x5980 mm/khugepaged.c:2204

retract_page_tables() takes i_mmap_rwsem before the exclusive mmap_lock,
which is the inverse of the normal order. Deadlock is avoided by
try-locking mmap_lock and skipping on failure to obtain it. Locking the
VMA should use the same try-lock pattern to avoid this lock inversion.

Fixes: 44a83f2083bd ("mm/khugepaged: write-lock VMA while collapsing a huge page")
Reported-by: syzbot+8955a9646d1a48b8be92@syzkaller.appspotmail.com
Signed-off-by: Suren Baghdasaryan
---
 include/linux/mm.h | 39 ++++++++++++++++++++++++++++-----------
 mm/khugepaged.c    |  5 ++++-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1b9be34a24fb..5f16263d176d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -676,18 +676,23 @@ static inline void vma_end_read(struct vm_area_struct *vma)
 	rcu_read_unlock();
 }
 
-static inline void vma_start_write(struct vm_area_struct *vma)
+static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq)
 {
-	int mm_lock_seq;
-
 	mmap_assert_write_locked(vma->vm_mm);
 
 	/*
 	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
 	 * mm->mm_lock_seq can't be concurrently modified.
 	 */
-	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
-	if (vma->vm_lock_seq == mm_lock_seq)
+	*mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	return (vma->vm_lock_seq == *mm_lock_seq);
+}
+
+static inline void vma_start_write(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
 	down_write(&vma->vm_lock->lock);
@@ -695,14 +700,26 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	up_write(&vma->vm_lock->lock);
 }
 
+static inline bool vma_try_start_write(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	if (__is_vma_write_locked(vma, &mm_lock_seq))
+		return true;
+
+	if (!down_write_trylock(&vma->vm_lock->lock))
+		return false;
+
+	vma->vm_lock_seq = mm_lock_seq;
+	up_write(&vma->vm_lock->lock);
+	return true;
+}
+
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
-	mmap_assert_write_locked(vma->vm_mm);
-	/*
-	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
-	 * mm->mm_lock_seq can't be concurrently modified.
-	 */
-	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+	int mm_lock_seq;
+
+	VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
 
 static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c64e01f03f27..408fed42c9f5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1795,6 +1795,10 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/* trylock for the same lock inversion as above */
+			if (!vma_try_start_write(vma))
+				goto unlock_next;
+
 			/*
 			 * Re-check whether we have an ->anon_vma, because
 			 * collapse_and_free_pmd() requires that either no
@@ -1823,7 +1827,6 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 				result = SCAN_PTE_UFFD_WP;
 				goto unlock_next;
 			}
-			vma_start_write(vma);
 			collapse_and_free_pmd(mm, vma, addr, pmd);
 			if (!cc->is_khugepaged && is_target)
 				result = set_huge_pmd(vma, addr, pmd, hpage);
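
Illustration only, not part of the patch: the try-lock-and-skip pattern
used above can be modelled in userspace roughly as follows. This is a
minimal sketch under stated assumptions. Every *_model name is
hypothetical, a pthread mutex stands in for the exclusive mmap_lock, a
pthread rwlock stands in for vma->vm_lock->lock, and bumping
mm_lock_seq on unlock is an assumption of the model that the patch
itself does not show.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mm (not kernel code). */
struct mm_model {
	pthread_mutex_t mmap_lock;	/* models the exclusive mmap_lock */
	bool mmap_write_locked;		/* lets us model mmap_assert_write_locked() */
	int mm_lock_seq;
};

/* Hypothetical userspace stand-in for a VMA. */
struct vma_model {
	struct mm_model *mm;
	pthread_rwlock_t vm_lock;	/* models vma->vm_lock->lock */
	int vm_lock_seq;
};

static bool mmap_write_trylock_model(struct mm_model *mm)
{
	if (pthread_mutex_trylock(&mm->mmap_lock) != 0)
		return false;
	mm->mmap_write_locked = true;
	return true;
}

static void mmap_write_unlock_model(struct mm_model *mm)
{
	/* Model assumption: a new sequence number drops all VMA write marks. */
	mm->mm_lock_seq++;
	mm->mmap_write_locked = false;
	pthread_mutex_unlock(&mm->mmap_lock);
}

/* Mirrors the shape of __is_vma_write_locked() from the patch. */
static bool is_vma_write_locked_model(struct vma_model *vma, int *mm_lock_seq)
{
	assert(vma->mm->mmap_write_locked);
	*mm_lock_seq = vma->mm->mm_lock_seq;
	return vma->vm_lock_seq == *mm_lock_seq;
}

/* Mirrors the shape of vma_try_start_write(): never blocks. */
static bool vma_try_start_write_model(struct vma_model *vma)
{
	int mm_lock_seq;

	if (is_vma_write_locked_model(vma, &mm_lock_seq))
		return true;

	if (pthread_rwlock_trywrlock(&vma->vm_lock) != 0)
		return false;

	vma->vm_lock_seq = mm_lock_seq;
	pthread_rwlock_unlock(&vma->vm_lock);
	return true;
}

/*
 * Models the caller: both locks are only try-locked, and the VMA is
 * skipped (false is returned) if either one cannot be taken at once.
 */
static bool try_retract_model(struct vma_model *vma)
{
	bool done = false;

	if (!mmap_write_trylock_model(vma->mm))
		return false;

	if (vma_try_start_write_model(vma))
		done = true;	/* the kernel would retract the page table here */

	mmap_write_unlock_model(vma->mm);
	return done;
}
```

In this model a caller that finds either lock contended simply reports
failure instead of sleeping, so it can never become one half of a
circular wait with a thread that takes the locks in the normal order.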