From patchwork Tue Oct 6 22:54:49 2020
X-Patchwork-Submitter: Jann Horn <jannh@google.com>
X-Patchwork-Id: 11819367
From: Jann Horn <jannh@google.com>
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Eric W. Biederman", Michel Lespinasse,
 Mauro Carvalho Chehab, Sakari Ailus, Jeff Dike, Richard Weinberger,
 Anton Ivanov, linux-um@lists.infradead.org, Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 1/2] mmap locking API: Order lock of nascent mm outside
 lock of live mm
Date: Wed, 7 Oct 2020 00:54:49 +0200
Message-Id: <20201006225450.751742-2-jannh@google.com>
In-Reply-To: <20201006225450.751742-1-jannh@google.com>
References: <20201006225450.751742-1-jannh@google.com>

Until now, the mmap lock of the nascent mm was ordered inside the mmap lock
of the old mm (in dup_mmap() and in UML's activate_mm()).

A following patch will change the exec path to lock the nascent mm very
broadly, but fine-grained locking should still work on the old mm at the
same time. In particular, mmap locking calls are hidden behind the
copy_from_user() calls and the like that are reached through functions such
as copy_strings() - when a page fault occurs on a userspace memory access,
the mmap lock will be taken.

To do this in a way that lockdep is happy about, let's turn around the lock
ordering in both places that currently nest the locks. Since
SINGLE_DEPTH_NESTING is normally used for the inner nesting layer, make up
our own lock subclass MMAP_LOCK_SUBCLASS_NASCENT and use that instead.

The added locking calls in exec_mmap() are temporary; the following patch
will move the locking out of exec_mmap().
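[To illustrate the resulting lock order, here is a minimal sketch; it is
not part of the patch, the function and the two-mm scenario are made up,
and in real code the inner lock is typically taken implicitly by the page
fault path rather than explicitly as shown here:]

/*
 * Illustration only: because the nascent mm's rwsem is annotated with
 * lockdep subclass MMAP_LOCK_SUBCLASS_NASCENT, holding it while taking a
 * live mm's lock (subclass 0) is a valid, deadlock-free ordering.
 */
static void sketch_nascent_outside_live(struct mm_struct *nascent,
					struct mm_struct *live)
{
	mmap_write_lock_nascent(nascent);	/* outer: mm under construction */
	mmap_read_lock(live);			/* inner: e.g. page fault on live mm */

	/* ... populate the nascent mm while faults on 'live' still work ... */

	mmap_read_unlock(live);
	mmap_write_unlock(nascent);
}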
Signed-off-by: Jann Horn <jannh@google.com>
---
 arch/um/include/asm/mmu_context.h |  3 +--
 fs/exec.c                         |  4 ++++
 include/linux/mmap_lock.h         | 23 +++++++++++++++++++++--
 kernel/fork.c                     |  7 ++-----
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index 17ddd4edf875..c13bc5150607 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -48,9 +48,8 @@ static inline void activate_mm(struct mm_struct *old, struct mm_struct *new)
 	 * when the new ->mm is used for the first time.
 	 */
 	__switch_mm(&new->context.id);
-	mmap_write_lock_nested(new, SINGLE_DEPTH_NESTING);
+	mmap_assert_write_locked(new);
 	uml_setup_stubs(new);
-	mmap_write_unlock(new);
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..229dbc7aa61a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1114,6 +1114,8 @@ static int exec_mmap(struct mm_struct *mm)
 	if (ret)
 		return ret;
 
+	mmap_write_lock_nascent(mm);
+
 	if (old_mm) {
 		/*
 		 * Make sure that if there is a core dump in progress
@@ -1125,6 +1127,7 @@ static int exec_mmap(struct mm_struct *mm)
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
 			mutex_unlock(&tsk->signal->exec_update_mutex);
+			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	}
@@ -1138,6 +1141,7 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
+	mmap_write_unlock(mm);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0707671851a8..24de1fe99ee4 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -3,6 +3,18 @@
 
 #include <linux/mmdebug.h>
 
+/*
+ * Lock subclasses for the mmap_lock.
+ *
+ * MMAP_LOCK_SUBCLASS_NASCENT is for core kernel code that wants to lock an mm
+ * that is still being constructed and wants to be able to access the active mm
+ * normally at the same time. It nests outside MMAP_LOCK_SUBCLASS_NORMAL.
+ */
+enum {
+	MMAP_LOCK_SUBCLASS_NORMAL = 0,
+	MMAP_LOCK_SUBCLASS_NASCENT
+};
+
 #define MMAP_LOCK_INITIALIZER(name) \
 	.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
 
@@ -16,9 +28,16 @@ static inline void mmap_write_lock(struct mm_struct *mm)
 	down_write(&mm->mmap_lock);
 }
 
-static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
+/*
+ * Lock an mm_struct that is still being set up (during fork or exec).
+ * This nests outside the mmap locks of live mm_struct instances.
+ * No interruptible/killable versions exist because at the points where you're
+ * supposed to use this helper, the mm isn't visible to anything else, so we
+ * expect the mmap_lock to be uncontended.
+ */
+static inline void mmap_write_lock_nascent(struct mm_struct *mm)
 {
-	down_write_nested(&mm->mmap_lock, subclass);
+	down_write_nested(&mm->mmap_lock, MMAP_LOCK_SUBCLASS_NASCENT);
 }
 
 static inline int mmap_write_lock_killable(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index da8d360fb032..db67eb4ac7bd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -474,6 +474,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	unsigned long charge;
 	LIST_HEAD(uf);
 
+	mmap_write_lock_nascent(mm);
 	uprobe_start_dup_mmap();
 	if (mmap_write_lock_killable(oldmm)) {
 		retval = -EINTR;
@@ -481,10 +482,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	}
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
-	/*
-	 * Not linked in yet - no deadlock potential:
-	 */
-	mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
 
 	/* No ordering required: file already has been exposed. */
 	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
@@ -600,12 +597,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	/* a new mm has just been created */
 	retval = arch_dup_mmap(oldmm, mm);
 out:
-	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
 	mmap_write_unlock(oldmm);
 	dup_userfaultfd_complete(&uf);
 fail_uprobe_end:
 	uprobe_end_dup_mmap();
+	mmap_write_unlock(mm);
 	return retval;
 fail_nomem_anon_vma_fork:
 	mpol_put(vma_policy(tmp));

From patchwork Tue Oct 6 22:54:50 2020
X-Patchwork-Submitter: Jann Horn <jannh@google.com>
X-Patchwork-Id: 11819369
From: Jann Horn <jannh@google.com>
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Eric W. Biederman",
Biederman" , Michel Lespinasse , Mauro Carvalho Chehab , Sakari Ailus , Jeff Dike , Richard Weinberger , Anton Ivanov , linux-um@lists.infradead.org, Jason Gunthorpe , John Hubbard Subject: [PATCH v2 2/2] exec: Broadly lock nascent mm until setup_arg_pages() Date: Wed, 7 Oct 2020 00:54:50 +0200 Message-Id: <20201006225450.751742-3-jannh@google.com> X-Mailer: git-send-email 2.28.0.806.g8561365e88-goog In-Reply-To: <20201006225450.751742-1-jannh@google.com> References: <20201006225450.751742-1-jannh@google.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: While AFAIK there currently is nothing that can modify the VMA tree of a new mm until userspace has started running under the mm, we should properly lock the mm here anyway, both to keep lockdep happy when adding locking assertions and to be safe in the future in case someone e.g. decides to permit VMA-tree-mutating operations in process_madvise_behavior_valid(). The goal of this patch is to broadly lock the nascent mm in the exec path, from around the time it is created all the way to the end of setup_arg_pages() (because setup_arg_pages() accesses bprm->vma). As long as the mm is write-locked, keep it around in bprm->mm, even after it has been installed on the task (with an extra reference on the mm, to reduce complexity in free_bprm()). After setup_arg_pages(), we have to unlock the mm so that APIs such as copy_to_user() will work in the following binfmt-specific setup code. Suggested-by: Jason Gunthorpe Suggested-by: Michel Lespinasse Signed-off-by: Jann Horn --- fs/exec.c | 68 ++++++++++++++++++++--------------------- include/linux/binfmts.h | 2 +- 2 files changed, 35 insertions(+), 35 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 229dbc7aa61a..fe11d77e397a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm) return -ENOMEM; vma_set_anonymous(vma); - if (mmap_write_lock_killable(mm)) { - err = -EINTR; - goto err_free; - } - /* * Place the stack at the largest stack address the architecture * supports. Later, we'll move this to an appropriate place. We don't @@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm) goto err; mm->stack_vm = mm->total_vm = 1; - mmap_write_unlock(mm); bprm->p = vma->vm_end - sizeof(void *); return 0; err: - mmap_write_unlock(mm); -err_free: bprm->vma = NULL; vm_area_free(vma); return err; @@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm) struct mm_struct *mm = NULL; bprm->mm = mm = mm_alloc(); - err = -ENOMEM; if (!mm) - goto err; + return -ENOMEM; + mmap_write_lock_nascent(mm); /* Save current stack limit for all calculations made during exec. */ task_lock(current->group_leader); @@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm) task_unlock(current->group_leader); err = __bprm_mm_init(bprm); - if (err) - goto err; - - return 0; - -err: - if (mm) { - bprm->mm = NULL; - mmdrop(mm); - } + if (!err) + return 0; + bprm->mm = NULL; + mmap_write_unlock(mm); + mmdrop(mm); return err; } @@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift) /* * Finalizes the stack vm_area_struct. The flags and permissions are updated, * the stack is optionally relocated, and some extra space is added. + * At the end of this, the mm_struct will be unlocked on success. 
---
 fs/exec.c               | 68 ++++++++++++++++++++---------------------
 include/linux/binfmts.h |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 229dbc7aa61a..fe11d77e397a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		return -ENOMEM;
 	vma_set_anonymous(vma);
 
-	if (mmap_write_lock_killable(mm)) {
-		err = -EINTR;
-		goto err_free;
-	}
-
 	/*
 	 * Place the stack at the largest stack address the architecture
 	 * supports. Later, we'll move this to an appropriate place. We don't
@@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		goto err;
 	mm->stack_vm = mm->total_vm = 1;
 
-	mmap_write_unlock(mm);
 	bprm->p = vma->vm_end - sizeof(void *);
 	return 0;
 err:
-	mmap_write_unlock(mm);
-err_free:
 	bprm->vma = NULL;
 	vm_area_free(vma);
 	return err;
@@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	struct mm_struct *mm = NULL;
 
 	bprm->mm = mm = mm_alloc();
-	err = -ENOMEM;
 	if (!mm)
-		goto err;
+		return -ENOMEM;
+	mmap_write_lock_nascent(mm);
 
 	/* Save current stack limit for all calculations made during exec. */
 	task_lock(current->group_leader);
@@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	task_unlock(current->group_leader);
 
 	err = __bprm_mm_init(bprm);
-	if (err)
-		goto err;
-
-	return 0;
-
-err:
-	if (mm) {
-		bprm->mm = NULL;
-		mmdrop(mm);
-	}
+	if (!err)
+		return 0;
 
+	bprm->mm = NULL;
+	mmap_write_unlock(mm);
+	mmdrop(mm);
 	return err;
 }
 
@@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
  * the stack is optionally relocated, and some extra space is added.
+ * At the end of this, the mm_struct will be unlocked on success.
  */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
@@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	if (mmap_write_lock_killable(mm))
-		return -EINTR;
-
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
@@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
 	if (ret)
-		goto out_unlock;
+		return ret;
 	BUG_ON(prev != vma);
 
 	if (unlikely(vm_flags & VM_EXEC)) {
@@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	if (stack_shift) {
 		ret = shift_arg_pages(vma, stack_shift);
 		if (ret)
-			goto out_unlock;
+			return ret;
 	}
 
 	/* mprotect_fixup is overkill to remove the temporary stack flags */
@@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	current->mm->start_stack = bprm->p;
 	ret = expand_stack(vma, stack_base);
 	if (ret)
-		ret = -EFAULT;
+		return -EFAULT;
 
-out_unlock:
+	/*
+	 * From this point on, anything that wants to poke around in the
+	 * mm_struct must lock it by itself.
+	 */
+	bprm->vma = NULL;
 	mmap_write_unlock(mm);
-	return ret;
+	mmput(mm);
+	bprm->mm = NULL;
+	return 0;
 }
 
 EXPORT_SYMBOL(setup_arg_pages);
@@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm)
 	if (ret)
 		return ret;
 
-	mmap_write_lock_nascent(mm);
-
 	if (old_mm) {
 		/*
 		 * Make sure that if there is a core dump in progress
@@ -1127,11 +1116,12 @@ static int exec_mmap(struct mm_struct *mm)
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
 			mutex_unlock(&tsk->signal->exec_update_mutex);
-			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	}
 
+	/* bprm->mm stays refcounted, current->mm takes an extra reference */
+	mmget(mm);
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
@@ -1141,7 +1131,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-	mmap_write_unlock(mm);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
@@ -1397,8 +1386,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
-	bprm->mm = NULL;
-
 #ifdef CONFIG_POSIX_TIMERS
 	exit_itimers(me->signal);
 	flush_itimer_signals();
@@ -1545,6 +1532,18 @@ void setup_new_exec(struct linux_binprm * bprm)
 	me->mm->task_size = TASK_SIZE;
 	mutex_unlock(&me->signal->exec_update_mutex);
 	mutex_unlock(&me->signal->cred_guard_mutex);
+
+#ifndef CONFIG_MMU
+	/*
+	 * On MMU, setup_arg_pages() wants to access bprm->vma after this point,
+	 * so we can't drop the mmap lock yet.
+	 * On !MMU, we have neither setup_arg_pages() nor bprm->vma, so we
+	 * should drop the lock here.
+	 */
+	mmap_write_unlock(bprm->mm);
+	mmput(bprm->mm);
+	bprm->mm = NULL;
+#endif
 }
 
 EXPORT_SYMBOL(setup_new_exec);
@@ -1581,6 +1580,7 @@ static void free_bprm(struct linux_binprm *bprm)
 {
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
+		mmap_write_unlock(bprm->mm);
 		mmput(bprm->mm);
 	}
 	free_arg_pages(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 0571701ab1c5..3bf06212fbae 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -22,7 +22,7 @@ struct linux_binprm {
 # define MAX_ARG_PAGES 32
 	struct page *page[MAX_ARG_PAGES];
 #endif
-	struct mm_struct *mm;
+	struct mm_struct *mm; /* nascent mm, write-locked */
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int