From patchwork Mon Jul 9 10:27:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Martin X-Patchwork-Id: 10514225 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A0F8260318 for ; Mon, 9 Jul 2018 10:28:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9215028A81 for ; Mon, 9 Jul 2018 10:28:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 864F528A85; Mon, 9 Jul 2018 10:28:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1F23C28A81 for ; Mon, 9 Jul 2018 10:28:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=wcbYsANZFgiVAot0a7W3d1fhcsnZTBzusk4eOxf5PdY=; b=olbq6n7uG5lCwy 4C6kG6V2JOp7jt0hCRuYZgBah718zV3+a40zA/H+mGTtEVW5+r434GjtsCTLI0pdcna3O9SanAL3V akLfRGwwuvKK0pTI7IQbgS13jtt7I1zafcTvK7haTmPGZhjxzalMd/6wfXsqUKjKIMGPuPXDsmJl4 VRj5vAx5Dkgnc9zZHgjlcuCQozViacrXK3U8BL32jXIugfCXJJ96lg1Poc4NJT4fUpz7YcpUVEV76 aos6pM+gff63x+KsVAKyX7m9e0/otp+PXyXCfdHLSY8B91dzz8iYJ+sNoXZ10X4V66gpI66Niw6S4 xWTIuBxxGZVXXTi3lWSw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1fcTOs-0000IF-CL; Mon, 09 Jul 2018 10:28:10 +0000 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70] helo=foss.arm.com) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1fcTOp-0000H8-FS for linux-arm-kernel@lists.infradead.org; Mon, 09 Jul 2018 10:28:08 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 72F827A9; Mon, 9 Jul 2018 03:27:55 -0700 (PDT) Received: from e103592.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8693B3F318; Mon, 9 Jul 2018 03:27:54 -0700 (PDT) Date: Mon, 9 Jul 2018 11:27:46 +0100 From: Dave Martin To: "Yandong.Zhao" Subject: Re: [PATCH] arm64: neon: Do not access kernel_neon_busy with preemption enabled Message-ID: <20180709102724.GA9486@e103592.cambridge.arm.com> References: <1530936171-16169-1-git-send-email-yandong77520@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1530936171-16169-1-git-send-email-yandong77520@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180709_032807_526418_7672167D X-CRM114-Status: GOOD ( 29.64 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: zhaoyd@thundersoft.com, zhaoxb@thundersoft.com, linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP On Sat, Jul 07, 2018 at 12:02:51PM +0800, Yandong.Zhao wrote: > From: Yandong Zhao > > Dear, Dave > Thank you very much for reading my email. I have a question and > I hope toget your reply. > It is a bug to not close the preemption when executing > raw_cpu_read(kernel_neon_busy) in the function may_use_simd()! > We encountered a ‘sysdump’ problem on Kernel 4.14 code because > the program entered BUG_ON() in the function kernel_neon_begin(). > Our analysis concludes that the process is migrating while executing > raw_cpu_read(kernel_neon_busy). Can you include a backtrace, please? Also, do you have any significant patches applies on top of v4.14? Is your configuration different from defconfig? > > Signed-off-by: Yandong Zhao > --- > arch/arm64/include/asm/simd.h | 17 +++++------------ > 1 file changed, 5 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h > index fa8b3fe..0d91084 100644 > --- a/arch/arm64/include/asm/simd.h > +++ b/arch/arm64/include/asm/simd.h > @@ -29,20 +29,13 @@ > static __must_check inline bool may_use_simd(void) > { > /* > - * The raw_cpu_read() is racy if called with preemption enabled. > - * This is not a bug: kernel_neon_busy is only set when > - * preemption is disabled, so we cannot migrate to another CPU > - * while it is set, nor can we migrate to a CPU where it is set. > - * So, if we find it clear on some CPU then we're guaranteed to > - * find it clear on any CPU we could migrate to. > - * > - * If we are in between kernel_neon_begin()...kernel_neon_end(), > - * the flag will be set, but preemption is also disabled, so we > - * can't migrate to another CPU and spuriously see it become > - * false. > + * Operations for contexts where we do not want to do any checks for > + * preemptions. Unless strictly necessary, always use this_cpu_*() > + * instead. Because of the kernel_neon_busy here we have to make sure > + * that it is the current cpu. > */ > return !in_irq() && !irqs_disabled() && !in_nmi() && > - !raw_cpu_read(kernel_neon_busy); > + !this_cpu_read(kernel_neon_busy); This doesn't really make sense by itself: this function is required to work in preemptible contexts, so in some situations there is no way to be sure which cpu's kernel_neon_busy flag is read. The original comment attempts to explain why this doesn't matter, and why raw_cpu_read() is valid. Can you explain why this is wrong? The BUG_ON() in kernel_neon_begin() is there for the purpose of detecting invalid uses of kernel_neon_begin() -- i.e., one kernel_neon_begin() nested inside another, or kernel_neon_begin() called from hardirq or nmi context. Are you sure kernel_neon_begin() is not being used wrongly somewhere? If kernel_neon_begin() is being used wrongly, the following hack may help to identify where the problem is: Cheers ---Dave --8<-- diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 8afc518..e6cabf6 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1165,6 +1165,7 @@ void fpsimd_flush_cpu_state(void) DEFINE_PER_CPU(bool, kernel_neon_busy); EXPORT_PER_CPU_SYMBOL(kernel_neon_busy); +static DEFINE_PER_CPU(void *, kernel_neon_caller); /* * Kernel-side NEON support functions @@ -1188,11 +1189,23 @@ void kernel_neon_begin(void) if (WARN_ON(!system_supports_fpsimd())) return; - BUG_ON(!may_use_simd()); + preempt_disable(); + if (!may_use_simd()) { + if (in_irq() || irqs_disabled() || in_nmi()) + pr_err("fpsimd: kernel_neon_begin() from hardirq/nmi context\n"); + + if (this_cpu_read(kernel_neon_busy)) + pr_err("fpsimd: kernel_neon_begin() already called by %pF\n", + this_cpu_read(kernel_neon_caller)); + + BUG(); + } + preempt_enable(); local_bh_disable(); __this_cpu_write(kernel_neon_busy, true); + __this_cpu_write(kernel_neon_caller, (void *)_RET_IP_); /* Save unsaved fpsimd state, if any: */ fpsimd_save();