From patchwork Mon Aug 10 17:38:09 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Catalin Marinas
X-Patchwork-Id: 6985711
X-Patchwork-Delegate: bhelgaas@google.com
Date: Mon, 10 Aug 2015 18:38:09 +0100
From: Catalin Marinas
To: Bjorn Helgaas
Cc: Duc Dang, "linux-pci@vger.kernel.org", Tanmay Inamdar, linux-arm,
 "linux-kernel@vger.kernel.org"
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in
 pci_generic_config_read32
Message-ID: <20150810173809.GE15394@e104818-lin.cambridge.arm.com>
References: <20150724224258.GA23990@google.com>
 <20150728212944.GA12958@google.com>
 <20150729012255.GA18606@google.com>
 <20150729155509.GA31170@google.com>
X-Mailing-List: linux-pci@vger.kernel.org

On Mon, Aug 10, 2015 at 11:18:23AM -0500, Bjorn Helgaas wrote:
> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang wrote:
> > On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas wrote:
> >> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
> >>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
> >>
> >>> > Do you have another PCIe card to try on the same reboot test on this board?
> >>>
> >>> I've seen this on at least two Mellanox cards. I'm running similar tests
> >>> on a different type of card now.
> >>
> >> FWIW, reboot tests on two machines with Mellanox cards failed, while the
> >> same test on a machine with a different proprietary card succeeded.
> >
> > Thanks, Bjorn.
> >
> > I don't have the same Mellanox card as yours, but I will also run
> > similar reboot test to see if I hit the same issue with my card.
>
> Any more hints on this? Nothing has changed on my end, so of course
> I'm still seeing this, always on machines with Mellanox, and never on
> other machines. Could this be a hardware issue like a signal
> integrity or margin issue? I don't know where to go from here because
> I'm not a hardware person, and I don't know anything to do in
> software.

Silly hack below, not actually a solution (and it may not even work):

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 94d98cd1aad8..e895e96b3d13 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -369,6 +369,14 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * Retry the faulty access.
+ */
+static int do_good(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	return 0;
+}
+
 static struct fault_info {
 	int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int sig;
@@ -391,7 +399,7 @@ static struct fault_info {
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" },
 	{ do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" },
-	{ do_bad, SIGBUS, 0, "synchronous external abort" },
+	{ do_good, SIGBUS, 0, "synchronous external abort" },
 	{ do_bad, SIGBUS, 0, "asynchronous external abort" },
 	{ do_bad, SIGBUS, 0, "unknown 18" },
 	{ do_bad, SIGBUS, 0, "unknown 19" },
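
For anyone wondering why returning 0 amounts to "retry": a rough userspace sketch of how the table in the patch is consumed, assuming the dispatcher treats a zero return from ->fn as "handled" and simply returns to the faulting instruction. This is not the kernel code; dispatch_fault(), the two-entry table, the dropped pt_regs argument and the ESR values used in the example are all made up for illustration.

```c
/* Userspace sketch only; names other than do_bad/do_good/fault_info are hypothetical. */
#include <assert.h>
#include <stdio.h>

struct fault_info {
	int (*fn)(unsigned long addr, unsigned int esr);
	const char *name;
};

/* Nonzero return: the fault was not handled. */
static int do_bad(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	return 1;
}

/* Zero return: claim the fault was handled, so the access gets retried. */
static int do_good(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	return 0;
}

/* Cut-down table: index 0 stands in for the patched entry, index 1 for an unpatched one. */
static const struct fault_info fault_info[] = {
	{ do_good, "synchronous external abort" },
	{ do_bad,  "asynchronous external abort" },
};

/*
 * Returns 0 if the handler claimed the fault (execution would resume at the
 * faulting instruction), 1 if the fault would be reported as unhandled.
 */
static int dispatch_fault(unsigned int idx, unsigned long addr, unsigned int esr)
{
	const struct fault_info *inf;

	assert(idx < sizeof(fault_info) / sizeof(fault_info[0]));
	inf = &fault_info[idx];

	if (!inf->fn(addr, esr))
		return 0;

	printf("Unhandled fault: %s (0x%08x) at 0x%016lx\n", inf->name, esr, addr);
	return 1;
}
```

With this table, dispatch_fault(0, 0x1000UL, 0x96000210U) returns 0 and prints nothing, while dispatch_fault(1, 0x1000UL, 0x96000211U) prints the "Unhandled fault" line and returns 1. Presumably, if the access keeps aborting, the retry just faults again, which is one reason the hack is not a real fix.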