[1/5] dmaengine: bcm2835: Fix interrupt race on RT

If IRQ handlers are threaded (either because CONFIG_PREEMPT_RT_BASE is
enabled or "threadirqs" was passed on the command line) and if system
load is sufficiently high that wakeup latency of IRQ threads degrades,
SPI DMA transactions on the BCM2835 occasionally break like this:

ks8851 spi0.0: SPI transfer timed out
bcm2835-dma 3f007000.dma: DMA transfer could not be terminated
ks8851 spi0.0 eth2: ks8851_rdfifo: spi_sync() failed

The root cause is an assumption made by the DMA driver which is
documented in a code comment in bcm2835_dma_terminate_all():

/*
 * Stop DMA activity: we assume the callback will not be called
 * after bcm_dma_abort() returns (even if it does, it will see
 * c->desc is NULL and exit.)
 */

That assumption falls apart if the IRQ handler bcm2835_dma_callback() is
threaded:  A client may terminate a descriptor and issue a new one
before the IRQ handler had a chance to run.  In fact the IRQ handler may
miss an *arbitrary* number of descriptors.  The result is the following
race condition:

1. A descriptor finishes, its interrupt is deferred to the IRQ thread.
2. A client calls dma_terminate_async() which sets channel->desc = NULL.
3. The client issues a new descriptor.  Because channel->desc is NULL,
   bcm2835_dma_issue_pending() immediately starts the descriptor.
4. Finally the IRQ thread runs and writes BCM2835_DMA_INT to the CS
   register to acknowledge the interrupt.  This clears the ACTIVE flag,
   so the newly issued descriptor is paused in the middle of the
   transaction.  Because channel->desc is not NULL, the IRQ thread
   finalizes the descriptor and tries to start the next one.

I see two possible solutions:  The first is to call synchronize_irq()
in bcm2835_dma_issue_pending() to wait until the IRQ thread has
finished before issuing a new descriptor.  The downside of this approach
is unnecessary latency if clients desire rapidly terminating and
re-issuing descriptors and don't have any use for an IRQ callback.
(The SPI TX DMA channel is a case in point.)

A better alternative is to make the IRQ thread recognize that it has
missed descriptors and avoid finalizing the newly issued descriptor.
Therefore, only finalize a descriptor if the END flag is set in the CS
register.  Clear the flag when starting a new descriptor.  Perform both
operations under the vchan lock.  That way, there is practically no
impact on latency and throughput if the client doesn't care for the
interrupt:  Only minimal additional overhead is introduced as one
further MMIO read is necessary per interrupt.

That MMIO read is needed to write the same value to the CS register that
it currently has, thereby preserving the ACTIVE flag while clearing the
INT and END flags.  Note that the transaction may finish between reading
and writing the CS register, and in that case the write changes the
ACTIVE flag from 0 to 1.  But that's harmless, the chip will notice that
NEXTCONBK is 0 and self-clear the ACTIVE flag again.

Fixes: 96286b576690 ("dmaengine: Add support for BCM2835")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.14+
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Florian Meier <florian.meier@koalo.de>
---
 drivers/dma/bcm2835-dma.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Message ID	72be86ba05f9dde6cebeb4763da831f0d2f642d0.1545462045.git.lukas@wunner.de (mailing list archive)
State	Changes Requested
Headers	show Return-Path: <dmaengine-owner@kernel.org> Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 894E5746 for <patchwork-dmaengine@patchwork.kernel.org>; Sat, 22 Dec 2018 18:19:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A0262893C for <patchwork-dmaengine@patchwork.kernel.org>; Sat, 22 Dec 2018 18:19:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F28A6289E5; Sat, 22 Dec 2018 18:19:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6DDEF2893C for <patchwork-dmaengine@patchwork.kernel.org>; Sat, 22 Dec 2018 18:19:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391094AbeLVSTX (ORCPT <rfc822;patchwork-dmaengine@patchwork.kernel.org>); Sat, 22 Dec 2018 13:19:23 -0500 Received: from mailout3.hostsharing.net ([176.9.242.54]:50493 "EHLO mailout3.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391097AbeLVSTW (ORCPT <rfc822;dmaengine@vger.kernel.org>); Sat, 22 Dec 2018 13:19:22 -0500 Received: from h08.hostsharing.net (h08.hostsharing.net [83.223.95.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by mailout3.hostsharing.net (Postfix) with ESMTPS id D6571101E693E; Sat, 22 Dec 2018 08:42:21 +0100 (CET) Received: from localhost (unknown [89.246.108.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by h08.hostsharing.net (Postfix) with ESMTPSA id 30E066031F05; Sat, 22 Dec 2018 08:42:21 +0100 (CET) X-Mailbox-Line: From 72be86ba05f9dde6cebeb4763da831f0d2f642d0 Mon Sep 17 00:00:00 2001 Message-Id: <72be86ba05f9dde6cebeb4763da831f0d2f642d0.1545462045.git.lukas@wunner.de> In-Reply-To: <cover.1545462045.git.lukas@wunner.de> References: <cover.1545462045.git.lukas@wunner.de> From: Lukas Wunner <lukas@wunner.de> Date: Sat, 22 Dec 2018 08:28:45 +0100 Subject: [PATCH 1/5] dmaengine: bcm2835: Fix interrupt race on RT To: Vinod Koul <vkoul@kernel.org>, Eric Anholt <eric@anholt.net>, Stefan Wahren <stefan.wahren@i2se.com> Cc: Frank Pavlic <f.pavlic@kunbus.de>, Martin Sperl <kernel@martin.sperl.org>, Florian Meier <florian.meier@koalo.de>, dmaengine@vger.kernel.org, linux-rpi-kernel@lists.infradead.org, Tiejun Chen <tiejun.china@gmail.com>, linux-rt-users@vger.kernel.org Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: <dmaengine.vger.kernel.org> X-Mailing-List: dmaengine@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
Series	Raspberry Pi DMA fixes + cleanups \| expand [0/5] Raspberry Pi DMA fixes + cleanups [1/5] dmaengine: bcm2835: Fix interrupt race on RT [2/5] dmaengine: bcm2835: Fix abort of transactions [3/5] dmaengine: bcm2835: Clean up abort of transactions [4/5] dmaengine: bcm2835: Enforce control block alignment [5/5] dmaengine: bcm2835: Remove dead code

[1/5] dmaengine: bcm2835: Fix interrupt race on RT

Commit Message

Comments

Patch