From patchwork Sun Jul 2 14:51:36 2017
X-Patchwork-Submitter: Ondrej Zary
X-Patchwork-Id: 9821581
From: Ondrej Zary
To: Finn Thain
Subject: Re: [PATCH v6 0/6] g_NCR5380: PDMA fixes and cleanup
Date: Sun, 2 Jul 2017 16:51:36 +0200
Cc: "James E.J. Bottomley", "Martin K. Petersen", linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, Michael Schmitz
References: <201707012349.04652.linux@rainbow-software.org>
Message-Id: <201707021651.37016.linux@rainbow-software.org>
X-Mailing-List: linux-scsi@vger.kernel.org

On Sunday 02 July 2017 05:11:27 Finn Thain wrote:
> On Sat, 1 Jul 2017, Ondrej Zary wrote:
> > The write corruption is still present - "start" must be rolled back in
> > both IRQ and timeout cases.
>
> Your original algorithm aborts the transfer for a timeout. Same with mine.

I do "start -= 2 * 128" even after timeout.

> The bug must be elsewhere.
>
> > And 128 B is not enough, 256 is OK (why did it work last time?).
>
> When I get contradictory results it usually means I booted the wrong build
> or built the wrong branch.

I've just retested PATCHv5, it really misses 128 bytes and works if I add
"residual += 128;".

> Actually, I think that adding 128 to the residual is correct in some
> situations, and 256 is correct in other situations.
>
> > We just wrote a buffer to the chip but the chip is writing the previous
> > one to the drive - so if a problem arises, both buffers are lost.
>
> I see. I guess we have to take buffer swaps into account.
>
> > This fixes the corruption (although the "start > 0" check seems wrong
> > now):
> > --- a/drivers/scsi/g_NCR5380.c
> > +++ b/drivers/scsi/g_NCR5380.c
> > @@ -598,23 +598,17 @@ static inline int generic_NCR5380_psend(struct
> > NCR5380_hostdata *hostdata, CSR_HOST_BUF_NOT_RDY, 0,
> >  hostdata->c400_ctl_status,
> >  CSR_GATED_53C80_IRQ,
> > - CSR_GATED_53C80_IRQ, HZ / 64) < 0)
> > - break;
> > -
> > - if (NCR5380_read(hostdata->c400_ctl_status) &
> > - CSR_HOST_BUF_NOT_RDY) {
> > + CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
> > + (NCR5380_read(hostdata->c400_ctl_status) &
> > + (CSR_HOST_BUF_NOT_RDY | CSR_GATED_53C80_IRQ))) {
>
> You could add a printk to the timeout branch. If it executes, something is
> seriously wrong. E.g.
>
> - break;
> + { pr_err("send timeout %02x, %d/%d\n",
> NCR5380_read(hostdata->c400_ctl_status), start, len); break; }

Yes, timeouts do happen:
[ 9671.909223] send timeout 14, 3840/4096
[ 9672.978079] send timeout 14, 2816/4096
[ 9675.323751] send timeout 14, 1280/4096

> >  /* The chip has done a 128 B buffer swap but the first
> >  * buffer still has not reached the SCSI bus.
> >  */
> >  if (start > 0)
> > - start -= 128;
> > + start -= 256;
> >  break;
> >  }
>
> BTW, that change carries the risk of 'start' going negative and the
> residual exceeding the length of the original transfer.
>
> But I agree with you that there's a problem with the residual.
>
> If I understand correctly, the 53c400 can't do a buffer swap until the
> disk acknowledges each of the 128 bytes from the buffer. But I guess the
> first buffer is special because the disk will not see the first byte of
> the transfer until after the first buffer swap.
>
> And it appears that the last buffer is also special: we have to wait for
> CSR_HOST_BUF_NOT_RDY even after start == len otherwise we may not detect a
> failure and fix the residual. So I think the datasheet is right; we have
> to iterate until the block counter goes to zero.
>
> I think it is safe to say that when CSR_HOST_BUF_NOT_RDY, 'start' is
> between 128 and 256 B ahead of the disk. Otherwise, the host buffer is
> empty and 'start' is no more than 128 B ahead of the disk.
>
> > - if (NCR5380_read(hostdata->c400_ctl_status) &
> > - CSR_GATED_53C80_IRQ)
> > - break;
> > -
> >  if (hostdata->io_port && hostdata->io_width == 2)
> >  outsw(hostdata->io_port + hostdata->c400_host_buf,
> >  src + start, 64);
> >
> >
> > DTC seems to work too.
>
> OK. Thanks for testing. Please try the patch below on top of v6.

It misses 256B blocks. It's caused by the timeouts, this patch fixes it:

--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -598,11 +598,9 @@ static inline int generic_NCR5380_psend(struct NCR5380_hostdata *hostdata,
 CSR_HOST_BUF_NOT_RDY, 0,
 hostdata->c400_ctl_status,
 CSR_GATED_53C80_IRQ,
- CSR_GATED_53C80_IRQ, HZ / 64) < 0)
- break;
-
- if (NCR5380_read(hostdata->c400_ctl_status) &
- CSR_HOST_BUF_NOT_RDY) {
+ CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
+ (NCR5380_read(hostdata->c400_ctl_status) &
+ CSR_HOST_BUF_NOT_RDY)) {
 /* Both 128 B buffers are in use */
 if (start >= 128)
 start -= 128;
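
For reference, the residual arithmetic discussed in this thread can be sketched as a standalone helper. This is only an illustration, not driver code: pdma_rollback_start(), pdma_residual() and PDMA_BUF_SIZE are hypothetical names. It takes the worst case of the bound quoted above (up to 256 B ahead of the disk when CSR_HOST_BUF_NOT_RDY is set, up to 128 B otherwise) and clamps the result so 'start' cannot go negative. Note that the patch above subtracts only 128, so this sketch is deliberately more conservative than what was tested.

```c
#include <stdbool.h>

/* Size of one 53c400 buffer in bytes, as discussed in the thread. */
#define PDMA_BUF_SIZE 128

/*
 * Hypothetical helper (not part of g_NCR5380.c): given how many bytes the
 * host has written ('start') and whether CSR_HOST_BUF_NOT_RDY was set when
 * the transfer stopped, return a rolled-back 'start' that assumes the worst
 * case for bytes still held in the chip's buffers.
 */
static inline int pdma_rollback_start(int start, bool host_buf_not_rdy)
{
	/* Both buffers in flight: up to 256 B lost; otherwise up to 128 B. */
	int lost = host_buf_not_rdy ? 2 * PDMA_BUF_SIZE : PDMA_BUF_SIZE;

	/* Clamp so that 'start' never goes negative. */
	if (lost > start)
		lost = start;

	return start - lost;
}

/* Residual: bytes of the request not known to have been transferred. */
static inline int pdma_residual(int len, int start)
{
	return len - start;
}
```

With the first logged timeout (3840/4096), and assuming CSR_HOST_BUF_NOT_RDY was set at that point, this gives start = 3584 and a residual of 512. The clamp keeps the residual from exceeding the length of the original transfer.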