From patchwork Mon Jun 11 15:06:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: James Bottomley
X-Patchwork-Id: 10458365
Message-ID: <1528729598.4000.2.camel@HansenPartnership.com>
Subject: Re: RAID6: "Bad block number requested"
From: James Bottomley
To: Sebastian Hegler, linux-raid@vger.kernel.org,
 linux-scsi@vger.kernel.org
Date: Mon, 11 Jun 2018 08:06:38 -0700
In-Reply-To: <165E54F8-0494-4430-B8A5-0C7DCDF1D91C@tu-dresden.de>
References: <165E54F8-0494-4430-B8A5-0C7DCDF1D91C@tu-dresden.de>
X-Mailer: Evolution 3.22.6
X-Mailing-List: linux-scsi@vger.kernel.org

On Mon, 2018-06-11 at 16:24 +0200, Sebastian Hegler wrote:
> Dear all,
>
> First off: sorry for cross-posting. I don't know if this is a RAID
> issue or a SCSI issue, so I'll just ask y'all.
>
>
> For a RAID6 capacity upgrade (higher capacity drives), we bought some
> 10TB disks:
> ==================
> Apr 17 11:16:05 kuiper kernel: [12795386.862031] scsi 6:0:36:0:
> Direct-Access     ATA      HGST HUH721010AL T21D PQ: 0 ANSI: 6
> Apr 17 11:16:05 kuiper kernel: [12795386.919904] scsi 6:0:36:0:
> atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
> Apr 17 11:16:05 kuiper kernel: [12795386.974186] sd 6:0:36:0: [sdl]
> 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)

Well, this is the problem: a 4k logical (presumably 4k physical) drive
cannot be addressed in block sectors that are not divisible by 8.
This type of drive configuration is very unusual (although it was
something we tested years ago before the industry realised it had to
ship drives with 4k physical but 512 byte logical sectors because of
the legacy problem).

> Apr 17 11:16:05 kuiper kernel: [12795386.998016] sd 6:0:36:0: [sdl]
> Write Protect is off
> Apr 17 11:16:05 kuiper kernel: [12795387.000625] sd 6:0:36:0:
> Attached scsi generic sg12 type 0
> Apr 17 11:16:05 kuiper kernel: [12795387.035341] sd 6:0:36:0: [sdl]
> Mode Sense: 7f 00 10 08
> Apr 17 11:16:05 kuiper kernel: [12795387.035679] sd 6:0:36:0: [sdl]
> Write cache: enabled, read cache: enabled, supports DPO and FUA
> Apr 17 11:16:05 kuiper kernel: [12795387.098315] sd 6:0:36:0: [sdl]
> Attached SCSI disk
> ==================
>
> RAID add and rebuild operations went fine. However, some minutes
> after rebuild completion, several hundreds of these error messages
> started to appear:
> ==================
> Apr 20 03:37:29 kuiper kernel: [13027072.454811] sd 6:0:36:0: [sdl]
> Bad block number requested

This means that somehow, something sent a request that is not 4k
aligned or not a multiple of 4k in size. SCSI here is just the
messenger. However, if you apply this patch, it will capture the stack
trace of whatever above SCSI triggered this, which may help us in
debugging. We may also want to see what the values of block and
blk_rq_sectors(rq) actually are, but let's begin with the stack trace.

James

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9421d9877730..ac865e048533 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
 			scmd_printk(KERN_ERR, SCpnt,
 				    "Bad block number requested\n");
+			WARN_ON_ONCE(1);
 			goto out;
 		} else {
 			block = block >> 3;
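
For readers following along, the check that the hunk above instruments
can be sketched in userspace C. This is only an illustration, assuming
the block layer counts in 512-byte sectors, so that one 4096-byte
logical block spans 8 of them. The helper names
request_fits_4k_blocks() and sector_to_4k_block() are hypothetical and
do not exist in sd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: requests are expressed in 512-byte sectors, so a
 * 4096-byte logical block covers 8 sectors. */
#define SECTORS_PER_4K_BLOCK 8u

/* Hypothetical helper: true if a request starting at 512-byte sector
 * `start` and covering `nr_sectors` sectors maps onto whole 4096-byte
 * blocks. This is the inverse of the condition tested in the hunk:
 * (block & 7) || (blk_rq_sectors(rq) & 7). */
bool request_fits_4k_blocks(uint64_t start, uint32_t nr_sectors)
{
	return (start & (SECTORS_PER_4K_BLOCK - 1)) == 0 &&
	       (nr_sectors & (SECTORS_PER_4K_BLOCK - 1)) == 0;
}

/* Hypothetical helper: convert an aligned 512-byte sector number to a
 * 4096-byte logical block number, as "block = block >> 3" does. */
uint64_t sector_to_4k_block(uint64_t sector)
{
	/* Callers are expected to have checked alignment first. */
	assert((sector & (SECTORS_PER_4K_BLOCK - 1)) == 0);
	return sector >> 3;
}
```

With these helpers, a request starting at sector 8 for 8 sectors
passes, while one starting at sector 9, or one only 4 sectors long,
would land in the "Bad block number requested" branch.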