Message ID | 1528729598.4000.2.camel@HansenPartnership.com (mailing list archive) |
---|---|
State | Not Applicable |
On 2018-06-11 11:06 AM, James Bottomley wrote:
> On Mon, 2018-06-11 at 16:24 +0200, Sebastian Hegler wrote:
>> Dear all,
>>
>> First off: sorry for cross-posting. I don't know if this is a RAID
>> issue or a SCSI issue, so I'll just ask y'all.
>>
>>
>> For a RAID6 capacity upgrade (higher capacity drives), we bought some
>> 10TB disks:
>> ==================
>> Apr 17 11:16:05 kuiper kernel: [12795386.862031] scsi 6:0:36:0:
>> Direct-Access ATA HGST HUH721010AL T21D PQ: 0 ANSI: 6
>> Apr 17 11:16:05 kuiper kernel: [12795386.919904] scsi 6:0:36:0:
>> atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
>> Apr 17 11:16:05 kuiper kernel: [12795386.974186] sd 6:0:36:0: [sdl]
>> 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)
>
> Well, this is the problem: a 4k logical (presumably 4k physical) drive
> cannot be addressed in block sectors that are not divisible by 8. This
> type of drive configuration is very unusual (although it was something
> we tested years ago before the industry realised it had to ship drives
> with 4k physical but 512 byte logical sectors because of the legacy
> problem).
>
>> Apr 17 11:16:05 kuiper kernel: [12795386.998016] sd 6:0:36:0: [sdl]
>> Write Protect is off
>> Apr 17 11:16:05 kuiper kernel: [12795387.000625] sd 6:0:36:0:
>> Attached scsi generic sg12 type 0
>> Apr 17 11:16:05 kuiper kernel: [12795387.035341] sd 6:0:36:0: [sdl]
>> Mode Sense: 7f 00 10 08
>> Apr 17 11:16:05 kuiper kernel: [12795387.035679] sd 6:0:36:0: [sdl]
>> Write cache: enabled, read cache: enabled, supports DPO and FUA
>> Apr 17 11:16:05 kuiper kernel: [12795387.098315] sd 6:0:36:0: [sdl]
>> Attached SCSI disk
>> ==================
>>
>> RAID add and rebuild operations went fine. However, some minutes
>> after rebuild completion, several hundreds of these error messages
>> started to appear:
>> ==================
>> Apr 20 03:37:29 kuiper kernel: [13027072.454811] sd 6:0:36:0: [sdl]
>> Bad block number requested
>
> This means that somehow, something sent a non 4k aligned 4k sized
> request. SCSI here is just the messenger. However, if you apply this
> patch, it will capture the stack trace of what above it triggered this,
> which may help us in debugging. It could be we may also want to see
> what the values of block and blk_rq_sectors(rq) actually are, but lets
> begin with the stack trace.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 9421d9877730..ac865e048533 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>  		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
>  			scmd_printk(KERN_ERR, SCpnt,
>  				    "Bad block number requested\n");

Not a very informative error message. How about a quasi SCSI one like:
   Logical Block out of range, due to different block sizes

Doug Gilbert

> +			WARN_ON_ONCE(1);
>  			goto out;
>  		} else {
>  			block = block >> 3;
>
>

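The quoted check works in 512-byte sector units, which is why both the start and the length of a request have to be divisible by 8 on a drive with 4096-byte logical blocks. Below is a minimal userspace sketch of that arithmetic. The function names request_is_4k_aligned and sector_to_4k_block are hypothetical and do not exist in sd.c; only the "& 7" test and the ">> 3" shift are taken from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of 512-byte sectors in one 4096-byte logical block. */
#define SECTORS_PER_4K_BLOCK 8u

/*
 * Hypothetical userspace mirror of the test in the quoted patch context:
 * the starting sector and the sector count (both in 512-byte units) must
 * each be a multiple of 8, otherwise the request cannot be expressed in
 * whole 4096-byte blocks.
 */
static inline bool request_is_4k_aligned(uint64_t sector, uint32_t nr_sectors)
{
	return (sector & (SECTORS_PER_4K_BLOCK - 1)) == 0 &&
	       (nr_sectors & (SECTORS_PER_4K_BLOCK - 1)) == 0;
}

/* Convert an aligned 512-byte sector number into a 4096-byte block number. */
static inline uint64_t sector_to_4k_block(uint64_t sector)
{
	/* Callers are expected to have checked alignment first. */
	assert((sector & (SECTORS_PER_4K_BLOCK - 1)) == 0);
	return sector >> 3;
}
```

Under these assumptions, a request starting at sector 64 with 8 sectors passes the check, while one starting at sector 68, or one with only 4 sectors, is the kind of request that would reach the "Bad block number requested" branch.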
On Mon, 2018-06-11 at 11:18 -0400, Douglas Gilbert wrote:
> On 2018-06-11 11:06 AM, James Bottomley wrote:
> > On Mon, 2018-06-11 at 16:24 +0200, Sebastian Hegler wrote:
> > > Dear all,
> > >
> > > First off: sorry for cross-posting. I don't know if this is a RAID
> > > issue or a SCSI issue, so I'll just ask y'all.
> > >
> > >
> > > For a RAID6 capacity upgrade (higher capacity drives), we bought
> > > some 10TB disks:
> > > ==================
> > > Apr 17 11:16:05 kuiper kernel: [12795386.862031] scsi 6:0:36:0:
> > > Direct-Access ATA HGST HUH721010AL T21D PQ: 0 ANSI: 6
> > > Apr 17 11:16:05 kuiper kernel: [12795386.919904] scsi 6:0:36:0:
> > > atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
> > > Apr 17 11:16:05 kuiper kernel: [12795386.974186] sd 6:0:36:0: [sdl]
> > > 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)
> >
> > Well, this is the problem: a 4k logical (presumably 4k physical)
> > drive cannot be addressed in block sectors that are not divisible by
> > 8. This type of drive configuration is very unusual (although it was
> > something we tested years ago before the industry realised it had to
> > ship drives with 4k physical but 512 byte logical sectors because of
> > the legacy problem).
> >
> > > Apr 17 11:16:05 kuiper kernel: [12795386.998016] sd 6:0:36:0: [sdl]
> > > Write Protect is off
> > > Apr 17 11:16:05 kuiper kernel: [12795387.000625] sd 6:0:36:0:
> > > Attached scsi generic sg12 type 0
> > > Apr 17 11:16:05 kuiper kernel: [12795387.035341] sd 6:0:36:0: [sdl]
> > > Mode Sense: 7f 00 10 08
> > > Apr 17 11:16:05 kuiper kernel: [12795387.035679] sd 6:0:36:0: [sdl]
> > > Write cache: enabled, read cache: enabled, supports DPO and FUA
> > > Apr 17 11:16:05 kuiper kernel: [12795387.098315] sd 6:0:36:0: [sdl]
> > > Attached SCSI disk
> > > ==================
> > >
> > > RAID add and rebuild operations went fine. However, some minutes
> > > after rebuild completion, several hundreds of these error messages
> > > started to appear:
> > > ==================
> > > Apr 20 03:37:29 kuiper kernel: [13027072.454811] sd 6:0:36:0: [sdl]
> > > Bad block number requested
> >
> > This means that somehow, something sent a non 4k aligned 4k sized
> > request. SCSI here is just the messenger. However, if you apply this
> > patch, it will capture the stack trace of what above it triggered
> > this, which may help us in debugging. It could be we may also want to
> > see what the values of block and blk_rq_sectors(rq) actually are, but
> > lets begin with the stack trace.
> >
> > James
> >
> > ---
> >
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index 9421d9877730..ac865e048533 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
> >  		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
> >  			scmd_printk(KERN_ERR, SCpnt,
> >  				    "Bad block number requested\n");
>
> Not a very informative error message. How about a quasi SCSI one
> like:
>    Logical Block out of range, due to different block sizes

Well, this is supposed to be an impossible condition: if someone wants
to use non-512 byte logical drives, they're supposed to align the stack
to ensure it works. In this case, it looks like the stack is aligned
but one component isn't completely 4k safe. I don't think there's any
message we can print in SCSI that would help with debugging that. I
agree the message we print could be more informative about what went
wrong, but it's unlikely to be helpful to the user who sees it.

James

On 11/06/18 16:06, James Bottomley wrote:
> Well, this is the problem: a 4k logical (presumably 4k physical) drive
> cannot be addressed in block sectors that are not divisible by 8. This
> type of drive configuration is very unusual (although it was something
> we tested years ago before the industry realised it had to ship drives
> with 4k physical but 512 byte logical sectors because of the legacy
> problem).

I understood these drives were now becoming much more common, especially
enterprise-grade drives. I know there were problems switching from
512/512 drives to 512/4096, but as you say I thought they were pretty
much addressed.

I think it must be a couple of years ago now though, that I heard (on
LWN) enterprise drives were apparently switching over to 4096/4096. With
NO 512 emulation fall-back.

Cheers,
Wol

On Mon, Jun 11, 2018 at 1:00 PM, Anthony Youngman
<anthony@youngman.org.uk> wrote:
> On 11/06/18 16:06, James Bottomley wrote:
>> Well, this is the problem: a 4k logical (presumably 4k physical) drive
>> cannot be addressed in block sectors that are not divisible by 8. This
>> type of drive configuration is very unusual (although it was something
>> we tested years ago before the industry realised it had to ship drives
>> with 4k physical but 512 byte logical sectors because of the legacy
>> problem).
>
> I understood these drives were now becoming much more common, especially
> enterprise-grade drives. I know there were problems switching from
> 512/512 drives to 512/4096, but as you say I thought they were pretty
> much addressed.

As soon as I saw the model number "HGST HUH721010AL", and did a
search, I said, "Oh, it's _this_ drive."

The HGST Ultrastar He10 has both "512e Format" and "4K Native Format"
part numbers, so it's easy to potentially buy the wrong type of drive
(e.g.: accidentally buy a 4K Native drive, and discover some obscure
I/O failures).

FYI, in my experience, when an application sends a
smaller-than-4096-bytes I/O to a 4096-bytes block device, the usual
error code that's sent by the driver is EINVAL (or "Invalid
argument"), so see if there's a log message citing that error code.

> I think it must be a couple of years ago now though, that I heard (on
> LWN) enterprise drives were apparently switching over to 4096/4096. With
> NO 512 emulation fall-back.

Some drive manufacturers seem to be more eager than others, but
there's still work to be done. For example, try this with a
4K-native drive:

1. Write an ISO image to the drive with the command
   "dd if=isofile.iso of=/dev/testdevice bs=4096 oflag=direct"

2. Create a test directory (for example, "/mnt/testdir"), then
   attempt to mount the device with
   "mount /dev/testdevice /mnt/testdir"

When I tried it on RHEL 7.5, I saw this: "kernel: isofs_fill_super:
bread failed, dev=testdevice, iso_blknum=17, block=-2147483648"

Note that ISO filesystems have a 2048-byte block size (maximum), but
in this test, it's stored on a block device with a block size of 4096
bytes.

There may be more issues out there, but they have to be found first.
And finding the issues is difficult, due to the obscurity of the
error messages seen.


Thanks,

Bryan

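Since the same drive model ships in both "512e" and "4K Native" variants, it can help to check what a given disk actually reports before adding it to an array. The following is a minimal sketch, assuming a Linux system that exposes logical_block_size and physical_block_size under /sys/block/<disk>/queue/. The helper names classify_sector_format and read_queue_attr are hypothetical, and the labels "512n", "512e" and "4Kn" are just shorthand for the three logical/physical combinations discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: name the sector format from the logical and
 * physical block sizes (in bytes) that the block layer reports.
 */
static inline const char *classify_sector_format(unsigned logical,
						 unsigned physical)
{
	if (logical == 512 && physical == 512)
		return "512n";	/* 512/512 */
	if (logical == 512 && physical == 4096)
		return "512e";	/* 512-byte emulation on 4k media */
	if (logical == 4096 && physical == 4096)
		return "4Kn";	/* 4k native, no 512 emulation */
	return "other";
}

/*
 * Hypothetical helper: read one numeric attribute from
 * /sys/block/<disk>/queue/<attr>. Returns 0 if it cannot be read.
 */
static inline unsigned read_queue_attr(const char *disk, const char *attr)
{
	char path[256];
	unsigned value = 0;
	FILE *f;

	assert(disk != NULL && attr != NULL);
	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%u", &value) != 1)
		value = 0;
	fclose(f);
	return value;
}
```

For the disk in the original report, a caller could pass read_queue_attr("sdl", "logical_block_size") and read_queue_attr("sdl", "physical_block_size") to classify_sector_format; given the "4096-byte logical blocks" line in the log, the logical size would be expected to read 4096.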
On Mon, 2018-06-11 at 17:56 -0400, Bryan Gurney wrote:
> On Mon, Jun 11, 2018 at 1:00 PM, Anthony Youngman
> <anthony@youngman.org.uk> wrote:
> > On 11/06/18 16:06, James Bottomley wrote:
> > > Well, this is the problem: a 4k logical (presumably 4k physical)
> > > drive cannot be addressed in block sectors that are not divisible
> > > by 8. This type of drive configuration is very unusual (although
> > > it was something we tested years ago before the industry realised
> > > it had to ship drives with 4k physical but 512 byte logical
> > > sectors because of the legacy problem).
> >
> > I understood these drives were now becoming much more common,
> > especially enterprise-grade drives. I know there were problems
> > switching from 512/512 drives to 512/4096, but as you say I thought
> > they were pretty much addressed.
>
> As soon as I saw the model number "HGST HUH721010AL", and did a
> search, I said, "Oh, it's _this_ drive."
>
> The HGST Ultrastar He10 has both "512e Format" and "4K Native Format"
> part numbers, so it's easy to potentially buy the wrong type of drive
> (e.g.: accidentally buy a 4K Native drive, and discover some obscure
> I/O failures).
>
> FYI, in my experience, when an application sends a
> smaller-than-4096-bytes I/O to a 4096-bytes block device, the usual
> error code that's sent by the driver is EINVAL (or "Invalid
> argument"), so see if there's a log message citing that error code.

We've done the work to make this function. However, it was a while ago
and I don't believe anyone tests regularly now (particularly with the
corner cases) so errors can creep back into the stack.

> > I think it must be a couple of years ago now though, that I heard
> > (on LWN) enterprise drives were apparently switching over to
> > 4096/4096. With NO 512 emulation fall-back.
>
> Some drive manufacturers seem to be more eager than others, but
> there's still work to be done. For example, try this with a
> 4K-native drive:
>
> 1. Write an ISO image to the drive with the command
>    "dd if=isofile.iso of=/dev/testdevice bs=4096 oflag=direct"
>
> 2. Create a test directory (for example, "/mnt/testdir"), then
>    attempt to mount the device with
>    "mount /dev/testdevice /mnt/testdir"

This is a textbook case of something that can never work: The
requirement for a 4k drive is that the stack must be aligned, meaning
4k or multiple of 4k block size all the way up and down. The isofs
you're copying only has a 2k block size. You get the same failure with
any non 4k multiple filesystem block size. Fortunately most modern
filesystems have had 4k, or multiple thereof, block sizes for a while
now, so you're unlikely to see this on your old ext4 devices but, in
principle, it could happen.

James

> When I tried it on RHEL 7.5, I saw this: "kernel: isofs_fill_super:
> bread failed, dev=testdevice, iso_blknum=17, block=-2147483648"
>
> Note that ISO filesystems have a 2048-byte block size (maximum), but
> in this test, it's stored on a block device with a block size of 4096
> bytes.
>
> There may be more issues out there, but they have to be found first.
> And finding the issues is difficult, due to the obscurity of the
> error messages seen.
>
>
> Thanks,
>
> Bryan
>

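The alignment requirement described above can be checked with simple arithmetic: a filesystem block is only addressable if its byte offset and its size are both multiples of the device's logical block size. The sketch below illustrates that rule only; it is not the isofs or block-layer code path, and the function name fs_block_is_addressable is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can filesystem block number fs_blknum, of size
 * fs_block_size bytes, be read as whole logical blocks from a device
 * whose logical block size is dev_block_size bytes?
 */
static inline bool fs_block_is_addressable(uint64_t fs_blknum,
					   uint32_t fs_block_size,
					   uint32_t dev_block_size)
{
	uint64_t byte_offset = fs_blknum * fs_block_size;

	assert(dev_block_size != 0);
	/* Both the start and the length must land on device block edges. */
	return byte_offset % dev_block_size == 0 &&
	       fs_block_size % dev_block_size == 0;
}
```

Taking iso_blknum=17 from the log with a 2048-byte block gives a byte offset of 34816, which is 8.5 device blocks of 4096 bytes, so the check fails. The same block on a 512-byte-logical device starts at sector 68 and spans 4 sectors, so it passes.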
On Mon, Jun 11, 2018 at 6:09 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2018-06-11 at 17:56 -0400, Bryan Gurney wrote:
>> On Mon, Jun 11, 2018 at 1:00 PM, Anthony Youngman
>> <anthony@youngman.org.uk> wrote:
>> > On 11/06/18 16:06, James Bottomley wrote:
>> > > Well, this is the problem: a 4k logical (presumably 4k physical)
>> > > drive cannot be addressed in block sectors that are not divisible
>> > > by 8. This type of drive configuration is very unusual (although
>> > > it was something we tested years ago before the industry realised
>> > > it had to ship drives with 4k physical but 512 byte logical
>> > > sectors because of the legacy problem).
>> >
>> > I understood these drives were now becoming much more common,
>> > especially enterprise-grade drives. I know there were problems
>> > switching from 512/512 drives to 512/4096, but as you say I thought
>> > they were pretty much addressed.
>>
>> As soon as I saw the model number "HGST HUH721010AL", and did a
>> search, I said, "Oh, it's _this_ drive."
>>
>> The HGST Ultrastar He10 has both "512e Format" and "4K Native Format"
>> part numbers, so it's easy to potentially buy the wrong type of drive
>> (e.g.: accidentally buy a 4K Native drive, and discover some obscure
>> I/O failures).
>>
>> FYI, in my experience, when an application sends a
>> smaller-than-4096-bytes I/O to a 4096-bytes block device, the usual
>> error code that's sent by the driver is EINVAL (or "Invalid
>> argument"), so see if there's a log message citing that error code.
>
> We've done the work to make this function. However, it was a while ago
> and I don't believe anyone tests regularly now (particularly with the
> corner cases) so errors can creep back into the stack.

Ah, okay. I was thinking more in the context of the error itself
being relatively obscure to find, since the program trying to perform
the I/O operation may report the error in a way that makes it look as
though an invalid argument to a command was received.

(At least that's how I discovered this, when I was wondering why I was
seeing "invalid argument" after trying a command that should have
worked, but failed; a blktrace run revealed a less-than-4096-byte read
that was being attempted, but failed with EINVAL.)

>> > I think it must be a couple of years ago now though, that I heard
>> > (on LWN) enterprise drives were apparently switching over to
>> > 4096/4096. With NO 512 emulation fall-back.
>>
>> Some drive manufacturers seem to be more eager than others, but
>> there's still work to be done. For example, try this with a
>> 4K-native drive:
>>
>> 1. Write an ISO image to the drive with the command
>>    "dd if=isofile.iso of=/dev/testdevice bs=4096 oflag=direct"
>>
>> 2. Create a test directory (for example, "/mnt/testdir"), then
>>    attempt to mount the device with
>>    "mount /dev/testdevice /mnt/testdir"
>
> This is a textbook case of something that can never work: The
> requirement for a 4k drive is that the stack must be aligned, meaning
> 4k or multiple of 4k block size all the way up and down. The isofs
> you're copying only has a 2k block size. You get the same failure with
> any non 4k multiple filesystem block size. Fortunately most modern
> filesystems have had 4k, or multiple thereof, block sizes for a while
> now, so you're unlikely to see this on your old ext4 devices but, in
> principle, it could happen.
>
> James

Then I hope that drive manufacturers don't start making 4K-native USB
flash drives; otherwise, we'll have a confusing situation on our
hands.


Bryan

>
>> When I tried it on RHEL 7.5, I saw this: "kernel: isofs_fill_super:
>> bread failed, dev=testdevice, iso_blknum=17, block=-2147483648"
>>
>> Note that ISO filesystems have a 2048-byte block size (maximum), but
>> in this test, it's stored on a block device with a block size of 4096
>> bytes.
>>
>> There may be more issues out there, but they have to be found first.
>> And finding the issues is difficult, due to the obscurity of the
>> error messages seen.
>>
>>
>> Thanks,
>>
>> Bryan
>>
>

Dear James, dear all!

On 11.06.2018 at 17:06, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> This means that somehow, something sent a non 4k aligned 4k sized
> request. SCSI here is just the messenger. However, if you apply this
> patch, it will capture the stack trace of what above it triggered this,
> which may help us in debugging. It could be we may also want to see
> what the values of block and blk_rq_sectors(rq) actually are, but lets
> begin with the stack trace.
> ---
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 9421d9877730..ac865e048533 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>  		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
>  			scmd_printk(KERN_ERR, SCpnt,
>  				    "Bad block number requested\n");
> +			WARN_ON_ONCE(1);
>  			goto out;
>  		} else {
>  			block = block >> 3;

I'll give that a try. But don't expect to hear from me soon, I'll need
to build a test system for that. The error occurred in a production
system, which I am very hesitant to re-boot, let alone insert drives
that cause error messages.

Yours sincerely,
Sebastian

Dear James, dear all,

this is to let you know that I'll not pursue this issue further. I
spent some days building a test system, but I could not reproduce the
error.

There's a tool named HUGO by HGST/Western Digital to re-configure the
HDD's firmware to use 512-byte blocks, which solved the problem for me.

Sorry about the bad news.

Yours,
Sebastian

On 14.06.2018 at 14:11, Sebastian Hegler <sebastian.hegler@tu-dresden.de> wrote:
> Dear James, dear all!
>
> On 11.06.2018 at 17:06, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> This means that somehow, something sent a non 4k aligned 4k sized
>> request. SCSI here is just the messenger. However, if you apply this
>> patch, it will capture the stack trace of what above it triggered this,
>> which may help us in debugging. It could be we may also want to see
>> what the values of block and blk_rq_sectors(rq) actually are, but lets
>> begin with the stack trace.
>> ---
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 9421d9877730..ac865e048533 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>>  		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
>>  			scmd_printk(KERN_ERR, SCpnt,
>>  				    "Bad block number requested\n");
>> +			WARN_ON_ONCE(1);
>>  			goto out;
>>  		} else {
>>  			block = block >> 3;
> I'll give that a try. But don't expect to hear from me soon, I'll need
> to build a test system for that. The error occurred in a production
> system, which I am very hesitant to re-boot, let alone insert drives
> that cause error messages.
>
> Yours sincerely,
> Sebastian

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 9421d9877730..ac865e048533 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1109,6 +1109,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 		if ((block & 7) || (blk_rq_sectors(rq) & 7)) {
 			scmd_printk(KERN_ERR, SCpnt,
 				    "Bad block number requested\n");
+			WARN_ON_ONCE(1);
 			goto out;
 		} else {
 			block = block >> 3;