scsi-disk: Don't enlarge min_io_size to max_io_size

Message ID 20180322073822.25795-1-famz@redhat.com (mailing list archive)
State New, archived

Commit Message

Fam Zheng March 22, 2018, 7:38 a.m. UTC
Some backends report a large max_io_sectors. Making min_io_size the same
value in that case makes it impossible for the guest to align memory,
so the disk may not be usable at all.

Change the default behavior (when min_io_size and opt_io_size are not
specified on the command line): do not assume max_io_sectors is a good
value for opt_io_size and min_io_size; use 512 instead.

Reported-by: David Gibson <dgibson@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 hw/scsi/scsi-disk.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
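
To illustrate the behavior change, here is a minimal C sketch. The macros
are local stand-ins assumed to mirror the semantics of QEMU's MIN and
MIN_NON_ZERO, and the helper names clamp_old and clamp_patched are
hypothetical; they do not appear in scsi-disk.c.

```c
#include <stdint.h>

/* Local stand-ins, assumed to match the intent of QEMU's macros. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MIN_NON_ZERO(a, b) \
    ((a) == 0 ? (b) : ((b) == 0 ? (a) : MIN(a, b)))

/* Old behavior: an unset (0) value is replaced by the limit itself. */
static uint32_t clamp_old(uint32_t io_size, uint32_t max_io_sectors)
{
    return MIN_NON_ZERO(io_size, max_io_sectors);
}

/* Behavior proposed in this patch: an unset value falls back to 512. */
static uint32_t clamp_patched(uint32_t io_size, uint32_t max_io_sectors)
{
    return MIN(io_size ? io_size : 512, max_io_sectors);
}
```

With an unset (0) min_io_size and a backend reporting max_io_sectors of
65536, clamp_old yields 65536, while clamp_patched yields 512.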

Comments

Paolo Bonzini March 22, 2018, 9:11 a.m. UTC | #1
On 22/03/2018 08:38, Fam Zheng wrote:
> Some backends report big max_io_sectors. Making min_io_size the same
> value in this case will make it impossible for guest to align memory,
> therefore the disk may not be usable at all.
> 
> Change the default behavior (when min_io_size and opt_io_size are not
> specified in the command line), do not assume max_io_sectors is a good
> value for opt_io_size and min_io_size, use 512 instead.
> 
> Reported-by: David Gibson <dgibson@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  hw/scsi/scsi-disk.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 5b7a48f5a5..76e3c9eaa4 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -714,10 +714,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
>  
>                  /* min_io_size and opt_io_size can't be greater than
>                   * max_io_sectors */
> -                min_io_size =
> -                    MIN_NON_ZERO(min_io_size, max_io_sectors);
> -                opt_io_size =
> -                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
> +                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
> +                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);

There are a few easily fixed issues with your chosen defaults, though
the problem obviously makes sense:

1) the values are in sectors - since you chose 512, it's not clear if
you meant it to be 512 bytes or 512 sectors.  :)  512 sectors (256 KiB
or 2 MiB depending on logical block size) is still too much for the
min_io_size.  The min_io_size default (if it is 0) is the physical block
size, so I think we should make the min_io_size either 0 or the physical
block size.

2) For the opt_io_size, 256 KiB on the other hand is probably too
little.  On my laptop (NVMe disk) a transfer size of 8 MiB is twice as
fast compared to a transfer size of 256 KiB, and 16 MiB or 32 MiB is a
little faster too.  I would either leave zero as the default, or pick
something around 16-32 MiB.
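
To make the unit question in point 1 concrete, here is a small sketch of
the arithmetic. blocks_to_bytes is a hypothetical helper written for this
example, not a QEMU function.

```c
#include <stdint.h>

/* Convert a count of logical blocks (sectors) to bytes. */
static uint64_t blocks_to_bytes(uint32_t blocks, uint32_t logical_block_size)
{
    /* Widen before multiplying so large counts cannot overflow 32 bits. */
    return (uint64_t)blocks * logical_block_size;
}
```

blocks_to_bytes(512, 512) is 262144 bytes (256 KiB), and
blocks_to_bytes(512, 4096) is 2097152 bytes (2 MiB).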

Thanks,

Paolo

>              }
>              /* required VPD size with unmap support */
>              buflen = 0x40;
>
Daniel Henrique Barboza March 22, 2018, 12:19 p.m. UTC | #2
Hi,

On 03/22/2018 04:38 AM, Fam Zheng wrote:
> Some backends report big max_io_sectors. Making min_io_size the same
> value in this case will make it impossible for guest to align memory,
> therefore the disk may not be usable at all.
>
> Change the default behavior (when min_io_size and opt_io_size are not
> specified in the command line), do not assume max_io_sectors is a good
> value for opt_io_size and min_io_size, use 512 instead.
>
> Reported-by: David Gibson <dgibson@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>   hw/scsi/scsi-disk.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 5b7a48f5a5..76e3c9eaa4 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -714,10 +714,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
>
>                   /* min_io_size and opt_io_size can't be greater than
>                    * max_io_sectors */
> -                min_io_size =
> -                    MIN_NON_ZERO(min_io_size, max_io_sectors);
> -                opt_io_size =
> -                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
> +                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
> +                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);
>               }

This code you're changing was added in d082d16a5c ("consider
bl->max_transfer ..").
I've borrowed this logic from scsi-generic.c, scsi_read_complete:

     if (s->type == TYPE_DISK &&
         r->req.cmd.buf[0] == INQUIRY &&
         r->req.cmd.buf[2] == 0xb0) {
         uint32_t max_transfer =
             blk_get_max_transfer(s->conf.blk) / s->blocksize;

         assert(max_transfer);
         stl_be_p(&r->buf[8], max_transfer);
         /* Also take care of the opt xfer len. */
         stl_be_p(&r->buf[12],
                  MIN_NON_ZERO(max_transfer, ldl_be_p(&r->buf[12])));
     }


Unless I've misunderstood the bug, you will want to change this code too.
Otherwise you'll fix it with emulated disks but it might appear when using
SCSI passthrough.


Thanks,


Daniel


>               /* required VPD size with unmap support */
>               buflen = 0x40;
Fam Zheng March 26, 2018, 7:26 a.m. UTC | #3
On Thu, 03/22 09:19, Daniel Henrique Barboza wrote:
> Hi,
> 
> On 03/22/2018 04:38 AM, Fam Zheng wrote:
> > Some backends report big max_io_sectors. Making min_io_size the same
> > value in this case will make it impossible for guest to align memory,
> > therefore the disk may not be usable at all.
> > 
> > Change the default behavior (when min_io_size and opt_io_size are not
> > specified in the command line), do not assume max_io_sectors is a good
> > value for opt_io_size and min_io_size, use 512 instead.
> > 
> > Reported-by: David Gibson <dgibson@redhat.com>
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >   hw/scsi/scsi-disk.c | 6 ++----
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> > index 5b7a48f5a5..76e3c9eaa4 100644
> > --- a/hw/scsi/scsi-disk.c
> > +++ b/hw/scsi/scsi-disk.c
> > @@ -714,10 +714,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
> > 
> >                   /* min_io_size and opt_io_size can't be greater than
> >                    * max_io_sectors */
> > -                min_io_size =
> > -                    MIN_NON_ZERO(min_io_size, max_io_sectors);
> > -                opt_io_size =
> > -                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
> > +                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
> > +                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);
> >               }
> 
> This code you're changing was added in d082d16a5c ("consider
> bl->max_transfer ..").
> I've borrowed this logic from scsi-generic.c, scsi_read_complete:
> 
>     if (s->type == TYPE_DISK &&
>         r->req.cmd.buf[0] == INQUIRY &&
>         r->req.cmd.buf[2] == 0xb0) {
>         uint32_t max_transfer =
>             blk_get_max_transfer(s->conf.blk) / s->blocksize;
> 
>         assert(max_transfer);
>         stl_be_p(&r->buf[8], max_transfer);
>         /* Also take care of the opt xfer len. */
>         stl_be_p(&r->buf[12],
>                  MIN_NON_ZERO(max_transfer, ldl_be_p(&r->buf[12])));
>     }
> 
> 
> Unless I've misunderstood the bug, you will want to change this code too.
> Otherwise
> you'll fix it with emulated disks but it might appear when using SCSI
> passthrough.

I am assuming (because I don't have a reproducer myself) what matters is
min_io_size here.

David, could you help test if you see the same problem with "-device
scsi-block"? If so, I'll patch scsi-generic.c in v2 too.

Fam
David Gibson March 27, 2018, 3:44 a.m. UTC | #4
On Mon, 26 Mar 2018 15:26:39 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Thu, 03/22 09:19, Daniel Henrique Barboza wrote:
> > Hi,
> > 
> > On 03/22/2018 04:38 AM, Fam Zheng wrote:  
> > > Some backends report big max_io_sectors. Making min_io_size the same
> > > value in this case will make it impossible for guest to align memory,
> > > therefore the disk may not be usable at all.
> > > 
> > > Change the default behavior (when min_io_size and opt_io_size are not
> > > specified in the command line), do not assume max_io_sectors is a good
> > > value for opt_io_size and min_io_size, use 512 instead.
> > > 
> > > Reported-by: David Gibson <dgibson@redhat.com>
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > ---
> > >   hw/scsi/scsi-disk.c | 6 ++----
> > >   1 file changed, 2 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> > > index 5b7a48f5a5..76e3c9eaa4 100644
> > > --- a/hw/scsi/scsi-disk.c
> > > +++ b/hw/scsi/scsi-disk.c
> > > @@ -714,10 +714,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
> > > 
> > >                   /* min_io_size and opt_io_size can't be greater than
> > >                    * max_io_sectors */
> > > -                min_io_size =
> > > -                    MIN_NON_ZERO(min_io_size, max_io_sectors);
> > > -                opt_io_size =
> > > -                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
> > > +                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
> > > +                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);
> > >               }  
> > 
> > This code you're changing was added in d082d16a5c ("consider
> > bl->max_transfer ..").
> > I've borrowed this logic from scsi-generic.c, scsi_read_complete:
> > 
> >     if (s->type == TYPE_DISK &&
> >         r->req.cmd.buf[0] == INQUIRY &&
> >         r->req.cmd.buf[2] == 0xb0) {
> >         uint32_t max_transfer =
> >             blk_get_max_transfer(s->conf.blk) / s->blocksize;
> > 
> >         assert(max_transfer);
> >         stl_be_p(&r->buf[8], max_transfer);
> >         /* Also take care of the opt xfer len. */
> >         stl_be_p(&r->buf[12],
> >                  MIN_NON_ZERO(max_transfer, ldl_be_p(&r->buf[12])));
> >     }
> > 
> > 
> > Unless I've misunderstood the bug, you will want to change this code too.
> > Otherwise
> > you'll fix it with emulated disks but it might appear when using SCSI
> > passthrough.  
> 
> I am assuming (because I don't have a reproducer myself)

Sorry, I should have given you specific reproduction instructions.  You
don't need a POWER host - I've verified that the bug trips under TCG.

  1. Grab a RHEL ppc64le install image (other installers could well
     also hit it, but I haven't tried them)
  2. Build current qemu master, including the ppc64-softmmu target
  3. Create a fresh new guest disk image
        qemu-img create -f qcow2 disk.qcow2 20G
  4. Attempt to install the new guest:
        $QEMU -nodefaults -nographic -machine pseries \
              -cpu POWER8 -smp 1 -m 1G \
              -chardev stdio,id=conmon,mux=on,signal=off \
              -device spapr-vty,chardev=conmon \
              -mon conmon \
              -device virtio-scsi-pci,id=scsi \
              -drive file=disk.qcow2,if=none,format=qcow2,id=hd \
              -device scsi-disk,drive=hd,bus=scsi.0 \
              -drive file=RHEL-7.4-20170711.0-Server-ppc64le-dvd1.iso,format=raw,media=cdrom,if=none,id=cd \
              -device scsi-cd,drive=cd,bus=scsi.0

That's using the RHEL 7.4 GA image; a recent 7.5 snapshot also works, as
may others.

> what matters is
> min_io_size here.
> 
> David, could you help test if you see the same problem with "-device
> scsi-block"? If we I'll patch scsi-generic.c in v2 too.

I'm not sure exactly what you want me to check here?  You mean putting
the guest disk on a scsi-block instead of scsi-disk?  That's a bit more
fiddly, since I have to find a block device to back it instead of an
image.
Fam Zheng March 27, 2018, 4:28 p.m. UTC | #5
On Tue, 03/27 14:44, David Gibson wrote:
> On Mon, 26 Mar 2018 15:26:39 +0800
> Fam Zheng <famz@redhat.com> wrote:
> 
> > On Thu, 03/22 09:19, Daniel Henrique Barboza wrote:
> > > Hi,
> > > 
> > > On 03/22/2018 04:38 AM, Fam Zheng wrote:  
> > > > Some backends report big max_io_sectors. Making min_io_size the same
> > > > value in this case will make it impossible for guest to align memory,
> > > > therefore the disk may not be usable at all.
> > > > 
> > > > Change the default behavior (when min_io_size and opt_io_size are not
> > > > specified in the command line), do not assume max_io_sectors is a good
> > > > value for opt_io_size and min_io_size, use 512 instead.
> > > > 
> > > > Reported-by: David Gibson <dgibson@redhat.com>
> > > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > > > ---
> > > >   hw/scsi/scsi-disk.c | 6 ++----
> > > >   1 file changed, 2 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> > > > index 5b7a48f5a5..76e3c9eaa4 100644
> > > > --- a/hw/scsi/scsi-disk.c
> > > > +++ b/hw/scsi/scsi-disk.c
> > > > @@ -714,10 +714,8 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
> > > > 
> > > >                   /* min_io_size and opt_io_size can't be greater than
> > > >                    * max_io_sectors */
> > > > -                min_io_size =
> > > > -                    MIN_NON_ZERO(min_io_size, max_io_sectors);
> > > > -                opt_io_size =
> > > > -                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
> > > > +                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
> > > > +                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);
> > > >               }  
> > > 
> > > This code you're changing was added in d082d16a5c ("consider
> > > bl->max_transfer ..").
> > > I've borrowed this logic from scsi-generic.c, scsi_read_complete:
> > > 
> > >     if (s->type == TYPE_DISK &&
> > >         r->req.cmd.buf[0] == INQUIRY &&
> > >         r->req.cmd.buf[2] == 0xb0) {
> > >         uint32_t max_transfer =
> > >             blk_get_max_transfer(s->conf.blk) / s->blocksize;
> > > 
> > >         assert(max_transfer);
> > >         stl_be_p(&r->buf[8], max_transfer);
> > >         /* Also take care of the opt xfer len. */
> > >         stl_be_p(&r->buf[12],
> > >                  MIN_NON_ZERO(max_transfer, ldl_be_p(&r->buf[12])));
> > >     }
> > > 
> > > 
> > > Unless I've misunderstood the bug, you will want to change this code too.
> > > Otherwise
> > > you'll fix it with emulated disks but it might appear when using SCSI
> > > passthrough.  
> > 
> > I am assuming (because I don't have a reproducer myself)
> 
> Sorry, I should have given you specific reproduce instructions.  You
> don't need a POWER host - I've verified that the bug trips under TCG.
> 
>   1. Grab a RHEL ppc64le install image (other installers could well
>      also hit it, but I haven't tried them)
>   2. Build current qemu master, including the ppc64-softmmu target
>   3. Create a fresh new guest disk image
>         qemu-img create -f qcow2 disk.qcow2 20G
>   4. Attempt to install the new guest:
>         $QEMU -nodefaults -nographic -machine pseries \
>               -cpu POWER8 -smp 1 -m 1G \
>               -chardev stdio,id=conmon,mux=on,signal=off \
>               -device spapr-vty,chardev=conmon \
>               -mon conmon \
>               -device virtio-scsi-pci,id=scsi \
>               -drive file=disk.qcow2,if=none,format=qcow2,id=hd \
>               -device scsi-disk,drive=hd,bus=scsi.0 \
>               -drive file=RHEL-7.4-20170711.0-Server-ppc64le-dvd1.iso,format=raw,media=cdrom,if=none,id=cd \
>               -device scsi-cd,drive=cd,bus=scsi.0
> 
> That's using the RHEL7.4 GA image, a recent 7.5 snapshot also works as may others.

Thanks, your reproducer works. I've verified that fixing min_io_size alone
eliminates the problem.

Since scsi-generic.c only adjusts the optimal transfer length, there is no
such problem for scsi-block.  Of course, raising opt_io_size up to
max_io_size is dubious, but as far as fixing guest I/O goes, I think touching
up scsi-disk alone is okay. I'll address Paolo's comments and post v2.
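
For reference, one possible reading of Paolo's "leave zero as the default"
option is to clamp only values that were actually set. This is a
hypothetical sketch under that assumption, not the v2 patch; clamp_if_set
is an illustrative name and MIN is a local stand-in macro.

```c
#include <stdint.h>

/* Local stand-in for a MIN macro. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Keep 0 ("not reported") as it is, and clamp only a value that was
 * explicitly configured so it never exceeds max_io_sectors.
 */
static uint32_t clamp_if_set(uint32_t io_size, uint32_t max_io_sectors)
{
    return io_size ? MIN(io_size, max_io_sectors) : 0;
}
```

Under this reading an unset min_io_size stays 0, so the guest would fall
back to its own default rather than inherit max_io_sectors.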

Fam
Patch

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 5b7a48f5a5..76e3c9eaa4 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -714,10 +714,8 @@  static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
 
                 /* min_io_size and opt_io_size can't be greater than
                  * max_io_sectors */
-                min_io_size =
-                    MIN_NON_ZERO(min_io_size, max_io_sectors);
-                opt_io_size =
-                    MIN_NON_ZERO(opt_io_size, max_io_sectors);
+                min_io_size = MIN(min_io_size ? : 512, max_io_sectors);
+                opt_io_size = MIN(opt_io_size ? : 512, max_io_sectors);
             }
             /* required VPD size with unmap support */
             buflen = 0x40;