mbox series

[0/3] scsi: infinite guest hangs with scsi-disk

Message ID 20201116183114.55703-1-hare@suse.de (mailing list archive)
Headers show
Series scsi: infinite guest hangs with scsi-disk | expand

Message

Hannes Reinecke Nov. 16, 2020, 6:31 p.m. UTC
Hi all,

one of our customers reported an infinite guest hang following an FC link loss  when using scsi-disk.
Problem is that scsi-disk issues SG_IO command with a timeout of UINT_MAX, which essentially signals
'no timeout' to the host kernel. So if the command gets lost eg during an unexpected link loss the
HBA driver will never attempt to abort or return the command. Hence the guest will hang forever, and
the only way to resolve things is to reboot the host.

To solve it this patchset adds an 'io_timeout' parameter to scsi-disk and scsi-generic, which allows
the admin to specify a command timeout for SG_IO request. It is initialized to 30 seconds to avoid the
infinite hang as mentioned above.

As usual, comments and reviews are welcome.

Hannes Reinecke (3):
  virtio-scsi: trace events
  scsi: make io_timeout configurable
  scsi: add tracing for SG_IO commands

 hw/scsi/scsi-disk.c    |  9 ++++++---
 hw/scsi/scsi-generic.c | 25 ++++++++++++++++++-------
 hw/scsi/trace-events   | 13 +++++++++++++
 hw/scsi/virtio-scsi.c  | 30 +++++++++++++++++++++++++++++-
 include/hw/scsi/scsi.h |  4 +++-
 5 files changed, 69 insertions(+), 12 deletions(-)

Comments

no-reply@patchew.org Nov. 16, 2020, 7:09 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20201116183114.55703-1-hare@suse.de/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20201116183114.55703-1-hare@suse.de
Type: series
Subject: [PATCH 0/3] scsi: infinite guest hangs with scsi-disk

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20201116104617.18333-1-peter.maydell@linaro.org -> patchew/20201116104617.18333-1-peter.maydell@linaro.org
 - [tag update]      patchew/20201116165506.31315-1-eperezma@redhat.com -> patchew/20201116165506.31315-1-eperezma@redhat.com
 * [new tag]         patchew/20201116183114.55703-1-hare@suse.de -> patchew/20201116183114.55703-1-hare@suse.de
Switched to a new branch 'test'
350bcf1 scsi: add tracing for SG_IO commands
7ecf5b6 scsi: make io_timeout configurable
450c008 virtio-scsi: trace events

=== OUTPUT BEGIN ===
1/3 Checking commit 450c008843e5 (virtio-scsi: trace events)
ERROR: trailing whitespace
#116: FILE: hw/scsi/virtio-scsi.c:797:
+     $

total: 1 errors, 0 warnings, 92 lines checked

Patch 1/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/3 Checking commit 7ecf5b611a5b (scsi: make io_timeout configurable)
3/3 Checking commit 350bcf121178 (scsi: add tracing for SG_IO commands)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201116183114.55703-1-hare@suse.de/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Paolo Bonzini Dec. 17, 2020, 1:49 p.m. UTC | #2
On 16/11/20 19:31, Hannes Reinecke wrote:
> Hi all,
> 
> one of our customers reported an infinite guest hang following an FC link loss  when using scsi-disk.
> Problem is that scsi-disk issues SG_IO command with a timeout of UINT_MAX, which essentially signals
> 'no timeout' to the host kernel. So if the command gets lost eg during an unexpected link loss the
> HBA driver will never attempt to abort or return the command. Hence the guest will hang forever, and
> the only way to resolve things is to reboot the host.
> 
> To solve it this patchset adds an 'io_timeout' parameter to scsi-disk and scsi-generic, which allows
> the admin to specify a command timeout for SG_IO request. It is initialized to 30 seconds to avoid the
> infinite hang as mentioned above.
> 
> As usual, comments and reviews are welcome.
> 
> Hannes Reinecke (3):
>    virtio-scsi: trace events
>    scsi: make io_timeout configurable
>    scsi: add tracing for SG_IO commands
> 
>   hw/scsi/scsi-disk.c    |  9 ++++++---
>   hw/scsi/scsi-generic.c | 25 ++++++++++++++++++-------
>   hw/scsi/trace-events   | 13 +++++++++++++
>   hw/scsi/virtio-scsi.c  | 30 +++++++++++++++++++++++++++++-
>   include/hw/scsi/scsi.h |  4 +++-
>   5 files changed, 69 insertions(+), 12 deletions(-)
> 

The UINT_MAX timeout predates me, but I think the idea was to make it 
sort of like NFS's hard option.  Without a timeout you cannot be quite 
sure if/when the command will stay in some buffer of the HBA or the SAN 
or the target, and there could be unintended reordering of writes.

Though I guess at some point you'll anyway restart the VM on another 
host and the same reordering can happen, so I've queued the patch.

Paolo